How Crowdsourcing Feeds Hungry Big Data Apps

by: Bart De Lathouwer and Ron Exler


Data collection is a bottleneck for many enterprises. To deal with this issue, many firms use crowdsourcing: engaging large groups of people to provide the data.

P.T. Barnum said, “Every crowd has a silver lining.” In daily life, crowds often seem undesirable—traffic jams, long lines and noise. Yet crowds can bring benefits when they work together for positive change.

Enterprises often face complicated and large-scale decision challenges for which they need to collect many data points quickly. Using big data and predictive analytics tools with cloud platforms, enterprises can now store and analyze massive amounts of data. Big data solutions also accelerate computations so that analyses that once took hours are now done in seconds. Today, data collection is the bottleneck for many enterprise decision-making processes.

To overcome the collection bottleneck, enterprises are beginning to use crowdsourcing: engaging large groups of people to provide the needed data. Crowdsourcing accelerates data collection for enterprise applications and often involves collecting and managing geospatial data.

Sometimes people actively volunteer information; at other times, companies collect it in the background, for example through Website tracking. Volunteered geographic information (VGI), however, requires quality assurance because of positioning-device inaccuracies, inconsistent place names and variable observer skill.

Although human verification of crowd-sourced data improves quality, it is also time-consuming and difficult to manage. Therefore, enterprises benefit from automation to supplement quality assurance efforts.
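As an illustration of the kind of automation meant here, the sketch below screens incoming VGI records for the three problems named above: implausible coordinates, poor positional accuracy and unrecognized place names. It is a minimal sketch under stated assumptions. The record fields, the 50 m accuracy threshold and the tiny `GAZETTEER` list are all hypothetical stand-ins, and the fuzzy name match uses Python's standard `difflib` module rather than any particular enterprise tool.

```python
from dataclasses import dataclass
from difflib import get_close_matches

# Hypothetical place names standing in for an authoritative gazetteer.
GAZETTEER = ["Mill Creek", "Oak Ridge", "Cedar Falls"]


@dataclass
class VgiRecord:
    lat: float
    lon: float
    accuracy_m: float  # horizontal accuracy reported by the device, in metres
    place_name: str


def screen(rec: VgiRecord, max_accuracy_m: float = 50.0) -> list:
    """Return a list of quality issues; an empty list means the record passes."""
    issues = []
    if not (-90.0 <= rec.lat <= 90.0 and -180.0 <= rec.lon <= 180.0):
        issues.append("coordinates out of range")
    if rec.accuracy_m > max_accuracy_m:
        issues.append(f"accuracy {rec.accuracy_m} m exceeds {max_accuracy_m} m")
    # Case-insensitive fuzzy match catches spelling variants of known names;
    # anything unmatched is left for a human reviewer.
    known = [name.lower() for name in GAZETTEER]
    if not get_close_matches(rec.place_name.lower(), known, n=1, cutoff=0.8):
        issues.append(f"unrecognized place name: {rec.place_name!r}")
    return issues
```

A record such as `VgiRecord(52.1, -3.9, 8.0, "Cedar Fals")` passes despite the misspelling, while records with issues are routed to human verification instead of being discarded, so automation supplements rather than replaces the reviewers.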

Effective use of standards can simplify conflation: unifying multiple separate sources of data into one integrated, all-encompassing result. When built into automated processes, standards can also simplify quality assurance across multiple types and sources of geospatial data.

The Open Geospatial Consortium is an international industry consortium of more than 470 companies, government agencies and universities that participate in a consensus process to develop publicly available interface standards. OGC standards support interoperable solutions that "geo-enable" the Web, wireless and location-based services, and mainstream IT.

Because so much crowdsourced data has a location component, OGC standards can be central to crowdsourcing initiatives. Several successes illustrate the benefits of using OGC standards with volunteered geographic information.

Getting Citizens Involved

COBWEB (Citizen OBservatory WEB) is establishing a testbed environment that will enable citizens living within the Biosphere Reserves in Wales, Germany and Greece to collect environmental data using mobile devices. COBWEB collection apps on mobile phones let observers collect and send new environmental data, such as photos of vegetation, insects and wildlife.

The apps also address data quality through robust observer authentication and metadata collection. For example, one technique useful in collecting volunteered geographic information is "interactive direction of the observer," in which the observer is challenged with questions while recording an observation, raising the quality of the data.
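To make the idea of interactive direction more concrete, here is a hypothetical sketch of how an app might challenge the observer during recording. The question sets in `FOLLOW_UPS`, the single consistency rule and the `ask` callback (a stand-in for the app's user interface) are assumptions for illustration only, not COBWEB's actual design.

```python
from typing import Callable, Optional

# Hypothetical follow-up questions keyed by the category the observer picks.
FOLLOW_UPS = {
    "insect": ["How many legs did it have?", "Did it have wings?"],
    "vegetation": ["Was it flowering?", "Approximate height in cm?"],
}


def record_observation(category: str,
                       ask: Callable[[str], Optional[str]]) -> dict:
    """Challenge the observer with follow-up questions while recording.

    `ask` stands in for the app's prompt: it receives a question and returns
    the observer's answer, or None if the question was skipped.
    """
    questions = FOLLOW_UPS.get(category, [])
    answers = {q: ask(q) for q in questions}
    answered = [q for q, a in answers.items() if a not in (None, "")]
    flags = []
    # Example consistency rule: an "insect" reported with other than six
    # legs is probably a misidentification (a spider, for instance).
    legs = answers.get("How many legs did it have?")
    if category == "insect" and legs not in (None, "", "6"):
        flags.append("leg count inconsistent with an insect")
    return {
        "category": category,
        "answers": answers,
        # Share of follow-up questions answered, kept as quality metadata.
        "completeness": len(answered) / len(questions) if questions else 1.0,
        "flags": flags,
    }
```

The answers, completeness score and flags travel with the observation as metadata, so later quality assurance can weight or question the record rather than accept it blindly.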

After collection, COBWEB uses analytics to conflate crowd-sourced data with professionally gathered data to produce higher-quality data. Standards are important to data consistency, and OGC standards make it much easier to conflate different types of geospatial data, including point, raster, vector, point clouds and urban 3D models.
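One common conflation step for point data is nearest-neighbour matching within a distance tolerance. The sketch below pairs each crowd-sourced point with the closest professionally gathered point; crowd points with no reference point nearby are set aside as possible new features or errors. This is a minimal sketch under assumptions: the dict-based record schema and the 25 m tolerance are hypothetical, the haversine formula assumes a spherical Earth, and it is not a description of COBWEB's actual analytics.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres, assuming a spherical Earth (R = 6,371 km)."""
    r = 6_371_000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def conflate(crowd, reference, tolerance_m=25.0):
    """Pair each crowd point with the nearest reference point within tolerance.

    Both inputs are lists of dicts with "id", "lat" and "lon" keys (a
    hypothetical minimal schema). Returns (matches, unmatched).
    """
    matches, unmatched = [], []
    for c in crowd:
        best_id, best_d = None, math.inf
        for ref in reference:
            d = haversine_m(c["lat"], c["lon"], ref["lat"], ref["lon"])
            if d < best_d:
                best_id, best_d = ref["id"], d
        if best_d <= tolerance_m:
            matches.append((c["id"], best_id))  # corroborates a known feature
        else:
            unmatched.append(c["id"])  # new feature or error: send to review
    return matches, unmatched
```

The brute-force inner loop keeps the example short; with large datasets, a spatial index would normally replace it.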

In addition, services that implement OGC standards such as Web Feature Service (WFS), Web Processing Service (WPS) and Web Map Service (WMS) can turn collections of geographic point data into layers in a geographic information system (GIS). In the GIS, quality assurance analysts, and perhaps even probabilistic algorithms, can compare the newly observed data with existing professionally gathered data.
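For a sense of what requesting such a layer looks like, the sketch below builds a WFS 2.0 GetFeature request using key-value-pair encoding. The endpoint URL and the feature type name `demo:observations` are hypothetical. Bounding-box axis order and coordinate reference system depend on how the server is configured, so in practice you would check its GetCapabilities response first.

```python
from urllib.parse import urlencode


def wfs_getfeature_url(endpoint, type_name, bbox, count=100):
    """Build a WFS 2.0 GetFeature request URL (key-value-pair encoding).

    bbox is (min_x, min_y, max_x, max_y) in the server's default CRS.
    """
    params = {
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "typeNames": type_name,
        "bbox": ",".join(str(v) for v in bbox),
        "count": count,  # WFS 2.0 name for the maximum number of features
    }
    return f"{endpoint}?{urlencode(params)}"
```

For example, `wfs_getfeature_url("https://example.org/wfs", "demo:observations", (51.5, -4.2, 52.5, -3.0))` yields a URL that could be fetched with `urllib.request.urlopen` and loaded into a GIS for comparison with reference data.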

Crowds Populate Smart Cities

Citizen engagement is critical to improving city services and living conditions, so crowdsourced data collection is extending into urban areas as well. The Netherlands and Berlin use open 3D city models, encoded in OGC CityGML, as part of their spatial data infrastructures, and such models are good frameworks on which to collect many kinds of volunteered geographic information.
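One way a city model serves as a framework is that each object carries a stable `gml:id`, which a volunteered report could reference. The sketch below reads building identifiers and measured heights from a tiny CityGML 2.0 fragment using Python's standard `xml.etree.ElementTree`. The sample buildings are invented, and real city models contain far richer geometry and attributes than this fragment.

```python
import xml.etree.ElementTree as ET

# Namespaces used by CityGML 2.0 documents (GML 3.1.1 for gml:id).
NS = {
    "core": "http://www.opengis.net/citygml/2.0",
    "bldg": "http://www.opengis.net/citygml/building/2.0",
    "gml": "http://www.opengis.net/gml",
}

# Invented two-building sample; real models also carry geometry.
SAMPLE = """<core:CityModel xmlns:core="http://www.opengis.net/citygml/2.0"
    xmlns:bldg="http://www.opengis.net/citygml/building/2.0"
    xmlns:gml="http://www.opengis.net/gml">
  <core:cityObjectMember>
    <bldg:Building gml:id="BLDG_001">
      <bldg:measuredHeight uom="m">12.5</bldg:measuredHeight>
    </bldg:Building>
  </core:cityObjectMember>
  <core:cityObjectMember>
    <bldg:Building gml:id="BLDG_002">
      <bldg:measuredHeight uom="m">30.0</bldg:measuredHeight>
    </bldg:Building>
  </core:cityObjectMember>
</core:CityModel>"""


def building_heights(citygml_text):
    """Map each building's gml:id to its measured height in metres."""
    root = ET.fromstring(citygml_text)
    heights = {}
    for building in root.iterfind(".//bldg:Building", NS):
        bid = building.get(f"{{{NS['gml']}}}id")
        height = building.find("bldg:measuredHeight", NS)
        if height is not None:
            heights[bid] = float(height.text)
    return heights
```

A citizen report tagged with `BLDG_002`, for instance, could then be checked against or attached to that building in the spatial data infrastructure.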

Similarly, CITI-SENSE encompasses the development of a sensor-based Citizens' Observatory Community for improving the quality of life in cities. Started in 2012, the project brought together 28 partner organizations to create community-based environmental monitoring and information systems using state-of-the-art Earth observation applications. With activities across Europe, Israel, South Korea and Australia, CITI-SENSE contributes to the Global Earth Observation System of Systems (GEOSS), which provides common methodologies and standards for scientific approaches to, and management of, Earth observation data.

Citizens participating in CITI-SENSE use smartphone-based mobile sensor stations to help collect outdoor and indoor air-quality data. The collected data is loaded into a database through the open OGC WFS standard. Data processing services include the OGC WPS, which defines the inputs and outputs (requests and responses) of geospatial processing services such as polygon overlay.
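As a simplified illustration of the geospatial processing such a service might wrap, the sketch below averages air-quality readings per district using a ray-casting point-in-polygon test. Note that this is point-in-polygon aggregation, a simpler relative of the polygon overlay mentioned above, and it is not CITI-SENSE's implementation. The district polygons, planar coordinates and readings are hypothetical, and districts are assumed not to overlap.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: is (x, y) inside the polygon ring [(x0, y0), ...]?"""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Count edges crossed by a horizontal ray cast to the right of (x, y).
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def mean_by_district(readings, districts):
    """Average sensor values per district polygon.

    readings: list of (x, y, value); districts: dict of name -> ring.
    Districts with no readings are omitted from the result.
    """
    totals = {name: [0.0, 0] for name in districts}
    for x, y, value in readings:
        for name, ring in districts.items():
            if point_in_polygon(x, y, ring):
                totals[name][0] += value
                totals[name][1] += 1
                break  # assumes districts do not overlap
    return {name: s / n for name, (s, n) in totals.items() if n}
```

Exposing logic like this behind a standard WPS interface means any client that speaks WPS can submit readings and retrieve the district summaries without knowing how the processing is implemented.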

The OGC is working with its members to expand the testing and uses of standards in crowdsourcing. OGC Testbeds involve government, private sector and university organizations collaborating in a rapid prototyping activity to develop standards and best practices.

In the Cross-Community Interoperability (CCI) thread of OGC Testbed 9, participants advanced semantic mediation approaches that allowed VGI data to be conflated with U.S. Geological Survey (USGS) directories of place names (gazetteers). The OGC Observations & Measurements (O&M) data model enables heterogeneous VGI data to be transformed into a standardized model and format.
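The transformation into a common model can be sketched as a set of small adapters, one per source, that map each source's fields onto the same observation record. The record below loosely follows a subset of the core O&M properties (phenomenon time, procedure, observed property, feature of interest, result) and omits others such as result time. Both source formats and their field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """Minimal record loosely modelled on a subset of core O&M properties."""
    phenomenon_time: str      # when the phenomenon was observed (ISO 8601)
    procedure: str            # how the observation was made
    observed_property: str    # what was observed
    feature_of_interest: str  # what the observation is about (here, a place)
    result: object


def from_app_a(rec):
    # Hypothetical source A: flat keys "ts", "species", "count", "site".
    return Observation(rec["ts"], "mobile-app-A",
                       "species_count:" + rec["species"], rec["site"], rec["count"])


def from_app_b(rec):
    # Hypothetical source B: different key names, nested location, string counts.
    return Observation(rec["observed_at"], "web-form-B",
                       "species_count:" + rec["taxon"],
                       rec["location"]["name"], int(rec["n"]))
```

Once both feeds produce the same record type, downstream conflation, gazetteer matching and quality checks need to handle only one model.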

Because its encoding is a narrow OGC Geography Markup Language (GML) application schema, O&M supports a high degree of interoperability. To further support crowdsourcing, projects can also use the candidate OGC Event Service Interface Specification to apply real-time complex event processing to incoming streams of O&M-encoded volunteered geographic information.
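Complex event processing typically evaluates rules over a moving window of incoming observations. The generic sketch below raises an event when at least `min_hits` of the last `window` results exceed a threshold. It illustrates the technique only; it does not reproduce the interface of the candidate Event Service specification, and the threshold and window sizes are arbitrary assumptions.

```python
from collections import deque


def detect_events(stream, threshold, window=3, min_hits=2):
    """Yield the time of each event where at least `min_hits` of the last
    `window` results exceed `threshold`.

    stream: iterable of (time, result) pairs in time order.
    """
    recent = deque(maxlen=window)  # oldest flag drops out automatically
    for time, result in stream:
        recent.append(result > threshold)
        if len(recent) == window and sum(recent) >= min_hits:
            yield time
            recent.clear()  # start a fresh window to avoid duplicate alerts
```

Requiring several exceedances within a window, rather than reacting to a single reading, is one simple way to keep an occasional faulty volunteer observation from triggering an alert.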

Incident management is a growing challenge for cities, and applications such as Hexagon Geospatial's Mobile Alert address it. Using mobile apps, people report and pinpoint issues such as utility line damage, graffiti, illegal trash dumping, road potholes, missing streetlights and broken signage.

Geospatial Data Is Useful in a Diversity of Efforts

Other enterprise-scale applications that rely on geospatial data and use crowdsourcing include:

· Search and rescue: People reviewed imagery and provided close to 13 million tags of objects in the effort to locate Malaysia Airlines Flight 370, using DigitalGlobe’s Tomnod application. Another project involved finding the location of an Idaho plane crash.

· Land preservation: Invasive weeds are aggressively spreading throughout Hawaii’s high-elevation rain forests, contributing to the destruction of more than 50 percent of Hawaii’s native forests. To help preserve Hawaii’s remaining native forests, DigitalGlobe partners with The Nature Conservancy to monitor change in land cover.

· Vehicle and pedestrian traffic: Crowdsourcing of vehicle traffic data is well established, with applications such as Google's Waze. One recently funded company, Placemeter, pays people to attach an old smartphone to a street-facing window to measure pedestrian traffic via video feeds. Its systems detect and count pedestrians and vehicles in streets, estimate how busy places are, track how long people wait in line and measure the speed of cars.

· Health information: HealthMap combines citizen-provided information with online sources to map public health threats. Flu Near You uses citizen-contributed information collected on its Website to map flu activity. Google Flu Trends analyzes flu-related searches to estimate influenza occurrence.

· Disaster planning and response: In response to the Haiti earthquake in 2010, the OpenStreetMap community began gathering data from imagery for damage assessment. The project used the OGC GeoPackage standard, which was developed and updated in OGC testbeds. Crowdsourcing was also critical for identifying damage and accelerating repairs in the Philippines in response to Typhoon Haiyan in November 2013.
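In imagery-tagging efforts like the search examples above, individual tags are noisy, and the useful signal is agreement among volunteers. The sketch below bins tags into square grid cells and ranks the cells that were tagged by at least a minimum number of distinct volunteers. It is a hypothetical illustration of consensus filtering, not the algorithm used by Tomnod or any other platform. The cell size, the agreement threshold and the projected (x, y) coordinates are assumptions.

```python
from collections import defaultdict


def consensus_cells(tags, cell_size, min_users=3):
    """Rank grid cells tagged by at least `min_users` distinct volunteers.

    tags: iterable of (user_id, x, y) in projected units such as metres.
    Returns a list of (cell, user_count) pairs, most-agreed-upon first.
    """
    users_per_cell = defaultdict(set)
    for user, x, y in tags:
        cell = (int(x // cell_size), int(y // cell_size))
        users_per_cell[cell].add(user)  # each user counts once per cell
    hits = [(cell, len(users)) for cell, users in users_per_cell.items()
            if len(users) >= min_users]
    return sorted(hits, key=lambda item: item[1], reverse=True)
```

Counting distinct users rather than raw tags keeps one enthusiastic volunteer from outweighing independent confirmations; the highest-ranked cells would then go to expert analysts first.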

Crowdsourcing is an emerging and legitimate method of gathering data for enterprise-scale applications. Still, professionals who work with the data have concerns about its quality. Geospatial data is central to many enterprise crowdsourcing efforts, and its quality challenges stem from the many diverse sources involved.

Geospatial standards make it much easier to automate the conflation of multiple types and sources of geospatial data. Proper conflation improves quality by comparing citizen data to professionally collected data. Therefore, OGC standards should be central to crowdsourcing initiatives for collecting data that has location components.

About the Authors

Bart De Lathouwer is director of interoperability programs, Europe, for the Open Geospatial Consortium (OGC). He is responsible for planning and managing interoperability initiatives such as testbeds, pilot projects and interoperability experiments, with an emphasis on activities in Europe.

Ron Exler is a senior consultant for OGC. His focus is on enterprise technology, its trends, and the connections between business and technology that decision-makers need.

Article source:
