The aim is to identify the internal systematic biases in crowd-sourced geographic information datasets and citizen science data.
There has been a rapid increase in information gathered by people from all walks of life who use connected devices able to collect and share geographic information, such as GPS tracks, photographs with location information, or observations of the natural environment in citizen science projects. A vast array of projects and activities now use this type of information, and each project has its own characteristics. Yet it can be hypothesised that some characteristics of this information will be systematically biased, and that these biases differ between projects and data sources.
Crowd-sourced datasets will have some systematic biases that repeat across crowd-sourcing platforms: for example, the influence of population density, business activity, and tourism on where data is available, or a weekend or seasonal bias in when data is collected. Other biases are project-specific; for example, some projects attract more young men, so places of interest to this demographic will be over-represented. One of the major obstacles limiting the use of such data sources is understanding and separating systematic and project-level biases, and then developing statistical methods to evaluate their impact. To use such datasets to identify hidden features and patterns, the relationships between a dataset and the world must first be identified.