Automatically identifying, prioritizing and extracting global biodiversity information
The Living Planet Index Database contains more than 15,000 records of population abundance for more than 3,000 species. It is used to assess the state of global biodiversity through the Living Planet Index, one of the global indicators used to assess progress towards the CBD's Aichi Biodiversity Targets. However, identifying and extracting information from the source literature is slow and relies on the effort of individual researchers. In this project we propose to use existing machine-learning and data-mining tools to differentiate sources (papers, reports, etc.) that contain useful information (records of abundance) from those that do not. Initially this will involve training a model on our extensive source database (using abstracts, keywords and full text) to accurately discriminate 'true' sources from others. This model will then be developed into a tool that can be applied to online abstract repositories to predict the likelihood that a new source contains useful information.
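The proposal does not name a specific model, so the following is only a minimal sketch of the general idea, assuming a bag-of-words multinomial naive Bayes classifier with Laplace smoothing. The class name NaiveBayesSourceClassifier, the helper rank_sources, and the four training abstracts are hypothetical and invented purely for illustration; they are not taken from the Living Planet Index source database.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesSourceClassifier:
    """Hypothetical multinomial naive Bayes over word counts (Laplace smoothing)."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.n_docs = len(labels)
        self.doc_counts = Counter(labels)
        # Per-class word frequencies accumulated over all training texts.
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.total_words = {
            c: sum(self.word_counts[c].values()) for c in self.classes
        }
        return self

    def predict_proba(self, text):
        """Return {class: probability}; words unseen in training are ignored."""
        tokens = [t for t in tokenize(text) if t in self.vocab]
        scores = {}
        for c in self.classes:
            score = math.log(self.doc_counts[c] / self.n_docs)  # class prior
            denom = self.total_words[c] + len(self.vocab)
            for t in tokens:
                score += math.log((self.word_counts[c][t] + 1) / denom)
            scores[c] = score
        # Normalise the log scores into probabilities in a numerically stable way.
        top = max(scores.values())
        exp_scores = {c: math.exp(s - top) for c, s in scores.items()}
        total = sum(exp_scores.values())
        return {c: v / total for c, v in exp_scores.items()}


def rank_sources(model, abstracts):
    """Sort abstracts by predicted probability of class 1, highest first."""
    scored = [(model.predict_proba(a)[1], a) for a in abstracts]
    return sorted(scored, reverse=True)


# Invented toy abstracts: label 1 = contains abundance records, 0 = does not.
train_texts = [
    "Annual counts of breeding pairs were recorded from 1990 to 2010, "
    "giving a population abundance time series",
    "Population size estimates from yearly transect surveys show a decline "
    "in abundance",
    "We describe the morphology and taxonomy of a newly discovered species",
    "A review of conservation policy and legislation for protected areas",
]
train_labels = [1, 1, 0, 0]

model = NaiveBayesSourceClassifier().fit(train_texts, train_labels)

new_abstracts = [
    "The taxonomy and morphology of the species",
    "Yearly surveys recorded population counts and abundance trends",
]
ranked = rank_sources(model, new_abstracts)
for probability, abstract in ranked:
    print(f"{probability:.2f}  {abstract}")
```

Ranking by predicted probability, rather than applying a hard yes/no cut-off, matches the stated aim of prioritizing sources: researchers could work down the list from the most promising candidates. A real tool would need a far larger labelled training set, held-out evaluation, and probably richer features (keywords, full text) than this sketch uses.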