Extracting Predictive Knowledge from Free-Text Documents at Kew Gardens

Theme: Biodiversity & Ecology

Primary Supervisor:

Allan Tucker

Institute for the Environment, BRUNEL

Secondary Supervisor:

Don Kirkup

Biodiversity Information and Economic Botany, KEW

Project Description:

In the last two decades there has been a surge in data on plant biodiversity through, for example, online publications, DNA sequences and images of specimens. Much of this data is semi-structured, temporal, spatial and noisy.

This PhD will focus on the use of textual data in floras, the traditional taxonomic research outputs from organisations such as Kew that deal with the nomenclature, geographical distribution, ecology and comparative morphology of the species of a region. The student will be trained in data-mining for ecology to develop a suite of tools for predictive ecology.

Text mining will be integrated with machine learning classifiers to identify common species, their traits and habitats in different ecosystems. For example, certain species may be commonly associated, but only when specific traits are evident and in particular types of geography (e.g. forest). These sorts of complex relationships will be automatically extracted from the historical texts.
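As a rough illustration of the kind of relationship described above, the following Python sketch counts how often pairs of species are mentioned together within a given habitat and trait context. It is only a toy under stated assumptions: the vocabularies, the example records and the function names `extract` and `cooccurrence` are hypothetical, and plain keyword matching stands in for the text mining and machine learning classifiers the project would actually develop.

```python
from collections import Counter
from itertools import combinations

# Hypothetical vocabularies; a real system would curate or learn these.
HABITATS = {"forest", "grassland", "swamp"}
TRAITS = {"epiphytic", "climbing", "succulent"}
SPECIES = {"Ficus sur", "Piper capense", "Aloe vera"}

# Invented flora-style records, for illustration only.
RECORDS = [
    "Ficus sur and Piper capense, climbing, in moist forest.",
    "Piper capense with Ficus sur in forest margins, climbing.",
    "Aloe vera, succulent, open grassland.",
]


def extract(record):
    """Return the species, habitats and traits mentioned in one record."""
    text = record.lower()
    words = set(text.replace(",", " ").replace(".", " ").split())
    species = {s for s in SPECIES if s.lower() in text}
    return species, HABITATS & words, TRAITS & words


def cooccurrence(records):
    """Count species pairs within each (habitat, trait) context."""
    counts = Counter()
    for record in records:
        species, habitats, traits = extract(record)
        for pair in combinations(sorted(species), 2):
            for habitat in habitats:
                for trait in traits:
                    counts[(pair, habitat, trait)] += 1
    return counts


counts = cooccurrence(RECORDS)
for (pair, habitat, trait), n in counts.items():
    print(pair, habitat, trait, n)
```

Counts of this kind could serve as input features for a classifier; the sketch stops at counting and makes no claim about which model the project will use.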

Policy Impact of Research:

This research will give new insights into plant species and ecology. Through Dr Kirkup’s links with international initiatives (where the tools will be incorporated), knowledge will be disseminated so that institutions around the UK and Europe will be able to exploit their data more fully.



Find Us

University College London is the administrative lead.

Pearson Building, UCL, Gower Street, London, WC1E 6BT
