Over the last two decades there has been a surge in data on plant biodiversity, available through, for example, online publications, DNA sequences and images of specimens. Much of this data is semi-structured, temporal, spatial and noisy.
This PhD will focus on the use of textual data in floras: the traditional taxonomic research outputs of organisations such as Kew, which deal with the nomenclature, geographical distribution, ecology and comparative morphology of the species of a region. The student will be trained in data mining for ecology and will develop a suite of tools for predictive ecology.
Text mining will be integrated with machine learning classifiers to identify common species, their traits and their habitats in different ecosystems. For example, certain species may be commonly associated, but only when specific traits are evident and only in particular types of geography (e.g. forest). These sorts of complex relationships will be automatically extracted from the historical texts.
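As a rough illustration of the kind of relationship described above, the following minimal Python sketch groups species by the habitat and trait terms found in their descriptions and reports the species pairs that share both. It is only a sketch under stated assumptions: the vocabularies, the entries and the function names are all hypothetical, and simple keyword matching stands in for the machine learning classifiers the project would actually develop.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical vocabularies (assumption); a real project would need far richer term lists.
HABITATS = {"forest", "grassland", "wetland"}
TRAITS = {"evergreen", "deciduous", "climbing", "succulent"}

# Invented flora-style entries for illustration only (not real taxonomic data).
ENTRIES = [
    ("Species A", "Evergreen tree to 20 m. Habitat: montane forest."),
    ("Species B", "Climbing shrub, evergreen. Found in lowland forest margins."),
    ("Species C", "Deciduous herb of open grassland."),
    ("Species D", "Succulent perennial; grassland and rocky slopes."),
]


def extract_terms(text, vocabulary):
    """Return the vocabulary terms that occur as whole words in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return words & vocabulary


def associations(entries):
    """Map each (habitat, trait) pair to the species pairs that share both."""
    groups = defaultdict(set)
    for species, text in entries:
        for habitat in extract_terms(text, HABITATS):
            for trait in extract_terms(text, TRAITS):
                groups[(habitat, trait)].add(species)
    # Keep only combinations shared by at least two species.
    return {
        key: list(combinations(sorted(members), 2))
        for key, members in groups.items()
        if len(members) > 1
    }


if __name__ == "__main__":
    for (habitat, trait), pairs in associations(ENTRIES).items():
        print(habitat, trait, pairs)
```

With these toy entries, the only association found is between Species A and Species B, which are both evergreen and both recorded in forest. Real flora text is much noisier than this, which is why a trained classifier rather than a fixed keyword list would be needed in practice.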