COBECORE: recovering eco-climatological data from Belgian colonial archives using computer vision, machine learning and citizen science.

Digital Dreams Workshop 2017

Koen Hufkens¹, Kim Jacobsen², Hans Beeckman², Piet Stoffelen³, Filip Vandelook³, Jan Van den Bulcke⁴, Michael Amara⁵, Hans Verbeeck⁶

¹ Richardson Lab, Harvard University; ² Wood Biology Service, Royal Museum for Central Africa; ³ Botanic Garden Meise; ⁴ Laboratory of Wood Technology, Ghent University; ⁵ State Archives of Belgium; ⁶ CAVElab (Computational and Applied Vegetation Ecology), Ghent University

The historical archives of the ‘Institut National pour l’Étude Agronomique du Congo Belge’ (INEAC) and the ‘Régie des Plantations de la Colonie’ (REPCO), held at the State Archives of Belgium and the Royal Museum for Central Africa, span approximately six decades (~1901–1960). Together with the herbarium collections of the Botanic Garden Meise, they hold vast amounts of historical forestry, climatological, ecological and biodiversity data, as well as aerial photographs, of great relevance to basic and applied forestry research in the central Congo Basin.

The COBECORE project aims to establish eco-climatological baseline measurements for the central Congo Basin by valorizing legacy data held in the INEAC archives and in complementary historical archives and natural history collections. The project will make information stored in analog archives digitally accessible through computer vision, machine learning and citizen science approaches. In particular, we use (elastic) image registration and convolutional neural networks to facilitate the extraction and transcription of handwritten data entries, supported by validation data collected through citizen science. The project will result in a multi-faceted database that links (meta)data to existing records (e.g. digitized herbarium specimens at the Botanic Garden Meise), for direct application in forestry research.
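The registration step can be illustrated with a minimal sketch. The project uses elastic (nonrigid) registration; this Python/NumPy example instead swaps in the much simpler translation-only phase correlation, assuming that a scanned data sheet differs from a blank reference template of the same pre-printed form only by a shift. Once a scan is aligned, table cells defined once on the template can be cropped from every sheet and passed on for transcription. The function names and the (row, col, height, width) cell boxes are hypothetical; this is not the COBECORE implementation.

```python
import numpy as np


def phase_correlation_shift(reference, moving):
    """Estimate the integer (row, col) shift of `moving` relative to `reference`.

    Translation-only phase correlation: a simplified, rigid stand-in for the
    elastic registration named in the text (assumption, not the project's method).
    """
    f_ref = np.fft.fft2(reference)
    f_mov = np.fft.fft2(moving)
    # Normalized cross-power spectrum: keep only the phase difference.
    cross = f_mov * np.conj(f_ref)
    cross /= np.abs(cross) + 1e-12
    # Its inverse FFT peaks at the translation between the two images.
    corr = np.fft.ifft2(cross).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Peaks past the midpoint correspond to negative (wrapped-around) shifts.
    return tuple(int(p - n) if p > n // 2 else int(p)
                 for p, n in zip(peak, corr.shape))


def register_to_template(template, scan):
    """Undo the estimated shift so `scan` lines up with `template`."""
    shift = phase_correlation_shift(template, scan)
    aligned = np.roll(scan, tuple(-s for s in shift), axis=(0, 1))
    return aligned, shift


def crop_cells(aligned, boxes):
    """Cut out cells using hypothetical (row, col, height, width) boxes
    that were defined once on the blank template."""
    return [aligned[r:r + h, c:c + w] for r, c, h, w in boxes]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    template = rng.random((64, 64))
    # Simulated misaligned scan: the template circularly shifted by (5, -3).
    scan = np.roll(template, (5, -3), axis=(0, 1))
    aligned, shift = register_to_template(template, scan)
    print(shift)  # (5, -3)
    cells = crop_cells(aligned, [(10, 10, 8, 20)])
```

Because the same box coordinates apply to every registered sheet, each crop can be handed to a transcription model or shown to volunteers for validation. Real scans also differ by rotation, scaling and local paper distortion, which is what elastic registration is meant to handle and which this translation-only sketch ignores.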

Here we report on the first six months of data recovery and discuss progress in automated image registration and transcription, as well as in the use of a crowdsourcing (citizen science) platform. The COBECORE project underscores the importance of an interdisciplinary approach, connecting the humanities and computer science, to unlock analog archival data in support of the natural sciences.