We resolve this issue by mapping a table whose cell properties are known onto a table whose cell properties are unknown, using a spline deformation. Imagine that one of the sheets of paper is made of rubber: we wiggle this rubber sheet until it aligns perfectly with a reference table. This is what happens when one sheet layout is transformed into the next. We borrowed the technique from the medical sciences, where it is used to align medical images (X-ray, CT, or MRI scans) taken over time.
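To make the rubber-sheet idea concrete, here is a minimal sketch of a spline deformation fitted from matching landmarks, such as table corners found on both pages. It assumes a thin-plate spline, which is one common choice and not necessarily the spline used for the results in this post. The function names `fit_tps` and `apply_tps` and the idea of hand-picked landmarks are illustrative assumptions.

```python
import numpy as np


def _tps_kernel(r):
    """Thin-plate spline radial basis U(r) = r^2 * log(r), with U(0) = 0."""
    out = np.zeros_like(r)
    mask = r > 0
    out[mask] = r[mask] ** 2 * np.log(r[mask])
    return out


def fit_tps(src, dst):
    """Fit a 2D thin-plate spline that maps landmarks `src` onto `dst`.

    Both arguments are (n, 2) arrays of matching points; the landmarks
    must not all lie on one line. Returns a model for `apply_tps`.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = len(src)

    # Pairwise distances between landmarks, passed through the kernel.
    K = _tps_kernel(np.linalg.norm(src[:, None, :] - src[None, :, :], axis=2))
    # Affine part: a constant term plus the two coordinates.
    P = np.hstack([np.ones((n, 1)), src])

    # Standard thin-plate spline linear system [[K, P], [P^T, 0]].
    A = np.zeros((n + 3, n + 3))
    A[:n, :n] = K
    A[:n, n:] = P
    A[n:, :n] = P.T
    b = np.zeros((n + 3, 2))
    b[:n] = dst

    params = np.linalg.solve(A, b)
    return src, params


def apply_tps(model, pts):
    """Map arbitrary (m, 2) points through a fitted thin-plate spline."""
    src, params = model
    pts = np.asarray(pts, dtype=float)
    U = _tps_kernel(np.linalg.norm(pts[:, None, :] - src[None, :, :], axis=2))
    P = np.hstack([np.ones((len(pts), 1)), pts])
    return U @ params[: len(src)] + P @ params[len(src):]
```

To align a scanned page with the reference, one would fit the spline from reference landmarks to the matching scan landmarks, map every reference pixel coordinate through `apply_tps`, and sample the scan at the resulting positions (for example with `scipy.ndimage.map_coordinates`). Medical image registration tools usually estimate the deformation from image intensities rather than from hand-picked landmarks, so treat this only as an illustration of the principle.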
The result of this exercise is shown below. The lines defining the table stay fixed, and only the numbers change between the two pages. Now that the frame of reference is fixed between pages, we can extract the individual numbers more easily, as they are always found in the same location. A fixed frame of reference also allows us to remove, in part, the underlying layout of the table itself, retaining only the handwritten numbers. This procedure avoids some of the tedious work of marking the cells of various tables in the Old Weather citizen science project and lets us move ahead to transcription faster.
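Once every page shares the reference frame, both steps reduce to simple array operations. The sketch below assumes aligned pages stored as 2D NumPy arrays with ink as 1 and paper as 0, a blank template of the same size, and cell boxes measured once on the reference table. The names `remove_layout` and `extract_cells`, the box format, and the threshold are hypothetical choices for illustration.

```python
import numpy as np


def remove_layout(page, template, threshold=0.5):
    """Blank out pixels where the empty reference table has ink.

    `page` and `template` are aligned 2D arrays (ink = 1, paper = 0).
    Handwriting that overlaps a table line is erased as well, which is
    why the layout can only be removed in part.
    """
    cleaned = page.copy()
    cleaned[template > threshold] = 0.0
    return cleaned


def extract_cells(page, cell_boxes):
    """Crop each cell from an aligned page.

    `cell_boxes` maps a cell name to (top, bottom, left, right) pixel
    bounds measured once on the reference table.
    """
    return {
        name: page[top:bottom, left:right]
        for name, (top, bottom, left, right) in cell_boxes.items()
    }
```

Because the boxes are defined once on the reference table, the same `cell_boxes` dictionary can be reused for every aligned page.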