After retraining, the model reached an accuracy of ~98%. For the task at hand this is sufficient, as additional screening based on column-wide statistics will follow. The classification results for one particular table are visualized below (Fig. 2). The template matching visualization is used: light blue pixels represent the template, red/pink pixels represent the matched table, blue pixels show agreement between the template and the matched table and, finally, white crosses indicate cells predicted to be empty by our TensorFlow model.
The table below contains only a few misclassified cells. In particular, we find one false positive, a cell flagged as empty when it is not, and six false negatives, where empty cells are not flagged. With over 400 values in the table and an accuracy of ~98%, about eight errors would be expected (2% of 400), so seven misclassified values is roughly what you might expect.
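As a quick sanity check on these numbers, the arithmetic can be sketched in a few lines of Python. The cell count of 400 is an assumption (the text only says "over 400"), and the accuracy is the approximate figure quoted above, so treat the result as a rough estimate and not an exact evaluation.

```python
# Rough check of the error count against the reported accuracy.
# Assumptions: 400 cells (the text says "over 400") and ~98% accuracy.
n_cells = 400
accuracy = 0.98

# Number of misclassified cells expected at this accuracy.
expected_errors = n_cells * (1 - accuracy)

# Counts read off the visualized table.
false_positives = 1  # non-empty cell flagged as empty
false_negatives = 6  # empty cells that were not flagged
observed_errors = false_positives + false_negatives

# Accuracy implied by the observed errors on this one table.
observed_accuracy = 1 - observed_errors / n_cells

print(f"expected errors:   {expected_errors:.1f}")
print(f"observed errors:   {observed_errors}")
print(f"observed accuracy: {observed_accuracy:.4f}")
```

Under these assumptions the expected count comes to about eight errors against seven observed, which corresponds to an accuracy of roughly 98.25% on this table.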