Neural Models for Semantic Analysis of Handwritten Document Images

Semantic analysis of handwritten document images has a wide range
of practical applications. A sequential combination of handwritten text
recognition (HTR) and a task-specific natural language processing system offers an
intuitive solution in this domain. However, this HTR-based approach suffers from
error propagation. An HTR-free model, which avoids explicit text recognition
and solves the task end-to-end, sidesteps this problem, but often produces poor results.
A possible reason is that it does not incorporate large-scale pre-trained semantic
word embeddings, which have proven to be one of the most powerful advantages in the
textual domain. In this work, we propose an HTR-based and an HTR-free model and compare
them on a variety of segmentation-based handwritten document image benchmarks including
semantic word spotting, named entity recognition, and question answering. Furthermore,
we propose a cross-modal knowledge distillation approach to integrate semantic knowledge
from textually pre-trained word embeddings into HTR-free models. In a series of
experiments, we investigate optimization strategies for robust semantic word image
representations. We show that the incorporation of semantic knowledge is beneficial for
HTR-free approaches in achieving state-of-the-art results on a variety of benchmarks.

  • Published in:
    International Journal on Document Analysis and Recognition
  • Type:
    Article
  • Authors:
    Tüselmann, Oliver; Fink, Gernot A.
  • Year:
    2024

Citation information

Tüselmann, Oliver; Fink, Gernot A.: Neural Models for Semantic Analysis of Handwritten Document Images, International Journal on Document Analysis and Recognition, June 2024, https://link.springer.com/article/10.1007/s10032-024-00477-8, Tueselmann.Fink.2024a