Neural Models for Semantic Analysis of Handwritten Document Images
Semantic analysis of handwritten document images offers a wide range
of practical application scenarios. A sequential combination of handwritten text
recognition (HTR) and a task-specific natural language processing system offers an
intuitive solution in this domain. However, this HTR-based approach suffers from the
problem of error propagation. An HTR-free model, which avoids explicit text recognition
and solves the task end-to-end, tackles this problem, but often produces poor results.
A possible reason for this is that it does not incorporate semantic word embeddings
pre-trained on large text corpora, which turn out to be one of the most powerful advantages in the
textual domain. In this work, we propose an HTR-based and an HTR-free model and compare
them on a variety of segmentation-based handwritten document image benchmarks including
semantic word spotting, named entity recognition, and question answering. Furthermore,
we propose a cross-modal knowledge distillation approach to integrate semantic knowledge
from textually pre-trained word embeddings into HTR-free models. In a series of
experiments, we investigate optimization strategies for robust semantic word image
representation. We show that the incorporation of semantic knowledge is beneficial for
HTR-free approaches in achieving state-of-the-art results on a variety of benchmarks.
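As a rough illustration of the cross-modal distillation idea mentioned above, the following minimal sketch treats a textually pre-trained word embedding as the teacher target and the vector predicted from a word image as the student output, and measures how far apart the two are. The abstract does not state the actual objective, so the cosine-distance loss, the function names, and the toy vectors below are all assumptions made for illustration and not the authors' implementation.

```python
import math


def cosine_distance(student, teacher):
    """Return 1 - cosine similarity between two equal-length vectors."""
    if len(student) != len(teacher):
        raise ValueError("vectors must have the same dimension")
    dot = sum(s * t for s, t in zip(student, teacher))
    norm_s = math.sqrt(sum(s * s for s in student))
    norm_t = math.sqrt(sum(t * t for t in teacher))
    if norm_s == 0.0 or norm_t == 0.0:
        raise ValueError("a zero vector has no direction")
    return 1.0 - dot / (norm_s * norm_t)


def distillation_loss(student_batch, teacher_batch):
    """Mean cosine distance over paired student and teacher vectors."""
    pairs = list(zip(student_batch, teacher_batch))
    if not pairs:
        raise ValueError("batch must not be empty")
    return sum(cosine_distance(s, t) for s, t in pairs) / len(pairs)


# Hypothetical toy data: teacher vectors stand in for pre-trained word
# embeddings, student vectors for the outputs of a word image encoder.
teacher_embeddings = [[1.0, 0.0], [0.0, 1.0]]
student_outputs = [[2.0, 0.0], [1.0, 0.0]]

# First pair points in the same direction (distance 0), second pair is
# orthogonal (distance 1), so the mean loss is 0.5.
loss = distillation_loss(student_outputs, teacher_embeddings)
print(loss)
```

In a real training setup, a loss of this kind would be minimized with respect to the image encoder's parameters while the teacher embeddings stay fixed. This sketch only computes the loss value.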
- Published in: International Journal on Document Analysis and Recognition
- Type: Article
- Authors: Tüselmann, Oliver; Fink, Gernot A.
- Year: 2024
Citation information
Tüselmann, Oliver; Fink, Gernot A.: Neural Models for Semantic Analysis of Handwritten Document Images, International Journal on Document Analysis and Recognition, June 2024, https://link.springer.com/article/10.1007/s10032-024-00477-8, Tueselmann.Fink.2024a
@Article{Tueselmann.Fink.2024a,
author={Tüselmann, Oliver and Fink, Gernot A.},
title={Neural Models for Semantic Analysis of Handwritten Document Images},
journal={International Journal on Document Analysis and Recognition},
month={June},
url={https://link.springer.com/article/10.1007/s10032-024-00477-8},
year={2024},
abstract={Semantic analysis of handwritten document images offers a wide range
of practical application scenarios. A sequential combination of handwritten text
recognition (HTR) and a task-specific natural language processing system offers an
intuitive solution in this domain. However, this HTR-based approach suffers from the
problem of error propagation. An HTR-free model, which avoids explicit text...}}