Table Structure Recognition via Encoder/Decoder Vision Transformers
Table structure recognition (TSR), the task of inferring the layout of tables, including their row, column, and cell structure, is a surprisingly complex task. With the growing amount and importance of digital documents, it has become an increasingly relevant problem that has nonetheless not yet been solved adequately and remains a very active area of research. In recent years, a growing number of deep-learning-based approaches to table parsing have been proposed.

This paper presents a novel deep-learning-based table structure recognition method that can predict row, column, and cell bounds for table images with a high degree of accuracy. To achieve this goal, a multi-stage pipeline incorporating a Vision-Transformer-based autoencoder model was devised. This model was trained to predict cell regions for table images, from which accurate cell bounds can be inferred, including spanning cells that cover multiple rows or columns. The goal was to obtain a model that generalizes well and returns accurate predictions on tables of differing complexity, even when they contain little initial structural information.

An additional modification to the model architecture presented in the Masked Autoencoder (MAE) approach was also evaluated.
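The abstract says cell bounds, including spanning cells, are inferred from predicted cell regions, but it does not describe how. Purely as an illustration of that step, the following Python sketch assumes the model output has already been thresholded into a binary mask where 1 marks cell interiors and 0 marks separators. It labels 4-connected regions, takes their bounding boxes, and derives row/column indices and spans from the distinct box start coordinates. The functions `cell_boxes` and `grid_spans` and the toy mask are hypothetical; this is not the paper's post-processing.

```python
from collections import deque

# Hypothetical box format: (top, left, bottom, right), inclusive pixel coordinates.
Box = tuple[int, int, int, int]


def cell_boxes(mask: list[list[int]]) -> list[Box]:
    """Label 4-connected regions of 1s in a binary cell-region mask and return
    one bounding box per region, sorted top-to-bottom, then left-to-right."""
    h = len(mask)
    w = len(mask[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes: list[Box] = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill over one cell region, tracking its extent.
            top, left, bottom, right = y, x, y, x
            seen[y][x] = True
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return sorted(boxes)


def grid_spans(boxes: list[Box]) -> list[tuple[int, int, int, int]]:
    """Map each box to (row, col, rowspan, colspan). Assumes every grid row and
    column begins at the top/left edge of at least one cell."""
    row_starts = sorted({top for top, _, _, _ in boxes})
    col_starts = sorted({left for _, left, _, _ in boxes})
    cells = []
    for top, left, bottom, right in boxes:
        # A spanning cell covers every row/column start lying inside its box.
        rows = [i for i, s in enumerate(row_starts) if top <= s <= bottom]
        cols = [j for j, s in enumerate(col_starts) if left <= s <= right]
        cells.append((rows[0], cols[0], len(rows), len(cols)))
    return cells


# Toy mask: a header cell spanning two columns above two ordinary cells.
pattern = ["1111111", "0000000", "1110111", "1110111"]
mask = [[int(ch) for ch in line] for line in pattern]
boxes = cell_boxes(mask)
print(boxes)              # [(0, 0, 0, 6), (2, 0, 3, 2), (2, 4, 3, 6)]
print(grid_spans(boxes))  # [(0, 0, 1, 2), (1, 0, 1, 1), (1, 1, 1, 1)]
```

On real predictions, the mask would likely need thresholding and small-component filtering first, and the start-coordinate heuristic only holds for axis-aligned, clearly separated cells.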
- Published in: 2024 IEEE International Conference on Big Data (BigData)
- Type: Inproceedings
- Authors: Uedelhoven, Daniel; Lübbering, Max; Bauckhage, Christian; Sifa, Rafet
- Year: 2024
- Source: https://ieeexplore.ieee.org/abstract/document/10825230
Citation information
Uedelhoven, Daniel; Lübbering, Max; Bauckhage, Christian; Sifa, Rafet: Table Structure Recognition via Encoder/Decoder Vision Transformers. In: 2024 IEEE International Conference on Big Data (BigData), pp. 8855–8858, December 2024. https://ieeexplore.ieee.org/abstract/document/10825230
@Inproceedings{Uedelhoven.etal.2024a,
author={Uedelhoven, Daniel and Lübbering, Max and Bauckhage, Christian and Sifa, Rafet},
title={Table Structure Recognition via Encoder/Decoder Vision Transformers},
booktitle={2024 {IEEE} International Conference on Big Data ({BigData})},
pages={8855--8858},
month={December},
url={https://ieeexplore.ieee.org/abstract/document/10825230},
year={2024},
abstract={Table structure recognition ({TSR}), the task of inferring the layout of tables, including the row, column, and cell structure, is a surprisingly complex task. With the growing amount and importance of digital documents, it has become an increasingly relevant problem, which nonetheless has not yet been solved adequately and still presents a very active area of research. In recent years, a growing...}}