PM3-KIE: A Probabilistic Multi-Task Meta-Model for Document Key Information Extraction
Key Information Extraction (KIE) from visually rich documents is commonly approached as either fine-grained token classification or coarse-grained entity extraction. While token-level models capture spatial and visual cues, entity-level models better represent logical dependencies and align with real-world use cases. We introduce PM3-KIE, a probabilistic multi-task meta-model that incorporates both fine-grained and coarse-grained models. It serves as a lightweight reasoning layer that jointly predicts entities and all appearances in a document. PM3-KIE incorporates domain-specific schema constraints to enforce logical consistency and integrates large language models for semantic validation, thereby reducing extraction errors. Experiments on two public datasets, DeepForm and FARA, show that PM3-KIE outperforms three state-of-the-art models and a stacked ensemble, achieving a statistically significant 2% improvement in F1 score.
- Published in: Findings of the Association for Computational Linguistics: ACL 2025
- Type: Inproceedings
- Authors: Kirsch, Birgit; Allende-Cid, Héctor; Rueping, Stefan
- Year: 2025
- Source: https://aclanthology.org/2025.findings-acl.1075/
Citation information
Kirsch, Birgit; Allende-Cid, Héctor; Rueping, Stefan: PM3-KIE: A Probabilistic Multi-Task Meta-Model for Document Key Information Extraction, Findings of the Association for Computational Linguistics: ACL 2025, 2025, 20890–20912, July, Association for Computational Linguistics, https://aclanthology.org/2025.findings-acl.1075/, Kirsch.etal.2025a
@Inproceedings{Kirsch.etal.2025a,
author={Kirsch, Birgit and Allende-Cid, Héctor and Rueping, Stefan},
title={{PM}3-{KIE}: A Probabilistic Multi-Task Meta-Model for Document Key Information Extraction},
booktitle={Findings of the Association for Computational Linguistics: {ACL} 2025},
pages={20890--20912},
month={July},
publisher={Association for Computational Linguistics},
url={https://aclanthology.org/2025.findings-acl.1075/},
year={2025},
abstract={Key Information Extraction ({KIE}) from visually rich documents is commonly approached as either fine-grained token classification or coarse-grained entity extraction. While token-level models capture spatial and visual cues, entity-level models better represent logical dependencies and align with real-world use cases. We introduce {PM}3-{KIE}, a probabilistic multi-task meta-model that...}}