KPI-EDGAR: A Novel Dataset and Accompanying Metric for Relation Extraction from Financial Documents
We introduce KPI-EDGAR, a novel dataset for Joint Named Entity Recognition and Relation Extraction building on financial reports uploaded to the Electronic Data Gathering, Analysis, and Retrieval (EDGAR) system, where the main objective is to extract Key Performance Indicators (KPIs) from financial documents and link them to their numerical values and other attributes. We further provide four accompanying baselines for benchmarking potential future research. Additionally, we propose a new way of measuring the success of said extraction process by incorporating a word-level weighting scheme into the conventional F$_1$ score to better model the inherently fuzzy borders of the entity pairs of a relation in this domain.
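The abstract does not spell out how the word-level weighting enters the F$_1$ score, so the following is only an illustrative sketch of the general idea, not the authors' metric. It assumes relations are given as (KPI text, value text) pairs and that partial credit is the product of the word-overlap (Jaccard) scores of both entities; the function names `token_overlap` and `weighted_f1` and the weighting itself are hypothetical choices made for this example.

```python
def token_overlap(pred: str, gold: str) -> float:
    """Jaccard overlap between the word sets of two entity spans."""
    p, g = set(pred.lower().split()), set(gold.lower().split())
    if not p or not g:
        return 0.0
    return len(p & g) / len(p | g)


def weighted_f1(predicted, gold):
    """Soft F1 over relations given as (head_text, tail_text) pairs.

    Assumption: a relation earns partial credit equal to the best product of
    head and tail overlaps against any relation on the other side. This is a
    hypothetical weighting, not the one defined in the paper.
    """
    def best_credit(rel, candidates):
        return max(
            (token_overlap(rel[0], c[0]) * token_overlap(rel[1], c[1])
             for c in candidates),
            default=0.0,
        )

    if not predicted or not gold:
        return 0.0
    precision = sum(best_credit(r, gold) for r in predicted) / len(predicted)
    recall = sum(best_credit(r, predicted) for r in gold) / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up example: the first predicted KPI span misses the word "net".
gold = [("net revenue", "5.2"), ("operating income", "1.1")]
pred = [("revenue", "5.2"), ("operating income", "1.1")]
score = weighted_f1(pred, gold)
```

Under a strict exact-match F$_1$, the first prediction would count as fully wrong; with a soft weighting like this one it receives partial credit, which is the kind of fuzzy entity border the abstract refers to.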
- Published in: 2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA)
- Type: Inproceedings
- Authors: Deußer, Tobias; Ali, Syed Musharraf; Hillebrand, Lars; Nurchalifah, Desiana; Jacob, Basil; Bauckhage, Christian; Sifa, Rafet
- Year: 2022
- Source: https://ieeexplore.ieee.org/document/10069806
Citation information
Deußer, Tobias; Ali, Syed Musharraf; Hillebrand, Lars; Nurchalifah, Desiana; Jacob, Basil; Bauckhage, Christian; Sifa, Rafet: KPI-EDGAR: A Novel Dataset and Accompanying Metric for Relation Extraction from Financial Documents, 2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA), 2022, https://ieeexplore.ieee.org/document/10069806
@Inproceedings{Deusser.etal.2022a,
author={Deußer, Tobias and Ali, Syed Musharraf and Hillebrand, Lars and Nurchalifah, Desiana and Jacob, Basil and Bauckhage, Christian and Sifa, Rafet},
title={KPI-EDGAR: A Novel Dataset and Accompanying Metric for Relation Extraction from Financial Documents},
booktitle={2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA)},
url={https://ieeexplore.ieee.org/document/10069806},
year={2022},
abstract={We introduce KPI-EDGAR, a novel dataset for Joint Named Entity Recognition and Relation Extraction building on financial reports uploaded to the Electronic Data Gathering, Analysis, and Retrieval (EDGAR) system, where the main objective is to extract Key Performance Indicators (KPIs) from financial documents and link them to their numerical values and other attributes. We further provide four...}}