Constructing CCEE an LLM evaluation dataset for Complex Context-aware Event Extraction for gene regulatory networks
This paper presents a first look at CCEE (Complex Context-aware Event Extraction), a novel evaluation dataset,
currently in development, for context-rich gene regulatory network extraction from scientific literature. We propose an
annotation scheme for cancer research papers that captures both core gene interactions and extensive contextual
information across 10-14 categories per event, addressing limitations of existing datasets for testing the construction of
disease-specific biomedical knowledge graphs. Unlike previous datasets, which focus primarily on the entity connections
of isolated triplets, CCEE links contextual attributes directly to gene regulatory events, providing a more integrated
representation of scientific knowledge. We illustrate the annotation scheme on 9 papers manually labeled by multiple
experts and give a first impression of the challenges and ways to address them. Additionally, we show first evaluations
of LLMs as an annotation system. While this system underperforms human experts in interaction type labeling, it matches
human performance on attributing entities as context to interactions.
- Published in: 8th Workshop on Semantic Web Solutions for Large-scale Biomedical Data Analytics
- Type: Inproceedings
- Authors: Labonté, Frederik; Flek, Lucie
- Year: 2025
- Source: https://ceur-ws.org/Vol-4001/paper2.pdf
Citation information
Labonté, Frederik; Flek, Lucie: Constructing CCEE an LLM evaluation dataset for Complex Context-aware Event Extraction for gene regulatory networks, 8th Workshop on Semantic Web Solutions for Large-scale Biomedical Data Analytics, 2025, https://ceur-ws.org/Vol-4001/paper2.pdf, Labonte.Flek.2025a
@Inproceedings{Labonte.Flek.2025a,
author={Labonté, Frederik and Flek, Lucie},
title={Constructing CCEE an LLM evaluation dataset for Complex Context-aware Event Extraction for gene regulatory networks},
booktitle={8th Workshop on Semantic Web Solutions for Large-scale Biomedical Data Analytics},
url={https://ceur-ws.org/Vol-4001/paper2.pdf},
year={2025},
abstract={This paper presents a first look at CCEE (Complex Context-aware Event Extraction), a novel evaluation dataset, currently in development, for context-rich gene regulatory network extraction from scientific literature. We propose an annotation scheme for cancer research papers that captures both core gene interactions and extensive contextual information across 10-14 categories per event, addressing...}}