Constructing CCEE an LLM evaluation dataset for Complex Context-aware Event Extraction for gene regulatory networks

This paper presents a first look at CCEE (Complex Context-aware Event Extraction), a novel evaluation dataset, currently under development, for context-rich gene regulatory network extraction from scientific literature. We propose an annotation scheme for cancer research papers that captures both core gene interactions and extensive contextual information across 10-14 categories per event, addressing limitations in existing datasets in order to test the construction of disease-specific biomedical knowledge graphs. Unlike previous datasets, which focus primarily on entity connections in isolated triplets, CCEE links contextual attributes directly to gene regulatory events, providing a more integrated representation of scientific knowledge. We illustrate the annotation on 9 papers manually labeled by multiple experts and give a first impression of the challenges and ways to address them. Additionally, we show first evaluations of LLMs as an annotation system. While the LLM-based system underperforms human experts in interaction type labeling, it matches human performance on attributing entities as context to interactions.

  • Published in:
    8th Workshop on Semantic Web Solutions for Large-scale Biomedical Data Analytics
  • Type:
    Inproceedings
  • Authors:
    Labonté, Frederik; Flek, Lucie
  • Year:
    2025
  • Source:
    https://ceur-ws.org/Vol-4001/paper2.pdf

Citation information

Labonté, Frederik; Flek, Lucie: Constructing CCEE an LLM evaluation dataset for Complex Context-aware Event Extraction for gene regulatory networks, 8th Workshop on Semantic Web Solutions for Large-scale Biomedical Data Analytics, 2025, https://ceur-ws.org/Vol-4001/paper2.pdf, Labonte.Flek.2025a