Noise Reduction in Distant Supervision for Relation Extraction using Probabilistic Soft Logic

Authors: B. Kirsch, Z. Niyazova, S. Rüping, M. Mock
Journal: ECML PKDD 2019: Machine Learning and Knowledge Discovery in Databases
Year: 2019

Citation information

B. Kirsch, Z. Niyazova, S. Rüping, M. Mock:
Noise Reduction in Distant Supervision for Relation Extraction using Probabilistic Soft Logic.
ECML PKDD 2019: Machine Learning and Knowledge Discovery in Databases,
2019,
63-78,
Springer, Cham,
https://doi.org/10.1007/978-3-030-43887-6_6

The performance of modern relation extraction systems depends to a great degree on the size and quality of the underlying training corpus, and in particular on its labels. Since having human annotators generate these labels is expensive, Distant Supervision has been proposed to automatically align entities in a knowledge base with a text corpus to generate annotations. However, this approach introduces noise, which negatively affects the performance of relation extraction systems.
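The alignment heuristic behind Distant Supervision can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the toy knowledge base, the sentences, and the function name `distant_labels` are assumptions, and entity matching is reduced to plain substring search.

```python
# Hypothetical toy knowledge base: (head entity, tail entity) -> relation.
KB = {("Barack Obama", "Hawaii"): "born_in"}

# Hypothetical toy corpus.
SENTENCES = [
    "Barack Obama was born in Hawaii.",
    "Barack Obama visited Hawaii last week.",
    "Angela Merkel visited Paris.",
]


def distant_labels(sentences, kb):
    """Label every sentence that mentions both entities of a KB fact
    with that fact's relation, regardless of what the sentence says."""
    labeled = []
    for sentence in sentences:
        for (head, tail), relation in kb.items():
            if head in sentence and tail in sentence:
                labeled.append((sentence, head, tail, relation))
    return labeled


labels = distant_labels(SENTENCES, KB)
for sentence, head, tail, relation in labels:
    print(f"{relation}({head}, {tail}) <- {sentence}")
```

In this sketch the second sentence also receives the `born_in` label although it does not express that relation, which is the kind of label noise the abstract refers to.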

To tackle this problem, we propose a probabilistic graphical model that simultaneously incorporates different sources of knowledge, such as domain experts' knowledge about the context and linguistic knowledge about the sentence structure, in a principled way.

The model is defined using the declarative language provided by Probabilistic Soft Logic.
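Probabilistic Soft Logic relaxes logical rules to truth values in [0, 1] using the Lukasiewicz t-norm and penalizes each weighted rule by its distance to satisfaction. The sketch below computes that penalty for a single rule of the form body -> head. The example rule, its predicate names, the weight, and the truth values are illustrative assumptions and are not taken from the paper's model.

```python
def luk_and(a, b):
    """Lukasiewicz conjunction of two soft truth values in [0, 1]."""
    return max(0.0, a + b - 1.0)


def distance_to_satisfaction(body, head):
    """How far the implication body -> head is from being satisfied."""
    return max(0.0, body - head)


def rule_penalty(weight, body_values, head, squared=False):
    """Weighted hinge penalty of one ground rule (linear or squared)."""
    body = 1.0  # neutral element of the Lukasiewicz conjunction
    for value in body_values:
        body = luk_and(body, value)
    distance = distance_to_satisfaction(body, head)
    return weight * (distance ** 2 if squared else distance)


# Hypothetical rule: DistantLabel(s, r) & HasTrigger(s, r) -> Expresses(s, r)
penalty = rule_penalty(weight=2.0, body_values=[1.0, 0.75], head=0.25)
print(penalty)
```

Inference in such a model searches for the truth values of the unobserved atoms that minimize the sum of these penalties over all ground rules; the sketch only evaluates one term of that sum.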

Experimental results show that, compared to the original distantly supervised set, the proposed approach improves not only the quality of the generated training data sets but also the performance of the final relation extraction model.