Noise Reduction in Distant Supervision for Relation Extraction using Probabilistic Soft Logic

The performance of modern relation extraction systems depends to a great degree on the size and quality of the underlying training corpus, and in particular on its labels. Since having human annotators generate these labels is expensive, Distant Supervision has been proposed: it automatically aligns entities in a knowledge base with a text corpus to generate annotations. However, this approach introduces noise, which degrades the performance of relation extraction systems. To tackle this problem, we propose a probabilistic graphical model that incorporates, in a principled way, different sources of knowledge simultaneously, such as domain experts' knowledge about the context and linguistic knowledge about the sentence structure. The model is defined using the declarative language provided by Probabilistic Soft Logic. Experimental results show that, compared to the original distantly supervised data set, the proposed approach improves not only the quality of the generated training data but also the performance of the final relation extraction model.
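As general background on Probabilistic Soft Logic (not a description of this paper's specific model), PSL assigns each atom a soft truth value in [0, 1] and relaxes every weighted rule `body -> head` into a hinge-loss potential using Lukasiewicz logic. The sketch below computes that potential for one ground rule. The predicates `DistantLabel`, `ContextSupports`, and `TrueRelation`, the weight, and the truth values are hypothetical illustrations, not the rules or parameters used by the authors.

```python
"""Minimal sketch: how PSL relaxes one weighted ground rule into a hinge-loss potential.

Assumption: the rule, predicate names, weight, and truth values are hypothetical
illustrations, not the rules used in the paper.
"""
from typing import Sequence


def lukasiewicz_and(truths: Sequence[float]) -> float:
    """Lukasiewicz t-norm: soft conjunction of truth values in [0, 1]."""
    return max(0.0, sum(truths) - (len(truths) - 1))


def distance_to_satisfaction(body: Sequence[float], head: float) -> float:
    """Distance to satisfaction of the implication body -> head.

    Under Lukasiewicz logic the rule is fully satisfied (distance 0)
    when the head is at least as true as the conjunction of the body atoms.
    """
    return max(0.0, lukasiewicz_and(body) - head)


def weighted_potential(weight: float, body: Sequence[float], head: float,
                       squared: bool = False) -> float:
    """Hinge-loss potential contributed by one ground rule (optionally squared)."""
    d = distance_to_satisfaction(body, head)
    return weight * (d ** 2 if squared else d)


# Hypothetical ground rule:
#   2.0: DistantLabel(s, r) & ContextSupports(s, r) -> TrueRelation(s, r)
distant_label = 1.0      # the knowledge-base alignment labels sentence s with relation r
context_support = 0.75   # soft evidence from a (hypothetical) domain-expert context cue
true_relation = 0.5      # current truth value of the target atom

print(weighted_potential(2.0, [distant_label, context_support], true_relation))  # 0.5
```

MAP inference in PSL minimizes the sum of such potentials over all ground rules, so raising the truth value of `TrueRelation(s, r)` toward the soft conjunction of its evidence (0.75 here) drives this rule's penalty to zero; competing rules (for example, linguistic cues against the label) pull in the opposite direction.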

  • Published in:
    ECML PKDD 2019: Machine Learning and Knowledge Discovery in Databases, Workshop at European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD)
  • Type:
    Inproceedings
  • Authors:
    B. Kirsch, Z. Niyazova, S. Rüping, M. Mock
  • Year:
    2019

Citation information

B. Kirsch, Z. Niyazova, S. Rüping, M. Mock: Noise Reduction in Distant Supervision for Relation Extraction using Probabilistic Soft Logic, Workshop at European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), ECML PKDD 2019: Machine Learning and Knowledge Discovery in Databases, 2019, https://doi.org/10.1007/978-3-030-43887-6_6