Interpretable and Low-Resource Entity Matching via Decoupling Feature Learning from Decision Making

Entity Matching (EM) aims at recognizing entity records that denote the same real-world object. Neural EM models learn vector representations of entity descriptions and match entities end-to-end. Though robust, these methods require many annotated resources for training and lack interpretability. In this paper, we propose a novel EM framework that consists of Heterogeneous Information Fusion (HIF) and Key Attribute Tree (KAT) Induction to decouple feature representation from matching decision. Using self-supervised learning and the mask mechanism in pre-trained language modeling, HIF learns the embeddings of noisy attribute values by inter-attribute attention with unlabeled data. Using a set of comparison features and a limited amount of annotated data, KAT Induction learns an efficient decision tree that can be interpreted by generating entity matching rules whose structure is advocated by domain experts. Experiments on 6 public datasets and 3 industrial datasets show that our method is highly efficient and outperforms SOTA EM models in most cases. We will release the code upon acceptance.

Published in:
ACL IJCNLP Association for Computational Linguistics and International Joint Conference on Natural Language Processing (ACL-IJCNLP)
Type:
Inproceedings
Authors:
Z. Yao, C. Li, T. Dong, X. Lv, J. Yu, L. Hou, J. Li, Y. Zhang, Z. Dai
Year:
2021

Citation information

Z. Yao, C. Li, T. Dong, X. Lv, J. Yu, L. Hou, J. Li, Y. Zhang, Z. Dai: Interpretable and Low-Resource Entity Matching via Decoupling Feature Learning from Decision Making, Association for Computational Linguistics and International Joint Conference on Natural Language Processing (ACL-IJCNLP), ACL IJCNLP, 2021, https://doi.org/10.48550/arXiv.2106.04174, Yao.etal.2021
