Entity Matching (EM) aims at recognizing entity records that denote the same real-world object. Neural EM models learn vector representations of entity descriptions and match entities end-to-end. Though robust, these methods require many annotated resources for training and lack interpretability. In this paper, we propose a novel EM framework that consists of Heterogeneous Information Fusion (HIF) and Key Attribute Tree (KAT) Induction to decouple feature representation from matching decision. Using self-supervised learning and the mask mechanism in pre-trained language modeling, HIF learns the embeddings of noisy attribute values by inter-attribute attention with unlabeled data. Using a set of comparison features and a limited amount of annotated data, KAT Induction learns an efficient decision tree that can be interpreted by generating entity matching rules whose structure is advocated by domain experts. Experiments on 6 public datasets and 3 industrial datasets show that our method is highly efficient and outperforms SOTA EM models in most cases. We will release the code upon acceptance.
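To make the idea of decoupling feature representation from the matching decision concrete, the following is a minimal sketch, not the paper's HIF or KAT Induction. It assumes token-level Jaccard similarity as the only comparison feature (standing in for learned embeddings), a plain Gini-based decision tree as the learner, and four made-up labeled pairs. All function names, attribute names, and records are hypothetical.

```python
from collections import Counter

def jaccard(a, b):
    """Token-level Jaccard similarity between two attribute values."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def compare(left, right, attributes):
    """One comparison feature per attribute (assumed feature set)."""
    return [jaccard(left.get(a, ""), right.get(a, "")) for a in attributes]

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Return (feature index, threshold) that lowers weighted Gini, or None."""
    best, best_score = None, gini(y)
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score - 1e-12:
                best, best_score = (j, t), score
    return best

def build_tree(X, y, depth=0, max_depth=2):
    """Internal nodes are (feature, threshold, low, high); leaves are labels."""
    split = best_split(X, y) if depth < max_depth else None
    if split is None:
        return Counter(y).most_common(1)[0][0]
    j, t = split
    lo = [(r, l) for r, l in zip(X, y) if r[j] <= t]
    hi = [(r, l) for r, l in zip(X, y) if r[j] > t]
    return (j, t,
            build_tree([r for r, _ in lo], [l for _, l in lo], depth + 1, max_depth),
            build_tree([r for r, _ in hi], [l for _, l in hi], depth + 1, max_depth))

def predict(tree, x):
    """Walk the tree until a leaf label is reached."""
    while isinstance(tree, tuple):
        j, t, lo, hi = tree
        tree = hi if x[j] > t else lo
    return tree

def rules(tree, names, conds=()):
    """Flatten the tree into (condition string, label) matching rules."""
    if not isinstance(tree, tuple):
        return [(" AND ".join(conds) or "TRUE", tree)]
    j, t, lo, hi = tree
    return (rules(lo, names, conds + (f"sim({names[j]}) <= {t:.2f}",))
            + rules(hi, names, conds + (f"sim({names[j]}) > {t:.2f}",)))

# Hypothetical labeled record pairs: 1 = same entity, 0 = different entities.
ATTRS = ["title", "brand"]
PAIRS = [
    ({"title": "apple iphone 12 64gb", "brand": "apple"},
     {"title": "iphone 12 64gb apple", "brand": "apple"}, 1),
    ({"title": "apple iphone 12 64gb", "brand": "apple"},
     {"title": "samsung galaxy s21", "brand": "samsung"}, 0),
    ({"title": "sony wh-1000xm4 headphones", "brand": "sony"},
     {"title": "sony wh-1000xm4 wireless headphones", "brand": "sony"}, 1),
    ({"title": "sony wh-1000xm4 headphones", "brand": "sony"},
     {"title": "sony bravia tv", "brand": "sony"}, 0),
]

X = [compare(a, b, ATTRS) for a, b, _ in PAIRS]
y = [label for _, _, label in PAIRS]
tree = build_tree(X, y)
extracted = rules(tree, ATTRS)
for condition, label in extracted:
    print(f"IF {condition} THEN {'match' if label else 'non-match'}")
```

On this toy data the tree splits on title similarity alone, since brand agreement does not separate the two Sony pairs. The point of the sketch is the separation of concerns: the feature step could be swapped for learned attribute embeddings without touching the tree, and each root-to-leaf path reads as a rule a domain expert can inspect.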
Interpretable and Low-Resource Entity Matching via Decoupling Feature Learning from Decision Making