Prior research in Natural Language Processing (NLP) has shown that incorporating the syntactic structure of a sentence, in the form of a dependency parse tree, while training a representation learning model improves performance on downstream tasks. However, most of these approaches use the dependency parse tree to learn task-specific word representations rather than generic ones. In this paper, we propose a new model named DIBERT, which stands for Dependency Injected Bidirectional Encoder Representations from Transformers. DIBERT is a variation of BERT that, apart from Masked Language Modeling (MLM) and Next Sentence Prediction (NSP), incorporates a third objective called Parent Prediction (PP). PP injects the syntactic structure of a dependency tree while pre-training DIBERT, which then generates syntax-aware generic representations. We use the WikiText-103 benchmark dataset to pre-train both the original BERT (BERT-Base) and the proposed DIBERT models. After fine-tuning, we observe that DIBERT performs better than BERT-Base on various NLP downstream tasks, including Semantic Similarity, Natural Language Inference and Sentiment Analysis, suggesting that incorporating dependency information when learning textual representations can improve the quality of the learned representations.
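To make the idea of a Parent Prediction objective concrete, the following is a minimal sketch and not the DIBERT implementation. The abstract does not specify how PP is computed, so everything here is an assumption made for illustration: each token is asked to predict the position of its dependency parent, the root token is labeled as its own parent, the PP loss is a per-token cross-entropy over positions, and the three pre-training losses are combined as an unweighted sum. The function names `parent_labels`, `parent_prediction_loss`, and `pretraining_loss` are hypothetical.

```python
import math


def parent_labels(heads):
    """Convert 1-based head indices (0 marks the root) to 0-based targets.

    Assumption: the root token is labeled as its own parent.
    """
    return [i if h == 0 else h - 1 for i, h in enumerate(heads)]


def parent_prediction_loss(scores, labels):
    """Mean cross-entropy over tokens.

    scores[i][j] is a hypothetical logit that token j is the parent of token i.
    """
    total = 0.0
    for row, target in zip(scores, labels):
        # Numerically stable log-sum-exp over candidate parent positions.
        m = max(row)
        log_z = m + math.log(sum(math.exp(s - m) for s in row))
        total += log_z - row[target]
    return total / len(labels)


def pretraining_loss(mlm_loss, nsp_loss, pp_loss):
    """Assumption: the three objectives are summed without weights."""
    return mlm_loss + nsp_loss + pp_loss


# "She reads books": "reads" is the root; the other two tokens attach to it.
heads = [2, 0, 2]
labels = parent_labels(heads)

# Uniform logits stand in for the output of a model's parent-scoring head.
uniform_scores = [[0.0, 0.0, 0.0] for _ in heads]
pp_loss = parent_prediction_loss(uniform_scores, labels)
```

With uniform logits, every candidate parent is equally likely, so the PP loss equals the log of the sentence length. A trained model would lower this loss by concentrating probability on the true parent of each token.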
DIBERT: Dependency Injected Bidirectional Encoder Representations from Transformers