Imposing Category Trees onto Word-Embeddings Using A Geometric Construction

We present a novel method to precisely impose tree-structured category information onto word embeddings, resulting in ball embeddings in higher-dimensional spaces (N-balls for short). Inclusion relations among N-balls implicitly encode subordinate relations among categories, so the cosine similarity measure is enriched by category information. Using a geometric construction method instead of back-propagation, we create large N-ball embeddings that satisfy two conditions: (1) category trees are precisely imposed onto word embeddings at zero energy cost; (2) pre-trained word embeddings are well preserved. A new benchmark data set is created for validating the category of unknown words. Experiments show that N-ball embeddings, carrying category information, significantly outperform word embeddings in nearest-neighbor tests and demonstrate surprisingly good performance in validating the categories of unknown words.
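The inclusion relation mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's geometric construction procedure; it only shows the standard geometric test for one ball lying inside another, assuming Euclidean distance. The names `Ball`, `contains`, `disjoint`, and `cosine`, and the 2-D toy values, are hypothetical and chosen for illustration only.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Ball:
    """An N-ball given by a center vector and a non-negative radius (illustrative type)."""
    center: Sequence[float]
    radius: float


def contains(outer: Ball, inner: Ball) -> bool:
    """True if `inner` lies entirely inside `outer`.

    Standard criterion: distance between centers plus the inner radius
    must not exceed the outer radius.
    """
    return math.dist(outer.center, inner.center) + inner.radius <= outer.radius


def disjoint(a: Ball, b: Ball) -> bool:
    """True if the two balls do not overlap (centers at least r_a + r_b apart)."""
    return math.dist(a.center, b.center) >= a.radius + b.radius


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


# Toy 2-D values, assumed for illustration (real embeddings are high-dimensional).
animal = Ball(center=(3.0, 4.0), radius=2.0)
dog = Ball(center=(3.0, 5.0), radius=0.5)
vehicle = Ball(center=(-4.0, 3.0), radius=1.0)

print(contains(animal, dog))      # subordinate category inside its parent
print(disjoint(animal, vehicle))  # unrelated categories do not overlap
print(cosine(animal.center, dog.center))
```

In this reading, a child category is a ball contained in its parent's ball, sibling categories are disjoint balls, and the cosine of the center vectors plays the role of the usual word-embedding similarity.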

  • Published in:
    International Conference on Learning Representations (ICLR)
  • Type:
    Inproceedings
  • Authors:
    T. Dong, C. Bauckhage, H. Jin, J. Li, O. Cremers, D. Speicher, A.B. Cremers, J. Zimmermann
  • Year:
    2019

Citation information

T. Dong, C. Bauckhage, H. Jin, J. Li, O. Cremers, D. Speicher, A.B. Cremers, J. Zimmermann: Imposing Category Trees onto Word-Embeddings Using A Geometric Construction, International Conference on Learning Representations (ICLR), 2019, https://www.semanticscholar.org/paper/Imposing-Category-Trees-Onto-Word-Embeddings-Using-Dong-Bauckhage/df97d99457cac7ba4cac120018174790f1e1bc1c