Interpretable Topic Extraction and Word Embedding Learning using row-stochastic DEDICOM

Authors: L. Hillebrand, D. Biesner, C. Bauckhage, R. Sifa
Journal: CD-MAKE 2020: Machine Learning and Knowledge Extraction
Year: 2020

Citation information

L. Hillebrand, D. Biesner, C. Bauckhage, R. Sifa:
Interpretable Topic Extraction and Word Embedding Learning using row-stochastic DEDICOM.
CD-MAKE 2020: Machine Learning and Knowledge Extraction,
2020,
401-422,
https://doi.org/10.1007/978-3-030-57321-8_22

The DEDICOM algorithm provides a uniquely interpretable matrix factorization method for symmetric and asymmetric square matrices.
We employ a new row-stochastic variation of DEDICOM on the pointwise mutual information matrices of text corpora to identify latent topic clusters within the vocabulary and simultaneously learn interpretable word embeddings.
We introduce a method to efficiently train the constrained DEDICOM algorithm and provide a qualitative evaluation of its topic modeling and word embedding performance.
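
The abstract does not spell out the model, so the following is only an illustrative sketch and not the authors' implementation. It assumes the standard DEDICOM form S ≈ A R Aᵀ, where S is the square pointwise mutual information (PMI) matrix of the vocabulary, A holds one embedding row per word, and R is a small square matrix relating the latent topics. Row-stochasticity of A (non-negative rows that sum to one) is enforced here through a row-wise softmax, which is an assumption of this sketch. The function names `pmi_matrix`, `row_softmax`, and `dedicom_loss`, and the choice to set the PMI of unseen pairs to zero, are hypothetical.

```python
import numpy as np


def pmi_matrix(counts):
    """PMI matrix from a square word co-occurrence count matrix."""
    counts = np.asarray(counts, dtype=float)
    p_ij = counts / counts.sum()              # joint probabilities
    p_i = p_ij.sum(axis=1, keepdims=True)     # row marginals
    p_j = p_ij.sum(axis=0, keepdims=True)     # column marginals
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_ij / (p_i * p_j))
    # Assumption of this sketch: pairs that never co-occur get PMI 0.
    pmi[~np.isfinite(pmi)] = 0.0
    return pmi


def row_softmax(Z):
    """Map an unconstrained matrix to a row-stochastic one."""
    Z = Z - Z.max(axis=1, keepdims=True)      # shift for numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)


def dedicom_loss(S, Z, R):
    """Squared reconstruction error of S against A R A^T, A = row_softmax(Z)."""
    A = row_softmax(Z)
    return float(np.sum((S - A @ R @ A.T) ** 2))


# Toy usage: 4 words, 2 latent topics, random initialisation.
rng = np.random.default_rng(0)
counts = np.array([[0, 3, 1, 0],
                   [3, 0, 0, 1],
                   [1, 0, 0, 4],
                   [0, 1, 4, 0]])
S = pmi_matrix(counts)
Z = rng.normal(size=(4, 2))   # unconstrained parameters behind A
R = rng.normal(size=(2, 2))   # topic-to-topic affinity matrix
loss = dedicom_loss(S, Z, R)
```

Under this reading, each row of A can be interpreted as a probability distribution over topics for one word, which is what makes the embedding directly interpretable. In practice Z and R would be fitted by minimising the loss with a gradient-based optimiser; the training procedure itself is not described in the abstract and is left out here.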