The Search for Equations – Learning to Identify Similarities between Mathematical Expressions

On your search for scientific articles relevant to your research question, you judge the relevance of a mathematical expression that you stumble upon using extensive background knowledge about the domain, its problems and its notations. We wonder if machine learning can support this process and work toward implementing a search engine for mathematical expressions in scientific publications. Thousands of scientific publications with millions of mathematical expressions or equations are accessible at arXiv.org. We want to use this data to learn about equations, their distribution and their relations in order to find similar equations. To this end, we propose an embedding model based on convolutional neural networks that maps bitmap images of equations into a low-dimensional vector space where similarity is evaluated via the dot product. However, no annotated similarity data is available to train this mapping. We mitigate this by proposing a number of different unsupervised proxy tasks that use available features as weak labels. We evaluate our system using a number of metrics, including results on a small hand-labeled subset of equations. In addition, we show and discuss a number of result sets for some sample queries. The results show that we are able to automatically identify related mathematical expressions. Our dataset is published at https://whadup.github.io/EquationLearning/ and we invite the community to use it.
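
The retrieval step described in the abstract, ranking equations by the dot product of their embeddings, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `dot` and `top_k_similar` and the three-dimensional toy vectors are assumptions made for illustration. In the paper's setting the vectors would come from the convolutional network applied to bitmap images of equations.

```python
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def top_k_similar(
    query: Sequence[float],
    embeddings: Sequence[Sequence[float]],
    k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the k embeddings with the highest dot product."""
    scores = [(i, dot(query, e)) for i, e in enumerate(embeddings)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Hypothetical toy embeddings; real ones would be produced by the trained model.
toy_embeddings = [
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.8, 0.2, 0.1],
]
query_embedding = [1.0, 0.0, 0.0]

results = top_k_similar(query_embedding, toy_embeddings, k=2)
for index, score in results:
    print(index, round(score, 3))
```

With these toy values, the embeddings at index 0 and index 2 rank highest because they point in nearly the same direction as the query.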

Published in:
ECML PKDD 2019: Machine Learning and Knowledge Discovery in Databases, European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD)
Type:
Inproceedings
Authors:
L. Pfahler, J. Schill, K. Morik
Year:
2019

Citation information

L. Pfahler, J. Schill, K. Morik: The Search for Equations – Learning to Identify Similarities between Mathematical Expressions, European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), ECML PKDD 2019: Machine Learning and Knowledge Discovery in Databases, 2019, https://link.springer.com/chapter/10.1007/978-3-030-46133-1_42
