The Phenomenon of Correlated Representations in Contrastive Learning
Contrastive learning is widely considered to be an important domain of machine learning. Its main premise is to use contrasting samples of data in order to learn common features that allow for accurate data clustering in a representation space. The generation of such a representation or embedding space can be of great value, for instance for re-identification tasks. While working on one such task, we noticed that, paradoxically, decreasing training data resolution led to a considerably higher re-identification accuracy. Upon further analysis, we discovered that this occurs because, during training, highly correlated features are learned that are ultimately redundant and limit the model's performance. This effect is exaggerated by the use of high-resolution data, thereby eventually decreasing the obtained re-identification accuracy. In this contribution, we characterize this phenomenon, which we believe to be a novel problem in contrastive learning. We study the effects of various changes to a common neural network architecture on this phenomenon, linking it to the concept of bias, and propose a set of solutions to mitigate its effect.
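The redundancy described in the abstract can be made concrete with a short, hypothetical sketch: given a matrix of learned embeddings, compute the pairwise Pearson correlation between embedding dimensions and flag pairs whose absolute correlation is high. The embeddings below are synthetic placeholders (not outputs of the authors' model), and the 0.9 threshold is an illustrative choice, not a value from the paper.

```python
import numpy as np

# Synthetic stand-in for a batch of learned embeddings:
# n_samples vectors, each with n_dims representation dimensions.
rng = np.random.default_rng(0)
n_samples, n_dims = 512, 8
emb = rng.normal(size=(n_samples, n_dims))

# Simulate the phenomenon: dimension 1 is nearly a copy of dimension 0,
# so it carries almost no additional information.
emb[:, 1] = emb[:, 0] + 0.05 * rng.normal(size=n_samples)

# Pairwise Pearson correlation between embedding dimensions
# (rowvar=False treats columns, i.e. dimensions, as variables).
corr = np.corrcoef(emb, rowvar=False)

# Flag dimension pairs whose absolute correlation exceeds the threshold.
threshold = 0.9
redundant = [
    (i, j)
    for i in range(n_dims)
    for j in range(i + 1, n_dims)
    if abs(corr[i, j]) > threshold
]
print(redundant)  # the near-duplicate pair (0, 1) is flagged
```

A diagnostic like this only detects linear redundancy between dimensions; it does not by itself explain why such correlations arise during contrastive training, which is the subject of the paper.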
- Published in: International Joint Conference on Neural Networks
- Type: Inproceedings
- Authors: Klüttermann, Simon; Rutinowski, Jérôme; Müller, Emmanuel
- Year: 2024
Citation information
Klüttermann, Simon; Rutinowski, Jérôme; Müller, Emmanuel: The Phenomenon of Correlated Representations in Contrastive Learning, International Joint Conference on Neural Networks, 2024, https://ieeexplore.ieee.org/document/10649913
@Inproceedings{Kluettermann.etal.2024a,
author={Klüttermann, Simon and Rutinowski, Jérôme and Müller, Emmanuel},
title={The Phenomenon of Correlated Representations in Contrastive Learning},
booktitle={International Joint Conference on Neural Networks},
url={https://ieeexplore.ieee.org/document/10649913},
year={2024},
abstract={Contrastive learning is widely considered to be an important domain of machine learning. Its main premise is to use contrasting samples of data in order to learn common features that allow for accurate data clustering in a representation space. The generation of such a representation or embedding space can be of great value, for instance for re-identification tasks. While working on one such...}}