Inspecting and Measuring Fairness of unlabeled Image Datasets

Bias in training data can lead to algorithmic unfairness in machine learning tasks. Therefore, a general requirement for trustworthy AI is that data should be representative and free of bias. There are several approaches to measure the fairness of a given dataset based on attributes such as gender or race. However, for unstructured data, such measures require the dataset to be labeled with respect to these attributes and cannot be directly applied to unlabeled image datasets. We present an approach using foundation models to analyze the fairness of unlabeled images, exploiting the fact that foundation models implement a semantically consistent mapping from the unstructured image space to the embedding space. In particular, we systematically compare the embedding of a reference dataset known to be “fair” to an unlabeled image dataset. We show that the resulting data structures in the embedding space support a systematic comparative analysis based on both qualitative and quantitative evaluation. We evaluate our approach by analyzing the fairness of the target image dataset CelebA while using the FairFace dataset as reference. The validation against the ground-truth labels of the CelebA dataset demonstrates the principal applicability of the overall approach. In sum, our work offers a novel perspective on fairness evaluation of images, as it requires no labeling but instead makes use of existing, already labeled reference datasets.
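To illustrate the general idea (not the authors' exact implementation), the sketch below embeds a labeled reference dataset (e.g., FairFace) and an unlabeled target dataset (e.g., CelebA) with a foundation model (here CLIP, as a stand-in for whichever model the paper uses) and transfers attribute labels to the target via nearest neighbors in embedding space, so the inferred attribute distribution can be inspected. All file paths, labels, and the k-NN transfer step are illustrative assumptions.

```python
# Hedged sketch: fairness inspection of an unlabeled image set via a labeled
# reference set in a shared foundation-model embedding space.
import collections
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor
from sklearn.neighbors import KNeighborsClassifier

MODEL_ID = "openai/clip-vit-base-patch32"  # assumption: any image foundation model works similarly
model = CLIPModel.from_pretrained(MODEL_ID).eval()
processor = CLIPProcessor.from_pretrained(MODEL_ID)


@torch.no_grad()
def embed(paths, batch_size=32):
    """Map image file paths to L2-normalized image embeddings."""
    chunks = []
    for i in range(0, len(paths), batch_size):
        images = [Image.open(p).convert("RGB") for p in paths[i:i + batch_size]]
        inputs = processor(images=images, return_tensors="pt")
        feats = model.get_image_features(**inputs)
        chunks.append(torch.nn.functional.normalize(feats, dim=-1))
    return torch.cat(chunks).cpu().numpy()


if __name__ == "__main__":
    # Hypothetical inputs: replace with real FairFace file lists / attribute
    # labels and CelebA file lists.
    reference_paths = ["fairface/train/1.jpg", "fairface/train/2.jpg"]
    reference_labels = ["Female", "Male"]
    target_paths = ["celeba/000001.jpg", "celeba/000002.jpg"]

    ref_emb = embed(reference_paths)
    tgt_emb = embed(target_paths)

    # Transfer attribute labels from the "fair" reference to the unlabeled
    # target via k-NN in embedding space (one simple way to compare the two
    # datasets; the paper also reports qualitative and quantitative analyses).
    k = min(5, len(reference_labels))
    knn = KNeighborsClassifier(n_neighbors=k, metric="cosine").fit(ref_emb, reference_labels)
    inferred = knn.predict(tgt_emb)

    # Inspect the inferred attribute distribution of the target dataset.
    print(collections.Counter(inferred))
```

A strongly skewed inferred distribution (relative to the balanced reference) would then indicate a potential fairness issue in the unlabeled target dataset.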

  • Published in:
    2024 IEEE 40th International Conference on Data Engineering Workshops (ICDEW)
  • Type:
    Inproceedings
  • Authors:
    Görge, Rebekka; Mock, Michael; Akila, Maram
  • Year:
    2024
  • Source:
    https://ieeexplore.ieee.org/document/10555073

Citation information

Görge, Rebekka; Mock, Michael; Akila, Maram: Inspecting and Measuring Fairness of unlabeled Image Datasets. In: 2024 IEEE 40th International Conference on Data Engineering Workshops (ICDEW), May 2024, pp. 191–200. https://ieeexplore.ieee.org/document/10555073

Associated Lamarr Researchers

Rebekka Görge

Author to the profile
Dr. Michael Mock

Author to the profile
Dr. Maram Akila

Author to the profile