Detecting Linguistic Indicators for Stereotype Assessment with Large Language Models
Social categories and stereotypes embedded in language can introduce data bias into the training of Large Language Models (LLMs). Despite safeguards, these biases often persist in model behavior, potentially leading to representational harm in outputs. While sociolinguistic research provides valuable insights into the formation and spread of stereotypes, NLP approaches for bias evaluation rarely draw on this foundation and often lack objectivity, precision, and interpretability. To fill this gap, we propose a new approach to assess stereotypes by detecting and quantifying the linguistic indication of a stereotype. We derive linguistic indicators from the Social Category and Stereotype Communication (SCSC) framework that indicate strong social category formulation and stereotyping in language, and use them to build a categorization scheme. We use in-context learning to instruct LLMs to examine the linguistic properties of a sentence containing stereotypes, providing a basis for a fine-grained stereotype assessment. Based on an empirical evaluation, we develop a scoring function to measure the linguistic indicators of a stereotype. Our annotations of stereotyped sentences reveal that these linguistic indicators explain the strength of a stereotype. The models perform well in detecting and classifying the linguistic indicators used to denote a category, but sometimes struggle to accurately evaluate the described associations. Using more few-shot examples significantly improves performance. Model performance increases with size: Llama-3.3-70B-Instruct and GPT-4 achieve comparable results that surpass those of Mixtral-8x7B-Instruct, GPT-4-mini, and Llama-3.1-8B-Instruct_4bit. Code and annotations are available at https://github.com/r-goerge/Detecting-Linguistic-Indicators-for-Stereotype-Assessment-with-LLMs.
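The paper itself defines the full categorization scheme, prompts, and scoring function; purely as an illustrative sketch of the pipeline described in the abstract (few-shot in-context labelling of linguistic indicators followed by score aggregation), the following Python snippet uses invented indicator names, weights, and demonstration examples, and substitutes a hand-made label dictionary for a real LLM call:

```python
# Minimal illustrative sketch, not the authors' prompt, categorization scheme,
# or scoring function: build a few-shot prompt that asks an LLM to flag
# linguistic indicators of stereotyping in a sentence, then aggregate the
# flags into a toy indicator score. Indicator names and weights are invented
# for illustration and only loosely inspired by the SCSC framework.

# Hypothetical indicators with toy weights (higher = stronger stereotyping cue).
INDICATORS = {
    "generic_label": 1.0,        # bare generic noun for the category ("Women are ...")
    "essentializing_verb": 1.0,  # state verbs like "are"/"have" instead of behaviour-specific verbs
    "internal_attribution": 0.5, # trait presented as inherent to the group
}

# Hand-written few-shot examples used as in-context demonstrations.
FEW_SHOT_EXAMPLES = [
    ("Women are bad drivers.",
     {"generic_label": True, "essentializing_verb": True, "internal_attribution": True}),
    ("Two of my colleagues missed the deadline last week.",
     {"generic_label": False, "essentializing_verb": False, "internal_attribution": False}),
]


def build_prompt(sentence: str) -> str:
    """Assemble a few-shot, in-context prompt for yes/no indicator labelling."""
    lines = [
        "Decide for each linguistic indicator whether it is present in the sentence.",
        "Answer with 'yes' or 'no' per indicator.",
        "",
    ]
    for text, labels in FEW_SHOT_EXAMPLES:
        lines.append(f"Sentence: {text}")
        lines += [f"{name}: {'yes' if labels[name] else 'no'}" for name in INDICATORS]
        lines.append("")
    lines.append(f"Sentence: {sentence}")
    return "\n".join(lines)


def indicator_score(labels: dict) -> float:
    """Toy aggregation: weighted share of detected indicators, in [0, 1]."""
    total = sum(INDICATORS.values())
    return sum(w for name, w in INDICATORS.items() if labels.get(name)) / total


if __name__ == "__main__":
    prompt = build_prompt("Engineers have no sense of humour.")
    print(prompt)
    # In practice the prompt is sent to an instruction-tuned LLM (e.g. GPT-4 or
    # Llama-3.3-70B-Instruct) and its yes/no answers are parsed; here we plug in
    # hand-made labels to show only the scoring step.
    mock_labels = {"generic_label": True, "essentializing_verb": True, "internal_attribution": True}
    print("indicator score:", indicator_score(mock_labels))
```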
- Published in: Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency
- Type: Inproceedings
- Authors: Görge, Rebekka; Mock, Michael; Allende-Cid, Héctor
- Year: 2025
- Source: https://dl.acm.org/doi/10.1145/3715275.3732181
Citation information:
Görge, Rebekka; Mock, Michael; Allende-Cid, Héctor. "Detecting Linguistic Indicators for Stereotype Assessment with Large Language Models." In Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency, pp. 2796–2814. Association for Computing Machinery, June 2025. https://dl.acm.org/doi/10.1145/3715275.3732181
@Inproceedings{Goerge.etal.2025a,
author={Görge, Rebekka and Mock, Michael and Allende-Cid, Héctor},
title={Detecting Linguistic Indicators for Stereotype Assessment with Large Language Models},
booktitle={Proceedings of the 2025 {ACM} Conference on Fairness, Accountability, and Transparency},
pages={2796--2814},
month={June},
publisher={Association for Computing Machinery},
url={https://dl.acm.org/doi/10.1145/3715275.3732181},
year={2025},
abstract={Social categories and stereotypes embedded in language can introduce data bias into the training of Large Language Models ({LLMs}). Despite safeguards, these biases often persist in model behavior, potentially leading to representational harm in outputs. While sociolinguistic research provides valuable insights into the formation and spread of stereotypes, {NLP} approaches for bias evaluation...}}