Detecting Linguistic Indicators for Stereotype Assessment with Large Language Models

Social categories and stereotypes embedded in language can introduce data bias into the training of Large Language Models (LLMs). Despite safeguards, these biases often persist in model behavior, potentially leading to representational harm in outputs. While sociolinguistic research provides valuable insights into the formation and spread of stereotypes, NLP approaches to bias evaluation rarely draw on this foundation and often lack objectivity, precision, and interpretability. To fill this gap, we propose a new approach that assesses stereotypes by detecting and quantifying the linguistic indication of a stereotype. We derive linguistic indicators of strong social-category formulation and stereotyping from the Social Category and Stereotype Communication (SCSC) framework and use them to build a categorization scheme. We use in-context learning to instruct LLMs to examine the linguistic properties of a sentence containing stereotypes, providing a basis for a fine-grained stereotype assessment. We develop a scoring function, grounded in empirical evaluation, to measure linguistic indicators of stereotypes. Our annotations of stereotyped sentences reveal that these linguistic indicators explain the strength of a stereotype. The models perform well in detecting and classifying the linguistic indicators used to denote a category, but sometimes struggle to accurately evaluate the described associations. Using more few-shot examples significantly improves performance. Model performance increases with size: Llama-3.3-70B-Instruct and GPT-4 achieve comparable results that surpass those of Mixtral-8x7B-Instruct, GPT-4-mini, and Llama-3.1-8B-Instruct_4bit. Code and annotations are available at https://github.com/r-goerge/Detecting-Linguistic-Indicators-for-Stereotype-Assessment-with-LLMs.
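The in-context-learning setup the abstract describes can be pictured as a few-shot prompt that asks an LLM to label the linguistic indicators in a sentence. The sketch below is purely illustrative: the indicator names (`category_label`, `quantifier`) and the example annotations are hypothetical placeholders, not the paper's actual categorization scheme or prompts.

```python
# Hypothetical sketch of few-shot prompt construction for indicator
# classification. Indicator fields and labels are illustrative only.

FEW_SHOT_EXAMPLES = [
    ("All politicians lie.",
     {"category_label": "generic plural", "quantifier": "universal"}),
    ("Some teenagers are careless.",
     {"category_label": "generic plural", "quantifier": "partial"}),
]

def build_prompt(sentence: str, examples=FEW_SHOT_EXAMPLES) -> str:
    """Assemble an in-context-learning prompt asking the model to
    classify linguistic indicators of stereotyping in one sentence."""
    lines = [
        "Classify the linguistic indicators of stereotyping in the sentence.",
        "Report the social-category label type and the quantifier used.",
        "",
    ]
    # Few-shot demonstrations: sentence followed by its indicator labels.
    for text, labels in examples:
        lines.append(f"Sentence: {text}")
        lines.append(f"Labels: {labels}")
        lines.append("")
    # The query sentence; the model completes the final "Labels:" line.
    lines.append(f"Sentence: {sentence}")
    lines.append("Labels:")
    return "\n".join(lines)
```

The returned string would be sent to the LLM, whose completion of the final `Labels:` line yields the indicator classification that the paper's scoring function could then aggregate.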

  • Published in:
    Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency
  • Type:
    Inproceedings
  • Authors:
    Görge, Rebekka; Mock, Michael; Allende-Cid, Héctor
  • Year:
    2025
  • Source:
    https://dl.acm.org/doi/10.1145/3715275.3732181

Citation information

Görge, Rebekka; Mock, Michael; Allende-Cid, Héctor: Detecting Linguistic Indicators for Stereotype Assessment with Large Language Models. In: Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency, Association for Computing Machinery, June 2025, pp. 2796-2814. https://dl.acm.org/doi/10.1145/3715275.3732181