Correcting Systematic Bias in LLM-Generated Dialogues Using Big Five Personality Traits

The ability of large language models (LLMs) to simulate human behavior and psychological traits holds significant promise for applications in psychology and the social sciences. This paper investigates the feasibility of generating synthetic dialogue datasets that accurately reflect real-world distributions of personality traits, based on the Big Five personality model. Using GPT-4o-mini, we prompt the model with personality traits that mirror population-level distributions to generate dialogues. However, systematic deviations, particularly in the representation of extreme personality traits, are observed, likely due to biases introduced during LLM training and alignment. To address these deviations, we propose a rescaling method that corrects the initial personality traits used for prompting the LLM, ensuring that the generated dialogues more closely match the expected distributions. This correction enhances the quality and reliability of the synthetic dialogues, paving the way for more effective use of LLMs in psychological research and social science applications.
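The abstract describes the rescaling idea only at a high level, so the sketch below is an illustrative assumption rather than the paper's actual procedure: it supposes the correction is a simple per-trait linear fit between the prompted trait value and the trait level observed in the generated dialogues, which is then inverted so that the prompt value is adjusted to hit a desired target score. All function names and the toy numbers are hypothetical.

```python
# Minimal sketch of a per-trait rescaling correction (assumed linear bias model),
# not the paper's exact method. Scores are assumed to lie on a 1-5 Likert-style scale.
import numpy as np

TRAITS = ["openness", "conscientiousness", "extraversion", "agreeableness", "neuroticism"]


def fit_bias(prompted: np.ndarray, observed: np.ndarray) -> tuple[float, float]:
    """Fit observed = a * prompted + b for one trait via least squares."""
    a, b = np.polyfit(prompted, observed, deg=1)
    return float(a), float(b)


def rescale_prompt_value(target: float, a: float, b: float,
                         lo: float = 1.0, hi: float = 5.0) -> float:
    """Invert the fitted mapping so a dialogue generated from the corrected
    prompt value is expected to exhibit the target trait score."""
    corrected = (target - b) / a
    return float(np.clip(corrected, lo, hi))  # keep the value on the trait scale


if __name__ == "__main__":
    # Toy calibration data: extreme prompted values tend to be pulled toward the mean,
    # mimicking the systematic deviation reported in the abstract.
    prompted = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    observed = np.array([1.8, 2.4, 3.0, 3.5, 4.0])

    a, b = fit_bias(prompted, observed)
    # To obtain a dialogue scored around 4.5 on extraversion, prompt with a higher value.
    print(rescale_prompt_value(4.5, a, b))
```

Under this assumed linear bias model, the correction simply over-steers the prompted trait toward the extremes to compensate for the regression-to-the-mean effect observed in the generated dialogues.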

Citation information

Sparrenberg, Lorenz; Schneider, Tobias; Deußer, Tobias; Koppenborg, Markus; Sifa, Rafet: Correcting Systematic Bias in LLM-Generated Dialogues Using Big Five Personality Traits. 2024 IEEE International Conference on Big Data (BigData), December 2024, pp. 3061-3069. https://ieeexplore.ieee.org/abstract/document/10825941

Associated Lamarr Researchers

Prof. Dr. Rafet Sifa

Principal Investigator Hybrid ML