Correcting Systematic Bias in LLM-Generated Dialogues Using Big Five Personality Traits
The ability of large language models (LLMs) to simulate human behavior and psychological traits holds significant promise for applications in psychology and the social sciences. This paper investigates the feasibility of generating synthetic dialogue datasets that accurately reflect real-world distributions of personality traits, based on the Big Five personality model. Using GPT-4o-mini, we prompt the model with personality traits that mirror population-level distributions to generate dialogues. However, systematic deviations, particularly in the representation of extreme personality traits, are observed, likely due to biases introduced during LLM training and alignment. To address these deviations, we propose a rescaling method that corrects the initial personality traits used for prompting the LLM, ensuring that the generated dialogues more closely match the expected distributions. This correction enhances the quality and reliability of the synthetic dialogues, paving the way for more effective use of LLMs in psychological research and social science applications.
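The abstract does not specify how the rescaling works. As a purely hypothetical illustration of the general idea (correcting the prompted trait values so that the traits expressed in the generated dialogues land where intended), the sketch below assumes that trait scores lie on a 1 to 5 scale and that the score measured in a dialogue relates roughly linearly to the score used in the prompt. It fits that relation by ordinary least squares on calibration pairs and inverts it to choose a corrected prompt value. The names `LinearFit`, `fit_linear`, and `rescale_prompt`, the linear model, and the 1 to 5 range are all assumptions for illustration; this is not the authors' method.

```python
from dataclasses import dataclass


@dataclass
class LinearFit:
    """Hypothetical model of LLM trait expression: measured ~ intercept + slope * prompted."""
    intercept: float
    slope: float


def fit_linear(prompted: list[float], measured: list[float]) -> LinearFit:
    """Ordinary least-squares fit of measured trait scores on prompted trait scores."""
    n = len(prompted)
    if n != len(measured) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(prompted) / n
    mean_y = sum(measured) / n
    sxx = sum((x - mean_x) ** 2 for x in prompted)
    if sxx == 0:
        raise ValueError("prompted values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(prompted, measured))
    slope = sxy / sxx
    return LinearFit(intercept=mean_y - slope * mean_x, slope=slope)


def rescale_prompt(target: float, fit: LinearFit, lo: float = 1.0, hi: float = 5.0) -> float:
    """Invert the assumed linear relation to find the prompt value expected to
    yield `target`, clipped to the (assumed) trait scale [lo, hi]."""
    if fit.slope == 0:
        raise ValueError("slope is zero; the fit cannot be inverted")
    raw = (target - fit.intercept) / fit.slope
    return min(hi, max(lo, raw))


if __name__ == "__main__":
    # Hypothetical calibration data in which expressed traits are compressed toward the midpoint.
    fit = fit_linear([1, 2, 3, 4, 5], [2.0, 2.5, 3.0, 3.5, 4.0])
    print(fit)                       # slope 0.5, intercept 1.5
    print(rescale_prompt(3.5, fit))  # prompt 4.0 to obtain an expressed 3.5
    print(rescale_prompt(4.5, fit))  # would need 6.0, clipped to 5.0
```

With the made-up calibration data above, a target of 4.5 would require a prompt value outside the scale, so it is clipped. This mirrors, in a toy setting, why extreme trait values are the hardest to reproduce when a model compresses its output toward the middle.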
- Published in: 2024 IEEE International Conference on Big Data (BigData)
- Type: Inproceedings
- Authors: Sparrenberg, Lorenz; Schneider, Tobias; Deußer, Tobias; Koppenborg, Markus; Sifa, Rafet
- Year: 2024
- Source: https://ieeexplore.ieee.org/abstract/document/10825941
Citation information
Sparrenberg, Lorenz; Schneider, Tobias; Deußer, Tobias; Koppenborg, Markus; Sifa, Rafet: Correcting Systematic Bias in LLM-Generated Dialogues Using Big Five Personality Traits. In: 2024 IEEE International Conference on Big Data (BigData), pp. 3061–3069, December 2024. https://ieeexplore.ieee.org/abstract/document/10825941
@Inproceedings{Sparrenberg.etal.2024a,
author={Sparrenberg, Lorenz and Schneider, Tobias and Deußer, Tobias and Koppenborg, Markus and Sifa, Rafet},
title={Correcting Systematic Bias in {LLM}-Generated Dialogues Using Big Five Personality Traits},
booktitle={2024 {IEEE} International Conference on Big Data ({BigData})},
pages={3061--3069},
month={December},
url={https://ieeexplore.ieee.org/abstract/document/10825941},
year={2024},
abstract={The ability of large language models ({LLMs}) to simulate human behavior and psychological traits holds significant promise for applications in psychology and the social sciences. This paper investigates the feasibility of generating synthetic dialogue datasets that accurately reflect real-world distributions of personality traits, based on the Big Five personality model. Using {GPT}-4o-mini, we...}}