Predicting Player Churn with LLMs: A Comprehensive Evaluation of World Knowledge and Reasoning
While large language models (LLMs) have demonstrated impressive results on public benchmarks, their effectiveness in structured, real-world problems like behavioral analytics remains underexplored. This work assesses the out-of-the-box performance of LLMs for industry-specific downstream tasks, with player churn prediction as a representative task. Evaluating LLMs on public benchmarks risks data leakage and task-specific overfitting, so instead we perform experiments on a novel self-compiled dataset for churn prediction, a task not part of any standard benchmark. We compare the performance of OpenAI’s GPT-4.1 with traditional machine learning models, such as XGBoost and MLPs, and analyze the impact of the LLM’s extensive internal world knowledge and reasoning capabilities. With few-shot prompting, GPT-4.1 achieves a weighted F1 score of 0.787, matching the performance of XGBoost on the same set of samples. We show that the LLM can compensate for missing information with its internal world knowledge and reasoning capabilities, performing best if it can leverage both. Our results highlight the potential of LLMs for cross-game churn prediction and other structured, industry-specific tasks.
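The weighted F1 score reported above is computed per class and then averaged. Each class's F1 is weighted by its support, meaning the number of true samples in that class. The following is a minimal sketch in plain Python, assuming binary churn labels (1 = churned, 0 = retained). The function name `weighted_f1` and the toy labels are hypothetical illustrations. They are not the authors' evaluation code or data from the paper's dataset.

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores (hypothetical helper)."""
    support = Counter(y_true)  # number of true samples per class
    total = len(y_true)
    score = 0.0
    for cls, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        fn = sum(1 for t, p in pairs if t == cls and p != cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += f1 * n / total  # weight each class by its share of samples
    return score

# Toy, imbalanced example: four retained players, two churned players.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
print(round(weighted_f1(y_true, y_pred), 4))  # → 0.6667
```

With this toy data, the per-class F1 scores are 0.75 (retained) and 0.5 (churned). The unweighted (macro) average would therefore be 0.625, while the weighted average leans toward the majority class. scikit-learn's `f1_score(y_true, y_pred, average="weighted")` computes the same quantity.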
- Published in: 2025 IEEE 12th International Conference on Data Science and Advanced Analytics (DSAA)
- Type: Inproceedings
- Authors: Tobias Schneider, Lorenz Sparrenberg, Rafet Sifa
- Year: 2025
- Source: https://ieeexplore.ieee.org/document/11248010
Citation information
Schneider, Sparrenberg, Sifa: Predicting Player Churn with LLMs: A Comprehensive Evaluation of World Knowledge and Reasoning, 2025 IEEE 12th International Conference on Data Science and Advanced Analytics (DSAA), 2025, pp. 1–10, https://ieeexplore.ieee.org/document/11248010
@Inproceedings{Schneider.etal.2025a,
author={Schneider, Tobias and Sparrenberg, Lorenz and Sifa, Rafet},
title={Predicting Player Churn with LLMs: A Comprehensive Evaluation of World Knowledge and Reasoning},
booktitle={2025 IEEE 12th International Conference on Data Science and Advanced Analytics (DSAA)},
pages={1--10},
url={https://ieeexplore.ieee.org/document/11248010},
year={2025},
abstract={While large language models (LLMs) have demonstrated impressive results on public benchmarks, their effectiveness in structured, real-world problems like behavioral analytics remains underexplored. This work assesses the out-of-the-box performance of LLMs for industry-specific downstream tasks, with player churn prediction as a representative task. Evaluating LLMs on public benchmarks risks data...}}