ArithmAttack: Evaluating Robustness of LLMs to Noisy Context in Math Problem Solving
While Large Language Models (LLMs) have shown impressive capabilities in math problem-solving tasks, their robustness to noisy inputs is not well studied. In this work, we propose ArithmAttack to examine how robust LLMs are when they encounter noisy prompts that contain extra noise in the form of punctuation marks. While easy to implement, ArithmAttack does not cause any information loss, since no words are added to or deleted from the context. We evaluate the robustness of seven LLMs, including LLama3, Mistral, and Mathstral, on noisy GSM8K and MultiArith datasets. Our experiments suggest that all the studied models are vulnerable to such noise, with more noise leading to poorer performance.
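The abstract describes the attack as inserting extra punctuation marks into a prompt without adding or deleting words. The paper's exact sampling procedure is not given here, so the following is only a minimal sketch of that idea, with a hypothetical `noise_level` parameter controlling how many word gaps receive an extra punctuation token:

```python
import random
import string

def inject_punctuation_noise(text: str, noise_level: float = 0.3, seed: int = 0) -> str:
    """Insert random punctuation tokens between the words of `text`.

    No words are added or removed, so the original content is preserved.
    `noise_level` is the (hypothetical) fraction of word gaps that receive
    an extra punctuation mark.
    """
    rng = random.Random(seed)
    noisy = []
    for word in text.split():
        noisy.append(word)
        if rng.random() < noise_level:
            noisy.append(rng.choice(string.punctuation))
    return " ".join(noisy)

clean = "Natalia sold 48 clips in April and half as many in May."
print(inject_punctuation_noise(clean, noise_level=0.5))
```

Stripping the inserted punctuation tokens from the output recovers the original word sequence, which is why the authors can claim the perturbation loses no information.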
- Published in: arXiv
- Type: Article
- Authors: Abedin, Zain Ul; Qamar, Shahzeb; Flek, Lucie; Karimi, Akbar
- Year: 2025
- Source: https://arxiv.org/pdf/2501.08203
Citation information
Abedin, Zain Ul; Qamar, Shahzeb; Flek, Lucie; Karimi, Akbar: ArithmAttack: Evaluating Robustness of LLMs to Noisy Context in Math Problem Solving, arXiv, 2025, https://arxiv.org/pdf/2501.08203.
@Article{Abedin.etal.2025a,
author={Abedin, Zain Ul and Qamar, Shahzeb and Flek, Lucie and Karimi, Akbar},
title={ArithmAttack: Evaluating Robustness of LLMs to Noisy Context in Math Problem Solving},
journal={arXiv},
url={https://arxiv.org/pdf/2501.08203},
year={2025},
abstract={While Large Language Models (LLMs) have shown impressive capabilities in math problem-solving tasks, their robustness to noisy inputs is not well-studied. In this work, we propose ArithmAttack to examine how robust the LLMs are when they encounter noisy prompts that contain extra noise in the form of punctuation marks. While being easy to implement, ArithmAttack does not cause any information loss since words are not added or deleted from the context. We evaluate the robustness of seven LLMs, including LLama3, Mistral, and Mathstral, on noisy GSM8K and MultiArith datasets. Our experiments suggest that all the studied models show vulnerability to such noise, with more noise leading to poorer performances.}}