Resource-Efficient Anonymization of Textual Data via Knowledge Distillation from Large Language Models

Protecting personal and sensitive information in textual data is increasingly crucial, especially when leveraging large language models (LLMs) that may pose privacy risks due to their API-based access. We introduce a novel approach and pipeline for anonymizing text across arbitrary domains without the need for manually labeled data or extensive computational resources. Our method employs knowledge distillation from LLMs into smaller encoder-only models via named entity recognition (NER) coupled with regular expressions to create a lightweight model capable of effective anonymization while preserving the semantic and contextual integrity of the data. This reduces computational overhead, enabling deployment on less powerful servers or even personal computing devices. Our findings suggest that knowledge distillation offers a scalable, resource-efficient pathway for anonymization, balancing privacy preservation with model performance and computational efficiency.
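The combination of NER predictions with regular expressions described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the patterns, the placeholder format, and the names `regex_spans`, `merge_spans`, `anonymize`, and `toy_tagger` are assumptions made for illustration, and `toy_tagger` merely stands in for the distilled encoder-only NER model. The LLM labeling and distillation steps are not shown.

```python
import re
from typing import Callable, List, Tuple

Span = Tuple[int, int, str]  # (start, end, label)

# Illustrative patterns only (assumption); real coverage would need to be broader.
REGEX_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s/-]{6,}\d"),
}


def regex_spans(text: str) -> List[Span]:
    """Collect spans for well-structured identifiers via regular expressions."""
    spans = []
    for label, pattern in REGEX_PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), label))
    return spans


def merge_spans(spans: List[Span]) -> List[Span]:
    """Sort spans by start (longest first on ties) and drop overlapping ones."""
    ordered = sorted(spans, key=lambda s: (s[0], -(s[1] - s[0])))
    kept: List[Span] = []
    for span in ordered:
        if not kept or span[0] >= kept[-1][1]:
            kept.append(span)
    return kept


def anonymize(text: str, ner_tagger: Callable[[str], List[Span]]) -> str:
    """Replace every detected span with a bracketed label placeholder."""
    spans = merge_spans(ner_tagger(text) + regex_spans(text))
    pieces, cursor = [], 0
    for start, end, label in spans:
        pieces.append(text[cursor:start])
        pieces.append(f"[{label}]")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


def toy_tagger(text: str) -> List[Span]:
    """Hypothetical stand-in for the distilled NER model: a fixed name lookup."""
    spans = []
    for name in ("Ada Lovelace",):
        for match in re.finditer(re.escape(name), text):
            spans.append((match.start(), match.end(), "PERSON"))
    return spans


if __name__ == "__main__":
    sample = "Contact Ada Lovelace at ada@example.org or +49 228 1234567."
    print(anonymize(sample, toy_tagger))
```

In a setup like the one the abstract describes, `toy_tagger` would be replaced by a small encoder-only model trained on LLM-generated entity labels, while the regular expressions would handle identifiers with a fixed surface form.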

  • Published in:
    Proceedings of the 31st International Conference on Computational Linguistics: Industry Track
  • Type:
    Inproceedings
  • Authors:
    Deußer, Tobias; Hahnbück, Max; Uelwer, Tobias; Zhao, Cong; Bauckhage, Christian; Sifa, Rafet
  • Year:
    2025
  • Source:
    https://aclanthology.org/2025.coling-industry.20/

Citation information

Deußer, Tobias; Hahnbück, Max; Uelwer, Tobias; Zhao, Cong; Bauckhage, Christian; Sifa, Rafet: Resource-Efficient Anonymization of Textual Data via Knowledge Distillation from Large Language Models. In: Proceedings of the 31st International Conference on Computational Linguistics: Industry Track, pp. 243–250, January 2025. Association for Computational Linguistics. https://aclanthology.org/2025.coling-industry.20/

Associated Lamarr researchers

Prof. Dr. Rafet Sifa

Principal Investigator, Hybrid ML

Prof. Dr. Christian Bauckhage

Director