IKnow: Instruction-Knowledge-Aware Continual Pretraining for Effective Domain Adaptation

Continual pretraining promises to adapt large language models (LLMs) to new domains using only unlabeled test-time data, but naively applying standard self-supervised objectives to instruction-tuned models is known to degrade their instruction-following capability and semantic representations. Existing fixes assume access to the original base model or rely on knowledge from an external domain-specific database – both of which pose a realistic barrier in settings where the base model weights are withheld for safety reasons or reliable external corpora are unavailable. In this work, we propose Instruction-Knowledge-Aware Continual Adaptation (IKnow), a simple and general framework that formulates novel self-supervised objectives in the instruction-response dialogue format. Rather than depending on external resources, IKnow leverages domain knowledge embedded within the text itself and learns to encode it at a deeper semantic level.
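To make the core idea concrete, the sketch below shows one way unlabeled domain text could be wrapped into instruction-response dialogues for continual pretraining of an instruction-tuned model. The span-infilling objective, the prompt wording, the `<missing>` marker, and helper names such as `make_infilling_example` are illustrative assumptions for this sketch, not the objectives or templates used in the paper.

```python
# Minimal sketch: turn raw domain passages into instruction-response pairs so that
# an instruction-tuned model can keep being trained with its usual chat-style
# supervised loss, rather than plain next-token prediction on raw text.
# The masking objective and prompts below are assumptions, not IKnow's exact design.
import random
from typing import Dict, List


def make_infilling_example(passage: str, mask_ratio: float = 0.15) -> Dict[str, str]:
    """Mask a contiguous span of the passage and ask the model to reconstruct it."""
    words = passage.split()
    span_len = max(1, int(len(words) * mask_ratio))
    start = random.randrange(0, max(1, len(words) - span_len))
    masked_span = " ".join(words[start:start + span_len])
    corrupted = " ".join(words[:start] + ["<missing>"] + words[start + span_len:])
    instruction = (
        "The following passage has a missing span marked <missing>. "
        "Reconstruct the missing text.\n\n" + corrupted
    )
    return {"instruction": instruction, "response": masked_span}


def build_dataset(domain_texts: List[str]) -> List[Dict[str, str]]:
    """Convert an unlabeled domain corpus into instruction-response training pairs."""
    return [make_infilling_example(t) for t in domain_texts if t.strip()]


if __name__ == "__main__":
    corpus = [
        "Continual pretraining adapts a language model to a new domain using "
        "only unlabeled text collected at test time.",
    ]
    for example in build_dataset(corpus):
        print(example["instruction"])
        print("->", example["response"])
```

Because the supervision signal is derived from the domain text itself, a construction of this kind needs neither the original base model nor an external domain database, which is the setting the paper targets.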

Citation information

Zhang, Tianyi; Mai, Florian; Flek, Lucie: IKnow: Instruction-Knowledge-Aware Continual Pretraining for Effective Domain Adaptation. arXiv preprint arXiv:2510.20377, October 2025. http://arxiv.org/abs/2510.20377

Associated Lamarr Researchers


Dr. Florian Mai

Scientific Coordinator NLP

Prof. Dr. Lucie Flek

Area Chair NLP