IKnow: Instruction-Knowledge-Aware Continual Pretraining for Effective Domain Adaptation

Continual pretraining promises to adapt large language models (LLMs) to new domains using only unlabeled test-time data, but naively applying standard self-supervised objectives to instruction-tuned models is known to degrade their instruction-following capability and semantic representations. Existing fixes assume access to the original base model or rely on knowledge from an external domain-specific database – both of which pose a realistic barrier in settings where the base model weights are withheld for safety reasons or reliable external corpora are unavailable. In this work, we propose Instruction-Knowledge-Aware Continual Adaptation (IKnow), a simple and general framework that formulates novel self-supervised objectives in the instruction-response dialogue format. Rather than depending on external resources, IKnow leverages domain knowledge embedded within the text itself and learns to encode it at a deeper semantic level.
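The abstract does not spell out the objectives themselves, but the core idea of recasting raw domain text as instruction-response training pairs can be illustrated with a minimal sketch. The `wrap_as_dialogue` helper, the instruction template, and the prefix/continuation split below are hypothetical choices for illustration only; they are not the specific objectives proposed in IKnow.

```python
# Illustrative sketch (not the authors' released code): wrapping an unlabeled
# domain passage into an instruction-response dialogue so that the usual
# causal language-modeling loss is applied only to the response turn.
# All template text and helper names here are hypothetical placeholders.

from typing import Dict, List, Tuple

# Hypothetical instruction template; IKnow's actual objectives are defined in the paper.
INSTRUCTION_TEMPLATE = (
    "Read the following domain text and continue it faithfully:\n\n{snippet}"
)


def wrap_as_dialogue(passage: str, context_chars: int = 400) -> List[Dict[str, str]]:
    """Wrap a raw passage into a chat-format training example.

    The opening of the passage is placed in the user turn as context, and the
    model is trained to produce the remainder as the assistant turn, so the
    domain knowledge is learned inside the instruction-following format.
    """
    prefix, continuation = passage[:context_chars], passage[context_chars:]
    return [
        {"role": "user", "content": INSTRUCTION_TEMPLATE.format(snippet=prefix)},
        {"role": "assistant", "content": continuation},
    ]


def loss_mask(messages: List[Dict[str, str]]) -> Tuple[List[str], List[int]]:
    """Whitespace-tokenize the dialogue and mark which tokens receive loss (1 = assistant turn)."""
    tokens, mask = [], []
    for msg in messages:
        words = msg["content"].split()
        tokens.extend(words)
        mask.extend([1 if msg["role"] == "assistant" else 0] * len(words))
    return tokens, mask


if __name__ == "__main__":
    doc = "Continual pretraining adapts a model to a new domain. " * 20
    example = wrap_as_dialogue(doc)
    tokens, mask = loss_mask(example)
    print(example[0]["content"][:80], "...")
    print(f"{sum(mask)} of {len(tokens)} tokens contribute to the loss")
```

In a real pipeline the whitespace tokenizer would be replaced by the model's own tokenizer and chat template, but the masking logic stays the same: only the response tokens carry the self-supervised loss.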

Citation information

Zhang, Tianyi; Mai, Florian; Flek, Lucie: IKnow: Instruction-Knowledge-Aware Continual Pretraining for Effective Domain Adaptation. arXiv preprint arXiv:2510.20377, October 2025. http://arxiv.org/abs/2510.20377