IKnow: Instruction-Knowledge-Aware Continual Pretraining for Effective Domain Adaptation
Continual pretraining promises to adapt large language models (LLMs) to new domains using only unlabeled test-time data, but naively applying standard self-supervised objectives to instruction-tuned models is known to degrade their instruction-following capability and semantic representations. Existing fixes assume access to the original base model or rely on knowledge from an external domain-specific database, both of which pose a realistic barrier in settings where the base model weights are withheld for safety reasons or reliable external corpora are unavailable. In this work, we propose Instruction-Knowledge-Aware Continual Adaptation (IKnow), a simple and general framework that formulates novel self-supervised objectives in the instruction-response dialogue format. Rather than depending on external resources, IKnow leverages domain knowledge embedded within the text itself and learns to encode it at a deeper semantic level.
- Published in: arXiv
- Type: Article
- Authors: Tianyi Zhang, Florian Mai, Lucie Flek
- Year: 2025
- Source: http://arxiv.org/abs/2510.20377
Citation information
Zhang, Tianyi; Mai, Florian; Flek, Lucie: IKnow: Instruction-Knowledge-Aware Continual Pretraining for Effective Domain Adaptation, arXiv, 2025, arXiv:2510.20377, October, arXiv, http://arxiv.org/abs/2510.20377, Zhang.etal.2025a
@Article{Zhang.etal.2025a,
author={Zhang, Tianyi and Mai, Florian and Flek, Lucie},
title={{IKnow}: Instruction-Knowledge-Aware Continual Pretraining for Effective Domain Adaptation},
journal={arXiv},
number={{arXiv}:2510.20377},
month={October},
publisher={{arXiv}},
url={http://arxiv.org/abs/2510.20377},
year={2025},
abstract={Continual pretraining promises to adapt large language models ({LLMs}) to new domains using only unlabeled test-time data, but naively applying standard self-supervised objectives to instruction-tuned models is known to degrade their instruction-following capability and semantic representations. Existing fixes assume access to the original base model or rely on knowledge from an external...}}