How Small Can You Go? Compact Language Models for On-Device Critical Error Detection in Machine Translation

Large Language Models (LLMs) excel at evaluating machine translation (MT), but their scale and cost hinder deployment on edge devices and in privacy-sensitive workflows. We ask: how small can you get while still detecting meaning-altering translation errors? Focusing on English->German Critical Error Detection (CED), we benchmark sub-2B models (LFM2-350M, Qwen-3-0.6B/1.7B, Llama-3.2-1B-Instruct, Gemma-3-1B) across WMT21, WMT22, and SynCED-EnDe-2025. Our framework standardizes prompts, applies lightweight logit-bias calibration and majority voting, and reports both semantic quality (MCC, F1-ERR/F1-NOT) and compute metrics (VRAM, latency, throughput). Results reveal a clear sweet spot around one billion parameters: Gemma-3-1B provides the best quality-efficiency trade-off, reaching MCC=0.77 with F1-ERR=0.98 on SynCED-EnDe-2025 after merged-weights fine-tuning, while maintaining 400 ms single-sample latency on a MacBook Pro M4 Pro (24 GB). At larger scale, Qwen-3-1.7B attains the highest absolute MCC (+0.11 over Gemma) but with higher compute cost. In contrast, ultra-small models (0.6B) remain usable with few-shot calibration yet under-detect entity and number errors. Overall, compact, instruction-tuned LLMs augmented with lightweight calibration and small-sample supervision can deliver trustworthy, on-device CED for MT, enabling private, low-cost error screening in real-world translation pipelines. All datasets, prompts, and scripts are publicly available at our GitHub repository.
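The abstract names logit-bias calibration, majority voting, and the MCC and per-class F1 metrics without giving details. The following is a minimal sketch of what these steps could look like for a binary ERR/NOT decision. It is not the authors' implementation: the function names, the tie-breaking rule, and the way the bias is applied are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt

ERR, NOT = "ERR", "NOT"  # assumed label names, following F1-ERR / F1-NOT


def biased_decision(logit_err, logit_not, bias=0.0):
    """Pick a label after adding a scalar bias to the ERR logit (assumed scheme)."""
    return ERR if logit_err + bias > logit_not else NOT


def majority_vote(labels, tie_break=ERR):
    """Return the most frequent label among repeated predictions.

    Ties fall back to `tie_break`; defaulting to ERR is an assumption.
    """
    counts = Counter(labels)
    if counts[ERR] == counts[NOT]:
        return tie_break
    return ERR if counts[ERR] > counts[NOT] else NOT


def confusion(gold, pred, positive=ERR):
    """Count (tp, fp, fn, tn), treating `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for g, p in zip(gold, pred):
        if p == positive:
            if g == positive:
                tp += 1
            else:
                fp += 1
        else:
            if g == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def mcc(gold, pred):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    tp, fp, fn, tn = confusion(gold, pred)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def f1(gold, pred, positive):
    """F1 for one class; F1-ERR uses positive=ERR, F1-NOT uses positive=NOT."""
    tp, fp, fn, _ = confusion(gold, pred, positive)
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom
```

In such a setup, the bias would typically be tuned on a small held-out set, and each segment would be classified several times before `majority_vote` is applied; both choices are assumptions here rather than details taken from the paper.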

  • Published in:
    arXiv
  • Type:
    Article
  • Authors:
    Chopra, Muskaan; Sparrenberg, Lorenz; Khanna, Sarthak; Sifa, Rafet
  • Year:
    2025
  • Source:
    http://arxiv.org/abs/2511.09748

Citation information

Chopra, Muskaan; Sparrenberg, Lorenz; Khanna, Sarthak; Sifa, Rafet: How Small Can You Go? Compact Language Models for On-Device Critical Error Detection in Machine Translation, arXiv:2511.09748, November 2025, http://arxiv.org/abs/2511.09748