It’s All Connected: A Survey for Multimodal Arabic AI
Multimodal AI integrates text, vision, and speech within unified reasoning frameworks, yet Arabic remains significantly underrepresented due to diglossia, morphological complexity, and scarce multimodal resources. This survey delivers the first comprehensive technical roadmap for Arabic multimodal AI, covering the progression from unimodal Arabic NLP, OCR, and ASR to recent Arabic-capable Multimodal Large Language Models (MLLMs). We review available multimodal datasets, modality encoders, tokenization approaches, connector designs, and fusion strategies used in state-of-the-art systems. We also provide the first consolidated evaluation of Arabic-capable MLLMs on the multimodal benchmarks ARB and PEARL, analyzing performance, robustness, and domain generalization across OCR-grounded and open-domain VQA settings. Despite recent progress, challenges persist in cultural grounding, dialect inclusivity, dataset scale, and open-access ecosystem maturity. We outline actionable directions for scalable and culturally aligned Arabic multimodal intelligence, including parameter-efficient adaptation, broader corpus development, and unified evaluation protocols. By consolidating technical advances and empirical insights, this survey establishes a foundation to guide the next generation of Arabic-centric multimodal research.
- Published in: Research Square
- Type: Article
- Authors: Aldabbas, Farizeh; Elsafty, Hossam; Sifa, Rafet
- Year: 2025
- Source: https://www.researchsquare.com/article/rs-8007923/v1
Citation information
Aldabbas, Farizeh; Elsafty, Hossam; Sifa, Rafet: It’s All Connected: A Survey for Multimodal Arabic AI, Research Square, 2025, November, https://www.researchsquare.com/article/rs-8007923/v1, Aldabbas.etal.2025b
@Article{Aldabbas.etal.2025b,
author={Aldabbas, Farizeh and Elsafty, Hossam and Sifa, Rafet},
title={It’s All Connected: A Survey for Multimodal Arabic {AI}},
journal={Research Square},
month={November},
url={https://www.researchsquare.com/article/rs-8007923/v1},
year={2025},
abstract={Multimodal {AI} integrates text, vision, and speech within unified reasoning frameworks, yet Arabic remains significantly underrepresented due to diglossia, morphological complexity, and scarce multimodal resources. This survey delivers the first comprehensive technical roadmap for Arabic multimodal {AI}, covering the progression from unimodal Arabic {NLP}, {OCR}, and {ASR} to recent...}}