EMNLP 2024: Multilingual Instruction Tuning in Polyglot Language Models
Between November 12th and 16th, the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP) will take place at the Hyatt Regency Miami Hotel in Miami, Florida. Alexander Weber, PhD candidate at Lamarr’s partner organization Fraunhofer IAIS, will present his latest paper, “Investigating Multilingual Instruction-Tuning: Do Polyglot Models Demand for Multilingual Instructions?”. He will dive into how multilingual instruction-tuning affects language model performance across a variety of languages, addressing the growing need for adaptable AI language assistants worldwide.
This session will explore Weber’s extensive empirical study on multilingual large language models (LLMs), showing that instruction-tuning on parallel multilingual datasets significantly enhances cross-lingual capabilities – improving performance by up to 9.9% – compared to monolingual or non-parallel multilingual data. The work also critically examines the Superficial Alignment Hypothesis, offering evidence that larger instruction-tuning datasets are essential, especially for mid-sized models.
For those interested in Alexander Weber’s research, his latest post on the Lamarr ML-Blog breaks down the benefits of multilingual instruction-tuning for language models pre-trained on Indo-European languages. The post highlights how parallel datasets – designed to maintain semantic alignment across languages – significantly improve cross-lingual capabilities over monolingual data. The work also introduces new multilingual resources and evaluation datasets created specifically to support polyglot model training.
Registration for EMNLP 2024 remains open at: https://2024.emnlp.org/registration/.
Details
Date
November 12–16, 2024
Location
Hyatt Regency Miami Hotel, Miami, Florida