Physics-LLM: New research project within the ErUM-Data funding initiative

Dr. Tim Ruhe, Principal Investigator at the Lamarr Institute for Machine Learning and Artificial Intelligence and coordinator of the ErUM-Data project Physics-LLM.

The new research project Physics-LLM develops a modular research-data-management toolkit based on large language models (LLMs) and agent-based AI to support researchers’ workflows across the entire scientific discovery lifecycle, from metadata annotation and smart data retrieval to analysis pipelines and reproducible process reuse. The project is funded within the ErUM-Data framework, the German national funding initiative for research-data infrastructures in physics. Coordinated by Dr. Tim Ruhe, Associated Principal Investigator at the Lamarr Institute for Machine Learning and Artificial Intelligence, Physics-LLM brings together expertise from physics, computer science and data science to turn growing data volumes into usable scientific resources.

Physics-LLM develops AI-based tools to accelerate physics research: LLM agents automate data selection, streamline data management pipelines, and make analysis processes reproducible, so that diverse research data in physics can be more easily discovered, understood and reused in the right contexts. The toolkit will provide components for automated metadata enrichment, semantic search across heterogeneous sources, and AI-assisted workflow documentation. It integrates both classical research data and non-classical sources such as software repositories and electronic lab notebooks, which are essential for reproducible science.

“Physics-LLM aims to operationalise FAIR data principles through AI-supported workflows that integrate structured and unstructured research outputs across the physics data lifecycle,” says Dr. Tim Ruhe. By translating FAIR data principles (making research data findable, accessible, interoperable and reusable across projects, experiments, and institutions) into concrete, AI-supported workflows, Physics-LLM directly contributes to the core objectives of ErUM-Data.

“LLMs are powerful at working with scientific language, but in physics they must also be precise and traceable. Physics produces not only data, but also code, notes, and analysis decisions. With LLM agents, we can connect these pieces into a searchable, reusable, credible research record, turning scattered outputs into workflows that others can understand and reproduce,” says Prof. Dr. Lucie Flek, Lamarr-chair for Natural Language Processing.

Funded by the German Federal Ministry of Education and Research, ErUM-Data supports large collaborative projects that build sustainable, interoperable and AI-ready research-data infrastructures for physics. Within this framework, Physics-LLM is funded with approximately €2.8 million over a period of three years. With its interdisciplinary consortium and strong integration of AI expertise from the Lamarr Institute for Machine Learning and Artificial Intelligence, Physics-LLM illustrates how advanced AI methods can be translated into practical research-data workflows, strengthening the foundations of data-driven physics research in Germany.
