Towards Standardised Dataset Creation for Human Activity Recognition: Framework, Taxonomy, Checklist, and Best Practices

Well-annotated and consistent datasets are essential for training supervised and self-supervised models, especially in human activity recognition (HAR). However, unlike research areas such as image recognition, HAR datasets vary widely in sensor types, environments, subjects, and presentation formats, often reflecting the individual practices of their creators. This inconsistency hinders usability, reproducibility, and long-term value. In this paper, we propose a standardized framework for creating HAR datasets, including taxonomies, a detailed checklist, and best practices to guide dataset development. We retrospectively apply this checklist to the benchmark datasets HDM05, HDM12 Dance, HuGaDB, UMAFall, LARa, OpenPack, CAARL, and DaRA, and compare them with industry-focused datasets to illustrate common gaps and opportunities for improvement.

  • Published in:
    The European Conference on Artificial Intelligence
  • Type:
    Inproceedings
  • Authors:
    Niemann, Friedrich; Rueda, Fernando Moya; Al Kfari, Moh’d Khier; Nair, Nilah Ravi; Lüdtke, Stefan; Kirchheim, Alice
  • Year:
    2025

Citation information

Niemann, Friedrich; Rueda, Fernando Moya; Al Kfari, Moh’d Khier; Nair, Nilah Ravi; Lüdtke, Stefan; Kirchheim, Alice: Towards Standardised Dataset Creation for Human Activity Recognition: Framework, Taxonomy, Checklist, and Best Practices, The European Conference on Artificial Intelligence, 2025.