EnQuery: Ensemble Policies for Diverse Query-Generation in Preference Alignment of Robot Navigation
To align mobile robot navigation policies with user preferences through reinforcement learning from human feedback (RLHF), reliable and behavior-diverse user queries are required. However, deterministic policies fail to generate a variety of navigation trajectory suggestions for a given navigation task. In this paper, we introduce EnQuery, a query generation approach using an ensemble of policies that achieve behavioral diversity through a regularization term. For a given navigation task, EnQuery produces multiple navigation trajectory suggestions, thereby optimizing the efficiency of preference data collection with fewer queries. Our methodology demonstrates superior performance in aligning navigation policies with user preferences in low-query regimes, offering enhanced policy convergence from sparse preference queries. The evaluation is complemented with a novel explainability representation, capturing full scene navigation behavior of the mobile robot in a single plot. Our code is available online at https://github.com/hrl-bonn/EnQuery
- Published in: 2024 33rd IEEE International Conference on Robot and Human Interactive Communication (ROMAN)
- Type: Inproceedings
- Authors: de Heuvel, Jorge; Seiler, Florian; Bennewitz, Maren
- Year: 2024
- Source: https://ieeexplore.ieee.org/document/10731470
Citation information
de Heuvel, Jorge; Seiler, Florian; Bennewitz, Maren: EnQuery: Ensemble Policies for Diverse Query-Generation in Preference Alignment of Robot Navigation, 2024 33rd IEEE International Conference on Robot and Human Interactive Communication (ROMAN), 2024, https://ieeexplore.ieee.org/document/10731470
@Inproceedings{Heuvel.etal.2024b,
author={de Heuvel, Jorge and Seiler, Florian and Bennewitz, Maren},
title={EnQuery: Ensemble Policies for Diverse Query-Generation in Preference Alignment of Robot Navigation},
booktitle={2024 33rd IEEE International Conference on Robot and Human Interactive Communication (ROMAN)},
url={https://ieeexplore.ieee.org/document/10731470},
year={2024},
abstract={To align mobile robot navigation policies with user preferences through reinforcement learning from human feedback (RLHF), reliable and behavior-diverse user queries are required. However, deterministic policies fail to generate a variety of navigation trajectory suggestions for a given navigation task. In this paper, we introduce EnQuery, a query generation approach using an ensemble of policies...}}