Action Space Design – Reinforcement Learning for Robot Motor Skills

Introduction

Reinforcement learning (RL) is a powerful method for developing robot motor skills (see the previous blog posts in the RL for robotics series). However, selecting an appropriate action space for learning is crucial and often relies on intuition. For instance, a wheeled robot may use a wheel-velocity action space, while a legged robot typically uses joint positions, and manipulators may use Cartesian targets. Although common choices, such as position control for legged locomotion, have become established, important questions remain: What makes position control more effective for legged locomotion than commanding torques directly? Is it suitable for all robot tasks, or are there better options for other systems?

Robots generally operate within a physics-defined action space: electric actuators typically track a torque command by regulating current in a high-frequency control loop. While RL policies can output torques directly, many studies recommend alternative action spaces such as joint positions, joint velocities, or task-space setpoints, which are converted to torques via feedback laws. The choice of action space significantly affects robot learning in contexts like character animation, manipulation, and flying robots. Although position control is often favored, some studies suggest that joint velocity may be more effective for specific tasks.
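
To make this decomposition concrete, here is a minimal Python sketch of a joint-position action space: the policy emits a target at a low rate, and a PD feedback law converts the held target to torques at a higher rate. The robot interface, the rates, and the gains are illustrative assumptions, not values from the study.

```python
# A minimal sketch of the classical decomposition: the policy emits joint-position
# targets at a low rate, and a high-frequency PD loop converts them to torques.
# Rates and gains (50 Hz / 500 Hz, KP, KD) are illustrative assumptions.
POLICY_HZ, CONTROL_HZ = 50, 500
KP, KD = 60.0, 2.0

def pd_torque(q_target, q, qd):
    """Feedback law turning a held position target into a torque command."""
    return KP * (q_target - q) - KD * qd

def rollout_step(policy, robot, obs):
    """One policy step: the same target is tracked for several controller ticks.
    `robot` is a hypothetical interface exposing q, qd, apply_torque(), observe()."""
    q_target = policy(obs)                       # low-rate action
    for _ in range(CONTROL_HZ // POLICY_HZ):     # high-rate inner control loop
        tau = pd_torque(q_target, robot.q, robot.qd)
        robot.apply_torque(tau)                  # advances the physics by 1/CONTROL_HZ s
    return robot.observe()
```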

The Omniverse Isaac Gym suite of robot learning tasks illustrates the complexity of action space implementation with its diverse mix of action types. Motivated by this, our study examines the impact of action space selection across various tasks and robots with distinct dynamics (see Fig. 1).

© Fraunhofer IML, MIT, NVIDIA
Fig. 1: Overview of robot learning tasks in our case study, which features diverse action spaces. We analyze a subset of the Omniverse Isaac Gym suite and the evoBOT and Go1 platforms to guide practitioners in selecting the appropriate action space for new tasks.

Generalized Action Space

We propose a generalized parameterization of action spaces, enabling a unified representation of common configuration-based tasks (see Fig. 2). Conceptually, this parameterization describes the transformation from the policy output, the robot state, and historical information into motor torque commands. While the framework encompasses both task-space and configuration-space action spaces, our study focuses on configuration-space control, i.e., joint positions, velocities, and torques.

We identify common action spaces within this framework. When command and torque updates occur at the same frequency, these action spaces can be represented as linear mappings from state and action to torque. This reveals intuitive relationships among action spaces; for instance, delta position control with a simple integrator can be equivalent to torque control with a damping term, and delta velocity control corresponds closely to torque control.
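
As an illustration of this unification, the following sketch writes several configuration-space action spaces as special cases of one parameterized linear map from action and state to torque. The symbols and default gains are our own illustrative choices, not necessarily the exact parametrization used in the paper.

```python
from dataclasses import dataclass

@dataclass
class ActionSpaceParams:
    """Parameters of a generalized configuration-space action mapping:
        tau = kp * (scale * a + offset - q) - kd * qd + k_tau * a
    Common action spaces fall out as special cases of this linear map."""
    kp: float = 0.0      # position feedback gain
    kd: float = 0.0      # velocity damping gain
    k_tau: float = 0.0   # gain applied directly to the action
    scale: float = 1.0   # action scaling
    offset: float = 0.0  # setpoint offset, e.g. the default joint pose

def action_to_torque(a, q, qd, p: ActionSpaceParams):
    """Map policy action a and joint state (q, qd) to a motor torque command."""
    return p.kp * (p.scale * a + p.offset - q) - p.kd * qd + p.k_tau * a

# Special cases (gain values are placeholders, not tuned values):
position_control = ActionSpaceParams(kp=60.0, kd=2.0)   # a = target joint position
velocity_control = ActionSpaceParams(kd=2.0, k_tau=2.0)  # tau = kd * (a - qd)
torque_control   = ActionSpaceParams(k_tau=10.0)          # a scaled directly to torque
```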

The generalized parametrization also allows the involved parameters to be interpreted as part of the final layer of the policy network. The choice of action space then corresponds to how these parameters are initialized, leading us to ask whether the benefits of a specific action space arise from its architecture (facilitating effective learning) or from proper initialization (enhancing performance).
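
One way to picture this interpretation is to fold the linear low-level controller into the final layer of the policy and choose its initialization accordingly. The PyTorch sketch below initializes the state-dependent columns of the output layer so that the untrained policy behaves like a PD position controller toward the nominal pose; the layer sizes and gains are hypothetical and only serve as an illustration of the idea.

```python
import torch
import torch.nn as nn

class PolicyWithFusedController(nn.Module):
    """Policy whose final linear layer also plays the role of the low-level controller.

    The head maps [hidden features, q, qd] directly to joint torques. Zeroing the
    feature columns and setting the q / qd columns to -kp / -kd makes the untrained
    policy act like a PD controller toward the nominal (zero) pose."""
    def __init__(self, obs_dim, hidden_dim, num_joints, kp=60.0, kd=2.0):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(obs_dim, hidden_dim), nn.ELU())
        self.head = nn.Linear(hidden_dim + 2 * num_joints, num_joints)
        with torch.no_grad():
            self.head.weight.zero_()
            self.head.bias.zero_()
            eye = torch.eye(num_joints)
            self.head.weight[:, hidden_dim:hidden_dim + num_joints] = -kp * eye  # position feedback
            self.head.weight[:, hidden_dim + num_joints:] = -kd * eye            # velocity damping

    def forward(self, obs, q, qd):
        feats = self.backbone(obs)
        return self.head(torch.cat([feats, q, qd], dim=-1))  # torque command

# Usage (shapes are illustrative): a 12-joint robot with a 48-dimensional observation
policy = PolicyWithFusedController(obs_dim=48, hidden_dim=128, num_joints=12)
```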

© Fraunhofer IML & MIT
Fig. 2: Action space illustrative diagram. A. Classical decomposition into policy and low-level controller. B. Generalized parametrization of action spaces, where the low-level controller acts as a linear layer in the learning scheme. C. Differences among action spaces can be viewed in terms of policy architecture and initialization; one space can be pre-trained to replicate another's exploration behavior.

Practical Guidelines for Action Space Selection

Our findings indicate that the choice of action space varies based on the robot and the task. Here are some clear guidelines to help you select the right action space for new robotic tasks:

  1. Impact of Dynamics: The choice of action space affects learning differently in robots with distinct dynamics. Two robots can show opposite trends in action space selection even for the same locomotion objective. Thus, consider how the robot moves and its dynamics when choosing an action space. Use position control if you need to keep the joints near a particular position, and torque control if continuous rotation is required.
  2. Tuning Exploration: Action space selection often involves adjusting exploration behavior. For some tasks, refining the initial exploration can significantly reduce the performance gap between action spaces. Visualizing joint position, velocity, and torque during random rollouts can provide insight (see the sketch after this list). Additionally, starting with imitation learning can help combine behaviors from different action spaces.
  3. Expressive Capacity: Although there can be significant differences in RL performance between action spaces, effective policies can often be adapted across these spaces with similar results. Choose action space gains that enhance learning for your task. If you need to switch settings later, consider using teacher-student learning to facilitate the transition.
  4. Policy Behavior Over Time: The policy’s behavior between timesteps can influence performance, but the effects may vary. We found that tuning high-level and low-level control frequencies is interconnected. Generally, it is advisable to run the policy more frequently without skipping frames for better performance.
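
For guideline 2, a simple way to inspect exploration is to roll out random actions in each candidate action space and compare the resulting joint position, velocity, and torque statistics. The sketch below assumes a classic Gym-style environment whose `info` dictionary exposes these quantities; the keys and the `make_env` factory are hypothetical and would need to be adapted to your simulator.

```python
import numpy as np

def exploration_stats(env, num_steps=1000, seed=0):
    """Roll out uniformly random actions and summarize joint position, velocity, and
    torque magnitudes, to compare how strongly each action space excites the robot."""
    rng = np.random.default_rng(seed)
    env.reset()
    traces = {"joint_pos": [], "joint_vel": [], "joint_torque": []}
    for _ in range(num_steps):
        action = rng.uniform(env.action_space.low, env.action_space.high)
        _, _, done, info = env.step(action)    # classic Gym-style step signature
        for key in traces:
            traces[key].append(info[key])       # assumed info keys; adapt to your simulator
        if done:
            env.reset()
    return {k: {"mean_abs": float(np.mean(np.abs(v))), "std": float(np.std(v))}
            for k, v in traces.items()}

# Compare, e.g., position-control vs. torque-control variants of the same task
# (make_env is a hypothetical factory):
# print(exploration_stats(make_env(action_space="position")))
# print(exploration_stats(make_env(action_space="torque")))
```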

Summary

This study examined how action space selection impacts learning in the Unitree Go1 quadruped, the evoBOT hybrid robot, and the Omniverse Isaac Gym task suite. Our findings reveal that the choice of action space significantly affects learning outcomes, with different robots displaying opposite trends for the same locomotion objectives, underscoring the necessity of customizing action space selection based on each robot’s dynamics.

We also found that adjusting initial exploration behavior can help reduce performance gaps between action spaces for specific tasks. However, this strategy is not universally effective, highlighting the task-dependent nature of exploration tuning. Furthermore, despite performance variations among action spaces, effective policies can be adapted with similar success, indicating that while expressive capacity matters, it does not strictly dictate optimal performance. Lastly, we identified a notable relationship between tuning control frequency and action space representation, emphasizing the complexity of optimizing action spaces concerning temporal control parameters.

Overall, our findings demonstrate that action space selection significantly impacts learning performance in a task-dependent way. The practical implications of action space selection are primarily attributed to improved policy initialization and behavior between timesteps, providing valuable insights for enhancing robotic learning and control across various platforms and tasks.

For more details on the underlying methodology, benchmarking results, and training setups, please refer to the open-access publication presented at the 2024 Conference on Robot Learning (CoRL).

Julian Eßer, Gabriel Margolis

April 9, 2025

Julian Eßer

Julian Eßer is a research associate in Robotics and AI at the Fraunhofer Institute for Material Flow and Logistics in Dortmund. After receiving his B.S. and M.S. (with distinction) in mechanical engineering from the University of Duisburg-Essen, Germany, in 2018 and 2020, respectively, he is currently pursuing his PhD on learning-based control of highly dynamic robots at TU Dortmund. His research focus lies in the field of Embodied AI, […]

Gabriel Margolis

I am a PhD student at the Computer Science and Artificial Intelligence Laboratory (CSAIL) at MIT, advised by Pulkit Agrawal. I study learned control as a component of complete robotic systems. Previously, I received my BS (’20) and MEng (’21) degrees at MIT.
