Efficient Multi-Object Pose Estimation using Multi-Resolution Deformable Attention and Query Aggregation

Object pose estimation is a long-standing problem in computer vision. Recently, attention-based vision transformer models have achieved state-of-the-art results in many computer vision applications. Exploiting the permutation-invariant nature of the attention mechanism, a family of vision transformer models formulate multi-object pose estimation as a set prediction problem. However, existing vision transformer models for multi-object pose estimation rely exclusively on the attention mechanism. Convolutional neural networks, on the other hand, hard-wire various inductive biases into their architecture. In this paper, we investigate incorporating inductive biases in vision transformer models for multi-object pose estimation, which facilitates learning long-range dependencies while circumventing the costly global attention. In particular, we use multi-resolution deformable attention, where the attention operation is performed only between a few deformed reference points. Furthermore, we propose a query aggregation mechanism that enables increasing the number of object queries without increasing the computational complexity. We evaluate the proposed model on the challenging YCB-Video dataset and report state-of-the-art results.

  • Published in:
    IEEE International Conference on Robotic Computing
  • Type:
    Inproceedings
  • Authors:
    Periyasamy, Arul Selvam; Tsaturyan, Vladimir; Behnke, Sven
  • Year:
    2023

Citation information

Periyasamy, Arul Selvam; Tsaturyan, Vladimir; Behnke, Sven: Efficient Multi-Object Pose Estimation using Multi-Resolution Deformable Attention and Query Aggregation, IEEE International Conference on Robotic Computing, November 2023, https://ais.uni-bonn.de/papers/IRC_2023_Periyasamy.pdf

Associated Lamarr Researchers

Prof. Dr. Sven Behnke

Area Chair Embodied AI