SfmOcc: Vision-Based 3D Semantic Occupancy Prediction in Urban Environments

Semantic scene understanding is crucial for autonomous systems, and 3D semantic occupancy prediction is a key task because it provides geometric and possibly semantic information about the vehicle’s surroundings. Most existing vision-based approaches to occupancy estimation rely on 3D voxel labels or segmented LiDAR point clouds for supervision. This restricts them to settings where a 3D LiDAR sensor is available or where voxels have been labeled at high cost. Other approaches train on images alone, but they usually supervise with only a few consecutive images and optimize for proxy tasks such as volume reconstruction or depth prediction. In this paper, we propose a novel method for semantic occupancy prediction that uses only vision data, including for supervision. We leverage all available training images of a sequence and use bundle adjustment to align the images and estimate camera poses, from which we then obtain depth images. We compute semantic maps with a pre-trained open-vocabulary image model and generate occupancy pseudo labels to explicitly optimize for the 3D semantic occupancy prediction task. Without any manual or LiDAR-based labels, our approach predicts full 3D occupancy voxel grids and achieves state-of-the-art results for 3D occupancy prediction among methods trained without labels.
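To make the pseudo-label idea concrete, the following is a minimal sketch of how per-image depth, per-pixel semantic classes, and camera poses could be fused into a labeled voxel grid. It is not the authors' implementation: the pinhole camera model, the majority vote per voxel, the frame dictionary layout, and all function names (`backproject`, `to_world`, `voxel_index`, `pseudo_labels`) are assumptions chosen for illustration, and the abstract does not specify how SfmOcc builds its labels.

```python
import math
from collections import Counter, defaultdict

def backproject(u, v, depth, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) with metric depth into the camera frame."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)

def to_world(p_cam, R, t):
    """Apply an assumed camera-to-world pose: p_world = R * p_cam + t."""
    return tuple(sum(R[i][j] * p_cam[j] for j in range(3)) + t[i] for i in range(3))

def voxel_index(p, origin, voxel_size):
    """Map a 3D point to the integer index of the voxel that contains it."""
    return tuple(int(math.floor((p[i] - origin[i]) / voxel_size)) for i in range(3))

def pseudo_labels(frames, intrinsics, origin, voxel_size, grid_shape):
    """Fuse posed depth and semantic images into {voxel index: class id}.

    frames: list of dicts with hypothetical keys 'depth' and 'semantics'
    (2D lists of equal size), 'R' (3x3 rotation), and 't' (translation).
    Each voxel receives the class that most pixels voted for.
    """
    fx, fy, cx, cy = intrinsics
    votes = defaultdict(Counter)
    for frame in frames:
        rows = zip(frame["depth"], frame["semantics"])
        for v, (depth_row, sem_row) in enumerate(rows):
            for u, (d, cls) in enumerate(zip(depth_row, sem_row)):
                if d <= 0:  # skip pixels without a valid depth estimate
                    continue
                p_cam = backproject(u, v, d, fx, fy, cx, cy)
                idx = voxel_index(to_world(p_cam, frame["R"], frame["t"]),
                                  origin, voxel_size)
                # keep only points that fall inside the voxel grid
                if all(0 <= idx[i] < grid_shape[i] for i in range(3)):
                    votes[idx][cls] += 1
    return {idx: counter.most_common(1)[0][0] for idx, counter in votes.items()}
```

In this sketch only voxels hit by at least one back-projected pixel receive a label; how free space and unobserved regions are handled is left open here.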

Published in:
IEEE Robotics and Automation Letters
Type:
Article
Authors:
Marcuzzi, Rodrigo; Nunes, Lucas; Marks, Elias; Wiesmann, Louis; Läbe, Thomas; Behley, Jens; Stachniss, Cyrill
Year:
2025
Source:
https://ieeexplore.ieee.org/document/10947319/authors

Citation information

Marcuzzi, Rodrigo; Nunes, Lucas; Marks, Elias; Wiesmann, Louis; Läbe, Thomas; Behley, Jens; Stachniss, Cyrill: SfmOcc: Vision-Based 3D Semantic Occupancy Prediction in Urban Environments, IEEE Robotics and Automation Letters, vol. 10, no. 5, pp. 5074-5081, May 2025, https://ieeexplore.ieee.org/document/10947319/authors


Associated Lamarr researchers

Prof. Dr. Cyrill Stachniss