SfmOcc: Vision-Based 3D Semantic Occupancy Prediction in Urban Environments
Semantic scene understanding is crucial for autonomous systems, and 3D semantic occupancy prediction is a key task since it provides geometric and possibly semantic information about the vehicle’s surroundings. Most existing vision-based approaches to occupancy estimation rely on 3D voxel labels or segmented LiDAR point clouds for supervision. This restricts their application to settings where a 3D LiDAR sensor is available or where the voxels have been labeled at high cost. While other approaches rely only on images for training, they usually supervise with only a few consecutive images and optimize for proxy tasks like volume reconstruction or depth prediction. In this paper, we propose a novel method for semantic occupancy prediction that uses only vision data, including for supervision. We leverage all the available training images of a sequence and use bundle adjustment to align the images and estimate camera poses, from which we then obtain depth images. We compute semantic maps from a pre-trained open-vocabulary image model and generate occupancy pseudo labels to explicitly optimize for the 3D semantic occupancy prediction task. Without any manual or LiDAR-based labels, our approach predicts full 3D occupancy voxel grids and achieves state-of-the-art results for 3D occupancy prediction among methods trained without labels.
- Published in: IEEE Robotics and Automation Letters
- Type: Article
- Authors: Marcuzzi, Rodrigo; Nunes, Lucas; Marks, Elias; Wiesmann, Louis; Läbe, Thomas; Behley, Jens; Stachniss, Cyrill
- Year: 2025
- Source: https://ieeexplore.ieee.org/document/10947319/authors
Citation information
Marcuzzi, Rodrigo; Nunes, Lucas; Marks, Elias; Wiesmann, Louis; Läbe, Thomas; Behley, Jens; Stachniss, Cyrill: SfmOcc: Vision-Based 3D Semantic Occupancy Prediction in Urban Environments. IEEE Robotics and Automation Letters, vol. 10, no. 5, pp. 5074-5081, May 2025. https://ieeexplore.ieee.org/document/10947319/authors
@Article{Marcuzzi.etal.2025a,
author={Marcuzzi, Rodrigo and Nunes, Lucas and Marks, Elias and Wiesmann, Louis and Läbe, Thomas and Behley, Jens and Stachniss, Cyrill},
title={{SfmOcc}: Vision-Based 3D Semantic Occupancy Prediction in Urban Environments},
journal={{IEEE} Robotics and Automation Letters},
volume={10},
number={5},
pages={5074--5081},
month={May},
url={https://ieeexplore.ieee.org/document/10947319/authors},
year={2025},
abstract={Semantic scene understanding is crucial for autonomous systems and 3D semantic occupancy prediction is a key task since it provides geometric and possibly semantic information of the vehicle's surroundings. Most existing vision-based approaches to occupancy estimation rely on 3D voxel labels or segmented {LiDAR} point clouds for supervision. This limits their application to the availability of a...}}