VideoPCDNet: Video Parsing and Prediction with Phase Correlation Networks
Understanding and predicting video content is essential for planning and reasoning in dynamic environments. Despite advancements, unsupervised learning of object representations and dynamics remains challenging. We present VideoPCDNet, an unsupervised framework for object-centric video decomposition and prediction. Our model uses frequency-domain phase correlation techniques to recursively parse videos into object components, which are represented as transformed versions of learned object prototypes, enabling accurate and interpretable tracking. By explicitly modeling object motion through a combination of frequency domain operations and lightweight learned modules, VideoPCDNet enables accurate unsupervised object tracking and prediction of future video frames. In our experiments, we demonstrate that VideoPCDNet outperforms multiple object-centric baseline models for unsupervised tracking and prediction on several synthetic datasets, while learning interpretable object and motion representations.
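For readers unfamiliar with phase correlation, the NumPy sketch below shows only the textbook form of the technique the abstract builds on: it estimates the integer translation between a hypothetical object prototype and a frame from the normalized cross-power spectrum, then re-renders the prototype at that offset with the Fourier shift theorem. The function names `phase_correlation_shift` and `fourier_translate` are hypothetical, and this is not the authors' implementation, which additionally uses learned object prototypes, lightweight learned modules, and recursive parsing.

```python
import numpy as np


def phase_correlation_shift(reference: np.ndarray, target: np.ndarray) -> tuple[int, int]:
    """Estimate the integer (dy, dx) circular shift that maps `reference` onto `target`.

    Textbook phase correlation (not VideoPCDNet's code): the normalized
    cross-power spectrum of two translated images is a pure phase ramp,
    and its inverse FFT is a sharp peak located at the shift.
    """
    F_ref = np.fft.fft2(reference)
    F_tgt = np.fft.fft2(target)
    cross_power = F_tgt * np.conj(F_ref)
    # Keep only phase information; the epsilon guards against division by zero.
    cross_power /= np.abs(cross_power) + 1e-8
    correlation = np.real(np.fft.ifft2(cross_power))
    dy, dx = np.unravel_index(np.argmax(correlation), correlation.shape)
    # Peaks past the midpoint correspond to negative shifts (FFT wrap-around).
    h, w = reference.shape
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)


def fourier_translate(image: np.ndarray, dy: float, dx: float) -> np.ndarray:
    """Circularly translate `image` by (dy, dx) via the Fourier shift theorem.

    Multiplying the spectrum by a linear phase ramp shifts the image; for
    integer shifts this matches np.roll, and it also accepts fractional shifts.
    """
    h, w = image.shape
    ky = np.fft.fftfreq(h)[:, None]  # vertical frequencies in cycles per pixel
    kx = np.fft.fftfreq(w)[None, :]  # horizontal frequencies in cycles per pixel
    ramp = np.exp(-2j * np.pi * (ky * dy + kx * dx))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * ramp))


# Usage: a random "prototype" and a frame containing it at a shifted position.
rng = np.random.default_rng(0)
prototype = rng.random((32, 32))
frame = np.roll(prototype, shift=(3, -5), axis=(0, 1))

shift = phase_correlation_shift(prototype, frame)
print(shift)  # (3, -5)
reconstruction = fourier_translate(prototype, *shift)
print(np.allclose(reconstruction, frame))  # True
```

Expressing the translation as a spectral phase ramp, rather than as an index shift, is one plausible reason frequency-domain operations combine well with learned modules: the ramp is a smooth function of (dy, dx) and extends naturally to sub-pixel motion.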
- Published in: arXiv
- Type: Article
- Authors: Vicente, Noel José Rodrigues; Lehner, Enrique; Villar-Corrales, Angel; Nogga, Jan; Behnke, Sven
- Year: 2025
- Source: http://arxiv.org/abs/2506.19621
Citation information
VideoPCDNet: Video Parsing and Prediction with Phase Correlation Networks. arXiv:2506.19621, June 2025. http://arxiv.org/abs/2506.19621 (BibTeX key: Vicente.etal.2025a)
@Article{Vicente.etal.2025a,
author={Vicente, Noel José Rodrigues and Lehner, Enrique and Villar-Corrales, Angel and Nogga, Jan and Behnke, Sven},
title={{VideoPCDNet}: Video Parsing and Prediction with Phase Correlation Networks},
journal={arXiv},
number={{arXiv}:2506.19621},
month={June},
publisher={{arXiv}},
url={http://arxiv.org/abs/2506.19621},
year={2025},
abstract={Understanding and predicting video content is essential for planning and reasoning in dynamic environments. Despite advancements, unsupervised learning of object representations and dynamics remains challenging. We present {VideoPCDNet}, an unsupervised framework for object-centric video decomposition and prediction. Our model uses frequency-domain phase correlation techniques to recursively...}}