Joint Optimization for 4D Human-Scene Reconstruction in the Wild
Zhizheng Liu, Joe Lin, Wayne Wu, Bolei Zhou
University of California, Los Angeles
TL;DR: Jointly optimizing metric-scale scene reconstruction and human motion improves accuracy on both tasks.
Interactive Demos on Web Videos
Overview
We aim to capture human-scene interactions in the wild by reconstructing both the 4D global human motion and the 3D scene from monocular videos. We propose JOSH (Joint Optimization of Scene Geometry and Human Motion), a novel method that jointly optimizes human motion and scene geometry under human-scene contact constraints. We further propose an efficient model variant, JOSH3R, that estimates the human-scene reconstruction in real time, greatly speeding up web video processing. Experimental results show that JOSH achieves better accuracy than other methods for both global human motion estimation and dense scene reconstruction.
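To make the idea of coupling scene and motion through contact concrete, here is a minimal toy sketch in Python with NumPy. It is not the JOSH objective, which this page does not spell out. It assumes a heavily simplified setting in which the only unknowns are one global scene scale and one root translation per frame, and in which each frame has a single foot joint in contact with a known scene point. The function name `joint_fit`, its parameters, and the weights are hypothetical and chosen for illustration only.

```python
import numpy as np


def joint_fit(scene_pts, foot_offsets, root_init, w_contact=1.0, w_prior=0.1):
    """Toy joint solve for a scene scale s and per-frame root translations t_i.

    Assumed (illustrative) objective, linear in the unknowns:
        sum_i w_contact^2 * ||s * p_i - (t_i + h_i)||^2   (contact term)
      + sum_i w_prior^2   * ||t_i - t0_i||^2              (motion prior term)
    where p_i is an up-to-scale scene point, h_i a metric foot offset from
    the root, and t0_i an initial root estimate.
    """
    n = scene_pts.shape[0]
    A = np.zeros((6 * n, 1 + 3 * n))  # unknowns: [s, t_1 (xyz), ..., t_n (xyz)]
    b = np.zeros(6 * n)
    for i in range(n):
        for d in range(3):
            col = 1 + 3 * i + d
            # Contact row: w_contact * (s * p_i[d] - t_i[d]) = w_contact * h_i[d]
            r = 6 * i + d
            A[r, 0] = w_contact * scene_pts[i, d]
            A[r, col] = -w_contact
            b[r] = w_contact * foot_offsets[i, d]
            # Prior row: w_prior * t_i[d] = w_prior * t0_i[d]
            r = 6 * i + 3 + d
            A[r, col] = w_prior
            b[r] = w_prior * root_init[i, d]
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x[0], x[1:].reshape(n, 3)


# Synthetic, noise-free example: the scene is observed at an unknown scale.
rng = np.random.default_rng(0)
true_scale = 2.5
true_roots = rng.normal(size=(8, 3))
foot_offsets = rng.normal(scale=0.1, size=(8, 3))
scene_pts = (true_roots + foot_offsets) / true_scale

scale, roots = joint_fit(scene_pts, foot_offsets, root_init=true_roots)
print("recovered scale:", scale)
```

In this toy setup the contact rows are what tie the metric-scale human to the up-to-scale scene, so solving for both together fixes the scene scale. A real system would also have to optimize camera poses, dense geometry, and body pose, which this sketch leaves out.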
Results on Datasets
Evaluation on Global Human Motion Estimation with the EMDB Dataset
Evaluation on Global Camera Trajectory Estimation with the SLOPER4D Dataset
Evaluation on 4D Human-Scene Reconstruction with the RICH Dataset
Related Work
Zhizheng Liu, Joe Lin, Wayne Wu, Bolei Zhou. Learning to Generate Diverse Pedestrian Movements from Web Videos with Noisy Labels. Preprint (arXiv), 2024.
Comment: This work proposes a dataset and a model for context-aware pedestrian movement generation from pseudo-labels of web videos. JOSH can be used to extract higher-quality human and scene labels for pedestrian movement generation.
Bardienus Duisterhof, Lojze Zust, Philippe Weinzaepfel, Vincent Leroy, Yohann Cabon, Jérôme Revaud. MASt3R-SfM: a Fully-Integrated Solution for Unconstrained Structure-from-Motion. Preprint (arXiv), 2024.
Comment: This work proposes an efficient and robust pipeline for dense scene reconstruction from an unconstrained collection of images. In our implementation of JOSH, we use its results as the initialization of the local scene reconstruction.