TL;DR: Jointly optimizing metric-scale scene reconstruction and human motion improves accuracy on both tasks.
Interactive Demos on Web Videos
Overview
We aim to capture human-scene interactions in the wild by reconstructing both the 4D global human motion and the 3D scene from monocular videos. We propose a novel method,
JOSH (Joint Optimization of Scene Geometry and Human Motion), that jointly optimizes human motion and scene geometry
under human-scene contact constraints. We further propose an efficient model variant, JOSH3R, that estimates the human-scene
reconstruction in real time, greatly speeding up web video processing. Experimental results show that JOSH achieves better
accuracy than other methods on both global human motion estimation and dense scene reconstruction.
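To give a rough sense of why contact constraints couple the two problems, here is a minimal toy sketch. It is not JOSH's actual objective or code. It assumes the scene is known only up to an unknown scale `s`, the human contact joints are in metric units but shifted by an unknown offset `t`, and every contact joint should coincide with its matched scene point. The function name `fit_scale_and_offset` and the arrays `scene_pts` and `contact_pts` are hypothetical names introduced for illustration only.

```python
import numpy as np

def fit_scale_and_offset(scene_pts, contact_pts):
    """Toy contact alignment (an illustrative assumption, not the JOSH objective).

    Solves  min_{s, t}  sum_i || s * scene_pts[i] - (contact_pts[i] + t) ||^2
    in closed form with linear least squares.

    scene_pts:   (N, 3) up-to-scale scene points at the contact locations
    contact_pts: (N, 3) metric human contact joints (e.g. feet), offset by t
    """
    scene_pts = np.asarray(scene_pts, dtype=float)
    contact_pts = np.asarray(contact_pts, dtype=float)
    n = scene_pts.shape[0]

    # Unknowns x = [s, tx, ty, tz]. Each coordinate of each contact gives
    # one linear equation:  s * p_i[k] - t[k] = c_i[k].
    A = np.zeros((3 * n, 4))
    A[:, 0] = scene_pts.reshape(-1)
    A[:, 1:] = -np.tile(np.eye(3), (n, 1))
    b = contact_pts.reshape(-1)

    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x[0], x[1:]
```

In this toy setting, a handful of non-coincident contacts is enough to pin down both the scene scale and the human offset at once, which is the intuition behind optimizing the two together. A real system would also have to handle camera poses, per-frame body pose, and noisy contact detection.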
Results on Datasets
Evaluation on Global Human Motion Estimation with the EMDB Dataset
Evaluation on Global Camera Trajectory Estimation with the SLOPER4D Dataset
Evaluation on 4D Human-Scene Reconstruction with the RICH Dataset
BibTeX
@article{alias,
title = {},
author = {},
journal = {},
year = {}
}
Comment: This work proposes a dataset and a model for context-aware pedestrian movement generation from pseudo-labels of web videos. We can use JOSH to extract higher-quality human and scene labels for pedestrian movement generation.