We address the new task of context-aware pedestrian movement
generation from web videos, which has crucial applications such as motion forecasting and scene simulation.
To support this task, we curate CityWalkers, a new large-scale real-world pedestrian movement dataset with
pseudo-labels of diverse pedestrian movements and motion contexts.
We further propose PedGen, a context-aware generative model that handles label noise and models various
motion contexts to generate diverse pedestrian movements.
We hope this study will present new opportunities and facilitate future research on modeling pedestrian movements in real-world settings.
CityWalkers: Capturing Diverse Real-World Pedestrian Movements
Existing human motion datasets rarely capture natural pedestrian movements, lack diversity in scenes and human subjects,
and do not provide the critical context factors of pedestrian movements, such as surrounding environments, individual characteristics,
and route destinations. To support the task of context-aware pedestrian movement generation, we construct CityWalkers,
a large-scale dataset with real-world pedestrian movements in diverse urban environments annotated by pseudo-labeling techniques.
Our data source consists of 30.8 hours of high-quality web videos of walks through cities worldwide, posted by content creators on YouTube. CityWalkers includes 120,914
pedestrians and 16,215 scenes across 227 cities and 49 countries, making it the most diverse human
motion dataset in terms of scene context and human subjects.
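To make the annotation format concrete, below is a minimal sketch of what a single CityWalkers sample could contain. The field names and shapes are illustrative assumptions for exposition, not the released schema.

# Hypothetical per-pedestrian record; all names and shapes are assumptions.
from dataclasses import dataclass
import numpy as np

@dataclass
class PedestrianSample:
    video_id: str              # source YouTube video identifier
    city: str                  # e.g., "Tokyo"
    smpl_pose: np.ndarray      # (T, 72) per-frame SMPL pose pseudo-labels
    betas: np.ndarray          # (10,) SMPL body shape (individual characteristics)
    trajectory: np.ndarray     # (T, 3) global root translations in the scene
    goal: np.ndarray           # (3,) route destination (final root position)
    scene_depth: np.ndarray    # (H, W) depth map of the surrounding environment
    label_mask: np.ndarray     # (T,) 1 where the pseudo-label is valid, else 0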
PedGen: Context-Aware Pedestrian Movement Generation
PedGen is a diffusion-based generative model and the first
method for the new task of context-aware pedestrian movement generation.
To mitigate the anomalous and incomplete labels produced by pseudo-labeling, PedGen adopts a data iteration strategy that automatically identifies and removes low-quality labels from the dataset, together with a motion mask embedding to train with partial labels.
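For concreteness, the following Python sketch illustrates both mechanisms under assumed interfaces: per_sample_loss and retrain are hypothetical stand-ins for PedGen's actual training loop, and scoring label quality by per-sample loss is an assumption rather than the paper's exact criterion.

# Sketch of the data iteration strategy and the masked training objective.
# per_sample_loss and retrain are hypothetical callables (assumptions).
import torch

def iterate_data(dataset, per_sample_loss, retrain, prune_ratio=0.1, rounds=2):
    """Score every sample, drop the highest-loss fraction (suspected
    anomalous pseudo-labels), then retrain on the cleaned set."""
    for _ in range(rounds):
        losses = torch.tensor([per_sample_loss(s) for s in dataset])
        keep = losses.argsort()[: int(len(dataset) * (1 - prune_ratio))]
        dataset = [dataset[i] for i in keep.tolist()]  # remove low-quality labels
        retrain(dataset)                               # retrain on cleaned data
    return dataset

def masked_motion_loss(pred, target, mask):
    """Train with partial labels: supervise only frames whose pseudo-label is
    valid. pred/target: (T, D) motion features; mask: (T,) validity flags."""
    per_frame = ((pred - target) ** 2).mean(dim=-1)    # (T,) per-frame error
    return (per_frame * mask).sum() / mask.sum().clamp(min=1)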
To model the key context factors, PedGen takes the surrounding environment, individual characteristics, and goal points as input conditions to generate realistic, long-term pedestrian movements in urban scenes.
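To illustrate how such conditions could enter a diffusion model, the sketch below runs standard DDPM ancestral sampling over a motion sequence given a context vector. The denoiser eps_model and the way the three context factors are encoded are assumptions for exposition, not PedGen's released implementation.

# Context-conditioned DDPM sampling sketch; eps_model(x, t, ctx) is a
# hypothetical denoiser, and T/D are assumed sequence/feature sizes.
import torch

@torch.no_grad()
def sample_motion(eps_model, ctx, T=60, D=135, steps=1000):
    betas = torch.linspace(1e-4, 0.02, steps)      # standard DDPM noise schedule
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    x = torch.randn(T, D)                          # start from Gaussian noise
    for t in reversed(range(steps)):
        eps = eps_model(x, t, ctx)                 # noise prediction given context
        coef = betas[t] / (1.0 - alpha_bar[t]).sqrt()
        mean = (x - coef * eps) / alphas[t].sqrt() # DDPM posterior mean
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + betas[t].sqrt() * noise
    return x                                       # generated (T, D) movement

# The context vector could concatenate encodings of the three factors, e.g.:
#   ctx = torch.cat([scene_enc(depth), char_enc(shape), goal_enc(goal)], dim=-1)
# where scene_enc, char_enc, and goal_enc are hypothetical encoders.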
Results
First Row: Generation Results on CityWalkers.
Second Row: Zero-shot Generation Results on the Waymo dataset.
Third Row: Zero-shot Generation Results in simulated environments in CARLA.
Video Qualitative Results:
Application: Real-World Pedestrian Movement Forecasting
PedGen can predict future movements of real-world pedestrians and combine the results with 3D Gaussian Splatting to render realistic predictions.
Application: Populating Urban Scenes in Simulation
PedGen can populate urban scenes with diverse and realistic pedestrian movements.
BibTeX
@article{liu2024learning,
  title={Learning to Generate Diverse Pedestrian Movements from Web Videos with Noisy Labels},
  author={Liu, Zhizheng and Lin, Joe and Wu, Wayne and Zhou, Bolei},
  journal={arXiv preprint arXiv:2410.07500},
  year={2024}
}
Note: we use a state-of-the-art method for 4D human motion estimation from in-the-wild videos to extract pedestrian movement pseudo-labels for CityWalkers.