Learning to Generate Diverse Pedestrian Movements from Web Videos with Noisy Labels
Zhizheng Liu, Joe Lin, Wayne Wu, Bolei Zhou
University of California, Los Angeles
Overview
We address the new task of context-aware pedestrian movement generation from web videos, which has crucial applications such as motion forecasting and scene simulation. To support the task, we curate CityWalkers, a new large-scale real-world pedestrian movement dataset with pseudo-labels for diverse pedestrian movements and motion contexts. We further propose PedGen, a context-aware generative model that handles label noise and models various motion contexts to generate diverse pedestrian movements. We hope this study presents new opportunities and facilitates future research on modeling pedestrian movements in real-world settings.
CityWalkers: Capturing Diverse Real-World Pedestrian Movements
Existing human motion datasets rarely capture natural pedestrian movements, lack diversity in scenes and human subjects, and do not provide the critical context factors behind pedestrian movements, such as the surrounding environment, individual characteristics, and route destinations. To support the task of context-aware pedestrian movement generation, we construct CityWalkers, a large-scale dataset of real-world pedestrian movements in diverse urban environments annotated via pseudo-labeling. Our data source consists of 30.8 hours of high-quality web videos of walking in cities worldwide, posted by content creators on YouTube. It covers 120,914 pedestrians and 16,215 scenes across 227 cities and 49 countries, making CityWalkers the most diverse human motion dataset in terms of scene context and human subjects.
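To make the annotation format concrete, below is a minimal sketch of how one pseudo-labeled CityWalkers sample could be organized. All field names and shapes here are illustrative assumptions for exposition, not the released data format.

from dataclasses import dataclass
import numpy as np

# Illustrative layout of one pseudo-labeled training sample; field names,
# shapes, and the use of body-shape parameters are assumptions, not the
# released CityWalkers format.
@dataclass
class PedestrianSample:
    motion: np.ndarray       # (T, D) pseudo-labeled pose and root-translation sequence
    motion_mask: np.ndarray  # (T,) validity mask; False where pseudo-labels are missing
    scene: np.ndarray        # local environment context around the pedestrian
    betas: np.ndarray        # (10,) individual body-shape characteristics
    goal: np.ndarray         # (3,) route destination (final root position)
    city: str                # source-city metadata from the web video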
PedGen: Generating Context-Aware Pedestrian Movements
PedGen is a diffusion-based generative model and the first method for the new task of context-aware pedestrian movement generation. To mitigate anomalous and incomplete labels from pseudo-labeling, PedGen adopts a data iteration strategy that automatically identifies and removes low-quality labels from the dataset, together with a motion mask embedding for training with partial labels. To model the important context factors, PedGen takes the surrounding environment, individual characteristics, and goal points as input conditions to generate realistic and long-term pedestrian movements in urban scenes.
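Since PedGen is diffusion-based, the overall generation recipe can be pictured as standard reverse diffusion over a motion sequence, conditioned on the encoded context. The sketch below illustrates that recipe under stated assumptions; the step count, tensor shapes, and denoiser interface are placeholders, not the released PedGen implementation.

import torch

# Minimal sketch of context-conditioned diffusion sampling for pedestrian
# movements (standard DDPM reverse process). The schedule, shapes, and
# denoiser interface are illustrative assumptions.
T = 50                                    # assumed number of diffusion steps
betas = torch.linspace(1e-4, 0.02, T)     # linear noise schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

@torch.no_grad()
def sample_movement(denoiser, context, seq_len=60, motion_dim=69):
    """Start from Gaussian noise and iteratively denoise a motion sequence,
    conditioned on scene, individual characteristics, and goal point."""
    x = torch.randn(1, seq_len, motion_dim)
    for t in reversed(range(T)):
        t_batch = torch.full((1,), t, dtype=torch.long)
        eps = denoiser(x, t_batch, context)  # predicted noise given context
        a, ab = alphas[t], alpha_bars[t]
        mean = (x - (1.0 - a) / torch.sqrt(1.0 - ab) * eps) / torch.sqrt(a)
        if t > 0:
            x = mean + torch.sqrt(betas[t]) * torch.randn_like(x)
        else:
            x = mean
    return x  # one generated pedestrian movement for this context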
Results
First Row: Generation results on CityWalkers.
Second Row: Zero-shot generation results on the Waymo dataset.
Third Row: Zero-shot generation results in simulated CARLA environments.
Qualitative Video Results:
Application: Real-World Pedestrian Movement Forecasting
PedGen can predict future movements of real-world pedestrians and combine the results with 3D Gaussian Splatting to render realistic predictions.
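As a hypothetical usage sketch, forecasting can reuse the sampler sketched above: encode the pedestrian's current context, sample a future sequence, and hand the result to a renderer. The placeholder tensors and the stub denoiser below stand in for features extracted from real video and a trained network; none of this is the released interface.

import torch

# Hypothetical forecasting usage, reusing sample_movement from the sketch
# above. Placeholders stand in for features extracted from a real video.
scene_features = torch.zeros(1, 128)             # encoded surroundings (placeholder)
shape_params = torch.zeros(1, 10)                # estimated characteristics (placeholder)
inferred_goal = torch.tensor([[2.0, 0.0, 5.0]])  # assumed destination

context = {"scene": scene_features, "betas": shape_params, "goal": inferred_goal}

def denoiser(x, t, context):  # stub standing in for a trained noise predictor
    return torch.zeros_like(x)

future = sample_movement(denoiser, context, seq_len=60)
# The predicted sequence can then be posed inside a 3D Gaussian Splatting
# reconstruction of the scene to render the forecast photorealistically.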
Application: Populating Urban Scenes in Simulation
PedGen can populate urban scenes with diverse and realistic pedestrian movements.
BibTeX
@article{liu2024learning,
  title={Learning to Generate Diverse Pedestrian Movements from Web Videos with Noisy Labels},
  author={Liu, Zhizheng and Lin, Joe and Wu, Wayne and Zhou, Bolei},
  journal={arXiv preprint arXiv:2410.07500},
  year={2024}
}
Related Work
Soyong Shin, Juyong Kim, Eni Halilaj, Michael J. Black. WHAM: Reconstructing World-grounded Humans with Accurate 3D Motion. CVPR 2024.
Comment: State-of-the-art for 4D human motion estimation from in-the-wild videos. We use it to extract pedestrian movement pseudo-labels for CityWalkers.
Lan Feng, Quanyi Li, Zhenghao Peng, Shuhan Tan, Bolei Zhou. TrafficGen: Learning to Generate Diverse and Realistic Traffic Scenarios. ICRA 2023.
Comment: This work can generate realistic traffic scenarios from real-world data, which can be combined with PedGen to simulate urban environments.