Learning to Generate Diverse Pedestrian Movements from Web Videos with Noisy Labels
Zhizheng Liu  Joe Lin  Wayne Wu  Bolei Zhou
University of California, Los Angeles
Overview
We address a new task of context-aware pedestrian movement generation from web videos, which has crucial applications like motion forecasting and scene simulation. To support the task, we curate CityWalkers, a new large-scale real-world pedestrian movement dataset with pseudo-labels of diverse pedestrian movements and motion contexts. We further propose PedGen, a context-aware generative model that handles label noise and models various motion contexts to generate diverse pedestrian movements. We hope this study presents new opportunities and facilitates future research on modeling pedestrian movements in real-world settings.
CityWalkers: Capturing Diverse Real-World Pedestrian Movements
Existing human motion datasets rarely capture natural pedestrian movements, lack diversity in scenes and human subjects, and do not provide the critical context factors of pedestrian movements, such as the surrounding environment, individual characteristics, and route destinations. To support the task of context-aware pedestrian movement generation, we construct CityWalkers, a large-scale dataset of real-world pedestrian movements in diverse urban environments annotated with pseudo-labeling techniques. Our data source consists of 30.8 hours of high-quality web videos of city walks around the world posted by content creators on YouTube, covering 120,914 pedestrians and 16,215 scenes across 227 cities and 49 countries, making CityWalkers the most diverse human motion dataset in terms of scene context and human subjects.
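For illustration, one pseudo-labeled pedestrian in CityWalkers could be organized as the record below. This is a minimal sketch: the class and field names (CityWalkersSample, smpl_params, goal_point, etc.) are hypothetical placeholders for the kinds of labels described above, not the actual dataset schema.

from dataclasses import dataclass
import numpy as np

@dataclass
class CityWalkersSample:
    """Hypothetical record for one pseudo-labeled pedestrian (illustrative only)."""
    video_id: str              # source YouTube video the clip was taken from
    smpl_params: np.ndarray    # (T, D) per-frame body pose/shape pseudo-labels
    trajectory: np.ndarray     # (T, 3) world-grounded root positions over T frames
    goal_point: np.ndarray     # (3,) route destination extracted from the clip
    scene_context: np.ndarray  # representation of the surrounding environment
    valid_mask: np.ndarray     # (T,) bool; False where the pseudo-label is missing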
PedGen: Generating Context-Aware Pedestrian Movements
PedGen is a diffusion-based generative model and the first method for the new task of context-aware pedestrian movement generation. To mitigate anomalous and incomplete labels from pseudo-labeling techniques, PedGen adopts a data iteration strategy to automatically identify and remove low-quality labels from the dataset, and a motion mask embedding to train with partial labels. To model the important context factors, PedGen considers the surrounding environment, individual characteristics, and goal points as input conditions to generate realistic and long-term pedestrian movements in urban scenes.
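As a concrete illustration of the data iteration idea, the sketch below trains a model, scores every pseudo-label by reconstruction error, prunes the highest-error labels, and retrains on the cleaned set. All interfaces here (train_on, reconstruction_error) are assumed for illustration and are not PedGen's actual API.

import numpy as np

def iterate_dataset(model, samples, num_rounds=2, keep_quantile=0.9):
    """Minimal sketch of a data iteration loop for removing low-quality labels.

    Assumes a model exposing train_on() and a per-sample
    reconstruction_error(); both are hypothetical interfaces.
    """
    for _ in range(num_rounds):
        model.train_on(samples)  # fit on the current (possibly noisy) labels
        errors = np.array([model.reconstruction_error(s) for s in samples])
        threshold = np.quantile(errors, keep_quantile)
        # Keep only labels the model can reconstruct well; drop likely outliers.
        samples = [s for s, e in zip(samples, errors) if e <= threshold]
    return model, samples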
Results
First Row: Generation Results on CityWalkers.
Second Row: Zero-shot Generation Results on the Waymo dataset.
Third Row: Zero-shot Generation Results in simulated environments in CARLA.
Video Qualitative Results:
Application: Real-World Pedestrian Movement Forecasting
PedGen can predict future movements of real-world pedestrians and combine the results with 3D Gaussian Splatting to render realistic predictions.
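A minimal sketch of this forecasting pipeline is shown below, assuming hypothetical generate() and render() interfaces for the generator and the Gaussian Splatting renderer; it only illustrates the data flow, not the actual implementation.

def forecast_and_render(pedgen, splat_renderer, past_motion, scene, goal=None):
    """Illustrative pipeline (all interfaces assumed): generate future
    movement conditioned on context, then render each predicted pose with
    a pre-built 3D Gaussian Splatting scene."""
    future_poses = pedgen.generate(context=scene, history=past_motion, goal=goal)
    frames = [splat_renderer.render(pose) for pose in future_poses]
    return frames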
Application: Populating Urban Scenes in Simulation
PedGen can populate urban scenes with diverse and realistic pedestrian movements.
BibTeX
@article{liu2024learning,
  title={Learning to Generate Diverse Pedestrian Movements from Web Videos with Noisy Labels},
  author={Liu, Zhizheng and Lin, Joe and Wu, Wayne and Zhou, Bolei},
  journal={arXiv preprint arXiv:2410.07500},
  year={2024}
}
Related Work
Soyong Shin, Juyong Kim, Eni Halilaj, Michael J. Black. WHAM: Reconstructing World-grounded Humans with Accurate 3D Motion. CVPR 2024.
Comment: State-of-the-art for 4D human motion estimation from in-the-wild videos. We use it to extract pedestrian movement pseudo-labels for CityWalkers.
Lan Feng, Quanyi Li, Zhenghao Peng, Shuhan Tan, Bolei Zhou. TrafficGen: Learning to Generate Diverse and Realistic Traffic Scenarios. ICRA 2023.
Comment: This work can generate realistic traffic scenarios from real-world data, which can be combined with PedGen to simulate urban environments.