Towards Smooth Video Composition
Qihang Zhang 1, Ceyuan Yang 2, Yujun Shen 3, Yinghao Xu 1, Bolei Zhou 4
1 The Chinese University of Hong Kong, 2 Shanghai AI Laboratory,
3 Ant Group, 4 University of California, Los Angeles

Generated videos from our method.

Overview: A new baseline in video generation
Video generation requires synthesizing consistent and persistent frames with dynamic content over time. This work investigates modeling the temporal relations for composing videos of arbitrary length, from a few frames to even infinitely many, using generative adversarial networks (GANs). We evaluate our approach on various datasets and show substantial improvements over video generation baselines.
Short range: Alias-free property

Texture sticking appears in videos generated by StyleGAN-V, where textures stick to fixed image coordinates.

To better illustrate this, we track the pixels at certain coordinates as the video continues; the brush-stroke effect in the left part of the following video shows that these pixels barely move.

In contrast, our approach achieves smoother frame transitions.
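The fixed-coordinate tracking described above can be sketched as follows. This is a minimal NumPy illustration (the toy video and the `pixel_variation` helper are hypothetical, not the paper's actual evaluation code): a pixel whose intensity barely varies over time while the scene moves is evidence of texture sticking.

```python
import numpy as np

def pixel_variation(video, coords):
    """Temporal standard deviation of pixel intensities at fixed (y, x) coordinates.

    video: array of shape (T, H, W), values in [0, 1].
    coords: list of (y, x) tuples to track.
    Returns one scalar per coordinate; a value near zero means the pixel
    barely changes over time, i.e. the texture "sticks" at that location.
    """
    video = np.asarray(video, dtype=np.float64)
    return [float(np.std(video[:, y, x])) for y, x in coords]

# Toy video: 8 frames of 4x4 pixels; the left half is static ("stuck"),
# the right half brightens a little every frame (moving content).
T, H, W = 8, 4, 4
video = np.zeros((T, H, W))
video[:, :, W // 2:] = np.arange(T)[:, None, None] / (T - 1)

stuck, moving = pixel_variation(video, [(1, 0), (1, 3)])
print(stuck, moving)  # the static pixel shows ~0 temporal variation
```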

Mid range: Explicit temporal reasoning in the discriminator

We incorporate a Temporal Shift Module (TSM) into the discriminator to perform explicit temporal reasoning.

As shown in the following chart, the performance improves greatly with the TSM. (For FVD16 and FVD128, lower is better.)
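The core TSM operation can be sketched as below, a minimal NumPy version of the shift described in the original TSM paper (the tensor shapes and fold ratio here are illustrative assumptions, not the discriminator's actual configuration): a fraction of the channels is shifted one step backward in time, another fraction one step forward, so each frame's features mix with its neighbors' at zero extra parameter cost.

```python
import numpy as np

def temporal_shift(x, fold_div=8):
    """TSM-style shift of part of the channels along the time axis.

    x: features of shape (T, C, H, W) for one video clip.
    The first C // fold_div channels are shifted one step backward in time,
    the next C // fold_div one step forward; the rest stay in place.
    Vacated time steps are zero-padded.
    """
    T, C, H, W = x.shape
    fold = C // fold_div
    out = np.zeros_like(x)
    out[:-1, :fold] = x[1:, :fold]                   # frame t receives frame t+1
    out[1:, fold:2 * fold] = x[:-1, fold:2 * fold]   # frame t receives frame t-1
    out[:, 2 * fold:] = x[:, 2 * fold:]              # remaining channels untouched
    return out

# Toy features: 4 frames, 8 channels, 2x2 spatial; frame t is filled with value t,
# which makes it easy to see where each channel's information came from.
x = np.ones((4, 8, 2, 2)) * np.arange(4)[:, None, None, None]
y = temporal_shift(x, fold_div=8)
```

After the shift, a purely spatial (2D) discriminator layer reading `y` sees features from three consecutive frames at once, which is what enables the explicit temporal reasoning.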

Long range: Smooth composition for infinite-length generation

A jittering artifact appears in long videos generated by StyleGAN-V, whereas our approach produces much smoother results when generating long videos.

@article{zhang2022towards,
  title={Towards Smooth Video Composition},
  author={Zhang, Qihang and Yang, Ceyuan and Shen, Yujun and Xu, Yinghao and Zhou, Bolei},
  journal={International Conference on Learning Representations (ICLR)},
  year={2023}
}