Towards Smooth Video Composition
Qihang Zhang1, Ceyuan Yang2,
Yujun Shen3, Yinghao Xu1, Bolei Zhou4
1 The Chinese University of Hong Kong, 2 Shanghai AI Laboratory,
3 Ant Group, 4 University of California, Los Angeles

Generated videos from our method.

Overview: A new baseline in video generation
Video generation requires synthesizing frames that remain consistent over time while presenting dynamic content. This work investigates modeling temporal relations for composing videos of arbitrary length, from a few frames up to arbitrarily many, using generative adversarial networks (GANs). We evaluate our approach on various datasets and show substantial improvements over video generation baselines.
Short range: Alias-free property

Texture sticking appears in videos generated by StyleGAN-V: textures stay attached to fixed image coordinates instead of moving with the content.

To illustrate this more clearly, we track pixels at fixed coordinates as the video plays; the brush-like smearing in the left part of the following video shows that these pixels barely move.

In contrast, our approach achieves smoother frame transitions.
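One simple way to expose texture sticking, offered here only as a rough sketch (the exact visualization in the video above may differ), is to average the frames over time at fixed pixel coordinates: textures that stick to coordinates survive the averaging sharply, while content that genuinely moves blurs into streaks.

```python
import numpy as np

def temporal_average_map(frames):
    """Average a video over time at fixed pixel coordinates.

    frames: np.ndarray of shape (T, H, W, 3), values in [0, 255].
    Sticking textures stay sharp in the average; properly moving
    content blurs out.
    """
    return frames.astype(np.float64).mean(axis=0).astype(np.uint8)

# Example with random data standing in for generated frames.
fake_video = np.random.randint(0, 256, size=(128, 256, 256, 3), dtype=np.uint8)
sticking_map = temporal_average_map(fake_video)
print(sticking_map.shape)  # (256, 256, 3)
```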

Mid range: Explicit temporal reasoning in the discriminator

We incorporate the Temporal Shift Module (TSM) into the discriminator to perform explicit temporal reasoning.
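For readers unfamiliar with TSM, its core operation shifts a fraction of feature channels along the temporal axis so that each frame's features can see their temporal neighbours. The following PyTorch snippet is a minimal sketch of that shift; the shift ratio, tensor shapes, and placement are illustrative and not the exact discriminator code of this work.

```python
import torch

def temporal_shift(x, shift_ratio=0.125):
    """TSM-style channel shift along the temporal axis.

    x: tensor of shape (N, T, C, H, W).
    A fraction of channels is shifted one step forward in time, another
    fraction one step backward, and the rest stay in place.
    """
    n, t, c, h, w = x.shape
    fold = int(c * shift_ratio)
    out = torch.zeros_like(x)
    out[:, 1:, :fold] = x[:, :-1, :fold]                  # shift forward in time
    out[:, :-1, fold:2 * fold] = x[:, 1:, fold:2 * fold]  # shift backward in time
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]             # untouched channels
    return out

# Example: features for 4 clips of 16 frames with 64 channels.
feat = torch.randn(4, 16, 64, 32, 32)
shifted = temporal_shift(feat)
print(shifted.shape)  # torch.Size([4, 16, 64, 32, 32])
```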

As shown in the following chart, the performance is greatly improved by the TSM module. (For FVD16 and FVD128, a lower number indicates a better result.)
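For context on the metric: FVD16 and FVD128 denote the Fréchet Video Distance computed on 16-frame and 128-frame clips. FVD compares Gaussians fitted to features of real and generated videos (typically extracted by an I3D network). Below is a minimal sketch of that comparison, assuming the features have already been extracted; the feature extractor and clip preprocessing are omitted.

```python
import numpy as np
from scipy import linalg

def frechet_distance(feats_real, feats_fake):
    """Fréchet distance between Gaussians fitted to two feature sets.

    feats_real, feats_fake: arrays of shape (num_videos, feat_dim),
    e.g. I3D features of real and generated clips.
    """
    mu_r, mu_f = feats_real.mean(0), feats_fake.mean(0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_f = np.cov(feats_fake, rowvar=False)
    covmean, _ = linalg.sqrtm(cov_r @ cov_f, disp=False)
    covmean = covmean.real  # drop tiny imaginary parts from numerical error
    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(cov_r + cov_f - 2.0 * covmean))

# Example with random stand-in features.
fd = frechet_distance(np.random.randn(256, 400), np.random.randn(256, 400))
print(fd)
```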

Long range: Smooth composition for infinite-length generation

A jittering phenomenon appears in videos from StyleGAN-V, while our approach achieves much smoother results when generating long videos.
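To make "infinite-length generation" concrete, the sketch below shows how arbitrarily many frames can be sampled by sweeping a continuous timestamp, assuming a StyleGAN-V-style frame generator that maps a content latent and a timestamp to one frame. The generator interface and names here are hypothetical, not this paper's actual API.

```python
import torch

def generate_long_video(generator, num_frames, fps=30, latent_dim=512, device="cpu"):
    """Sample an arbitrarily long video from a hypothetical continuous-time generator.

    generator(z_content, t) -> frame of shape (1, 3, H, W)  # assumed interface
    """
    z_content = torch.randn(1, latent_dim, device=device)  # fixed content code
    frames = []
    for i in range(num_frames):  # num_frames can be arbitrarily large
        t = torch.tensor([i / fps], device=device)  # continuous timestamp
        with torch.no_grad():
            frames.append(generator(z_content, t))
    return torch.cat(frames, dim=0)  # (num_frames, 3, H, W)
```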

Reference
@article{zhang2022towards,
  title={Towards Smooth Video Composition},
  author={Zhang, Qihang and Yang, Ceyuan and Shen, Yujun and Xu, Yinghao and Zhou, Bolei},
  journal={International Conference on Learning Representations (ICLR)},
  year={2023}
}