Semantic Hierarchy Emerges in Deep Generative Representations for Scene Synthesis
Ceyuan Yang*, Yujun Shen*, Bolei Zhou
The Chinese University of Hong Kong
Overview
In this work, we show that a highly structured semantic hierarchy emerges from the generative representations as the variation factors for synthesizing scenes. By probing the layer-wise representations with a broad set of visual concepts at different abstraction levels, we are able to quantify the causality between the layer-wise activations and the semantics occurring in the output image. Both qualitative and quantitative results suggest that the generative representations learned by GANs specialize in synthesizing different hierarchical semantics: the early layers tend to determine the spatial layout and configuration, the middle layers control the categorical objects, and the later layers render the scene attributes as well as the color scheme.
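The probing step described above can be sketched as fitting a linear boundary in the latent space that separates codes by whether a given visual concept appears in the synthesized image. The sketch below is a minimal numpy illustration of that idea, not the paper's implementation: `probe_boundary` is a hypothetical name, and plain logistic regression via gradient descent stands in for whatever linear classifier is actually used.

```python
import numpy as np

def probe_boundary(codes, labels, iters=200, lr=0.1):
    """Fit a linear boundary separating latent codes by a semantic label.

    A stand-in for the linear-probing step: `codes` is an (N, dim) array of
    latent codes, `labels` an (N,) array of 0/1 concept annotations predicted
    on the corresponding synthesized images. Returns the weight vector `w`
    and bias `b`; w / ||w|| is the normal direction of the semantic boundary.
    """
    w = np.zeros(codes.shape[1])
    b = 0.0
    for _ in range(iters):
        logits = codes @ w + b
        # Gradient of the logistic loss, averaged over the batch.
        grad = 1.0 / (1.0 + np.exp(-logits)) - labels
        w -= lr * codes.T @ grad / len(labels)
        b -= lr * grad.mean()
    return w, b
```

The normal direction `w / np.linalg.norm(w)` can then serve as a candidate manipulation direction for the probed concept.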
Results
Identifying such a set of manipulatable latent variation factors facilitates semantic scene manipulation.
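A minimal sketch of how such a factor could be used for manipulation, assuming a style-based generator that takes one latent code per layer: shift the codes of a chosen layer range along a probed boundary direction, leaving the other layers untouched (so, e.g., editing only early layers changes layout while later-layer semantics are preserved). The function name and signature are illustrative, not the paper's API.

```python
import numpy as np

def manipulate_layerwise(codes, boundary, alpha, layers):
    """Shift per-layer latent codes along a semantic boundary direction.

    codes:    (num_layers, dim) array of layer-wise latent codes
    boundary: (dim,) unit normal of a probed semantic boundary
    alpha:    manipulation strength (sign controls the edit direction)
    layers:   indices of the generator layers to edit
    """
    edited = codes.copy()          # leave the caller's codes untouched
    edited[layers] += alpha * boundary  # move only the selected layers
    return edited
```

Restricting `layers` to the early, middle, or late range is what selects which level of the semantic hierarchy (layout, objects, or attributes) the edit affects.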
More manipulation results on various scenes are shown in the following video.
BibTeX
@article{yang2019semantic,
  title   = {Semantic Hierarchy Emerges in Deep Generative Representations for Scene Synthesis},
  author  = {Yang, Ceyuan and Shen, Yujun and Zhou, Bolei},
  journal = {arXiv preprint arXiv:1911.09267},
  year    = {2019}
}
Related Work
B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba. Object Detectors Emerge in Deep Scene CNNs. ICLR, 2015.
Comment: Studies the interpretable object detectors that emerge inside CNNs trained to classify scenes.
L. Goetschalckx, A. Andonian, A. Oliva, P. Isola. GANalyze: Toward Visual Definitions of Cognitive Image Properties. ICCV, 2019.
Comment: Navigates the manifold in the latent space to make images more or less memorable.
Y. Shen, J. Gu, X. Tang, B. Zhou. Interpreting Latent Space of GANs for Semantic Face Editing. CVPR, 2020.
Comment: Proposes a technique for semantic face editing in latent space.
A. Jahanian, L. Chai, P. Isola. On the "steerability" of generative adversarial networks. ICLR, 2020.
Comment: Shifts the distribution by "steering" the latent code to change camera motion and image color tone.