3D-aware Image Synthesis via Learning Structural and Textural Representations
Yinghao Xu1, Sida Peng2, Ceyuan Yang1, Yujun Shen3, Bolei Zhou1
1 The Chinese University of Hong Kong   2 Zhejiang University   3 ByteDance Inc.
Overview
This paper aims at high-fidelity 3D-aware image synthesis. We propose a novel framework, termed VolumeGAN, that synthesizes images under different camera views by explicitly learning a structural representation and a textural representation. We first learn a feature volume to represent the underlying structure, which is then converted to a feature field using a NeRF-like model. The feature field is further accumulated into a 2D feature map as the textural representation, followed by a neural renderer for appearance synthesis. Such a design enables independent control of shape and appearance. Extensive experiments on a wide range of datasets show that our approach achieves substantially higher image quality and better 3D control than previous methods.
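To make the pipeline concrete, below is a minimal PyTorch sketch of the structure-to-texture flow: a learned feature volume is queried at points sampled along camera rays, a NeRF-like MLP maps each sample to a point-wise feature and density, the per-ray features are accumulated into a 2D feature map, and a convolutional neural renderer produces the image. All module names, tensor shapes, layer sizes, and the softmax accumulation below are illustrative assumptions, not the released implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class VolumeGANSketch(nn.Module):
    def __init__(self, vol_ch=32, feat_dim=64):
        super().__init__()
        # Structural representation: a learned 3D feature volume.
        self.feature_volume = nn.Parameter(torch.randn(1, vol_ch, 16, 16, 16))
        # NeRF-like MLP: maps a sampled volume feature (plus the 3D
        # coordinate) to a point-wise feature and a density value.
        self.nerf_mlp = nn.Sequential(
            nn.Linear(vol_ch + 3, 128), nn.ReLU(),
            nn.Linear(128, feat_dim + 1),
        )
        # Neural renderer: converts the accumulated 2D feature map
        # (the textural representation) into an RGB image.
        self.renderer = nn.Sequential(
            nn.Conv2d(feat_dim, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1),
        )

    def forward(self, ray_points):
        # ray_points: (B, H, W, S, 3) points sampled along camera rays,
        # with coordinates normalized to [-1, 1]^3.
        B, H, W, S, _ = ray_points.shape
        grid = ray_points.reshape(B, H, W * S, 1, 3)
        # Query the structural volume at every sampled 3D point.
        vol_feat = F.grid_sample(
            self.feature_volume.expand(B, -1, -1, -1, -1),
            grid, align_corners=True,
        )                                            # (B, C, H, W*S, 1)
        vol_feat = vol_feat.squeeze(-1).permute(0, 2, 3, 1)
        vol_feat = vol_feat.reshape(B, H, W, S, -1)  # (B, H, W, S, C)
        # Feature field: per-point feature and density from the MLP.
        out = self.nerf_mlp(torch.cat([vol_feat, ray_points], dim=-1))
        feat, sigma = out[..., :-1], out[..., -1:]
        # Accumulate along each ray into a 2D feature map. A softmax over
        # the S samples stands in for the alpha-composited
        # volume-rendering weights used in NeRF-style models.
        weights = torch.softmax(sigma, dim=3)
        feat_map = (weights * feat).sum(dim=3)       # (B, H, W, F)
        feat_map = feat_map.permute(0, 3, 1, 2)      # (B, F, H, W)
        # Textural representation -> image via the neural renderer.
        return self.renderer(feat_map)

For example, ray points of shape (2, 64, 64, 12, 3) would yield a (2, 3, 64, 64) image batch; in the actual method the sampled points come from rays cast by the posed camera, which is what gives the explicit control over viewpoint.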
Results
Independent control of structure (shape) and texture (appearance) achieved by VolumeGAN.
Qualitative comparison between our VolumeGAN and existing alternatives.
Demo

We include a demo video showing additional results under varying camera views, which demonstrates the continuous 3D control achieved by VolumeGAN.
BibTeX
@article{xu2021volumegan,
  title   = {3D-aware Image Synthesis via Learning Structural and Textural Representations},
  author  = {Xu, Yinghao and Peng, Sida and Yang, Ceyuan and Shen, Yujun and Zhou, Bolei},
  journal = {arXiv preprint arXiv:2112.10759},
  year    = {2021}
}
Related Work
Thu Nguyen-Phuoc, Chuan Li, Lucas Theis, Christian Richardt, Yong-Liang Yang. HoloGAN: Unsupervised learning of 3D representations from natural images. ICCV, 2019.
Comment: Proposes a voxelized implicit 3D representation, which is rendered to 2D image space with a reshape operation.
Michael Niemeyer, Andreas Geiger. GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields. CVPR, 2021.
Comment: Proposes compositional generative neural feature fields for scene synthesis.