Stereo Magnification: Learning View Synthesis using Multiplane Images

Tinghui Zhou¹
Richard Tucker²
John Flynn²
Graham Fyffe²
Noah Snavely²
¹UC Berkeley, ²Google

In this paper, we explore an intriguing scenario for view synthesis: extrapolating views from imagery captured by narrow-baseline stereo cameras, including VR cameras and now-widespread dual-lens camera phones. We call this problem stereo magnification, and propose a learning framework that leverages a new layered representation that we call multiplane images (MPIs). Our method also uses a massive new data source for learning view extrapolation: online videos on YouTube. Using data mined from such videos, we train a deep network that predicts an MPI from an input stereo image pair. This inferred MPI can then be used to synthesize a range of novel views of the scene, including views that extrapolate significantly beyond the input baseline.
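To make the idea of synthesizing views from a layered representation concrete, here is a minimal sketch of how an MPI-like stack of layers could be rendered. It is an illustration under stated assumptions, not the authors' implementation: it assumes each layer is a fronto-parallel RGBA image at a known depth, that the new camera only translates sideways (so warping each layer reduces to a horizontal shift proportional to baseline / depth), and that shifts are rounded to whole pixels. The names `shift_plane`, `composite`, `render`, `focal_px`, and `baseline` are hypothetical.

```python
from typing import List, Tuple

RGBA = Tuple[float, float, float, float]  # color in [0, 1] plus alpha
RGB = Tuple[float, float, float]
Plane = List[List[RGBA]]                  # one layer: rows of RGBA pixels


def shift_plane(plane: Plane, shift: int) -> Plane:
    """Shift a layer horizontally by `shift` pixels; uncovered pixels become transparent."""
    empty: RGBA = (0.0, 0.0, 0.0, 0.0)
    shifted = []
    for row in plane:
        width = len(row)
        new_row = [empty] * width
        for x, pixel in enumerate(row):
            nx = x + shift
            if 0 <= nx < width:
                new_row[nx] = pixel
        shifted.append(new_row)
    return shifted


def composite(planes: List[Plane]) -> List[List[RGB]]:
    """Alpha-composite layers with the standard "over" operation.

    `planes` must be ordered back (farthest) to front (nearest).
    """
    height, width = len(planes[0]), len(planes[0][0])
    out: List[List[RGB]] = [[(0.0, 0.0, 0.0)] * width for _ in range(height)]
    for plane in planes:
        for y in range(height):
            for x in range(width):
                r, g, b, a = plane[y][x]
                pr, pg, pb = out[y][x]
                out[y][x] = (a * r + (1 - a) * pr,
                             a * g + (1 - a) * pg,
                             a * b + (1 - a) * pb)
    return out


def render(planes: List[Plane], depths: List[float],
           focal_px: float, baseline: float) -> List[List[RGB]]:
    """Render the layers for a camera moved sideways by `baseline`.

    Assumption: pure horizontal translation, so a layer at depth d moves by
    focal_px * baseline / d pixels (rounded), opposite to the camera motion.
    """
    warped = []
    for plane, depth in zip(planes, depths):
        disparity = round(focal_px * baseline / depth)
        warped.append(shift_plane(plane, -disparity))
    return composite(warped)


# Toy 1x4 scene: an opaque blue background far away, one red pixel up close.
clear: RGBA = (0.0, 0.0, 0.0, 0.0)
back: Plane = [[(0.0, 0.0, 1.0, 1.0)] * 4]
front: Plane = [[clear, clear, (1.0, 0.0, 0.0, 1.0), clear]]

reference_view = render([back, front], [10.0, 1.0], focal_px=1.0, baseline=0.0)
moved_view = render([back, front], [10.0, 1.0], focal_px=1.0, baseline=1.0)
```

In `moved_view` the near red pixel shifts by one column while the distant background stays put, which is the parallax effect that lets a single layered prediction serve many viewpoints. A general camera motion would need a per-layer homography and sub-pixel resampling rather than the integer shift used here.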




[PDF (35MB)]
[Compressed PDF (1MB)]



RealEstate10K Dataset


Note: due to data restrictions, we are unable to release the exact version of the dataset used in the paper. We are working on updating the results using the public version.

Sample Results from the Paper

[Google Drive link] (772 MB)

Supplemental video

Authors' Note

After the publication of this work, it came to the authors' attention that an earlier representation similar to the MPI was explored in "Stereo Matching with Transparency and Matting" (Richard Szeliski and Polina Golland, IJCV 1999). We encourage citing Szeliski & Golland as well if you find MPIs useful in your research.


We thank the anonymous reviewers for their valuable comments and Shubham Tulsiani for helpful discussions. This work was done while TZ was an intern at Google. This webpage template was borrowed from some colorful folks.