Stereo Magnification: Learning View Synthesis using Multiplane Images

Tinghui Zhou¹

Richard Tucker²

John Flynn²

Graham Fyffe²

Noah Snavely²

UC Berkeley¹, Google²

In SIGGRAPH 2018

In this paper, we explore an intriguing scenario for view synthesis: extrapolating views from imagery captured by narrow-baseline stereo cameras, including VR cameras and now-widespread dual-lens camera phones. We call this problem stereo magnification, and propose a learning framework that leverages a new layered representation that we call multiplane images (MPIs). Our method also uses a massive new data source for learning view extrapolation: online videos on YouTube. Using data mined from such videos, we train a deep network that predicts an MPI from an input stereo image pair. This inferred MPI can then be used to synthesize a range of novel views of the scene, including views that extrapolate significantly beyond the input baseline.
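To make the layered representation concrete, here is a minimal sketch of how a stack of RGBA layers can be flattened into one image with back-to-front "over" alpha compositing. It is an illustration under stated assumptions, not the released implementation: the function names `composite_over` and `render_mpi`, the nested-list layout `mpi[d][y][x]`, and the convention that index 0 is the farthest plane are all hypothetical. Rendering a novel view would also require warping each plane into the target camera before compositing, which this sketch omits.

```python
def composite_over(samples):
    """Composite RGBA samples along one pixel ray, ordered back to front.

    Each sample is a tuple (r, g, b, alpha) with all values in [0, 1].
    Returns the accumulated (r, g, b) colour.
    """
    r = g = b = 0.0
    for sr, sg, sb, alpha in samples:
        # Standard "over" operator: the nearer sample partially covers
        # whatever has been accumulated behind it.
        r = sr * alpha + r * (1.0 - alpha)
        g = sg * alpha + g * (1.0 - alpha)
        b = sb * alpha + b * (1.0 - alpha)
    return (r, g, b)


def render_mpi(mpi):
    """Flatten a layered image into a single RGB image.

    Assumed layout (hypothetical): mpi[d][y][x] = (r, g, b, alpha),
    with d = 0 the farthest plane and the last index the nearest.
    """
    depth = len(mpi)
    height = len(mpi[0])
    width = len(mpi[0][0])
    return [
        [
            composite_over([mpi[d][y][x] for d in range(depth)])
            for x in range(width)
        ]
        for y in range(height)
    ]


# Usage: a 1x1 image with an opaque red back plane and a
# half-transparent blue front plane.
back = [[(1.0, 0.0, 0.0, 1.0)]]
front = [[(0.0, 0.0, 1.0, 0.5)]]
image = render_mpi([back, front])
```

Because each layer carries its own alpha, partially transparent near layers let content from farther layers show through, which is what allows a single layered representation to be re-rendered from a range of nearby viewpoints.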

Paper

Stereo Magnification: Learning View Synthesis using Multiplane Images

Tinghui Zhou, Richard Tucker, John Flynn, Graham Fyffe, Noah Snavely

In SIGGRAPH 2018

[PDF (35MB)]

[Compressed PDF (1MB)]

[Bibtex]

Code

[GitHub]

RealEstate10K Dataset

[Link]

Note: due to data restrictions, we are unable to release the exact version of the dataset used in the paper. However, we are working on updating the results using this public version.

Sample Results from the Paper

[Google Drive link] (772 MB)

Supplemental video

Authors' Note

After the publication of this work, it came to the authors' attention that an earlier representation similar to the MPI was explored in "Stereo Matching with Transparency and Matting" by Richard Szeliski and Polina Golland (IJCV 1999). We encourage citing Szeliski & Golland as well if you find MPIs useful in your research.

Acknowledgements

We thank the anonymous reviewers for their valuable comments and Shubham Tulsiani for helpful discussions. This work was done while TZ was an intern at Google. This webpage template was borrowed from some colorful folks.