Open Access Proceedings Article

Pano2Vid: Automatic Cinematography for Watching 360° Videos.

TLDR
Through experimental evaluation on multiple newly defined Pano2Vid performance measures against several baselines, it is shown that the method successfully produces informative videos that could conceivably have been captured by human videographers.
Abstract
We introduce the novel task of Pano2Vid — automatic cinematography in panoramic 360° videos. Given a 360° video, the goal is to direct an imaginary camera to virtually capture natural-looking normal field-of-view (NFOV) video. By selecting "where to look" within the panorama at each time step, Pano2Vid aims to free both the videographer and the end viewer from the task of determining what to watch. Towards this goal, we first compile a dataset of 360° videos downloaded from the web, together with human-edited NFOV camera trajectories to facilitate evaluation. Next, we propose AutoCam, a data-driven approach to solve the Pano2Vid task. AutoCam leverages NFOV web video to discriminatively identify space-time "glimpses" of interest at each time instant, and then uses dynamic programming to select optimal human-like camera trajectories. Through experimental evaluation on multiple newly defined Pano2Vid performance measures against several baselines, we show that our method successfully produces informative videos that could conceivably have been captured by human videographers.
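The abstract outlines a two-stage pipeline: score candidate space-time glimpses at each time instant, then use dynamic programming to stitch them into a smooth, human-like camera trajectory. Below is a minimal sketch of that second stage as a Viterbi-style recurrence; the function name select_trajectory, the score and penalty arrays, and the additive transition cost are illustrative assumptions rather than the paper's exact formulation.

import numpy as np

def select_trajectory(scores, transition_penalty):
    """Viterbi-style dynamic program over candidate viewing directions.

    scores: (T, K) array of capture-worthiness scores for K candidate
        glimpse directions at each of T time steps (hypothetical inputs).
    transition_penalty: (K, K) array; cost of moving the virtual camera
        from direction j to direction k between consecutive time steps.
    Returns a list of T direction indices forming the highest-scoring path.
    """
    T, K = scores.shape
    dp = np.empty((T, K))
    back = np.zeros((T, K), dtype=int)
    dp[0] = scores[0]
    for t in range(1, T):
        # total[j, k]: best cumulative score of a trajectory that sits at
        # direction j at time t-1 and moves to direction k at time t.
        total = dp[t - 1][:, None] - transition_penalty + scores[t][None, :]
        back[t] = np.argmax(total, axis=0)
        dp[t] = np.max(total, axis=0)
    # Recover the optimal trajectory by following back-pointers from the end.
    path = [int(np.argmax(dp[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy usage: 4 time steps, 3 candidate directions, small motion penalty.
scores = np.random.rand(4, 3)
penalty = 0.5 * (1 - np.eye(3))
print(select_trajectory(scores, penalty))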


Citations
Posted Content

Saliency in VR: How do people explore virtual environments?

TL;DR: This work captures and analyzes gaze and head orientation data of 169 users exploring stereoscopic, static omni-directional panoramas, for a total of 1980 head and gaze trajectories for three different viewing conditions, which leads to several important insights, such as the existence of a particular fixation bias.
Proceedings ArticleDOI

A dataset of head and eye movements for 360° videos

TL;DR: This paper presents a novel dataset of 360° videos with associated eye and head movement data, a follow-up to the previous dataset for still images; the dataset and its associated code are made publicly available to support research on visual attention for 360° content.
Proceedings ArticleDOI

Kernel Transformer Networks for Compact Spherical Convolution

TL;DR: The Kernel Transformer Network (KTN) is presented to efficiently transfer convolution kernels from perspective images to the equirectangular projection of 360° images and successfully preserves the source CNN’s accuracy, while offering transferability, scalability to typical image resolutions, and, in many cases, a substantially lower memory footprint.
Posted Content

Learning Spherical Convolution for Fast Features from 360° Imagery

TL;DR: In this paper, a spherical convolutional network is proposed that translates a planar CNN to process 360° imagery directly in its equirectangular projection, while remaining sensitive to the varying distortion effects across the viewing sphere.
Posted Content

Cube Padding for Weakly-Supervised Saliency Prediction in 360° Videos

TL;DR: A spatio-temporal network that is (1) trained with weak supervision and (2) tailor-made for the 360° viewing sphere, and that outperforms baseline methods in both speed and quality.
References
Journal ArticleDOI

Generative Adversarial Nets

TL;DR: A new framework for estimating generative models via an adversarial process, in which two models are simultaneously trained: a generative model G that captures the data distribution and a discriminative model D that estimates the probability that a sample came from the training data rather than G.
Posted Content

Learning Spatiotemporal Features with 3D Convolutional Networks

TL;DR: In this article, the authors proposed a simple and effective approach for spatio-temporal feature learning using deep 3D convolutional networks (3D ConvNets) trained on a large-scale supervised video dataset.
Proceedings Article

Graph-Based Visual Saliency

TL;DR: A new bottom-up visual saliency model, Graph-Based Visual Saliency (GBVS), is proposed, which powerfully predicts human fixations on 749 variations of 108 natural images, achieving 98% of the ROC area of a human-based control, whereas the classical algorithms of Itti & Koch achieve only 84%.
Journal ArticleDOI

Learning to Detect a Salient Object

TL;DR: A set of novel features, including multiscale contrast, center-surround histogram, and color spatial distribution, are proposed to describe a salient object locally, regionally, and globally.
Posted Content

Colorful Image Colorization

TL;DR: In this article, the problem of hallucinating a plausible color version of a grayscale photograph is addressed by posing it as a classification task and using class-balancing at training time to increase the diversity of colors in the result.