Open Access Proceedings Article

Pano2Vid: Automatic Cinematography for Watching 360° Videos.

TLDR
Through experimental evaluation on multiple newly defined Pano2Vid performance measures against several baselines, it is shown that the method successfully produces informative videos that could conceivably have been captured by human videographers.
Abstract
We introduce the novel task of Pano2Vid — automatic cinematography in panoramic 360° videos. Given a 360° video, the goal is to direct an imaginary camera to virtually capture natural-looking normal field-of-view (NFOV) video. By selecting "where to look" within the panorama at each time step, Pano2Vid aims to free both the videographer and the end viewer from the task of determining what to watch. Towards this goal, we first compile a dataset of 360° videos downloaded from the web, together with human-edited NFOV camera trajectories to facilitate evaluation. Next, we propose AutoCam, a data-driven approach to solve the Pano2Vid task. AutoCam leverages NFOV web video to discriminatively identify space-time "glimpses" of interest at each time instant, and then uses dynamic programming to select optimal human-like camera trajectories. Through experimental evaluation on multiple newly defined Pano2Vid performance measures against several baselines, we show that our method successfully produces informative videos that could conceivably have been captured by human videographers.
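The abstract outlines a two-stage pipeline: score candidate space-time glimpses at each time instant, then use dynamic programming to stitch them into a smooth, human-like camera trajectory. Below is a minimal sketch of that second stage as a Viterbi-style recurrence; the function name select_trajectory, the score and penalty arrays, and the additive transition cost are illustrative assumptions rather than the paper's exact formulation.

import numpy as np

def select_trajectory(scores, transition_penalty):
    """Viterbi-style dynamic program over candidate viewing directions.

    scores: (T, K) array of capture-worthiness scores for K candidate
        glimpse directions at each of T time steps (hypothetical inputs).
    transition_penalty: (K, K) array; cost of moving the virtual camera
        from direction j to direction k between consecutive time steps.
    Returns a list of T direction indices forming the highest-scoring path.
    """
    T, K = scores.shape
    dp = np.empty((T, K))
    back = np.zeros((T, K), dtype=int)
    dp[0] = scores[0]
    for t in range(1, T):
        # total[j, k]: best cumulative score of a trajectory that sits at
        # direction j at time t-1 and moves to direction k at time t.
        total = dp[t - 1][:, None] - transition_penalty + scores[t][None, :]
        back[t] = np.argmax(total, axis=0)
        dp[t] = np.max(total, axis=0)
    # Recover the optimal trajectory by following back-pointers from the end.
    path = [int(np.argmax(dp[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy usage: 4 time steps, 3 candidate directions, small motion penalty.
scores = np.random.rand(4, 3)
penalty = 0.5 * (1 - np.eye(3))
print(select_trajectory(scores, penalty))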


Citations
Posted Content

Saliency in VR: How do people explore virtual environments?

TL;DR: This work captures and analyzes gaze and head orientation data of 169 users exploring stereoscopic, static omni-directional panoramas, for a total of 1980 head and gaze trajectories for three different viewing conditions, which leads to several important insights, such as the existence of a particular fixation bias.
Proceedings ArticleDOI

A dataset of head and eye movements for 360° videos

TL;DR: This paper presents a novel dataset of 360° videos with associated eye and head movement data, a follow-up to the previous dataset for still images; the dataset and its associated code are made publicly available to support research on visual attention for 360° content.
Proceedings ArticleDOI

Kernel Transformer Networks for Compact Spherical Convolution

TL;DR: The Kernel Transformer Network (KTN) is presented to efficiently transfer convolution kernels from perspective images to the equirectangular projection of 360° images and successfully preserves the source CNN’s accuracy, while offering transferability, scalability to typical image resolutions, and, in many cases, a substantially lower memory footprint.
Posted Content

Learning Spherical Convolution for Fast Features from 360° Imagery

TL;DR: In this paper, a spherical convolutional network is proposed that translates a planar CNN to process 360° imagery directly in its equirectangular projection, while remaining sensitive to the varying distortion effects across the viewing sphere.
Posted Content

Cube Padding for Weakly-Supervised Saliency Prediction in 360° Videos

TL;DR: A spatio-temporal network that is (1) trained with weak supervision and (2) tailor-made for the 360° viewing sphere, and that outperforms baseline methods in both speed and quality.
References
Journal ArticleDOI

Generative Adversarial Nets

TL;DR: A new framework for estimating generative models via an adversarial process, in which two models are simultaneously trained: a generative model G that captures the data distribution and a discriminative model D that estimates the probability that a sample came from the training data rather than G.
Posted Content

Learning Spatiotemporal Features with 3D Convolutional Networks

TL;DR: In this article, the authors proposed a simple and effective approach for spatio-temporal feature learning using deep 3D convolutional networks (3D ConvNets) trained on a large-scale supervised video dataset.
Proceedings Article

Graph-Based Visual Saliency

TL;DR: A new bottom-up visual saliency model, Graph-Based Visual Saliency (GBVS), is proposed, which powerfully predicts human fixations on 749 variations of 108 natural images, achieving 98% of the ROC area of a human-based control, whereas the classical algorithms of Itti & Koch achieve only 84%.
Journal ArticleDOI

Learning to Detect a Salient Object

TL;DR: A set of novel features, including multiscale contrast, center-surround histogram, and color spatial distribution, are proposed to describe a salient object locally, regionally, and globally.
Posted Content

Colorful Image Colorization

TL;DR: In this article, the problem of hallucinating a plausible color version of a grayscale photograph is addressed by posing it as a classification task and using class-balancing at training time to increase the diversity of colors in the result.