Open Access · Proceedings Article
Pano2Vid: Automatic Cinematography for Watching 360° Videos.
Yu-Chuan Su, Dinesh Jayaraman, Kristen Grauman
TL;DR: Through experimental evaluation on multiple newly defined Pano2Vid performance measures against several baselines, it is shown that the method successfully produces informative videos that could conceivably have been captured by human videographers.
Abstract: We introduce the novel task of Pano2Vid — automatic cinematography in panoramic 360° videos. Given a 360° video, the goal is to direct an imaginary camera to virtually capture natural-looking normal field-of-view (NFOV) video. By selecting “where to look” within the panorama at each time step, Pano2Vid aims to free both the videographer and the end viewer from the task of determining what to watch. Towards this goal, we first compile a dataset of 360° videos downloaded from the web, together with human-edited NFOV camera trajectories to facilitate evaluation. Next, we propose AutoCam, a data-driven approach to solve the Pano2Vid task. AutoCam leverages NFOV web video to discriminatively identify space-time “glimpses” of interest at each time instant, and then uses dynamic programming to select optimal humanlike camera trajectories. Through experimental evaluation on multiple newly defined Pano2Vid performance measures against several baselines, we show that our method successfully produces informative videos that could conceivably have been captured by human videographers.
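The abstract's trajectory-selection step can be illustrated with a small sketch. This is not the paper's implementation: it assumes the viewing sphere has already been discretized into K candidate viewpoints per time step, that each space-time glimpse has received a scalar "capture-worthiness" score (here, a hypothetical `scores` array), and that smoothness is modeled by a hypothetical `allowed_moves` table restricting which viewpoints are reachable between consecutive steps. Under those assumptions, the optimal trajectory is found with Viterbi-style dynamic programming:

```python
import numpy as np

def select_trajectory(scores, allowed_moves):
    """Viterbi-style DP: pick one viewpoint per time step so that the
    total glimpse score is maximized subject to smooth transitions.

    scores: (T, K) array of per-timestep, per-viewpoint scores
            (hypothetical stand-in for the paper's glimpse scores).
    allowed_moves: dict mapping viewpoint index -> indices reachable
            at the next time step (a simple smoothness constraint).
    """
    T, K = scores.shape
    best = scores[0].copy()            # best cumulative score ending at each viewpoint
    back = np.zeros((T, K), dtype=int) # backpointers for trajectory recovery
    for t in range(1, T):
        new_best = np.full(K, -np.inf)
        for j in range(K):                 # previous viewpoint
            for k in allowed_moves[j]:     # transition j -> k is allowed
                cand = best[j] + scores[t, k]
                if cand > new_best[k]:
                    new_best[k] = cand
                    back[t, k] = j
        best = new_best
    # Backtrack from the best final viewpoint to recover the trajectory.
    traj = [int(np.argmax(best))]
    for t in range(T - 1, 0, -1):
        traj.append(int(back[t, traj[-1]]))
    return traj[::-1]

# Toy usage: 3 time steps, 3 viewpoints; only neighboring viewpoints reachable.
scores = np.array([[1., 0., 0.],
                   [0., 1., 0.],
                   [0., 0., 1.]])
allowed_moves = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}
trajectory = select_trajectory(scores, allowed_moves)  # -> [0, 1, 2]
```

The smoothness constraint is what makes the output "humanlike": the camera is forced to pan gradually through adjacent viewpoints rather than jump to the highest-scoring glimpse at every frame.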
Citations
Posted Content
Saliency in VR: How do people explore virtual environments?
TL;DR: This work captures and analyzes gaze and head orientation data from 169 users exploring stereoscopic, static omni-directional panoramas — a total of 1980 head and gaze trajectories across three viewing conditions — yielding several important insights, such as the existence of a particular fixation bias.
Proceedings ArticleDOI
A dataset of head and eye movements for 360° videos
TL;DR: This paper presents a novel dataset of 360° videos with associated eye and head movement data, which is a follow-up to the previous dataset for still images and its associated code is made publicly available to support research on visual attention for 360° content.
Proceedings ArticleDOI
Kernel Transformer Networks for Compact Spherical Convolution
Yu-Chuan Su, Kristen Grauman
TL;DR: The Kernel Transformer Network (KTN) is presented to efficiently transfer convolution kernels from perspective images to the equirectangular projection of 360° images and successfully preserves the source CNN’s accuracy, while offering transferability, scalability to typical image resolutions, and, in many cases, a substantially lower memory footprint.
Posted Content
Learning Spherical Convolution for Fast Features from 360° Imagery
Yu-Chuan Su, Kristen Grauman
TL;DR: In this paper, a spherical convolutional network is proposed to translate a planar CNN to process 360° imagery directly in its equirectangular projection, sensitive to the varying distortion effects across the viewing sphere.
Posted Content
Cube Padding for Weakly-Supervised Saliency Prediction in 360° Videos
TL;DR: A spatial-temporal network is proposed that is (1) trained with weak supervision and (2) tailor-made for the 360° viewing sphere, and it outperforms baseline methods in both speed and quality.
References
Journal ArticleDOI
Generative Adversarial Nets
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio
TL;DR: A new framework for estimating generative models via an adversarial process, in which two models are simultaneously trained: a generative model G that captures the data distribution and a discriminative model D that estimates the probability that a sample came from the training data rather than G.
Posted Content
Learning Spatiotemporal Features with 3D Convolutional Networks
TL;DR: In this article, the authors proposed a simple and effective approach for spatio-temporal feature learning using deep 3D convolutional networks (3D ConvNets) trained on a large scale supervised video dataset.
Proceedings Article
Graph-Based Visual Saliency
TL;DR: A new bottom-up visual saliency model, Graph-Based Visual Saliency (GBVS), is proposed, which powerfully predicts human fixations on 749 variations of 108 natural images, achieving 98% of the ROC area of a human-based control, whereas the classical algorithms of Itti & Koch achieve only 84%.
Journal ArticleDOI
Learning to Detect a Salient Object
TL;DR: A set of novel features, including multiscale contrast, center-surround histogram, and color spatial distribution, are proposed to describe a salient object locally, regionally, and globally.
Posted Content
Colorful Image Colorization
TL;DR: In this article, the problem of hallucinating a plausible color version of the photograph is addressed by posing it as a classification task and using class-balancing at training time to increase the diversity of colors in the result.