
Showing papers on "Video compression picture types published in 2017"


Proceedings ArticleDOI
01 Oct 2017
TL;DR: Deep voxel flow as mentioned in this paper combines the advantages of optical flow and neural network-based methods by training a deep network that learns to synthesize video frames by flowing pixel values from existing ones, which can be applied at any video resolution.
Abstract: We address the problem of synthesizing new video frames in an existing video, either in-between existing frames (interpolation), or subsequent to them (extrapolation). This problem is challenging because video appearance and motion can be highly complex. Traditional optical-flow-based solutions often fail where flow estimation is challenging, while newer neural-network-based methods that hallucinate pixel values directly often produce blurry results. We combine the advantages of these two methods by training a deep network that learns to synthesize video frames by flowing pixel values from existing ones, which we call deep voxel flow. Our method requires no human supervision, and any video can be used as training data by dropping, and then learning to predict, existing frames. The technique is efficient, and can be applied at any video resolution. We demonstrate that our method produces results that both quantitatively and qualitatively improve upon the state-of-the-art.
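
As an illustration of the core "flow pixel values from existing frames" step, the sketch below backward-warps a color frame with a given flow field using bilinear sampling. This is only a minimal numpy reconstruction of the sampling idea; the deep network that predicts the voxel flow is assumed to exist elsewhere, and the function name `warp_frame` is illustrative, not the authors' code.

```python
import numpy as np

def warp_frame(frame, flow):
    """Synthesize a new frame by sampling pixel values from a color `frame`
    (H x W x 3) at positions displaced by `flow` (H x W x 2, in pixels),
    using bilinear interpolation. The paper's network predicts the flow;
    here it is assumed to be given."""
    h, w = frame.shape[:2]
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    sx = np.clip(xs + flow[..., 0], 0, w - 1.001)   # source x per output pixel
    sy = np.clip(ys + flow[..., 1], 0, h - 1.001)   # source y per output pixel
    x0, y0 = np.floor(sx).astype(int), np.floor(sy).astype(int)
    x1, y1 = x0 + 1, y0 + 1
    wx, wy = sx - x0, sy - y0
    # Bilinear blend of the four neighbouring source pixels.
    top = (1 - wx)[..., None] * frame[y0, x0] + wx[..., None] * frame[y0, x1]
    bot = (1 - wx)[..., None] * frame[y1, x0] + wx[..., None] * frame[y1, x1]
    return (1 - wy)[..., None] * top + wy[..., None] * bot

# Interpolation example: synthesize a mid-frame by warping with half of a
# (here all-zero, i.e. dummy) flow field.
frame0 = np.random.rand(64, 64, 3)
flow = np.zeros((64, 64, 2))
mid_frame = warp_frame(frame0, 0.5 * flow)
```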

601 citations


Journal ArticleDOI
01 May 2017
TL;DR: This report sets out to summarize and categorize the research in tone‐mapping as of today, distilling the most important trends and characteristics of the tone reproduction pipeline and specifically focuses on tone-mapping of HDR video and the problems this medium entails.
Abstract: Tone-mapping constitutes a key component within the field of high dynamic range (HDR) imaging. Its importance is manifested in the vast number of tone-mapping methods that can be found in the literature, which are the result of active development in the area for more than two decades. Although these can accommodate most requirements for display of HDR images, new challenges arose with the advent of HDR video, calling for additional considerations in the design of tone-mapping operators (TMOs). Today, a range of TMOs exist that do support video material. We are now reaching a point where most camera-captured HDR videos can be prepared in high quality without visible artifacts, within the constraints of a standard display device. In this report, we set out to summarize and categorize the research in tone-mapping as of today, distilling the most important trends and characteristics of the tone reproduction pipeline. While this gives a wide overview of the area, we then specifically focus on tone-mapping of HDR video and the problems this medium entails. First, we formulate the major challenges a video TMO needs to address. Then, we provide a description and categorization of each of the existing video TMOs. Finally, by constructing a set of quantitative measures, we evaluate the performance of a number of the operators, in order to indicate which can be expected to produce the fewest artifacts. This serves as a comprehensive reference, categorization and comparative assessment of the state-of-the-art in tone-mapping for HDR video.
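
As a concrete illustration of why video tone-mapping needs more than a per-frame operator, the sketch below applies a simple global TMO (log-average luminance scaling followed by L/(1+L) compression) and temporally smooths the adaptation value to avoid flicker. This is a minimal sketch under those assumptions, not one of the surveyed operators.

```python
import numpy as np

def tonemap_video(frames, key=0.18, alpha=0.1, eps=1e-6):
    """Per-frame global tone mapping, with an exponential moving average over
    the log-average luminance to suppress the temporal flicker a purely
    per-frame TMO would produce when scene brightness changes."""
    out, smoothed = [], None
    for hdr in frames:
        lum = 0.2126 * hdr[..., 0] + 0.7152 * hdr[..., 1] + 0.0722 * hdr[..., 2]
        log_avg = np.exp(np.mean(np.log(lum + eps)))
        smoothed = log_avg if smoothed is None else (1 - alpha) * smoothed + alpha * log_avg
        scaled = key * lum / (smoothed + eps)
        ldr_lum = scaled / (1.0 + scaled)                 # compress to [0, 1)
        out.append(np.clip(hdr * (ldr_lum / (lum + eps))[..., None], 0.0, 1.0))
    return out

# Three synthetic HDR frames with slowly rising exposure.
frames = [np.random.rand(48, 64, 3) * (10.0 * (i + 1)) for i in range(3)]
ldr_frames = tonemap_video(frames)
```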

90 citations


Journal ArticleDOI
TL;DR: This paper aims at the evaluation of perceived visual quality of light field images and at comparing the performance of a few state-of-the-art algorithms for light field image compression, by means of a set of objective and subjective quality assessments.
Abstract: The recent advances in light field imaging, supported among others by the introduction of commercially available cameras, e.g., Lytro or Raytrix, are changing the ways in which visual content is captured and processed. Efficient storage and delivery systems for light field images must rely on compression algorithms. Several methods to compress light field images have been proposed recently. However, in-depth evaluations of compression algorithms have rarely been reported. This paper aims at the evaluation of perceived visual quality of light field images and at comparing the performance of a few state-of-the-art algorithms for light field image compression. First, a processing chain for light field image compression and decompression is defined for two typical use cases, professional and consumer. Then, five light field compression algorithms are compared by means of a set of objective and subjective quality assessments. An interactive methodology recently introduced by authors, as well as a passive methodology is used to perform these evaluations. The results provide a useful benchmark for future development of compression solutions for light field images.

86 citations


Proceedings ArticleDOI
18 Mar 2017
TL;DR: This work proposes using layered encoding for 360-degree video to improve QoE by reducing the probability of video freezes and the latency of response to the user head movements, which reduces the storage requirements significantly and improves in-network cache performance.
Abstract: Virtual reality and 360-degree video streaming are growing rapidly; however, streaming 360-degree video is very challenging due to high bandwidth requirements. To address this problem, the video quality is adjusted according to the user viewport prediction. High quality video is only streamed for the user viewport, reducing the overall bandwidth consumption. Existing solutions use shallow buffers limited by the accuracy of viewport prediction. Therefore, playback is prone to video freezes which are very destructive for the Quality of Experience(QoE). We propose using layered encoding for 360-degree video to improve QoE by reducing the probability of video freezes and the latency of response to the user head movements. Moreover, this scheme reduces the storage requirements significantly and improves in-network cache performance.

73 citations


Journal ArticleDOI
TL;DR: A multi-modal visual features-based SBD framework is employed that aims to analyze the behaviors of visual representation in terms of the discontinuity signal and can achieve good accuracy in both types of video data set compared with other proposed SBD methods.
Abstract: One of the essential pre-processing steps of semantic video analysis is video shot boundary detection (SBD). It is the primary step to segment the sequence of video frames into shots. Many SBD systems using supervised learning have been proposed over the years; however, the training process remains their principal limitation. In this paper, a multi-modal visual-features-based SBD framework is employed that analyzes the behavior of the visual representation in terms of a discontinuity signal. We adopt a candidate segment selection step that requires no threshold calculation, instead using the cumulative moving average of the discontinuity signal to identify the positions of shot boundaries and discard non-boundary video frames. Transition detection is then performed to classify each candidate segment as either a cut transition or a gradual transition, including fade in/out and logo occurrence. Experiments are conducted on golf video clips and the TREC2001 documentary video data set. Results show that the proposed SBD framework can achieve good accuracy on both types of video data set compared with other proposed SBD methods.
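
A minimal sketch of the candidate-selection idea described above: compute a histogram-difference discontinuity signal between consecutive frames and flag frames that exceed the cumulative moving average by a margin, instead of using a fixed global threshold. The feature choice, margin value and function names are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def discontinuity_signal(frames, bins=64):
    """Per-frame discontinuity: L1 distance between normalized gray-level
    histograms of consecutive frames (one common choice of visual feature)."""
    hists = [np.histogram(f, bins=bins, range=(0, 255))[0] / f.size for f in frames]
    return np.array([np.abs(hists[i + 1] - hists[i]).sum() for i in range(len(hists) - 1)])

def candidate_boundaries(signal, margin=2.0):
    """Flag frames whose discontinuity exceeds the cumulative moving average
    by a margin, instead of a single global threshold."""
    cma, candidates = 0.0, []
    for i, d in enumerate(signal):
        cma = (cma * i + d) / (i + 1)          # cumulative moving average so far
        if i > 0 and d > margin * cma:
            candidates.append(i + 1)           # boundary between frame i and i+1
    return candidates

# Usage with dummy grayscale frames (H x W uint8 arrays):
frames = [np.random.randint(0, 256, (120, 160), dtype=np.uint8) for _ in range(50)]
print(candidate_boundaries(discontinuity_signal(frames)))
```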

57 citations


Proceedings ArticleDOI
01 Oct 2017
TL;DR: K-means clustering, 2D-DWT and fuzzy logic based image compression are discussed, which are considered to be good techniques for reducing data redundancy in image processing.
Abstract: The growing demand for multimedia contributes to insufficient network bandwidth and memory storage. Data compression is therefore increasingly needed to reduce data redundancy, saving hardware space and transmission bandwidth. Image compression is one of the main research areas in the field of image processing, and many techniques have been proposed for it, some of which are discussed in this paper. Specifically, this paper discusses k-means clustering, 2D-DWT, and fuzzy-logic-based image compression.
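
A hedged sketch of one of the discussed ideas, k-means-based image compression by color quantization: pixels are clustered into a small palette, and the image is stored as the palette plus one index per pixel. This numpy-only example is illustrative; the paper's exact formulation (and its 2D-DWT and fuzzy-logic variants) may differ.

```python
import numpy as np

def kmeans_compress(img, k=16, iters=10, seed=0):
    """Compress an H x W x 3 image by clustering its pixels into k colors
    (a palette) and storing only the palette plus one index per pixel."""
    rng = np.random.default_rng(seed)
    pixels = img.reshape(-1, 3).astype(float)
    centers = pixels[rng.choice(len(pixels), k, replace=False)]
    for _ in range(iters):
        # Assign each pixel to the nearest palette color.
        d = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        # Recompute each palette color as the mean of its assigned pixels.
        for c in range(k):
            if np.any(labels == c):
                centers[c] = pixels[labels == c].mean(0)
    return centers.astype(np.uint8), labels.reshape(img.shape[:2]).astype(np.uint8)

def kmeans_decompress(palette, indices):
    return palette[indices]

img = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
palette, idx = kmeans_compress(img)       # one byte per pixel plus a tiny palette
recon = kmeans_decompress(palette, idx)
```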

56 citations


Journal ArticleDOI
TL;DR: A hierarchical temporally dependent RDO scheme is developed specifically for the LD-HCS based on a source distortion propagation model and, coupled with QP adaptation, can achieve higher coding gains.
Abstract: Low-delay hierarchical coding structure (LD-HCS), as one of the most important components in the latest High Efficiency Video Coding (HEVC) standard, greatly improves coding performance. It groups consecutive P/B frames into different layers and encodes them with different quantization parameters (QPs) and reference mechanisms in such a way that temporal dependency among frames can be exploited. However, due to the varying characteristics of video content, temporal dependency among coding units differs significantly across units in the same or different layers, while a fixed LD-HCS scheme cannot take full advantage of this dependency, leading to a substantial loss in coding performance. This paper addresses the temporally dependent rate distortion optimization (RDO) problem by attempting to exploit the varying temporal dependency of different units. First, the temporal relationship of different frames under the LD-HCS is examined, and hierarchical temporal propagation chains are constructed to represent the temporal dependency among coding units in different frames. Then, a hierarchical temporally dependent RDO scheme is developed specifically for the LD-HCS based on a source distortion propagation model. Experimental results show that our proposed scheme can achieve 2.5% and 2.3% BD-rate gains on average compared with the HEVC codec under the same configuration of P and B frames, respectively, with a negligible increase in encoding time. Furthermore, coupled with QP adaptation, our proposed method can achieve higher coding gains, e.g., with multi-QP optimization, about 5.4% and 5.0% BD-rate savings on average over the HEVC codec under the same setting of P and B frames, respectively.
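
A schematic sketch of the temporally dependent RD cost idea: the distortion of a coding unit is weighted by an estimated propagation factor, so units that many later frames reference are encoded more carefully. The specific form below is an assumption for illustration, not the paper's source distortion propagation model.

```python
def td_rd_cost(distortion, rate, lmbda, propagation_factor):
    """Schematic temporally dependent RD cost: distortion of a unit is scaled by
    (1 + propagation factor), i.e. how strongly its errors are expected to
    propagate to dependent frames, which effectively lowers lambda for heavily
    referenced units. This mirrors the idea only, not the paper's exact model."""
    return (1.0 + propagation_factor) * distortion + lmbda * rate

# A unit referenced by many later frames (large propagation factor) prefers a
# lower-distortion, higher-rate mode than an isolated unit would.
print(td_rd_cost(distortion=100.0, rate=40.0, lmbda=2.0, propagation_factor=1.5))
print(td_rd_cost(distortion=100.0, rate=40.0, lmbda=2.0, propagation_factor=0.0))
```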

54 citations


Proceedings ArticleDOI
Fanyi Duanmu1, Eymen Kurdoglu1, S. Amir Hosseini1, Yong Liu1, Yao Wang1 
11 Aug 2017
TL;DR: A two-tier 360 video streaming framework with prioritized buffer control is proposed to effectively accommodate the dynamics in both network bandwidth and viewing direction, and it is demonstrated that the proposed framework can significantly outperform the conventional360 video streaming solutions.
Abstract: 360 degree video compression and streaming is one of the key components of Virtual Reality (VR) applications. In 360 video streaming, a user may freely navigate through the captured 3D environment by changing her desired viewing direction. Only a small portion of the entire 360 degree video is watched at any time. Streaming the entire 360 degree raw video is therefore unnecessary and bandwidth-consuming. On the other hand, only streaming the video in the predicted user's view direction will introduce streaming discontinuity whenever the prediction is wrong. In this work, a two-tier 360 video streaming framework with prioritized buffer control is proposed to effectively accommodate the dynamics in both network bandwidth and viewing direction. Through simulations driven by real network bandwidth and viewing direction traces, we demonstrate that the proposed framework can significantly outperform the conventional 360 video streaming solutions.
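
An illustrative sketch of prioritized buffer control in a two-tier scheme: the base tier (covering the full 360-degree view at low quality) is buffered far ahead to prevent freezes, and the remaining bandwidth goes to short enhancement-tier segments for the predicted viewport. The target values and the decision rule are assumptions, not the paper's exact policy.

```python
def next_download(base_buffer_s, enh_buffer_s, base_target_s=10.0, enh_target_s=2.0):
    """Prioritized buffer control (illustrative only): the base tier is filled to
    a long target so playback never freezes; only then are short enhancement-tier
    segments for the predicted viewport fetched, kept close to the playhead so
    they are still likely to match the user's head direction."""
    if base_buffer_s < base_target_s:
        return "base"            # protect playback continuity first
    if enh_buffer_s < enh_target_s:
        return "enhancement"     # then improve quality in the predicted viewport
    return "idle"                # both buffers at target; wait

print(next_download(base_buffer_s=4.0, enh_buffer_s=0.0))   # -> "base"
print(next_download(base_buffer_s=12.0, enh_buffer_s=0.5))  # -> "enhancement"
```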

52 citations


Journal ArticleDOI
TL;DR: This paper presents a comprehensive study and analysis of numerous cutting-edge video steganography methods and their performance evaluations from the literature, and suggests current research directions and recommendations to improve on existing video steganography techniques.
Abstract: In the last two decades, the science of covertly concealing and communicating data has acquired tremendous significance due to the technological advancement in communication and digital content. Steganography is the art of concealing secret data in a particular interactive media transporter, e.g., text, audio, image, and video data, in order to build a covert communication between authorized parties. Nowadays, video steganography techniques have become important in many video-sharing and social networking applications such as Livestreaming, YouTube, Twitter, and Facebook because of the noteworthy development of advanced video over the Internet. The performance of any steganographic method ultimately relies on the imperceptibility, hiding capacity, and robustness. In the past decade, many video steganography methods have been proposed; however, the literature lacks sufficient survey articles that discuss all techniques. This paper presents a comprehensive study and analysis of numerous cutting-edge video steganography methods and their performance evaluations from the literature. Both compressed and raw video steganography methods are surveyed. In the compressed domain, video steganography techniques are categorized according to the video compression stages used as venues for data hiding, such as intra frame prediction, inter frame prediction, motion vectors, transformed and quantized coefficients, and entropy coding. On the other hand, raw video steganography methods are classified into spatial and transform domains. This survey suggests current research directions and recommendations to improve on existing video steganography techniques.

51 citations


Proceedings ArticleDOI
01 Jul 2017
TL;DR: This paper proposes a new approach for forensic analysis by exploiting the local spatio-temporal relationships within a portion of a video to robustly detect frame removals and produces a refined video-level confidence score that is superior to the raw output scores from the network.
Abstract: Frame dropping is a type of video manipulation where consecutive frames are deleted to omit content from the original video. Automatically detecting dropped frames across a large archive of videos while maintaining a low false alarm rate is a challenging task in digital video forensics. We propose a new approach for forensic analysis by exploiting the local spatio-temporal relationships within a portion of a video to robustly detect frame removals. In this paper, we propose to adapt the Convolutional 3D Neural Network (C3D) for frame drop detection. In order to further suppress errors produced by the network, we produce a refined video-level confidence score and demonstrate that it is superior to the raw output scores from the network. We conduct experiments on two challenging video datasets containing rapid camera motion and zoom changes. The experimental results clearly demonstrate the efficacy of the proposed approach.
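
A minimal sketch of turning per-window detector outputs into a video-level confidence: median-filter the raw scores to suppress isolated spikes, then take the maximum. The exact refinement applied to the C3D outputs in the paper may differ; this only illustrates the idea.

```python
import numpy as np

def video_confidence(window_scores, k=5):
    """Suppress isolated spikes in raw per-window frame-drop scores with a
    temporal median filter, then take the maximum as the video-level confidence.
    (Illustrative refinement; the paper's exact scheme may differ.)"""
    s = np.asarray(window_scores, dtype=float)
    pad = k // 2
    padded = np.pad(s, pad, mode="edge")
    smoothed = np.array([np.median(padded[i:i + k]) for i in range(len(s))])
    return float(smoothed.max())

# A lone spike (0.9 at index 1) is damped, while the sustained run near the end
# survives the median filter and drives the video-level score.
print(video_confidence([0.1, 0.9, 0.1, 0.2, 0.8, 0.85, 0.9, 0.2]))
```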

38 citations


Journal ArticleDOI
TL;DR: A binocular rivalry inspired model is applied to account for the prediction bias, leading to a significantly improved full reference quality prediction model of stereoscopic videos that allows us to quantitatively predict the coding gain of different variations of asymmetric video compression, and provides new insight on the development of high efficiency 3D video coding schemes.
Abstract: Objective quality assessment of stereoscopic 3D video is challenging but highly desirable, especially in the application of stereoscopic video compression and transmission, where useful quality models that can guide the critical decision-making steps in the selection of mixed-resolution coding, asymmetric quantization, and pre- and post-processing schemes are still missing. Here we first carry out subjective quality assessment experiments on two databases that contain various asymmetrically compressed stereoscopic 3D videos obtained from mixed-resolution coding, asymmetric transform-domain quantization coding, their combinations, and multiple choices of postprocessing techniques. We compare these asymmetric stereoscopic video coding schemes with symmetric coding methods and verify their potential coding gains. We observe a strong systematic bias when using direct averaging of the 2D video quality of both views to predict 3D video quality. We then apply a binocular rivalry inspired model to account for the prediction bias, leading to a significantly improved full-reference quality prediction model for stereoscopic videos. The model allows us to quantitatively predict the coding gain of different variations of asymmetric video compression, and provides new insight on the development of high efficiency 3D video coding schemes.
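
A hedged sketch of the pooling idea: rather than directly averaging the two views' 2D quality scores (which the paper shows is biased for asymmetric compression), weight each view by its relative signal energy so the dominant view contributes more, in the spirit of binocular rivalry. The actual model in the paper is more elaborate; the weighting below is an assumption.

```python
def rivalry_weighted_quality(q_left, q_right, e_left, e_right):
    """Illustrative binocular-rivalry-inspired pooling: each view's 2D quality
    score is weighted by its relative signal energy, so the stronger (dominant)
    view has more influence on the predicted 3D quality than a plain average
    would give it. Energies and the linear form are assumptions."""
    w_left = e_left / (e_left + e_right + 1e-12)
    return w_left * q_left + (1.0 - w_left) * q_right

# Asymmetric coding example: a heavily blurred right view has low energy and
# therefore a reduced influence on the predicted 3D score.
print(rivalry_weighted_quality(q_left=0.92, q_right=0.55, e_left=1.0, e_right=0.4))
```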

Journal ArticleDOI
TL;DR: This article proposes a robust watermarking framework for HEVC-encoded video using an informed detector and shows that the proposed work effectively limits the increase in video bitrate and the degradation in perceptual quality.
Abstract: Digital watermarking has received much attention in recent years as a promising solution to copyright protection. Video watermarking in the compressed domain has gained importance since videos are stored and transmitted in a compressed format. This decreases the overhead of fully decoding and re-encoding the video for embedding and extraction of the watermark. High Efficiency Video Coding (HEVC/H.265) is the latest and most efficient video compression standard and a successor to H.264 Advanced Video Coding. In this article, we propose a robust watermarking framework for HEVC-encoded video using an informed detector. A readable watermark is embedded invisibly in P frames for better perceptual quality. Our framework ensures security and robustness by selecting appropriate blocks using a random key and the spatio-temporal characteristics of the compressed video. A detailed analysis of the strengths of different compressed-domain features is performed for implementing the watermarking framework. We experimentally demonstrate the utility of the proposed work. The results show that the proposed work effectively limits the increase in video bitrate and the degradation in perceptual quality. The proposed framework is robust against re-encoding and image processing attacks.

Proceedings ArticleDOI
01 Jul 2017
TL;DR: This paper derives the optimal rate-distortion relationship in spherical domain and presents its optimal solution based on HEVC/H.265 anchor for 360-degree video coding.
Abstract: Emerging virtual reality (VR) applications bring many challenges to video coding for 360-degree videos. To compress this kind of video, each picture should first be projected to a 2D plane (e.g., an equirectangular projection map), adapting to the input of existing video coding systems. At the display side, an inverse projection is performed before viewport rendering. However, such a projection introduces very different levels of distortion depending on location, which makes the rate-distortion optimization process in video coding inefficient. In this paper, we consider the distortion in the spherical domain and analyse its influence on the rate-distortion optimization process. Then we derive the optimal rate-distortion relationship in the spherical domain and present its optimal solution based on HEVC/H.265. Experimental results show that the proposed method can bring up to 11.5% bit savings compared with the current HEVC/H.265 anchor for 360-degree video coding.
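
A small sketch of measuring distortion in the spherical domain for an equirectangular (ERP) frame: each row's squared error is weighted by the cosine of its latitude, since rows near the poles cover far less sphere area than rows at the equator. This WS-PSNR-style weighting illustrates the observation the paper builds on; the paper's rate-distortion derivation is not reproduced here.

```python
import numpy as np

def spherical_weighted_mse(ref, rec):
    """Weight squared errors of an ERP frame by cos(latitude): plain per-pixel
    MSE over-counts the stretched polar rows, whereas this weighting measures
    error roughly per unit of sphere area."""
    h, w = ref.shape[:2]
    lat = (np.arange(h) + 0.5) / h * np.pi - np.pi / 2     # latitude per row
    weights = np.cos(lat)[:, None] * np.ones((1, w))
    se = (ref.astype(float) - rec.astype(float)) ** 2
    if se.ndim == 3:
        se = se.mean(-1)                                   # average over channels
    return float((weights * se).sum() / weights.sum())

ref = np.random.randint(0, 256, (180, 360), dtype=np.uint8)
rec = np.clip(ref + np.random.randint(-5, 6, ref.shape), 0, 255)
print(spherical_weighted_mse(ref, rec))
```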

Journal ArticleDOI
01 Oct 2017
TL;DR: Experimental results with videos from the Open Video Project show that the proposed new computational visual attention model represents an effective solution to the problem of automatic video summarization, producing video summaries with similar quality to the ground-truth manually created by a group of 50 users.
Abstract: This work addresses the development of a computational model of visual attention to perform the automatic summarization of digital videos from television archives. Although the television system represents one of the most fascinating media phenomena ever created, we still observe the absence of effective solutions for content-based information retrieval from video recordings of programs produced by this media universe. This fact relates to the high complexity of the content-based video retrieval problem, which involves several challenges, among which we may highlight the usual demand for video summaries to facilitate indexing, browsing and retrieval operations. To achieve this goal, we propose a new computational visual attention model, inspired by the human visual system and based on computer vision methods (face detection, motion estimation and saliency map computation), to estimate static video abstracts, that is, collections of salient images or key frames extracted from the original videos. Experimental results with videos from the Open Video Project show that our approach represents an effective solution to the problem of automatic video summarization, producing video summaries with similar quality to ground-truth summaries manually created by a group of 50 users.
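
An illustrative sketch of attention-driven key-frame selection: per-frame saliency, motion and face scores are combined into one attention curve, and temporally spread peaks are kept as the static summary. The weights, suppression rule and names are assumptions; the paper's model is richer.

```python
import numpy as np

def select_keyframes(saliency, motion, faces, weights=(0.4, 0.3, 0.3), n_keyframes=5):
    """Combine per-frame saliency, motion and face-presence scores into a single
    attention curve and keep temporally spread peaks as key frames.
    (Illustrative; not the paper's attention model.)"""
    def norm(x):
        x = np.asarray(x, dtype=float)
        return (x - x.min()) / (x.max() - x.min() + 1e-12)
    attention = (weights[0] * norm(saliency)
                 + weights[1] * norm(motion)
                 + weights[2] * norm(faces))
    min_gap = len(attention) // (2 * n_keyframes)     # simple temporal suppression
    chosen = []
    for idx in np.argsort(-attention):                # highest attention first
        if all(abs(int(idx) - c) > min_gap for c in chosen):
            chosen.append(int(idx))
        if len(chosen) == n_keyframes:
            break
    return sorted(chosen)

print(select_keyframes(np.random.rand(200), np.random.rand(200), np.random.rand(200)))
```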

Patent
Ye Jing1, Shan Liu1, Shaw-Min Lei1
23 Feb 2017
TL;DR: In this article, the authors proposed a method for video coding, which includes receiving input data associated with a current block in an image frame, generating an inter predictor of the current block, and generating an intra predictor based on samples of neighboring pixels and an intra prediction mode that locates the samples of neighbouring pixels.
Abstract: Aspects of the disclosure include a method for video coding. The method includes receiving input data associated with a current block in an image frame, generating an inter predictor of the current block, and generating an intra predictor of the current block based on samples of neighboring pixels and an intra prediction mode that locates the samples of neighboring pixels. The method further includes generating a final predictor of the current block by combining the inter predictor and the intra predictor according to one or more intra weight coefficients associated with the intra prediction mode, and encoding or decoding the current block based on the final predictor to output encoded video data or a decoded block. The one or more intra weight coefficients indicate one or more ratios that corresponding one or more portions of the intra predictor are combined with the inter predictor, respectively.
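
A minimal sketch of the claimed predictor combination: the final predictor is a per-sample weighted blend of the intra and inter predictors, final = w * intra + (1 - w) * inter. The weight pattern below is made up for illustration; in the claim the weights depend on the signalled intra prediction mode.

```python
import numpy as np

def combined_prediction(inter_pred, intra_pred, intra_weights):
    """Blend an inter predictor and an intra predictor of the same block using
    per-sample intra weight coefficients, as the claim describes. The weights
    used here are illustrative assumptions, not values from the disclosure."""
    w = np.asarray(intra_weights, dtype=float)
    return w * intra_pred + (1.0 - w) * inter_pred

inter_pred = np.full((8, 8), 100.0)
intra_pred = np.full((8, 8), 140.0)
# e.g. heavier intra weight near the block's top row, fading towards the bottom,
# since intra reference samples sit above/left of the block.
w = np.linspace(0.75, 0.25, 8)[:, None] * np.ones((1, 8))
print(combined_prediction(inter_pred, intra_pred, w).round(1))
```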

Journal ArticleDOI
TL;DR: This paper enables video coding for video stabilization by constructing the camera motions from the motion vectors employed in video coding, designing a grid-based 2D method, named CodingFlow, which is optimized for spatially-variant motion compensation.
Abstract: Video coding focuses on reducing the data size of videos. Video stabilization aims to remove shaky camera motion. In this paper, we enable video coding for video stabilization by constructing the camera motions based on the motion vectors employed in the video coding. The existing stabilization methods rely heavily on image features for the recovery of camera motions. However, feature tracking is time-consuming and prone to errors. On the other hand, nearly all captured videos have been compressed before any further processing, and such compression has produced a rich set of block-based motion vectors that can be utilized for estimating the camera motion. More specifically, video stabilization requires camera motions between two adjacent frames. However, motion vectors extracted from video coding may refer to non-adjacent frames. We first show that these non-adjacent motions can be transformed into adjacent motions such that each coding block within a frame contains a motion vector referring to its adjacent previous frame. Then, we regularize these motion vectors to yield a spatially-smoothed motion field at each frame, named CodingFlow, which is optimized for a spatially-variant motion compensation. Based on CodingFlow, we finally design a grid-based 2D method to accomplish the video stabilization. Our method is evaluated in terms of efficiency and stabilization quality, both quantitatively and qualitatively, which shows that our method can achieve high-quality results compared with the state-of-the-art (feature-based) methods.
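
A crude sketch of two of the steps described above, under stated assumptions: scaling a motion vector that references a non-adjacent frame down to per-adjacent-frame motion, and spatially smoothing the block-level motion field with a simple box filter. The paper's CodingFlow optimization is more principled; this only illustrates the data flow.

```python
import numpy as np

def to_adjacent_motion(mv, ref_distance):
    """Crude approximation: convert a motion vector that references a frame
    `ref_distance` frames away into per-adjacent-frame motion by linear scaling.
    (The paper derives this transformation more carefully.)"""
    return np.asarray(mv, dtype=float) / float(ref_distance)

def smooth_motion_field(mv_field, ksize=5):
    """Spatially smooth a block-level motion-vector field (Hb x Wb x 2) with a
    box filter, standing in for the spatially-smoothed field the paper optimizes."""
    pad = ksize // 2
    padded = np.pad(mv_field, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.zeros_like(mv_field, dtype=float)
    for dy in range(ksize):
        for dx in range(ksize):
            out += padded[dy:dy + mv_field.shape[0], dx:dx + mv_field.shape[1]]
    return out / (ksize * ksize)

field = np.random.randn(30, 40, 2)                 # dummy decoded motion vectors
field = to_adjacent_motion(field, ref_distance=2)  # refer to the adjacent frame
print(smooth_motion_field(field).shape)            # (30, 40, 2)
```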

Proceedings ArticleDOI
28 May 2017
TL;DR: A novel two-tier 360 degree video streaming scheme is proposed to accommodate the dynamics in both network bandwidth and viewing direction and it is demonstrated that the proposed framework can significantly outperform conventional 360 video streaming schemes.
Abstract: 360 degree video compression and delivery is one of the key components of virtual reality (VR) applications. In such applications, the users may freely control and navigate the captured 3D environment from any viewing direction. Given that only a small portion of the entire video is watched at any time, fetching the entire 360 degree raw video is therefore unnecessary and bandwidth-consuming. In this work, a novel two-tier 360 degree video streaming scheme is proposed to accommodate the dynamics in both network bandwidth and viewing direction. Based on the real-trace driven simulations, we demonstrate that the proposed framework can significantly outperform conventional 360 video streaming schemes.

Journal ArticleDOI
TL;DR: An enhanced model of objective VQA based on the estimation of jerkiness is proposed, which performs better, in terms of estimating the impact of multiple frame freezing impairments, and has more affinity with the subjective test results.
Abstract: In wireless networks, due to limited bandwidth and packet losses, seamless and ubiquitous delivery of high-quality video streaming services is a major challenge for operators. In order to improve the process of online video quality monitoring, no-reference (NR) objective video quality assessment (VQA) methods are required. In some networks, the video decoder on the receiving side adopts a mechanism in which the last correctly received frame is frozen and displayed on the video display terminal until the next correct frame is received. This phenomenon, employed as an error concealment technique, can cause perceptual jerkiness on the video display terminal. In this paper, we propose an enhanced model of objective VQA based on the estimation of jerkiness. A study of three contemporary NR methods, used for objective VQA and online monitoring of videos, has been included along with subjective VQA tests. The subjective tests were performed for a set of video sequences with specific spat...
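
A minimal sketch of the measurement step behind jerkiness estimation: consecutive frames that are nearly identical are counted as freezes, and the resulting freeze lengths could feed a quality model. The threshold and names are assumptions; the paper's model is not reproduced.

```python
import numpy as np

def freeze_events(frames, diff_threshold=0.5):
    """No-reference detection of frame freezes: consecutive frames with a mean
    absolute difference below a small threshold are treated as frozen. A
    jerkiness-based quality score would then decrease with the number and
    length of freezes; only the measurement step is shown here."""
    freezes, run = [], 0
    for prev, cur in zip(frames, frames[1:]):
        mad = np.mean(np.abs(cur.astype(float) - prev.astype(float)))
        if mad < diff_threshold:
            run += 1
        elif run:
            freezes.append(run)
            run = 0
    if run:
        freezes.append(run)
    return freezes   # list of freeze lengths, in frames

frames = [np.random.randint(0, 256, (72, 96), dtype=np.uint8) for _ in range(10)]
frames[4] = frames[3].copy()   # simulate one repeated (frozen) frame
print(freeze_events(frames))   # -> [1]
```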

Proceedings ArticleDOI
05 May 2017
TL;DR: This paper presents a comprehensive study and analysis of numerous cutting-edge video steganography methods and their performance evaluations from the literature, and suggests current research directions and recommendations to improve on existing video steganographic techniques.
Abstract: Nowadays, video steganography has become important in many security applications. The performance of any steganographic method ultimately relies on the imperceptibility, hiding capacity, and robustness. In the past decade, many video steganography methods have been proposed; however, the literature lacks sufficient survey articles that discuss all techniques. This paper presents a comprehensive study and analysis of numerous cutting-edge video steganography methods and their performance evaluations from the literature. Both compressed and raw video steganographic methods are surveyed. In the compressed domain, video steganographic techniques are categorized according to the video compression stages used as venues for data hiding, such as intra frame prediction, inter frame prediction, motion vectors, transformed and quantized coefficients, and entropy coding. On the other hand, raw video steganographic methods are classified into spatial and transform domains. This survey suggests current research directions and recommendations to improve on existing video steganographic techniques.

Journal ArticleDOI
TL;DR: Experimental results on video captioning tasks show that the proposed method, utilizing only RGB frames as input without extra video or text training data, could achieve competitive performance with state-of-the-art methods.

Patent
18 May 2017
TL;DR: In this article, a video capture device captures 360-degree video in a first projection format, and an encoding device encodes the captured 360-degree video into a 360-degree video bitstream.
Abstract: In a system for 360 degree video capture and playback, 360 degree video may be captured, stitched, encoded, decoded, rendered, and played back. In one or more implementations, a video capture device captures 360 degree video in a first projection format, and an encoding device encodes the captured 360 degree video into a 360 degree video bitstream. In some aspects, the 360 degree video bitstream is encoded with an indication of the first projection format. In one or more implementations, a rendering device converts the decoded 360 degree video bitstream from the first projection format to a second projection format based on the indication. In one or more implementations, a processing device generates projection maps where each is respectively associated with a different projection format, and a rendering device renders the decoded 360 degree video bitstream using one of the projection maps.

Journal ArticleDOI
TL;DR: A fast H.264/advanced video coding (AVC) to HEVC transcoding method is proposed, with early termination strategies for the prediction unit (PU) modes based on the CU size and corresponding prior statistical knowledge.
Abstract: With the popularity of the high-efficiency video coding (HEVC) standard, a video server usually transcodes a video stream to HEVC for its higher compression ratio. In this paper, a fast H.264/advanced video coding (AVC) to HEVC transcoding method is proposed. In the HEVC encoding procedure, each coding unit (CU) is first checked for motion homogeneity based on an analysis of the decoded information from the H.264/AVC bit stream. Then, for motion-homogeneous blocks, early termination strategies for the CU depth and the corresponding prediction unit (PU) modes are applied based on the CU size and corresponding prior statistical knowledge. For non-motion-homogeneous blocks, a corresponding PU mode early termination strategy is also proposed. Experimental results demonstrate the effectiveness of the proposed method.
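
An illustrative sketch of the early-termination structure: if the co-located H.264 macroblocks inside a CU indicate a motion-homogeneous area, CU splitting is stopped and only a few PU modes are tested. The mode labels, size rule and candidate lists are assumptions, not the paper's statistics-derived rules.

```python
def early_terminate_cu(h264_mb_modes, cu_size):
    """Illustrative early-termination rule for H.264 -> HEVC transcoding: if all
    co-located H.264 macroblocks inside the HEVC CU were coded as SKIP or as
    large inter partitions (a motion-homogeneous area), stop splitting and test
    only merge/large-PU modes; otherwise allow further CU splitting. Only the
    decision structure is shown, not the paper's refined statistics."""
    homogeneous = all(m in ("SKIP", "16x16") for m in h264_mb_modes)
    if homogeneous and cu_size >= 32:
        return {"split": False, "pu_candidates": ["MERGE", "2Nx2N"]}
    return {"split": True, "pu_candidates": ["2Nx2N", "2NxN", "Nx2N", "NxN"]}

print(early_terminate_cu(["SKIP", "SKIP", "16x16", "SKIP"], cu_size=32))
print(early_terminate_cu(["8x8", "SKIP", "16x16", "SKIP"], cu_size=32))
```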

Journal ArticleDOI
TL;DR: A standards-compliant video-encoding scheme that can suppress unnecessary temporal fluctuation in stable background areas of a raw video, and improves object-detection performance and results in lower bit rates with comparable quality.
Abstract: Many distributed wireless surveillance applications use compressed videos for automatic video analysis tasks. However, the accuracy of object detection, which is essential for video analysis, can be reduced because lossy compression degrades video quality. Current standardized video-encoding schemes can cause temporal fluctuation for encoded blocks in stable background areas of a raw video, which strongly affects object-detection accuracy. To obtain better object-detection performance on compressed videos, the authors introduce a standards-compliant video-encoding scheme that can suppress unnecessary temporal fluctuation in stable background areas. New mode-decision strategies, designed for both intra- and interframes, reduce the temporal fluctuation while maintaining acceptable rate-distortion performance. Experimental results show that, compared with traditional encoding schemes, the proposed scheme improves object-detection performance and results in lower bit rates with comparable quality.

Journal ArticleDOI
TL;DR: This work represents the hierarchical structure of video with multiple granularities including, from short to long, single frame, consecutive frames (motion), short clip, and the entire video and proposes a novel deep learning framework to model each granularity individually.
Abstract: Video analysis is an important branch of computer vision due to its wide applications, ranging from video surveillance, video indexing, and retrieval to human computer interaction. All of the applications are based on a good video representation, which encodes video content into a feature vector with fixed length. Most existing methods treat video as a flat image sequence, but from our observations we argue that video is an information-intensive media with intrinsic hierarchical structure, which is largely ignored by previous approaches. Therefore, in this work, we represent the hierarchical structure of video with multiple granularities including, from short to long, single frame, consecutive frames (motion), short clip, and the entire video. Furthermore, we propose a novel deep learning framework to model each granularity individually. Specifically, we model the frame and motion granularities with 2D convolutional neural networks and model the clip and video granularities with 3D convolutional neural networks. Long Short-Term Memory networks are applied on the frame, motion, and clip to further exploit the long-term temporal clues. Consequently, the whole framework utilizes multi-stream CNNs to learn a hierarchical representation that captures spatial and temporal information of video. To validate its effectiveness in video analysis, we apply this video representation to action recognition task. We adopt a distribution-based fusion strategy to combine the decision scores from all the granularities, which are obtained by using a softmax layer on the top of each stream. We conduct extensive experiments on three action benchmarks (UCF101, HMDB51, and CCV) and achieve competitive performance against several state-of-the-art methods.

Proceedings Article
04 Feb 2017
TL;DR: A joint feature projection matrix and heterogeneous dictionary pair learning (PHDL) approach for IVPR is proposed; to ensure that the obtained coding coefficients have favorable discriminability, PHDL designs a point-to-set coefficient discriminant term.
Abstract: Person re-identification (re-id) plays an important role in video surveillance and forensics applications. In many cases, person re-id needs to be conducted between image and video clip, e.g., re-identifying a suspect from large quantities of pedestrian videos given a single image of him. We call re-id in this scenario as image to video person re-id (IVPR). In practice, image and video are usually represented with different features, and there usually exist large variations between frames within each video. These factors make matching between image and video become a very challenging task. In this paper, we propose a joint feature projection matrix and heterogeneous dictionary pair learning (PHDL) approach for IVPR. Specifically, PHDL jointly learns an intra-video projection matrix and a pair of heterogeneous image and video dictionaries. With the learned projection matrix, the influence of variations within each video to the matching can be reduced. With the learned dictionary pair, the heterogeneous image and video features can be transformed into coding coefficients with the same dimension, such that the matching can be conducted using coding coefficients. Furthermore, to ensure that the obtained coding coefficients have favorable discriminability, PHDL designs a point-to-set coefficient discriminant term. Experiments on the public iLIDS-VID and PRID 2011 datasets demonstrate the effectiveness of the proposed approach.

Patent
14 Feb 2017
TL;DR: In this article, a multi-pass non-separable inverse transformation is performed on the plurality of values to derive residual data that represents pixel differences between the current block of video data and a predictive block of the video data.
Abstract: An example method of decoding video data includes determining, by a video decoder and based on syntax elements in an encoded video bitstream, a plurality of values for a current block of the video data; performing, by the video decoder, a multi-pass non-separable inverse transformation on the plurality of values to derive residual data that represents pixel differences between the current block of the video data and a predictive block of the video data; and reconstructing, by the video decoder, the current block of the video data based on the residual data and the predictive block of the video data. In some examples, performing a pass of the multi-pass non-separable inverse transformation includes performing a plurality of Givens orthogonal transformations.
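
A small sketch of the building block named in the claim, a pass of Givens rotations applied to a coefficient vector: each rotation mixes one pair of elements by an angle. The pairs and angles below are purely illustrative; a real codec would signal or predefine them as part of the multi-pass non-separable transform.

```python
import numpy as np

def givens_rotation_pass(coeffs, pairs, angles):
    """Apply one pass of Givens rotations to a coefficient vector: each rotation
    mixes one pair of elements (i, j) by an angle theta. Chaining several such
    passes yields a multi-pass non-separable transform; the pairs and angles
    here are made-up illustrations, not values from the disclosure."""
    out = np.asarray(coeffs, dtype=float).copy()
    for (i, j), theta in zip(pairs, angles):
        c, s = np.cos(theta), np.sin(theta)
        xi, xj = out[i], out[j]
        out[i] = c * xi - s * xj
        out[j] = s * xi + c * xj
    return out

coeffs = np.array([4.0, 1.0, -2.0, 0.5])
print(givens_rotation_pass(coeffs, pairs=[(0, 1), (2, 3)], angles=[np.pi / 6, np.pi / 4]))
```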

Journal ArticleDOI
TL;DR: A new method is proposed in which video summarization is performed as a simultaneous sparse dictionary training and selection problem, and it is shown that the proposed method improves the summarization of large amounts of video data compared to other methods.
Abstract: Every day, a huge amount of video data is generated worldwide, and processing this kind of data requires powerful resources in terms of time, manpower, and hardware. Therefore, to help quickly understand the content of video data, video summarization methods have been proposed. Recently, sparse formulation-based methods have been found to be able to summarize large amounts of video compared to other methods. In this paper, we propose a new method in which video summarization is performed as a simultaneous sparse dictionary training and selection problem. It is shown that the proposed method is able to improve the summarization of large amounts of video data compared to other methods. Finally, the performance of the proposed method is compared to state-of-the-art methods using standard data sets, in which the key frames are manually tagged. The obtained results demonstrate that the proposed method improves video summarization compared to other methods.

Posted Content
TL;DR: This work presents a classification framework for the joint use of text, visual and audio features, and conducts an extensive set of experiments to quantify the benefit that this additional mode brings.
Abstract: The YouTube-8M video classification challenge requires teams to classify 0.7 million videos into one or more of 4,716 classes. In this Kaggle competition, we placed in the top 3% out of 650 participants using released video and audio features. Beyond that, we extend the original competition by including text information in the classification, making this a truly multi-modal approach with vision, audio and text. The newly introduced text data is termed as YouTube-8M-Text. We present a classification framework for the joint use of text, visual and audio features, and conduct an extensive set of experiments to quantify the benefit that this additional mode brings. The inclusion of text yields state-of-the-art results, e.g. 86.7% GAP on the YouTube-8M-Text validation dataset.

Proceedings ArticleDOI
Ziwei Yang1, Youjiang Xu1, Huiyun Wang1, Bo Wang1, Yahong Han1 
23 Oct 2017
TL;DR: A Multirate Multimodal Approach for video captioning is proposed that utilizes a Multirate GRU to capture the temporal structure of videos and achieves strong performance on the 2nd MSR Video to Language Challenge.
Abstract: Automatically describing videos with natural language is a crucial challenge of video understanding. Compared to images, videos have a specific spatial-temporal structure and various modality information. In this paper, we propose a Multirate Multimodal Approach for video captioning. Considering that the speed of motion in videos varies constantly, we utilize a Multirate GRU to capture the temporal structure of videos. It encodes video frames with different intervals and has a strong ability to deal with motion speed variance. As videos contain different modality cues, we design a particular multimodal fusion method. By incorporating visual, motion, and topic information together, we construct a well-designed video representation. Then the video representation is fed into an RNN-based language model for generating natural language descriptions. We evaluate our approach for video captioning on "Microsoft Research - Video to Text" (MSR-VTT), a large-scale video benchmark for video understanding. Our approach achieves strong performance on the 2nd MSR Video to Language Challenge.

Journal ArticleDOI
Jooseung Lee1, In-Cheol Park1
TL;DR: Experimental results show that the proposed architecture provides the best visual quality at the cost of reasonable hardware resources.
Abstract: A new algorithm and its hardware architecture are presented to up-scale high-definition (HD) and full-HD video streams to 4-K ultra-HD video streams in real time. The Lagrange interpolation is employed, as it provides high estimation accuracy and hardware-friendly properties. To enhance the accuracy further, the pixels at the edge regions are specially processed by employing an image-sharpening technique. Experimental results show that the proposed architecture provides the best visual quality at the cost of reasonable hardware resources.
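
A minimal sketch of the interpolation idea: 4-tap cubic Lagrange weights are derived from the node positions and used to create the in-between samples for 2x upscaling of one row. The edge-region sharpening the paper adds and the hardware mapping are omitted; names are illustrative.

```python
import numpy as np

def lagrange_weights(t):
    """4-tap cubic Lagrange interpolation weights for a fractional position t in
    [0, 1) between the two middle samples of a 4-sample window at x = -1, 0, 1, 2."""
    x = np.array([-1.0, 0.0, 1.0, 2.0])
    w = np.ones(4)
    for k in range(4):
        for m in range(4):
            if m != k:
                w[k] *= (t - x[m]) / (x[k] - x[m])
    return w

def upscale_row_2x(row):
    """Illustrative 2x horizontal upscaling of one row with Lagrange interpolation
    (edge-aware sharpening from the paper is omitted)."""
    padded = np.pad(row.astype(float), 2, mode="edge")
    out = np.empty(2 * len(row))
    out[0::2] = row                                  # keep original samples
    w = lagrange_weights(0.5)                        # halfway positions
    for i in range(len(row)):
        out[2 * i + 1] = np.dot(w, padded[i + 1:i + 5])
    return out

print(upscale_row_2x(np.array([0, 10, 20, 30, 20, 10], dtype=float)))
```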