scispace - formally typeset
Author

Fernando Pereira

Bio: Fernando Pereira is an academic researcher from Instituto Superior Técnico. The author has contributed to research topics including Encoder and Point cloud, has an h-index of 32, and has co-authored 80 publications receiving 5,282 citations.


Papers
Journal ArticleDOI
TL;DR: This paper provides an overview of the new tools, features and complexity of H.264/AVC.
Abstract: H.264/AVC, the result of the collaboration between the ISO/IEC Moving Picture Experts Group and the ITU-T Video Coding Experts Group, is the latest standard for video coding. The goals of this standardization effort were enhanced compression efficiency and a network-friendly video representation for interactive (video telephony) and non-interactive (broadcast, streaming, storage, video-on-demand) applications. H.264/AVC provides gains in compression efficiency of up to 50% over a wide range of bit rates and video resolutions compared to previous standards. The decoder complexity is about four times that of MPEG-2 and twice that of the MPEG-4 Visual Simple Profile. This paper provides an overview of the new tools, features and complexity of H.264/AVC.

1,013 citations

01 Jan 2005
TL;DR: Besides forward and bidirectional motion estimation, a spatial motion smoothing algorithm to eliminate motion outliers is proposed, allowing significant improvements in the rate-distortion (RD) performance without increasing the encoder complexity.
Abstract: Distributed video coding (DVC) is a new compression paradigm based on two key information theory results: the Slepian-Wolf and Wyner-Ziv theorems. A particular case of DVC deals with lossy source coding with side information at the decoder (Wyner-Ziv) and enables shifting the coding complexity from the encoder to the decoder. The solution described here is based on a very lightweight encoder, leaving the time-consuming motion estimation/compensation task to the decoder. In this paper, the performance of the pixel-domain distributed video codec is improved by using better side information derived by motion-compensated frame interpolation algorithms at the decoder. Besides forward and bidirectional motion estimation, a spatial motion smoothing algorithm to eliminate motion outliers is proposed. This allows significant improvements in the rate-distortion (RD) performance without increasing the encoder complexity.

433 citations

Book
20 Jul 2002
TL;DR: A comprehensive, targeted guide to the MPEG-4 standard and its use in cutting-edge applications, in which Fernando Pereira and Touradj Ebrahimi demonstrate how MPEG-4 addresses tomorrow's multimedia applications more successfully than any previous standard.
Abstract: From the Publisher: The most complete, focused, up-to-the-minute guide to MPEG-4, the breakthrough standard for interactive multimedia, with practical solutions for next-generation multimedia applications. It offers in-depth coverage of natural and synthetic audiovisual object coding, description, composition and synchronization; binary and textual scene description; transport and storage of MPEG-4 content; and MPEG-4 profiles, levels, and verification tests. MPEG-4 represents a breakthrough in multimedia, delivering not just outstanding compression but also a fully interactive user experience. In The MPEG-4 Book, two leaders of the MPEG-4 standards community offer a comprehensive, targeted guide to the MPEG-4 standard and its use in cutting-edge applications. Fernando Pereira and Touradj Ebrahimi, together with a unique collection of key MPEG experts, demonstrate how MPEG-4 addresses tomorrow's multimedia applications more successfully than any previous standard. They review every element of the standard, covering: synthetic and natural audio and video object coding, description and synchronization; BIFS, the MPEG-4 language for scene description and interaction; the extensible MPEG-4 textual format XMT; transport and delivery of MPEG-4 content; MPEG-J, the use of Java classes within MPEG-4 content; a complete overview of MPEG-4 profiles and levels; and verification tests. The authors also walk through the MPEG-4 Systems Reference Software, offering powerful real-world insights for every product developer, software professional, engineer, and researcher involved with MPEG-4 and state-of-the-art multimedia delivery. Part of the IMSC Press Series from the Integrated Media Systems Center at the University of Southern California, a federally funded center specializing in cutting-edge multimedia research.

363 citations

Journal ArticleDOI
TL;DR: The article provides a comprehensive overview of MPEG-7's motivation, objectives, scope, and components.
Abstract: The recently completed ISO/IEC, International Standard 15938, formally called the Multimedia Content Description Interface (but better known as MPEG-7), provides a rich set of tools for completely describing multimedia content. The standard wasn't just designed from a content management viewpoint (classical archival information). It includes an innovative description of the media's content, which we can extract via content analysis and processing. MPEG-7 also isn't aimed at any one application; rather, the elements that MPEG-7 standardizes support as broad a range of applications as possible. This is one of the key differences between MPEG-7 and other metadata standards; it aims to be generic, not targeted to a specific application or application domain. The article provides a comprehensive overview of MPEG-7's motivation, objectives, scope, and components.

308 citations

Journal ArticleDOI
TL;DR: The higher the estimation granularity, the better the rate-distortion performance, since the decoding process adapts more closely to the video's statistical characteristics; accordingly, the pixel and coefficient levels perform best for the PDWZ and TDWZ solutions, respectively.
Abstract: In recent years, practical Wyner-Ziv (WZ) video coding solutions have been proposed with promising results. Most of the solutions available in the literature model the correlation noise (CN) between the original frame and its estimation made at the decoder, the so-called side information (SI), by a given distribution whose relevant parameters are estimated using an offline process, assuming that the SI is available at the encoder or the originals are available at the decoder. The major goal of this paper is to propose a more realistic WZ video coding approach by performing online estimation of the CN model parameters at the decoder, for pixel- and transform-domain WZ video codecs. In this context, several new techniques are proposed based on metrics which explore the temporal correlation between frames with different levels of granularity. For pixel-domain WZ (PDWZ) video coding, three levels of granularity are proposed: frame, block, and pixel levels. For transform-domain WZ (TDWZ) video coding, DCT bands and coefficients are the two granularity levels proposed. The higher the estimation granularity, the better the rate-distortion performance, since the decoding process adapts more closely to the video's statistical characteristics; accordingly, the pixel and coefficient levels perform best for the PDWZ and TDWZ solutions, respectively.

241 citations
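In WZ codecs of this kind, the correlation noise is commonly modeled as a zero-mean Laplacian whose parameter α is estimated from the residual between a frame and its side information. As a rough illustration of the frame-level granularity case only, here is a minimal NumPy sketch (the function name and toy data are illustrative, not the paper's implementation):

```python
import numpy as np

def estimate_laplacian_alpha(frame, side_info):
    """Frame-level estimate of the Laplacian correlation-noise parameter.

    For a zero-mean Laplacian with variance sigma^2, alpha = sqrt(2 / sigma^2),
    where sigma^2 is estimated from the residual between the frame and
    its side information.
    """
    residual = frame.astype(np.float64) - side_info.astype(np.float64)
    variance = np.mean(residual ** 2)  # zero-mean assumption
    return np.sqrt(2.0 / variance)

# Toy example: side information deviating from the frame by Laplacian
# noise with scale b = 2, i.e. variance 2*b^2 = 8, so alpha ~ 0.5.
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(64, 64)).astype(np.float64)
side_info = frame + rng.laplace(0.0, 2.0, size=frame.shape)
alpha = estimate_laplacian_alpha(frame, side_info)
```

The paper's point is that finer granularities (block- and pixel-level statistics rather than one α per frame) let the decoder track local variations in side-information quality.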


Cited by
Book
19 Dec 2003
TL;DR: In this book, the MPEG-4 and H.264 standards are discussed and an overview of the technologies involved is presented, from video formats and coding concepts to codec design, performance, and applications.
Abstract: About the Author. Foreword. Preface. Glossary. 1. Introduction. 2. Video Formats and Quality. 3. Video Coding Concepts. 4. The MPEG-4 and H.264 Standards. 5. MPEG-4 Visual. 6. H.264/MPEG-4 Part 10. 7. Design and Performance. 8. Applications and Directions. Bibliography. Index.

2,491 citations

Journal ArticleDOI
20 Nov 2017
TL;DR: In this paper, the authors provide a comprehensive tutorial and survey about the recent advances toward the goal of enabling efficient processing of DNNs, discuss various hardware platforms and architectures that support DNNs, and highlight key trends in reducing the computation cost of deep neural networks either solely via hardware design changes or via joint hardware and DNN algorithm changes.
Abstract: Deep neural networks (DNNs) are currently widely used for many artificial intelligence (AI) applications including computer vision, speech recognition, and robotics. While DNNs deliver state-of-the-art accuracy on many AI tasks, it comes at the cost of high computational complexity. Accordingly, techniques that enable efficient processing of DNNs to improve energy efficiency and throughput without sacrificing application accuracy or increasing hardware cost are critical to the wide deployment of DNNs in AI systems. This article aims to provide a comprehensive tutorial and survey about the recent advances toward the goal of enabling efficient processing of DNNs. Specifically, it will provide an overview of DNNs, discuss various hardware platforms and architectures that support DNNs, and highlight key trends in reducing the computation cost of DNNs either solely via hardware design changes or via joint hardware design and DNN algorithm changes. It will also summarize various development resources that enable researchers and practitioners to quickly get started in this field, and highlight important benchmarking metrics and design considerations that should be used for evaluating the rapidly growing number of DNN hardware designs, optionally including algorithmic codesigns, being proposed in academia and industry. The reader will take away the following concepts from this article: understand the key design considerations for DNNs; be able to evaluate different DNN hardware implementations with benchmarks and comparison metrics; understand the tradeoffs between various hardware architectures and platforms; be able to evaluate the utility of various DNN design techniques for efficient processing; and understand recent implementation trends and opportunities.

2,391 citations
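As a flavor of the benchmarking metrics such a survey covers, the computational cost of a standard convolutional layer is commonly approximated to first order by its multiply-accumulate (MAC) count. A minimal sketch (the layer dimensions below are illustrative, not taken from the article):

```python
def conv2d_macs(h_out, w_out, c_in, c_out, k):
    """MAC count for a standard 2D convolution: each of the
    h_out * w_out * c_out output values needs k * k * c_in
    multiply-accumulates."""
    return h_out * w_out * c_out * k * k * c_in

# Example: a 3x3 convolution mapping 64 to 128 channels
# on a 56x56 output feature map.
macs = conv2d_macs(56, 56, 64, 128, 3)  # ~231 million MACs
```

Counts like this are what make the hardware/algorithm co-design trade-offs in the survey concrete: techniques such as depthwise separable convolutions or pruning attack exactly this product of terms.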

MonographDOI
02 Sep 2003
TL;DR: This monograph presents an overview of the MPEG-4 and H.264 standards, covering video formats and quality, video coding concepts, and the design and performance of codecs based on these standards.
Abstract: About the Author. Foreword. Preface. Glossary. 1. Introduction. 2. Video Formats and Quality. 3. Video Coding Concepts. 4. The MPEG-4 and H.264 Standards. 5. MPEG-4 Visual. 6. H.264/MPEG-4 Part 10. 7. Design and Performance. 8. Applications and Directions. Bibliography. Index.

1,520 citations

Proceedings Article
01 Jan 2016
TL;DR: This work trains a convolutional network to generate future frames given an input sequence and proposes three different and complementary feature learning strategies: a multi-scale architecture, an adversarial training method, and an image gradient difference loss function.
Abstract: Learning to predict future images from a video sequence involves constructing an internal representation that models the image evolution accurately and therefore, to some degree, its content and dynamics. This is why pixel-space video prediction may be viewed as a promising avenue for unsupervised feature learning. In addition, while optical flow has long been a well-studied problem in computer vision, future frame prediction is rarely approached. Still, many vision applications could benefit from knowledge of the next frames of a video, which does not require the complexity of tracking every pixel trajectory. In this work, we train a convolutional network to generate future frames given an input sequence. To deal with the inherently blurry predictions obtained from the standard mean squared error (MSE) loss function, we propose three different and complementary feature learning strategies: a multi-scale architecture, an adversarial training method, and an image gradient difference loss function. We compare our predictions to different published results based on recurrent neural networks on the UCF101 dataset.

1,369 citations
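The image gradient difference loss mentioned in the abstract penalizes mismatches between the spatial gradients of predicted and target frames, sharpening the blurry output of a plain MSE loss. A minimal NumPy sketch of that idea (the exponent α and the toy frames are illustrative, not the authors' implementation):

```python
import numpy as np

def gradient_difference_loss(pred, target, alpha=1.0):
    """Sum of |.|^alpha differences between the horizontal and vertical
    finite-difference gradients of the predicted and target frames."""
    def grads(img):
        gx = np.abs(img[:, 1:] - img[:, :-1])  # horizontal gradients
        gy = np.abs(img[1:, :] - img[:-1, :])  # vertical gradients
        return gx, gy

    pgx, pgy = grads(pred)
    tgx, tgy = grads(target)
    return np.sum(np.abs(tgx - pgx) ** alpha) + np.sum(np.abs(tgy - pgy) ** alpha)

# A blurred prediction loses the target's edge, so its gradients vanish
# and the loss is positive; a perfect prediction gives zero loss.
target = np.zeros((4, 4)); target[:, 2:] = 1.0  # sharp vertical edge
blurry = np.full((4, 4), 0.5)                   # edge washed out
loss_blurry = gradient_difference_loss(blurry, target)
loss_sharp = gradient_difference_loss(target, target)
```

Unlike MSE, which the constant 0.5 frame partially satisfies, this loss is driven entirely by edge structure, which is why it complements the adversarial term in combating blur.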

Posted Content
TL;DR: In this paper, a multi-scale architecture, an adversarial training method, and an image gradient difference loss function are proposed to predict future frames from a video sequence, countering the blurry predictions produced by the standard mean squared error loss.
Abstract: Learning to predict future images from a video sequence involves constructing an internal representation that models the image evolution accurately and therefore, to some degree, its content and dynamics. This is why pixel-space video prediction may be viewed as a promising avenue for unsupervised feature learning. In addition, while optical flow has long been a well-studied problem in computer vision, future frame prediction is rarely approached. Still, many vision applications could benefit from knowledge of the next frames of a video, which does not require the complexity of tracking every pixel trajectory. In this work, we train a convolutional network to generate future frames given an input sequence. To deal with the inherently blurry predictions obtained from the standard mean squared error (MSE) loss function, we propose three different and complementary feature learning strategies: a multi-scale architecture, an adversarial training method, and an image gradient difference loss function. We compare our predictions to different published results based on recurrent neural networks on the UCF101 dataset.

1,175 citations