scispace - formally typeset
Search or ask a question
Author

Robert L. Stevenson

Bio: Robert L. Stevenson is an academic researcher from University of Notre Dame. The author has contributed to research in topics: Image processing & Image restoration. The author has an hindex of 27, co-authored 146 publications receiving 5784 citations. Previous affiliations of Robert L. Stevenson include University Hospitals Birmingham NHS Foundation Trust & Purdue University.


Papers
More filters
Journal ArticleDOI
TL;DR: A novel observation model based on motion compensated subsampling is proposed for a video sequence and Bayesian restoration with a discontinuity-preserving prior image model is used to extract a high-resolution video still given a short low-resolution sequence.
Abstract: The human visual system appears to be capable of temporally integrating information in a video sequence in such a way that the perceived spatial resolution of a sequence appears much higher than the spatial resolution of an individual frame. While the mechanisms in the human visual system that do this are unknown, the effect is not too surprising given that temporally adjacent frames in a video sequence contain slightly different, but unique, information. This paper addresses the use of both the spatial and temporal information present in a short image sequence to create a single high-resolution video frame. A novel observation model based on motion compensated subsampling is proposed for a video sequence. Since the reconstruction problem is ill-posed, Bayesian restoration with a discontinuity-preserving prior image model is used to extract a high-resolution video still given a short low-resolution sequence. Estimates computed from a low-resolution image sequence containing a subpixel camera pan show dramatic visual and quantitative improvements over bilinear, cubic B-spline, and Bayesian single frame interpolations. Visual and quantitative improvements are also shown for an image sequence containing objects moving with independent trajectories. Finally, the video frame extraction algorithm is used for the motion-compensated scan conversion of interlaced video data, with a visual comparison to the resolution enhancement obtained from progressively scanned frames.

1,058 citations

Journal ArticleDOI
TL;DR: A method for nonlinear image expansion which preserves the discontinuities of the original image, producing an expanded image with improved definition is introduced.
Abstract: Accurate image expansion is important in many areas of image analysis. Common methods of expansion, such as linear and spline techniques, tend to smooth the image data at edge regions. This paper introduces a method for nonlinear image expansion which preserves the discontinuities of the original image, producing an expanded image with improved definition. The maximum a posteriori (MAP) estimation techniques that are proposed for noise-free and noisy images result in the optimization of convex functionals. The expanded images produced from these methods will be shown to be aesthetically and quantitatively superior to images expanded by the standard methods of replication, linear interpolation, and cubic B-spline expansion. >

580 citations

Proceedings ArticleDOI
09 Aug 1998
TL;DR: The state of the art of SR techniques is reviewed using a taxonomy of existing techniques and areas which promise performance improvements are identified.
Abstract: Growing interest in super-resolution (SR) restoration of video sequences and the closed related problem of construction of SR still images from image sequences has led to the emergence of several competing methodologies. We review the state of the art of SR techniques using a taxonomy of existing techniques. We critique these methods and identified areas which promise performance improvements.

518 citations

Proceedings ArticleDOI
24 Oct 1999
TL;DR: An approach for improving the effective dynamic range of cameras by using multiple photographs of the same scene taken with different exposure times, which enables the photographer to accurately capture scenes that contain a high dynamic range, i.e., scenes that have both very bright and very dark regions.
Abstract: This paper presents an approach for improving the effective dynamic range of cameras by using multiple photographs of the same scene taken with different exposure times. Using this method enables the photographer to accurately capture scenes that contain a high dynamic range, i.e., scenes that have both very bright and very dark regions. The approach requires an initial calibration, where the camera response function is determined. Once the response function for a camera is known, high dynamic range images can be computed easily. The high dynamic range output image consists of a weighted average of the multiply-exposed input images, and thus contains information captured by each of the input images. From a computational standpoint, the proposed algorithm is very efficient, and requires little processing time to determine a solution.

427 citations

Journal ArticleDOI
TL;DR: A new method is proposed for determining the camera's response function, which is an iterative procedure that need be done only once for a particular camera, and results in higher weight being assigned to pixels taken at longer exposure times.
Abstract: We present a new approach for improving the effective dynamic range of cameras by using multiple photographs of the same scene taken with different exposure times. Using this method enables the photographer to accurately capture scenes that contain high dynamic range by using a device with low dynamic range, which allows the capture of scenes that have both very bright and very dark regions. We approach the problem from a probabilistic standpoint, distinguishing it from the other methods reported in the literature on photographic dynamic range improvement. A new method is proposed for determining the camera's response function, which is an iterative procedure that need be done only once for a particular camera. With the response function known, high dynamic range images can be easily constructed by a weighted average of the input images. The particular form of weighting is controlled by the probabilistic formulation of the problem, and results in higher weight being assigned to pixels taken at longer exposure times. The advantages of this new weighting scheme are explained by com- parison with other methods in the literature. Experimental results are presented to demonstrate the utility of the algorithm. © 2003 SPIE

353 citations


Cited by
More filters
Christopher M. Bishop1
01 Jan 2006
TL;DR: Probability distributions of linear models for regression and classification are given in this article, along with a discussion of combining models and combining models in the context of machine learning and classification.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

10,141 citations

Proceedings ArticleDOI
21 Jul 2017
TL;DR: SRGAN as mentioned in this paper proposes a perceptual loss function which consists of an adversarial loss and a content loss, which pushes the solution to the natural image manifold using a discriminator network that is trained to differentiate between the super-resolved images and original photo-realistic images.
Abstract: Despite the breakthroughs in accuracy and speed of single image super-resolution using faster and deeper convolutional neural networks, one central problem remains largely unsolved: how do we recover the finer texture details when we super-resolve at large upscaling factors? The behavior of optimization-based super-resolution methods is principally driven by the choice of the objective function. Recent work has largely focused on minimizing the mean squared reconstruction error. The resulting estimates have high peak signal-to-noise ratios, but they are often lacking high-frequency details and are perceptually unsatisfying in the sense that they fail to match the fidelity expected at the higher resolution. In this paper, we present SRGAN, a generative adversarial network (GAN) for image super-resolution (SR). To our knowledge, it is the first framework capable of inferring photo-realistic natural images for 4x upscaling factors. To achieve this, we propose a perceptual loss function which consists of an adversarial loss and a content loss. The adversarial loss pushes our solution to the natural image manifold using a discriminator network that is trained to differentiate between the super-resolved images and original photo-realistic images. In addition, we use a content loss motivated by perceptual similarity instead of similarity in pixel space. Our deep residual network is able to recover photo-realistic textures from heavily downsampled images on public benchmarks. An extensive mean-opinion-score (MOS) test shows hugely significant gains in perceptual quality using SRGAN. The MOS scores obtained with SRGAN are closer to those of the original high-resolution images than to those obtained with any state-of-the-art method.

6,884 citations

Proceedings ArticleDOI
27 Jun 2016
TL;DR: This paper presents the first convolutional neural network capable of real-time SR of 1080p videos on a single K2 GPU and introduces an efficient sub-pixel convolution layer which learns an array of upscaling filters to upscale the final LR feature maps into the HR output.
Abstract: Recently, several models based on deep neural networks have achieved great success in terms of both reconstruction accuracy and computational performance for single image super-resolution. In these methods, the low resolution (LR) input image is upscaled to the high resolution (HR) space using a single filter, commonly bicubic interpolation, before reconstruction. This means that the super-resolution (SR) operation is performed in HR space. We demonstrate that this is sub-optimal and adds computational complexity. In this paper, we present the first convolutional neural network (CNN) capable of real-time SR of 1080p videos on a single K2 GPU. To achieve this, we propose a novel CNN architecture where the feature maps are extracted in the LR space. In addition, we introduce an efficient sub-pixel convolution layer which learns an array of upscaling filters to upscale the final LR feature maps into the HR output. By doing so, we effectively replace the handcrafted bicubic filter in the SR pipeline with more complex upscaling filters specifically trained for each feature map, whilst also reducing the computational complexity of the overall SR operation. We evaluate the proposed approach using images and videos from publicly available datasets and show that it performs significantly better (+0.15dB on Images and +0.39dB on Videos) and is an order of magnitude faster than previous CNN-based methods.

4,770 citations

Posted Content
TL;DR: SRGAN, a generative adversarial network (GAN) for image super-resolution (SR), is presented, to its knowledge, the first framework capable of inferring photo-realistic natural images for 4x upscaling factors and a perceptual loss function which consists of an adversarial loss and a content loss.
Abstract: Despite the breakthroughs in accuracy and speed of single image super-resolution using faster and deeper convolutional neural networks, one central problem remains largely unsolved: how do we recover the finer texture details when we super-resolve at large upscaling factors? The behavior of optimization-based super-resolution methods is principally driven by the choice of the objective function. Recent work has largely focused on minimizing the mean squared reconstruction error. The resulting estimates have high peak signal-to-noise ratios, but they are often lacking high-frequency details and are perceptually unsatisfying in the sense that they fail to match the fidelity expected at the higher resolution. In this paper, we present SRGAN, a generative adversarial network (GAN) for image super-resolution (SR). To our knowledge, it is the first framework capable of inferring photo-realistic natural images for 4x upscaling factors. To achieve this, we propose a perceptual loss function which consists of an adversarial loss and a content loss. The adversarial loss pushes our solution to the natural image manifold using a discriminator network that is trained to differentiate between the super-resolved images and original photo-realistic images. In addition, we use a content loss motivated by perceptual similarity instead of similarity in pixel space. Our deep residual network is able to recover photo-realistic textures from heavily downsampled images on public benchmarks. An extensive mean-opinion-score (MOS) test shows hugely significant gains in perceptual quality using SRGAN. The MOS scores obtained with SRGAN are closer to those of the original high-resolution images than to those obtained with any state-of-the-art method.

4,404 citations

Book
30 Sep 2010
TL;DR: Computer Vision: Algorithms and Applications explores the variety of techniques commonly used to analyze and interpret images and takes a scientific approach to basic vision problems, formulating physical models of the imaging process before inverting them to produce descriptions of a scene.
Abstract: Humans perceive the three-dimensional structure of the world with apparent ease. However, despite all of the recent advances in computer vision research, the dream of having a computer interpret an image at the same level as a two-year old remains elusive. Why is computer vision such a challenging problem and what is the current state of the art? Computer Vision: Algorithms and Applications explores the variety of techniques commonly used to analyze and interpret images. It also describes challenging real-world applications where vision is being successfully used, both for specialized applications such as medical imaging, and for fun, consumer-level tasks such as image editing and stitching, which students can apply to their own personal photos and videos. More than just a source of recipes, this exceptionally authoritative and comprehensive textbook/reference also takes a scientific approach to basic vision problems, formulating physical models of the imaging process before inverting them to produce descriptions of a scene. These problems are also analyzed using statistical models and solved using rigorous engineering techniques Topics and features: structured to support active curricula and project-oriented courses, with tips in the Introduction for using the book in a variety of customized courses; presents exercises at the end of each chapter with a heavy emphasis on testing algorithms and containing numerous suggestions for small mid-term projects; provides additional material and more detailed mathematical topics in the Appendices, which cover linear algebra, numerical techniques, and Bayesian estimation theory; suggests additional reading at the end of each chapter, including the latest research in each sub-field, in addition to a full Bibliography at the end of the book; supplies supplementary course material for students at the associated website, http://szeliski.org/Book/. Suitable for an upper-level undergraduate or graduate-level course in computer science or engineering, this textbook focuses on basic techniques that work under real-world conditions and encourages students to push their creative boundaries. Its design and exposition also make it eminently suitable as a unique reference to the fundamental techniques and current research literature in computer vision.

4,146 citations