Author

Michael S. Brown

Bio: Michael S. Brown is an academic researcher from York University. The author has contributed to research in topics: Color balance & Image restoration. The author has an h-index of 50 and has co-authored 214 publications receiving 9,285 citations. Previous affiliations of Michael S. Brown include Chinese Academy of Sciences & University of North Carolina at Chapel Hill.


Papers
Proceedings ArticleDOI
01 Jun 2016
TL;DR: This paper proposes an effective method that uses simple patch-based priors for both the background and rain layers, removing rain streaks better than existing methods both qualitatively and quantitatively.
Abstract: This paper addresses the problem of rain streak removal from a single image. Rain streaks impair visibility of an image and introduce undesirable interference that can severely affect the performance of computer vision algorithms. Rain streak removal can be formulated as a layer decomposition problem, with a rain streak layer superimposed on a background layer containing the true scene content. Existing decomposition methods that address this problem employ either dictionary learning methods or impose a low rank structure on the appearance of the rain streaks. While these methods can improve the overall visibility, they tend to leave too many rain streaks in the background image or over-smooth the background image. In this paper, we propose an effective method that uses simple patch-based priors for both the background and rain layers. These priors are based on Gaussian mixture models and can accommodate multiple orientations and scales of the rain streaks. This simple approach removes rain streaks better than the existing methods qualitatively and quantitatively. We overview our method and demonstrate its effectiveness over prior work on a number of examples.
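
A minimal Python sketch of the MAP objective implied by this formulation: the observed image O is modeled as background B plus rain layer R, with a patch-based GMM prior on each layer. The GMMs below are fit to synthetic stand-in patches, the patch size and component count are illustrative, and the paper's alternating optimization of the objective is omitted.

```python
# Sketch of patch-based GMM layer priors for O = B + R (illustrative).
import numpy as np
from sklearn.mixture import GaussianMixture

PATCH = 8  # 8x8 patches (an assumption, not the paper's setting)

def extract_patches(img, size=PATCH):
    """All overlapping size x size patches of img, as flat vectors."""
    h, w = img.shape
    return np.array([img[i:i + size, j:j + size].ravel()
                     for i in range(h - size + 1)
                     for j in range(w - size + 1)])

# Stand-in training data; in practice the background GMM is trained on
# natural-image patches and the rain GMM on streak-only patches.
rng = np.random.default_rng(0)
gmm_bg = GaussianMixture(n_components=5).fit(
    rng.normal(0.5, 0.2, (2000, PATCH * PATCH)))
gmm_rain = GaussianMixture(n_components=5).fit(
    rng.normal(0.0, 0.05, (2000, PATCH * PATCH)))

def neg_log_prior(layer, gmm):
    """Patch-based GMM prior: negative sum of per-patch log-likelihoods."""
    return -gmm.score_samples(extract_patches(layer)).sum()

def objective(B, R, O, lam=1.0):
    """Data fidelity O ~ B + R plus GMM priors on both layers."""
    return (np.sum((O - B - R) ** 2)
            + lam * (neg_log_prior(B, gmm_bg) + neg_log_prior(R, gmm_rain)))

O = rng.normal(0.5, 0.2, (32, 32))
print(objective(O.copy(), np.zeros_like(O), O))  # prior terms only
```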

718 citations

Proceedings ArticleDOI
18 Jun 2018
TL;DR: This paper proposes a systematic procedure for estimating ground truth for noisy images that can be used to benchmark denoising performance for smartphone cameras, and shows that CNN-based methods perform better when trained on the authors' high-quality dataset than when trained using alternative strategies, such as low-ISO images used as a proxy for ground truth data.
Abstract: The last decade has seen an astronomical shift from imaging with DSLR and point-and-shoot cameras to imaging with smartphone cameras. Due to the small aperture and sensor size, smartphone images have notably more noise than their DSLR counterparts. While denoising for smartphone images is an active research area, the research community currently lacks a denoising image dataset representative of real noisy images from smartphone cameras with high-quality ground truth. We address this issue in this paper with the following contributions. We propose a systematic procedure for estimating ground truth for noisy images that can be used to benchmark denoising performance for smartphone cameras. Using this procedure, we have captured a dataset - the Smartphone Image Denoising Dataset (SIDD) - of ~30,000 noisy images from 10 scenes under different lighting conditions using five representative smartphone cameras and generated their ground truth images. We used this dataset to benchmark a number of denoising algorithms. We show that CNN-based methods perform better when trained on our high-quality dataset than when trained using alternative strategies, such as low-ISO images used as a proxy for ground truth data.
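
A heavily simplified sketch of the core idea behind ground-truth estimation: aggregate many registered captures of a static scene and reject per-pixel outliers before averaging. The actual SIDD procedure is considerably more involved (it also handles issues such as defective pixels, intensity variation, and small misalignments); this illustrates only the averaging step.

```python
import numpy as np

def estimate_ground_truth(burst, clip_sigma=3.0):
    """Robust per-pixel mean over a stack of aligned noisy frames.

    burst: (N, H, W) array of N registered captures of a static scene.
    Pixels beyond clip_sigma standard deviations of the per-pixel mean
    (e.g., transient artifacts) are excluded before averaging.
    """
    mean = burst.mean(axis=0)
    std = burst.std(axis=0) + 1e-8
    inlier = np.abs(burst - mean) <= clip_sigma * std
    return (burst * inlier).sum(axis=0) / np.maximum(inlier.sum(axis=0), 1)

# Usage with synthetic data: 150 noisy copies of a smooth gradient.
rng = np.random.default_rng(0)
clean = np.tile(np.linspace(0.2, 0.8, 64), (64, 1))
burst = clean + rng.normal(0.0, 0.05, (150, 64, 64))
print(float(np.abs(estimate_ground_truth(burst) - clean).mean()))
```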

552 citations

Proceedings ArticleDOI
06 Nov 2011
TL;DR: This paper describes an application framework to perform high quality upsampling on depth maps captured from a low-resolution and noisy 3D time-of-flight camera that has been coupled with a high-resolution RGB camera.
Abstract: This paper describes an application framework to perform high quality upsampling on depth maps captured from a low-resolution and noisy 3D time-of-flight (3D-ToF) camera that has been coupled with a high-resolution RGB camera. Our framework is inspired by recent work that uses nonlocal means filtering to regularize depth maps in order to maintain fine detail and structure. Our framework extends this regularization with an additional edge weighting scheme based on several image features based on the additional high-resolution RGB input. Quantitative and qualitative results show that our method outperforms existing approaches for 3D-ToF upsampling. We describe the complete process for this system, including device calibration, scene warping for input alignment, and even how the results can be further processed using simple user markup.
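
A simplified illustration of guide-weighted depth upsampling in the spirit of this framework: the low-resolution depth is lifted to the RGB camera's resolution and filtered with weights that combine spatial proximity and an edge term from the high-resolution guide. The paper's full pipeline (nonlocal means regularization, device calibration, scene warping, user markup) is not reproduced here; this is a joint-bilateral-style stand-in.

```python
import numpy as np

def guided_upsample(depth_lr, rgb_hr, sigma_s=3.0, sigma_r=0.1, radius=4):
    """depth_lr: (h, w) low-res depth; rgb_hr: (H, W) grayscale guide."""
    H, W = rgb_hr.shape
    h, w = depth_lr.shape
    ys = np.arange(H) * h // H            # nearest-neighbor lift of the
    xs = np.arange(W) * w // W            # depth to the guide resolution
    depth_up = depth_lr[np.ix_(ys, xs)].astype(float)
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            y0, y1 = max(0, i - radius), min(H, i + radius + 1)
            x0, x1 = max(0, j - radius), min(W, j + radius + 1)
            gy, gx = np.mgrid[y0:y1, x0:x1]
            spatial = ((gy - i) ** 2 + (gx - j) ** 2) / (2 * sigma_s ** 2)
            # Edge weight: penalize crossing intensity edges in the guide.
            rang = (rgb_hr[y0:y1, x0:x1] - rgb_hr[i, j]) ** 2 / (2 * sigma_r ** 2)
            wgt = np.exp(-spatial - rang)
            out[i, j] = (wgt * depth_up[y0:y1, x0:x1]).sum() / wgt.sum()
    return out

# Usage: 16x16 depth upsampled to a 64x64 guide with a sharp edge.
guide = np.zeros((64, 64)); guide[:, 32:] = 1.0
depth = np.zeros((16, 16)); depth[:, 8:] = 2.0
print(guided_upsample(depth, guide).round(2)[0, 30:34])
```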

545 citations

Journal ArticleDOI
23 Jun 2013
TL;DR: This work investigates projective estimation under model inadequacies, i.e., when the underpinning assumptions of the projective model are not fully satisfied by the data, and proposes as-projective-as-possible warps that aim to be globally projective, yet allow local non-projective deviations to account for violations of the assumed imaging conditions.
Abstract: The success of commercial image stitching tools often leads to the impression that image stitching is a “solved problem”. The reality, however, is that many tools give unconvincing results when the input photos violate fairly restrictive imaging assumptions; the main two being that the photos correspond to views that differ purely by rotation, or that the imaged scene is effectively planar. Such assumptions underpin the usage of 2D projective transforms or homographies to align photos. In the hands of the casual user, such conditions are often violated, yielding misalignment artifacts or “ghosting” in the results. Accordingly, many existing image stitching tools depend critically on post-processing routines to conceal ghosting. In this paper, we propose a novel estimation technique called Moving Direct Linear Transformation (Moving DLT) that is able to tweak or fine-tune the projective warp to accommodate the deviations of the input data from the idealized conditions. This produces as-projective-as-possible image alignment that significantly reduces ghosting without compromising the geometric realism of perspective image stitching. Our technique thus lessens the dependency on potentially expensive post-processing algorithms. In addition, we describe how multiple as-projective-as-possible warps can be simultaneously refined via bundle adjustment to accurately align multiple images for large panorama creation.
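
A compact reconstruction of the weighted-DLT idea behind Moving DLT, under the assumption of Gaussian distance weights with a small floor (parameter values here are illustrative, not the paper's): correspondences near a given location dominate the estimate, so the fitted homography adapts smoothly across the image.

```python
import numpy as np

def dlt_rows(p, q):
    """Two DLT constraint rows for the correspondence p -> q."""
    x, y = p
    u, v = q
    return np.array([
        [0, 0, 0, -x, -y, -1, v * x, v * y, v],
        [x, y, 1, 0, 0, 0, -u * x, -u * y, -u],
    ])

def moving_dlt(src, dst, x_star, sigma=50.0, gamma=0.05):
    """Location-dependent homography at x_star from weighted DLT + SVD."""
    A = np.vstack([dlt_rows(p, q) for p, q in zip(src, dst)])
    w = np.maximum(np.exp(-np.sum((src - x_star) ** 2, axis=1) / sigma**2),
                   gamma)              # Gaussian weights with a floor
    W = np.repeat(w, 2)                # each correspondence gives two rows
    _, _, Vt = np.linalg.svd(W[:, None] * A)
    return Vt[-1].reshape(3, 3)        # right singular vector, smallest s.v.

# Usage: noise-free correspondences under a known homography are recovered.
rng = np.random.default_rng(1)
src = rng.uniform(0, 100, (30, 2))
H_true = np.array([[1.0, 0.02, 5.0], [0.01, 1.0, -3.0], [1e-4, 0.0, 1.0]])
proj = np.hstack([src, np.ones((30, 1))]) @ H_true.T
dst = proj[:, :2] / proj[:, 2:]
H_est = moving_dlt(src, dst, x_star=np.array([50.0, 50.0]))
print(np.round(H_est / H_est[2, 2], 3))  # approximately H_true
```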

450 citations

Proceedings ArticleDOI
13 Jun 2010
TL;DR: This paper proposes an approach to extend edge-directed super-resolution to include detail from an image/texture example provided by the user (e.g., from the Internet), and can achieve quality results at very large magnification, which is often problematic for both edge- directed and learning-based approaches.
Abstract: Edge-directed image super resolution (SR) focuses on ways to remove edge artifacts in upsampled images. Under large magnification, however, textured regions become blurred and appear homogenous, resulting in a super-resolution image that looks unnatural. Alternatively, learning-based SR approaches use a large database of exemplar images for “hallucinating” detail. The quality of the upsampled image, especially about edges, is dependent on the suitability of the training images. This paper aims to combine the benefits of edge-directed SR with those of learning-based SR. In particular, we propose an approach to extend edge-directed super-resolution to include detail from an image/texture example provided by the user (e.g., from the Internet). A significant benefit of our approach is that only a single exemplar image is required to supply the missing detail – strong edges are obtained in the SR image even if they are not present in the example image, thanks to the edge-directed component of our combined approach. In addition, we can achieve quality results at very large magnification, which is often problematic for both edge-directed and learning-based approaches.
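
A toy stand-in for the combination described above: an upsampled base image borrows high-frequency residuals from the best-matching patches of a single user-supplied exemplar. This is Freeman-style detail transfer used purely as an illustration; bicubic interpolation stands in for the paper's edge-directed upsampler, and the matching scheme is simplified.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def transfer_detail(lowres, exemplar, scale=4, patch=8):
    up = zoom(lowres, scale, order=3)                  # stand-in upsampler
    ex_low = gaussian_filter(exemplar, sigma=scale / 2)  # coarse band
    ex_high = exemplar - ex_low                        # missing detail band
    out = up.copy()
    for i in range(0, up.shape[0] - patch + 1, patch):
        for j in range(0, up.shape[1] - patch + 1, patch):
            q = up[i:i + patch, j:j + patch]
            # Exhaustive match of the coarse patch against the exemplar.
            best, best_d = (0, 0), np.inf
            for y in range(0, ex_low.shape[0] - patch + 1, patch):
                for x in range(0, ex_low.shape[1] - patch + 1, patch):
                    d = np.sum((ex_low[y:y + patch, x:x + patch] - q) ** 2)
                    if d < best_d:
                        best_d, best = d, (y, x)
            y, x = best
            out[i:i + patch, j:j + patch] += ex_high[y:y + patch, x:x + patch]
    return out

# Usage: 16x16 input, 64x64 exemplar, 4x magnification.
rng = np.random.default_rng(0)
print(transfer_detail(rng.random((16, 16)), rng.random((64, 64))).shape)
```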

355 citations


Cited by
Journal ArticleDOI
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).
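
As a concrete instance of the fourth category, a per-user mail filter can be learned directly from examples of messages the user rejected. The essay names no particular algorithm; the sketch below uses a naive Bayes classifier over word counts purely for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

messages = [
    "limited offer win money now", "meeting moved to friday",
    "cheap pills win a prize today", "project report attached",
]
rejected = [1, 0, 1, 0]  # 1 = the user rejected the message

vec = CountVectorizer()
clf = MultinomialNB().fit(vec.fit_transform(messages), rejected)
print(clf.predict(vec.transform(["win free money today"])))  # -> [1]
```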

13,246 citations

Proceedings ArticleDOI
21 Jul 2017
TL;DR: SRGAN proposes a perceptual loss function which consists of an adversarial loss and a content loss; the adversarial loss pushes the solution to the natural image manifold using a discriminator network that is trained to differentiate between the super-resolved images and original photo-realistic images.
Abstract: Despite the breakthroughs in accuracy and speed of single image super-resolution using faster and deeper convolutional neural networks, one central problem remains largely unsolved: how do we recover the finer texture details when we super-resolve at large upscaling factors? The behavior of optimization-based super-resolution methods is principally driven by the choice of the objective function. Recent work has largely focused on minimizing the mean squared reconstruction error. The resulting estimates have high peak signal-to-noise ratios, but they are often lacking high-frequency details and are perceptually unsatisfying in the sense that they fail to match the fidelity expected at the higher resolution. In this paper, we present SRGAN, a generative adversarial network (GAN) for image super-resolution (SR). To our knowledge, it is the first framework capable of inferring photo-realistic natural images for 4x upscaling factors. To achieve this, we propose a perceptual loss function which consists of an adversarial loss and a content loss. The adversarial loss pushes our solution to the natural image manifold using a discriminator network that is trained to differentiate between the super-resolved images and original photo-realistic images. In addition, we use a content loss motivated by perceptual similarity instead of similarity in pixel space. Our deep residual network is able to recover photo-realistic textures from heavily downsampled images on public benchmarks. An extensive mean-opinion-score (MOS) test shows hugely significant gains in perceptual quality using SRGAN. The MOS scores obtained with SRGAN are closer to those of the original high-resolution images than to those obtained with any state-of-the-art method.
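
The perceptual loss described above can be sketched compactly: a content term computed as an MSE in the feature space of a frozen pretrained VGG network, plus an adversarial term that rewards fooling the discriminator. The layer cut-off, loss weighting, and omitted input normalization are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg19, VGG19_Weights

# Frozen pretrained VGG19 feature extractor for the content loss
# (downloads weights on first use).
vgg_features = vgg19(weights=VGG19_Weights.IMAGENET1K_V1).features[:36].eval()
for p in vgg_features.parameters():
    p.requires_grad_(False)

def generator_loss(sr, hr, d_logits_sr, adv_weight=1e-3):
    """Content loss in VGG feature space + adversarial loss on D's logits."""
    content = F.mse_loss(vgg_features(sr), vgg_features(hr))
    # The generator is rewarded when D labels its outputs as real (1).
    adversarial = F.binary_cross_entropy_with_logits(
        d_logits_sr, torch.ones_like(d_logits_sr))
    return content + adv_weight * adversarial

# Shape-only usage with random tensors standing in for network outputs.
sr, hr = torch.rand(1, 3, 96, 96), torch.rand(1, 3, 96, 96)
print(generator_loss(sr, hr, torch.randn(1, 1)))
```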

6,884 citations

Journal ArticleDOI
TL;DR: This work complements Azuma's 1997 survey on augmented reality with representative examples of new advances, referring the reader to the original survey for descriptions of potential applications, summaries of AR system characteristics, and an introduction to the crucial problem of registration, including sources of registration error and error-reduction strategies.
Abstract: In 1997, Azuma published a survey on augmented reality (AR). Our goal is to complement, rather than replace, the original survey by presenting representative examples of the new advances. We refer one to the original survey for descriptions of potential applications (such as medical visualization, maintenance and repair of complex equipment, annotation, and path planning); summaries of AR system characteristics (such as the advantages and disadvantages of optical and video approaches to blending virtual and real, problems in display focus and contrast, and system portability); and an introduction to the crucial problem of registration, including sources of registration error and error-reduction strategies.

3,624 citations

Proceedings ArticleDOI
Bee Lim, Sanghyun Son, Heewon Kim, Seungjun Nah, Kyoung Mu Lee
21 Jul 2017
TL;DR: This paper develops an enhanced deep super-resolution network (EDSR) with performance exceeding those of current state-of-the-art SR methods, and proposes a new multi-scale deepsuper-resolution system (MDSR) and training method, which can reconstruct high-resolution images of different upscaling factors in a single model.
Abstract: Recent research on super-resolution has progressed with the development of deep convolutional neural networks (DCNN). In particular, residual learning techniques exhibit improved performance. In this paper, we develop an enhanced deep super-resolution network (EDSR) with performance exceeding those of current state-of-the-art SR methods. The significant performance improvement of our model is due to optimization by removing unnecessary modules in conventional residual networks. The performance is further improved by expanding the model size while we stabilize the training procedure. We also propose a new multi-scale deep super-resolution system (MDSR) and training method, which can reconstruct high-resolution images of different upscaling factors in a single model. The proposed methods show superior performance over the state-of-the-art methods on benchmark datasets and prove their excellence by winning the NTIRE2017 Super-Resolution Challenge [26].
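
The abstract attributes the gains to removing unnecessary modules from conventional residual blocks; EDSR is widely noted for dropping batch normalization and scaling the residual branch for training stability. A minimal sketch of such a block follows, with channel count and scaling factor as illustrative values.

```python
import torch
import torch.nn as nn

class EDSRResBlock(nn.Module):
    """Residual block without batch normalization, with residual scaling."""
    def __init__(self, channels=64, res_scale=0.1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.res_scale = res_scale

    def forward(self, x):
        # Scale the residual branch before adding the identity path.
        return x + self.res_scale * self.body(x)

x = torch.randn(1, 64, 32, 32)
print(EDSRResBlock()(x).shape)  # torch.Size([1, 64, 32, 32])
```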

3,221 citations