Author

Alex Harvill

Bio: Alex Harvill is an academic researcher. The author has contributed to research in the topics of artificial neural networks and Monte Carlo methods. The author has an h-index of 4 and has co-authored 6 publications receiving 322 citations.

Papers
Journal ArticleDOI
TL;DR: A supervised learning approach that allows the filtering kernel to be more complex and general by leveraging a deep convolutional neural network (CNN) architecture, together with a kernel-prediction network that uses the CNN to estimate the local weighting kernels used to compute each denoised pixel from its neighbors.
Abstract: Regression-based algorithms have been shown to be effective at denoising Monte Carlo (MC) renderings by leveraging their inexpensive by-products (e.g., feature buffers). However, when using higher-order models to handle complex cases, these techniques often overfit to noise in the input. For this reason, supervised learning methods have been proposed that train on a large collection of reference examples, but they use explicit filters that limit their denoising ability. To address these problems, we propose a novel, supervised learning approach that allows the filtering kernel to be more complex and general by leveraging a deep convolutional neural network (CNN) architecture. In one embodiment of our framework, the CNN directly predicts the final denoised pixel value as a highly non-linear combination of the input features. In a second approach, we introduce a novel kernel-prediction network which uses the CNN to estimate the local weighting kernels used to compute each denoised pixel from its neighbors. We train and evaluate our networks on production data and observe improvements over state-of-the-art MC denoisers, showing that our methods generalize well to a variety of scenes. We conclude by analyzing various components of our architecture and identifying areas of further research in deep learning for MC denoising.
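
The kernel-prediction step described above can be made concrete with a short sketch: the network outputs a weight vector per pixel, and the denoised pixel is a normalized weighted sum over its neighborhood. The snippet below is only an illustrative NumPy reconstruction under assumed conventions; the array layout, the 21x21 kernel size, and the softmax normalization are assumptions for the example, not details taken from the paper.

import numpy as np

def apply_kernel_prediction(noisy, kernels, k=21):
    """Reconstruct each pixel as a weighted sum of its k x k neighborhood.

    noisy   : (H, W, 3) float array of noisy radiance
    kernels : (H, W, k*k) per-pixel weights predicted by the CNN
    The weights are softmax-normalized here so each kernel sums to one.
    """
    H, W, _ = noisy.shape
    r = k // 2
    # Normalize the predicted weights over the kernel dimension.
    w = np.exp(kernels - kernels.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    # Pad the image so border pixels have a full neighborhood.
    padded = np.pad(noisy, ((r, r), (r, r), (0, 0)), mode="edge")
    out = np.zeros_like(noisy)
    for dy in range(k):
        for dx in range(k):
            idx = dy * k + dx
            shifted = padded[dy:dy + H, dx:dx + W, :]
            out += shifted * w[:, :, idx:idx + 1]
    return out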

278 citations

Journal ArticleDOI
TL;DR: A theoretical analysis of convergence rates of kernel-predicting architectures is presented, shedding light on why kernel prediction performs better than synthesizing the colors directly, complementing the empirical evidence presented in this and previous works.
Abstract: We present a modular convolutional architecture for denoising rendered images. We expand on the capabilities of kernel-predicting networks by combining them with a number of task-specific modules, and optimizing the assembly using an asymmetric loss. The source-aware encoder, the first module in the assembly, extracts low-level features and embeds them into a common feature space, enabling quick adaptation of a trained network to novel data. The spatial and temporal modules extract abstract, high-level features for kernel-based reconstruction, which is performed at three different spatial scales to reduce low-frequency artifacts. The complete network is trained using a class of asymmetric loss functions that are designed to preserve details and provide the user with a direct control over the variance-bias trade-off during inference. We also propose an error-predicting module for inferring reconstruction error maps that can be used to drive adaptive sampling. Finally, we present a theoretical analysis of convergence rates of kernel-predicting architectures, shedding light on why kernel prediction performs better than synthesizing the colors directly, complementing the empirical evidence presented in this and previous works. We demonstrate that our networks attain results that compare favorably to state-of-the-art methods in terms of detail preservation, low-frequency noise removal, and temporal stability on a variety of production and academic datasets.
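
The asymmetric loss mentioned above can be illustrated with a small sketch: the per-pixel error is scaled up whenever the denoised value lands on the opposite side of the reference from the noisy input, i.e., when the network introduces bias rather than merely failing to remove noise. The L1 base loss and the slope parameter lam below are assumptions chosen for illustration; the exact formulation in the paper may differ.

import numpy as np

def asymmetric_l1(denoised, reference, noisy, lam=2.0):
    """L1 loss scaled by lam wherever the denoised value and the noisy
    input lie on opposite sides of the reference, i.e. where the network
    pushes the result past the reference, away from the input.
    lam = 1 recovers plain L1; larger lam trades variance for less bias."""
    err = np.abs(denoised - reference)
    opposite_side = (denoised - reference) * (noisy - reference) < 0
    scale = np.where(opposite_side, lam, 1.0)
    return np.mean(scale * err)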

152 citations

Patent
31 Jul 2018
TL;DR: In this patent, a modular architecture for denoising Monte Carlo renderings using neural networks is presented, which includes separate single-frame or temporal denoising modules for individual scales, and one or more scale compositor neural networks configured to adaptively blend individual scales.
Abstract: A modular architecture is provided for denoising Monte Carlo renderings using neural networks. The temporal approach extracts and combines feature representations from neighboring frames rather than building a temporal context using recurrent connections. A multiscale architecture includes separate single-frame or temporal denoising modules for individual scales, and one or more scale compositor neural networks configured to adaptively blend individual scales. An error-predicting module is configured to produce adaptive sampling maps for a renderer to achieve more uniform residual noise distribution. An asymmetric loss function may be used for training the neural networks, which can provide control over the variance-bias trade-off during denoising.
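
One way to picture the scale compositing described above is a per-pixel blend in which a coarse-scale denoised result replaces the low frequencies of the fine-scale result wherever the compositor's weight is high. The sketch below uses simple average-pool downsampling, nearest-neighbor upsampling, and assumed array shapes purely for illustration; it is not the patented construction.

import numpy as np

def downsample2(img):
    """2x2 average pooling (assumes even height and width)."""
    H, W, C = img.shape
    return img.reshape(H // 2, 2, W // 2, 2, C).mean(axis=(1, 3))

def upsample2(img):
    """Nearest-neighbor 2x upsampling."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def composite_scales(fine, coarse, alpha):
    """Blend a fine-scale and a coarse-scale denoised result.

    fine   : (H, W, 3) output of the fine-scale denoising module
    coarse : (H/2, W/2, 3) output of the coarse-scale denoising module
    alpha  : (H, W, 1) per-pixel blend weight from the scale compositor

    Where alpha is close to one, the low frequencies of the fine result
    are swapped for the coarse result's low frequencies.
    """
    low_from_fine = upsample2(downsample2(fine))
    low_from_coarse = upsample2(coarse)
    return fine + alpha * (low_from_coarse - low_from_fine)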

5 citations

Patent
03 Oct 2019
TL;DR: In this patent, a modular architecture for denoising Monte Carlo renderings using neural networks is presented, which includes separate single-frame or temporal denoising modules for individual scales, and one or more scale compositor neural networks configured to adaptively blend individual scales.
Abstract: A modular architecture is provided for denoising Monte Carlo renderings using neural networks. The temporal approach extracts and combines feature representations from neighboring frames rather than building a temporal context using recurrent connections. A multiscale architecture includes separate single-frame or temporal denoising modules for individual scales, and one or more scale compositor neural networks configured to adaptively blend individual scales. An error-predicting module is configured to produce adaptive sampling maps for a renderer to achieve more uniform residual noise distribution. An asymmetric loss function may be used for training the neural networks, which can provide control over the variance-bias trade-off during denoising.

4 citations


Cited by
Journal ArticleDOI
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

13,246 citations

Journal ArticleDOI
TL;DR: A comparative study of deep techniques for image denoising, classifying deep convolutional neural networks (CNNs) for additive white noisy images, deep CNNs for real noisy images, deep CNNs for blind denoising, and deep networks for hybrid noisy images.

518 citations

Proceedings ArticleDOI
18 Jun 2018
TL;DR: In this paper, a convolutional neural network architecture is proposed for predicting spatially varying kernels that can both align and denoise frames, together with a synthetic data generation approach based on a realistic noise formation model and an optimization guided by an annealed loss function to avoid undesirable local minima.
Abstract: We present a technique for jointly denoising bursts of images taken from a handheld camera. In particular, we propose a convolutional neural network architecture for predicting spatially varying kernels that can both align and denoise frames, a synthetic data generation approach based on a realistic noise formation model, and an optimization guided by an annealed loss function to avoid undesirable local minima. Our model matches or outperforms the state-of-the-art across a wide range of noise levels on both real and synthetic data.
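
The annealed loss mentioned in the abstract can be sketched as a schedule: an auxiliary term on each frame's individual output is weighted heavily at the start of training and decays toward zero, discouraging the degenerate solution in which the network ignores every frame except the reference one. The L2 base loss and the decay constants below are illustrative assumptions rather than the paper's exact values.

import numpy as np

def annealed_burst_loss(final_pred, per_frame_preds, target, step,
                        beta=100.0, alpha=0.9998):
    """Loss on the combined burst output plus an annealed per-frame term.

    final_pred      : (H, W, 3) combined output over the whole burst
    per_frame_preds : (N, H, W, 3) output using each frame's kernel alone
    target          : (H, W, 3) ground-truth image
    step            : current training iteration
    The auxiliary term starts with weight beta and decays by alpha each
    step, so early training pushes every frame to align to the target
    while late training optimizes only the combined result.
    """
    main = np.mean((final_pred - target) ** 2)
    aux = np.mean((per_frame_preds - target[None]) ** 2)
    return main + beta * (alpha ** step) * aux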

387 citations

Proceedings ArticleDOI
01 Oct 2019
TL;DR: Li et al. propose a Laplacian pyramid based kernel prediction network (LP-KPN) that efficiently learns per-pixel kernels to recover the high-resolution (HR) image, achieving better visual quality with sharper edges and finer textures on real-world scenes.
Abstract: Most of the existing learning-based single image super-resolution (SISR) methods are trained and evaluated on simulated datasets, where the low-resolution (LR) images are generated by applying a simple and uniform degradation (i.e., bicubic downsampling) to their high-resolution (HR) counterparts. However, the degradations in real-world LR images are far more complicated. As a consequence, the SISR models trained on simulated data become less effective when applied to practical scenarios. In this paper, we build a real-world super-resolution (RealSR) dataset where paired LR-HR images on the same scene are captured by adjusting the focal length of a digital camera. An image registration algorithm is developed to progressively align the image pairs at different resolutions. Considering that the degradation kernels are naturally non-uniform in our dataset, we present a Laplacian pyramid based kernel prediction network (LP-KPN), which efficiently learns per-pixel kernels to recover the HR image. Our extensive experiments demonstrate that SISR models trained on our RealSR dataset deliver better visual quality with sharper edges and finer textures on real-world scenes than those trained on simulated datasets. Though our RealSR dataset is built by using only two cameras (Canon 5D3 and Nikon D810), the trained model generalizes well to other camera devices such as Sony a7II and mobile phones.
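
A Laplacian pyramid underlies the LP-KPN described above: the image is split into band-pass detail layers plus a low-frequency residual, so kernels predicted at the coarser levels cover a large effective receptive field at low cost. The sketch below shows only the decomposition and its exact inverse, using simple average-pool and nearest-neighbor resampling as assumptions; it is not the paper's implementation.

import numpy as np

def build_laplacian_pyramid(img, levels=3):
    """Split an image into band-pass detail layers plus a coarse residual."""
    def down(x):  # 2x2 average pooling (assumes even sizes at every level)
        H, W, C = x.shape
        return x.reshape(H // 2, 2, W // 2, 2, C).mean(axis=(1, 3))

    def up(x):    # nearest-neighbor 2x upsampling
        return x.repeat(2, axis=0).repeat(2, axis=1)

    pyramid, current = [], img
    for _ in range(levels - 1):
        smaller = down(current)
        pyramid.append(current - up(smaller))  # band-pass detail layer
        current = smaller
    pyramid.append(current)                    # low-frequency residual
    return pyramid

def reconstruct(pyramid):
    """Invert build_laplacian_pyramid: upsample and add, coarse to fine."""
    out = pyramid[-1]
    for detail in reversed(pyramid[:-1]):
        out = detail + out.repeat(2, axis=0).repeat(2, axis=1)
    return out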

318 citations

Journal ArticleDOI
TL;DR: This work proposes a variant of deep convolutional networks better suited to the class of noise present in Monte Carlo rendering, which allows for much larger pixel neighborhoods to be taken into account, while also improving execution speed by an order of magnitude.
Abstract: We describe a machine learning technique for reconstructing image sequences rendered using Monte Carlo methods. Our primary focus is on reconstruction of global illumination with extremely low sampling budgets at interactive rates. Motivated by recent advances in image restoration with deep convolutional networks, we propose a variant of these networks better suited to the class of noise present in Monte Carlo rendering. We allow for much larger pixel neighborhoods to be taken into account, while also improving execution speed by an order of magnitude. Our primary contribution is the addition of recurrent connections to the network in order to drastically improve temporal stability for sequences of sparsely sampled input images. Our method also has the desirable property of automatically modeling relationships based on auxiliary per-pixel input channels, such as depth and normals. We show significantly higher quality results compared to existing methods that run at comparable speeds, and furthermore argue a clear path for making our method run at realtime rates in the near future.
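
The recurrent connections highlighted above can be illustrated with a minimal convolutional block that carries a hidden state from one frame to the next; a real denoising autoencoder would place such blocks at several encoder stages and combine them with the auxiliary depth and normal channels. The layer sizes and the PyTorch formulation below are assumptions made for the example, not the paper's architecture.

import torch
import torch.nn as nn

class RecurrentConvBlock(nn.Module):
    """Convolution whose input includes the previous frame's activations.

    Carrying the hidden state across frames lets information persist in a
    sparsely sampled sequence, which is what stabilizes the output
    temporally.
    """
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels * 2, channels, kernel_size=3, padding=1)

    def forward(self, features, hidden=None):
        if hidden is None:
            hidden = torch.zeros_like(features)
        out = torch.relu(self.conv(torch.cat([features, hidden], dim=1)))
        return out, out  # the output doubles as the next frame's hidden state

# Illustrative use over a sequence of per-frame feature maps:
# block, hidden = RecurrentConvBlock(32), None
# for features in frame_features:          # each tensor shaped (1, 32, H, W)
#     features, hidden = block(features, hidden)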

277 citations