scispace - formally typeset
Search or ask a question
Author

Feng Wu

Other affiliations: Center for Excellence in Education, Microsoft, Tongji University  ...read more
Bio: Feng Wu is an academic researcher from University of Science and Technology of China. The author has contributed to research in topics: Motion compensation & Data compression. The author has an hindex of 60, co-authored 645 publications receiving 15886 citations. Previous affiliations of Feng Wu include Center for Excellence in Education & Microsoft.


Papers
More filters
Journal ArticleDOI
TL;DR: A beam splitter is placed in front of the objective lens of CASSI, which allows the same scene to be simultaneously captured by a grayscale camera, which greatly eases the reconstruction problem and yields high-quality 3D spectral data.
Abstract: Coded aperture snapshot spectral imaging (CASSI) provides an efficient mechanism for recovering 3D spectral data from a single 2D measurement. However, since the reconstruction problem is severely underdetermined, the quality of recovered spectral data is usually limited. In this paper we propose a novel dual-camera design to improve the performance of CASSI while maintaining its snapshot advantage. Specifically, a beam splitter is placed in front of the objective lens of CASSI, which allows the same scene to be simultaneously captured by a grayscale camera. This uncoded grayscale measurement, in conjunction with the coded CASSI measurement, greatly eases the reconstruction problem and yields high-quality 3D spectral data. Both simulation and experimental results demonstrate the effectiveness of the proposed method.

118 citations

Journal ArticleDOI
TL;DR: A robust single-image super-resolution method for enlarging low quality web image/video degraded by downsampling and compression is proposed and it is verified that this adaptive regularization can steadily and greatly improve the pair matching accuracy in learning-based super- resolution.
Abstract: This paper proposes a robust single-image super-resolution method for enlarging low quality web image/video degraded by downsampling and compression. To simultaneously improve the resolution and perceptual quality of such web image/video, we bring forward a practical solution which combines adaptive regularization and learning-based super-resolution. The contribution of this work is twofold. First, we propose to analyze the image energy change characteristics during the iterative regularization process, i.e., the energy change ratio between primitive (e.g., edges, ridges and corners) and nonprimitive fields. Based on the revealed convergence property of the energy change ratio, appropriate regularization strength can then be determined to well balance compression artifacts removal and primitive components preservation. Second, we verify that this adaptive regularization can steadily and greatly improve the pair matching accuracy in learning-based super-resolution. Consequently, their combination effectively eliminates the quantization noise and meanwhile faithfully compensates the missing high-frequency details, yielding robust super-resolution performance in the compression scenario. Experimental results demonstrate that our solution produces visually pleasing enlargements for various web images/videos.

118 citations

Journal ArticleDOI
TL;DR: The proposed PixelMotionCNN (PMCNN) which includes motion extension and hybrid prediction networks can model spatiotemporal coherence to effectively perform predictive coding inside the learning network and provides a possible new direction to further improve compression efficiency and functionalities of future video coding.
Abstract: One key challenge to learning-based video compression is that motion predictive coding, a very effective tool for video compression, can hardly be trained into a neural network. In this paper, we propose the concept of PixelMotionCNN (PMCNN) which includes motion extension and hybrid prediction networks. PMCNN can model spatiotemporal coherence to effectively perform predictive coding inside the learning network. On the basis of PMCNN, we further explore a learning-based framework for video compression with additional components of iterative analysis/synthesis and binarization. The experimental results demonstrate the effectiveness of the proposed scheme. Although entropy coding and complex configurations are not employed in this paper, we still demonstrate superior performance compared with MPEG-2 and achieve comparable results with H.264 codec. The proposed learning-based scheme provides a possible new direction to further improve compression efficiency and functionalities of future video coding.

116 citations

Patent
27 May 2003
TL;DR: In this article, several improvements for use with bidirectionally predictive (B) pictures within a video sequence are provided for direct mode encoding and/or motion vector prediction using spatial prediction techniques.
Abstract: Several improvements for use with Bidirectionally Predictive (B) pictures within a video sequence are provided. In certain improvements Direct Mode encoding and/or Motion Vector Prediction are enhanced using spatial prediction techniques. In other improvements Motion Vector prediction includes temporal distance and subblock information, for example, for more accurate prediction. Such improvements and other presented herein significantly improve the performance of any applicable video coding system/logic.

115 citations

Journal ArticleDOI
TL;DR: This paper proposes an adaptive nonlocal sparse representation (ANSR) model to boost the performance of dual-camera compressive hyperspectral imaging (DCCHI), formulated as a 3D cube based sparse representation to make full use of the nonlocal similarity in both the spatial and spectral domains.
Abstract: Leveraging the compressive sensing (CS) theory, coded aperture snapshot spectral imaging (CASSI) provides an efficient solution to recover 3D hyperspectral data from a 2D measurement. The dual-camera design of CASSI, by adding an uncoded panchromatic measurement, enhances the reconstruction fidelity while maintaining the snapshot advantage. In this paper, we propose an adaptive nonlocal sparse representation (ANSR) model to boost the performance of dual-camera compressive hyperspectral imaging (DCCHI). Specifically, the CS reconstruction problem is formulated as a 3D cube based sparse representation to make full use of the nonlocal similarity in both the spatial and spectral domains. Our key observation is that, the panchromatic image, besides playing the role of direct measurement, can be further exploited to help the nonlocal similarity estimation. Therefore, we design a joint similarity metric by adaptively combining the internal similarity within the reconstructed hyperspectral image and the external similarity within the panchromatic image. In this way, the fidelity of CS reconstruction is greatly enhanced. Both simulation and hardware experimental results show significant improvement of the proposed method over the state-of-the-art.

115 citations


Cited by
More filters
Proceedings ArticleDOI
21 Jul 2017
TL;DR: SRGAN as mentioned in this paper proposes a perceptual loss function which consists of an adversarial loss and a content loss, which pushes the solution to the natural image manifold using a discriminator network that is trained to differentiate between the super-resolved images and original photo-realistic images.
Abstract: Despite the breakthroughs in accuracy and speed of single image super-resolution using faster and deeper convolutional neural networks, one central problem remains largely unsolved: how do we recover the finer texture details when we super-resolve at large upscaling factors? The behavior of optimization-based super-resolution methods is principally driven by the choice of the objective function. Recent work has largely focused on minimizing the mean squared reconstruction error. The resulting estimates have high peak signal-to-noise ratios, but they are often lacking high-frequency details and are perceptually unsatisfying in the sense that they fail to match the fidelity expected at the higher resolution. In this paper, we present SRGAN, a generative adversarial network (GAN) for image super-resolution (SR). To our knowledge, it is the first framework capable of inferring photo-realistic natural images for 4x upscaling factors. To achieve this, we propose a perceptual loss function which consists of an adversarial loss and a content loss. The adversarial loss pushes our solution to the natural image manifold using a discriminator network that is trained to differentiate between the super-resolved images and original photo-realistic images. In addition, we use a content loss motivated by perceptual similarity instead of similarity in pixel space. Our deep residual network is able to recover photo-realistic textures from heavily downsampled images on public benchmarks. An extensive mean-opinion-score (MOS) test shows hugely significant gains in perceptual quality using SRGAN. The MOS scores obtained with SRGAN are closer to those of the original high-resolution images than to those obtained with any state-of-the-art method.

6,884 citations

Book ChapterDOI
08 Oct 2016
TL;DR: In this paper, the authors combine the benefits of both approaches, and propose the use of perceptual loss functions for training feed-forward networks for image style transfer, where a feedforward network is trained to solve the optimization problem proposed by Gatys et al. in real-time.
Abstract: We consider image transformation problems, where an input image is transformed into an output image. Recent methods for such problems typically train feed-forward convolutional neural networks using a per-pixel loss between the output and ground-truth images. Parallel work has shown that high-quality images can be generated by defining and optimizing perceptual loss functions based on high-level features extracted from pretrained networks. We combine the benefits of both approaches, and propose the use of perceptual loss functions for training feed-forward networks for image transformation tasks. We show results on image style transfer, where a feed-forward network is trained to solve the optimization problem proposed by Gatys et al. in real-time. Compared to the optimization-based method, our network gives similar qualitative results but is three orders of magnitude faster. We also experiment with single-image super-resolution, where replacing a per-pixel loss with a perceptual loss gives visually pleasing results.

6,639 citations

Journal ArticleDOI
TL;DR: Zhang et al. as discussed by the authors proposed a deep learning method for single image super-resolution (SR), which directly learns an end-to-end mapping between the low/high-resolution images.
Abstract: We propose a deep learning method for single image super-resolution (SR). Our method directly learns an end-to-end mapping between the low/high-resolution images. The mapping is represented as a deep convolutional neural network (CNN) that takes the low-resolution image as the input and outputs the high-resolution one. We further show that traditional sparse-coding-based SR methods can also be viewed as a deep convolutional network. But unlike traditional methods that handle each component separately, our method jointly optimizes all layers. Our deep CNN has a lightweight structure, yet demonstrates state-of-the-art restoration quality, and achieves fast speed for practical on-line usage. We explore different network structures and parameter settings to achieve trade-offs between performance and speed. Moreover, we extend our network to cope with three color channels simultaneously, and show better overall reconstruction quality.

6,122 citations

Posted Content
TL;DR: This work considers image transformation problems, and proposes the use of perceptual loss functions for training feed-forward networks for image transformation tasks, and shows results on image style transfer, where aFeed-forward network is trained to solve the optimization problem proposed by Gatys et al. in real-time.
Abstract: We consider image transformation problems, where an input image is transformed into an output image. Recent methods for such problems typically train feed-forward convolutional neural networks using a \emph{per-pixel} loss between the output and ground-truth images. Parallel work has shown that high-quality images can be generated by defining and optimizing \emph{perceptual} loss functions based on high-level features extracted from pretrained networks. We combine the benefits of both approaches, and propose the use of perceptual loss functions for training feed-forward networks for image transformation tasks. We show results on image style transfer, where a feed-forward network is trained to solve the optimization problem proposed by Gatys et al in real-time. Compared to the optimization-based method, our network gives similar qualitative results but is three orders of magnitude faster. We also experiment with single-image super-resolution, where replacing a per-pixel loss with a perceptual loss gives visually pleasing results.

5,668 citations

Posted Content
TL;DR: SRGAN, a generative adversarial network (GAN) for image super-resolution (SR), is presented, to its knowledge, the first framework capable of inferring photo-realistic natural images for 4x upscaling factors and a perceptual loss function which consists of an adversarial loss and a content loss.
Abstract: Despite the breakthroughs in accuracy and speed of single image super-resolution using faster and deeper convolutional neural networks, one central problem remains largely unsolved: how do we recover the finer texture details when we super-resolve at large upscaling factors? The behavior of optimization-based super-resolution methods is principally driven by the choice of the objective function. Recent work has largely focused on minimizing the mean squared reconstruction error. The resulting estimates have high peak signal-to-noise ratios, but they are often lacking high-frequency details and are perceptually unsatisfying in the sense that they fail to match the fidelity expected at the higher resolution. In this paper, we present SRGAN, a generative adversarial network (GAN) for image super-resolution (SR). To our knowledge, it is the first framework capable of inferring photo-realistic natural images for 4x upscaling factors. To achieve this, we propose a perceptual loss function which consists of an adversarial loss and a content loss. The adversarial loss pushes our solution to the natural image manifold using a discriminator network that is trained to differentiate between the super-resolved images and original photo-realistic images. In addition, we use a content loss motivated by perceptual similarity instead of similarity in pixel space. Our deep residual network is able to recover photo-realistic textures from heavily downsampled images on public benchmarks. An extensive mean-opinion-score (MOS) test shows hugely significant gains in perceptual quality using SRGAN. The MOS scores obtained with SRGAN are closer to those of the original high-resolution images than to those obtained with any state-of-the-art method.

4,404 citations