Author

Wushao Wen

Other affiliations: ETH Zurich
Bio: Wushao Wen is an academic researcher from Hong Kong Polytechnic University. The author has contributed to research in topics: Image resolution & Image restoration. The author has an h-index of 2 and has co-authored 2 publications receiving 105 citations. Previous affiliations of Wushao Wen include ETH Zurich.

Papers
Proceedings Article•DOI•
16 Jun 2019
TL;DR: The 3rd NTIRE challenge on single-image super-resolution (restoration of rich details in a low-resolution image) is reviewed with a focus on proposed solutions and results, which gauge the state-of-the-art in real-world single image super-resolution.
Abstract: This paper reviewed the 3rd NTIRE challenge on single-image super-resolution (restoration of rich details in a low-resolution image) with a focus on proposed solutions and results. The challenge had 1 track, which was aimed at the real-world single image super-resolution problem with an unknown scaling factor. Participants were mapping low-resolution images captured by a DSLR camera with a shorter focal length to their high-resolution images captured at a longer focal length. With this challenge, we introduced a novel real-world super-resolution dataset (RealSR). The track had 403 registered participants, and 36 teams competed in the final testing phase. They gauge the state-of-the-art in real-world single image super-resolution.

118 citations

Proceedings Article•DOI•
16 Jun 2019
TL;DR: This paper reviews the first NTIRE challenge on perceptual image enhancement, focusing on the proposed solutions and results for a real-world photo enhancement problem, where the goal was to map low-quality photos from the iPhone 3GS device to the same photos captured with a Canon 70D DSLR camera.
Abstract: This paper reviews the first NTIRE challenge on perceptual image enhancement with the focus on proposed solutions and results. The participating teams were solving a real-world photo enhancement problem, where the goal was to map low-quality photos from the iPhone 3GS device to the same photos captured with Canon 70D DSLR camera. The considered problem embraced a number of computer vision subtasks, such as image denoising, image resolution and sharpness enhancement, image color/contrast/exposure adjustment, etc. The target metric used in this challenge combined PSNR and SSIM scores with solutions' perceptual results measured in the user study. The proposed solutions significantly improved baseline results, defining the state-of-the-art for practical image enhancement.
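The abstract above names PSNR as one component of the target metric but does not say how PSNR, SSIM, and the user-study results were combined. The following is a minimal sketch of the standard PSNR definition only, assuming 8-bit grayscale images stored as lists of rows; the function name `psnr` and the `max_value` parameter are illustrative choices, not taken from the challenge code.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio (dB) between two equally sized images.

    Images are lists of rows of numbers. `max_value` is the largest
    possible pixel value (255 for 8-bit data, an assumption here).
    """
    if len(reference) != len(estimate):
        raise ValueError("images must have the same number of rows")

    squared_error = 0.0
    count = 0
    for ref_row, est_row in zip(reference, estimate):
        if len(ref_row) != len(est_row):
            raise ValueError("rows must have the same length")
        for r, e in zip(ref_row, est_row):
            squared_error += (r - e) ** 2
            count += 1

    if count == 0:
        raise ValueError("images must not be empty")

    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A higher PSNR means the enhanced photo is numerically closer to the DSLR target; it says nothing about perceptual quality, which is presumably why the challenge paired it with SSIM and a user study.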

45 citations


Cited by
Posted Content•
TL;DR: The superiority of the proposed HRNet in a wide range of applications, including human pose estimation, semantic segmentation, and object detection, is shown, suggesting that the HRNet is a stronger backbone for computer vision problems.
Abstract: High-resolution representations are essential for position-sensitive vision problems, such as human pose estimation, semantic segmentation, and object detection. Existing state-of-the-art frameworks first encode the input image as a low-resolution representation through a subnetwork that is formed by connecting high-to-low resolution convolutions in series (e.g., ResNet, VGGNet), and then recover the high-resolution representation from the encoded low-resolution representation. Instead, our proposed network, named as High-Resolution Network (HRNet), maintains high-resolution representations through the whole process. There are two key characteristics: (i) Connect the high-to-low resolution convolution streams in parallel; (ii) Repeatedly exchange the information across resolutions. The benefit is that the resulting representation is semantically richer and spatially more precise. We show the superiority of the proposed HRNet in a wide range of applications, including human pose estimation, semantic segmentation, and object detection, suggesting that the HRNet is a stronger backbone for computer vision problems. All the codes are available at this https URL.

1,278 citations

Journal Article•DOI•
TL;DR: The High-Resolution Network (HRNet) maintains high-resolution representations through the whole process by connecting the high-to-low resolution convolution streams in parallel and repeatedly exchanging the information across resolutions.
Abstract: High-resolution representations are essential for position-sensitive vision problems, such as human pose estimation, semantic segmentation, and object detection. Existing state-of-the-art frameworks first encode the input image as a low-resolution representation through a subnetwork that is formed by connecting high-to-low resolution convolutions in series (e.g., ResNet, VGGNet), and then recover the high-resolution representation from the encoded low-resolution representation. Instead, our proposed network, named as High-Resolution Network (HRNet), maintains high-resolution representations through the whole process. There are two key characteristics: (i) Connect the high-to-low resolution convolution streams in parallel and (ii) repeatedly exchange the information across resolutions. The benefit is that the resulting representation is semantically richer and spatially more precise. We show the superiority of the proposed HRNet in a wide range of applications, including human pose estimation, semantic segmentation, and object detection, suggesting that the HRNet is a stronger backbone for computer vision problems. All the codes are available at https://github.com/HRNet .
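The two characteristics listed in the abstract above (parallel streams and repeated cross-resolution exchange) can be illustrated with a toy one-dimensional sketch. It is not the HRNet implementation: the helper names `downsample`, `upsample`, and `exchange` are hypothetical, average pooling stands in for a learned strided convolution, nearest-neighbour repetition stands in for learned upsampling, and only two streams are shown.

```python
def downsample(signal):
    """Halve the resolution by averaging adjacent pairs.

    Stand-in (assumption) for a learned strided convolution.
    """
    return [(signal[i] + signal[i + 1]) / 2.0
            for i in range(0, len(signal) - 1, 2)]


def upsample(signal):
    """Double the resolution by repeating each value twice.

    Stand-in (assumption) for learned upsampling.
    """
    return [value for value in signal for _ in range(2)]


def exchange(high, low):
    """One information exchange between two parallel streams.

    Each stream keeps its own resolution and adds the other stream's
    features after resampling them to match.
    """
    if len(high) != 2 * len(low):
        raise ValueError("high stream must be twice as long as low stream")
    new_high = [h + u for h, u in zip(high, upsample(low))]
    new_low = [l + d for l, d in zip(low, downsample(high))]
    return new_high, new_low


if __name__ == "__main__":
    high_stream = [1.0, 3.0, 5.0, 7.0]
    low_stream = [10.0, 20.0]
    print(exchange(high_stream, low_stream))
```

The point of the sketch is that the high-resolution stream is never discarded and rebuilt; it persists alongside the low-resolution stream and is repeatedly enriched by it.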

1,162 citations

Book Chapter•DOI•
23 Aug 2020
TL;DR: MIRNet is built around a multi-scale residual block containing several key elements: (a) parallel multi-resolution convolution streams for extracting multi-scale features, (b) information exchange across the multi-resolution streams, (c) spatial and channel attention mechanisms for capturing contextual information, and (d) attention-based multi-scale feature aggregation.
Abstract: With the goal of recovering high-quality image content from its degraded version, image restoration enjoys numerous applications, such as in surveillance, computational photography and medical imaging. Recently, convolutional neural networks (CNNs) have achieved dramatic improvements over conventional approaches for image restoration tasks. Existing CNN-based methods typically operate either on full-resolution or on progressively low-resolution representations. In the former case, spatially precise but contextually less robust results are achieved, while in the latter case, semantically reliable but spatially less accurate outputs are generated. In this paper, we present an architecture with the collective goals of maintaining spatially-precise high-resolution representations through the entire network and receiving strong contextual information from the low-resolution representations. The core of our approach is a multi-scale residual block containing several key elements: (a) parallel multi-resolution convolution streams for extracting multi-scale features, (b) information exchange across the multi-resolution streams, (c) spatial and channel attention mechanisms for capturing contextual information, and (d) attention-based multi-scale feature aggregation. In a nutshell, our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details. Extensive experiments on five real image benchmark datasets demonstrate that our method, named MIRNet, achieves state-of-the-art results for image denoising, super-resolution, and image enhancement. The source code and pre-trained models are available at https://github.com/swz30/MIRNet.

357 citations

Proceedings Article•DOI•
16 Jun 2019
TL;DR: It is found that the NTIRE 2019 challenges push the state-of-the-art in video deblurring and super-resolution, reaching compelling performance on the newly proposed REDS dataset.
Abstract: This paper introduces a novel large dataset for video deblurring and video super-resolution, and studies the state-of-the-art as it emerged from the NTIRE 2019 video restoration challenges. The video deblurring and video super-resolution challenges are each the first challenge of its kind, with 4 competitions, hundreds of participants and tens of proposed solutions. Our newly collected REalistic and Diverse Scenes dataset (REDS) was employed by the challenges. In our study, we compare the solutions from the challenges to a set of representative methods from the literature and evaluate them on our proposed REDS dataset. We find that the NTIRE 2019 challenges push the state-of-the-art in video deblurring and super-resolution, reaching compelling performance on our newly proposed REDS dataset.

328 citations

Proceedings Article•DOI•
Jianrui Cai, Hui Zeng, Hongwei Yong, Zisheng Cao, Lei Zhang
01 Oct 2019
TL;DR: Cai et al. build a real-world super-resolution (RealSR) dataset and propose a Laplacian pyramid based kernel prediction network (LP-KPN), which efficiently learns per-pixel kernels to recover the HR image; models trained on RealSR deliver better visual quality with sharper edges and finer textures on real-world scenes than those trained on simulated datasets.
Abstract: Most of the existing learning-based single image super-resolution (SISR) methods are trained and evaluated on simulated datasets, where the low-resolution (LR) images are generated by applying a simple and uniform degradation (i.e., bicubic downsampling) to their high-resolution (HR) counterparts. However, the degradations in real-world LR images are far more complicated. As a consequence, the SISR models trained on simulated data become less effective when applied to practical scenarios. In this paper, we build a real-world super-resolution (RealSR) dataset where paired LR-HR images on the same scene are captured by adjusting the focal length of a digital camera. An image registration algorithm is developed to progressively align the image pairs at different resolutions. Considering that the degradation kernels are naturally non-uniform in our dataset, we present a Laplacian pyramid based kernel prediction network (LP-KPN), which efficiently learns per-pixel kernels to recover the HR image. Our extensive experiments demonstrate that SISR models trained on our RealSR dataset deliver better visual quality with sharper edges and finer textures on real-world scenes than those trained on simulated datasets. Though our RealSR dataset is built by using only two cameras (Canon 5D3 and Nikon D810), the trained model generalizes well to other camera devices such as Sony a7II and mobile phones.
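The abstract above describes learning per-pixel kernels because the degradation is non-uniform across the image. The sketch below shows only the final filtering step under stated assumptions: the kernels are supplied as input rather than predicted by a network, no Laplacian pyramid is used, borders are handled by edge replication, and the function name `apply_per_pixel_kernels` is hypothetical. It is not the LP-KPN implementation.

```python
def apply_per_pixel_kernels(image, kernels):
    """Filter an image with a different kernel at every pixel.

    image:   H x W list of rows of numbers (grayscale).
    kernels: kernels[y][x] is a k x k list of weights (k odd) used to
             compute output pixel (y, x). In a kernel prediction
             network these would be produced by the model; here they
             are given directly (assumption).
    Border pixels are handled by replicating the nearest edge value.
    """
    height, width = len(image), len(image[0])
    output = [[0.0] * width for _ in range(height)]

    for y in range(height):
        for x in range(width):
            kernel = kernels[y][x]
            radius = len(kernel) // 2
            total = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    # Clamp coordinates so out-of-range taps reuse the edge.
                    sy = min(max(y + dy, 0), height - 1)
                    sx = min(max(x + dx, 0), width - 1)
                    total += kernel[dy + radius][dx + radius] * image[sy][sx]
            output[y][x] = total
    return output
```

Unlike an ordinary convolution, where one kernel is shared by all positions, each output pixel here has its own weights, which is what lets such a model adapt to spatially varying blur.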

318 citations