Author

Guoan Cheng

Bio: Guoan Cheng is an academic researcher from Hong Kong Polytechnic University. The author has contributed to research in topics: Computer science & Artificial intelligence. The author has an h-index of 1 and has co-authored 1 publication receiving 80 citations.

Papers
Proceedings ArticleDOI
16 Jun 2019
TL;DR: The 3rd NTIRE challenge on single-image super-resolution (restoration of rich details in a low-resolution image) is reviewed with a focus on proposed solutions and results and the state-of-the-art in real-world single image super-resolution.
Abstract: This paper reviewed the 3rd NTIRE challenge on single-image super-resolution (restoration of rich details in a low-resolution image) with a focus on proposed solutions and results. The challenge had 1 track, which was aimed at the real-world single image super-resolution problem with an unknown scaling factor. Participants were mapping low-resolution images captured by a DSLR camera with a shorter focal length to their high-resolution images captured at a longer focal length. With this challenge, we introduced a novel real-world super-resolution dataset (RealSR). The track had 403 registered participants, and 36 teams competed in the final testing phase. They gauge the state-of-the-art in real-world single image super-resolution.

118 citations

DOI
TL;DR: An involution-based lightweight method with contrastive learning for efficient SISR, which learns not only the weight but also the bias for convolution and additionally applies a residual path to the involution operation.
Abstract: Single-image super-resolution (SISR) studies have achieved substantial improvements with the development of convolutional neural networks. However, most methods suffer from high computation cost. To tackle this issue, we propose an involution-based lightweight method with contrastive learning for efficient SISR. Unlike the original involution, we set the group number of involution operations to the number of input feature channels. This setting guarantees spatial- and channel-specific behavior. Moreover, our implemented involution learns not only the weight but also the bias for convolution. Simultaneously, we rethink the kernel generation functions of involution. Instead, we utilize Sigmoid with reparameterized convolution. We additionally apply a residual path to the involution operation. Furthermore, contrastive learning is adopted during training to learn universal features. Compared with state-of-the-art efficient SISR methods, our proposed methods achieve the best performance with similar or fewer parameters.

1 citation

Journal ArticleDOI
31 Mar 2022-Entropy
TL;DR: A lightweight network that automatically searches dense connection (ASDCN) for image super-resolution (SR), which effectively reduces redundancy in dense connection and focuses on more valuable features.
Abstract: The development of display technology has continuously increased the requirements for image resolution. However, the imaging systems of many cameras are limited by their physical conditions, and the image resolution is often restricted. Recently, several models based on deep convolutional neural networks (CNNs) have achieved significant performance gains for image super-resolution (SR), but extensive memory consumption and computation overhead hinder practical applications. To address this, we present a lightweight network that automatically searches dense connection (ASDCN) for image SR, which effectively reduces redundancy in dense connection and focuses on more valuable features. We employ neural architecture search (NAS) to model the searching of dense connections. Qualitative and quantitative experiments on five public datasets show that our derived model achieves superior performance over the state-of-the-art models.
Journal ArticleDOI
TL;DR: Wang et al. improve semisupervised surveillance video character extraction and recognition with attentional learning multiframe feature fusion, and propose a character image denoising algorithm based on semisupervised fuzzy C-means clustering to isolate and extract clean binary character images.
Abstract: Character extraction in the video is very helpful to the understanding of the video content, especially the artificially superimposed characters such as time and place in the surveillance video. However, the performance of the existing algorithms does not meet the needs of application. Therefore, the authors improve semisupervised surveillance video character extraction and recognition with attentional learning multiframe feature fusion. First, the multiframe fusion strategy based on an attention mechanism is adopted to solve the target missing problem, and the Dense ASPP network is introduced to solve the character multiscale problem. Second, a character image denoising algorithm based on semisupervised fuzzy C-means clustering is proposed to isolate and extract clean binary character images. Finally, for some video characters that may involve privacy, traditional and deep learning-based video restoration algorithms are used for characteristic elimination.
Journal ArticleDOI
TL;DR: Wang et al. propose a novel feature fusion module and an effective feature enhancement module that significantly improve the performance of the original SSD, reaching 82.5% mean Average Precision (mAP) on VOC 2007.
Abstract: Single Shot Multibox Detector (SSD) uses multi-scale feature maps to detect and recognize objects, which balances accuracy and speed, but it is still limited in detecting small-sized objects. Many researchers design new detectors to improve the accuracy by changing the structure of the multi-scale feature pyramid, which has proved very useful. But most of them simply merge several feature maps without making full use of the close connection between features with different scales. In contrast, a novel feature fusion module and an effective feature enhancement module are proposed, which can significantly improve the performance of the original SSD. In the feature fusion module, the feature pyramid is produced through iteratively fusing three feature maps with different receptive fields to obtain contextual information. In the feature enhancement module, the features are enhanced along the channel and spatial dimensions at the same time to improve their expression ability. Our network can achieve 82.5% mean Average Precision (mAP) on the VOC 2007 [Formula: see text], 81.4% mAP on the VOC 2012 [Formula: see text] and 34.8% mAP on COCO [Formula: see text]-[Formula: see text]2017, respectively, with the input size [Formula: see text]. Comparative experiments prove that our method outperforms many state-of-the-art detectors in both accuracy and speed.

Cited by
Book ChapterDOI
23 Aug 2020
TL;DR: MIRNet is built around a multi-scale residual block containing several key elements: (a) parallel multi-resolution convolution streams for extracting multi-scale features, (b) information exchange across the multi-resolution streams, (c) spatial and channel attention mechanisms for capturing contextual information, and (d) attention-based multi-scale feature aggregation.
Abstract: With the goal of recovering high-quality image content from its degraded version, image restoration enjoys numerous applications, such as in surveillance, computational photography and medical imaging. Recently, convolutional neural networks (CNNs) have achieved dramatic improvements over conventional approaches for image restoration task. Existing CNN-based methods typically operate either on full-resolution or on progressively low-resolution representations. In the former case, spatially precise but contextually less robust results are achieved, while in the latter case, semantically reliable but spatially less accurate outputs are generated. In this paper, we present an architecture with the collective goals of maintaining spatially-precise high-resolution representations through the entire network and receiving strong contextual information from the low-resolution representations. The core of our approach is a multi-scale residual block containing several key elements: (a) parallel multi-resolution convolution streams for extracting multi-scale features, (b) information exchange across the multi-resolution streams, (c) spatial and channel attention mechanisms for capturing contextual information, and (d) attention based multi-scale feature aggregation. In a nutshell, our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details. Extensive experiments on five real image benchmark datasets demonstrate that our method, named as MIRNet, achieves state-of-the-art results for image denoising, super-resolution, and image enhancement. The source code and pre-trained models are available at https://github.com/swz30/MIRNet.

357 citations

Proceedings ArticleDOI
16 Jun 2019
TL;DR: It is found that the NTIRE 2019 challenges push the state-of-the-art in video deblurring and super-resolution, reaching compelling performance on the newly proposed REDS dataset.
Abstract: This paper introduces a novel large dataset for video deblurring, video super-resolution and studies the state-of-the-art as emerged from the NTIRE 2019 video restoration challenges. The video deblurring and video super-resolution challenges are each the first challenge of its kind, with 4 competitions, hundreds of participants and tens of proposed solutions. Our newly collected REalistic and Diverse Scenes dataset (REDS) was employed by the challenges. In our study, we compare the solutions from the challenges to a set of representative methods from the literature and evaluate them on our proposed REDS dataset. We find that the NTIRE 2019 challenges push the state-of-the-art in video deblurring and super-resolution, reaching compelling performance on our newly proposed REDS dataset.

328 citations

Proceedings ArticleDOI
01 Oct 2019
TL;DR: This paper builds a real-world super-resolution (RealSR) dataset and proposes a Laplacian pyramid based kernel prediction network (LP-KPN), which efficiently learns per-pixel kernels to recover the HR image. Models trained on RealSR deliver better visual quality with sharper edges and finer textures on real-world scenes than those trained on simulated datasets.
Abstract: Most of the existing learning-based single image super-resolution (SISR) methods are trained and evaluated on simulated datasets, where the low-resolution (LR) images are generated by applying a simple and uniform degradation (i.e., bicubic downsampling) to their high-resolution (HR) counterparts. However, the degradations in real-world LR images are far more complicated. As a consequence, the SISR models trained on simulated data become less effective when applied to practical scenarios. In this paper, we build a real-world super-resolution (RealSR) dataset where paired LR-HR images on the same scene are captured by adjusting the focal length of a digital camera. An image registration algorithm is developed to progressively align the image pairs at different resolutions. Considering that the degradation kernels are naturally non-uniform in our dataset, we present a Laplacian pyramid based kernel prediction network (LP-KPN), which efficiently learns per-pixel kernels to recover the HR image. Our extensive experiments demonstrate that SISR models trained on our RealSR dataset deliver better visual quality with sharper edges and finer textures on real-world scenes than those trained on simulated datasets. Though our RealSR dataset is built by using only two cameras (Canon 5D3 and Nikon D810), the trained model generalizes well to other camera devices such as Sony a7II and mobile phones.

318 citations
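
The abstract above mentions learning per-pixel kernels for non-uniform degradations. As a rough illustration of only the final step (applying one kernel per output pixel), here is a minimal pure-Python sketch. It is not the LP-KPN implementation: the function name `apply_per_pixel_kernels`, the edge-clamping border rule, and the single-channel list-of-lists layout are assumptions made for this example, and the kernel-predicting network and the Laplacian pyramid are omitted entirely.

```python
def apply_per_pixel_kernels(image, kernels):
    """Filter a single-channel image with a different k x k kernel at each pixel.

    image:   H x W nested list of numbers.
    kernels: H x W nested list; kernels[y][x] is the k x k kernel (k odd)
             used to compute output pixel (y, x).
    Border pixels are handled by clamping coordinates to the image edge
    (an assumption of this sketch, not something stated in the abstract).
    """
    h, w = len(image), len(image[0])
    k = len(kernels[0][0])
    r = k // 2  # kernel radius
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(k):
                for dx in range(k):
                    # Source coordinate, clamped so it stays inside the image.
                    sy = min(max(y + dy - r, 0), h - 1)
                    sx = min(max(x + dx - r, 0), w - 1)
                    acc += kernels[y][x][dy][dx] * image[sy][sx]
            out[y][x] = acc
    return out


# Hypothetical usage: an identity kernel at every pixel leaves the image unchanged.
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
image = [[1, 2], [3, 4]]
kernels = [[identity for _ in range(2)] for _ in range(2)]
result = apply_per_pixel_kernels(image, kernels)
```

With a uniform kernel this reduces to ordinary filtering; the point of a per-pixel formulation is that `kernels[y][x]` may differ at every location, which is what allows spatially varying degradations to be modeled.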

Journal ArticleDOI
TL;DR: Deep convolutional networks–based super-resolution is a fast-growing field with numerous practical applications, and this exposition extensively compares more than 30 state-of-the-art super-resolution CNNs.
Abstract: Deep convolutional networks–based super-resolution is a fast-growing field with numerous practical applications. In this exposition, we extensively compare more than 30 state-of-the-art super-resolution Convolutional Neural Networks (CNNs) over three classical and three recently introduced challenging datasets to benchmark single image super-resolution. We introduce a taxonomy for deep learning–based super-resolution networks that groups existing methods into nine categories including linear, residual, multi-branch, recursive, progressive, attention-based, and adversarial designs. We also provide comparisons between the models in terms of network complexity, memory footprint, model input and output, learning details, the type of network losses, and important architectural differences (e.g., depth, skip-connections, filters). The extensive evaluation performed shows the consistent and rapid growth in the accuracy in the past few years along with a corresponding boost in model complexity and the availability of large-scale datasets. It is also observed that the pioneering methods identified as the benchmarks have been significantly outperformed by the current contenders. Despite the progress in recent years, we identify several shortcomings of existing techniques and provide future research directions towards the solution of these open problems. Datasets and codes for evaluation are publicly available at https://github.com/saeed-anwar/SRsurvey.

162 citations

Journal ArticleDOI
TL;DR: In this paper, a deep Fourier channel attention network (DFCAN) was proposed to learn hierarchical representations of high-frequency information about diverse biological structures using multimodal structured illumination microscopy (SIM).
Abstract: Deep neural networks have enabled astonishing transformations from low-resolution (LR) to super-resolved images. However, whether, and under what imaging conditions, such deep-learning models outperform super-resolution (SR) microscopy is poorly explored. Here, using multimodality structured illumination microscopy (SIM), we first provide an extensive dataset of LR-SR image pairs and evaluate the deep-learning SR models in terms of structural complexity, signal-to-noise ratio and upscaling factor. Second, we devise the deep Fourier channel attention network (DFCAN), which leverages the frequency content difference across distinct features to learn precise hierarchical representations of high-frequency information about diverse biological structures. Third, we show that DFCAN's Fourier domain focalization enables robust reconstruction of SIM images under low signal-to-noise ratio conditions. We demonstrate that DFCAN achieves comparable image quality to SIM over a tenfold longer duration in multicolor live-cell imaging experiments, which reveal the detailed structures of mitochondrial cristae and nucleoids and the interaction dynamics of organelles and cytoskeleton.

132 citations