Author

Nelson Chong Ngee Bow

Other affiliations: ETH Zurich
Bio: Nelson Chong Ngee Bow is an academic researcher from Multimedia University. The author has contributed to research in the topics of Image resolution and Feature extraction. The author has an h-index of 2 and has co-authored 3 publications receiving 32 citations. Previous affiliations of Nelson Chong Ngee Bow include ETH Zurich.

Papers
Proceedings ArticleDOI
16 Jun 2019
TL;DR: This paper reviews the first NTIRE challenge on perceptual image enhancement, focusing on the proposed solutions and results for a real-world photo enhancement problem: mapping low-quality photos from the iPhone 3GS to the same photos captured with a Canon 70D DSLR camera.
Abstract: This paper reviews the first NTIRE challenge on perceptual image enhancement with the focus on proposed solutions and results. The participating teams were solving a real-world photo enhancement problem, where the goal was to map low-quality photos from the iPhone 3GS device to the same photos captured with Canon 70D DSLR camera. The considered problem embraced a number of computer vision subtasks, such as image denoising, image resolution and sharpness enhancement, image color/contrast/exposure adjustment, etc. The target metric used in this challenge combined PSNR and SSIM scores with solutions' perceptual results measured in the user study. The proposed solutions significantly improved baseline results, defining the state-of-the-art for practical image enhancement.
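The challenge scored fidelity with PSNR and SSIM and combined this with a user study; the snippet below is a minimal Python sketch of the fidelity part only, assuming a simple weighted sum (the challenge's actual weighting and its perceptual, user-study component are not reproduced here).

```python
# Minimal sketch of a PSNR/SSIM-based fidelity score, assuming a simple
# weighted sum; the NTIRE challenge's exact weighting and its user-study
# (perceptual) component are not reproduced here.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def fidelity_score(enhanced, reference, w_psnr=0.5, w_ssim=0.5):
    """Combine PSNR and SSIM into a single score (hypothetical weighting)."""
    enhanced = enhanced.astype(np.float64)
    reference = reference.astype(np.float64)
    psnr = peak_signal_noise_ratio(reference, enhanced, data_range=255)
    ssim = structural_similarity(reference, enhanced, data_range=255,
                                 channel_axis=-1)
    # Normalise PSNR to a rough 0-1 range before mixing with SSIM.
    return w_psnr * (psnr / 50.0) + w_ssim * ssim

# Example: score a slightly perturbed image against a reference.
ref = np.random.randint(0, 256, (128, 128, 3)).astype(np.uint8)
out = np.clip(ref + np.random.randint(-10, 10, ref.shape), 0, 255).astype(np.uint8)
print(fidelity_score(out, ref))
```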

45 citations

Proceedings ArticleDOI
16 Jun 2019
TL;DR: GANmera is proposed, a deep adversarial network capable of aesthetically driven enhancement of photographs; it adopts a 2-way GAN architecture and is semi-supervised with aesthetic-based binary labels (good and bad).
Abstract: Generative adversarial networks (GANs) have become increasingly popular in recent years owing to their ability to synthesize and transfer. The image enhancement task can also be modeled as an image-to-image translation problem. In this paper, we propose GANmera, a deep adversarial network which is capable of performing aesthetically-driven enhancement of photographs. The network adopts a 2-way GAN architecture and is semi-supervised with aesthetic-based binary labels (good and bad). The network is trained with unpaired image sets, hence eliminating the need for strongly supervised before-after pairs. Using CycleGAN as the base architecture, several fine-grained modifications are made to the loss functions, activation functions and resizing schemes, to achieve improved stability in the generator. Two training strategies are devised to produce results with varying aesthetic output. Quantitative evaluation on the recent benchmark MIT-Adobe-5K dataset demonstrates the capability of our method in achieving state-of-the-art PSNR results. We also show qualitatively that the proposed approach produces aesthetically-pleasing images. This work is a shortlisted submission to the CVPR 2019 NTIRE Image Enhancement Challenge.
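The abstract names CycleGAN as the base architecture without detailing the modified losses, so the following PyTorch sketch illustrates only the general 2-way setup on unpaired images: one generator maps aesthetically "bad" images to the "good" domain and a second maps back, trained with adversarial and cycle-consistency losses. All module names and loss weights are hypothetical.

```python
# Sketch of a 2-way (CycleGAN-style) setup for unpaired aesthetic enhancement.
# Hypothetical, simplified networks; the actual GANmera losses, activations and
# resizing schemes are not reproduced here.
import torch
import torch.nn as nn

def tiny_generator():
    return nn.Sequential(
        nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(32, 3, 3, padding=1), nn.Tanh())

def tiny_discriminator():
    return nn.Sequential(
        nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        nn.Conv2d(32, 1, 4, stride=2, padding=1))  # patch-level real/fake scores

G = tiny_generator()   # "bad" aesthetics -> "good"
F = tiny_generator()   # "good" aesthetics -> "bad"
D_good, D_bad = tiny_discriminator(), tiny_discriminator()
adv, l1 = nn.MSELoss(), nn.L1Loss()

bad = torch.rand(4, 3, 64, 64)    # unpaired images labelled "bad"
good = torch.rand(4, 3, 64, 64)   # unpaired images labelled "good"

fake_good, fake_bad = G(bad), F(good)
# Adversarial losses: fool the discriminator of the target domain.
loss_adv = adv(D_good(fake_good), torch.ones_like(D_good(fake_good))) + \
           adv(D_bad(fake_bad), torch.ones_like(D_bad(fake_bad)))
# Cycle-consistency: translating back should recover the input.
loss_cyc = l1(F(fake_good), bad) + l1(G(fake_bad), good)
loss_G = loss_adv + 10.0 * loss_cyc   # hypothetical cycle weight
loss_G.backward()
```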

8 citations

Proceedings ArticleDOI
01 Dec 2020
TL;DR: Zhang et al. propose a low-light image enhancement method that consists of an image disentanglement network and an illumination boosting network; the disentanglement network decomposes the input image into image details and image illumination.
Abstract: Though learning-based low-light enhancement methods have achieved significant success, existing methods are still sensitive to noise and unnatural appearance. The problems may come from the lack of structural awareness and the confusion between noise and texture. Thus, we present a low-light image enhancement method that consists of an image disentanglement network and an illumination boosting network. The disentanglement network is first used to decompose the input image into image details and image illumination. The extracted illumination part then goes through a multi-branch enhancement network designed to improve the dynamic range of the image. The multi-branch network extracts multi-level image features and enhances them via numerous subnets. These enhanced features are then fused to generate the enhanced illumination part. Finally, the denoised image details and the enhanced illumination are entangled to produce the normal-light image. Experimental results show that our method can produce visually pleasing images on many public datasets.
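The abstract describes a decompose-enhance-recombine pipeline without giving the exact architectures, so the PyTorch sketch below only mirrors that structure with hypothetical modules: a Retinex-style split into details and illumination, a few dilated branches that enhance the illumination map, and a final recombination.

```python
# Sketch of the decompose-enhance-recombine pipeline described above.
# The modules, channel counts and fusion scheme are hypothetical placeholders.
import torch
import torch.nn as nn

class Disentangle(nn.Module):
    """Split an input image into a detail component and an illumination map."""
    def __init__(self):
        super().__init__()
        self.illum = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                                   nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid())

    def forward(self, x):
        illumination = self.illum(x)
        details = x / illumination.clamp(min=1e-3)   # Retinex-style split
        return details, illumination

class MultiBranchBoost(nn.Module):
    """Enhance the illumination map with parallel subnets and fuse them."""
    def __init__(self):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Conv2d(1, 8, 3, padding=d, dilation=d), nn.ReLU(),
                          nn.Conv2d(8, 1, 3, padding=1)) for d in (1, 2, 4))
        self.fuse = nn.Conv2d(3, 1, 1)

    def forward(self, illumination):
        feats = torch.cat([b(illumination) for b in self.branches], dim=1)
        return torch.sigmoid(self.fuse(feats))

disentangle, boost = Disentangle(), MultiBranchBoost()
low_light = torch.rand(1, 3, 128, 128)
details, illumination = disentangle(low_light)
enhanced = details * boost(illumination)   # re-entangle details and illumination
```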

Cited by
Posted Content
TL;DR: The proposed HRNet is shown to be superior in a wide range of applications, including human pose estimation, semantic segmentation, and object detection, suggesting that HRNet is a stronger backbone for computer vision problems.
Abstract: High-resolution representations are essential for position-sensitive vision problems, such as human pose estimation, semantic segmentation, and object detection. Existing state-of-the-art frameworks first encode the input image as a low-resolution representation through a subnetwork that is formed by connecting high-to-low resolution convolutions in series (e.g., ResNet, VGGNet), and then recover the high-resolution representation from the encoded low-resolution representation. Instead, our proposed network, named as High-Resolution Network (HRNet), maintains high-resolution representations through the whole process. There are two key characteristics: (i) Connect the high-to-low resolution convolution streams in parallel; (ii) Repeatedly exchange the information across resolutions. The benefit is that the resulting representation is semantically richer and spatially more precise. We show the superiority of the proposed HRNet in a wide range of applications, including human pose estimation, semantic segmentation, and object detection, suggesting that the HRNet is a stronger backbone for computer vision problems. All the codes are available at this https URL.
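The core idea, parallel convolution streams at different resolutions with repeated information exchange, can be sketched as a toy two-stream block in PyTorch; this is illustrative only and not the official HRNet implementation.

```python
# Toy sketch of HRNet's core idea: keep a high- and a low-resolution stream in
# parallel and repeatedly exchange information between them. Not the official
# implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoStreamExchange(nn.Module):
    def __init__(self, ch_high=32, ch_low=64):
        super().__init__()
        self.high = nn.Conv2d(ch_high, ch_high, 3, padding=1)   # full resolution
        self.low = nn.Conv2d(ch_low, ch_low, 3, padding=1)      # half resolution
        self.high_to_low = nn.Conv2d(ch_high, ch_low, 3, stride=2, padding=1)
        self.low_to_high = nn.Conv2d(ch_low, ch_high, 1)

    def forward(self, x_high, x_low):
        h, l = torch.relu(self.high(x_high)), torch.relu(self.low(x_low))
        # Exchange: downsample the high stream, upsample the low stream.
        l = l + self.high_to_low(h)
        h = h + F.interpolate(self.low_to_high(l), size=h.shape[-2:],
                              mode="bilinear", align_corners=False)
        return h, l

block = TwoStreamExchange()
x_high = torch.rand(1, 32, 64, 64)
x_low = torch.rand(1, 64, 32, 32)
for _ in range(3):                      # repeated multi-resolution fusion
    x_high, x_low = block(x_high, x_low)
```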

1,278 citations

Journal ArticleDOI
TL;DR: The High-Resolution Network (HRNet) as mentioned in this paper maintains high-resolution representations through the whole process by connecting the high-to-low resolution convolution streams in parallel and repeatedly exchanging the information across resolutions.
Abstract: High-resolution representations are essential for position-sensitive vision problems, such as human pose estimation, semantic segmentation, and object detection. Existing state-of-the-art frameworks first encode the input image as a low-resolution representation through a subnetwork that is formed by connecting high-to-low resolution convolutions in series (e.g., ResNet, VGGNet), and then recover the high-resolution representation from the encoded low-resolution representation. Instead, our proposed network, named as High-Resolution Network (HRNet), maintains high-resolution representations through the whole process. There are two key characteristics: (i) Connect the high-to-low resolution convolution streams in parallel and (ii) repeatedly exchange the information across resolutions. The benefit is that the resulting representation is semantically richer and spatially more precise. We show the superiority of the proposed HRNet in a wide range of applications, including human pose estimation, semantic segmentation, and object detection, suggesting that the HRNet is a stronger backbone for computer vision problems. All the codes are available at https://github.com/HRNet .

1,162 citations

Proceedings ArticleDOI
01 Jun 2022
TL;DR: Restormer is an efficient Transformer model whose building blocks (multi-head attention and feed-forward network) are redesigned so that it can capture long-range pixel interactions while remaining applicable to large images.
Abstract: Since convolutional neural networks (CNNs) perform well at learning generalizable image priors from large-scale data, these models have been extensively applied to image restoration and related tasks. Recently, another class of neural architectures, Transformers, have shown significant performance gains on natural language and high-level vision tasks. While the Transformer model mitigates the shortcomings of CNNs (i.e., limited receptive field and inadaptability to input content), its computational complexity grows quadratically with the spatial resolution, therefore making it infeasible to apply to most image restoration tasks involving high-resolution images. In this work, we propose an efficient Transformer model by making several key designs in the building blocks (multi-head attention and feed-forward network) such that it can capture long-range pixel interactions, while still remaining applicable to large images. Our model, named Restoration Transformer (Restormer), achieves state-of-the-art results on several image restoration tasks, including image deraining, single-image motion deblurring, defocus deblurring (single-image and dual-pixel data), and image denoising (Gaussian grayscale/color denoising, and real image denoising). The source code and pre-trained models are available at https://github.com/swz30/Restormer.
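The abstract states that the attention and feed-forward blocks are redesigned to stay tractable on high-resolution inputs; one way to make attention linear in the number of pixels is to attend across channels instead of spatial positions, sketched below in PyTorch. This is an illustration of that general trick, not the exact Restormer blocks (see the linked repository for those).

```python
# Sketch of attention computed across channels, whose cost grows with C*C
# rather than (H*W)^2, so it stays applicable to high-resolution images.
# Illustrative only; channel counts, heads and scaling are hypothetical.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels=48, heads=4):
        super().__init__()
        self.heads = heads
        self.qkv = nn.Conv2d(channels, channels * 3, 1)
        self.out = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=1)
        # Flatten spatial positions; the attention matrix is (c/heads) x (c/heads).
        q = q.reshape(b, self.heads, c // self.heads, h * w)
        k = k.reshape(b, self.heads, c // self.heads, h * w)
        v = v.reshape(b, self.heads, c // self.heads, h * w)
        attn = torch.softmax(q @ k.transpose(-2, -1) / (h * w) ** 0.5, dim=-1)
        out = (attn @ v).reshape(b, c, h, w)
        return self.out(out)

layer = ChannelAttention()
y = layer(torch.rand(1, 48, 256, 256))   # works on large spatial sizes
```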

136 citations

Posted Content
TL;DR: This paper evaluates and compares the performance of all chipsets from Qualcomm, HiSilicon, Samsung, MediaTek and Unisoc that provide hardware acceleration for AI inference, and discusses recent changes in the Android ML pipeline.
Abstract: The performance of mobile AI accelerators has been evolving rapidly in the past two years, nearly doubling with each new generation of SoCs. The current 4th generation of mobile NPUs is already approaching the results of CUDA-compatible Nvidia graphics cards presented not long ago, which together with the increased capabilities of mobile deep learning frameworks makes it possible to run complex and deep AI models on mobile devices. In this paper, we evaluate the performance and compare the results of all chipsets from Qualcomm, HiSilicon, Samsung, MediaTek and Unisoc that are providing hardware acceleration for AI inference. We also discuss the recent changes in the Android ML pipeline and provide an overview of the deployment of deep learning models on mobile devices. All numerical results provided in this paper can be found and are regularly updated on the official project website: this http URL.
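As an illustration of the deployment side discussed above, inference latency can be measured with TensorFlow Lite; the Python sketch below uses a placeholder model path, and on a device an NNAPI or GPU delegate would be attached to the interpreter to exercise the accelerators benchmarked in the paper.

```python
# Minimal sketch of timing a TensorFlow Lite model; "model.tflite" is a
# placeholder path. On-device, an NNAPI or GPU delegate would be attached to
# the interpreter to use the mobile accelerators discussed in the paper.
import time
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

dummy = np.random.rand(*inp["shape"]).astype(inp["dtype"])
interpreter.set_tensor(inp["index"], dummy)
interpreter.invoke()                       # warm-up run

runs = 20
start = time.perf_counter()
for _ in range(runs):
    interpreter.invoke()
elapsed = (time.perf_counter() - start) / runs
print(f"average latency: {elapsed * 1000:.2f} ms, output shape {out['shape']}")
```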

88 citations