Author

Hsien-Kai Kuo

Other affiliations: National Chiao Tung University

Bio: Hsien-Kai Kuo is an academic researcher at MediaTek who has contributed to research on topics including cache and computer science. The author has an h-index of 7 and has co-authored 25 publications receiving 135 citations. Previous affiliations of Hsien-Kai Kuo include National Chiao Tung University.

Topics: Cache, Computer science, Cache pollution, Engineering, Shared memory

Papers

Proceedings Article

Unified Dynamic Convolutional Network for Super-Resolution With Variational Degradations

Yu-Syuan Xu¹, Shou-Yao Roy Tseng¹, Yu Tseng¹, Hsien-Kai Kuo¹, Yi-Min Tsai¹

MediaTek¹

14 Jun 2020

TL;DR: This paper proposes a unified network that accommodates inter-image (cross-image) and intra-image (spatial) variations by incorporating dynamic convolution, a far more flexible alternative for handling different variations.

Abstract: Deep Convolutional Neural Networks (CNNs) have achieved remarkable results on Single Image Super-Resolution (SISR). Despite considering only a single degradation, recent studies also include multiple degrading effects to better reflect real-world cases. However, most of the works assume a fixed combination of degrading effects, or even train an individual network for different combinations. Instead, a more practical approach is to train a single network for wide-ranging and variational degradations. To fulfill this requirement, this paper proposes a unified network to accommodate the variations from inter-image (cross-image variations) and intra-image (spatial variations). Different from the existing works, we incorporate dynamic convolution which is a far more flexible alternative to handle different variations. In SISR with non-blind setting, our Unified Dynamic Convolutional Network for Variational Degradations (UDVD) is evaluated on both synthetic and real images with an extensive set of variations. The qualitative results demonstrate the effectiveness of UDVD over various existing works. Extensive experiments show that our UDVD achieves favorable or comparable performance on both synthetic and real images.
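The dynamic convolution mentioned in the abstract can be illustrated with a minimal pure-Python sketch in which every output pixel is filtered by its own kernel. This is only an illustration of the general per-pixel idea, not the UDVD implementation: the function name `dynamic_conv2d`, the zero padding, and the hand-set kernels are assumptions made here, whereas in the paper the kernels would be produced by the network.

```python
def dynamic_conv2d(image, kernels):
    """Per-pixel dynamic convolution (correlation form) with zero padding.

    image:   H x W nested list of floats.
    kernels: H x W nested list of k x k kernels, one kernel per output pixel.
    Hypothetical helper for illustration; not taken from the paper.
    """
    height, width = len(image), len(image[0])
    k = len(kernels[0][0])
    radius = k // 2
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            kernel = kernels[y][x]  # the kernel changes with the pixel position
            acc = 0.0
            for dy in range(k):
                for dx in range(k):
                    sy, sx = y + dy - radius, x + dx - radius
                    # Samples outside the image are treated as zero.
                    if 0 <= sy < height and 0 <= sx < width:
                        acc += kernel[dy][dx] * image[sy][sx]
            out[y][x] = acc
    return out


# Hand-set kernels standing in for network-predicted ones (an assumption).
IDENTITY = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
BOX_BLUR = [[1.0 / 9.0] * 3 for _ in range(3)]

image = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
# Spatial variation: blur only the centre pixel, pass all others through.
kernels = [
    [BOX_BLUR if (y, x) == (1, 1) else IDENTITY for x in range(3)]
    for y in range(3)
]
result = dynamic_conv2d(image, kernels)
```

A standard convolution would share one kernel across all positions; letting the kernel vary per pixel is what allows a single model to respond to spatially varying degradations.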

59 citations

Posted Content

Unified Dynamic Convolutional Network for Super-Resolution with Variational Degradations

Yu-Syuan Xu¹, Shou-Yao Roy Tseng¹, Yu Tseng¹, Hsien-Kai Kuo¹, Yi-Min Tsai¹

MediaTek¹

15 Apr 2020-arXiv: Image and Video Processing

TL;DR: This paper proposes a unified network to accommodate the variations from inter-image (cross-image variations) and intra-image (spatial variations), and incorporates dynamic convolution, which is a far more flexible alternative to handle different variations.

45 citations

Proceedings Article

Learned Smartphone ISP on Mobile NPUs with Deep Learning, Mobile AI 2021 Challenge: Report

Andrey Ignatov¹, Cheng-Ming Chiang², Hsien-Kai Kuo², Anastasia Sycheva, Radu Timofte¹, Min-Hung Chen³, Man-Yu Lee, Yu-Syuan Xu², Yu Tseng², Shusong Xu, Jin Guo, Chao-Hung Chen⁴, Ming-Chun Hsyu⁴, Wen-Chia Tsai⁴, Chao-Wei Chen⁴, Grigory Malivenko, Minsu Kwon, Myungje Lee, Jaeyoon Yoo⁵, Changbeom Kang, Shinjo Wang, Zheng Shaolong, Hao Dejun, Xie Fen, Feng Zhuang, Yipeng Ma⁶, Jingyang Peng⁶, Tao Wang⁷, Fenglong Song⁶, Chih-Chung Hsu, Kwan-Lin Chen, Mei-Hsuang Wu, Vishal Chudasama⁸, Kalpesh Prajapati⁸, Heena Patel⁵, Anjali Sarvaiya⁸, Kishor P. Upla⁸, Kiran B. Raja⁹, Raghavendra Ramachandra⁹, Christoph Busch⁹, Etienne de Stoutz¹

ETH Zurich¹, MediaTek², Georgia Institute of Technology³, Industrial Technology Research Institute⁴, Seoul National University⁵, Huawei⁶, Bristol-Myers Squibb⁷, Sardar Vallabhbhai National Institute of Technology, Surat⁸, Norwegian University of Science and Technology⁹

01 Jan 2021

TL;DR: This report describes the Mobile AI 2021 challenge, whose target was to develop an end-to-end deep learning-based image signal processing (ISP) pipeline that can replace classical hand-crafted ISPs and achieve nearly real-time performance on smartphone NPUs.

Abstract: As the quality of mobile cameras starts to play a crucial role in modern smartphones, more and more attention is now being paid to ISP algorithms used to improve various perceptual aspects of mobile photos. In this Mobile AI challenge, the target was to develop an end-to-end deep learning-based image signal processing (ISP) pipeline that can replace classical hand-crafted ISPs and achieve nearly real-time performance on smartphone NPUs. For this, the participants were provided with a novel learned ISP dataset consisting of RAW-RGB image pairs captured with the Sony IMX586 Quad Bayer mobile sensor and a professional 102-megapixel medium format camera. The runtime of all models was evaluated on the MediaTek Dimensity 1000+ platform with a dedicated AI processing unit capable of accelerating both floating-point and quantized neural networks. The proposed solutions are fully compatible with the above NPU and are capable of processing Full HD photos under 60-100 milliseconds while achieving high fidelity results. A detailed description of all models developed in this challenge is provided in this paper.

45 citations

Proceedings Article

Deploying Image Deblurring across Mobile Devices: A Perspective of Quality and Latency

Cheng-Ming Chiang¹, Yu Tseng¹, Yu-Syuan Xu¹, Hsien-Kai Kuo¹, Yi-Min Tsai¹, Guan-Yu Chen¹, Koan-Sin Tan¹, Wei-Ting Wang¹, Yu-Chieh Lin¹, Shou-Yao Roy Tseng¹, Wei-Shiang Lin¹, Chia-Lin Yu¹, B.Y. Shen¹, Kloze Kao¹, Chia-Ming Cheng¹, Hung-Jen Chen¹

MediaTek¹

14 Jun 2020

TL;DR: To the best of the authors' knowledge, this is the first paper to address all the deployment issues of the image deblurring task across mobile devices; its deployment guidelines were adopted by the championship-winning team in the NTIRE 2020 Image Deblurring Challenge on Smartphone Track.

Abstract: Recently, image enhancement and restoration have become important applications on mobile devices, such as super-resolution and image deblurring. However, most state-of-the-art networks present extremely high computational complexity. This makes them difficult to be deployed on mobile devices with acceptable latency. Moreover, when deploying to different mobile devices, there is a large latency variation due to the difference and limitation of deep learning accelerators on mobile devices. In this paper, we conduct a search of portable network architectures for better quality-latency trade-off across mobile devices. We further present the effectiveness of widely used network optimizations for image deblurring task. This paper provides comprehensive experiments and comparisons to uncover the in-depth analysis for both latency and image quality. Through all the above works, we demonstrate the successful deployment of image deblurring application on mobile devices with the acceleration of deep learning accelerators. To the best of our knowledge, this is the first paper that addresses all the deployment issues of image deblurring task across mobile devices. This paper provides practical deployment-guidelines, and is adopted by the championship-winning team in NTIRE 2020 Image Deblurring Challenge on Smartphone Track.

31 citations

Proceedings Article

Cache Capacity Aware Thread Scheduling for Irregular Memory Access on many-core GPGPUs

Hsien-Kai Kuo¹, Ta-Kan Yen¹, Bo-Cheng Charles Lai¹, Jing-Yang Jou¹

National Chiao Tung University¹

29 Apr 2013

TL;DR: This paper formulates a Cache Capacity Aware Thread Scheduling Problem to capture the impact of cache capacity as well as different architectural considerations, proves the problem NP-hard, and proposes two algorithms to perform cache capacity aware thread scheduling.

Abstract: On-chip shared cache is effective to alleviate the memory bottleneck in modern many-core systems, such as GPGPUs. However, when scheduling numerous concurrent threads on a GPGPU, a cache capacity agnostic scheduling scheme could lead to severe cache contention among threads and thus significant performance degradation. Moreover, the diverse working sets in irregular applications make the cache contention issue an even more serious problem. As a result, taking cache capacity into account has become a critical scheduling issue of GPGPUs. This paper formulates a Cache Capacity Aware Thread Scheduling Problem to capture the impact of cache capacity as well as different architectural considerations. With a proof to be NP-hard, this paper has proposed two algorithms to perform the cache capacity aware thread scheduling. The simulation results on Nvidia's Fermi configuration have shown that the proposed scheduling scheme can effectively avoid cache contention, and achieve an average of 44.7% cache miss reduction and 28.5% runtime enhancement. The paper also shows the runtime can be enhanced up to 62.5% for more complex applications.
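As a rough illustration of what taking cache capacity into account during scheduling can mean, the sketch below groups threads into batches whose combined working set fits in the shared cache. It uses first-fit decreasing bin packing, a generic heuristic chosen here for illustration; it is not one of the two algorithms proposed in the paper, and the function name, thread names, and sizes (in KB) are all hypothetical.

```python
def schedule_batches(working_sets, cache_capacity):
    """Group threads into batches that fit in the cache (first-fit decreasing).

    working_sets:   dict mapping thread name -> working-set size (e.g. in KB).
    cache_capacity: shared cache size in the same unit.
    Generic heuristic for illustration; NOT the paper's algorithms.
    """
    batches = []  # each entry: {"threads": [...], "used": total size}
    # Place the largest working sets first; break ties by name for determinism.
    order = sorted(working_sets, key=lambda t: (-working_sets[t], t))
    for thread in order:
        size = working_sets[thread]
        for batch in batches:
            # Join the first batch that still has room for this working set.
            if batch["used"] + size <= cache_capacity:
                batch["threads"].append(thread)
                batch["used"] += size
                break
        else:
            # No batch has room (or the thread alone exceeds the cache):
            # start a new batch.
            batches.append({"threads": [thread], "used": size})
    return [batch["threads"] for batch in batches]


# Hypothetical working-set sizes in KB and a hypothetical 64 KB shared cache.
working_sets = {"t0": 48, "t1": 16, "t2": 32, "t3": 24, "t4": 8}
batches = schedule_batches(working_sets, cache_capacity=64)
print(batches)
```

A capacity-agnostic scheduler would run all five threads together (128 KB of working sets against a 64 KB cache), which is the contention scenario the abstract describes; batching keeps each concurrently running group within capacity.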

13 citations


Cited by

Parallel CAD: Algorithm Design and Programming Special Section Call for Papers TODAES: ACM Transactions on Design Automation of Electronic Systems

Kurt Keutzer, Peng Li, Li Shang, Hai Zhou

01 Jan 2010

TL;DR: This journal special section will cover recent progress on parallel CAD research, including algorithm foundations, programming models, parallel architectural-specific optimization, and verification, as well as other topics relevant to the design of parallel CAD algorithms and software tools.

Abstract: High-performance parallel computer architecture and systems have been improved at a phenomenal rate. In the meantime, VLSI computer-aided design (CAD) software for multibillion-transistor IC design has become increasingly complex and requires prohibitively high computational resources. Recent studies have shown that numerous CAD problems, with their high computational complexity, can greatly benefit from the fast-increasing parallel computation capabilities. However, parallel programming imposes big challenges for CAD applications. Fully exploiting the computational power of emerging general-purpose and domain-specific multicore/many-core processor systems calls for fundamental research and engineering practice across every stage of parallel CAD design, from algorithm exploration, programming models, design-time and run-time environment, to CAD applications, such as verification, optimization, and simulation. This journal special section will cover recent progress on parallel CAD research, including algorithm foundations, programming models, parallel architectural-specific optimization, and verification. More specifically, papers with in-depth and extensive coverage of the following topics will be considered, as well as other topics relevant to the design of parallel CAD algorithms and software tools.

1. Parallel algorithm design and specification for CAD applications
2. Parallel programming models and languages of particular use in CAD
3. Runtime support and performance optimization for CAD applications
4. Parallel architecture-specific design and optimization for CAD applications
5. Parallel program debugging and verification techniques particularly relevant for CAD

The papers should be submitted via the Manuscript Central website and should adhere to standard ACM TODAES formatting requirements (http://todaes.acm.org/). The page count limit is 25.

459 citations

Proceedings Article

Unsupervised Degradation Representation Learning for Blind Super-Resolution

Longguang Wang¹, Yingqian Wang¹, Xiaoyu Dong², Qingyu Xu¹, Jungang Yang¹, Wei An¹, Yulan Guo¹

National University of Defense Technology¹, University of Tokyo²

20 Jun 2021

TL;DR: Wang et al. propose an unsupervised degradation representation learning scheme for blind super-resolution without explicit degradation estimation; the scheme can extract discriminative representations to obtain accurate degradation information.

Abstract: Most existing CNN-based super-resolution (SR) methods are developed based on an assumption that the degradation is fixed and known (e.g., bicubic downsampling). However, these methods suffer a severe performance drop when the real degradation is different from their assumption. To handle various unknown degradations in real-world applications, previous methods rely on degradation estimation to reconstruct the SR image. Nevertheless, degradation estimation methods are usually time-consuming and may lead to SR failure due to large estimation errors. In this paper, we propose an unsupervised degradation representation learning scheme for blind SR without explicit degradation estimation. Specifically, we learn abstract representations to distinguish various degradations in the representation space rather than explicit estimation in the pixel space. Moreover, we introduce a Degradation-Aware SR (DASR) network with flexible adaption to various degradations based on the learned representations. It is demonstrated that our degradation representation learning scheme can extract discriminative representations to obtain accurate degradation information. Experiments on both synthetic and real images show that our network achieves state-of-the-art performance for the blind SR task. Code is available at: https://github.com/LongguangWang/DASR.

178 citations

Journal Article

Handbook of Approximation: Algorithms and Metaheuristics

C.J.H. Mann

15 Feb 2008-Kybernetes

164 citations

Proceedings Article

Flow-based Kernel Prior with Application to Blind Super-Resolution

Jingyun Liang¹, Kai Zhang¹, Shuhang Gu¹, Luc Van Gool¹, Radu Timofte¹

ETH Zurich¹

01 Jun 2021

TL;DR: In this paper, a normalizing flow-based kernel prior (FKP) is proposed to model the kernel in the latent space rather than the network parameter space, which allows it to generate reasonable kernel initialization, traverse the learned kernel manifold and improve the optimization stability.

Abstract: Kernel estimation is generally one of the key problems for blind image super-resolution (SR). Recently, Double-DIP proposes to model the kernel via a network architecture prior, while KernelGAN employs the deep linear network and several regularization losses to constrain the kernel space. However, they fail to fully exploit the general SR kernel assumption that anisotropic Gaussian kernels are sufficient for image SR. To address this issue, this paper proposes a normalizing flow-based kernel prior (FKP) for kernel modeling. By learning an invertible mapping between the anisotropic Gaussian kernel distribution and a tractable latent distribution, FKP can be easily used to replace the kernel modeling modules of Double-DIP and KernelGAN. Specifically, FKP optimizes the kernel in the latent space rather than the network parameter space, which allows it to generate reasonable kernel initialization, traverse the learned kernel manifold and improve the optimization stability. Extensive experiments on synthetic and real-world images demonstrate that the proposed FKP can significantly improve the kernel estimation accuracy with less parameters, runtime and memory usage, leading to state-of-the-art blind SR results.

95 citations

Posted Content

Unfolding the Alternating Optimization for Blind Super Resolution

Zhengxiong Luo¹, Yan Huang¹, Shang Li¹, Liang Wang¹, Tieniu Tan²

Chinese Academy of Sciences¹, Center for Excellence in Education²

06 Oct 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: This paper adopts an alternating optimization algorithm that can estimate the blur kernel and restore the SR image in a single model; the Restorer is trained with the kernel estimated by the Estimator instead of the ground-truth kernel, so it could be more tolerant to the Estimator's estimation error.

Abstract: Previous methods decompose blind super resolution (SR) problem into two sequential steps: (i) estimating blur kernel from given low-resolution (LR) image and (ii) restoring SR image based on estimated kernel. This two-step solution involves two independently trained models, which may not be well compatible with each other. Small estimation error of the first step could cause severe performance drop of the second one. While on the other hand, the first step can only utilize limited information from LR image, which makes it difficult to predict highly accurate blur kernel. Towards these issues, instead of considering these two steps separately, we adopt an alternating optimization algorithm, which can estimate blur kernel and restore SR image in a single model. Specifically, we design two convolutional neural modules, namely Restorer and Estimator. Restorer restores SR image based on predicted kernel, and Estimator estimates blur kernel with the help of restored SR image. We alternate these two modules repeatedly and unfold this process to form an end-to-end trainable network. In this way, Estimator utilizes information from both LR and SR images, which makes the estimation of blur kernel easier. More importantly, Restorer is trained with the kernel estimated by Estimator, instead of ground-truth kernel, thus Restorer could be more tolerant to the estimation error of Estimator. Extensive experiments on synthetic datasets and real-world images show that our model can largely outperform state-of-the-art methods and produce more visually favorable results at much higher speed. The source code is available at this https URL.

87 citations
