Author

Xiaopeng Zhang

Bio: Xiaopeng Zhang is an academic researcher from Qualcomm. The author has contributed to research in topics: Multispectral image & RGB color model. The author has an h-index of 9 and has co-authored 9 publications receiving 436 citations.

Papers
Proceedings ArticleDOI
01 Sep 2013
TL;DR: An improved image dehazing scheme using a pair of color and NIR images is proposed; it effectively estimates the airlight color, transfers details from the NIR image, and achieves substantial improvements in detail recovery and color distribution over existing image dehazing algorithms.
Abstract: Near-infrared (NIR) light has stronger penetration capability than visible light due to its long wavelengths and is thus less scattered by particles in the air. This makes it desirable for image dehazing to unveil details of distant objects in landscape photographs. In this paper, we propose an improved image dehazing scheme using a pair of color and NIR images, which effectively estimates the airlight color and transfers details from the NIR. A two-stage dehazing method is proposed by exploiting the dissimilarity between RGB and NIR for airlight color estimation, followed by a dehazing procedure through an optimization framework. Experiments on captured haze images show that our method can achieve substantial improvements in detail recovery and color distribution over existing image dehazing algorithms.
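
The scheme itself is only summarized above; for orientation, below is a minimal NumPy sketch of the standard haze image-formation model I = J*t + A*(1 - t) that dehazing methods of this kind invert. The crude transmission estimate and the helper name dehaze_with_nir are illustrative assumptions, not the paper's RGB-NIR dissimilarity measure or optimization framework.

import numpy as np

def dehaze_with_nir(rgb, nir, airlight, t_min=0.1):
    """Recover scene radiance J from a hazy RGB image using the standard
    haze model I = J*t + A*(1 - t).

    rgb      : HxWx3 float array in [0, 1], hazy color image
    nir      : HxW   float array in [0, 1], aligned near-infrared image
    airlight : length-3 array, estimated airlight color A
    t_min    : lower bound on transmission to avoid amplifying noise

    The transmission estimate below (treating the visible-versus-NIR
    brightness difference as a haze proxy) is a stand-in for the paper's
    actual estimation procedure, which is not reproduced here.
    """
    gray = rgb.mean(axis=2)
    # Hazy regions tend to look bright in the visible image but retain
    # contrast in NIR; use the difference as a rough haze indicator.
    haze_proxy = np.clip(gray - nir, 0.0, 1.0)
    t = np.clip(1.0 - haze_proxy, t_min, 1.0)

    # Invert the haze model per color channel.
    A = airlight[None, None, :]
    J = (rgb - A) / t[..., None] + A
    return np.clip(J, 0.0, 1.0)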

101 citations

Proceedings ArticleDOI
Tao Sheng, Chen Feng, Shaojie Zhuo, Xiaopeng Zhang, Liang Shen, Mickey Aleksic
22 Mar 2018
TL;DR: A quantization-friendly separable convolution architecture is proposed that closes most of the accuracy gap between 8-bit fixed-point and floating-point MobileNetV1 models, making it practical to deploy deep learning on fixed-point pipelines.
Abstract: As deep learning (DL) is rapidly pushed to edge computing, researchers have invented various ways to make inference more efficient on mobile/IoT devices, such as network pruning and parameter compression. Quantization, one of the key approaches, can effectively offload the GPU and makes it possible to deploy DL on a fixed-point pipeline. Unfortunately, not all existing network designs are friendly to quantization. For example, while the popular lightweight MobileNetV1 successfully reduces parameter size and computation latency with separable convolutions, our experiments show that its quantized models have a large accuracy gap compared to their floating-point counterparts. To resolve this, we analyzed the root cause of the quantization loss and propose a quantization-friendly separable convolution architecture. Evaluated on the ImageNet2012 image classification task, our modified MobileNetV1 model achieves 68.03% top-1 accuracy with 8-bit inference, nearly closing the gap to the floating-point pipeline.
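
To make the float-versus-fixed-point gap discussed above concrete, here is a small, self-contained "fake quantization" sketch in NumPy that simulates an 8-bit pipeline for a separable-style layer and measures the resulting output error. The layer shapes and the fake_quant helper are illustrative assumptions; this is not the paper's architecture or evaluation setup.

import numpy as np

def fake_quant(x, num_bits=8):
    """Uniform symmetric quantize-dequantize, commonly used to simulate
    a fixed-point pipeline while staying in floating point."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = max(np.abs(x).max() / qmax, 1e-12)
    return np.clip(np.round(x / scale), -qmax, qmax) * scale

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 64))                              # toy input activation
w_dw = rng.normal(size=(64,)) * np.logspace(-2, 1, 64)    # "depthwise" weights, widely varying magnitudes
w_pw = rng.normal(size=(64, 32))                          # "pointwise" projection

# Float reference for a separable-style layer: depthwise scaling, ReLU, pointwise matmul.
y_float = np.maximum(x * w_dw, 0.0) @ w_pw

# Simulated 8-bit pipeline: quantize weights and activations at every step.
h = fake_quant(np.maximum(fake_quant(x) * fake_quant(w_dw), 0.0))
y_quant = fake_quant(h @ fake_quant(w_pw))

rel_err = np.linalg.norm(y_float - y_quant) / np.linalg.norm(y_float)
print(f"relative output error of the simulated 8-bit pipeline: {rel_err:.3%}")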

100 citations

Patent
Chen Feng, Xiaopeng Zhang, Shaojie Zhuo, Liang Shen, Tao Sheng, Alwyn Dos Remedios
15 Jul 2014
TL;DR: This patent describes a method for generating high-resolution iris templates and detecting spoofs, enabling more reliable and secure iris authentication using RGB and NIR images.
Abstract: Certain aspects relate to systems and techniques for generating high-resolution iris templates and for detecting spoofs, enabling more reliable and secure iris authentication. Pairs of RGB and NIR images can be captured by the iris authentication system for use in iris authentication, for example using an NIR LED flash and a four-channel image sensor. Multiple images of the user's iris can be captured by the system in a relatively short period of time and fused together to generate a high-resolution iris image that can contain more detail of the iris structure and unique pattern than each individual image. The "liveness" of the iris, referring to whether the iris is a real human iris or an iris imitation, can be assessed via a liveness ratio based on comparing known iris and sclera reflectance properties at various wavelengths with the sensor responses measured at those same wavelengths.
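
As a rough illustration of the liveness-ratio idea described above, the sketch below scores a measured NIR-to-visible response ratio in iris and sclera regions against expected ranges for live tissue. The numeric ranges, function name, and the assumption that segmentation masks are already available are all placeholders for illustration, not values or steps taken from the patent.

import numpy as np

# Placeholder reflectance-ratio ranges (NIR response / visible response)
# for live tissue. A real system would calibrate these for its sensor
# and illumination; they are NOT taken from the patent.
EXPECTED_IRIS_RATIO = (1.2, 3.0)
EXPECTED_SCLERA_RATIO = (0.6, 1.4)

def liveness_ratio(rgb, nir, iris_mask, sclera_mask):
    """Score how closely measured NIR/visible responses match live tissue.

    rgb, nir    : aligned images from the four-channel sensor
    iris_mask,
    sclera_mask : boolean region masks from a prior segmentation step
    """
    def region_ratio(mask):
        vis = rgb[mask].mean()
        return nir[mask].mean() / max(vis, 1e-6)

    def in_range(value, lo_hi):
        lo, hi = lo_hi
        return 1.0 if lo <= value <= hi else 0.0

    score = 0.5 * in_range(region_ratio(iris_mask), EXPECTED_IRIS_RATIO)
    score += 0.5 * in_range(region_ratio(sclera_mask), EXPECTED_SCLERA_RATIO)
    return score   # 1.0: consistent with a live eye, 0.0: likely a spoof

# Usage (with precomputed masks): is_live = liveness_ratio(rgb, nir, iris_mask, sclera_mask) > 0.5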

90 citations


Journal ArticleDOI
TL;DR: The state of the art in low-power solutions for detecting objects in images is examined, and directions for research as well as opportunities for low-power computer vision are suggested.
Abstract: Computer vision has achieved impressive progress in recent years. Meanwhile, mobile phones have become the primary computing platforms for millions of people. In addition to mobile phones, many autonomous systems rely on visual data for making decisions, and some of these systems have limited energy (such as unmanned aerial vehicles, also called drones, and mobile robots). These systems rely on batteries, and energy efficiency is critical. This paper serves two main purposes. First, it examines the state of the art in low-power solutions for detecting objects in images; since 2015, the IEEE Annual International Low-Power Image Recognition Challenge (LPIRC) has been held to identify the most energy-efficient computer vision solutions, and this paper summarizes the 2018 winners' solutions. Second, it suggests directions for research as well as opportunities for low-power computer vision.

48 citations


Cited by
Posted Content
TL;DR: An overview of techniques for quantizing convolutional neural networks for inference with integer weights and activations is presented, and it is recommended that per-channel quantization of weights and per-layer quantization of activations be the preferred quantization scheme for hardware acceleration and kernel optimization.
Abstract: We present an overview of techniques for quantizing convolutional neural networks for inference with integer weights and activations. Per-channel quantization of weights and per-layer quantization of activations to 8 bits of precision post-training produces classification accuracies within 2% of floating-point networks for a wide variety of CNN architectures. Model sizes can be reduced by a factor of 4 by quantizing weights to 8 bits, even when 8-bit arithmetic is not supported; this can be achieved with simple post-training quantization of weights. We benchmark latencies of quantized networks on CPUs and DSPs and observe a speedup of 2x-3x for quantized implementations compared to floating point on CPUs. Speedups of up to 10x are observed on specialized processors with fixed-point SIMD capabilities, like the Qualcomm QDSPs with HVX. Quantization-aware training can provide further improvements, reducing the gap to floating point to 1% at 8-bit precision. Quantization-aware training also allows reducing the precision of weights to four bits, with accuracy losses ranging from 2% to 10% and higher accuracy drops for smaller networks. We introduce tools in TensorFlow and TensorFlow Lite for quantizing convolutional networks and review best practices for quantization-aware training to obtain high accuracy with quantized weights and activations. We recommend that per-channel quantization of weights and per-layer quantization of activations be the preferred quantization scheme for hardware acceleration and kernel optimization. We also propose that future processors and hardware accelerators for optimized inference support precisions of 4, 8 and 16 bits.
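
The recommendation above (per-channel scales for weights) can be illustrated with a short NumPy comparison of per-tensor versus per-channel symmetric 8-bit quantization on a toy weight tensor whose channel ranges differ widely. The helper and the toy shapes are assumptions made for the example, not the TensorFlow Lite implementation.

import numpy as np

def quantize_dequantize_int8(w, per_channel_axis=None):
    """Simulate symmetric 8-bit quantization of a weight tensor.

    If per_channel_axis is None, a single scale covers the whole tensor
    (per-tensor quantization); otherwise one scale is used per slice
    along that axis (per-channel quantization).
    """
    if per_channel_axis is None:
        scale = np.abs(w).max() / 127.0
    else:
        reduce_axes = tuple(a for a in range(w.ndim) if a != per_channel_axis)
        scale = np.abs(w).max(axis=reduce_axes, keepdims=True) / 127.0
    scale = np.maximum(scale, 1e-12)          # avoid division by zero
    q = np.clip(np.round(w / scale), -127, 127)
    return q * scale                          # dequantized approximation

# Toy 3x3 convolution weights with 8 output channels of very different ranges.
rng = np.random.default_rng(0)
w = rng.normal(size=(3, 3, 8)) * np.array([0.01] * 4 + [10.0] * 4)

err_tensor  = np.abs(w - quantize_dequantize_int8(w)).mean()
err_channel = np.abs(w - quantize_dequantize_int8(w, per_channel_axis=2)).mean()
print(f"per-tensor error {err_tensor:.4f} vs per-channel error {err_channel:.6f}")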

676 citations

Book ChapterDOI
08 Sep 2018
TL;DR: A study of the current state of deep learning in the Android ecosystem that describes available frameworks, programming models, and the limitations of running AI on smartphones, together with an overview of the hardware acceleration resources available on the four main mobile chipset platforms.
Abstract: Over the last years, the computational power of mobile devices such as smartphones and tablets has grown dramatically, reaching the level of desktop computers available not long ago. While standard smartphone apps are no longer a problem for them, there is still a group of tasks that can easily challenge even high-end devices, namely running artificial intelligence algorithms. In this paper, we present a study of the current state of deep learning in the Android ecosystem and describe available frameworks, programming models and the limitations of running AI on smartphones. We give an overview of the hardware acceleration resources available on four main mobile chipset platforms: Qualcomm, HiSilicon, MediaTek and Samsung. Additionally, we present the real-world performance results of different mobile SoCs collected with AI Benchmark (http://ai-benchmark.com), covering all main existing hardware configurations.

313 citations

Proceedings ArticleDOI
11 Jun 2019
TL;DR: This work introduces a data-free quantization method for deep neural networks that does not require fine-tuning or hyperparameter selection, and achieves near-original model performance on common computer vision architectures and tasks.
Abstract: We introduce a data-free quantization method for deep neural networks that does not require fine-tuning or hyperparameter selection. It achieves near-original model performance on common computer vision architectures and tasks. 8-bit fixed-point quantization is essential for efficient inference on modern deep learning hardware. However, quantizing models to run in 8-bit is a non-trivial task, frequently leading to either significant performance reduction or engineering time spent on training a network to be amenable to quantization. Our approach relies on equalizing the weight ranges in the network by making use of a scale-equivariance property of activation functions. In addition, the method corrects biases in the error that are introduced during quantization. This improves quantization accuracy and can be applied to many common computer vision architectures with a straightforward API call. For common architectures, such as the MobileNet family, we achieve state-of-the-art quantized model performance. We further show that the method also extends to other computer vision architectures and tasks such as semantic segmentation and object detection.
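
A minimal NumPy sketch of the cross-layer range-equalization idea mentioned in the abstract: because relu(s*z) = s*relu(z) for s > 0, per-channel scales can be moved between consecutive layers without changing the network's output while balancing the weight ranges that quantization has to cover. The scaling rule and toy dimensions below are a simplified assumption and may differ from the paper's exact procedure.

import numpy as np

def equalize_ranges(w1, b1, w2):
    """Cross-layer range equalization for y = W2 @ relu(W1 @ x + b1).

    Divide output channel i of W1 (and its bias) by s_i and multiply the
    matching input column of W2 by s_i; the function is unchanged, but
    choosing s_i = sqrt(r1_i / r2_i) makes both scaled ranges equal.
    """
    r1 = np.abs(w1).max(axis=1)            # range of each output channel of W1
    r2 = np.abs(w2).max(axis=0)            # range of each input channel of W2
    s = np.sqrt(r1 / np.maximum(r2, 1e-12))
    s = np.maximum(s, 1e-12)
    return w1 / s[:, None], b1 / s, w2 * s[None, :]

rng = np.random.default_rng(1)
w1 = rng.normal(size=(16, 8)) * rng.uniform(0.01, 10.0, size=(16, 1))
b1 = rng.normal(size=16)
w2 = rng.normal(size=(4, 16))
x = rng.normal(size=8)

w1e, b1e, w2e = equalize_ranges(w1, b1, w2)
orig = w2 @ np.maximum(w1 @ x + b1, 0.0)
equl = w2e @ np.maximum(w1e @ x + b1e, 0.0)
print("max output difference:", np.abs(orig - equl).max())   # ~0 up to float error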

311 citations

Journal ArticleDOI
TL;DR: A survey of two types of network compression, pruning and quantization, that compares current techniques, analyzes their strengths and weaknesses, provides guidance for compressing networks, and discusses possible future compression techniques.

266 citations

Posted Content
TL;DR: Slalom is a framework that securely delegates execution of all linear layers in a DNN from a trusted execution environment (TEE) to a faster, yet untrusted, co-located processor.
Abstract: As Machine Learning (ML) gets applied to security-critical or sensitive domains, there is a growing need for integrity and privacy for outsourced ML computations. A pragmatic solution comes from Trusted Execution Environments (TEEs), which use hardware and software protections to isolate sensitive computations from the untrusted software stack. However, these isolation guarantees come at a price in performance, compared to untrusted alternatives. This paper initiates the study of high performance execution of Deep Neural Networks (DNNs) in TEEs by efficiently partitioning DNN computations between trusted and untrusted devices. Building upon an efficient outsourcing scheme for matrix multiplication, we propose Slalom, a framework that securely delegates execution of all linear layers in a DNN from a TEE (e.g., Intel SGX or Sanctum) to a faster, yet untrusted, co-located processor. We evaluate Slalom by running DNNs in an Intel SGX enclave, which selectively delegates work to an untrusted GPU. For canonical DNNs (VGG16, MobileNet and ResNet variants) we obtain 6x to 20x increases in throughput for verifiable inference, and 4x to 11x for verifiable and private inference.
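
For context on verifying work offloaded to an untrusted processor, below is a NumPy sketch of Freivalds' randomized check for an outsourced matrix product. It shows only the integrity-checking ingredient in isolation; it is not Slalom's full protocol, which additionally addresses privacy (e.g., via blinding) and runs inside an enclave.

import numpy as np

def freivalds_check(a, b, c, trials=16, rng=None):
    """Probabilistically verify that c == a @ b without recomputing the
    full product: for a random 0/1 vector r, a @ (b @ r) must equal c @ r.
    Each trial catches a wrong product with probability >= 1/2, so
    `trials` independent checks drive the failure rate below 2**-trials.
    """
    if rng is None:
        rng = np.random.default_rng()
    n = b.shape[1]
    for _ in range(trials):
        r = rng.integers(0, 2, size=n).astype(a.dtype)
        if not np.allclose(a @ (b @ r), c @ r):
            return False     # untrusted device returned an incorrect result
    return True

rng = np.random.default_rng(0)
a, b = rng.normal(size=(256, 256)), rng.normal(size=(256, 256))
c_honest = a @ b                       # what an honest co-processor would return
c_forged = c_honest.copy()
c_forged[0, 0] += 1.0                  # a single tampered entry

print(freivalds_check(a, b, c_honest, rng=rng))   # True
print(freivalds_check(a, b, c_forged, rng=rng))   # False with high probability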

183 citations