Author

Xiaopeng Zhang

Bio: Xiaopeng Zhang is an academic researcher from Qualcomm. The author has contributed to research in topics: Multispectral image & RGB color model. The author has an h-index of 9 and has co-authored 9 publications receiving 436 citations.

Papers
Proceedings ArticleDOI
01 Sep 2013
TL;DR: An improved image dehazing scheme using a pair of color and NIR images is proposed; it effectively estimates the airlight color, transfers details from the NIR image, and achieves substantial improvements in detail recovery and color distribution over existing image dehazing algorithms.
Abstract: Near-infrared (NIR) light has stronger penetration capability than visible light due to its long wavelengths and is thus less scattered by particles in the air. This makes it desirable for image dehazing to unveil details of distant objects in landscape photographs. In this paper, we propose an improved image dehazing scheme using a pair of color and NIR images, which effectively estimates the airlight color and transfers details from the NIR. A two-stage dehazing method is proposed by exploiting the dissimilarity between RGB and NIR for airlight color estimation, followed by a dehazing procedure through an optimization framework. Experiments on captured haze images show that our method can achieve substantial improvements in detail recovery and color distribution over existing image dehazing algorithms.
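
The scheme itself is only summarized above; for orientation, below is a minimal NumPy sketch of the standard haze image-formation model I = J*t + A*(1 - t) that dehazing methods of this kind invert. The crude transmission estimate and the helper name dehaze_with_nir are illustrative assumptions, not the paper's RGB-NIR dissimilarity measure or optimization framework.

import numpy as np

def dehaze_with_nir(rgb, nir, airlight, t_min=0.1):
    """Recover scene radiance J from a hazy RGB image using the standard
    haze model I = J*t + A*(1 - t).

    rgb      : HxWx3 float array in [0, 1], hazy color image
    nir      : HxW   float array in [0, 1], aligned near-infrared image
    airlight : length-3 array, estimated airlight color A
    t_min    : lower bound on transmission to avoid amplifying noise

    The transmission estimate below (treating the visible-versus-NIR
    brightness difference as a haze proxy) is a stand-in for the paper's
    actual estimation procedure, which is not reproduced here.
    """
    gray = rgb.mean(axis=2)
    # Hazy regions tend to look bright in the visible image but retain
    # contrast in NIR; use the difference as a rough haze indicator.
    haze_proxy = np.clip(gray - nir, 0.0, 1.0)
    t = np.clip(1.0 - haze_proxy, t_min, 1.0)

    # Invert the haze model per color channel.
    A = airlight[None, None, :]
    J = (rgb - A) / t[..., None] + A
    return np.clip(J, 0.0, 1.0)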

101 citations

Proceedings ArticleDOI
Tao Sheng, Chen Feng, Shaojie Zhuo, Xiaopeng Zhang, Liang Shen, Mickey Aleksic
22 Mar 2018
TL;DR: A quantization-friendly separable convolution architecture is proposed that closes most of the accuracy gap between 8-bit fixed-point and floating-point MobileNetV1 models, making it practical to deploy deep learning on fixed-point pipelines.
Abstract: As deep learning (DL) is rapidly pushed to edge computing, researchers have invented various ways to make inference more efficient on mobile/IoT devices, such as network pruning and parameter compression. Quantization, one of the key approaches, can effectively offload the GPU and makes it possible to deploy DL on a fixed-point pipeline. Unfortunately, not all existing network designs are friendly to quantization. For example, while the popular lightweight MobileNetV1 successfully reduces parameter size and computation latency with separable convolutions, our experiments show that its quantized models have a large accuracy gap compared to their floating-point counterparts. To resolve this, we analyzed the root cause of the quantization loss and propose a quantization-friendly separable convolution architecture. Evaluated on the ImageNet2012 image classification task, our modified MobileNetV1 model achieves 68.03% top-1 accuracy with 8-bit inference, nearly closing the gap to the floating-point pipeline.
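
To make the float-versus-fixed-point gap discussed above concrete, here is a small, self-contained "fake quantization" sketch in NumPy that simulates an 8-bit pipeline for a separable-style layer and measures the resulting output error. The layer shapes and the fake_quant helper are illustrative assumptions; this is not the paper's architecture or evaluation setup.

import numpy as np

def fake_quant(x, num_bits=8):
    """Uniform symmetric quantize-dequantize, commonly used to simulate
    a fixed-point pipeline while staying in floating point."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = max(np.abs(x).max() / qmax, 1e-12)
    return np.clip(np.round(x / scale), -qmax, qmax) * scale

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 64))                              # toy input activation
w_dw = rng.normal(size=(64,)) * np.logspace(-2, 1, 64)    # "depthwise" weights, widely varying magnitudes
w_pw = rng.normal(size=(64, 32))                          # "pointwise" projection

# Float reference for a separable-style layer: depthwise scaling, ReLU, pointwise matmul.
y_float = np.maximum(x * w_dw, 0.0) @ w_pw

# Simulated 8-bit pipeline: quantize weights and activations at every step.
h = fake_quant(np.maximum(fake_quant(x) * fake_quant(w_dw), 0.0))
y_quant = fake_quant(h @ fake_quant(w_pw))

rel_err = np.linalg.norm(y_float - y_quant) / np.linalg.norm(y_float)
print(f"relative output error of the simulated 8-bit pipeline: {rel_err:.3%}")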

100 citations

Patent
Chen Feng, Xiaopeng Zhang, Shaojie Zhuo, Liang Shen, Tao Sheng, Alwyn Dos Remedios
15 Jul 2014
TL;DR: This patent describes a method for generating high-resolution iris templates and detecting spoofs, enabling more reliable and secure iris authentication using RGB and NIR images.
Abstract: Certain aspects relate to systems and techniques for generating high-resolution iris templates and for detecting spoofs, enabling more reliable and secure iris authentication. Pairs of RGB and NIR images can be captured by the iris authentication system for use in iris authentication, for example using an NIR LED flash and a four-channel image sensor. Multiple images of the user's iris can be captured by the system in a relatively short period of time and fused together to generate a high-resolution iris image that can contain more detail of the iris structure and unique pattern than each individual image. The "liveness" of the iris, referring to whether the iris is a real human iris or an iris imitation, can be assessed via a liveness ratio based on comparing known iris and sclera reflectance properties at various wavelengths with the sensor responses measured at those same wavelengths.
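
As a rough illustration of the liveness-ratio idea described above, the sketch below scores a measured NIR-to-visible response ratio in iris and sclera regions against expected ranges for live tissue. The numeric ranges, function name, and the assumption that segmentation masks are already available are all placeholders for illustration, not values or steps taken from the patent.

import numpy as np

# Placeholder reflectance-ratio ranges (NIR response / visible response)
# for live tissue. A real system would calibrate these for its sensor
# and illumination; they are NOT taken from the patent.
EXPECTED_IRIS_RATIO = (1.2, 3.0)
EXPECTED_SCLERA_RATIO = (0.6, 1.4)

def liveness_ratio(rgb, nir, iris_mask, sclera_mask):
    """Score how closely measured NIR/visible responses match live tissue.

    rgb, nir    : aligned images from the four-channel sensor
    iris_mask,
    sclera_mask : boolean region masks from a prior segmentation step
    """
    def region_ratio(mask):
        vis = rgb[mask].mean()
        return nir[mask].mean() / max(vis, 1e-6)

    def in_range(value, lo_hi):
        lo, hi = lo_hi
        return 1.0 if lo <= value <= hi else 0.0

    score = 0.5 * in_range(region_ratio(iris_mask), EXPECTED_IRIS_RATIO)
    score += 0.5 * in_range(region_ratio(sclera_mask), EXPECTED_SCLERA_RATIO)
    return score   # 1.0: consistent with a live eye, 0.0: likely a spoof

# Usage (with precomputed masks): is_live = liveness_ratio(rgb, nir, iris_mask, sclera_mask) > 0.5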

90 citations


Journal ArticleDOI
TL;DR: The state of the art in low-power solutions for detecting objects in images is examined, and directions for research as well as opportunities for low-power computer vision are suggested.
Abstract: Computer vision has achieved impressive progress in recent years. Meanwhile, mobile phones have become the primary computing platforms for millions of people. In addition to mobile phones, many autonomous systems rely on visual data for making decisions, and some of these systems have limited energy (such as unmanned aerial vehicles, also called drones, and mobile robots). These systems rely on batteries, and energy efficiency is critical. This paper serves two main purposes. First, it examines the state of the art in low-power solutions for detecting objects in images; since 2015, the IEEE Annual International Low-Power Image Recognition Challenge (LPIRC) has been held to identify the most energy-efficient computer vision solutions, and this paper summarizes the 2018 winners' solutions. Second, it suggests directions for research as well as opportunities for low-power computer vision.

48 citations


Cited by
Posted Content
TL;DR: An overview of techniques for quantizing convolutional neural networks for inference with integer weights and activations is presented, and it is recommended that per-channel quantization of weights and per-layer quantization of activations be the preferred quantization scheme for hardware acceleration and kernel optimization.
Abstract: We present an overview of techniques for quantizing convolutional neural networks for inference with integer weights and activations. Per-channel quantization of weights and per-layer quantization of activations to 8 bits of precision post-training produces classification accuracies within 2% of floating-point networks for a wide variety of CNN architectures. Model sizes can be reduced by a factor of 4 by quantizing weights to 8 bits, even when 8-bit arithmetic is not supported; this can be achieved with simple post-training quantization of weights. We benchmark latencies of quantized networks on CPUs and DSPs and observe a speedup of 2x-3x for quantized implementations compared to floating point on CPUs. Speedups of up to 10x are observed on specialized processors with fixed-point SIMD capabilities, like the Qualcomm QDSPs with HVX. Quantization-aware training can provide further improvements, reducing the gap to floating point to 1% at 8-bit precision. Quantization-aware training also allows reducing the precision of weights to four bits, with accuracy losses ranging from 2% to 10% and higher accuracy drops for smaller networks. We introduce tools in TensorFlow and TensorFlow Lite for quantizing convolutional networks and review best practices for quantization-aware training to obtain high accuracy with quantized weights and activations. We recommend that per-channel quantization of weights and per-layer quantization of activations be the preferred quantization scheme for hardware acceleration and kernel optimization. We also propose that future processors and hardware accelerators for optimized inference support precisions of 4, 8 and 16 bits.
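
The recommendation above (per-channel scales for weights) can be illustrated with a short NumPy comparison of per-tensor versus per-channel symmetric 8-bit quantization on a toy weight tensor whose channel ranges differ widely. The helper and the toy shapes are assumptions made for the example, not the TensorFlow Lite implementation.

import numpy as np

def quantize_dequantize_int8(w, per_channel_axis=None):
    """Simulate symmetric 8-bit quantization of a weight tensor.

    If per_channel_axis is None, a single scale covers the whole tensor
    (per-tensor quantization); otherwise one scale is used per slice
    along that axis (per-channel quantization).
    """
    if per_channel_axis is None:
        scale = np.abs(w).max() / 127.0
    else:
        reduce_axes = tuple(a for a in range(w.ndim) if a != per_channel_axis)
        scale = np.abs(w).max(axis=reduce_axes, keepdims=True) / 127.0
    scale = np.maximum(scale, 1e-12)          # avoid division by zero
    q = np.clip(np.round(w / scale), -127, 127)
    return q * scale                          # dequantized approximation

# Toy 3x3 convolution weights with 8 output channels of very different ranges.
rng = np.random.default_rng(0)
w = rng.normal(size=(3, 3, 8)) * np.array([0.01] * 4 + [10.0] * 4)

err_tensor  = np.abs(w - quantize_dequantize_int8(w)).mean()
err_channel = np.abs(w - quantize_dequantize_int8(w, per_channel_axis=2)).mean()
print(f"per-tensor error {err_tensor:.4f} vs per-channel error {err_channel:.6f}")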

676 citations

Book ChapterDOI
08 Sep 2018
TL;DR: A study of the current state of deep learning in the Android ecosystem that describes available frameworks, programming models, and the limitations of running AI on smartphones, together with an overview of the hardware acceleration resources available on the four main mobile chipset platforms.
Abstract: Over the last years, the computational power of mobile devices such as smartphones and tablets has grown dramatically, reaching the level of desktop computers available not long ago. While standard smartphone apps are no longer a problem for them, there is still a group of tasks that can easily challenge even high-end devices, namely running artificial intelligence algorithms. In this paper, we present a study of the current state of deep learning in the Android ecosystem and describe available frameworks, programming models and the limitations of running AI on smartphones. We give an overview of the hardware acceleration resources available on four main mobile chipset platforms: Qualcomm, HiSilicon, MediaTek and Samsung. Additionally, we present the real-world performance results of different mobile SoCs collected with AI Benchmark (http://ai-benchmark.com), covering all main existing hardware configurations.

313 citations

Proceedings ArticleDOI
11 Jun 2019
TL;DR: This work introduces a data-free quantization method for deep neural networks that does not require fine-tuning or hyperparameter selection, and achieves near-original model performance on common computer vision architectures and tasks.
Abstract: We introduce a data-free quantization method for deep neural networks that does not require fine-tuning or hyperparameter selection. It achieves near-original model performance on common computer vision architectures and tasks. 8-bit fixed-point quantization is essential for efficient inference on modern deep learning hardware. However, quantizing models to run in 8-bit is a non-trivial task, frequently leading to either significant performance reduction or engineering time spent on training a network to be amenable to quantization. Our approach relies on equalizing the weight ranges in the network by making use of a scale-equivariance property of activation functions. In addition, the method corrects biases in the error that are introduced during quantization. This improves quantization accuracy and can be applied to many common computer vision architectures with a straightforward API call. For common architectures, such as the MobileNet family, we achieve state-of-the-art quantized model performance. We further show that the method also extends to other computer vision architectures and tasks such as semantic segmentation and object detection.
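
A minimal NumPy sketch of the cross-layer range-equalization idea mentioned in the abstract: because relu(s*z) = s*relu(z) for s > 0, per-channel scales can be moved between consecutive layers without changing the network's output while balancing the weight ranges that quantization has to cover. The scaling rule and toy dimensions below are a simplified assumption and may differ from the paper's exact procedure.

import numpy as np

def equalize_ranges(w1, b1, w2):
    """Cross-layer range equalization for y = W2 @ relu(W1 @ x + b1).

    Divide output channel i of W1 (and its bias) by s_i and multiply the
    matching input column of W2 by s_i; the function is unchanged, but
    choosing s_i = sqrt(r1_i / r2_i) makes both scaled ranges equal.
    """
    r1 = np.abs(w1).max(axis=1)            # range of each output channel of W1
    r2 = np.abs(w2).max(axis=0)            # range of each input channel of W2
    s = np.sqrt(r1 / np.maximum(r2, 1e-12))
    s = np.maximum(s, 1e-12)
    return w1 / s[:, None], b1 / s, w2 * s[None, :]

rng = np.random.default_rng(1)
w1 = rng.normal(size=(16, 8)) * rng.uniform(0.01, 10.0, size=(16, 1))
b1 = rng.normal(size=16)
w2 = rng.normal(size=(4, 16))
x = rng.normal(size=8)

w1e, b1e, w2e = equalize_ranges(w1, b1, w2)
orig = w2 @ np.maximum(w1 @ x + b1, 0.0)
equl = w2e @ np.maximum(w1e @ x + b1e, 0.0)
print("max output difference:", np.abs(orig - equl).max())   # ~0 up to float error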

311 citations

Journal ArticleDOI
TL;DR: A survey of two types of network compression, pruning and quantization, that compares current techniques, analyzes their strengths and weaknesses, provides guidance for compressing networks, and discusses possible future compression techniques.

266 citations

Posted Content
TL;DR: Slalom is a framework that securely delegates execution of all linear layers in a DNN from a trusted execution environment (TEE) to a faster, yet untrusted, co-located processor.
Abstract: As Machine Learning (ML) gets applied to security-critical or sensitive domains, there is a growing need for integrity and privacy for outsourced ML computations. A pragmatic solution comes from Trusted Execution Environments (TEEs), which use hardware and software protections to isolate sensitive computations from the untrusted software stack. However, these isolation guarantees come at a price in performance, compared to untrusted alternatives. This paper initiates the study of high performance execution of Deep Neural Networks (DNNs) in TEEs by efficiently partitioning DNN computations between trusted and untrusted devices. Building upon an efficient outsourcing scheme for matrix multiplication, we propose Slalom, a framework that securely delegates execution of all linear layers in a DNN from a TEE (e.g., Intel SGX or Sanctum) to a faster, yet untrusted, co-located processor. We evaluate Slalom by running DNNs in an Intel SGX enclave, which selectively delegates work to an untrusted GPU. For canonical DNNs (VGG16, MobileNet and ResNet variants) we obtain 6x to 20x increases in throughput for verifiable inference, and 4x to 11x for verifiable and private inference.
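
For context on verifying work offloaded to an untrusted processor, below is a NumPy sketch of Freivalds' randomized check for an outsourced matrix product. It shows only the integrity-checking ingredient in isolation; it is not Slalom's full protocol, which additionally addresses privacy (e.g., via blinding) and runs inside an enclave.

import numpy as np

def freivalds_check(a, b, c, trials=16, rng=None):
    """Probabilistically verify that c == a @ b without recomputing the
    full product: for a random 0/1 vector r, a @ (b @ r) must equal c @ r.
    Each trial catches a wrong product with probability >= 1/2, so
    `trials` independent checks drive the failure rate below 2**-trials.
    """
    if rng is None:
        rng = np.random.default_rng()
    n = b.shape[1]
    for _ in range(trials):
        r = rng.integers(0, 2, size=n).astype(a.dtype)
        if not np.allclose(a @ (b @ r), c @ r):
            return False     # untrusted device returned an incorrect result
    return True

rng = np.random.default_rng(0)
a, b = rng.normal(size=(256, 256)), rng.normal(size=(256, 256))
c_honest = a @ b                       # what an honest co-processor would return
c_forged = c_honest.copy()
c_forged[0, 0] += 1.0                  # a single tampered entry

print(freivalds_check(a, b, c_honest, rng=rng))   # True
print(freivalds_check(a, b, c_forged, rng=rng))   # False with high probability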

183 citations