
Showing papers on "Quantization (image processing) published in 2018"


Journal ArticleDOI
TL;DR: A one-stage supervised deep hashing framework (SDHP) is proposed to learn high-quality binary codes, implementing a deep convolutional neural network that enforces three criteria on the learned codes.
Abstract: Image content analysis is an important surround perception modality of intelligent vehicles. In order to efficiently recognize the on-road environment based on image content analysis from a large-scale scene database, relevant image retrieval becomes one of the fundamental problems. To improve the efficiency of calculating similarities between images, hashing techniques have received increasing attention. For most existing hash methods, suboptimal binary codes are generated, as the hand-crafted feature representation is not optimally compatible with the binary codes. In this paper, a one-stage supervised deep hashing framework (SDHP) is proposed to learn high-quality binary codes. A deep convolutional neural network is implemented, and we enforce the learned codes to meet the following criteria: 1) similar images should be encoded into similar binary codes, and vice versa; 2) the quantization loss from Euclidean space to Hamming space should be minimized; and 3) the learned codes should be evenly distributed. The method is further extended into SDHP+ to improve the discriminative power of binary codes. Extensive experimental comparisons with state-of-the-art hashing algorithms are conducted on CIFAR-10 and NUS-WIDE: the MAP of SDHP reaches 87.67% and 77.48% with 48 bits, respectively, and the MAP of SDHP+ reaches 91.16% and 81.08% with 12 bits and 48 bits on CIFAR-10 and NUS-WIDE, respectively. These results show that the proposed method clearly improves search accuracy.
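The three criteria translate naturally into training losses. A minimal PyTorch sketch, assuming a contrastive form for the similarity term and continuous tanh outputs relaxed from the binary codes (the loss weighting and margin below are illustrative, not the paper's exact formulation):

```python
import torch

def sdhp_like_losses(u, sim):
    """u: (N, K) tanh outputs in [-1, 1]; sim: (N, N), 1 for similar pairs, 0 otherwise."""
    d = torch.cdist(u, u) ** 2                       # pairwise squared distances
    margin = 2.0 * u.shape[1]                        # illustrative margin
    # 1) similar images -> close codes; dissimilar -> far apart (up to a margin)
    pair_loss = (sim * d + (1 - sim) * torch.clamp(margin - d, min=0)).mean()
    # 2) quantization loss: push each entry toward {-1, +1}
    quant_loss = ((u.abs() - 1) ** 2).mean()
    # 3) even distribution: each bit should be +1 roughly half the time
    balance_loss = (u.mean(dim=0) ** 2).mean()
    return pair_loss + quant_loss + balance_loss
```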

239 citations


Proceedings ArticleDOI
18 Jun 2018
TL;DR: The proposed CLIP-Q method (Compression Learning by In-Parallel Pruning-Quantization) takes advantage of the complementary nature of pruning and quantization, and compresses AlexNet by 51-fold, GoogLeNet by 10-fold, and ResNet-50 by 15-fold, while preserving the uncompressed network accuracies on ImageNet.
Abstract: Deep neural networks enable state-of-the-art accuracy on visual recognition tasks such as image classification and object detection. However, modern deep networks contain millions of learned weights; a more efficient utilization of computation resources would assist in a variety of deployment scenarios, from embedded platforms with resource constraints to computing clusters running ensembles of networks. In this paper, we combine network pruning and weight quantization in a single learning framework that performs pruning and quantization jointly, and in parallel with fine-tuning. This allows us to take advantage of the complementary nature of pruning and quantization and to recover from premature pruning errors, which is not possible with current two-stage approaches. Our proposed CLIP-Q method (Compression Learning by In-Parallel Pruning-Quantization) compresses AlexNet by 51-fold, GoogLeNet by 10-fold, and ResNet-50 by 15-fold, while preserving the uncompressed network accuracies on ImageNet.
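A rough NumPy sketch of the in-parallel idea: on every training step the full-precision weights are re-partitioned (so premature pruning decisions can be revisited), pruned weights are zeroed, and survivors are snapped to a small set of levels. The fixed prune fraction and uniform level grid below are simplifications of the paper's learned clipping thresholds and quantization levels:

```python
import numpy as np

def clip_q_like_step(w_full, prune_frac=0.5, n_levels=16):
    """w_full: full-precision weights, kept and updated across training steps."""
    t = np.quantile(np.abs(w_full), prune_frac)      # pruning threshold, re-chosen each step
    mask = np.abs(w_full) > t                        # survivors of this step's partition
    lo, hi = w_full[mask].min(), w_full[mask].max()
    step = (hi - lo) / (n_levels - 1)
    q = np.round((w_full - lo) / step) * step + lo   # snap to a uniform level grid
    return np.where(mask, q, 0.0)                    # weights used in the forward pass
```

Because the partition and levels are recomputed jointly with fine-tuning, a weight pruned at one step can re-enter the network later, which is the recovery property two-stage pipelines lack.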

173 citations


Journal ArticleDOI
01 Jan 2018
TL;DR: The experimental results show that the proposed watermarking algorithm obtains better watermark invisibility and stronger robustness against common attacks, e.g., JPEG compression, cropping, and added noise.
Abstract: This paper proposes a new blind watermarking algorithm, which embeds a binary watermark into the blue component of an RGB image in the spatial domain, to resolve the problem of copyright protection. For watermark embedding, the generation principle and distribution features of the direct current (DC) coefficient are used to directly modify the pixel values in the spatial domain, and four different sub-watermarks are then embedded into different areas of the host image four times. During watermark extraction, each sub-watermark is extracted in a blind manner according to the DC coefficients of the watermarked image and the key-based quantization step, and the statistical rule and the method of "first to select, second to combine" are proposed to form the final watermark. Hence, the proposed algorithm is executed in the spatial domain rather than in the discrete cosine transform (DCT) domain, which offers not only the simplicity and speed of the spatial domain but also the high robustness of the DCT domain. The experimental results show that the proposed watermarking algorithm obtains better watermark invisibility and stronger robustness against common attacks, e.g., JPEG compression, cropping, and added noise. Comparison results also show the advantages of the proposed method.
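The spatial-domain shortcut rests on one identity: the DC coefficient of an orthonormal 8x8 DCT equals 8 times the block mean, so quantization-index modulation on the DC can be performed by shifting pixel values directly. A minimal NumPy sketch with an illustrative quantization step (the paper derives its step from a key):

```python
import numpy as np

def embed_bit(block, bit, delta=24.0):
    """block: 8x8 float array; bit: 0 or 1; delta: quantization step (key-based in the paper)."""
    dc = block.mean() * 8.0                            # DC of an orthonormal 8x8 DCT
    q = np.floor(dc / delta) * delta
    target = q + (0.25 if bit == 0 else 0.75) * delta  # two interleaved sub-lattices
    return block + (target - dc) / 8.0                 # spread the DC shift evenly over pixels

def extract_bit(block, delta=24.0):
    dc = block.mean() * 8.0
    return 0 if (dc % delta) < delta / 2 else 1        # blind: needs only delta, not the original
```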

172 citations


Journal ArticleDOI
TL;DR: This paper presents a method for generating HDR content from LDR content based on deep Convolutional Neural Networks (CNNs) termed ExpandNet, which accepts LDR images as input and generates images with an expanded range in an end-to-end fashion.
Abstract: High dynamic range (HDR) imaging provides the capability of handling real world lighting as opposed to the traditional low dynamic range (LDR) which struggles to accurately represent images with higher dynamic range. However, most imaging content is still available only in LDR. This paper presents a method for generating HDR content from LDR content based on deep Convolutional Neural Networks (CNNs) termed ExpandNet. ExpandNet accepts LDR images as input and generates images with an expanded range in an end-to-end fashion. The model attempts to reconstruct missing information that was lost from the original signal due to quantization, clipping, tone mapping or gamma correction. The added information is reconstructed from learned features, as the network is trained in a supervised fashion using a dataset of HDR images. The approach is fully automatic and data driven; it does not require any heuristics or human expertise. ExpandNet uses a multiscale architecture which avoids the use of upsampling layers to improve image quality. The method performs well compared to expansion/inverse tone mapping operators quantitatively on multiple metrics, even for badly exposed inputs.

164 citations


Proceedings Article
Chia-Yu Chen, Jungwook Choi, Daniel Brand, Ankur Agrawal, Wei Zhang, Kailash Gopalakrishnan
29 Apr 2018
TL;DR: This paper introduces a novel technique, the Adaptive Residual Gradient Compression (AdaComp) scheme, which is based on localized selection of gradient residues and automatically tunes the compression rate depending on local activity.
Abstract: Highly distributed training of Deep Neural Networks (DNNs) on future compute platforms (offering hundreds of TeraOps/s of computational capacity) is expected to be severely communication constrained. To overcome this limitation, new gradient compression techniques are needed that are computationally friendly, applicable to a wide variety of layers seen in Deep Neural Networks, and adaptable to variations in network architectures as well as their hyper-parameters. In this paper we introduce a novel technique: the Adaptive Residual Gradient Compression (AdaComp) scheme. AdaComp is based on localized selection of gradient residues and automatically tunes the compression rate depending on local activity. We show excellent results on a wide spectrum of state-of-the-art Deep Learning models in multiple domains (vision, speech, language), datasets (MNIST, CIFAR10, ImageNet, BN50, Shakespeare), optimizers (SGD with momentum, Adam) and network parameters (number of learners, minibatch size, etc.). Exploiting both sparsity and quantization, we demonstrate end-to-end compression rates of ∼200× for fully-connected and recurrent layers, and ∼40× for convolutional layers, without any noticeable degradation in model accuracies.
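A simplified NumPy sketch of the residual-compression idea (the bin size is a free parameter and the selection rule is a reduced form of the paper's self-adjusting scheme): gradients accumulate into a local residue, each bin derives a threshold from its own activity, and only the selected entries are transmitted while the remainder carries over:

```python
import numpy as np

def adacomp_like(grad, residue, bin_size=256):
    g = (grad + residue).ravel()                     # accumulated residue
    gr = grad.ravel()
    sent = np.zeros_like(g)
    for s in range(0, g.size, bin_size):
        e = min(s + bin_size, g.size)
        scale = np.abs(g[s:e]).max()                 # local activity sets the threshold
        pick = np.abs(g[s:e] + gr[s:e]) >= scale     # self-adjusted selection within the bin
        sent[s:e][pick] = g[s:e][pick]               # entries to transmit (then quantizable)
    new_residue = (g - sent).reshape(grad.shape)     # unsent mass carries to the next step
    return sent.reshape(grad.shape), new_residue
```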

116 citations


Proceedings ArticleDOI
18 Jun 2018
TL;DR: This paper applies quantization techniques to FCNs for accurate biomedical image segmentation with a focus on a state-of-the-art segmentation framework, suggestive annotation, which judiciously extracts representative annotation samples from the original training dataset, obtaining an effective small-sized balanced training dataset.
Abstract: With pervasive applications of medical imaging in health-care, biomedical image segmentation plays a central role in quantitative analysis, clinical diagnosis, and medical intervention. Since manual annotation suffers from limited reproducibility, arduous effort, and excessive time, automatic segmentation is desired to process increasingly larger scale histopathological data. Recently, deep neural networks (DNNs), particularly fully convolutional networks (FCNs), have been widely applied to biomedical image segmentation, attaining much improved performance. At the same time, quantization of DNNs has become an active research topic, which aims to represent weights with less memory (precision) to considerably reduce memory and computation requirements of DNNs while maintaining acceptable accuracy. In this paper, we apply quantization techniques to FCNs for accurate biomedical image segmentation. Unlike the existing literature on quantization, which primarily targets memory and computation complexity reduction, we apply quantization as a method to reduce overfitting in FCNs for better accuracy. Specifically, we focus on a state-of-the-art segmentation framework, suggestive annotation [26], which judiciously extracts representative annotation samples from the original training dataset, obtaining an effective small-sized balanced training dataset. We develop two new quantization processes for this framework: (1) suggestive annotation with quantization for highly representative training samples, and (2) network training with quantization for high accuracy. Extensive experiments on the MICCAI Gland dataset show that both quantization processes can improve the segmentation performance, and our proposed method exceeds the current state-of-the-art performance by up to 1%. In addition, our method reduces memory usage by up to 6.4x.

103 citations


Proceedings Article
15 Feb 2018
TL;DR: This work proposes to directly learn the discrete codes in an end-to-end neural network by applying the Gumbel-softmax trick, achieving a 98% compression rate in a sentiment analysis task and 94%~99% in machine translation tasks without performance loss.
Abstract: Natural language processing (NLP) models often require a massive number of parameters for word embeddings, resulting in a large storage or memory footprint. Deploying neural NLP models to mobile devices requires compressing the word embeddings without any significant sacrifices in performance. For this purpose, we propose to construct the embeddings with few basis vectors. For each word, the composition of basis vectors is determined by a hash code. To maximize the compression rate, we adopt the multi-codebook quantization approach instead of binary coding scheme. Each code is composed of multiple discrete numbers, such as (3, 2, 1, 8), where the value of each component is limited to a fixed range. We propose to directly learn the discrete codes in an end-to-end neural network by applying the Gumbel-softmax trick. Experiments show the compression rate achieves 98% in a sentiment analysis task and 94% ~ 99% in machine translation tasks without performance loss. In both tasks, the proposed method can improve the model performance by slightly lowering the compression rate. Compared to other approaches such as character-level segmentation, the proposed method is language-independent and does not require modifications to the network architecture.
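A minimal PyTorch sketch of the composition (the codebook count M, codes per book K, and dimensions below are illustrative): each word's embedding is a sum of one basis vector per codebook, and the discrete per-codebook choice is relaxed with the Gumbel-softmax trick so the codes can be learned end-to-end:

```python
import torch
import torch.nn.functional as F

M, K, D, V = 8, 16, 300, 10000                     # codebooks, codes per book, dim, vocab size
codebooks = torch.nn.Parameter(torch.randn(M, K, D) * 0.1)
logits = torch.nn.Parameter(torch.zeros(V, M, K))  # per-word code logits

def embed(word_ids, tau=1.0):
    # hard=True gives (approximately) one-hot selections in the forward pass
    # while keeping gradients via the straight-through Gumbel-softmax
    sel = F.gumbel_softmax(logits[word_ids], tau=tau, hard=True, dim=-1)  # (B, M, K)
    return torch.einsum('bmk,mkd->bd', sel, codebooks)  # sum of M chosen basis vectors
```

At deployment only the integer codes (M numbers per word, each in [0, K)) and the M×K basis vectors need to be stored, which is where the compression comes from.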

87 citations




Posted Content
TL;DR: This paper proposes MDU-Net, a multi-scale densely connected U-Net for biomedical image segmentation, selects the best-performing variant by experiment, and augments it with quantization, which reduces overfitting in MDU-Net for better accuracy.
Abstract: Biomedical image segmentation plays a central role in quantitative analysis, clinical diagnosis, and medical intervention. Following fully convolutional networks (FCNs) and U-Net, deep convolutional networks have made significant contributions to biomedical image segmentation applications. In this paper, based on U-Net, we propose MDU-Net, a multi-scale densely connected U-Net for biomedical image segmentation. We propose three different multi-scale dense connections for the U-shaped architecture: within the encoder, within the decoder, and across them. The highlight of our architecture is that it directly fuses neighboring feature maps of different scales from both higher and lower layers to strengthen feature propagation in the current layer, which largely improves the information flow in the encoder, the decoder, and across them. Multi-scale dense connections, which introduce shorter paths between layers close to the input and output, also make a much deeper U-Net feasible. We adopt the best-performing model from our experiments and propose a novel Multi-scale Dense U-Net (MDU-Net) architecture with quantization, which reduces overfitting in MDU-Net for better accuracy. We evaluate the proposed model on the MICCAI 2015 Gland Segmentation dataset (GlaS). The three multi-scale dense connections improve U-Net performance by up to 1.8% on test A and 3.5% on test B of the MICCAI Gland dataset. Meanwhile, MDU-Net with quantization outperforms U-Net by up to 3% on test A and 4.1% on test B.

63 citations


Book ChapterDOI
08 Sep 2018
TL;DR: A novel deep convolutional neural network is proposed for double JPEG detection using statistical histogram features from each block with a vectorized quantization table, which handles mixed JPEG quality factors and is suitable for real-world situations.
Abstract: Double JPEG detection is essential for detecting various image manipulations. This paper proposes a novel deep convolutional neural network for double JPEG detection using statistical histogram features from each block with a vectorized quantization table. In contrast to previous methods, the proposed approach handles mixed JPEG quality factors and is suitable for real-world situations. We collected real-world JPEG images from the image forensic service and generated a new double JPEG dataset with 1120 quantization tables to train the network. The proposed approach was verified experimentally to produce a state-of-the-art performance, successfully detecting various image manipulations.
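A SciPy/NumPy sketch of the feature ingredients the abstract names (the bin range, frequency subset, and normalization are assumptions, not the paper's exact configuration): per-block DCT coefficient histograms computed against the quantization table, concatenated with the vectorized table itself:

```python
import numpy as np
from scipy.fftpack import dct

def block_dct_histograms(gray, qtable, bins=np.arange(-10.5, 11.5)):
    """gray: 2-D image array; qtable: (8, 8) JPEG quantization table."""
    h, w = (s - s % 8 for s in gray.shape)
    blocks = gray[:h, :w].reshape(h // 8, 8, w // 8, 8).transpose(0, 2, 1, 3)
    d = dct(dct(blocks.astype(float), axis=2, norm='ortho'), axis=3, norm='ortho')
    feats = []
    for k in range(1, 10):                        # a few low-frequency positions (DC skipped)
        i, j = divmod(k, 3)
        c = np.round(d[:, :, i, j] / qtable[i, j]).ravel()
        hist, _ = np.histogram(c, bins=bins)
        feats.append(hist / c.size)               # normalized histogram per frequency
    return np.concatenate(feats + [qtable.ravel() / 255.0])  # + vectorized q-table
```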

60 citations


Journal ArticleDOI
TL;DR: Experimental results show that the proposed image fusion method produces better results than traditional methods in terms of both visual perception and objective quantitative evaluation.

Journal ArticleDOI
Graham Hudson, Alain Leger, Birger Niss, Istvan Sebestyen, Jørgen Vaaben
TL;DR: The JPEG standard has become one of the most successful standards in information and communication technologies (ICT) history as discussed by the authors and has been used for image compression for many applications, such as web pages, medical imaging, and public records.
Abstract: Digital image capture, processing, storage, transmission, and display are now taken for granted as part of the technology of modern everyday life. Digital image compression is one of the enabling technologies of the present multimedia world. The image compression technique used for applications as diverse as photography, web pages, medical imaging, and public records is JPEG, named after the ISO/CCITT "joint photographic experts group," established in 1986, which developed the technique in the late 1980s and produced the international standard in the early '90s. ITU-T T.81 | ISO/IEC 10918-1, also called "JPEG-1," has become one of the most successful standards in information and communication technologies (ICT) history. The authors of this paper, all members of the original JPEG development team, were intimately involved in image-coding research and JPEG in particular. The paper goes behind the scenes, explaining why and how JPEG came about, and looks under the bonnet of the technique, explaining the different components that give the standard the efficiency, versatility, and robustness that have made it a technique that has stood the test of time and evolved to cover applications beyond its original scope. In addition, the authors give a short outlook on the main milestones in coding schemes for still images since JPEG-1.

Proceedings ArticleDOI
01 Dec 2018
TL;DR: This paper proposes a novel FPGA-based architecture for SSDLiteM2 in combination with hardware optimizations including fused BRB, processing element (PE) sharing and load-balanced channel pruning, and a novel quantization scheme called partial quantization has been developed.
Abstract: Convolutional neural network (CNN)-based object detection has been widely employed in various applications such as autonomous driving and intelligent video surveillance. However, the computational complexity of conventional convolution hinders its application in embedded systems. Recently, a mobile-friendly CNN model SSDLite-MobileNetV2 (SSDLiteM2) has been proposed for object detection. This model consists of a novel layer called bottleneck residual block (BRB). Although SSDLiteM2 contains far fewer parameters and computations than conventional CNN models, its performance on embedded devices still cannot meet the requirements of real-time processing. This paper proposes a novel FPGA-based architecture for SSDLiteM2 in combination with hardware optimizations including fused BRB, processing element (PE) sharing and load-balanced channel pruning. Moreover, a novel quantization scheme called partial quantization has been developed, which partially quantizes SSDLiteM2 to 8 bits with only 1.8% accuracy loss. Experiments show that the proposed design on a Xilinx ZC706 device can achieve up to 65 frames per second with 20.3 mean average precision on the COCO dataset.

Proceedings Article
01 Jan 2018
TL;DR: This article proposes GroupReduce, a novel compression method for neural language models based on vocabulary-partition (block) based low-rank matrix approximation and the inherent frequency distribution of tokens (the power-law distribution of words).
Abstract: Model compression is essential for serving large deep neural nets on devices with limited resources or applications that require real-time responses. For advanced NLP problems, a neural language model usually consists of recurrent layers (e.g., using LSTM cells), an embedding matrix for representing input tokens, and a softmax layer for generating output tokens. For problems with a very large vocabulary size, the embedding and the softmax matrices can account for more than half of the model size. For instance, the bigLSTM model achieves state-of-the-art performance on the One-Billion-Word (OBW) dataset with an around 800k vocabulary; its word embedding and softmax matrices use more than 6 GB of space and are responsible for over 90% of the model parameters. In this paper, we propose GroupReduce, a novel compression method for neural language models, based on vocabulary-partition (block) based low-rank matrix approximation and the inherent frequency distribution of tokens (the power-law distribution of words). We start by grouping words into c blocks based on their frequency, and then refine the clustering iteratively by constructing a weighted low-rank approximation for each block, where the weights are based on the frequencies of the words in the block. The experimental results show our method can significantly outperform traditional compression methods such as low-rank approximation and pruning. On the OBW dataset, our method achieved a 6.6x compression rate for the embedding and softmax matrices, and when combined with quantization, our method can achieve a 26x compression rate without losing prediction accuracy.
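A condensed NumPy sketch of the weighted low-rank step for one block (scaling each word's row by the square root of its frequency before the SVD is a standard closed-form approximation to the weighted objective; the paper's iterative block refinement is omitted):

```python
import numpy as np

def weighted_low_rank(E, freq, rank):
    """E: (n_words, dim) embedding rows of one frequency block; freq: (n_words,) counts."""
    s = np.sqrt(freq.astype(float))[:, None]         # frequency weights
    U, S, Vt = np.linalg.svd(s * E, full_matrices=False)
    A = (U[:, :rank] * S[:rank]) / s                 # per-word factors, weights undone
    B = Vt[:rank]                                    # shared basis for the block
    return A, B                                      # E ≈ A @ B, most accurate for frequent words
```

Frequent blocks can then be given a larger rank and rare blocks a smaller one, which is how the power-law distribution is exploited.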

Journal ArticleDOI
TL;DR: This work proposes and develops a deconvolution architecture for efficient FPGA implementation, and a hardware mapping framework is developed to automatically generate a low-latency hardware design for any given CNN model on the target device.
Abstract: Convolutional Neural Network (CNN)-based algorithms have been successful in solving image recognition problems, showing very large accuracy improvements. In recent years, deconvolution layers have been widely used as key components in state-of-the-art CNNs for end-to-end training and for models supporting tasks such as image segmentation and super resolution. However, deconvolution algorithms are computationally intensive, which limits their applicability to real-time applications. In particular, there has been little research on efficient implementations of deconvolution algorithms on FPGA platforms, which have been widely used to accelerate CNN algorithms by practitioners and researchers due to their high performance and power efficiency. In this work, we propose and develop a deconvolution architecture for efficient FPGA implementation. FPGA-based accelerators are proposed for both deconvolution and CNN algorithms. In addition, memory sharing between the computation modules is proposed for the FPGA-based CNN accelerator, along with other optimization techniques. A non-linear optimization model based on the performance model is introduced to efficiently explore the design space to achieve optimal processing speed and improved power efficiency. Furthermore, a hardware mapping framework is developed to automatically generate the low-latency hardware design for any given CNN model on the target device. Finally, we implement our designs on a Xilinx Zynq ZC706 board; the deconvolution accelerator achieves a performance of 90.1 giga operations per second (GOPS) at a 200 MHz working frequency and a performance density of 0.10 GOPS/DSP using 32-bit quantization, which significantly outperforms previous designs on FPGAs. A real-time application of scene segmentation on the Cityscapes dataset is used to evaluate our CNN accelerator on the Zynq ZC706 board; the system achieves a performance of 107 GOPS and 0.12 GOPS/DSP using 16-bit quantization, and supports up to 17 frames per second for 512 × 512 image inputs with a power consumption of only 9.6 W.

Proceedings ArticleDOI
24 Jun 2018
TL;DR: A Convolutional Neural Network (CNN)-based architecture is proposed for the SAO in-loop filtering operation without modifying anything in the encoding process, and experimental results show that the proposed model outperforms previous state-of-the-art models in terms of BD-PSNR and BD-BR.
Abstract: High Efficiency Video Coding (HEVC), currently the latest video coding standard, achieves up to 50% bit rate reduction compared to the previous H.264/AVC standard. While performing block-based video coding, such lossy compression techniques produce various artifacts like blurring, distortion, ringing, and contouring effects on output frames, especially at low bit rates. To reduce those compression artifacts, HEVC adopted two post-processing filtering techniques on the decoder side, namely the de-blocking filter (DBF) and sample adaptive offset (SAO). While the DBF applies to samples located at block boundaries, the nonlinear SAO operation applies adaptively to samples satisfying gradient-based conditions through a lookup table. The SAO filter corrects quantization errors by sending edge offset values to decoders; this operation consumes extra signaling bits and becomes an overhead to the network. In this paper, we propose a Convolutional Neural Network (CNN)-based architecture for the SAO in-loop filtering operation without modifying anything in the encoding process. Our experimental results show that the proposed model outperforms previous state-of-the-art models in terms of BD-PSNR (0.408 dB) and BD-BR (3.44%), measured on widely available standard video sequences.
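For context, SAO's edge-offset mode classifies each sample by comparing it with its two neighbors along a chosen direction and adds a per-category offset signaled by the encoder, which is exactly the signaling overhead the proposed CNN avoids. A tiny NumPy sketch of the horizontal class (the offset values in the usage line are illustrative; in HEVC they are chosen by the encoder):

```python
import numpy as np

def sao_edge_offset_horizontal(row, offsets):
    """row: 1-D int array of reconstructed samples; offsets: {category: offset}."""
    c, l, r = row[1:-1], row[:-2], row[2:]
    cat = np.zeros_like(c)
    cat[(c < l) & (c < r)] = 1                                    # local valley
    cat[((c < l) & (c == r)) | ((c == l) & (c < r))] = 2          # concave corner
    cat[((c > l) & (c == r)) | ((c == l) & (c > r))] = 3          # convex corner
    cat[(c > l) & (c > r)] = 4                                    # local peak
    out = row.copy()
    for k, off in offsets.items():
        out[1:-1][cat == k] += off                                # apply signaled offsets
    return out

# e.g. sao_edge_offset_horizontal(rec_row, {1: 2, 2: 1, 3: -1, 4: -2})
```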

Journal ArticleDOI
TL;DR: A novel electrocardiogram (ECG) data compression algorithm is reported which employs a DCT-based discrete orthogonal Stockwell transform and exploits the repetition of data instances to achieve higher compression without any relevant information loss.

Journal ArticleDOI
12 Sep 2018-PLOS ONE
TL;DR: A Hybrid Geometric Spatial Image Representation (HGSIR) is explored that is based on the combination of histograms computed over rectangular, triangular, and circular regions of the image, and that outperforms state-of-the-art research in terms of classification accuracy.
Abstract: The recent development in technology has increased the complexity of image contents, and the demand for image classification has become more imperative. Digital images play a vital role in many applied domains such as remote sensing, scene analysis, medical care, the textile industry, and crime investigation. Feature extraction and image representation are considered an important step in scene analysis, as they affect image classification performance. Automatic classification of images is an open research problem for image analysis and pattern recognition applications. The Bag-of-Features (BoF) model is commonly used to solve image classification, object recognition, and other computer vision-based problems. In the BoF model, the final feature vector representation of an image contains no information about the co-occurrence of features in the 2D image space. This is considered a limitation, as the spatial arrangement among visual words in image space contains information that is beneficial for image representation and learning of a classification model. To deal with this, researchers have proposed different image representations. Among these, the division of image space into different geometric sub-regions for the extraction of histograms for the BoF model is considered a notable contribution for the extraction of spatial clues. Keeping this in view, we aim to explore a Hybrid Geometric Spatial Image Representation (HGSIR) based on the combination of histograms computed over rectangular, triangular, and circular regions of the image. Five standard image datasets are used to evaluate the performance of the proposed research. The quantitative analysis demonstrates that the proposed research outperforms the state-of-the-art research in terms of classification accuracy.
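A toy NumPy sketch of the underlying idea (the region geometry below is illustrative, not the paper's exact layout): visual-word histograms are computed over rectangular, triangular, and circular sub-regions and concatenated, so the final vector retains the spatial clues a plain BoF histogram discards:

```python
import numpy as np

def spatial_histograms(word_map, n_words):
    """word_map: (H, W) int array of visual-word indices per pixel/patch."""
    H, W = word_map.shape
    yy, xx = np.mgrid[0:H, 0:W]
    masks = [
        yy < H // 2,                                                   # rectangular: top half
        xx > yy * W / H,                                               # triangular: above the diagonal
        (yy - H / 2) ** 2 + (xx - W / 2) ** 2 < (min(H, W) / 4) ** 2,  # circular: center disk
    ]
    feats = [np.bincount(word_map[m], minlength=n_words) for m in masks]
    return np.concatenate([f / max(f.sum(), 1) for f in feats])       # normalized per region
```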


Journal ArticleDOI
TL;DR: Compared with the current S-UNIWARD steganography, the message extraction error rate of the proposed algorithm after JPEG compression decreases from about 50% to nearly 0; the algorithm not only possesses comparable resistance to JPEG compression but also offers stronger resistance to detection and higher operational efficiency.
Abstract: In order to improve the resistance of current detection-resistant steganography algorithms to JPEG compression, an adaptive steganography algorithm resisting both JPEG compression and detection, based on dither modulation, is proposed. Utilizing the adaptive dither modulation algorithm based on the quantization tables, the embedding domains resisting JPEG compression for spatial images and JPEG images are determined separately. Then the embedding cost function is constructed by an embedding cost calculation algorithm based on side information. Finally, RS coding is combined with STCs to realize minimum-cost message embedding while improving the correct rate of the extracted messages after JPEG compression. The experimental results demonstrate that the algorithm can be applied to both spatial images and JPEG images. Compared with the current S-UNIWARD steganography, the message extraction error rate of the proposed algorithm after JPEG compression decreases from about 50% to nearly 0; compared with current JPEG-compression- and detection-resistant steganography algorithms, the proposed algorithm not only possesses comparable resistance to JPEG compression but also offers stronger resistance to detection and higher operational efficiency.

Proceedings ArticleDOI
20 Apr 2018
TL;DR: In this paper, the authors explore the problem of learning transforms for image compression via autoencoders and show that comparable performance can be obtained with a unique learned transform, where different rate-distortion points are then reached by varying the quantization step size at test time.
Abstract: This paper explores the problem of learning transforms for image compression via autoencoders. Usually, the rate-distortion performances of image compression are tuned by varying the quantization step size. In the case of autoencoders, this in principle would require learning one transform per rate-distortion point at a given quantization step size. Here, we show that comparable performances can be obtained with a unique learned transform. The different rate-distortion points are then reached by varying the quantization step size at test time. This approach saves a lot of training time.
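A minimal PyTorch sketch of the central idea (the encoder and decoder stand in for any learned analysis/synthesis transform pair; names are illustrative): train once, then sweep the quantization step at test time to traverse the rate-distortion curve:

```python
import torch

def compress(encoder, decoder, image, step):
    y = encoder(image)                           # learned analysis transform
    y_hat = torch.round(y / step) * step         # uniform quantization with a variable step
    return decoder(y_hat)                        # learned synthesis transform

# One trained transform, many rate-distortion points:
# for step in (0.25, 0.5, 1.0, 2.0, 4.0):
#     recon = compress(encoder, decoder, image, step)
```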

Journal ArticleDOI
Hongyi Chen, Fan Zhang, Bo Tang, Qiang Yin, Xian Sun 
TL;DR: A new weight-based network pruning and adaptive architecture squeezing method is introduced to reduce network storage and the time of the inference and training processes, while maintaining a balance between compression ratio and classification accuracy.
Abstract: Deep convolutional neural networks (CNNs) have recently been applied to synthetic aperture radar (SAR) automatic target recognition (ATR) and have achieved state-of-the-art results with significantly improved recognition performance. However, the training period of a deep CNN is long, and the size of the network is huge, sometimes reaching hundreds of megabytes. These two factors hinder the practical implementation and deployment of deep CNNs on real-time SAR platforms, which are typically resource-constrained. To address this challenge, this paper presents three strategies of network compression and acceleration to decrease computing and memory resource dependencies while maintaining competitive accuracy. First, we introduce a new weight-based network pruning and adaptive architecture squeezing method to reduce network storage and the time of the inference and training processes, while maintaining a balance between compression ratio and classification accuracy. Then we employ weight quantization and coding to compress the network storage space. Since the amount of calculation is mainly concentrated in the convolutional layers, a fast approach for pruned convolutional layers is proposed to reduce the number of multiplications by exploiting the sparsity in the activation inputs and weights. Experimental results show that the convolutional neural networks for SAR-ATR can be compressed by 40× without loss of accuracy, and the number of multiplications can be reduced by 15×. Combining these strategies, we can easily load the network on resource-constrained platforms, speed up inference to obtain results in real time, or even retrain a more suitable network with new image data for a specific situation.


Journal ArticleDOI
Jie Lei, Xin Gao, Zunlei Feng, Qiu Huamou, Mingli Song
TL;DR: Experimental results show that MSDDN handles defect variations better than traditional models and general-purpose convolutional neural networks.

Proceedings ArticleDOI
19 Mar 2018
TL;DR: The authors describe the making of a real-time object detector for a live video stream processed on an embedded all-programmable device, illustrating how the required processing is tamed and parallelized across both the CPU cores and the programmable logic, and how the most suitable resources and powerful extensions, such as NEON vectorization, are leveraged for the individual processing steps.
Abstract: Neural networks have become established as a generic and powerful means to approach challenging problems such as image classification, object detection or decision making. Their successful employment rests on an enormous demand for compute. The quantization of network parameters and the processed data has proven such an effective measure for reducing the cost of network inference that the feasible scope of applications expands even into the embedded domain. This paper describes the making of a real-time object detector for a live video stream processed on an embedded all-programmable device. The presented case illustrates how the required processing is tamed and parallelized across both the CPU cores and the programmable logic, and how the most suitable resources and powerful extensions, such as NEON vectorization, are leveraged for the individual processing steps. The crafted result is an extended Darknet framework implementing a fully integrated, end-to-end solution from video capture through object annotation to video output, applying neural network inference at different quantization levels and running at 16 frames per second on an embedded Zynq UltraScale+ (XCZU3EG) platform.

Journal ArticleDOI
TL;DR: A memristor-based image compression architecture exploiting a lossy 2-D discrete wavelet transform is proposed, providing significant improvements in energy, area, and performance compared to a 32-level CMOS implementation with comparable CW-SSIM.
Abstract: Memristor-based hardware accelerators are gaining increased attention as a potential candidate to speed up the vector-matrix operations commonly needed in many digital image processing tasks, due to their area, speed, and energy efficiency. In this paper, a memristor-based image compression (MR-IC) architecture that exploits a lossy 2-D discrete wavelet transform is proposed. The architecture is composed of a computational memristor crossbar, an intermediate memory array that stores the row-transformed coefficients, and a final memory that holds the compressed version of the original image. The computational memristor array performs in-memory computation on the initially stored transformation coefficients. Using a quantitative analysis approach, we demonstrate a 10× reduction in the number of operations compared with a conventional application-specific integrated circuit implementation. This translates to five orders of magnitude reduction in area, around 11× improvement in energy efficiency, and a 1.28× speedup in computation time. Image quality metrics, such as peak signal-to-noise ratio (PSNR), structural similarity (SSIM) index, and complex wavelet SSIM (CW-SSIM), are used to quantify the reduction in image quality due to lossy compression. The achieved metrics for conventional versus MR-IC are: PSNR 57.24 versus 33.29 dB, SSIM 0.9994 versus 0.8853, and CW-SSIM 1 versus 0.9983. Simulation results show that the proposed architecture with 32 quantization levels provides significant improvements in energy, area, and performance compared to the 32-level CMOS implementation, with comparable CW-SSIM.
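A software sketch of the lossy pipeline the crossbar implements in hardware, assuming a single-level Haar wavelet as a stand-in for the paper's 2-D DWT: transform along rows, then columns, then uniformly quantize the coefficients to 32 levels (matching the configuration compared in the paper):

```python
import numpy as np

def haar_dwt2_quantized(img, levels=32):
    """img: 2-D array with even dimensions; returns quantized wavelet coefficients."""
    a = img.astype(float)
    lo, hi = (a[0::2, :] + a[1::2, :]) / 2, (a[0::2, :] - a[1::2, :]) / 2  # vertical pass
    a = np.vstack([lo, hi])
    lo, hi = (a[:, 0::2] + a[:, 1::2]) / 2, (a[:, 0::2] - a[:, 1::2]) / 2  # horizontal pass
    c = np.hstack([lo, hi])                            # four subbands arranged in quadrants
    cmin, cmax = c.min(), c.max()
    step = (cmax - cmin) / (levels - 1)
    return np.round((c - cmin) / step) * step + cmin   # uniform 32-level quantization
```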

Journal ArticleDOI
TL;DR: This paper proposes a novel intensity potential field to model the complicated relationships among pixels, based on which an adaptive de-quantization algorithm is proposed to convert low-bit-depth images to high-bit-depth ones.
Abstract: Display devices with a bit depth of 10 or higher have matured, but mainstream media sources are still at a bit depth of eight. To bridge the gap, the most economic solution is to render low-bit-depth sources for high-bit-depth displays, which is essentially a procedure of de-quantization. Traditional methods, such as zero-padding or bit replication, introduce annoying false contour artifacts. To better estimate the least-significant bits, later works use filtering or interpolation approaches, which exploit only limited neighbor information and cannot thoroughly remove the false contours. In this paper, we propose a novel intensity potential (IP) field to model the complicated relationships among pixels. The potential value decreases as the spatial distance to the field source increases, and the potentials from different field sources are additive. Based on the proposed IP field, an adaptive de-quantization procedure is then proposed to convert low-bit-depth images to high-bit-depth ones. To the best of our knowledge, this is the first attempt to apply a potential field to natural images. The proposed potential field preserves local consistency and models complicated contexts well. Extensive experiments on natural, synthetic, and high-dynamic-range image datasets validate the efficiency of the proposed IP field. Significant improvements have been achieved over the state-of-the-art methods on both peak signal-to-noise ratio and structural similarity.
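For reference, the two naive baselines the paper improves on can be written in a few lines of NumPy; both expand 8-bit values to 10 bits, and both produce the false contours that the intensity-potential-field method is designed to remove:

```python
import numpy as np

def zero_padding(img8, extra_bits=2):
    return img8.astype(np.uint16) << extra_bits              # append zero LSBs

def bit_replication(img8, extra_bits=2):
    x = img8.astype(np.uint16)
    return (x << extra_bits) | (x >> (8 - extra_bits))       # recycle MSBs as LSBs
```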

Posted Content
TL;DR: This work explores a network-binarization approach for SR tasks without sacrificing much reconstruction accuracy, and shows that binarized SR networks achieve comparable qualitative and quantitative results to their real-weight counterparts.
Abstract: Deep convolutional neural networks (DCNNs) have recently demonstrated high-quality results in single-image super-resolution (SR). DCNNs often suffer from over-parametrization and large amounts of redundancy, which results in inefficient inference and high memory usage, preventing massive applications on mobile devices. As a way to significantly reduce model size and computation time, binarized neural networks have so far only been shown to excel on semantic-level tasks such as image classification and recognition. Little effort in network quantization has been spent on image enhancement tasks like SR, as network quantization is usually assumed to sacrifice pixel-level accuracy. In this work, we explore a network-binarization approach for SR tasks without sacrificing much reconstruction accuracy. To achieve this, we binarize the convolutional filters in only the residual blocks, and adopt a learnable weight for each binary filter. We evaluate this idea on several state-of-the-art DCNN-based architectures, and show that binarized SR networks achieve comparable qualitative and quantitative results to their real-weight counterparts. Moreover, the proposed binarization strategy can reduce model size by 80% when applied to SRResNet, and could potentially speed up inference by 5 times.
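A PyTorch sketch of the described scheme (the straight-through gradient estimator is a common choice for training through sign() and is assumed here rather than taken from the paper): weights are binarized to their sign with one learnable scale per output filter:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BinaryConv2d(nn.Conv2d):
    """Convolution whose weights are sign(W) scaled by a learnable per-filter alpha."""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.alpha = nn.Parameter(torch.ones(self.out_channels, 1, 1, 1))

    def forward(self, x):
        w = self.weight
        bw = w + (torch.sign(w) - w).detach()   # forward: sign(w); backward: identity (STE)
        return F.conv2d(x, self.alpha * bw, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)
```

Used in place of the ordinary convolutions inside residual blocks, each filter then costs one bit per weight plus a single full-precision scale.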

Journal ArticleDOI
TL;DR: This paper proposes an unsupervised quantization strategy called between-cluster distance-based quantization to preserve the neighborhood structure between the binary fingerprint space and the original feature space and shows that the proposed method achieves effective performance under common modifications.
Abstract: Local feature points have been widely employed in robust image fingerprinting. One of their intrinsic advantages is their invariance under geometric transforms. However, their robustness against certain attacks that modify the positions of points, such as additive noising and blurring, is limited. In addition, local-feature-point-based approaches ignore the distribution of the feature points. In this paper, we harness feature point relationships, including local structures and global relevance, to overcome these limitations. In the relationship mining strategy, Delaunay triangulation is first applied to the feature points to capture their geometric structures. Subsequently, local structures are represented by searching for an independent set in the mapping graph constructed via Delaunay triangulation, whereas the global relevance is represented by the Laplacian of the graph. Finally, the local structures and global relevance are used as input to the quantization process of the image fingerprinting system. In the process of quantization, we propose an unsupervised quantization strategy called between-cluster distance-based quantization to preserve the neighborhood structure between the binary fingerprint space and the original feature space. Experimental results show that the proposed method achieves effective performance under common modifications.
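The relationship-mining stage is straightforward to prototype with SciPy (the sketch below covers only the Delaunay triangulation and the graph Laplacian named in the abstract; the independent-set search over the mapping graph is omitted):

```python
import numpy as np
from scipy.spatial import Delaunay

def point_graph_laplacian(points):
    """points: (N, 2) feature-point coordinates; returns the Laplacian of the Delaunay graph."""
    tri = Delaunay(points)
    A = np.zeros((len(points), len(points)))
    for simplex in tri.simplices:               # each triangle contributes its 3 edges
        for i in range(3):
            a, b = simplex[i], simplex[(i + 1) % 3]
            A[a, b] = A[b, a] = 1.0
    return np.diag(A.sum(axis=1)) - A           # global relevance: graph Laplacian L = D - A
```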

Journal ArticleDOI
08 May 2018-PLOS ONE
TL;DR: It is shown that, if well designed, an image in its highly compressed form can be well classified by a CNN model trained in advance on adequately compressed data, and that this reduces energy consumption by ∼71% compared to a WISN that sends the original uncompressed images.
Abstract: With the introduction of various advanced deep learning algorithms, initiatives for image classification systems have transitioned from traditional machine learning algorithms (e.g., SVMs) to Convolutional Neural Networks (CNNs) using deep learning software tools. A prerequisite for applying CNNs to real-world applications is a system that collects meaningful and useful data. For such purposes, Wireless Image Sensor Networks (WISNs), which are capable of monitoring natural environment phenomena using tiny and low-power cameras on resource-limited embedded devices, can be considered an effective means of data collection. However, with limited battery resources, sending high-resolution raw images to the backend server is a burdensome task that has a direct impact on network lifetime. To address this problem, we propose an energy-efficient pre- and post-processing mechanism using image resizing and color quantization that can significantly reduce the amount of data transferred while maintaining the classification accuracy of the CNN at the backend server. We show that, if well designed, an image in its highly compressed form can be well classified by a CNN model trained in advance on adequately compressed data. Our evaluation using a real image dataset shows that an embedded device can reduce the amount of transmitted data by ∼71% while maintaining a classification accuracy of ∼98%. Under the same conditions, this process naturally reduces energy consumption by ∼71% compared to a WISN that sends the original uncompressed images.
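The pre-processing side is easy to sketch with Pillow (the half-resolution and 32-color settings are illustrative, not the paper's tuned values): downscale the frame and quantize its palette before transmission:

```python
from PIL import Image

def preprocess(path, scale=0.5, n_colors=32):
    img = Image.open(path).convert('RGB')
    small = img.resize((int(img.width * scale), int(img.height * scale)))
    return small.quantize(colors=n_colors)      # palette-based color quantization

# preprocess('frame.jpg').save('frame_small.png')  # transmit the compressed frame
```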