Author

Minho Ha

Bio: Minho Ha is an academic researcher from Pohang University of Science and Technology. The author has contributed to research in the topics of Artificial neural network and Convolutional neural network. The author has an h-index of 4 and has co-authored 8 publications receiving 71 citations.

Papers
Journal ArticleDOI
TL;DR: The proposed design, even with the additional error recovery module, is more accurate, requires less hardware, and consumes less power than previously proposed 4–2 compressor-based approximate multiplier designs.
Abstract: Approximate multiplication is a common operation used in approximate computing methods for high performance and low power computing. Power-efficient circuits for approximate multiplication can be realized with an approximate 4–2 compressor. This letter presents a novel design that uses a modification of a previous approximate 4–2 compressor design and adds an error recovery module. The proposed design, even with the additional error recovery module, is more accurate, requires less hardware, and consumes less power than previously proposed 4–2 compressor-based approximate multiplier designs.
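
For readers unfamiliar with the building block: a 4-2 compressor takes four partial-product bits plus a carry-in and encodes their sum on one weight-1 and two weight-2 outputs, and approximate versions trade exactness in this encoding for cheaper logic. The sketch below is a minimal Python behavioral model with a deliberately crude toy approximation and an exhaustive error check; it is not the letter's proposed compressor or its error recovery module.

```python
from itertools import product

def exact_4_2(x1, x2, x3, x4, cin):
    """Exact 4-2 compressor: x1 + x2 + x3 + x4 + cin == s + 2*(carry + cout)."""
    total = x1 + x2 + x3 + x4 + cin
    s = total & 1                    # weight-1 output
    high = total >> 1                # 0, 1, or 2 units of weight 2
    cout = 1 if high >= 1 else 0
    carry = 1 if high == 2 else 0
    return s, carry, cout

def toy_approx_4_2(x1, x2, x3, x4, cin):
    """A deliberately crude approximation used only for illustration:
    cin is ignored, x4 is dropped from the weight-1 output, and the
    weight-2 outputs come from cheap functions of the inputs."""
    s = x1 ^ x2 ^ x3
    cout = (x1 & x2) | (x2 & x3) | (x1 & x3)   # majority of x1, x2, x3
    carry = x4
    return s, carry, cout

# Exhaustive accuracy check over all 2^5 input patterns.
errors = []
for x1, x2, x3, x4, cin in product((0, 1), repeat=5):
    exact_val = x1 + x2 + x3 + x4 + cin
    s, carry, cout = toy_approx_4_2(x1, x2, x3, x4, cin)
    errors.append(abs(s + 2 * (carry + cout) - exact_val))

print("error rate:", sum(e > 0 for e in errors) / len(errors))
print("mean error distance:", sum(errors) / len(errors))
```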

130 citations

Journal ArticleDOI
TL;DR: This brief presents a hardware-efficient logarithm circuit design based on a novel discontinuous piecewise linear approximation method; synthesis results targeted at a commercial application-specific integrated circuit cell library and a field-programmable gate array show the practicality of the proposed design.
Abstract: This brief presents a hardware-efficient logarithm circuit design based on a novel discontinuous piecewise linear approximation method. Hardware synthesis results targeted at a commercial application-specific integrated circuit cell library and a field-programmable gate array show the practicality of the proposed design. A new figure of merit that combines error, area, time, and power is introduced and used to show that the proposed method provides the designer with useful design options when implementing logarithmic conversion.
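
The abstract does not spell out the approximation itself, but the flavor of hardware-friendly piecewise-linear logarithm conversion can be shown with the classical single-segment Mitchell approximation, which the sketch below implements and evaluates exhaustively; the paper's discontinuous multi-segment scheme and its error/area/time/power figure of merit are not reproduced here.

```python
import math

def mitchell_log2(x, width=16):
    """Classical Mitchell piecewise-linear approximation for positive
    integers: with x = 2**k * (1 + f), approximate log2(x) by k + f.
    Hardware needs only a leading-one detector and a shifter."""
    assert 0 < x < (1 << width)
    k = x.bit_length() - 1            # leading-one position
    f = (x - (1 << k)) / (1 << k)     # fractional part in [0, 1)
    return k + f

# Exhaustive error check over all 16-bit inputs; the single-segment
# scheme has a known worst-case error of roughly 0.086, which
# multi-segment piecewise-linear corrections reduce further.
worst = max(abs(mitchell_log2(x) - math.log2(x)) for x in range(1, 1 << 16))
print(f"max |log2 error| over 16-bit inputs: {worst:.4f}")
```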

9 citations

Journal ArticleDOI
TL;DR: The proposed selective DCNN scores up to 2.18× higher than the state-of-the-art DCNN model when evaluated using NetScore, a comprehensive metric that considers both CNN performance and hardware cost.
Abstract: Neural networks trained using images with a certain type of distortion should be better at classifying test images with the same type of distortion than generally trained neural networks, all other factors being equal. Based on this observation, an ensemble of convolutional neural networks (CNNs) trained with different types and degrees of distortions is used. However, instead of simply classifying test images of unknown distortion types with the entire ensemble of CNNs, an extra tiny CNN is specifically trained to distinguish between the different types and degrees of distortions. Then, only the dedicated CNN for that specific type and degree of distortion, as determined by the tiny CNN, is activated and used to classify a possibly distorted test image. This proposed architecture, referred to as a selective deep convolutional neural network (DCNN), is implemented and found to result in high accuracy with low hardware costs. Detailed simulations with realistic image distortion scenarios using three popular datasets show that memory, MAC operations, and energy savings of up to 93.68%, 93.61%, and 91.92%, respectively, can be achieved with almost no reduction in image classification accuracy. The proposed selective DCNN scores up to 2.18× higher than the state-of-the-art DCNN model when evaluated using NetScore, a comprehensive metric that considers both CNN performance and hardware cost. In addition, it is shown that even greater hardware cost reductions can be achieved when the selective DCNN is combined with previously proposed model compression techniques. Finally, experiments conducted with extended types and degrees of image distortion show that the selective DCNN is highly scalable.
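
A minimal sketch of the selection step, assuming hypothetical model objects: the tiny CNN predicts a distortion class and only the matching expert is evaluated, so the per-image cost is one tiny network plus one expert rather than the whole ensemble. The names and interfaces below are placeholders, not the authors' implementation.

```python
# Selective-DCNN inference sketch (placeholder models, not the authors' code).
# `tiny_classifier` maps an image to a distortion class; `experts` maps each
# class to a CNN trained on that type and degree of distortion.

def selective_inference(image, tiny_classifier, experts):
    """Run only the expert CNN that matches the detected distortion."""
    distortion_class = tiny_classifier(image)   # cheap extra network
    expert = experts[distortion_class]          # dedicated, distortion-trained CNN
    return expert(image)

# Toy stand-ins so the sketch runs end to end.
if __name__ == "__main__":
    experts = {
        "clean": lambda img: "label from clean-trained expert",
        "blur":  lambda img: "label from blur-trained expert",
        "noise": lambda img: "label from noise-trained expert",
    }
    tiny = lambda img: "blur"                   # pretend the tiny CNN detected blur
    print(selective_inference("fake-image", tiny, experts))
```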

9 citations

Proceedings ArticleDOI
25 Mar 2019
TL;DR: A novel channel scaling scheme for convolutional neural networks (CNNs) is presented that can improve the recognition accuracy for practically distorted images without increasing the network complexity.
Abstract: In this paper, we present a novel channel scaling scheme for convolutional neural networks (CNNs), which can improve the recognition accuracy for practically distorted images without increasing the network complexity. During the training phase, the proposed work first prepares multiple filters under the same CNN architecture by taking into account different noise models and strengths. We then introduce an FFT-based noise classifier, which determines the noise property of the received input image by calculating a partial sum of the frequency-domain values. Based on the detected noise class, we dynamically change the filters of each CNN layer to provide dedicated recognition. Furthermore, we propose a channel scaling technique to reduce the number of active filter parameters if the input data is relatively clean. Experimental results show that the proposed dynamic channel scaling reduces the computational complexity as well as the energy consumption while still providing acceptable accuracy for intelligent edge devices.
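
Roughly, the FFT-based classifier can be illustrated as follows: take the 2-D FFT of the input, sum the magnitudes in an outer (high-frequency) band, and threshold that partial sum to pick a noise class and hence a filter set. The band width and thresholds below are arbitrary placeholders rather than the values used in the paper.

```python
import numpy as np

def high_freq_ratio(image, band=0.25):
    """Partial sum of frequency-domain magnitudes in the outer `band`
    fraction of the (shifted) spectrum, normalized by the total sum;
    a rough proxy for noise strength."""
    spec = np.fft.fftshift(np.abs(np.fft.fft2(image)))
    h, w = spec.shape
    keep = np.ones((h, w), dtype=bool)
    mh, mw = int(h * band), int(w * band)
    keep[mh:h - mh, mw:w - mw] = False        # keep only the outer ring
    return spec[keep].sum() / spec.sum()

def select_filter_set(image, thresholds=(0.15, 0.40)):
    """Map the partial sum to a noise class; thresholds are placeholders."""
    r = high_freq_ratio(image)
    if r < thresholds[0]:
        return "clean"          # scale channels down: fewer active filters
    if r < thresholds[1]:
        return "weak_noise"
    return "strong_noise"       # activate the full noise-trained filter set

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.arange(64)
    smooth = np.outer(np.cos(2 * np.pi * 2 * x / 64),
                      np.cos(2 * np.pi * 2 * x / 64))      # low-frequency image
    noisy = smooth + rng.normal(0.0, 0.3, smooth.shape)    # plus Gaussian noise
    print(select_filter_set(smooth), select_filter_set(noisy))
```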

8 citations

Journal ArticleDOI
TL;DR: Layerwise buffer voltage scaling is proposed as an effective technique for reducing buffer access energy and overall system energy without sacrificing image classification accuracy in a convolutional neural network.
Abstract: In order to effectively reduce buffer energy consumption, which constitutes a significant part of the total energy consumption in a convolutional neural network (CNN), it is useful to apply different amounts of energy conservation effort to the different layers of a CNN, as the ratio of buffer energy to total energy can differ quite substantially across the layers of a CNN. This article proposes layerwise buffer voltage scaling as an effective technique for reducing buffer access energy. Error-resilience analysis, including interlayer effects, conducted at design time is used to determine the specific buffer supply voltage to be used for each layer of a CNN. Then these layer-specific buffer supply voltages are used in the CNN for image classification inference. Error injection experiments with three different types of CNN architectures show that, with this technique, the buffer access energy and overall system energy can be reduced by up to 68.41% and 33.68%, respectively, without sacrificing image classification accuracy.
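
A rough sketch of the error-injection idea, under stated assumptions: quantized activations written to each layer's buffer are corrupted by random bit flips at a layer-specific rate standing in for the bit-error rate at that layer's scaled buffer voltage, and inference then proceeds with the corrupted values. The toy network and error rates below are placeholders, not the paper's measured voltage/error characteristics.

```python
import numpy as np

def flip_bits(activations, bit_error_rate, rng, word_bits=8):
    """Inject random bit flips into an 8-bit quantized activation buffer.
    `bit_error_rate` stands in for the failure probability at the
    layer's (scaled) buffer supply voltage."""
    q = np.clip(np.round(activations * 255), 0, 255).astype(np.uint8)
    flips = rng.random((*q.shape, word_bits)) < bit_error_rate
    masks = (flips * (1 << np.arange(word_bits))).sum(axis=-1).astype(np.uint8)
    return (q ^ masks).astype(np.float32) / 255.0

# Layer-specific error rates (placeholders): more aggressive voltage
# scaling, and hence a higher error rate, on more error-resilient layers.
layer_bit_error_rates = [1e-4, 1e-3, 1e-2]

rng = np.random.default_rng(0)
x = rng.random((1, 16)).astype(np.float32)              # fake input activations
for ber in layer_bit_error_rates:
    w = rng.standard_normal((x.shape[1], 16)).astype(np.float32) * 0.3
    x = np.clip(np.maximum(x @ w, 0.0), 0.0, 1.0)       # toy layer + ReLU + clamp
    x = flip_bits(x, ber, rng)                          # corrupted buffer readout
print("mean activation after layerwise error injection:", float(x.mean()))
```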

7 citations


Cited by
Journal ArticleDOI
TL;DR: This study highlights that there is no unique winning approximate compressor topology, since the best solution depends on the required precision, the signedness of the multiplier, and the considered error metric.
Abstract: Approximate multipliers attract large interest in the scientific literature, which proposes several circuits built with approximate 4-2 compressors. Due to the large number of proposed solutions, the designer who wishes to use an approximate 4-2 compressor is faced with the problem of selecting the right topology. In this paper, we present a comprehensive survey and comparison of approximate 4-2 compressors previously proposed in the literature. We also present a novel approximate compressor, so that a total of twelve different approximate 4-2 compressors are analyzed. The investigated circuits are employed to design 8×8 and 16×16 multipliers, implemented in 28 nm CMOS technology. For each operand size we analyze two multiplier configurations, with different levels of approximation, both signed and unsigned. Our study highlights that there is no unique winning approximate compressor topology, since the best solution depends on the required precision, the signedness of the multiplier, and the considered error metric.
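
The survey's central point, that the ranking depends on the error metric, is the kind of thing a short exhaustive sweep makes concrete. The sketch below computes several standard metrics (ER, MED, MRED, NMED) for two toy 8×8 approximate multipliers over all input pairs; the toys are placeholders, not the twelve compressor-based designs compared in the paper.

```python
import numpy as np

# Exhaustive sweep of all 8-bit x 8-bit operand pairs.
a, b = np.meshgrid(np.arange(256), np.arange(256), indexing="ij")
exact = a * b

def metrics(approx):
    """Standard approximate-arithmetic error metrics."""
    err = np.abs(approx - exact)
    nz = exact > 0
    return {
        "ER":   float(np.mean(err > 0)),              # error rate
        "MED":  float(err.mean()),                    # mean error distance
        "MRED": float((err[nz] / exact[nz]).mean()),  # mean relative error distance
        "NMED": float(err.mean() / exact.max()),      # normalized MED
    }

# Two toy designs (placeholders, not the surveyed compressors).
designs = {
    "clear 4 LSBs of each operand": (a & 0xF0) * (b & 0xF0),
    "force 4 LSBs of each operand": (a | 0x0F) * (b | 0x0F),
}
for name, approx in designs.items():
    print(name, metrics(approx))
```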

126 citations

Journal ArticleDOI
TL;DR: This study proposes ultra-efficient imprecise 4:2 compressor and multiplier circuits as the building blocks of approximate computing systems and indicates that the proposed inexact multiplier provides a significant compromise between accuracy and design efficiency for approximate computing.
Abstract: Approximate computing is an emerging approach for reducing the energy consumption and design complexity in many applications where accuracy is not a crucial necessity. In this study, ultra-efficient imprecise 4:2 compressor and multiplier circuits are proposed as the building blocks of approximate computing systems. The proposed compressor uses only one majority gate, unlike conventional designs based on AND-OR and XOR logic. Furthermore, the majority gate is the fundamental logic block in many emerging majority-friendly nanotechnologies such as quantum-dot cellular automata (QCA) and the single-electron transistor (SET). The proposed circuits are designed using FinFET as a current industrial technology and are simulated with HSPICE at the 7 nm technology node. The results indicate that our imprecise compressor is superior to its previous counterparts in terms of delay, power consumption, power-delay product (PDP), and area, and improves these parameters on average by 32%, 68%, 78%, and 66%, respectively. In addition, the proposed efficient approximate multiplier is utilized in image multiplication as an important image processing application. The HSPICE and MATLAB simulations indicate that the proposed inexact multiplier provides a significant compromise between accuracy and design efficiency for approximate computing.
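
Some background that motivates the majority-gate formulation: the carry output of a full adder is exactly the three-input majority function, so compressor trees map naturally onto majority-friendly technologies such as QCA and SET. The sketch below shows a majority gate and an exact 4:2 compressor built from two cascaded full adders; the paper's single-majority-gate approximate compressor itself is not reproduced.

```python
from itertools import product

def maj(a, b, c):
    """Three-input majority gate: 1 when at least two inputs are 1."""
    return (a & b) | (b & c) | (a & c)

def full_adder(a, b, cin):
    """Sum is a three-input XOR; carry is exactly the majority function."""
    return a ^ b ^ cin, maj(a, b, cin)

def compressor_4_2(x1, x2, x3, x4, cin):
    """Exact 4:2 compressor as two cascaded full adders:
    x1 + x2 + x3 + x4 + cin == s + 2*(carry + cout)."""
    s1, cout = full_adder(x1, x2, x3)
    s, carry = full_adder(s1, x4, cin)
    return s, carry, cout

# Exhaustive check of the arithmetic identity over all 32 input patterns.
for bits in product((0, 1), repeat=5):
    s, carry, cout = compressor_4_2(*bits)
    assert sum(bits) == s + 2 * (carry + cout)
print("4:2 compressor identity verified for all 32 input patterns")
```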

118 citations

Journal ArticleDOI
TL;DR: The proposed approximate multiplier has an almost Gaussian error distribution with a near-zero mean value and is exploited in JPEG encoding, sharpening, and classification applications, indicating that the quality degradation of the output is negligible.
Abstract: A scalable approximate multiplier, called the truncation- and rounding-based scalable approximate multiplier (TOSAM), is presented, which reduces the number of partial products by truncating each of the input operands based on its leading one-bit position. In the proposed design, multiplication is performed by shift, add, and small fixed-width multiplication operations, resulting in large improvements in energy consumption and area occupation compared to those of an exact multiplier. To improve the total accuracy, the input operands of the multiplication part are rounded to the nearest odd number. Because the input operands are truncated based on their leading one-bit positions, the accuracy becomes weakly dependent on the width of the input operands and the multiplier becomes scalable. Higher improvements in design parameters (e.g., area and energy consumption) can be achieved as the input operand widths increase. To evaluate the efficiency of the proposed approximate multiplier, its design parameters are compared with those of an exact multiplier and some other recently proposed approximate multipliers. Results reveal that the proposed approximate multiplier, with a mean absolute relative error in the range of 11% to 0.3%, improves delay, area, and energy consumption by up to 41%, 90%, and 98%, respectively, compared to those of the exact multiplier. It also outperforms other approximate multipliers in terms of speed, area, and energy consumption. The proposed approximate multiplier has an almost Gaussian error distribution with a near-zero mean value. We exploit it in JPEG encoding, image sharpening, and classification applications. The results indicate that the quality degradation of the output is negligible. In addition, we propose an accuracy-configurable TOSAM in which the energy consumption of the multiplication operation can be adjusted based on the minimum required accuracy.
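
The truncation-and-rounding idea can be sketched as below: each operand is rewritten as a short mantissa (the leading one plus a few following bits, rounded to the nearest odd value) times a power of two, the short mantissas are multiplied, and the result is shifted back. This is a simplified illustration of the principle described in the abstract, not the published TOSAM architecture, and the bit widths are arbitrary.

```python
import random

def truncate_round_odd(x, keep=4):
    """Represent x as mantissa * 2**shift, where the mantissa keeps the
    leading one plus `keep` bits, rounded to the nearest odd value."""
    k = x.bit_length() - 1                     # leading-one position
    shift = max(k - keep, 0)
    if shift == 0:
        return x, 0                            # operand is already short enough
    mant = (x + (1 << (shift - 1))) >> shift   # round to nearest...
    return mant | 1, shift                     # ...then to nearest odd

def approx_mul(a, b, keep=4):
    """Truncation-and-rounding approximate multiply (simplified sketch)."""
    if a == 0 or b == 0:
        return 0
    ma, sa = truncate_round_odd(a, keep)
    mb, sb = truncate_round_odd(b, keep)
    return (ma * mb) << (sa + sb)              # small fixed-width multiply, then scale

# Mean absolute relative error over random 16-bit operands.
random.seed(0)
pairs = [(random.randint(1, 65535), random.randint(1, 65535)) for _ in range(10000)]
mare = sum(abs(approx_mul(a, b) - a * b) / (a * b) for a, b in pairs) / len(pairs)
print(f"mean absolute relative error: {mare:.3%}")
```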

99 citations

Journal ArticleDOI
TL;DR: This paper proposes efficient imprecise 4:2 and 5:2 compressors, obtained by modifying the truth table of the exact compressors to achieve simpler logic functions with fewer output errors, and shows that they provide a significant compromise between hardware efficiency and accuracy for approximate computing.
Abstract: Approximate computing is a new paradigm for designing energy-efficient integrated circuits at the nanoscale. In this paper, we propose efficient imprecise 4:2 and 5:2 compressors by modifying the truth table of the exact compressors to achieve simpler logic functions with fewer output errors. The proposed approach leads to imprecise compressors with significantly fewer transistors and higher performance in comparison with their previous counterparts. Moreover, efficient approximate multipliers are designed based on the proposed imprecise compressors. The circuits are designed using FinFET as one of the leading industrial technologies and are simulated with HSPICE at the 7 nm technology node. Furthermore, the approximate multipliers are used in image processing applications, including image multiplication, sharpening, and smoothing, and two important quality metrics, the peak signal-to-noise ratio (PSNR) and the mean structural similarity index (MSSIM), are calculated using MATLAB. The results indicate significant improvements regarding performance, energy efficiency, and the number of transistors compared to the other existing exact and approximate designs. The proposed 4:2 and 5:2 compressors improve the power-delay product (PDP) on average by 59% and 68%, and area by 60% and 75%, respectively, in comparison with the previous designs. In addition, the proposed multipliers provide a significant compromise between hardware efficiency and accuracy for approximate computing. The proposed approximate multiplier using both imprecise 4:2 and 5:2 compressors improves the figure of merit, considering both image quality (based on PSNR and MSSIM) and energy efficiency, by 2.35 times compared to its previous counterparts.

71 citations

Journal ArticleDOI
TL;DR: The design and analysis of two approximate compressors with reduced area, delay, and power and comparable accuracy relative to existing architectures are explored, and the proposed designs are applied to error-resilient applications such as image smoothing and multiplication.
Abstract: High-speed multimedia applications have paved the way for a whole new area of high-speed error-tolerant circuits based on approximate computing. These applications deliver high performance at the cost of a reduction in accuracy. Furthermore, such implementations reduce the complexity of the system architecture, delay, and power consumption. This paper explores and proposes the design and analysis of two approximate compressors with reduced area, delay, and power and comparable accuracy when compared with existing architectures. The proposed designs are implemented using 45 nm CMOS technology, and their efficiency has been extensively verified in terms of area, delay, power, power-delay product (PDP), error rate (ER), error distance (ED), and accurate output count (AOC). The proposed approximate 4:2 compressor shows a 56.80% reduction in area, a 57.20% reduction in power, and a 73.30% reduction in delay compared to an accurate 4:2 compressor. The proposed compressors are utilised to implement 8×8 and 16×16 Dadda multipliers. These multipliers have comparable accuracy when compared with state-of-the-art approximate multipliers. The analysis is further extended to demonstrate the application of the proposed designs in error-resilient applications such as image smoothing and multiplication.
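
Image multiplication scored by PSNR is a recurring benchmark in these compressor papers; the snippet below shows that flow with a simple truncation-based approximate multiply standing in for the proposed compressor-based multipliers, and with random arrays standing in for test images.

```python
import numpy as np

def approx_mul_trunc(a, b, drop=3):
    """Stand-in approximate 8-bit multiply: clear `drop` LSBs of each
    operand before an exact product (not the paper's compressor design)."""
    mask = 0xFF & ~((1 << drop) - 1)
    return (a & mask).astype(np.uint32) * (b & mask).astype(np.uint32)

def psnr(reference, test, peak):
    """Peak signal-to-noise ratio in dB."""
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(0)
img1 = rng.integers(0, 256, (128, 128), dtype=np.uint8)   # placeholder "images"
img2 = rng.integers(0, 256, (128, 128), dtype=np.uint8)

exact  = img1.astype(np.uint32) * img2.astype(np.uint32)
approx = approx_mul_trunc(img1, img2)
print(f"PSNR of approximate image product: {psnr(exact, approx, peak=255 * 255):.2f} dB")
```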

57 citations