
A 90 nm CMOS, 6 μW Power-Proportional Acoustic Sensing Frontend for Voice Activity Detection

TLDR
In this article, the authors presented a new power-proportional sensing paradigm and the use of machine-learning-assisted moderate-precision analog analytics for classification of speech and non-speech.
Abstract
This work presents a sub-6 μW acoustic frontend for speech/non-speech classification in a voice activity detection (VAD) system, implemented in 90 nm CMOS. The power consumption of the VAD system is minimized in two ways: the architecture is built around a new power-proportional sensing paradigm, and classification uses machine-learning-assisted, moderate-precision analog analytics. Power-proportional sensing enables hierarchical, context-aware scaling of the frontend's power consumption according to the complexity of the ongoing information extraction. Analog analytics increase power efficiency by switching the computation of individual features on or off, depending on each feature's usefulness in a particular context. The proposed VAD system reduces power consumption by 10× compared with state-of-the-art (SotA) systems. It still achieves an 89% average hit rate (HR) at a 12 dB signal-to-acoustic-noise ratio (SANR) in a babble context, which is on par with software-based VAD systems.


Citation: Komail Badami, Steven Lauwereins, Wannes Meert, Marian Verhelst, A 90 nm CMOS, 6 μW Power-Proportional Acoustic Sensing Frontend for Voice Activity Detection, IEEE Journal of Solid State Circuits, Vol. 51, Issue 1
Archived version: Author manuscript: the content is identical to the content of the published paper, but without the final typesetting by the publisher
Published version: http://ieeexplore.ieee.org/document/7315025/?arnumber=7315025
Journal homepage: http://sscs.ieee.org/en/publications/ieee-journal-of-solid-state-circuits-jssc
Author contact: komail.badami@esat.kuleuven.be
IR: https://lirias.kuleuven.be/handle/123456789/514022

Komail Badami, Steven Lauwereins, Wannes Meert, Marian Verhelst, KU Leuven, Belgium

I. INTRODUCTION
Technological innovations are changing the way we interact with electronic devices. Interactions such as voice control and gesture recognition are rapidly gaining popularity. Such natural interactive systems need not only many integrated sensors but also always-awake, reactive sensor frontends. These frontends generate large amounts of raw signals, which state-of-the-art (SotA) frontends immediately digitize for processing on a DSP. This very robust approach is not power efficient, because not all raw sensor signals are equally relevant. The information rate of a sensed signal is often significantly lower than its Nyquist rate [1-7]. Existing approaches such as Information-Rate processing [1,2], Analog-to-Information conversion [3-5] and Compressed Sensing [6,7] achieve power savings by extracting or compressing the information in signals before digitizing them. However, these schemes operate statically: their compression or extraction parameters are fixed beforehand. Yet the information content of raw signals, and its relevance to the application, varies dynamically with the operating context. Operating such systems efficiently therefore requires dynamic adaptation to the context or to the signal's information content. Existing systems do not perform such fine-grained adaptation, which severely limits their power savings, as shown by the solid line in Fig. 1.
We propose a self-scalable, Power-Proportional sensing paradigm that gracefully scales the system's power consumption with the amount and complexity of the extracted information. In other words, the system's power consumption increases only as the information-extraction task becomes more complex. To this end, this paper proposes key enablers for Power-Proportionality and applies them to a proof-of-concept acoustic frontend for voice activity detection (VAD).
VAD systems distinguish speech from non-speech in different background-noise contexts and at varying signal-to-acoustic-noise ratios (SANR). SotA VAD systems [8-10] extract complex features, such as Mel-frequency cepstral coefficients and DCT-based features, to differentiate speech from non-speech. The high computational complexity of such features results in large power consumption, typically about 50 to 100 μW [8-11]. This comes on top of the power consumed by the required active microphone. Such a large, continuous power draw is unacceptable for battery-powered, always-on sensor frontends. This work combines three elements: our new Power-Proportional sensing paradigm, moderate-precision and computationally inexpensive analog feature extraction, and an embedded mixed-signal classifier. Together, they reduce power consumption by more than 10× compared with SotA without compromising classification accuracy.
The paper is organized as follows:
- Section II discusses the design principles for Power-Proportional sensing and explains why analog, rather than the commonly used digital, feature extraction is adopted.
- Section III describes the architecture and specifications of the VAD system.
- Section IV discusses the detailed implementation.
- Section V presents measurement results for the chip and for the full VAD system.
II. KEY PRINCIPLES FOR POWER-EFFICIENT SENSING
This section details the two key principles that allow our always-on sensing system to scale its power consumption with the extracted information, saving 10× power compared with SotA VAD systems.
A. Power-Proportional Sensing
The core premise of Power-Proportional sensing is that the power consumption of the sensing system scales proportionally with the complexity of the sensing task. A sensing process aimed at information extraction can increase in complexity along two dimensions:
First, the amount of information extracted from the incoming signal can increase. Consider, for example, speaker identification versus speech detection. The former task requires the latter as a prerequisite first step, which justifies its higher power consumption. Hierarchical operation for tasks of increasing complexity therefore lets power consumption scale with the complexity of the information extraction. In such an architecture, each processing stage extracts more complex information than the previous stage while consuming more power. Information is thus extracted only when it is needed, as shown on the horizontal axis of Fig. 1.
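A short calculation can estimate the benefit of hierarchical operation. A more expensive stage is powered only when all earlier stages have triggered, so its power is weighted by the probability of reaching it. The sketch below is minimal. The first stage, all power figures and all trigger rates are hypothetical placeholders, not values from this work.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    power_uw: float   # power while the stage is active (hypothetical)
    pass_rate: float  # fraction of time the stage wakes the next one (hypothetical)


def cascade_power_uw(stages):
    """Average power of a wake-up cascade: stage k is active only if stages 0..k-1 all triggered."""
    total = 0.0
    p_active = 1.0  # the first stage is always on
    for stage in stages:
        total += p_active * stage.power_uw
        p_active *= stage.pass_rate
    return total


def full_mode_power_uw(stages):
    """Baseline that keeps every stage on all the time, as in a non-hierarchical frontend."""
    return sum(stage.power_uw for stage in stages)


if __name__ == "__main__":
    cascade = [
        Stage("acoustic activity detection", 1.0, 0.3),
        Stage("speech detection", 5.0, 0.2),
        Stage("speaker identification", 100.0, 1.0),  # last stage: pass_rate is unused
    ]
    print(f"hierarchical: {cascade_power_uw(cascade):.1f} uW")
    print(f"full mode:    {full_mode_power_uw(cascade):.1f} uW")
```

With these placeholder numbers, the cascade averages 8.5 μW, compared with 106 μW for full-mode operation. The saving depends entirely on how rarely the expensive stages are reached.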
Second, even when the amount of extracted information remains the same, separating the useful information from the background noise (the context) can be more or less difficult. Consider, for instance, speech detection in a quiet office compared with a noisy street. The amount of information needed is the same in both cases. In the street, however, the background noise overlaps the spectrum of the desired signal and creates in-band interference. Distinguishing speech from non-speech therefore becomes more complex, which justifies an increase in power consumption. Context-awareness enables Power-Proportional sensing to scale power as the background-noise context changes the complexity of information extraction, as shown in bold in Fig. 1. In the example above, context-awareness allows a much smaller discriminating feature subset in a low-noise environment and a relatively larger subset in noisy background contexts, thereby scaling power.
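Context-aware operation can be pictured as a lookup from the detected context to the feature subset that is switched on. Frontend power is then the sum of the active feature extractors. The sketch below is minimal. It assumes per-feature power costs and a context-to-subset table that, in practice, would come from offline feature selection. Every feature name and number in it is hypothetical.

```python
# Hypothetical power cost (uW) of each analog feature extractor.
FEATURE_POWER_UW = {
    "band_energy_low": 0.3,
    "band_energy_mid": 0.3,
    "band_energy_high": 0.3,
    "modulation_4hz": 0.5,
    "spectral_flux": 0.6,
    "zero_crossing_rate": 0.2,
}

# Hypothetical context -> enabled-feature table (e.g. produced offline by feature selection).
CONTEXT_FEATURES = {
    "quiet_office": ["band_energy_mid", "zero_crossing_rate"],
    "street": ["band_energy_low", "band_energy_mid", "band_energy_high", "spectral_flux"],
    "babble": list(FEATURE_POWER_UW),  # hardest context: every feature enabled
}


def enabled_features(context):
    """Feature extractors kept powered in the given context; all others are switched off."""
    if context not in CONTEXT_FEATURES:
        raise KeyError(f"unknown context: {context!r}")
    return CONTEXT_FEATURES[context]


def frontend_power_uw(context):
    """Feature-extraction power for a context: only enabled extractors consume power."""
    return sum(FEATURE_POWER_UW[name] for name in enabled_features(context))


if __name__ == "__main__":
    for ctx in CONTEXT_FEATURES:
        print(f"{ctx:>12}: {len(enabled_features(ctx))} features, {frontend_power_uw(ctx):.1f} uW")
```

Keeping the table outside the extraction logic mirrors the idea that the subset can be chosen per context without changing the hardware blocks themselves.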
SotA sensing systems do not exploit the power-scaling opportunities offered by these scenarios and typically operate constantly in full processing mode. Their on-state power consumption therefore stays at a constant plateau, independent of system utility, as shown in Fig. 1.
B. Power Efficiency through Analog Analytics
The Power-Proportional sensing paradigm described above requires hardware blocks whose power scales with the required complexity and precision. Power scales with precision very differently in analog and digital implementations. In a thermal-noise-limited analog system at low-to-medium precision, power consumption scales gradually with precision, whereas digital exhibits a logarithmic power-versus-precision profile. As shown in [12] and in Fig. 2 for a 0.25 μm CMOS technology, analog computation is more power-efficient than digital for low-to-medium-resolution processing. It also exhibits better power scalability.
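First-order textbook models can illustrate the different power-versus-precision slopes. These models are not the specific data of [12].
- For a thermal-noise-limited analog pole, the minimum power is P = 8kT·f·SNR (Vittoz's bound). Each extra bit of precision adds about 6 dB of SNR, so it quadruples analog power.
- A digital datapath's energy grows roughly linearly with word length, that is, logarithmically with SNR.

The sketch below is minimal. The bandwidth, the energy per bit-operation and the factor by which a practical analog circuit exceeds the bound are all hypothetical values.

```python
K_B = 1.380649e-23  # Boltzmann constant, J/K
T = 300.0           # temperature, K


def snr_for_bits(bits):
    """Linear SNR of an ideal N-bit quantizer (6.02*N + 1.76 dB)."""
    return 1.5 * 4.0 ** bits


def analog_power_w(bits, bandwidth_hz, excess_factor):
    """Thermal-noise-limited pole: P = 8kT*f*SNR, times an assumed practical excess factor."""
    return excess_factor * 8.0 * K_B * T * bandwidth_hz * snr_for_bits(bits)


def digital_power_w(bits, sample_rate_hz, energy_per_bit_op_j):
    """Crude digital model: one operation per sample, energy linear in word length."""
    return bits * energy_per_bit_op_j * sample_rate_hz


def crossover_bits(bandwidth_hz=4e3, energy_per_bit_op_j=10e-15, excess_factor=1e3, max_bits=16):
    """Smallest precision at which the analog model draws more power than the digital one."""
    sample_rate_hz = 2.0 * bandwidth_hz  # Nyquist-rate digital processing
    for bits in range(1, max_bits + 1):
        analog = analog_power_w(bits, bandwidth_hz, excess_factor)
        digital = digital_power_w(bits, sample_rate_hz, energy_per_bit_op_j)
        if analog > digital:
            return bits
    return None


if __name__ == "__main__":
    for bits in (2, 4, 6, 8, 10):
        a = analog_power_w(bits, 4e3, 1e3)
        d = digital_power_w(bits, 8e3, 10e-15)
        print(f"{bits:2d} bits: analog {a:.2e} W, digital {d:.2e} W")
    print("analog stops being cheaper at", crossover_bits(), "bits")  # 6 with these values
```

The crossover point moves with every assumed constant. What carries over is the slope: about 4× per bit for analog versus roughly linear in bits for digital. This is why analog favors low-to-medium precision and saves proportionally more when precision is reduced.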
Technology scaling lowers the supply voltage, which enables more power-efficient digital circuits and calls into question the advantage of analog in advanced technologies. This is because, with scaling, the cost of maintaining the same precision in analog increases as a larger bias

References
An introduction to ROC analysis
An Introduction To Compressive Sampling
A statistical model-based voice activity detection
Beyond Nyquist: Efficient Sampling of Sparse Bandlimited Signals
A Scale for the Measurement of the Psychological Magnitude Pitch