Journal ArticleDOI

A 90 nm CMOS, 6 µW Power-Proportional Acoustic Sensing Frontend for Voice Activity Detection

TL;DR: In this article, the authors presented a new power-proportional sensing paradigm and the use of machine-learning-assisted moderate-precision analog analytics for classification of speech and non-speech.
Abstract: This work presents a sub-6 µW acoustic frontend for speech/non-speech classification in a voice activity detection (VAD) system in 90 nm CMOS. Power consumption of the VAD system is minimized by architectural design around a new power-proportional sensing paradigm and the use of machine-learning-assisted moderate-precision analog analytics for classification. Power-proportional sensing allows for hierarchical and context-aware scaling of the frontend’s power consumption depending on the complexity of the ongoing information extraction, while the use of analog analytics brings increased power efficiency through switching on/off the computation of individual features depending on the features’ usefulness in a particular context. The proposed VAD system reduces the power consumption by 10× as compared to state-of-the-art (SotA) systems and yet achieves an 89% average hit rate (HR) for a 12 dB signal-to-acoustic-noise ratio (SANR) in babble context, which is on par with software-based VAD systems.

Summary (3 min read)

Introduction

  • Power consumption of the VAD system is minimized by architectural design around a new Power-Proportional sensing paradigm and the use of machine-learning assisted moderate-precision analog analytics for classification.
  • Power-Proportional sensing allows for hierarchical and context-aware scaling of the frontend’s power consumption depending on the complexity of the ongoing information extraction, while the use of analog analytics brings increased power efficiency through switching on/off the computation of individual features depending on the features’ usefulness in a particular context.
  • Technological innovations are changing the way we interact with electronic devices.
  • Yet, the information content in raw signals and its application relevance dynamically varies depending on the operating context.
  • VAD systems distinguish speech from non-speech in different background noise contexts for varying signal to acoustic noise ratios (SANR).

A. Power-Proportional Sensing

  • The core premise for Power-Proportional sensing is that power consumption of the sensing system scales proportionally with the complexity of the sensing task.
  • First, the amount of information extracted from the incoming signal can scale in complexity.
  • In such an architecture each processing stage extracts more complex information than the previous stage while consuming more power.
  • Context-awareness enables Power-Proportional sensing to scale power as the background noise context scales the complexity of information extraction, as shown in bold in Fig. 1.
  • SotA sensing systems do not exploit the power scaling opportunity offered by the above scenarios, and typically operate constantly in full processing mode.

B. Power Efficiency through Analog Analytics

  • The Power-Proportional sensing paradigm highlighted in the previous subsection needs hardware blocks whose power scales with complexity and precision.
  • Reduction in supply voltage due to technology scaling allows more power efficient digital circuits and questions the beneficial analog behavior in advanced technologies.
  • This is because with scaling, the cost of maintaining the same precision in analog increases, as a larger bias current is needed to reduce the noise floor, compensating for the reduction in signal swing.
  • Hence, absolute precision requirements for such systems are rather modest, and mismatches and offset impairments are automatically taken care of by the embedded trained classifier in the loop.
  • As demonstrated by this work, as well as some existing works, machine learning assisted [13, 14] and/or digital calibration [15] can improve SNR by 6 – 10 dB for comparable power which pushes the efficiency crossover point in the rightward direction as shown in Fig.

III. SYSTEM ARCHITECTURE AND SPECIFICATIONS

  • This section highlights the use of the aforementioned key principles in the developed VAD architecture [16] and derives the specifications for the analog/mixed-signal building blocks.
  • If the signal is speech, the classifier wakes up the microcontroller for more advanced processing.
  • This allows scaling the power with necessary information as outlined in Section II.
  • As further modelled in subsection B, considering that the analog feature-extraction blocks are in the loop during this training operation, all static analog impairments such as mismatch, gain errors, or offsets are absorbed in the trained feature thresholds and do not affect the classification accuracy.
  • Subsection B derives specifications for the targeted VAD system.

B. Specifications for VAD system

  • This section first derives the system level specifications and then the specifications for individual analog blocks.
  • Mathematically, each analog feature is defined as $af_i = \overline{\left| x(t) * h_i(t) \right|}$ (1), where $x(t)$ is the amplified acoustic signal, $h_i(t)$ is the impulse response of the band-pass filter used to decompose the input signal into a smaller frequency band, and $|\cdot|$, $*$, and the overline represent the absolute value, convolution, and averaging, respectively.
  • The MATLAB model varies the number of computed features in the above frequency range by scaling the Q factor of the band pass filters.
  • The results of the above simulation are shown in Fig. 4(a).
  • It can be seen that more features improve classification accuracy, yet accuracy gains diminish beyond 16 features, allowing the design to be limited to a maximum of 16 individually (de)activated features.
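The feature definition above (band-pass filtering, absolute value, averaging) can be emulated numerically. The following is a minimal sketch, not the authors' MATLAB model: it assumes NumPy, uses a windowed-sinc FIR band-pass as a stand-in for the analog filter, and the band edges, tap count, and function names `bandpass_fir` and `analog_feature` are hypothetical choices made only for illustration.

```python
import numpy as np

def bandpass_fir(f_lo, f_hi, fs, taps=255):
    """Windowed-sinc band-pass impulse response (difference of two low-pass filters)."""
    n = np.arange(taps) - (taps - 1) / 2

    def lowpass(fc):
        # Ideal low-pass impulse response with cutoff fc (Hz), sampled at fs.
        return 2 * fc / fs * np.sinc(2 * fc / fs * n)

    return (lowpass(f_hi) - lowpass(f_lo)) * np.hamming(taps)

def analog_feature(x, h):
    """Average of the rectified band-pass output: mean(|x * h|)."""
    return np.mean(np.abs(np.convolve(x, h, mode="same")))

fs = 8000.0                               # ASSUMPTION: 8 kHz audio sampling rate
t = np.arange(int(fs)) / fs               # one second of samples
tone = np.sin(2 * np.pi * 1000.0 * t)     # 1 kHz test tone (toy input)

# Hypothetical band edges: one band around the tone, one well away from it.
in_band = analog_feature(tone, bandpass_fir(800.0, 1200.0, fs))
out_of_band = analog_feature(tone, bandpass_fir(2000.0, 3000.0, fs))
```

In this toy setup the feature of the band containing the tone is much larger than that of the distant band, which is the property a threshold-based classifier can exploit.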

IV. SYSTEM IMPLEMENTATION

  • This Section details the implementation nuances of the individual system blocks discussed in the previous section: namely the wakeup detector, the analog feature-extractor and the embedded mixed-signal classifier.
  • A further subsection discusses system training for the complete VAD system before discussing one-time calibration and measurement results in Section V.

A. Wakeup detector

  • The always-awake threshold-based wakeup detector acts as the system’s watch-dog that wakes up the analog feature-extractor only when a signal of sufficient strength is detected.
  • The wakeup detector is a low power 3-phase comparator and its schematic is shown in Fig.
  • Each amplifier is a PMOS input source-coupled single-ended differential amplifier and can be turned on/off individually to save power depending on the microphone’s signal-level and is designed to provide a mid-band gain of 20 dB.
  • Measured power consumption of this block is 700 nW when all four amplifier stages are turned on, and excluding the external bias.

B. Analog feature-extractor

  • On receiving the wakeup signal from the threshold-based wakeup detector, the analog feature-extractor decomposes the input signal into the set of 16 features.
  • This contributes to Power-Proportional information extraction, as it allows turning off amplifier stages of unused features along with all other circuitry involved in individual feature computation.
  • A.2. The sub-blocks of the analog feature-extractor are now explained in more detail.
  • To cover for this, the f-3dB of the amplifiers in each band also increases progressively from band 1 to band 16.
  • The architecture of the current-mode averaging is shown in Fig. 12.

C. Decision tree based classifier

  • The extracted feature subset, af5 - af12, is passed on to the on-chip classifier (Fig. 5) while the complete feature-set af1 - af16 can be passed on to an off-chip ADC for more complex information extraction, such as context-change detection and retraining the classifier as in [22].
  • In these cases, the Nyquist sampling rate for the features is only 16x2x16 = 512 Hz instead of 8 kHz for audio.
  • Each node of the decision tree can be configured to select one feature out of af5 - af12.
  • To this end, the on-chip decision tree classifier is trained with the authors' modified C4.5 algorithm with 160 s of labeled data from the standardized NOIZEUS database [23].
  • The authors' modification to C4.5 maximizes the information-gain/watt and therefore outputs a resource-efficient model that maximizes the information capture while minimizing the power [22].
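The idea of choosing decision-tree splits by information gain per watt can be sketched as follows. This is only an illustration under stated assumptions: the exact cost function of the modified C4.5 in [22] is not given here, so the score below is simply the standard information gain divided by a per-feature power figure. The toy data, the power values, and the helper names are hypothetical.

```python
import math

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    result = 0.0
    for cls in set(labels):
        p = labels.count(cls) / total
        result -= p * math.log2(p)
    return result

def information_gain(values, labels, threshold):
    """Entropy reduction obtained by splitting at `value <= threshold`."""
    left = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v > threshold]
    n = len(labels)
    remainder = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(labels) - remainder

def best_split(features, labels, power_uw):
    """Pick the (feature, threshold) pair with the highest gain per microwatt."""
    best, best_score = None, -1.0
    for name, values in features.items():
        for threshold in sorted(set(values))[:-1]:
            score = information_gain(values, labels, threshold) / power_uw[name]
            if score > best_score:
                best, best_score = (name, threshold), score
    return best, best_score

# Toy data: 0 = non-speech, 1 = speech. Feature values are made up.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {"af5": [1, 2, 3, 4, 5, 6, 7, 8], "af6": [1, 2, 3, 5, 4, 6, 7, 8]}
power_uw = {"af5": 4.0, "af6": 1.0}   # ASSUMPTION: hypothetical per-feature power
choice, score = best_split(features, labels, power_uw)
```

With these made-up numbers the cheaper but slightly less discriminative feature wins the split, whereas with equal power costs the perfectly separating feature would be chosen.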

B. System measurement results

  • The chip is integrated with the microcontroller using external level-shifters and DACs, to form the complete VAD.
  • Fig. 20 shows a one-time calibration to characterize for mismatch in the ADC and DAC paths.
  • This subsection also displays the classification accuracy results for the complete VAD system and illustrates the achieved Power-Proportionality.
  • Receiver operating characteristic (ROC) curves characterize the classifier systems and depict hit-rates (HR) for the variables under observation [24].
  • The power consumption for signal detection is measured to be below 1 µW, whereas power consumption for classification varies depending on complexity of the operating context and has an upper bound of 6 µW.
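As a rough illustration of how hit rates of the kind reported above can be computed from frame-level decisions, the sketch below uses the standard ROC definitions (hits divided by speech frames, false alarms divided by non-speech frames). The frame labels are toy data and the function name is hypothetical; the paper's exact evaluation procedure may differ.

```python
def hit_and_false_alarm_rate(predicted, actual):
    """Return (hit rate, false-alarm rate); True means 'speech' for a frame."""
    hits = sum(1 for p, a in zip(predicted, actual) if p and a)
    false_alarms = sum(1 for p, a in zip(predicted, actual) if p and not a)
    speech_frames = sum(1 for a in actual if a)
    non_speech_frames = len(actual) - speech_frames
    hit_rate = hits / speech_frames if speech_frames else 0.0
    false_alarm_rate = false_alarms / non_speech_frames if non_speech_frames else 0.0
    return hit_rate, false_alarm_rate

# Toy example: six frames, three of which actually contain speech.
predicted = [True, True, False, True, False, False]
actual = [True, True, True, False, False, False]
hr, far = hit_and_false_alarm_rate(predicted, actual)
```

Sweeping a classifier threshold and plotting the resulting (false-alarm rate, hit rate) pairs yields an ROC curve.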

VI. CONCLUSIONS

  • This work demonstrates a power efficient acoustic sensing frontend for speech/non-speech classification in a voice activity detection system.
  • K. Badami, S. Lauwereins, W. Meert, and M. Verhelst, ‘Context-aware hierarchical information-sensing in a 6μW 90nm CMOS voice activity detector’, 2015 IEEE International Solid-State Circuits Conference - Digest of Technical Papers, 2015.


Citation: Komail Badami, Steven Lauwereins, Wannes Meert, Marian Verhelst, (2014), A 90 nm CMOS, 6 μW Power-Proportional Acoustic Sensing Frontend for Voice Activity Detection, IEEE Journal of Solid State Circuits, Vol. 51, Issue 1
Archived version: Author manuscript: the content is identical to the content of the published paper, but without the final typesetting by the publisher
Published version: http://ieeexplore.ieee.org/document/7315025/?arnumber=7315025
Journal homepage: http://sscs.ieee.org/en/publications/ieee-journal-of-solid-state-circuits-jssc
Author contact: komail.badami@esat.kuleuven.be
IR: https://lirias.kuleuven.be/handle/123456789/514022

A 90 nm CMOS, 6 μW Power-Proportional Acoustic
Sensing Frontend for Voice Activity Detection
Komail Badami, Steven Lauwereins, Wannes Meert, Marian Verhelst, KU Leuven, Belgium
Abstract – This work presents a sub-6 µW acoustic frontend for speech/non-speech
classification in a voice activity detection (VAD) system in 90 nm CMOS. Power consumption of the
VAD system is minimized by architectural design around a new Power-Proportional sensing
paradigm and the use of machine-learning-assisted moderate-precision analog analytics for
classification. Power-Proportional sensing allows for hierarchical and context-aware scaling of
the frontend’s power consumption depending on the complexity of the ongoing information
extraction, while the use of analog analytics brings increased power efficiency through switching
on/off the computation of individual features depending on the features’ usefulness in a
particular context. The proposed VAD system reduces the power consumption by 10× as
compared to state-of-the-art systems and yet achieves an 89% average hit rate for a 12 dB
signal-to-acoustic-noise ratio in babble context, which is on par with software-based VAD systems.
I. INTRODUCTION
Technological innovations are changing the way we interact with electronic devices.
Interactions like voice control and gesture recognition are rapidly gaining popularity. Such
natural interactive systems need not only many integrated sensors, but also always-awake,
reactive sensor frontends. These frontends generate large amounts of raw signals that
state-of-the-art (SotA) frontends immediately digitize for processing on a DSP. This very robust approach is
not power efficient, as not all raw sensor signals are equally relevant. The net information
content of a sensed signal is quite often significantly smaller than the Nyquist rate [1-7]. Existing
works such as Information-Rate processing [1,2], Analog to Information conversion [3-5] and
Compressed Sensing [6,7] show power savings by extracting or compressing the information
from signals before digitizing the data. However, these schemes operate in a static way: the
compression or extraction parameters are set beforehand. Yet, the information content in raw
signals and its application relevance dynamically varies depending on the operating context.
Operating such systems efficiently hence requires a dynamic system adaptation depending on the
context or signal information content. Existing systems do not perform such fine-grained adaptive
behavior, which severely limits their power savings, as shown by the solid line in Fig. 1.
We propose a self-scalable, Power-Proportional sensing paradigm which gracefully scales the
system’s power consumption with the amount and complexity of extracted information, i.e. the
power consumption for such a system increases only as the task of information extraction gets
more complex. To this end, in this paper we propose key enablers for Power-Proportionality and
apply them to a proof of concept acoustic frontend for voice activity detection (VAD).
VAD systems distinguish speech from non-speech in different background noise contexts for
varying signal to acoustic noise ratios (SANR). SotA VAD systems [8-10] extract complex
features like Mel-Frequency Cepstral Coefficients, DCT etc. to differentiate speech from non-
speech. The high computational complexity of such features results in large power consumption,
typically about 50 - 100 µW [8-11] in addition to the power consumption of the required active
microphone. Such continuously high power consumption is unacceptable for battery-powered
always-on sensor frontends. This work exploits our new Power-Proportional sensing paradigm
along with moderate-precision, computationally-inexpensive, analog feature-extraction, coupled
with an embedded mixed-signal classifier to reduce power consumption by more than 10× over SotA
without compromising on the classification accuracy.
The outline for this paper is as follows. Section II discusses insights into the design principles
for Power-Proportional sensing and explains the rationale behind the analog feature-extraction
instead of the commonly used digital scheme. Section III describes the architecture and
specification set for VAD while the detailed implementation is discussed in Section IV.
Measurement results for the chip and for the full VAD system are discussed in Section V.
II. KEY PRINCIPLES FOR POWER EFFICIENT SENSING
This section details the two key principles that allow our always-on sensing system to scale its
power consumption with the information extracted, saving 10× power over SotA VAD systems.
A. Power-Proportional Sensing
The core premise for Power-Proportional sensing is that power consumption of the sensing
system scales proportionally with the complexity of the sensing task. The sensing process with
the target of information extraction can increase in complexity along two dimensions:
First, the amount of information extracted from the incoming signal can scale in complexity.
Consider, for example, the task of speaker identification versus speech detection. The former task
entails the latter as a prerequisite first step, hence justifying the increase in power consumption.
Enabling hierarchical operation for tasks of increasing complexity allows scaling of power
consumption with complexity of information extraction. In such an architecture each processing
stage extracts more complex information than the previous stage while consuming more power.
This enables information extraction by necessity, as is shown on the horizontal-axis in Fig. 1.
Secondly, even if the amount of extracted information remains the same, distinguishing the
useful information from the background noise (the context) is subject to varying levels of
difficulty. For this case consider the complexity of speech detection in a quiet office, in contrast
to a noisy street environment. The amount of information needed is the same in both cases, but in the
latter case as the background noise maps directly onto the information spectrum, it creates in-
band interference on the desired signal. As such, distinguishing speech from non-speech
becomes more complex, hence justifying the increase in power consumption. Context-awareness
enables Power-Proportional sensing to scale power as the background noise context scales the
complexity of information extraction, as shown in bold in Fig. 1. For the example above,
context-awareness allows the use of a much smaller discriminating feature subset in a low-noise
environment and a relatively larger subset for noisy background contexts, hence scaling power.
SotA sensing systems do not exploit the power scaling opportunity offered by the above
scenarios, and typically operate constantly in full processing mode. This plateaus the on-state
power consumption for SotA sensing systems independent of system utility as shown in Fig. 1.
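The two scaling dimensions described above can be illustrated with a minimal sketch. It is not the authors' implementation: the 700 nW always-on figure is the measured wakeup-detector power quoted in this paper, but the per-feature power value, the linear power model, and the function name `frontend_power_uw` are hypothetical, chosen only so that the total stays below the reported 6 µW upper bound.

```python
# Illustrative model of power-proportional sensing (assumptions labeled below).
WAKEUP_POWER_UW = 0.7        # always-on wakeup detector, 700 nW as quoted in this paper
POWER_PER_FEATURE_UW = 0.3   # ASSUMPTION: hypothetical cost of one active analog feature
MAX_FEATURES = 16            # the frontend provides at most 16 features

def frontend_power_uw(signal_present: bool, active_features: int) -> float:
    """Return the modeled frontend power in microwatts.

    Only the wakeup detector runs while no signal is present. Once a signal
    is detected, power grows with the number of features the current context
    needs (few in a quiet context, more in a noisy one).
    """
    if not 0 <= active_features <= MAX_FEATURES:
        raise ValueError("active_features must be between 0 and 16")
    power = WAKEUP_POWER_UW
    if signal_present:
        power += active_features * POWER_PER_FEATURE_UW
    return power

quiet_idle = frontend_power_uw(False, 8)    # no signal: wakeup detector only
quiet_office = frontend_power_uw(True, 2)   # easy context: small feature subset
noisy_street = frontend_power_uw(True, 8)   # hard context: larger feature subset
```

A conventional always-on frontend would correspond to calling the model with all features active regardless of context, which is the plateau behavior described above.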
B. Power Efficiency through Analog Analytics
The Power-Proportional sensing paradigm highlighted in the previous subsection needs hardware
blocks whose power scales with complexity and precision. Such power scaling with
precision is very different for analog and digital implementations. Analog power consumption
scales gradually for thermal-noise-limited systems with low-to-medium precision, while digital
has a logarithmic power versus precision profile. As shown in [12] and in Fig. 2, for a
0.25 µm CMOS technology, analog computation is not only more power-efficient than digital for
low-to-medium resolution processing, but also exhibits better scalability.
Reduction in supply voltage due to technology scaling allows more power-efficient digital
circuits and questions the beneficial analog behavior in advanced technologies. This is because
with scaling, the cost of maintaining the same precision in analog increases, as a larger bias
current is needed to reduce the noise floor, compensating for the reduction in signal swing.

Citations
Proceedings ArticleDOI
Dewei Wang, Sung Justin Kim, Minhao Yang, Aurel A. Lazar, Mingoo Seok
13 Feb 2021
TL;DR: In this paper, a normalized acoustic feature extractor chip (NAFE) was proposed for always-on keyword spotting (KWS), which can take an acoustic signal from a microphone and produce spike-rate coded features.
Abstract: In mobile and edge devices, always-on keyword spotting (KWS) is an essential function to detect wake-up words. Recent works achieved extremely low power dissipation down to ~500 nW [1]. However, most of them adopt noise-dependent training, i.e. training for a specific signal-to-noise ratio (SNR) and noise type [1], and therefore their accuracies degrade for different SNR levels and noise types that are not targeted in the training (Fig. 9.9.1, top left). To improve robustness, so-called noise-independent training can be considered, which is to use the training data that includes all the possible SNR levels and noise types [2]. But, this approach is challenging for an ultra-low-power device since it demands a large neural network to learn all the possible features. A neural network of a fixed size has its own memory capacity limit and reaches a plateau in accuracy if it has to learn more than its limit (Fig. 9.9.1, top right). On the other hand, it is known that biological acoustic systems employ a simpler process, called divisive energy normalization (DN), to maintain accuracy even in varying noise conditions [3]. In this work, therefore, by adopting such a DN, we prototype a normalized acoustic feature extractor chip (NAFE) in 65nm. The NAFE can take an acoustic signal from a microphone and produce spike-rate coded features. We pair NAFE with a spiking neural network (SNN) classifier chip [4], creating the end-to-end KWS system. The proposed system achieves 89-to-94% accuracy across -5 to 20dB SNRs and four different noise types on HeySnips [5], while the baseline without DN achieves a much lower accuracy of 71-87%. NAFE consumes up to 109nW and the KWS system 570nW.

14 citations

Journal ArticleDOI
TL;DR: In this article, a bagged tree machine learning (BTML) classifier was used to detect hypnosis level using EEG-based monitoring processor, which is based on 12 temporal and spectral features and achieved high classification accuracy.
Abstract: Most surgical procedures are not possible without general anesthesia which necessitates continuous and accurate monitoring of the patients’ level of hypnosis (LoH). Currently, the LoH is monitored using the conventional methods of either observing the patient’s physiological parameters or using electroencephalogram (EEG)-based monitors. To overcome the limitations of the conventional methods, this work implements an accurate EEG-based LoH monitoring processor using a bagged tree machine-learning (BTML) classifier. It is based on 12 temporal and spectral features to incorporate robustness against age variation and achieve high classification accuracy. Spectral features are computed using discrete wavelet transform (DWT) that uses time-multiplexed filter (TMF) architecture. The TMF DWT consumes 110.6-nJ/feature vector for a 100-tap filter while reducing the area by 11% compared with the conventional method. Moreover, the BTML is implemented using a pipelined approach which enables an efficient on-chip implementation to reduce the hardware cost by 15× compared with the parallel approach. The proposed processor is implemented using a 180-nm CMOS process with an active area of 0.9 mm² while consuming 1.6 mW. The accuracy of the proposed hypnotic state monitor is verified using two EEG databases with a total of 95 patients and achieves a sensitivity and specificity of 95.4% and 97.7%, respectively.

14 citations

Journal ArticleDOI
Bo Liu, Zhen Wang, Fan Hu, Yang Jing, Wentao Zhu, Huang Lepeng, Yu Gong, Ge Wei, Longxing Shi
TL;DR: An energy-efficient reconfigurable accelerator for keyword spotting (EERA-KWS) based on binary weight network (BWN) and fabricated in 28-nm CMOS technology, which can achieve 163 TOPS/W, which is over 1.8 times better than the state-of-the-art architecture.
Abstract: This paper proposed an energy-efficient reconfigurable accelerator for keyword spotting (EERA-KWS) based on binary weight network (BWN) and fabricated in 28-nm CMOS technology. This keyword spotting system consists of two parts: the feature extraction based on mel-scale frequency cepstral coefficients (MFCC) and the keywords classification based on a BWN model, which is trained through the Google’s Speech Commands database and deployed on our custom. To reduce the power consumption while maintaining the system recognition accuracy, we first optimize the MFCC implementation with approximate computing techniques, including Pre-emphasis coefficient transformation, rectangular Mel filtering, Framing and FFT optimization. Then, we propose a precision self-adaptive reconfigurable accelerator with digital-analog mixed approximate computing units to process the BWN efficiently. Based on the SNR prediction of background noise and post-detection of network output confidence, the BWN accelerator data path can be dynamically and adaptively reconfigured as 4, 8, or 16 bits. For the BWN accelerator, we proposed a time-delay based addition unit to process bit-wise approximate computing for the convolution layers and fully connected layers, and a LUT based unit for the activation layers. Implemented under TSMC 28 nm HPC+ process technology, the estimated power is 77.8 to 115.9 µW, the energy efficiency can achieve 163 TOPS/W, which is over 1.8× better than the state-of-the-art architecture.

13 citations

Journal ArticleDOI
27 May 2018
TL;DR: This paper proposes a flexible and iterative neural architecture able to implement multiple types of clique-based neural networks of up to 3968 neurons that has been integrated in a ST 65-nm CMOS ASIC and validated in the context of ECG classification.
Abstract: Clique-based neural networks implement low-complexity functions working with a reduced connectivity between neurons. Thus, they address very specific applications operating with a very low-energy budget. However, the implementation in the state of the art is not flexible and a fabricated circuit is only usable in a unique use case. Besides, the silicon area of hardwired circuits grows exponentially with the number of implemented neurons that is prohibitive for embedded applications. This paper proposes a flexible and iterative neural architecture capable of implementing multiple types of clique-based neural networks of up to 3968 neurons. The circuit has been integrated in an ST 65-nm CMOS ASIC and occupies a 0.21-mm2 silicon surface area. The proper functioning of the circuit is illustrated using two application cases: a keyword recovery application and an electrocardiogram classification. The neurons outputs are updated 83 ns after a stimulation, and a neuron needs an energy of 115 fJ to propagate a change at the input to its output.

11 citations

Proceedings ArticleDOI
01 Aug 2018
TL;DR: To overcome the power consumption limits provided by state of the art threshold detection methods, a novel threshold detection method based on injection-locked oscillator time-domain comparator is proposed and designed in a FDSOI 22 nm process.
Abstract: This paper introduces a review of event-driven wake-up sensors detection approaches and investigates time domain injection-locked oscillator design solutions to reduce power consumption. Threshold-based sensing, digital classification and analog feature extraction method are presented. To overcome the power consumption limits provided by state of the art threshold detection methods, a novel threshold detection method based on injection-locked oscillator time-domain comparator is proposed and designed in a FDSOI 22 nm process. Simulations show a power consumption of 790 pW for an input full scale signal of 320 mV with a frequency of 1 kHz.

11 citations

References
Journal ArticleDOI
TL;DR: The purpose of this article is to serve as an introduction to ROC graphs and as a guide for using them in research.

17,017 citations

Journal ArticleDOI
TL;DR: The theory of compressive sampling, also known as compressed sensing or CS, is surveyed, a novel sensing/sampling paradigm that goes against the common wisdom in data acquisition.
Abstract: Conventional approaches to sampling signals or images follow Shannon's theorem: the sampling rate must be at least twice the maximum frequency present in the signal (Nyquist rate). In the field of data conversion, standard analog-to-digital converter (ADC) technology implements the usual quantized Shannon representation - the signal is uniformly sampled at or above the Nyquist rate. This article surveys the theory of compressive sampling, also known as compressed sensing or CS, a novel sensing/sampling paradigm that goes against the common wisdom in data acquisition. CS theory asserts that one can recover certain signals and images from far fewer samples or measurements than traditional methods use.

9,686 citations

Journal ArticleDOI
TL;DR: An effective hang-over scheme which considers the previous observations by a first-order Markov process modeling of speech occurrences is proposed which shows significantly better performances than the G.729B VAD in low signal-to-noise ratio (SNR) and vehicular noise environments.
Abstract: In this letter, we develop a robust voice activity detector (VAD) for the application to variable-rate speech coding. The developed VAD employs the decision-directed parameter estimation method for the likelihood ratio test. In addition, we propose an effective hang-over scheme which considers the previous observations by a first-order Markov process modeling of speech occurrences. According to our simulation results, the proposed VAD shows significantly better performances than the G.729B VAD in low signal-to-noise ratio (SNR) and vehicular noise environments.

1,341 citations

Journal ArticleDOI
TL;DR: A new type of data acquisition system, called a random demodulator, that is constructed from robust, readily available components that supports the empirical observations, and a detailed theoretical analysis of the system's performance is provided.
Abstract: Wideband analog signals push contemporary analog-to-digital conversion (ADC) systems to their performance limits. In many applications, however, sampling at the Nyquist rate is inefficient because the signals of interest contain only a small number of significant frequencies relative to the band limit, although the locations of the frequencies may not be known a priori. For this type of sparse signal, other sampling strategies are possible. This paper describes a new type of data acquisition system, called a random demodulator, that is constructed from robust, readily available components. Let K denote the total number of frequencies in the signal, and let W denote its band limit in hertz. Simulations suggest that the random demodulator requires just O(K log(W/K)) samples per second to stably reconstruct the signal. This sampling rate is exponentially lower than the Nyquist rate of W hertz. In contrast to Nyquist sampling, one must use nonlinear methods, such as convex programming, to recover the signal from the samples taken by the random demodulator. This paper provides a detailed theoretical analysis of the system's performance that supports the empirical observations.

1,138 citations

Journal ArticleDOI
TL;DR: A subjective scale for the measurement of pitch was constructed from determinations of the half-value of pitches at various frequencies as mentioned in this paper, which differs from both the musical scale and the frequency scale, neither of which is subjective.
Abstract: A subjective scale for the measurement of pitch was constructed from determinations of the half‐value of pitches at various frequencies. This scale differs from both the musical scale and the frequency scale, neither of which is subjective. Five observers fractionated tones of 10 different frequencies at a loudness level of 60 db. From these fractionations a numerical scale was constructed which is proportional to the perceived magnitude of subjective pitch. In numbering the scale the 1000‐cycle tone was assigned the pitch of 1000 subjective units (mels). The close agreement of the pitch scale with an integration of the differential thresholds (DL's) shows that, unlike the DL's for loudness, all DL's for pitch are of uniform subjective magnitude. The agreement further implies that pitch and differential sensitivity to pitch are both rectilinear functions of extent on the basilar membrane. The correspondence of the pitch scale and the experimentally determined location of the resonant areas of the basilar membrane suggests that, in cutting a pitch in half, the observer adjusts the tone until it stimulates a position half‐way from the original locus to the apical end of the membrane. Measurement of the subjective size of musical intervals (such as octaves) in terms of the pitch scale shows that the intervals become larger as the frequency of the mid‐point of the interval increases (except in the two highest audible octaves). This result confirms earlier judgments as to the relative size of octaves in different parts of the frequency range.

1,036 citations