Journal ArticleDOI

A 90 nm CMOS, 6 µW Power-Proportional Acoustic Sensing Frontend for Voice Activity Detection

TL;DR: In this article, the authors presented a new power-proportional sensing paradigm and the use of machine-learning-assisted moderate-precision analog analytics for classification of speech and non-speech.
Abstract: This work presents a sub-6 µW acoustic frontend for speech/non-speech classification in a voice activity detection (VAD) system in 90 nm CMOS. Power consumption of the VAD system is minimized by architectural design around a new power-proportional sensing paradigm and the use of machine-learning-assisted moderate-precision analog analytics for classification. Power-proportional sensing allows for hierarchical and context-aware scaling of the frontend’s power consumption depending on the complexity of the ongoing information extraction, while the use of analog analytics brings increased power efficiency through switching on/off the computation of individual features depending on the features’ usefulness in a particular context. The proposed VAD system reduces the power consumption by 10× compared to state-of-the-art (SotA) systems and yet achieves an 89% average hit rate (HR) for a 12 dB signal-to-acoustic-noise ratio (SANR) in babble context, which is on par with software-based VAD systems.

Summary (3 min read)

Introduction

  • Power consumption of the VAD system is minimized by architectural design around a new Power-Proportional sensing paradigm and the use of machine-learning assisted moderate-precision analog analytics for classification.
  • Power-Proportional sensing allows for hierarchical and context-aware scaling of the frontend’s power consumption depending on the complexity of the ongoing information extraction, while the use of analog analytics brings increased power efficiency through switching on/off the computation of individual features depending on the features’ usefulness in a particular context.
  • Technological innovations are changing the way people interact with electronic devices.
  • Yet, the information content in raw signals and its application relevance dynamically varies depending on the operating context.
  • VAD systems distinguish speech from non-speech in different background noise contexts for varying signal to acoustic noise ratios (SANR).

A. Power-Proportional Sensing

  • The core premise for Power-Proportional sensing is that power consumption of the sensing system scales proportionally with the complexity of the sensing task.
  • First, the amount of information extracted from the incoming signal can scale in complexity.
  • In such an architecture each processing stage extracts more complex information than the previous stage while consuming more power.
  • Context-awareness enables Power-Proportional sensing to scale power as the background noise context scales the complexity of information extraction, as shown in bold in Fig. 1.
  • SotA sensing systems do not exploit the power scaling opportunity offered by the above scenarios, and typically operate constantly in full processing mode.
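The hierarchical idea above can be illustrated with a small average-power model. This is only a sketch: the stage names, the per-stage power values, and the wake probabilities are hypothetical placeholders chosen for illustration, not measurements from the paper. Each stage is awake only while the previous stage is awake and has decided that more processing is needed.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    power_uw: float   # assumed power while the stage is awake (microwatts)
    wake_prob: float  # assumed fraction of the previous stage's on-time that wakes this stage


def average_power_uw(stages):
    """Average power of a hierarchy in which each stage gates the next one."""
    total, active = 0.0, 1.0
    for stage in stages:
        active *= stage.wake_prob      # fraction of time this stage is awake
        total += active * stage.power_uw
    return total


def always_on_power_uw(stages):
    """Power if every stage runs continuously (no hierarchy)."""
    return sum(stage.power_uw for stage in stages)


# Hypothetical numbers, for illustration only.
STAGES = [
    Stage("wakeup detector", power_uw=1.0, wake_prob=1.0),
    Stage("feature extraction + classifier", power_uw=5.0, wake_prob=0.2),
    Stage("microcontroller", power_uw=50.0, wake_prob=0.25),
]

proportional = average_power_uw(STAGES)
always_on = always_on_power_uw(STAGES)
print(f"hierarchical: {proportional:.2f} uW, always on: {always_on:.2f} uW")
```

With these placeholder values the hierarchical system averages 4.5 µW against 56 µW for the always-on case; the gap grows as the later, more expensive stages are woken less often.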

B. Power Efficiency through Analog Analytics

  • The Power-Proportional sensing paradigm highlighted in the previous subsection needs hardware blocks whose power scales with complexity and precision.
  • Reduction in supply voltage due to technology scaling allows more power efficient digital circuits and questions the beneficial analog behavior in advanced technologies.
  • This is because with scaling, the cost of maintaining the same precision in analog increases, as a larger bias current is needed to reduce the noise floor, compensating for the reduction in signal swing.
  • Hence, absolute precision requirements for such systems are rather modest, and mismatches and offset impairments are automatically taken care of by the embedded trained classifier in the loop.
  • As demonstrated by this work, as well as some existing works, machine-learning assistance [13, 14] and/or digital calibration [15] can improve SNR by 6–10 dB for comparable power, which pushes the efficiency crossover point rightward, as shown in Fig.

III. SYSTEM ARCHITECTURE AND SPECIFICATIONS

  • This section highlights the use of the aforementioned key principles in the developed VAD architecture [16] and derives the specifications for the analog/mixed-signal building blocks.
  • If the signal is speech, the classifier wakes up the microcontroller for more advanced processing.
  • This allows scaling the power with necessary information as outlined in Section II.
  • As further modelled in subsection B, considering that the analog feature-extraction blocks are in the loop during this training operation, all static analog impairments such as mismatch, gain errors, or offsets are absorbed in the trained feature thresholds and do not affect the classification accuracy.
  • Subsection B derives specifications for the targeted VAD system.

B. Specifications for VAD system

  • This section first derives the system level specifications and then the specifications for individual analog blocks.
  • Mathematically, each analog feature is defined as $af_i = \overline{\left| x(t) * h_i(t) \right|}$ (1), where $x(t)$ is the amplified acoustic signal, $h_i(t)$ is the impulse response of the band pass filter used to decompose the input signal into a smaller frequency band, and $|\cdot|$, $*$ and $\overline{(\cdot)}$ represent the absolute value, convolution, and averaging respectively.
  • The MATLAB model varies the number of computed features in the above frequency range by scaling the Q factor of the band pass filters.
  • The results of the above simulation are shown in Fig. 4(a).
  • It can be seen that more features improve classification accuracy, yet accuracy gains diminish beyond 16 features, allowing the authors to limit the design to a maximum of 16 individually (de)activated features.
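The feature definition in (1) (band-pass filter, absolute value, averaging) can be sketched as a discrete-time software model. This is an illustrative stand-in, not the authors' MATLAB model or their analog circuit: the biquad band-pass filter (standard audio-EQ cookbook form with 0 dB peak gain), the 8 kHz sample rate, the 1 kHz centre frequency, the Q of 4, and the plain mean over the whole window are all assumptions made for the example.

```python
import math


def bandpass_coeffs(f0_hz, q, fs_hz):
    """Normalized biquad band-pass coefficients (0 dB gain at f0_hz)."""
    w0 = 2.0 * math.pi * f0_hz / fs_hz
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (-2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)
    return b, a


def analog_feature(x, f0_hz, q, fs_hz):
    """One feature: band-pass filter the signal, rectify it, then average."""
    (b0, b1, b2), (a1, a2) = bandpass_coeffs(f0_hz, q, fs_hz)
    x1 = x2 = y1 = y2 = 0.0
    acc = 0.0
    for xn in x:
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        acc += abs(yn)           # absolute value (rectification)
    return acc / len(x)          # averaging over the whole window


def tone(freq_hz, fs_hz, seconds=1.0):
    """Unit-amplitude test tone."""
    n = int(fs_hz * seconds)
    return [math.sin(2.0 * math.pi * freq_hz * k / fs_hz) for k in range(n)]


FS = 8000.0  # assumed sample rate for the software model
in_band = analog_feature(tone(1000.0, FS), f0_hz=1000.0, q=4.0, fs_hz=FS)
out_of_band = analog_feature(tone(250.0, FS), f0_hz=1000.0, q=4.0, fs_hz=FS)
print(f"in-band feature: {in_band:.3f}, out-of-band feature: {out_of_band:.3f}")
```

A tone at the band centre yields a much larger feature value than a tone two octaves below it, which is the property the classifier relies on. Raising the Q narrows each band, which is how a bank of such filters could cover the same frequency range with more features.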

IV. SYSTEM IMPLEMENTATION

  • This Section details the implementation nuances of the individual system blocks discussed in the previous section: namely the wakeup detector, the analog feature-extractor and the embedded mixed-signal classifier.
  • A further subsection discusses system training for the complete VAD system before discussing one-time calibration and measurement results in Section V.

A. Wakeup detector

  • The always-awake threshold-based wakeup detector acts as the system’s watch-dog that wakes up the analog feature-extractor only when a signal of sufficient strength is detected.
  • The wakeup detector is a low power 3-phase comparator and its schematic is shown in Fig.
  • Each amplifier is a PMOS input source-coupled single-ended differential amplifier and can be turned on/off individually to save power depending on the microphone’s signal-level and is designed to provide a mid-band gain of 20 dB.
  • Measured power consumption of this block is 700 nW when all four amplifier stages are turned on, and excluding the external bias.

B. Analog feature-extractor

  • On receiving the wakeup signal from the threshold-based wakeup detector, the analog feature-extractor decomposes the input signal into the set of 16 features.
  • This contributes to Power-Proportional information extraction, as it allows turning off amplifier stages of unused features along with all other circuitry involved in individual feature computation.
  • The sub-blocks of the analog feature-extractor are now explained in more detail.
  • To cover for this, the f-3dB of the amplifiers in each band also increases progressively from band 1 to band 16.
  • The architecture of the current-mode averaging is shown in Fig. 12.

C. Decision tree based classifier

  • The extracted feature subset, af5 - af12, is passed on to the on-chip classifier (Fig. 5) while the complete feature-set af1 - af16 can be passed on to an off-chip ADC for more complex information extraction, such as context-change detection and retraining the classifier as in [22].
  • In these cases, the Nyquist sampling rate for the features is only 16 × 2 × 16 = 512 Hz instead of 8 kHz for audio.
  • Each node of the decision tree can be configured to select one feature out of af5 - af12.
  • To this end, the on-chip decision tree classifier is trained with the authors' modified C4.5 algorithm with 160 s of labeled data from the standardized NOIZEUS database [23].
  • The authors' modification to C4.5 maximizes the information-gain/watt and therefore outputs a resource-efficient model that maximizes the information capture while minimizing the power [22].
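The information-gain-per-watt idea can be sketched as a split-selection routine. This is a simplified illustration, not the authors' modified C4.5: it uses plain information gain (C4.5 itself normalizes by split information), and the feature values, labels, and per-feature power costs are hypothetical toy numbers.

```python
import math


def entropy(labels):
    """Binary entropy (bits) of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting on values <= threshold."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    n = len(labels)
    remainder = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - remainder


def best_split_per_watt(features, labels, power_uw):
    """Return (score, feature name, threshold) maximizing gain / power."""
    best = None
    for name, values in features.items():
        cuts = sorted(set(values))
        for lo, hi in zip(cuts, cuts[1:]):
            thr = 0.5 * (lo + hi)  # candidate threshold between neighbours
            score = information_gain(values, labels, thr) / power_uw[name]
            if best is None or score > best[0]:
                best = (score, name, thr)
    return best


# Toy data: 1 = speech, 0 = non-speech. All numbers are hypothetical.
LABELS = [0, 0, 0, 1, 1, 1]
FEATURES = {
    "af5": [0.1, 0.2, 0.3, 0.7, 0.8, 0.9],  # separates perfectly, but costly
    "af9": [0.1, 0.2, 0.6, 0.4, 0.8, 0.9],  # separates partially, but cheap
}
POWER_UW = {"af5": 2.0, "af9": 0.5}
UNIT_COST = {"af5": 1.0, "af9": 1.0}

power_aware = best_split_per_watt(FEATURES, LABELS, POWER_UW)
power_blind = best_split_per_watt(FEATURES, LABELS, UNIT_COST)
print("power-aware choice:", power_aware[1], "power-blind choice:", power_blind[1])
```

With equal costs the routine picks the perfectly separating feature; once the assumed power costs are included it prefers the cheaper feature, trading some information per node for lower power.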

B. System measurement results

  • The chip is integrated with the microcontroller using external level-shifters and DACs, to form the complete VAD.
  • Fig. 20 shows a one-time calibration to characterize for mismatch in the ADC and DAC paths.
  • This subsection also displays the classification accuracy results for the complete VAD system and illustrates the achieved Power-Proportionality.
  • Receiver operating characteristic (ROC) curves characterize the classifier systems and depict hit-rates (HR) for the variables under observation [24].
  • The power consumption for signal detection is measured to be below 1 µW, whereas power consumption for classification varies depending on complexity of the operating context and has an upper bound of 6 µW.
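The hit rate used in ROC analysis can be computed as sketched below. The scores and labels are hypothetical toy values, not measurement data from the chip; the sketch only shows how hit rate (fraction of speech frames detected) and false-alarm rate (fraction of non-speech frames flagged as speech) trade off as the decision threshold moves.

```python
def hit_and_false_alarm_rate(scores, labels, threshold):
    """Hit rate TP/(TP+FN) and false-alarm rate FP/(FP+TN) at one threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    return tp / (tp + fn), fp / (fp + tn)


def roc_points(scores, labels):
    """(hit rate, false-alarm rate) pairs for thresholds from strict to lax."""
    thresholds = sorted(set(scores), reverse=True)
    return [hit_and_false_alarm_rate(scores, labels, t) for t in thresholds]


# Toy classifier outputs: 1 = speech frame, 0 = non-speech frame.
SCORES = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
LABELS = [1, 1, 0, 1, 0, 0]

hr, far = hit_and_false_alarm_rate(SCORES, LABELS, threshold=0.5)
curve = roc_points(SCORES, LABELS)
print(f"hit rate {hr:.2f}, false-alarm rate {far:.2f}")
```

Sweeping the threshold from strict to lax moves along the curve from few hits and few false alarms toward all hits and all false alarms.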

VI. CONCLUSIONS

  • This work demonstrates a power efficient acoustic sensing frontend for speech/non-speech classification in a voice activity detection system.
  • K. Badami, S. Lauwereins, W. Meert, and M. Verhelst, ‘Context-aware hierarchical information-sensing in a 6μW 90nm CMOS voice activity detector’, 2015 IEEE International Solid-State Circuits Conference - Digest of Technical Papers, 2015.


Citation: Komail Badami, Steven Lauwereins, Wannes Meert, Marian Verhelst (2014), A 90 nm CMOS, 6 μW Power-Proportional Acoustic Sensing Frontend for Voice Activity Detection, IEEE Journal of Solid State Circuits, Vol. 51, Issue 1.
Archived version: Author manuscript: the content is identical to the content of the published paper, but without the final typesetting by the publisher.
Published version: http://ieeexplore.ieee.org/document/7315025/?arnumber=7315025
Journal homepage: http://sscs.ieee.org/en/publications/ieee-journal-of-solid-state-circuits-jssc
Author contact: komail.badami@esat.kuleuven.be
IR: https://lirias.kuleuven.be/handle/123456789/514022

A 90 nm CMOS, 6 μW Power-Proportional Acoustic
Sensing Frontend for Voice Activity Detection
Komail Badami, Steven Lauwereins, Wannes Meert, Marian Verhelst, KU Leuven, Belgium
Abstract – This work presents a sub-6 µW acoustic front-end for speech/non-speech
classification in a voice activity detection (VAD) system in 90 nm CMOS. Power consumption of the
VAD system is minimized by architectural design around a new Power-Proportional sensing
paradigm and the use of machine-learning assisted moderate-precision analog analytics for
classification. Power-Proportional sensing allows for hierarchical and context-aware scaling of
the frontend’s power consumption depending on the complexity of the ongoing information
extraction, while the use of analog analytics brings increased power efficiency through switching
on/off the computation of individual features depending on the features’ usefulness in a
particular context. The proposed VAD system reduces the power consumption by 10X as
compared to state-of-the-art systems and yet achieves an 89% average hit rate for a 12 dB
signal-to-acoustic-noise ratio in babble context, which is on par with software-based VAD systems.
I. INTRODUCTION
Technological innovations are changing the way we interact with electronic devices.
Interactions like voice control and gesture recognition are rapidly gaining popularity. Such
natural interactive systems need not only many integrated sensors, but also always-awake,
reactive sensor frontends. These frontends generate large amounts of raw signals that
state-of-the-art (SotA) frontends immediately digitize for processing on a DSP. This very robust approach is
not power efficient, as not all raw sensor signals are equally relevant. The net information
content of a sensed signal is quite often significantly smaller than the Nyquist rate [1-7]. Existing
works such as Information-Rate processing [1,2], Analog to Information conversion [3-5] and
Compressed Sensing [6,7] show power savings by extracting or compressing the information
from signals before digitizing the data. However, as these schemes operate in a static way, the
compression or extraction parameters are set beforehand. Yet, the information content in raw
signals and its application relevance dynamically varies depending on the operating context.
Operating such systems efficiently hence requires a dynamic system adaptation depending on the
context or signal information content. Existing systems do not perform such fine-grained adaptive
behavior, which severely limits their power savings, as shown by the solid line in Fig. 1.
We propose a self-scalable, Power-Proportional sensing paradigm which gracefully scales the
system’s power consumption with the amount and complexity of extracted information, i.e. the
power consumption for such a system increases only as the task of information extraction gets
more complex. To this end, in this paper we propose key enablers for Power-Proportionality and
apply them to a proof of concept acoustic frontend for voice activity detection (VAD).
VAD systems distinguish speech from non-speech in different background noise contexts for
varying signal to acoustic noise ratios (SANR). SotA VAD systems [8-10] extract complex
features like Mel-Frequency Cepstral Coefficients, DCT etc. to differentiate speech from non-
speech. The high computational complexity of such features results in large power consumption,
typically about 50 - 100 µW [8-11] in addition to the power consumption of the required active
microphone. Such a continuous large power consumption is unacceptable for battery powered
always-on sensor frontends. This work exploits our new Power-Proportional sensing paradigm
along with moderate-precision, computationally-inexpensive, analog feature-extraction, coupled
with an embedded mixed-signal classifier to save more than 10X power consumption over SotA
without compromising on the classification accuracy.
The outline for this paper is as follows. Section II discusses insights into the design principles
for Power-Proportional sensing and explains the rationale behind the analog feature-extraction
instead of the commonly used digital scheme. Section III describes the architecture and
specification set for VAD while the detailed implementation is discussed in Section IV.
Measurement results for the chip and for the full VAD system are discussed in Section V.
II. KEY PRINCIPLES FOR POWER EFFICIENT SENSING
This section details the two key principles that allow our always-on sensing system to scale its
power consumption with the information extracted, saving 10X power over SotA VAD systems.
A. Power-Proportional Sensing
The core premise for Power-Proportional sensing is that power consumption of the sensing
system scales proportionally with the complexity of the sensing task. The sensing process with
the target of information extraction can increase in complexity along two dimensions:
First, the amount of information extracted from the incoming signal can scale in complexity.
Consider, for example, the task of speaker identification versus speech detection. The former task
entails the latter as a prerequisite first step, hence justifying the increase in power consumption.
Enabling hierarchical operation for tasks of increasing complexity allows scaling of power
consumption with complexity of information extraction. In such an architecture each processing
stage extracts more complex information than the previous stage while consuming more power.
This enables information extraction by necessity, as is shown on the horizontal-axis in Fig. 1.
Secondly, even if the amount of extracted information remains the same, distinguishing the
useful information from the background noise (the context) is subject to varying levels of
difficulty. For this case consider the complexity of speech detection in a quiet office, in contrast
to a noisy street environment. The amount of information needed is the same in both cases, but in the
latter case as the background noise maps directly onto the information spectrum, it creates in-
band interference on the desired signal. As such, distinguishing speech from non-speech
becomes more complex, hence justifying the increase in power consumption. Context-awareness
enables Power-Proportional sensing to scale power as the background noise context scales the
complexity of information extraction, as shown in bold in Fig. 1. For the example above,
context-awareness allows the use of a much smaller discriminating feature subset in a low-noise
environment and a relatively larger subset for noisy background contexts, hence scaling power.
SotA sensing systems do not exploit the power scaling opportunity offered by the above
scenarios, and typically operate constantly in full processing mode. This plateaus the on-state
power consumption for SotA sensing systems independent of system utility as shown in Fig. 1.
B. Power Efficiency through Analog Analytics
The Power-Proportional sensing paradigm highlighted in the previous subsection needs
hardware blocks whose power scales with complexity and precision. Such power scaling with
precision is very different for analog and digital implementations. Analog power consumption
scales gradually for thermal-noise-limited systems with low-to-medium precision, while digital
has a logarithmic power versus precision profile. As shown in [12] and in Fig. 2, for a
0.25 µm CMOS technology, analog computation is not only more power-efficient than digital for
low-to-medium resolution processing, but also exhibits better scalability.
Reduction in supply voltage due to technology scaling allows more power-efficient digital
circuits and calls into question the benefit of analog computation in advanced technologies. This is because
with scaling, the cost of maintaining the same precision in analog increases, as a larger bias
current is needed to reduce the noise floor, compensating for the reduction in signal swing.

Citations
Posted Content
TL;DR: Technologies for long-distance interaction with energy-constraint embedded devices are introduced and sleepy strategies for power management for the platform and the transmission are proposed.
Abstract: Long range wireless connectivity opens the door for new IoT applications. Low energy consumption is essential to enable long autonomy of devices powered by batteries or even relying on harvested energy. This paper introduces technologies for long-distance interaction with energy-constraint embedded devices. It proposes sleepy strategies for power management for the platform and the transmission. Adequate signal processing on the remote modules is demonstrated to play a crucial role to sustain the autonomy of these systems. The presented open-source development platform invites the signal processing community to a smooth validation of new algorithms and applications.

3 citations

Proceedings ArticleDOI
01 May 2018
TL;DR: This paper demonstrates an information-aware compressive sensing architecture for dynamic artifact detection of biophysiological signals in wearable applications and can reduce the system power consumption by up to 70% in the more extreme cases of signal corruption.
Abstract: The performance of traditional compressive sensing (CS) architectures has been tempered by dynamically changing real-world data. This paper demonstrates an information-aware compressive sensing (CS) architecture for dynamic artifact detection of biophysiological signals in wearable applications. Artifacts such as long pause, baseline wandering, and saturation often corrupt recorded data due to environmental factors. In wearable applications where power conservation and ultra-low power operation are paramount, this can lead to wasted power. By combining earlier proposed CS based architectures with an efficient analog feature-extraction (FE) and digital decision making, the sampling rate of the ADC and the integration window of the multiplying DAC can be reduced in presence of artifacts to save power. As shown, this technique can reduce the system power consumption by up to 70% in the more extreme cases of signal corruption. A serialized Walsh-Hadamard Transform (WHT) used for FE is proposed that dramatically simplifies the circuit implementation while the digital classifier comprising of quadratic Support Vector Machine (SVM) classifier ensures low power operation with accurate decision outcomes.

2 citations

Proceedings ArticleDOI
16 Jun 2020
TL;DR: This paper proposes a method to design a generic A2F converter usable for several signal types and proposes to use non uniform wavelet sampling in order to extract information for classification task.
Abstract: One of the main challenges in the field of wireless sensors is to increase their battery life. Analog-to-feature (A2F) conversion is an acquisition method thought for IoT devices, that perform classification tasks at sub-Nyquist rate, by extracting relevant features in the analog domain and then performing the classification step in the digital domain. Current A2F solutions are designed for a specific application, this paper proposes a method to design a generic A2F converter usable for several signal types. In order to extract information for classification task, we propose to use non uniform wavelet sampling, its drawback is that it brings redundancy and irrelevant information. To reach our goal of decreasing power consumption, we need to extract a small set of relevant features for classification. To achieve this, several features selection algorithms are tested for electrocardiogram (ECG) anomalies detection. We demonstrate that the detection rate of ECG anomalies can reach 98% with less than 10 features extracted.

2 citations

Journal ArticleDOI
18 Mar 2021-Sensors
TL;DR: In this paper, a switched inductor based acoustic sensor interface featuring a bandpass filter and an envelope detector is presented, which achieves a sensitivity of approximately 2 mV/mV in the passband, a four times lower sensitivity in the stopband, and power consumption of approximately 3.31 µW.
Abstract: With the growing need to understand our surroundings and improved means of sensor manufacturing, the concept of Internet of Things (IoT) is becoming more interesting. To enable continuous monitoring and event detection by IoT, the development of low power sensors and interfaces is required. In this work we present a novel, switched inductor based acoustic sensor interface featuring a bandpass filter and envelope detector, perform a sensitivity, frequency selectivity, and power consumption analysis of the circuit, and present its design parameters and their qualitative influence on circuit characteristics. We develop a prototype and present experimental characterization of the interface and its operation with input signals up to 20 mV peak-to-peak, at low acoustic frequencies from 100 Hz to 1 kHz. The prototype achieves a sensitivity of approximately 2 mV/mV in the passband, a four times lower sensitivity in the stopband, and a power consumption of approximately 3.31 µW. We compare the prototype interface to an interface consisting of an active bandpass filter and a passive voltage doubler using a prerecorded speedboat signal.

2 citations

Book ChapterDOI
01 Jan 2020
TL;DR: This chapter reviews performance trajectories of conventional data converters and discusses opportunities for application- and system-specific customizations toward analog-to-information conversion.
Abstract: Over the past several decades, the symbiotic interplay between technology push and application pull has led to remarkable performance gains in analog-to-digital interfaces. While there appears to be no weakening in application pull, bottom-up innovation has slowed down. A promising remedy is to explore application-specific optimizations based on system and signal insights. Specifically, analog-to-information converters are designed to extract only the most relevant information for the given application from an analog signal towards its digitization. This approach contrasts conventional analog-to-digital interfaces which are typically designed for faithful waveform preservation and subsequent information extraction in the digital domain. In this chapter, we review performance trajectories of conventional data converters and discuss opportunities for application- and system-specific customizations toward analog-to-information conversion.

2 citations

References
Journal ArticleDOI
TL;DR: The purpose of this article is to serve as an introduction to ROC graphs and as a guide for using them in research.

17,017 citations

Journal ArticleDOI
TL;DR: The theory of compressive sampling, also known as compressed sensing or CS, is surveyed, a novel sensing/sampling paradigm that goes against the common wisdom in data acquisition.
Abstract: Conventional approaches to sampling signals or images follow Shannon's theorem: the sampling rate must be at least twice the maximum frequency present in the signal (Nyquist rate). In the field of data conversion, standard analog-to-digital converter (ADC) technology implements the usual quantized Shannon representation - the signal is uniformly sampled at or above the Nyquist rate. This article surveys the theory of compressive sampling, also known as compressed sensing or CS, a novel sensing/sampling paradigm that goes against the common wisdom in data acquisition. CS theory asserts that one can recover certain signals and images from far fewer samples or measurements than traditional methods use.

9,686 citations

Journal ArticleDOI
TL;DR: An effective hang-over scheme which considers the previous observations by a first-order Markov process modeling of speech occurrences is proposed which shows significantly better performances than the G.729B VAD in low signal-to-noise ratio (SNR) and vehicular noise environments.
Abstract: In this letter, we develop a robust voice activity detector (VAD) for the application to variable-rate speech coding. The developed VAD employs the decision-directed parameter estimation method for the likelihood ratio test. In addition, we propose an effective hang-over scheme which considers the previous observations by a first-order Markov process modeling of speech occurrences. According to our simulation results, the proposed VAD shows significantly better performances than the G.729B VAD in low signal-to-noise ratio (SNR) and vehicular noise environments.

1,341 citations

Journal ArticleDOI
TL;DR: A new type of data acquisition system, called a random demodulator, that is constructed from robust, readily available components that supports the empirical observations, and a detailed theoretical analysis of the system's performance is provided.
Abstract: Wideband analog signals push contemporary analog-to-digital conversion (ADC) systems to their performance limits. In many applications, however, sampling at the Nyquist rate is inefficient because the signals of interest contain only a small number of significant frequencies relative to the band limit, although the locations of the frequencies may not be known a priori. For this type of sparse signal, other sampling strategies are possible. This paper describes a new type of data acquisition system, called a random demodulator, that is constructed from robust, readily available components. Let K denote the total number of frequencies in the signal, and let W denote its band limit in hertz. Simulations suggest that the random demodulator requires just O(K log(W/K)) samples per second to stably reconstruct the signal. This sampling rate is exponentially lower than the Nyquist rate of W hertz. In contrast to Nyquist sampling, one must use nonlinear methods, such as convex programming, to recover the signal from the samples taken by the random demodulator. This paper provides a detailed theoretical analysis of the system's performance that supports the empirical observations.

1,138 citations

Journal ArticleDOI
TL;DR: A subjective scale for the measurement of pitch was constructed from determinations of the half-value of pitches at various frequencies as mentioned in this paper, which differs from both the musical scale and the frequency scale, neither of which is subjective.
Abstract: A subjective scale for the measurement of pitch was constructed from determinations of the half‐value of pitches at various frequencies. This scale differs from both the musical scale and the frequency scale, neither of which is subjective. Five observers fractionated tones of 10 different frequencies at a loudness level of 60 db. From these fractionations a numerical scale was constructed which is proportional to the perceived magnitude of subjective pitch. In numbering the scale the 1000‐cycle tone was assigned the pitch of 1000 subjective units (mels). The close agreement of the pitch scale with an integration of the differential thresholds (DL's) shows that, unlike the DL's for loudness, all DL's for pitch are of uniform subjective magnitude. The agreement further implies that pitch and differential sensitivity to pitch are both rectilinear functions of extent on the basilar membrane. The correspondence of the pitch scale and the experimentally determined location of the resonant areas of the basilar membrane suggests that, in cutting a pitch in half, the observer adjusts the tone until it stimulates a position half‐way from the original locus to the apical end of the membrane. Measurement of the subjective size of musical intervals (such as octaves) in terms of the pitch scale shows that the intervals become larger as the frequency of the mid‐point of the interval increases (except in the two highest audible octaves). This result confirms earlier judgments as to the relative size of octaves in different parts of the frequency range.

1,036 citations