
Showing papers in "Journal of the Acoustical Society of America in 2013"


Journal ArticleDOI
TL;DR: DEMAND (Diverse Environments Multi-channel Acoustic Noise Database) provides a set of 16-channel noise files recorded in a variety of indoor and outdoor settings to encourage research into algorithms beyond the stereo setup.
Abstract: Multi-microphone arrays allow for the use of spatial filtering techniques that can greatly improve noise reduction and source separation. However, for speech and audio data, work on noise reduction or separation has focused primarily on one- or two-channel systems. Because of this, databases of multichannel environmental noise are not widely available. DEMAND (Diverse Environments Multi-channel Acoustic Noise Database) addresses this problem by providing a set of 16-channel noise files recorded in a variety of indoor and outdoor settings. The data was recorded using a planar microphone array consisting of four staggered rows, with the smallest distance between microphones being 5 cm and the largest being 21.8 cm. DEMAND is freely available under a Creative Commons license to encourage research into algorithms beyond the stereo setup.

413 citations


Journal ArticleDOI
TL;DR: Testing using normal-hearing and HI listeners indicated that intelligibility increased following processing in all conditions, and increases were larger for HI listeners, for the modulated background, and for the least-favorable SNRs.
Abstract: Despite considerable effort, monaural (single-microphone) algorithms capable of increasing the intelligibility of speech in noise have remained elusive. Successful development of such an algorithm is especially important for hearing-impaired (HI) listeners, given their particular difficulty in noisy backgrounds. In the current study, an algorithm based on binary masking was developed to separate speech from noise. Unlike the ideal binary mask, which requires prior knowledge of the premixed signals, the masks used to segregate speech from noise in the current study were estimated by training the algorithm on speech not used during testing. Sentences were mixed with speech-shaped noise and with babble at various signal-to-noise ratios (SNRs). Testing using normal-hearing and HI listeners indicated that intelligibility increased following processing in all conditions. These increases were larger for HI listeners, for the modulated background, and for the least-favorable SNRs. They were also often substantial, allowing several HI listeners to improve intelligibility from scores near zero to values above 70%.
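The ideal binary mask referenced above has a standard definition: a time-frequency (T-F) unit is kept when its local SNR exceeds a criterion, and discarded otherwise. A minimal numpy sketch of that reference definition (the toy T-F magnitudes below are hypothetical; the study itself estimated the mask with a trained classifier rather than from premixed signals):

```python
import numpy as np

def ideal_binary_mask(speech_tf, noise_tf, lc_db=0.0):
    """Ideal binary mask: 1 where the local speech-to-noise ratio in a
    time-frequency unit exceeds the local criterion lc_db, else 0."""
    eps = 1e-12
    local_snr_db = 10.0 * np.log10((np.abs(speech_tf) ** 2 + eps) /
                                   (np.abs(noise_tf) ** 2 + eps))
    return (local_snr_db > lc_db).astype(float)

# Toy example: 2 frequency bands x 3 time frames (hypothetical magnitudes)
speech = np.array([[1.0, 0.1, 1.0],
                   [0.1, 1.0, 0.1]])
noise = np.full_like(speech, 0.5)
mask = ideal_binary_mask(speech, noise)  # keeps only speech-dominated units
```

Applying the mask to the mixture's T-F representation and resynthesizing yields the segregated speech.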

213 citations


PatentDOI
Ira A. Gerson1
TL;DR: In this paper, a wireless system comprises at least one subscriber unit in wireless communication with an infrastructure; each subscriber unit implements a speech recognition client, and the infrastructure comprises a speech recognition server.
Abstract: A wireless system comprises at least one subscriber unit in wireless communication with an infrastructure. Each subscriber unit implements a speech recognition client, and the infrastructure comprises a speech recognition server. A given subscriber unit takes as input an unencoded speech signal that is subsequently parameterized by the speech recognition client. The parameterized speech is then provided to the speech recognition server that, in turn, performs speech recognition analysis on the parameterized speech. Information signals, based in part upon any recognized utterances identified by the speech recognition analysis, are subsequently provided to the subscriber unit. The information signals may be used to control the subscriber unit itself; to control one or more devices coupled to the subscriber unit, or may be operated upon by the subscriber unit or devices coupled thereto.

191 citations


Journal ArticleDOI
TL;DR: Impaired vowel articulation may be considered a possible early marker of PD; complex tasks such as monologue are more likely to elicit articulatory deficits in parkinsonian speech than other speaking tasks.
Abstract: The purpose of this study was to analyze vowel articulation across various speaking tasks in a group of 20 early Parkinson's disease (PD) individuals prior to pharmacotherapy. Vowels were extracted from sustained phonation, sentence repetition, reading passage, and monologue. Acoustic analysis was based upon measures of the first (F1) and second (F2) formant of the vowels /a/, /i/, and /u/, vowel space area (VSA), F2i/F2u and vowel articulation index (VAI). Parkinsonian speakers manifested abnormalities in vowel articulation across F2u, VSA, F2i/F2u, and VAI in all speaking tasks except sustained phonation, compared to 15 age-matched healthy control participants. Findings suggest that sustained phonation is an inappropriate task to investigate vowel articulation in early PD. In contrast, monologue was the most sensitive in differentiating between controls and PD patients, with classification accuracy up to 80%. Measurements of vowel articulation were able to capture even minor abnormalities in speech of PD patients with no perceptible dysarthria. In conclusion, impaired vowel articulation may be considered as a possible early marker of PD. A certain type of speaking task can exert significant influence on vowel articulation. Specifically, complex tasks such as monologue are more likely to elicit articulatory deficits in parkinsonian speech, compared to other speaking tasks.
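The two articulation metrics named above have commonly used closed forms: the vowel space area (VSA) is the area of the /a/–/i/–/u/ triangle in the F1–F2 plane (shoelace formula), and the vowel articulation index (VAI) is the ratio of formants that rise with clearer articulation to those that fall. A sketch with illustrative, hypothetical formant values:

```python
def vowel_space_area(f1a, f2a, f1i, f2i, f1u, f2u):
    """Triangular vowel space area (Hz^2) from F1/F2 of the corner
    vowels /a/, /i/, /u/, via the shoelace formula."""
    return 0.5 * abs(f1i * (f2a - f2u) + f1a * (f2u - f2i) + f1u * (f2i - f2a))

def vowel_articulation_index(f1a, f2a, f1i, f2i, f1u, f2u):
    """VAI = (F2/i/ + F1/a/) / (F1/i/ + F1/u/ + F2/u/ + F2/a/);
    centralized (less distinct) vowels push the ratio toward lower values."""
    return (f2i + f1a) / (f1i + f1u + f2u + f2a)

# Hypothetical adult-male-like formant values in Hz, for illustration only
vsa = vowel_space_area(f1a=750, f2a=1300, f1i=300, f2i=2300, f1u=350, f2u=900)
vai = vowel_articulation_index(750, 1300, 300, 2300, 350, 900)
```

Vowel centralization in dysarthria shrinks the VSA and pulls the VAI toward 1 from above, which is why these measures can index articulatory decline.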

189 citations


Journal ArticleDOI
TL;DR: The smallest detectable interaural time difference (ITD) for sine tones was measured for four human listeners to determine the dependence on tone frequency.
Abstract: The smallest detectable interaural time difference (ITD) for sine tones was measured for four human listeners to determine the dependence on tone frequency. At low frequencies, 250–700 Hz, threshold ITDs were approximately inversely proportional to tone frequency. At mid-frequencies, 700–1000 Hz, threshold ITDs were smallest. At high frequencies, above 1000 Hz, thresholds increased faster than exponentially with increasing frequency, becoming unmeasurably high just above 1400 Hz. A model for ITD detection began with a biophysically based computational model for a medial superior olive (MSO) neuron that produced robust ITD responses up to 1000 Hz, and demonstrated a dramatic reduction in ITD-dependence from 1000 to 1500 Hz. Rate-ITD functions from the MSO model became inputs to binaural display models—both place based and rate-difference based. A place-based, centroid model with a rigid internal threshold reproduced almost all features of the human data. A signal-detection version of this model reproduced the high-frequency divergence but badly underestimated low-frequency thresholds. A rate-difference model incorporating fast contralateral inhibition reproduced the major features of the human threshold data except for the divergence. A combined, hybrid model could reproduce all the threshold data.

157 citations


Journal ArticleDOI
TL;DR: The multi-resolution sEPSM is demonstrated to account for intelligibility obtained in conditions with stationary and fluctuating interferers, and noisy speech distorted by reverberation or spectral subtraction, and support the hypothesis that the SNRenv is a powerful objective metric for speech intelligibility prediction.
Abstract: The speech-based envelope power spectrum model (sEPSM) presented by Jorgensen and Dau [(2011). J. Acoust. Soc. Am. 130, 1475–1487] estimates the envelope power signal-to-noise ratio (SNRenv) after modulation-frequency selective processing. Changes in this metric were shown to account well for changes of speech intelligibility for normal-hearing listeners in conditions with additive stationary noise, reverberation, and nonlinear processing with spectral subtraction. In the latter condition, the standardized speech transmission index [(2003). IEC 60268-16] fails. However, the sEPSM is limited to conditions with stationary interferers, due to the long-term integration of the envelope power, and cannot account for the increased intelligibility typically obtained with fluctuating maskers. Here, a multi-resolution version of the sEPSM is presented in which the SNRenv is estimated in temporal segments with a modulation-filter dependent duration. The multi-resolution sEPSM is demonstrated to account for intelligibility obtained in conditions with stationary and fluctuating interferers, and for noisy speech distorted by reverberation or spectral subtraction, supporting the hypothesis that the SNRenv is a powerful objective metric for speech intelligibility prediction.
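The core of the SNRenv idea can be illustrated in heavily simplified form: the envelope power of the noisy mixture in excess of the noise-alone envelope power, each normalized by the envelope's DC component. The sketch below is a single-band caricature under stated assumptions (the actual sEPSM uses Hilbert envelopes, a modulation filterbank, and, in the multi-resolution version, segment durations tied to each modulation filter):

```python
import numpy as np

def env_power(x):
    """AC-coupled envelope power, normalized by the squared envelope mean (DC).
    Uses |x| as a crude envelope stand-in for this illustration."""
    env = np.abs(x)
    return np.var(env) / (np.mean(env) ** 2 + 1e-12)

def snr_env(noisy_speech, noise):
    """Envelope-power SNR: mixture envelope power in excess of the noise floor."""
    p_mix = env_power(noisy_speech)
    p_noise = env_power(noise)
    return max(p_mix - p_noise, 1e-12) / (p_noise + 1e-12)

# Toy signals: a 4 Hz modulated envelope (speech-like) vs a flat,
# stationary-noise-like envelope
t = np.arange(0, 1, 1 / 1000)
mixture = 1.0 + 0.5 * np.sin(2 * np.pi * 4 * t)
noise = np.ones_like(t)
ratio = snr_env(mixture, noise)  # large: only the mixture carries modulation
```

Intelligibility prediction then maps SNRenv (accumulated across modulation and audio bands) to a percent-correct score.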

152 citations


Journal ArticleDOI
TL;DR: A theoretical approach is developed to calculate the radiation force of an arbitrary acoustic beam on an elastic sphere in a liquid or gas medium by employing conventional angular spectrum decomposition to derive expressions for components of the radiation stress tensor.
Abstract: A theoretical approach is developed to calculate the radiation force of an arbitrary acoustic beam on an elastic sphere in a liquid or gas medium. First, the incident beam is described as a sum of plane waves by employing conventional angular spectrum decomposition. Then, the classical solution for the scattering of a plane wave from an elastic sphere is applied for each plane-wave component of the incident field. The net scattered field is expressed as a superposition of the scattered fields from all angular spectrum components of the incident beam. With this formulation, the incident and scattered waves are superposed in the far field to derive expressions for components of the radiation stress tensor. These expressions are then integrated over a spherical surface to analytically describe the radiation force on an elastic sphere. Limiting cases for particular types of incident beams are presented and are shown to agree with known results. Finally, the analytical expressions are used to calculate radiation forces associated with two specific focusing transducers.

143 citations


Journal ArticleDOI
TL;DR: Recognition for natural Institute of Electrical and Electronics Engineers (IEEE) sentences was measured in normal-hearing adults at two fixed signal-to-noise ratios (SNRs) in 16 backgrounds with the same long-term spectrum, and natural speech was always the most effective masker for a given number of talkers.
Abstract: Some of the most common interfering background sounds a listener experiences are the sounds of other talkers. In Experiment 1, recognition for natural Institute of Electrical and Electronics Engineers (IEEE) sentences was measured in normal-hearing adults at two fixed signal-to-noise ratios (SNRs) in 16 backgrounds with the same long-term spectrum: unprocessed speech babble (1, 2, 4, 8, and 16 talkers), noise-vocoded versions of the babbles (12 channels), noise modulated with the wide-band envelope of the speech babbles, and unmodulated noise. All talkers were adult males. For a given number of talkers, natural speech was always the most effective masker. The greatest changes in performance occurred as the number of talkers in the maskers increased from 1 to 2 or 4, with small changes thereafter. In Experiment 2, the same targets and maskers (1, 2, and 16 talkers) were used to measure speech reception thresholds (SRTs) adaptively. Periodicity in the target was also manipulated by noise-vocoding, which led to considerably higher SRTs. The greatest masking effect always occurred for the masker type most similar to the target, while the effects of the number of talkers were generally small. Implications are drawn with reference to glimpsing, informational vs energetic masking, overall SNR, and aspects of periodicity.

143 citations


Journal ArticleDOI
TL;DR: A speech-in-noise test which uses digit triplets in steady-state speech noise was developed, and the feasibility of the test was confirmed in a study where reference SRT values were gathered in a representative set of 1386 listeners over 60 years of age.
Abstract: A speech-in-noise test which uses digit triplets in steady-state speech noise was developed. The test measures primarily the auditory, or bottom-up, speech recognition abilities in noise. Digit triplets were formed by concatenating single digits spoken by a male speaker. Level corrections were made to individual digits to create a set of homogeneous digit triplets with steep speech recognition functions. The test measures the speech reception threshold (SRT) in long-term average speech-spectrum noise via a 1-up, 1-down adaptive procedure with a measurement error of 0.7 dB. One training list is needed for naive listeners. No further learning effects were observed in 24 subsequent SRT measurements. The test was validated by comparing results on the test with results on the standard sentences-in-noise test. To avoid the confounding of hearing loss, age, and linguistic skills, these measurements were performed in normal-hearing subjects with simulated hearing loss. The signals were spectrally smeared and/or low-pass filtered at varying cutoff frequencies. After correction for measurement error the correlation coefficient between SRTs measured with both tests equaled 0.96. Finally, the feasibility of the test was confirmed in a study where reference SRT values were gathered in a representative set of 1386 listeners over 60 years of age.
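The 1-up, 1-down adaptive procedure mentioned above is a simple staircase that converges on the 50%-correct point of the psychometric function. A minimal sketch (the trial outcomes and the 2 dB step size are hypothetical; the reversal-averaging convention at the end is one common way to estimate the SRT, not necessarily this study's exact rule):

```python
def one_up_one_down(responses, start_snr=0.0, step=2.0):
    """1-up, 1-down adaptive track: the SNR drops by `step` dB after a
    correct response and rises by `step` dB after an error, so the track
    hovers around the 50%-correct speech reception threshold (SRT)."""
    snr = start_snr
    track = [snr]
    for correct in responses:
        snr += -step if correct else step
        track.append(snr)
    return track

# Hypothetical trial outcomes (True = digit triplet repeated correctly)
track = one_up_one_down([True, True, False, True, False, False, True])
# A common SRT estimate: average the SNRs at the track's reversal points
```

With enough trials the mean presentation level stabilizes near the SRT, which is why a small number of triplet lists suffices for a 0.7 dB measurement error.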

142 citations


Journal ArticleDOI
TL;DR: It was revealed that the perceptual dimensions of the environment were different from the noise levels, and the acoustic comfort factor related to soundscape quality considerably influenced preference for the overall environment at a higher level of road traffic noise.
Abstract: The aim of this study is to investigate the effect of audio-visual components on environmental quality to improve soundscape. Natural sounds with road traffic noise and visual components in urban streets were evaluated through laboratory experiments. Waterfall and stream water sounds, as well as bird sounds, were selected to enhance the soundscape. Sixteen photomontages of a streetscape were constructed in combination with two types of water features and three types of vegetation, which were chosen as positive visual components. The experiments consisted of audio-only, visual-only, and audio-visual conditions. The preferences and environmental qualities of the stimuli were evaluated by a numerical scale and 12 pairs of adjectives, respectively. The results showed that bird sounds were the most preferred among the natural sounds, while the sound of falling water was found to degrade the soundscape quality when the road traffic noise level was high. The visual effects of vegetation on aesthetic preference were significant, but those of water features were relatively small. It was revealed that the perceptual dimensions of the environment were different from the noise levels. Particularly, the acoustic comfort factor related to soundscape quality considerably influenced preference for the overall environment at a higher level of road traffic noise.

142 citations


Journal ArticleDOI
TL;DR: These findings suggest that generalization of foreign-accent adaptation results from exposure to systematic variability in accented speech that is similar across talkers from multiple language backgrounds.
Abstract: Foreign-accented speech can be difficult to understand but listeners can adapt to novel talkers and accents with appropriate experience. Previous studies have demonstrated talker-independent but accent-dependent learning after training on multiple talkers from a single language background. Here, listeners instead were exposed to talkers from five language backgrounds during training. After training, listeners generalized their learning to novel talkers from language backgrounds both included and not included in the training set. These findings suggest that generalization of foreign-accent adaptation is the result of exposure to systematic variability in accented speech that is similar across talkers from multiple language backgrounds.

Journal ArticleDOI
TL;DR: Comparison of CI and normal hearing listeners showed that the CI data were best modeled by a vocoder using Gaussian-pulsed tones with 1.5 mm bandwidth, suggesting that interaural matching of electrodes is important for binaural cues to be maximally effective.
Abstract: Bilateral cochlear implants (CIs) have provided some success in improving spatial hearing abilities to patients, but with large variability in performance. One reason for the variability is that there may be a mismatch in the place-of-stimulation arising from electrode arrays being inserted at different depths in each cochlea. Goupell et al. [(2013b). J. Acoust. Soc. Am. 133(4), 2272–2287] showed that increasing interaural mismatch led to non-fused auditory images and poor lateralization of interaural time differences in normal hearing subjects listening to a vocoder. However, a greater bandwidth of activation helped mitigate these effects. In the present study, the same experiments were conducted in post-lingually deafened bilateral CI users with deliberate and controlled interaural mismatch of single electrode pairs. Results show that lateralization was still possible with up to 3 mm of interaural mismatch, even when off-center, or multiple, auditory images were perceived. However, mismatched inputs are not ideal since they lead to a distorted auditory spatial map. Comparison of CI and normal hearing listeners showed that the CI data were best modeled by a vocoder using Gaussian-pulsed tones with 1.5 mm bandwidth. These results suggest that interaural matching of electrodes is important for binaural cues to be maximally effective.

Journal ArticleDOI
TL;DR: With the theory of error propagation of uncertainties it can be shown that prediction of reverberation times with accuracy better than the just noticeable difference requires input data of a quality that is not available from reverberation room measurements.
Abstract: Geometrical acoustics is used as a standard model for room acoustic design and consulting. Research on room acoustic simulation focuses on a more accurate modeling of propagation effects such as diffraction and other wave effects in rooms, and on scattering. Much progress was made in this field, so that wave models (for example, the boundary element method and finite differences in the time domain) can now also be used for higher frequencies. The concepts and implementations of room simulation methods are briefly reviewed. After all, simulations in architectural acoustics are indeed powerful tools, but their reliability depends on the skills of the operator, who has to create an adequate polygon model and has to choose the correct input data of boundary conditions such as absorption and scattering. Very little is known about the uncertainty of this input data. With the theory of error propagation of uncertainties it can be shown that prediction of reverberation times with accuracy better than the just noticeable difference requires input data of a quality that is not available from reverberation room measurements.
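The error-propagation argument can be made concrete with the classical Sabine equation. A sketch under stated assumptions (the room dimensions and the ±0.05 uncertainty on the mean absorption coefficient are hypothetical; first-order propagation through the single variable alpha stands in for the paper's fuller analysis):

```python
def sabine_rt(volume_m3, surface_m2, alpha):
    """Sabine reverberation time T = 0.161 * V / (S * alpha), in seconds."""
    return 0.161 * volume_m3 / (surface_m2 * alpha)

def rt_uncertainty(volume_m3, surface_m2, alpha, sigma_alpha):
    """First-order error propagation in alpha only:
    sigma_T = |dT/dalpha| * sigma_alpha = T * sigma_alpha / alpha."""
    t = sabine_rt(volume_m3, surface_m2, alpha)
    return t * sigma_alpha / alpha

# Hypothetical hall: 5000 m^3, 2000 m^2 of surface, mean absorption 0.25 +/- 0.05
t = sabine_rt(5000, 2000, 0.25)              # 1.61 s
dt = rt_uncertainty(5000, 2000, 0.25, 0.05)  # ~0.32 s
```

Here a plausible absorption uncertainty yields a reverberation-time uncertainty of roughly 20%, far above the commonly cited ~5% just noticeable difference, which is the paper's point: the output can never be more certain than the boundary data feeding it.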

Journal ArticleDOI
TL;DR: It is shown that low frequency performance can be significantly improved by embedding periodically arranged resonant inclusions (slotted cylinders) into the porous matrix.
Abstract: The aim of this work is to design a layer of porous material with a high value of the absorption coefficient in a wide range of frequencies. It is shown that low frequency performance can be significantly improved by embedding periodically arranged resonant inclusions (slotted cylinders) into the porous matrix. The dissipation of the acoustic energy in a porous material due to viscous and thermal losses inside the pores is enhanced by the low frequency resonances of the inclusions and energy trapping between the inclusion and the rigid backing. A parametric study is performed in order to determine the influence of the geometry and the arrangement of the inclusions embedded in a porous layer on the absorption coefficient. The experiments confirm that the low-frequency absorption coefficient of the composite material is significantly higher than that of the porous layer without the inclusions.

Journal ArticleDOI
TL;DR: Based on the hypothesis that brainstem encoding of the temporal envelope is greater in humans with sensorineural hearing loss, speech-evoked brainstem responses were recorded in normal hearing and hearing impaired age-matched groups of older adults and there was a disruption in the balance of envelope-to-fine structure representation.
Abstract: Individuals with sensorineural hearing loss often report frustration with speech being loud but not clear, especially in background noise. Despite advanced digital technology, hearing aid users may resort to removing their hearing aids in noisy environments due to the perception of excessive loudness. In an animal model, sensorineural hearing loss results in greater auditory nerve coding of the stimulus envelope, leading to a relative deficit of stimulus fine structure. Based on the hypothesis that brainstem encoding of the temporal envelope is greater in humans with sensorineural hearing loss, speech-evoked brainstem responses were recorded in normal hearing and hearing impaired age-matched groups of older adults. In the hearing impaired group, there was a disruption in the balance of envelope-to-fine structure representation compared to that of the normal hearing group. This imbalance may underlie the difficulty experienced by individuals with sensorineural hearing loss when trying to understand speech in background noise. This finding advances the understanding of the effects of sensorineural hearing loss on central auditory processing of speech in humans. Moreover, this finding has clinical potential for developing new amplification or implantation technologies, and in developing new training regimens to address this relative deficit of fine structure representation.

Journal ArticleDOI
TL;DR: This paper presents a methodology referred to as sparse wavenumber analysis based on sparse recovery methods, which accurately recovers the Lamb wave's frequency-wavenumber representation with a limited number of surface mounted transducers.
Abstract: Guided waves in plates, known as Lamb waves, are characterized by complex, multimodal, and frequency dispersive wave propagation, which distort signals and make their analysis difficult. Estimating these multimodal and dispersive characteristics from experimental data becomes a difficult, underdetermined inverse problem. To accurately and robustly recover these multimodal and dispersive properties, this paper presents a methodology referred to as sparse wavenumber analysis based on sparse recovery methods. By utilizing a general model for Lamb waves, waves propagating in a plate structure, and robust l1 optimization strategies, sparse wavenumber analysis accurately recovers the Lamb wave's frequency-wavenumber representation with a limited number of surface mounted transducers. This is demonstrated with both simulated and experimental data in the presence of multipath reflections. With accurate frequency-wavenumber representations, sparse wavenumber synthesis is then used to accurately remove multipath interference in each measurement and predict the responses between arbitrary points on a plate.

Journal ArticleDOI
TL;DR: Numerical results provide an impetus for further designing acoustical tweezers for potential applications in particle entrapment and remote controlled manipulation.
Abstract: This work aims to model the acoustic radiation forces acting on an elastic sphere placed in an inviscid fluid. An expression of the axial and transverse forces exerted on the sphere is derived. The analysis is based on the scattering of an arbitrary acoustic field expanded in the spherical coordinate system centered on the spherical scatterer. The sphere is allowed to be arbitrarily located. The special case of high order Bessel beams, acoustical vortices, is considered. These types of beams have a helicoidal wave front, i.e., a screw-type phase singularity, and hence the beam has a central dark core of zero amplitude surrounded by an intense ring. Depending on the sphere's radius, different radial equilibrium positions may exist and the sphere can be set in rotation around the beam axis by an azimuthal force. This confirms the pseudo-angular momentum transfer from the beam to the sphere. Cases where the axial force is directed opposite to the direction of the beam propagation are investigated and the potential use of Bessel beams as tractor beams is demonstrated. Numerical results provide an impetus for further designing acoustical tweezers for potential applications in particle entrapment and remote controlled manipulation.

Journal ArticleDOI
TL;DR: Results demonstrate the efficacy of all three methods by producing very sparse indications of damage at the correct locations even in the presence of model mismatch and significant noise.
Abstract: Ultrasonic guided waves are gaining acceptance for structural health monitoring and nondestructive evaluation of plate-like structures. One configuration of interest is a spatially distributed array of fixed piezoelectric devices. Typical operation consists of recording signals from all transmit-receive pairs and subtracting pre-recorded baselines to detect changes, possibly due to damage or other effects. While techniques such as delay-and-sum imaging as applied to differential signals are both simple and capable of detecting flaws, their performance is limited, particularly when there are multiple damage sites. Here a very different approach to imaging is considered that exploits the expected sparsity of structural damage; i.e., the structure is mostly damage-free. Differential signals are decomposed into a sparse linear combination of location-based components, which are pre-computed from a simple propagation model. The sparse reconstruction techniques of basis pursuit denoising and orthogonal matching pursuit are applied to achieve this decomposition, and a hybrid reconstruction method is also proposed and evaluated. Noisy simulated data and experimental data recorded on an aluminum plate with artificial damage are considered. Results demonstrate the efficacy of all three methods by producing very sparse indications of damage at the correct locations even in the presence of model mismatch and significant noise.
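Orthogonal matching pursuit, one of the sparse reconstruction techniques named above, greedily builds the sparse decomposition one component at a time. A generic sketch (the random Gaussian dictionary below is a hypothetical stand-in for the paper's pre-computed, location-based components):

```python
import numpy as np

def omp(dictionary, signal, n_nonzero):
    """Orthogonal matching pursuit: repeatedly pick the dictionary column most
    correlated with the residual, then re-fit all selected coefficients by
    least squares so the residual stays orthogonal to the chosen columns."""
    residual = signal.astype(float).copy()
    support = []
    coeffs = np.zeros(dictionary.shape[1])
    for _ in range(n_nonzero):
        correlations = dictionary.T @ residual
        support.append(int(np.argmax(np.abs(correlations))))
        sub = dictionary[:, support]
        sol, *_ = np.linalg.lstsq(sub, signal, rcond=None)
        residual = signal - sub @ sol
    coeffs[support] = sol
    return coeffs

# Toy problem: unit-norm random dictionary; the "damage" uses columns 0 and 2
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 8))
D /= np.linalg.norm(D, axis=0)
x_true = np.zeros(8)
x_true[0], x_true[2] = 1.0, -0.5
y = D @ x_true                 # differential signal (noise-free here)
x_hat = omp(D, y, n_nonzero=2)  # sparse indication: at most 2 active locations
```

In the imaging application, each nonzero coefficient of `x_hat` corresponds to a candidate damage location, which is what yields the very sparse damage maps the abstract describes.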

Journal ArticleDOI
TL;DR: To avoid confounds inherent in standard spectral ripple tests, a modified test with dynamically changing ripples, the spectral-temporally modulated ripple test (SMRT), was created; this paper describes the SMRT and provides evidence that it is sensitive to changes in spectral resolution.
Abstract: Poor spectral resolution can be a limiting factor for hearing impaired listeners, particularly for complex listening tasks such as speech understanding in noise. Spectral ripple tests are commonly used to measure spectral resolution, but these tests contain a number of potential confounds that can make interpretation of the results difficult. To measure spectral resolution while avoiding those confounds, a modified spectral ripple test with dynamically changing ripples was created, referred to as the spectral-temporally modulated ripple test (SMRT). This paper describes the SMRT and provides evidence that it is sensitive to changes in spectral resolution.

Journal ArticleDOI
TL;DR: Results show that the FFRENV is dominated by peripheral auditory channels responding to unresolved harmonics, although low-frequency channels driven by resolved harmonics also contribute, demonstrating the utility of the PLV for quantifying the strength of FFRENV across conditions.
Abstract: Two experiments, both presenting diotic, harmonic tone complexes (100 Hz fundamental), were conducted to explore the envelope-related component of the frequency-following response (FFRENV), a measure of synchronous, subcortical neural activity evoked by a periodic acoustic input. Experiment 1 directly compared two common analysis methods, computing the magnitude spectrum and the phase-locking value (PLV). Bootstrapping identified which FFRENV frequency components were statistically above the noise floor for each metric and quantified the statistical power of the approaches. Across listeners and conditions, the two methods produced highly correlated results. However, PLV analysis required fewer processing stages to produce readily interpretable results. Moreover, at the fundamental frequency of the input, PLVs were farther above the metric's noise floor than spectral magnitudes. Having established the advantages of PLV analysis, the efficacy of the approach was further demonstrated by investigating how different acoustic frequencies contribute to FFRENV, analyzing responses to complex tones composed of different acoustic harmonics of 100 Hz (Experiment 2). Results show that the FFRENV response is dominated by peripheral auditory channels responding to unresolved harmonics, although low-frequency channels driven by resolved harmonics also contribute. These results demonstrate the utility of the PLV for quantifying the strength of FFRENV across conditions.
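The phase-locking value compared above has a standard definition: the magnitude of the mean unit phasor of the response phase across trials, at a frequency of interest. A minimal sketch on simulated FFR-like trials (the 100 Hz tone-plus-noise trials are hypothetical, not the study's recordings):

```python
import numpy as np

def phase_locking_value(trials, fs, freq):
    """PLV at one frequency: |mean over trials of exp(i * phase)|, where each
    trial's phase comes from the DFT bin nearest `freq`. PLV is 1 for
    perfectly consistent phase across trials and near 0 for random phase."""
    trials = np.asarray(trials, dtype=float)
    n = trials.shape[1]
    bin_idx = int(round(freq * n / fs))
    phases = np.angle(np.fft.rfft(trials, axis=1)[:, bin_idx])
    return np.abs(np.mean(np.exp(1j * phases)))

# Simulated trials: a 100 Hz component with consistent phase, plus noise
fs, n, n_trials = 1000, 1000, 50
t = np.arange(n) / fs
rng = np.random.default_rng(1)
trials = [np.sin(2 * np.pi * 100 * t) + 0.5 * rng.standard_normal(n)
          for _ in range(n_trials)]
plv = phase_locking_value(trials, fs, 100.0)  # close to 1: phase-locked
```

Because the PLV discards per-trial amplitude and keeps only phase consistency, its noise floor behaves differently from that of the averaged magnitude spectrum, which is the comparison at issue in Experiment 1.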

Journal ArticleDOI
TL;DR: Habitat modeling with acoustic detections should give further insights into how niches and prey may have shaped species-specific FM pulse types.
Abstract: Beaked whale echolocation signals are mostly frequency-modulated (FM) upsweep pulses and appear to be species specific. Evolutionary processes of niche separation may have driven differentiation of beaked whale signals used for spatial orientation and foraging. FM pulses of eight species of beaked whales were identified, as well as five distinct pulse types of unknown species, but presumed to be from beaked whales. Current evidence suggests these five distinct but unidentified FM pulse types are also species-specific and are each produced by a separate species. There may be a relationship between adult body length and center frequency, with smaller whales producing higher frequency signals. This could be due to anatomical and physiological constraints, or it could be an evolutionary adaptation allowing smaller whales to detect smaller prey with higher resolution using higher frequencies. The disadvantage of higher frequencies is a shorter detection range. Whales echolocating with the highest-frequency, or broadband, likely lower-source-level signals also use a higher repetition rate, which might compensate for the shorter detection range. Habitat modeling with acoustic detections should give further insights into how niches and prey may have shaped species-specific FM pulse types.

Journal ArticleDOI
TL;DR: A hardware and software system was developed to detect, classify, and report 14 call types produced by 4 species of baleen whales in real time from ocean gliders, accompanied by real-time acoustic detections of the same species by the glider within ±12 h of the sighting time.
Abstract: In the past decade, much progress has been made in real-time passive acoustic monitoring of marine mammal occurrence and distribution from autonomous platforms (e.g., gliders, floats, buoys), but current systems focus primarily on a single call type produced by a single species, often from a single location. A hardware and software system was developed to detect, classify, and report 14 call types produced by 4 species of baleen whales in real time from ocean gliders. During a 3-week deployment in the central Gulf of Maine in late November and early December 2012, two gliders reported over 25 000 acoustic detections attributed to fin, humpback, sei, and right whales. The overall false detection rate for individual calls was 14%, and for right, humpback, and fin whales, false predictions of occurrence during 15-min reporting periods were 5% or less. Transmitted pitch tracks—compact representations of sounds—allowed unambiguous identification of both humpback and fin whale song. Of the ten cases when whales were sighted during aerial or shipboard surveys and a glider was within 20 km of the sighting location, nine were accompanied by real-time acoustic detections of the same species by the glider within ±12 h of the sighting time.

Journal ArticleDOI
TL;DR: The envelope-based metrics are used to characterize cross-linguistic differences in speech rhythm in the UC San Diego Speech Lab corpus of English, German, Greek, Italian, Korean, and Spanish speech elicit in read sentences, read passages, and spontaneous speech.
Abstract: This study presents a method for analyzing speech rhythm using empirical mode decomposition of the speech amplitude envelope, which allows for extraction and quantification of syllabic- and supra-syllabic time-scale components of the envelope. The method of empirical mode decomposition of a vocalic energy amplitude envelope is illustrated in detail, and several types of rhythm metrics derived from this method are presented. Spontaneous speech extracted from the Buckeye Corpus is used to assess the effect of utterance length on metrics, and it is shown how metrics representing variability in the supra-syllabic time-scale components of the envelope can be used to identify stretches of speech with targeted rhythmic characteristics. Furthermore, the envelope-based metrics are used to characterize cross-linguistic differences in speech rhythm in the UC San Diego Speech Lab corpus of English, German, Greek, Italian, Korean, and Spanish speech elicited in read sentences, read passages, and spontaneous speech. The envelope-based metrics exhibit significant effects of language and elicitation method that argue for a nuanced view of cross-linguistic rhythm patterns.
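The core operation underlying this kind of analysis, the sifting step of empirical mode decomposition, can be sketched roughly as follows. This is a minimal toy implementation (fixed sift count, no stopping criterion, simple spline envelopes), not the paper's method, applied to a synthetic "envelope" with a fast syllabic-scale oscillation and a slow supra-syllabic drift.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def sift_once(x, t):
    """One EMD sifting pass: subtract the mean of the upper and
    lower extrema envelopes (cubic-spline interpolated)."""
    maxima = argrelextrema(x, np.greater)[0]
    minima = argrelextrema(x, np.less)[0]
    if len(maxima) < 2 or len(minima) < 2:
        return x  # too few extrema to form envelopes
    upper = CubicSpline(t[maxima], x[maxima])(t)
    lower = CubicSpline(t[minima], x[minima])(t)
    return x - 0.5 * (upper + lower)

def first_imf(x, t, n_sifts=5):
    """Crude first intrinsic mode function: a fixed number of sifting
    passes (a real implementation uses a convergence criterion)."""
    h = x.copy()
    for _ in range(n_sifts):
        h = sift_once(h, t)
    return h

# toy amplitude envelope: a 5 Hz "syllabic" oscillation riding on
# a slow 0.5 Hz "supra-syllabic" drift
t = np.linspace(0.0, 2.0, 2000)
env = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 0.5 * t)
imf1 = first_imf(env, t)   # ~ syllabic-scale component
residue = env - imf1       # ~ supra-syllabic component
```

Rhythm metrics of the kind described in the abstract would then be computed from the variability of the extracted components (`imf1`, `residue`).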

Journal ArticleDOI
TL;DR: The results show that the individual soundwalk procedure has advantages for measuring diverse subjective responses and for obtaining the perceived elements of the urban soundscape.
Abstract: This study proposes a soundwalk procedure for evaluating urban soundscapes. Previous studies, which adopted soundwalk methodologies for investigating participants' responses to visual and acoustic environments, were analyzed considering type, evaluation position, measurement, and subjective assessment. An individual soundwalk procedure was then developed based on asking individual subjects to walk and select evaluation positions where they perceived any positive or negative characteristics of the urban soundscape. A case study was performed in urban spaces and the results were compared with those of the group soundwalk to validate the individual soundwalk procedure. Thirty subjects (15 architects and 15 acousticians) participated in the soundwalk. During the soundwalk, the subjects selected a total of 196 positions, and those were classified into 4 groups. It was found that soundscape perceptions were dominated by acoustic comfort, visual images, and openness. It was also revealed that perceived elements of the acoustic environment and visual image differed across classified soundscape groups, and there was a difference between architects and acousticians in terms of how they described their impressions of the soundscape elements. The results show that the individual soundwalk procedure has advantages for measuring diverse subjective responses and for obtaining the perceived elements of the urban soundscape.

Journal ArticleDOI
TL;DR: Effects of interaural frequency mismatch on binaural processing were studied in normal-hearing (NH) listeners using band-limited pulse trains, thereby avoiding confounding factors that may occur in CI users.
Abstract: Although bilateral cochlear implantation has the potential to improve sound localization and speech understanding in noise, obstacles exist in presenting maximally useful binaural information to bilateral cochlear-implant (CI) users. One obstacle is that electrode arrays may differ in cochlear position by several millimeters, thereby stimulating different neural populations. Effects of interaural frequency mismatch on binaural processing were studied in normal-hearing (NH) listeners using band-limited pulse trains, thereby avoiding confounding factors that may occur in CI users. In experiment 1, binaural image fusion was measured to capture perceptual number, location, and compactness. Subjects heard a single, compact image on 73% of the trials. In experiment 2, intracranial image location was measured for different interaural time differences (ITDs) and interaural level differences (ILDs). For larger mismatch, locations perceptually shifted towards the ear with the higher carrier frequency. In experiment 3, ITD and ILD just-noticeable differences (JNDs) were measured. JNDs increased with decreasing bandwidth and increasing mismatch, but were always measurable up to 3 mm of mismatch. If binaural-hearing mechanisms are similar between NH and CI subjects, these results may explain reduced sensitivity to ITDs and ILDs in CI users. Large mismatches may lead to distorted spatial maps and reduced binaural image fusion.
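A band-limited pulse train with an imposed ITD, of the general kind used as stimuli in such experiments, can be sketched as follows. All parameter choices, the frequency-domain brick-wall filter, and the whole-sample circular delay are simplifying assumptions, not the study's exact signal generation.

```python
import numpy as np

def pulse_train_with_itd(fs, dur, rate_hz, carrier_hz, bw_hz, itd_s):
    """Band-limited pulse train, delayed in one ear by an ITD.
    Band-limiting is a frequency-domain brick-wall filter; the ITD
    is a circular whole-sample shift -- both are simplifications."""
    n = int(dur * fs)
    clicks = np.zeros(n)
    clicks[::int(fs / rate_hz)] = 1.0          # periodic click train
    spec = np.fft.rfft(clicks)
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    band = np.abs(freqs - carrier_hz) <= bw_hz / 2
    left = np.fft.irfft(spec * band, n)        # band-limited signal
    shift = int(round(itd_s * fs))
    right = np.roll(left, shift)               # ITD as sample delay
    return left, right

# e.g., a 100 pps train centered at 4 kHz, 1 kHz wide, 500 us ITD
left, right = pulse_train_with_itd(48000, 0.5, 100, 4000, 1000, 500e-6)
```

Narrowing `bw_hz` or shifting `carrier_hz` between the two ears would then mimic the bandwidth and interaural frequency mismatch manipulations described above.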

Journal ArticleDOI
TL;DR: Methods for the fully automatic detection and species classification of odontocete whistles are described and a classifier has been developed specifically to work with fragmented whistle detections.
Abstract: Methods for the fully automatic detection and species classification of odontocete whistles are described. The detector applies a number of noise cancellation techniques to a spectrogram of sound data and then searches for connected regions of data which rise above a pre-determined threshold. When tested on a dataset of recordings which had been carefully annotated by a human operator, the detector was able to detect (recall) 79.6% of human identified sounds that had a signal-to-noise ratio above 10 dB, with 88% of the detections being valid. A significant problem with automatic detectors is that they tend to partially detect whistles or break whistles into several parts. A classifier has been developed specifically to work with fragmented whistle detections. By accumulating statistics over many whistle fragments, correct classification rates of over 94% have been achieved for four species. The success rate is, however, heavily dependent on the number of species included in the classifier mix, with the me...
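The detection scheme described here (noise-normalize a spectrogram, threshold it, then find connected regions) can be sketched roughly as follows. The median-based noise floor, the despeckling step, and all parameter values are illustrative assumptions, not the paper's actual pipeline.

```python
import numpy as np
from scipy.signal import spectrogram
from scipy.ndimage import label, median_filter

def detect_whistle_regions(x, fs, thresh_db=10.0):
    """Toy whistle detector: spectrogram -> per-bin noise-floor
    normalization (stand-in for the paper's noise cancellation) ->
    threshold -> despeckle -> connected-region labeling."""
    f, tt, S = spectrogram(x, fs, nperseg=512, noverlap=384)
    S_db = 10 * np.log10(S + 1e-12)
    noise_floor = np.median(S_db, axis=1, keepdims=True)  # crude floor
    snr = S_db - noise_floor
    mask = median_filter(snr > thresh_db, size=3)  # remove speckle
    labels, n_regions = label(mask)                # connected regions
    return f, tt, labels, n_regions

# synthetic "whistle": an upward FM sweep buried in white noise
fs = 48000
t = np.arange(int(0.5 * fs)) / fs
x = np.sin(2 * np.pi * (5000 * t + 4000 * t ** 2))
x += 0.05 * np.random.default_rng(0).standard_normal(t.size)
f, tt, labels, n_regions = detect_whistle_regions(x, fs)
```

Each labeled region is a candidate whistle (or whistle fragment); the fragment-tolerant classifier described above would then accumulate statistics over many such regions.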

Journal ArticleDOI
TL;DR: A socio-acoustic survey carried out in three large urban parks in Rome confirms that the sound environment in urban parks is often considered as "good" or "excellent" even if the sound pressure level is nearly always higher than the limits commonly used to define quiet areas.
Abstract: The present paper reports a socio-acoustic survey carried out in three large urban parks in Rome, selected on the basis of the outcome of a preliminary online survey. According to the experimental protocol applied in a previous study carried out in Milan and Naples, binaural recordings in 85 sites and interviews with 266 users of the three parks were performed only during the day in summertime. On the basis of selected acoustical descriptors, the sonic environment of the three parks was categorized and, thanks to statistical analysis, three clusters were identified. The results confirm that the sound environment in urban parks is often considered as “good” or “excellent” even if the sound pressure level is nearly always higher than the limits commonly used to define quiet areas. This is due to the influence of other factors, such as the presence of trees, natural features, and tranquility; none of these components can be neglected in the assessment of the soundscape because they directly affect the psychological state of the person.

Journal ArticleDOI
TL;DR: It is shown that accurate leading edge noise predictions can be made when assuming an inviscid meanflow, but that it is not valid to assume a uniform meanflow.
Abstract: Computational aeroacoustic methods are applied to the modeling of noise due to interactions between gusts and the leading edge of real symmetric airfoils. Single-frequency harmonic gusts are interacted with various airfoil geometries at zero angle of attack. The effects of airfoil thickness and leading edge radius on noise are investigated systematically and independently for the first time, at higher frequencies than previously used in computational methods. Increases in both leading edge radius and thickness are found to reduce the predicted noise. This noise reduction effect becomes greater with increasing frequency and Mach number. The dominant noise reduction mechanism for airfoils with real geometry is found to be related to the leading edge stagnation region. It is shown that accurate leading edge noise predictions can be made when assuming an inviscid meanflow, but that it is not valid to assume a uniform meanflow. Analytic flat plate predictions are found to over-predict the noise due to a NACA 0002 airfoil by up to 3 dB at high frequencies. The accuracy of analytic flat plate solutions can be expected to decrease with increasing airfoil thickness, leading edge radius, gust frequency, and Mach number.

Journal ArticleDOI
TL;DR: Overall, the time course of perception of velum lowering in American English indicates that the dynamics of perception parallelThe dynamics of the gestural information encoded in the acoustic signal.
Abstract: The perception of coarticulated speech as it unfolds over time was investigated by monitoring eye movements of participants as they listened to words with oral vowels or with late or early onset of anticipatory vowel nasalization. When listeners heard [CṼNC] and had visual choices of images of CVNC (e.g., send) and CVC (said) words, they fixated more quickly and more often on the CVNC image when onset of nasalization began early in the vowel compared to when the coarticulatory information occurred later. Moreover, when a standard eye movement programming delay is factored in, fixations on the CVNC image began to occur before listeners heard the nasal consonant. Listeners' attention to coarticulatory cues for velum lowering was selective in two respects: (a) listeners assigned greater perceptual weight to coarticulatory information in phonetic contexts in which [Ṽ] but not N is an especially robust property, and (b) individual listeners differed in their perceptual weights. Overall, the time course of perception of velum lowering in American English indicates that the dynamics of perception parallel the dynamics of the gestural information encoded in the acoustic signal. In real-time processing, listeners closely track unfolding coarticulatory information in ways that speed lexical activation.

Journal ArticleDOI
TL;DR: The results suggest that working memory capacity is associated with release from informational masking by semantically related information, and additionally with the encoding, storage, or retrieval of speech content in memory.
Abstract: This study examined how semantically related information facilitates the intelligibility of spoken sentences in the presence of masking sound, and how this facilitation is influenced by masker type and by individual differences in cognitive functioning. Dutch sentences were masked by stationary noise, fluctuating noise, or an interfering talker. Each sentence was preceded by a text cue; cues were either three words that were semantically related to the sentence or three unpronounceable nonwords. Speech reception thresholds were adaptively measured. Additional measures included working memory capacity (reading span and size comparison span), linguistic closure ability (text reception threshold), and delayed sentence recognition. Word cues facilitated speech perception in noise similarly for all masker types. Cue benefit was related to reading span performance when the masker was interfering speech, but not when other maskers were used, and it did not correlate with text reception threshold or size comparison span. Better reading span performance was furthermore associated with enhanced delayed recognition of sentences preceded by word relative to nonword cues, across masker types. The results suggest that working memory capacity is associated with release from informational masking by semantically related information, and additionally with the encoding, storage, or retrieval of speech content in memory.