
Showing papers in "Journal of the Acoustical Society of America" in 1991


Journal Article
TL;DR: In this article, a constant Q transform with a constant ratio of center frequency to resolution has been proposed to obtain a constant pattern in the frequency domain for sounds with harmonic frequency components.
Abstract: The frequencies that have been chosen to make up the scale of Western music are geometrically spaced. Thus the discrete Fourier transform (DFT), although extremely efficient in the fast Fourier transform implementation, yields components which do not map efficiently to musical frequencies. This is because the frequency components calculated with the DFT are separated by a constant frequency difference and with a constant resolution. A calculation similar to a discrete Fourier transform but with a constant ratio of center frequency to resolution has been made; this is a constant Q transform and is equivalent to a 1/24‐oct filter bank. Thus there are two frequency components for each musical note so that two adjacent notes in the musical scale played simultaneously can be resolved anywhere in the musical frequency range. This transform has been plotted against log (frequency) to obtain a constant pattern in the frequency domain for sounds with harmonic frequency components. This is compared to the conventio...

890 citations
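A direct (non-FFT) constant Q calculation can be sketched as follows. Assumptions not taken from the abstract: a Hamming window, a single analysis frame starting at sample 0, and the illustrative function and parameter names. Bin k has center frequency f_k = f_min·2^(k/24) and a window of about N_k = Q·fs/f_k samples, so every kernel holds roughly Q cycles of its own center frequency; that is what keeps the ratio of center frequency to resolution constant.

```python
import numpy as np


def constant_q_transform(x, fs, f_min, n_bins, bins_per_octave=24):
    """Direct constant-Q transform of one frame of x (no FFT speed-up)."""
    # Constant ratio of center frequency to bandwidth for 1/24-octave spacing
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    out = np.zeros(n_bins, dtype=complex)
    for k in range(n_bins):
        f_k = f_min * 2.0 ** (k / bins_per_octave)  # geometric bin spacing
        n_k = int(round(q * fs / f_k))               # longer windows at low f
        if n_k > len(x):
            raise ValueError(f"signal too short for bin {k} ({n_k} samples)")
        n = np.arange(n_k)
        # Hamming-windowed kernel holding about Q cycles of f_k
        kernel = np.hamming(n_k) * np.exp(-2j * np.pi * q * n / n_k)
        out[k] = np.dot(x[:n_k], kernel) / n_k
    return out
```

For a 220 Hz sine sampled at 8 kHz, with f_min = 110 Hz and 48 bins (two octaves), the largest magnitude falls in bin 24, one octave above f_min. The per-bin loop is the plain definition; a faster variant would precompute the kernels and use an FFT.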


Journal Article
TL;DR: In this article, a two-dimensional Fourier transform (2D FFT) was used to measure the amplitudes and velocities of the Lamb waves propagating in a plate, the output of the transform being presented using an isometric projection which gives a three-dimensional view of the wave-number dispersion curves.
Abstract: A technique for the analysis of propagating multimode signals is presented. The method involves a two-dimensional Fourier transformation of the time history of the waves received at a series of equally spaced positions along the propagation path. The technique has been used to measure the amplitudes and velocities of the Lamb waves propagating in a plate, the output of the transform being presented using an isometric projection which gives a three-dimensional view of the wave-number dispersion curves. The results of numerical and experimental studies to measure the dispersion curves of Lamb waves propagating in 0.5-, 2.0-, and 3.0-mm-thick steel plates are presented. The results are in good agreement with analytical predictions and show the effectiveness of using the two-dimensional Fourier transform (2-D FFT) method to identify and measure the amplitudes of individual Lamb modes.

889 citations
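The 2-D transform step can be sketched with NumPy. The sketch assumes the received signals are stored as an array u[position, time] with spatial spacing dx and sampling interval dt. It omits the windowing and zero-padding that measured plate data would likely need, and the synthetic single-mode signal in the check is an assumption, not data from the paper.

```python
import numpy as np


def frequency_wavenumber_spectrum(u, dt, dx):
    """Magnitude of the 2-D FFT of u[position, time].

    Returns (k, f, spec): wavenumbers in cycles/m, frequencies in Hz, and
    spec[i, j] = |U(k[i], f[j])|. The wavenumber axis is negated so that a
    wave travelling toward +x appears at k > 0 for f > 0 under NumPy's
    exp(-2j*pi*...) sign convention.
    """
    n_x, n_t = u.shape
    spec = np.abs(np.fft.fft2(u))
    k = -np.fft.fftfreq(n_x, d=dx)
    f = np.fft.fftfreq(n_t, d=dt)
    return k, f, spec


def dominant_mode(u, dt, dx):
    """(k, f) of the strongest spectral peak at positive frequency."""
    k, f, spec = frequency_wavenumber_spectrum(u, dt, dx)
    positive = f > 0
    sub = spec[:, positive]
    i, j = np.unravel_index(np.argmax(sub), sub.shape)
    return k[i], f[positive][j]
```

Each Lamb mode appears as a ridge in (k, f). The phase velocity at a peak follows as f/k, so a mode at 125 kHz and 125 cycles/m travels at 1000 m/s.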


Patent
TL;DR: In this article, the authors presented a set of selectable catheter sheaths, including a sheath with an integral acoustically-transparent window, sheaths with end extensions that aid in positioning and a liquid injection-producing sheath.
Abstract: Acoustic imaging balloon catheters formed by a disposable liquid-confining sheath supporting a high fidelity, flexible drive shaft which carries on its end an ultrasound transducer and includes an inflatable dilatation balloon. The shaft and transducer rotate with sufficient speed and fidelity to produce real time images on a T.V. screen. In preferred embodiments, special features that contribute to the high fidelity of the drive shaft include the particular multi-filar construction of concentric, oppositely wound, interfering coils, a pre-loaded torque condition on the coils enhancing their interfering contact, and dynamic loading of the distal end of the probe, preferably with viscous drag. The coil rotating in the presence of liquid in the sheath is used to produce a desirable pressure in the region of the transducer. Numerous selectable catheter sheaths are shown including a sheath with an integral acoustically-transparent window, sheaths with end extensions that aid in positioning, a liquid injection-producing sheath, a sheath having its window section under tension employing an axially loaded bearing, a sheath carrying a dilatation or positioning balloon over the transducer, a sheath carrying a distal rotating surgical tool and a sheath used in conjunction with a side-viewing trocar.

851 citations


Journal Article
TL;DR: The results demonstrate the importance of stimulus variability and task-related factors in training nonnative speakers to perceive novel phonetic contrasts that are not distinctive in their native language.
Abstract: Native speakers of Japanese learning English generally have difficulty differentiating the phonemes /r/ and /l/, even after years of experience with English. Previous research that attempted to train Japanese listeners to distinguish this contrast using synthetic stimuli reported little success, especially when transfer to natural tokens containing /r/ and /l/ was tested. In the present study, a different training procedure that emphasized variability among stimulus tokens was used. Japanese subjects were trained in a minimal pair identification paradigm using multiple natural exemplars contrasting /r/ and /l/ from a variety of phonetic environments as stimuli. A pretest‐posttest design containing natural tokens was used to assess the effects of training. Results from six subjects showed that the new procedure was more robust than earlier training techniques. Small but reliable differences in performance were obtained between pretest and posttest scores. The results demonstrate the importance of stimulus ...

798 citations


Journal Article
TL;DR: The findings reviewed here clearly indicate that future studies of otoacoustic emissions will significantly increase the understanding of the basic mechanisms of cochlear function while, at the same time, providing a new and important clinical tool.
Abstract: Otoacoustic emissions measured in the external ear canal describe responses that the cochlea generates in the form of acoustic energy. For the convenience of discussing their principal features, emitted responses can be classified into several categories according to the type of stimulation used to evoke them. On this basis, four distinct but interrelated classes can be distinguished: spontaneous, transiently evoked, stimulus-frequency, and distortion-product otoacoustic emissions. The present review details the findings that have been described for each emission type according to this classification scheme. Additionally, the known features of emitted responses are discussed for both normally hearing and hearing-impaired humans and experimental animals, and with respect to their potential clinical applications. The findings reviewed here clearly indicate that future studies of otoacoustic emissions will significantly increase our understanding of the basic mechanisms of cochlear function while, at the same time, providing a new and important clinical tool.

768 citations


Journal Article
TL;DR: In a set of experiments involving 35 pairs of phonetically similar sentences representing seven types of structural contrasts, the perceptual evidence shows that some, but not all, of the pairs can be disambiguated on the basis of prosodic differences.
Abstract: Prosodic structure and syntactic structure are not identical; neither are they unrelated. Knowing when and how the two correspond could yield better quality speech synthesis, could aid in the disambiguation of competing syntactic hypotheses in speech understanding, and could lead to a more comprehensive view of human speech processing. In a set of experiments involving 35 pairs of phonetically similar sentences representing seven types of structural contrasts, the perceptual evidence shows that some, but not all, of the pairs can be disambiguated on the basis of prosodic differences. The phonological evidence relates the disambiguation primarily to boundary phenomena, although prominences sometimes play a role. Finally, phonetic analyses describing the attributes of these phonological markers indicate the importance of both absolute and relative measures.

528 citations


Journal Article
TL;DR: A new voice source model that accounted for certain physiological aspects of vocal fold motion was developed and tested using speech synthesis, and applications include synthesis of natural sounding speech, synthesis and modeling of vocal disorders, and the development of speaker independent (or adaptive) speech recognition systems.
Abstract: The purpose of this study was to examine several factors of vocal quality that might be affected by changes in vocal fold vibratory patterns. Four voice types were examined: modal, vocal fry, falsetto, and breathy. Three categories of analysis techniques were developed to extract source‐related features from speech and electroglottographic (EGG) signals. Four factors were found to be important for characterizing the glottal excitations for the four voice types: the glottal pulse width, the glottal pulse skewness, the abruptness of glottal closure, and the turbulent noise component. The significance of these factors for voice synthesis was studied and a new voice source model that accounted for certain physiological aspects of vocal fold motion was developed and tested using speech synthesis. Perceptual listening tests were conducted to evaluate the auditory effects of the source model parameters upon synthesized speech. The effects of the spectral slope of the source excitation, the shape of the glottal excitation pulse, and the characteristics of the turbulent noise source were considered. Applications for these research results include synthesis of natural sounding speech, synthesis and modeling of vocal disorders, and the development of speaker independent (or adaptive) speech recognition systems.

498 citations


Journal Article
TL;DR: Licklider made his original suggestion in an attempt to explain the human ability to perceive the pitch of a complex tone even though that tone contained no spectral component corresponding to that pitch.
Abstract: Licklider made his original suggestion in an attempt to explain the human ability to perceive the pitch of a complex tone even though that tone contained no spectral component corresponding to that pitch. He rejected the prevailing theory (Fletcher, 1924) that distortion products of nonlinear cochlear responses could wholly explain the phenomenon. He pointed to the fact that the waveform envelope of unresolved harmonic components could be used to extract pitch information if an autocorrelation analysis could be performed. He thought that this might be achieved by a delay line mechanism at a low level in the auditory nervous system. His theory depended on the idea that the harmonic com

487 citations
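Licklider's autocorrelation idea can be illustrated in a deliberately simplified, single-channel form by autocorrelating the waveform of a complex tone that lacks its fundamental. Licklider placed the autocorrelation in the auditory nervous system, after cochlear filtering. This sketch skips that front end entirely, and the 50-500 Hz search range is an assumption.

```python
import numpy as np


def autocorrelation_pitch(x, fs, f_lo=50.0, f_hi=500.0):
    """Pitch estimate (Hz) from the largest autocorrelation peak in a lag range."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Linear (not circular) autocorrelation via a zero-padded FFT
    spectrum = np.fft.rfft(x, 2 * n)
    acf = np.fft.irfft(np.abs(spectrum) ** 2, 2 * n)[:n]
    lag_min = int(fs / f_hi)
    lag_max = min(int(fs / f_lo), n - 1)
    best_lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    return fs / best_lag
```

For a tone built only from harmonics 3, 4, and 5 of 200 Hz, so that it has no spectral component at 200 Hz, the strongest peak lies at a lag of one 200 Hz period and the estimate is 200 Hz.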


Journal Article
TL;DR: In this article, the general Kirchhoff theory of sound propagation in a circular tube is shown to take a simpler form in a regime that includes both narrow and wide tubes, where the sound pressure is essentially constant through each cross section, and the excess density and sound pressure (when scaled by the equilibrium density and pressure of air) are comparable in magnitude.
Abstract: The general Kirchhoff theory of sound propagation in a circular tube is shown to take a considerably simpler form in a regime that includes both narrow and wide tubes. For tube radii greater than $r_w = 10^{-3}$ cm and sound frequencies $f$ such that $r_w f^{3/2} < 10^{6}\ \mathrm{cm\,s^{-3/2}}$, the Kirchhoff solution reduces to the approximate solution suggested by Zwikker and Kosten. In this regime, viscosity and thermal conductivity effects are treated separately, within complex density and complex compressibility functions. The sound pressure is essentially constant through each cross section, and the excess density and sound pressure (when scaled by the equilibrium density and pressure of air, respectively) are comparable in magnitude. These last two observations are assumed to apply to uniform tubes having arbitrary cross‐sectional shape, and a generalized theory of sound propagation in narrow and wide tubes is derived. The two‐dimensional wave equation that results can be used to describe the variation of either particle velocity or...

418 citations


Journal Article
TL;DR: The integral solution to the wave equation is combined with a general description of the field from typical transducers used in clinical ultrasound to yield a model for the received pulse-echo pressure field.
Abstract: An inhomogeneous wave equation is derived describing propagation and scattering of ultrasound in an inhomogeneous medium. The scattering term is a function of density and propagation velocity perturbations. The integral solution to the wave equation is combined with a general description of the field from typical transducers used in clinical ultrasound to yield a model for the received pulse-echo pressure field. Analytic expressions are found in the literature for a number of transducers, and any transducer excitation can be incorporated into the model. An example is given for a concave, nonapodized transducer in which the predicted pressure field is compared to a measured field.

376 citations


Journal Article
TL;DR: In this article, the effects of aperture size and inhomogeneities in the propagation medium were treated for both the near-field and far-field regions, and it was concluded that phase-conjugate arrays offer an attractive approach to some long-standing problems in underwater acoustics.
Abstract: Phase‐conjugate mirrors are used in optics to compensate for aberrations caused by inhomogeneities in the propagation medium and by imperfections in optical components. In acoustics, analogous behavior can be achieved by a time‐reversed retransmission of signals received by an array. Compensation for multipath propagation and array imperfections is automatic and does not require knowledge of the detailed properties of either the medium or the array. The behavior of acoustic phase‐conjugate arrays is illustrated in several examples, some highly idealized and some more realistic. The effects of aperture size and inhomogeneities in the propagation medium are treated for both the near‐field and far‐field regions. It is concluded that phase‐conjugate arrays offer an attractive approach to some long‐standing problems in underwater acoustics.
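The time-reversed retransmission can be sketched in discrete time. The sketch assumes that each array element i is linked to the source by a known FIR impulse response h_i (reciprocal and noise-free) and that the source emits an impulse. Each element records h_i, reverses it, and retransmits it. The field back at the source is then the sum over i of h_i(−t) convolved with h_i(t). Its zero-lag value, the sum of all squared taps, adds coherently however complicated the multipath. The random channels in the check are synthetic, and the sketch ignores the aperture and near/far-field effects the paper treats.

```python
import numpy as np


def time_reversed_field(recorded, channels_to_point):
    """Field at a point after each element i retransmits recorded[i] reversed
    in time through its FIR channel channels_to_point[i] (all rows equal length).

    With channels_to_point == recorded (reciprocity, same point as the source),
    this is sum_i h_i(-t) * h_i(t): a sum of autocorrelations whose zero-lag
    terms all add in phase.
    """
    recorded = np.asarray(recorded, dtype=float)
    channels = np.asarray(channels_to_point, dtype=float)
    field = np.zeros(recorded.shape[1] + channels.shape[1] - 1)
    for r_i, g_i in zip(recorded, channels):
        field += np.convolve(r_i[::-1], g_i)
    return field
```

No element needs to know the medium: the focus comes from re-using the recorded signals. Sending the same reversed signals through channels to a different point does not produce a comparable peak.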

Journal Article
TL;DR: The data indicate that power integration occurs only for separations less than approximately 5 ms and that the input is sampled at a fairly high rate and that these samples or "looks" are stored in memory and can be accessed and processed selectively.
Abstract: The decrease in detection and discrimination thresholds with increases in signal duration has often been taken to indicate that a process of relatively long‐term temporal integration occurs in hearing. Two experiments are reported that suggest that no such process occurs. The first experiment is similar to the two‐pulse experiment reported by Zwislocki [J. Zwislocki, J. Acoust. Soc. Am. 32, 1046–1059 (1960)] in which the threshold in quiet for a pair of brief pulses is measured as a function of the temporal separation between them. Our data indicate that power integration occurs only for separations less than approximately 5 ms. For separations larger than 5–10 ms, thresholds do not change with separation and the pulses appear to be processed independently. In the second experiment, brief 1‐kHz tone pulses separated by 100 ms are presented during gaps in a wideband noise. The threshold for a pair of pulses is lower than that for either pulse presented alone, indicating that some type of “integration” oc...

Journal Article
TL;DR: In this article, the boundary conditions for an interface between two solids are analyzed to model a thin viscoelastic interface layer, and their applicability is assessed by comparison with exact solutions for ultrasonic wave reflection.
Abstract: The boundary conditions for an interface between two solids are analyzed to model a thin viscoelastic interface layer. Boundary conditions that relate stresses and displacements on both sides of the interface are obtained as an asymptotic representation of three‐dimensional solutions for an interface layer in the limit of small thickness-to-wavelength ratio. The interface boundary conditions obtained include interface stiffnesses and inertia and terms involving coupling between normal and tangential stresses and displacements. The applicability of such boundary conditions is analyzed by comparison with exact solutions for ultrasonic wave reflection. Fundamental boundary conditions are introduced where only one transverse or normal mass or stiffness is included. It is shown that the solution for more exact interface boundary conditions which include two inertia elements and two stiffness elements can be decomposed into a sum of fundamental solutions. The transition between welded and slip boundary conditio...
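For orientation, consider the simplest fundamental case named above: a single normal stiffness K with no mass. When the two solids are identical, with acoustic impedance Z, it has a closed-form normal-incidence reflection coefficient. Continuous stress and a displacement jump of σ/K across the interface give R = iωZ/(iωZ − 2K), assuming an exp(−iωt) time dependence. This is a standard textbook special case, not the paper's general result. It shows the transition from a welded interface (K → ∞, R → 0) to a fully disbonded one (K → 0, |R| → 1).

```python
def spring_interface_reflection(stiffness, impedance, omega):
    """Normal-incidence reflection coefficient R for two identical solids joined
    by a massless normal spring (stiffness per unit area, N/m^3).

    Assumes time dependence exp(-1j*omega*t); impedance = density * wave speed.
    K -> infinity gives a welded interface (R -> 0); K -> 0 gives a fully
    disbonded one (|R| -> 1).
    """
    z_omega = impedance * omega
    return 1j * z_omega / (1j * z_omega - 2.0 * stiffness)
```

At K = ωZ/2 the magnitude is exactly 1/√2. Because the stiffness enters only through the ratio 2K/(ωZ), a spring interface looks stiffer at low frequency and more compliant at high frequency.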

Journal Article
TL;DR: The results are interpreted to mean that individuals who learn an L2 in early childhood, but not those who learn an L2 later in life, are able to establish phonetic categories for sounds in the L2 that differ acoustically from corresponding sounds in the native language.
Abstract: This study examined whether Spanish-English bilinguals are able to fully differentiate Spanish and English /t/ according to voice-onset time (VOT) if they learn English as a second language (L2) in early childhood. In experiment 1, VOT was measured in Spanish words spoken by Spanish monolinguals, in English words spoken by English monolinguals, and in Spanish and English words spoken by bilinguals who learned English either as young children or as adults. As expected, the Spanish monolinguals produced /t/ with considerably shorter VOT values than the English monolinguals. Also as expected, the late L2 learners produced English /t/ with "compromise" VOT values that were intermediate between the short-lag values observed for Spanish monolinguals and the long-lag values observed for English monolinguals. The early learners' VOT values for English /t/, on the other hand, did not differ from those of English monolinguals. The same pattern of results was obtained for stops in utterance-medial position and in absolute utterance-initial position. The results of experiment 1 were replicated in experiment 2, where bilingual subjects were required to produce Spanish and English utterances (sentences, phrases, words) in alternation. The results are interpreted to mean that individuals who learn an L2 in early childhood, but not those who learn an L2 later in life, are able to establish phonetic categories for sounds in the L2 that differ acoustically from corresponding sounds in the native language. It is hypothesized that the late L2 learners produced /t/ with slightly longer VOT values in English than Spanish by applying different realization rules to a single phonetic category.

Journal Article
TL;DR: In this paper, an exact analytical treatment of the interaction of harmonic elastic waves with n-layered anisotropic plates is presented, where the wave is allowed to propagate along an arbitrary angle from the normal to the plate as well as along any azimuthal angle.
Abstract: An exact analytical treatment of the interaction of harmonic elastic waves with n-layered anisotropic plates is presented. Each layer of the plate can have symmetry as low as monoclinic, which allows results for higher-symmetry materials such as orthotropic, transversely isotropic, cubic, and isotropic ones to be obtained as special cases. The wave is allowed to propagate at an arbitrary angle from the normal to the plate as well as along any azimuthal angle. Solutions are obtained by using the transfer matrix method. According to this method, formal solutions for each layer are derived and expressed in terms of wave amplitudes. By eliminating these amplitudes, the stresses and displacements on one side of the layer are related to those on the other side. By satisfying appropriate continuity conditions at interlayer interfaces, a global transfer matrix can be constructed which relates the displacements and stresses on one side of the plate to those on the other. By invoking appropriate boundary conditions on the plate's outer boundaries, a large variety of important problems can be solved. Among these are the propagation of free waves on the plate and the propagation of waves in a periodic medium consisting of a periodic repetition of the plate. The approach and results are confirmed by comparison with whatever specialized solutions are available. A variety of numerical illustrations are included.
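The transfer-matrix bookkeeping can be illustrated in its simplest special case. This restriction is an assumption here, not the paper's general setting: isotropic layers, normal incidence, and longitudinal waves only. The state vector [u, σ] (displacement, normal stress) is then carried across a layer of thickness h, density ρ, and wave speed c by a 2×2 matrix, and the global matrix is the ordered product of the layer matrices. For a free plate (traction-free on both faces), resonances occur where the lower-left entry of the global matrix vanishes.

```python
import numpy as np


def layer_matrix(omega, thickness, density, velocity):
    """Map [u, sigma] on one face of an isotropic layer to the other face."""
    modulus = density * velocity ** 2  # longitudinal modulus rho * c^2
    k = omega / velocity
    kh = k * thickness
    return np.array([
        [np.cos(kh), np.sin(kh) / (modulus * k)],
        [-modulus * k * np.sin(kh), np.cos(kh)],
    ])


def global_matrix(omega, layers):
    """Ordered product for layers given as (thickness, density, velocity)."""
    total = np.eye(2)
    for thickness, density, velocity in layers:
        # Each later layer multiplies on the left, so continuity of [u, sigma]
        # at every interlayer interface is enforced automatically.
        total = layer_matrix(omega, thickness, density, velocity) @ total
    return total
```

Two 0.5 mm steel layers give the same global matrix as one 1 mm layer, and the matrix determinant stays 1. The general anisotropic, oblique-incidence problem treated in the paper couples more displacement and stress components, but it is assembled in the same way.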

Journal Article
TL;DR: The results of this study suggest that, for sleeping subjects, modulation frequencies above 70 Hz may be best when using steady-state potentials for hearing threshold estimation.
Abstract: Steady-state evoked potential responses were measured to binaural amplitude-modulated (AM) and combined amplitude- and frequency-modulated (AM/FM) tones. For awake subjects, AM/FM tones produced larger amplitude responses than did AM tones. Awake and sleeping responses to 30-dB HL AM/FM tones were compared. Response amplitudes were lower during sleep and the extent to which they differed from awake amplitudes was dependent on both carrier and modulation frequencies. Background EEG noise at the stimulus modulation frequency was also reduced during sleep and varied with modulation frequency. A detection efficiency function was used to indicate the modulation frequencies likely to be most suitable for electrical estimation of behavioral threshold. In awake subjects, for all carrier frequencies tested, detection efficiency was highest at a modulation frequency of 45 Hz. In sleeping subjects, the modulation frequency regions of highest efficiency varied with carrier frequency. For carrier frequencies of 250 Hz, 500 Hz, and 1 kHz, the highest efficiencies were found in two modulation frequency regions centered on 45 and 90 Hz. For 2 and 4 kHz, the highest efficiencies were at modulation frequencies above 70 Hz. Sleep stage affected both response amplitude and background EEG noise in a manner that depended on modulation frequency. The results of this study suggest that, for sleeping subjects, modulation frequencies above 70 Hz may be best when using steady-state potentials for hearing threshold estimation.

Patent
Katashi Nagao, Hiroshi Nomiyama
TL;DR: In this article, a system for resolving structural ambiguities in syntactic analysis of natural language is presented; the ambiguities are caused by prepositional phrase attachment, relative clause attachment, and other modifier-modifiee relationships in sentences.
Abstract: A system for resolving structural ambiguities in syntactic analysis of natural language, which ambiguities are caused by prepositional phrase attachment, relative clause attachment, and other modifier-modifiee relationships in sentences. The system uses instances of dependency (modification relationship) structures extracted from a terminology dictionary as a knowledge base. Structural ambiguity is represented by indicating that a word in a sentence has several words as candidate modifiees. The system resolves such ambiguity by 1) first searching the knowledge base, which contains dependency information in the form of tree structures, for dependencies between the word and each of its possible modifiees, 2) then assigning an order of preference to these dependencies by means of a path search in the tree structures, and 3) finally selecting the most preferable dependency as the modifiee. The sentences can be analyzed by a parser and transformed into dependency structures by the system. The knowledge base can be constructed automatically, since the source of knowledge exists in the form of texts, and knowledge bootstrapping can be realized by adding the outputs of the system to its knowledge base.
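The three resolution steps can be sketched under simplifying assumptions that are not taken from the patent. Each knowledge-base tree is a list of (modifier, modifiee) edges. "Preference" is the length of the shortest path between the ambiguous word and a candidate modifiee in any tree, with shorter being better. The abstract does not specify the actual preference ordering, so the path-length criterion, the function names, and the toy trees are all hypothetical.

```python
from collections import deque


def path_length(tree_edges, start, goal):
    """Shortest number of edges between two words in one dependency tree
    (edges treated as undirected); None if they are not connected."""
    adjacency = {}
    for modifier, modifiee in tree_edges:
        adjacency.setdefault(modifier, set()).add(modifiee)
        adjacency.setdefault(modifiee, set()).add(modifier)
    if start not in adjacency or goal not in adjacency:
        return None
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def choose_modifiee(knowledge_base, word, candidates):
    """Search every tree (step 1), rank by path length (step 2), pick best (step 3)."""
    best, best_len = None, None
    for cand in candidates:
        for tree in knowledge_base:
            d = path_length(tree, word, cand)
            if d is not None and (best_len is None or d < best_len):
                best, best_len = cand, d
    return best
```

For example, given a tree in which "with" directly modifies "print", `choose_modifiee` prefers "print" over "file" as the modifiee of "with".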

Journal Article
TL;DR: In this article, an analysis is made of the noise produced by low Mach number turbulent flow over the serrated edge of a flat plate airfoil at zero angle of attack.
Abstract: An analysis is made of the noise produced by low Mach number turbulent flow over the serrated edge of a flat plate airfoil at zero angle of attack. The serrations are of sawtooth profile of wavelength $\lambda$ and root‐to‐tip distance $2h$. At frequencies $\omega$ satisfying $\omega h/U \gg 1$ (where $U$ is the velocity of the main stream) it is predicted that the intensity of the radiation is reduced relative to that produced by the same flow over an unserrated edge by at least $10\log_{10}[1 + (4h/\lambda)^2]$ dB. Predictions are contrasted with analogous results derived [M. S. Howe, J. Fluids Struct. 5, 33–45 (1991)] for smoothly varying serrations of sinusoidal profile, for which it was concluded that attenuations of order $10\log_{10}(6h/\lambda)$ dB are possible.
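The two attenuation formulas above can be evaluated directly. The function names are illustrative. Both expressions apply only in the high-frequency regime ωh/U ≫ 1, which the sketch does not check.

```python
import math


def sawtooth_min_reduction_db(h, wavelength):
    """Lower bound 10*log10[1 + (4h/lambda)^2] dB for sawtooth serrations."""
    return 10.0 * math.log10(1.0 + (4.0 * h / wavelength) ** 2)


def sinusoidal_reduction_order_db(h, wavelength):
    """Order-of-magnitude attenuation 10*log10(6h/lambda) dB, sinusoidal profile."""
    return 10.0 * math.log10(6.0 * h / wavelength)
```

For h = λ (root-to-tip distance 2h equal to twice the serration wavelength), the sawtooth bound is 10 log10(17) ≈ 12.3 dB, against roughly 10 log10(6) ≈ 7.8 dB for the sinusoidal profile.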

Journal Article
TL;DR: This paper investigated locus equations as a potential metric capable of illustrating relational invariance for place of articulation in voiced initial stop consonants independently of vowel context, and developed a brain-based recognition algorithm for stop place integrating burst and F2 trajectory cues.
Abstract: Locus equations were investigated as a potential metric capable of illustrating relational invariance for place of articulation in voiced initial stop consonants independently of vowel context. Locus equations are straight-line regression fits to data points formed by plotting onsets of F2 transitions along the y axis and their corresponding midvowel nuclei along the x axis. Twenty subjects, 10 male and 10 female, produced /bVt/, /dVt/, and /gVt/ tokens for ten vowel contexts. Each CVC token was repeated in a carrier phrase five times, yielding 50 tokens per stop place category. Formant measures were obtained using the MacSpeech Lab II speech analysis system. Highly linear regression functions were found, characterized by distinct slopes and y intercepts as a function of place of articulation. A discriminant analysis using F2 onset and vowel frequencies as predictors showed 82%, 78%, and 67% classification rates for labial, alveolar, and velar place. Using derived slope and y‐intercept values as predictors led to 100% classification into stop place categories. A neurobiologically oriented perspective on the invariance issue is developed and a brain‐based recognition algorithm for stop place integrating burst and F2 trajectory cues is offered.
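Fitting a locus equation is an ordinary least-squares line through (F2 at vowel midpoint, F2 at onset) pairs. The sketch below assumes the formant values have already been measured, in Hz. The synthetic numbers in the check are made up and carry no phonetic meaning.

```python
import numpy as np


def locus_equation(f2_vowel_hz, f2_onset_hz):
    """Least-squares line F2_onset = slope * F2_vowel + intercept, plus R^2."""
    x = np.asarray(f2_vowel_hz, dtype=float)
    y = np.asarray(f2_onset_hz, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)  # highest-degree coefficient first
    residuals = y - (slope * x + intercept)
    r_squared = 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)
    return slope, intercept, r_squared
```

Fitting one line per stop place yields the (slope, y-intercept) pairs that the abstract reports as predictors for classifying stop place.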

Patent
TL;DR: In this article, a system and method for enabling a caller to obtain access to services via a telephone network by entering a spoken password having a plurality of digits is described. The caller is prompted to speak the password, beginning with its first digit and ending with its last.
Abstract: The present invention describes a system and method for enabling a caller to obtain access to services via a telephone network by entering a spoken password having a plurality of digits. Preferably, the method includes the steps of: (1) prompting the caller to speak the password beginning with a first digit and ending with a last digit thereof, (2) recognizing each spoken digit of the password using a speaker-independent voice recognition algorithm, (3) following entry of the last digit of the password, determining whether the password is valid, and (4) if the password is valid, verifying the caller's identity using a voice verification algorithm.
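The four steps can be sketched as control flow. The sketch uses hypothetical `recognize_digit` and `verify_voice` callables that stand in for the speaker-independent recognizer and the voice-verification algorithm, neither of which the abstract specifies. Audio is represented by opaque `bytes` chunks, one per spoken digit; no real telephony or speech API is implied.

```python
from typing import Callable, Collection, Iterable, List, Optional


def authenticate_caller(
    spoken_digits: Iterable[bytes],
    recognize_digit: Callable[[bytes], str],
    valid_passwords: Collection[str],
    verify_voice: Callable[[str, List[bytes]], bool],
) -> Optional[str]:
    """Return the recognized password if it is valid and the voice matches."""
    # Step 1: the caller has been prompted and speaks the digits first to last
    utterances = list(spoken_digits)
    # Step 2: speaker-independent recognition of each spoken digit
    password = "".join(recognize_digit(u) for u in utterances)
    # Step 3: validity check, only after the last digit has been entered
    if password not in valid_passwords:
        return None
    # Step 4: speaker verification on the same utterances
    if not verify_voice(password, utterances):
        return None
    return password
```

Checking validity before verification means the more expensive voice-verification step runs only for callers who supplied a valid password.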

Journal Article
TL;DR: MRI techniques were used to gather basic data to apply in computational models of speech articulation and axial images of the pharyngeal cavity were collected during the production of an ensemble of nine vowels.
Abstract: Magnetic resonance imaging (MRI) techniques were used to gather basic data to apply in computational models of speech articulation. Two experiments were performed. In experiment 1, voice recordings from two male subjects were obtained simultaneously with axial, coronal, or midsagittal MR images of their vocal tracts while they produced the four point vowels. Area functions describing the individual tract shapes were obtained by measurements performed on the MR images. Digital filters derived from these functions were then used to resynthesize the vowel sounds which were compared, both perceptually and acoustically, with the subjects' original recordings. In experiment 2, axial images of the pharyngeal cavity were collected during the production of an ensemble of nine vowels. Plots of cross-sectional area versus the midsagittal width of the tract at different locations within the pharynx and for different vowel productions were used to derive a functional relationship between the two variables. Data from experiment 1 relating midsagittal width to cross-sectional area within the oral cavity were also examined.
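The abstract does not say how the digital filters were derived from the area functions. One common route, assumed here, treats the area function as a chain of equal-length lossless tube sections. It computes junction reflection coefficients from adjacent areas and converts them to an all-pole polynomial with the step-up recursion. Sign conventions for the reflection coefficients differ between texts. The glottal and lip terminations, which strongly affect where the resonances fall, are not modelled.

```python
import numpy as np


def reflection_coefficients(areas):
    """Junction coefficients r_i = (A_i - A_{i+1}) / (A_i + A_{i+1}) for a chain
    of equal-length lossless tube sections (one sign convention of several)."""
    a = np.asarray(areas, dtype=float)
    return (a[:-1] - a[1:]) / (a[:-1] + a[1:])


def step_up(reflection):
    """Step-up recursion: reflection coefficients -> [1, a_1, ..., a_p]."""
    poly = np.array([1.0])
    for k in reflection:
        extended = np.concatenate([poly, [0.0]])
        # a_j <- a_j + k * a_{i-j}, written with the reversed coefficient vector
        poly = extended + k * extended[::-1]
    return poly
```

A uniform tube gives all-zero reflection coefficients. Because every area is positive, each |r_i| < 1, which keeps the resulting all-pole filter stable.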

Journal Article
Ingo R. Titze
TL;DR: It is shown that singers obtain two to three times greater peak flow for a given lung pressure, suggesting that they adjust their glottal or vocal tract impedance for optimal flow transfer between the source and the resonator.
Abstract: Phonation threshold pressure has previously been defined as the minimum lung pressure required to initiate phonation. By modeling the dependence of this pressure on fundamental frequency, it is shown that relatively simple aerodynamic relations for time-varying flow in the glottis are obtained. Lung pressure and peak glottal flow are nearly linearly related, but not proportional. For this reason, traditional power law relations between vocal power and lung pressure may not hold. Glottal impedance for time-varying flow should be defined differentially rather than as a simple ratio between lung pressure and peak flow. It is shown that the peak flow, the peak flow derivative, the open quotient, and the speed quotient of inverse-filtered glottal flow waveforms all depend explicitly on phonation threshold pressure. Data from singers are compared with those from nonsingers. The primary difference is that singers obtain two to three times greater peak flow for a given lung pressure, suggesting that they adjust their glottal or vocal tract impedance for optimal flow transfer between the source and the resonator.
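The difference between a ratio impedance and a differential impedance can be made concrete with a hypothetical linear-with-offset relation U = g(P_L − P_0). The values g = 0.05 and P_0 = 3 are chosen purely for illustration and are not from the paper. Because flow is linear in pressure but not proportional to it, the ratio P_L/U drifts with pressure, whereas dP_L/dU stays at 1/g.

```python
import numpy as np


def glottal_impedances(lung_pressure, peak_flow):
    """Return (ratio impedance P/U, differential impedance dP/dU) per sample."""
    p = np.asarray(lung_pressure, dtype=float)
    u = np.asarray(peak_flow, dtype=float)
    ratio = p / u
    differential = np.gradient(p, u)  # dP/dU along the measured curve
    return ratio, differential
```

With pressures 5, 10, 15, and 20 (any consistent unit), the ratio falls from 50 to about 23.5 while the differential impedance stays at 20 throughout.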

Patent
TL;DR: A Score Function is provided for disambiguating or truncating ambiguities on the basis of composite scores, generated at different stages of the processing.
Abstract: A language processing system includes a mechanism for measuring the syntax trees of sentences of material to be translated and a mechanism for truncating syntax trees in response to the measuring mechanism. In a particular embodiment, a Score Function is provided for disambiguating or truncating ambiguities on the basis of composite scores, generated at different stages of the processing.

Patent
TL;DR: In this article, the tradeoff between time resolution and frequency resolution is optimized by adaptively selecting the transform block length for each sampled audio segment, and/or coding gain is optimized by adaptively selecting the transform and/or the analysis window or analysis/synthesis window pair.
Abstract: The invention relates in general to high-quality low bit-rate digital transform coding and decoding of information corresponding to audio signals such as music signals. More particularly, the invention relates to signal analysis/synthesis in coding and decoding. The invention can optimize the tradeoff in transform coders between time resolution and frequency resolution by adaptively selecting the transform block length for each sampled audio segment, and/or can optimize coding gain by adaptively selecting the transform and/or by adaptively selecting the analysis window or the analysis/synthesis window pair.
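The patent abstract does not state how the block length is chosen, so the rule below is a purely hypothetical stand-in, not the patented method. It illustrates the time/frequency tradeoff: when the energy of one short sub-block jumps sharply above the preceding sub-block (a transient), it selects short blocks for time resolution; otherwise it keeps long blocks for frequency resolution. The block lengths and the jump threshold are assumptions.

```python
import numpy as np


def choose_block_length(segment, long_len=1024, short_len=128, jump_ratio=10.0):
    """Hypothetical block-switching rule for one segment of samples."""
    x = np.asarray(segment, dtype=float)[:long_len]
    if len(x) < long_len:
        raise ValueError("segment shorter than the long block")
    n_sub = long_len // short_len
    # Energy of each short sub-block within the long block
    energies = np.sum(x[: n_sub * short_len].reshape(n_sub, short_len) ** 2, axis=1)
    # A sharp energy rise between neighbouring sub-blocks suggests a transient,
    # so favour time resolution (short blocks); otherwise frequency resolution.
    rises = energies[1:] / (energies[:-1] + 1e-12)
    return short_len if np.max(rises) > jump_ratio else long_len
```

A steady 1 kHz tone keeps the long block, while the same tone switched on halfway through the segment triggers short blocks.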

Journal ArticleDOI
TL;DR: The xyz algorithm as mentioned in this paper uses products of powers of the Cartesian coordinates as a basis for expansion of the displacement in a truncated complete set, enabling one to analytically evaluate the required matrix elements for these systems.
Abstract: The Hamilton’s principle approach to the calculation of vibrational modes of elastic objects with free boundaries is exploited to compute the resonance frequencies of a variety of anisotropic elastic objects, including spheres, hemispheres, spheroids, ellipsoids, cylinders, eggs, shells, bells, sandwiches, parallelepipeds, cones, pyramids, prisms, tetrahedra, octahedra, and potatoes. The paramount feature of this calculation, which distinguishes it from previous ones, is the choice of products of powers of the Cartesian coordinates as a basis for expansion of the displacement in a truncated complete set, enabling one to analytically evaluate the required matrix elements for these systems. Because these basis functions are products of powers of x, y, and z, this scheme is called the xyz algorithm. The xyz algorithm allows a general anisotropic elastic tensor with any position dependence and any shape with arbitrary density variation. A number of plots of resonance spectra of families of elastic objects are...
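The reason the xyz basis gives analytic matrix elements is that integrals of x^l y^m z^n over simple shapes have closed forms. The sketch below computes only the kinetic-energy (mass) matrix for one displacement component, for the simplest case of a uniform rectangular parallelepiped; the stiffness matrix (which needs derivatives and the elastic tensor) and the final generalized eigenvalue problem are omitted, and the function names are hypothetical.

```python
from fractions import Fraction


def basis_exponents(n_max):
    """Exponents (l, m, n) of the basis functions x**l * y**m * z**n with l+m+n <= n_max."""
    return [
        (l, m, n)
        for l in range(n_max + 1)
        for m in range(n_max + 1 - l)
        for n in range(n_max + 1 - l - m)
    ]


def moment_1d(p, half_len):
    """Exact integral of x**p over [-half_len, half_len]."""
    if p % 2:
        return Fraction(0)  # odd powers integrate to zero on a symmetric interval
    return 2 * Fraction(half_len) ** (p + 1) / (p + 1)


def mass_matrix(n_max, a, b, c, rho=1):
    """Mass matrix for a uniform box [-a,a] x [-b,b] x [-c,c].

    Entry (i, j) is rho times the volume integral of phi_i * phi_j; it factors
    into three 1-D moments, so every entry is evaluated in closed form.
    """
    basis = basis_exponents(n_max)
    matrix = [
        [
            rho
            * moment_1d(l1 + l2, a)
            * moment_1d(m1 + m2, b)
            * moment_1d(n1 + n2, c)
            for (l2, m2, n2) in basis
        ]
        for (l1, m1, n1) in basis
    ]
    return basis, matrix


if __name__ == "__main__":
    basis, m = mass_matrix(2, 1, 1, 1)
    print(len(basis), m[0][0])
```

Exact `Fraction` arithmetic is used here only to make the closed-form nature visible; a numerical implementation would use floating point.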

Journal ArticleDOI
TL;DR: Results show that SPL increases with Fo at a rate of 8-9 dB/octave, provided that lung pressure is raised in proportion to phonation threshold pressure, and at 8-9 dB per doubling of excess pressure over threshold, a new quantity that assumes considerable importance in vocal intensity calculations.
Abstract: Vocal intensity is studied as a function of fundamental frequency and lung pressure. A combination of analytical and empirical models is used to predict sound pressure levels from glottal waveforms of five professional tenors and twenty-five normal control subjects. The glottal waveforms were obtained by inverse filtering the mouth flow. Empirical models describe features of the glottal flow waveform (peak flow, peak flow derivative, open quotient, and speed quotient) in terms of lung pressure and phonation threshold pressure, a key variable that incorporates the Fo dependence of many of the features of the glottal flow. The analytical model describes the contributions of the vocal tract to sound pressure level (SPL). Results show that SPL increases with Fo at a rate of 8-9 dB/octave, provided that lung pressure is raised in proportion to phonation threshold pressure. SPL also increases at a rate of 8-9 dB per doubling of excess pressure over threshold, a new quantity that assumes considerable importance in vocal intensity calculations. For the same excess pressure over threshold, the professional tenors produced 10-12 dB greater intensity than the male nonsingers, primarily because their peak airflow was much higher for the same pressure. A simple set of rules is devised for predicting SPL from source waveforms.
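The two "dB per doubling" results translate into simple logarithmic arithmetic. The sketch below assumes the midpoint rate of 8.5 dB per doubling from the reported 8-9 dB range and uses hypothetical pressures in kPa; it is a rule-of-thumb illustration, not the paper's full rule set.

```python
import math


def delta_spl(ratio, db_per_doubling=8.5):
    """SPL change (dB) for a given ratio, at a fixed dB-per-doubling rate."""
    return db_per_doubling * math.log2(ratio)


def delta_spl_excess(p_lung_1, p_lung_2, p_threshold, db_per_doubling=8.5):
    """SPL change when lung pressure moves from p_lung_1 to p_lung_2 at a fixed
    phonation threshold pressure, using excess pressure over threshold."""
    excess_1 = p_lung_1 - p_threshold
    excess_2 = p_lung_2 - p_threshold
    return delta_spl(excess_2 / excess_1, db_per_doubling)


if __name__ == "__main__":
    print(delta_spl(2.0))                      # one octave rise in Fo (pressure scaled with threshold)
    print(delta_spl_excess(1.0, 1.6, 0.4))     # excess pressure doubles from 0.6 to 1.2 kPa
```

Because the rate applies to excess pressure rather than lung pressure, raising lung pressure from 1.0 to 1.6 kPa (a 60% increase) counts as a full doubling when the assumed threshold is 0.4 kPa.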

PatentDOI
Lynn D. Wilcox, Marcia A. Bush
TL;DR: The wordspotter is intended for interactive applications, such as the editing of voice mail or mixed-media documents, and for keyword indexing in single-speaker audio or video recordings.
Abstract: A technique for wordspotting based on hidden Markov models (HMMs). The technique allows a speaker to specify keywords dynamically and to train the associated HMMs via a single repetition of a keyword. Non-keyword speech is modeled using an HMM trained from a prerecorded sample of continuous speech. The wordspotter is intended for interactive applications, such as the editing of voice mail or mixed-media documents, and for keyword indexing in single-speaker audio or video recordings.
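Keyword HMMs paired with a non-keyword (filler) HMM are commonly compared through a log-likelihood ratio. The sketch below computes that ratio with the forward algorithm for tiny discrete-symbol models; the model parameters are hypothetical, and a real wordspotter would use continuous acoustic features and search for keywords inside a network of keyword and filler models over continuous speech. Single-repetition training is not shown.

```python
import math


def _log(p):
    return math.log(p) if p > 0 else -math.inf


def _logsumexp(values):
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_likelihood(obs, start, trans, emit):
    """log P(obs | HMM) via the forward algorithm, computed in the log domain.

    start[i]: initial probability of state i
    trans[i][j]: probability of moving from state i to state j
    emit[i][o]: probability that state i emits discrete symbol o
    """
    n = len(start)
    alpha = [_log(start[i]) + _log(emit[i][obs[0]]) for i in range(n)]
    for o in obs[1:]:
        alpha = [
            _logsumexp([alpha[i] + _log(trans[i][j]) for i in range(n)]) + _log(emit[j][o])
            for j in range(n)
        ]
    return _logsumexp(alpha)


def keyword_score(obs, keyword_hmm, filler_hmm):
    """Log-likelihood ratio; positive values favour the keyword model."""
    return log_likelihood(obs, *keyword_hmm) - log_likelihood(obs, *filler_hmm)


# Hypothetical 3-state left-to-right keyword model over symbols {0, 1, 2}.
KEYWORD = (
    [1.0, 0.0, 0.0],
    [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]],
    [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]],
)
# Hypothetical one-state filler model standing in for non-keyword speech.
FILLER = ([1.0], [[1.0]], [[1 / 3, 1 / 3, 1 / 3]])

if __name__ == "__main__":
    print(keyword_score([0, 1, 2], KEYWORD, FILLER))  # keyword-like sequence
    print(keyword_score([2, 1, 0], KEYWORD, FILLER))  # reversed sequence
```

A detection threshold on this score trades false alarms against misses.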

Journal ArticleDOI
TL;DR: By comparison to lengthening for accent, final lengthening is like a localized change in speaking tempo, although it cannot be equated directly with the specification of stiffness.
Abstract: In order to understand better the phonetic control of final lengthening, the articulation of phrase‐final syllables was compared with that of two other contexts known to increase syllable duration: accent and slow tempo. The kinematics of jaw movements in [pap] sequences and of lower lip movements in [pE] sequences for four subjects were interpreted in terms of a task‐dynamic model. There was evidence of two different control strategies: decreasing intragestural stiffness to slow down some part of the syllable, and changing intergestural phasing to decrease overlap of the vowel gesture by the consonant. The first was used in slowing down tempo, whereas the second was used to increase the duration of accented syllables over unaccented syllables. Both strategies were implicated in phrase‐final lengthening. In accented syllables, final closing gestures generally were longer and slower, but not more displaced. The two slowest subjects, however, used the other strategy in their slow‐tempo final syllables. Final lengthening in reduced syllables was more difficult to interpret. The relationship between peak velocity and displacement suggested that a lesser stiffness is obscured by an increased gestural amplitude. Thus, by comparison to lengthening for accent, final lengthening is like a localized change in speaking tempo, although it cannot be equated directly with the specification of stiffness.
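The use of the peak velocity/displacement relationship to infer stiffness can be checked on a point-attractor model. The sketch assumes, as task-dynamic models commonly do, that a gesture behaves like a critically damped mass-spring system with natural frequency ω = sqrt(k/m); the abstract itself gives no equations. For x(t) = A[1 − (1 + ωt)e^(−ωt)], the velocity A·ω²·t·e^(−ωt) peaks at t = 1/ω with value A·ω/e, so peak velocity divided by displacement depends only on stiffness, while a larger amplitude raises raw peak velocity without any change in stiffness.

```python
import math


def position(t, amplitude, omega):
    """Critically damped movement from 0 toward `amplitude`; omega = sqrt(k/m)."""
    return amplitude * (1 - (1 + omega * t) * math.exp(-omega * t))


def velocity(t, amplitude, omega):
    """Time derivative of position(): amplitude * omega**2 * t * exp(-omega*t)."""
    return amplitude * omega ** 2 * t * math.exp(-omega * t)


def peak_velocity(amplitude, omega, steps=10_000):
    """Numerically locate the peak velocity over 0 <= t <= 10/omega."""
    t_end = 10.0 / omega
    return max(velocity(i * t_end / steps, amplitude, omega) for i in range(steps + 1))


if __name__ == "__main__":
    for amplitude in (1.0, 2.0):
        for omega in (20.0, 40.0):
            ratio = peak_velocity(amplitude, omega) / amplitude
            print(amplitude, omega, ratio, omega / math.e)
```

Doubling the amplitude doubles peak velocity but leaves the ratio unchanged, whereas halving ω (a quarter of the stiffness) halves the ratio.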

PatentDOI
TL;DR: A speech recognition apparatus having reference pattern adaptation stores a plurality of reference patterns representing speech to be recognized, each stored reference pattern having associated therewith a quality value representing the effectiveness of that pattern for recognizing an incoming speech utterance.
Abstract: A speech recognition apparatus having reference pattern adaptation stores a plurality of reference patterns representing speech to be recognized, each stored reference pattern having associated therewith a quality value representing the effectiveness of that pattern for recognizing an incoming speech utterance. The method and apparatus provide for user correction actions, indicating the accuracy of a speech recognition, to be made dynamically during the recognition of unknown incoming speech utterances and after training of the system. The quality values are updated, during the speech recognition process, for at least a portion of those reference patterns used during the speech recognition process. Reference patterns having low quality values, indicative of either inaccurate representation of the unknown speech or non-use, can be deleted so long as the reference pattern is not needed, for example, where the reference pattern is the last instance of a known word or phrase. Various methods and apparatus are provided for determining when reference patterns can be deleted from or added to the reference memory, and when the scores or values associated with a reference pattern should be increased or decreased to represent the "goodness" of the reference pattern in recognizing speech.
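The quality-value bookkeeping can be sketched as a small data structure. The `ReferenceMemory` class, its fixed reward/penalty steps, and the deletion threshold below are hypothetical; the sketch only illustrates raising and lowering scores after user corrections and refusing to delete the last pattern of a word. Decay for non-use is omitted.

```python
class ReferenceMemory:
    """Hypothetical store of reference patterns with per-pattern quality values."""

    def __init__(self, reward=1.0, penalty=1.0, delete_below=-2.0):
        self.reward = reward
        self.penalty = penalty
        self.delete_below = delete_below
        self.word_of = {}   # pattern id -> word or phrase it represents
        self.quality = {}   # pattern id -> current quality value

    def add(self, pid, word, quality=0.0):
        self.word_of[pid] = word
        self.quality[pid] = quality

    def record_result(self, pid, correct):
        """Raise or lower a pattern's quality after a (user-corrected) recognition."""
        self.quality[pid] += self.reward if correct else -self.penalty

    def prune(self):
        """Delete low-quality patterns unless they are the last instance of their word."""
        removed = []
        for pid in sorted(self.quality, key=self.quality.get):  # snapshot, worst first
            word = self.word_of[pid]
            remaining = sum(1 for w in self.word_of.values() if w == word)
            if self.quality[pid] < self.delete_below and remaining > 1:
                del self.quality[pid]
                del self.word_of[pid]
                removed.append(pid)
        return removed


if __name__ == "__main__":
    mem = ReferenceMemory()
    mem.add("yes#1", "yes")
    mem.add("yes#2", "yes")
    mem.add("no#1", "no")
    for _ in range(3):
        mem.record_result("yes#1", correct=False)
        mem.record_result("no#1", correct=False)
    print(mem.prune())
```

The last-instance check keeps every word recognizable even when its only template scores poorly.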

Journal ArticleDOI
TL;DR: In this paper, the authors generalized the van Cittert-Zernike theorem to pulse echo ultrasound and showed that the spatial covariance of the backscattered pressure field is proportional to the autocorrelation of the transmitting aperture function.
Abstract: A classical theorem of statistical optics, the van Cittert–Zernike theorem, is generalized to pulse echo ultrasound. This theorem fully describes the second‐order statistics of the spatial fluctuations (the spatial covariance) of the field produced by an incoherent source. As a random scattering medium is insonified, it behaves as an incoherent source. The van Cittert–Zernike theorem can thus predict the spatial covariance of the pressure field backscattered by a random medium. It is shown that this spatial covariance and the incident energy diagram are Fourier pairs. In the case of a focused illumination, the spatial covariance of the backscattered pressure field is proportional to the autocorrelation of the transmitting aperture function. This is independent of frequency and of F-number. Experimental results obtained with a linear array are in good agreement with theoretical expectations. The implications of this theorem in speckle reduction and in focusing in nonhomogeneous media are discussed.
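For a focused transmit, the predicted spatial covariance is the autocorrelation of the transmit aperture function, evaluated at the separation between receiving elements. The sketch below computes that autocorrelation for a sampled aperture (one weight per array element, lags in element pitch); the function names are hypothetical. A uniform aperture gives the familiar triangular covariance.

```python
def autocorrelation(aperture):
    """Non-negative-lag autocorrelation of a sampled aperture function."""
    n = len(aperture)
    return [sum(aperture[i] * aperture[i + lag] for i in range(n - lag)) for lag in range(n)]


def normalized_covariance(aperture):
    """Predicted spatial covariance vs. element separation, normalized to 1 at zero lag."""
    r = autocorrelation(aperture)
    return [v / r[0] for v in r]


if __name__ == "__main__":
    uniform = [1, 1, 1, 1]
    print(normalized_covariance(uniform))  # triangular fall-off with separation
    apodized = [0.25, 0.75, 1.0, 1.0, 0.75, 0.25]  # hypothetical apodization weights
    print(normalized_covariance(apodized))
```

Because the result depends only on the aperture weights expressed in element units, it does not change with frequency or F-number, consistent with the abstract.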