Author

Thomas W. Parsons

Bio: Thomas W. Parsons is an academic researcher. The author has contributed to research in topics including Intelligibility (communication) and Voice activity detection. The author has an h-index of 1 and has co-authored 2 publications receiving 287 citations.

Papers
Journal ArticleDOI
TL;DR: In this paper, the desired voice is separated from a competing voice by selecting its harmonics in the Fourier transform of the input; the authors focus on the principal subproblem, the separation of vocalic speech.
Abstract: A common type of interference in speech transmission is that caused by the speech of a competing talker. Although the brain is adept at clarifying such speech, it relies heavily on binaural data. When voices interfere over a single channel, separation is much more difficult and intelligibility suffers. Clarifying such speech is a complex and varied problem whose nature changes with the moment‐to‐moment variation in the types of sound which interfere. This paper describes an attack on the principal subproblem, the separation of vocalic speech. Separation is done by selecting the harmonics of the desired voice in the Fourier transform of the input. In implementing this process, techniques have been developed for resolving overlapping spectrum components, for determining pitches of both talkers, and for assuring consistent separation. These techniques are described, their performance on test utterances is summarized, and the possibility of using this process as a basis for the solution of the general two‐talker problem is discussed.
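The harmonic-selection step lends itself to a compact illustration. The sketch below is not from the paper: it keeps only FFT bins lying near the harmonics of an assumed pitch estimate for the desired talker and resynthesizes the frame. The sampling rate, frame length, pitch values, and tolerance are illustrative assumptions, and a real system would also need the pitch tracking and overlapping-component resolution the abstract mentions.

```python
import numpy as np

def harmonic_select(frame, fs, f0, tolerance_hz=20.0):
    """Keep only spectral bins near harmonics of f0 (illustrative sketch)."""
    n = len(frame)
    spectrum = np.fft.rfft(frame * np.hanning(n))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)

    # Distance from each bin to the nearest harmonic of the desired pitch.
    harmonic_index = np.maximum(np.round(freqs / f0), 1)
    distance = np.abs(freqs - harmonic_index * f0)

    # Zero out bins that do not lie close to any harmonic.
    mask = distance <= tolerance_hz
    return np.fft.irfft(spectrum * mask, n)

# Example: a synthetic mixture of two "voices" at 120 Hz and 190 Hz.
fs = 8000
t = np.arange(0, 0.064, 1.0 / fs)
voice_a = sum(np.sin(2 * np.pi * 120 * k * t) / k for k in range(1, 6))
voice_b = sum(np.sin(2 * np.pi * 190 * k * t) / k for k in range(1, 6))
recovered_a = harmonic_select(voice_a + voice_b, fs, f0=120.0)
```

With the synthetic mixture above, the mask passes the 120 Hz harmonic series and attenuates the 190 Hz series except where its harmonics happen to fall within the tolerance of a 120 Hz harmonic.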

294 citations

01 Dec 1971
TL;DR: The implementation of a system which automatically recognizes spoken words in near real time is described, capable of learning and recognizing a vocabulary of 100 words of up to three syllables each and providing the user with wide flexibility in the operation of the system.
Abstract: The report describes the implementation of a system which automatically recognizes spoken words in near real time. The techniques on which the system is based were developed over a period of 10 years. During most of this time, the research activities were devoted to the problems of selecting, extracting, and utilizing essential information-bearing parameters of speech. A non-real-time version of the speech analysis and recognition techniques which were developed was tested and demonstrated in 1967. Subsequently, the objective was shifted to implementing an on-line speech recognition system. An initial version was completed in 1969. Since then, the final objective of this long-term effort has been to improve the effectiveness of the analysis and classification procedures, speed their operation, and provide the user with wide flexibility in the operation of the system. The system described was implemented in a DDP-116 computer and is capable of learning and recognizing a vocabulary of 100 words of up to three syllables each. (Author)

1 citation


Cited by
Journal ArticleDOI
26 Jun 1979
TL;DR: An overview of the variety of techniques that have been proposed for enhancement and bandwidth compression of speech degraded by additive background noise, together with a unifying framework in terms of which the relationships between these systems are more visible and which may suggest fruitful directions for further research.
Abstract: Over the past several years there has been considerable attention focused on the problem of enhancement and bandwidth compression of speech degraded by additive background noise. This interest is motivated by several factors including a broad set of important applications, the apparent lack of robustness in current speech-compression systems, and the development of several potentially promising and practical solutions. One objective of this paper is to provide an overview of the variety of techniques that have been proposed for enhancement and bandwidth compression of speech degraded by additive background noise. A second objective is to suggest a unifying framework in terms of which the relationships between these systems are more visible and which hopefully provides a structure that will suggest fruitful directions for further research.

1,236 citations

Journal ArticleDOI
TL;DR: The model is an application and illustration of the Correlation Theory of brain function; explicit treatment is restricted to the peripheral evidence represented by amplitude modulations globally present in all components of a sound spectrum.
Abstract: Sensory segmentation is an outstanding unsolved problem of theoretical, practical and technical importance. The basic idea of a solution is described in the form of a model. The response of “neurons” within the sensory field is temporally unstable. Segmentation is expressed by synchronization within segments and desynchronization between segments. Correlations are generated by an autonomous pattern formation process. Neuronal coupling is the result both of peripheral evidence (similarity of local quality) and of central evidence (common membership in a stored pattern). The model is consistent with known anatomy and physiology. However, a new physiological function, synaptic modulation, has to be postulated. The present paper restricts explicit treatment to the peripheral evidence represented by amplitude modulations globally present in all components of a sound spectrum. Generalization to arbitrary sensory qualities will be the subject of a later paper. The model is an application and illustration of the Correlation Theory of brain function.
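A toy sketch of the peripheral cue the abstract singles out, coherent amplitude modulation across spectral components, is given below. It simply groups channels whose amplitude envelopes are strongly correlated; the envelope smoothing, the band signals, and the threshold are assumptions for illustration, and the paper's actual mechanism is a dynamic neural model with synaptic modulation, not a direct correlation computation.

```python
import numpy as np

def envelope(x, fs, cutoff_hz=30.0):
    """Crude amplitude envelope: rectify, then moving-average smooth."""
    win = max(1, int(fs / cutoff_hz))
    return np.convolve(np.abs(x), np.ones(win) / win, mode="same")

def group_by_common_modulation(band_signals, fs, threshold=0.8):
    """Pair up channels whose amplitude envelopes correlate strongly.

    band_signals is a list of 1-D arrays, one per spectral channel
    (e.g. the outputs of a filterbank applied to the same input).
    """
    envs = [envelope(b, fs) for b in band_signals]
    groups = []
    for i in range(len(envs)):
        for j in range(i + 1, len(envs)):
            r = np.corrcoef(envs[i], envs[j])[0, 1]
            if r >= threshold:
                groups.append((i, j, r))
    return groups
```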

983 citations

Journal ArticleDOI
TL;DR: In this article, the authors developed a representation for discrete-time signals and systems based on short-time Fourier analysis and showed that a class of linear-filtering problems can be represented as the product of the time-varying frequency response of the filter and the short-time Fourier transform of the input signal.
Abstract: This paper develops a representation for discrete-time signals and systems based on short-time Fourier analysis. The short-time Fourier transform and the time-varying frequency response are reviewed as representations for signals and linear time-varying systems. The problems of representing a signal by its short-time Fourier transform and synthesizing a signal from its transform are considered. A new synthesis equation is introduced that is sufficiently general to describe apparently different synthesis methods reported in the literature. It is shown that a class of linear-filtering problems can be represented as the product of the time-varying frequency response of the filter and the short-time Fourier transform of the input signal. The representation of a signal by samples of its short-time Fourier transform is applied to the linear filtering problem. This representation is of practical significance because there exists a computationally efficient algorithm for implementing such systems. Finally, the methods of fast convolution are considered as special cases of this representation.
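The filtering result has a direct constructive reading: multiply each short-time spectrum of the input by a (possibly time-varying) frequency response and resynthesize by overlap-add. The sketch below is a minimal illustration under assumed window, hop, and example-response choices, not the synthesis equation or the fast-convolution formulation from the paper.

```python
import numpy as np

def stft_filter(x, response, frame_len=256, hop=64):
    """Filter x by multiplying each short-time spectrum by a gain vector.

    `response` maps the frame index m to a gain vector of length
    frame_len // 2 + 1, i.e. a time-varying frequency response.
    Resynthesis uses windowed overlap-add with window-power normalization.
    """
    window = np.hanning(frame_len)
    y = np.zeros(len(x) + frame_len)
    norm = np.zeros_like(y)
    for m, start in enumerate(range(0, len(x) - frame_len + 1, hop)):
        frame = x[start:start + frame_len] * window
        spectrum = np.fft.rfft(frame) * response(m)
        y[start:start + frame_len] += np.fft.irfft(spectrum, frame_len) * window
        norm[start:start + frame_len] += window ** 2
    return y[:len(x)] / np.maximum(norm[:len(x)], 1e-12)

# Example: a fixed low-pass response applied to white noise.
fs, n = 8000, 4096
lowpass = np.where(np.fft.rfftfreq(256, 1.0 / fs) < 1000, 1.0, 0.0)
filtered = stft_filter(np.random.randn(n), lambda m: lowpass)
```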

600 citations

DissertationDOI
01 Jan 1996
TL;DR: A blackboard-based implementation of the 'prediction-driven' approach is described which analyzes dense, ambient sound examples into a vocabulary of noise clouds, transient clicks, and a correlogram-based representation of wide-band periodic energy called the weft.
Abstract: The sound of a busy environment, such as a city street, gives rise to a perception of numerous distinct events in a human listener--the 'auditory scene analysis' of the acoustic information. Recent advances in the understanding of this process from experimental psychoacoustics have led to several efforts to build a computer model capable of the same function. This work is known as 'computational auditory scene analysis'. The dominant approach to this problem has been as a sequence of modules, the output of one forming the input to the next. Sound is converted to its spectrum, cues are picked out, and representations of the cues are grouped into an abstract description of the initial input. This 'data-driven' approach has some specific weaknesses in comparison to the auditory system: it will interpret a given sound in the same way regardless of its context, and it cannot 'infer' the presence of a sound for which direct evidence is hidden by other components. The 'prediction-driven' approach is presented as an alternative, in which analysis is a process of reconciliation between the observed acoustic features and the predictions of an internal model of the sound-producing entities in the environment. In this way, predicted sound events will form part of the scene interpretation as long as they are consistent with the input sound, regardless of whether direct evidence is found. A blackboard-based implementation of this approach is described which analyzes dense, ambient sound examples into a vocabulary of noise clouds, transient clicks, and a correlogram-based representation of wide-band periodic energy called the weft. The system is assessed through experiments that firstly investigate subjects' perception of distinct events in ambient sound examples, and secondly collect quality judgments for sound events resynthesized by the system. Although rated as far from perfect, there was good agreement between the events detected by the model and by the listeners. In addition, the experimental procedure does not depend on special aspects of the algorithm (other than the generation of resyntheses), and is applicable to the assessment and comparison of other models of human auditory organization. (Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)
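Read as architecture, the reconciliation loop the abstract describes can be caricatured in a few lines: predict the observed spectrum from the currently hypothesized sound elements, and hypothesize a new element to account for any unexplained energy. The sketch below is a heavily simplified, hypothetical rendering of that control flow, not the blackboard system described in the dissertation or its noise-cloud, transient, and weft representations.

```python
import numpy as np

def reconcile(observed, elements, new_element_floor=1e-3):
    """One prediction-driven reconciliation step (toy sketch).

    `observed` is a magnitude spectrum; `elements` is a list of dicts,
    each carrying a predicted spectrum for one hypothesized sound event.
    Energy the current hypotheses cannot explain spawns a new element.
    """
    predicted = sum((e["spectrum"] for e in elements),
                    start=np.zeros_like(observed))
    residual = np.maximum(observed - predicted, 0.0)
    if residual.mean() > new_element_floor:
        # Direct evidence lacks an explanation: hypothesize a new
        # stationary "noise cloud" that accounts for the residual energy.
        elements.append({"type": "noise_cloud", "spectrum": residual.copy()})
    return elements
```

The complementary half of the prediction-driven idea is that elements already in the model persist in the interpretation as long as the observation does not contradict their predictions, even when direct evidence for them is masked.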

399 citations

Journal ArticleDOI
Yariv Ephraim
01 Oct 1992
TL;DR: A unified statistical approach for the three basic problems of speech enhancement is developed, using composite source models for the signal and noise and a fairly large set of distortion measures.
Abstract: Since the statistics of the speech signal as well as of the noise are not explicitly available, and the most perceptually meaningful distortion measure is not known, model-based approaches have recently been extensively studied and applied to the three basic problems of speech enhancement: signal estimation from a given sample function of noisy speech, signal coding when only noisy speech is available, and recognition of noisy speech signals in man-machine communication. Research on the model-based approach is integrated and put into perspective with other more traditional approaches for speech enhancement. A unified statistical approach for the three basic problems of speech enhancement is developed, using composite source models for the signal and noise and a fairly large set of distortion measures.
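For the first of the three problems, signal estimation from noisy speech, the simplest statistical estimator is a per-frame spectral Wiener gain under a Gaussian model. The sketch below shows only that baseline; it is a stand-in assumption for illustration and is far simpler than the composite-source-model framework the paper develops.

```python
import numpy as np

def wiener_enhance_frame(noisy_frame, noise_psd, eps=1e-12):
    """Per-frame spectral Wiener estimator under a simple Gaussian model.

    noise_psd is an estimate of the noise power in each rfft bin
    (e.g. averaged over speech-free frames). The clean-speech power is
    approximated by spectral subtraction, clamped at zero.
    """
    spectrum = np.fft.rfft(noisy_frame)
    noisy_psd = np.abs(spectrum) ** 2
    speech_psd = np.maximum(noisy_psd - noise_psd, 0.0)
    gain = speech_psd / (speech_psd + noise_psd + eps)
    return np.fft.irfft(gain * spectrum, len(noisy_frame))
```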

383 citations