Showing papers on "Spectrogram" published in 1996


Book
13 May 1996
TL;DR: This book presents methods of joint time-frequency analysis (JTFA), including joint time-frequency signal representations, the adaptive Gabor expansion and adaptive spectrogram, and applications such as instantaneous frequency estimation.
Abstract: Classical Signal Analysis. Signal Joint Time-Frequency Representations. Wigner-Ville Distribution. Time-Frequency Distribution Series. Adaptive Gabor Expansion and Adaptive Spectrogram. Applications of JTFA. Instantaneous Frequency Estimation. Time-Varying Signal Pattern Recognition and Classification. Non-Linear Time-Varying Filtering.

504 citations


Book
Sudeshna Adak1
01 Jan 1996
TL;DR: In this article, a general class of piecewise locally stationary processes is introduced that allows both abrupt and smooth changes in the spectral characteristics of the nonstationary time series and can be used to model various naturally occurring phenomena.
Abstract: Modeling of nonstationary stochastic time series has found wide applications in speech processing, biomedical signal processing, seismology, and failure detection. Data from these fields have often been modeled as piecewise stationary processes with abrupt changes, and their time-varying spectral features have been studied with the help of spectrograms. A general class of piecewise locally stationary processes is introduced here that allows both abrupt and smooth changes in the spectral characteristics of the nonstationary time series. It is shown that this class of processes behaves as approximately piecewise stationary processes and can be used to model various naturally occurring phenomena. An adaptive segmentation method of estimating the time-dependent spectrum is proposed for this class of processes. The segmentation procedure uses binary trees and windowed spectra to nonparametrically and adaptively partition the data into approximately stationary intervals. Results of simulation studies dem...

166 citations


Proceedings ArticleDOI
07 May 1996
TL;DR: This paper describes techniques to automatically morph from one sound to another, representations for morphing, techniques for matching, and algorithms for interpolating and morphing each sound component.
Abstract: This paper describes techniques to automatically morph from one sound to another. Audio morphing is accomplished by representing the sound in a multi-dimensional space that is warped or modified to produce a desired result. The multi-dimensional space encodes the spectral shape and pitch on orthogonal axes. After matching components of the sound, a morph smoothly interpolates the amplitudes to describe a new sound in the same perceptual space. Finally, the representation is inverted to produce a sound. This paper describes representations for morphing, techniques for matching, and algorithms for interpolating and morphing each sound component. Spectrographic images of a complete morph are shown at the end.

94 citations


Journal ArticleDOI
TL;DR: An analysis of auto-term presentation using the reduced interference distributions (RID) is done, and an optimal kernel, with respect to the auto-term quality and cross-term suppression, is derived.
Abstract: An analysis of auto-term presentation using the reduced interference distributions (RID) is done. Comparison with an ideal time-frequency signal representation is taken as a basis for this analysis. The following distributions are considered: Choi-Williams (1989), Zhao-Atlas-Marks, Born-Jordan, sinc, Zhang-Sato (see ibid., vol.42, no.1, p.54, 1994), Butterworth, spectrogram, and the author's recently proposed S-method for time-frequency analysis. Various distributions produce different auto-term shapes. In all cases, the condition for cross-term reduction is contradictory to the condition for high auto-term quality. A procedure for designing a kernel that will produce the desired auto-term shape is demonstrated. An optimal kernel, with respect to the auto-term quality and cross-term suppression, is derived.

89 citations


Patent
15 Mar 1996
TL;DR: In the first step of a sound morphing process, each sound which forms the basis for the morph is converted into one or more quantitative representations, such as spectrograms; the temporal axes of the two sounds are then matched, so that similar components of the two sounds, such as onsets, harmonic regions and inharmonic regions, are aligned with one another.
Abstract: In the first step of a sound morphing process, each sound which forms the basis for the morph is converted into one or more quantitative representations, such as spectrograms. After the representations have been obtained, the temporal axes of the two sounds are matched, so that similar components of the two sounds, such as onsets, harmonic regions and inharmonic regions, are aligned with one another. Other characteristics of the sounds, such as pitch, formant frequencies, or the like, are then matched. Once the energy in each of the sounds has been accounted for and matched to that of the other sound, the two sounds are cross-faded, to produce a representation of a new sound. This representation is then inverted, to generate the morphed sound.

68 citations


Journal ArticleDOI
TL;DR: Theoretical estimates of the standard deviation of four acoustospectrographic parameters (the intercept and slope of attenuation and backscatter coefficient) are derived and it is proposed that the deviation from these estimates is a potential tissue characterization parameter.
Abstract: Theoretical estimates of the standard deviation (STD) of four acoustospectrographic parameters (the intercept and slope of attenuation and backscatter coefficient) are derived. This derivation expands and corrects existing derivations, and is confirmed using simulations based on the adopted theoretical model. A robust parameter estimation method is applied to various phantom measurements, and to in vivo liver scans of healthy human subjects. The measured STD is higher than the theoretically predicted value, and we investigated four possible factors which explain this discrepancy. First, it is shown that the STD and bias after spectrogram calculation are rather insensitive to changes in windowing function, type, length and overlap. Second, we observed that a diffraction correction spectrogram calibrated on a medium different from the one being measured insufficiently corrects the depth-dependency of the parameters, which affects both precision as well as accuracy. We therefore propose a method that constructs an organ-specific diffraction correction spectrogram from the averaged spectrogram of a set of normal organs. We show that the organ-specific correction does not affect STD even in case of previously unseen acquisitions. Third, we introduce local inhomogeneity, which predicts excess STD due to local variations of the physical parameters within an organ (i.e., intrasubject), and global inhomogeneity, which predicts variations between organs (i.e., intersubject). We conclude that our method of estimating STD predicts normal, in vivo data very well, and propose that the deviation from these estimates is a potential tissue characterization parameter.

55 citations


Journal ArticleDOI
TL;DR: Although the proposed transform has been derived heuristically—namely, to be optimal in the perceptual frequency scale in Gabor-sense and to perform a 1 CB speech analysis—it appears that this is a self-invertible, overcomplete, shiftable transform.

44 citations


Journal ArticleDOI
04 Jun 1996
TL;DR: In this article, Tikhonov deconvolution is used to transform the processed spectrogram in such a way as to facilitate finding initial estimates of its peak parameters, which are then corrected iteratively; the resulting gains in accuracy of estimating the parameters of peaks are demonstrated using both synthetic and real-world spectrophotometric data.
Abstract: The problem of spectrogram interpretation is considered under the assumption that the parameters of spectral peaks (their positions and magnitudes) contain the information essential for spectrometric analysis. The subsequent use of Tikhonov deconvolution and iterative correction of the estimates of those parameters is proposed. Deconvolution is used for transforming the processed spectrogram in such a way as to facilitate finding initial estimates of its parameters. The advantages of the proposed approach, i.e., gains in accuracy of estimating the parameters of peaks, are demonstrated using both synthetic and real-world spectrophotometric data.

35 citations
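
The deconvolution step described in the abstract above can be illustrated with a minimal sketch. It assumes a circular (periodic) blur model and the standard frequency-domain Tikhonov estimate, X = conj(H)·Y / (|H|² + λ); the blur kernel, the regularization weight `lam`, and all function names are illustrative assumptions, not the authors' algorithm, and the iterative correction stage is omitted.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (adequate for short demo signals)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for m in range(n))
        out.append(s / n if inverse else s)
    return out

def circular_convolve(x, h):
    """Circular convolution, used here as the assumed instrument blur model."""
    n = len(x)
    return [sum(x[m] * h[(i - m) % n] for m in range(n)) for i in range(n)]

def tikhonov_deconvolve(y, h, lam):
    """Tikhonov-regularized inverse filter: conj(H) * Y / (|H|^2 + lam)."""
    Y, H = dft(y), dft(h)
    X = [Hk.conjugate() * Yk / (abs(Hk) ** 2 + lam) for Hk, Yk in zip(H, Y)]
    return [v.real for v in dft(X, inverse=True)]

# Synthetic "spectrogram" with two peaks, blurred by a hypothetical kernel.
n = 32
x_true = [0.0] * n
x_true[10] = 1.0
x_true[20] = 0.5
h = [0.0] * n
h[0], h[1], h[-1] = 0.5, 0.25, 0.25  # symmetric smoothing kernel (assumption)

y = circular_convolve(x_true, h)
x_hat = tikhonov_deconvolve(y, h, lam=1e-3)

# Initial estimates of peak positions: indices of the two largest values.
peak_positions = sorted(range(n), key=lambda i: x_hat[i], reverse=True)[:2]
```

The regularization term keeps the division stable where the kernel's frequency response is close to zero; a larger `lam` suppresses more noise at the cost of broader recovered peaks.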


Journal ArticleDOI
TL;DR: In this article, the influence of noise on the two most important distributions (spectrogram and Wigner distribution) is analyzed in a unified manner using the S-method and the expressions for mean and variance are derived.
Abstract: An analysis of time-frequency representations of noisy signals is performed. Using the method for time-frequency signal analysis which was recently defined by Stankovic (the S-method), the influence of noise on the two most important distributions (spectrogram and Wigner distribution) is analyzed in a unified manner. It is also shown that, for signals whose instantaneous frequency is not constant, an improvement over the spectrogram and the Wigner distribution performances in a noisy environment may be achieved using the S-method. The expressions for mean and variance are derived. Results are given for several illustrative and numerical examples.

31 citations


Proceedings ArticleDOI
07 May 1996
TL;DR: This work investigates the question of what the joint moments of a signal are by considering the joint moments of the spectrogram for limiting cases of the window; the derived expressions for the joint moments reveal the distorting effects of the spectrogram window.
Abstract: We investigate the question of what the joint moments of a signal are by considering the joint moments of the spectrogram for limiting cases of the window. Operator methods are also explored. Expressions for the joint moments are derived, which reveal the distorting effects of the spectrogram window. Knowledge of the joint moments of a signal may be useful in estimating positive time-frequency distributions, or in signal classification of nonstationary signals.

26 citations


Patent
04 Sep 1996
TL;DR: In this paper, the analysis window of a spectrogram is rotated relative to the frequency components of the signal by preprocessing using a fractional Fourier transform to form rotated window spectrograms.
Abstract: A speech processing and analysis apparatus and method for generating a time-frequency distribution of a speech signal combines a set of spectrograms with varying window lengths and orientations to provide a parameter-less time-frequency distribution having good joint time and frequency resolution at all angular orientations. The analysis window of a spectrogram is rotated relative to the frequency components of the signal by preprocessing using a Fractional Fourier Transform to form rotated window spectrograms. In particular, to form the rotated window spectrogram, the signal is initially pre-processed using a Fractional Fourier Transform of angle α, the spectrogram time-frequency distribution of the pre-processed signal is then computed using analysis window h(t) and then rotated by angle -α. The geometric mean of a set of rotated window spectrograms, which are indexed by both the analysis window length and the angular orientation of the window relative to the signal's time-frequency features, is then computed to form a combination of rotated window spectrograms.
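
The final combination step in the abstract above, a geometric mean over a set of spectrograms, can be sketched as follows. This is only an illustration of the averaging step: it assumes the spectrograms have already been computed and resampled onto one common time-frequency grid, and it leaves out the Fractional Fourier Transform rotation entirely. The function name and the `floor` guard against log(0) are assumptions.

```python
import math

def geometric_mean_tfd(tfds, floor=1e-12):
    """Pointwise geometric mean of non-negative time-frequency arrays.

    tfds  -- list of 2-D lists, all with the same shape (assumed pre-aligned)
    floor -- small positive value substituted for zeros before taking logs
    """
    rows, cols = len(tfds[0]), len(tfds[0][0])
    count = len(tfds)
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            log_sum = sum(math.log(max(t[i][j], floor)) for t in tfds)
            out[i][j] = math.exp(log_sum / count)
    return out

# Two toy distributions on a 2 x 2 grid.
a = [[4.0, 0.0], [1.0, 9.0]]
b = [[1.0, 5.0], [1.0, 1.0]]
combined = geometric_mean_tfd([a, b])
```

Because the mean is geometric, energy survives only at points where every input distribution has energy, which is why smearing that appears in just one window configuration is attenuated in the combination.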

Journal ArticleDOI
TL;DR: In this paper, a new method for numerical correction of spectrograms is proposed, which consists of sequential use of the Tikhonov deconvolution algorithm, for estimating the positions of spectral peaks, and a curve-fitting algorithm for estimating their magnitudes.
Abstract: The problem of numerical correction of spectrograms is addressed. A new method of correction is developed which consists of sequential use of the Tikhonov deconvolution algorithm, for estimating the positions of spectral peaks, and a curve-fitting algorithm, for estimating their magnitudes. The metrological and numerical properties of the proposed method for spectrogram interpretation are assessed by means of spectrometry-based criteria, using synthetic and real-world spectrograms. Conclusions are drawn concerning computational complexity and accuracy of the proposed method and its metrological applicability.

Journal ArticleDOI
TL;DR: In this article, a joint time-frequency ISAR algorithm that combines the conventional ISAR processing with the joint time-frequency signal representation is presented, where the adaptive spectrogram, applied to the range axis of the ISAR image, is used as the time-frequency processing engine.
Abstract: A new joint time-frequency ISAR algorithm that combines the conventional ISAR processing with the joint time-frequency signal representation is presented. The adaptive spectrogram, applied to the range axis of the ISAR image, is used as the time-frequency processing engine. The algorithm is tested using the chamber measurement data from a scale model airplane. The results show that the nonpoint scattering mechanisms due to the waveguide-like engine inlet can be seamlessly removed, leading to an enhanced ISAR image consisting only of point scatterers. Furthermore, the extracted inlet features are displayed in the frequency-aspect plane and show distinct waveguide cutoff features.

Proceedings ArticleDOI
18 Jun 1996
TL;DR: In this article, an improvement to spectrogram reassignment based on a multi-window procedure is proposed to avoid peaked areas in noise-only regions; its efficiency is illustrated by numerical simulations and supported by quantitative measures.
Abstract: The reassignment method has been proved to improve the time-frequency representation of deterministic signals and especially of "chirp-like" FM signals. Unfortunately, reassignment presents some limitations when broadband noise is present. In this case, the squeezing process yields peaked areas in noise-only regions: this drawback should be avoided, since a rather flat energy distribution should be expected there. We propose in this paper an improvement to the spectrogram reassignment, based on a multi-window procedure, which brings a solution to this kind of problem. The efficiency of the proposed method is illustrated by numerical simulations and supported by quantitative measures.

Proceedings ArticleDOI
25 Aug 1996
TL;DR: A novel speech signal classification scheme based on spectrograms which are subjected to wavelet transform: a procedure which yields specific information regarding time and frequency variation of the signal.
Abstract: This paper describes a novel speech signal classification scheme based on spectrograms which are subjected to wavelet transform: a procedure which yields specific information regarding time and frequency variation of the signal. Feature vectors are extracted and classified using LVQ networks. The output of the network is interpreted as a fuzzy membership coefficient. This scheme is applied to the classification of voice dysphonia.

Proceedings ArticleDOI
TL;DR: In this paper, the authors studied the scattering interaction of electromagnetic pulses of short duration with a few targets and compared different time-frequency distributions of the Wigner-type, or Cohen class.
Abstract: We study the scattering interaction of electromagnetic pulses of short duration with a few targets. The targets are two spheres and a short cylinder made of metal, and they are buried at selected depths in dry sand contained in an indoor sandbox. The backscattered echoes are extracted by an impulse radar system playing the role of a ground penetrating radar (GPR). In general, multiple scattering between a buried target and the ground surface and scattering from discontinuities in the sand distort the returned echoes to the extent that target recognition by means of frequency signature is nearly impossible. These obstacles for successful target recognition can be counteracted by analyzing returned echoes by means of time-frequency distributions of the Wigner-type, or Cohen class, by which it is possible to study how each one of the target's signature features evolves in time. Numerous members of the Cohen class of time-frequency distributions have been proposed over the years, each with its own property of concentrating the features in time-frequency and ability of suppressing undesirable cross-terms interference. We examine how, and how well, a few members of the Cohen class reveal the time-progression of each target's features. The time-frequency distributions we compare in this survey are the pseudo-Wigner distribution, the Choi-Williams distribution, the adaptive spectrogram, the cone-shaped distribution, the Gabor spectrogram, and the spectrogram. We discuss the ability of each method of analysis to extract and concentrate features of the signature of each target in time-frequency and to suppress undesirable interference from cross-terms or multiple scattering. The results serve to assess the possibility of identifying subsurface targets using a GPR.

Journal ArticleDOI
TL;DR: An automatic system for regional seismic phase identification from mono-component single station records is presented based on a neural network study of the spectrogram, which allows the system to take into account the variability of the different regional seismic phases in a wide magnitude and distance range.
Abstract: We present an automatic system for regional seismic phase identification from mono-component single station records. It is based on a neural network study of the spectrogram. A large dataset of regional events checked by experts has been used for the training step. A sophisticated neural network design allows the system to take into account the variability of the different regional seismic phases in a wide magnitude and distance range. On the training and test sets respectively, more than 85% and 70% of the data are correctly classified.

Journal ArticleDOI
TL;DR: A class of linear, vector-valued time-frequency representations (TFRs) that are easily related to associated bilinear TFDs through the SP decomposition are introduced and can be realized as a weighted sum of STFT synthesis schemes.
Abstract: Cohen's (1989) class of time frequency distributions (TFDs), which includes the spectrogram (SP), Wigner distribution (WD), and reduced interference distributions (RIDs) has become widely known as a useful signal analysis tool. It has been shown that every real-valued TFD can be written as a weighted sum of SPs. The "SP decomposition" has been used to construct fast approximations to desirable TFDs using the SP building block, for which there exist accessible and efficient hardware and software implementations. We introduce a class of linear, vector-valued time-frequency representations (TFRs) that are easily related to associated bilinear TFDs through the SP decomposition. We solve a least-squares signal synthesis problem on modified vector-valued TFRs that are associated with nonnegative TFDs as a weighted sum of least-squares short-time Fourier transform (STFT) signal synthesis schemes. We extend the solution to vector-valued TFRs associated with high-resolution TFDs in order to define a high-resolution alternative to STFT signal synthesis, as demonstrated by desirable properties and examples. The resulting signal synthesis methods can be realized as a weighted sum of STFT synthesis schemes, for which there exist accessible and efficient hardware and software implementations.

Journal ArticleDOI
TL;DR: The proposed algorithm has the advantage of providing formant trajectories which, in addition to being sufficiently close to the spectral peaks of the respective formants, are sufficiently smooth to allow an accurate evaluation of formant transitions.

Proceedings ArticleDOI
13 Oct 1996
TL;DR: In this paper, the Discrete Time Wavelet Transform of the signal is calculated, the highest scales along with the lowpass residue of the wavelet transform are treated as signals, and the spectrogram of each one of them is calculated and in turn treated as a 2-D image.
Abstract: Recognition of pre-defined musical patterns in the context of Greek Traditional Music is very useful to researchers in Musicology and Ethnomusicology. This paper presents an efficient method for recognizing isolated musical patterns played by Greek Traditional Clarinet, in a monophonic environment. The Discrete Time Wavelet Transform of the signal is calculated. The highest scales, along with the lowpass residue of the Wavelet Transform, are treated as signals and the spectrogram of each one of them is calculated. The spectrograms are in turn treated as 2-D images. A number of translation and scaling invariant moments are then computed for the resulting images. These moments are used as features, and turn out to cluster around certain points in the corresponding multidimensional feature space, for the various musical patterns. A tree-structured classification procedure is then adopted for classification. A few clusters correspond to more than one musical pattern. In such cases a Dynamic Time Warping procedure is employed to determine the specific pattern.
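
The abstract above does not say which invariant moments are used. As one common choice, the sketch below computes normalized central moments of a spectrogram treated as a 2-D image: central moments give translation invariance, and dividing by a power of the total mass gives approximate scale invariance on a discrete grid. The function names and the toy images are assumptions for illustration.

```python
def raw_moment(img, p, q):
    """Raw image moment m_pq = sum over pixels of x^p * y^q * intensity."""
    return sum((x ** p) * (y ** q) * v
               for y, row in enumerate(img)
               for x, v in enumerate(row))

def normalized_central_moment(img, p, q):
    """eta_pq = mu_pq / m00^(1 + (p + q) / 2), with mu_pq taken about the centroid."""
    m00 = raw_moment(img, 0, 0)
    xc = raw_moment(img, 1, 0) / m00
    yc = raw_moment(img, 0, 1) / m00
    mu = sum(((x - xc) ** p) * ((y - yc) ** q) * v
             for y, row in enumerate(img)
             for x, v in enumerate(row))
    return mu / m00 ** (1 + (p + q) / 2.0)

def blank(rows, cols):
    return [[0.0] * cols for _ in range(rows)]

# The same small pattern placed at two different positions in an 8 x 8 image.
img_a = blank(8, 8)
img_a[1][1], img_a[1][2], img_a[2][1] = 1.0, 2.0, 3.0
img_b = blank(8, 8)
img_b[4][3], img_b[4][4], img_b[5][3] = 1.0, 2.0, 3.0  # shifted by dx=2, dy=3

features_a = [normalized_central_moment(img_a, p, q) for p, q in [(2, 0), (1, 1), (0, 2)]]
features_b = [normalized_central_moment(img_b, p, q) for p, q in [(2, 0), (1, 1), (0, 2)]]
```

The two feature vectors agree up to rounding error, so a pattern yields the same features regardless of where it sits in the time-frequency plane.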

Proceedings ArticleDOI
12 May 1996
TL;DR: For quasistationary processes with small product of temporal and spectral correlation width (underspread processes), it is shown that one and the same set of orthogonal windows is appropriate for both the estimation and the nonstationary Wiener filtering.
Abstract: The short-time Fourier transform (STFT) and its squared magnitude, the spectrogram, are classical tools for linear and quadratic time-frequency signal representation. The choice of the STFT window entails a well-known duration-bandwidth tradeoff. Multi-window methods, as originally introduced by Thomson for spectrum estimation, help to overcome this tradeoff at the cost of a more complicated concept. The present paper extends multiwindow methods from spectral estimation to filtering of nonstationary processes. By using the Kohn-Nirenberg correspondence, new results about STFT-based filter design are obtained. For quasistationary processes with small product of temporal and spectral correlation width (underspread processes), it is shown that one and the same set of orthogonal windows is appropriate for both the estimation and the nonstationary Wiener filtering. This fact makes the present theory suitable for a numerically efficient, parallel concept for on-line signal enhancement.
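
The definition that opens the abstract above, the spectrogram as the squared magnitude of the STFT, can be sketched in a few lines. This is the classical single-window form only, not the multiwindow method of the paper; the Hann window, the hop size, and the test tone are illustrative assumptions.

```python
import cmath
import math

def hann(n):
    """Symmetric Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def spectrogram(signal, window, hop):
    """Return frames[t][k] = |STFT(t, k)|^2 for non-negative frequency bins."""
    n = len(window)
    frames = []
    for start in range(0, len(signal) - n + 1, hop):
        segment = [signal[start + i] * window[i] for i in range(n)]
        row = []
        for k in range(n // 2 + 1):
            s = sum(segment[m] * cmath.exp(-2j * cmath.pi * k * m / n)
                    for m in range(n))
            row.append(abs(s) ** 2)
        frames.append(row)
    return frames

# An 8 Hz tone sampled at 64 Hz; a 32-sample window gives 2 Hz bin spacing.
fs = 64
sig = [math.sin(2 * math.pi * 8 * t / fs) for t in range(128)]
spec = spectrogram(sig, hann(32), hop=16)
peak_bins = [max(range(len(row)), key=row.__getitem__) for row in spec]
```

The window length sets the tradeoff mentioned in the abstract: doubling it halves the bin spacing (finer frequency resolution) but makes each frame cover twice as much time.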

Proceedings ArticleDOI
14 Oct 1996
TL;DR: A novel feature extraction technique based on the two-dimensional DCT (discrete cosine transform) and zigzag scanning of the spectrogram is proposed, in contrast to conventional approaches based on single dimension analysis such as LPC, cepstral, or FFT.
Abstract: In this paper a novel feature extraction technique based on the two-dimensional DCT (discrete cosine transform) and zigzag scanning of the spectrogram is proposed. This is in contrast to conventional approaches based on single dimension analysis such as LPC, cepstral, or FFT. As a phoneme recognition task, a series of experiments were conducted on the voiced stops ('b', 'd', 'g') of the TIMIT database uttered by 630 speakers (male and female). The extracted data form the basis for input patterns for training two types of neural networks, the semi-dynamic network (TDNN), and a static network (MLP). The highest recognition rates of 77.5 and 72.4 percent were recorded for TDNN and MLP respectively. This contrasts with results of 72 percent quoted by Hwang et al. (1992) for the same phonemes spoken by 40 females.
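
A minimal sketch of the feature extraction idea in the abstract above: take the 2-D DCT of a spectrogram patch and read the coefficients in zigzag order so that low-frequency terms come first. The orthonormal DCT-II scaling, the JPEG-style zigzag direction, and the choice to keep the first few coefficients are assumptions; the abstract does not specify them.

```python
import math

def dct2(block):
    """Orthonormal 2-D DCT-II of a rectangular block given as a list of rows."""
    rows, cols = len(block), len(block[0])

    def alpha(k, n):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            s = 0.0
            for i in range(rows):
                for j in range(cols):
                    s += (block[i][j]
                          * math.cos(math.pi * (2 * i + 1) * u / (2 * rows))
                          * math.cos(math.pi * (2 * j + 1) * v / (2 * cols)))
            out[u][v] = alpha(u, rows) * alpha(v, cols) * s
    return out

def zigzag(matrix):
    """Flatten a matrix along anti-diagonals, alternating direction (JPEG style)."""
    rows, cols = len(matrix), len(matrix[0])
    order = []
    for d in range(rows + cols - 1):
        diagonal = [(i, d - i) for i in range(rows) if 0 <= d - i < cols]
        if d % 2 == 0:
            diagonal.reverse()
        order.extend(diagonal)
    return [matrix[i][j] for i, j in order]

# Hypothetical 4 x 4 spectrogram patch with constant energy.
patch = [[1.0] * 4 for _ in range(4)]
coeffs = dct2(patch)
features = zigzag(coeffs)[:6]  # keep the six lowest-order coefficients

scan_demo = zigzag([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

For a constant patch all energy lands in the first (DC) coefficient, which shows why truncating the zigzag sequence keeps most of the information in a smooth spectrogram region.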

Proceedings ArticleDOI
18 Jun 1996
TL;DR: A new method of displaying and analyzing the evolutionary correlation structure of nonstationary signals, called time-correlation analysis (TCA), is proposed, based on a filter-bank approach for stochastic signal characterization known as parametric filtering.
Abstract: This paper proposes a new method of displaying and analyzing the evolutionary correlation structure of nonstationary signals. The method, called time-correlation analysis (TCA), is based on a filter-bank approach for stochastic signal characterization known as parametric filtering. Some properties of the TCA method are discussed that can be used to interpret the TCA plot. Examples of an application to speech analysis are given.

Patent
24 Dec 1996
TL;DR: A smoothed spectrogram is computed in which the gaps between lattice points of the time/frequency plane are filled by bilinear surfaces, reducing the influence of periodicity in the frequency and time directions.
Abstract: PROBLEM TO BE SOLVED: To reduce the influence of the periodicity of a voice signal. SOLUTION: A smoothing spectrogram calculating section 10 obtains a triangular interpolation function whose frequency width is twice the fundamental frequency of the signal, based on information about the fundamental frequency of the signal. This interpolation function is convolved, in the frequency direction, with the spectrum obtained by an adaptive frequency analyzing section 9. Next, the spectrum already interpolated in the frequency direction is interpolated in the time direction using a triangular interpolation function whose time length is twice the fundamental period, yielding a smoothed spectrogram in which the gaps between lattice points of the time/frequency plane are filled by bilinear surfaces. The voice is converted using this smoothed spectrogram. The influence of periodicity in the frequency and time directions is thereby reduced.

Proceedings ArticleDOI
07 May 1996
TL;DR: Three new methods of instantaneous frequency estimation are introduced and compared in view of characterizing gravitational waves, and averages based on different windowings make it possible to enhance the signal-to-noise ratio, leading to accurate results even below 0 dB.
Abstract: Three new methods of instantaneous frequency estimation are introduced and compared in view of characterizing gravitational waves. Two methods are Bayesian and can be formulated as solutions of an ill-posed inverse problem with two different stochastic regularizations, using either a state-space model for the time-frequency data or a compound non-uniform Bernoulli-Gauss model for the instantaneous frequency. The third method uses a reassignment technique applied to a spectrogram. In each case, averages based on different windowings make it possible to enhance the signal-to-noise ratio, leading to accurate results even below 0 dB.

Proceedings ArticleDOI
31 Oct 1996
TL;DR: It is shown that time-frequency distributions (TFDs) offer an advantage over the classical spectrogram for HRV analysis and that, given the characteristics of the HRV signal, it is advisable to use adaptive kernels in order to improve the temporal resolution.
Abstract: The analysis of heart rate variability (HRV) is currently used as a means for noninvasively researching the autonomous control of the cardiovascular system. In this analysis it is very important to have a good temporal resolution as the phenomena analyzed may present a very short duration. We show the advantage of the use of time-frequency distributions (TFD) with respect to the classical spectrogram. We show that given the characteristics of the HRV signal, it is advisable to use adaptive kernels in order to improve the temporal resolution. Finally, we comment on the results obtained applying this technique to the analysis of ischemic episodes.

Proceedings ArticleDOI
TL;DR: Using parallel-processing hardware, FROG traces of limited size can now be used as input for a neural net without any feature extraction; the original approach relied on feature extraction by computing the lowest-order integral moments of the FROG trace, which made it particularly sensitive to the presence of additive noise.
Abstract: Frequency-resolved optical gating (FROG) is a technique that allows the determination of the intensity and phase of ultrashort laser pulses. In FROG, a spectrogram of the pulse, the so-called FROG trace, is produced, from which the intensity and phase are then retrieved using an iterative algorithm. This algorithm performs well for all types of pulses, but it sometimes requires more than a minute to converge, and more rapid retrieval is important for many applications. It is therefore desirable to have a non-iterative computational method capable of inverting the function that relates the pulse intensity and phase to its FROG trace. In previous work, we showed that a neural network can retrieve simple pulses rapidly and directly. This original approach involved feature extraction by computing the lowest-order integral moments of the FROG trace, making it particularly sensitive to the presence of additive noise. Using parallel-processing hardware, we are now able to use FROG traces of limited size (32 × 32 pixels) without any feature extraction as input for a neural net. In addition, FROG traces of 64 × 64 pixels, typical for experimental data, can be used in conjunction with a more noise-insensitive feature extraction method.

Proceedings ArticleDOI
18 Jun 1996
TL;DR: This work addresses the problem of automatic detection and classification on spectrograms of mixed planetary low frequency radio signals with additive plasma noise, using a series of preprocessings that enables the use of neural networks.
Abstract: We address the problem of automatic detection and classification on spectrograms of mixed planetary low frequency radio signals with additive plasma noise. The signals and the noise under study are overlapping, non-Gaussian, nonstationary and nonlinear. The data obtained from spacecraft telemetry are irregularly sampled. We show a series of preprocessings that enables the use of neural networks. A cluster of time delay neural networks is then used to observe the signals from many windows. The different outputs of the time delay neural networks are the inputs of multilayer perceptrons which yield an intermediate classification. Cellular automata with a look-up table of rules derived from the physical laws governing the radio electric phenomena do the final pattern recognition in a deterministic number of iterations.

Proceedings ArticleDOI
TL;DR: A two-dimensional discrete wavelet transform is applied to the noisy FROG trace, the wavelet coefficients are thresholded, and the inverse wavelet transform is performed to regain the trace; this efficiently removes noise from the trace and improves the algorithm's ability to retrieve the intensity and phase of the pulse accurately.
Abstract: Frequency-resolved optical gating (FROG) is a technique for measuring ultrashort laser pulses that involves producing a spectrogram of the pulse and then retrieving the intensity and phase of the electric field using a phase-retrieval algorithm. Since noise on experimental FROG traces reduces the performance of the retrieval algorithm, removing the noise is crucial. In previous work we have shown that subtracting the mean of the noise, optimized lowpass filtering, and suppression of the corners of the trace provides an efficient tool for denoising FROG traces. The recent development of wavelet noise-reduction techniques for signal and image processing now provides a new method for attacking this problem. We apply a two-dimensional discrete wavelet transform to the noisy FROG trace, threshold the wavelet coefficients, and perform the inverse wavelet transform to regain the trace. In combination with other noise-filtering methods, this efficiently removes noise from the trace and improves the algorithm's ability to retrieve the intensity and phase of the pulse accurately, especially in fairly low-noise situations, where extremely high accuracy is desired. In addition to wavelet-coefficient thresholding, we also investigate the possibility of using a geometrical scheme for filtering the wavelet coefficients, thus combining data compression and noise reduction.
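
The transform, threshold, inverse-transform sequence in the abstract above can be sketched as follows. The abstract does not name the wavelet or the thresholding rule, so the single-level Haar wavelet, the soft threshold, and the decision to leave the approximation sub-band untouched are all assumptions made for illustration; the image is assumed to have even dimensions.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_1d(v):
    """One level of the orthonormal Haar transform: [approximations | details]."""
    half = len(v) // 2
    approx = [(v[2 * i] + v[2 * i + 1]) / SQRT2 for i in range(half)]
    detail = [(v[2 * i] - v[2 * i + 1]) / SQRT2 for i in range(half)]
    return approx + detail

def inv_haar_1d(c):
    """Inverse of haar_1d."""
    half = len(c) // 2
    out = []
    for i in range(half):
        a, d = c[i], c[half + i]
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out

def transpose(m):
    return [list(r) for r in zip(*m)]

def haar_2d(img):
    """Transform every row, then every column."""
    rows = [haar_1d(r) for r in img]
    return transpose([haar_1d(c) for c in transpose(rows)])

def inv_haar_2d(coef):
    """Undo the column transform, then the row transform."""
    cols = transpose([inv_haar_1d(c) for c in transpose(coef)])
    return [inv_haar_1d(r) for r in cols]

def soft_threshold(x, t):
    """Shrink x toward zero by t; values with |x| <= t become zero."""
    return math.copysign(max(abs(x) - t, 0.0), x)

def denoise(img, t):
    """Threshold the three detail sub-bands and reconstruct the image."""
    coef = haar_2d(img)
    h, w = len(coef) // 2, len(coef[0]) // 2
    for i in range(len(coef)):
        for j in range(len(coef[0])):
            if i >= h or j >= w:  # skip the approximation (top-left) sub-band
                coef[i][j] = soft_threshold(coef[i][j], t)
    return inv_haar_2d(coef)

# Hypothetical 4 x 4 trace: flat background with one slightly noisy pixel.
trace = [[1.0] * 4 for _ in range(4)]
trace[1][2] = 1.1
restored = denoise(trace, t=0.0)   # zero threshold: exact reconstruction
smoothed = denoise(trace, t=1.0)   # large threshold: all detail removed
```

With a threshold large enough to remove every detail coefficient, each 2 × 2 block collapses to its mean, so the isolated deviation is spread over its block instead of standing out as a spike.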

Journal ArticleDOI
TL;DR: This study examines the appropriateness of FD’s combined with 17 other general features for classifying spectrogram images, including eigenvalues and eigenvectors, gray‐level variance and covariance, run‐length and chain encodings, and segment size, shape, and compactness.
Abstract: Speech spectrograms can be analyzed using computer image processing techniques to yield high recognition rates [B. Pinkowski, Pattern Recognition 26, 1593–1602 (1993)]. In particular, Fourier descriptors (FD’s) have proven useful for characterizing the boundary of segmented isolated words containing the English semivowels (/w/, /y/, /l/, /r/). This study examines the appropriateness of FD’s combined with 17 other general features for classifying spectrogram images. The other features include eigenvalues and eigenvectors, gray‐level variance and covariance, run‐length and chain encodings, and segment size, shape, and compactness. Principal components (PC’s) are used for feature reduction on a speaker‐dependent data set consisting of 80 sounds representing 20 speaker‐dependent words containing semivowels. With eight combined features, including four 32‐point FD’s and four general features obtained from principal component analysis, a 97.5% recognition rate was obtained using a linear discriminant function. ...