
Showing papers on "Spectrogram published in 1995"


Journal ArticleDOI
TL;DR: The reassignment method, first applied by Kodera, Gendrin, and de Villedary (1976) to the spectrogram, is generalized to any bilinear time-frequency or time-scale distribution.
Abstract: In this paper, the use of the reassignment method, first applied by Kodera, Gendrin, and de Villedary (1976) to the spectrogram, is generalized to any bilinear time-frequency or time-scale distribution. This method creates a modified version of a representation by moving its values away from where they are computed, so as to produce a better localization of the signal components. We first propose a new formulation of this method, followed by a thorough theoretical study of its characteristics. Its practical use for a large variety of known time-frequency and time-scale distributions is then addressed. Finally, some experimental results are reported to demonstrate the performance of this method.

1,268 citations


Journal ArticleDOI
TL;DR: For the SWITCHBOARD corpus, this paper compensates for the systematic variability due to the different vocal tract lengths of various speakers by warping the spectrum of each speaker linearly over a 20% range and finding the maximum a posteriori probability of the data given the warp.
Abstract: The performance of speech recognition systems is often improved by accounting explicitly for sources of variability in the data. In the SWITCHBOARD corpus, studied during the 1994 CAIP workshop [Frontiers in Speech Processing Workshop II, CAIP (August 1994)], an attempt was made to compensate for the systematic variability due to different vocal tract lengths of various speakers. The method found a maximum probability parameter for each speaker which mapped an acoustic model to the mean of the models taken from a homogeneous speaker population. The underlying acoustic model was that of a straight tube, and the parameter estimation was accomplished by warping the spectrum of each speaker linearly over a 20% range (actually accomplished by digitally resampling the data), and finding the maximum a posteriori probability of the data given the warp. The technique produces statistically significant improvements in accuracy on a speech transcription task using each of four different speech recognition systems. The best parametrizations were later found to correlate well with vocal tract estimates computed manually from spectrograms.
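The warp search described in this abstract can be sketched as a grid search over linear warp factors, with the warp carried out by resampling. This is only a toy illustration under stated assumptions: `warp_by_resampling`, `best_warp`, `tone_match_score`, and `ALPHAS` are hypothetical names, linear interpolation stands in for whatever resampler the authors used, and the score is a placeholder for the maximum a posteriori criterion under an acoustic model.

```python
import cmath
import math

def warp_by_resampling(x, alpha, length):
    """Return `length` samples of x read at positions m * alpha.

    Reading the signal faster (alpha > 1) scales every frequency up by alpha,
    which is a linear warp of the spectrum. Linear interpolation is an
    assumption made for brevity.
    """
    if (length - 1) * alpha > len(x) - 1:
        raise ValueError("signal too short for this warp factor and length")
    out = []
    for m in range(length):
        pos = m * alpha
        i = int(pos)
        frac = pos - i
        nxt = x[i + 1] if i + 1 < len(x) else x[i]
        out.append((1.0 - frac) * x[i] + frac * nxt)
    return out

def best_warp(x, score, alphas, length):
    """Pick the warp factor whose warped signal maximizes `score`."""
    return max(alphas, key=lambda a: score(warp_by_resampling(x, a, length)))

def tone_match_score(y, omega_ref=0.2):
    """Placeholder score: how strongly y matches a reference tone (rad/sample).

    A real system would evaluate the probability of the warped data under
    its acoustic model instead.
    """
    acc = sum(v * cmath.exp(-1j * omega_ref * m) for m, v in enumerate(y))
    return abs(acc) / len(y)

# A 20% range of warp factors in 1% steps (the step size is an assumption).
ALPHAS = [0.90 + 0.01 * i for i in range(21)]
```

Using the same output length for every warp factor keeps the scores comparable, since a longer warped signal would otherwise accumulate a larger raw score.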

103 citations


Journal ArticleDOI
TL;DR: It is shown that, at the expense of an insignificant increase in computation time, much better results are obtained, with respect to the representation of instantaneous frequency using time-frequency distributions of energy density domain.
Abstract: This paper presents an analysis of the representation of instantaneous frequency using time-frequency distributions of energy density domain. Similarity to the "ideal" instantaneous frequency presentation is chosen as a criterion for comparison of various distributions. Although all the commonly used distributions suffer from artifacts along the frequency axis, it is shown that the Wigner distribution is the best among them with respect to this criterion. A generalization of the Wigner distribution, the LWD, is introduced to decrease the artifacts. The properties of the LWD are analyzed. It is shown that, at the expense of an insignificant increase in computation time, much better results are obtained. The theory is illustrated by a numerical example with frequency-modulated signals.

88 citations


Journal ArticleDOI
TL;DR: In this paper, a technique for simultaneously measuring the time-dependent intensity and phase of two independent and arbitrary ultrashort laser pulses from a single measured spectrogram is introduced. The problem is mathematically equivalent to blind deconvolution, and an algorithm analogous to those used for deblurring two-dimensional images is used to recover the two pulses.
Abstract: We introduce a technique for simultaneously measuring the time-dependent intensity and phase of two independent and arbitrary ultrashort laser pulses from a single measured spectrogram. This two-pulse method is mathematically equivalent to the problem of blind deconvolution, and we use an algorithm analogous to those used for deblurring two-dimensional images to recover the two pulses. We demonstrate the method by simultaneously retrieving the intensity and the phase of two different pulses from a Ti:sapphire laser, one of which is chirped by propagation through glass.

86 citations


Proceedings ArticleDOI
24 Apr 1995
TL;DR: In this paper, a variety of signal processing strategies applicable to time-variant data, such as the spectrogram, the Wigner-Ville distribution and wavelet decomposition, have been implemented and their success in detecting these non-stationary components evaluated.
Abstract: Previous work at The Robert Gordon University has shown that faults within the rotors of large three-phase induction motors, such as broken rotor bars, can be detected by monitoring and analysing the line current taken by the machine during a no-load starting transient. This line current has been shown to contain frequency components which are indicative of these fault conditions. Under transient conditions, however, these components are non-stationary in both the time and frequency domains. A variety of signal processing strategies applicable to time-variant data, such as the spectrogram, the Wigner-Ville distribution and wavelet decomposition, have been implemented and their success in detecting these non-stationary components evaluated. The most suitable of these techniques has been used to determine the occurrence and severity of motor faults. Preliminary work suggests that these techniques may also be used to detect frequency components indicative of the location of the fault.

44 citations


Journal ArticleDOI
TL;DR: In this paper, a wide class of short-time Fourier transforms (STFT's) or spectrograms can be obtained by applying an infinite-length analysis window, corresponding to an impulse response of an IIR filter, at every data sample and then taking the Fourier transform.
Abstract: A wide class of short-time Fourier transforms (STFT's), or spectrograms, can be obtained by applying an infinite-length analysis window, corresponding to an impulse response of an IIR filter, at every data sample and then taking the Fourier transform. The use of the cascade form realization of the analysis window allows simple and recursive generations of a number of distinct STFT's with different tradeoff between temporal and spectral resolutions. The temporal-spectral resolution diversity of all STFT's (spectrograms) associated with the same window is a function of its poles and zeros as well as their order of cascade. Closed-form expressions of such diversity are derived for the case of multiple-pole and all-pole analysis windows.
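The recursive idea in this abstract can be illustrated with the simplest possible case: a single real pole, i.e. an exponentially decaying window h[m] = a^m for m >= 0. This is a minimal sketch under that assumption, not the cascade-form realization the paper studies; the function names and the pole value `a` are hypothetical. With Y_n(w) = sum over m >= 0 of a^m x[n-m] e^{-jwm}, each new sample updates every frequency channel with one multiply-add: Y_n(w) = x[n] + a e^{-jw} Y_{n-1}(w).

```python
import cmath
import math

def recursive_stft(x, omegas, a=0.95):
    """STFT with a one-pole (exponential) window, updated at every sample.

    omegas are analysis frequencies in rad/sample; a is the pole radius
    (0 < a < 1). Returns one list of complex channel values per input sample.
    """
    poles = [a * cmath.exp(-1j * w) for w in omegas]
    states = [0j] * len(omegas)
    frames = []
    for sample in x:
        # One first-order IIR update per frequency channel.
        states = [sample + p * s for p, s in zip(poles, states)]
        frames.append(list(states))
    return frames

def direct_stft(x, n, omega, a=0.95):
    """Same quantity computed from the defining sum, for checking."""
    return sum((a ** m) * x[n - m] * cmath.exp(-1j * omega * m)
               for m in range(n + 1))
```

Moving the pole closer to the unit circle lengthens the effective window, trading temporal resolution for spectral resolution.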

41 citations


Proceedings ArticleDOI
17 Nov 1995
TL;DR: In this paper, a low-cost, portable, video-camera system built by University of Bristol for the UK-DRA, RARDE Fort Halstead, permits in-field acquisition of terrestrial hyper-spectral image sets.
Abstract: A low-cost, portable, video-camera system built by University of Bristol for the UK-DRA, RARDE Fort Halstead, permits in-field acquisition of terrestrial hyper-spectral image sets. Each set is captured as a sequence of thirty-one images through a set of different interference filters which span the visible spectrum at 10 nm intervals, effectively providing a spectrogram of 256 by 256 pixels. The system is customized from off-the-shelf components. A database of twenty-nine hyper-spectral image sets was acquired and analyzed as a sample of the natural environment. We report the manifest information capacity with respect to spatial and optical frequency, drawing implications for management of hyper-spectral data and visual processing.

38 citations


Journal ArticleDOI
TL;DR: The seismic spectral amplitude measurement (SSAM) system as discussed by the authors is a low-cost system for monitoring the spectra of seismic signals in near real-time on a low cost PC.
Abstract: The seismic spectral amplitude measurement (SSAM) system is a new, inexpensive tool for monitoring the spectra of seismic signals in near real time on a low-cost PC. The heart of the system is a digital signal processing board that is capable of continuously computing fast Fourier transforms (FFT) for up to 64 channels of data digitized at a rate of 100 samples per second. Parameters such as the frequency range of each spectral band and the time interval over which the spectral amplitude within each band is averaged are easily modified through a startup control file for the data acquisition program. In the current system, spectral amplitudes are computed approximately every 5 sec and then averaged within each of 16 user-defined frequency bands over a 1-min interval. Data for each time interval are output through a parallel port for use in real-time display and written into binary files on disk for archiving and later analysis. Spectrograms generated from these data proved to be an effective tool for assessing the nature of long-period (LP) event swarms that accompanied the 1989-1990 eruption sequence at Redoubt volcano, Alaska, and for distinguishing these signals from seismic noise. In particular, one of the eruptions was successfully forecast principally on the basis of identifying the precursory LP swarm on SSAM records.
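The processing chain described in this abstract (a spectrum per short block, then averaging within user-defined frequency bands over a longer interval) can be sketched as follows. This is an illustrative reconstruction only: the function names are hypothetical, a plain DFT replaces the DSP board's FFT, and the exact averaging rules of the real system are assumptions.

```python
import cmath
import math

def dft_magnitudes(block):
    """Magnitude spectrum (bins 0..N/2) of one block, normalized by N."""
    n = len(block)
    return [abs(sum(block[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) / n
            for k in range(n // 2 + 1)]

def band_amplitudes(block, fs, bands):
    """Mean spectral amplitude in each (low_hz, high_hz) band for one block."""
    mags = dft_magnitudes(block)
    df = fs / len(block)  # frequency spacing between bins, in Hz
    out = []
    for lo, hi in bands:
        vals = [m for k, m in enumerate(mags) if lo <= k * df < hi]
        out.append(sum(vals) / len(vals) if vals else 0.0)
    return out

def averaged_band_amplitudes(blocks, fs, bands):
    """Average the per-block band amplitudes over a longer interval."""
    per_block = [band_amplitudes(b, fs, bands) for b in blocks]
    return [sum(col) / len(per_block) for col in zip(*per_block)]
```

With the figures quoted in the abstract, each block would hold about 5 s of data at 100 samples per second, `bands` would list 16 ranges, and roughly twelve blocks would be averaged per 1-min output record.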

30 citations


Proceedings ArticleDOI
01 Sep 1995
TL;DR: In this paper, a redundant wavelet filtering method is used in conjunction with spectrogram computations to address a component of the problem of predicting epileptic seizure activity, and it is shown that spectrograms of seizure episodes exhibit multiple chirps consistent with the relatively simple almost periodic behavior of the observed time series.
Abstract: A redundant wavelet filtering method is used in conjunction with spectrogram computations to address a component of the problem of predicting epileptic seizure activity. It is shown that spectrograms of seizure episodes exhibit multiple chirps consistent with the relatively simple almost periodic behavior of the observed time series. Scalograms corresponding to a redundant (non-dyadic) wavelet analysis are used to provide finer information about these chirps, including their evolution in preseizure intervals. Detection of the origin of such periodicities is useful in the prediction problem.

26 citations


Journal ArticleDOI
TL;DR: A new method combining both Capon's estimator and a time-octave representation is proposed to obtain legibility in the time frequency plane using a variable frequency resolution with a fixed time resolution.
Abstract: In time-frequency analysis, Capon's estimator has proven its efficiency in precise applications. In a context where a time-octave representation is also necessary, the authors propose a new method combining both Capon's estimator and a time-octave representation. The main objective is to obtain legibility in the time-frequency plane using a variable frequency resolution with a fixed time resolution. This fixed time resolution is possible owing to the good resolution properties of Capon's estimator compared to the Fourier transform. This choice leads to a particular repartition of basic cells in the time-frequency plane that seems more adapted to a physical interpretation in the application presented. Nevertheless, a parallel with the wavelet transform is displayed: the constructed wavelet is adapted to the signal at each octave or at each fraction of octave. The proposed method is presented both in continuous and discrete formulations. Its structure is studied and a simplification is proposed when precise hypotheses are verified. Simulations and comparisons with classical representations (spectrogram, scalogram) are discussed. The contribution of each method, essentially in the duality of time-frequency and time-scale, is shown in relation to the analyzed signal. Lastly, the proposed method and classical ones are applied to real signals from room acoustics, where the aim is the time-frequency characterization of concert halls from impulse responses.

18 citations


Journal ArticleDOI
TL;DR: The development of a software system which can detect and identify the flight calls of migrating birds is reported, which first produces a spectrogram using a DFT and decision trees are used to determine the bird species.
Abstract: The development of a software system which can detect and identify the flight calls of migrating birds is reported. The system first produces a spectrogram using a DFT. Calls are detected in the spectrogram using an ad hoc combination of local peak‐finding and a connectedness measure. Attributes are extracted both globally from the call and from a window moved incrementally through the call. Decision trees are then used to determine the bird species. These decision trees are induced from a training set using Quinlan’s C4.5 system [J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann (1993)]. The system has been tested on a set of 138 nocturnal flight calls from nine species of birds [W. R. Evans, personal communication]. Some calls are faint, and interfering insect noise is present in others. Tenfold resampling was used to classify the calls unseen. Seventy‐eight percent of calls were identified correctly, 4% incorrectly and 18% were placed in an ‘‘uncertain’’ category. Neural network‐based classifiers are commonly used in this general domain and would likely produce similar accuracy, but use of symbolic machine learning offers two important advantages: Training time is linear in the number of examples and the resulting classifier is less opaque. Both significantly ease classifier construction.

Proceedings ArticleDOI
22 Oct 1995
TL;DR: Algorithms to convert spectrograms, cochleagrams and correlograms back into sounds using convex projections and intelligent phase guesses to iteratively find the closest waveform consistent with the known information are described.
Abstract: This paper describes algorithms to convert spectrograms, cochleagrams and correlograms back into sounds. Each of these representations converts sound waves into pictures or movies. Techniques for inversion, known as the pattern playback problem, are important because they allow these representations to be used for analysis and transformations of sound. The algorithms described here use convex projections and intelligent phase guesses to iteratively find the closest waveform consistent with the known information. Reconstructions from the spectrogram and cochleagram are indistinguishable from the original sound. In informal listening tests, the correlogram reconstructions are nearly identical.

Proceedings ArticleDOI
07 Jun 1995
TL;DR: In this article, it is shown how full TFR representations can be obtained by using a carefully chosen set of scaled cross spectrogram windows, thus avoiding the inherent approximations of the eigenvector approach.
Abstract: The decomposition of time-frequency representations (TFRs) in terms of weighted spectrograms has been recently proposed by several authors. Spectrogram decomposition concepts allow inner products to be used in the computations rather than the more cumbersome outer products usually associated with TFR computations. Kernels are decomposed in terms of eigenvectors in such a manner that a TFR may be represented by a truncated spectrogram series according to the strength of the eigenvalues associated with these eigenvectors. Many TFRs can be represented by relatively few spectrograms due to the small contributions of the remaining spectrograms with eigenvalues below some threshold value. The windows of the spectrograms forming the spectrogram series are the eigenvectors of the decomposition of the kernel of the particular representation. In the present paper it is shown how full TFR representations can be obtained by using a carefully chosen set of scaled cross spectrogram windows, thus avoiding the inherent approximations of the eigenvector approach. Much redundancy can be taken advantage of to permit computation of a small number of short-time Fourier transforms (STFTs). It is not practical to compute the Wigner distribution via the spectrogram decomposition approach due to the fact that the singular values of the decomposition are plus or minus one, precluding truncation of the spectrogram series. The new approach, on the other hand, can represent a Wigner distribution and other TFRs with a small number of STFTs. These STFTs can be used to compute a number of spectrograms and cross-spectrograms which, when appropriately weighted and summed, yield a given TFR, depending on the kernel used in the decomposition.

Book ChapterDOI
01 Jan 1995
TL;DR: An image size reduction allows browsing among icons rather than among original images, a much faster operation, and ordering the archive by its content, which facilitates searching and comparing tasks, is investigated using self-organizing maps.
Abstract: The quantity of observational data recorded by high resolution spectrometers is in constant increase and must be managed efficiently, in order to be best exploited. More precisely, operations for searching, comparing and counting events in the spectrogram archive must be available. However, the large, variable size of spectrograms, as well as their low signal to noise ratio, prevent the use of conventional methods to implement these tasks. We propose an alternative method, more adapted to the kind of data we manage. Spectrograms are reduced to smaller representations, called image icons. They are built from data density considerations in a higher-dimensional space. An image size reduction allows browsing among icons rather than among original images, a much faster operation. Ordering the archive by its content, which facilitates searching and comparing tasks, is investigated using self-organizing maps. Such a map conserves the topological order of the input space. This property may be used to automatically distinguish dissimilar emission types.

Patent
08 Feb 1995
TL;DR: In this article, a two-dimensional image matrix (spectrogram) is formed in which the frequency components of a speech signal are transformed from time domain to frequency domain with a Fast Fourier Transform (FFT) algorithm.
Abstract: The present invention relates to a method with which a speech signal can be compressed, respectively reconstructed, with the aid of two-dimensional algorithms developed for image processing. Both the compression and reconstruction can be thought of being formed from two different phases. In the first phase from a one-dimensional speech signal a two-dimensional image matrix (spectrogram) is formed in which the frequency components of a speech signal transformed from time domain to frequency domain with e.g. Fast Fourier Transform algorithm are presented frame by frame as a function of time. In the second phase of the method the image presented by the image matrix is compressed with a powerful two-dimensional compression algorithm, e.g. by JPEG. The reconstruction of speech is performed in inverse order; decompression (IJPEG) and inverse FFT (IFFT). The method is particularly appropriate for storing and transmitting speech when no strict real time properties are necessitated in the transmission or storing.

Proceedings ArticleDOI
20 Sep 1995
TL;DR: From a signal-theoretical point of view the steady-state visual evoked response (VER), but under physiological aspects the transient VER, seems to be more appropriate for reliable signal detection; a stimulation with flicker-bursts is proposed combining both methods.
Abstract: Knowing the properties of the EEG signal related to a visual stimulus is of fundamental importance for an adequate design of a signal detection method, e.g. in objective sensory function diagnostics. From a signal-theoretical point of view the steady-state visual evoked response (VER), but under physiological aspects the transient VER, seems to be more appropriate for reliable signal detection. A stimulation with flicker-bursts is proposed combining both methods. The EEG/VER signals have been analyzed with different methods of time-frequency analysis: spectrogram, pseudo-Wigner distribution, cone-kernel representation, and reduced-interference distribution are applied to real signals. Concerning feature extraction of the stimulus method, the reduced-interference distribution offers the best properties.

Journal ArticleDOI
TL;DR: The conventional 2-D spectrogram is compared with the 2-D Wigner-Ville distribution (WVD) that is the most fundamental bilinear image representation, and their ability to discriminate textures is compared.

Proceedings ArticleDOI
07 Nov 1995
TL;DR: In this paper, the authors examined three alternative, nonstationary spectral estimators: the Choi-Williams distribution (CWD), the Bessel distribution (BD) and the novel adaptive-Q distribution (AQD) for their applicability to Doppler ultrasound.
Abstract: The time-frequency distribution (TFD) of Doppler blood flow signals is usually obtained using the spectrogram, which requires signal stationarity and is known to produce large estimation variance. The authors examine three alternative, nonstationary spectral estimators: the Choi-Williams distribution (CWD), the Bessel distribution (BD) and the novel, adaptive-Q distribution (AQD) for their applicability to Doppler ultrasound. A synthetic Doppler signal, simulating the nonaxial and pulsatile flow of the common carotid artery, was used as the test signal. The theoretical distribution was compared to each technique, with the cross-correlation (ρ) and the root-mean-square error (RMSE) providing a quantitative assessment. The AQD had the lowest RMSE and the highest ρ of all the TFDs, while the CWD and the BD were very similar and better than the spectrogram (but both were prone to low-level noise). In conclusion, the AQD performed better than the traditional spectrogram and the other TFDs, but it was more computationally demanding.
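The two figures of merit named in this abstract can be sketched as below for a pair of time-frequency distributions stored as equally sized 2-D lists. The abstract does not give the exact definitions, so this assumes that ρ is the Pearson correlation coefficient over all time-frequency cells and that RMSE is taken over the same cells; the function names are hypothetical.

```python
import math

def flatten(tfd):
    """Turn a 2-D list (time x frequency) into one flat list of cells."""
    return [v for row in tfd for v in row]

def correlation(theory, estimate):
    """Pearson correlation coefficient between two TFDs (assumed meaning of rho)."""
    x, y = flatten(theory), flatten(estimate)
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y))
    var_x = sum((p - mx) ** 2 for p in x)
    var_y = sum((q - my) ** 2 for q in y)
    return cov / math.sqrt(var_x * var_y)

def rmse(theory, estimate):
    """Root-mean-square error between two TFDs, cell by cell."""
    x, y = flatten(theory), flatten(estimate)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)) / len(x))
```

The two measures are complementary: the correlation ignores an overall scale factor between the estimate and the theoretical distribution, while the RMSE penalizes it.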

Proceedings ArticleDOI
18 Oct 1995
TL;DR: An investigation was carried out to evaluate the performance of a Multi-Layer Perceptron based neural network transient classifier for detecting attacks, using bolt cutters, on security fences.
Abstract: An investigation was carried out to evaluate the performance of a Multi-Layer Perceptron based neural network transient classifier for detecting attacks, using bolt cutters, on security fences. A tape containing acoustic recordings from fence mounted microphonic cable security systems was used in the investigation. The data was digitised and Fourier Transformed and the resulting spectrograms were subject to detailed examination, in conjunction with aural analysis, in order to deduce appropriate time/frequency resolution for distinguishing genuine attacks from background signals. This facilitated the selection of suitable candidate sets of processing parameters for the system. The data was then partitioned into training and test data. Normalised spectrograms were extracted from the training data and labelled appropriately as "Fencecut" or "Backgrnd" for use as training templates for the neural networks. A back-propagation algorithm was used for training the neural networks.

01 Jan 1995
TL;DR: In this paper, the authors present a system for the acquisition and analysis of acoustic, flow and pressure signals generated during snoring using a commercial A/D converter, while analysis is done by a series of routines written in C and the MATLAB signal processing environment.
Abstract: We present a system for the acquisition and analysis of acoustic, flow and pressure signals which are generated during snoring. Signal acquisition is handled by a commercial A/D converter, while analysis is done by a series of routines written in C and the MATLAB signal processing environment. The system has been evaluated using normal subjects, while ongoing studies are being done on real patients.

I. INTRODUCTION
The acoustical phenomenon of snoring is due to the vibration of the pharynx during partial obstruction of the upper respiratory tract (1). This phenomenon is present either sporadically or constantly in over 40% of an adult population (2,3). In these subjects, between 3 and 6% present a complete but intermittent obstruction of breathing during sleep, which is known as Sleep Obstructive Apnea Syndrome (SAOS), and which is considered a risk factor in cardiopulmonary disease (4,5). The use of a polysomnographic recording is the standard evaluation procedure for patients at risk. Part of these recordings is the acquisition of sound, flow and pressure data. The purpose of this paper is to analyze acoustical signals during simulated snoring in order to investigate the mechanisms responsible for the generation of these sounds, and to provide a simple tool to discriminate those subjects needing a more complex polysomnographic analysis from those that do not.

II. METHODS
Signal acquisition was carried out using a 12-bit A/D converter card (PCL-818, Advantech Co. Ltd.) with 16 analog input channels and a 100 kHz maximum sampling rate, which was installed in a PC-compatible microcomputer. A program, written in C, handles all the acquisition parameters (sampling rate, number of channels and number of samples to convert) and the display of the data. The signal analysis system was programmed using the tools provided by the MATLAB signal processing environment, which runs under the WINDOWS graphical user interface. This analysis system includes spectral estimation by deterministic and autoregressive models. In particular, the Fast Fourier Transform, using a Hamming window, and Burg's estimator were employed. Other functions that were implemented include a 2D and 3D spectrogram, the cross-correlation function, and XY graphs. The system is capable of finding the best estimation of the autoregressive (AR) model's order, using the Final Prediction Error (FPE) criterion. For this analysis, simulated snoring sounds were acquired from seven volunteers without a history of SAOS at the Instituto Nacional de Enfermedades Respiratorias (INER, National Institute for Respiratory Disease), México D.F., using an electret condenser microphone, with a frequency response between 20 Hz and 10 kHz, which was placed 30 cm from the subject's mouth. Oral, nasal and oronasal inspiratory sounds were acquired, at a 15 kHz sampling rate, while the subjects underwent a full neck extension. Each acquisition consisted of 32 Ksamples.

III. RESULTS

01 Jan 1995
TL;DR: The information theoretic measure of mutual information is used to investigate the distribution of phonetic information across the on/off aligned auditory spectrogram for a corpus of vowel-plosive-vowel utterances to test to what extent small high information samples are sufficient for plosive discrimination.
Abstract: In this paper the authors use the information theoretic measure of mutual information to investigate the distribution of phonetic information across the on/off aligned auditory spectrogram for a corpus of vowel-plosive-vowel utterances. Automatic recognition is then used to test to what extent small high information samples are sufficient for plosive discrimination.

Proceedings Article
21 May 1995
TL;DR: In this paper, the authors demonstrate a spectrally and temporally resolved upconversion technique (STRUT) in a single shot configuration which produces a group-delay spectrogram, allowing essentially real-time characterization of femtosecond laser pulses.
Abstract: There are presently a variety of techniques available for the characterization of femtosecond laser pulses. These characterizations are accomplished by analyzing either interferograms, spectrograms, or the nonlinear spectrum modulation. These techniques all require either scanning optical delays or complicated algorithms to compute the phase, which limit their speed [1-4]. We demonstrate a spectrally and temporally resolved upconversion technique (STRUT) in a single shot configuration which produces a group-delay spectrogram. This spectrogram requires only a simple, and thus rapid, algorithm to retrieve the phase, allowing essentially real-time characterization of femtosecond laser pulses.

Patent
05 Jul 1995
TL;DR: In this article, a method for fast in-line inspection of capacitors includes such steps as connecting the capacitor to the test terminal of an impedance spectrum analyzer controlled by computer, applying a DC bias voltage across the capacitor, setting up a proper spectrogram, and judging the quality of the capacitor according to the resonant peak indicated by the spectrogram.
Abstract: The method for fast in-line inspection of capacitors includes such steps as connecting the capacitor to be tested to the test terminal of an impedance spectrum analyzer controlled by computer, applying a DC bias voltage across the capacitor, setting up a proper spectrogram, and judging the quality of the capacitor according to the resonant peak indicated by the spectrogram.

01 Mar 1995
TL;DR: In this article, the authors compared the detection ability of the spectrogram, the bispectrum, and outer product (dyadic) representation for digitally modulated signals corrupted by additive white Gaussian noise.
Abstract: This thesis compared the detection ability of the spectrogram, the 1-1/2D instantaneous power spectrum (1-1/2Dips), the bispectrum, and the outer product (dyadic) representation for digitally modulated signals corrupted by additive white Gaussian noise. Four detection schemes were tried on noise-free BPSK, QPSK, FSK, and OOK signals using different transform lengths. After determining the optimum transform length, each test signal is corrupted by additive white Gaussian noise. Different SNR levels were used to determine the lowest SNR level at which the message or the modulation type could be extracted. The optimal transform length was found to be the symbol duration when processing BPSK, OOK, and FSK via the spectrogram, the 1-1/2Dips, or the bispectrum method. The best transform size for QPSK was half of the symbol length. For the outer product (dyadic) spectral representation, the best transform size was four times larger than the symbol length. For all processing techniques, with the exception of the outer product representation, the minimum detectable SNR is about 15 dB for BPSK, FSK, and OOK signals and about 20 dB for QPSK signals. For the outer product spectral method, these values tend to be about 10 dB lower.

Proceedings ArticleDOI
15 Oct 1995
TL;DR: The modified moving window method is presented, invented by Kodera (1976), which performs a time-frequency mapping of the DFT magnitude yielding a more concentrated energy distribution in the t-f-plane.
Abstract: The short-time Fourier transform (STFT) or spectrogram data is the base for signal analysis in many computer music applications such as resynthesis and sound interpolation. There is a tradeoff between time and frequency resolution due to the length of the analysis window used. This paper presents the modified moving window method, invented by Kodera (1976), which performs a time-frequency mapping of the DFT magnitude yielding a more concentrated energy distribution in the t-f-plane. A simplified method is proposed to overcome the computational complexity of the original version, leading to the "improved spectrogram". The resulting analysis data are used to form a parameter set of a sinusoidal signal representation which serves as the input to a resynthesis algorithm.
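The idea of remapping spectrogram values to a better frequency estimate can be illustrated with a simple stand-in: instead of attributing energy to the nominal analysis frequency, it is attributed to the local rate of change of the STFT phase. The sketch below uses a one-sample finite difference of the phase, which is an assumption made for brevity and is neither Kodera's original operator nor the simplified method proposed in the paper; the function names and the choice of a Hann window are hypothetical.

```python
import cmath
import math

def hann(length):
    """Hann window of the given length."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / length) for k in range(length)]

def stft_point(x, t, omega, window):
    """STFT at start sample t and frequency omega (rad/sample).

    The phase is referenced to absolute time, so a stationary tone at
    omega0 advances the phase by (omega0 - omega) per sample of t.
    """
    return sum(x[t + k] * w * cmath.exp(-1j * omega * (t + k))
               for k, w in enumerate(window))

def remapped_frequency(x, t, omega, window):
    """Frequency to which the energy at (t, omega) would be moved."""
    a = stft_point(x, t, omega, window)
    b = stft_point(x, t + 1, omega, window)
    # Phase advance over one sample, added to the nominal bin frequency.
    return omega + cmath.phase(b * a.conjugate())
```

For a steady sinusoid, every analysis frequency inside the main lobe of the window is remapped close to the true frequency, which is what concentrates the energy in the time-frequency plane.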