Showing papers on "Spectrogram published in 2002"

PDF

Open Access

Patent•

System and method for real-time spectrum analysis in a communication device

[...]

Gary L. Sugar, Karl A. Miller, Jong Sup Baek

18 Sep 2002

TL;DR: A spectrum analysis engine (SAGE) as mentioned in this paper consists of a spectrum analyzer, a signal detector, a universal signal synchronizer, and a snapshot buffer component, where the signal detector detects signal pulses in the frequency band and outputs pulse event information entries.

...read moreread less

Abstract: A spectrum analysis engine (SAGE) that comprises a spectrum analyzer component, a signal detector component, a universal signal synchronizer component and a snapshot buffer component. The spectrum analyzer component generates data representing a real-time spectrogram of a bandwidth of radio frequency (RF) spectrum. The signal detector detects signal pulses in the frequency band and outputs pulse event information entries output, which include the start time, duration, power, center frequency and bandwidth of each detected pulse. The signal detector also provides pulse trigger outputs which may be used to enable/disable the collection of information by the spectrum analyzer and the snapshot buffer components. The snapshot buffer collects a set of raw digital signal samples useful for signal classification and other purposes. The universal signal synchronizer synchronizes to periodic signal sources, useful for instituting schemes to avoid interference with those signals.

...read moreread less

134 citations

Proceedings Article•DOI•

Yet Another Algorithm for Pitch Tracking

[...]

Kavita Kasi¹, Stephen A. Zahorian¹•Institutions (1)

Old Dominion University¹

13 May 2002

TL;DR: This paper presents a pitch detection algorithm that is extremely robust for both high quality and telephone speech and evaluated its algorithm using the Keele pitch extraction reference database as “ground truth” for both “high quality” and “telephone” speech.

...read moreread less

Abstract: In this paper, we present a pitch detection algorithm that is extremely robust for both high quality and telephone speech. The kernel method for this algorithm is the “NCCF or Normalized Cross Correlation” reported by David Talkin [1]. Major innovations include: processing of the original acoustic signal and a nonlinearly processed version of the signal to partially restore very weak F0 components; intelligent peak picking to select multiple F0 candidates and assign merit factors; and, incorporation of highly rohust pitch contours obtained from smoothed versions of low frequency portions of spectrograms. Dynamic programming is used to find the “best” pitch track among all the candidates, using both local and transition costs. We evaluated our algorithm using the Keele pitch extraction reference database as “ground truth” for both “high quality” and “telephone” speech. For both types of speech, the error rates obtained are lower than the lowest reported in the literature.

...read moreread less

126 citations

Journal Article•DOI•

Spectrogram segmentation by means of statistical features for non-stationary signal interpretation

[...]

C. Hory¹, Nadine Martin¹, A. Chehikian¹•Institutions (1)

Centre national de la recherche scientifique¹

01 Dec 2002-IEEE Transactions on Signal Processing

TL;DR: This paper investigates the use of TFR statistical properties for classification or recognition purposes, focusing on a particular TFR: the spectrogram, and proposes a method of segmentation that is relevant for the signal understanding.

...read moreread less

Abstract: Time-frequency representations (TFRs) are suitable tools for nonstationary signal analysis, but their reading is not straightforward for a signal interpretation task. This paper investigates the use of TFR statistical properties for classification or recognition purposes, focusing on a particular TFR: the spectrogram. From the properties of a stationary process periodogram, we derive the properties of a nonstationary process spectrogram. It leads to transform the TFR to a local statistical features space from which we propose a method of segmentation. We illustrate our matter with first- and second-order statistics and identify the information they, respectively, provide. The segmentation is operated by a region growing algorithm, which does not require any prior knowledge on the nonstationary signal. The result is an automatic extraction of informative subsets from the TFR, which is relevant for the signal understanding. Examples are presented concerning synthetic and real signals.

...read moreread less

96 citations

Journal Article•DOI•

Blind high-resolution localization and tracking of multiple frequency hopped signals

[...]

Xiangqian Liu¹, N.D. Sidiropoulos¹, Ananthram Swami²•Institutions (2)

University of Minnesota¹, United States Army Research Laboratory²

01 Apr 2002-IEEE Transactions on Signal Processing

TL;DR: This paper identifies a hop free subset of data by discarding high-entropy spectral slices from the spectrogram, then performs low-rank decomposition of four-way data generated by capitalizing on both spatial and temporal shift invariance for high resolution direction of arrival (DOA) recovery.

...read moreread less

Abstract: This paper considers the problem of blind localization and tracking of multiple frequency-hopped spread-spectrum signals using a uniform linear antenna array without knowledge of hopping patterns or directions of arrival. As a preprocessing step, we propose to identify a hop-free subset of data by discarding high-entropy spectral slices from the spectrogram. High-resolution localization is then achieved via either quadrilinear regression of four-way data generated by capitalizing on both spatial and temporal shift invariance or a new maximum likelihood (ML)-based two-dimensional (2-D) harmonic retrieval algorithm. The latter option achieves the best-known model identifiability bound while remaining close to the Cramer-Rao bound even at low signal-to-noise ratios (SNRs). Following beamforming using the recovered directions, a dynamic programming approach is developed for joint ML estimation of signal frequencies and hop instants in single-user tracking. The efficacy of the proposed algorithms is illustrated in pertinent simulations.

...read moreread less

81 citations

Journal Article•DOI•

Beam intensity striations and applications.

[...]

T. C. Yang¹•Institutions (1)

United States Naval Research Laboratory¹

06 May 2002-Journal of the Acoustical Society of America

TL;DR: The single-element spectrogram for a continuous broadband signal, plotted as a function of range, has been shown to exhibit striated bands of intensity maxima and minima, an invariant of the modal interference and described by a waveguide invariant parameter "beta."

...read moreread less

Abstract: The single-element spectrogram for a continuous broadband signal, plotted as a function of range, has been shown to exhibit striated bands of intensity maxima and minima. The slope of the striations is an invariant of the modal interference and is described by a waveguide invariant parameter “beta.” The striation pattern is analyzed and modeled in this paper for the beam outputs of a horizontal line array obtained by conventional beamforming. Array beamforming makes it possible to measure the waveguide invariant parameter for weak signals due to the enhancement of signal levels by the array gain over that of a single element. It is shown that the signal beam spectrogram as a function of range exhibits the same striation pattern as that (predicted) for a single element. Specifically, for a broadside signal, the beam striation is identical to that of a single-element plus a constant signal gain. For a nonbroadside target, the signal beam intensity will be modified by a frequency-bearing dependent signal gain due to the signal spread over multiple beams, nevertheless the beam spectrogram retains the same striation pattern (slope) as for a single element. The sidelobe beams (outside the canonical cones containing the signal arrivals) exhibit an entirely different striation pattern as a function of frequency and range. For array processing, it is shown that a fast range-rate, close range target and a distant, slow range-rate interference source will have a different striation pattern (slope) in the corresponding beam spectrograms as a function of time, assuming no prior knowledge of the source ranges. The difference in the striations between the beam spectrograms can be used in array processing to suppress the interference contribution. A 5–7 dB interference suppression is demonstrated using simulated data.

...read moreread less

52 citations

Proceedings Article•

Speech reconstruction from mel-frequency cepstral coefficients using a source-filter model

[...]

Ben Milner, Xu Shao

01 Sep 2002

TL;DR: This work presents a method of reconstructing a speech signal from a stream of MFCC vectors using a source-filter model of speech production, and listening tests reveal that the reconstructed speech is intelligible and of similar quality to a system based on LPC analysis of the original speech.

...read moreread less

Abstract: This work presents a method of reconstructing a speech signal from a stream of MFCC vectors using a source-filter model of speech production. The MFCC vectors are used to provide an estimate of the vocal tract filter. This is achieved by inverting the MFCC vector back to a smoothed estimate of the magnitude spectrum. The Wiener- Khintchine theorem and linear predictive analysis transform this into an estimate of the vocal tract filter coefficients. The excitation signal is produced from a series of pitch pulses or white noise, depending on whether the speech is voiced or unvoiced. This pitch estimate forms an extra element of the feature vector. Listening tests reveal that the reconstructed speech is intelligible and of similar quality to a system based on LPC analysis of the original speech. Spectrograms of the MFCC-derived speech and the real speech are included which confirm the similarity.

...read moreread less

50 citations

Journal Article•DOI•

Flaw localization using the reassigned spectrogram on laser-generated and detected Lamb modes

[...]

Christine Valle¹, Jerrol W. Littles²•Institutions (2)

University of Maine¹, Pratt & Whitney²

01 Jun 2002-Ultrasonics

TL;DR: The reassigned spectrogram is used to characterize the modal and frequency content of a single ultrasonic signal as a function of time, enabling a procedure to locate flaws in an aluminum plate specimen.

...read moreread less

50 citations

Journal Article•DOI•

Evaluation of a strategy for automatic formant tracking

[...]

Terrance M. Nearey, Peter F. Assmann, James Hillenbrand

25 Oct 2002-Journal of the Acoustical Society of America

TL;DR: In this paper, a correlation measure between a spectrogram of the original signal and one resynthesized from each estimated track set is defined, combined with other heuristic figures of merit (based on continuity, formant ranges, and bandwidths) to choose the best analysis.

...read moreread less

Abstract: Variations on an automatic formant tracking strategy developed at Alberta will be compared to manual formant measurements from two databases of vowels spoken by men, women, and children (in Texas or Michigan). ‘‘Correct’’ vowel formant candidates for F1, F2, and F3 may be found roughly 85–90 percent of the time for adult male speakers using autocorrelation LPC with the following settings: F3 maximum at 3000 Hz, LPC order of 14, sampling rate of 10 kHz [J. Markel and A. Gray, Linear Prediction of Speech (Springer, New York, 1975)]. Experience shows good results are also often found with females’ and children’s speech, provided the sampling rate and F3 maximum are scaled appropriately for each speaker. Our new basic strategy involves analyzing each utterance at several distinct sampling rates and coordinated F3 cutoff frequencies with a fixed LPC order. Each scaling choice provides an independent set of candidates that is post‐processed by a simple tracking algorithm. A correlation measure between a spectrogram of the original signal and one resynthesized from each estimated track set is defined. This measure is combined with other heuristic figures of merit (based on, e.g., continuity, formant ranges, and bandwidths) to choose the ‘‘best’’ analysis.

...read moreread less

36 citations

Proceedings Article•DOI•

Missing data speech recognition in reverberant conditions

[...]

Kalle J. Palomäki¹, Guy J. Brown¹, Jon Barker¹•Institutions (1)

University of Sheffield¹

13 May 2002

TL;DR: An auditory processing front-end for missing data speech recognition, which is robust in the presence of reverberation, is described, which attempts to identify time-frequency regions that are not badly contaminated by reverberation and have strong speech energy.

...read moreread less

Abstract: In this study we describe an auditory processing front-end for missing data speech recognition, which is robust in the presence of reverberation. The model attempts to identify time-frequency regions that are not badly contaminated by reverberation and have strong speech energy. This is achieved by applying reverberation masking. Subsequently, reliable time-frequency regions are passed to a ‘missing data’ speech recogniser for classification. We demonstrate that the model improves recognition performance in three different virtual rooms where reverberation time T60 varies from 0.7 sec to 2.7 sec. We also discuss the advantages of our approach over RASTA and modulation filtered spectrograms.

...read moreread less

34 citations

Journal Article•DOI•

Estimation of the signal-to-noise ratio with amplitude modulation spectrograms

[...]

Jürgen Tchorz, Birger Kollmeier¹•Institutions (1)

University of Oldenburg¹

01 Sep 2002-Speech Communication

TL;DR: An algorithm is proposed which automatically estimates the local signal-to-noise ratio (SNR) between speech and noise, motivated by neurophysiological findings on amplitude modulation processing in higher stages of the auditory system in mammals.

...read moreread less

28 citations

Journal Article•DOI•

Time-resolved tracking of a sound scatterer in a complex flow: nonstationary signal analysis and applications.

[...]

Nicolas Mordant¹, Jean-François Pinton, Olivier Michel•Institutions (1)

École normale supérieure de Lyon¹

02 Jul 2002-Journal of the Acoustical Society of America

TL;DR: In this article, a new technique for the measurement of the velocity of individual solid particles moving in fluid flows is proposed, which relies on the ability to resolve in time the Doppler shift of the sound scattered by the continuously insonified particle.

...read moreread less

Abstract: It is known that ultrasound techniques yield nonintrusive measurements of hydrodynamic flows. For example, the study of the echoes produced by a large number of particles insonified by pulsed wavetrains has led to a now-standard velocimetry device. In this paper, a new technique for the measurement of the velocity of individual solid particles moving in fluid flows is proposed. It relies on the ability to resolve in time the Doppler shift of the sound scattered by the continuously insonified particle. For this signal-processing problem two classes of approaches can be used: time-frequency analysis and parametric high-resolution methods. In the first class the spectrogram and reassigned spectrogram is considered, and applied to detect the motion of a small bead settling in a fluid at rest. In nonstationary flows, methods in the second class are more robust. An approximated maximum likelihood (AML) technique has been adapted, coupled with a generalized Kalman filter. This method allows for the estimation of rapidly varying frequencies; the parametric nature of the algorithm also provides an estimate of the variance of the identified frequency parameters.

...read moreread less

Proceedings Article•

2-d processing of speech with application to pitch estimation.

[...]

Thomas F. Quatieri¹•Institutions (1)

Massachusetts Institute of Technology¹

01 Jan 2002

TL;DR: A new approach to two-dimensional (2-D) processing of the one-dimensional speech signal in the time-frequency plane is introduced and the shortspace 2-D Fourier transform magnitude of a narrowband spectrogram of the signal is obtained and shown to maps harmonically-related signal components to a concentrated entity in the new 1-D plane.

...read moreread less

Abstract: In this paper, we introduce a new approach to two-dimensional (2-D) processing of the one-dimensional (1-D) speech signal in the time-frequency plane. Specifically, we obtain the shortspace 2-D Fourier transform magnitude of a narrowband spectrogram of the signal and show that this 2-D transformation maps harmonically-related signal components to a concentrated entity in the new 2-D plane. We refer to this series of operations as the “grating compression transform” (GCT), consistent with sine-wave grating patterns in the spectrogram reduced to smeared impulses. The GCT forms the basis of a speech pitch estimator that uses the radial distance to the largest peak in the GCT plane. Using an average magnitude difference between pitch-contour estimates, the GCT-based pitch estimator is shown to compare favorably to a sine-wave-based pitch estimator for all-voiced speech in additive white noise. An extension to a basis for two-speaker pitch estimation is also proposed.

...read moreread less

Journal Article•

The Use of Wavelets for Analyzing Transient Machinery Vibration

[...]

Howard A. Gaberson

01 Jan 2002-Sound and Vibration

TL;DR: In this paper, a MATLAB® wavelet code is proposed to detect the location and magnitude of transient events in a continuous machinery vibration data. The code decomposes the signal into wavelets, which can be used to identify the transient.

...read moreread less

Abstract: Transient events are sometimes buried in continuous machinery vibration data. Conditions causing these transient events include: a bearing ball rolling over a defect, a pit or chip on the face of a gear tooth, clearance in a bearing that allows a repetitive pounding, engine rod or main bearing knock and piston slap. Detection of the magnitude and timing of these events can be valuable for diagnostics. These can be identified with time-frequency methods that show frequency content and time of occurrence on a two-dimension contour plot. Wavelets can also be used to detect magnitude and timing. The orthogonal wavelet is inexpensive to compute and has the potential to display something new. The code decomposes the signal into wavelets, the location and magnitude of which identify the transient. Partial inverse transformation also shows regions of the signal where different wavelet levels build up to recreate the transient in the signal. The recent availability of orthogonal wavelets and MATLAB® wavelet code has made such analyses convenient. This article explains these methods and the signal processing calculations involved. Many problems in machinery diagnostics are characterized by transient or impulsive events in the vibration signal that cause the frequency content to vary considerably and regularly with time. Several methods are available for analyzing transient events. Signal analyzers average individual spectra to a single spectrum, but they convert the time signal to a frequency signal and cannot show any time-frequency variations. The Short Time Fourier Transform (STFT) or spectrogram is used to display time-frequency variations in speech analysis but does not provide sufficient resolution for some machinery diagnostics problems. The Wigner distribution increases resolution in timefrequency distributions. It can locate the angular position of an impact or the discontinuity associated with individual gear tooth faults.1,2 However, the Wigner distribution has severe interference (cross) terms that confuse the interpretation and require additional efforts to resolve. The Reduced Interference Distributions (RIDs) mitigate the Wigner cross terms while preserving sharp resolution. The Choi and Williams distribution3 yields impressive detail and a significant structure in the timefrequency plane with recognizable impact. Finally, the wavelet transform has recently become available and offers an additional approach. 4 With some development, this tool could become very helpful in machinery diagnostics.

...read moreread less

Patent•DOI•

2-d processing of speech

[...]

Jr. T. Quatieri¹•Institutions (1)

Massachusetts Institute of Technology¹

13 Sep 2002-Journal of the Acoustical Society of America

TL;DR: The grating compression transform (GCT) as mentioned in this paper uses sine-wave grating patterns in the frequency-related representation reduced to smeared impulses to estimate pitch estimates of voiced speech or provide noise filtering or speaker separation in a multiple speaker acoustic signal.

...read moreread less

Abstract: Acoustic signals are analyzed by two-dimensional (2-D) processing of the one-dimensional (1-D) speech signal in the time-frequency plane. The short-space 2-D Fourier transform of a frequency-related representation (e.g., spectrogram) of the signal is obtained. The 2-D transformation maps harmonically-related signal components to a concentrated entity in the new 2-D plane (compressed frequency-related representation). The series of operations to produce the compressed frequency-related representation is referred to as the “grating compression transform” (GCT), consistent with sine-wave grating patterns in the frequency-related representation reduced to smeared impulses. The GCT provides for speech pitch estimation. The operations may, for example, determine pitch estimates of voiced speech or provide noise filtering or speaker separation in a multiple speaker acoustic signal.

...read moreread less

An extension for source separation techniques avoiding beats

[...]

Harald Viste¹, Gianpaolo Evangelista¹•Institutions (1)

École Polytechnique Fédérale de Lausanne¹

01 Jan 2002

TL;DR: In this paper, a method to separate overlapping par-tials in stereo signals is presented, which looks at the shapes of partial envelopes and uses minimization of the difference between such shapes in order to demix overlapping partials.

...read moreread less

Abstract: The problem of separating individual sound sources from a mix-ture of these, known as Source Separation or Computational Au-ditory Scene Analysis (CASA), has become popular in the recentdecades. A number of methods have emerged from the study ofthis problem, some of which perform very well for certain types ofaudio sources, e.g. speech. For separation of instruments in mu-sic, there are several shortcomings. In general when instrumentsplay together they are not independent of each other. More specif-ically the time-frequency distributions of the different sources willoverlap. Harmonic instruments in particular have high probabilityof overlapping partials. If these overlapping partials are not sepa-rated properly, the separated signals will have a different sensationof roughness, and the separation quality degrades.In this paper we present a method to separate overlapping par-tials in stereo signals. This method looks at the shapes of partialenvelopes, and uses minimization of the difference between suchshapes in order to demix overlapping partials. The method canbe applied to enhance existing methods for source separation, e.g.blind source separation techniques, model based techniques, andspatial separation techniques. We also discuss other simpler meth-ods that can work with mono signals.1. INTRODUCTIONWhen instruments play together, their signals are mixed together.Source separation is simply the problem of obtaining the originalsource signals from the recorded mixture. The problem with mu-sic signals, as opposed to other types of signals like e.g. speech,is that the sources are normally not independent. First of all, theinstruments are dependent among each others in time, due to thefact that they all follow the same underlying tempo and rhythmof the music piece. In addition, for melodic instruments whichha ve a pitch and harmonic structure, the notes are often related byharmonic intervals as well. When two partials fall within one crit-ical band, the ear will not hear two separate sounds, but rather onecombined sound. This is explained in [1]: “When two sinusoidswith slightly different frequencies are added together, they resem-ble a single sinusoid, with frequency equal to the mean frequencyof the two components, but whose amplitude ﬂuctu ates at a regularrate. These ﬂuctua tions in amplitude are known as ’beats’.” Thesebeats occur at a rate equal to the frequency difference of the twocomponents.If the beats are slow, this results in audible loudness ﬂuctu-ations. In the combined sound these ﬂuctuations sounds natural,but in the separated signals such ﬂuctuations can be very annoy-ing if present. For faster beats the ﬂuctuations can not be heardseparately, but rather as an increase in the roughness of the sound.This roughness is related to the consonance [2], and depends onfrequency. Maximum roughness occurs for beat frequencies in therange 30-70Hz [3]. However, partials that are 30-70Hz apart, arequite well handled by existing source separation methods. We willtherefore concentrate on slow beats.Figure 1 shows the spectrogram of two trumpet notes, an F atabout 349 Hz, and a C at about 523 Hz. Every third partial of theF overlaps every second partial of the C. For the partials slightlyabove 1, 2 and 3 kHz, one can see the beats as amplitude ﬂuctua-tions. By counting the number of ﬂuctuations per second, we cansee that the beat frequencies for these are about 3, 6 and 9 Hz,respectively. All the other partials have quite constant amplitudeduring their duration. Obviously, the ﬂuctuations we see in the

...read moreread less

Journal Article•DOI•

Speech enhancement using fourth-order cumulants and optimum filters in the subband domain

[...]

Elias Nemer¹, Rafik Goubran², Samy A. Mahmoud²•Institutions (2)

Intel¹, Carleton University²

01 Mar 2002-Speech Communication

TL;DR: It is shown that the kurtosis and the diagonal slice of the FOC may be used to estimate such parameters as the SNR, the speech autocorrelation and the probability of speech presence in a given band.

...read moreread less

Proceedings Article•DOI•

Maximum likelihood noise estimation for spectrogram segmentation control

[...]

Cyril Hory, Nadine Martin

13 May 2002

TL;DR: An approximation to the Maximum Likelihood estimator of the γ distribution parameters is proposed and it is shown that it leads to build an efficient estimators of a white Gaussian process variance by noting that γ distributions admits sufficient statistics.

...read moreread less

Abstract: This communication is composed of two related parts First we propose an approximation to the Maximum Likelihood estimator of the γ distribution parameters We show that it leads to build an efficient estimator of a white Gaussian process variance by noting that γ distribution admits sufficient statistics Second we describe an application of this result to a non-stationary signal spectrogram segmentation that we proposed recently Examples of segmented spectrograms are presented on a synthetic signal and on an acoustical recording of a dolphin whistle

...read moreread less

Patent•

Speech recognition system using spectrogram analysis

[...]

Vitaliy Fain, Samuel Fain

07 Mar 2002

TL;DR: In this article, a comparison of one or more dictionary entries with a sound record of a human utterance is performed to determine whether and where each dictionary entry is contained within the sound record.

...read moreread less

Abstract: Computer comparison of one or more dictionary entries with a sound record of a human utterance to determine whether and where each dictionary entry is contained within the sound record. The record is segmented, and for each vocalized segment a spectrogram is obtained, and for other segments symbolic and numeric data are obtained. The spectrogram of a vocalized segment is then processed to decrease noise and to eliminate variations in pronunciation. Each entry in the dictionary is then compared with every sequence of segments of substantially the same length in the sound record. The comparison takes into account the formant profiles within each vocalized segment and symbolic and numeric data for other segments are obtained in the record and in the dictionary entries.

...read moreread less

Proceedings Article•DOI•

Using a chain of LVQ neural networks for pattern recognition of EEG signals related to intermittent photic-stimulation

[...]

Mauricio Kugler, Heitor Silvério Lopes

11 Nov 2002

TL;DR: This work reports the use of neural networks for pattern recognition in electroencephalographic signals related to intermittent photic-stimulation and demonstrates the feasibility of the proposed system for real-time pattern recognition of complex signals.

...read moreread less

Abstract: This work reports the use of neural networks for pattern recognition in electroencephalographic signals related to intermittent photic-stimulation. Due to the low signal/noise ratio of this kind of signal, it was necessary the use of a spectrogram as a predictor and a chain of LVQ neural networks. The efficiency of this pattern recognition structure was tested for many different configurations of the neural networks parameters and different volunteers. A direct relationship between the dimension of the neural networks and their performance was observed. Results so far encourage new experiments and demonstrate the feasibility of the proposed system for real-time pattern recognition of complex signals.

...read moreread less

Patent•

Method, apparatus, and program for information discrimination and recording medium

[...]

Yasuhiro Tokuri, 戸栗康裕

30 Sep 2002

TL;DR: In this paper, a spectrum analysis of an input audio signal in prescribed block units is performed to obtain a spectrogram for every prescribed discrimination section, and a horizontal line frequency component extraction part 13 regards a spectrum of each small block as an image to extract horizontal line components being within a prescribed partial area of a two-dimensional frequency area.

...read moreread less

Abstract: PROBLEM TO BE SOLVED: To accurately discriminate and detect a voice or music for every prescribed time section from an information source including an audio signal. SOLUTION: A spectrogram calculation part 11 performs frequency analysis of a spectrum of an input audio signal in prescribed block units to obtain a spectrogram for every prescribed discrimination section. A horizontal line frequency component extraction part 13 regards a spectrogram of each small block as an image to extract horizontal line components being within a prescribed partial area of a two-dimensional frequency area. A horizontal line power ratio calculation part 14 obtains a ratio of the power of extracted horizontal line components to the power of the overall two-dimensional frequency area, and a comprehensive power ratio calculation part 15 evaluates the horizontal line component power ratio obtained for each small block to obtain a comprehensive horizontal line component power ratio as featured values. A voice/music discrimination part 16 uses the comprehensive horizontal line component power ratio to discriminate between a voice or music. COPYRIGHT: (C)2004,JPO

...read moreread less

Quotient Signal Decomposition and Order Estimation

[...]

Domenico Napoletani, Carlos A. Berenstein, Perinkulam S. Krishnaprasad

01 Jan 2002

TL;DR: A method for blind signal decomposition of speech signals that does not require that the sources are independent or stationary is proposed, and experimental results show that the method works even when several shifted versions of the same source are mixed.

...read moreread less

Abstract: : In this paper, the authors propose a method for blind signal decomposition of speech signals that does not require that the sources are independent or stationary. This method, which they consider a simple instance of nonlinear projection pursuit, is based on the possibility of recovering the areas in the time-frequency domain where the original signals are isolated, or almost isolated, with the use of suitable quotients of linear combinations of the spectrograms of the mixtures. The authors then threshold such quotients according to the value of their imaginary parts to prove that the method is theoretically sound under mild assumptions on the mixing matrix and the sources. They study one basic algorithm based on this method. The algorithm has the important feature of estimating the number of sources with two measurements. It then requires n-2 additional measurements to provide a reconstruction of n sources. Experimental results show that the method works even when several shifted versions of the same source are mixed.

...read moreread less

Patent•

Speech recognition system using normalized voiced segment spectrogram analysis

[...]

Vitaliy Fain, Samuel Fain

07 Mar 2002

TL;DR: In this article, a computer comparison of one or more dictionary entries with a sound record of a human utterance is performed to determine whether and where each dictionary entry is contained within the sound record.

...read moreread less

Abstract: Computer comparison of one or more dictionary entries with a sound record of a human utterance to determine whether and where each dictionary entry is contained within the sound record. The record is segmented, and for each vocalized segment a spectrogram is obtained, and for other segments symbolic and numeric data are obtained. The spectrogram of a vocalized segment is then processed using a method selected from a group consisting of a triple time transform, a triple frequency transform, a linear-piecewise-linear transform, and combinations thereof, to decrease noise and to eliminate variations in pronunciation. Each entry in the dictionary is then compared with every sequence of segments of substantially the same length in the sound record. The comparison takes into account the formant profiles within each vocalized segment and symbolic and numeric data for other segments are obtained in the record and in the dictionary entries.

...read moreread less

Proceedings Article•DOI•

Pitch and speech-rate conversion using envelope modulation modeling

[...]

Kazuaki Yoshida¹, Michiko Kazama², Mikio Tohyama¹•Institutions (2)

Kogakuin University¹, Waseda University²

13 May 2002

TL;DR: A method of intelligible speech representation that uses narrow-band envelopes and their carriers that enables modification of the talker's voice pitch and speech-rate without sacrificing intelligibility and could be useful in frequency scaling of the speech spectrum to assist hearing-impaired listeners or in time scaling the speech signal for speech signal reproduction.

...read moreread less

Abstract: This article describes a method of intelligible speech representation that uses narrow-band envelopes and their carriers. This method enables modification of the talker's voice pitch and speech-rate without sacrificing intelligibility. The carrier, which shows the instantaneous phase, conveys pitch information, while the temporal envelope conveys speech-rate information and preserves speech intelligibility. The carriers, however, can be replaced by sinusoidal signals without severely degrading intelligibility or voice quality. Consequently, we can modify the pitch by shifting each envelope's carrier-frequency and convert the speech-rate by stretching or shrinking the envelopes. These findings could be useful in frequency scaling of the speech spectrum to assist hearing-impaired listeners or in time scaling of the speech signal for speech signal reproduction.

...read moreread less

Proceedings Article•DOI•

Harmonic spectrogram for the analysis of semi-periodic physiologic signals

[...]

James McNames¹, Cristina Crespo¹, Mateo Aboy¹, Jules Bassale¹, L. Jenkins¹, Brahm Goldstein - Show less +2 more•Institutions (1)

Portland State University¹

23 Oct 2002

TL;DR: In this paper, a simple method that improves the resolution of power spectral density (PSD) estimates of nonsinusoidal semi-periodic signals is described, which is a useful tool for the analysis of physiologic signals such as the electrocardiogram and blood pressure.

...read moreread less

Abstract: This paper describes a simple method that improves the resolution of power spectral density (PSD) estimates of nonsinusoidal semi-periodic signals. This method can also be used to generate time-frequency mappings similar to the spectrogram. This is a useful tool for the analysis of physiologic signals such as the electrocardiogram and blood pressure. Several examples are given of the method applied to biomedical signals.

...read moreread less

Techniques For The Visualization Of Nonstationary Biomedical Signals

[...]

James McNames¹, Jules Bassale, Mateo Aboy, Cristina Crespo, Brahm Goldstein - Show less +1 more•Institutions (1)

Portland State University¹

01 Jan 2002

TL;DR: Two visualization techniques for nonstationary biomedical signals are described and several examples of how they can be used to extract useful information are given and how the pulse morphology changes with time is described.

...read moreread less

Abstract: We describe two visualization techniques for nonstationary biomedical signals and give several examples of how they can be used to extract useful information. We first discuss the spectrogram, a popular technique for visualizing how the estimated power spectral density varies with time. We also describe a complementary technique that displays how the pulse morphology changes with time. Both of these techniques use a sliding window to account for the nonstationarity. 1 Spectrogram Time-frequency visualization is a common preliminary step in the analysis of nonstationary signals. The most popular technique is the spectrogram, which estimates the power spectral density (PSD) by applying the periodogram to windowed segments separated by a fixed interval. This is computationally efficient because it incorporates the fast Fourier transform (FFT). The user specifies the window shape and length that controls the trade-off between time and frequency resolution of the image. For all of the examples in this paper we used a Blackman window and zero padding to evaluate the estimate at smaller frequency intervals. Fig. 1. The top plot shows the spectrogram of the interbeat intervals of a patient with obstructive sleep apnea. The bottom plot shows the interbeat interval (IBI) series generated from the ECG signal. The window length was 4.27 min. to capture the low frequency respiratory oscillations. The spectrogram is a useful tool for visualizing heart rate variability (HRV). Fig. 1 shows an example of the interbeat interval (IBI) spectrogram generated from the electrocardiogram (ECG) of a patient with obstructive sleep apnea (OSA). OSA is characterized by intermittent

...read moreread less

Journal Article•DOI•

A time-filtered wigner transformation for use in signal analysis

[...]

P.G. Baum¹•Institutions (1)

Leibniz University of Hanover¹

01 Nov 2002-Mechanical Systems and Signal Processing

TL;DR: In this article, a procedure is presented, with which by repeated filtering in time domain and subsequent Wigner transformation a transformation result is achieved, in which the time-frequency resolution of the WIGNer transformation is maintained and at the same time the interference terms can be removed.

...read moreread less

Proceedings Article•DOI•

Noise suppression based on approximate KLT with wavelet packet expansion

[...]

Chung-Hsien Yang¹, Jhing-Fa Wang¹•Institutions (1)

National Cheng Kung University¹

13 May 2002

TL;DR: Experimental results show that this method achieves satisfactory enhancement of speech by using approximate Karhunen-Loeve transform using the Aurora-2 database.

...read moreread less

Abstract: In this paper, we perform the noise suppression based on approximate Karhunen-Loeve transform (KL T). The discrete cosine transform(DCT) has been a good candidate for approximate KLT when the signal is modeled as an autoregressive process. However, for nonstationary signals, wavelet transform is more capable than DCT while approximating KLT. To calculate approximate KLT, we first represent the signal by using wavelet packet based on a basis search algorithm, then eigenvectors are evaluated from the basis. A linear estimator based on these eigenvectors can be constructed and used to perform noise reduction. We evaluate the performance of this method by using the Aurora-2 database. The SNR improvement is calculated. Some waveforms and spectrograms of enhanced speech are also shown. Finally. the enhanced speech is tested for speech recognition. These experimental results show that this method achieves satisfactory enhancement of speech.

...read moreread less

Journal Article•DOI•

Monitoring the Formation of Kernel-Based Topographic Maps with Application to Hierarchical Clustering of Music Signals

[...]

Marc M. Van Hulle¹, Temujin Gautama¹•Institutions (1)

Katholieke Universiteit Leuven¹

01 Aug 2002

TL;DR: A new algorithm for monitoring the degree of topology preservation of kernel-based maps during learning is introduced, applied to a real-world example concerned with the identification of 3 musical instruments and the notes played by them by means of a hierarchical clustering analysis.

...read moreread less

Abstract: When using topographic maps for clustering purposes, which is now being considered in the data mining community, it is crucial that the maps are free of topological defects. Otherwise, a contiguous cluster could become split into separate clusters. We introduce a new algorithm for monitoring the degree of topology preservation of kernel-based maps during learning. The algorithm is applied to a real-world example concerned with the identification of 3 musical instruments and the notes played by them, in an unsupervised manner, by means of a hierarchical clustering analysis, starting from the music signal's spectrogram.

...read moreread less

Proceedings Article•DOI•

Flight parameter identification from cepstrum tracks

[...]

Y. Gao¹, G.W. Pulford¹, J. Sendt¹, A. Maguer¹•Institutions (1)

Thales Underwater Systems¹

07 Aug 2002

TL;DR: In this article, the authors used the asymmetry of the Lloyds mirror rings (LMR) that have been converted into a primary rahmonic in the cepstrogram of the acoustic data.

...read moreread less

Abstract: By using a single microphone located above the ground, it is possible to determine the flight parameters of an aircraft fly-over. This technique utilises the asymmetry of the Lloyds mirror rings (LMR) that have been converted into a primary rahmonic in the cepstrogram of the acoustic data. Unlike previous techniques, the spectrogram is not needed. The cepstrum data are automatically processed by a hidden Markov model tracker that provides input to the flight parameter estimation stage. The Levenberg-Marquardt optimisation procedure is then applied to derive the aircraft speed along with the time, horizontal distance and height of the closest point of approach. Reliable cepstrogram estimates are obtainable when at least three LMR's are present in the spectrogram data.

...read moreread less

Journal Article•DOI•

Analysis of Room Responses, Motivated by Auditory Perception

[...]

Tapio Lokki, Matti Karjalainen

01 Jun 2002-Journal of New Music Research

TL;DR: With the proposed method it is possible to study the decaying sound field in a room with a resolution that mimics the human hearing better than simpler methods such as spectrograms or one-third octave band filtering.

...read moreread less

Abstract: An analysis and visualization method for room responses is presented The method utilizes time and frequency resolution similar to the human hearing With the proposed method it is possible to study the decaying sound field in a room with a resolution that mimics the human hearing better than simpler methods such as spectrograms or one-third octave band filtering This method is applicable as well in the analysis of artificial reverberation and related audio effects The analysis method includes the use of directional microphones, which yields information cues about the diffuseness and the directional characteristics of sound fields in the time-frequency domain This approach is particularly interesting and promising in the visualization of concert hall acoustics As case studies two example responses, one from a small and another one from a large concert hall, are analyzed

...read moreread less