
Showing papers on "Spectrogram published in 2007"


Journal ArticleDOI
TL;DR: An unsupervised learning algorithm for the separation of sound sources in one-channel music signals is presented and enables a better separation quality than the previous algorithms.
Abstract: An unsupervised learning algorithm for the separation of sound sources in one-channel music signals is presented. The algorithm is based on factorizing the magnitude spectrogram of an input signal into a sum of components, each of which has a fixed magnitude spectrum and a time-varying gain. Each sound source, in turn, is modeled as a sum of one or more components. The parameters of the components are estimated by minimizing the reconstruction error between the input spectrogram and the model, while restricting the component spectrograms to be nonnegative and favoring components whose gains are slowly varying and sparse. Temporal continuity is favored by using a cost term which is the sum of squared differences between the gains in adjacent frames, and sparseness is favored by penalizing nonzero gains. The proposed iterative estimation algorithm is initialized with random values, and the gains and the spectra are then alternately updated using multiplicative update rules until the values converge. Simulation experiments were carried out using generated mixtures of pitched musical instrument samples and drum sounds. The performance of the proposed method was compared with independent subspace analysis and basic nonnegative matrix factorization, which are based on the same linear model. According to these simulations, the proposed method enables better separation quality than the previous algorithms. In particular, the temporal continuity criterion improved the detection of pitched musical sounds; the sparseness criterion did not produce significant improvements.
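
For concreteness, here is a minimal sketch of the kind of multiplicative-update factorization the abstract describes, with a temporal-continuity term and an L1 sparseness term folded into the gain update. The penalty weights and the boundary handling are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def nmf_separate(V, n_components=8, n_iter=200, alpha=1.0, beta=0.1, seed=0):
    """Sketch of factorizing a magnitude spectrogram V ~ W @ H with
    nonnegativity, a temporal-continuity penalty on the gains H (sum of
    squared differences between adjacent frames), and an L1 sparseness
    penalty on the gains. alpha and beta are illustrative weights."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, n_components)) + 1e-3   # fixed spectra
    H = rng.random((n_components, T)) + 1e-3   # time-varying gains
    for _ in range(n_iter):
        # standard multiplicative update for the squared reconstruction error
        W *= (V @ H.T) / (W @ H @ H.T + 1e-12)
        # continuity gradient split into +/- parts so H stays nonnegative;
        # edges are handled crudely by repeating the boundary frames
        Hl = np.roll(H, 1, axis=1);  Hl[:, 0] = H[:, 0]
        Hr = np.roll(H, -1, axis=1); Hr[:, -1] = H[:, -1]
        num = W.T @ V + alpha * (Hl + Hr)
        den = W.T @ (W @ H) + 2 * alpha * H + beta + 1e-12
        H *= num / den
    return W, H
```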

1,096 citations


Journal ArticleDOI
TL;DR: Two techniques to separate out the speech signal of the speaker of interest from a mixture of speech signals are presented and can result in significant enhancement of individual speakers in mixed recordings, consistently achieving better performance than that obtained with hard binary masks.
Abstract: The problem of single-channel speaker separation attempts to extract a speech signal uttered by the speaker of interest from a signal containing a mixture of acoustic signals. Most algorithms that deal with this problem are based on masking, wherein unreliable frequency components from the mixed signal spectrogram are suppressed, and the reliable components are inverted to obtain the speech signal of the speaker of interest. Most current techniques estimate this mask in a binary fashion, resulting in a hard mask. In this paper, we present two techniques to separate out the speech signal of the speaker of interest from a mixture of speech signals. One technique estimates all the spectral components of the desired speaker. The second technique estimates a soft mask that weights the frequency subbands of the mixed signal. In both cases, the speech signal of the speaker of interest is reconstructed from the complete spectral descriptions obtained. In their native form, these algorithms are computationally expensive. We also present fast factored approximations to the algorithms. Experiments reveal that the proposed algorithms can result in significant enhancement of individual speakers in mixed recordings, consistently achieving better performance than that obtained with hard binary masks.
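
As a sketch of what the soft-mask stage amounts to (assuming magnitude estimates for the target and interference are already available from some model, which is the hard part the paper addresses), a time-frequency ratio mask can be applied like this:

```python
import numpy as np
from scipy.signal import stft, istft

def apply_soft_mask(mixture, target_mag, interf_mag, fs, nperseg=512):
    """Weight each time-frequency cell of the mixture by the estimated
    fraction of target energy (a 'ratio' soft mask), then invert using
    the mixture phase. target_mag / interf_mag must match the STFT
    shape; how they are estimated is not shown here."""
    f, t, X = stft(mixture, fs=fs, nperseg=nperseg)
    mask = target_mag**2 / (target_mag**2 + interf_mag**2 + 1e-12)
    _, y = istft(X * mask, fs=fs, nperseg=nperseg)
    return y
```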

129 citations


Journal ArticleDOI
TL;DR: An algorithm for estimating signals from short-time magnitude spectra is introduced offering a significant improvement in quality and efficiency over current methods, and is applied to audio time-scale and pitch modification and compared to classical algorithms for these tasks on a variety of signal types.
Abstract: An algorithm for estimating signals from short-time magnitude spectra is introduced offering a significant improvement in quality and efficiency over current methods. The key issue is how to invert a sequence of overlapping magnitude spectra (a “spectrogram”) containing no phase information to generate a real-valued signal free of audible artifacts. Also important is that the algorithm performs in real-time, both structurally and computationally. In the context of spectrogram inversion, structurally real-time means that the audio signal at any given point in time only depends on transform frames at local or prior points in time. Computationally, real-time means that the algorithm is efficient enough to run in less time than the reconstructed audio takes to play on the available hardware. The spectrogram inversion algorithm is parameterized to allow tradeoffs between computational demands and the quality of the signal reconstruction. The algorithm is applied to audio time-scale and pitch modification and compared to classical algorithms for these tasks on a variety of signal types including both monophonic and polyphonic audio signals such as speech and music.
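
The paper's real-time algorithm is not reproduced here, but the classic Griffin-Lim-style batch iteration below illustrates the underlying problem of recovering a consistent phase for a sequence of overlapping magnitude spectra; frame-count alignment is handled crudely and all parameters are illustrative.

```python
import numpy as np
from scipy.signal import stft, istft

def griffin_lim(mag, fs=1.0, nperseg=512, n_iter=100, seed=0):
    """Estimate a signal whose STFT magnitude matches `mag`
    (shape: nperseg//2+1 x frames) by alternating between the target
    magnitude and the phase of a re-analyzed time signal. Classic
    batch method; the cited paper's variant is causal and real-time."""
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(mag.shape))
    for _ in range(n_iter):
        _, x = istft(mag * phase, fs=fs, nperseg=nperseg)   # to time domain
        _, _, X = stft(x, fs=fs, nperseg=nperseg)           # re-analyze
        if X.shape[1] < mag.shape[1]:                       # crude alignment
            X = np.pad(X, ((0, 0), (0, mag.shape[1] - X.shape[1])))
        phase = np.exp(1j * np.angle(X[:, :mag.shape[1]]))  # keep phase only
    _, x = istft(mag * phase, fs=fs, nperseg=nperseg)
    return x
```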

122 citations


Journal ArticleDOI
TL;DR: Analyses and comparisons of the spectrogram, Wigner distribution and wavelet transform techniques applied to the phonocardiogram (PCG) signal are presented, in order to assess each technique's aptitude for separating and suitably presenting the internal components of these sounds.

95 citations


Proceedings ArticleDOI
27 Apr 2007
TL;DR: This paper examines the problem of human target detection and identification using single-channel, airborne, synthetic aperture radar (SAR) in a MATLAB simulation environment, and shows that spectrograms have some ability to detect and identify human targets in low noise.
Abstract: Radar offers unique advantages over other sensors, such as visual or seismic sensors, for human target detection. Many situations, especially military applications, prevent the placement of video cameras or the implantation of seismic sensors in the area being observed because of security or other threats. However, radar can operate far away from potential targets, and functions during daytime as well as nighttime, in virtually all weather conditions. In this paper, we examine the problem of human target detection and identification using single-channel, airborne, synthetic aperture radar (SAR). Human targets are differentiated from other detected slow-moving targets by analyzing the spectrogram of each potential target. Human spectrograms are unique, and can be used not just to identify targets as human, but also to determine features of the human target being observed, such as size, gender, action, and speed. A 12-point human model, together with kinematic equations of motion for each body part, is used to calculate the expected target return and spectrogram. A MATLAB simulation environment is developed, including ground clutter and human and non-human targets, for the testing of spectrogram-based detection and identification algorithms. Simulations show that spectrograms have some ability to detect and identify human targets in low noise. An example gender discrimination system correctly detected 83.97% of males and 91.11% of females. The problems and limitations of spectrogram-based methods in high-clutter environments are discussed. The SNR loss inherent to spectrogram-based methods is quantified. An alternate detection and identification method that will be used as a basis for future work is proposed.
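
A toy illustration of why spectrograms expose human micro-motion: a return whose phase is modulated by a sinusoidally moving "limb" produces oscillating micro-Doppler sidelobes around the bulk Doppler line. The parameters are arbitrary stand-ins, not values from the paper's 12-point kinematic model.

```python
import numpy as np
from scipy.signal import spectrogram

fs, dur = 2000.0, 4.0                       # sample rate (Hz), duration (s)
t = np.arange(int(fs * dur)) / fs
fd = 80.0                                   # bulk-motion Doppler shift (Hz)
# torso return plus one 'limb' whose range oscillates at a ~2 Hz gait rate;
# a full model sums 12 such body-part returns from kinematic equations
torso = np.exp(2j * np.pi * fd * t)
limb = 0.4 * np.exp(2j * np.pi * (fd * t + 15 * np.sin(2 * np.pi * 2.0 * t)))
x = torso + limb + 0.05 * (np.random.randn(t.size) + 1j * np.random.randn(t.size))
f, tt, Sxx = spectrogram(x, fs=fs, nperseg=256, noverlap=224,
                         return_onesided=False)
# Sxx shows a constant torso line with sinusoidal micro-Doppler sidelobes
```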

94 citations


Proceedings ArticleDOI
01 Aug 2007
TL;DR: A framework for analog-to-information conversion based on the theory of information recovery from random samples enables sub-Nyquist acquisition and processing of wideband signals that are sparse in a local Fourier representation.
Abstract: We develop a framework for analog-to-information conversion based on the theory of information recovery from random samples. The framework enables sub-Nyquist acquisition and processing of wideband signals that are sparse in a local Fourier representation. We present the random sampling theory associated with an efficient information recovery algorithm to compute the spectrogram of the signal. Additionally, we develop a hardware design for the random sampling system that demonstrates a consistent reconstruction fidelity in the presence of sampling jitter, which forms the main source of non-ideality in a practical system implementation.
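
As a rough sketch of the recovery idea (with generic orthogonal matching pursuit standing in for the paper's information-recovery algorithm), a spectrally sparse window can be reconstructed from a small set of random time samples over a DFT dictionary; repeating this per window yields a spectrogram estimate.

```python
import numpy as np

def omp_fourier(samples, sample_idx, N, k):
    """Orthogonal matching pursuit over an N-point DFT dictionary:
    recover k active frequency bins from far fewer than N random
    time samples. A generic stand-in for the paper's algorithm."""
    A = np.exp(2j * np.pi * np.outer(sample_idx, np.arange(N)) / N) / np.sqrt(N)
    r, support = samples.astype(complex), []
    for _ in range(k):
        support.append(int(np.argmax(np.abs(A.conj().T @ r))))
        As = A[:, support]
        coef, *_ = np.linalg.lstsq(As, samples, rcond=None)
        r = samples - As @ coef                 # residual after projection
    xhat = np.zeros(N, complex)
    xhat[support] = coef
    return xhat   # sparse spectrum of one window

# demo: two tones recovered from 12.5% random sampling
N, rng = 1024, np.random.default_rng(0)
s = np.zeros(N, complex); s[100], s[300] = 40.0, 25.0
x = np.fft.ifft(s) * np.sqrt(N)
idx = np.sort(rng.choice(N, N // 8, replace=False))
spec = omp_fourier(x[idx], idx, N, k=2)        # peaks at bins 100 and 300
```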

89 citations


Proceedings ArticleDOI
27 Aug 2007
TL;DR: It is argued that the 2-D Gabor filterbank has the capacity to decompose a patch into its underlying dominant spectro-temporal components, and the response of the filterbank to different speech phenomena is illustrated.
Abstract: We present a 2-D spectro-temporal Gabor filterbank based on the 2-D Fast Fourier Transform, and show how it may be used to analyze localized patches of a spectrogram. We argue that the 2-D Gabor filterbank has the capacity to decompose a patch into its underlying dominant spectro-temporal components, and we illustrate the response of our filterbank to different speech phenomena such as harmonicity, formants, vertical onsets/offsets, noise, and overlapping simultaneous speakers.
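
A minimal sketch of one such spectro-temporal Gabor filter, applied to a spectrogram patch by direct 2-D correlation rather than the paper's 2-D-FFT implementation; the filter parameters are illustrative.

```python
import numpy as np
from scipy.signal import correlate2d

def gabor_2d(shape, omega_t, omega_f, sigma):
    """One spectro-temporal Gabor kernel: a 2-D sinusoid (temporal
    modulation omega_t cycles/frame, spectral modulation omega_f
    cycles/bin) under a Gaussian envelope. A filterbank varies
    (omega_t, omega_f, sigma) to tile the modulation plane."""
    f, t = np.mgrid[-(shape[0]//2):shape[0]//2 + 1,
                    -(shape[1]//2):shape[1]//2 + 1]
    envelope = np.exp(-(t**2 + f**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * (omega_t * t + omega_f * f))
    return envelope * carrier

# response of a 'harmonicity' filter (spectral modulation only) to a patch
patch = np.random.rand(64, 64)        # stand-in for a log-spectrogram patch
g = gabor_2d((21, 21), omega_t=0.0, omega_f=0.25, sigma=4.0)
response = correlate2d(patch, g, mode='same')
```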

86 citations


Patent
30 Jul 2007
TL;DR: In this article, a spectrum analysis engine (SAGE) consisting of a spectrum analyzer, a signal detector, a universal signal synchronizer component and a snapshot buffer component is presented.
Abstract: A spectrum analysis engine (SAGE) that comprises a spectrum analyzer component, a signal detector component, a universal signal synchronizer component and a snapshot buffer component. The spectrum analyzer component generates data representing a real-time spectrogram of a bandwidth of radio frequency (RF) spectrum. The signal detector detects signal pulses in the frequency band and outputs pulse event information entries, which include the start time, duration, power, center frequency and bandwidth of each detected pulse. The signal detector also provides pulse trigger outputs which may be used to enable/disable the collection of information by the spectrum analyzer and the snapshot buffer components. An alternative pulse detection module is provided that tracks signal pulses by comparing peak data from successive FFT cycles with existing signal pulse data that is derived from comparing peak data for prior FFT cycles. Peaks for new FFT cycles are matched to data associated with signal pulses determined to be occurring over many FFT intervals.
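
A simplified sketch of the alternative pulse-detection idea: per-FFT-cycle spectral peaks are matched to pulse tracks carried over from prior cycles by center-bin proximity. The thresholds and matching rule are illustrative, not taken from the patent.

```python
import numpy as np

def track_pulses(spectrogram_db, floor_db=-80.0, max_bin_gap=2):
    """Match peaks from each new FFT cycle to pulse tracks built from
    prior cycles. A track absorbs a peak whose bin is within
    max_bin_gap of its last bin; unmatched peaks open new tracks."""
    tracks = []   # each: dict(start, last_frame, bins=[...])
    for frame, column in enumerate(spectrogram_db.T):
        peaks = [b for b in range(1, len(column) - 1)
                 if column[b] > floor_db
                 and column[b] >= column[b - 1] and column[b] >= column[b + 1]]
        for b in peaks:
            live = [tr for tr in tracks if tr['last_frame'] == frame - 1
                    and abs(tr['bins'][-1] - b) <= max_bin_gap]
            if live:
                live[0]['bins'].append(b)
                live[0]['last_frame'] = frame
            else:
                tracks.append({'start': frame, 'last_frame': frame, 'bins': [b]})
    return tracks   # start time, duration, and center bin follow directly
```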

84 citations


Proceedings ArticleDOI
24 Aug 2007
TL;DR: A novel voice conversion system using phoneme-based linear mapping functions on main vowel phonemes is proposed in this paper, which has the following three improvements: instead of using all the vocal tract resonance (VTR) vectors in the portion of a phoneme, the VTR vector at the steady-state of each phoneme is used to train the phoneme-based GMM.
Abstract: A novel voice conversion system using phoneme-based linear mapping functions on main vowel phonemes is proposed in this paper. Our voice conversion algorithm has the following three improvements. First, instead of using all the vocal tract resonance (VTR) vectors in the portion of a phoneme, we use the VTR vector at the steady-state of each phoneme to train the phoneme-based GMM. Second, different linear mapping functions have been trained to describe the mapping relationships for corresponding phonemes. Third, in the transformation procedure, the transformed formant frequencies at the main vowel phonemes are obtained using the corresponding GMM. In addition, prosody parameters are also transformed. Finally, the converted speech is re-synthesized with the transformed parameters by the high-quality speech manipulation framework STRAIGHT (Speech Transformation and Representation based on Adaptive Interpolation of weiGHTed spectrogram). Perceptual results for female-to-male and male-to-female conversion show that the MOS score of the converted voice improves from 3.8 to 4.1 and the ABX score from 3.3 to 3.8 compared with IBM's system. Comparisons with other systems are also given in this paper.

79 citations


Journal ArticleDOI
TL;DR: Experimental results with 70 popular songs showed that the template-adaptation and harmonic-structure-suppression methods improved the recognition accuracy and achieved 83%, 58%, and 46% in detecting onsets of the bass drum, snare drum, and hi-hat cymbals, respectively.
Abstract: This paper describes a system that detects onsets of the bass drum, snare drum, and hi-hat cymbals in polyphonic audio signals of popular songs. Our system is based on a template-matching method that uses power spectrograms of drum sounds as templates. This method calculates the distance between a template and each spectrogram segment extracted from a song spectrogram, using Goto's distance measure originally designed to detect the onsets in drums-only signals. However, there are two main problems. The first problem is that appropriate templates are unknown for each song. The second problem is that it is more difficult to detect drum-sound onsets in sound mixtures including various sounds other than drum sounds. To solve these problems, we propose template-adaptation and harmonic-structure-suppression methods. First of all, an initial template of each drum sound, called a seed template, is prepared. The former method adapts it to actual drum-sound spectrograms appearing in the song spectrogram. To make our system robust to the overlapping of harmonic sounds with drum sounds, the latter method suppresses harmonic components in the song spectrogram before the adaptation and matching. Experimental results with 70 popular songs showed that our template-adaptation and harmonic-structure-suppression methods improved the recognition accuracy and achieved 83%, 58%, and 46% in detecting onsets of the bass drum, snare drum, and hi-hat cymbals, respectively.
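
The template-matching core can be sketched as follows, with a plain squared distance standing in for Goto's distance measure and the adaptation and harmonic-suppression stages omitted:

```python
import numpy as np

def onset_scores(song_pow, template_pow):
    """Distance between a drum template (freq x frames) and every
    equally sized segment of the song power spectrogram; local minima
    of the returned curve are onset candidates. Plain squared distance
    stands in for Goto's measure, which weights bins where the
    template dominates."""
    F, Tt = template_pow.shape
    n = song_pow.shape[1] - Tt + 1
    return np.array([np.sum((song_pow[:, i:i + Tt] - template_pow) ** 2)
                     for i in range(n)])
```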

69 citations


Journal ArticleDOI
TL;DR: In this article, a new technique is presented to automatically identify and characterize waves in three-axis data, which can be applied in a variety of settings, including triaxial ground-magnetometer data or satellite wave data.
Abstract: A new technique designed to automatically identify and characterize waves in three-axis data is presented, which can be applied in a variety of settings, including triaxial ground-magnetometer data or satellite wave data (particularly when transformed to a field-aligned coordinate system). This technique is demonstrated on a single Pc1 event recorded on a triaxial search coil magnetometer in Parkfield, California (35.945°, −120.542°), and then applied to a 6-month period between 1 June 2003 and 31 December 2003. The technique begins with the creation of a standard dynamic spectrogram and consists of three steps: (1) for every column of the spectrogram (which represents the spectral content of a short period in the time series), spectral peaks are identified whose power content significantly exceeds the ambient noise; (2) the series of spectral peaks from step 1 are grouped into continuous blocks representing discrete wave events using a “spectral-overlap” criterion; and (3) for each identified event, wave parameters (e.g., wave normal angles, polarization ratio) are calculated which can be used to check the continuity of individual identified wave events or to further filter wave events (e.g., by polarization ratio).
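
A compact sketch of steps 1 and 2 under simplifying assumptions (a median-based ambient-noise floor and an illustrative SNR threshold); step 3, the per-event wave-parameter computation, is omitted.

```python
import numpy as np

def detect_wave_events(S, snr=5.0):
    """S: dynamic power spectrogram (freq bins x time columns).
    Step 1: in each column, keep contiguous frequency bands whose power
    exceeds snr x the column median (a crude noise floor). Step 2:
    chain bands in adjacent columns into discrete events whenever
    their frequency ranges overlap (the 'spectral-overlap' criterion)."""
    events, open_ev = [], []            # open_ev entries: (lo, hi, start_col)
    for col in range(S.shape[1]):
        above = S[:, col] > snr * np.median(S[:, col])
        edges = np.flatnonzero(np.diff(np.r_[False, above, False].astype(int)))
        bands = list(zip(edges[::2], edges[1::2] - 1))    # (lo, hi) bin runs
        nxt = [(lo, hi,
                next((e[2] for e in open_ev if lo <= e[1] and hi >= e[0]), col))
               for lo, hi in bands]
        # close events no current band overlaps
        events += [e + (col - 1,) for e in open_ev
                   if not any(lo <= e[1] and hi >= e[0] for lo, hi, _ in nxt)]
        open_ev = nxt
    return events + [e + (S.shape[1] - 1,) for e in open_ev]  # (lo, hi, t0, t1)
```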

Journal ArticleDOI
TL;DR: Particle filtering algorithms rely on a so-called sequential importance distribution, and it is shown that it can be built on previous multipitch estimation algorithms, so as to yield an even more efficient estimation procedure with established convergence properties.
Abstract: This paper addresses the joint estimation and detection of time-varying harmonic components in audio signals. We follow a flexible viewpoint, where several frequency/amplitude trajectories are tracked in the spectrogram using particle filtering. The core idea is that each harmonic component (composed of a fundamental partial together with several overtone partials) is considered a target. Tracking requires defining a state-space model with state transition and measurement equations. Particle filtering algorithms rely on a so-called sequential importance distribution, and we show that it can be built on previous multipitch estimation algorithms, so as to yield an even more efficient estimation procedure with established convergence properties. Moreover, as our model captures all the harmonic model information, it actually separates the harmonic sources. Simulations on synthetic and real music data show the merits of our approach.
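
A minimal bootstrap particle filter conveys the flavor of the tracking stage; note the paper's sequential importance distribution built from multipitch estimators is replaced here by a plain random-walk proposal, and the harmonic likelihood is a simple stand-in.

```python
import numpy as np

def track_f0(S, n_particles=500, sigma_trans=2.0, seed=0):
    """Bootstrap particle filter tracking one fundamental-frequency
    trajectory through a magnitude spectrogram S (freq x time).
    A particle's likelihood is the spectrogram energy at its frequency
    plus the first two overtones."""
    rng = np.random.default_rng(seed)
    F, T = S.shape
    part = rng.uniform(1, F / 3 - 1, n_particles)   # F0 in bin units
    traj = []
    for col in range(T):
        # random-walk proposal (stand-in for the importance distribution)
        part = np.clip(part + rng.normal(0, sigma_trans, n_particles),
                       1, F / 3 - 1)
        idx = part.astype(int)
        w = S[idx, col] + S[2 * idx, col] + S[3 * idx, col] + 1e-12
        w /= w.sum()
        traj.append(float(np.sum(w * part)))        # posterior-mean estimate
        part = part[rng.choice(n_particles, size=n_particles, p=w)]  # resample
    return np.array(traj)                           # F0 (bins) per frame
```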

Journal ArticleDOI
TL;DR: In this paper, the authors used a nonlinear decomposition technique, the empirical mode decomposition (EMD) method, together with the Hilbert transform to obtain more reliable very low frequency electromagnetic (VLF-EM) data.
Abstract: Geologic noise and background electromagnetic (EM) waves often degrade the quality of very low frequency electromagnetic (VLF-EM) data. To retrieve signals with significant geologic information, we used a new nonlinear decomposition technique called the empirical mode decomposition (EMD) method with the Hilbert transform. We conducted a 2D resistivity model study that included inversion of the synthetic data to test the accuracy and capabilities of this method. Next, we applied this method to real data obtained from a field experiment and a geologic example. The filtering procedure for real data starts with applying the EMD method to decompose the VLF data into a series of intrinsic mode functions that admit a well-behaved Hilbert transform. With the Hilbert transform, the intrinsic mode functions yielded a spectrogram that presents an energy-wavenumber-distance distribution of the VLF data. We then examined the decomposed data and their spectrogram to determine the noise components, which we eliminated to obtain more reliable VLF data. The EMD-filtered data and their associated spectrograms indicated the successful application of this method. Because VLF data are recorded as a complex function of the real variable distance, the in-phase and quadrature parts are complementary components of each other and could be a Hilbert transform pair if the data are analytical and noise free. Therefore, by comparing the original data set with the one obtained from the Hilbert transform, we could evaluate data quality and could even replace the original with its Hilbert transform counterpart with acceptable accuracy. By applying both this technique and conventional methods to real data in this study, we have shown the superiority of this new method and have obtained a more reliable earth model by inverting the EMD-filtered data.
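
Assuming IMFs have already been obtained from some EMD implementation (the sifting itself is omitted), the Hilbert-transform step and the energy-wavenumber-distance map can be sketched like this:

```python
import numpy as np
from scipy.signal import hilbert

def hilbert_spectrum(imfs, dx, n_bins=64):
    """imfs: array (n_imfs, n_samples) from an EMD of VLF data sampled
    every dx along the profile. Returns an energy map over
    (wavenumber bin, distance sample): the energy-wavenumber-distance
    distribution used to pick out noise components."""
    n_imfs, n = imfs.shape
    analytic = hilbert(imfs, axis=1)
    amp = np.abs(analytic)
    phase = np.unwrap(np.angle(analytic), axis=1)
    wavenum = np.abs(np.gradient(phase, dx, axis=1)) / (2 * np.pi)
    k_edges = np.linspace(0, wavenum.max() + 1e-12, n_bins + 1)
    spec = np.zeros((n_bins, n))
    for i in range(n_imfs):
        idx = np.clip(np.digitize(wavenum[i], k_edges) - 1, 0, n_bins - 1)
        spec[idx, np.arange(n)] += amp[i] ** 2   # accumulate IMF energy
    return spec
```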

Journal ArticleDOI
TL;DR: This paper proposes a novel F0 contour estimation algorithm based on a precise parametric description of the voiced parts of speech derived from the power spectrum that is competitive on clean single-speaker speech, and outperforms existing methods in the presence of noise.
Abstract: This paper proposes a novel F0 contour estimation algorithm based on a precise parametric description of the voiced parts of speech derived from the power spectrum. The algorithm is able to perform in a wide variety of noisy environments as well as to estimate the F0s of cochannel concurrent speech. The speech spectrum is modeled as a sequence of spectral clusters governed by a common F0 contour expressed as a spline curve. These clusters are obtained by an unsupervised 2-D time-frequency clustering of the power density using a new formulation of the EM algorithm, and their common F0 contour is estimated at the same time. A smooth F0 contour is extracted for the whole utterance, linking together its voiced parts. A noise model is used to cope with nonharmonic background noise, which would otherwise interfere with the clustering of the harmonic portions of speech. We evaluate our algorithm in comparison with existing methods on several tasks, and show 1) that it is competitive on clean single-speaker speech, 2) that it outperforms existing methods in the presence of noise, and 3) that it outperforms existing methods for the estimation of multiple F0 contours of cochannel concurrent speech.

Journal ArticleDOI
TL;DR: It is concluded that a supervised learning approach to note onset detection performs well and warrants further investigation.
Abstract: This paper presents a novel approach to detecting onsets in music audio files. We use a supervised learning algorithm to classify spectrogram frames extracted from digital audio as being onsets or nononsets. Frames classified as onsets are then treated with a simple peak-picking algorithm based on a moving average. We present two versions of this approach. The first version uses a single neural network classifier. The second version combines the predictions of several networks trained using different hyperparameters. We describe the details of the algorithm and summarize the performance of both variants on several datasets. We also examine our choice of hyperparameters by describing results of cross-validation experiments done on a custom dataset. We conclude that a supervised learning approach to note onset detection performs well and warrants further investigation.
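
The classifier itself is not reproduced here, but the second stage, peak-picking the per-frame onset activations against a moving average, is small enough to sketch; the window size and threshold are illustrative.

```python
import numpy as np

def pick_onsets(activation, w=5, delta=0.1):
    """activation: per-frame onset probabilities from the classifier.
    A frame is an onset if it is the maximum of a (2w+1)-frame window
    and exceeds the window's moving average by delta. In practice w
    and delta would be tuned on validation data."""
    onsets = []
    for i in range(w, len(activation) - w):
        win = activation[i - w:i + w + 1]
        if activation[i] == win.max() and activation[i] > win.mean() + delta:
            onsets.append(i)
    return onsets
```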

Journal ArticleDOI
TL;DR: Spectrogram analysis of reconstructed speech shows that highly intelligible speech is produced with the quality of the speaker-dependent speech being slightly higher owing to the more accurate fundamental frequency and voicing predictions.
Abstract: This work proposes a method for predicting the fundamental frequency and voicing of a frame of speech from its mel-frequency cepstral coefficient (MFCC) vector representation. This information is subsequently used to enable a speech signal to be reconstructed solely from a stream of MFCC vectors and has particular application in distributed speech recognition systems. Prediction is achieved by modeling the joint density of fundamental frequency and MFCCs. This is first modeled using a Gaussian mixture model (GMM) and then extended by using a set of hidden Markov models to link together a series of state-dependent GMMs. Prediction accuracy is measured on unconstrained speech input for both a speaker-dependent system and a speaker-independent system. A fundamental frequency prediction error of 3.06% is obtained on the speaker-dependent system in comparison to 8.27% on the speaker-independent system. On the speaker-dependent system 5.22% of frames have voicing errors compared to 8.82% on the speaker-independent system. Spectrogram analysis of reconstructed speech shows that highly intelligible speech is produced with the quality of the speaker-dependent speech being slightly higher owing to the more accurate fundamental frequency and voicing predictions.
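
The GMM stage can be sketched as standard GMM regression on joint [MFCC, log F0] vectors (the HMM extension linking state-dependent GMMs is omitted); the helper names are mine, not the paper's.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

def fit_joint_gmm(mfcc, logf0, n_components=16):
    """Fit a GMM to joint vectors [mfcc, log f0] (mfcc: N x d, logf0: N).
    A sketch of the joint-density model only."""
    return GaussianMixture(n_components, covariance_type='full').fit(
        np.hstack([mfcc, logf0[:, None]]))

def predict_logf0(gmm, x):
    """GMM regression: E[log f0 | mfcc = x], the responsibility-weighted
    sum of per-component conditional means."""
    d = x.size
    w = np.array([wk * multivariate_normal.pdf(x, m[:d], C[:d, :d])
                  for wk, m, C in zip(gmm.weights_, gmm.means_,
                                      gmm.covariances_)])
    w /= w.sum()
    cond = [m[d] + C[d, :d] @ np.linalg.solve(C[:d, :d], x - m[:d])
            for m, C in zip(gmm.means_, gmm.covariances_)]
    return float(w @ np.array(cond))
```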

Proceedings ArticleDOI
15 Apr 2007
TL;DR: In this paper, the authors explore the use of the modulation frequency domain for single channel speaker separation and show that multiple speakers are highly separable in this space, and propose an automatic speaker separation algorithm that only needs a rough estimate of the target speaker's pitch range.
Abstract: We explore the use of the modulation frequency domain for single channel speaker separation. We discuss features of the modulation spectrogram of speech signals that suggest that multiple speakers are highly separable in this space. In a preliminary experiment, we separate a target speaker from an interfering speaker by manually masking out modulation spectral features of the interferer. We extend this experiment into a new automatic speaker separation algorithm, and show that it achieves an acceptable level of separation. The new algorithm only needs a rough estimate of the target speaker's pitch range.
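
A rough sketch of the representation involved, assuming a plain STFT front end: subband envelopes from the magnitude spectrogram, then a second Fourier transform over blocks of frames, giving energy as a function of acoustic frequency, modulation frequency, and time.

```python
import numpy as np
from scipy.signal import stft

def modulation_spectrogram(x, fs, nperseg=256, mod_win=64):
    """Magnitude STFT gives subband envelopes; an FFT over each
    subband's envelope (in blocks of mod_win frames) gives energy over
    (acoustic frequency, time block, modulation frequency). Pitched
    speakers concentrate near their pitch along the modulation axis,
    which is what makes them separable in this space."""
    f, t, X = stft(x, fs=fs, nperseg=nperseg)
    env = np.abs(X)                                  # (freq, frames)
    n_blocks = env.shape[1] // mod_win
    blocks = env[:, :n_blocks * mod_win].reshape(env.shape[0],
                                                 n_blocks, mod_win)
    mod = np.abs(np.fft.rfft(blocks, axis=2))
    frame_rate = 1.0 / (t[1] - t[0])                 # STFT frames per second
    mod_freqs = np.fft.rfftfreq(mod_win, d=1.0 / frame_rate)
    return mod, f, mod_freqs
```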

Journal ArticleDOI
TL;DR: In the presence of impulsive noise, the spectrogram-based detector using the French hat wavelet as the filter kernel outperforms the GLRT detector and decreases computational time by a factor of 6.
Abstract: This paper considers the problem of detection of contact calls produced by the critically endangered North Atlantic right whale, Eubalaena glacialis. To reduce computational time, the class of acceptable detectors is constrained by the detectors implemented as a bank of two-dimensional linear FIR filters and using the data spectrogram as the input. The closed form representations for the detectors are derived and the detection performance is compared with that of the generalized likelihood ratio test (GLRT) detector. The test results demonstrate that in the presence of impulsive noise, the spectrogram-based detector using the French hat wavelet as the filter kernel outperforms the GLRT detector and decreases computational time by a factor of 6.
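
The constrained detector class is easy to sketch: correlate the spectrogram with a 2-D FIR kernel and threshold the output. The diagonal-ridge kernel below is a generic upsweep stand-in, not the paper's French hat wavelet.

```python
import numpy as np
from scipy.signal import correlate2d

def spectrogram_detector(S, kernel, threshold):
    """Bank-of-one version of the detector class considered: the
    spectrogram S (freq x time) is passed through a 2-D linear FIR
    filter and detections are threshold crossings of the output."""
    out = correlate2d(S, kernel, mode='valid')
    return np.argwhere(out > threshold), out

# zero-mean upsweep kernel (a low-to-high diagonal ridge), standing in
# for a kernel matched to the right whale's upsweeping contact call
k = np.full((9, 9), -1.0 / 72.0)
for i in range(9):
    k[8 - i, i] = 1.0 / 9.0
```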

Proceedings ArticleDOI
15 Apr 2007
TL;DR: This work introduces an approach that leverages a probabilistic representation of phase to improve the separation results and investigates the consequences of this common assumption that the mixture spectrogram depends on the phase of the source STFTs.
Abstract: Spectrogram factorization methods have been proposed for single channel source separation and audio analysis. Typically, the mixture signal is first converted into a time-frequency representation such as the short-time Fourier transform (STFT). The phase information is thrown away and this spectrogram matrix is then factored into the sum of rank-one source spectrograms. This approach incorrectly assumes the mixture spectrogram is the sum of the source spectrograms. In fact, the mixture spectrogram depends on the phase of the source STFTs. We investigate the consequences of this common assumption and introduce an approach that leverages a probabilistic representation of phase to improve the separation results.
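
A three-line numeric check makes the point: summing two nearly anti-phase copies of a tone leaves a mixture whose magnitude STFT is roughly a tenth of the sum of the source magnitude STFTs.

```python
import numpy as np
from scipy.signal import stft

fs = 8000
t = np.arange(fs) / fs
s1 = np.sin(2 * np.pi * 440 * t)
s2 = -np.sin(2 * np.pi * 440 * t + 0.2)    # nearly anti-phase copy
_, _, X1 = stft(s1, fs=fs)
_, _, X2 = stft(s2, fs=fs)
_, _, Xm = stft(s1 + s2, fs=fs)
# the 'sum of source spectrograms' model badly overestimates the mixture
print(np.abs(Xm).sum() / (np.abs(X1) + np.abs(X2)).sum())   # ~0.1, not 1.0
```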

Journal ArticleDOI
TL;DR: A new method and application is proposed to characterize intensity and pitch of human heart sounds and murmurs using recorded heart sounds from the library of one of the authors, and a visual map of heart sound energy was established.
Abstract: A new method and application is proposed to characterize intensity and pitch of human heart sounds and murmurs. Using recorded heart sounds from the library of one of the authors, a visual map of heart sound energy was established. Both normal and abnormal heart sound recordings were studied. Representation is based on Wigner-Ville joint time-frequency transformations. The proposed methodology separates acoustic contributions of cardiac events simultaneously in pitch, time and energy. The resolution accuracy is superior to any other existing spectrogram method. The characteristic energy signature of the innocent heart murmur in a child with the S3 sound is presented. It allows clear detection of S1, S2 and S3 sounds, S2 split, systolic murmur, and intensity of these components. The original signal, heart sound power change with time, time-averaged frequency, energy density spectra and instantaneous variations of power and frequency/pitch with time, are presented. These data allow full quantitative characterization of heart sounds and murmurs. High accuracy in both time and pitch resolution is demonstrated. Resulting visual images have self-referencing quality, whereby individual features and their changes become immediately obvious.

Proceedings ArticleDOI
04 Dec 2007
TL;DR: A probabilistic model of interaural level and phase differences and an EM algorithm for finding the maximum likelihood parameters of this model are described, which is able to separate and localize more sound sources than there are available channels.
Abstract: We describe a system for localizing and separating multiple sound sources from a reverberant two-channel recording. It consists of a probabilistic model of interaural level and phase differences and an EM algorithm for finding the maximum likelihood parameters of this model. By assigning points in the interaural spectrogram probabilistically to sources with the best-fitting parameters and then estimating the parameters of the sources from the points assigned to them, the system is able to separate and localize more sound sources than there are available channels. It is also able to estimate frequency-dependent level differences of sources in a mixture that correspond well to those measured in isolation. In experiments in simulated anechoic and reverberant environments, the proposed system improved the signal-to-noise ratio of target sources by 2.7 and 3.4 dB more than two comparable algorithms on average.
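
The observations the model is fit to are simple to compute; the sketch below extracts per-cell interaural level and phase differences from a two-channel STFT, leaving out the EM assignment and parameter re-estimation.

```python
import numpy as np
from scipy.signal import stft

def interaural_spectrogram(left, right, fs, nperseg=1024):
    """Per time-frequency cell: ILD in dB and IPD in radians, the
    features the probabilistic model clusters. Each source occupies
    cells whose (ILD, IPD) match its position, which is why cells can
    be assigned to more sources than there are channels."""
    _, _, L = stft(left, fs=fs, nperseg=nperseg)
    _, _, R = stft(right, fs=fs, nperseg=nperseg)
    ild = 20 * np.log10((np.abs(L) + 1e-12) / (np.abs(R) + 1e-12))
    ipd = np.angle(L * np.conj(R))
    return ild, ipd
```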

Patent
07 Mar 2007
TL;DR: In this paper, the authors provide systems and methods that facilitate the location and identification of repetitive DNA patterns, such as CpG islands, Alu repeats, tandem repeats and various types of satellite repeats.
Abstract: Spectrogram extraction from DNA sequences has been known since 2001. A DNA spectrogram is generated by applying the Fourier transform to convert a symbolic DNA sequence consisting of the letters A, T, C, G into a visual representation that highlights periodicities of co-occurrence of DNA patterns. Given a DNA sequence or whole genomes, it is easy with this method to generate a large number of spectrogram images. However, the difficult part is to elucidate where the repetitive patterns are and to associate a biological and clinical meaning with them. The present disclosure provides systems and methods that facilitate the location and/or identification of repetitive DNA patterns, such as CpG islands, Alu repeats, tandem repeats and various types of satellite repeats. These repetitive elements can be found within a chromosome, within a genome or across genomes of various species. The disclosed systems and methods apply image processing operators to find prominent features in the vertical and horizontal direction of the DNA spectrograms. Systems and methods for fast, full-scale analysis of the derived images using supervised machine learning methods are also disclosed. The disclosed systems and methods for detecting and/or classifying repetitive DNA patterns include: (a) a comparative histogram method, (b) feature selection and classification using support vector machines and genetic algorithms, and (c) generation of a spectrovideo from a plurality of spectral images.
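
A minimal version of the underlying DNA-spectrogram construction (summing the four indicator-sequence magnitudes for simplicity, where visualizations often map them to color channels instead):

```python
import numpy as np

def dna_spectrogram(seq, win=512, hop=128):
    """Binary indicator sequence per base, windowed FFT per indicator,
    magnitudes summed across bases. Peaks in a column mark
    periodicities of base co-occurrence at that position (e.g. the
    1/3-frequency line of protein-coding regions, or tandem repeats)."""
    window = np.hanning(win)
    cols = []
    for start in range(0, len(seq) - win + 1, hop):
        chunk = seq[start:start + win]
        col = np.zeros(win // 2 + 1)
        for base in "ATCG":
            u = np.array([c == base for c in chunk], float)
            col += np.abs(np.fft.rfft(u * window))
        cols.append(col)
    return np.array(cols).T    # (frequency, sequence position)

S = dna_spectrogram("ATG" * 400)   # coding-like 3-periodicity:
# strong horizontal line near bin win/3 (frequency 1/3)
```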

Journal ArticleDOI
TL;DR: Wang et al. proposed a novel alias-free time-frequency distribution that avoids the information loss of the Choi-Williams distribution while suppressing the cross-terms of the Wigner-Ville distribution.

Journal ArticleDOI
TL;DR: The reassigned joint time-frequency transform is applied to study the distinct microDoppler features of a moving human; reassigned spectrograms generated from a simulation model and from measured data are presented and analysed.
Abstract: The Doppler spectrogram of a moving human is characterised by microDoppler returns due to the dynamic movements of the different body parts. The reassigned joint time-frequency transform is applied to study these distinct microDoppler features. Reassigned spectrograms generated from a simulation model and measured data are presented and analysed.

Proceedings ArticleDOI
Michele Covell, Shumeet Baluja
15 Apr 2007
TL;DR: The resulting system has excellent detection capabilities for small snippets of audio that have been degraded in a variety of manners, including competing noise, poor recording quality, and cell-phone playback.
Abstract: In this paper, we present a novel system for detecting known audio. We start with Waveprint, an audio identification system that, given a probe snippet, efficiently provides reliable forced-choice ranking of entries from an audio database. For open-set detection, we re-examine the best-ranked matches from Waveprint using simple temporal-ordering-based processing. The resulting system has excellent detection capabilities for small snippets of audio that have been degraded in a variety of ways, including competing noise, poor recording quality, and cell-phone playback. The system is more accurate than the previous state-of-the-art system while being more efficient and flexible in memory usage and computation.

Proceedings ArticleDOI
15 Apr 2007
TL;DR: An integrated weighted-mixture model consisting of both harmonic-structure and inharmonic-structure tone models (generative models for the power spectrogram) is developed under several original constraints for preventing over-training and maintaining intra-instrument consistency.
Abstract: This paper describes a sound source separation method for polyphonic sound mixtures of music to build an instrument equalizer for remixing multiple tracks separated from compact-disc recordings by changing the volume level of each track. Although such mixtures usually include both harmonic and inharmonic sounds, the difficulties in dealing with both types of sounds together have not been addressed in most previous methods that have focused on either of the two types separately. We therefore developed an integrated weighted-mixture model consisting of both harmonic-structure and inharmonic-structure tone models (generative models for the power spectrogram). On the basis of the MAP estimation using the EM algorithm, we estimated all model parameters of this integrated model under several original constraints for preventing over-training and maintaining intra-instrument consistency. Using standard MIDI files as prior information of the model parameters, we applied this model to compact-disc recordings and achieved the instrument equalizer.

Journal ArticleDOI
TL;DR: A noise suppression algorithm is proposed based on filtering the spectrotemporal modulations of noisy signals using a multiscale representation of the signal spectrogram generated by a model of sound processing in the auditory system to suppress noise that has distinctive modulation patterns, despite being spectrally overlapping with the signal.
Abstract: A noise suppression algorithm is proposed based on filtering the spectrotemporal modulations of noisy signals. The modulations are estimated from a multiscale representation of the signal spectrogram generated by a model of sound processing in the auditory system. A significant advantage of this method is its ability to suppress noise that has distinctive modulation patterns, despite being spectrally overlapping with the signal. The performance of the algorithm is evaluated using subjective and objective tests with contaminated speech signals and compared to the traditional Wiener filtering method. The results demonstrate the efficacy of the spectrotemporal filtering approach in the conditions examined.
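
A crude STFT-based stand-in for the idea (the paper uses a multiscale auditory-model representation): band-pass the temporal modulations of each frequency channel's log envelope, keeping the slow modulation range where speech energy concentrates. The band edges are illustrative.

```python
import numpy as np
from scipy.signal import stft, istft, butter, filtfilt

def modulation_filter(x, fs, lo=1.0, hi=16.0, nperseg=256):
    """Band-pass the temporal modulation of each frequency channel's
    log-magnitude envelope (roughly the 1-16 Hz range of speech
    modulations), then resynthesize with the noisy phase. Assumes the
    input is long enough for filtfilt's edge padding."""
    f, t, X = stft(x, fs=fs, nperseg=nperseg)
    frame_rate = 1.0 / (t[1] - t[0])
    logmag = np.log(np.abs(X) + 1e-6)
    b, a = butter(2, [lo / (frame_rate / 2), hi / (frame_rate / 2)],
                  btype='bandpass')
    mean = logmag.mean(axis=1, keepdims=True)
    filtered = filtfilt(b, a, logmag - mean, axis=1) + mean
    Y = np.exp(filtered) * np.exp(1j * np.angle(X))
    _, y = istft(Y, fs=fs, nperseg=nperseg)
    return y
```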

Journal ArticleDOI
François Léonard
TL;DR: In this paper, a frequency spectrogram is calculated from the phase difference between successive time slices of the short-time Fourier transform (STFT) relative to the reference frequency of each component, which shows the drift in the instantaneous frequency of each spectral component.
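
The underlying estimate resembles the standard phase-vocoder instantaneous-frequency computation, sketched below under that assumption: the inter-frame phase advance of each bin, minus the advance its nominal frequency would produce, measures the component's frequency offset from the bin center.

```python
import numpy as np
from scipy.signal import stft

def instantaneous_freq(x, fs, nperseg=1024, hop=256):
    """For each STFT bin and frame pair, convert the inter-frame phase
    difference into a frequency estimate: nominal bin frequency plus
    the wrapped deviation of the measured phase advance from the
    advance the bin frequency itself would produce over one hop."""
    f, t, X = stft(x, fs=fs, nperseg=nperseg, noverlap=nperseg - hop)
    dphi = np.angle(X[:, 1:] * np.conj(X[:, :-1]))     # measured advance
    expected = 2 * np.pi * f[:, None] * hop / fs        # nominal advance
    dev = np.angle(np.exp(1j * (dphi - expected)))      # wrap to (-pi, pi]
    return f[:, None] + dev * fs / (2 * np.pi * hop)    # Hz per bin/frame
```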

Proceedings ArticleDOI
01 Dec 2007
TL;DR: In this article, a rule based system is developed to detect and classify the various types of power quality disturbances, such as swell, sag, interruption, harmonic, interharmonic, transient, notching and normal voltage.
Abstract: This paper presents the detection and classification of power quality disturbances using time-frequency signal analysis. The method is based on a pattern recognition approach consisting of parameter estimation followed by classification. Based on the spectrogram time-frequency analysis, a set of signal parameters is estimated as input to a classifier network. The power quality events analyzed are swell, sag, interruption, harmonic, interharmonic, transient, notching and normal voltage. The estimated parameters are the rms voltage in per unit, waveform distortion, harmonic distortion and interharmonic distortion. A rule-based system is developed to detect and classify the various types of power quality disturbances. The system has been tested with 100 signals for each power quality event at SNRs from 0 dB to 50 dB to verify its performance. The results show that the system classifies power quality signals with 100 percent accuracy at an SNR of 30 dB.
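
A toy version of the rule-based stage, reduced to rms-per-unit thresholds in the spirit of the usual IEEE definitions (sag below 0.9 pu, swell above 1.1 pu, interruption below 0.1 pu); the paper's harmonic and transient rules are omitted.

```python
import numpy as np

def classify_pq(v, fs, f0=50.0):
    """Compute the rms voltage in per unit over sliding one-cycle
    windows, taking the first cycle as the nominal reference (assumes
    the disturbance starts later), then apply threshold rules."""
    n = int(fs / f0)                                   # samples per cycle
    rms = np.sqrt(np.convolve(v**2, np.ones(n) / n, mode='valid'))
    pu = rms / rms[0]
    if pu.min() < 0.1:
        return "interruption"
    if pu.min() < 0.9:
        return "sag"
    if pu.max() > 1.1:
        return "swell"
    return "normal"

fs = 5000
t = np.arange(int(fs * 0.2)) / fs
v = np.sin(2 * np.pi * 50 * t)
v[400:700] *= 0.5                                      # 60 ms voltage dip
print(classify_pq(v, fs))                              # -> "sag"
```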

Journal ArticleDOI
TL;DR: The goals are to completely describe the computation of this second-order mixed partial derivative of the short-time Fourier transform phase in a way that highlights the relations to the two most influential methods of computing a reassigned spectrogram, and to demonstrate the utility of this technique for plotting spectrograms showing line components or impulses while excluding most other points.
Abstract: Two computational methods for pruning a reassigned spectrogram to show only quasisinusoidal components, or only impulses, or both, are presented mathematically and provided with step-by-step algorithms. Both methods compute the second-order mixed partial derivative of the short-time Fourier transform phase, and rely on the conditions that components and impulses are each well-represented by reassigned spectrographic points possessing particular values of this derivative. This use of the mixed second-order derivative was introduced by Nelson [J. Acoust. Soc. Am. 110, 2575–2592 (2001)] but here our goals are to completely describe the computation of this derivative in a way that highlights the relations to the two most influential methods of computing a reassigned spectrogram, and also to demonstrate the utility of this technique for plotting spectrograms showing line components or impulses while excluding most other points. When applied to speech signals, vocal tract resonances (formants) or glottal pulsat...
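
A numeric sketch of the quantity both methods target, approximated by finite differences of the STFT phase rather than the paper's analytic formulations; the pruning tolerance is illustrative, and only the line-component criterion (mixed derivative near zero) is shown.

```python
import numpy as np
from scipy.signal import stft

def mixed_phase_derivative(x, fs, nperseg=512, hop=64):
    """Finite-difference estimate of the mixed second partial
    derivative of STFT phase: the channelized instantaneous frequency
    (phase advance per hop) differenced across frequency bins.
    Near the ridge of a steady sinusoid this is close to zero, so
    thresholding |result| prunes the reassigned spectrogram down to
    quasi-sinusoidal line components."""
    f, t, X = stft(x, fs=fs, nperseg=nperseg, noverlap=nperseg - hop)
    cif = np.angle(X[:, 1:] * np.conj(X[:, :-1])) / (hop / fs)  # rad/s
    dw = 2 * np.pi * fs / nperseg                     # bin spacing, rad/s
    return np.diff(cif, axis=0) / dw                  # dimensionless

# keep reassigned points with |mixed derivative| below an illustrative
# tolerance (e.g. 0.2) as line-component candidates
```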