
Showing papers on "Spectrogram" published in 2008


Journal ArticleDOI
TL;DR: The authors focus on a fast feature-based approach to estimating human motion parameters for real-time applications, producing an animation that is a realistic look-alike of the person's real motion.
Abstract: Radar can be an extremely useful sensing technique to observe persons. It perceives persons behind walls or at great distances and in situations where persons have no or poor visibility. Human motion modulates the radar signal, which can be observed in the spectrogram of the received signal. Extraction of these movements enables the animation of a person in virtual reality. The authors focus on a fast feature-based approach to estimating human motion features for real-time applications. The human walking model of Boulic is used, which describes human motion with three parameters. Personification information is obtained by estimating the individual leg and torso parameters. These motion parameters can be estimated from the temporal maximum, minimum and centre velocity of the human motion distribution. Three methods are presented to extract these velocities. Additionally, we extract an independent human motion repetition frequency estimate based on velocity slices in the spectrogram. Kalman filters smooth the parameters and estimate the global Boulic parameters. These estimated parameters are input to the human model of Boulic, which forms the basis for animation. The methods are applied to real radar measurements. The animated person generated with the extracted parameters provides a realistic look-alike of the real motion of the person.
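The centre, maximum and minimum velocities referred to above can be illustrated with a short sketch. This is a hedged example, not the authors' implementation: the threshold, the STFT settings and the assumption of complex baseband CW-radar samples with a known carrier frequency fc are all illustrative.

```python
import numpy as np
from scipy.signal import spectrogram

C = 3e8  # speed of light, m/s

def doppler_velocity_envelopes(x, fs, fc, thresh_db=-30.0):
    """Estimate max, min and centre velocity per time frame from a CW-radar return.

    x         : complex baseband radar samples (assumed)
    fs        : sampling rate in Hz
    fc        : radar carrier frequency in Hz (assumed known)
    thresh_db : per-frame threshold below the frame maximum, in dB
    """
    f, t, S = spectrogram(x, fs=fs, nperseg=256, noverlap=192,
                          return_onesided=False, mode='magnitude')
    f = np.fft.fftshift(f)          # order Doppler bins from negative to positive
    S = np.fft.fftshift(S, axes=0)
    v = f * C / (2.0 * fc)          # Doppler frequency -> radial velocity

    v_max, v_min, v_ctr = [], [], []
    for k in range(S.shape[1]):
        col = S[:, k]
        mask = col >= col.max() * 10 ** (thresh_db / 20.0)   # bins above threshold
        v_max.append(v[mask].max())
        v_min.append(v[mask].min())
        v_ctr.append(np.sum(v * col) / np.sum(col))          # power-weighted centroid
    return t, np.array(v_max), np.array(v_min), np.array(v_ctr)
```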

178 citations


Journal ArticleDOI
TL;DR: A block thresholding estimation procedure is introduced, which adjusts all parameters adaptively to the signal properties by minimizing a Stein estimate of the risk.
Abstract: Removing noise from audio signals requires a nondiagonal processing of time-frequency coefficients to avoid producing "musical noise." State-of-the-art algorithms perform a parameterized filtering of spectrogram coefficients with empirically fixed parameters. A block thresholding estimation procedure is introduced, which adjusts all parameters adaptively to the signal properties by minimizing a Stein estimate of the risk. Numerical experiments demonstrate the performance and robustness of this procedure through objective and subjective evaluations.
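As a rough illustration of block-wise processing of spectrogram coefficients, the sketch below attenuates fixed-size blocks with an empirical Wiener-like gain. It is not the paper's method: the block size and threshold are fixed here and the noise level is assumed known, whereas the paper chooses these adaptively by minimizing a Stein estimate of the risk.

```python
import numpy as np
from scipy.signal import stft, istft

def block_threshold_denoise(x, fs, sigma, block=(2, 8), lam=2.0):
    """Attenuate STFT coefficients block by block (fixed block size, known noise std).

    Simplified illustration only: the paper instead selects block sizes and
    thresholds adaptively via a Stein risk estimate.
    """
    f, t, X = stft(x, fs=fs, nperseg=512)
    nf, nt = X.shape
    G = np.ones_like(X, dtype=float)
    bf, bt = block
    for i in range(0, nf, bf):
        for j in range(0, nt, bt):
            B = X[i:i+bf, j:j+bt]
            energy = np.mean(np.abs(B) ** 2)
            # empirical Wiener-like block gain: shrink blocks whose energy is
            # close to the assumed noise floor lam * sigma**2
            gain = max(0.0, 1.0 - lam * sigma ** 2 / (energy + 1e-12))
            G[i:i+bf, j:j+bt] = gain
    _, x_hat = istft(G * X, fs=fs, nperseg=512)
    return x_hat
```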

161 citations


Proceedings Article
01 Aug 2008
TL;DR: A simple and fast method to separate a monaural audio signal into harmonic and percussive components, which is useful for multi-pitch analysis, automatic music transcription, drum detection, modification of music, and so on.
Abstract: In this paper, we present a simple and fast method to separate a monaural audio signal into harmonic and percussive components, which is useful for multi-pitch analysis, automatic music transcription, drum detection, modification of music, and so on. Exploiting the differences between the spectrograms of harmonic and percussive components, the objective function is defined as a quadratic form of the spectrogram gradients. Applying the auxiliary function approach to it, simple and fast update equations are derived which guarantee a decrease of the objective function at each iteration. We show experimental results obtained by applying our method to popular and jazz songs.
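The anisotropy the abstract exploits (harmonic energy is smooth along time, percussive energy is smooth along frequency) can be illustrated with the widely used median-filtering variant below. This is a hedged sketch of the same idea, not the auxiliary-function update equations derived in the paper.

```python
import numpy as np
from scipy.signal import stft, istft
from scipy.ndimage import median_filter

def hpss_median(x, fs, kernel=17, power=2.0):
    """Split a mono signal into harmonic and percussive parts via median filtering.

    Note: this is the common median-filter approximation of the idea, not the
    iterative updates derived in the paper.
    """
    f, t, X = stft(x, fs=fs, nperseg=1024)
    S = np.abs(X)
    H = median_filter(S, size=(1, kernel))   # smooth along time -> harmonic
    P = median_filter(S, size=(kernel, 1))   # smooth along freq -> percussive
    # soft (Wiener-like) masks built from the two smoothed spectrograms
    mask_h = H ** power / (H ** power + P ** power + 1e-12)
    mask_p = 1.0 - mask_h
    _, xh = istft(mask_h * X, fs=fs, nperseg=1024)
    _, xp = istft(mask_p * X, fs=fs, nperseg=1024)
    return xh, xp
```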

136 citations


Journal ArticleDOI
TL;DR: Nonnegative matrix factorization is used to derive a novel description of the timbre of musical sounds: a spectrogram is factorized to provide a characteristic spectral basis, and the compression is shown to reduce the noise present in the data set, resulting in more stable classification models.
Abstract: Nonnegative matrix factorization (NMF) is used to derive a novel description of the timbre of musical sounds. Using NMF, a spectrogram is factorized to provide a characteristic spectral basis. Given a set of spectrograms for a musical genre, the space spanned by the vectors of the obtained spectral bases is modeled statistically using mixtures of Gaussians, resulting in a description of the spectral basis for this musical genre. This description is shown to improve classification results by up to 23.3% compared to MFCC-based models, while the compression performed by the factorization decreases training time significantly. Using a distance-based stability measure, this compression is shown to reduce the noise present in the data set, resulting in more stable classification models. In addition, we compare the mean squared errors of the approximation to a spectrogram using independent component analysis and nonnegative matrix factorization, showing the superiority of the latter approach.
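A minimal sketch of the factorization step, assuming scikit-learn and an arbitrary component count; the statistical modeling of the basis vectors with Gaussian mixtures is not shown.

```python
import numpy as np
from scipy.signal import stft
from sklearn.decomposition import NMF

def spectral_basis(x, fs, n_components=16):
    """Factorize a magnitude spectrogram S ~ W H and return the spectral basis W.

    Columns of W are characteristic spectra; rows of H are their activations in time.
    """
    _, _, X = stft(x, fs=fs, nperseg=1024)
    S = np.abs(X)                                   # (freq, time), nonnegative
    model = NMF(n_components=n_components, init='nndsvda', max_iter=400)
    W = model.fit_transform(S)                      # (freq, n_components)
    H = model.components_                           # (n_components, time)
    return W, H
```

In this orientation the columns of W play the role of the characteristic spectral basis described in the abstract; modeling their distribution per genre is the subsequent step in the paper.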

116 citations


Proceedings ArticleDOI
05 Jul 2008
TL;DR: The results show that it is quite feasible to recognize different human activities using micro-Doppler information, and that the trained ANN could produce high error rates when used to classify data measured with another sensor.
Abstract: An ANN has been proposed to classify human activities from their micro-Doppler signatures. Data were collected using a Doppler radar for 12 human subjects performing seven activities to construct the training data set. Six features from the Doppler signatures were captured in the spectrogram. Validation tests based on the features resulted in 82.7% and 87.8% classification accuracy for two different validation scenarios. This result shows that it is quite feasible to recognize different human activities using micro-Doppler information. Several issues still need to be addressed. In this study, we used measurement data for the training process. The features can be affected by the characteristics of the particular radar used, such as I-Q imbalance, polarization and Rx-Tx locations. Therefore, the trained ANN could produce high error rates when used to classify data measured with another sensor. Our study is only applicable when the human approaches the radar head-on; data from other aspect angles should be included in the testing. Also, we used a 3-second time window for feature extraction. If the human activity changes during the window duration, the classification error may increase. A method to extract features within a shorter time duration needs further research.

90 citations


Proceedings ArticleDOI
22 Jul 2008
TL;DR: Evaluation results indicate the high accuracy and effectiveness of the proposed implementation of a patient monitoring system that may be used for patient activity recognition and emergency treatment in case a patient or an elderly person falls.
Abstract: The paper presents an initial implementation of a patient monitoring system that may be used for patient activity recognition and emergency treatment in case a patient or an elderly person falls. Sensors equipped with accelerometers and microphones are attached to the body of the patients and transmit patient movement and sound data wirelessly to the monitoring unit. By applying the short-time Fourier transform (STFT) and spectrogram analysis to the sounds, detection of fall incidents is possible. The classification of the sound and movement data is performed using Support Vector Machines. Evaluation results indicate the high accuracy and effectiveness of the proposed implementation.
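A hedged sketch of the described chain (spectrogram features from sound clips, SVM classification); the feature set, STFT settings and labeling convention here are hypothetical, not those of the paper.

```python
import numpy as np
from scipy.signal import spectrogram
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def clip_features(x, fs):
    """Summarize a short sound clip by simple spectrogram statistics (hypothetical feature set)."""
    f, t, S = spectrogram(x, fs=fs, nperseg=256, noverlap=128)
    S = np.log(S + 1e-10)
    return np.concatenate([S.mean(axis=1), S.std(axis=1)])  # per-band level and spread

def train_fall_classifier(clips, labels, fs):
    """clips: list of 1-D sound arrays; labels: 1 for 'fall', 0 otherwise (assumed labeling)."""
    X = np.vstack([clip_features(c, fs) for c in clips])
    clf = make_pipeline(StandardScaler(), SVC(kernel='rbf', C=10.0))
    clf.fit(X, labels)
    return clf
```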

90 citations


Journal ArticleDOI
TL;DR: In this paper, a general spectrogram/sonogram inversion algorithm called principal components generalized projections (PCGP) was proposed for frequency-resolved optical gating (FROG) measurements.
Abstract: Frequency-resolved optical gating (FROG) is a technique used to measure ultrafast laser pulses by optically producing a spectrogram, or FROG trace, of the measured pulse. While a great deal of information about the pulse can be gleaned from its FROG trace, quantitative pulse information must be obtained using an iterative two-dimensional phase retrieval algorithm. A general spectrogram/sonogram inversion algorithm called principal components generalized projections (PCGP) that can be applied to pulse measurement schemes, such as FROG, is reviewed. The algorithm is fast, robust, and can invert FROG traces in real time, making commercial pulse measurement systems based on FROG a reality. Measurement rates are no longer algorithm limited; they are data-acquisition limited. Also, because of some of its unique properties, the PCGP algorithm has found applications in measuring attosecond pulses and measuring telecommunications pulses. In addition, the PCGP structures the inversion and measurement process in a way that can allow new insights into convergence properties of spectrogram and sonogram inversion algorithms.

89 citations


Proceedings Article
01 Jan 2008
TL;DR: The constraints which a set of complex numbers must satisfy to be a consistent STFT spectrogram are derived and described, and it is shown how inconsistency can be used to develop a spectrogram-based audio encryption scheme.
Abstract: As many acoustic signal processing methods, for example for source separation or noise cancellation, operate in the magnitude spectrogram domain, the problem of reconstructing a perceptually good-sounding signal from a modified magnitude spectrogram, and more generally of understanding what makes a spectrogram consistent, is very important. In this article, we derive the constraints which a set of complex numbers must satisfy to be a consistent STFT spectrogram, i.e. the STFT spectrogram of a real signal, and describe how they lead to an objective function measuring the consistency of a set of complex numbers as a spectrogram. We then present a flexible phase reconstruction algorithm based on a local approximation of the consistency constraints, explain its relation to phase-coherence conditions devised as necessary for good perceptual sound quality, and derive a real-time time-scale modification algorithm based on sliding-block analysis. Finally, we show how inconsistency can be used to develop a spectrogram-based audio encryption scheme.
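The consistency idea can be illustrated compactly: a complex array is a consistent STFT spectrogram exactly when it is unchanged by inverse STFT followed by STFT. The sketch below measures the residual of that round trip; the STFT parameters are arbitrary and this is not the paper's objective function in its exact form.

```python
import numpy as np
from scipy.signal import stft, istft

def inconsistency(X, fs=16000, nperseg=512, noverlap=384):
    """Measure how far a complex array X is from being a consistent STFT spectrogram.

    A consistent spectrogram is a fixed point of ISTFT followed by STFT;
    the relative residual below is zero exactly in that case.
    """
    _, x = istft(X, fs=fs, nperseg=nperseg, noverlap=noverlap)
    _, _, Y = stft(x, fs=fs, nperseg=nperseg, noverlap=noverlap)
    # trim to a common shape in case the round trip changes the frame count slightly
    nf = min(X.shape[0], Y.shape[0])
    nt = min(X.shape[1], Y.shape[1])
    diff = X[:nf, :nt] - Y[:nf, :nt]
    return np.linalg.norm(diff) / (np.linalg.norm(X[:nf, :nt]) + 1e-12)
```

Griffin-Lim-style phase reconstruction can be viewed as iteratively reducing exactly this kind of residual while keeping the target magnitude.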

87 citations


Proceedings Article
01 Jan 2008
TL;DR: A novel algorithm based on pitch estimation and nonnegative matrix factorization (NMF) that predicts the amount of noise in the vocal segments, which allows separating vocals and noise even when they overlap in time and frequency is proposed.
Abstract: This paper proposes a novel algorithm for separating vocals from polyphonic music accompaniment. Based on pitch estimation, the method first creates a binary mask indicating time-frequency segments in the magnitude spectrogram where harmonic content of the vocal signal is present. Second, nonnegative matrix factorization (NMF) is applied to the non-vocal segments of the spectrogram in order to learn a model for the accompaniment. NMF predicts the amount of noise in the vocal segments, which allows separating vocals and noise even when they overlap in time and frequency. Simulations with commercial and synthesized acoustic material show an average improvement of 1.3 dB and 1.8 dB, respectively, in comparison with a reference algorithm based on sinusoidal modeling; the perceptual quality of the separated vocals is also clearly improved. The method was also tested in aligning separated vocals with textual lyrics, where it produced better results than the reference method.

85 citations


Proceedings Article
01 Jan 2008
TL;DR: A real-time equalizer to control the volume balance of harmonic and percussive components in music signals without a priori knowledge of scores or included instruments is presented.
Abstract: In this paper, we present a real-time equalizer to control the volume balance of harmonic and percussive components in music signals without a priori knowledge of scores or included instruments. The harmonic and percussive components of music signals have very different structures in the power spectrogram domain: the former is horizontal, while the latter is vertical. Exploiting this anisotropy, our method separates input music signals into the two components based on a MAP estimation framework. We derive two kinds of algorithms, based on an I-divergence-based mixing model and a hard mixing model. Although they involve iterative update equations, we achieve real-time processing through a sliding analysis technique. The separated harmonic and percussive components are finally remixed at an arbitrary volume balance and played back. We show a prototype system implemented in a Windows environment.

81 citations


Posted Content
TL;DR: A simple audio classification algorithm based on treating sound spectrograms as texture images, inspired by an earlier visual classification scheme particularly efficient at classifying textures, achieves surprisingly good performance in musical instrument classification experiments.
Abstract: Time-frequency representations of audio signals often resemble texture images. This paper derives a simple audio classification algorithm based on treating sound spectrograms as texture images. The algorithm is inspired by an earlier visual classification scheme particularly efficient at classifying textures. While solely based on time-frequency texture features, the algorithm achieves surprisingly good performance in musical instrument classification experiments.

Proceedings ArticleDOI
18 May 2008
TL;DR: Results from the RS-AIC hardware implementation demonstrate successful reconstruction of signals that are sampled at half the Nyquist rate while maintaining up to a 51 dB signal-to-noise ratio (SNR), which is equivalent to an 8.5-bit-resolution analog-to-digital converter.
Abstract: In this paper, we successfully demonstrate the feasibility of a hardware implementation of a sub-Nyquist random-sampling-based analog-to-information converter (RS-AIC). The RS-AIC is based on the theory of information recovery from random samples, using an efficient information recovery algorithm to compute the spectrogram of the signal. Our RS-AIC enables sub-Nyquist acquisition and processing of wideband signals that are sparse in a local Fourier representation. Results from our RS-AIC hardware implementation demonstrate successful reconstruction of signals that are sampled at half the Nyquist rate while maintaining up to a 51 dB signal-to-noise ratio (SNR), which is equivalent to an 8.5-bit-resolution analog-to-digital converter.
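A hedged sketch of the recovery principle (not the paper's algorithm or hardware): a signal that is sparse in a Fourier representation is reconstructed from half of its Nyquist-rate samples, taken at random instants, using orthogonal matching pursuit over a cosine/sine dictionary. The tone frequencies, dictionary and sparsity level are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)

# A signal that is sparse in frequency: three on-grid tones.
N, fs = 1024, 1024.0
n = np.arange(N)
x = (np.cos(2 * np.pi * 50 * n / fs + 0.3) +
     0.7 * np.cos(2 * np.pi * 120 * n / fs + 0.4) +
     0.5 * np.cos(2 * np.pi * 333 * n / fs + 1.1))

# Keep only half of the Nyquist-rate samples, at random time instants.
keep = np.sort(rng.choice(N, size=N // 2, replace=False))
y = x[keep]

# Dictionary: cosines and sines at the DFT frequencies, evaluated at the kept instants.
k = np.arange(1, N // 2)
A = np.hstack([np.cos(2 * np.pi * np.outer(keep, k) / N),
               np.sin(2 * np.pi * np.outer(keep, k) / N)])

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=6, fit_intercept=False).fit(A, y)
coef = omp.coef_

# Reconstruct the full-rate signal from the recovered sparse coefficients.
A_full = np.hstack([np.cos(2 * np.pi * np.outer(n, k) / N),
                    np.sin(2 * np.pi * np.outer(n, k) / N)])
x_hat = A_full @ coef
snr_db = 10 * np.log10(np.sum(x ** 2) / np.sum((x - x_hat) ** 2))
print(f"reconstruction SNR: {snr_db:.1f} dB")
```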

Journal ArticleDOI
TL;DR: This work proposes a novel approach where the noisy magnitude spectrum is recombined with a changed phase spectrum to produce a modified complex spectrum, which results in improved speech quality.
Abstract: Typical speech enhancement algorithms operate on the short-time magnitude spectrum, while keeping the short-time phase spectrum unchanged for synthesis. We propose a novel approach where the noisy magnitude spectrum is recombined with a changed phase spectrum to produce a modified complex spectrum. During synthesis, the low energy components of the modified complex spectrum cancel out more than the high energy components, thus reducing background noise. Using objective speech quality measures, informal subjective listening tests and spectrogram analysis, we show that the proposed method results in improved speech quality.
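A minimal sketch of the recombination step the abstract describes: keep the noisy magnitude, add an offset to the phase, and resynthesize. The particular phase change used in the paper (chosen so that low-energy components cancel during overlap-add) is not reproduced; phase_offset is a hypothetical input.

```python
import numpy as np
from scipy.signal import stft, istft

def recombine_magnitude_phase(x_noisy, fs, phase_offset):
    """Recombine the noisy magnitude spectrum with a modified phase spectrum and resynthesize.

    phase_offset: array broadcastable to the STFT shape, added to the noisy phase.
    Only the magnitude/phase recombination step is illustrated here; the specific
    offset design from the paper is not reproduced.
    """
    f, t, X = stft(x_noisy, fs=fs, nperseg=512)
    mag, phase = np.abs(X), np.angle(X)
    X_mod = mag * np.exp(1j * (phase + phase_offset))   # modified complex spectrum
    _, x_hat = istft(X_mod, fs=fs, nperseg=512)
    return x_hat
```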

Journal ArticleDOI
TL;DR: Results showed that the scalogram with the Morlet wavelet exhibited good performance for the sample linear FM signal and the sample hyperbolic FM signal in comparison to the spectrogram.
Abstract: Instantaneous frequency (IF) estimation through the estimation of peak locations in the time-frequency plane is an important approach for signals contaminated with additive white Gaussian noise. In this paper, the aforementioned analysis is carried out for the continuous wavelet transform. The analysis of the scalogram as an instantaneous frequency estimator is performed for any FM signal regardless of the mother wavelet. Accurate expressions for the bias and the variance of the estimator are derived and reveal that both are signal dependent. The results are statistically confirmed through numerical analysis for several mother wavelets; among the considered wavelets, the Morlet wavelet produces the smallest estimation error. Furthermore, the performance of the IF estimators based on the scalogram and the spectrogram was compared through analysis of the mean square error. These results showed that the scalogram with the Morlet wavelet performed well for the sample linear FM signal and the sample hyperbolic FM signal in comparison to the spectrogram.

Proceedings ArticleDOI
12 May 2008
TL;DR: A novel speech feature analysis technique based on localized spectro-temporal cepstral analysis of speech is presented that is more robust to noise and better captures the temporal modulations important for recognizing plosive sounds.
Abstract: Drawing on recent progress in auditory neuroscience, we present a novel speech feature analysis technique based on localized spectro-temporal cepstral analysis of speech. We proceed by extracting localized 2D patches from the spectrogram and projecting them onto a 2D discrete cosine transform (2D-DCT) basis. For each time frame, a speech feature vector is then formed by concatenating low-order 2D-DCT coefficients from the set of corresponding patches. We argue that our framework has significant advantages over standard one-dimensional MFCC features. In particular, we find that our features are more robust to noise and better capture the temporal modulations important for recognizing plosive sounds. We evaluate the performance of the proposed features on a TIMIT classification task in clean, pink, and babble noise conditions, and show that our feature analysis outperforms traditional features based on MFCCs.
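A hedged sketch of the feature extraction described above: local spectrogram patches are projected onto a 2-D DCT basis and low-order coefficients are concatenated per frame. Patch size, hop and the number of retained coefficients are illustrative, not the paper's values.

```python
import numpy as np
from scipy.fft import dctn
from scipy.signal import stft

def patch_dct_features(x, fs, patch=(16, 8), freq_hop=8, n_keep=3):
    """Per-frame features from low-order 2-D DCT coefficients of local spectrogram patches.

    Patch size, frequency hop and the number of retained coefficients are
    illustrative choices, not the values from the paper.
    """
    _, _, X = stft(x, fs=fs, nperseg=256)
    S = np.log(np.abs(X) + 1e-10)                   # (freq, time) log-magnitude
    pf, pt = patch
    n_frames = S.shape[1] - pt + 1
    feats = np.empty((n_frames, 0))
    for i in range(0, S.shape[0] - pf + 1, freq_hop):   # slide patches over frequency
        band_feats = []
        for t0 in range(n_frames):
            c = dctn(S[i:i+pf, t0:t0+pt], norm='ortho')
            band_feats.append(c[:n_keep, :n_keep].ravel())   # keep low-order coefficients
        feats = np.hstack([feats, np.array(band_feats)])
    return feats                                    # (n_frames, n_bands * n_keep**2)
```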

Journal ArticleDOI
TL;DR: A real-time automatic detection system for regional phase arrivals on the NORSAR array is outlined, and it is demonstrated how stable and accurate slowness and azimuth estimates can be obtained for quite marginal signals.
Abstract: Seismic arrays are employed in the global monitoring of earthquakes and explosions because of their superior ability to detect and estimate the direction of incident seismic arrivals. Traditional beamforming and f-k analysis require waveform semblance over the full array aperture and cannot be applied in many situations where signals are incoherent between sensors. The NORSAR and MJAR arrays are two primary IMS stations where this is the case for high-frequency regional phases. Large intersite distances and significant geological heterogeneity at these arrays result in waveform dissimilarity which precludes coherent array processing in the frequency bands with optimal SNR. Multitaper methods provide low-variance spectral estimates over short time-windows, and seismic arrivals can be detected on single channels using a non-linear spectrogram transformation which attains local maxima at times and frequencies characterized by an energy increase. This detection procedure requires very little a priori knowledge of the spectral content of the signal. The transformed spectrograms can be beamformed over large-aperture arrays or networks according to theoretical time-delays, resulting in an incoherent detection system which does not require waveform semblance at any frequencies. We outline a real-time automatic detection system for regional phase arrivals on the NORSAR array and demonstrate how stable and accurate slowness and azimuth estimates can be obtained for quite marginal signals. In the case of partially coherent arrays, the procedure described may provide stable, if low resolution, estimates which can subsequently be refined using coherent processing over subsets of sensors. In particular, we illustrate how the spectrogram beamforming method facilitates a stable and accurate slowness estimate for the incoherent high-frequency Pn arrival at the MJAR array in Japan from the 2006 October 9 underground nuclear test in North Korea.

Journal ArticleDOI
T.S. Brandes
TL;DR: This paper describes an effective process for automated detection and classification of frequency-modulated sounds from birds, crickets, and frogs that have a narrow short-time frequency bandwidth using a frequency band threshold filter on spectrograms.
Abstract: This paper describes an effective process for automated detection and classification of frequency-modulated sounds from birds, crickets, and frogs that have a narrow short-time frequency bandwidth. An algorithm is provided for extracting these signals from background noise using a frequency band threshold filter on spectrograms. Feature vectors are introduced and demonstrated to accurately model the resultant bioacoustic signals with hidden Markov models. Additionally, sequences of sounds are successfully modeled with composite hidden Markov models, allowing for a wider range of automated species recognition.
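A minimal sketch of a frequency band threshold filter of the kind described, assuming hypothetical band limits and a median-based noise floor; the paper's exact filter and the HMM classification stage are not reproduced.

```python
import numpy as np
from scipy.signal import spectrogram

def band_threshold_filter(x, fs, f_lo, f_hi, thresh_db=12.0):
    """Flag spectrogram cells inside [f_lo, f_hi] that rise above the band's noise floor.

    The noise floor is taken as the median level in the band; cells more than
    thresh_db above it are marked as likely signal.
    """
    f, t, S = spectrogram(x, fs=fs, nperseg=512, noverlap=384)
    S_db = 10 * np.log10(S + 1e-12)
    band = (f >= f_lo) & (f <= f_hi)
    floor = np.median(S_db[band])                         # crude per-band noise estimate
    mask = band[:, None] & (S_db > floor + thresh_db)
    return f, t, mask            # True where a narrowband call is likely present
```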

Journal ArticleDOI
TL;DR: The method is shown to perform well on the challenging problem of denoising the broadband transients commonly encountered in warm shallow waters inhabited by snapping shrimp, and would also be useful with other types of broadband transient noise.
Abstract: Marine mammal vocalizations are often analyzed using time-frequency representations (TFRs) which highlight their nonstationarities. One commonly used TFR is the spectrogram. The characteristic spectrogram time-frequency (TF) contours of marine mammal vocalizations play a significant role in whistle classification and individual or group identification. A major hurdle in the robust automated extraction of TF contours from spectrograms is underwater noise. An image-based algorithm has been developed for denoising and extraction of TF contours from noisy underwater recordings. An objective procedure for measuring the accuracy of extracted spectrogram contours is also proposed. This method is shown to perform well when dealing with the challenging problem of denoising broadband transients commonly encountered in warm shallow waters inhabited by snapping shrimp. It would also be useful with other types of broadband transient noise.

Journal Article
TL;DR: The problem of detection and recognition of contact calls produced by North Atlantic right whales, Eubalaena glacialis, is considered, and a solution is proposed based on a multiple-stage hypothesis-testing technique involving a spectrogram-based detector, spectrogram testing, and feature vector testing algorithms.
Abstract: The problem of detection and recognition of contact calls produced by North Atlantic right whales, Eubalaena glacialis, is considered. The proposed solution is based on a multiple-stage hypothesis-testing technique involving a spectrogram-based detector, spectrogram testing, and feature vector testing algorithms. Results show that the proposed technique is able to detect over 80% of the contact calls detected by a human operator while producing about 26 false alarms per 24 h of observation.

Proceedings ArticleDOI
01 Jan 2008
TL;DR: A novel imputation technique working on entire words that achieves recognition accuracies of 92% at SNR -5 dB using oracle masks on AURORA-2 as compared to 61% using a conventional frame-based approach.
Abstract: Noise robustness of automatic speech recognition benefits from using missing data imputation: prior to recognition, the parts of the spectrogram dominated by noise are replaced by clean speech estimates. Especially at low SNRs, each frame contains at best only a few uncorrupted coefficients. This makes frame-by-frame restoration of corrupted feature vectors error-prone, and recognition accuracy will mostly be sub-optimal. In this paper we present a novel imputation technique working on entire words. A word is sparsely represented in an overcomplete basis of exemplar (clean) speech signals using only the uncorrupted time-frequency elements of the word. The corrupted elements are replaced by estimates obtained by projecting the sparse representation in the basis. We achieve recognition accuracies of 92% at SNR −5 dB using oracle masks on AURORA-2 as compared to 61% using a conventional frame-based approach. The performance obtained with estimated masks can be directly related to the proportion of correctly identified uncorrupted coefficients.
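A minimal sketch of the imputation idea: fit the reliable elements of a word's spectrographic features with a dictionary of clean exemplars and read the missing elements off the reconstruction. Non-negative least squares stands in here for the sparse solver used in the paper, and the exemplar dictionary A is assumed given.

```python
import numpy as np
from scipy.optimize import nnls

def impute_missing(y_noisy, mask, A):
    """Impute unreliable spectrogram elements using a dictionary of clean exemplars.

    y_noisy : observed feature vector (e.g. a stacked word spectrogram), shape (d,)
    mask    : boolean array, True where the element is judged uncorrupted
    A       : dictionary of clean exemplars, shape (d, n_exemplars) (assumed given)
    """
    weights, _ = nnls(A[mask], y_noisy[mask])   # fit using reliable elements only
    y_hat = A @ weights                         # reconstruct the full vector
    return np.where(mask, y_noisy, y_hat)       # keep reliable values, fill the rest
```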

01 Jan 2008
TL;DR: A low-dimensional feature is defined which captures the shape of the modulation spectra, improving the equal error rate from the previous result of EER = 25.1% to EER = 17.4% on the NIST 2001 speaker recognition task.
Abstract: A so-called modulation spectrogram is obtained from the conventional speech spectrogram by short-term spectral analysis along the temporal trajectories of the frequency bins. In its original definition, the modulation spectrogram is a high-dimensional representation, and it is not clear how to extract features from it. In this paper, we define a low-dimensional feature which captures the shape of the modulation spectra. The recognition accuracy of the modulation-spectrogram-based classifier is improved from our previous result of EER = 25.1% to EER = 17.4% on the NIST 2001 speaker recognition task.
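A hedged sketch of the representation itself: a second short-term spectral analysis along each frequency bin's magnitude trajectory. Window lengths and hop sizes are illustrative; the paper's low-dimensional shape feature is not reproduced.

```python
import numpy as np
from scipy.signal import stft

def modulation_spectrogram(x, fs, nperseg=256, mod_win=32, mod_hop=16):
    """Second-stage spectral analysis along each frequency bin's temporal trajectory.

    Returns acoustic frequencies, modulation frequencies, modulation times and an
    array of shape (acoustic_freq, modulation_freq, modulation_time).
    """
    f, t, X = stft(x, fs=fs, nperseg=nperseg)
    env = np.abs(X)                              # magnitude trajectory per frequency bin
    frame_rate = 1.0 / (t[1] - t[0])             # frames per second of the first STFT
    mods = []
    for b in range(env.shape[0]):
        mf, mt, M = stft(env[b] - env[b].mean(), fs=frame_rate,
                         nperseg=mod_win, noverlap=mod_win - mod_hop)
        mods.append(np.abs(M))
    return f, mf, mt, np.array(mods)
```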

Journal ArticleDOI
TL;DR: The RID gave a detailed representation of the TMJ signals' relative energy distribution in the time and frequency domains, with a great reduction in the interference or cross terms, and appears to be most useful in the application of time-frequency distributions in classification of TMJ sounds.
Abstract: For the analysis of time-varying signals such as TMJ sounds, it is often desirable to know how the frequency components change with time, using methods of time-frequency analysis. The aim of this study was to compare two of the most familiar methods for energy density representation with a newly developed technique. The sounds were recorded with a microphone fastened to the subject's forehead, transformed to the time-frequency domain and displayed as 3D and contour plots using the spectrogram, the Wigner distribution (WD), and the reduced interference distribution (RID) to display their time-frequency energy distributions. The spectrogram resolved only the low-frequency components. The WD provided higher resolution but also exhibited strong interference between components. The RID gave a detailed representation of the TMJ signals' relative energy distribution in the time and frequency domains, with a great reduction in the interference or cross terms. The RID therefore appears to be most useful in the application of time-frequency distributions to the classification of TMJ sounds.

Journal ArticleDOI
TL;DR: In this article, the authors developed a method that uses the dynamic Allan variance and the spectrogram to detect and to identify the typical anomalies of an atomic clock, and applied the method to simulated data.
Abstract: When an anomaly occurs in an atomic clock, its stability and frequency spectrum change with time. The variation with time of the stability can be evaluated with the dynamic Allan variance. The variation with time of the frequency spectrum can be described with the spectrogram, a time–frequency distribution. We develop a method that uses the dynamic Allan variance and the spectrogram to detect and to identify the typical anomalies of an atomic clock. We apply the method to simulated data.
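A minimal numpy sketch of the dynamic Allan variance half of the method: the Allan variance evaluated in a sliding window, so that a stability change shows up as a change along the time axis. Window length and averaging factors are illustrative; the spectrogram half is the usual STFT and is omitted.

```python
import numpy as np

def allan_variance(y, m):
    """Non-overlapping Allan variance of fractional-frequency data y at averaging factor m."""
    n = len(y) // m
    yb = y[:n * m].reshape(n, m).mean(axis=1)     # averages over blocks of length m
    return 0.5 * np.mean(np.diff(yb) ** 2)

def dynamic_allan_variance(y, window, step, m_values=(1, 2, 4, 8, 16)):
    """Allan variance computed in a sliding window: rows index time, columns index m.

    A sudden change along a column of the returned array flags a clock anomaly at that epoch.
    """
    rows = []
    for start in range(0, len(y) - window + 1, step):
        seg = y[start:start + window]
        rows.append([allan_variance(seg, m) for m in m_values])
    return np.array(rows)
```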

Journal ArticleDOI
TL;DR: The links detector was validated using an artificial recording environment, including synthetic calls, atmospheric absorption, and echoes, which provided control of the signal-to-noise ratio and an absolute ground truth.
Abstract: The links detector combines a model-based spectral peak tracker with an echo filter to detect the echolocation calls of bats. By processing calls in the spectrogram domain, the links detector separates calls that overlap in time, including call harmonics and echoes. The links detector was validated using an artificial recording environment, including synthetic calls, atmospheric absorption, and echoes, which provided control of the signal-to-noise ratio and an absolute ground truth. The maximum hit rate (at a 2% false positive rate) for the links detector was 87%, compared to 1.5% for a spectral peak detector. The difference in performance was due to the ability of the links detector to filter out echoes. Detection range varied across species from 13 to more than 20 m due to call bandwidth and frequency range. Global features of calls detected by the links detector were compared to those of the synthetic calls. The error in all estimates increased as the range increased, and estimates of minimum frequency and frequency of most energy were more accurate than those of maximum frequency. The links detector combines local and global features to automatically detect calls within the machine learning paradigm and detects overlapping calls and call harmonics in a unified framework.

Journal ArticleDOI
TL;DR: In this paper, it was shown that even during one gait cycle the velocity of the torso, which constitutes the major part of the reflection, is not constant, and that a smaller portion of the signal is reflected from the legs.
Abstract: Human locomotion consists of a complex movement of various parts of the body. The reflections generated by body parts with different relative velocities result in different Doppler shifts, which can be detected as a superposition with a continuous-wave (CW) radar. A time-frequency transform such as the short-time Fourier transform (STFT) of the radar signal allows a representation of the signal in both the time and frequency domains (spectrogram). It can be shown that even during one gait cycle the velocity of the torso, which constitutes the major part of the reflection, is not constant. Further, a smaller portion of the signal is reflected from the legs. The velocity of the legs varies over a wide range, from zero (foot on the ground) to a velocity higher than that of the torso. The two dominant parameters which characterise the human gait are the step rate and the mean velocity. Both parameters can be deduced from suitable portions of the spectrogram. The statistical evaluation of the two parameters has the potential to be used for discrimination, either between different persons or between humans and other moving objects.
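A hedged sketch of how the two parameters might be read from a spectrogram: mean velocity from the power-weighted Doppler centroid, step rate from the periodicity of the energy above the torso line. The carrier frequency fc, the thresholds and the STFT settings are assumptions, not the authors' processing.

```python
import numpy as np
from scipy.signal import spectrogram

C = 3e8  # speed of light, m/s

def gait_parameters(x, fs, fc):
    """Estimate mean torso velocity and step rate from a CW-radar return (illustrative only)."""
    f, t, S = spectrogram(x, fs=fs, nperseg=256, noverlap=224,
                          return_onesided=False, mode='magnitude')
    f = np.fft.fftshift(f)
    S = np.fft.fftshift(S, axes=0)
    v = f * C / (2.0 * fc)                                  # Doppler bin -> radial velocity

    # torso: power-weighted velocity centroid of each frame, averaged over time
    centroid = (v[:, None] * S).sum(axis=0) / S.sum(axis=0)
    mean_velocity = centroid.mean()

    # legs: the energy above the torso velocity fluctuates roughly once per step
    leg_energy = S[v > 1.5 * abs(mean_velocity), :].sum(axis=0)
    leg_energy = leg_energy - leg_energy.mean()
    spec = np.abs(np.fft.rfft(leg_energy))
    frame_rate = 1.0 / (t[1] - t[0])
    freqs = np.fft.rfftfreq(len(leg_energy), d=1.0 / frame_rate)
    step_rate = freqs[1:][np.argmax(spec[1:])]              # skip the DC bin
    return mean_velocity, step_rate
```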

Proceedings ArticleDOI
12 May 2008
TL;DR: An auditory-inspired feed-forward architecture which achieves good performance in noisy conditions on a segmented word recognition task is presented; combining its features with MFCCs or RASTA features yields improved recognition scores in noise.
Abstract: Previously, we presented an auditory-inspired feed-forward architecture which achieves good performance in noisy conditions on a segmented word recognition task. In this paper we propose to use a modified version of this hierarchical model to generate features for standard hidden Markov models. To obtain these features, we first compute spectrograms using a Gammatone filterbank. Filtering over the channels enhances the formant frequencies, which are afterwards detected using Gabor-like receptive fields. The responses of the receptive fields are then combined into complex features which span the whole frequency range and extend over three different time windows. The features have been evaluated on a single-digit recognition task. The results show that their combination with MFCCs or RASTA features yields improved recognition scores in noise.

Patent
16 Apr 2008
TL;DR: An FPGA-based random signal generator is presented, comprising a PC, a USB controller, an MCU3, an MCU interface module, a crystal resonator, an EPC2, a time controller, a dual-channel DA output circuit, a frequency controller, a register matrix unit, a keyboard, a keyboard scanning module, a FLASH, a Flash control module, a TFT display, a TFT control module, a DDS signal generator and a waveform synthesis module.
Abstract: The invention discloses an FPGA-based random signal generator comprising a PC, a USB controller, an MCU3, an MCU interface module, a crystal resonator, an EPC2, a time controller, a dual-channel DA output circuit, a frequency controller, a register matrix unit, a keyboard, a keyboard scanning module, a FLASH, a Flash control module, a TFT display, a TFT control module, a DDS signal generator, a waveform synthesis module and other waveform generators. In use, once a frequency spectrogram and phase spectrogram parameters are input into the software control interface, the software automatically identifies the frequency spectrum information and obtains the amplitude and phase parameters of each frequency point; a time-domain information table is then obtained after the sampling values are quantized and encoded, and the table is downloaded to the RAM of a DDS generating circuit to realize periodic or non-periodic time-domain signal output; furthermore, the waveform amplitude is adjustable online in steps, thereby achieving both frequency-domain and time-domain output.

Journal ArticleDOI
TL;DR: A chi-squared description of the spectrogram distribution appears accurate when the analysis window used to construct the spectrogram decreases to zero at its boundaries, regardless of the level of correlation contained in the signal.
Abstract: Given a correlated Gaussian signal, may a chi-squared law of probability always be used to describe a spectrogram coefficient distribution? If not, would a "chi-squared description" lead to an acceptable amount of error when detection problems are to be faced in the time-frequency domain? These two questions prompted the study reported in this paper. After deriving the probability distribution of spectrogram coefficients in the context of a noncentered, correlated Gaussian signal, the Kullback-Leibler divergence is first used to evaluate to what extent the nonwhiteness of the signal and the Fourier analysis window impact the probability distribution of the spectrogram. To complete the analysis, a detection task formulated as a binary hypothesis test is considered. We evaluate the error in the probability of false alarm when the likelihood ratio test is expressed with chi-squared laws. From these results, a chi-squared description of the spectrogram distribution appears accurate when the analysis window used to construct the spectrogram decreases to zero at its boundaries, regardless of the level of correlation contained in the signal. When other analysis windows are used, the length of the window and the correlation contained in the analyzed signal affect the validity of the chi-squared description.
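The easy case of this conclusion can be checked with a small simulation, a hedged illustration rather than the paper's derivation: for white Gaussian noise and a Hann window (which decreases to zero at its boundaries), spectrogram coefficients at an interior frequency bin should follow a scaled chi-squared law with two degrees of freedom, i.e. an exponential distribution after normalizing to unit mean.

```python
import numpy as np
from scipy.signal import spectrogram
from scipy.stats import kstest

rng = np.random.default_rng(1)
x = rng.standard_normal(200_000)                       # white Gaussian signal

f, t, S = spectrogram(x, fs=1.0, window='hann', nperseg=256, noverlap=0)
coeffs = S[64, :]                                      # one interior frequency bin
coeffs = coeffs / coeffs.mean()                        # scaled chi2(2) -> Exponential(1)

stat, pvalue = kstest(coeffs, 'expon')
print(f"KS p-value against Exponential(1): {pvalue:.3f}")  # large p-value: consistent
```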

Patent
05 May 2008
TL;DR: A beat extractor extracts the beat component of a sound signal based on a spectrogram and generates a beat waveform carrying beat timing and beat intensity information.
Abstract: In a sound output device, a sound input unit acquires a sound signal reproduced by a reproduction device. A beat extractor extracts the beat component of the sound signal based on a spectrogram and generates a beat waveform carrying beat timing and beat intensity information. An output signal generator amplifies the sound signal using the beat waveform as a gain, based on the beat timing and beat intensity carried by the beat waveform. A sound output unit performs D/A conversion on the beat-enhanced sound signal and outputs it as sound.
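A generic sketch of beat extraction from a spectrogram (not the patented method): an onset-strength envelope from positive spectral flux and a beat-period estimate from its autocorrelation; such an envelope could then serve as a time-varying gain.

```python
import numpy as np
from scipy.signal import stft

def beat_envelope(x, fs, nperseg=1024, hop=512):
    """Onset-strength (spectral flux) envelope and a beat-period estimate from a spectrogram.

    A generic illustration of beat extraction from a spectrogram, not the method
    claimed in the patent.
    """
    _, t, X = stft(x, fs=fs, nperseg=nperseg, noverlap=nperseg - hop)
    S = np.abs(X)
    flux = np.maximum(np.diff(S, axis=1), 0.0).sum(axis=0)   # positive spectral flux
    flux = flux - flux.mean()

    # beat period from the autocorrelation peak in a plausible tempo range (40-200 BPM)
    frame_rate = fs / hop
    ac = np.correlate(flux, flux, mode='full')[len(flux) - 1:]
    lo, hi = int(frame_rate * 60 / 200), int(frame_rate * 60 / 40)
    lag = lo + np.argmax(ac[lo:hi])
    bpm = 60.0 * frame_rate / lag
    return flux, bpm
```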

Patent
14 Apr 2008
TL;DR: In this article, an audio signal produced by playing a plurality of musical instruments is separated into sound sources according to respective instrument sounds, and each time a separation process is performed, the updated model parameter estimation/storage section 114 estimates parameters respectively contained in updated model parameters.
Abstract: An audio signal produced by playing a plurality of musical instruments is separated into sound sources according to the respective instrument sounds. Each time a separation process is performed, the updated model parameter estimation/storage section 114 estimates the parameters contained in the updated model parameters such that the updated power spectrograms gradually change from a state close to the initial power spectrograms to a state close to the plurality of power spectrograms most recently stored in the power spectrogram separation/storage section. The respective sections, including the power spectrogram separation/storage section 112 and an updated distribution function computation/storage section 118, repeatedly perform these operations until the updated power spectrograms change from the state close to the initial power spectrograms to the state close to the plurality of power spectrograms most recently stored in the power spectrogram separation/storage section 112. The final updated power spectrograms are close to the power spectrograms of single tones of one musical instrument contained in the input audio signal, which is modeled using harmonic and inharmonic models.