Topic

Spectrogram

About: Spectrogram is a research topic. Over its lifetime, 5,813 publications have been published on this topic, receiving 81,547 citations.


Papers
Journal ArticleDOI
TL;DR: Experimental results show that the proposed Sequence-to-sequence ConvErsion NeTwork (SCENT) obtained better objective and subjective performance than baseline methods using Gaussian mixture models and deep neural networks as acoustic models.
Abstract: In this paper, a neural network named Sequence-to-sequence ConvErsion NeTwork (SCENT) is presented for acoustic modeling in voice conversion. At the training stage, a SCENT model is estimated by implicitly aligning the feature sequences of source and target speakers using an attention mechanism. At the conversion stage, acoustic features and durations of source utterances are converted simultaneously using the unified acoustic model. Mel-scale spectrograms are adopted as acoustic features, which contain both excitation and vocal tract descriptions of the speech signals. The bottleneck features extracted from source speech using an automatic speech recognition (ASR) model are appended as auxiliary input. A WaveNet vocoder conditioned on Mel-spectrograms is built to reconstruct waveforms from the outputs of the SCENT model. It is worth noting that our proposed method can achieve appropriate duration conversion, which is difficult for conventional methods. Experimental results show that our proposed method obtained better objective and subjective performance than the baseline methods using Gaussian mixture models (GMM) and deep neural networks (DNN) as acoustic models. This proposed method also outperformed our previous work, which achieved the top rank in Voice Conversion Challenge 2018. Ablation tests further confirmed the effectiveness of several components in our proposed method.
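
As a concrete illustration of the Mel-scale spectrogram features described above, here is a minimal Python sketch using librosa. The sample rate, FFT size, hop length, and 80 mel bins are common choices for WaveNet-style vocoder conditioning, assumed here for illustration rather than taken from the paper.

```python
# Minimal sketch: extracting log-Mel spectrogram features of the kind
# SCENT conditions on. All parameter values are illustrative assumptions.
import numpy as np
import librosa

def log_mel_spectrogram(wav, sr=16000, n_fft=1024, hop=256, n_mels=80):
    """Return a log-Mel spectrogram of shape (n_mels, frames)."""
    mel = librosa.feature.melspectrogram(
        y=wav, sr=sr, n_fft=n_fft, hop_length=hop, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)

# Usage on one second of placeholder audio:
y = np.random.randn(16000).astype(np.float32)
print(log_mel_spectrogram(y).shape)  # (80, 63)
```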

32 citations

Proceedings ArticleDOI
01 Sep 2018
TL;DR: This paper explicitly integrates phase reconstruction into the separation algorithm through a loss function defined on time-domain signals, and lets the network learn a modified version of the STFT/iSTFT time-frequency representations from data instead of keeping them fixed.
Abstract: Progress in solving the cocktail party problem, i.e., separating the speech from multiple overlapping speakers, has recently accelerated with the invention of techniques such as deep clustering and permutation-free mask inference. These approaches typically focus on estimating target STFT magnitudes and ignore problems of phase inconsistency. In this paper, we explicitly integrate phase reconstruction into our separation algorithm using a loss function defined on time-domain signals. A deep neural network structure is defined by unfolding a phase reconstruction algorithm and treating each iteration as a layer in our network. Furthermore, instead of using fixed STFT/iSTFT time-frequency representations, we allow our network to learn a modified version of these representations from data. We compare several variants of these unfolded phase reconstruction networks, achieving state-of-the-art results on the publicly available wsj0-2mix dataset, and show improved performance when the STFT/iSTFT-like representations are allowed to adapt.
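
For intuition about the unfolding idea, the sketch below unrolls the classical Griffin-Lim phase-reconstruction iteration in NumPy/librosa, with each loop pass playing the role of one network layer. It is a fixed, non-trainable stand-in: in the paper, the STFT/iSTFT-like transforms are learned and the unrolled stack is trained with a time-domain loss.

```python
# Plain Griffin-Lim, unrolled for a fixed number of "layers".
import numpy as np
import librosa

def unfolded_phase_reconstruction(mag, n_layers=5, n_fft=512, hop=128):
    """Reconstruct a waveform from an STFT magnitude `mag` of shape
    (1 + n_fft // 2, frames)."""
    spec = mag.astype(complex)  # start from zero phase
    for _ in range(n_layers):
        # One "layer": project to the time domain and back ...
        wav = librosa.istft(spec, hop_length=hop)
        reproj = librosa.stft(wav, n_fft=n_fft, hop_length=hop)
        # ... then keep the re-estimated phase and snap magnitudes
        # back to the targets.
        spec = mag * np.exp(1j * np.angle(reproj))
    return librosa.istft(spec, hop_length=hop)

# Usage: rebuild a signal from its own magnitude spectrogram.
x = np.random.randn(8192).astype(np.float32)
M = np.abs(librosa.stft(x, n_fft=512, hop_length=128))
x_hat = unfolded_phase_reconstruction(M)
```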

32 citations

Journal Article
TL;DR: This paper proposes methods that apply 3-D microphone arrays, directional analysis of measured room responses, and visualization of data, yielding useful information about the time-frequency-direction properties of the responses.
Abstract: Room impulse responses are inherently multidimensional, including components in three coordinate directions, each one further being described as a time-frequency representation. Such 5-dimensional data is difficult to visualize and interpret. We propose methods that apply 3-D microphone arrays, directional analysis of measured room responses, and visualization of data, yielding useful information about the time-frequency-direction properties of the responses. The applicability of the methods is demonstrated with three different cases of real measurements.

INTRODUCTION A room impulse response, measured from a source to a receiver position, is inherently multidimensional. Traditionally, the evolution of an omnidirectional sound pressure response in a single point has been studied as a function of time and frequency. However, dividing the response further into directional components can reveal much more information about the actual propagation of sound in the room, as well as about its perceptual aspects. In this paper we propose methods that are based on 3-D microphone arrays, directional analysis of the measured responses, and visualization of such data in a way that yields maximal information about the time-frequency-direction properties of the response. The measurement of directional room responses is made with a special 3-D microphone probe which basically consists of two intensity probes in each of the x-, y-, and z-coordinate directions and is constructed of small electret capsules. The responses are analyzed either with a uniform or an auditorily motivated time-frequency resolution. The analysis results in a significant amount of 5-dimensional data that is hard to visualize and interpret. Based on measured x/y/z-intensity components, intensity vectors (magnitude and direction) can be plotted in a spectrogram-like map, one vector for each time-frequency bin, illustrating the directional evolution of the field in time and frequency. Additionally, a pressure-related time-frequency spectrogram can be overlaid with the vectors, in gray levels or colors, illustrating for example a perceptually motivated spectrogram with no directional information. One such map can be used to illustrate the horizontal information and another can be added for the elevation information. This technique is part of a Matlab visualization toolbox for directional room responses developed by the authors, and it includes several other possibilities for analyzing and representing room acoustical data. Traditional parameters and presentations are also available, some of them in 3-D versions, such as energy-time plots in desired directions. The paper starts with a discussion on measurements of directional room responses and sound intensity. This is followed by descriptions of the visualization method and the auditorily motivated time-frequency analysis. Finally, the applicability of the methods is demonstrated with three different cases of real measurements.

DIRECTIONAL SOUND PRESSURE COMPONENTS Existing literature on room acoustics discusses mainly omnidirectional measurements, with the exception of some special directional parameters. Directional room responses can be measured with either directional microphones or arrays of microphones. However, an array of omnidirectional microphones has some distinct advantages compared to directional microphones.
Omnidirectional capsules can be made smaller and they usually behave more like ideal transducers. Further, if the omnidirectional signals are stored at measurement time, it is possible afterwards to create varying directivity patterns based on a single measurement. Typical directivity patterns can be formed with an array of two or more closely spaced omnidirectional microphones and some equalization to compensate for the resulting non-flat magnitude response. For example, the difference of two microphone signals gives a dipole pattern, and adding an appropriate delay to one of the signals changes the pattern to a cardioid. Okubo et al. [1] have also proposed a method that uses a product of cardioid and dipole signals to achieve a directivity pattern more suitable for some directional room acoustics measurements. Various directional sound pressure responses can be used to plot traditional impulse responses, energy-time curves, or spectrograms that give information about the directional properties of the room responses. With larger microphone arrays it is also possible to form directivity patterns with very narrow beams and thus good spatial resolution. However, groups of similar plots for several different directions are not very visual or easy to interpret. Sound intensity as a vector quantity can solve some of the visualization problems in the method we are proposing in this paper.

SOUND INTENSITY Sound intensity [2] describes the propagation of energy in a sound field. The instantaneous intensity vector is defined as the product of the instantaneous sound pressure p(t) and particle velocity u(t):

I(t) = p(t) u(t)    (1)

Based on the linearized fluid momentum equation, the particle velocity in the direction n can be written in the form

u_n(t) = -(1/ρ0) ∫ ∂p(t)/∂n dt    (2)

where ρ0 is the density of air.
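
A hedged sketch of the spectrogram-like intensity map described above: following Eq. (1), the active intensity per time-frequency bin in one coordinate direction can be estimated as Re{P(t,f) U*(t,f)} from the STFTs of pressure and particle velocity. The STFT settings and this standard estimator are assumptions for illustration, not details from the paper.

```python
# Active intensity per time-frequency bin for one coordinate direction.
import numpy as np
from scipy.signal import stft

def intensity_map(p, u, fs=48000, nperseg=1024):
    """Re{P(t,f) * conj(U(t,f))} for pressure p and one
    particle-velocity component u; returns (freqs, frames)."""
    _, _, P = stft(p, fs=fs, nperseg=nperseg)
    _, _, U = stft(u, fs=fs, nperseg=nperseg)
    return np.real(P * np.conj(U))

# With x-, y-, and z-velocity components, three such maps give the
# magnitude and direction of one intensity vector per bin, ready to
# overlay on a pressure spectrogram.
ix = intensity_map(np.random.randn(48000), np.random.randn(48000))
print(ix.shape)  # (freqs, frames)
```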

32 citations

Journal ArticleDOI
TL;DR: A chi-squared description of the spectrogram distribution appears accurate when the analysis window used to construct the spectrogram decreases to zero at its boundaries, regardless of the level of correlation contained in the signal.
Abstract: Given a correlated Gaussian signal, may a chi-squared law of probability always be used to describe the distribution of a spectrogram coefficient? If not, would a "chi-squared description" lead to an acceptable amount of error when detection problems are to be faced in the time-frequency domain? These two questions prompted the study reported in this paper. After deriving the probability distribution of spectrogram coefficients for a noncentered, correlated Gaussian signal, the Kullback-Leibler divergence is first used to evaluate to what extent the nonwhiteness of the signal and the Fourier analysis window impact the probability distribution of the spectrogram. To complete the analysis, a detection task formulated as a binary hypothesis test is considered. We evaluate the error in the probability of false alarm when the likelihood ratio test is expressed with chi-squared laws. From these results, a chi-squared description of the spectrogram distribution appears accurate when the analysis window used to construct the spectrogram decreases to zero at its boundaries, regardless of the level of correlation contained in the signal. When other analysis windows are used, the length of the window and the correlation contained in the analyzed signal impact the validity of the chi-squared description.
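
As a quick numerical illustration of the favorable case identified above, the sketch below analyzes white Gaussian noise with a Hann window, which decays to zero at its boundaries; an interior squared-magnitude spectrogram coefficient should then be close to a scaled chi-squared law with 2 degrees of freedom, i.e. an exponential. The signal length, window, and goodness-of-fit test are illustrative choices, not the paper's protocol.

```python
# Check |STFT|^2 of white Gaussian noise against a scaled chi2(2)
# (equivalently, exponential) law. Overlapping frames are weakly
# correlated, so the KS p-value is only indicative.
import numpy as np
from scipy.signal import stft
from scipy.stats import kstest

rng = np.random.default_rng(0)
x = rng.standard_normal(2**16)
_, _, Z = stft(x, nperseg=256)          # Hann window by default
coeffs = np.abs(Z[64, :])**2            # one interior frequency bin
scale = coeffs.mean()                   # exponential scale = sample mean
print(kstest(coeffs, "expon", args=(0, scale)))  # large p-value expected
```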

32 citations

Proceedings ArticleDOI
20 Mar 2016
TL;DR: A novel source separation method aiming to overcome the difficulty of modelling non-stationary signals, based on a signal representation that divides the complex spectrogram into a grid of patches of arbitrary size, which reveals spectral and temporal modulation textures.
Abstract: In this paper we present a novel source separation method aiming to overcome the difficulty of modelling non-stationary signals. The method can be applied to mixtures of musical instruments with frequency and/or amplitude modulation, e.g. as typically caused by vibrato. It is based on a signal representation that divides the complex spectrogram into a grid of patches of arbitrary size. These complex patches are then processed by a two-dimensional discrete Fourier transform, forming a tensor representation which reveals spectral and temporal modulation textures. Our representation can be seen as an alternative to modulation transforms computed on magnitude spectrograms. An adapted factorization model makes it possible to decompose different time-varying harmonic sources based on their particular common modulation profile: hence the name Common Fate Model. The method is evaluated on musical instrument mixtures playing the same fundamental frequency (unison), showing improvement over other state-of-the-art methods.
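
A minimal sketch of the representation stage described above, before any factorization: cut the complex spectrogram into a grid of patches and apply a 2-D DFT to each patch, yielding a 4-D modulation tensor. The STFT settings and the 16x8 patch size are arbitrary illustrative choices, in the spirit of the paper's "patches of arbitrary size".

```python
# Complex spectrogram -> grid of patches -> per-patch 2-D DFT.
import numpy as np
from scipy.signal import stft

def common_fate_tensor(x, nperseg=1024, patch=(16, 8)):
    _, _, Z = stft(x, nperseg=nperseg)   # complex spectrogram
    pf, pt = patch
    F = Z.shape[0] - Z.shape[0] % pf     # trim to a whole number of
    T = Z.shape[1] - Z.shape[1] % pt     # patches along each axis
    grid = Z[:F, :T].reshape(F // pf, pf, T // pt, pt)
    # A 2-D DFT over each patch's (freq, time) axes reveals its
    # spectro-temporal modulation texture.
    return np.fft.fft2(grid, axes=(1, 3))

tensor = common_fate_tensor(np.random.randn(2**15))
print(tensor.shape)  # (freq patches, pf, time patches, pt)
```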

31 citations


Network Information
Related Topics (5)

Topic                           Papers    Citations   Relatedness
Deep learning                   79.8K     2.1M        79%
Convolutional neural network    74.7K     2M          78%
Feature extraction              111.8K    2.1M        77%
Wavelet                         78K       1.3M        76%
Support vector machine          73.6K     1.7M        75%
Performance Metrics

No. of papers in the topic in previous years:

Year    Papers
2024    1
2023    627
2022    1,396
2021    488
2020    595
2019    593