Topic

Spectrogram

About: Spectrogram is a research topic. Over its lifetime, 5,813 publications have been published within this topic, receiving 81,547 citations.


Papers
Posted Content
TL;DR: The proposed Parallel WaveGAN has only 1.44 M parameters, generates 24 kHz speech waveforms 28.68 times faster than real time on a single GPU, and achieves a mean opinion score comparable to the best distillation-based Parallel WaveNet system.
Abstract: We propose Parallel WaveGAN, a distillation-free, fast, and small-footprint waveform generation method using a generative adversarial network. In the proposed method, a non-autoregressive WaveNet is trained by jointly optimizing multi-resolution spectrogram and adversarial loss functions, which can effectively capture the time-frequency distribution of realistic speech waveforms. As our method does not require the density distillation used in the conventional teacher-student framework, the entire model can be easily trained. Furthermore, our model is able to generate high-fidelity speech even with its compact architecture. In particular, the proposed Parallel WaveGAN has only 1.44 M parameters and can generate 24 kHz speech waveforms 28.68 times faster than real time in a single-GPU environment. Perceptual listening test results verify that our proposed method achieves a 4.16 mean opinion score within a Transformer-based text-to-speech framework, which is comparable to the best distillation-based Parallel WaveNet system.
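As a rough illustration of the multi-resolution spectrogram loss described above, the sketch below combines a spectral-convergence term and a log-magnitude term over several STFT resolutions in PyTorch; the FFT sizes, hop lengths, and equal weighting are illustrative assumptions, not the paper's exact configuration.

```python
# Hedged sketch of a multi-resolution STFT (spectrogram) loss; the resolution
# choices below are assumptions for illustration, not the paper's settings.
import torch

def stft_magnitude(x, fft_size, hop_length, win_length):
    """Magnitude STFT of a batch of waveforms, shape (batch, frames, bins)."""
    window = torch.hann_window(win_length, device=x.device)
    spec = torch.stft(x, fft_size, hop_length, win_length,
                      window=window, return_complex=True)
    return spec.abs().transpose(1, 2)

def multi_resolution_stft_loss(pred, target,
                               resolutions=((1024, 256, 1024),
                                            (2048, 512, 2048),
                                            (512, 128, 512))):
    """Average spectral-convergence + log-magnitude L1 loss over several resolutions."""
    loss = 0.0
    for fft_size, hop, win in resolutions:
        p = stft_magnitude(pred, fft_size, hop, win)
        t = stft_magnitude(target, fft_size, hop, win)
        sc = torch.norm(t - p) / torch.norm(t)                 # spectral convergence
        mag = torch.nn.functional.l1_loss(torch.log(p + 1e-7),
                                          torch.log(t + 1e-7)) # log STFT magnitude
        loss = loss + sc + mag
    return loss / len(resolutions)
```

In training, a term like this would be added to the adversarial loss; the combination is what lets the non-autoregressive generator match the time-frequency structure of real speech.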

256 citations

Journal ArticleDOI
TL;DR: The utility of using time-frequency representations (TFRs) to quantitatively resolve changes in the frequency content of nonstationary Lamb wave signals, as a function of time, is illustrated.
Abstract: The objective of this study is to establish the effectiveness of four different time-frequency representations (TFRs)—the reassigned spectrogram, the reassigned scalogram, the smoothed Wigner–Ville distribution, and the Hilbert spectrum—by comparing their ability to resolve the dispersion relationships for Lamb waves generated and detected with optical techniques. This paper illustrates the utility of using TFRs to quantitatively resolve changes in the frequency content of these nonstationary signals as a function of time. While each technique has certain strengths and weaknesses, the reassigned spectrogram appears to be the best choice to characterize multimode Lamb waves.
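To make the idea of tracking time-varying frequency content concrete, here is a small Python sketch that computes an ordinary (non-reassigned) spectrogram of a synthetic chirp with SciPy and reads off its frequency ridge over time; the sampling rate and sweep range are assumptions chosen only for demonstration.

```python
# Spectrogram of a nonstationary (chirp) signal; all parameters are
# illustrative assumptions, not taken from the paper's experiments.
import numpy as np
from scipy.signal import chirp, spectrogram

fs = 1_000_000                                   # 1 MHz sampling rate (assumed)
t = np.arange(0, 1e-3, 1 / fs)                   # 1 ms record
x = chirp(t, f0=50e3, t1=1e-3, f1=400e3)         # sweep from 50 kHz to 400 kHz

f, tt, Sxx = spectrogram(x, fs=fs, window="hann",
                         nperseg=256, noverlap=192)

# The ridge of the spectrogram approximates the instantaneous frequency,
# i.e. the frequency-versus-time trajectory of the nonstationary signal.
ridge_freq = f[np.argmax(Sxx, axis=0)]
print(ridge_freq[:5], "Hz")
```

A reassigned spectrogram sharpens this picture by relocating each time-frequency value toward the local energy centroid, which is why it can separate closely spaced modes better than the plain spectrogram.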

253 citations

Journal ArticleDOI
TL;DR: A regularization scheme is introduced that forces the representations to focus on the phonetic content of the utterance and report performance comparable with the top entries in the ZeroSpeech 2017 unsupervised acoustic unit discovery task.
Abstract: We consider the task of unsupervised extraction of meaningful latent representations of speech by applying autoencoding neural networks to speech waveforms. The goal is to learn a representation able to capture high level semantic content from the signal, e.g. phoneme identities, while being invariant to confounding low level details in the signal such as the underlying pitch contour or background noise. Since the learned representation is tuned to contain only phonetic content, we resort to using a high capacity WaveNet decoder to infer information discarded by the encoder from previous samples. Moreover, the behavior of autoencoder models depends on the kind of constraint that is applied to the latent representation. We compare three variants: a simple dimensionality reduction bottleneck, a Gaussian Variational Autoencoder (VAE), and a discrete Vector Quantized VAE (VQ-VAE). We analyze the quality of learned representations in terms of speaker independence, the ability to predict phonetic content, and the ability to accurately reconstruct individual spectrogram frames. Moreover, for discrete encodings extracted using the VQ-VAE, we measure the ease of mapping them to phonemes. We introduce a regularization scheme that forces the representations to focus on the phonetic content of the utterance and report performance comparable with the top entries in the ZeroSpeech 2017 unsupervised acoustic unit discovery task.
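For orientation, the sketch below shows a minimal PyTorch vector-quantization bottleneck of the VQ-VAE kind compared in the abstract, with nearest-neighbour codebook lookup, codebook/commitment losses, and a straight-through gradient; the codebook size, code dimension, and loss weighting are illustrative assumptions, not the paper's settings.

```python
# Minimal VQ bottleneck sketch (VQ-VAE style); hyperparameters are assumptions.
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    def __init__(self, num_codes=512, code_dim=64, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, code_dim)
        self.codebook.weight.data.uniform_(-1.0 / num_codes, 1.0 / num_codes)
        self.beta = beta

    def forward(self, z_e):                      # z_e: (batch, frames, code_dim)
        flat = z_e.reshape(-1, z_e.size(-1))
        # Squared distance from every frame to every codebook vector.
        d = (flat.pow(2).sum(1, keepdim=True)
             - 2 * flat @ self.codebook.weight.t()
             + self.codebook.weight.pow(2).sum(1))
        idx = d.argmin(dim=1)                    # nearest-neighbour code index
        z_q = self.codebook(idx).view_as(z_e)
        # Codebook + commitment losses; gradients pass straight through.
        vq_loss = ((z_q - z_e.detach()).pow(2).mean()
                   + self.beta * (z_e - z_q.detach()).pow(2).mean())
        z_q = z_e + (z_q - z_e).detach()
        return z_q, vq_loss, idx.view(z_e.shape[:-1])
```

The discrete indices returned here are what the abstract refers to when it measures how easily the learned codes map to phonemes.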

252 citations

Journal ArticleDOI
TL;DR: A new architecture is introduced that extracts mel-frequency cepstral coefficients, chromagram, mel-scale spectrogram, Tonnetz representation, and spectral contrast features from sound files and uses them as inputs to a one-dimensional convolutional neural network for emotion identification, evaluated on samples from the Ryerson Audio-Visual Database of Emotional Speech and Song, Berlin, and EMO-DB datasets.
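A hedged sketch of how the five named feature types might be extracted and stacked with librosa for such a one-dimensional CNN is shown below; the parameter choices (number of MFCCs, averaging over time into one vector per utterance, and so on) are simplifying assumptions and not necessarily what the paper does.

```python
# Illustrative feature extraction for speech emotion recognition; parameters
# and the time-averaging step are assumptions, not the paper's exact pipeline.
import numpy as np
import librosa

def emotion_features(path, sr=22050):
    y, sr = librosa.load(path, sr=sr)
    mfcc     = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)
    chroma   = librosa.feature.chroma_stft(y=y, sr=sr)
    mel      = librosa.feature.melspectrogram(y=y, sr=sr)
    contrast = librosa.feature.spectral_contrast(y=y, sr=sr)
    tonnetz  = librosa.feature.tonnetz(y=librosa.effects.harmonic(y), sr=sr)
    # Average each feature over time so one utterance becomes a single vector
    # that a 1-D convolutional network (or any classifier) can take as input.
    return np.concatenate([feat.mean(axis=1)
                           for feat in (mfcc, chroma, mel, contrast, tonnetz)])
```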

251 citations

Journal ArticleDOI
TL;DR: Spectrogram correlation can be used not only for detection and maximum likelihood parameter estimation, e.g., estimation of the delay or center frequency of a signal, but also for classification.
Abstract: A locally optimum detector correlates the data spectrogram with a reference spectrogram in order to detect (i) a known signal with unknown delay and Doppler parameters, (ii) a random signal with known covariance function, or (iii) the output of a random, time-varying channel with known scattering function. Spectrogram correlation can also be used for maximum likelihood parameter estimation, e.g., estimation of delay or center frequency of a signal. To estimate an analog input signal from its spectrogram, a modified deconvolution operation can be used together with a predictive noise canceler. If no noise is added to the spectrogram, the mean-square error of this signal estimate is independent of the window function that is used to construct the spectrogram. When estimates of specific signal parameters are obtained directly from the spectrogram, these estimates have mean-square errors that depend upon both signal and window waveforms. Spectrogram correlation can be used for classification as well as for estimation and detection. Parameter estimators and detectors are, in fact, specialized kinds of classifiers.
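As a toy numerical illustration of the core operation, the Python sketch below correlates the spectrogram of noisy data against the spectrogram of a known reference signal over candidate delays and picks the peak; the signal, noise level, and the simple unnormalized correlation are assumptions for demonstration only.

```python
# Toy spectrogram-correlation delay estimator; all parameters are assumed
# for illustration and do not come from the paper.
import numpy as np
from scipy.signal import chirp, spectrogram

fs = 8000
t = np.arange(0, 0.2, 1 / fs)
template = chirp(t, f0=500, t1=0.2, f1=1500)          # known signal
data = np.concatenate([np.zeros(2000), template, np.zeros(2000)])
data = data + 0.5 * np.random.randn(data.size)        # additive noise

_, _, S_ref  = spectrogram(template, fs=fs, nperseg=256, noverlap=128)
_, tt, S_dat = spectrogram(data,     fs=fs, nperseg=256, noverlap=128)

# Slide the reference spectrogram along the time axis of the data spectrogram,
# correlate, and take the peak as the delay estimate.
n_ref = S_ref.shape[1]
scores = np.array([np.sum(S_dat[:, k:k + n_ref] * S_ref)
                   for k in range(S_dat.shape[1] - n_ref + 1)])
print("estimated delay:", tt[scores.argmax()], "s")
```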

248 citations


Network Information
Related Topics (5)
Deep learning: 79.8K papers, 2.1M citations (79% related)
Convolutional neural network: 74.7K papers, 2M citations (78% related)
Feature extraction: 111.8K papers, 2.1M citations (77% related)
Wavelet: 78K papers, 1.3M citations (76% related)
Support vector machine: 73.6K papers, 1.7M citations (75% related)
Performance
Metrics
Number of papers in the topic in previous years:
Year: Papers
2024: 1
2023: 627
2022: 1,396
2021: 488
2020: 595
2019: 593