Home
/
Authors
/
Sangita Sharma

Author

Sangita Sharma

Other affiliations: International Computer Science Institute

Bio: Sangita Sharma is an academic researcher from Oregon Health & Science University. The author has contributed to research in topics: Feature (machine learning) & Word error rate. The author has an hindex of 7, co-authored 7 publications receiving 1366 citations. Previous affiliations of Sangita Sharma include International Computer Science Institute.

Papers

PDF

Open Access

More filters

Proceedings Article•DOI•

Tandem connectionist feature extraction for conventional HMM systems

[...]

Hynek Hermansky¹, Daniel P. W. Ellis², Sangita Sharma•Institutions (2)

Oregon Health & Science University¹, International Computer Science Institute²

05 Jun 2000

TL;DR: A large improvement in word recognition performance is shown by combining neural-net discriminative feature processing with Gaussian-mixture distribution modeling.

...read moreread less

Abstract: Hidden Markov model speech recognition systems typically use Gaussian mixture models to estimate the distributions of decorrelated acoustic feature vectors that correspond to individual subword units. By contrast, hybrid connectionist-HMM systems use discriminatively-trained neural networks to estimate the probability distribution among subword units given the acoustic observations. In this work we show a large improvement in word recognition performance by combining neural-net discriminative feature processing with Gaussian-mixture distribution modeling. By training the network to generate the subword probability posteriors, then using transformations of these estimates as the base features for a conventionally-trained Gaussian-mixture based system, we achieve relative error rate reductions of 35% or more on the multicondition Aurora noisy continuous digits task.

...read moreread less

803 citations

Proceedings Article•DOI•

Temporal patterns (TRAPs) in ASR of noisy speech

[...]

Hynek Hermansky¹, Sangita Sharma²•Institutions (2)

Oregon Health & Science University¹, International Computer Science Institute²

15 Mar 1999

TL;DR: The proposed neural TRAPs are found to yield significant amount of complementary information to that of the conventional spectral feature based ASR system, which results in improved robustness to several types of additive and convolutive environmental degradations.

...read moreread less

Abstract: We study a new approach to processing temporal information for automatic speech recognition (ASR). Specifically, we study the use of rather long-time temporal patterns (TRAPs) of spectral energies in place of the conventional spectral patterns for ASR. The proposed neural TRAPs are found to yield significant amount of complementary information to that of the conventional spectral feature based ASR system. A combination of these two ASR systems is shown to result in improved robustness to several types of additive and convolutive environmental degradations.

...read moreread less

206 citations

Proceedings Article•

TRAPS - classifiers of temporal patterns.

[...]

Hynek Hermansky¹, Sangita Sharma¹•Institutions (1)

Oregon Health & Science University¹

01 Jan 1998

TL;DR: The work proposes a radically different set of features for ASR where TempoRAl Patterns of spectral energies are used in place of the conventional spectral patterns.

...read moreread less

Abstract: The work proposes a radically di erent set of features for ASR where TempoRAl Patterns of spectral energies are used in place of the conventional spectral patterns The approach has several inherent advantages, among them robustness to stationary or slowly varying disturbances

...read moreread less

171 citations

Journal Article•DOI•

Relevance of time-frequency features for phonetic and speaker-channel classification

[...]

Howard Hua Yang¹, Sarel van Vuuren¹, Sangita Sharma¹, Hynek Hermansky¹•Institutions (1)

Oregon Health & Science University¹

01 May 2000-Speech Communication

TL;DR: A large database of hand-labeled fluent speech is used to compute the mutual information between a phonetic classification variable and one spectral feature variable in the time–frequency plane, and the joint mutual information (JMI) between the phonetic Classification variable and two feature variables in thetime-frequency plane.

...read moreread less

93 citations

Proceedings Article•DOI•

Feature extraction using non-linear transformation for robust speech recognition on the Aurora database

[...]

Sangita Sharma¹, Daniel P. W. Ellis, Sachin S. Kajarekar, P. Jain, Hynek Hermansky - Show less +1 more•Institutions (1)

Oregon Health & Science University¹

05 Jun 2000

TL;DR: It is shown that after a non-linear transformation, a number of features can be effectively used in a HMM-based recognition system.

...read moreread less

Abstract: We evaluate the performance of several feature sets on the Aurora task as defined by ETSI. We show that after a non-linear transformation, a number of features can be effectively used in a HMM-based recognition system. The non-linear transformation is computed using a neural network which is discriminatively trained on the phonetically labeled (forcibly aligned) training data. A combination of the non-linearly transformed PLP (perceptive linear predictive coefficients), MSG (modulation filtered spectrogram) and TRAP (temporal pattern) features yields a 63% improvement in error rate as compared to baseline me frequency cepstral coefficients features. The use of the non-linearly transformed RASTA-like features, with system parameters scaled down to take into account the ETSI imposed memory and latency constraints, still yields a 40% improvement in error rate.

...read moreread less

87 citations

Cited by

PDF

Open Access

More filters

Journal Article•DOI•

Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups

[...]

Geoffrey E. Hinton¹, Li Deng², Dong Yu², George E. Dahl¹, Abdelrahman Mohamed¹, Navdeep Jaitly¹, Andrew W. Senior³, Vincent Vanhoucke³, Patrick Nguyen³, Tara N. Sainath⁴, Brian Kingsbury⁴ - Show less +7 more•Institutions (4)

University of Toronto¹, Microsoft², Google³, IBM⁴

18 Oct 2012-IEEE Signal Processing Magazine

TL;DR: This article provides an overview of progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.

...read moreread less

Abstract: Most current speech recognition systems use hidden Markov models (HMMs) to deal with the temporal variability of speech and Gaussian mixture models (GMMs) to determine how well each state of each HMM fits a frame or a short window of frames of coefficients that represents the acoustic input. An alternative way to evaluate the fit is to use a feed-forward neural network that takes several frames of coefficients as input and produces posterior probabilities over HMM states as output. Deep neural networks (DNNs) that have many hidden layers and are trained using new methods have been shown to outperform GMMs on a variety of speech recognition benchmarks, sometimes by a large margin. This article provides an overview of this progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.

...read moreread less

9,091 citations

Journal Article•DOI•

Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition

[...]

George E. Dahl¹, Dong Yu², Li Deng², Alex Acero²•Institutions (2)

University of Toronto¹, Microsoft²

01 Jan 2012-IEEE Transactions on Audio, Speech, and Language Processing

TL;DR: A pre-trained deep neural network hidden Markov model (DNN-HMM) hybrid architecture that trains the DNN to produce a distribution over senones (tied triphone states) as its output that can significantly outperform the conventional context-dependent Gaussian mixture model (GMM)-HMMs.

...read moreread less

Abstract: We propose a novel context-dependent (CD) model for large-vocabulary speech recognition (LVSR) that leverages recent advances in using deep belief networks for phone recognition. We describe a pre-trained deep neural network hidden Markov model (DNN-HMM) hybrid architecture that trains the DNN to produce a distribution over senones (tied triphone states) as its output. The deep belief network pre-training algorithm is a robust and often helpful way to initialize deep neural networks generatively that can aid in optimization and reduce generalization error. We illustrate the key components of our model, describe the procedure for applying CD-DNN-HMMs to LVSR, and analyze the effects of various modeling choices on performance. Experiments on a challenging business search dataset demonstrate that CD-DNN-HMMs can significantly outperform the conventional context-dependent Gaussian mixture model (GMM)-HMMs, with an absolute sentence accuracy improvement of 5.8% and 9.2% (or relative error reduction of 16.0% and 23.2%) over the CD-GMM-HMMs trained using the minimum phone error rate (MPE) and maximum-likelihood (ML) criteria, respectively.

...read moreread less

3,120 citations

Book•

Deep Learning: Methods and Applications

[...]

Li Deng¹, Dong Yu¹•Institutions (1)

Microsoft¹

12 Jun 2014

TL;DR: This monograph provides an overview of general deep learning methodology and its applications to a variety of signal and information processing tasks, including natural language and text processing, information retrieval, and multimodal information processing empowered by multi-task deep learning.

...read moreread less

Abstract: This monograph provides an overview of general deep learning methodology and its applications to a variety of signal and information processing tasks. The application areas are chosen with the following three criteria in mind: (1) expertise or knowledge of the authors; (2) the application areas that have already been transformed by the successful use of deep learning technology, such as speech recognition and computer vision; and (3) the application areas that have the potential to be impacted significantly by deep learning and that have been experiencing research growth, including natural language and text processing, information retrieval, and multimodal information processing empowered by multi-task deep learning.

...read moreread less

2,817 citations

Journal Article•

Deep Neural Networks for Acoustic Modeling in Speech Recognition

[...]

Geoffrey E. Hinton, Li Deng, Dong Yu, George E. Dahl, Abdelrahman Mohamed, Navdeep Jaitly, Andrew W. Senior, Vincent Vanhoucke, Patrick Nguyen, Tara N. Sainath, Brian Kingsbury - Show less +7 more

01 Nov 2012-IEEE Signal Processing Magazine

TL;DR: This paper provides an overview of this progress and repres nts the shared views of four research groups who have had recent successes in using deep neural networks for a coustic modeling in speech recognition.

...read moreread less

2,527 citations

Journal Article•DOI•

Convolutional neural networks for speech recognition

[...]

Ossama Abdel-Hamid¹, Abdelrahman Mohamed², Hui Jiang¹, Li Deng³, Gerald Penn², Dong Yu³ - Show less +2 more•Institutions (3)

York University¹, University of Toronto², Microsoft³

01 Oct 2014-IEEE Transactions on Audio, Speech, and Language Processing

TL;DR: It is shown that further error rate reduction can be obtained by using convolutional neural networks (CNNs), and a limited-weight-sharing scheme is proposed that can better model speech features.

...read moreread less

Abstract: Recently, the hybrid deep neural network (DNN)- hidden Markov model (HMM) has been shown to significantly improve speech recognition performance over the conventional Gaussian mixture model (GMM)-HMM. The performance improvement is partially attributed to the ability of the DNN to model complex correlations in speech features. In this paper, we show that further error rate reduction can be obtained by using convolutional neural networks (CNNs). We first present a concise description of the basic CNN and explain how it can be used for speech recognition. We further propose a limited-weight-sharing scheme that can better model speech features. The special structure such as local connectivity, weight sharing, and pooling in CNNs exhibits some degree of invariance to small shifts of speech features along the frequency axis, which is important to deal with speaker and environment variations. Experimental results show that CNNs reduce the error rate by 6%-10% compared with DNNs on the TIMIT phone recognition and the voice search large vocabulary speech recognition tasks.

...read moreread less

1,948 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse