Home
/
Authors
/
Sridha Sridharan

Author

Sridha Sridharan

Other affiliations: SRI International, Carnegie Mellon University, Commonwealth Scientific and Industrial Research Organisation ...read more

Bio: Sridha Sridharan is an academic researcher from Queensland University of Technology. The author has contributed to research in topics: Speaker recognition & Feature extraction. The author has an hindex of 54, co-authored 736 publications receiving 12998 citations. Previous affiliations of Sridha Sridharan include SRI International & Carnegie Mellon University.

Papers published on a yearly basis

2023
2022
2021
2020
2019
2018
2017
2016
2015
2014
2013
2012
2011
2010
2009
2008
2007
2006
2005
2004
2003
2002
2001
2000
1999
1998
1997
1996
1995
1994
1993
1992
1991
1990

Papers

PDF

Open Access

More filters

Feature Warping for Robust Speaker Verification

[...]

Jason W. Pelecanos, Sridha Sridharan

01 Jan 2001

TL;DR: In this paper, the authors proposed a target mapping method that warps the distribution of a cepstral feature stream to a standardised distribution over a specified time interval, which is robust to channel mismatch, additive noise and to some extent, non-linear effects attributed to transducers.

...read moreread less

Abstract: We propose a novel feature mapping approach that is robust to channel mismatch, additive noise and to some extent, non-linear effects attributed to handset transducers. These adverse effects can distort the short-term distribution of the speech features. Some methods have addressed this issue by conditioning the variance of the distribution, but not to the extent of conforming the speech statistics to a target distribution. The proposed target mapping method warps the distribution of a cepstral feature stream to a standardised distribution over a specified time interval. We evaluate a number of the enhancement methods for speaker verification, and compare them against a Gaussian target mapping implementation. Results indicate improvements of the warping technique over a number of methods such as Cepstral Mean Subtraction (CMS), modulation spectrum processing, and short-term windowed CMS and variance normalisation.

...read moreread less

565 citations

Proceedings Article•DOI•

Crowd Counting Using Multiple Local Features

[...]

David Ryan¹, Simon Denman¹, Clinton Fookes¹, Sridha Sridharan¹•Institutions (1)

Queensland University of Technology¹

01 Dec 2009

TL;DR: An approach that uses local features to count the number of people in each foreground blob segment, so that the total crowd estimate is the sum of the group sizes is proposed.

...read moreread less

Abstract: In public venues, crowd size is a key indicator of crowd safety and stability. Crowding levels can be detected using holistic image features, however this requires a large amount of training data to capture the wide variations in crowd distribution. If a crowd counting algorithm is to be deployed across a large number of cameras, such a large and burdensome training requirement is far from ideal. In this paper we propose an approach that uses local features to count the number of people in each foreground blob segment, so that the total crowd estimate is the sum of the group sizes. This results in an approach that is scalable to crowd volumes not seen in the training data, and can be trained on a very small data set. As a local approach is used, the proposed algorithm can easily be used to estimate crowd density throughout different regions of the scene and be used in a multi-camera environment. A unique localised approach to ground truth annotation reduces the required training data is also presented, as a localised approach to crowd counting has different training requirements to a holistic one. Testing on a large pedestrian database compares the proposed technique to existing holistic techniques and demonstrates improved accuracy, and superior performance when test conditions are unseen in the training set, or a minimal training set is used.

...read moreread less

328 citations

Journal Article•DOI•

Iris Recognition With Off-the-Shelf CNN Features: A Deep Learning Perspective

[...]

Kien Nguyen¹, Clinton Fookes¹, Arun Ross², Sridha Sridharan¹•Institutions (2)

Queensland University of Technology¹, Michigan State University²

01 Jan 2018-IEEE Access

TL;DR: It is shown that the off-the-shelf CNN features, while originally trained for classifying generic objects, are also extremely good at representing iris images, effectively extracting discriminative visual features and achieving promising recognition results on two iris datasets: ND-CrossSensor-2013 and CASIA-Iris-Thousand.

...read moreread less

Abstract: Iris recognition refers to the automated process of recognizing individuals based on their iris patterns. The seemingly stochastic nature of the iris stroma makes it a distinctive cue for biometric recognition. The textural nuances of an individual’s iris pattern can be effectively extracted and encoded by projecting them onto Gabor wavelets and transforming the ensuing phasor response into a binary code - a technique pioneered by Daugman. This textural descriptor has been observed to be a robust feature descriptor with very low false match rates and low computational complexity. However, recent advancements in deep learning and computer vision indicate that generic descriptors extracted using convolutional neural networks (CNNs) are able to represent complex image characteristics. Given the superior performance of CNNs on the ImageNet large scale visual recognition challenge and a large number of other computer vision tasks, in this paper, we explore the performance of state-of-the-art pre-trained CNNs on iris recognition. We show that the off-the-shelf CNN features, while originally trained for classifying generic objects, are also extremely good at representing iris images, effectively extracting discriminative visual features and achieving promising recognition results on two iris datasets: ND-CrossSensor-2013 and CASIA-Iris-Thousand. We also discuss the challenges and future research directions in leveraging deep learning methods for the problem of iris recognition.

...read moreread less

291 citations

Journal Article•DOI•

Soft + Hardwired attention: An LSTM framework for human trajectory prediction and abnormal event detection.

[...]

Tharindu Fernando¹, Simon Denman¹, Sridha Sridharan¹, Clinton Fookes¹•Institutions (1)

Queensland University of Technology¹

20 Sep 2018-Neural Networks

TL;DR: This work proposes a novel method to predict the future motion of a pedestrian given a short history of their, and their neighbours, past behaviour using a combined attention model which utilises both "soft attention" as well as "hard-wired" attention in order to map the trajectory information from the local neighbourhood to the future positions of the pedestrian of interest.

...read moreread less

242 citations

Proceedings Article•DOI•

i-vector based speaker recognition on short utterances

[...]

Ahilan Kanagasundaram¹, Robbie Vogt¹, David Dean¹, Sridha Sridharan¹, Michael Mason¹ - Show less +1 more•Institutions (1)

Queensland University of Technology¹

27 Aug 2011

TL;DR: In this paper, a comparison of Joint Factor Analysis (JFA) and i-vector based systems including various compensation techniques; Within-Class Covariance Normalization (WCCN), LDA, Scatter Difference Nuisance Attribute Projection (SDNAP) and Gaussian Probabilistic Linear Discriminant Analysis (GPLDA) is presented.

...read moreread less

Abstract: Robust speaker verification on short utterances remains a key consideration when deploying automatic speaker recognition, as many real world applications often have access to only limited duration speech data. This paper explores how the recent technologies focused around total variability modeling behave when training and testing utterance lengths are reduced. Results are presented which provide a comparison of Joint Factor Analysis (JFA) and i-vector based systems including various compensation techniques; Within-Class Covariance Normalization (WCCN), LDA, Scatter Difference Nuisance Attribute Projection (SDNAP) and Gaussian Probabilistic Linear Discriminant Analysis (GPLDA). Speaker verification performance for utterances with as little as 2 sec of data taken from the NIST Speaker Recognition Evaluations are presented to provide a clearer picture of the current performance characteristics of these techniques in short utterance conditions.

...read moreread less

239 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152

Collapse

Cited by

PDF

Open Access

More filters

Journal Article•DOI•

I and i

[...]

Kevin Barraclough

08 Dec 2001-BMJ

TL;DR: There is, I think, something ethereal about i —the square root of minus one, which seems an odd beast at that time—an intruder hovering on the edge of reality.

...read moreread less

Abstract: There is, I think, something ethereal about i —the square root of minus one. I remember first hearing about it at school. It seemed an odd beast at that time—an intruder hovering on the edge of reality. Usually familiarity dulls this sense of the bizarre, but in the case of i it was the reverse: over the years the sense of its surreal nature intensified. It seemed that it was impossible to write mathematics that described the real world in …

...read moreread less

33,785 citations

Pattern Recognition and Machine Learning

[...]

Christopher M. Bishop¹•Institutions (1)

Microsoft¹

01 Jan 2006

TL;DR: Probability distributions of linear models for regression and classification are given in this article, along with a discussion of combining models and combining models in the context of machine learning and classification.

...read moreread less

Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

...read moreread less

10,141 citations

Journal Article•

“Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”の学習報告

[...]

杉山拓海

12 Sep 2017-Computers & Graphics

3,940 citations

Journal Article•DOI•

Front-End Factor Analysis for Speaker Verification

[...]

Najim Dehak¹, Patrick Kenny, Réda Dehak², Pierre Dumouchel, Pierre Ouellet - Show less +1 more•Institutions (2)

Massachusetts Institute of Technology¹, École Pour l'Informatique et les Techniques Avancées²

01 May 2011-IEEE Transactions on Audio, Speech, and Language Processing

TL;DR: An extension of the previous work which proposes a new speaker representation for speaker verification, a new low-dimensional speaker- and channel-dependent space is defined using a simple factor analysis, named the total variability space because it models both speaker and channel variabilities.

...read moreread less

Abstract: This paper presents an extension of our previous work which proposes a new speaker representation for speaker verification. In this modeling, a new low-dimensional speaker- and channel-dependent space is defined using a simple factor analysis. This space is named the total variability space because it models both speaker and channel variabilities. Two speaker verification systems are proposed which use this new representation. The first system is a support vector machine-based system that uses the cosine kernel to estimate the similarity between the input data. The second system directly uses the cosine similarity as the final decision score. We tested three channel compensation techniques in the total variability space, which are within-class covariance normalization (WCCN), linear discriminate analysis (LDA), and nuisance attribute projection (NAP). We found that the best results are obtained when LDA is followed by WCCN. We achieved an equal error rate (EER) of 1.12% and MinDCF of 0.0094 using the cosine distance scoring on the male English trials of the core condition of the NIST 2008 Speaker Recognition Evaluation dataset. We also obtained 4% absolute EER improvement for both-gender trials on the 10 s-10 s condition compared to the classical joint factor analysis scoring.

...read moreread less

3,526 citations

Proceedings Article•DOI•

The Extended Cohn-Kanade Dataset (CK+): A complete dataset for action unit and emotion-specified expression

[...]

Patrick Lucey¹, Jeffrey F. Cohn¹, Takeo Kanade¹, Jason Saragih¹, Zara Ambadar², Iain Matthews³ - Show less +2 more•Institutions (3)

Carnegie Mellon University¹, University of Pittsburgh², Disney Research³

13 Jun 2010

TL;DR: The Cohn-Kanade (CK+) database is presented, with baseline results using Active Appearance Models (AAMs) and a linear support vector machine (SVM) classifier using a leave-one-out subject cross-validation for both AU and emotion detection for the posed data.

...read moreread less

Abstract: In 2000, the Cohn-Kanade (CK) database was released for the purpose of promoting research into automatically detecting individual facial expressions. Since then, the CK database has become one of the most widely used test-beds for algorithm development and evaluation. During this period, three limitations have become apparent: 1) While AU codes are well validated, emotion labels are not, as they refer to what was requested rather than what was actually performed, 2) The lack of a common performance metric against which to evaluate new algorithms, and 3) Standard protocols for common databases have not emerged. As a consequence, the CK database has been used for both AU and emotion detection (even though labels for the latter have not been validated), comparison with benchmark algorithms is missing, and use of random subsets of the original database makes meta-analyses difficult. To address these and other concerns, we present the Extended Cohn-Kanade (CK+) database. The number of sequences is increased by 22% and the number of subjects by 27%. The target expression for each sequence is fully FACS coded and emotion labels have been revised and validated. In addition to this, non-posed sequences for several types of smiles and their associated metadata have been added. We present baseline results using Active Appearance Models (AAMs) and a linear support vector machine (SVM) classifier using a leave-one-out subject cross-validation for both AU and emotion detection for the posed data. The emotion and AU labels, along with the extended image data and tracked landmarks will be made available July 2010.

...read moreread less

3,439 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse