Author

Xiaoyu Zhang

Bio: Xiaoyu Zhang is an academic researcher from Rutgers University. The author has contributed to research in topics: Speaker recognition & Speech processing. The author has an h-index of 6, co-authored 10 publications receiving 615 citations.

Papers
Journal ArticleDOI
TL;DR: Linear predictive (LP) analysis, the first step of feature extraction, is discussed, and various robust cepstral features derived from LP coefficients are described, including the affine transform, which is a feature transformation approach that integrates mismatch to simultaneously combat both channel and noise distortion.
Abstract: The future commercialization of speaker- and speech-recognition technology is impeded by the large degradation in system performance due to environmental differences between training and testing conditions. This is known as the "mismatched condition." Studies have shown [1] that most contemporary systems achieve good recognition performance if the conditions during training are similar to those during operation (matched conditions). Frequently, mismatched conditions are present in which the performance is dramatically degraded as compared to the ideal matched conditions. A common example of this mismatch is when training is done on clean speech and testing is performed on noise- or channel-corrupted speech. Robust speech techniques [2] attempt to maintain the performance of a speech processing system under such diverse conditions of operation. This article presents an overview of current speaker-recognition systems and the problems encountered in operation, and it focuses on the front-end feature extraction process of robust speech techniques as a method of improvement. Linear predictive (LP) analysis, the first step of feature extraction, is discussed, and various robust cepstral features derived from LP coefficients are described. Also described is the affine transform, which is a feature transformation approach that integrates mismatch to simultaneously combat both channel and noise distortion.
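The LP-to-cepstrum chain the abstract outlines can be sketched in a few lines. The following is a minimal illustration, assuming a single pre-emphasized, windowed frame; the function names, model order, and the affine parameters A and b are placeholders, not the authors' implementation.

```python
import numpy as np

def lp_coefficients(frame, order=12):
    """LP analysis via the autocorrelation method (Levinson-Durbin).
    Returns a with A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                      # reflection coefficient
        a[1:i + 1] = a[1:i + 1] + k * a[i - 1::-1]
        err *= 1.0 - k * k                  # residual prediction error
    return a

def lp_cepstrum(a, n_ceps=12):
    """Standard recursion from LP coefficients to LP cepstral coefficients."""
    p = len(a) - 1
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = a[n] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k]
        c[n] = -acc
    return c[1:]

def affine_compensate(c, A, b):
    """Affine feature transform c' = A c + b; estimating A and b from
    mismatched (noise- or channel-corrupted) data is not shown here."""
    return A @ c + b
```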

344 citations

Patent
07 Jun 1995
TL;DR: In this article, a pattern recognition system which uses data fusion to combine data from a plurality of extracted features and a plurality of classifiers is presented; discriminant-based and distortion-based classifiers are combined to accurately verify speaker patterns.
Abstract: The present invention relates to a pattern recognition system which uses data fusion to combine data from a plurality of extracted features and a plurality of classifiers. Speaker patterns can be accurately verified with the combination of discriminant-based and distortion-based classifiers. A novel approach using a training set of "leave one out" data can be used for training the system with a reduced data set. Extracted features can be improved with a pole-filtered method for reducing channel effects and an affine transformation for improving the correlation between training and testing data.
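A minimal sketch of the score-level fusion idea, assuming one discriminant-based score (higher is better) and one distortion-based score (lower is better); the fusion weight and normalization statistics are placeholders that a real system would estimate on held-out data, for example with the "leave one out" resampling the abstract mentions.

```python
def fuse(discriminant_score, distortion, w=0.5,
         d_stats=(0.0, 1.0), q_stats=(0.0, 1.0)):
    """Combine a discriminant-based classifier score (higher = more
    speaker-like) with a distortion-based score (lower = more
    speaker-like) after z-normalization. d_stats and q_stats are
    (mean, std) pairs estimated on development data."""
    s_disc = (discriminant_score - d_stats[0]) / d_stats[1]
    s_dist = -(distortion - q_stats[0]) / q_stats[1]  # flip: higher = better
    return w * s_disc + (1.0 - w) * s_dist

# Accept the identity claim only if the fused evidence clears a
# threshold tuned on development data.
accept = fuse(discriminant_score=2.1, distortion=0.4) > 0.0
```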

76 citations

Patent
08 Jan 2002
TL;DR: In this article, a subword-based, text-dependent automatic speaker verification system that embodies the capability of user-selectable passwords with no constraints on the choice of vocabulary words or the language is presented.
Abstract: The voice print system of the present invention is a subword-based, text-dependent automatic speaker verification system that embodies the capability of user-selectable passwords with no constraints on the choice of vocabulary words or the language. Automatic blind speech segmentation allows speech to be segmented into subword units without any linguistic knowledge of the password. Subword modeling is performed using multiple classifiers. The system also takes advantage of such concepts as multiple classifier fusion and data resampling to boost performance. Key word/key phrase spotting is used to optimally locate the password phrase. Numerous adaptation techniques increase the flexibility of the base system, and include: channel adaptation, fusion adaptation, model adaptation and threshold adaptation.
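One simple stand-in for the blind segmentation step, assuming a frames-by-dimensions feature matrix: place subword boundaries at the largest frame-to-frame spectral changes, with no linguistic knowledge of the password. This is an illustrative heuristic, not the patented algorithm.

```python
import numpy as np

def blind_segment(features, n_units=4, min_len=3):
    """Split a (frames x dims) feature matrix into subword units at the
    biggest spectral-change points. A real system would refine these
    boundaries, e.g. by iterative resegmentation against subword models."""
    # Euclidean distance between consecutive frames as a change function.
    change = np.linalg.norm(np.diff(features, axis=0), axis=1)
    order = np.argsort(change)[::-1]          # largest changes first
    boundaries = []
    for idx in order:
        b = int(idx) + 1
        # Enforce a minimum segment length so units are not degenerate.
        if all(abs(b - x) >= min_len for x in boundaries + [0, len(features)]):
            boundaries.append(b)
        if len(boundaries) == n_units - 1:
            break
    return np.split(features, sorted(boundaries))
```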

60 citations

Patent
21 Nov 1997
TL;DR: In this article, a text-dependent automatic speaker verification voiceprint system embodies a capability of user-selectable passwords with no constraints on the choice of vocabulary words or the language.
Abstract: The subword-based, text-dependent automatic speaker verification voiceprint system embodies a capability of user-selectable passwords with no constraints on the choice of vocabulary words or the language. Automatic blind speech segmentation allows speech to be segmented into subword units (210) without any linguistic knowledge of the password. Subword modeling is performed using multiple classifiers (240, 250). The system also takes advantage of such concepts as multiple classifier fusion (260) and data resampling to successfully boost the performance. Key word/key phrase spotting (200) is used to optimally locate the password phrase. Numerous adaptation techniques increase the flexibility of the base system, and include: channel adaptation (180), fusion adaptation (290), model adaptation (220, 230) and threshold adaptation (295).

57 citations

PatentDOI
TL;DR: The voice print system of the present invention is a subword-based, text-dependent automatic speaker verification system that embodies the capability of user-selectable passwords with no constraints on the choice of vocabulary words or the language.
Abstract: The voice print system of the present invention is a subword-based, text-dependent automatic speaker verification system that embodies the capability of user-selectable passwords with no constraints on the choice of vocabulary words or the language. Automatic blind speech segmentation allows speech to be segmented into subword units without any linguistic knowledge of the password. Subword modeling is performed using multiple classifiers. The system also takes advantage of such concepts as multiple classifier fusion and data resampling to boost performance. Key word/key phrase spotting is used to optimally locate the password phrase. Numerous adaptation techniques increase the flexibility of the base system, and include: channel adaptation, fusion adaptation, model adaptation and threshold adaptation.

41 citations


Cited by
Journal ArticleDOI
01 Sep 1997
TL;DR: A tutorial on the design and development of automatic speaker-recognition systems is presented, and a new automatic speaker-recognition system is given that performs with 98.9% correct identification.
Abstract: A tutorial on the design and development of automatic speaker-recognition systems is presented. Automatic speaker recognition is the use of a machine to recognize a person from a spoken phrase. These systems can operate in two modes: to identify a particular person or to verify a person's claimed identity. Speech processing and the basic components of automatic speaker-recognition systems are shown, and design tradeoffs are discussed. Then, a new automatic speaker-recognition system is given. This recognizer performs with 98.9% correct identification. Last, the performances of various systems are compared.
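The two operating modes the tutorial distinguishes reduce to an argmax versus a threshold test; a toy sketch, assuming per-speaker model scores are already computed (names and the threshold are placeholders):

```python
import numpy as np

def identify(scores):
    """Identification: pick the enrolled speaker whose model scores highest."""
    return int(np.argmax(scores))

def verify(claimed_score, threshold):
    """Verification: accept or reject a claimed identity by thresholding
    the score of the claimed speaker's model."""
    return claimed_score >= threshold
```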

1,686 citations

Journal ArticleDOI
01 Oct 1980

1,565 citations

Journal ArticleDOI
TL;DR: This paper starts with the fundamentals of automatic speaker recognition, concerning feature extraction and speaker modeling, and elaborates advanced computational techniques to address robustness and session variability.

1,433 citations

Journal ArticleDOI
TL;DR: In this study, the optic disc, blood vessels, and fovea were accurately detected, and the identification of the normal components of the retinal image will aid the future detection of diseases in these regions.
Abstract: Aim: To recognise automatically the main components of the fundus on digital colour images. Methods: The main features of a fundus retinal image were defined as the optic disc, fovea, and blood vessels. Methods are described for their automatic recognition and location. 112 retinal images were preprocessed via adaptive, local contrast enhancement. The optic discs were located by identifying the area with the highest variation in intensity of adjacent pixels. Blood vessels were identified by means of a multilayer perceptron neural net, for which the inputs were derived from a principal component analysis (PCA) of the image and edge detection of the first component of PCA. The foveas were identified using matching correlation together with characteristics typical of a fovea, for example, the darkest area in the neighbourhood of the optic disc. The main components of the image were identified by an experienced ophthalmologist for comparison with the computerised methods. Results: The sensitivity and specificity of the recognition of each retinal main component were as follows: 99.1% and 99.1% for the optic disc; 83.3% and 91.0% for blood vessels; 80.4% and 99.1% for the fovea. Conclusions: In this study the optic disc, blood vessels, and fovea were accurately detected. The identification of the normal components of the retinal image will aid the future detection of diseases in these regions. In diabetic retinopathy, for example, an image could be analysed for retinopathy with reference to sight-threatening complications such as disc neovascularisation, vascular changes, or foveal exudation. (Br J Ophthalmol 1999;83:902-910)
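A minimal sketch of the disc-localization criterion, assuming a 2-D grayscale image array; the window size and stride are guesses, and the paper's preprocessing (adaptive local contrast enhancement) is omitted.

```python
import numpy as np

def locate_optic_disc(gray, win=40):
    """Return the centre of the window with the highest intensity variance,
    echoing the paper's 'highest variation in intensity of adjacent
    pixels' criterion for optic-disc candidates."""
    best_var, best_pos = -1.0, (0, 0)
    rows, cols = gray.shape
    step = win // 2                          # half-window overlap
    for r in range(0, rows - win + 1, step):
        for c in range(0, cols - win + 1, step):
            v = gray[r:r + win, c:c + win].var()
            if v > best_var:
                best_var, best_pos = v, (r + step, c + step)
    return best_pos                          # (row, col) of candidate disc
```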

846 citations

Journal ArticleDOI
TL;DR: This review of speaker recognition by machines and humans concludes with a comparative study of human versus machine performance, with an emphasis on prominent speaker-modeling techniques that have emerged in the last decade for automatic systems.
Abstract: Identifying a person by his or her voice is an important human trait most take for granted in natural human-to-human interaction/communication. Speaking to someone over the telephone usually begins by identifying who is speaking and, at least in cases of familiar speakers, a subjective verification by the listener that the identity is correct and the conversation can proceed. Automatic speaker-recognition systems have emerged as an important means of verifying identity in many e-commerce applications as well as in general business interactions, forensics, and law enforcement. Human experts trained in forensic speaker recognition can perform this task even better by examining a set of acoustic, prosodic, and linguistic characteristics of speech in a general approach referred to as structured listening. Techniques in forensic speaker recognition have been developed for many years by forensic speech scientists and linguists to help reduce any potential bias or preconceived understanding as to the validity of an unknown audio sample and a reference template from a potential suspect. Experienced researchers in signal processing and machine learning continue to develop automatic algorithms to effectively perform speaker recognition, with ever-improving performance, to the point where automatic systems start to perform on par with human listeners. In this article, we review the literature on speaker recognition by machines and humans, with an emphasis on prominent speaker-modeling techniques that have emerged in the last decade for automatic systems. We discuss different aspects of automatic systems, including voice-activity detection (VAD), features, speaker models, standard evaluation data sets, and performance metrics. Human speaker recognition is discussed in two parts: the first part involves forensic speaker-recognition methods, and the second illustrates how a naïve listener performs this task from a neuroscience perspective. We conclude this review with a comparative study of human versus machine speaker recognition and attempt to point out the strengths and weaknesses of each.
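The performance metrics the review mentions are commonly summarized by the equal error rate (EER); a small sketch of how it can be estimated from genuine and impostor trial scores (the score arrays are placeholders):

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """Sweep a decision threshold over the pooled scores and return the
    operating point where false-accept and false-reject rates are closest."""
    genuine, impostor = np.asarray(genuine), np.asarray(impostor)
    best_gap, eer = np.inf, 0.0
    for t in np.sort(np.concatenate([genuine, impostor])):
        far = np.mean(impostor >= t)   # impostors wrongly accepted
        frr = np.mean(genuine < t)     # genuine trials wrongly rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2.0
    return eer
```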

554 citations