Author

Guoning Hu

Bio: Guoning Hu is an academic researcher from Ohio State University. The author has contributed to research in topics: Computational auditory scene analysis & Auditory scene analysis. The author has an h-index of 12 and has co-authored 16 publications receiving 1037 citations.

Papers
Journal ArticleDOI
TL;DR: This work proposes a novel system for voiced speech segregation that segregates resolved and unresolved harmonics differently, and it yields substantially better performance, especially for the high-frequency part of speech.
Abstract: Segregating speech from one monaural recording has proven to be very challenging. Monaural segregation of voiced speech has been studied in previous systems that incorporate auditory scene analysis principles. A major problem for these systems is their inability to deal with the high-frequency part of speech. Psychoacoustic evidence suggests that different perceptual mechanisms are involved in handling resolved and unresolved harmonics. We propose a novel system for voiced speech segregation that segregates resolved and unresolved harmonics differently. For resolved harmonics, the system generates segments based on temporal continuity and cross-channel correlation, and groups them according to their periodicities. For unresolved harmonics, it generates segments based on common amplitude modulation (AM) in addition to temporal continuity and groups them according to AM rates. Underlying the segregation process is a pitch contour that is first estimated from speech segregated according to dominant pitch and then adjusted according to psychoacoustic constraints. Our system is systematically evaluated and compared with previous systems, and it yields substantially better performance, especially for the high-frequency part of speech.
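
As a rough illustration of the two grouping cues named above, the sketch below (not the authors' implementation) computes a cross-channel correlation between the responses of adjacent filter channels and labels a time-frequency unit by how well its autocorrelation peak matches a given pitch period. A gammatone-style filterbank front-end and a pitch estimate are assumed to be available; names and lag ranges are illustrative.

```python
import numpy as np

def cross_channel_correlation(r1, r2):
    """Normalized correlation between the responses of two adjacent filter
    channels; a high value suggests the units belong to the same segment."""
    a, b = r1 - r1.mean(), r2 - r2.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum()) + 1e-12
    return float((a * b).sum() / denom)

def label_by_periodicity(response, pitch_period, lag_range=(80, 320), tol=0.05):
    """Label a time-frequency unit as target speech if the autocorrelation of
    its response peaks at a lag close to the estimated pitch period (samples)."""
    r = response - response.mean()
    lags = np.arange(*lag_range)
    ac = np.array([np.dot(r[:-l], r[l:]) for l in lags])
    peak_lag = lags[np.argmax(ac)]
    return abs(peak_lag - pitch_period) / pitch_period < tol

# Toy check: a 100 Hz periodic response at 16 kHz matches a 160-sample period.
fs = 16000
t = np.arange(0, 0.04, 1 / fs)
resp = np.cos(2 * np.pi * 100.0 * t)
print(label_by_periodicity(resp, pitch_period=fs / 100.0))  # True
```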

394 citations

Journal ArticleDOI
TL;DR: A tandem algorithm is proposed that performs pitch estimation of a target utterance and segregation of voiced portions of target speech jointly and iteratively and performs substantially better than previous systems for either pitch extraction or voiced speech segregation.
Abstract: A lot of effort has been made in computational auditory scene analysis (CASA) to segregate speech from monaural mixtures. The performance of current CASA systems on voiced speech segregation is limited by lacking a robust algorithm for pitch estimation. We propose a tandem algorithm that performs pitch estimation of a target utterance and segregation of voiced portions of target speech jointly and iteratively. This algorithm first obtains a rough estimate of target pitch, and then uses this estimate to segregate target speech using harmonicity and temporal continuity. It then improves both pitch estimation and voiced speech segregation iteratively. Novel methods are proposed for performing segregation with a given pitch estimate and pitch determination with given segregation. Systematic evaluation shows that the tandem algorithm extracts a majority of target speech without including much interference, and it performs substantially better than previous systems for either pitch extraction or voiced speech segregation.
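
The iterative structure of the tandem algorithm can be sketched as an alternation between pitch estimation and mask estimation. The skeleton below is only a hedged outline: `estimate_pitch` and `segregate` are hypothetical callables standing in for the paper's actual pitch-determination and harmonicity/continuity labeling steps.

```python
import numpy as np

def tandem(mixture_tf, estimate_pitch, segregate, n_iter=5):
    """Skeleton of the tandem alternation: obtain a rough pitch track, then
    refine the pitch estimate and the binary target mask jointly and iteratively.

    mixture_tf     : time-frequency decomposition of the mixture
    estimate_pitch : callable(mixture_tf, mask) -> pitch track (Hz per frame)
    segregate      : callable(mixture_tf, pitch) -> binary target mask
    """
    mask = np.ones(mixture_tf.shape, dtype=bool)   # initially treat everything as target
    pitch = estimate_pitch(mixture_tf, mask)       # rough initial pitch estimate
    for _ in range(n_iter):
        mask = segregate(mixture_tf, pitch)        # harmonicity + temporal continuity
        new_pitch = estimate_pitch(mixture_tf, mask)
        if np.allclose(np.nan_to_num(new_pitch), np.nan_to_num(pitch)):
            break                                  # estimates have stopped changing
        pitch = new_pitch
    return pitch, mask
```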

263 citations

Journal ArticleDOI
TL;DR: Systematic evaluation shows that most of target speech, including unvoiced speech, is correctly segmented, and target speech and interference are well separated into different segments.
Abstract: A typical auditory scene in a natural environment contains multiple sources. Auditory scene analysis (ASA) is the process in which the auditory system segregates a scene into streams corresponding to different sources. Segmentation is a major stage of ASA by which an auditory scene is decomposed into segments, each containing signal mainly from one source. We propose a system for auditory segmentation by analyzing onsets and offsets of auditory events. The proposed system first detects onsets and offsets, and then generates segments by matching corresponding onset and offset fronts. This is achieved through a multiscale approach. A quantitative measure is suggested for segmentation evaluation. Systematic evaluation shows that most of target speech, including unvoiced speech, is correctly segmented, and target speech and interference are well separated into different segments.
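
A simplified version of the onset/offset analysis might look like the sketch below: smooth each channel's log-energy envelope at several time scales and take peaks and valleys of the time derivative as onset and offset candidates. The matching of onset and offset fronts into segments is omitted, and the scales and threshold are illustrative.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def onset_offset_candidates(energy_db, scales=(2.0, 4.0, 8.0), thresh=1.0):
    """Per-scale onset/offset candidates for one frequency channel, from the
    time derivative of a Gaussian-smoothed log-energy envelope."""
    onsets, offsets = [], []
    for s in scales:
        d = np.gradient(gaussian_filter1d(energy_db, sigma=s))
        # Local maxima of the derivative above threshold -> onset candidates.
        on = [t for t in range(1, len(d) - 1)
              if d[t] > thresh and d[t] >= d[t - 1] and d[t] >= d[t + 1]]
        # Local minima below -threshold -> offset candidates.
        off = [t for t in range(1, len(d) - 1)
               if d[t] < -thresh and d[t] <= d[t - 1] and d[t] <= d[t + 1]]
        onsets.append(on)
        offsets.append(off)
    return onsets, offsets
```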

135 citations

Proceedings ArticleDOI
21 Oct 2001
TL;DR: This work extends the Wang-Brown model for speech segregation by adding further processes based on psychoacoustic evidence, and the extended model yields significantly better performance.
Abstract: Speech segregation is an important task of auditory scene analysis (ASA), in which the speech of a certain speaker is separated from other interfering signals. D.L. Wang and G.J. Brown (see IEEE Trans. Neural Network, vol.10, p.684-97, 1999) proposed a multistage neural model for speech segregation, the core of which is a two-layer oscillator network. We extend their model by adding further processes based on psychoacoustic evidence to improve the performance. These processes include pitch tracking and grouping based on amplitude modulation (AM). Our model is systematically evaluated and compared with the Wang-Brown model, and it yields significantly better performance.
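
One of the added cues, grouping by amplitude modulation, can be illustrated with a small sketch that estimates the dominant AM rate of a channel envelope from its autocorrelation peak. This is an illustrative simplification, not the model's exact procedure; the pitch-range limits are assumptions.

```python
import numpy as np

def am_rate(envelope, fs, min_hz=50.0, max_hz=400.0):
    """Estimate the dominant amplitude-modulation rate (Hz) of a channel
    envelope from the autocorrelation peak within a plausible pitch range."""
    e = envelope - envelope.mean()
    min_lag, max_lag = int(fs / max_hz), int(fs / min_hz)
    ac = np.array([np.dot(e[:-l], e[l:]) for l in range(min_lag, max_lag)])
    best_lag = min_lag + int(np.argmax(ac))
    return fs / best_lag

# Toy check: an envelope beating at 100 Hz is recovered as ~100 Hz.
fs = 16000
t = np.arange(0, 0.05, 1 / fs)
env = 1.0 + 0.5 * np.cos(2 * np.pi * 100.0 * t)
print(am_rate(env, fs))
```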

79 citations

Journal ArticleDOI
TL;DR: Systematic evaluation shows that the proposed system extracts a majority of unvoiced speech without including much interference, and it performs substantially better than spectral subtraction.
Abstract: Monaural speech segregation has proven to be extremely challenging. While efforts in computational auditory scene analysis have led to considerable progress in voiced speech segregation, little attention has been given to unvoiced speech, which lacks harmonic structure and has weaker energy, and is hence more susceptible to interference. This study proposes a new approach to the problem of segregating unvoiced speech from nonspeech interference. The study first addresses the question of how much speech is unvoiced. The segregation process occurs in two stages: segmentation and grouping. In segmentation, the proposed model decomposes an input mixture into contiguous time-frequency segments by a multiscale analysis of event onsets and offsets. Grouping of unvoiced segments is based on Bayesian classification of acoustic-phonetic features. The proposed model for unvoiced speech segregation joins an existing model for voiced speech segregation to produce an overall system that can deal with both voiced and unvoiced speech. Systematic evaluation shows that the proposed system extracts a majority of unvoiced speech without including much interference, and it performs substantially better than spectral subtraction.
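
Grouping by Bayesian classification can be sketched with a Gaussian naive Bayes classifier over per-segment feature vectors. The features and training data below are synthetic placeholders, not the acoustic-phonetic features used in the paper.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Illustrative training data: one feature vector per segment
# (e.g., spectral centroid, spread, energy); 1 = unvoiced speech, 0 = interference.
rng = np.random.default_rng(0)
X_speech = rng.normal(loc=[4000.0, 1500.0, -30.0], scale=[600, 300, 5], size=(200, 3))
X_noise  = rng.normal(loc=[2500.0, 2500.0, -20.0], scale=[800, 500, 5], size=(200, 3))
X = np.vstack([X_speech, X_noise])
y = np.array([1] * 200 + [0] * 200)

clf = GaussianNB().fit(X, y)

# A new segment is grouped with the target stream if its posterior favors speech.
new_segment = np.array([[4200.0, 1400.0, -28.0]])
print(clf.predict_proba(new_segment))       # class posteriors
print(bool(clf.predict(new_segment)[0]))    # True -> keep as unvoiced speech
```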

74 citations


Cited by
Journal ArticleDOI
TL;DR: An unsupervised learning algorithm for the separation of sound sources in one-channel music signals is presented, and it achieves better separation quality than previous algorithms.
Abstract: An unsupervised learning algorithm for the separation of sound sources in one-channel music signals is presented. The algorithm is based on factorizing the magnitude spectrogram of an input signal into a sum of components, each of which has a fixed magnitude spectrum and a time-varying gain. Each sound source, in turn, is modeled as a sum of one or more components. The parameters of the components are estimated by minimizing the reconstruction error between the input spectrogram and the model, while restricting the component spectrograms to be nonnegative and favoring components whose gains are slowly varying and sparse. Temporal continuity is favored by using a cost term which is the sum of squared differences between the gains in adjacent frames, and sparseness is favored by penalizing nonzero gains. The proposed iterative estimation algorithm is initialized with random values, and the gains and the spectra are then alternately updated using multiplicative update rules until the values converge. Simulation experiments were carried out using generated mixtures of pitched musical instrument samples and drum sounds. The performance of the proposed method was compared with independent subspace analysis and basic nonnegative matrix factorization, which are based on the same linear model. According to these simulations, the proposed method achieves better separation quality than the previous algorithms. In particular, the temporal continuity criterion improved the detection of pitched musical sounds, while the sparseness criterion did not produce significant improvements.
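
A minimal sketch of the factorization idea, assuming a magnitude spectrogram V: multiplicative updates for nonnegative component spectra W and time-varying gains H, with an L1 sparseness penalty on the gains. The temporal-continuity cost term described in the abstract is omitted here for brevity.

```python
import numpy as np

def sparse_nmf(V, n_components, n_iter=200, sparsity=0.1, seed=0):
    """Factorize a nonnegative spectrogram V (freq x time) as V ~ W @ H using
    multiplicative updates with an L1 penalty on the gains H."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, n_components)) + 1e-3   # component spectra
    H = rng.random((n_components, T)) + 1e-3   # time-varying gains
    eps = 1e-12
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + sparsity + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy usage: factorize a random nonnegative "spectrogram" into 2 components.
V = np.abs(np.random.default_rng(1).normal(size=(64, 100)))
W, H = sparse_nmf(V, n_components=2)
print(np.linalg.norm(V - W @ H) / np.linalg.norm(V))  # relative reconstruction error
```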

1,096 citations

Journal ArticleDOI
TL;DR: This paper provides a comprehensive overview of deep learning-based supervised speech separation, discussing its three main components: learning machines, training targets, and acoustic features.
Abstract: Speech separation is the task of separating target speech from background interference. Traditionally, speech separation is studied as a signal processing problem. A more recent approach formulates speech separation as a supervised learning problem, where the discriminative patterns of speech, speakers, and background noise are learned from training data. Over the past decade, many supervised separation algorithms have been put forward. In particular, the recent introduction of deep learning to supervised speech separation has dramatically accelerated progress and boosted separation performance. This paper provides a comprehensive overview of the research on deep learning based supervised speech separation in the last several years. We first introduce the background of speech separation and the formulation of supervised separation. Then, we discuss three main components of supervised separation: learning machines, training targets, and acoustic features. Much of the overview is on separation algorithms where we review monaural methods, including speech enhancement (speech-nonspeech separation), speaker separation (multitalker separation), and speech dereverberation, as well as multimicrophone techniques. The important issue of generalization, unique to supervised learning, is discussed. This overview provides a historical perspective on how advances are made. In addition, we discuss a number of conceptual issues, including what constitutes the target source.
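
One widely used training target in this literature is the ideal ratio mask. A minimal sketch, assuming aligned magnitude spectrograms of the clean speech and the noise are available during training:

```python
import numpy as np

def ideal_ratio_mask(speech_mag, noise_mag, beta=0.5):
    """Ideal ratio mask from clean-speech and noise magnitude spectrograms:
    IRM = (S^2 / (S^2 + N^2))^beta, later applied to the mixture spectrogram."""
    s2, n2 = speech_mag ** 2, noise_mag ** 2
    return (s2 / (s2 + n2 + 1e-12)) ** beta
```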

1,009 citations

Book ChapterDOI
01 Jan 2005
TL;DR: This chapter is an attempt at a computational-theory analysis of auditory scene analysis, where the main task is to understand the character of the CASA problem.
Abstract: In his famous treatise of computational vision, Marr (1982) makes a compelling argument for separating different levels of analysis in order to understand complex information processing. In particular, the computational theory level, concerned with the goal of computation and general processing strategy, must be separated from the algorithm level, or the separation of what from how. This chapter is an attempt at a computational-theory analysis of auditory scene analysis, where the main task is to understand the character of the CASA problem.

617 citations

Journal ArticleDOI
TL;DR: This work proposes to learn more linearly separable and discriminative features from raw acoustic features and train linear SVMs, which are much easier and faster to train than kernel SVMs.
Abstract: Formulating speech separation as a binary classification problem has been shown to be effective. While good separation performance is achieved in matched test conditions using kernel support vector machines (SVMs), separation in unmatched conditions involving new speakers and environments remains a big challenge. A simple yet effective method to cope with the mismatch is to include many different acoustic conditions into the training set. However, large-scale training is almost intractable for kernel machines due to computational complexity. To enable training on relatively large datasets, we propose to learn more linearly separable and discriminative features from raw acoustic features and train linear SVMs, which are much easier and faster to train than kernel SVMs. For feature learning, we employ standard pre-trained deep neural networks (DNNs). The proposed DNN-SVM system is trained on a variety of acoustic conditions within a reasonable amount of time. Experiments on various test mixtures demonstrate good generalization to unseen speakers and background noises.
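
The DNN-SVM pipeline can be sketched as follows: train a small multilayer perceptron on raw features, reuse its hidden layers as a learned feature transform, and fit a linear SVM on the transformed features. The data below are synthetic placeholders and the network size is illustrative, not the configuration used in the paper.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.svm import LinearSVC

# Synthetic stand-in for per-unit acoustic features and binary mask labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 40))
y = (X[:, :5].sum(axis=1) + 0.3 * rng.normal(size=2000) > 0).astype(int)

# 1) Train a small DNN on the raw features.
mlp = MLPClassifier(hidden_layer_sizes=(64, 64), activation="relu",
                    max_iter=300, random_state=0).fit(X, y)

# 2) Reuse its hidden layers as a learned feature transform.
def dnn_features(mlp, X):
    h = X
    for W, b in zip(mlp.coefs_[:-1], mlp.intercepts_[:-1]):   # skip output layer
        h = np.maximum(0.0, h @ W + b)                        # ReLU hidden layers
    return h

# 3) Train a linear SVM on the learned features (fast even at large scale).
svm = LinearSVC(C=1.0, max_iter=5000).fit(dnn_features(mlp, X), y)
print(svm.score(dnn_features(mlp, X), y))
```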

460 citations

01 Jan 2002
TL;DR: In this paper, the authors propose a system for speech segregation that deals with low-frequency and high-frequency signals differently, generating segments based on temporal continuity and cross-channel correlation and grouping them according to periodicity.
Abstract: A major problem for previous CASA systems is their inability to deal with signals in the high-frequency range. Psychoacoustic evidence suggests that different perceptual mechanisms are involved in handling resolved and unresolved harmonics. We propose a system for speech segregation that deals with low-frequency and high-frequency signals differently. For low-frequency signals, our model generates segments based on temporal continuity and cross-channel correlation, and groups them according to periodicity. For high-frequency signals, the model generates segments based on common amplitude modulation (AM) in addition to temporal continuity, and groups them according to AM repetition rates. Underlying the grouping process is a pitch contour that is first estimated from segregated speech based on global pitch and then verified by psychoacoustic constraints. Our system is systematically evaluated, and it yields substantially better performance than previous CASA systems, especially in the high-frequency range.
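
These CASA systems assume an auditory filterbank front-end. The sketch below shows one common choice, a filterbank built from fourth-order gammatone impulse responses; it is an illustrative front-end, not necessarily the exact parameters used in this work.

```python
import numpy as np

def gammatone_ir(fc, fs, duration=0.05, order=4):
    """Impulse response of a 4th-order gammatone filter centered at fc (Hz)."""
    t = np.arange(0, duration, 1.0 / fs)
    erb = 24.7 * (4.37 * fc / 1000.0 + 1.0)       # Glasberg & Moore ERB scale
    b = 1.019 * erb                                # filter bandwidth
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.max(np.abs(g))

def filterbank(signal, fs, center_freqs):
    """Decompose a signal into channels by convolving with gammatone responses."""
    return np.stack([np.convolve(signal, gammatone_ir(fc, fs), mode="same")
                     for fc in center_freqs])
```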

401 citations