
Showing papers by "DeLiang Wang published in 2011"


Journal ArticleDOI
TL;DR: This paper proposes a robust algorithm for multipitch tracking in the presence of both background noise and room reverberation, which can reliably detect single and double pitch contours in noisy and reverberant conditions.
Abstract: Multipitch tracking in real environments is critical for speech signal processing. Determining pitch in reverberant and noisy speech is a particularly challenging task. In this paper, we propose a robust algorithm for multipitch tracking in the presence of both background noise and room reverberation. An auditory front-end and a new channel selection method are utilized to extract periodicity features. We derive pitch scores for each pitch state, which estimate the likelihoods of the observed periodicity features given pitch candidates. A hidden Markov model integrates these pitch scores and searches for the best pitch state sequence. Our algorithm can reliably detect single and double pitch contours in noisy and reverberant conditions. Quantitative evaluations show that our approach outperforms existing ones, particularly in reverberant conditions.
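
To make the HMM search concrete, here is a minimal Viterbi sketch in Python. The pitch-score matrix, the number of pitch states, and the smoothness-favoring transition model are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def viterbi_pitch_track(pitch_scores, sigma=2.0):
    """Find the most likely pitch-state sequence.

    pitch_scores: (num_frames, num_states) array; entry (t, s) is the
    likelihood of the observed periodicity features given pitch state s
    (the "pitch scores" of the abstract). Transitions favor smooth pitch
    contours via a distance penalty between neighboring states.
    """
    T, S = pitch_scores.shape
    states = np.arange(S)
    log_trans = -np.abs(states[:, None] - states[None, :]) / sigma
    log_obs = np.log(pitch_scores + 1e-12)

    delta = np.zeros((T, S))           # best log score ending in each state
    psi = np.zeros((T, S), dtype=int)  # backpointers
    delta[0] = log_obs[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_trans  # (prev, cur)
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_obs[t]

    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return path
```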

81 citations


Journal ArticleDOI
Jiangye Yuan, DeLiang Wang, Bo Wu, Lin Yan, Rongxing Li
TL;DR: An automatic method for road extraction from satellite imagery using locally excitatory globally inhibitory oscillator networks (LEGION) and a comparison with other methods shows that the proposed method produces very competitive extraction results.
Abstract: An automatic method for road extraction from satellite imagery is presented. The core of the proposed method is locally excitatory globally inhibitory oscillator networks (LEGION). The road extraction task is decomposed into three stages. The first stage is image segmentation by LEGION. In the second stage, the medial axis of each segment is computed, and the medial axis points corresponding to narrow regions are selected. The third stage is road grouping: alignment-dependent connections between selected points are established, and LEGION is utilized to group well-aligned points, which represent the extracted roads. Due to the selective gating mechanism of LEGION, different roads in an image are grouped separately. Road extraction results on synthetic and real images are presented. A comparison with other methods shows that the proposed method produces very competitive extraction results.
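
As a rough illustration of the second stage, the sketch below computes a segment's medial axis and keeps only the points lying in narrow, road-like regions; the width threshold is an assumption, and the LEGION segmentation itself is taken as given.

```python
import numpy as np
from skimage.morphology import medial_axis

def narrow_region_points(segment_mask, max_half_width=3):
    """Return medial-axis points of a binary segment that lie in narrow
    regions, i.e. where the local half-width is small (road candidates)."""
    skel, dist = medial_axis(segment_mask, return_distance=True)
    # dist holds the distance to the segment boundary at each skeleton point.
    points = np.argwhere(skel & (dist <= max_half_width))
    return points  # (row, col) candidates passed to the grouping stage
```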

60 citations


Journal ArticleDOI
TL;DR: The proposed algorithm is computationally efficient, and systematic evaluation and comparison show that the approach considerably improves the performance of unvoiced speech segregation.
Abstract: While a lot of effort has been made in computational auditory scene analysis to segregate voiced speech from monaural mixtures, unvoiced speech segregation has not received much attention. Unvoiced speech is highly susceptible to interference due to its relatively weak energy and lack of harmonic structure, which makes its segregation extremely difficult. This paper proposes a new approach to segregation of unvoiced speech from nonspeech interference. The proposed system first removes estimated voiced speech and the periodic part of interference based on cross-channel correlation. The resultant interference becomes more stationary, and we estimate the noise energy in unvoiced intervals using segregated speech in neighboring voiced intervals. Then unvoiced speech segregation occurs in two stages: segmentation and grouping. In segmentation, we apply spectral subtraction to generate time-frequency segments in unvoiced intervals. Unvoiced speech segments are subsequently grouped based on frequency characteristics of unvoiced speech, using simple thresholding as well as Bayesian classification. The proposed algorithm is computationally efficient, and systematic evaluation and comparison show that our approach considerably improves the performance of unvoiced speech segregation.
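
A minimal sketch of the spectral-subtraction segmentation step, assuming a cochleagram-like energy matrix for the unvoiced intervals and a per-channel noise estimate taken from neighboring voiced intervals; the names and the subtraction factor are illustrative.

```python
import numpy as np

def unvoiced_segment_map(energy, noise_energy, beta=1.0):
    """energy:       (channels, frames) T-F energies in unvoiced intervals
    noise_energy: (channels,) estimated noise energy per channel
    Returns a binary map; contiguous True regions form T-F segments."""
    residual = energy - beta * noise_energy[:, None]  # spectral subtraction
    return residual > 0
```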

58 citations


Proceedings ArticleDOI
22 May 2011
TL;DR: Systematic evaluations show that the proposed classification approach to the monaural separation problem produces high-quality binary masks and outperforms a previous system in terms of classification accuracy.
Abstract: Monaural speech separation is a very challenging task. CASA-based systems utilize acoustic features to produce a time-frequency (T-F) mask. In this study, we propose a classification approach to the monaural separation problem. Our feature set consists of pitch-based features and amplitude modulation spectrum features, which can discriminate both voiced and unvoiced speech from nonspeech interference. We employ support vector machines (SVMs) followed by a re-thresholding method to classify each T-F unit as either target-dominated or interference-dominated. An auditory segmentation stage is then utilized to improve the SVM-generated results. Systematic evaluations show that our approach produces high-quality binary masks and outperforms a previous system in terms of classification accuracy.
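
A sketch of the unit-level classification with scikit-learn standing in for the SVM; per-unit features and ideal-binary-mask training labels are assumed to be extracted already, and the re-thresholding cutoff is illustrative.

```python
import numpy as np
from sklearn.svm import SVC

def train_unit_classifier(features, ibm_labels):
    """features: (num_units, dim); ibm_labels: 1 if target-dominated."""
    clf = SVC(kernel="rbf", probability=True)
    clf.fit(features, ibm_labels)
    return clf

def estimate_mask(clf, features, threshold=0.5):
    """Label each T-F unit; the cutoff on the posterior can be re-tuned,
    in the spirit of the re-thresholding method mentioned above."""
    probs = clf.predict_proba(features)[:, 1]
    return probs > threshold
```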

45 citations


Journal ArticleDOI
TL;DR: A neurocomputational model of object-based selection in the framework of oscillatory correlation is presented, which selects salient objects rather than salient locations by segmenting an input scene and integrating the segments with their conspicuity obtained from a saliency map.
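
As a rough sketch of object-based selection, the snippet below integrates a saliency map over precomputed segments and selects the most conspicuous one; the segmentation, the saliency map, and the mean-saliency rule are illustrative assumptions rather than the model's oscillator dynamics.

```python
import numpy as np

def select_object(labels, saliency):
    """labels: (H, W) integer segment ids; saliency: (H, W) saliency map.
    Returns the id of the segment with the highest mean conspicuity."""
    ids = np.unique(labels)
    conspicuity = [saliency[labels == i].mean() for i in ids]
    return ids[int(np.argmax(conspicuity))]
```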

30 citations


Journal ArticleDOI
TL;DR: In this article, the authors proposed an algorithm for the separation of convolutive speech mixtures using two-microphone recordings, based on the combination of independent component analysis (ICA) and ideal binary mask (IBM), together with a post-filtering process in the cepstral domain.
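
A minimal sketch of how an ideal-binary-mask-style mask can be derived from the two ICA outputs, assuming their STFTs are available; the energy-comparison rule is the standard one and not necessarily the paper's exact criterion.

```python
import numpy as np

def mask_from_ica_outputs(S1, S2):
    """S1, S2: (freq, frames) STFTs of the two ICA-separated signals.
    A T-F unit is assigned to source 1 where it carries more energy."""
    return (np.abs(S1) > np.abs(S2)).astype(float)
```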

28 citations


Proceedings ArticleDOI
22 May 2011
TL;DR: This work first shows that a recently introduced speaker feature, Gammatone Frequency Cepstral Coefficient, performs substantially better than conventional speaker features under noisy conditions, and then applies CASA separation to either reconstruct or marginalize corrupted components indicated by the CASA mask.
Abstract: Speaker recognition remains a challenging task under noisy conditions. Inspired by auditory perception, computational auditory scene analysis (CASA) typically segregates speech by producing a binary time-frequency mask. We first show that a recently introduced speaker feature, Gammatone Frequency Cepstral Coefficient, performs substantially better than conventional speaker features under noisy conditions. To deal with noisy speech, we apply CASA separation and then either reconstruct or marginalize corrupted components indicated by the CASA mask. Both methods are effective. We further combine them into a single system depending on the detected signal-to-noise ratio (SNR). This system achieves significant performance improvements over related systems under a wide range of SNR conditions.
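
A sketch of the SNR-dependent switch between the two strategies; the threshold and the reconstruct/marginalize callables are hypothetical placeholders, not the paper's configuration.

```python
def recognize(features, casa_mask, detected_snr,
              reconstruct, marginalize, snr_switch=5.0):
    """Route recognition through the method suited to the detected SNR."""
    if detected_snr >= snr_switch:
        # At higher SNRs, reconstructing corrupted components works well.
        return reconstruct(features, casa_mask)
    # At lower SNRs, marginalizing unreliable components is safer.
    return marginalize(features, casa_mask)
```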

25 citations


Journal ArticleDOI
TL;DR: A computational auditory scene analysis approach to monaural segregation of reverberant voiced speech is proposed, which performs multipitch tracking of reverberant mixtures and supervised classification, and has a significant advantage over existing systems.
Abstract: Room reverberation creates a major challenge to speech segregation. We propose a computational auditory scene analysis approach to monaural segregation of reverberant voiced speech, which performs multipitch tracking of reverberant mixtures and supervised classification. Speech and nonspeech models are separately trained, and each learns to map from a set of pitch-based features to a grouping cue which encodes the posterior probability of a time-frequency (T-F) unit being dominated by the source with the given pitch estimate. Because interference may be either speech or nonspeech, a likelihood ratio test selects the correct model for labeling corresponding T-F units. Experimental results show that the proposed system performs robustly in different types of interference and various reverberant conditions, and has a significant advantage over existing systems.
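
A minimal sketch of the likelihood ratio test used to choose the labeling model; the two log-likelihood functions are assumed to come from the separately trained speech and nonspeech models.

```python
def select_model(pitch_features, speech_loglik, nonspeech_loglik):
    """Pick the model that better explains the pitch-based features
    of the interference, then use it to label the corresponding T-F units."""
    llr = speech_loglik(pitch_features) - nonspeech_loglik(pitch_features)
    return "speech" if llr > 0.0 else "nonspeech"
```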

18 citations


Proceedings ArticleDOI
22 May 2011
TL;DR: A trend estimation algorithm is proposed to detect the pitch ranges of a singing voice in each time frame; the detected trend substantially reduces the difficulty of singing pitch detection by removing a large number of wrong pitch candidates produced either by musical instruments or by overtones of the singing voice.
Abstract: Detecting pitch values for a singing voice in the presence of music accompaniment is challenging but useful for many applications. We propose a trend estimation algorithm to detect the pitch ranges of a singing voice in each time frame. The detected trend substantially reduces the difficulty of singing pitch detection by removing a large number of wrong pitch candidates produced either by musical instruments or by overtones of the singing voice. The proposed algorithm can be applied to improve the performance of singing pitch detection. Quantitative evaluations show that the proposed trend estimation improves an existing algorithm significantly. The results from the MIREX 2010 competition show that our system achieves the best overall raw-pitch accuracy for vocal songs.
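
To illustrate how a detected trend prunes candidates, here is a sketch that keeps only candidates within a band around the per-frame trend; the band width in semitones is an illustrative assumption.

```python
def prune_candidates(candidates_hz, trend_hz, band_semitones=4):
    """candidates_hz: candidate pitch frequencies for one frame
    trend_hz:      trend estimate (center frequency) for that frame"""
    lo = trend_hz * 2 ** (-band_semitones / 12)
    hi = trend_hz * 2 ** (band_semitones / 12)
    return [f for f in candidates_hz if lo <= f <= hi]
```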

18 citations


Proceedings ArticleDOI
22 May 2011
TL;DR: In situations where a target of interest is near to the listener while interfering sources are more distant, simple features that capture the directionality of sound energy can be used to attenuate significant undesired signal energy and can be more effective than a strategy based on noise-floor tracking.
Abstract: In this work we describe methods for using the directionality of sound energy as a criterion to estimate single- and multichannel linear filters for suppression of diffuse noise and reverberation in a hearing aid application. We compare conservative strategies where direction of arrival is unknown, and more aggressive strategies where the proposed methods can be used to derive a fast acting post-filter for the output of a beamformer. We show that in situations where a target of interest is near to the listener while interfering sources are more distant, simple features that capture the directionality of sound energy can be used to attenuate significant undesired signal energy and can be more effective than a strategy based on noise-floor tracking.
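
A sketch of one way such a directionality feature can drive a post-filter: inter-microphone coherence is high for a nearby direct source and low for diffuse noise and reverberation, so it can serve as a gain. The smoothing constant, gain floor, and coherence-to-gain mapping are illustrative choices, not the paper's design.

```python
import numpy as np

def coherence_gain(X1, X2, floor=0.1, alpha=0.9):
    """X1, X2: (freq, frames) STFTs of two hearing-aid microphones.
    Returns per-unit gains to apply to a reference channel's STFT."""
    n_freq = X1.shape[0]
    P11 = np.zeros(n_freq)
    P22 = np.zeros(n_freq)
    P12 = np.zeros(n_freq, dtype=complex)
    gains = np.empty(X1.shape, dtype=float)
    for t in range(X1.shape[1]):
        # Recursively smoothed auto- and cross-power spectra.
        P11 = alpha * P11 + (1 - alpha) * np.abs(X1[:, t]) ** 2
        P22 = alpha * P22 + (1 - alpha) * np.abs(X2[:, t]) ** 2
        P12 = alpha * P12 + (1 - alpha) * X1[:, t] * np.conj(X2[:, t])
        msc = np.abs(P12) ** 2 / (P11 * P22 + 1e-12)  # ~1 when directional
        gains[:, t] = np.maximum(msc, floor)
    return gains
```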

5 citations


Proceedings ArticleDOI
22 May 2011
TL;DR: This paper proposes to train multiple prior models of speech, based on distinct characteristics of speech, instead of a single prior model; in this study, they are trained on voicing characteristics.
Abstract: Prior models of speech have been used in robust automatic speech recognition to enhance noisy speech. Typically, a single prior model is trained by pooling the entire training data. In this paper we propose to train multiple prior models of speech instead of a single prior model. The prior models can be trained based on distinct characteristics of speech. In this study, they are trained based on voicing characteristics. The trained prior models are then used to reconstruct noisy speech. Significant improvements are obtained on the Aurora-4 robust speech recognition task when multiple priors are used; in conjunction with an uncertainty transform technique, multiple priors yield a 13.7% absolute improvement in the average word error rate over directly recognizing noisy speech.
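
A sketch of routing frames to voicing-specific priors during reconstruction; the per-frame voicing decisions and the prior objects with a reconstruct method are hypothetical stand-ins for the trained models.

```python
def reconstruct_with_priors(frames, voicing, voiced_prior, unvoiced_prior):
    """frames: iterable of noisy feature frames; voicing: per-frame booleans.
    Each frame is enhanced by the prior matching its voicing class."""
    enhanced = []
    for frame, is_voiced in zip(frames, voicing):
        prior = voiced_prior if is_voiced else unvoiced_prior
        enhanced.append(prior.reconstruct(frame))
    return enhanced
```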

Proceedings ArticleDOI
22 May 2011
TL;DR: An unsupervised approach to sequential organization of cochannel speech is proposed that outperforms a model-based method in terms of speech segregation and is computationally simple.
Abstract: Model-based methods for sequential organization in cochannel speech require pretrained speaker models and often prior knowledge of participating speakers. We propose an unsupervised approach to sequential organization of cochannel speech. Based on cepstral features, we first cluster voiced speech into two speaker groups by maximizing the ratio of between- and within-group distances penalized by within-group concurrent pitches. To group unvoiced speech, we employ an onset/offset based analysis to generate time-frequency segments. Unvoiced segments are then labeled by the complementary portions of segregated voiced speech. Our method does not require any pretrained model and is computationally simple. Evaluations and comparisons show that the proposed method outperforms a model-based method in terms of speech segregation.
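
A sketch of the clustering objective described above: the ratio of between- to within-group scatter, penalized when a group contains concurrent pitches (two simultaneous pitches cannot belong to one talker). The penalty weight is an illustrative assumption.

```python
import numpy as np

def grouping_score(features, labels, frame_ids, penalty=1.0):
    """features: (n, d) cepstral features of voiced segments
    labels:   (n,) 0/1 group assignment to evaluate
    frame_ids:(n,) time frame of each segment"""
    g0, g1 = features[labels == 0], features[labels == 1]
    within = g0.var(axis=0).sum() + g1.var(axis=0).sum()
    between = np.sum((g0.mean(axis=0) - g1.mean(axis=0)) ** 2)
    # Count duplicated frames within a group: concurrent pitches.
    concurrent = sum(
        len(frame_ids[labels == g]) - len(set(frame_ids[labels == g]))
        for g in (0, 1)
    )
    return between / (within + 1e-12) - penalty * concurrent
```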

Journal ArticleDOI
TL;DR: This issue begins the twenty-third year of publication for Neural Networks, which is the leading journal in the world that covers the full range of neural networks and related research from all the areas of psychology and cognitive science, neuroscience and neuropsychology, mathematical and computational analysis, engineering and design, and technology and applications.

Proceedings ArticleDOI
22 May 2011
TL;DR: It is shown that by combining the outputs of classifiers trained on the traditional MFCC features and this novel speech pattern, statistically significant improvements over the baseline MFCC based classifier can be achieved for the task of phonetic classification.
Abstract: Ideal binary masks are binary patterns that encode the masking characteristics of speech in noise. Recent evidence in speech perception suggests that such binary patterns provide sufficient information for human speech recognition. Motivated by these findings, we propose to use ideal binary masks to improve phonetic modeling. We show that by combining the outputs of classifiers trained on the traditional MFCC features and this novel speech pattern, statistically significant improvements over the baseline MFCC-based classifier can be achieved for the task of phonetic classification. Using the combined classifiers, we achieve an error rate of 19.5% on the TIMIT phonetic classification task using multilayer perceptrons as the underlying classifier.
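
A sketch of one simple combination rule for the two classifiers' outputs: averaging log-posteriors from the MFCC-based and mask-based models; the equal weighting is an assumption, not necessarily the paper's scheme.

```python
import numpy as np

def combine_posteriors(mfcc_probs, mask_probs, weight=0.5):
    """mfcc_probs, mask_probs: (num_frames, num_phones) posteriors from
    the two classifiers. Returns the predicted phone index per frame."""
    log_combined = (weight * np.log(mfcc_probs + 1e-12)
                    + (1 - weight) * np.log(mask_probs + 1e-12))
    return log_combined.argmax(axis=1)
```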

Proceedings ArticleDOI
03 Oct 2011
TL;DR: This work proposes an algorithm to automatically identify representative features corresponding to different homogeneous regions, and shows that the number of representative features can be determined by examining the effective rank of a feature matrix.
Abstract: We present a novel method for segmenting images with texture and nontexture regions. Local spectral histograms are feature vectors consisting of histograms of chosen filter responses, which capture both texture and nontexture information. Based on the observation that the local spectral histogram of a pixel location can be approximated through a linear combination of the representative features weighted by the area coverage of each feature, we formulate the segmentation problem as a multivariate linear regression, where the solution is obtained by least squares estimation. Moreover, we propose an algorithm to automatically identify representative features corresponding to different homogeneous regions, and show that the number of representative features can be determined by examining the effective rank of a feature matrix. We present segmentation results on different types of images, and our comparison with another spectral histogram based method shows that the proposed method gives more accurate results.
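
A minimal sketch of the least-squares step: each pixel's local spectral histogram is approximated as a weighted combination of representative features, and the largest weight gives the pixel's label. Feature extraction and the choice of representatives (which the abstract ties to the effective rank of a feature matrix) are assumed done beforehand.

```python
import numpy as np

def segment_by_regression(histograms, representatives):
    """histograms:      (num_pixels, dim) local spectral histograms
    representatives: (num_regions, dim) representative features"""
    Z = representatives.T                         # (dim, num_regions)
    # Least-squares weights per pixel: argmin_w ||Z w - h||^2.
    W, *_ = np.linalg.lstsq(Z, histograms.T, rcond=None)
    return W.argmax(axis=0)                       # region label per pixel
```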