Author

Abolghasem Sayadiyan

Bio: Abolghasem Sayadiyan is an academic researcher from Amirkabir University of Technology. The author has contributed to research in topics including speech coding and speech processing. The author has an h-index of 11 and has co-authored 45 publications receiving 405 citations.

Papers
Journal ArticleDOI
TL;DR: A new technique for separating two speech signals from a single recording is presented; it effectively adds vocal-tract-related filter characteristics as a new cue to CASA models through a new grouping technique based on underdetermined blind source separation.
Abstract: We present a new technique for separating two speech signals from a single recording. The proposed method bridges the gap between underdetermined blind source separation techniques and techniques that model the human auditory system, that is, computational auditory scene analysis (CASA). For this purpose, we decompose the speech signal into the excitation signal and the vocal-tract-related filter and then estimate these components from the mixed speech using a hybrid model. We first express the probability density function (PDF) of the mixed speech's log spectral vectors in terms of the PDFs of the underlying speech signals' vocal-tract-related filters. Then, the mean vectors of the PDFs of the vocal-tract-related filters are obtained using a maximum likelihood estimator given the mixed signal. Finally, the estimated vocal-tract-related filters, along with the extracted fundamental frequencies, are used to reconstruct estimates of the individual speech signals. The proposed technique effectively adds vocal-tract-related filter characteristics as a new cue to CASA models through a new grouping technique based on underdetermined blind source separation. We compare our model with both an underdetermined blind source separation method and a CASA method. The experimental results show that our model outperforms both techniques in terms of SNR improvement and the percentage of crosstalk suppression.
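To make the estimation step concrete, below is a minimal sketch (ours, not the authors' implementation) of a frame-wise maximum-likelihood codebook search: each speaker's vocal-tract log spectrum is assumed to be modelled by a codebook of Gaussian mean vectors, and the mixture is assumed to follow the MIXMAX approximation (elementwise maximum of the source log spectra). All names and sizes are illustrative.

```python
# Minimal sketch of an ML codebook-pair search under the MIXMAX
# approximation; illustrative only, not the paper's implementation.
import numpy as np

def ml_codebook_pair(y, codebook_a, codebook_b):
    """Pick the pair of vocal-tract mean vectors (one per speaker) that
    best explains the mixed log spectral vector y.

    y          : (F,) mixed log spectrum for one frame
    codebook_a : (Ka, F) candidate mean vectors for speaker A
    codebook_b : (Kb, F) candidate mean vectors for speaker B
    """
    best, best_err = None, np.inf
    for i, mu_a in enumerate(codebook_a):
        for j, mu_b in enumerate(codebook_b):
            # MIXMAX: the mixture log spectrum is approximated by the
            # elementwise maximum of the two source log spectra.
            pred = np.maximum(mu_a, mu_b)
            # With unit-variance Gaussians, maximizing likelihood reduces
            # to minimizing squared error against the prediction.
            err = np.sum((y - pred) ** 2)
            if err < best_err:
                best, best_err = (i, j), err
    return best
```

In the full method, the selected vocal-tract filters are then combined with the extracted fundamental frequencies to reconstruct each speaker's signal.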

53 citations

Journal ArticleDOI
TL;DR: It is concluded that the mixture-maximisation (MIXMAX) approximation is a nonlinear minimum mean square error estimator under the assumption of uniformly distributed phases for the underlying speech signals.
Abstract: In many speech separation, enhancement, and recognition techniques, it is necessary to express the log spectrum of a mixed speech signal in terms of the log spectra of the underlying speech signals. For this purpose, the mixture-maximisation (MIXMAX) approximation is commonly used. A proof of this approximation is presented in a statistical framework. It is concluded that the approximation is a nonlinear minimum mean square error estimator under the assumption of uniformly distributed phases for the underlying speech signals.
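For reference, the approximation in question can be written as follows (notation ours: y, x_1, and x_2 denote the log spectra of the mixture and of the two underlying signals):

```latex
% MIXMAX: the log spectrum of the mixture is approximated by the
% elementwise maximum of the source log spectra.
\[
  y \;\approx\; \max(x_1, x_2)
\]
% The letter's result: under uniformly distributed phases for the
% underlying signals, this maximum is the nonlinear MMSE estimator,
% i.e. it approximates E[y | x_1, x_2].
```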

41 citations

Journal ArticleDOI
TL;DR: The results show that although model-based separation delivers the best quality in the speaker-dependent case, the integrated model outperforms the individual approaches in a speaker-independent scenario, supporting the idea that the human auditory system draws on both grouping cues and a priori knowledge to segregate speech signals.

25 citations

Proceedings ArticleDOI
09 Feb 2010
TL;DR: A novel algorithm is proposed to design a set of Speech-Like (SL) symbols, leading to a GSM voice-channel data modem that modulates and demodulates data over the GSM Adaptive Multi-Rate (AMR) voice codec, which supports several bit rates.
Abstract: This paper introduces a new method to transmit digital data through the Global System for Mobile communications (GSM) voice channel. A novel algorithm is proposed to design a set of Speech-Like (SL) symbols, which leads to the design of a GSM voice-channel data modem that modulates and demodulates data over the GSM Adaptive Multi-Rate (AMR) voice codec, which supports several bit rates. Designing the set of time-symbols is an offline procedure with the aim of minimizing symbol detection error. This modem is useful for high-priority real-time data communication. The introduced modem encodes data into SL symbols for transmission over the GSM voice channel, and the received SL symbols are decoded back into data.
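As an illustration of the symbol-based idea only (the paper's offline symbol-design optimization is not reproduced here), a minimal modulator/demodulator over a precomputed SL-symbol codebook might look like the sketch below; the codebook, frame length, and equal-energy assumption are all ours.

```python
# Hypothetical sketch of a codebook-based voice-channel modem; the SL
# symbol waveforms themselves would come from the paper's offline design.
import numpy as np

def modulate(bits, symbols):
    """Map each group of k bits to one of 2**k waveform symbols.

    bits    : list of 0/1 values, length a multiple of k
    symbols : (2**k, N) array of SL waveform symbols
    """
    k = int(np.log2(len(symbols)))
    frames = [bits[i:i + k] for i in range(0, len(bits), k)]
    idx = [int("".join(map(str, f)), 2) for f in frames]
    return np.concatenate([symbols[i] for i in idx])

def demodulate(signal, symbols):
    """Nearest-symbol detection frame by frame, by correlation
    (assumes roughly equal-energy symbols and frame synchronization)."""
    n = symbols.shape[1]
    k = int(np.log2(len(symbols)))
    bits = []
    for start in range(0, len(signal) - n + 1, n):
        seg = signal[start:start + n]
        scores = symbols @ seg            # correlation with each codebook entry
        best = int(np.argmax(scores))
        bits.extend(int(b) for b in format(best, f"0{k}b"))
    return bits
```

In practice the received waveform has passed through the AMR codec, so the symbol set must remain distinguishable after coding, which is exactly what the offline design procedure targets.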

21 citations


Cited by
Christopher M. Bishop
01 Jan 2006
TL;DR: A textbook covering probability distributions, linear models for regression and classification, neural networks, kernel methods, sparse kernel machines, graphical models, mixture models and EM, approximate inference, sampling methods, continuous latent variables, sequential data, and combining models.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

10,141 citations

Journal ArticleDOI
01 Oct 1980

1,565 citations

Journal ArticleDOI
TL;DR: A convolutional recurrent neural network (CRNN) is proposed for the polyphonic sound event detection task and compared with CNN, RNN, and other established methods; a considerable improvement is observed on four different datasets of everyday sound events.
Abstract: Sound events often occur in unstructured environments where they exhibit wide variations in their frequency content and temporal structure. Convolutional neural networks (CNNs) are able to extract higher-level features that are invariant to local spectral and temporal variations. Recurrent neural networks (RNNs) are powerful in learning the longer-term temporal context in audio signals. CNNs and RNNs as classifiers have recently shown improved performance over established methods in various sound recognition tasks. We combine these two approaches in a convolutional recurrent neural network (CRNN) and apply it to a polyphonic sound event detection task. We compare the performance of the proposed CRNN method with CNN, RNN, and other established methods, and observe a considerable improvement on four different datasets consisting of everyday sound events.
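A minimal sketch of a CRNN in this spirit is shown below (PyTorch; layer sizes and the pooling scheme are illustrative, not the paper's configuration). Pooling only along frequency preserves the frame rate, so the recurrent layer can emit per-frame, multi-label event probabilities.

```python
# Illustrative CRNN for polyphonic sound event detection; sizes are ours.
import torch
import torch.nn as nn

class CRNN(nn.Module):
    def __init__(self, mel_bins=40, n_classes=6, hidden=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),             # pool frequency only, keep time
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),
        )
        feat = 64 * (mel_bins // 4)           # channels x remaining mel bins
        self.gru = nn.GRU(feat, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                     # x: (B, 1, mel_bins, frames)
        h = self.conv(x)                      # (B, 64, mel_bins // 4, frames)
        h = h.permute(0, 3, 1, 2).flatten(2)  # (B, frames, feat)
        h, _ = self.gru(h)                    # longer-term temporal context
        return torch.sigmoid(self.out(h))     # (B, frames, n_classes)
```

Each output is an independent per-class activity probability per frame, which is what makes the detection polyphonic: several events may be active at once.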

432 citations

Journal ArticleDOI
TL;DR: A tandem algorithm is proposed that performs pitch estimation of a target utterance and segregation of voiced portions of target speech jointly and iteratively; it performs substantially better than previous systems for either pitch extraction or voiced speech segregation.
Abstract: A lot of effort has been made in computational auditory scene analysis (CASA) to segregate speech from monaural mixtures. The performance of current CASA systems on voiced speech segregation is limited by the lack of a robust pitch estimation algorithm. We propose a tandem algorithm that performs pitch estimation of a target utterance and segregation of voiced portions of target speech jointly and iteratively. This algorithm first obtains a rough estimate of the target pitch, and then uses this estimate to segregate target speech using harmonicity and temporal continuity. It then improves both pitch estimation and voiced speech segregation iteratively. Novel methods are proposed for performing segregation with a given pitch estimate and pitch determination with given segregation. Systematic evaluation shows that the tandem algorithm extracts a majority of target speech without including much interference, and it performs substantially better than previous systems for either pitch extraction or voiced speech segregation.
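Schematically, the tandem structure is an alternation between the two stages until the pitch contour settles. The sketch below passes the stage implementations in as callables because they are hypothetical stand-ins here, not the published pitch-tracking and time-frequency masking algorithms.

```python
# Schematic of the iterative pitch/segregation loop; the three callables
# are hypothetical placeholders for the paper's actual stages.
import numpy as np

def tandem_separation(mixture, rough_pitch_estimate, segregate_with_pitch,
                      estimate_pitch_from_mask, n_iters=5, tol=1.0):
    pitch = rough_pitch_estimate(mixture)         # initial target-pitch contour
    mask = None
    for _ in range(n_iters):
        # Segregate voiced target speech given the current pitch, using
        # harmonicity and temporal continuity.
        mask = segregate_with_pitch(mixture, pitch)
        # Re-estimate pitch from only the T-F units assigned to the target.
        new_pitch = estimate_pitch_from_mask(mixture, mask)
        if np.max(np.abs(new_pitch - pitch)) < tol:   # contour has settled
            break
        pitch = new_pitch
    return mask, pitch
```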

263 citations