Author

Vinayak Abrol

Bio: Vinayak Abrol is an academic researcher from University of Oxford. The author has contributed to research in topics: Sparse approximation & Speech processing. The author has an h-index of 10, co-authored 43 publications receiving 296 citations. Previous affiliations of Vinayak Abrol include Idiap Research Institute & University Institute of Engineering and Technology, Panjab University.

Papers
Journal ArticleDOI
TL;DR: This paper proposes to use a multilevel decomposition (having multiple layers), also known as the deep sparse representation (DSR), to derive a feature representation for speech recognition, and reveals that the representations obtained at different sparse layers of the proposed DSR model have complementary information.
Abstract: Features derived using sparse representation (SR)-based approaches have been shown to yield promising results for speech recognition tasks. In most of the approaches, the SR corresponding to the speech signal is estimated using a dictionary, which could be either exemplar based or learned. However, a single-level decomposition may not be suitable for the speech signal, as it contains complex hierarchical information about various hidden attributes. In this paper, we propose to use a multilevel decomposition (having multiple layers), also known as the deep sparse representation (DSR), to derive a feature representation for speech recognition. Instead of having a series of sparse layers, the proposed framework employs a dense layer between two sparse layers, which helps in efficient implementation. Our studies reveal that the representations obtained at different sparse layers of the proposed DSR model have complementary information. Thus, the final feature representation is derived after concatenating the representations obtained at the sparse layers. This results in a more discriminative representation, and improves the speech recognition performance. Since the concatenation results in a high-dimensional feature, principal component analysis is used to reduce the dimension of the obtained feature. Experimental studies demonstrate that the proposed feature outperforms existing features for various speech recognition tasks.

34 citations
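
The last two steps of the abstract above (concatenating the sparse-layer representations, then reducing the dimension with principal component analysis) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `concat_and_reduce`, the SVD-based PCA, and the assumption that each layer's representation arrives as a frames-by-dimension array are all hypothetical choices made here. The sparse coding that produces those arrays is not shown.

```python
import numpy as np

def concat_and_reduce(layer_reps, n_components):
    """Concatenate per-layer representations and project them with PCA.

    layer_reps: list of arrays, each (n_frames, dim_i). Assumed to be the
    representations from the sparse layers (hypothetical input format).
    """
    # Stack the layer outputs side by side: (n_frames, sum of dim_i).
    features = np.concatenate(layer_reps, axis=1)
    # PCA via SVD of the mean-centred feature matrix.
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    # Keep the leading principal directions.
    return centered @ vt[:n_components].T
```

The projected columns come out ordered by decreasing variance, so truncating to `n_components` keeps the directions that explain most of the concatenated feature's variance.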

Proceedings ArticleDOI
15 Sep 2019
TL;DR: This paper develops a gradient based approach to estimate the relevance of each speech sample input on the output score, and shows that analysis of the resulting “relevance signal” through conventional speech signal processing techniques can reveal the information modeled by the whole network.
Abstract: Directly modeling raw waveforms through neural networks for speech processing is gaining more and more attention. Despite its varied success, a question that remains is: what kind of information are such neural networks capturing or learning for different tasks from the speech signal? Such an insight is not only interesting for advancing those techniques but also for better understanding speech signal characteristics. This paper takes a step in that direction, where we develop a gradient based approach to estimate the relevance of each speech sample input on the output score. We show that analysis of the resulting “relevance signal” through conventional speech signal processing techniques can reveal the information modeled by the whole network. We demonstrate the potential of the proposed approach by analyzing raw waveform CNN-based phone recognition and speaker identification systems.

25 citations
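
The idea of a gradient-based "relevance signal" in the abstract above can be illustrated with a toy model. The sketch below assumes a two-layer ReLU network standing in for the raw-waveform CNN; the names `score` and `relevance` and the parameters `W1`, `b1`, `w2` are hypothetical, and the paper's actual procedure may differ in detail. It only shows how the gradient of an output score with respect to each input sample yields one relevance value per sample.

```python
import numpy as np

def score(x, W1, b1, w2):
    """Scalar output score of a toy network: w2 . relu(W1 x + b1)."""
    return float(w2 @ np.maximum(W1 @ x + b1, 0.0))

def relevance(x, W1, b1, w2):
    """Gradient of the score w.r.t. each input sample (same shape as x)."""
    # ReLU passes gradient only through units with positive pre-activation.
    active = (W1 @ x + b1) > 0
    # Chain rule: d(score)/dx = W1^T (w2 * active).
    return W1.T @ (w2 * active)
```

Because the result has the same length as the input frame, it can be treated as a signal in its own right and examined with ordinary tools such as a spectrum estimate.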

Journal ArticleDOI
TL;DR: The proposed novel unsupervised voiced/nonvoiced (V/NV) detection method attempts to exploit the fact that there is significant glottal activity during production of voiced speech while the same is not true for nonvoiced speech, and provides compelling evidence of the effectiveness of sparse feature vector for V/NV detection.

20 citations

Journal ArticleDOI
TL;DR: Compared to the existing state-of-the-art methods, the proposed method has much lower computational complexity but performs similarly on various pattern classification tasks.

19 citations

Proceedings ArticleDOI
13 May 2013
TL;DR: This work shows a comparative analysis of different sparse basis & measurement matrices which can be used in speech/audio processing and gives a detailed analysis of the performance bounds, compression ratios, reconstruction errors, etc., which should be considered when designing CS-based speech applications.
Abstract: Reconstruction of a signal based on the Compressed Sensing (CS) framework relies on knowledge of the sparse basis & measurement matrix used for sensing. While most studies so far focus on applications of CS in fields such as imaging, radar, and astronomy, we present our work on the application of CS to speech/audio processing. This work shows a comparative analysis of different sparse basis & measurement matrices which can be used in speech/audio processing. Our work gives a detailed analysis of the performance bounds, compression ratios, reconstruction errors, etc., which should be considered when designing CS-based speech applications.

17 citations
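
The quantities named in the abstract above (measurement matrix, sparse basis, compression ratio, reconstruction error) can be made concrete with a small sketch. Everything specific here is an assumption for illustration only: a Gaussian measurement matrix, a signal that is sparse in the identity basis, orthogonal matching pursuit (OMP) as the reconstruction algorithm, and the hypothetical names `omp` and `cs_demo`. It does not reproduce the matrices or bases compared in the paper.

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal matching pursuit: greedily select up to k columns of A."""
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        # Pick the column most correlated with the current residual.
        idx = int(np.argmax(np.abs(A.T @ residual)))
        if idx not in support:
            support.append(idx)
        # Least-squares fit of y on the columns selected so far.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x_hat = np.zeros(A.shape[1])
    x_hat[support] = coef
    return x_hat

def cs_demo(n=256, m=96, k=4, seed=0):
    """Sense a k-sparse length-n signal with m measurements and reconstruct it."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    # Nonzero entries with magnitude in [1, 2) and random sign.
    x[rng.choice(n, size=k, replace=False)] = (
        rng.choice([-1.0, 1.0], size=k) * rng.uniform(1.0, 2.0, size=k)
    )
    Phi = rng.normal(size=(m, n)) / np.sqrt(m)  # Gaussian measurement matrix
    y = Phi @ x                                 # compressive measurements
    x_hat = omp(Phi, y, k)
    rel_err = np.linalg.norm(x - x_hat) / np.linalg.norm(x)
    return m / n, rel_err                       # compression ratio, error
```

Calling `cs_demo()` returns the compression ratio `m / n` and the relative reconstruction error. Varying `m`, `k`, or the construction of `Phi` gives a rough feel for the trade-offs the abstract refers to.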


Cited by
Journal ArticleDOI
TL;DR: To bridge the gap between theory and practicality of CS, different CS acquisition strategies and reconstruction approaches are elaborated systematically in this paper.
Abstract: Compressive Sensing (CS) is a new sensing modality, which compresses the signal being acquired at the time of sensing. Signals can have sparse or compressible representation either in original domain or in some transform domain. Relying on the sparsity of the signals, CS allows us to sample the signal at a rate much below the Nyquist sampling rate. Also, the varied reconstruction algorithms of CS can faithfully reconstruct the original signal back from fewer compressive measurements. This fact has stimulated research interest toward the use of CS in several fields, such as magnetic resonance imaging, high-speed video acquisition, and ultrawideband communication. This paper reviews the basic theoretical concepts underlying CS. To bridge the gap between theory and practicality of CS, different CS acquisition strategies and reconstruction approaches are elaborated systematically in this paper. The major application areas where CS is currently being used are reviewed here. This paper also highlights some of the challenges and research directions in this field.

334 citations

01 Jan 2005

331 citations

Journal ArticleDOI
TL;DR: General‐purpose acoustic bird detection can achieve very high retrieval rates in remote monitoring data with no manual recalibration, and no pre‐training of the detector for the target species or the acoustic conditions in the target environment.
Abstract: Assessing the presence and abundance of birds is important for monitoring specific species as well as overall ecosystem health. Many birds are most readily detected by their sounds, and thus passive acoustic monitoring is highly appropriate. Yet acoustic monitoring is often held back by practical limitations such as the need for manual configuration, reliance on example sound libraries, low accuracy, low robustness, and limited ability to generalise to novel acoustic conditions. Here we report outcomes from a collaborative data challenge. We present new acoustic monitoring datasets, summarise the machine learning techniques proposed by challenge teams, conduct detailed performance evaluation, and discuss how such approaches to detection can be integrated into remote monitoring projects. Multiple methods were able to attain performance of around 88% AUC (area under the ROC curve), much higher performance than previous general‐purpose methods. With modern machine learning including deep learning, general‐purpose acoustic bird detection can achieve very high retrieval rates in remote monitoring data with no manual recalibration, and no pre‐training of the detector for the target species or the acoustic conditions in the target environment.

220 citations

Dissertation
01 Jan 2010
TL;DR: This research investigates the combination of domain adaptation, dictionary learning, object recognition, activity recognition, and shape representation in machine learning to solve the challenge of sparse representation in signal/Image processing.
Abstract: Research Interests Security and privacy: Active authentication, biometrics template protection, biometrics recognition. Computer vision: Domain adaptation, dictionary learning, object recognition, activity recognition, shape representation. Machine learning: Dimensionality reduction, clustering, kernel methods, weakly-supervised learning. Signal/Image processing: Sparse representation, compressive sampling, synthetic aperture radar imaging, millimeter wave imaging.

160 citations

Journal ArticleDOI
TL;DR: An enhanced fuzzy k-nearest neighbor (FKNN) method for the early detection of PD based upon vocal measurements was developed, and simulation results indicated the proposed approach outperformed the other five FKNN models based on BFO, particle swarm optimization, Genetic algorithms, fruit fly optimization, and firefly algorithm.
Abstract: Parkinson's disease (PD) is a common neurodegenerative disease, which has attracted more and more attention. Many artificial intelligence methods have been used for the diagnosis of PD. In this study, an enhanced fuzzy k-nearest neighbor (FKNN) method for the early detection of PD based upon vocal measurements was developed. The proposed method, an evolutionary instance-based learning approach termed CBFO-FKNN, was developed by coupling the chaotic bacterial foraging optimization with Gauss mutation (CBFO) approach with FKNN. The integration of the CBFO technique efficiently resolved the parameter tuning issues of the FKNN. The effectiveness of the proposed CBFO-FKNN was rigorously compared to those of the PD datasets in terms of classification accuracy, sensitivity, specificity, and AUC (area under the receiver operating characteristic curve). The simulation results indicated the proposed approach outperformed the other five FKNN models based on BFO, particle swarm optimization, Genetic algorithms, fruit fly optimization, and firefly algorithm, as well as three advanced machine learning methods including support vector machine (SVM), SVM with local learning-based feature selection, and kernel extreme learning machine in a 10-fold cross-validation scheme. The method presented in this paper has a very good prospect, which will bring great convenience to the clinicians to make a better decision in the clinical diagnosis.

97 citations