Proceedings Article
A discriminatively trained Hough Transform for frame-level phoneme recognition
Jonathan J. Dennis, Huy Dat Tran, Haizhou Li, Eng Siong Chng
pp. 2514-2518
TL;DR: This paper proposes an alternative to conventional MFCC or filterbank features based on the Generalised Hough Transform (GHT), a technique commonly used in image processing for object detection.
Abstract:
Despite recent advances in the use of Artificial Neural Network (ANN) architectures for automatic speech recognition (ASR), relatively little attention has been given to feature inputs other than MFCCs in such systems. In this paper, we propose an alternative to conventional MFCC or filterbank features based on the Generalised Hough Transform (GHT). The GHT is commonly used in image processing for object detection, where the idea is to learn the spatial distribution of a codebook of feature information relative to the location of the target class. During recognition, a simple weighted summation of the codebook activations is typically used to detect the presence of the target classes. Here we propose to learn this weighting discriminatively in an ANN, with the aim of minimising the static phone classification error at the output of the network. Because such an ANN is already common to hybrid ASR architectures, the output activations of the GHT can be regarded as a novel feature for ASR. Experimental results on the TIMIT phoneme recognition task demonstrate that the approach achieves state-of-the-art performance.
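The contrast between a fixed weighted summation of codebook activations and a discriminatively learned weighting can be illustrated with a small sketch. This is a hypothetical, minimal illustration in plain Python, not the authors' implementation. It assumes each frame has already been quantised to a single codebook index, treats a (codeword, relative time offset) pair as one "vote" for the current frame, and uses invented toy data. The count-based weighting (fraction of co-occurrences with each label) stands in for a conventional GHT vote weight, and a single softmax layer trained on cross-entropy stands in for the ANN described above. All function names (`activations`, `count_weights`, `train_discriminative`, etc.) are illustrative assumptions.

```python
import math

def activations(codewords, t, codebook_size, max_offset):
    """Hypothetical GHT-style activation vector for frame t.

    Assumption: one codebook index per frame. Feature (c, d) is 1.0 when
    codeword c occurs at frame t + d, i.e. a codeword seen at relative
    offset d casts a vote for frame t.
    """
    width = 2 * max_offset + 1
    x = [0.0] * (codebook_size * width)
    for d in range(-max_offset, max_offset + 1):
        u = t + d
        if 0 <= u < len(codewords):
            x[codewords[u] * width + d + max_offset] = 1.0
    return x

def count_weights(xs, labels, num_classes):
    """Count-based weighting: w[k][j] = fraction of frames with feature j active that carry label k."""
    dim = len(xs[0])
    counts = [[0.0] * dim for _ in range(num_classes)]
    totals = [0.0] * dim
    for x, y in zip(xs, labels):
        for j, v in enumerate(x):
            if v:
                counts[y][j] += 1.0
                totals[j] += 1.0
    return [[counts[k][j] / totals[j] if totals[j] else 0.0 for j in range(dim)]
            for k in range(num_classes)]

def scores(w, x):
    """Per-class score: weighted summation of the active codebook votes."""
    return [sum(wk[j] * x[j] for j in range(len(x))) for wk in w]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def train_discriminative(xs, labels, num_classes, lr=0.5, epochs=2000):
    """Discriminative weighting: full-batch gradient descent on the mean
    cross-entropy of a single softmax layer over the same activations."""
    dim = len(xs[0])
    w = [[0.0] * dim for _ in range(num_classes)]
    n = len(xs)
    for _ in range(epochs):
        grad = [[0.0] * dim for _ in range(num_classes)]
        for x, y in zip(xs, labels):
            p = softmax(scores(w, x))
            for k in range(num_classes):
                g = (p[k] - (1.0 if k == y else 0.0)) / n
                for j, v in enumerate(x):
                    if v:
                        grad[k][j] += g * v
        for k in range(num_classes):
            for j in range(dim):
                w[k][j] -= lr * grad[k][j]
    return w

def mean_loss(w, xs, labels):
    """Mean cross-entropy of the softmax over class scores."""
    return -sum(math.log(softmax(scores(w, x))[y]) for x, y in zip(xs, labels)) / len(xs)

def predict(w, x):
    """Frame label = class with the highest summed vote."""
    s = scores(w, x)
    return max(range(len(s)), key=s.__getitem__)

# Toy example (invented data): one codeword index and one phone label per frame.
codewords = [0, 0, 0, 1, 1, 1, 2, 2, 2]
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
xs = [activations(codewords, t, codebook_size=3, max_offset=1) for t in range(len(codewords))]

w_counts = count_weights(xs, labels, num_classes=3)
w_disc = train_discriminative(xs, labels, num_classes=3)
print([predict(w_counts, x) for x in xs])
print([predict(w_disc, x) for x in xs])
print(round(mean_loss(w_disc, xs, labels), 3))
```

On this toy sequence both weightings label every frame correctly. The difference is in how the weights arise: the count-based weights are fixed by co-occurrence statistics, whereas the discriminative weights are adjusted to reduce classification error directly, which is the property the abstract relies on when it treats the network outputs as features for a hybrid system.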