A discriminatively trained Hough Transform for frame-level phoneme recognition

doi:10.1109/ICASSP.2014.6854053

Proceedings Article

Jonathan J. Dennis, +3 more

- pp 2514-2518

TLDR

This paper proposes an alternative to conventional MFCC or filterbank features based on the Generalised Hough Transform (GHT), a technique commonly used in image processing for object detection.

Abstract:

Despite recent advances in the use of Artificial Neural Network (ANN) architectures for automatic speech recognition (ASR), relatively little attention has been given to using feature inputs beyond MFCCs in such systems. In this paper, we propose an alternative to conventional MFCC or filterbank features, using an approach based on the Generalised Hough Transform (GHT). The GHT is a common approach used in the field of image processing for the task of object detection, where the idea is to learn the spatial distribution of a codebook of feature information relative to the location of the target class. During recognition, a simple weighted summation of the codebook activations is commonly used to detect the presence of the target classes. Here we propose to learn the weighting discriminatively in an ANN, where the aim is to optimise the static phone classification error at the output of the network. As such an ANN is common to hybrid ASR architectures, the output activations from the GHT can be considered as a novel feature for ASR. Experimental results on the TIMIT phoneme recognition task demonstrate the state-of-the-art performance of the approach.
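The conventional recognition step described in the abstract, a weighted summation of codebook activations, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the `(class_index, frame_offset, weight)` vote layout, and all numbers are hypothetical. Each codebook entry that fires at some frame casts weighted votes for a class at an offset frame, and the votes are accumulated per class and per frame.

```python
# Illustrative sketch only: names, data layout, and values are assumptions,
# not taken from the paper.

def hough_scores(activations, votes, num_classes):
    """Accumulate weighted codebook votes into per-class, per-frame scores.

    activations: one list of per-frame activations for each codebook entry.
    votes: for each codebook entry, a list of (class_index, frame_offset,
           weight) tuples describing where that entry votes relative to
           the frame at which it fires.
    """
    num_frames = len(activations[0])
    scores = [[0.0] * num_frames for _ in range(num_classes)]
    for entry_acts, entry_votes in zip(activations, votes):
        for frame, act in enumerate(entry_acts):
            if act == 0.0:
                continue  # inactive entries cast no votes
            for class_index, offset, weight in entry_votes:
                target = frame + offset
                if 0 <= target < num_frames:
                    scores[class_index][target] += weight * act
    return scores


def classify_frames(scores):
    """Pick the highest-scoring class at each frame (ties go to the lower index)."""
    num_frames = len(scores[0])
    return [
        max(range(len(scores)), key=lambda c: scores[c][t])
        for t in range(num_frames)
    ]


# Toy example: two codebook entries, four frames, two classes.
activations = [
    [1.0, 0.0, 0.0, 0.0],  # entry 0 fires at frame 0
    [0.0, 0.0, 1.0, 0.0],  # entry 1 fires at frame 2
]
votes = [
    [(0, 1, 0.5)],                # entry 0 votes for class 0, one frame later
    [(0, -1, 0.5), (1, 0, 0.8)],  # entry 1 votes for class 0 one frame earlier, class 1 in place
]
scores = hough_scores(activations, votes, num_classes=2)
labels = classify_frames(scores)
```

In this sketch the vote weights are fixed by hand. The abstract's proposal is instead to learn the weighting discriminatively inside an ANN trained to minimise phone classification error, which the sketch does not attempt to reproduce.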
