Book Chapter

Improving Recognition of Speech System Using Multimodal Approach

TL;DR: The proposed system, which combines normal-microphone, throat-microphone, and visual features, achieves 94% recognition accuracy, outperforming the unimodal and bimodal ASR systems.
Abstract: Building an ASR system for adverse conditions is a challenging task. ASR performance is high in clean environments, but variabilities such as speaker effects, transmission effects, and environmental conditions degrade recognition performance. One way to make an ASR system more robust is to use multiple sources of information about speech. In this work, two additional sources of information are used to build a multimodal ASR system: throat microphone speech and visual lip reading, both of which are less susceptible to noise, act as alternate sources of information. Mel-frequency cepstral features are extracted from the throat signal and modeled by an HMM. Pixel-based transformation methods (DCT and DWT) are used to extract features from the visemes in the video data, which are also modeled by an HMM. Throat and visual features are combined at the feature level. The study uses an English digit database, and experiments are carried out for both the unimodal and the combined systems. The combined normal and throat microphone features give 86.5% recognition accuracy, and visual speech features combined with the normal microphone give 84% accuracy. The proposed system, which combines normal, throat, and visual features, achieves 94% recognition accuracy, better than the unimodal and bimodal ASR systems.
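
The feature-level combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the number of retained DCT coefficients, and the assumption that the acoustic and visual streams are already aligned frame by frame are all hypothetical, and the acoustic vectors stand in for MFCCs computed elsewhere.

```python
import math

def dct_1d(x):
    """Unnormalized DCT-II of a sequence of numbers."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
            for k in range(n)]

def dct_2d(block):
    """Separable 2-D DCT-II: transform every row, then every column."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    # cols[v][u] holds coefficient (u, v); transpose back to [u][v].
    return [list(r) for r in zip(*cols)]

def visual_features(mouth_roi, keep=3):
    """Keep the keep x keep low-frequency coefficients (assumed choice)."""
    coeffs = dct_2d(mouth_roi)
    return [coeffs[u][v] for u in range(keep) for v in range(keep)]

def fuse(acoustic_frames, visual_frames):
    """Feature-level fusion: concatenate the per-frame vectors."""
    if len(acoustic_frames) != len(visual_frames):
        raise ValueError("streams must be aligned to the same number of frames")
    return [a + v for a, v in zip(acoustic_frames, visual_frames)]

# Hypothetical data: one frame of 2 acoustic coefficients and a flat 4x4 mouth region.
acoustic = [[1.0, 2.0]]
visual = [visual_features([[1.0] * 4 for _ in range(4)], keep=2)]
fused = fuse(acoustic, visual)  # one 6-dimensional vector per frame
```

The fused vectors would then be passed to a single model (an HMM in the abstract), in contrast to decision-level fusion, where each stream is modeled separately and only the scores are combined.
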
Citations
Book Chapter
08 Nov 2022
TL;DR: Neuro-OCR is an interactive book reader for blind people based on optical character recognition (OCR); its camera-based architecture helps blind people read text on labels, printed notes, and objects.
Abstract: Everyone deserves to live freely, even those who are impaired. In recent decades, technology has focused on empowering disabled people to have as much control over their lives as possible. The braille system, which allows the blind to read, is currently the only effective system available. However, this approach is time-consuming, and it takes a long time to recognize the text. Our goal is to cut down on the time it takes to read. This article presents an interactive book reader for blind people based on optical character recognition. Optical character recognition is among the most effective applications of artificial intelligence and pattern recognition. A simple content reader needs to be accessible, inexpensive, and easily obtainable by the public. The framework is made up of a camera-based architecture that aids blind people in reading text on labels, printed notes, and objects. Text-to-speech (TTS), OCR, image processing methods, and a synthesis module are all part of our framework. Neuro-OCR provides a complete text read-out device suited to the visually handicapped. We used Google Tesseract for OCR and Pico for TTS in our work. The voice output is then sent to the Telegram application, where the user receives it.

6 citations

Journal Article
TL;DR: A Three-Layered Perceptron Neural Network model that is trained using a variety of evolutionary as well as quantum-inspired evolutionary algorithms for the classification of Parkinson's Disease is presented.
Abstract: Parkinson’s Disease is a degenerative neurological disorder with unknown origins, making it impossible to cure or even diagnose. The following article presents a Three-Layered Perceptron Neural Network model that is trained using a variety of evolutionary as well as quantum-inspired evolutionary algorithms for the classification of Parkinson's Disease. Optimization algorithms such as Particle Swarm Optimization, Artificial Bee Colony Algorithm and Bat Algorithm are studied along with their quantum-inspired counterparts in order to identify the best-suited algorithm for Neural Network Weight Distribution. The results show that the quantum-inspired evolutionary algorithms perform better under the given circumstances, with qABC offering the highest accuracy of about 92.3%. The presented model can be used not only for disease diagnosis but is also likely to find applications in various other fields.
Journal Article
TL;DR: In this article, the authors study the exclusive influence of the Lombard effect on automatic speech recognition (ASR) systems, with a view to building robust multimodal ASR systems for adverse environments in the context of Indian languages, which are syllabic in nature.
References
Journal Article
TL;DR: This paper describes a face detection framework that is capable of processing images extremely rapidly while achieving high detection rates. Implemented on a conventional desktop, face detection proceeds at 15 frames per second.
Abstract: This paper describes a face detection framework that is capable of processing images extremely rapidly while achieving high detection rates. There are three key contributions. The first is the introduction of a new image representation called the “Integral Image” which allows the features used by our detector to be computed very quickly. The second is a simple and efficient classifier which is built using the AdaBoost learning algorithm (Freund and Schapire, 1995) to select a small number of critical visual features from a very large set of potential features. The third contribution is a method for combining classifiers in a “cascade” which allows background regions of the image to be quickly discarded while spending more computation on promising face-like regions. A set of experiments in the domain of face detection is presented. The system yields face detection performance comparable to the best previous systems (Sung and Poggio, 1998; Rowley et al., 1998; Schneiderman and Kanade, 2000; Roth et al., 2000). Implemented on a conventional desktop, face detection proceeds at 15 frames per second.

13,037 citations
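
The "Integral Image" named in the abstract above can be sketched in a few lines. This is an illustrative sketch rather than the paper's code: the function names are hypothetical, and the two-rectangle feature is one simple example of the Haar-like rectangle features the detector evaluates. The key property is that, once the table is built, the sum over any rectangle costs four lookups regardless of its size.

```python
def integral_image(img):
    """Return a zero-padded table where ii[y][x] is the sum of img[0:y][0:x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0  # running sum of the current row
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]

def two_rect_feature(ii, top, left, height, width):
    """Left half minus right half of a rectangle (width assumed even)."""
    half = width // 2
    return (rect_sum(ii, top, left, height, half)
            - rect_sum(ii, top, left + half, height, half))

# Hypothetical 3x3 image.
img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
total = rect_sum(ii, 0, 0, 3, 3)           # 45, the sum of all pixels
feature = two_rect_feature(ii, 0, 0, 2, 2)  # (1 + 4) - (2 + 5) = -2
```
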

Proceedings Article
07 Jul 2001
TL;DR: A new image representation called the “Integral Image” is introduced which allows the features used by the detector to be computed very quickly and a method for combining classifiers in a “cascade” which allows background regions of the image to be quickly discarded while spending more computation on promising face-like regions.
Abstract: This paper describes a face detection framework that is capable of processing images extremely rapidly while achieving high detection rates. There are three key contributions. The first is the introduction of a new image representation called the "Integral Image" which allows the features used by our detector to be computed very quickly. The second is a simple and efficient classifier which is built using the AdaBoost learning algorithm (Freund and Schapire, 1995) to select a small number of critical visual features from a very large set of potential features. The third contribution is a method for combining classifiers in a "cascade" which allows background regions of the image to be quickly discarded while spending more computation on promising face-like regions. A set of experiments in the domain of face detection is presented. The system yields face detection performance comparable to the best previous systems (Sung and Poggio, 1998; Rowley et al., 1998; Schneiderman and Kanade, 2000; Roth et al., 2000). Implemented on a conventional desktop, face detection proceeds at 15 frames per second.

10,592 citations

Journal Article
TL;DR: In this paper, the authors provide an up-to-date critical survey of still- and video-based face recognition research and offer some insights into the studies of machine recognition of faces.
Abstract: As one of the most successful applications of image analysis and understanding, face recognition has recently received significant attention, especially during the past several years. At least two reasons account for this trend: the first is the wide range of commercial and law enforcement applications, and the second is the availability of feasible technologies after 30 years of research. Even though current machine recognition systems have reached a certain level of maturity, their success is limited by the conditions imposed by many real applications. For example, recognition of face images acquired in an outdoor environment with changes in illumination and/or pose remains a largely unsolved problem. In other words, current systems are still far away from the capability of the human perception system. This paper provides an up-to-date critical survey of still- and video-based face recognition research. There are two underlying motivations for us to write this survey paper: the first is to provide an up-to-date review of the existing literature, and the second is to offer some insights into the studies of machine recognition of faces. To provide a comprehensive survey, we not only categorize existing recognition techniques but also present detailed descriptions of representative methods within each category. In addition, relevant topics such as psychophysical studies, system evaluation, and issues of illumination and pose variation are covered.

6,384 citations

Journal Article
TL;DR: A common theoretical framework for combining classifiers which use distinct pattern representations is developed and it is shown that many existing schemes can be considered as special cases of compound classification where all the pattern representations are used jointly to make a decision.
Abstract: We develop a common theoretical framework for combining classifiers which use distinct pattern representations and show that many existing schemes can be considered as special cases of compound classification where all the pattern representations are used jointly to make a decision. An experimental comparison of various classifier combination schemes demonstrates that the combination rule developed under the most restrictive assumptions, the sum rule, outperforms other classifier combination schemes. A sensitivity analysis of the various schemes to estimation errors is carried out to show that this finding can be justified theoretically.

5,670 citations
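
The sum rule named in the abstract above can be sketched as follows. This is a simplified illustration, not the paper's formulation: it assumes equal class priors (so the rule reduces to adding the per-classifier posterior estimates), and the function names, class labels, and probability values are hypothetical. The product rule is included only for contrast.

```python
import math

def sum_rule(posteriors):
    """Pick the class with the largest summed posterior.

    posteriors: one dict per classifier, mapping class label to P(class | x_i).
    Equal class priors are assumed, so the prior term is dropped.
    """
    classes = list(posteriors[0])
    scores = {c: sum(p[c] for p in posteriors) for c in classes}
    return max(scores, key=scores.get), scores

def product_rule(posteriors):
    """Pick the class with the largest product of posteriors (for contrast)."""
    classes = list(posteriors[0])
    scores = {c: math.prod(p[c] for p in posteriors) for c in classes}
    return max(scores, key=scores.get), scores

# Hypothetical outputs of three classifiers for one spoken digit.
# The third classifier is badly wrong about "one".
outputs = [
    {"one": 0.90, "nine": 0.10},
    {"one": 0.80, "nine": 0.20},
    {"one": 0.01, "nine": 0.99},
]
sum_label, sum_scores = sum_rule(outputs)        # "one"  (1.71 vs 1.29)
prod_label, prod_scores = product_rule(outputs)  # "nine" (0.0072 vs 0.0198)
```

In this example a single near-zero estimate is enough to flip the product rule, while the sum rule still follows the two classifiers that agree, which is in line with the sensitivity to estimation errors that the abstract mentions.
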

01 Jan 2000
TL;DR: Speech Reference EPFL-CONF-82637 is presented, which describes the development of a framework for future generations of interpreters to understand and respond to audible language barriers.
Abstract: Keywords: speech Reference EPFL-CONF-82637 Related documents: http://publications.idiap.ch/index.php/publications/showcite/glotinidiaprr0035 Record created on 2006-03-10, modified on 2017-05-10

255 citations