Improving Recognition of Speech System Using Multimodal Approach

doi:10.1007/978-981-13-2354-6_42


Book Chapter

N. Radha¹, A. Shahina¹, A. Nayeemulla Khan²

Sri Sivasubramaniya Nadar College of Engineering¹, VIT University²

01 Jan 2019, pp. 397-410

TL;DR: The proposed system, which combines normal-microphone, throat-microphone, and visual features, achieves 94% recognition accuracy, outperforming the unimodal and bimodal ASR systems.

Abstract: Building an ASR system for adverse conditions is a challenging task. ASR performance is high in clean environments, but variabilities such as speaker effects, transmission effects, and environmental conditions degrade recognition performance. One way to enhance the robustness of an ASR system is to use multiple sources of information about speech. In this work, two additional sources of information on speech are used to build a multimodal ASR system: throat microphone speech and visual lip reading, both of which are less susceptible to noise. Mel-frequency cepstral features are extracted from the throat signal and modeled by HMM. Pixel-based transformation methods (DCT and DWT) are used to extract features from the visemes in the video data, which are also modeled by HMM. Throat and visual features are combined at the feature level. The proposed system has improved recognition accuracy compared to the unimodal systems. The digit database for the English language is used for the study. Experiments are carried out for both the unimodal systems and the combined systems. The combined normal and throat microphone features give 86.5% recognition accuracy. Visual speech features combined with the normal microphone produce 84% accuracy. The proposed system, which combines normal, throat, and visual features, achieves 94% recognition accuracy, which is better than the unimodal and bimodal ASR systems.
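The abstract states only that DCT-based visual features and acoustic features are combined at the feature level. A minimal Python sketch of what that can look like follows. Everything specific in it is an assumption made for illustration, not the authors' pipeline: the tiny 2x2 lip regions, the choice to keep four low-frequency DCT coefficients, the repeat-nearest-frame alignment, the plain concatenation, and the placeholder vectors standing in for MFCCs.

```python
import math

def dct2(block):
    """Unnormalised 2-D DCT-II of a rectangular grayscale block (list of rows)."""
    rows, cols = len(block), len(block[0])
    out = [[0.0] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            total = 0.0
            for x in range(rows):
                for y in range(cols):
                    total += (block[x][y]
                              * math.cos(math.pi * (2 * x + 1) * u / (2 * rows))
                              * math.cos(math.pi * (2 * y + 1) * v / (2 * cols)))
            out[u][v] = total
    return out

def visual_features(roi, keep=2):
    """Keep the top-left keep x keep (low-frequency) DCT coefficients."""
    coeffs = dct2(roi)
    return [coeffs[u][v] for u in range(keep) for v in range(keep)]

def align_to_audio(visual_frames, n_audio_frames):
    """Repeat video frames so there is one visual vector per audio frame."""
    n_video = len(visual_frames)
    return [visual_frames[min(i * n_video // n_audio_frames, n_video - 1)]
            for i in range(n_audio_frames)]

def fuse(audio_frames, visual_frames):
    """Feature-level fusion: concatenate the per-frame vectors."""
    aligned = align_to_audio(visual_frames, len(audio_frames))
    return [a + v for a, v in zip(audio_frames, aligned)]

# Hypothetical toy data: 4 audio frames of 3 placeholder "MFCC" values each,
# and 2 video frames, each a 2x2 lip region of pixel intensities.
audio = [[0.1 * i, 0.2 * i, 0.3 * i] for i in range(4)]
rois = [[[10, 10], [10, 10]], [[0, 20], [0, 20]]]
visual = [visual_features(roi) for roi in rois]
fused = fuse(audio, visual)
print(len(fused), len(fused[0]))
```

Each fused frame is a single vector that one HMM can model, which is the practical difference from decision-level fusion, where separate models are scored and their outputs combined afterwards.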


Citations


Book Chapter

Assistive System for the Blind with Voice Output Based on Optical Character Recognition

R. Jeya

08 Nov 2022

TL;DR: Neuro-OCR is an interactive book reader for blind people based on optical character recognition (OCR). It uses a camera-based architecture that helps blind people read text on labels, printed notes, and objects.

Abstract: Everyone deserves to live freely, even those who are impaired. In recent decades, technology has focused on empowering disabled people to have as much control over their lives as possible. The braille system, which allows the blind to read, is now the only effective system available. However, this approach is time-consuming, and it takes a long time to recognize the text. Our goal is to cut down on the time it takes to read. We created an interactive book reader for blind people based on optical character recognition, which is among the most effective applications of artificial intelligence and pattern recognition. A simple content reader needs to be accessible, inexpensive, and easily obtainable by the public. The framework is made up of a camera-based architecture that aids blind people in reading text on labels, printed notes, and objects. Text-to-speech (TTS), OCR, image processing methods, and a synthesis module are all part of our framework. Neuro-OCR provides a complete text read-out device suited to the visually handicapped. We used Google Tesseract for OCR and Pico for TTS in our work. The voice output is then sent to the Telegram application, where the user receives it.

6 citations

Journal Article

Quantum-Inspired Evolutionary Algorithms for Neural Network Weight Distribution: A Classification Model for Parkinson's Disease

Srishti Sahni¹, Vaibhav Aggarwal¹, Ashish Khanna¹, Deepak Gupta¹, Siddhartha Bhattacharyya

Maharaja Agrasen Institute of Technology¹

18 Dec 2020, Journal of Information and Organizational Sciences

TL;DR: A Three-Layered Perceptron Neural Network model that is trained using a variety of evolutionary as well as quantum-inspired evolutionary algorithms for the classification of Parkinson's Disease is presented.


Abstract: Parkinson's Disease is a degenerative neurological disorder with unknown origins, making it impossible to be cured or even diagnosed. The following article presents a Three-Layered Perceptron Neural Network model that is trained using a variety of evolutionary as well as quantum-inspired evolutionary algorithms for the classification of Parkinson's Disease. Optimization algorithms such as Particle Swarm Optimization, Artificial Bee Colony Algorithm and Bat Algorithm are studied along with their quantum-inspired counterparts in order to identify the best-suited algorithm for Neural Network Weight Distribution. The results show that the quantum-inspired evolutionary algorithms perform better under the given circumstances, with qABC offering the highest accuracy of about 92.3%. The presented model can be used for disease diagnosis and is also likely to find applications in various other fields.
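To make the idea of training perceptron weights with a swarm optimizer concrete, here is a minimal Python sketch. It uses plain Particle Swarm Optimization, not the quantum-inspired variants the article studies, and it fits a hypothetical 2-2-1 network to XOR toy data rather than any Parkinson's dataset. The network size, the inertia and acceleration coefficients, the position bound, and all function names are assumptions chosen for illustration.

```python
import math
import random

# Toy XOR data: ((input1, input2), target). Purely illustrative.
DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
N_HIDDEN = 2
N_WEIGHTS = N_HIDDEN * 3 + (N_HIDDEN + 1)  # hidden weights+biases, output weights+bias

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, x):
    """Forward pass of the input-hidden-output perceptron from a flat weight vector."""
    hidden = []
    for h in range(N_HIDDEN):
        w1, w2, b = weights[3 * h: 3 * h + 3]
        hidden.append(sigmoid(w1 * x[0] + w2 * x[1] + b))
    out = weights[3 * N_HIDDEN:]
    return sigmoid(sum(w * a for w, a in zip(out, hidden)) + out[-1])

def loss(weights):
    """Mean squared error over the toy data; this is the PSO fitness."""
    return sum((predict(weights, x) - y) ** 2 for x, y in DATA) / len(DATA)

def pso(n_particles=20, n_iters=100, inertia=0.7, c1=1.5, c2=1.5, bound=10.0, seed=0):
    rng = random.Random(seed)
    pos = [[rng.uniform(-1, 1) for _ in range(N_WEIGHTS)] for _ in range(n_particles)]
    vel = [[0.0] * N_WEIGHTS for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_loss = [loss(p) for p in pos]
    g = min(range(n_particles), key=best_loss.__getitem__)
    g_pos, g_loss = best_pos[g][:], best_loss[g]
    history = [g_loss]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(N_WEIGHTS):
                r1, r2 = rng.random(), rng.random()
                # Pull towards the particle's own best and the swarm's best.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                # Clamp positions so the sigmoid arguments stay small.
                pos[i][d] = max(-bound, min(bound, pos[i][d] + vel[i][d]))
            current = loss(pos[i])
            if current < best_loss[i]:
                best_pos[i], best_loss[i] = pos[i][:], current
                if current < g_loss:
                    g_pos, g_loss = pos[i][:], current
        history.append(g_loss)
    return g_pos, history

weights, history = pso()
print(f"loss before: {history[0]:.4f}, after: {history[-1]:.4f}")
```

The swarm only ever evaluates the loss, never its gradient, which is why the same loop can be reused with a different position-update rule when comparing optimizers.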


Journal Article

A multimodal Lombard speech recognition system for the confusable Hindi syllabic units

S. Uma Maheswari, N. Radha, A. Shahina, P. J. Thanusu Prabha, B.T. Preethi Sri, A. Nayeemulla Khan

01 Jun 2022, Materials Today: Proceedings

TL;DR: The authors study specifically the influence of the Lombard effect on automatic speech recognition (ASR) systems, with the aim of building robust multimodal ASR systems for adverse environments in the context of Indian languages, which are syllabic in nature.

References


Journal Article

Robust Real-Time Face Detection

Paul A. Viola¹, Michael Jones²

Microsoft¹, Mitsubishi Electric²

01 May 2004, International Journal of Computer Vision

TL;DR: A face detection framework capable of processing images extremely rapidly while achieving high detection rates is described. Implemented on a conventional desktop, face detection runs at 15 frames per second.

Abstract: This paper describes a face detection framework that is capable of processing images extremely rapidly while achieving high detection rates. There are three key contributions. The first is the introduction of a new image representation called the “Integral Image” which allows the features used by our detector to be computed very quickly. The second is a simple and efficient classifier which is built using the AdaBoost learning algorithm (Freund and Schapire, 1995) to select a small number of critical visual features from a very large set of potential features. The third contribution is a method for combining classifiers in a “cascade” which allows background regions of the image to be quickly discarded while spending more computation on promising face-like regions. A set of experiments in the domain of face detection is presented. The system yields face detection performance comparable to the best previous systems (Sung and Poggio, 1998; Rowley et al., 1998; Schneiderman and Kanade, 2000; Roth et al., 2000). Implemented on a conventional desktop, face detection proceeds at 15 frames per second.
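The "Integral Image" idea can be shown in a few lines of Python. The sketch below is illustrative and is not the authors' implementation: the zero-padded layout, the function names, and the single two-rectangle feature are assumptions. What it demonstrates is the property the abstract relies on: once the table is built, the sum over any rectangle needs only four lookups, regardless of the rectangle's size.

```python
def integral_image(img):
    """ii[r][c] holds the sum of img over all rows < r and columns < c.

    One extra row and column of zeros avoids special cases at the border.
    """
    rows, cols = len(img), len(img[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0
        for c in range(cols):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]

def two_rect_feature(ii, top, left, height, width):
    """Left half minus right half: a horizontal two-rectangle Haar-like feature."""
    half = width // 2
    return (rect_sum(ii, top, left, height, half)
            - rect_sum(ii, top, left + half, height, half))

img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12]]
ii = integral_image(img)
print(rect_sum(ii, 0, 0, 3, 4))          # whole image: 78
print(rect_sum(ii, 1, 1, 2, 2))          # 6 + 7 + 10 + 11 = 34
print(two_rect_feature(ii, 0, 0, 3, 4))  # 33 - 45 = -12
```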


13,037 citations

Proceedings Article

Robust real-time face detection

Paul A. Viola¹, Michael Jones²

Microsoft¹, Mitsubishi Electric²

07 Jul 2001

TL;DR: Introduces a new image representation, the “Integral Image”, which allows the features used by the detector to be computed very quickly, and a method for combining classifiers in a “cascade”, which allows background regions of the image to be quickly discarded while more computation is spent on promising face-like regions.

Abstract: This paper describes a face detection framework that is capable of processing images extremely rapidly while achieving high detection rates. There are three key contributions. The first is the introduction of a new image representation called the "Integral Image" which allows the features used by our detector to be computed very quickly. The second is a simple and efficient classifier which is built using the AdaBoost learning algorithm (Freund and Schapire, 1995) to select a small number of critical visual features from a very large set of potential features. The third contribution is a method for combining classifiers in a "cascade" which allows background regions of the image to be quickly discarded while spending more computation on promising face-like regions. A set of experiments in the domain of face detection is presented. The system yields face detection performance comparable to the best previous systems (Sung and Poggio, 1998; Rowley et al., 1998; Schneiderman and Kanade, 2000; Roth et al., 2000). Implemented on a conventional desktop, face detection proceeds at 15 frames per second.
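The early-rejection behaviour of a classifier cascade can be sketched as follows in Python. This is a simplified illustration under stated assumptions: the "features" are hypothetical stand-ins (mean brightness and contrast of a flat pixel list) rather than Haar-like features, and the thresholds and weights are hand-picked rather than learned with AdaBoost. Only the control flow is the point: a window is discarded as soon as any stage rejects it, so most background windows cost one cheap stage.

```python
def mean(window):
    return sum(window) / len(window)

def contrast(window):
    return max(window) - min(window)

def make_stage(weak_classifiers, stage_threshold):
    """Build one stage from (feature_fn, threshold, weight) weak classifiers.

    The stage accepts a window when the summed weight of the weak
    classifiers that fire reaches stage_threshold.
    """
    def stage(window):
        score = sum(weight for feature_fn, threshold, weight in weak_classifiers
                    if feature_fn(window) >= threshold)
        return score >= stage_threshold
    return stage

def cascade_classify(window, stages):
    """Return (accepted, number_of_stages_evaluated)."""
    for index, stage in enumerate(stages, start=1):
        if not stage(window):
            return False, index  # rejected early; later stages never run
    return True, len(stages)

stages = [
    make_stage([(mean, 50, 1.0)], 1.0),                         # cheap first filter
    make_stage([(contrast, 30, 0.6), (mean, 80, 0.6)], 1.0),    # stricter second stage
]

background = [5, 5, 5, 5]
flat_bright = [90, 90, 90, 90]
face_like = [60, 120, 90, 100]

print(cascade_classify(background, stages))
print(cascade_classify(flat_bright, stages))
print(cascade_classify(face_like, stages))
```

In this toy run the dark background window is dropped after one stage, the flat bright window after two, and only the high-contrast window passes every stage.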


10,592 citations

Journal Article

Face recognition: A literature survey

W. Zhao¹, Rama Chellappa², P. J. Phillips³, Azriel Rosenfeld²

Sarnoff Corporation¹, University of Maryland, College Park², National Institute of Standards and Technology³

01 Dec 2003, ACM Computing Surveys

TL;DR: The authors provide an up-to-date critical survey of still- and video-based face recognition research and offer some insights into the studies of machine recognition of faces.

Abstract: As one of the most successful applications of image analysis and understanding, face recognition has recently received significant attention, especially during the past several years. At least two reasons account for this trend: the first is the wide range of commercial and law enforcement applications, and the second is the availability of feasible technologies after 30 years of research. Even though current machine recognition systems have reached a certain level of maturity, their success is limited by the conditions imposed by many real applications. For example, recognition of face images acquired in an outdoor environment with changes in illumination and/or pose remains a largely unsolved problem. In other words, current systems are still far away from the capability of the human perception system. This paper provides an up-to-date critical survey of still- and video-based face recognition research. There are two underlying motivations for us to write this survey paper: the first is to provide an up-to-date review of the existing literature, and the second is to offer some insights into the studies of machine recognition of faces. To provide a comprehensive survey, we not only categorize existing recognition techniques but also present detailed descriptions of representative methods within each category. In addition, relevant topics such as psychophysical studies, system evaluation, and issues of illumination and pose variation are covered.

6,384 citations

Journal Article

On combining classifiers

Josef Kittler¹, M. Hatef², Robert P. W. Duin³, Jiri Matas¹

University of Surrey¹, ERA Technology Ltd², Delft University of Technology³

01 Mar 1998, IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: A common theoretical framework for combining classifiers which use distinct pattern representations is developed and it is shown that many existing schemes can be considered as special cases of compound classification where all the pattern representations are used jointly to make a decision.


Abstract: We develop a common theoretical framework for combining classifiers which use distinct pattern representations and show that many existing schemes can be considered as special cases of compound classification where all the pattern representations are used jointly to make a decision. An experimental comparison of various classifier combination schemes demonstrates that the combination rule developed under the most restrictive assumptions (the sum rule) outperforms other classifier combination schemes. A sensitivity analysis of the various schemes to estimation errors is carried out to show that this finding can be justified theoretically.
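A small Python sketch can illustrate how the sum rule differs from a product rule when one classifier's estimate is off. It assumes equal class priors, under which the sum rule amounts to choosing the class with the largest summed posterior and the product rule the class with the largest product of posteriors. The three posterior vectors are made-up numbers, not results from the paper.

```python
import math

def sum_rule(posteriors):
    """Pick the class with the largest sum of per-classifier posteriors."""
    n_classes = len(posteriors[0])
    scores = [sum(p[k] for p in posteriors) for k in range(n_classes)]
    return max(range(n_classes), key=scores.__getitem__), scores

def product_rule(posteriors):
    """Pick the class with the largest product of per-classifier posteriors."""
    n_classes = len(posteriors[0])
    scores = [math.prod(p[k] for p in posteriors) for k in range(n_classes)]
    return max(range(n_classes), key=scores.__getitem__), scores

# Hypothetical posteriors P(class | x_i) from three classifiers over two classes.
# The third classifier gives class 0 a near-zero estimate.
posteriors = [
    [0.90, 0.10],
    [0.80, 0.20],
    [0.01, 0.99],
]

sum_choice, sum_scores = sum_rule(posteriors)
prod_choice, prod_scores = product_rule(posteriors)
print("sum rule picks class", sum_choice)
print("product rule picks class", prod_choice)
```

Here two of the three classifiers clearly favour class 0 and the sum rule follows them, while the single near-zero estimate is enough to make the product rule choose class 1. That sensitivity to one poor estimate is the kind of behaviour a sensitivity analysis of combination schemes examines.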


5,670 citations

Audio-visual speech recognition


Chalapathy Neti, Gerasimos Potamianos, Juergen Luettin, Iain Matthews, Hervé Glotin, D. Vergyri, J. Sison, A. Mashari

01 Jan 2000


Abstract: Keywords: speech
Reference: EPFL-CONF-82637
Related documents: http://publications.idiap.ch/index.php/publications/showcite/glotinidiaprr0035
Record created on 2006-03-10, modified on 2017-05-10

255 citations