Author

Vinay Kumar Mittal

Bio: Vinay Kumar Mittal is an academic researcher from the Indian Institutes of Information Technology. The author has contributed to research on speech production and speech processing, has an h-index of 14, and has co-authored 53 publications that have received 600 citations. Previous affiliations of Vinay Kumar Mittal include the International Institute of Information Technology, Hyderabad, and the International Institute of Information Technology.

Papers
Proceedings ArticleDOI
01 Dec 2015
TL;DR: This paper proposes a multi-functional ‘Smart Home Automation System’ (SHAS), where users can use voice-commands to control their home-appliances and gadgets, for different functionalities and purposes, aimed to be cost-effective, flexible and robust.
Abstract: The current availability of interactive technology infrastructure, such as internet bandwidth, increased processing power and connectivity through mobile devices at affordable costs, has led to new concepts related to human living. Smart cities, smart life and the internet of things are a few such evolving research domains. A prominent concept among these is the ‘Smart Home’, which involves automation and interactive technologies. This paper proposes a multi-functional ‘Smart Home Automation System’ (SHAS), where users can use voice-commands to control their home-appliances and gadgets, for different functionalities and purposes. The proposed system can be adapted to a user's voice and recognise the voice-commands, independent of the speaker's personal characteristics such as accent. The system is aimed to be cost-effective, flexible and robust. The voice command recognition is achieved using a dedicated hardware module and an Arduino micro-controller board for command processing and control. Performance evaluation is carried out by developing a multi-functional miniature prototype of the SHAS. Results of the experiments conducted are quite promising. The prototype SHAS can be used for converting existing homes into smart homes conveniently and at relatively low cost.

72 citations

Journal ArticleDOI
TL;DR: It is shown that the closed phase behavior of the excitation at different loudness levels can be seen in the temporal variation of spectral energy in the low frequency (LF) (<400 Hz) region.
Abstract: In this paper characteristics of speech produced at different loudness levels are analyzed in terms of changes in the glottal excitation. Four loudness levels are considered in this study, namely, soft, normal, loud, and shout. The distinct changes in the excitation of the shout signal are analyzed using electroglottograph signals. The open and closed phases of the glottal vibration are distinctly different for shout signals, in comparison with those for normal speech. It is generally difficult to derive the glottal pulse information from the speech signal due to limitations in inverse filtering. Hence, the effects of changes in the excitation are examined by analyzing the speech signal using methods that can capture the temporal variations of the spectral features. In particular, the recently proposed methods of zero-frequency filtering and zero-time liftering are used in this analysis. It is shown that the closed phase behavior of the excitation at different loudness levels can be seen in the temporal variation of spectral energy in the low frequency (LF) (<400 Hz) region. The ratio of the LF to high frequency energy clearly discriminates the speech produced at different loudness levels. These distinctions in the excitation features are also observed in different vowel contexts and across several speakers.
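The LF-to-HF energy ratio mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' method: the paper relies on zero-frequency filtering and zero-time liftering, whereas the sketch below swaps in a plain DFT over a single frame. Only the 400 Hz boundary comes from the abstract; the function names, the 8 kHz sampling rate, the frame length and the synthetic test tones are assumptions made for illustration.

```python
import cmath
import math


def band_energy_ratio(frame, sample_rate, cutoff_hz=400.0):
    """Return the ratio of spectral energy below cutoff_hz to energy at or above it.

    Uses a plain DFT of one frame (an assumption; the paper uses other methods).
    """
    n = len(frame)
    lf_energy = 0.0
    hf_energy = 0.0
    # Walk the bins from just above DC up to the Nyquist frequency.
    for k in range(1, n // 2 + 1):
        freq = k * sample_rate / n
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)
        )
        energy = abs(coeff) ** 2
        if freq < cutoff_hz:
            lf_energy += energy
        else:
            hf_energy += energy
    return lf_energy / hf_energy if hf_energy > 0 else float("inf")


def tone(freq_hz, sample_rate, n, amplitude=1.0):
    """Generate n samples of a sine wave (hypothetical test signal)."""
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
        for i in range(n)
    ]


sample_rate = 8000  # assumed sampling rate in Hz
n = 400             # 50 ms frame, so each DFT bin is 20 Hz wide

low = tone(200, sample_rate, n)    # component below the 400 Hz boundary
high = tone(2000, sample_rate, n)  # component above the boundary

lf_dominant = [a + 0.1 * b for a, b in zip(low, high)]
hf_dominant = [0.1 * a + b for a, b in zip(low, high)]

ratio_lf = band_energy_ratio(lf_dominant, sample_rate)
ratio_hf = band_energy_ratio(hf_dominant, sample_rate)
print(ratio_lf, ratio_hf)
```

On real speech the ratio would be computed frame by frame, giving a contour whose temporal variation can then be compared across loudness levels.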

61 citations

Proceedings ArticleDOI
01 Dec 2015
TL;DR: Two applications of fingerprint biometric are proposed, an Access Control System (ACS) for person-specific door access and a Classroom Attendance Management System (CAMS) that uses fingerprint as biometric feature for classroom attendance.
Abstract: Fingerprint is a reliable biometric feature having a wide range of applications that require authentication. Person-specific verification is needed in many scenarios such as access-control, classroom attendance and financial transactions etc. In this paper, two applications of fingerprint biometric are proposed. An Access Control System (ACS) prototype is demonstrated for person-specific door access, using a fingerprinting device. Another prototype of a Classroom Attendance Management System (CAMS) is developed that uses fingerprint as biometric feature for classroom attendance. The CAMS consists of modules for database, web-user interface and views at multiple levels of access. Both systems are expected to mitigate the shortcomings of alternative existing systems, and eliminate the possibilities of spoofing or proxy. These systems store fingerprints along with the date/time-stamp for each user. Fingerprints are stored dynamically in a database for computing the different statistics, e.g., month-wise or semester-wise trends in the case of CAMS. The CAMS can also provide a solution to the problem of late-coming. Experiments are conducted for measuring the recognition accuracy, i.e., fingerprint match. The initial results of recognition accuracy at 87% for ACS and 92% for CAMS are encouraging. The proposed systems can be further scaled-up for real-time deployment, in applications such as employee attendance and controlled access to high-security areas etc.
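The month-wise statistics and late-coming checks described for CAMS could be derived from time-stamped match records roughly as follows. This is a hedged sketch only: the record layout (a user ID paired with an ISO 8601 timestamp), the function names and the 09:00 class start are assumptions, not details taken from the paper.

```python
from collections import Counter
from datetime import datetime, time


def first_scans(records):
    """Map (user_id, date) to the earliest scan time on that date."""
    earliest = {}
    for user_id, stamp in records:
        ts = datetime.fromisoformat(stamp)
        key = (user_id, ts.date())
        if key not in earliest or ts < earliest[key]:
            earliest[key] = ts
    return earliest


def monthly_attendance(records):
    """Count distinct attended days per (user_id, 'YYYY-MM')."""
    counts = Counter()
    for user_id, day in first_scans(records):
        counts[(user_id, day.strftime("%Y-%m"))] += 1
    return dict(counts)


def late_arrivals(records, class_start=time(9, 0)):
    """List (user_id, date) pairs whose first scan came after class_start."""
    late = [
        (user_id, day.isoformat())
        for (user_id, day), ts in first_scans(records).items()
        if ts.time() > class_start
    ]
    return sorted(late)


# Hypothetical records: (user ID, timestamp of a successful fingerprint match).
records = [
    ("s01", "2015-08-03T08:55:00"),
    ("s01", "2015-08-03T08:56:10"),  # repeat scan on the same day
    ("s01", "2015-08-04T09:12:00"),
    ("s02", "2015-08-03T09:01:00"),
    ("s02", "2015-09-01T08:50:00"),
]

print(monthly_attendance(records))
print(late_arrivals(records))
```

Keeping only the first scan per user per day means repeated scans do not inflate the attendance count, and the same value doubles as the arrival time for the late-coming check.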

39 citations

Journal ArticleDOI
TL;DR: In this paper, the production characteristics of laughter are analysed at call and bout levels using EGG and speech signals, and parameters representing the degree of change and temporal changes in the production features are derived to study the characteristics that discriminate laughter from normal speech.

38 citations

Journal ArticleDOI
TL;DR: Effects of constriction can indeed be observed in the features of glottal vibration as well as vocal tract resonances, as shown in the studies on speech and electroglottograph signals.
Abstract: Characteristics of glottal vibration are affected by the obstruction to the flow of air through the vocal tract system. The obstruction to the airflow is determined by the nature, location, and extent of constriction in the vocal tract during production of voiced sounds. The effects of constriction on glottal vibration are examined for six different categories of speech sounds having varying degrees of constriction. The effects are examined in terms of source and system features derived from the speech and electroglottograph signals. It is observed that a high degree of constriction causing obstruction to the flow of air results in large changes in these features, relative to the adjacent steady vowel regions, as in the case of apical trill and alveolar fricative sounds. These changes are insignificant when the obstruction to the airflow is less, as in the case of velar fricative and lateral approximant sounds. There are no changes in the excitation features when there is a free flow of air along the auxiliary tract, despite constriction in the vocal tract, as in the case of nasals. These studies show that effects of constriction can indeed be observed in the features of glottal vibration as well as vocal tract resonances.

33 citations


Cited by
09 Mar 2012
TL;DR: Artificial neural networks (ANNs) constitute a class of flexible nonlinear models designed to mimic biological neural systems; they have been widely used in computer vision applications.
Abstract: Artificial neural networks (ANNs) constitute a class of flexible nonlinear models designed to mimic biological neural systems. In this entry, we introduce ANN using familiar econometric terminology and provide an overview of ANN modeling approach and its implementation methods. † Correspondence: Chung-Ming Kuan, Institute of Economics, Academia Sinica, 128 Academia Road, Sec. 2, Taipei 115, Taiwan; ckuan@econ.sinica.edu.tw. †† I would like to express my sincere gratitude to the editor, Professor Steven Durlauf, for his patience and constructive comments on early drafts of this entry. I also thank Shih-Hsun Hsu and Yu-Lieh Huang for very helpful suggestions. The remaining errors are all mine.

2,069 citations

Journal ArticleDOI
01 Oct 1980

1,565 citations

Journal ArticleDOI
TL;DR: A review is conducted to map the research landscape of smart homes based on the Internet of Things (IoT) into a coherent taxonomy. It identifies the basic characteristics of this emerging field in the following aspects: the motivation for using IoT in smart home applications, open challenges hindering utilization, and recommendations in the literature to improve the acceptance and use of smart home IoT applications.

413 citations

Journal ArticleDOI
TL;DR: In this survey paper, vision-based pedestrian detection systems are analysed based on their field of application, acquisition technology, computer vision techniques and classification strategies, and the reported results highlight the importance of testing pedestrian detection systems on different datasets to evaluate the robustness of the computed groups of features used as input to classifiers.

349 citations

Journal ArticleDOI
TL;DR: The authors contribute their open-sourced cough detection algorithm to the research community to assist in data robustness assessment. Four experienced physicians labeled more than 2,800 recordings to diagnose medical abnormalities present in the coughs, thereby contributing one of the largest expert-labeled cough datasets in existence.
Abstract: Cough audio signal classification has been successfully used to diagnose a variety of respiratory conditions, and there has been significant interest in leveraging Machine Learning (ML) to provide widespread COVID-19 screening. However, there is currently no validated database of cough sounds with which to train such ML models. The COUGHVID dataset provides over 20,000 crowdsourced cough recordings representing a wide range of subject ages, genders, geographic locations, and COVID-19 statuses. First, we filtered the dataset using our open-sourced cough detection algorithm. Second, experienced pulmonologists labeled more than 2,000 recordings to diagnose medical abnormalities present in the coughs, thereby contributing one of the largest expert-labeled cough datasets in existence that can be used for a plethora of cough audio classification tasks. Finally, we ensured that coughs labeled as symptomatic and COVID-19 originate from countries with high infection rates, and that their expert labels are consistent. As a result, the COUGHVID dataset contributes a wealth of cough recordings for training ML models to address the world's most urgent health crises.

177 citations