Author

N. P. Narendra

Bio: N. P. Narendra is an academic researcher from Aalto University. The author has contributed to research topics including Speech synthesis and Augmented reality. The author has an h-index of 8 and has co-authored 43 publications receiving 287 citations. Previous affiliations of N. P. Narendra include Harvard University and Indian Institute of Technology Kharagpur.

Papers
Journal ArticleDOI
TL;DR: The subjective and objective measures indicate that the proposed features and methods have improved the quality of the synthesized speech from stage-2 to stage-4.
Abstract: This paper presents the design and development of an unrestricted text-to-speech synthesis (TTS) system for the Bengali language. An unrestricted TTS system is capable of synthesizing good-quality speech in different domains. In this work, syllables are used as the basic units for synthesis. The Festival framework has been used for building the TTS system. Speech collected from a female artist is used as the speech corpus. Initially, speech from five speakers is collected and a prototype TTS system is built for each of the five speakers. The best speaker among the five is selected through subjective and objective evaluation of natural and synthesized waveforms. The unrestricted TTS system is then developed by addressing the issues involved at each stage of producing a good-quality synthesizer. Evaluation is carried out in four stages by conducting objective and subjective listening tests on synthesized speech. At the first stage, the TTS system is built with the basic Festival framework. In the following stages, additional features are incorporated into the system and the quality of synthesis is evaluated. The subjective and objective measures indicate that the proposed features and methods have improved the quality of the synthesized speech from stage-2 to stage-4.
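The abstract describes syllable-based unit-selection synthesis but does not spell out how candidate units are chosen. The sketch below is not from the paper or from Festival; the feature vectors, cost functions, and the select_units helper are illustrative assumptions. It only shows the core idea: per-syllable target costs plus concatenation (join) costs, minimized over a path of candidates with dynamic programming.

```python
import numpy as np

def select_units(target_feats, candidates):
    """Pick one candidate per target syllable by minimizing target + join cost.

    target_feats: list of 1-D feature vectors, one per target syllable.
    candidates: per-syllable lists of candidate feature vectors.
    """
    n = len(target_feats)
    # cumulative cost for each candidate of the first syllable: target cost only
    cost = [np.array([np.linalg.norm(c - target_feats[0]) for c in candidates[0]])]
    back = []
    for i in range(1, n):
        tgt = np.array([np.linalg.norm(c - target_feats[i]) for c in candidates[i]])
        # join cost between every (previous, current) candidate pair;
        # plain Euclidean distance stands in for a spectral join measure
        join = np.array([[np.linalg.norm(c - p) for p in candidates[i - 1]]
                         for c in candidates[i]])
        total = cost[-1][None, :] + join          # shape (current, previous)
        back.append(total.argmin(axis=1))
        cost.append(tgt + total.min(axis=1))
    # backtrack the lowest-cost path of candidate indices
    path = [int(cost[-1].argmin())]
    for b in reversed(back):
        path.append(int(b[path[-1]]))
    return list(reversed(path))

# toy usage: 3 target syllables, 2 recorded candidates each, 4-dim features
rng = np.random.default_rng(0)
targets = [rng.normal(size=4) for _ in range(3)]
cands = [[rng.normal(size=4) for _ in range(2)] for _ in range(3)]
print(select_units(targets, cands))   # prints one candidate index per syllable
```

A real unit-selection synthesizer uses richer linguistic target features and spectral join costs, but the search has the same Viterbi-style structure.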

65 citations

Proceedings ArticleDOI
01 Nov 2013
TL;DR: A consortium effort on building text-to-speech (TTS) systems for 13 Indian languages using the same common framework is described, and the TTS systems are evaluated using the degradation Mean Opinion Score (DMOS) and the Word Error Rate (WER).
Abstract: In this paper, we discuss a consortium effort on building text-to-speech (TTS) systems for 13 Indian languages. There are about 1652 Indian languages. A unified framework is therefore required for building TTS systems for Indian languages. As Indian languages are syllable-timed, a syllable-based framework is developed. As the quality of speech synthesis is of paramount interest, unit-selection synthesizers are built. Building TTS systems for low-resource languages requires that the data be carefully collected and annotated, as the database has to be built from scratch. Various criteria have to be addressed while building the database, namely, speaker selection, pronunciation variation, optimal text selection, handling of out-of-vocabulary words, and so on. The various characteristics of the voice that affect speech synthesis quality are first analysed. Next, the design of the corpus for each of the Indian languages is tabulated. The collected data is labeled at the syllable level using a semiautomatic labeling tool. Text-to-speech synthesizers are built for all 13 languages, namely, Hindi, Tamil, Marathi, Bengali, Malayalam, Telugu, Kannada, Gujarati, Rajasthani, Assamese, Manipuri, Odia and Bodo, using the same common framework. The TTS systems are evaluated using the degradation Mean Opinion Score (DMOS) and the Word Error Rate (WER). An average DMOS score of ≈3.0 and an average WER of about 20 % are observed across all the languages.
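The WER figure reported above is the standard word-level edit distance between a listener's transcript of the synthesized utterance and the reference text. The sketch below is a generic implementation of that metric, not the consortium's evaluation scripts.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# one substituted word out of four -> WER = 0.25
print(wer("the quick brown fox", "the quick brown box"))
```

A WER of about 20 %, as reported across the 13 languages, means roughly one word in five in the listeners' transcripts is substituted, inserted, or deleted relative to the reference text.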

42 citations

Journal ArticleDOI
TL;DR: The evaluation of both approaches demonstrates that the automatic detection of pathological voice from healthy speech benefits from using glottal source information.
Abstract: Automatic methods for the detection of pathological voice from healthy speech can be considered potential clinical tools for medical treatment. This study investigates the effectiveness of glottal source information in the detection of pathological voice by comparing the classical pipeline approach to the end-to-end approach. The traditional pipeline approach consists of a feature extractor and a separate classifier. In the former, two sets of glottal features (computed using the quasi-closed phase glottal inverse filtering method) are used together with the widely used openSMILE features. Using both the glottal and openSMILE features extracted from voice utterances and the corresponding healthy/pathology labels, support vector machine (SVM) classifiers are trained. In building the end-to-end systems, both raw speech signals and raw glottal flow waveforms are used to train two deep learning architectures: (1) a combination of a convolutional neural network (CNN) and a multilayer perceptron (MLP), and (2) a combination of a CNN and a long short-term memory (LSTM) network. Experiments were carried out using three publicly available databases, including dysarthric (the UA-Speech database and the TORGO database) and dysphonic voices (the UPM database). The performance analysis of the detection system based on the traditional pipeline approach showed the best results when the glottal features were combined with the baseline openSMILE features. The results of the end-to-end approach indicated higher accuracies (about a 2-3 % improvement in all three databases) when the glottal flow was used as the raw time-domain input (87.93 % for UA-Speech, 81.12 % for TORGO and 76.66 % for UPM) compared to using the raw speech waveform (85.12 % for UA-Speech, 78.83 % for TORGO and 73.71 % for UPM). The evaluation of both approaches demonstrates that the automatic detection of pathological voice from healthy speech benefits from using glottal source information.
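A minimal sketch of the pipeline branch described above, assuming the glottal and openSMILE features have already been extracted elsewhere; the feature matrices, dimensions, and RBF settings below are illustrative placeholders, not the authors' configuration.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_glottal = rng.normal(size=(200, 12))    # e.g. 12 glottal parameters per utterance
X_opensmile = rng.normal(size=(200, 88))  # e.g. 88 openSMILE functionals per utterance
y = rng.integers(0, 2, size=200)          # 0 = healthy, 1 = pathological

# Combining the two feature sets, as in the best-performing pipeline system,
# amounts to a column-wise concatenation before classification.
X = np.hstack([X_glottal, X_opensmile])

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
scores = cross_val_score(clf, X, y, cv=5)
print("mean cross-validation accuracy:", scores.mean())  # near chance on random data
```

With real utterance-level features and labels in place of the random arrays, the same pipeline produces the healthy-versus-pathological decision the abstract evaluates.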

40 citations

Journal ArticleDOI
TL;DR: In this article, the use of voice source information in the detection of Parkinson's disease from speech is studied using two classifier architectures: the traditional pipeline approach and the end-to-end approach.
Abstract: Developing automatic methods to detect Parkinson's disease (PD) from speech has attracted increasing interest, as these techniques can potentially be used in telemonitoring health applications. This article studies the utilization of voice source information in the detection of PD using two classifier architectures: the traditional pipeline approach and the end-to-end approach. The former consists of feature extraction and classifier stages. In feature extraction, the baseline acoustic features (consisting of articulation, phonation, and prosody features) were computed, and voice source information was extracted using glottal features that were estimated by the iterative adaptive inverse filtering (IAIF) and quasi-closed phase (QCP) glottal inverse filtering methods. Support vector machine classifiers were developed utilizing the baseline and glottal features extracted from every speech utterance and the corresponding healthy/PD labels. The end-to-end approach uses deep learning models which were trained using both raw speech waveforms and raw voice source waveforms. In the latter, two glottal inverse filtering methods (IAIF and QCP) and the zero frequency filtering method were utilized. The deep learning architecture consists of a combination of convolutional layers followed by a multilayer perceptron. Experiments were performed using the PC-GITA speech database. From the traditional pipeline systems, the highest classification accuracy (67.93%) was given by the combination of baseline and QCP-based glottal features. From the end-to-end systems, the highest accuracy (68.56%) was given by the system trained using QCP-based glottal flow signals. Even though the classification accuracies were modest for all systems, the study is encouraging, as the extraction of voice source information was found to be most effective in both approaches.
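A rough sketch of the end-to-end branch described above: 1-D convolutional layers over a raw time-domain input (a speech waveform or an estimated glottal flow signal) followed by a small multilayer perceptron for the healthy/PD decision. The layer sizes, kernel widths, and the one-second 16 kHz input are illustrative assumptions, not the architecture from the paper.

```python
import torch
import torch.nn as nn

class RawWaveformClassifier(nn.Module):
    def __init__(self, n_classes: int = 2):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=64, stride=8), nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=16, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),          # collapse the time axis
        )
        self.mlp = nn.Sequential(
            nn.Linear(32, 64), nn.ReLU(),
            nn.Linear(64, n_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, samples) raw waveform, e.g. a QCP-estimated glottal flow
        h = self.conv(x).squeeze(-1)          # (batch, 32)
        return self.mlp(h)                    # (batch, n_classes) logits

# toy forward pass on one second of 16 kHz input
model = RawWaveformClassifier()
logits = model(torch.randn(4, 1, 16000))
print(logits.shape)                           # torch.Size([4, 2])
```

Feeding the network glottal flow signals instead of raw speech, as in the best end-to-end system reported above, changes only the input waveform, not the architecture.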

34 citations

Journal ArticleDOI
TL;DR: The results showed that the glottal features in combination with the openSMILE-based acoustic features resulted in improved classification accuracies, which validates the complementary nature of the glottal features.

31 citations


Cited by
Christopher M. Bishop
01 Jan 2006
TL;DR: Probability distributions, linear models for regression and classification, neural networks, kernel methods, graphical models, approximate inference, sampling methods, and the combining of models are covered in this machine learning textbook.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

10,141 citations

Journal ArticleDOI
01 Oct 1980

1,565 citations

Journal ArticleDOI
TL;DR: An IAR system architecture is proposed that combines cloudlets and fog computing, which reduce latency and accelerate rendering tasks while offloading compute-intensive tasks from the cloud.
Abstract: Shipbuilding companies are upgrading their inner workings in order to create Shipyards 4.0, where the principles of Industry 4.0 are paving the way to further digitalized and optimized processes in an integrated network. Among the different Industry 4.0 technologies, this paper focuses on augmented reality, whose application in the industrial field has led to the concept of industrial augmented reality (IAR). This paper first describes the basics of IAR and then carries out a thorough analysis of the latest IAR systems for industrial and shipbuilding applications. Then, in order to build a practical IAR system for shipyard workers, the main hardware and software solutions are compared. Finally, as a conclusion after reviewing all the aspects related to IAR for shipbuilding, the paper proposes an IAR system architecture that combines cloudlets and fog computing, which reduce latency and accelerate rendering tasks while offloading compute-intensive tasks from the cloud.
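A hypothetical sketch (not from the paper) of the tier-selection logic an architecture like this implies: send a rendering or recognition task to the nearest tier (cloudlet, fog node, or remote cloud) whose estimated round-trip latency plus compute time fits the task's deadline. The tier numbers and the choose_tier helper are made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class Tier:
    name: str
    rtt_ms: float        # network round-trip time to this tier
    speed_factor: float  # compute speed relative to the local device

# nearest-first ordering: cloudlet on the LAN, fog node, then remote cloud
TIERS = [
    Tier("cloudlet", rtt_ms=5.0, speed_factor=4.0),
    Tier("fog", rtt_ms=15.0, speed_factor=8.0),
    Tier("cloud", rtt_ms=80.0, speed_factor=20.0),
]

def choose_tier(local_compute_ms: float, deadline_ms: float) -> str:
    """Pick the closest tier that meets the deadline; fall back to the cloud."""
    for tier in TIERS:
        total_ms = tier.rtt_ms + local_compute_ms / tier.speed_factor
        if total_ms <= deadline_ms:
            return tier.name
    return "cloud"

# a 200 ms rendering task with a 40 ms budget lands on the fog tier here
print(choose_tier(local_compute_ms=200.0, deadline_ms=40.0))
```

The cloudlet and fog tiers absorb latency-critical work close to the worker, which is the motivation the abstract gives for combining them with the cloud.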

283 citations

Journal ArticleDOI
TL;DR: The landscape of MAR, its past, and its future prospects with respect to 5G systems and the complementary MEC technology are discussed, and an informative analysis of the network formation of current and future MAR systems in terms of cloud, edge, localized, and hybrid architectural options is provided.
Abstract: Augmented Reality (AR) technology enhances the human perception of the world by combining the real environment with the virtual space. With the explosive growth of powerful, less expensive mobile devices and the emergence of sophisticated communication infrastructure, Mobile Augmented Reality (MAR) applications are gaining increased popularity. MAR allows users to run AR applications on mobile devices with greater mobility and at a lower cost. The emerging 5G communication technologies act as critical enablers for future MAR applications to achieve ultra-low latency and extremely high data rates, while Multi-access Edge Computing (MEC) brings enhanced computational power closer to the users to complement MAR. This paper extensively discusses the landscape of MAR, its past, and its future prospects with respect to 5G systems and the complementary MEC technology. The paper especially provides an informative analysis of the network formation of current and future MAR systems in terms of cloud, edge, localized, and hybrid architectural options. The paper discusses key application areas for MAR and their future with the advent of 5G technologies. The paper also discusses the requirements and limitations of MAR technical aspects such as communication, mobility management, energy management, service offloading and migration, security, and privacy, and analyzes the role of 5G technologies.

259 citations

31 Jul 1996
TL;DR: In this report, the authors investigated the use of modulation block codes as the inner code of a concatenated coding system in order to improve overall space-link communications performance, and identified and analyzed candidate codes that complement the performance of the overall coding system, which uses the interleaved RS (255,223) code as the outer code.
Abstract: This report describes the progress made towards the completion of a specific task on error-correcting coding. The proposed research consisted of investigating the use of modulation block codes as the inner code of a concatenated coding system in order to improve the overall space link communications performance. The study proposed to identify and analyze candidate codes that will complement the performance of the overall coding system which uses the interleaved RS (255,223) code as the outer code.
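The report studies a concatenated scheme: an outer interleaved RS(255,223) code with a modulation block code as the inner code. The sketch below illustrates only the concatenation-plus-interleaving structure; for brevity the RS code is replaced by a trivial single-parity stand-in and the inner modulation code by a 3x repetition code, so the codes themselves are placeholders, not the ones studied in the report.

```python
def outer_encode(block):
    # stand-in for the RS(255,223) outer code: append one parity symbol
    return block + [sum(block) % 256]

def interleave(blocks):
    # read column-wise across outer codewords so a burst of channel errors
    # is spread over several codewords
    return [b[i] for i in range(len(blocks[0])) for b in blocks]

def inner_encode(symbols):
    # stand-in for the inner modulation block code: repeat each symbol 3 times
    return [s for s in symbols for _ in range(3)]

def inner_decode(coded):
    # majority vote over each group of three received symbols
    out = []
    for i in range(0, len(coded), 3):
        trio = coded[i:i + 3]
        out.append(max(set(trio), key=trio.count))
    return out

data_blocks = [[1, 2, 3], [4, 5, 6]]
tx = inner_encode(interleave([outer_encode(b) for b in data_blocks]))
tx[4] ^= 0xFF                       # corrupt one transmitted symbol
rx = inner_decode(tx)
print(rx)                           # the inner code has corrected the error
```

In the real system the inner code cleans up channel errors before the interleaved RS(255,223) outer code removes whatever residual symbol errors remain, which is the performance interplay the report set out to analyze.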

179 citations