Proceedings Article
Building speech synthesis systems for Indian languages
Abhijit Pradhan, Anusha Prakash, S. Aswin Shanmugam, G. R. Kasthuri, Raghava Krishnan, Hema A. Murthy +5 more
- pp 1-6
TLDR
New efforts to build text-to-speech (TTS) synthesis systems for Indian languages are presented, and a semi-automatic, group-delay-based syllable segmentation tool is discussed, showing that automatic segmentation is preferred.
Abstract:
In this paper, new efforts to build text-to-speech (TTS) synthesis systems for Indian languages are presented. The synthesisers are built around both concatenative speech synthesis and statistical parametric speech synthesis frameworks. Text-to-speech synthesis systems require accurate segmentation, and obtaining accurate segmentation at the phone level is difficult. Manual segmentation leads to human errors, while automatic segmentation using statistical approaches (hidden Markov model based approaches) leads to poor boundary information when the amount of training data is small.
Citations
Proceedings Article
Data-Efficient Training Strategies for Neural TTS Systems
K R Prajwal, C. V. Jawahar +1 more
TL;DR: In this article, the authors demonstrate three simple yet effective pre-training strategies that allow them to train neural TTS systems with about one-tenth of the data otherwise needed, while also achieving better accuracy and naturalness.
Journal Article
Significance of spectral cues in automatic speech segmentation for Indian language speech synthesizers
TL;DR: In this article, signal processing cues such as short-term energy (STE) and sub-band spectral flux (SBSF) are used in tandem with HMM-based forced alignment for automatic speech segmentation.
Proceedings Article
Low-Resource End-to-end Sanskrit TTS using Tacotron2, WaveGlow and Transfer Learning
TL;DR: This article investigated fine-tuning the English-pretrained Tacotron2 model with limited Sanskrit data to synthesize natural-sounding Sanskrit speech in low-resource settings.
Proceedings Article
IndicSpeech: Text-to-Speech Corpus for Indian Languages
TL;DR: A 24-hour text-to-speech corpus for 3 major Indian languages, namely Hindi, Malayalam and Bengali, is released, and a state-of-the-art TTS system is trained for each of these languages.
Journal Article
Rules for Orthographic Word Parsing of the Philippines’ Cebuano-Visayan Language Using Context-Free Grammars
TL;DR: Grammar rules for hyphenated words are created, which include sequences of a hyphen between vowel-consonant, consonant-consonant, vowel-vowel, and consonants, to enhance the understanding and comprehension of the Cebuano-Visayan discourse.
References
Journal Article
Software for a cascade/parallel formant synthesizer
TL;DR: A software formant synthesizer is described that can generate synthetic speech using a laboratory digital computer. A control program lets the user specify variable control parameter data, such as formant frequencies as a function of time, as a sequence of 〈time, value〉 points.
Proceedings Article
Unit selection in a concatenative speech synthesis system using a large speech database
Andrew Hunt, Alan W. Black +1 more
TL;DR: In this paper, a state transition network is proposed to select and concatenate phonemes from a large speech database, producing a natural realisation of a target phoneme sequence that is predicted from text and annotated with prosodic and phonetic context information.
Proceedings Article
Speech parameter generation algorithms for HMM-based speech synthesis
TL;DR: A speech parameter generation algorithm for HMM-based speech synthesis, in which the speech parameter sequence is generated from HMMs whose observation vector consists of a spectral parameter vector and its dynamic feature vectors, is derived.
Proceedings Article
Automatically clustering similar units for unit selection in speech synthesis.
Alan W. Black, Paul Taylor +1 more
TL;DR: A new method is described for synthesizing speech by concatenating sub-word units from a database of labelled speech, in which units of the same phone class are automatically clustered based on their phonetic and prosodic context.
Journal Article
Speaking in shorthand - A syllable-centric perspective for understanding pronunciation variation
TL;DR: Systematic analysis of pronunciation variation in a corpus of spontaneous English discourse (Switchboard) demonstrates that the variation observed is more systematic at the level of the syllable than at the phonetic-segment level, and syllabic onsets are realized in canonical form far more frequently than either coda or nuclear constituents.