Proceedings Article

Building speech synthesis systems for Indian languages

TLDR
New efforts to build text-to-speech (TTS) synthesis systems for Indian languages are presented, and a semi-automatic syllable segmentation tool based on group delay is discussed, showing that automatic segmentation is preferred.
Abstract
In this paper, new efforts to build text-to-speech (TTS) synthesis systems for Indian languages are presented. The synthesisers are built around both concatenative speech synthesis and statistical parametric speech synthesis frameworks. Text-to-speech synthesis systems require accurate segmentation. Obtaining accurate segmentation at the phone level is a difficult task. Manual segmentation leads to human errors, while automatic segmentation using statistical approaches (hidden Markov model based approaches) leads to poor boundary information when the amount of data used for training is small.


Citations
Proceedings Article

Data-Efficient Training Strategies for Neural TTS Systems

TL;DR: In this article, the authors demonstrate three simple yet effective pre-training strategies that allow them to train neural TTS systems with only about one-tenth of the data otherwise needed, while also achieving better accuracy and naturalness.
Journal Article

Significance of spectral cues in automatic speech segmentation for Indian language speech synthesizers

TL;DR: In this article, signal processing cues like short-term energy (STE) and sub-band spectral flux (SBSF) are used in tandem with HMM based forced alignment for automatic speech segmentation.
Proceedings Article

Low-Resource End-to-end Sanskrit TTS using Tacotron2, WaveGlow and Transfer Learning

TL;DR: This article investigated the use of fine-tuning the English-pretrained Tacotron2 model with limited Sanskrit data to synthesize natural sounding speech in Sanskrit in low resource settings.
Proceedings Article

IndicSpeech: Text-to-Speech Corpus for Indian Languages

TL;DR: A 24 hour text-to-speech corpus for 3 major Indian languages namely Hindi, Malayalam and Bengali is released and a state-of-the-art TTS system for each of these languages is trained.
Journal Article

Rules for Orthographic Word Parsing of the Philippines’ Cebuano-Visayan Language Using Context-Free Grammars

TL;DR: Grammar rules for hyphenated words are created, which include sequences of a hyphen between vowel-consonant, consonant-consonant, vowel-vowel, and consonants, to enhance the understanding and comprehension of the Cebuano-Visayan discourse.
References
Journal Article

Software for a cascade/parallel formant synthesizer

TL;DR: A software formant synthesizer is described that can generate synthetic speech using a laboratory digital computer and a control program lets the user specify variable control parameter data, such as formant frequencies as a function of time, as a sequence of 〈time, value〉 points.
Proceedings Article

Unit selection in a concatenative speech synthesis system using a large speech database

TL;DR: In this paper, a state transition network is proposed to select and concatenate phonemes from a large speech database to produce a natural realisation of a target phoneme sequence predicted from text which is annotated with prosodic and phonetic context information.
Proceedings Article

Speech parameter generation algorithms for HMM-based speech synthesis

TL;DR: A speech parameter generation algorithm for HMM-based speech synthesis, in which the speech parameter sequence is generated from HMMs whose observation vector consists of a spectral parameter vector and its dynamic feature vectors, is derived.
Proceedings Article

Automatically clustering similar units for unit selection in speech synthesis.

TL;DR: A new method for synthesizing speech by concatenating sub-word units from a database of labelled speech by automatically clustering units of the same phone class based on their phonetic and prosodic context is described.
Journal Article

Speaking in shorthand - A syllable-centric perspective for understanding pronunciation variation

TL;DR: Systematic analysis of pronunciation variation in a corpus of spontaneous English discourse (Switchboard) demonstrates that the variation observed is more systematic at the level of the syllable than at the phonetic-segment level, and syllabic onsets are realized in canonical form far more frequently than either coda or nuclear constituents.