Building speech synthesis systems for Indian languages

doi:10.1109/NCC.2015.7084931

Proceedings ArticleDOI

- pp 1-6

TLDR

New efforts to build text-to-speech (TTS) synthesis systems for Indian languages are presented, and a semi-automatic, group-delay-based syllable segmentation tool is discussed, showing that automatic segmentation is preferred.

Abstract:

In this paper, new efforts to build text-to-speech (TTS) synthesis systems for Indian languages are presented. The synthesisers are built around both the concatenative and the statistical parametric speech synthesis frameworks. Text-to-speech synthesis systems require accurate segmentation, and obtaining accurate segmentation at the phone level is difficult. Manual segmentation leads to human errors, while automatic segmentation using statistical approaches (hidden Markov model based approaches) yields poor boundary information when the amount of training data is small.

Citations

Proceedings ArticleDOI

Data-Efficient Training Strategies for Neural TTS Systems

K R Prajwal, +1 more

TL;DR: In this article, the authors demonstrate three simple, yet effective pre-training strategies that allow them to train neural TTS systems with just about one-tenth of the data needs while also achieving better accuracy and naturalness.

Journal ArticleDOI

Significance of spectral cues in automatic speech segmentation for Indian language speech synthesizers

Arun Baby, +3 more

- 01 Oct 2020 -

Speech Communication

TL;DR: In this article, signal processing cues like short-term energy (STE) and sub-band spectral flux (SBSF) are used in tandem with HMM based forced alignment for automatic speech segmentation.
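The two signal-processing cues named in this summary can be illustrated with a minimal sketch. The formulation below is an assumption made for illustration, not the cited article's method: short-term energy is taken as the sum of squared samples per frame, and sub-band spectral flux as the L2 distance between successive normalised magnitude spectra restricted to one frequency band. The function names, frame sizes, band edges, and toy signal are all hypothetical, and the HMM forced-alignment stage is not shown.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (no padding)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def short_term_energy(frames):
    """Sum of squared samples in each frame."""
    return np.sum(frames ** 2, axis=1)

def sub_band_spectral_flux(frames, sr, band):
    """Assumed definition: L2 distance between successive normalised
    magnitude spectra, restricted to the band [lo, hi) in Hz."""
    window = np.hanning(frames.shape[1])
    mag = np.abs(np.fft.rfft(frames * window, axis=1))
    freqs = np.fft.rfftfreq(frames.shape[1], d=1.0 / sr)
    lo, hi = band
    sub = mag[:, (freqs >= lo) & (freqs < hi)]
    # Normalise each frame so the flux reflects spectral shape, not level.
    sub = sub / (np.linalg.norm(sub, axis=1, keepdims=True) + 1e-12)
    flux = np.zeros(len(sub))
    flux[1:] = np.linalg.norm(np.diff(sub, axis=0), axis=1)
    return flux

# Toy signal (hypothetical): 0.5 s of silence followed by 0.5 s of a 440 Hz tone.
sr = 16000
frame_len, hop = 400, 160  # 25 ms frames, 10 ms hop
t = np.arange(sr // 2) / sr
signal = np.concatenate([np.zeros(sr // 2), np.sin(2 * np.pi * 440 * t)])

frames = frame_signal(signal, frame_len, hop)
ste = short_term_energy(frames)
flux = sub_band_spectral_flux(frames, sr, band=(0.0, 1000.0))

# The frame with the largest flux is a candidate boundary cue.
boundary_frame = int(np.argmax(flux))
boundary_time = boundary_frame * hop / sr
print(f"candidate boundary near {boundary_time:.2f} s")
```

On this toy input the energy is zero throughout the silent half and the flux peaks where the tone begins. In a real system, cues of this kind would be combined with alignment output rather than used alone.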

Proceedings ArticleDOI

Low-Resource End-to-end Sanskrit TTS using Tacotron2, WaveGlow and Transfer Learning

Ankur Debnath, +3 more

TL;DR: This article investigated the use of fine-tuning the English-pretrained Tacotron2 model with limited Sanskrit data to synthesize natural sounding speech in Sanskrit in low resource settings.

Proceedings Article

IndicSpeech: Text-to-Speech Corpus for Indian Languages

Nimisha Srivastava, +3 more

TL;DR: A 24 hour text-to-speech corpus for 3 major Indian languages namely Hindi, Malayalam and Bengali is released and a state-of-the-art TTS system for each of these languages is trained.

Journal ArticleDOI

Rules for Orthographic Word Parsing of the Philippines’ Cebuano-Visayan Language Using Context-Free Grammars

Roseclaremath A. Caroro, +2 more

- 01 Apr 2020 -

International Journal of Software Scienc...

TL;DR: Grammar rules for hyphenated words are created, which include sequences of a hyphen between vowel-consonant, consonant-consonant, vowel-vowel, and consonants, to enhance the understanding and comprehension of the Cebuano-Visayan discourse.

References

Journal ArticleDOI

Software for a cascade/parallel formant synthesizer

Dennis H. Klatt

- 01 Mar 1980 -

Journal of the Acoustical Society of Ame...

TL;DR: A software formant synthesizer is described that can generate synthetic speech using a laboratory digital computer and a control program lets the user specify variable control parameter data, such as formant frequencies as a function of time, as a sequence of 〈time, value〉 points.

Proceedings ArticleDOI

Unit selection in a concatenative speech synthesis system using a large speech database

Andrew Hunt, +1 more

TL;DR: In this paper, a state transition network is proposed to select and concatenate phonemes from a large speech database to produce a natural realisation of a target phoneme sequence predicted from text which is annotated with prosodic and phonetic context information.

Proceedings ArticleDOI

Speech parameter generation algorithms for HMM-based speech synthesis

Keiichi Tokuda, +4 more

TL;DR: A speech parameter generation algorithm for HMM-based speech synthesis, in which the speech parameter sequence is generated from HMMs whose observation vector consists of a spectral parameter vector and its dynamic feature vectors, is derived.

Proceedings Article

Automatically clustering similar units for unit selection in speech synthesis.

Alan W. Black, +1 more

TL;DR: A new method for synthesizing speech by concatenating sub-word units from a database of labelled speech by automatically clustering units of the same phone class based on their phonetic and prosodic context is described.

Journal ArticleDOI

Speaking in shorthand - A syllable-centric perspective for understanding pronunciation variation

Steven Greenberg

- 01 Nov 1999 -

Speech Communication

TL;DR: Systematic analysis of pronunciation variation in a corpus of spontaneous English discourse (Switchboard) demonstrates that the variation observed is more systematic at the level of the syllable than at the phonetic-segment level, and syllabic onsets are realized in canonical form far more frequently than either coda or nuclear constituents.

Related Papers (5)

Arabic Speech Synthesis System Based on HMM

Automatic Speech Segmentation with HMM

Embedded Learning Segmentation Approach for Arabic Speech Recognition

Framework for cross-language automatic phonetic segmentation

Decision tree usage for incremental parametric speech synthesis