Multilingual processing of speech via web services

Question

Q1. What are the contributions in this paper?

Q2. What future work is planned in this paper?

Q3. What are the two tasks that occurred frequently?

Q4. What is the advantage of web services?

Q5. What are the uses of G2P converters?

Q6. What is the cost function for the alignment of two symbolic sequences?

Q7. What is the common framework that supports the definition of building blocks?

Q8. What are the two metadata descriptions used for the BAS web services?

Q9. What is the morphological analysis of the stress learning-by-analogy?

Q10. What are the main techniques used to modify the synthesized speech?

Q11. What is the metrical tree induced by the stress learning-by-analogy?

Accepted Answer

In this paper, the authors describe a set of web services at the Bavarian Archive for Speech Signals (BAS) CLARIN centre in Munich that were developed for the multilingual processing of spoken language, i.e. speech signals.

Accepted Answer

One natural goal is to enlarge the user community by extending the implemented services to as many languages as possible. Such an approach is especially necessary because the authors plan to introduce more signal processing web services in the coming years (see below). The authors plan to investigate the best cloud storage possibilities and implement APIs for these. They also plan to implement a new automatic chunk segmentation web service, based on the results reported in Pörner (2016), that allows the user to first segment very long recordings into chunks and then process the chunks in MAUS batch mode.

Accepted Answer

Two tasks occurred frequently: extending the web interface to incorporate new services and changing/adding parameters that control the back end behaviour.

Accepted Answer

One advantage of web services is the possibility of easily combining services into more complex processing constructs or processing chains.
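As a rough illustration of what a processing chain means here, the sketch below composes two steps so that the output of one becomes the input of the next. Everything in it is an assumption made for illustration: `chain`, `toy_g2p` and `toy_segmenter` are hypothetical local stand-ins, not the BAS services or their API, and the toy "G2P" simply returns the letters of the input.

```python
from functools import reduce
from typing import Any, Callable, Dict

# A step takes a data record and returns an enriched record.
Step = Callable[[Dict[str, Any]], Dict[str, Any]]


def chain(*steps: Step) -> Step:
    """Compose steps left to right into a single processing chain."""
    def run(data: Dict[str, Any]) -> Dict[str, Any]:
        return reduce(lambda acc, step: step(acc), steps, data)
    return run


def toy_g2p(data: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical placeholder: one "phoneme" per letter, not a real G2P model.
    symbols = [ch for ch in data["text"].lower() if ch.isalpha()]
    return {**data, "phonemes": symbols}


def toy_segmenter(data: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical placeholder: pairs each symbol with its position.
    return {**data, "segments": list(enumerate(data["phonemes"]))}


pipeline = chain(toy_g2p, toy_segmenter)
result = pipeline({"text": "Hi!"})
```

In a real setting each step would wrap a call to a remote service, but the composition logic would stay the same.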

Accepted Answer

G2P converters are needed for speech synthesis, for aiding manual and automatic transcription of spoken text and for the generation of pronunciation dictionaries based on text collections (to name but a few possible uses).

Accepted Answer

The aligner considers the alignment of two symbolic sequences v and w as a task to transform v into w by a minimum sum of edit costs, which is known as the Levenshtein distance (Levenshtein, 1966).
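The minimum-edit-cost idea can be sketched with the standard dynamic-programming recurrence. This is a minimal illustration that assumes unit costs for insertion, deletion and substitution; the function name `levenshtein` and the cost parameters are choices made for this sketch, and the cost function actually used by the aligner is not given in this answer.

```python
from typing import Sequence


def levenshtein(v: Sequence, w: Sequence,
                ins_cost: int = 1, del_cost: int = 1, sub_cost: int = 1) -> int:
    """Minimum total edit cost to transform sequence v into sequence w."""
    # prev[j] holds the cost of turning the empty prefix of v into w[:j].
    prev = [j * ins_cost for j in range(len(w) + 1)]
    for i, a in enumerate(v, start=1):
        # Turning v[:i] into the empty sequence takes i deletions.
        curr = [i * del_cost]
        for j, b in enumerate(w, start=1):
            curr.append(min(
                prev[j] + del_cost,                       # delete a
                curr[j - 1] + ins_cost,                   # insert b
                prev[j - 1] + (0 if a == b else sub_cost) # match or substitute
            ))
        prev = curr
    return prev[-1]


distance = levenshtein("kitten", "sitting")
```

Because the function only compares elements for equality, it works on lists of phoneme symbols as well as on plain strings.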

Accepted Answer

One popular framework that supports the definition of such building blocks is AngularJS, which follows a modified Model View Controller (MVC) pattern.

Accepted Answer

The two metadata descriptions, WADL and CMD instance, allow for automatic service invocation by other applications and the automatic generation of documentation.

Accepted Answer

For German and English, the stress learning-by-analogy is preceded by a compound decomposition step based on the morphological analyses (described in Reichel 2012a).

Accepted Answer

Several standard techniques for modifying the synthesized speech are available, e.g. 'vocal tract scaling', 'fundamental frequency scaling' and 'chorus effect'.

Accepted Answer

From this decomposition, a metrical tree is induced based on the relative coherence of neighbouring compound parts (Reichel 2012b).

References

C4.5: Programs for Machine Learning

Binary codes capable of correcting deletions, insertions and reversals

Praat, a system for doing phonetics by computer

The String-to-String Correction Problem
