Multilingual processing of speech via web services

TLDR
Five multilingual web services for speech science operational since 2012 are described and the benefits and drawbacks of the new paradigm as well as the experiences with user acceptance and implementation problems are discussed.
About
This article was published in Computer Speech & Language on 2017-09-01 and is currently open access. It has received 272 citations to date. The article focuses on the topics: Web service & Speech synthesis.

Citations
Proceedings Article

"Do you trust me?": Increasing User-Trust by Integrating Virtual Agents in Explainable AI Interaction Design

TL;DR: An examination of the impact of virtual agents within the field of XAI on the perceived trustworthiness of autonomous intelligent systems finds significant evidence that integrating virtual agents into XAI interaction design increases trust in the autonomous intelligent system.
Journal Article

“Let me explain!”: exploring the potential of virtual agents in explainable AI interaction design

TL;DR: It is found that increased human-likeness of the virtual agent and increased interaction with it are the two most commonly mentioned suggestions for improving the proposed XAI interaction design.
Journal Article

Rapid computations of spectrotemporal prediction error support perception of degraded speech.

TL;DR: It is demonstrated that spectrotemporal modulations in speech are more strongly represented in neural responses than alternative speech representations, and an interaction between speech signal quality and expectations from prior written text on the quality of neural representations is found.
Proceedings Article

Exploiting Multi-Modal Features from Pre-Trained Networks for Alzheimer's Dementia Recognition.

TL;DR: This work exploits various multi-modal features extracted from pre-trained networks to recognize Alzheimer's Dementia using a neural network, with a small dataset provided by the ADReSS Challenge at INTERSPEECH 2020.
Proceedings Article

MaSS: A Large and Clean Multilingual Corpus of Sentence-aligned Spoken Utterances Extracted from the Bible

TL;DR: This article proposes to add multilingual links between speech segments in different languages, and shares a large and clean dataset of 8,130 parallel spoken utterances across 8 languages, which is named MaSS (Multilingual corpus of Sentence-aligned Spoken utterances).
References
Book

C4.5: Programs for Machine Learning

TL;DR: A complete guide to the C4.5 system as implemented in C for the UNIX environment, which starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting.
Journal Article

The String-to-String Correction Problem

TL;DR: An algorithm is presented which solves the string-to-string correction problem in time proportional to the product of the lengths of the two strings.
Frequently Asked Questions (11)
Q1. What are the contributions in this paper?

In this paper, the authors describe a set of web services at the Bavarian Archive for Speech Signals (BAS) CLARIN centre in Munich that were developed for the multilingual processing of spoken language, i.e. speech signals.

One natural goal is to enlarge the user community by extending the implemented services to as many languages as possible. Such an approach is especially necessary because the authors plan to introduce more signal processing web services in the coming years (see below). The authors plan to investigate the best cloud storage options and implement APIs for them. They also plan to implement a new automatic chunk segmentation web service, based on the results reported in Pörner (2016), that allows the user to first segment very long recordings into chunks and then process the chunks in MAUS batch mode.

Two tasks occurred frequently: extending the web interface to incorporate new services and changing/adding parameters that control the back end behaviour. 

One advantage of web services is the possibility of easily combining services into more complex processing constructs or processing chains. 

G2P converters are needed for speech synthesis, for aiding manual and automatic transcription of spoken text and for the generation of pronunciation dictionaries based on text collections (to name but a few possible uses). 

The aligner treats the alignment of two symbolic sequences v and w as the task of transforming v into w with a minimum sum of edit costs, which is known as the Levenshtein distance (Levenshtein, 1966).
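
The idea of aligning two symbol sequences by minimising edit costs can be sketched with the standard dynamic-programming recurrence. The following is a minimal illustration, not the aligner used in the BAS services: the function name `align`, the uniform unit costs for substitution, insertion and deletion, and the use of `None` as a gap marker are all assumptions made for this example.

```python
from typing import List, Optional, Sequence, Tuple

Pair = Tuple[Optional[str], Optional[str]]


def align(v: Sequence[str], w: Sequence[str],
          sub_cost: int = 1, indel_cost: int = 1) -> Tuple[int, List[Pair]]:
    """Return (minimum edit cost, alignment) for transforming v into w.

    Hypothetical helper with uniform costs (an assumption); a gap is
    represented by None on the side that has no symbol.
    """
    n, m = len(v), len(w)
    # dist[i][j] = minimum cost of transforming v[:i] into w[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i * indel_cost          # delete all of v[:i]
    for j in range(1, m + 1):
        dist[0][j] = j * indel_cost          # insert all of w[:j]

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = 0 if v[i - 1] == w[j - 1] else sub_cost
            dist[i][j] = min(
                dist[i - 1][j - 1] + match,  # match or substitution
                dist[i - 1][j] + indel_cost,  # deletion from v
                dist[i][j - 1] + indel_cost,  # insertion from w
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            match = 0 if v[i - 1] == w[j - 1] else sub_cost
            if dist[i][j] == dist[i - 1][j - 1] + match:
                pairs.append((v[i - 1], w[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + indel_cost:
            pairs.append((v[i - 1], None))
            i -= 1
        else:
            pairs.append((None, w[j - 1]))
            j -= 1
    pairs.reverse()
    return dist[n][m], pairs


if __name__ == "__main__":
    cost, alignment = align("kitten", "sitting")
    print(cost)
    for source_symbol, target_symbol in alignment:
        print(source_symbol, target_symbol)
```

The table has (len(v)+1) x (len(w)+1) cells, so the running time is proportional to the product of the two sequence lengths. A production aligner would typically replace the uniform costs with symbol-specific ones.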

One popular framework that supports the definition of such building blocks is AngularJS, which follows a modified Model View Controller (MVC) pattern.

The two metadata descriptions, WADL and CMD instance, allow for automatic service invocation by other applications and the automatic generation of documentation. 

For German and English, the stress learning-by-analogy is preceded by a compound decomposition step based on the morphological analyses (described in Reichel 2012a). 

Several standard techniques to modify the synthesized speech, e.g. 'vocal tract scaling', 'fundamental frequency scaling', 'chorus effect' etc., are available. 

From this decomposition, a metrical tree is induced based on the relative coherence of neighbouring compound parts (Reichel 2012b).