Showing papers in &quot;Computer Speech &amp; Language in 2018&quot;

Annotating and Modeling Empathy in Spoken Conversations

TL;DR: The performances of several classification methods are compared, including Gaussian Mixture Model–Universal Background Model (GMM–UBM), GMM–Support Vector Machine (G MM–SVM) and i-vector based approaches, and the utility of different frequency bands for speaker, age-group and gender recognition from children’s speech is assessed.

...read moreread less

59 citations

Journal Article•DOI•

[...]

Firoj Alam¹, Morena Danieli¹, Giuseppe Riccardi¹•Institutions (1)

University of Trento¹

Synthetic speech detection using fundamental frequency variation and spectral features

TL;DR: An automatic segmentation and classification system for empathy inspired by the modal model of emotions is designed and evaluated and designed to support both the fusion and automatic selection of relevant features from high dimensional space.

...read moreread less

49 citations

Journal Article•DOI•

[...]

Monisankha Pal¹, Dipjyoti Paul¹, Goutam Saha¹•Institutions (1)

Indian Institute of Technology Kharagpur¹

01 Mar 2018-Computer Speech & Language

TL;DR: This paper proposed a new approach to detect synthetic speech using score-level fusion of front-end features namely, constant Q cepstral coefficients (CQCCs), all-pole group delay function (APGDF) and fundamental frequency variation (FFV), which outperforms all existing baseline features for both known and unknown attacks.

...read moreread less

47 citations

Journal Article•DOI•

Analysis of speech production real-time MRI

[...]

Vikram Ramanarayanan¹, Vikram Ramanarayanan², Sam Tilsen³, Michael Proctor⁴, Johannes Töger, Louis Goldstein⁵, Krishna S. Nayak⁵, Shrikanth S. Narayanan⁵ - Show less +4 more•Institutions (5)

Educational Testing Service¹, University of California, San Francisco², Cornell University³, Macquarie University⁴, University of Southern California⁵

01 Nov 2018-Computer Speech & Language

TL;DR: Techniques used in analysis of articulatory data acquired using RT-MRI are reviewed, the utility of different approaches for different types of data and research goals are assessed, and new challenges in audio–video data analysis and data modeling are presented.

...read moreread less

44 citations

Journal Article•DOI•

Assessment of pitch-adaptive front-end signal processing for children’s speech recognition

[...]

Rohit Sinha¹, Syed Shahnawazuddin¹•Institutions (1)

Indian Institute of Technology Guwahati¹

01 Mar 2018-Computer Speech & Language

TL;DR: Pitch-adaptive front-end signal processing in deriving the Mel-frequency cepstral coefficient features is explored to reduce the sensitivity to pitch variation and the effectiveness of existing speaker normalization techniques remain intact even with the use of proposed pitch- Adaptive MFCCs.

...read moreread less

35 citations

Journal Article•DOI•

Semi-supervised speech activity detection with an application to automatic speaker verification

[...]

Alexey Sholokhov¹, Alexey Sholokhov², Sahidullah², Tomi Kinnunen²•Institutions (2)

Saint Petersburg State University of Information Technologies, Mechanics and Optics¹, University of Eastern Finland²

Neural versus Phrase-Based MT Quality: an In-Depth Analysis on English-German and English-French

TL;DR: This work extends the conventional expectation-maximization algorithm for GMM training using semi-supervised learning and provides a methodology to incorporate unlabeled data into the SAD training process, leading to more accurate statistical models by exploiting the structure of data distribution.

...read moreread less

34 citations

Journal Article•DOI•

[...]

Luisa Bentivogli¹, Arianna Bisazza², Mauro Cettolo¹, Marcello Federico¹•Institutions (2)

fondazione bruno kessler¹, Leiden University²

Combining sentence similarities measures to identify paraphrases

TL;DR: A detailed analysis of neural versus phrase-based statistical machine translation outputs, leveraging high quality post-edits performed by professional translators on the IWSLT data provides useful insights on what linguistic phenomena are best modelled by neural models.

...read moreread less

33 citations

Journal Article•DOI•

[...]

Rafael Ferreira¹, George D. C. Cavalcanti², Fred Freitas², Rafael Dueire Lins¹, Steven J. Simske³, Marcelo Riss³ - Show less +2 more•Institutions (3)

Universidade Federal Rural de Pernambuco¹, Federal University of Pernambuco², Hewlett-Packard³

Optimal sensor placement in electromagnetic articulography recording for speech production study

TL;DR: A paraphrase identification system that represents each pair of sentence as a combination of different similarity measures that extract lexical, syntactic and semantic components of the sentences encompassed in a graph is proposed.

...read moreread less

32 citations

Journal Article•DOI•

[...]

Ashok Kumar Pattem¹, Aravind Illa¹, Amber Afshan², Prasanta Kumar Ghosh¹•Institutions (2)

Indian Institute of Science¹, University of California, Los Angeles²

Localizing speakers in multiple rooms by using Deep Neural Networks

TL;DR: This work addresses the problem of optimal placement of EMA sensors by posing it as the optimal selection of points for minimizing the reconstruction error of the air-tissue boundaries in the real-time magnetic resonance imaging (rtMRI) video frames of vocal tract (VT) in the mid-sagittal plane using dynamic programming.

...read moreread less

32 citations

Journal Article•DOI•

[...]

Fabio Vesperini¹, Paolo Vecchiotti¹, Emanuele Principi¹, Stefano Squartini¹, Francesco Piazza¹ - Show less +1 more•Institutions (1)

Marche Polytechnic University¹

Restricted Boltzmann machines for vector representation of speech in speaker recognition

TL;DR: It is shown how DNN-based algorithm significantly outperforms the state-of-the-art approaches evaluated on the DIRHA dataset, providing an average localization error expressed in terms of Root Mean Square Error (RMSE), equal to 324 mm and 367 mm for the Simulated and the Real subsets.

...read moreread less

Journal Article•DOI•

[...]

Omid Ghahabi¹, Javier Hernando¹•Institutions (1)

Polytechnic University of Catalonia¹

On the Effects of Using word2vec Representations in Neural Networks for Dialogue Act Recognition

TL;DR: Experiments on the core test condition 5 of NIST SRE 2010 show that comparable results with conventional i-vectors are achieved with a clearly lower computational load in the vector extraction process.

...read moreread less

Journal Article•DOI•

[...]

Christophe Cerisara, Pavel Král¹, Ladislav Lenc¹•Institutions (1)

University of West Bohemia¹

Comparing human and automatic speech recognition in simple and complex acoustic scenes

TL;DR: This paper proposed a new deep neural network that explores recurrent models to capture word sequences within sentences, and further study the impact of pretrained word embeddings on the performance of the proposed approach.

...read moreread less

Journal Article•DOI•

[...]

Constantin Spille¹, Birger Kollmeier¹, Bernd Meyer¹•Institutions (1)

University of Oldenburg¹

01 Nov 2018-Computer Speech & Language

TL;DR: It is found that DNN-based ASR reaches human performance for single-channel, small-vocabulary tasks in the presence of speech-shaped noise and in multi-talker babble noise, which is an important difference to previous human-machine comparisons.

...read moreread less

Journal Article•DOI•

Uncertainty weighting and propagation in DNN–HMM-based speech recognition

[...]

José Novoa¹, Josué Fredes¹, Víctor Poblete², Néstor Becerra Yoma¹•Institutions (2)

University of Chile¹, Austral University of Chile²

Improving PLDA speaker verification performance using domain mismatch compensation techniques

TL;DR: The results presented here suggest that substantial reduction in WER is achieved with clean training, and the uncertainty weighting method reduced the gap between clean and multi-noise/multi-condition training.

...read moreread less

Journal Article•DOI•

[...]

Hafizur Rahman¹, Ahilan Kanagasundaram¹, Ivan Himawan¹, David Dean¹, Sridha Sridharan¹ - Show less +1 more•Institutions (1)

Queensland University of Technology¹

Rank-1 constrained Multichannel Wiener Filter for speech recognition in noisy environments

TL;DR: In this article, a domain-invariant linear discriminant analysis (DI-LDA) technique was proposed to compensate domain mismatch from both LDA and PLDA subspaces.

...read moreread less

Journal Article•DOI•

[...]

Ziteng Wang¹, Emmanuel Vincent², Romain Serizel², Yonghong Yan¹•Institutions (2)

Chinese Academy of Sciences¹, French Institute for Research in Computer Science and Automation²

Automatic Analysis of Pronunciations for Children with Speech Sound Disorders.

TL;DR: In this paper, the rank-1 constrained multichannel Wiener filter is employed for noise reduction and a new constant residual noise power constraint is derived which enhances the recognition performance.

...read moreread less

Journal Article•DOI•

[...]

Shiran Dudy¹, Steven Bedrick¹, Meysam Asgari¹, Alexander Kain¹•Institutions (1)

Oregon Health & Science University¹

An empirical study on POS tagging for Vietnamese social media text

TL;DR: It is found that pronunciation models that use explicit knowledge about error pronunciation patterns can lead to more accurate classification whether a phoneme was correctly pronounced or not, and this paper proposes two new GOP techniques.

...read moreread less

Journal Article•DOI•

[...]

Ngo Xuan Bach¹, Nguyen Dieu Linh¹, Tu Minh Phuong¹•Institutions (1)

Posts and Telecommunications Institute of Technology¹

Audio–visual synchronization in reading while listening to texts: Effects on visual behavior and verbal learning

TL;DR: An empirical study on POS tagging for Vietnamese social media text is presented, which shows several challenges compared with tagging for general text and the semi-supervised model outperformed, in terms of accuracy, the version of vnTagger trained on the same Facebook dataset, showing the usefulness of word cluster features.

...read moreread less

Journal Article•DOI•

[...]

Emilie Gerbier¹, Gérard Bailly², Marie-Line Bosse²•Institutions (2)

Centre national de la recherche scientifique¹, University of Grenoble²

Reward estimation for dialogue policy optimisation

TL;DR: The S+ condition presumably captured the children's attention toward the currently heard word, which forced the children to be strictly aligned with the oral modality, as well as improving the learning benefits provided by a reading experience.

...read moreread less

Journal Article•DOI•

[...]

Pei-Hao Su¹, Milica Gasic¹, Steve Young¹•Institutions (1)

University of Cambridge¹

01 Sep 2018-Computer Speech & Language

TL;DR: Two approaches to tackling dialogue management as a reinforcement learning task are presented, whereby a recurrent neural network is utilised as a task success predictor which is pre-trained from off-line data to estimate task success during subsequent on-line dialogue policy learning.

...read moreread less

Journal Article•DOI•

Automatic cohesive summarization with pronominal anaphora resolution

[...]

Jamilson Antunes¹, Rafael Dueire Lins¹, Rafael Dueire Lins², Rinaldo Lima¹, Rinaldo Lima², Hilário Oliveira¹, Marcelo Riss³, Steven J. Simske³ - Show less +4 more•Institutions (3)

Federal University of Pernambuco¹, Universidade Federal Rural de Pernambuco², Hewlett-Packard³

01 Nov 2018-Computer Speech & Language

TL;DR: This paper proposes a method that solves unbound pronominal anaphoric expressions, automatically enabling the cohesiveness of the extractive summaries, and provides a comparative evaluation concerning two distinct assessment scenarios which are compared to a baseline.

...read moreread less

Journal Article•DOI•

A novel rule based machine translation scheme from Greek to Greek Sign Language: Production of different types of large corpora and Language Models evaluation

[...]

Dimitris Kouremenos¹, Klimis Ntalianis¹, Stefanos Kollias¹•Institutions (1)

National Technical University of Athens¹

01 Sep 2018-Computer Speech & Language

TL;DR: This work presents a novel prototype Rule Based Machine Translation (RBMT) system for the creation of large and quality written Greek Sign Language (GSL) glossed corpora from Greek text and stresses that Language Models for written GSL gloss are missing from the scientific literature, thus this work is pioneer in this field.

...read moreread less

Journal Article•DOI•

Sparse coding based features for speech units classification

[...]

Pulkit Sharma¹, Vinayak Abrol¹, A. D. Dileep¹, Anil Kumar Sao¹•Institutions (1)

Indian Institute of Technology Mandi¹

Conversational telephone speech recognition for Lithuanian

TL;DR: Both raw speech samples and mel frequency cepstral coefficients are used as an initial representation for feature extraction and a transformation function known as weighted decomposition (WD) of principal components is used to emphasize the discriminative information present in the PCA-based dictionary.

...read moreread less

Journal Article•DOI•

[...]

Rasa Lileikytė¹, Lori Lamel¹, Jean-Luc Gauvain¹, Arseniy Gorin¹•Institutions (1)

Université Paris-Saclay¹

Inferring stance in news broadcasts from prosodic-feature configurations

TL;DR: The use of Web texts for language modeling is shown to significantly improve both speech recognition and keyword spotting performance, and combining full-word and subword units leads to the best keyword spotting results.

...read moreread less

Journal Article•DOI•

[...]

Nigel Ward¹, Jason C. Carlson¹, Olac Fuentes¹•Institutions (1)

University of Texas at El Paso¹

Dempster-Shafer theory for enhanced statistical model-based voice activity detection

TL;DR: This work identifies 14 aspects of stance that occur frequently in radio news stories and that could be useful for information retrieval, including indications of subjectivity, immediacy, local relevance, and newness.

...read moreread less

Journal Article•DOI•

[...]

Tae Jun Park¹, Joon-Hyuk Chang¹•Institutions (1)

Hanyang University¹

Computer based speech prosody teaching system

TL;DR: The probability of an ignorant state is eliminated through the orthogonal sum of several speech presence probabilities, which results in the performance improvement when detecting voice activity.

...read moreread less

Journal Article•DOI•

[...]

Dávid Sztahó¹, Gabor Kiss¹, Klára Vicsi¹•Institutions (1)

Budapest University of Technology and Economics¹

Using a PCA-based dataset similarity measure to improve cross-corpus emotion recognition

TL;DR: A novel prosody teaching system where intensity (accent), intonation and rhythm are presented visually for the students as visual feedback and automatic assessment scores are given jointly and separately for the goodness of intonations and rhythm is introduced.

...read moreread less

Journal Article•DOI•

[...]

Ingo Siegert¹, Ronald Böck¹, Andreas Wendemuth¹•Institutions (1)

Otto-von-Guericke University Magdeburg¹

01 Sep 2018-Computer Speech & Language

TL;DR: A corpus similarity measure based on PCA-ranked features answers the question which corpora should be included into joint training and outperforms all other combinations of corpora.

...read moreread less

Journal Article•DOI•

RankUp: Enhancing graph-based keyphrase extraction methods with error-feedback propagation

[...]

Gerardo Figueroa¹, Po-Chi Chen¹, Yi-Shin Chen¹•Institutions (1)

National Tsing Hua University¹