
Showing papers in "Computer Speech & Language in 2022"


Journal ArticleDOI
TL;DR: This work shows that VBx achieves superior performance on three of the most popular datasets for evaluating diarization (CALLHOME, AMI and DIHARD II) and presents, for the first time, the derivation and update formulae for the VBx model.

110 citations


Journal ArticleDOI
TL;DR: Speaker diarization is the task of labeling audio or video recordings with classes that correspond to speaker identity, or in short, identifying "who spoke when" in audio and video recordings.

55 citations


Journal ArticleDOI
TL;DR: The VBx model uses a Bayesian hidden Markov model to find speaker clusters in a sequence of x-vectors and achieves superior performance on three popular datasets for evaluating diarization: CALLHOME, AMI and DIHARD II.

41 citations


Journal ArticleDOI
TL;DR: In this paper, the authors developed an Urdu-language hate lexicon, on the basis of which they built an annotated dataset of 10,526 Urdu tweets, and used various machine learning techniques for hate speech detection.

41 citations


Journal ArticleDOI
TL;DR: This article identifies key scientific and engineering advances needed to enable effective spoken language interaction with robotics, and makes 25 recommendations involving eight general themes, including putting human needs first, better modeling the social and interactive aspects of language, improving robustness, creating new methods for rapid adaptation, and improving research infrastructure and resources.

38 citations


Journal ArticleDOI
TL;DR: This work proposes a novel short text classification approach, called CRFA, that combines Context-Relevant Features with a multi-stage Attention model based on a Temporal Convolutional Network (TCN) and a CNN, and that uses Probase as external knowledge to enrich the semantic representation and so address the data sparsity and ambiguity of short texts.

32 citations



Journal ArticleDOI
TL;DR: This paper proposed a novel short text classification approach, called CRFA, that combines Context-Relevant Features with a multi-stage Attention model based on a Temporal Convolutional Network (TCN) and a CNN.

25 citations


Journal ArticleDOI
TL;DR: The experimental results show a transfer of syntactic knowledge in the mBERT model among languages belonging to different branches of the Indo-European family, namely English, Italian and French, which present very different syntactic constructions.

24 citations


Journal ArticleDOI
TL;DR: Automatic Text Summarization (ATS) is an important area in NLP, with the goal of shortening a long text into a more compact version by conveying the most important points in a readable form.

23 citations


Journal ArticleDOI
TL;DR: A fully automated method is proposed that classifies the spontaneous spoken production of subjects using the spectrogram of the audio signal, a visual representation of the subject's speech, together with a specific data augmentation approach that avoids distorting the original samples.

Journal ArticleDOI
TL;DR: Automatic Text Summarization (ATS) is an important area in Natural Language Processing (NLP), with the goal of shortening a long text into a more compact version by conveying the most important points in a readable form.

Journal ArticleDOI
TL;DR: In this paper, a weighted ensemble framework is proposed for identifying hate and offensive code-mixed posts on social networking platforms.

Journal ArticleDOI
TL;DR: A comprehensive review of the novel and emerging GAN-based speech frameworks and algorithms that have revolutionized speech processing, with speech GANs categorized by application area: speech synthesis, speech enhancement and conversion, and data augmentation in automatic speech recognition and emotion speech recognition systems.

Journal ArticleDOI
TL;DR: The first VoicePrivacy 2020 Challenge focused on developing anonymization solutions for speech technology; this paper presents the results and analyses stemming from the challenge, including the objective and subjective evaluation metrics and attack models.

Journal ArticleDOI
TL;DR: This paper investigated the ability of the multilingual BERT (mBERT) language model to transfer syntactic knowledge cross-lingually, verifying whether and to what extent syntactic dependency relationships learnt in one language are maintained in other languages.

Journal ArticleDOI
TL;DR: In this article, the authors presented their work on code-switched Egyptian Arabic-English ASR using DNN-based hybrid and Transformer-based end-to-end models.

Journal ArticleDOI
TL;DR: In this paper, the authors presented their work on code-switched Egyptian Arabic-English ASR using DNN-based hybrid and Transformer-based end-to-end models.

Journal ArticleDOI
TL;DR: In this paper, the authors proposed a fully automated method to classify the spontaneous spoken production of subjects; in particular, they trained an artificial neural network on the spectrogram of the audio signal, a visual representation of the subject's speech.

Journal ArticleDOI
TL;DR: This article presents a comprehensive review of the novel and emerging GAN-based speech frameworks and algorithms that have revolutionized speech processing, in which the authors categorize speech GANs by application area: speech synthesis, speech enhancement and conversion, and data augmentation in automatic speech recognition and emotion speech recognition systems.

Journal ArticleDOI
TL;DR: Comprehensive evaluations which include a detailed mathematical analysis, a simulation on amplitude and frequency modulated (AM-FM) signals, and a spectrographic inspection involving different filterbank structures, along with their experimental results are provided in this paper.

Journal ArticleDOI
TL;DR: In this paper, the authors proposed a speech anonymization method based on autoencoders and adversarial training. However, the method is limited to English utterances and cannot handle other languages, such as French, German, and Dutch.

Journal ArticleDOI
TL;DR: The overall findings are that emotion lexica can offer complementary information to even extremely large pre-trained models such as BERT, and that the performance of the models is comparable to state-of-the-art models specifically engineered for certain datasets, and even exceeds the state of the art on four datasets.

Journal ArticleDOI
TL;DR: This study proposes a language and domain independent approach for automatic extractive text summarization (EATS) tasks, which is based on a clustering scheme supported by a genetic algorithm (GA), to find an optimal grouping of sentences.

Journal ArticleDOI
TL;DR: In this paper, the authors presented the results from the top four teams, which achieved an area under the receiver operating characteristic curve (AUC-ROC) of 95.1% on the blind test data.

Journal ArticleDOI
TL;DR: This paper explores voice commands given to a Voice-Assistant System (VAS), i.e., Amazon Alexa, by 40 older adults, age 65 or older, who were either Healthy Control (HC) participants or Mild Cognitive Impairment (MCI) participants, to demonstrate the promise of future home-based cognitive assessments using Voice-Assistant Systems.

Journal ArticleDOI
TL;DR: In this article, the authors used a bespoke data collection interface to generate speaking chatbots and made them available as tasks on the crowdsourcing platform Mechanical Turk to simulate how privacy can be communicated in a dialogue between user and machine.

Journal ArticleDOI
TL;DR: In this paper, a state-of-the-art Hindi NER system based on MuRIL language model and CRF is proposed. But, the model is not suitable for the Hindi named entity recognition task.

Journal ArticleDOI
TL;DR: The authors performed a comprehensive benchmarking for end-to-end transformer ASR, modular HMM-DNN ASR and human speech recognition (HSR) on the Arabic language and its dialects.

Journal ArticleDOI
TL;DR: In this article, the authors explored voice commands given to a Voice-Assistant System (VAS), i.e., Amazon Alexa, by 40 older adults, age 65 or older, who were either Healthy Control (HC) participants or Mild Cognitive Impairment (MCI) participants.