
Showing papers by "Sandra Maria Aluísio" published in 2020


Proceedings ArticleDOI
01 Dec 2020
TL;DR: The best model proposed here reaches a new state-of-the-art for Portuguese with 97.5% accuracy, an increase of almost 10 points over the best previous results; the authors also propose improvements to the public dataset after analysing the errors of the best model.
Abstract: Sentence complexity assessment is a relatively new task in Natural Language Processing. One of its aims is to highlight in a text which sentences are more complex, to support the simplification of contents for a target audience (e.g., children, cognitively impaired users, non-native speakers and low-literacy readers (Scarton and Specia, 2018)). This task is evaluated using datasets of pairs of aligned sentences including the complex and simple version of the same sentence. For Brazilian Portuguese, the task was addressed by Leal et al. (2018), who set up the first dataset to evaluate the task in this language, reaching 87.8% accuracy with linguistic features. The present work advances these results, using models inspired by Gonzalez-Garduno and Sogaard (2018), which hold the state-of-the-art for the English language, with multi-task learning and eye-tracking measures. First-Pass Duration, Total Regression Duration and Total Fixation Duration were used at two points: first to select a subset of linguistic features, and then as an auxiliary task in the multi-task and sequential learning models. The best model proposed here reaches a new state-of-the-art for Portuguese with 97.5% accuracy, an increase of almost 10 points compared to the best previous results. We also propose improvements to the public dataset after analysing the errors of our best model.
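
As a rough illustration of the multi-task setup described above (a sketch, not the authors' code), the model below shares an encoder between the main complexity classification task and an auxiliary regression over the three eye-tracking measures; all layer sizes and the auxiliary loss weight are assumptions.

import torch
import torch.nn as nn

class MultiTaskComplexityModel(nn.Module):
    """Shared encoder with a main (complexity) and an auxiliary (gaze) head."""
    def __init__(self, vocab_size, emb_dim=100, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.encoder = nn.LSTM(emb_dim, hidden, batch_first=True,
                               bidirectional=True)
        self.complexity_head = nn.Linear(2 * hidden, 1)  # main task: complex vs. simple
        self.gaze_head = nn.Linear(2 * hidden, 3)        # auxiliary task: the three gaze measures

    def forward(self, token_ids):
        states, _ = self.encoder(self.embed(token_ids))
        sentence = states.mean(dim=1)  # mean-pool token states into a sentence vector
        return self.complexity_head(sentence).squeeze(-1), self.gaze_head(sentence)

def multitask_loss(logits, gaze_pred, labels, gaze_targets, aux_weight=0.3):
    # Joint objective: classification loss plus a weighted gaze regression loss.
    main = nn.functional.binary_cross_entropy_with_logits(logits, labels)
    aux = nn.functional.mse_loss(gaze_pred, gaze_targets)
    return main + aux_weight * aux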

6 citations


Book ChapterDOI
02 Mar 2020
TL;DR: An improved version of SIMPLEX-PB, a public benchmarking dataset for LS that was subject to multiple rounds of manual annotation in order for it to accurately capture the simplification needs of underprivileged children, is presented.
Abstract: Most research on Lexical Simplification (LS) addresses non-native speakers of English, since they are numerous and easier to recruit for evaluating resources. Target audiences that are harder to deal with, such as children, are often underrepresented in the literature, although simplifying a text for children could facilitate access to knowledge in a classroom setting, for example. This paper presents an improved version of SIMPLEX-PB, a public benchmarking dataset for LS that was subject to multiple rounds of manual annotation in order for it to accurately capture the simplification needs of underprivileged children. It addresses all limitations of the old SIMPLEX-PB: incorrectly generated synonyms for complex words, low coverage of synonyms, and the absence of reliable simplicity rankings for synonyms. The number of synonyms for the target complex words was substantially increased (7.31 synonyms on average), and the simplicity rankings introduced were manually provided by the target audience itself: children between 10 and 14 years of age studying in underprivileged public institutions.

5 citations


Proceedings ArticleDOI
01 Jul 2020
TL;DR: SIMPLEX-PB 2.0 is a dataset for LS in Brazilian Portuguese that, unlike its predecessor SIMPLEX-PB, accurately captures the needs of Brazilian underprivileged children; it features far more reliable and numerous candidate substitutions for complex words, as well as word complexity rankings produced by a group of underprivileged children.
Abstract: Most research on Lexical Simplification (LS) addresses non-native speakers of English, since they are numerous and easy to recruit. This makes it difficult to create LS solutions for other languages and target audiences. This paper presents SIMPLEX-PB 2.0, a dataset for LS in Brazilian Portuguese that, unlike its predecessor SIMPLEX-PB, accurately captures the needs of Brazilian underprivileged children. To create SIMPLEX-PB 2.0, we addressed all limitations of the old SIMPLEX-PB through multiple rounds of manual annotation. As a result, SIMPLEX-PB 2.0 features much more reliable and numerous candidate substitutions for complex words, as well as word complexity rankings produced by a group of underprivileged children.
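
To make the dataset's structure concrete, here is a hypothetical sketch of how an LS system might consume instances shaped like SIMPLEX-PB 2.0 and score a frequency-based simplicity ranker against the gold ranking. The field names, example words and frequencies are invented for illustration and are not the dataset's actual schema.

from scipy.stats import spearmanr

instance = {
    "sentence": "O tratado foi ratificado pelos dois países.",
    "complex_word": "ratificado",
    "candidates": ["confirmado", "aprovado", "validado"],
    "gold_rank": [1, 2, 3],  # 1 = judged simplest by the children
}

# Made-up corpus frequencies for the candidate substitutions.
word_freq = {"confirmado": 5_000, "aprovado": 12_000, "validado": 900}

# Frequency baseline: more frequent words are assumed simpler.
predicted = sorted(instance["candidates"], key=lambda w: -word_freq[w])
pred_rank = [predicted.index(w) + 1 for w in instance["candidates"]]

rho, _ = spearmanr(instance["gold_rank"], pred_rank)
print(f"Spearman correlation with the gold ranking: {rho:.2f}")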

4 citations


Proceedings Article
01 May 2020
TL;DR: Two new models for segmenting impaired speech transcriptions in narratives of neuropsychological language tests are developed, along with an ideal combination of datasets and specific groups of narratives to be used as the training set.
Abstract: Automatic analysis of connected speech by natural language processing techniques is a promising direction for diagnosing cognitive impairments. However, some difficulties still remain: the time required for manual narrative transcription and the decision on how transcripts should be divided into sentences for successful application of parsers used in metrics, such as Idea Density, to analyze the transcripts. The main goal of this paper was to develop a generic segmentation system for narratives of neuropsychological language tests. We explored the performance of our previous single-dataset-trained sentence segmentation architecture in a richer scenario involving three new datasets used to diagnose cognitive impairments, comprising different stories and two types of stimulus presentation for eliciting narratives — visual and oral — via illustrated story-book and sequence of scenes, and by retelling. Also, we proposed and evaluated three modifications to our previous RCNN architecture: (i) the inclusion of a Linear Chain CRF; (ii) the inclusion of a self-attention mechanism; and (iii) the replacement of the LSTM recurrent layer by a Quasi-Recurrent Neural Network layer. Our study allowed us to develop two new models for segmenting impaired speech transcriptions, along with an ideal combination of datasets and specific groups of narratives to be used as the training set.
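
A minimal, hypothetical sketch of sentence segmentation cast as token-level sequence labeling with a recurrent encoder topped by a Linear Chain CRF, in the spirit of modification (i) above. The pytorch-crf package, layer sizes and two-tag scheme (boundary / non-boundary) are assumptions, not the authors' implementation.

import torch
import torch.nn as nn
from torchcrf import CRF  # pip install pytorch-crf

class SegmenterRNNCRF(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, hidden=128, num_tags=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.rnn = nn.LSTM(emb_dim, hidden, batch_first=True,
                           bidirectional=True)
        self.emissions = nn.Linear(2 * hidden, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def loss(self, token_ids, tags, mask):
        # Negative log-likelihood of the gold boundary tags under the CRF.
        states, _ = self.rnn(self.embed(token_ids))
        return -self.crf(self.emissions(states), tags, mask=mask)

    def predict(self, token_ids, mask):
        # Viterbi decoding returns one boundary/non-boundary tag per token.
        states, _ = self.rnn(self.embed(token_ids))
        return self.crf.decode(self.emissions(states), mask=mask)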

3 citations


Posted Content
TL;DR: The authors present a dataset of 10.5 hours of speech from a single speaker, with which a Tacotron 2 model using the RTISI-LA vocoder achieved the best performance, reaching a 4.03 MOS value.
Abstract: Speech provides a natural way for human-computer interaction. In particular, speech synthesis systems are popular in different applications, such as personal assistants, GPS applications, screen readers and accessibility tools. However, not all languages are at the same level in terms of resources and systems for speech synthesis. This work consists of creating publicly available resources for Brazilian Portuguese in the form of a novel dataset along with deep learning models for end-to-end speech synthesis. The dataset has 10.5 hours of speech from a single speaker, with which a Tacotron 2 model using the RTISI-LA vocoder presented the best performance, achieving a 4.03 MOS value. The obtained results are comparable to related works covering the English language and the state-of-the-art in Portuguese.
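
For readers unfamiliar with the metric: MOS (Mean Opinion Score) is simply the mean of 1-5 listener opinion ratings, often reported with a confidence interval. A minimal worked example with made-up ratings:

import statistics

ratings = [4, 5, 4, 3, 4, 5, 4, 4, 3, 4]  # hypothetical 1-5 listener scores
mos = statistics.mean(ratings)
# Normal-approximation 95% confidence interval over listener ratings.
ci = 1.96 * statistics.stdev(ratings) / len(ratings) ** 0.5
print(f"MOS = {mos:.2f} ± {ci:.2f}")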

1 citation


Posted Content
TL;DR: This work proposes Speech2Phone and compares several embedding models for open-set speaker identification, as well as traditional closed-set models, showing that embeddings generated by artificial neural networks are competitive with classical approaches for the task.
Abstract: Voice recognition is an area with wide application potential. Speaker identification is useful in several voice recognition tasks, as seen in voice-based authentication, transcription systems and intelligent personal assistants. Some tasks benefit from open-set models which can handle new speakers without the need for retraining. Audio embeddings for speaker identification are one proposal to solve this issue. However, choosing a suitable model is a difficult task, especially when training resources are scarce. Besides, it is not always clear whether embeddings are as good as more traditional methods. In this work, we propose Speech2Phone and compare several embedding models for open-set speaker identification, as well as traditional closed-set models. The models were investigated in the scenario of small datasets, which makes them more applicable to languages in which data scarcity is an issue. The results show that embeddings generated by artificial neural networks are competitive when compared to classical approaches for the task. Considering a testing dataset composed of 20 speakers, the best models reach accuracies of 100% and 76.96% for closed and open set scenarios, respectively. The results suggest that the models can perform language-independent speaker identification. Among the tested models, a fully connected one, presented here as Speech2Phone, led to the highest accuracy. Furthermore, the models were tested on different languages, showing that the knowledge learned was successfully transferred to languages both close to and distant from Portuguese (in terms of vocabulary). Finally, the models can scale and handle more speakers than they were trained for, identifying 150% more speakers while still maintaining 55% accuracy.
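
As an illustration of how open-set identification over embeddings typically works (a generic sketch, not the paper's Speech2Phone code): compare a test embedding against enrolled speakers by cosine similarity and reject as unknown below a threshold. The embeddings here are random and the threshold is illustrative.

import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def identify(test_emb, enrolled, threshold=0.7):
    # enrolled: dict of speaker_id -> mean embedding of enrollment audio.
    scores = {spk: cosine(test_emb, emb) for spk, emb in enrolled.items()}
    best = max(scores, key=scores.get)
    # Open-set decision: unknown speakers fall below the threshold.
    return best if scores[best] >= threshold else "unknown"

rng = np.random.default_rng(0)
enrolled = {f"spk{i}": rng.normal(size=128) for i in range(20)}
print(identify(rng.normal(size=128), enrolled))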

1 citation


Journal ArticleDOI
TL;DR: Two clinical tasks are evaluated: the automatic identification of which elements of a retold narrative were recalled; and the binary classification of the narrative produced by a patient, using the identified units as attributes, aiming at automatic screening of patients with cognitive impairment.
Abstract: Diagnoses of Alzheimer's Disease (AD) and Mild Cognitive Impairment (MCI) are based on the analysis of the patient's cognitive functions by administering cognitive and neuropsychological assessment batteries. The use of retelling narratives is common to help identify and quantify the degree of dementia. In general, one point is awarded for each unit recalled, and the final score represents the number of units recalled. In this paper, we evaluated two clinical tasks: the automatic identification of which elements of a retold narrative were recalled; and the binary classification of the narrative produced by a patient, using the identified units as attributes, aiming at automatic screening of patients with cognitive impairment. We used two transcribed retelling datasets in which sentences were divided and manually annotated with the information units; these datasets were then made publicly available. They are the Arizona Battery for Communication Disorders of Dementia (ABCD), which contains narratives of patients with MCI and healthy controls, and the Avaliação da Linguagem no Envelhecimento (BALE), which includes narratives of patients with AD and MCI as well as healthy controls. We evaluated two methods based on semantic similarity, referred to here as STS and Chunking, and transformed the multi-label problem of identifying elements of a retold narrative into binary classification problems, finding a cutoff point for the similarity value of each information unit. In this way, we were able to surpass two baselines for the two datasets in the SubsetAccuracy metric, which is the most punitive for the multi-label scenario. In binary classification, however, not all six machine learning methods evaluated performed better than the baseline methods. For ABCD, the best methods were Decision Trees and KNN, and for BALE, SVM with RBF kernel stood out.
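
A rough sketch of the similarity-with-cutoff idea: an information unit counts as recalled if any sentence of the retelling is similar enough to it. TF-IDF cosine similarity stands in here for the paper's STS model, and the sentences and per-unit thresholds are invented for illustration.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

units = ["o menino perdeu o sapo", "procuraram no bosque"]
retelling = ["a criança percebeu que o sapo tinha sumido",
             "foram procurar o sapo no bosque"]

vec = TfidfVectorizer().fit(units + retelling)
sim = cosine_similarity(vec.transform(units), vec.transform(retelling))

thresholds = [0.2, 0.2]  # one cutoff per unit, tuned on training data
recalled = [bool(sim[i].max() >= thresholds[i]) for i in range(len(units))]
print(recalled)  # binary attributes fed to the screening classifier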

1 citation


Posted Content
TL;DR: This work creates publicly available resources for Brazilian Portuguese in the form of a dataset and deep learning models for end-to-end voice synthesis, and verifies that transfer learning, phonetic transcriptions and denoising are useful for training the models on the presented dataset.
Abstract: Voice provides a natural way for human-computer interaction. Voice synthesis systems are popular in different applications, such as personal assistants, GPS applications, screen readers and accessibility tools. However, not all languages are at the same level in terms of resources and systems for voice synthesis. This work consists of the creation of publicly available resources for Brazilian Portuguese in the form of a dataset and deep learning models for end-to-end voice synthesis. The dataset has 10.5 hours of speech from a single speaker. We investigated three different architectures to perform end-to-end speech synthesis: Tacotron 1, DCTTS and Mozilla TTS. We also analysed the performance of the models according to different vocoders (RTISI-LA, WaveRNN and Universal WaveRNN), the use of phonetic transcriptions, transfer learning (from English) and denoising. In the proposed scenario, a model based on Mozilla TTS and the RTISI-LA vocoder presented the best performance, achieving a 4.03 MOS value. We also verified that transfer learning, phonetic transcriptions and denoising are useful for training the models on the presented dataset. The obtained results are comparable to related works covering English, even while using a smaller dataset.
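
RTISI-LA is a real-time, look-ahead variant of iterative spectrogram inversion in the Griffin-Lim family. As a rough offline stand-in (not the paper's vocoder), librosa's Griffin-Lim implementation can invert a predicted magnitude spectrogram back to a waveform; all parameters below are illustrative.

import numpy as np
import librosa
import soundfile as sf

# Stand-in for the acoustic model's output: a linear magnitude
# spectrogram computed from a bundled example clip.
y, sr = librosa.load(librosa.ex("trumpet"))
predicted_mag = np.abs(librosa.stft(y, n_fft=1024, hop_length=256))

# Iterative phase reconstruction (Griffin-Lim) back to a waveform.
waveform = librosa.griffinlim(predicted_mag, n_iter=60,
                              hop_length=256, n_fft=1024)
sf.write("synthesized.wav", waveform, sr)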

Posted Content
TL;DR: In this article, an efficient method for training models for speaker recognition using small or under-resourced datasets was presented, which is done using the knowledge of the reconstruction of a phoneme in the speaker's voice.
Abstract: In this paper we present an efficient method for training models for speaker recognition using small or under-resourced datasets. This method requires less data than other state-of-the-art (SOTA) methods, e.g. the Angular Prototypical and GE2E loss functions, while achieving similar results. It relies on the knowledge of the reconstruction of a phoneme in the speaker's voice. For this purpose, a new dataset was built, composed of 40 male speakers who read sentences in Portuguese, totaling approximately 3 hours of audio. We compared the three best architectures trained with our method and selected the best-performing one, a shallow architecture. We then compared this model with the SOTA method for the speaker recognition task: the Fast ResNet-34 trained with approximately 2,000 hours of audio, using the Angular Prototypical and GE2E loss functions. Three experiments were carried out with datasets in different languages. Among these three experiments, our model achieved the second-best result in two and the best result in one. This highlights the importance of our method, which proved to be a strong competitor to SOTA speaker recognition models, with 500x less data and a simpler approach.
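
A loose sketch of the reconstruction idea the abstract describes, under the assumption that an encoder maps an utterance to a speaker embedding and a decoder reconstructs a reference phoneme in that speaker's voice, forcing the bottleneck to encode speaker identity. Shapes, sizes and the fully connected layout are assumptions, not the published model.

import torch
import torch.nn as nn

class Speech2PhoneSketch(nn.Module):
    def __init__(self, feat_dim=40 * 100, emb_dim=128, phoneme_dim=40 * 10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(feat_dim, 512), nn.ReLU(),
                                     nn.Linear(512, emb_dim))
        self.decoder = nn.Sequential(nn.Linear(emb_dim, 512), nn.ReLU(),
                                     nn.Linear(512, phoneme_dim))

    def forward(self, utterance_feats):
        emb = self.encoder(utterance_feats)  # speaker embedding (bottleneck)
        return emb, self.decoder(emb)        # reconstructed reference phoneme

model = Speech2PhoneSketch()
utt = torch.randn(8, 40 * 100)     # flattened utterance spectrogram (dummy)
target = torch.randn(8, 40 * 10)   # same phoneme in the same speaker's voice
emb, recon = model(utt)
loss = nn.functional.mse_loss(recon, target)  # reconstruction objective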