Author

Sandra Maria Aluísio

Bio: Sandra Maria Aluísio is an academic researcher from University of São Paulo. The author has contributed to research in topics: Brazilian Portuguese & Sentence. The author has an h-index of 23 and has co-authored 138 publications receiving 1873 citations. Previous affiliations of Sandra Maria Aluísio include Spanish National Research Council.


Papers
Proceedings Article
05 Jun 2010
TL;DR: A readability assessment approach to support the process of text simplification for poor literacy readers with a number of new features, and experiment with alternative ways to model this problem using machine learning methods, namely classification, regression and ranking.
Abstract: We describe a readability assessment approach to support the process of text simplification for poor literacy readers. Given an input text, the goal is to predict its readability level, which corresponds to the literacy level that is expected from the target reader: rudimentary, basic or advanced. We complement features traditionally used for readability assessment with a number of new features, and experiment with alternative ways to model this problem using machine learning methods, namely classification, regression and ranking. The best resulting model is embedded in an authoring tool for Text Simplification.

136 citations

Proceedings ArticleDOI
05 Oct 2009
TL;DR: Facilita is introduced, an assistive technology to help lower-literacy users understand the text content of Web applications; it generates accessible content from Web pages automatically, using summarization and simplification techniques.
Abstract: Texts are the media content primarily available on Web sites and applications. However, this heavy use of texts creates an accessibility barrier to those who cannot read fluently in their mother tongue due to both text length and linguistic complexity. To offer an accessible alternative to these readers, shorter and simplified versions of text content should be provided. Taking that into consideration, this paper introduces Facilita, an assistive technology to help lower-literacy users understand the text content of Web applications. Facilita generates accessible content from Web pages automatically, using summarization and simplification techniques. It is also important to consider interface design requirements, since Facilita's target audience (the functionally illiterate) is often classified as computer illiterate as well. Thus, interaction and user interface design were developed considering the limitations and skills of the functionally illiterate.

109 citations

Proceedings ArticleDOI
16 Sep 2008
TL;DR: This study illustrates the need for text simplification to facilitate accessibility to information by poor literacy readers and potentially by people with other cognitive disabilities.
Abstract: In this paper we investigate the main linguistic phenomena that can make texts complex and how they could be simplified. We focus on a corpus analysis of simple account texts available on the web for Brazilian Portuguese and propose simplification strategies for this language. This study illustrates the need for text simplification to facilitate accessibility to information by poor literacy readers and potentially by people with other cognitive disabilities. It also highlights characteristics of simplification for Portuguese, which may differ from other languages. Such study consists of the first step towards building Brazilian Portuguese text simplification systems. One of the scenarios in which these systems could be used is that of reading electronic texts produced, e.g., by the Brazilian government or by relevant news agencies.

102 citations

Proceedings Article
06 Jun 2010
TL;DR: The PorSimples project is presented, whose aim is to develop text adaptation tools for Brazilian Portuguese that cater for both people at poor literacy levels and authors who want to produce texts for this audience.
Abstract: In this paper we present the PorSimples project, whose aim is to develop text adaptation tools for Brazilian Portuguese. The tools developed cater for both people at poor literacy levels and authors who want to produce texts for this audience. Here we describe the tools and resources developed over two years of this project and point out directions for future work and collaboration. Since Portuguese and Spanish have many aspects in common, we believe our main point for collaboration lies in transferring our knowledge and experience to researchers willing to develop simplification and elaboration tools for Spanish.

92 citations

01 Jan 2013
TL;DR: This work presents an evaluation of the Brazilian Portuguese LIWC dictionary for Sentiment Analysis by comparison against two other sentiment resources for Portuguese language: Opinion Lexicon and SentiLex.
Abstract: This work presents an evaluation of the Brazilian Portuguese LIWC dictionary for Sentiment Analysis. This evaluation is conducted by comparison against two other sentiment resources for the Portuguese language: Opinion Lexicon and SentiLex. We conducted intrinsic and extrinsic evaluations and show how the LIWC dictionary could be used in sentiment analysis projects.

90 citations


Cited by
Journal ArticleDOI
TL;DR: McAlpine, Lumsden, and Acheson's reappraisal is an essential reference for the practising neurologist and the new edition makes important modification of and changes in emphasis from the edition of 1965.
Abstract: tical perspective. For instance, there are only three passing references to kuru in a book of 650 pages. This edition reflects the renewed interest in the immunological theories of multiple sclerosis. More than half the text is devoted to Professor Lumsden's analysis of the pathology and, in particular, the chemical pathology of the immune response. There is a great deal of original work devoted to the chemistry and behaviour of the immunoglobulins. Much of this appears in specialist journals and one must be grateful for the critical summary provided here. Professor Lumsden unequivocally sees the key to the problem of multiple sclerosis in the study of its immunochemistry, relegating infection by a virus or a slow virus to a quite subsidiary role. The clinical studies drawing on wide practical experience help to get one's prejudices about the illness onto a more reasoned footing. The section on treatment is still sadly limited. Dr. McAlpine found little to add to the regime which he described in 1955. McAlpine, Lumsden, and Acheson's reappraisal is an essential reference for the practising neurologist and the new edition makes important modification of and changes in emphasis from the edition of 1965.

1,264 citations

Proceedings Article
19 Feb 2018
TL;DR: This article used two sources of data to train these models: the free online encyclopedia Wikipedia and data from the common crawl project, and introduced three new word analogy datasets to evaluate these word vectors, for French, Hindi and Polish.
Abstract: Distributed word representations, or word vectors, have recently been applied to many tasks in natural language processing, leading to state-of-the-art performance. A key ingredient to the successful application of these representations is to train them on very large corpora, and use these pre-trained models in downstream tasks. In this paper, we describe how we trained such high quality word representations for 157 languages. We used two sources of data to train these models: the free online encyclopedia Wikipedia and data from the common crawl project. We also introduce three new word analogy datasets to evaluate these word vectors, for French, Hindi and Polish. Finally, we evaluate our pre-trained word vectors on 10 languages for which evaluation datasets exist, showing very strong performance compared to previous models.

831 citations

Proceedings Article
21 Jun 2014
TL;DR: A deep neural network is proposed that learns character-level representations of words and associates them with usual word representations to perform POS tagging, producing state-of-the-art POS taggers for two languages.
Abstract: Distributed word representations have recently been proven to be an invaluable resource for NLP. These representations are normally learned using neural networks and capture syntactic and semantic information about words. Information about word morphology and shape is normally ignored when learning word representations. However, for tasks like part-of-speech tagging, intra-word information is extremely useful, especially when dealing with morphologically rich languages. In this paper, we propose a deep neural network that learns character-level representations of words and associates them with usual word representations to perform POS tagging. Using the proposed approach, while avoiding the use of any handcrafted feature, we produce state-of-the-art POS taggers for two languages: English, with 97.32% accuracy on the Penn Treebank WSJ corpus; and Portuguese, with 97.47% accuracy on the Mac-Morpho corpus, where the latter represents an error reduction of 12.2% on the best previously known result.

627 citations

Journal ArticleDOI
TL;DR: This work is the first to design automatic metrics that are effective for tuning and evaluating simplification systems, which will facilitate iterative development for this task.
Abstract: Most recent sentence simplification systems use basic machine translation models to learn lexical and syntactic paraphrases from a manually simplified parallel corpus. These methods are limited by the quality and quantity of manually simplified corpora, which are expensive to build. In this paper, we conduct an in-depth adaptation of statistical machine translation to perform text simplification, taking advantage of large-scale paraphrases learned from bilingual texts and a small amount of manual simplifications with multiple references. Our work is the first to design automatic metrics that are effective for tuning and evaluating simplification systems, which will facilitate iterative development for this task.

452 citations