Open Access
Linguateca: um Centro de Recursos Distribuído para o Processamento Computacional da Língua Portuguesa
Diana Santos,Alberto Simões,Ana Frankenberg-Garcia,Ana Maria Viana Pinto,Anabela Barreiro,Belinda Maia,Cristina Mota,D. P. de Oliveira,Eckhard Bick,Elisabete Ranchhod,José João Almeida,Luís Miguel Cabral,Luís Costa,Luís Sarmento,Marcirio Silveira Chaves,Nuno Cardoso,Paulo M. Rocha,Rachel Aires,Rosário Silva,Rui Vilela,Susana Afonso +20 more
TLDR
This paper presents an overview of Linguateca's activity in creating and making available resources and tools for the Portuguese language.
Abstract:
In this article we present an overview of Linguateca's activity in creating and making available resources and tools for the Portuguese language. We begin with a description of Linguateca's goals and assumptions and a brief history of its work, and conclude with some considerations on the best way to proceed in organizing the field.
Citations
Proceedings Article
The brWaC Corpus: A New Open Resource for Brazilian Portuguese
TL;DR: This work presents the construction process of a large Web corpus for Brazilian Portuguese, aiming to achieve a size comparable to the state of the art in other languages, and discusses the updated sentence-level approach for the strict removal of duplicated content.
Representação em XML da Floresta Sintáctica
TL;DR: This document describes the process of equipping the Floresta Sintáctica with mechanisms that facilitate its use, and aims to convert its current format into XML-based formats.
A Linguateca e o projecto 'Processamento Computacional do português'
Diana Santos,Luís Costa +1 more
TL;DR: The Linguateca project is a natural continuation of the project "Processamento computacional do português" (PCLP), which was initiated by the Ministério da Ciência e da Tecnologia (MCT) in Porto, Portugal.
Posted Content
Compiling and Processing Historical and Contemporary Portuguese Corpora.
TL;DR: This technical report describes the framework used for processing three large Portuguese corpora, a historical Portuguese collection containing texts written between the 16th and the early 20th century, and presents published research papers using the corpora.
Polishing the gold – how much revision do we need in treebanks?
TL;DR: The second version of PetroGold, a gold-standard treebank for the oil & gas domain in the Portuguese language, is presented, and a negative impact on the intrinsic evaluation is verified when the annotation of prepositional verbal arguments is simplified.
References
Proceedings Article
Floresta sintá(c)tica: a treebank for Portuguese
TL;DR: The creation of the annotated objects is presented in detail: preparing the text to be annotated, applying the Constraint Grammar based PALAVRAS parser, revising its output manually in a two-stage process, and carefully documenting the linguistic options.
Journal ArticleDOI
Adding Geographic Scopes to Web Resources
TL;DR: This paper presents work on automatically identifying the geographical scope of web documents, which provides the means to develop retrieval tools that take the geographical context into consideration, and makes extensive use of an ontology of geographical concepts.
Proceedings Article
Providing Internet Access to Portuguese Corpora: the AC/DC Project
Diana Santos,Eckhard Bick +1 more
TL;DR: The aims of the project Computational Processing of Portuguese are described, and the process of tagging and parsing the underlying corpora is focused on, using a Constraint Grammar parser for Portuguese.
Proceedings ArticleDOI
Evaluating CETEMPúblico, a Free Resource for Portuguese
Diana Santos,Paulo Rocha +1 more
TL;DR: A thorough evaluation of CETEMPúblico, a 180-million-word newspaper corpus free for R&D in Portuguese processing, is presented; the authors think that the procedures presented can be of interest for the larger NLP community.
Book ChapterDOI
Processing natural language without natural language processing
TL;DR: Recent work is reviewed in a number of areas, including grammar checker development, automatic question answering, and language modeling, where state-of-the-art accuracy is achieved using very simple methods, suggesting that the field of NLP might benefit by concentrating less on technology development and more on data acquisition.