Open Access

Linguateca: A Distributed Resource Centre for the Computational Processing of the Portuguese Language

TLDR
This paper presents an overview of Linguateca's activity in creating and making available resources and tools for the Portuguese language.
Abstract
In this article we present an overview of Linguateca's activity in creating and making available resources and tools for the Portuguese language. We begin with a description of Linguateca's objectives and underlying assumptions and a brief history of its work, and conclude with some considerations on how best to continue organizing the field.

Citations
Proceedings Article

The brWaC Corpus: A New Open Resource for Brazilian Portuguese

TL;DR: This work presents the construction process of a large Web corpus for Brazilian Portuguese, aiming to achieve a size comparable to the state of the art in other languages, and discusses the updated sentence-level approach for the strict removal of duplicated content.

XML Representation of the Floresta Sintáctica

TL;DR: This document describes the process of equipping the Floresta Sintáctica with mechanisms that make it easier to use, and of converting its current format into XML-based formats.

Linguateca and the project 'Processamento Computacional do português'

Diana Santos and one co-author
TL;DR: The Linguateca project is a natural continuation of the project "Processamento computacional do português" (PCLP), which was initiated by the Ministério da Ciência e da Tecnologia (MCT) in Porto, Portugal.
Posted Content

Compiling and Processing Historical and Contemporary Portuguese Corpora

TL;DR: This technical report describes the framework used for processing three large Portuguese corpora, including a historical Portuguese collection containing texts written between the 16th and the early 20th century, and presents published research papers that use the corpora.

Polishing the gold – how much revision do we need in treebanks?

TL;DR: The second version of PetroGold, a gold-standard treebank for the oil and gas domain in Portuguese, is presented, and simplifying the annotation of prepositional verbal arguments is found to have a negative impact on intrinsic evaluation.
References
Proceedings Article

Floresta sintá(c)tica: a treebank for Portuguese

TL;DR: The creation of the annotated objects is presented in detail: preparing the text to be annotated, applying the Constraint Grammar based PALAVRAS parser, revising its output manually in a two-stage process, and carefully documenting the linguistic options.
Journal Article

Adding Geographic Scopes to Web Resources

TL;DR: This paper presents work on automatically identifying the geographical scope of web documents, which provides the means to develop retrieval tools that take the geographical context into consideration, and makes extensive use of an ontology of geographical concepts.
Proceedings Article

Providing Internet Access to Portuguese Corpora: the AC/DC Project

TL;DR: The aims of the project Computational Processing of Portuguese are described, with a focus on how the underlying corpora are tagged and parsed using a Constraint Grammar parser for Portuguese.
Proceedings Article

Evaluating CETEMPúblico, a Free Resource for Portuguese

TL;DR: Presents a thorough evaluation of CETEMPúblico, a 180-million-word Portuguese newspaper corpus freely available for R&D in Portuguese language processing, and argues that the evaluation procedures presented may be of interest to the wider NLP community.
Book Chapter

Processing natural language without natural language processing

TL;DR: Reviews recent work in a number of areas, including grammar checker development, automatic question answering, and language modeling, where state-of-the-art accuracy is achieved using very simple methods, and suggests that the field of NLP might benefit from concentrating less on technology development and more on data acquisition.