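Linguateca: um Centro de Recursos Distribuído para o Processamento Computacional da Língua Portuguesa (Linguateca: A Distributed Resource Center for the Computational Processing of the Portuguese Language)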

Open Access

TLDR

This paper presents an overview of Linguateca's work in creating and distributing resources and tools for the Portuguese language.

Abstract:

In this article we present an overview of Linguateca's work in creating and distributing resources and tools for the Portuguese language. We begin by describing Linguateca's objectives and underlying assumptions and giving a brief history of its involvement in the field, and we conclude with some remarks on how best to continue organizing the area.
