
Altigran Soares da Silva

Researcher at Federal University of Amazonas

Publications - 132
Citations - 3029

Altigran Soares da Silva is an academic researcher at the Federal University of Amazonas. The author has contributed to research on topics including Web pages and ranking (information retrieval). The author has an h-index of 27 and has co-authored 127 publications, which have received 2909 citations. Previous affiliations of Altigran Soares da Silva include the Universidade Federal de Minas Gerais.

Papers
Journal Article

A brief survey of web data extraction tools

TL;DR: A taxonomy for characterizing Web data extraction tools is proposed, the major Web data extraction tools described in the literature are briefly surveyed, and a qualitative analysis of them is provided.
Journal Article

DEByE - Data Extraction By Example

TL;DR: This paper presents DEByE (Data Extraction By Example), an approach to extracting data from Web sources, based on a small set of examples specified by the user, which adopts nested tables as its visual paradigm.
Proceedings Article

A source independent framework for research paper recommendation

TL;DR: This paper proposes a novel source-independent framework for research paper recommendation. The framework requires only a single research paper as input and generates several potential queries from the terms in that paper; these queries are then submitted to existing Web information sources that hold research papers.
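To make the query-generation step concrete, here is a minimal sketch, assuming a simple term-frequency ranking: it picks the most frequent non-stopword terms of one paper and combines them into candidate queries. The function names `top_terms` and `generate_queries`, the tiny stopword list, and the pairwise combination scheme are all hypothetical illustrations, not the framework's actual strategy; submitting the queries to Web sources is out of scope here.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical, deliberately small stopword list (assumption for illustration).
STOPWORDS = {"a", "an", "the", "of", "for", "and", "in", "on", "to", "is",
             "are", "that", "by", "with", "we", "this", "paper"}


def top_terms(text: str, k: int) -> list[str]:
    """Return the k most frequent non-stopword terms.

    Ties keep the order in which terms were first seen, because
    Counter.most_common orders equal counts by first occurrence.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return [term for term, _ in counts.most_common(k)]


def generate_queries(text: str, k: int = 4, terms_per_query: int = 2) -> list[str]:
    """Build candidate queries as combinations of the top-k terms (hypothetical scheme)."""
    terms = top_terms(text, k)
    return [" ".join(combo) for combo in combinations(terms, terms_per_query)]


if __name__ == "__main__":
    abstract = ("Template detection removes template terms. "
                "Template terms repeat across pages. Detection improves clustering.")
    print(generate_queries(abstract, k=3))
```

Combining only a few high-frequency terms per query keeps each query short enough for typical search interfaces; a real system would likely weight terms (for example with TF-IDF) rather than use raw counts.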
Proceedings Article

A fast and robust method for web page template detection and removal

TL;DR: The approach is shown to be effective at identifying terms that occur in templates, obtaining F-measure values of around 0.9, and it also improves the accuracy of web page clustering and classification methods.
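As a rough illustration of what "identifying terms occurring in templates" and the reported F-measure mean, the sketch below uses a simple document-frequency baseline: a term that appears in at least a given fraction of a site's pages is flagged as a template term, and the flagged set is scored against a gold set with the standard F-measure. The threshold `min_fraction`, the function names, and the whitespace tokenization are assumptions for illustration; this is not the method proposed in the paper.

```python
from collections import Counter


def template_terms(pages: list[str], min_fraction: float = 0.8) -> set[str]:
    """Flag terms that occur in at least min_fraction of the pages (baseline heuristic)."""
    doc_freq: Counter[str] = Counter()
    for page in pages:
        # Count each term at most once per page (document frequency).
        doc_freq.update(set(page.lower().split()))
    threshold = min_fraction * len(pages)
    return {term for term, df in doc_freq.items() if df >= threshold}


def f_measure(predicted: set[str], gold: set[str]) -> float:
    """Harmonic mean of precision and recall of predicted against gold terms."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    site = [
        "home about contact news article one",
        "home about contact sports story two",
        "home about contact weather report three",
    ]
    found = template_terms(site)
    print(sorted(found), f_measure(found, {"home", "about", "contact", "footer"}))
```

In this toy example the three navigation words are flagged, giving precision 1.0 and recall 0.75 against the hypothetical gold set.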
Journal Article

Automatic generation of agents for collecting hidden web pages for data extraction

TL;DR: The method uses a pre-existing data repository to identify the contents of hidden Web pages and takes advantage of patterns found across Web sites to identify the navigation paths to follow.