Effective Web data extraction with standard XML technologies

doi:10.1016/S1389-1286(02)00214-1

Journal Article

Jussi Myllymaki

- 05 Aug 2002 -

Computer Networks

- Vol. 39, Iss: 5, pp 635-644

TLDR

Key aspects of ANDES are that it uses XML technologies for data extraction, including Extensible HTML (XHTML) and Extensible Stylesheet Language Transformations (XSLT), and provides access to the “deep Web”.

About:

This article was published in Computer Networks on 2002-08-05. It has received 121 citations to date. The article focuses on the topics: HTML & Information extraction.

Citations

Journal Article

The Platformization of the Web: Making Web Data Platform Ready

Anne Helmond

- 30 Sep 2015 -

Social media and society

TL;DR: This article inquires into Facebook’s development as a platform by situating it within the transformation of social network sites into social media platforms, with a historical perspective on platformization (the rise of the platform as the dominant infrastructural and economic model of the social web) and its consequences.

Journal Article

Nearest neighbor selection for iteratively kNN imputation

Shichao Zhang

- 01 Nov 2012 -

Journal of Systems and Software

TL;DR: The gray distance is much better than the Minkowski distance at both capturing the proximity relationship (or nearness) of two instances and dealing with mixed attributes, and experimental results show that the GkNN algorithm is much more efficient than existing kNN imputation methods.

Proceedings Article

Deriving marketing intelligence from online discussion

Natalie S. Glance, +5 more

TL;DR: It is argued that applications for mining large volumes of textual data for marketing intelligence should provide two key elements: a suite of powerful mining and visualization technologies and an interactive analysis environment which allows for rapid generation and testing of hypotheses.

Journal Article

Synthesizing an integrated ontology

Domenico Beneventano, +3 more

- 01 Sep 2003 -

IEEE Internet Computing

TL;DR: The Mediator Environment for Multiple Information Sources (Momis) supports semiautomatic building, annotation, and extension of domain ontologies.

Patent

Product recommendations based on collaborative filtering of user data

Michael Stoppelman

TL;DR: A system gathers user behavior data from a group of web retailers and/or non-web retailers, analyzes the user behavior data to identify product recommendations for products offered by the web retailers, and provides one of the identified product recommendations in connection with a product page associated with one of those web retailers.

References

The TSIMMIS project: Integration of heterogeneous information sources

Sudarshan S. Chawathe, +6 more

TL;DR: TSIMMIS is a joint project between Stanford and the IBM Almaden Research Center to develop tools that facilitate the rapid integration of heterogeneous information sources that may include both structured and unstructured data.

Proceedings Article

XWRAP: an XML-enabled wrapper construction system for Web information sources

Ling Liu, +2 more

TL;DR: The paper describes the methodology and the software development of XWRAP, an XML-enabled wrapper construction system for semi-automatic generation of wrapper programs, and introduces and develops a two-phase code generation framework.

Proceedings Article

Don't Scrap It, Wrap It! A Wrapper Architecture for Legacy Data Sources

Mary Roth, +1 more

TL;DR: The paper describes the architecture for wrappers, key components of Garlic that encapsulate data sources and mediate between them and the middleware. It shows that Garlic wrappers can be written quickly and that the architecture is flexible enough to accommodate data sources with a variety of data models and a broad range of traditional and non-traditional query processing capabilities.

Journal Article

Wrapper generation for semi-structured Internet sources

Naveen Ashish, +1 more

TL;DR: The key idea is to exploit the formatting information in pages from the source to hypothesize the underlying structure of a page and generate a wrapper that facilitates querying of a source and possibly integrating it with other sources.

Proceedings Article

Modeling Web sources for information integration

Craig A. Knoblock, +7 more

TL;DR: This work has developed methods for mapping web sources into a simple, uniform representation that makes it efficient to integrate multiple sources and makes it easy to maintain these agents and incorporate new sources as they become available.

Related Papers (5)

XWRAP: an XML-enabled wrapper construction system for Web information sources

A brief survey of web data extraction tools

RoadRunner: Towards Automatic Data Extraction from Large Web Sites

The TSIMMIS project: Integration of heterogeneous information sources

Visual Web Information Extraction with Lixto