Journal Article

Effective Web data extraction with standard XML technologies

Jussi Myllymaki
05 Aug 2002 - Vol. 39, Iss. 5, pp 635-644
TLDR
Key aspects of ANDES are that it uses XML technologies for data extraction, including Extensible HTML and Extensible Stylesheet Language Transformations, and provides access to the “deep Web”.
About
This article is published in Computer Networks. The article was published on 2002-08-05. It has received 121 citations to date. The article focuses on the topics: HTML & Information extraction.

Citations
Journal Article

The Platformization of the Web: Making Web Data Platform Ready

TL;DR: This article inquires into Facebook’s development as a platform by situating it within the transformation of social network sites into social media platforms, with a historical perspective on platformization: the rise of the platform as the dominant infrastructural and economic model of the social web, and its consequences.
Journal Article

Nearest neighbor selection for iteratively kNN imputation

TL;DR: The gray distance is much better than the Minkowski distance at both capturing the proximity relationship (or nearness) of two instances and dealing with mixed attributes, and experimental results show that the GkNN algorithm is much more efficient than existing kNN imputation methods.
Proceedings Article

Deriving marketing intelligence from online discussion

TL;DR: It is argued that applications for mining large volumes of textual data for marketing intelligence should provide two key elements: a suite of powerful mining and visualization technologies and an interactive analysis environment which allows for rapid generation and testing of hypotheses.
Journal Article

Synthesizing an integrated ontology

TL;DR: The Mediator Environment for Multiple Information Sources (MOMIS) supports semiautomatic building, annotation, and extension of domain ontologies.
Patent

Product recommendations based on collaborative filtering of user data

TL;DR: A system gathers user behavior data from a group of web retailers and/or non-web retailers, analyzes the user behavior data to identify product recommendations for products offered by the web retailers, and provides one of the identified product recommendations in connection with a product page associated with one of those web retailers.
References

The TSIMMIS project: Integration of heterogeneous information sources

TL;DR: The TSIMMIS project is a joint project between Stanford and the IBM Almaden Research Center to develop tools that facilitate the rapid integration of heterogeneous information sources that may include both structured and unstructured data.
Proceedings Article

XWRAP: an XML-enabled wrapper construction system for Web information sources

TL;DR: The paper describes the methodology and the software development of XWRAP, an XML-enabled wrapper construction system for semi-automatic generation of wrapper programs, and introduces and develops a two-phase code generation framework.
Proceedings Article

Don't Scrap It, Wrap It! A Wrapper Architecture for Legacy Data Sources

TL;DR: The paper describes the architecture for wrappers, key components of Garlic that encapsulate data sources and mediate between them and the middleware. It shows that Garlic wrappers can be written quickly and that the architecture is flexible enough to accommodate data sources with a variety of data models and a broad range of traditional and non-traditional query processing capabilities.
Journal Article

Wrapper generation for semi-structured Internet sources

TL;DR: The key idea is to exploit the formatting information in pages from the source to hypothesize the underlying structure of a page and generate a wrapper that facilitates querying of a source and possibly integrating it with other sources.
Proceedings Article

Modeling Web sources for information integration

TL;DR: This work has developed methods for mapping web sources into a simple, uniform representation that makes it efficient to integrate multiple sources and makes it easy to maintain these agents and incorporate new sources as they become available.