Journal Article
Effective Web data extraction with standard XML technologies
TL;DR: ANDES uses XML technologies for data extraction, including Extensible HTML (XHTML) and Extensible Stylesheet Language Transformations (XSLT), and provides access to the “deep Web”.
About: This article was published in Computer Networks on 2002-08-05. It has received 121 citations to date. It focuses on the topics HTML and Information extraction.
Citations
Journal Article
The Platformization of the Web: Making Web Data Platform Ready
TL;DR: This article inquires into Facebook’s development as a platform by situating it within the transformation of social network sites into social media platforms. It takes a historical perspective on platformization, that is, the rise of the platform as the dominant infrastructural and economic model of the social web, and on its consequences.
Journal Article
Nearest neighbor selection for iteratively kNN imputation
TL;DR: The gray distance is much better than the Minkowski distance at both capturing the proximity relationship (or nearness) of two instances and dealing with mixed attributes, and experimental results show that the GkNN algorithm is much more efficient than existing kNN imputation methods.
Proceedings Article
Deriving marketing intelligence from online discussion
Natalie S. Glance, Matthew Hurst, Kamal Nigam, Matthew Siegler, Robert Stockton, Takashi Tomokiyo
TL;DR: It is argued that applications for mining large volumes of textual data for marketing intelligence should provide two key elements: a suite of powerful mining and visualization technologies and an interactive analysis environment which allows for rapid generation and testing of hypotheses.
Journal Article
Synthesizing an integrated ontology
TL;DR: The Mediator Environment for Multiple Information Sources (Momis) supports semiautomatic building, annotation, and extension of domain ontologies.
Patent
Product recommendations based on collaborative filtering of user data
TL;DR: A system gathers user behavior data from a group of web retailers and/or non-web retailers, analyzes that data to identify product recommendations for products offered by the web retailers, and provides one of the identified recommendations in connection with a product page associated with one of those web retailers.
References
The TSIMMIS project: Integration of heterogeneous information sources
Sudarshan S. Chawathe, Hector Garcia-Molina, Joachim Hammer, Kelly Ireland, Yannis Papakonstantinou, Jeffrey D. Ullman, Jennifer Widom
TL;DR: The TSIMMIS project is a joint project between Stanford and the IBM Almaden Research Center to develop tools that facilitate the rapid integration of heterogeneous information sources, which may include both structured and unstructured data.
Proceedings Article
XWRAP: an XML-enabled wrapper construction system for Web information sources
Ling Liu, Calton Pu, W. Han
TL;DR: The paper describes the methodology and the software development of XWRAP, an XML-enabled wrapper construction system for semi-automatic generation of wrapper programs, and introduces and develops a two-phase code generation framework.
Proceedings Article
Don't Scrap It, Wrap It! A Wrapper Architecture for Legacy Data Sources
Mary Roth, Peter Schwarz
TL;DR: The paper describes the architecture for wrappers, key components of Garlic that encapsulate data sources and mediate between them and the middleware. It shows that Garlic wrappers can be written quickly and that the architecture is flexible enough to accommodate data sources with a variety of data models and a broad range of traditional and non-traditional query processing capabilities.
Journal Article
Wrapper generation for semi-structured Internet sources
Naveen Ashish, Craig A. Knoblock
TL;DR: The key idea is to exploit the formatting information in pages from the source to hypothesize the underlying structure of a page and generate a wrapper that facilitates querying of a source and possibly integrating it with other sources.
Proceedings Article
Modeling Web sources for information integration
Craig A. Knoblock, Steven Minton, José Luis Ambite, Naveen Ashish, Pragnesh Jay Modi, Ion Muslea, Andrew Philpot, Sheila Tejada
TL;DR: This work develops methods for mapping web sources into a simple, uniform representation that makes it efficient to integrate multiple sources, and that makes it easy to maintain the resulting agents and incorporate new sources as they become available.