Journal Article
The Ariadne approach to Web-based information integration
Craig A. Knoblock, Steven Minton, José Luis Ambite, Naveen Ashish, Ion Muslea, Andrew Philpot, Sheila Tejada
TLDR
This work has developed methods for mapping web sources into a uniform representation that makes it simple and efficient to integrate multiple sources, and that makes it easy to maintain the resulting information agents and incorporate new sources as they become available.
Abstract:
The Web is based on a browsing paradigm that makes it difficult to retrieve and integrate data from multiple sites. Today, the only way to do this is to build specialized applications, which are time-consuming to develop and difficult to maintain. We have addressed this problem by creating the technology and tools for rapidly constructing information agents that extract, query, and integrate data from web sources. Our approach is based on a uniform representation that makes it simple and efficient to integrate multiple sources. Instead of building specialized algorithms for handling web sources, we have developed methods for mapping web sources into this uniform representation. This approach builds on work from knowledge representation, databases, machine learning and automated planning. The resulting system, called Ariadne, makes it fast and easy to build new information agents that access existing web sources. Ariadne also makes it easy to maintain these agents and incorporate new sources as they become available.
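As a rough illustration of the idea in the abstract (mapping each web source into one uniform representation and then integrating at that level), the sketch below may help. It is not Ariadne's implementation: the two page snippets, the regular-expression "wrappers", and the helper names `wrap` and `join` are all hypothetical, and the uniform representation is assumed to be a simple relation of dictionaries.

```python
import re

# Hypothetical page snippets standing in for two web sources (not real sites).
RESTAURANT_PAGE = """
<li><b>Cafe Uno</b> - 12 Main St</li>
<li><b>Luigi's</b> - 99 Oak Ave</li>
"""
RATING_PAGE = """
<tr><td>Cafe Uno</td><td>A</td></tr>
<tr><td>Luigi's</td><td>B</td></tr>
"""


def wrap(page, pattern, fields):
    """Map a page into the assumed uniform representation: a list of dict rows."""
    return [dict(zip(fields, match)) for match in re.findall(pattern, page)]


def join(left, right, key):
    """Integrate two relations with a simple hash join on a shared attribute."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


# Each source gets its own extraction pattern, but both yield the same row format.
restaurants = wrap(
    RESTAURANT_PAGE, r"<li><b>(.*?)</b> - (.*?)</li>", ["name", "address"]
)
ratings = wrap(
    RATING_PAGE, r"<tr><td>(.*?)</td><td>(.*?)</td></tr>", ["name", "grade"]
)

# Integration happens on the uniform rows, independent of the page layouts.
integrated = join(restaurants, ratings, "name")
for row in integrated:
    print(row)
```

Once every source is reduced to the same row format, adding a new source in this sketch only means writing one more pattern and field list; the join logic is unchanged.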
Citations
Journal Article
Learning object identification rules for information integration
TL;DR: An object identification system called Active Atlas is presented, which compares the objects' shared attributes in order to identify matching objects rather than relying on exact text match, and which achieves higher accuracy and requires less user involvement than previous methods across various application domains.
Proceedings Article
Learning domain-independent string transformation weights for high accuracy object identification
TL;DR: Extensions to the Active Atlas system are discussed, which allow it to learn to tailor the weights of a set of general transformations to a specific application domain through limited user input, and demonstrate that this approach achieves higher accuracy and requires less user involvement than previous methods.
Proceedings Article
Estimated-regression planning for interactions with web services
TL;DR: Applying the proposed estimated-regression planners to the web-services domain requires extending classical notations in various ways; a preliminary implementation exists, and further tests are underway.
Patent
Automatic method and system for formulating and transforming representations of context used by information services
TL;DR: An information retrieval system for automatically retrieving information related to the context of an active task being manipulated by a user is presented; the system observes the operation of the active task and the user's interactions and uses predetermined criteria to generate a context representation.
Journal Article
Active learning with multiple views
TL;DR: In this article, a multi-view active learning framework is proposed, in which there are several disjoint subsets of features (views), each of which is sufficient to learn the target concept.
References
Journal Article
Mediators in the architecture of future information systems
TL;DR: A mediator is a software module that exploits encoded knowledge about certain sets or subsets of data to create information for a higher layer of applications; it simplifies, abstracts, reduces, merges, and explains data.
Proceedings Article
Fast planning through planning graph analysis
Avrim Blum, Merrick L. Furst
TL;DR: A new approach to planning in STRIPS-like domains based on constructing and analyzing a compact structure the authors call a Planning Graph is introduced, and a new planner, Graphplan, is described that uses this paradigm.
Proceedings Article
Querying Heterogeneous Information Sources Using Source Descriptions
TL;DR: The Information Manifold is described, an implemented system that provides uniform access to a heterogeneous collection of more than 100 information sources, many of them on the WWW, along with algorithms that use the source descriptions to efficiently prune the set of information sources for a given query.
Proceedings Article
Wrapper induction for information extraction
TL;DR: This work introduces wrapper induction, a method for automatically constructing wrappers, and identifies HLRT, a wrapper class that is efficiently learnable, yet expressive enough to handle 48% of a recently surveyed sample of Internet resources.
Journal Article
Learning Information Extraction Rules for Semi-Structured and Free Text
TL;DR: WHISK is designed to handle text styles ranging from highly structured to free text, including text that is neither rigidly formatted nor composed of grammatical sentences, and can also handle extraction from free text such as news stories.