Journal Article

The Ariadne approach to Web-based information integration

TLDR
This work develops methods for mapping web sources into a uniform representation that makes it simple and efficient to integrate multiple sources. The resulting system, Ariadne, makes it easy to build and maintain information agents and to incorporate new sources as they become available.
Abstract
The Web is based on a browsing paradigm that makes it difficult to retrieve and integrate data from multiple sites. Today, the only way to do this is to build specialized applications, which are time-consuming to develop and difficult to maintain. We have addressed this problem by creating the technology and tools for rapidly constructing information agents that extract, query, and integrate data from web sources. Our approach is based on a uniform representation that makes it simple and efficient to integrate multiple sources. Instead of building specialized algorithms for handling web sources, we have developed methods for mapping web sources into this uniform representation. This approach builds on work from knowledge representation, databases, machine learning and automated planning. The resulting system, called Ariadne, makes it fast and easy to build new information agents that access existing web sources. Ariadne also makes it easy to maintain these agents and incorporate new sources as they become available.


Citations
Journal Article

Learning object identification rules for information integration

TL;DR: Describes an object identification system called Active Atlas, which compares objects' shared attributes in order to identify matching objects using exact text match, and which achieves higher accuracy and requires less user involvement than previous methods across various application domains.
Proceedings Article

Learning domain-independent string transformation weights for high accuracy object identification

TL;DR: Extensions to the Active Atlas system are discussed, which allow it to learn to tailor the weights of a set of general transformations to a specific application domain through limited user input, and demonstrate that this approach achieves higher accuracy and requires less user involvement than previous methods.
Proceedings Article

Estimated-regression planning for interactions with web services

TL;DR: A preliminary implementation of the proposed estimated-regression planner for the web-services domain requires extending classical notations in various ways, and further tests are underway.
Patent

Automatic method and system for formulating and transforming representations of context used by information services

TL;DR: In this article, an information retrieval system for automatically retrieving information related to the context of an active task being manipulated by a user is presented, where the system observes the operation of the active task and user interactions and utilizes predetermined criteria to generate a context representation.
Journal Article

Active learning with multiple views

TL;DR: In this article, a multi-view active learning framework is proposed, in which there are several disjoint subsets of features (views), each of which is sufficient to learn the target concept.
References
Journal Article

Mediators in the architecture of future information systems

TL;DR: A mediator is a software module that exploits encoded knowledge about certain sets or subsets of data to create information for a higher layer of applications; it simplifies, abstracts, reduces, merges, and explains data.
Proceedings Article

Fast planning through planning graph analysis

TL;DR: A new approach to planning in STRIPS-like domains based on constructing and analyzing a compact structure the authors call a Planning Graph is introduced, and a new planner, Graphplan, is described that uses this paradigm.
Proceedings Article

Querying Heterogeneous Information Sources Using Source Descriptions

TL;DR: Describes the Information Manifold, an implemented system that provides uniform access to a heterogeneous collection of more than 100 information sources, many of them on the WWW, along with algorithms that use the source descriptions to efficiently prune the set of information sources for a given query.
Proceedings Article

Wrapper induction for information extraction

TL;DR: This work introduces wrapper induction, a method for automatically constructing wrappers, and identifies HLRT, a wrapper class that is efficiently learnable, yet expressive enough to handle 48% of a recently surveyed sample of Internet resources.
Journal Article

Learning Information Extraction Rules for Semi-Structured and Free Text

TL;DR: WHISK is designed to handle text styles ranging from highly structured to free text, including text that is neither rigidly formatted nor composed of grammatical sentences, and can also handle extraction from free text such as news stories.