
Showing papers on "Semantic Web Stack published in 1997"


Proceedings ArticleDOI
03 Nov 1997
TL;DR: This paper defines Web mining and presents an overview of the various research issues, techniques, and development efforts, and briefly describes WEBMINER, a system for Web usage mining, and concludes the paper by listing research issues.
Abstract: Application of data mining techniques to the World Wide Web, referred to as Web mining, has been the focus of several recent research projects and papers. However, there is no established vocabulary, leading to confusion when comparing research efforts. The term Web mining has been used in two distinct ways. The first, called Web content mining in this paper, is the process of information discovery from sources across the World Wide Web. The second, called Web usage mining, is the process of mining for user browsing and access patterns. We define Web mining and present an overview of the various research issues, techniques, and development efforts. We briefly describe WEBMINER, a system for Web usage mining, and conclude the paper by listing research issues.

1,365 citations
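The usage-mining side of the distinction above can be given a toy illustration (this is not WEBMINER's actual algorithm): count contiguous page sequences across user sessions and keep the frequent ones. All names, sessions, and thresholds below are invented.

```python
from collections import Counter

def frequent_patterns(sessions, length=2, min_support=2):
    """Count contiguous page sequences of a given length across
    user sessions and keep those meeting a minimum support."""
    counts = Counter()
    for session in sessions:
        for i in range(len(session) - length + 1):
            counts[tuple(session[i:i + length])] += 1
    return {pat: n for pat, n in counts.items() if n >= min_support}

# Hypothetical browsing sessions reconstructed from a server log.
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products", "/support"],
    ["/home", "/products", "/cart", "/checkout"],
]
print(frequent_patterns(sessions, length=2, min_support=2))
```

Real usage-mining systems work from access logs with timestamps and session heuristics; the point here is only the pattern-counting step.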


Proceedings ArticleDOI
08 Feb 1997
TL;DR: ShopBot, a fully-implemented, domain-independent comparison-shopping agent that relies on a combination of heuristic search, pattern matching, and inductive learning techniques, enables users to both find superior prices and substantially reduce Web shopping time.
Abstract: The World Wide Web is less agent-friendly than we might hope. Most information on the Web is presented in loosely structured natural language text with no agent-readable semantics. HTML annotations structure the display of Web pages, but provide virtually no insight into their content. Thus, the designers of intelligent Web agents need to address the following questions: (1) To what extent can an agent understand information published at Web sites? (2) Is the agent's understanding sufficient to provide genuinely useful assistance to users? (3) Is site-specific hand-coding necessary, or can the agent automatically extract information from unfamiliar Web sites? (4) What aspects of the Web facilitate this competence? In this paper we investigate these issues with a case study using ShopBot, a fully-implemented, domain-independent comparison-shopping agent. Given the home pages of several online stores, ShopBot autonomously learns how to shop at those vendors. After learning, it is able to speedily visit over a dozen software and CD vendors, extract product information, and summarize the results for the user. Preliminary studies show that ShopBot enables users to both find superior prices and substantially reduce Web shopping time. Remarkably, ShopBot achieves this performance without sophisticated natural language processing, and requires only minimal knowledge about different product domains. Instead, ShopBot relies on a combination of heuristic search, pattern matching, and inductive learning techniques.

593 citations
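ShopBot's reliance on pattern matching rather than language understanding can be illustrated with a toy sketch. The single hand-written regex below stands in for the per-vendor patterns ShopBot learns; the vendor names and page snippets are invented.

```python
import re

# A hand-written pattern standing in for a learned, per-site one.
PRICE_RE = re.compile(r"\$(\d+(?:\.\d{2})?)")

def extract_offers(pages):
    """Pull (vendor, price) pairs out of product-listing text and
    return them cheapest-first, as a comparison shopper would."""
    offers = []
    for vendor, text in pages.items():
        m = PRICE_RE.search(text)
        if m:
            offers.append((vendor, float(m.group(1))))
    return sorted(offers, key=lambda o: o[1])

# Invented vendor pages for the same product.
pages = {
    "store-a": "<b>WidgetWriter 2.0</b> only $49.95!",
    "store-b": "WidgetWriter 2.0 ... our price: $44.00",
}
print(extract_offers(pages))  # cheapest vendor listed first
```

The interesting part of ShopBot, learning which pattern fits each site, is elided here; the sketch shows only how far plain matching gets once a pattern is known.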


Proceedings ArticleDOI
08 Feb 1997
TL;DR: SHOE, a set of Simple HTML Ontology Extensions which allow World-Wide Web authors to annotate their pages with semantic knowledge such as “I am a graduate student” or “This person is my graduate advisor”, is described.
Abstract: This paper describes SHOE, a set of Simple HTML Ontology Extensions which allow World-Wide Web authors to annotate their pages with semantic knowledge such as “I am a graduate student” or “This person is my graduate advisor”. These annotations are expressed in terms of ontological knowledge which can be generated by using or extending standard ontologies available on the Web. This makes it possible to ask Web agent queries such as “Find me all graduate students in Maryland who are working on a project funded by DoD initiative 123-4567”, instead of the simplistic keyword searches enabled by current search engines. We have also developed a web-crawling agent, Exposé, which interns SHOE knowledge from web documents, making these kinds of queries a reality.

246 citations
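To get a rough sense of how an agent might consume page-level annotations like these, the sketch below pulls category and relation assertions out of a page. The tag shapes only approximate SHOE-style markup and are not taken from the SHOE specification; the page content is invented.

```python
import re

# Illustrative tag shapes, not the actual SHOE syntax.
CATEGORY_RE = re.compile(r'<CATEGORY\s+NAME="([^"]+)"', re.I)
RELATION_RE = re.compile(r'<RELATION\s+NAME="([^"]+)"\s+TO="([^"]+)"', re.I)

def parse_annotations(html):
    """Collect category and relation assertions embedded in a page."""
    return {
        "categories": CATEGORY_RE.findall(html),
        "relations": RELATION_RE.findall(html),
    }

page = '''
<CATEGORY NAME="GraduateStudent">
<RELATION NAME="advisor" TO="http://cs.example.edu/~smith">
'''
print(parse_annotations(page))
```

An agent like Exposé would accumulate such assertions across crawled pages into a knowledge base that structured queries can then run against.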


Proceedings Article
25 Aug 1997
TL;DR: A set of languages for managing and restructuring data coming from the World Wide Web is introduced, inspired by the structures typically present in Web sites, and a specific data model, called the ARANEUS Data Model, is presented to describe the scheme of a Web hypertext, in the spirit of databases.
Abstract: The paper discusses the issue of views in the Web context. We introduce a set of languages for managing and restructuring data coming from the World Wide Web. We present a specific data model, called the ARANEUS Data Model, inspired by the structures typically present in Web sites. The model allows us to describe the scheme of a Web hypertext, in the spirit of databases. Based on the data model, we develop two languages to support a sophisticated view definition process: the first, called ULIXES, is used to build database views of the Web, which can then be analyzed and integrated using database techniques; the second, called PENELOPE, allows the definition of derived Web hypertexts from relational views. This can be used to generate hypertextual views over the Web.

234 citations
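The two view directions described above (Web pages into relations, relations back into hypertext) can be caricatured in a few lines. The functions and record fields below are invented stand-ins for what ULIXES and PENELOPE express declaratively, not their actual semantics.

```python
def to_relation(pages):
    """Flatten semi-structured page records into uniform tuples
    (the direction a ULIXES-style view takes)."""
    return [(p["author"], p["title"], p["url"]) for p in pages]

def to_hypertext(rows):
    """Render a derived hypertext from the relation
    (the direction a PENELOPE-style view takes)."""
    items = "".join(
        f'<li><a href="{url}">{title}</a> ({author})'
        for author, title, url in rows)
    return f"<ul>{items}</ul>"

# Invented page records, as if extracted from a site.
pages = [
    {"author": "Rossi", "title": "Views", "url": "http://example.org/v"},
]
print(to_hypertext(to_relation(pages)))
```

The real languages describe these mappings against a hypertext scheme rather than hard-coding them, which is what makes the views reusable across a site.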


Proceedings Article
14 Aug 1997
TL;DR: Webfoot, a preprocessor that parses web pages into logically coherent segments based on page layout cues, is introduced; its output is passed on to CRYSTAL, an NLP system that learns text extraction rules from examples.
Abstract: There is a wealth of information to be mined from narrative text on the World Wide Web. Unfortunately, standard natural language processing (NLP) extraction techniques expect full, grammatical sentences, and perform poorly on the choppy sentence fragments that are often found on web pages. This paper introduces Webfoot, a preprocessor that parses web pages into logically coherent segments based on page layout cues. Output from Webfoot is then passed on to CRYSTAL, an NLP system that learns text extraction rules from examples. Webfoot and CRYSTAL transform the text into a formal representation that is equivalent to relational database entries. This is a necessary first step for knowledge discovery and other automated analysis of free text.

231 citations
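Layout-based segmentation of the kind Webfoot performs can be approximated crudely: split the page wherever a tag implies a visual break, then strip the remaining markup. This is a toy sketch under that assumption, not Webfoot's algorithm.

```python
import re

# Tags that usually mark a visual break: rules, headings,
# paragraphs, list items, line breaks.
BREAK_RE = re.compile(r"<(?:hr|h[1-6]|p|li|br\s*/?)[^>]*>", re.I)
TAG_RE = re.compile(r"<[^>]+>")

def segment(html):
    """Split a page at layout-break tags, strip leftover markup,
    and drop empty fragments."""
    parts = BREAK_RE.split(html)
    cleaned = [TAG_RE.sub("", p).strip() for p in parts]
    return [p for p in cleaned if p]

# Invented fragment in the choppy style the abstract describes.
html = "<h1>Specials</h1><li>RAM 32MB $89<li>Modem 33.6k $59"
print(segment(html))
```

Each resulting fragment is the kind of unit a downstream extractor such as CRYSTAL could treat as one logical segment.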



Journal ArticleDOI
01 Sep 1997
TL;DR: This paper presents a unique approach that tightly integrates searching and browsing in a manner that improves both paradigms and is embodied in WebCutter, a client-server system fully integrated with Web software.
Abstract: Conventional information discovery tools can be classified as being either search oriented or browse oriented. In the context of the Web, search-oriented tools employ text-analysis techniques to find Web documents based on user-specified queries, whereas browse-oriented ones employ site mapping and visualization techniques to allow users to navigate through the Web. This paper presents a unique approach that tightly integrates searching and browsing in a manner that improves both paradigms. When browsing is the primary task, it enables semantic content-based tailoring of Web maps in both the generation and the visualization phases. When search is the primary task, it makes it possible to contextualize the results by augmenting them with the documents' neighborhoods. The approach is embodied in WebCutter, a client-server system fully integrated with Web software. WebCutter consists of a map generator running off a standard Web server and a map visualization client implemented as a Java applet, runnable from any standard Web browser and requiring no installation or external plug-in application. WebCutter is in beta stage and is in the process of being integrated into the Lotus Domino.Applications product line.

112 citations


Proceedings ArticleDOI
15 Apr 1997
TL;DR: A technique to form focus+context views of World-Wide Web nodes that shows the immediate neighborhood of the current node and its position with respect to the important (landmark) nodes in the information space.
Abstract: With the explosive growth of information that is available on the World-Wide Web, it is very easy for the user to get lost in hyperspace. When the user feels lost, some idea of the position of the current node in the overall information space will help to orient the user. Therefore we have developed a technique to form focus+context views of World-Wide Web nodes. The view shows the immediate neighborhood of the current node and its position with respect to the important (landmark) nodes in the information space. The views have been used to enhance a Web search engine. We have also used the landmark nodes and the focus+context views in forming overview diagrams of Web sites.

79 citations
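A minimal sketch of the two ingredients of such a view, landmark selection and the focus node's immediate neighborhood, follows. Here landmarks are simply the best-connected nodes, which is an invented criterion for illustration, not necessarily the paper's; the link graph is also invented.

```python
from collections import defaultdict

def landmarks(edges, top=2):
    """Rank nodes by degree; the best-connected ones serve as
    landmarks for orienting the view (an assumed criterion)."""
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return sorted(degree, key=degree.get, reverse=True)[:top]

def focus_view(edges, focus):
    """Immediate neighborhood of the focused node."""
    return sorted({b for a, b in edges if a == focus} |
                  {a for a, b in edges if b == focus})

edges = [("home", "docs"), ("home", "faq"), ("docs", "api"),
         ("docs", "guide"), ("faq", "contact")]
print(landmarks(edges))
print(focus_view(edges, "docs"))
```

A focus+context display would draw the neighborhood in detail and place the landmarks around it to convey position in the overall space.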


Journal ArticleDOI

53 citations


Proceedings ArticleDOI
15 Apr 1997
TL;DR: How queries can be used as the basis for navigation is discussed, and it is argued that this is integral to current efforts to integrate hypermedia and information retrieval.
Abstract: This paper discusses an approach to navigation based on queries made possible by a semantic hypermedia architecture. Navigation via query offers an augmented browsing capacity based on measures of semantic closeness between terms in an index space that models the classification of artefacts within a museum collection management system. The paper discusses some of the possibilities that automatic traversal of relationships in the index space holds for hybrid query/navigation tools, such as navigation via similarity and query generalisation. The example scenario suggests that, although these tools are implemented by complex queries, they fit into a browsing, rather than an analytical style of access. Such hybrid navigation tools are capable of overcoming some of the limitations of manual browsing and contributing to a smooth transition between browsing and query. A prototype implementation of the architecture is described, along with details of a social history application with three dimensions of classification schema in the index space. The paper discusses how queries can be used as the basis for navigation, and argues that this is integral to current efforts to integrate hypermedia and information retrieval.

50 citations
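Semantic closeness between index terms can be given a simple concrete reading: the number of steps separating two terms in the classification hierarchy. The sketch below assumes a single "broader term" link per term, an invented simplification of the paper's index space, with a made-up museum classification as data.

```python
def closeness(a, b, broader):
    """Steps between two index terms via their nearest common
    broader term; a small count suggests semantic closeness."""
    def ancestors(t):
        chain, d = {}, 0
        while t is not None:
            chain[t] = d
            t, d = broader.get(t), d + 1
        return chain
    ca, cb = ancestors(a), ancestors(b)
    common = set(ca) & set(cb)
    if not common:
        return None  # no shared classification
    return min(ca[t] + cb[t] for t in common)

# Hypothetical classification: each term maps to its broader term.
broader = {"teacup": "ceramics", "saucer": "ceramics",
           "ceramics": "domestic life", "washboard": "domestic life"}
print(closeness("teacup", "saucer", broader))
print(closeness("teacup", "washboard", broader))
```

A navigation-via-similarity tool could rank candidate destinations by such a measure, letting a query engine drive what feels to the user like browsing.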




Proceedings ArticleDOI
02 Jul 1997
TL;DR: This paper shows how semantic structuring of documents can be efficiently defined using SGML syntax; the medical patient record is presented as a relevant example of handling semantically structured documents.
Abstract: This paper presents a formal model for an explicit description of the semantic structure which implicitly exists within documents. This model relies on content meaning description of document elements. Meaning representation is distributed in the overall architecture model: it binds a semantic structure, a logical structure of documents and a domain model. The semantic structure contains two levels of description: the meaning representation of information units and of the rhetorical organisation of the document. The description logic formalism is used to represent the semantics of document elements and the document's rhetorical organisation. This paper shows how semantic structuring of documents can be efficiently defined using SGML syntax. Using this document structuring standard, one can define two levels of description: a generic semantic structure (vs. Document Type Definition) and a specific semantic structure (vs. instantiated document) in order to define an abstract interface to the information stored in documents. The medical patient record is presented as a relevant example of handling semantically structured documents.

01 Jan 1997
TL;DR: These tools utilize robust language processing techniques to generate multi-purpose data structures called LEXICAL WEBS, used in the system TEXTRACT, an automated semantic indexing program designed to parse, index, and hyperlink electronic documents.
Abstract: In this paper, we describe linguistically sophisticated tools for the automatic annotation and navigation of on-line documents. Creation of these tools relies on research into finite-state technologies for the design and development of lexically-intensive semantic indexing, shallow semantic understanding, and content abstraction techniques for texts. These tools utilize robust language processing techniques to generate multi-purpose data structures called LEXICAL WEBS, used in the system TEXTRACT, an automated semantic indexing program designed to parse, index, and hyperlink electronic documents.


Posted Content
TL;DR: This paper outlines the Context Mediation abductive framework, presents the Context Mediation procedure, and shows how this procedure is implemented using a constraint store and a constraint propagation model.
Abstract: The Context Interchange Project is studying the semantic integration of disparate, i.e. distributed and heterogeneous, information sources. In this paper, we illustrate the principle of Context Mediation using examples from an application of our prototype. We outline the Context Mediation abductive framework and present the Context Mediation procedure. We show how this procedure is implemented using a constraint store and a constraint propagation model. We discuss the relationship between our approach and previous work on semantic query optimization.

Journal ArticleDOI
01 Sep 1997
TL;DR: This paper proposes a well-worked form interaction abstraction that alleviates a significant Web deficiency - supporting truly interactive and dynamic form-based input and fitting within the interaction and usage model of the Web.
Abstract: The phenomenal interest and growth of the World Wide Web as an application server has pushed the Web model to its limits. Specifically, the Web offers limited interactivity and versatility as a platform for networked applications. One major challenge for the HCI community is to determine how to improve the human-computer interface for Web-based applications. This paper focuses on a significant Web deficiency - supporting truly interactive and dynamic form-based input. We propose a well-worked form interaction abstraction that alleviates this Web deficiency. We describe how the abstraction is seamlessly integrated into the Web framework by leveraging the virtues of the Web and fitting within the interaction and usage model of the Web.

Book ChapterDOI
30 May 1997

03 Oct 1997
TL;DR: Glass or ceramic-to-metal composites or seals are described wherein the glass or ceramic is bonded to a copper base alloy having a thin film of Al2O3 on its surface.
Abstract: Glass or ceramic-to-metal composites or seals wherein the glass or ceramic is bonded to a copper base alloy having a thin film of Al2O3 on its surface. The Al2O3 film comprises at least 10 percent, up to 100 percent, of the oxide film thickness on the metal. The copper base alloy preferably contains 2 to 10 percent aluminum with C.D.A. Alloy 638 being the most preferred alloy. The invention also includes the process of bonding the glasses or ceramics to the metal. Substantial mismatch between the coefficient of thermal expansion of the glasses or ceramics and the copper base alloys may be tolerated in accordance with this invention.


Book ChapterDOI
01 Jan 1997
TL;DR: The research in this paper aims at using world knowledge to aid the user in retrieving information from the World Wide Web and identifies issues together with methods to address them.
Abstract: The research in this paper aims at using world knowledge to aid the user in retrieving information from the World Wide Web. Some issues are identified together with methods to address them.



Proceedings ArticleDOI
Wendy A. Kellogg1, Jakob Nielsen
22 Mar 1997
TL;DR: This workshop aims to gain some perspective on the rapidly changing landscape of the Web by driving up the level of abstraction in considering significant Web phenomena by creating conceptual leverage to augment understanding of what the Web is, and what it will become in the future.
Abstract: In the history of computing, there has been nothing comparable to the World Wide Web. No one predicted the Web or its unprecedented growth, which continues almost unabated today. The purpose of this workshop is to gain some perspective on the rapidly changing landscape of the Web by driving up the level of abstraction in considering significant Web phenomena. We seek to create conceptual leverage to augment our understanding of what the Web is, and what it will become in the future.

Book ChapterDOI
01 Jan 1997
TL;DR: This paper reports the results of research on semantic inter-operability in real-time, distributed object-oriented systems and proposes an architectural framework for distributed heterogeneous object system organization to provide a basis of inter-operation for different kinds of object models and application domains.
Abstract: This paper reports the results of research on semantic inter-operability in real-time, distributed object-oriented systems. It proposes an architectural framework for distributed heterogeneous object system organization. The major goals of the framework are to provide a basis of inter-operation for different kinds of object models and application domains.

Dissertation
01 Jan 1997
TL;DR: This thesis develops a full-scale semantic content-based model that caters for the above seven semantic aspects of video and audio, and uses an entities of interest approach, instead of a structure-oriented one.
Abstract: Issues of syntax have dominated research in multimedia information systems (MMISs), with video developing as a technology of images and audio as one of signals. But when we use video and audio, we do so for their content. This is a semantic issue. Current research in multimedia on semantic content-based models has adopted a structure-oriented approach, where video and audio content is described on a frame-by-frame or segment-by-segment basis (where a segment is an arbitrary set of contiguous frames). This approach has failed to cater for semantic aspects, and thus has not been fully effective when used within an MMIS. The research undertaken for this thesis reveals seven semantic aspects of video and audio: (1) explicit media structure; (2) objects; (3) spatial relationships between objects; (4) events and actions involving objects; (5) temporal relationships between events and actions; (6) integration of syntactic and semantic information; and (7) direct user-media interaction. This thesis develops a full-scale semantic content-based model that caters for the above seven semantic aspects of video and audio. To achieve this, it uses an entities of interest approach, instead of a structure-oriented one, where the MMIS integrates relevant semantic content-based information about video and audio with information about the entities of interest to the system, e.g. mountains, vehicles, employees. A method for developing an interactive MMIS that encompasses the model is also described. Both the method and the model are used in the development of ARISTOTLE, an interactive instructional MMIS for teaching young children about zoology, in order to demonstrate their operation.


Book ChapterDOI
11 Jan 1997
TL;DR: Data structures and a transformation algorithm are developed here for efficient constraint selection and processing and to verify that transformed queries can be executed more efficiently, a cost analysis method is utilized.
Abstract: This paper investigates an algorithm for semantic query optimization for object-oriented databases (OODBs). Semantic query optimization is accomplished by applying transformation rules that use semantic integrity constraints to modify queries to execute more efficiently. Data structures and a transformation algorithm are developed here for efficient constraint selection and processing. To verify that transformed queries can be executed more efficiently, a cost analysis method is utilized that compares the cost of an original query and the cost of its modified version. Representing and manipulating semantic knowledge in an OODB and effectively using this knowledge to enhance query performance are contributions of this research.
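The core move described above, using an integrity constraint to discard a redundant query predicate, can be sketched in a few lines. Only one trivial constraint form is handled, and the triple representation of predicates is invented for illustration; the paper's algorithm and cost analysis go well beyond this.

```python
def optimize(predicates, constraints):
    """Drop any query predicate that an integrity constraint
    already guarantees, so the executed query does less work.
    Predicates and constraints are (attribute, op, value) triples;
    only the simple case op == '>=' is handled here."""
    kept = []
    for attr, op, val in predicates:
        implied = any(
            c_attr == attr and c_op == op == ">=" and c_val >= val
            for c_attr, c_op, c_val in constraints)
        if not implied:
            kept.append((attr, op, val))
    return kept

# Invented schema: every stored salary is already >= 30000.
constraints = [("salary", ">=", 30000)]
query = [("salary", ">=", 20000), ("dept", "==", "R&D")]
print(optimize(query, constraints))  # the salary test is redundant
```

A full optimizer would also compare execution costs of the original and transformed queries, as the abstract notes, since a transformation is only worthwhile when it actually runs cheaper.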