
Showing papers on "Semantic Web Stack published in 1998"


Proceedings ArticleDOI
01 May 1998
TL;DR: This investigation shows that although the process by which users of the Web create pages and links is very difficult to understand at a “local” level, it results in a much greater degree of orderly high-level structure than has typically been assumed.
Abstract: The World Wide Web grows through a decentralized, almost anarchic process, and this has resulted in a large hyperlinked corpus without the kind of logical organization that can be built into more traditionally-created hypermedia. To extract meaningful structure under such circumstances, we develop a notion of hyperlinked communities on the WWW through an analysis of the link topology. By invoking a simple, mathematically clean method for defining and exposing the structure of these communities, we are able to derive a number of themes: the communities can be viewed as containing a core of central, “authoritative” pages linked together by “hub pages,” and they exhibit a natural type of hierarchical topic generalization that can be inferred directly from the pattern of linkage. Our investigation shows that although the process by which users of the Web create pages and links is very difficult to understand at a “local” level, it results in a much greater degree of orderly high-level structure than has typically been assumed.
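The "simple, mathematically clean method" for separating authoritative pages from the pages that point at them can be sketched as a mutually reinforcing iteration in the style of HITS; the function below is our illustration of that idea, not the paper's code.

```python
def hubs_and_authorities(links, iterations=50):
    """links: dict mapping each page to the set of pages it links to."""
    pages = set(links) | {q for qs in links.values() for q in qs}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # A page's authority score sums the hub scores of pages linking to it.
        auth = {p: sum(hub[q] for q in pages if p in links.get(q, ()))
                for p in pages}
        # A page's hub score sums the authority scores of pages it links to.
        hub = {p: sum(auth[q] for q in links.get(p, ())) for p in pages}
        # Normalize so the scores stay bounded across iterations.
        for d in (auth, hub):
            norm = sum(v * v for v in d.values()) ** 0.5 or 1.0
            for p in d:
                d[p] /= norm
    return hub, auth
```

On a toy graph where two pages both link to a third, the shared target ends up with the top authority score, matching the intuition of a "core of central, authoritative pages."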

905 citations


Proceedings Article
01 Jul 1998
TL;DR: The goal of the research described here is to automatically create a computer understandable world wide knowledge base whose content mirrors that of the World Wide Web, and several machine learning algorithms for this task are described.
Abstract: The World Wide Web is a vast source of information accessible to computers, but understandable only to humans. The goal of the research described here is to automatically create a computer understandable world wide knowledge base whose content mirrors that of the World Wide Web. Such a knowledge base would enable much more effective retrieval of Web information, and promote new uses of the Web to support knowledge-based inference and problem solving. Our approach is to develop a trainable information extraction system that takes two inputs: an ontology defining the classes and relations of interest, and a set of training data consisting of labeled regions of hypertext representing instances of these classes and relations. Given these inputs, the system learns to extract information from other pages and hyperlinks on the Web. This paper describes our general approach, several machine learning algorithms for this task, and promising initial results with a prototype system.
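The abstract names two inputs: an ontology of classes and relations, and labeled regions of hypertext. A minimal sketch of what those inputs might look like (all class names, relations, and URLs below are illustrative, not from the paper):

```python
# An ontology defining the classes and relations of interest.
ontology = {
    "classes": ["Person", "Department", "Course"],
    "relations": [("Person", "member_of", "Department"),
                  ("Course", "taught_by", "Person")],
}

# Labeled training regions: (page, text span, class label).
training_data = [
    ("http://cs.example.edu/~jdoe", "Jane Doe", "Person"),
    ("http://cs.example.edu/db101", "Databases 101", "Course"),
]

def classes_related_to(cls):
    """Classes reachable from cls via a declared relation."""
    return sorted({o for s, _, o in ontology["relations"] if s == cls})
```

A learner trained on such data would extract new instances of these classes and relations from unseen pages; the structures above only fix the vocabulary it works with.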

766 citations


Journal ArticleDOI
Ora Lassila
TL;DR: This paper considers how the Resource Description Framework, with its focus on machine-understandable semantics, has the potential for saving time and yielding more accurate search results.
Abstract: The sheer volume of information can make searching the Web frustrating. The paper considers how the Resource Description Framework, with its focus on machine-understandable semantics, has the potential for saving time and yielding more accurate search results. RDF, a foundation for processing metadata, provides interoperability between applications that exchange machine understandable information on the Web.
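RDF's core data model is a set of (subject, predicate, object) triples, which is what makes the metadata machine-understandable. A minimal sketch, with illustrative URIs and Dublin Core-style property names that are not taken from the paper:

```python
# Metadata as RDF-style triples: (subject, predicate, object).
triples = [
    ("http://example.org/page1", "dc:creator", "Ora Lassila"),
    ("http://example.org/page1", "dc:title", "RDF and Metadata"),
    ("http://example.org/page2", "dc:creator", "Ora Lassila"),
]

def query(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Find every page a given author created -- a search by meaning,
# not by keyword occurrence.
pages = [s for s, _, _ in query(triples, p="dc:creator", o="Ora Lassila")]
```

Because applications agree on the triple model, metadata written by one tool can be queried by another, which is the interoperability the abstract describes.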

192 citations


01 Jan 1998
TL;DR: This paper describes how SHOE, a set of Simple HTML Ontological Extensions, can be used to discover implicit knowledge from the World-Wide Web through the use of context, inheritance and inference.
Abstract: This paper describes how SHOE, a set of Simple HTML Ontological Extensions, can be used to discover implicit knowledge from the World-Wide Web (WWW). SHOE allows authors to annotate their pages with ontology-based knowledge about page contents. In previous papers, we discussed how the semantic knowledge provided by SHOE allows users to issue queries that are much more sophisticated than keyword search techniques, including queries that require retrieval of information from many sources. Here, we expand upon this idea by describing how SHOE’s ontologies allow agents to understand more than what is explicitly stated in Web pages through the use of context, inheritance, and inference. We use examples to illustrate the usefulness of these features to Web agents and query engines.
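The inheritance feature can be sketched in a few lines: a page annotated with a specific class also answers queries about its superclasses. The class hierarchy and URL below are invented for illustration and do not come from the SHOE papers.

```python
# A toy is-a hierarchy (child class -> parent class).
subclass_of = {"Dog": "Mammal", "Mammal": "Animal"}

def instance_of(cls, target):
    """True if cls equals target or is a transitive subclass of it."""
    while cls is not None:
        if cls == target:
            return True
        cls = subclass_of.get(cls)
    return False

# Page annotations: URL -> asserted class.
annotations = {"http://example.org/rex.html": "Dog"}

# A query for "Animal" retrieves the page even though the page
# only explicitly says "Dog" -- knowledge implicit via inheritance.
matches = [url for url, cls in annotations.items()
           if instance_of(cls, "Animal")]
```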

61 citations


Journal ArticleDOI
TL;DR: The aim is to demonstrate how the solution of basic problems in the design of digital libraries can benefit from human science theories and models when applying digital libraries to industrial engineering design.
Abstract: several issues that call for cross-disciplinary cooperation with such human science disciplines as psychology, sociology, and information science. Therefore, an international research network called the Network for Engineering Design and Human Science was established in 1996 with participation by universities in Canada, Denmark, England, and the U.S., and Danfoss A/S, a Danish manufacturer. The aim is to demonstrate how the solution of basic problems in the design of digital libraries can benefit from human science theories and models when applying digital libraries to industrial engineering design. Let industrial designers learn from the experts in the human sciences as well as from their companies' collective memories. In industrial design, a product is simultaneously an object in the domains of marketing, technical design, manufacturing, user application, maintenance, and recycling (see Figure 1). Groups of designers with different professional backgrounds, domain expertise, and departmental affiliations cooperate across the organization, meeting in project groups to make design decisions about the product. Decisions cannot be made without detailed information about the product in the context of these various domains. Each participant is responsible for a particular knowledge domain of importance to the ultimate design, and work takes place among several departments and companies, all distributed geographically. This cooperation requires the non-trivial exploration of colleagues' work-domain knowledge and retrieval of information about previ

51 citations


Proceedings ArticleDOI
26 May 1998
TL;DR: A framework in which different caching and replication strategies can be devised independently per Web document, and a prototype in Java is developed to demonstrate the feasibility of implementing different strategies for different Web objects.
Abstract: Despite the extensive use of caching techniques, the Web is overloaded. While the caching techniques currently used help some, it would be better to use different caching and replication strategies for different Web pages, depending on their characteristics. We propose a framework in which such strategies can be devised independently per Web document. A Web document is constructed as a worldwide, scalable distributed Web object. Depending on the coherence requirements for that document, the most appropriate caching or replication strategy can subsequently be implemented and encapsulated by the Web object. Coherence requirements are formulated from two different perspectives: that of the Web object, and that of clients using the Web object. We have developed a prototype in Java to demonstrate the feasibility of implementing different strategies for different Web objects.
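The key design point is that each Web object encapsulates its own coherence strategy behind a uniform client interface. A strategy-pattern sketch of that idea; the class and method names are ours, not the paper's:

```python
class Strategy:
    def fetch(self, obj):
        raise NotImplementedError

class AlwaysVerify(Strategy):
    """Suits rapidly changing documents: revalidate with the origin
    on every access."""
    def fetch(self, obj):
        obj.revalidations += 1
        return obj.content

class CacheForever(Strategy):
    """Suits immutable documents: never revalidate."""
    def fetch(self, obj):
        return obj.content

class WebObject:
    def __init__(self, content, strategy):
        self.content = content
        self.strategy = strategy
        self.revalidations = 0
    def read(self):
        # Clients see one interface; the coherence policy differs per object.
        return self.strategy.fetch(self)
```

A stock quote page would be built with `AlwaysVerify`, a static logo with `CacheForever`; the client code calling `read()` is identical in both cases.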

46 citations


Book ChapterDOI
30 Mar 1998
TL;DR: It is noted that some structural properties can be identified with semantic properties of the data and provide measures for comparison between HTML documents.
Abstract: When we describe a Web page informally, we often use phrases like “it looks like a newspaper site”, “there are several unordered lists” or “it's just a collection of links”. Unfortunately, no Web search or classification tools provide the capability to retrieve information using such informal descriptions that are based on the appearance, i.e., structure, of the Web page. In this paper, we take a look at the concept of structurally similar Web pages. We note that some structural properties can be identified with semantic properties of the data and provide measures for comparison between HTML documents.
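One simple way to make "looks like a collection of links" computable is to compare tag-frequency profiles of pages; the cosine measure below is our illustrative choice, and the paper's actual measures may differ.

```python
import math
import re
from collections import Counter

def tag_profile(html):
    """Count opening tags, ignoring attributes and text content."""
    return Counter(m.group(1).lower()
                   for m in re.finditer(r"<\s*([a-zA-Z][a-zA-Z0-9]*)", html))

def similarity(a, b):
    """Cosine similarity between the tag-frequency vectors of two pages."""
    pa, pb = tag_profile(a), tag_profile(b)
    dot = sum(pa[t] * pb[t] for t in pa)
    na = math.sqrt(sum(v * v for v in pa.values()))
    nb = math.sqrt(sum(v * v for v in pb.values()))
    return dot / (na * nb) if na and nb else 0.0
```

Two pages built from unordered lists score high against each other and low against a plain paragraph page, regardless of their text, which matches the informal "structure, not content" descriptions above.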

41 citations


Proceedings Article
01 Jan 1998
TL;DR: A declarative query language is proposed that would allow resource discovery on the Internet with interactive and progressively refined inquiries and consents to the discovery of knowledge within the content of the documents and the structure of the hyperspace.
Abstract: There is a massive increase of information available on electronic networks. This profusion of resources on the World Wide Web gave rise to considerable interest in the research community. Traditional information retrieval techniques have been applied to the document collection on the Internet, and a myriad of search engines and tools have been proposed and implemented. However, the effectiveness of these tools is not satisfactory. None of them is capable of discovering knowledge from the Internet. We propose a declarative query language that would allow resource discovery on the Internet with interactive and progressively refined inquiries. The language also consents to the discovery of knowledge within the content of the documents and the structure of the hyperspace.

40 citations


Proceedings Article
01 Jan 1998
TL;DR: This paper introduces a database system called FreeNet that facilitates the description and exploration of finite binary relations and describes the design and implementation of Lexical FreeNet, a semantic network that mixes WordNet-derived semantic relations with data-derived and phonetically-derived relations.
Abstract: The study of lexical semantics has produced a systematic analysis of binary relationships between content words that has greatly benefited lexical search tools and natural language processing algorithms. We first introduce a database system called FreeNet that facilitates the description and exploration of finite binary relations. We then describe the design and implementation of Lexical FreeNet, a semantic network that mixes WordNet-derived semantic relations with data-derived and phonetically-derived relations. We discuss how Lexical FreeNet has aided in lexical discovery, the pursuit of linguistic and factual knowledge by the computer-aided exploration of lexical relations.

1 Motivation

This paper discusses Lexical FreeNet, a database system designed to enhance lexical discovery. By this we mean the pursuit of linguistic and factual knowledge with the computer-aided exploration of lexical relations. Lexical FreeNet is a semantic network that leverages WordNet and other knowledge and data sources to facilitate the discovery of nontrivial lexical connections between words and concepts. A semantic network allied with the proper user interface can be a useful tool in its own right. By organizing words semantically rather than alphabetically, WordNet provides a means by which users can navigate its vocabulary logically, establishing connections between concepts and not simply character sequences. Exploring the WordNet hyponym tree starting at the word mammal, for instance, reveals to us that aardvarks are mammals; exploring WordNet's meronym relation at the word mammal reveals to us that mammals have hair. From these two explorations we can accurately conclude that aardvarks have hair. Lexical exploration need not be limited to one step at a time, however. Viewing a semantic network as a computational structure awaiting graph-theoretic queries gives us the freedom to demand services beyond mere lookup.
"Does the aardvark have hair?", "What is the closest connection between aardvarks and hair?", or "How interchangeably can the words aardvark and anteater be used?" are all reasonable questions with answers staring us in the face. Of course, the idea of finding shortest paths in semantic networks (through so-called activation-spreading or intersection search) is not new. But these questions have typically been asked of very limited graphs, networks for domains far narrower than the lexical space of English, say. We feel that formalizing how WordNet can be employed for this broader sort of lexical discovery is a good start. We also feel that it is necessary first to enrich the network with information that, as we shall see, cannot be easily gleaned from WordNet's current battery of relations. The very large electronic corpora and wide variety of linguistic resources that today's computing technology has enabled will in turn enable this. The remainder of this paper is organized as follows. We shall first describe in Section 2 the FreeNet database system for the expression and analysis of relational data. In Section 3 we'll describe the design and construction of an instance of this database called Lexical FreeNet. We'll conclude by providing examples of applications of Lexical FreeNet to lexical discovery.

36 citations


Journal ArticleDOI
TL;DR: A new formal model of query and computation on the Web is presented, focusing on two important aspects that distinguish the access to Web data from the access to a standard database system: the navigational nature of the access and the lack of concurrency control.

27 citations


Proceedings ArticleDOI
01 May 1998
TL;DR: The requirements of a tourism hypermedia system resulting from ethnographic studies of tourist advisers are presented, and it is concluded that an open semantic hypermedia (SH) approach is appropriate.
Abstract: Web-based Public Information Systems of the kind common in tourism do not satisfy the needs of the customer because they do not offer a sufficiently flexible linking environment capable of emulating the mediation role of a tourist adviser. We present the requirements of a tourism hypermedia system resulting from ethnographic studies of tourist advisers, and conclude that an open semantic hypermedia (SH) approach is appropriate. We present a novel and powerful SH prototype based on the use of a semantic model expressed as a terminology. The terminological model is implemented by a Description Logic, GRAIL, capable of the automatic and dynamic multi-dimensional classification of concepts, and hence the web pages they describe. We show how GRAIL-Link has been used within the TourisT hypermedia system and conclude with a discussion.

Book ChapterDOI
24 Aug 1998
TL;DR: Issues in web join, a new operator that combines information from two web tables to yield a new web table, are discussed.
Abstract: Recently, there has been increasing interest in data models and query languages for unstructured data in the World Wide Web. When web data is harnessed in a web warehouse, new and useful information can be derived through appropriate information manipulation. In our web warehousing project, we introduce a new operator called the web join. Like its relational counterpart, web join combines information from two web tables to yield a new web table. This paper discusses various issues in web join such as join semantics, joinability, and join evaluation.
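In the relational spirit the chapter invokes, a web join pairs tuples from two web tables that agree on a key. The sketch below uses lists of dicts keyed by "url" as a deliberate simplification of the paper's web-table model; all data is illustrative.

```python
def web_join(table_a, table_b, key="url"):
    """Combine tuples from two web tables that agree on the join key,
    yielding a new web table."""
    index = {row[key]: row for row in table_b}
    return [{**row, **index[row[key]]}
            for row in table_a if row[key] in index]

papers = [{"url": "http://a.org", "topic": "caching"}]
authors = [{"url": "http://a.org", "author": "Smith"},
           {"url": "http://b.org", "author": "Jones"}]
joined = web_join(papers, authors)
```

"Joinability" in this simplified picture reduces to the two tables sharing a key attribute; the paper's own semantics over linked document structures is richer than this.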

Journal ArticleDOI
15 Dec 1998
TL;DR: A new type of tags for denoting the semantics of data stored in HTML pages, implemented as HTML comments, superimposes on HTML pages semistructured objects in the style of the OEM model.
Abstract: Current query languages for the Web (e.g., W3QL, WebLog and WebSQL) explore the structure of the Web. However, usually, the structure of the Web has little to do with the semantics of the data. Therefore, it is practically difficult to pose database queries over the Web. We introduce a new type of tags for denoting the semantics of data stored in HTML pages. These semantic tags (implemented as HTML comments) superimpose on HTML pages semistructured objects in the style of the OEM model. The paper discusses two implemented tools for fully utilizing the semantics. The first is a visualization tool for displaying both the HTML reading of Web pages and the OEM reading of Web pages. The second tool is a query language, similar to LOREL, that can query the HTML structure and/or the OEM reading. The above formalism and tools provide data-modeling capabilities for the Web that fit its heterogeneous nature. Real database queries, taking the OEM point of view, can be formulated, including queries about the schema as well as queries about the HTML structure of Web pages. Therefore, the query language is not restricted to portions of the Web in which semantic tags are used.
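The mechanism, machine-readable annotations hidden in HTML comments that a browser ignores but a query tool can read, can be sketched as follows. The `sem:` comment syntax here is invented for illustration; the paper defines its own tag format.

```python
import re

page = """<html><body>
<!-- sem: book.title = "Foundations of Databases" -->
<h1>Foundations of Databases</h1>
<!-- sem: book.year = "1995" -->
</body></html>"""

def extract_objects(html):
    """Collect sem: annotations into nested objects, OEM-style:
    object name -> {attribute: value}."""
    objects = {}
    pattern = r'<!--\s*sem:\s*(\w+)\.(\w+)\s*=\s*"([^"]*)"\s*-->'
    for obj, attr, value in re.findall(pattern, html):
        objects.setdefault(obj, {})[attr] = value
    return objects
```

A browser renders only the `<h1>`, while a query processor sees a structured `book` object, which is the dual HTML/OEM reading the abstract describes.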

Book ChapterDOI
21 Sep 1998
TL;DR: This work studies the problem of semantic caching of Web queries and develops a caching mechanism for conjunctive Web queries based on signature files that extends this processing to more complex cases of semantic intersection.
Abstract: In digital libraries accessing distributed Web-based bibliographic repositories, performance is a major issue. Efficient query processing requires an appropriate caching mechanism. Unfortunately, standard page-based as well as tuple-based caching mechanisms designed for conventional databases are not efficient on the Web, where keyword-based querying is often the only way to retrieve data. Therefore, we study the problem of semantic caching of Web queries and develop a caching mechanism for conjunctive Web queries based on signature files. We propose two implementation choices. A first algorithm copes with the relation of semantic containment between a query and the corresponding cache items. A second algorithm extends this processing to more complex cases of semantic intersection. We report results of experiments and show how the caching mechanism is successfully realized in the Knowledge Broker system.
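The signature-file idea can be sketched compactly: each conjunctive keyword query gets a bit signature, and semantic containment between a cached query and a new, more restrictive one becomes a cheap bitwise superset test. The hash choice and signature width below are illustrative, and real signature files must also handle the false positives that bit collisions cause.

```python
import zlib

def signature(keywords, bits=64):
    """Set one bit per keyword (deterministic CRC32 hash)."""
    sig = 0
    for kw in keywords:
        sig |= 1 << (zlib.crc32(kw.encode()) % bits)
    return sig

def may_contain(cache_sig, query_sig):
    """A cached result for query Q can answer any query that adds
    extra conjuncts to Q: every bit of the cached signature must
    appear in the new query's signature."""
    return cache_sig & query_sig == cache_sig
```

A cache entry for `semantic AND caching` can thus answer `semantic AND caching AND web` by filtering its stored results, without contacting the remote repository.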

Proceedings Article
01 Jan 1998
TL;DR: In this scheme, tasks for query translation/capability mapping (named as query naturalization) between wrappers and web sources and tasks for semantic caching are seamlessly integrated, resulting in easier query optimization.
Abstract: A semantic caching scheme suitable for web database environments is proposed. In our scheme, tasks for query translation/capability mapping (named as query naturalization) between wrappers and web sources and tasks for semantic caching are seamlessly integrated, resulting in easier query optimization. A semantic cache consists of three components: 1) semantic view, a description of the contents in the cache using sub-expressions of the previous queries, 2) semantic index, an index for the tuple IDs that satisfy the semantic view, and 3) physical storage, a storage containing the tuples (or objects) that are shared by all semantic views in the cache. Types of matching between the native query and cache query are discussed. Algorithms for finding the optimal match of the input query in the semantic cache and for cache replacement are presented. The proposed techniques are being implemented in a cooperative web database (CoWeb) prototype at UCLA.
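The three components named in the abstract map directly onto a small data structure. The sketch below follows those component names but simplifies everything else (exact-match lookup only, where the paper also discusses partial matches and replacement):

```python
class SemanticCache:
    def __init__(self):
        self.views = {}    # semantic view: query sub-expression -> view id
        self.index = {}    # semantic index: view id -> tuple IDs in the view
        self.storage = {}  # physical storage: tuple ID -> tuple, shared by views

    def insert(self, view_expr, tuples):
        vid = len(self.views)
        self.views[view_expr] = vid
        tids = []
        for t in tuples:
            tid = hash(t)
            self.storage[tid] = t  # identical tuples are stored only once
            tids.append(tid)
        self.index[vid] = tids

    def lookup(self, view_expr):
        """Exact-match lookup of a previous query's sub-expression."""
        if view_expr not in self.views:
            return None
        return [self.storage[tid] for tid in self.index[self.views[view_expr]]]
```

Sharing tuples through `storage` is what lets overlapping semantic views coexist without duplicating their common results.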

01 Jan 1998
TL;DR: Since the functionality of wrappers and mediators is integrated into a single declarative language, the development of advanced applications based on the Web as an information source is significantly simplified.

Abstract: Languages supporting deduction and object orientation seem particularly promising for querying and reasoning about the structure and contents of the Web, and for the integration of information from heterogeneous sources. Florid, an implementation of the deductive object-oriented language F-logic, has been extended to provide a declarative semantics for querying the Web. This extension allows extraction and restructuring of data from the Web and a seamless integration with local data. Since the functionality of wrappers and mediators is integrated into a single declarative language, the development of advanced applications based on the Web as an information source is significantly simplified. This claim is substantiated using a comprehensive example.

Journal ArticleDOI
TL;DR: The goal of the Empirical Web Analysis (EWA) project was to investigate the discrepancy between commercial and academic Web design, paying special attention to these new features of industry related Web pages.
Abstract: Frequent users of the Web will have noticed an emerging discrepancy between university Web pages and commercial sites. An abundance of animated GIFs, image maps, fancy background images, frames and advanced font handling are characteristic of industry related Web pages. Web designers in academia still remain closer to the original purpose of HTML: to delineate the structure and semantics of a document and not its presentation in a browser. The goal of the Empirical Web Analysis (EWA) project was to investigate the discrepancy between commercial and academic Web design, paying special attention to these new features.

Proceedings ArticleDOI
04 Nov 1998
TL;DR: With the release and widespread support of XML (extensible markup language) and the development of MathML, Web pages not only can display mathematics and equations in TeX-like fashion, but, beyond that, retain the meaning of the equations so that they can be opened and processed by a variety of mathematical software applications.
Abstract: One of the ironies of the World Wide Web (WWW or simply the Web) is that even though it was initially conceived and implemented for use by physicists, it provided no special capabilities for mathematics and equations. With the release and widespread support of XML (extensible markup language) and the development of MathML, Web pages not only can display mathematics and equations in TeX-like fashion, but, beyond that, retain the meaning of the equations so that they can be opened and processed by a variety of mathematical software applications. The Web thus can expand the scope of its inherent intense interactivity to include equations and mathematics, as well as text and multimedia.
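The point that MathML retains the meaning of an equation rather than just its appearance can be made concrete with a tiny fragment: the markup below encodes "x squared" as structure a program can walk. The element names (`msup`, `mi`, `mn`) are MathML's; the builder code is our illustration.

```python
import xml.etree.ElementTree as ET

# Build the MathML presentation tree for x^2.
math = ET.Element("math")
msup = ET.SubElement(math, "msup")    # superscript: base then exponent
ET.SubElement(msup, "mi").text = "x"  # mi = identifier
ET.SubElement(msup, "mn").text = "2"  # mn = number

markup = ET.tostring(math, encoding="unicode")

# Unlike a rendered image of the formula, software can recover
# the base and the exponent from the structure.
base, exponent = math.find("msup")
```

This recoverable structure is what lets the same Web page be both displayed in TeX-like quality and opened for processing by mathematical software.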

Proceedings ArticleDOI
01 Jun 1998
TL;DR: The IDEA Web Laboratory (Web Lab), a Web-based software design environment available on the Internet, is presented, which demonstrates a novel approach to the software production process on the Web.
Abstract: With the spreading of the World Wide Web as a uniform and ubiquitous interface to computer applications and information, novel opportunities are offered for introducing significant changes in all organizations and their processes. This demo presents the IDEA Web Laboratory (Web Lab), a Web-based software design environment available on the Internet, which demonstrates a novel approach to the software production process on the Web.

Journal Article
TL;DR: NetQL as discussed by the authors is a query language designed for local structure-based queries, which not only exploits the topology of web pages given by hyperlinks, but also supports queries involving information inside pages.
Abstract: With the increasing importance of the World Wide Web as an information repository, how to locate documents of interest becomes more and more significant. The current practice is to send keywords to search engines. However, these search engines lack the capability to take the structure of the Web into consideration. We thus present a novel query language, NetQL, and its implementation, for accessing the World Wide Web. Rather than working on global full-text search, NetQL is designed for local structure-based queries. It not only exploits the topology of web pages given by hyperlinks, but also supports queries involving information inside pages. A novel approach to extract information from web pages is presented. In addition, the methods to control the complexity of query processing are also addressed in this paper.

Journal ArticleDOI
Yves Poumay
22 May 1998-Science

Proceedings Article
01 Jan 1998

Proceedings Article
01 Jan 1998


Journal Article
TL;DR: This paper discusses and shows how two web operators are used to associate related web information from the WWW and also from multiple web tables in a web warehouse.
Abstract: Web information coupling refers to an association of topically related web documents. This coupling is initiated explicitly by a user in a web warehouse specially designed for web information. Web information coupling provides the means to derive additional, useful information from the WWW. In this paper, we discuss and show how two web operators, i.e., global web coupling and local web coupling, are used to associate related web information from the WWW and also from multiple web tables in a web warehouse. This paper discusses various issues in web coupling such as coupling semantics, coupling compatibility, and coupling evaluation.
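The two operators can be sketched over the same simplified web-table layout (lists of dicts); the function names and predicates below are ours, standing in for the paper's operator definitions.

```python
def global_coupling(web, query):
    """Build a web table directly from a set of WWW documents by
    retaining those matching the user's coupling query."""
    return [doc for doc in web if query(doc)]

def local_coupling(table_a, table_b, related):
    """Pair documents drawn from two existing web tables in the
    warehouse whose contents are related."""
    return [(a, b) for a in table_a for b in table_b if related(a, b)]

web = [{"url": "a", "topic": "genome"}, {"url": "b", "topic": "cars"}]
genome_table = global_coupling(web, lambda d: d["topic"] == "genome")
other_table = [{"url": "c", "topic": "genome"}]
pairs = local_coupling(genome_table, other_table,
                       lambda a, b: a["topic"] == b["topic"])
```

Global coupling populates the warehouse from the Web; local coupling then relates documents across the web tables already materialized there.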