
Showing papers on "Semantic Web published in 1999"


Proceedings Article
07 Sep 1999
TL;DR: This paper develops novel algorithms for enumerating and organizing all web occurrences of certain subgraphs that are signatures of web phenomena such as tightly-focused topic communities, webrings, taxonomy trees, keiretsus, etc., and argues that these algorithms run efficiently under a proposed model of web graph evolution.
Abstract: The subject of this paper is the creation of knowledge bases by enumerating and organizing all web occurrences of certain subgraphs. We focus on subgraphs that are signatures of web phenomena such as tightly-focused topic communities, webrings, taxonomy trees, keiretsus, etc. For instance, the signature of a webring is a central page with bidirectional links to a number of other pages. We develop novel algorithms for such enumeration problems. A key technical contribution is the development of a model for the evolution of the web graph, based on experimental observations derived from a snapshot of the web. We argue that our algorithms run efficiently in this model, and use the model to explain some statistical phenomena on the web that emerged during our experiments. Finally, we describe the design and implementation of Campfire, a knowledge base of over one hundred thousand web communities.

282 citations
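
The webring signature mentioned above is concrete enough to sketch: a center page with bidirectional links to several member pages can be found by scanning a link graph. The toy Python sketch below illustrates that enumeration step only; the adjacency representation and the threshold k are assumptions, not details of the Campfire system.

```python
# Minimal sketch: enumerate "webring-like" signatures in a directed link graph.
# A signature here is a center page with bidirectional links to >= k members.
# The graph representation and threshold are illustrative assumptions, not
# details of the Campfire knowledge base described above.

def find_webring_centers(links, k=3):
    """links: dict mapping page -> set of pages it links to."""
    centers = []
    for page, outgoing in links.items():
        # members are pages linked in both directions (page <-> member)
        members = {m for m in outgoing if page in links.get(m, set())}
        if len(members) >= k:
            centers.append((page, members))
    return centers

if __name__ == "__main__":
    toy_graph = {
        "ring/home": {"a", "b", "c"},
        "a": {"ring/home"},
        "b": {"ring/home"},
        "c": {"ring/home"},
        "d": {"a"},
    }
    print(find_webring_centers(toy_graph, k=3))
```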


01 Jan 1999
TL;DR: A wide range of heuristics for adjusting document rankings based on the special HTML structure of Web documents are described, including a novel one inspired by reinforcement learning techniques for propagating rewards through a graph which can be used to improve a search engine's rankings.
Abstract: Indexing systems for the World Wide Web, such as Lycos and Alta Vista, play an essential role in making the Web useful and usable. These systems are based on Information Retrieval methods for indexing plain text documents, but also include heuristics for adjusting their document rankings based on the special HTML structure of Web documents. In this paper, we describe a wide range of such heuristics, including a novel one inspired by reinforcement learning techniques for propagating rewards through a graph, which can be used to affect a search engine's rankings. We then demonstrate a system which learns to combine these heuristics automatically, based on feedback collected unintrusively from users, resulting in much improved rankings.

239 citations
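
The reward-propagation heuristic can be pictured as repeatedly feeding a discounted share of a linked page's relevance score back to the pages that link to it. The sketch below is a simplified illustration under assumed data structures and parameters, not the ranking system described in the paper.

```python
# Simplified sketch of propagating relevance "rewards" backwards through
# hyperlinks with a discount factor, loosely in the spirit of the heuristic
# described above. Graph shape, scores, and parameters are assumptions.

def propagate_rewards(links, base_scores, gamma=0.5, iterations=3):
    """links: dict page -> list of pages it links to.
    base_scores: dict page -> initial relevance from plain-text retrieval."""
    scores = dict(base_scores)
    for _ in range(iterations):
        updated = dict(base_scores)
        for page, targets in links.items():
            # a page earns a discounted share of the reward of pages it points to
            updated[page] = base_scores.get(page, 0.0) + gamma * sum(
                scores.get(t, 0.0) for t in targets
            )
        scores = updated
    return scores

if __name__ == "__main__":
    links = {"hub": ["doc1", "doc2"], "doc1": [], "doc2": ["doc1"]}
    base = {"doc1": 1.0, "doc2": 0.2, "hub": 0.0}
    print(propagate_rewards(links, base))
```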


Book ChapterDOI
01 Sep 1999
TL;DR: This paper focuses on web data mining research in the context of the authors' web warehousing project called WHOWEDA (Warehouse of Web Data), and categorizes web data mining into three areas: web content mining, web structure mining and web usage mining.
Abstract: In this paper, we discuss mining with respect to web data, referred to here as web data mining. In particular, our focus is on web data mining research in the context of our web warehousing project called WHOWEDA (Warehouse of Web Data). We have categorized web data mining into three areas: web content mining, web structure mining and web usage mining. We have highlighted and discussed various research issues involved in each of these web data mining categories. We believe that web data mining will be the topic of exploratory research in the near future.

203 citations


01 Jan 1999
TL;DR: The paper describes the novel technique of categorization by context, which instead extracts useful information for classifying a document from the context where a URL referring to it appears, and presents the results of experimenting with Theseus, a classifier that exploits this technique.
Abstract: Assistance in retrieving documents on the World Wide Web is provided either by search engines, through keyword-based queries, or by catalogues, which organize documents into hierarchical collections. Maintaining catalogues manually is becoming increasingly difficult, due to the sheer amount of material on the Web; it is thus becoming necessary to resort to techniques for the automatic classification of documents. Automatic classification is traditionally performed by extracting the information for representing a document (“indexing”) from the document itself. The paper describes the novel technique of categorization by context, which instead extracts useful information for classifying a document from the context where a URL referring to it appears. We present the results of experimenting with Theseus, a classifier that exploits this technique.

192 citations
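
Categorization by context classifies a target document using the words around the URL that refers to it in other pages, rather than the target's own content. A minimal sketch of that idea follows; the keyword-based scoring and category sets are invented for illustration and are not the Theseus classifier.

```python
# Minimal sketch of "categorization by context": classify a target URL using
# the anchor text found in the *referring* page. Categories, keywords, and the
# overlap-based scoring are illustrative assumptions, not Theseus itself.
from html.parser import HTMLParser

class AnchorContextParser(HTMLParser):
    """Collect anchor text for each href found in a referring page."""
    def __init__(self):
        super().__init__()
        self.contexts = {}          # href -> list of surrounding words
        self._current_href = None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current_href = dict(attrs).get("href")

    def handle_data(self, data):
        if self._current_href:
            self.contexts.setdefault(self._current_href, []).extend(data.lower().split())

    def handle_endtag(self, tag):
        if tag == "a":
            self._current_href = None

def classify_by_context(words, categories):
    """Pick the category whose keyword set overlaps most with the context words."""
    return max(categories, key=lambda c: len(categories[c] & set(words)))

if __name__ == "__main__":
    page = '<p>Great <a href="http://example.org/lisp">Lisp programming tutorial</a></p>'
    parser = AnchorContextParser()
    parser.feed(page)
    cats = {"programming": {"lisp", "programming", "code"}, "sports": {"football", "match"}}
    for href, words in parser.contexts.items():
        print(href, "->", classify_by_context(words, cats))
```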


01 Jan 1999
TL;DR: This article illustrates this by showing that an Example-Based approach to lexical choice for machine translation can use the Web as an adequate and free resource.
Abstract: The WWW is two orders of magnitude larger than the largest corpora. Although noisy, web text presents language as it is used, and statistics derived from the Web can have practical uses in many NLP applications. For this reason, the WWW should be seen and studied as any other computationally available linguistic resource. In this article, we illustrate this by showing that an Example-Based approach to lexical choice for machine translation can use the Web as an adequate and free resource.

175 citations
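
The example-based idea amounts to preferring the target-language collocation that is most frequent on the Web. A small sketch, assuming collocation frequencies have already been obtained (for example from search-engine hit counts), could look like this; the counts and word lists below are made up.

```python
# Sketch of example-based lexical choice: among candidate translations of an
# ambiguous word, pick the collocation with the highest Web/corpus frequency.
# The hit counts below are invented stand-ins for counts a real system would
# obtain by querying the Web.

def choose_translation(context_word, candidates, counts):
    """Return the candidate whose collocation with context_word is most frequent."""
    return max(candidates, key=lambda c: counts.get((context_word, c), 0))

if __name__ == "__main__":
    # e.g. choosing among "big", "large", "tall" to modify "house"
    hit_counts = {("house", "big"): 120_000, ("house", "large"): 310_000,
                  ("house", "tall"): 9_000}
    print(choose_translation("house", ["big", "large", "tall"], hit_counts))
```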


Journal ArticleDOI
TL;DR: The authors investigated the intersection between corporate World Wide Web pages and the publics they serve and found that while the typical corporate Web page is used to service news media, customers, and the financial community, it is not being used to its fullest potential to communicate simultaneously with other audiences.
Abstract: Against the backdrop of the rapid growth of the Internet, this research study investigates the intersection between corporate World Wide Web pages and the publics they serve. Content analysis revealed that, while the typical corporate Web page is used to service news media, customers, and the financial community, it is not being used to its fullest potential to communicate simultaneously with other audiences. Through a cluster analysis procedure, the researchers found about one-third of corporate Web sites are assertively used to communicate with a multiplicity of audiences in a variety of information formats.

148 citations


Journal ArticleDOI
TL;DR: This study is first a preliminary exploration into Web page and Web site mortality rates, then considers two types of change: content and structural, and explores the “short memory” and “mind changing” of the World Wide Web.
Abstract: We recognize that documents on the World Wide Web are ephemeral and changing. We also recognize that Web documents can be categorized along a number of dimensions, including “publisher,” size, object mix, as well as purpose, meaning, and content. This study is first a preliminary exploration into Web page and Web site mortality rates. It then considers two types of change: content and structural. Finally, the study is concerned with understanding those constancy and permanence phenomena for different Web document classes. It is suggested that, from the perspective of information maintenance and retrieval, the WWW does not represent revolutionary change. In fact, in some ways the Web is a less sophisticated form than traditional publication practices. Finally, this study explores the “short memory” and “mind changing” of the World Wide Web.

146 citations


31 Jul 1999
TL;DR: The general architecture and main components of On2broker are discussed and the use of ontologies to make explicit the semantics of web pages is provided.
Abstract: On2broker provides brokering services to improve access to heterogeneous, distributed and semistructured information sources as they are presented in the World Wide Web. It relies on the use of ontologies to make explicit the semantics of web pages. In the paper we will discuss the general architecture and main components of On2broker and provide some application scenarios.

131 citations


Proceedings ArticleDOI
31 Jul 1999
TL;DR: A survey and analysis of traditional, new, and arising Web standards and show how they can be used to represent machine-processable semantics of Web sources to help AI researchers and practitioners to apply their results to real Web documents.
Abstract: The lack of semantic markup is a major barrier to the development of more intelligent document processing on the Web. Current HTML markup is used only to indicate the structure and lay-out of documents, but not the document semantics. Unfortunately, proposals from the AI community for Web-based knowledge-representation languages can hardly expect wide acceptance on the Web. Even if unpalatable for the AI community, the question should instead be how well AI concepts can be fitted into the markup languages that are widely supported on the Web, either now or in the foreseeable future. We provide a survey and analysis of traditional, new, and arising Web standards and show how they can be used to represent machine-processable semantics of Web sources. The results of this paper should help AI researchers and practitioners to apply their results to real Web documents, instead of basing themselves on AI-specific representations that have no chance of becoming widely used on the Web.

76 citations
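
Of the standards surveyed above, RDF is the most direct way to attach machine-processable semantics to a Web resource. The snippet below is a sketch using the rdflib Python library (which must be installed separately) with an invented vocabulary; it shows the flavor of such annotations rather than anything proposed in the paper.

```python
# Sketch: describing a Web page with RDF triples so that its semantics are
# machine-processable. Uses the rdflib library; the vocabulary URI and the
# statements themselves are illustrative assumptions.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

EX = Namespace("http://example.org/vocab#")

g = Graph()
page = URIRef("http://example.org/people/alice.html")
g.add((page, RDF.type, EX.Homepage))
g.add((page, EX.topic, Literal("machine learning")))
g.add((page, EX.maintainedBy, Literal("Alice")))

# emit the triples in Turtle syntax
print(g.serialize(format="turtle"))
```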


Proceedings ArticleDOI
12 Oct 1999
TL;DR: This paper presents a preliminary discussion about Web mining, including its definition, the relationship between information mining and information retrieval on the Web, and the taxonomy and the function of Web mining.
Abstract: With the flood of information on the World Wide Web, Web mining is a new research issue which is drawing great interest from many communities. Currently, there is no agreement about Web mining; it needs more discussion among researchers in order to define exactly what it is. Meanwhile, the development of Web mining systems will in turn promote its research. In this paper, we present a preliminary discussion about Web mining, including its definition, the relationship between information mining and information retrieval on the Web, and the taxonomy and the function of Web mining. In addition, a prototype system called WebTMS (Web Text Mining System) has been designed. WebTMS is a multi-agent system which combines text mining and multi-dimensional document analysis to help users mine HTML documents on the Web effectively.

68 citations


Journal Article
TL;DR: In this paper, the authors discuss the use of the personal ontology and propose an organization scheme based on a model of an office and its information, an ontology, coupled with the proper tools for using it.
Abstract: Corporations can suffer from too much information, and it is often inaccessible, inconsistent, and incomprehensible. The corporate solution entails knowledge management techniques and data warehouses. The paper discusses the use of the personal ontology. A promising approach is an organization scheme based on a model of an office and its information (an ontology), coupled with the proper tools for using it.

Proceedings ArticleDOI
01 Sep 1999
TL;DR: An approach is presented that uses alternative simple visualizations grouped around the traditional result list, for use with a local meta web search engine.
Abstract: The idea of Information Visualization is to get insights into great amounts of abstract data. Document sets found by searching the World Wide Web are a special challenge. The paper gives a short overview of the variety of possible visualizations for this application area. The presented ideas are grouped using the four-phase framework of information seeking. Crucial factors for the success of visualizations are discussed. An approach is presented that uses alternative simple visualizations grouped around the traditional result list, for use with a local meta web search engine.

Proceedings ArticleDOI
05 Jan 1999
TL;DR: The experiences with client- and proxy-server based implementations of an annotation system architecture are described, pointing to missing elements in the current Web infrastructure that make any implementation of annotation systems less than completely satisfactory.
Abstract: Annotations are a broadly useful mechanism that can support a number of useful document management applications (third-party commentary, design rationale, information filtering and semantic labelling of document content, to name just a few). The ubiquity of World Wide Web content motivates the need for Web annotation systems that are lightweight, efficient, non-intrusive (preferably transparent), platform-independent and scalable. Building such a system using open and standard Web infrastructures (as opposed to proprietary ones) facilitates widespread applicability and deployment. In practice, there are a number of ways to do this, all of which instantiate a common abstract architecture based on intermediaries. This paper describes our experiences with client- and proxy-server based implementations of an annotation system architecture. The implementations point to missing elements in the current Web infrastructure that make any implementation of annotation systems less than completely satisfactory. This paper discusses these elements of current Web infrastructure, and potential changes to the Web architecture that might make the implementation of annotation systems more complete.

Book
01 Jan 1999
TL;DR: This work proposes a generalisation of joins from the relational database model to enable joins on arbitrarily complex structured data in a higher-order representation, and extends this model to support approximate joins of heterogeneous data.
Abstract: Integrating heterogeneous data from sources as diverse as web pages, digital libraries, knowledge bases, the Semantic Web and databases is an open problem. The ultimate aim of our work is to be able to query such heterogeneous data sources as if their data were conveniently held in a single relational database. Pursuant to this aim, we propose a generalisation of joins from the relational database model to enable joins on arbitrarily complex structured data in a higher-order representation. By incorporating kernels and distances for structured data, we further extend this model to support approximate joins of heterogeneous data. We demonstrate the flexibility of our approach in the publications domain by evaluating example approximate queries on the CORA data sets, joining on types ranging from sets of co-authors through to entire publications.
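
An approximate join replaces equality on the join attribute with a similarity test, which is what makes joining noisy bibliographic records possible. The toy sketch below uses a plain string-similarity threshold from Python's difflib instead of the kernels and distances for structured data that the work actually proposes.

```python
# Toy sketch of an approximate join: pair up records from two sources whenever
# their join attributes are sufficiently similar, instead of exactly equal.
# Uses difflib string similarity; the threshold and data are assumptions and
# stand in for the kernel/distance machinery described above.
from difflib import SequenceMatcher

def similar(a, b, threshold=0.8):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def approximate_join(left, right, key_left, key_right, threshold=0.8):
    """Yield pairs of records whose join keys approximately match."""
    for l in left:
        for r in right:
            if similar(l[key_left], r[key_right], threshold):
                yield (l, r)

if __name__ == "__main__":
    papers = [{"title": "Learning to Extract Symbolic Knowledge from the Web"}]
    citations = [{"cited": "Learning to extract symbolic knowledge from the WWW"}]
    for pair in approximate_join(papers, citations, "title", "cited", 0.7):
        print(pair)
```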

Proceedings ArticleDOI
01 Sep 1999
TL;DR: This work regards the Web and its contents as a unit, represented in an object-oriented data model: the Web structure, given by its hyperlinks, the parse-trees of Web pages (intra-document level), and their contents, and the model is complemented by a rule-based object- oriented language.
Abstract: For accessing and processing the information provided on the Web, there is a need for extraction, restructuring, and integration of semistructured data from autonomous, heterogeneous sources. We regard the Web and its contents as a unit, represented in an object-oriented data model: the Web structure (inter-document level), given by its hyperlinks, the parse-trees of Web pages (intra-document level), and their contents. The model is complemented by a rule-based object-oriented language which is extended by Web access capabilities and allows for navigation in the unified model. We show the practicability of our approach by using the FLORID system.

Journal ArticleDOI
17 May 1999
TL;DR: The Mimicry system is introduced that allows authors and readers to link to and from temporal media (video and audio) on the Web and is integrated with the Arakne Environment, an open hypermedia integration aimed at Web augmentation.
Abstract: The World Wide Web has since its beginning provided linking to and from text documents encoded in HTML. The Web has evolved and most Web browsers now support a rich set of media types either by default or by the use of specialised content handlers, known as plug-ins. The limitations of the Web linking model are well known and they also extend into the realm of the other media types currently supported by Web browsers. This paper introduces the Mimicry system that allows authors and readers to link to and from temporal media (video and audio) on the Web. The system is integrated with the Arakne Environment, an open hypermedia integration aimed at Web augmentation. The links created are stored externally, allowing for links to and from resources not owned by the (link) author. Based on the experiences a critique is raised of the limited APIs supported by plug-ins.

Journal ArticleDOI
TL;DR: The W3C provides a broad array of information, organized into general categories ranging from a general history of the Web to an archive of released technical reports and specifications, which is free to all.
Abstract: Just what direction is the Web headed? For answers to this and other general Internet or markup-language questions, tune your browser to http://www.w3.org/ for a vast and varied selection of information. Because the W3C establishes recommendations concerning the Web, this site offers interesting possibilities as to future directions for the Web. While the W3C does not exercise the influence of an official standards-setting organization, it has been influential in bringing together industry members and other interested parties to develop solutions and circulate recommendations to the public and members, to provide the means for establishing guidelines for future Web development, and to serve as a forum for members to meet and discuss common problems. Membership is not free, but the website serves as a central location to disseminate the technical specifications written by the consortium, as well as other related information, which is free to all. The website does provide a very comprehensive assortment of information about the Web, and is well-structured and organized. The structure has a good balance between depth and breadth of pages. The website provides a broad array of information, organized into general categories ranging from a general history of the Web to an archive of released technical reports and specifications. The technical reports are also noted in the press releases that are available at the website, and include a chronological listing of drafts.

BookDOI
01 Jan 1999
TL;DR: This work argues that cache performance can be improved by integrating cache replacement and consistency algorithms, and presents a unified algorithm, LNC-R-W3-U, which achieves performance comparable (and often superior) to most of the published cache replacement algorithms while significantly reducing the staleness of the cached documents.
Abstract: Caching of Web documents improves the response time perceived by the clients. Cache replacement algorithms play a central role in the response time reduction by selecting a subset of documents for caching so that an appropriate performance metric is maximized. At the same time, the cache must take extra steps to guarantee some form of consistency of the cached data. Cache consistency algorithms enforce appropriate guarantees about the staleness of the documents the cache stores. Most of the published work on Web cache design either considers cache consistency algorithms separately from cache replacement algorithms or concentrates only on studying one of the two. We argue that cache performance can be improved by integrating cache replacement and consistency algorithms. We present a unified algorithm, LNC-R-W3-U. Using trace-based experiments, we demonstrate that LNC-R-W3-U achieves performance comparable (and often superior) to most of the published cache replacement algorithms and at the same time significantly reduces the staleness of the cached documents.
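
The integration argument is that eviction decisions and staleness decisions can share one set of bookkeeping. The sketch below combines an LRU-style replacement policy with a simple TTL staleness check in a single structure; it only illustrates that integration idea and is not the LNC-R-W3-U algorithm.

```python
# Much-simplified sketch of a Web cache that integrates replacement (LRU-style
# eviction) with consistency (a TTL-based staleness check) in one structure.
# This is an illustration of the integration idea only, not LNC-R-W3-U.
import time
from collections import OrderedDict

class SimpleWebCache:
    def __init__(self, capacity=100, ttl_seconds=300.0):
        self.capacity = capacity
        self.ttl = ttl_seconds
        self._store = OrderedDict()        # url -> (document, fetch_time)

    def get(self, url):
        entry = self._store.get(url)
        if entry is None:
            return None
        document, fetched_at = entry
        if time.time() - fetched_at > self.ttl:
            # consistency: the cached copy is considered stale, drop it
            del self._store[url]
            return None
        self._store.move_to_end(url)       # replacement: mark as recently used
        return document

    def put(self, url, document):
        self._store[url] = (document, time.time())
        self._store.move_to_end(url)
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict the least recently used entry

if __name__ == "__main__":
    cache = SimpleWebCache(capacity=2, ttl_seconds=60)
    cache.put("http://example.org/a", "<html>A</html>")
    print(cache.get("http://example.org/a"))
```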

Journal ArticleDOI
TL;DR: The history of hypertext in the World Wide Web is overviewed to bring more sophisticated hypertext into the Web, and the new XML proposals are making many of these into mainstream functions.
Abstract: In this short paper, we briefly overview the history of hypertext in the World Wide Web. The Web started with hypertext functions that have disappeared from the early popular browsers, and some are still not present in today's dominant browsers. The hypertext community has proposed ways to bring more sophisticated hypertext into the Web, and the new XML proposals are making many of these into mainstream functions.

Proceedings ArticleDOI
06 Jun 1999
TL;DR: The paper presents an overview of the Web mining agent system, then gives the motivations for the conversion into XML, and discusses in detail the transformation process performed on the Web documents.
Abstract: The work presented is part of a Web mining agent (WMA) system under development at our Multimedia and Mobile Agent Research Laboratory. The purpose of this system is to automatically extract specific information from Web pages and appropriately format the extracted information for further use. This requires resolving problems related to the disorganized nature of the Web that may result from ill-formatted HTML-based Web pages. The desired information is extracted from the Web documents by applying a sequence of filters to these documents. Each of the filters has a specific role. We discuss the filter that is used to convert Web documents into well-formed XML documents. This conversion involves the following operations: (i) syntactic mapping of HTML to XML, (ii) resolving ambiguity introduced by HTML tagging rules, and (iii) handling errors that may occur due to improper usage of HTML by the authors. The paper presents an overview of the Web mining agent system, then gives the motivations for the conversion into XML and finally, discusses in detail the transformation process performed on the Web documents.
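
The heart of such a conversion filter is making tag structure explicit: closing elements the author left open, emitting void elements in self-closed form, and escaping character data so the output parses as XML. The following simplified sketch, built on Python's html.parser, stands in for the WMA filter under those assumptions.

```python
# Simplified sketch of converting loosely written HTML into well-formed
# XML-like output: keep a stack of open tags, close anything left open, and
# escape character data. This is an illustration, not the WMA filter itself.
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

class HtmlToXml(HTMLParser):
    VOID = {"br", "hr", "img", "meta", "link", "input"}

    def __init__(self):
        super().__init__()
        self.out = []
        self.stack = []

    def handle_starttag(self, tag, attrs):
        attr_text = "".join(f" {k}={quoteattr(v or '')}" for k, v in attrs)
        if tag in self.VOID:
            self.out.append(f"<{tag}{attr_text}/>")
        else:
            self.out.append(f"<{tag}{attr_text}>")
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # close any intermediate tags the author forgot to close
        while self.stack and self.stack[-1] != tag:
            self.out.append(f"</{self.stack.pop()}>")
        if self.stack:
            self.out.append(f"</{self.stack.pop()}>")

    def handle_data(self, data):
        self.out.append(escape(data))

    def close(self):
        super().close()
        while self.stack:                 # close anything still open at the end
            self.out.append(f"</{self.stack.pop()}>")

    def result(self):
        return "".join(self.out)

if __name__ == "__main__":
    converter = HtmlToXml()
    converter.feed("<p>Fish & chips<br>")
    converter.close()
    print(converter.result())             # <p>Fish &amp; chips<br/></p>
```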

01 Jan 1999
TL;DR: This paper sketches Ontobroker and discusses its main shortcomings, then shows how On2broker overcomes these limitations and the integration of new web standards like XML and RDF.
Abstract: Ontobroker applies Artificial Intelligence techniques to improve access to heterogeneous, distributed and semistructured information sources as they are presented in the World Wide Web or organization-wide intranets. It relies on the use of ontologies to annotate web pages, formulate queries and derive answers. In the paper we will briefly sketch Ontobroker. Then we will discuss its main shortcomings, i.e. we will share the lessons we learned from our exercise. We will also show how On2broker overcomes these limitations. Most important is the separation of the query and inference engines and the integration of new web standards like XML and RDF.

Journal ArticleDOI
TL;DR: It is argued that new standards would let users apply the power of their technology to make the most of the Web.
Abstract: It is argued that new standards would let users apply the power of their technology to make the most of the Web.

Proceedings ArticleDOI
01 Jan 1999
TL;DR: This paper presents an approach for retrieving information from forms on the World Wide Web using natural language input, and proposes a statistical disambiguation method based on n-gram statistics.
Abstract: This paper presents an approach for retrieving information from forms on the World Wide Web using natural language input. The structured nature of the form can be utilized to process natural language input for querying data sources on the web that provide form interfaces. Since the valid values for each field can be determined from the form itself or by a user of the form, the form can be filled out by looking for these values in the natural language user input. Since it is possible for a particular value to be valid for more than one field, the surrounding context must be used to determine the correct field for an ambiguous value. A statistical disambiguation method based on n-gram statistics is proposed. It was shown that this method works better than using single context words for disambiguation when the domain is limited.
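
When a value such as "boston" is valid for both a departure and an arrival field, the method scores each candidate field by how well the n-grams around the value in the user's input match n-grams associated with that field. A rough sketch under assumed bigram counts follows; the fields, counts, and window size are invented for illustration.

```python
# Rough sketch of n-gram-based field disambiguation for form filling: a value
# that fits several fields is assigned to the field whose associated context
# bigrams best match the words around the value in the user's input.
# The bigram statistics and field names are illustrative assumptions.

def context_bigrams(tokens, index, window=2):
    """Bigrams taken from a window of tokens around position `index`."""
    lo, hi = max(0, index - window), min(len(tokens), index + window + 1)
    span = tokens[lo:hi]
    return {(span[i], span[i + 1]) for i in range(len(span) - 1)}

def disambiguate(tokens, value_index, field_bigram_counts):
    """Pick the field whose context-bigram counts best cover the observed bigrams."""
    observed = context_bigrams(tokens, value_index)
    return max(field_bigram_counts,
               key=lambda f: sum(field_bigram_counts[f].get(bg, 0) for bg in observed))

if __name__ == "__main__":
    utterance = "i want to fly from boston to denver".split()
    counts = {
        "departure_city": {("from", "boston"): 40, ("fly", "from"): 35},
        "arrival_city": {("to", "boston"): 25, ("boston", "to"): 5},
    }
    print(disambiguate(utterance, utterance.index("boston"), counts))
```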

Book ChapterDOI
13 Dec 1999
TL;DR: In this framework, a hybrid (partially materialized) approach and extended ontologies are used to achieve Web data integration, making it possible to integrate DW data with Web-based information resources as they are needed.
Abstract: This paper presents a framework for warehousing selected Web contents. In this framework, a hybrid (partially materialized) approach and extended ontologies are used to achieve Web data integration. This hybrid approach makes it possible to integrate DW data with Web-based information resources as they are needed. The ontologies are used to represent domain knowledge related to Web sources and the logical model of the data warehouse. Moreover, we define mapping rules between Web data and attributes of the data warehouse in the ontologies to facilitate the construction and maintenance of data warehouses.
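
The mapping rules can be read as declarative associations from ontology terms describing a Web source to attributes of the warehouse's logical model. A minimal sketch of applying such rules to one extracted record follows; all term and attribute names are invented for illustration.

```python
# Minimal sketch of ontology-style mapping rules: each rule maps a term used
# to describe extracted Web data onto an attribute of the data warehouse's
# logical model, optionally with a type conversion. All names are invented.

MAPPING_RULES = {
    "ex:productName": ("dw_product.name", str),
    "ex:listPrice":   ("dw_product.price_usd", float),
    "ex:lastUpdated": ("dw_product.load_date", str),
}

def apply_mapping(web_record, rules=MAPPING_RULES):
    """Translate a {ontology term: value} record into warehouse attributes."""
    warehouse_row = {}
    for term, value in web_record.items():
        if term in rules:
            attribute, convert = rules[term]
            warehouse_row[attribute] = convert(value)
    return warehouse_row

if __name__ == "__main__":
    extracted = {"ex:productName": "Laptop X", "ex:listPrice": "999.00",
                 "ex:lastUpdated": "1999-12-13"}
    print(apply_mapping(extracted))
```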

Proceedings ArticleDOI
23 Sep 1999
TL;DR: A system architecture and implementation relying on commercial WWW technology is presented; the temporal aspects of hypermedia features for continuous media like audio and video resemble those of all other kinds of multimedia applications.
Abstract: Multimedia applications within the World Wide Web (WWW) have to deal with difficulties like executing within Web pages and being transferred via the Internet. However, the temporal aspects of hypermedia features for continuous media like audio and video resemble those of all other kinds of multimedia applications. These temporal aspects are discussed in consideration of presentation and authoring facilities. A system architecture and implementation relying on commercial WWW technology is presented.




Journal ArticleDOI
TL;DR: This framework provides an easy-to-use and well-formalized method for automatic generation of wrappers extracting data from Web documents and an associated SQL-like query language.