
Showing papers on "Semantic Web" published in 2000


Journal ArticleDOI
01 Jun 2000
TL;DR: It is found that while successful search performance requires the combination of the two types of expertise, specific strategies directly related to Web experience or domain knowledge can be identified.
Abstract: Searching for relevant information on the World Wide Web is often a laborious and frustrating task for casual and experienced users. To help improve searching on the Web based on a better understanding of user characteristics, we investigate what types of knowledge are relevant for Web-based information seeking, and which knowledge structures and strategies are involved. Two experimental studies are presented, which address these questions from different angles and with different methodologies. In the first experiment, 12 established Internet experts are first interviewed about search strategies and then perform a series of realistic search tasks on the World Wide Web. From this study a model of information seeking on the World Wide Web is derived and then tested in a second study. In the second experiment two types of potentially relevant types of knowledge are compared directly. Effects of Web experience and domain-specific background knowledge are investigated with a series of search tasks in an economics-related domain (introduction of the Euro currency). We find differential and combined effects of both Web experience and domain knowledge: while successful search performance requires the combination of the two types of expertise, specific strategies directly related to Web experience or domain knowledge can be identified.

721 citations


Journal ArticleDOI
TL;DR: The goal of the research described here is to automatically create a computer-understandable knowledge base whose content mirrors that of the World Wide Web. Several machine learning algorithms for this task are described, along with promising initial results from a prototype system that has created a knowledge base describing university people, courses, and research projects.

473 citations


Journal ArticleDOI
TL;DR: This paper considers how the semantic Web will provide intelligent access to heterogeneous and distributed information, enabling software products (agents) to mediate between user needs and available information sources.
Abstract: The Web has drastically changed the availability of electronic information, but its success and exponential growth have made it increasingly difficult to find, access, present and maintain such information for a wide variety of users. In reaction to this bottleneck many new research initiatives and commercial enterprises have been set up to enrich available information with machine-processable semantics. The paper considers how the semantic Web will provide intelligent access to heterogeneous and distributed information, enabling software products (agents) to mediate between user needs and available information sources. The paper discusses the Resource Description Framework, XML and other languages.

455 citations


Book
31 Aug 2000
TL;DR: This chapter discusses the structure and Dynamics of Organizational Knowledge, and the role of the Intranet as Infrastructure for Knowledge Work, in the context of knowledge work on the World Wide Web.
Abstract: Section I: Information Seeking and Knowledge Work. 1. Information Seeking. 2. The Structure and Dynamics of Organizational Knowledge. Section II: Knowledge Work on Intranets. 3. The Intranet as Infrastructure for Knowledge Work. 4. Designing Intranets to Support Knowledge Work. Section III: Information Seeking on the World Wide Web. 5. Models of Information Seeking on the World Wide Web. 6. Understanding Organizational Web Use. Coda. References. Index.

291 citations


Book
01 Jan 2000
TL;DR: This book presents a meta-anatomy of the Web, aiming to explain the Web in the context of 21st-century society and to investigate its role in promoting human rights and democracy.
Abstract: 1. Enquire within upon everything 2. Tangles, links and webs 3. info.cern.ch 4. Protocols: Simple rules for global systems 5. Going global 6. Browsing 7. Changes 8. Consortium 9. Competition and consensus 10. Web of people 11. Privacy 12. Mind to mind 13. Machines and the web 14. Weaving the web

240 citations


Journal ArticleDOI
01 Jun 2000
TL;DR: How a comprehensive and flexible strategy for building and maintaining a high-value community Web portal has been conceived and implemented based on an ontology as a semantic backbone for accessing information on the portal, for contributing information, as well as for developing and maintaining the portal is discussed.
Abstract: Community Web portals serve the information needs of particular communities on the Web. We here discuss how a comprehensive and flexible strategy for building and maintaining a high-value community Web portal has been conceived and implemented. The strategy includes collaborative information provisioning by the community members. It is based on an ontology as a semantic backbone for accessing information on the portal, for contributing information, as well as for developing and maintaining the portal. We have also implemented a set of ontology-based tools that have facilitated the construction of our showcase: the community Web portal of the knowledge acquisition community.

226 citations


01 Jan 2000
TL;DR: This paper presents a comprehensive architecture and generic method for discovering a domain-tailored ontology from given intranet resources, and describes the authors' ongoing work in supporting semi-automatic ontology acquisition from the corporate intranet of an insurance company.
Abstract: Focused access to knowledge resources such as intranet documents plays a vital role in knowledge management and supports, more generally, the shift towards a Semantic Web. Ontologies act as a conceptual backbone for semantic document access by providing a common understanding and conceptualization of a domain. Building domain-specific ontologies is a time-consuming and expensive manual construction task. This paper describes our current and ongoing work in supporting semi-automatic ontology acquisition from the corporate intranet of an insurance company. We present a comprehensive architecture and generic method for discovering a domain-tailored ontology from given intranet resources.
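To make the semi-automatic acquisition step concrete, the first pass of such a pipeline can be pictured as extracting frequent terms from intranet documents as concept candidates and proposing co-occurring pairs as relation candidates for the ontology engineer to review. The following is only a minimal sketch of that idea; the toy corpus, stop-word list and thresholds are illustrative assumptions, not the method used in the paper.

    import re
    from collections import Counter
    from itertools import combinations

    def concept_candidates(documents, min_freq=2):
        """Rough concept-candidate extraction: frequent lower-cased tokens that
        are not stop words. Real systems use POS tagging and NP chunking."""
        stop = {"the", "a", "an", "of", "and", "for", "is", "in", "to", "on"}
        counts = Counter()
        for doc in documents:
            tokens = re.findall(r"[a-z]{3,}", doc.lower())
            counts.update(t for t in tokens if t not in stop)
        return {t for t, c in counts.items() if c >= min_freq}

    def cooccurrence_pairs(documents, candidates):
        """Candidate term pairs appearing in the same document; such pairs are
        offered to the ontology engineer as possible relations."""
        pairs = Counter()
        for doc in documents:
            present = sorted(t for t in candidates if t in doc.lower())
            pairs.update(combinations(present, 2))
        return pairs

    docs = [
        "The insurance policy covers damage to the insured vehicle.",
        "A policy defines the premium paid by the customer.",
        "The customer reports vehicle damage to the insurer.",
    ]
    terms = concept_candidates(docs)
    print(sorted(terms))
    print(cooccurrence_pairs(docs, terms).most_common(3))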

198 citations


ReportDOI
01 Jan 2000
TL;DR: The SHOE language is presented, which the authors feel has many of the features necessary to enable a semantic web, and an existing set of tools that make it easy to use the language are described.
Abstract: XML will have a profound impact on the way data is exchanged on the Internet. An important feature of this language is the separation of content from presentation, which makes it easier to select and/or reformat the data. However, due to the likelihood of numerous industry- and domain-specific DTDs, those who wish to integrate information will still be faced with the problem of semantic interoperability. In this paper we discuss why this problem is not solved by XML, and then discuss why the Resource Description Framework is only a partial solution. We then present the SHOE language, which we feel has many of the features necessary to enable a semantic web, and describe an existing set of tools that make it easy to use the language. (Jeff Heflin and James Hendler, University of Maryland)
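The semantic interoperability problem described above can be illustrated with a tiny example: two DTDs that tag the same fact differently can only be integrated once an explicit mapping to shared concepts is supplied. The sketch below is a hand-written mapping in Python; the element names and the target vocabulary are invented for illustration and are not taken from SHOE.

    import xml.etree.ElementTree as ET

    # The same fact encoded under two hypothetical industry DTDs.
    doc_a = "<staff><lecturer name='J. Smith' dept='CS'/></staff>"
    doc_b = "<people><faculty fullName='J. Smith' department='CS'/></people>"

    # The tags alone do not say that 'lecturer' and 'faculty' denote the same
    # concept; the mapping has to be supplied by hand (or, as argued in the
    # paper, by a semantic layer such as SHOE on top of the markup).
    MAPPING = {
        "lecturer": ("Professor", {"name": "name", "dept": "affiliation"}),
        "faculty":  ("Professor", {"fullName": "name", "department": "affiliation"}),
    }

    def normalize(xml_text):
        facts = []
        for elem in ET.fromstring(xml_text).iter():
            if elem.tag in MAPPING:
                concept, attr_map = MAPPING[elem.tag]
                facts.append((concept, {attr_map[k]: v for k, v in elem.attrib.items()}))
        return facts

    print(normalize(doc_a))
    print(normalize(doc_b))  # both sources yield the same normalized fact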

166 citations


Journal ArticleDOI
16 May 2000
TL;DR: The case for identifying replicated documents and collections to improve web crawlers, archivers, and ranking functions used in search engines is made.
Abstract: Many web documents (such as JAVA FAQs) are being replicated on the Internet. Often entire document collections (such as hyperlinked Linux manuals) are being replicated many times. In this paper, we make the case for identifying replicated documents and collections to improve web crawlers, archivers, and ranking functions used in search engines. The paper describes how to efficiently identify replicated documents and hyperlinked document collections. The challenge is to identify these replicas from an input data set of several tens of millions of web pages and several hundreds of gigabytes of textual data. We also present two real-life case studies where we used replication information to improve a crawler and a search engine. We report these results for a data set of 25 million web pages (about 150 gigabytes of HTML data) crawled from the web.
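A common building block for this kind of replica detection is content fingerprinting: hash a normalized version of each page and group pages whose fingerprints collide. The sketch below shows only that exact-duplicate case with invented URLs; the paper's algorithms for near-replicas and whole hyperlinked collections are considerably more involved.

    import hashlib
    from collections import defaultdict

    def fingerprint(text):
        """Whitespace-normalized, lower-cased MD5 digest of the page body."""
        canonical = " ".join(text.split()).lower()
        return hashlib.md5(canonical.encode("utf-8")).hexdigest()

    def find_replicas(pages):
        """Group URLs whose content hashes to the same fingerprint."""
        groups = defaultdict(list)
        for url, body in pages.items():
            groups[fingerprint(body)].append(url)
        return [urls for urls in groups.values() if len(urls) > 1]

    pages = {
        "http://a.example/java-faq": "Q: What is the JVM?  A: ...",
        "http://b.example/mirror/java-faq": "Q: What is the JVM? A: ...",
        "http://c.example/linux-howto": "Chapter 1: Installation ...",
    }
    print(find_replicas(pages))  # the two FAQ copies fall into one group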

145 citations


Journal ArticleDOI
TL;DR: The Resource Description Framework (RDF) provides a data model that supports fast integration of data sources by bridging semantic differences and can be used as a general framework for data exchange on the Web.
Abstract: The current World Wide Web supports mainly human browsing and searching of textual content. This model has become less and less adequate as the mass of available information increases. What is required instead is a model that supports integrated and uniform access to information sources and services as well as intelligent applications for information processing on the Web. Such a model requires standard mechanisms for interchanging data and handling different data semantics. The Resource Description Framework (RDF) is a step in this direction. RDF provides a data model that supports fast integration of data sources by bridging semantic differences. It is often used (and was initially designed) for representing metadata about other Web resources, such as XML files. However, representing metadata about the Web is not different from representing data generally. Thus, RDF can be used as a general framework for data exchange on the Web.
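The integration argument rests on RDF's uniform data model: every source reduces to subject-predicate-object triples, so merging is mechanical. A minimal sketch with plain Python tuples follows; the URIs are invented for illustration, and a real application would use an RDF toolkit rather than raw tuples.

    # Each statement is a (subject, predicate, object) triple.
    source_a = {
        ("urn:doc:42", "http://example.org/terms#title", "Weaving the Web"),
        ("urn:doc:42", "http://example.org/terms#creator", "urn:person:tbl"),
    }
    source_b = {
        ("urn:person:tbl", "http://example.org/terms#name", "Tim Berners-Lee"),
    }

    # Integration is simply set union, because both sources share one model.
    graph = source_a | source_b

    def objects(graph, subject, predicate):
        return [o for s, p, o in graph if s == subject and p == predicate]

    # Follow the creator link across the two merged sources.
    for creator in objects(graph, "urn:doc:42", "http://example.org/terms#creator"):
        print(objects(graph, creator, "http://example.org/terms#name"))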

Journal Article
TL;DR: Support in data, information, and knowledge exchange is the key issue in current computer technology and is essential for “bringing the web to its full potential” in areas such as knowledge management and electronic commerce.
Abstract: Currently computers are changing from single isolated devices to entry points in a worldwide network of information exchange and business transactions called the World Wide Web (WWW). Therefore support in data, information, and knowledge exchange becomes the key issue in current computer technology. The WWW has drastically changed the availability of electronically available information. However, this success and exponential growth make it increasingly difficult to find, to access, to present, and to maintain the information of use to a wide variety of users. In reaction to this bottleneck many new research initiatives and commercial enterprises have been set up to enrich available information with machine-processable semantics. Such support is essential for “bringing the web to its full potential” in areas such as knowledge management and electronic commerce.

Book ChapterDOI
14 Aug 2000
TL;DR: The paper presents a mapping of RDF into conceptual graphs (CG) and its interest in the context of the semantic Web; the approach is to exploit a standard language for expressing metadata and to interpret these metadata as conceptual graphs in order to exploit the querying and inferencing capabilities enabled by the CG formalism.
Abstract: With the aim of building a "Semantic Web", the content of documents must be explicitly represented through metadata in order to enable content-guided search. Our approach is to exploit a standard language (RDF, recommended by the W3C) for expressing such metadata and to interpret these metadata in conceptual graphs (CG) in order to exploit the querying and inferencing capabilities enabled by the CG formalism. The paper presents our mapping of RDF into CG and its interest in the context of the semantic Web.
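The core of such a mapping is that an RDF statement (subject, property, object) corresponds to two CG concept nodes joined by a relation node. The sketch below shows only that correspondence; the node classes, typing table and example metadata are invented for illustration, and the paper's mapping covers many more RDF constructs.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Concept:          # CG concept node, e.g. [Document: doc1]
        type: str
        referent: str

    @dataclass(frozen=True)
    class Relation:         # CG relation node linking two concepts, e.g. (author)
        name: str
        source: Concept
        target: Concept

    def rdf_to_cg(triples, typing):
        """Map RDF triples to CG relations; 'typing' assigns each resource a concept type."""
        def concept(resource):
            return Concept(typing.get(resource, "Entity"), resource)
        return [Relation(p, concept(s), concept(o)) for s, p, o in triples]

    triples = [("doc1", "author", "person1"), ("doc1", "theme", "SemanticWeb")]
    typing = {"doc1": "Document", "person1": "Person", "SemanticWeb": "Topic"}
    for rel in rdf_to_cg(triples, typing):
        print(f"[{rel.source.type}: {rel.source.referent}] -({rel.name})-> "
              f"[{rel.target.type}: {rel.target.referent}]")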

Proceedings ArticleDOI
06 Nov 2000
TL;DR: A way to use a user’s personal arrangement of concepts to navigate the Web is explored, building on the characterizations created by the OBIWAN system; the mapping of the reference ontology to the personal ontology is shown to have a promising level of correctness and precision.
Abstract: The publicly indexable Web contains an estimated 800 million pages; however, the largest search engine is estimated to index only 300 million of them. As the number of Internet users and the number of accessible Web pages grow, it is becoming increasingly difficult for users to find documents that are relevant to their particular needs. Often users must browse through a large hierarchy of categories to find the information for which they are looking. To provide the user with the most useful information in the least amount of time, we need a system that uses each user’s view of the world for classification. This paper explores a way to use a user’s personal arrangement of concepts to navigate the Web. This system is built by using the characterizations for a particular site created by the Ontology Based Informing Web Agent Navigation (OBIWAN) system and mapping from them to the user’s personal ontologies. OBIWAN allows users to explore multiple sites via the same browsing hierarchy. This paper extends OBIWAN to allow users to explore multiple sites via their own browsing hierarchy. The mapping of the reference ontology to the personal ontology is shown to have a promising level of correctness and precision.
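The mapping from the reference ontology to a personal ontology can be approximated by comparing the characterizations of each category, for instance by keyword overlap. The toy sketch below uses Jaccard similarity with invented categories and an arbitrary threshold; it is not OBIWAN's actual mapping algorithm.

    def jaccard(a, b):
        a, b = set(a), set(b)
        return len(a & b) / len(a | b) if a | b else 0.0

    def map_categories(reference, personal, threshold=0.25):
        """For each reference category, pick the best-matching personal category."""
        mapping = {}
        for ref_name, ref_terms in reference.items():
            best_name, best_terms = max(personal.items(),
                                        key=lambda kv: jaccard(ref_terms, kv[1]))
            if jaccard(ref_terms, best_terms) >= threshold:
                mapping[ref_name] = best_name
        return mapping

    reference = {"Computers/AI": ["agent", "ontology", "learning", "knowledge"],
                 "Recreation/Travel": ["hotel", "flight", "tour"]}
    personal = {"My research": ["ontology", "agent", "semantic", "knowledge"],
                "Holidays": ["flight", "beach", "hotel"]}
    print(map_categories(reference, personal))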

01 Sep 2000
TL;DR: A layered approach to interoperability of information models is suggested that borrows from layered software structuring techniques used in today's internetworking, along with the key features of the object layer, such as identity and binary relationships, basic typing, reification, ordering, and n-ary relationships.
Abstract: On the Semantic Web, the target audience is machines rather than humans. To satisfy the demands of this audience, information needs to be available in machine-processable form rather than as unstructured text. A variety of information models like RDF or UML are available to fulfil this purpose, varying greatly in their capabilities. The advent of XML leveraged a promising consensus on the encoding syntax for machine-processable information. However, interoperating between different information models on a syntactic level proved to be a laborious task. In this paper, we suggest a layered approach to interoperability of information models that borrows from layered software structuring techniques used in today's internetworking. We identify the object layer that fills the gap between the syntax and semantic layers and examine it in detail. We suggest the key features of the object layer, such as identity and binary relationships, basic typing, reification, ordering, and n-ary relationships. Finally, we examine design issues and implementation alternatives involved in building the object layer.
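The object-layer features named above (identity, binary relationships, reification, n-ary relationships) can be pictured independently of any particular syntax. The sketch below is purely illustrative of the idea that statements get identities of their own so that other statements can refer to them; it does not reproduce the paper's design.

    import itertools

    _ids = itertools.count(1)

    class Statement:
        """A binary relationship with its own identity, so it can be reified:
        other statements may use it as subject or object."""
        def __init__(self, subject, predicate, obj):
            self.id = f"stmt{next(_ids)}"
            self.subject, self.predicate, self.object = subject, predicate, obj
        def __repr__(self):
            return f"{self.id}: ({self.subject!r} {self.predicate} {self.object!r})"

    author = Statement("doc1", "author", "alice")
    # Reification: a statement about the statement itself.
    provenance = Statement(author, "statedBy", "crawler-17")

    # An n-ary relationship decomposed into binary statements about one node.
    purchase = "purchase-9"
    nary = [Statement(purchase, "buyer", "bob"),
            Statement(purchase, "item", "book-3"),
            Statement(purchase, "price", 20)]

    print(author, provenance, *nary, sep="\n")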

Journal ArticleDOI
TL;DR: An approach to document enrichment is presented, which consists of developing and integrating formal knowledge models with archives of documents, to provide intelligent knowledge retrieval and (possibly) additional knowledge-intensive services, beyond what is currently available using “standard” information retrieval and search facilities.
Abstract: In this paper, we present an approach to document enrichment, which consists of developing and integrating formal knowledge models with archives of documents, to provide intelligent knowledge retrieval and (possibly) additional knowledge-intensive services, beyond what is currently available using “standard” information retrieval and search facilities. Our approach is ontology-driven, in the sense that the construction of the knowledge model is carried out in a top-down fashion, by populating a given ontology, rather than in a bottom-up fashion, by annotating a particular document. In this paper, we give an overview of the approach and we examine the various types of issues (e.g. modelling, organizational and user interface issues) which need to be tackled to effectively deploy our approach in the workplace. In addition, we also discuss a number of technologies we have developed to support ontology-driven document enrichment and we illustrate our ideas in the domains of electronic news publishing, scholarly discourse and medical guidelines.

Proceedings Article
01 Jan 2000
TL;DR: The OIL language extends the RDF schema standard to provide just such a layer, which combines the most attractive features of frame based languages with the expressive power, formal rigour and reasoning services of a very expressive description logic.
Abstract: Exploiting the full potential of the World Wide Web will require semantic as well as syntactic interoperability. This can best be achieved by providing a further representation and inference layer that builds on existing and proposed web standards. The OIL language extends the RDF schema standard to provide just such a layer. It combines the most attractive features of frame based languages with the expressive power, formal rigour and reasoning services of a very expressive description logic.
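The flavour of what a frame-plus-description-logic layer adds on top of RDF Schema can be hinted at with class definitions carrying slot constraints and a subsumption check over the taxonomy. The sketch below is a naive illustration with invented class names; OIL's actual semantics come from an expressive description logic and require a dedicated reasoner, not this kind of graph walk.

    from dataclasses import dataclass, field

    @dataclass
    class ClassDef:
        name: str
        superclasses: set = field(default_factory=set)
        slot_ranges: dict = field(default_factory=dict)  # frame-style slot constraints

    ontology = {
        "Publication": ClassDef("Publication"),
        "Article": ClassDef("Article", {"Publication"}, {"hasTopic": "Topic"}),
        "SWArticle": ClassDef("SWArticle", {"Article"}, {"hasTopic": "SemanticWebTopic"}),
    }

    def subsumes(ontology, general, specific):
        """Naive taxonomic subsumption: walk the superclass links upward."""
        seen, frontier = set(), {specific}
        while frontier:
            cls = frontier.pop()
            if cls == general:
                return True
            seen.add(cls)
            frontier |= ontology[cls].superclasses - seen
        return False

    print(subsumes(ontology, "Publication", "SWArticle"))  # True
    print(subsumes(ontology, "SWArticle", "Publication"))  # False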

01 Jan 2000
TL;DR: Results of association rules, propositional and relational learning are provided, which demonstrate that data mining can help improve the authors' extractors, and that using information from two kinds of sources improves the reliability of data-mined rules.
Abstract: Information extractors and classifiers operating on unrestricted, unstructured texts are an errorful source of large amounts of potentially useful information, especially when combined with a crawler which automatically augments the knowledge base from the World Wide Web. At the same time, there is much structured information on the World Wide Web. Wrapping the web sites which provide this kind of information gives us a second source of information: possibly less up-to-date, but reliable as facts. We give a case study of combining information from these two kinds of sources in the context of learning facts about companies. We provide results of association rules, propositional and relational learning, which demonstrate that data mining can help us improve our extractors, and that using information from two kinds of sources improves the reliability of data-mined rules.

01 Jan 2000
TL;DR: A conceptual framework for enriching Web links by displaying small, information-rich visualizations—pop-up views—that provide the user with information about linked pages that can be used to evaluate the appropriateness of the pages before making a commitment to select the link and wait for the page to load.
Abstract: We describe a conceptual framework for enriching Web links by displaying small, information-rich visualizations—pop-up views—that provide the user with information about linked pages that can be used to evaluate the appropriateness of the pages before making a commitment to select the link and wait for the page to load. Examples of how the enriched links framework could be applied in contexts, such as e-commerce catalog pages, search results for a video repository, and desktop icons, are also presented.

Journal ArticleDOI
TL;DR: Analysis of the Web pages retrieved by the major search engines on a particular date, as a result of the query “informetrics OR informetric,” indicates that valuable, freely available data is hidden in the Web waiting to be extracted from the millions of Web pages.
Abstract: This article addresses the question of whether the Web can serve as an information source for research. Specifically, it analyzes by way of content analysis the Web pages retrieved by the major search engines on a particular date (June 7, 1998), as a result of the query “informetrics OR informetric.” In 807 out of the 942 retrieved pages, the search terms were mentioned in the context of information science. Over 70% of the pages contained only indirect information on the topic, in the form of hypertext links and bibliographical references without annotation. The bibliographical references extracted from the Web pages were analyzed, and lists of most productive authors, most cited authors, works, and sources were compiled. The list of references obtained from the Web was also compared to data retrieved from commercial databases. For most cases, the list of references extracted from the Web outperformed the commercial, bibliographic databases. The results of these comparisons indicate that valuable, freely available data is hidden in the Web waiting to be extracted from the millions of Web pages.


Proceedings Article
30 Oct 2000
TL;DR: The paper shows the assets of an approach that combines XML technology designed for the Web with multi-agent systems, using the heterogeneity and distribution of the multi-agent system as a solution to the heterogeneity and distribution of the corporate memory.
Abstract: A corporate memory and the World Wide Web have in common that they are both heterogeneous and distributed information landscapes. They also share the same problem of relevance of results when one wants to search them. However, compared to the Web, a corporate memory has a delimited and better defined context, infrastructure and scope: the corporation. Taking into account the characteristics of a corporate memory, we show in this paper the assets of an approach combining XML technology designed for the Web and the distributed nature of multi-agent systems. In particular, we consider the heterogeneity and distribution of the multi-agent system as a solution to the heterogeneity and the distribution of the corporate memory.

Journal ArticleDOI
TL;DR: Ontobroker applies Artificial Intelligence techniques to improve access to heterogeneous, distributed and semi-structured information sources as they are presented in the World Wide Web or organization-wide intranets, relying on ontologies to annotate web pages, formulate queries and derive answers.
Abstract: Ontobroker applies Artificial Intelligence techniques to improve access to heterogeneous, distributed and semi-structured information sources as they are presented in the World Wide Web or organization-wide intranets. It relies on the use of ontologies to annotate web pages, formulate queries and derive answers. In this paper we will briefly sketch Ontobroker. Then we will discuss its main shortcomings, i.e. we will share the lessons we learned from our exercise. We will also show how On2broker overcomes these limitations. Most important is the separation of the query and inference engines and the integration of new web standards like XML and RDF.

Journal ArticleDOI
01 Jun 2000
TL;DR: The paper describes the OHIF format and demonstrates how the Webvise system handles OHIF, and argues for better support for handling user controlled meta data, e.g. support for linking in non-XML data, integration of external linking in the Web infrastructure, and collaboration support for external structures and meta-data.
Abstract: This paper introduces an approach to utilise open hypermedia structures such as links, annotations, collections and guided tours as metadata for Web resources. The paper introduces an XML-based data format, called Open Hypermedia Interchange Format (OHIF), for such hypermedia structures. OHIF resembles XLink with respect to its representation of out-of-line links, but it goes beyond XLink with a richer set of structuring mechanisms, including e.g. composites. Moreover, OHIF includes an addressing mechanism (LocSpecs) that goes beyond XPointer and URL in its ability to locate non-XML data segments. By means of the Webvise system, OHIF structures can be authored, imposed on Web pages, and finally linked on the Web as any ordinary Web resource. Following a link to an OHIF file automatically invokes a Webvise download of the metadata structures, and the annotated Web content will be displayed in the browser. Moreover, the Webvise system provides support for users to create, manipulate, and share the OHIF structures together with custom-made Web pages and MS Office 2000 documents on WebDAV servers. These Webvise facilities go beyond earlier open hypermedia systems in that they now allow fully distributed open hypermedia linking between Web pages and WebDAV-aware desktop applications. The paper describes the OHIF format and demonstrates how the Webvise system handles OHIF. Finally, it argues for better support for handling user-controlled metadata, e.g. support for linking in non-XML data, integration of external linking in the Web infrastructure, and collaboration support for external structures and metadata.
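The key idea of out-of-line linking, which OHIF shares with XLink, is that link and annotation structures live in a separate document and merely address spans inside the resources they connect. The data structure below is a rough illustration of that idea in Python; the field names are invented and do not reproduce the actual OHIF schema or LocSpec syntax.

    # An out-of-line link: stored apart from the two Web resources it connects.
    annotation = {
        "type": "annotation-link",
        "endpoints": [
            {   # a LocSpec-like address into a non-XML resource
                "resource": "http://example.org/report.doc",
                "locate": {"kind": "text-span", "start": 120, "length": 34},
            },
            {
                "resource": "http://example.org/comments/15.html",
                "locate": {"kind": "element-id", "value": "c15"},
            },
        ],
        "metadata": {"author": "editor-1", "created": "2000-06-01"},
    }

    def targets(link):
        """URLs an open hypermedia viewer would decorate when showing this link."""
        return [ep["resource"] for ep in link["endpoints"]]

    print(targets(annotation))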

Proceedings Article
01 Jan 2000
TL;DR: This paper introduces a label discovery algorithm that uses the hierarchical structure extracted from web pages to discover similar labels which describe the same kind of information.
Abstract: Many Web documents containing the same type of information would have a similar structure. In this paper, we examine the problem of finding the structure of web documents and present a hierarchical structure to represent the relation among text data in the web documents. Due to the loose standards of web page publishing, different authors can use different wordings (labels) to label the same information. We introduce a label discovery algorithm that uses the hierarchical structure extracted from the web pages. The algorithm discovers similar labels which describe the same kind of information. Such labels would help us find the structure of the web documents. Experiments have shown that the algorithm can successfully discover similar labels and that the structure obtained by our method can distinguish web pages accurately.

Proceedings ArticleDOI
03 Oct 2000
TL;DR: A method to extract descriptions of technical terms from Web pages in order to utilize the World Wide Web as an encyclopedia by using linguistic patterns and HTML text structures and a clustering method to summarize resultant descriptions.
Abstract: In this paper, we propose a method to extract descriptions of technical terms from Web pages in order to utilize the World Wide Web as an encyclopedia. We use linguistic patterns and HTML text structures to extract text fragments containing term descriptions. We also use a language model to discard extraneous descriptions, and a clustering method to summarize resultant descriptions. We show the effectiveness of our method by way of experiments.
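The linguistic-pattern step can be pictured as matching copular, definition-like sentences in page text. The single regular expression below stands in for the paper's richer patterns and HTML-structure cues and is only a simplified sketch with made-up example sentences.

    import re

    DEFINITION = re.compile(
        r"(?P<term>[A-Z][\w\- ]{2,40}?) is (a|an) (?P<gloss>[^.]{10,200})\."
    )

    def extract_descriptions(text):
        """Return (term, description) pairs found by the definitional pattern."""
        return [(m.group("term").strip(), m.group("gloss").strip())
                for m in DEFINITION.finditer(text)]

    page = ("RDF is a framework for describing resources on the Web. "
            "We installed the server yesterday. "
            "SHOE is an extension of HTML for semantic annotation.")
    for term, gloss in extract_descriptions(page):
        print(f"{term}: {gloss}")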


Journal ArticleDOI
01 Jun 2000
TL;DR: This paper defines and formalizes the general duality problem of relations on the Web, and solves the problem of identifying acronyms and their expansions through patterns of occurrences of (acronym, expansion) pairs as they occur in Web pages.
Abstract: The Web is a vast source of information. However, due to the disparate authorship of Web pages, this information is buried in its amorphous and chaotic structure. At the same time, with the pervasiveness of Web access, an increasing number of users is relying on Web search engines for interesting information. We are interested in identifying how pieces of information are related as they are presented on the Web. One such problem is studying patterns of occurrences of related phrases in Web documents and in identifying relationships between these phrases. We call these the duality problems of the Web. Duality problems are materialized in trying to define and identify two sets of inter-related concepts, and are solved by iteratively refining mutually dependent coarse definitions of these concepts. In this paper we define and formalize the general duality problem of relations on the Web. Duality of patterns and relationships are of importance because they allow us to define the rules of patterns and relationships iteratively through the multitude of their occurrences. Our solution includes Web crawling to iteratively refine the definition of patterns and relations. As an example we solve the problem of identifying acronyms and their expansions through patterns of occurrences of (acronym, expansion) pairs as they occur in Web pages.
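For the acronym example, a bootstrap pattern that often works is the surface form "Expansion (ACRONYM)" filtered by an initials check. The sketch below implements only that single pattern on made-up text; the paper's duality method goes further, iteratively refining patterns and relation instances over a Web crawl.

    import re

    PAIR = re.compile(r"((?:[A-Z][a-z]+\s+){1,6})\((?P<acr>[A-Z]{2,10})\)")

    def acronym_pairs(text):
        """Extract (acronym, expansion) candidates whose initials match the acronym."""
        pairs = []
        for m in PAIR.finditer(text):
            words = m.group(1).split()
            acr = m.group("acr")
            expansion = " ".join(words[-len(acr):])
            if [w[0] for w in expansion.split()] == list(acr):
                pairs.append((acr, expansion))
        return pairs

    text = ("The Resource Description Framework (RDF) and the World Wide Web (WWW) "
            "are everywhere; phrases like Chapter Two (HTML) are filtered out.")
    print(acronym_pairs(text))  # [('RDF', 'Resource Description Framework'), ('WWW', 'World Wide Web')]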

Book ChapterDOI
10 Oct 2000
TL;DR: The authors envision a need to support interactive translation of Web pages as the World Wide Web becomes more accessible to people with varying needs and abilities throughout the world.
Abstract: A mixed-initiative system is one which allows more interactivity between the system and user as the system is reasoning. We present some observations on the task of translating Web pages for users and suggest that a more interactive approach to this problem may be desirable. The aim is to interact with the user who is requesting the translation, and the challenge is to determine the circumstances under which the user should be able to take the initiative to direct the processing or the system should be able to take the initiative to solicit further input from the user. In fact, we envision a need to support interactive translation of Web pages as the World Wide Web becomes more accessible to people with varying needs and abilities throughout the world.

Book ChapterDOI
04 Sep 2000
TL;DR: This paper presents an eXtensible Web Modeling Framework (XWMF), which applies the Resource Description Framework (RDF) to Web engineering to provide an interoperable exchange format.
Abstract: Generally, a multitude of tools is used for the management of a Web application life cycle. It is highly desirable to provide an exchange format for such tools to enable interoperability. This paper presents an eXtensible Web Modeling Framework (XWMF), which applies the Resource Description Framework (RDF) to Web engineering to provide an interoperable exchange format. Our proposed framework makes use of one and the same (meta-)data model to specify the structure and content of a Web application, to make statements about the elements of a Web application, and to reason about the data and metadata. XWMF is extensible, because schemata defining additional vocabulary to integrate new design artifacts can be added. The XWMF tools are able to convert the Web application (metadata) description into the corresponding Web implementation.