
Showing papers on "Semantic Web" published in 2002


01 Jan 2002
TL;DR: An ontology defines a common vocabulary for researchers who need to share information in a domain that includes machine-interpretable definitions of basic concepts in the domain and relations among them.
Abstract: Why develop an ontology? In recent years the development of ontologies—explicit formal specifications of the terms in the domain and relations among them (Gruber 1993)—has been moving from the realm of Artificial Intelligence laboratories to the desktops of domain experts. Ontologies have become common on the World-Wide Web. The ontologies on the Web range from large taxonomies categorizing Web sites (such as on Yahoo!) to categorizations of products for sale and their features (such as on Amazon.com). The WWW Consortium (W3C) is developing the Resource Description Framework (Brickley and Guha 1999), a language for encoding knowledge on Web pages to make it understandable to electronic agents searching for information. The Defense Advanced Research Projects Agency (DARPA), in conjunction with the W3C, is developing DARPA Agent Markup Language (DAML) by extending RDF with more expressive constructs aimed at facilitating agent interaction on the Web (Hendler and McGuinness 2000). Many disciplines now develop standardized ontologies that domain experts can use to share and annotate information in their fields. Medicine, for example, has produced large, standardized, structured vocabularies such as SNOMED (Price and Spackman 2000) and the semantic network of the Unified Medical Language System (Humphreys and Lindberg 1993). Broad general-purpose ontologies are emerging as well. For example, the United Nations Development Program and Dun & Bradstreet combined their efforts to develop the UNSPSC ontology which provides terminology for products and services (www.unspsc.org). An ontology defines a common vocabulary for researchers who need to share information in a domain. It includes machine-interpretable definitions of basic concepts in the domain and relations among them. Why would someone want to develop an ontology? Some of the reasons are:
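The abstract's central notion, machine-interpretable concepts and relations, can be sketched in a few lines. This is a toy illustration, not from the paper; the class design and the wine example (a nod to the ontology-building tutorials this guide accompanies) are invented here.

```python
# A toy ontology: concepts in an is-a hierarchy plus typed relations.
# All names below are illustrative, not from any standard vocabulary.

class Ontology:
    def __init__(self):
        self.parents = {}    # concept -> parent concept (is-a)
        self.relations = []  # (domain, relation, range) triples

    def add_concept(self, concept, parent=None):
        self.parents[concept] = parent

    def add_relation(self, domain, relation, range_):
        self.relations.append((domain, relation, range_))

    def is_a(self, concept, ancestor):
        """True if `concept` equals `ancestor` or is a descendant of it."""
        while concept is not None:
            if concept == ancestor:
                return True
            concept = self.parents.get(concept)
        return False

wine = Ontology()
wine.add_concept("Drink")
wine.add_concept("Wine", parent="Drink")
wine.add_concept("RedWine", parent="Wine")
wine.add_relation("Wine", "madeFromGrape", "Grape")

print(wine.is_a("RedWine", "Drink"))  # True: inherited via Wine
```

A machine can answer the `is_a` question without any human interpretation, which is exactly what "machine-interpretable definitions" buys.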

4,838 citations


Book ChapterDOI
09 Jun 2002
TL;DR: In this article, the authors propose a solution based on DAML-S, a DAML-based language for service description, and show how service capabilities are presented in the Profile section of a DAML-S description and how a semantic match between advertisements and requests is performed.
Abstract: The Web is moving from being a collection of pages toward a collection of services that interoperate through the Internet. The first step toward this interoperation is the location of other services that can help toward the solution of a problem. In this paper we claim that location of web services should be based on the semantic match between a declarative description of the service being sought, and a description of the service being offered. Furthermore, we claim that this match is outside the representation capabilities of registries such as UDDI and languages such as WSDL. We propose a solution based on DAML-S, a DAML-based language for service description, and we show how service capabilities are presented in the Profile section of a DAML-S description and how a semantic match between advertisements and requests is performed.
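The matchmaking the abstract describes rests on ontology subsumption between a requested concept and an advertised one. The sketch below uses the commonly cited degrees of match (exact, plug-in, subsumes, fail); the flattened vehicle hierarchy and function names are invented for illustration, and the real system reasons over DAML-S Profiles rather than bare strings.

```python
# Subsumption-based degree-of-match between a requested output concept
# and an advertised one. Hierarchy and names are illustrative only.

HIERARCHY = {        # child -> parent (is-a)
    "Sedan": "Car",
    "Car": "Vehicle",
    "Truck": "Vehicle",
}

def ancestors(concept):
    out = []
    while concept in HIERARCHY:
        concept = HIERARCHY[concept]
        out.append(concept)
    return out

def match_degree(request, advert):
    if request == advert:
        return "exact"
    if advert in ancestors(request):   # advert is more general: covers the request
        return "plug-in"
    if request in ancestors(advert):   # advert is more specific than the request
        return "subsumes"
    return "fail"

print(match_degree("Sedan", "Car"))   # plug-in
print(match_degree("Car", "Sedan"))   # subsumes
print(match_degree("Car", "Truck"))   # fail
```

A registry would rank advertisements by these degrees instead of requiring the keyword-level equality that UDDI and WSDL offer.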

2,412 citations


Journal ArticleDOI
TL;DR: This work shows that the Web self-organizes and that its link structure allows efficient identification of communities; this is significant because no central authority or process governs the formation and structure of hyperlinks.
Abstract: The vast improvement in information access is not the only advantage resulting from the increasing percentage of hyperlinked human knowledge available on the Web. Additionally, much potential exists for analyzing interests and relationships within science and society. However, the Web's decentralized and unorganized nature hampers content analysis. Millions of individuals operating independently and having a variety of backgrounds, knowledge, goals and cultures author the information on the Web. Despite the Web's decentralized, unorganized, and heterogeneous nature, our work shows that the Web self-organizes and its link structure allows efficient identification of communities. This self-organization is significant because no central authority or process governs the formation and structure of hyperlinks.

1,033 citations


Proceedings ArticleDOI
07 May 2002
TL;DR: Glue is described, a system that employs machine learning techniques to find semantic mappings between ontologies and is distinguished in that it works with a variety of well-defined similarity notions and that it efficiently incorporates multiple types of knowledge.
Abstract: Ontologies play a prominent role on the Semantic Web. They make possible the widespread publication of machine understandable data, opening myriad opportunities for automated information processing. However, because of the Semantic Web's distributed nature, data on it will inevitably come from many different ontologies. Information processing across ontologies is not possible without knowing the semantic mappings between their elements. Manually finding such mappings is tedious, error-prone, and clearly not possible at the Web scale. Hence, the development of tools to assist in the ontology mapping process is crucial to the success of the Semantic Web. We describe Glue, a system that employs machine learning techniques to find such mappings. Given two ontologies, for each concept in one ontology Glue finds the most similar concept in the other ontology. We give well-founded probabilistic definitions to several practical similarity measures, and show that Glue can work with all of them. This is in contrast to most existing approaches, which deal with a single similarity measure. Another key feature of Glue is that it uses multiple learning strategies, each of which exploits a different type of information either in the data instances or in the taxonomic structure of the ontologies. To further improve matching accuracy, we extend Glue to incorporate commonsense knowledge and domain constraints into the matching process. For this purpose, we show that relaxation labeling, a well-known constraint optimization technique used in computer vision and other fields, can be adapted to work efficiently in our context. Our approach is thus distinguished in that it works with a variety of well-defined similarity notions and that it efficiently incorporates multiple types of knowledge. We describe a set of experiments on several real-world domains, and show that Glue proposes highly accurate semantic mappings.
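One of the probabilistically grounded similarity measures Glue supports is the Jaccard coefficient, P(A and B) / P(A or B). The sketch below estimates it from toy instance sets; the real system estimates the joint distribution with learned classifiers, and the course-catalog flavor of the example is invented here.

```python
# Jaccard similarity between two concepts, estimated from the
# instances classified under each (toy data, illustrative only).

def jaccard(instances_a, instances_b):
    a, b = set(instances_a), set(instances_b)
    if not a | b:            # both empty: define similarity as 0
        return 0.0
    return len(a & b) / len(a | b)

courses_dept1 = {"cs101", "cs202", "math301"}
courses_dept2 = {"cs101", "cs202", "phys110"}
print(jaccard(courses_dept1, courses_dept2))  # 0.5
```

Two concepts from different ontologies that classify largely the same instances get a high score, which is the signal the mapper ranks candidate concept pairs by.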

1,027 citations


Proceedings ArticleDOI
07 May 2002
TL;DR: This paper defines the semantics for a relevant subset of DAML-S in terms of a first-order logical language and provides decision procedures for Web service simulation, verification and composition.
Abstract: Web services -- Web-accessible programs and devices -- are a key application area for the Semantic Web. With the proliferation of Web services and the evolution towards the Semantic Web comes the opportunity to automate various Web services tasks. Our objective is to enable markup and automated reasoning technology to describe, simulate, compose, test, and verify compositions of Web services. We take as our starting point the DAML-S DAML+OIL ontology for describing the capabilities of Web services. We define the semantics for a relevant subset of DAML-S in terms of a first-order logical language. With the semantics in hand, we encode our service descriptions in a Petri Net formalism and provide decision procedures for Web service simulation, verification and composition. We also provide an analysis of the complexity of these tasks under different restrictions to the DAML-S composite services we can describe. Finally, we present an implementation of our analysis techniques. This implementation takes as input a DAML-S description of a Web service, automatically generates a Petri Net and performs the desired analysis. Such a tool has broad applicability both as a back end to existing manual Web service composition tools, and as a stand-alone tool for Web service developers.
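A minimal Petri net makes the encoding idea concrete: places hold tokens, and a transition fires only when all of its input places are marked. This sketch and its flight/hotel example are invented for illustration and are far simpler than the paper's actual DAML-S-to-Petri-net translation.

```python
# Toy Petri net: firing a transition consumes one token from each
# input place and produces one in each output place.

class PetriNet:
    def __init__(self, marking):
        self.marking = dict(marking)   # place -> token count
        self.transitions = {}          # name -> (inputs, outputs)

    def add_transition(self, name, inputs, outputs):
        self.transitions[name] = (inputs, outputs)

    def enabled(self, name):
        inputs, _ = self.transitions[name]
        return all(self.marking.get(p, 0) > 0 for p in inputs)

    def fire(self, name):
        inputs, outputs = self.transitions[name]
        assert self.enabled(name), f"{name} is not enabled"
        for p in inputs:
            self.marking[p] -= 1
        for p in outputs:
            self.marking[p] = self.marking.get(p, 0) + 1

# Two services composed in sequence: book a flight, then a hotel.
net = PetriNet({"start": 1})
net.add_transition("bookFlight", ["start"], ["flightBooked"])
net.add_transition("bookHotel", ["flightBooked"], ["done"])

net.fire("bookFlight")
net.fire("bookHotel")
print(net.marking["done"])  # 1
```

Once a composition is a net, simulation is token play, and verification questions (can "done" ever be marked? can the net deadlock?) become reachability analyses.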

953 citations


Proceedings ArticleDOI
07 May 2002
TL;DR: In this article, the authors discuss the open source project Edutella which builds upon metadata standards defined for the WWW and aims to provide an RDF-based metadata infrastructure for P2P applications, building on the recently announced JXTA Framework.
Abstract: Metadata for the World Wide Web is important, but metadata for Peer-to-Peer (P2P) networks is absolutely crucial. In this paper we discuss the open source project Edutella which builds upon metadata standards defined for the WWW and aims to provide an RDF-based metadata infrastructure for P2P applications, building on the recently announced JXTA Framework. We describe the goals and main services this infrastructure will provide and the architecture to connect Edutella Peers based on exchange of RDF metadata. As the query service is one of the core services of Edutella, upon which other services are built, we specify in detail the Edutella Common Data Model (ECDM) as basis for the Edutella query exchange language (RDF-QEL-i) and format implementing distributed queries over the Edutella network. Finally, we briefly discuss registration and mediation services, and introduce the prototype and application scenario for our current Edutella aware peers.

939 citations


Proceedings Article
01 Jan 2002
TL;DR: It is argued that an augmented version of the logic programming language Golog provides a natural formalism for automatically composing services on the Semantic Web, and logical criteria for these generic procedures that define when they are knowledge self-sufficient and physically self-sufficient are proposed.
Abstract: Motivated by the problem of automatically composing network accessible services, such as those on the World Wide Web, this paper proposes an approach to building agent technology based on the notion of generic procedures and customizing user constraints. We argue that an augmented version of the logic programming language Golog provides a natural formalism for automatically composing services on the Semantic Web. To this end, we adapt and extend the Golog language to enable programs that are generic, customizable and usable in the context of the Web. Further, we propose logical criteria for these generic procedures that define when they are knowledge self-sufficient and physically self-sufficient. To support information gathering combined with search, we propose a middle-ground Golog interpreter that operates under an assumption of reasonable persistence of certain information. These contributions are realized in our augmentation of a ConGolog interpreter that combines online execution of information-providing Web services with offline simulation of world-altering Web services, to determine a sequence of Web Services for subsequent execution. Our implemented system is currently interacting with services on the Web.

939 citations


Journal ArticleDOI
01 Jun 2002
TL;DR: A taxonomy for characterizing Web data extraction tools is proposed, major Web data extraction tools described in the literature are briefly surveyed, and a qualitative analysis of them is provided.
Abstract: In the last few years, several works in the literature have addressed the problem of data extraction from Web pages. The importance of this problem derives from the fact that, once extracted, the data can be handled in a way similar to instances of a traditional database. The approaches proposed in the literature to address the problem of Web data extraction use techniques borrowed from areas such as natural language processing, languages and grammars, machine learning, information retrieval, databases, and ontologies. As a consequence, they present very distinct features and capabilities which make a direct comparison difficult. In this paper, we propose a taxonomy for characterizing Web data extraction tools, briefly survey major Web data extraction tools described in the literature, and provide a qualitative analysis of them. Hopefully, this work will stimulate other studies aimed at a more comprehensive analysis of data extraction approaches and tools for Web data.

760 citations


Book
01 Jan 2002
TL;DR: This book covers the infrastructure of the Web (crawling and search), learning techniques such as similarity and clustering and supervised and semi-supervised learning for text, and applications including social network analysis, resource discovery, and the future of Web mining.
Abstract: Preface. Introduction. I Infrastructure: Crawling the Web. Web search. II Learning: Similarity and clustering. Supervised learning for text. Semi-supervised learning. III Applications: Social network analysis. Resource discovery. The future of Web mining.

751 citations


Book
01 Dec 2002
TL;DR: Towards the Semantic Web focuses on the application of Semantic Web technology, and ontologies in particular, to electronically available information to improve the quality of knowledge management in large and distributed organizations.
Abstract: From the Publisher: "Towards the Semantic Web focuses on the application of Semantic Web technology and ontologies in particular to electronically available information to improve the quality of knowledge management in large and distributed organizations. Covering the key technologies for the next generation of the WWW, this book is a mixture of theory, tools and applications in an important area of WWW research." Aimed primarily at researchers and developers in the area of WWW-based knowledge management and information retrieval. It will also be a useful reference for students in computer science at the postgraduate level, academic and industrial researchers in the field, business managers who are aiming to increase the corporations' information infrastructure and industrial personnel who are tracking WWW technology developments in order to understand the business implications.

647 citations


BookDOI
01 Jan 2002
TL;DR: A full-fledged Web Service Modeling Framework (WSMF) is defined that provides the appropriate conceptual model for developing and describing web services and their composition, and is based on the following principle: maximal de-coupling complemented by a scalable mediation service.
Abstract: Web Services will transform the web from a collection of information into a distributed device of computation. In order to employ their full potential, appropriate description means for web services need to be developed. For this purpose we define a full-fledged Web Service Modeling Framework (WSMF) that provides the appropriate conceptual model for developing and describing web services and their composition. Its philosophy is based on the following principle: maximal de-coupling complemented by a scalable mediation service. The current web is mainly a collection of information but does not yet provide support in processing this information, i.e., in using the computer as a computational device. Recent efforts around UDDI, WSDL, and SOAP try to lift the web to a new level of service. Software programs can be accessed and executed via the web based on the idea of web services. A service can provide information, e.g. a weather forecast service, or it may have an effect in the real world, e.g. an online flight booking service. Web services can significantly increase the Web architecture's potential, by providing a way of automated program communication, discovery of services, etc. Therefore, they are in the centre of interest of various software developing companies. In a business environment this translates into the automatic cooperation between enterprises. An enterprise requiring a business interaction with another enterprise can automatically discover and select the appropriate optimal web services relying on selection policies. They can be invoked automatically and payment processes can be initiated. Any necessary mediation is applied based on data and process ontologies and the automatic translation of their concepts into each other. An example is supply chain relationships, where a manufacturing enterprise of short-lived goods has to frequently seek suppliers as well as buyers dynamically. Instead of employees constantly searching for suppliers and buyers, the web service infrastructure does it automatically within the defined constraints. Still, more work needs to be done before the web service infrastructure can make this vision come true. Current technology around UDDI, WSDL, and SOAP provides limited support in mechanizing service recognition, service configuration and combination (i.e., realizing complex workflows and business logics with web services), service comparison and automated negotiation. Therefore, there are proposals such as WSFL, which develops a language for describing complex web services, and DAML-S, which employs semantic web technology for service description. The Web Service Modeling Framework (WSMF) follows this line of research. It is a full-fledged modeling framework for describing the various aspects related to web services. Fully enabled E-commerce based on workable web services requires a modeling framework that is centered around two complementary principles: strong de-coupling of the various components that realize an E-commerce application, and a strong mediation service enabling anybody to speak with everybody in a scalable manner. These principles are rolled out in a number of specification elements and an architecture describing their relationships. WSMF is the methodological framework developed within SWWS, a recent European project aiming at Semantic Web enabled Web Services. SWWS accounts for three main challenges: provide a comprehensive Web Service description framework, including the definition of a Web Service Modeling Framework (WSMF) that establishes a tight connection to industrial standards like XML, RDF, WSDL and WSFL and research efforts like DAML+OIL and OWL; define a Web Service discovery framework that goes beyond simple registration means (like UDDI) and provides full-fledged ontology-based and metadata-driven service discovery; and provide a scalable Web Service mediation framework that is fundamentally based on the P2P approach in order to provide direct connectivity between service requesters and service providers, including means for configuration, composition and negotiation. SWWS has a large industrial advisory board with more than 60 members and is the nucleus of an initiative for a large integrated project within Framework VI of the research funding scheme of the European Commission. Semantic Web enabled Web Services are a key enabler for intelligent web services, which have revolutionary potential for many application areas such as eWork, eCommerce, eGovernment, and eLearning. We will sketch this potential during the talk. Project partners are the Vrije Universiteit Amsterdam (coordinator), NL; FZI, Germany; Isoco, Spain; Shinka Technologies, Germany; Ontotext, Bulgaria; and Hewlett-Packard (HP), UK.

Journal ArticleDOI
B. McBride1
TL;DR: The Jena toolkit is a Java application programming interface that is available as an open-source download from www.hpl.hp.com/semweb/jena-top.html.
Abstract: HP Labs developed the Jena toolkit to make it easier to develop applications that use the semantic Web information model and languages. Jena is a Java application programming interface that is available as an open-source download from www.hpl.hp.com/semweb/jena-top.html.
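Jena itself is a Java API, but its statement-centric model is easy to convey with a loose Python analogue: add (subject, predicate, object) statements, then query by pattern. The class and method names below are invented for this sketch and only echo, not reproduce, Jena's interface.

```python
# A minimal statement store in the spirit of Jena's Model:
# add triples, then list the ones matching a pattern
# (None acts as a wildcard).

class Model:
    def __init__(self):
        self.statements = set()

    def add(self, s, p, o):
        self.statements.add((s, p, o))

    def list_statements(self, s=None, p=None, o=None):
        return [t for t in self.statements
                if (s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)]

m = Model()
m.add("http://example.org/jena", "dc:creator", "HP Labs")
m.add("http://example.org/jena", "dc:title", "Jena toolkit")

for s, p, o in m.list_statements(p="dc:creator"):
    print(s, o)
```

The pattern-with-wildcards query is the workhorse operation of RDF stores; everything richer (RDQL in Jena's case) compiles down to combinations of it.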

Journal ArticleDOI
TL;DR: The paper presents the overall design of Annotea and describes some of the issues the project has faced and how it has solved them, including combining RDF with XPointer, XLink, and HTTP.

Proceedings ArticleDOI
07 May 2002
TL;DR: A new RDF query language called RQL is proposed, which is a typed functional language (a la OQL) and relies on a formal model for directed labeled graphs permitting the interpretation of superimposed resource descriptions by means of one or more RDF schemas.
Abstract: Real-scale Semantic Web applications, such as Knowledge Portals and E-Marketplaces, require the management of large volumes of metadata, i.e., information describing the available Web content and services. Better knowledge about their meaning, usage, accessibility or quality will considerably facilitate an automated processing of Web resources. The Resource Description Framework (RDF) enables the creation and exchange of metadata as normal Web data. Although voluminous RDF descriptions are already appearing, sufficiently expressive declarative languages for querying both RDF descriptions and schemas are still missing. In this paper, we propose a new RDF query language called RQL. It is a typed functional language (a la OQL) and relies on a formal model for directed labeled graphs permitting the interpretation of superimposed resource descriptions by means of one or more RDF schemas. RQL adapts the functionality of semistructured/XML query languages to the peculiarities of RDF but, foremost, it enables uniform querying of both resource descriptions and schemas. We illustrate the RQL syntax, semantics and typing system by means of a set of example queries and report on the performance of our persistent RDF Store employed by the RQL interpreter.
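RQL's distinguishing feature is querying data and schema together. The sketch below shows the idea in Python rather than actual RQL syntax: find every resource whose type is a given class or any subclass of it. The museum-flavored schema is invented for illustration (loosely echoing the cultural-portal examples common in RQL papers).

```python
# Schema: a subclass hierarchy. Data: resources and their types.
# The "query" walks the schema while filtering the data.

subclass_of = {"Painter": "Artist", "Sculptor": "Artist"}
types = {
    "http://ex.org/picasso": "Painter",
    "http://ex.org/rodin": "Sculptor",
    "http://ex.org/louvre": "Museum",
}

def is_subclass(cls, ancestor):
    while cls is not None:
        if cls == ancestor:
            return True
        cls = subclass_of.get(cls)
    return False

artists = sorted(r for r, c in types.items() if is_subclass(c, "Artist"))
print(artists)
```

A query language that sees only the data would miss Picasso and Rodin unless each were redundantly typed as Artist; interpreting descriptions against the schema removes that redundancy.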

Book
01 Oct 2002
TL;DR: The Semantic Web as discussed by the authors is a new type of hierarchy and standardization that will replace the current Web of links with a web of meaning using a flexible set of languages and tools.
Abstract: From the Publisher: As the World Wide Web continues to expand, it becomes increasingly difficult for users to obtain information efficiently. Because most search engines read format languages such as HTML or SGML, search results reflect formatting tags more than actual page content, which is expressed in natural language. Spinning the Semantic Web describes an exciting new type of hierarchy and standardization that will replace the current "web of links" with a "web of meaning." Using a flexible set of languages and tools, the Semantic Web will make all available information--display elements, metadata, services, images, and especially content--accessible. The result will be an immense repository of information accessible for a wide range of new applications. This first handbook for the Semantic Web covers, among other topics, software agents that can negotiate and collect information, markup languages that can tag many more types of information in a document, and knowledge systems that enable machines to read Web pages and determine their reliability. The truly interdisciplinary Semantic Web combines aspects of artificial intelligence, markup languages, natural language processing, information retrieval, knowledge representation, intelligent agents, and databases.

Book ChapterDOI
09 Jun 2002
TL;DR: This paper focuses on collaborative development of ontologies with OntoEdit which is guided by a comprehensive methodology.
Abstract: Ontologies now play an important role for enabling the semantic web. They provide a source of precisely defined terms e.g. for knowledge-intensive applications. The terms are used for concise communication across people and applications. Typically the development of ontologies involves collaborative efforts of multiple persons. OntoEdit is an ontology editor that integrates numerous aspects of ontology engineering. This paper focuses on collaborative development of ontologies with OntoEdit which is guided by a comprehensive methodology.

Proceedings ArticleDOI
08 Nov 2002
TL;DR: By explicitly representing the role of semantics in different components of the information retrieval process (people, interfaces, search systems, and information resources), the Semantic Geospatial Web will enable users to retrieve more precisely the data they need, based on the semantics associated with these data.
Abstract: With the growth of the World Wide Web has come the insight that currently available methods for finding and using information on the web are often insufficient. In order to move the Web from a data repository to an information resource, a totally new way of organizing information is needed. The advent of the Semantic Web promises better retrieval methods by incorporating the data's semantics and exploiting the semantics during the search process. Such a development needs special attention from the geospatial perspective so that the particularities of geospatial meaning are captured appropriately. The creation of the Semantic Geospatial Web needs the development of multiple spatial and terminological ontologies, each with a formal semantics; the representation of those semantics such that they are available both to machines for processing and to people for understanding; and the processing of geospatial queries against these ontologies and the evaluation of the retrieval results based on the match between the semantics of the expressed information need and the available semantics of the information resources and search systems. This will lead to a new framework for geospatial information retrieval based on the semantics of spatial and terminological ontologies. By explicitly representing the role of semantics in different components of the information retrieval process (people, interfaces, search systems, and information resources), the Semantic Geospatial Web will enable users to retrieve more precisely the data they need, based on the semantics associated with these data.

Book ChapterDOI
01 Oct 2002
TL;DR: MAFRA is presented, an interactive, incremental and dynamic framework for mapping distributed ontologies in the Semantic Web, and aims to balance the autonomy of each community with the need for interoperability.
Abstract: Ontologies as means for conceptualizing and structuring domain knowledge within a community of interest are seen as a key to realize the Semantic Web vision. However, the decentralized nature of the Web makes achieving this consensus across communities difficult, thus, hampering efficient knowledge sharing between them. In order to balance the autonomy of each community with the need for interoperability, mapping mechanisms between distributed ontologies in the Semantic Web are required. In this paper we present MAFRA, an interactive, incremental and dynamic framework for mapping distributed ontologies.

Book ChapterDOI
TL;DR: This work expands on previous work by showing how DAML-S Service Profiles, which describe service capabilities within DAML-S, can be mapped into UDDI records, providing therefore a way to record semantic information within UDDI records, and shows how this encoded information can be used within the UDDI registry to perform semantic matching.
Abstract: The web is moving from being a collection of pages toward a collection of services that interoperate through the Internet. A fundamental step toward this interoperation is the ability of automatically locating services on the basis of the functionalities that they provide. Such functionality would allow services to locate each other and automatically interoperate. Location of web services is inherently a semantic problem, because it has to abstract from the superficial differences between representations of the services provided, and the services requested, to recognize semantic similarities between the two. Current Web Services technology based on UDDI and WSDL does not make any use of semantic information and therefore fails to address the problem of matching between capabilities of services and allowing service location on the basis of what functionalities are sought, failing therefore to address the problem of locating web services. Nevertheless, previous work within DAML-S, a DAML-based language for service description, shows how ontological information collected through the semantic web can be used to match service capabilities. This work expands on previous work by showing how DAML-S Service Profiles, which describe service capabilities within DAML-S, can be mapped into UDDI records, providing therefore a way to record semantic information within UDDI records. Furthermore we show how this encoded information can be used within the UDDI registry to perform semantic matching.

Journal ArticleDOI
TL;DR: The goal is to help developers find the most suitable language for their representation needs: languages that can represent the semantic information the Semantic Web requires, solving heterogeneous data exchange in this heterogeneous environment.
Abstract: Ontologies have proven to be an essential element in many applications. They are used in agent systems, knowledge management systems, and e-commerce platforms. They can also generate natural language, integrate intelligent information, provide semantic-based access to the Internet, and extract information from texts in addition to being used in many other applications to explicitly declare the knowledge embedded in them. However, not only are ontologies useful for applications in which knowledge plays a key role, but they can also trigger a major change in current Web contents. This change is leading to the third generation of the Web, known as the Semantic Web, which has been defined as the conceptual structuring of the Web in an explicit machine-readable way. New ontology-based applications and knowledge architectures are developing for this new Web. A common claim for all of these approaches is the need for languages to represent the semantic information that this Web requires, solving heterogeneous data exchange in this heterogeneous environment. Our goal is to help developers find the most suitable language for their representation needs.

Book ChapterDOI
09 Jun 2002
TL;DR: This paper presents TRIPLE, a layered and modular rule language for the Semantic Web that is based on Horn logic and borrows many basic features from F-Logic but is especially designed for querying and transforming RDF models.
Abstract: This paper presents TRIPLE, a layered and modular rule language for the Semantic Web [1]. TRIPLE is based on Horn logic and borrows many basic features from F-Logic [11] but is especially designed for querying and transforming RDF models [20]. TRIPLE can be viewed as a successor of SiLRI (Simple Logic-based RDF Interpreter [5]). One of the most important differences to F-Logic and SiLRI is that TRIPLE does not have a fixed semantics for object-oriented features like classes and inheritance. Its layered architecture allows such features to be easily defined for different object-oriented and other data models like UML, Topic Maps, or RDF Schema [19]. Description logic extensions of RDF (Schema) like OIL [17] and DAML+OIL [3] that cannot be fully handled by Horn logic are provided as modules that interact with a description logic classifier, e.g. FaCT [9], resulting in a hybrid rule language. This paper sketches syntax and semantics of TRIPLE.
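The kind of inference a Horn-logic rule language performs over RDF can be shown with a toy forward-chainer. The rule, the facts, and the predicate names below are invented for this sketch and do not use TRIPLE's actual syntax.

```python
# Forward-chaining one Horn rule over RDF-style triples to a fixpoint:
#   worksFor(X, Y) & locatedIn(Y, Z) -> basedIn(X, Z)

facts = {
    ("alice", "worksFor", "acme"),
    ("acme", "locatedIn", "berlin"),
}

def apply_rule(facts):
    new = set()
    for (x, p1, y) in facts:
        if p1 != "worksFor":
            continue
        for (y2, p2, z) in facts:
            if p2 == "locatedIn" and y2 == y:
                new.add((x, "basedIn", z))
    return new

# Iterate until no new triples are derived.
while True:
    derived = apply_rule(facts) - facts
    if not derived:
        break
    facts |= derived

print(("alice", "basedIn", "berlin") in facts)  # True
```

A real engine compiles many such rules and evaluates them with proper unification rather than nested loops, but the fixpoint structure is the same.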


Journal ArticleDOI
TL;DR: In this paper, the authors focus on DAML's current markup language, DAML+OIL, which is a proposed starting point for the W3C's Semantic Web Activity's Ontology Web Language (OWL).
Abstract: By all measures, the Web is enormous and growing at a staggering rate, which has made it increasingly difficult, and important, for both people and programs to have quick and accurate access to Web information and services. The Semantic Web offers a solution, capturing and exploiting the meaning of terms to transform the Web from a platform that focuses on presenting information, to a platform that focuses on understanding and reasoning with information. To support Semantic Web development, the US Defense Advanced Research Projects Agency launched the DARPA Agent Markup Language (DAML) initiative to fund research in languages, tools, infrastructure, and applications that make Web content more accessible and understandable. Although the US government funds DAML, several organizations, including US and European businesses and universities, and international consortia such as the World Wide Web Consortium, have contributed to work on issues related to DAML's development and deployment. We focus on DAML's current markup language, DAML+OIL, which is a proposed starting point for the W3C's Semantic Web Activity's Ontology Web Language (OWL). We introduce DAML+OIL syntax and usage through a set of examples, drawn from a wine knowledge base used to teach novices how to build ontologies.

Book ChapterDOI
01 Oct 2002
TL;DR: With the help of Amilcare, OntoMat-Annotizer extracts knowledge structures from web pages using knowledge extraction rules, the result of a learning cycle based on already annotated pages.
Abstract: Richly interlinked, machine-understandable data constitute the basis for the Semantic Web. We provide a framework, S-CREAM, that allows for the creation of metadata and is trainable for a specific domain. Annotating web documents is one of the major techniques for creating metadata on the web. OntoMat-Annotizer, the implementation of S-CREAM, now supports the semi-automatic annotation of web pages. This semi-automatic annotation is based on the information extraction component Amilcare. With the help of Amilcare, OntoMat-Annotizer extracts knowledge structures from web pages using knowledge extraction rules. These rules are the result of a learning cycle based on already annotated pages.

Book ChapterDOI
01 Oct 2002
TL;DR: MnM is presented, an annotation tool which provides both automated and semi-automated support for annotating web pages with semantic contents; it integrates a web browser with an ontology editor and provides open APIs to link to ontology servers and to integrate information extraction tools.
Abstract: An important precondition for realizing the goal of a semantic web is the ability to annotate web resources with semantic information. In order to carry out this task, users need appropriate representation languages, ontologies, and support tools. In this paper we present MnM, an annotation tool which provides both automated and semi-automated support for annotating web pages with semantic contents. MnM integrates a web browser with an ontology editor and provides open APIs to link to ontology servers and for integrating information extraction tools. MnM can be seen as an early example of the next generation of ontology editors, being web-based, oriented to semantic markup and providing mechanisms for large-scale automatic markup of web pages.

Proceedings ArticleDOI
07 May 2002
TL;DR: By ranking words and phrases in the citing documents according to expected entropy loss, this work is able to accurately name clusters of web pages, even with very few positive examples.
Abstract: The structure of the web is increasingly being used to improve organization, search, and analysis of information on the web. For example, Google uses the text in citing documents (documents that link to the target document) for search. We analyze the relative utility of document text, and the text in citing documents near the citation, for classification and description. Results show that the text in citing documents, when available, often has greater discriminative and descriptive power than the text in the target document itself. The combination of evidence from a document and citing documents can improve on either information source alone. Moreover, by ranking words and phrases in the citing documents according to expected entropy loss, we are able to accurately name clusters of web pages, even with very few positive examples. Our results confirm, quantify, and extend previous research using web structure in these areas, introducing new methods for classification and description of pages.
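The "expected entropy loss" used to rank words is the classic information-gain computation: the class entropy of the document set minus the expected entropy after splitting on whether a word appears in the citing text. The sketch below shows that computation; the document counts in the example are invented for illustration, not taken from the paper.

```python
# Rank a word by expected entropy loss (information gain) over a set of
# positive/negative citing documents, split on whether the word appears.

import math

def entropy(pos, neg):
    """Binary class entropy of a set with `pos` positive, `neg` negative docs."""
    total = pos + neg
    if total == 0:
        return 0.0
    h = 0.0
    for n in (pos, neg):
        if n:
            p = n / total
            h -= p * math.log2(p)
    return h

def expected_entropy_loss(pos_with, neg_with, pos_without, neg_without):
    """Prior entropy minus the expected entropy after splitting on the word."""
    total = pos_with + neg_with + pos_without + neg_without
    prior = entropy(pos_with + pos_without, neg_with + neg_without)
    w = (pos_with + neg_with) / total  # fraction of docs containing the word
    return prior - (w * entropy(pos_with, neg_with)
                    + (1 - w) * entropy(pos_without, neg_without))

# Hypothetical counts: a word occurring in 8 of 10 positive citing docs
# but only 1 of 10 negative ones scores high; ties score zero.
gain = expected_entropy_loss(8, 1, 2, 9)
print(round(gain, 3))
```

Words and phrases with the highest gain are both discriminative (good classification features) and descriptive (good cluster names), which is why the same ranking serves both tasks in the paper.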

Journal Article
TL;DR: DAML+OIL is an ontology language specifically designed for use on the Web; it exploits existing Web standards (XML and RDF), adding the familiar ontological primitives of object-oriented and frame-based systems, and the formal rigor of a very expressive description logic.
Abstract: Ontologies are set to play a key role in the “Semantic Web”, extending syntactic interoperability to semantic interoperability by providing a source of shared and precisely defined terms. DAML+OIL is an ontology language specifically designed for use on the Web; it exploits existing Web standards (XML and RDF), adding the familiar ontological primitives of object-oriented and frame-based systems, and the formal rigor of a very expressive description logic. The logical basis of the language means that reasoning services can be provided, both to support ontology design and to make DAML+OIL described Web resources more accessible to automated processes.

Book ChapterDOI
02 Sep 2002
TL;DR: KAON, the Karlsruhe Ontology and Semantic Web Tool Suite, is introduced; it is specifically designed to provide the ontology and metadata infrastructure needed for building, using, and accessing semantics-driven applications on the Web and on the desktop.
Abstract: The Semantic Web will bring structure to the content of Web pages, being an extension of the current Web in which information is given a well-defined meaning. Especially within e-commerce applications, Semantic Web technologies in the form of ontologies and metadata are becoming increasingly prevalent and important. This paper introduces KAON, the Karlsruhe Ontology and Semantic Web Tool Suite. KAON is developed jointly within several EU-funded projects and is specifically designed to provide the ontology and metadata infrastructure needed for building, using, and accessing semantics-driven applications on the Web and on the desktop.

Proceedings ArticleDOI
07 May 2002
TL;DR: This work provides a framework, CREAM, that allows for the creation of metadata, and describes its implementation, viz. Ont-O-Mat, a component-based, ontology-driven Web page authoring and annotation tool.
Abstract: Richly interlinked, machine-understandable data constitute the basis for the Semantic Web. We provide a framework, CREAM, that allows for the creation of metadata. While the annotation mode of CREAM allows users to create metadata for existing web pages, the authoring mode lets authors create metadata, almost for free, while putting together the content of a page. As a particularity of our framework, CREAM allows the creation of relational metadata, i.e. metadata that instantiate interrelated definitions of classes in a domain ontology rather than a comparatively rigid, template-like schema such as Dublin Core. We discuss some of the requirements one has to meet when developing such an ontology-based framework, e.g. the integration of a metadata crawler, inference services, document management and a meta-ontology, and describe its implementation, viz. Ont-O-Mat, a component-based, ontology-driven Web page authoring and annotation tool.
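The contrast the abstract draws between template-like and relational metadata can be sketched in a few lines. Below, a flat Dublin-Core-style record sits next to relational metadata whose typed instances refer to one another, so queries can follow links between them; all identifiers and values here are invented examples, not data from the paper.

```python
# Template-like metadata: one flat record per page, fixed fields.
dublin_core_style = {
    "dc:title": "Hotel Zermatt home page",
    "dc:creator": "A. Author",
}

# Relational metadata: typed instances of ontology classes that
# reference each other, rather than a flat field/value template.
relational_metadata = {
    "ex:hotel1": {"rdf:type": "Hotel", "locatedIn": "ex:city1"},
    "ex:city1":  {"rdf:type": "City", "name": "Zermatt"},
}

# A query can traverse the links between instances:
hotel = relational_metadata["ex:hotel1"]
city = relational_metadata[hotel["locatedIn"]]
print(city["name"])  # → Zermatt
```

It is this link-following structure, impossible to express in a flat template, that makes relational metadata useful for inference services and domain ontologies.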

Book ChapterDOI
09 Jun 2002
TL;DR: An overview of where the two areas meet today is given, and ways in which a closer integration could be profitable are sketched.
Abstract: Semantic Web Mining aims at combining the two fast-developing research areas Semantic Web and Web Mining. The idea is to improve, on the one hand, the results of Web Mining by exploiting the new semantic structures in the Web; and to make use of Web Mining, on the other hand, for building up the Semantic Web. This paper gives an overview of where the two areas meet today, and sketches ways in which a closer integration could be profitable.