
Showing papers on "Semantic Web" published in 2000


Journal ArticleDOI
01 Jun 2000
TL;DR: It is found that while successful search performance requires the combination of the two types of expertise, specific strategies directly related to Web experience or domain knowledge can be identified.
Abstract: Searching for relevant information on the World Wide Web is often a laborious and frustrating task for casual and experienced users. To help improve searching on the Web based on a better understanding of user characteristics, we investigate what types of knowledge are relevant for Web-based information seeking, and which knowledge structures and strategies are involved. Two experimental studies are presented, which address these questions from different angles and with different methodologies. In the first experiment, 12 established Internet experts are first interviewed about search strategies and then perform a series of realistic search tasks on the World Wide Web. From this study a model of information seeking on the World Wide Web is derived and then tested in a second study. In the second experiment two types of potentially relevant types of knowledge are compared directly. Effects of Web experience and domain-specific background knowledge are investigated with a series of search tasks in an economics-related domain (introduction of the Euro currency). We find differential and combined effects of both Web experience and domain knowledge: while successful search performance requires the combination of the two types of expertise, specific strategies directly related to Web experience or domain knowledge can be identified.

721 citations


Journal ArticleDOI
TL;DR: The goal of the research described here is to automatically create a computer-understandable knowledge base whose content mirrors that of the World Wide Web. Several machine learning algorithms for this task are described, along with promising initial results from a prototype system that has created a knowledge base describing university people, courses, and research projects.

473 citations


Journal ArticleDOI
TL;DR: This paper considers how the semantic Web will provide intelligent access to heterogeneous and distributed information, enabling software products (agents) to mediate between user needs and available information sources.
Abstract: The Web has drastically changed the availability of electronic information, but its success and exponential growth have made it increasingly difficult to find, access, present and maintain such information for a wide variety of users. In reaction to this bottleneck many new research initiatives and commercial enterprises have been set up to enrich available information with machine-processable semantics. The paper considers how the semantic Web will provide intelligent access to heterogeneous and distributed information, enabling software products (agents) to mediate between user needs and available information sources. The paper discusses the Resource Description Framework, XML and other languages.

455 citations


Book
31 Aug 2000
TL;DR: This chapter discusses the structure and Dynamics of Organizational Knowledge, and the role of the Intranet as Infrastructure for Knowledge Work, in the context of knowledge work on the World Wide Web.
Abstract: Section I: Information Seeking and Knowledge Work. 1. Information Seeking. 2. The Structure and Dynamics of Organizational Knowledge. Section II: Knowledge Work on Intranets. 3. The Intranet as Infrastructure for Knowledge Work. 4. Designing Intranets to Support Knowledge Work. Section III: Information Seeking on the World Wide Web. 5. Models of Information Seeking on the World Wide Web. 6. Understanding Organizational Web Use. Coda. References. Index.

291 citations


Book
01 Jan 2000
TL;DR: This book presents a meta-anatomy of the Web, aiming to explain the Web in the context of 21st-century society and to investigate its role in promoting human rights and democracy.
Abstract: 1. Enquire within upon everything 2. Tangles, links and webs 3. info.cern.ch 4. Protocols: Simple rules for global systems 5. Going global 6. Browsing 7. Changes 8. Consortium 9. Competition and consensus 10. Web of people 11. Privacy 12. Mind to mind 13. Machines and the web 14. Weaving the web

240 citations


Journal ArticleDOI
01 Jun 2000
TL;DR: How a comprehensive and flexible strategy for building and maintaining a high-value community Web portal has been conceived and implemented based on an ontology as a semantic backbone for accessing information on the portal, for contributing information, as well as for developing and maintaining the portal is discussed.
Abstract: Community Web portals serve the information needs of particular communities on the Web. We here discuss how a comprehensive and flexible strategy for building and maintaining a high-value community Web portal has been conceived and implemented. The strategy includes collaborative information provisioning by the community members. It is based on an ontology as a semantic backbone for accessing information on the portal, for contributing information, as well as for developing and maintaining the portal. We have also implemented a set of ontology-based tools that have facilitated the construction of our showcase: the community Web portal of the knowledge acquisition community.

226 citations


01 Jan 2000
TL;DR: This paper presents a comprehensive architecture and generic method for discovering a domain-tailored ontology from given intranet resources, and describes the authors' ongoing work in supporting semi-automatic ontology acquisition from the corporate intranet of an insurance company.
Abstract: Focused access to knowledge resources such as intranet documents plays a vital role in knowledge management and supports, more generally, the shift towards a Semantic Web. Ontologies act as a conceptual backbone for semantic document access by providing a common understanding and conceptualization of a domain. Building domain-specific ontologies is a time-consuming and expensive manual construction task. This paper describes our current and ongoing work in supporting semi-automatic ontology acquisition from the corporate intranet of an insurance company. We present a comprehensive architecture and generic method for discovering a domain-tailored ontology from given intranet resources.
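To make the semi-automatic acquisition step concrete, the first pass of such a pipeline can be pictured as extracting frequent terms from intranet documents as concept candidates and proposing co-occurring pairs as relation candidates for the ontology engineer to review. The following is only a minimal sketch of that idea; the toy corpus, stop-word list and thresholds are illustrative assumptions, not the method used in the paper.

    import re
    from collections import Counter
    from itertools import combinations

    def concept_candidates(documents, min_freq=2):
        """Rough concept-candidate extraction: frequent lower-cased tokens that
        are not stop words. Real systems use POS tagging and NP chunking."""
        stop = {"the", "a", "an", "of", "and", "for", "is", "in", "to", "on"}
        counts = Counter()
        for doc in documents:
            tokens = re.findall(r"[a-z]{3,}", doc.lower())
            counts.update(t for t in tokens if t not in stop)
        return {t for t, c in counts.items() if c >= min_freq}

    def cooccurrence_pairs(documents, candidates):
        """Candidate term pairs appearing in the same document; such pairs are
        offered to the ontology engineer as possible relations."""
        pairs = Counter()
        for doc in documents:
            present = sorted(t for t in candidates if t in doc.lower())
            pairs.update(combinations(present, 2))
        return pairs

    docs = [
        "The insurance policy covers damage to the insured vehicle.",
        "A policy defines the premium paid by the customer.",
        "The customer reports vehicle damage to the insurer.",
    ]
    terms = concept_candidates(docs)
    print(sorted(terms))
    print(cooccurrence_pairs(docs, terms).most_common(3))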

198 citations


ReportDOI
01 Jan 2000
TL;DR: The SHOE language is presented, which the authors feel has many of the features necessary to enable a semantic web, and an existing set of tools that make it easy to use the language are described.
Abstract: XML will have a profound impact on the way data is exchanged on the Internet. An important feature of this language is the separation of content from presentation, which makes it easier to select and/or reformat the data. However, due to the likelihood of numerous industry- and domain-specific DTDs, those who wish to integrate information will still be faced with the problem of semantic interoperability. In this paper we discuss why this problem is not solved by XML, and then discuss why the Resource Description Framework is only a partial solution. We then present the SHOE language, which we feel has many of the features necessary to enable a semantic web, and describe an existing set of tools that make it easy to use the language. (Jeff Heflin and James Hendler, University of Maryland)
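The semantic interoperability problem described above can be illustrated with a tiny example: two DTDs that tag the same fact differently can only be integrated once an explicit mapping to shared concepts is supplied. The sketch below is a hand-written mapping in Python; the element names and the target vocabulary are invented for illustration and are not taken from SHOE.

    import xml.etree.ElementTree as ET

    # The same fact encoded under two hypothetical industry DTDs.
    doc_a = "<staff><lecturer name='J. Smith' dept='CS'/></staff>"
    doc_b = "<people><faculty fullName='J. Smith' department='CS'/></people>"

    # The tags alone do not say that 'lecturer' and 'faculty' denote the same
    # concept; the mapping has to be supplied by hand (or, as argued in the
    # paper, by a semantic layer such as SHOE on top of the markup).
    MAPPING = {
        "lecturer": ("Professor", {"name": "name", "dept": "affiliation"}),
        "faculty":  ("Professor", {"fullName": "name", "department": "affiliation"}),
    }

    def normalize(xml_text):
        facts = []
        for elem in ET.fromstring(xml_text).iter():
            if elem.tag in MAPPING:
                concept, attr_map = MAPPING[elem.tag]
                facts.append((concept, {attr_map[k]: v for k, v in elem.attrib.items()}))
        return facts

    print(normalize(doc_a))
    print(normalize(doc_b))  # both sources yield the same normalized fact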

166 citations


Journal ArticleDOI
16 May 2000
TL;DR: The case for identifying replicated documents and collections to improve web crawlers, archivers, and ranking functions used in search engines is made.
Abstract: Many web documents (such as JAVA FAQs) are being replicated on the Internet. Often entire document collections (such as hyperlinked Linux manuals) are being replicated many times. In this paper, we make the case for identifying replicated documents and collections to improve web crawlers, archivers, and ranking functions used in search engines. The paper describes how to efficiently identify replicated documents and hyperlinked document collections. The challenge is to identify these replicas from an input data set of several tens of millions of web pages and several hundreds of gigabytes of textual data. We also present two real-life case studies where we used replication information to improve a crawler and a search engine. We report these results for a data set of 25 million web pages (about 150 gigabytes of HTML data) crawled from the web.
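A common building block for this kind of replica detection is content fingerprinting: hash a normalized version of each page and group pages whose fingerprints collide. The sketch below shows only that exact-duplicate case with invented URLs; the paper's algorithms for near-replicas and whole hyperlinked collections are considerably more involved.

    import hashlib
    from collections import defaultdict

    def fingerprint(text):
        """Whitespace-normalized, lower-cased MD5 digest of the page body."""
        canonical = " ".join(text.split()).lower()
        return hashlib.md5(canonical.encode("utf-8")).hexdigest()

    def find_replicas(pages):
        """Group URLs whose content hashes to the same fingerprint."""
        groups = defaultdict(list)
        for url, body in pages.items():
            groups[fingerprint(body)].append(url)
        return [urls for urls in groups.values() if len(urls) > 1]

    pages = {
        "http://a.example/java-faq": "Q: What is the JVM?  A: ...",
        "http://b.example/mirror/java-faq": "Q: What is the JVM? A: ...",
        "http://c.example/linux-howto": "Chapter 1: Installation ...",
    }
    print(find_replicas(pages))  # the two FAQ copies fall into one group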

145 citations


Journal ArticleDOI
TL;DR: The Resource Description Framework (RDF) provides a data model that supports fast integration of data sources by bridging semantic differences and can be used as a general framework for data exchange on the Web.
Abstract: The current World Wide Web supports mainly human browsing and searching of textual content. This model has become less and less adequate as the mass of available information increases. What is required instead is a model that supports integrated and uniform access to information sources and services as well as intelligent applications for information processing on the Web. Such a model requires standard mechanisms for interchanging data and handling different data semantics. The Resource Description Framework (RDF) is a step in this direction. RDF provides a data model that supports fast integration of data sources by bridging semantic differences. It is often used (and was initially designed) for representing metadata about other Web resources, such as XML files. However, representing metadata about the Web is not different from representing data generally. Thus, RDF can be used as a general framework for data exchange on the Web.
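The integration argument rests on RDF's uniform data model: every source reduces to subject-predicate-object triples, so merging is mechanical. A minimal sketch with plain Python tuples follows; the URIs are invented for illustration, and a real application would use an RDF toolkit rather than raw tuples.

    # Each statement is a (subject, predicate, object) triple.
    source_a = {
        ("urn:doc:42", "http://example.org/terms#title", "Weaving the Web"),
        ("urn:doc:42", "http://example.org/terms#creator", "urn:person:tbl"),
    }
    source_b = {
        ("urn:person:tbl", "http://example.org/terms#name", "Tim Berners-Lee"),
    }

    # Integration is simply set union, because both sources share one model.
    graph = source_a | source_b

    def objects(graph, subject, predicate):
        return [o for s, p, o in graph if s == subject and p == predicate]

    # Follow the creator link across the two merged sources.
    for creator in objects(graph, "urn:doc:42", "http://example.org/terms#creator"):
        print(objects(graph, creator, "http://example.org/terms#name"))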

Journal Article
TL;DR: Support in data, information, and knowledge exchange is the key issue in current computer technology and is essential for “bringing the web to its full potential” in areas such as knowledge management and electronic commerce.
Abstract: Currently computers are changing from single isolated devices to entry points in a worldwide network of information exchange and business transactions called the World Wide Web (WWW). Therefore support in data, information, and knowledge exchange becomes the key issue in current computer technology. The WWW has drastically changed the availability of electronically available information. However, this success and exponential growth make it increasingly difficult to find, to access, to present, and to maintain the information of use to a wide variety of users. In reaction to this bottleneck many new research initiatives and commercial enterprises have been set up to enrich available information with machine-processable semantics. Such support is essential for “bringing the web to its full potential” in areas such as knowledge management and electronic commerce.

Book ChapterDOI
14 Aug 2000
TL;DR: The paper presents a mapping of RDF into conceptual graphs (CG) and its interest in the context of the semantic Web; the approach is to exploit a standard language for expressing metadata and to interpret these metadata as conceptual graphs in order to exploit the querying and inferencing capabilities enabled by the CG formalism.
Abstract: With the aim of building a "Semantic Web", the content of documents must be explicitly represented through metadata in order to enable content-guided search. Our approach is to exploit a standard language (RDF, recommended by the W3C) for expressing such metadata and to interpret these metadata in conceptual graphs (CG) in order to exploit the querying and inferencing capabilities enabled by the CG formalism. The paper presents our mapping of RDF into CG and its interest in the context of the semantic Web.
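The core of such a mapping is that an RDF statement (subject, property, object) corresponds to two CG concept nodes joined by a relation node. The sketch below shows only that correspondence; the node classes, typing table and example metadata are invented for illustration, and the paper's mapping covers many more RDF constructs.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Concept:          # CG concept node, e.g. [Document: doc1]
        type: str
        referent: str

    @dataclass(frozen=True)
    class Relation:         # CG relation node linking two concepts, e.g. (author)
        name: str
        source: Concept
        target: Concept

    def rdf_to_cg(triples, typing):
        """Map RDF triples to CG relations; 'typing' assigns each resource a concept type."""
        def concept(resource):
            return Concept(typing.get(resource, "Entity"), resource)
        return [Relation(p, concept(s), concept(o)) for s, p, o in triples]

    triples = [("doc1", "author", "person1"), ("doc1", "theme", "SemanticWeb")]
    typing = {"doc1": "Document", "person1": "Person", "SemanticWeb": "Topic"}
    for rel in rdf_to_cg(triples, typing):
        print(f"[{rel.source.type}: {rel.source.referent}] -({rel.name})-> "
              f"[{rel.target.type}: {rel.target.referent}]")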

Proceedings ArticleDOI
06 Nov 2000
TL;DR: A way to use a user’s personal arrangement of concepts to navigate the Web is explored, building on the characterizations created by the OBIWAN system; the mapping of the reference ontology to the personal ontology is shown to have a promising level of correctness and precision.
Abstract: The publicly indexable Web contains an estimated 800 million pages; however, the largest search engine is estimated to index only 300 million of them. As the number of Internet users and the number of accessible Web pages grow, it is becoming increasingly difficult for users to find documents that are relevant to their particular needs. Often users must browse through a large hierarchy of categories to find the information for which they are looking. To provide the user with the most useful information in the least amount of time, we need a system that uses each user’s view of the world for classification. This paper explores a way to use a user’s personal arrangement of concepts to navigate the Web. This system is built by using the characterizations for a particular site created by the Ontology Based Informing Web Agent Navigation (OBIWAN) system and mapping from them to the user’s personal ontologies. OBIWAN allows users to explore multiple sites via the same browsing hierarchy. This paper extends OBIWAN to allow users to explore multiple sites via their own browsing hierarchy. The mapping of the reference ontology to the personal ontology is shown to have a promising level of correctness and precision.
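The mapping from the reference ontology to a personal ontology can be approximated by comparing the characterizations of each category, for instance by keyword overlap. The toy sketch below uses Jaccard similarity with invented categories and an arbitrary threshold; it is not OBIWAN's actual mapping algorithm.

    def jaccard(a, b):
        a, b = set(a), set(b)
        return len(a & b) / len(a | b) if a | b else 0.0

    def map_categories(reference, personal, threshold=0.25):
        """For each reference category, pick the best-matching personal category."""
        mapping = {}
        for ref_name, ref_terms in reference.items():
            best_name, best_terms = max(personal.items(),
                                        key=lambda kv: jaccard(ref_terms, kv[1]))
            if jaccard(ref_terms, best_terms) >= threshold:
                mapping[ref_name] = best_name
        return mapping

    reference = {"Computers/AI": ["agent", "ontology", "learning", "knowledge"],
                 "Recreation/Travel": ["hotel", "flight", "tour"]}
    personal = {"My research": ["ontology", "agent", "semantic", "knowledge"],
                "Holidays": ["flight", "beach", "hotel"]}
    print(map_categories(reference, personal))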

01 Sep 2000
TL;DR: A layered approach to interoperability of information models is suggested that borrows from layered software structuring techniques used in today's internetworking, along with the key features of the object layer, such as identity and binary relationships, basic typing, reification, ordering, and n-ary relationships.
Abstract: On the Semantic Web, the target audience is machines rather than humans. To satisfy the demands of this audience, information needs to be available in machine-processable form rather than as unstructured text. A variety of information models like RDF or UML are available to fulfil this purpose, varying greatly in their capabilities. The advent of XML leveraged a promising consensus on the encoding syntax for machine-processable information. However, interoperating between different information models on a syntactic level proved to be a laborious task. In this paper, we suggest a layered approach to interoperability of information models that borrows from layered software structuring techniques used in today's internetworking. We identify the object layer that fills the gap between the syntax and semantic layers and examine it in detail. We suggest the key features of the object layer, such as identity and binary relationships, basic typing, reification, ordering, and n-ary relationships. Finally, we examine design issues and implementation alternatives involved in building the object layer.
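The object-layer features named above (identity, binary relationships, reification, n-ary relationships) can be pictured independently of any particular syntax. The sketch below is purely illustrative of the idea that statements get identities of their own so that other statements can refer to them; it does not reproduce the paper's design.

    import itertools

    _ids = itertools.count(1)

    class Statement:
        """A binary relationship with its own identity, so it can be reified:
        other statements may use it as subject or object."""
        def __init__(self, subject, predicate, obj):
            self.id = f"stmt{next(_ids)}"
            self.subject, self.predicate, self.object = subject, predicate, obj
        def __repr__(self):
            return f"{self.id}: ({self.subject!r} {self.predicate} {self.object!r})"

    author = Statement("doc1", "author", "alice")
    # Reification: a statement about the statement itself.
    provenance = Statement(author, "statedBy", "crawler-17")

    # An n-ary relationship decomposed into binary statements about one node.
    purchase = "purchase-9"
    nary = [Statement(purchase, "buyer", "bob"),
            Statement(purchase, "item", "book-3"),
            Statement(purchase, "price", 20)]

    print(author, provenance, *nary, sep="\n")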

Journal ArticleDOI
TL;DR: An approach to document enrichment is presented, which consists of developing and integrating formal knowledge models with archives of documents, to provide intelligent knowledge retrieval and (possibly) additional knowledge-intensive services, beyond what is currently available using “standard” information retrieval and search facilities.
Abstract: In this paper, we present an approach to document enrichment, which consists of developing and integrating formal knowledge models with archives of documents, to provide intelligent knowledge retrieval and (possibly) additional knowledge-intensive services, beyond what is currently available using “standard” information retrieval and search facilities. Our approach is ontology-driven, in the sense that the construction of the knowledge model is carried out in a top-down fashion, by populating a given ontology, rather than in a bottom-up fashion, by annotating a particular document. In this paper, we give an overview of the approach and we examine the various types of issues (e.g. modelling, organizational and user interface issues) which need to be tackled to effectively deploy our approach in the workplace. In addition, we also discuss a number of technologies we have developed to support ontology-driven document enrichment and we illustrate our ideas in the domains of electronic news publishing, scholarly discourse and medical guidelines.

Proceedings Article
01 Jan 2000
TL;DR: The OIL language extends the RDF schema standard to provide just such a layer, which combines the most attractive features of frame based languages with the expressive power, formal rigour and reasoning services of a very expressive description logic.
Abstract: Exploiting the full potential of the World Wide Web will require semantic as well as syntactic interoperability. This can best be achieved by providing a further representation and inference layer that builds on existing and proposed web standards. The OIL language extends the RDF schema standard to provide just such a layer. It combines the most attractive features of frame based languages with the expressive power, formal rigour and reasoning services of a very expressive description logic.
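The flavour of what a frame-plus-description-logic layer adds on top of RDF Schema can be hinted at with class definitions carrying slot constraints and a subsumption check over the taxonomy. The sketch below is a naive illustration with invented class names; OIL's actual semantics come from an expressive description logic and require a dedicated reasoner, not this kind of graph walk.

    from dataclasses import dataclass, field

    @dataclass
    class ClassDef:
        name: str
        superclasses: set = field(default_factory=set)
        slot_ranges: dict = field(default_factory=dict)  # frame-style slot constraints

    ontology = {
        "Publication": ClassDef("Publication"),
        "Article": ClassDef("Article", {"Publication"}, {"hasTopic": "Topic"}),
        "SWArticle": ClassDef("SWArticle", {"Article"}, {"hasTopic": "SemanticWebTopic"}),
    }

    def subsumes(ontology, general, specific):
        """Naive taxonomic subsumption: walk the superclass links upward."""
        seen, frontier = set(), {specific}
        while frontier:
            cls = frontier.pop()
            if cls == general:
                return True
            seen.add(cls)
            frontier |= ontology[cls].superclasses - seen
        return False

    print(subsumes(ontology, "Publication", "SWArticle"))  # True
    print(subsumes(ontology, "SWArticle", "Publication"))  # False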

01 Jan 2000
TL;DR: Results of association rules, propositional and relational learning are provided, which demonstrate that data mining can help improve the authors' extractors, and that using information from two kinds of sources improves the reliability of data-mined rules.
Abstract: Information extractors and classifiers operating on unrestricted, unstructured texts are an errorful source of large amounts of potentially useful information, especially when combined with a crawler which automatically augments the knowledge base from the World Wide Web. At the same time, there is much structured information on the World Wide Web. Wrapping the web sites which provide this kind of information gives us a second source of information: possibly less up-to-date, but reliable as facts. We give a case study of combining information from these two kinds of sources in the context of learning facts about companies. We provide results of association rules, propositional and relational learning, which demonstrate that data mining can help us improve our extractors, and that using information from two kinds of sources improves the reliability of data-mined rules.

01 Jan 2000
TL;DR: A conceptual framework for enriching Web links by displaying small, information-rich visualizations—pop-up views—that provide the user with information about linked pages that can be used to evaluate the appropriateness of the pages before making a commitment to select the link and wait for the page to load.
Abstract: We describe a conceptual framework for enriching Web links by displaying small, information-rich visualizations—pop-up views—that provide the user with information about linked pages that can be used to evaluate the appropriateness of the pages before making a commitment to select the link and wait for the page to load. Examples of how the enriched links framework could be applied in contexts, such as e-commerce catalog pages, search results for a video repository, and desktop icons, are also presented.

Journal ArticleDOI
TL;DR: Analysis of the Web pages retrieved by the major search engines on a particular date, as a result of the query “informetrics OR informetric,” indicates that valuable, freely available data is hidden in the Web waiting to be extracted from the millions of Web pages.
Abstract: This article addresses the question of whether the Web can serve as an information source for research. Specifically, it analyzes by way of content analysis the Web pages retrieved by the major search engines on a particular date (June 7, 1998), as a result of the query “informetrics OR informetric.” In 807 out of the 942 retrieved pages, the search terms were mentioned in the context of information science. Over 70% of the pages contained only indirect information on the topic, in the form of hypertext links and bibliographical references without annotation. The bibliographical references extracted from the Web pages were analyzed, and lists of most productive authors, most cited authors, works, and sources were compiled. The list of references obtained from the Web was also compared to data retrieved from commercial databases. For most cases, the list of references extracted from the Web outperformed the commercial, bibliographic databases. The results of these comparisons indicate that valuable, freely available data is hidden in the Web waiting to be extracted from the millions of Web pages.


Proceedings Article
30 Oct 2000
TL;DR: The paper shows the assets of an approach that combines XML technology designed for the Web with multi-agent systems, using the heterogeneity and distribution of the multi-agent system as a solution to the heterogeneity and distribution of the corporate memory.
Abstract: A corporate memory and the World Wide Web have in common that they are both heterogeneous and distributed information landscapes. They also share the same problem of relevance of results when one wants to search them. However, compared to the Web, a corporate memory has a delimited and better defined context, infrastructure and scope: the corporation. Taking into account the characteristics of a corporate memory, we show in this paper the assets of an approach combining XML technology designed for the Web and the distributed nature of multi-agent systems. In particular, we consider the heterogeneity and distribution of the multi-agent system as a solution to the heterogeneity and the distribution of the corporate memory.

Journal ArticleDOI
TL;DR: Ontobroker applies Artificial Intelligence techniques to improve access to heterogeneous, distributed and semi-structured information sources as they are presented in the World Wide Web or organization-wide intranets, relying on ontologies to annotate web pages, formulate queries and derive answers.
Abstract: Ontobroker applies Artificial Intelligence techniques to improve access to heterogeneous, distributed and semi-structured information sources as they are presented in the World Wide Web or organization-wide intranets. It relies on the use of ontologies to annotate web pages, formulate queries and derive answers. In this paper we will briefly sketch Ontobroker. Then we will discuss its main shortcomings, i.e. we will share the lessons we learned from our exercise. We will also show how On2broker overcomes these limitations. Most important is the separation of the query and inference engines and the integration of new web standards like XML and RDF.

Journal ArticleDOI
01 Jun 2000
TL;DR: The paper describes the OHIF format and demonstrates how the Webvise system handles OHIF, and argues for better support for handling user controlled meta data, e.g. support for linking in non-XML data, integration of external linking in the Web infrastructure, and collaboration support for external structures and meta-data.
Abstract: This paper introduces an approach to utilise open hypermedia structures such as links, annotations, collections and guided tours as metadata for Web resources. The paper introduces an XML-based data format, called Open Hypermedia Interchange Format (OHIF), for such hypermedia structures. OHIF resembles XLink with respect to its representation of out-of-line links, but it goes beyond XLink with a richer set of structuring mechanisms, including e.g. composites. Moreover, OHIF includes an addressing mechanism (LocSpecs) that goes beyond XPointer and URL in its ability to locate non-XML data segments. By means of the Webvise system, OHIF structures can be authored, imposed on Web pages, and finally linked on the Web as any ordinary Web resource. Following a link to an OHIF file automatically invokes a Webvise download of the metadata structures, and the annotated Web content will be displayed in the browser. Moreover, the Webvise system provides support for users to create, manipulate, and share the OHIF structures together with custom-made Web pages and MS Office 2000 documents on WebDAV servers. These Webvise facilities go beyond earlier open hypermedia systems in that they now allow fully distributed open hypermedia linking between Web pages and WebDAV-aware desktop applications. The paper describes the OHIF format and demonstrates how the Webvise system handles OHIF. Finally, it argues for better support for handling user-controlled metadata, e.g. support for linking in non-XML data, integration of external linking in the Web infrastructure, and collaboration support for external structures and metadata.
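The key idea of out-of-line linking, which OHIF shares with XLink, is that link and annotation structures live in a separate document and merely address spans inside the resources they connect. The data structure below is a rough illustration of that idea in Python; the field names are invented and do not reproduce the actual OHIF schema or LocSpec syntax.

    # An out-of-line link: stored apart from the two Web resources it connects.
    annotation = {
        "type": "annotation-link",
        "endpoints": [
            {   # a LocSpec-like address into a non-XML resource
                "resource": "http://example.org/report.doc",
                "locate": {"kind": "text-span", "start": 120, "length": 34},
            },
            {
                "resource": "http://example.org/comments/15.html",
                "locate": {"kind": "element-id", "value": "c15"},
            },
        ],
        "metadata": {"author": "editor-1", "created": "2000-06-01"},
    }

    def targets(link):
        """URLs an open hypermedia viewer would decorate when showing this link."""
        return [ep["resource"] for ep in link["endpoints"]]

    print(targets(annotation))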

Proceedings Article
01 Jan 2000
TL;DR: This paper introduces a label discovery algorithm that uses the hierarchical structure extracted from web pages to discover similar labels which describe the same kind of information.
Abstract: Many Web documents containing the same type of information would have a similar structure. In this paper, we examine the problem of finding the structure of web documents and present a hierarchical structure to represent the relation among text data in the web documents. Due to the loose standards of web page publishing, different authors can use different wordings (labels) to label the same information. We introduce a label discovery algorithm that uses the hierarchical structure extracted from the web pages. The algorithm discovers similar labels which describe the same kind of information. Such labels would help us find the structure of the web documents. Experiments have shown that the algorithm can successfully discover similar labels and that the structure obtained by our method can distinguish web pages accurately.

Proceedings ArticleDOI
03 Oct 2000
TL;DR: A method to extract descriptions of technical terms from Web pages in order to utilize the World Wide Web as an encyclopedia by using linguistic patterns and HTML text structures and a clustering method to summarize resultant descriptions.
Abstract: In this paper, we propose a method to extract descriptions of technical terms from Web pages in order to utilize the World Wide Web as an encyclopedia. We use linguistic patterns and HTML text structures to extract text fragments containing term descriptions. We also use a language model to discard extraneous descriptions, and a clustering method to summarize resultant descriptions. We show the effectiveness of our method by way of experiments.
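The linguistic-pattern step can be pictured as matching copular, definition-like sentences in page text. The single regular expression below stands in for the paper's richer patterns and HTML-structure cues and is only a simplified sketch with made-up example sentences.

    import re

    DEFINITION = re.compile(
        r"(?P<term>[A-Z][\w\- ]{2,40}?) is (a|an) (?P<gloss>[^.]{10,200})\."
    )

    def extract_descriptions(text):
        """Return (term, description) pairs found by the definitional pattern."""
        return [(m.group("term").strip(), m.group("gloss").strip())
                for m in DEFINITION.finditer(text)]

    page = ("RDF is a framework for describing resources on the Web. "
            "We installed the server yesterday. "
            "SHOE is an extension of HTML for semantic annotation.")
    for term, gloss in extract_descriptions(page):
        print(f"{term}: {gloss}")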


Journal ArticleDOI
01 Jun 2000
TL;DR: This paper defines and formalizes the general duality problem of relations on the Web, and solves the problem of identifying acronyms and their expansions through patterns of occurrences of (acronym, expansion) pairs as they occur in Web pages.
Abstract: The Web is a vast source of information. However, due to the disparate authorship of Web pages, this information is buried in its amorphous and chaotic structure. At the same time, with the pervasiveness of Web access, an increasing number of users is relying on Web search engines for interesting information. We are interested in identifying how pieces of information are related as they are presented on the Web. One such problem is studying patterns of occurrences of related phrases in Web documents and in identifying relationships between these phrases. We call these the duality problems of the Web. Duality problems are materialized in trying to define and identify two sets of inter-related concepts, and are solved by iteratively refining mutually dependent coarse definitions of these concepts. In this paper we define and formalize the general duality problem of relations on the Web. Duality of patterns and relationships are of importance because they allow us to define the rules of patterns and relationships iteratively through the multitude of their occurrences. Our solution includes Web crawling to iteratively refine the definition of patterns and relations. As an example we solve the problem of identifying acronyms and their expansions through patterns of occurrences of (acronym, expansion) pairs as they occur in Web pages.
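For the acronym example, a bootstrap pattern that often works is the surface form "Expansion (ACRONYM)" filtered by an initials check. The sketch below implements only that single pattern on made-up text; the paper's duality method goes further, iteratively refining patterns and relation instances over a Web crawl.

    import re

    PAIR = re.compile(r"((?:[A-Z][a-z]+\s+){1,6})\((?P<acr>[A-Z]{2,10})\)")

    def acronym_pairs(text):
        """Extract (acronym, expansion) candidates whose initials match the acronym."""
        pairs = []
        for m in PAIR.finditer(text):
            words = m.group(1).split()
            acr = m.group("acr")
            expansion = " ".join(words[-len(acr):])
            if [w[0] for w in expansion.split()] == list(acr):
                pairs.append((acr, expansion))
        return pairs

    text = ("The Resource Description Framework (RDF) and the World Wide Web (WWW) "
            "are everywhere; phrases like Chapter Two (HTML) are filtered out.")
    print(acronym_pairs(text))  # [('RDF', 'Resource Description Framework'), ('WWW', 'World Wide Web')]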

Book ChapterDOI
10 Oct 2000
TL;DR: The authors envision a need to support interactive translation of Web pages as the World Wide Web becomes more accessible to people with varying needs and abilities throughout the world.
Abstract: A mixed-initiative system is one which allows more interactivity between the system and user as the system is reasoning. We present some observations on the task of translating Web pages for users and suggest that a more interactive approach to this problem may be desirable. The aim is to interact with the user who is requesting the translation, and the challenge is to determine the circumstances under which the user should be able to take the initiative to direct the processing or the system should be able to take the initiative to solicit further input from the user. In fact, we envision a need to support interactive translation of Web pages as the World Wide Web becomes more accessible to people with varying needs and abilities throughout the world.

Book ChapterDOI
04 Sep 2000
TL;DR: This paper presents an eXtensible Web Modeling Framework (XWMF), which applies the Resource Description Framework (RDF) to Web engineering to provide an interoperable exchange format.
Abstract: Generally, a multitude of tools is used for the management of a Web application life cycle. It is highly desirable to provide an exchange format for such tools to enable interoperability. This paper presents an eXtensible Web Modeling Framework (XWMF), which applies the Resource Description Framework (RDF) to Web engineering to provide an interoperable exchange format. Our proposed framework makes use of one and the same (meta-)data model to specify the structure and content of a Web application, to make statements about the elements of a Web application, and to reason about the data and metadata. XWMF is extensible, because schemata defining additional vocabulary to integrate new design artifacts can be added. The XWMF tools are able to convert the Web application (metadata) description into the corresponding Web implementation.