
Showing papers on "Web service" published in 1994


Journal ArticleDOI
01 Nov 1994
TL;DR: In this article, the authors describe the design and performance of a caching relay for the World Wide Web, model the distribution of requests for pages from the web, and examine how this distribution affects the performance of the cache.
Abstract: We describe the design and performance of a caching relay for the World Wide Web. We model the distribution of requests for pages from the web and see how this distribution affects the performance of a cache. We use the data gathered from the relay to make some general characterizations about the web.
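
The abstract does not give the request model itself; the sketch below is only an illustration, assuming a Zipf-like page popularity distribution and an LRU cache, of how such a distribution can be related to cache hit rate by simulation:

    # Illustrative sketch (not the paper's model): estimate the hit rate of an
    # LRU cache when page requests follow an assumed Zipf-like distribution.
    import random
    from collections import OrderedDict

    def simulate_lru_hit_rate(n_pages=10_000, cache_size=500, n_requests=100_000, s=1.0):
        # Assumed popularity: the page of rank r is requested with weight 1 / r**s.
        weights = [1.0 / (rank ** s) for rank in range(1, n_pages + 1)]
        requests = random.choices(range(n_pages), weights=weights, k=n_requests)
        cache = OrderedDict()              # page id -> None, ordered by recency of use
        hits = 0
        for page in requests:
            if page in cache:
                hits += 1
                cache.move_to_end(page)        # mark as most recently used
            else:
                cache[page] = None
                if len(cache) > cache_size:
                    cache.popitem(last=False)  # evict the least recently used page
        return hits / n_requests

    if __name__ == "__main__":
        print(f"simulated hit rate: {simulate_lru_hit_rate():.2%}")

Sweeping the skew parameter s or the cache size in such a simulation is one simple way to see how strongly the request distribution drives cache performance.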

319 citations


Journal ArticleDOI
01 Nov 1994
TL;DR: The background and objectives of ALIWEB are discussed, an overview of its functionality and implementation is given, and it is compared to other existing resource directories in the Web.
Abstract: ALIWEB is a framework for automatic collection and processing of resource indices in the World Wide Web. The current ALIWEB implementation regularly retrieves index files from many servers in the Web and combines them into a single searchable database. Using existing Web protocols and a simple index file format, server administrators can have descriptive and up-to-date information about their services incorporated into the ALIWEB database with little effort. As the indices are single files there is little overhead in the collection process and because the files are prepared for the purpose the resulting database is of high quality. This paper discusses the background and objectives of ALIWEB and gives an overview of its functionality and implementation, comparing it to other existing resource directories in the Web. It reviews the experiences with the first months of operation, and suggests possible future directions of ALIWEB.
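
The collection step described here, regularly retrieving a small index file from each participating server and merging the records into one searchable database, can be sketched as follows (a hedged illustration only; the file name, tab-separated record format, and server list are assumptions, not the actual ALIWEB conventions):

    # Illustrative sketch, not the ALIWEB implementation: fetch a plain-text index
    # file from each registered server and merge the records into one searchable list.
    import urllib.request

    REGISTERED_SERVERS = [                 # hypothetical participating servers
        "http://www.example-a.org",
        "http://www.example-b.org",
    ]

    def fetch_index(server, path="/site.idx"):
        # Assumed convention: each non-empty line holds "URL<TAB>description".
        with urllib.request.urlopen(server + path, timeout=10) as resp:
            text = resp.read().decode("utf-8", errors="replace")
        records = []
        for line in text.splitlines():
            if "\t" in line:
                url, description = line.split("\t", 1)
                records.append({"url": url.strip(), "description": description.strip()})
        return records

    def build_database(servers):
        database = []
        for server in servers:
            try:
                database.extend(fetch_index(server))
            except OSError:
                pass                       # an unreachable server drops out of this run
        return database

    def search(database, keyword):
        keyword = keyword.lower()
        return [r for r in database if keyword in r["description"].lower()]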

115 citations


Journal ArticleDOI
01 Nov 1994
TL;DR: The MORE system is a meta-data based repository that employs Mosaic and the Web as its sole user interface; the design and implementation experience of migrating a repository system onto the Web is described.
Abstract: Administering large quantities of information will be an increasing problem as the World Wide Web grows in size and popularity. The MORE system is a meta-data based repository employing Mosaic and the Web as its sole user interface. We describe here our design and implementation experience in migrating a repository system onto the Web. A demonstration instance of MORE is accessible.

39 citations


Journal ArticleDOI
01 Nov 1994
TL;DR: The organization, structure, evolution, and management of the information contained in the Web server are addressed, and common pitfalls to be avoided when deploying a large, commercial-class World-Wide Web server are identified.
Abstract: Digital Equipment Corporation was one of the first large corporations to embrace the World-Wide Web as a basis for customer support, global electronic marketing, online interactive product demonstrations, and electronic commerce. This paper uses Digital as a case study to help other organizations that wish to participate in this evolving electronic community. This paper will address the organization, structure, evolution, and management of the information contained in the Web server. The original design goals of the Web server will be mapped against the actual experiences gained from the week-to-week management of the environment. The paper will also identify common pitfalls to be avoided when deploying a large, commercial-class World-Wide Web server.

36 citations


Journal ArticleDOI
TL;DR: New WWW server software is described that solves some of the problems enumerated above by integrating technology from information retrieval (IR) and natural language processing (NLP); its NLP-based retrieval software has better precision and recall than WAIS.
Abstract: Most World-Wide Web (WWW) sites make minimal use of information retrieval (IR) technology. At best they start with a set of HTML documents and index them with WAIS, a fast but simple information retrieval engine. Users browsing these sites have the option of doing a keyword search of the database. We are building new WWW server software that:
• Uses natural language processing (NLP) based retrieval software that has better precision and recall than WAIS;
• Incorporates WAIS's ability to search several databases at once and lets the users select those databases;
• Allows the user to pose mixed relational and natural language queries;
• Lets the user customize several retrieval features, including the number of documents to return, the format of the list of returned documents, and any knowledge used by the system;
• Generates improved queries based on user feedback;
• Lets users see natural clusters in a set of documents; this is especially valuable for databases that change over time;
• Dynamically creates hyperlinks between related documents and parts of documents.

1. Current State of the Web

Most users browse the World-Wide Web (WWW) using embedded hypertext links to move from document to document or within documents. As the success of the WWW and other hypertext systems like Apple's HyperCard and Microsoft Help has proven, this paradigm works well for many applications. For example, users looking around to see what the WWW offers for the first time can access a wide range of information in a short amount of time. When a person needs to do focused research, however, it can be frustrating to depend on someone else's organization of information. In these cases, a content-based search is more appropriate.

Global indices of information on the Internet include Archie, Veronica, and ALIweb. These software programs index information based on filenames and user-authored descriptions, and let users issue searches. While Archie and Veronica build their indices automatically by roaming the net, ALIweb depends on authors to submit descriptions of their "chunk" of the WWW. These descriptions must be kept up to date. As sites put larger information sources on the Internet, it becomes very expensive, in terms of both time and storage, to build these global indices. A new approach is that of client-based retrieval, implemented at Eindhoven University of Technology [DeBra]. These researchers have extended NCSA's Mosaic to accept keyword queries and search for documents containing those words by automatically following links from the current "page." This approach is flexible and does not require any indexing on the server side, but it requires bandwidth proportional to the size of the collection being searched, since it must fetch each document and search it on the client machine. It does not take advantage of any precomputed index.

Many sites are using WAIS to tackle the problem of indexing their own large collections. From certain pages at a site, a user will have the option of posing a keyword query. A list of documents will come back, each linked to the full text or multimedia version of the document. WAIS's search is more accurate than that of global indices such as ALIweb, Archie, and Veronica, since it indexes the full text of a document, but it is still a simple keyword-based method. Although the standard WAIS client allows users to transparently search multiple distributed databases, most WAIS-indexed WWW sites do not.
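
The keyword-based method that WAIS applies to a site's HTML documents can be pictured with a minimal sketch, assuming local files and crude tokenization (an illustration in that spirit, not WAIS itself or the authors' software):

    # Illustrative sketch of full-text keyword indexing: build an inverted index
    # over local HTML files, then rank documents by how many query words they contain.
    import re
    from collections import defaultdict
    from pathlib import Path

    def tokenize(text):
        text = re.sub(r"<[^>]+>", " ", text)          # crude removal of HTML tags
        return re.findall(r"[a-z0-9]+", text.lower())

    def build_index(directory):
        index = defaultdict(set)                      # word -> set of file names
        for path in Path(directory).glob("*.html"):
            for word in tokenize(path.read_text(errors="replace")):
                index[word].add(path.name)
        return index

    def keyword_search(index, query):
        hits = defaultdict(int)                       # file name -> matched query words
        for word in tokenize(query):
            for name in index.get(word, ()):
                hits[name] += 1
        return sorted(hits, key=hits.get, reverse=True)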
To improve search accuracy, other sites have integrated retrieval systems with sophisticated query languages that add Boolean operators (AND, OR, NOT) and proximity operators (within-X-words, same sentence, same paragraph). These systems sometimes let the user associate weights with words in the query. While these features give the user more control over the retrieval, the languages are hard to learn and nonstandard. This paper describes new WWW server software that solves some of the problems enumerated above by integrating technology from information retrieval (IR) and natural language processing (NLP). Our server also lets users customize search parameters, mix relational and natural language queries, and browse large collections of documents that do not contain embedded hyperlinks.

2. Natural Language Technology

Instead of burdening the user with learning a sophisticated query language, our software lets the user pose his/her query in natural language and applies NLP technology to that query as well as to the textual contents of each indexed document. Most IR systems, including WAIS, treat documents as linear lists of words. Function words like "the," "of," and "and" are ignored, and words are often reduced to a stem (e.g., "computers" ⇒ "compute"). To improve accuracy, IR systems need richer representations of document content [Salton83]. Linguists have been studying natural language for years and recognize several distinct levels of representation of natural language. These include the morphological structure of words, syntactic structures, semantic predicate-argument structures, and the discourse relations between pronouns, definite noun phrases, and their antecedents [Fromkin]. Our system does not attempt to do complete natural language understanding of documents and queries. Only recently have many linguists begun to scale their theories up to large quantities of real-world text, and automatic methods are even farther behind. Projects that are underway to tag large corpora with linguistic structure will help test these theories and provide test data for automatic methods. One of these, the Penn Treebank [Marcus], has been able to consistently tag millions of words of text with syntactic structure. While no existing parser can accurately identify complete syntactic structure, there are several syntactic parsers [Hindle, DeMarcken] that can accurately identify low-level constituents (e.g., simple noun phrases, prepositional phrases) in naturally occurring text.

NLP technology has been successfully applied in many areas to improve the accuracy of IR systems. Syntactic parsers can identify multiword phrases that can serve as indexable terms [Fagan, Stralkowski]. Other researchers have generated thesauri automatically [Stralkowski], and by automatically disambiguating words with multiple meanings (e.g., crane (construction equipment) vs. crane (bird)) [Yarowsky, Krovetz] systems can become much more accurate. Our software incorporates these methods. NLP techniques are especially effective when applied to collections of short documents. Picture Network International (PNI) indexes a collection of hundreds of thousands of images with natural language captions. Because of the size of the images that are returned (usually across phone lines), it is important that any retrieval be as accurate as possible. Since captions are short, content words usually appear only once, reducing the likelihood of an exact keyword match. The information in so-called stop words is critical to distinguish cases such as "people inside a house" from "people outside a house."
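
The contrast the authors draw between a linear list of stemmed keywords and richer phrase-level index terms can be illustrated with a toy preprocessing pass (a hedged sketch only; the stop-word list, suffix rules, and adjacent-pair phrases are assumptions, not the paper's NLP pipeline):

    # Toy query/caption preprocessing: drop stop words, crudely stem the rest,
    # and keep adjacent word pairs as candidate phrase terms.
    import re

    STOP_WORDS = {"the", "of", "and", "a", "an", "in", "on", "to", "for", "is"}

    def crude_stem(word):
        # Toy stemmer: strip a few common suffixes (a real system would use
        # morphological analysis).
        for suffix in ("ing", "ers", "er", "es", "s"):
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                return word[: -len(suffix)]
        return word

    def index_terms(text):
        words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]
        stems = [crude_stem(w) for w in words]
        phrases = [f"{a} {b}" for a, b in zip(stems, stems[1:])]   # candidate phrases
        return stems + phrases

    print(index_terms("people outside a house watching the cranes"))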
3. Distributed Queries

WAIS servers index a set of documents and answer Z39.50 information requests. Many WWW sites that use WAIS will index their HTML documents and pass along users' keyword queries to this server. They show the user a list of the returned documents. The standard WAIS client, however, has many features that are not usually integrated into the WWW. One of the most powerful is its ability to let the user search several distributed databases at once. Users do not like to repeat searches or use different query languages for each source they want to search. Our software lets a user search directories of servers or add servers manually to a list of servers to search. Queries are translated into a form suitable for each server, sent to each server, and the results assimilated (see Fig. 1).
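
A minimal sketch of that fan-out step, sending one query to each server and assimilating the ranked results, might look like the following; the endpoints, HTTP transport, and JSON result format are assumptions for illustration, since the paper's software speaks Z39.50:

    # Illustrative sketch: fan one user query out to several search servers and
    # merge the results into a single ranked list.
    import json
    import urllib.parse
    import urllib.request
    from concurrent.futures import ThreadPoolExecutor

    SERVERS = [                                   # assumed endpoints, not real ones
        "http://search-a.example.org/query",
        "http://search-b.example.org/query",
    ]

    def query_one(server, query):
        url = f"{server}?{urllib.parse.urlencode({'q': query})}"
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                return json.load(resp)            # assumed: list of {"url", "score"}
        except OSError:
            return []                             # a failed server contributes nothing

    def distributed_search(query):
        with ThreadPoolExecutor(max_workers=len(SERVERS)) as pool:
            result_lists = list(pool.map(lambda s: query_one(s, query), SERVERS))
        merged = [hit for hits in result_lists for hit in hits]
        return sorted(merged, key=lambda hit: hit.get("score", 0), reverse=True)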

17 citations


Journal ArticleDOI
TL;DR: Active pages, as described in this paper, provide a common interface to World Wide Web applications, crossing browser, platform, and operating system boundaries, and can be accessed in the same way and with the same browser as other pages.
Abstract: Active pages provide a common interface to World Wide Web applications, crossing browser, platform, and operating system boundaries. They are hypertext documents that present a front-end to intelligent applications. Typically implemented as interpreted programs with an associated database, they use the forms extension for application input. There are three advantages to the active pages approach for application interfaces. Application interfaces are widely accessible because they leverage the accessibility of the Web by using HTTP to bridge application and interface. Applications are self-documenting. The hypertext model of the Web makes it simple for active pages to contain embedded documentation and links to auxiliary material. Finally, applications integrate seamlessly with the Web. Active pages may be accessed in the same way and with the same browser as other pages. In this paper, we present our active page design methodology and demonstrate it with two examples from our server: WEBDNS, a facility for editing Internet Domain Name System master files; and The People Directory, an editable personnel database that includes hypertext links to biographical pages.

Introduction

The World Wide Web can be an interface to shared applications, as well as a mechanism for electronic publishing and collaboration. Applications on the Web can leverage the portability of the Web for access. In turn, Web users can leverage the flexibility of programs to improve their collaboration. The pages of a shared application change in appearance and content as viewers progress through them. Input from one viewer can become visible to others. The same information can be presented differently, depending on the viewer and their preferences. From the viewers' perspective, the pages appear active and intelligent. These active pages contrast with other pages, which appear the same from invocation to invocation. We developed an approach to the design of active pages suited to shared applications. To implement active pages, a server interface for executing programs and a client interface for accepting input are required. We adopted the Common Gateway Interface (CGI) extension to HTTP servers and the forms extension to Web browsers, respectively, because they were immediately available and in widespread use. In this paper, we describe our design and illustrate it with two examples. The first, WEBDNS, facilitates the interactive update of Domain Name System (DNS) master files. The second, The People Directory, presents an editable personnel database.

The Advantages of Active Pages

Active pages provide several advantages to shared applications. They are widely and conveniently accessible, especially suited to collaborative tasks, encourage documentation, and are seamlessly integrated with other Web pages.

Accessibility

The broadest feature of active pages compared to other application interfaces is widespread accessibility. The Web is portable across platform, operating system, and network boundaries. Just as for other Web pages, active pages may be accessed interactively, across the network, and from a variety of computer platforms. This makes them appropriate for services that benefit from being freely available to a heterogeneous community.

Collaboration

Many users may simultaneously access one active page. This, coupled with the fact that the application executes on the server, rather than the client, allows many users to share few special resources.
This type of access is especially useful for collaboration within a user community, and can be applied to many database systems. Two large examples of database systems that can be interfaced to the Web with active pages are SABRE, the American Airlines reservations system, and Lotus Notes.
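
The CGI-plus-forms combination the authors adopt can be pictured with a minimal, hypothetical sketch: one CGI program that both renders a form and regenerates the page from whatever a viewer submits (field names and output are illustrative, not taken from WEBDNS or The People Directory):

    #!/usr/bin/env python3
    # Illustrative sketch of an "active page": a CGI program that serves a form
    # and regenerates the page from the submitted input.
    import html
    import os
    import sys
    import urllib.parse

    def main():
        query = os.environ.get("QUERY_STRING", "")
        if os.environ.get("REQUEST_METHOD") == "POST":
            query = sys.stdin.read(int(os.environ.get("CONTENT_LENGTH", 0) or 0))
        fields = urllib.parse.parse_qs(query)
        name = html.escape(fields.get("name", ["stranger"])[0])

        print("Content-Type: text/html")
        print()
        print("<html><body>")
        print(f"<p>Hello, {name}. This page was generated for your request.</p>")
        print('<form method="POST">')
        print('Name: <input type="text" name="name"> <input type="submit">')
        print("</form>")
        print("</body></html>")

    if __name__ == "__main__":
        main()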

14 citations


Proceedings Article
01 Jan 1994
TL;DR: An effective X-windows based WYSIWYG WWW browser/editor and a prototype for integrated wide-area authentication and authorization support for delivery and maintenance of WWW service are developed.
Abstract: We describe work to enhance existing software protocols and develop a suite of new software utilities based upon a set of standards known as the World Wide Web (WWW). Specifically, we have developed an effective X-windows based WYSIWYG WWW browser/editor and a prototype for integrated wide-area authentication and authorization support for delivery and maintenance of WWW service. These software development activities, along with parallel work in content development, are empowering individuals to better use the Internet as a resource to easily author, publish, and access materials. As an illustrative application, we describe one Web-based self-instructional unit designed to increase users' knowledge of hazardous substances in the environment. This on-line monograph was adapted from a series of paper-based case studies developed by the Agency for Toxic Substances and Disease Registry of the U.S. Department of Health & Human Services. The on-line version illustrates many of the innovative features provided by the Web, and demonstrates how such materials can significantly impact medical education at all levels.

14 citations



Journal ArticleDOI
TL;DR: A tool for generating tours of the web is described, enabling experts to become cyber-cartographers, mapmakers of the new virtual world, and to share their findings with novice users.
Abstract: The World Wide Web is one of the fastest growing Networked Information Systems in history. The revolution has been brought about by the use of GUIs such as NCSA's Mosaic, the distributed hypertext language HTML, Universal Resource Locators, and simple protocols for client-server access. Another contributory factor has been the development of a number of filters that have permitted the introduction of material prepared using almost all the well-known word processing and desktop publishing tools. However, this growth has led to problems for new users in finding the information they want. This paper is about a tool for generating tours of the web to enable experts to become cyber-cartographers, mapmakers of the new virtual world, and to share their findings with novice users.
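
The abstract does not say how a tour is represented; purely as an illustrative sketch, one can model a tour as an ordered list of stops chosen by the expert and render it into a single HTML page for novice readers (all URLs and captions below are placeholders):

    # Illustrative sketch, not the paper's tool: render a guided tour of the web,
    # an ordered list of stops, as one HTML page that a novice can follow in sequence.
    import html

    def render_tour(title, stops):
        # stops: list of (url, caption) pairs chosen by the "cyber-cartographer".
        lines = [f"<html><head><title>{html.escape(title)}</title></head><body>",
                 f"<h1>{html.escape(title)}</h1>", "<ol>"]
        for url, caption in stops:
            lines.append(f'<li><a href="{html.escape(url)}">{html.escape(caption)}</a></li>')
        lines += ["</ol>", "</body></html>"]
        return "\n".join(lines)

    tour = [("http://www.example.org/intro.html", "Start here: what the Web is"),
            ("http://www.example.org/mosaic.html", "Browsing with Mosaic"),
            ("http://www.example.org/html.html", "Writing your first HTML page")]
    print(render_tour("A Beginner's Tour of the Web", tour))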

4 citations



Journal ArticleDOI
TL;DR: This paper shows how such problems were solved on the company Web server and describes how both technical and non-technical staff were involved in discovering problems, finding solutions, and filling the Web with pages.
Abstract: There are two kinds of users of modern information systems: readers and information providers. Often, people are both. It is important to offer the readers good technical support in using the Web. But it is just as important to motivate people to provide the information that will make the Web interesting to use. While one wants to make it easy for users to submit and post material, a Webmaster is responsible for the technical quality of the server; links and HTML formatting must continually be tested and maintained. To what extent can such tasks be automated? How can users be allowed to post material and announce its presence as directly and automatically as possible? What responsibilities for quality control lie with the users as opposed to the Webmaster? What other searching methods, such as indexing, would help the readers navigate the Web? This paper shows how we have solved such problems in our company Web server and describes how we involved both technical and non-technical staff in discovering problems, finding solutions, and filling the Web with pages.
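
One of the maintenance tasks the paper asks about automating, continually testing links, lends itself to a simple tool; the sketch below (not the authors' tooling; the directory layout and HEAD-request strategy are assumptions) reports links in local HTML pages that no longer respond:

    # Illustrative sketch of automating one Webmaster task: check that the links
    # found in a tree of local HTML files still respond.
    import re
    import urllib.request
    from pathlib import Path

    LINK_RE = re.compile(r'href="(http[^"]+)"', re.IGNORECASE)

    def extract_links(directory):
        links = set()
        for path in Path(directory).glob("**/*.html"):
            links.update(LINK_RE.findall(path.read_text(errors="replace")))
        return links

    def check_link(url):
        try:
            req = urllib.request.Request(url, method="HEAD")
            with urllib.request.urlopen(req, timeout=10) as resp:
                return resp.status < 400
        except OSError:
            return False

    def report_broken(directory):
        return sorted(url for url in extract_links(directory) if not check_link(url))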

Journal ArticleDOI
01 Nov 1994
TL;DR: It has been found that only through the flexibility provided by Deceit have maintainers been able to shift responsibility for information onto its author, thus reducing their workload and increasing the range of information available.
Abstract: When good web clients first became available we decided that the Web provided enormous potential for the dissemination of information both within our department and throughout the university. We therefore decided to join the Web. At that time the only server available was a very primitive version of the CERN HTTPD, so we decided to create our own. This server was the basis for our current server, which has provided the backbone for the CityCS Web for well over a year. Despite the potential offered by the Web, we encountered much resistance from the author base. This was due to the lack of good authoring tools and the difficulties involved in presenting the information through the server. Also, the more information we added and the more authors we encouraged, the greater our workloads became. At that time we embarked on a project to find reasons for, and solutions to, the problems encountered. We have found that only through the flexibility provided by Deceit have we, as maintainers, been able to shift responsibility for information onto its author, thus reducing our workload and increasing the range of information available.

01 Jan 1994
TL;DR: It seems like just yesterday that I was writing my first column as Chair of the Statistical Graphics Section, but given my record at meeting Newsletter deadlines it probably was just yesterday!
Abstract: It seems like just yesterday that I was writing my first column as Chair of the Statistical Graphics Section. Given my record at meeting Newsletter deadlines it probably was just yesterday! The time is here (and the deadline has passed) to write my final column before giving the pen to David Scott, next year's Chair. Next year I, too, will be able to enjoy my Thanksgiving vacation.

01 Jan 1994
TL;DR: This paper provides an overview of document information structure, describes the World Wide Web project, and then discusses how the Web technologies were used to implement an interactive electronic document.
Abstract: This paper provides an overview of document information structure, describes the World Wide Web project, and then discusses how the Web technologies were used to implement an interactive electronic document.

01 Nov 1994
TL;DR: The contents of the configuration management research materials that have been published on the SEI World Wide Web Server and the Web publishing techniques employed in this process are described.
Abstract: Configuration Management research has been performed by members of the CASE Environments Project over the course of the past five years. This report describes the contents of the configuration management research materials that have been published on the SEI World Wide Web (WWW) Server. Primary Web structures and methods for accessing information on the Web are described. A summary of the problems and challenges encountered and the Web publishing techniques employed in this process are discussed.