Showing papers presented at "ACM international conference on Digital libraries in 1997"

PDF

Open Access

Proceedings Article•DOI•

Annotation: from paper books to the digital library

[...]

01 Jul 1997

TL;DR: The practice of annotation in a particular situation is examined: the markings students make in university-level textbooks, and their status within a community of fellow textbook readers is examined.

...read moreread less

Abstract: Readers annotate paper books as a routine part of their engagement with the materials; it is a useful practice, manifested through a wide variety of markings made in service of very different purposes. This paper examines the practice of annotation in a particular situation: the markings students make in university-level textbooks. The study focuses on the form and function of these annotations, and their status within a community of fellow textbook readers. Using this study as a basis, I discuss issues and implications for the design of annotation tools for a digital library setting.

...read moreread less

479 citations

Proceedings Article•DOI•

Finding text in images

[...]

Victor Wu¹, R. Manmatha¹, Edward M. Riseman¹•Institutions (1)

University of Massachusetts Amherst¹

01 Jul 1997

TL;DR: A four-step system which automaticnlly detects and extracts text in images is proposed and works well on imayes (with or without structured layouts) from a wide range of sources, including digitized video frames, photographs, and personal checks.

...read moreread less

Abstract: There are many applications in which the automatic detection and recognition of text embedded in images is useful. These applications include digad libraries, multimedia systems, and Geographical Information Systems. When machine generated text is prdnted against clean backgrounds, it can be converted to a computer readable form (ASCII) using current Optical Character Recognition (OCR) technology. However, text is often printed against shaded or textured backgrounds or is embedded in images. Examples include maps, advertisements, photographs, videos and stock certificates. Current document segmentation and recognition technologies cannot handle these situafons well. In this paper, a four-step system which automaticnlly detects and extracts text in images i& proposed. First, a texture segmentation scheme is used to focus attention on regions where text may occur. Second, strokes are extracted from the segmented text regions. Using reasonable heuristics on text strings such as height similarity, spacing and alignment, the extracted strokes are then processed to form rectangular boxes surrounding the corresponding ttzt strings. To detect text over a wide range of font sizes, the above steps are first applied to a pyramid of images generated from the input image, and then the boxes formed at each resolution level of the pyramid are fused at the image in the original resolution level. Third, text is extracted by cleaning up the background and binarizing the detected ted strings. Finally, better text bounding boxes are generated by srsiny the binarized text as strokes. Text is then cleaned and binarized from these new boxes, and can then be passed through a commercial OCR engine for recognition if the text is of an OCR-recognizable font. The system is stable, robust, and works well on imayes (with or without structured layouts) from a wide van’ety of sources, including digitized video frames, photographs, *This material is based on work supported in part by the National Science Foundation, Library of Congress and Department of Commerce under cooperative agreement number EEC9209623, in part by the United States Patent and mademark Office and Defense Advanced Research Projects Agency/IT0 under ARPA order number D468, issued by ESC/AXS contract number F19628-96-C-0235, in part by the National Science Foundation under grant number IF&9619117 and in part by NSF Multimedia CDA-9502639. Any opinions, findings and conclusions or recommendations expressed in this material are the author(s) and do not necessarily reflect those of the sponsors. Prrmission to make digital/hard copies ofall or part oflhis material for personal or clrrssroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage, the copyright notice, the title ofthe publication and its date appear, and notice is given that copyright is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires specific permission and/or fe DL 97 Philadelphia PA, USA Copyright 1997 AChi 0-89791~868-1197/7..$3.50 newspapers, advertisements, stock certifimtes, and personal checks. All parameters remain the same for-all the experiments.

...read moreread less

261 citations

Proceedings Article•DOI•

Lexical navigation: visually prompted query expansion and refinement

[...]

James W. Cooper¹, Roy J. Byrd¹•Institutions (1)

IBM¹

01 Jul 1997

TL;DR: A client-server system written in Java to allow users to issue queries, have additional terms suggested to them, explore lexical relationships, and view documents based on keywords they contain, constitutes a powerful set of tools for searching large text collections.

...read moreread less

Abstract: We have designed a document search and retrieval system, termed Lesical Navigation, which provides an interface allowing a user to expand or refine a query based on the actual content of the collection. In this work we have designed a client-server system written in Java to allow users to issue queries, have additional terms suggested to them, explore lexical relationships, and view documents based on keywords they contain. Lexical nehvorks containing domain-specific vocabularies and relationships are automatically extracted from the collection and play an important role in this navigation process. The Lexical Navigation methodology constitutes a powerful set of tools for searching large text collections.

...read moreread less

100 citations

Proceedings Article•DOI•

TINTIN: a system for retrieval in text tables

[...]

Pallavi Pyreddy¹, W. Bruce Croft¹•Institutions (1)

University of Massachusetts Amherst¹

01 Jul 1997

TL;DR: This paper examines the utility of exploiting information other than the key words in a digital document to provide the users with more flexible and powerful query capabilities and demonstrates that heuristic method based table extraction and component tagging can be performed effectively and efficiently.

...read moreread less

Abstract: Tables form an important kind of data element in text retrieval. Often, the gist of an entire news article or other exposition can be concisely captured in tabular form. In this paper, we examine the utility of exploiting information other than the key words in a digital document to provide the users with more flexible and powerful query capabilities. More specifically, we exploit the structural information in a document to identify tables and their component fields and let the users query based on these fields. Our empirical results have demonstrated that heuristic method based table extraction and component tagging can be performed effectively and efficiently. Moreover, our experiments in retrieval using the TINTIN system have strongly indicated that such structural decomposition can facilitate better representation of user’s information needs and hence more effective retrieval of tables.

...read moreread less

94 citations

Proceedings Article•DOI•

Talking in the library: implications for the design of digital libraries

[...]

Andy Crabtree¹, Michael B. Twidale¹, Jon O'Brien¹, David M. Nichols¹•Institutions (1)

Lancaster University¹

01 Jul 1997

TL;DR: The use of ethnomethodologically-informed ethnography as a means of informing the requirements elicitation, design, development and evaluation of digital libraries is described.

...read moreread less

Abstract: We describe the use of ethnomethodologically-informed ethnography as a means of informing the requirements elicitation, design, development and evaluation of digital libraries. We present the case for the contribution of such studies to the development of digital library technology to support the practices of information-searching. This is illustrated by a particular study of the help desk at a university library, examining the implications it has for designing appropriate functionality for a digital library. This requires us to address

...read moreread less

84 citations

Proceedings Article•DOI•

The digital library integrated task environment (DLITE)

[...]

Steve B. Cousins¹, Andreas Paepcke¹, Terry Winograd¹, Eric A. Bier², Ken Pier² - Show less +1 more•Institutions (2)

Stanford University¹, PARC²

01 Jul 1997

TL;DR: A case study in the design of a user interface to a digital library using a metaphor called workcenters, which is customized for users' tasks, and a mechanism to teach new users about the metaphor and interface.

...read moreread less

Abstract: We describe a case study in the design of a user interface to a digital library. Our design stems from a vision of a library as a channel to the vast array of digital information and doc ument services that are becoming available. Based on pub lished studies of library use and on scenarios, we developed a metaphor called workcenters, which are customized for users' tasks. Due to our scenarios and to prior work in the CHI community, we chose a direct-manipulation realization of the metaphor. Our system, called DLITE, is designed to make it easy for users to interact with many different ser vices while focusing on a task. Users have reacted favorably to the interface design in pilot testing, but a problem sur faced: we need a mechanism to teach new users about the metaphor and interface. We conclude by describing our approaches to this problem.

...read moreread less

84 citations

Proceedings Article•DOI•

Using words and phonetic strings for efficient information retrieval from imperfectly transcribed spoken documents

[...]

Michael Witbrock¹, Alexander G. Hauptmann¹•Institutions (1)

Carnegie Mellon University¹

01 Jul 1997

TL;DR: Digital Compression Text Library Creation digital compression text library creation.

...read moreread less

Abstract: Digital Compression Text Library Creation

...read moreread less

67 citations

Proceedings Article•DOI•

Metadata for digital libraries: architecture and design rationale

[...]

Michelle Q. Wang Baldonado¹, Chen-Chuan K. Chang¹, Luis Gravano¹, Andreas Paepcke¹•Institutions (1)

Stanford University¹

01 Jul 1997

TL;DR: A metadata architecture that includes four basic component classes: attribute model proxies, attribute model translators, metadata facilities for search proxies, and metadata repositories to facilitate metadata compatibility and interoperability in a distributed, heterogeneous, proxy-based digital library.

...read moreread less

Abstract: In a distributed, heterogeneous, proxy-based digital library, autonomous services and collections are accessed indirectly via proxies. To facilitate metadata compatibility and interoperability in such a digital library, we have designed a metadata architecture that includes four basic component classes: attribute model proxies, attribute model translators, metadata facilities for search proxies, and metadata repositories. Attribute model proxies elevate both attribute sets and the attributes they define to first-class objects. They also allow relationships among attributes to be captured. Attribute model translators map attributes and attribute values from one attribute model to another (where posMetadata facilities for search proxies provide structured descriptions both of the collections to which the search proxies provide access and of the search capabilities of the proxies. Finally, metadata repositories accumulate selected metadata from local instances of the other three component classes in order to facilitate global metadata queries and local metadata caching. In this paper, we outline further the roles of these component classes, discuss our design rationale, and analyze related work. Keywords: Metadata architecture, interoperability, attribute model, attribute model translation, metadata repository, InfoBus, proxy architecture, heterogeneity, digital libraries, CORBA.

...read moreread less

65 citations

Proceedings Article•DOI•

I read the news today, oh boy: reading and attention in digital libraries

[...]

David N. L. Levy¹•Institutions (1)

PARC¹

01 Jul 1997

TL;DR: Current assumptions about what it means to read digital doctm~nts and to read "in" digital libraries are examined and it is suggested that current work in digital library design and development is participating in a general societal trend toward shallower, more fragmented, and less concentrated reading.

...read moreread less

Abstract: This is a paper at the intersection of two topics now receiving considerable attention. The question of reading of what it is to read and how reading has changed over time has been attracting some interest in recent days, no doubt due in part to the very visible transformation of technology now under way. To a lesser but still substantial extent, the topic of human attention is also the subject of increasing discussion. There is growing awareness of attention as a highly limited resource, stemming in part from the realization that an abundance of information, good though it is in many ways, is also a tax on our attention. This paper examines current assumptions about what it means (or will mean) to read digital doctm~nts and to read "in" digital libraries. It suggests that current work in digital library design and development is participating in a general societal trend toward shallower, more fragmented, and less concentrated reading and, by calling attention to this pI~nomenon, offers an oppoRnni~ to question this

...read moreread less

61 citations

Proceedings Article•DOI•

Citation linking: improving access to online journals

[...]

Steve Hitchcock¹, Leslie Carr¹, Steve Harris¹, J.M.N. Hey¹, Wendy Hall¹ - Show less +1 more•Institutions (1)

University of Southampton¹

01 Jul 1997

TL;DR: A new system, a link service, is described, which is being developed to support novel and flexible linking mechanisms on the Web, and which is working with journal publishers to investigate the most effective ways of applying these powerful link types to enhance online journals.

...read moreread less

Abstract: The most innovative online journals are maturing rapidly and distinctive new features are emerging. Foremost among these features is the hypertext link, popularised by the World Wide Web and which will form the basis of a new, highly integrated scholarly literature. Journal integration in this instance seeks to recognise, extend and exploit relationships at the level of journal content-the papers-while maintaining some of the familiar contexts, in some cases journal identities, that define the content hierarchy and inform decision-making by readers. Links are a powerful tool for journal integration, most immediately in the form of citation linking. The paper reviews examples of citation linking in practice, and describes a new system, a link service, which is being developed to support novel and flexible linking mechanisms on the Web. One application of this link service is the Open Journal project, which is working with journal publishers to investigate the most effective ways of applying these powerful link types to enhance online journals.

...read moreread less

59 citations

Proceedings Article•DOI•

Multimedia abstractions for a digital video library

[...]

Michael G. Christel¹, David B. Winkler¹, C. Roy Taylor¹•Institutions (1)

Carnegie Mellon University¹

01 Jul 1997

TL;DR: Use and evaluation data for abstractions implemented in the Informedia Digital Video Library are presented, and implications for video delivery over the Web are discussed.

...read moreread less

Abstract: Multimedia abstractions form essential components of digital video libraries because they enable a user to determine a video’s distinguishing content without investing long viewing times or requiring high networktransfer speeds. This paper presents usage and evaluation data for abstractions implemented the Informedia Digital Video Library, and discusses implications for video delivery over the Web.

...read moreread less

Proceedings Article•DOI•

Understanding complex information environments: a social analysis of watershed planning

[...]

Lisa R. Schiff¹, Nancy A. Van House¹, Mark H. Butler¹•Institutions (1)

University of California, Berkeley¹

01 Jul 1997

TL;DR: An approach to social analysis for the development of digital libraries is presented, using the theoretical framework of Pierre Bourdieu and the situated action approach as sound bases for this understanding.

...read moreread less

Abstract: This paper presents an approach to social analysis for the development of digital libraries. If digital libraries are viewed as both social and technological artifacts, then effective design requires that we must understand the social world in which each functions. The theoretical framework of Pierre Bourdieu and the situated action approach are suggested as sound bases for this understanding. Initial findings of our work in the arena of watershed planning, as part of the UC Berkeley Digital Library Project, are reported.

...read moreread less

Proceedings Article•DOI•

Browsing in digital libraries: a phrase-based approach

[...]

Craig G. Nevill-Manning¹, Ian H. Witten², Gordon W. Paynter²•Institutions (2)

Stanford University¹, University of Waikato²

01 Jul 1997

TL;DR: A key question for digital libraries is this: how should one go about becoming familiar with a digital collection, as opposed to a physical one?

...read moreread less

Abstract: A key question for digital libraries is this: how should one go about becoming familiar with a digital collection, as opposed to a physical one? Digital collections generally present an appearance which is extremely opaque-a screen, typically a Web page, with no indication of what, or how much, lies beyond: whether a carefully-selected collection or a morass of worthless ephemera; whether half a dozen documents or many millions. At least physical collections occupy physical space, present a physical appearance, and exhibit tangible physical organizatiqn. When standing on the threshold of a large library one gains a sense of presence and permanence that reflects the care taken in building and maintaining the collection inside. No-one could confuse it with a dung-heap! Yet in the digital world the difference is not so palpable.

...read moreread less

Proceedings Article•DOI•

Evaluating Dewey concepts as a knowledge base for automatic subject assignment

[...]

Roger Thompson¹, Keith E. Shafer¹, Diane Vizine-Goetz¹•Institutions (1)

OCLC¹

01 Jul 1997

TL;DR: The results of the exploration of the Dewey Decimal Classification (Dewey) are presented, showing that Dewey demonstrates a high degree of class integrity and thus is a good knowledge base for an automatic subject assignment tool for electronic items.

...read moreread less

Abstract: This article presents the results of our exploration of the Dewey Decimal Classification (Dewey) as a concept definition source for the Scorpion project. Particularly, we show that Dewey demonstrates a high degree of class integrity and thus is a good knowledge base for an automatic subject assignment tool. 1. Electronic chaos and classification With the advent of desktop publishing and the World Wide Web (WWW), the volume of electronically available information has severely strained the ability of current tools to effectively manage access to it. Even the application of advanced information retrieval techniques like ranked retrieval has not provided the solution; it is a common occurrence to retrieve hundreds of documents for a query using one of the free search services available via the WWW. This is clearly an unacceptable solution for most people seeking information electronically. One solution to the information access problem is to partition the data into smaller chunks using a classifictio~ scheme. Librarians have been using classification schemes for centuries to succinctly describe resources. They provide a uniform means for denoting the primary topic of a resource so it can be grouped with similar items. One can then go to where those items are located and browse among them to find the particular information desired. This also applies to information stored electronically, for example, one can go to a database of similar items to search for items of interest. While classification schemes provide an efficient mechanism for grouping similar items other approaches can also be applied to bring out subjects not indicated by the class number. Librarians commonly use sr&ect headings from authoritative lists like the Library of Congress Subject Headings (LCSH) or the Sears List of Subject Headings to more fully denote the content of an item. 2. Scorpion and Dewey OCLC has initiated the Scorpion research project to address the challenge of applying classification schemes and subject headings cost effectively to electronic information. A thesis of Scorpion is that the Dewey Decimal Classiification can be used to perform automatic subject assignment for electronic items. (Similar work using the Library of Congress Classification has been performed by Larson. His paper, -921, presents a good summary of previous research in this

...read moreread less

Proceedings Article•DOI•

Shopping models: a flexible architecture for information commerce

[...]

Steven P. Ketchpel¹, Hector Garcia-Molina¹, Andreas Paepcke¹•Institutions (1)

Stanford University¹

01 Jul 1997

TL;DR: This paper formalizes the shopping models which represent these different modes of consumer to merchant interaction, and defines the application program interfaces (API) to interact with the models.

...read moreread less

Abstract: In a digital library, there are many different interaction models between customers and information providers or merchants Subscriptions, sessions, pay-per-view, shareware, and pre-paid vouchers are different models that each have different properties A single merchant may use several of them Yet if a merchant wants to support multiple models, there is a substantial amount of work to implement each one In this paper, we formalize the shopping models which represent these different modes of consumer to merchant interaction In addition to developing the overall architecture, we define the application program interfaces (API) to interact with the models We show how a small number of primitives can be used to construct a wide range of shopping models that a digital library can support, and provide examples of the shopping models in operation, demonstrating their flexibility

...read moreread less

Proceedings Article•DOI•

Content + connectivity => community: digital resources for a learning community

[...]

Gary Marchionini¹, Victor Nolet¹, Hunter Williams¹, Wei Ding¹, Josephus Beale¹, Anne Rose¹, Allison Gordon¹, Ernestine K. Enomoto¹, Lynn Harbinson¹ - Show less +5 more•Institutions (1)

University of Maryland, College Park¹

01 Jul 1997

TL;DR: The Baltimore Learning Community Project described here is based on the premise that access to resources should be tied to the assessment outcomes that drive classroom activity, and an interface for the BLC digital library was prototyped.

...read moreread less

Abstract: Digital li braries offer new opportunities to p rovide acce ss to d iverse resources beyond those held in school buildings and to allow teachers and learners to reach beyond classroom walls to other people to build distributed learning communities. Creating learning communities requires that t eachers change their behaviors and the Baltimore Learning Community Project described here is based on the premise that access to resources should be tied to the a ssessment outcomes that i ncreasingly drive c urricula a nd classroom activity. Based on examination of curriculum guides and discussions with project teachers, an interface for the BLC digital library was prototyped. Three components (explore, construct, and present) of this user interface that allows teachers to find text, v ideo, images, web sites, and instructional m odules and create their own modules are described. Although the technological challenges of building learning communities are significant, the greater challenges are mainly social and political.

...read moreread less

Proceedings Article•DOI•

Multiple search engines in database merging

[...]

Ellen M. Voorhees¹, Richard M. Tong•Institutions (1)

National Institute of Standards and Technology¹

01 Jul 1997

TL;DR: This paper shows that the behavior of two previously developed isolated techniques is indeed independent of the particular search engines that participate in the search, and suggests that these methods may be able to improve the effectiveness of World Wide Web searches by merging the output from several engines.

...read moreread less

Abstract: A database merging technique is a strategy for combining the results of multiple independent searches into a single cohesive response. While a variety of techniques have been developed to address a range of problem characteristics, our work focuses on environments in which search engines work in isolation. This paper shows that the behavior of two previously developed isolated techniques is indeed independent of the particular search engines that participate in the search. Two very different search engines, SMART and TOPIC@, were each used to retrieve documents from five subcollections. The relative effectiveness of the merged result compared to the effectiveness of a corresponding single collection run is comparable for both engines. The effectiveness of the merged result is improved when both search engines search the same five subcollections but participate in a single merging. The improvement is such that this lo-collection merge is sometimes more effective than the single collection run. This last finding suggests that these methods may be able to improve the effectiveness of World Wide Web searches by merging the output from several engines.

...read moreread less

Proceedings Article•

Annotation: From Paper Books to Digital Library.

[...]

Catherine C. Marshall

01 Jan 1997

Proceedings Article•DOI•

Seed ontologies: growing digital libraries as distributed, intelligent systems

[...]

Peter Weinstein¹, Gene Alloway¹•Institutions (1)

University of Michigan¹

01 Jul 1997

TL;DR: The design and use of ontologies are described in the University of Michigan Digital Library, which will model all aspects of the digital library, including content, services, and licenses, and refined and extended the IFLA hierarchy for the realization of work.

...read moreread less

Abstract: Ontologies are more than a particularly elaborate approach to the description and classification of information. They can be used to support the operation and growth of a new kind of digital library, implemented as a distributed, intelligent system. We describe the design and use of ontologies in the University of Michigan Digital Library. These ontologies will model all aspects of the digital library, including content, services, and licenses. We have refined and extended the IFLA hierarchy for the realization of work, and are starting to use ontologies to support reasoning about content search. We have also used the ontologies to classify the capabilities of computational elements of the system (agents), in a dynamic way that sustains functionality as new agents are added to the system.

...read moreread less

Proceedings Article•DOI•

Internet browsing and searching (poster): user evaluations of category map and concept space techniques

[...]

Hsinchun Chen, Bruce R. Schatz, Andrea L. Houston, Robin R. Sewell, Tobun D. Ng, Chienting Lin - Show less +2 more

01 Jul 1997

TL;DR: The results indicate that a Kohonen self-organizing map (SOM)-based map for browsing, and an automatically generated concept space algorithm for searching, can help improve browsing and/or searching the Internet.

...read moreread less

Abstract: pages indicated that there was limited overlap between The Internet provides an exceptional testbed for develthe homepages retrieved by the subject-suggested and oping algorithms that can improve browsing and searchthesaurus-suggested terms. Since the retrieved homeing large information spaces. Browsing and searching pages for the most part were different, this suggests that tasks are susceptible to problems of information overa user can enhance a keyword-based search by using load and vocabulary differences. Much of the current an automatically generated concept space. Subjects esresearch is aimed at the development and refinement of pecially liked the level of control that they could exert algorithms to improve browsing and searching by adover the search, and the fact that the terms suggested dressing these problems. Our research was focused on by the thesaurus were ‘‘real’’ ( i.e., originating in the discovering whether two of the algorithms our research homepages) and therefore guaranteed to have retrieval group has developed, a Kohonen algorithm category success. map for browsing, and an automatically generated concept space algorithm for searching, can help improve browsing and/or searching the Internet. Our results indicate that a Kohonen self-organizing map (SOM)-based

...read moreread less

Proceedings Article•DOI•

Secure distribution of watermarked images for a digital library of ancient papers

[...]

Christian Rauber¹, Joseph Ó Ruanaidh¹, Thierry Pun¹•Institutions (1)

University of Geneva¹

01 Jul 1997

TL;DR: An important feature of the MEDIA project is a digital watermarking tool which embeds hidden signatures in images which provides copyright protection and helps to ensure that the image will not be copied and sold and without proper authorisation.

...read moreread less

Abstract: The electronic publishing, storage and distribution of documents is growing increasingly important and will have profound implications for our economy, culture and society. The multimedia digitalisation of libraries and the distribution of the contents of museums is revolutionising these organisations and will make these resources available to a much wider audience than was previously possible. The main goal of our MEDIA project (Mobile Electronic Documents with Interacting Agents) is the development of a system for the archival, retrieval, and distribution of electronic documents. For this purpose, a mobile agent platform is used to securely distribute these documents. Information is accessed by a search mechanism that allows the retrieval of text and images according to their content. An important feature of the system is a digital watermarking tool which embeds hidden signatures in images. This provides copyright protection and helps to ensure that the image will not be copied and sold and without proper authorisation. The management of the database of documents and images is accomplished by an extensible object relational database management system. In addition, documents and data can be accessed through the World Wide Web network.

...read moreread less

Proceedings Article•

Proceedings of the second ACM international conference on Digital libraries

[...]

Robert B. Allen, Edie Rasmussen¹•Institutions (1)

University of Pittsburgh¹

01 Jul 1997

Proceedings Article•DOI•

AGS: introducing agents as services provided by digital libraries

[...]

J. Alfredo Sánchez¹, John J. Leggett², John L. Schnase³•Institutions (3)

Universidad de las Américas Puebla¹, Texas A&M University², Missouri Botanical Garden³

01 Jul 1997

TL;DR: An architecture for digital libraries that introduces user agents as one of the services available to publishers, librarians and patrons is presented and is intended to serve as a testbed to investigate alternative user interfaces to digital libraries and, in particular, a host of unexplored issues raised by the introduction of user agents.

...read moreread less

Abstract: This paper presents an architecture for digital libraries that introduces user agents as one of the services available to publishers, librarians and patrons. User agents are the fundamental component of an emerging style of human-computer interaction based on the concept of delegation and indirect management of tasks. In the agent-enabled digital library architecture, termed “AGS”, service providers define classes of agents that describe helpful tasks for patrons. Patrons, in turn, delegate work by selecting agents from the available agent classes and assigning specific tasks to be performed. AGS enables the development of agents that rely on a wide variety of construction approaches while maintaining a unified view of an active environment. AGS is intended to serve as a testbed to investigate alternative user interfaces to digital libraries and, in particular, a host of unexplored issues raised by the introduction of user agents.

...read moreread less

Proceedings Article•DOI•

Web-based collaborative library research

[...]

Scott P. Robertson, Sherif Jitan, Kathy Reese

01 Jul 1997

TL;DR: A webbased system that employees can use to interact with library researchers and automates tracking of research service usage and indexing and archiving of research requests and actions is described.

...read moreread less

Abstract: The U S WEST Research & Information Group, the corporate research library, has recently moved many of its resources and services to the company’s intranet. Principle among the group’s functions is conducting information searches and research analyses for employees. This paper describes a webbased system that employees can use to interact with library researchers. The system also automates tracking of research service usage and indexing and archiving of research requests and actions. Library clients initiate research requests using a personal web page. Each request generates its own web page on which interaction between cIient and researcher takes place. Researchers and clients can post comments, record actions, use e-mail, and upload and download files through the request web page. When the interaction is over, the client may record an evaluation using the same web page and all actions are saved for administrative purposes. Research interactions are maintained in a searchable archive which cau be viewed by all employees.

...read moreread less

Proceedings Article•DOI•

Hypertext construction using statistical and semantic similarity

[...]

Dongwook Shin¹, Sejin Nam¹, Munseok Kim•Institutions (1)

Chungnam National University¹

01 Jul 1997

TL;DR: This paper addresses how to create good hypertexts, combining the notion of statistical and semantic similarity in an appropriate manner, and carries out an experiment with several theses and technical reports written in Korean, measuring how the method proposed here creates hypertext well.

...read moreread less

Abstract: 1 Introduction Automatic construction of hypertext has been gaining growing attention recently in that a number of documents being produced is beginning to be made in the form of hypertext, which calls for an enormous amount of intellectual work by experts. In this decade, several studies have been carried out, employing. techniques mainly developed for retrieving relevant documents to user needs. Among these, most studies underlie the vector space model and well-known weight-ing schemes, from which the notion of similarity (statistical similarity) has been devised and applied for creating hyper-texts. Automatic construction of hypertext has been gaining growing attention recently in that a substantial portion of documents being produced is beginning to be made in the form of hypertext and it becomes more diilicult to hypertext&e them solely by experts. It involves highly intellectual works: we have to understand the content of a document in some degree, find out important keywords, divide the documents into nodes, each of which addresses a specific topic, categorize link types, associate each keyword with related nodes-via specific links of certain types and so on. However, in order to create well-organized hypertexts, semantics of the contents should be also investigated, since generating hypertexts involves highly intellectual works-understanding contents, splitting them into nodes, finding keywords , and making links between entities that are supposed to be related. This paper addresses how to create good hypertexts, combining the notion of statistical and semantic similarity in an appropriate manner. The notion of statistical similarity is based on a weighting scheme by tf x idf and inner vector product, whereas the notion of semantic similarity underlies thesaurus and partial match. We carry out an experiment with several theses and technical reports written in Korean, measuring how the method proposed here creates hypertext well, compared to the result made by human experts. The result shows that the method makes hypertexts closer to those by human experts than that using only statistical method does. From several yesrs ago, researchers have paid attention to this area and achieved some results [lo, 11, 3, 21. G. Salton and C. Buckley [lo] suggested a method for identifying content links which associate pieces of documents with similar content. In principle, the method first applies global text analysis between documents and considers the individual sentence pair only for those documents exhibiting high pairwise document similarities. The similarity between parts of documents is measured as the inner …

...read moreread less

Proceedings Article•DOI•

Evaluating the cost of Boolean query mapping

[...]

Chen-Chuan K. Chang¹, Hector Garcia-Molina¹•Institutions (1)

Stanford University¹

01 Jul 1997

TL;DR: The experimental results show that in many cases incremental post-ltering cost may be acceptable, while the batch post-magnifying cost may sometimes be extremely large.

...read moreread less

Abstract: Non-uniform query languages make searching over heterogeneous information sources difcult. Our approach is to allow a user to compose Boolean queries in one rich front-end language. For each user query and target source, we transform the user query into a subsuming query that can be supported by the source but that may return extra documents. The results are then processed by a lter query to yield the correct nal results. This post-ltering approach may involve signicant cost because the documents that the users will not see may have to be retrieved and ltered. There are generally two ways to implement post-ltering: batch post-ltering and incremental post-ltering. In this paper we evaluate the costs of both methods for different search features such as proximity operators. The experimental results show that in many cases incremental post-ltering cost may be acceptable, while the batch post-ltering cost may sometimes be extremely large.

...read moreread less

Proceedings Article•DOI•

Managing a digital library of legislation

[...]

Timothy Arnold-Moore¹, Philip Anderson¹, Ron Sacks-Davis¹•Institutions (1)

RMIT University¹

01 Jul 1997

TL;DR: An overview of the Themis system, a commercial implementation of a digital library of legislation that uses SGML to store legislation and how versioning impacts the storage of fragments of documents and management of references within and between documents is provided.

...read moreread less

Abstract: We provide an overview of the Themis system, a commercial implementation of a digital library of legislation. !l’hemis uses SGML to store legislation. This allows a single source document to be exported in a number of different formats and presentations. Themis also allows access to different versions of legislation by specifying a point-in-time at which the law is required. We discuss how this is achieved in Themis and how versioning impacts the storage of fragments of documents and management of references within and between documents.

...read moreread less

Proceedings Article•DOI•

Auto-adaptive illustration through conceptual evocation

[...]

Michel Crampes¹•Institutions (1)

Mines ParisTech¹

01 Jul 1997

TL;DR: This paper finds its inspiration in this natnral mechanism to consider a formal model of Conceptual Evocation that could be used for automatic adaptative ilhrstmtion or, more generally, dynamic and autoadaptative hypemavigation in hypermedia applications.

...read moreread less

Abstract: When engaged in reading, a reader is permanently building up associations of ideas either freely, or guided by the evocative power of the text and his imagination. This paper finds its inspiration in this natnral mechanism to consider a formal model of Conceptual Evocation that could be used for automatic adaptative ilhrstmtion or, more generally, dynamic and autoadaptative hypemavigation in hypermedia applications. We borrow from Sowa’s Conceptual Graphs a theoretical l?amework for node conceptual modelling. In search of more creative mechanisms, we introduce Conceptnal State Vectors to tag the nodes, and a Conceptual Evocative Engine to dynamically create Conceptual Evocative Links between nodes. Finally a mock-up is presented that shows the operationnahty of all the concepts in the context of a TV program composition.

...read moreread less

Proceedings Article•

I read the news today, oh boy

[...]

D. M. Levy¹•Institutions (1)

PARC¹

01 Jan 1997

Proceedings Article•DOI•

Content: a practical, scalable, high-performance multimedia database

[...]

Lawrence Yapp¹, Craig Yamashita¹, Gregory L. Zick¹•Institutions (1)

University of Washington¹

01 Jul 1997

TL;DR: This paper describes Content, a practical, scalable, and highperformance text-indexed multimedia database system that contains more than 25,000 multimedia objects that span two different collections of valuable historical photographs.

...read moreread less

Abstract: This paper describes Content, a practical, scalable, and highperformance text-indexed multimedia database system. The novelty of Content is in its approach of integrating high-volume storage, fast searching and browsing, easy multimedia acquisition, effective updates, scalability, extendibility, and an API based on HTTP. Content is also a low-cost solution for large multimedia databases that is available today. Standard Web-based browsers such as Netscape can query the Content server. The API is flexible so that different and unique Content clients on multiple platforms can be built to access multiple Content servers. The Content architecture permits any multimedia type to be stored. Text descriptions are used as indices for images and videos. Content includes an easy-to-use Windows-based acquisition station for acquiring images and video. Currently, Content is being used in a real library setting and contains more than 25,000 multimedia objects that span two different collections of valuable historical photographs. In terms of performance, Content can access a single image in a database of over one million images in less than a second.

...read moreread less