
Showing papers in "International Journal on Digital Libraries in 1997"


Journal Article • DOI
Serge Abiteboul, Dallan Quass, Jason G. McHugh, Jennifer Widom, Janet L. Wiener
TL;DR: The main novelties of the Lorel language are the extensive use of coercion to relieve the user from the strict typing of OQL, which is inappropriate for semistructured data; and powerful path expressions, which permit a flexible form of declarative navigational access and are particularly suitable when the details of the structure are not known to the user.
Abstract: Lorel is a language designed for querying semistructured data. Semistructured data is becoming more and more prevalent, e.g., in structured documents such as HTML and when performing simple integration of data from multiple sources. Traditional data models and query languages are inappropriate, since semistructured data often is irregular: some data is missing, similar concepts are represented using different types, heterogeneous sets are present, or object structure is not fully known. Lorel is a user-friendly language in the SQL/OQL style for querying such data effectively. For wide applicability, the simple object model underlying Lorel can be viewed as an extension of the ODMG data model and the Lorel language as an extension of OQL. The main novelties of the Lorel language are: (i) the extensive use of coercion to relieve the user from the strict typing of OQL, which is inappropriate for semistructured data; and (ii) powerful path expressions, which permit a flexible form of declarative navigational access and are particularly suitable when the details of the structure are not known to the user. Lorel also includes a declarative update language. Lorel is implemented as the query language of the Lore prototype database management system at Stanford. Information about Lore can be found at http://www-db.stanford.edu/lore. In addition to presenting the Lorel language in full, this paper briefly describes the Lore system and query processor. We also briefly discuss a second implementation of Lorel on top of a conventional object-oriented database management system, the O2 system.

1,257 citations
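
The Lorel abstract above hinges on two ideas: coercion and generalized path expressions. The sketch below is only an illustration of those ideas in Python over nested dictionaries and lists, not the Lore system or Lorel's actual semantics; the follow and loosely_equal helpers, the "%" wildcard notation, and the restaurant data are all invented for the example.

```python
# Illustrative sketch only (not the Lore system): evaluating a Lorel-style
# path expression with a wildcard over irregular, semistructured data, and
# coercing values loosely when filtering, in the spirit of the abstract.

def follow(node, steps):
    """Yield every value reached from `node` by the label path `steps`.
    A step of "%" matches any label, mimicking a generalized path expression."""
    if isinstance(node, list):                 # a list encodes a heterogeneous set
        for child in node:
            yield from follow(child, steps)
    elif not steps:
        yield node
    elif isinstance(node, dict):
        step, rest = steps[0], steps[1:]
        for label, child in node.items():
            if step == "%" or label == step:
                yield from follow(child, rest)
    # atomic values with steps still remaining simply yield nothing

def loosely_equal(a, b):
    """Coerce before comparing, so "95020" matches 95020 (strict typing relaxed)."""
    try:
        return float(a) == float(b)
    except (TypeError, ValueError):
        return str(a) == str(b)

# Irregular data: one member stores the zipcode as a string, one as an
# integer, and one has no address at all.
guide = {"restaurant": [
    {"name": "Chef Chu", "address": {"zipcode": "95020"}},
    {"name": "Saigon",   "address": {"zipcode": 95020}},
    {"name": "Unknown"},
]}

# Roughly analogous to a Lorel-style query selecting restaurant names whose
# zipcode (wherever it sits) equals 95020.
hits = [r["name"] for r in follow(guide, ["restaurant"])
        if any(loosely_equal(z, 95020) for z in follow(r, ["%", "zipcode"]))]
print(hits)   # ['Chef Chu', 'Saigon']
```

The point of the example is that the same query works whether the zipcode is stored as a string, as an integer, or is missing entirely, which is the kind of irregularity the abstract describes.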


Journal Article • DOI
TL;DR: This paper proposes a query language, WebSQL, that takes advantage of multiple index servers without requiring users to know about them, and that integrates textual retrieval with structure and topology-based queries.
Abstract: The World Wide Web is a large, heterogeneous, distributed collection of documents connected by hypertext links. The most common technology currently used for searching the Web depends on sending information retrieval requests to "index servers". One problem with this is that these queries cannot exploit the structure and topology of the document network. The authors propose a query language, WebSQL, that takes advantage of multiple index servers without requiring users to know about them, and that integrates textual retrieval with structure and topology-based queries. They give a formal semantics for WebSQL using a calculus based on a novel "virtual graph" model of a document network. They propose a new theory of query cost based on the idea of "query locality," that is, how much of the network must be visited to answer a particular query. Finally, they describe a prototype implementation of WebSQL written in Java.

402 citations
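
As a rough illustration of the WebSQL abstract's two query ingredients (seeding from index servers and topology-based navigation), the following Python sketch evaluates a keyword query against a toy in-memory link graph. It is not the WebSQL prototype, which the abstract says was written in Java; the LINKS graph, the INDEX_SERVERS table, and the evaluate function are invented stand-ins, and the size of the visited set only loosely echoes the paper's notion of query locality.

```python
# Toy sketch: (1) fan a keyword query out to several "index servers" without
# the user naming them, then (2) expand along hyperlinks from the seed pages
# to a bounded depth. All data below is invented for illustration.

LINKS = {                               # toy document network: page -> outgoing links
    "a.html": ["b.html", "c.html"],
    "b.html": ["d.html"],
    "c.html": [],
    "d.html": ["a.html"],
}
INDEX_SERVERS = [                       # each "server" maps a keyword query to seed pages
    {"digital library": ["a.html"]},
    {"digital library": ["c.html"], "metadata": ["d.html"]},
]

def evaluate(keywords, max_depth=1):
    """Seed from every index server, then expand along links up to max_depth.
    The size of `visited` is a crude stand-in for 'query locality': how much
    of the network had to be visited to answer the query."""
    frontier = {u for server in INDEX_SERVERS for u in server.get(keywords, [])}
    visited = set(frontier)
    for _ in range(max_depth):          # topology-based part of the query
        frontier = {link for page in frontier for link in LINKS.get(page, [])
                    if link not in visited}
        visited |= frontier
    return visited

print(evaluate("digital library", max_depth=1))
# {'a.html', 'b.html', 'c.html'} (in some order)
```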


Journal Article • DOI
TL;DR: The metadata required for a diverse set of Stanford Digital Library services is surveyed and categorize, and an extensible metadata architecture is proposed that fits into the established infrastructure and promotes interoperability among existing and de-facto metadata standards.
Abstract: The overall goal of the Stanford Digital Library project is to provide an infrastructure that affords interoperability among heterogeneous, autonomous digital library services. These services include both search services and remotely usable information processing facilities. In this paper, we survey and categorize the metadata required for a diverse set of Stanford Digital Library services that we have built. We then propose an extensible metadata architecture that meets these requirements. Our metadata architecture fits into our established infrastructure and promotes interoperability among existing and de-facto metadata standards. Several pieces of this architecture are implemented; others are under construction. The architecture includes attribute model proxies, attribute model translation services, metadata information facilities for search services, and local metadata repositories. In presenting and discussing the pieces of the architecture, we show how they address our motivating requirements. Together, these components provide, exchange, and describe metadata for information objects and metadata for information services. We also consider how our architecture relates to prior, relevant work on these two types of metadata.

290 citations
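
One component named in the abstract above is an attribute model translation service. The sketch below shows, in the simplest possible Python terms, what such a translation might do; the model names, the mapping table, and the translate function are assumptions made for illustration, not Stanford's implementation.

```python
# A minimal sketch of the idea behind an attribute-model translation service:
# a query phrased in one attribute model is rewritten into another model's
# vocabulary so a search service that only understands the second model can
# still answer it. The mapping table and model names are invented.

MAPPINGS = {
    ("dublin-core-like", "bib-1-like"): {
        "creator": "author",
        "title": "title",
        "date": "publication-year",
    },
}

def translate(query, source_model, target_model):
    """Rewrite {attribute: value} pairs from source_model into target_model.
    Attributes without a known mapping are reported rather than guessed,
    since a translation service cannot invent semantics it does not have."""
    table = MAPPINGS[(source_model, target_model)]
    translated, untranslatable = {}, []
    for attribute, value in query.items():
        if attribute in table:
            translated[table[attribute]] = value
        else:
            untranslatable.append(attribute)
    return translated, untranslatable

q = {"creator": "Abiteboul", "date": "1997", "language": "en"}
print(translate(q, "dublin-core-like", "bib-1-like"))
# ({'author': 'Abiteboul', 'publication-year': '1997'}, ['language'])
```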


Journal Article • DOI
TL;DR: It is shown that almost standard database optimization techniques can be used to answer queries without having to load the entire document into the database, and the interaction of full-text indexes with standard database collection indexes, which provides important speed-ups, is considered.
Abstract: We propose a model for structured documents based on schemas that consist of grammars annotated with database programs. To query documents, we introduce an extension of OQL, the ODMG standard query language for object databases. Our extension (named OQL-doc) allows us to query documents without a precise knowledge of their structure, using in particular generalized path expressions and pattern matching. This allows us to introduce, in a declarative language (in the style of SQL or OQL), navigational and information retrieval styles of accessing data. Query processing in the context of documents and path expressions leads to challenging implementation issues. We extend an object algebra with new operators to deal with generalized path expressions. We then consider two essential complementary optimization techniques. We show that almost standard database optimization techniques can be used to answer queries without having to load the entire document into the database. We also consider the interaction of full-text indexes (e.g., inverted files) with standard database collection indexes (e.g., B-trees) that provide important speed-ups.

282 citations
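
The last two sentences of the abstract above describe combining a full-text index with standard database collection indexes. The following sketch illustrates just that combination as a set intersection over two toy indexes; the data and the doc_ids helper are invented, and the paper's object algebra and optimizer are not reproduced here.

```python
# Illustrative sketch only: answering a mixed query from a full-text inverted
# file plus a conventional attribute index, without loading any document body.

INVERTED_FILE = {                 # word -> set of document ids containing it
    "hypertext": {1, 3, 4},
    "query": {2, 3},
}
YEAR_INDEX = {                    # attribute index on a structured field
    1996: {1, 2},
    1997: {3, 4},
}

def doc_ids(word, year):
    """Documents that contain `word` AND carry the given `year` attribute:
    a set intersection over the two indexes, touching no document content."""
    return INVERTED_FILE.get(word, set()) & YEAR_INDEX.get(year, set())

print(doc_ids("hypertext", 1997))   # {3, 4}
```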


Journal Article • DOI
TL;DR: A multidimensional ranking scheme based on the three dimensions of space, time, and theme is proposed; the resulting rank is presented graphically to inform users about how well data sets from a digital spatial library meet their spatial, temporal, and thematic targets.
Abstract: Digital spatial libraries currently under development are generating large repositories of data which will continue to grow. As these repositories grow, the situation will inevitably arise in which a digital library user may be confronted with several hundred spatial data sets in response to a particular query. The question then arises as to how the results from this search can be most easily assimilated by the user. Text based materials have benefited from substantial research and experience on ranking of search results. Ranking of spatial data sets has not received the same attention since there has been little motivation for such activity until recently. In this paper we propose a multidimensional ranking scheme based on the three dimensions of space, time, and theme. The multidimensional rank is presented graphically to inform users about how well data sets from a digital spatial library meet their spatial, temporal, and thematic targets.

56 citations
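
To make the abstract's three-dimensional rank concrete, here is a hedged worked example in Python that scores a data set separately on space, time, and theme and keeps the three scores apart for graphical presentation. The overlap formulas (area overlap, interval overlap, Jaccard keyword overlap) and the sample query and data set are assumptions, not the paper's actual measures.

```python
# One possible way to score a spatial data set against a query separately on
# space, time, and theme, so the three ranks can be displayed side by side
# rather than collapsed into a single number. Formulas and data are invented.

def interval_overlap(a, b):
    """Fraction of interval `a` = (lo, hi) that interval `b` covers; 0..1."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return max(0.0, hi - lo) / (a[1] - a[0])

def box_overlap(a, b):
    """Fraction of query box `a` = ((x1, x2), (y1, y2)) covered by box `b`."""
    return interval_overlap(a[0], b[0]) * interval_overlap(a[1], b[1])

def theme_overlap(query_terms, dataset_terms):
    """Jaccard overlap of thematic keywords; 0..1."""
    q, d = set(query_terms), set(dataset_terms)
    return len(q & d) / len(q | d) if q | d else 0.0

query = {"box": ((-123.0, -121.0), (36.0, 38.0)),
         "time": (1980.0, 1990.0),
         "theme": {"vegetation", "land-cover"}}
dataset = {"box": ((-124.0, -122.0), (35.0, 39.0)),
           "time": (1985.0, 1995.0),
           "theme": {"land-cover", "soil"}}

rank = (box_overlap(query["box"], dataset["box"]),
        interval_overlap(query["time"], dataset["time"]),
        theme_overlap(query["theme"], dataset["theme"]))
print(rank)   # (0.5, 0.5, 0.333...) - one score per dimension, kept separate
```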


Journal Article • DOI
TL;DR: The specific requirements and problems for integration of information within hospitals are discussed, and a software architecture designed according to these requirements and adapted to the demands of integrating replicated information within hospital information systems is presented.
Abstract: A fundamental concern in hospital information systems is the integration of information across heterogeneous subsystems at the data level. Consistent data replication is a central problem to be solved in this domain. The specific requirements and problems for integration of information within hospitals are discussed and a software architecture which has been designed according to these requirements is presented. The purpose of this paper is to study the problems and solutions of propagation of information updates across heterogeneous subsystems within hospitals. The general structure of the presented architecture is based on the reference architecture for federated database systems (Sheth and Larson 1990) and adapted to the specific demands on integration of replicated information within hospital information systems. This architecture is the basis for algorithms that restore the integrity of replicated information when changes occur. A prototype implementation is discussed.

52 citations
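
As a minimal illustration of the chore the abstract describes, propagating an update so that replicated copies in heterogeneous subsystems regain consistency, the Python sketch below pushes one canonical change through per-subsystem field mappings. The Subsystem class, the field maps, and the record layout are invented; the paper's federated architecture and integrity-restoring algorithms are far richer.

```python
# Toy sketch only: one canonical update is applied to every replica, with a
# field-name mapping standing in for each subsystem's heterogeneous schema.

class Subsystem:
    """One hospital subsystem holding its own copy of patient records."""
    def __init__(self, name, field_map):
        self.name, self.field_map, self.records = name, field_map, {}

    def apply(self, patient_id, canonical_update):
        local = {self.field_map[k]: v for k, v in canonical_update.items()
                 if k in self.field_map}
        self.records.setdefault(patient_id, {}).update(local)

def propagate(update, patient_id, subsystems):
    """Push one canonical update to every replica, restoring consistency."""
    for subsystem in subsystems:
        subsystem.apply(patient_id, update)

lab = Subsystem("lab", {"name": "PAT_NAME", "ward": "WARD_CODE"})
radiology = Subsystem("radiology", {"name": "patientName"})

propagate({"name": "Doe, Jane", "ward": "3B"}, patient_id="p001",
          subsystems=[lab, radiology])
print(lab.records)        # {'p001': {'PAT_NAME': 'Doe, Jane', 'WARD_CODE': '3B'}}
print(radiology.records)  # {'p001': {'patientName': 'Doe, Jane'}}
```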


Journal Article • DOI
TL;DR: This paper presents an approach for integrating multiple thesaurus databases, including distributed and heterogeneous thesaurus databases and multilingual and monolingual thesauri; its software architecture takes advantage of the most advanced Internet and CORBA technology currently available in public-domain and commercial implementations.
Abstract: As a result of the distribution of interrelated information over several different information systems, the interconnection of information systems has increased in recent years. However, a purely technical interconnection is insufficient for users who need to find their way to information they are looking for. Thesauri are a proven means to identify documents, e.g., books of interest in a library. For different domains, different thesauri are available, which can be used in information systems as well, e.g., for the indexing and retrieval of data objects. Thus, the interconnection of information systems raises the need to integrate related thesauri. Furthermore, recent advances in open interoperability technologies (World Wide Web, CORBA, and Java) offer the potential for completely new technical solutions for employing thesauri. This paper presents an approach for integrating multiple thesaurus databases. It concentrates on the integration of distributed and heterogeneous thesaurus databases and the integration of multilingual and monolingual thesauri. The software architecture takes advantage of the most advanced Internet and CORBA technology currently available in public domain and in commercial implementations.

44 citations


Journal Article • DOI
TL;DR: Conventional full-text systems represent documents as sets of text documents, but in this paper two new approaches are proposed that allow for a much more granular representation of text.
Abstract: Conventional full-text systems represent documents as sets of …

39 citations


Journal Article • DOI
TL;DR: The state of the ongoing Dublin Core effort as of January 1997 is summarized: the design goals that motivate the Dublin Core are described, the results of each of the workshops held thus far are reviewed, brief descriptions of pilot projects that use the Dublin Core are given, and anticipated future developments are pointed to.
Abstract: The Dublin Core Metadata Workshop Series began in 1995 with an invitational workshop intended to bring together librarians, digital library researchers, content experts, and text-markup experts to promote better description standards for electronic resources. The Dublin Core is a 15-element set of descriptors that has emerged from this effort in interdisciplinary and international consensus building. This paper summarizes the state of the ongoing Dublin Core effort as of January 1997. It describes the design goals that motivate the Dublin Core, summarizes the results of each of the workshops held thus far, provides brief descriptions of pilot projects that use the Dublin Core, and points to anticipated future developments.

36 citations
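
For readers unfamiliar with the element set mentioned above, the sketch below writes one resource description as a plain Python mapping over the 15 Dublin Core elements, using the names under which they were later standardized (labels shifted slightly across the workshop series); the example values are invented.

```python
# A sketch of a Dublin Core description as a simple Python mapping.
# Element names follow the later-standardized forms; values are invented.

record = {
    "Title":       "International Journal on Digital Libraries, Vol. 1",
    "Creator":     "Example Author",
    "Subject":     "digital libraries; metadata",
    "Description": "Illustrative record only.",
    "Publisher":   "Example Publisher",
    "Contributor": "",
    "Date":        "1997",
    "Type":        "text",
    "Format":      "text/html",
    "Identifier":  "http://example.org/ijdl/vol1",
    "Source":      "",
    "Language":    "en",
    "Relation":    "",
    "Coverage":    "",
    "Rights":      "",
}

# All 15 elements are optional and repeatable in the Dublin Core model, so an
# application would typically drop the empty ones rather than store blanks.
print({k: v for k, v in record.items() if v})
```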


Journal Article • DOI
TL;DR: The paper describes implementation details of several components of the GKRS and its realization as a prototype system, which loosely couples a variety of multimedia knowledge sources that are in part represented in terms of the semantic network and neural network representations developed in artificial intelligence research.
Abstract: Digital libraries serving multimedia information that may be accessed in terms of geographic content and relationships are creating special challenges and opportunities for networked information systems. An especially challenging research issue concerning collections of geo-referenced information relates to the development of techniques supporting geographic information retrieval (GIR) that is both fuzzy and concept-based. Viewing the meta-information environment of a digital library as a heterogeneous set of services that support users in terms of GIR, we define a geographic knowledge representation system (GKRS) in terms of a core set of services of the meta-information environment that is required in supporting concept-based access to collections of geospatial information. In this paper, we describe an architecture for a GKRS and its implementation in terms of a prototype system. Our GKRS architecture loosely couples a variety of multimedia knowledge sources that are in part represented in terms of the semantic network and neural network representations developed in artificial intelligence research. Both textual analysis and image processing techniques are employed in creating these textual and iconic geographical knowledge structures. The GKRS also employs spreading activation algorithms in support of concept-based knowledge retrieval. The paper describes implementational details of several of the components of the GKRS as well as discussing both the lessons learned from, and future directions of, our research.

36 citations
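
The abstract names spreading activation as the GKRS retrieval mechanism. The sketch below shows the technique in its simplest form over a toy semantic network of geographic concepts; the network, weights, decay factor, and threshold are all assumptions made for illustration and are not the GKRS knowledge sources themselves.

```python
# Minimal spreading-activation sketch: activation flows from seed concepts
# outward along weighted links, so related concepts are retrieved even when
# the query term does not match them exactly (fuzzy, concept-based matching).

NETWORK = {                        # concept -> list of (related concept, link weight)
    "harbor":    [("port", 0.9), ("coastline", 0.6)],
    "port":      [("shipping", 0.8), ("harbor", 0.9)],
    "coastline": [("beach", 0.7)],
    "shipping":  [],
    "beach":     [],
}

def spread(seeds, decay=0.8, threshold=0.1, max_steps=3):
    """Propagate activation from seed concepts; weaker signals reach more
    distant but still related concepts."""
    activation = dict(seeds)                       # concept -> activation level
    frontier = dict(seeds)
    for _ in range(max_steps):
        next_frontier = {}
        for concept, level in frontier.items():
            for neighbor, weight in NETWORK.get(concept, []):
                a = level * weight * decay
                if a > threshold and a > activation.get(neighbor, 0.0):
                    activation[neighbor] = a
                    next_frontier[neighbor] = a
        frontier = next_frontier
    return sorted(activation.items(), key=lambda kv: -kv[1])

print(spread({"harbor": 1.0}))
# harbor (1.0), then port (~0.72), coastline (~0.48), shipping (~0.46), beach (~0.27)
```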


Journal Article • DOI
TL;DR: In this article, the authors present the results of an ongoing effort for the design and implementation of an architecture based on digital library technologies, for the provision of user-oriented telematic services in a regional healthcare network.
Abstract: No single monolithic information system can effectively serve the needs of an entire healthcare organizational structure. Thus, information and telecommunications systems must primarily provide the infrastructure to support the effective integration of distributed and heterogeneous components, ensuring overall integrity in terms of functional and information interworking. This approach, i.e., the integration of heterogeneous, autonomous, distributed systems, to developing and managing regional healthcare networks ensures the transfer and integration of consistent information between healthcare facilities, without imposing constraints on the operation of individual clinical units. This paper presents the results of an ongoing effort for the design and implementation of an architecture based on digital library technologies, for the provision of user-oriented telematic services in a regional healthcare network. Specifically, it addresses issues related to the provision of user-oriented services, transparent to the needs of different user groups and the requirements of specific tasks, based on: a) meta-information for the creation of an information infrastructure for the regional healthcare network which is, effectively, a multimedia distributed digital library, b) intelligent information retrieval strategies to selectively retrieve information from multimedia data, c) agent-based technologies for effective service delivery adapted to the current user needs and the task at hand, and d) middleware services that explicitly reveal not only the characteristics of the information sources, but also address the context of specific telematic services, through appropriate mediation mechanisms.

Journal Article • DOI
TL;DR: A system designed to extract and structure the information contained in free text radiology reports using natural language processing techniques that can be used to index and provide access to image databases is described.
Abstract: We describe a system designed to extract and structure the information contained in free text radiology reports. The system uses natural language processing techniques to transform free text descriptions into structured information units that can be used to index and provide access to image databases. To provide access to the results of the system we developed a unique user interface for reporting and accessing radiology reports. We present a general system architecture for information extraction. Although the prototype focuses on thoracic radiology, this system can be extended into other areas of medicine. We discuss the system's knowledge sources and structures, and explain how natural language processing techniques are used to extract information from reports. The results of clinical testing of the prototype are presented and evaluated.
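
As a deliberately crude illustration of the pipeline direction described above (free-text report in, structured and indexable information units out), the sketch below extracts a finding, a location, and a negation flag with keyword lists and a regular expression. It is not the natural language processing system described in the paper; the vocabularies, the negation pattern, and the output structure are invented.

```python
# Toy extractor: free text -> a structured information unit that could index
# an image database. Everything here is a drastic simplification.
import re

FINDINGS = ["pleural effusion", "pneumothorax", "infiltrate", "nodule"]
LOCATIONS = ["right lower lobe", "left lower lobe", "right upper lobe",
             "left upper lobe", "bilateral"]

def extract(report_sentence):
    """Return {finding, location, negated} if a known finding term occurs
    in the sentence, else None."""
    text = report_sentence.lower()
    for finding in FINDINGS:
        if finding in text:
            location = next((loc for loc in LOCATIONS if loc in text), None)
            negated = bool(re.search(r"\b(no|without|absence of)\b[^.]*" +
                                     re.escape(finding), text))
            return {"finding": finding, "location": location, "negated": negated}
    return None

print(extract("There is a small infiltrate in the right lower lobe."))
# {'finding': 'infiltrate', 'location': 'right lower lobe', 'negated': False}
print(extract("No pneumothorax is seen."))
# {'finding': 'pneumothorax', 'location': None, 'negated': True}
```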

Journal Article • DOI
TL;DR: This paper presents scalable methods for locating information at different levels within a distributed digital library environment and proposes solutions to the problem of storage and retrieval of large objects on both secondary and tertiary storage devices.
Abstract: This paper presents a summary of some of the work-in-progress within the Alexandria Digital Library Project. In particular, we present scalable methods of locating information at different levels within a distributed digital library environment. Starting at the high level, we show how queries can be routed to appropriate information sources. At a given source, efficient query processing is supported by using materialized views and multidimensional index structures. Finally, we propose solutions to the problem of storage and retrieval of large objects on secondary and tertiary storage.
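
Of the three levels the abstract mentions, the first (routing a query to appropriate information sources) is the easiest to sketch. The Python below routes a spatial query using per-source bounding-box summaries; the source names, the summaries, and the route function are assumptions for illustration, not the Alexandria Project's mechanism, and the materialized-view and multidimensional-index layers are not shown.

```python
# Toy routing sketch: forward a query only to sources whose summarized
# coverage intersects the query region, instead of broadcasting it everywhere.

SOURCE_SUMMARIES = {                 # source -> (lon range, lat range) it covers
    "west-coast-maps":  ((-125.0, -114.0), (32.0, 49.0)),
    "gulf-coast-maps":  ((-98.0, -80.0), (24.0, 31.0)),
    "aerial-photos-ca": ((-124.5, -114.0), (32.5, 42.0)),
}

def overlaps(range_a, range_b):
    """True if two 1-D ranges intersect."""
    return range_a[0] <= range_b[1] and range_b[0] <= range_a[1]

def route(query_box):
    """Return the sources whose coverage intersects the query bounding box."""
    qx, qy = query_box
    return [name for name, (sx, sy) in SOURCE_SUMMARIES.items()
            if overlaps(qx, sx) and overlaps(qy, sy)]

print(route(((-123.0, -121.0), (36.0, 38.0))))
# ['west-coast-maps', 'aerial-photos-ca']
```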

Journal Article • DOI
TL;DR: A distributed, wide area network based approach to collecting, cataloguing, storing, and providing Web access for large-data-objects that originate from the operation of many types of online instruments and imaging systems, and are a "staple" of modern intelligence, scientific, and health care environments.
Abstract: We describe a distributed, wide area network based approach to collecting, cataloguing, storing, and providing Web access for large-data-objects that originate as high-speed data streams. Such data streams result from the operation of many types of online instruments and imaging systems, and are a "staple" of modern intelligence, scientific, and health care environments. The approach provides for real-time conversion of the data streams and large datasets to "objects" that are manageable, extensible, searchable, browsable, persistent, and available to both "ordinary" and high-performance applications through the integration of a high-speed distributed cache and transparent management of tertiary storage. The user interfaces - for both application users and data collection curators - are provided by the capabilities of the World Wide Web. The capabilities of the architecture are not unlike a digital library system, and we give an example of a digital image library that has been built using this architecture. However, our approach particularly addresses the issues involved in creating such digital library-like collections automatically from the high data-rate, real-time output of, e.g., satellite imaging systems, scientific instruments, health care imaging systems, etc. We discuss the capabilities, architecture, and implementation of such a system, as well as several example applications in data-intensive environments. The applications include a metropolitan area ATM network based, online health care video imaging system, and several image database applications, including a photographic-image library (see (9)). We also describe the security architecture used to enforce data owner imposed access conditions.

Journal Article • DOI
TL;DR: The College of American Pathologists (CAP), secretariat of the Systematized Nomenclature of Human and Veterinary Medicine (SNOMED), has entered into partnership with the DICOM (Digital Imaging and Communications in Medicine) Standards Committee, to develop the controlled terminology that is needed for diagnostic imaging applications.
Abstract: Existing clinical nomenclatures do not provide comprehensive, detailed coverage for multispecialty biomedical imaging. To address clinical needs in this area, the College of American Pathologists (CAP), secretariat of the Systematized Nomenclature of Human and Veterinary Medicine (SNOMED), has entered into partnership with the DICOM (Digital Imaging and Communications in Medicine) Standards Committee, the American College of Radiology, the American Dental Association, the American Academy of Ophthalmology, the American Society for Gastrointestinal Endoscopy, the American Academy of Neurology, the American Veterinary Medical Association, and other professional specialty organizations to develop the controlled terminology that is needed for diagnostic imaging applications. Terminology development is coordinated with ongoing development and maintenance of the DICOM Standard. SNOMED content is being enhanced in two general areas: 1) imaging procedure descriptions and 2) diagnostic observations. The SNOMED DICOM Microglossary (SDM) has been developed to provide context-dependent value sets (SDM Context Groups) for DICOM coded-entry data elements and semantic content specifications (SDM Templates) for reports and other structures composed of multiple data elements. The capability of storing explicitly labeled coded descriptors from the SDM in DICOM images and reports improves the potential for selective retrieval of images and related information. A pilot test of distributed multispecialty terminology development using a World Wide Web (WWW) application was performed in 1997, demonstrating the feasibility of large-scale distributed development of the SDM.

Journal Article • DOI
TL;DR: Certain critical issues of incorporating digital library (DL) technology into large scale biomedical image archives such as picture archiving and communication systems (PACS) are identified.
Abstract: The purpose of this paper is to identify certain critical issues of incorporating digital library (DL) technology into large scale biomedical image archives such as picture archiving and communication systems (PACS). Digital library technology can increase the knowledge content and utilities of PACS and associated medical systems to provide a broader range of biomedical imaging services. Furthermore, we illustrate certain application areas with examples from prototypes developed on the hospital-integrated PACS at UCSF.

Journal Article • DOI
TL;DR: The integrated use of the proposed metadata appears to be a promising method of better supporting the manipulation and analysis of social science data.
Abstract: Statistical social science databases present many unique data management problems. Among them is the fact that computer-aided use of statistical social science data is subject to serious but preventable errors in data manipulation and analysis, particularly given the increasing number of non-expert users. Two incremental prototype solutions to the problem of correctly handling units of measure when manipulating data are described. These implementations use rudimentary metadata representing measurement-theoretical and category attribute information to support error checking of data derivations common in research and reference using social science data. The specification of the metadata structures and the implementation of their processing are informed by relevant principles from the literature on statistical and scientific data models, dimensional analysis, and knowledge representation with formal ontology. Design parameters useful in the effort to establish appropriate variable-level metadata specifications for social science databases used in digital environments are discussed. The integrated use of the proposed metadata appears to be a promising method of better supporting the manipulation and analysis of social science data. While measurement-theoretical metadata has further application to the cataloging and retrieval of collections of statistical databases, the importance of metadata for social science databases serving users beyond these purposes is argued.
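
A minimal sketch, assuming a deliberately simple unit representation, of the kind of variable-level check the abstract motivates: metadata that blocks meaningless derivations (adding dollar amounts to person counts) while permitting meaningful ones (dividing them). The Measure class and its unit strings are invented and are not the prototypes' metadata model.

```python
# Toy unit-checked arithmetic: each variable carries a unit string, addition
# requires matching units, and division derives a new compound unit.

class Measure:
    def __init__(self, value, unit):
        self.value, self.unit = value, unit

    def __add__(self, other):
        if self.unit != other.unit:                      # addition needs the same unit
            raise TypeError(f"cannot add {self.unit} to {other.unit}")
        return Measure(self.value + other.value, self.unit)

    def __truediv__(self, other):                        # division derives a new unit
        return Measure(self.value / other.value, f"{self.unit}/{other.unit}")

    def __repr__(self):
        return f"{self.value} {self.unit}"

income     = Measure(4.2e9, "dollars")
population = Measure(1.5e6, "persons")

print(income / population)        # 2800.0 dollars/persons  (a valid derivation)
try:
    print(income + population)    # blocked by the unit metadata
except TypeError as err:
    print("rejected:", err)
```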

Journal Article • DOI
TL;DR: A digital atlas of cervical and lumbar spine x-rays, developed at the Lister Hill National Center for Biomedical Communications, addresses the need for an online medical digital library reference tool to assist readers in producing interpretations.
Abstract: With the increasing use of digitized x-rays in PACS systems and statistical databanks, there appears to be a need for an online medical digital library reference tool to assist readers in producing interpretations. At the Lister Hill National Center for Biomedical Communications, a research and development division of the National Library of Medicine (NLM), we are developing a digital atlas of cervical and lumbar spine x-rays. The images were collected in the second National Health and Nutrition Examination Survey (NHANES II), a nationwide survey conducted by the National Center for Health Statistics. The initial platform for the atlas is a Sun SPARC workstation running the Solaris operating system. The atlas can display images on any X-Window compatible device. Images are selected for display by specifying the condition of interest and the grade of severity (example: "anterior osteophytes, grade 2"). Multiple atlas images may be displayed simultaneously. An image processing module is provided, which allows histogram equalization, zooming, window/level, and other operations on the atlas images. In addition, the user may incorporate additional images into the atlas. Further work is proceeding on using Java applets and Java Database Connectivity technology to create a tool that is Web-enabled and platform independent.
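
Among the image-processing operations the abstract lists is window/level. The sketch below implements the standard window/level mapping with NumPy on a synthetic 12-bit image; the window and level values are arbitrary, and this is the textbook operation rather than the atlas's own code.

```python
# Standard window/level (intensity windowing): map the band
# [level - window/2, level + window/2] onto the full display range,
# clipping everything outside it.
import numpy as np

def window_level(image, window, level, out_max=255):
    """Linearly rescale pixel intensities within the window; clip the rest."""
    lo = level - window / 2.0
    scaled = (image.astype(np.float64) - lo) / window * out_max
    return np.clip(scaled, 0, out_max).astype(np.uint8)

# Example on a synthetic 12-bit image: emphasize the 1000-3000 intensity band.
raw = np.linspace(0, 4095, num=6).reshape(2, 3)
print(window_level(raw, window=2000, level=2000))
# [[  0   0  81]
#  [185 255 255]]
```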