
Showing papers in "Library Trends in 1999"


Journal Article
TL;DR: Co-word analysis is a content analysis technique that uses patterns of co-occurrence of pairs of items in a corpus of texts to identify the relationships between ideas within the subject areas presented in these texts.
Abstract: IN THE LAST HALF CENTURY, AS THE SCIENCE LITERATURE has increased dramatically, scientists have found it increasingly difficult to locate needed data, and it is increasingly difficult for policymakers to understand the complex interrelationship of science in order to achieve effective research planning. Some quantitative techniques have been developed to ameliorate these problems; co-word analysis is one of these techniques. Based on the co-occurrence frequency of pairs of words or phrases, co-word analysis is used to discover linkages among subjects in a research field and thus to trace the development of science. Within the last two decades, this technique, implemented by several research groups, has proved to be a powerful tool for knowledge discovery in databases. This article reviews the development of co-word analysis, summarizes the advantages and disadvantages of this method, and discusses several research issues. INTRODUCTION Since World War II, the scope and volume of scientific research have increased dramatically. This is well reflected in the growth of the literature. In the 1960s, the amount of scientific literature was estimated to be doubling approximately every ten years (Price, 1963). Three decades later, in the 1990s, along with developments in information technolo

520 citations
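
The co-occurrence counting that the abstract above describes can be illustrated with a minimal sketch. It assumes each document has already been reduced to a list of keywords; the function name `coword_counts` and the sample keywords are hypothetical and are not taken from the article.

```python
from collections import Counter
from itertools import combinations

def coword_counts(documents):
    """Count how many documents each unordered pair of terms co-occurs in."""
    pair_counts = Counter()
    for terms in documents:
        # Deduplicate and sort so each pair is counted once per document
        # and always appears in the same order.
        for pair in combinations(sorted(set(terms)), 2):
            pair_counts[pair] += 1
    return pair_counts

# Hypothetical keyword lists, one per document.
docs = [
    ["polymer", "catalyst", "membrane"],
    ["polymer", "membrane"],
    ["catalyst", "zeolite"],
]
counts = coword_counts(docs)
strongest = counts.most_common(1)[0]
print(strongest)
```

Pairs with high counts are the candidate linkages between subjects; real studies typically normalise these raw counts before mapping them, a step this sketch leaves out.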


Journal Article
TL;DR: The strengths and limitations of four classificatory approaches are described in terms of their ability to reflect, discover, and create new knowledge.
Abstract: THE LINK BETWEEN CLASSIFICATION AND KNOWLEDGE is explored. Classification schemes have properties that enable the representation of entities and relationships in structures that reflect knowledge of the domain being classified. The strengths and limitations of four classificatory approaches are described in terms of their ability to reflect, discover, and create new knowledge. These approaches are hierarchies, trees, paradigms, and faceted analysis. Examples are provided of the way in which knowledge and the classification process affect each other.

248 citations


Journal Article
TL;DR: This work describes how to use software that can create suggestive juxtapositions of Medline records, the purpose being to help biomedical researchers detect new and useful relationships.
Abstract: THE PROBLEM OF HOW TO FIND INTERESTING but previously unknown implicit information within the scientific literature is addressed. Useful information can go unnoticed by anyone, even its creators, if it can be inferred only by considering together two (or more) separate articles neither of which cites the other and which have no authors in common. The two articles (or two sets of articles) are in that case said to be complementary and noninteractive. During the past twelve years, this project has uncovered and reported numerous complementary relationships in the biomedical literature that have led to new information of scientific interest. Several of these literature-based discoveries subsequently have been corroborated through clinical or laboratory investigations. We describe how to use software that can create suggestive juxtapositions of Medline records, the purpose being to help biomedical researchers detect new and useful relationships. This software, called Arrowsmith, has also proved valuable as a tool for investigating patterns of complementary relationships in natural language text (Arrowsmith can be used free of charge at http://kiwi.uchicago.edu). [Don R. Swanson, Division of the Humanities, The University of Chicago, 1010 E. 59th St., Chicago, IL 60637; Neil R. Smalheiser, Department of Psychiatry, University of Illinois, 1601 W. Taylor St., Chicago, IL 60612. LIBRARY TRENDS, Vol. 48, No. 1, Summer 1999, pp. 48-59. © 1999 The Board of Trustees, University of Illinois.] INTRODUCTION The juxtaposition of certain natural language text passages from different biomedical journal articles can reveal or suggest new information not contained in the original passages considered separately. For example, one article might report an association or link between substance A and some physiological parameter or property B while another reports a relationship between B and disease C. 
If nothing has been published concerning a link between A and C via B, then to bring together the separate articles on A-B and B-C may suggest a novel A-C relationship of scientific interest. There are now about 9 million records in the Medline database, and hence about 40 trillion (40,000,000,000,000) possible pairings of records. Clearly the vast majority of record pairs and article pairs have never been considered together. It is plausible to think that there are many undiscovered implicit relationships within the biomedical literature, at least some of which might be important (Swanson, 1993, pp. 611-19). It is important, therefore, to develop systematic methods for finding them. The possibility of literature-based discovery implied by the above model underscores two important properties of sets of scientific articles: complementarity and noninteractivity. Two sets of articles are defined here as complementary if together they can reveal useful information not apparent in the two sets considered separately; two sets are defined as noninteractive if they are disjoint and if no article in either set cites, or is co-cited with, any member of the other set (Swanson, 1987, 1990a, 1991). The first three examples of “undiscovered public knowledge” (Swanson, 1986a, 1986b, 1988, 1990c) demonstrated that complementary noninteractive structures actually do exist within the biomedical literature and can lead to the discovery of apparently new and interesting implicit relationships. In at least two of these cases (Swanson, 1986a, 1988) the hypothesis was subsequently corroborated experimentally by medical researchers. We have cited and discussed these corroborations elsewhere (Swanson, 1993; Smalheiser & Swanson, 1994). 
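
The "about 40 trillion" figure quoted above follows from counting unordered pairs among roughly 9 million records, n(n - 1)/2. A quick check, assuming exactly 9,000,000 records for the sake of the arithmetic:

```python
import math

records = 9_000_000            # approximate Medline size quoted in the text
pairs = math.comb(records, 2)  # unordered pairs: n * (n - 1) / 2
print(f"{pairs:,}")            # 40,499,995,500,000
```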
The hypothesis advanced in Swanson (1990c), that the anabolic effects of arginine are brought about by systemic or local release of somatomedin C, has also received direct supporting evidence in three recent studies (see Kirk, 1993; Hurson, 1995; Chevalley, 1998); a fourth study by Corpas (1993) reported negative results. Gordon and Lindsay (1996) re-examined, replicated, and extended Swanson’s work (1986a). The above structures were found through innovative, partially systematic, database search strategies (Swanson, 1989a, 1989b). Computer-assisted processing of the downloaded output enhanced the user’s ability to discover novel implicit relationships (Swanson, 1991). This software evolved into a system called Arrowsmith that processes article records downloaded from large bibliographic databases such as Medline. Text passages within database records provide the raw material that suggests or points to underlying linkages (such as A-B and B-C above) between separately published scientific findings or arguments. Our goal has been to create a research tool for studying complementary noninteractive structures in the scientific literature and at the same time to create a working system useful to biomedical scientists (Swanson, 1991; Swanson & Smalheiser, 1997; Smalheiser & Swanson, 1998b). With the help of Arrowsmith, we have developed five additional examples of complementary noninteractive literature structures (Swanson & Smalheiser, 1997; Smalheiser & Swanson, 1994, 1996a, 1996b, 1998a), each of which led to a novel, plausible, and testable medical hypothesis. One of these studies (Smalheiser & Swanson, 199th) elicited publication of a concurring letter from an author whose work was the basis for a new hypothesis that we proposed (Ross, 1998). THE PROCESS OF INFERRING TEXT LINKAGES Given two Medline titles that appear to be linked, the process of inferring a biologically meaningful linkage may be more subtle than it seems at first sight. 
We consider here examples taken from Swanson (1988): 1. “The Relation of Migraine and Epilepsy” (p. 551) 2. “Preliminary Report: The Magnesium-Deficient Rat as a Model of Epilepsy” (p. 556). The two titles taken together appear to provide a link, via epilepsy, between migraine and magnesium deficiency (epilepsy being just one of the eleven links reported). The role of Arrowsmith in this example is only to bring the two titles together in order to create a suggestive juxtaposition. Whether the relationship thus revealed might merit further investigation then depends on human judgment. Such judgment in general would be difficult to replace by a computer procedure, for it almost inevitably entails certain background knowledge, context, and presuppositions that are commonly, though perhaps not always consciously, brought to bear by the user. For example, the word “model” in the second title is understood against a substantive background of information about animal models of human disease, and in that context implies that magnesium deficiency causes a disorder resembling epilepsy in the rat. Several hundred analogous title pairs were examined in the course of the migraine-magnesium study, for most of which the linkage was less obvious than in the case above. The user often must make just an educated guess as to which leads are most promising (Swanson, 1991). The problem we identify in this example therefore is not how or whether to draw an inference about the possible effect of magnesium on migraine, given the above two titles, but rather how these two titles (or Medline records), and other pairs analogous to them, could have been found and brought together in the first place without knowing in advance about any specific link such as epilepsy. That task cannot be done using only a conventional Medline search. 
However, if one first uses Medline to form a local file consisting of all titles with “migraine,” and a second file that consists of all titles with “magnesium,” then a straightforward computer procedure can produce a list of all words common to the two sets of titles. “Epilepsy” would be on the list. One can think of this procedure, which Arrowsmith takes as its point of departure, as a “higher order Medline search.” Arrowsmith then automatically filters out noninteresting words (by means of an exclusion list, or stoplist, compiled in advance and built into the system), makes certain morphological transformations (such as plural to singular), constructs and matches phrases, and otherwise exploits information from the Medline record to juxtapose pairs of text passages for the user to consider as possibly complementary. (Arrowsmith can process abstracts as well as titles but, for files of more than 1,000 or so records, it is more efficient and more effective to search, download, and subsequently examine just titles. The restricted context makes it easy to see and assess the A-B relationships when both A and B are in a title and similarly for B-C.) Any inferences about the significance or nature of the linkage between the above two titles, once they have been brought together, are left to the user. Arrowsmith, by creating suggestive juxtapositions of database records, is an aid to scientific discovery but not in itself a mechanism of scientific discovery. AUTOMATIC GENERATION OF A CANDIDATE LIST FOR A Arrowsmith can also do more than help uncover linkages between an initially given A and C. Assume that at the outset only C, the disease under investigation, is given, and the user does not have in mind a specific hypothesis for A (an agent that might act as cause or cure). Then, instead of a specific A, a broad category (AA) may be chosen; such a choice can be simple and effective. 
In general, categories of exogenous substances that may enter the body and might conceivably have beneficial or adverse effects on C are of interest. Especially important are dietary factors (or deficiencies), toxins, and categories of pharmaceutical agents or their targets (Swanson, 1991). Arrowsmith can then begin with Medline files for C and AA and from these derive a list of specific candidates for A. For example, Arrowsmith was able to start with pre-1988 literature on “migraine” as C, use a category based on dietary or deficiency fac

107 citations
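
The "higher order Medline search" described in the entry above (two title files, a stoplist, a plural-to-singular step, then the words common to both files) can be sketched roughly as follows. This is not the Arrowsmith implementation: the stoplist contents, the crude singularisation rule, and the names `title_terms` and `candidate_links` are assumptions made for illustration. The two titles are the ones quoted in the text.

```python
import re

# Hypothetical stoplist; the real Arrowsmith list is not reproduced here.
STOPLIST = {"the", "of", "and", "a", "as", "in", "on", "with", "report",
            "relation", "model", "preliminary", "study"}

def title_terms(title):
    """Lowercase a title, split it into words, and drop stoplist words."""
    terms = set()
    for word in re.findall(r"[a-z]+", title.lower()):
        # Crude plural-to-singular step: strip one trailing "s" from longer words.
        if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
            word = word[:-1]
        if word not in STOPLIST:
            terms.add(word)
    return terms

def candidate_links(a_titles, c_titles, a_term, c_term):
    """Return the B-terms shared by both title files, with the titles they join."""
    a_index, c_index = {}, {}
    for title in a_titles:
        for term in title_terms(title):
            a_index.setdefault(term, []).append(title)
    for title in c_titles:
        for term in title_terms(title):
            c_index.setdefault(term, []).append(title)
    shared = (set(a_index) & set(c_index)) - {a_term, c_term}
    return {term: (a_index[term], c_index[term]) for term in sorted(shared)}

magnesium_titles = [
    "Preliminary Report: The Magnesium-Deficient Rat as a Model of Epilepsy",
]
migraine_titles = ["The Relation of Migraine and Epilepsy"]

links = candidate_links(magnesium_titles, migraine_titles, "magnesium", "migraine")
for term, (a_side, c_side) in links.items():
    print(term, "|", a_side[0], "<->", c_side[0])
```

As in the text, the sketch only juxtaposes titles; deciding whether a shared term such as "epilepsy" marks a meaningful link is left to the reader.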


Journal Article
TL;DR: A specific path is described starting in economics and ending in astrophysics traversing 331 documents, with special attention given to where the path crosses disciplinary boundaries and how analogy can be used to model the thought processes involved in such transitions.
Abstract: A METHODOLOGY IS PRESENTED FOR CREATING pathways through the scientific literature following strong co-citation links. A specific path is described starting in economics and ending in astrophysics traversing 331 documents. Special attention is given to where the path crosses disciplinary boundaries and how analogy can be used to model the thought processes involved in such transitions. Implications of information pathways for retrieval, the unity of science, discovery, epistemology, and evaluation are discussed. [Henry Small, ISI, 3501 Market Street, Philadelphia, PA 19104. LIBRARY TRENDS, Vol. 48, No. 1, pp. 72-108. © 1999 The Board of Trustees, University of Illinois.] INFORMATION RETRIEVAL AND INFORMATION TRANSITIONS A great deal of information science is concerned with retrieving all the documents from a database that precisely match a user’s query. In this magic bullet model of information retrieval, the documents retrieved will ideally be homogeneous in character. Such an ideal is, of course, rarely achieved. In practice, a wide array of documents of varying relevance is retrieved, resembling more an ecology of information than a uniform set. Less often under consideration is how to understand the diversity and breadth of information that most queries generate, how one topic relates to another, or the transitions from one document to another. Questions such as these naturally arise for large samples of documents and especially multidisciplinary databases. For example, a user interested in a topic such as asthma might retrieve a large number of hits and find that some deal with treatment options, age factors, psychological aspects, hereditary tendencies, environmental factors, and so on. The question is how to make sense of this diversity. 
One reason questions of subject diversity do not come up more often is the tacit assumption that topics or subjects are relatively isolated and distinct from one another, each representing a more or less separate homogeneous entity. Another reason is the assumption that users’ information needs are simple and highly specific. This contrasts with the view that information seeking is more like a gradually unfolding discovery process in which the initial query is only the first step in a long journey, each step depending on what came before (Kuhlthau, 1999). INFORMATION TRANSITIONS AND THE UNITY OF SCIENCE Earlier discussions of the unity of science (Neurath, 1938) or its modern incarnation in E.O. Wilson’s (1998) consilience, view scientific knowledge as an interconnected fabric of fields and disciplines. In the sociology of science, it is commonplace to say that a great deal of scientific and technological innovation takes place at the boundaries between disciplines (Lemaine et al., 1976) or by individuals who have crossed from one field to another. Cross-fertilization of fields is another term for this, when an idea in one field finds fertile ground in a neighboring field (Crane, 1972). Information scientists have begun to explore these issues by attempting to find unconnected subject areas which, if connected, might yield new discoveries (Swanson & Smalheiser, 1997). Attempts to visualize information spaces also address subject connections since a visualization must depict the relationships among diverse sets of documents (White & McCain, 1997). It seems likely that future information retrieval systems based on the visual paradigm will have the equivalent of road signs telling the user what direction to travel to reach a particular topic. CITATIONS AND THE STRUCTURE OF SCIENCE One of the best ways of studying the connectedness of information is to use reference or citation links. 
While connections can also be established by shared vocabulary or indexing terms, a citation link represents a more direct author-selected dependency. By taking a wide-ranging sample of documents across many fields, the unity of scientific information can be examined from a global perspective. Vannevar Bush’s (1945) idea of associative information trails is a natural consequence of the unity of science and the connectedness of knowledge. Hummon and Doreian (1989) attempted to demonstrate this on a small scale by finding a critical path through a DNA citation network. Path analysis has more recently been undertaken for documents in the area of hypertext research using author co-citations (Chen & Carr, 1999). Taking citation links as the basis of a structural analysis of science, it is natural to suppose that it would be possible to travel from any topic or field to any other (Small, 1999) just as in the world of the Internet we might follow a series of hypertext links to reach any desired Web site. In the abstract, this is equivalent to traversing a network, but there is no guarantee the structure is in fact connected. In science, citations are very unevenly distributed, concentrating in narrowly defined pockets which correspond roughly to specialties or invisible colleges of researchers (Small & Griffith, 1974). The boundaries of these regions of high density are not well defined, however. Yet the most interesting links in the chain from one end of science to the other are those which cross disciplinary boundaries. Interdisciplinary links represent a kind of intellectual leap from one domain to another. In the world of citation analysis, strong links can be established by frequent patterns of co-citation (Small, 1973) or bibliographic coupling (Kessler, 1963). Co-citation links are a second order form of citation linkage that depends on the joint citing of two earlier documents by later documents. 
Unlike direct citation links, co-citations are nondirectional and can be weighted by frequency of occurrence. By simple “thresholding,” it is possible to identify regions of high co-citation density. Thresholding is in fact equivalent to the method of clustering called “single-linkage” (Hartigan, 1975). In a map based on co-citation clusters, an interdisciplinary link can occur when an author co-cites across the boundary of two disciplinary clusters. If the author cites predominantly into one cluster, as is often the case, the interdisciplinary co-citation reaches out beyond the author’s home cluster (see Figure 1). This reaching out or stretching can import or export methods, ideas, models, or empirical results from the author’s field to the other field. This is an act requiring a broad awareness of literature plus the creative imagination to see how the outside information fits with the author’s problem domain. The author of such a paper is going out on a limb to integrate ideas from another discipline. The objective of the present study is to examine the nature of the connections that tie the scientific literature together, focusing particularly on links crossing disciplinary boundaries. The question is whether interdisciplinary transitions are gradual or abrupt or based on shared features, analogies, creative insights, or perhaps even questionable assumptions: in short, how far the author had to stretch to make the connection. In another sense it is an examination of the creative process of moving from one domain of knowledge to another. If citation relationships capture authors’ decisions or selections on what documents are relevant to a problem, paths that follow citation links may in some sense capture steps in problem-solving behavior, logical thinking, or intuition.

54 citations
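
The co-citation counting and "thresholding" described in the entry above can be sketched as follows. The sketch assumes each citing paper is available as a list of the earlier documents it references; the document labels and the function names `cocitation_counts` and `threshold_clusters` are hypothetical. Keeping only links at or above a threshold and taking connected components yields the same groups as cutting a single-linkage clustering at that level.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(reference_lists):
    """Count how often each pair of earlier documents is cited together."""
    counts = Counter()
    for refs in reference_lists:
        for pair in combinations(sorted(set(refs)), 2):
            counts[pair] += 1
    return counts

def threshold_clusters(counts, threshold):
    """Group documents joined by co-citation links at or above the threshold."""
    neighbours = {}
    for (a, b), n in counts.items():
        if n >= threshold:
            neighbours.setdefault(a, set()).add(b)
            neighbours.setdefault(b, set()).add(a)
    clusters, seen = [], set()
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Depth-first walk over the strong links reachable from this document.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters

# Hypothetical reference lists: E* and A* stand for two different fields.
citing_papers = [
    ["E1", "E2"],
    ["E1", "E2", "E3"],
    ["E2", "E3"],
    ["E3", "A1"],   # one paper reaching across the boundary
    ["A1", "A2"],
    ["A1", "A2"],
]
counts = cocitation_counts(citing_papers)
clusters = threshold_clusters(counts, threshold=2)
print(clusters)
```

At threshold 2 the two fields separate; lowering the threshold to 1 lets the single cross-field co-citation merge them, which is the kind of boundary-crossing link the article examines.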


Journal Article
TL;DR: To date, very little is known about the usefulness of the access provided by content-based systems, and more work needs to be done on user needs and satisfaction with these systems.
Abstract: CONVENIENT IMAGE CAPTURE TECHNIQUES, inexpensive storage, and widely available dissemination methods have made digital images a convenient and easily available information format. This increased availability of images is accompanied by a need for solutions to the problems inherent in indexing them for retrieval. Unfortunately, to date, very little information has been available on why users search for images, how they intend to use them, as well as how they pose their queries, though this situation is being remedied as a body of research begins to accumulate. New image indexing methods are also being explored. Traditional concept-based indexing uses controlled vocabulary or natural language to express what an image is or what it is about. Newly developed content-based techniques rely on a pixel-level interpretation of the data content of the image. Concept-based indexing has the advantage of providing a higher-level analysis of the image content but is expensive to implement and suffers from a lack of interindexer consistency due to the subjective nature of image interpretation. Content-based indexing is relatively inexpensive to implement but provides a relatively low level of interpretation of the image except in fairly narrow and applied domains. To date, very little is known about the usefulness of the access provided by content-based systems, and more work needs to be done on user needs and satisfaction with these systems. An examination of a number of image database systems shows

48 citations
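
As a rough illustration of what a pixel-level, content-based feature can look like, the sketch below builds a quantised colour histogram and compares two images by histogram intersection. The abstract above does not name this technique; it is swapped in here as one common example, and the function names and toy pixel data are assumptions.

```python
def colour_histogram(pixels, bins=4):
    """Return a normalised histogram of quantised (r, g, b) values.

    `pixels` is a non-empty list of (r, g, b) tuples with values 0-255.
    """
    step = 256 // bins
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        index = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[index] += 1
    total = len(pixels)
    return [count / total for count in hist]

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means identical colour distributions."""
    return sum(min(a, b) for a, b in zip(h1, h2))

# Toy "images" given directly as pixel lists.
red = [(255, 0, 0)] * 4
mostly_red = [(250, 10, 5)] * 3 + [(0, 0, 255)]
blue = [(0, 0, 255)] * 4

print(histogram_intersection(colour_histogram(red), colour_histogram(mostly_red)))
print(histogram_intersection(colour_histogram(red), colour_histogram(blue)))
```

A feature like this is cheap to compute, but it says nothing about what the image depicts, which is the low level of interpretation the abstract attributes to content-based indexing.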


Journal Article
TL;DR: The functions of myth are looked at and ideas about the information society, information, and information overload are brought together to conclude that information overload is a myth of modern culture.
Abstract: LIBRARY AND INFORMATION SCIENCE WORK has often focused on the study of solutions to the effects of information overload. For this reason, and because the concept is frequently identified as a problem in popular culture, it is logical to assume that the existence and description of information overload has been documented through rigorous investigation. Such is not the case. This article looks at the functions of myth and brings together ideas about the information society, information, and information overload to conclude that information overload is a myth of modern culture. In this sense, myth is a “nonscientific” process that confirms the reality of an elusive phenomenon. The article also reports results of a pilot project intended to describe information overload experienced by a particular folk group composed of future library and information professionals. In addition to trying to enhance the description of information overload, the pilot project represents an attempt to test the idea of the folk group as a remedy for this condition.

46 citations


Journal Article
TL;DR: The perception that humanists are less than enamored with technology when compared to their peers in other disciplines is investigated.
Abstract: This article investigates the perception that humanists are less than enamored with technology when compared to their peers in other disciplines. Using focus group interviews with humanities faculty at an east coast university, the article examines and analyzes their access to technology, their technological skill and interest, their concerns about digitized texts and art works, their views on the digital library of the future, and the value of technology to their research and teaching.

42 citations


Journal Article
TL;DR: The rationale and environment of the development and applications of knowledge discovery in databases (KDD), and issues related to database design and collection, are reviewed.
Abstract: KNOWLEDGE DISCOVERY IN DATABASES (KDD) revolves around the investigation and creation of knowledge, processes, algorithms, and the mechanisms for retrieving potential knowledge from data collections. Related issues include data collection, database design, the description of entries in the database using the most appropriate representation, and data quality. This article is an introductory overview of knowledge discovery in databases. The rationale and environment of its development and applications are discussed. Issues related to database design and collection are reviewed.

42 citations


Journal Article
TL;DR: Drawing upon an efficacious method for discovering previously unknown causes of medical syndromes, and searching in Humanities Index, a periodical index included in WILS, the Wilson Database, an illuminating new humanities analogy was found by constructing a search statement in which proper names were correlated with associated concepts.
Abstract: Voluminous databases contain hidden knowledge, i.e., literatures that are logically but not bibliographically linked. Unlinked literatures containing academically interesting commonalities cannot be retrieved via normal searching methods. Extracting hidden knowledge from humanities databases is especially problematic because the literature, written in “everyday” rather than technical language, lacks precision required for efficient retrieval, and because humanities scholars seek new analogies rather than causes. Drawing upon an efficacious method for discovering previously unknown causes of medical syndromes, and searching in Humanities Index, a periodical index included in WILS, the Wilson Database, an illuminating new humanities analogy was found by constructing a search statement in which proper names were coupled with associated concepts.

41 citations


Journal Article
TL;DR: This article explores the differences between the level of adoption of information resources by selected faculty and their responses to these technologies, the impact of library technology on the way they use the library for research and teaching, and their interpretation of the role the library plays in this period of transition and change.
Abstract: ACADEMIC LIBRARIES HAVE MADE A SIGNIFICANT investment in electronic information resources and associated computer-based technologies so that their users can gain access to those resources and services. The faculty response to the increase in these library technologies is not always known. Using an essential element from the theory of the diffusion of innovations (that individuals adopt innovations at different rates), the authors conducted a series of focus group sessions and personal interviews with university faculty to discover their attitudes regarding the computer-based information resources that academic libraries provide to meet their information needs. This article explores the differences between the level of adoption of information resources by selected faculty and their responses to these technologies, the impact of library technology on the way they use the library for research and teaching, and their interpretation of the role the library plays in this period of transition and change.

41 citations


Journal Article
TL;DR: This discussion will focus on the problems of image retrieval identified in current research projects, report on an evaluation project in process, and propose a framework for evaluation studies of image retrieval systems that emphasizes the role of user feedback.
Abstract: INTELLECTUAL ACCESS TO A GROWING NUMBER OF NETWORKED image repositories is but a small part of the much larger problem of intellectual access to new information formats. As more and more information becomes available in digital formats, it is imperative that we understand how people retrieve and use images. Several studies have investigated how users search for images, but there are few evaluation studies of image retrieval systems. Preliminary findings from research in progress indicate a need for improved browsing tools, image manipulation software, feedback mechanisms, and query analysis. Comparisons are made to previous research results from a study of intellectual access to digital art images. This discussion will focus on the problems of image retrieval identified in current research projects, report on an evaluation project in process, and propose a framework for evaluation studies of image retrieval systems that emphasizes the role of user feedback.

Journal Article
TL;DR: This article briefly reviews template mining research and shows how templates are used in Web search engines (such as Alta Vista) and in meta-search engines (such as Ask Jeeves) for helping end-users generate natural language search expressions.
Abstract: WITH THE RAPID GROWTH OF DIGITAL INFORMATION RESOURCES, information extraction (IE), the process of automatically extracting information from natural language texts, is becoming more important. A number of IE systems, particularly in the areas of news/fact retrieval and in domain-specific areas, such as in chemical and patent information retrieval, have been developed in the recent past using the template mining approach that involves a natural language processing (NLP) technique to extract data directly from text if the data, the text surrounding the data, or both form recognizable patterns. When text matches a template, the system extracts data according to the instructions associated with that template. This article briefly reviews template mining research. It also shows how templates are used in Web search engines (such as Alta Vista) and in meta-search engines (such as Ask Jeeves) for helping end-users generate natural language search expressions. Some potential areas of application of template mining for extraction of different kinds of information from digital documents are highlighted, and how such applications are used is indicated. It is suggested that, in order to facilitate template mining, standardization in the presentation and layout of information within digital documents has to be ensured, and this can be done by generating various templates that authors can easily download and use while preparing digital documents.
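
The idea that data is extracted when the surrounding text forms a recognizable pattern can be shown with a small sketch. The template below (an appointment announcement), its slot names, and the sample sentence are all hypothetical; the systems reviewed in the article use their own, more elaborate templates.

```python
import re

# Hypothetical template: "<Company> appointed <Person> as <Post>".
APPOINTMENT = re.compile(
    r"(?P<company>[A-Z][\w&]*(?: [A-Z][\w&]*)*) appointed "
    r"(?P<person>[A-Z][a-z]+(?: [A-Z][a-z]+)+) as "
    r"(?P<post>[a-z]+(?: [a-z]+)*)"
)

def extract(text):
    """Return one dict of slot fillers for every match of the template."""
    return [match.groupdict() for match in APPOINTMENT.finditer(text)]

news = ("Acme Holdings appointed Jane Smith as chief executive. "
        "Shares rose on the news.")
records = extract(news)
print(records)
```

Text that does not fit the pattern, such as the second sentence, yields nothing, which is why the abstract argues that standardized presentation and layout would make template mining easier.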

Journal Article
TL;DR: This article reviews research on how people use mental models of images in an information retrieval environment, which can aid a researcher in designing new systems and help librarians select systems that best serve their patrons.
Abstract: THIS ARTICLE REVIEWS RESEARCH ON HOW people use mental models of images in an information retrieval environment. An understanding of these cognitive processes can aid a researcher in designing new systems and help librarians select systems that best serve their patrons. There are traditionally two main approaches to image indexing: concept-based and content-based (Rasmussen, 1997). The concept-based approach is used in many production library systems, while the content-based approach is dominant in research and in some newer systems. In the past, content-based indexing supported the identification of “low-level” features in an image. These features frequently do not require verbal labels. In many cases, current computer technology can create these indexes. Concept-based indexing, on the other hand, is a primarily verbal and abstract identification of “high-level” concepts in an image. This type of indexing requires the recognition of meaning and is primarily performed by humans. Most production-level library systems rely on concept-based indexing using keywords. Manual keyword indexing is, however, expensive and introduces problems with consistency. Recent advances have made some content-based indexing practical. In addition, some researchers are working on machine vision and pattern recognition techniques that blur the line between concept-based and content-based indexing. It is now possible to produce computer systems that allow users to search simultaneously on aspects of both concept-based and content-based indexes. The intelli

Journal Article
TL;DR: This discussion looks at the changing character of education for librarianship in the Information Age, emphasizing faculty and students in the emerging curriculum.
Abstract: The information profession proclaims itself to be a proponent of both the Information Age and of equity for women and people of color. Yet certain features of the Information Age appear to be inhospitable to the goals of gender equity and there is a long history of gender stratification, with men favored for top positions in the profession. Structural changes brought about by the Information Age may foreshadow a resurgence of inequity. This discussion looks at the changing character of education for librarianship in the Information Age, emphasizing faculty and students in the emerging curriculum. Relative support for Library Science and for Information Science courses, measured using faculty distribution in the two areas, is examined.

Journal Article
TL;DR: An analysis of fifty folktales from different cultures reveals that, while the details of orphan stories vary, there are some universal elements.
Abstract: ORPHAN HEROES AND HEROINES ARE familiar characters in children’s literature, particularly in the fiction of the nineteenth and early twentieth century. This type of protagonist has its roots in folktales. An analysis of fifty folktales from different cultures reveals that, while the details of orphan stories vary, there are some universal elements. A comparison of these patterns to a literary orphan story, The Secret Garden, demonstrates how the patterns found in orphan folktales were adapted and applied in children’s fiction.

Journal Article
TL;DR: It is concluded that the wide availability of complete text in electronic form does not reduce the value of abstracts for information retrieval activities, even in more sophisticated applications such as knowledge discovery.
Abstract: VARIOUS LEVELS OF CRITERIA FOR JUDGING the quality of abstracts and abstracting are presented. Requirements for abstracts to be read by humans are compared with requirements for those to be searched by computer. It is concluded that the wide availability of complete text in electronic form does not reduce the value of abstracts for information retrieval activities, even in more sophisticated applications such as knowledge discovery.

Journal Article
TL;DR: The concept of the virtual library is introduced and how the increasing reliance on computers and digital information has affected library users and staff is explored.
Abstract: This article introduces the concept of the virtual library and also explores how the increasing reliance on computers and digital information has affected library users and staff. In particular, technology has created an expectation for full-text information delivered to the desktop at the user's convenience. As the technologies used by libraries have evolved, library jobs, organizational structures, and working conditions have changed. Some facets of the virtual library present challenges to the intellectual, social, and physical needs of adults and children. In the midst of technological change, the traditional library mission of service and access is still relevant, and librarians are needed more than ever before to help users cope with changing technologies and to humanize the virtual library.

Journal Article
TL;DR: In this article, a method for extracting maximal frequent sequences in a set of documents is presented; a maximal frequent sequence is a sequence of words that is frequent in the document collection and, moreover, is not contained in any other longer frequent sequence.
Abstract: AS ONE APPROACH TO ADDRESS THE NEW INFORMATION needs caused by the increasing amount of available digital data, the notion of knowledge discovery has been developed. Knowledge discovery methods typically attempt to reveal general patterns and regularities in data instead of specific facts, the kind of information that is hardly possible for any human being to find. In this article, a method for extracting maximal frequent sequences in a set of documents is presented. A maximal frequent sequence is a sequence of words that is frequent in the document collection and, moreover, that is not contained in any other longer frequent sequence. A sequence is considered to be frequent if it appears in at least n documents when n is the frequency threshold given. Frequent maximal sequences can be used, for instance, as content descriptors for documents: a document is represented as a set of sequences, which can then be used to discover other regularities in the document collection. As the sequences are frequent, their combination of words is not accidental. Moreover, a sequence has exactly the same form in many documents, providing a possibility to do similarity mappings for information retrieval, hypertext linking, clustering, and discovery of frequent co-occurrences. A set of sequences, particularly the longer ones, as such may also give a concise summary of the topic of the document. INTRODUCTION The research field of knowledge discovery in databases (or data mining) has in the last years produced methods for finding patterns and
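The definition in this abstract (frequent in at least n documents, and not contained in any longer frequent sequence) can be made concrete with a brute-force sketch. This is not the article's algorithm: it assumes sequences are contiguous runs of words, caps their length at a hypothetical `max_len`, and enumerates every candidate, which is only practical for toy collections. All function names are illustrative.

```python
from collections import defaultdict


def frequent_sequences(docs, threshold, max_len=5):
    """Contiguous word sequences that occur in at least `threshold` documents."""
    doc_ids = defaultdict(set)
    for doc_id, doc in enumerate(docs):
        words = doc.lower().split()
        for length in range(1, max_len + 1):
            for start in range(len(words) - length + 1):
                doc_ids[tuple(words[start:start + length])].add(doc_id)
    return {seq for seq, ids in doc_ids.items() if len(ids) >= threshold}


def contains(longer, shorter):
    """True if `shorter` appears as a contiguous run inside `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def maximal_frequent_sequences(docs, threshold, max_len=5):
    """Frequent sequences not contained in any longer frequent sequence."""
    frequent = frequent_sequences(docs, threshold, max_len)
    return {
        s for s in frequent
        if not any(len(t) > len(s) and contains(t, s) for t in frequent)
    }


if __name__ == "__main__":
    docs = [
        "knowledge discovery in databases is useful",
        "methods for knowledge discovery in databases",
        "text retrieval",
    ]
    print(maximal_frequent_sequences(docs, threshold=2))
```

Because of the length cap, a sequence reported here is maximal only among sequences of at most `max_len` words.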

Journal Article
TL;DR: This article describes selected aspects of LC’s practical experience and current practices from digital capture through interactions with users, with an emphasis on the integration of access to pictorial images online with other services and activities at LC.
Abstract: OVER THE LAST FEW YEARS, THE LIBRARY OF CONGRESS (LC) has increasingly created digital reproductions of visual materials to enhance access to its resources. Digitization is now a mainstream activity in the Prints and Photographs Division (P & P) and the Geography and Maps Division (G & M). Both divisions work closely with the National Digital Library Program to make their incomparable resources accessible over the Internet to the general public through the American Memory Web site (http://memory.loc.gov/). They also use the digital images to serve their more traditional clientele in the reading rooms. Retrieval from a collection of digital images offers special opportunities to apply new technological advances, as illustrated elsewhere in this issue. However, retrieval often takes place in broader contexts. The Prints and Photographs Division seeks to enhance access to its international pictorial holdings, whether digitized or not. Within American Memory, the focus is on retrieval by the nonspecialist from a body of materials related to the history and culture of the United States, materials heterogeneous in both original and digital form. A yet broader context is retrieval from the comprehensive collections of the entire Library of Congress. Beyond enabling retrieval, LC is concerned with facilitating use of the materials retrieved, consistent with any associated rights. This article describes selected aspects of LC’s practical experience and current practices from digital capture through interactions with users, with an emphasis on the integration of access to pictorial images online with other services and activities at LC.

Journal Article
TL;DR: The MESL project’s methods and findings are reviewed, including descriptive metadata, database design, interface design, and tools for use, and more recent development efforts in extending the model for digital image delivery of visual resources to higher education audiences are discussed.
Abstract: FROM 1995 THROUGH 1997, SEVEN CULTURAL HERITAGE repositories and seven universities collaborated on an extensive demonstration project called the Museum Educational Site Licensing Project (MESL) to explore the administrative, technical, and pedagogical issues involved in making digital museum images and information available to educational audiences. This article reviews the MESL project’s methods and findings in a number of areas: descriptive metadata, database design, interface design, and tools for use. It discusses more recent development efforts in extending the model for digital image delivery of visual resources to higher education audiences. Finally, it suggests how to proceed by posing a number of user-centered questions about the design goals for networked access to the vast visual resources of the cultural heritage community. Selected projects from the literature of computer and information science are discussed to stimulate thinking about avenues for research and to focus project design goals.

Journal Article
TL;DR: The purpose of this study is to construct a baseline of scholarly use of internet-based electronic resources (e-sources) by surveying a group of library and information science (LIS) scholars, and suggestions for improving scholars' use of e-sources for research are offered.
Abstract: The purpose of this study is to construct a baseline of scholarly use of internet-based electronic resources (e-sources) by surveying a group of library and information science (LIS) scholars. Results reported here include researchers' demographic information, frequency of use of various Internet tools and resources, ways of accessing various Internet tools and applications, strategies of locating e-sources for research, opinions on citing e-sources, evaluation of e-sources, and suggestions for improving scholars' use of e-sources for research.

Journal Article
TL;DR: Computer vision offers a variety of techniques for searching for pictures in large collections: appearance methods compare images based on their overall content, while finding methods concentrate on matching subparts of images in the hope of finding particular objects.
Abstract: Very large collections of images are now common. Indexing and searching such collections using indexing languages is difficult. Computer vision offers a variety of techniques for searching for pictures in large collections. Appearance methods compare images based on the overall content of the image using such criteria as similarity of color histograms, texture histograms, spatial layout, and filtered representations. Finding methods concentrate on matching subparts of images, defined in a variety of ways, in the hope of finding particular objects. These ideas are illustrated with a variety of examples from the current literature.
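One of the appearance criteria named in this abstract, similarity of color histograms, can be sketched as follows. The abstract does not say which similarity measure is used; histogram intersection is chosen here as a common option. The sketch assumes an image is a non-empty list of `(r, g, b)` tuples with 8-bit channel values, and the function names and bin count are illustrative.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalized joint RGB histogram of a non-empty list of (r, g, b) pixels."""
    hist = [0.0] * bins_per_channel ** 3
    for r, g, b in pixels:
        # Map each 0-255 channel value to a bin index in [0, bins_per_channel).
        rb = r * bins_per_channel // 256
        gb = g * bins_per_channel // 256
        bb = b * bins_per_channel // 256
        hist[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1
    total = len(pixels)
    return [count / total for count in hist]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means the normalized histograms are identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))


if __name__ == "__main__":
    red = [(255, 0, 0)] * 4
    blue = [(0, 0, 255)] * 4
    mixed = [(255, 0, 0)] * 2 + [(0, 0, 255)] * 2
    print(histogram_intersection(color_histogram(red), color_histogram(mixed)))
```

A histogram ignores where colors appear in the image, which is one reason the abstract lists spatial layout as a separate criterion.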

Journal Article
TL;DR: A survey of library support staff perceptions and opinions about technological change was conducted early in 1998 and the results are compared to the results of a similar survey administered to the same population in 1988.
Abstract: A SURVEY CONCERNING UNIVERSITY-LIBRARY support-staff perceptions and opinions about technological change was conducted early in 1998. The results are compared to the results of a similar survey administered to the same population in 1988. The evolving perceptions, opinions, and suggestions of this educated and highly experienced group of library personnel are offered as a resource for better planning of library automation and for the improvement of the library as workplace. INTRODUCTION Support staff, librarians, and administrators working together amicably, even enthusiastically, toward cooperatively created goals emanating from a cooperatively written vision statement, could quite possibly find solutions to some of the major challenges now facing libraries, many of which are related to technological change. Changes in the kinds of tools we use at work and the kinds of resources we have available are catalysts for new philosophies, new concepts of service, new designs for our workday, and new feelings, positive and negative, about our work. Change in the magnitude we are now experiencing is almost sure to cause turbulence. Collegial understanding among all of the members of a library staff, if carefully fostered, can certainly minimize trouble and maximize the many strengths available to make technological transitions smoother. A questionnaire distributed in 1988 was designed to study the perceptions of library support staff concerning new technologies that were

Journal Article
TL;DR: There is an obvious need for continued research, evaluation, and planning if museums and archives are committed to protecting their digital image assets from a number of potential threats.
Abstract: THERE IS AN OBVIOUS NEED FOR ONGOING RESEARCH, evaluation, and planning if museums and archives are committed to protecting their digital image assets. A number of potential threats to the integrity of digital image information can be identified when standard practices in museums and archives are examined. Changes in the integrity of digital image information can be caused by the manner in which the source data are acquired and recorded and by modifications made to the image data file. Alterations made to contextual data can limit valid interpretation of the associated surrogate image. The destruction of the mechanisms that link contextual data to the appropriate digital image has the same effect as deleting contextual information. Loss of control over digital assets can be the result of failure or inability to establish and publicize copyright. Even if copyright is established and enforceable, failure to enforce rights has the same effect as having no rights at all. Finally, failure to detect corruption of digital information means that invalid, partial, or inappropriate information will be spread under the guise of authentic reliable information. Some institutions are already proactively applying security measures to digital image collections. Some of these security measures can have a negative impact on the integrity of the files that they are designed to protect. Systematic consideration of risk factors can inform the creation of procedures and application of security that works to guarantee the reliability and accuracy

Journal Article
TL;DR: Experiments show MARIE prototypes are more accurate than simpler methods, although the task is very challenging and more work is needed, and its processing is illustrated in detail on part of an Internet World Wide Web page.
Abstract: The MARIE project has explored knowledge-based information retrieval of captioned images of the kind found in picture libraries and on the Internet. It exploits the idea that images are easier to understand with context, especially descriptive text near them, but it also does image analysis. The MARIE approach has five parts: (1) find the images and captions; (2) parse and interpret the captions; (3) segment the images into regions of homogeneous characteristics and classify them; (4) correlate caption interpretation with image interpretation using the idea of focus; and (5) optimize query execution at run time. MARIE emphasizes domain-independent methods for portability at the expense of some performance, although some domain specification is still required. Experiments show MARIE prototypes are more accurate than simpler methods, although the task is very challenging and more work is needed. Its processing is illustrated in detail on part of an Internet World Wide Web page.

Journal Article
TL;DR: This article emphasizes the importance of studying folklore of information work environments in the context of the current shift toward removing work from any particular place via information systems, e-mail, and the Web.
Abstract: A FOLKLORE OF INFORMATION WORK ENVIRONMENTS adopts the holistic in-depth methods from the folklore of work and applies them to modern information workplaces. Like some other fields, folklore of spaces and artifacts takes the perspective that people's folk practices are a part of the things they interact with, that environment impacts people, and people impact their environment. Thus a folklore of space enfolds many of the research interests of diverse fields that deal with the modern work setting and the elements within that setting, from cubicle design to information systems. This article reviews literature from several bodies of research and attempts to bring them together in a projected folklore of information work space. It emphasizes the importance of studying folklore of information work environments in the context of the current shift toward removing work from any particular place via information systems, e-mail, and the Web. A deep understanding of the folklore of work space can give clues to the impact of this trend and can inform design of information systems and modern work environments.

Journal Article
TL;DR: Some of the progress made over the years toward exploring information retrieval beyond the text domain is reported, including visual feature extraction, retrieval models, query reformulation techniques, efficient execution speed performance, and user interface considerations.
Abstract: With the expansion of the Internet, searching for information goes beyond the boundary of physical libraries. Millions of documents of various media types (such as text, image, video, audio, graphics, and animation) are available around the world and linked by the Internet. Unfortunately, the state of the art of search engines for media types other than text lags far behind their text counterparts. To address this situation, we have developed the Multimedia Analysis and Retrieval System (MARS). This article reports some of the progress made over the years toward exploring information retrieval beyond the text domain. In particular, the following aspects of MARS are addressed in the article: visual feature extraction, retrieval models, query reformulation techniques, efficient execution speed performance, and user interface considerations. Extensive experimental results are reported to validate the proposed approaches.

Journal Article
TL;DR: We are witnessing an electronic revolution in the library which may prove to be a revolution in the humanities and even in the nature of learning and education, and we must hope that the central role of libraries in preserving these ideas will survive the electronic revolution.
Abstract: We are witnessing an electronic revolution in the library which may prove to be a revolution in the humanities and even in the nature of learning and education. Like many revolutions, it is salutary up to a point, but it tends to go beyond that point. In cyberspace, every source seems as authoritative as every other. The revolution tends to depreciate the book in hand and to incapacitate us for thinking about ideas rather than amassing facts. The humanities are an essentially human enterprise of which the record reposes in books in libraries; this is where we look for truth, knowledge, and wisdom. We must hope that the central role of libraries in preserving these ideas will survive the electronic revolution.


Journal Article
TL;DR: The author describes four successive library retreats held by the Bowling Green State University library staff from 1995 to 1998, which reflect the changes that have occurred in library work and in management theory since the early 1990s.
Abstract: The author describes four successive library retreats held by the Bowling Green State University library staff from 1995 to 1998. The retreats reflect the changes that have occurred in library work and in management theory since the early 1990s. The importance of technology to library work is recognized, but there is also a growing realization that developing a flexible staff, capable of learning new skills and willing to absorb the values of the organization, is really the key to maintaining a fully functioning and well-respected academic library.