Author

Timothy W. Cole

Bio: Timothy W. Cole is an academic researcher from the University of Illinois at Urbana–Champaign. The author has contributed to research in topics: Metadata & Digital library. The author has an h-index of 16 and has co-authored 95 publications receiving 1,077 citations. Previous affiliations of Timothy W. Cole include Martin Marietta Materials, Inc. & California Institute of Technology.


Papers
Journal ArticleDOI
TL;DR: The Digital Library Initiative project at the University of Illinois at Urbana-Champaign is developing the information infrastructure to effectively search technical documents on the Internet, constructing a large testbed of scientific literature, evaluating its effectiveness under significant use, and researching enhanced search technology.
Abstract: The Digital Library Initiative (DLI) project at the University of Illinois at Urbana-Champaign is developing the information infrastructure to effectively search technical documents on the Internet. The authors are constructing a large testbed of scientific literature, evaluating its effectiveness under significant use, and researching enhanced search technology. They are building repositories (organized collections) of indexed multiple-source collections and federating (merging and mapping) them by searching the material via multiple views of a single virtual collection. Developing widely usable Web technology is also a key goal. Improving Web search beyond full-text retrieval will require using document structure in the short term and document semantics in the long term. Their testbed efforts concentrate on journal articles from the scientific literature, with structure specified by the Standard Generalized Markup Language (SGML). Research efforts extract semantics from documents using the scalable technology of concept spaces based on context frequency. They then merge these efforts with traditional library indexing to provide a single Internet interface to indexes of multiple repositories.

100 citations
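
As a rough, hypothetical illustration of the federation idea in the abstract above (searching several repository indexes through a single virtual collection), the Python sketch below merges results from invented in-memory indexes. It is not the DLI testbed's implementation; the repository names, record fields, and scoring are all made up.

```python
# Hypothetical sketch of federating several repository indexes behind one
# search interface. Repository names and record fields are invented.

from dataclasses import dataclass

@dataclass
class Hit:
    repository: str   # which local index produced the hit
    doc_id: str       # identifier within that repository
    title: str
    score: float      # repository-local relevance score

def search_repository(name, index, query):
    """Naive keyword match against one repository's in-memory index."""
    terms = query.lower().split()
    hits = []
    for doc_id, title in index.items():
        score = sum(title.lower().count(t) for t in terms)
        if score > 0:
            hits.append(Hit(name, doc_id, title, float(score)))
    return hits

def federated_search(repositories, query):
    """Merge hits from every repository into one ranked result list."""
    merged = []
    for name, index in repositories.items():
        merged.extend(search_repository(name, index, query))
    # A real federator would normalize scores per source; here we just sort.
    return sorted(merged, key=lambda h: h.score, reverse=True)

if __name__ == "__main__":
    repositories = {
        "physics-journals": {"p1": "SGML markup of physics articles"},
        "engineering-journals": {"e1": "Full-text search of engineering literature"},
    }
    for hit in federated_search(repositories, "full-text search"):
        print(hit.repository, hit.doc_id, hit.score, hit.title)
```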

01 Jan 2005
TL;DR: A number of statistical characterizations of metadata samples drawn from a large corpus harvested through the Open Archives Initiative (OAI) Protocol for Metadata Harvesting (PMH) are presented and interpreted in relation to general quality dimensions and to metadata practices that occur at the local level.
Abstract: The federation of digital resources has become increasingly important in realizing the full potential of digital libraries. Federation is often achieved through the aggregation of descriptive metadata, therefore the decisions resource developers make for the creation, maintenance, and quality assurance of their metadata can have significant impacts on aggregators and service providers. Metadata may be of high quality within a local database or web site, but when it is taken out of this context, information may be lost or its integrity may be compromised. Maintaining consistency and fitness for purpose are also complicated when metadata are combined in a federated environment. A fuller understanding of the criteria for high quality, “shareable” metadata is crucial to the next step in the development of federated digital libraries. This study of metadata quality was conducted by the IMLS Digital Collections and Content (DCC) project team (http://imlsdcc.grainger.uiuc.edu/) using quantitative and qualitative analysis of metadata authoring practices of several projects funded through the Institute of Museum and Library Services (IMLS) National Leadership Grant (NLG) program. We present a number of statistical characterizations of metadata samples drawn from a large corpus harvested through the Open Archives Initiative (OAI) Protocol for Metadata Harvesting (PMH) and interpret these findings in relation to general quality dimensions and metadata practices that occur at the local level. We discuss the impact of these kinds of quality on aggregation and suggest quality control and normalization processes that may improve search and discovery services at the aggregated level.

95 citations
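
For readers unfamiliar with the harvesting step behind this study, the sketch below pulls simple Dublin Core records over OAI-PMH using only the Python standard library. The endpoint URL is a placeholder, and a production harvester would add error handling, resumption retries, and polite rate limiting; this is not the DCC project's harvester.

```python
# Minimal OAI-PMH harvest of oai_dc records (ListRecords verb), standard library only.

import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

def harvest_dc(base_url, metadata_prefix="oai_dc"):
    """Yield (identifier, {element: [values]}) for each harvested record."""
    token = None
    while True:
        params = {"verb": "ListRecords"}
        if token:
            params["resumptionToken"] = token
        else:
            params["metadataPrefix"] = metadata_prefix
        url = base_url + "?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url) as resp:
            root = ET.fromstring(resp.read())
        for rec in root.iter(OAI + "record"):
            header = rec.find(OAI + "header")
            identifier = header.findtext(OAI + "identifier")
            fields = {}
            for el in rec.iter():
                if el.tag.startswith(DC) and el.text:
                    fields.setdefault(el.tag[len(DC):], []).append(el.text.strip())
            yield identifier, fields
        token_el = root.find(f"{OAI}ListRecords/{OAI}resumptionToken")
        token = token_el.text if token_el is not None and token_el.text else None
        if not token:
            break

# Example (hypothetical endpoint):
# for oai_id, dc in harvest_dc("https://example.org/oai"):
#     print(oai_id, dc.get("title"))
```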

Journal ArticleDOI
TL;DR: The context and development of the Framework are described, the major principles articulated in the Framework are presented, and the paper concludes with remarks regarding the immediate impacts of the work accomplished by the IMLS Digital Library Forum.
Abstract: As technologies to digitize primary source content mature and become better understood, more widely accessible, and more efficient, the volume of available digital content increases and issues of integration and aggregation become more important. Today's digitization project managers must give high priority to factors such as reusability, persistence, interoperability, verification, and documentation when planning their projects. Digitization project funding agencies, like the Institute of Museum and Library Services (IMLS) and the National Science Foundation (NSF), must give substantial weight to these same factors when assessing programs and evaluating project proposals. A Digital Library Forum convened by the IMLS and working in collaboration with participants from the NSF's National Science, Mathematics, Engineering, and Technology Education Digital Library program has released a Framework of Guidance for Building Good Digital Collections to serve as a resource for practitioners and funding agencies. This Framework pays particular attention to digitization collection practices that facilitate integration and aggregation of digital information resources developed by museums, libraries, and similar institutions. To protect against obsolescence and to better accommodate the wide range of digitization projects funded by the IMLS, NSF, and other granting organizations, the Framework is not wedded to any particular set of standards or best practices. Rather it articulates principles fundamental to planning, implementation, and evaluation of digitization projects and links to specific resources and exemplary models that support and illustrate good application of these principles. This paper describes the context and development of the Framework, briefly presents the major principles articulated in the Framework, and concludes with remarks regarding the immediate impacts of the work accomplished by the IMLS Digital Library Forum and a call for the continued development and maintenance of the Framework.

86 citations

08 Feb 2013
TL;DR: The Open Annotation Core Data Model specifies an interoperable framework for creating associations between related resources, annotations, using a methodology that conforms to the Architecture of the World Wide Web.
Abstract: The Open Annotation Core Data Model specifies an interoperable framework for creating associations between related resources, annotations, using a methodology that conforms to the Architecture of the World Wide Web. Open Annotations can easily be shared between platforms, with sufficient richness of expression to satisfy complex requirements while remaining simple enough to also allow for the most common use cases, such as attaching a piece of text to a single web resource.An Annotation is considered to be a set of connected resources, typically including a body and target, where the body is somehow about the target. The full model supports additional functionality, enabling semantic annotations, embedding content, selecting segments of resources, choosing the appropriate representation of a resource and providing styling hints for consuming clients.

82 citations
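
To make the body/target structure concrete, the Python sketch below assembles one annotation as JSON-LD. The oa: class and property names follow the model as summarized above (Annotation, hasBody, hasTarget, SpecificResource, TextQuoteSelector), but the identifiers are placeholders and the exact serialization details should be checked against the specification.

```python
# Rough sketch of one Open Annotation serialized as JSON-LD.
# All @id values are placeholders; vocabulary terms are as recalled from the model.

import json

annotation = {
    "@context": {"oa": "http://www.w3.org/ns/oa#"},
    "@id": "http://example.org/anno1",            # placeholder annotation IRI
    "@type": "oa:Annotation",
    "oa:hasBody": {
        "@id": "http://example.org/note1",        # the body: a comment posted elsewhere
    },
    "oa:hasTarget": {
        "@type": "oa:SpecificResource",
        "oa:hasSource": {"@id": "http://example.org/page.html"},
        "oa:hasSelector": {
            "@type": "oa:TextQuoteSelector",
            "oa:exact": "a piece of text",        # the selected segment of the target
        },
    },
}

print(json.dumps(annotation, indent=2))
```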

Proceedings Article
06 Nov 2004
TL;DR: An approach to conceptualizing, measuring, and assessing metadata quality is presented based on a more general model of information quality (IQ) for many kinds of information beyond just metadata.
Abstract: This paper presents early results from our empirical studies of metadata quality in large corpuses of metadata harvested under Open Archives Initiative (OAI) protocols. Along with some discussion of why and how metadata quality is important, an approach to conceptualizing, measuring, and assessing metadata quality is presented. The approach given in this paper is based on a more general model of information quality (IQ) for many kinds of information beyond just metadata. A key feature of the general model is its ability to condition quality assessments by context of information use, such as the types of activities that use the information, and the typified norms and values of relevant information-using communities. The paper presents a number of statistical characterizations of analyzed samples of metadata from a large corpus built as part of the Institute of Museum and Library Services Digital Collections and Content (DCC) project.

79 citations
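
The snippet below is an illustrative (not the paper's) per-element characterization of a small, invented sample of Dublin Core records: how often each element is present and how many values a record carries on average, the kind of descriptive statistic such a study might report.

```python
# Toy per-element statistics over a sample of simple Dublin Core records.

from collections import Counter

def characterize(records):
    """records: list of dicts mapping DC element name -> list of values."""
    n = len(records)
    present = Counter()       # records containing the element at least once
    value_counts = Counter()  # total values supplied for the element
    for rec in records:
        for element, values in rec.items():
            if values:
                present[element] += 1
                value_counts[element] += len(values)
    for element in sorted(present):
        completeness = present[element] / n
        avg_values = value_counts[element] / n
        print(f"{element:12s} completeness={completeness:.2f} values/record={avg_values:.2f}")

sample = [
    {"title": ["Prairie photographs"], "creator": ["Smith, J."], "date": ["1904"]},
    {"title": ["Farm ledger"], "subject": ["Agriculture", "Accounts"]},
]
characterize(sample)
```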


Cited by
Journal IssueDOI
TL;DR: This article proposes a general IQ assessment framework that consists of comprehensive typologies of IQ problems, related activities, and a taxonomy of IQ dimensions organized in a systematic way based on sound theories and practices.
Abstract: One cannot manage information quality (IQ) without first being able to measure it meaningfully and establishing a causal connection between the source of IQ change, the IQ problem types, the types of activities affected, and their implications. In this article we propose a general IQ assessment framework. In contrast to context-specific IQ assessment models, which usually focus on a few variables determined by local needs, our framework consists of comprehensive typologies of IQ problems, related activities, and a taxonomy of IQ dimensions organized in a systematic way based on sound theories and practices. The framework can be used as a knowledge resource and as a guide for developing IQ measurement models for many different settings. The framework was validated and refined by developing specific IQ measurement models for two large-scale collections of two large classes of information objects: Simple Dublin Core records and online encyclopedia articles. © 2007 Wiley Periodicals, Inc.

374 citations

Journal ArticleDOI
TL;DR: In this volume, the author develops a new approach for the analysis of differing types of information systems, called the Value-Added Model, based on the analysis of information-use environments and on the system responses to the needs of those environments.

345 citations

Journal ArticleDOI
TL;DR: This empirical research demonstrates the effectiveness of content analysis to map the research literature of the software engineering discipline and suggests that certain research themes in software engineering have remained constant, but with changing thrusts.
Abstract: This empirical research demonstrates the effectiveness of content analysis to map the research literature of the software engineering discipline. The results suggest that certain research themes in software engineering have remained constant, but with changing thrusts. Other themes have arisen, matured, and then faded as major research topics, while still others seem transient or immature. Co-word analysis is the specific technique used. This methodology identifies associations among publication descriptors (indexing terms) from the ACM Computing Classification System and produces networks of descriptors that reveal these underlying patterns. This methodology is applicable to other domains with a supporting corpus of textual data. While this study utilizes index terms from a fixed taxonomy, that restriction is not inherent; the descriptors can be generated from the corpus. Hence, co-word analysis and the supporting software tools employed here can provide unique insights into any discipline's evolution.

331 citations
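
A toy version of the co-word step described above: count how often two indexing descriptors are assigned to the same publication and keep the counts as weighted edges of the descriptor network. The descriptors and publications here are invented.

```python
# Toy co-word analysis: pairwise co-occurrence of indexing descriptors.

from collections import Counter
from itertools import combinations

publications = [
    {"Software Engineering", "Testing", "Metrics"},
    {"Software Engineering", "Requirements", "Metrics"},
    {"Testing", "Metrics"},
]

pair_counts = Counter()
for descriptors in publications:
    for a, b in combinations(sorted(descriptors), 2):
        pair_counts[(a, b)] += 1

# Edges of the co-word network, strongest associations first.
for (a, b), weight in pair_counts.most_common():
    print(f"{a} -- {b}: {weight}")
```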

Journal ArticleDOI
TL;DR: The results indicate that a Kohonen self-organizing map (SOM)-based algorithm can successfully categorize a large and eclectic Internet information space into manageable sub-spaces that users can successfully navigate to locate a homepage of interest to them.
Abstract: The Internet provides an exceptional testbed for developing algorithms that can improve browsing and searching large information spaces. Browsing and searching tasks are susceptible to problems of information overload and vocabulary differences. Much of the current research is aimed at the development and refinement of algorithms to improve browsing and searching by addressing these problems. Our research was focused on discovering whether two of the algorithms our research group has developed, a Kohonen algorithm category map for browsing, and an automatically generated concept space algorithm for searching, can help improve browsing and/or searching the Internet. Our results indicate that a Kohonen self-organizing map (SOM)-based algorithm can successfully categorize a large and eclectic Internet information space (the Entertainment subcategory of Yahoo!) into manageable sub-spaces that users can successfully navigate to locate a homepage of interest to them. The SOM algorithm worked best with browsing tasks that were very broad, and in which subjects skipped around between categories. Subjects especially liked the visual and graphical aspects of the map. Subjects who tried to do a directed search, and those who wanted to use the more familiar mental models (alphabetic or hierarchical organization) for browsing, found that the map did not work well. The results from the concept space experiment were especially encouraging. There were no significant differences among the precision measures for the set of documents identified by subject-suggested terms, thesaurus-suggested terms, and the combination of subject- and thesaurus-suggested terms. The recall measures indicated that the combination of subject- and thesaurus-suggested terms exhibited significantly better recall than subject-suggested terms alone. Furthermore, analysis of the homepages indicated that there was limited overlap between the homepages retrieved by the subject-suggested and thesaurus-suggested terms. Since the retrieved homepages for the most part were different, this suggests that a user can enhance a keyword-based search by using an automatically generated concept space. Subjects especially liked the level of control that they could exert over the search, and the fact that the terms suggested by the thesaurus were real (i.e., originating in the homepages) and therefore guaranteed to have retrieval success.

299 citations
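
A bare-bones Kohonen self-organizing map over random "document" vectors, only to make the categorization idea concrete; the paper's actual feature extraction, map size, and training schedule differ.

```python
# Minimal SOM: documents are mapped onto a small grid of category nodes.

import numpy as np

rng = np.random.default_rng(0)
docs = rng.random((200, 10))          # 200 toy documents, 10 term features each
grid_h, grid_w = 6, 6                 # 6x6 map of category nodes
weights = rng.random((grid_h, grid_w, 10))

# Grid coordinates of every node, used for the neighborhood function.
coords = np.stack(np.meshgrid(np.arange(grid_h), np.arange(grid_w), indexing="ij"), axis=-1)

epochs = 20
for epoch in range(epochs):
    lr = 0.5 * (1 - epoch / epochs)           # decaying learning rate
    sigma = 3.0 * (1 - epoch / epochs) + 0.5  # decaying neighborhood radius
    for x in docs:
        # Best-matching unit: node whose weight vector is closest to x.
        dists = np.linalg.norm(weights - x, axis=2)
        bmu = np.unravel_index(np.argmin(dists), dists.shape)
        # Pull the BMU and its grid neighbors toward x.
        grid_dist = np.linalg.norm(coords - np.array(bmu), axis=2)
        influence = np.exp(-(grid_dist ** 2) / (2 * sigma ** 2))
        weights += lr * influence[..., None] * (x - weights)

# Assign each document to its category node on the trained map.
assignments = [
    np.unravel_index(np.argmin(np.linalg.norm(weights - x, axis=2)), (grid_h, grid_w))
    for x in docs
]
print(assignments[:5])
```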

Patent
24 Jun 2002
TL;DR: The invention relates to dynamic discovery of documents or information through a focused crawler or search engine, within the field of computer software.
Abstract: The present invention pertains to the field of computer software. More specifically, the present invention relates to dynamic discovery of documents or information through a focused crawler or search engine.

284 citations
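
As a generic illustration only (not the patented method), the sketch below is a minimal focused crawler: it starts from a placeholder seed URL and expands the link frontier only from pages whose text matches the topic terms.

```python
# Toy focused crawler using only the Python standard library.

from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkParser(HTMLParser):
    """Collect href values from anchor tags."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def focused_crawl(seed, topic_terms, max_pages=20, max_fetches=200):
    seen, found = {seed}, []
    queue = deque([seed])
    fetches = 0
    while queue and len(found) < max_pages and fetches < max_fetches:
        url = queue.popleft()
        fetches += 1
        try:
            html = urlopen(url, timeout=10).read().decode("utf-8", "replace")
        except Exception:
            continue
        # Crude relevance test: expand the frontier only from on-topic pages.
        if any(term in html.lower() for term in topic_terms):
            found.append(url)
            parser = LinkParser()
            parser.feed(html)
            for href in parser.links:
                absolute = urljoin(url, href)
                if absolute.startswith("http") and absolute not in seen:
                    seen.add(absolute)
                    queue.append(absolute)
    return found

# Example (placeholder seed): focused_crawl("https://example.org/", ["digital library"])
```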