
Showing papers in "D-lib Magazine in 2008"


Journal ArticleDOI
TL;DR: In this article, the authors study the current state of personal digital archiving in practice with the aim of designing services for the long-term storage, preservation, and access of digital belongings.
Abstract: Most of us engage in magical thinking when it comes to the long-term fate of our digital belongings. This magical thinking may manifest itself in several ways: technological optimism (“JPEG is so common; why would it stop working?”), radical ephemeralism (“It’s like a fire: you just have to move on”), or simply a gap between principles and practice (“I don’t know why I never made a copy of those photos.”). At this point, a strategy that hinges on benign neglect and lots of copies seems to be the best we can hope for. For the last few years, with various collaborators, I have tried to understand the current state of personal digital archiving in practice with the aim of designing services for the long-term storage, preservation, and access of digital belongings. Our studies have not only confirmed that experienced computer users have accumulated a substantial amount of digital stuff that they care about, but also that they have already lost irreplaceable artifacts such as photos, creative efforts, research

83 citations


Journal ArticleDOI
TL;DR: A unique look at how electronic journals and other developments have influenced changes in reading behavior over three decades is provided, including evidence of the value of reading in addition to reading patterns.
Abstract: Purpose – By tracking the information-seeking and reading patterns of science, technology, medical and social science faculty members from 1977 to the present, this paper seeks to examine how faculty members locate, obtain, read, and use scholarly articles and how this has changed with the widespread availability of electronic journals and journal alternatives. Design/methodology/approach – Data were gathered using questionnaire surveys of university faculty and other researchers periodically since 1977. Many questions used the critical incident of the last article reading to allow analysis of the characteristics of readings in addition to characteristics of readers. Findings – The paper finds that the average number of readings per year per science faculty member continues to increase, while the average time spent per reading is decreasing. Electronic articles now account for the majority of readings, though most readings are still printed on paper for final reading. Scientists report reading a higher proportion of older articles from a wider range of journal titles and more articles from library e-collections. Articles are read for many purposes and readings are valuable to those purposes. Originality/value – The paper draws on data collected in a consistent way over 30 years. It provides a unique look at how electronic journals and other developments have influenced changes in reading behavior over three decades. The use of the critical incident technique provides evidence of the value of reading in addition to reading patterns.

80 citations


Journal ArticleDOI
TL;DR: The article provides some figures on the results of the strategic plan, explores future initiatives being devised to further increase the adoption of the repository, and describes the set of activities included in a strategic plan specially designed to address the previously outlined problems.
Abstract: In this article, we tackle the ubiquitous problems of slow adoption and low deposit rates often seen in recently created institutional repositories. The article begins with a brief description of the implementation process of RepositoriUM, the institutional repository of the University of Minho, and moves on to thoroughly describe the set of activities included in a strategic plan specially designed to address the previously outlined problems. Among those activities are the development of an adequate promotional plan, the development of value-added services for authors, engagement in the international community and the definition of a self-archiving mandate policy. The article also provides some figures on the results of the strategic plan and explores future initiatives being devised to further increase the adoption of the repository.

74 citations


Journal ArticleDOI
TL;DR: In this second part of the article, the implications of the four challenges presented in Part 1 are explored, and some promising technological directions and requirements for each of them are discussed.
Abstract: In this article I explore the implications of the four challenges presented in Part 1 – (1) accumulation, (2) distribution, (3) digital stewardship, and (4) long-term access – and discuss some promising technological directions and requirements for each. To address issues associated with accumulation, I examine the question of what we should keep and how we should designate and assess value of digital objects. Distribution raises the question of where we should put our digital belongings; here I propose the creation of a catalog of distributed stores. Digital stewardship raises the fundamental question of maintenance outside of the institution; in this case, I discuss a range of curation services and mechanisms. Finally, there is the question of how we will find these items we have stored long ago; I look at some new access modes that have begun to appear. I then wrap up the discussion by reflecting on what it means to lose some of our digital assets and how we might think of digital archiving technologies.

42 citations


Journal ArticleDOI
TL;DR: I decided to categorize the articles according to what they are about and included the first two letters of the author’s last name and the last two digits of the year the article was published.
Abstract: I also decided to categorize the articles according to what they are about. The following list shows the categories I decided on. If the article discusses more than one of these topics, the most prominent topic goes first, with the others following after a period. For example, if the article was mostly about Digital Resources but also discussed Music faculty and Google.com, its number would be 2.34. After the number I included the first two letters of the author’s last name and the last two digits of the year the article was published.
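A minimal sketch (in Python) of the numbering scheme described above, assuming topics are referred to by their category numbers; the function name and the example values are illustrative, not taken from the article.

def article_code(topics, surname, year):
    """Build a code like '2.34 Sm08' from topic numbers, author surname and year."""
    primary, secondary = topics[0], topics[1:]
    number = str(primary) + ("." + "".join(str(t) for t in secondary) if secondary else "")
    return f"{number} {surname[:2]}{year % 100:02d}"

# An article mostly about topic 2 that also discusses topics 3 and 4,
# written by a hypothetical author "Smith" in 2008:
print(article_code([2, 3, 4], "Smith", 2008))  # -> "2.34 Sm08"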


Journal ArticleDOI
TL;DR: This paper reports on the use of METS structural, PREMIS preservation and MODS descriptive metadata for the British Library’s eJournal system.
Abstract: As institutions turn towards developing archival digital repositories, many decisions on the use of metadata have to be made. In addition to deciding on the more traditional descriptive and administrative metadata, particular care needs to be given to the choice of structural and preservation metadata, as well as to integrating the various metadata components. This paper reports on the use of METS structural, PREMIS preservation and MODS descriptive metadata for the British Library’s eJournal system.
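To make the integration point concrete, the following is a minimal sketch in Python (standard-library ElementTree) of how a MODS descriptive record, a PREMIS preservation record, and METS structural sections can sit together in one METS document. The element choices, namespace versions and example values are illustrative assumptions and do not reflect the British Library's actual eJournal profile.

import xml.etree.ElementTree as ET

# Namespace URIs are assumptions for illustration; use those mandated by your profile.
METS = "http://www.loc.gov/METS/"
MODS = "http://www.loc.gov/mods/v3"
PREMIS = "info:lc/xmlns/premis-v2"
XLINK = "http://www.w3.org/1999/xlink"
for prefix, uri in (("mets", METS), ("mods", MODS), ("premis", PREMIS), ("xlink", XLINK)):
    ET.register_namespace(prefix, uri)

mets = ET.Element(f"{{{METS}}}mets")

# Descriptive metadata: a MODS record wrapped in a dmdSec.
dmd = ET.SubElement(mets, f"{{{METS}}}dmdSec", ID="dmd1")
mods = ET.SubElement(ET.SubElement(ET.SubElement(dmd, f"{{{METS}}}mdWrap", MDTYPE="MODS"),
                                   f"{{{METS}}}xmlData"), f"{{{MODS}}}mods")
title_info = ET.SubElement(mods, f"{{{MODS}}}titleInfo")
ET.SubElement(title_info, f"{{{MODS}}}title").text = "Example journal article"

# Preservation metadata: a (simplified) PREMIS object wrapped in an amdSec/techMD.
amd = ET.SubElement(mets, f"{{{METS}}}amdSec", ID="amd1")
tech = ET.SubElement(amd, f"{{{METS}}}techMD", ID="tech1")
premis_obj = ET.SubElement(ET.SubElement(ET.SubElement(tech, f"{{{METS}}}mdWrap", MDTYPE="PREMIS"),
                                         f"{{{METS}}}xmlData"), f"{{{PREMIS}}}object")
oid = ET.SubElement(premis_obj, f"{{{PREMIS}}}objectIdentifier")
ET.SubElement(oid, f"{{{PREMIS}}}objectIdentifierType").text = "local"
ET.SubElement(oid, f"{{{PREMIS}}}objectIdentifierValue").text = "article-0001"

# Structural metadata: the files themselves and a structMap that links them
# back to the descriptive (DMDID) and administrative (ADMID) sections.
file_sec = ET.SubElement(mets, f"{{{METS}}}fileSec")
grp = ET.SubElement(file_sec, f"{{{METS}}}fileGrp", USE="master")
f1 = ET.SubElement(grp, f"{{{METS}}}file", ID="f1", MIMETYPE="application/pdf")
ET.SubElement(f1, f"{{{METS}}}FLocat", {"LOCTYPE": "URL", f"{{{XLINK}}}href": "article.pdf"})

smap = ET.SubElement(mets, f"{{{METS}}}structMap")
div = ET.SubElement(smap, f"{{{METS}}}div", TYPE="article", DMDID="dmd1", ADMID="amd1")
ET.SubElement(div, f"{{{METS}}}fptr", FILEID="f1")

print(ET.tostring(mets, encoding="unicode"))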

Journal ArticleDOI
TL;DR: JPEG 2000 may be considered a format that offers robust resistance to bit errors and maintains acceptable quality in the presence of transmission or physical errors: this view is confirmed by the case study results reported in this article, concerning image quality after the occurrence of random errors.
Abstract: Digital preservation requires a strategy for the storage of large quantities of data, which increases dramatically when dealing with high resolution images. Typically, decision-makers must choose whether to keep terabytes of images in their original TIFF format or compress them. This can be a very difficult decision: to lose visual information through compression could be a waste of the money expended in the creation of the digital assets; however, by choosing to compress, the costs of storage will be reduced. The wavelet compression of JPEG 2000 produces a high quality image: it is an acceptable alternative to TIFF and a good strategy for the storage of large image assets. Moreover, JPEG 2000 may be considered a format that offers robust resistance to bit errors and maintains acceptable quality in the presence of transmission or physical errors: this view is confirmed by the case study results we report in this article, which compare image quality across different file formats after the occurrence of random errors. Simple tools and freeware can be used to improve format robustness by duplicating file headers inside or outside the image file format, enhancing the role of JPEG 2000 as a new archival format for high quality images.

Introduction: current trends

In recent years the JPEG 2000 format has been widely used in digital libraries, not only as a "better" JPEG to deliver medium-quality images, but also as a new "master" file for high quality images, replacing TIFF1 images. One of the arguments used for this policy was the "lossless mode" feature of JPEG 2000; but this type of compression saves only about half of the storage requirements of TIFF, so it is unlikely that this was the only reason that digital libraries moved in this direction. The only reasonable choice was the standard lossy compression, which offers a 1:20 (color) or 1:10 (grayscale) ratio. This provides significant savings in terms of storage, considering that the quality of images in digitization projects has increased dramatically in the past few years: the highest standards for image capture are now very common in digital libraries. Thus, the argument turned from the "mathematically lossless" concept to a softer "visually lossless" definition, and the question became: what do we lose in choosing the JPEG 2000 "lossy" mode? Let's focus on the following definitions:

"The image file will not retain the actual RGB color data, but it will look the same because screens and our eyes are so forgiving"2

"... many repositories are storing 'visually lossless' JPEG 2000 files: the compression is lossy and irreversible but the artifacts are not noticeable"3

As mentioned above, some institutions began to store JPEG 2000 files in their digital repositories as the "archival format"4. This policy was sometimes officially declared, or in some cases was adopted de facto: "The migration process involves creating a derivative master from the original archival master...", or, as shown in the example of the following migration rationale: "Create JPEG 2000 datastream for presentation and standardize on JPEG 2000 as an archival master format."5

One of the most relevant and specific examples of format migration to JPEG 2000 was made at the Harvard University Library (HUL):6 "HUL chose to perform a migration of various image files to the JPEG 2000 format. There is great local interest at Harvard in the retrospective conversion of substantial numbers of existing TIFF images to enhance their utility by permitting the dynamic image manipulation facilitated by the JPEG 2000 format. The three goals that guided the design of the migration were: to preserve fully the integrity of the GIF, JPEG, and TIFF source data when transformed into the JPEG 2000 (JP2) format; to maximize the utility of the new JP2 objects; and to minimize migration costs."

The Xerox Research Center, namely Robert Buckley, was involved in this strategy, producing studies on the integration of JPEG 2000 in the OAIS Reference Model and defining it as a digital preservation standard.7 Although Buckley's Technology Watch Report has been accepted and promoted by the Digital Preservation Coalition in the UK, many relevant experts in this field still seem to show some skepticism and continue to take a "wait and see" position:8 "... some institutions engaged in large-scale efforts are considering a switch to JPEG 2000 ... However, the standard is not yet commonly used and there is not sufficient support for it by Web browsers. The number of tools available for JPEG 2000 is limited but continues to grow".9 Tim Vitale's opinion on JPEG 2000 was very clear in his 2007 report:10 "It is not an archival format ... Existing web browsers (mid-2007) are not yet JPEG 2000 capable. One of the biggest problems with the format is the need for viewing software to be added to existing web browsers ... There are very few implementations of the JPEG 2000 technology; more work needs to be done before general understanding and acceptance will be possible."

However, this is no longer the case: most common commercial digital imaging programs now support JPEG 2000, not to mention JPEG 2000 support by some excellent shareware.11 The real problem is that the JPEG 2000 format allows the storage of very large images, and no current program manages computer memory in an intelligent way: this is the commercial rationale for professional image servers and encoders, which are relatively costly,12 for specific viewers for geographic images (generally free13), and for browser plug-ins (free as well).

1. Image compression of continuous tone images

The primary objection to JPEG 2000 compression remains the possible loss of visual information. Our approach in arguing against this will not focus on how the wavelet approach works,14 but on why it works, with some very basic elements of compression theory.15 In other words, preserving visual information deals mainly with how the images are perceived visually, and only secondarily with the mathematical aspects of the physical signal (materials, procedures, techniques). Some would argue that images look the same as they did before compression simply because humans don't see very well, and that a deeper examination (or a better monitor) would reveal errors and losses. This is not true: even when JPEG 2000 images are enhanced by magnification, no human could perceive any errors or losses. A digital surrogate is not necessarily a bad copy of the original, and compression does not always mean loss of information.

Some people also may think that compression is the equivalent of the "sampling" of a signal; for example, if we choose 300 points per inch to represent an object, sub-sampling might take only 150 or 100 points instead, which creates the risk of losing some information essential for reconstructing that signal. Any sampling below the Nyquist rate produces aliasing effects: if we represent the signal as a wave, the sampling interval should match exactly the shape of the wave; otherwise, original images are "misunderstood" and appear as artifacts. But compression is not a kind of sub-sampling made after the capture of an image. We can eliminate redundant information (a sequence of identical values), which gives a kind of lossless compression; or, going beyond the physical-mathematical reality of the signal, we can operate on the human perception of it. Since we are dealing with information that we perceive with our eyes, we can compress irrelevant information, i.e., what is less relevant to our senses. The human eye is less sensitive to colors than to light, so the chrominance signal can be compressed more than the luminance signal without any loss of perception.

This is very important with digital images of historical documents, as they are usually either color or grayscale images, i.e., "continuous tone" images. As opposed to a "discrete tone" image (such as a printed or typed document in black and white), in a continuous tone image any variation of adjacent pixels is relevant: in other words, pixels are "correlated" with each other. We cannot retrieve a sequence of identical values to compress, and we need a more sophisticated strategy. We can select a part of the image, an array of pixels, and calculate the average of the values; then, we can calculate the difference of each single value from the average. This is called "de-correlation" of the image pixels, and at the end of this process we will find that many of the differences from the basic average value are zero, or almost zero, so we can easily compress the image by assigning them the same values (quantization).16

When we separate the three color channels, each of them can be considered as a grayscale image, and we can use the "bit planes" technique.17 For example, let us take three adjacent pixels in a grayscale image, with very different values, in decimal and in binary code:

10 = 000001010
 3 = 000000011
-7 = 100000111

Figure 1: An 8-bit grayscale image and its bit planes.

The image is at 8-bit depth, so we have 1+8 bits (the first represents the +/- sign). At positions 2, 3, 4, 5 (i.e., at bit planes 2, 3, 4, 5) we find only "0", and at position 8 we find only "1": this is also expressed by saying that the relevant information or energy (low frequencies) concentrates at certain levels, and the other levels (high frequencies) can be easily compressed.18 This is very clear in the following representation of an image in 8-bit planes: continuous tone variations between adjacent pixels are now turned into eight separate contexts, where it is now possible to compress adjacent values.

Figure 2: A corrupted JPEG file.

There are two main methods for de-correlating pixels
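As a rough illustration of the two ideas just described (de-correlating a block of pixels against its average, and splitting an 8-bit grayscale image into bit planes), here is a small sketch in Python with NumPy; the function names and the synthetic test block are illustrative assumptions, not the author's code or data.

import numpy as np

def decorrelate_block(block):
    """Subtract the block average from every pixel; in a continuous tone image
    most differences end up at or near zero, which is what the subsequent
    quantization step exploits."""
    return block.astype(np.int16) - int(round(block.mean()))

def bit_planes(gray):
    """Split an 8-bit grayscale image into its 8 bit planes (plane 0 = least significant)."""
    return [(gray >> k) & 1 for k in range(8)]

rng = np.random.default_rng(0)
# A fake "continuous tone" block: a smooth gradient plus a little noise.
block = (np.linspace(100, 110, 64).reshape(8, 8) + rng.normal(0, 1, (8, 8))).astype(np.uint8)

diffs = decorrelate_block(block)
print("max |difference| from the block average:", int(np.abs(diffs).max()))

# High-order planes are nearly uniform (the "low frequency" energy), while
# low-order planes look like noise; the uniform planes compress very well.
for k, plane in enumerate(bit_planes(block)):
    print(f"bit plane {k}: fraction of 1s = {plane.mean():.2f}")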


Journal ArticleDOI
TL;DR: This article chronicles the journey towards a common way of packaging and exchanging digital content in a future Australian data commons – a national corpus of research resources that can be shared and re-used.
Abstract: In any journey, there's a destination but half the 'fun' is getting there. This article chronicles our journey towards a common way of packaging and exchanging digital content in a future Australian data commons – a national corpus of research resources that can be shared and re-used. Whatever packaging format is used, it has to handle complex content models and work across multiple submission and dissemination scenarios. It has to do this in a way that maintains a history of the chain of custody of objects over time. At the start of our journey we chose METS extended by PREMIS to do this. We learnt a lot during the first two stages that we want to share with those travelling to a similar destination.


Journal ArticleDOI
TL;DR: In this article, the authors propose a simple model for everyday web sites that takes advantage of the web server itself to help prepare the site's resources for preservation; this is accomplished by having metadata utilities analyze the resource at the time of dissemination.
Abstract: There are innumerable departmental, community, and personal web sites worthy of long-term preservation but proportionally fewer archivists available to properly prepare and process such sites. We propose a simple model for such everyday web sites which takes advantage of the web server itself to help prepare the site's resources for preservation. This is accomplished by having metadata utilities analyze the resource at the time of dissemination. The web server responds to the archiving repository crawler by sending both the resource and the just-in-time generated metadata as a straightforward XML-formatted response. We call this complex object (resource + metadata) a CRATE. In this paper we discuss modoai, the web server module we developed to support this approach, and we describe the process of harvesting preservation-ready resources using this technique.
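As an illustration of the general idea only (not modoai's actual response format or the CRATE schema), the following Python sketch packages a local file together with metadata generated at dissemination time into a single XML object; the element names and the choice of metadata utilities are assumptions.

import base64
import hashlib
import mimetypes
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from pathlib import Path

def build_crate_like_object(path):
    """Bundle a resource and its just-in-time metadata into one XML response."""
    data = Path(path).read_bytes()
    crate = ET.Element("crate", uri=path)

    # Metadata generated at dissemination time by whatever utilities are installed.
    meta = ET.SubElement(crate, "metadata",
                         generated=datetime.now(timezone.utc).isoformat())
    ET.SubElement(meta, "md", utility="mimetype").text = (
        mimetypes.guess_type(path)[0] or "application/octet-stream")
    ET.SubElement(meta, "md", utility="md5").text = hashlib.md5(data).hexdigest()
    ET.SubElement(meta, "md", utility="length").text = str(len(data))

    # The resource itself travels inside the same XML response, base64-encoded.
    ET.SubElement(crate, "resource", encoding="base64").text = (
        base64.b64encode(data).decode("ascii"))

    return ET.tostring(crate, encoding="utf-8", xml_declaration=True)

# e.g. print(build_crate_like_object("index.html")[:200])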

Journal ArticleDOI
TL;DR: djatoka is introduced, an open source JPEG 2000 image server with an attractive basic feature set, and extensibility under control of the community of implementers, and called upon the community to engage in further development of djatoka.
Abstract: The ISO-standardized JPEG 2000 image format has started to attract significant attention. Support for the format is emerging in major consumer applications, and the cultural heritage community seriously considers it a viable format for digital preservation. So far, only commercial image servers with JPEG 2000 support have been available. They come with significant license fees and typically provide the customers with limited extensibility capabilities. Here, we introduce djatoka, an open source JPEG 2000 image server with an attractive basic feature set, and extensibility under control of the community of implementers. We describe djatoka, and point at demonstrations that feature digitized images of marvelous historical manuscripts from the collections of the British Library and the University of Ghent. We also call upon the community to engage in further development of djatoka.

Journal ArticleDOI
TL;DR: The methodology and initial results from qualitative research into the usage and communication of digital information are presented, including Contextual Design and Cultural Probes, and proposals for refinement are outlined.
Abstract: This article presents the methodology and initial results from qualitative research into the usage and communication of digital information. It considers the motivation for the research and the methodologies adopted, including Contextual Design and Cultural Probes. The article describes the preliminary studies conducted to test the approach, highlighting the strengths and limitations of the techniques applied. Finally, it outlines proposals for refinement in subsequent iterations and the future research activities planned. The research is carried out as part of the Planets (Preservation and Long-term Access through NETworked Services) project. As the digital evolution becomes infused into everyday life, the ways in which society communicates and uses information are changing. New processes are emerging that were inconceivable in a solely analogue world. National libraries and archives, as the custodians of a society's information, have the responsibility to safeguard these records and to provide sustained access to digital cultural and scientific knowledge. If these organisations are to fulfil these responsibilities, as a community of practitioners we must understand the nature of new communication and usage processes, both to ensure the appraisal process captures the right material and to guarantee that the new kinds of emerging working procedures are supported by the institutions.