Bibliothèque Nationale de France

Noémie Lesquins
- pp 584-590
Abstract
With more than 10 million volumes and an annual increase of about 50,000, the Bibliothèque Nationale de France is one of the biggest libraries in the world. It is also one of the oldest: since the sixteenth century, it has been entrusted with the mission of collecting, cataloging, preserving, and providing access to the French print heritage. Although the library's history has seen several turning points, the last decade of the twentieth century brought an unprecedented change in the life of the institution: new information technologies, new buildings, new collection management policies, and new services. More than ever, the library is today part of a national and international network of libraries and other cultural institutions whose goal is both to share the wealth of their resources and to assert their identities.

HAL Id: hal-00769084
https://hal-bnf.archives-ouvertes.fr/hal-00769084
Submitted on 28 Dec 2012
When press is not printed
Clément Oury
To cite this version:
Clément Oury. When press is not printed: The challenge of collecting digital newspapers at the
Bibliothèque nationale de France. The Electronic Re-evolution - News Media in the Digital Age.
Preconference of the IFLA newspapers section, Aug 2012, Mikkeli, Finland. 10 p. ⟨hal-00769084⟩
When press is not printed: the challenge of collecting
digital newspapers at the Bibliothèque nationale de France
Clément Oury
Head of Digital Legal Deposit, Legal Deposit Department
Bibliothèque nationale de France
Abstract:
Since its birth in the early seventeenth century, the press has played a prominent role in the
political and social life of France. Over the last two decades, the economic and even cultural
pillars on which the press ecosystem is built have been challenged by the growing use of digital
technologies, and by the increasing role of the Internet as a way to distribute and access
information.
Heritage libraries are affected by these major changes. They need to address the accelerating
shift from analogue to digital in order to maintain the continuity of their objectives and of
their missions: being able to collect and preserve cultural items, and being able to document
the way these items were produced, distributed and used. Many aspects need to be taken into
account: legal, scientific, technical, economic and organizational issues have to be identified
and addressed.
This paper looks at the example of the National Library of France (Bibliothèque nationale de
France or BnF), and at the way it has dealt with collecting newspapers in digital form.
Over the last ten years, the BnF has launched several experiments, testing different
approaches, with varying degrees of success:
- Direct deposit of electronic publications on physical media (CDs and DVDs) or through
FTP. BnF has tried this approach for some regional newspapers whose local versions were
not kept in paper form, and for which a digital substitute was sought. This paper explains
why the experiments were not conclusive.
- Fully automated web harvesting. Since December 2010, almost 100 news websites
(national and daily newspapers, pure players, news portals…) have been collected on a daily
basis. This harvest gives a very good overview of the kind of information available to
French Internet users, but does not allow the collection of publications for which payment
is required.
- Web harvesting through agreements with producers. This paper shows how this third
approach may improve on both previous solutions.
Since its birth in the early seventeenth century, the press has played a major role in the
political and social life of France. In the seventeenth century, struggles between the domestic
and foreign press created an open field for a first kind of public debate. During the
Enlightenment, diffusion of a free press was considered a key condition to achieve the
philosophers’ demands for freedom, political equality and justice. Newspapers and journals
were indeed widely used by leaders, parties, and activists during the French Revolution, and
have been up to the present day. In the nineteenth and twentieth centuries, newspaper owners and
journalists considered that no subject was beyond their reach, from political, diplomatic or
military issues to economic, cultural or sporting topics.
This prominent role explains why the press is widely used by researchers working on the
history of France and other countries, even for the most recent periods. And this is also the
reason why acquiring, promoting and giving access to press collections is a major objective
for heritage institutions.
At the Bibliothèque nationale de France, this mission is performed for printed journals thanks
to the framework of legal deposit. Legal deposit is the obligation for every producer of
cultural content to send copies of its works to the national library. It was introduced by
King François I, at a time when the invention of the printing press radically enhanced the
possibility of producing and distributing books. When the first periodicals were published, in
the 1630s, they were automatically, as printed matter, subject to legal deposit, thus
theoretically allowing the library to gather all French press titles, even though this objective
of comprehensiveness has never been perfectly reached. Legal deposit has been progressively
extended to all kinds of cultural items, from engravings (1672) to radio, television and
software (1992), including also music (1745), sounds (1938) and videos (1975).
The digital shift: a threat to heritage institutions’ missions?
However, over the last two decades, the economic and even cultural pillars on which the press
ecosystem was built have been challenged by the growing use of digital technologies, and by
the increasing role of the Internet as a way to distribute and access information. Some major
press titles are encountering financial difficulties. New stakeholders are emerging: some of
them, the “pure players” (or newspapers that only exist online) still come from the realm of
the printed press, whereas others are companies related to computing or information
technologies. Finally, all press titles are rethinking their business models, which range from
fully free content to subscription-based access, with many possible variations.
Heritage libraries are necessarily affected by the major changes affecting journals, as
well as all other actors in the cultural domain. Their mission remains the same: being able to
gather and preserve all cultural items, and being able to document the way these items are
produced, distributed and used. These institutions need to address this accelerating shift from
analogue to digital in order to maintain the continuity of their objectives and of their missions.
However, in this regard they face two apparently contradictory problems:
- On the one hand, radically new kinds of documents are appearing that need to be gathered and
preserved. The web allows a far larger number of people to publish journals online, hence
multiplying the number of titles whose memory should be kept.
- On the other hand, digital technologies also make it easier to produce printed documents.
The number of printed titles is therefore growing (even though at a slower pace than their
online equivalents), challenging the libraries' ability to acquire, index and store them. For
example, 40,000 different titles are currently received by the periodicals legal deposit
service at BnF, representing a total of 330,000 issues.
To tackle these issues, heritage institutions and especially national libraries have to find
groundbreaking solutions that should at the same time be consistent with the way they deal
with analogue collections. Many aspects need to be taken into account: legal, scientific,
technical, economic and organizational issues have to be identified and addressed.
We propose in this paper to look at the example of the National Library of France
(Bibliothèque nationale de France or BnF), and at the way it deals with collecting newspapers
in digital form. Over the last ten years, the BnF has launched several experiments, testing
different approaches, with varying degrees of success.
Deposit of digital substitutes for printed versions
Issues and methods
In the first approach, publishers deposit the digital version of the journal they
distribute in paper form. BnF librarians started considering this solution in the early 2000s.
At that time, digitization of older press collections was considered a priority for the BnF
online digital library, Gallica [1], and it is still a priority. It therefore appeared logical and
necessary to also be able to keep the memory of the newer ones.
Regional daily press titles were considered the best candidates for testing this kind of
deposit. Indeed, each of these regional titles generally offers many local versions according
to the district where they are distributed. Only a few pages vary between the different local
versions but, strictly following the principles of legal deposit, all of them must be kept. This
represents a huge storage cost compared to a rather small benefit in terms of new content.
This is the reason why, at that time, BnF was microfilming the local editions of around 20
regional titles. As the survival of microfilm companies and the maintenance of microfilm
reading devices were under threat, digital technologies were seen as a good replacement solution.
However, the goal was not to digitize the paper version (as was done in the microfilming
process), but to obtain directly the digital version used by the publisher and the printer.
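The storage argument above (many local versions of one issue, differing by only a few pages) can be illustrated with a short sketch. It is not a description of any BnF tool: the function names are hypothetical, and it assumes, for simplicity, that a page shared by several local versions is byte-identical in each of them, which real publisher files may not guarantee.

```python
import hashlib
from typing import Dict, List, Tuple


def page_digest(page_bytes: bytes) -> str:
    """Fingerprint one page file by its content (SHA-256)."""
    return hashlib.sha256(page_bytes).hexdigest()


def storage_summary(versions: Dict[str, List[bytes]]) -> Tuple[int, int]:
    """Return (total pages across all local versions, number of distinct pages).

    `versions` maps a local version name (hypothetical) to its list of page files.
    """
    total = 0
    distinct = set()
    for pages in versions.values():
        for page in pages:
            total += 1
            distinct.add(page_digest(page))
    return total, len(distinct)
```

The gap between the two numbers gives a rough idea of how much of what must be kept under legal deposit is repeated content rather than new content.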
There was no legal basis to ask publishers for their digital masters. This is the reason why
BnF started working with volunteers. Two regional newspapers answered positively: the
Populaire du Centre (located in the centre of France) and the Union de Reims (located in
eastern France). First discussions occurred in 2002, and an agreement was signed with the
Populaire du Centre in December 2003. A few months later, discussions started with a third
title, Ouest France (located in western France); the agreement was signed in 2005. These
agreements allowed the retrieval of files from the publishers but also authorized the
consultation of these files in BnF reading rooms. As they were experiments, they had an
expiry date (even though it was possible to renew them).
At the same time, technical teams from BnF and from the publishing companies examined the
processes to be put in place in order to get the data. Very thorough analyses were performed.
It was first decided to have one PDF per page, and to use an FTP platform to exchange data
between publishers and BnF. However, getting the PDFs is only the easiest part. Collecting data
is useless if the preservation and access issues are not taken into account.
- From a preservation point of view, it is necessary to validate the format of the files that
are retrieved. This requires automated identification and characterization of PDFs, and a
way to send back the files that are not considered satisfactory.
- From an access point of view, each delivery has to be accompanied by the metadata that
will help in recreating the structure of the newspaper, and will allow the end-user to
navigate within the document. This set of metadata is called the flatplan (in French chemin
de fer, or “railway”).
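The two requirements above (format validation and a flatplan accompanying each delivery) can be sketched as follows. This is a minimal illustration under stated assumptions, not the workflow actually built by BnF and the publishers: the `FlatplanEntry` fields and the report layout are hypothetical, and the identification step only checks the PDF header and end-of-file marker, which is far weaker than a real characterization tool.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List


@dataclass
class FlatplanEntry:
    """One page of an issue, as listed in a hypothetical flatplan."""
    page_number: int
    section: str
    filename: str  # one PDF per page, as decided in the experiment


def looks_like_pdf(data: bytes) -> bool:
    """Cheap identification: '%PDF-' header and an '%%EOF' marker near the end."""
    return data.startswith(b"%PDF-") and b"%%EOF" in data[-1024:]


def check_delivery(folder: Path, flatplan: List[FlatplanEntry]) -> Dict[str, List[str]]:
    """Sort the pages announced by the flatplan into accepted, rejected and missing."""
    report: Dict[str, List[str]] = {"accepted": [], "rejected": [], "missing": []}
    for entry in sorted(flatplan, key=lambda e: e.page_number):
        path = folder / entry.filename
        if not path.is_file():
            report["missing"].append(entry.filename)   # announced but not delivered
        elif looks_like_pdf(path.read_bytes()):
            report["accepted"].append(entry.filename)
        else:
            report["rejected"].append(entry.filename)  # to be sent back to the publisher
    return report
```

Driving the check from the flatplan rather than from the folder listing means that a page announced in the metadata but absent from the delivery is reported, instead of silently ignored.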
[1] See http://gallica.bnf.fr (consulted on June 14th, 2012).