The Recovery Manager of the System R Database Manager
JIM GRAY
Tandem Computers, 19333 Vallco Parkway, Cupertino, California 95014
PAUL McJONES
Xerox Corporation, 3333 Coyote Hill Road, Palo Alto, California 94304
MIKE BLASGEN, BRUCE LINDSAY, RAYMOND LORIE, TOM PRICE, FRANCO PUTZOLU, AND IRVING TRAIGER
IBM San Jose Research Laboratory, 5600 Cottle Road, San Jose, California 95193
The recovery subsystem of an experimental data management system is described and evaluated. The transaction concept allows application programs to commit, abort, or partially undo their effects. The DO-UNDO-REDO protocol allows new recoverable types and operations to be added to the recovery system. Application programs can record data in the transaction log to facilitate application-specific recovery. Transaction undo and redo are based on records kept in a transaction log. The checkpoint mechanism is based on differential files (shadows). The recovery log is recorded on disk rather than tape.
Keywords and Phrases: transactions, database, recovery, reliability
CR Categories: 4.33
Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
© 1981 ACM 0010-4892/81/0600-0223 $00.75
Computing Surveys, Vol. 13, No. 2, June 1981
CONTENTS
INTRODUCTION
  Application Interface to System R
  Structure of System R
  Model of Failures
1. DESCRIPTION OF SYSTEM R RECOVERY MANAGER
  1.1 What Is a Transaction?
  1.2 Transaction Save Points
  1.3 Summary
2. IMPLEMENTATION OF SYSTEM R RECOVERY
  2.1 Files, Versions, and Shadows
  2.2 Logs and the DO, UNDO, REDO Protocol
  2.3 Commit Processing
  2.4 Transaction UNDO
  2.5 Transaction Save Points
  2.6 System Configuration, Startup and Shutdown
  2.7 System Checkpoint
  2.8 System Restart
  2.9 Media Failure
  2.10 Managing the Log
  2.11 Recovery and Locking
3. EVALUATION
  3.1 Implementation Cost
  3.2 Execution Cost
  3.3 I/O Cost
  3.4 Success Rate
  3.5 Complexity
  3.6 Disk-Based Log
  3.7 Save Points
  3.8 Shadows
  3.9 Message Recovery, an Oversight
  3.10 New Features
ACKNOWLEDGMENTS
REFERENCES
INTRODUCTION
Application Interface to System R
Making computers easier to use is the goal of most software. Database management systems, in particular, provide a programming interface to ease the task of writing electronic bookkeeping programs. The recovery manager of such a system in turn eases the task of writing fault-tolerant application programs.
System R [ASTR76] is a database system which supports the relational model of data. The SQL language [CHAM76] provides operators that manipulate the database. Typically, a user writes a PL/I or COBOL program which has embedded SQL statements. A collection of such statements is required to make a consistent transformation of the database. Transferring funds from one account to another, for example, requires two SQL statements: one to debit the first account and one to credit the second account. In addition, the transaction probably records the transfer in a history file for later reporting and for auditing purposes. Figure 1 gives an example of such a program written in pseudo-PL/I.
The program effects a consistent transformation of the books of a hypothetical bank. Its actions are either to
• discover an error,
• accept the input message, and
• produce a failure message,
or to
• discover no errors,
• accept the input message,
• debit the source account by AMOUNT,
• credit the destination account by AMOUNT,
• record the transaction in a history file, and
• produce a success message.
The programmer who writes such a program establishes its correctness by ensuring that it performs the desired transformation on both the database state and the outside world (via messages). The programmer and the user both want the execution to be
• atomic: either all actions are performed (the transaction has an effect) or the results of all actions are undone (the transaction has no effect);
• durable: once the transaction completes, its effects cannot be lost due to computer failure;
• consistent: the transaction occurs as though it had executed on a system which sequentially executes only one transaction at a time.
In order to state this intention, the SQL programmer brackets the transformations with the SQL statements BEGIN_TRANSACTION, to signal the beginning of the transaction, and COMMIT_TRANSACTION, to signal its completion. If the programmer wants to return to the beginning of the transaction, the command RESTORE_TRANSACTION will undo all actions since the issuance of the BEGIN_TRANSACTION command (see Figure 1).
The System R recovery manager supports these commands and guarantees an atomic, durable execution.
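The effect of these three commands can be illustrated with a minimal sketch, assuming an in-memory dictionary in place of the database. The names `TinyDB`, `begin`, `update`, `restore`, `commit`, and `funds_transfer` are hypothetical; keeping a list of old values to undo is simply one easy way to get all-or-nothing behavior and is not a description of how System R implements it. The transfer function follows the shape of Figure 1: on any error it restores, reports failure, and still commits.

```python
# Hypothetical sketch: an in-memory store whose begin/commit/restore
# mimic BEGIN_TRANSACTION, COMMIT_TRANSACTION and RESTORE_TRANSACTION
# by remembering the old value of every item the transaction updates.

_MISSING = object()  # marks "key did not exist before the update"


class TinyDB:
    def __init__(self, data):
        self.data = dict(data)
        self.undo = []  # (key, old value) pairs for the active transaction

    def begin(self):
        self.undo = []

    def update(self, key, value):
        # Save the old value first so the update can be undone later.
        self.undo.append((key, self.data.get(key, _MISSING)))
        self.data[key] = value

    def restore(self):
        # Undo all updates since begin(), most recent first.
        for key, old in reversed(self.undo):
            if old is _MISSING:
                del self.data[key]
            else:
                self.data[key] = old
        self.undo = []

    def commit(self):
        # The updates become permanent; the undo information is discarded.
        self.undo = []


def funds_transfer(db, debit, credit, amount):
    db.begin()
    try:
        if db.data[debit] < amount:
            raise ValueError("insufficient funds")
        db.update(debit, db.data[debit] - amount)    # do debit
        db.update(credit, db.data[credit] + amount)  # do credit (KeyError if no such account)
        db.update("history", db.data.get("history", ()) + ((debit, credit, amount),))
        message = "TRANSFER DONE"
    except (KeyError, ValueError):
        db.restore()                                 # undo all work
        message = "TRANSFER FAILED"
    db.commit()
    return message


db = TinyDB({"A": 100, "B": 50})
print(funds_transfer(db, "A", "B", 30))  # TRANSFER DONE
print(funds_transfer(db, "A", "Z", 10))  # TRANSFER FAILED (the debit of A is undone)
print(db.data["A"], db.data["B"])        # 70 80
```

In the second call the debit has already been applied when the missing credit account is discovered; `restore` puts the old balance back, so the failed transfer leaves no trace.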
System R generally runs several transactions concurrently. The concurrency control mechanism of System R hides such concurrency from the programmer by a locking technique [ESWA76, GRAY78, NAUM78] and gives the appearance of a consistent system.
Structure of System R
System R consists of an external layer called the Research Data System (RDS), and a completely internal layer called the Research Storage System (RSS) (see Figure 2).
The external layer provides a relational data model and operators thereon. It also provides catalog management, a data dictionary, authorization, and alternate views of data. The RDS is manipulated using the language SQL [CHAM76]. The SQL compiler maps SQL statements into sequences of RSS calls.
    FUNDS_TRANSFER: PROCEDURE;
      $BEGIN_TRANSACTION;
      ON ERROR DO;                              /* in case of error */
        $RESTORE_TRANSACTION;                   /* undo all work */
        GET INPUT MESSAGE;                      /* reacquire input */
        PUT MESSAGE ('TRANSFER FAILED');        /* report failure */
        GO TO COMMIT;
      END;
      GET INPUT MESSAGE;                        /* get and parse input */
      EXTRACT ACCOUNT_DEBIT, ACCOUNT_CREDIT,
        AMOUNT FROM MESSAGE;
      $UPDATE ACCOUNTS                          /* do debit */
        SET BALANCE = BALANCE - AMOUNT
        WHERE ACCOUNTS.NUMBER = ACCOUNT_DEBIT;
      $UPDATE ACCOUNTS                          /* do credit */
        SET BALANCE = BALANCE + AMOUNT
        WHERE ACCOUNTS.NUMBER = ACCOUNT_CREDIT;
      $INSERT INTO HISTORY                      /* keep audit trail */
        <DATE, MESSAGE>;
      PUT MESSAGE ('TRANSFER DONE');            /* report success */
    COMMIT:                                     /* commit updates */
      $COMMIT_TRANSACTION;
    END;                                        /* end of program */
Figure 1. A simple PL/I-SQL program which transfers funds from one account to another.
Application Programs in PL/I or COBOL, plus SQL
Research Data System (RDS)
• Supports the relational data model
• Supports the relational language SQL
• Does naming and authorization
• Compiles SQL statements into RSS call sequences
Research Storage System (RSS)
• Provides nonsymbolic record-at-a-time database access
• Maps records onto operating system files
• Provides transaction concept (recovery and locking)
Operating System
• Provides file system to manage disks
• Provides I/O system to manage terminals
• Provides process structure (multiprogramming)
Hardware
Figure 2. System R consists of two layers above the operating system. The RSS provides the transaction concept, recovery notions, and a record-at-a-time data access method. The RDS accepts application PL/I or COBOL programs containing SQL statements. It translates them into COBOL or PL/I programs plus subroutines which represent the compilation of the SQL statements into RSS calls.
The RSS is a nonsymbolic record-at-a-time access method. It supports the notions of file, record type, record instance, field within record, index (B-tree associative and sequential access path), parent-child set (an access path supporting the operations PARENT, FIRST_CHILD, NEXT_SIBLING, and PREVIOUS_SIBLING with direct pointers), and cursor (which navigates over access paths to locate records). Unfortunately, these objects have the nonstandard names "segment," "relation," "tuple," "field," "image," "link," and "scan" in the System R documentation. The former, more standard, names are used here. The RSS provides actions to create instances of these objects and to retrieve, modify, and delete them.
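A parent-child set with direct pointers can be pictured as a doubly linked list of child records attached to each parent. The following sketch assumes that in-memory representation rather than the on-disk structures the RSS actually used; `Record`, `add_child`, and `children` are hypothetical names, and the `last_child` pointer is an added convenience for appending.

```python
# Hypothetical sketch of a parent-child set as a doubly linked list of
# children hanging off each parent record. PARENT, FIRST_CHILD,
# NEXT_SIBLING and PREVIOUS_SIBLING each follow one direct pointer.
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # compare records by identity, not by (cyclic) contents
class Record:
    name: str
    # Pointers are excluded from repr to avoid printing the whole cycle.
    parent: Optional[Record] = field(default=None, repr=False)
    first_child: Optional[Record] = field(default=None, repr=False)
    last_child: Optional[Record] = field(default=None, repr=False)
    next_sibling: Optional[Record] = field(default=None, repr=False)
    previous_sibling: Optional[Record] = field(default=None, repr=False)


def add_child(parent: Record, child: Record) -> None:
    """Link child at the end of parent's set occurrence."""
    child.parent = parent
    child.previous_sibling = parent.last_child
    child.next_sibling = None
    if parent.last_child is None:
        parent.first_child = child
    else:
        parent.last_child.next_sibling = child
    parent.last_child = child


def children(parent: Record) -> list[str]:
    """Scan the set the way a cursor would: FIRST_CHILD, then NEXT_SIBLING."""
    names = []
    node = parent.first_child
    while node is not None:
        names.append(node.name)
        node = node.next_sibling
    return names


dept = Record("dept")
for name in ("a", "b", "c"):
    add_child(dept, Record(name))
print(children(dept))  # ['a', 'b', 'c']
b = dept.first_child.next_sibling
print(b.name, b.parent.name, b.previous_sibling.name)  # b dept a
```

Because every navigation operation follows a single stored pointer, moving from a child to its parent or to a neighbor costs the same regardless of how many children the parent has.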
The RSS support of data is substantially more sophisticated than that normally found in an access method; it supports variable-length fields, indices on multiple fields, multiple record types per file, interfile and intrafile sets, physical clustering of records by attribute, and a catalog describing the data, which is kept as a file which may be manipulated like any other data.
Another major contribution of the RSS is its support of the notion of transaction, a unit of recovery consisting of an application-specified sequence of RSS actions. An application declares the start of a transaction by issuing a BEGIN action. Thereafter all RSS actions by that application are within the scope of that transaction until the application issues a COMMIT or an ABORT action. The RSS assumes all responsibility for running concurrent transactions and for assuring that each transaction sees a consistent view of the database. The RSS is also responsible for recovering the data to their most recent consistent state in the event of transaction, action, system, or media failure or a user request to cancel the transaction.
A final component of System R is the operating system. System R runs under the VM/370 [GRAY75] and MVS operating systems on IBM S/370 processors. The System R recovery manager is also part of the SQL/DS product running on DOS/CICS. The operating system provides processes, a simple file system, and terminal management.
System R allocates an operating system process for each user to run both the user's application program and the System R database manager. Application programs are written in a conventional programming language (e.g., COBOL or PL/I) augmented with the SQL language. A SQL preprocessor maps the SQL statements to sequences of RSS calls. Typically, a single application program or group of programs (main plus subroutines) constitutes a transaction. In this paper we ignore the RDS and assume that application programs, like those produced by the SQL compiler, consist of conventional programs which invoke sequences of RSS operations.
Model of Failures
The recovery manager eases the task of writing fault-tolerant programs. It does so by the careful use of redundancy. Choosing appropriate redundancy requires a quantitative model of system failures.
In our experience about 97 percent of all transactions execute successfully. Of the remainder, almost all fail because of incorrect user input or because of user cancellation. Occasionally (much less than 1 percent) transactions are aborted by the system as a result of some overload such as deadlock. In a typical system running one transaction per second, transaction undo occurs about twice a minute. Because of its frequency, transaction undo must run about as fast as forward processing of transactions.
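The "about twice a minute" figure follows from the rates just quoted, assuming that every transaction which does not succeed is undone:

```python
# Back-of-the-envelope check of the undo rate quoted above. The inputs
# are the figures from the text: one transaction per second and about
# 97 percent of transactions completing successfully.
def undos_per_minute(tx_per_second: float, success_rate: float) -> float:
    """Expected transaction undos per minute if every failure is undone."""
    return tx_per_second * 60 * (1 - success_rate)


rate = undos_per_minute(1.0, 0.97)
print(f"{rate:.1f} undos per minute")  # 1.8 undos per minute
```

At roughly 1.8 undos per minute, undo is a routine operation rather than an exceptional one, which is why its speed matters as much as that of forward processing.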
Every few days the system restarts (following a crash). Almost all crashes are due to hardware or operating system failures, although System R also initiates crash and restart whenever it detects damage to its data structures. The state of primary memory is lost after a crash. We assume that the state of the disks (secondary and tertiary storage) is preserved across crashes, so at restart the most recently committed state is reconstructed from the surviving disk state by referencing a log of recent activity to restore the work of committed and aborted transactions. This process completes within a matter of seconds or minutes.
Table 1. Frequency and Recovery Time of Failures (recovery manager trade-offs)
Fault               Frequency            Recovery time
Transaction abort   Several per minute   Milliseconds
System restart      Several per month    Seconds
Media failure       Several per year     Minutes
Occasionally, the integrity of the disk state will be lost at restart. This may be caused by hardware failure (disk head crash or disk dropped on the floor) or by software failure (bad data written on a disk page by System R or another program). Such events are called media failures and initiate a reconstruction of the current state from an archive version (an old and undamaged version of the system state) plus a log of activity since that time. This procedure is invoked once or twice a year and is expected to complete within an hour.
If all these recovery procedures fail, the user will have lost data owing to an unrecoverable failure. We have very limited statistics on unrecoverable failures. The current release of System R has experienced about 25 years of cumulative service in a variety of installations, and to our knowledge almost all unrecoverable failures have resulted from operations errors (e.g., failure to make archive dumps) or from bugs in the operating system utility for dumping and restoring disks. The fact that the archive mechanism is only a minor source of unrecoverable failure probably indicates that it is appropriately designed. Table 1 summarizes this discussion.
If the archive mechanism fails once every hundred years of operation, and if there are 10,000 installations of System R, then it will fail someone about 100 times a year (10,000 installations × 1/100 failures per installation-year), or roughly twice a week. From this perspective, it might be underdesigned.
We assume that System R, the operating system, the microcode, and the hardware all have bugs in them. However, each of these systems does quite a bit of checking of its data structures (defensive programming). We postulate that these errors are detected and that the system crashes before the data are seriously corrupted. If this assumption is incorrect, then the situation is treated as a media failure. This attitude assumes that the archive and log mechanisms are very reliable and have failure modes independent of the other parts of the system.
Some commercial systems are much more demanding. They run hundreds of transactions per second, and because they have hundreds of disks, they see disk failures hundreds of times as frequently as typical users of System R (once a week rather than once a year). They also cannot tolerate downtimes exceeding a few minutes. Although the concepts presented in this paper are applicable to such systems, much more redundancy is needed to meet such demands (e.g., duplexed processors and disks, and utilities which can recover small parts of the database without having to recover it all every time). The recovery manager presented here is a textbook one, whose basic facilities are only a subset of those provided by more sophisticated systems.
The transaction model is an unrealizable ideal. At best, careful use of redundancy minimizes the probability of unrecoverable failures and consequent loss of committed updates. Redundant copies are designed to have independent failure modes, making it unlikely that all records will be lost at once. However, Murphy's law ensures that all recovery techniques will sometimes fail. Nonetheless, as seen below, System R can tolerate any single failure and can often tolerate multiple failures.
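The benefit of independent failure modes can be made concrete with a simple formula. Assume, for illustration, that $k$ redundant copies of some data are each lost during a given period with probabilities $p_1, \dots, p_k$, and that these losses are independent. Then the data are lost only if every copy is lost:

```latex
P(\text{data lost}) = \prod_{i=1}^{k} p_i ,
\qquad \text{e.g. } k = 2,\; p_1 = p_2 = p \;\Rightarrow\; P(\text{data lost}) = p^{2} .
```

Because each $p_i$ is small, the product is much smaller than any single term. A common cause, such as a software bug that damages both the database and its log, makes the losses correlated, and then the product understates the risk; this is why the archive and log are assumed to fail independently of the rest of the system.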
1. DESCRIPTION OF SYSTEM R RECOVERY MANAGER
1.1 What Is a Transaction?
The RSS provides actions on the objects it implements. These actions include operations to create, destroy, manipulate, retrieve, and modify RSS objects (files, record types, record instances, indices, sets, and cursors). Each RSS action is atomic (it either happens or has no effect) and consistent (if any two actions relate to the same object, they appear to execute in some serial order). These two qualities are ensured by (1) undoing the partial effects of any actions which fail and (2) locking necessary RSS resources for the duration of the action.
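The two techniques just named can be sketched for a hypothetical action that stores a record and then enters it in a unique index. `Table`, `insert`, and the record layout are illustrative assumptions, and a Python `threading.Lock` stands in for whatever locking the RSS performs. If the index step fails, the already-stored record is removed, so the action either happens completely or has no effect.

```python
# Hypothetical sketch of one atomic, consistent "action": store a record
# and then maintain a unique index on it. If the second step fails, the
# first step is undone, and a lock is held for the whole action.
import threading


class Table:
    def __init__(self):
        self.records = {}               # record id -> (key, payload)
        self.index = {}                 # key -> record id (unique index)
        self.next_id = 1
        self.lock = threading.Lock()    # held for the duration of one action

    def insert(self, key, payload):
        with self.lock:                              # lock for the action
            rid = self.next_id
            self.records[rid] = (key, payload)       # step 1: store record
            try:
                if key in self.index:                # step 2: update index
                    raise ValueError(f"duplicate key {key!r}")
                self.index[key] = rid
            except ValueError:
                del self.records[rid]                # undo the partial effect
                raise
            self.next_id += 1
            return rid


t = Table()
print(t.insert("k1", "first"))    # 1
try:
    t.insert("k1", "second")      # index step fails after the record was stored
except ValueError as err:
    print("action failed:", err)  # action failed: duplicate key 'k1'
print(len(t.records), t.index)    # 1 {'k1': 1}
```

The `with` statement releases the lock even when the exception propagates, so a failed action leaves neither a partial record nor a held lock behind.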
RSS actions are rather primitive. In general, functions like "hire an employee" or "make a deposit in an account" require several actions. The user, in mapping abstractions like "employee" or "account" into such a system, must combine several actions into an atomic transaction. The classic example of an atomic transaction is a funds transfer which debits one account, credits another, writes an activity record, and does some terminal input or output. The user of such a transaction wants it to be an all-or-nothing affair, in that he does not want only some of the actions to have occurred. If the transaction is correctly implemented, it looks and acts atomic.
In a multiuser environment, transactions take on the additional attribute that any two transactions concurrently operating on common objects appear to run serially (i.e., as though there were no concurrency). This property is called consistency and is handled by the RSS lock subsystem [ESWA76, GRAY76, GRAY78, NAUM78].
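One common way for a lock subsystem to produce this serial appearance is to grant shared (S) locks to readers and exclusive (X) locks to writers and to hold every lock until the transaction commits or aborts. The sketch below assumes that scheme; `LockTable`, `acquire`, and `release_all` are hypothetical names, and instead of queueing waiters and detecting deadlocks, as a real lock manager must, `acquire` simply reports whether the request could be granted.

```python
# Hypothetical sketch of a lock table with shared ("S") and exclusive ("X")
# modes. Locks are kept until the owning transaction commits or aborts.
from collections import defaultdict


class LockTable:
    def __init__(self):
        self.holders = defaultdict(dict)  # object -> {transaction: mode}

    def acquire(self, txn, obj, mode):
        """Grant the lock if compatible; return False if txn would have to wait."""
        held = self.holders[obj]
        others = [m for t, m in held.items() if t != txn]
        if mode == "S":
            granted = all(m == "S" for m in others)  # readers may share
        else:
            granted = not others                     # a writer must be alone
        if granted and held.get(txn) != "X":         # never downgrade X to S
            held[txn] = mode
        return granted

    def release_all(self, txn):
        """Called at COMMIT or ABORT: drop every lock txn holds."""
        for held in self.holders.values():
            held.pop(txn, None)


locks = LockTable()
print(locks.acquire("T1", "account 1", "S"))  # True: T1 reads
print(locks.acquire("T2", "account 1", "S"))  # True: readers share
print(locks.acquire("T2", "account 1", "X"))  # False: T2 must wait for T1
locks.release_all("T1")                        # T1 commits
print(locks.acquire("T2", "account 1", "X"))  # True
```

Holding the locks to the end of the transaction prevents another transaction from seeing, or overwriting, data that an uncommitted transaction has touched.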
The application declares a sequence of actions to be a transaction by beginning the sequence with a BEGIN action and ending it with a COMMIT action. All intervening actions by that application (be it one or several processes) are considered to be parts of a single recovery unit. If the application gets into trouble, it may issue the ABORT action, which undoes all actions in the transaction. Further, the system may unilaterally abort in-progress transactions in case of an authorization violation, resource limit, deadlock, system shutdown, or crash. Figure 3 shows the three possible outcomes of a transaction (commit, abort, or system abortion), and Figure 4 shows the outcomes of five sample transactions in the event of a system crash.
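The three outcomes can be illustrated by classifying transactions from a hypothetical history of BEGIN, COMMIT, and ABORT events recorded before a crash. The event format and the functions `outcomes_at_crash` and `must_undo` are assumptions for this sketch and are much simpler than a real log.

```python
# Hypothetical sketch: decide each transaction's outcome from the events
# recorded before a crash. Only committed transactions keep their effects;
# aborted and in-progress transactions must have their actions undone.
def outcomes_at_crash(events):
    """events: list of (transaction id, "BEGIN" | "COMMIT" | "ABORT")."""
    state = {}
    for txn, event in events:
        if event == "BEGIN":
            state[txn] = "in progress"
        elif event == "COMMIT":
            state[txn] = "committed"
        elif event == "ABORT":
            state[txn] = "aborted"          # application issued ABORT
        else:
            raise ValueError(f"unknown event {event!r}")
    # A transaction still in progress when the system crashes is aborted
    # unilaterally by the system.
    return {txn: ("system aborted" if s == "in progress" else s)
            for txn, s in state.items()}


def must_undo(outcome):
    """Every outcome except commit requires undoing the transaction's actions."""
    return outcome != "committed"


history = [("T1", "BEGIN"), ("T1", "COMMIT"), ("T2", "BEGIN"),
           ("T3", "BEGIN"), ("T3", "ABORT"), ("T4", "BEGIN")]
print(outcomes_at_crash(history))
# {'T1': 'committed', 'T2': 'system aborted', 'T3': 'aborted', 'T4': 'system aborted'}
```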
If a transaction either aborts or is aborted, the system must undo all actions of that transaction. Once a transaction commits, however, its updates and messages to

REFERENCES
The notions of consistency and predicate locks in a database system.
Notes on Data Base Operating Systems. Jim Gray.
System R: relational approach to database management.
SEQUEL 2: a unified approach to data definition, manipulation, and control.
Physical integrity in a large segmented database.