
Showing papers in "ACM Transactions on Computer Systems in 1992"


Journal ArticleDOI
TL;DR: In this paper, a log-structured file system called Sprite LFS is proposed; it writes all modifications to disk sequentially in a log to speed up file writing and crash recovery, and uses a segment cleaner to compact the live information from heavily fragmented segments.
Abstract: This paper presents a new technique for disk storage management called a log-structured file system. A log-structured file system writes all modifications to disk sequentially in a log-like structure, thereby speeding up both file writing and crash recovery. The log is the only structure on disk; it contains indexing information so that files can be read back from the log efficiently. In order to maintain large free areas on disk for fast writing, we divide the log into segments and use a segment cleaner to compress the live information from heavily fragmented segments. We present a series of simulations that demonstrate the efficiency of a simple cleaning policy based on cost and benefit. We have implemented a prototype log-structured file system called Sprite LFS; it outperforms current Unix file systems by an order of magnitude for small-file writes while matching or exceeding Unix performance for reads and large writes. Even when the overhead for cleaning is included, Sprite LFS can use 70% of the disk bandwidth for writing, whereas Unix file systems typically can use only 5–10%.
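The cost-benefit cleaning policy the abstract mentions can be sketched in a few lines. The benefit/cost ratio below follows the paper's general idea (free space gained, weighted by the age of the data, divided by the cost of reading the segment and rewriting its live data), but the data shapes and values are illustrative, not Sprite LFS's implementation.

```python
def cleaning_priority(utilization: float, age: float) -> float:
    """Benefit/cost ratio for cleaning one segment.

    utilization: fraction of the segment that is still live (0..1).
    age: time since the segment's data was last modified.
    Benefit is the free space gained (1 - u) weighted by age; cost is
    reading the segment and rewriting its live data (1 + u).
    """
    return (1.0 - utilization) * age / (1.0 + utilization)

def pick_segment_to_clean(segments):
    """segments: iterable of (segment_id, utilization, age) tuples.
    Returns the id of the segment with the highest benefit/cost ratio."""
    return max(segments, key=lambda s: cleaning_priority(s[1], s[2]))[0]
```

Weighting by age lets cold, moderately utilized segments eventually win over hot, mostly empty ones, which is the behavior the paper's simulations favor.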

2,403 citations


Journal ArticleDOI
TL;DR: This paper shows that disconnected operation is feasible, efficient, and usable by describing its design and implementation in the Coda File System; the central idea is that caching of data, now widely used for performance, can also be exploited to improve availability.
Abstract: Disconnected operation is a mode of operation that enables a client to continue accessing critical data during temporary failures of a shared data repository. An important, though not exclusive, application of disconnected operation is in supporting portable computers. In this paper, we show that disconnected operation is feasible, efficient and usable by describing its design and implementation in the Coda File System. The central idea behind our work is that caching of data, now widely used for performance, can also be exploited to improve availability.
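As a rough illustration of the idea (a hedged sketch, not Coda's actual design), a client-side cache can keep serving reads from cached data while the server is unreachable, log local updates, and replay them against the server on reconnection. All names and shapes here are hypothetical.

```python
class DisconnectedCache:
    """Toy client cache that logs writes while disconnected and
    reintegrates them when the server becomes reachable again."""

    def __init__(self):
        self.cache = {}        # locally cached file contents
        self.replay_log = []   # updates made while disconnected
        self.connected = True

    def write(self, path, data, server):
        self.cache[path] = data
        if self.connected:
            server[path] = data            # write through immediately
        else:
            self.replay_log.append((path, data))  # defer for reintegration

    def read(self, path):
        return self.cache.get(path)        # cached data masks the failure

    def reintegrate(self, server):
        """Replay deferred updates against the server on reconnection."""
        for path, data in self.replay_log:
            server[path] = data
        self.replay_log.clear()
        self.connected = True
```

Real reintegration must also detect conflicting updates made at the server while the client was disconnected; that is omitted here.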

1,214 citations


Journal ArticleDOI
TL;DR: A theory of authentication and a system that implements it, based on the notion of principal and a “speaks for” relation between principals, is described and used to explain many existing and proposed security mechanisms.
Abstract: We describe a theory of authentication and a system that implements it. Our theory is based on the notion of principal and a “speaks for” relation between principals. A simple principal either has a name or is a communication channel; a compound principal can express an adopted role or delegated authority. The theory shows how to reason about a principal's authority by deducing the other principals that it can speak for; authenticating a channel is one important application. We use the theory to explain many existing and proposed security mechanisms. In particular, we describe the system we have built. It passes principals efficiently as arguments or results of remote procedure calls, and it handles public and shared key encryption, name lookup in a large name space, groups of principals, program loading, delegation, access control, and revocation.
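Deducing the other principals a given principal can speak for reduces, in the simplest case, to a reachability computation over the "speaks for" relation. The sketch below is a minimal illustration of that reflexive-transitive reasoning only; the paper's calculus additionally covers roles, delegation, and compound principals.

```python
def speaks_for(premises, a, b):
    """premises: set of (p, q) pairs meaning 'p speaks for q'.
    Returns True if a speaks for b by reflexivity and transitivity."""
    if a == b:          # every principal speaks for itself
        return True
    seen, frontier = {a}, [a]
    while frontier:
        p = frontier.pop()
        for (x, y) in premises:
            if x == p and y not in seen:   # follow one 'speaks for' step
                if y == b:
                    return True
                seen.add(y)
                frontier.append(y)
    return False
```

For example, if alice speaks for the admins group and admins speaks for the file server, alice's requests carry the server's authority.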

857 citations


Journal ArticleDOI
TL;DR: In this paper, the authors argue that the performance of kernel threads is inherently worse than that of user-level threads, rather than this being an artifact of existing implementations; managing parallelism at the user level is essential to high-performance parallel computing.
Abstract: Threads are the vehicle for concurrency in many approaches to parallel programming. Threads can be supported either by the operating system kernel or by user-level library code in the application address space, but neither approach has been fully satisfactory. This paper addresses this dilemma. First, we argue that the performance of kernel threads is inherently worse than that of user-level threads, rather than this being an artifact of existing implementations; managing parallelism at the user level is essential to high-performance parallel computing. Next, we argue that the problems encountered in integrating user-level threads with other system services are a consequence of the lack of kernel support for user-level threads provided by contemporary multiprocessor operating systems; kernel threads are the wrong abstraction on which to support user-level management of parallelism. Finally, we describe the design, implementation, and performance of a new kernel interface and user-level thread package that together provide the same functionality as kernel threads without compromising the performance and flexibility advantages of user-level management of parallelism.

437 citations


Journal ArticleDOI
TL;DR: This paper describes a new way of implementing causal operations that performs well in terms of response time, operation-processing capacity, amount of stored state, and number and size of messages; it does better than replication methods based on reliable multicast techniques.
Abstract: To provide high availability for services such as mail or bulletin boards, data must be replicated. One way to guarantee consistency of replicated data is to force service operations to occur in the same order at all sites, but this approach is expensive. For some applications a weaker causal operation order can preserve consistency while providing better performance. This paper describes a new way of implementing causal operations. Our technique also supports two other kinds of operations: operations that are totally ordered with respect to one another and operations that are totally ordered with respect to all other operations. The method performs well in terms of response time, operation-processing capacity, amount of stored state, and number and size of messages; it does better than replication methods based on reliable multicast techniques.
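Causal ordering of this kind is commonly enforced with vector timestamps: a replica applies an operation only when it is the next event from its origin and the replica has already seen everything the operation causally depends on. The sketch below illustrates that standard readiness check; it is an assumption about mechanism, and the paper's multipart timestamps are more elaborate.

```python
def causally_ready(op_ts, replica_ts, origin):
    """op_ts, replica_ts: vector timestamps (lists of ints, one per replica);
    origin: index of the replica where the operation originated.

    The operation is ready when it is the very next event from its origin
    and the replica has seen every event the operation's issuer had seen
    at all other replicas."""
    for i in range(len(op_ts)):
        if i == origin:
            if op_ts[i] != replica_ts[i] + 1:   # must be the next op from origin
                return False
        elif op_ts[i] > replica_ts[i]:          # missing a causal dependency
            return False
    return True
```

Operations that fail the check are buffered and retried as the replica's timestamp advances, which is what makes the scheme lazy rather than relying on reliable multicast.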

434 citations


Journal ArticleDOI
TL;DR: This work uses simulation to compare different design choices in the Continuous Media File System, CMFS, and addresses several interrelated design issues: real-time semantics of sessions, disk layout, an acceptance test for new sessions, and disk scheduling policy.
Abstract: The Continuous Media File System, CMFS, supports real-time storage and retrieval of continuous media data (digital audio and video) on disk. CMFS clients read or write files in “sessions,” each with a guaranteed minimum data rate. Multiple sessions, perhaps with different rates, and non-real-time access can proceed concurrently. CMFS addresses several interrelated design issues: real-time semantics of sessions, disk layout, an acceptance test for new sessions, and disk scheduling policy. We use simulation to compare different design choices.
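An acceptance test for new sessions can, in its simplest form, check that the aggregate guaranteed rate stays within the disk's effective bandwidth. The sketch below is deliberately simplistic; CMFS's actual test also accounts for buffering and worst-case seek behavior, and the overhead parameter here is an assumed placeholder.

```python
def admit_session(new_rate, active_rates, disk_bandwidth, overhead=0.1):
    """Rate-based acceptance test sketch.

    new_rate: guaranteed rate requested by the new session (e.g. MB/s).
    active_rates: guaranteed rates of sessions already admitted.
    disk_bandwidth: raw sequential bandwidth of the disk.
    overhead: assumed fraction lost to seeks and non-real-time traffic.

    Admit only if every guarantee can still be met in the worst case."""
    effective = disk_bandwidth * (1.0 - overhead)
    return sum(active_rates) + new_rate <= effective
```

Rejecting a session that would violate the budget is what lets every already admitted session keep its guaranteed minimum rate.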

330 citations


Journal ArticleDOI
TL;DR: This work develops several page placement algorithms, called careful-mapping algorithms, that try to select a page frame from a pool of available page frames that is likely to reduce cache contention.
Abstract: When a computer system supports both paged virtual memory and large real-indexed caches, cache performance depends in part on the main memory page placement. To date, most operating systems place pages by selecting an arbitrary page frame from a pool of page frames that have been made available by the page replacement algorithm. We give a simple model that shows that this naive (arbitrary) page placement leads to up to 30% unnecessary cache conflicts. We develop several page placement algorithms, called careful-mapping algorithms, that try to select a page frame (from the pool of available page frames) that is likely to reduce cache contention. Using trace-driven simulation, we find that careful mapping results in 10–20% fewer (dynamic) cache misses than naive mapping (for a direct-mapped real-indexed multimegabyte cache). Thus, our results suggest that careful mapping by the operating system can get about half the cache miss reduction that a cache size (or associativity) doubling can.
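A careful-mapping policy of the kind described can be sketched as page coloring: prefer a free page frame whose position in the cache (its "color") matches the virtual page's, falling back to naive placement when no matching frame is free. Names and parameters below are illustrative.

```python
def pick_frame(free_frames, virtual_page, cache_bins):
    """Careful-mapping sketch for a real-indexed cache.

    free_frames: list of free physical frame numbers.
    virtual_page: virtual page number being mapped.
    cache_bins: number of page-sized bins the cache holds
                (cache size / page size for a direct-mapped cache).

    Prefer a frame with the same cache color as the virtual page, so pages
    that do not collide in the virtual address space tend not to collide
    in the cache either."""
    want = virtual_page % cache_bins
    for f in free_frames:
        if f % cache_bins == want:
            return f
    return free_frames[0]  # naive fallback: arbitrary frame
```

The contrast with the naive policy is exactly the fallback line: an arbitrary frame may land many hot pages in the same cache bin.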

289 citations


Journal ArticleDOI
TL;DR: This paper describes a new way to organize network software that differs from conventional architectures in all three of these properties: the protocol graph is complex, individual protocols encapsulate a single function, and the topology of the graph is dynamic.
Abstract: Network software is a critical component of any distributed system. Because of its complexity, network software is commonly layered into a hierarchy of protocols, or more generally, into a protocol graph. Typical protocol graphs—including those standardized in the ISO and TCP/IP network architectures—share three important properties: the protocol graph is simple, the nodes of the graph (protocols) encapsulate complex functionality, and the topology of the graph is relatively static. This paper describes a new way to organize network software that differs from conventional architectures in all three of these properties. In our approach, the protocol graph is complex, individual protocols encapsulate a single function, and the topology of the graph is dynamic. The main contribution of this paper is to describe the ideas behind our new architecture, illustrate the advantages of using the architecture, and demonstrate that the architecture results in efficient network software.
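The idea of many single-function protocols composed into a dynamically chosen graph can be illustrated, in simplified linear form, by stacking tiny encode/decode pairs at run time. The two micro-protocols below (a length header and a one-byte checksum) are hypothetical examples, not the paper's protocol suite.

```python
def make_stack(layers):
    """layers: list of (encode, decode) pairs, innermost first; each pair
    is a single-function micro-protocol over byte strings. The stack is
    assembled dynamically from whatever layers the caller chooses."""
    def send(msg: bytes) -> bytes:
        for enc, _ in layers:
            msg = enc(msg)          # wrap outward through each layer
        return msg
    def receive(msg: bytes) -> bytes:
        for _, dec in reversed(layers):
            msg = dec(msg)          # unwrap inward in reverse order
        return msg
    return send, receive

# Two tiny single-function protocols:
length = (lambda m: len(m).to_bytes(2, "big") + m,            # add length header
          lambda m: m[2:2 + int.from_bytes(m[:2], "big")])    # strip it
checksum = (lambda m: m + bytes([sum(m) % 256]),              # append checksum byte
            lambda m: m[:-1])                                  # drop it (no verify)
```

Because the stack is built from a runtime list, adding, removing, or reordering functions changes the "graph" without touching any layer's code, which is the property the architecture exploits.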

272 citations


Journal ArticleDOI
TL;DR: A modal logic framework, Security Logic (SL), includes definitions of knowledge, permission, and obligation; permission is used to specify secrecy policies and obligation to specify integrity policies.
Abstract: A formal framework called Security Logic (SL) is developed for specifying and reasoning about security policies and for verifying that system designs adhere to such policies. Included in this modal logic framework are definitions of knowledge, permission, and obligation. Permission is used to specify secrecy policies and obligation to specify integrity policies. The combination of policies is addressed and examples based on policies from the current literature are given.

129 citations


Journal ArticleDOI
TL;DR: Simulation results for a hexagonal mesh and a hypercube topology indicate that the expected cost can be lowered substantially by the proposed scheme, intended for distributed real-time systems.
Abstract: Reliable and timely delivery of messages between processing nodes is essential in distributed real-time systems. Failure to deliver a message within its deadline usually forces the system to undertake a recovery action, which introduces some cost (or overhead) to the system. This recovery cost can be very high, especially when the recovery action fails due to lack of time or resources. Proposed in this paper is a scheme to minimize the expected cost incurred as a result of messages failing to meet their deadlines. The scheme is intended for distributed real-time systems, especially with a point-to-point interconnection topology. The goal of minimizing the expected cost is achieved by sending multiple copies of a message through disjoint routes and thus increasing the probability of successful message delivery within the deadline. However, as the number of copies increases, the message traffic on the network increases, thereby increasing the delivery time for each of the copies. There is therefore a tradeoff between the number of copies of each message and the expected cost incurred as a result of messages missing their deadlines. The number of copies of each message to be sent is determined by optimizing this tradeoff. Simulation results for a hexagonal mesh and a hypercube topology indicate that the expected cost can be lowered substantially by the proposed scheme.
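The tradeoff described, where extra copies lower the chance that every copy misses its deadline but add traffic that slows each copy down, can be optimized numerically. The cost model below is a toy illustration with assumed parameters, not the paper's analysis.

```python
def expected_cost(n_copies, base_miss_prob, load_penalty, recovery_cost, send_cost):
    """Toy model of the copies-vs-cost tradeoff.

    Each extra copy raises the per-copy deadline-miss probability
    (extra traffic), but recovery is triggered only if ALL copies miss.
    All parameters are illustrative assumptions."""
    p = min(1.0, base_miss_prob + load_penalty * (n_copies - 1))
    return recovery_cost * (p ** n_copies) + send_cost * n_copies

def best_copies(max_copies, **params):
    """Pick the copy count minimizing expected cost over 1..max_copies."""
    return min(range(1, max_copies + 1),
               key=lambda n: expected_cost(n, **params))
```

With a high recovery cost relative to the per-copy sending cost, the optimum sits at several copies; as traffic penalties grow, it shrinks back toward one.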

56 citations


Journal ArticleDOI
TL;DR: This paper describes some experiments tracing locality at a finer grain, looking at references to individual processes, and with fine-grained time resolution.
Abstract: Packets on a LAN can be viewed as a series of references to and from the objects they address. The amount of locality in this reference stream may be critical to the efficiency of network implementations, if the locality can be exploited through caching or scheduling mechanisms. Most previous studies have treated network locality with an addressing granularity of networks or individual hosts. This paper describes some experiments tracing locality at a finer grain, looking at references to individual processes, and with fine-grained time resolution. Observations of typical LANs show high per-process locality; that is, packets to a host usually arrive for the process that most recently sent a packet, and often with little intervening delay.
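The per-process locality measured here is exactly what a one-entry "most recent process" cache per host would exploit. A minimal sketch of how such a hit rate could be counted over a packet trace (data shapes are assumed, not the paper's trace format):

```python
def cache_hits(packets):
    """packets: list of (host, process) arrival events, in trace order.

    Counts how often a packet arrives for the process that most recently
    communicated on that host, i.e. hits in a one-entry per-host cache."""
    last = {}   # host -> most recently active process on that host
    hits = 0
    for host, proc in packets:
        if last.get(host) == proc:
            hits += 1
        last[host] = proc
    return hits
```

A high hit count on real traces is what would justify caching demultiplexing state or scheduling the receiving process early.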

Journal ArticleDOI
TL;DR: This work describes the design, implementation, and performance of servers for a shared atomic object, a semiqueue, where each server employs either pessimistic or optimistic locking techniques on each conflicting event type, and compares the performance of a purely optimistic server, a purely pessimistic server, and a hybrid server to demonstrate the most appropriate environment.
Abstract: In many distributed systems concurrent access is required to a shared object, where abstract object servers may incorporate type-specific properties to define consistency requirements. Each operation and its outcome is treated as an event, and conflicts may occur between different event types. Hence concurrency control and synchronization are required at the granularity of conflicting event types. With such a fine granularity of locking, the occurrence of conflicts is likely to be lower than with whole-object locking, so optimistic techniques become more attractive. This work describes the design, implementation, and performance of servers for a shared atomic object, a semiqueue, where each server employs either pessimistic or optimistic locking techniques on each conflicting event type. We compare the performance of a purely optimistic server, a purely pessimistic server, and a hybrid server which treats certain event types optimistically and others pessimistically, to demonstrate the most appropriate environment for using pessimistic, optimistic, or hybrid control. We show that the low overhead of optimistic locking at low conflict levels is offset at higher conflict levels by the wasted work done by aborted transactions. To achieve optimum performance over the whole range of conflict levels, an adaptable server is required, whereby the treatment of conflicting event types can be changed dynamically between optimistic and pessimistic, according to various criteria depending on the expected frequency of conflict. We describe our implementations of adaptable servers which may allocate concurrency control strategy on the basis of state information, the history of conflicts encountered, or by using preset transaction priorities. We show that the adaptable servers perform almost as well as the best of the purely optimistic, pessimistic, or hybrid servers over the whole range of conflict levels, demonstrating the versatility and efficiency of the dynamic servers. Finally, we outline a general design methodology for implementing adaptable concurrency control in servers for atomic objects, illustrated using an atomic shared B-tree.
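One way to realize the adaptable treatment described, switching a conflicting event type between optimistic and pessimistic handling as its observed conflict rate changes, is sketched below. The exponentially weighted rate, threshold, and all names are illustrative assumptions, not the paper's actual criteria.

```python
class AdaptiveLockPolicy:
    """Per-event-type switch between optimistic and pessimistic handling,
    driven by a running estimate of the conflict rate."""

    def __init__(self, threshold=0.3, alpha=0.2):
        self.threshold = threshold  # conflict rate above which we go pessimistic
        self.alpha = alpha          # weight of the newest observation
        self.rate = {}              # event type -> smoothed conflict rate

    def record(self, event_type, conflicted):
        """Fold one observed outcome (conflict or not) into the estimate."""
        r = self.rate.get(event_type, 0.0)
        sample = 1.0 if conflicted else 0.0
        self.rate[event_type] = (1 - self.alpha) * r + self.alpha * sample

    def mode(self, event_type):
        """Current treatment for this event type."""
        if self.rate.get(event_type, 0.0) > self.threshold:
            return "pessimistic"
        return "optimistic"
```

A server would consult mode() when an operation of that event type arrives, getting optimistic behavior while conflicts are rare and pessimistic locking once aborts start wasting work.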

Journal ArticleDOI
TL;DR: A single-stage combining network to handle synchronization traffic, which is separated from the regular memory traffic, is proposed and shown to give good performance at a lower cost than a multistage combining network.
Abstract: In large multiprocessor systems, fast synchronization is crucial for high performance. However, synchronization traffic tends to create “hot-spots” in shared memory and cause network congestion. Multistage shuffle-exchange networks have been proposed and built to handle synchronization traffic. Software combining schemes have also been proposed to relieve network congestion caused by hot-spots. However, multistage combining networks could be very expensive and software combining could be very slow. In this paper, we propose a single-stage combining network to handle synchronization traffic, which is separated from the regular memory traffic. A single-stage combining network has several advantages: (1) it is attractive from an implementation perspective because only one stage is needed (instead of log N stages); (2) only one network is needed to handle both forward and returning requests; (3) combined requests are distributed evenly through the network, which reduces the wait buffer size; and (4) fast-finishing algorithms [30] can be used to shorten the network delay. Because of all these advantages, we show that a single-stage combining network gives good performance at a lower cost than a multistage combining network.
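Request combining itself can be illustrated with fetch-and-add: concurrent increments to the same hot location are merged into one memory operation, and the intermediate sums are fanned back out as the per-request results. The sketch below shows only the combining arithmetic, not the network that performs it.

```python
def combine_fetch_and_add(requests, initial):
    """Merge concurrent fetch&add requests to one location.

    requests: list of increments arriving together for the same address.
    initial: current value stored at that address.

    Returns (per-request results, new memory value). Memory sees a single
    combined add instead of len(requests) separate hot-spot accesses."""
    results = []
    offset = 0
    for inc in requests:
        results.append(initial + offset)  # value this request would have fetched
        offset += inc                     # accumulate the combined increment
    return results, initial + offset
```

Each requester receives exactly the value it would have seen under some serial order, while the shared location is touched only once, which is why combining defuses hot-spots.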