Showing papers by "Thomas Anderson" published in 2014


Proceedings Article
06 Oct 2014
TL;DR: A new operating system, Arrakis, is designed and implemented that splits the traditional role of the kernel in two, allowing most I/O operations to skip the kernel entirely, while the kernel is re-engineered to provide network and disk protection without kernel mediation of every operation.
Abstract: Recent device hardware trends enable a new approach to the design of network server operating systems. In a traditional operating system, the kernel mediates access to device hardware by server applications, to enforce process isolation as well as network and disk security. We have designed and implemented a new operating system, Arrakis, that splits the traditional role of the kernel in two. Applications have direct access to virtualized I/O devices, allowing most I/O operations to skip the kernel entirely, while the kernel is re-engineered to provide network and disk protection without kernel mediation of every operation. We describe the hardware and software changes needed to take advantage of this new abstraction, and we illustrate its power by showing improvements of 2-5x in latency and 9x in throughput for a popular persistent NoSQL store relative to a well-tuned Linux implementation.

364 citations


Proceedings Article
17 Aug 2014
TL;DR: Unlike efforts to redesign the Internet from scratch, it is shown that ARROW can address a set of well-known Internet vulnerabilities, for most users, with the adoption of only a single transit ISP.
Abstract: A longstanding problem with the Internet is that it is vulnerable to outages, black holes, hijacking, and denial of service. Although architectural solutions have been proposed to address many of these issues, they have had difficulty being adopted because they require widespread deployment before most users would see any benefit. This is especially relevant as the Internet is increasingly used for applications where correct and continuous operation is essential. In this paper, we study whether a simple, easy-to-implement model is sufficient for addressing the aforementioned Internet vulnerabilities. Our model, called ARROW (Advertised Reliable Routing Over Waypoints), is designed to allow users to configure reliable and secure end-to-end paths through participating providers. With ARROW, a highly reliable ISP offers tunneled transit through its network, along with packet transformation at the ingress, as a service to remote paying customers. Those customers can stitch together reliable end-to-end paths through a combination of participating and non-participating ISPs in order to improve the fault tolerance, robustness, and security of mission-critical transmissions. Unlike efforts to redesign the Internet from scratch, we show that ARROW can address a set of well-known Internet vulnerabilities, for most users, with the adoption of only a single transit ISP. To demonstrate ARROW, we have added it to a small-scale wide-area ISP we control. We evaluate its performance and failure recovery properties in both simulation and live settings.

53 citations


Journal Article
28 Jul 2014
TL;DR: Nebula provides resilient networking services using ultrareliable routers, an extensible control plane, and multiple paths upon which arbitrary policies may be enforced.
Abstract: Nebula is a proposal for a Future Internet Architecture. It is based on the assumptions that (1) cloud computing will comprise an increasing fraction of the application workload offered to an Internet, and (2) access to cloud computing resources will demand new architectural features from a network. Features that we have identified include dependability, security, flexibility, and extensibility, the entirety of which constitute resilience. Nebula provides resilient networking services using ultrareliable routers, an extensible control plane, and the use of multiple paths upon which arbitrary policies may be enforced. We report on a prototype system, Zodiac, that incorporates these latter two features.

31 citations


Proceedings Article
17 Jun 2014
TL;DR: A radical re-architecture of the traditional operating system storage stack is proposed to move the kernel off the data path to dramatically reduce the CPU overhead of storage operations while improving application flexibility.
Abstract: We propose a radical re-architecture of the traditional operating system storage stack to move the kernel off the data path. Leveraging virtualized I/O hardware for disk and flash storage, most read and write I/O operations go directly to application code. The kernel dynamically allocates extents, manages the virtual to physical binding, and performs name translation. The benefit is to dramatically reduce the CPU overhead of storage operations while improving application flexibility.

16 citations


Proceedings Article
25 Jun 2014
TL;DR: A new failure model, called Machine Fault Tolerance, and a new abstraction, a replicated write-once trusted table, are proposed to provide improved resilience to undetected CPU, memory, and disk errors at datacenter scale.
Abstract: Although rare in absolute terms, undetected CPU, memory, and disk errors occur often enough at datacenter scale to significantly affect overall system reliability and availability. In this paper, we propose a new failure model, called Machine Fault Tolerance, and a new abstraction, a replicated write-once trusted table, to provide improved resilience to these types of failures. Since most machine failures manifest in application server and operating system code, we assume a Byzantine model for those parts of the system. However, by assuming that the hypervisor and network are trustworthy, we are able to reduce the overhead of machine-fault masking to be close to that of non-Byzantine Paxos.

5 citations


Book Chapter
07 May 2014
TL;DR: This article discusses the many alternatives that present themselves when designing a support system for threads on a shared-memory multiprocessor, and concludes with a brief survey of three contemporary thread management systems.
Abstract: Threads, or "lightweight processes," have become a common and necessary component of new languages and operating systems. Threads allow the programmer or compiler to express, create, and control parallel activities, contributing to the structure and performance of programs. In this article, we discuss the many alternatives that present themselves when designing a support system for threads on a shared-memory multiprocessor. These alternatives influence the ease, granularity, and performance of parallel programming. We conclude with a brief survey of three contemporary thread management systems (Windows NT, Presto, and Multilisp), using them to illustrate the issues raised in this article.

3 citations