Author
Nuno Neves
Other affiliations: University of Illinois at Urbana–Champaign, Carnegie Mellon University, University of Évora
Bio: Nuno Neves is an academic researcher from University of Lisbon. The author has contributed to research in topics: Intrusion tolerance & Asynchronous communication. The author has an hindex of 35, co-authored 157 publications receiving 3531 citations. Previous affiliations of Nuno Neves include University of Illinois at Urbana–Champaign & Carnegie Mellon University.
Papers published on a yearly basis
Papers
More filters
TL;DR: The paper describes the fundamental concepts behind IT, tracing their connection with classical fault tolerance and security, and discusses the main strategies and mechanisms for architecting IT systems, and reports on recent advances on distributed IT system architectures.
Abstract: There is a significant body of research on distributed computing architectures, methodologies and algorithms, both in the fields of fault tolerance and security. Whilst they have taken separate paths until recently, the problems to be solved are of similar nature. In classical dependability, fault tolerance has been the workhorse of many solutions. Classical security-related work has on the other hand privileged, with few exceptions, intrusion prevention. Intrusion tolerance (IT) is a new approach that has slowly emerged during the past decade, and gained impressive momentum recently. Instead of trying to prevent every single intrusion, these are allowed, but tolerated: the system triggers mechanisms that prevent the intrusion from generating a system security failure. The paper describes the fundamental concepts behind IT, tracing their connection with classical fault tolerance and security. We discuss the main strategies and mechanisms for architecting IT systems, and report on recent advances on distributed IT system architectures.
202 citations
18 Oct 2004
TL;DR: This paper extends the normal asynchronous system with a special distributed oracle called TTCB to implement an intrusion-tolerant service based on the state machine approach with only 2f + 1 replicas, this is the first time the number of replicas is reduced.
Abstract: The application of dependability concepts and techniques to the design of secure distributed systems is raising a considerable amount of interest in both communities under the designation of intrusion tolerance. However, practical intrusion-tolerant replicated systems based on the state machine approach (SMA) can handle at most f Byzantine components out of a total of n = 3f + 1, which is the maximum resilience in asynchronous systems. This paper extends the normal asynchronous system with a special distributed oracle called TTCB. Using this extended system we manage to implement an intrusion-tolerant service based on the SMA with only 2f + 1 replicas. Albeit a few other papers in the literature present intrusion-tolerant services with this approach, this is the first time the number of replicas is reduced from 3f + 1 to 2f + 1. Another interesting characteristic of the described service is a low time complexity.
156 citations
TL;DR: The paper proves the equivalence between multi-valued consensus and atomic broadcast in the Byzantine failure model without signatures, and has theoretical relevance since they show once more that consensus is a fundamental problem in distributed systems.
Abstract: This paper proposes a stack of three Byzantine-resistant protocols aimed to be used in practical distributed systems: multi-valued consensus, vector consensus and atomic broadcast. These protocols are designed as successive transformations from one to another. The first protocol, multi-valued consensus, is implemented on top of a randomized binary consensus and a reliable broadcast protocol. The protocols share a set of important structural properties. First, they do not use digital signatures constructed with public-key cryptography, a well-known performance bottleneck in this kind of protocols. Second, they are time-free, i.e. they make no synchrony assumptions, since these assumptions are often vulnerable to subtle but effective attacks. Third, they are completely decentralized, thus avoiding the cost of detecting corrupt leaders. Fourth, they have optimal resilience, i.e. they tolerate the failure of f = ⌊(n-1)/3⌋ out of a total of n processes. In terms of time complexity, the multi-valued consensus protocol terminates in a constant expected number of rounds, while the vector consensus and atomic broadcast protocols have O(f) complexity. The paper also proves the equivalence between multi-valued consensus and atomic broadcast in the Byzantine failure model without signatures. A similar proof is given for the equivalence between multi-valued consensus and vector consensus. These two results have theoretical relevance since they show once more that consensus is a fundamental problem in distributed systems.
151 citations
TL;DR: A proactive-reactive recovery service based on a hybrid distributed system model is designed and it is shown how this service can effectively be used to increase the resilience of an intrusion-tolerant firewall adequate for the protection of critical infrastructures.
Abstract: In the past, some research has been done on how to use proactive recovery to build intrusion-tolerant replicated systems that are resilient to any number of faults, as long as recoveries are faster than an upper bound on fault production assumed at system deployment time. In this paper, we propose a complementary approach that enhances proactive recovery with additional reactive mechanisms giving correct replicas the capability of recovering other replicas that are detected or suspected of being compromised. One key feature of our proactive-reactive recovery approach is that, despite recoveries, it guarantees the availability of a minimum number of system replicas necessary to sustain correct operation of the system. We design a proactive-reactive recovery service based on a hybrid distributed system model and show, as a case study, how this service can effectively be used to increase the resilience of an intrusion-tolerant firewall adequate for the protection of critical infrastructures.
148 citations
TL;DR: A new checkpoint protocol that is well adapted to mobile environments is described, which uses time to indirectly coordinate the creation of new global states, avoiding all message exchanges.
Abstract: Mobile computing allows ubiquitous and continuous access to computing resources while the users travel or work at a client’s site. The flexibility introduced by mobile computing brings new challenges to the area of fault tolerance. Failures that were rare with fixed hosts become common, and host disconnection makes fault detection and message coordination difficult. This paper describes a new checkpoint protocol that is well adapted to mobile environments. The protocol uses time to indirectly coordinate the creation of new global states, avoiding all message exchanges. The protocol uses two different types of checkpoints to adapt to the current network characteristics, and to trade off performance with recovery time.
117 citations
Cited by
More filters
Journal Article•
9,185 citations
01 Jan 2015
TL;DR: This paper presents an in-depth analysis of the hardware infrastructure, southbound and northbound application programming interfaces (APIs), network virtualization layers, network operating systems (SDN controllers), network programming languages, and network applications, and presents the key building blocks of an SDN infrastructure using a bottom-up, layered approach.
Abstract: The Internet has led to the creation of a digital society, where (almost) everything is connected and is accessible from anywhere. However, despite their widespread adoption, traditional IP networks are complex and very hard to manage. It is both difficult to configure the network according to predefined policies, and to reconfigure it to respond to faults, load, and changes. To make matters even more difficult, current networks are also vertically integrated: the control and data planes are bundled together. Software-defined networking (SDN) is an emerging paradigm that promises to change this state of affairs, by breaking vertical integration, separating the network's control logic from the underlying routers and switches, promoting (logical) centralization of network control, and introducing the ability to program the network. The separation of concerns, introduced between the definition of network policies, their implementation in switching hardware, and the forwarding of traffic, is key to the desired flexibility: by breaking the network control problem into tractable pieces, SDN makes it easier to create and introduce new abstractions in networking, simplifying network management and facilitating network evolution. In this paper, we present a comprehensive survey on SDN. We start by introducing the motivation for SDN, explain its main concepts and how it differs from traditional networking, its roots, and the standardization activities regarding this novel paradigm. Next, we present the key building blocks of an SDN infrastructure using a bottom-up, layered approach. We provide an in-depth analysis of the hardware infrastructure, southbound and northbound application programming interfaces (APIs), network virtualization layers, network operating systems (SDN controllers), network programming languages, and network applications. We also look at cross-layer problems such as debugging and troubleshooting. In an effort to anticipate the future evolution of this new paradigm, we discuss the main ongoing research efforts and challenges of SDN. In particular, we address the design of switches and control platforms—with a focus on aspects such as resiliency, scalability, performance, security, and dependability—as well as new opportunities for carrier transport networks and cloud providers. Last but not least, we analyze the position of SDN as a key enabler of a software-defined environment.
3,589 citations
Posted Content•
TL;DR: Software-Defined Networking (SDN) as discussed by the authors is an emerging paradigm that promises to change this state of affairs, by breaking vertical integration, separating the network's control logic from the underlying routers and switches, promoting (logical) centralization of network control, and introducing the ability to program the network.
Abstract: Software-Defined Networking (SDN) is an emerging paradigm that promises to change this state of affairs, by breaking vertical integration, separating the network's control logic from the underlying routers and switches, promoting (logical) centralization of network control, and introducing the ability to program the network. The separation of concerns introduced between the definition of network policies, their implementation in switching hardware, and the forwarding of traffic, is key to the desired flexibility: by breaking the network control problem into tractable pieces, SDN makes it easier to create and introduce new abstractions in networking, simplifying network management and facilitating network evolution. In this paper we present a comprehensive survey on SDN. We start by introducing the motivation for SDN, explain its main concepts and how it differs from traditional networking, its roots, and the standardization activities regarding this novel paradigm. Next, we present the key building blocks of an SDN infrastructure using a bottom-up, layered approach. We provide an in-depth analysis of the hardware infrastructure, southbound and northbound APIs, network virtualization layers, network operating systems (SDN controllers), network programming languages, and network applications. We also look at cross-layer problems such as debugging and troubleshooting. In an effort to anticipate the future evolution of this new paradigm, we discuss the main ongoing research efforts and challenges of SDN. In particular, we address the design of switches and control platforms -- with a focus on aspects such as resiliency, scalability, performance, security and dependability -- as well as new opportunities for carrier transport networks and cloud providers. Last but not least, we analyze the position of SDN as a key enabler of a software-defined environment.
1,968 citations
TL;DR: This survey covers rollback-recovery techniques that do not require special language constructs and distinguishes between checkpoint-based and log-based protocols, which rely solely on checkpointing for system state restoration.
Abstract: This survey covers rollback-recovery techniques that do not require special language constructs. In the first part of the survey we classify rollback-recovery protocols into checkpoint-based and log-based.Checkpoint-based protocols rely solely on checkpointing for system state restoration. Checkpointing can be coordinated, uncoordinated, or communication-induced. Log-based protocols combine checkpointing with logging of nondeterministic events, encoded in tuples called determinants. Depending on how determinants are logged, log-based protocols can be pessimistic, optimistic, or causal. Throughout the survey, we highlight the research issues that are at the core of rollback-recovery and present the solutions that currently address them. We also compare the performance of different rollback-recovery protocols with respect to a series of desirable properties and discuss the issues that arise in the practical implementations of these protocols.
1,772 citations
Book•
30 Apr 1996TL;DR: Technical foundations introduction software reliability and system reliability the operational profile software reliability modelling survey model evaluation and recalibration techniques practices and experiences and best current practice of SRE software reliability measurement experience.
Abstract: Technical foundations introduction software reliability and system reliability the operational profile software reliability modelling survey model evaluation and recalibration techniques practices and experiences best current practice of SRE software reliability measurement experience measurement-based analysis of software reliability software fault and failure classification techniques trend analysis in validation and maintenance software reliability and field data analysis software reliability process assessment emerging techniques software reliability prediction metrics software reliability and testing fault-tolerant SRE software reliability using fault trees software reliability process simulation neural networks and software reliability. Appendices: software reliability tools software failure data set repository.
1,068 citations