
Showing papers presented at "ACM/IFIP/USENIX international conference on Middleware in 2010"


Proceedings ArticleDOI
29 Nov 2010
TL;DR: The mathematical basis for flex, a different, flexible scheduling allocation scheme that can be regarded as an add-on module that works synergistically with hfs, is described and compared with fifo and hfs in a variety of experiments.
Abstract: Originally, MapReduce implementations such as Hadoop employed First In First Out (fifo) scheduling, but such simple schemes cause job starvation. The Hadoop Fair Scheduler (hfs) is a slot-based MapReduce scheme designed to ensure a degree of fairness among the jobs, by guaranteeing each job at least some minimum number of allocated slots. Our prime contribution in this paper is a different, flexible scheduling allocation scheme, known as flex. Our goal is to optimize any of a variety of standard scheduling theory metrics (response time, stretch, makespan and Service Level Agreements (slas), among others) while ensuring the same minimum job slot guarantees as in hfs, and maximum job slot guarantees as well. The flex allocation scheduler can be regarded as an add-on module that works synergistically with hfs. We describe the mathematical basis for flex, and compare it with fifo and hfs in a variety of experiments.
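
As a rough illustration of the slot-allocation idea (not the authors' actual optimizer), the Python sketch below honors per-job minimum and maximum slot guarantees and then hands leftover slots to jobs greedily by a weight, which stands in for whatever scheduling metric flex is configured to optimize. The job names, weights, and the allocate_slots helper are hypothetical.

```python
def allocate_slots(jobs, total_slots):
    """jobs: list of dicts with 'name', 'min', 'max', 'weight'."""
    alloc = {j['name']: j['min'] for j in jobs}        # honor minimum guarantees first
    remaining = total_slots - sum(alloc.values())
    assert remaining >= 0, "total_slots cannot satisfy the minimum guarantees"
    # Greedily hand out the remaining slots to the highest-weight job with headroom.
    while remaining > 0:
        candidates = [j for j in jobs if alloc[j['name']] < j['max']]
        if not candidates:
            break
        best = max(candidates, key=lambda j: j['weight'])
        alloc[best['name']] += 1
        remaining -= 1
    return alloc

jobs = [
    {'name': 'etl',       'min': 2, 'max': 10, 'weight': 3.0},
    {'name': 'analytics', 'min': 1, 'max': 4,  'weight': 1.0},
]
print(allocate_slots(jobs, 8))   # {'etl': 7, 'analytics': 1}
```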

159 citations


Proceedings ArticleDOI
29 Nov 2010
TL;DR: Gossple as discussed by the authors is a gossip protocol for anonymous social acquaintances, which can be used to enhance navigation in Web 2.0 collaborative applications, such as LastFM and Delicious.
Abstract: While social networks provide news from old buddies, you can learn a lot more from people you do not know, but with whom you share many interests. We show in this paper how to build a network of anonymous social acquaintances using a gossip protocol we call Gossple, and how to leverage such a network to enhance navigation within Web 2.0 collaborative applications, a la LastFM and Delicious. Gossple nodes (users) periodically gossip digests of their interest profiles and compute their distances (in terms of interest) with respect to other nodes. This is achieved with little bandwidth and storage, fast convergence, and without revealing which profile is associated with which user. We evaluate Gossple on real traces from various Web 2.0 applications with hundreds of PlanetLab hosts and thousands of simulated nodes.
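
The following toy sketch conveys the flavor of the profile-distance computation (Gossple's actual metric and digest format differ): each node gossips a digest of its interest profile, here reduced to a plain tag set, and ranks anonymous acquaintances by set similarity. All names and profiles are illustrative.

```python
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def closest_acquaintances(my_profile, gossiped_digests, k=2):
    """gossiped_digests: {anonymous node id: tag set}. Return the k most similar."""
    ranked = sorted(gossiped_digests.items(),
                    key=lambda item: jaccard(my_profile, item[1]),
                    reverse=True)
    return ranked[:k]

me = {"jazz", "piano", "vinyl"}
others = {"n1": {"jazz", "vinyl"}, "n2": {"rock"}, "n3": {"piano", "jazz", "blues"}}
print([node for node, _ in closest_acquaintances(me, others)])   # ['n1', 'n3']
```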

88 citations


Proceedings ArticleDOI
29 Nov 2010
TL;DR: Prometheus is socially-aware: it allows users to select peers that manage their social information based on social trust and exploits naturally-formed social groups for improved performance; experiments showed that the social-based mapping of users onto peers improves service response time and achieves high service availability with low overhead.
Abstract: Recent Internet applications, such as online social networks and user-generated content sharing, produce an unprecedented amount of social information, which is further augmented by location or collocation data collected from mobile phones. Unfortunately, this wealth of social information is fragmented across many different proprietary applications. Combined, it could provide a more accurate representation of the social world, and it could enable a whole new set of socially-aware applications. We introduce Prometheus, a peer-to-peer service that collects and manages social information from multiple sources and implements a set of social inference functions while enforcing user-defined access control policies. Prometheus is socially-aware: it allows users to select peers that manage their social information based on social trust and exploits naturally-formed social groups for improved performance. We tested our Prometheus prototype on PlanetLab and built a mobile social application to test the performance of its social inference functions under real-time constraints. We showed that the social-based mapping of users onto peers improves service response time and achieves high service availability with low overhead.

57 citations


Proceedings ArticleDOI
Akshat Verma1, Pradipta De1, Vijay Mann1, Tapan K. Nayak1, Amit Purohit1, Gargi B. Dasgupta1, Ravi Kothari1 
29 Nov 2010
TL;DR: The BrownMap methodology is presented, which ensures that data centers can deal with both outages that reduce the available power and surges in workload, while meeting a power budget for shared data centers.
Abstract: In this work, we investigate mechanisms to ensure that a shared data center can operate within a power budget, while optimizing a global objective function (e.g., maximizing the overall revenue earned by the provider). We present the BrownMap methodology, which ensures that data centers can deal with both outages that reduce the available power and surges in workload. BrownMap uses automatic VM resizing and Live Migration technologies to ensure that the overall revenue of the provider is maximized, while meeting the budget. We implement BrownMap on an IBM Power6 cluster and study its effectiveness using a trace-driven evaluation of a real workload. Both theoretical and experimental evidence is presented that establishes the efficacy of BrownMap in optimizing a global objective while meeting a power budget for shared data centers.
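
To make the power-budget trade-off concrete, here is a greatly simplified, hedged sketch of the kind of decision BrownMap must make: pick one power/revenue operating point per VM so that total power stays within the budget while revenue is maximized. The greedy upgrade loop and the pick_configuration helper are invented for illustration; BrownMap's actual optimizer and its use of live migration are not shown.

```python
def pick_configuration(vms, power_budget):
    """vms: {vm_name: [(power_watts, revenue), ...]}. Greedy, not BrownMap's optimizer."""
    # Start every VM at its lowest-power option, then repeatedly apply the single
    # upgrade with the best revenue gain per extra watt that still fits the budget.
    choice = {vm: min(opts) for vm, opts in vms.items()}
    spent = sum(power for power, _ in choice.values())
    while True:
        best = None
        for vm, opts in vms.items():
            cur_p, cur_r = choice[vm]
            for p, r in opts:
                extra = p - cur_p
                if r <= cur_r or spent + extra > power_budget:
                    continue
                gain = (r - cur_r) / extra if extra > 0 else float("inf")
                if best is None or gain > best[0]:
                    best = (gain, vm, (p, r))
        if best is None:
            return choice, spent
        _, vm, (p, r) = best
        spent += p - choice[vm][0]
        choice[vm] = (p, r)

vms = {"web": [(100, 5), (200, 12)], "db": [(150, 8), (300, 20)]}
print(pick_configuration(vms, power_budget=450))   # upgrades db; web stays small
```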

49 citations


Proceedings ArticleDOI
29 Nov 2010
TL;DR: Asynchronous Lease Certification (ALC), an innovative STM replication scheme that exploits the notion of asynchronous leases to reduce the replica coordination overhead and shelter transactions from repeated aborts due to conflicts originating on remote nodes, is presented.
Abstract: Software Transactional Memory (STM) systems have emerged as a powerful middleware paradigm for parallel programming. To date, however, the problem of how to leverage replication to enhance dependability and scalability of STMs is still largely unexplored. In this paper we present Asynchronous Lease Certification (ALC), an innovative STM replication scheme that exploits the notion of asynchronous leases to reduce the replica coordination overhead and shelter transactions from repeated aborts due to conflicts originating on remote nodes. These features allow ALC to achieve up to a tenfold reduction of commit-phase latency in low-contention scenarios when compared with state-of-the-art fault-tolerant replication schemes, and to boost the throughput of long-running transactions by a factor of 4 in high-conflict scenarios.
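
A toy sketch of the lease intuition (not ALC's actual certification protocol, which relies on ordered replica coordination): a replica that already holds asynchronous leases on a transaction's data items can certify and commit locally, paying the coordination cost only when leases must first be acquired. Class and method names are hypothetical.

```python
class LeaseManager:
    """Toy stand-in for ALC's lease bookkeeping; not the real protocol."""
    def __init__(self):
        self.owner = {}                      # data item -> node holding its lease

    def holds_all(self, node, items):
        return all(self.owner.get(i) == node for i in items)

    def acquire(self, node, items):
        # Placeholder for the replica coordination round (e.g. an ordered
        # broadcast) that ALC runs to establish lease ownership.
        for i in items:
            self.owner[i] = node

def commit(node, write_set, leases):
    if not leases.holds_all(node, write_set):
        leases.acquire(node, write_set)      # coordination paid once, then reused
    return f"{node} certified and committed {sorted(write_set)} under its leases"

leases = LeaseManager()
print(commit("replica-1", {"x", "y"}, leases))   # acquires leases, then commits
print(commit("replica-1", {"x"}, leases))        # reuses the lease: no coordination
```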

41 citations


Proceedings ArticleDOI
29 Nov 2010
TL;DR: In this paper, the authors propose a principled approach to designing and deploying end-to-end secure, distributed software by means of thorough, relentless tagging of the security meaning of data, analogous to what is already done for data types.
Abstract: Security engineering must be integrated with all stages of application specification and development to be effective. Doing this properly is increasingly critical as organisations rush to offload their software services to cloud providers. Service-level agreements (SLAs) with these providers currently focus on performance-oriented parameters, which runs the risk of exacerbating an impedance mismatch with the security middleware. Not only do we want cloud providers to isolate each of their clients from others, we also want to have means to isolate components and users within each client's application. We propose a principled approach to designing and deploying end-to-end secure, distributed software by means of thorough, relentless tagging of the security meaning of data, analogous to what is already done for data types. The aim is to guarantee that---above a small trusted code base---data cannot be leaked by buggy or malicious software components. This is crucial for cloud infrastructures, in which the stored data and hosted services all have different owners whose interests are not aligned (and may even be in competition). We have developed data tagging schemes and enforcement techniques that can help form the aforementioned trusted code base. Our big idea---cloud-hosted services that have end-to-end information flow control---preempts worries about security and privacy violations retarding the evolution of large-scale cloud computing.
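
A minimal sketch of label-based information flow checking in the spirit of the tagging described above (the class and function names are illustrative, not the authors' API): each datum carries a set of confidentiality tags, and a flow to a component is allowed only if the component's privileges cover those tags.

```python
class Labeled:
    """A value tagged with the confidentiality labels it carries."""
    def __init__(self, value, tags):
        self.value, self.tags = value, frozenset(tags)

def flow(data, component_privileges):
    """Release data to a component only if its privileges cover every tag."""
    missing = data.tags - frozenset(component_privileges)
    if missing:
        raise PermissionError(f"flow blocked, missing privileges: {sorted(missing)}")
    return data.value

record = Labeled({"patient": "alice"}, tags={"clinic-A"})
print(flow(record, {"clinic-A", "audit"}))      # allowed: privileges cover the tag
# flow(record, {"clinic-B"})                    # would raise PermissionError
```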

39 citations


Proceedings ArticleDOI
29 Nov 2010
TL;DR: This paper proposes novel algorithms for updating routing mechanisms effectively and efficiently in classic CPS broker overlay networks; compared to re-subscriptions, they significantly improve the reaction time to subscription updates and can sustain higher throughput in the presence of high update rates.
Abstract: Subscription adaptations are becoming increasingly important across many content-based publish/subscribe (CPS) applications. In algorithmic high frequency trading, for instance, stock price thresholds that are of interest to a trader change rapidly, and gains directly hinge on the reaction time to relevant fluctuations. The common solution to adapt a subscription consists of a re-subscription, where a new subscription is issued and the superseded one canceled. This is ineffective, leading to missed or duplicate events during the transition. In this paper, we introduce the concept of parametric subscriptions to support subscription adaptations. We propose novel algorithms for updating routing mechanisms effectively and efficiently in classic CPS broker overlay networks. Compared to re-subscriptions, our algorithms significantly improve the reaction time to subscription updates and can sustain higher throughput in the presence of high update rates. We convey our claims through implementations of our algorithms in two CPS systems, and by evaluating them on two different real-world applications.
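
The sketch below illustrates the parametric-subscription idea in miniature, assuming a broker that keeps a mutable threshold inside each routing entry: an adaptation becomes a small in-place update rather than a cancel-plus-resubscribe, so no events are missed or duplicated during the transition. The class and field names are hypothetical.

```python
class ParametricSubscription:
    def __init__(self, sub_id, symbol, threshold):
        self.sub_id, self.symbol, self.threshold = sub_id, symbol, threshold

    def matches(self, event):
        return event["symbol"] == self.symbol and event["price"] >= self.threshold

    def update(self, threshold):
        self.threshold = threshold      # in place: no unsubscribe/resubscribe window

broker_table = [ParametricSubscription("s1", "ACME", threshold=100.0)]
event = {"symbol": "ACME", "price": 103.5}
print([s.sub_id for s in broker_table if s.matches(event)])   # ['s1']
broker_table[0].update(105.0)                                 # trader adapts the threshold
print([s.sub_id for s in broker_table if s.matches(event)])   # []
```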

38 citations


Proceedings ArticleDOI
29 Nov 2010
TL;DR: LiFTinG, the first protocol to detect freeriders, including colluding ones, in gossip-based content dissemination systems with asymmetric data exchanges, is presented, along with a methodology to set the parameters of LiFTinG based on a theoretical analysis.
Abstract: This paper presents LiFTinG, the first protocol to detect freeriders, including colluding ones, in gossip-based content dissemination systems with asymmetric data exchanges. LiFTinG relies on nodes tracking abnormal behaviors by cross-checking the history of their previous interactions, and exploits the fact that nodes pick neighbors at random to prevent colluding nodes from covering up each other's bad actions. We present a methodology to set the parameters of LiFTinG based on a theoretical analysis. In addition to simulations, we report on the deployment of LiFTinG on PlanetLab. In a 300-node system, where a stream of 674 kbps is broadcast, LiFTinG incurs a maximum overhead of only 8% while providing good results: for instance, with 10% of freeriders decreasing their contribution by 30%, LiFTinG detects 86% of the freeriders after only 30 seconds and wrongfully expels only a few honest nodes.

34 citations


Proceedings ArticleDOI
29 Nov 2010
TL;DR: Results show how supervised machine learning can configure DRE pub/sub middleware adaptively in < 10 μsec with bounded time complexity to support key QoS reliability and latency requirements.
Abstract: Enterprise distributed real-time and embedded (DRE) publish/subscribe (pub/sub) systems manage resources and data that are vital to users. Cloud computing---where computing resources are provisioned elastically and leased as a service---is an increasingly popular deployment paradigm. Enterprise DRE pub/sub systems can leverage cloud computing provisioning services to execute needed functionality when on-site computing resources are not available. Although cloud computing provides flexible on-demand computing and networking resources, enterprise DRE pub/sub systems often cannot accurately characterize their behavior a priori for the variety of resource configurations cloud computing supplies (e.g., CPU and network bandwidth), which makes it hard for DRE systems to leverage conventional cloud computing platforms. This paper provides two contributions to the study of how autonomic configuration of DRE pub/sub middleware can provision and use on-demand cloud resources effectively. We first describe how supervised machine learning can configure DRE pub/sub middleware services and transport protocols autonomically to support end-to-end quality-of-service (QoS) requirements based on cloud computing resources. We then present results that empirically validate how computing and networking resources affect enterprise DRE pub/sub system QoS. These results show how supervised machine learning can configure DRE pub/sub middleware adaptively in less than 10 μsec with bounded time complexity to support key QoS reliability and latency requirements.
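
As a hedged illustration of the supervised-learning step (the actual features, labels, and middleware knobs in the paper differ), the snippet below trains a small decision tree offline on resource configurations and queries it to pick a transport protocol for a given cloud lease; all values and protocol names are made up.

```python
from sklearn.tree import DecisionTreeClassifier

# Offline training data: [cpu_share, bandwidth_mbps] -> transport that met QoS in tests
X = [[0.2, 10], [0.2, 100], [0.8, 10], [0.8, 100]]
y = ["reliable-multicast", "reliable-multicast", "lazy-nack", "best-effort"]

model = DecisionTreeClassifier().fit(X, y)

# At deployment time, query the model for the resources of the leased cloud nodes.
print(model.predict([[0.7, 80]])[0])    # suggested protocol for this configuration
```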

24 citations


Proceedings ArticleDOI
29 Nov 2010
TL;DR: The Janus domain-specific language is designed to provide developers with a high-level way to describe the operations required to encapsulate legacy service functionalities; preliminary experiments show that Janus-based WS wrappers have performance comparable to manually written wrappers.
Abstract: Web Services is an increasingly used instantiation of Service-Oriented Architectures (SOA) that relies on standard Internet protocols to produce services that are highly interoperable. Other types of services, relying on legacy application layer protocols, however, cannot be composed directly. A promising solution is to implement wrappers to translate between the application layer protocols and the WS protocol. Doing so manually, however, requires a high level of expertise in the relevant application layer protocols, in low-level network and system programming, and in the Web Service paradigm itself. In this paper, we introduce a generative, language-based approach for constructing wrappers to facilitate the migration of legacy service functionalities to Web Services. To this end, we have designed the Janus domain-specific language, which provides developers with a high-level way to describe the operations that are required to encapsulate legacy service functionalities. We have successfully used Janus to develop a number of wrappers, including wrappers for IMAP and SMTP servers, for an RTSP-compliant media server, and for UPnP service discovery. Preliminary experiments show that Janus-based WS wrappers have performance comparable to manually written wrappers.

19 citations


Proceedings ArticleDOI
29 Nov 2010
TL;DR: Anonygator uses anonymous routing to provide user anonymity by disassociating messages from the hosts that generated them and maintains overall system scalability by employing a novel distributed tree-based data aggregation procedure that is robust to pollution attacks.
Abstract: Data aggregation is a key aspect of many distributed applications, such as distributed sensing, performance monitoring, and distributed diagnostics. In such settings, user anonymity is a key concern of the participants. In the absence of an assurance of anonymity, users may be reluctant to contribute data such as their location or configuration settings on their computer. In this paper, we present the design, analysis, implementation, and evaluation of Anonygator, an anonymity-preserving data aggregation service for large-scale distributed applications. Anonygator uses anonymous routing to provide user anonymity by disassociating messages from the hosts that generated them. It prevents malicious users from uploading disproportionate amounts of spurious data by using a light-weight accounting scheme. Finally, Anonygator maintains overall system scalability by employing a novel distributed tree-based data aggregation procedure that is robust to pollution attacks. All of these components are tuned by a customization tool, with a view to achieve specific anonymity, pollution resistance, and efficiency goals. We have implemented Anonygator as a service and have used it to prototype three applications, one of which we have evaluated on PlanetLab. The other two have been evaluated on a local testbed.
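
A toy sketch of tree-based aggregation with a per-contribution sanity bound, loosely in the spirit of limiting pollution (this is not Anonygator's actual scheme, which also involves anonymous routing and accounting): each node sums its subtree but drops contributions outside an agreed range.

```python
def aggregate(node, lower=0, upper=100):
    """node: {'value': local sample, 'children': [subtrees]}. Returns (sum, count)."""
    total, count = (node["value"], 1) if lower <= node["value"] <= upper else (0, 0)
    for child in node.get("children", []):
        s, c = aggregate(child, lower, upper)
        total, count = total + s, count + c
    return total, count

tree = {"value": 40, "children": [
    {"value": 55, "children": []},
    {"value": 9999, "children": []},        # a polluted contribution
]}
print(aggregate(tree))                      # (95, 2): the outlier is dropped
```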

Proceedings ArticleDOI
29 Nov 2010
TL;DR: DEFCon-Policy, a middleware that enforces security policy in multi-domain, event-driven applications, is described; it can provide global security guarantees without burdening application developers.
Abstract: Distributed, event-driven applications that process sensitive user data and involve multiple organisational domains must comply with complex security requirements. Ideally, developers want to express security policy for such applications in data-centric terms, controlling the flow of information throughout the system. Current middleware does not support the specification of such end-to-end security policy and lacks uniform mechanisms for enforcement. We describe DEFCon-Policy, a middleware that enforces security policy in multi-domain, event-driven applications. Event flow policy is expressed in a high-level language that specifies permitted flows between distributed software components. The middleware limits the interaction of components based on the policy and the data that components have observed. It achieves this by labelling data and assigning privileges to components. We evaluate DEFCon-Policy in a realistic medical scenario and demonstrate that it can provide global security guarantees without burdening application developers.

Proceedings ArticleDOI
29 Nov 2010
TL;DR: A forest-based M2M (Multiple parents-To-Multiple children) ALM structure, in which every node has multiple children and multiple parents, is designed to enable lower dissemination latency through the multiple children while enabling higher reliability through the multiple parents.
Abstract: To disseminate messages from a single source to a large number of targeted receivers, a natural approach is the tree-based application layer multicast (ALM). However, in time-constrained flash dissemination scenarios, e.g., earthquake early warning, where time is of the essence, the tree-based ALM has a single point of failure; its reliable extensions using ack-based failure recovery protocols cannot support reliable dissemination in the timeframe needed. In this paper, we exploit path diversity, i.e., the use of multiple data paths, to achieve fast and reliable data dissemination. First, we design a forest-based M2M (Multiple parents-To-Multiple children) ALM structure where every node has multiple children and multiple parents. The intuition is to enable lower dissemination latency through multiple children, while enabling higher reliability through multiple parents. Second, we design multidirectional multicasting algorithms that effectively utilize the multiple data paths in the M2M ALM structure. A key aspect of our reliable dissemination mechanism is that nodes, in addition to communicating the data to children, also selectively disseminate the data to parents and siblings. Compared to trees using a traditional multicasting algorithm, we observe an 80% improvement in reliability under 20% of failed nodes with no significant increase in latency for over 99% of the nodes.
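
A simplified sketch of dissemination over an M2M forest, assuming hypothetical parents and children maps: on first receipt a node forwards to both its children and its parents, so a message can route around a failed parent; the paper's multidirectional algorithms are more selective than this flood.

```python
from collections import deque

def disseminate(source, parents, children):
    """parents/children: dict node -> list of nodes. Returns the delivery order."""
    seen, order, queue = {source}, [source], deque([source])
    while queue:
        node = queue.popleft()
        # forward downward to children and, for redundancy, upward to parents
        for nbr in children.get(node, []) + parents.get(node, []):
            if nbr not in seen:              # duplicates are suppressed on re-receipt
                seen.add(nbr)
                order.append(nbr)
                queue.append(nbr)
    return order

children = {"src": ["a", "b"], "a": ["c"], "b": ["c"]}    # c has parents a and b
parents = {"a": ["src"], "b": ["src"], "c": ["a", "b"]}
print(disseminate("src", parents, children))              # ['src', 'a', 'b', 'c']
```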

Proceedings ArticleDOI
29 Nov 2010
TL;DR: In this paper, the authors propose a positioning middleware named PerPos that is translucent and adaptable, i.e., it supports both high- and low-level interaction, and they extend the internal position processing of the middleware with functionality supporting probabilistic position tracking and strategies for minimizing energy consumption.
Abstract: A positioning middleware benefits the development of location-aware applications. Traditionally, positioning middleware provides position transparency in the sense that it hides low-level details. However, many applications require access to specific details of the usually hidden positioning process. To address this problem, this paper proposes a positioning middleware named PerPos that is translucent and adaptable, i.e., it supports both high- and low-level interaction. The PerPos middleware provides translucency with respect to the positioning process and allows programmatic definition of application-specific features that can be applied to the internal position processing of the middleware. To evaluate these capabilities we extend the internal position processing of the middleware with functionality supporting probabilistic position tracking and strategies for minimization of the energy consumption. The result of the evaluation is that using only the proposed capabilities we can, in a structured manner, extend the internal positioning processing.

Proceedings ArticleDOI
29 Nov 2010
TL;DR: The results show that symbolic prefetching combined with caching can eliminate an average of 87% of remote reads, and the approach was designed to hide the latency of accessing remote objects in distributed transactional memory and a wide range of distributed object middleware frameworks.
Abstract: Developing efficient distributed applications while managing complexity can be challenging. Managing network latency is a key challenge for distributed applications. We propose a new approach to prefetching, symbolic prefetching, that can prefetch remote objects before their addresses are known. Our approach was designed to hide the latency of accessing remote objects in distributed transactional memory and a wide range of distributed object middleware frameworks. We present a static compiler analysis for the automatic generation of symbolic prefetches---symbolic prefetches allow objects whose addresses are unknown to be prefetched. We evaluate this prefetching mechanism in the context of a middleware framework for distributed transactional memory. Our evaluation includes microbenchmarks, scientific benchmarks, and distributed benchmarks. Our results show that symbolic prefetching combined with caching can eliminate an average of 87% of remote reads. We measured speedups due to prefetching of up to 13.31× for accessing arrays and 4.54× for accessing linked lists.
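
A rough sketch of the symbolic-prefetch idea under invented names (not the framework's API): rather than fetching one remote object, reading its next field, and fetching again, the client ships a symbolic path such as next.next, and the remote store resolves the whole chain in a single round trip.

```python
REMOTE_STORE = {                 # pretend remote object store, keyed by object id
    1: {"payload": "a", "next": 2},
    2: {"payload": "b", "next": 3},
    3: {"payload": "c", "next": None},
}

def resolve_symbolic(root_oid, path):
    """Server side: follow the field names in 'path', returning every object touched."""
    fetched, oid = {root_oid: REMOTE_STORE[root_oid]}, root_oid
    for field in path:
        oid = fetched[oid][field]
        if oid is None:
            break
        fetched[oid] = REMOTE_STORE[oid]
    return fetched               # one round trip returns the whole chain

cache = resolve_symbolic(1, ["next", "next"])
print(sorted(cache))             # [1, 2, 3]: three objects from a single request
```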

Proceedings ArticleDOI
29 Nov 2010
TL;DR: In this paper, the authors design and implement the Kevlar system, which uses an overarching network-overlay structure to integrate centrally hosted content with peer-to-peer multicast.
Abstract: While Web Services ensure interoperability and extensibility for networked applications, they also complicate the deployment of highly collaborative systems, such as virtual reality environments and massively multiplayer online games. Quite simply, such systems often manifest a natural peer-to-peer structure. This conflicts with Web Services' imposition of a client-server communication model, vectoring all events through a data center and emerging as a performance bottleneck. We design and implement the Kevlar system to alleviate such choke points, using an overarching network-overlay structure to integrate centrally hosted content with peer-to-peer multicast. Kevlar leverages the storage and communication models that best match the respective information: data most naturally retrieved from the cloud is managed using hosted objects, while edge updates are transmitted directly peer-to-peer using multicast. Here, we present the Kevlar architecture and a series of carefully controlled experiments to evaluate our implementation. We demonstrate Kevlar's successful and efficient support of deployments across wide-area networks and its adaptivity and resilience to firewalls, constrained network segments, and other peculiarities of local network policy.

Proceedings ArticleDOI
29 Nov 2010
TL;DR: It is demonstrated that FlexArchive can achieve dynamic data re-configurations in significantly lower times (factor of 50 or more) without any sacrifice in confidentiality and with a negligible loss in availability (less than 1%).
Abstract: Modern storage systems are often faced with complex trade-offs between the confidentiality, availability, and performance they offer their users. Secret sharing is a data encoding technique that provides information-theoretically provable guarantees on confidentiality, unlike conventional encryption. Additionally, secret sharing provides quantifiable guarantees on the availability of the encoded data. We argue that these properties make secret sharing-based encoding of data particularly suitable for the design of increasingly popular and important distributed archival data stores. These guarantees, however, come at the cost of increased resource consumption during reads/writes. Consequently, it is desirable that such a storage system employ techniques that could dynamically transform data representation to operate the store within required confidentiality, availability, and performance regimes (or budgets) despite changes to the operating environment. Since state-of-the-art transformation techniques suffer from prohibitive data transfer overheads, we develop a middleware for dynamic data transformation. Using this, we propose the design and operation of a secure, available, and tunable distributed archival store called FlexArchive. Using a combination of analysis and empirical evaluation, we demonstrate the feasibility of our archival store. In particular, we demonstrate that FlexArchive can achieve dynamic data re-configurations in significantly lower times (factor of 50 or more) without any sacrifice in confidentiality and with a negligible loss in availability (less than 1%).
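
To make the encoding concrete, here is a minimal n-of-n secret-sharing sketch: all n shares are required to reconstruct the block, and any n-1 shares reveal nothing. FlexArchive uses threshold (k-of-n) schemes with tunable parameters, which this toy deliberately does not capture.

```python
import secrets
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def split(secret: bytes, n: int):
    """n-of-n split: n-1 random shares plus one share that XORs back to the secret."""
    shares = [secrets.token_bytes(len(secret)) for _ in range(n - 1)]
    shares.append(reduce(xor_bytes, shares, secret))
    return shares

def combine(shares):
    return reduce(xor_bytes, shares)

shares = split(b"archive block", 4)
assert combine(shares) == b"archive block"   # all four shares are needed
```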

Proceedings ArticleDOI
29 Nov 2010
TL;DR: The design of dFault is described, and it is shown that it can accurately localize the root causes of faults with a modest amount of information collected from individual nodes, using a real prototype deployed over PlanetLab.
Abstract: Distributed hash tables (DHTs) have been adopted as a building block for large-scale distributed systems. The upshot of this success is that their robust operation is even more important as mission-critical applications begin to be layered on them. Even though DHTs can detect and heal around unresponsive hosts and disconnected links, several hidden faults and performance bottlenecks go undetected, resulting in unanswered queries and delayed responses. In this paper, we propose dFault, a system that helps large-scale DHTs to localize such faults. Informed with a log of failed queries called symptoms and some available information about the hosts in the DHT, dFault identifies the potential root causes (hosts and overlay links) that with high likelihood contributed towards those symptoms. Its design is based on the recently proposed dependency graph modeling and inference approach for fault localization. We describe the design of dFault, and show that it can accurately localize the root causes of faults with a modest amount of information collected from individual nodes, using a real prototype deployed over PlanetLab.
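
A back-of-the-envelope sketch of symptom-based suspect ranking, assuming each failed query records the hosts on its lookup path (dFault's actual inference uses a probabilistic dependency-graph model): hosts are scored by how much of the observed failure they could explain.

```python
from collections import Counter

def rank_suspects(symptoms):
    """symptoms: list of failed queries, each given as the hosts on its lookup path."""
    score = Counter()
    for path in symptoms:
        for host in path:
            score[host] += 1 / len(path)     # spread the blame across the path
    return score.most_common()

failed = [["h1", "h4", "h7"], ["h2", "h4"], ["h4", "h9"]]
print(rank_suspects(failed))                 # h4 ranks first: it is on every failed path
```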

Proceedings ArticleDOI
29 Nov 2010
TL;DR: Delta is presented, a dynamic data middleware cache system for rapidly-growing scientific repositories that adaptively decouples data objects and profiles incoming workload to search for optimal data decoupling that reduces network costs.
Abstract: Modern scientific repositories are growing rapidly in size. Scientists are increasingly interested in viewing the latest data as part of query results. Current scientific middleware cache systems, however, assume repositories are static. Thus, they cannot answer scientific queries with the latest data. The queries, instead, are routed to the repository until data at the cache is refreshed. In data-intensive scientific disciplines, such as astronomy, indiscriminate query routing or data refreshing often results in runaway network costs. This severely affects the performance and scalability of the repositories and makes poor use of the cache system. We present Delta, a dynamic data middleware cache system for rapidly-growing scientific repositories. Delta's key component is a decision framework that adaptively decouples data objects---choosing to keep some data objects at the cache when they are heavily queried, and keeping other data objects at the repository when they are heavily updated. Our algorithm profiles incoming workload to search for optimal data decoupling that reduces network costs. It leverages formal concepts from the network flow problem, and is robust to evolving scientific workloads. We evaluate the efficacy of Delta, through a prototype implementation, by running query traces collected from a real astronomy survey.
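
The snippet below sketches the per-object decoupling decision in its simplest form, comparing the network cost of routing queries to the repository against the cost of pushing updates to the cache; object names and costs are invented, and Delta itself solves the placement globally with a network-flow formulation rather than per object.

```python
def place(objects):
    """objects: {name: (queries_per_day, updates_per_day, query_cost, update_cost)}."""
    placement = {}
    for name, (queries, updates, query_cost, update_cost) in objects.items():
        cost_at_repository = queries * query_cost    # every query crosses the network
        cost_at_cache = updates * update_cost        # every update must be shipped over
        placement[name] = "cache" if cost_at_cache < cost_at_repository else "repository"
    return placement

catalog = {
    "galaxy_table":  (5000, 10, 2.0, 50.0),   # heavily queried, rarely updated
    "nightly_scans": (20, 800, 2.0, 50.0),    # heavily updated, rarely queried
}
print(place(catalog))   # {'galaxy_table': 'cache', 'nightly_scans': 'repository'}
```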