
Showing papers in "IEEE Transactions on Computers in 2013"


Journal ArticleDOI
TL;DR: This paper proposes a privacy-preserving public auditing mechanism for secure cloud storage, extends it to support batch auditing for multiple users, and shows that the proposed schemes are provably secure and highly efficient.
Abstract: Using cloud storage, users can remotely store their data and enjoy the on-demand high-quality applications and services from a shared pool of configurable computing resources, without the burden of local data storage and maintenance. However, the fact that users no longer have physical possession of the outsourced data makes data integrity protection in cloud computing a formidable task, especially for users with constrained computing resources. Moreover, users should be able to just use the cloud storage as if it were local, without worrying about the need to verify its integrity. Thus, enabling public auditability for cloud storage is of critical importance so that users can resort to a third-party auditor (TPA) to check the integrity of outsourced data and be worry-free. To securely introduce an effective TPA, the auditing process should bring in no new vulnerabilities toward user data privacy, and introduce no additional online burden to the user. In this paper, we propose a secure cloud storage system supporting privacy-preserving public auditing. We further extend our result to enable the TPA to perform audits for multiple users simultaneously and efficiently. Extensive security and performance analysis show the proposed schemes are provably secure and highly efficient. Our preliminary experiment conducted on an Amazon EC2 instance further demonstrates the fast performance of the design.

982 citations


Journal ArticleDOI
TL;DR: Three of the principal axioms of parallel machine design (memory coherence, synchronicity, and determinism) have been discarded in the design without, surprisingly, compromising the ability to perform meaningful computations.
Abstract: SpiNNaker (a contraction of Spiking Neural Network Architecture) is a million-core computing engine whose flagship goal is to be able to simulate the behavior of aggregates of up to a billion neurons in real time. It consists of an array of ARM9 cores, communicating via packets carried by a custom interconnect fabric. The packets are small (40 or 72 bits), and their transmission is brokered entirely by hardware, giving the overall engine an extremely high bisection bandwidth of over 5 billion packets/s. Three of the principal axioms of parallel machine design (memory coherence, synchronicity, and determinism) have been discarded in the design without, surprisingly, compromising the ability to perform meaningful computations. A further attribute of the system is the acknowledgment, from the initial design stages, that the sheer size of the implementation will make component failures an inevitable aspect of day-to-day operation, and fault detection and recovery mechanisms have been built into the system at many levels of abstraction. This paper describes the architecture of the machine and outlines the underlying design philosophy; software and applications are to be described in detail elsewhere, and only introduced in passing here as necessary to illuminate the description.

619 citations


Journal ArticleDOI
TL;DR: New metrics are proposed for evaluating the reliability as well as the power efficiency of approximate and probabilistic adders and it is shown that the MED is an effective metric for measuring the implementation accuracy of a multiple-bit adder and that the NED is a nearly invariant metric independent of the size of an adder.
Abstract: Addition is a fundamental arithmetic operation; several adder designs have been proposed for implementations in inexact computing. These adders show different operational profiles; some of them are approximate in nature while others rely on probabilistic features of nanoscale circuits. However, there has been a lack of appropriate metrics to evaluate the efficacy of various inexact designs. In this paper, new metrics are proposed for evaluating the reliability as well as the power efficiency of approximate and probabilistic adders. Reliability is analyzed using the so-called sequential probability transition matrices (SPTMs). Error distance (ED) is initially defined as the arithmetic distance between an erroneous output and the correct output for a given input. The mean error distance (MED) and normalized error distance (NED) are then proposed as unified figures that consider the averaging effect of multiple inputs and the normalization of multiple-bit adders. It is shown that the MED is an effective metric for measuring the implementation accuracy of a multiple-bit adder and that the NED is a nearly invariant metric independent of the size of an adder. The MED is, therefore, useful in assessing the effectiveness of an approximate or probabilistic adder implementation, while the NED is useful in characterizing the reliability of a specific design. Since inexact adders are often used for saving power, the product of power and NED is further utilized for evaluating the tradeoffs between power consumption and precision. Although illustrated using adders, the proposed metrics are potentially useful in assessing other arithmetic circuit designs for applications of inexact computing.

453 citations
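To make these definitions concrete, here is an illustrative Python sketch (not the authors' code; the toy lower-part-OR adder and the normalization constant are assumptions) that computes ED, MED, and NED for a small approximate adder by exhaustive enumeration:

```python
# Illustrative sketch of the error metrics described above (not the authors' code).
from itertools import product

def error_distance(approx_out: int, exact_out: int) -> int:
    """ED: arithmetic distance between an erroneous and the correct output."""
    return abs(approx_out - exact_out)

def med_ned(approx_add, n_bits: int):
    """Exhaustively average ED over all input pairs of an n-bit adder."""
    total, count = 0, 0
    for a, b in product(range(2 ** n_bits), repeat=2):
        total += error_distance(approx_add(a, b), a + b)
        count += 1
    med = total / count
    # One common normalization: divide by the maximum possible output error.
    ned = med / (2 ** (n_bits + 1) - 1)
    return med, ned

def lower_part_or_adder(a, b, n_bits=4, k=2):
    """Toy approximate adder: OR the low k bits, add the high bits exactly."""
    mask = (1 << k) - 1
    low = (a | b) & mask
    high = ((a >> k) + (b >> k)) << k
    return high | low

print(med_ned(lower_part_or_adder, 4))
```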


Journal ArticleDOI
TL;DR: Two asynchronous Byzantine fault-tolerant state machine replication (BFT) algorithms, which improve previous algorithms in terms of several metrics, and can have better throughput than Castro and Liskov's PBFT, and better latency in networks with nonnegligible communication delays.
Abstract: We present two asynchronous Byzantine fault-tolerant state machine replication (BFT) algorithms, which improve previous algorithms in terms of several metrics. First, they require only 2f+1 replicas, instead of the usual 3f+1. Second, the trusted service on which this reduction of replicas is based is quite simple, making a verified implementation straightforward (and even feasible using commercial trusted hardware). Third, in nice executions the two algorithms run in the minimum number of communication steps for nonspeculative and speculative algorithms, respectively, four and three steps. Besides the obvious benefits in terms of cost, resilience, and management complexity (fewer replicas to tolerate a certain number of faults), our algorithms are simpler than previous ones, being closer to crash fault-tolerant replication algorithms. The performance evaluation shows that, even with the trusted component access overhead, they can have better throughput than Castro and Liskov's PBFT, and better latency in networks with nonnegligible communication delays.

310 citations
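As a back-of-the-envelope illustration of the headline saving (my own sketch, not from the paper's artifact), the replica counts compare as follows:

```python
# Replicas needed to tolerate f Byzantine faults, with and without a
# simple trusted component (sketch of the abstract's 2f+1 vs. 3f+1 claim).
def replicas_classic(f: int) -> int:    # e.g., PBFT
    return 3 * f + 1

def replicas_trusted(f: int) -> int:    # algorithms built on a trusted service
    return 2 * f + 1

for f in range(1, 4):
    print(f"f={f}: classic={replicas_classic(f)}, trusted={replicas_trusted(f)}")
```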


Journal ArticleDOI
TL;DR: This paper first formulates the optimal networked cloud mapping problem as a mixed integer programming (MIP) problem, indicating objectives related to cost efficiency of the resource mapping procedure, and then proposes a method for the efficient mapping of resource requests onto a shared substrate interconnecting various islands of computing resources.
Abstract: Cloud computing builds upon advances in virtualization and distributed computing to support cost-efficient usage of computing resources, emphasizing resource scalability and on-demand services. Moving away from traditional data-center-oriented models, distributed clouds extend over a loosely coupled federated substrate, offering enhanced communication and computational services to target end-users with quality of service (QoS) requirements, as dictated by the future Internet vision. Toward facilitating the efficient realization of such networked computing environments, computing and networking resources need to be jointly treated and optimized. This requires delivery of user-driven sets of virtual resources, dynamically allocated to actual substrate resources within networked clouds, creating the need to revisit resource mapping algorithms and tailor them to a composite virtual resource mapping problem. In this paper, toward providing a unified resource allocation framework for networked clouds, we first formulate the optimal networked cloud mapping problem as a mixed integer programming (MIP) problem, indicating objectives related to cost efficiency of the resource mapping procedure, while abiding by user requests for QoS-aware virtual resources. We subsequently propose a method for the efficient mapping of resource requests onto a shared substrate interconnecting various islands of computing resources, and adopt a heuristic methodology to address the problem. The efficiency of the proposed approach is illustrated in a simulation/emulation environment that allows for a flexible, structured, and comparative performance evaluation. We conclude by outlining a proof-of-concept realization of our proposed scheme, mounted over the European future Internet test-bed FEDERICA, a resource virtualization platform augmented with network and computing facilities.

249 citations
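For flavor, below is a minimal sketch of the node-assignment core of such a MIP (all data, names, and costs are invented; link mapping and QoS constraints are omitted), written with the PuLP modeling library:

```python
# Toy virtual-node-to-substrate-node assignment MIP (illustrative only).
import pulp

vnodes = ["v1", "v2"]                  # requested virtual nodes (CPU demand)
demand = {"v1": 2, "v2": 3}
snodes = ["s1", "s2", "s3"]            # substrate nodes (capacity, unit cost)
capacity = {"s1": 4, "s2": 3, "s3": 2}
cost = {"s1": 1.0, "s2": 2.0, "s3": 1.5}

prob = pulp.LpProblem("networked_cloud_mapping", pulp.LpMinimize)
x = pulp.LpVariable.dicts("x", (vnodes, snodes), cat="Binary")

# Objective: total mapping cost.
prob += pulp.lpSum(demand[v] * cost[s] * x[v][s] for v in vnodes for s in snodes)
# Each virtual node is mapped exactly once.
for v in vnodes:
    prob += pulp.lpSum(x[v][s] for s in snodes) == 1
# Substrate capacities are respected.
for s in snodes:
    prob += pulp.lpSum(demand[v] * x[v][s] for v in vnodes) <= capacity[s]

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print({v: next(s for s in snodes if x[v][s].value() == 1) for v in vnodes})
```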


Journal ArticleDOI
TL;DR: This work proposes a novel Data Routing for In-Network Aggregation, called DRINA, that has some key aspects such as a reduced number of messages for setting up a routing tree, maximized number of overlapping routes, high aggregation rate, and reliable data aggregation and transmission.
Abstract: Large-scale, dense Wireless Sensor Networks (WSNs) will be increasingly deployed in different classes of applications for accurate monitoring. Due to the high density of nodes in these networks, it is likely that redundant data will be detected by nearby nodes when sensing an event. Since energy conservation is a key issue in WSNs, data fusion and aggregation should be exploited in order to save energy. In this case, redundant data can be aggregated at intermediate nodes, reducing the size and number of exchanged messages and, thus, decreasing communication costs and energy consumption. In this work, we propose a novel Data Routing for In-Network Aggregation, called DRINA, that has some key aspects such as a reduced number of messages for setting up a routing tree, maximized number of overlapping routes, high aggregation rate, and reliable data aggregation and transmission. The proposed DRINA algorithm was extensively compared to two other known solutions: the Information Fusion-based Role Assignment (InFRA) and Shortest Path Tree (SPT) algorithms. Our results clearly indicate that the routing tree built by DRINA provides the best aggregation quality when compared to these other algorithms. The obtained results show that our proposed solution outperforms these solutions in different scenarios and in different key aspects required by WSNs.

215 citations


Journal ArticleDOI
TL;DR: A novel noninvasive, multiple-parameter side-channel-analysis-based Trojan detection approach that uses the intrinsic relationship between dynamic current and maximum operating frequency of a circuit to isolate the effect of a Trojan circuit from process noise.
Abstract: Hardware Trojan attack in the form of malicious modification of a design has emerged as a major security threat. Side-channel analysis has been investigated as an alternative to conventional logic testing to detect the presence of hardware Trojans. However, these techniques suffer from decreased sensitivity toward small Trojans, especially because of the large process variations present in modern nanometer technologies. In this paper, we propose a novel noninvasive, multiple-parameter side-channel-analysis-based Trojan detection approach. We use the intrinsic relationship between dynamic current and maximum operating frequency of a circuit to isolate the effect of a Trojan circuit from process noise. We propose a vector generation approach and several design/test techniques to improve the detection sensitivity. Simulation results with two large circuits, a 32-bit integer execution unit (IEU) and a 128-bit advanced encryption standard (AES) cipher, show a detection resolution of 1.12 percent amidst ±20 percent parameter variations. The approach is also validated with experimental results. Finally, the use of a combined side-channel analysis and logic testing approach is shown to provide high overall detection coverage for hardware Trojan circuits of varying types and sizes.

207 citations
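A heavily simplified sketch of the core intuition (toy data and thresholds of my own; the paper's vector generation and calibration are far more elaborate): fit the Idd-Fmax trend on golden chips, then flag chips whose dynamic current sits well above the trend for their measured frequency:

```python
# Toy illustration of using the intrinsic Idd-Fmax correlation to separate
# process variation from a Trojan's extra current (invented data).
import numpy as np

rng = np.random.default_rng(0)
fmax = rng.uniform(0.8, 1.2, 50)            # normalized Fmax of golden chips
idd = 10 * fmax + rng.normal(0, 0.1, 50)    # Idd tracks Fmax across corners

slope, intercept = np.polyfit(fmax, idd, 1) # fit the golden Idd-Fmax trend
resid_sigma = np.std(idd - (slope * fmax + intercept))

def suspicious(f, i, k=4.0):
    """Flag a chip whose Idd sits far above the trend line for its Fmax."""
    return i - (slope * f + intercept) > k * resid_sigma

print(suspicious(1.0, 10.05), suspicious(1.0, 11.0))  # genuine vs. Trojan-like
```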


Journal ArticleDOI
TL;DR: In this paper, Li et al. proposed a public-key encryption with fuzzy keyword search (PEFKS) scheme, in which two or more keywords share the same fuzzy keyword trapdoor.
Abstract: Public-key encryption with keyword search (PEKS) is a versatile tool. It allows a third party knowing the search trapdoor of a keyword to search encrypted documents containing that keyword without decrypting the documents or knowing the keyword. However, it has been shown that the keyword can be compromised by a malicious third party under a keyword guessing attack (KGA) if the keyword space is of polynomial size. We address this problem with a keyword privacy enhanced variant of PEKS referred to as public-key encryption with fuzzy keyword search (PEFKS). In PEFKS, each keyword corresponds to an exact keyword search trapdoor and a fuzzy keyword search trapdoor. Two or more keywords share the same fuzzy keyword trapdoor. To search encrypted documents containing a specific keyword, only the fuzzy keyword search trapdoor is provided to the third party, i.e., the searcher. Thus, in PEFKS, a malicious searcher can no longer learn the exact keyword to be searched even if the keyword space is small. We propose a universal transformation which converts any anonymous identity-based encryption (IBE) scheme into a secure PEFKS scheme. Following the generic construction, we instantiate the first PEFKS scheme proven to be secure under KGA in the case that the keyword space is of polynomial size.

190 citations
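To illustrate only the exact-versus-fuzzy trapdoor idea (this toy uses plain hashing, not the paper's anonymous-IBE construction): truncating a keyword digest makes several keywords collide in one fuzzy bucket, so the searcher learns the bucket but not the exact keyword, and the receiver filters results locally with the exact trapdoor:

```python
# Toy exact vs. fuzzy keyword trapdoors (illustration, no cryptographic value).
import hashlib

def exact_trapdoor(keyword: str) -> str:
    return hashlib.sha256(keyword.encode()).hexdigest()

def fuzzy_trapdoor(keyword: str, bits: int = 2) -> int:
    # Keep only `bits` bits of the digest, so ~|keyword space| / 2**bits
    # keywords share each fuzzy trapdoor.
    digest = hashlib.sha256(keyword.encode()).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)

words = ["urgent", "invoice", "salary", "merger", "audit", "tender"]
buckets = {}
for w in words:
    buckets.setdefault(fuzzy_trapdoor(w), []).append(w)
print(buckets)  # each fuzzy trapdoor covers several candidate keywords
```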


Journal ArticleDOI
TL;DR: The proposed mm-wave wireless NoC (mWNoC) outperforms the corresponding conventional wireline counterpart in terms of achievable bandwidth and is significantly more energy efficient.
Abstract: The Network-on-chip (NoC) is an enabling technology to integrate large numbers of embedded cores on a single die. The existing methods of implementing a NoC with planar metal interconnects are deficient due to high latency and significant power consumption arising out of multihop links used in data exchange. To address these problems, we propose the design of a hierarchical small-world wireless NoC architecture where the multihop wire interconnects are replaced with high-bandwidth and single-hop long-range wireless shortcuts operating in the millimeter (mm)-wave frequency range. The proposed mm-wave wireless NoC (mWNoC) outperforms the corresponding conventional wireline counterpart in terms of achievable bandwidth and is significantly more energy efficient. The performance improvement is achieved through efficient data routing and optimum placement of wireless hubs. Multiple wireless shortcuts operating simultaneously further enhance the performance, and provide an energy-efficient solution for design of communication infrastructures for multicore chips.

189 citations


Journal ArticleDOI
TL;DR: The optimal priority order of using the four levels of parallelism in SSDs is found to be: 1) the channel-level parallelism; 2) the die-level parallelism; 3) the plane-level parallelism; and 4) the chip-level parallelism.
Abstract: Given the multilevel internal SSD parallelism at four different levels: channel-level, chip-level, die-level, and plane-level, how to exploit these levels of parallelism directly and significantly impacts the performance and endurance of SSDs. This exploitation is in turn primarily determined by three internal factors, namely, advanced commands, allocation schemes, and the priority order of exploiting the four levels of parallelism. In this paper, we analyze these internal factors to characterize their impacts, interplay, and parallelism for the purpose of performance and endurance enhancement of SSDs through an in-depth experimental study. We come to the following key conclusions: 1) Different advanced commands provided by Flash manufacturers exploit different levels of parallelism inside SSDs, where they can either improve or degrade the SSD performance and endurance depending on how they are used; 2) Different physical-page allocation schemes employ different advanced commands and exploit different levels of parallelism inside SSDs, giving rise to different performance and endurance impacts; 3) The priority order of using the four levels of parallelism has the most significant performance and endurance impact among the three internal factors. The optimal priority order of using the four levels of parallelism in SSDs is found to be: 1) the channel-level parallelism; 2) the die-level parallelism; 3) the plane-level parallelism; and 4) the chip-level parallelism.

140 citations
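A minimal sketch of what such a priority order means for a static page-allocation scheme (the geometry and numbers are invented): consecutive logical pages consume channel-level parallelism first, then die-, plane-, and finally chip-level parallelism:

```python
# Toy page-allocation scheme following the priority order found optimal above.
CHANNELS, CHIPS, DIES, PLANES = 4, 2, 2, 2

def place(page_no: int):
    """Map a logical page number to (channel, die, plane, chip)."""
    channel = page_no % CHANNELS            # exploited first
    page_no //= CHANNELS
    die = page_no % DIES                    # second
    page_no //= DIES
    plane = page_no % PLANES                # third
    page_no //= PLANES
    chip = page_no % CHIPS                  # last
    return channel, die, plane, chip

for p in range(8):
    print(p, place(p))  # consecutive pages stripe across channels first
```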


Journal ArticleDOI
TL;DR: A new type of authentication, called group authentication, which authenticates all users belonging to the same group, is proposed in this paper; it is based on Shamir's (t, n) secret sharing (SS) scheme.
Abstract: A new type of authentication, called group authentication, which authenticates all users belonging to the same group, is proposed in this paper. The group authentication is specially designed for group-oriented applications. Group authentication is no longer a one-to-one type of authentication, as in most conventional user authentication schemes, which have one prover and one verifier; rather, it is a many-to-many type of authentication with multiple provers and multiple verifiers. We propose a basic t-secure m-user n-group authentication scheme ((t, m, n) GAS), where t is the threshold of the proposed scheme, m is the number of users participating in the group authentication, and n is the number of members of the group; the scheme is based on Shamir's (t, n) secret sharing (SS) scheme. The basic scheme can only work properly in synchronous communications. We also propose asynchronous (t, m, n) GASs: one is a GAS with one-time authentication and the other is a GAS with multiple authentications. The (t, m, n) GAS is very efficient since it is sufficient to authenticate all users at once if all users are group members; however, if there are nonmembers, it can be used as a preprocess before applying conventional user authentication to identify nonmembers.
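Below is a minimal sketch of the Shamir (t, n) machinery the (t, m, n) GAS builds on (field size, parameters, and the direct reconstruction check are simplifications, not the paper's token-release protocol): shares from t genuine members reconstruct the group secret, while a nonmember's bogus share breaks the check:

```python
# Shamir (t, n) secret sharing over GF(P) (sketch; parameters are toy values).
P = 2 ** 61 - 1  # a Mersenne prime field

def make_shares(secret, t, n, coeffs):
    """f(x) = secret + c1*x + ... + c_{t-1}*x^{t-1}; share_i = (i, f(i))."""
    assert len(coeffs) == t - 1
    def f(x):
        acc = 0
        for c in reversed([secret] + coeffs):   # Horner evaluation
            acc = (acc * x + c) % P
        return acc
    return [(i, f(i)) for i in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 over GF(P)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

shares = make_shares(secret=123456789, t=3, n=5, coeffs=[7, 11])
print(reconstruct(shares[:3]) == 123456789)   # any 3 of 5 members authenticate
print(reconstruct(shares[:2] + [(9, 42)]))    # a nonmember corrupts the result
```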

Journal ArticleDOI
TL;DR: A new guideline for the design and implementation of effective LSMs on GPU is introduced and very efficient approaches are proposed for CPU-GPU data transfer optimization, thread control, mapping of neighboring solutions to GPU threads, and memory management.
Abstract: Local search metaheuristics (LSMs) are efficient methods for solving complex problems in science and industry. They make it possible to significantly reduce the size of the search space to be explored and the search time. Nevertheless, the resolution time remains prohibitive when dealing with large problem instances. Therefore, the use of GPU-based massively parallel computing is a major complementary way to speed up the search. However, GPU computing for LSMs is rarely investigated in the literature. In this paper, we introduce a new guideline for the design and implementation of effective LSMs on GPU. Very efficient approaches are proposed for CPU-GPU data transfer optimization, thread control, mapping of neighboring solutions to GPU threads, and memory management. These approaches have been experimented using four well-known combinatorial and continuous optimization problems and four GPU configurations. Compared to a CPU-based execution, accelerations of up to 80x are reported for the large combinatorial problems and up to 240x for a continuous problem. Finally, extensive experiments demonstrate the strong potential of GPU-based LSMs compared to cluster or grid-based parallel architectures.

Journal ArticleDOI
TL;DR: It is formally proved that independently of the shape and dimensions of the planar topologies and of the number and placement of the TSVs, the proposed routing algorithm using two virtual channels in the plane is deadlock and livelock free.
Abstract: In this paper, we propose a distributed routing algorithm for vertically partially connected regular 2D topologies of different shapes and sizes (e.g., 2D mesh, torus, ring). The topologies that are the target of this algorithm are of practical interest in the 3D integration of heterogeneous dies using Through-Silicon-Vias (TSVs). Indeed, TSV-based 3D integration makes it possible to envision the stacking of dies with different functions and technologies, using a 3D-NoC as an interconnect backbone. Intrinsically, 3D topologies have better performance, but yield and active area (and thus cost) are functions of the number of TSVs; therefore, designs tend to use only a subset of the available TSVs between two dies. The definition of blockage-free, low-implementation-cost distributed deterministic routing on this kind of topology is thus of theoretical and practical interest. We formally prove that independently of the shape and dimensions of the planar topologies and of the number and placement of the TSVs, the proposed routing algorithm using two virtual channels in the plane is deadlock and livelock free. We also experimentally show that the performance of this algorithm is still acceptable when the number of vertical connections decreases.

Journal ArticleDOI
TL;DR: Several techniques are introduced to optimize performance on GPUs, including reducing global memory transactions of the input buffer, reducing the latency of transition table lookups, eliminating output table accesses, avoiding bank conflicts in shared memory, coalescing writes to global memory, and enhancing data transmission via peripheral component interconnect express.
Abstract: Graphics processing units (GPUs) have attracted a lot of attention due to their cost-effectiveness and enormous power for massive data parallel computing. In this paper, we propose a novel parallel algorithm for exact pattern matching on GPUs. A traditional exact pattern matching algorithm matches multiple patterns simultaneously by traversing a special state machine called an Aho-Corasick machine. Considering the particular parallel architecture of GPUs, in this paper, we first propose an efficient state machine on which we perform very efficient parallel algorithms. Also, several techniques are introduced to optimize performance on GPUs, including reducing global memory transactions of the input buffer, reducing the latency of transition table lookups, eliminating output table accesses, avoiding bank conflicts in shared memory, coalescing writes to global memory, and enhancing data transmission via peripheral component interconnect express. We evaluate the performance of the proposed algorithm using attack patterns from Snort V2.8 and input streams from DEFCON. The experimental results show that the proposed algorithm performed on NVIDIA GPUs achieves up to 143.16-Gbps throughput, 14.74 times faster than the Aho-Corasick algorithm implemented on a 3.06-GHz quad-core CPU with OpenMP. The library of the proposed algorithm is publicly accessible through Google Code.
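For reference, here is a compact CPU-side sketch of the Aho-Corasick state machine the paper parallelizes (a textbook construction, not the authors' GPU-specific variant):

```python
# Textbook Aho-Corasick: trie + BFS failure links + multi-pattern search.
from collections import deque

def build(patterns):
    goto, fail, out = [{}], [0], [set()]
    for p in patterns:                       # 1. build the keyword trie
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({}); fail.append(0); out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(p)
    q = deque(goto[0].values())              # 2. BFS to fill failure links
    while q:
        s = q.popleft()
        for ch, t in goto[s].items():
            q.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] |= out[fail[t]]
    return goto, fail, out

def search(text, machine):
    goto, fail, out = machine
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:       # follow failure links on mismatch
            s = fail[s]
        s = goto[s].get(ch, 0)
        for p in out[s]:                     # report all patterns ending here
            hits.append((i - len(p) + 1, p))
    return hits

machine = build(["he", "she", "his", "hers"])
print(sorted(search("ushers", machine)))     # [(1, 'she'), (2, 'he'), (2, 'hers')]
```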

Journal ArticleDOI
TL;DR: This paper presents three enhanced MaaS capabilities and shows that window-based state monitoring is not only more resilient to noise and outliers but also saves considerable communication cost, and that violation-likelihood-based state monitoring can dynamically adjust monitoring intensity based on the likelihood of detecting important events, leading to significant gains in monitoring service consolidation.
Abstract: This paper introduces the concept of monitoring-as-a-service (MaaS), its main components, and a suite of key functional requirements of MaaS in the cloud. We argue that MaaS should support not only the conventional state monitoring capabilities, such as instantaneous violation detection, periodical state monitoring, and single-tenant monitoring, but also performance-enhanced functionalities that can optimize monitoring cost, scalability, and the effectiveness of monitoring service consolidation and isolation. In this paper, we present three enhanced MaaS capabilities and show that window-based state monitoring is not only more resilient to noise and outliers, but also saves considerable communication cost. Similarly, violation-likelihood-based state monitoring can dynamically adjust monitoring intensity based on the likelihood of detecting important events, leading to significant gains in monitoring service consolidation. Finally, multitenancy support in state monitoring allows multiple cloud users to enjoy MaaS with improved performance and efficiency at a more affordable cost. We perform extensive experiments in an emulated cloud environment with real-world system and network traces. The experimental results suggest that our MaaS framework achieves significantly lower monitoring cost, higher scalability, and better multitenancy performance.
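A minimal sketch of window-based state monitoring as described (the threshold, window size, and metric are invented): a violation is reported only when the metric stays above the threshold for a whole window, filtering the transient spikes that instantaneous detection would report:

```python
# Window-based state monitoring sketch (toy parameters).
from collections import deque

def window_violations(samples, threshold=80.0, window=3):
    recent, alerts = deque(maxlen=window), []
    for t, v in enumerate(samples):
        recent.append(v > threshold)
        # Instantaneous monitoring would fire on every single exceedance;
        # window-based monitoring fires only on a sustained one.
        if len(recent) == window and all(recent):
            alerts.append(t)
    return alerts

cpu = [75, 92, 78, 91, 93, 95, 82]   # one transient spike, one real burst
print(window_violations(cpu))         # [5]: fires only on the sustained burst
```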

Journal ArticleDOI
TL;DR: In this article, the design space of lightweight hash functions based on the sponge construction instantiated with present-type permutations is explored and the resulting family of hash functions is called spongent.
Abstract: The design of secure yet efficiently implementable cryptographic algorithms is a fundamental problem of cryptography. Lately, lightweight cryptography--optimizing the algorithms to fit the most constrained environments--has received a great deal of attention, the recent research being mainly focused on building block ciphers. As opposed to that, the design of lightweight hash functions is still far from being well investigated, with only a few proposals in the public domain. In this paper, we aim to address this gap by exploring the design space of lightweight hash functions based on the sponge construction instantiated with present-type permutations. The resulting family of hash functions is called spongent. We propose 13 spongent variants--for different levels of collision and (second) preimage resistance as well as for various implementation constraints. For each of them, we provide several ASIC hardware implementations--ranging from the lowest area to the highest throughput. We make efforts to address the fairness of comparison with other designs in the field by providing an exhaustive hardware evaluation on various technologies, including an open core library. We also prove essential differential properties of spongent permutations, give a security analysis in terms of collision and preimage resistance, as well as study in detail dedicated linear distinguishers.
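To show only the absorb/squeeze skeleton that spongent instantiates (the permutation below is a throwaway stand-in with no security properties, NOT the present-type spongent permutation, and all sizes are toy values):

```python
# Toy sponge construction: absorb rate-sized blocks, then squeeze the digest.
RATE, CAPACITY = 1, 7          # r-byte rate, c-byte capacity (b = r + c = 8)

def toy_permutation(state: bytes) -> bytes:
    """Placeholder mixing function on b = 8 bytes (illustration only)."""
    x = int.from_bytes(state, "big")
    for _ in range(8):
        x = ((x * 0x5DEECE66D + 0xB) ^ (x >> 17)) % (1 << 64)
    return x.to_bytes(8, "big")

def sponge_hash(msg: bytes, out_len: int = 4) -> bytes:
    msg += b"\x80" + b"\x00" * (-(len(msg) + 1) % RATE)   # simple padding
    state = bytes(RATE + CAPACITY)
    for i in range(0, len(msg), RATE):                    # absorbing phase
        block = msg[i:i + RATE] + bytes(CAPACITY)
        state = toy_permutation(bytes(a ^ b for a, b in zip(state, block)))
    digest = b""
    while len(digest) < out_len:                          # squeezing phase
        digest += state[:RATE]
        state = toy_permutation(state)
    return digest[:out_len]

print(sponge_hash(b"lightweight").hex())
```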

Journal ArticleDOI
TL;DR: Inspired by the commodity servers in today's data centers that come with dual ports, this paper considers how to build expandable and cost-effective structures without expensive high-end switches or additional hardware on servers beyond the two NIC ports.
Abstract: A fundamental goal of data center networking is to efficiently interconnect a large number of servers at low equipment cost. Several server-centric network structures for data centers have been proposed. They, however, are not truly expandable and suffer from a low degree of regularity and symmetry. Inspired by the commodity servers in today's data centers that come with dual ports, we consider how to build expandable and cost-effective structures without expensive high-end switches or additional hardware on servers beyond the two NIC ports. In this paper, two such network structures, called HCN and BCN, are designed, both of which are of server degree 2. We also develop low-overhead and robust routing mechanisms for HCN and BCN. Although the server degree is only 2, HCN can be expanded very easily to encompass hundreds of thousands of servers with low diameter and high bisection width. Additionally, HCN offers a high degree of regularity, scalability, and symmetry, which conforms to the modular designs of data centers. BCN is the largest known network structure for data centers with server degree 2 and network diameter 7. Furthermore, BCN has many attractive features, including low diameter, high bisection width, a large number of node-disjoint paths for one-to-one traffic, and good fault tolerance. Mathematical analysis and comprehensive simulations show that HCN and BCN possess excellent topological properties and are viable network structures for data centers.

Journal ArticleDOI
TL;DR: This work proposes a technique to model and evaluate the VMM aging process and to investigate the optimal rejuvenation policy that maximizes VMM availability under variable workload conditions, and proposes a time-based policy that adapts the rejuvenation timer to the VMM workload condition, improving system availability.
Abstract: Cloud computing is a promising paradigm able to rationalize the use of hardware resources by means of virtualization. Virtualization makes it possible to instantiate one or more virtual machines (VMs) on top of a single physical machine managed by a virtual machine monitor (VMM). Like any other software, a VMM experiences aging and failures. Software rejuvenation is a proactive fault management technique that involves terminating an application, cleaning up the system's internal state, and restarting it to prevent the occurrence of future failures. In this work, we propose a technique to model and evaluate the VMM aging process and to investigate the optimal rejuvenation policy that maximizes VMM availability under variable workload conditions. Starting from dynamic reliability theory and adopting symbolic algebraic techniques, we investigate and compare existing time-based VMM rejuvenation policies. We also propose a time-based policy that adapts the rejuvenation timer to the VMM workload condition, improving system availability. The effectiveness of the proposed modeling technique is demonstrated through a numerical example based on a case study taken from the literature.

Journal ArticleDOI
TL;DR: This work proposes two energy-efficient proactive data reporting protocols, SinkTrail and SinkTrail-S, for mobile sink-based data collection, which feature low complexity and reduced control overhead and demonstrate satisfactory performance in finding shorter routing paths.
Abstract: In large-scale Wireless Sensor Networks (WSNs), leveraging data sinks' mobility for data gathering has drawn substantial interest in recent years. Current research either focuses on planning a mobile sink's moving trajectory in advance to achieve optimized network performance, or targets collecting a small portion of sensed data in the network. In many application scenarios, however, a mobile sink cannot move freely in the deployed area. Therefore, the precalculated trajectories may not be applicable. To avoid constant sink location update traffic when a sink's future locations cannot be scheduled in advance, we propose two energy-efficient proactive data reporting protocols, SinkTrail and SinkTrail-S, for mobile sink-based data collection. The proposed protocols feature low complexity and reduced control overhead. Two unique aspects distinguish our approach from previous ones: 1) we allow sufficient flexibility in the movement of mobile sinks to dynamically adapt to various terrestrial changes; and 2) without requiring GPS devices or predefined landmarks, SinkTrail establishes a logical coordinate system for routing and forwarding data packets, making it suitable for diverse application scenarios. We systematically analyze the impact of several design factors in the proposed algorithms. Both theoretical analysis and simulation results demonstrate that the proposed algorithms reduce control overhead and yield satisfactory performance in finding shorter routing paths.

Journal ArticleDOI
TL;DR: By exploring the boundary problem of the BC networks, it is proved that when n ≥ 4 and 0 ≤ h ≤ n - 4, the h-extra connectivity of an n-dimensional BC network Xn is κh(Xn) = n(h + 1) - h(h + 3)/2.
Abstract: Reliability evaluation of interconnection networks is important to the design and maintenance of multiprocessor systems. Extra connectivity determination and faulty networks' structure analysis are two important aspects of the reliability evaluation of interconnection networks. An n-dimensional bijective connection network (in brief, BC network), denoted by Xn, is an n-regular graph with 2^n vertices and n*2^(n-1) edges. The hypercubes, Möbius cubes, crossed cubes, and twisted cubes are some examples of BC networks. By exploring the boundary problem of BC networks, we prove that when n ≥ 4 and 0 ≤ h ≤ n - 4, the h-extra connectivity of an n-dimensional BC network Xn is κh(Xn) = n(h + 1) - h(h + 3)/2. Furthermore, there exists a large connected component, and the remaining small components have at most h vertices in total, if the total number of faulty vertices is strictly less than its h-extra connectivity. As an application, the results on the h-extra connectivity and the structure of faulty networks on hypercubes, Möbius cubes, crossed cubes, and twisted cubes are obtained.
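Written out, the result is the following, with a quick sanity check against known hypercube values (the hypercube is itself a BC network):

```latex
\[
  \kappa_h(X_n) \;=\; n(h+1) - \tfrac{1}{2}\,h\,(h+3),
  \qquad n \ge 4,\ 0 \le h \le n-4 .
\]
% h = 1 gives \kappa_1(X_n) = 2n - 2 and h = 2 gives \kappa_2(X_n) = 3n - 5,
% matching the classical extra-connectivity values for the hypercube Q_n.
% For n = 6, h = 2: \kappa_2(X_6) = 18 - 5 = 13, so with at most 12 faulty
% vertices there remains one large component plus small components totaling
% at most 2 vertices.
```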

Journal ArticleDOI
TL;DR: This work maps 16 implementations of an Advanced Encryption Standard (AES) cipher with both online and offline key expansion on a fine-grained many-core system and shows 2.0 times higher throughput than the TI DSP C6201, and 2.9 times higher energy efficiency than the GeForce 8800 GTX.
Abstract: By exploring different granularities of data-level and task-level parallelism, we map 16 implementations of an Advanced Encryption Standard (AES) cipher with both online and offline key expansion on a fine-grained many-core system. The smallest design utilizes only six cores for offline key expansion and eight cores for online key expansion, while the largest requires 107 and 137 cores, respectively. In comparison with published AES cipher implementations on general-purpose processors, our design has 3.5-15.6 times higher throughput per unit of chip area and 8.2-18.1 times higher energy efficiency. Moreover, the design shows 2.0 times higher throughput than the TI DSP C6201, and 3.3 times higher throughput per unit of chip area and 2.9 times higher energy efficiency than the GeForce 8800 GTX.

Journal ArticleDOI
TL;DR: A novel concept of triple key distribution is introduced, in which three nodes share common keys, and its application in secure forwarding, detecting malicious nodes and key management in clustered sensor networks is discussed.
Abstract: We address pairwise and (for the first time) triple key establishment problems in wireless sensor networks (WSN). Several types of combinatorial designs have already been applied in key establishment. A BIBD(v, b, r, k, λ) (or t - (v, b, r, k, λ) design) can be mapped to a sensor network, where v represents the size of the key pool, b represents the maximum number of nodes that the network can support, and k represents the size of the key chain. Any pair (or t-subset) of keys occurs together uniquely in exactly λ nodes; λ = 2 and λ = 3 are used to establish unique pairwise or triple keys. We use several known constructions of designs with λ = 2 to predistribute keys in sensors. We also describe a new construction of a design called strong Steiner trade and use it for pairwise key establishment. To the best of our knowledge, this is the first paper on the application of trades to key distribution. Our scheme is highly resilient against node capture attacks (achieved by key refreshing) and is applicable for mobile sensor networks (as key distribution is independent of the connectivity graph), while preserving low storage, computation, and communication requirements. We introduce a novel concept of triple key distribution, in which three nodes share common keys, and discuss its application in secure forwarding, detecting malicious nodes, and key management in clustered sensor networks. We present a polynomial-based and a combinatorial approach (using trades) for triple key distribution. We also extend our construction to provide pairwise and triple key distribution simultaneously, and apply it to secure data aggregation.
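In the spirit of the polynomial-based triple key approach mentioned above (the field size, degree bound, and structure here are illustrative assumptions, not the paper's exact scheme), a symmetric trivariate polynomial over a prime field gives any three nodes a common key that each can compute from its own share:

```python
# Symmetric trivariate polynomial over GF(P) for illustrative triple keys.
import random

P = 2 ** 31 - 1
T = 2                                 # degree bound per variable
random.seed(1)

base = {}
def coeff(l, m, n):
    """Coefficients depend only on the multiset {l, m, n}, forcing symmetry."""
    key = tuple(sorted((l, m, n)))
    if key not in base:
        base[key] = random.randrange(P)
    return base[key]

def f(x, y, z):
    return sum(coeff(l, m, n) * pow(x, l, P) * pow(y, m, P) * pow(z, n, P)
               for l in range(T + 1) for m in range(T + 1)
               for n in range(T + 1)) % P

# Node i would store the bivariate share g_i(y, z) = f(i, y, z); symmetry of
# f makes all three evaluations agree on the same triple key.
i, j, k = 17, 42, 99
print(f(i, j, k) == f(j, i, k) == f(k, j, i))   # True: shared triple key
```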

Journal ArticleDOI
TL;DR: A novel system, named Malwise, for malware classification that uses a fast application-level emulator to reverse the code packing transformation and two flowgraph matching algorithms to perform classification, and is able to detect malware with near real-time performance.
Abstract: Signature-based malware detection systems have been a much used response to the pervasive problem of malware. Identification of malware variants is essential to a detection system and is made possible by identifying invariant characteristics in related samples. To classify packed and polymorphic malware, this paper proposes a novel system, named Malwise, for malware classification using a fast application-level emulator to reverse the code packing transformation, and two flowgraph matching algorithms to perform classification. An exact flowgraph matching algorithm is employed that uses string-based signatures and is able to detect malware with near real-time performance. Additionally, a more effective approximate flowgraph matching algorithm is proposed that uses the decompilation technique of structuring to generate string-based signatures amenable to the string edit distance. We use real and synthetic malware to demonstrate the effectiveness and efficiency of Malwise. Using more than 15,000 real malware samples, collected from honeypots, the effectiveness is validated by showing that there is an 88 percent probability that new malware is detected as a variant of existing malware. The efficiency is demonstrated on a smaller sample set of malware, where 86 percent of the samples can be classified in under 1.3 seconds.
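A small sketch of the approximate-matching step (the signature strings are invented; Malwise derives them from structured decompilation of flowgraphs): the string edit distance between signatures becomes a similarity score between samples:

```python
# Levenshtein distance turned into a signature similarity score.
def edit_distance(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def similarity(sig_a: str, sig_b: str) -> float:
    return 1 - edit_distance(sig_a, sig_b) / max(len(sig_a), len(sig_b))

known  = "W{IE}EW{I}R"     # hypothetical signature of a known variant
sample = "W{IE}EW{IE}R"    # new sample with one extra construct
print(similarity(known, sample))  # high score -> likely the same family
```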

Journal ArticleDOI
TL;DR: This study investigates some topological properties of k-ary n-cubes, and shows that the conditional diagnosability of k-ary n-cubes under the comparison diagnosis model is 6n - 5.
Abstract: Processor fault diagnosis plays an important role in measuring the reliability of multiprocessor systems and diagnosing many well-known interconnection networks. Conditional diagnosability is a novel measure of diagnosability that adds the additional condition that any faulty set cannot contain all of the neighbors of any vertex in a system. This study investigates some topological properties of k-ary n-cubes, where k ≥ 4 and n ≥ 4, and shows that the conditional diagnosability of k-ary n-cubes under the comparison diagnosis model is 6n - 5.

Journal ArticleDOI
TL;DR: This paper presents a systematic methodology to produce decomposable PMC-based power models on current multicore architectures and compares the models against existing approaches concluding that the proposed methodology produces more accurate, responsive, and informative models.
Abstract: Power modeling based on performance monitoring counters (PMCs) has attracted the interest of researchers as a quick approach to understanding the power behavior of real systems. Consequently, several power-aware policies use models to guide their decisions. Hence, the presence of power models that are informative, accurate, and capable of detecting power phases is critical to improve the success of power-saving techniques. Additionally, the design of current processors has varied considerably with the appearance of CMPs (multiple cores sharing resources). Thus, PMC-based power models warrant further investigation on current energy-efficient multicore processors. In this paper, we present a systematic methodology to produce decomposable PMC-based power models on current multicore architectures. Apart from being able to estimate the power consumption accurately, the models provide per-component power consumption, supplying extra insights about power behavior. Moreover, we study their responsiveness, that is, their capacity to detect power phases. Specifically, we produce power models for an Intel Core 2 Duo with one and two cores enabled for all the DVFS configurations. The models are empirically validated using the SPECcpu2006, NAS, and LMBENCH benchmarks. Finally, we compare the models against existing approaches, concluding that the proposed methodology produces more accurate, responsive, and informative models.
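An illustrative sketch of fitting a decomposable PMC-based power model (the counter choice and all numbers are invented): ordinary least squares yields an intercept (roughly idle power) plus per-component weights, so total power decomposes into per-component contributions:

```python
# Least-squares fit of a decomposable PMC-based power model (toy data).
import numpy as np

# Rows: training samples; columns: normalized PMC rates per component
# (e.g., instructions retired, L2 misses, memory accesses).
pmcs = np.array([[0.9, 0.1, 0.2],
                 [0.5, 0.6, 0.1],
                 [0.2, 0.2, 0.8],
                 [0.7, 0.4, 0.5]])
measured_watts = np.array([18.0, 17.5, 16.0, 20.0])

X = np.column_stack([np.ones(len(pmcs)), pmcs])   # intercept ~ idle power
w, *_ = np.linalg.lstsq(X, measured_watts, rcond=None)

print("weights:", np.round(w, 2))
print("per-component watts of sample 0:", np.round(w[1:] * pmcs[0], 2))
```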

Journal ArticleDOI
TL;DR: An efficient denoising scheme and its VLSI architecture for the removal of random-valued impulse noise is proposed and can obtain better performances in terms of both quantitative evaluation and visual quality than the previous lower complexity methods.
Abstract: Images are often corrupted by impulse noise during image acquisition and transmission. In this paper, we propose an efficient denoising scheme and its VLSI architecture for the removal of random-valued impulse noise. To achieve the goal of low cost, a low-complexity VLSI architecture is proposed. We employ a decision-tree-based impulse noise detector to detect the noisy pixels, and an edge-preserving filter to reconstruct the intensity values of noisy pixels. Furthermore, an adaptive technique is used to enhance the effects of impulse noise removal. Our extensive experimental results demonstrate that the proposed technique obtains better performance in terms of both quantitative evaluation and visual quality than previous lower-complexity methods. Moreover, its performance is comparable to that of higher-complexity methods. The VLSI architecture of our design yields a processing rate of about 200 MHz using TSMC 0.18 μm technology. Compared with state-of-the-art techniques, this work can reduce memory storage by more than 99 percent. The design requires only low computational complexity and two line memory buffers. Its hardware cost is low, making it suitable for many real-time applications.
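A heavily simplified software sketch of the detect-then-filter flow (the threshold and the plain median replacement are stand-ins; the paper uses a decision-tree detector and an edge-preserving filter):

```python
# Detect-then-filter sketch for random-valued impulse noise (toy parameters).
import numpy as np

def denoise(img: np.ndarray, thresh: int = 40) -> np.ndarray:
    out = img.copy()
    h, w = img.shape
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            win = img[y - 1:y + 2, x - 1:x + 2]
            med = np.median(win)
            # Detection: a pixel far from its neighborhood median is noisy.
            if abs(int(img[y, x]) - med) > thresh:
                # Reconstruction: median of the 8 neighbors (a stand-in for
                # the paper's edge-preserving directional filter).
                neighbors = np.delete(win.flatten(), 4)
                out[y, x] = np.median(neighbors)
    return out

img = np.full((5, 5), 100, dtype=np.uint8)
img[2, 2] = 255                     # random-valued impulse
print(denoise(img)[2, 2])           # restored to 100
```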

Journal ArticleDOI
TL;DR: This paper combines the ideas of hardware pipeline and loop unrolling to design an architecture that produces 2 RC4 keystream bytes per clock cycle, and proposes the fastest known architecture for the cipher.
Abstract: RC4 is the most popular stream cipher in the domain of cryptology. In this paper, we present a systematic study of the hardware implementation of RC4, and propose the fastest known architecture for the cipher. We combine the ideas of hardware pipelining and loop unrolling to design an architecture that produces 2 RC4 keystream bytes per clock cycle. We have optimized and implemented our proposed design using a VHDL description, synthesized with 130, 90, and 65 nm fabrication technologies at clock frequencies of 625 MHz, 1.37 GHz, and 1.92 GHz, respectively, to obtain a final RC4 keystream throughput of 10, 21.92, and 30.72 Gbps in the respective technologies.
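As a software model of the loop-unrolling idea only (the paper's contribution is the hardware pipeline, which plain software cannot capture), the PRGA loop can be unrolled so that each iteration emits two keystream bytes; the output matches the standard RC4 test vector:

```python
# RC4 with the PRGA loop unrolled to two keystream bytes per iteration.
def rc4_ksa(key: bytes):
    S, j = list(range(256)), 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    return S

def rc4_stream_unrolled(key: bytes, n_pairs: int) -> bytes:
    S, i, j = rc4_ksa(key), 0, 0
    out = bytearray()
    for _ in range(n_pairs):          # two PRGA steps per loop iteration
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(S[(S[i] + S[j]) % 256])
        i = (i + 1) % 256             # unrolled second step
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(S[(S[i] + S[j]) % 256])
    return bytes(out)

# Known RC4 test vector: key "Key" -> keystream starts EB 9F 77 81.
print(rc4_stream_unrolled(b"Key", 2).hex())  # eb9f7781
```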

Journal ArticleDOI
TL;DR: This paper proposes a cooperative resource provisioning solution that can substantially reduce server cost relative to the noncooperative solutions widely used in state-of-the-practice hosting data centers or cloud systems.
Abstract: Recent cost analysis shows that the server cost still dominates the total cost of high-scale data centers or cloud systems. In this paper, we argue for a new twist on the classical resource provisioning problem: heterogeneous workloads are a fact of life in large-scale data centers, and current resource provisioning solutions do not act upon this heterogeneity. Our contributions are threefold: first, we propose a cooperative resource provisioning solution, and take advantage of differences among heterogeneous workloads so as to decrease their peak resource consumption under competitive conditions; second, for four typical heterogeneous workloads: parallel batch jobs, web servers, search engines, and MapReduce jobs, we build an agile system PhoenixCloud that enables cooperative resource provisioning; and third, we perform a comprehensive evaluation for both real and synthetic workload traces. Our experiments show that our solution can substantially reduce server cost relative to the noncooperative solutions widely used in state-of-the-practice hosting data centers or cloud systems: for example, EC2, which leverages the statistical multiplexing technique, or RightScale, which roughly implements the elastic resource provisioning technique proposed in related state-of-the-art work.
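The arithmetic that makes cooperation pay off can be seen in one line (the traces are invented): noncooperative provisioning sizes capacity by the sum of per-workload peaks, while cooperative provisioning sizes it by the peak of the summed trace, which is never larger and is much smaller when peaks do not align:

```python
# Peak-of-sum vs. sum-of-peaks provisioning comparison (toy traces).
batch = [10, 80, 20, 10, 15]   # servers needed per hour: batch jobs
web   = [70, 20, 60, 75, 30]   # servers needed per hour: web serving

sum_of_peaks = max(batch) + max(web)                   # noncooperative sizing
peak_of_sum  = max(b + w for b, w in zip(batch, web))  # cooperative sizing

print(sum_of_peaks, peak_of_sum)   # 155 vs. 100 servers in this toy trace
```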

Journal ArticleDOI
TL;DR: This paper proposes a design methodology that enhances the classical system-level design flow for embedded systems to introduce reliability-awareness, and allows the designer to specify that only some parts of the systems need to be hardened against faults.
Abstract: This paper proposes a design methodology that enhances the classical system-level design flow for embedded systems to introduce reliability-awareness. The mapping and scheduling step is extended to support the application of hardening techniques to fulfill the required fault management properties that the final system must exhibit; moreover, the methodology allows the designer to specify that only some parts of the systems need to be hardened against faults. The reference architecture is a complex distributed one, constituted by resources with different characteristics in terms of performance and available fault detection/tolerance mechanisms. The approach is evaluated and compared against the most recent and relevant work, with an in-depth analysis on a large set of benchmarks.

Journal ArticleDOI
TL;DR: This work promotes a different, traffic-aware, modular approach in the design of FPGA-based NIDS, which classify and group homogeneous traffic, and dispatch it to differently capable hardware blocks, each supporting a (smaller) rule set tailored to the specific traffic category.
Abstract: The security of today's networks relies heavily on network intrusion detection systems (NIDSs). The ability to promptly update the supported rule sets and detect new emerging attacks makes field-programmable gate arrays (FPGAs) a very appealing technology. An important issue is how to scale FPGA-based NIDS implementations to ever faster network links. Whereas a trivial approach is to balance traffic over multiple, functionally equivalent hardware blocks, each implementing the whole rule set (several thousand rules), the obvious drawback is a linear increase in resource occupation. In this work, we promote a different, traffic-aware, modular approach to the design of FPGA-based NIDSs. Instead of purely splitting traffic across equivalent modules, we classify and group homogeneous traffic and dispatch it to differently capable hardware blocks, each supporting a (smaller) rule set tailored to the specific traffic category. We implement and validate our approach using the rule set of the well-known Snort NIDS, and we experimentally investigate the emerging trade-offs and advantages, showing resource savings of up to 80 percent based on real-world traffic statistics gathered from an operator's backbone.