
Showing papers in "IEEE Transactions on Computers in 2010"


Journal ArticleDOI
TL;DR: The extended approach, MDAPRA, provides a mutual exclusion mechanism for repositioning nodes to restore connectivity, localizing the scope of the recovery and minimizing the overhead imposed on the nodes.
Abstract: Mobility has been introduced to sensor networks through the deployment of movable nodes. In movable wireless networks, network connectivity among the nodes is a crucial factor in order to relay data to the sink node, exchange data for collaboration, and perform data aggregation. However, such connectivity can be lost due to a failure of one or more nodes. Even a single node failure may partition the network, and thus, eventually reduce the quality and efficiency of the network operation. To handle this connectivity problem, we present PADRA to detect possible partitions, and then, restore the network connectivity through controlled relocation of movable nodes. The idea is to identify whether or not the failure of a node will cause partitioning in advance in a distributed manner. If a partitioning is to occur, PADRA designates a failure handler to initiate the connectivity restoration process. The overall goal in this process is to localize the scope of the recovery and minimize the overhead imposed on the nodes. We further extend PADRA to handle multiple node failures. The extended approach, MDAPRA, strives to provide a mutual exclusion mechanism in repositioning the nodes to restore connectivity. The effectiveness of the proposed approaches is validated through simulation experiments.

200 citations


Journal ArticleDOI
TL;DR: This paper proposes an authenticated key transfer protocol based on a secret sharing scheme in which the KGC broadcasts group key information to all group members at once; only authorized group members can recover the group key, while unauthorized users cannot.
Abstract: Key transfer protocols rely on a mutually trusted key generation center (KGC) to select session keys and transport session keys to all communication entities secretly. Most often, KGC encrypts session keys under another secret key shared with each entity during registration. In this paper, we propose an authenticated key transfer protocol based on a secret sharing scheme by which the KGC can broadcast group key information to all group members at once; only authorized group members can recover the group key, while unauthorized users cannot. The confidentiality of this transformation is information theoretically secure. We also provide authentication for transporting this group key. Goals and security threats of our proposed group key transfer protocol will be analyzed in detail.

184 citations
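
The abstract above does not spell out the construction, but its core mechanism, recovering a broadcast group key from shares via polynomial (Lagrange) interpolation over a prime field, can be sketched as follows. The field modulus, share layout, and function names are illustrative assumptions, not the authors' protocol.

```python
# Toy sketch of secret-sharing-based group key recovery (not the paper's exact protocol).
# A degree-(t-1) polynomial f with f(0) = group_key is sampled; member i holds the share (i, f(i)).
# Any t authorized members can reconstruct f(0) by Lagrange interpolation; fewer than t learn nothing.
import random

P = 2**127 - 1  # a Mersenne prime used as the field modulus (illustrative choice)

def make_shares(group_key, t, member_ids):
    coeffs = [group_key] + [random.randrange(P) for _ in range(t - 1)]
    def f(x):
        return sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P
    return {i: f(i) for i in member_ids}

def recover_key(shares):
    # Lagrange interpolation at x = 0 over GF(P).
    key = 0
    for i, yi in shares.items():
        num, den = 1, 1
        for j in shares:
            if j != i:
                num = (num * (-j)) % P
                den = (den * (i - j)) % P
        key = (key + yi * num * pow(den, P - 2, P)) % P
    return key

if __name__ == "__main__":
    key = random.randrange(P)
    shares = make_shares(key, t=3, member_ids=[1, 2, 3, 4, 5])
    subset = {i: shares[i] for i in (2, 4, 5)}        # any 3 authorized members suffice
    assert recover_key(subset) == key
    print("group key recovered by 3 authorized members")
```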


Journal ArticleDOI
TL;DR: A modification to the Wallace reduction is presented that ensures that the delay is the same as for the conventional Wallace reduction, producing implementations with 80 percent fewer half adders than standard Wallace multipliers, with a very slight increase in the number of full adders.
Abstract: Wallace high-speed multipliers use full adders and half adders in their reduction phase. Half adders do not reduce the number of partial product bits. Therefore, minimizing the number of half adders used in a multiplier reduction will reduce the complexity. A modification to the Wallace reduction is presented that ensures that the delay is the same as for the conventional Wallace reduction. The modified reduction method greatly reduces the number of half adders, producing implementations with 80 percent fewer half adders than standard Wallace multipliers, with a very slight increase in the number of full adders.

128 citations
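
As a rough illustration of the counting argument above (a half adder takes two bits in and produces two bits out, so it never shrinks the partial-product matrix), here is a small simulation of a conventional Wallace-style column reduction that tallies the full and half adders used. It is bookkeeping for the standard reduction only, not the modified reduction proposed in the paper.

```python
# Count full/half adders in a conventional Wallace-style reduction of an n x n
# unsigned partial-product matrix (a bookkeeping sketch, not the paper's method).

def wallace_adder_count(n):
    # heights[c] = number of partial-product bits in column c (weight 2^c)
    heights = [n - abs(c - (n - 1)) for c in range(2 * n - 1)]
    full, half, stages = 0, 0, 0
    while max(heights) > 2:
        stages += 1
        new = [0] * (len(heights) + 1)
        for c, h in enumerate(heights):
            fa, rem = divmod(h, 3)          # group every 3 bits into a full adder
            ha = 1 if rem == 2 else 0       # 2 leftover bits go into a half adder
            passed = rem if rem != 2 else 0
            new[c] += fa + ha + passed      # sum outputs stay in column c
            new[c + 1] += fa + ha           # carries move to column c+1
            full += fa
            half += ha
        heights = new
    return full, half, stages

if __name__ == "__main__":
    for n in (8, 16, 32):
        fa, ha, st = wallace_adder_count(n)
        print(f"{n}x{n}: {fa} full adders, {ha} half adders, {st} reduction stages")
```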


Journal ArticleDOI
TL;DR: This paper proposes low-cost, structure-independent fault detection schemes for AES encryption and decryption, with new formulations for the fault detection of SubBytes and inverse SubBytes based on the relation between the input and the output of the S-box and the inverse S-box.
Abstract: The Advanced Encryption Standard (AES) has been lately accepted as the symmetric cryptography standard for confidential data transmission. However, the natural and malicious injected faults reduce its reliability and may cause confidential information leakage. In this paper, we study concurrent fault detection schemes for reaching a reliable AES architecture. Specifically, we propose low-cost structure-independent fault detection schemes for the AES encryption and decryption. We have obtained new formulations for the fault detection of SubBytes and inverse SubBytes using the relation between the input and the output of the S-box and the inverse S-box. The proposed schemes are independent of the way the S-box and the inverse S-box are constructed. Therefore, they can be used for both the S-boxes and the inverse S-boxes using lookup tables and those utilizing logic gates based on composite fields. Our simulation results show the error coverage of greater than 99 percent for the proposed schemes. Moreover, the proposed and the previously reported fault detection schemes have been implemented on the most recent Xilinx Virtex FPGAs. Their area and delay overheads have been compared and it is shown that the proposed schemes outperform the previously reported ones.

125 citations


Journal ArticleDOI
TL;DR: A formal model of the corresponding optimization problem including constraints concerning buffer sizes, timing, and rates is presented and a new method for approximate multiparametric linear programming is suggested which substantially lowers the computational demand and memory requirement of the embedded software.
Abstract: Recently, there has been a substantial interest in the design of systems that receive their energy from regenerative sources such as solar cells. In contrast to approaches that minimize the power consumption subject to performance constraints, we are concerned with optimizing the performance of an application while respecting the limited and time-varying amount of available power. In this paper, we address power management of, e.g., wireless sensor nodes which receive their energy from solar cells. Based on a prediction of the future available energy, we adapt parameters of the application in order to maximize the utility in a long-term perspective. The paper presents a formal model of the corresponding optimization problem including constraints concerning buffer sizes, timing, and rates. Instead of solving the optimization problem online which may be prohibitively complex in terms of running time and energy consumption, we apply multiparametric programming to precompute the application parameters offline for different environmental conditions and system states. In order to guarantee sustainable operation, we propose a hierarchical software design which comprises a worst-case prediction of the incoming energy. As a further contribution, we suggest a new method for approximate multiparametric linear programming which substantially lowers the computational demand and memory requirement of the embedded software. Our approaches are evaluated using long-term measurements of solar energy in an outdoor environment.

123 citations


Journal ArticleDOI
TL;DR: It is proved that 3D k-covered WSNs can sustain a large number of sensor failures; widely used assumptions in coverage and connectivity analysis for WSNs, such as sensor homogeneity and the unit sensing and communication model, are then relaxed to promote the practicality of the results in real-world scenarios.
Abstract: In a wireless sensor network (WSN), connectivity enables the sensors to communicate with each other, while sensing coverage reflects the quality of surveillance. Although the majority of studies on coverage and connectivity in WSNs consider 2D space, 3D settings represent more accurately the network design for real-world applications. As an example, underwater sensor networks require design in 3D rather than 2D space. In this paper, we focus on the connectivity and k-coverage issues in 3D WSNs, where each point is covered by at least k sensors (the maximum value of k is called the coverage degree). Precisely, we propose the Reuleaux tetrahedron model to characterize k-coverage of a 3D field and investigate the corresponding minimum sensor spatial density. We prove that a 3D field is guaranteed to be k-covered if any Reuleaux tetrahedron region of the field contains at least k sensors. We also compute the connectivity of 3D k-covered WSNs. Based on the concepts of conditional connectivity and forbidden faulty sensor set, which cannot include all the neighbors of a sensor, we prove that 3D k-covered WSNs can sustain a large number of sensor failures. Precisely, we prove that 3D k-covered WSNs have connectivity higher than their coverage degree k. Then, we relax some widely used assumptions in coverage and connectivity in WSNs, such as sensor homogeneity and unit sensing and communication model, so as to promote the practicality of our results in real-world scenarios. Also, we propose a placement strategy of sensors to achieve full k-coverage of a 3D field. This strategy can be used in the design of energy-efficient scheduling protocols for 3D k-covered WSNs to extend the network lifetime.

122 citations
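
The Reuleaux-tetrahedron analysis above is geometric, but the k-coverage property it guarantees is easy to check by brute force for a concrete deployment. The sketch below samples grid points of a cubic field and verifies that each is within the sensing radius of at least k sensors; the unit-ball sensing model and all parameter values are illustrative assumptions (the paper itself later relaxes the unit model).

```python
# Brute-force k-coverage check of a 3D field under the unit-ball sensing model
# (a verification sketch, not the paper's Reuleaux-tetrahedron density analysis).
import itertools
import random

def is_k_covered(sensors, k, r, side=10.0, step=1.0):
    ticks = [i * step for i in range(int(side / step) + 1)]
    for x, y, z in itertools.product(ticks, repeat=3):
        covering = sum((x - sx) ** 2 + (y - sy) ** 2 + (z - sz) ** 2 <= r * r
                       for sx, sy, sz in sensors)
        if covering < k:
            return False, (x, y, z)       # found a point covered by fewer than k sensors
    return True, None

if __name__ == "__main__":
    random.seed(1)
    sensors = [(random.uniform(0, 10), random.uniform(0, 10), random.uniform(0, 10))
               for _ in range(400)]
    ok, hole = is_k_covered(sensors, k=2, r=2.0)
    print("field is 2-covered" if ok else f"coverage hole found near {hole}")
```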


Journal ArticleDOI
TL;DR: A distributed algorithm for optimal power control is devised and it is proved that the system is power stable only if the nodes comply with certain transmit power thresholds, showing that even in a noncooperative scenario, it is in the best interest of the nodes to comply with these thresholds.
Abstract: In infrastructure-less sensor networks, efficient usage of energy is very critical because of the limited energy available to the sensor nodes. Among various phenomena that consume energy, radio communication is by far the most demanding one. One of the effective ways to limit unnecessary energy loss is to control the power at which the nodes transmit signals. In this paper, we apply game theory to solve the power control problem in a CDMA-based distributed sensor network. We formulate a noncooperative game under incomplete information and study the existence of Nash equilibrium. With the help of this equilibrium, we devise a distributed algorithm for optimal power control and prove that the system is power stable only if the nodes comply with certain transmit power thresholds. We show that even in a noncooperative scenario, it is in the best interest of the nodes to comply with these thresholds. The power level at which a node should transmit, to maximize its utility, is evaluated. Moreover, we compare the utilities when the nodes are allowed to transmit with discrete and continuous power levels; the performance with discrete levels is upper bounded by the continuous case. We define a distortion metric that gives a quantitative measure of the goodness of having finite power levels and also find those levels that minimize the distortion. Numerical results demonstrate that the proposed algorithm achieves the best possible payoff/utility for the sensor nodes even by consuming less power.

122 citations


Journal ArticleDOI
TL;DR: An anonymous multireceiver identity-based encryption scheme is presented in which Lagrange interpolating polynomials are adopted to make it impossible for an attacker or any other message receiver to derive the identity of a message receiver, so that the privacy of every receiver can be guaranteed.
Abstract: Recently, many multireceiver identity-based encryption schemes have been proposed in the literature. However, none of these schemes can protect the privacy of message receivers. In this paper, we present an anonymous multireceiver identity-based encryption scheme where we adopt Lagrange interpolating polynomial mechanisms to cope with the above problem. Our scheme makes it impossible for an attacker or any other message receiver to derive the identity of a message receiver such that the privacy of every receiver can be guaranteed. Furthermore, the proposed scheme is quite receiver efficient since each of the receivers merely needs to perform two pairing computations to decrypt the received ciphertext. We prove that our scheme is secure against adaptive chosen plaintext attacks and adaptive chosen ciphertext attacks. Finally, we also formally show that every receiver in the proposed scheme is anonymous to any other receiver.

120 citations


Journal ArticleDOI
TL;DR: This work attempts to achieve improved flash-memory endurance without substantially increasing overhead and without excessively modifying popular implementation designs such as the flash translation layer protocol (FTL), NAND flash translation layer protocol (NFTL), and block-level flash translation layer protocol (BL).
Abstract: Motivated by the strong demand for flash memory with enhanced reliability, this work attempts to achieve improved flash-memory endurance without substantially increasing overhead and without excessively modifying popular implementation designs such as the flash translation layer protocol (FTL), NAND flash translation layer protocol (NFTL), and block-level flash translation layer protocol (BL). A wear-leveling mechanism for moving data that are not updated is proposed to distribute wear-leveling actions over the entire physical address space, so that static or rarely updated data can be proactively moved and memory-space requirements can be minimized. The properties of the mechanism are then explored with various implementation considerations. A series of experiments based on a realistic trace demonstrates the significantly improved endurance of FTL, NFTL, and BL with limited system overhead.

112 citations
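
A minimal policy in the spirit of the proactive data movement described above might look as follows; the threshold, block model, and trigger condition are illustrative assumptions rather than the paper's mechanism.

```python
# Minimal static wear-leveling sketch (an illustrative policy, not the paper's mechanism):
# when the just-erased block's erase count exceeds that of the least-worn block holding
# static data by more than a threshold, that static data is relocated onto the worn block
# and the vacated block absorbs the hot writes from then on.

THRESHOLD = 8

class Flash:
    def __init__(self, nblocks):
        self.erase_count = [0] * nblocks
        # blocks 1..n-1 start out holding static (rarely updated) data; block 0 is hot
        self.static_data = {b: f"static-{b}" for b in range(1, nblocks)}

    def erase(self, b):
        self.erase_count[b] += 1

    def maybe_level(self, just_erased):
        cold = min(self.static_data, key=self.erase_count.__getitem__)
        if self.erase_count[just_erased] - self.erase_count[cold] > THRESHOLD:
            self.erase(just_erased)                          # one extra erase to relocate
            self.static_data[just_erased] = self.static_data.pop(cold)
            return cold                                      # freed block takes the hot writes
        return None

if __name__ == "__main__":
    flash = Flash(8)
    hot_block = 0
    for _ in range(200):              # a skewed workload hammering a single logical block
        flash.erase(hot_block)
        freed = flash.maybe_level(hot_block)
        if freed is not None:
            hot_block = freed
    print("erase counts per block:", flash.erase_count)
```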


Journal ArticleDOI
TL;DR: A systematic approach to the comparison of the graphics processor (GPU) and reconfigurable logic is defined in terms of three throughput drivers, applied to five case study algorithms, characterized by their arithmetic complexity, memory access requirements, and data dependence.
Abstract: A systematic approach to the comparison of the graphics processor (GPU) and reconfigurable logic is defined in terms of three throughput drivers. The approach is applied to five case study algorithms, characterized by their arithmetic complexity, memory access requirements, and data dependence, and two target devices: the nVidia GeForce 7900 GTX GPU and a Xilinx Virtex-4 field programmable gate array (FPGA). Two orders of magnitude speedup, over a general-purpose processor, is observed for each device for arithmetic intensive algorithms. An FPGA is superior, over a GPU, for algorithms requiring large numbers of regular memory accesses, while the GPU is superior for algorithms with variable data reuse. In the presence of data dependence, the implementation of a customized data path in an FPGA exceeds GPU performance by up to eight times. The trends of the analysis to newer and future technologies are analyzed.

109 citations


Journal ArticleDOI
TL;DR: Hydra is described, a high-performance flash memory SSD architecture that translates the parallelism inherent in multiple flash memory chips into improved performance, by means of both bus-level and chip-level interleaving.
Abstract: Flash memory solid-state disks (SSDs) are replacing hard disk drives (HDDs) in mobile computing systems because of their lower power consumption, faster random access, and greater shock resistance. We describe Hydra, a high-performance flash memory SSD architecture that translates the parallelism inherent in multiple flash memory chips into improved performance, by means of both bus-level and chip-level interleaving. Hydra has a prioritized structure of memory controllers, consisting of a single high-priority foreground unit, to deal with read requests, and multiple background units, all capable of autonomous execution of sequences of high-level flash memory operations. Hydra also employs an aggressive write buffering mechanism based on block mapping to ensure that multiple flash memory chips are used effectively, and also to expedite the processing of write requests. Performance evaluation of an FPGA implementation of the Hydra SSD architecture shows that its performance is more than 80 percent better than the best of the comparable HDDs and SSDs that we considered.

Journal ArticleDOI
TL;DR: RIM strives to efficiently restore network connectivity after a node failure by triggering a local recovery process that relocates the neighbors of the lost node while minimizing the messaging overhead.
Abstract: Recent years have witnessed a growing interest in the applications of wireless sensor networks (WSNs). In some of these applications, such as search and rescue and battlefield reconnaissance, a set of mobile nodes is deployed in order to collectively survey an area of interest and/or perform specific surveillance tasks. Such collaboration among the sensors requires internode interaction and thus maintaining network connectivity is critical to the effectiveness of WSNs. While connectivity can be provisioned at startup time and then sustained through careful coordination when nodes move, a sudden failure of a node poses a challenge since the network may get partitioned. This paper presents RIM; a distributed algorithm for Recovery through Inward Motion. RIM strives to efficiently restore the network connectivity after a node failure. Instead of performing a networkwide analysis to assess the impact of the node failure and orchestrate a course of action, RIM triggers a local recovery process by relocating the neighbors of the lost node. In addition to minimizing the messaging overhead, RIM opts to reduce the distance that the individual nodes have to travel during the recovery. The correctness of the RIM algorithm is proven and the incurred overhead is analyzed. The performance of RIM is validated through simulation experiments.
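
The key geometric step described above is that the failed node's one-hop neighbors move inward toward its last known position; if each stops within half the communication range r of that position, any two relocated neighbors end up within r of each other. A toy sketch of that single step (2D coordinates, illustrative range, no cascaded relocation of further nodes):

```python
# Toy sketch of RIM's inward-motion step (first hop only; cascaded relocation omitted).
# Neighbors of the failed node move toward its last known position until they are within
# r/2 of it; by the triangle inequality any two relocated neighbors are then within r.
import math

R = 10.0  # communication range (illustrative value)

def move_inward(node, failed, r=R):
    dx, dy = failed[0] - node[0], failed[1] - node[1]
    dist = math.hypot(dx, dy)
    if dist <= r / 2:
        return node                       # already close enough, no motion needed
    scale = (dist - r / 2) / dist         # travel just far enough to reach the r/2 disk
    return (node[0] + dx * scale, node[1] + dy * scale)

if __name__ == "__main__":
    failed = (0.0, 0.0)
    neighbors = [(9.0, 0.0), (-6.0, 7.0), (0.0, -9.5)]
    relocated = [move_inward(n, failed) for n in neighbors]
    for a in range(len(relocated)):
        for b in range(a + 1, len(relocated)):
            assert math.dist(relocated[a], relocated[b]) <= R + 1e-9
    print("relocated neighbors:", [(round(x, 2), round(y, 2)) for x, y in relocated])
```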

Journal ArticleDOI
TL;DR: Predictive temperature-aware dynamic voltage and frequency scaling (DVFS) using the performance counters that are already embedded in commercial microprocessors is proposed; results show that in a Linux-based laptop with the Intel Core2 Duo processor, DVFS using the performance counters performs comparably to DVFS using the thermal sensor.
Abstract: In this paper, we propose predictive temperature-aware dynamic voltage and frequency scaling (DVFS) using the performance counters that are already embedded in commercial microprocessors. By using the performance counters and simple regression analysis, we can predict the localized temperature and efficiently scale the voltage/frequency. When localized thermal problems that were not detected by thermal sensors are found after layout (or fabrication), the thermal problems can be avoided by the proposed software solution without delaying time-to-market. The evaluation results show that in a Linux-based laptop with the Intel Core2 Duo processor, DVFS using the performance counters performs comparably to DVFS using the thermal sensor.
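
A minimal sketch of the loop described above, predicting the localized temperature from performance-counter readings with a regression model and throttling when the prediction exceeds a threshold, is given below. The counter set, coefficients, threshold, and use of ordinary least squares are illustrative assumptions, not the authors' calibration.

```python
# Least-squares temperature prediction from performance counters, driving a DVFS decision
# (synthetic data and thresholds; a sketch of the idea, not the authors' calibration).
import numpy as np

rng = np.random.default_rng(0)

# Training phase: counter samples (IPC, L2 misses/kinstr, memory accesses/kinstr)
# paired with temperatures measured once, e.g., during post-layout characterization.
X = rng.uniform([0.5, 1.0, 5.0], [2.5, 30.0, 80.0], size=(200, 3))
true_w = np.array([8.0, 0.3, 0.05])           # hidden coefficients used to synthesize data
temp = 45.0 + X @ true_w + rng.normal(0, 0.5, 200)

A = np.hstack([X, np.ones((200, 1))])         # add an intercept column
w, *_ = np.linalg.lstsq(A, temp, rcond=None)  # fitted regression coefficients

def predicted_temp(counters):
    return float(np.append(counters, 1.0) @ w)

def choose_vf(counters, hot_limit=72.0):
    # Scale down voltage/frequency preemptively if the predicted temperature is too high.
    return "low V/f" if predicted_temp(counters) > hot_limit else "nominal V/f"

if __name__ == "__main__":
    for sample in ([2.4, 25.0, 10.0], [0.8, 3.0, 60.0]):
        print(sample, "->", round(predicted_temp(sample), 1), "C,", choose_vf(sample))
```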

Journal ArticleDOI
TL;DR: The proposed architectures of two parallel decimal multipliers have interesting area-delay figures compared to conventional Booth radix-4 and radix-8 parallel binary multipliers and outperform the figures of previous alternatives for decimal multiplication.
Abstract: The new generation of high-performance decimal floating-point units (DFUs) is demanding efficient implementations of parallel decimal multipliers. In this paper, we describe the architectures of two parallel decimal multipliers. The parallel generation of partial products is performed using signed-digit radix-10 or radix-5 recodings of the multiplier and a simplified set of multiplicand multiples. The reduction of partial products is implemented in a tree structure based on a decimal multioperand carry-save addition algorithm that uses unconventional (non-BCD) decimal-coded number systems. We further detail these techniques and present the new improvements to reduce the latency of the previous designs, which include: optimized digit recoders for the generation of 2n-tuples (and 5-tuples), decimal carry-save adders (CSAs) combining different decimal-coded operands, and carry-free adders implemented by specially designed bit counters. Moreover, we detail a design methodology that combines all these techniques to obtain efficient reduction trees with different area and delay trade-offs for any number of partial products generated. Evaluation results for 16-digit operands show that the proposed architectures have interesting area-delay figures compared to conventional Booth radix-4 and radix-8 parallel binary multipliers and outperform the figures of previous alternatives for decimal multiplication.

Journal ArticleDOI
TL;DR: Both proposed latches are faster than the latches most recently presented in the literature, while providing better or comparable robustness to transient faults, at comparable or lower costs in terms of area and power, respectively.
Abstract: First, a new high-performance robust latch (referred to as HiPeR latch) is presented that is insensitive to transient faults affecting its internal and output nodes by design, independently of the size of its transistors. Then, a modified version of the HiPeR latch (referred to as HiPeR-CG) is proposed that is suitable to be used together with clock gating. Both proposed latches are faster than the latches most recently presented in the literature, while providing better or comparable robustness to transient faults, at comparable or lower costs in terms of area and power, respectively. Therefore, thanks to the good trade-offs in terms of performance, robustness, and cost, our proposed latches are particularly suitable to be adopted on critical paths.

Journal ArticleDOI
TL;DR: This paper first studies the randomized scheduling algorithm via both analysis and simulations in terms of network coverage intensity, detection delay, and detection probability, proves that an optimal solution exists, and provides conditions for the existence of the optimal solutions.
Abstract: In wireless sensor networks, some sensor nodes are put in sleep mode while other sensor nodes are in active mode for sensing and communication tasks in order to reduce energy consumption and extend network lifetime. This approach is a special case (k=2) of a randomized scheduling algorithm, in which k subsets of sensors work alternatively. In this paper, we first study the randomized scheduling algorithm via both analysis and simulations in terms of network coverage intensity, detection delay, and detection probability. We further study asymptotic coverage and other properties. Finally, we analyze a problem of maximizing network lifetime under quality of service constraints such as bounded detection delay, detection probability, and network coverage intensity. We prove that the optimal solution exists, and provide conditions of the existence of the optimal solutions.
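
Under the randomized scheduling described above (each sensor independently joins one of k subsets, and the subsets take turns being active), a point monitored by m sensors is covered in a slot unless all m of them sit in inactive subsets, giving a coverage intensity of 1 - (1 - 1/k)^m. The short Monte Carlo check below confirms that expression; the parameter values are arbitrary.

```python
# Monte Carlo check of the coverage intensity of randomized k-subset scheduling:
# a point monitored by m sensors is covered in a slot with probability 1 - (1 - 1/k)^m.
import random

def simulated_intensity(k, m, trials=200_000):
    covered_slots = 0
    for _ in range(trials):
        subsets = [random.randrange(k) for _ in range(m)]   # each sensor picks its subset
        active = random.randrange(k)                        # the subset awake in this slot
        covered_slots += any(s == active for s in subsets)
    return covered_slots / trials

if __name__ == "__main__":
    for k, m in [(2, 3), (4, 5), (8, 10)]:
        analytic = 1 - (1 - 1 / k) ** m
        print(f"k={k} m={m}: simulated {simulated_intensity(k, m):.4f}, analytic {analytic:.4f}")
```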

Journal ArticleDOI
TL;DR: This work proposes an adaptive reputation management system that recognizes that changes in node behavior may be driven by changes in network conditions and accommodates such changes by adapting its operating parameters; a time-slotted approach is introduced to allow the evaluation function to quickly and accurately capture changes in node behavior.
Abstract: Reputation management systems have been proposed as a cooperation enforcement solution in ad hoc networks. Typically, the functions of reputation management (evaluation, detection, and reaction) are carried out homogeneously across time and space. However, the dynamic nature of ad hoc networks causes node behavior to vary both spatially and temporally due to changes in local and network-wide conditions. When reputation management functions do not adapt to such changes, their effectiveness, measured in terms of accuracy (correct identification of node behavior) and promptness (timely identification of node misbehavior), may be compromised. We propose an adaptive reputation management system that realizes that changes in node behavior may be driven by changes in network conditions and that accommodates such changes by adapting its operating parameters. We introduce a time-slotted approach to allow the evaluation function to quickly and accurately capture changes in node behavior. We show how the duration of an evaluation slot can adapt according to the network's activity to enhance the system accuracy and promptness. We then show how the detection function can utilize a Sequential Probability Ratio Test (SPRT) to distinguish between cooperative and misbehaving neighbors. The SPRT adapts to changes in neighbors' behavior that are a by-product of changing network conditions, by using the node's own behavior as a benchmark. We compare our proposed solution to a nonadaptive system, showing the ability of our system to achieve high accuracy and promptness in dynamic environments. To the best of our knowledge, this is the first work to explore the adaptation of the reputation management functions to changes in network conditions.
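
The detection function above is built on Wald's Sequential Probability Ratio Test applied to per-interaction observations (e.g., forwarded versus dropped packets). A generic Bernoulli SPRT is sketched below; the cooperative and misbehaving drop rates and the error bounds are arbitrary choices, not values from the paper.

```python
# Wald's SPRT distinguishing a cooperative neighbor (drop rate p0) from a misbehaving
# one (drop rate p1 > p0). Observations arrive one at a time; the test stops as soon as
# the log-likelihood ratio crosses a threshold. Rates and error bounds are illustrative.
import math
import random

def sprt(observations, p0=0.1, p1=0.4, alpha=0.01, beta=0.01):
    upper = math.log((1 - beta) / alpha)      # crossing above: decide "misbehaving"
    lower = math.log(beta / (1 - alpha))      # crossing below: decide "cooperative"
    llr = 0.0
    for n, dropped in enumerate(observations, start=1):
        llr += math.log(p1 / p0) if dropped else math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "misbehaving", n
        if llr <= lower:
            return "cooperative", n
    return "undecided", n

if __name__ == "__main__":
    random.seed(7)
    good = (random.random() < 0.1 for _ in range(1000))   # neighbor dropping 10% of packets
    bad = (random.random() < 0.4 for _ in range(1000))    # neighbor dropping 40% of packets
    print(sprt(good), sprt(bad))
```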

Journal ArticleDOI
TL;DR: This study introduces a hybrid approach to large SSDs that combines MLC NAND flash and SLC NAND flash, showing that the two can complement each other in a hybrid SSD.
Abstract: Replacing power-hungry disks with NAND-flash-based solid-state disks (SSDs) is a recently emerging trend in flash-memory applications. One important SSD design issue is achieving a good balance between cost, performance, and lifetime. This study introduces a hybrid approach to large SSDs that combines MLC NAND flash and SLC NAND flash. Each of these flash architectures has its own drawbacks and benefits, and this study proposes that the two can complement each other. However, there are technical challenges pertaining to data placement, data migration, and wear leveling in heterogeneous NAND flash. The experimental results of our study show that combining 256 MB SLC flash with 20 GB MLC flash produces a hybrid SSD. This hybrid SSD is 1.8 times faster than a purely MLC-flash-based SSD in terms of average response time and improves energy consumption by 46 percent. The proposed hybrid SSD costs only four percent more than a purely MLC-flash-based SSD. The extra cost of a hybrid SSD is very limited and rewarding.

Journal ArticleDOI
TL;DR: This work proposes two heterogeneous quorum-based asynchronous wake-up scheduling schemes for wireless sensor networks, proves that any two grid quorum systems automatically form a gqs-pair, and analytically establishes the performance trade-offs of both designs.
Abstract: We present heterogeneous quorum-based asynchronous wake-up scheduling schemes for wireless sensor networks. The schemes can ensure that two nodes that adopt different quorum systems as their wake-up schedules can hear each other at least once in bounded time intervals. We propose two such schemes: cyclic quorum system pair (cqs-pair) and grid quorum system pair (gqs-pair). The cqs-pair which contains two cyclic quorum systems provides an optimal solution, in terms of energy saving ratio, for asynchronous wake-up scheduling. To quickly assemble a cqs-pair, we present a fast construction scheme which is based on the multiplier theorem and the (N, k, M, l)-difference pair defined by us. Regarding the gqs-pair, we prove that any two grid quorum systems will automatically form a gqs-pair. We further analyze the performance of both designs, in terms of average discovery delay, quorum ratio, and energy saving ratio. We show that our designs achieve better trade-off between the average discovery delay and quorum ratio (and thus energy consumption) for different cycle lengths. We implemented the proposed designs in a wireless sensor network platform of TelosB motes. Our implementation-based measurements further validate the analytically-established performance trade-off of our designs.
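
The gqs-pair claim can be checked by brute force for one concrete pair of grid quorum systems: build a quorum (a full row plus a full column of an n x n array of slots) for each node and verify that, for every relative clock offset, the two repeating schedules share at least one awake slot within a hyperperiod. The grid sizes and row/column choices below are arbitrary.

```python
# Brute-force check that two nodes using grid quorums of different sizes (here 4x4 and 5x5)
# overlap in at least one awake slot for every relative offset, i.e., they form a gqs-pair.
from math import lcm

def grid_quorum(n, row, col):
    # Awake slots within a cycle of n*n slots: all of row `row` plus all of column `col`.
    return {row * n + j for j in range(n)} | {i * n + col for i in range(n)}

def always_overlap(qa, cycle_a, qb, cycle_b):
    period = lcm(cycle_a, cycle_b)
    for offset in range(period):                       # every possible clock misalignment
        if not any(t % cycle_a in qa and (t + offset) % cycle_b in qb
                   for t in range(period)):
            return False
    return True

if __name__ == "__main__":
    qa = grid_quorum(4, row=1, col=2)    # node A: 16-slot cycle
    qb = grid_quorum(5, row=3, col=0)    # node B: 25-slot cycle
    print("bounded-time discovery guaranteed:", always_overlap(qa, 16, qb, 25))
```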

Journal ArticleDOI
TL;DR: A counter architecture for online DVFS profitability estimation on superscalar out-of-order processors that can accurately estimate the performance and energy consumption at different V/f operating points from a single program execution is proposed.
Abstract: Dynamic voltage and frequency scaling (DVFS) is a well known and effective technique for reducing power consumption in modern microprocessors. An important concern though is to estimate its profitability in terms of performance and energy. Current DVFS profitability estimation approaches, however, lack accuracy or incur runtime performance and/or energy overhead. This paper proposes a counter architecture for online DVFS profitability estimation on superscalar out-of-order processors. The counter architecture teases apart the fraction of the execution time that is susceptible to clock frequency versus the fraction that is insusceptible to clock frequency. By doing so, the counter architecture can accurately estimate the performance and energy consumption at different V/f operating points from a single program execution. The DVFS counter architecture estimates performance, energy consumption, and energy-delay-squared product (ED^2P) within 0.2, 0.5, and 0.8 percent on average, respectively, over a 4x frequency range. Further, the counter architecture incurs a small hardware cost and is an enabler for online DVFS scheduling both at the intracore as well as at the intercore level in a multicore processor.

Journal ArticleDOI
TL;DR: The results show that the proposed hash-based monitoring pattern can detect attacks within one instruction cycle at lower memory requirements than traditional approaches that use control flow information.
Abstract: The inherent limitations of embedded systems make them particularly vulnerable to attacks. We have developed a hardware monitor that operates in parallel to an embedded processor and detects any attack that causes the embedded processor to deviate from its originally programmed behavior. We explore several different characteristics that can be used for monitoring and quantify trade-offs between these approaches. Our results show that our proposed hash-based monitoring pattern can detect attacks within one instruction cycle at lower memory requirements than traditional approaches that use control flow information.
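
The monitoring idea above can be illustrated in a few lines: hash each basic block of the program at build time, then have the monitor recompute the hash of the instruction stream actually executed and raise an alarm on the first mismatch. In the sketch below, strings stand in for machine instructions and SHA-256 for the monitor's hash; both are illustrative stand-ins, not the paper's hardware design.

```python
# Toy model of hash-based execution monitoring: the monitor holds reference hashes of each
# basic block and checks the executed instruction stream block by block (strings stand in
# for real machine instructions; the hash choice and granularity are illustrative).
import hashlib

def block_hash(instructions):
    return hashlib.sha256("\n".join(instructions).encode()).hexdigest()

# Reference hashes extracted from the original binary at build time.
program = {
    "bb0": ["load r1, [r2]", "add r1, r1, #1", "store [r2], r1"],
    "bb1": ["cmp r1, #10", "blt bb0"],
}
reference = {name: block_hash(insns) for name, insns in program.items()}

def monitor(executed_trace):
    # executed_trace: list of (block name, instructions actually fetched/executed)
    for name, insns in executed_trace:
        if block_hash(insns) != reference.get(name):
            return f"attack detected in {name}"
    return "execution matches original program"

if __name__ == "__main__":
    clean = [("bb0", program["bb0"]), ("bb1", program["bb1"])]
    tampered = [("bb0", ["load r1, [r2]", "jmp 0xdeadbeef", "store [r2], r1"])]
    print(monitor(clean))
    print(monitor(tampered))
```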

Journal ArticleDOI
TL;DR: This paper tackles the problem of deriving a Petri net from a state-based model using the theory of regions, dropping some of the restrictions required in the traditional approach and adding significant extensions that make the approach applicable in new scenarios.
Abstract: The theory of regions was introduced in the early nineties as a method to bridge state and event-based models. This paper tackles the problem of deriving a Petri net from a state-based model, using the theory of regions. Some of the restrictions required in the traditional approach are dropped in this paper, together with significant extensions that make the approach applicable in new scenarios. One of these scenarios is Process Mining, where accepting (discovering) additional behavior in the synthesized Petri net is sometimes valued. The algorithmic emphasis of this paper helps dispel the view that the theory of regions is only a good theoretical exercise, opening the door for its application in the industrial domain.

Journal ArticleDOI
TL;DR: This paper deals with the problem of scheduling multiple applications, made of collections of independent and identical tasks, on a heterogeneous master-worker platform, and introduces a heuristic for the general case of online applications.
Abstract: Scheduling problems are already difficult on traditional parallel machines, and they become extremely challenging on heterogeneous clusters. In this paper, we deal with the problem of scheduling multiple applications, made of collections of independent and identical tasks, on a heterogeneous master-worker platform. The applications are submitted online, which means that there is no a priori (static) knowledge of the workload distribution at the beginning of the execution. The objective is to minimize the maximum stretch, i.e., the maximum ratio between the actual time an application has spent in the system and the time this application would have spent if executed alone. On the theoretical side, we design an optimal algorithm for the offline version of the problem (when all release dates and application characteristics are known beforehand). We also introduce a heuristic for the general case of online applications. On the practical side, we have conducted extensive simulations and MPI experiments, showing that we are able to deal with very large problem instances in a few seconds. Also, the solution that we compute totally outperforms classical heuristics from the literature, thereby fully assessing the usefulness of our approach.

Journal ArticleDOI
TL;DR: The results show that, in terms of speed, the proposed architecture outperforms the modular multiplier based on standard modular multiplication by more than 50 percent, while consuming less area than the standard solutions.
Abstract: This paper proposes two improved interleaved modular multiplication algorithms based on Barrett and Montgomery modular reduction. The algorithms are simple and especially suitable for hardware implementations. Four large sets of moduli for which the proposed methods apply are given and analyzed from a security point of view. By considering state-of-the-art attacks on public-key cryptosystems, we show that the proposed sets are safe to use, in practice, for both elliptic curve cryptography and RSA cryptosystems. We propose a hardware architecture for the modular multiplier that is based on our methods. The results show that concerning the speed, our proposed architecture outperforms the modular multiplier based on standard modular multiplication by more than 50 percent. Additionally, our design consumes less area compared to the standard solutions.
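
For background on what interleaved modular multiplication means, here is the classical bit-serial form, in which reduction is folded into the shift-and-add loop so the intermediate value never grows much beyond the modulus. This is the textbook baseline, not the improved Barrett- and Montgomery-based variants the paper proposes.

```python
# Classical interleaved (bit-serial) modular multiplication: reduction is interleaved with
# the shift-and-add accumulation, keeping the running value below the modulus after each step.
# This is the textbook baseline, not the improved Barrett/Montgomery variants of the paper.

def interleaved_modmul(a, b, m):
    assert 0 <= a < m and 0 <= b < m
    r = 0
    for i in reversed(range(m.bit_length())):
        r <<= 1                      # shift: r = 2*r
        if (a >> i) & 1:
            r += b                   # conditional add of the multiplicand
        if r >= m:                   # at most two conditional subtractions per step
            r -= m
        if r >= m:
            r -= m
    return r

if __name__ == "__main__":
    import random
    m = (1 << 255) - 19              # an arbitrary odd modulus for testing
    for _ in range(1000):
        a, b = random.randrange(m), random.randrange(m)
        assert interleaved_modmul(a, b, m) == (a * b) % m
    print("interleaved modular multiplication matches (a*b) % m")
```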

Journal ArticleDOI
TL;DR: The simulation results demonstrate that the hybrid peer-to-peer system can utilize both the efficiency of the structured peer-to-peer network and the flexibility of the unstructured peer-to-peer network and achieve a good balance between the two types of networks.
Abstract: Peer-to-peer overlay networks are widely used in distributed systems. Based on whether a regular topology is maintained among peers, peer-to-peer networks can be divided into two categories: structured peer-to-peer networks in which peers are connected by a regular topology, and unstructured peer-to-peer networks in which the topology is arbitrary. Structured peer-to-peer networks usually can provide efficient and accurate services but need to spend a lot of effort in maintaining the regular topology. On the other hand, unstructured peer-to-peer networks are extremely resilient to the frequent peer joining and leaving but this is usually achieved at the expense of efficiency. The objective of this work is to design a hybrid peer-to-peer system for distributed data sharing which combines the advantages of both types of peer-to-peer networks and minimizes their disadvantages. The proposed hybrid peer-to-peer system is composed of two parts: the first part is a structured core network which forms the backbone of the hybrid system; the second part is made of multiple unstructured peer-to-peer networks each of which is attached to a node in the core network. The core structured network can narrow down the data lookup within a certain unstructured network accurately, while the unstructured networks provide a low-cost mechanism for peers to join or leave the system freely. A data lookup operation first checks the local unstructured network, and then, the structured network. This two-tier hierarchy can decouple the flexibility of the system from the efficiency of the system. Our simulation results demonstrate that the hybrid peer-to-peer system can utilize both the efficiency of structured peer-to-peer network and the flexibility of the unstructured peer-to-peer network and achieve a good balance between the two types of networks.

Journal ArticleDOI
TL;DR: This work presents an extendable broadcast authentication scheme called X-TESLA, a new member of the TESLA family, to remedy the fact that previous schemes do not consider problems arising from sleep modes, network failures, and idle sessions, as well as the time-memory-data tradeoff risk, and to reduce their high cost of countering DoS attacks.
Abstract: Authenticated broadcast, enabling a base station to send commands and requests to low-powered sensor nodes in an authentic manner, is one of the core challenges for securing wireless sensor networks. μTESLA and its multilevel variants based on delayed exposure of one-way chains are well known valuable broadcast authentication schemes, but concerns still remain for their practical application. To use these schemes on resource-limited sensor nodes, a 64-bit key chain is desirable for efficiency, but care must be taken. We will first show, by both theoretical analysis and rigorous experiments on real sensor nodes, that if μTESLA is implemented in a raw form with 64-bit key chains, some of the future keys can be discovered through time-memory-data-tradeoff techniques. We will then present an extendable broadcast authentication scheme called X-TESLA, as a new member of the TESLA family, to remedy the fact that previous schemes do not consider problems arising from sleep modes, network failures, idle sessions, as well as the time-memory-data tradeoff risk, and to reduce their high cost of countering DoS attacks. In X-TESLA, two levels of chains that have distinct intervals and cross-authenticate each other are used. This allows the short key chains to continue indefinitely and makes new interesting strategies and management methods possible, significantly reducing unnecessary computation and buffer occupation, and leads to efficient solutions to the raised problems.
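
The building block shared by μTESLA and X-TESLA is a one-way key chain used in reverse: keys are generated by repeated hashing, the last element is distributed as a commitment, and each later-disclosed key is verified by hashing it forward to an already-trusted element. A minimal single-chain sketch is below (SHA-256 truncated to 64 bits, echoing the 64-bit chains discussed above); the two-level cross-authenticating structure of X-TESLA is not shown.

```python
# One-way key chain as used by the TESLA family: K_i = H(K_{i+1}); the commitment K_0 is
# preloaded on sensor nodes, keys are disclosed in increasing index order, and a disclosed
# key is authenticated by hashing it back to an already-trusted chain element.
import hashlib, os

def h(key):
    return hashlib.sha256(key).digest()[:8]   # truncated to 64 bits, as discussed above

def build_chain(length, seed=None):
    k = seed or os.urandom(8)
    chain = [k]
    for _ in range(length):
        k = h(k)
        chain.append(k)
    chain.reverse()                 # chain[0] = commitment, chain[-1] = last key disclosed
    return chain

def verify(disclosed_key, trusted_key, max_steps):
    # Accept if hashing the disclosed key at most max_steps times reaches a trusted element.
    k = disclosed_key
    for _ in range(max_steps):
        k = h(k)
        if k == trusted_key:
            return True
    return False

if __name__ == "__main__":
    chain = build_chain(1000)
    commitment = chain[0]
    assert verify(chain[3], commitment, max_steps=3)        # key for interval 3
    assert not verify(os.urandom(8), commitment, max_steps=3)
    print("disclosed keys authenticate against the commitment")
```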

Journal ArticleDOI
TL;DR: This work is the first attempt to accelerate a bioinformatics application using NoC; it proposes optimized NoC architectures for sequence alignment algorithms originally designed for distributed-memory parallel computers and provides a thorough comparative evaluation of their respective performance and energy dissipation.
Abstract: The most pervasive compute operation carried out in almost all bioinformatics applications is pairwise sequence homology detection (or sequence alignment). Due to exponentially growing sequence databases, computing this operation at a large-scale is becoming expensive. An effective approach to speed up this operation is to integrate a very high number of processing elements in a single chip so that the massive scales of fine-grain parallelism inherent in several bioinformatics applications can be exploited efficiently. Network-on-chip (NoC) is a very efficient method to achieve such large-scale integration. In this work, we propose to bridge the gap between data generation and processing in bioinformatics applications by designing NoC architectures for the sequence alignment operation. Specifically, we 1) propose optimized NoC architectures for different sequence alignment algorithms that were originally designed for distributed memory parallel computers and 2) provide a thorough comparative evaluation of their respective performance and energy dissipation. While accelerators using other hardware architectures such as FPGA, general purpose graphics processing unit (GPU), and the cell broadband engine (CBE) have been previously designed for sequence alignment, the NoC paradigm enables integration of a much larger number of processing elements on a single chip and also offers a higher degree of flexibility in placing them along the die to suit the underlying algorithm. The results show that our NoC-based implementations can provide above 10^2- to 10^3-fold speedup over other hardware accelerators and above 10^4-fold speedup over traditional CPU architectures. This is significant because it will drastically reduce the time required to perform the millions of alignment operations that are typical in large-scale bioinformatics projects. To the best of our knowledge, this work embodies the first attempt to accelerate a bioinformatics application using NoC.

Journal ArticleDOI
TL;DR: Three heuristic algorithms are proposed to determine suitable partitions to satisfy HW/SW partitioning constraints and empirical results show that the proposed algorithms produce comparable and often better solutions when compared to the latest algorithm while reducing the time complexity significantly.
Abstract: Hardware/software (HW/SW) partitioning is one of the key challenges in HW/SW codesign. This paper presents efficient algorithms for the HW/SW partitioning problem, which has been proved to be NP-hard. We reduce the HW/SW partitioning problem to a variation of the knapsack problem that is approximately solved by searching a 1D solution space, instead of the 2D solution space searched in the latest work cited in this paper, to reduce time complexity. Three heuristic algorithms are proposed to determine suitable partitions to satisfy HW/SW partitioning constraints. We have shown that the time complexity for partitioning a graph with n nodes and m edges is significantly reduced from O(d_x · d_y · n^3) to O(n log n + d · (n + m)), where d and d_x · d_y are the number of the fragments of the searched 1D solution space and the searched 2D solution space, respectively. The lower bound on the solution quality is also derived based on the new computing model to show that it is comparable to that reported in the literature. Moreover, empirical results show that the proposed algorithms produce comparable and often better solutions when compared to the latest algorithm while reducing the time complexity significantly.
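
The knapsack view mentioned above, treating each task's move to hardware as an item whose weight is its hardware area and whose value is the software time it saves, can be illustrated with an ordinary greedy heuristic under an area budget. The cost model is a simplification (communication costs are ignored) and is not one of the paper's three algorithms.

```python
# Greedy knapsack-style HW/SW partitioning sketch: move to hardware the task-graph nodes
# with the best (software time saved) / (hardware area) ratio until the area budget is
# spent. A simplification of the paper's formulation; communication costs are ignored.

def partition(nodes, area_budget):
    # nodes: list of (name, sw_time, hw_time, hw_area)
    by_ratio = sorted(nodes, key=lambda n: (n[1] - n[2]) / n[3], reverse=True)
    hw, used = set(), 0.0
    for name, sw_t, hw_t, area in by_ratio:
        if sw_t > hw_t and used + area <= area_budget:
            hw.add(name)
            used += area
    total_time = sum(hw_t if name in hw else sw_t for name, sw_t, hw_t, _ in nodes)
    return hw, used, total_time

if __name__ == "__main__":
    tasks = [("fft", 50.0, 5.0, 30.0), ("fir", 20.0, 4.0, 10.0),
             ("crc", 8.0, 1.0, 2.0), ("ctrl", 5.0, 4.0, 25.0)]
    hw, used, t = partition(tasks, area_budget=40.0)
    print(f"hardware set {sorted(hw)}, area used {used}, total execution time {t}")
```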

Journal ArticleDOI
TL;DR: The problem of hierarchical bandwidth sharing in a dynamic spectrum access (or cognitive radio) environment is formulated as an interrelated market model used in microeconomics, in which a multiple-level market is established among the primary, secondary, tertiary, and quaternary services.
Abstract: We consider the problem of hierarchical bandwidth sharing in dynamic spectrum access (or cognitive radio) environment. In the system model under consideration, licensed service (i.e., primary service) can share/sell its available bandwidth to an unlicensed service (i.e., secondary service), and again, this unlicensed service can share/sell its allocated bandwidth to other services (i.e., tertiary and quaternary services). We formulate the problem of hierarchical bandwidth sharing as an interrelated market model used in microeconomics for which a multiple-level market is established among the primary, secondary, tertiary, and quaternary services. We use the concept of demand and supply functions to obtain the equilibrium at which all the services are satisfied with the amount of allocated bandwidth and the price. These demand and supply functions are derived based on the utility of the connections using the different services (i.e., primary, secondary, tertiary, and quaternary services). For distributed implementation of the hierarchical bandwidth sharing model in a system in which global information is not available, iterative algorithms are proposed through which each service adapts its strategies to reach the equilibrium. The system stability condition is analyzed for these algorithms. Finally, we demonstrate the application of the proposed model to achieve dynamic bandwidth sharing in an integrated WiFi-WiMAX network.
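
The equilibrium computation referred to above, demand and supply functions meeting at a price both sides accept, can be illustrated with a one-level toy version of the iterative adjustment: the price moves in proportion to excess demand until the market clears. The linear demand and supply curves and the step size are assumptions for illustration, not the utility-derived functions of the paper.

```python
# Toy iterative price adjustment (tatonnement) for one level of the bandwidth market:
# the price moves in proportion to excess demand until demand and supply meet.
# Linear demand/supply curves and the step size are illustrative assumptions.

def demand(price):
    return max(0.0, 20.0 - 2.0 * price)     # buyers want less bandwidth as the price rises

def supply(price):
    return max(0.0, 3.0 * price - 5.0)      # the seller offers more as the price rises

def find_equilibrium(step=0.05, tol=1e-6, max_iters=10_000):
    price = 1.0
    for _ in range(max_iters):
        excess = demand(price) - supply(price)
        if abs(excess) < tol:
            break
        price += step * excess               # raise the price if demand exceeds supply
    return price, demand(price)

if __name__ == "__main__":
    p, q = find_equilibrium()
    print(f"equilibrium price {p:.3f}, bandwidth traded {q:.3f}")
```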

Journal ArticleDOI
TL;DR: It is demonstrated empirically that the H-TRAA provides orders of magnitude faster convergence compared to the LAKG for simulated data pertaining to two-material unit-value functions, and opens avenues for handling demanding real-world applications such as the allocation of sampling resources in large-scale Web accessibility assessment problems.
Abstract: In a multitude of real-world situations, resources must be allocated based on incomplete and noisy information. However, in many cases, incomplete and noisy information render traditional resource allocation techniques ineffective. The decentralized Learning Automata Knapsack Game (LAKG) was recently proposed for solving one such class of problems, namely the class of Stochastic Nonlinear Fractional Knapsack Problems. Empirically, the LAKG was shown to yield a superior performance when compared to methods which are based on traditional parameter estimation schemes. This paper presents a completely new online Learning Automata (LA) system, namely the Hierarchy of Twofold Resource Allocation Automata (H-TRAA). In terms of contributions, we first of all, note that the primitive component of the H-TRAA is a Twofold Resource Allocation Automaton (TRAA) which possesses novelty in the field of LA. Second, the paper contains a formal analysis of the TRAA, including a rigorous proof for its convergence. Third, the paper proves the convergence of the H-TRAA itself. Finally, we demonstrate empirically that the H-TRAA provides orders of magnitude faster convergence compared to the LAKG for simulated data pertaining to two-material unit-value functions. Indeed, in contrast to the LAKG, the H-TRAA scales sublinearly. Consequently, we believe that the H-TRAA opens avenues for handling demanding real-world applications such as the allocation of sampling resources in large-scale Web accessibility assessment problems. We are currently working on applying the H-TRAA solution to the web-polling and sample-size detection problems applicable to the world wide web.