
Showing papers in "IEEE Transactions on Computers in 2009"


Journal Article•DOI•
TL;DR: It is demonstrated that a 512-byte SRAM fingerprint contains sufficient entropy to generate 128-bit true random numbers and that the generated numbers pass the NIST tests for runs, approximate entropy, and block frequency.
Abstract: Intermittently powered applications create a need for low-cost security and privacy in potentially hostile environments, supported by primitives including identification and random number generation. Our measurements show that power-up of SRAM produces a physical fingerprint. We propose a system of fingerprint extraction and random numbers in SRAM (FERNS) that harvests static identity and randomness from existing volatile CMOS memory without requiring any dedicated circuitry. The identity results from manufacture-time physically random device threshold voltage mismatch, and the random numbers result from runtime physically random noise. We use experimental data from high-performance SRAM chips and the embedded SRAM of the WISP UHF RFID tag to validate the principles behind FERNS. For the SRAM chip, we demonstrate that 8-byte fingerprints can uniquely identify circuits among a population of 5,120 instances and extrapolate that 24-byte fingerprints would uniquely identify all instances ever produced. Using a smaller population, we demonstrate similar identifying ability from the embedded SRAM. In addition to identification, we show that SRAM fingerprints capture noise, enabling true random number generation. We demonstrate that a 512-byte SRAM fingerprint contains sufficient entropy to generate 128-bit true random numbers and that the generated numbers pass the NIST tests for runs, approximate entropy, and block frequency.

846 citations
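
For illustration, a minimal Python sketch of the FERNS idea above: a small slice of the SRAM power-up state acts as a device fingerprint, and a larger region is distilled into a 128-bit random number. The extractor (SHA-256) and the simulated dump are stand-in assumptions, not the paper's own extraction procedure.

    import hashlib
    import os

    def sram_fingerprint(powerup_bytes, offset=0, length=8):
        # An 8-byte slice of the power-up state serves as the device fingerprint.
        return powerup_bytes[offset:offset + length]

    def true_random_128(powerup_bytes, offset=0, length=512):
        # Distill 128 bits from a 512-byte power-up region.  SHA-256 acts only
        # as a generic entropy-extractor stand-in, not the paper's own procedure.
        region = powerup_bytes[offset:offset + length]
        return hashlib.sha256(region).digest()[:16]   # 16 bytes = 128 bits

    dump = os.urandom(2048)      # placeholder for a real SRAM power-up capture
    print(sram_fingerprint(dump).hex(), true_random_128(dump).hex())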


Journal Article•DOI•
TL;DR: 3D NoC architectures are evaluated, demonstrating superior functionality in terms of throughput, latency, energy dissipation, and wiring area overhead compared to traditional 2D implementations.
Abstract: The Network-on-Chip (NoC) paradigm has emerged as a revolutionary methodology for integrating a very high number of intellectual property (IP) blocks in a single die. The achievable performance benefit arising out of adopting NoCs is constrained by the performance limitation imposed by the metal wire, which is the physical realization of communication channels. With technology scaling, material innovation alone will extend the lifetime of conventional interconnect systems by only a few technology generations. According to the International Technology Roadmap for Semiconductors (ITRS), new interconnect paradigms are needed for the longer term. The conventional two dimensional (2D) integrated circuit (IC) has limited floor-planning choices, and consequently it limits the performance enhancements arising out of NoC architectures. Three dimensional (3D) ICs are capable of achieving better performance, functionality, and packaging density compared to more traditional planar ICs. On the other hand, NoC is an enabling solution for integrating large numbers of embedded cores in a single die. 3D NoC architectures combine the benefits of these two new domains to offer an unprecedented performance gain. In this paper we evaluate the performance of 3D NoC architectures and demonstrate their superior functionality in terms of throughput, latency, energy dissipation and wiring area overhead compared to traditional 2D implementations.

474 citations


Journal Article•DOI•
TL;DR: The unique QCA characteristics are utilized to design a carry flow adder that is fast and efficient, and the design of serial parallel multipliers is explored; simulations indicate very attractive performance.
Abstract: Quantum-dot cellular automata (QCA) is an emerging nanotechnology, with the potential for faster speed, smaller size, and lower power consumption than transistor-based technology. Quantum-dot cellular automata has a simple cell as the basic element. The cell is used as a building block to construct gates and wires. Previously, adder designs based on conventional designs were examined for implementation with QCA technology. That work demonstrated that the design trade-offs are very different in QCA. This paper utilizes the unique QCA characteristics to design a carry flow adder that is fast and efficient. Simulations indicate very attractive performance (i.e., complexity, area, and delay). This paper also explores the design of serial parallel multipliers. A serial parallel multiplier is designed and simulated with several different operand sizes.

342 citations


Journal Article•DOI•
TL;DR: It is argued that in this model, the product combining is more efficient not only than absolute difference combining, but also than all the other combining techniques proposed in the literature.
Abstract: Second order Differential Power Analysis (2O-DPA) is a powerful side-channel attack that allows an attacker to bypass the widely used masking countermeasure. To thwart 2O-DPA, higher order masking may be employed but it implies a nonnegligible overhead. In this context, there is a need to know how efficient a 2O-DPA can be, in order to evaluate the resistance of an implementation that uses first order masking and, possibly, some hardware countermeasures. Different methods of mounting a practical 2O-DPA attack have been proposed in the literature. However, it is not yet clear which of these methods is the most efficient. In this paper, we give a formal description of the higher order DPA that are mounted against software implementations. We then introduce a framework in which the attack efficiencies may be compared. The attacks we focus on involve the combining of several leakage signals and the computation of correlation coefficients to discriminate the wrong key hypotheses. In the second part of this paper, we pay particular attention to 2O-DPA that involves the product combining or the absolute difference combining. We study them under the assumption that the device leaks the Hamming weight of the processed data together with an independent Gaussian noise. After showing a way to improve the product combining, we argue that in this model, the product combining is more efficient not only than absolute difference combining, but also than all the other combining techniques proposed in the literature.

288 citations
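
A brief sketch of the centered-product combining and correlation step discussed above, under the Hamming-weight leakage model assumed in the paper; the NumPy array layout and the S-box table argument are illustrative assumptions.

    import numpy as np

    def hamming_weight(x):
        return bin(int(x)).count("1")

    HW = np.vectorize(hamming_weight)

    def product_combine(l1, l2):
        # Centered product combining of the two leakage samples.
        return (l1 - l1.mean()) * (l2 - l2.mean())

    def second_order_dpa(l1, l2, plaintexts, sbox):
        # l1, l2: leakage of the two shares (length-N arrays); plaintexts: known
        # inputs; sbox: 256-entry NumPy array.  Returns |correlation| per key guess.
        combined = product_combine(l1, l2)
        scores = np.empty(256)
        for k in range(256):
            model = HW(sbox[plaintexts ^ k]).astype(float)  # HW of unmasked S-box output
            scores[k] = abs(np.corrcoef(combined, model)[0, 1])
        return scores   # the correct key guess should yield the highest score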


Journal Article•DOI•
TL;DR: FPC is described and evaluated, a fast lossless compression algorithm for linear streams of 64-bit floating-point data that works well on hard-to-compress scientific data sets and meets the throughput demands of high-performance systems.
Abstract: Many scientific programs exchange large quantities of double-precision data between processing nodes and with mass storage devices. Data compression can reduce the number of bytes that need to be transferred and stored. However, data compression is only likely to be employed in high-end computing environments if it does not impede the throughput. This paper describes and evaluates FPC, a fast lossless compression algorithm for linear streams of 64-bit floating-point data. FPC works well on hard-to-compress scientific data sets and meets the throughput demands of high-performance systems. A comparison with five lossless compression schemes, BZIP2, DFCM, FSD, GZIP, and PLMI, on 4 architectures and 13 data sets shows that FPC compresses and decompresses one to two orders of magnitude faster than the other algorithms at the same geometric-mean compression ratio. Moreover, FPC provides a guaranteed throughput as long as the prediction tables fit into the L1 data cache. For example, on a 1.6-GHz Itanium 2 server, the throughput is 670 Mbytes/s regardless of what data are being compressed.

224 citations
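
A simplified sketch of the predict-and-XOR idea behind FPC: a table-based predictor guesses the next 64-bit value, and only the nonzero residual bytes are stored. The real FPC pairs an FCM with a DFCM predictor and packs headers more tightly; the hash update and byte-level header below are illustrative simplifications.

    import struct

    def compress_stream(doubles, table_bits=16):
        table = [0] * (1 << table_bits)
        hash_ctx = 0
        out = bytearray()
        for d in doubles:
            bits = struct.unpack("<Q", struct.pack("<d", d))[0]
            pred = table[hash_ctx]              # predicted 64-bit value
            table[hash_ctx] = bits
            hash_ctx = ((hash_ctx << 6) ^ (bits >> 48)) & ((1 << table_bits) - 1)
            residual = bits ^ pred              # small if the prediction was close
            nbytes = (residual.bit_length() + 7) // 8   # 0..8 residual bytes
            out.append(nbytes)                  # 1-byte header (FPC uses fewer bits)
            out += residual.to_bytes(nbytes, "little")
        return bytes(out)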


Journal Article•DOI•
TL;DR: This paper proposes new results on necessary and sufficient schedulability analysis for EDF scheduling; the new results exponentially reduce the calculation times in all situations for schedulable task sets and in most situations for unschedulable task sets.
Abstract: Real-time scheduling is the theoretical basis of real-time systems engineering. Earliest deadline first (EDF) is an optimal scheduling algorithm for uniprocessor real-time systems. Existing results on an exact schedulability test for EDF task systems with arbitrary relative deadlines need to calculate the processor demand of the task set at every absolute deadline to check if there is an overflow in a specified time interval. The resulting large number of calculations severely restricts the use of EDF in practice. In this paper, we propose new results on necessary and sufficient schedulability analysis for EDF scheduling; the new results reduce, exponentially, the calculation times, in all situations, for schedulable task sets, and in most situations, for unschedulable task sets. For example, a 16-task system that in the previous analysis had to check 858,331 points (deadlines) can, with the new analysis, be checked at just 12 points. There are no restrictions on the new results: each task can be periodic or sporadic, with relative deadline, which can be less than, equal to, or greater than its period, and task parameters can range over many orders of magnitude.

213 citations
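
For reference, a small Python sketch of the baseline necessary-and-sufficient EDF test that the paper above accelerates: the processor demand bound function is checked at every absolute deadline up to a bound L. Integer task parameters and a caller-supplied L are assumptions; the paper's contribution is precisely to skip most of these check points.

    def dbf(tasks, t):
        # Processor demand in [0, t] for sporadic tasks given as
        # (C, D, T) = (WCET, relative deadline, period) triples.
        return sum(max(0, (t - D) // T + 1) * C for (C, D, T) in tasks if t >= D)

    def edf_exact_test(tasks, L):
        # Check dbf(t) <= t at every absolute deadline in (0, L]; the bound L
        # (e.g., the synchronous busy period) is assumed to be supplied.
        deadlines = sorted({D + k * T
                            for (C, D, T) in tasks if D <= L
                            for k in range((L - D) // T + 1)})
        return all(dbf(tasks, t) <= t for t in deadlines)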


Journal Article•DOI•
TL;DR: This work investigates static and dynamic reliability-aware energy management schemes to minimize energy consumption for periodic real-time systems while preserving system reliability and presents two integrated approaches to reclaim both static and dynamic slack at runtime.
Abstract: Dynamic voltage and frequency scaling (DVFS) has been widely used to manage energy in real-time embedded systems. However, it was recently shown that DVFS has direct and adverse effects on system reliability. In this work, we investigate static and dynamic reliability-aware energy management schemes to minimize energy consumption for periodic real-time systems while preserving system reliability. Focusing on earliest deadline first (EDF) scheduling, we first show that the static version of the problem is NP-hard and propose two task-level utilization-based heuristics. Then, we develop a job-level online scheme by building on the idea of wrapper-tasks to monitor and manage dynamic slack efficiently in reliability-aware settings. The feasibility of the dynamic scheme is formally proved. Finally, we present two integrated approaches to reclaim both static and dynamic slack at runtime. To preserve system reliability, the proposed schemes incorporate recovery tasks/jobs into the schedule as needed, while still using the remaining slack for energy savings. The proposed schemes are evaluated through extensive simulations. The results confirm that all the proposed schemes can preserve the system reliability, while the ordinary (but reliability-ignorant) energy management schemes result in drastically decreased system reliability. For the static heuristics, the energy savings are within 5 percent of what can be achieved by an optimal solution. By effectively exploiting the runtime slack, the dynamic schemes can achieve additional energy savings while preserving system reliability.

167 citations


Journal Article•DOI•
Sooyong Kang1, Sungmin Park1, Ho-Young Jung1, Hyoki Shim1, Jaehyuk Cha1 •
TL;DR: This paper proposes various block-based NVRAM write buffer management policies and evaluates the performance improvement of NAND flash memory-based storage systems under each policy, and proposes a novel write buffer-aware flash translation layer algorithm, optimistic FTL, which is designed to harmonize well with NVRAM write buffers.
Abstract: While NAND flash memory is used in a variety of end-user devices, it has a few disadvantages, such as the asymmetric speed of read and write operations and the inability to perform in-place updates, among others. To overcome these problems, various flash-aware strategies have been suggested in terms of buffer cache, file system, FTL, and others. Also, the recent development of next-generation nonvolatile memory types such as MRAM, FeRAM, and PRAM provides higher commercial value to non-volatile RAM (NVRAM). At today's prices, however, they are not yet cost-effective. In this paper, we suggest the utilization of small-sized, next-generation NVRAM as a write buffer to improve the overall performance of NAND flash memory-based storage systems. We propose various block-based NVRAM write buffer management policies and evaluate the performance improvement of NAND flash memory-based storage systems under each policy. Also, we propose a novel write buffer-aware flash translation layer algorithm, optimistic FTL, which is designed to harmonize well with NVRAM write buffers. Simulation results show that the proposed buffer management policies outperform the traditional page-based LRU algorithm and the proposed optimistic FTL outperforms previous log block-based FTL algorithms, such as BAST and FAST.

140 citations


Journal Article•DOI•
TL;DR: The multiple directional cover sets (MDCS) problem of organizing the directions of sensors into a group of non-disjoint cover sets to extend the network lifetime is addressed and the MDCS is proved to be NP-complete.
Abstract: Unlike conventional omnidirectional sensors that always have an omni-angle of sensing range, directional sensors may have a limited angle of sensing range due to the technical constraints or cost considerations. A directional sensor network consists of a number of directional sensors, which can switch to several directions to extend their sensing ability to cover all the targets in a given area. Power conservation is still an important issue in such directional sensor networks. In this paper, we address the multiple directional cover sets (MDCS) problem of organizing the directions of sensors into a group of non-disjoint cover sets to extend the network lifetime. One cover set in which the directions cover all the targets is activated at one time. We prove the MDCS to be NP-complete and propose several algorithms for the MDCS. Simulation results are presented to demonstrate the performance of these algorithms.

127 citations
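
A simple greedy sketch in the spirit of the heuristics discussed above (not the paper's own algorithms): cover sets are built one at a time by repeatedly picking the sensor direction that covers the most still-uncovered targets, subject to each sensor's remaining activation budget. The dictionary-based inputs are illustrative assumptions.

    def greedy_cover_sets(targets, directions, lifetime):
        # directions: dict mapping (sensor, direction_id) -> set of covered targets
        # lifetime:   dict mapping sensor -> number of cover sets it may join
        # Returns a list of cover sets, each a list of (sensor, direction_id).
        budget = dict(lifetime)
        cover_sets = []
        while True:
            uncovered = set(targets)
            used_sensors = set()
            chosen = []
            while uncovered:
                best = max(
                    ((sd, covers & uncovered) for sd, covers in directions.items()
                     if budget[sd[0]] > 0 and sd[0] not in used_sensors),
                    key=lambda item: len(item[1]), default=None)
                if best is None or not best[1]:
                    return cover_sets          # cannot complete another full cover
                (sensor, d), gain = best
                chosen.append((sensor, d))
                used_sensors.add(sensor)
                uncovered -= gain
            for sensor, _ in chosen:
                budget[sensor] -= 1
            cover_sets.append(chosen)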


Journal Article•DOI•
TL;DR: A cryptographic formalization of steganographic security in terms of computational indistinguishability from a channel, an indexed family of probability distributions on cover messages, and a construction that is provably secure and computationally efficient and has nearly optimal bandwidth, assuming repeatable access to the channel distribution.
Abstract: Steganography is the problem of hiding secret messages in "innocent-looking" public communication so that the presence of the secret messages cannot be detected. This paper introduces a cryptographic formalization of steganographic security in terms of computational indistinguishability from a channel, an indexed family of probability distributions on cover messages. We use cryptographic and complexity-theoretic proof techniques to show that the existence of one-way functions and the ability to sample from the channel are necessary conditions for secure steganography. We then construct a steganographic protocol, based on rejection sampling from the channel, that is provably secure and has nearly optimal bandwidth under these conditions. This is the first known example of a general provably secure steganographic protocol. We also give the first formalization of "robust" steganography, where an adversary attempts to remove any hidden messages without unduly disrupting the cover channel. We give a necessary condition on the amount of disruption the adversary is allowed in terms of a worst case measure of mutual information. We give a construction that is provably secure and computationally efficient and has nearly optimal bandwidth, assuming repeatable access to the channel distribution.

99 citations
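
A compact sketch of the rejection-sampling construction described above: to hide a bit, covers are drawn from the channel until one whose keyed hash equals the bit is found, and the receiver recomputes the hash. HMAC-SHA256 stands in for the PRF, and sample_channel is an assumed callable returning cover messages as bytes.

    import hmac, hashlib

    def prf_bit(key, document):
        # One pseudorandom bit of the document under a shared key (HMAC as PRF).
        return hmac.new(key, document, hashlib.sha256).digest()[0] & 1

    def embed_bit(key, bit, sample_channel, max_tries=64):
        # Rejection sampling: draw covers from the channel until one hashes to
        # the desired hidden bit; fall back to the last draw if unlucky.
        for _ in range(max_tries):
            doc = sample_channel()
            if prf_bit(key, doc) == bit:
                return doc
        return doc

    def extract_bit(key, doc):
        return prf_bit(key, doc)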


Journal Article•DOI•
TL;DR: This work proposes synchronous semantics for function blocks and shows its feasibility by translating function blocks into a subset of Esterel, a well-known synchronous language.
Abstract: IEC 61499 has been endorsed as the standard for modeling and implementing distributed industrial process measurement and control systems. The standard prescribes the use of function blocks for designing systems in a component-oriented approach. The execution model of a basic function block and the manner for event/data connections between blocks are described therein. Unfortunately, the standard does not provide exhaustive specifications for function block execution. Consequently, multiple standard-compliant implementations exhibiting different behaviors are possible. This not only defeats the purpose of having a standard but also makes verification of function block systems difficult. To overcome this, we propose synchronous semantics for function blocks and show its feasibility by translating function blocks into a subset of Esterel, a well-known synchronous language. The proposed semantics avoids causal cycles common in Esterel and is proved to be reactive and deterministic under any composition. Moreover, verification techniques developed for synchronous systems can now be applied to function blocks.

Journal Article•DOI•
TL;DR: A recursive localization system that works with low-density networks, reduces the position error by almost 30 percent, requires 37 percent less processor resources to estimate a position, uses fewer beacon nodes, and also indicates the node position error based on its distance to the recursion origin.
Abstract: The establishment of a localization system is an important task in wireless sensor networks. Due to the geographical correlation between sensed data, location information is commonly used to name the gathered data and address nodes and regions in data dissemination protocols. In general, to estimate its location, a node needs the position information of at least three reference points (neighbors that know their positions). In this work, we propose a different scheme in which only two reference points are required in order to estimate a position. To choose between the two possible solutions of an estimate, we use the known direction of the recursion. This approach leads to a recursive localization system that works with low-density networks (increasing by 40 percent the number of nodes with estimates in some cases), reduces the position error by almost 30 percent, requires 37 percent less processor resources to estimate a position, uses fewer beacon nodes, and also indicates the node position error based on its distance to the recursion origin. No GPS-enabled node is required, since the recursion origin can be used as a relative coordinate system. The algorithm's evaluation is performed by comparing it with a similar localization system; also, experiments are made to evaluate the impact of both systems in geographic algorithms.
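
A small sketch of the two-reference estimation step described above: the two candidate positions are the intersection points of the circles around the two references, and one candidate is then selected using the recursion direction. The tie-breaking rule below (keep the candidate farther from the recursion origin) is an illustrative stand-in for the paper's actual criterion; the two references are assumed distinct.

    from math import hypot, sqrt

    def two_reference_candidates(p0, r0, p1, r1):
        # Intersection points of the circles |x - p0| = r0 and |x - p1| = r1.
        (x0, y0), (x1, y1) = p0, p1
        d = hypot(x1 - x0, y1 - y0)
        a = (r0**2 - r1**2 + d**2) / (2 * d)
        h = sqrt(max(r0**2 - a**2, 0.0))        # clamp small negatives from noise
        xm, ym = x0 + a * (x1 - x0) / d, y0 + a * (y1 - y0) / d
        return ((xm + h * (y1 - y0) / d, ym - h * (x1 - x0) / d),
                (xm - h * (y1 - y0) / d, ym + h * (x1 - x0) / d))

    def pick_by_recursion_direction(candidates, origin):
        # Stand-in disambiguation: keep the candidate farther from the origin.
        return max(candidates, key=lambda p: hypot(p[0] - origin[0], p[1] - origin[1]))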

Journal Article•DOI•
TL;DR: This paper identifies two cases that may happen when scheduling dependent tasks with primary-backup approach, and derives two important constraints that must be satisfied, which play a crucial role in limiting the schedulability and overloading efficiency of backups of dependent tasks.
Abstract: Fault-tolerant scheduling is an imperative step for large-scale computational grid systems, as often geographically distributed nodes co-operate to execute a task. By and large, primary-backup approach is a common methodology used for fault tolerance wherein each task has a primary copy and a backup copy on two different processors. In this paper, we identify two cases that may happen when scheduling dependent tasks with primary-backup approach. We derive two important constraints that must be satisfied. Further, we show that these two constraints play a crucial role in limiting the schedulability and overloading efficiency of backups of dependent tasks. We then propose two strategies to improve schedulability and overloading efficiency, respectively. We propose two algorithms (MRC-ECT and MCT-LRC), to schedule backups of independent jobs and dependent jobs, respectively. MRC-ECT is shown to guarantee an optimal backup schedule in terms of replication cost for an independent task, while MCT-LRC can schedule a backup of a dependent task with minimum completion time and less replication cost. We conduct extensive simulation experiments to quantify the performance of the proposed algorithms.

Journal Article•DOI•
TL;DR: This work corrects and extends a few state-of-the-art dynamic SPT algorithms to handle multiple edge weight updates and compares them with the well-known static Dijkstra algorithm.
Abstract: Let G = (V, E, ω) be a simple digraph, in which all edge weights are nonnegative real numbers. Let G' be obtained from G by an application of a set of edge weight updates to G. Let s ∈ V and let Ts and Ts' be Shortest Path Trees (SPTs) rooted at s in G and G', respectively. The Dynamic Shortest Path (DSP) problem is to compute Ts' from Ts. Existing work on this problem focuses on either a single edge weight change or multiple edge weight changes in which some of them are incorrect or are not optimized. We correct and extend a few state-of-the-art dynamic SPT algorithms to handle multiple edge weight updates. We prove that these algorithms are correct. Dynamic algorithms may not outperform static algorithms all the time. To evaluate the proposed dynamic algorithms, we compare them with the well-known static Dijkstra algorithm. Extensive experiments are conducted with both real-life and artificial data sets. The experimental results suggest the most appropriate algorithms to be used under different circumstances.
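
For context, a minimal implementation of the static Dijkstra baseline against which the dynamic SPT algorithms above are compared; the adjacency-list representation is an assumption.

    import heapq

    def dijkstra_spt(graph, s):
        # Static baseline: recompute the whole shortest path tree from scratch.
        # graph: dict u -> list of (v, nonnegative weight).
        dist = {s: 0.0}
        parent = {s: None}
        heap = [(0.0, s)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist.get(u, float("inf")):
                continue                        # stale heap entry
            for v, w in graph.get(u, ()):
                nd = d + w
                if nd < dist.get(v, float("inf")):
                    dist[v], parent[v] = nd, u
                    heapq.heappush(heap, (nd, v))
        return dist, parent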

Journal Article•DOI•
TL;DR: This research derives a technique for estimating the worst-case response time of sporadic task systems scheduled using fixed priorities upon a preemptive uniprocessor; the estimates possess three desirable properties: continuity with respect to system parameters, efficient computability, and approximability.
Abstract: Since worst case response times must be determined repeatedly during the interactive design of real-time application systems, repeated exact computation of such response times would slow down the design process considerably. In this research, we identify three desirable properties of estimates of the exact response times: continuity with respect to system parameters, efficient computability, and approximability. We derive a technique possessing these properties for estimating the worst-case response time of sporadic task systems that are scheduled using fixed priorities upon a preemptive uniprocessor.
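
As a reference point, a small Python sketch of the classical exact fixed-priority response-time recurrence that such estimation techniques approximate; representing tasks as (C, T) pairs in decreasing priority order, with deadlines no larger than periods, is an illustrative assumption.

    from math import ceil

    def exact_response_time(task_index, tasks, limit=10**6):
        # Fixed-point iteration of R = C_i + sum_{j in hp(i)} ceil(R / T_j) * C_j.
        C_i, _ = tasks[task_index]
        hp = tasks[:task_index]                  # higher-priority tasks
        R = C_i
        while R <= limit:
            nxt = C_i + sum(ceil(R / T_j) * C_j for (C_j, T_j) in hp)
            if nxt == R:
                return R
            R = nxt
        return None                              # did not converge within the limit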

Journal Article•DOI•
TL;DR: This work formalizes the distance-sensitive service discovery problem in wireless sensor and actor networks, and proposes a novel localized algorithm, iMesh, which uses no global computation and generates constant per-node storage load.
Abstract: We formalize the distance-sensitive service discovery problem in wireless sensor and actor networks, and propose a novel localized algorithm, iMesh. Unlike existing solutions, iMesh uses no global computation and generates constant per-node storage load. In iMesh, new service providers (i.e., actors) publish their location information in four directions, updating an information mesh. Information propagation for relatively remote services is restricted by a blocking rule, which also updates the mesh structure. Based on an extension rule, nodes along mesh edges may further advertise newly arrived relatively near service by backward distance-limited transmissions, replacing previously closer service location. The final information mesh is a planar structure constituted by the information propagation paths. It stores locations of all the service providers and serves as service directory. Service consumers (i.e., sensors) conduct a lookup process restricted within their home mesh cells to discover nearby services. We analytically study the properties of iMesh including construction cost and distance sensitivity over a static network model. We evaluate its performance in static/dynamic network scenarios through extensive simulation. Simulation results verify our theoretical findings and show that iMesh guarantees nearby (closest) service selection with very high probability, >99 percent (respectively, >95 percent).

Journal Article•DOI•
TL;DR: This paper considers the Montgomery multiplication in the binary extension fields and designs two bit-parallel multipliers which are comparable to the best finite field multipliers reported in the literature.
Abstract: Multiplication and squaring are main finite field operations in cryptographic computations and designing efficient multipliers and squarers affect the performance of cryptosystems. In this paper, we consider the Montgomery multiplication in the binary extension fields and study different structures of bit-serial and bit-parallel multipliers. For each of these structures, we study the role of the Montgomery factor, and then by using appropriate factors, propose new architectures. Specifically, we propose two bit-serial multipliers for general irreducible polynomials, and then derive bit-parallel Montgomery multipliers for two important classes of irreducible polynomials. In this regard, first we consider trinomials and provide a way for finding efficient Montgomery factors which results in a low time complexity. Then, we consider type-II irreducible pentanomials and design two bit-parallel multipliers which are comparable to the best finite field multipliers reported in the literature. Moreover, we consider squaring using this family of irreducible polynomials and show that this operation can be performed very fast with the time complexity of two XOR gates.
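
As a point of reference for the architectures discussed above, a textbook bit-serial Montgomery multiplication over GF(2^m) using the common Montgomery factor r = x^m; the paper studies alternative factors and bit-parallel structures, so this is only a generic sketch. Polynomials are encoded as Python integers (bit i is the coefficient of x^i).

    def mont_mul_gf2m(a, b, f, m):
        # Returns a(x) * b(x) * x^(-m) mod f(x), where f is irreducible of
        # degree m with nonzero constant term.
        c = 0
        for i in range(m):
            if (a >> i) & 1:
                c ^= b                # c += a_i * b
            if c & 1:
                c ^= f                # make c divisible by x
            c >>= 1                   # c /= x
        return c

    # Hypothetical example in GF(2^4) with f(x) = x^4 + x + 1 (0b10011):
    # mont_mul_gf2m(0b0110, 0b1010, 0b10011, 4) computes (x^2+x)(x^3+x)*x^-4 mod f.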

Journal Article•DOI•
TL;DR: State-of-the-art statistical simulation is enhanced by modeling the memory address stream behavior in a more microarchitecture-independent way and by modeling a program's time-varying execution behavior to enable accurately modeling resource conflicts in shared resources as observed in the memory hierarchy of contemporary chip multiprocessors when multiple programs are coexecuting on the CMP.
Abstract: Developing fast chip multiprocessor simulation techniques is a challenging problem. Solving this problem is especially valuable for design space exploration purposes during the early stages of the design cycle where a large number of design points need to be evaluated quickly. This paper studies statistical simulation as a fast simulation technique for chip multiprocessor (CMP) design space exploration. The idea of statistical simulation is to measure a number of program execution characteristics from a real program execution through profiling, to generate a synthetic trace from it, and simulate that synthetic trace as a proxy for the original program. The important benefit is that the synthetic trace is much shorter compared to a real program trace, which leads to substantial simulation speedups. This paper enhances state-of-the-art statistical simulation: 1) by modeling the memory address stream behavior in a more microarchitecture-independent way and 2) by modeling a program's time-varying execution behavior. These two enhancements enable accurately modeling resource conflicts in shared resources as observed in the memory hierarchy of contemporary chip multiprocessors when multiple programs are coexecuting on the CMP. Our experimental evaluation using the SPEC CPU benchmarks demonstrates average prediction error of 7.3 percent across a range of CMP configurations while varying the number of cores and memory hierarchy configurations.

Journal Article•DOI•
TL;DR: It is shown that the dense die-to-die vias enable 3D-integrated SRAM components that are partitioned at the level of individual wordlines or bitlines, which results in a wire length reduction within SRAM arrays, and a reduction in the area footprint, which reduces the wires required for global routing.
Abstract: 3D integration is an emergent technology that has the potential to greatly increase device density while simultaneously providing faster on-chip communication. 3D fabrication involves stacking two or more die connected with a very high density and low-latency interface. The die-to-die vias that comprise this interface can be treated as regular on-chip metal due to their small size (on the order of 1 μm) and high speed (sub-FO4 die-to-die communication delay). The increased device density and the ability to place and route in the third dimension provide new opportunities for microarchitecture design. In this paper, we focus on the 3D-integrated designs of SRAM structures. We show that the dense die-to-die vias enable 3D-integrated SRAM components that are partitioned at the level of individual wordlines or bitlines. This results in a wire length reduction within SRAM arrays, and a reduction in the area footprint, which reduces the wires required for global routing. The wire length reduction provides simultaneous latency and energy reduction benefits, e.g., 47 percent latency reduction and 18 percent energy reduction for a 4 MB 4-die stacked 3D SRAM array. A 3D implementation of a 128-entry multiported SRAM array achieves a 36 percent latency improvement with a simultaneous energy reduction of 55 percent. As planar designs adapt high-performance techniques such as hierarchical wordlines to improve performance, 3D integration provides even larger benefits, making it a desirable technology for high-performance designs. For the 4 MB SRAM array, the 3D-integrated designs provide additional latency reduction benefit over the planar designs when hierarchical wordlines are implemented in both planar and 3D designs.

Journal Article•DOI•
Marius Cornea1, John Harrison1, Cristina S. Anderson1, P. Tang2, E. Schneider1, E. Gvozdev1 •
TL;DR: New algorithms and properties are presented in this paper which are used in a software implementation of the IEEE 754R decimal floating-point arithmetic, with emphasis on using binary operations efficiently.
Abstract: The IEEE Standard 754-1985 for binary floating-point arithmetic [19] was revised [20], and an important addition is the definition of decimal floating-point arithmetic [8], [24]. This is intended mainly to provide a robust reliable framework for financial applications that are often subject to legal requirements concerning rounding and precision of the results, because the binary floating-point arithmetic may introduce small but unacceptable errors. Using binary floating-point calculations to emulate decimal calculations in order to correct this issue has led to the existence of numerous proprietary software packages, each with its own characteristics and capabilities. The IEEE 754R decimal arithmetic should unify the ways decimal floating-point calculations are carried out on various platforms. New algorithms and properties are presented in this paper, which are used in a software implementation of the IEEE 754R decimal floating-point arithmetic, with emphasis on using binary operations efficiently. The focus is on rounding techniques for decimal values stored in binary format, but algorithms are outlined for the more important or interesting operations of addition, multiplication, and division, including the case of nonhomogeneous operands, as well as conversions between binary and decimal floating-point formats. Performance results are included for a wider range of operations, showing promise that our approach is viable for applications that require decimal floating-point calculations. This paper extends an earlier publication [6].

Journal Article•DOI•
TL;DR: An in-depth investigation of security problems unique to UWSNs (including a new adversarial model) is presented and some simple and effective countermeasures for a certain class of attacks are proposed.
Abstract: In recent years, wireless sensor networks (WSNs) have been a very popular research topic, offering a treasure trove of systems, networking, hardware, security, and application-related problems. Much of prior research assumes that the WSN is supervised by a constantly present sink and sensors can quickly offload collected data. In this paper, we focus on unattended WSNs (UWSNs) characterized by intermittent sink presence and operation in hostile settings. Potentially lengthy intervals of sink absence offer greatly increased opportunities for attacks resulting in erasure, modification, or disclosure of sensor-collected data. This paper presents an in-depth investigation of security problems unique to UWSNs (including a new adversarial model) and proposes some simple and effective countermeasures for a certain class of attacks.

Journal Article•DOI•
TL;DR: This work proposes an online instability detection architecture that can be implemented by individual routers; it is based on adaptive segmentation of feature traces extracted from BGP update messages and exploits the temporal and spatial correlations in the traces for robust detection of instability events.
Abstract: The importance of border gateway protocol (BGP) as the primary interautonomous system (AS) routing protocol that maintains the connectivity of the Internet imposes stringent stability requirements on its route selection process. Accidental and malicious activities such as misconfigurations, failures, and worm attacks can induce severe BGP instabilities leading to data loss, extensive delays, and loss of connectivity. In this work, we propose an online instability detection architecture that can be implemented by individual routers. We use statistical pattern recognition techniques for detecting the instabilities, and the algorithm is evaluated using real Internet data for a diverse set of events including misconfiguration, node failures, and several worm attacks. The proposed scheme is based on adaptive segmentation of feature traces extracted from BGP update messages and exploiting the temporal and spatial correlations in the traces for robust detection of the instability events. Furthermore, we use route change information to pinpoint the culprit ASes where the instabilities have originated.

Journal Article•DOI•
TL;DR: In order to improve the speed of parallel decimal multiplication, a new PPG method is presented; the PPR method of one of the full solutions and the final addition scheme of the other are fine-tuned, thus assembling a new full solution.
Abstract: Hardware support for decimal computer arithmetic is regaining popularity. One reason is the recent growth of decimal computations in commercial, scientific, financial, and Internet-based computer applications. Newly commercialized decimal arithmetic hardware units use radix-10 sequential multipliers that are rather slow for multiplication-intensive applications. Therefore, future processors are likely to host fast parallel decimal multiplication circuits. The corresponding hardware algorithms are normally composed of three steps: partial product generation (PPG), partial product reduction (PPR), and final carry-propagating addition. The state of the art is represented by two recent full solutions with alternative designs for all the three aforementioned steps. In addition, PPR by itself has been the focus of other recent studies. In this paper, we examine both of the full solutions and the impact of a PPR-only design on the appropriate one. In order to improve the speed of parallel decimal multiplication, we present a new PPG method, fine-tune the PPR method of one of the full solutions and the final addition scheme of the other, thus assembling a new full solution. Logical Effort analysis and 0.13 μm synthesis show at least 13 percent speed advantage, but at a cost of at most 36 percent additional area consumption.

Journal Article•DOI•
TL;DR: Using the construction, it is shown that every m-dimensional restricted HL-graph and recursive circulant G(2^m, 4) with f or fewer faulty elements has a paired k-DPC for any f and k ≥ 2 with f + 2k ≤ m.
Abstract: A many-to-many k-disjoint path cover (k-DPC) of a graph G is a set of k disjoint paths joining k sources and k sinks in which each vertex of G is covered by a path. It is called a paired many-to-many disjoint path cover when each source should be joined to a specific sink, and it is called an unpaired many-to-many disjoint path cover when each source can be joined to an arbitrary sink. In this paper, we discuss paired and unpaired many-to-many disjoint path covers, including their relationships, application to strong Hamiltonicity, and necessary conditions. We then give a construction scheme for paired many-to-many disjoint path covers in the graph H0 ⊕ H1 obtained from connecting two graphs H0 and H1 with |V(H0)| = |V(H1)| by |V(H1)| pairwise nonadjacent edges joining vertices in H0 and vertices in H1, where H0 = G0 ⊕ G1 and H1 = G2 ⊕ G3 for some graphs Gj. Using the construction, we show that every m-dimensional restricted HL-graph and recursive circulant G(2^m, 4) with f or fewer faulty elements has a paired k-DPC for any f and k ≥ 2 with f + 2k ≤ m.

Journal Article•DOI•
TL;DR: A proactive content poisoning scheme is proposed to stop colluders and pirates from alleged copyright infringements in P2P file sharing, and a new peer authorization protocol (PAP) is developed to distinguish pirates from legitimate clients.
Abstract: Collusive piracy is the main source of intellectual property violations within the boundary of a P2P network. Paid clients (colluders) may illegally share copyrighted content files with unpaid clients (pirates). Such online piracy has hindered the use of open P2P networks for commercial content delivery. We propose a proactive content poisoning scheme to stop colluders and pirates from alleged copyright infringements in P2P file sharing. The basic idea is to detect pirates in a timely manner with identity-based signatures and time-stamped tokens. The scheme stops collusive piracy without hurting legitimate P2P clients by targeting poisoning on detected violators, exclusively. We developed a new peer authorization protocol (PAP) to distinguish pirates from legitimate clients. Detected pirates will receive poisoned chunks in their repeated attempts. Pirates are thus severely penalized with no chance to download successfully in tolerable time. Based on simulation results, we find a 99.9 percent prevention rate in Gnutella, KaZaA, and Freenet. We achieved an 85-98 percent prevention rate on eMule, eDonkey, Morpheus, etc. The scheme is shown to be less effective in protecting some poison-resilient networks like BitTorrent and Azureus. Our work opens up the low-cost P2P technology for copyrighted content delivery. The advantage lies mainly in minimum delivery cost, higher content availability, and copyright compliance in exploring P2P network resources.

Journal Article•DOI•
TL;DR: A new deadlock avoidance technique is proposed for 3D meshes using only two virtual channels by making full use of the idle channels in a deadlock-free adaptive fault-tolerant routing scheme based on a planar network (PN) fault model.
Abstract: The number of virtual channels required for deadlock-free routing is important for cost-effective and high-performance system design. The planar adaptive routing scheme is an effective deadlock avoidance technique using only three virtual channels for each physical channel in 3D or higher dimensional mesh networks with a very simple deadlock avoidance scheme. However, there exists one idle virtual channel for all physical channels along the first dimension and two idle virtual channels for channels along the last dimension in a mesh network based on the planar adaptive routing algorithm. A new deadlock avoidance technique is proposed for 3D meshes using only two virtual channels by making full use of the idle channels. The deadlock-free adaptive routing scheme is then modified to a deadlock-free adaptive fault-tolerant routing scheme based on a planar network (PN) fault model. The proposed deadlock-free adaptive routing scheme is also extended to n-dimensional meshes still using two virtual channels. Sufficient simulation results are presented to demonstrate the effectiveness of the proposed algorithm.

Journal Article•DOI•
TL;DR: This paper presents an approach for software rejuvenation based on automated self-healing techniques that can be easily applied to off-the-shelf application servers, and exploits the usage of virtualization to optimize the self-recovery actions.
Abstract: In this paper, we present an approach for software rejuvenation based on automated self-healing techniques that can be easily applied to off-the-shelf application servers. Software aging and transient failures are detected through continuous monitoring of system data and performability metrics of the application server. If some anomalous behavior is identified, the system triggers an automatic rejuvenation action. This self-healing scheme is meant to disrupt the running service for a minimal amount of time, achieving zero downtime in most cases. In our scheme, we exploit the usage of virtualization to optimize the self-recovery actions. The techniques described in this paper have been tested with a set of open-source Linux tools and the XEN virtualization middleware. We conducted an experimental study with two application benchmarks (Tomcat/Axis and TPC-W). Our results demonstrate that virtualization can be extremely helpful for failover and software rejuvenation in the occurrence of transient failures and software aging.

Journal Article•DOI•
TL;DR: The statistical bases for current models of RAID reliability are reviewed and a highly accurate alternative is provided and justified, which corrects statistical errors associated with the pervasive assumption that system (RAID group) times to failure follow a homogeneous Poisson process.
Abstract: The statistical bases for current models of RAID reliability are reviewed and a highly accurate alternative is provided and justified. This new model corrects statistical errors associated with the pervasive assumption that system (RAID group) times to failure follow a homogeneous Poisson process, and corrects errors associated with assuming the time-to-failure and time-to-restore distributions are exponentially distributed. Statistical justification for the new model uses theory for reliability of repairable systems. Four critical component distributions are developed from field data. These distributions are for times to catastrophic failure, reconstruction and restoration, read errors, and disk data scrubs. Model results have been verified and predict between 2 and 1,500 times as many double disk failures as estimates made using the mean-time-to-data-loss (MTTDL) method. Model results are compared to system-level field data for a RAID group of 14 drives and show excellent correlation and greater accuracy than the MTTDL method.
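
For comparison, a one-function sketch of the classical single-parity MTTDL estimate that the model above shows to be optimistic; the 14-drive example numbers are illustrative, not taken from the paper.

    def mttdl_raid5(n_disks, mttf_hours, mttr_hours):
        # Classical mean-time-to-data-loss estimate for a single-parity group,
        # MTTDL ~= MTTF^2 / (N * (N - 1) * MTTR).  It assumes exponential
        # time-to-failure/time-to-restore and a homogeneous Poisson process --
        # exactly the assumptions the paper's model corrects.
        return mttf_hours**2 / (n_disks * (n_disks - 1) * mttr_hours)

    # Example: 14 drives, 1,000,000-hour MTTF, 12-hour restore
    print(mttdl_raid5(14, 1e6, 12))   # roughly 4.6e8 hours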

Journal Article•DOI•
TL;DR: A multiresolution compression and query (MRCQ) framework to support in-network data compression and data storage in WSNs from both space and time domains is proposed and is expected to save sensors' energy significantly and thus, can support long-term monitoring WSN applications.
Abstract: In many WSN (wireless sensor network) applications, such as [1], [2], [3], the goal is to provide long-term monitoring of environments. In such applications, energy is a primary concern because sensor nodes have to regularly report data to the sink and need to continuously work for a very long time, while users may periodically request a rough overview of the monitored environment. On the other hand, users may occasionally query more in-depth data of certain areas to analyze abnormal events. These requirements motivate us to propose a multiresolution compression and query (MRCQ) framework to support in-network data compression and data storage in WSNs from both space and time domains. Our MRCQ framework can organize sensor nodes hierarchically and establish multiresolution summaries of sensing data inside the network, through spatial and temporal compressions. In the space domain, only lower resolution summaries are sent to the sink; the other higher resolution summaries are stored in the network and can be obtained via queries. In the time domain, historical data stored in sensor nodes exhibit a finer resolution for more recent data, and a coarser resolution for older data. Our methods consider the hardware limitations of sensor nodes. As a result, MRCQ is expected to save sensors' energy significantly and thus can support long-term monitoring WSN applications. A prototyping system is developed to verify its feasibility. Simulation results also show the efficiency of MRCQ compared to existing work.

Journal Article•DOI•
TL;DR: In this paper, upper and lower bounds for elementary functions are formally established, and a rational interval arithmetic based on these bounds is developed so that real number calculations on elementary functions, which are difficult to handle in mechanical proofs, can be performed within a theorem prover in a highly automated way.
Abstract: Real number calculations on elementary functions are remarkably difficult to handle in mechanical proofs. In this paper, we show how these calculations can be performed within a theorem prover or proof assistant in a convenient and highly automated as well as interactive way. First, we formally establish upper and lower bounds for elementary functions. Then, based on these bounds, we develop a rational interval arithmetic where real number calculations take place in an algebraic setting. In order to reduce the dependency effect of interval arithmetic, we integrate two techniques: interval splitting and Taylor series expansions. This pragmatic approach has been developed, and formally verified, in a theorem prover. The formal development also includes a set of customizable strategies to automate proofs involving explicit calculations over real numbers. Our ultimate goal is to provide guaranteed proofs of numerical properties with minimal human theorem-prover interaction.
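
An illustrative sketch of rational interval arithmetic of the kind described above, with exact Fraction endpoints; the class design and the x*x example showing the dependency effect (which the paper mitigates via interval splitting and Taylor expansions) are illustrative, not the paper's formalization.

    from fractions import Fraction

    class Interval:
        # Minimal rational interval arithmetic: endpoints are exact fractions,
        # so no floating-point rounding enters the computation.
        def __init__(self, lo, hi=None):
            self.lo = Fraction(lo)
            self.hi = Fraction(hi if hi is not None else lo)

        def __add__(self, other):
            return Interval(self.lo + other.lo, self.hi + other.hi)

        def __sub__(self, other):
            return Interval(self.lo - other.hi, self.hi - other.lo)

        def __mul__(self, other):
            products = [self.lo * other.lo, self.lo * other.hi,
                        self.hi * other.lo, self.hi * other.hi]
            return Interval(min(products), max(products))

        def __contains__(self, x):
            return self.lo <= Fraction(x) <= self.hi

    # Dependency effect: for x in [-1, 1], plain interval multiplication
    # reports x*x as [-1, 1] instead of the tight [0, 1].
    x = Interval(-1, 1)
    print((x * x).lo, (x * x).hi)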