
Showing papers in "IEEE Transactions on Computers in 2009"


Journal Article•DOI•
TL;DR: It is demonstrated that a 512-byte SRAM fingerprint contains sufficient entropy to generate 128-bit true random numbers and that the generated numbers pass the NIST tests for runs, approximate entropy, and block frequency.
Abstract: Intermittently powered applications create a need for low-cost security and privacy in potentially hostile environments, supported by primitives including identification and random number generation. Our measurements show that power-up of SRAM produces a physical fingerprint. We propose a system of fingerprint extraction and random numbers in SRAM (FERNS) that harvests static identity and randomness from existing volatile CMOS memory without requiring any dedicated circuitry. The identity results from manufacture-time physically random device threshold voltage mismatch, and the random numbers result from runtime physically random noise. We use experimental data from high-performance SRAM chips and the embedded SRAM of the WISP UHF RFID tag to validate the principles behind FERNS. For the SRAM chip, we demonstrate that 8-byte fingerprints can uniquely identify circuits among a population of 5,120 instances and extrapolate that 24-byte fingerprints would uniquely identify all instances ever produced. Using a smaller population, we demonstrate similar identifying ability from the embedded SRAM. In addition to identification, we show that SRAM fingerprints capture noise, enabling true random number generation. We demonstrate that a 512-byte SRAM fingerprint contains sufficient entropy to generate 128-bit true random numbers and that the generated numbers pass the NIST tests for runs, approximate entropy, and block frequency.

846 citations
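
For illustration, a minimal Python sketch of the FERNS idea above: a small slice of the SRAM power-up state acts as a device fingerprint, and a larger region is distilled into a 128-bit random number. The extractor (SHA-256) and the simulated dump are stand-in assumptions, not the paper's own extraction procedure.

    import hashlib
    import os

    def sram_fingerprint(powerup_bytes, offset=0, length=8):
        # An 8-byte slice of the power-up state serves as the device fingerprint.
        return powerup_bytes[offset:offset + length]

    def true_random_128(powerup_bytes, offset=0, length=512):
        # Distill 128 bits from a 512-byte power-up region.  SHA-256 acts only
        # as a generic entropy-extractor stand-in, not the paper's own procedure.
        region = powerup_bytes[offset:offset + length]
        return hashlib.sha256(region).digest()[:16]   # 16 bytes = 128 bits

    dump = os.urandom(2048)      # placeholder for a real SRAM power-up capture
    print(sram_fingerprint(dump).hex(), true_random_128(dump).hex())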


Journal Article•DOI•
TL;DR: 3D NoC architectures are evaluated, demonstrating superior functionality in terms of throughput, latency, energy dissipation, and wiring area overhead compared to traditional 2D implementations.
Abstract: The Network-on-Chip (NoC) paradigm has emerged as a revolutionary methodology for integrating a very high number of intellectual property (IP) blocks in a single die. The achievable performance benefit arising out of adopting NoCs is constrained by the performance limitation imposed by the metal wire, which is the physical realization of communication channels. With technology scaling, material innovation alone will extend the lifetime of conventional interconnect systems by only a few technology generations. According to the International Technology Roadmap for Semiconductors (ITRS), new interconnect paradigms are needed for the longer term. The conventional two dimensional (2D) integrated circuit (IC) has limited floor-planning choices, and consequently it limits the performance enhancements arising out of NoC architectures. Three dimensional (3D) ICs are capable of achieving better performance, functionality, and packaging density compared to more traditional planar ICs. On the other hand, NoC is an enabling solution for integrating large numbers of embedded cores in a single die. 3D NoC architectures combine the benefits of these two new domains to offer an unprecedented performance gain. In this paper we evaluate the performance of 3D NoC architectures and demonstrate their superior functionality in terms of throughput, latency, energy dissipation and wiring area overhead compared to traditional 2D implementations.

474 citations


Journal Article•DOI•
TL;DR: The unique QCA characteristics are utilized to design a carry flow adder that is fast and efficient, and the design of serial parallel multipliers is explored; simulations indicate very attractive performance.
Abstract: Quantum-dot cellular automata (QCA) is an emerging nanotechnology, with the potential for faster speed, smaller size, and lower power consumption than transistor-based technology. Quantum-dot cellular automata has a simple cell as the basic element. The cell is used as a building block to construct gates and wires. Previously, adder designs based on conventional designs were examined for implementation with QCA technology. That work demonstrated that the design trade-offs are very different in QCA. This paper utilizes the unique QCA characteristics to design a carry flow adder that is fast and efficient. Simulations indicate very attractive performance (i.e., complexity, area, and delay). This paper also explores the design of serial parallel multipliers. A serial parallel multiplier is designed and simulated with several different operand sizes.

342 citations


Journal Article•DOI•
TL;DR: It is argued that in this model, the product combining is more efficient not only than absolute difference combining, but also than all the other combining techniques proposed in the literature.
Abstract: Second order Differential Power Analysis (2O-DPA) is a powerful side-channel attack that allows an attacker to bypass the widely used masking countermeasure. To thwart 2O-DPA, higher order masking may be employed but it implies a nonnegligible overhead. In this context, there is a need to know how efficient a 2O-DPA can be, in order to evaluate the resistance of an implementation that uses first order masking and, possibly, some hardware countermeasures. Different methods of mounting a practical 2O-DPA attack have been proposed in the literature. However, it is not yet clear which of these methods is the most efficient. In this paper, we give a formal description of the higher order DPA that are mounted against software implementations. We then introduce a framework in which the attack efficiencies may be compared. The attacks we focus on involve the combining of several leakage signals and the computation of correlation coefficients to discriminate the wrong key hypotheses. In the second part of this paper, we pay particular attention to 2O-DPA that involves the product combining or the absolute difference combining. We study them under the assumption that the device leaks the Hamming weight of the processed data together with an independent Gaussian noise. After showing a way to improve the product combining, we argue that in this model, the product combining is more efficient not only than absolute difference combining, but also than all the other combining techniques proposed in the literature.

288 citations
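
A brief sketch of the centered-product combining and correlation step discussed above, under the Hamming-weight leakage model assumed in the paper; the NumPy array layout and the S-box table argument are illustrative assumptions.

    import numpy as np

    def hamming_weight(x):
        return bin(int(x)).count("1")

    HW = np.vectorize(hamming_weight)

    def product_combine(l1, l2):
        # Centered product combining of the two leakage samples.
        return (l1 - l1.mean()) * (l2 - l2.mean())

    def second_order_dpa(l1, l2, plaintexts, sbox):
        # l1, l2: leakage of the two shares (length-N arrays); plaintexts: known
        # inputs; sbox: 256-entry NumPy array.  Returns |correlation| per key guess.
        combined = product_combine(l1, l2)
        scores = np.empty(256)
        for k in range(256):
            model = HW(sbox[plaintexts ^ k]).astype(float)  # HW of unmasked S-box output
            scores[k] = abs(np.corrcoef(combined, model)[0, 1])
        return scores   # the correct key guess should yield the highest score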


Journal Article•DOI•
TL;DR: FPC is described and evaluated, a fast lossless compression algorithm for linear streams of 64-bit floating-point data that works well on hard-to-compress scientific data sets and meets the throughput demands of high-performance systems.
Abstract: Many scientific programs exchange large quantities of double-precision data between processing nodes and with mass storage devices. Data compression can reduce the number of bytes that need to be transferred and stored. However, data compression is only likely to be employed in high-end computing environments if it does not impede the throughput. This paper describes and evaluates FPC, a fast lossless compression algorithm for linear streams of 64-bit floating-point data. FPC works well on hard-to-compress scientific data sets and meets the throughput demands of high-performance systems. A comparison with five lossless compression schemes, BZIP2, DFCM, FSD, GZIP, and PLMI, on 4 architectures and 13 data sets shows that FPC compresses and decompresses one to two orders of magnitude faster than the other algorithms at the same geometric-mean compression ratio. Moreover, FPC provides a guaranteed throughput as long as the prediction tables fit into the L1 data cache. For example, on a 1.6-GHz Itanium 2 server, the throughput is 670 Mbytes/s regardless of what data are being compressed.

224 citations
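
A simplified sketch of the predict-and-XOR idea behind FPC: a table-based predictor guesses the next 64-bit value, and only the nonzero residual bytes are stored. The real FPC pairs an FCM with a DFCM predictor and packs headers more tightly; the hash update and byte-level header below are illustrative simplifications.

    import struct

    def compress_stream(doubles, table_bits=16):
        table = [0] * (1 << table_bits)
        hash_ctx = 0
        out = bytearray()
        for d in doubles:
            bits = struct.unpack("<Q", struct.pack("<d", d))[0]
            pred = table[hash_ctx]              # predicted 64-bit value
            table[hash_ctx] = bits
            hash_ctx = ((hash_ctx << 6) ^ (bits >> 48)) & ((1 << table_bits) - 1)
            residual = bits ^ pred              # small if the prediction was close
            nbytes = (residual.bit_length() + 7) // 8   # 0..8 residual bytes
            out.append(nbytes)                  # 1-byte header (FPC uses fewer bits)
            out += residual.to_bytes(nbytes, "little")
        return bytes(out)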


Journal Article•DOI•
TL;DR: This paper proposes new results on necessary and sufficient schedulability analysis for EDF scheduling; the new results exponentially reduce the calculation times in all situations for schedulable task sets and in most situations for unschedulable task sets.
Abstract: Real-time scheduling is the theoretical basis of real-time systems engineering. Earliest deadline first (EDF) is an optimal scheduling algorithm for uniprocessor real-time systems. Existing results on an exact schedulability test for EDF task systems with arbitrary relative deadlines need to calculate the processor demand of the task set at every absolute deadline to check if there is an overflow in a specified time interval. The resulting large number of calculations severely restricts the use of EDF in practice. In this paper, we propose new results on necessary and sufficient schedulability analysis for EDF scheduling; the new results reduce, exponentially, the calculation times, in all situations, for schedulable task sets, and in most situations, for unschedulable task sets. For example, a 16-task system that in the previous analysis had to check 858,331 points (deadlines) can, with the new analysis, be checked at just 12 points. There are no restrictions on the new results: each task can be periodic or sporadic, with relative deadline, which can be less than, equal to, or greater than its period, and task parameters can range over many orders of magnitude.

213 citations
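
For reference, a small Python sketch of the baseline necessary-and-sufficient EDF test that the paper above accelerates: the processor demand bound function is checked at every absolute deadline up to a bound L. Integer task parameters and a caller-supplied L are assumptions; the paper's contribution is precisely to skip most of these check points.

    def dbf(tasks, t):
        # Processor demand in [0, t] for sporadic tasks given as
        # (C, D, T) = (WCET, relative deadline, period) triples.
        return sum(max(0, (t - D) // T + 1) * C for (C, D, T) in tasks if t >= D)

    def edf_exact_test(tasks, L):
        # Check dbf(t) <= t at every absolute deadline in (0, L]; the bound L
        # (e.g., the synchronous busy period) is assumed to be supplied.
        deadlines = sorted({D + k * T
                            for (C, D, T) in tasks if D <= L
                            for k in range((L - D) // T + 1)})
        return all(dbf(tasks, t) <= t for t in deadlines)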


Journal Article•DOI•
TL;DR: This work investigates static and dynamic reliability-aware energy management schemes to minimize energy consumption for periodic real-time systems while preserving system reliability and presents two integrated approaches to reclaim both static and dynamic slack at runtime.
Abstract: Dynamic voltage and frequency scaling (DVFS) has been widely used to manage energy in real-time embedded systems. However, it was recently shown that DVFS has direct and adverse effects on system reliability. In this work, we investigate static and dynamic reliability-aware energy management schemes to minimize energy consumption for periodic real-time systems while preserving system reliability. Focusing on earliest deadline first (EDF) scheduling, we first show that the static version of the problem is NP-hard and propose two task-level utilization-based heuristics. Then, we develop a job-level online scheme by building on the idea of wrapper-tasks to monitor and manage dynamic slack efficiently in reliability-aware settings. The feasibility of the dynamic scheme is formally proved. Finally, we present two integrated approaches to reclaim both static and dynamic slack at runtime. To preserve system reliability, the proposed schemes incorporate recovery tasks/jobs into the schedule as needed, while still using the remaining slack for energy savings. The proposed schemes are evaluated through extensive simulations. The results confirm that all the proposed schemes can preserve the system reliability, while the ordinary (but reliability-ignorant) energy management schemes result in drastically decreased system reliability. For the static heuristics, the energy savings are within 5 percent of what can be achieved by an optimal solution. By effectively exploiting the runtime slack, the dynamic schemes can achieve additional energy savings while preserving system reliability.

167 citations


Journal Article•DOI•
Sooyong Kang1, Sungmin Park1, Ho-Young Jung1, Hyoki Shim1, Jaehyuk Cha1 •
TL;DR: This paper proposes various block-based NVRAM write buffer management policies and evaluates the performance improvement of NAND flash memory-based storage systems under each policy, and proposes a novel write buffer-aware flash translation layer algorithm, optimistic FTL, which is designed to harmonize well with NVRAM write buffers.
Abstract: While NAND flash memory is used in a variety of end-user devices, it has a few disadvantages, such as the asymmetric speed of read and write operations and the inability to perform in-place updates, among others. To overcome these problems, various flash-aware strategies have been suggested in terms of buffer cache, file system, FTL, and others. Also, the recent development of next-generation nonvolatile memory types such as MRAM, FeRAM, and PRAM provides higher commercial value to non-volatile RAM (NVRAM). At today's prices, however, they are not yet cost-effective. In this paper, we suggest the utilization of small-sized, next-generation NVRAM as a write buffer to improve the overall performance of NAND flash memory-based storage systems. We propose various block-based NVRAM write buffer management policies and evaluate the performance improvement of NAND flash memory-based storage systems under each policy. Also, we propose a novel write buffer-aware flash translation layer algorithm, optimistic FTL, which is designed to harmonize well with NVRAM write buffers. Simulation results show that the proposed buffer management policies outperform the traditional page-based LRU algorithm and the proposed optimistic FTL outperforms previous log block-based FTL algorithms, such as BAST and FAST.

140 citations


Journal Article•DOI•
TL;DR: The multiple directional cover sets (MDCS) problem of organizing the directions of sensors into a group of non-disjoint cover sets to extend the network lifetime is addressed and the MDCS is proved to be NP-complete.
Abstract: Unlike conventional omnidirectional sensors that always have an omni-angle of sensing range, directional sensors may have a limited angle of sensing range due to the technical constraints or cost considerations. A directional sensor network consists of a number of directional sensors, which can switch to several directions to extend their sensing ability to cover all the targets in a given area. Power conservation is still an important issue in such directional sensor networks. In this paper, we address the multiple directional cover sets (MDCS) problem of organizing the directions of sensors into a group of non-disjoint cover sets to extend the network lifetime. One cover set in which the directions cover all the targets is activated at one time. We prove the MDCS to be NP-complete and propose several algorithms for the MDCS. Simulation results are presented to demonstrate the performance of these algorithms.

127 citations
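
A simple greedy sketch in the spirit of the heuristics discussed above (not the paper's own algorithms): cover sets are built one at a time by repeatedly picking the sensor direction that covers the most still-uncovered targets, subject to each sensor's remaining activation budget. The dictionary-based inputs are illustrative assumptions.

    def greedy_cover_sets(targets, directions, lifetime):
        # directions: dict mapping (sensor, direction_id) -> set of covered targets
        # lifetime:   dict mapping sensor -> number of cover sets it may join
        # Returns a list of cover sets, each a list of (sensor, direction_id).
        budget = dict(lifetime)
        cover_sets = []
        while True:
            uncovered = set(targets)
            used_sensors = set()
            chosen = []
            while uncovered:
                best = max(
                    ((sd, covers & uncovered) for sd, covers in directions.items()
                     if budget[sd[0]] > 0 and sd[0] not in used_sensors),
                    key=lambda item: len(item[1]), default=None)
                if best is None or not best[1]:
                    return cover_sets          # cannot complete another full cover
                (sensor, d), gain = best
                chosen.append((sensor, d))
                used_sensors.add(sensor)
                uncovered -= gain
            for sensor, _ in chosen:
                budget[sensor] -= 1
            cover_sets.append(chosen)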


Journal Article•DOI•
TL;DR: A cryptographic formalization of steganographic security in terms of computational indistinguishability from a channel, an indexed family of probability distributions on cover messages, and a construction that is provably secure and computationally efficient and has nearly optimal bandwidth, assuming repeatable access to the channel distribution.
Abstract: Steganography is the problem of hiding secret messages in "innocent-looking" public communication so that the presence of the secret messages cannot be detected. This paper introduces a cryptographic formalization of steganographic security in terms of computational indistinguishability from a channel, an indexed family of probability distributions on cover messages. We use cryptographic and complexity-theoretic proof techniques to show that the existence of one-way functions and the ability to sample from the channel are necessary conditions for secure steganography. We then construct a steganographic protocol, based on rejection sampling from the channel, that is provably secure and has nearly optimal bandwidth under these conditions. This is the first known example of a general provably secure steganographic protocol. We also give the first formalization of "robust" steganography, where an adversary attempts to remove any hidden messages without unduly disrupting the cover channel. We give a necessary condition on the amount of disruption the adversary is allowed in terms of a worst case measure of mutual information. We give a construction that is provably secure and computationally efficient and has nearly optimal bandwidth, assuming repeatable access to the channel distribution.

99 citations
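
A compact sketch of the rejection-sampling construction described above: to hide a bit, covers are drawn from the channel until one whose keyed hash equals the bit is found, and the receiver recomputes the hash. HMAC-SHA256 stands in for the PRF, and sample_channel is an assumed callable returning cover messages as bytes.

    import hmac, hashlib

    def prf_bit(key, document):
        # One pseudorandom bit of the document under a shared key (HMAC as PRF).
        return hmac.new(key, document, hashlib.sha256).digest()[0] & 1

    def embed_bit(key, bit, sample_channel, max_tries=64):
        # Rejection sampling: draw covers from the channel until one hashes to
        # the desired hidden bit; fall back to the last draw if unlucky.
        for _ in range(max_tries):
            doc = sample_channel()
            if prf_bit(key, doc) == bit:
                return doc
        return doc

    def extract_bit(key, doc):
        return prf_bit(key, doc)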


Journal Article•DOI•
TL;DR: This work proposes synchronous semantics for function blocks and shows its feasibility by translating function blocks into a subset of Esterel, a well-known synchronous language.
Abstract: IEC 61499 has been endorsed as the standard for modeling and implementing distributed industrial process measurement and control systems. The standard prescribes the use of function blocks for designing systems in a component-oriented approach. The execution model of a basic function block and the manner for event/data connections between blocks are described therein. Unfortunately, the standard does not provide exhaustive specifications for function block execution. Consequently, multiple standard-compliant implementations exhibiting different behaviors are possible. This not only defeats the purpose of having a standard but also makes verification of function block systems difficult. To overcome this, we propose synchronous semantics for function blocks and show its feasibility by translating function blocks into a subset of Esterel, a well-known synchronous language. The proposed semantics avoids causal cycles common in Esterel and is proved to be reactive and deterministic under any composition. Moreover, verification techniques developed for synchronous systems can now be applied to function blocks.

Journal Article•DOI•
TL;DR: A recursive localization system that works with low-density networks, reduces the position error by almost 30 percent, requires 37 percent less processor resources to estimate a position, uses fewer beacon nodes, and also indicates the node position error based on its distance to the recursion origin.
Abstract: The establishment of a localization system is an important task in wireless sensor networks. Due to the geographical correlation between sensed data, location information is commonly used to name the gathered data and address nodes and regions in data dissemination protocols. In general, to estimate its location, a node needs the position information of at least three reference points (neighbors that know their positions). In this work, we propose a different scheme in which only two reference points are required in order to estimate a position. To choose between the two possible solutions of an estimate, we use the known direction of the recursion. This approach leads to a recursive localization system that works with low-density networks (increasing by 40 percent the number of nodes with estimates in some cases), reduces the position error by almost 30 percent, requires 37 percent less processor resources to estimate a position, uses fewer beacon nodes, and also indicates the node position error based on its distance to the recursion origin. No GPS-enabled node is required, since the recursion origin can be used as a relative coordinate system. The algorithm's evaluation is performed by comparing it with a similar localization system; also, experiments are made to evaluate the impact of both systems in geographic algorithms.
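
A small sketch of the two-reference estimation step described above: the two candidate positions are the intersection points of the circles around the two references, and one candidate is then selected using the recursion direction. The tie-breaking rule below (keep the candidate farther from the recursion origin) is an illustrative stand-in for the paper's actual criterion; the two references are assumed distinct.

    from math import hypot, sqrt

    def two_reference_candidates(p0, r0, p1, r1):
        # Intersection points of the circles |x - p0| = r0 and |x - p1| = r1.
        (x0, y0), (x1, y1) = p0, p1
        d = hypot(x1 - x0, y1 - y0)
        a = (r0**2 - r1**2 + d**2) / (2 * d)
        h = sqrt(max(r0**2 - a**2, 0.0))        # clamp small negatives from noise
        xm, ym = x0 + a * (x1 - x0) / d, y0 + a * (y1 - y0) / d
        return ((xm + h * (y1 - y0) / d, ym - h * (x1 - x0) / d),
                (xm - h * (y1 - y0) / d, ym + h * (x1 - x0) / d))

    def pick_by_recursion_direction(candidates, origin):
        # Stand-in disambiguation: keep the candidate farther from the origin.
        return max(candidates, key=lambda p: hypot(p[0] - origin[0], p[1] - origin[1]))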

Journal Article•DOI•
TL;DR: This paper identifies two cases that may happen when scheduling dependent tasks with primary-backup approach, and derives two important constraints that must be satisfied, which play a crucial role in limiting the schedulability and overloading efficiency of backups of dependent tasks.
Abstract: Fault-tolerant scheduling is an imperative step for large-scale computational grid systems, as often geographically distributed nodes co-operate to execute a task. By and large, primary-backup approach is a common methodology used for fault tolerance wherein each task has a primary copy and a backup copy on two different processors. In this paper, we identify two cases that may happen when scheduling dependent tasks with primary-backup approach. We derive two important constraints that must be satisfied. Further, we show that these two constraints play a crucial role in limiting the schedulability and overloading efficiency of backups of dependent tasks. We then propose two strategies to improve schedulability and overloading efficiency, respectively. We propose two algorithms (MRC-ECT and MCT-LRC), to schedule backups of independent jobs and dependent jobs, respectively. MRC-ECT is shown to guarantee an optimal backup schedule in terms of replication cost for an independent task, while MCT-LRC can schedule a backup of a dependent task with minimum completion time and less replication cost. We conduct extensive simulation experiments to quantify the performance of the proposed algorithms.

Journal Article•DOI•
TL;DR: This work corrects and extends a few state-of-the-art dynamic SPT algorithms to handle multiple edge weight updates and compares them with the well-known static Dijkstra algorithm.
Abstract: Let G = (V, E, ω) be a simple digraph, in which all edge weights are nonnegative real numbers. Let G' be obtained from G by an application of a set of edge weight updates to G. Let s ∈ V and let Ts and Ts' be Shortest Path Trees (SPTs) rooted at s in G and G', respectively. The Dynamic Shortest Path (DSP) problem is to compute Ts' from Ts. Existing work on this problem focuses on either a single edge weight change or multiple edge weight changes in which some of them are incorrect or are not optimized. We correct and extend a few state-of-the-art dynamic SPT algorithms to handle multiple edge weight updates. We prove that these algorithms are correct. Dynamic algorithms may not outperform static algorithms all the time. To evaluate the proposed dynamic algorithms, we compare them with the well-known static Dijkstra algorithm. Extensive experiments are conducted with both real-life and artificial data sets. The experimental results suggest the most appropriate algorithms to be used under different circumstances.
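
For context, a minimal implementation of the static Dijkstra baseline against which the dynamic SPT algorithms above are compared; the adjacency-list representation is an assumption.

    import heapq

    def dijkstra_spt(graph, s):
        # Static baseline: recompute the whole shortest path tree from scratch.
        # graph: dict u -> list of (v, nonnegative weight).
        dist = {s: 0.0}
        parent = {s: None}
        heap = [(0.0, s)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist.get(u, float("inf")):
                continue                        # stale heap entry
            for v, w in graph.get(u, ()):
                nd = d + w
                if nd < dist.get(v, float("inf")):
                    dist[v], parent[v] = nd, u
                    heapq.heappush(heap, (nd, v))
        return dist, parent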

Journal Article•DOI•
TL;DR: This research derives a technique for estimating the worst-case response time of sporadic task systems scheduled using fixed priorities upon a preemptive uniprocessor; the estimates possess three desirable properties: continuity with respect to system parameters, efficient computability, and approximability.
Abstract: Since worst case response times must be determined repeatedly during the interactive design of real-time application systems, repeated exact computation of such response times would slow down the design process considerably. In this research, we identify three desirable properties of estimates of the exact response times: continuity with respect to system parameters, efficient computability, and approximability. We derive a technique possessing these properties for estimating the worst-case response time of sporadic task systems that are scheduled using fixed priorities upon a preemptive uniprocessor.
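
As a reference point, a small Python sketch of the classical exact fixed-priority response-time recurrence that such estimation techniques approximate; representing tasks as (C, T) pairs in decreasing priority order, with deadlines no larger than periods, is an illustrative assumption.

    from math import ceil

    def exact_response_time(task_index, tasks, limit=10**6):
        # Fixed-point iteration of R = C_i + sum_{j in hp(i)} ceil(R / T_j) * C_j.
        C_i, _ = tasks[task_index]
        hp = tasks[:task_index]                  # higher-priority tasks
        R = C_i
        while R <= limit:
            nxt = C_i + sum(ceil(R / T_j) * C_j for (C_j, T_j) in hp)
            if nxt == R:
                return R
            R = nxt
        return None                              # did not converge within the limit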

Journal Article•DOI•
TL;DR: This work formalizes the distance-sensitive service discovery problem in wireless sensor and actor networks, and proposes a novel localized algorithm, iMesh, which uses no global computation and generates constant per-node storage load.
Abstract: We formalize the distance-sensitive service discovery problem in wireless sensor and actor networks, and propose a novel localized algorithm, iMesh. Unlike existing solutions, iMesh uses no global computation and generates constant per-node storage load. In iMesh, new service providers (i.e., actors) publish their location information in four directions, updating an information mesh. Information propagation for relatively remote services is restricted by a blocking rule, which also updates the mesh structure. Based on an extension rule, nodes along mesh edges may further advertise newly arrived relatively near service by backward distance-limited transmissions, replacing previously closer service location. The final information mesh is a planar structure constituted by the information propagation paths. It stores locations of all the service providers and serves as service directory. Service consumers (i.e., sensors) conduct a lookup process restricted within their home mesh cells to discover nearby services. We analytically study the properties of iMesh including construction cost and distance sensitivity over a static network model. We evaluate its performance in static/dynamic network scenarios through extensive simulation. Simulation results verify our theoretical findings and show that iMesh guarantees nearby (closest) service selection with very high probability, >99 percent (respectively, >95 percent).

Journal Article•DOI•
TL;DR: This paper considers the Montgomery multiplication in the binary extension fields and designs two bit-parallel multipliers which are comparable to the best finite field multipliers reported in the literature.
Abstract: Multiplication and squaring are main finite field operations in cryptographic computations and designing efficient multipliers and squarers affect the performance of cryptosystems. In this paper, we consider the Montgomery multiplication in the binary extension fields and study different structures of bit-serial and bit-parallel multipliers. For each of these structures, we study the role of the Montgomery factor, and then by using appropriate factors, propose new architectures. Specifically, we propose two bit-serial multipliers for general irreducible polynomials, and then derive bit-parallel Montgomery multipliers for two important classes of irreducible polynomials. In this regard, first we consider trinomials and provide a way for finding efficient Montgomery factors which results in a low time complexity. Then, we consider type-II irreducible pentanomials and design two bit-parallel multipliers which are comparable to the best finite field multipliers reported in the literature. Moreover, we consider squaring using this family of irreducible polynomials and show that this operation can be performed very fast with the time complexity of two XOR gates.
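
As a point of reference for the architectures discussed above, a textbook bit-serial Montgomery multiplication over GF(2^m) using the common Montgomery factor r = x^m; the paper studies alternative factors and bit-parallel structures, so this is only a generic sketch. Polynomials are encoded as Python integers (bit i is the coefficient of x^i).

    def mont_mul_gf2m(a, b, f, m):
        # Returns a(x) * b(x) * x^(-m) mod f(x), where f is irreducible of
        # degree m with nonzero constant term.
        c = 0
        for i in range(m):
            if (a >> i) & 1:
                c ^= b                # c += a_i * b
            if c & 1:
                c ^= f                # make c divisible by x
            c >>= 1                   # c /= x
        return c

    # Hypothetical example in GF(2^4) with f(x) = x^4 + x + 1 (0b10011):
    # mont_mul_gf2m(0b0110, 0b1010, 0b10011, 4) computes (x^2+x)(x^3+x)*x^-4 mod f.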

Journal Article•DOI•
TL;DR: State-of-the-art statistical simulation is enhanced by modeling the memory address stream behavior in a more microarchitecture-independent way and by modeling a program's time-varying execution behavior to enable accurately modeling resource conflicts in shared resources as observed in the memory hierarchy of contemporary chip multiprocessors when multiple programs are coexecuting on the CMP.
Abstract: Developing fast chip multiprocessor simulation techniques is a challenging problem. Solving this problem is especially valuable for design space exploration purposes during the early stages of the design cycle where a large number of design points need to be evaluated quickly. This paper studies statistical simulation as a fast simulation technique for chip multiprocessor (CMP) design space exploration. The idea of statistical simulation is to measure a number of program execution characteristics from a real program execution through profiling, to generate a synthetic trace from it, and simulate that synthetic trace as a proxy for the original program. The important benefit is that the synthetic trace is much shorter compared to a real program trace, which leads to substantial simulation speedups. This paper enhances state-of-the-art statistical simulation: 1) by modeling the memory address stream behavior in a more microarchitecture-independent way and 2) by modeling a program's time-varying execution behavior. These two enhancements enable accurately modeling resource conflicts in shared resources as observed in the memory hierarchy of contemporary chip multiprocessors when multiple programs are coexecuting on the CMP. Our experimental evaluation using the SPEC CPU benchmarks demonstrates average prediction error of 7.3 percent across a range of CMP configurations while varying the number of cores and memory hierarchy configurations.

Journal Article•DOI•
TL;DR: It is shown that the dense die-to-die vias enable 3D-integrated SRAM components that are partitioned at the level of individual wordlines or bitlines, which results in a wire length reduction within SRAM arrays, and a reduction in the area footprint, which reduces the wires required for global routing.
Abstract: 3D integration is an emergent technology that has the potential to greatly increase device density while simultaneously providing faster on-chip communication. 3D fabrication involves stacking two or more die connected with a very high density and low-latency interface. The die-to-die vias that comprise this interface can be treated as regular on-chip metal due to their small size (on the order of 1 μm) and high speed (sub-FO4 die-to-die communication delay). The increased device density and the ability to place and route in the third dimension provide new opportunities for microarchitecture design. In this paper, we focus on the 3D-integrated designs of SRAM structures. We show that the dense die-to-die vias enable 3D-integrated SRAM components that are partitioned at the level of individual wordlines or bitlines. This results in a wire length reduction within SRAM arrays, and a reduction in the area footprint, which reduces the wires required for global routing. The wire length reduction provides simultaneous latency and energy reduction benefits, e.g., 47 percent latency reduction and 18 percent energy reduction for a 4 MB 4-die stacked 3D SRAM array. A 3D implementation of a 128-entry multiported SRAM array achieves a 36 percent latency improvement with a simultaneous energy reduction of 55 percent. As planar designs adapt high-performance techniques such as hierarchical wordlines to improve performance, 3D integration provides even larger benefits, making it a desirable technology for high-performance designs. For the 4 MB SRAM array, the 3D-integrated designs provide additional latency reduction benefit over the planar designs when hierarchical wordlines are implemented in both planar and 3D designs.

Journal Article•DOI•
Marius Cornea1, John Harrison1, Cristina S. Anderson1, P. Tang2, E. Schneider1, E. Gvozdev1 •
TL;DR: New algorithms and properties are presented in this paper which are used in a software implementation of the IEEE 754R decimal floating-point arithmetic, with emphasis on using binary operations efficiently.
Abstract: The IEEE Standard 754-1985 for binary floating-point arithmetic [19] was revised [20], and an important addition is the definition of decimal floating-point arithmetic [8], [24]. This is intended mainly to provide a robust reliable framework for financial applications that are often subject to legal requirements concerning rounding and precision of the results, because the binary floating-point arithmetic may introduce small but unacceptable errors. Using binary floating-point calculations to emulate decimal calculations in order to correct this issue has led to the existence of numerous proprietary software packages, each with its own characteristics and capabilities. The IEEE 754R decimal arithmetic should unify the ways decimal floating-point calculations are carried out on various platforms. New algorithms and properties are presented in this paper, which are used in a software implementation of the IEEE 754R decimal floating-point arithmetic, with emphasis on using binary operations efficiently. The focus is on rounding techniques for decimal values stored in binary format, but algorithms are outlined for the more important or interesting operations of addition, multiplication, and division, including the case of nonhomogeneous operands, as well as conversions between binary and decimal floating-point formats. Performance results are included for a wider range of operations, showing promise that our approach is viable for applications that require decimal floating-point calculations. This paper extends an earlier publication [6].

Journal Article•DOI•
TL;DR: An in-depth investigation of security problems unique to UWSNs (including a new adversarial model) is presented and some simple and effective countermeasures for a certain class of attacks are proposed.
Abstract: In recent years, wireless sensor networks (WSNs) have been a very popular research topic, offering a treasure trove of systems, networking, hardware, security, and application-related problems. Much of prior research assumes that the WSN is supervised by a constantly present sink and sensors can quickly offload collected data. In this paper, we focus on unattended WSNs (UWSNs) characterized by intermittent sink presence and operation in hostile settings. Potentially lengthy intervals of sink absence offer greatly increased opportunities for attacks resulting in erasure, modification, or disclosure of sensor-collected data. This paper presents an in-depth investigation of security problems unique to UWSNs (including a new adversarial model) and proposes some simple and effective countermeasures for a certain class of attacks.

Journal Article•DOI•
TL;DR: This work proposes an online instability detection architecture that can be implemented by individual routers; it is based on adaptive segmentation of feature traces extracted from BGP update messages and exploits the temporal and spatial correlations in the traces for robust detection of instability events.
Abstract: The importance of border gateway protocol (BGP) as the primary interautonomous system (AS) routing protocol that maintains the connectivity of the Internet imposes stringent stability requirements on its route selection process. Accidental and malicious activities such as misconfigurations, failures, and worm attacks can induce severe BGP instabilities leading to data loss, extensive delays, and loss of connectivity. In this work, we propose an online instability detection architecture that can be implemented by individual routers. We use statistical pattern recognition techniques for detecting the instabilities, and the algorithm is evaluated using real Internet data for a diverse set of events including misconfiguration, node failures, and several worm attacks. The proposed scheme is based on adaptive segmentation of feature traces extracted from BGP update messages and exploiting the temporal and spatial correlations in the traces for robust detection of the instability events. Furthermore, we use route change information to pinpoint the culprit ASes where the instabilities have originated.

Journal Article•DOI•
TL;DR: In order to improve the speed of parallel decimal multiplication, a new PPG method is presented; the PPR method of one of the full solutions and the final addition scheme of the other are fine-tuned, thus assembling a new full solution.
Abstract: Hardware support for decimal computer arithmetic is regaining popularity. One reason is the recent growth of decimal computations in commercial, scientific, financial, and Internet-based computer applications. Newly commercialized decimal arithmetic hardware units use radix-10 sequential multipliers that are rather slow for multiplication-intensive applications. Therefore, future processors are likely to host fast parallel decimal multiplication circuits. The corresponding hardware algorithms are normally composed of three steps: partial product generation (PPG), partial product reduction (PPR), and final carry-propagating addition. The state of the art is represented by two recent full solutions with alternative designs for all the three aforementioned steps. In addition, PPR by itself has been the focus of other recent studies. In this paper, we examine both of the full solutions and the impact of a PPR-only design on the appropriate one. In order to improve the speed of parallel decimal multiplication, we present a new PPG method, fine-tune the PPR method of one of the full solutions and the final addition scheme of the other, thus assembling a new full solution. Logical Effort analysis and 0.13 μm synthesis show at least 13 percent speed advantage, but at a cost of at most 36 percent additional area consumption.

Journal Article•DOI•
TL;DR: Using the construction, it is shown that every m-dimensional restricted HL-graph and recursive circulant G(2^m, 4) with f or fewer faulty elements has a paired k-DPC for any f and k ≥ 2 with f + 2k ≤ m.
Abstract: A many-to-many k-disjoint path cover (k-DPC) of a graph G is a set of k disjoint paths joining k sources and k sinks in which each vertex of G is covered by a path. It is called a paired many-to-many disjoint path cover when each source should be joined to a specific sink, and it is called an unpaired many-to-many disjoint path cover when each source can be joined to an arbitrary sink. In this paper, we discuss paired and unpaired many-to-many disjoint path covers, including their relationships, application to strong Hamiltonicity, and necessary conditions. We then give a construction scheme for paired many-to-many disjoint path covers in the graph H0 ⊕ H1 obtained from connecting two graphs H0 and H1 with |V(H0)| = |V(H1)| by |V(H1)| pairwise nonadjacent edges joining vertices in H0 and vertices in H1, where H0 = G0 ⊕ G1 and H1 = G2 ⊕ G3 for some graphs Gj. Using the construction, we show that every m-dimensional restricted HL-graph and recursive circulant G(2^m, 4) with f or fewer faulty elements has a paired k-DPC for any f and k ≥ 2 with f + 2k ≤ m.

Journal Article•DOI•
TL;DR: A proactive content poisoning scheme is proposed to stop colluders and pirates from alleged copyright infringements in P2P file sharing, and a new peer authorization protocol (PAP) is developed to distinguish pirates from legitimate clients.
Abstract: Collusive piracy is the main source of intellectual property violations within the boundary of a P2P network. Paid clients (colluders) may illegally share copyrighted content files with unpaid clients (pirates). Such online piracy has hindered the use of open P2P networks for commercial content delivery. We propose a proactive content poisoning scheme to stop colluders and pirates from alleged copyright infringements in P2P file sharing. The basic idea is to detect pirates in a timely manner with identity-based signatures and time-stamped tokens. The scheme stops collusive piracy without hurting legitimate P2P clients by targeting poisoning on detected violators, exclusively. We developed a new peer authorization protocol (PAP) to distinguish pirates from legitimate clients. Detected pirates will receive poisoned chunks in their repeated attempts. Pirates are thus severely penalized with no chance to download successfully in tolerable time. Based on simulation results, we find a 99.9 percent prevention rate in Gnutella, KaZaA, and Freenet. We achieved an 85-98 percent prevention rate on eMule, eDonkey, Morpheus, etc. The scheme is shown to be less effective in protecting some poison-resilient networks like BitTorrent and Azureus. Our work opens up the low-cost P2P technology for copyrighted content delivery. The advantage lies mainly in minimum delivery cost, higher content availability, and copyright compliance in exploring P2P network resources.

Journal Article•DOI•
TL;DR: A new deadlock avoidance technique is proposed for 3D meshes using only two virtual channels by making full use of the idle channels in a deadlock-free adaptive fault-tolerant routing scheme based on a planar network (PN) fault model.
Abstract: The number of virtual channels required for deadlock-free routing is important for cost-effective and high-performance system design. The planar adaptive routing scheme is an effective deadlock avoidance technique using only three virtual channels for each physical channel in 3D or higher dimensional mesh networks with a very simple deadlock avoidance scheme. However, there exists one idle virtual channel for all physical channels along the first dimension and two idle virtual channels for channels along the last dimension in a mesh network based on the planar adaptive routing algorithm. A new deadlock avoidance technique is proposed for 3D meshes using only two virtual channels by making full use of the idle channels. The deadlock-free adaptive routing scheme is then modified to a deadlock-free adaptive fault-tolerant routing scheme based on a planar network (PN) fault model. The proposed deadlock-free adaptive routing scheme is also extended to n-dimensional meshes still using two virtual channels. Sufficient simulation results are presented to demonstrate the effectiveness of the proposed algorithm.

Journal Article•DOI•
TL;DR: This paper presents an approach for software rejuvenation based on automated self-healing techniques that can be easily applied to off-the-shelf application servers, and exploits the usage of virtualization to optimize the self-recovery actions.
Abstract: In this paper, we present an approach for software rejuvenation based on automated self-healing techniques that can be easily applied to off-the-shelf application servers. Software aging and transient failures are detected through continuous monitoring of system data and performability metrics of the application server. If some anomalous behavior is identified, the system triggers an automatic rejuvenation action. This self-healing scheme is meant to disrupt the running service for a minimal amount of time, achieving zero downtime in most cases. In our scheme, we exploit the usage of virtualization to optimize the self-recovery actions. The techniques described in this paper have been tested with a set of open-source Linux tools and the XEN virtualization middleware. We conducted an experimental study with two application benchmarks (Tomcat/Axis and TPC-W). Our results demonstrate that virtualization can be extremely helpful for failover and software rejuvenation in the occurrence of transient failures and software aging.

Journal Article•DOI•
TL;DR: The statistical bases for current models of RAID reliability are reviewed and a highly accurate alternative is provided and justified, which corrects statistical errors associated with the pervasive assumption that system (RAID group) times to failure follow a homogeneous Poisson process.
Abstract: The statistical bases for current models of RAID reliability are reviewed and a highly accurate alternative is provided and justified. This new model corrects statistical errors associated with the pervasive assumption that system (RAID group) times to failure follow a homogeneous Poisson process, and corrects errors associated with assuming the time-to-failure and time-to-restore distributions are exponentially distributed. Statistical justification for the new model uses theory for reliability of repairable systems. Four critical component distributions are developed from field data. These distributions are for times to catastrophic failure, reconstruction and restoration, read errors, and disk data scrubs. Model results have been verified and predict between 2 and 1,500 times as many double disk failures as estimates made using the mean-time-to-data-loss (MTTDL) method. Model results are compared to system-level field data for a RAID group of 14 drives and show excellent correlation and greater accuracy than the MTTDL method.
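
For comparison, a one-function sketch of the classical single-parity MTTDL estimate that the model above shows to be optimistic; the 14-drive example numbers are illustrative, not taken from the paper.

    def mttdl_raid5(n_disks, mttf_hours, mttr_hours):
        # Classical mean-time-to-data-loss estimate for a single-parity group,
        # MTTDL ~= MTTF^2 / (N * (N - 1) * MTTR).  It assumes exponential
        # time-to-failure/time-to-restore and a homogeneous Poisson process --
        # exactly the assumptions the paper's model corrects.
        return mttf_hours**2 / (n_disks * (n_disks - 1) * mttr_hours)

    # Example: 14 drives, 1,000,000-hour MTTF, 12-hour restore
    print(mttdl_raid5(14, 1e6, 12))   # roughly 4.6e8 hours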

Journal Article•DOI•
TL;DR: A multiresolution compression and query (MRCQ) framework to support in-network data compression and data storage in WSNs from both space and time domains is proposed and is expected to save sensors' energy significantly and thus, can support long-term monitoring WSN applications.
Abstract: In many WSN (wireless sensor network) applications, such as [1], [2], [3], the goal is to provide long-term monitoring of environments. In such applications, energy is a primary concern because sensor nodes have to regularly report data to the sink and need to continuously work for a very long time, while users may periodically request a rough overview of the monitored environment. On the other hand, users may occasionally query more in-depth data of certain areas to analyze abnormal events. These requirements motivate us to propose a multiresolution compression and query (MRCQ) framework to support in-network data compression and data storage in WSNs from both space and time domains. Our MRCQ framework can organize sensor nodes hierarchically and establish multiresolution summaries of sensing data inside the network, through spatial and temporal compressions. In the space domain, only lower resolution summaries are sent to the sink; the other higher resolution summaries are stored in the network and can be obtained via queries. In the time domain, historical data stored in sensor nodes exhibit a finer resolution for more recent data, and a coarser resolution for older data. Our methods consider the hardware limitations of sensor nodes. As a result, MRCQ is expected to save sensors' energy significantly and thus can support long-term monitoring WSN applications. A prototyping system is developed to verify its feasibility. Simulation results also show the efficiency of MRCQ compared to existing work.

Journal Article•DOI•
TL;DR: In this paper, upper and lower bounds for elementary functions are formally established, and a rational interval arithmetic based on these bounds is developed so that real number calculations on elementary functions, which are difficult to handle in mechanical proofs, can be performed within a theorem prover in a highly automated way.
Abstract: Real number calculations on elementary functions are remarkably difficult to handle in mechanical proofs. In this paper, we show how these calculations can be performed within a theorem prover or proof assistant in a convenient and highly automated as well as interactive way. First, we formally establish upper and lower bounds for elementary functions. Then, based on these bounds, we develop a rational interval arithmetic where real number calculations take place in an algebraic setting. In order to reduce the dependency effect of interval arithmetic, we integrate two techniques: interval splitting and Taylor series expansions. This pragmatic approach has been developed, and formally verified, in a theorem prover. The formal development also includes a set of customizable strategies to automate proofs involving explicit calculations over real numbers. Our ultimate goal is to provide guaranteed proofs of numerical properties with minimal human theorem-prover interaction.
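
An illustrative sketch of rational interval arithmetic of the kind described above, with exact Fraction endpoints; the class design and the x*x example showing the dependency effect (which the paper mitigates via interval splitting and Taylor expansions) are illustrative, not the paper's formalization.

    from fractions import Fraction

    class Interval:
        # Minimal rational interval arithmetic: endpoints are exact fractions,
        # so no floating-point rounding enters the computation.
        def __init__(self, lo, hi=None):
            self.lo = Fraction(lo)
            self.hi = Fraction(hi if hi is not None else lo)

        def __add__(self, other):
            return Interval(self.lo + other.lo, self.hi + other.hi)

        def __sub__(self, other):
            return Interval(self.lo - other.hi, self.hi - other.lo)

        def __mul__(self, other):
            products = [self.lo * other.lo, self.lo * other.hi,
                        self.hi * other.lo, self.hi * other.hi]
            return Interval(min(products), max(products))

        def __contains__(self, x):
            return self.lo <= Fraction(x) <= self.hi

    # Dependency effect: for x in [-1, 1], plain interval multiplication
    # reports x*x as [-1, 1] instead of the tight [0, 1].
    x = Interval(-1, 1)
    print((x * x).lo, (x * x).hi)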