
Showing papers in "IEEE Transactions on Computers in 2008"


Journal Article•DOI•
TL;DR: Results confirm the unique benefits for future generations of CMPs that can be achieved by bringing optics into the chip in the form of photonic NoCs, as well as a comparative power analysis of a photonic versus an electronic NoC.
Abstract: The design and performance of next-generation chip multiprocessors (CMPs) will be bound by the limited amount of power that can be dissipated on a single die. We present photonic networks-on-chip (NoC) as a solution to reduce the impact of intra-chip and off-chip communication on the overall power budget. A photonic interconnection network can deliver higher bandwidth and lower latencies with significantly lower power dissipation. We explain why on-chip photonic communication has recently become a feasible opportunity and explore the challenges that need to be addressed to realize its implementation. We introduce a novel hybrid micro-architecture for NoCs combining a broadband photonic circuit-switched network with an electronic overlay packet-switched control network. We address the critical design issues including: topology, routing algorithms, deadlock avoidance, and path-setup/tear-down procedures. We present experimental results obtained with POINTS, an event-driven simulator specifically developed to analyze the proposed idea, as well as a comparative power analysis of a photonic versus an electronic NoC. Overall, these results confirm the unique benefits for future generations of CMPs that can be achieved by bringing optics into the chip in the form of photonic NoCs.

873 citations


Journal Article•DOI•
TL;DR: Simulation experiments show that an Ethernet link with ALR can operate at a lower data rate for over 80 percent of the time, yielding significant energy savings with only a very small increase in packet delay.
Abstract: The rapidly increasing energy consumption by computing and communications equipment is a significant economic and environmental problem that needs to be addressed. Ethernet network interface controllers (NICs) in the US alone consume hundreds of millions of US dollars in electricity per year. Most Ethernet links are underutilized and link energy consumption can be reduced by operating at a lower data rate. In this paper, we investigate adaptive link rate (ALR) as a means of reducing the energy consumption of a typical Ethernet link by adaptively varying the link data rate in response to utilization. Policies to determine when to change the link data rate are studied. Simple policies that use output buffer queue length thresholds and fine-grain utilization monitoring are shown to be effective. A Markov model of a state-dependent service rate queue with rate transitions only at service completion is used to evaluate the performance of ALR with respect to the mean packet delay, the time spent in an energy-saving low link data rate, and the oscillation of link data rates. Simulation experiments using actual and synthetic traffic traces show that an Ethernet link with ALR can operate at a lower data rate for over 80 percent of the time, yielding significant energy savings with only a very small increase in packet delay.
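
The queue-threshold policies the paper studies lend themselves to a compact illustration. Below is a minimal sketch of a dual-threshold rate-switching rule in Python; the rate values and the threshold names Q_LOW/Q_HIGH are illustrative assumptions, not the paper's measured parameters.

```python
# Illustrative dual-threshold ALR policy (not the paper's exact algorithm):
# switch the link to a low rate when the output queue drains below Q_LOW,
# and back to the high rate when it grows beyond Q_HIGH.

LOW_RATE, HIGH_RATE = 100e6, 1e9   # bits/s (hypothetical 100 Mb/s / 1 Gb/s link)
Q_LOW, Q_HIGH = 2, 32              # queue-length thresholds in packets (assumed)

def next_rate(current_rate, queue_len):
    """Return the link rate for the next interval given the queue length."""
    if current_rate == HIGH_RATE and queue_len <= Q_LOW:
        return LOW_RATE            # link is underutilized: save energy
    if current_rate == LOW_RATE and queue_len >= Q_HIGH:
        return HIGH_RATE           # queue is building up: limit packet delay
    return current_rate            # hysteresis band: keep the current rate

# Example: a drained queue at the high rate triggers a downshift.
assert next_rate(HIGH_RATE, 1) == LOW_RATE
```

The hysteresis band between the two thresholds is what limits the rate oscillation that the paper's Markov model quantifies.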

423 citations


Journal Article•DOI•
TL;DR: In this paper, a new coding scheme, called the STAR code, is proposed for correcting triple storage node failures (erasures); it is an extension of the double-erasure-correcting EVENODD code.
Abstract: Proper data placement schemes based on erasure-correcting codes are one of the most important components of a highly available data storage system. For such schemes, low decoding complexity for correcting (or recovering) storage node failures is essential in practical systems. In this paper, we describe a new coding scheme, which we call the STAR code, for correcting triple storage node failures (erasures). The STAR code is an extension of the double-erasure-correcting EVENODD code and a modification of the generalized triple-erasure-correcting EVENODD code. The STAR code is a Maximum Distance Separable (MDS) code and is thus optimal in terms of node failure recovery capability for a given data redundancy. We provide detailed STAR code decoding algorithms for correcting various triple node failures. We show that the decoding complexity of the STAR code is much lower than those of existing comparable codes; the STAR code is therefore of real practical value for storage systems that need higher reliability.
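
To make the triple-parity idea concrete, here is a toy Python sketch in the spirit of EVENODD/STAR: a (p-1) x p data array gets horizontal, diagonal, and anti-diagonal XOR parity columns. It deliberately omits the EVENODD-style adjuster terms and all of the paper's decoding algorithms, so it illustrates the layout only, not the actual STAR code.

```python
# Toy triple-XOR-parity layout in the spirit of EVENODD/STAR (adjuster
# terms and real decoding omitted): p is prime, data is (p-1) rows x p
# columns of byte-sized symbols.
from functools import reduce

p = 5
data = [[(3 * r + 7 * c) % 256 for c in range(p)] for r in range(p - 1)]

horiz = [0] * (p - 1)   # row parity
diag  = [0] * p         # (r + c) mod p parity
anti  = [0] * p         # (r - c) mod p parity
for r in range(p - 1):
    for c in range(p):
        horiz[r] ^= data[r][c]
        diag[(r + c) % p] ^= data[r][c]
        anti[(r - c) % p] ^= data[r][c]

# Single-column recovery from horizontal parity alone, as a sanity check:
lost = 2
recovered = [horiz[r] ^ reduce(lambda x, y: x ^ y,
             (data[r][c] for c in range(p) if c != lost))
             for r in range(p - 1)]
assert recovered == [row[lost] for row in data]
```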

264 citations


Journal Article•DOI•
TL;DR: This paper presents an architecture of a state-of-the-art processor for RFID tags with an elliptic curve (EC) processor over GF(2^163) and shows the plausibility of meeting both security and efficiency requirements even in a passive RFID tag.
Abstract: RFID (radio frequency identification) tags need to include security functions, yet at the same time their resources are extremely limited. Moreover, to provide privacy, authentication, and protection against tracking of RFID tags without losing system scalability, a public-key-based approach is inevitable, as shown by M. Burmester et al. In this paper, we present an architecture of a state-of-the-art processor for RFID tags with an elliptic curve (EC) processor over GF(2^163). It shows the plausibility of meeting both security and efficiency requirements even in a passive RFID tag. The proposed processor is able to perform EC scalar multiplications as well as general modular arithmetic (additions and multiplications), which are needed for the cryptographic protocols. As we work with large numbers, the register file is the most critical component in the architecture. By combining several techniques, we are able to reduce the number of registers from 9 to 6, resulting in an EC processor of 10.1 Kgates. To obtain efficient modulo arithmetic, we introduce a redundant modular operation. Moreover, the proposed architecture can support multiple cryptographic protocols. The synthesis results with a 0.13-μm CMOS technology show that the gate area of the most compact version is 12.5 Kgates.

253 citations


Journal Article•DOI•
TL;DR: A new counter-based approach deals with dead lines and cache pollution: lines predicted to be dead are replaced early from the L2 cache, never-reaccessed lines are identified and bypassed, and each L2 line is augmented with an event counter that is incremented when an event of interest, such as certain cache accesses, occurs.
Abstract: Recent studies have shown that, in highly associative caches, the performance gap between the least recently used (LRU) and the theoretical optimal replacement algorithms is large, motivating the design of alternative replacement algorithms to improve cache performance. In LRU replacement, a line, after its last use, remains in the cache for a long time until it becomes the LRU line. Such dead lines unnecessarily reduce the cache capacity available for other lines. In addition, in multilevel caches, temporal reuse patterns are often inverted, showing in the L1 cache but, due to the filtering effect of the L1 cache, not showing in the L2 cache. At the L2, these lines appear to be brought into the cache but are never reaccessed until they are replaced. These lines unnecessarily pollute the L2 cache. This paper proposes a new counter-based approach to deal with the above problems. For the former problem, we predict lines that have become dead and replace them early from the L2 cache. For the latter problem, we identify never-reaccessed lines, bypass the L2 cache, and place them directly in the L1 cache. Both techniques are achieved through a single counter-based mechanism. In our approach, each line in the L2 cache is augmented with an event counter that is incremented when an event of interest, such as certain cache accesses, occurs. When the counter reaches a threshold, the line "expires" and becomes replaceable. Each line's threshold is unique and is dynamically learned. We propose and evaluate two new replacement algorithms: Access interval predictor (AIP) and live-time predictor (LvP). AIP and LvP speed up 10 capacity-constrained SPEC2000 benchmarks by up to 48 percent and 15 percent on average (7 percent on average for the whole 21 SPEC2000 benchmarks). Cache bypassing further reduces L2 cache pollution and improves the average speedups to 17 percent (8 percent for the whole 21 SPEC2000 benchmarks).
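
A minimal Python sketch of the counter-based expiry mechanism described above; the class layout, the fixed threshold, and the victim-selection fallback are illustrative assumptions (the paper learns each line's threshold dynamically and pairs the counters with the AIP/LvP prediction logic).

```python
# Minimal sketch of a counter-based "expiry" predictor for an L2 line,
# in the spirit of the paper's AIP/LvP predictors (thresholds here are
# fixed; the real design learns one per line).

class L2Line:
    def __init__(self, tag, threshold=4):
        self.tag = tag
        self.counter = 0            # event counter (e.g., accesses to the set)
        self.threshold = threshold  # dynamically learned in the real scheme
        self.expired = False

    def on_event(self):
        """Increment on an event of interest, e.g., an access to this set."""
        self.counter += 1
        if self.counter >= self.threshold:
            self.expired = True     # line becomes a preferred victim

    def on_access(self):
        self.counter = 0            # reuse resets the interval count
        self.expired = False

def pick_victim(lines):
    """Prefer an expired (predicted-dead) line; otherwise fall back to LRU,
    represented here by the first element of an LRU-ordered list."""
    for line in lines:
        if line.expired:
            return line
    return lines[0]
```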

230 citations


Journal Article•DOI•
TL;DR: A novel selection strategy based on the concept of Neighbors-on-Path is presented that can be coupled with any adaptive routing algorithm to exploit the situations of indecision occurring when the routing function returns several admissible output channels.
Abstract: Efficient and deadlock-free routing is critical to the performance of networks-on-chip. The effectiveness of any adaptive routing algorithm strongly depends on the underlying selection strategy. A selection function is used to select the output channel on which the packet will be forwarded. In this paper, we present a novel selection strategy that can be coupled with any adaptive routing algorithm. The proposed selection strategy is based on the concept of Neighbors-on-Path, the aim of which is to exploit the situations of indecision occurring when the routing function returns several admissible output channels. The overall objective is to choose the channel that will allow the packet to be routed to its destination along a path that is as free as possible of congested nodes. Performance evaluation is carried out by using a flit-accurate simulator under traffic scenarios generated by both synthetic and real applications. Results show that the proposed selection strategy, applied to the Odd-Even routing algorithm, yields improvements in average delay and saturation point of up to 20% and 30% on average, respectively, with minimal overhead in terms of area occupation. In addition, a positive effect on total energy consumption is also observed under near-congestion packet injection rates.
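
Where a selection strategy plugs in can be shown in a few lines. The sketch below picks, among the admissible outputs returned by the routing function, the neighbor with the most free buffer slots; the real Neighbors-on-Path strategy aggregates congestion information from nodes along the path, which this simplified stand-in does not do.

```python
# Hedged sketch of a congestion-aware selection function for an adaptive
# NoC router: the routing function proposes admissible output channels,
# and the selection function breaks the tie.

def select_output(admissible, free_slots):
    """admissible: list of output port ids from the routing function.
    free_slots: dict mapping port id -> free buffer slots at the neighbor."""
    return max(admissible, key=lambda port: free_slots[port])

# Example: routing returns two admissible ports; the less congested wins.
assert select_output(["north", "east"], {"north": 1, "east": 3}) == "east"
```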

226 citations


Journal Article•DOI•
TL;DR: This paper designs and implements a synchronous and distributed medium access control (SD-MAC) protocol and proposes a QoS-aware SD-MAC to ensure the serviceability of the entire system and to improve its bandwidth utilization.
Abstract: To bridge the widening gap between computation requirements and communication efficiency faced by gigascale heterogeneous SoCs in the upcoming ubiquitous era, a new on-chip communication system, dubbed Wireless Network-on-Chip (WNoC), is introduced by using the recently developed CMOS UWB wireless interconnection technology. In this paper, a synchronous and distributed medium access control (SD-MAC) protocol is designed and implemented. Tailored for WNoC, SD-MAC employs a binary countdown approach to resolve channel contention between RF nodes. The receiver_select_sender mechanism and hidden terminal elimination scheme are proposed to increase the throughput and channel utilization of the system. Our simulation study shows the promising performance of SD-MAC in terms of throughput, latency, and network utilization. We further propose a QoS-aware SD-MAC to ensure the serviceability of the entire system and to improve the bandwidth utilization. As a major component of the simple and compact RF node design, a MAC unit implements the proposed SD-MAC that guarantees correct operation of synchronized frames while keeping overhead low. The synthesis results demonstrate several attractive features such as high speed, low power consumption, good scalability, and low area cost.
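
The binary countdown arbitration that SD-MAC builds on can be modeled in a few lines of Python: contending nodes emit their priority IDs bit by bit, MSB first, onto a wired-OR-like shared medium, and a node that sends 0 while observing 1 withdraws. The node IDs and bit width below are illustrative.

```python
# Toy model of binary-countdown contention resolution: the highest
# priority ID always wins, collision-free.

def binary_countdown(contenders, width=8):
    active = set(contenders)
    for bit in reversed(range(width)):           # MSB first
        sent = {node: (node >> bit) & 1 for node in active}
        bus = max(sent.values())                 # wired-OR of all sent bits
        active = {n for n in active if sent[n] == bus}
    (winner,) = active                           # IDs are distinct, so one remains
    return winner

assert binary_countdown({0x2A, 0x5C, 0x17}) == 0x5C
```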

196 citations


Journal Article•DOI•
TL;DR: A high-performance architecture for elliptic curve scalar multiplication based on the Montgomery ladder method over the finite field GF(2^m) is proposed, and a pseudo-pipelined word-serial finite field multiplier with word size w, suitable for the scalar multiplication, is developed.
Abstract: A high-performance architecture for elliptic curve scalar multiplication based on the Montgomery ladder method over the finite field GF(2^m) is proposed. A pseudo-pipelined word-serial finite field multiplier with word size w, suitable for the scalar multiplication, is also developed. Implemented in hardware, this system performs a scalar multiplication in approximately 6⌈m/w⌉(m−1) clock cycles, and the gate delay in the critical path is equal to T_AND + ⌈log₂(w/k)⌉·T_XOR, where T_AND and T_XOR are the delays due to two-input AND and XOR gates, respectively, and 1 ≤ k ≪ w is used to shorten the critical path.
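
The Montgomery ladder's regular structure, one add and one double per scalar bit regardless of the bit's value, is what such hardware pipelines exploit. Here is a sketch of the ladder over plain integers, with integer addition standing in for the EC point operations over GF(2^m):

```python
# Structure of the Montgomery ladder, runnable anywhere: "add" and
# "double" are stand-ins for the EC group operations. The invariant
# R1 == R0 + P holds throughout, and every bit costs exactly one add
# plus one double.

def montgomery_ladder(k, P, add=lambda a, b: a + b, double=lambda a: 2 * a):
    R0, R1 = 0, P                     # 0 plays the role of the identity
    for bit in bin(k)[2:]:            # scan scalar bits MSB -> LSB
        if bit == '0':
            R1 = add(R0, R1)
            R0 = double(R0)
        else:
            R0 = add(R0, R1)
            R1 = double(R1)
    return R0                         # equals k * P

assert montgomery_ladder(23, 7) == 23 * 7
```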

166 citations


Journal Article•DOI•
TL;DR: A number of properties of AC(P) used to symbolically simplify and handle connectors are provided, including a general component model encompassing methods for incremental model decomposition and efficient implementation by using symbolic techniques.
Abstract: We provide an algebraic formalization of connectors in the BIP component framework. A connector relates a set of typed ports. Types are used to describe different modes of synchronization, in particular, rendezvous and broadcast. Connectors on a set of ports P are modeled as terms of the algebra AC(P), generated from P by using a binary fusion operator and a unary typing operator. Typing associates with terms (ports or connectors) synchronization types - trigger or synchron - that determine modes of synchronization. Broadcast interactions are initiated by triggers. Rendezvous is a maximal interaction of a connector that includes only synchrons. The semantics of AC(P) associates with a connector the set of its interactions. It induces on connectors an equivalence relation which is not a congruence, as it is not stable under fusion. We provide a number of properties of AC(P) used to symbolically simplify and handle connectors. We provide examples illustrating applications of AC(P), including a general component model encompassing methods for incremental model decomposition and efficient implementation by using symbolic techniques.
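
The interaction semantics sketched above can be enumerated directly. The Python sketch below assumes the usual BIP reading: a connector with at least one trigger admits every interaction containing a trigger, while a trigger-free connector admits only the maximal, synchron-only interaction (rendezvous). The port names are hypothetical.

```python
from itertools import chain, combinations

def interactions(ports):
    """ports: dict name -> 'trigger' | 'synchron'. Returns the interaction
    set under the semantics sketched in the abstract (assumed reading)."""
    names = sorted(ports)
    triggers = {p for p in names if ports[p] == 'trigger'}
    if not triggers:
        return {frozenset(names)}          # rendezvous: maximal interaction
    subsets = chain.from_iterable(
        combinations(names, r) for r in range(1, len(names) + 1))
    return {frozenset(s) for s in subsets if triggers & set(s)}

# Broadcast: sender s is a trigger, receivers r1, r2 are synchrons.
print(sorted(map(sorted, interactions(
    {'s': 'trigger', 'r1': 'synchron', 'r2': 'synchron'}))))
# -> [['r1', 'r2', 's'], ['r1', 's'], ['r2', 's'], ['s']]
```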

162 citations


Journal Article•DOI•
Tim Güneysu, Timo Kasper, Martin Novotny, Christof Paar, Andy Rupp
TL;DR: This work describes various exhaustive key search attacks on symmetric ciphers and demonstrates an attack on a security mechanism employed in the electronic passport, and introduces efficient implementations of more complex cryptanalysis on asymmetric cryptosystems, e.g., elliptic curve cryptosystems (ECCs) and number cofactorization for RSA.
Abstract: Cryptanalysis of ciphers usually involves massive computations. The security parameters of cryptographic algorithms are commonly chosen so that attacks are infeasible with available computing resources. Thus, in the absence of mathematical breakthroughs to a cryptanalytical problem, a promising way for tackling the computations involved is to build special-purpose hardware exhibiting a (much) better performance-cost ratio than off-the-shelf computers. This contribution presents a variety of cryptanalytical applications utilizing the cost-optimized parallel code breaker (COPACOBANA) machine, which is a high-performance low-cost cluster consisting of 120 field-programmable gate arrays (FPGAs). COPACOBANA appears to be the only such reconfigurable parallel FPGA machine optimized for code breaking tasks reported in the open literature. Depending on the actual algorithm, the parallel hardware architecture can outperform conventional computers by several orders of magnitude. In this work, we focus on novel implementations of cryptanalytical algorithms, utilizing the impressive computational power of COPACOBANA. We describe various exhaustive key search attacks on symmetric ciphers and demonstrate an attack on a security mechanism employed in the electronic passport (e-passport). Furthermore, we describe time-memory trade-off techniques that can, e.g., be used for attacking the popular A5/1 algorithm used in GSM voice encryption. In addition, we introduce efficient implementations of more complex cryptanalysis on asymmetric cryptosystems, e.g., elliptic curve cryptosystems (ECCs) and number cofactorization for RSA. Even though breaking RSA or elliptic curves with parameter lengths used in most practical applications is out of reach with COPACOBANA, our attacks on algorithms with artificially short bit lengths allow us to extrapolate more reliable security estimates for real-world bit lengths. This is particularly useful for deriving estimates about the longevity of asymmetric key lengths.
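
Exhaustive key search parallelizes exactly the way COPACOBANA exploits: the keyspace is statically partitioned across engines, each testing its slice against a known plaintext/ciphertext pair. The Python skeleton below uses a toy 16-bit cipher as a stand-in for DES or similar; all names and constants are illustrative.

```python
# Skeleton of a partitioned exhaustive key search. Each "engine" (an FPGA
# in COPACOBANA, a loop iteration here) scans a disjoint keyspace slice.

def toy_encrypt(key, pt):
    return ((pt ^ key) * 0x9E37) % (1 << 16)    # illustrative 16-bit cipher

def search_slice(pt, ct, start, end):
    for k in range(start, end):
        if toy_encrypt(k, pt) == ct:
            return k
    return None

PT = 0x1234
CT = toy_encrypt(0xBEEF, PT)                    # secret key to recover
N_ENGINES, SPACE = 8, 1 << 16
slices = [(i * SPACE // N_ENGINES, (i + 1) * SPACE // N_ENGINES)
          for i in range(N_ENGINES)]
hits = [search_slice(PT, CT, a, b) for a, b in slices]
print([hex(k) for k in hits if k is not None])  # -> ['0xbeef']
```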

157 citations


Journal Article•DOI•
TL;DR: This paper addresses performance issues with exact response time analysis (RTA) for fixed-priority preemptive systems; initial values are introduced that improve the efficiency of the standard RTA algorithm both when exact response times are required and when only exact schedulability needs to be determined.
Abstract: Efficient exact schedulability tests are required both for on-line admission of applications to dynamic systems and as an integral part of design tools for complex distributed real-time systems. This paper addresses performance issues with exact response time analysis (RTA) for fixed priority preemptive systems. Initial values are introduced that improve the efficiency of the standard RTA algorithm (i) when exact response times are required, and (ii) when only exact schedulability need be determined. The paper also explores modifications to the standard RTA algorithm, including: the use of a response time upper bound to determine when exact analysis is needed, incremental computation aimed at faster convergence, and checking tasks in reverse priority order to identify unschedulable task sets early. The various initial values and algorithm implementations are compared by means of experiments on a PC recording the number of iterations required, and execution time measurements on a real-time embedded microprocessor. Recommendations are provided for engineers tasked with the problem of implementing exact schedulability tests, as part of on-line acceptance tests and spare capacity allocation algorithms, or as part of off-line system design tools.
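
The standard RTA recurrence the paper accelerates is the fixed-point iteration R_{n+1} = C_i + Σ_{j ∈ hp(i)} ⌈R_n / T_j⌉ · C_j. A minimal Python version follows, with an R0 hook for improved initial values; this sketch assumes deadlines equal periods, which the paper does not require.

```python
import math

def response_time(C, T, i, R0=None):
    """Exact RTA fixed-point iteration for task i (0 = highest priority).
    R0 is the initial value; a good choice of R0 (rather than the usual
    R0 = C[i]) cuts the number of iterations, which is the paper's point."""
    R = C[i] if R0 is None else R0
    while True:
        R_next = C[i] + sum(math.ceil(R / T[j]) * C[j] for j in range(i))
        if R_next == R:
            return R                 # converged: exact response time
        if R_next > T[i]:
            return None              # deadline (= period in this sketch) missed
        R = R_next

# Classic example: three tasks, rate-monotonic priorities.
C, T = [1, 2, 3], [4, 6, 12]
print([response_time(C, T, i) for i in range(3)])   # -> [1, 3, 10]
```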

Journal Article•DOI•
TL;DR: An exact technique is presented to chart the Pareto space of throughput and storage trade-offs, which can be used to determine the minimal buffer space needed to execute a graph under a given throughput constraint.
Abstract: Multimedia applications usually have throughput constraints. An implementation must meet these constraints while minimizing resource usage and energy consumption. The compute-intensive kernels of these applications are often specified as cyclo-static or synchronous dataflow graphs. Communication between nodes in these graphs requires storage space, which influences throughput. We present an exact technique to chart the Pareto space of throughput and storage trade-offs, which can be used to determine the minimal buffer space needed to execute a graph under a given throughput constraint. The feasibility of the exact technique is demonstrated with experiments on a set of realistic DSP and multimedia applications. To increase the scalability of the approach, a fast approximation technique is developed that guarantees both the throughput and a tight bound on the maximal overestimation of buffer requirements. The approximation technique allows worst-case overestimation to be traded off against run-time.

Journal Article•DOI•
TL;DR: Simulations show that the improved FEC and CNHA/CWA schemes outperform the most recent O(log₂|T|) schemes in terms of lookup time, update time, and memory requirement.
Abstract: Dynamic IP router table schemes, which have recently been proposed in the literature, perform an IP lookup or an online prefix update in O(log₂|T|) memory accesses (MAs). In terms of lookup time, they are still slower than the full expansion/compression (FEC) scheme (compressed next-hop array/code word array (CNHA/CWA)), which requires exactly (at most) three MAs, irrespective of the number of prefixes |T| in a routing table T. The prefix updates in both FEC and CNHA/CWA have a drawback: inefficient offline structure reconstruction is arguably the only viable solution. This paper solves the problem. We propose the use of lexicographically ordered prefixes to reduce the offline construction time of both schemes. Simulations on several real routing databases, run on the same platform, show that our approach constructs FEC (CNHA/CWA) tables 2.68 to 7.54 (4.57 to 6) times faster than previous techniques. We also propose an online update scheme that, using an updatable address set and selectively decompressing the FEC and CNHA/CWA structures, modifies only the next hops of the addresses in the set. Recompressing the updated structures, the resulting forwarding tables are identical to those obtained by structure reconstructions, but are obtained at much lower computational cost. Our simulations show that the improved FEC and CNHA/CWA outperform the most recent O(log₂|T|) schemes in terms of lookup time, update time, and memory requirement.
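
The expand-then-compress idea behind FEC/CNHA-CWA fits in a toy example. The Python sketch below works on a 4-bit address space with hypothetical prefixes and next hops; real structures operate on 32-bit addresses with engineered code-word/next-hop arrays, and the paper's contribution is constructing and updating them efficiently.

```python
# Toy full expansion/compression: expand prefixes into a flat next-hop
# array (longest prefix wins), then compress it into change points.

BITS = 4
prefixes = {"0000/1": "A", "0100/3": "B", "0110/4": "C"}  # bits/length -> hop

def expand(prefixes):
    hops = [None] * (1 << BITS)
    for pfx in sorted(prefixes, key=lambda p: int(p.split("/")[1])):
        bits, length = pfx.split("/")
        length = int(length)
        base = int(bits, 2) & ~((1 << (BITS - length)) - 1)
        for a in range(base, base + (1 << (BITS - length))):
            hops[a] = prefixes[pfx]           # longer prefixes overwrite
    return hops

def compress(hops):
    return [(a, h) for a, h in enumerate(hops) if a == 0 or h != hops[a - 1]]

print(compress(expand(prefixes)))
# -> [(0, 'A'), (4, 'B'), (6, 'C'), (7, 'A'), (8, None)]
```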

Journal Article•DOI•
TL;DR: A novel approach based on a Double-Data-Rate (DDR) computation template is proposed, which is compared to other existing architectures and countermeasures, and a thorough dependability analysis is given.
Abstract: Differential Fault Analysis (DFA) is one of the most powerful techniques to attack cryptosystems. Several countermeasures have been proposed, which are based either on information or temporal redundancy. In this work, we propose a novel approach based on a Double-Data-Rate (DDR) computation template. A few sample architectures have been implemented: they are compared to other existing architectures and countermeasures, and a thorough dependability analysis is given.

Journal Article•DOI•
TL;DR: This paper presents a secure NoC architecture composed of a set of data protection units (DPUs) implemented within the network interfaces, and focuses on the dynamic updating of the DPUs to support their utilization in dynamic environments, and on the utilization of authentication techniques to increase the level of security.
Abstract: Security is gaining increasing relevance in the development of embedded devices. Working towards a secure system at each level of design, this paper addresses security aspects related to network-on-chip (NoC) architectures, foreseen as the communication infrastructure of next-generation embedded devices. In the context of NoC-based multiprocessor systems, we focus on the not yet thoroughly addressed topic of data protection. In this paper, we present a secure NoC architecture composed of a set of data protection units (DPUs) implemented within the network interfaces. The run-time configuration of the programmable part of the DPUs is managed by a central unit, the network security manager (NSM). The DPU, similar to a firewall, can check and limit the access rights (none, read, write, or both) of processors accessing data and instructions in a shared memory - in particular, distinguishing between the operating roles (supervisor/user and secure/unsecure) of the processing elements. We explore different alternative implementations for the DPU and demonstrate how this unit does not affect the network latency if the memory request has the appropriate rights. We also focus on the dynamic updating of the DPUs to support their utilization in dynamic environments, and on the utilization of authentication techniques to increase the level of security.
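
A firewall-style rights check of the kind the DPU performs can be sketched as a table lookup; the initiator/role/block names and the table layout below are illustrative, and the real DPU is a hardware block in the network interface whose programmable part the NSM updates at run time.

```python
# Minimal sketch of a DPU-style access check: a table keyed by
# (initiator, role, memory block) holding the allowed access mode.

NONE, READ, WRITE, BOTH = 0, 1, 2, 3             # access-right encoding

rights = {
    ("cpu0", "supervisor", "block0"): BOTH,
    ("cpu0", "user",       "block0"): READ,
    ("cpu1", "user",       "block0"): NONE,
}

def dpu_check(initiator, role, block, is_write):
    allowed = rights.get((initiator, role, block), NONE)  # default deny
    return allowed & (WRITE if is_write else READ) != 0

assert dpu_check("cpu0", "user", "block0", is_write=False)
assert not dpu_check("cpu0", "user", "block0", is_write=True)
assert not dpu_check("cpu1", "user", "block0", is_write=False)
```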

Journal Article•DOI•
TL;DR: The problem of finding an energy-efficient, collision-free polling schedule is shown to be NP-hard, and a fast online approximation algorithm is given; results show that the polling scheme can reduce the active time of sensors by a significant amount while sustaining 100 percent throughput.
Abstract: In this paper, we study two-layered heterogeneous sensor networks where two types of nodes are deployed: the basic sensor nodes and the cluster head nodes. The basic sensor nodes are simple and have limited power supplies, whereas the cluster head nodes are much more powerful and have many more power supplies, which organize sensors around them into clusters. Such two-layered heterogeneous sensor networks have better scalability and lower overall cost than homogeneous sensor networks. We propose using polling to collect data from sensors to the cluster head since polling can prolong network life by avoiding collisions and reducing the idle listening time of sensors. We focus on finding energy-efficient and collision-free polling schedules in a multihop cluster. To reduce energy consumption in idle listening, a schedule is optimal if it uses the minimum time. We show that the problem of finding an optimal schedule is NP-hard and then give a fast online algorithm to solve it approximately. We also consider dividing a cluster into sectors and using multiple nonoverlapping frequency channels to further reduce the idle listening time of sensors. We conducted simulations on the NS-2 simulator and the results show that our polling scheme can reduce the active time of sensors by a significant amount while sustaining 100 percent throughput.

Journal Article•DOI•
TL;DR: The dynamic range encoding scheme (DRES) is proposed to significantly improve the TCAM storage efficiency for range matching; evaluated on real-world databases, DRES reduces the TCAM storage expansion ratio from 6.20 to 1.23.
Abstract: One of the most critical resource management issues in the use of ternary content-addressable memory (TCAM) for packet classification/filtering is how to effectively support filtering rules with ranges, known as range matching. In this paper, the dynamic range encoding scheme (DRES) is proposed to significantly improve the TCAM storage efficiency for range matching. Unlike the existing range encoding schemes requiring additional hardware support, DRES uses the TCAM coprocessor itself to assist range encoding. Hence, DRES can be readily programmed in a network processor using a TCAM coprocessor for packet classification. A salient feature of DRES is its ability to allow a subset of ranges to be encoded and, hence, to have full control over the range code size. This advantage allows DRES to exploit the TCAM structure to maximize the TCAM storage efficiency. DRES is a comprehensive solution, including a dynamic range selection algorithm, a search key encoding scheme, a range encoding scheme, and a dynamic encoded range update algorithm. While the dynamic range selection algorithm, running in software, allows optimal selection of the ranges to be encoded to fully utilize the TCAM storage, the dynamic encoded range update algorithm allows the TCAM database to be updated lock-free, without interrupting the TCAM database lookup process. DRES is evaluated based on real-world databases and the results show that DRES can reduce the TCAM storage expansion ratio from 6.20 to 1.23. The performance analysis of DRES based on a probabilistic model demonstrates that DRES significantly improves the TCAM storage efficiency for a wide spectrum of range distributions.
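
The storage expansion that DRES attacks comes from the naive range-to-prefix mapping: without encoding, a single range rule becomes several TCAM entries. A standard splitting routine in Python shows the blow-up on a familiar port range:

```python
# Why ranges inflate TCAM usage: a naive (non-encoded) mapping splits a
# range into maximal prefix-aligned blocks, each costing one TCAM entry.

def range_to_prefixes(lo, hi, bits=16):
    """Split [lo, hi] into the minimal set of prefix-aligned blocks."""
    out = []
    while lo <= hi:
        size = lo & -lo if lo else 1 << bits      # largest alignment of lo
        while size > hi - lo + 1:                 # shrink to fit the range
            size >>= 1
        plen = bits - size.bit_length() + 1
        out.append(f"{lo:0{bits}b}"[:plen] + "*" * (bits - plen))
        lo += size
    return out

# A classic case: a TCP port range like [1024, 65535].
entries = range_to_prefixes(1024, 65535)
print(len(entries), entries[:3])   # 6 prefix entries for one range rule
```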

Journal Article•DOI•
TL;DR: This paper proposes a probabilistic approach to compute the covered area fraction at critical percolation for both the SCPT and NCPT problems, and proposes a model for percolation in WSNs, called the correlated disk model, which provides a basis for solving the SCPT and NCPT problems together.
Abstract: While sensing coverage reflects the surveillance quality provided by a wireless sensor network (WSN), network connectivity enables data gathered by sensors to reach a central node, called the sink. Given an initially uncovered field, as more and more sensors are continuously added to a WSN, the size of partially covered areas increases. At some point, the situation abruptly changes from small fragmented covered areas to a single large covered area. We call this abrupt change the sensing-coverage phase transition (SCPT). Also, given an originally disconnected WSN, as more and more sensors are added, the number of connected components changes such that the WSN suddenly becomes connected at some point. We call this sudden change the network-connectivity phase transition (NCPT). The nature of such phase transitions is a central topic in the percolation theory of Boolean models. In this paper, we propose a probabilistic approach to compute the covered area fraction at critical percolation for both the SCPT and NCPT problems. Because sensing coverage and network connectivity are not totally orthogonal, we also propose a model for percolation in WSNs, called the correlated disk model, which provides a basis for solving the SCPT and NCPT problems together.

Journal Article•DOI•
TL;DR: An improved block-based compact thermal model (HotSpot 4.0) is presented that automatically achieves good accuracy even under extreme conditions and has been extensively validated with detailed finite-element thermal simulation tools.
Abstract: Protecting silicon chips from harmful, even disastrous, thermal hazards has become increasingly challenging, so thermal effects must be considered early in the design cycle. To achieve this, an accurate yet fast temperature model together with an early-stage, thermally optimized design flow are needed. In this paper, we present an improved block-based compact thermal model (HotSpot 4.0) that automatically achieves good accuracy even under extreme conditions. The model has been extensively validated against detailed finite-element thermal simulation tools. We also show that properly modeling package components and applying the right boundary conditions are crucial to making full-chip thermal models like HotSpot accurately resemble what happens in the real world. Ignoring or over-simplifying package components can lead to inaccurate temperature estimations and potential thermal hazards that are costly to fix in later design stages. Such a full-chip and package thermal model can then be incorporated into a thermally optimized design flow where it acts as an efficient communication medium among computer architects, circuit designers, and package designers in early microprocessor design stages, to achieve early and accurate design decisions and faster design convergence. For example, the temperature-leakage interaction can be readily analyzed within such a design flow to predict potential thermal hazards such as thermal runaway.
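
The basic element that block-based compact thermal models compose is a lumped thermal RC node. Below is a single-node explicit-Euler step in Python, with illustrative parameter values that are assumptions of this sketch, not HotSpot's:

```python
# One lumped-RC thermal node: dT/dt = (P - (T - T_amb)/R) / C.
# Compact models like HotSpot connect many such nodes, one per block
# and package component; this sketch shows only the building block.

def rc_step(T, power, dt, R=0.8, C=35.0, T_amb=45.0):
    """Advance block temperature T (deg C) by one explicit-Euler step.
    R: thermal resistance to ambient (K/W); C: thermal capacitance (J/K)."""
    return T + dt * (power - (T - T_amb) / R) / C

T = 45.0
for _ in range(600):                 # 60 s of a 30 W block at 0.1 s steps
    T = rc_step(T, power=30.0, dt=0.1)
print(round(T, 1))                   # approaches T_amb + P*R = 69 deg C
```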

Journal Article•DOI•
TL;DR: This paper exploits the rich set of flexible features offered at the medium access control (MAC) layer of WiMax for the construction and transmission of MAC protocol data units (MPDUs) for supporting multiple VoIP streams and shows that the feedback-based technique coupled with retransmissions, aggregation, and variable length MPDUs are effective and increase the R-score and mean opinion score.
Abstract: Real-time services such as VoIP are becoming popular and are major revenue earners for network service providers. These services are no longer confined to the wired domain and are being extended over wireless networks. Although some of the existing wireless technologies can support some low-bandwidth applications, the bandwidth demands of many multimedia applications exceed the capacity of these technologies. The IEEE 802.16-based WiMax promises to be one of the wireless access technologies capable of supporting very high bandwidth applications. In this paper, we exploit the rich set of flexible features offered at the medium access control (MAC) layer of WiMax for the construction and transmission of MAC protocol data units (MPDUs) for supporting multiple VoIP streams. We study the quality of VoIP calls, usually given by R-score, with respect to the delay and loss of packets. We observe that loss is more sensitive than delay; hence, we compromise the delay performance within acceptable limits in order to achieve a lower packet loss rate. Through a combination of techniques like forward error correction, automatic repeat request, MPDU aggregation, and minislot allocation, we strike a balance between the desired delay and loss. Simulation experiments are conducted to test the performance of the proposed mechanisms. We assume a three-state Markovian channel model and study the performance with and without retransmissions. We show that the feedback-based technique coupled with retransmissions, aggregation, and variable length MPDUs are effective and increase the R-score and mean opinion score by about 40 percent.
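
The delay/loss trade-off the paper balances is typically scored with an E-model approximation. The sketch below uses the widely cited Cole-Rosenbluth form with G.711-style parameters; these constants are assumptions taken from that literature, not values from this paper.

```python
import math

def r_score(delay_ms, loss):
    """E-model approximation (Cole & Rosenbluth) for a G.711-style call;
    the impairment constants are codec-dependent assumptions."""
    Id = 0.024 * delay_ms + 0.11 * (delay_ms - 177.3) * (delay_ms > 177.3)
    Ie = 30.0 * math.log(1 + 15.0 * loss)        # loss impairment
    return 94.2 - Id - Ie

def mos(R):
    """Map an R-score to a mean opinion score (standard mapping)."""
    R = max(0.0, min(100.0, R))
    return 1 + 0.035 * R + 7e-6 * R * (R - 60) * (100 - R)

# Loss hurts more than delay: 1% loss at 100 ms vs 0% loss at 180 ms.
print(mos(r_score(100, 0.01)), mos(r_score(180, 0.0)))
```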

Journal Article•DOI•
Tao Xie
TL;DR: This paper proposes a novel energy-aware strategy, called striping-based energy-aware (SEA), which can be integrated into data placement in RAID-structured storage systems to noticeably save energy while providing quick responses; extensive experimental results demonstrate that, compared with traditional non-striping data placement algorithms, the SEA-powered algorithms significantly improve performance and save energy.
Abstract: Many real-world applications need to frequently access data stored on large-scale parallel disk storage systems. On one hand, prompt responses to access requests are essential for these applications. On the other hand, with an explosive increase in data volume and the emergence of faster disks with higher power requirements, the energy consumption of disk-based storage systems has become a salient issue. To achieve energy conservation and prompt responses simultaneously, in this paper we propose a novel energy-aware strategy, called striping-based energy-aware (SEA), which can be integrated into data placement in RAID-structured storage systems to noticeably save energy while providing quick responses. To illustrate the effectiveness of SEA, we implement two SEA-powered striping-based data placement algorithms, SEA0 and SEA5, by incorporating the SEA strategy into RAID-0 and RAID-5, respectively. Extensive experimental results demonstrate that, compared with traditional non-striping data placement algorithms, our algorithms significantly improve performance and save energy. Further, compared with an existing striping-based data placement scheme, the two SEA-powered strategies noticeably reduce energy consumption with only a little performance degradation.

Journal Article•DOI•
TL;DR: This work shows how to maintain semantic equivalence between specification and implementation using an intermediate model (similar to a Kahn process network but with finite queues) that helps in defining the transformation.
Abstract: Synchronous systems offer a clean semantics and an easy verification path at the expense of often inefficient implementations. Capturing design specifications as synchronous models and then implementing the specifications in a less restrictive platform allows a much larger design space to be addressed. The key issue in this approach is maintaining semantic equivalence between the synchronous model and its implementation. We address this problem by showing how to map a synchronous model onto a loosely time-triggered architecture that is fairly straightforward to implement, as it does not require global synchronization or blocking communication. We show how to maintain semantic equivalence between specification and implementation using an intermediate model (similar to a Kahn process network but with finite queues) that helps in defining the transformation. Performance of the semantics-preserving implementation is studied for the general case as well as for a few special cases.

Journal Article•DOI•
TL;DR: This paper develops a multiconstraint energy-saving model for the RAID environment by considering both disk characteristics and workload features and proposes an energy saving policy, eRAID (energy-efficient RAID), for conventional disk-based mirrored and parity redundant disk array architectures.
Abstract: Recently, high energy consumption has become a serious concern for both storage servers and data centers. Recent research studies have utilized the short transition times of multispeed disks to decrease energy consumption, but manufacturing challenges and costs have so far prevented commercial deployment of multispeed disks. In this paper, we propose an energy-saving policy, eRAID (energy-efficient RAID), for conventional disk-based mirrored and parity-redundant disk array architectures. eRAID saves energy by spinning down part or all of the mirror disk group, subject to constraints on acceptable performance degradation. We first develop a multiconstraint energy-saving model for the RAID environment by considering both disk characteristics and workload features. Then, we develop a performance (response time and throughput) control scheme for eRAID based on the analytical model. Experimental results show that eRAID can save up to 32 percent energy while satisfying the predefined performance requirement.

Journal Article•DOI•
TL;DR: This work investigates the sensor localization problem from a novel perspective by treating it as a functional dual of target tracking, utilizing a moving location assistant (LA) (with a global positioning system (GPS) or a predefined moving path) to help location-unaware sensors to accurately discover their positions.
Abstract: As one of the fundamental issues in wireless sensor networks (WSNs), the sensor localization problem has recently received extensive attention. In this work, we investigate this problem from a novel perspective by treating it as a functional dual of target tracking. In traditional tracking problems, static location-aware sensors track and predict the position and/or velocity of a moving target. As a dual, we utilize a moving location assistant (LA) (with a global positioning system (GPS) or a predefined moving path) to help location-unaware sensors accurately discover their positions. We call our proposed system Landscape. In Landscape, an LA (an aircraft, for example) periodically broadcasts its current location (we call it a beacon) while it moves around or through a sensor field. Each sensor collects the location beacons, measures the distance between itself and the LA based on the received signal strength (RSS), and individually calculates its location via an Unscented Kalman Filter (UKF)-based algorithm. Landscape has several features that are favorable to WSNs, such as high scalability, no intersensor communication overhead, moderate computation cost, and robustness to range errors and network connectivity. Extensive simulations demonstrate that Landscape is an efficient sensor positioning scheme for outdoor sensor networks.
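
The RSS-to-distance step can be sketched with the standard log-distance path-loss model; the reference power p0_dbm, reference distance d0, and exponent n below are environment-dependent assumptions, and Landscape feeds such noisy range estimates into a UKF rather than trusting them directly.

```python
def rss_to_distance(rss_dbm, p0_dbm=-40.0, d0=1.0, n=2.5):
    """Log-distance path-loss inversion: turn an RSS reading into a range
    estimate. p0_dbm is the received power at reference distance d0 and
    n is the path-loss exponent; all three are calibration assumptions."""
    return d0 * 10 ** ((p0_dbm - rss_dbm) / (10 * n))

# A beacon heard at -65 dBm is roughly 10 m away under these parameters.
print(round(rss_to_distance(-65.0), 1))   # -> 10.0
```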

Journal Article•DOI•
TL;DR: A detailed 3-dimensional computational fluid dynamics based thermal modeling tool, called ThermoStat, is presented for rack-mounted server systems, along with reactive and proactive thermal management techniques and isothermal workload distribution for the rack.
Abstract: Temperature-aware computing is becoming more important in the design of computer systems as power densities are increasing and high operating temperatures result in higher component failure rates and increased demand for cooling capability. Computer architects and system software designers need to understand the thermal consequences of their proposals and develop techniques to lower operating temperatures to reduce both transient and permanent component failures. Recognizing the need for thermal modeling tools to support such research, there has been work on modeling processor temperatures at the micro-architectural level, which can be easily understood and employed by computer architects for processor designs. However, there is a dearth of such tools in the academic/research community for undertaking architectural/systems studies beyond a processor - a server box, rack, or even a machine room. In this paper we present a detailed 3-dimensional computational fluid dynamics based thermal modeling tool, called ThermoStat, for rack-mounted server systems. We conduct several experiments with this tool to show how different load conditions affect the thermal profile, and also illustrate how this tool can help design dynamic thermal management techniques. We propose reactive and proactive thermal management for rack-mounted servers and isothermal workload distribution for the rack.

Journal Article•DOI•
TL;DR: An extensive comparison of the proposed architecture and previous QCA serial memories is pursued in terms of latency, timing, clocking requirements, and hardware complexity.
Abstract: Quantum-dot Cellular Automata (QCA) has been widely advocated as a new device architecture for nanotechnology. QCA systems require extremely low power, together with the potential for high density and regularity. These features make QCA an attractive technology for manufacturing memories in which the paradigm of memory-in-motion can be fully exploited. This paper proposes a novel serial memory architecture for QCA implementation. This architecture is based on utilizing new building blocks (referred to as tiles) in the storage and input/output circuitry of the memory. The QCA paradigm of memory-in-motion is accomplished using a novel arrangement in the storage loop and timing/clocking; a three-zone memory tile is proposed by which information is moved across a concatenation of tiles by utilizing a two-level clocking mechanism. Clocking zones are shared between memory cells and the length of the QCA line of a clocking zone is independent of the word size. QCA circuits for address decoding and input/output for simplification of the Read/Write operations are discussed in detail. An extensive comparison of the proposed architecture and previous QCA serial memories is pursued in terms of latency, timing, clocking requirements, and hardware complexity.

Journal Article•DOI•
TL;DR: This paper proposes FPGA-based designs for several basic linear algebra operations, including dot product, matrix-vector multiplication, matrix multiplication and matrix factorization, and shows that with faster floating-point units and larger devices, the performance of the designs increases accordingly.
Abstract: Numerical linear algebra operations are key primitives in scientific computing. Performance optimizations of such operations have been extensively investigated. With the rapid advances in technology, hardware acceleration of linear algebra applications using FPGAs (field programmable gate arrays) has become feasible. In this paper, we propose FPGA-based designs for several basic linear algebra operations, including dot product, matrix-vector multiplication, matrix multiplication and matrix factorization. By identifying the parameters for each operation, we analyze the trade-offs and propose a high-performance design. In the implementations of the designs, the values of the parameters are determined according to the hardware constraints, such as the available chip area, the size of available memory, the memory bandwidth, and the number of I/O pins. The proposed designs are implemented on Xilinx Virtex-II Pro FPGAs. Experimental results show that our designs scale with the available hardware resources. Also, the performance of our designs compares favorably with that of general-purpose processor based designs. We also show that with faster floating-point units and larger devices, the performance of our designs increases accordingly.

Journal Article•DOI•
TL;DR: A novel strategy to detect interconnect faults between distinct channels in networks-on-chip is proposed and a cost-effective test sequence for Mesh NoC topologies based on XY routing is considered.
Abstract: A novel strategy to detect interconnect faults between distinct channels in networks-on-chip is proposed. Short faults between distinct channels in the data, control and communication handshake lines are considered in a cost-effective test sequence for Mesh NoC topologies based on XY routing.

Journal Article•DOI•
TL;DR: The proposed architecture increases the correlation among the patterns generated by LT-LFSR with negligible impact on test length, and is flexible to be used in both BIST and scan-based BIST architectures.
Abstract: A low-transition test pattern generator, called the low-transition linear feedback shift register (LT-LFSR), is proposed to reduce the average and peak power of a circuit during test by reducing the transitions among patterns. Transitions are reduced in two dimensions: 1) between consecutive patterns (fed to a combinational-only circuit) and 2) between consecutive bits (sent to a scan chain in a sequential circuit). LT-LFSR is independent of the circuit under test and flexible enough to be used in both BIST and scan-based BIST architectures. The proposed architecture increases the correlation among the patterns generated by LT-LFSR with negligible impact on test length. The experimental results for the ISCAS'85 and '89 benchmarks confirm up to 77 percent and 49 percent reductions in average and peak power, respectively.
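
What "transitions between consecutive patterns" means is easy to make concrete with a plain Fibonacci LFSR and a Hamming-distance counter, as below; the LT-LFSR's own transition-reducing pattern insertion is not reproduced here, and the seed and tap positions are arbitrary.

```python
# A plain Fibonacci LFSR plus a transition counter. LT-LFSR inserts
# intermediate patterns to lower this count; that logic is not shown.

def lfsr_patterns(seed, taps, width, count):
    state, out = seed, []
    for _ in range(count):
        out.append(state)
        fb = 0
        for t in taps:                       # XOR of the tapped bits
            fb ^= (state >> t) & 1
        state = ((state << 1) | fb) & ((1 << width) - 1)
    return out

def transitions(patterns):
    """Total Hamming distance between consecutive patterns."""
    return sum(bin(a ^ b).count("1")
               for a, b in zip(patterns, patterns[1:]))

pats = lfsr_patterns(seed=0b1001, taps=(3, 0), width=4, count=8)
print([f"{p:04b}" for p in pats], transitions(pats))
```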

Journal Article•DOI•
Patrick Longa, Ali Miri
TL;DR: An innovative methodology for accelerating the elliptic curve point formulae over prime fields using the substitution of multiplication with squaring and other cheaper operations, by exploiting the fact that field squaring is generally less costly than multiplication.
Abstract: We present an innovative methodology for accelerating the elliptic curve point formulae over prime fields. This flexible technique substitutes multiplications with squarings and other cheaper operations, exploiting the fact that field squaring is generally less costly than multiplication. Applying this substitution to the traditional formulae, we obtain faster point operations in unprotected sequential implementations. We show the significant impact our methodology has in protecting against Simple Side-Channel Attacks (SSCA). We modify the ECC point formulae to achieve a faster atomic structure when applying atomicity side-channel protection. In contrast to previous atomic operations that assumed squarings are indistinguishable from multiplications, our new atomic structure offers true SSCA protection because it includes squaring in its formulation. We also extend our implementation to parallel architectures such as SIMD (Single-Instruction Multiple-Data). With the introduction of a new coordinate system and with the flexibility of our methodology, we present, to our knowledge, the fastest formulae for SIMD-based schemes capable of executing 3 and 4 operations simultaneously. Finally, a new parallel SSCA-protected scheme is proposed for multiprocessor/parallel architectures by applying the atomic structure presented in this work. Our parallel and atomic operations are shown to be significantly faster than previous implementations.
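
A classical identity behind such multiplication-to-squaring trades (the paper applies case-specific substitutions to each point formula; this is just the canonical example) turns one field multiplication into one squaring plus additions when X^2 and Y^2 are available anyway:

```latex
% One multiplication traded for a squaring plus cheap additions,
% assuming X^2 and Y^2 already appear elsewhere in the point formula:
\[
  2XY = (X + Y)^2 - X^2 - Y^2 .
\]
```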