Showing papers in "IEEE Transactions on Computers in 2006"


Journal ArticleDOI
TL;DR: The first practical and secure way to integrate the iris biometric into cryptographic applications is proposed, and an error-free key can be reproduced reliably from genuine iris codes with a 99.5 percent success rate.
Abstract: We propose the first practical and secure way to integrate the iris biometric into cryptographic applications. A repeatable binary string, which we call a biometric key, is generated reliably from genuine iris codes. A well-known difficulty has been how to cope with the 10 to 20 percent of error bits within an iris code and derive an error-free key. To solve this problem, we carefully studied the error patterns within iris codes and devised a two-layer error correction technique that combines Hadamard and Reed-Solomon codes. The key is generated from a subject's iris image with the aid of auxiliary error-correction data, which do not reveal the key and can be saved in a tamper-resistant token, such as a smart card. The reproduction of the key depends on two factors: the iris biometric and the token. The attacker has to procure both of them to compromise the key. We evaluated our technique using iris samples from 70 different eyes, with 10 samples from each eye. We found that an error-free key can be reproduced reliably from genuine iris codes with a 99.5 percent success rate. We can generate up to 140 bits of biometric key, more than enough for 128-bit AES. The extraction of a repeatable binary string from biometrics opens new possible applications, where a strong binding is required between a person and cryptographic operations. For example, it is possible to identify individuals without maintaining a central database of biometric templates, to which privacy objections might be raised.
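
The inner layer of the paper's two-layer scheme uses Hadamard codes to absorb scattered bit errors. As a rough illustration of that layer only, here is a minimal Python sketch of nearest-codeword Hadamard decoding on a toy 8-bit code; the paper's actual parameters and the Reed-Solomon outer layer are omitted, and the function names are ours.

```python
def hadamard_row(k, n):
    # Row k of the 2^n x 2^n binary Hadamard matrix: bit i is the parity of k AND i.
    return [bin(k & i).count("1") % 2 for i in range(1 << n)]

def hadamard_decode(word, n):
    # Nearest-codeword decoding: pick the message whose row mismatches the least.
    return min(range(1 << n),
               key=lambda k: sum(a != b for a, b in zip(word, hadamard_row(k, n))))

codeword = hadamard_row(5, 3)   # encode message 5 as an 8-bit codeword
codeword[2] ^= 1                # flip one bit, mimicking iris-code noise
assert hadamard_decode(codeword, 3) == 5
```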

600 citations


Journal ArticleDOI
TL;DR: The diffusion-based protocol is presented, which is fully localized, and it is shown that, by imposing some constraints on the sensor network, global clock synchronization can be achieved in the presence of malicious nodes that exhibit Byzantine failures.
Abstract: Global synchronization is important for many sensor network applications that require precise mapping of collected sensor data with the time of the events, for example, in tracking and surveillance. It also plays an important role in energy conservation in MAC layer protocols. This paper describes four methods to achieve global synchronization in a sensor network: a node-based approach, a hierarchical cluster-based method, a diffusion-based method, and a fault-tolerant diffusion-based method. The diffusion-based protocol is fully localized. We present two implementations of the diffusion-based protocol for synchronous and asynchronous systems and prove its convergence. Finally, we show that, by imposing some constraints on the sensor network, global clock synchronization can be achieved in the presence of malicious nodes that exhibit Byzantine failures.
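
A minimal sketch of the diffusion idea, assuming a simple averaging rule on a static ring topology (our toy example, not the paper's protocol): each node repeatedly nudges its clock toward its neighbors' average, and all clocks converge to a common virtual clock.

```python
def diffuse(clocks, neighbors, rounds=50, rate=0.5):
    # each round, every node moves its clock toward the average of its
    # neighbors' clocks; the values converge to a common virtual clock
    for _ in range(rounds):
        nxt = dict(clocks)
        for v, nbrs in neighbors.items():
            avg = sum(clocks[u] for u in nbrs) / len(nbrs)
            nxt[v] = clocks[v] + rate * (avg - clocks[v])
        clocks = nxt
    return clocks

ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(diffuse({0: 10.0, 1: 0.0, 2: 5.0, 3: 7.0}, ring))
```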

504 citations


Journal ArticleDOI
TL;DR: The Sesame framework provides high-level modeling and simulation methods and tools for system-level performance evaluation and exploration of heterogeneous embedded systems, taking a designer systematically along the path from selecting candidate architectures, using analytical modeling and multiobjective optimization, to simulating these candidate architectures with a system-level simulation environment.
Abstract: The sheer complexity of today's embedded systems forces designers to start with modeling and simulating system components and their interactions in the very early design stages. It is therefore imperative to have good tools for exploring a wide range of design choices, especially during the early design stages, where the design space is at its largest. This paper presents an overview of the Sesame framework, which provides high-level modeling and simulation methods and tools for system-level performance evaluation and exploration of heterogeneous embedded systems. More specifically, we describe Sesame's modeling methodology and trajectory. It takes a designer systematically along the path from selecting candidate architectures, using analytical modeling and multiobjective optimization, to simulating these candidate architectures with our system-level simulation environment. This simulation environment subsequently allows for architectural exploration at different levels of abstraction while maintaining high-level and architecture-independent application specifications. We illustrate all these aspects using a case study in which we traverse Sesame's exploration trajectory for a motion-JPEG encoder application.
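
In the trajectory's first stage, analytical estimates of candidate architectures are pruned by multiobjective optimization before any simulation is run. A generic sketch of that Pareto-filtering step, with hypothetical (cycles, energy, area) numbers; Sesame's actual cost models and search are far richer.

```python
def pareto_front(points):
    # keep the non-dominated design points (all objectives minimized), the set
    # a multiobjective optimizer would hand to the simulation stage
    def dominated(p):
        return any(q != p and all(qi <= pi for qi, pi in zip(q, p)) for q in points)
    return [p for p in points if not dominated(p)]

# hypothetical (cycles, energy, area) estimates for candidate architectures
designs = [(100, 5.0, 2.1), (120, 4.0, 2.0), (110, 6.0, 2.5)]
print(pareto_front(designs))   # the third point is dominated by the first
```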

366 citations


Journal ArticleDOI
TL;DR: This work proposes a fault-tolerant detection scheme that explicitly introduces the sensor fault probability into the optimal event detection process and mathematically shows that the optimal detection error decreases exponentially with the increase of the neighborhood size.
Abstract: In this paper, we consider two important problems for distributed fault-tolerant detection in wireless sensor networks: 1) how to address both the noise-related measurement error and sensor fault simultaneously in fault-tolerant detection and 2) how to choose a proper neighborhood size n for a sensor node in fault correction such that the energy could be conserved. We propose a fault-tolerant detection scheme that explicitly introduces the sensor fault probability into the optimal event detection process. We mathematically show that the optimal detection error decreases exponentially with the increase of the neighborhood size. Experiments with both Bayesian and Neyman-Pearson approaches in simulated sensor networks demonstrate that the proposed algorithm is able to achieve better detection and better balance between detection accuracy and energy usage. Our work makes it possible to perform energy-efficient fault-tolerant detection in a wireless sensor network.
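
A toy sketch of fusing neighbors' binary decisions, folding measurement noise and the sensor fault probability into a single per-neighbor correctness probability (our simplification of the paper's Bayesian formulation); note how the fused decision sharpens as the neighborhood size n grows.

```python
import math

def event_probability(reports, p_event=0.1, p_correct=0.8):
    # reports: binary decisions from the n neighbors; p_correct folds together
    # measurement noise and the sensor fault probability
    llr = math.log(p_event / (1 - p_event))
    for r in reports:
        llr += math.log(p_correct / (1 - p_correct)) if r \
            else math.log((1 - p_correct) / p_correct)
    return 1 / (1 + math.exp(-llr))

# the fused decision error shrinks rapidly as neighbors are added
for n in (2, 4, 8):
    print(n, round(event_probability([1] * n), 4))
```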

345 citations


Journal ArticleDOI
TL;DR: This paper models the risk and insecure conditions in grid job scheduling and proposes six risk-resilient scheduling algorithms to assure secure grid job execution under different risky conditions; these algorithms can upgrade grid performance significantly at only a moderate increase in extra resources or scheduling delays in a risky grid computing environment.
Abstract: In scheduling a large number of user jobs for parallel execution on an open-resource grid system, the jobs are subject to system failures or delays caused by infected hardware, software vulnerability, and distrusted security policy. This paper models the risk and insecure conditions in grid job scheduling. Three risk-resilient strategies, preemptive, replication, and delay-tolerant, are developed to provide security assurance. We propose six risk-resilient scheduling algorithms to assure secure grid job execution under different risky conditions. We report the simulated grid performances of these new grid job scheduling algorithms under the NAS and PSA workloads. The relative performance is measured by the total job makespan, grid resource utilization, job failure rate, slowdown ratio, replication overhead, etc. In addition to extending known scheduling heuristics, we developed a new space-time genetic algorithm (STGA) based on faster searching and protected chromosome formation. Our simulation results suggest that, in a wide-area grid environment, it is more resilient for the global job scheduler to tolerate some job delays instead of resorting to preemption or replication or taking a risk on unreliable resources. We find that delay-tolerant min-min and STGA job scheduling have 13-23 percent higher performance than using risky or preemptive or replicated algorithms. The resource overheads for replicated job scheduling are kept at a low 15 percent. The delayed job execution is optimized with a delay factor, which is 20 percent of the total makespan. A Kiviat graph is proposed for demonstrating the quality of grid computing services. These risk-resilient job scheduling schemes can upgrade grid performance significantly at only a moderate increase in extra resources or scheduling delays in a risky grid computing environment.
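
A compact sketch of the delay-tolerant idea on top of min-min scheduling, under strong simplifying assumptions (single queue per site, known runtimes, a precomputed trusted-site list per job); the paper's security-demand and risk models are more detailed.

```python
def delay_tolerant_min_min(jobs, sites, trusted):
    # jobs: job -> runtime; sites: site -> earliest free time; trusted[j]: the
    # sites whose security level satisfies job j's demand. The delay-tolerant
    # policy never assigns a job to a risky site: it queues on a trusted one.
    schedule = {}
    while jobs:
        finish, j, s = min((sites[s] + t, j, s)
                           for j, t in jobs.items() for s in trusted[j])
        schedule[j] = (s, finish)
        sites[s] = finish
        del jobs[j]
    return schedule

jobs = {"j1": 4.0, "j2": 2.0, "j3": 3.0}
sites = {"secure": 0.0, "risky": 0.0}
trusted = {"j1": ["secure"], "j2": ["secure", "risky"], "j3": ["secure"]}
print(delay_tolerant_min_min(jobs, sites, trusted))
```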

200 citations


Journal ArticleDOI
TL;DR: The area-throughput trade-off for an ASIC implementation of the advanced encryption standard (AES) is explored and the over 30 Gbits/s, fully pipelined AES processor operating in the counter mode of operation can be used for the encryption of data on optical links.
Abstract: This paper explores the area-throughput trade-off for an ASIC implementation of the advanced encryption standard (AES). Different pipelined implementations of the AES algorithm as well as the design decisions and the area optimizations that lead to a low area and high throughput AES encryption processor are presented. With loop unrolling and outer-round pipelining techniques, throughputs of 30 Gbits/s to 70 Gbits/s are achievable in a 0.18-μm CMOS technology. Moreover, by pipelining the composite field implementation of the byte substitution phase of the AES algorithm (inner-round pipelining), the area consumption is reduced up to 35 percent. By designing an offline key scheduling unit for the AES processor the area cost is further reduced by 28 percent, which results in a total reduction of 48 percent while the same throughput is maintained. Therefore, the over 30 Gbits/s, fully pipelined AES processor operating in the counter mode of operation can be used for the encryption of data on optical links.
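
Counter mode is what makes full pipelining worthwhile: each keystream block depends only on the nonce and a counter, never on previous ciphertext, so a pipelined core can start a new block every cycle. A toy Python sketch of that dataflow, with a placeholder permutation standing in for AES (not the paper's hardware):

```python
def toy_cipher(block, key):
    # stand-in 32-bit permutation, NOT AES: the point is the dataflow
    return (block * 0x9E3779B1 ^ key) & 0xFFFFFFFF

def ctr_encrypt(blocks, key, nonce):
    # keystream block i depends only on (nonce, i), so all blocks are
    # independent and can be computed in a fully pipelined fashion
    return [p ^ toy_cipher(nonce + i, key) for i, p in enumerate(blocks)]

data = [0xDEADBEEF, 0x01234567]
enc = ctr_encrypt(data, key=0xCAFEBABE, nonce=42)
assert ctr_encrypt(enc, key=0xCAFEBABE, nonce=42) == data   # CTR is its own inverse
```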

197 citations


Journal ArticleDOI
Seung-Ho Lim1, Kyu Ho Park1
TL;DR: The flash file system proposed in this paper is designed for NAND flash memory storage while considering the existing file system characteristics and outperformed other flash file systems both in booting time and garbage collection overheads.
Abstract: In this paper, we present an efficient flash file system for flash memory storage. Flash memory, especially NAND flash memory, has become a major method for data storage. Currently, a block level translation interface is required between an existing file system and flash memory chips due to its physical characteristics. However, the approach of existing file systems on top of the emulating block interface has many restrictions and is, thus, inefficient because existing file systems are designed for disk-based storage systems. The flash file system proposed in this paper is designed for NAND flash memory storage while considering the existing file system characteristics. Our target performance metrics are the system booting time and garbage collection overheads, which are important issues in flash memory. In our experiments, the proposed flash file system outperformed other flash file systems both in booting time and garbage collection overheads.

175 citations


Journal ArticleDOI
J.D. Golic1
TL;DR: A new method for digital true random number generation based on asynchronous logic circuits with feedback, using the so-called Galois and Fibonacci ring oscillators, is introduced, and a concrete postprocessing technique using a self-clock-controlled linear feedback shift register is proposed.
Abstract: A new method for digital true random number generation based on asynchronous logic circuits with feedback is introduced. In particular, a concrete technique using the so-called Galois and Fibonacci ring oscillators is developed and analyzed both theoretically and experimentally. The generated random binary sequences may have a very high speed and a higher and more robust entropy rate in comparison with previous proposals for digital random number generators. A new method for digital postprocessing of random data based on irregularly clocked nonautonomous synchronous logic circuits with feedback is also introduced and a concrete technique using a self-clock-controlled linear feedback shift register is proposed. The postprocessing can provide both randomness extraction and computationally secure speed increase of input random data.
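
The ring oscillators themselves are hardware artifacts, but the postprocessing layer can be sketched. Below is a generic, regularly clocked LFSR-style whitener in Python; this is our simplification, since the paper's self-clock-controlled LFSR is irregularly clocked, which is essential to its security argument.

```python
def lfsr_whiten(raw_bits, taps=(5, 3, 2, 0), state=0b101011, width=6):
    # LFSR-style postprocessing sketch: each raw oscillator bit is absorbed
    # into the feedback, and one (hopefully less biased) bit is emitted per step
    out = []
    for b in raw_bits:
        fb = b
        for t in taps:
            fb ^= (state >> t) & 1
        state = ((state << 1) | fb) & ((1 << width) - 1)
        out.append(state & 1)
    return out

print(lfsr_whiten([1, 1, 1, 0, 1, 1, 0, 1]))
```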

174 citations


Journal ArticleDOI
TL;DR: It is shown that the problem of routing messages in a wireless sensor network so as to maximize network lifetime is NP-hard, and an online heuristic that performs two shortest-path computations to route each message is developed, resulting in greater lifetime.
Abstract: We show that the problem of routing messages in a wireless sensor network so as to maximize network lifetime is NP-hard. In our model, the online model, each message has to be routed without knowledge of future route requests. We also develop an online heuristic to maximize network lifetime. Our heuristic, which performs two shortest path computations to route each message, is superior to previously published heuristics for lifetime maximization - our heuristic results in greater lifetime and its performance is less sensitive to the selection of heuristic parameters. Additionally, our heuristic is superior on the capacity metric.
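
A sketch of the two-computation idea, assuming hypothetical residual-energy values and a made-up weight function: one Dijkstra run with plain hop costs, and one with costs inflated on nodes whose batteries are nearly drained, so traffic steers away from near-dead nodes.

```python
import heapq

def shortest_path(edges, weight, s, t):
    # Dijkstra; edges: node -> neighbor list, weight(u, v) -> link cost
    dist, prev, heap, done = {s: 0.0}, {}, [(0.0, s)], set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v in edges.get(u, []):
            nd = d + weight(u, v)
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if t not in dist:
        return None
    path = [t]
    while path[-1] != s:
        path.append(prev[path[-1]])
    return path[::-1]

# hypothetical residual battery levels; node "b" is nearly drained
residual = {"a": 1.0, "b": 0.2, "c": 0.9, "d": 1.0}
edges = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
print(shortest_path(edges, lambda u, v: 1.0, "a", "d"))                  # hop count only
print(shortest_path(edges, lambda u, v: 1.0 / residual[v], "a", "d"))    # energy-aware
```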

171 citations


Journal ArticleDOI
TL;DR: This paper provides several extensions of the source-independent MPR to generate a smaller CDS using 3-hop neighborhood information to cover each node's 2-hop neighbor set and shows that the extended MPR has a constant local approximation ratio compared with a logarithmic local ratio in the original MPR.
Abstract: Multipoint relays (MPR) provide a localized and optimized way of broadcasting messages in a mobile ad hoc network (MANET). Using partial 2-hop information, each node chooses a small set of forward neighbors to relay messages and this set covers the node's 2-hop neighbor set. These selected forward nodes form a connected dominating set (CDS) to ensure full coverage. Adjih et al. later proposed a novel extension of MPR to construct a small CDS and it is source-independent. In this paper, we provide several extensions to generate a smaller CDS using complete 2-hop information to cover each node's 2-hop neighbor set. We extend the notion of coverage in the original MPR. We prove that the extended MPR has a constant local approximation ratio compared with a logarithmic local ratio in the original MPR. In addition, we show that the extended MPR has a constant global probabilistic approximation ratio, while no such ratio exists in the original MPR and its existing extensions. The effectiveness of our approach is confirmed through a simulation study.
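
The heart of MPR-style selection is a greedy set cover of the 2-hop neighborhood. A minimal sketch of that core step (the paper's extensions change what must be covered and by whom):

```python
def select_mpr(covers, two_hop):
    # covers: 1-hop neighbor -> set of 2-hop nodes it can relay to;
    # greedily pick the relay covering the most still-uncovered 2-hop nodes
    uncovered, relays = set(two_hop), []
    while uncovered:
        best = max(covers, key=lambda v: len(covers[v] & uncovered))
        if not covers[best] & uncovered:
            break                      # remaining 2-hop nodes are unreachable
        relays.append(best)
        uncovered -= covers[best]
    return relays

covers = {"u": {1, 2}, "v": {2, 3, 4}, "w": {4, 5}}
print(select_mpr(covers, {1, 2, 3, 4, 5}))   # ['v', 'u', 'w']
```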

164 citations


Journal ArticleDOI
TL;DR: This paper focuses on a means to counteract fault attacks by presenting a new way of implementing exponentiation algorithms that can be used to obtain fast FA-resistant RSA signature generations in both the straightforward method and Chinese remainder theorem modes.
Abstract: Nowadays, side channel attacks allow an attacker to recover secrets stored in embedded devices more efficiently than any other kind of attack. Among these, fault attacks (FA) and simple power analysis (SPA) are probably the most effective: when applied to straightforward implementations of the RSA cryptosystem, only one execution of the algorithm is required to recover the secret key. Over recent years, many countermeasures have been proposed to prevent side channel attacks on RSA. Regarding fault attacks, only one countermeasure offers effective protection and it can be very costly. In this paper, we focus on a means to counteract fault attacks by presenting a new way of implementing exponentiation algorithms. This method can be used to obtain fast FA-resistant RSA signature generations in both the straightforward method and Chinese remainder theorem modes. Moreover, as it has been shown that fault attacks can benefit from the weaknesses introduced by some SPA countermeasures, we ensure that our method resists SPA and, thus, does not require supplementary SPA countermeasures.
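
A textbook construction in the spirit of this approach is a Montgomery ladder whose two registers satisfy an invariant that any injected fault breaks; this sketch is not necessarily the paper's exact algorithm. It is also SPA-friendly because both branches perform the same multiply-square pattern.

```python
def checked_ladder_exp(m, d_bits, n):
    # Montgomery ladder computing m^d mod n; the invariant r1 == r0 * m (mod n)
    # holds at every step, so a fault injected anywhere breaks the final check
    r0, r1 = 1, m
    for bit in d_bits:                      # most significant bit first
        if bit:
            r0, r1 = (r0 * r1) % n, (r1 * r1) % n
        else:
            r0, r1 = (r0 * r0) % n, (r0 * r1) % n
    if r1 != (r0 * m) % n:
        raise RuntimeError("fault detected: result withheld")
    return r0

assert checked_ladder_exp(3, [1, 0, 1], 1000) == 243   # 3^5 mod 1000
```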

Journal ArticleDOI
TL;DR: A reliability-oriented place and route algorithm is presented that is able to effectively mitigate the effects of the considered faults and is demonstrated by extensive fault injection experiments showing that the capability of tolerating SEU effects in the FPGA's configuration memory increases up to 85 times with respect to a standard TMR design technique.
Abstract: The very high integration levels reached by VLSI technologies for SRAM-based field programmable gate arrays (FPGAs) lead to high occurrence-rate of transient faults induced by single event upsets (SEUs) in FPGAs' configuration memory. Since the configuration memory defines which circuit an SRAM-based FPGA implements, any modification induced by SEUs may dramatically change the implemented circuit. When such devices are used in safety-critical applications, fault-tolerant techniques are needed to mitigate the effects of SEUs in FPGAs' configuration memory. In this paper, we analyze the effects induced by the SEUs in the configuration memory of SRAM-based FPGAs. The reported analysis outlines that SEUs in the FPGA's configuration memory are particularly critical since they are able to escape well-known fault masking techniques such as triple modular redundancy (TMR). We then present a reliability-oriented place and route algorithm that, coupled with TMR, is able to effectively mitigate the effects of the considered faults. The effectiveness of the new reliability-oriented place and route algorithm is demonstrated by extensive fault injection experiments showing that the capability of tolerating SEU effects in the FPGA's configuration memory increases up to 85 times with respect to a standard TMR design technique.

Journal ArticleDOI
TL;DR: To prevent the Advanced Encryption Standard (AES) from suffering differential fault attacks, error detection can be adopted to detect errors during encryption or decryption and provide the information needed for further action, such as interrupting the AES process or redoing it.
Abstract: In order to prevent the Advanced Encryption Standard (AES) from suffering from differential fault attacks, the technique of error detection can be adopted to detect the errors during encryption or decryption and then to provide the information for taking further action, such as interrupting the AES process or redoing the process. Because errors occur within a function, it is not easy to predict the output. Therefore, general error control codes are not suited for AES operations. In this work, several error-detection schemes have been proposed. These schemes are based on the (n+1, n) cyclic redundancy check (CRC) over GF(2^8), where n ∈ {4, 8, 16}. Because of the good algebraic properties of AES, specifically the MixColumns operation, these error detection schemes are suitable for AES and efficient for the hardware implementation; they may be designed using round-level, operation-level, or algorithm-level detection. The proposed schemes have high fault coverage. In addition, the schemes proposed are scalable and symmetrical. The scalability makes these schemes suitable for an AES circuit implemented in 8-bit, 32-bit, or 128-bit architecture. Symmetry also benefits the implementation, allowing the encryption process and the decryption process to share the same error detection hardware. These schemes are also suitable for encryption-only or decryption-only cases. Error detection for the key schedule in AES is also proposed and is based on the derived results in the data procedure of AES.
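
With a degree-one generator x + 1 over GF(2^8), the CRC check symbol reduces to the GF(2^8) sum, i.e., the byte-wise XOR, of the n data bytes. A tiny sketch of the n = 4 case on one hypothetical AES state column:

```python
def crc_check_byte(data):
    # generator x + 1 over GF(2^8): the remainder is the field sum of the
    # data bytes, and addition in GF(2^8) is byte-wise XOR
    check = 0
    for byte in data:
        check ^= byte
    return check

column = [0x32, 0x88, 0x31, 0xE0]            # one hypothetical AES state column
code = column + [crc_check_byte(column)]     # (5, 4) codeword
assert crc_check_byte(code) == 0             # any single corrupted byte fails this
```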

Journal ArticleDOI
TL;DR: A hardware Gaussian noise generator based on the Box-Muller method that provides highly accurate noise samples and is currently being used at the Jet Propulsion Laboratory, NASA to evaluate the performance of low-density parity-check codes for deep-space communications.
Abstract: We present a hardware Gaussian noise generator based on the Box-Muller method that provides highly accurate noise samples. The noise generator can be used as a key component in a hardware-based simulation system, such as for exploring channel code behavior at very low bit error rates, as low as 10^-12 to 10^-13. The main novelties of this work are accurate analytical error analysis and bit-width optimization for the elementary functions involved in the Box-Muller method. Two 16-bit noise samples are generated every clock cycle and, due to the accurate error analysis, every sample is analytically guaranteed to be accurate to one unit in the last place. An implementation on a Xilinx Virtex-4 XC4VLX100-12 FPGA occupies 1,452 slices, three block RAMs, and 12 DSP slices, and is capable of generating 750 million samples per second at a clock speed of 375 MHz. The performance can be improved by exploiting concurrent execution: 37 parallel instances of the noise generator at 95 MHz on a Xilinx Virtex-II Pro XC2VP100-7 FPGA generate seven billion samples per second and can run over 200 times faster than the output produced by software running on an Intel Pentium-4 3 GHz PC. The noise generator is currently being used at the Jet Propulsion Laboratory, NASA to evaluate the performance of low-density parity-check codes for deep-space communications.
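
For reference, the Box-Muller transform itself is two lines of math; the paper's contribution is evaluating sqrt, log, and sin/cos in fixed-point hardware with analytically bounded error. A plain floating-point Python sketch:

```python
import math, random

def box_muller(u1, u2):
    # maps two uniforms in (0, 1] to two independent N(0, 1) samples; hardware
    # versions replace log, sqrt, and sin/cos with fixed-point function units
    r = math.sqrt(-2.0 * math.log(u1))
    return r * math.cos(2.0 * math.pi * u2), r * math.sin(2.0 * math.pi * u2)

random.seed(1)
print([box_muller(random.uniform(1e-12, 1.0), random.random()) for _ in range(3)])
```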

Journal ArticleDOI
TL;DR: A security overhead model is built that can be used to reasonably measure security overheads incurred by the security-critical tasks and incorporates the earliest deadline first (EDF) scheduling policy into SAREC to implement a novel security-aware real-time scheduling algorithm (SAEDF).
Abstract: Security-critical real-time applications such as military aircraft flight control systems have mandatory security requirements in addition to stringent timing constraints. Conventional real-time scheduling algorithms, however, either disregard applications' security needs and thus expose the applications to security threats or run applications at inferior security levels without optimizing security performance. In recognition that many applications running on clusters demand both real-time performance and security, we investigate the problem of scheduling a set of independent real-time tasks with various security requirements. We build a security overhead model that can be used to reasonably measure security overheads incurred by the security-critical tasks. Next, we propose a security-aware real-time heuristic strategy for clusters (SAREC), which integrates security requirements into the scheduling for real-time applications on clusters. Further, to evaluate the performance of SAREC, we incorporate the earliest deadline first (EDF) scheduling policy into SAREC to implement a novel security-aware real-time scheduling algorithm (SAEDF). Experimental results from both real-world traces and a real application show that SAEDF significantly improves security over three existing scheduling algorithms (EDF, least laxity first, and first come first serve) by up to 266.7 percent while achieving high schedulability.
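
A minimal sketch of the admission idea under strong simplifications (one node, tasks released together, non-preemptive check): admit the new task at the strongest security level whose overhead still leaves the task set EDF-schedulable. The level names and numbers below are hypothetical.

```python
def edf_feasible(tasks):
    # tasks: (exec_time, deadline) released together; check EDF completion times
    t = 0.0
    for c, d in sorted(tasks, key=lambda x: x[1]):
        t += c
        if t > d:
            return False
    return True

def best_security_level(accepted, levels, deadline):
    # levels: security level -> exec time including its security overhead;
    # pick the strongest (costliest) level that keeps the set schedulable
    for lvl in sorted(levels, key=lambda l: levels[l], reverse=True):
        if edf_feasible(accepted + [(levels[lvl], deadline)]):
            return lvl
    return None   # reject: not schedulable even at the weakest level

print(best_security_level([(2.0, 5.0)], {"low": 1.0, "high": 3.5}, 8.0))  # 'high'
```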

Journal ArticleDOI
TL;DR: From the study of the similarity between the four generations of SPEC CPU benchmark suites, it is found that, other than a dramatic increase in the dynamic instruction count and increasingly poor temporal data locality, the inherent program characteristics have more or less remained unchanged.
Abstract: This paper proposes a methodology for measuring the similarity between programs based on their inherent microarchitecture-independent characteristics, and demonstrates two applications for it: 1) finding a representative subset of programs from benchmark suites and 2) studying the evolution of four generations of SPEC CPU benchmark suites. Using the proposed methodology, we find a representative subset of programs from three popular benchmark suites - SPEC CPU2000, MediaBench, and MiBench. We show that this subset of representative programs can be effectively used to estimate the average benchmark suite IPC, L1 data cache miss-rates, and speedup on 11 machines with different ISAs and microarchitectures - this enables one to save simulation time with little loss in accuracy. From our study of the similarity between the four generations of SPEC CPU benchmark suites, we find that, other than a dramatic increase in the dynamic instruction count and increasingly poor temporal data locality, the inherent program characteristics have more or less remained unchanged.
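
A sketch of the similarity machinery under simple assumptions: z-normalize each microarchitecture-independent characteristic, then use Euclidean distance as dissimilarity (the paper additionally applies principal component analysis and clustering). The feature values below are made up.

```python
import math

def zscore(matrix):
    # normalize each characteristic to zero mean / unit variance so that
    # no single metric dominates the distance
    normed_cols = []
    for col in zip(*matrix):
        mean = sum(col) / len(col)
        std = (sum((x - mean) ** 2 for x in col) / len(col)) ** 0.5 or 1.0
        normed_cols.append([(x - mean) / std for x in col])
    return [list(row) for row in zip(*normed_cols)]

# hypothetical (instruction count, branch ratio, locality) features
progs = {"gcc": [1.2e9, 0.31, 0.02], "mcf": [0.98e9, 0.55, 0.11], "art": [1.1e9, 0.50, 0.10]}
norm = dict(zip(progs, zscore(list(progs.values()))))
print(math.dist(norm["mcf"], norm["art"]) < math.dist(norm["mcf"], norm["gcc"]))  # True
```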

Journal ArticleDOI
TL;DR: A hardware-based solution, called SmashGuard, to protect against all known forms of attack on the function return addresses stored on the program stack, which is secure and does not require recompilation of applications.
Abstract: A buffer overflow attack is perhaps the most common attack used to compromise the security of a host. This attack can be used to change the function return address and redirect execution to the attacker's code. We present a hardware-based solution, called SmashGuard, to protect against all known forms of attack on the function return addresses stored on the program stack. With each function call instruction, the current return address is pushed onto a hardware stack. A return instruction compares its address to the return address from the top of the hardware stack. An exception is raised to signal the mismatch. Because the stack operations and checks are done in hardware in parallel with the usual execution of instructions, our best-performing implementation scheme has virtually no performance overhead (because we are modifying hardware, it is impossible to guarantee zero overhead without an actual hardware implementation). While previous software-based approaches' average performance degradation for the SPEC2000 benchmarks is only 2.8 percent, their worst-case degradation is up to 8.3 percent. Apart from the lack of robustness in performance, the software approaches' key disadvantages are less security coverage and the need for recompilation of applications. SmashGuard, on the other hand, is secure and does not require recompilation of applications.
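
A software model of the mechanism, with hypothetical addresses: each call pushes the return address onto the shadow stack, and each return compares against the program-stack value, raising the exception on mismatch.

```python
class ShadowStack:
    # software model of SmashGuard's hardware return-address stack
    def __init__(self):
        self._addrs = []

    def call(self, return_addr):
        self._addrs.append(return_addr)    # pushed in parallel with the CALL

    def ret(self, addr_from_program_stack):
        expected = self._addrs.pop()       # compared in parallel with the RET
        if addr_from_program_stack != expected:
            raise RuntimeError("return address mismatch: stack smashed?")
        return addr_from_program_stack

s = ShadowStack()
s.call(0x400123)
s.ret(0x400123)            # fine
s.call(0x400200)
try:
    s.ret(0x41414141)      # return address overwritten by an overflow
except RuntimeError as e:
    print(e)
```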

Journal ArticleDOI
TL;DR: In this article, the authors present protocols that protect both sensitive credentials and sensitive policies in trust negotiations in an open environment such as the Internet, where the decision to collaborate with a stranger (e.g., by granting access to a resource) is often based on the characteristics (rather than the identity) of the requester via digital credentials.
Abstract: In an open environment such as the Internet, the decision to collaborate with a stranger (e.g., by granting access to a resource) is often based on the characteristics (rather than the identity) of the requester, via digital credentials: access is granted if Alice's credentials satisfy Bob's access policy. The literature contains many scenarios in which it is desirable to carry out such trust negotiations in a privacy-preserving manner, i.e., so as to minimize the disclosure of credentials and/or of access policies. Elegant solutions were proposed for achieving various degrees of privacy-preservation through minimal disclosure. In this paper, we present protocols that protect both sensitive credentials and sensitive policies. That is, Alice gets the resource only if she satisfies the policy, Bob does not learn anything about Alice's credentials (not even whether Alice got access), and Alice learns neither Bob's policy structure nor which credentials caused her to gain access. Our protocols are efficient in terms of communication and in rounds of interaction.

Journal ArticleDOI
TL;DR: In this article, the double accumulator multiplier (DAM) and the N-accumulator multiplier (NAM) are proposed for binary fields GF(2^m); both are faster than traditional least significant digit (LSD) multipliers.
Abstract: Digit serial multipliers are used extensively in hardware implementations of elliptic and hyperelliptic curve cryptography. This contribution shows different architectural enhancements in least significant digit (LSD) multiplier for binary fields GF(2^m). We propose two different architectures, the double accumulator multiplier (DAM) and N-accumulator multiplier (NAM), which are both faster compared to traditional LSD multipliers. Our evaluation of the multipliers for different digit sizes gives optimum choices and shows that currently used digit sizes are the worst possible choices. Hence, one of the most important results of this contribution is that digit sizes of the form 2^l - 1, where l is an integer, are preferable for the digit multipliers. Furthermore, one should always use the NAM architecture to get the best timings. Considering the time-area product, DAM or NAM gives the best performance, depending on the digit size.
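
A bit-level Python model of least-significant-digit-first digit-serial multiplication in GF(2^m), with polynomials stored as integers (bit i is the coefficient of x^i); the digit size D and the GF(2^8)/AES-polynomial test are our choices for illustration, whereas the paper's DAM/NAM variants restructure the accumulator.

```python
def lsd_multiply(a, b, f, m, D=4):
    # LSD-first multiplication in GF(2^m); f is the irreducible polynomial
    def reduce(p):
        while p.bit_length() > m:
            p ^= f << (p.bit_length() - f.bit_length())
        return p
    acc = 0
    for i in range(0, m, D):
        digit = (b >> i) & ((1 << D) - 1)
        for j in range(D):                 # accumulate a times the current digit
            if (digit >> j) & 1:
                acc ^= a << j
        a = reduce(a << D)                 # a <- a * x^D mod f
    return reduce(acc)

# GF(2^8) with the AES polynomial x^8+x^4+x^3+x+1 (0x11B): x * (x+1) = x^2 + x
assert lsd_multiply(0b10, 0b11, 0x11B, 8) == 0b110
```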

Journal ArticleDOI
TL;DR: A novel routing algorithm, called adaptive fusion Steiner tree (AFST), jointly optimizes over the costs for both data transmission and fusion, evaluates the benefit and cost of data fusion along information routes, and adaptively adjusts whether fusion shall be performed at a particular node.
Abstract: While in-network data fusion can reduce data redundancy and, hence, curtail network load, the fusion process itself may introduce significant energy consumption for emerging wireless sensor networks with vectorial data and/or security requirements. Therefore, fusion-driven routing protocols for sensor networks cannot optimize over communication cost only - fusion cost must also be accounted for. In our prior work, while a randomized algorithm termed MFST is devised toward this end, it assumes that fusion shall be performed at any intersection node whenever data streams encounter. In this paper, we design a novel routing algorithm, called adaptive fusion Steiner tree (AFST), for energy efficient data gathering. Not only does AFST jointly optimize over the costs for both data transmission and fusion, but it also evaluates the benefit and cost of data fusion along information routes and adaptively adjusts whether fusion shall be performed at a particular node. Analytically and experimentally, we show that AFST achieves better performance than existing algorithms, including SLT, SPT, and MFST.
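
The core of the adaptive decision can be sketched in a few lines: fuse at a node only when the downstream transmission savings exceed the fusion cost. All constants below are hypothetical; AFST derives the actual costs from the routing structure.

```python
def fuse_here(incoming_bits, hops_to_sink, fuse_cost_per_bit=0.8,
              tx_cost_per_bit=1.0, reduction=0.5):
    # fuse only when downstream transmission savings beat the fusion cost
    total = sum(incoming_bits)
    fused_out = reduction * total
    savings = tx_cost_per_bit * hops_to_sink * (total - fused_out)
    return savings > fuse_cost_per_bit * total

print(fuse_here([800, 600], hops_to_sink=1))   # False: too close to the sink
print(fuse_here([800, 600], hops_to_sink=4))   # True: savings dominate
```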

Journal ArticleDOI
TL;DR: This paper presents two vector-level software algorithms which essentially eliminate bit-wise inner product operations for Gaussian normal bases and shows that the software implementation of the proposed algorithm is faster than previously reported normal basis multiplication algorithms.
Abstract: Recently, implementations of normal basis multiplication over the extended binary field GF(2^m) have received considerable attention. A class of low complexity normal bases called Gaussian normal bases has been included in a number of standards, such as IEEE and NIST for an elliptic curve digital signature algorithm. The multiplication algorithms presented there are slow in software since they rely on bit-wise inner product operations. In this paper, we present two vector-level software algorithms which essentially eliminate such bit-wise operations for Gaussian normal bases. Our analysis and timing results show that the software implementation of the proposed algorithm is faster than previously reported normal basis multiplication algorithms. The proposed algorithm is also more memory efficient compared with its look-up table-based counterpart. Moreover, two new digit-level multiplier architectures are proposed and it is shown that they outperform the existing normal basis multiplier structures. As compared with similar digit-level normal basis multipliers, the proposed multiplier with serial output requires the fewest number of XOR gates and the one with parallel output is the fastest multiplier.
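
The property that makes normal (and Gaussian normal) bases attractive is that squaring is a plain cyclic shift of the coordinate vector, essentially free in hardware; multiplication is where all the effort goes. A one-line sketch:

```python
def nb_square(bits):
    # in a normal basis of GF(2^m), an element is sum(b_i * beta^(2^i)), so
    # squaring just rotates the coordinate vector by one position
    return bits[-1:] + bits[:-1]

assert nb_square([1, 0, 1, 0, 0]) == [0, 1, 0, 1, 0]
```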

Journal ArticleDOI
TL;DR: A more systematic and efficient construction of symmetry-breaking predicates is described, which uses the cycle structure of symmetry generators, which typically involve very few variables, to drastically reduce the size of SBPs.
Abstract: Identifying and breaking the symmetries of conjunctive normal form (CNF) formulae has been shown to lead to significant reductions in search times. Symmetries in the search space are broken by adding appropriate symmetry-breaking predicates (SBPs) to an SAT instance in CNF. The SBPs prune the search space by acting as a filter that confines the search to nonsymmetric regions of the space without affecting the satisfiability of the CNF formula. For symmetry breaking to be effective in practice, the computational overhead of generating and manipulating SBPs must be significantly less than the runtime savings they yield due to search space pruning. In this paper, we describe a more systematic and efficient construction of SBPs. In particular, we use the cycle structure of symmetry generators, which typically involve very few variables, to drastically reduce the size of SBPs. Furthermore, our new SBP construction grows linearly with the number of relevant variables as opposed to the previous quadratic constructions. Our empirical data suggest that these improvements reduce search runtimes by one to two orders of magnitude on a wide variety of benchmarks with symmetries.
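
As a toy version of cycle-based SBP construction: for each generator, breaking the symmetry on the smallest variable moved by one of its cycles already yields a single valid lex-leader clause; the paper's construction chains follow-up clauses for the remaining cycles, growing linearly in the number of relevant variables. DIMACS-style integer literals are assumed.

```python
def sbp_clauses(generators):
    # generators: each a list of cycles, e.g. [(1, 3), (2, 4)] for (1 3)(2 4);
    # emit one clause per generator: for the cycle (a, b, ...) with minimal a,
    # assert x_a <= x_b as the clause (-a OR b)
    clauses = []
    for cycles in generators:
        a, b = min(cycles, key=lambda c: c[0])[:2]
        clauses.append([-a, b])
    return clauses

# the generator (1 3)(2 4) yields the clause (-1 OR 3), i.e., x1 <= x3
print(sbp_clauses([[(1, 3), (2, 4)]]))   # [[-1, 3]]
```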

Journal ArticleDOI
TL;DR: This paper presents a framework for QoS specification and management consisting of a model for expressing QoS requirements, an architecture based on feedback control scheduling, and a set of algorithms implementing different policies and behaviors.
Abstract: Real-time applications such as e-commerce, flight control, chemical and nuclear control, and telecommunication are becoming increasingly sophisticated in their data needs, resulting in greater demands for real-time data services that are provided by real-time databases. Since the workload of real-time databases cannot be precisely predicted, they can become overloaded and thereby cause temporal violations, resulting in damage or even a catastrophe. Imprecise computation techniques address this problem and allow graceful degradation during overloads. In this paper, we present a framework for QoS specification and management consisting of a model for expressing QoS requirements, an architecture based on feedback control scheduling, and a set of algorithms implementing different policies and behaviors. Our approach gives a robust and controlled behavior of real-time databases, even for transient overloads and with inaccurate runtime estimates of the transactions. Further, performance experiments show that the proposed algorithms outperform a set of baseline algorithms that uses feedback control.
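
A sketch of the feedback loop in its simplest proportional form, with made-up gains: the measured deadline miss ratio is compared against the QoS target, and the admitted load is adjusted each sampling period (the paper's controllers and QoS model are richer).

```python
def admit_ratio(miss_ratios, target=0.05, kp=1.5):
    # proportional feedback control: the error between the QoS target and the
    # measured deadline miss ratio drives the admitted-load knob each period
    load, trace = 1.0, []
    for m in miss_ratios:
        load = max(0.1, min(1.0, load + kp * (target - m)))
        trace.append(round(load, 3))
    return trace

print(admit_ratio([0.30, 0.20, 0.10, 0.06, 0.02]))  # throttled, then relaxed
```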

Journal ArticleDOI
TL;DR: The queuing-theoretic and optimization-based model for joint BA and CAC provides a unified radio resource management solution for the IEEE 802.16-based multiservice broadband wireless access networks considering both packet-level and connection-level quality-of-service (QoS) constraints.
Abstract: We present a queuing-theoretic and optimization-based model for radio resource management in IEEE 802.16-based multiservice broadband wireless access (BWA) networks considering both packet-level and connection-level quality-of-service (QoS) constraints. Specifically, we model and analyze two approaches, namely, the optimal and the iterative approaches, for joint bandwidth allocation (BA) and connection admission control (CAC). To limit the amount of bandwidth allocated to each service type, for both these approaches, the total available bandwidth is shared among the different types of services using a complete partitioning approach. While, for the optimal approach, an assignment problem is formulated and solved, a water-filling mechanism is used for the iterative approach. The latter incurs significantly less computational complexity compared to the former while providing similar system performances. To analyze the connection-level performance measures such as connection blocking probability and average number of ongoing connections, a queuing model is developed. Then, an optimization formulation is used to obtain the optimal threshold settings for complete partitioning of the available bandwidth resource so that the connection-level QoS for the different services can be maintained at the target level while maximizing the average system revenue. To analyze the packet-level performance measures such as the packet delay statistics and transmission rate (or throughput), a queuing analytical model is developed which considers adaptive modulation and coding (AMC) at the physical/radio link layer. In summary, the queuing-theoretic and optimization-based model for joint BA and CAC provides a unified radio resource management solution for the IEEE 802.16-based broadband wireless access networks.
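
A sketch of the iterative (water-filling) flavor of bandwidth allocation, with hypothetical service classes and a diminishing-returns marginal revenue so allocations level out like water; the paper ties this to queueing-derived revenue and blocking constraints.

```python
def water_fill(capacity, services, unit=1.0):
    # services: name -> (min_bw, max_bw, revenue_weight); after granting the
    # guaranteed minimums, pour each remaining bandwidth unit where the
    # (diminishing) marginal revenue is currently highest
    alloc = {s: lo for s, (lo, hi, w) in services.items()}
    left = capacity - sum(alloc.values())
    while left >= unit:
        open_s = [s for s, (lo, hi, w) in services.items() if alloc[s] + unit <= hi]
        if not open_s:
            break
        s = max(open_s, key=lambda s: services[s][2] / (1.0 + alloc[s]))
        alloc[s] += unit
        left -= unit
    return alloc

svc = {"UGS": (4, 8, 5.0), "rtPS": (2, 10, 4.0), "BE": (0, 10, 1.0)}
print(water_fill(16, svc))
```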

Journal ArticleDOI
TL;DR: An algorithm to achieve fast multiplication in two's complement representation is presented; it results in a true diamond shape for the partial product tree, which is more efficient to implement.
Abstract: The performance of multiplication is crucial for multimedia applications such as 3D graphics and signal processing systems, which depend on the execution of large numbers of multiplications. Previously reported algorithms mainly focused on rapidly reducing the partial products rows down to final sums and carries used for the final accumulation. These techniques mostly rely on circuit optimization and minimization of the critical paths. In this paper, an algorithm to achieve fast multiplication in two's complement representation is presented. Rather than focusing on reducing the partial products rows down to final sums and carries, our approach strives to generate fewer partial products rows. In turn, this influences the speed of the multiplication, even before applying partial products reduction techniques. Fewer partial products rows are produced, thereby lowering the overall operation time. In addition to the speed improvement, our algorithm results in a true diamond-shape for the partial product tree, which is more efficient in terms of implementation. The synthesis results of our multiplication algorithm using the Artisan TSMC 0.13-μm 1.2-volt standard-cell library show 13 percent improvement in speed and 14 percent improvement in power savings for 8-bit × 8-bit multiplications (10 percent and 3 percent, respectively, for 16-bit × 16-bit multiplications) when compared to conventional multiplication algorithms.
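
For context on partial-product-row generation, classical radix-4 Booth recoding already halves the number of rows; the paper's algorithm goes further and also shapes the tree. A Python sketch of the classical recoding (not the paper's method):

```python
def booth_digits(b, n):
    # radix-4 Booth recoding of an n-bit two's-complement multiplier (n even):
    # n/2 digits in {-2, -1, 0, 1, 2}, so only half as many partial-product rows
    b &= (1 << n) - 1
    digits, prev = [], 0
    for i in range(0, n, 2):
        b0, b1 = (b >> i) & 1, (b >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits

def booth_multiply(a, b, n=8):
    # one shifted/negated copy of a per Booth digit, then a single summation
    return sum((d * a) << (2 * i) for i, d in enumerate(booth_digits(b, n)))

assert booth_multiply(5, -2, n=8) == -10
assert booth_multiply(-7, 13, n=8) == -91
```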

Journal ArticleDOI
TL;DR: This paper proposes a novel strategy that enables a two-way interaction between the OS and the SMT processor and allows the OS to run jobs at a certain percentage of their maximum speed, regardless of the workload in which these jobs are executed.
Abstract: Current operating systems (OS) perceive the different contexts of simultaneous multithreaded (SMT) processors as multiple independent processing units, although, in reality, threads executed in these units compete for the same hardware resources. Furthermore, hardware resources are assigned to threads implicitly as determined by the SMT instruction fetch (Ifetch) policy, without the control of the OS. Both factors cause a lack of control over how individual threads are executed, which can frustrate the work of the job scheduler. This presents a problem for general purpose systems, where the OS job scheduler cannot enforce priorities, and also for embedded systems, where it would be difficult to guarantee worst-case execution times. In this paper, we propose a novel strategy that enables a two-way interaction between the OS and the SMT processor and allows the OS to run jobs at a certain percentage of their maximum speed, regardless of the workload in which these jobs are executed. In contrast to previous approaches, our approach enables the OS to run time-critical jobs without dedicating all internal resources to them so that non-time-critical jobs can make significant progress as well and without significantly compromising overall throughput. In fact, our mechanism, in addition to fulfilling OS requirements, achieves 90 percent of the throughput of one of the best currently known fetch policies for SMTs.
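
The flavor of an OS-directed fetch policy can be sketched as a weighted slot allocator: thread i is handed roughly its requested share of fetch slots regardless of the workload mix. The credit scheme below is our toy stand-in for the paper's mechanism.

```python
def fetch_slots(weights, n_slots):
    # deficit-style allocator: thread i receives about weights[i] of the
    # fetch slots, mimicking an OS-driven resource-allocation fetch policy
    credit = [0.0] * len(weights)
    trace = []
    for _ in range(n_slots):
        for i, w in enumerate(weights):
            credit[i] += w
        i = max(range(len(weights)), key=lambda i: credit[i])
        credit[i] -= 1.0
        trace.append(i)
    return trace

slots = fetch_slots([0.7, 0.3], 10)
print(slots, slots.count(0) / len(slots))   # thread 0 gets ~70 percent
```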

Journal ArticleDOI
Subhasish Mitra1, K.S. Kim1
TL;DR: Experimental results on industrial designs demonstrate that this new XPAND technique achieves exponential reduction in test data volume and test time compared to traditional scan and significantly outperforms existing test compression tools.
Abstract: Combinational circuits implemented with exclusive-or gates are used for on-chip generation of deterministic test patterns from compressed seeds. Unlike major test compression techniques, this technique doesn't require test pattern generation with don't cares. Experimental results on industrial designs demonstrate that this new XPAND technique achieves exponential reduction in test data volume and test time compared to traditional scan and significantly outperforms existing test compression tools. The XPAND technique is currently being used by several industrial designs.
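
The decompression side is a fixed XOR network: each scan-chain bit is a GF(2) linear combination of seed bits, so the tester ships only the short seed. A sketch with a hypothetical wiring (computing seeds for given test cubes is a linear-algebra step not shown):

```python
def xor_expand(seed, network):
    # each scan-chain bit is the XOR of a fixed subset of seed bits, i.e.,
    # a GF(2) matrix-vector product realized by the on-chip XOR network
    return [sum(seed[i] for i in taps) % 2 for taps in network]

network = [(0,), (1, 2), (0, 3), (2,), (1, 3), (0, 1, 2)]   # hypothetical wiring
print(xor_expand([1, 0, 1, 1], network))                    # 4 seed bits -> 6 scan bits
```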

Journal ArticleDOI
TL;DR: Since a wide range of approaches exists and many of them overlap, this paper describes, classifies, and compares them to aid the computer architect in selecting the most appropriate one.
Abstract: Simulators have become an integral part of the computer architecture research and design process. Since they have the advantages of cost, time, and flexibility, architects use them to guide design space exploration and to quantify the efficacy of an enhancement. However, long simulation times and poor accuracy limit their effectiveness. To reduce the simulation time, architects have proposed several techniques that increase the simulation speed or throughput. To increase the accuracy, architects try to minimize the amount of error in their simulators and have proposed adding statistical rigor to their simulation methodology. Since a wide range of approaches exist and since many of them overlap, this paper describes, classifies, and compares them to aid the computer architect in selecting the most appropriate one.

Journal ArticleDOI
TL;DR: The HSDefender technique can defend a system against more types of buffer overflow attacks with less overhead compared with the previous work, and is analyzed with respect to hardware cost, security, and performance.
Abstract: With more embedded systems networked, it becomes an important problem to effectively defend embedded systems against buffer overflow attacks. Due to the increasing complexity and strict requirements, off-the-shelf software components are widely used in embedded systems, especially for military and other critical applications. Therefore, in addition to effective protection, we also need to provide an approach for system integrators to efficiently check whether software components have been protected. In this paper, we propose the HSDefender (Hardware/Software Defender) technique to perform protection and checking together. Our basic idea is to design secure call instructions so systems can be secured and checking can be easily performed. In the paper, we classify buffer overflow attacks into two categories and provide two corresponding defending strategies. We analyze the HSDefender technique with respect to hardware cost, security, and performance. We experiment with our HSDefender technique on the simplescalar/ARM simulator with benchmarks from MiBench, an embedded benchmark suite. The results show that our HSDefender technique can defend a system against more types of buffer overflow attacks with less overhead compared with the previous work.

Journal ArticleDOI
TL;DR: This paper presents a new fault-tolerant routing methodology that does not degrade performance in the absence of faults and tolerates a reasonably large number of faults without disabling any healthy node.
Abstract: Massively parallel computing systems are being built with thousands of nodes. The interconnection network plays a key role for the performance of such systems. However, the high number of components significantly increases the probability of failure. Additionally, failures in the interconnection network may isolate a large fraction of the machine. It is therefore critical to provide an efficient fault-tolerant mechanism to keep the system running, even in the presence of faults. This paper presents a new fault-tolerant routing methodology that does not degrade performance in the absence of faults and tolerates a reasonably large number of faults without disabling any healthy node. In order to avoid faults, for some source-destination pairs, packets are first sent to an intermediate node and then from this node to the destination node. Fully adaptive routing is used along both subpaths. The methodology assumes a static fault model and the use of a checkpoint/restart mechanism. However, there are scenarios where the faults cannot be avoided solely by using an intermediate node. Thus, we also provide some extensions to the methodology. Specifically, we propose disabling adaptive routing and/or using misrouting on a per-packet basis. We also propose the use of more than one intermediate node for some paths. The proposed fault-tolerant routing methodology is extensively evaluated in terms of fault tolerance, complexity, and performance.
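
A sketch of the intermediate-node idea on a small 2D mesh, with a greedy stand-in for minimal adaptive routing (no backtracking, unlike real adaptive routers) and a faulty node placed so that all minimal paths are blocked while a route through an intermediate node survives.

```python
def minimal_route(faulty, s, d):
    # greedy minimal routing in a 2D mesh: each hop must move closer to the
    # destination; returns None if every minimal next hop is faulty
    path, (x, y) = [s], s
    while (x, y) != d:
        options = [(x + (d[0] > x) - (d[0] < x), y),
                   (x, y + (d[1] > y) - (d[1] < y))]
        options = [p for p in options if p != (x, y) and p not in faulty]
        if not options:
            return None
        x, y = options[0]
        path.append((x, y))
    return path

def route(faulty, s, d, intermediates):
    # if minimal routing is blocked by faults, route minimally to an
    # intermediate node first, then from the intermediate on to the destination
    direct = minimal_route(faulty, s, d)
    if direct:
        return direct
    for i in intermediates:
        a, b = minimal_route(faulty, s, i), minimal_route(faulty, i, d)
        if a and b:
            return a + b[1:]
    return None

print(route({(1, 1)}, (0, 1), (2, 1), intermediates=[(1, 0)]))
```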