Showing papers by "Srinivas Devadas" published in 2004


Proceedings ArticleDOI
17 Jun 2004
TL;DR: It is shown that there exists enough delay variation across ICs implementing the proposed circuit to identify individual ICs and to build a secret key unique to each IC.
Abstract: This paper describes a technique that exploits the statistical delay variations of wires and transistors across ICs to build a secret key unique to each IC. To explore its feasibility, we fabricated a candidate circuit to generate a response based on its delay characteristics. We show that there exists enough delay variation across ICs implementing the proposed circuit to identify individual ICs. Further, the circuit functions reliably over a practical range of environmental variation such as temperature and voltage.

841 citations


Proceedings ArticleDOI
07 Oct 2004
TL;DR: This work presents a simple architectural mechanism called dynamic information flow tracking that can significantly improve the security of computing systems with negligible performance overhead and is transparent to users or application programmers.
Abstract: We present a simple architectural mechanism called dynamic information flow tracking that can significantly improve the security of computing systems with negligible performance overhead. Dynamic information flow tracking protects programs against malicious software attacks by identifying spurious information flows from untrusted I/O and restricting the usage of the spurious information. Every security attack to take control of a program needs to transfer the program's control to malevolent code. In our approach, the operating system identifies a set of input channels as spurious, and the processor tracks all information flows from those inputs. A broad range of attacks are effectively defeated by checking the use of the spurious values as instructions and pointers. Our protection is transparent to users or application programmers; the executables can be used without any modification. Also, our scheme only incurs, on average, a memory overhead of 1.4% and a performance overhead of 1.1%.

811 citations
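
The tracking idea in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions: the abstract describes a hardware mechanism, whereas the code below mimics it in Python, and the names `Tagged`, `read_untrusted`, `add`, `jump`, and `SecurityTrap` are hypothetical. The propagation rule used here (a result is spurious if any operand is spurious) is a simplification and is not claimed to be the policy from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tagged:
    """A value plus a one-bit tag; spurious=True means it derives from untrusted I/O."""
    value: int
    spurious: bool = False


class SecurityTrap(Exception):
    """Raised when a spurious value is about to be used as a jump target."""


def read_untrusted(value: int) -> Tagged:
    # Assumption: the OS has marked this input channel as spurious.
    return Tagged(value, spurious=True)


def add(a: Tagged, b: Tagged) -> Tagged:
    # Simplified propagation rule: the result is spurious if either operand is.
    return Tagged(a.value + b.value, a.spurious or b.spurious)


def jump(target: Tagged) -> int:
    # Check at the point of use: spurious data must not steer control flow.
    if target.spurious:
        raise SecurityTrap(f"spurious jump target {target.value:#x}")
    return target.value
```

With a trusted base address, `jump(add(Tagged(0x1000), Tagged(8)))` returns the computed target, while the same computation with `read_untrusted(8)` as the offset raises `SecurityTrap`.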


Journal ArticleDOI
TL;DR: The results show that smart cache management and scheduling is essential to achieve high performance with shared cache memory and can improve the total IPC significantly over the standard least recently used (LRU) replacement policy.
Abstract: This paper proposes dynamic cache partitioning amongst simultaneously executing processes/threads. We present a general partitioning scheme that can be applied to set-associative caches. Since memory reference characteristics of processes/threads can change over time, our method collects the cache miss characteristics of processes/threads at run-time. Also, the workload is determined at run-time by the operating system scheduler. Our scheme combines the information, and partitions the cache amongst the executing processes/threads. Partition sizes are varied dynamically to reduce the total number of misses. The partitioning scheme has been evaluated using a processor simulator modeling a two-processor CMP system. The results show that the scheme can improve the total IPC significantly over the standard least recently used (LRU) replacement policy. In a certain case, partitioning doubles the total IPC over standard LRU. Our results show that smart cache management and scheduling is essential to achieve high performance with shared cache memory.

402 citations
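
One way to picture "partition sizes are varied dynamically to reduce the total number of misses" is a greedy allocation over measured miss curves. The sketch below is an assumption, not the scheme from the paper: the function name `partition_ways`, the per-way granularity, and the greedy marginal-gain rule are all illustrative, and the miss counts are made-up numbers. Greedy allocation of this kind is only guaranteed to be optimal when each miss curve has diminishing returns.

```python
def partition_ways(miss_curves, total_ways):
    """Split cache ways among processes by greedy marginal gain.

    miss_curves[p][w] is the number of misses process p would incur
    with w ways, for w = 0..total_ways (hypothetical run-time measurements).
    """
    alloc = [0] * len(miss_curves)
    for _ in range(total_ways):
        # Miss reduction each process would get from one additional way.
        gains = [curve[a] - curve[a + 1] for curve, a in zip(miss_curves, alloc)]
        winner = max(range(len(gains)), key=gains.__getitem__)
        alloc[winner] += 1
    return alloc


# Illustrative miss curves for two processes sharing a 4-way cache.
curves = [
    [100, 60, 40, 35, 33],  # process 0 benefits a lot from the first ways
    [50, 44, 43, 43, 43],   # process 1 saturates quickly
]
allocation = partition_ways(curves, 4)
total_misses = sum(curve[a] for curve, a in zip(curves, allocation))
```

For these curves the greedy split gives process 0 three ways and process 1 one way, for 79 total misses, compared with 83 for an even two-and-two split.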


Journal ArticleDOI
TL;DR: Experiments show that the technique to reliably and securely identify individual integrated circuits (ICs) based on the precise measurement of circuit delays and a simple challenge–response protocol is viable, but that current implementations could require some strengthening before it can be considered as secure.
Abstract: This paper describes a technique to reliably and securely identify individual integrated circuits (ICs) based on the precise measurement of circuit delays and a simple challenge–response protocol. This technique could be used to produce key-cards that are more difficult to clone than ones involving digital keys on the IC. We consider potential venues of attack against our system, and present candidate implementations. Experiments on Field Programmable Gate Arrays show that the technique is viable, but that our current implementations could require some strengthening before it can be considered as secure. Copyright © 2004 John Wiley & Sons, Ltd.

317 citations
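
The challenge–response protocol mentioned in this abstract can be sketched at the verifier side as follows. This is a minimal illustration under assumptions: the class name `Verifier`, the bit-tuple response format, and the bit-error threshold are hypothetical, and the device is abstracted as a callable rather than a model of the delay circuit. Each enrolled challenge is consumed after one use so that an eavesdropper cannot replay an observed response.

```python
class Verifier:
    """Holds challenge-response pairs recorded from a device during enrollment."""

    def __init__(self, crps, max_bit_errors):
        self._unused = dict(crps)            # challenge -> expected response bits
        self._max_bit_errors = max_bit_errors

    def authenticate(self, device):
        """Issue one unused challenge and compare the response within a tolerance."""
        if not self._unused:
            raise RuntimeError("no unused challenges left; re-enroll the device")
        challenge, expected = self._unused.popitem()   # never reuse a challenge
        response = device(challenge)
        if len(response) != len(expected):
            return False
        # Tolerate a few flipped bits caused by measurement noise.
        errors = sum(a != b for a, b in zip(expected, response))
        return errors <= self._max_bit_errors


# Made-up enrollment data for illustration only.
table = {0b0101: (1, 0, 1, 1, 0, 0, 1, 0), 0b0011: (0, 1, 1, 0, 1, 0, 0, 1)}


def genuine(challenge):
    return table[challenge]


def noisy(challenge):
    # Same device, but the first response bit flipped by noise.
    return (1 - table[challenge][0],) + table[challenge][1:]


def clone(challenge):
    # A different device: every response bit differs.
    return tuple(1 - bit for bit in table[challenge])
```

With a tolerance of two bit errors, `Verifier(table, 2).authenticate(genuine)` and the `noisy` variant are accepted, while `clone` is rejected.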


Journal ArticleDOI
TL;DR: Experiments on Field Programmable Gate Arrays show that the technique to reliably and securely identify individual integrated circuits based on the precise measurement of circuit delays and a simple challenge–response protocol is viable, but that current implementations could require some strengthening.
Abstract: This paper describes a technique to reliably and securely identify individual integrated circuits (ICs) based on the precise measurement of circuit delays and a simple challenge–response protocol. This technique could be used to produce key-cards that are more difficult to clone than ones involving digital keys on the IC. We consider potential venues of attack against our system, and present candidate implementations. Experiments on Field Programmable Gate Arrays show that the technique is viable, but that our current implementations could require some strengthening before it can be considered as secure. Copyright © 2004 John Wiley & Sons, Ltd.

80 citations


01 Jan 2004
TL;DR: This paper proposes a dynamic cache partitioning method for simultaneous multithreading systems that collects the miss-rate characteristics of simultaneously executing threads at runtime, and partitions the cache among the executing threads.
Abstract: This paper proposes a dynamic cache partitioning method for simultaneous multithreading systems. We present a general partitioning scheme that can be applied to set-associative caches at any partition granularity. Furthermore, in our scheme threads can have overlapping partitions, which provides more degrees of freedom when partitioning caches with low associativity. Since memory reference characteristics of threads can change very quickly, our method collects the miss-rate characteristics of simultaneously executing threads at runtime, and partitions the cache among the executing threads. Partition sizes are varied dynamically to improve hit rates. Trace-driven simulation results show a relative improvement in the L2 hit-rate of up to 40.5% over those generated by the standard least recently used replacement policy, and IPC improvements of up to 17%. Our results show that smart cache management and scheduling is important for SMT systems to achieve high performance.

73 citations


Patent
22 Jul 2004
TL;DR: A method for selecting a queue for service across a shared link, which includes classifying each queue from a group of queues within a plurality of ingresses into one tier of a number 'N' of tiers.
Abstract: A method for selecting a queue for service across a shared link. The method includes classifying each queue from a group of queues within a plurality of ingresses into one tier of a number 'N' of tiers. The number 'N' is greater than or equal to 2. Information about allocated bandwidth is used to classify at least some of the queues into the tiers. Each tier is assigned a different priority. The method also includes matching queues to available egresses by matching queues classified within tiers with higher priorities before matching queues classified within tiers with lower priorities.

69 citations
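
The tier-then-match structure described in this abstract can be sketched with N = 2 tiers. Everything specific here is an assumption made for illustration: the `Queue` fields, the rule that a queue which has not yet received its allocated bandwidth goes into the higher-priority tier, and the greedy matching within priority order are not taken from the patent text.

```python
from typing import NamedTuple


class Queue(NamedTuple):
    ingress: str
    egress: str
    allocated: float  # bandwidth allocated to the queue (hypothetical units)
    served: float     # bandwidth the queue has received so far


def tier(q: Queue) -> int:
    # Tier 0 (higher priority): still below its allocation. Tier 1: at or above it.
    return 0 if q.served < q.allocated else 1


def match(queues):
    """Greedily match queues to free egresses, higher-priority tiers first."""
    busy_ingress, busy_egress, matched = set(), set(), []
    for q in sorted(queues, key=tier):   # stable sort keeps order within a tier
        if q.ingress not in busy_ingress and q.egress not in busy_egress:
            matched.append(q)
            busy_ingress.add(q.ingress)
            busy_egress.add(q.egress)
    return matched


queues = [
    Queue("A", "X", allocated=10, served=12),  # over its allocation: tier 1
    Queue("B", "X", allocated=10, served=3),   # under its allocation: tier 0
    Queue("A", "Y", allocated=5, served=1),    # under its allocation: tier 0
]
pairs = [(q.ingress, q.egress) for q in match(queues)]
```

Here both tier-0 queues are matched first (B to X, A to Y), so the tier-1 queue from A to X finds both its ingress and its egress already taken.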


01 Jan 2004
TL;DR: This paper proposes a technique for dynamic cache partitioning amongst simultaneously executing processes/threads, and presents a general partitioning scheme that can be applied to set-associative caches at any partition granularity.
Abstract: This paper proposes a technique for dynamic cache partitioning amongst simultaneously executing processes/threads. We present a general partitioning scheme that can be applied to set-associative caches at any partition granularity. Furthermore, in our scheme, processes/threads can have overlapping partitions, which provides more degrees of freedom when partitioning caches with low associativity. Since memory reference characteristics of processes/threads can change very quickly, our method collects the miss-rate characteristics of processes/threads at run-time, and partitions the cache amongst the executing ones. Partition sizes are varied dynamically to improve miss-rates. Trace-driven simulation results show a relative improvement in the L2 hit-rate of up to 40.5% over those generated by the standard least recently used replacement policy, and IPC improvements of up to 17%. Our results show that smart cache management and scheduling is important for CMP/SMT systems to achieve high performance.

13 citations


Proceedings ArticleDOI
07 Mar 2004
TL;DR: The design of an input-queued switch system and its associated arbitration and rate allocation algorithms that achieve both absolute rate guarantees and proportional bandwidth sharing even under overloaded or adversarial traffic are described.
Abstract: Despite increasing bandwidth demand and the significant research and commercial activity in large-scale terabit routers for multi-gigabit/s links, many current switch designs do not provide adequate support for rate guarantees. In particular, designs based on the popular combined-input/output-queueing (CIOQ) paradigm have unpredictable performance despite implementing sophisticated scheduling schemes on egress links, because the crossbar arbitration between ingress and egress links is done without regard to desired rate guarantees or prevailing traffic conditions. This work describes the design of an input-queued switch system and its associated arbitration and rate allocation algorithms that achieve both absolute rate guarantees and proportional bandwidth sharing even under overloaded or adversarial traffic. Our algorithms are simple and scalable and require a switch speedup of two to provide rate guarantees; we give the theoretical justification and report on simulation results that justify our claims. A semiconductor chipset based on variants of these algorithms for routers with an aggregate capacity of 160 Gbps with links up to 10 Gbps is now commercially available, and a second-generation chipset supporting 640 Gbps is also available.

9 citations


Journal ArticleDOI
TL;DR: A method of polynomial simulation to calculate switching activities in a general-delay logic circuit is described, a generalization of the exact signal probability evaluation method due to Parker and McCluskey, which has been extended to handle temporal correlation and arbitrary transport delays.
Abstract: We describe a method of polynomial simulation to calculate switching activities in a general-delay logic circuit. This method is a generalization of the exact signal probability evaluation method due to Parker and McCluskey, which has been extended to handle temporal correlation and arbitrary transport delays. The method can target both combinational and sequential circuits. Our method is parameterized by a single parameter l, which determines the speed-accuracy tradeoff. l indicates the depth in terms of logic levels over which spatial signal correlation is taken into account. This is done by only taking into account reconvergent paths whose length is at most l. The rationale is that ignoring spatial correlation for signals that reconverge after many levels of logic introduces negligible error. When l = L, where L is the total number of levels of logic in the circuit, the method will produce the exact switching activity under a zero delay model, taking into account all internal correlation. We present results that show that the error in the switching activity and power estimates is very small even for small values of l. In fact, for most of the examples, power estimates with l = 0 are within 5% of the exact. However, this error can be higher than 20% for some examples. More robust estimates are obtained with l = 2, providing a good compromise between speed and accuracy.

4 citations
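
The exact signal probability evaluation that this method generalizes can be shown on a tiny circuit. The sketch below covers only the zero-delay, signal-probability part: gate outputs are polynomials in the input probabilities, and exponents are suppressed (x·x becomes x) so that reconvergent fanout is handled exactly. The representation (a dict from variable sets to coefficients) and all function names are illustrative assumptions; temporal correlation, transport delays, and the depth parameter l from the abstract are not modeled.

```python
from collections import defaultdict
from math import prod

ONE = {frozenset(): 1.0}   # the constant polynomial 1


def var(name):
    """Polynomial for a primary input with the given name."""
    return {frozenset([name]): 1.0}


def add(p, q, sign=1.0):
    out = defaultdict(float)
    for mono, coeff in p.items():
        out[mono] += coeff
    for mono, coeff in q.items():
        out[mono] += sign * coeff
    return {m: c for m, c in out.items() if c != 0.0}


def mul(p, q):
    out = defaultdict(float)
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            out[m1 | m2] += c1 * c2   # set union suppresses exponents: x*x -> x
    return {m: c for m, c in out.items() if c != 0.0}


def NOT(p):
    return add(ONE, p, -1.0)          # 1 - p


def AND(p, q):
    return mul(p, q)                  # p * q


def OR(p, q):
    return add(add(p, q), mul(p, q), -1.0)   # p + q - p*q


def evaluate(p, probs):
    """Substitute input probabilities into the polynomial."""
    return sum(c * prod(probs[v] for v in mono) for mono, c in p.items())


# f = (a AND b) OR (a AND c): input a fans out and reconverges at the OR gate.
a, b, c = var("a"), var("b"), var("c")
f = OR(AND(a, b), AND(a, c))
exact = evaluate(f, {"a": 0.5, "b": 0.5, "c": 0.5})
```

For this circuit the polynomial is ab + ac − abc, which gives 0.375 when every input has probability 0.5. Treating the two AND outputs as independent would instead give 0.25 + 0.25 − 0.0625 = 0.4375, which is the kind of error that ignoring spatial correlation introduces.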


Journal ArticleDOI
TL;DR: This paper designs and implements an architecture for access‐controlled resource discovery by integrating access control with the Intentional Naming System (INS), a resource discovery and service location system that fits well within a proxy‐based security framework designed for dynamic networks.
Abstract: Networks of the future will be characterized by a variety of computational devices that display a level of dynamism not seen in traditional wired networks. Because of the dynamic nature of these networks, resource discovery is one of the fundamental problems that must be solved. While resource discovery systems are not a novel concept, securing these systems in an efficient and scalable way is challenging. This paper describes the design and implementation of an architecture for access-controlled resource discovery. This system achieves this goal by integrating access control with the Intentional Naming System (INS), a resource discovery and service location system. The integration is scalable, efficient, and fits well within a proxy-based security framework designed for dynamic networks. We provide performance experiments that show how our solution outperforms existing schemes. The result is a system that provides secure, access-controlled resource discovery that can scale to large numbers of resources and users. Copyright © 2004 John Wiley & Sons, Ltd.