
Showing papers by "Igor L. Markov" published in 2010


Journal ArticleDOI
TL;DR: EPIC accomplishes this using a novel low-overhead combinational chip-locking system and a chip-activation protocol based on public-key cryptography.
Abstract: An effective technique to combat IC piracy is to render infringement impractical by making physical tampering unprofitable and attacks computationally infeasible. EPIC accomplishes this using a novel low-overhead combinational chip-locking system and a chip-activation protocol based on public-key cryptography.

321 citations


Proceedings ArticleDOI
07 Nov 2010
TL;DR: A self-contained, flat, quadratic global placer that is simpler than existing placers, easier to integrate into timing-closure flows, and amenable to parallelism; results are reported with SSE2 instructions and up to eight parallel threads.
Abstract: We propose a self-contained, flat, force-directed algorithm for global placement that is simpler than existing placers and easier to integrate into timing-closure flows. It maintains lower-bound and upper-bound placements that converge to a final solution. The upper-bound placement is produced by a novel rough legalization algorithm. Our placer SimPL outperforms mPL6, NTUPlace3, FastPlace3, APlace2 and Capo simultaneously in runtime and solution quality, running 6.4 times faster than mPL6 and reducing wirelength by 2% on the ISPD 2005 benchmark suite.

94 citations


Proceedings ArticleDOI
14 Mar 2010
TL;DR: A novel branch-free representation for routed nets, a trigonometric penalty function (TPF), dynamic adjustment of Lagrange multipliers (DALM), and aggressive lower-bound estimates (ALBE) for A*-search, resulting in faster routing are introduced.
Abstract: To ensure chip manufacturability, all routes must be completed without violations. Furthermore, the chip's power consumption and performance are determined by the length of its routed wires. Therefore, our work focuses on minimizing wirelength. Our key innovations include: (1) a novel branch-free representation (BFR) for routed nets, (2) a trigonometric penalty function (TPF), (3) dynamic adjustment of Lagrange multipliers (DALM), (4) cyclic net locking (CNL), and (5) aggressive lower-bound estimates (ALBE) for A*-search, resulting in faster routing. We complete all routable ISPD 2008 contest benchmarks and re-placed adaptec suite without violation and produce shorter routes.

81 citations


Proceedings ArticleDOI
17 Jun 2010
TL;DR: This work develops several verification techniques for reversible circuits which arise as runtime bottlenecks of key quantum algorithms, and extends existing quantum verification tools using SAT-solvers.
Abstract: We perform formal verification of quantum circuits by integrating several techniques specialized to particular classes of circuits. Our verification methodology is based on the new notion of a reversible miter that allows one to leverage existing techniques for simplification of quantum circuits. For reversible circuits which arise as runtime bottlenecks of key quantum algorithms, we develop several verification techniques and empirically compare them. We also extend existing quantum verification tools using SAT-solvers. Experiments with circuits for Shor's number-factoring algorithm, containing thousands of gates, show improvements in efficiency by four orders of magnitude.

48 citations
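
The reversible-miter idea in the abstract above can be illustrated for classical reversible circuits: append the inverse of the second circuit to the first and check that the result acts as the identity. The sketch below is a toy under stated assumptions, not the authors' tool. The gate encoding, the function names, and the exhaustive simulation (standing in for the simplification and SAT-based checks mentioned in the abstract) are all hypothetical choices made for this example, and it handles only NOT, CNOT, and Toffoli gates.

```python
from itertools import product

# Hypothetical gate encoding: (name, control wires..., target wire).
# NOT has no controls, CNOT has one, TOFFOLI has two.

def apply_gate(gate, bits):
    """Flip the target bit when every control bit is 1 (in place)."""
    _name, *wires = gate
    *controls, target = wires
    if all(bits[c] for c in controls):
        bits[target] ^= 1

def run(circuit, bits):
    """Simulate a circuit on one classical input and return the output."""
    state = list(bits)
    for gate in circuit:
        apply_gate(gate, state)
    return tuple(state)

def inverse(circuit):
    """NOT, CNOT, and Toffoli are self-inverse, so reversing the order suffices."""
    return list(reversed(circuit))

def reversible_miter(c1, c2):
    """c1 followed by the inverse of c2; the identity iff c1 and c2 agree."""
    return list(c1) + inverse(c2)

def equivalent(c1, c2, n):
    """Exhaustive check over all 2**n inputs (only feasible for small n)."""
    miter = reversible_miter(c1, c2)
    return all(run(miter, bits) == bits for bits in product((0, 1), repeat=n))

# Two three-CNOT realizations of a swap on wires 0 and 1.
swap_a = [("CNOT", 0, 1), ("CNOT", 1, 0), ("CNOT", 0, 1)]
swap_b = [("CNOT", 1, 0), ("CNOT", 0, 1), ("CNOT", 1, 0)]

if __name__ == "__main__":
    print(equivalent(swap_a, swap_b, 2))
    print(equivalent(swap_a, [("CNOT", 0, 1)], 2))
```

When the two circuits share structure, adjacent gates in the miter cancel, which is why a miter of near-identical circuits can often be simplified without simulating it at all.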


Book ChapterDOI
11 Jul 2010
TL;DR: A highly scalable symmetry detection algorithm based on a decision tree that combines elements of group-theoretic computation and SAT-inspired backtracking search is described and results of its application on the SAT 2009 competition benchmarks are provided.
Abstract: The past few years have seen significant progress in algorithms and heuristics for both SAT and symmetry detection. Additionally, the thesis that some of SAT’s intractability can be explained by the presence of symmetry, and that it can be addressed by the introduction of symmetry-breaking constraints, was tested, albeit only to a rather limited extent. In this paper we explore further connections between symmetry and satisfiability and demonstrate the existence of intractable SAT instances that exhibit few or no symmetries. Specifically, we describe a highly scalable symmetry detection algorithm based on a decision tree that combines elements of group-theoretic computation and SAT-inspired backtracking search, and provide results of its application on the SAT 2009 competition benchmarks. For SAT instances with significant symmetry we also compare SAT runtimes with and without the addition of symmetry-breaking constraints.

41 citations


Proceedings ArticleDOI
07 Nov 2010
TL;DR: This work develops new modeling techniques and algorithms, as well as a methodology, for clock power optimization subject to tight skew constraints in the presence of process variations, for high-performance CPUs and SoCs.
Abstract: Clock networks contribute a significant fraction of dynamic power and can be a limiting factor in high-performance CPUs and SoCs. The need for multi-objective optimization over a large parameter space and the increasing impact of process variation make clock network synthesis particularly challenging. In this work, we develop new modeling techniques and algorithms, as well as a methodology, for clock power optimization subject to tight skew constraints in the presence of process variations. Key contributions include a new time-budgeting step for clock-tree tuning, accurate optimizations that satisfy budgets, and modeling and optimization of variational skew. Our implementation, Contango 2.0, outperforms the winners of the ISPD 2010 clock-network synthesis contest on 45nm benchmarks from Intel and IBM.

37 citations


Patent
09 Mar 2010
TL;DR: In this article, a shared secret protocol is used between an IC designer and a fabrication facility building the IC, where the IC at the fabrication facility scrambles the bus on the IC using an encryption key generated from unique identification data received from the IC designer.
Abstract: Techniques are able to lock and unlock an integrated circuit (IC) based device by encrypting/decrypting a bus on the device. The bus may be a system bus for the IC, a bus within the IC, or an external input/output bus. A shared secret protocol is used between an IC designer and a fabrication facility building the IC. The IC at the fabrication facility scrambles the bus on the IC using an encryption key generated from unique identification data received from the IC designer. With the IC bus locked by the encryption key, only the IC designer may be able to determine and communicate the appropriate activation key required to unlock (e.g., unscramble) the bus and thus make the integrated circuit usable.

29 citations


Proceedings ArticleDOI
08 Mar 2010
TL;DR: This work proposes a methodology for Boolean matching under permutations of inputs and outputs (PP-equivalence checking problem) — a key step in incremental logic design that identifies large sections of a netlist that are not affected by a change in specifications.
Abstract: We propose a methodology for Boolean matching under permutations of inputs and outputs (PP-equivalence checking problem) --- a key step in incremental logic design that identifies large sections of a netlist that are not affected by a change in specifications. Finding reusable sections of a netlist reduces the amount of work in each design iteration and accelerates design closure. Our approach integrates graph-based, simulation-driven and SAT-based techniques to make Boolean matching feasible for large circuits. Experimental results confirm scalability of our techniques to circuits with hundreds and even thousands of inputs and outputs.

23 citations


Proceedings ArticleDOI
08 Mar 2010
TL;DR: This work offers new algorithms and a methodology for SPICE-accurate optimization of clock networks, coordinated to satisfy slew constraints and achieve best trade-offs between skew, insertion delay, power, as well as tolerance to variations.
Abstract: On-chip clock networks are remarkable in their impact on the performance and power of synchronous circuits, in their susceptibility to adverse effects of semiconductor technology scaling, as well as in their strong potential for improvement through better CAD algorithms and tools. Our work offers new algorithms and a methodology for SPICE-accurate optimization of clock networks, coordinated to satisfy slew constraints and achieve best trade-offs between skew, insertion delay, power, as well as tolerance to variations. Our implementation, called Contango, is evaluated on 45nm benchmarks from IBM Research and Texas Instruments with up to 50K sinks.

21 citations


Patent
09 Mar 2010
TL;DR: In this article, the authors proposed a scheme for reducing the likelihood of piracy of integrated circuit design using combinational circuit locking system and activation protocol based on public-key cryptography, where every integrated circuit is to be activated with an external key, which can only be generated by an authenticator.
Abstract: Techniques are provided for reducing the likelihood of piracy of integrated circuit design using combinational circuit locking system and activation protocol based on public-key cryptography. Every integrated circuit is to be activated with an external key, which can only be generated by an authenticator, such as the circuit designer. During circuit design, register transfer level (RTL) descriptions of the IC design are embedded with combinational logic based on a master key applied by the authenticator. That combinational logic renders at least one module of the RTL description locked, i.e., encrypted. The completed circuit design from the authenticator is sent to a fabrication lab with the combinationally locked modules. After fabrication, the circuit can only be activated when the authenticator sends an appropriate key that is used by the circuit to unlock the locked portions and thereby activate the circuit.

16 citations
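
The combinational locking described in the abstract above can be pictured with a toy example: key-controlled gates are inserted on internal wires so that the circuit computes its intended function only under the correct key. The sketch below is an assumption-laden illustration, not the patented scheme. The function, the two-bit key, and the gate placement are hypothetical, and the public-key activation protocol is not modeled.

```python
from itertools import product

def original(a, b, c):
    """Toy combinational function to protect: (a AND b) OR c."""
    return (a & b) | c

# Hypothetical 2-bit activation key, chosen only for this illustration.
CORRECT_KEY = (1, 0)

def locked(a, b, c, key):
    """Same function with two key gates inserted on internal wires."""
    k0, k1 = key
    w = (a & b) ^ k0 ^ 1  # XNOR key gate: passes a&b unchanged only when k0 == 1
    v = c ^ k1            # XOR key gate: passes c unchanged only when k1 == 0
    return w | v

def mismatches(key):
    """Count input patterns on which the locked circuit differs from the original."""
    return sum(
        locked(a, b, c, key) != original(a, b, c)
        for a, b, c in product((0, 1), repeat=3)
    )

if __name__ == "__main__":
    for key in product((0, 1), repeat=2):
        print(key, mismatches(key))
```

The key gates sit on wires that feed an OR rather than a chain of XORs; in an all-XOR path two wrong key bits could cancel each other and accidentally unlock the circuit.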


Journal ArticleDOI
TL;DR: This work extends modern digital synthesis with a novel technique, called SWEDE, that makes use of extensive external don't-cares present implicitly in existing simulation-based verification environments for circuit customization.
Abstract: Traditional digital circuit synthesis flows start from an HDL behavioral definition and assume that circuit functions are almost completely defined, making don't-care conditions rare. However, recent design methodologies do not always satisfy these assumptions. For instance, third-party IP blocks used in a system-on-chip are often overdesigned for the requirements at hand. By focusing only on the input combinations occurring in a specific application, one could resynthesize the system to greatly reduce its area and power consumption. Therefore we extend modern digital synthesis with a novel technique, called SWEDE, that makes use of extensive external don't-cares. In addition, we utilize such don't-cares present implicitly in existing simulation-based verification environments for circuit customization. Experiments indicate that SWEDE scales to large ICs with half-million input vectors and handles practical cases well.

Journal ArticleDOI
TL;DR: This work performs formal verification of quantum circuits by integrating several techniques specialized to particular classes of circuits, based on the new notion of a reversible miter, and combines existing quantum verification tools with the use of SAT-solvers.
Abstract: We perform formal verification of quantum circuits by integrating several techniques specialized to particular classes of circuits. Our verification methodology is based on the new notion of a reversible miter that allows one to leverage existing techniques for simplification of quantum circuits. For reversible circuits which arise as runtime bottlenecks of key quantum algorithms, we develop several verification techniques and empirically compare them. We also combine existing quantum verification tools with the use of SAT-solvers. Experiments with circuits for Shor's number-factoring algorithm, containing thousands of gates, show improvements in efficiency by four orders of magnitude.

Proceedings ArticleDOI
07 Nov 2010
TL;DR: This work develops an integrated transformation system that performs multiple optimizations simultaneously on larger design partitions than existing approaches, and combines physically-aware register retiming, along with a novel form of cloning and register placement.
Abstract: The impact of physical synthesis on design performance is increasing as process technology scales. Current physical synthesis flows generally perform a series of individual netlist transformations based on local timing conditions. However, such optimizations lack sufficient perspective or scope to achieve timing closure in many cases. To address these issues, we develop an integrated transformation system that performs multiple optimizations simultaneously on larger design partitions than existing approaches. Our system, SPIRE, combines physically-aware register retiming, along with a novel form of cloning and register placement. SPIRE also incorporates a placement-dependent static timing analyzer (STA) with a delay model that accounts for buffering and is suitable for physical synthesis. Empirical results on 45nm microprocessor designs show 8% improvement in worst-case slack and 69% improvement in total negative slack after an industrial physical synthesis flow was already completed.

Journal ArticleDOI
TL;DR: This article describes a paradigm of transactional timing analysis, which, together with incremental updates, offers an efficient, nested undo functionality that avoids significant timing calculations.
Abstract: Modern physical-synthesis flows operate on very large designs and perform increasingly aggressive timing optimizations. Traditional incremental timing analysis now represents the single greatest bottleneck in such optimizations and lacks the features necessary to support them efficiently. This article describes a paradigm of transactional timing analysis, which, together with incremental updates, offers an efficient, nested undo functionality that avoids significant timing calculations.

Journal ArticleDOI
TL;DR: This is a review of Advanced Excel for Scientific Data Analysis, 2nd ed, by Robert de Levie.
Abstract: This is a review of Advanced Excel for Scientific Data Analysis, 2nd ed. (by Robert de Levie)

Proceedings ArticleDOI
08 Mar 2010
TL;DR: This work utilizes EDA-inspired high-performance algorithms to simulate natural energy minimization in spin systems and study the significance of hyper-couplings in the context of recently implemented adiabatic quantum computers.
Abstract: With the prospect of atomic-scale computing, we study cumulative energy profiles of spin-spin interactions in non-ferromagnetic lattices (Ising spin-glasses), an established topic in solid-state physics that is now becoming relevant to atomic-scale EDA. Recent proposals suggest non-traditional computing devices based on nature's ability to find min-energy states. Spinto utilizes EDA-inspired high-performance algorithms to (i) simulate natural energy minimization in spin systems and (ii) study its potential for solving hard computational problems. Unlike previous work, our algorithms are not limited to planar Ising topologies. In one CPU-day, our branch-and-bound algorithm finds min-energy (ground) states on 100 spins, while our local search approximates ground states on 1,000,000 spins. We use this computational tool to study the significance of hyper-couplings in the context of recently implemented adiabatic quantum computers.
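
To make the notion of a min-energy (ground) state concrete, the sketch below enumerates all spin assignments of a tiny Ising model and returns one with the lowest energy. It is a minimal illustration under stated assumptions: the energy convention E(s) = -sum of J_ij * s_i * s_j, the dictionary encoding of couplings, and the function names are hypothetical, and exhaustive enumeration stands in for the branch-and-bound and local-search algorithms the abstract reports, which are not reproduced here.

```python
from itertools import product

def energy(spins, couplings):
    """Ising energy E(s) = -sum_{(i, j)} J_ij * s_i * s_j with spins in {-1, +1}."""
    return -sum(j * spins[a] * spins[b] for (a, b), j in couplings.items())

def ground_state(n, couplings):
    """Exhaustive search over all 2**n assignments (only feasible for small n)."""
    return min(product((-1, 1), repeat=n), key=lambda s: energy(s, couplings))

# Frustrated triangle: three antiferromagnetic couplings (J = -1) cannot all be
# satisfied at once, so at least one bond is left in its high-energy state.
triangle = {(0, 1): -1, (1, 2): -1, (0, 2): -1}

# Ferromagnetic chain: both bonds are satisfied when all spins align.
chain = {(0, 1): 1, (1, 2): 1}

if __name__ == "__main__":
    for name, model in (("triangle", triangle), ("chain", chain)):
        best = ground_state(3, model)
        print(name, best, energy(best, model))
```

The couplings dictionary can describe any interaction graph, planar or not, but the 2**n cost of enumeration is exactly what makes smarter search necessary beyond a few dozen spins.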

Dissertation
01 Jan 2010
TL;DR: The integrated optimization techniques described in this dissertation ensure a graceful timing-closure process and affect nearly every aspect of a typical physical synthesis flow; they integrate groups of related transformations to break circular dependencies and increase the number of circuit elements that can be jointly optimized to escape local minima.
Abstract: In modern VLSI design, physical synthesis tools are primarily responsible for satisfying chip-performance constraints by invoking a broad range of circuit optimizations, such as buffer insertion, logic restructuring, gate sizing and relocation. This process is known as timing closure. Our research seeks more powerful and efficient optimizations to improve the state of the art in modern chip design. In particular, we integrate timing-driven relocation, retiming, logic cloning, buffer insertion and gate sizing in novel ways to create powerful circuit transformations that help timing-critical paths satisfy setup-time constraints. State-of-the-art physical synthesis optimizations are typically applied at two scales: i) global algorithms that affect the entire netlist and ii) local transformations that focus on a handful of gates or interconnections. The scale of modern chip designs dictates that only near-linear-time optimization algorithms can be applied at the global scope — typically limited to wirelength-driven placement and legalization. Localized transformations can rely on more time-consuming optimizations with accurate delay models. Few techniques bridge the gap between fully-global and localized optimizations. This dissertation broadens the scope of physical synthesis optimization to include accurate transformations operating between the global and local scales. In particular, we integrate groups of related transformations to break circular dependencies and increase the number of circuit elements that can be jointly optimized to escape local minima. Integrated transformations in this dissertation are developed by identifying and removing obstacles to successful optimizations. Integration is achieved through mapping multiple operations to rigorous mathematical optimization problems that can be solved simultaneously. 
We achieve computational scalability in our techniques by leveraging analytical delay models and focusing optimization efforts on carefully selected regions of the chip. In this regard, we make extensive use of a linear interconnect-delay model that accounts for the impact of subsequent repeater insertion. Our integrated transformations are evaluated on high-performance circuits with over 100,000 gates. Integrated optimization techniques described in this dissertation ensure a graceful timing-closure process and impact nearly every aspect of a typical physical synthesis flow. They have been validated in EDA tools used at IBM for physical synthesis of high-performance CPU and ASIC designs, where they significantly improved chip performance.

Proceedings ArticleDOI
13 Jun 2010
TL;DR: This work focuses on stream processing and quantifies performance losses due to stochastic runtimes, derives new analytical results and validates them by numerical simulations, and explores unique benefits of stochasticity, showing that they outweigh the costs for software streams.
Abstract: With the end of clock-frequency scaling, parallelism has emerged as the key driver of chip-performance growth. Yet, several factors undermine efficient simultaneous use of on-chip resources, which continue scaling with Moore's law. These factors are often due to sequential dependencies, as illustrated by Amdahl's law. Quantifying achievable parallelism can help prevent futile programming efforts and guide innovation toward the most significant challenges. To complement Amdahl's law, we focus on stream processing and quantify performance losses due to stochastic runtimes. Using spectral theory of random matrices, we derive new analytical results and validate them by numerical simulations. These results allow us to explore unique benefits of stochasticity and show that they outweigh the costs for software streams.
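
For reference, Amdahl's law, which the abstract above sets out to complement, bounds the speedup of a workload whose parallelizable fraction is p when run on n workers: speedup = 1 / ((1 - p) + p / n). The sketch below evaluates only this classical formula. The function and parameter names are hypothetical, and the paper's stochastic-runtime results are not reproduced here.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Classical Amdahl's-law speedup for a fixed-size workload."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)

if __name__ == "__main__":
    # With 95% of the work parallelizable, adding workers gives diminishing returns.
    for n in (2, 8, 64, 1024):
        print(n, round(amdahl_speedup(0.95, n), 2))
```

As n grows, the speedup approaches 1 / (1 - p), so a workload that is 95% parallelizable can never run more than 20 times faster, however many workers are added.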