Showing papers in "IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems" in 2009
TL;DR: This paper provides a general description of NoC architectures and applications and enumerates several related research problems organized under five main categories: Application characterization, communication paradigm, communication infrastructure, analysis, and solution evaluation.
Abstract: To alleviate the complex communication problems that arise as the number of on-chip components increases, network-on-chip (NoC) architectures have been recently proposed to replace global interconnects. In this paper, we first provide a general description of NoC architectures and applications. Then, we enumerate several related research problems organized under five main categories: Application characterization, communication paradigm, communication infrastructure, analysis, and solution evaluation. Motivation, problem description, proposed approaches, and open issues are discussed for each problem from system, microarchitecture, and circuit perspectives. Finally, we address the interactions among these research problems and put the NoC design process into perspective.
TL;DR: Simulation results for a set of ISCAS-89 benchmark circuits and the advanced-encryption-standard IP core show that high levels of security can be achieved at less than 5% area and power overhead under delay constraint.
Abstract: Hardware intellectual-property (IP) cores have emerged as an integral part of modern system-on-chip (SoC) designs. However, IP vendors are facing major challenges to protect hardware IPs from IP piracy. This paper proposes a novel design methodology for hardware IP protection using netlist-level obfuscation. The proposed methodology can be integrated in the SoC design and manufacturing flow to simultaneously obfuscate and authenticate the design. Simulation results for a set of ISCAS-89 benchmark circuits and the advanced-encryption-standard IP core show that high levels of security can be achieved at less than 5% area and power overhead under delay constraint.
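One way to picture what obfuscation protects against is a toy behavioral model. Assume a common style of obfuscation in which the circuit powers up in an obfuscated mode and only produces correct outputs after a secret initialization key sequence. The class, the key values, and the toy adder are hypothetical. The paper itself transforms gate-level netlists, and its authentication mechanism is not modeled here.

```python
from typing import List, Sequence


class ObfuscatedCore:
    """Toy behavioral model (hypothetical): the core powers up in an obfuscated
    mode and only behaves correctly after an exact key sequence is applied."""

    def __init__(self, key: Sequence[int]) -> None:
        self._key: List[int] = list(key)  # hypothetical secret initialization key
        self._pos = 0                     # number of key words matched so far
        self.unlocked = False

    def clock(self, word: int) -> None:
        """Apply one input word during the initialization phase."""
        if self.unlocked:
            return
        if word == self._key[self._pos]:
            self._pos += 1
            self.unlocked = self._pos == len(self._key)
        else:
            self._pos = 0  # any wrong word returns the FSM to its reset state

    def compute(self, a: int, b: int) -> int:
        if self.unlocked:
            return (a + b) & 0xFF       # intended function: a toy 8-bit adder
        return (a ^ b ^ 0x5A) & 0xFF    # obfuscated mode: wrong outputs


core = ObfuscatedCore([0x3C, 0xA5, 0x0F])
for word in (0x3C, 0xA5, 0x0F):
    core.clock(word)
print(core.unlocked, core.compute(1, 2))
```

A pirated copy of the netlist that lacks the key stays in the obfuscated mode and computes the wrong function.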
TL;DR: This paper describes the architectural influence on static timing analysis and recommends profitable and unacceptable architectural features; its results also show that measurement-based methods still used in industry are not useful for quite commonly used complex processors.
Abstract: Embedded hard real-time systems need reliable guarantees for the satisfaction of their timing constraints. Experience with the use of static timing-analysis methods and the tools based on them in the automotive and the aeronautics industries is positive. However, both the precision of the results and the efficiency of the analysis methods are highly dependent on the predictability of the execution platform. In fact, the architecture determines whether a static timing analysis is practically feasible at all and whether the most precise obtainable results are precise enough. Results contained in this paper also show that measurement-based methods still used in industry are not useful for quite commonly used complex processors. This dependence on the architectural development is of growing concern to the developers of timing-analysis tools and their customers, the developers in industry. The problem reaches a new level of severity with the advent of multicore architectures in the embedded domain. This paper describes the architectural influence on static timing analysis and gives recommendations as to profitable and unacceptable architectural features.
TL;DR: This paper presents accelerated algorithms for restoring circuit state elements from the traces collected during a debug session, by exploiting bitwise parallelism and introduces new metrics that guide the automated selection of trace signals, which can enhance the real-time observability during in-system debug.
Abstract: To locate and correct design errors that escape pre-silicon verification, silicon debug has become a necessary step in the implementation flow of digital integrated circuits. Embedded logic analysis, which employs on-chip storage units to acquire data in real time from the internal signals of the circuit-under-debug, has emerged as a powerful technique for improving observability during in-system debug. However, as the amount of data that can be acquired is limited by the on-chip storage capacity, the decision on which signals to sample is essential when it is not known a priori where the bugs will occur. In this paper, we present accelerated algorithms for restoring circuit state elements from the traces collected during a debug session, by exploiting bitwise parallelism. We also introduce new metrics that guide the automated selection of trace signals, which can enhance the real-time observability during in-system debug.
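Bitwise parallelism can be illustrated with three-valued (0/1/unknown) logic. Each signal is stored as two bit masks, and bit t of each mask refers to clock cycle t. A single integer operation therefore evaluates a gate for every cycle of the trace window at once. In this sketch, the `Sig` type, the gate helpers, and the 8-cycle window are hypothetical. Only forward implication is shown, and the paper's restoration algorithms and signal-selection metrics are not reproduced.

```python
from dataclasses import dataclass

WINDOW = 8                   # hypothetical trace depth (cycles)
FULL = (1 << WINDOW) - 1


@dataclass(frozen=True)
class Sig:
    ones: int   # cycles in which the value is known to be 1
    zeros: int  # cycles in which the value is known to be 0

    @staticmethod
    def traced(bits: int) -> "Sig":
        # A traced signal is known in every cycle of the window.
        return Sig(bits & FULL, ~bits & FULL)


def and_gate(a: Sig, b: Sig) -> Sig:
    # 1 where both inputs are known 1; 0 wherever either input is known 0.
    return Sig(a.ones & b.ones, a.zeros | b.zeros)


def not_gate(a: Sig) -> Sig:
    return Sig(a.zeros, a.ones)


def flip_flop(d: Sig) -> Sig:
    # Q in cycle t+1 equals D in cycle t; Q in cycle 0 stays unknown.
    return Sig((d.ones << 1) & FULL, (d.zeros << 1) & FULL)


# Two traced flip-flops drive an AND gate feeding an untraced flip-flop q3.
q1 = Sig.traced(0b10110110)
q2 = Sig.traced(0b11010011)
q3 = flip_flop(and_gate(q1, q2))
print(bin(q3.ones), bin(q3.zeros))
```

When one AND input is untraced (`Sig(0, 0)`), the output is still restored in every cycle where the traced input is 0. Implications of this kind are what make some trace signals more valuable than others.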
TL;DR: In this paper, exact algorithms for the synthesis of multiple-control Toffoli networks are presented, i.e., algorithms that guarantee to find a network with the minimal number of gates.
Abstract: Synthesis of reversible logic has become a very important research area in recent years. Applications can be found in the domain of low-power design, optical computing, and quantum computing. In the past, several approaches have been introduced that synthesize reversible networks with respect to a given function. Most of these methods only approximate a minimal network representation. In this paper, exact algorithms for the synthesis of multiple-control Toffoli networks are presented, i.e., algorithms that guarantee to find a network with the minimal number of gates. Our iterative algorithms formulate the synthesis problem as a sequence of decision problems. The decision problems are encoded as Boolean satisfiability (SAT) or SAT modulo theory (SMT) instances, respectively. As soon as one of these instances becomes satisfiable, a Toffoli network representation for the given function has been found. We show that choosing the encoding for synthesis is crucial for the resulting runtimes. Furthermore, we discuss the principal limits of the SAT and SMT approaches. To overcome these limits, we propose a method using problem-specific knowledge during synthesis. In addition, better embeddings to make irreversible functions reversible are considered. For the resulting synthesis problems, an improvement is presented that reduces the overall runtime by automatically setting the constant inputs to their optimal values. Experimental results on a large set of benchmarks demonstrate the differences between three exact synthesis algorithms. In addition, a comparison with the best-known heuristic results is provided. In summary, the results show that, for some benchmarks, the heuristic approaches have already found the minimal network, while for other benchmarks, significantly smaller networks exist.
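The paper's iterative scheme asks, for d = 0, 1, 2, ..., whether a network with d gates exists. For very small functions, the same question can be answered by breadth-first search over networks instead of SAT/SMT instances. The first BFS layer that contains the specification gives the minimal gate count. This sketch swaps in exhaustive search and is only practical for a handful of lines. The function names and the gate encoding are hypothetical.

```python
from collections import deque
from itertools import combinations
from typing import List, Optional, Tuple

Gate = Tuple[int, int]  # (control bit mask, target line index)


def mct_library(n: int) -> List[Gate]:
    """All multiple-control Toffoli gates on n lines (NOT, CNOT, Toffoli, ...)."""
    gates = []
    for target in range(n):
        others = [i for i in range(n) if i != target]
        for k in range(len(others) + 1):
            for ctrl in combinations(others, k):
                gates.append((sum(1 << i for i in ctrl), target))
    return gates


def apply_gate(gate: Gate, x: int) -> int:
    mask, target = gate
    # Flip the target bit when all control bits are 1 (no controls: always).
    return x ^ (1 << target) if (x & mask) == mask else x


def simulate(gates: List[Gate], x: int) -> int:
    for g in gates:
        x = apply_gate(g, x)
    return x


def exact_synthesis(spec: Tuple[int, ...], n: int) -> Optional[List[Gate]]:
    """Minimum-gate MCT network with simulate(net, x) == spec[x] for all x."""
    start = tuple(range(1 << n))          # empty network = identity
    parent = {start: None}
    queue = deque([start])
    lib = mct_library(n)
    while queue:
        f = queue.popleft()
        if f == spec:                     # BFS order => fewest gates
            gates = []
            while parent[f] is not None:
                f, g = parent[f]
                gates.append(g)
            return gates[::-1]
        for g in lib:
            nxt = tuple(apply_gate(g, y) for y in f)   # append g at the output
            if nxt not in parent:
                parent[nxt] = (f, g)
                queue.append(nxt)
    return None                           # spec is not a reversible function


# Swap lines 0 and 1 on a 3-line circuit.
swap = tuple(((x & 1) << 1) | ((x >> 1) & 1) | (x & 4) for x in range(8))
print(exact_synthesis(swap, 3))
```

The state space grows as (2^n)!. This is why the paper relies on SAT/SMT encodings and problem-specific knowledge rather than enumeration.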
TL;DR: This paper develops and proposes a novel classification for ESL synthesis tools, presents six different academic approaches in this context, and from them identifies common principles and needs that are ultimately required for a true ESL synthesis solution.
Abstract: With ever-increasing system complexities, all major semiconductor roadmaps have identified the need for moving to higher levels of abstraction in order to increase productivity in electronic system design. Most recently, many approaches and tools that claim to realize and support a design process at the so-called electronic system level (ESL) have emerged. However, faced with the vast complexity challenges, in most cases only partial solutions, at best, are available. In this paper, we develop and propose a novel classification for ESL synthesis tools, and we present six different academic approaches in this context. Based on these observations, we identify common principles and needs that lead toward, and are ultimately required for, a true ESL synthesis solution, covering the whole design process from specification to implementation for complete systems across hardware and software boundaries.
TL;DR: Statistical blockade is a Monte Carlo technique that filters, i.e., blocks, unwanted samples that are insufficiently rare in the tail of the distribution; for SRAM yield analysis it achieves speedups of 10-100 times over standard Monte Carlo.
Abstract: Circuit reliability under random parametric variation is an area of growing concern. For highly replicated circuits, e.g., static random access memories (SRAMs), a rare statistical event for one circuit may induce a not-so-rare system failure. Existing techniques perform poorly when tasked to generate both efficient sampling and sound statistics for these rare events. Statistical blockade is a novel Monte Carlo technique that allows us to efficiently filter, i.e., block, unwanted samples that are insufficiently rare in the tail distributions we seek. The method synthesizes ideas from data mining and extreme value theory and, for the challenging application of SRAM yield analysis, shows speedups of 10-100 times over standard Monte Carlo.
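The blocking step works as follows. A cheap classifier with a relaxed threshold discards samples that are clearly not in the tail, so the expensive simulator only runs on the remaining candidates. In this sketch, a hand-written linear surrogate stands in for the trained classifier, a bounded nonlinear function stands in for SPICE, and `THRESHOLD` and `MARGIN` are hypothetical values. Because the surrogate error never exceeds the margin, no tail sample is lost.

```python
import math
import random

THRESHOLD = 3.0  # hypothetical failure threshold on the normalized circuit metric
MARGIN = 0.2     # relaxed classifier threshold so true tail samples are not blocked


def surrogate(x):
    """Cheap model standing in for the trained classifier (assumption)."""
    return (x[0] + x[1]) / math.sqrt(2)


def expensive_sim(x):
    """Stand-in for a full circuit simulation; differs from surrogate by <= 0.1."""
    return surrogate(x) + 0.1 * math.sin(5 * x[0])


def blockade_estimate(n, seed=0):
    """Failure-probability estimate that simulates only unblocked samples."""
    rng = random.Random(seed)
    sims = fails = 0
    for _ in range(n):
        x = (rng.gauss(0, 1), rng.gauss(0, 1))
        if surrogate(x) < THRESHOLD - MARGIN:
            continue  # blocked: predicted to be insufficiently rare
        sims += 1
        fails += expensive_sim(x) > THRESHOLD
    return fails / n, sims


def brute_force_estimate(n, seed=0):
    """Reference: simulate every sample (same random stream as above)."""
    rng = random.Random(seed)
    fails = 0
    for _ in range(n):
        x = (rng.gauss(0, 1), rng.gauss(0, 1))
        fails += expensive_sim(x) > THRESHOLD
    return fails / n


p, sims = blockade_estimate(20000)
print(p, sims)
```

The sketch reaches the same estimate as brute force while simulating only a small fraction of the samples. It omits the extreme-value-theory step, fitting a generalized Pareto distribution to the simulated tail samples, which the method uses to estimate probabilities far rarer than plain counting can resolve.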
TL;DR: Simulation results on a complex superscalar processor demonstrate that IFRA is effective in accurately localizing electrical bugs with very little impact on overall chip area.
Abstract: Instruction Footprint Recording and Analysis (IFRA) overcomes challenges associated with an expensive step in post-silicon validation of processors: pinpointing the bug location and the instruction sequence that exposes the bug from a system failure. On-chip recorders collect instruction footprints (information about flows of instructions and what the instructions did as they passed through various design blocks) during the normal operation of the processor in a post-silicon system validation setup. Upon system failure, the recorded information is scanned out and analyzed offline for bug localization. Special self-consistency-based program analysis techniques, together with the test program binary of the application executed during post-silicon validation, are used for this purpose. Major benefits of using IFRA over traditional techniques for post-silicon bug localization are as follows: 1) it does not require full system-level reproduction of bugs, and 2) it does not require full system-level simulation. Simulation results on a complex superscalar processor demonstrate that IFRA is effective in accurately localizing electrical bugs with very little impact on overall chip area.
TL;DR: This paper investigates how to use predictors for forecasting temperature and workload dynamics, and proposes proactive thermal management techniques for multiprocessor system-on-chips.
Abstract: Conventional thermal management techniques are reactive, as they take action after temperature reaches a threshold. Such approaches do not always minimize and balance the temperature, and they control temperature at a noticeable performance cost. This paper investigates how to use predictors for forecasting temperature and workload dynamics, and proposes proactive thermal management techniques for multiprocessor system-on-chips. The predictors we study include autoregressive moving average modeling and lookup tables. We evaluate several reactive and predictive techniques on an UltraSPARC T1 processor and an architecture-level simulator. Proactive methods achieve significantly better thermal profiles and performance in comparison to reactive policies.
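The difference between reactive and proactive control can be illustrated with a simple predictor. This sketch assumes a first-order autoregressive model with an intercept, T[t] = a + b * T[t-1], fitted by least squares. That is a simplification of the ARMA models studied in the paper. The function names, the 3-step horizon, and the "throttle" action are hypothetical.

```python
from typing import List, Tuple


def fit_ar1(temps: List[float]) -> Tuple[float, float]:
    """Least-squares fit of T[t] = a + b * T[t-1] (needs a non-constant history)."""
    xs, ys = temps[:-1], temps[1:]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def forecast(temps: List[float], steps: int = 1) -> float:
    a, b = fit_ar1(temps)
    t = temps[-1]
    for _ in range(steps):
        t = a + b * t
    return t


def proactive_action(temps: List[float], limit: float, steps: int = 3) -> str:
    # Act on the *predicted* temperature; a reactive policy would wait until
    # the measured temperature itself exceeds `limit`.
    return "throttle" if forecast(temps, steps) > limit else "keep"


# Hypothetical sensor history of a core heating toward 100 degrees C.
history = [60.0]
for _ in range(6):
    history.append(20 + 0.8 * history[-1])
print(history[-1], proactive_action(history, limit=92.0))
```

In this example, the last measured temperature is still below the limit, yet the forecast crosses it, so the proactive policy acts early.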
TL;DR: An efficient DSE methodology for application-specific MPSoCs is proposed; it finds a set of good candidate architecture configurations with a minimal number of simulations.
Abstract: Application-specific multiprocessor systems-on-chip (MPSoCs) are usually designed by using a platform-based approach, where a wide range of customizable parameters can be tuned to find the best tradeoff in terms of the selected figures of merit (such as energy, delay, and area). This optimization phase is called design space exploration (DSE), and it usually consists of a multiobjective optimization problem with multiple constraints. So far, several heuristic techniques have been proposed to address the DSE problem for MPSoCs, but they are not efficient enough for managing the application-specific constraints and for identifying the Pareto front. In this paper, an efficient DSE methodology for application-specific MPSoCs is proposed. The methodology is efficient in the sense that it finds a set of good candidate architecture configurations with a minimal number of simulations. The methodology combines design of experiments (DoE) and response surface modeling (RSM) techniques for managing system-level constraints. First, the DoE phase generates an initial plan of experiments used to create a coarse view of the target design space to be explored by simulations. Then, a set of RSM techniques is used to refine the simulation-based exploration by exploiting the application-specific constraints to identify the maximum number of feasible solutions. To assess the tradeoff between the accuracy and efficiency of the proposed techniques, this paper reports a set of experimental results for the customization of a symmetric shared-memory on-chip multiprocessor running actual workloads.
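The first DoE step and the final Pareto-front identification can be sketched in a few lines. This sketch assumes a two-level full-factorial plan and two minimized objectives (energy, delay). `toy_model` is a hypothetical stand-in for simulation, and the RSM step that would predict unsimulated configurations is omitted.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # (energy, delay), both to be minimized


def full_factorial(levels: Dict[str, Sequence[int]]) -> List[Dict[str, int]]:
    """DoE plan: every combination of the listed parameter levels."""
    names = sorted(levels)
    return [dict(zip(names, combo)) for combo in product(*(levels[n] for n in names))]


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep the points that no other point dominates in both objectives."""
    return [p for p in points
            if not any(q != p and q[0] <= p[0] and q[1] <= p[1] for q in points)]


def toy_model(cfg: Dict[str, int]) -> Point:
    # Hypothetical stand-in for a cycle-accurate MPSoC simulation.
    energy = cfg["cores"] * 1.0 + cfg["cache_kb"] / 16
    delay = 8 / cfg["cores"] + 32 / cfg["cache_kb"]
    return energy, delay


plan = full_factorial({"cores": (2, 4), "cache_kb": (16, 64)})
results = [toy_model(cfg) for cfg in plan]
print(pareto_front(results))
```

A full factorial plan grows exponentially with the number of parameters. This growth is the motivation for reduced DoE plans combined with response surfaces.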
TL;DR: Synchronous and asynchronous elastic circuits can be designed, analyzed, and optimized using similar techniques, and choices between synchronous and asynchronous implementations are localized and deferred until late in the design process.
Abstract: Elasticity in circuits and systems provides tolerance to variations in computation and communication delays. This paper presents a comprehensive overview of elastic circuits for those designers who are mainly familiar with synchronous design. Elasticity can be implemented both synchronously and asynchronously, although it was traditionally more often associated with asynchronous circuits. This paper shows that synchronous and asynchronous elastic circuits can be designed, analyzed, and optimized using similar techniques. Thus, choices between synchronous and asynchronous implementations are localized and deferred until late in the design process.
TL;DR: Three accurate and scalable algorithms for reliability analysis of logic circuits are presented; the first provides a closed-form expression for reliability and is accurate when single gate failures are dominant in a logic circuit.
Abstract: Reliability of logic circuits is emerging as an important concern in scaled electronic technologies. Reliability analysis of logic circuits is computationally complex because of the exponential number of inputs, combinations, and correlations in gate failures. This paper presents three accurate and scalable algorithms for reliability analysis of logic circuits. The first algorithm, called observability-based reliability analysis, provides a closed-form expression for reliability and is accurate when single gate failures are dominant in a logic circuit. The second algorithm, called single-pass reliability analysis, computes reliability in a single topological walk through the logic circuit. It computes the exact reliability for circuits without reconvergent fan-out, even in the presence of multiple gate failures. The algorithm can also handle circuits with reconvergent fan-out with high accuracy using correlation coefficients as described in this paper. The third algorithm, called maximum-k gate failure reliability analysis, allows a constraint on the maximum number (k) of gates that can fail simultaneously in a logic circuit. Simulation results for several benchmark circuits demonstrate the accuracy, performance, and potential applications of the proposed algorithms.
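A single topological pass can be sketched by giving each signal a joint distribution over (fault-free value, actual value). Each gate flips its output with probability `eps`, independently of the others. Under these assumptions, and with uniformly random error-free primary inputs, combining input distributions independently is exact for fan-out-free circuits. The helper names are hypothetical, and the paper's correlation-coefficient handling of reconvergent fan-out is not modeled.

```python
from itertools import product
from typing import Callable, Dict, Tuple

Joint = Dict[Tuple[int, int], float]  # (ideal value, actual value) -> probability


def primary_input(p_one: float = 0.5) -> Joint:
    # Primary inputs are assumed error-free.
    return {(0, 0): 1 - p_one, (0, 1): 0.0, (1, 0): 0.0, (1, 1): p_one}


def gate(fn: Callable[..., int], eps: float, *inputs: Joint) -> Joint:
    out: Joint = {k: 0.0 for k in product((0, 1), repeat=2)}
    # Inputs are combined as independent (exact without reconvergent fan-out).
    for combo in product(*(j.items() for j in inputs)):
        prob = 1.0
        for _, p in combo:
            prob *= p
        ideal = fn(*(key[0] for key, _ in combo))
        actual = fn(*(key[1] for key, _ in combo))
        out[(ideal, actual)] += prob * (1 - eps)   # gate works
        out[(ideal, 1 - actual)] += prob * eps     # gate output flips
    return out


def reliability(j: Joint) -> float:
    """Probability that the actual value equals the fault-free value."""
    return j[(0, 0)] + j[(1, 1)]


AND = lambda x, y: x & y
NOT = lambda x: 1 - x
a, b = primary_input(), primary_input()
chain = gate(NOT, 0.1, gate(NOT, 0.1, a))
print(reliability(gate(AND, 0.1, a, b)), reliability(chain))
```

The two-inverter chain shows why multiple failures matter. Two errors cancel, so its reliability is (1 - eps)^2 + eps^2 rather than (1 - eps)^2.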
TL;DR: The results show that the proposed online-learning algorithm adapts well and achieves an overall performance comparable to the best-performing expert at any point in time, with energy savings as high as 61% and 49% for HDD and CPU, respectively.
Abstract: In this paper, we propose a novel online-learning algorithm for system-level power management. We formulate both dynamic power management (DPM) and dynamic voltage-frequency scaling problems as problems of workload characterization and selection and solve them using our algorithm. The selection is done among a set of experts, which refers to a set of DPM policies and voltage-frequency settings, leveraging the fact that different experts outperform each other under different workloads and device leakage characteristics. The online-learning algorithm adapts to changes in the characteristics and guarantees fast convergence to the best-performing expert. In our evaluation, we perform experiments on a hard disk drive (HDD) and Intel PXA27x core (CPU) with real-life workloads. Our results show that our algorithm adapts well and achieves an overall performance comparable to the best-performing expert at any point in time, with energy savings as high as 61% and 49% for HDD and CPU, respectively. Moreover, it is extremely lightweight and has negligible overhead.
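Selection among experts resembles the classic multiplicative-weights ("Hedge") scheme from online learning. This is a generic textbook variant, not the authors' algorithm. It assumes per-interval losses normalized to [0, 1]. The class name, `beta`, and the loss definition are hypothetical.

```python
from typing import List, Sequence


class ExpertSelector:
    """Multiplicative-weights selection among experts (e.g., DPM policies or
    voltage-frequency settings)."""

    def __init__(self, n_experts: int, beta: float = 0.8) -> None:
        self.weights: List[float] = [1.0] * n_experts
        self.beta = beta  # hypothetical learning-rate parameter in (0, 1)

    def choose(self) -> int:
        # Deterministic variant: follow the currently heaviest expert.
        return max(range(len(self.weights)), key=self.weights.__getitem__)

    def update(self, losses: Sequence[float]) -> None:
        """losses[i] in [0, 1], e.g., a normalized energy/latency cost of expert i
        over the last interval."""
        self.weights = [w * self.beta ** l for w, l in zip(self.weights, losses)]
        total = sum(self.weights)
        self.weights = [w / total for w in self.weights]  # avoid underflow


sel = ExpertSelector(3)
for _ in range(25):
    sel.update([0.9, 0.2, 0.6])   # workload phase where expert 1 is best
print(sel.choose())
```

Because the weights track discounted cumulative loss, the selector switches to a new best expert once that expert's accumulated advantage overtakes the old one. Under a workload change, this produces the adaptive behavior described in the abstract.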
TL;DR: This work presents a distributed architecture for a data-acquisition system that is based on a number of complex intelligent sensors inside the tire that form a wireless sensor network with coordination nodes placed on the body of the car.
Abstract: Active safety systems are based upon the accurate and fast estimation of the value of important dynamical variables such as forces, load transfer, actual tire-road friction (kinetic friction) μ_k, and maximum tire-road friction available (potential friction) μ_p. Measuring these parameters directly from tires offers the potential for significantly improving the performance of active safety systems. We present a distributed architecture for a data-acquisition system that is based on a number of complex intelligent sensors inside the tire that form a wireless sensor network with coordination nodes placed on the body of the car. The design of this system has been extremely challenging due to the very limited available energy combined with strict application requirements for data rate, delay, size, weight, and reliability in a highly dynamical environment. Moreover, it required expertise in multiple engineering disciplines, including control-system design, signal processing, integrated-circuit design, communications, real-time software design, antenna design, energy scavenging, and system assembly.
TL;DR: This paper presents the first linear-time-packing algorithm for the placement with symmetry constraints using the topological floorplan representations and proposes automatically symmetric-feasible (ASF) B*-trees to directly model the placement of a symmetry island.
Abstract: To reduce the effect of parasitic mismatches and circuit sensitivity to thermal gradients or process variations for analog circuits, some pairs of modules need to be placed symmetrically with respect to a common axis, and the symmetric modules should be placed in the closest possible proximity for better electrical properties. Most previous works handle the problem with symmetry constraints by imposing symmetric-feasible conditions in floorplan representations and using cost functions to minimize the distance between symmetric modules. Such approaches are inefficient due to the large search space and cannot guarantee the closest proximity of symmetry modules. In this paper, we present the first linear-time-packing algorithm for the placement with symmetry constraints using the topological floorplan representations. We first introduce the concept of a symmetry island, which is formed by modules of the same symmetry group in a single connected placement. Based on this concept and the B*-tree representation, we propose automatically symmetric-feasible (ASF) B*-trees to directly model the placement of a symmetry island. We then present hierarchical B*-trees (HB*-trees), which can simultaneously optimize the placement with both symmetry islands and nonsymmetric modules. Unlike the previous works, our approach can place the symmetry modules in a symmetry group in close proximity and significantly reduce the search space based on the symmetry-island formulation. In particular, the packing time for an ASF-B*-tree or an HB*-tree is the same as that for a plain B*-tree (only linear) and much faster than previous works. Experimental results show that our approach achieves the best-published quality and runtime efficiency for analog placement.
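The approach builds on plain B*-tree packing. This sketch assumes the usual B*-tree convention: a left child sits immediately to the right of its parent, and a right child sits above its parent at the same x coordinate. Symmetry islands, ASF conditions, and HB*-trees are not modeled. For brevity, the overlap scan is quadratic instead of using the contour structure that makes packing linear. The names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    name: str
    w: int
    h: int
    left: Optional["Node"] = None   # placed immediately right of the parent
    right: Optional["Node"] = None  # placed above the parent, same x


def pack(root: Node) -> Dict[str, Tuple[int, int]]:
    placed: List[Tuple[int, int, int]] = []  # (x_start, x_end, top) per module
    coords: Dict[str, Tuple[int, int]] = {}

    def place(node: Optional[Node], x: int) -> None:
        if node is None:
            return
        # Lowest y clearing every placed module overlapping [x, x + w).
        # (A contour list would make this amortized constant time.)
        y = max((top for xs, xe, top in placed if xs < x + node.w and x < xe),
                default=0)
        coords[node.name] = (x, y)
        placed.append((x, x + node.w, y + node.h))
        place(node.left, x + node.w)   # preorder: left subtree first
        place(node.right, x)

    place(root, 0)
    return coords


tree = Node("A", 2, 2, left=Node("B", 1, 3),
            right=Node("C", 2, 1, left=Node("D", 1, 1)))
print(pack(tree))
```

Because every tree maps to exactly one compacted placement, an optimizer can perturb the tree rather than coordinates. The ASF-B*-tree constrains those perturbations so that a symmetry island always stays symmetric.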
TL;DR: A signature-based CAD framework is presented that incorporates tools for the logic-level analysis of soft error rate (AnSER) and for signature-based design for reliability (SiDeR); combining the two synthesis approaches can yield further area-reliability improvements.
Abstract: We explore the use of signatures, i.e., partial truth tables generated via bit-parallel functional simulation, during soft error analysis and logic synthesis. We first present a signature-based CAD framework that incorporates tools for the logic-level analysis of soft error rate (AnSER) and for signature-based design for reliability (SiDeR). We observe that the soft error rate (SER) of a logic circuit is closely related to various testability parameters, such as signal observability and probability. We show that these parameters can be computed very efficiently (in linear time) by means of signatures. Consequently, AnSER evaluates logic masking two to three orders of magnitude faster than other SER evaluators while maintaining accuracy. AnSER can also compute SER efficiently in sequential circuits by approximating steady-state probabilities and sequential signal observabilities. In the second part of this paper, we incorporate AnSER into logic synthesis design flows aimed at reliable circuit design. SiDeR identifies and exploits redundancy already present in a circuit via signature comparison to decrease SER. We show that SiDeR reduces SER by 40% with only 13% area overhead. We also describe a second signature-based synthesis strategy that employs local rewriting to simultaneously improve area and decrease SER. This technique yields 13% reduction in SER with a 2% area decrease. We show that combining the two synthesis approaches can result in further area-reliability improvements.
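Signatures map naturally onto arbitrary-precision integers. Bit k holds a node's value under input pattern k, so one bitwise operation simulates a gate for all patterns. In this sketch, signal probability is a popcount, and observability is the fraction of patterns in which complementing a node changes the output. The toy circuit `(a AND b) OR c` and all helper names are hypothetical, and AnSER's full SER computation is not reproduced.

```python
import random


def random_signature(width: int, rng: random.Random) -> int:
    return rng.getrandbits(width)


def probability(sig: int, width: int) -> float:
    """Signal probability: fraction of patterns in which the node is 1."""
    return bin(sig).count("1") / width


def toy_circuit(a: int, b: int, c: int, mask: int) -> int:
    # out = (a AND b) OR c, evaluated for all patterns with three bitwise ops.
    return ((a & b) | c) & mask


def observability_a(a: int, b: int, c: int, width: int) -> float:
    """Fraction of patterns where complementing a changes the output,
    i.e., where an error on a is not logically masked."""
    mask = (1 << width) - 1
    diff = toy_circuit(a, b, c, mask) ^ toy_circuit(~a & mask, b, c, mask)
    return bin(diff).count("1") / width


def candidate_equivalent(sig_x: int, sig_y: int) -> bool:
    # Equal signatures only *suggest* equivalence; a formal check must confirm it.
    return sig_x == sig_y


if __name__ == "__main__":
    rng = random.Random(1)
    width = 256
    a, b, c = (random_signature(width, rng) for _ in range(3))
    print(observability_a(a, b, c, width))  # close to P(b=1, c=0)
```

Signature comparison of this kind is also how redundancy candidates can be found cheaply: nodes whose signatures match are candidates for equivalence and still need verification.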
TL;DR: This paper presents MiGra, a lightweight thermal balancing policy that bounds on-chip temperature gradients via task migration and achieves significantly better thermal balancing than state-of-the-art thermal management solutions while keeping the number of migrations bounded.
Abstract: Die-temperature control to avoid hotspots is increasingly critical in multiprocessor systems-on-chip (MPSoCs) for stream computing. In this context, thermal balancing policies based on task migration are a promising approach to redistribute power dissipation and even out temperature gradients. Since stream computing applications require strict quality of service and timing constraints, the real-time performance impact of thermal balancing policies must be carefully evaluated. In this paper, we present the design of a lightweight thermal balancing policy MiGra, which bounds on-chip temperature gradients via task migration. The proposed policy exploits run-time temperature as well as workload information of streaming applications to define suitable run-time thermal migration patterns, which minimize the number of deadline misses. Furthermore, we have experimentally assessed the effectiveness of our thermal balancing policy using a complete field-programmable-gate-array-based emulation of an actual three-core MPSoC streaming platform coupled with a thermal simulator. Our results indicate that MiGra achieves significantly better thermal balancing than state-of-the-art thermal management solutions while keeping the number of migrations bounded.
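A gradient-bounding migration rule is sketched below. It is not MiGra itself: the abstract does not give MiGra's exact migration patterns. This sketch assumes a simple rule that proposes a migration only when the spread between the hottest and coolest cores exceeds a bound, which keeps the number of migrations low. The function name, the task-load representation, and the "move the lightest task" heuristic are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def pick_migration(temps: List[float], loads: Dict[int, List[float]],
                   max_gradient: float) -> Optional[Tuple[int, int, int]]:
    """Return (task_index, src_core, dst_core), or None if no migration is needed.

    temps[i] is core i's temperature; loads[i] lists a power-like load for each
    task currently mapped on core i (hypothetical representation)."""
    hot = max(range(len(temps)), key=temps.__getitem__)
    cold = min(range(len(temps)), key=temps.__getitem__)
    if temps[hot] - temps[cold] <= max_gradient or not loads[hot]:
        return None  # gradient already within bound: avoid migration overhead
    # Simple heuristic: move the lightest task from the hottest to the coolest core.
    task = min(range(len(loads[hot])), key=loads[hot].__getitem__)
    return task, hot, cold


print(pick_migration([70.0, 55.0, 62.0], {0: [0.3, 0.1, 0.4], 1: [0.2], 2: [0.5]}, 10.0))
```

A real policy for streaming workloads would also weigh the migration cost against deadlines, which is the real-time concern the abstract raises.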
TL;DR: The proposed workload estimation technique for DVS is based on the Kalman filter and estimates the processing time of workloads robustly and accurately by adaptively calibrating the estimation error through feedback; its accuracy is almost comparable to the oracle accuracy achievable only by offline analysis.
Abstract: Dynamic voltage scaling (DVS) is a popular energy-saving technique for real-time tasks. The effectiveness of DVS critically depends on the accuracy of workload estimation, since DVS exploits the slack, i.e., the difference between the deadline and the execution time. Many existing DVS techniques are profile based and simply utilize the worst-case or average execution time without estimation. Several recent approaches recognize the importance of workload estimation and adopt statistical estimation techniques. However, these approaches still require extensive profiling to extract reliable workload statistics and furthermore cannot effectively handle time-varying workloads. Feedback-control-based adaptive algorithms have been proposed to handle such nonstationary workloads, but their results are often too sensitive to parameter selection. To overcome these limitations of existing approaches, we propose a novel workload estimation technique for DVS. This technique is based on the Kalman filter and can estimate the processing time of workloads in a robust and accurate manner by adaptively calibrating the estimation error through feedback. We tested the proposed method with workloads of various characteristics extracted from eight MPEG video clips. To thoroughly evaluate the performance of our approach, we used both a cycle-accurate simulator and an XScale-based test board. Our simulation results demonstrate that the proposed technique outperforms the compared alternatives with respect to the ability to meet given timing and Quality of Service constraints. Furthermore, we found that the accuracy of our approach is almost comparable to the oracle accuracy achievable only by offline analysis. Experimental results indicate that our approach can reduce energy consumption by 57.5% on average, with a negligible deadline miss ratio (DMR) of around 6.1%. Moreover, the average computational overhead of the proposed technique is just 0.3%, the lowest among the compared methods. More importantly, the DMR of our method is bounded by 11.7% in the worst case, while those of the other methods are at least twice as high.
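Kalman-filter workload estimation can be sketched with a scalar filter. This sketch assumes a random-walk model for per-frame execution cycles, a measurement after every frame, and a frequency table for DVS. The noise values `q` and `r`, the class and function names, and the frequency levels are hypothetical, and the paper's exact filter formulation may differ.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class WorkloadKalman:
    """Scalar Kalman filter for per-frame execution cycles (random-walk model)."""
    estimate: float          # current estimate of cycles for the next frame
    variance: float = 1e6    # estimate uncertainty; large means "trust data early"
    q: float = 1e4           # hypothetical process noise (how fast workload drifts)
    r: float = 1e5           # hypothetical measurement noise

    def predict(self) -> float:
        # Random walk: the next frame is expected to cost what the last one did,
        # but uncertainty grows by q.
        self.variance += self.q
        return self.estimate

    def correct(self, measured_cycles: float) -> None:
        # Feedback step: blend in the measured cycles, weighted by the Kalman gain.
        gain = self.variance / (self.variance + self.r)
        self.estimate += gain * (measured_cycles - self.estimate)
        self.variance *= 1.0 - gain


def pick_frequency(predicted_cycles: float, deadline_s: float,
                   levels_hz: Sequence[float]) -> float:
    """Lowest available frequency predicted to finish before the deadline."""
    for f in sorted(levels_hz):
        if predicted_cycles / f <= deadline_s:
            return f
    return max(levels_hz)  # nothing fits: run as fast as possible


kf = WorkloadKalman(estimate=1.5e6)
for cycles in (1.9e6, 2.1e6, 2.0e6):       # hypothetical per-frame measurements
    f = pick_frequency(kf.predict(), 1 / 30, [100e6, 200e6, 400e6])
    kf.correct(cycles)
print(kf.estimate, f)
```

The gain adapts automatically: it is large while the estimate is uncertain and settles to a steady value. This adaptation is what lets such a filter follow time-varying workloads without hand-tuned feedback parameters.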
TL;DR: A new on-chip continuous-flow decompressor, used together with a power-aware scan controller, reduces switching activity during scan test; the approach integrates seamlessly with the test logic synthesis flow and fits well into various design paradigms, including modular design flows where blocks come with individual decompressors and compactors.
Abstract: This paper presents a new and comprehensive low-power test scheme compatible with a test compression environment. The key contribution of this paper is a flexible test-application framework that achieves significant reductions in switching activity during all phases of scan test: loading, capture, and unloading. In particular, we introduce a new on-chip continuous-flow decompressor. Its synergistic use with a power-aware scan controller allows a significant reduction of toggling rates when feeding scan chains with decompressed test patterns. While the proposed solution requires minimal modifications of the existing design-for-test logic, experiments indicate that its use results in low switching activity, which reduces power consumption to or below the level of functional mode. It resolves problems related to power dissipation, voltage drop, and increased temperature. Our approach integrates seamlessly with the test logic synthesis flow, and it does not compromise compression ratios. It fits well into various design paradigms, including modular design flows where blocks come with individual decompressors and compactors.
TL;DR: A new layout level automation tool for analog CMOS circuits, namely, analog layout generator (ALG), which is capable of generating individual or matched components as well as placement and routing.
Abstract: In this paper, we present a new layout-level automation tool for analog CMOS circuits, namely, analog layout generator (ALG). ALG is capable of generating individual or matched components as well as performing placement and routing. ALG takes performance considerations into account, optimizing the layout in each step. A distinguishing feature of the tool is that it provides a spectrum of generation possibilities, ranging from full custom to automatic generation. ALG is not only designed to work as a standalone tool but also implemented to be the final step of an analog automation flow. The flow supports circuit-level specifications in addition to layout-level user specifications, so that it can be integrated into an analog automation system. Another feature of ALG is its interaction with a layout adviser tool, namely, YASA. YASA performs sensitivity simulations using a SPICE-like simulator, providing sensitivities of performance parameters with respect to circuit parameters.
TL;DR: A novel low-transition linear feedback shift register (LFSR), based on new observations about the output sequence of a conventional LFSR, is combined with a scan-chain-ordering algorithm to reduce the average and peak power in the test cycle or while scanning out a response to a signature analyzer.
Abstract: This paper presents a novel low-transition linear feedback shift register (LFSR) that is based on some new observations about the output sequence of a conventional LFSR. The proposed design, called bit-swapping LFSR (BS-LFSR), is composed of an LFSR and a 2×1 multiplexer. When used to generate test patterns for scan-based built-in self-tests, it reduces the number of transitions that occur at the scan-chain input during scan shift operation by 50% when compared to those patterns produced by a conventional LFSR. Hence, it reduces the overall switching activity in the circuit under test during test applications. The BS-LFSR is combined with a scan-chain-ordering algorithm that orders the cells in a way that reduces the average and peak power (scan and capture) in the test cycle or while scanning out a response to a signature analyzer. These techniques have a substantial effect on average- and peak-power reductions with negligible effect on fault coverage or test application time. Experimental results on ISCAS'89 benchmark circuits show up to 65% and 55% reductions in average and peak power, respectively.
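The baseline that BS-LFSR improves on can be made concrete. This sketch models a conventional 4-bit Fibonacci LFSR with the primitive polynomial x^4 + x + 1, feeding its output bit into a scan chain, and counts the transitions that the stream causes at the scan input. The bit-swapping multiplexer is not modeled because the abstract does not specify its wiring. The function names and the 4-bit width are hypothetical choices for illustration.

```python
from typing import List


def lfsr_stream(seed: int, length: int) -> List[int]:
    """Output bits of a 4-bit Fibonacci LFSR, recurrence a[k+4] = a[k] XOR a[k+1]."""
    if not 0 < seed < 16:
        raise ValueError("seed must be a nonzero 4-bit state")
    state, out = seed, []
    for _ in range(length):
        out.append(state & 1)                 # bit shifted into the scan chain
        fb = (state ^ (state >> 1)) & 1       # feedback from the two lowest cells
        state = (state >> 1) | (fb << 3)
    return out


def transitions(bits: List[int]) -> int:
    # Every 0->1 or 1->0 change at the scan input ripples through the chain
    # during shifting, contributing to switching power.
    return sum(1 for x, y in zip(bits, bits[1:]) if x != y)


period = lfsr_stream(1, 15)
print(period, transitions(period))
```

An m-sequence changes value at roughly half of its bit boundaries. Halving that count is the 50% reduction at the scan-chain input that the abstract reports for BS-LFSR.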
TL;DR: The presented architecture introduces infrastructure IP cores to overcome key challenges in moving to automotive multicore SoCs: a time-triggered network-on-a-chip with fault isolation for the interconnection of functional IP cores, a diagnostic IP core for error detection and state recovery, a gateway IP core for interfacing legacy systems, and an IP core for reconfiguration.
Abstract: This paper describes an integrated system architecture for automotive electronic systems based on multicore systems-on-chips (SoCs). We integrate functions from different suppliers into a few powerful electronic control units using a dedicated core for each function. This work is fueled by technological opportunities resulting from recent advances in the semiconductor industry and the challenges of providing dependable automotive electronic systems at competitive costs. The presented architecture introduces infrastructure IP cores to overcome key challenges in moving to automotive multicore SoCs: a time-triggered network-on-a-chip with fault isolation for the interconnection of functional IP cores, a diagnostic IP core for error detection and state recovery, a gateway IP core for interfacing legacy systems, and an IP core for reconfiguration. This paper also outlines the migration from today's federated architectures to the proposed integrated architecture using an exemplary automotive E/E system.
TL;DR: Reflective simulation platform (ReSP) exploits the concept of reflection, enabling the integration of SystemC components without source-code modifications and providing full observability of their internal state, enabling complete design space exploration.
Abstract: This paper presents reflective simulation platform (ReSP), a transaction-level multiprocessor simulation platform based on the integration of SystemC and Python. ReSP exploits the concept of reflection, enabling the integration of SystemC components without source-code modifications and providing full observability of their internal state. ReSP offers fine-grained simulation control and supports the evaluation of different hardware/software configurations of a given application, enabling complete design space exploration. ReSP allows the evaluation of real-time applications on high-level hardware models since it provides the transparent emulation of POSIX-compliant real-time operating systems (RTOS) primitives. A number of experiments have been performed to validate ReSP and its capabilities, using a set of single- and multithreaded benchmarks, with both POSIX Threads (PThreads) and OpenMP programming styles. These experiments confirm that reflection introduces negligible (<1%) overhead when comparing ReSP to plain SystemC simulation. The results also show that ReSP can be successfully used to analyze and explore concurrent and reconfigurable applications even at very early development stages. In fact, the average error introduced by ReSP's RTOS emulation is below 6.6 ± 5% w.r.t. the same RTOS running on an instruction set simulator, while simulation speed increases by a factor of ten. Owing to the integration with a scripting language, simulation management is simplified, and experimental setup effort is considerably reduced.
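ReSP applies reflection to C++ SystemC components through its Python integration. The pure-Python sketch below illustrates only the underlying idea: a component's internal state, including "private" fields, is read and modified by name from outside, with no accessor methods added to the component. `TimerModel`, `snapshot`, and `poke` are hypothetical names and are not part of ReSP's API.

```python
import copy


class TimerModel:
    """Hypothetical simulation component; its source is never edited for inspection."""

    def __init__(self):
        self.ticks = 0
        self._pending_irqs = []

    def step(self):
        self.ticks += 1
        if self.ticks % 3 == 0:
            self._pending_irqs.append(self.ticks)


def snapshot(component):
    """Deep-copy every instance attribute, discovered via reflection (vars)."""
    return copy.deepcopy(vars(component))


def poke(component, name, value):
    """Overwrite an existing attribute by name, without a dedicated setter."""
    if name not in vars(component):
        raise AttributeError(f"{type(component).__name__} has no attribute {name!r}")
    setattr(component, name, value)


timer = TimerModel()
for _ in range(3):
    timer.step()
print(snapshot(timer))
```

Deep-copying the snapshot keeps recorded state independent of later simulation steps.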
TL;DR: A probabilistic model is presented which incorporates processing and design parameters and enables quantitative analysis of the impact of metallic CNTs on leakage, noise margin, and delay variations of CNFET-based digital logic circuits and provides design and processing guidelines for very large scale integration (VLSI)-scale metallic-CNT-tolerant digital circuits.
Abstract: Metallic carbon nanotubes (CNTs) pose a major barrier to the design of digital logic circuits using CNT field-effect transistors (CNFETs). Metallic CNTs create source-to-drain shorts in CNFETs, resulting in undesirable effects such as excessive leakage and degraded noise margins. No known CNT growth technique guarantees 0% metallic CNTs. Therefore, special processing techniques are required for removing metallic CNTs after CNT growth. This paper presents a probabilistic model which incorporates processing and design parameters and enables quantitative analysis of the impact of metallic CNTs on leakage, noise margin, and delay variations of CNFET-based digital logic circuits. With practical constraints on these key circuit performance metrics, the model provides design and processing guidelines that are required for very large scale integration (VLSI)-scale metallic-CNT-tolerant digital circuits.
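A much simpler binomial model than the paper's already shows why removal efficiency matters at VLSI scale. The sketch below assumes the following:

- Each CNT is independently metallic with probability `p_metallic`.
- Each metallic CNT is independently removed with probability `p_remove_m`.
- Semiconducting CNTs are never removed.

It ignores the paper's treatment of leakage, noise margin, and delay. All function names and parameter values are illustrative assumptions.

```python
def p_metallic_survives(p_metallic, p_remove_m):
    """Probability that a single CNT is metallic AND escapes the removal step."""
    return p_metallic * (1.0 - p_remove_m)


def p_cnfet_shorted(n_cnts, p_metallic, p_remove_m):
    """Probability that a CNFET with n_cnts CNTs keeps at least one metallic CNT."""
    return 1.0 - (1.0 - p_metallic_survives(p_metallic, p_remove_m)) ** n_cnts


def p_circuit_clean(n_cnfets, n_cnts, p_metallic, p_remove_m):
    """Probability that none of n_cnfets transistors retains a metallic CNT."""
    return (1.0 - p_cnfet_shorted(n_cnts, p_metallic, p_remove_m)) ** n_cnfets


for removal in (0.9, 0.99, 0.9999):
    print(removal, p_circuit_clean(10**6, 4, 1 / 3, removal))
```

Even a small per-device probability compounds over millions of transistors. A chip-level guarantee therefore needs near-perfect removal or a circuit design that tolerates some metallic CNTs.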
TL;DR: A method that is capable of handling process variations to evaluate analog/RF test measurements at the design stage and provides a general framework to compare alternative test solutions that are continuously being proposed toward reducing the high cost of specification-based tests is presented.
Abstract: We present a method that is capable of handling process variations to evaluate analog/RF test measurements at the design stage. The method can readily be used to estimate test metrics, such as parametric test escape and yield loss, with parts-per-million accuracy, and to fix test limits that satisfy specific tradeoffs between test metrics of interest. Furthermore, it provides a general framework to compare alternative test solutions that are continuously being proposed toward reducing the high cost of specification-based tests. The key idea of the method is to build a statistical model of the circuit under test and the test measurements using nonparametric density estimation. Thereafter, the statistical model can be simulated very fast to generate an arbitrarily large volume of new data. The method is demonstrated for a previously proposed built-in self-test measurement for low-noise amplifiers. The result indicates that the new synthetic data have the same structure as the data generated by a computationally intensive brute-force Monte Carlo circuit simulation.
TL;DR: Based on this methodology, a design space exploration framework and an energy model of a sensor node have been developed, and exploration results show that the energy-optimal ECC saves 15-58% node energy for the given parameters.
Abstract: In this paper, we first establish that, in wireless sensor networks, operating over "small" distances, both computation energy and radio energy influence the battery life. In such a scenario, to evaluate the utility of error-correcting codes (ECCs) from an energy perspective, one has to consider the energy consumed in encoding-decoding and transmitting additional "redundant" bits vis-a-vis the energy saved due to coding gain. This paper presents a framework for evaluating various ECCs based on a comprehensive energy model of a sensor node. The framework supports exploration of sensor node design space with application- and deployment-related parameters, like distance, bit error rate, path loss exponent, as well as the modulation scheme and ECC parameters. The exploration results show that, as compared to the uncoded-data transmission, the energy-optimal ECC saves 15%-60% node energy for the given parameters.
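The tradeoff described above can be sketched with a first-order radio model. This is an assumed model, not the paper's comprehensive node model. Radio electronics energy is paid for every transmitted bit, so it scales with n/k. Amplifier energy scales with d^alpha and is reduced by the coding gain. Codec energy is added per information bit. All parameter values are illustrative assumptions.

```python
def energy_per_info_bit(d, n=1, k=1, coding_gain_db=0.0, e_codec=0.0,
                        e_elec=50e-9, e_amp=100e-12, alpha=2.0):
    """First-order energy (J) to send one information bit over distance d (m).

    n, k           : code length and dimension (n = k = 1 means uncoded)
    coding_gain_db : reduction in required Eb/N0 at the target bit error rate
    e_codec        : encode + decode energy per information bit (J)
    e_elec         : radio electronics energy per transmitted bit (J/bit), assumed
    e_amp          : amplifier coefficient (J/bit/m**alpha), assumed
    alpha          : path loss exponent
    """
    electronics = (n / k) * e_elec  # redundant bits still pass through the radio
    amplifier = e_amp * d ** alpha / 10 ** (coding_gain_db / 10)  # lower Eb/N0 target
    return electronics + amplifier + e_codec


for d in (10, 100):
    uncoded = energy_per_info_bit(d)
    coded = energy_per_info_bit(d, n=2, k=1, coding_gain_db=5.0, e_codec=1e-9)
    print(d, uncoded, coded)
```

With these assumed numbers, a rate-1/2 code with a 5 dB gain costs energy at 10 m but saves energy at 100 m. This mirrors the point that at small distances, the computation and redundancy overhead can outweigh the coding gain.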
TL;DR: In this article, the authors present theoretical and practical results concerning the stability of piecewise-linear (PWL) reduced models for the purposes of analog macromodeling.
Abstract: This paper presents theoretical and practical results concerning the stability of piecewise-linear (PWL) reduced models for the purposes of analog macromodeling. Results include proofs of input-output (I/O) stability for PWL approximations to certain classes of nonlinear descriptor systems, along with projection techniques that are guaranteed to preserve I/O stability in reduced-order PWL models. We also derive a new PWL formulation and introduce a new nonlinear projection, allowing us to extend our stability results to a broader class of nonlinear systems described by models containing nonlinear descriptor functions. Lastly, we present algorithms to efficiently compute the required stabilizing nonlinear left-projection matrix operators.
TL;DR: CAFFEINE is a method to automatically generate compact, interpretable symbolic performance models of analog circuits with no prior specification of an equation template; on test problems, its models demonstrate lower prediction error than posynomials, splines, neural networks, kriging, and support vector machines.
Abstract: This paper presents CAFFEINE, a method to automatically generate compact interpretable symbolic performance models of analog circuits with no prior specification of an equation template. CAFFEINE uses SPICE simulation data to model arbitrary nonlinear circuits and circuit characteristics. CAFFEINE expressions are canonical-form functions: product-of-sum layers alternating with sum-of-product layers, as defined by a grammar. Multiobjective genetic programming trades off error with model complexity. On test problems, CAFFEINE models demonstrate lower prediction error than posynomials, splines, neural networks, kriging, and support vector machines. This paper also demonstrates techniques to scale CAFFEINE to larger problems.
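Trading error against model complexity means keeping the set of nondominated models, that is, those for which no other model is at least as good in both objectives and strictly better in one. The sketch below shows only this Pareto filter. It does not cover the genetic programming search or the canonical-form grammar. The model names, error values, and complexity scores are hypothetical.

```python
def pareto_front(models):
    """Keep nondominated (name, error, complexity) entries; both objectives minimized."""
    front = []
    for name, err, cx in models:
        dominated = any(
            e2 <= err and c2 <= cx and (e2 < err or c2 < cx)
            for _, e2, c2 in models
        )
        if not dominated:
            front.append((name, err, cx))
    return front


candidates = [("a", 0.10, 3), ("b", 0.05, 8), ("c", 0.12, 5), ("d", 0.05, 10)]
print(pareto_front(candidates))
```

This quadratic-time check is fine for small populations. Evolutionary multiobjective search usually relies on faster nondominated sorting when populations are large.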
TL;DR: This paper surveys existing performance analysis approaches from real-time systems research and compares them to the established layered software architectures of automotive system design and highlights key challenges for the application of performance analysis in this domain.
Abstract: Software timing aspects have only recently received broad attention in the automotive industry. New design trends and the ongoing work in the AUTOSAR (Automotive Open System Architecture) partnership have significantly increased the industry's awareness of these issues. Now, timing is recognized as a major challenge and has been put explicitly on the agenda of AUTOSAR and other industry-driven research projects. The goals include complementing the existing standard by a timing view and adding methodological steps, if necessary. Clearly, establishing such timing models requires knowing well the implications of modern architectures and topologies. In this paper, we survey existing performance analysis approaches from real-time systems research and compare them to the established layered software architectures of automotive system design. We highlight key challenges for the application of performance analysis in this domain and identify structural as well as behavioral "modeling gaps." While structural gaps can be overcome by model transformations, behavioral gaps require real extensions to known analyses. We discuss two such extensions in detail, namely, the use of hierarchical event models and the specialties of timing analysis for multicore platforms. This paper concludes with an overview of qualitative comparisons of analysis techniques, both technically and concerning their industrial applicability.
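A representative example of the established analyses from real-time systems research is the classic response-time iteration for fixed-priority preemptive scheduling on one processor:

R = C_i + sum over higher-priority tasks j of ceil(R / T_j) * C_j

The sketch below assumes the following:

- Activation is strictly periodic.
- Deadlines equal periods.
- There is no jitter, blocking, or multicore interference.

It does not cover the hierarchical event models or multicore extensions discussed in the paper. The task set is hypothetical.

```python
import math


def response_time(i, tasks):
    """Worst-case response time of task i, or None if it exceeds its period.

    tasks: list of (C, T) = (worst-case execution time, period),
           ordered from highest to lowest priority.
    """
    c_i, t_i = tasks[i]
    higher = tasks[:i]
    r = c_i + sum(c for c, _ in higher)  # start from one job of each higher-priority task
    while r <= t_i:
        # Interference: every higher-priority release inside the window [0, r).
        r_next = c_i + sum(math.ceil(r / t) * c for c, t in higher)
        if r_next == r:
            return r                     # fixed point reached
        r = r_next
    return None                          # implicit deadline (= period) missed


task_set = [(1, 4), (2, 6), (3, 12)]
print([response_time(i, task_set) for i in range(len(task_set))])
```

Roughly speaking, the event-model extensions generalize the ceil(R / T_j) term, which counts activations in a window, to activation patterns that are not strictly periodic.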
TL;DR: MOJITO-R is presented, a tool that performs variation-aware structural synthesis of analog circuits that returns trustworthy topologies by searching across a space of thousands of possible topologies defined by hierarchically organized analog structural building blocks.
Abstract: This paper presents MOJITO-R, a tool that performs variation-aware structural synthesis of analog circuits. It returns trustworthy topologies by searching across a space of thousands of possible topologies defined by hierarchically organized analog structural building blocks. "Structural homotopy" conducts search at several objective-function tightening levels (numbers of process corners) simultaneously. Multiobjective evolutionary search returns sized topologies which trade off power, area, performances, and yield. An experimental validation run returned 78 643 Pareto-optimal designs, having 982 sized topologies with various specification/yield combinations. A decision tree is extracted to visualize the performance-topology relationship.