
Showing papers by "V. Kamakoti" published in 2015


Proceedings ArticleDOI
22 Nov 2015
TL;DR: SHAKTI-F, a RISC-V based SEE-tolerant microprocessor architecture that addresses the reliability issues posed by soft and hard errors in deeply scaled CMOS, is presented; interestingly, introducing fault tolerance yields a 45% reduction in power consumption.
Abstract: Deeply scaled CMOS circuits are vulnerable to soft and hard errors. These errors pose reliability concerns, especially for systems used in radiation-prone environments such as space and nuclear applications. This paper presents SHAKTI-F, a RISC-V based SEE-tolerant microprocessor architecture that provides a solution to these reliability issues. The proposed architecture uses error correcting codes (ECC) to tolerate errors in registers and memories, while it employs a combination of space- and time-redundancy based techniques to tolerate errors in the ALU. Two novel re-computation techniques for detecting errors in the addition/subtraction and multiplication modules are proposed. The scheme also identifies the parts of the circuitry that need to be radiation hardened, thus providing total protection against SEEs. The proposed scheme provides fine-grain error detection that helps localize an error to a specific functional unit and isolate it, rather than the entire processor or a large module within the processor. This gives the processor a graceful degradation and/or fail-safe shutdown capability. The HDL model of the processor was validated by simulating it with randomly induced SEEs. The proposed scheme adds a penalty of only 20% on core area and 25% on performance compared with conventional systems, which is considerably lower than the penalty incurred by schemes such as double and triple modular redundancy. Interestingly, there is a 45% reduction in power consumption due to the introduction of fault tolerance. The resulting system runs at 330 MHz on a 55nm technology node, which is sufficient for the class of applications these cores are used for.
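The abstract does not spell out the two re-computation schemes. As a generic, hedged illustration of time-redundancy-based error detection in an adder (recomputation with shifted operands, a classical technique that may differ from the paper's), the Python sketch below computes a sum twice, once on shifted operands, and flags a mismatch; the 2-bit shift, the fault-injection hook and all function names are invented for this example.

```python
# Generic time-redundancy error detection for an adder, in the spirit of
# recomputing with shifted operands. This is NOT necessarily the paper's
# re-computation scheme (the abstract does not describe it); the 2-bit shift,
# the fault-injection hook and the function names are assumptions.

def alu_add(a, b, flip_bit=None):
    """Model of the hardware adder; flip_bit emulates a single-event effect
    that corrupts one bit slice of the result (for simulation only)."""
    s = a + b
    if flip_bit is not None:
        s ^= 1 << flip_bit
    return s

def checked_add(a, b, flip_bit=None):
    """Compute a + b twice: directly, and on operands shifted left by two.
    In hardware the second pass exercises different bit slices of the same
    adder, so a faulty slice perturbs the two results differently."""
    direct = alu_add(a, b, flip_bit)
    recomputed = alu_add(a << 2, b << 2, flip_bit) >> 2
    if direct != recomputed:
        # Fine-grain detection: the error is localized to the adder, so only
        # this functional unit needs to be isolated or retried.
        raise RuntimeError("SEE detected in adder; isolating functional unit")
    return direct

if __name__ == "__main__":
    print(checked_add(10, 20))            # fault-free: prints 30
    try:
        checked_add(10, 20, flip_bit=3)   # injected fault: detected
    except RuntimeError as err:
        print(err)
```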

31 citations


Journal ArticleDOI
TL;DR: A new design-for-testability (DFT) scheme for launch-on-shift (LOS) testing is proposed, which ensures that the combinational logic remains undisturbed between the interleaved capture phases, providing computer-aided-design (CAD) tools with extra search space for minimizing launch-to-capture switching activity through test pattern ordering (TPO).
Abstract: Scan-based testing is crucial to ensuring correct functioning of chips. In this scheme, the scan and capture phases are interleaved. It is well known that for large designs, excessive switching activity during the launch-to-capture window leads to high voltage droop on the power grid, ultimately resulting in false delay failures during at-speed test. This article proposes a new design-for-testability (DFT) scheme for launch-on-shift (LOS) testing, which ensures that the combinational logic remains undisturbed between the interleaved capture phases, providing computer-aided-design (CAD) tools with extra search space for minimizing launch-to-capture switching activity through test pattern ordering (TPO). We further propose a new TPO algorithm that keeps track of the don't cares during the ordering process, so that the don't care filling step after ordering yields a better reduction in launch-to-capture switching activity than any other technique in the literature. The proposed DFT-assisted technique, when applied to circuits in the ITC99 benchmark suite, produces an average reduction of 17.68% in peak launch-to-capture switching activity (CSA) compared to the best known low-power TPO technique. Even for circuits whose test cubes are not rich in don't care bits, the proposed technique produces an average reduction of 15% in peak CSA, while for circuits with test cubes rich in don't care bits (≥75%), the average reduction is 24%. The proposed technique also reduces the average power dissipation during the scan phase (considering both scan cells and combinational logic) by about 43.5% on average, compared to the adjacent-fill technique.
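As a hedged sketch of what don't-care-aware test pattern ordering can look like (a greedy nearest-neighbour heuristic, not the algorithm proposed in the article), the Python fragment below orders test cubes so that consecutive patterns disagree in as few specified bits as possible, treating 'X' as compatible with either value; the cube strings and function names are illustrative only.

```python
# Minimal, hypothetical sketch of don't-care-aware test pattern ordering (TPO):
# greedily chain patterns so that consecutive patterns differ in as few
# specified bits as possible, counting an 'X' (don't care) as compatible with
# either value. A simplified illustration of the idea, not the paper's method.

def mismatch(p: str, q: str) -> int:
    """Number of bit positions where both patterns are specified and differ."""
    return sum(a != b for a, b in zip(p, q) if a != 'X' and b != 'X')

def order_patterns(patterns: list[str]) -> list[str]:
    """Greedy nearest-neighbour ordering of test cubes."""
    remaining = patterns[:]
    ordered = [remaining.pop(0)]
    while remaining:
        best = min(remaining, key=lambda q: mismatch(ordered[-1], q))
        remaining.remove(best)
        ordered.append(best)
    return ordered

if __name__ == "__main__":
    cubes = ["1X0X", "0011", "1100", "X0X1"]
    print(order_patterns(cubes))   # ['1X0X', '1100', 'X0X1', '0011']
```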

11 citations


Proceedings ArticleDOI
01 Jan 2015
TL;DR: This paper maps the problem of optimal X-filling for peak power minimization during the LOS scheme to a variant of the interval coloring problem and proposes a dynamic programming (DP) algorithm for it, along with a theoretical proof of its optimality.
Abstract: At-speed testing is crucial to catch small delay defects that occur during the manufacture of high-performance digital chips. Launch-Off-Capture (LOC) and Launch-Off-Shift (LOS) are the two prevalently used schemes for this purpose. The LOS scheme achieves higher fault coverage while consuming less test time than the LOC scheme, but dissipates higher power during the capture phase of the at-speed test. Excessive IR-drop on the power grid during the capture phase causes false delay failures, leading to significant and unwarranted yield reduction. As reported in the literature, intelligent filling of don't care bits (X-filling) in test cubes yields significant power reduction. Given that the tests output by automatic test pattern generation (ATPG) tools for large circuits have a large number of don't care bits, X-filling is very effective for them. Assuming that the design-for-testability (DFT) scheme preserves the state of the combinational logic between capture phases of successive patterns, this paper maps the problem of optimal X-filling for peak power minimization during the LOS scheme to a variant of the interval coloring problem and proposes a dynamic programming (DP) algorithm for it, along with a theoretical proof of its optimality. To the best of our knowledge, this is the first reported X-filling algorithm that is provably optimal. The proposed algorithm, when evaluated on ITC99 benchmarks, produced peak power savings of up to 34% over the best known low-power X-filling algorithm for LOS testing. Interestingly, it is observed that the power savings increase with the size of the circuit.
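For context, the sketch below shows adjacent fill, a common baseline X-filling heuristic from the low-power testing literature in which each don't care copies the nearest specified bit; it is not the paper's optimal DP formulation over the interval coloring variant, and the example cube is invented.

```python
# X-filling illustration: "adjacent fill", a common baseline heuristic. Each
# don't care bit copies the value of the nearest specified bit to its left
# (leading X's fall back to the nearest specified bit on the right), which
# keeps transitions inside the scan chain low. The paper's optimal
# dynamic-programming fill for peak capture power is not reproduced here.

def adjacent_fill(cube: str) -> str:
    bits = list(cube)
    last = None
    for i, b in enumerate(bits):            # forward pass: copy from the left
        if b == 'X':
            if last is not None:
                bits[i] = last
        else:
            last = b
    nxt = None
    for i in range(len(bits) - 1, -1, -1):  # backward pass for leading X's
        if bits[i] == 'X':
            if nxt is not None:
                bits[i] = nxt
        else:
            nxt = bits[i]
    return ''.join(bits)

if __name__ == "__main__":
    print(adjacent_fill("XX1X0XX1"))   # -> "11100011"
```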

4 citations


Journal ArticleDOI
TL;DR: This article proposes a proactive workload-aware temperature management framework for low-power chip multi-processors (ProWATCh), which includes a novel compiler design for estimating the architectural parameters of a task at compile time and a model-based technique for dynamic estimation of architectural parameters at runtime.
Abstract: With the increase in process variations and diversity in workloads, it is imperative to holistically explore optimization techniques for power and temperature from the circuit layer right up to the compiler/operating system (OS) layer. This article proposes one such holistic technique, a proactive workload-aware temperature management framework for low-power chip multi-processors (ProWATCh). At the compiler level, ProWATCh includes two techniques: (1) a novel compiler design for estimating the architectural parameters of a task at compile time; and (2) a model-based technique for dynamic estimation of architectural parameters at runtime. At the OS level, ProWATCh integrates two techniques: (1) a workload- and temperature-aware process manager for dynamic distribution of tasks to different cores; and (2) a model predictive control-based task scheduler for generating an efficient sequence of task execution. At the circuit level, ProWATCh implements either of two techniques: (1) a workload-aware voltage manager for dynamic supply and body bias voltage assignment at a given frequency in processors that support adaptive body bias (ABB); or (2) a workload-aware frequency governor for efficient assignment of upper and lower frequency bounds for frequency scaling in processors that do not support ABB. Employing ProWATCh (with the voltage manager) on an ABB-compatible 3D OpenSPARC architecture using MiBench benchmarks resulted in an average 18% (19°C) reduction in peak temperature. Evaluating ProWATCh with the frequency governor alone on an existing quad-core Intel Core i7 processor (which does not expose an ABB interface) resulted in a 10% (8°C) reduction in peak temperature compared to the native Linux 3.0 Completely Fair Scheduler (CFS). To study the effectiveness of the proposed framework across benchmark suites, ProWATCh was also evaluated on a quad-core Intel Core i7 processor using SPEC CPU2006 benchmarks, which resulted in a 7°C reduction in peak temperature compared to the native Linux 3.0 CFS.
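As a toy, hedged illustration of the process-manager idea alone (not the ProWATCh framework, which also spans compile-time parameter estimation, MPC-based scheduling and voltage/frequency management), the sketch below pairs the most thermally demanding pending task with the coolest core; the temperatures, per-task heat estimates and names are hypothetical.

```python
# Hypothetical workload- and temperature-aware assignment policy: the hottest
# pending task goes to the coolest core, one task per core. All numbers and
# names are invented; this is not the ProWATCh process manager itself.

def assign_tasks(core_temps: dict[int, float],
                 task_heat: dict[str, float]) -> dict[str, int]:
    """Greedy pairing: hottest task -> coolest core."""
    cores = sorted(core_temps, key=core_temps.get)               # coolest first
    tasks = sorted(task_heat, key=task_heat.get, reverse=True)   # hottest first
    return {task: core for task, core in zip(tasks, cores)}

if __name__ == "__main__":
    temps = {0: 62.0, 1: 55.5, 2: 70.1, 3: 58.3}   # core temperatures in C (made up)
    heat = {"fft": 0.9, "idle_poll": 0.1, "jpeg": 0.6, "sha": 0.4}
    print(assign_tasks(temps, heat))
    # {'fft': 1, 'jpeg': 3, 'sha': 0, 'idle_poll': 2}
```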

4 citations


Proceedings ArticleDOI
01 Jan 2015
TL;DR: A modeling and optimization framework to systematically improve the FIT (failure-in-time) rate of a design with minimal impact on power, performance and area is proposed.
Abstract: With the increasing adoption of newer technologies and architectures in automotive and aviation electronics, driven by the objective of improving performance and/or reducing power and area, soft-error robustness is becoming an important issue in ensuring reliable operation over an extended lifetime and a wide range of operating conditions. In this paper, we propose a modeling and optimization framework to systematically improve the FIT (failure-in-time) rate of a design with minimal impact on power, performance and area. We first propose a framework to model and evaluate the relative vulnerability to soft errors of the standard master-slave flip-flops and Dual Interlocked Storage Cells (DICE) in the cell library. We then formulate a linear optimization problem using this information to selectively replace the flip-flops so as to improve the FIT rate of the design with minimal impact on area and power. Employing the proposed technique on a popular industrial IP core shows a 32% relative improvement in design robustness with just a 2% increase in design area.
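The paper formulates the cell-replacement step as a linear optimization; as a hedged stand-in, the sketch below solves a simplified version of the same selection problem, maximizing total FIT reduction under an area budget, as a 0/1 knapsack by dynamic programming. The per-cell gains, area costs and integer area units are invented.

```python
# Hypothetical sketch of the selection step: choose which flip-flops to replace
# with hardened DICE cells so that the total FIT reduction is maximised while
# the added area stays within a budget. Shown as a small 0/1 knapsack solved by
# dynamic programming, not the paper's actual linear-optimization formulation.

def select_flops(fit_gain, area_cost, area_budget):
    """fit_gain[i], area_cost[i]: benefit/overhead of hardening flip-flop i.
    Returns (best total FIT gain, set of selected flip-flop indices)."""
    n = len(fit_gain)
    # best[a] = (gain, chosen set) achievable within area a
    best = [(0.0, frozenset())] * (area_budget + 1)
    for i in range(n):
        new = best[:]
        for a in range(area_cost[i], area_budget + 1):
            g, chosen = best[a - area_cost[i]]
            if g + fit_gain[i] > new[a][0]:
                new[a] = (g + fit_gain[i], chosen | {i})
        best = new
    return best[area_budget]

if __name__ == "__main__":
    fit_gain = [5.0, 3.0, 4.0, 2.0]   # FIT reduction per hardened flop (made up)
    area_cost = [4, 2, 3, 1]          # extra area units per hardened flop (made up)
    print(select_flops(fit_gain, area_cost, area_budget=6))
    # -> (9.0, frozenset({1, 2, 3})) : harden flops 1, 2 and 3
```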

3 citations


Proceedings ArticleDOI
24 Sep 2015
TL;DR: This paper proposes a novel method to efficiently compute the TVF and AVF parameters followed by a linear programming technique that uses these parameters to reduce the soft-error rate of the given design.
Abstract: Advancements in the semiconductor manufacturing process have reduced device dimensions, which in turn have reduced the design and manufacturing costs of integrated circuits (ICs). This has accelerated IC penetration into automobiles, health care and safety-critical systems. However, the smaller device dimensions have made ICs vulnerable to soft errors. The sequential cells in a given design contribute significantly to its soft-error rate (SER). Some soft errors get masked and do not cause any adverse impact; this masking can occur due to logic or timing reasons. This paper presents a flow that uses the Timing Vulnerability Factor (TVF) and Architecture Vulnerability Factor (AVF) of the sequential instances in a given design to reduce its SER. The paper proposes a novel method to efficiently compute the TVF and AVF parameters, followed by a linear programming technique that uses these parameters to reduce the SER of the given design. Using the proposed technique, we reduced the sequential-cell contribution to the SER of an in-house IP design by 36% for a 9% increase in sequential-cell area.
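A minimal, hedged sketch of the derating idea underlying such a flow is shown below: each sequential cell's raw FIT is weighted by its TVF and AVF before hardening decisions are made. The cell names and numbers are invented, and the paper's efficient TVF/AVF computation and linear programming step are not reproduced.

```python
# Weight each sequential cell's raw soft-error rate by its timing vulnerability
# factor (TVF) and vulnerability factor (AVF) to obtain its effective SER
# contribution. All values below are made up for illustration.

def derated_ser(cells):
    """cells: list of (name, raw_fit, tvf, avf). Returns per-cell and total SER."""
    per_cell = {name: raw * tvf * avf for name, raw, tvf, avf in cells}
    return per_cell, sum(per_cell.values())

if __name__ == "__main__":
    cells = [
        ("ctrl_state_ff", 1.0e-3, 0.60, 0.90),  # rarely masked: good hardening candidate
        ("dbg_shadow_ff", 1.0e-3, 0.60, 0.05),  # architecturally masked most of the time
        ("pipe_stage_ff", 1.0e-3, 0.25, 0.80),  # often timing-masked
    ]
    per_cell, total = derated_ser(cells)
    for name, ser in sorted(per_cell.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {ser:.2e} FIT")
    print(f"design (sequential) SER: {total:.2e} FIT")
```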

3 citations


Journal ArticleDOI
01 Feb 2015
TL;DR: An evolutionary algorithm aided by a particle swarm optimization methodology is presented to generate synthetic benchmark circuits (SBCs) for ALT of FPGAs; the generated SBC is demonstrated to be more suitable for ALT than a hand-crafted one, measured in terms of meeting the multiple criteria.
Highlights: Generating circuits for accelerated life testing of field programmable gate arrays; the problem involves multi-variable optimization; a genetic algorithm is proposed for this purpose; it is shown empirically that particle swarm optimization can enhance the quality of results yielded by the GA; and the GA aided by PSO not only reduces the time required (from months to hours) but can also yield better results than the hand-crafted SBC.
Abstract: Accelerated life testing (ALT) of a field programmable gate array (FPGA) requires it to be configured with a circuit that satisfies multiple criteria. Hand-crafting such a circuit is a herculean task, as many components of the criteria are orthogonal to each other, demanding a complex multivariate optimization. This paper presents an evolutionary algorithm aided by a particle swarm optimization methodology to generate synthetic benchmark circuits (SBCs) that can be used for ALT of FPGAs. The proposed algorithm was used to generate an SBC for ALT of a commercial FPGA. The generated SBC, when compared with a hand-crafted one, was demonstrated to be more suitable for ALT, measured in terms of meeting the multiple criteria. The SBC generated by the proposed technique utilizes 8.37% more resources, operates at a maximum frequency that is 40% higher, and has 7.75% higher switching activity than the hand-crafted one reported in the literature. The hand-crafted circuit is very specific to a particular device of that family of FPGAs, whereas the proposed algorithm is device-independent. In addition, it took several man-months to hand-craft the SBC, whereas the proposed algorithm took less than half a day.
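As a bare-bones, hedged illustration of the evolutionary search loop implied by the abstract, a minimal genetic algorithm skeleton might look like the sketch below; the circuit encoding, fitness terms and weights are placeholders, and the way PSO assists the GA in the actual work is not described in the abstract and is therefore not modelled.

```python
# Minimal GA skeleton for evolving a synthetic benchmark circuit encoding.
# The bit-string genome and the fitness terms are placeholders; in the real
# problem the fitness would come from synthesis/P&R reports (resource use,
# Fmax, switching activity), and PSO assistance is omitted.

import random

GENOME_LEN = 64          # hypothetical bit-string encoding of an SBC netlist

def fitness(genome):
    """Placeholder multi-criteria score combining two toy objectives."""
    utilisation = sum(genome) / GENOME_LEN
    toggles = sum(a != b for a, b in zip(genome, genome[1:])) / (GENOME_LEN - 1)
    return 0.5 * utilisation + 0.5 * toggles

def evolve(pop_size=20, generations=50, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(GENOME_LEN)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]                       # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, GENOME_LEN)
            child = a[:cut] + b[cut:]                        # one-point crossover
            child = [bit ^ (rng.random() < mutation_rate) for bit in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

if __name__ == "__main__":
    best = evolve()
    print(f"best fitness: {fitness(best):.3f}")
```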

1 citation