
Showing papers on "Pipeline (computing)" published in 2011


Journal ArticleDOI
TL;DR: While core min-delay constraints limit the potential benefits of the embedded EDS design, a salient advantage of the TRC design is the ability to detect a wider range of dynamic delay variation, as demonstrated through low supply voltage (VCC) measurements.
Abstract: A 45 nm microprocessor core integrates resilient error-detection and recovery circuits to mitigate the clock frequency (FCLK) guardbands for dynamic parameter variations to improve throughput and energy efficiency. The core supports two distinct error-detection designs, allowing a direct comparison of the relative trade-offs. The first design embeds error-detection sequential (EDS) circuits in critical paths to detect late timing transitions. In addition to reducing the FCLK guardbands for dynamic variations, the embedded EDS design can exploit path-activation rates to operate the microprocessor faster than infrequently-activated critical paths. The second error-detection design offers a less-intrusive approach for dynamic timing-error detection by placing a tunable replica circuit (TRC) per pipeline stage to monitor worst-case delays. Although the TRCs require a delay guardband to ensure the TRC delay is always slower than critical-path delays, the TRC design captures most of the benefits of the embedded EDS design with less implementation overhead. Furthermore, while core min-delay constraints limit the potential benefits of the embedded EDS design, a salient advantage of the TRC design is the ability to detect a wider range of dynamic delay variation, as demonstrated through low supply voltage (VCC) measurements. Both error-detection designs interface with error-recovery techniques, enabling the detection and correction of timing errors from fast-changing variations such as high-frequency VCC droops. The microprocessor core also supports two separate error-recovery techniques to guarantee correct execution even if dynamic variations persist. The first technique requires clock control to replay errant instructions at 1/2 FCLK. In comparison, the second technique is a new multiple-issue instruction replay design that corrects errant instructions with a lower performance penalty and without requiring clock control. Silicon measurements demonstrate that resilient circuits enable a 41% throughput gain at equal energy or a 22% energy reduction at equal throughput, as compared to a conventional design when executing a benchmark program with a 10% VCC droop. In addition, the microprocessor includes a new adaptive clock control circuit that interfaces with the resilient circuits and a phase-locked loop (PLL) to track recovery cycles and adapt to persistent errors by dynamically changing FCLK for maximum efficiency.
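A minimal sketch of the TRC check described above, assuming illustrative delay numbers rather than the paper's silicon data: the replica is tuned slower than any critical path, so when a VCC droop slows the logic, the replica exceeds the clock period first and triggers recovery.

```python
# Illustrative model of a tunable replica circuit (TRC) check; all delay
# values are assumptions, not measurements from the 45 nm core.
def trc_flags_error(critical_path_delay_ns, guardband_ns, clock_period_ns):
    replica_delay = critical_path_delay_ns + guardband_ns  # TRC tuned slower
    return replica_delay > clock_period_ns  # error -> trigger instruction replay

nominal = 0.90                     # ns, critical path at nominal VCC (assumed)
drooped = nominal * 1.15           # ~15% slowdown under a VCC droop (assumed)
for delay in (nominal, drooped):
    print(f"path delay {delay:.3f} ns -> replay: "
          f"{trc_flags_error(delay, guardband_ns=0.05, clock_period_ns=1.0)}")
```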

263 citations


Journal ArticleDOI
22 Feb 2011
TL;DR: This paper applies Razor to a 32-bit ARM processor with a micro-architecture design that has balanced pipeline stages with critical memory access and clock-gating enable paths, and shows potential for parametric yield improvement through energy-efficient operation using Razor.
Abstract: Razor is a hybrid technique for dynamic detection and correction of timing errors. A combination of error detecting circuits and micro-architectural recovery mechanisms creates a system that is robust in the face of timing errors and can be tuned to an efficient operating point by dynamically eliminating unused timing margins. Savings from margin reclamation can be realized as a per-device power-efficiency improvement, or as a parametric yield improvement for a batch of devices. In this paper, we apply Razor to a 32-bit ARM processor with a micro-architecture design that has balanced pipeline stages with critical memory access and clock-gating enable paths. The design is fabricated on a UMC 65 nm process, using industry standard EDA tools, with a worst-case STA signoff of 724 MHz. Based on measurements of 87 samples from split-lots, we obtain 52% power reduction for the overall distribution at 1 GHz operation. We present error-rate driven dynamic voltage and frequency scaling schemes in which runtime adaptation to PVT variations and tolerance of fast transients are demonstrated. All Razor cells are augmented with a sticky error history bit, allowing precise diagnosis of timing errors over the execution of test vectors. We show potential for parametric yield improvement through energy-efficient operation using Razor.
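The error-rate driven DVFS idea lends itself to a short sketch: lower the supply while Razor errors stay rare, and back off when the measured error rate crosses a target. All thresholds and step sizes below are assumptions; the silicon implements this loop in hardware and firmware.

```python
# Toy error-rate driven voltage controller in the spirit of Razor margin
# reclamation (target rate, step size, and voltage rails are assumed values).
def dvfs_step(vdd, error_rate, target=1e-4, v_step=0.005, v_min=0.7, v_max=1.2):
    if error_rate > target:
        return min(v_max, vdd + v_step)   # errors too frequent: add margin back
    return max(v_min, vdd - v_step)       # errors rare: reclaim margin

vdd = 1.2
for measured in (0.0, 0.0, 5e-5, 2e-4, 3e-4, 8e-5):
    vdd = dvfs_step(vdd, measured)
    print(f"measured error rate {measured:.0e} -> vdd = {vdd:.3f} V")
```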

203 citations


Journal ArticleDOI
TL;DR: In this article, a two-stage pipeline ADC architecture with a large first-stage resolution, enabled with the help of a SAR-based sub-ADC, is presented, achieving an ENOB of 10.4b at Nyquist and a figure of merit of 52 fJ/conversion-step.
Abstract: Successive approximation register (SAR) ADC architectures are popular for achieving high energy efficiency, but they suffer from resolution and speed limitations. On the other hand, pipeline ADC architectures can achieve high resolution and speed but have lower energy efficiency and are more complex. We propose a two-stage pipeline ADC architecture with a large first-stage resolution, enabled with the help of a SAR-based sub-ADC. The prototype 12b 50 MS/s ADC achieves an ENOB of 10.4b at Nyquist and a figure of merit of 52 fJ/conversion-step. The ADC achieves low-power, high-resolution, and high-speed operation without calibration. The ADC is fabricated in 65 nm and 90 nm CMOS and occupies a core area of only 0.16 mm².
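The architecture's key step can be mimicked with a behavioral model: a SAR first stage resolves coarse bits, an idealized amplifier scales the residue, and a second stage resolves the rest. The stage resolutions and ideal gain below are assumptions for illustration, not the prototype's actual partitioning.

```python
# Behavioral sketch of a two-stage pipeline ADC with a SAR-based first stage
# (ideal components; a 6b/6b split is assumed for a 12b output).
def sar_quantize(x, bits, full_scale=1.0):
    code, ref = 0, full_scale / 2
    for _ in range(bits):                 # plain successive approximation
        code <<= 1
        if x >= ref:
            code |= 1
            x -= ref
        ref /= 2
    return code, x                        # coarse code and residue

def pipeline_adc(x, stage1_bits=6, stage2_bits=6):
    c1, residue = sar_quantize(x, stage1_bits)
    c2, _ = sar_quantize(residue * 2 ** stage1_bits, stage2_bits)  # ideal gain
    return (c1 << stage2_bits) | c2

print(pipeline_adc(0.3712))               # 12b output code for a 0..1 input
```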

201 citations


Proceedings ArticleDOI
27 Feb 2011
TL;DR: This paper compares the delay and area of a comprehensive set of processor building block circuits when implemented on custom CMOS and FPGA substrates to infer how the microarchitecture of soft processors on FPGAs should be different from hard processors on custom CMOS.
Abstract: As soft processors are increasingly used in diverse applications, there is a need to evolve their microarchitectures in a way that suits the FPGA implementation substrate. This paper compares the delay and area of a comprehensive set of processor building block circuits when implemented on custom CMOS and FPGA substrates. We then use the results of these comparisons to infer how the microarchitecture of soft processors on FPGAs should differ from hard processors on custom CMOS. We find that the ratios of the area required by an FPGA to that of custom CMOS for different building blocks vary significantly more than the speed ratios. As area is often a key design constraint in FPGA circuits, area ratios have the most impact on microarchitecture choices. Complete processor cores have area ratios of 17-27x and delay ratios of 18-26x. Building blocks that have dedicated hardware support on FPGAs, such as SRAMs, adders, and multipliers, are particularly area-efficient (2-7x area ratio), while multiplexers and CAMs are particularly area-inefficient (>100x area ratio), leading to cheaper ALUs, larger caches of low associativity, and more expensive bypass networks than on similar hard processors. We also find that a low delay ratio for pipeline latches (12-19x) suggests soft processors should have pipeline depths 20% greater than hard processors of similar complexity.

133 citations


Proceedings ArticleDOI
04 Jun 2011
TL;DR: From this idea, a toolset is developed, called FabScalar, for automatically composing the synthesizable register-transfer-level (RTL) designs of arbitrary cores within a canonical superscalar template, which defines canonical pipeline stages and interfaces among them.
Abstract: A growing body of work has compiled a strong case for the single-ISA heterogeneous multi-core paradigm. A single-ISA heterogeneous multi-core provides multiple, differently-designed superscalar core types that can streamline the execution of diverse programs and program phases. No prior research has addressed the 'Achilles' heel of this paradigm: design and verification effort is multiplied by the number of different core types. This work frames superscalar processors in a canonical form, so that it becomes feasible to quickly design many cores that differ in the three major superscalar dimensions: superscalar width, pipeline depth, and sizes of structures for extracting instruction-level parallelism (ILP). From this idea, we develop a toolset, called FabScalar, for automatically composing the synthesizable register-transfer-level (RTL) designs of arbitrary cores within a canonical superscalar template. The template defines canonical pipeline stages and interfaces among them. A Canonical Pipeline Stage Library (CPSL) provides many implementations of each canonical pipeline stage that differ in their superscalar width and depth of sub-pipelining. An RTL generation tool uses the template and CPSL to automatically generate an overall core of desired configuration. Validation experiments are performed along three fronts to evaluate the quality of RTL designs generated by FabScalar: functional and performance (instructions-per-cycle (IPC)) validation, timing validation (cycle time), and confirmation of suitability for standard ASIC flows. With FabScalar, a chip with many different superscalar core types is conceivable.
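The template-plus-library workflow can be caricatured in a few lines. Everything below is hypothetical (stage names, file names); FabScalar itself emits synthesizable Verilog RTL rather than Python objects.

```python
# Toy stand-in for the canonical template + Canonical Pipeline Stage Library:
# pick one implementation of each canonical stage matching (width, depth).
CPSL = {
    "fetch":  {(2, 1): "fetch_w2_d1.v",  (4, 2): "fetch_w4_d2.v"},
    "rename": {(2, 1): "rename_w2_d1.v", (4, 2): "rename_w4_d2.v"},
    "issue":  {(2, 1): "issue_w2_d1.v",  (4, 2): "issue_w4_d2.v"},
}

def compose_core(width, depth):
    try:
        return [CPSL[stage][(width, depth)] for stage in CPSL]
    except KeyError:
        raise ValueError(f"no CPSL entry for width={width}, depth={depth}")

print(compose_core(4, 2))   # one core configuration drawn from the library
```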

128 citations


Journal ArticleDOI
TL;DR: In this article, the complex problem of strength verification of a buried steel pipeline crossing the trace of a normal active fault is treated analytically, and a refined methodology for the calculation of the axial and bending pipeline strains is presented.

115 citations


Journal ArticleDOI
Jiangwen Wan, Yang Yu, Yinfeng Wu, Renjian Feng, Ning Yu
27 Dec 2011-Sensors
TL;DR: Experimental results illustrate that this hierarchical pipeline leak detection and localization method could effectively improve the accuracy of the leak point localization and reduce the undetected rate as well as false alarm rate.
Abstract: In light of the problems of low recognition efficiency, high false rates and poor localization accuracy in traditional pipeline security detection technology, this paper proposes a hierarchical leak detection and localization method for use in natural gas pipeline monitoring sensor networks. In the signal preprocessing phase, original monitoring signals are dealt with by wavelet transform technology to extract the single mode signals as well as characteristic parameters. In the initial recognition phase, a multi-classifier model based on SVM is constructed and characteristic parameters are sent as input vectors to the multi-classifier for initial recognition. In the final decision phase, an improved evidence combination rule is designed to integrate initial recognition results for final decisions. Furthermore, a weighted average localization algorithm based on time difference of arrival is introduced for determining the leak point's position. Experimental results illustrate that this hierarchical pipeline leak detection and localization method could effectively improve the accuracy of the leak point localization and reduce the undetected rate as well as false alarm rate.
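The localization step rests on a simple relation: a leak's pressure wave reaches the upstream sensor after x/v and the downstream sensor after (L - x)/v, so the arrival-time difference dt gives x = (L + v·dt)/2. Below is a sketch of a weighted average over several such estimates, with all numbers assumed.

```python
# Weighted-average TDOA leak localization on one pipe segment (toy data).
def locate_leak(L, v, dts, weights):
    estimates = [(L + v * dt) / 2 for dt in dts]     # per-event position estimates
    return sum(w * x for w, x in zip(weights, estimates)) / sum(weights)

L, v = 10_000.0, 340.0         # pipe length (m) and wave speed (m/s), assumed
dts = [-8.82, -8.75, -8.90]    # measured arrival-time differences (s), assumed
print(locate_leak(L, v, dts, weights=[1.0, 0.8, 0.6]))   # ~3500 m from sensor A
```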

111 citations


Journal ArticleDOI
04 Jun 2011
TL;DR: This work creates the first-ever provably information-flow secure processor with micro-architectural features including pipelining and cache, and describes a new hardware description language, Caisson, that combines domain-specific abstractions common to hardware design with insights from type-based techniques used in secure programming languages.
Abstract: Information flow is an important security property that must be incorporated from the ground up, including at hardware design time, to provide a formal basis for a system's root of trust. We incorporate insights and techniques from designing information-flow secure programming languages to provide a new perspective on designing secure hardware. We describe a new hardware description language, Caisson, that combines domain-specific abstractions common to hardware design with insights from type-based techniques used in secure programming languages. The proper combination of these elements allows for an expressive, provably-secure HDL that operates at a familiar level of abstraction to the target audience of the language, hardware architects. We have implemented a compiler for Caisson that translates designs into Verilog and then synthesizes the designs using existing tools. As an example of Caisson's usefulness, we have addressed an open problem in secure hardware by creating the first-ever provably information-flow secure processor with micro-architectural features including pipelining and cache. We synthesize the secure processor and empirically compare it in terms of chip area, power consumption, and clock frequency with both a standard (insecure) commercial processor and also a processor augmented at the gate level to dynamically track information flow. Our processor is competitive with the insecure processor and significantly better than dynamic tracking.
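Stripped to its essence, the discipline such a type system enforces is a lattice check on security labels: information may only flow upward. The toy check below is a drastic simplification of Caisson's actual type system, which operates over HDL state machines.

```python
# Minimal information-flow check over a two-point lattice (L below H).
LATTICE = {"L": 0, "H": 1}

def flows_to(src, dst):
    return LATTICE[src] <= LATTICE[dst]

def check_assignment(dst_label, src_labels):
    # An assignment is secure only if every source may flow to the destination.
    if not all(flows_to(s, dst_label) for s in src_labels):
        raise TypeError(f"insecure flow: {src_labels} -> {dst_label}")

check_assignment("H", ["L", "H"])      # fine: low and high may flow into high
try:
    check_assignment("L", ["H"])       # rejected: high must not leak into low
except TypeError as err:
    print(err)
```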

105 citations


DOI
18 Mar 2011
TL;DR: This paper presents Patmos, a processor optimized for low WCET bounds rather than high average-case performance: a dual-issue, statically scheduled RISC processor that relies on a customized compiler.
Abstract: Current processors are optimized for average-case performance, often leading to a high worst-case execution time (WCET). Many architectural features that increase average-case performance are hard to model for WCET analysis. In this paper we present Patmos, a processor optimized for low WCET bounds rather than high average-case performance. Patmos is a dual-issue, statically scheduled RISC processor. The instruction cache is organized as a method cache and the data cache is organized as a split cache in order to simplify the cache WCET analysis. To fill the dual-issue pipeline with enough useful instructions, Patmos relies on a customized compiler. The compiler also plays a central role in optimizing the application for the WCET instead of average-case performance.

105 citations


Proceedings ArticleDOI
03 Dec 2011
TL;DR: This approach identifies recurring instruction sequences as phases of “temporal regularity” in a program's execution, and maps suitable ones to the BERET hardware, a three-stage pipeline with a bundled execution model that demonstrates significant savings in instruction fetch, decode, and register file access energy.
Abstract: Technology scaling has delivered on its promises of increasing device density on a single chip. However, the voltage scaling trend has failed to keep up, introducing tight power constraints on manufactured parts. In such a scenario, there is a need to incorporate energy-efficient processing resources that can enable more computation within the same power budget. Energy efficiency solutions in the past have typically relied on application specific hardware and accelerators. Unfortunately, these approaches do not extend to general purpose applications due to their irregular and diverse code base. Towards this end, we propose BERET, an energy-efficient co-processor that can be configured to benefit a wide range of applications. Our approach identifies recurring instruction sequences as phases of "temporal regularity" in a program's execution, and maps suitable ones to the BERET hardware, a three-stage pipeline with a bundled execution model. This judicious off-loading of program execution to reduced-complexity hardware demonstrates significant savings in instruction fetch, decode, and register file access energy. On average, BERET reduces energy consumption by a factor of 3-4X for the program regions selected across a range of general-purpose and media applications. The average energy savings for the entire application run was 35% over a single-issue in-order processor.
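A sketch of the profiling idea behind the approach: scan a dynamic instruction trace for fixed-length sequences that recur, and report the hot ones as candidates for off-loading. The window length and threshold are assumptions; BERET's actual selection is more elaborate.

```python
# Find recurring instruction sequences ("temporal regularity") in a trace.
from collections import Counter

def hot_sequences(trace, window=3, min_count=2):
    counts = Counter(tuple(trace[i:i + window])
                     for i in range(len(trace) - window + 1))
    return [(seq, n) for seq, n in counts.most_common() if n >= min_count]

trace = ["ld", "add", "st", "ld", "add", "st", "br", "ld", "add", "st"]
for seq, n in hot_sequences(trace):
    print(n, "x", seq)    # ('ld', 'add', 'st') recurs: a mapping candidate
```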

104 citations


Journal ArticleDOI
TL;DR: The accuracy in source-level reconstruction by the proposed pipeline is confirmed by an improved specificity in the retrieval of RSNs from experimental data, and the sensitivity of the ICA results to the decomposition algorithm is assessed.
Abstract: To study functional connectivity using magnetoencephalographic (MEG) data, the high-quality source-level reconstruction of brain activity constitutes a critical element. MEG resting-state networks (RSNs) have been documented by means of a dedicated processing pipeline: MEG recordings are decomposed by independent component analysis (ICA) into artifact and brain components (ICs); next, the channel maps associated with the latter ones are projected into the source space and the resulting voxel-wise weights are used to linearly combine the IC time courses. An extensive description of the proposed pipeline is provided here, along with an assessment of its performance with respect to alternative approaches. The following investigations were carried out: (1) ICA decomposition algorithm. Synthetic data are used to assess the sensitivity of the ICA results to the decomposition algorithm, by testing FastICA, INFOMAX, and SOBI. FastICA with the deflation approach, a standard solution, provides the best decomposition. (2) Recombination of brain ICs versus subtraction of artifactual ICs (at the channel level). Both the recombination of the brain ICs in the sensor space and the classical procedure of subtracting the artifactual ICs from the recordings provide a suitable reconstruction, with a lower distortion using the latter approach. (3) Recombination of brain ICs after localization versus localization of artifact-corrected recordings. The brain IC recombination after source localization, as implemented in the proposed pipeline, provides a lower source-level signal distortion. (4) Detection of RSNs. The accuracy in source-level reconstruction by the proposed pipeline is confirmed by an improved specificity in the retrieval of RSNs from experimental data.
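As a hands-on companion to point (1), here is a minimal decomposition sketch with scikit-learn's FastICA using the deflation approach the authors found best. The data are synthetic two-source mixtures; a real MEG pipeline adds filtering, artifact classification, and the source-space projection described above.

```python
# FastICA (deflation) on synthetic mixtures standing in for MEG channels.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 2000)
sources = np.c_[np.sin(7 * t), np.sign(np.sin(3 * t))]  # "brain" + "artifact"
mixing = rng.normal(size=(2, 2))
recordings = sources @ mixing.T                          # channel-level data

ica = FastICA(n_components=2, algorithm="deflation", random_state=0)
ic_time_courses = ica.fit_transform(recordings)          # estimated IC signals
print(ic_time_courses.shape, ica.mixing_.shape)          # (2000, 2) (2, 2)
```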

Journal ArticleDOI
TL;DR: A new message passing scheme named tile-based BP that reduces the memory and bandwidth to a fraction of the ordinary BP algorithms without performance degradation by splitting the MRF into many tiles and only storing the messages across the neighboring tiles is proposed.
Abstract: Loopy belief propagation (BP) is an effective solution for assigning labels to the nodes of a graphical model such as the Markov random field (MRF), but it requires high memory, bandwidth, and computational costs. Furthermore, the iterative, pixel-wise, and sequential operations of BP make it difficult to parallelize the computation. In this paper, we propose two techniques to address these issues. The first technique is a new message passing scheme named tile-based BP that reduces the memory and bandwidth to a fraction of the ordinary BP algorithms without performance degradation by splitting the MRF into many tiles and only storing the messages across the neighboring tiles. The tile-wise processing also enables data reuse and pipelining, resulting in efficient hardware implementation. The second technique is an O(L) fast message construction algorithm that exploits the properties of robust functions for parallelization. We apply these two techniques to a very large-scale integration circuit for stereo matching that generates high-resolution disparity maps in near real-time. We also implement the proposed schemes on a graphics processing unit (GPU), where they run four times faster than standard BP on the GPU.
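To convey the flavor of linear-time message construction, here is the well-known forward/backward pass for a truncated linear smoothness cost (the distance-transform trick); the paper's O(L) algorithm exploits robust-function properties in the same spirit but is not identical to this sketch.

```python
# O(L) BP message for cost min(slope * |l - k|, truncation) over L labels.
def construct_message(h, slope, truncation):
    m = list(h)                              # h[l]: data cost + incoming messages
    for l in range(1, len(m)):               # forward pass
        m[l] = min(m[l], m[l - 1] + slope)
    for l in range(len(m) - 2, -1, -1):      # backward pass
        m[l] = min(m[l], m[l + 1] + slope)
    cap = min(h) + truncation                # truncation caps the lower envelope
    return [min(v, cap) for v in m]

print(construct_message([9.0, 4.0, 7.0, 1.0, 8.0], slope=1.0, truncation=3.0))
```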

Patent
22 Sep 2011
TL;DR: In this article, a heat dissipation device is mounted around a large-sized electronic appliance to absorb and dissipate the heat generated by the appliance so as to effectively reduce the working temperature of the electronic appliance to a normal level.
Abstract: A heat dissipation device has an inlet pipeline, an outlet pipeline, at least one heat-dissipating unit and at least one actuator. Each of the at least one heat-dissipating unit is connected to the inlet pipeline and the outlet pipeline and includes multiple heat-dissipating elements being connected to each other. The at least one actuator is mounted on the inlet pipeline. The heat dissipation device is mounted around a large-sized electronic appliance to absorb and dissipate the heat generated by the large-sized electronic appliance so as to effectively reduce the working temperature of the electronic appliance to a normal level. Therefore, the large-sized electronic appliance can work safely and is power-saving and environmentally friendly.

Journal ArticleDOI
TL;DR: This paper presents a simple and efficient multiplier that can achieve arbitrary accuracy through an iterative procedure before reaching the exact result.

Journal ArticleDOI
TL;DR: This paper presents a real-time processing platform for high-definition stereo video capable of processing stereo video streams at resolutions up to 1920 × 1080 at 30 frames per second and shows how the corresponding algorithms can be implemented very efficiently in programmable hardware, relieving the GPU from the burden of these tasks.
Abstract: This paper presents a real-time processing platform for high-definition stereo video. The system is capable of processing stereo video streams at resolutions up to 1920 × 1080 at 30 frames per second (1080p30). In the hybrid FPGA-GPU-CPU system, a high-density FPGA is used not only to perform the low-level image processing tasks such as color interpolation and cross-image color correction, but also to carry out radial undistortion, image rectification, and disparity estimation. We show how the corresponding algorithms can be implemented very efficiently in programmable hardware, relieving the GPU from the burden of these tasks. Our FPGA implementation results are compared with corresponding GPU implementations and with other implementations reported in the literature.
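For context on the disparity-estimation stage, here is a tiny block-matching sketch using the sum of absolute differences: the regular, window-parallel structure of this kernel is what makes it map well to programmable hardware. Parameters are assumptions; the real system performs rectification first and uses a far more refined matcher.

```python
# Naive SAD block matching on a synthetic stereo pair (expected disparity 3).
import numpy as np

def disparity_sad(left, right, max_disp=16, win=5):
    h, w = left.shape
    r = win // 2
    disp = np.zeros((h, w), dtype=np.uint8)
    for y in range(r, h - r):
        for x in range(r + max_disp, w - r):
            patch = left[y - r:y + r + 1, x - r:x + r + 1].astype(np.int32)
            costs = [np.abs(patch - right[y - r:y + r + 1,
                                          x - d - r:x - d + r + 1]
                            .astype(np.int32)).sum()
                     for d in range(max_disp)]
            disp[y, x] = int(np.argmin(costs))
    return disp

left = np.random.randint(0, 255, (32, 64), dtype=np.uint8)
right = np.roll(left, -3, axis=1)     # synthetic horizontal shift of 3 pixels
print(np.bincount(disparity_sad(left, right).ravel()).argmax())   # prints 3
```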

Journal ArticleDOI
01 Feb 2011
TL;DR: To eliminate the read-only memories used to store the twiddle factors, the proposed architecture applies a reconfigurable complex multiplier and bit-parallel multipliers to achieve a ROM-less FFT/IFFT processor, thus consuming less power than existing designs.
Abstract: 4G and other wireless systems are currently hot topics of research and development in the communication field. Broadband wireless systems based on orthogonal frequency division multiplexing (OFDM) often require an inverse fast Fourier transform (IFFT) to produce multiple subcarriers. In this paper, we present the efficient implementation of a pipeline FFT/IFFT processor for OFDM applications. Our design adopts a single-path delay feedback style as the proposed hardware architecture. To eliminate the read-only memories (ROMs) used to store the twiddle factors, the proposed architecture applies a reconfigurable complex multiplier and bit-parallel multipliers to achieve a ROM-less FFT/IFFT processor, thus consuming less power than existing designs. The design spends about 33.6K gates, and its power consumption is about 9.8 mW at 20 MHz.
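The ROM-less idea can be demonstrated in software: generate each twiddle factor by the recurrence W^(k+1) = W^k · W instead of reading a table. A direct DFT is used below for brevity; the chip realizes the equivalent trick with a reconfigurable complex multiplier inside a single-path delay feedback FFT pipeline.

```python
# Direct DFT whose twiddle factors come from a multiplicative recurrence
# rather than a stored table (illustrative only; O(n^2), not a pipelined FFT).
import cmath

def dft_romless(x):
    n, out = len(x), []
    for k in range(n):
        w_step = cmath.exp(-2j * cmath.pi * k / n)
        w, acc = 1.0 + 0j, 0j
        for sample in x:
            acc += sample * w
            w *= w_step                  # recurrence replaces the twiddle ROM
        out.append(acc)
    return out

print([round(abs(v), 3) for v in dft_romless([1, 0, -1, 0])])  # [0.0, 2.0, 0.0, 2.0]
```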

Journal ArticleDOI
TL;DR: A mathematical proof shows the existence of a linear transformation that converts LFSR circuits into equivalent state space formulations; the transformation achieves a full speed-up compared to the serial architecture at the cost of an increase in hardware overhead.
Abstract: Linear feedback shift register (LFSR) is an important component of the cyclic redundancy check (CRC) operations and BCH encoders. The contribution of this paper is two fold. First, this paper presents a mathematical proof of existence of a linear transformation to transform LFSR circuits into equivalent state space formulations. This transformation achieves a full speed-up compared to the serial architecture at the cost of an increase in hardware overhead. This method applies to all generator polynomials used in CRC operations and BCH encoders. Second, a new formulation is proposed to modify the LFSR into the form of an infinite impulse response (IIR) filter. We propose a novel high speed parallel LFSR architecture based on parallel IIR filter design, pipelining and retiming algorithms. The advantage of the proposed approach over the previous architectures is that it has both feedforward and feedback paths. We further propose to apply combined parallel and pipelining techniques to eliminate the fanout effect in long generator polynomials. The proposed scheme can be applied to any generator polynomial, i.e., any LFSR in general. The proposed parallel architecture achieves better area-time product compared to the previous designs.
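The state-space view is easy to demonstrate numerically: one serial LFSR shift is s' = A·s over GF(2), so M shifts collapse into a single multiplication by A^M, which is the algebraic basis for unfolding an LFSR into a parallel architecture. The feedback taps below are an arbitrary example, not a CRC standard.

```python
# State-space LFSR: verify that eight serial shifts equal one multiply by A^8.
import numpy as np

def companion(taps):                       # taps: feedback row over GF(2)
    n = len(taps)
    A = np.zeros((n, n), dtype=np.uint8)
    A[0, :] = taps                         # feedback into the first register
    A[1:, :-1] = np.eye(n - 1, dtype=np.uint8)   # plain shift for the rest
    return A

def gf2_matmul(A, B):
    return (A.astype(np.uint16) @ B.astype(np.uint16)) % 2

A = companion([1, 0, 1, 1])                # example feedback taps (assumed)
s = np.array([[1], [0], [0], [0]], dtype=np.uint8)

A8 = A
for _ in range(7):                         # A^8 by repeated GF(2) multiplication
    A8 = gf2_matmul(A8, A)
serial = s
for _ in range(8):                         # eight serial shifts
    serial = gf2_matmul(A, serial)
print(np.array_equal(serial, gf2_matmul(A8, s)))   # True
```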

Journal ArticleDOI
TL;DR: In this paper, an approach was presented to analyze the dynamic response of a Timoshenko pipeline conveying fluid under random excitation, considering the fluid-structure interaction and the effect of shear deformation.

Proceedings ArticleDOI
07 Apr 2011
TL;DR: A two-way time-interleaved (TI) switched-current 1 GS/s 12b pipelined ADC in SiGe BiCMOS that addresses two critical design challenges: the process limits the sampling rate, and the pipeline architecture limits power efficiency.
Abstract: Pipelined ADCs designed in analog BiCMOS technologies can offer good linearity and high SNR performance for input signals with reasonable voltage swings. Such ADCs, however, face two critical design challenges: the process limits the sampling rate, and the pipeline architecture limits power efficiency. This paper introduces a two-way time-interleaved (TI) switched-current 1 GS/s 12b pipelined ADC in SiGe BiCMOS that addresses these issues.

Journal ArticleDOI
TL;DR: A fast on-line tracker and a dynamic scheduler are introduced; the GPU tracker significantly outperforms the CPU version for large events while entirely maintaining its efficiency.
Abstract: The on-line event reconstruction in ALICE is performed by the High Level Trigger, which should process up to 2000 events per second in proton-proton collisions and up to 300 central events per second in heavy-ion collisions, corresponding to an input data stream of 30 GB/s. In order to fulfill the time requirements, a fast on-line tracker has been developed. The algorithm combines a Cellular Automaton method being used for a fast pattern recognition and the Kalman Filter method for fitting of found trajectories and for the final track selection. The tracker was adapted to run on Graphics Processing Units (GPU) using the NVIDIA Compute Unified Device Architecture (CUDA) framework. The implementation of the algorithm had to be adjusted at many points to allow for an efficient usage of the graphics cards. In particular, achieving a good overall workload for many processor cores, efficient transfer to and from the GPU, as well as optimized utilization of the different memories the GPU offers turned out to be critical. To cope with these problems a dynamic scheduler was introduced, which redistributes the workload among the processor cores. Additionally a pipeline was implemented so that the tracking on the GPU, the initialization and the output processed by the CPU, as well as the DMA transfer can overlap. The GPU tracking algorithm significantly outperforms the CPU version for large events while it entirely maintains its efficiency.
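The transfer/compute overlap described above can be modeled with a bounded queue between two threads; the timings and stage names below are stand-ins, not the High Level Trigger's actual pipeline.

```python
# Toy two-stage pipeline: event transfer overlaps with "GPU" tracking.
import queue
import threading
import time

def transfer(events, q):
    for e in events:
        time.sleep(0.01)                  # stand-in for DMA transfer
        q.put(e)
    q.put(None)                           # sentinel: no more events

def compute(q):
    while (e := q.get()) is not None:
        time.sleep(0.02)                  # stand-in for GPU tracking
        print("tracked event", e)

q = queue.Queue(maxsize=4)                # bounded buffer between the stages
t = threading.Thread(target=transfer, args=(range(5), q))
c = threading.Thread(target=compute, args=(q,))
t.start(); c.start(); t.join(); c.join()
```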

Journal ArticleDOI
TL;DR: A particle re-initialization scheme is also presented to further improve the execution performance of the PSO, and results demonstrate that the proposed HW/SW co-design approach to realizing PSO is capable of achieving high-quality solutions effectively.

22 Mar 2011
TL;DR: This paper shows how nonlinear observers can be used as tools for the monitoring of pipelines, presenting two observer approaches for two different applications: a one-leak detection and isolation problem on the one hand, and the same problem with additional friction estimation on the other hand.
Abstract: This article shows how nonlinear observers can be used as tools for the monitoring of pipelines. In particular, two observer approaches for two different applications are presented: a one-leak detection and isolation problem on the one hand, and the same problem with friction estimation in addition on the other hand. In the first case, the system which represents the pipeline with a leak satisfies a uniform observability condition allowing for the design of a classical high gain observer (with a static Lyapunov equation). In the second case, the system is no longer uniformly observable, but still satisfies the observability rank condition, and an Extended Kalman Filter is proposed, under the use of exciting inputs. In both cases, experimental results are provided.
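For the first application, the classical high-gain observer has a compact canonical form; the sketch below uses a two-state chain with toy dynamics f and gain theta (both assumptions, far simpler than the article's pipeline model).

```python
# High-gain observer for x1' = x2, x2' = f(x), y = x1 (toy simulation).
def simulate(theta=10.0, dt=1e-3, steps=5000):
    f = lambda x1, x2: -0.5 * x2 - x1           # assumed dynamics
    x1, x2 = 1.0, 0.0                           # true state
    z1, z2 = 0.0, 0.0                           # observer state
    for _ in range(steps):
        y = x1                                  # measured output
        dz1 = z2 + 2 * theta * (y - z1)         # correction gains scale with
        dz2 = f(z1, z2) + theta**2 * (y - z1)   # theta and theta squared
        x1, x2 = x1 + dt * x2, x2 + dt * f(x1, x2)
        z1, z2 = z1 + dt * dz1, z2 + dt * dz2
    return abs(x1 - z1), abs(x2 - z2)

print(simulate())   # estimation errors, driven toward zero by the high gain
```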

Patent
19 Oct 2011
TL;DR: In this article, a microprocessor includes a hardware instruction translator that translates x86 ISA and ARM ISA machine language program instructions into microinstructions, which are encoded in a distinct manner from the x86 and ARM instructions.
Abstract: A microprocessor includes a hardware instruction translator that translates x86 ISA and ARM ISA machine language program instructions into microinstructions, which are encoded in a distinct manner from the x86 and ARM instructions. An execution pipeline executes the microinstructions to generate results defined by the x86 ISA and ARM ISA instructions. The translator directly provides the microinstructions to the execution pipeline for execution. Each time the microprocessor performs one of the x86 ISA and ARM ISA instructions, the translator translates it into the microinstructions. An indicator indicates either x86 or ARM as a boot ISA. After reset, the microprocessor initializes its architectural state, fetches its first instructions from a reset address, and translates them all as defined by the boot ISA. An instruction cache caches the x86 and ARM instructions and provides them to the translator.

Journal ArticleDOI
TL;DR: A novel framework is presented that can efficiently evaluate approximate Boolean set operations for B-rep models with highly parallel algorithms, taking axis-aligned surfels of Layered Depth Images (LDI) as a bridge and performing the Boolean operations on the structured points.
Abstract: We present a novel framework which can efficiently evaluate approximate Boolean set operations for B-rep models by highly parallel algorithms. This is achieved by taking axis-aligned surfels of Layered Depth Images (LDI) as a bridge and performing Boolean operations on the structured points. As compared with prior surfel-based approaches, this paper makes several improvements. Firstly, we adopt key-data pairs to store the LDI more compactly. Secondly, robust depth peeling is investigated to overcome the bottleneck of layer complexity. Thirdly, an out-of-core tiling technique is presented to overcome the limitation of memory. Real-time feedback is provided by streaming the proposed pipeline on many-core graphics hardware.
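Reduced to a single ray, an LDI Boolean becomes interval arithmetic on sorted (entry, exit) depth pairs, which is exactly why the operation parallelizes per ray; the sketch below shows intersection only, a simplification of the framework's full GPU pipeline.

```python
# Per-ray Boolean intersection of two solids given as sorted depth intervals.
def intersect_ray(a, b):
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            out.append((lo, hi))          # overlapping span is inside both solids
        if a[i][1] < b[j][1]:             # advance whichever interval ends first
            i += 1
        else:
            j += 1
    return out

solid_a = [(0.0, 2.0), (5.0, 8.0)]        # entry/exit depths along one ray
solid_b = [(1.0, 6.0)]
print(intersect_ray(solid_a, solid_b))    # [(1.0, 2.0), (5.0, 6.0)]
```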

Journal ArticleDOI
TL;DR: The proposed matrix multiplication with systolic architecture doubles the speed of the conventional method.
Abstract: The evolution of computers and the Internet has brought demand for powerful, high-speed data processing, but few methods provide a perfect solution in such a complex environment. To handle this issue, parallel computing is proposed as a solution, and this paper addresses the demand for high-speed data processing. The paper demonstrates an effective design for matrix multiplication using a systolic architecture on Reconfigurable Systems (RS) such as Field Programmable Gate Arrays (FPGAs). The systolic architecture increases computing speed by combining parallel processing and pipelining. The RTL code for matrix multiplication with and without the systolic architecture is written in Verilog HDL, compiled and simulated using ModelSim XE III 6.4b, synthesized using Xilinx ISE 9.2i, and targeted to the device xc3s500e-5-ft256; finally, the designs are compared to each other to evaluate the performance of the proposed architecture. The proposed matrix multiplication with systolic architecture doubles the speed of the conventional method.
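A cycle-by-cycle software model of an output-stationary systolic array makes the schedule visible: operands enter skewed by one cycle per row and column, and each processing element performs one multiply-accumulate per cycle. This is a simulation sketch, not the paper's Verilog design.

```python
# Simulate an N x N output-stationary systolic array computing C = A @ B.
import numpy as np

def systolic_matmul(A, B):
    n = A.shape[0]
    C = np.zeros((n, n), dtype=A.dtype)
    for cycle in range(3 * n - 2):        # enough cycles to drain the array
        for i in range(n):
            for j in range(n):
                k = cycle - i - j         # skewed injection schedule
                if 0 <= k < n:
                    C[i, j] += A[i, k] * B[k, j]   # PE(i, j) MAC this cycle
    return C

A = np.arange(9).reshape(3, 3)
B = np.arange(9, 18).reshape(3, 3)
print(np.array_equal(systolic_matmul(A, B), A @ B))   # True
```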

Journal ArticleDOI
TL;DR: The computational results demonstrate that the model is able to define new operational points for the pipeline, providing significant cost savings, and that the CLP-MILP model is an efficient tool to aid operational decision-making within this real-world pipeline scenario.
Abstract: This paper addresses the problem of developing an optimization model to aid the operational scheduling in a real-world pipeline scenario. The pipeline connects a refinery to a harbor, conveying different types of commodities (gasoline, diesel, kerosene, etc.). An optimization model was developed to determine pipeline scheduling with improved efficiency. This model combines constraint logic programming (CLP) and mixed integer linear programming (MILP) in a CLP-MILP approach. The proposed model uses decomposition strategies, continuous time representation, intervals that indicate time constraints (time windows), and a series of operational issues, such as the seasonal and hourly cost of electric energy (on-peak demand hours). Real cases were solved in a matter of seconds. The computational results have demonstrated that the model is able to define new operational points for the pipeline, providing significant cost savings. Indeed, the CLP-MILP model is an efficient tool to aid operational decision-making within this real-world pipeline scenario.
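A toy version of the electric-energy piece of such a model fits in a small linear program with PuLP; all data below are assumptions, and the paper's CLP-MILP model additionally handles commodity batches, sequencing, and time windows.

```python
# Tariff-aware pumping schedule: meet daily volume at minimum energy cost.
from pulp import LpMinimize, LpProblem, LpVariable, lpSum

hours = range(24)
tariff = [0.08 if h < 7 or h >= 22 else 0.20 for h in hours]  # $/kWh, assumed
MAX_RATE, DEMAND = 500.0, 4000.0      # pump limit (m3/h) and daily volume, assumed

prob = LpProblem("pipeline_schedule", LpMinimize)
flow = {h: LpVariable(f"flow_{h}", lowBound=0, upBound=MAX_RATE) for h in hours}
prob += lpSum(tariff[h] * flow[h] for h in hours)     # energy cost objective
prob += lpSum(flow[h] for h in hours) == DEMAND       # meet daily demand
prob.solve()
print([h for h in hours if flow[h].value() > 0])      # pumping lands off-peak
```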

Journal ArticleDOI
TL;DR: This paper presents the concept of micro observation versus macro observation and shows that the most effective method of using SBST is through a multiple input signature register connected to the processor local bus, while conventional methods that observe only the program results in the memory lead to significantly less processor fault coverage.
Abstract: This paper presents an effective hybrid test program for the software-based self-testing (SBST) of pipeline processor cores. The test program combines a deterministically developed program which explores different levels of processor core information and a block-based random program which consists of a combination of in-order instructions, random-order instructions, return instructions, as well as instruction sequences used to trigger exception/interrupt requests. Due to the complementary nature of this hybrid test program, it can achieve processor fault coverage that is comparable to the performance of the conventional scan chain method. The test response observation methods and their impacts on fault coverage are also investigated. We present the concept of micro observation versus macro observation and show that the most effective method of using SBST is through a multiple input signature register connected to the processor local bus, while conventional methods that observe only the program results in the memory lead to significantly less processor fault coverage.
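The signature-register idea at the core of the observation method is easy to sketch: each cycle the register shifts with LFSR feedback and XORs in one parallel word of test responses, compacting the whole stream into a short signature that is compared against a golden value. The tap mask and response words below are examples.

```python
# Multiple input signature register (MISR) compaction of test responses.
def misr(responses, width=8, taps=0b10111000):
    state = 0
    for word in responses:
        fb = bin(state & taps).count("1") & 1        # parity of tapped bits
        state = ((state << 1) | fb) & ((1 << width) - 1)
        state ^= word & ((1 << width) - 1)           # fold in one response word
    return state

good = [0x3A, 0x11, 0xF0, 0x07]
print(hex(misr(good)))                       # golden signature
print(hex(misr([0x3A, 0x10, 0xF0, 0x07])))   # single-bit fault: signature differs
```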

Journal ArticleDOI
TL;DR: It is clear that this field is on the brink of significant improvements to the SOC that should follow the highly anticipated approval of the first direct antiviral agents, telaprevir (Vertex/Johnson & Johnson/Mitsubishi) and boceprevir (Merck), in 2011.
Abstract: Over 3% of the world’s population is chronically infected with hepatitis C virus (HCV). Chronic hepatitis C (CHC) is a major cause of liver damage, cirrhosis and liver cancer that can lead to liver failure and death. The current standard of care (SOC) is a combination of pegylated interferon with ribavirin (PEG-IFN/RBV), but this only eradicates the virus in approximately 50% of patients [1]. The recently concluded sixty-first annual meeting of the American Association for the Study of Liver Diseases (AASLD) provided a broad overview of the pipeline of novel drugs for the treatment of CHC [2]. It is clear that this field is on the brink of significant improvements to the SOC that should follow the highly anticipated approval of the first direct antiviral agents (DAAs), telaprevir (Vertex/Johnson & Johnson/Mitsubishi) and boceprevir (Merck), in 2011.


Proceedings ArticleDOI
07 Apr 2011
TL;DR: Several architectural and circuit techniques used to achieve this performance are presented, which include a dynamically driven deep N-well input sampling switch, an offset-cancelled comparator, and a back-gate voltage-biased MDAC amplifier.
Abstract: The high channel count of many modern communication systems increasingly requires high-performance ADCs that consume very little power. The 16b pipeline ADC described here achieves 77.6dBFS SNR, 77.6dBFS SNDR and 95dBc SFDR at 80MS/s with a 10MHz input. With a 200MHz input, the ADC achieves 71.0dBFS SNR, 69.4dBFS SNDR and 81dBc SFDR. The complete ADC including reference, clock, and digital circuitry consumes 100mW from a 1.8V supply. This compares favorably with recently reported ADCs in this performance class [1–3]. In this paper, several architectural and circuit techniques used to achieve this performance are presented. The techniques include a dynamically driven deep N-well input sampling switch, an offset-cancelled comparator, and a back-gate voltage-biased MDAC amplifier. The ADC is fabricated in a 1P5M 0.18μm CMOS process with deep N-well (DNW) isolation.