
Showing papers on "Pipeline (computing)" published in 2011


Journal ArticleDOI
TL;DR: While core min-delay constraints limit the potential benefits of the embedded EDS design, a salient advantage of the TRC design is the ability to detect a wider range of dynamic delay variation, as demonstrated through low supply voltage (VCC) measurements.
Abstract: A 45 nm microprocessor core integrates resilient error-detection and recovery circuits to mitigate the clock frequency (FCLK) guardbands for dynamic parameter variations to improve throughput and energy efficiency. The core supports two distinct error-detection designs, allowing a direct comparison of the relative trade-offs. The first design embeds error-detection sequential (EDS) circuits in critical paths to detect late timing transitions. In addition to reducing the FCLK guardbands for dynamic variations, the embedded EDS design can exploit path-activation rates to operate the microprocessor faster than infrequently-activated critical paths. The second error-detection design offers a less-intrusive approach for dynamic timing-error detection by placing a tunable replica circuit (TRC) per pipeline stage to monitor worst-case delays. Although the TRCs require a delay guardband to ensure the TRC delay is always slower than critical-path delays, the TRC design captures most of the benefits of the embedded EDS design with less implementation overhead. Furthermore, while core min-delay constraints limit the potential benefits of the embedded EDS design, a salient advantage of the TRC design is the ability to detect a wider range of dynamic delay variation, as demonstrated through low supply voltage (VCC) measurements. Both error-detection designs interface with error-recovery techniques, enabling the detection and correction of timing errors from fast-changing variations such as high-frequency VCC droops. The microprocessor core also supports two separate error-recovery techniques to guarantee correct execution even if dynamic variations persist. The first technique requires clock control to replay errant instructions at 1/2 FCLK. In comparison, the second technique is a new multiple-issue instruction replay design that corrects errant instructions with a lower performance penalty and without requiring clock control. Silicon measurements demonstrate that resilient circuits enable a 41% throughput gain at equal energy or a 22% energy reduction at equal throughput, as compared to a conventional design when executing a benchmark program with a 10% VCC droop. In addition, the microprocessor includes a new adaptive clock control circuit that interfaces with the resilient circuits and a phase-locked loop (PLL) to track recovery cycles and adapt to persistent errors by dynamically changing FCLK for maximum efficiency.
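A minimal sketch of the TRC check described above, assuming illustrative delay numbers rather than the paper's silicon data: the replica is tuned slower than any critical path, so when a VCC droop slows the logic, the replica exceeds the clock period first and triggers recovery.

```python
# Illustrative model of a tunable replica circuit (TRC) check; all delay
# values are assumptions, not measurements from the 45 nm core.
def trc_flags_error(critical_path_delay_ns, guardband_ns, clock_period_ns):
    replica_delay = critical_path_delay_ns + guardband_ns  # TRC tuned slower
    return replica_delay > clock_period_ns  # error -> trigger instruction replay

nominal = 0.90                     # ns, critical path at nominal VCC (assumed)
drooped = nominal * 1.15           # ~15% slowdown under a VCC droop (assumed)
for delay in (nominal, drooped):
    print(f"path delay {delay:.3f} ns -> replay: "
          f"{trc_flags_error(delay, guardband_ns=0.05, clock_period_ns=1.0)}")
```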

263 citations


Journal ArticleDOI
22 Feb 2011
TL;DR: This paper applies Razor to a 32-bit ARM processor with a micro-architecture design that has balanced pipeline stages with critical memory access and clock-gating enable paths, and shows potential for parametric yield improvement through energy-efficient operation using Razor.
Abstract: Razor is a hybrid technique for dynamic detection and correction of timing errors. A combination of error detecting circuits and micro-architectural recovery mechanisms creates a system that is robust in the face of timing errors and can be tuned to an efficient operating point by dynamically eliminating unused timing margins. Savings from margin reclamation can be realized as a per-device power-efficiency improvement, or as a parametric yield improvement for a batch of devices. In this paper, we apply Razor to a 32-bit ARM processor with a micro-architecture design that has balanced pipeline stages with critical memory access and clock-gating enable paths. The design is fabricated on a UMC 65 nm process, using industry standard EDA tools, with a worst-case STA signoff of 724 MHz. Based on measurements of 87 samples from split-lots, we obtain 52% power reduction for the overall distribution at 1 GHz operation. We present error-rate driven dynamic voltage and frequency scaling schemes in which runtime adaptation to PVT variations and tolerance of fast transients are demonstrated. All Razor cells are augmented with a sticky error history bit, allowing precise diagnosis of timing errors over the execution of test vectors. We show potential for parametric yield improvement through energy-efficient operation using Razor.
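The error-rate driven DVFS idea lends itself to a short sketch: lower the supply while Razor errors stay rare, and back off when the measured error rate crosses a target. All thresholds and step sizes below are assumptions; the silicon implements this loop in hardware and firmware.

```python
# Toy error-rate driven voltage controller in the spirit of Razor margin
# reclamation (target rate, step size, and voltage rails are assumed values).
def dvfs_step(vdd, error_rate, target=1e-4, v_step=0.005, v_min=0.7, v_max=1.2):
    if error_rate > target:
        return min(v_max, vdd + v_step)   # errors too frequent: add margin back
    return max(v_min, vdd - v_step)       # errors rare: reclaim margin

vdd = 1.2
for measured in (0.0, 0.0, 5e-5, 2e-4, 3e-4, 8e-5):
    vdd = dvfs_step(vdd, measured)
    print(f"measured error rate {measured:.0e} -> vdd = {vdd:.3f} V")
```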

203 citations


Journal ArticleDOI
TL;DR: In this article, a two-stage pipeline ADC architecture with a large first-stage resolution, enabled with the help of a SAR-based sub-ADC, is presented, achieving an ENOB of 10.4b at Nyquist and a figure of merit of 52 fJ/conversion-step.
Abstract: Successive approximation register (SAR) ADC architectures are popular for achieving high energy efficiency, but they suffer from resolution and speed limitations. On the other hand, pipeline ADC architectures can achieve high resolution and speed but have lower energy efficiency and are more complex. We propose a two-stage pipeline ADC architecture with a large first-stage resolution, enabled with the help of a SAR-based sub-ADC. The prototype 12b 50 MS/s ADC achieves an ENOB of 10.4b at Nyquist and a figure of merit of 52 fJ/conversion-step. The ADC achieves low-power, high-resolution, and high-speed operation without calibration. The ADC is fabricated in 65 nm and 90 nm CMOS and occupies a core area of only 0.16 mm².
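The architecture's key step can be mimicked with a behavioral model: a SAR first stage resolves coarse bits, an idealized amplifier scales the residue, and a second stage resolves the rest. The stage resolutions and ideal gain below are assumptions for illustration, not the prototype's actual partitioning.

```python
# Behavioral sketch of a two-stage pipeline ADC with a SAR-based first stage
# (ideal components; a 6b/6b split is assumed for a 12b output).
def sar_quantize(x, bits, full_scale=1.0):
    code, ref = 0, full_scale / 2
    for _ in range(bits):                 # plain successive approximation
        code <<= 1
        if x >= ref:
            code |= 1
            x -= ref
        ref /= 2
    return code, x                        # coarse code and residue

def pipeline_adc(x, stage1_bits=6, stage2_bits=6):
    c1, residue = sar_quantize(x, stage1_bits)
    c2, _ = sar_quantize(residue * 2 ** stage1_bits, stage2_bits)  # ideal gain
    return (c1 << stage2_bits) | c2

print(pipeline_adc(0.3712))               # 12b output code for a 0..1 input
```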

201 citations


Proceedings ArticleDOI
27 Feb 2011
TL;DR: This paper compares the delay and area of a comprehensive set of processor building block circuits when implemented on custom CMOS and FPGA substrates to infer how the microarchitecture of soft processors on FPGAs should be different from hard processors on custom CMOS.
Abstract: As soft processors are increasingly used in diverse applications, there is a need to evolve their microarchitectures in a way that suits the FPGA implementation substrate. This paper compares the delay and area of a comprehensive set of processor building block circuits when implemented on custom CMOS and FPGA substrates. We then use the results of these comparisons to infer how the microarchitecture of soft processors on FPGAs should differ from hard processors on custom CMOS. We find that the ratios of the area required by an FPGA to that of custom CMOS for different building blocks vary significantly more than the speed ratios. As area is often a key design constraint in FPGA circuits, area ratios have the most impact on microarchitecture choices. Complete processor cores have area ratios of 17-27x and delay ratios of 18-26x. Building blocks that have dedicated hardware support on FPGAs, such as SRAMs, adders, and multipliers, are particularly area-efficient (2-7x area ratio), while multiplexers and CAMs are particularly area-inefficient (>100x area ratio), leading to cheaper ALUs, larger caches of low associativity, and more expensive bypass networks than on similar hard processors. We also find that a low delay ratio for pipeline latches (12-19x) suggests soft processors should have pipeline depths 20% greater than hard processors of similar complexity.

133 citations


Proceedings ArticleDOI
04 Jun 2011
TL;DR: From this idea, a toolset is developed, called FabScalar, for automatically composing the synthesizable register-transfer-level (RTL) designs of arbitrary cores within a canonical superscalar template, which defines canonical pipeline stages and interfaces among them.
Abstract: A growing body of work has compiled a strong case for the single-ISA heterogeneous multi-core paradigm. A single-ISA heterogeneous multi-core provides multiple, differently-designed superscalar core types that can streamline the execution of diverse programs and program phases. No prior research has addressed the 'Achilles' heel of this paradigm: design and verification effort is multiplied by the number of different core types. This work frames superscalar processors in a canonical form, so that it becomes feasible to quickly design many cores that differ in the three major superscalar dimensions: superscalar width, pipeline depth, and sizes of structures for extracting instruction-level parallelism (ILP). From this idea, we develop a toolset, called FabScalar, for automatically composing the synthesizable register-transfer-level (RTL) designs of arbitrary cores within a canonical superscalar template. The template defines canonical pipeline stages and interfaces among them. A Canonical Pipeline Stage Library (CPSL) provides many implementations of each canonical pipeline stage that differ in their superscalar width and depth of sub-pipelining. An RTL generation tool uses the template and CPSL to automatically generate an overall core of desired configuration. Validation experiments are performed along three fronts to evaluate the quality of RTL designs generated by FabScalar: functional and performance (instructions-per-cycle (IPC)) validation, timing validation (cycle time), and confirmation of suitability for standard ASIC flows. With FabScalar, a chip with many different superscalar core types is conceivable.
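The template-plus-library workflow can be caricatured in a few lines. Everything below is hypothetical (stage names, file names); FabScalar itself emits synthesizable Verilog RTL rather than Python objects.

```python
# Toy stand-in for the canonical template + Canonical Pipeline Stage Library:
# pick one implementation of each canonical stage matching (width, depth).
CPSL = {
    "fetch":  {(2, 1): "fetch_w2_d1.v",  (4, 2): "fetch_w4_d2.v"},
    "rename": {(2, 1): "rename_w2_d1.v", (4, 2): "rename_w4_d2.v"},
    "issue":  {(2, 1): "issue_w2_d1.v",  (4, 2): "issue_w4_d2.v"},
}

def compose_core(width, depth):
    try:
        return [CPSL[stage][(width, depth)] for stage in CPSL]
    except KeyError:
        raise ValueError(f"no CPSL entry for width={width}, depth={depth}")

print(compose_core(4, 2))   # one core configuration drawn from the library
```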

128 citations


Journal ArticleDOI
TL;DR: In this article, the complex problem of strength verification of a buried steel pipeline crossing the trace of a normal active fault is treated analytically, and a refined methodology for the calculation of the axial and bending pipeline strains is presented.

115 citations


Journal ArticleDOI
Jiangwen Wan, Yang Yu, Yinfeng Wu, Renjian Feng, Ning Yu
27 Dec 2011-Sensors
TL;DR: Experimental results illustrate that this hierarchical pipeline leak detection and localization method could effectively improve the accuracy of the leak point localization and reduce the undetected rate as well as false alarm rate.
Abstract: In light of the problems of low recognition efficiency, high false rates and poor localization accuracy in traditional pipeline security detection technology, this paper proposes a hierarchical leak detection and localization method for use in natural gas pipeline monitoring sensor networks. In the signal preprocessing phase, original monitoring signals are dealt with by wavelet transform technology to extract the single mode signals as well as characteristic parameters. In the initial recognition phase, a multi-classifier model based on SVM is constructed and characteristic parameters are sent as input vectors to the multi-classifier for initial recognition. In the final decision phase, an improved evidence combination rule is designed to integrate initial recognition results for final decisions. Furthermore, a weighted average localization algorithm based on time difference of arrival is introduced for determining the leak point's position. Experimental results illustrate that this hierarchical pipeline leak detection and localization method could effectively improve the accuracy of the leak point localization and reduce the undetected rate as well as false alarm rate.
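The localization step rests on a simple relation: a leak's pressure wave reaches the upstream sensor after x/v and the downstream sensor after (L - x)/v, so the arrival-time difference dt gives x = (L + v·dt)/2. Below is a sketch of a weighted average over several such estimates, with all numbers assumed.

```python
# Weighted-average TDOA leak localization on one pipe segment (toy data).
def locate_leak(L, v, dts, weights):
    estimates = [(L + v * dt) / 2 for dt in dts]     # per-event position estimates
    return sum(w * x for w, x in zip(weights, estimates)) / sum(weights)

L, v = 10_000.0, 340.0         # pipe length (m) and wave speed (m/s), assumed
dts = [-8.82, -8.75, -8.90]    # measured arrival-time differences (s), assumed
print(locate_leak(L, v, dts, weights=[1.0, 0.8, 0.6]))   # ~3500 m from sensor A
```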

111 citations


Journal ArticleDOI
04 Jun 2011
TL;DR: This work creates the first-ever provably information-flow secure processor with micro-architectural features including pipelining and cache, and describes a new hardware description language, Caisson, that combines domain-specific abstractions common to hardware design with insights from type-based techniques used in secure programming languages.
Abstract: Information flow is an important security property that must be incorporated from the ground up, including at hardware design time, to provide a formal basis for a system's root of trust. We incorporate insights and techniques from designing information-flow secure programming languages to provide a new perspective on designing secure hardware. We describe a new hardware description language, Caisson, that combines domain-specific abstractions common to hardware design with insights from type-based techniques used in secure programming languages. The proper combination of these elements allows for an expressive, provably-secure HDL that operates at a familiar level of abstraction to the target audience of the language, hardware architects. We have implemented a compiler for Caisson that translates designs into Verilog and then synthesizes the designs using existing tools. As an example of Caisson's usefulness, we have addressed an open problem in secure hardware by creating the first-ever provably information-flow secure processor with micro-architectural features including pipelining and cache. We synthesize the secure processor and empirically compare it in terms of chip area, power consumption, and clock frequency with both a standard (insecure) commercial processor and also a processor augmented at the gate level to dynamically track information flow. Our processor is competitive with the insecure processor and significantly better than dynamic tracking.
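Stripped to its essence, the discipline such a type system enforces is a lattice check on security labels: information may only flow upward. The toy check below is a drastic simplification of Caisson's actual type system, which operates over HDL state machines.

```python
# Minimal information-flow check over a two-point lattice (L below H).
LATTICE = {"L": 0, "H": 1}

def flows_to(src, dst):
    return LATTICE[src] <= LATTICE[dst]

def check_assignment(dst_label, src_labels):
    # An assignment is secure only if every source may flow to the destination.
    if not all(flows_to(s, dst_label) for s in src_labels):
        raise TypeError(f"insecure flow: {src_labels} -> {dst_label}")

check_assignment("H", ["L", "H"])      # fine: low and high may flow into high
try:
    check_assignment("L", ["H"])       # rejected: high must not leak into low
except TypeError as err:
    print(err)
```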

105 citations


DOI
18 Mar 2011
TL;DR: This paper presents Patmos, a processor optimized for low WCET bounds rather than high average-case performance: a dual-issue, statically scheduled RISC processor that relies on a customized compiler.
Abstract: Current processors are optimized for average-case performance, often leading to a high worst-case execution time (WCET). Many architectural features that increase average-case performance are hard to model for WCET analysis. In this paper we present Patmos, a processor optimized for low WCET bounds rather than high average-case performance. Patmos is a dual-issue, statically scheduled RISC processor. The instruction cache is organized as a method cache and the data cache is organized as a split cache in order to simplify the cache WCET analysis. To fill the dual-issue pipeline with enough useful instructions, Patmos relies on a customized compiler. The compiler also plays a central role in optimizing the application for the WCET instead of average-case performance.

105 citations


Proceedings ArticleDOI
03 Dec 2011
TL;DR: This approach identifies recurring instruction sequences as phases of “temporal regularity” in a program's execution, and maps suitable ones to the BERET hardware, a three-stage pipeline with a bundled execution model that demonstrates significant savings in instruction fetch, decode, and register file access energy.
Abstract: Technology scaling has delivered on its promises of increasing device density on a single chip. However, the voltage scaling trend has failed to keep up, introducing tight power constraints on manufactured parts. In such a scenario, there is a need to incorporate energy-efficient processing resources that can enable more computation within the same power budget. Energy efficiency solutions in the past have typically relied on application specific hardware and accelerators. Unfortunately, these approaches do not extend to general purpose applications due to their irregular and diverse code base. Towards this end, we propose BERET, an energy-efficient co-processor that can be configured to benefit a wide range of applications. Our approach identifies recurring instruction sequences as phases of "temporal regularity" in a program's execution, and maps suitable ones to the BERET hardware, a three-stage pipeline with a bundled execution model. This judicious off-loading of program execution to reduced-complexity hardware demonstrates significant savings in instruction fetch, decode, and register file access energy. On average, BERET reduces energy consumption by a factor of 3-4X for the program regions selected across a range of general-purpose and media applications. The average energy savings for the entire application run was 35% over a single-issue in-order processor.
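A sketch of the profiling idea behind the approach: scan a dynamic instruction trace for fixed-length sequences that recur, and report the hot ones as candidates for off-loading. The window length and threshold are assumptions; BERET's actual selection is more elaborate.

```python
# Find recurring instruction sequences ("temporal regularity") in a trace.
from collections import Counter

def hot_sequences(trace, window=3, min_count=2):
    counts = Counter(tuple(trace[i:i + window])
                     for i in range(len(trace) - window + 1))
    return [(seq, n) for seq, n in counts.most_common() if n >= min_count]

trace = ["ld", "add", "st", "ld", "add", "st", "br", "ld", "add", "st"]
for seq, n in hot_sequences(trace):
    print(n, "x", seq)    # ('ld', 'add', 'st') recurs: a mapping candidate
```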

104 citations


Journal ArticleDOI
TL;DR: The accuracy in source-level reconstruction by the proposed pipeline is confirmed by an improved specificity in the retrieval of RSNs from experimental data, and the sensitivity of the ICA results to the decomposition algorithm is assessed.
Abstract: To study functional connectivity using magnetoencephalographic (MEG) data, the high-quality source-level reconstruction of brain activity constitutes a critical element. MEG resting-state networks (RSNs) have been documented by means of a dedicated processing pipeline: MEG recordings are decomposed by independent component analysis (ICA) into artifact and brain components (ICs); next, the channel maps associated with the latter ones are projected into the source space and the resulting voxel-wise weights are used to linearly combine the IC time courses. An extensive description of the proposed pipeline is provided here, along with an assessment of its performance with respect to alternative approaches. The following investigations were carried out: (1) ICA decomposition algorithm. Synthetic data are used to assess the sensitivity of the ICA results to the decomposition algorithm, by testing FastICA, INFOMAX, and SOBI. FastICA with the deflation approach, a standard solution, provides the best decomposition. (2) Recombination of brain ICs versus subtraction of artifactual ICs (at the channel level). Both the recombination of the brain ICs in the sensor space and the classical procedure of subtracting the artifactual ICs from the recordings provide a suitable reconstruction, with a lower distortion using the latter approach. (3) Recombination of brain ICs after localization versus localization of artifact-corrected recordings. The brain IC recombination after source localization, as implemented in the proposed pipeline, provides a lower source-level signal distortion. (4) Detection of RSNs. The accuracy in source-level reconstruction by the proposed pipeline is confirmed by an improved specificity in the retrieval of RSNs from experimental data.
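As a hands-on companion to point (1), here is a minimal decomposition sketch with scikit-learn's FastICA using the deflation approach the authors found best. The data are synthetic two-source mixtures; a real MEG pipeline adds filtering, artifact classification, and the source-space projection described above.

```python
# FastICA (deflation) on synthetic mixtures standing in for MEG channels.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 2000)
sources = np.c_[np.sin(7 * t), np.sign(np.sin(3 * t))]  # "brain" + "artifact"
mixing = rng.normal(size=(2, 2))
recordings = sources @ mixing.T                          # channel-level data

ica = FastICA(n_components=2, algorithm="deflation", random_state=0)
ic_time_courses = ica.fit_transform(recordings)          # estimated IC signals
print(ic_time_courses.shape, ica.mixing_.shape)          # (2000, 2) (2, 2)
```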

Journal ArticleDOI
TL;DR: A new message passing scheme named tile-based BP that reduces the memory and bandwidth to a fraction of the ordinary BP algorithms without performance degradation by splitting the MRF into many tiles and only storing the messages across the neighboring tiles is proposed.
Abstract: Loopy belief propagation (BP) is an effective solution for assigning labels to the nodes of a graphical model such as the Markov random field (MRF), but it requires high memory, bandwidth, and computational costs. Furthermore, the iterative, pixel-wise, and sequential operations of BP make it difficult to parallelize the computation. In this paper, we propose two techniques to address these issues. The first technique is a new message passing scheme named tile-based BP that reduces the memory and bandwidth to a fraction of the ordinary BP algorithms without performance degradation by splitting the MRF into many tiles and only storing the messages across the neighboring tiles. The tile-wise processing also enables data reuse and pipelining, resulting in efficient hardware implementation. The second technique is an O(L) fast message construction algorithm that exploits the properties of robust functions for parallelization. We apply these two techniques to a very large-scale integration circuit for stereo matching that generates high-resolution disparity maps in near real-time. We also implement the proposed schemes on a graphics processing unit (GPU), where they run four times faster than standard BP on the GPU.
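To convey the flavor of linear-time message construction, here is the well-known forward/backward pass for a truncated linear smoothness cost (the distance-transform trick); the paper's O(L) algorithm exploits robust-function properties in the same spirit but is not identical to this sketch.

```python
# O(L) BP message for cost min(slope * |l - k|, truncation) over L labels.
def construct_message(h, slope, truncation):
    m = list(h)                              # h[l]: data cost + incoming messages
    for l in range(1, len(m)):               # forward pass
        m[l] = min(m[l], m[l - 1] + slope)
    for l in range(len(m) - 2, -1, -1):      # backward pass
        m[l] = min(m[l], m[l + 1] + slope)
    cap = min(h) + truncation                # truncation caps the lower envelope
    return [min(v, cap) for v in m]

print(construct_message([9.0, 4.0, 7.0, 1.0, 8.0], slope=1.0, truncation=3.0))
```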

Patent
22 Sep 2011
TL;DR: In this article, a heat dissipation device is mounted around a large-sized electronic appliance to absorb and dissipate the heat generated by the appliance so as to effectively reduce the working temperature of the electronic appliance to a normal level.
Abstract: A heat dissipation device has an inlet pipeline, an outlet pipeline, at least one heat-dissipating unit and at least one actuator. Each of the at least one heat-dissipating unit is connected to the inlet pipeline and the outlet pipeline and includes multiple heat-dissipating elements being connected to each other. The at least one actuator is mounted on the inlet pipeline. The heat dissipation device is mounted around a large-sized electronic appliance to absorb and dissipate the heat generated by the large-sized electronic appliance so as to effectively reduce the working temperature of the electronic appliance to a normal level. Therefore, the large-sized electronic appliance can work safely and is power-saving and environmentally friendly.

Journal ArticleDOI
TL;DR: This paper presents a simple and efficient multiplier that can achieve arbitrary accuracy through an iterative procedure before reaching the exact result.

Journal ArticleDOI
TL;DR: This paper presents a real-time processing platform for high-definition stereo video capable of processing stereo video streams at resolutions up to 1920 × 1080 at 30 frames per second and shows how the corresponding algorithms can be implemented very efficiently in programmable hardware, relieving the GPU from the burden of these tasks.
Abstract: This paper presents a real-time processing platform for high-definition stereo video. The system is capable of processing stereo video streams at resolutions up to 1920 × 1080 at 30 frames per second (1080p30). In the hybrid FPGA-GPU-CPU system, a high-density FPGA is used not only to perform the low-level image processing tasks such as color interpolation and cross-image color correction, but also to carry out radial undistortion, image rectification, and disparity estimation. We show how the corresponding algorithms can be implemented very efficiently in programmable hardware, relieving the GPU from the burden of these tasks. Our FPGA implementation results are compared with corresponding GPU implementations and with other implementations reported in the literature.
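For context on the disparity-estimation stage, here is a tiny block-matching sketch using the sum of absolute differences: the regular, window-parallel structure of this kernel is what makes it map well to programmable hardware. Parameters are assumptions; the real system performs rectification first and uses a far more refined matcher.

```python
# Naive SAD block matching on a synthetic stereo pair (expected disparity 3).
import numpy as np

def disparity_sad(left, right, max_disp=16, win=5):
    h, w = left.shape
    r = win // 2
    disp = np.zeros((h, w), dtype=np.uint8)
    for y in range(r, h - r):
        for x in range(r + max_disp, w - r):
            patch = left[y - r:y + r + 1, x - r:x + r + 1].astype(np.int32)
            costs = [np.abs(patch - right[y - r:y + r + 1,
                                          x - d - r:x - d + r + 1]
                            .astype(np.int32)).sum()
                     for d in range(max_disp)]
            disp[y, x] = int(np.argmin(costs))
    return disp

left = np.random.randint(0, 255, (32, 64), dtype=np.uint8)
right = np.roll(left, -3, axis=1)     # synthetic horizontal shift of 3 pixels
print(np.bincount(disparity_sad(left, right).ravel()).argmax())   # prints 3
```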

Journal ArticleDOI
01 Feb 2011
TL;DR: To eliminate the read-only memories used to store the twiddle factors, the proposed architecture applies a reconfigurable complex multiplier and bit-parallel multipliers to achieve a ROM-less FFT/IFFT processor, thus consuming less power than existing designs.
Abstract: 4G and other wireless systems are currently hot topics of research and development in the communication field. Broadband wireless systems based on orthogonal frequency division multiplexing (OFDM) often require an inverse fast Fourier transform (IFFT) to produce multiple subcarriers. In this paper, we present the efficient implementation of a pipeline FFT/IFFT processor for OFDM applications. Our design adopts a single-path delay feedback style as the proposed hardware architecture. To eliminate the read-only memories (ROMs) used to store the twiddle factors, the proposed architecture applies a reconfigurable complex multiplier and bit-parallel multipliers to achieve a ROM-less FFT/IFFT processor, thus consuming less power than existing designs. The design spends about 33.6K gates, and its power consumption is about 9.8 mW at 20 MHz.
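The ROM-less idea can be demonstrated in software: generate each twiddle factor by the recurrence W^(k+1) = W^k · W instead of reading a table. A direct DFT is used below for brevity; the chip realizes the equivalent trick with a reconfigurable complex multiplier inside a single-path delay feedback FFT pipeline.

```python
# Direct DFT whose twiddle factors come from a multiplicative recurrence
# rather than a stored table (illustrative only; O(n^2), not a pipelined FFT).
import cmath

def dft_romless(x):
    n, out = len(x), []
    for k in range(n):
        w_step = cmath.exp(-2j * cmath.pi * k / n)
        w, acc = 1.0 + 0j, 0j
        for sample in x:
            acc += sample * w
            w *= w_step                  # recurrence replaces the twiddle ROM
        out.append(acc)
    return out

print([round(abs(v), 3) for v in dft_romless([1, 0, -1, 0])])  # [0.0, 2.0, 0.0, 2.0]
```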

Journal ArticleDOI
TL;DR: A mathematical proof shows the existence of a linear transformation that converts LFSR circuits into equivalent state space formulations; the transformation achieves a full speed-up compared to the serial architecture at the cost of an increase in hardware overhead.
Abstract: Linear feedback shift register (LFSR) is an important component of the cyclic redundancy check (CRC) operations and BCH encoders. The contribution of this paper is two fold. First, this paper presents a mathematical proof of existence of a linear transformation to transform LFSR circuits into equivalent state space formulations. This transformation achieves a full speed-up compared to the serial architecture at the cost of an increase in hardware overhead. This method applies to all generator polynomials used in CRC operations and BCH encoders. Second, a new formulation is proposed to modify the LFSR into the form of an infinite impulse response (IIR) filter. We propose a novel high speed parallel LFSR architecture based on parallel IIR filter design, pipelining and retiming algorithms. The advantage of the proposed approach over the previous architectures is that it has both feedforward and feedback paths. We further propose to apply combined parallel and pipelining techniques to eliminate the fanout effect in long generator polynomials. The proposed scheme can be applied to any generator polynomial, i.e., any LFSR in general. The proposed parallel architecture achieves better area-time product compared to the previous designs.
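The state-space view is easy to demonstrate numerically: one serial LFSR shift is s' = A·s over GF(2), so M shifts collapse into a single multiplication by A^M, which is the algebraic basis for unfolding an LFSR into a parallel architecture. The feedback taps below are an arbitrary example, not a CRC standard.

```python
# State-space LFSR: verify that eight serial shifts equal one multiply by A^8.
import numpy as np

def companion(taps):                       # taps: feedback row over GF(2)
    n = len(taps)
    A = np.zeros((n, n), dtype=np.uint8)
    A[0, :] = taps                         # feedback into the first register
    A[1:, :-1] = np.eye(n - 1, dtype=np.uint8)   # plain shift for the rest
    return A

def gf2_matmul(A, B):
    return (A.astype(np.uint16) @ B.astype(np.uint16)) % 2

A = companion([1, 0, 1, 1])                # example feedback taps (assumed)
s = np.array([[1], [0], [0], [0]], dtype=np.uint8)

A8 = A
for _ in range(7):                         # A^8 by repeated GF(2) multiplication
    A8 = gf2_matmul(A8, A)
serial = s
for _ in range(8):                         # eight serial shifts
    serial = gf2_matmul(A, serial)
print(np.array_equal(serial, gf2_matmul(A8, s)))   # True
```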

Journal ArticleDOI
TL;DR: In this paper, an approach was presented to analyze the dynamic response of a Timoshenko pipeline conveying fluid under random excitation, considering the fluid-structure interaction and the effect of shear deformation.

Proceedings ArticleDOI
07 Apr 2011
TL;DR: A two-way time-interleaved (TI) switched-current 1 GS/s 12b pipelined ADC in SiGe BiCMOS that addresses two critical design challenges: the process limits the sampling rate, and the pipeline architecture limits power efficiency.
Abstract: Pipelined ADCs designed in analog BiCMOS technologies can offer good linearity and high SNR performance for input signals with reasonable voltage swings. Such ADCs, however, face two critical design challenges: the process limits the sampling rate, and the pipeline architecture limits power efficiency. This paper introduces a two-way time-interleaved (TI) switched-current 1 GS/s 12b pipelined ADC in SiGe BiCMOS that addresses these issues.

Journal ArticleDOI
TL;DR: A fast on-line tracker and a dynamic scheduler are introduced; the GPU tracker significantly outperforms the CPU version for large events while entirely maintaining its efficiency.
Abstract: The on-line event reconstruction in ALICE is performed by the High Level Trigger, which should process up to 2000 events per second in proton-proton collisions and up to 300 central events per second in heavy-ion collisions, corresponding to an input data stream of 30 GB/s. In order to fulfill the time requirements, a fast on-line tracker has been developed. The algorithm combines a Cellular Automaton method being used for a fast pattern recognition and the Kalman Filter method for fitting of found trajectories and for the final track selection. The tracker was adapted to run on Graphics Processing Units (GPU) using the NVIDIA Compute Unified Device Architecture (CUDA) framework. The implementation of the algorithm had to be adjusted at many points to allow for an efficient usage of the graphics cards. In particular, achieving a good overall workload for many processor cores, efficient transfer to and from the GPU, as well as optimized utilization of the different memories the GPU offers turned out to be critical. To cope with these problems a dynamic scheduler was introduced, which redistributes the workload among the processor cores. Additionally a pipeline was implemented so that the tracking on the GPU, the initialization and the output processed by the CPU, as well as the DMA transfer can overlap. The GPU tracking algorithm significantly outperforms the CPU version for large events while it entirely maintains its efficiency.
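The transfer/compute overlap described above can be modeled with a bounded queue between two threads; the timings and stage names below are stand-ins, not the High Level Trigger's actual pipeline.

```python
# Toy two-stage pipeline: event transfer overlaps with "GPU" tracking.
import queue
import threading
import time

def transfer(events, q):
    for e in events:
        time.sleep(0.01)                  # stand-in for DMA transfer
        q.put(e)
    q.put(None)                           # sentinel: no more events

def compute(q):
    while (e := q.get()) is not None:
        time.sleep(0.02)                  # stand-in for GPU tracking
        print("tracked event", e)

q = queue.Queue(maxsize=4)                # bounded buffer between the stages
t = threading.Thread(target=transfer, args=(range(5), q))
c = threading.Thread(target=compute, args=(q,))
t.start(); c.start(); t.join(); c.join()
```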

Journal ArticleDOI
TL;DR: A particle re-initialization scheme is also presented to further improve the execution performance of the PSO, and results demonstrate that the proposed HW/SW co-design approach to realizing PSO is capable of achieving high-quality solutions effectively.

22 Mar 2011
TL;DR: This paper shows how nonlinear observers can be used as tools for the monitoring of pipelines, presenting two observer approaches for two different applications: a one-leak detection and isolation problem on the one hand, and the same problem with additional friction estimation on the other hand.
Abstract: This article shows how nonlinear observers can be used as tools for the monitoring of pipelines. In particular, two observer approaches for two different applications are presented: a one-leak detection and isolation problem on the one hand, and the same problem with friction estimation in addition on the other hand. In the first case, the system which represents the pipeline with a leak satisfies a uniform observability condition allowing for the design of a classical high gain observer (with a static Lyapunov equation). In the second case, the system is no longer uniformly observable, but still satisfies the observability rank condition, and an Extended Kalman Filter is proposed, under the use of exciting inputs. In both cases, experimental results are provided.
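For the first application, the classical high-gain observer has a compact canonical form; the sketch below uses a two-state chain with toy dynamics f and gain theta (both assumptions, far simpler than the article's pipeline model).

```python
# High-gain observer for x1' = x2, x2' = f(x), y = x1 (toy simulation).
def simulate(theta=10.0, dt=1e-3, steps=5000):
    f = lambda x1, x2: -0.5 * x2 - x1           # assumed dynamics
    x1, x2 = 1.0, 0.0                           # true state
    z1, z2 = 0.0, 0.0                           # observer state
    for _ in range(steps):
        y = x1                                  # measured output
        dz1 = z2 + 2 * theta * (y - z1)         # correction gains scale with
        dz2 = f(z1, z2) + theta**2 * (y - z1)   # theta and theta squared
        x1, x2 = x1 + dt * x2, x2 + dt * f(x1, x2)
        z1, z2 = z1 + dt * dz1, z2 + dt * dz2
    return abs(x1 - z1), abs(x2 - z2)

print(simulate())   # estimation errors, driven toward zero by the high gain
```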

Patent
19 Oct 2011
TL;DR: In this article, a microprocessor includes a hardware instruction translator that translates x86 ISA and ARM ISA machine language program instructions into microinstructions, which are encoded in a distinct manner from the x86 and ARM instructions.
Abstract: A microprocessor includes a hardware instruction translator that translates x86 ISA and ARM ISA machine language program instructions into microinstructions, which are encoded in a distinct manner from the x86 and ARM instructions. An execution pipeline executes the microinstructions to generate results defined by the x86 ISA and ARM ISA instructions. The translator directly provides the microinstructions to the execution pipeline for execution. Each time the microprocessor performs one of the x86 ISA and ARM ISA instructions, the translator translates it into the microinstructions. An indicator indicates either x86 or ARM as a boot ISA. After reset, the microprocessor initializes its architectural state, fetches its first instructions from a reset address, and translates them all as defined by the boot ISA. An instruction cache caches the x86 and ARM instructions and provides them to the translator.

Journal ArticleDOI
TL;DR: A novel framework is presented that can efficiently evaluate approximate Boolean set operations for B-rep models with highly parallel algorithms, taking axis-aligned surfels of Layered Depth Images (LDI) as a bridge and performing the Boolean operations on the structured points.
Abstract: We present a novel framework which can efficiently evaluate approximate Boolean set operations for B-rep models by highly parallel algorithms. This is achieved by taking axis-aligned surfels of Layered Depth Images (LDI) as a bridge and performing Boolean operations on the structured points. As compared with prior surfel-based approaches, this paper makes several improvements. Firstly, we adopt key-data pairs to store the LDI more compactly. Secondly, robust depth peeling is investigated to overcome the bottleneck of layer complexity. Thirdly, an out-of-core tiling technique is presented to overcome the limitation of memory. Real-time feedback is provided by streaming the proposed pipeline on many-core graphics hardware.
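Reduced to a single ray, an LDI Boolean becomes interval arithmetic on sorted (entry, exit) depth pairs, which is exactly why the operation parallelizes per ray; the sketch below shows intersection only, a simplification of the framework's full GPU pipeline.

```python
# Per-ray Boolean intersection of two solids given as sorted depth intervals.
def intersect_ray(a, b):
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            out.append((lo, hi))          # overlapping span is inside both solids
        if a[i][1] < b[j][1]:             # advance whichever interval ends first
            i += 1
        else:
            j += 1
    return out

solid_a = [(0.0, 2.0), (5.0, 8.0)]        # entry/exit depths along one ray
solid_b = [(1.0, 6.0)]
print(intersect_ray(solid_a, solid_b))    # [(1.0, 2.0), (5.0, 6.0)]
```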

Journal ArticleDOI
TL;DR: The proposed matrix multiplication with systolic architecture doubles the speed of the conventional method.
Abstract: The evolution of computers and the Internet has brought demand for powerful, high-speed data processing, but few methods provide a perfect solution in such a complex environment. To handle this issue, parallel computing is proposed as a solution, and this paper addresses the demand for high-speed data processing. The paper demonstrates an effective design for matrix multiplication using a systolic architecture on Reconfigurable Systems (RS) such as Field Programmable Gate Arrays (FPGAs). The systolic architecture increases computing speed by combining parallel processing and pipelining. The RTL code for matrix multiplication with and without the systolic architecture is written in Verilog HDL, compiled and simulated using ModelSim XE III 6.4b, synthesized using Xilinx ISE 9.2i, and targeted to the device xc3s500e-5-ft256; finally, the designs are compared to each other to evaluate the performance of the proposed architecture. The proposed matrix multiplication with systolic architecture doubles the speed of the conventional method.
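A cycle-by-cycle software model of an output-stationary systolic array makes the schedule visible: operands enter skewed by one cycle per row and column, and each processing element performs one multiply-accumulate per cycle. This is a simulation sketch, not the paper's Verilog design.

```python
# Simulate an N x N output-stationary systolic array computing C = A @ B.
import numpy as np

def systolic_matmul(A, B):
    n = A.shape[0]
    C = np.zeros((n, n), dtype=A.dtype)
    for cycle in range(3 * n - 2):        # enough cycles to drain the array
        for i in range(n):
            for j in range(n):
                k = cycle - i - j         # skewed injection schedule
                if 0 <= k < n:
                    C[i, j] += A[i, k] * B[k, j]   # PE(i, j) MAC this cycle
    return C

A = np.arange(9).reshape(3, 3)
B = np.arange(9, 18).reshape(3, 3)
print(np.array_equal(systolic_matmul(A, B), A @ B))   # True
```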

Journal ArticleDOI
TL;DR: The computational results demonstrate that the model is able to define new operational points for the pipeline, providing significant cost savings, and that the CLP-MILP model is an efficient tool to aid operational decision-making within this real-world pipeline scenario.
Abstract: This paper addresses the problem of developing an optimization model to aid the operational scheduling in a real-world pipeline scenario. The pipeline connects a refinery to a harbor, conveying different types of commodities (gasoline, diesel, kerosene, etc.). An optimization model was developed to determine pipeline scheduling with improved efficiency. This model combines constraint logic programming (CLP) and mixed integer linear programming (MILP) in a CLP-MILP approach. The proposed model uses decomposition strategies, continuous time representation, intervals that indicate time constraints (time windows), and a series of operational issues, such as the seasonal and hourly cost of electric energy (on-peak demand hours). Real cases were solved in a matter of seconds. The computational results have demonstrated that the model is able to define new operational points for the pipeline, providing significant cost savings. Indeed, the CLP-MILP model is an efficient tool to aid operational decision-making within this real-world pipeline scenario.
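A toy version of the electric-energy piece of such a model fits in a small linear program with PuLP; all data below are assumptions, and the paper's CLP-MILP model additionally handles commodity batches, sequencing, and time windows.

```python
# Tariff-aware pumping schedule: meet daily volume at minimum energy cost.
from pulp import LpMinimize, LpProblem, LpVariable, lpSum

hours = range(24)
tariff = [0.08 if h < 7 or h >= 22 else 0.20 for h in hours]  # $/kWh, assumed
MAX_RATE, DEMAND = 500.0, 4000.0      # pump limit (m3/h) and daily volume, assumed

prob = LpProblem("pipeline_schedule", LpMinimize)
flow = {h: LpVariable(f"flow_{h}", lowBound=0, upBound=MAX_RATE) for h in hours}
prob += lpSum(tariff[h] * flow[h] for h in hours)     # energy cost objective
prob += lpSum(flow[h] for h in hours) == DEMAND       # meet daily demand
prob.solve()
print([h for h in hours if flow[h].value() > 0])      # pumping lands off-peak
```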

Journal ArticleDOI
TL;DR: This paper presents the concept of micro observation versus macro observation and shows that the most effective method of using SBST is through a multiple input signature register connected to the processor local bus, while conventional methods that observe only the program results in the memory lead to significantly less processor fault coverage.
Abstract: This paper presents an effective hybrid test program for the software-based self-testing (SBST) of pipeline processor cores. The test program combines a deterministically developed program which explores different levels of processor core information and a block-based random program which consists of a combination of in-order instructions, random-order instructions, return instructions, as well as instruction sequences used to trigger exception/interrupt requests. Due to the complementary nature of this hybrid test program, it can achieve processor fault coverage that is comparable to the performance of the conventional scan chain method. The test response observation methods and their impacts on fault coverage are also investigated. We present the concept of micro observation versus macro observation and show that the most effective method of using SBST is through a multiple input signature register connected to the processor local bus, while conventional methods that observe only the program results in the memory lead to significantly less processor fault coverage.
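The signature-register idea at the core of the observation method is easy to sketch: each cycle the register shifts with LFSR feedback and XORs in one parallel word of test responses, compacting the whole stream into a short signature that is compared against a golden value. The tap mask and response words below are examples.

```python
# Multiple input signature register (MISR) compaction of test responses.
def misr(responses, width=8, taps=0b10111000):
    state = 0
    for word in responses:
        fb = bin(state & taps).count("1") & 1        # parity of tapped bits
        state = ((state << 1) | fb) & ((1 << width) - 1)
        state ^= word & ((1 << width) - 1)           # fold in one response word
    return state

good = [0x3A, 0x11, 0xF0, 0x07]
print(hex(misr(good)))                       # golden signature
print(hex(misr([0x3A, 0x10, 0xF0, 0x07])))   # single-bit fault: signature differs
```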

Journal ArticleDOI
TL;DR: It is clear that this field is on the brink of significant improvements to the SOC that should follow the highly anticipated approval of the first direct antiviral agents, telaprevir (Vertex/Johnson & Johnson/Mitsubishi) and boceprevir (Merck), in 2011.
Abstract: Over 3% of the world’s population is chronically infected with hepatitis C virus (HCV). Chronic hepatitis C (CHC) is a major cause of liver damage, cirrhosis and liver cancer that can lead to liver failure and death. The current standard of care (SOC) is a combination of pegylated interferon with ribavirin (PEG-IFN/RBV), but this only eradicates the virus in approximately 50% of patients [1]. The recently concluded sixty-first annual meeting of the American Association for the Study of Liver Diseases (AASLD) provided a broad overview of the pipeline of novel drugs for the treatment of CHC [2]. It is clear that this field is on the brink of significant improvements to the SOC that should follow the highly anticipated approval of the first direct antiviral agents (DAAs), telaprevir (Vertex/Johnson & Johnson/Mitsubishi) and boceprevir (Merck), in 2011.


Proceedings ArticleDOI
07 Apr 2011
TL;DR: Several architectural and circuit techniques used to achieve this performance are presented, which include a dynamically driven deep N-well input sampling switch, an offset-cancelled comparator, and a back-gate voltage-biased MDAC amplifier.
Abstract: The high channel count of many modern communication systems increasingly requires high-performance ADCs that consume very little power. The 16b pipeline ADC described here achieves 77.6dBFS SNR, 77.6dBFS SNDR and 95dBc SFDR at 80MS/s with a 10MHz input. With a 200MHz input, the ADC achieves 71.0dBFS SNR, 69.4dBFS SNDR and 81dBc SFDR. The complete ADC including reference, clock, and digital circuitry consumes 100mW from a 1.8V supply. This compares favorably with recently reported ADCs in this performance class [1–3]. In this paper, several architectural and circuit techniques used to achieve this performance are presented. The techniques include a dynamically driven deep N-well input sampling switch, an offset-cancelled comparator, and a back-gate voltage-biased MDAC amplifier. The ADC is fabricated in a 1P5M 0.18μm CMOS process with deep N-well (DNW) isolation.