
Showing papers on "Pipeline (computing)" published in 2008


Journal ArticleDOI
TL;DR: In this article, the authors developed an analytical model of pipeline CO2 transport for a broad range of potential carbon dioxide capture and storage (CCS) projects and showed that the transport cost is most sensitive to the pipeline capacity factor and the capital recovery factor.
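For intuition, here is a minimal sketch of the levelized-cost arithmetic such a model rests on. The capital recovery factor formula is standard finance; every numeric input below is invented for illustration and is not taken from the paper.

```python
# Illustrative levelized-cost arithmetic for pipeline CO2 transport.
# All numbers are made up for the example; only the formulas are standard.

def capital_recovery_factor(rate: float, years: int) -> float:
    """CRF = r(1+r)^n / ((1+r)^n - 1): annualizes an up-front capital cost."""
    g = (1.0 + rate) ** years
    return rate * g / (g - 1.0)

capex = 50e6           # pipeline capital cost [$] (illustrative)
opex = 2e6             # fixed O&M per year [$/yr] (illustrative)
design_capacity = 5e6  # design throughput [tCO2/yr]
capacity_factor = 0.8  # fraction of design capacity actually used

crf = capital_recovery_factor(rate=0.10, years=20)
annual_cost = capex * crf + opex
cost_per_tonne = annual_cost / (design_capacity * capacity_factor)
print(f"CRF = {crf:.4f}, transport cost = ${cost_per_tonne:.2f}/tCO2")
```

The two sensitivities the TL;DR highlights are visible directly in the formula: the capacity factor divides the annual cost, and the capital recovery factor scales the dominant capital term.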

309 citations


Journal ArticleDOI
TL;DR: This paper details the design of a new high-speed pipelined application-specific instruction set processor (ASIP) for elliptic curve cryptography (ECC) using field-programmable gate-array (FPGA) technology.
Abstract: This paper details the design of a new high-speed pipelined application-specific instruction set processor (ASIP) for elliptic curve cryptography (ECC) using field-programmable gate-array (FPGA) technology. Different levels of pipelining were applied to the data path to explore the resulting performance and find an optimal pipeline depth. Three complex instructions were used to reduce the latency by reducing the overall number of instructions, and a new combined algorithm was developed to perform point doubling and point addition using the application-specific instructions. An implementation for the United States Government National Institute of Standards and Technology-recommended curve over GF(2^163) is shown, which achieves a point multiplication time of 33.05 μs at 91 MHz on a Xilinx Virtex-E FPGA, the fastest figure reported in the literature to date. Using the more modern Xilinx Virtex-4 technology, a point multiplication time of 19.55 μs was achieved, which translates to over 51,120 point multiplications per second.
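The paper's contribution is a pipelined ASIP with combined double/add instructions over GF(2^163), which is not reproduced here. The sketch below shows only the generic left-to-right double-and-add loop that any such point multiplier accelerates, on a toy prime-field curve chosen for readability.

```python
# Textbook double-and-add scalar multiplication on a toy short-Weierstrass
# curve over a small prime field. This illustrates only the generic loop
# structure the paper's pipelined ASIP accelerates; the actual design uses
# a combined double/add algorithm over the binary field GF(2^163).

P_MOD, A = 97, 2                  # toy curve y^2 = x^3 + 2x + 3 mod 97
INF = None                        # point at infinity

def point_add(p, q):
    if p is INF: return q
    if q is INF: return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return INF                # p + (-p) = infinity
    if p == q:                    # tangent slope for doubling
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P_MOD) % P_MOD
    else:                         # chord slope for addition
        lam = (y2 - y1) * pow(x2 - x1, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    return (x3, (lam * (x1 - x3) - y1) % P_MOD)

def scalar_mult(k, p):
    acc = INF
    for bit in bin(k)[2:]:        # scan scalar bits MSB -> LSB
        acc = point_add(acc, acc)             # double every iteration
        if bit == "1":
            acc = point_add(acc, p)           # add on set bits
    return acc

print(scalar_mult(13, (3, 6)))    # (3, 6) lies on the toy curve
```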

167 citations


Proceedings ArticleDOI
20 Feb 2008
TL;DR: FastForward is presented, a cache-optimized single-producer/single-consumer concurrent lock-free queue for pipeline parallelism on multicore architectures with weak to strongly ordered consistency models; it is up to 5x faster than the next best solution.
Abstract: Low overhead core-to-core communication is critical for efficient pipeline-parallel software applications. This paper presents FastForward, a cache-optimized single-producer/single-consumer concurrent lock-free queue for pipeline parallelism on multicore architectures, with weak to strongly ordered consistency models. Enqueue and dequeue times on a 2.66 GHz Opteron 2218 based system are as low as 28.5 ns, up to 5x faster than the next best solution. FastForward's effectiveness is demonstrated for real applications by applying it to line-rate soft network processing on Gigabit Ethernet with general purpose commodity hardware.
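FastForward's central trick is that full/empty detection inspects the slot itself (a sentinel value) instead of comparing shared head/tail counters, so the producer and consumer never write to each other's index variables and avoid cache-line ping-pong. A minimal sketch of that design follows; Python's GIL makes it safe only as an illustration, whereas the real queue depends on careful memory ordering and cache-line placement in C.

```python
# Sketch of a FastForward-style SPSC ring buffer: a None sentinel in the
# slot signals "empty", so neither side reads the other's index. Items
# stored in the queue must therefore never themselves be None.

class SPSCQueue:
    def __init__(self, capacity=1024):
        self.buf = [None] * capacity
        self.head = 0   # written only by the producer
        self.tail = 0   # written only by the consumer

    def enqueue(self, item):
        if self.buf[self.head] is not None:
            return False                      # queue full
        self.buf[self.head] = item
        self.head = (self.head + 1) % len(self.buf)
        return True

    def dequeue(self):
        item = self.buf[self.tail]
        if item is None:
            return None                       # queue empty
        self.buf[self.tail] = None            # release the slot
        self.tail = (self.tail + 1) % len(self.buf)
        return item

q = SPSCQueue(8)
q.enqueue("packet-1"); q.enqueue("packet-2")
print(q.dequeue(), q.dequeue(), q.dequeue())  # packet-1 packet-2 None
```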

163 citations


Journal ArticleDOI
TL;DR: A novel approach, called QPALMA, for computing accurate spliced alignments which takes advantage of the read’s quality information as well as computational splice site predictions and uses a large margin approach similar to support vector machines to estimate its parameters to maximize alignment accuracy.
Abstract: Motivation: Next generation sequencing technologies open exciting new possibilities for genome and transcriptome sequencing. While reads produced by these technologies are relatively short and error-prone compared to the Sanger method, their throughput is several orders of magnitude higher. To utilize such reads for transcriptome sequencing and gene structure identification, one needs to be able to accurately align the sequence reads over intron boundaries. This represents a significant challenge given their short length and inherent high error rate. Results: We present a novel approach, called QPALMA, for computing accurate spliced alignments which takes advantage of the read's quality information as well as computational splice site predictions. Our method uses a training set of spliced reads with quality information and known alignments. It uses a large margin approach similar to support vector machines to estimate its parameters to maximize alignment accuracy. In computational experiments, we illustrate that the quality information as well as the splice site predictions help to improve the alignment quality. Finally, to facilitate mapping of massive amounts of sequencing data typically generated by the new technologies, we have combined our method with a fast mapping pipeline based on enhanced suffix arrays. Our algorithms were optimized and tested using reads produced with the Illumina Genome Analyzer for the model plant Arabidopsis thaliana. Availability: Datasets for training and evaluation, additional results and a stand-alone alignment tool implemented in C++ and Python are available at http://www.fml.mpg.de/raetsch/projects/qpalma. Contact: Gunnar.Raetsch@tuebingen.mpg.de

157 citations


Patent
Kar-Han Tan, Tomio Sonehara
09 Dec 2008
TL;DR: In this article, a display pipeline of user-supplied image modification processing modules is reduced by first representing the processing modules as multiple, individual matrix operations, which are then combined with (i.e., multiplied into) the transformation matrix to create a modified transformation matrix.
Abstract: A projection system uses a transformation matrix to transform a projection image p in such a manner as to compensate for surface irregularities on a projection surface. The transformation matrix makes use of properties of light transport relating a projector to a camera. A display pipeline of user-supplied image modification processing modules is reduced by first representing the processing modules as multiple, individual matrix operations. All the matrix operations are then combined with, i.e., multiplied into, the transformation matrix to create a modified transformation matrix. The modified transformation matrix is then used in place of the original transformation matrix to simultaneously achieve both image transformation and any pre- and post-image processing defined by the image modification processing modules.
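The folding idea is plain linear algebra: if every pre/post module can be expressed as a linear operator, the whole display pipeline collapses into one precomputed matrix. A small sketch under that assumption follows; the matrices here are tiny random stand-ins, whereas a real light-transport-derived transform is enormous.

```python
# Folding a display pipeline of linear modules into one matrix, as the
# patent describes. T stands in for the light-transport-based transform;
# pre/post stand in for user-supplied modules. All values are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
T = rng.random((4, 4))        # transformation matrix (stand-in)
pre = rng.random((4, 4))      # module applied before T
post = rng.random((4, 4))     # module applied after T
p = rng.random(4)             # a (flattened) projection image

# Naive pipeline: three passes over the image.
out_pipeline = post @ (T @ (pre @ p))

# Folded pipeline: one modified transformation matrix, one pass.
T_mod = post @ T @ pre
out_folded = T_mod @ p

assert np.allclose(out_pipeline, out_folded)
```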

149 citations


Journal ArticleDOI
01 Aug 2008
TL;DR: A streaming multigrid solver is developed, which needs just two sequential passes over out-of-core data and can outperform spatially adaptive solvers that exploit application-specific knowledge.
Abstract: We introduce a new tool to solve the large linear systems arising from gradient-domain image processing. Specifically, we develop a streaming multigrid solver, which needs just two sequential passes over out-of-core data. This fast solution is enabled by a combination of three techniques: (1) use of second-order finite elements (rather than traditional finite differences) to reach sufficient accuracy in a single V-cycle, (2) temporally blocked relaxation, and (3) multi-level streaming to pipeline the restriction and prolongation phases into single streaming passes. A key contribution is the extension of the B-spline finite-element method to be compatible with the forward-difference gradient representation commonly used with images. Our streaming solver is also efficient for in-memory images, due to its fast convergence and excellent cache behavior. Remarkably, it can outperform spatially adaptive solvers that exploit application-specific knowledge. We demonstrate seamless stitching and tone-mapping of gigapixel images in about an hour on a notebook PC.

141 citations


Journal ArticleDOI
TL;DR: A novel MINLP formulation based on a continuous time representation for the scheduling of multiproduct pipeline systems that must supply multiple consumer markets, which considers that the pipeline operates intermittently and that the pumping costs depend on the booster stations' yield rates, which in turn may generate different flow rates.

122 citations


Journal ArticleDOI
TL;DR: The pipeline deals with architectures that are made of planar faces and faithfully constructs a polyhedron of low complexity based on the incomplete scans, which offers a convenient user interface but minimizes the necessity of user intervention.
Abstract: We present a pipeline to reconstruct complete geometry of architectural buildings from point clouds obtained by sparse range laser scanning. Due to limited accessibility of outdoor environments, complete and sufficient scanning of every face of an architectural building is often impossible. Our pipeline deals with architectures that are made of planar faces and faithfully constructs a polyhedron of low complexity based on the incomplete scans. The pipeline first recognizes planar regions based on point clouds, then proceeds to compute plane intersections and corners (in this paper, we use the informal terms corner or vertex corner to stand for a polyhedron vertex. See the Overview section for notation declarations), and finally produces a complete polyhedron. Within the pipeline, several algorithms based on the polyhedron geometry assumption are designed to perform data clustering, boundary detection, and face extraction. Our system offers a convenient user interface but minimizes the necessity of user intervention. We demonstrate the capability and advantage of our system by modeling real-life buildings.
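One concrete step in such a pipeline is recovering a polyhedron corner as the intersection of three fitted planes. A minimal sketch of that computation follows; the plane parameters below are invented, whereas in the paper they come from fitting planar regions to the laser scans.

```python
# Corner recovery: intersect three planes n_i . x = d_i by solving the
# 3x3 linear system N x = d. A near-singular N means the planes are
# close to parallel and the recovered corner is unreliable.
import numpy as np

N = np.array([[1.0, 0.0, 0.0],    # plane x = 2
              [0.0, 1.0, 0.0],    # plane y = 3
              [0.0, 0.0, 1.0]])   # plane z = 1
d = np.array([2.0, 3.0, 1.0])

corner = np.linalg.solve(N, d)
print(corner)                     # [2. 3. 1.]
```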

117 citations


01 Mar 2008
TL;DR: The Ames Stereo Pipeline (ASP) as discussed by the authors is an automated stereo processing software system that is capable of generating high quality digital terrain models (DTMs) from orbital imagery using a fully automated process.
Abstract: Introduction: The Mars Orbiter Laser Altimeter (MOLA) has significantly advanced the study of the Martian surface by providing geologists with a highly accurate elevation map of the entire planet [1]. However, its limited resolution (463 m/pixel at the equator) and localized interpolation artifacts have rendered it insufficient for detailed studies of specific sites, e.g., geologic stratification and deposition analysis, or, in the case of mission planning, landing site selection. The most common technique for obtaining higher-resolution digital terrain models (DTMs) is to employ stereogrammetric techniques; however, the substantial number of man-hours and resources required for this approach has meant that relatively few of these data products have reached the scientific community. To address this problem, the Intelligent Robotics Group (IRG) at NASA Ames Research Center has developed an automated stereo processing software system, the Ames Stereo Pipeline (ASP), that is capable of generating high quality DTMs from orbital imagery using a fully automated process [2]. Approach: The image processing pipeline for the ASP can be broken down as follows. First, images are pre-processed by applying the "Sign of the Laplacian of the Gaussian" (SLOG) filter introduced by Nishihara [3]. This filter is a composition of Laplacian and Gaussian filters followed by a threshold step, which results in increased robustness to lighting variation in the stereo pair. For dense stereo correlation, the ASP implements a fast area-based sum of absolute differences (SOAD) correlation algorithm. The resulting disparity map encodes the offsets between matching pixels in the stereo pair. Several versions of this correlator are available, including a multi-scale implementation which first computes a low-resolution stereo disparity map that is then refined at successively higher levels of detail until the native resolution of the source images has been reached. Further performance is gained by adaptively partitioning the stereo images into tiles to minimize the disparity search range for any given tile. A final 3D point cloud is calculated from the disparity map by computing the closest point of intersection of two rays emanating from the cameras through the matched pixels. The ASP includes several camera models that describe the geometry of various imagers, including an adaptation of the linear push-broom model [4] of line-scan imagers, a geometry found in many modern orbiting camera platforms. Several final data products can be generated from the 3D point cloud, including 3D triangle meshes (e.g., VRML models) and ortho-rectified, map-projected DTMs and camera imagery. Results: Typical processing times are on the order of minutes to tens of minutes depending on the resolution of the images. A level of quality assessment and control that is useful for many applications can be achieved with minimal human involvement. The ASP is being used in existing collaborations with Malin Space Science Systems (MSSS) and the US Geological Survey (USGS) to generate DTMs from the Mars Orbiter Camera Narrow Angle (MOC-NA) (Figures 1 and 2), the MRO Context Camera (CTX), the High Resolution Stereo Camera (HRSC), and the Apollo Panoramic & Metric Cameras (Figure 3). Current and Future Activities: As work on the Ames Stereo Pipeline continues, our focus will be on integration, validation, and scalability.
In particular, we have begun work in the following areas: Integration with widely adopted cartographic software: The USGS Integrated Software for Imagers and Spectrometers (ISIS) package is widely used in the planetary science community for processing raw spacecraft imagery into high-level data products of scientific interest, such as map-projected and mosaicked imagery [5,6]. We are enabling the ASP to read ISIS image files and to utilize ISIS camera models, thereby allowing scientists to prepare data for stereo processing using a familiar tool-chain and peer-reviewed camera photometric and geometric calibration.
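To make the first two pipeline stages concrete, here is a toy sketch of the SLOG pre-filter and a brute-force SAD scan for one pixel's disparity. The real ASP uses a tiled multi-scale correlator; the window size, search range, and synthetic images below are illustrative only.

```python
# SLOG pre-filter (sign of the Laplacian of Gaussian -> binary image,
# robust to lighting changes) followed by a naive per-pixel SAD search
# along the scanline. Parameters are illustrative, not ASP defaults.
import numpy as np
from scipy.ndimage import gaussian_laplace

def slog(img, sigma=1.5):
    return (gaussian_laplace(img.astype(float), sigma) > 0).astype(np.uint8)

def sad_disparity(left, right, row, col, win=5, max_disp=16):
    h = win // 2
    patch = left[row-h:row+h+1, col-h:col+h+1]
    best, best_d = np.inf, 0
    for dsp in range(max_disp):               # search along the scanline
        c = col - dsp
        if c - h < 0:
            break
        cand = right[row-h:row+h+1, c-h:c+h+1]
        cost = np.abs(patch.astype(int) - cand.astype(int)).sum()
        if cost < best:
            best, best_d = cost, dsp
    return best_d

left = np.random.rand(64, 64)
right = np.roll(left, -3, axis=1)             # synthetic 3-pixel disparity
L, R = slog(left), slog(right)
print(sad_disparity(L, R, row=32, col=40))    # expect ~3
```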

111 citations


Journal ArticleDOI
TL;DR: A systematic SBST methodology that enhances existing SBST programs so that they comprehensively test the pipeline logic, and applies it to two complex benchmark RISC processors with respect to two fault models: stuck-at fault model and transition delay fault model.
Abstract: Software-based self-test (SBST) has recently emerged as an effective methodology for the manufacturing test of processors and other components in systems-on-chip (SoCs). By moving test related functions from external resources to the SoC's interior, in the form of test programs that the on-chip processor executes, SBST significantly reduces the need for high-cost, big-iron testers, and enables high-quality at-speed testing and performance binning. Thus far, SBST approaches have focused almost exclusively on the functional (programmer visible) components of the processor. In this paper, we analyze the challenges involved in testing an important component of modern processors, namely, the pipelining logic, and propose a systematic SBST methodology to address them. We first demonstrate that SBST programs that only target the functional components of the processor are not sufficient to test the pipeline logic, resulting in a significant loss of overall processor fault coverage. We further identify the testability hotspots in the pipeline logic using two fully pipelined reduced instruction set computer (RISC) processor benchmarks. Finally, we develop a systematic SBST methodology that enhances existing SBST programs so that they comprehensively test the pipeline logic. The proposed methodology is complementary to previous SBST techniques that target functional components (their results can form the input to our methodology, and thus we can reuse the test development effort behind preexisting SBST programs). We automate our methodology and incorporate it in an integrated software environment (developed using Java, XML, and archC) for the automatic generation of SBST routines for microprocessors. We apply the methodology to the two complex benchmark RISC processors with respect to two fault models: stuck-at fault model and transition delay fault model. Simulation results show that our methodology provides significant improvements for the two fault models, both for the entire processor (12% fault coverage improvement on average) and for the pipeline logic itself (19% fault coverage improvement on average), compared to a conventional SBST approach.

109 citations


Proceedings ArticleDOI
13 Apr 2008
TL;DR: A parallel SRAM-based multi-pipeline architecture for terabit IP lookup with a two-level mapping scheme, which can store a core routing table with over 200K unique routing prefixes using 3.5 MB of memory.

Abstract: Continuous growth in network link rates poses a strong demand on high speed IP lookup engines. While Ternary Content Addressable Memory (TCAM) based solutions serve most of today's high-end routers, they do not scale well for the next generation. On the other hand, pipelined SRAM-based algorithmic solutions become attractive. Intuitively, multiple pipelines can be utilized in parallel to have a multiplicative effect on the throughput. However, several challenges must be addressed for such solutions to realize high throughput. First, the memory distribution across different stages of each pipeline as well as across different pipelines must be balanced. Second, the traffic on various pipelines should be balanced. In this paper, we propose a parallel SRAM-based multi-pipeline architecture for terabit IP lookup. To balance the memory requirement over the stages, a two-level mapping scheme is presented. By trie partitioning and subtrie-to-pipeline mapping, we ensure that each pipeline contains an approximately equal number of trie nodes. Then, within each pipeline, a fine-grained node-to-stage mapping is used to achieve evenly distributed memory across the stages. To balance the traffic on different pipelines, both pipelined prefix caching and dynamic subtrie-to-pipeline remapping are employed. Simulation using real-life data shows that the proposed architecture with 8 pipelines can store a core routing table with over 200K unique routing prefixes using 3.5 MB of memory. It achieves a throughput of up to 3.2 billion packets per second, i.e., 1 Tbps for minimum size (40 bytes) packets.
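The data structure behind such pipelines is a binary trie over IP prefixes, where each trie level naturally maps to one pipeline stage (one memory access per stage). The sketch below shows only that per-level walk; the paper's partitioning, remapping, and caching machinery is omitted.

```python
# Binary trie longest-prefix match. Each loop iteration of the lookup
# touches one trie level, which is what a hardware pipeline assigns to
# one stage. Prefixes/next-hops below are toy examples.

class TrieNode:
    __slots__ = ("children", "next_hop")
    def __init__(self):
        self.children = [None, None]
        self.next_hop = None

def insert(root, prefix_bits, next_hop):
    node = root
    for b in prefix_bits:                 # prefix given as a bit string
        i = int(b)
        if node.children[i] is None:
            node.children[i] = TrieNode()
        node = node.children[i]
    node.next_hop = next_hop

def longest_prefix_match(root, addr_bits):
    node, best = root, None
    for b in addr_bits:                   # each iteration = one stage
        if node.next_hop is not None:
            best = node.next_hop
        node = node.children[int(b)]
        if node is None:
            return best
    if node.next_hop is not None:
        best = node.next_hop
    return best

root = TrieNode()
insert(root, "10", "A")
insert(root, "1011", "B")
print(longest_prefix_match(root, "10110011"))  # B (deepest match wins)
print(longest_prefix_match(root, "10000000"))  # A
```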

Patent
23 Sep 2008
TL;DR: In this paper, a processor core implements n virtual processors, a pipeline having p ordered stages, including a memory operation stage, and a virtual processor selector function, together with an ordered set of m memory banks including a first and a last memory bank.
Abstract: A parallel processing computing system includes an ordered set of m memory banks and a processor core. The ordered set of m memory banks includes a first and a last memory bank, wherein m is an integer greater than 1. The processor core implements n virtual processors, a pipeline having p ordered stages, including a memory operation stage, and a virtual processor selector function.

Journal ArticleDOI
TL;DR: The prototype ADC achieves low-power consumption and small die area by sharing an opamp between two successive pipeline stages by completely merging the front-end sample-and-hold amplifier into the first multiplying digital-to-analog converter (MDAC) using the proposed opamp and capacitor sharing technique.
Abstract: A low-power 14-b 100-MS/s analog-to-digital converter (ADC) is described. The prototype ADC achieves low-power consumption and small die area by sharing an opamp between two successive pipeline stages. Further reduction of power and area is achieved by completely merging the front-end sample-and-hold amplifier (SHA) into the first multiplying digital-to-analog converter (MDAC) using the proposed opamp and capacitor sharing technique. The ADC, implemented in a 0.18-μm dual-gate-oxide (DGO) CMOS technology, achieves 72.4-dB signal-to-noise and distortion ratio, 88.5-dB spurious free dynamic range, and 11.7 effective number of bits at full sampling rate with a 46-MHz input while consuming 230-mW from a 3-V supply.

Journal Article
TL;DR: In this article, the authors proposed a new architecture of multiplier-and-accumulator (MAC) for high-speed multiplication and accumulation arithmetic; by combining multiplication with accumulation and devising a hybrid type of carry save adder (CSA), the performance was improved.
Abstract: In this paper, we propose a new architecture of multiplier-and-accumulator (MAC) for high-speed multiplication and accumulation arithmetic. By combining multiplication with accumulation and devising a hybrid type of carry save adder (CSA), the performance was improved. Since the accumulator, which has the largest delay in a MAC, was removed and its function folded into the CSA, the overall performance is improved. The proposed CSA tree uses a 1's-complement-based radix-2 modified Booth algorithm (MBA) and has a modified array for the sign extension in order to increase the bit density of the operands. The CSA propagates the carries by the least significant bits of the partial products and generates the least significant bits in advance to decrease the number of input bits of the final adder. Also, the proposed MAC accumulates the intermediate results as sum and carry bits rather than as outputs of the final adder, improving performance by optimizing the efficiency of the pipeline scheme. The proposed architecture was designed and synthesized with a 90nm standard CMOS library. We analyzed results such as hardware resources, delay, and pipelining based on theoretical and experimental estimation, using Sakurai's alpha-power law for the delay model. The proposed MAC has properties superior to the standard design in many ways, and its performance is roughly twice that of previous research at a similar clock frequency.
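The carry-save idea the MAC exploits is easy to demonstrate: keep the running accumulator as a (sum, carry) pair so each accumulation step costs only bitwise operations with no carry propagation, and resolve the carries with a single full addition at the very end. A minimal sketch, with invented operand values:

```python
# Carry-save accumulation: a 3:2 compressor (CSA) keeps the accumulator
# as a redundant (sum, carry) pair; only the final step propagates carries.

def csa(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 compressor: returns (s, carry) with a + b + c == s + carry."""
    s = a ^ b ^ c
    carry = ((a & b) | (b & c) | (a & c)) << 1
    return s, carry

def mac(pairs):
    s, c = 0, 0
    for x, y in pairs:
        # In hardware the product itself would remain as partial products;
        # here we use Python's multiply and keep only the accumulator in
        # carry-save form.
        s, c = csa(s, c, x * y)
    return s + c          # one final carry-propagate add

data = [(3, 4), (5, 6), (7, 8)]
assert mac(data) == sum(x * y for x, y in data)   # 98
print(mac(data))
```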

Journal ArticleDOI
TL;DR: This paper implements the FastICA algorithm in a field-programmable gate array (FPGA), with the ability to process sequential mixed signals in real time through the proposed pipelined FastICA architecture, and demonstrates the effectiveness of the presented hardware FastICA.
Abstract: The fast independent component analysis (FastICA) algorithm separates independent sources from their mixtures by measuring non-Gaussianity. FastICA is a common offline method for identifying artifacts and interference in mixtures such as electroencephalogram (EEG), magnetoencephalography (MEG), and electrocardiogram (ECG) signals. It is therefore valuable to implement FastICA for real-time signal processing. In this paper, the FastICA algorithm is implemented in a field-programmable gate array (FPGA), with the ability to process sequential mixed signals in real time through the proposed pipelined FastICA architecture. Moreover, in order to increase numerical precision, hardware floating-point (FP) arithmetic units were implemented in the hardware FastICA. In addition, the proposed pipelined FastICA provides a high sampling rate (192 kHz) capability through hand-coding the hardware FastICA in a hardware description language (HDL). To verify the features of the proposed hardware FastICA, simulations are first performed; then real-time signal processing experimental results are presented using the fabricated platform. Experimental results demonstrate the effectiveness of the presented hardware FastICA as expected.
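For reference, the fixed-point update that implementations like this pipeline is compact. Below is a one-unit FastICA sketch with the common g = tanh nonlinearity, assuming the mixed signals are already centered and whitened; the mixing matrix and sources are invented test data, not from the paper.

```python
# One-unit FastICA fixed-point iteration (g = tanh) on pre-whitened data.
import numpy as np

def fastica_one_unit(X, iters=100, seed=0):
    """X: components x samples, centered and whitened."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(X.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(iters):
        wx = w @ X                                   # projected signal
        g, g_prime = np.tanh(wx), 1.0 - np.tanh(wx) ** 2
        w_new = (X * g).mean(axis=1) - g_prime.mean() * w
        w_new /= np.linalg.norm(w_new)
        if abs(w_new @ w) > 1 - 1e-9:                # converged (up to sign)
            return w_new
        w = w_new
    return w

# Two toy sources, mixed, centered, whitened:
t = np.linspace(0, 8 * np.pi, 2000)
S = np.vstack([np.sign(np.sin(3 * t)), np.sin(2 * t)])
X = np.array([[1.0, 0.5], [0.4, 1.0]]) @ S
X -= X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(X))                     # whitening transform
X_white = E @ np.diag(d ** -0.5) @ E.T @ X
w = fastica_one_unit(X_white)
print(w @ X_white)    # approximately one source, up to sign and scale
```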

Journal ArticleDOI
Charles F. Webb1
TL;DR: This article focuses on the high-frequency design techniques used to achieve a 4.4-GHz system, and on the pipeline design that optimizes z10's CPU performance.
Abstract: The IBM system z10 includes four microprocessor cores - each with a private 3-Mbyte cache - and integrated accelerators for decimal floating-point computation, cryptography, and data compression. A separate SMP hub chip provides a shared third-level cache and interconnect fabric for multiprocessor scaling. This article focuses on the high-frequency design techniques used to achieve a 4.4-GHz system, and on the pipeline design that optimizes z10's CPU performance.

Patent
31 Jan 2008
TL;DR: In this article, the number of instances of the graphics processing operations needed to process the set of plural sampling points which the fragment represents is reduced in comparison to conventional multisampling graphics processing techniques, which perform graphics processing operations for fragments on a per-sample basis.
Abstract: A graphics processing pipeline determines whether respective graphics processing operations, such as respective blends, respective depth tests, etc., to be performed at a stage of the graphics processing pipeline would produce the same result for each sampling point of a set of plural sampling points represented by a fragment being processed by the graphics processing pipeline. If it is determined that respective graphics processing operations would produce the same result for each of the sampling points, then only a single instance of the graphics processing operation is performed and the result of that graphics processing operation is associated with each of the sampling points. The number of instances of the graphics processing operations needed to process the set of plural sampling points which the fragment represents is reduced in comparison to conventional multisampling graphics processing techniques which perform graphics processing operations for fragments on a “per sample” basis. The determination of whether or not the same result would be produced for each sampling point of the set of plural sampling points is facilitated by providing metadata which indicates whether or not fragment data and/or stored sample data for use when processing the sampling points is the same.

Journal ArticleDOI
TL;DR: In this paper, an efficient VLSI architecture of a pipeline fast Fourier transform (FFT) processor capable of producing the normal output order sequence is presented, together with a sequence conversion method that integrates the conversion function into the last-stage data commutator module.
Abstract: In this paper, an efficient VLSI architecture of a pipeline fast Fourier transform (FFT) processor capable of producing the normal output order sequence is presented. A new FFT design based on the decimated dual-path delay feed-forward data commutator unit by splitting the input stream into two half-word streams is first proposed. The resulting architecture can achieve full hardware efficiency such that the required number of adders can be reduced by half. Next, in order to generate the normal output order sequence, this paper also presents a sequence conversion method by integrating the conversion function into the last-stage data commutator module.

Proceedings ArticleDOI
18 Jun 2008
TL;DR: Simulations indicate that the architecture and circuitry are well suited to scaling below 90 nm, and the prototype ADC, implemented in 0.18 mum CMOS, provides 10.65 ENOB at 250 MS/s while consuming only 140 mW, yielding an exceptionally low FoM of 0.28 pJ/conversion-step.
Abstract: A 13-bit ADC is implemented using a novel charge-domain architecture. Enhanced bucket-brigade circuitry and a tapered charge pipeline provide precision charge-domain operation in a standard CMOS process, while eliminating the need for signal-path op-amps. The prototype ADC, implemented in 0.18 μm CMOS, provides 10.65 ENOB at 250 MS/s while consuming only 140 mW, yielding an exceptionally low FoM of 0.28 pJ/conversion-step. Simulations indicate that the architecture and circuitry are well suited to scaling below 90 nm.

01 Jan 2008
TL;DR: A novel flexible multiprocessor platform for high throughput turbo decoding that enables exploiting all parallelism levels of turbo decoding applications to fulfill performance requirements and is reusable for all simple and double binary turbo codes of existing and emerging standards.
Abstract: Emerging digital communication applications and the underlying architectures encounter drastically increasing performance and flexibility requirements. In this paper, we present a novel flexible multiprocessor platform for high throughput turbo decoding. The proposed platform enables exploiting all parallelism levels of turbo decoding applications to fulfill performance requirements. In order to fulfill flexibility requirements, the platform is structured around configurable application-specific instruction-set processors (ASIP) combined with an efficient memory and communication interconnect scheme. The designed ASIP has a single instruction multiple data (SIMD) architecture with a specialized and extensible instruction-set and 6-stage pipeline control. The attached memories and communication interfaces enable its integration in multiprocessor architectures. These multiprocessor architectures benefit from the recent shuffled decoding technique introduced in the turbo-decoding field to achieve higher throughput. The major characteristics of the proposed platform are its flexibility and scalability, which make it reusable for all simple and double binary turbo codes of existing and emerging standards. Results obtained for double binary WiMAX turbo codes demonstrate around 250 Mb/s throughput using a 16-ASIP multiprocessor architecture.

Journal ArticleDOI
TL;DR: This work describes a novel superpipelined, fully parallelized architecture for optical-flow processing, which is capable of processing up to 170 frames per second at a resolution of 800x600 pixels, and discusses the advantages of high-frame-rate processing.

Journal ArticleDOI
TL;DR: An integer programming approach to oil derivative transportation scheduling is presented, which aims to meet market demands while satisfying many pipeline operational constraints such as minimum interfaces.
Abstract: This paper presents an integer programming approach to oil derivative transportation scheduling. The system reported is composed of an oil refinery, one multi-branch multi-product pipeline connected to several depots and also local consumer markets which receive large amounts of refinery products. Batches of refined products and grades are pumped back-to-back in the pipeline, without any separation device between them. The sequence and lengths of such pumping runs should be carefully selected in order to meet market demands while satisfying many pipeline operational constraints such as minimum interfaces.

Proceedings ArticleDOI
13 Jun 2008
TL;DR: This paper focuses on the CPU efficiency tradeoffs of tuple representations inside the query execution engine, while tuples flow through a processing pipeline, and analyzes the performance in the context of query engines using so-called "block-oriented" processing.
Abstract: Comparisons between the merits of row-wise storage (NSM) and columnar storage (DSM) are typically made with respect to the persistent storage layer of database systems. In this paper, however, we focus on the CPU efficiency tradeoffs of tuple representations inside the query execution engine, while tuples flow through a processing pipeline. We analyze the performance in the context of query engines using so-called "block-oriented" processing -- a recently popularized technique that can strongly improve the CPU efficiency. With this high efficiency, the performance trade-offs between NSM and DSM can have a decisive impact on the query execution performance, as we demonstrate using both microbenchmarks and TPC-H query 1. This means that NSM-based database systems can sometimes benefit from converting tuples into DSM on-the-fly, and vice versa.
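A toy illustration of the layout trade-off the paper analyzes: the same table stored row-wise (NSM, a list of tuples) and column-wise (DSM, one array per attribute). Scanning a single attribute touches far less memory in DSM and vectorizes cleanly, which is the effect block-oriented engines exploit; the timings below are indicative only and nothing here reproduces the paper's engine.

```python
# NSM vs DSM single-attribute scan in miniature.
import time
import numpy as np

n = 1_000_000
rows = [(i, float(i) * 0.5, i % 100) for i in range(n)]   # NSM layout
cols = {"id": np.arange(n),                               # DSM layout
        "price": np.arange(n, dtype=float) * 0.5,
        "qty": np.arange(n) % 100}

t0 = time.perf_counter()
s_nsm = sum(r[1] for r in rows)        # pick the attribute out of each tuple
t1 = time.perf_counter()
s_dsm = cols["price"].sum()            # one contiguous, vectorized scan
t2 = time.perf_counter()

assert abs(s_nsm - s_dsm) < 1e-6 * s_nsm
print(f"NSM scan: {t1-t0:.3f}s  DSM scan: {t2-t1:.3f}s")
```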

Journal ArticleDOI

Journal ArticleDOI
TL;DR: In this article, the authors present linear models of the most common components in the value chain for CO2 capture and storage for new gas power plants, formulated consistently with current models for gas, electricity, and heat infrastructures.

Abstract: This paper presents linear models of the most common components in the value chain for CO2 capture and storage. The optimal investment planning of new gas power plants traditionally includes the cost of fuel versus sales of electricity and heat from the plant. If a new power plant also causes additional investments in gas infrastructure, these should be included in the optimization. With the increasing focus on global CO2 emissions, yet another aspect is introduced in the form of technology and infrastructure for capture, transport, and storage of CO2. To be able to include all these aspects in the planning of new power plants, linear models for CO2 capture and storage are formulated consistent with current models for gas, electricity, and heat infrastructures. This paper presents models for the following infrastructure: CO2 source, combined cycle gas turbine producing electricity, heat and exhaust, CO2 capture plant, pipeline, liquefaction plant, CO2 storage, ship transport, injection pump, and demand/market.

Journal ArticleDOI
TL;DR: This paper describes techniques for accelerating the string set matching problem, with particular emphasis on applications in computational proteomics, by splitting the traditional Aho-Corasick finite state machine into smaller FSMs, each further divided into five simpler FSMs that operate on a single bit position of the input.

Abstract: Background: This paper describes techniques for accelerating the performance of the string set matching problem with particular emphasis on applications in computational proteomics. The process of matching peptide sequences against a genome translated in six reading frames is part of a proteogenomic mapping pipeline that is used as a case study. The Aho-Corasick algorithm is adapted for execution in field programmable gate array (FPGA) devices in a manner that optimizes space and performance. In this approach, the traditional Aho-Corasick finite state machine (FSM) is split into smaller FSMs, operating in parallel, each of which matches up to 20 peptides in the input translated genome. Each of the smaller FSMs is further divided into five simpler FSMs such that each simple FSM operates on a single bit position in the input (five bits are sufficient for representing all amino acids and special symbols in protein sequences).

Journal ArticleDOI
08 Mar 2008 - Fly
TL;DR: A data pipeline developed to extract quantitative data on segmentation gene expression from confocal images of gene expression patterns in Drosophila was successfully applied to obtain quantitative gene expression data at cellular resolution in space and 6.5-minute resolution in time.
Abstract: We describe a data pipeline developed to extract the quantitative data on segmentation gene expression from confocal images of gene expression patterns in Drosophila. The pipeline consists of five steps: image segmentation, background removal, temporal characterization of an embryo, data registration and data averaging. This pipeline was successfully applied to obtain quantitative gene expression data at cellular resolution in space and at the 6.5-minute resolution in time, as well as to construct a spatiotemporal atlas of segmentation gene expression. Each data pipeline step can be easily adapted to process a wide range of images of gene expression patterns.

Proceedings ArticleDOI
01 Feb 2008
TL;DR: A low-power 1.2 V pipelined ADC is implemented in a 65 nm CMOS process to achieve 10b resolution at 100 MS/s based on the use of a dedicated thin-oxide high-performance analog (HPA) MOS transistor.
Abstract: A low-power 1.2 V pipelined ADC is implemented in a 65 nm CMOS process to achieve 10b resolution at 100 MS/s based on the use of a dedicated thin-oxide high-performance analog (HPA) MOS transistor. The pipeline ADC is composed of eight 1.5b pipelined stages followed by a 2b flash converter as the last stage. In order to optimize the power consumption, the capacitances and the bias current of each stage have been scaled down along the pipeline chain. Measurement results of this ADC revealed a SNDR of 59 dB with a power dissipation of 4.5 mW. The core occupies 0.07 mm², and 0.1 mm² with the reference.
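A behavioral sketch of the 1.5-bit-per-stage scheme mentioned above may help: each stage compares its input against ±Vref/4, emits a digit in {-1, 0, +1}, and hands an amplified residue to the next stage. The thresholds and digit recombination below are the standard textbook scheme, not values taken from this chip.

```python
# Behavioral model of a 1.5-bit/stage pipeline ADC (textbook scheme).
VREF = 1.0

def stage_15b(v):
    """One 1.5-bit stage: digit decision plus residue amplification."""
    if v > VREF / 4:
        d = 1
    elif v < -VREF / 4:
        d = -1
    else:
        d = 0
    return d, 2 * v - d * VREF      # residue handed to the next stage

def convert(vin, n_stages=8):
    digits, v = [], vin
    for _ in range(n_stages):
        d, v = stage_15b(v)
        digits.append(d)
    # Digit recombination: vin ~= VREF * sum(d_i / 2^i); the redundant
    # {-1,0,+1} digit set is what tolerates comparator offsets.
    return VREF * sum(d / 2 ** (i + 1) for i, d in enumerate(digits))

for vin in (0.3, -0.45, 0.05):
    print(vin, "->", round(convert(vin), 4))   # within VREF/2^8 of vin
```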

Proceedings ArticleDOI
08 Jun 2008
TL;DR: This work presents a parallel transient simulation methodology and its multi-threaded implementation for general analog and digital ICs, and exploits coarsegrained application-level parallelism by simultaneously computing circuit solutions at multiple adjacent time points in a way resembling hardware pipelining.
Abstract: While the emergence of multi-core shared-memory machines offers a promising computing solution to ever more complex chip design problems, new parallel CAD methodologies must be developed to gain the full benefit of these increasingly parallel computing systems. We present a parallel transient simulation methodology and its multi-threaded implementation for general analog and digital ICs. Our new approach, Waveform Pipelining (abbreviated as WavePipe), exploits coarse-grained application-level parallelism by simultaneously computing circuit solutions at multiple adjacent time points in a way resembling hardware pipelining. There are two embodiments in WavePipe: backward and forward pipelining schemes. While the former creates independent computing tasks that contribute to a larger future time step by moving backwards in time, the latter performs predictive computing along the forward direction of the time axis. Unlike existing relaxation methods, WavePipe facilitates parallel circuit simulation without jeopardizing convergence and accuracy. As a coarse-grained parallel approach, WavePipe not only requires low parallel programming effort; more importantly, it creates new avenues to fully utilize increasingly parallel hardware by going beyond conventional finer-grained parallel device model evaluation and matrix solutions.

Proceedings ArticleDOI
14 Apr 2008
TL;DR: An IP lookup rate of 325 MLPS (million lookups per second) is achieved using BiOLP, a novel SRAM-based bidirectional optimized linear pipeline architecture for tree-based search engines in IP routers on a Field Programmable Gate Array, which with caching can achieve a throughput of up to 1.3 GLPS.
Abstract: Internet Protocol (IP) lookup in routers can be implemented by some form of tree traversal. Pipelining can dramatically improve the search throughput. However, it results in unbalanced memory allocation over the pipeline stages. This has been identified as a major challenge for pipelined solutions. In this paper, an IP lookup rate of 325 MLPS (million lookups per second) is achieved using a novel SRAM-based bidirectional optimized linear pipeline architecture on a Field Programmable Gate Array, named BiOLP, for tree-based search engines in IP routers. BiOLP can also achieve a perfectly balanced memory distribution over the pipeline stages. Moreover, by employing caching to exploit Internet traffic locality, BiOLP can achieve a high throughput of up to 1.3 GLPS (billion lookups per second). It also maintains packet input order, and supports route updates without blocking subsequent incoming packets.