
Showing papers on "Pipeline (computing)" published in 2016


Journal ArticleDOI
18 Jun 2016
TL;DR: This work explores an in-situ processing approach, where memristor crossbar arrays not only store input weights, but are also used to perform dot-product operations in an analog manner.
Abstract: A number of recent efforts have attempted to design accelerators for popular machine learning algorithms, such as those involving convolutional and deep neural networks (CNNs and DNNs). These algorithms typically involve a large number of multiply-accumulate (dot-product) operations. A recent project, DaDianNao, adopts a near data processing approach, where a specialized neural functional unit performs all the digital arithmetic operations and receives input weights from adjacent eDRAM banks. This work explores an in-situ processing approach, where memristor crossbar arrays not only store input weights, but are also used to perform dot-product operations in an analog manner. While the use of crossbar memory as an analog dot-product engine is well known, no prior work has designed or characterized a full-fledged accelerator based on crossbars. In particular, our work makes the following contributions: (i) We design a pipelined architecture, with some crossbars dedicated for each neural network layer, and eDRAM buffers that aggregate data between pipeline stages. (ii) We define new data encoding techniques that are amenable to analog computations and that can reduce the high overheads of analog-to-digital conversion (ADC). (iii) We define the many supporting digital components required in an analog CNN accelerator and carry out a design space exploration to identify the best balance of memristor storage/compute, ADCs, and eDRAM storage on a chip. On a suite of CNN and DNN workloads, the proposed ISAAC architecture yields improvements of 14.8×, 5.5×, and 7.5× in throughput, energy, and computational density (respectively), relative to the state-of-the-art DaDianNao architecture.
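
To make the in-situ dot-product idea concrete, here is an illustrative numerical sketch (not ISAAC's circuit model; the matrix sizes, bit widths, and ADC resolution are hypothetical) of a crossbar whose column currents sum conductance-weighted inputs, with activations streamed one bit per cycle and each per-cycle analog sum digitized by an ADC before a digital shift-and-add, loosely mirroring the bit-serial encoding described above.

```python
import numpy as np

def crossbar_dot(weights, inputs, adc_bits=8):
    """Emulate an analog crossbar dot product with bit-serial input streaming.

    weights : (rows, cols) array of non-negative conductances
    inputs  : (rows,) array of unsigned integer activations
    """
    n_bits = int(np.max(inputs)).bit_length() or 1
    acc = np.zeros(weights.shape[1])
    for b in range(n_bits):
        bit_plane = (inputs >> b) & 1                # one voltage level per row
        column_current = bit_plane @ weights         # analog summation per column
        scale = (2 ** adc_bits - 1) / max(column_current.max(), 1e-12)
        digitized = np.round(column_current * scale) / scale   # ADC quantization
        acc += digitized * (1 << b)                  # shift-and-add in digital logic
    return acc

rng = np.random.default_rng(0)
W = rng.uniform(0, 1, (128, 8))          # hypothetical conductance matrix
x = rng.integers(0, 2 ** 16, 128)        # hypothetical 16-bit activations
# with a high-resolution ADC the emulated crossbar matches the exact product
print(np.allclose(crossbar_dot(W, x, adc_bits=16), x @ W, rtol=1e-3))
```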

1,558 citations


Book ChapterDOI
08 Oct 2016
TL;DR: This work introduces a novel Deep Network architecture that implements the full feature point handling pipeline, that is, detection, orientation estimation, and feature description, and shows how to learn to do all three in a unified manner while preserving end-to-end differentiability.
Abstract: We introduce a novel Deep Network architecture that implements the full feature point handling pipeline, that is, detection, orientation estimation, and feature description. While previous works have successfully tackled each one of these problems individually, we show how to learn to do all three in a unified manner while preserving end-to-end differentiability. We then demonstrate that our Deep pipeline outperforms state-of-the-art methods on a number of benchmark datasets, without the need of retraining.

878 citations


Posted Content
TL;DR: In this article, a novel deep network architecture is introduced that implements the full feature point handling pipeline, that is, detection, orientation estimation, and feature description, in a unified manner while preserving end-to-end differentiability.
Abstract: We introduce a novel Deep Network architecture that implements the full feature point handling pipeline, that is, detection, orientation estimation, and feature description. While previous works have successfully tackled each one of these problems individually, we show how to learn to do all three in a unified manner while preserving end-to-end differentiability. We then demonstrate that our Deep pipeline outperforms state-of-the-art methods on a number of benchmark datasets, without the need of retraining.

325 citations


Posted Content
TL;DR: An efficient, scalable feature extraction algorithm for time series, which filters the available features in an early stage of the machine learning pipeline with respect to their significance for the classification or regression task, while controlling the expected percentage of selected but irrelevant features.
Abstract: The all-relevant problem of feature selection is the identification of all strongly and weakly relevant attributes. This problem is especially hard to solve for time series classification and regression in industrial applications such as predictive maintenance or production line optimization, for which each label or regression target is associated with several time series and meta-information simultaneously. Here, we are proposing an efficient, scalable feature extraction algorithm for time series, which filters the available features in an early stage of the machine learning pipeline with respect to their significance for the classification or regression task, while controlling the expected percentage of selected but irrelevant features. The proposed algorithm combines established feature extraction methods with a feature importance filter. It has a low computational complexity, allows to start on a problem with only limited domain knowledge available, can be trivially parallelized, is highly scalable, and is based on well-studied non-parametric hypothesis tests. We benchmark our proposed algorithm on all binary classification problems of the UCR time series classification archive as well as time series from a production line optimization project and simulated stochastic processes with underlying qualitative change of dynamics.
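
As a rough illustration of the filtering idea described above, the sketch below extracts a few toy features per labelled time series, tests each feature's association with the binary label, and keeps only features passing a Benjamini-Hochberg threshold. The feature set and the Mann-Whitney U test are illustrative stand-ins and are not claimed to match the authors' exact choices.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def extract_features(series):
    # a few toy features per series (the real algorithm computes many more)
    return {"mean": series.mean(), "std": series.std(),
            "abs_energy": float((series ** 2).sum()), "maximum": series.max()}

def select_relevant(feature_table, labels, fdr=0.05):
    """Keep features whose distributions differ between the two classes,
    controlling the expected fraction of irrelevant selections (BH procedure)."""
    names = list(feature_table)
    pvals = []
    for name in names:
        values = np.asarray(feature_table[name])
        _, p = mannwhitneyu(values[labels == 0], values[labels == 1],
                            alternative="two-sided")
        pvals.append(p)
    order = np.argsort(pvals)
    m = len(pvals)
    keep = set()
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= fdr * rank / m:
            keep.update(order[:rank].tolist())    # BH: accept all up to this rank
    return [names[i] for i in sorted(keep)]

# toy data: class-1 series have a shifted mean
rng = np.random.default_rng(1)
series_list = [rng.normal(loc=(i % 2) * 0.8, size=200) for i in range(60)]
labels = np.array([i % 2 for i in range(60)])
feats = [extract_features(s) for s in series_list]
table = {name: [f[name] for f in feats] for name in feats[0]}
print("selected features:", select_relevant(table, labels))
```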

227 citations


Journal ArticleDOI
TL;DR: Wang et al., as discussed by the authors, presented a risk-based accident model to conduct quantitative risk analysis (QRA) for leakage failure of submarine pipelines, which can provide a more case-specific and realistic consequence analysis compared with the bow-tie method.

205 citations


Proceedings ArticleDOI
13 Aug 2016
TL;DR: A highly accurate SMART-based analysis pipeline that can correctly predict the necessity of a disk replacement even 10-15 days in advance and uses statistical techniques to automatically detect which SMART parameters correlate with disk replacement.
Abstract: Disks are among the most frequently failing components in today's IT environments. Despite a set of defense mechanisms such as RAID, the availability and reliability of the system are still often impacted severely. In this paper, we present a highly accurate SMART-based analysis pipeline that can correctly predict the necessity of a disk replacement even 10-15 days in advance. Our method has been built and evaluated on more than 30000 disks from two major manufacturers, monitored over 17 months. Our approach employs statistical techniques to automatically detect which SMART parameters correlate with disk replacement and uses them to predict the replacement of a disk with even 98% accuracy.
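
A hedged sketch of the two stages named above follows: first automatically pick SMART attributes whose values differ between soon-to-be-replaced and healthy disks, then train a classifier on those attributes. The simulated data, the Kolmogorov-Smirnov test, and the random-forest classifier are illustrative choices, not necessarily those of the paper.

```python
import numpy as np
from scipy.stats import ks_2samp
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
names = ["realloc_sectors", "pending_sectors", "seek_error_rate", "temperature"]
n = 4000
y = (rng.uniform(size=n) < 0.05).astype(int)   # ~5% of disks are later replaced
X = rng.normal(size=(n, len(names)))
X[y == 1, 0] += 2.0                            # failing disks reallocate sectors...
X[y == 1, 1] += 1.5                            # ...and accumulate pending sectors

# stage 1: detect which SMART attributes correlate with replacement
useful = [j for j in range(len(names))
          if ks_2samp(X[y == 0, j], X[y == 1, j]).pvalue < 0.01]
print("selected attributes:", [names[j] for j in useful])

# stage 2: predict replacement from the selected attributes only
X_tr, X_te, y_tr, y_te = train_test_split(X[:, useful], y, test_size=0.3,
                                          stratify=y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, class_weight="balanced",
                             random_state=0).fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```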

134 citations


Patent
05 Jan 2016
TL;DR: A system, method and apparatus for executing a sequence analysis pipeline on genetic sequence data includes a structured ASIC formed of a set of hardwired digital logic circuits that are interconnected by physical electrical interconnects.
Abstract: A system, method and apparatus for executing a sequence analysis pipeline on genetic sequence data includes a structured ASIC formed of a set of hardwired digital logic circuits that are interconnected by physical electrical interconnects. One of the physical electrical interconnects forms an input to the structured ASIC connected with an electronic data source for receiving reads of genomic data. The hardwired digital logic circuits are arranged as a set of processing engines, each processing engine being formed of a subset of the hardwired digital logic circuits to perform one or more steps in the sequence analysis pipeline on the reads of genomic data. Each subset of the hardwired digital logic circuits is formed in a wired configuration to perform the one or more steps in the sequence analysis pipeline.

124 citations


Posted Content
TL;DR: In this paper, a highway network architecture is proposed for computing the matching cost at each possible disparity, based on multilevel weighted residual shortcuts, trained with a hybrid loss that supports multi-level comparison of image patches.
Abstract: We present an improved three-step pipeline for the stereo matching problem and introduce multiple novelties at each stage. We propose a new highway network architecture for computing the matching cost at each possible disparity, based on multilevel weighted residual shortcuts, trained with a hybrid loss that supports multilevel comparison of image patches. A novel post-processing step is then introduced, which employs a second deep convolutional neural network for pooling global information from multiple disparities. This network outputs both the image disparity map, which replaces the conventional "winner takes all" strategy, and a confidence in the prediction. The confidence score is achieved by training the network with a new technique that we call the reflective loss. Lastly, the learned confidence is employed in order to better detect outliers in the refinement step. The proposed pipeline achieves state of the art accuracy on the largest and most competitive stereo benchmarks, and the learned confidence is shown to outperform all existing alternatives.

120 citations


Journal ArticleDOI
TL;DR: The multi-band template analysis (MBTA) pipeline as discussed by the authors is a low-latency coincident analysis pipeline for the detection of gravitational waves (GWs) from compact binary coalescences.
Abstract: The multi-band template analysis (MBTA) pipeline is a low-latency coincident analysis pipeline for the detection of gravitational waves (GWs) from compact binary coalescences. MBTA runs with a low computational cost, and can identify candidate GW events online with a sub-minute latency. The low computational running cost of MBTA also makes it useful for data quality studies. Events detected by MBTA online can be used to alert astronomical partners for electromagnetic follow-up. We outline the current status of MBTA and give details of recent pipeline upgrades and validation tests that were performed in preparation for the first advanced detector observing period. The MBTA pipeline is ready for the outset of the advanced detector era and the exciting prospects it will bring.

100 citations


Journal ArticleDOI
01 May 2016
TL;DR: A pipeline of algorithms is presented that decomposes a given polygon model into parts such that each part can be 3D printed with high (outer) surface quality due to the fact that most 3D printing technologies have an anisotropic resolution.
Abstract: We present a pipeline of algorithms that decomposes a given polygon model into parts such that each part can be 3D printed with high (outer) surface quality. For this we exploit the fact that most 3D printing technologies have an anisotropic resolution and hence the surface smoothness varies significantly with the orientation of the surface. Our pipeline starts by segmenting the input surface into patches such that their normals can be aligned perpendicularly to the printing direction. A 3D Voronoi diagram is computed such that the intersections of the Voronoi cells with the surface approximate these surface patches. The intersections of the Voronoi cells with the input model's volume then provide an initial decomposition. We further present an algorithm to compute an assembly order for the parts and generate connectors between them. A post processing step further optimizes the seams between segments to improve the visual quality. We run our pipeline on a wide range of 3D models and experimentally evaluate the obtained improvements in terms of numerical, visual, and haptic quality.

90 citations


Proceedings ArticleDOI
19 Aug 2016
TL;DR: Two algorithm optimizations for a distributed cloud-based encoding pipeline are described, including per-title complexity analysis for bitrate-resolution selection and per-chunk bitrate control for consistent-quality encoding, which result in more efficient bandwidth usage and more consistent video quality.
Abstract: A cloud-based encoding pipeline which generates streams for video-on-demand distribution typically processes a wide diversity of content that exhibit varying signal characteristics. To produce the best quality video streams, the system needs to adapt the encoding to each piece of content, in an automated and scalable way. In this paper, we describe two algorithm optimizations for a distributed cloud-based encoding pipeline: (i) per-title complexity analysis for bitrate-resolution selection; and (ii) per-chunk bitrate control for consistent-quality encoding. These improvements result in a number of advantages over a simple “one-size-fits-all” encoding system, including more efficient bandwidth usage and more consistent video quality.
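
The per-title selection step can be pictured with a small sketch: probe-encode the title at several (resolution, bitrate) points, then keep, for each rung of the ladder, the resolution that maximizes measured quality at that bitrate. The probe numbers below are made up; a production pipeline would measure PSNR or a perceptual metric per encode.

```python
# hypothetical probe measurements for one title: resolution -> {bitrate_kbps: quality}
candidate_points = {
    "1920x1080": {1750: 88.0, 3000: 93.5, 4500: 95.5},
    "1280x720":  {1050: 84.0, 1750: 90.5, 3000: 92.0},
    "640x480":   {560: 74.0, 1050: 81.0, 1750: 83.5},
}

def build_ladder(points, ladder_bitrates):
    """Pick, for each target bitrate, the resolution with the best probe quality."""
    ladder = []
    for target in ladder_bitrates:
        best = None
        for resolution, curve in points.items():
            if target in curve and (best is None or curve[target] > best[2]):
                best = (target, resolution, curve[target])
        if best:
            ladder.append(best)
    return ladder

for bitrate, resolution, quality in build_ladder(candidate_points,
                                                 [560, 1050, 1750, 3000, 4500]):
    print(f"{bitrate:>5} kbps -> {resolution} (quality {quality})")
```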

Proceedings ArticleDOI
21 Feb 2016
TL;DR: This paper describes architectural enhancements in the Altera Stratix 10 HyperFlex FPGA architecture, which places ubiquitous flip-flops in the routing fabric to enable a high degree of pipelining.
Abstract: This paper describes architectural enhancements in the Altera Stratix 10 HyperFlex FPGA architecture, fabricated in the Intel 14nm FinFET process. Stratix 10 includes ubiquitous flip-flops in the routing to enable a high degree of pipelining. In contrast to the earlier architectural exploration of pipelining in pass-transistor based architectures, the direct drive routing fabric in Stratix-style FPGAs enables an extremely low-cost pipeline register. The presence of ubiquitous flip-flops simplifies circuit retiming and improves performance. The availability of predictable retiming affects all stages of the cluster, place and route flow. Ubiquitous flip-flops require a low-cost clock network with sufficient flexibility to enable pipelining of dozens of clock domains. Different cost/performance tradeoffs in a pipelined fabric and the use of a 14nm process lead to other modifications to the routing fabric and the logic element. User modification of the design enables even higher performance, averaging 2.3X faster in a small set of designs.

Journal ArticleDOI
TL;DR: In this article, an integrated Finite Element Method (FEM) model is proposed to investigate the dynamic seabed response for several specific pipeline layouts and to simulate pipeline stability under wave loading.

Journal ArticleDOI
TL;DR: Measurements of the sensitivity of the pipeline used to generate the Q1-Q17 DR24 planet candidate catalog find a strong period dependence in the measured detection efficiency, with longer (>40 day) periods having a significantly lower detectability than shorter periods.
Abstract: With each new version of the Kepler pipeline and resulting planet candidate catalog, an updated measurement of the underlying planet population can only be recovered with a corresponding measurement of the Kepler pipeline detection efficiency. Here we present measurements of the sensitivity of the pipeline (version 9.2) used to generate the Q1–Q17 DR24 planet candidate catalog. We measure this by injecting simulated transiting planets into the pixel-level data of 159,013 targets across the entire Kepler focal plane, and examining the recovery rate. Unlike previous versions of the Kepler pipeline, we find a strong period dependence in the measured detection efficiency, with longer (>40 day) periods having a significantly lower detectability than shorter periods, introduced in part by an incorrectly implemented veto. Consequently, the sensitivity of the 9.2 pipeline cannot be cast as a simple one-dimensional function of the signal strength of the candidate planet signal, as was possible for previous versions of the pipeline. We report on the implications for occurrence rate calculations based on the Q1–Q17 DR24 planet candidate catalog, and offer important caveats and recommendations for performing such calculations. As before, we make available the entire table of injected planet parameters and whether they were recovered by the pipeline, enabling readers to derive the pipeline detection sensitivity in the planet and/or stellar parameter space of their choice.
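
The injection-and-recovery measurement can be summarized in a few lines: bin the injected signals by orbital period and report the recovered fraction per bin with a binomial uncertainty. The arrays below are simulated with an assumed efficiency drop beyond 40 days purely to mirror the trend reported above; in practice the published table of injections would be used.

```python
import numpy as np

rng = np.random.default_rng(42)
periods = rng.uniform(0.5, 500.0, 20000)              # injected periods [days]
# simulated pipeline whose efficiency drops for periods longer than ~40 days
p_detect = np.where(periods < 40, 0.9, 0.5)
recovered = rng.uniform(size=periods.size) < p_detect

bins = np.array([0.5, 10, 40, 100, 200, 500])
for lo, hi in zip(bins[:-1], bins[1:]):
    mask = (periods >= lo) & (periods < hi)
    n = mask.sum()
    eff = recovered[mask].mean()
    err = np.sqrt(eff * (1 - eff) / n)                # binomial uncertainty
    print(f"P in [{lo:6.1f}, {hi:6.1f}) d: efficiency = {eff:.3f} +/- {err:.3f} (n={n})")
```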

Journal ArticleDOI
TL;DR: This paper presents the realtime image subtraction pipeline in the intermediate Palomar Transient Factory, using high-performance computing, efficient databases, and machine-learning algorithms to reliably deliver transient candidates within ten minutes of images being taken.
Abstract: A fast-turnaround pipeline for realtime data reduction plays an essential role in discovering and permitting follow-up observations to young supernovae and fast-evolving transients in modern time-domain surveys. In this paper, we present the realtime image subtraction pipeline in the intermediate Palomar Transient Factory. By using high-performance computing, efficient databases, and machine-learning algorithms, this pipeline manages to reliably deliver transient candidates within 10 minutes of images being taken. Our experience in using high-performance computing resources to process big data in astronomy serves as a trailblazer to dealing with data from large-scale time-domain facilities in the near future.

Journal ArticleDOI
TL;DR: Harmonica, a framework for heterogeneous computing systems enhanced by memristor-based neuromorphic computing accelerators (NCAs), is presented; its performance and power efficiency are superior to designs with either digital neural processing units (D-NPUs) or MBC arrays cooperating with a digital interconnection network.
Abstract: Following technology scaling, on-chip heterogeneous architecture emerges as a promising solution to combat the power wall of microprocessors. This work presents Harmonica, a framework of heterogeneous computing systems enhanced by memristor-based neuromorphic computing accelerators (NCAs). In Harmonica, a conventional pipeline is augmented with an NCA which is designed to speed up artificial neural network (ANN) relevant executions by leveraging the extremely efficient mixed-signal computation capability of nanoscale memristor-based crossbar (MBC) arrays. With the help of a mixed-signal interconnection network (M-Net), the hierarchically arranged MBC arrays can accelerate the computation of a variety of ANNs. Moreover, an inline calibration scheme is proposed to keep the computation accuracy degradation incurred by memristor resistance shifting within an acceptable range during NCA executions. Compared to a general-purpose processor, Harmonica can achieve on average 27.06× performance speedup and 25.23× energy savings when the NCA is configured with an auto-associative memory (AAM) implementation. If the NCA is configured with a multilayer perceptron (MLP) implementation, the performance speedup and energy savings can be boosted to 178.41× and 184.24×, respectively, with slightly degraded computation accuracy. Moreover, the performance and power efficiency of Harmonica are superior to the designs with either digital neural processing units (D-NPUs) or MBC arrays cooperating with a digital interconnection network. Compared to the baseline general-purpose processor, the classification rate degradation of Harmonica in MLP or AAM is less than 8% or 4%, respectively.

Posted Content
TL;DR: A novel, equalization-based soft-output data-detection algorithm and corresponding reference FPGA designs for wideband massive MU-MIMO systems that use orthogonal frequency-division multiplexing (OFDM).
Abstract: Data detection in massive multi-user (MU) multiple-input multiple-output (MIMO) wireless systems is among the most critical tasks due to the excessively high implementation complexity. In this paper, we propose a novel, equalization-based soft-output data-detection algorithm and corresponding reference FPGA designs for wideband massive MU-MIMO systems that use orthogonal frequency-division multiplexing (OFDM). Our data-detection algorithm performs approximate minimum mean-square error (MMSE) or box-constrained equalization using coordinate descent. We deploy a variety of algorithm-level optimizations that enable near-optimal error-rate performance at low implementation complexity, even for systems with hundreds of base-station (BS) antennas and thousands of subcarriers. We design a parallel VLSI architecture that uses pipeline interleaving and can be parametrized at design time to support various antenna configurations. We develop reference FPGA designs for massive MU-MIMO-OFDM systems and provide an extensive comparison to existing designs in terms of implementation complexity, throughput, and error-rate performance. For a 128 BS antenna, 8 user massive MU-MIMO-OFDM system, our FPGA design outperforms the next-best implementation by more than 2.6x in terms of throughput per FPGA look-up tables.
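
A hedged numerical sketch of the equalization core follows: for one subcarrier, MMSE equalization is performed by coordinate descent on the regularized least-squares objective ||y - Hx||^2 + rho*||x||^2 with rho = N0/Es, and the result is compared against the closed-form MMSE solution. This illustrates the algorithmic idea only, not the paper's exact update schedule or its fixed-point VLSI mapping.

```python
import numpy as np

def mmse_coordinate_descent(H, y, rho, iters=10):
    B, U = H.shape
    x = np.zeros(U, dtype=complex)
    col_norm = np.sum(np.abs(H) ** 2, axis=0) + rho
    r = y - H @ x                              # residual
    for _ in range(iters):
        for k in range(U):
            hk = H[:, k]
            # solve the 1-D problem for coordinate k with the others held fixed
            xk_new = np.vdot(hk, r + hk * x[k]) / col_norm[k]
            r -= hk * (xk_new - x[k])
            x[k] = xk_new
    return x

rng = np.random.default_rng(3)
B, U = 128, 8                                  # 128 BS antennas, 8 users
H = (rng.normal(size=(B, U)) + 1j * rng.normal(size=(B, U))) / np.sqrt(2)
s = rng.choice(np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]), U) / np.sqrt(2)  # QPSK
N0 = 0.01
y = H @ s + np.sqrt(N0 / 2) * (rng.normal(size=B) + 1j * rng.normal(size=B))

x_cd = mmse_coordinate_descent(H, y, rho=N0, iters=10)
x_exact = np.linalg.solve(H.conj().T @ H + N0 * np.eye(U), H.conj().T @ y)
print("max |difference| vs closed-form MMSE:", np.max(np.abs(x_cd - x_exact)))
```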

Patent
07 Jul 2016
TL;DR: In this patent, the first binary sequence is defined by least significant bit (LSB) outputs from the ADC together with a second binary sequence of bits from a pseudorandom binary sequence generator, yielding a truly random unbiased sequence of bits with an equal probability of 1 and 0.
Abstract: A radar sensing system for a vehicle includes transmit and receive pipelines. The transmit pipeline includes transmitters able to transmit radio signals. The receive pipeline includes receivers able to receive signals. The received signals are transmitted signals that are reflected from an object. The transmit pipeline phase modulates the signals before transmission, as defined by a first binary sequence. The receive pipeline comprises an analog to digital converter (ADC) for sampling the received signals. The transmit pipeline includes a pseudorandom binary sequence (PRBS) generator for outputting a second binary sequence of bits with an equal probability of 1 and 0. The first binary sequence is defined by least significant bit (LSB) outputs from the ADC and the second binary sequence of bits. The first binary sequence comprises a truly random unbiased sequence of bits with an equal probability of 1 and 0.
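
To illustrate the two ingredients named above, the sketch below generates a pseudorandom binary sequence with a maximal-length 16-bit linear-feedback shift register and combines it with (simulated, biased) ADC least-significant bits using XOR. The XOR combination is an assumption made for illustration; the text above does not spell out the exact combining operation.

```python
import numpy as np

def prbs(taps=(16, 14, 13, 11), length=1 << 16, seed=0xACE1):
    """Fibonacci LFSR with a maximal-length 16-bit polynomial; one output bit per step."""
    state, out = seed, []
    for _ in range(length):
        bit = 0
        for t in taps:
            bit ^= (state >> (t - 1)) & 1
        out.append(state & 1)
        state = (state >> 1) | (bit << 15)
    return np.array(out, dtype=np.uint8)

rng = np.random.default_rng(9)
# simulated ADC LSBs: narrow-band noise around a DC level gives visibly biased parity
adc_samples = np.round(rng.normal(2048.3, 0.6, 1 << 16)).astype(int)
lsb = (adc_samples & 1).astype(np.uint8)

pr = prbs()
combined = lsb ^ pr
for name, bits in [("ADC LSBs", lsb), ("PRBS", pr), ("combined", combined)]:
    print(f"{name:>9}: fraction of ones = {bits.mean():.4f}")
```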

Journal ArticleDOI
TL;DR: In this article, a detailed technical and economic assessment of conditioning and transporting 13.1 MTPA CO2 with impurities in an onshore pipeline over a distance of 500 kilometres is presented.


Proceedings ArticleDOI
15 Oct 2016
TL;DR: This paper describes a 1GHz 32-character-wide HARE design targeting ASIC implementation that processes data at 32 GB/s - matching modern memory bandwidths and demonstrates a scaled-down FPGA proof-of-concept that operates at 100MHz with 4-wide parallelism (400 MB/s).
Abstract: Rapidly processing text data is critical for many technical and business applications. Traditional software-based tools for processing large text corpora use memory bandwidth inefficiently due to software overheads and thus fall far short of peak scan rates possible on modern memory systems. Prior hardware designs generally target I/O rather than memory bandwidth. In this paper, we present HARE, a hardware accelerator for matching regular expressions against large in-memory logs. HARE comprises a stall-free hardware pipeline that scans input data at a fixed rate, examining multiple characters from a single input stream in parallel in a single accelerator clock cycle. We describe a 1GHz 32-character-wide HARE design targeting ASIC implementation that processes data at 32 GB/s — matching modern memory bandwidths. This ASIC design outperforms software solutions by as much as two orders of magnitude. We further demonstrate a scaled-down FPGA proof-of-concept that operates at 100MHz with 4-wide parallelism (400 MB/s). Even at this reduced rate, the prototype outperforms grep by 1.5–20x on commonly used regular expressions.
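
The multi-character-per-cycle idea can be modelled in software by precomposing the DFA's per-character transition functions so each step consumes a whole chunk; HARE does this with parallel hardware transition units, whereas the sketch below only shows that the chunked scan is equivalent to a character-at-a-time scan. The pattern and chunk width are arbitrary examples.

```python
import numpy as np

def literal_dfa(pattern):
    """DFA over bytes for 'pattern occurs somewhere'; the accept state is absorbing."""
    m = len(pattern)
    table = np.zeros((m + 1, 256), dtype=np.int64)
    for state in range(m + 1):
        for c in range(256):
            if state == m:
                table[state, c] = m              # stay in accept once matched
                continue
            s = pattern[:state] + bytes([c])
            k = min(len(s), m)
            while k and s[-k:] != pattern[:k]:   # longest suffix that is a prefix
                k -= 1
            table[state, c] = k
    return table

def scan_chunked(data, table, width=4):
    n_states = table.shape[0]
    state = 0
    for i in range(0, len(data), width):
        # compose the transition functions of every character in this chunk
        f = np.arange(n_states)
        for c in data[i:i + width]:
            f = table[f, c]
        state = f[state]
        if state == n_states - 1:
            return i                             # match completed within this chunk
    return -1

table = literal_dfa(b"ERROR")
log = b"a" * 1000 + b"... ERROR: disk offline ..." + b"b" * 1000
print("byte offset of matching chunk:", scan_chunked(log, table, width=4))
```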

Journal ArticleDOI
TL;DR: In this paper, an equalization-based soft-output data-detection algorithm and corresponding reference FPGA designs for wideband massive MU-MIMO systems that use orthogonal frequency division multiplexing (OFDM) were proposed.
Abstract: Data detection in massive multi-user (MU) multiple-input multiple-output (MIMO) wireless systems is among the most critical tasks due to the excessively high implementation complexity. In this paper, we propose a novel, equalization-based soft-output data-detection algorithm and corresponding reference FPGA designs for wideband massive MU-MIMO systems that use orthogonal frequency-division multiplexing (OFDM). Our data-detection algorithm performs approximate minimum mean-square error (MMSE) or box-constrained equalization using coordinate descent. We deploy a variety of algorithm-level optimizations that enable near-optimal error-rate performance at low implementation complexity, even for systems with hundreds of base-station (BS) antennas and thousands of subcarriers. We design a parallel VLSI architecture that uses pipeline interleaving and can be parametrized at design time to support various antenna configurations. We develop reference FPGA designs for massive MU-MIMO-OFDM systems and provide an extensive comparison to existing designs in terms of implementation complexity, throughput, and error-rate performance. For a 128 BS antenna, 8-user massive MU-MIMO-OFDM system, our FPGA design outperforms the next-best implementation by more than 2.6× in terms of throughput per FPGA look-up tables.

Journal ArticleDOI
TL;DR: A dedicated medical pre-processing pipeline is proposed, aimed at addressing the many problems and opportunities contained within EMR data, such as their temporal, inaccurate, and incomplete nature; it has great potential to enhance disease prediction, and hence early detection and intervention, in medical practice.

Journal ArticleDOI
TL;DR: The objective of this research work is to design, optimize, and model the FPGA implementation of the HIGHT cipher; the analysis shows that the scalar designs have smaller area and power dissipation, whereas the pipeline designs have higher throughput and lower energy.
Abstract: The growth of low-resource devices has increased rapidly in recent years. Communication in such devices presents two challenges: security and resource limitation. Lightweight ciphers, such as the HIGHT cipher, are encryption algorithms targeted for low-resource systems. Designing lightweight ciphers on reconfigurable platforms, e.g., field-programmable gate arrays (FPGAs), provides speedup as well as flexibility. The HIGHT cipher consists of simple operations and provides an adequate security level. The objective of this research work is to design, optimize, and model the FPGA implementation of the HIGHT cipher. Several optimized designs are presented to minimize the required hardware resources and energy, including scalar and pipeline ones. Our analysis shows that the scalar designs have smaller area and power dissipation, whereas the pipeline designs have higher throughput and lower energy. Because obtaining the best performance out of any implemented design mainly requires balancing design area and energy, our experimental results demonstrate that it is possible to obtain such optimal performance using the pipeline design with two and four rounds per stage as well as with the scalar design with one and eight rounds. Comparing the best implementations of the pipeline and scalar designs, the scalar design requires 18% less resources and 10% less power, while the pipeline design has 18 times higher throughput and 60% less energy consumption.
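
A back-of-envelope model of the scalar-versus-pipeline trade-off discussed above is sketched below; all delay, area, and power figures are hypothetical placeholders, not the paper's measurements. It only illustrates why a fully unrolled pipeline trades area for throughput and amortizes fixed power into lower energy per block.

```python
TOTAL_ROUNDS = 32              # HIGHT has 32 rounds and a 64-bit block

def design_metrics(rounds_per_stage, pipelined, round_delay_ns=2.0,
                   slices_per_round=60, power_per_round_mw=1.5, static_power_mw=40.0):
    clock_hz = 1e9 / (round_delay_ns * rounds_per_stage)   # critical path grows with rounds per stage
    cycles_per_block = TOTAL_ROUNDS // rounds_per_stage
    if pipelined:
        area = slices_per_round * TOTAL_ROUNDS              # every stage instantiated
        power = static_power_mw + power_per_round_mw * TOTAL_ROUNDS
        blocks_per_second = clock_hz                         # one block per cycle when full
    else:
        area = slices_per_round * rounds_per_stage           # one stage, reused
        power = static_power_mw + power_per_round_mw * rounds_per_stage
        blocks_per_second = clock_hz / cycles_per_block
    throughput_mbps = blocks_per_second * 64 / 1e6
    energy_nj_per_block = power * 1e-3 / blocks_per_second * 1e9
    return area, power, throughput_mbps, energy_nj_per_block

for label, rps, pipe in [("scalar, 1 round/cycle", 1, False),
                         ("scalar, 8 rounds/cycle", 8, False),
                         ("pipeline, 2 rounds/stage", 2, True),
                         ("pipeline, 4 rounds/stage", 4, True)]:
    a, p, t, e = design_metrics(rps, pipe)
    print(f"{label:26s} area={a:5d} slices  power={p:5.1f} mW  "
          f"throughput={t:8.1f} Mbps  energy={e:6.2f} nJ/block")
```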

Journal ArticleDOI
TL;DR: The three-stage pipelined architecture is shown to have the best performance, which achieves a scalar multiplication over GF(2^163) in 6.1 μs using 7354 slices on Virtex-4.
Abstract: This paper proposes an efficient pipelined architecture of elliptic curve scalar multiplication (ECSM) over GF(2^m). The architecture uses a bit-parallel finite-field (FF) multiplier accumulator (MAC) based on the Karatsuba-Ofman algorithm. The Montgomery ladder algorithm is modified for better sharing of execution paths. The data path in the architecture is well designed, so that the critical path contains few extra logic primitives apart from the FF MAC. In order to find the optimal number of pipeline stages, scheduling schemes with different pipeline stages are proposed and the ideal placement of pipeline registers is thoroughly analyzed. We implement ECSM over the five binary fields recommended by the National Institute of Standards and Technology on Xilinx Virtex-4 and Virtex-5 field-programmable gate arrays. The three-stage pipelined architecture is shown to have the best performance, which achieves a scalar multiplication over GF(2^163) in 6.1 μs using 7354 slices on Virtex-4. Using Virtex-5, the scalar multiplication for m = 163, 233, 283, 409, and 571 can be achieved in 4.6, 7.9, 10.9, 19.4, and 36.5 μs, respectively, which are faster than previous results.
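
For reference, the following is a behavioural software model (an assumed illustration, not the paper's hardware) of the field arithmetic that the FF MAC above computes in one pipeline stage: carry-less multiplication reduced modulo the NIST B-163 polynomial f(x) = x^163 + x^7 + x^6 + x^3 + 1, with polynomials held as Python integers. The paper's implementation uses a bit-parallel Karatsuba-Ofman multiplier rather than this bit-serial loop.

```python
M = 163
F = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1     # NIST B-163 reduction polynomial

def gf2m_mul(a, b, m=M, f=F):
    """Multiply two GF(2^m) elements (bits of an int = polynomial coefficients)."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # carry-less "addition" is XOR
        b >>= 1
        a <<= 1
        if a >> m & 1:           # reduce as soon as the degree reaches m
            a ^= f
    return result

def gf2m_mac(acc, a, b):
    """One multiply-accumulate step: acc + a*b over GF(2^m)."""
    return acc ^ gf2m_mul(a, b)

# sanity checks: x * x = x^2, and multiplication by 1 is the identity
x = 0b10
assert gf2m_mul(x, x) == 0b100
a = (1 << 162) ^ (1 << 80) ^ 1
assert gf2m_mul(a, 1) == a and gf2m_mac(0, a, 1) == a
print("GF(2^163) MAC model OK")
```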

Journal ArticleDOI
TL;DR: A novel GPU-based approach to robustly and efficiently simulate high-resolution and complexly layered cloth using a parallelized matrix assembly algorithm that can quickly build a large and sparse matrix in a compressed format and accurately solve linear systems on GPUs.
Abstract: We present a novel GPU-based approach to robustly and efficiently simulate high-resolution and complexly layered cloth. The key component of our formulation is a parallelized matrix assembly algorithm that can quickly build a large and sparse matrix in a compressed format and accurately solve linear systems on GPUs. We also present a fast and integrated solution for parallel collision handling, including collision detection and response computations, which utilizes spatio-temporal coherence. We combine these algorithms as part of a new cloth simulation pipeline that incorporates contact forces into implicit time integration for collision avoidance. The entire pipeline is implemented on GPUs, and we evaluate its performance on complex benchmarks consisting of 100–300K triangles. In practice, our system takes a few seconds to simulate one frame of a complex cloth scene, which represents significant speedups over prior CPU and GPU-based cloth simulation systems.
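
A CPU sketch of the assembly step described above: every element computes its local contribution independently as (row, column, value) triplets, and converting the triplet list to a compressed (CSR) matrix sums duplicates and yields the operator handed to the linear solver. The paper performs assembly and the solve in parallel on the GPU; scipy and the toy mass-spring chain below only illustrate the data flow.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.linalg import spsolve

n, k, mass, dt = 1000, 50.0, 0.01, 1e-3
springs = [(i, i + 1) for i in range(n - 1)]              # chain connectivity

rows, cols, vals = [], [], []
for a, b in springs:                                      # each element works independently
    for (r, c, v) in [(a, a, k), (b, b, k), (a, b, -k), (b, a, -k)]:
        rows.append(r); cols.append(c); vals.append(dt * dt * v)
for i in range(n):                                        # lumped mass on the diagonal
    rows.append(i); cols.append(i); vals.append(mass)

A = coo_matrix((vals, (rows, cols)), shape=(n, n)).tocsr()  # duplicates are summed here
rhs = np.zeros(n)
rhs[-1] = mass * -9.8 * dt                                  # gravity impulse on the last node
dv = spsolve(A, rhs)                                        # implicit velocity update
print("velocity change at the pulled end:", dv[-1])
```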

Journal ArticleDOI
01 Oct 2016
TL;DR: An automated pipeline for low-resolution structure refinement (LORESTR) has been developed to assist in the hassle-free refinement of difficult cases; it automates the selection of high-resolution homologues for external restraint generation and optimizes the parameters for ProSMART and REFMAC5.
Abstract: Since the ratio of the number of observations to adjustable parameters is small at low resolution, it is necessary to use complementary information for the analysis of such data. ProSMART is a program that can generate restraints for macromolecules using homologous structures, as well as generic restraints for the stabilization of secondary structures. These restraints are used by REFMAC5 to stabilize the refinement of an atomic model. However, the optimal refinement protocol varies from case to case, and it is not always obvious how to select appropriate homologous structure(s), or other sources of prior information, for restraint generation. After running extensive tests on a large data set of low-resolution models, the best-performing refinement protocols and strategies for the selection of homologous structures have been identified. These strategies and protocols have been implemented in the Low-Resolution Structure Refinement (LORESTR) pipeline. The pipeline performs auto-detection of twinning and selects the optimal scaling method and solvent parameters. LORESTR can either use user-supplied homologous structures, or run an automated BLAST search and download homologues from the PDB. The pipeline executes multiple model-refinement instances using different parameters in order to find the best protocol. Tests show that the automated pipeline improves R factors, geometry and Ramachandran statistics for 94% of the low-resolution cases from the PDB included in the test set.

Journal ArticleDOI
22 Aug 2016
TL;DR: An FPGA-based SQL query processing approach exploiting the capabilities of partial dynamic reconfiguration of modern FPGAs and a performance analysis is introduced that is able to estimate the processing time of a query for different processing strategies and different communication and processing architecture configurations.
Abstract: In this article, we propose an FPGA-based SQL query processing approach exploiting the capabilities of partial dynamic reconfiguration of modern FPGAs. After the analysis of an incoming query, a query-specific hardware processing unit is generated on the fly and loaded on the FPGA for immediate query execution. For each query, a specialized hardware accelerator pipeline is composed and configured on the FPGA from a set of presynthesized hardware modules. These partially reconfigurable hardware modules are gathered in a library covering all major SQL operations like restrictions and aggregations, as well as more complex operations such as joins and sorts. Moreover, this holistic query processing approach in hardware supports different data processing strategies, including row- as well as column-wise data processing, in order to optimize data communication and processing. This article gives an overview of the proposed query processing methodology and the corresponding library of modules. Additionally, a performance analysis is introduced that is able to estimate the processing time of a query for different processing strategies and different communication and processing architecture configurations. With the help of this performance analysis, architectural bottlenecks may be exposed and future optimized architectures, besides the two prototypes presented here, may be determined.

Journal ArticleDOI
TL;DR: In this paper, a simple analytical method of burst pressure calculation for a straight pipeline repaired with a composite sleeve was investigated, and the Monte Carlo method was selected to estimate pipeline failure probability and cumulative failure probability due to external corrosion, considering fluid pressure fluctuations and dynamic flow effects with respect to the statistical distributions of the input parameters.

Journal ArticleDOI
TL;DR: A high-throughput, memory-efficient pipelined architecture for the Fast Efficient Set Partitioning in Hierarchical Trees (SPIHT) image compression system is presented, attaining higher PSNR and compression ratio and producing highly accurate images after decompression.
Abstract: In this research paper, a high-throughput, memory-efficient pipelined architecture for the Fast Efficient Set Partitioning in Hierarchical Trees (SPIHT) image compression system is explained. The main aim of this paper is to compress the image and implement the design without any loss of information. A spatial-orientation-tree approach is used in the Fast Efficient SPIHT algorithm for compression, and a Spartan-3 EDK kit is used for hardware implementation analysis. An integer wavelet transform is used for the encoding and decoding process in the SPIHT algorithm. A pipelined architecture is adopted for the FPGA implementation because it is well suited to hardware utilization. Since an image file generally occupies a large amount of memory, this approach reduces the memory requirement without loss during transmission. In this way, higher PSNR values and compression ratios are attained, and highly accurate images are produced after decompression, compared with the results of previous algorithms. The hardware tools used are a dual-core processor and an FPGA Spartan-3 EDK kit; the software environment is the Windows 8 operating system with MATLAB 7.8.
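
For context, SPIHT-style coders are commonly paired with a reversible integer lifting wavelet; the sketch below (an assumed example, not code from the paper) applies one level of a Le Gall 5/3-style lifting transform to a 1-D signal using only integer arithmetic, so the decoder can invert it exactly.

```python
import numpy as np

def lifting_53_forward(x):
    x = np.asarray(x, dtype=np.int64)
    s, d = x[0::2].copy(), x[1::2].copy()
    s_next = np.append(s[1:], s[-1])               # simple replicate-boundary extension
    d -= (s + s_next) // 2                         # predict step: detail coefficients
    d_prev = np.insert(d[:-1], 0, d[0])
    s += (d_prev + d + 2) // 4                     # update step: approximation coefficients
    return s, d

def lifting_53_inverse(s, d):
    s, d = s.copy(), d.copy()
    d_prev = np.insert(d[:-1], 0, d[0])
    s -= (d_prev + d + 2) // 4                     # undo update
    s_next = np.append(s[1:], s[-1])
    d += (s + s_next) // 2                         # undo predict
    x = np.empty(s.size + d.size, dtype=np.int64)
    x[0::2], x[1::2] = s, d
    return x

signal = np.array([3, 7, 1, 8, 2, 9, 4, 6], dtype=np.int64)
approx, detail = lifting_53_forward(signal)
print("round trip exact:", np.array_equal(lifting_53_inverse(approx, detail), signal))
```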