Proceedings Article

Whole Genome Comparison on a Network of Workstations

TL;DR: The authors show that, on somewhat outdated hardware, their implementation achieves speeds upwards of 8000 MCUPS (million cell updates per second), making it one of the fastest implementations of the Smith-Waterman algorithm.
Abstract: Whole genome comparison consists of comparing or aligning genome sequences with the goal of finding similarities between them. Previously, we have shown how the SIMD extensions of Intel processors can be used to efficiently implement the Smith-Waterman algorithm for genome comparison. Here we present a distributed version of that algorithm. We show that on somewhat outdated hardware we can achieve speeds upwards of 8000 MCUPS, making it one of the fastest implementations of the Smith-Waterman algorithm.
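MCUPS (million cell updates per second) counts how many dynamic-programming matrix cells an implementation fills per second. A minimal sketch of that arithmetic, assuming one cell per pair of residues (an m x n matrix); the function names and the 1,000,000-base example are hypothetical, not taken from the paper:

```python
def mcups(query_len: int, target_len: int, seconds: float) -> float:
    """Million cell updates per second for filling one query_len x target_len matrix."""
    cells = query_len * target_len
    return cells / (seconds * 1e6)


def seconds_at(query_len: int, target_len: int, rate_mcups: float) -> float:
    """Time needed to fill a query_len x target_len matrix at a sustained MCUPS rate."""
    return query_len * target_len / (rate_mcups * 1e6)


# Hypothetical example: two 1,000,000-base sequences at a sustained 8000 MCUPS.
print(f"{seconds_at(1_000_000, 1_000_000, 8000):.0f} s")  # prints "125 s"
```

The estimate ignores communication and other overheads of a distributed run; it only shows how a cell-update rate translates into wall-clock time.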


Citations
Proceedings Article
23 May 2009
TL;DR: This paper evaluates the performance and sensitivity of z-align, a parallel exact strategy that runs in user-restricted memory space, and shows that two sequences of 23MBP and 24MBP were successfully aligned with z-align.
Abstract: Pairwise sequence alignment is a basic operation in bioinformatics that is performed thousands of times on a daily basis. The exact methods proposed in the literature have quadratic time complexity. For this reason, heuristic methods such as BLAST are widely used. Nevertheless, exact methods are known to have better sensitivity, leading to better results. To obtain exact results faster, many parallel strategies have been proposed, but most of them fail to align huge biological sequences. This happens because not only must the quadratic time be considered, but the space must also be reduced. In this paper, we evaluate the performance and sensitivity of z-align, a parallel exact strategy that runs in user-restricted memory space. The results obtained on a 64-processor cluster show that two sequences of 23MBP (Mega Base Pairs) and 24MBP, respectively, were successfully aligned with z-align. Also, a speedup of 34.35 was achieved when aligning two 3MBP sequences. Finally, when comparing z-align with BLAST, we can see that the z-align alignments are longer and have higher scores.

10 citations
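The z-align abstract's point that both time and space must be addressed can be made concrete with the 23MBP x 24MBP comparison it reports. A back-of-the-envelope sketch, assuming 4 bytes per stored score (an assumption, not a z-align parameter): storing the full matrix would take petabytes, while keeping only the current and previous rows needs about 192 MB. This illustrates the general linear-space idea only; it is not z-align's actual memory scheme, and two rows alone yield the score but not the alignment itself.

```python
MBP = 1_000_000                 # one mega base pair
m, n = 23 * MBP, 24 * MBP       # sequence sizes reported in the abstract
BYTES_PER_CELL = 4              # assumption: one 32-bit score per stored cell

cells = m * n                                   # quadratic number of DP cells
full_matrix_bytes = cells * BYTES_PER_CELL      # storing every cell
two_row_bytes = 2 * (n + 1) * BYTES_PER_CELL    # keeping only current + previous row

print(f"cells:       {cells:.3e}")                       # 5.520e+14
print(f"full matrix: {full_matrix_bytes / 1e15:.2f} PB")  # 2.21 PB
print(f"two rows:    {two_row_bytes / 1e6:.0f} MB")       # 192 MB
```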

Proceedings Article
01 Jan 2012
TL;DR: This paper presents experimental results and a theoretical performance model for hybrid implementations of the Commentz-Walter, Wu-Manber, Set Backward Oracle Matching and Salmela-Tarhio-Kytöjoki family of multiple pattern matching algorithms executed in parallel on biological sequence databases.
Abstract: Multiple pattern matching is widely used in computational biology to locate any number of nucleotides in genome databases. Processing data of this size often requires more computing power than a sequential computer can provide. A viable, cost-effective way to obtain the power required by such computationally intensive applications is to share computational tasks among the processing nodes of a high-performance hybrid distributed- and shared-memory platform consisting of cluster workstations and multi-core processors. This paper presents experimental results and a theoretical performance model of the hybrid implementations of the Commentz-Walter, Wu-Manber, Set Backward Oracle Matching and Salmela-Tarhio-Kytöjoki family of multiple pattern matching algorithms when executed in parallel on biological sequence databases.

6 citations


Cites background from "Whole Genome Comparison on a Networ..."

  • ...…alignment have been presented in the research literature for distributed memory platforms (Li, 2003), (Li and Chen, 2005), (Boukerche et al., 2007), (Jacob et al., 2007) and for shared memory platforms (Cuvillo et al., 2003), (Chaichoompu et al., 2006), (Rashid et al., 2007), (Zomaya, 2006)....
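One common way to share a multiple pattern matching workload among nodes is to split the database into chunks that overlap by (longest pattern length - 1) characters, so that no occurrence straddling a chunk boundary is lost. The sketch below assumes that partitioning scheme and uses a naive `str.find` scan per pattern in place of Commentz-Walter, Wu-Manber, SBOM or the Salmela-Tarhio-Kytöjoki algorithms; threads stand in for cluster nodes, and all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def find_all(text, pattern, offset):
    """Naive scan: (global position, pattern) for every occurrence in text."""
    hits = []
    pos = text.find(pattern)
    while pos != -1:
        hits.append((offset + pos, pattern))
        pos = text.find(pattern, pos + 1)  # step by one to allow overlapping occurrences
    return hits


def search_chunk(task):
    """Work unit a node would run: all patterns against one chunk."""
    chunk, offset, patterns = task
    hits = []
    for pattern in patterns:
        hits.extend(find_all(chunk, pattern, offset))
    return hits


def parallel_multi_search(text, patterns, n_chunks=4):
    """Split text into overlapping chunks, search them concurrently, merge the hits.

    Assumes a non-empty list of non-empty patterns.
    """
    overlap = max(len(p) for p in patterns) - 1
    size = max(1, -(-len(text) // n_chunks))  # ceiling division, at least 1
    tasks = [(text[i:i + size + overlap], i, patterns)
             for i in range(0, len(text), size)]
    with ThreadPoolExecutor() as pool:
        per_chunk = pool.map(search_chunk, tasks)
        # A hit inside an overlap region can be found twice; the set removes repeats.
        return sorted({hit for hits in per_chunk for hit in hits})


print(parallel_multi_search("ACGTACGTTACG", ["ACG", "GTT"], n_chunks=3))
# [(0, 'ACG'), (4, 'ACG'), (6, 'GTT'), (9, 'ACG')]
```

The overlap guarantees that a match starting anywhere in a chunk ends inside that chunk's slice; the price is a small amount of duplicated scanning per boundary.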


Journal Article
TL;DR: This paper evaluates the performance of Z-align, a parallel exact strategy that runs in user-restricted memory space, together with a tunable work distribution mechanism, and shows that execution times can be noticeably reduced when appropriate parameters are chosen.
Abstract: Sequence alignment is a basic operation in bioinformatics that is performed thousands of times on a daily basis. The exact methods for pairwise alignment have quadratic time complexity. For this reason, heuristic methods such as BLAST are widely used. To obtain exact results faster, parallel strategies have been proposed, but most of them fail to align huge biological sequences. This happens because not only must the quadratic time be considered, but the space must also be reduced. In this paper, we evaluate the performance of Z-align, a parallel exact strategy that runs in user-restricted memory space. We also propose and evaluate a tunable work distribution mechanism. The results obtained on two clusters show that two sequences of 24MBP (Mega Base Pairs) and 23MBP, respectively, were successfully aligned with Z-align. Also, a speedup of 34.35 was achieved with 64 processors when aligning two 3MBP sequences. The evaluation of our work distribution mechanism shows that execution times can be noticeably reduced when appropriate parameters are chosen. Finally, when comparing Z-align with BLAST, it is clear that, in many cases, Z-align produces alignments with higher scores.

2 citations

Dissertation
01 Jan 2011
TL;DR: Two algorithms, Needleman-Wunsch and Smith-Waterman, will be implemented on an FPGA as spam detection engines, and the corpus from the Text Retrieval Conference will be used to test the effectiveness of these anti-spam engines.
Abstract: Spam has been a significant problem: it consumes Internet bandwidth, wastes surfers' time, wastes the computational resources of Internet service providers, and reduces the efficiency of email as a means of communication. Despite the various anti-spam solutions introduced, spam mails are often able to avoid detection by slightly modifying their spam signatures. This prevents anti-spam solutions from successfully detecting the keywords in emails that are closely associated with spam. Two algorithms, Needleman-Wunsch and Smith-Waterman, will be implemented on an FPGA as a spam detection engine. Both algorithms originate from the theory of dynamic programming and are normally used in bioinformatics for sequence alignment. Because both are well known for their ability to detect sequences with slight changes caused by mutation, they will be used to detect spam messages that slightly change their spam keywords. An FPGA has been selected as the implementation device: as hardware is faster than software, using an FPGA helps to reduce the scanning time and the CPU load of the computer. Advances in FPGA technology make it capable of becoming a standalone scanning unit. The effectiveness of both algorithms in spam scanning will be examined, using the corpus from the Text Retrieval Conference (TREC 2007) to test the anti-spam engines.

2 citations

References
Journal Article
TL;DR: A computer adaptable method for finding similarities in the amino acid sequences of two proteins has been developed and it is possible to determine whether significant homology exists between the proteins to trace their possible evolutionary development.

11,844 citations


"Whole Genome Comparison on a Networ..." refers to background in this paper

  • ...Initially, Needleman and Wunsch [5] and Sellers [6] introduced the global alignment algorithm based on the dynamic programming approach....
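The global alignment recurrence credited above to Needleman and Wunsch can be sketched in its now-common formulation with a linear gap penalty, which may differ in detail from the 1970 paper. The scoring values (match +1, mismatch -1, gap -1) are assumptions for illustration, and only two matrix rows are kept because just the final score is returned:

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Optimal global alignment score of a and b with a linear gap penalty."""
    # Row 0: aligning a prefix of b against nothing costs one gap per character.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Best of: pair a[i-1] with b[j-1], gap in b (from above), gap in a (from left).
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]  # bottom-right cell: both sequences fully aligned


print(needleman_wunsch("ACGT", "AGT"))  # 2: three matches and one gap
```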

Journal Article
TL;DR: This letter extends the heuristic homology algorithm of Needleman & Wunsch (1970) to find a pair of segments, one from each of two long sequences, such that there is no other pair of segments with greater similarity (homology).

10,262 citations


"Whole Genome Comparison on a Networ..." refers to background in this paper

  • ...Their goal is to identify closely related genomic sequences, assuming that a high degree of similarity in genome sequences may imply similarity of functional or structural characteristics....
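The local-alignment letter above changes the global recurrence in two small ways: every cell is floored at zero, so an alignment can restart anywhere, and the answer is the maximum over all cells rather than the bottom-right corner. A plain scalar sketch with a linear gap penalty; the scores (match +2, mismatch -1, gap -1) are assumptions, and none of the SIMD or distributed techniques of the paper are reproduced here:

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score of any pair of segments of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)  # row 0 is all zeros: local alignments may start anywhere
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The 0 term discards any prefix whose running score has gone negative.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


print(smith_waterman("XXXACGTXXX", "YYACGTYY"))  # 8: the shared segment ACGT
```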

Journal Article
TL;DR: The algorithm of Waterman et al. (1976) for matching biological sequences was modified under some limitations to be accomplished in essentially $MN$ steps, instead of the $M^2 N$ steps necessary in the original algorithm.

1,760 citations


"Whole Genome Comparison on a Networ..." refers to background in this paper

  • ...In computational practice, alignment scores between the query sequence and sequences in a database are calculated to assess similarity....
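The $MN$-step improvement listed above is usually presented for affine gap penalties as three coupled recurrences: H for the best score ending at a cell, and E and F for alignments ending in a horizontal or a vertical gap. Each cell depends only on its immediate neighbors, so the total work is proportional to $MN$. A local-alignment sketch under these assumptions: a gap of length k costs `gap_open + (k - 1) * gap_extend`, and all scoring values are hypothetical defaults.

```python
NEG_INF = float("-inf")


def sw_affine(a, b, match=2, mismatch=-1, gap_open=3, gap_extend=1):
    """Best local alignment score with affine gaps, in O(len(a) * len(b)) time.

    A gap of length k costs gap_open + (k - 1) * gap_extend.
    """
    n = len(b)
    h_prev = [0] * (n + 1)        # H, row i-1: best score of an alignment ending at (i-1, j)
    f_prev = [NEG_INF] * (n + 1)  # F, row i-1: best score ending in a vertical gap
    best = 0
    for i in range(1, len(a) + 1):
        h_cur = [0] * (n + 1)
        f_cur = [NEG_INF] * (n + 1)
        e = NEG_INF               # E, row i: best score ending in a horizontal gap
        for j in range(1, n + 1):
            # Either extend the current horizontal gap or open a new one.
            e = max(e - gap_extend, h_cur[j - 1] - gap_open)
            # Either extend the vertical gap from the row above or open a new one.
            f_cur[j] = max(f_prev[j] - gap_extend, h_prev[j] - gap_open)
            diag = h_prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            h_cur[j] = max(0, diag, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best


print(sw_affine("AAAATTTT", "AAAACCTTTT"))  # 12: 8 matches (+16) minus one length-2 gap (4)
```

Setting `gap_open = gap_extend` recovers a linear gap penalty; with both equal to 1 the example above scores 14, because the length-2 gap then costs only 2.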

Journal Article
TL;DR: The goal of this paper is to give Hirschberg's idea the visibility it deserves by developing a linear-space version of Gotoh's algorithm, which accommodates affine gap penalties.
Abstract: Space, not time, is often the limiting factor when computing optimal sequence alignments, and a number of recent papers in the biology literature have proposed space-saving strategies. However, a 1975 computer science paper by Hirschberg presented a method that is superior to the new proposals, both in theory and in practice. The goal of this paper is to give Hirschberg's idea the visibility it deserves by developing a linear-space version of Gotoh's algorithm, which accommodates affine gap penalties. A portable C-software package implementing this algorithm is available on the BIONET free of charge.

1,513 citations
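Hirschberg's idea, as described in the abstract above, is divide and conquer: score the top half of `a` forwards and the bottom half backwards, each in linear space, pick the column of `b` where the two halves meet with the best combined score, and recurse on the two sub-problems. The sketch below uses a linear gap penalty for brevity, so it is Hirschberg's original scheme rather than the affine-gap (Gotoh) version the paper develops; the scores, function names, and the `-` gap symbol (assumed absent from the inputs) are illustrative choices.

```python
def nw_last_row(a, b, match=1, mismatch=-1, gap=-1):
    """Last row of the global-alignment score matrix, computed with two rows."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev


def hirschberg(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment of a and b, returned as two gapped strings."""
    if not a:
        return "-" * len(b), b
    if not b:
        return a, "-" * len(a)
    if len(a) == 1:
        # Either pair the single character with one character of b, or gap it out.
        best_j, best = None, (len(b) + 1) * gap
        for j, ch in enumerate(b):
            score = (match if ch == a else mismatch) + (len(b) - 1) * gap
            if score > best:
                best_j, best = j, score
        if best_j is None:
            return a + "-" * len(b), "-" + b
        return "-" * best_j + a + "-" * (len(b) - best_j - 1), b
    mid = len(a) // 2
    left = nw_last_row(a[:mid], b, match, mismatch, gap)                # forward pass
    right = nw_last_row(a[mid:][::-1], b[::-1], match, mismatch, gap)   # backward pass
    # Column of b where an optimal path crosses row `mid`.
    split = max(range(len(b) + 1), key=lambda j: left[j] + right[len(b) - j])
    a1, b1 = hirschberg(a[:mid], b[:split], match, mismatch, gap)
    a2, b2 = hirschberg(a[mid:], b[split:], match, mismatch, gap)
    return a1 + a2, b1 + b2


def alignment_score(x, y, match=1, mismatch=-1, gap=-1):
    """Score of an explicit alignment under the same linear-gap scheme."""
    total = 0
    for p, q in zip(x, y):
        if p == "-" or q == "-":
            total += gap
        else:
            total += match if p == q else mismatch
    return total


ax, ay = hirschberg("GATTACA", "GCATGCU")
print(ax)
print(ay)
```

Each scoring pass keeps only O(len(b)) cells; the string slicing in this sketch adds copies that an index-based version would avoid.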

Journal Article
TL;DR: The algorithm, introduced here, lends itself to computer programming and provides a method to compute evolutionary distance which is shorter than the other methods currently in use.
Abstract: This paper gives a formal definition of the biological concept of evolutionary distance and an algorithm to compute it. For any set S of finite sequences of varying lengths this distance is a real-valued function on $S \times S$, and it is shown to be a metric under conditions which are wide enough to include the biological application. The algorithm, introduced here, lends itself to computer programming and provides a method to compute evolutionary distance which is shorter than the other methods currently in use.

523 citations


"Whole Genome Comparison on a Networ..." refers to methods in this paper

  • ...Genome sequence similarity searches are often utilized in Computational Biology....
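With unit costs for substitutions, insertions, and deletions, the evolutionary distance defined in the paper above reduces to the Levenshtein edit distance, one special case of its weighted formulation and one under which the metric property holds. A minimal sketch, assuming those unit weights; the parameter names `sub_cost` and `indel_cost` are hypothetical:

```python
def evolutionary_distance(a: str, b: str, sub_cost: int = 1, indel_cost: int = 1) -> int:
    """Minimum total cost of substitutions and insertions/deletions turning a into b."""
    prev = [j * indel_cost for j in range(len(b) + 1)]  # build a prefix of b from nothing
    for i in range(1, len(a) + 1):
        cur = [i * indel_cost] + [0] * len(b)           # delete a prefix of a
        for j in range(1, len(b) + 1):
            sub = prev[j - 1] + (0 if a[i - 1] == b[j - 1] else sub_cost)
            # Distances are minimized, whereas similarity scores are maximized.
            cur[j] = min(sub, prev[j] + indel_cost, cur[j - 1] + indel_cost)
        prev = cur
    return prev[-1]


print(evolutionary_distance("kitten", "sitting"))  # 3
```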
