Whole Genome Comparison on a Network of Workstations

doi:10.1109/ISPDC.2007.58

Proceedings Article•DOI•

A. Jacob¹, Sugata Sanyal², Marcin Paprzycki, Rajan Arora³, M. Ganzha

VIT University¹, Tata Institute of Fundamental Research², Indian Institutes of Information Technology³

05 Jul 2007-pp 7-7

TL;DR: It is shown that, on somewhat outdated hardware, the authors can achieve speeds upwards of 8000 MCUPS, making this one of the fastest implementations of the Smith-Waterman algorithm.

Abstract: Whole genome comparison consists of comparing or aligning genome sequences with the goal of finding similarities between them. Previously we have shown how the SIMD extensions in Intel processors can be used to efficiently implement the Smith-Waterman algorithm for genome comparison. Here we present a distributed version of that algorithm. We show that on somewhat outdated hardware we can achieve speeds upwards of 8000 MCUPS, making it one of the fastest implementations of the Smith-Waterman algorithm.
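
To make the abstract's figures concrete: the Smith-Waterman algorithm fills a dynamic programming matrix with one cell per pair of residues, and MCUPS is conventionally read as millions of such cell updates per second. The following is a minimal, unoptimized Python sketch, not the authors' SIMD or distributed implementation. The scoring values, the linear gap penalty, and the function names are assumptions chosen for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of strings a and b.

    Assumed scoring: linear gap penalty, fixed match/mismatch values.
    Only two rows of the matrix are kept, so memory is O(len(b)).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def mcups(query_len, target_len, seconds):
    """Millions of matrix cell updates per second for one comparison."""
    return query_len * target_len / seconds / 1e6
```

Under this definition, a hypothetical comparison of a 1,000,000-residue sequence against an 8,000,000-residue sequence that finishes in 1000 seconds would correspond to 8000 MCUPS.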

Citations

Journal Article•

Toward a conceptual and operational union of bacterial systematics, ecology, and evolution

Frederick M. Cohan

01 Jan 2006-Proceedings of The Royal Society B: Biological Sciences

17 citations

Proceedings Article•DOI•

Exact pairwise alignment of megabase genome biological sequences using a novel z-align parallel strategy

Azzedine Boukerche¹, Rodolfo Bezerra Batista², Alba Cristina Magalhaes Alves de Melo¹

University of Ottawa¹, University of Brasília²

23 May 2009

TL;DR: This paper evaluates the performance and sensibility of z-align, a parallel exact strategy that runs in user-restricted memory space and shows that two sequences of size 23MBP and 24MBP were successfully aligned with z-align.

Abstract: Pairwise Sequence Alignment is a basic operation in Bioinformatics that is performed thousands of times on a daily basis. The exact methods proposed in the literature have quadratic time complexity. For this reason, heuristic methods such as BLAST are widely used. Nevertheless, it is known that exact methods present better sensitivity, leading to better results. To obtain exact results faster, many parallel strategies have been proposed but most of them fail to align huge biological sequences. This happens because not only the quadratic time must be considered but also the space should be reduced. In this paper, we evaluate the performance and sensibility of z-align, a parallel exact strategy that runs in user-restricted memory space. The results obtained in a 64-processor cluster show that two sequences of size 23MBP (Mega Base Pairs) and 24MBP, respectively, were successfully aligned with z-align. Also, in order to align two 3MBP sequences, a speedup of 34.35 was achieved. Finally, when comparing z-align with BLAST, we can see that the z-align alignments are longer and have a higher score.

10 citations

Proceedings Article•

Performance study of parallel hybrid multiple pattern matching algorithms for biological sequences

Charalampos S. Kouzinopoulos¹, Panagiotis D. Michailidis², Konstantinos G. Margaritis¹

University of Macedonia¹, University of Western Macedonia²

01 Jan 2012

TL;DR: Experimental results and a theoretical performance model of the hybrid implementations of the Commentz-Walter, Wu-Manber, Set Backward Oracle Matching and the Salmela-Tarhio-Kytöjoki family of multiple pattern matching algorithms when executed in parallel on biological sequence databases are presented.

Abstract: Multiple pattern matching is widely used in computational biology to locate any number of nucleotides in genome databases. Processing data of this size often requires more computing power than a sequential computer can provide. A viable and cost-effective solution that can offer the power required by computationally intensive applications at low cost is to share computational tasks among the processing nodes of a high performance hybrid distributed and shared memory platform that consists of cluster workstations and multi-core processors. This paper presents experimental results and a theoretical performance model of the hybrid implementations of the Commentz-Walter, Wu-Manber, Set Backward Oracle Matching and the Salmela-Tarhio-Kytöjoki family of multiple pattern matching algorithms when executed in parallel on biological sequence databases.

6 citations

Cites background from "Whole Genome Comparison on a Networ..."

...alignment have been presented in the research literature for distributed memory platforms (Li, 2003), (Li and Chen, 2005), (Boukerche et al., 2007), (Jacob et al., 2007) and for shared memory platforms (Cuvillo et al., 2003), (Chaichoompu et al., 2006), (Rashid et al., 2007), (Zomaya, 2006)....

Journal Article•DOI•

Exact parallel alignment of megabase genomic sequences with tunable work distribution

Azzedine Boukerche¹, Rodolfo Bezerra Batista², Alba Cristina Magalhaes Alves de Melo², Felipe Brandt Scarel², Lavir Antonio Bahia Carvalho De Souza²

University of Ottawa¹, University of Brasília²

06 Apr 2012-International Journal of Foundations of Computer Science

TL;DR: Evaluation of the performance of Z-align, a parallel exact strategy that runs in user-restricted memory space, and the evaluation of the work distribution mechanism shows that the execution times can be sensibly reduced when appropriate parameters are chosen.

Abstract: Sequence Alignment is a basic operation in Bioinformatics that is performed thousands of times on a daily basis. The exact methods for pairwise alignment have quadratic time complexity. For this reason, heuristic methods such as BLAST are widely used. To obtain exact results faster, parallel strategies have been proposed but most of them fail to align huge biological sequences. This happens because not only the quadratic time must be considered but also the space should be reduced. In this paper, we evaluate the performance of Z-align, a parallel exact strategy that runs in user-restricted memory space. Also, we propose and evaluate a tunable work distribution mechanism. The results obtained in two clusters show that two sequences of size 24MBP (Mega Base Pairs) and 23MBP, respectively, were successfully aligned with Z-align. Also, in order to align two 3MBP sequences, a speedup of 34.35 was achieved for 64 processors. The evaluation of our work distribution mechanism shows that the execution times can be sensibly reduced when appropriate parameters are chosen. Finally, when comparing Z-align with BLAST, it is clear that, in many cases, Z-align is able to produce alignments with higher score.

2 citations

Dissertation•

Needleman-Wunsch and Smith-Waterman implementation for Spam/UCE inline filter

Ming Thong Chiew

01 Jan 2011

TL;DR: Two algorithms, Needleman-Wunsch and Smith-Waterman, will be implemented on FPGA as a spam detection engine, and the corpus from the Text Retrieval Conference will be used to test the effectiveness of the anti-spam engines.

Abstract: Spam has been a significant problem as it consumes internet bandwidth, wastes users' time, wastes the computational resources of internet service providers, and reduces the efficiency of email as a means of communication. Despite the various anti-spam solutions introduced, spam mails tend to avoid detection by slightly modifying their spam signature. This prevents anti-spam solutions from successfully detecting the keywords in emails that are closely associated with spam. Two algorithms, Needleman-Wunsch and Smith-Waterman, will be implemented on FPGA as a spam detection engine. Both algorithms originate in the theory of dynamic programming and are normally used in bioinformatics for sequence alignment. As both are well known for their ability to detect sequences with slight changes caused by mutation, these two algorithms will be used to detect spam messages that slightly change their spam keywords. FPGA has been selected as the device for implementation. As hardware is faster than software, using FPGA helps to reduce the scanning time and the CPU load of the computer. Advances in FPGA technology help make it capable of becoming a standalone scanning unit. The effectiveness of both algorithms in spam scanning will be examined. The corpus from the Text Retrieval Conference (TREC 2007) will be used to test the effectiveness of the anti-spam engines.

2 citations

References

Journal Article•DOI•

A general method applicable to the search for similarities in the amino acid sequence of two proteins

Saul B. Needleman¹, Christian D. Wunsch¹

Northwestern University¹

28 Mar 1970-Journal of Molecular Biology

TL;DR: A computer adaptable method for finding similarities in the amino acid sequences of two proteins has been developed and it is possible to determine whether significant homology exists between the proteins to trace their possible evolutionary development.

11,844 citations

"Whole Genome Comparison on a Networ..." refers background in this paper

...Initially, Needleman and Wunsch [5] and Sellers [6] introduced the global alignment algorithm based on the dynamic programming approach....

Journal Article•DOI•

Identification of common molecular subsequences.

Temple F. Smith¹, Michael S. Waterman²

Northern Michigan University¹, Los Alamos National Laboratory²

25 Mar 1981-Journal of Molecular Biology

TL;DR: This letter extends the heuristic homology algorithm of Needleman & Wunsch (1970) to find a pair of segments, one from each of two long sequences, such that there is no other pair of segments with greater similarity (homology).

10,262 citations

"Whole Genome Comparison on a Networ..." refers background in this paper

...Their goal is to identify closely related genomic sequences assuming that high degree of similarity in genome sequences may imply similarity of functional or structural characteristics....

Journal Article•DOI•

An improved algorithm for matching biological sequences

Osamu Gotoh

15 Dec 1982-Journal of Molecular Biology

TL;DR: The algorithm of Waterman et al. (1976) for matching biological sequences was modified under some limitations to be accomplished in essentially $MN$ steps, instead of the $M^2 N$ steps necessary in the original algorithm.

1,760 citations

"Whole Genome Comparison on a Networ..." refers background in this paper

...In computational practice, an alignment score between the query sequence and sequences in a database are calculated to assess similarity....
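
The reduced-step recurrence attributed to Gotoh above is commonly written with three matrices when gaps carry an affine penalty. The sketch below is a textbook-style illustration of that idea for global alignment scores, not code from the cited paper. The function name, the scoring values, and the convention that a gap of length k costs gap_open + k * gap_extend are assumptions.

```python
NEG_INF = float("-inf")


def affine_global_score(a, b, match=1, mismatch=-1, gap_open=2, gap_extend=1):
    """Global alignment score with affine gaps, one constant-time update per cell.

    D[i][j]: best score for a[:i] versus b[:j]
    P[i][j]: best score when a[i-1] is aligned to a gap
    Q[i][j]: best score when b[j-1] is aligned to a gap
    """
    m, n = len(a), len(b)
    D = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    P = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    Q = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    D[0][0] = 0
    for i in range(1, m + 1):  # leading gap in b
        D[i][0] = -(gap_open + i * gap_extend)
    for j in range(1, n + 1):  # leading gap in a
        D[0][j] = -(gap_open + j * gap_extend)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # Either open a new gap or extend the one already open.
            P[i][j] = max(D[i - 1][j] - gap_open - gap_extend,
                          P[i - 1][j] - gap_extend)
            Q[i][j] = max(D[i][j - 1] - gap_open - gap_extend,
                          Q[i][j - 1] - gap_extend)
            s = match if a[i - 1] == b[j - 1] else mismatch
            D[i][j] = max(D[i - 1][j - 1] + s, P[i][j], Q[i][j])
    return D[m][n]
```

Tracking the open-gap state in P and Q is what removes the inner scan over all possible gap lengths, so each cell costs constant work.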

Journal Article•DOI•

Optimal alignments in linear space.

Eugene W. Myers¹, Webb Miller²

University of Arizona¹, Pennsylvania State University²

01 Mar 1988-Bioinformatics

TL;DR: The goal of this paper is to give Hirschberg's idea the visibility it deserves by developing a linear-space version of Gotoh's algorithm, which accommodates affine gap penalties.

Abstract: Space, not time, is often the limiting factor when computing optimal sequence alignments, and a number of recent papers in the biology literature have proposed space-saving strategies. However, a 1975 computer science paper by Hirschberg presented a method that is superior to the new proposals, both in theory and in practice. The goal of this paper is to give Hirschberg's idea the visibility it deserves by developing a linear-space version of Gotoh's algorithm, which accommodates affine gap penalties. A portable C-software package implementing this algorithm is available on the BIONET free of charge.

1,513 citations

Journal Article•DOI•

On the Theory and Computation of Evolutionary Distances

Peter H. Sellers

01 Jan 1974-Siam Journal on Applied Mathematics

TL;DR: The algorithm, introduced here, lends itself to computer programming and provides a method to compute evolutionary distance which is shorter than the other methods currently in use.

Abstract: This paper gives a formal definition of the biological concept of evolutionary distance and an algorithm to compute it. For any set S of finite sequences of varying lengths this distance is a real-valued function on $S \times S$, and it is shown to be a metric under conditions which are wide enough to include the biological application. The algorithm, introduced here, lends itself to computer programming and provides a method to compute evolutionary distance which is shorter than the other methods currently in use.

523 citations
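
Sellers' evolutionary distance, as summarized above, is a metric on sequences computed by dynamic programming. As a rough illustration only, the sketch below computes the unit-cost edit distance (Levenshtein distance), which can be viewed as a simple special case. The unit costs are an assumption made here, the cost scheme in the cited paper is more general, and the function name is hypothetical.

```python
def edit_distance(a, b):
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    prev = list(range(len(b) + 1))  # distance from the empty prefix of a
    for i, ca in enumerate(a, 1):
        curr = [i]  # deleting the first i characters of a
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j - 1] + (ca != cb),  # substitute or match
                            prev[j] + 1,               # delete from a
                            curr[j - 1] + 1))          # insert into a
        prev = curr
    return prev[-1]
```

With unit costs the result is symmetric and satisfies the triangle inequality, which is the metric behaviour the abstract describes.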