Showing papers on "De Bruijn graph" published in 2011

Journal Article•DOI•

How to apply de Bruijn graphs to genome assembly

Phillip E. C. Compeau¹, Pavel A. Pevzner¹, Glenn Tesler¹•Institutions (1)

01 Nov 2011-Nature Biotechnology

TL;DR: A mathematical concept known as a de Bruijn graph turns the formidable challenge of assembling a contiguous genome from billions of short sequencing reads into a tractable computational problem.

Abstract: A mathematical concept known as a de Bruijn graph turns the formidable challenge of assembling a contiguous genome from billions of short sequencing reads into a tractable computational problem.
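To make the idea concrete, here is a minimal sketch of de Bruijn graph assembly on toy data. It assumes error-free reads, a single strand, and that each distinct k-mer occurs once in the genome; the function names (`de_bruijn_edges`, `eulerian_path`, `assemble`) are illustrative and not taken from the paper. Each distinct k-mer becomes an edge between its (k-1)-mer prefix and suffix, and an Eulerian path through those edges spells a candidate genome.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Map each (k-1)-mer prefix to the (k-1)-mer suffixes it points to."""
    graph = defaultdict(list)
    seen = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in seen:  # simplifying assumption: one edge per distinct k-mer
                continue
            seen.add(kmer)
            graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm; assumes an Eulerian path exists."""
    in_deg = defaultdict(int)
    for targets in graph.values():
        for v in targets:
            in_deg[v] += 1
    # Start where out-degree exceeds in-degree by one, else at any node.
    start = next((u for u in graph if len(graph[u]) - in_deg[u] == 1),
                 next(iter(graph)))
    remaining = {u: list(vs) for u, vs in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())
        else:
            path.append(stack.pop())
    return path[::-1]


def assemble(reads, k):
    """Spell the string given by an Eulerian path through the k-mer edges."""
    path = eulerian_path(de_bruijn_edges(reads, k))
    return path[0] + "".join(node[-1] for node in path[1:])
```

When a (k-1)-mer repeats, several Eulerian paths may exist, so the spelled string can differ from the original genome while still containing exactly the same k-mers.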

623 citations

Proceedings Article•DOI•

MetaVelvet: an extension of Velvet assembler to de novo metagenome assembly from short sequence reads

Toshiaki Namiki¹, Tsuyoshi Hachiya¹, H. Tanaka¹, Yasubumi Sakakibara¹•Institutions (1)

Keio University¹

01 Aug 2011

TL;DR: MetaVelvet generated higher N50 scores and smaller chimeric scaffolds than any of the compared single-genome assemblers, produced scaffolds of a quality comparable to separate Velvet assemblies of reads from isolated species, and reconstructed even relatively low-coverage genome sequences as scaffolds.

Abstract: Motivation: An important step of "metagenomics" analysis is the assembly of multiple genomes from mixed sequence reads of multiple species in a microbial community. Most conventional pipelines employ a single-genome assembler with carefully optimized parameters and post-process the resulting scaffolds to correct assembly errors. Limitations of using a single-genome assembler for de novo metagenome assembly are that highly conserved sequences shared between different species often cause chimeric contigs, and sequences of highly abundant species are likely to be misidentified as repeats in a single genome, resulting in a number of small fragmented scaffolds. The metagenome assembly problem becomes harder when assembling from very short sequence reads. Method: We modified and extended a single-genome, de Bruijn-graph-based assembler for short reads, known as "Velvet" [27], to metagenome assembly of mixed short reads of multiple species; the result is called "MetaVelvet". Our fundamental ideas are, first, decomposing the de Bruijn graph constructed from mixed short reads into individual sub-graphs and, second, building scaffolds from each decomposed de Bruijn sub-graph as an isolated species genome. We use two features, graph connectivity and coverage (abundance) difference, to decompose the de Bruijn graph. Results: On simulated datasets, MetaVelvet generated higher N50 scores and smaller chimeric scaffolds than any of the compared single-genome assemblers, produced scaffolds of a quality comparable to separate Velvet assemblies of reads from isolated species, and reconstructed even relatively low-coverage genome sequences as scaffolds. On a real dataset of human gut microbial reads, MetaVelvet produced longer scaffolds, increased the number of predicted genes, and improved phylum-level taxonomic assignment in the sense that the rate of predicted genes that cannot be assigned to any taxonomy is reduced. Availability: The source code of MetaVelvet is freely available at http://metavelvet.dna.bio.keio.ac.jp under the GNU General Public License.
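Since the results are reported as N50 scores, a short sketch of how that statistic is commonly computed may help. It uses the usual definition (the largest length L such that sequences of length at least L cover half of the total assembled length); the function name `n50` is illustrative, and the paper may apply the statistic to scaffolds with its own conventions.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input
```

For lengths 2 through 10, the total is 54 and the three longest sequences (10, 9, 8) already cover 27 bases, so the N50 is 8.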

218 citations

Journal Article•DOI•

Paired de Bruijn graphs: a novel approach for incorporating mate pair information into genome assemblers.

Paul Medvedev¹, Son Pham¹, Mark Chaisson², Glenn Tesler¹, Pavel A. Pevzner¹•Institutions (2)

University of California, San Diego¹, Pacific Biosciences²

14 Oct 2011-Journal of Computational Biology

TL;DR: This paper introduces the paired de Bruijn graph, a generalization of the de Bruijn graph that incorporates mate pair information into the graph structure itself rather than analyzing mate pairs in a post-processing step, with the aim of improving contig sizes in assembly.

Abstract: The recent proliferation of next generation sequencing with short reads has enabled many new experimental opportunities but, at the same time, has raised formidable computational challenges in genome assembly. One of the key advances that has led to an improvement in contig lengths has been mate pairs, which facilitate the assembly of repeating regions. Mate pairs have been algorithmically incorporated into most next generation assemblers as various heuristic post-processing steps to correct the assembly graph or to link contigs into scaffolds. Such methods have allowed the identification of longer contigs than would be possible with single reads; however, they can still fail to resolve complex repeats. Thus, improved methods for incorporating mate pairs will have a strong effect on contig length in the future. Here, we introduce the paired de Bruijn graph, a generalization of the de Bruijn graph that incorporates mate pair information into the graph structure itself instead of analyzing mate pairs at a post-processing step. This graph has the potential to be used in place of the de Bruijn graph in any de Bruijn graph based assembler, maintaining all other assembly steps such as error-correction and repeat resolution. Through assembly results on simulated perfect data, we argue that this can effectively improve the contig sizes in assembly.
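One way to picture the construction is the following sketch, which assumes an idealized setting: error-free data and mate pairs whose two k-mers are separated by an exactly known gap of `d` bases. The helper names and the gap convention are assumptions made for illustration, not the paper's notation. Each paired k-mer (a|b) becomes an edge from the pair of prefixes to the pair of suffixes.

```python
from collections import defaultdict


def paired_kmers(genome, k, d):
    """List (a, b) pairs: k-mer a at position i, k-mer b starting d bases after a ends."""
    span = 2 * k + d
    return [(genome[i:i + k], genome[i + k + d:i + span])
            for i in range(len(genome) - span + 1)]


def paired_de_bruijn(pairs):
    """Edge (a|b) runs from (prefix(a), prefix(b)) to (suffix(a), suffix(b))."""
    graph = defaultdict(list)
    for a, b in pairs:
        graph[(a[:-1], b[:-1])].append((a[1:], b[1:]))
    return graph
```

Because a node is labeled by two (k-1)-mers rather than one, two occurrences of the same repeat are merged only if their mates also agree, which is why repeats that tangle the ordinary de Bruijn graph can stay separate here.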

81 citations

Journal Article•DOI•

PE-Assembler

Pramila Nuwantha Ariyaratne¹, Wing-Kin Sung¹•Institutions (1)

National University of Singapore¹

01 Jan 2011-Bioinformatics

TL;DR: This paper presents a method that eschews the traditional graph-based approach in favor of a simple 3′ extension approach, which has the potential to be massively parallelized and obtains assemblies that are more contiguous, more complete and less error-prone than those of existing methods.

Abstract: Motivation: Many de novo genome assemblers have been proposed recently. Most existing methods rely on the de Bruijn graph: a complex graph structure that attempts to encompass the entire genome. Such graphs can be prohibitively large, may fail to capture subtle information and are difficult to parallelize. Result: We present a method that eschews the traditional graph-based approach in favor of a simple 3′ extension approach that has the potential to be massively parallelized. Our results show that it is able to obtain assemblies that are more contiguous, more complete and less error-prone than those of existing methods. Availability: The software package can be found at http://www.comp.nus.edu.sg/~bioinfo/peasm/. Alternatively, it is available from the authors upon request. Contact: [email protected]; [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.

47 citations

Book Chapter•DOI•

Self-stabilizing De Bruijn networks

Andréa W. Richa¹, Christian Scheideler², Phillip Stevens¹•Institutions (2)

Arizona State University¹, University of Paderborn²

10 Oct 2011

TL;DR: This paper presents a dynamic overlay network based on the De Bruijn graph, the Linearized De Bruijn (LDB) network, and shows that a simple local-control algorithm can recover the LDB network from any weakly connected network topology.

Abstract: This paper presents a dynamic overlay network based on the De Bruijn graph which we call Linearized De Bruijn (LDB) network. The LDB network has the advantage that it has a guaranteed constant node degree and that the routing between any two nodes takes at most O(log n) hops with high probability. Also, we show that there is a simple local-control algorithm that can recover the LDB network from any network topology that is weakly connected.
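The logarithmic hop count comes from the shift structure of the underlying graph. The sketch below shows classical shift routing on the static binary de Bruijn graph B(2, n), where nodes are n-bit integers and N = 2^n; it is not the LDB protocol itself, and `de_bruijn_route` is an illustrative name. Each hop shifts the current label left by one bit and appends the next bit of the destination, so at most n = log2 N hops are needed.

```python
def de_bruijn_route(src, dst, n):
    """Return the list of nodes on a shift route from src to dst in B(2, n)."""
    mask = (1 << n) - 1
    # Find the longest suffix of src that already equals a prefix of dst.
    overlap = 0
    for length in range(n, 0, -1):
        if (src & ((1 << length) - 1)) == (dst >> (n - length)):
            overlap = length
            break
    path = [src]
    node = src
    # Shift in the remaining low-order bits of dst, most significant first.
    for i in range(n - overlap - 1, -1, -1):
        bit = (dst >> i) & 1
        node = ((node << 1) & mask) | bit
        path.append(node)
    return path
```

For example, routing from 000 to 111 in B(2, 3) visits 000, 001, 011, 111, while routing from 011 to 110 takes a single hop because the suffix 11 of the source is already a prefix of the destination.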

39 citations

Book Chapter•DOI•

Paired de Bruijn graphs: a novel approach for incorporating mate pair information into genome assemblers

Paul Medvedev¹, Son Pham¹, Mark Chaisson², Glenn Tesler¹, Pavel A. Pevzner¹•Institutions (2)

University of California, San Diego¹, Pacific Biosciences²

28 Mar 2011

TL;DR: This paper introduces the paired de Bruijn graph, a generalization of the de Bruijn graph that incorporates mate pair information into the graph structure itself rather than analyzing mate pairs in a post-processing step, and argues that this can effectively improve contig sizes in assembly.

Abstract: The recent proliferation of next generation sequencing with short reads has enabled many new experimental opportunities but, at the same time, has raised formidable computational challenges in genome assembly. One of the key advances that has led to an improvement in contig lengths has been mate pairs, which facilitate the assembly of repeating regions. Mate pairs have been algorithmically incorporated into most next generation assemblers as various heuristic post-processing steps to correct the assembly graph or to link contigs into scaffolds. Such methods have allowed the identification of longer contigs than would be possible with single reads; however, they can still fail to resolve complex repeats. Thus, improved methods for incorporating mate pairs will have a strong effect on contig length in the future. Here, we introduce the paired de Bruijn graph, a generalization of the de Bruijn graph that incorporates mate pair information into the graph structure itself instead of analyzing mate pairs at a post-processing step. This graph has the potential to be used in place of the de Bruijn graph in any de Bruijn graph based assembler, maintaining all other assembly steps such as error-correction and repeat resolution. Through assembly results on simulated error-free data, we argue that this can effectively improve the contig sizes in assembly.

37 citations

Journal Article•DOI•

On extending de Bruijn sequences

Verónica Becher¹, Pablo Ariel Heiber¹•Institutions (1)

Facultad de Ciencias Exactas y Naturales¹

01 Sep 2011-Information Processing Letters

TL;DR: A complete proof of the following theorem is given: every de Bruijn sequence of order n in at least three symbols can be extended to a de Bruijn sequence of order n+1.

32 citations

Journal Article•DOI•

A recursive construction of nonbinary de Bruijn sequences

Abbas Alhakim¹, Mufutau Akinwande²•Institutions (2)

American University of Beirut¹, Clarkson University²

01 Aug 2011-Designs, Codes and Cryptography

TL;DR: This method generalizes the Lempel construction of binary de Bruijn sequences, as well as its efficient implementation by Annexstein, and obtains an exponentially large class of distinct de Bruijn cycles.

Abstract: This paper presents a method to find new de Bruijn sequences based on ones of lesser order. This is done by mapping a de Bruijn cycle to several vertex-disjoint cycles in a de Bruijn digraph of higher order and then connecting these cycles into one full cycle. We present precise formulae for the locations where those cycles can be rejoined into one full cycle. We obtain an exponentially large class of distinct de Bruijn cycles. This method generalizes the Lempel construction of binary de Bruijn sequences as well as its efficient implementation by Annexstein.

21 citations

Journal Article•DOI•

Low Latency and Energy Efficient Scalable Architecture for Massive NoCs Using Generalized de Bruijn Graph

Mohammad Hosseinabady¹, Mohammad Reza Kakoee², Jimson Mathew¹, Dhiraj K. Pradhan¹•Institutions (2)

University of Bristol¹, University of Bologna²

01 Aug 2011-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: The generalized binary de Bruijn (GBDB) graph is proposed as a reliable and efficient network topology for a large NoC and a reliable routing algorithm to detour a faulty channel between two adjacent switches is proposed.

Abstract: Employing thousands of cores in a single chip is the natural trend to handle the ever increasing performance requirements of complex applications such as those used in graphics and multimedia processing. System-on-chips (SoCs) platforms based on network-on-chips (NoCs) could be a viable option for the deployment of large multicore designs with thousands of cores. This paper proposes the generalized binary de Bruijn (GBDB) graph as a reliable and efficient network topology for a large NoC. We propose a reliable routing algorithm to detour a faulty channel between two adjacent switches. In addition, using integer linear programming, we propose an optimal tile-based implementation for a GBDB-based NoC in which the number of channels is less than that of Torus which has the same number of links. Our experimental results show that the latency and energy consumption of the generalized de Bruijn graph are much less than those of Mesh and Torus. The low energy consumption of a de Bruijn graph-based NoC makes it suitable for portable devices which have to operate on limited batteries. Also, the gate level implementation of the proposed reliable routing shows small area, power, and timing overheads due to the proposed reliable routing algorithm.

17 citations

Book Chapter•DOI•

De Bruijn sequences for the binary strings with maximum density

Joe Sawada¹, Brett Stevens², Aaron Williams²•Institutions (2)

University of Guelph¹, Carleton University²

18 Feb 2011

TL;DR: This paper efficiently generates maximum-density de Bruijn sequences for all values of n and m; when n = 2m+1, the result is a "complement-free de Bruijn sequence", a circular binary string that contains each binary string of length n or its complement exactly once as a substring.

Abstract: A de Bruijn sequence is a circular binary string of length $2^n$ that contains each binary string of length n exactly once as a substring. A maximum-density de Bruijn sequence is a circular binary string of length $\binom{n}{0}+\binom{n}{1}+\binom{n}{2}+\cdots+\binom{n}{m}$ that contains each binary string of length n with density (number of 1s) between 0 and m, inclusively. In this paper we efficiently generate maximum-density de Bruijn sequences for all values of n and m. An interesting special case occurs when n = 2m+1. In this case our result is a "complement-free de Bruijn sequence" since it is a circular binary string of length $2^{n-1}$ that contains each binary string of length n or its complement exactly once as a substring.

14 citations

Book Chapter•DOI•

T-IDBA: a de novo iterative de Bruijn graph assembler for transcriptome

Yu Peng¹, Henry C. M. Leung¹, Siu-Ming Yiu¹, Francis Y. L. Chin¹•Institutions (1)

University of Hong Kong¹

28 Mar 2011

TL;DR: This work proposes the T-IDBA algorithm, a de novo transcriptome assembler that substantially outperforms ABySS in terms of sensitivity and precision for both simulated and real data.

Abstract: RNA-seq data produced by next-generation sequencing technology is a useful tool for analyzing transcriptomes. However, existing de novo transcriptome assemblers do not fully utilize the properties of transcriptomes and may produce short contigs because of the splicing nature (shared exons) of the genes. We propose the T-IDBA algorithm to reconstruct expressed isoforms without a reference genome. By using paired-end information to solve the problem of long repeats in different genes and branching in the same gene due to alternative splicing, the graph can be decomposed into small components, each corresponding to a gene. The most likely isoforms with sufficient support from the paired-end reads are then found heuristically. In practice, our de novo transcriptome assembler, T-IDBA, substantially outperforms ABySS in terms of sensitivity and precision for both simulated and real data. T-IDBA is available at http://www.cs.hku.hk/~alse/tidba/

Patent•

Method for assembling genome

Ruiqiang Li, Wang Jun, Jian Wang, Li Songgang, Huanming Yang, Hongmei Zhu, Jue Ruan

29 Jun 2011

TL;DR: The method slides a cutting window along each base of the received order-checking sequence to obtain short strings of a fixed base length together with their left and right connecting relations; stores the sequence value of each short string, its left and right connecting relations and a connection number as a node of a de Bruijn graph; and assembles a genome based on the constructed de Bruijn graph.

Abstract: The invention is applicable to the technical field of gene engineering, and provides a method for assembling a genome. The method comprises the following steps: receiving an order-checking sequence; carrying out sliding cutting on each base of the received order-checking sequence to obtain a short string with a fixed base length and a left and right connecting relation of the short string; storing a sequence value of the obtained short string, the left and right connecting relation and a connection number as a node of a de Bruijn graph; and assembling a genome based on the constructed de Bruijn graph. In the invention, the method for assembling a genome can be realized by slidingly cutting the bases of the received order-checking sequence one by one to obtain the short string with the fixed base length and the left and right connecting relation of the short string, and storing the sequence value of the obtained short string, the left and right connecting relation and the connection number as the node of the de Bruijn graph. The method can assemble a large genome with a small memory footprint and at high speed.

De Bruijn graphs and their applications to fault tolerant networks

Joel Baker

16 Dec 2011

TL;DR: This expository paper explores the properties of Hamiltonian and Eulerian cycles on De Bruijn graphs and the type of redundancy that results, as a basis for studying fault tolerance.

Abstract: The goal of this expository paper is to introduce De Bruijn graphs and discuss their applications to fault tolerant networks. We will begin by examining N.G. de Bruijn's original paper and the proof of his claim that there are exactly $2^{2^{n-1}-n}$ De Bruijn cycles in the binary De Bruijn graph B(2, n). In order to study fault tolerance we explore the properties of Hamiltonian and Eulerian cycles that occur on De Bruijn graphs and the type of redundancy that occurs as a result. Lastly, in this paper we seek to provide some guidance into further research on De Bruijn graphs and their potential applications to other areas.
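The cycle count can be checked by brute force for small n. The sketch below is an illustrative check, not de Bruijn's proof: it counts Hamiltonian cycles in B(2, n), treating nodes as n-bit integers and fixing node 0 as the starting point so that each cycle is counted once. The result can be compared with the closed form 2^(2^(n-1) - n). The function name is an assumption made for the example.

```python
def count_de_bruijn_cycles(n):
    """Count Hamiltonian cycles in the binary de Bruijn graph B(2, n)."""
    size = 1 << n
    mask = size - 1
    visited = [False] * size

    def walk(node, seen):
        if seen == size:
            # The cycle closes only if node 0 is a successor of the last node.
            return 1 if (node << 1) & mask == 0 else 0
        total = 0
        for bit in (0, 1):
            nxt = ((node << 1) & mask) | bit
            if not visited[nxt]:
                visited[nxt] = True
                total += walk(nxt, seen + 1)
                visited[nxt] = False
        return total

    visited[0] = True
    return walk(0, 1)
```

The search is exponential, so it is only practical for very small n, which is enough to compare against the formula for n up to about 5.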

Journal Article•DOI•

Bounds on Feedback Numbers of de Bruijn Graphs

Xirong Xu, Jun-Ming Xu, Yongchang Cao

06 Jan 2011-Taiwanese Journal of Mathematics

TL;DR: This paper gives an upper bound on the feedback number f(d,n) of the de Bruijn graph UB(d,n) for d ≥ 3, where the feedback number of a graph is the minimum number of vertices whose removal results in an acyclic subgraph.

Abstract: The feedback number of a graph $G$ is the minimum number of vertices whose removal from $G$ results in an acyclic subgraph. We use $f(d,n)$ to denote the feedback number of the de Bruijn graph $UB(d,n)$. R. Královič and P. Ružička [Minimum feedback vertex sets in shuffle-based interconnection networks. Information Processing Letters, 86 (4) (2003), 191-196] proved that $f(2,n)=\lceil \frac{2^{n}-2}{3}\rceil$. This paper gives the upper bound on $f(d,n)$ for $d\ge 3$, that is, $f(d,n)\leq d^n\left(1-\left(\frac{d}{1+d}\right)^{d-1}\right)+\binom{n+d-2}{d-2}$.
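To get a feel for the two formulas quoted in the abstract, the sketch below simply evaluates them with exact arithmetic. The function names are illustrative; the code computes the stated expressions and does not search for feedback vertex sets.

```python
from fractions import Fraction
from math import comb


def feedback_number_binary(n):
    """Exact value f(2, n) = ceil((2^n - 2) / 3), as quoted in the abstract."""
    return -((-(2 ** n - 2)) // 3)  # integer ceiling division


def feedback_upper_bound(d, n):
    """Upper bound d^n * (1 - (d / (1 + d))^(d - 1)) + C(n + d - 2, d - 2) for d >= 3."""
    shrink = Fraction(d, 1 + d) ** (d - 1)
    return d ** n * (1 - shrink) + comb(n + d - 2, d - 2)
```

For d = 3 and n = 2 the bound evaluates to 9 · (1 − 9/16) + 3 = 111/16, so the integer-valued feedback number f(3, 2) is at most 6 on a graph with 9 vertices.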

Posted Content•

SparseAssembler: de novo Assembly with the Sparse de Bruijn Graph

Chengxi Ye, Zhanshan Sam Ma, Charles H. Cannon, Mihai Pop, Douglas W. Yu

14 Jun 2011-arXiv: Data Structures and Algorithms

TL;DR: A sparse de Bruijn graph-based denoising algorithm that can remove more than 99% of substitution errors from datasets with a ≤ 2% error rate is developed, and a novel Dijkstra-like breadth-first search algorithm is introduced to circumvent residual errors and resolve polymorphisms.

Abstract: De Bruijn graph-based algorithms are one of the two most widely used approaches for de novo genome assembly. A major limitation of this approach is the large computational memory space requirement to construct the de Bruijn graph, which scales with k-mer length and total diversity (N) of unique k-mers in the genome expressed in base pairs, or roughly (2k+8)N bits. This limitation is particularly important with large-scale genome analysis and for sequencing centers that simultaneously process multiple genomes. We present a sparse de Bruijn graph structure, based on which we developed SparseAssembler, which greatly reduces memory space requirements. The structure also allows us to introduce a novel method for the removal of substitution errors introduced during sequencing. The sparse de Bruijn graph structure skips g intermediate k-mers, therefore reducing the theoretical memory space requirement to ~(2k/g+8)N. We have found that a practical value of g=16 consumes approximately 10% of the memory required by standard de Bruijn graph-based algorithms but yields comparable results. A high error rate could potentially derail the SparseAssembler. Therefore, we developed a sparse de Bruijn graph-based denoising algorithm that can remove more than 99% of substitution errors from datasets with a ≤ 2% error rate. Given that substitution error rates for the current generation of sequencers are lower than 1%, our denoising procedure is sufficiently effective to safeguard the performance of our algorithm. Finally, we also introduce a novel Dijkstra-like breadth-first search algorithm for the sparse de Bruijn graph structure to circumvent residual errors and resolve polymorphisms.
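The two memory estimates in the abstract can be compared directly. The sketch below evaluates the formulas exactly as stated, (2k+8)N bits for the standard graph and roughly (2k/g+8)N bits for the sparse one; the function names are illustrative, and real memory use depends on implementation details that the formulas do not capture.

```python
def dense_bits(k, n_kmers):
    """Estimated bits for a standard de Bruijn graph: (2k + 8) * N."""
    return (2 * k + 8) * n_kmers


def sparse_bits(k, g, n_kmers):
    """Estimated bits for the sparse graph that skips g k-mers: (2k/g + 8) * N."""
    return (2 * k / g + 8) * n_kmers


def memory_ratio(k, g):
    """Sparse estimate as a fraction of the dense estimate (N cancels out)."""
    return (2 * k / g + 8) / (2 * k + 8)
```

Under these formulas the ratio shrinks as k grows, since the 8N term is unaffected by g; for example, `memory_ratio(32, 16)` is 12/72, about 17%.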

Proceedings Article•DOI•

Evolutionary construction of de Bruijn sequences

Meltem Sönmez Turan¹•Institutions (1)

National Institute of Standards and Technology¹

21 Oct 2011

TL;DR: A new randomized construction method based on genetic algorithms is proposed for binary de Bruijn sequences of order n, which are cyclic sequences of period $2^n$ in which each n-bit pattern appears exactly once.

Abstract: A binary de Bruijn sequence of order n is a cyclic sequence of period $2^n$, in which each n-bit pattern appears exactly once. These sequences are commonly used in random number generation and symmetric key cryptography, particularly in stream cipher design, mainly due to their good statistical properties. Constructing de Bruijn sequences is of interest and well studied in the literature. In this study, we propose a new randomized construction method based on genetic algorithms. The method models de Bruijn sequences as a special type of traveling salesman tours and tries to find optimal solutions to this special type of the traveling salesman problem (TSP). We present some experimental results for n ≤ 14.
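The defining property is easy to test in code, which is useful for validating the output of any construction method. The sketch below pairs a checker with the classical greedy "prefer-one" construction as a baseline; this is a different, deterministic technique and not the genetic algorithm proposed in the paper. Function names are illustrative.

```python
def is_de_bruijn(seq, n):
    """True if the cyclic binary string seq contains every n-bit pattern exactly once."""
    if len(seq) != 2 ** n or not set(seq) <= {"0", "1"}:
        return False
    wrapped = seq + seq[:n - 1]  # unroll the cycle so windows can wrap around
    windows = {wrapped[i:i + n] for i in range(len(seq))}
    return len(windows) == 2 ** n


def prefer_one(n):
    """Greedy construction: start with n zeros, append 1 if that gives a new window, else 0."""
    seq = "0" * n
    seen = {seq}
    while True:
        for bit in "10":
            window = seq[len(seq) - n + 1:] + bit
            if window not in seen:
                seen.add(window)
                seq += bit
                break
        else:
            # Neither bit yields a new window; the first 2^n symbols form the cycle.
            return seq[:2 ** n]
```

For n = 3 the greedy construction yields the cyclic sequence 00011101, which the checker accepts.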

Book Chapter•DOI•

Constructing orthogonal de Bruijn sequences

Yaw-Ling Lin¹, Charles B. Ward², Bharat Jain², Steven Skiena²•Institutions (2)

Providence College¹, Stony Brook University²

15 Aug 2011

TL;DR: This paper proves that there are at least ⌊σ/2⌋ mutually-orthogonal order-k de Bruijn sequences on alphabets of size σ for all k, and presents a heuristic which proves capable of efficiently constructing optimal collections of mutually-orthogonal sequences for small values of σ and k.

Abstract: A (σ, k)-de Bruijn sequence is a minimum length string on an alphabet set of size σ which contains all $\sigma^k$ k-mers exactly once. Motivated by an application in synthetic biology, we say a given collection of de Bruijn sequences is orthogonal if no two of them contain the same (k + 1)-mer; that is, the length of their longest common substring is k. In this paper, we show how to construct large collections of orthogonal de Bruijn sequences. In particular, we prove that there are at least ⌊σ/2⌋ mutually-orthogonal order-k de Bruijn sequences on alphabets of size σ for all k. Based on this approach, we present a heuristic which proves capable of efficiently constructing optimal collections of mutually-orthogonal sequences for small values of σ and k, which supports our conjecture that σ - 1 mutually-orthogonal de Bruijn sequences exist for all σ and k.
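The orthogonality condition can be checked mechanically. The sketch below treats each sequence as a linear (non-cyclic) string, in line with the "minimum length string" definition, and tests whether two sequences share any (k+1)-mer; the helper names are illustrative, and the construction and heuristic from the paper are not reproduced.

```python
def kmers(s, k):
    """All length-k substrings of the linear string s, in order."""
    return [s[i:i + k] for i in range(len(s) - k + 1)]


def is_linear_de_bruijn(s, alphabet, k):
    """True if s contains each of the sigma^k k-mers over alphabet exactly once."""
    windows = kmers(s, k)
    return (set(s) <= set(alphabet)
            and len(windows) == len(alphabet) ** k
            and len(set(windows)) == len(windows))


def are_orthogonal(a, b, k):
    """True if order-k sequences a and b have no (k+1)-mer in common."""
    return set(kmers(a, k + 1)).isdisjoint(kmers(b, k + 1))
```

With σ = 3 and k = 1, the sequences 012 and 021 are orthogonal (their 2-mers are 01, 12 and 02, 21), whereas 012 and 201 are not, because both contain 01.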

Posted Content•

Efficient tilings of de Bruijn and Kautz graphs

Washington Taylor, Jud Leonard, Lawrence C. Stewart

10 Jan 2011-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: This paper considers the mathematical problem of uniformly tiling a de Bruijn or Kautz graph by a set of identical subgraphs, derives a simple lower bound on the number of edges which must leave each tile, and constructs a class of tilings whose number of edges leaving each tile agrees asymptotically in form with the lower bound to within a constant factor.

Abstract: Kautz and de Bruijn graphs have a high degree of connectivity which makes them ideal candidates for massively parallel computer network topologies. In order to realize a practical computer architecture based on these graphs, it is useful to have a means of constructing a large-scale system from smaller, simpler modules. In this paper we consider the mathematical problem of uniformly tiling a de Bruijn or Kautz graph. This can be viewed as a generalization of the graph bisection problem. We focus on the problem of graph tilings by a set of identical subgraphs. Tiles should contain a maximal number of internal edges so as to minimize the number of edges connecting distinct tiles. We find necessary and sufficient conditions for the construction of tilings. We derive a simple lower bound on the number of edges which must leave each tile, and construct a class of tilings whose number of edges leaving each tile agrees asymptotically in form with the lower bound to within a constant factor. These tilings make possible the construction of large-scale computing systems based on de Bruijn and Kautz graph topologies.

Proceedings Article•DOI•

Node ID assignment in group theoretic graphs for WSNs

Junghun Ryu¹, Jaewook Yu¹, Eric Noel², K. Wendy Tang¹•Institutions (2)

Stony Brook University¹, AT&T Labs²

13 Apr 2011

TL;DR: This paper investigates and presents different node ID assignment algorithms for group-theoretic graphs such as Borel Cayley and de Bruijn graphs and finds that simulated annealing has the best performance, and that all three methods outperform random ID assignment in the authors' simulations.

Abstract: In this paper, we investigate and present different node ID assignment algorithms for group-theoretic graphs such as Borel Cayley and de Bruijn graphs. These graphs have been shown to be effective logical topologies in wireless sensor networks when all the nodes are within communication range of each other. However, in practice a sensor node's communication range is limited and some nodes can be out of range of each other. Under this more realistic scenario, the original theoretic graph cannot be imposed on the network in its entirety. Rather, only partial connections of the original graphs can be imposed on the physical network. Thus, node ID assignment becomes an important issue. An effective assignment allows most connections to be imposed, resulting in a shorter diameter and average path length. We investigate three algorithms: (a) ID swapping assignment, (b) simulated annealing based assignment, and (c) distributed ID swapping assignment. While the first two are centralized algorithms that are appropriate for wireless sensor networks with fixed infrastructure, the latter is efficient for ad hoc WSNs. As expected, being the most computationally intensive, simulated annealing has the best performance, and all three methods outperform random ID assignment in our simulations.

Posted Content•

SparseAssembler2: Sparse k-mer Graph for Memory Efficient Genome Assembly

Chengxi Ye, Charles H. Cannon, Zhanshan Sam Ma, Douglas W. Yu, Mihai Pop¹•Institutions (1)

University of Idaho¹

17 Aug 2011-arXiv: Data Structures and Algorithms

TL;DR: This paper introduces a new general model for genome assembly that uses only sparse k-mers; the model replaces the idea of the de Bruijn graph from the beginning and achieves similar memory efficiency and much better robustness than the previous SparseAssembler1.

Abstract: The formal version of our work has been published in BMC Bioinformatics. Motivation: To tackle the problem of huge memory usage associated with de Bruijn graph-based algorithms, upon which some of the most widely used de novo genome assemblers have been built, we released SparseAssembler1. SparseAssembler1 can save as much as 90% of memory consumption in comparison with state-of-the-art assemblers, but it requires rounds of denoising to accurately assemble genomes. In this paper, we introduce a new general model for genome assembly that uses only sparse k-mers. The new model replaces the idea of the de Bruijn graph from the beginning, and achieves similar memory efficiency and much better robustness compared with our previous SparseAssembler1. Results: We demonstrate that the decomposition of reads into all overlapping k-mers, which is used in existing de Bruijn graph genome assemblers, is overly cautious. We introduce a sparse k-mer graph structure for saving sparse k-mers, which greatly reduces the memory space requirements of de novo genome assembly. In contrast with the de Bruijn graph approach, we devise a simple but powerful strategy, i.e., finding links between the k-mers in the genome and traversing following the links, which can be done by saving only a few k-mers. To implement the strategy, we need only select some k-mers, which may not even overlap, and build the links between these k-mers indicated by the reads. We can traverse this sparse k-mer graph to build the contigs, and ultimately complete the genome assembly. Since the new sparse k-mer graph shares almost all advantages of the de Bruijn graph, we are able to adapt a Dijkstra-like breadth-first search algorithm to circumvent sequencing errors and resolve polymorphisms.

Proceedings Article•DOI•

Heuristic for Routing and Wavelength Assignment in de Bruijn WDM networks based on Graph Decomposition

Monish Chatterjee¹, Akik Goswami¹, Sabyasachi Mukherjee¹, Uma Bhattacharya²•Institutions (2)

Asansol Engineering College¹, Indian Institute of Engineering Science and Technology, Shibpur²

01 Dec 2011

TL;DR: A new static, polynomial-time RWA heuristic, LBGD-RWA (Load Balancing with Graph Decomposition based RWA), is proposed for a special class of WDM networks based on the de Bruijn graph.

Abstract: An important parameter for performance analysis of Routing and Wavelength Assignment (RWA) strategies in WDM networks is blocking probability. Past research has shown that the process in which RWA is carried out significantly affects the wavelength conversion requirements, which in turn affects blocking probability. In this paper we propose a new strategy GDWA (Graph Decomposition based Wavelength Assignment) for static Wavelength Assignment (WA) in a special class of WDM networks which are based on de Bruijn graph. We combine our own request routing strategy LBR (Load Balanced Routing) with the new WA strategy effectively to propose a new static polynomial time RWA heuristic LBGD-RWA (Load Balancing with Graph Decomposition based RWA). We compare the performance of our heuristic with three alternate RWA strategies. Performance comparison reveals that the proposed heuristic gives the best blocking performance.

Journal Article•DOI•

A wavelength assignment algorithm for de Bruijn WDM networks

Monish Chatterjee¹, Swagato Sanyal², Mita Nasipuri³, Uma Bhattacharya⁴•Institutions (4)

Asansol Engineering College¹, Indian Institute of Technology Kanpur², Jadavpur University³, Indian Institute of Engineering Science and Technology, Shibpur⁴

08 Dec 2011-International Journal of Parallel, Emergent and Distributed Systems

TL;DR: This work shows that a de Bruijn graph can be expressed as a union of edge-disjoint rings; wavelengths are assigned to the individual rings, which yields an assignment for the graph itself.

Abstract: This paper proposes an offline wavelength assignment technique for de Bruijn (d, k) optical wavelength division multiplexing (WDM) networks, with nodal degree d and diameter k, that can support lightpaths between pairs of nodes. Each lightpath uses a channel (wavelength) along each link in its route. An efficient algorithm is proposed that assigns wavelengths to lightpath requests in O(k|V|) time, where V is the set of nodes of the de Bruijn network and |V| is the number of nodes. The proposed algorithm can be used efficiently for wavelength assignment in de Bruijn WDM networks that have limited wavelength conversion capabilities. This work shows that a de Bruijn graph can be expressed as a union of edge-disjoint rings. Wavelengths are assigned to the individual rings in the graph, which yields the assignment for the graph itself. Results are shown for a given de Bruijn graph and for an arbitrary request. The proposed algorithm is compared with two other algorithms, which use tw...

Journal Article•DOI•

The Maximum Independent Sets of de Bruijn Graphs of Diameter 3

Dustin Cartwright¹, Maria Angelica Cueto², Enrique A. Tobis³•Institutions (3)

Yale University¹, Columbia University², Harvard University³

03 Oct 2011-Electronic Journal of Combinatorics

TL;DR: An inductive characterization of the maximum independent sets of the de Bruijn graphs and a recurrence relation and an exponential generating function for their number are derived.

Abstract: The nodes of the de Bruijn graph $B(d,3)$ consist of all strings of length $3$, taken from an alphabet of size $d$, with edges between words which are distinct substrings of a word of length $4$. We give an inductive characterization of the maximum independent sets of the de Bruijn graphs $B(d,3)$ and for the de Bruijn graph of diameter three with loops removed, for arbitrary alphabet size. We derive a recurrence relation and an exponential generating function for their number. This recurrence allows us to construct exponentially many comma-free codes of length 3 with maximal cardinality.
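The definition in this abstract can be made concrete with a small brute-force sketch. It builds the loop-free, undirected version of $B(d,n)$ and enumerates its maximum independent sets by exhaustive search. The function names are hypothetical, and the exhaustive search is only an illustration that is practical for very small $d$; it is not the inductive characterization derived in the paper.

```python
from itertools import combinations, product

def de_bruijn_edges(d, n=3):
    """Nodes and undirected, loop-free edges of B(d, n).

    Two distinct words u, v are adjacent when v = u[1:] + (c,) for some
    symbol c, i.e. u and v are the prefix and suffix of a word of length n + 1.
    """
    nodes = list(product(range(d), repeat=n))
    edges = set()
    for u in nodes:
        for c in range(d):
            v = u[1:] + (c,)
            if u != v:  # drop loops such as 000 -> 000
                edges.add(frozenset((u, v)))
    return nodes, edges

def max_independent_sets(d, n=3):
    """Exhaustive search (assumption: only feasible for tiny d)."""
    nodes, edges = de_bruijn_edges(d, n)
    for size in range(len(nodes), 0, -1):
        found = [s for s in combinations(nodes, size)
                 if not any(frozenset(p) in edges for p in combinations(s, 2))]
        if found:
            return size, found
    return 0, []

if __name__ == "__main__":
    size, sets = max_independent_sets(2)
    print(size, len(sets))
```

For $d = 2$ the eight nodes split into two triangles and one edge, which bounds any independent set at three nodes, so the search terminates quickly.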

Scaling short read de novo DNA sequence assembly to gigabase genomes

Jeffrey J. Cook

25 May 2011

Proceedings Article•

PadeNA: A PARALLEL DE NOVO ASSEMBLER

Gaurav Thareja, Vivek Kumar, Michael Zyskowski¹, Simon Mercer¹, Bob Davidson¹•Institutions (1)

Microsoft¹

15 Jul 2011

TL;DR: PadeNA (Parallel de Novo Assembler) is a parallelized DNA sequence assembler with a graphical user interface. It is designed using an interface-driven architecture to facilitate code reusability and extensibility, and is provided as part of the open source Microsoft Biology Foundation.

Abstract: Recent technological advances in DNA sequencing technology are resulting in ever-larger quantities of sequence information being made available to an increasingly broad segment of the scientific and clinical community. This is in turn driving the need for standard, rapid and easy to use tools for genomic reconstruction and analysis. As a step towards addressing this challenge, we present PadeNA (Parallel de Novo Assembler), a parallelized DNA sequence assembler with a graphical user interface. PadeNA is designed using interface-driven architecture to facilitate code reusability and extensibility, and is provided as part of the open source Microsoft Biology Foundation. Installers and documentation are available at http://research.microsoft.com/bio/.

Bachelorproject: Shift Registers and De Bruijn Graphs

Christine van Vredendaal

01 Jan 2011

TL;DR: This essay attempts to create a generalized periodic shift register function that produces a De Bruijn sequence, and discusses the minimal Sum-of-Products and Exclusive-OR-Sum-of-Products boolean functions that produce such sequences.

Abstract: This essay is an attempt to create a generalized periodic shift register function that produces a De Bruijn sequence. To this end we first devise an algorithm to create all De Bruijn sequences. In this algorithm all spanning trees of a De Bruijn graph are created, these trees are converted into Euler paths and finally the De Bruijn sequences are extracted from the Euler paths. Then the focus shifts onto creating the boolean functions that produce these sequences. The minimal Sum-of-Products boolean functions and the Exclusive-OR-Sum-of-Products are discussed. Finally some general properties of the functions are derived, but no general function is found.
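The link between Euler paths in a De Bruijn graph and De Bruijn sequences can be illustrated with a short sketch. Note the swap in technique: the essay enumerates all spanning trees to obtain every sequence, whereas the code here uses Hierholzer's algorithm to find a single Eulerian circuit and therefore a single sequence. The function name and parameters are hypothetical.

```python
def de_bruijn_sequence(k, n):
    """One cyclic De Bruijn sequence over {0, ..., k-1} of order n.

    Nodes are (n-1)-tuples and each edge appends one symbol, so an Eulerian
    circuit visits every length-n word exactly once.
    """
    if n == 1:
        return list(range(k))
    start = (0,) * (n - 1)
    next_symbol = {}              # node -> next unused outgoing symbol
    stack, circuit = [start], []
    while stack:                  # iterative Hierholzer traversal
        node = stack[-1]
        c = next_symbol.get(node, 0)
        if c < k:
            next_symbol[node] = c + 1
            stack.append(node[1:] + (c,))
        else:
            circuit.append(stack.pop())
    circuit.reverse()             # node sequence: start, ..., start
    # Each edge contributes the last symbol of the node it enters.
    return [node[-1] for node in circuit[1:]]

def is_de_bruijn(seq, k, n):
    """Check that every length-n window of the cyclic sequence is distinct."""
    m = len(seq)
    windows = {tuple(seq[(i + j) % m] for j in range(n)) for i in range(m)}
    return m == k ** n and len(windows) == k ** n

if __name__ == "__main__":
    s = de_bruijn_sequence(2, 4)
    print(s, is_de_bruijn(s, 2, 4))
```

Reading the sequence as the output of a shift register, each window of length n is one register state, which is the connection the essay builds on when it turns to boolean feedback functions.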

Characterization of de Bruijn graphs homomorphisms

Akinwande Mufutau

01 Jan 2011

TL;DR: In this paper, the authors study homomorphisms between de Bruijn digraphs of different orders, characterizing those for which the inverse of a factor in the lower order digraph is also a factor in the higher order one, where a factor is a collection of cycles that partition the digraph.

Abstract: We study homomorphisms between de Bruijn digraphs of different orders. A main theme of this paper is to characterize de Bruijn graph homomorphisms such that the inverse of a factor in the lower order digraph is also a factor in the higher order one, where a factor is a collection of cycles that partition the digraph. We generalize Lempel's homomorphism by describing and characterizing a class of homomorphisms between two de Bruijn digraphs of arbitrarily different orders but with the same alphabet, the direction of these functions being of course from the higher order digraph to the lower order one. Finally, we single out the binary case, which due to its simplicity admits a more concise characterization.
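As a point of reference for the binary case, the following sketch checks the homomorphism property for Lempel's map. The abstract does not restate the definition, so the code assumes the usual one from the literature: a binary word $(a_0, \dots, a_{n-1})$ is sent to $(a_0 \oplus a_1, \dots, a_{n-2} \oplus a_{n-1})$. Function names are hypothetical, and the check covers only this single map, not the general class characterized in the paper.

```python
from collections import Counter
from itertools import product

def lempel_d(word):
    """Assumed definition of Lempel's map: XOR of adjacent bits."""
    return tuple(a ^ b for a, b in zip(word, word[1:]))

def successors(word):
    """Out-neighbours of a node in the binary de Bruijn digraph."""
    return [word[1:] + (c,) for c in (0, 1)]

def is_homomorphism(n):
    """True if every edge u -> v of order n maps to an edge of order n - 1."""
    return all(lempel_d(v) in successors(lempel_d(u))
               for u in product((0, 1), repeat=n)
               for v in successors(u))

def preimage_sizes(n):
    """How many order-n words map to each order-(n-1) word."""
    return Counter(lempel_d(w) for w in product((0, 1), repeat=n))

if __name__ == "__main__":
    print(is_homomorphism(4), set(preimage_sizes(4).values()))
```

The direction matches the abstract: the map goes from the higher order digraph to the lower order one, and each image word has exactly two preimages (a word and its bitwise complement).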

Proceedings Article•DOI•

Workshop: Efficient sequential and parallel algorithms for sequence assembly

Sanguthevar Rajasekaran¹, Hieu Dinh¹, Vamsi Kundeti¹•Institutions (1)

University of Connecticut¹

03 Feb 2011

TL;DR: The authors present computationally efficient, sorting-based algorithms for the fundamental bi-directed de Bruijn graph operations; the algorithms are efficient in sequential, out-of-core, and parallel settings.

Abstract: Next Generation Sequence (NGS) assemblers are challenged with the problem of handling a massive number of reads. The bi-directed de Bruijn graph is the most fundamental data structure on which numerous NGS assemblers have been built (e.g. Velvet, ABySS). Most of these assemblers differ only in the heuristics they employ to operate on this de Bruijn graph. These heuristics are composed of several fundamental operations such as construction, compaction and pruning of the underlying bi-directed de Bruijn graph. Unfortunately, the current algorithms for these fundamental operations are computationally inefficient and have become a bottleneck in scaling NGS assemblers. In this talk we discuss some of our recent results, which provide computationally efficient algorithms for these fundamental bi-directed de Bruijn graph operations. Our algorithms [1] are based on sorting and are efficient in sequential, out-of-core, and parallel settings.
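To give a sense of what a sorting-based construction step can look like, here is a minimal sketch. It is an assumption-laden illustration rather than the authors' algorithm: it extracts every (k+1)-mer from the reads, maps each to a canonical form (the lexicographically smaller of the string and its reverse complement, so both strands collapse onto one bi-directed edge), sorts the list, and counts duplicates in one linear scan. All function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(s):
    return "".join(COMPLEMENT[b] for b in reversed(s))

def canonical(s):
    """Strand-independent representative of a sequence."""
    return min(s, reverse_complement(s))

def sorted_edge_counts(reads, k):
    """Return [(canonical (k+1)-mer, multiplicity)] via one sort and one scan."""
    mers = sorted(canonical(r[i:i + k + 1])
                  for r in reads
                  for i in range(len(r) - k))
    counts = []
    for mer in mers:
        if counts and counts[-1][0] == mer:
            counts[-1] = (mer, counts[-1][1] + 1)   # same edge seen again
        else:
            counts.append((mer, 1))
    return counts

if __name__ == "__main__":
    print(sorted_edge_counts(["ACGT", "ACGA"], 2))
```

Because the only global step is a sort, the same idea carries over to external-memory or parallel sorts, which is one plausible reading of why a sorting formulation suits out-of-core and parallel settings.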

Journal Article•DOI•

SparseAssembler2: Sparse k-mer Graph for Memory Efficient Genome Assembly

Chengxi Ye, Charles H. Cannon, Zhanshan Sam Ma, Douglas W. Yu, Mihai Pop¹•Institutions (1)

University of Idaho¹

18 Oct 2011-F1000Research

TL;DR: A new general model for genome assembly that uses only sparse k-mers is introduced, which greatly reduces memory space requirements necessary for de novo genome assembly and adapts a Dijkstra-like breadth-first search algorithm to circumvent sequencing errors and resolve polymorphisms.

Abstract: Motivation: To tackle the problem of huge memory usage associated with de Bruijn graph-based algorithms, upon which some of the most widely used de novo genome assemblers have been built, we released SparseAssembler1. SparseAssembler1 can reduce memory consumption by as much as 90% in comparison with state-of-the-art assemblers, but it requires rounds of denoising to accurately assemble genomes. Algorithmically, we developed an extension of the de Bruijn graph structure, 'sparse de Bruijn graphs', which skips a certain number of intermediate k-mers. In this paper, we introduce a new general model for genome assembly that uses only sparse k-mers. The new model replaces the idea of the de Bruijn graph from the beginning, and achieves similar memory efficiency and much better robustness compared with our previous SparseAssembler1. Results: Based on the sparse k-mer graph model, we develop SparseAssembler2. We demonstrate that the decomposition of reads into all overlapping k-mers, which is used in existing de Bruijn graph genome assemblers, is overly cautious. We introduce a sparse k-mer graph structure for storing sparse k-mers, which greatly reduces the memory required for de novo genome assembly. In contrast with the de Bruijn graph approach, we devise a simple but powerful strategy: finding links between the k-mers in the genome and traversing the graph by following these links, which can be done by saving only a few k-mers. To implement the strategy, we need only select some k-mers, which may not even overlap, and build the links between these k-mers indicated by the reads. We can traverse this sparse k-mer graph to build the contigs and ultimately complete the genome assembly. Since the new sparse k-mer graph shares almost all advantages of the de Bruijn graph, we are able to adapt a Dijkstra-like breadth-first search algorithm for the new sparse k-mer graph in order to circumvent sequencing errors and resolve polymorphisms. Availability: Programs in both Windows and Linux are available at: https://sites.google.com/site/sparseassembler/. Contact: ma@vandals.uidaho.edu or mpop@umiacs.umd.edu
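The strategy of keeping only some k-mers and recording the links between them can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the SparseAssembler2 implementation: the selection rule (keep a k-mer if it is already stored, or if at least `g` bases have passed since the last kept k-mer in the read), the parameter `g`, and the function name are all hypothetical, and reverse complements and sequencing errors are ignored.

```python
def build_sparse_kmer_graph(reads, k, g):
    """Store roughly one k-mer every g bases and link consecutive stored k-mers.

    Returns (nodes, links), where links maps (kmer_a, kmer_b, bridge) to a
    count and kmer_a + bridge spells the read from kmer_a to the end of kmer_b.
    """
    nodes = set()
    links = {}
    for read in reads:
        prev, prev_pos = None, None
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            due = prev is None or i - prev_pos >= g
            if kmer in nodes or due:
                nodes.add(kmer)
                if prev is not None:
                    # Bases that extend prev until kmer is fully covered.
                    bridge = read[prev_pos + k:i + k]
                    key = (prev, kmer, bridge)
                    links[key] = links.get(key, 0) + 1
                prev, prev_pos = kmer, i
    return nodes, links

if __name__ == "__main__":
    nodes, links = build_sparse_kmer_graph(["ACGTACGTAC"], k=3, g=2)
    print(sorted(nodes))
    print(links)
```

Reusing an already stored k-mer whenever one is encountered keeps different reads anchored to the same nodes, so the stored set grows more slowly than the set of all overlapping k-mers; the link counts play the role that edge coverage plays in a de Bruijn graph.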

Journal Article•

Efficient Graph Based Assembly of Short-Read Sequences on Hybrid Core Architecture

Alexander Sczyrba, Abhishek Pratap, Shane Canon, James Han, Alex Copeland, Zhong Wang, Tony Brewer, David Soper, Mike D'Jamoos, Kirby Collins, George Vacek

22 Mar 2011-Lawrence Berkeley National Laboratory

TL;DR: JGI is comparing the performance of Convey's graph constructor and Velvet on both synthetic and real data, and preliminary results on memory usage and run time metrics for various data sets with different sizes are presented.

Abstract: Advanced architectures can deliver dramatically increased throughput for genomics and proteomics applications, reducing time-to-completion in some cases from days to minutes. One such architecture, hybrid-core computing, marries a traditional x86 environment with a reconfigurable coprocessor based on field programmable gate array (FPGA) technology. In addition to higher throughput, increased performance can fundamentally improve research quality by allowing more accurate, previously impractical approaches. We will discuss the approach used by Convey's de Bruijn graph constructor for short-read, de-novo assembly. Bioinformatics applications that have random access patterns to large memory spaces, such as graph-based algorithms, experience memory performance limitations on cache-based x86 servers. Convey's highly parallel memory subsystem allows application-specific logic to simultaneously access 8192 individual words in memory, significantly increasing effective memory bandwidth over cache-based memory systems. Many algorithms, such as Velvet and other de Bruijn graph based, short-read, de-novo assemblers, can greatly benefit from this type of memory architecture. Furthermore, small data type operations (each of the four nucleotides can be represented in two bits) make more efficient use of logic gates than the data types dictated by conventional programming models. JGI is comparing the performance of Convey's graph constructor and Velvet on both synthetic and real data. We will present preliminary results on memory usage and run time metrics for various data sets with different sizes, from small microbial and fungal genomes to a very large cow rumen metagenome. For genomes with references we will also present assembly quality comparisons between the two assemblers.
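The two-bit nucleotide representation mentioned in the abstract can be illustrated with a small software sketch. This is a generic encoding, not Convey's hardware design; the bit assignment (A=0, C=1, G=2, T=3) and the function names are assumptions chosen for the example.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}   # assumed bit assignment
BASES = "ACGT"

def pack_kmer(kmer):
    """Pack a k-mer into an integer, two bits per base, first base most significant."""
    value = 0
    for base in kmer:
        value = (value << 2) | CODE[base]
    return value

def unpack_kmer(value, k):
    """Inverse of pack_kmer for a k-mer of known length k."""
    return "".join(BASES[(value >> (2 * (k - 1 - i))) & 3] for i in range(k))

def roll(value, k, base):
    """Slide the window one base to the right: drop the oldest base, append `base`."""
    mask = (1 << (2 * k)) - 1
    return ((value << 2) | CODE[base]) & mask

if __name__ == "__main__":
    v = pack_kmer("ACGT")
    print(v, unpack_kmer(v, 4), unpack_kmer(roll(v, 4, "A"), 4))
```

With this packing a 31-mer fits in a single 64-bit word, and moving to the next k-mer in a read is a shift, an OR, and a mask, which is the kind of narrow, regular operation that maps well onto custom logic.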
