Journal Article

NextGenMap: Fast and accurate read mapping in highly polymorphic genomes

01 Nov 2013, Bioinformatics (Oxford University Press), Vol. 29, Iss. 21, pp. 2790-2791
TL;DR: NextGenMap is a fast and accurate read mapper that reliably aligns reads to a reference genome even when the sequence difference between the target and reference genomes is large, i.e. in highly polymorphic genomes.
Abstract: Summary: When choosing a read mapper, one faces a trade-off between speed and the ability to map reads in highly polymorphic regions. Here, we report NextGenMap, a fast and accurate read mapper that reduces this dilemma. NextGenMap aligns reads reliably to a reference genome even when the sequence difference between target and reference genome is large, i.e. in highly polymorphic genomes. At the same time, NextGenMap outperforms current mapping methods with respect to runtime and to the number of correctly mapped reads. NextGenMap efficiently uses the available hardware by exploiting multi-core CPUs as well as graphics cards (GPUs), if available. In addition, NextGenMap automatically handles read data of any read length and from any sequencing technology. Availability: NextGenMap source code and documentation are available at: http://cibiv.github.io/NextGenMap/ Contact: fritz.sedlazeck@univie.ac.at Supplementary information: Supplementary data are available at Bioinformatics online.
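
Why high polymorphism is hard for mappers that anchor reads with exact seed matches can be shown with a back-of-the-envelope calculation. This is an illustration added here, not an analysis from the paper: it assumes every base differs from the reference independently with probability d, so an exact k-mer seed survives with probability (1 - d)^k. The seed length (13) and read length (100 bp) are arbitrary choices, and the function names are hypothetical.

```python
def exact_seed_probability(k: int, divergence: float) -> float:
    """Probability that a k-mer contains no difference from the reference,
    assuming each base differs independently with probability `divergence`."""
    if not 0.0 <= divergence <= 1.0:
        raise ValueError("divergence must be in [0, 1]")
    return (1.0 - divergence) ** k


def expected_exact_seeds(read_length: int, k: int, divergence: float) -> float:
    """Expected number of difference-free k-mers among a read's overlapping k-mers.
    Overlapping k-mers are not independent, but the expectation is still additive."""
    n_kmers = read_length - k + 1
    return n_kmers * exact_seed_probability(k, divergence)


if __name__ == "__main__":
    for d in (0.001, 0.05, 0.10):
        print(f"divergence {d:.3f}: "
              f"P(exact 13-mer) = {exact_seed_probability(13, d):.3f}, "
              f"expected exact 13-mers per 100 bp read = {expected_exact_seeds(100, 13, d):.1f}")
```

Under this simple model, about 99% of 13-mers match exactly at 0.1% divergence, but only about 25% do at 10% divergence, so a mapper relying on exact seeds has far fewer anchors to work with in highly polymorphic regions.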

Citations
Journal Article
TL;DR: NGMLR and Sniffles perform highly accurate alignment and structural variation detection from long-read sequencing data and can automatically filter false events and operate on low-coverage data, thereby reducing the high costs that have hindered the application of long reads in clinical and research settings.
Abstract: Structural variations are the greatest source of genetic variation, but they remain poorly understood because of technological limitations. Single-molecule long-read sequencing has the potential to dramatically advance the field, although high error rates are a challenge with existing methods. Addressing this need, we introduce open-source methods for long-read alignment (NGMLR; https://github.com/philres/ngmlr ) and structural variant identification (Sniffles; https://github.com/fritzsedlazeck/Sniffles ) that provide unprecedented sensitivity and precision for variant detection, even in repeat-rich regions and for complex nested events that can have substantial effects on human health. In several long-read datasets, including healthy and cancerous human genomes, we discovered thousands of novel variants and categorized systematic errors in short-read approaches. NGMLR and Sniffles can automatically filter false events and operate on low-coverage data, thereby reducing the high costs that have hindered the application of long reads in clinical and research settings.

1,058 citations

Journal Article
17 Jul 2018, Immunity
TL;DR: A library of congenic tumor cell clones from an autochthonous mouse model of pancreatic adenocarcinoma is established, identifying heterogeneous and multifactorial pathways regulating tumor‐cell‐intrinsic mechanisms that dictate the immune microenvironment and thereby responses to immunotherapy.

453 citations

Journal Article
05 May 2015, eLife
TL;DR: Investigation of DNA methylation variation in Swedish Arabidopsis thaliana accessions grown at two different temperatures finds that accessions from colder regions had higher levels of gene body methylation (GBM) for a significant fraction of the genome, and that this was associated with increased transcription of the affected genes.
Abstract: Epigenome modulation potentially provides a mechanism for organisms to adapt, within and between generations. However, neither the extent to which this occurs, nor the mechanisms involved are known. Here we investigate DNA methylation variation in Swedish Arabidopsis thaliana accessions grown at two different temperatures. Environmental effects were limited to transposons, where CHH methylation was found to increase with temperature. Genome-wide association studies (GWAS) revealed that the extensive CHH methylation variation was strongly associated with genetic variants in both cis and trans, including a major trans-association close to the DNA methyltransferase CMT2. Unlike CHH methylation, CpG gene body methylation (GBM) was not affected by growth temperature, but was instead correlated with the latitude of origin. Accessions from colder regions had higher levels of GBM for a significant fraction of the genome, and this was associated with increased transcription for the genes affected. GWAS revealed that this effect was largely due to trans-acting loci, many of which showed evidence of local adaptation.

427 citations


Cites methods from "NextGenMap: Fast and accurate read ..."

  • …al., 2012) to a modified pseudo-reference chromosome in which SNPs were inserted into the TAIR10 reference genome using NextGenMap (version 0.4.3; Sedlazeck et al., 2013) allowing up to 10% mismatch between the reads (-i 0.90) and the reference sequence and discarding reads that map equally well…
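
The -i 0.90 setting quoted above amounts to requiring roughly 90% identity between a read and the reference. The sketch below shows one simple way to compute alignment identity and apply such a cutoff. The Alignment class and both functions are hypothetical, and identity is defined here as matching columns divided by alignment length, which is an assumption and may differ from how NextGenMap computes it internally.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    read_id: str
    read_aln: str  # aligned read bases; '-' marks a gap
    ref_aln: str   # aligned reference bases; '-' marks a gap


def identity(aln: Alignment) -> float:
    """Fraction of alignment columns in which read and reference carry the same base."""
    if len(aln.read_aln) != len(aln.ref_aln):
        raise ValueError("aligned strings must have equal length")
    if not aln.read_aln:
        return 0.0
    matches = sum(1 for r, g in zip(aln.read_aln, aln.ref_aln) if r == g and r != "-")
    return matches / len(aln.read_aln)


def filter_by_identity(alignments, min_identity=0.90):
    """Split alignments into (kept, rejected) using a minimum-identity cutoff."""
    kept, rejected = [], []
    for aln in alignments:
        (kept if identity(aln) >= min_identity else rejected).append(aln)
    return kept, rejected
```

With a 0.90 cutoff, a 10-column alignment containing one gap (9 of 10 columns matching) is kept, while one with two mismatches (8 of 10) is rejected.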

Journal Article
TL;DR: Copy number variants (CNVs) show a variety of genetic signals consistent with rapid turnover and make substantial contributions to quantitative traits, most notably intracellular amino acid concentrations, growth under stress and sugar utilization in winemaking, whereas rearrangements are strongly associated with reproductive isolation.
Abstract: Large structural variations (SVs) within genomes are more challenging to identify than smaller genetic variants but may substantially contribute to phenotypic diversity and evolution. We analyse the effects of SVs on gene expression, quantitative traits and intrinsic reproductive isolation in the yeast Schizosaccharomyces pombe. We establish a high-quality curated catalogue of SVs in the genomes of a worldwide library of S. pombe strains, including duplications, deletions, inversions and translocations. We show that copy number variants (CNVs) show a variety of genetic signals consistent with rapid turnover. These transient CNVs produce stoichiometric effects on gene expression both within and outside the duplicated regions. CNVs make substantial contributions to quantitative traits, most notably intracellular amino acid concentrations, growth under stress and sugar utilization in winemaking, whereas rearrangements are strongly associated with reproductive isolation. Collectively, these findings have broad implications for evolution and for our understanding of quantitative traits including complex human diseases.

400 citations

Journal Article
TL;DR: A draft genome sequence of mungbean is constructed to facilitate genome research into the subgenus Ceratotropis, which includes several important dietary legumes in Asia, and to enable a better understanding of the evolution of leguminous species.
Abstract: Mungbean (Vigna radiata) is a fast-growing, warm-season legume crop that is primarily cultivated in developing countries of Asia. Here we construct a draft genome sequence of mungbean to facilitate genome research into the subgenus Ceratotropis, which includes several important dietary legumes in Asia, and to enable a better understanding of the evolution of leguminous species. Based on the de novo assembly of additional wild mungbean species, the divergence of what was eventually domesticated and the sampled wild mungbean species appears to have predated domestication. Moreover, the de novo assembly of a tetraploid Vigna species (V. reflexo-pilosa var. glabra) provides genomic evidence of a recent allopolyploid event. The species tree is constructed using de novo RNA-seq assemblies of 22 accessions of 18 Vigna species and protein sets of Glycine max. The present assembly of V. radiata var. radiata will facilitate genome research and accelerate molecular breeding of the subgenus Ceratotropis.

397 citations

References
Journal Article
TL;DR: Burrows-Wheeler Alignment tool (BWA) is implemented, a new read alignment package that is based on backward search with Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps.
Abstract: Motivation: The enormous amount of short reads generated by the new DNA sequencing technologies calls for the development of fast and accurate read alignment programs. A first generation of hash table-based methods has been developed, including MAQ, which is accurate, feature rich and fast enough to align short reads from a single individual. However, MAQ does not support gapped alignment for single-end reads, which makes it unsuitable for alignment of longer reads where indels may occur frequently. The speed of MAQ is also a concern when the alignment is scaled up to the resequencing of hundreds of individuals. Results: We implemented Burrows-Wheeler Alignment tool (BWA), a new read alignment package that is based on backward search with Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps. BWA supports both base space reads, e.g. from Illumina sequencing machines, and color space reads from AB SOLiD machines. Evaluations on both simulated and real data suggest that BWA is ~10–20× faster than MAQ, while achieving similar accuracy. In addition, BWA outputs alignment in the new standard SAM (Sequence Alignment/Map) format. Variant calling and other downstream analyses after the alignment can be achieved with the open source SAMtools software package. Availability: http://maq.sourceforge.net

43,862 citations


"NextGenMap: Fast and accurate read ..." refers to methods in this paper

  • ...Representatives of BWT-based methods were BWA (Li and Durbin, 2009), its extension for longer reads BWA-SW (Li and Durbin, 2010) and Bowtie2 (Langmead and Salzberg, 2012)....

  • ...For low genomic polymorphism (0.1%), BWA-SW shows 0.1% and 0.2% more correctly mapped reads compared with NextGenMap for S1 and S2, respectively....

  • ...First, Burrows Wheeler transformation (BWT)-based methods, e.g. BWA (Li and Durbin, 2009), which are fast but optimized for short reads and genomes with low polymorphism....

  • ...If we compare the runtimes of the methods that showed the highest accuracy, NextGenMap is between 2.9 and 5.8 times faster than BWA-SW....
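
BWA's index rests on backward search over the Burrows–Wheeler transform. The toy sketch below illustrates exact backward search only: it builds the suffix array by naive sorting and stores full occurrence tables, whereas BWA uses compressed index structures and additionally tolerates mismatches and gaps. All function names are hypothetical.

```python
def bwt_index(text: str):
    """Build a naive suffix array, C table and occurrence table for `text` (must end with '$')."""
    assert text.endswith("$")
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive suffix array
    bwt = "".join(text[i - 1] for i in sa)  # for i == 0, text[-1] is the '$' sentinel
    alphabet = sorted(set(text))
    # C[c]: number of characters in text that sort before c
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][i]: number of occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return sa, C, occ


def backward_search(pattern: str, C, occ, n: int):
    """Return the half-open suffix-array interval [lo, hi) of suffixes starting with pattern."""
    lo, hi = 0, n
    for c in reversed(pattern):  # extend the match one character to the left at a time
        if c not in C:
            return 0, 0
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return 0, 0
    return lo, hi


def find_occurrences(text: str, pattern: str):
    """All start positions of pattern in text, found via backward search."""
    sa, C, occ = bwt_index(text + "$")
    lo, hi = backward_search(pattern, C, occ, len(sa))
    return sorted(sa[lo:hi])
```

For example, find_occurrences("ACGTACGTAC", "ACG") returns [0, 4]. Each step of backward search costs a constant number of table lookups, which is why BWT-based mappers are fast once the index is built.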

Journal Article
TL;DR: Bowtie 2 combines the strengths of the full-text minute index with the flexibility and speed of hardware-accelerated dynamic programming algorithms to achieve a combination of high speed, sensitivity and accuracy.
Abstract: As the rate of sequencing increases, greater throughput is demanded from read aligners. The full-text minute index is often used to make alignment very fast and memory-efficient, but the approach is ill-suited to finding longer, gapped alignments. Bowtie 2 combines the strengths of the full-text minute index with the flexibility and speed of hardware-accelerated dynamic programming algorithms to achieve a combination of high speed, sensitivity and accuracy.

37,898 citations


"NextGenMap: Fast and accurate read ..." refers to methods in this paper

  • ...Representatives of BWT-based methods were BWA (Li and Durbin, 2009), its extension for longer reads BWA-SW (Li and Durbin, 2010) and Bowtie2 (Langmead and Salzberg, 2012)....

  • ...NextGenMap’s CPU implementation was 1.1–2.3 times faster than Bowtie2, the fastest method so far....
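
Bowtie 2 pairs its index with dynamic-programming alignment. As a minimal reference point, here is a plain scalar Smith-Waterman local alignment score with a linear gap penalty. The scoring values are arbitrary assumptions, and this does not reproduce Bowtie 2's hardware-accelerated implementation or its actual scoring scheme.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b using a linear gap penalty.

    Keeps only two rows of the dynamic-programming matrix, so memory is O(len(b)).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below 0, so a new alignment can start anywhere
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

For instance, smith_waterman("TTACGTTT", "ACGT") scores 8 because the local alignment ignores the flanking Ts, and smith_waterman("ACGT", "AGT") scores 4 (three matches and one gap).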

Journal Article
TL;DR: Recently developed statistical methods both improve and quantify the considerable uncertainty associated with genotype calling, and will especially benefit the growing number of studies using low- to medium-coverage data.
Abstract: Meaningful analysis of next-generation sequencing (NGS) data, which are produced extensively by genetics and genomics studies, relies crucially on the accurate calling of SNPs and genotypes. Recently developed statistical methods both improve and quantify the considerable uncertainty associated with genotype calling, and will especially benefit the growing number of studies using low- to medium-coverage data. We review these methods and provide a guide for their use in NGS studies.

1,371 citations


"NextGenMap: Fast and accurate read ..." refers to methods in this paper

  • ...At the same time, NextGenMap outperforms current mapping methods with respect to runtime and to the number of correctly mapped reads....
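
To make the idea of quantifying genotype uncertainty concrete, here is a minimal sketch of per-genotype likelihoods at a single diploid site. It uses a deliberately simple model chosen for illustration (independent reads, one uniform sequencing-error rate, errors spread evenly over the three other bases), not a specific method from the review; the function name and default error rate are hypothetical.

```python
import math


def genotype_log_likelihoods(bases: str, ref: str, alt: str, error: float = 0.01):
    """Log-likelihood of the observed bases at one site under ref/ref, ref/alt and alt/alt.

    Assumes a diploid site, independent reads and the same error rate for every base.
    """
    lls = {}
    for name, n_alt_alleles in (("ref/ref", 0), ("ref/alt", 1), ("alt/alt", 2)):
        p_alt = n_alt_alleles / 2  # chance a read comes from an alt-carrying chromosome
        ll = 0.0
        for b in bases:
            # Probability of observing base b from an alt or a ref chromosome
            p_from_alt = 1 - error if b == alt else error / 3
            p_from_ref = 1 - error if b == ref else error / 3
            ll += math.log(p_alt * p_from_alt + (1 - p_alt) * p_from_ref)
        lls[name] = ll
    return lls
```

Comparing or normalizing these values (optionally with a prior over genotypes) yields a genotype call together with a measure of its confidence. With fewer reads, the likelihoods of competing genotypes tend to lie closer together, which is the uncertainty such statistical methods aim to quantify at low coverage.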

Journal Article
TL;DR: Stampy is a read mapper that uses a hybrid mapping algorithm and a detailed statistical model to achieve both speed and sensitivity, particularly when reads include sequence variation; this results in a higher usable sequence yield and improved accuracy compared with existing software.
Abstract: High-volume sequencing of DNA and RNA is now within reach of any research laboratory and is quickly becoming established as a key research tool. In many workflows, each of the short sequences ("reads") resulting from a sequencing run is first "mapped" (aligned) to a reference sequence to infer the genomic location from which the read derived, a challenging task because of the high data volumes and often large genomes. Existing read mapping software excels in either speed (e.g., BWA, Bowtie, ELAND) or sensitivity (e.g., Novoalign), but not in both. In addition, performance often deteriorates in the presence of sequence variation, particularly so for short insertions and deletions (indels). Here, we present a read mapper, Stampy, which uses a hybrid mapping algorithm and a detailed statistical model to achieve both speed and sensitivity, particularly when reads include sequence variation. This results in a higher usable sequence yield and improved accuracy compared to that of existing software.

1,184 citations


"NextGenMap: Fast and accurate read ..." refers to methods in this paper

  • ...As hash-based representative we selected Stampy (Lunter and Goodson, 2011)....

  • ...Second, hash-based methods like Stampy (Lunter and Goodson, 2011), which are slow but also suited for highly polymorphic genomes....
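
Hash-based mappers look up short substrings of a read in a precomputed table of reference positions. The sketch below is a generic seed-and-vote illustration with hypothetical function names and an arbitrary k: each k-mer shared with the reference votes for the read start position it implies, and the top-voted positions would then be verified by alignment. It is not Stampy's or NextGenMap's actual algorithm.

```python
from collections import Counter, defaultdict


def build_kmer_index(reference: str, k: int):
    """Map every k-mer of the reference to the list of positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def candidate_positions(read: str, index, k: int, top: int = 3):
    """Vote for read start positions implied by shared k-mers; return the best candidates."""
    votes = Counter()
    for offset in range(len(read) - k + 1):
        for ref_pos in index.get(read[offset:offset + k], ()):
            votes[ref_pos - offset] += 1  # implied start of the read on the reference
    return votes.most_common(top)
```

For example, with reference "TTTTACGTACGGATCCAGTTTT", k = 4 and the read "ACGTACCGATCC" (one mismatch relative to the reference at position 4), the mismatch removes only the four k-mers that overlap it, and the remaining five all vote for start position 4. Shorter seeds tolerate more divergence in this way, at the cost of more spurious hits in a large genome.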

Journal Article
TL;DR: This survey focuses on classifying mappers through a wide number of characteristics to allow practitioners to compare the mappers more easily and find those that are most suitable for their specific problem.
Abstract: Motivation: A ubiquitous and fundamental step in high-throughput sequencing analysis is the alignment (mapping) of the generated reads to a reference sequence. To accomplish this task, numerous software tools have been proposed. Determining the mappers that are most suitable for a specific application is not trivial. Results: This survey focuses on classifying mappers through a wide number of characteristics. The goal is to allow practitioners to compare the mappers more easily and find those that are most suitable for their specific problem. Availability: A regularly updated compendium of mappers can be found at http://wwwdev.ebi.ac.uk/fg/hts_mappers/. Supplementary information: Supplementary data are available at Bioinformatics online.

305 citations