Open Access Journal Article

SNP-sites: rapid efficient extraction of SNPs from multi-FASTA alignments

TLDR
SNP-sites extracts SNPs from an 8.3 GB alignment file using 59 MB of RAM and 1 CPU core, making it feasible to run on modest computers, and outputs results in multiple formats for downstream analysis.
Abstract
Rapidly decreasing genome sequencing costs have led to a proportionate increase in the number of samples used in prokaryotic population studies. Extracting single nucleotide polymorphisms (SNPs) from a large whole-genome alignment is now a routine task, but existing tools have failed to scale efficiently with the increased size of studies. These tools are slow, memory-inefficient and installed through non-standard procedures. We present SNP-sites, which can rapidly extract SNPs from a multi-FASTA alignment using modest resources and can output results in multiple formats for downstream analysis. SNPs can be extracted from an 8.3 GB alignment file (1842 taxa, 22 618 sites) in 267 seconds using 59 MB of RAM and 1 CPU core, making it feasible to run on modest computers. It is easy to install through the Debian and Homebrew package managers, and has been successfully tested on more than 20 operating systems. SNP-sites is implemented in C and is available under the open-source license GNU GPL version 3.
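
To make the task concrete, the following is a minimal Python sketch of finding variable (SNP) columns in a multi-FASTA alignment. It is not the SNP-sites implementation, which is written in C. The function names `read_fasta` and `snp_columns` are hypothetical, and the choice to ignore gaps (`-`), `N` and `?` when deciding whether a column varies is an assumption made for this example. The sketch reads one sequence at a time, so memory use grows with alignment length rather than with the number of taxa.

```python
import io


def read_fasta(handle):
    """Yield (name, sequence) pairs from a multi-FASTA text stream."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


# Assumption for this sketch: these symbols never count as an allele.
IGNORED = frozenset("-N?")


def snp_columns(records):
    """Return 0-based indices of columns holding more than one distinct base."""
    seen = None  # one set of observed bases per alignment column
    for name, seq in records:
        seq = seq.upper()
        if seen is None:
            seen = [set() for _ in seq]
        elif len(seq) != len(seen):
            raise ValueError(f"sequence {name!r} has a different length")
        for i, base in enumerate(seq):
            if base not in IGNORED:
                seen[i].add(base)
    if seen is None:
        return []
    return [i for i, bases in enumerate(seen) if len(bases) > 1]


# Toy alignment: only the last column differs between samples.
toy = ">s1\nACGT\n>s2\nACGA\n>s3\nAC-T\n"
print(snp_columns(read_fasta(io.StringIO(toy))))  # prints [3]
```

For a real alignment, the same functions could be applied to an open file handle instead of the in-memory `io.StringIO` used in the toy example.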

