SNP-sites: rapid efficient extraction of SNPs from multi-FASTA alignments
Andrew J. Page, Ben Taylor, Aidan Delaney, Jorge Soares, Torsten Seemann, Jacqueline A. Keane, Simon R. Harris
- Vol. 2, Iss: 4
TL;DR: SNP-sites can extract SNPs from an 8.3 GB alignment file using 59 MB of RAM and 1 CPU core, making it feasible to run on modest computers, and can output results in multiple formats for downstream analysis.
Abstract:
Rapidly decreasing genome sequencing costs have led to a proportionate increase in the number of samples used in prokaryotic population studies. Extracting single nucleotide polymorphisms (SNPs) from a large whole genome alignment is now a routine task, but existing tools have failed to scale efficiently with the increased size of studies. These tools are slow and memory inefficient, and are installed through non-standard procedures. We present SNP-sites, which can rapidly extract SNPs from a multi-FASTA alignment using modest resources and can output results in multiple formats for downstream analysis. SNPs can be extracted from an 8.3 GB alignment file (1842 taxa, 22 618 sites) in 267 seconds using 59 MB of RAM and 1 CPU core, making it feasible to run on modest computers. It is easy to install through the Debian and Homebrew package managers, and has been successfully tested on more than 20 operating systems. SNP-sites is implemented in C and is available under the open source license GNU GPL version 3.
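To make the task in the abstract concrete, the following Python sketch shows one way to find variable columns in a small multi-FASTA alignment and keep only those columns. It is an illustration under stated assumptions, not the SNP-sites implementation (which is written in C): the function names `read_fasta` and `find_snp_sites` are hypothetical, the choice to ignore gap (`-`) and `N` characters when deciding whether a column varies is an assumption, and the sketch loads the whole alignment into memory rather than aiming for the low memory use reported above.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA-formatted text handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def find_snp_sites(records, ignored="-N"):
    """Return (positions, snp_records) for an alignment.

    positions   : 0-based column indices where at least two different
                  non-ignored characters occur (assumed SNP definition).
    snp_records : (name, sequence) pairs reduced to those columns.
    """
    records = list(records)
    if not records:
        return [], []
    length = len(records[0][1])
    if any(len(seq) != length for _, seq in records):
        raise ValueError("all sequences in an alignment must have the same length")

    ignored_set = set(ignored)
    positions = []
    for i in range(length):
        bases = {seq[i] for _, seq in records} - ignored_set
        if len(bases) > 1:
            positions.append(i)

    snp_records = [
        (name, "".join(seq[i] for i in positions)) for name, seq in records
    ]
    return positions, snp_records


# Toy alignment: only column 4 (0-based) differs between samples.
example = """>sample1
ACGTAC
>sample2
ACGTTC
>sample3
A-GTAC
"""

positions, snps = find_snp_sites(read_fasta(StringIO(example)))
print(positions)  # [4]
print(snps)       # [('sample1', 'A'), ('sample2', 'T'), ('sample3', 'A')]
```

In the toy alignment the gap in `sample3` does not make column 1 variable, because gaps are ignored under the assumption stated above.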
Citations
Journal Article
Structural and Functional Analysis of the D614G SARS-CoV-2 Spike Protein Variant.
Leonid Yurkovetskiy, Xue Wang, Kristen E. Pascal, Christopher Tomkins-Tinch, Thomas Nyalile, Yetao Wang, Alina Baum, William E. Diehl, Ann Dauphin, Claudia Carbone, Kristen Veinotte, Shawn B. Egri, Stephen F. Schaffner, Jacob E. Lemieux, James B. Munro, Ashique Rafique, Abhi Barve, Pardis C. Sabeti, Christos A. Kyratsous, Natalya Dudkina, Kuang Shen, Jeremy Luban
TL;DR: It is shown that D614G was more infectious than the ancestral form on human lung cells, colon cells, and on cells rendered permissive by ectopic expression of human ACE2 or of ACE2 orthologs from various mammals, including Chinese rufous horseshoe bat and Malayan pangolin.
Journal Article
Emergence of an Extensively Drug-Resistant Salmonella enterica Serovar Typhi Clone Harboring a Promiscuous Plasmid Encoding Resistance to Fluoroquinolones and Third-Generation Cephalosporins.
Elizabeth J. Klemm, Sadia Shakoor, Andrew J. Page, Farah Naz Qamar, Kim Judge, Dania K. Saeed, Vanessa K. Wong, Timothy J. Dallman, Satheesh Nair, Stephen Baker, Ghazala Shaheen, Shahida Qureshi, Mohammad Tahir Yousafzai, Muhammad Khalid Saleem, Zahra Hasan, Gordon Dougan, Rumina Hasan
TL;DR: The first large-scale emergence and spread of a novel extensively drug-resistant S. Typhi clone in Sindh, Pakistan is reported, highlighting the evolving threat of antibiotic resistance in S. Typhi and the value of antibiotic susceptibility testing and whole-genome sequencing in understanding emerging infectious diseases.
Journal Article
ARIBA: rapid antimicrobial resistance genotyping directly from sequencing reads
Martin Hunt, Alison E. Mather, Leonor Sánchez-Busó, Andrew J. Page, Julian Parkhill, Jacqueline A. Keane, Simon R. Harris
TL;DR: A new tool is presented, ARIBA, that identifies AMR-associated genes and single nucleotide polymorphisms directly from short reads, and generates detailed and customizable output.
Journal Article
Genomic architecture and introgression shape a butterfly radiation
Nathaniel B. Edelman, Paul B. Frandsen, Michael Miyagi, Bernardo J. Clavijo, John W. Davey, Rebecca B. Dikow, Gonzalo García-Accinelli, Steven M. Van Belleghem, Nick Patterson, Daniel E. Neafsey, Richard Challis, Sujai Kumar, Gilson R. P. Moreira, Camilo Salazar, Mathieu Chouteau, Brian A. Counterman, Riccardo Papa, Mark Blaxter, Robert D. Reed, Kanchon K. Dasmahapatra, Marcus R. Kronforst, Mathieu Joron, Chris D. Jiggins, W. Owen McMillan, Federica Di Palma, Andrew J. Blumberg, John Wakeley, David B. Jaffe, James Mallet
TL;DR: Tests to distinguish incomplete lineage sorting from introgression indicate that gene flow has obscured several ancient phylogenetic relationships in this group over large swathes of the genome, and a hitherto unknown inversion that traps a color pattern switch locus is identified.
Posted Content
Structural and Functional Analysis of the D614G SARS-CoV-2 Spike Protein Variant.
Leonid Yurkovetskiy, Xue Wang, Kristen E. Pascal, Christopher Tomkins-Tinch, Thomas Nyalile, Yetao Wang, Alina Baum, William E. Diehl, Ann Dauphin, Claudia Carbone, Kristen Veinotte, Shawn B. Egri, Stephen F. Schaffner, Jacob E. Lemieux, James B. Munro, Ashique Rafique, Abhi Barve, Pardis C. Sabeti, Christos A. Kyratsous, Natalya Dudkina, Kuang Shen, Jeremy Luban
TL;DR: D614G adopts conformations that make virion membrane fusion with the target cell membrane more probable, but it retains susceptibility to therapies that disrupt interaction of the SARS-CoV-2 S protein with the ACE2 receptor.