Open access · Journal · ISSN: 2167-8359


About: PeerJ is an open access academic journal (ISSN 2167-8359) that publishes mainly in the areas of Population and Gene. Over its lifetime, the journal has published 13,444 publications, which have received 155,145 citations. It is also known as PeerJ Life & Environment.

Topics: Population, Gene, Genome

Open access · Journal Article · DOI: 10.7717/PEERJ.2584
Torbjørn Rognes (1, 2), Tomas Flouri (3, 4) +4 more · Institutions (7)
18 Oct 2016 · PeerJ
Abstract: Background: VSEARCH is an open source and free of charge multithreaded 64-bit tool for processing and preparing metagenomics, genomics and population genomics nucleotide sequence data. It is designed as an alternative to the widely used USEARCH tool (Edgar, 2010) for which the source code is not publicly available, algorithm details are only rudimentarily described, and only a memory-confined 32-bit version is freely available for academic use. Methods: When searching nucleotide sequences, VSEARCH uses a fast heuristic based on words shared by the query and target sequences in order to quickly identify similar sequences; a similar strategy is probably used in USEARCH. VSEARCH then performs optimal global sequence alignment of the query against potential target sequences, using full dynamic programming instead of the seed-and-extend heuristic used by USEARCH. Pairwise alignments are computed in parallel using vectorisation and multiple threads. Results: VSEARCH includes most commands for analysing nucleotide sequences available in USEARCH version 7 and several of those available in USEARCH version 8, including searching (exact or based on global alignment), clustering by similarity (using length pre-sorting, abundance pre-sorting or a user-defined order), chimera detection (reference-based or de novo), dereplication (full length or prefix), pairwise alignment, reverse complementation, sorting, and subsampling. VSEARCH also includes commands for FASTQ file processing, i.e., format detection, filtering, read quality statistics, and merging of paired reads. Furthermore, VSEARCH extends functionality with several new commands and improvements, including shuffling, rereplication, masking of low-complexity sequences with the well-known DUST algorithm, a choice among different similarity definitions, and FASTQ file format conversion. VSEARCH is here shown to be more accurate than USEARCH when performing searching, clustering, chimera detection and subsampling, while on a par with USEARCH for paired-end read merging. VSEARCH is slower than USEARCH when performing clustering and chimera detection, but significantly faster when performing paired-end read merging and dereplication. VSEARCH is available under either the BSD 2-clause license or the GNU General Public License version 3.0. Discussion: VSEARCH has been shown to be a fast, accurate and full-fledged alternative to USEARCH. A free and open-source versatile tool for sequence analysis is now available to the metagenomics community.

Topics: FASTQ format (53%)

3,673 Citations
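
The two-stage search described in the VSEARCH abstract (a shared-word prefilter followed by full dynamic-programming global alignment) can be illustrated with a small sketch. This is not VSEARCH's code: the function names, the word length, the scoring values and the candidate limit are all assumptions chosen for illustration, and the alignment uses a simple linear gap penalty.

```python
def words(seq, k):
    """Return the set of overlapping words (k-mers) in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_word_count(query, target, k=8):
    """Heuristic prefilter: number of distinct words the two sequences share."""
    return len(words(query, k) & words(target, k))


def global_align_score(a, b, match=2, mismatch=-4, gap=-2):
    """Optimal global alignment score via full dynamic programming
    (Needleman-Wunsch, linear gap penalty; scoring values are illustrative)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def search(query, targets, k=8, max_candidates=2):
    """Rank targets by shared words, then align only the top candidates."""
    ranked = sorted(
        targets,
        key=lambda name: shared_word_count(query, targets[name], k),
        reverse=True,
    )
    candidates = ranked[:max_candidates]
    return max(candidates, key=lambda name: global_align_score(query, targets[name]))
```

The design point the sketch tries to capture is that the cheap word count limits how many targets reach the quadratic-time alignment step, while the alignment itself stays exact for the candidates that do.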

Open access · Journal Article · DOI: 10.7717/PEERJ.453
19 Jun 2014 · PeerJ
Abstract: scikit-image is an image processing library that implements algorithms and utilities for use in research, education and industry applications. It is released under the liberal Modified BSD open source license, provides a well-documented API in the Python programming language, and is developed by an active, international team of collaborators. In this paper we highlight the advantages of open source to achieve the goals of the scikit-image library, and we showcase several real-world image processing applications that use scikit-image. More information can be found on the project homepage.

2,367 Citations

Open access · Journal Article · DOI: 10.7717/PEERJ.281
04 Mar 2014 · PeerJ
Abstract: Many microbial, fungal, or oomycete populations violate assumptions for population genetic analysis because these populations are clonal, admixed, partially clonal, and/or sexual. Furthermore, few tools exist that are specifically designed for analyzing data from clonal populations, making analysis difficult and haphazard. We developed the R package poppr providing unique tools for analysis of data from admixed, clonal, mixed, and/or sexual populations. Currently, poppr can be used for dominant/codominant and haploid/diploid genetic data. Data can be imported from several formats including GenAlEx formatted text files and can be analyzed on a user-defined hierarchy that includes unlimited levels of subpopulation structure and clone censoring. New functions include calculation of Bruvo’s distance for microsatellites, batch-analysis of the index of association with several indices of genotypic diversity, and graphing including dendrograms with bootstrap support and minimum spanning networks. While functions for genotypic diversity and clone censoring are specific for clonal populations, several functions found in poppr are also valuable to analysis of any populations. A manual with documentation and examples is provided. Poppr is open source and major releases are available on CRAN. More supporting documentation and tutorials can be found under ‘resources’.

Topics: Population (52%)

1,365 Citations
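
Clone censoring and genotypic diversity, both mentioned in the poppr abstract, can be sketched in a few lines. poppr itself is an R package; the Python below is only an illustration of the idea, not its API. The function names and the data layout (a list of population and multilocus-genotype pairs) are assumptions, and Simpson's index stands in as one common diversity measure.

```python
from collections import Counter


def clone_censor(samples):
    """Keep one representative of each multilocus genotype per population.

    `samples` is a list of (population, genotype) pairs, where a genotype
    is a hashable tuple of alleles (a hypothetical layout).
    """
    seen = set()
    kept = []
    for population, genotype in samples:
        key = (population, genotype)
        if key not in seen:
            seen.add(key)
            kept.append(key)
    return kept


def simpson_diversity(genotypes):
    """Simpson's index: 1 minus the sum of squared genotype frequencies."""
    counts = Counter(genotypes)
    n = sum(counts.values())
    return 1.0 - sum((c / n) ** 2 for c in counts.values())
```

Censoring before computing other statistics keeps an over-sampled clone from dominating the result, which is why the two operations tend to appear together.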

Open access · Journal Article · DOI: 10.7717/PEERJ-CS.55
06 Apr 2016 · PeerJ
Abstract: Probabilistic Programming allows for automatic Bayesian inference on user-defined probabilistic models. Recent advances in Markov chain Monte Carlo (MCMC) sampling allow inference on increasingly complex models. This class of MCMC, known as Hamiltonian Monte Carlo, requires gradient information which is often not readily available. PyMC3 is a new open source Probabilistic Programming framework written in Python that uses Theano to compute gradients via automatic differentiation as well as compile probabilistic programs on-the-fly to C for increased speed. Contrary to other Probabilistic Programming languages, PyMC3 allows model specification directly in Python code. The lack of a domain specific language allows for great flexibility and direct interaction with the model. This paper is a tutorial-style introduction to this software package.

Topics: Probabilistic CTL (63%), Python (programming language) (61%), Probabilistic logic (60%)

1,263 Citations
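
To show why Hamiltonian Monte Carlo needs gradient information, as the PyMC3 abstract notes, here is a minimal sketch of the sampler for a one-dimensional standard normal target. It does not use PyMC3 or Theano: the gradient is written by hand where a framework would obtain it by automatic differentiation, and the function names, step size and trajectory length are assumptions made for illustration.

```python
import math
import random


def potential(x):
    """Negative log-density of a standard normal, up to a constant."""
    return 0.5 * x * x


def grad_potential(x):
    """Hand-written gradient; a framework would derive this automatically."""
    return x


def leapfrog(x, p, step, n_steps):
    """Simulate Hamiltonian dynamics with the leapfrog integrator."""
    p = p - 0.5 * step * grad_potential(x)   # initial half step for momentum
    for _ in range(n_steps - 1):
        x = x + step * p                     # full step for position
        p = p - step * grad_potential(x)     # full step for momentum
    x = x + step * p
    p = p - 0.5 * step * grad_potential(x)   # final half step for momentum
    return x, p


def hmc(n_samples, step=0.2, n_steps=10, seed=0):
    """Draw samples using leapfrog proposals and a Metropolis accept step."""
    rng = random.Random(seed)
    x = 0.0
    samples = []
    for _ in range(n_samples):
        p = rng.gauss(0.0, 1.0)              # resample the momentum
        h_old = potential(x) + 0.5 * p * p
        x_new, p_new = leapfrog(x, p, step, n_steps)
        h_new = potential(x_new) + 0.5 * p_new * p_new
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            x = x_new                        # accept the proposal
        samples.append(x)
    return samples
```

Every leapfrog step calls `grad_potential`, so a model without an available gradient cannot be sampled this way; that is the gap automatic differentiation fills.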

Open access · Journal Article · DOI: 10.7717/PEERJ.1165
Dongwan D. Kang (1), Jeff Froula (1, 2), Rob Egan (2) +2 more · Institutions (2)
27 Aug 2015 · PeerJ
Abstract: Grouping large genomic fragments assembled from shotgun metagenomic sequences to deconvolute complex microbial communities, or metagenome binning, enables the study of individual organisms and their interactions. Because of the complex nature of these communities, existing metagenome binning methods often miss a large number of microbial species. In addition, most of the tools are not scalable to large datasets. Here we introduce automated software called MetaBAT that integrates empirical probabilistic distances of genome abundance and tetranucleotide frequency for accurate metagenome binning. MetaBAT outperforms alternative methods in accuracy and computational efficiency on both synthetic and real metagenome datasets. It automatically forms hundreds of high quality genome bins on a very large assembly consisting of millions of contigs in a matter of hours on a single node. MetaBAT is open source software.

Topics: Metagenomics (53%)

1,072 Citations
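
The tetranucleotide frequency signal that the MetaBAT abstract refers to can be sketched as follows. This is not MetaBAT's implementation: the function names are hypothetical, reverse-complement tetramers are merged so the profile does not depend on strand, and a plain Euclidean distance stands in for the empirical probabilistic distance described in the abstract. Abundance, the other signal MetaBAT uses, is not modelled here.

```python
import math
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


# The 136 strand-independent tetramers, in a fixed order.
TETRAMERS = sorted({canonical("".join(p)) for p in product("ACGT", repeat=4)})


def tnf(seq):
    """Tetranucleotide frequency vector of a contig (windows with non-ACGT are skipped)."""
    counts = dict.fromkeys(TETRAMERS, 0)
    total = 0
    seq = seq.upper()
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if set(kmer) <= set("ACGT"):
            counts[canonical(kmer)] += 1
            total += 1
    return [counts[t] / total if total else 0.0 for t in TETRAMERS]


def euclidean(u, v):
    """Plain Euclidean distance, used here only as a stand-in metric."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

Contigs from the same genome tend to have similar profiles, so a small distance between two vectors is weak evidence that the contigs belong in the same bin.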

Journal's top 5 most impactful authors

Kenneth B. Storey: 12 papers, 154 citations
Robert J. Toonen: 11 papers, 209 citations
Michael Wink: 10 papers, 103 citations
Yong Poovorawan: 9 papers, 54 citations
Antonio Palazón-Bru: 7 papers, 37 citations

Network Information
Related Journals (5)

(name not shown): 252.9K papers, 7.2M citations, 84% related
Royal Society Open Science: 4.4K papers, 53.6K citations, 82% related
Biology Letters: 3.7K papers, 142.4K citations, 82% related
BMC Evolutionary Biology: 4.2K papers, 194.2K citations, 82% related
(name not shown): 920 papers, 38.2K citations, 82% related