
Showing papers in "PeerJ in 2016"


Journal ArticleDOI
18 Oct 2016-PeerJ
TL;DR: VSEARCH is here shown to be more accurate than USEARCH when performing searching, clustering, chimera detection and subsampling, while on a par with USEARCH for paired-end read merging and dereplication.
Abstract: Background: VSEARCH is an open source and free of charge multithreaded 64-bit tool for processing and preparing metagenomics, genomics and population genomics nucleotide sequence data. It is designed as an alternative to the widely used USEARCH tool (Edgar, 2010) for which the source code is not publicly available, algorithm details are only rudimentarily described, and only a memory-confined 32-bit version is freely available for academic use. Methods: When searching nucleotide sequences, VSEARCH uses a fast heuristic based on words shared by the query and target sequences in order to quickly identify similar sequences; a similar strategy is probably used in USEARCH. VSEARCH then performs optimal global sequence alignment of the query against potential target sequences, using full dynamic programming instead of the seed-and-extend heuristic used by USEARCH. Pairwise alignments are computed in parallel using vectorisation and multiple threads. Results: VSEARCH includes most commands for analysing nucleotide sequences available in USEARCH version 7 and several of those available in USEARCH version 8, including searching (exact or based on global alignment), clustering by similarity (using length pre-sorting, abundance pre-sorting or a user-defined order), chimera detection (reference-based or de novo), dereplication (full length or prefix), pairwise alignment, reverse complementation, sorting, and subsampling. VSEARCH also includes commands for FASTQ file processing, i.e., format detection, filtering, read quality statistics, and merging of paired reads. Furthermore, VSEARCH extends functionality with several new commands and improvements, including shuffling, rereplication, masking of low-complexity sequences with the well-known DUST algorithm, a choice among different similarity definitions, and FASTQ file format conversion. VSEARCH is here shown to be more accurate than USEARCH when performing searching, clustering, chimera detection and subsampling, while on a par with USEARCH for paired-end read merging. VSEARCH is slower than USEARCH when performing clustering and chimera detection, but significantly faster when performing paired-end read merging and dereplication. VSEARCH is available at https://github.com/torognes/vsearch under either the BSD 2-clause license or the GNU General Public License version 3.0. Discussion: VSEARCH has been shown to be a fast, accurate and full-fledged alternative to USEARCH. A free and open-source versatile tool for sequence analysis is now available to the metagenomics community.
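
The two-stage search described in the Methods (a shared-word prefilter followed by full dynamic-programming global alignment) can be illustrated with a minimal Python sketch. This is not VSEARCH's code: the word length `k`, the scoring values, the linear gap penalty and the function names are all assumptions chosen for illustration, and the real tool is vectorised and multithreaded.

```python
def kmers(seq, k):
    """Return the set of distinct k-length words in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def rank_by_shared_words(query, targets, k=4, top=2):
    """Prefilter: rank targets by the number of distinct words shared
    with the query and keep the best `top` candidates (assumed heuristic)."""
    qwords = kmers(query, k)
    scored = [(len(qwords & kmers(seq, k)), name) for name, seq in targets.items()]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for shared, name in scored[:top] if shared > 0]


def global_alignment_score(a, b, match=2, mismatch=-4, gap=-2):
    """Needleman-Wunsch score computed with full dynamic programming.
    Scoring values and the linear gap model are illustrative assumptions."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


query = "ACGTACGTGG"
targets = {"t1": "ACGTACGTGG", "t2": "ACGTACGAGG", "t3": "TTTTTTTTTT"}

# Stage 1: cheap word-based filter; stage 2: exact alignment of survivors only.
candidates = rank_by_shared_words(query, targets)
scores = {name: global_alignment_score(query, targets[name]) for name in candidates}
```

Only the candidates that survive the cheap word filter are passed to the quadratic-time alignment, which is where the speed of this kind of design comes from.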

5,850 citations


Journal ArticleDOI
06 Apr 2016-PeerJ
TL;DR: This paper is a tutorial-style introduction to PyMC3, a new open source Probabilistic Programming framework written in Python that uses Theano to compute gradients via automatic differentiation as well as compile probabilistic programs on-the-fly to C for increased speed.
Abstract: Probabilistic Programming allows for automatic Bayesian inference on user-defined probabilistic models. Recent advances in Markov chain Monte Carlo (MCMC) sampling allow inference on increasingly complex models. This class of MCMC, known as Hamiltonian Monte Carlo, requires gradient information which is often not readily available. PyMC3 is a new open source Probabilistic Programming framework written in Python that uses Theano to compute gradients via automatic differentiation as well as compile probabilistic programs on-the-fly to C for increased speed. Contrary to other Probabilistic Programming languages, PyMC3 allows model specification directly in Python code. The lack of a domain specific language allows for great flexibility and direct interaction with the model. This paper is a tutorial-style introduction to this software package.
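
To show why Hamiltonian Monte Carlo needs gradient information, here is a minimal pure-Python sketch of an HMC sampler for a one-dimensional standard normal target. It does not use PyMC3 or Theano: the gradient is written by hand rather than obtained by automatic differentiation, and the step size, number of leapfrog steps and function names are assumptions made for illustration.

```python
import math
import random


def hmc_sample(u, grad_u, q0, n_samples, step=0.2, n_leapfrog=10, seed=0):
    """Draw samples from exp(-u(q)) with basic Hamiltonian Monte Carlo.

    u is the negative log density (potential energy) and grad_u its
    gradient; the gradient is what drives the leapfrog integrator.
    """
    rng = random.Random(seed)
    q = q0
    samples = []
    for _ in range(n_samples):
        p = rng.gauss(0.0, 1.0)          # resample momentum
        q_new, p_new = q, p
        p_new -= 0.5 * step * grad_u(q_new)      # initial half step for momentum
        for i in range(n_leapfrog):
            q_new += step * p_new                # full step for position
            if i < n_leapfrog - 1:
                p_new -= step * grad_u(q_new)    # full step for momentum
        p_new -= 0.5 * step * grad_u(q_new)      # final half step for momentum
        h_old = u(q) + 0.5 * p * p
        h_new = u(q_new) + 0.5 * p_new * p_new
        # Metropolis correction for the integrator's discretisation error.
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            q = q_new
        samples.append(q)
    return samples


# Standard normal target: u(q) = q^2 / 2, so grad_u(q) = q.
draws = hmc_sample(lambda q: 0.5 * q * q, lambda q: q, q0=0.0, n_samples=5000)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
```

For models with many parameters the gradient is tedious to derive by hand, which is the gap that automatic differentiation fills in a framework such as the one the abstract describes.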

1,969 citations


Journal ArticleDOI
13 Jun 2016-PeerJ
TL;DR: OncoLnc contains survival data for 8,647 patients from 21 cancer studies performed by The Cancer Genome Atlas, along with RNA-SEQ expression for mRNAs and miRNAs from TCGA, and lncRNA expression from MiTranscriptome beta and stores precomputed survival analyses, allowing users to quickly explore survival correlations for up to 21 cancers in a single click.
Abstract: OncoLnc is a tool for interactively exploring survival correlations, and for downloading clinical data coupled to expression data for mRNAs, miRNAs, or long noncoding RNAs (lncRNAs). OncoLnc contains survival data for 8,647 patients from 21 cancer studies performed by The Cancer Genome Atlas (TCGA), along with RNA-SEQ expression for mRNAs and miRNAs from TCGA, and lncRNA expression from MiTranscriptome beta. Storing this data gives users the ability to separate patients by gene expression, and then create publication-quality Kaplan-Meier plots or download the data for further analyses. OncoLnc also stores precomputed survival analyses, allowing users to quickly explore survival correlations for up to 21 cancers in a single click. This resource allows researchers studying a specific gene to quickly investigate if it may have a role in cancer, and the supporting data allows researchers studying a specific cancer to identify the mRNAs, miRNAs, and lncRNAs most correlated with survival, and gives researchers looking for a novel lncRNA involved with cancer lists of potential candidates. OncoLnc is available at http://www.oncolnc.org.

511 citations


Journal ArticleDOI
28 Jan 2016-PeerJ
TL;DR: AMAS (Alignment Manipulation And Summary), a tool that can be used either as a stand-alone command-line utility or as a Python package, works on amino acid and nucleotide alignments and combines capabilities of sequence manipulation with a function that calculates basic statistics.
Abstract: The amount of data used in phylogenetics has grown explosively in recent years and many phylogenies are inferred with hundreds or even thousands of loci and many taxa. These modern phylogenomic studies often entail separate analyses of each of the loci in addition to multiple analyses of subsets of genes or concatenated sequences. Computationally efficient tools for handling and computing properties of thousands of single-locus or large concatenated alignments are needed. Here I present AMAS (Alignment Manipulation And Summary), a tool that can be used either as a stand-alone command-line utility or as a Python package. AMAS works on amino acid and nucleotide alignments and combines capabilities of sequence manipulation with a function that calculates basic statistics. The manipulation functions include conversions among popular formats, concatenation, extracting sites and splitting according to a pre-defined partitioning scheme, creation of replicate data sets, and removal of taxa. The statistics calculated include the number of taxa, alignment length, total count of matrix cells, overall number of undetermined characters, percent of missing data, AT and GC contents (for DNA alignments), count and proportion of variable sites, count and proportion of parsimony informative sites, and counts of all characters relevant for a nucleotide or amino acid alphabet. AMAS is particularly suitable for very large alignments with hundreds of taxa and thousands of loci. It is computationally efficient, utilizes parallel processing, and performs better at concatenation than other popular tools. AMAS is a Python 3 program that relies solely on Python's core modules and needs no additional dependencies. AMAS source code and manual can be downloaded from http://github.com/marekborowiec/AMAS/ under GNU General Public License.
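
Several of the summary statistics listed above can be computed in a few lines using only Python's core modules. The sketch below is not AMAS code and does not use its API: the function name, the input format (a dict of equal-length DNA strings) and the choice of `-`, `?` and `N` as undetermined characters are assumptions made for illustration.

```python
from collections import Counter

MISSING = set("-?N")  # assumed set of undetermined characters


def alignment_summary(aln):
    """Summarise a DNA alignment given as {taxon: aligned sequence}."""
    seqs = [s.upper() for s in aln.values()]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences in an alignment must have equal length")

    cells = len(seqs) * length
    missing = sum(ch in MISSING for s in seqs for ch in s)

    variable = informative = 0
    for i in range(length):
        column = Counter(s[i] for s in seqs if s[i] not in MISSING)
        if len(column) > 1:
            variable += 1  # more than one state observed at this site
        if sum(1 for n in column.values() if n >= 2) >= 2:
            informative += 1  # at least two states, each seen at least twice

    bases = Counter(ch for s in seqs for ch in s if ch in "ACGT")
    total = sum(bases.values())
    gc = (bases["G"] + bases["C"]) / total if total else 0.0

    return {
        "taxa": len(seqs),
        "length": length,
        "cells": cells,
        "missing": missing,
        "missing_percent": 100.0 * missing / cells,
        "variable_sites": variable,
        "parsimony_informative_sites": informative,
        "gc_content": gc,
    }


example = {"a": "ACGTA-", "b": "ACGTAC", "c": "ATGAAC", "d": "ATGCAC"}
summary = alignment_summary(example)
```

In this toy alignment the second column (C, C, T, T) is parsimony informative, while the fourth (T, T, A, C) is variable but not informative because only one state occurs more than once.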

421 citations


Journal ArticleDOI
24 Oct 2016-PeerJ
TL;DR: The empirical analysis indicates that the formal facts of a case are the most important predictive factor, consistent with the theory of legal realism suggesting that judicial decision-making is significantly affected by the stimulus of the facts.
Abstract: Recent advances in Natural Language Processing and Machine Learning provide us with the tools to build predictive models that can be used to unveil patterns driving judicial decisions. This can be useful, for both lawyers and judges, as an assisting tool to rapidly identify cases and extract patterns which lead to certain decisions. This paper presents the first systematic study on predicting the outcome of cases tried by the European Court of Human Rights based solely on textual content. We formulate a binary classification task where the input of our classifiers is the textual content extracted from a case and the target output is the actual judgment as to whether there has been a violation of an article of the convention of human rights. Textual information is represented using contiguous word sequences, i.e., N-grams, and topics. Our models can predict the court’s decisions with a strong accuracy (79% on average). Our empirical analysis indicates that the formal facts of a case are the most important predictive factor. This is consistent with the theory of legal realism suggesting that judicial decision-making is significantly affected by the stimulus of the facts. We also observe that the topical content of a case is another important feature in this classification task and explore this relationship further by conducting a qualitative analysis.
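
The N-gram representation mentioned in the abstract (contiguous word sequences used as classifier input) can be sketched as follows. The whitespace tokeniser, the maximum N-gram length and the function names are assumptions; the abstract does not specify these details, and the classifier itself is omitted.

```python
from collections import Counter


def ngram_counts(text, n_max=2):
    """Count all contiguous word sequences of length 1..n_max in text."""
    tokens = text.lower().split()  # naive whitespace tokenisation (assumption)
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def to_feature_vector(counts, vocabulary):
    """Map N-gram counts onto a fixed vocabulary, giving one row of the
    feature matrix that a binary classifier would be trained on."""
    return [counts.get(term, 0) for term in vocabulary]


counts = ngram_counts("the applicant alleged that the applicant")
vocabulary = ["the applicant", "alleged", "violation"]
features = to_feature_vector(counts, vocabulary)
```

Each case text becomes one such vector, paired with a binary label (violation or no violation) for training.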

412 citations


Journal ArticleDOI
23 Feb 2016-PeerJ
TL;DR: Contrary to long held beliefs that the orb web is the crowning achievement of spider evolution, ancestral state reconstructions of web type support a phylogenetically ancient origin of the orb web, and diversification analyses show that the mostly ground-dwelling, web-less RTA clade diversified faster than orb weavers.
Abstract: Spiders (Order Araneae) are massively abundant generalist arthropod predators that are found in nearly every ecosystem on the planet and have persisted for over 380 million years. Spiders have long served as evolutionary models for studying complex mating and web spinning behaviors, key innovation and adaptive radiation hypotheses, and have been inspiration for important theories like sexual selection by female choice. Unfortunately, past major attempts to reconstruct spider phylogeny typically employing the "usual suspect" genes have been unable to produce a well-supported phylogenetic framework for the entire order. To further resolve spider evolutionary relationships we have assembled a transcriptome-based data set comprising 70 ingroup spider taxa. Using maximum likelihood and shortcut coalescence-based approaches, we analyze eight data sets, the largest of which contains 3,398 gene regions and 696,652 amino acid sites forming the largest phylogenomic analysis of spider relationships produced to date. Contrary to long held beliefs that the orb web is the crowning achievement of spider evolution, ancestral state reconstructions of web type support a phylogenetically ancient origin of the orb web, and diversification analyses show that the mostly ground-dwelling, web-less RTA clade diversified faster than orb weavers. Consistent with molecular dating estimates we report herein, this may reflect a major increase in biomass of non-flying insects during the Cretaceous Terrestrial Revolution 125-90 million years ago favoring diversification of spiders that feed on cursorial rather than flying prey. Our results also have major implications for our understanding of spider systematics. Phylogenomic analyses corroborate several well-accepted high level groupings: Opisthothele, Mygalomorphae, Atypoidina, Avicularoidea, Theraphosoidina, Araneomorphae, Entelegynae, Araneoidea, the RTA clade, Dionycha and the Lycosoidea. 
Alternatively, our results challenge the monophyly of Eresoidea, Orbiculariae, and Deinopoidea. The composition of the major paleocribellate and neocribellate clades, the basal divisions of Araneomorphae, appear to be falsified. Traditional Haplogynae is in need of revision, as our findings appear to support the newly conceived concept of Synspermiata. The sister pairing of filistatids with hypochilids implies that some peculiar features of each family may in fact be synapomorphic for the pair. Leptonetids now are seen as a possible sister group to the Entelegynae, illustrating possible intermediates in the evolution of the more complex entelegyne genitalic condition, spinning organs and respiratory organs.

255 citations


Journal ArticleDOI
31 Aug 2016-PeerJ
TL;DR: Results of the GEC show the necessity of action to end the African elephants’ downward trajectory by preventing poaching and protecting habitat, and provide the first quantitative model of elephant population trends across Africa.
Abstract: African elephants (Loxodonta africana) are imperiled by poaching and habitat loss. Despite global attention to the plight of elephants, their population sizes and trends are uncertain or unknown over much of Africa. To conserve this iconic species, conservationists need timely, accurate data on elephant populations. Here, we report the results of the Great Elephant Census (GEC), the first continent-wide, standardized survey of African savannah elephants. We also provide the first quantitative model of elephant population trends across Africa. We estimated a population of 352,271 savannah elephants on study sites in 18 countries, representing approximately 93% of all savannah elephants in those countries. Elephant populations in survey areas with historical data decreased by an estimated 144,000 from 2007 to 2014, and populations are currently shrinking by 8% per year continent-wide, primarily due to poaching. Though 84% of elephants occurred in protected areas, many protected areas had carcass ratios that indicated high levels of elephant mortality. Results of the GEC show the necessity of action to end the African elephants' downward trajectory by preventing poaching and protecting habitat.

242 citations


Journal ArticleDOI
04 May 2016-PeerJ
TL;DR: It is found that while leopard research was increasing, research effort was primarily on the subspecies with the most remaining range whereas subspecies that are most in need of urgent attention were neglected.
Abstract: The leopard's (Panthera pardus) broad geographic range, remarkable adaptability, and secretive nature have contributed to a misconception that this species might not be severely threatened across its range. We find that not only are several subspecies and regional populations critically endangered but also the overall range loss is greater than the average for terrestrial large carnivores. To assess the leopard's status, we compile 6,000 records at 2,500 locations from over 1,300 sources on its historic (post 1750) and current distribution. We map the species across Africa and Asia, delineating areas where the species is confirmed present, is possibly present, is possibly extinct or is almost certainly extinct. The leopard now occupies 25-37% of its historic range, but this obscures important differences between subspecies. Of the nine recognized subspecies, three (P. p. pardus, fusca, and saxicolor) account for 97% of the leopard's extant range while another three (P. p. orientalis, nimr, and japonensis) have each lost as much as 98% of their historic range. Isolation, small patch sizes, and few remaining patches further threaten the six subspecies that each have less than 100,000 km² of extant range. Approximately 17% of extant leopard range is protected, although some endangered subspecies have far less. We found that while leopard research was increasing, research effort was primarily on the subspecies with the most remaining range whereas subspecies that are most in need of urgent attention were neglected.

235 citations


Journal ArticleDOI
19 Oct 2016-PeerJ
TL;DR: The results indicated that AgNPs@AV can be effectively utilized in pharmaceutical, biotechnological and biomedical applications and showed that the antibacterial effect of this hybrid nanomaterial was sufficient that it could be used to inhibit pathogenic bacteria.
Abstract: Background: There is worldwide interest in silver nanoparticles (AgNPs) synthesized by various chemical reactions for use in applications exploiting their antibacterial activity, even though these processes exhibit a broad range of toxicity in vertebrates and invertebrates alike. To avoid the chemical toxicity, biosynthesis (green synthesis) of metal nanoparticles is proposed as a cost-effective and environmentally friendly alternative. Aloe vera leaf extract is a medicinal agent with multiple properties including an antibacterial effect. Moreover, the constituents of aloe vera leaves include lignin, hemicellulose, and pectins, which can be used in the reduction of silver ions to produce AgNPs@aloe vera (AgNPs@AV) with antibacterial activity. Methods: AgNPs were prepared by an eco-friendly hydrothermal method using an aloe vera plant extract solution as both a reducing and stabilizing agent. AgNPs@AV were characterized using XRD and SEM. Additionally, an agar well diffusion method was used to screen for antimicrobial activity. MIC and MBC were used to correlate the concentration of AgNPs@AV with its bactericidal effect. SEM was used to investigate bacterial inactivation. Then the toxicity with human cells was investigated using an MTT assay. Results: The synthesized AgNPs were crystalline with sizes ranging from 70.70 ± 22 to 192.02 ± 53 nm as revealed using XRD and SEM. The sizes of AgNPs can be varied through alteration of times and temperatures used in their synthesis. These AgNPs were investigated for potential use as an antibacterial agent to inhibit pathogenic bacteria. Their antibacterial activity was tested on S. epidermidis and P. aeruginosa. The results showed that AgNPs had high antibacterial activity, which depended on their synthesis conditions, particularly when processed at 100 °C for 6 h and 200 °C for 12 h. The cytotoxicity of AgNPs was determined using human PBMCs, revealing no obvious cytotoxicity. These results indicated that AgNPs@AV can be effectively utilized in pharmaceutical, biotechnological and biomedical applications. Discussion: Aloe vera extract was processed using a green and facile method. This was a hydrothermal method to reduce silver nitrate to AgNPs@AV. Varying the hydrothermal temperature provided the fine spherical shaped nanoparticles. The size of the nanomaterial was affected by its thermal preparation. The particle size of AgNPs could be tuned by varying both time and temperature. A process using a pure Ag phase could go to completion in 6 h at 200 °C, whereas reactions at lower temperatures required longer times. Moreover, the antibacterial effect of this hybrid nanomaterial was sufficient that it could be used to inhibit pathogenic bacteria since silver release was dependent upon its particle size. The high activity of the largest AgNPs might have resulted from a high concentration of aloe vera compounds incorporated into the AgNPs during hydrothermal synthesis.

227 citations


Journal ArticleDOI
28 Apr 2016-PeerJ
TL;DR: A new data matrix composed of 96 separate taxa and 600 osteological characters was assembled and analysed to generate a comprehensive higher-level phylogenetic hypothesis of basal archosauromorphs and shed light on the species-level interrelationships of taxa historically identified as proterosuchian archosauriforms.
Abstract: The early evolution of archosauromorphs during the Permo-Triassic constitutes an excellent empirical case study to shed light on evolutionary radiations in deep time and the timing and processes of recovery of terrestrial faunas after a mass extinction. However, macroevolutionary studies of early archosauromorphs are currently limited by poor knowledge of their phylogenetic relationships. In particular, one of the main early archosauromorph groups that need an exhaustive phylogenetic study is "Proterosuchia," which as historically conceived includes members of both Proterosuchidae and Erythrosuchidae. A new data matrix composed of 96 separate taxa (several of them not included in a quantitative phylogenetic analysis before) and 600 osteological characters was assembled and analysed to generate a comprehensive higher-level phylogenetic hypothesis of basal archosauromorphs and shed light on the species-level interrelationships of taxa historically identified as proterosuchian archosauriforms. The results of the analysis using maximum parsimony include a polyphyletic "Prolacertiformes" and "Protorosauria," in which the Permian Aenigmastropheus and Protorosaurus are the most basal archosauromorphs. The enigmatic choristoderans are either found as the sister-taxa of all other lepidosauromorphs or archosauromorphs, but consistently placed within Sauria. Prolacertids, rhynchosaurs, allokotosaurians and tanystropheids are the major successive sister clades of Archosauriformes. The Early Triassic Tasmaniosaurus is recovered as the sister-taxon of Archosauriformes. Proterosuchidae is unambiguously restricted to five species that occur immediately before and after the Permo-Triassic boundary, thus implying that they are a short-lived "disaster" clade. Erythrosuchidae is composed of eight nominal species that occur during the Early and Middle Triassic. "Proterosuchia" is polyphyletic, in which erythrosuchids are more closely related to Euparkeria and more crownward archosauriforms than to proterosuchids, and several species are found widespread along the archosauromorph tree, some being nested within Archosauria (e.g., "Chasmatosaurus ultimus," Youngosuchus). Doswelliids and proterochampsids are recovered as more closely related to each other than to other archosauromorphs, forming a large clade (Proterochampsia) of semi-aquatic to aquatic forms that includes the bizarre genus Vancleavea. Euparkeria is one of the sister-taxa of the clade composed of proterochampsians and archosaurs. The putative Indian archosaur Yarasuchus is recovered in a polytomy with Euparkeria and more crownward archosauriforms, and as more closely related to the Russian Dongusuchus than to other species. Phytosaurs are recovered as the sister-taxa of all other pseudosuchians, thus being nested within Archosauria.

213 citations


Journal ArticleDOI
29 Sep 2016-PeerJ
TL;DR: The finding that GPS features predict depressive symptom severity up to 10 weeks prior to assessment suggests that GPS features may have potential as early warning signals of depression.
Abstract: Background: Smartphones offer the hope that depression can be detected using passively collected data from the phone sensors. The aim of this study was to replicate and extend previous work using geographic location (GPS) sensors to identify depressive symptom severity. Methods: We used a dataset collected from 48 college students over a 10-week period, which included GPS phone sensor data and the Patient Health Questionnaire 9-item (PHQ-9) to evaluate depressive symptom severity at baseline and end-of-study. GPS features were calculated over the entire study, for weekdays and weekends, and in 2-week blocks. Results: The results of this study replicated our previous findings that a number of GPS features, including location variance, entropy, and circadian movement, were significantly correlated with PHQ-9 scores (r's ranging from -0.43 to -0.46, p-values Discussion: Our findings were consistent with past research demonstrating that GPS features may be an important and reliable predictor of depressive symptom severity. The varying strength of these relationships on weekends and weekdays suggests the role of weekend/weekday as a moderating variable. The finding that GPS features predict depressive symptom severity up to 10 weeks prior to assessment suggests that GPS features may have potential as early warning signals of depression.
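
Two of the GPS features named above, location variance and entropy, can be sketched in Python. The abstract does not give formulas, so the definitions here are assumptions: location variance is taken as the logarithm of the summed variances of latitude and longitude, and entropy as the Shannon entropy of the share of samples spent in each location cluster. The function names and the cluster labels are hypothetical.

```python
import math
from collections import Counter
from statistics import pvariance


def location_variance(lats, lons):
    """Log of the combined variance of latitude and longitude samples
    (assumed definition; undefined when the device never moves)."""
    return math.log(pvariance(lats) + pvariance(lons))


def location_entropy(cluster_labels):
    """Shannon entropy of the fraction of samples spent in each cluster.
    Higher values mean time is spread more evenly across locations."""
    n = len(cluster_labels)
    counts = Counter(cluster_labels)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


# Hypothetical data: time split evenly between two places vs. spent in one.
even_split = location_entropy(["home", "home", "work", "work"])
single_place = location_entropy(["home"] * 4)
```

Under these definitions, a participant who splits time evenly between two places has entropy ln 2, while one who stays in a single place has entropy 0.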

Journal ArticleDOI
26 Jul 2016-PeerJ
TL;DR: The Phage On Tap protocol is introduced for the quick and efficient preparation of homogenous bacteriophage stocks, producing homogenous, laboratory-scale, high titer phage banks that can be used to eliminate the variability between phage propagations and improve the molecular characterizations of phage.
Abstract: A major limitation with traditional phage preparations is the variability in titer, salts, and bacterial contaminants between successive propagations. Here we introduce the Phage On Tap (PoT) protocol for the quick and efficient preparation of homogenous bacteriophage (phage) stocks. This method produces homogenous, laboratory-scale, high titer (up to 10¹⁰–10¹¹ PFU·ml⁻¹), endotoxin-reduced phage banks that can be used to eliminate the variability between phage propagations and improve the molecular characterizations of phage. The method consists of five major parts, including phage propagation, phage clean up by 0.22 μm filtering and chloroform treatment, phage concentration by ultrafiltration, endotoxin removal, and the preparation and storage of phage banks for continuous laboratory use. From a starting liquid lysate of > 100 mL, the PoT protocol generated a clean, homogenous, laboratory phage bank with a phage recovery efficiency of 85% within just two days. In contrast, the traditional method took upwards of five days to produce a high titer, but lower volume phage stock with a recovery efficiency of only 4%. Phage banks can be further purified for the removal of bacterial endotoxins, reducing endotoxin concentrations by over 3,000-fold while maintaining phage titer. The PoT protocol focused on T-like phages, but is broadly applicable to a variety of phages that can be propagated to sufficient titer, producing homogenous, high titer phage banks that are applicable for molecular and cellular assays.

Journal ArticleDOI
19 Jan 2016-PeerJ
TL;DR: The findings are consistent with the presence of a unique microbiota dominated by Bacteroides residing on the endometrium of the human non-pregnant uterus and likely to have a previously unrecognized role in uterine physiology and human reproduction.
Abstract: Background. It is widely assumed that the uterine cavity in non-pregnant women is physiologically sterile, also as a premise to the long-held view that human infants develop in a sterile uterine environment, though likely reflecting under-appraisal of the extent of the human bacterial metacommunity. In an exploratory study, we aimed to investigate the putative presence of a uterine microbiome in a selected series of non-pregnant women through deep sequencing of the V1-2 hypervariable region of the 16S ribosomal RNA (rRNA) gene. Methods. Nineteen women with various reproductive conditions, including subfertility, scheduled for hysteroscopy and not showing uterine anomalies were recruited. Subjects were highly diverse with regard to demographic and medical history and included nulliparous and parous women. Endometrial tissue and mucus harvesting was performed by use of a transcervical device designed to obtain endometrial biopsy, while avoiding cervicovaginal contamination. Bacteria were targeted by use of a barcoded Illumina MiSeq paired-end sequencing method targeting the 16S rRNA gene V1-2 region, yielding an average of 41,194 reads per sample after quality filtering. Taxonomic annotation was pursued by comparison with sequences available through the Ribosomal Database Project and the NCBI database. Results. Out of 183 unique 16S rRNA gene amplicon sequences, 15 phylotypes were present in all samples. In some 90% of the women included, community architecture was fairly similar inasmuch as B. xylanisolvens, B. thetaiotaomicron, B. fragilis and an undetermined Pelomonas taxon constituted over one third of the endometrial bacterial community. On the singular phylotype level, six women showed predominance of L. crispatus or L. iners in the presence of the Bacteroides core. Two endometrial communities were highly dissimilar, largely lacking the Bacteroides core, one dominated by L. crispatus and another consisting of a highly diverse community, including Prevotella spp., Atopobium vaginae, and Mobiluncus curtisii. Discussion. Our findings are, albeit not necessarily generalizable, consistent with the presence of a unique microbiota dominated by Bacteroides residing on the endometrium of the human non-pregnant uterus. The transcervical sampling approach may be influenced to an unknown extent by endocervical microbiota, which remain uncharacterised, and therefore warrants further validation. Nonetheless, consistent with our understanding of the human microbiome, the uterine microbiota are likely to have a previously unrecognized role in uterine physiology and human reproduction. Further study is therefore warranted to document community ecology and dynamics of the uterine microbiota, as well as the role of the uterine microbiome in health and disease.

Journal ArticleDOI
28 Mar 2016-PeerJ
TL;DR: The ability of the PacBio Single Molecule, Real-Time (SMRT) DNA sequencing platform to generate sequence reads from the 16S rRNA gene is tested and a simple method based on sequence characteristics and quality scores is developed to reduce the observed error rate for the V1–V9 region.
Abstract: Over the past 10 years, microbial ecologists have largely abandoned sequencing 16S rRNA genes by the Sanger sequencing method and have instead adopted highly parallelized sequencing platforms. These new platforms, such as 454 and Illumina's MiSeq, have allowed researchers to obtain millions of high quality but short sequences. The result of the added sequencing depth has been significant improvements in experimental design. The tradeoff has been the decline in the number of full-length reference sequences that are deposited into databases. To overcome this problem, we tested the ability of the PacBio Single Molecule, Real-Time (SMRT) DNA sequencing platform to generate sequence reads from the 16S rRNA gene. We generated sequencing data from the V4, V3-V5, V1-V3, V1-V5, V1-V6, and V1-V9 variable regions from within the 16S rRNA gene using DNA from a synthetic mock community and natural samples collected from human feces, mouse feces, and soil. The mock community allowed us to assess the actual sequencing error rate and how that error rate changed when different curation methods were applied. We developed a simple method based on sequence characteristics and quality scores to reduce the observed error rate for the V1-V9 region from 0.69 to 0.027%. This error rate is comparable to what has been observed for the shorter reads generated by 454 and Illumina's MiSeq sequencing platforms. Although the per base sequencing cost is still significantly more than that of MiSeq, the prospect of supplementing reference databases with full-length sequences from organisms below the limit of detection from the Sanger approach is exciting.

Journal ArticleDOI
05 Apr 2016-PeerJ
TL;DR: WGS MLST derived serotyping is a high throughput, accurate, robust, reliable typing method, well suited to routine public health surveillance and supports the maintenance of traditional serovar nomenclature while providing additional insight on the true phylogenetic relationship between isolates.
Abstract: In April 2015, Public Health England implemented whole genome sequencing (WGS) as a routine typing tool for public health surveillance of Salmonella, adopting a multilocus sequence typing (MLST) approach as a replacement for traditional serotyping. The WGS derived sequence type (ST) was compared to the phenotypic serotype for 6,887 isolates of S. enterica subspecies I, and of these, 6,616 (96%) were concordant. Of the 4% (n = 271) of isolates of subspecies I exhibiting a mismatch, 119 were due to a process error in the laboratory, 26 were likely caused by the serotype designation in the MLST database being incorrect and 126 occurred when two different serovars belonged to the same ST. The population structure of S. enterica subspecies II-IV differs markedly from that of subspecies I and, based on current data, defining the serovar from the clonal complex may be less appropriate for the classification of this group. Novel sequence types that were not present in the MLST database were identified in 8.6% of the total number of samples tested (including S. enterica subspecies I-IV and S. bongori) and these 654 isolates belonged to 326 novel STs. For S. enterica subspecies I, WGS MLST derived serotyping is a high throughput, accurate, robust, reliable typing method, well suited to routine public health surveillance. The combined output of ST and serovar supports the maintenance of traditional serovar nomenclature while providing additional insight on the true phylogenetic relationship between isolates.
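
The concordance figures reported above can be checked with a few lines of arithmetic. The numbers are taken from the abstract; the variable names and dictionary labels are only illustrative.

```python
# Counts reported for S. enterica subspecies I isolates.
total_isolates = 6887
concordant = 6616

mismatched = total_isolates - concordant
concordance_percent = 100.0 * concordant / total_isolates

# Reported causes of the mismatches between sequence type and serotype.
mismatch_causes = {
    "process error in the laboratory": 119,
    "incorrect serotype designation in the MLST database": 26,
    "two different serovars sharing one ST": 126,
}

# The three causes should account for every mismatched isolate.
causes_total = sum(mismatch_causes.values())
```

The three reported causes sum to the 271 mismatched isolates, and the concordance rounds to the stated 96%.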


Journal ArticleDOI
22 Mar 2016-PeerJ
TL;DR: The Emotiv EPOC device may be more suitable for control tasks using the attention/meditation level or eye blinking than the Neurosky MindWave device, which exhibits high variability and non-normality of attention and meditation data.
Abstract: We present the evaluation of two well-known, low-cost consumer-grade EEG devices: the Emotiv EPOC and the Neurosky MindWave. Problems with using the consumer-grade EEG devices (BCI illiteracy, poor technical characteristics, and adverse EEG artefacts) are discussed. The experimental evaluation of the devices, performed with 10 subjects asked to perform concentration/relaxation and blinking recognition tasks, is given. The results of statistical analysis show that both devices exhibit high variability and non-normality of attention and meditation data, which makes each of them difficult to use as an input to control tasks. BCI illiteracy may be a significant problem, as may setting up a proper environment for the experiment. The results of blinking recognition show that recognition accuracy with the Neurosky device is less than 50%, while the Emotiv device achieved a recognition accuracy of more than 75%; for tasks that require concentration and relaxation of subjects, the Emotiv EPOC device performed better (as measured by the recognition accuracy) by ∼9%. Therefore, the Emotiv EPOC device may be more suitable for control tasks using the attention/meditation level or eye blinking than the Neurosky MindWave device.

Journal ArticleDOI
09 Aug 2016-PeerJ
TL;DR: While current pollinator management approaches are largely driven by mitigating past impacts, this work presents opportunities for pre-emptive practice, legislation, and policy to sustainably manage pollinators for future generations.
Abstract: Background. Pollinators, which provide the agriculturally and ecologically essential service of pollination, are under threat at a global scale. Habitat loss and homogenisation, pesticides, parasites and pathogens, invasive species, and climate change have been identified as past and current threats to pollinators. Actions to mitigate these threats, e.g., agri-environment schemes and pesticide-use moratoriums, exist, but have largely been applied post-hoc. However, future sustainability of pollinators and the service they provide requires anticipation of potential threats and opportunities before they occur, enabling timely implementation of policy and practice to prevent, rather than mitigate, further pollinator declines. Methods. Using a horizon scanning approach we identified issues that are likely to impact pollinators, either positively or negatively, over the coming three decades. Results. Our analysis highlights six high priority, and nine secondary issues. High priorities are: (1) corporate control of global agriculture, (2) novel systemic pesticides, (3) novel RNA viruses, (4) the development of new managed pollinators, (5) more frequent heatwaves and drought under climate change, and (6) the potential positive impact of reduced chemical use on pollinators in non-agricultural settings. Discussion. While current pollinator management approaches are largely driven by mitigating past impacts, we present opportunities for pre-emptive practice, legislation, and policy to sustainably manage pollinators for future generations.

Journal ArticleDOI
20 Oct 2016-PeerJ
TL;DR: The recent shift from costly and complex engineering solutions to recover degraded reef structure to more economical and efficient ecological approaches that focus on recovering the living components of reef communities is described.
Abstract: Reef restoration activities have proliferated in response to the need to mitigate coral declines and recover lost reef structure, function, and ecosystem services. Here, we describe the recent shift from costly and complex engineering solutions to recover degraded reef structure to more economical and efficient ecological approaches that focus on recovering the living components of reef communities. We review the adoption and expansion of the coral gardening framework in the Caribbean and Western Atlantic where practitioners now grow and outplant tens of thousands of corals onto degraded reefs each year. We detail the steps for establishing a gardening program as well as long-term goals and direct and indirect benefits of this approach in our region. With a strong scientific basis, coral gardening activities now contribute significantly to reef and species recovery, provide important scientific, education, and outreach opportunities, and offer alternate livelihoods to local stakeholders. While challenges still remain, the transition from engineering to ecological solutions for reef degradation has opened the field of coral reef restoration to a wider audience poised to contribute to reef conservation and recovery in regions where coral losses and recruitment bottlenecks hinder natural recovery.

Journal ArticleDOI
08 Dec 2016-PeerJ
TL;DR: Mock viral communities are designed and application to community virus DNA from three freshwater and three marine samples revealed that ssDNA viruses as a whole represent only a minor fraction of DNA virus communities, though individual ssDNA genomes can be among the most abundant viral genomes in a sample.
Abstract: National Science Foundation [1536989]; Gordon and Betty Moore Foundation [3790, GBMF2631]; Flinn Foundation; University of Arizona Technology and Research Initiative Fund through the Water, Environmental and Energy Solutions Initiative; Ecosystem Genomics Institute; NSF [MCB-0701984, DEB-1555854]

Journal ArticleDOI
26 Apr 2016-PeerJ
TL;DR: A negative relationship between job satisfaction and intention to quit the existing employment of nurses in Turkey was revealed and satisfaction with supervisor support was the only facet that significantly explained turnover intent when controlling for gender, age, marital status, education, and experience.
Abstract: The aim of this study was to identify the facets influencing job satisfaction and intention to quit of nurses employed in Turkey. Using a non-probability sampling technique, 417 nurses from six large private hospitals were surveyed from March 2014 to June 2014. The nurses' demographic data, their job-related satisfaction and turnover intentions were recorded through a self-administered questionnaire. In this study, descriptive and bivariate analyses were used to explore data, and multivariate analysis was performed using logistic regression. Nurses' job satisfaction was found to be at a moderate level, and 61% of the nurses intended to quit. Nevertheless, nurses reported a high satisfaction level with work environment, supervisor support, and co-workers among the selected nine facets of job satisfaction. They also reported a low satisfaction level with contingent reward, fringe benefits, and pay. The impact of demographic characteristics on job satisfaction and intention to quit was also examined. The study revealed a negative relationship between job satisfaction and intention to quit the existing employment. Moreover, satisfaction with supervisor support was the only facet that significantly explained turnover intent when controlling for gender, age, marital status, education, and experience. The implications for nurse management were also described for increasing nurses' job satisfaction and retention. This study is beneficial for hospital management to ensure proper nursing care that would lead to a better quality healthcare service.

Journal ArticleDOI
02 Feb 2016-PeerJ
TL;DR: This is the first study to present source spectra for populations of different ship classes operating in coastal habitats, including at higher frequencies used by killer whales for both communication and echolocation.
Abstract: Combining calibrated hydrophone measurements with vessel location data from the Automatic Identification System, we estimate underwater sound pressure levels for 1,582 unique ships that transited the core critical habitat of the endangered Southern Resident killer whales during 28 months between March, 2011, and October, 2013. Median received spectrum levels of noise from 2,809 isolated transits are elevated relative to median background levels not only at low frequencies (20-30 dB re 1 µPa²/Hz from 100 to 1,000 Hz), but also at high frequencies (5-13 dB from 10,000 to 96,000 Hz). Thus, noise received from ships at ranges less than 3 km extends to frequencies used by odontocetes. Broadband received levels (11.5-40,000 Hz) near the shoreline in Haro Strait (WA, USA) for the entire ship population were 110 ± 7 dB re 1 µPa on average. Assuming near-spherical spreading based on a transmission loss experiment, we compute mean broadband source levels for the ship population of 173 ± 7 dB re 1 µPa @ 1 m without accounting for frequency-dependent absorption. Mean ship speed was 7.3 ± 2.0 m/s (14.1 ± 3.9 knots). Most ship classes show a linear relationship between source level and speed with a slope near +2 dB per m/s (+1 dB/knot). Spectrum, 1/12-octave, and 1/3-octave source levels for the whole population have median values that are comparable to previous measurements and models at most frequencies, but for select studies may be relatively low below 200 Hz and high above 20,000 Hz. Median source spectrum levels peak near 50 Hz for all 12 ship classes, have a maximum of 159 dB re 1 µPa²/Hz @ 1 m for container ships, and vary between classes. Below 200 Hz, the class-specific median spectrum levels bifurcate with large commercial ships grouping as higher power noise sources. 
Within all ship classes spectrum levels vary more at low frequencies than at high frequencies, and the degree of variability is almost halved for classes that have smaller speed standard deviations. This is the first study to present source spectra for populations of different ship classes operating in coastal habitats, including at higher frequencies used by killer whales for both communication and echolocation.
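
The relationship between the received and source levels quoted above can be illustrated with a small calculation. This is only a sketch under stated assumptions: the abstract says "near-spherical" spreading without giving the spreading coefficient, so the code assumes pure spherical spreading (20 log10 of range) and no absorption. The function names are hypothetical, and the "implied range" is an illustrative figure, not a value reported by the authors.

```python
import math

# 1 knot is defined as 1852 m per hour, so 1 m/s = 3600/1852 knots.
KNOTS_PER_MPS = 3600.0 / 1852.0


def mps_to_knots(speed_mps: float) -> float:
    """Convert a speed from metres per second to knots."""
    return speed_mps * KNOTS_PER_MPS


def spherical_transmission_loss_db(range_m: float) -> float:
    """Transmission loss relative to 1 m, ASSUMING pure spherical spreading
    (20*log10(r)) and ignoring frequency-dependent absorption."""
    return 20.0 * math.log10(range_m)


def source_level_db(received_level_db: float, range_m: float) -> float:
    """Back-propagate a received level to a source level at 1 m."""
    return received_level_db + spherical_transmission_loss_db(range_m)


def implied_range_m(source_db: float, received_db: float) -> float:
    """Range at which the assumed spreading law links the two levels."""
    return 10.0 ** ((source_db - received_db) / 20.0)


# Figures taken from the abstract: 7.3 m/s mean speed, +2 dB per m/s slope,
# 110 dB re 1 uPa received and 173 dB re 1 uPa @ 1 m source (broadband means).
mean_speed_knots = mps_to_knots(7.3)
slope_db_per_knot = 2.0 / KNOTS_PER_MPS
range_for_reported_levels = implied_range_m(173.0, 110.0)

print(f"mean speed: {mean_speed_knots:.2f} knots")
print(f"slope: {slope_db_per_knot:.2f} dB/knot")
print(f"implied range: {range_for_reported_levels:.0f} m")
```

The 63 dB difference between the two mean levels corresponds, under this simplified law, to a range on the order of 1.4 km, which is consistent with the abstract's statement that ships were received at ranges under 3 km. The direct unit conversion of the mean speed comes out slightly above the quoted 14.1 knots, presumably because of rounding in the reported values.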

Journal ArticleDOI
21 Mar 2016-PeerJ
TL;DR: With proper planning of take-off and landing sites, flight paths and careful UAV model selection, UAVs can provide an excellent tool for accurately surveying wild waterfowl populations and provide archival data with fewer logistical issues than traditional methods such as manned aerial surveys.
Abstract: The use of unmanned aerial vehicles (UAVs) for ecological research has grown rapidly in recent years, but few studies have assessed the disturbance impacts of these tools on focal subjects, particularly when observing easily disturbed species such as waterfowl. In this study we assessed the level of disturbance that a range of UAV shapes and sizes had on free-living, non-breeding waterfowl surveyed in two sites in eastern Australia between March and May 2015, as well as the capability of airborne digital imaging systems to provide adequate resolution for unambiguous species identification of these taxa. We found little or no obvious disturbance effects on wild, mixed-species flocks of waterfowl when UAVs were flown at least 60 m above the water level (fixed wing models) or 40 m above individuals (multirotor models). Disturbance in the form of swimming away from the UAV through to leaving the water surface and flying away from the UAV was visible at lower altitudes and when fixed-wing UAVs either approached subjects directly or rapidly changed altitude and/or direction near animals. Using tangential approach flight paths that did not cause disturbance, commercially available onboard optical equipment was able to capture images of sufficient quality to identify waterfowl and even much smaller taxa such as swallows. Our results show that with proper planning of take-off and landing sites, flight paths and careful UAV model selection, UAVs can provide an excellent tool for accurately surveying wild waterfowl populations and provide archival data with fewer logistical issues than traditional methods such as manned aerial surveys.

Journal ArticleDOI
18 Aug 2016-PeerJ
TL;DR: The isolation of an antimicrobial compound produced by Pseudovibrio sp. P12, a common and abundant coral-associated bacterium, is reported.
Abstract: Bacterial communities associated with healthy corals produce antimicrobial compounds that inhibit the colonization and growth of invasive microbes and potential pathogens. To date, however, bacteria-derived antimicrobial molecules have not been identified in reef-building corals. Here, we report the isolation of an antimicrobial compound produced by Pseudovibrio sp. P12, a common and abundant coral-associated bacterium. This strain was capable of metabolizing dimethylsulfoniopropionate (DMSP), a sulfur molecule produced in high concentrations by reef-building corals and playing a role in structuring their bacterial communities. Bioassay-guided fractionation coupled with nuclear magnetic resonance (NMR) and mass spectrometry (MS), identified the antimicrobial as tropodithietic acid (TDA), a sulfur-containing compound likely derived from DMSP catabolism. TDA was produced in large quantities by Pseudovibrio sp., and prevented the growth of two previously identified coral pathogens, Vibrio coralliilyticus and V. owensii, at very low concentrations (0.5 mg/mL) in agar diffusion assays. Genome sequencing of Pseudovibrio sp. P12 identified gene homologs likely involved in the metabolism of DMSP and production of TDA. These results provide additional evidence for the integral role of DMSP in structuring coral-associated bacterial communities and underline the potential of these DMSP-metabolizing microbes to contribute to coral disease prevention.

Journal ArticleDOI
19 Apr 2016-PeerJ
TL;DR: The ability of the 16S ribosomal DNA gene as an alternative metabarcoding marker for species level assessments was explored and variation in read abundances of two orders of magnitude is still observed.
Abstract: Cytochrome c oxidase I (COI) is a powerful marker for DNA barcoding of animals, with good taxonomic resolution and a large reference database. However, when used for DNA metabarcoding, estimation of taxa abundances and species detection are limited due to primer bias caused by highly variable primer binding sites across the COI gene. Therefore, we explored the ability of the 16S ribosomal DNA gene as an alternative metabarcoding marker for species level assessments. Ten bulk samples, each containing equal amounts of tissue from 52 freshwater invertebrate taxa, were sequenced with the Illumina NextSeq 500 system. The 16S primers amplified three more insect species than the Folmer COI primers and amplified more equally, probably due to decreased primer bias. Estimation of biomass might be less biased with 16S than with COI, although variation in read abundances of two orders of magnitude is still observed. According to these results, the marker choice depends on the scientific question. If the goal is to obtain a taxonomic identification at the species level, then COI is more appropriate due to established reference databases and known taxonomic resolution of this marker, knowing that a greater proportion of insects will be missed using COI Folmer primers. If the goal is to obtain a more comprehensive survey, the 16S marker, which requires building a local reference database, or optimised degenerated COI primers could be more appropriate.

Journal ArticleDOI
05 Oct 2016-PeerJ
TL;DR: Preterm infants benefit from probiotics to prevent severe NEC and death, and no difference was shown in culture-proven sepsis RR 0.77–1.00].
Abstract: Context. Necrotizing enterocolitis (NEC) is the most frequent gastrointestinal emergency in neonates. The microbiome of the preterm gut may regulate the integrity of the intestinal mucosa. Probiotics may positively contribute to mucosal integrity, potentially reducing the risk of NEC in neonates.

Journal ArticleDOI
14 Sep 2016-PeerJ
TL;DR: A WGS-based serotyping method that can predict capsular type to serotype level for 89/94 serotypes and to serogroup level for the remaining four is developed, which could be integrated into routine typing workflows in reference laboratories, reducing the need for phenotypic immunological testing.
Abstract: Streptococcus pneumoniae typically express one of 92 serologically distinct capsule polysaccharide (cps) types (serotypes). Some of these serotypes are closely related to each other; using the commercially available typing antisera, these are assigned to common serogroups containing types that show cross-reactivity. In this serotyping scheme, factor antisera are used to allocate serotypes within a serogroup, based on patterns of reactions. This serotyping method is technically demanding, requires considerable experience and the reading of the results can be subjective. This study describes the analysis of the S. pneumoniae capsular operon genetic sequence to determine serotype distinguishing features and the development, evaluation and verification of an automated whole genome sequence (WGS)-based serotyping bioinformatics tool, PneumoCaT (Pneumococcal Capsule Typing). Initially, WGS data from 871 S. pneumoniae isolates were mapped to reference cps locus sequences for the 92 serotypes. Thirty-two of 92 serotypes could be unambiguously identified based on sequence similarities within the cps operon. The remaining 60 were allocated to one of 20 'genogroups' that broadly correspond to the immunologically defined serogroups. By comparing the cps reference sequences for each genogroup, unique molecular differences were determined for serotypes within 18 of the 20 genogroups and verified using the set of 871 isolates. This information was used to design a decision-tree style algorithm within the PneumoCaT bioinformatics tool to predict to serotype level for 89/94 (92 + 2 molecular types/subtypes) from WGS data and to serogroup level for serogroups 24 and 32, which currently comprise 2.1% of UK referred, invasive isolates submitted to the National Reference Laboratory (NRL), Public Health England (June 2014-July 2015). 
PneumoCaT was evaluated with an internal validation set of 2065 UK isolates covering 72/92 serotypes, including 19 non-typeable isolates and an external validation set of 2964 isolates from Thailand (n = 2,531), USA (n = 181) and Iceland (n = 252). PneumoCaT was able to predict serotype in 99.1% of the typeable UK isolates and in 99.0% of the non-UK isolates. Concordance was evaluated in UK isolates where further investigation was possible; in 91.5% of the cases the predicted capsular type was concordant with the serologically derived serotype. Following retesting, concordance increased to 99.3% and in most resolved cases (97.8%; 135/138) discordance was shown to be caused by errors in original serotyping. Replicate testing demonstrated that PneumoCaT gave 100% reproducibility of the predicted serotype result. In summary, we have developed a WGS-based serotyping method that can predict capsular type to serotype level for 89/94 serotypes and to serogroup level for the remaining four. This approach could be integrated into routine typing workflows in reference laboratories, reducing the need for phenotypic immunological testing.

Journal ArticleDOI
08 Nov 2016-PeerJ
TL;DR: New insights into the soil carbon cycle during an intense period of carbon turnover, including biogeochemical roles to previously little known soil microbes, were made possible via the combination of metagenomics, proteomics, and metabolomics.
Abstract: Annually, half of all plant-derived carbon is added to soil where it is microbially respired to CO2. However, understanding of the microbiology of this process is limited because most culture-independent methods cannot link metabolic processes to the organisms present, and this link to causative agents is necessary to predict the results of perturbations on the system. We collected soil samples at two sub-root depths (10-20 cm and 30-40 cm) before and after a rainfall-driven nutrient perturbation event in a Northern California grassland that experiences a Mediterranean climate. From ten samples, we reconstructed 198 metagenome-assembled genomes that represent all major phylotypes. We also quantified 6,835 proteins and 175 metabolites and showed that after the rain event the concentrations of many sugars and amino acids approach zero at the base of the soil profile. Unexpectedly, the genomes of novel members of the Gemmatimonadetes and Candidate Phylum Rokubacteria phyla encode pathways for methylotrophy. We infer that these abundant organisms contribute substantially to carbon turnover in the soil, given that methylotrophy proteins were among the most abundant proteins in the proteome. Previously undescribed Bathyarchaeota and Thermoplasmatales archaea are abundant in deeper soil horizons and are inferred to contribute appreciably to aromatic amino acid degradation. Many of the other bacteria appear to break down other components of plant biomass, as evidenced by the prevalence of various sugar and amino acid transporters and corresponding hydrolyzing machinery in the proteome. Overall, our work provides organism-resolved insight into the spatial distribution of bacteria and archaea whose activities combine to degrade plant-derived organics, limiting the transport of methanol, amino acids and sugars into underlying weathered rock. 
The new insights into the soil carbon cycle during an intense period of carbon turnover, including biogeochemical roles to previously little known soil microbes, were made possible via the combination of metagenomics, proteomics, and metabolomics.

Journal ArticleDOI
02 Nov 2016-PeerJ
TL;DR: This paper simulated millions of smooth, random 1D datasets to validate theoretical predictions of the 0D, 1D and ROI approaches and to emphasize how ROIs provide a continuous bridge between 0D and 1D results, and showed that a priori ROI particulars can qualitatively affect the biomechanical conclusions that emerge from analyses.
Abstract: One-dimensional (1D) kinematic, force, and EMG trajectories are often analyzed using zero-dimensional (0D) metrics like local extrema. Recently whole-trajectory 1D methods have emerged in the literature as alternatives. Since 0D and 1D methods can yield qualitatively different results, the two approaches may appear to be theoretically distinct. The purposes of this paper were (a) to clarify that 0D and 1D approaches are actually just special cases of a more general region-of-interest (ROI) analysis framework, and (b) to demonstrate how ROIs can augment statistical power. We first simulated millions of smooth, random 1D datasets to validate theoretical predictions of the 0D, 1D and ROI approaches and to emphasize how ROIs provide a continuous bridge between 0D and 1D results. We then analyzed a variety of public datasets to demonstrate potential effects of ROIs on biomechanical conclusions. Results showed, first, that a priori ROI particulars can qualitatively affect the biomechanical conclusions that emerge from analyses and, second, that ROIs derived from exploratory/pilot analyses can detect smaller biomechanical effects than are detectable using full 1D methods. We recommend regarding ROIs, like data filtering particulars and Type I error rate, as parameters which can affect hypothesis testing results, and thus as sensitivity analysis tools to ensure arbitrary decisions do not influence scientific interpretations. Last, we describe open-source Python and MATLAB implementations of 1D ROI analysis for arbitrary experimental designs ranging from one-sample t tests to MANOVA.
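
To make the idea of a region of interest (ROI) as a bridge between 0D and 1D analysis concrete, the following is a minimal simulation sketch. It is not the authors' open-source implementation: smooth random 1D data are approximated here by Gaussian-filtered white noise, the test statistic is a one-sample t trajectory, and the critical value for each ROI is estimated empirically as the 95th percentile of the maximum |t| inside that ROI. The sample size, number of nodes, smoothing width, ROI boundaries and all function names are assumptions chosen only for illustration.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d


def smooth_random_1d(n_curves, n_nodes, sigma, rng):
    """Smooth random trajectories: white noise filtered along the time axis.
    (No rescaling is applied; the t statistic below is scale invariant.)"""
    noise = rng.standard_normal((n_curves, n_nodes))
    return gaussian_filter1d(noise, sigma=sigma, axis=1)


def t_trajectory(y):
    """One-sample t statistic computed separately at every time node."""
    n = y.shape[0]
    return y.mean(axis=0) / (y.std(axis=0, ddof=1) / np.sqrt(n))


def null_critical_values(rois, n_sim=2000, n_curves=10, n_nodes=101,
                         sigma=8.0, alpha=0.05, seed=0):
    """Empirical (1 - alpha) quantile of max |t| inside each ROI under the null."""
    rng = np.random.default_rng(seed)
    maxima = {name: np.empty(n_sim) for name in rois}
    for i in range(n_sim):
        t = t_trajectory(smooth_random_1d(n_curves, n_nodes, sigma, rng))
        for name, roi in rois.items():
            maxima[name][i] = np.abs(t[roi]).max()
    return {name: float(np.quantile(m, 1.0 - alpha)) for name, m in maxima.items()}


# Hypothetical ROIs: a single node (0D), a mid-trajectory window, the whole curve (1D).
rois = {"0D": slice(50, 51), "ROI": slice(40, 61), "1D": slice(None)}

# One example dataset and its test statistic within each ROI.
rng = np.random.default_rng(1)
t = t_trajectory(smooth_random_1d(10, 101, 8.0, rng))
stats = {name: float(np.abs(t[roi]).max()) for name, roi in rois.items()}

# Critical thresholds grow as the ROI widens from a point to the full trajectory.
crit = null_critical_values(rois)
for name in rois:
    print(f"{name}: max|t| = {stats[name]:.2f}, critical value = {crit[name]:.2f}")
```

Because a narrower ROI searches fewer nodes, its critical threshold is lower, which is one way to see why a well-chosen ROI can detect smaller effects than a full 1D analysis, and also why the choice of ROI has to be treated as an analysis parameter.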

Journal ArticleDOI
17 Aug 2016-PeerJ
TL;DR: Deriving MLST from WGS data is more sensitive than the conventional method, and the mapping based approach was the most sensitive when comparing WGS based methods.
Abstract: Multilocus sequence typing (MLST) is an effective method to describe bacterial populations. Conventionally, MLST involves Polymerase Chain Reaction (PCR) amplification of housekeeping genes followed by Sanger DNA sequencing. Public Health England (PHE) is in the process of replacing the conventional MLST methodology with a method based on short read sequence data derived from Whole Genome Sequencing (WGS). This paper reports the comparison of the reliability of MLST results derived from WGS data, comparing mapping and assembly-based approaches to conventional methods using 323 bacterial genomes of diverse species. The sensitivity of the two WGS based methods was further investigated with 26 mixed and 29 low coverage genomic data sets from Salmonella enteridis and Streptococcus pneumoniae. Of the 323 samples, 92.9% (n = 300), 97.5% (n = 315) and 99.7% (n = 322) full MLST profiles were derived by the conventional method, assembly- and mapping-based approaches, respectively. The concordance between samples that were typed by conventional (92.9%) and both WGS methods was 100%. From the 55 mixed and low coverage genomes, 89.1% (n = 49) and 67.3% (n = 37) full MLST profiles were derived from the mapping and assembly based approaches, respectively. In conclusion, deriving MLST from WGS data is more sensitive than the conventional method. When comparing WGS based methods, the mapping based approach was the most sensitive. In addition, the mapping based approach described here derives quality metrics, which are difficult to determine quantitatively using conventional and WGS-assembly based approaches.
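
The percentages quoted in this abstract follow directly from the reported counts, as the short check below shows. The dictionary layout and the `percent` helper are illustrative assumptions; only the counts and totals come from the abstract.

```python
def percent(count: int, total: int) -> float:
    """Share of `total` represented by `count`, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


# Full MLST profiles recovered from the 323 genomes, by method.
full_profiles_323 = {"conventional": 300, "assembly": 315, "mapping": 322}

# Full MLST profiles recovered from the 55 mixed / low coverage data sets (26 + 29).
full_profiles_55 = {"mapping": 49, "assembly": 37}

rates_323 = {method: percent(n, 323) for method, n in full_profiles_323.items()}
rates_55 = {method: percent(n, 26 + 29) for method, n in full_profiles_55.items()}

print(rates_323)
print(rates_55)
```

The gap between the two WGS approaches is small on the main data set but widens considerably on the mixed and low coverage data sets, which is the basis for the abstract's statement that the mapping based approach was the most sensitive.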