
Showing papers in "PeerJ in 2015"


Journal ArticleDOI
27 Aug 2015-PeerJ
TL;DR: MetaBAT, introduced in this paper, integrates empirical probabilistic distances of genome abundance and tetranucleotide frequency for accurate metagenome binning, and automatically forms hundreds of high-quality genome bins on a very large assembly consisting of millions of contigs.
Abstract: Grouping large genomic fragments assembled from shotgun metagenomic sequences to deconvolute complex microbial communities, or metagenome binning, enables the study of individual organisms and their interactions. Because of the complex nature of these communities, existing metagenome binning methods often miss a large number of microbial species. In addition, most of the tools are not scalable to large datasets. Here we introduce automated software called MetaBAT that integrates empirical probabilistic distances of genome abundance and tetranucleotide frequency for accurate metagenome binning. MetaBAT outperforms alternative methods in accuracy and computational efficiency on both synthetic and real metagenome datasets. It automatically forms hundreds of high-quality genome bins on a very large assembly consisting of millions of contigs in a matter of hours on a single node. MetaBAT is open source software and available at https://bitbucket.org/berkeleylab/metabat.
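One of the two signals MetaBAT combines, tetranucleotide frequency, can be illustrated with a short sketch (a toy Python version for intuition only; MetaBAT's actual implementation differs, e.g., typical tools also merge reverse-complement k-mers and model empirical probabilistic distances):

```python
from itertools import product

def tetranucleotide_freqs(seq):
    """Return the normalized frequency of each 4-mer in a contig.

    Toy sketch of the composition signal used in metagenome binning;
    windows containing ambiguous bases (e.g., N) are skipped.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=4)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    return {k: c / total for k, c in counts.items()} if total else counts

freqs = tetranucleotide_freqs("ACGTACGTACGT")
# "ACGT" occurs in 3 of the 9 sliding windows
```

Contigs from the same genome tend to have similar frequency vectors, which is what makes this signal useful for binning.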

1,406 citations


Journal ArticleDOI
08 Oct 2015-PeerJ
TL;DR: Using anvi’o, this work re-analyzed publicly available datasets and explored temporal genomic changes within naturally occurring microbial populations through de novo characterization of single nucleotide variations, and linked cultivar and single-cell genomes with metagenomic and metatranscriptomic data.
Abstract: Advances in high-throughput sequencing and ‘omics technologies are revolutionizing studies of naturally occurring microbial communities. Comprehensive investigations of microbial lifestyles require the ability to interactively organize and visualize genetic information and to incorporate subtle differences that enable greater resolution of complex data. Here we introduce anvi’o, an advanced analysis and visualization platform that offers automated and human-guided characterization of microbial genomes in metagenomic assemblies, with interactive interfaces that can link ‘omics data from multiple sources into a single, intuitive display. Its extensible visualization approach distills multiple dimensions of information about each contig, offering a dynamic and unified work environment for data exploration, manipulation, and reporting. Using anvi’o, we re-analyzed publicly available datasets and explored temporal genomic changes within naturally occurring microbial populations through de novo characterization of single nucleotide variations, and linked cultivar and single-cell genomes with metagenomic and metatranscriptomic data. Anvi’o is an open-source platform that empowers researchers without extensive bioinformatics skills to perform and communicate in-depth analyses on large ‘omics datasets.

1,287 citations


Journal ArticleDOI
28 May 2015-PeerJ
TL;DR: VirSorter is a tool designed to detect viral signal in these different types of microbial sequence data in both a reference-dependent and reference-independent manner, leveraging probabilistic models and extensive virome data to maximize detection of novel viruses.
Abstract: Viruses of microbes impact all ecosystems where microbes drive key energy and substrate transformations including the oceans, humans and industrial fermenters. However, despite this recognized importance, our understanding of viral diversity and impacts remains limited by too few model systems and reference genomes. One way to fill these gaps in our knowledge of viral diversity is through the detection of viral signal in microbial genomic data. While multiple approaches have been developed and applied for the detection of prophages (viral genomes integrated in a microbial genome), new types of microbial genomic data are emerging that are more fragmented and larger scale, such as Single-cell Amplified Genomes (SAGs) of uncultivated organisms or genomic fragments assembled from metagenomic sequencing. Here, we present VirSorter, a tool designed to detect viral signal in these different types of microbial sequence data in both a reference-dependent and reference-independent manner, leveraging probabilistic models and extensive virome data to maximize detection of novel viruses. Performance testing shows that VirSorter's prophage prediction capability compares to that of available prophage predictors for complete genomes, but is superior in predicting viral sequences outside of a host genome (i.e., from extrachromosomal prophages, lytic infections, or partially assembled prophages). Furthermore, VirSorter outperforms existing tools for fragmented genomic and metagenomic datasets, and can identify viral signal in assembled sequence (contigs) as short as 3kb, while providing near-perfect identification (>95% Recall and 100% Precision) on contigs of at least 10kb. Because VirSorter scales to large datasets, it can also be used in "reverse" to more confidently identify viral sequence in viral metagenomes by sorting away cellular DNA whether derived from gene transfer agents, generalized transduction or contamination. 
Finally, VirSorter is made available through the iPlant Cyberinfrastructure that provides a web-based user interface interconnected with the required computing resources. VirSorter thus complements existing prophage prediction software to better leverage fragmented, SAG and metagenomic datasets in a way that will scale to modern sequencing. Given these features, VirSorter should enable the discovery of new viruses in microbial datasets, and further our understanding of uncultivated viral communities across diverse ecosystems.

863 citations


Journal ArticleDOI
18 Jun 2015-PeerJ
TL;DR: GraPhlAn (Graphical Phylogenetic Analysis), a computational tool that produces high-quality, compact visualizations of microbial genomes and metagenomes, is developed as an open-source command-driven tool in order to be easily integrated into complex, publication-quality bioinformatics pipelines.
Abstract: The increased availability of genomic and metagenomic data poses challenges at multiple analysis levels, including visualization of very large-scale microbial and microbial community data paired with rich metadata. We developed GraPhlAn (Graphical Phylogenetic Analysis), a computational tool that produces high-quality, compact visualizations of microbial genomes and metagenomes. This includes phylogenies spanning up to thousands of taxa, annotated with metadata ranging from microbial community abundances to microbial physiology or host and environmental phenotypes. GraPhlAn has been developed as an open-source command-driven tool in order to be easily integrated into complex, publication-quality bioinformatics pipelines. It can be executed either locally or through an online Galaxy web application. We present several examples including taxonomic and phylogenetic visualization of microbial communities, metabolic functions, and biomarker discovery that illustrate GraPhlAn's potential for modern microbial and community genomics.

590 citations


Journal ArticleDOI
10 Dec 2015-PeerJ
TL;DR: Swarm v2 has two important novel features: a new algorithm for d = 1 that allows the computation time of the program to scale linearly with increasing amounts of data; and a new fastidious option that reduces under-grouping by grafting low-abundance OTUs onto larger ones.
Abstract: Previously we presented Swarm v1, a novel and open source amplicon clustering program that produced fine-scale molecular operational taxonomic units (OTUs), free of arbitrary global clustering thresholds and input-order dependency. Swarm v1 worked with an initial phase that used iterative single-linkage with a local clustering threshold (d), followed by a phase that used the internal abundance structures of clusters to break chained OTUs. Here we present Swarm v2, which has two important novel features: (1) a new algorithm for d = 1 that allows the computation time of the program to scale linearly with increasing amounts of data; and (2) a new fastidious option that reduces under-grouping by grafting low-abundance OTUs (e.g., singletons and doubletons) onto larger ones. Swarm v2 also directly integrates the clustering and breaking phases, dereplicates sequencing reads with d = 0, outputs OTU representatives in fasta format, and plots individual OTUs as two-dimensional networks.
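The linear-time d = 1 idea can be sketched as exhaustive generation of single-edit variants followed by hash lookups, instead of all-pairs comparison (an illustrative Python sketch, not Swarm's actual implementation):

```python
def one_edit_variants(seq, alphabet="ACGT"):
    """Return all sequences within edit distance 1 of `seq`
    (substitutions, insertions, deletions).

    Looking these up in a hash table of observed amplicons finds all
    d = 1 neighbors without comparing every pair of sequences, which
    is the trick that makes a d = 1 clustering pass scale linearly.
    """
    variants = set()
    for i in range(len(seq)):
        variants.add(seq[:i] + seq[i + 1:])               # deletion
        for b in alphabet:
            if b != seq[i]:
                variants.add(seq[:i] + b + seq[i + 1:])   # substitution
    for i in range(len(seq) + 1):
        for b in alphabet:
            variants.add(seq[:i] + b + seq[i:])           # insertion
    variants.discard(seq)
    return variants

amplicons = {"ACGT", "ACGA", "AGGT", "TTTT"}
neighbors = one_edit_variants("ACGT") & amplicons  # the d = 1 neighbors
```

For an amplicon of length L the variant set has O(L) members, so the cost per sequence is independent of the dataset size.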

440 citations


Journal ArticleDOI
21 Jul 2015-PeerJ
TL;DR: Simulation results suggest that OLRE are a useful tool for modelling overdispersion in Binomial data, but that they do not perform well in all circumstances and researchers should take care to verify the robustness of parameter estimates of OLRE models.
Abstract: Overdispersion is a common feature of models of biological data, but researchers often fail to model the excess variation driving the overdispersion, resulting in biased parameter estimates and standard errors. Quantifying and modeling overdispersion when it is present is therefore critical for robust biological inference. One means to account for overdispersion is to add an observation-level random effect (OLRE) to a model, where each data point receives a unique level of a random effect that can absorb the extra-parametric variation in the data. Although some studies have investigated the utility of OLRE to model overdispersion in Poisson count data, studies doing so for Binomial proportion data are scarce. Here I use a simulation approach to investigate the ability of both OLRE models and Beta-Binomial models to recover unbiased parameter estimates in mixed effects models of Binomial data under various degrees of overdispersion. In addition, as ecologists often fit random intercept terms to models when the random effect sample size is low (<5 levels), I investigate the performance of both model types under a range of random effect sample sizes when overdispersion is present. Simulation results revealed that the efficacy of OLRE depends on the process that generated the overdispersion; OLRE failed to cope with overdispersion generated from a Beta-Binomial mixture model, leading to biased slope and intercept estimates, but performed well for overdispersion generated by adding random noise to the linear predictor. Comparison of parameter estimates from an OLRE model with those from its corresponding Beta-Binomial model readily identified when OLRE were performing poorly due to disagreement between effect sizes, and this strategy should be employed whenever OLRE are used for Binomial data to assess their reliability. 
Beta-Binomial models performed well across all contexts, but showed a tendency to underestimate effect sizes when modelling non-Beta-Binomial data. Finally, both OLRE and Beta-Binomial models performed poorly when models contained <5 levels of the random intercept term, especially for estimating variance components, and this effect appeared independent of total sample size. These results suggest that OLRE are a useful tool for modelling overdispersion in Binomial data, but that they do not perform well in all circumstances and researchers should take care to verify the robustness of parameter estimates of OLRE models.
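The two overdispersion-generating processes the simulations compare can be sketched as follows (illustrative numpy code with hypothetical parameter values, not the author's actual simulation setup):

```python
import numpy as np

rng = np.random.default_rng(42)
n_obs, trials = 1000, 20
p_true = 0.4

# Process 1: Beta-Binomial mixture -- each observation's success
# probability is drawn from a Beta distribution with mean p_true
# (phi is a hypothetical precision; smaller phi = more overdispersion).
phi = 5.0
p_bb = rng.beta(p_true * phi, (1 - p_true) * phi, size=n_obs)
y_bb = rng.binomial(trials, p_bb)

# Process 2: additive random noise on the logit-scale linear predictor,
# which is the process an observation-level random effect (OLRE) mimics.
logit = np.log(p_true / (1 - p_true)) + rng.normal(0.0, 1.0, size=n_obs)
y_olre = rng.binomial(trials, 1.0 / (1.0 + np.exp(-logit)))

# Both processes inflate the variance well beyond the plain Binomial
# value of trials * p * (1 - p).
binom_var = trials * p_true * (1 - p_true)
```

Fitting an OLRE model to data generated by process 1, as the paper shows, can bias the slope and intercept, whereas it copes well with data generated by process 2.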

287 citations


Journal ArticleDOI
08 Dec 2015-PeerJ
TL;DR: This study demonstrates that de novo methods are optimal for assigning sequences to OTUs and that the quality of these assignments needs to be assessed for multiple methods to identify the optimal clustering method for a particular dataset.
Abstract: Background. 16S rRNA gene sequences are routinely assigned to operational taxonomic units (OTUs) that are then used to analyze complex microbial communities. A number of methods have been employed to carry out the assignment of 16S rRNA gene sequences to OTUs, leading to confusion over which method is optimal. A recent study suggested that a clustering method should be selected based on its ability to generate stable OTU assignments that do not change as additional sequences are added to the dataset. In contrast, we contend that the quality of the OTU assignments, the ability of the method to properly represent the distances between the sequences, is more important. Methods. Our analysis implemented six de novo clustering algorithms (single linkage, complete linkage, average linkage, abundance-based greedy clustering, distance-based greedy clustering, and Swarm) as well as the open- and closed-reference methods. Using two previously published datasets we used the Matthews Correlation Coefficient (MCC) to assess the stability and quality of OTU assignments. Results. The stability of OTU assignments did not reflect the quality of the assignments. Depending on the dataset being analyzed, the average linkage and the distance- and abundance-based greedy clustering methods generated OTUs that were more likely to represent the actual distances between sequences than the open- and closed-reference methods. We also demonstrated that for the greedy algorithms VSEARCH produced assignments that were comparable to those produced by USEARCH, making VSEARCH a viable free and open source alternative to USEARCH. Further interrogation of the reference-based methods indicated that when USEARCH or VSEARCH were used to identify the closest reference, the OTU assignments were sensitive to the order of the reference sequences because the reference sequences can be identical over the region being considered.
More troubling was the observation that while both USEARCH and VSEARCH have a high level of sensitivity to detect reference sequences, the specificity of those matches was poor relative to the true best match. Discussion. Our analysis calls into question the quality and stability of OTU assignments generated by the open- and closed-reference methods as implemented in the current version of QIIME. This study demonstrates that de novo methods are optimal for assigning sequences to OTUs and that the quality of these assignments needs to be assessed for multiple methods to identify the optimal clustering method for a particular dataset.
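The Matthews Correlation Coefficient used to score OTU quality reduces to a single formula over pairwise confusion counts; a minimal sketch with hypothetical counts:

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient: +1 means OTU co-membership
    perfectly agrees with the true pairwise distances, 0 is no better
    than random, -1 is total disagreement."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical pairwise counts: sequence pairs correctly clustered
# together (tp), correctly kept apart (tn), wrongly together (fp),
# wrongly apart (fn).
score = mcc(tp=900, tn=9000, fp=100, fn=50)
```

Because it balances all four confusion-matrix cells, MCC is robust to the fact that most sequence pairs belong to different OTUs.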

238 citations


Journal ArticleDOI
07 Jul 2015-PeerJ
TL;DR: The 3D reconstruction of reef structure and complexity can be integrated with other physiological and ecological parameters in future research to develop reliable ecosystem models and improve capacity to monitor changes in the health and function of coral reef ecosystems.
Abstract: The structural complexity of coral reefs plays a major role in the biodiversity, productivity, and overall functionality of reef ecosystems. Conventional metrics with 2-dimensional properties are inadequate for characterization of reef structural complexity. A 3-dimensional (3D) approach can better quantify topography, rugosity and other structural characteristics that play an important role in the ecology of coral reef communities. Structure-from-Motion (SfM) is an emerging low-cost photogrammetric method for high-resolution 3D topographic reconstruction. This study utilized SfM 3D reconstruction software tools to create textured mesh models of a reef at French Frigate Shoals, an atoll in the Northwestern Hawaiian Islands. The reconstructed orthophoto and digital elevation model were then integrated with geospatial software in order to quantify metrics pertaining to 3D complexity. The resulting data provided high-resolution physical properties of coral colonies that were then combined with live cover to accurately characterize the reef as a living structure. The 3D reconstruction of reef structure and complexity can be integrated with other physiological and ecological parameters in future research to develop reliable ecosystem models and improve capacity to monitor changes in the health and function of coral reef ecosystems.
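One of the 3D complexity metrics such a reconstruction supports, surface rugosity (the ratio of 3D surface area to planar area), can be sketched from a digital elevation model (a simplified triangulated-facet calculation, not the geospatial software used in the study):

```python
import numpy as np

def surface_rugosity(dem, cell=1.0):
    """Ratio of triangulated 3D surface area to planar area for an
    elevation grid; 1.0 for a flat surface, larger for complex relief."""
    z = np.asarray(dem, dtype=float)
    area3d = 0.0
    # Split each grid cell into two triangles and sum their 3D areas.
    for i in range(z.shape[0] - 1):
        for j in range(z.shape[1] - 1):
            p00 = np.array([i * cell, j * cell, z[i, j]])
            p10 = np.array([(i + 1) * cell, j * cell, z[i + 1, j]])
            p01 = np.array([i * cell, (j + 1) * cell, z[i, j + 1]])
            p11 = np.array([(i + 1) * cell, (j + 1) * cell, z[i + 1, j + 1]])
            for a, b, c in ((p00, p10, p01), (p10, p11, p01)):
                area3d += np.linalg.norm(np.cross(b - a, c - a)) / 2
    planar = (z.shape[0] - 1) * (z.shape[1] - 1) * cell ** 2
    return area3d / planar

flat = surface_rugosity(np.zeros((5, 5)))  # flat reef -> rugosity 1.0
```

A real analysis would run this (or an equivalent GIS routine) over the SfM-derived digital elevation model at the grid resolution of the survey.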

213 citations


Journal ArticleDOI
23 Jul 2015-PeerJ
TL;DR: An overview of the recent literature on the use of internet-based testing to address important questions in perception research is provided, andStrengths and weaknesses of the online approach, relative to others, are highlighted, and recommendations made for those researchers who might be thinking about conducting their own studies using this increasingly-popular approach to research in the psychological sciences.
Abstract: This article provides an overview of the recent literature on the use of internet-based testing to address important questions in perception research. Our goal is to provide a starting point for the perception researcher who is keen on assessing this tool for their own research goals. Internet-based testing has several advantages over in-lab research, including the ability to reach a relatively broad set of participants and to quickly and inexpensively collect large amounts of empirical data, via services such as Amazon’s Mechanical Turk or Prolific Academic. In many cases, the quality of online data appears to match that collected in the lab. Generally speaking, online participants tend to be more representative of the population at large than those recruited for lab-based research. There are, though, some important caveats when it comes to collecting data online. It is obviously much more difficult to control the exact parameters of stimulus presentation (such as display characteristics) with online research. There are also some thorny ethical elements that need to be considered by experimenters. Strengths and weaknesses of the online approach, relative to others, are highlighted, and recommendations are made for those researchers who might be thinking about conducting their own studies using this increasingly popular approach to research in the psychological sciences.

207 citations


Journal ArticleDOI
25 Aug 2015-PeerJ
TL;DR: It is suggested that the oropharyngeal microbiome in individuals with schizophrenia is significantly different compared to controls, and that particular microbial species and metabolic pathways differentiate both groups.
Abstract: The role of the human microbiome in schizophrenia remains largely unexplored. The microbiome has been shown to alter brain development and modulate behavior and cognition in animals through gut-brain connections, and research in humans suggests that it may be a modulating factor in many disorders. This study reports findings from a shotgun metagenomic analysis of the oropharyngeal microbiome in 16 individuals with schizophrenia and 16 controls. High-level differences were evident at both the phylum and genus levels, with Proteobacteria, Firmicutes, Bacteroidetes, and Actinobacteria dominating both schizophrenia patients and controls, and Ascomycota being more abundant in schizophrenia patients than controls. Controls were richer in species but less even in their distributions, i.e., dominated by fewer species, as opposed to schizophrenia patients. Lactic acid bacteria were relatively more abundant in schizophrenia, including species of Lactobacilli and Bifidobacterium, which have been shown to modulate chronic inflammation. We also found Eubacterium halii, a lactate-utilizing species. Functionally, the microbiome of schizophrenia patients was characterized by an increased number of metabolic pathways related to metabolite transport systems including siderophores, glutamate, and vitamin B12. In contrast, carbohydrate and lipid pathways and energy metabolism were abundant in controls. These findings suggest that the oropharyngeal microbiome in individuals with schizophrenia is significantly different compared to controls, and that particular microbial species and metabolic pathways differentiate both groups. Confirmation of these findings in larger and more diverse samples, e.g., gut microbiome, will contribute to elucidating potential links between schizophrenia and the human microbiota.

202 citations


Journal ArticleDOI
26 Aug 2015-PeerJ
TL;DR: CFSAN SNP Pipeline is a robust and accurate tool that is among the first to combine into a single executable the myriad steps required to produce a SNP matrix from NGS data.
Abstract: The analysis of next-generation sequence (NGS) data is often a fragmented step-wise process. For example, multiple pieces of software are typically needed to map NGS reads, extract variant sites, and construct a DNA sequence matrix containing only single nucleotide polymorphisms (i.e., a SNP matrix) for a set of individuals. The management and chaining of these software pieces and their outputs can often be a cumbersome and difficult task. Here, we present CFSAN SNP Pipeline, which combines into a single package the mapping of NGS reads to a reference genome with Bowtie2, processing of the resulting mapping (BAM) files using SAMtools, identification of variant sites using VarScan, and production of a SNP matrix using custom Python scripts. We also introduce a Python package (CFSAN SNP Mutator) that, when given a reference genome, will generate variants of known position against which we validate our pipeline. We created 1,000 simulated Salmonella enterica subsp. enterica serovar Agona genomes at 100× and 20× coverage, each containing 500 SNPs, 20 single-base insertions and 20 single-base deletions. For the 100× dataset, the CFSAN SNP Pipeline recovered 98.9% of the introduced SNPs and had a false positive rate of 1.04 × 10⁻⁶; for the 20× dataset, 98.8% of SNPs were recovered and the false positive rate was 8.34 × 10⁻⁷. Based on these results, CFSAN SNP Pipeline is a robust and accurate tool that is among the first to combine into a single executable the myriad steps required to produce a SNP matrix from NGS data. Such a tool is useful to those working in an applied setting (e.g., food safety traceback investigations) as well as for those interested in evolutionary questions.
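The pipeline's end product, a SNP matrix, can be illustrated with a toy sketch (hypothetical sample names and variant calls; the real pipeline derives the calls from Bowtie2, SAMtools, and VarScan output):

```python
def snp_matrix(calls, reference):
    """Build a SNP matrix: one row per sample, one column per position
    that is variant in any sample; samples without a call at a column
    fall back to the reference base."""
    positions = sorted({pos for sample in calls.values() for pos in sample})
    matrix = {}
    for name, variants in calls.items():
        matrix[name] = "".join(variants.get(p, reference[p]) for p in positions)
    return positions, matrix

reference = "ACGTACGTAC"
calls = {  # hypothetical per-sample variant calls: position -> base
    "isolate1": {2: "T", 7: "A"},
    "isolate2": {2: "T"},
    "isolate3": {7: "A", 9: "G"},
}
positions, matrix = snp_matrix(calls, reference)
# positions are [2, 7, 9]; isolate2 keeps reference bases at 7 and 9
```

The resulting matrix of concatenated SNP columns is what downstream phylogenetic tools consume, e.g., in traceback investigations.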

Journal ArticleDOI
22 Sep 2015-PeerJ
TL;DR: It is confirmed that an occupied space is microbially distinct from an unoccupied one, and it is demonstrated for the first time that individuals release their own personalized microbial cloud.
Abstract: Dispersal of microbes between humans and the built environment can occur through direct contact with surfaces or through airborne release; the latter mechanism remains poorly understood. Humans emit upwards of 10⁶ biological particles per hour, and have long been known to transmit pathogens to other individuals and to indoor surfaces. However, it has not previously been demonstrated that humans emit a detectable microbial cloud into surrounding indoor air, nor whether such clouds are sufficiently differentiated to allow the identification of individual occupants. We used high-throughput sequencing of 16S rRNA genes to characterize the airborne bacterial contribution of a single person sitting in a sanitized custom experimental climate chamber. We compared that to air sampled in an adjacent, identical, unoccupied chamber, as well as to supply and exhaust air sources. Additionally, we assessed microbial communities in settled particles surrounding each occupant, to investigate the potential long-term fate of airborne microbial emissions. Most occupants could be clearly detected by their airborne bacterial emissions, as well as their contribution to settled particles, within 1.5–4 h. Bacterial clouds from the occupants were statistically distinct, allowing the identification of some individual occupants. Our results confirm that an occupied space is microbially distinct from an unoccupied one, and demonstrate for the first time that individuals release their own personalized microbial cloud.

Journal ArticleDOI
30 Sep 2015-PeerJ
TL;DR: The findings show that, at the level of contents, negative messages spread faster than positive ones, but positive ones reach larger audiences, suggesting that people are more inclined to share and favorite positive contents, the so-called positive bias.
Abstract: Social media have become the main vehicle of information production and consumption online. Millions of users every day log on their Facebook or Twitter accounts to get updates and news, read about their topics of interest, and become exposed to new opportunities and interactions. Although recent studies suggest that the contents users produce will affect the emotions of their readers, we still lack a rigorous understanding of the role and effects of contents sentiment on the dynamics of information diffusion. This work aims at quantifying the effect of sentiment on information diffusion, to understand: (i) whether positive conversations spread faster and/or broader than negative ones (or vice-versa); (ii) what kind of emotions are more typical of popular conversations on social media; and, (iii) what type of sentiment is expressed in conversations characterized by different temporal dynamics. Our findings show that, at the level of contents, negative messages spread faster than positive ones, but positive ones reach larger audiences, suggesting that people are more inclined to share and favorite positive contents, the so-called positive bias. As for the entire conversations, we highlight how different temporal dynamics exhibit different sentiment patterns: for example, positive sentiment builds up for highly-anticipated events, while unexpected events are mainly characterized by negative sentiment. Our contribution is a milestone to understand how the emotions expressed in short texts affect their spreading in online social ecosystems, and may help to craft effective policies and strategies for content generation and diffusion.

Journal ArticleDOI
26 Mar 2015-PeerJ
TL;DR: In this paper, the authors identify several common triggers used to achieve ASMR, including whispering, personal attention, crisp sounds and slow movements; a high prevalence of synaesthesia (5.9%) within the sample suggests a possible link between ASMR and synaesthesia, similar to that of misophonia.
Abstract: Autonomous Sensory Meridian Response (ASMR) is a previously unstudied sensory phenomenon, in which individuals experience a tingling, static-like sensation across the scalp, back of the neck and at times further areas in response to specific triggering audio and visual stimuli. This sensation is widely reported to be accompanied by feelings of relaxation and well-being. The current study identifies several common triggers used to achieve ASMR, including whispering, personal attention, crisp sounds and slow movements. Data obtained also illustrates temporary improvements in symptoms of depression and chronic pain in those who engage in ASMR. A high prevalence of synaesthesia (5.9%) within the sample suggests a possible link between ASMR and synaesthesia, similar to that of misophonia. Links between number of effective triggers and heightened flow state suggest that flow may be necessary to achieve sensations associated with ASMR.

Journal ArticleDOI
02 Dec 2015-PeerJ
TL;DR: There is a strong correlation between collective and individual diversity, supporting the notion that when people use social media they find themselves inside “social bubbles”; these results could lead to a deeper understanding of how technology biases our exposure to new information.
Abstract: Social media have become a prevalent channel to access information, spread ideas, and influence opinions. However, it has been suggested that social and algorithmic filtering may cause exposure to less diverse points of view. Here we quantitatively measure this kind of social bias at the collective level by mining a massive dataset of web clicks. Our analysis shows that collectively, people access information from a significantly narrower spectrum of sources through social media and email, compared to a search baseline. The significance of this finding for individual exposure is revealed by investigating the relationship between the diversity of information sources experienced by users at both the collective and individual levels in two datasets where individual users can be analyzed—Twitter posts and search logs. There is a strong correlation between collective and individual diversity, supporting the notion that when we use social media we find ourselves inside “social bubbles.” Our results could lead to a deeper understanding of how technology biases our exposure to new information.
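The notion of source diversity can be made concrete with Shannon entropy over a user's click distribution across domains (an illustrative metric with made-up domains; the paper's exact diversity measure may differ):

```python
import math
from collections import Counter

def source_diversity(clicks):
    """Shannon entropy (bits) of a click distribution over information
    sources: higher when clicks are spread evenly over many domains,
    lower when they concentrate on a few."""
    counts = Counter(clicks)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical traffic: social-media referrals concentrated on two
# domains vs. search referrals spread over five.
social = ["nytimes.com"] * 8 + ["cnn.com"] * 2
search = ["nytimes.com", "cnn.com", "bbc.co.uk", "reuters.com", "npr.org"] * 2
# The narrower social spectrum yields lower entropy than search.
```

Comparing such a diversity score between referral channels, at both the individual and collective level, is the kind of measurement the study performs at scale.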

Journal ArticleDOI
07 Apr 2015-PeerJ
TL;DR: The use of a specimen-, rather than species-based approach increases knowledge of intraspecific and intrageneric variation in diplodocids, and the study demonstrates how specimen-based phylogenetic analysis is a valuable tool in sauropod taxonomy, and potentially in paleontology and taxonomy as a whole.
Abstract: Diplodocidae are among the best known sauropod dinosaurs. Several species were described in the late 1800s or early 1900s from the Morrison Formation of North America. Since then, numerous additional specimens were recovered in the USA, Tanzania, Portugal, and Argentina, as well as possibly Spain, England, Georgia, Zimbabwe, and Asia. To date, the clade includes about 12 to 15 nominal species, some of them with questionable taxonomic status (e.g., ‘Diplodocus’ hayi or Dyslocosaurus polyonychius), and ranging in age from Late Jurassic to Early Cretaceous. However, intrageneric relationships of the iconic, multi-species genera Apatosaurus and Diplodocus are still poorly known. The way to resolve this issue is a specimen-based phylogenetic analysis, which has been previously implemented for Apatosaurus, but is here performed for the first time for the entire clade of Diplodocidae. The analysis includes 81 operational taxonomic units, 49 of which belong to Diplodocidae. The set of OTUs includes all name-bearing type specimens previously proposed to belong to Diplodocidae, alongside a set of relatively complete referred specimens, which increase the amount of anatomically overlapping material. Non-diplodocid outgroups were selected to test the affinities of potential diplodocid specimens that have subsequently been suggested to belong outside the clade. The specimens were scored for 477 morphological characters, representing one of the most extensive phylogenetic analyses of sauropod dinosaurs. Character states were figured and tables given in the case of numerical characters. The resulting cladogram recovers the classical arrangement of diplodocid relationships. Two numerical approaches were used to increase reproducibility in our taxonomic delimitation of species and genera. This resulted in the proposal that some species previously included in well-known genera like Apatosaurus and Diplodocus are generically distinct. 
Of particular note is that the famous genus Brontosaurus is considered valid by our quantitative approach. Furthermore, “Diplodocus” hayi represents a unique genus, which will herein be called Galeamopus gen. nov. On the other hand, these numerical approaches imply synonymization of “Dinheirosaurus” from the Late Jurassic of Portugal with the Morrison Formation genus Supersaurus. Our use of a specimen-, rather than species-based approach increases knowledge of intraspecific and intrageneric variation in diplodocids, and the study demonstrates how specimen-based phylogenetic analysis is a valuable tool in sauropod taxonomy, and potentially in paleontology and taxonomy as a whole.

Journal ArticleDOI
24 Mar 2015-PeerJ
TL;DR: Innovative strategies recently developed by scientists are described in this review to accelerate the identification of causal genes and deepen the understanding of obesity etiology.
Abstract: Obesity is a major public health concern. This condition results from a constant and complex interplay between predisposing genes and environmental stimuli. Current attempts to manage obesity have been moderately effective and a better understanding of the etiology of obesity is required for the development of more successful and personalized prevention and treatment options. To that effect, mouse models have been an essential tool in expanding our understanding of obesity, due to the availability of their complete genome sequence, genetically identified and defined strains, various tools for genetic manipulation and the accessibility of target tissues for obesity that are not easily attainable from humans. Our knowledge of monogenic obesity in humans greatly benefited from the mouse obesity genetics field. Genes underlying highly penetrant forms of monogenic obesity are part of the leptin-melanocortin pathway in the hypothalamus. Recently, hypothesis-generating genome-wide association studies for polygenic obesity traits in humans have led to the identification of 119 common gene variants with modest effect, most of them having an unknown function. These discoveries have led to novel animal models and have illuminated new biologic pathways. Integrated mouse-human genetic approaches have firmly established new obesity candidate genes. Innovative strategies recently developed by scientists are described in this review to accelerate the identification of causal genes and deepen our understanding of obesity etiology. An exhaustive dissection of the molecular roots of obesity may ultimately help to tackle the growing obesity epidemic worldwide.

Posted ContentDOI
29 Sep 2015-PeerJ
TL;DR: The R package agricolae is a well established statistical toolbox based on R with a broad range of applications in design and analyses of experiments also in the wider biological community.
Abstract: Plant breeders and educators working with the International Potato Center (CIP) needed freely available statistical tools. In response, we first created a set of scripts for specific tasks using the open-source statistical software R. Building on these scripts, we eventually compiled the R package agricolae, which filled a niche. Here we describe its main functions for the first time in the form of an article. We also review its reception using download statistics, citation data, and feedback from a user survey, and highlight usage in our extended network of collaborators. The package has found applications beyond agriculture in fields such as aquaculture, ecology, biodiversity, conservation biology and cancer research. In summary, agricolae is a well-established, R-based statistical toolbox with a broad range of applications in the design and analysis of experiments, both in agriculture and in the wider biological community.
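Among agricolae's best-known functions are generators for experimental layouts such as the randomized complete block design (`design.rcbd` in the package). As a language-neutral illustration of the underlying idea (a sketch, not part of agricolae), here is a minimal Python version that produces the same kind of field book: every block contains each treatment exactly once, in an independently randomized plot order.

```python
import random

def rcbd_layout(treatments, n_blocks, seed=42):
    """Randomized complete block design: each block contains every
    treatment exactly once, in an independently shuffled plot order."""
    rng = random.Random(seed)
    layout = []
    for block in range(1, n_blocks + 1):
        order = list(treatments)
        rng.shuffle(order)
        layout.extend((block, plot, trt) for plot, trt in enumerate(order, 1))
    return layout  # rows of (block, plot, treatment)

field_book = rcbd_layout(["A", "B", "C", "D"], n_blocks=3)
for row in field_book:
    print(row)
```

Randomizing each block independently, with a fixed seed for reproducibility, mirrors how such packages let users regenerate the exact same layout for a given trial.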

Journal ArticleDOI
05 Nov 2015-PeerJ
TL;DR: The hypothesis that bacteria abundance profiles in saliva are useful biomarkers for pancreatic cancer though much larger patient studies are needed to verify their predictive utility is supported.
Abstract: Clinical manifestations of pancreatic cancer often do not occur until the cancer has undergone metastasis, resulting in a very low survival rate. In this study, we investigated whether salivary bacterial profiles might provide useful biomarkers for early detection of pancreatic cancer. Using high-throughput sequencing of bacterial small subunit ribosomal RNA (16S rRNA) gene, we characterized the salivary microbiota of patients with pancreatic cancer and compared them to healthy patients and patients with other diseases, including pancreatic disease, non-pancreatic digestive disease/cancer and non-digestive disease/cancer. A total of 146 patients were enrolled at the UCSD Moores Cancer Center where saliva and demographic data were collected from each patient. Of these, we analyzed the salivary microbiome of 108 patients: 8 had been diagnosed with pancreatic cancer, 78 with other diseases and 22 were classified as non-diseased (healthy) controls. Bacterial 16S rRNA sequences were amplified directly from salivary DNA extractions and subjected to high-throughput sequencing (HTS). Several bacterial genera differed in abundance in patients with pancreatic cancer. We found a significantly higher ratio of Leptotrichia to Porphyromonas in the saliva of patients with pancreatic cancer than in the saliva of healthy patients or those with other disease (Kruskal-Wallis Test; P < 0.001). Leptotrichia abundances were confirmed using real-time qPCR with Leptotrichia specific primers. Similar to previous studies, we found lower relative abundances of Neisseria and Aggregatibacter in the saliva of pancreatic cancer patients, though these results were not significant at the P < 0.05 level (K-W Test; P = 0.07 and P = 0.09 respectively). However, the relative abundances of other previously identified bacterial biomarkers, e.g., Streptococcus mitis and Granulicatella adiacens, were not significantly different in the saliva of pancreatic cancer patients. 
Overall, this study supports the hypothesis that bacteria abundance profiles in saliva are useful biomarkers for pancreatic cancer though much larger patient studies are needed to verify their predictive utility.
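The group comparison above relies on the Kruskal-Wallis test, a rank-based analogue of one-way ANOVA. The sketch below computes the H statistic from scratch on hypothetical Leptotrichia:Porphyromonas ratios (illustrative values only, not the study's data). With three groups the statistic has 2 degrees of freedom, for which the chi-square tail probability is exactly exp(-H/2).

```python
import math

def kruskal_wallis(*groups):
    """Kruskal-Wallis H statistic (average ranks for ties; the tie
    correction factor is omitted for simplicity)."""
    pooled = sorted(x for g in groups for x in g)
    rank = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank[pooled[i]] = (i + j + 1) / 2  # mean of 1-based ranks i+1 .. j
        i = j
    n = len(pooled)
    return 12.0 / (n * (n + 1)) * sum(
        len(g) * (sum(rank[x] for x in g) / len(g)) ** 2 for g in groups
    ) - 3 * (n + 1)

# Hypothetical Leptotrichia:Porphyromonas ratios (illustrative values only)
cancer  = [4.2, 5.1, 3.8, 6.0]
healthy = [1.1, 0.9, 1.4, 1.2]
other   = [1.6, 2.0, 1.3, 1.8]

H = kruskal_wallis(cancer, healthy, other)
p = math.exp(-H / 2)  # chi-square tail probability, exact for df = 2
print(f"H = {H:.2f}, p = {p:.4f}")
```

In practice one would use a library implementation (e.g., `scipy.stats.kruskal`), which also handles tie corrections and arbitrary group counts.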

Journal ArticleDOI
22 Oct 2015-PeerJ
TL;DR: A food composition database for seven discretionary food categories of packaged GF products is developed and indicates that for GF foods no predominant health benefits are indicated; in fact, some critical nutrients must be considered when being on a GF diet.
Abstract: Notwithstanding a growth in popularity and consumption of gluten-free (GF) food products, there is a lack of substantiated analysis of the nutritional quality compared with their gluten-containing counterparts. To put GF foods into proper perspective both for those who need it (patients with celiac disease) and for those who do not, we provide contemporary data about cost and nutritional quality of GF food products. The objective of this study is to develop a food composition database for seven discretionary food categories of packaged GF products. Nutrient composition, nutritional information and cost of foods from 63 GF and 126 gluten-containing counterparts were systematically obtained from 12 different Austrian supermarkets. The nutrition composition (macro and micronutrients) was analyzed by using two nutrient composition databases in a stepwise approximation process. A total of 63 packaged GF foods were included in the analysis representing a broad spectrum of different GF categories (flour/bake mix, bread and bakery products, pasta and cereal-based food, cereals, cookies and cakes, snacks and convenience food). Our results show that the protein content of GF products is >2-fold lower across 57% of all food categories. In 65% of all GF foods, low sodium content was observed (defined as 6 g/100 g). On average, GF foods were substantially higher in cost, ranging from +205% (cereals) to +267% (bread and bakery products) compared to similar gluten-containing products. In conclusion, our results indicate no predominant health benefits for GF foods; in fact, some critical nutrients must be considered when being on a GF diet. For individuals with celiac disease, the GF database provides a helpful tool to identify the food composition of their medical diet.
For healthy consumers, replacing gluten-containing products with GF foods entails substantial additional cost but provides no additional health benefits from a nutritional perspective.

Journal ArticleDOI
12 Mar 2015-PeerJ
TL;DR: A very high prevalence of academic stress and poor sleep quality among medical students is found and many medical students reported using sedatives more than once a week.
Abstract: Introduction. Medicine is one of the most stressful fields of education because of its highly demanding professional and academic requirements. Psychological stress, anxiety, depression and sleep disturbances are highly prevalent in medical students. Methods. This cross-sectional study was undertaken at the Combined Military Hospital Lahore Medical College and the Institute of Dentistry in Lahore (CMH LMC), Pakistan. Students enrolled in all yearly courses for the Bachelor of Medicine and Bachelor of Surgery (MBBS) degree were included. The questionnaire consisted of four sections: (1) demographics (2) a table listing 34 potential stressors, (3) the 14-item Perceived Stress Scale (PSS-14), and (4) the Pittsburgh Quality of Sleep Index (PSQI). Logistic regression was run to identify associations between groups of stressors, gender, year of study, student’s background, stress and quality of sleep. Results. Total response rate was 93.9% (263/280 respondents returned the questionnaire). The mean (SD) PSS-14 score was 30 (6.97). Logistic regression analysis showed that cases of high-level stress were associated with year of study and academic-related stressors only. Univariate analysis identified 157 cases with high stress levels (59.7%). The mean (SD) PSQI score was 8.1 (3.12). According to PSQI score, 203/263 respondents (77%) were poor sleepers. Logistic regression showed that mean PSS-14 score was a significant predictor of PSQI score (OR 1.99, P < 0.05). Conclusion. We found a very high prevalence of academic stress and poor sleep quality among medical students. Many medical students reported using sedatives more than once a week. Academic stressors contributed significantly to stress and sleep disorders in medical students.
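The analysis above uses logistic regression to relate perceived stress (PSS-14) to sleep quality, reporting an odds ratio for the predictor. The self-contained sketch below, on simulated data (the study's data are not reproduced here), shows how such an odds ratio falls out of the fitted slope: OR = exp(b1).

```python
import math
import random

def fit_logistic(x, y, lr=0.02, epochs=4000):
    """One-predictor logistic regression fit by batch gradient descent.
    Returns (intercept, slope); exp(slope) is the odds ratio per unit x."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            g0 += p - yi
            g1 += (p - yi) * xi
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1

# Simulated data: PSS-14-style stress scores and a poor-sleep indicator,
# generated so that higher stress raises the odds of poor sleep
rng = random.Random(1)
stress = [rng.uniform(10, 50) for _ in range(200)]
poor_sleep = [
    1 if rng.random() < 1.0 / (1.0 + math.exp(-0.15 * (s - 30))) else 0
    for s in stress
]

b0, b1 = fit_logistic([s - 30 for s in stress], poor_sleep)  # centered predictor
print(f"odds ratio per PSS-14 point: {math.exp(b1):.2f}")
```

A real analysis would use a statistics library (e.g., `statsmodels.api.Logit`) to obtain standard errors, confidence intervals and p-values alongside the odds ratio.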

Journal ArticleDOI
21 Jul 2015-PeerJ
TL;DR: The results highlight a range of issues that must be considered when pairing genomic data collection with non-invasive sampling, particularly related to field sampling protocols for minimizing exogenous DNA, data collection strategies and quality control steps for enhancing target organism yield, and analytical approaches for maximizing cost-effectiveness and information content of recovered genomic data.
Abstract: Conservation genomics has become an increasingly popular term, yet it remains unclear whether the non-invasive sampling that is essential for many conservation-related studies is compatible with the minimum requirements for harnessing next-generation sequencing technologies. Here, we evaluated the feasibility of using genotyping-by-sequencing of non-invasively collected hair samples to simultaneously identify and genotype single nucleotide polymorphisms (SNPs) in a climate-sensitive mammal, the American pika (Ochotona princeps). We identified and genotyped 3,803 high-confidence SNPs across eight sites distributed along two elevational transects using starting DNA amounts as low as 1 ng. Fifty-five outlier loci were detected as candidate gene regions under divergent selection, constituting potential targets for future validation. Genome-wide estimates of gene diversity significantly and positively correlated with elevation across both transects, with all low elevation sites exhibiting significant heterozygote deficit likely due to inbreeding. More broadly, our results highlight a range of issues that must be considered when pairing genomic data collection with non-invasive sampling, particularly related to field sampling protocols for minimizing exogenous DNA, data collection strategies and quality control steps for enhancing target organism yield, and analytical approaches for maximizing cost-effectiveness and information content of recovered genomic data.
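A heterozygote deficit of the kind reported for the low-elevation sites is commonly summarized per locus by the inbreeding coefficient F_IS = 1 - Ho/He, where Ho is the observed heterozygosity and He the Hardy-Weinberg expectation. A minimal sketch with hypothetical genotype counts (not the pika data):

```python
def fis(n_AA, n_Aa, n_aa):
    """Inbreeding coefficient F_IS = 1 - Ho/He for one biallelic locus.
    Positive values indicate a heterozygote deficit."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # frequency of allele A
    ho = n_Aa / n                    # observed heterozygosity
    he = 2 * p * (1 - p)             # Hardy-Weinberg expectation
    return 1 - ho / he

# Hypothetical genotype counts at one SNP from a low-elevation site
print(f"F_IS = {fis(n_AA=40, n_Aa=20, n_aa=40):.2f}")
```

With counts in Hardy-Weinberg proportions F_IS is zero; the deficit of heterozygotes in the example above yields a strongly positive value, the pattern attributed to inbreeding in the abstract.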

Journal ArticleDOI
24 Sep 2015-PeerJ
TL;DR: It is concluded that Docker containers have only a minor impact on the performance of common genomic pipelines, which is negligible when the executed jobs are long in terms of computational time.
Abstract: Genomic pipelines consist of several pieces of third party software and, because of their experimental nature, frequent changes and updates are commonly necessary thus raising serious deployment and reproducibility issues. Docker containers are emerging as a possible solution for many of these problems, as they allow the packaging of pipelines in an isolated and self-contained manner. This makes it easy to distribute and execute pipelines in a portable manner across a wide range of computing platforms. Thus, the question that arises is to what extent the use of Docker containers might affect the performance of these pipelines. Here we address this question and conclude that Docker containers have only a minor impact on the performance of common genomic pipelines, which is negligible when the executed jobs are long in terms of computational time.

Journal ArticleDOI
21 Apr 2015-PeerJ
TL;DR: An adapted EPOC EEG system can be used to index children’s late auditory ERP peaks and their MMN ERP component and indicates that the ERP morphology recorded with the two systems was very similar.
Abstract: Background. Previous work has demonstrated that a commercial gaming electroencephalography (EEG) system, Emotiv EPOC, can be adjusted to provide valid auditory event-related potentials (ERPs) in adults that are comparable to ERPs recorded by a research-grade EEG system, Neuroscan. The aim of the current study was to determine if the same was true for children. Method. An adapted Emotiv EPOC system and a Neuroscan system were used to make simultaneous EEG recordings in nineteen 6- to 12-year-old children under “passive” and “active” listening conditions. In the passive condition, children were instructed to watch a silent DVD and ignore 566 standard (1,000 Hz) and 100 deviant (1,200 Hz) tones. In the active condition, they listened to the same stimuli, and were asked to count the number of ‘high’ (i.e., deviant) tones. Results. Intraclass correlations (ICCs) indicated that the ERP morphology recorded with the two systems was very similar for the P1, N1, P2, N2, and P3 ERP peaks (r = .82 to .95) in both passive and active conditions, and less so, though still strong, for the mismatch negativity ERP component (MMN; r = .67 to .74). There were few differences between peak amplitude and latency estimates for the two systems. Conclusions. An adapted EPOC EEG system can be used to index children’s late auditory ERP peaks (i.e., P1, N1, P2, N2, P3) and their MMN ERP component.
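The intraclass correlations above quantify agreement between per-subject measurements from the two systems. Assuming a two-way, consistency, single-measure form (ICC(3,1); the abstract does not specify which variant was used), the coefficient can be computed from two-way ANOVA mean squares, as in this stdlib sketch with hypothetical peak amplitudes:

```python
def icc_3_1(ratings):
    """ICC(3,1): two-way mixed effects, consistency, single measurement.
    `ratings` holds one tuple per subject, one column per system/rater."""
    n = len(ratings)        # subjects
    k = len(ratings[0])     # systems
    grand = sum(sum(r) for r in ratings) / (n * k)
    row_means = [sum(r) / k for r in ratings]
    col_means = [sum(r[j] for r in ratings) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for r in ratings for x in r)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ms_rows = ss_rows / (n - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)

# Hypothetical P1 peak amplitudes (µV) measured by the two systems
epoc      = [4.1, 6.3, 5.0, 7.2, 3.9, 6.8, 5.5, 4.7]
neuroscan = [4.4, 6.0, 5.2, 7.5, 3.6, 7.0, 5.8, 4.5]
pairs = list(zip(epoc, neuroscan))
print(f"ICC(3,1) = {icc_3_1(pairs):.2f}")
```

Because ICC(3,1) measures consistency, a fixed offset between the two systems (e.g., a small amplitude bias) does not lower the coefficient; an absolute-agreement variant such as ICC(2,1) would penalize it.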

Journal ArticleDOI
01 Jan 2015-PeerJ
TL;DR: The guidance provided here is intended to help achieve widespread, uniform human and machine accessibility of deposited data, in support of significantly improved verification, validation, reproducibility and re-use of scholarly/scientific data.
Abstract: Reproducibility and reusability of research results is an important concern in scientific communication and science policy. A foundational element of reproducibility and reusability is the open and persistently available presentation of research data. However, many common approaches for primary data publication in use today do not achieve sufficient long-term robustness, openness, accessibility or uniformity. Nor do they permit comprehensive exploitation by modern Web technologies. This has led to several authoritative studies recommending uniform direct citation of data archived in persistent repositories. Data are to be considered as first-class scholarly objects, and treated similarly in many ways to cited and archived scientific and scholarly literature. Here we briefly review the most current and widely agreed set of principle-based recommendations for scholarly data citation, the Joint Declaration of Data Citation Principles (JDDCP). We then present a framework for operationalizing the JDDCP; and a set of initial recommendations on identifier schemes, identifier resolution behavior, required metadata elements, and best practices for realizing programmatic machine actionability of cited data. The main target audience for the common implementation guidelines in this article consists of publishers, scholarly organizations, and persistent data repositories, including technical staff members in these organizations. But ordinary researchers can also benefit from these recommendations. The guidance provided here is intended to help achieve widespread, uniform human and machine accessibility of deposited data, in support of significantly improved verification, validation, reproducibility and re-use of scholarly/scientific data.

Journal ArticleDOI
12 May 2015-PeerJ
TL;DR: The reefs of Kāneʻohe Bay have developed and persist under rather severe natural and anthropogenic perturbations, and to date, these reefs have proved to be very resilient once the stressor has been removed.
Abstract: Kāne'ohe Bay, which is located on the NE coast of O'ahu, Hawai'i, represents one of the most intensively studied estuarine coral reef ecosystems in the world. Despite a long history of anthropogenic disturbance, from early settlement to post-European contact, the coral reef ecosystem of Kāne'ohe Bay appears to be in better condition in comparison to other reefs around the world. The island of Moku o Lo'e (Coconut Island) in the southern region of the bay became home to the Hawai'i Institute of Marine Biology in 1947, where researchers have since documented the various aspects of the unique physical, chemical, and biological features of this coral reef ecosystem. The first human contact by voyaging Polynesians occurred at least 700 years ago. By AD 1250, Polynesian voyagers had settled the inhabitable islands in the region, which led to the development of an intensive agricultural, fish pond and ocean resource system that supported a large human population. Anthropogenic disturbance initially involved clearing of land for agriculture, intentional or accidental introduction of alien species, modification of streams to supply water for taro culture, and construction of massive shoreline fish pond enclosures and extensive terraces in the valleys that were used for taro culture. The arrival of the first Europeans in 1778 led to further introductions of plants and animals that radically changed the landscape. Subsequent development of a plantation agricultural system led to increased human immigration, population growth and an end to traditional land and water management practices. The reefs were devastated by extensive dredge and fill operations as well as rapid growth of the human population, which led to extensive urbanization of the watershed. By the 1960s the bay was severely impacted by increased sewage discharge along with increased sedimentation due to improper grading practices and stream channelization, resulting in extensive loss of coral cover. The reefs of Kāne'ohe Bay developed under estuarine conditions and thus have been subjected to multiple natural stresses. These include storm floods, a more extreme temperature range than more oceanic reefs, high rates of sedimentation, and exposure at extreme low tides. Deposition and degradation of organic materials carried into the bay from the watershed results in low-pH conditions, such that according to some ocean acidification projections the rich coral reefs in the bay should not exist. Increased global temperature due to anthropogenic fossil fuel emissions is now impacting these reefs, with the first "bleaching event" in 1996 and a second, more severe event in 2014. The reefs of Kāne'ohe Bay have developed and persist under rather severe natural and anthropogenic perturbations. To date, these reefs have proved to be very resilient once the stressor has been removed. A major question remains to be answered concerning the limits of Kāne'ohe Bay reef resilience in the face of global climate change.

Journal ArticleDOI
24 Feb 2015-PeerJ
TL;DR: The data suggest that oxygen availability is at least one major factor determining specific partnerships in methane oxidation, and suggest that speciation within Methylococcaceae and Methylophilaceae may be driven by niche adaptation tailored toward specific placements within the oxygen gradient.
Abstract: We have previously observed that methane supplied to lake sediment microbial communities as a substrate not only causes a response by bona fide methanotrophic bacteria, but also by non-methane-oxidizing bacteria, especially by members of the family Methylophilaceae. This result suggested that methane oxidation in this environment likely involves communities composed of different functional guilds, rather than a single type of microbe. To obtain further support for this concept and to obtain further insights into the factors that may define such partnerships, we carried out microcosm incubations with sediment samples from Lake Washington at five different oxygen tensions, while methane was supplied at the same concentration in each. Community composition was determined through 16S rRNA gene amplicon sequencing after 10 and 16 weeks of incubation. We demonstrate that, in support of our prior observations, the methane-consuming communities were represented by two major types: the methanotrophs of the family Methylococcaceae and by non-methanotrophic methylotrophs of the family Methylophilaceae. However, different species persisted under different oxygen tensions. At high initial oxygen tensions (150 to 225 µM) the major players were, respectively, species of the genera Methylosarcina and Methylophilus, while at low initial oxygen tensions (15 to 75 µM) the major players were Methylobacter and Methylotenera. These data suggest that oxygen availability is at least one major factor determining specific partnerships in methane oxidation. The data also suggest that speciation within Methylococcaceae and Methylophilaceae may be driven by niche adaptation tailored toward specific placements within the oxygen gradient.

Journal ArticleDOI
13 Jan 2015-PeerJ
TL;DR: In this paper, the authors review and analyze body size for 25 ocean giants ranging across the animal kingdom and find considerable variability in intraspecific size distributions from strongly left-to strongly right-skewed.
Abstract: What are the greatest sizes that the largest marine megafauna obtain? This is a simple question with a difficult and complex answer. Many of the largest-sized species occur in the world's oceans. For many of these, rarity, remoteness, and quite simply the logistics of measuring these giants has made obtaining accurate size measurements difficult. Inaccurate reports of maximum sizes run rampant through the scientific literature and popular media. Moreover, how intraspecific variation in the body sizes of these animals relates to sex, population structure, the environment, and interactions with humans remains underappreciated. Here, we review and analyze body size for 25 ocean giants ranging across the animal kingdom. For each taxon we document body size for the largest known marine species of several clades. We also analyze intraspecific variation and identify the largest known individuals for each species. Where data allows, we analyze spatial and temporal intraspecific size variation. We also provide allometric scaling equations between different size measurements as resources to other researchers. In some cases, the lack of data prevents us from fully examining these topics and instead we specifically highlight these deficiencies and the barriers that exist for data collection. Overall, we found considerable variability in intraspecific size distributions from strongly left- to strongly right-skewed. We provide several allometric equations that allow for estimation of total lengths and weights from more easily obtained measurements. In several cases, we also quantify considerable geographic variation and decreases in size likely attributed to humans.
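Allometric scaling equations of the kind provided in the study typically take the power-law form W = a·L^b, which becomes linear after log transformation and can be fit by ordinary least squares. A minimal sketch on synthetic length-weight data (illustrative only, not the study's measurements):

```python
import math

def fit_power_law(x, y):
    """Least-squares fit of y = a * x**b via linear regression in log space."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(x)
    mx, my = sum(lx) / n, sum(ly) / n
    b = (sum((u - mx) * (v - my) for u, v in zip(lx, ly))
         / sum((u - mx) ** 2 for u in lx))
    a = math.exp(my - b * mx)
    return a, b

# Synthetic length (m) / weight (kg) pairs following W = 10 * L**3 exactly
lengths = [2.0, 4.0, 6.0, 8.0, 10.0]
weights = [10.0 * L ** 3 for L in lengths]
a, b = fit_power_law(lengths, weights)
print(f"W ≈ {a:.2f} * L^{b:.2f}")
```

Once a and b are estimated from well-measured individuals, total weight can be predicted from the more easily obtained length measurement, which is the practical use of such equations for rarely measured giants.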

Journal ArticleDOI
17 Sep 2015-PeerJ
TL;DR: The displacement field is effective in the detection of AD and related brain regions, and performs better than or comparably with not only the other two proposed methods but also ten state-of-the-art methods.
Abstract: Aim. Alzheimer's disease (AD) is a chronic neurodegenerative disease. Recently, computer scientists have developed various methods for early detection based on computer vision and machine learning techniques. Method. In this study, we proposed a novel AD detection method based on displacement field (DF) estimation between a normal brain and an AD brain. The DF was treated as the AD-related features, reduced by principal component analysis (PCA), and finally fed into three classifiers: support vector machine (SVM), generalized eigenvalue proximal SVM (GEPSVM), and twin SVM (TSVM). The 10-fold cross-validation was repeated 50 times. Results. The results showed that "DF + PCA + TSVM" achieved an accuracy of 92.75 ± 1.77, sensitivity of 90.56 ± 1.15, specificity of 93.37 ± 2.05, and precision of 79.61 ± 2.21. This result is better than or comparable with not only the other two proposed methods, but also ten state-of-the-art methods. Besides, our method found AD to be related to the following brain regions, consistent with recent publications: Angular Gyrus, Anterior Cingulate, Cingulate Gyrus, Culmen, Cuneus, Fusiform Gyrus, Inferior Frontal Gyrus, Inferior Occipital Gyrus, Inferior Parietal Lobule, Inferior Semi-Lunar Lobule, Inferior Temporal Gyrus, Insula, Lateral Ventricle, Lingual Gyrus, Medial Frontal Gyrus, Middle Frontal Gyrus, Middle Occipital Gyrus, Middle Temporal Gyrus, Paracentral Lobule, Parahippocampal Gyrus, Postcentral Gyrus, Posterior Cingulate, Precentral Gyrus, Precuneus, Sub-Gyral, Superior Parietal Lobule, Superior Temporal Gyrus, Supramarginal Gyrus, and Uncus. Conclusion. The displacement field is effective in the detection of AD and of related brain regions.
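In the pipeline above, PCA reduces the high-dimensional displacement-field features before classification. The sketch below shows the core of that reduction step only, extracting the first principal component by power iteration on toy two-dimensional features (the study used full PCA followed by SVM-family classifiers, which in practice call for a numerical library such as scikit-learn).

```python
import math
import random

def first_pc(data, iters=200, seed=0):
    """First principal component of the rows of `data`, found by power
    iteration. Computes C v as X^T (X v) / n, so the d x d covariance
    matrix is never formed explicitly."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]  # random non-zero start
    for _ in range(iters):
        xv = [sum(r[j] * v[j] for j in range(d)) for r in centered]
        w = [sum(centered[i][j] * xv[i] for i in range(n)) / n
             for j in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v

# Toy stand-in for displacement-field features: nearly all variance on axis 0
rng = random.Random(2)
data = [[rng.gauss(0, 5.0), rng.gauss(0, 0.2)] for _ in range(100)]
pc = first_pc(data)
print(f"first PC ≈ ({pc[0]:.3f}, {pc[1]:.3f})")
```

Projecting each feature vector onto the leading components yields the compact representation that is then fed to the classifiers.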

Journal ArticleDOI
22 Dec 2015-PeerJ
TL;DR: The resulting strict consensus tree is the most well-resolved, stratigraphically consistent hypothesis of basal ornithischian relationships yet hypothesized and provides a comprehensive framework for testing further hypotheses regarding evolutionary patterns and processes within Ornithischia.
Abstract: The systematic relationships of taxa traditionally referred to as 'basal ornithopods' or 'hypsilophodontids' remain poorly resolved since it was discovered that these taxa are not a monophyletic group, but rather a paraphyletic set of neornithischian taxa. Thus, even as the known diversity of these taxa has dramatically increased over the past two decades, our knowledge of their placement relative to each other and the major ornithischian subclades remained incomplete. This study employs the largest phylogenetic dataset yet compiled to assess basal ornithischian relationships (255 characters for 65 species-level terminal taxa). The resulting strict consensus tree is the most well-resolved, stratigraphically consistent hypothesis of basal ornithischian relationships yet hypothesized. The only non-iguanodontian ornithopod (=basal ornithopod) recovered in this analysis is Hypsilophodon foxii. The majority of former 'hypsilophodontid' taxa are recovered within a single clade (Parksosauridae) that is situated as the sister-taxon to Cerapoda. The Parksosauridae is divided between two subclades, the Orodrominae and the Thescelosaurinae. This study does not recover a clade consisting of the Asian taxa Changchunsaurus, Haya, and Jeholosaurus (=Jeholosauridae). Rather, the former two taxa are recovered as basal members of Thescelosaurinae, while the latter taxon is recovered in a clade with Yueosaurus near the base of Neornithischia. The endemic South American clade Elasmaria is recovered within the Thescelosaurinae as the sister taxon to Thescelosaurus. This study supports the origination of Dinosauria and the early diversification of Ornithischia within Gondwana. Neornithischia first arose in Africa by the Early Jurassic before dispersing to Asia by the late Middle Jurassic, where much of the diversification among non-cerapodan neornithischians occurred.
Under the simplest scenario the Parksosauridae originated in North America, with at least two later dispersals to Asia and one to South America. However, when ghost lineages are considered, an alternate dispersal hypothesis has thescelosaurines dispersing from Asia into South America (via North America) during the Early Cretaceous, then back into North America in the latest Cretaceous. The latter hypothesis may explain the dominance of orodromine taxa prior to the Maastrichtian in North America and the sudden appearance and wide distribution of thescelosaurines in North America beginning in the early Maastrichtian. While the diversity of parksosaurids has greatly increased over the last fifteen years, a ghost lineage of over 40 myr is present between the base of Parksosauridae and Cerapoda, indicating that much of the early history and diversity of this clade is yet to be discovered. This new phylogenetic hypothesis provides a comprehensive framework for testing further hypotheses regarding evolutionary patterns and processes within Ornithischia.