Showing papers in "PLOS ONE in 2014"
TL;DR: Pilon is a fully automated, all-in-one tool for correcting draft assemblies and calling sequence variants of multiple sizes, including very large insertions and deletions, which is being used to improve the assemblies of thousands of new genomes and to identify variants from thousands of clinically relevant bacterial strains.
Abstract: Advances in modern sequencing technologies allow us to generate sufficient data to analyze hundreds of bacterial genomes from a single machine in a single day. This potential for sequencing massive numbers of genomes calls for fully automated methods to produce high-quality assemblies and variant calls. We introduce Pilon, a fully automated, all-in-one tool for correcting draft assemblies and calling sequence variants of multiple sizes, including very large insertions and deletions. Pilon works with many types of sequence data, but is particularly strong when supplied with paired-end data from two Illumina libraries with small (e.g., 180 bp) and large (e.g., 3–5 kb) inserts. Pilon significantly improves draft genome assemblies by correcting bases, fixing mis-assemblies and filling gaps. For both haploid and diploid genomes, Pilon produces more contiguous genomes with fewer errors, enabling identification of more biologically relevant genes. Furthermore, Pilon identifies small variants with high accuracy as compared to state-of-the-art tools and is unique in its ability to accurately identify large sequence variants, including duplications, and to resolve large insertions. Pilon is being used to improve the assemblies of thousands of new genomes and to identify variants from thousands of clinically relevant bacterial strains. Pilon is freely available as open source software.
TL;DR: The total number of plastic particles and their weight floating in the world's oceans is estimated from 24 expeditions across all five sub-tropical gyres, coastal Australia, the Bay of Bengal and the Mediterranean Sea, conducting surface net tows and visual survey transects of large plastic debris.
Abstract: Plastic pollution is ubiquitous throughout the marine environment, yet estimates of the global abundance and weight of floating plastics have lacked data, particularly from the Southern Hemisphere and remote regions. Here we report an estimate of the total number of plastic particles and their weight floating in the world's oceans from 24 expeditions (2007–2013) across all five sub-tropical gyres, coastal Australia, the Bay of Bengal and the Mediterranean Sea, conducting surface net tows (N = 680) and visual survey transects of large plastic debris (N = 891). Using an oceanographic model of floating debris dispersal calibrated by our data, and correcting for wind-driven vertical mixing, we estimate a minimum of 5.25 trillion particles weighing 268,940 tons. When comparing between four size classes, two microplastic (<4.75 mm) and meso- and macroplastic (>4.75 mm), a tremendous loss of microplastics is observed from the sea surface compared to expected rates of fragmentation, suggesting there are mechanisms at play that remove <4.75 mm plastic particles from the ocean surface.
TL;DR: ForceAtlas2, Gephi's default layout, is a force-directed algorithm close to others used for network spatialization and designed for the Gephi user experience; its functioning and settings are presented here for the first time, together with a benchmark of its compromise between performance and quality.
Abstract: Gephi is a network visualization software used in various disciplines (social network analysis, biology, genomics…). One of its key features is the ability to display the spatialization process, aiming at transforming the network into a map, and ForceAtlas2 is its default layout algorithm. The latter is developed by the Gephi team as an all-around solution to Gephi users’ typical networks (scale-free, 10 to 10,000 nodes). We present here for the first time its functioning and settings. ForceAtlas2 is a force-directed layout close to other algorithms used for network spatialization. We do not claim a theoretical advance but an attempt to integrate different techniques such as the Barnes Hut simulation, degree-dependent repulsive force, and local and global adaptive temperatures. It is designed for the Gephi user experience (it is a continuous algorithm), and we explain which constraints it implies. The algorithm benefits from much feedback and is developed in order to provide many possibilities through its settings. We lay out its complete functioning for the users who need a precise understanding of its behaviour, from the formulas to graphic illustration of the result. We propose a benchmark for our compromise between performance and quality. We also explain why we integrated its various features and discuss our design choices.
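The forces the abstract alludes to are given explicitly in the ForceAtlas2 paper: linear attraction along edges (F_a = d(n1, n2)) and degree-dependent repulsion (F_r = k_r (deg(n1)+1)(deg(n2)+1) / d(n1, n2)). A minimal Python sketch of one naive iteration under those formulas (a toy illustration, not Gephi's code):

```python
import math

def repulsion(deg_a, deg_b, dist, k_r=1.0):
    # Degree-dependent repulsion from the ForceAtlas2 paper:
    # F_r = k_r * (deg(a) + 1) * (deg(b) + 1) / d(a, b)
    return k_r * (deg_a + 1) * (deg_b + 1) / dist

def attraction(dist):
    # Linear attraction along edges: F_a = d(a, b)
    return dist

def step(pos, edges, dt=0.01):
    """One naive O(n^2) iteration over a {node: (x, y)} layout. The real
    implementation replaces the pairwise loop with a Barnes-Hut
    approximation and uses local/global adaptive speeds instead of dt."""
    deg = {n: 0 for n in pos}
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    force = {n: [0.0, 0.0] for n in pos}
    nodes = list(pos)
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            dx, dy = pos[b][0] - pos[a][0], pos[b][1] - pos[a][1]
            d = math.hypot(dx, dy) or 1e-9
            f = repulsion(deg[a], deg[b], d) / d  # magnitude per unit vector
            force[a][0] -= f * dx; force[a][1] -= f * dy
            force[b][0] += f * dx; force[b][1] += f * dy
    for a, b in edges:
        dx, dy = pos[b][0] - pos[a][0], pos[b][1] - pos[a][1]
        d = math.hypot(dx, dy) or 1e-9
        f = attraction(d) / d
        force[a][0] += f * dx; force[a][1] += f * dy
        force[b][0] -= f * dx; force[b][1] -= f * dy
    return {n: (pos[n][0] + dt * force[n][0], pos[n][1] + dt * force[n][1])
            for n in pos}
```

Because the algorithm is continuous (it has no stopping condition), Gephi simply calls such a step repeatedly while the user watches the spatialization converge.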
TL;DR: The additional genes identified in this study have an array of functions previously implicated in Alzheimer's disease, including aspects of energy metabolism, protein degradation and the immune system, and add further weight to these pathways as potential therapeutic targets in Alzheimer's disease.
Abstract: Background: Alzheimer's disease is a common debilitating dementia with known heritability, for which 20 late onset susceptibility loci have been identified, but more remain to be discovered. This s ...
TL;DR: The tassel-gbs pipeline, designed for the efficient processing of raw GBS sequence data into SNP genotypes, is described and benchmarked based upon a large-scale, species-wide analysis in maize, where the average error rate was reduced to 0.0042.
Abstract: Genotyping by sequencing (GBS) is a next-generation sequencing based method that takes advantage of reduced representation to enable high-throughput genotyping of large numbers of individuals at a large number of SNP markers. The relatively straightforward, robust, and cost-effective GBS protocol is currently being applied in numerous species by a large number of researchers. Herein we describe a bioinformatics pipeline, tassel-gbs, designed for the efficient processing of raw GBS sequence data into SNP genotypes. The tassel-gbs pipeline successfully fulfills the following key design criteria: (1) Ability to run on the modest computing resources that are typically available to small breeding or ecological research programs, including desktop or laptop machines with only 8–16 GB of RAM, (2) Scalability from small to extremely large studies, where hundreds of thousands or even millions of SNPs can be scored in up to 100,000 individuals (e.g., for large breeding programs or genetic surveys), and (3) Applicability in an accelerated breeding context, requiring rapid turnover from tissue collection to genotypes. Although a reference genome is required, the pipeline can also be run with an unfinished “pseudo-reference” consisting of numerous contigs. We describe the tassel-gbs pipeline in detail and benchmark it based upon a large-scale, species-wide analysis in maize (Zea mays), where the average error rate was reduced to 0.0042 through application of population genetic-based SNP filters. Overall, the GBS assay and the tassel-gbs pipeline provide robust tools for studying genomic diversity.
TL;DR: Beta diversity analysis showed a species-based differentiation between GP-PR and M manufactures, indicating differences between the preparations, and the possibility of using non-rRNA targets for quantitative biotype identification in food was highlighted.
Abstract: Mozzarella (M), Grana Padano (GP) and Parmigiano Reggiano (PR) are three of the most important traditional Italian cheeses. In the three cheese manufactures the initial fermentation is carried out by adding natural whey cultures (NWCs) according to a back-slopping procedure. In this study, NWCs and the corresponding curds from M, GP and PR manufactures were analyzed by culture-independent pyrosequencing of the amplified V1–V3 regions of the 16S rRNA gene, in order to provide insights into the microbiota involved in curd acidification. Moreover, culture-independent high-throughput sequencing of lacS gene amplicons was carried out to evaluate the biodiversity occurring within the S. thermophilus species. Beta diversity analysis showed a species-based differentiation between GP-PR and M manufactures, indicating differences between the preparations. Nevertheless, all the samples shared a naturally-selected core microbiome that is involved in curd acidification. Type-level variability within the S. thermophilus species was also found, and twenty-eight lacS gene sequence types were identified. Although the lacS gene did not prove variable enough within the S. thermophilus species to be used for quantitative biotype monitoring, the possibility of using non-rRNA targets for quantitative biotype identification in food was highlighted.
TL;DR: The aim of the present study was to develop a robust 96-plex immunoassay based on the proximity extension assay (PEA) for improved high throughput detection of protein biomarkers and the development of the current multiplex technique is a step toward robust high throughput protein marker discovery and research.
Abstract: Medical research is developing an ever greater need for comprehensive high-quality data generation to realize the promises of personalized health care based on molecular biomarkers. The nucleic acid proximity-based methods proximity ligation and proximity extension assays have, with their dual reporters, shown potential to relieve the shortcomings of antibodies and their inherent cross-reactivity in multiplex protein quantification applications. The aim of the present study was to develop a robust 96-plex immunoassay based on the proximity extension assay (PEA) for improved high throughput detection of protein biomarkers. This was enabled by: (1) a modified design leading to a reduced number of pipetting steps compared to the existing PEA protocol, as well as improved intra-assay precision; (2) a new enzymatic system that uses a hyper-thermostabile enzyme, Pwo, for uniting the two probes allowing for room temperature addition of all reagents and improved the sensitivity; (3) introduction of an inter-plate control and a new normalization procedure leading to improved inter-assay precision (reproducibility). The multiplex proximity extension assay was found to perform well in complex samples, such as serum and plasma, and also in xenografted mice and resuspended dried blood spots, consuming only 1 µL sample per test. All-in-all, the development of the current multiplex technique is a step toward robust high throughput protein marker discovery and research.
TL;DR: Sequence Demarcation Tool (SDT) as discussed by the authors is a free user-friendly computer program that aims to provide a robust and highly reproducible means of objectively using pairwise genetic identity calculations to classify any set of nucleotide or amino acid sequences.
Abstract: The perpetually increasing rate at which viral full-genome sequences are being determined is creating a pressing demand for computational tools that will aid the objective classification of these genome sequences. Taxonomic classification approaches that are based on pairwise genetic identity measures are potentially highly automatable and are progressively gaining favour with the International Committee on Taxonomy of Viruses (ICTV). There are, however, various issues with the calculation of such measures that could potentially undermine the accuracy and consistency with which they can be applied to virus classification. Firstly, pairwise sequence identities computed based on multiple sequence alignments rather than on multiple independent pairwise alignments can lead to the deflation of identity scores with increasing dataset sizes. Also, when gap-characters need to be introduced during sequence alignments to account for insertions and deletions, methodological variations in the way that these characters are introduced and handled during pairwise genetic identity calculations can cause high degrees of inconsistency in the way that different methods classify the same sets of sequences. Here we present Sequence Demarcation Tool (SDT), a free user-friendly computer program that aims to provide a robust and highly reproducible means of objectively using pairwise genetic identity calculations to classify any set of nucleotide or amino acid sequences. SDT can produce publication quality pairwise identity plots and colour-coded distance matrices to further aid the classification of sequences according to ICTV approved taxonomic demarcation criteria. Besides a graphical interface version of the program for Windows computers, command-line versions of the program are available for a variety of different operating systems (including a parallel version for cluster computing platforms).
TL;DR: An R package called pRRophetic is created, allowing prediction of clinical drug response for many cancer drugs in a user-friendly R environment and showing that prediction of bortezomib sensitivity in multiple myeloma may be improved by training models on a large set of neoplastic hematological cell lines.
Abstract: We recently described a methodology that reliably predicted chemotherapeutic response in multiple independent clinical trials. The method worked by building statistical models from gene expression and drug sensitivity data in a very large panel of cancer cell lines, then applying these models to gene expression data from primary tumor biopsies. Here, to facilitate the development and adoption of this methodology we have created an R package called pRRophetic. This also extends the previously described pipeline, allowing prediction of clinical drug response for many cancer drugs in a user-friendly R environment. We have developed several other important use cases; as an example, we have shown that prediction of bortezomib sensitivity in multiple myeloma may be improved by training models on a large set of neoplastic hematological cell lines. We have also shown that the package facilitates model development and prediction using several different classes of data.
TL;DR: The findings demonstrate that the prokaryotic universal primer set designed in the present study will permit the simultaneous detection of Bacteria and Archaea, and will therefore allow for a more comprehensive understanding of microbial community structures in environmental samples.
Abstract: For the analysis of microbial community structure based on 16S rDNA sequence diversity, sensitive and robust PCR amplification of 16S rDNA is a critical step. To obtain accurate microbial composition data, PCR amplification must be free of bias; however, amplifying all 16S rDNA species with equal efficiency from a sample containing a large variety of microorganisms remains challenging. Here, we designed a universal primer based on the V3-V4 hypervariable region of prokaryotic 16S rDNA for the simultaneous detection of Bacteria and Archaea in fecal samples from crossbred pigs (Landrace×Large white×Duroc) using an Illumina MiSeq next-generation sequencer. In-silico analysis showed that the newly designed universal prokaryotic primers matched approximately 98.0% of Bacteria and 94.6% of Archaea rRNA gene sequences in the Ribosomal Database Project database. For each sequencing reaction performed with the prokaryotic universal primer, an average of 69,330 (±20,482) reads were obtained, of which archaeal rRNA genes comprised approximately 1.2% to 3.2% of all prokaryotic reads. In addition, the detection frequency of Bacteria belonging to the phylum Verrucomicrobia, including members of the classes Verrucomicrobiae and Opitutae, was higher in the NGS analysis using the prokaryotic universal primer than that performed with the bacterial universal primer. Importantly, this new prokaryotic universal primer set had markedly lower bias than that of most previously designed universal primers. Our findings demonstrate that the prokaryotic universal primer set designed in the present study will permit the simultaneous detection of Bacteria and Archaea, and will therefore allow for a more comprehensive understanding of microbial community structures in environmental samples.
TL;DR: A method that can identify arbitrary numbers of time-varying diversification processes on phylogenies without specifying their locations in advance is developed and will greatly facilitate the exploration of macroevolutionary dynamics across large phylogenetic trees, which may have been shaped by heterogeneous mixtures of distinct evolutionary processes.
Abstract: A number of methods have been developed to infer differential rates of species diversification through time and among clades using time-calibrated phylogenetic trees. However, we lack a general framework that can delineate and quantify heterogeneous mixtures of dynamic processes within single phylogenies. I developed a method that can identify arbitrary numbers of time-varying diversification processes on phylogenies without specifying their locations in advance. The method uses reversible-jump Markov Chain Monte Carlo to move between model subspaces that vary in the number of distinct diversification regimes. The model assumes that changes in evolutionary regimes occur across the branches of phylogenetic trees under a compound Poisson process and explicitly accounts for rate variation through time and among lineages. Using simulated datasets, I demonstrate that the method can be used to quantify complex mixtures of time-dependent, diversity-dependent, and constant-rate diversification processes. I compared the performance of the method to the MEDUSA model of rate variation among lineages. As an empirical example, I analyzed the history of speciation and extinction during the radiation of modern whales. The method described here will greatly facilitate the exploration of macroevolutionary dynamics across large phylogenetic trees, which may have been shaped by heterogeneous mixtures of distinct evolutionary processes.
TL;DR: SoilGrids1km provides an initial set of examples of soil spatial data for input into global models at a resolution and consistency not previously available and results of regression modeling indicate that the most useful covariates for modeling soils at the global scale are climatic and biomass indices, lithology, and taxonomic mapping units derived from conventional soil survey.
Abstract: Background: Soils are widely recognized as a non-renewable natural resource and as biophysical carbon sinks. As such, there is a growing requirement for global soil information. Although several global soil information systems already exist, these tend to suffer from inconsistencies and limited spatial detail. Methodology/Principal Findings: We present SoilGrids1km — a global 3D soil information system at 1 km resolution — containing spatial predictions for a selection of soil properties (at six standard depths): soil organic carbon (g kg⁻¹), soil pH, sand, silt and clay fractions (%), bulk density (kg m⁻³), cation-exchange capacity (cmol+/kg), coarse fragments (%), soil organic carbon stock (t ha⁻¹), depth to bedrock (cm), World Reference Base soil groups, and USDA Soil Taxonomy suborders. Our predictions are based on global spatial prediction models which we fitted, per soil variable, using a compilation of major international soil profile databases (ca. 110,000 soil profiles), and a selection of ca. 75 global environmental covariates representing soil forming factors. Results of regression modeling indicate that the most useful covariates for modeling soils at the global scale are climatic and biomass indices (based on MODIS images), lithology, and taxonomic mapping units derived from conventional soil survey (Harmonized World Soil Database). Prediction accuracies assessed using 5-fold cross-validation were between 23% and 51%. Conclusions/Significance: SoilGrids1km provides an initial set of examples of soil spatial data for input into global models at a resolution and consistency not previously available. Some of the main limitations of the current version of SoilGrids1km are: (1) weak relationships between soil properties/classes and explanatory variables due to scale mismatches, (2) difficulty to obtain covariates that capture soil forming factors, (3) low sampling density and spatial clustering of soil profile locations.
However, as the SoilGrids system is highly automated and flexible, increasingly accurate predictions can be generated as new input data become available. SoilGrids1km is available for download via http://soilgrids.org under a Creative Commons Non-Commercial license.
TL;DR: RNA-Seq was superior in detecting low abundance transcripts, differentiating biologically critical isoforms, and allowing the identification of genetic variants, and it demonstrated a broader dynamic range than microarray, which allowed for the detection of more differentially expressed genes with higher fold-change.
Abstract: To demonstrate the benefits of RNA-Seq over microarray in transcriptome profiling, both RNA-Seq and microarray analyses were performed on RNA samples from a human T cell activation experiment. In contrast to other reports, our analyses focused on the difference, rather than similarity, between RNA-Seq and microarray technologies in transcriptome profiling. A comparison of data sets derived from RNA-Seq and Affymetrix platforms using the same set of samples showed a high correlation between gene expression profiles generated by the two platforms. However, it also demonstrated that RNA-Seq was superior in detecting low abundance transcripts, differentiating biologically critical isoforms, and allowing the identification of genetic variants. RNA-Seq also demonstrated a broader dynamic range than microarray, which allowed for the detection of more differentially expressed genes with higher fold-change. Analysis of the two datasets also showed the benefit derived from avoidance of technical issues inherent to microarray probe performance, such as cross-hybridization, non-specific hybridization and the limited detection range of individual probes. Because RNA-Seq does not rely on pre-designed probes complementary to the target sequence, it is devoid of issues associated with probe redundancy and annotation, which simplifies interpretation of the data. Despite the superior benefits of RNA-Seq, microarrays are still the more common choice of researchers when conducting transcriptional profiling experiments. This is likely because RNA-Seq technology is new to most researchers, more expensive than microarray, its data storage is more challenging and its analysis is more complex. We expect that once these barriers are overcome, the RNA-Seq platform will become the predominant tool for transcriptome analysis.
TL;DR: There is strong evidence for PT interventions favoring intensive, highly repetitive, task-oriented and task-specific training in all phases poststroke, and suggestions for prioritizing PT stroke research are given.
Abstract: Background Physical therapy (PT) is one of the key disciplines in interdisciplinary stroke rehabilitation. The aim of this systematic review was to provide an update of the evidence for stroke rehabilitation interventions in the domain of PT.
TL;DR: A quality-assessment checklist whose criteria include whether the research question or objective was clearly stated, whether the study population was clearly specified and defined, and whether the participation rate of eligible persons was at least 50%.
Abstract: Criteria
1. Was the research question or objective in this paper clearly stated?
2. Was the study population clearly specified and defined?
3. Was the participation rate of eligible persons at least 50%?
4. Were all the subjects selected or recruited from the same or similar populations (including the same time period)? Were inclusion and exclusion criteria for being in the study prespecified and applied uniformly to all participants?
5. Was a sample size justification, power description, or variance and effect estimates provided?
6. For the analyses in this paper, were the exposure(s) of interest measured prior to the outcome(s) being measured?
7. Was the timeframe sufficient so that one could reasonably expect to see an association between exposure and outcome if it existed?
8. For exposures that can vary in amount or level, did the study examine different levels of the exposure as related to the outcome (e.g., categories of exposure, or exposure measured as continuous variable)?
9. Were the exposure measures (independent variables) clearly defined, valid, reliable, and implemented consistently across all study participants?
10. Was the exposure(s) assessed more than once over time?
11. Were the outcome measures (dependent variables) clearly defined, valid, reliable, and implemented consistently across all study participants?
12. Were the outcome assessors blinded to the exposure status of participants?
13. Was loss to follow-up after baseline 20% or less?
14. Were key potential confounding variables measured and adjusted statistically for their impact on the relationship between exposure(s) and outcome(s)?
Quality rating: Rater #1 TN; Rater #2 BST
TL;DR: An easy-to-use tool named HemI (Heat map Illustrator), which can visualize either gene or protein expression data in heatmaps and provides multiple clustering strategies for analyzing the data.
Abstract: Recent high-throughput techniques have generated a flood of biological data in all aspects. The transformation and visualization of multi-dimensional and numerical gene or protein expression data in a single heatmap can provide a concise but comprehensive presentation of molecular dynamics under different conditions. In this work, we developed an easy-to-use tool named HemI (Heat map Illustrator), which can visualize either gene or protein expression data in heatmaps. The heatmaps can be recolored, rescaled or rotated in a customized manner, and HemI provides multiple clustering strategies for analyzing the data. Publication-quality figures can be exported directly. We propose that HemI can be a useful toolkit for conveniently visualizing and manipulating heatmaps. The stand-alone packages of HemI were implemented in Java and can be accessed at http://hemi.biocuckoo.org/down.php.
TL;DR: An approach to determining confidence in the output of a network meta-analysis is proposed based on methodology developed by the Grading of Recommendations Assessment, Development and Evaluation Working Group for pairwise meta-analyses and applied to a systematic review comparing topical antibiotics without steroids for chronically discharging ears with underlying eardrum perforations.
Abstract: Systematic reviews that collate data about the relative effects of multiple interventions via network meta-analysis are highly informative for decision-making purposes. A network meta-analysis provides two types of findings for a specific outcome: the relative treatment effect for all pairwise comparisons, and a ranking of the treatments. It is important to consider the confidence with which these two types of results can enable clinicians, policy makers and patients to make informed decisions. We propose an approach to determining confidence in the output of a network meta-analysis. Our proposed approach is based on methodology developed by the Grading of Recommendations Assessment, Development and Evaluation (GRADE) Working Group for pairwise meta-analyses. The suggested framework for evaluating a network meta-analysis acknowledges (i) the key role of indirect comparisons; (ii) the contributions of each piece of direct evidence to the network meta-analysis estimates of effect size; (iii) the importance of the transitivity assumption to the validity of network meta-analysis; and (iv) the possibility of disagreement between direct evidence and indirect evidence. We apply our proposed strategy to a systematic review comparing topical antibiotics without steroids for chronically discharging ears with underlying eardrum perforations. The proposed framework can be used to determine confidence in the results from a network meta-analysis. Judgements about evidence from a network meta-analysis can be different from those made about evidence from pairwise meta-analyses.
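The "key role of indirect comparisons" rests on a simple identity: with a common comparator B, an indirect estimate of A versus C combines the two direct estimates, with their variances adding under independence. A minimal sketch of this Bucher-style adjusted indirect comparison (the function name is ours, not from the paper):

```python
def indirect_effect(d_ab, var_ab, d_cb, var_cb):
    """Indirect estimate of A vs C through common comparator B:
    d_AC = d_AB - d_CB, with var_AC = var_AB + var_CB (assuming
    the two direct estimates are independent)."""
    return d_ab - d_cb, var_ab + var_cb

# e.g. two sets of trials sharing comparator B (illustrative numbers)
d_ac, var_ac = indirect_effect(d_ab=0.5, var_ab=0.04, d_cb=0.2, var_cb=0.05)
```

The added variance is exactly why the framework grades indirect evidence more cautiously than direct evidence: each detour through a comparator inflates uncertainty.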
TL;DR: A simple automated digital IHC image analysis algorithm provides an unbiased, quantitative assessment of antibody staining intensity in tissue sections and can be adopted globally for scoring most protein targets where the marker protein expression is of cytoplasmic and/or nuclear type.
Abstract: In anatomic pathology, immunohistochemistry (IHC) serves as a diagnostic and prognostic method for identification of disease markers in tissue samples that directly influences classification and grading of the disease, influencing patient management. However, in most of the world pathological analysis of tissue samples remains a time-consuming and subjective procedure, wherein the intensity of antibody staining is manually judged, so the scoring decision is directly influenced by visual bias. This instigated us to design a simple automated digital IHC image analysis algorithm for an unbiased, quantitative assessment of antibody staining intensity in tissue sections. As a first step, we adopted the spectral deconvolution method of DAB/hematoxylin color spectra by using optimized optical density vectors of the color deconvolution plugin for proper separation of the DAB color spectra. The DAB-stained image is then displayed in a new window wherein it undergoes pixel-by-pixel analysis, and displays the full profile along with its scoring decision. Based on the mathematical formula conceptualized, the algorithm was thoroughly tested by analyzing scores assigned to thousands (n = 1703) of DAB-stained IHC images, including sample images taken from the Human Protein Atlas web resource. The IHC Profiler plugin developed is compatible with the open-resource digital image analysis software ImageJ; it creates a pixel-by-pixel analysis profile of a digital IHC image and assigns a score in a four-tier system. A comparison study between manual pathological analysis and IHC Profiler resolved in a match of 88.6% (P<0.0001, CI = 95%). This new tool developed for clinical histopathological sample analysis can be adopted globally for scoring most protein targets where the marker protein expression is of cytoplasmic and/or nuclear type.
We foresee that this method will minimize the problem of inter-observer variations across labs and further help in worldwide patient stratification potentially benefitting various multinational clinical trial initiatives.
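The color-deconvolution step described above splits each pixel's optical density into hematoxylin and DAB contributions. A minimal per-pixel sketch using the widely published Ruifrok-Johnston H-DAB stain vectors (illustrative constants; the plugin's optimized vectors may differ):

```python
import math

# Normalized optical-density (R, G, B) vectors for the two stains,
# as published for Ruifrok-Johnston color deconvolution (H DAB set).
HEMATOXYLIN = (0.650, 0.704, 0.286)
DAB = (0.268, 0.570, 0.776)

def optical_density(rgb, i0=255.0):
    # Beer-Lambert: OD = -log10(I / I0); clamp to avoid log(0)
    return [-math.log10(max(v, 1.0) / i0) for v in rgb]

def unmix(rgb):
    """Least-squares split of one pixel's OD into (hematoxylin, DAB)
    amounts: solve the 2x2 normal equations (M M^T) c = M od."""
    od = optical_density(rgb)
    m = [HEMATOXYLIN, DAB]
    a = [[sum(m[i][k] * m[j][k] for k in range(3)) for j in range(2)]
         for i in range(2)]
    b = [sum(m[i][k] * od[k] for k in range(3)) for i in range(2)]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    c_h = (b[0] * a[1][1] - b[1] * a[0][1]) / det
    c_d = (b[1] * a[0][0] - b[0] * a[1][0]) / det
    return c_h, c_d
```

A brown, DAB-dominated pixel then yields c_d much larger than c_h, and binning such per-pixel DAB intensities over the image is what feeds a four-tier scoring decision.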
TL;DR: Improved methods and detailed protocols make Cas9-mediated mutagenesis an attractive approach for labs of all sizes, with several novel approaches implemented to increase rates of mutagenesis.
Abstract: The CRISPR/Cas9 system has been implemented in a variety of model organisms to mediate site-directed mutagenesis. A wide range of mutation rates has been reported, but at a limited number of genomic target sites. To uncover the rules that govern effective Cas9-mediated mutagenesis in zebrafish, we targeted over a hundred genomic loci for mutagenesis using a streamlined and cloning-free method. We generated mutations in 85% of target genes with mutation rates varying across several orders of magnitude, and identified sequence composition rules that influence mutagenesis. We increased rates of mutagenesis by implementing several novel approaches. The activities of poor or unsuccessful single-guide RNAs (sgRNAs) initiating with a 5′ adenine were improved by rescuing 5′ end homogeneity of the sgRNA. In some cases, direct injection of Cas9 protein/sgRNA complex further increased mutagenic activity. We also observed that low diversity of mutant alleles led to repeated failure to obtain frame-shift mutations. This limitation was overcome by knock-in of a stop codon cassette that ensured coding frame truncation. Our improved methods and detailed protocols make Cas9-mediated mutagenesis an attractive approach for labs of all sizes.
TL;DR: The powerlaw Python package provides easy commands for basic fitting and statistical analysis of distributions and seeks to support a variety of user needs by being exhaustive in the options available to the user.
Abstract: Power laws are theoretically interesting probability distributions that are also frequently used to describe empirical data. In recent years, effective statistical methods for fitting power laws have been developed, but appropriate use of these techniques requires significant programming and statistical insight. In order to greatly decrease the barriers to using good statistical methods for fitting power law distributions, we developed the powerlaw Python package. This software package provides easy commands for basic fitting and statistical analysis of distributions. Notably, it also seeks to support a variety of user needs by being exhaustive in the options available to the user. The source code is publicly available and easily extensible.
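The statistical methods the package wraps are those of Clauset, Shalizi & Newman; the core continuous maximum-likelihood estimate of the exponent is small enough to sketch directly. The data here are synthetic, and in practice `powerlaw.Fit(data)` performs this fit plus automatic xmin selection and distribution comparison:

```python
import math
import random

def fit_alpha(data, xmin):
    """Continuous MLE of the power-law exponent for the tail x >= xmin:
    alpha = 1 + n / sum(ln(x / xmin))  (Clauset, Shalizi & Newman).
    The powerlaw package computes this, with xmin selection, via
    powerlaw.Fit(data)."""
    tail = [x for x in data if x >= xmin]
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)

# Synthetic samples from a power law with alpha = 2.5, xmin = 1,
# drawn by inverting the CDF: x = xmin * (1 - u) ** (-1 / (alpha - 1)).
random.seed(0)
samples = [(1 - random.random()) ** (-1 / 1.5) for _ in range(20000)]
alpha_hat = fit_alpha(samples, xmin=1.0)
```

With 20,000 samples the estimate recovers the true exponent to within a few hundredths, which is the accuracy the standard error (alpha − 1)/√n predicts.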
TL;DR: The ability of methods to correct the initial sampling bias varied greatly depending on bias type, bias intensity and species, but the simple systematic sampling of records consistently ranked among the best performing across the range of conditions tested, whereas other methods performed more poorly in most cases.
Abstract: MAXENT is now a common species distribution modeling (SDM) tool used by conservation practitioners to predict the distribution of a species from a set of records and environmental predictors. However, the datasets of species occurrence used to train the model are often geographically biased because of unequal sampling effort across the study area. This bias may be a source of strong inaccuracy in the resulting model and could lead to incorrect predictions. Although a number of sampling bias correction methods have been proposed, there is no consensus guideline for accounting for it. We compared the performance of five bias correction methods on three datasets of species occurrence: one “virtual” dataset derived from a land cover map, and two actual datasets for a turtle (Chrysemys picta) and a salamander (Plethodon cylindraceus). We subjected these datasets to four types of sampling bias corresponding to potential types of empirical bias. We applied the five correction methods to the biased samples and compared the outputs of distribution models to unbiased datasets to assess the overall correction performance of each method. The results revealed that the ability of the methods to correct the initial sampling bias varied greatly depending on bias type, bias intensity and species. However, simple systematic sampling of records consistently ranked among the best performing across the range of conditions tested, whereas other methods performed more poorly in most cases. The strong effect of initial conditions on correction performance highlights the need for further research to develop a step-by-step guideline for accounting for sampling bias. In the meantime, systematic sampling of records appears to be the most efficient correction and can be recommended in most cases.
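The best-performing correction, systematic sampling of records, amounts to thinning occurrences so that at most one record is kept per cell of a regular grid, which removes the density signal left by uneven sampling effort. A minimal sketch (the coordinates and cell size below are hypothetical):

```python
def systematic_sample(records, cell_size):
    """Keep at most one occurrence record per grid cell.

    records: iterable of (x, y) coordinates; cell_size: grid
    resolution in the same units. Keeps the first record seen in
    each cell and discards the rest.
    """
    seen = {}
    for x, y in records:
        cell = (int(x // cell_size), int(y // cell_size))
        seen.setdefault(cell, (x, y))
    return list(seen.values())
```

In an SDM workflow the thinned records, rather than the raw ones, would then be supplied to MAXENT as training data.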
TL;DR: The limitations of the current evidence base mean that further and better designed studies are needed to inform policy, research and clinical practice, with the goal of improving health-related quality of life for patients with multimorbidity.
Abstract: Introduction Multimorbidity is a major concern in primary care. Nevertheless, evidence on the prevalence, patterns and determinants of multimorbidity is scarce. The aim of this study is to systematically review studies of the prevalence, patterns and determinants of multimorbidity in primary care. Methods Systematic review of literature published between 1961 and 2013 and indexed in Ovid (CINAHL, PsychINFO, Medline and Embase) and Web of Knowledge. Studies were selected, using a pretested proforma, according to eligibility criteria of addressing the prevalence, determinants, and patterns of multimorbidity in primary care. The quality and risk of bias were assessed using STROBE criteria. Two researchers assessed the eligibility of studies for inclusion (Kappa = 0.86). Results We identified 39 eligible publications describing studies that included a total of 70,057,611 patients in 12 countries. The number of health conditions analysed per study ranged from 5 to 335, with multimorbidity prevalence ranging from 12.9% to 95.1%. All studies observed a significant positive association between multimorbidity and age (odds ratio [OR], 1.26 to 227.46), and lower socioeconomic status (OR, 1.20 to 1.91). Positive associations with female gender and mental disorders were also observed. The most frequent patterns of multimorbidity included osteoarthritis together with cardiovascular and/or metabolic conditions. Conclusions Well-established determinants of multimorbidity include age, lower socioeconomic status and gender. The most prevalent conditions shape the patterns of multimorbidity. However, the limitations of the current evidence base mean that further and better designed studies are needed to inform policy, research and clinical practice, with the goal of improving health-related quality of life for patients with multimorbidity.
Standardization of the definition and assessment of multimorbidity is essential in order to better understand this phenomenon, and is a necessary immediate step.
TL;DR: People who had been exposed to material supporting anti-vaccine conspiracy theories showed less intention to vaccinate than those in the anti-conspiracy condition or controls.
Abstract: The current studies investigated the potential impact of anti-vaccine conspiracy beliefs, and exposure to anti-vaccine conspiracy theories, on vaccination intentions. In Study 1, British parents completed a questionnaire measuring beliefs in anti-vaccine conspiracy theories and the likelihood that they would have a fictitious child vaccinated. Results revealed a significant negative relationship between anti-vaccine conspiracy beliefs and vaccination intentions. This effect was mediated by the perceived dangers of vaccines, and feelings of powerlessness, disillusionment and mistrust in authorities. In Study 2, participants were exposed to information that either supported or refuted anti-vaccine conspiracy theories, or a control condition. Results revealed that participants who had been exposed to material supporting anti-vaccine conspiracy theories showed less intention to vaccinate than those in the anti-conspiracy condition or controls. This effect was mediated by the same variables as in Study 1. These findings point to the potentially detrimental consequences of anti-vaccine conspiracy theories, and highlight their potential role in shaping health-related behaviors.
TL;DR: It is shown that sedentary behavior may be an important determinant of health, independently of physical activity; however, the relationship is complex because it depends on the type of sedentary behavior and the age group studied.
Abstract: Objective 1) To synthesize the current observational evidence for the association between sedentary behavior and health outcomes using information from systematic reviews. 2) To assess the methodological quality of the systematic reviews found. Methodology/Principal Findings Medline; Excerpta Medica (Embase); PsycINFO; and Web of Science were searched for reviews published up to September 2013. Additional publications were provided by Sedentary Behaviour Research Network members. The methodological quality of the systematic reviews was evaluated using recommended standard criteria from AMSTAR. For each review, improper use of causal language in the description of their main results/conclusion was evaluated. Altogether, 1,044 review titles were identified, 144 were read in their entirety, and 27 were included. Based on the systematic reviews with the best methodological quality, we found in children and adolescents, strong evidence of a relationship between time spent in sedentary behavior and obesity. Moreover, moderate evidence was observed for blood pressure and total cholesterol, self-esteem, social behavior problems, physical fitness and academic achievement. In adults, we found strong evidence of a relationship between sedentary behavior and all-cause mortality, fatal and non-fatal cardiovascular disease, type 2 diabetes and metabolic syndrome. In addition, there is moderate evidence for incidence rates of ovarian, colon and endometrial cancers. Conclusions This overview, based on the best available systematic reviews, shows that sedentary behavior may be an important determinant of health, independently of physical activity. However, the relationship is complex because it depends on the type of sedentary behavior and the age group studied. The relationship between sedentary behavior and many health outcomes remains uncertain; thus, further studies are warranted.
TL;DR: Google Trends is being used to study health phenomena across a variety of topic domains in myriad ways, but poor documentation of methods precludes reproducibility of the findings; greater transparency would improve its reliability as a research tool.
Abstract: Google Trends is a novel, freely accessible tool that allows users to interact with Internet search data, which may provide deep insights into population behavior and health-related phenomena. However, there is limited knowledge about its potential uses and limitations. We therefore systematically reviewed health care literature using Google Trends to classify articles by topic and study aim; evaluate the methodology and validation of the tool; and address limitations for its use in research. PRISMA guidelines were followed. Two independent reviewers systematically identified studies utilizing Google Trends for health care research from MEDLINE and PubMed. Seventy studies met our inclusion criteria. Google Trends publications increased seven-fold from 2009 to 2013. Studies were classified into four topic domains: infectious disease (27% of articles), mental health and substance use (24%), other non-communicable diseases (16%), and general population behavior (33%). By use, 27% of articles utilized Google Trends for causal inference, 39% for description, and 34% for surveillance. Among surveillance studies, 92% were validated against a reference standard data source, and 80% of studies using correlation had a correlation statistic ≥0.70. Overall, 67% of articles provided a rationale for their search input. However, only 7% of articles were reproducible based on complete documentation of search strategy. We present a checklist to facilitate appropriate methodological documentation for future studies. A limitation of the study is the challenge of classifying heterogeneous studies utilizing a novel data source. Google Trends is being used to study health phenomena in a variety of topic domains in myriad ways. However, poor documentation of methods precludes the reproducibility of the findings. Such documentation would enable other researchers to determine the consistency of results provided by Google Trends for a well-specified query over time.
Furthermore, greater transparency can improve its reliability as a research tool.
TL;DR: Five SNPs in miRNA binding sites, located in the 3′ UTRs of CASP8, HDDC3, DROSHA, MUSTN1, and MYCL1, respectively, were significantly associated with breast cancer risk; of these genes, DROSHA is a miRNA machinery gene with a central role in initial miRNA processing.
Abstract: Genetic variations, such as single nucleotide polymorphisms (SNPs) in microRNAs (miRNA) or in the miRNA binding sites may affect the miRNA dependent gene expression regulation, which has been implicated in various cancers, including breast cancer, and may alter individual susceptibility to cancer. We investigated associations between miRNA related SNPs and breast cancer risk. First we evaluated 2,196 SNPs in a case-control study combining nine genome wide association studies (GWAS). Second, we further investigated 42 SNPs with suggestive evidence for association using 41,785 cases and 41,880 controls from 41 studies included in the Breast Cancer Association Consortium (BCAC). Combining the GWAS and BCAC data within a meta-analysis, we estimated main effects on breast cancer risk as well as risks for estrogen receptor (ER) and age defined subgroups. Five miRNA binding site SNPs were significantly associated with breast cancer risk: rs1045494 (odds ratio (OR) 0.92; 95% confidence interval (CI): 0.88-0.96), rs1052532 (OR 0.97; 95% CI: 0.95-0.99), rs10719 (OR 0.97; 95% CI: 0.94-0.99), rs4687554 (OR 0.97; 95% CI: 0.95-0.99), and rs3134615 (OR 1.03; 95% CI: 1.01-1.05) located in the 3' UTR of CASP8, HDDC3, DROSHA, MUSTN1, and MYCL1, respectively. DROSHA belongs to miRNA machinery genes and has a central role in initial miRNA processing. The remaining genes are involved in different molecular functions, including apoptosis and gene expression regulation. Further studies are warranted to elucidate whether the miRNA binding site SNPs are the causative variants for the observed risk effects.
TL;DR: Exosomal miRNA signatures appear to mirror pathological changes of CRC patients and several miRNAs are promising biomarkers for non-invasive diagnosis of the disease.
Abstract: Purpose Exosomal microRNAs (miRNAs) have been attracting major interest as potential diagnostic biomarkers of cancer. The aim of this study was to characterize the miRNA profiles of serum exosomes and to identify those that are altered in colorectal cancer (CRC). To evaluate their use as diagnostic biomarkers, the relationship between specific exosomal miRNA levels and pathological changes of patients, including disease stage and tumor resection, was examined. Experimental Design Microarray analyses of miRNAs in exosome-enriched fractions of serum samples from 88 primary CRC patients and 11 healthy controls were performed. The expression levels of miRNAs in the culture medium of five colon cancer cell lines were also compared with those in the culture medium of a normal colon-derived cell line. The expression profiles of miRNAs that were differentially expressed between CRC and control sample sets were verified using 29 paired samples from post-tumor resection patients. The sensitivities of selected miRNAs as biomarkers of CRC were evaluated and compared with those of known tumor markers (CA19-9 and CEA) using a receiver operating characteristic analysis. The expression levels of selected miRNAs were also validated by quantitative real-time RT-PCR analyses of an independent set of 13 CRC patients. Results The serum exosomal levels of seven miRNAs (let-7a, miR-1229, miR-1246, miR-150, miR-21, miR-223, and miR-23a) were significantly higher in primary CRC patients, even those with early stage disease, than in healthy controls, and were significantly down-regulated after surgical resection of tumors. These miRNAs were also secreted at significantly higher levels by colon cancer cell lines than by a normal colon-derived cell line. The high sensitivities of the seven selected exosomal miRNAs were confirmed by a receiver operating characteristic analysis. 
Conclusion Exosomal miRNA signatures appear to mirror pathological changes of CRC patients and several miRNAs are promising biomarkers for non-invasive diagnosis of the disease.
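The receiver operating characteristic analysis used above to benchmark the exosomal miRNAs against CA19-9 and CEA reduces to computing an area under the ROC curve. A rank-based sketch via the Mann-Whitney U statistic, with made-up marker levels (the study's actual values are not reproduced here):

```python
def roc_auc(cases, controls):
    """Area under the ROC curve via the Mann-Whitney U statistic:
    the probability that a randomly chosen case has a higher marker
    level than a randomly chosen control (ties count one half).
    AUC = 1.0 means perfect separation; 0.5 means no discrimination."""
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))
```

Comparing AUCs computed this way for each candidate miRNA against those of CA19-9 and CEA is what "high sensitivities ... confirmed by a receiver operating characteristic analysis" amounts to operationally.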
TL;DR: New global distribution maps at 1 km resolution for cattle, pigs and chickens, and a partial distribution map for ducks, are presented and made publicly available via the Livestock Geo-Wiki.
Abstract: Livestock contributes directly to the livelihoods and food security of almost a billion people and affects the diet and health of many more. With estimated standing populations of 1.43 billion cattle, 1.87 billion sheep and goats, 0.98 billion pigs, and 19.60 billion chickens, reliable and accessible information on the distribution and abundance of livestock is needed for many reasons. These include analyses of the social and economic aspects of the livestock sector; the environmental impacts of livestock such as the production and management of waste, greenhouse gas emissions and livestock-related land-use change; and large-scale public health and epidemiological investigations. The Gridded Livestock of the World (GLW) database, produced in 2007, provided modelled livestock densities of the world, adjusted to match official (FAOSTAT) national estimates for the reference year 2005, at a spatial resolution of 3 minutes of arc (about 5×5 km at the equator). Recent methodological improvements have significantly enhanced these distributions: more up-to-date and detailed sub-national livestock statistics have been collected; a new, higher resolution set of predictor variables is used; and the analytical procedure has been revised and extended to include a more systematic assessment of model accuracy and the representation of uncertainties associated with the predictions. This paper describes the current approach in detail and presents new global distribution maps at 1 km resolution for cattle, pigs and chickens, and a partial distribution map for ducks. These digital layers are made publicly available via the Livestock Geo-Wiki (http://www.livestock.geo-wiki.org), as will be the maps of other livestock types as they are produced.
TL;DR: This is the first study to provide normative data for grip strength across the life course, and these centile values have the potential to inform the clinical assessment of grip strength, which is recognised as an important part of identifying people with sarcopenia and frailty.
Abstract: Introduction: Epidemiological studies have shown that weaker grip strength in later life is associated with disability, morbidity, and mortality. Grip strength is a key component of the sarcopenia and frailty phenotypes and yet it is unclear how individual measurements should be interpreted. Our objective was to produce cross-sectional centile values for grip strength across the life course. A secondary objective was to examine the impact of different aspects of measurement protocol. Methods: We combined 60,803 observations from 49,964 participants (26,687 female) of 12 general population studies in Great Britain. We produced centile curves for ages 4 to 90 and investigated the prevalence of weak grip, defined as strength at least 2.5 SDs below the gender-specific peak mean. We carried out a series of sensitivity analyses to assess the impact of dynamometer type and measurement position (seated or standing). Results: Our results suggested three overall periods: an increase to peak in early adult life, maintenance through to midlife, and decline from midlife onwards. Males were on average stronger than females from adolescence onwards: males’ peak median grip was 51 kg between ages 29 and 39, compared to 31 kg in females between ages 26 and 42. Weak grip strength increased sharply with age, reaching a prevalence of 23% in males and 27% in females by age 80. Sensitivity analyses
TL;DR: A meta-analysis of the agronomic and economic impacts of GM crops reveals robust evidence of GM crop benefits for farmers in developed and developing countries.
Abstract: Background Despite the rapid adoption of genetically modified (GM) crops by farmers in many countries, controversies about this technology continue. Uncertainty about GM crop impacts is one reason for widespread public suspicion. Objective We carry out a meta-analysis of the agronomic and economic impacts of GM crops to consolidate the evidence. Data Sources Original studies for inclusion were identified through keyword searches in ISI Web of Knowledge, Google Scholar, EconLit, and AgEcon Search. Study Eligibility Criteria Studies were included when they built on primary data from farm surveys or field trials anywhere in the world, and when they reported impacts of GM soybean, maize, or cotton on crop yields, pesticide use, and/or farmer profits. In total, 147 original studies were included. Synthesis Methods Analysis of mean impacts and meta-regressions to examine factors that influence outcomes. Results On average, GM technology adoption has reduced chemical pesticide use by 37%, increased crop yields by 22%, and increased farmer profits by 68%. Yield gains and pesticide reductions are larger for insect-resistant crops than for herbicide-tolerant crops. Yield and profit gains are higher in developing countries than in developed countries. Limitations Several of the original studies did not report sample sizes and measures of variance. Conclusion The meta-analysis reveals robust evidence of GM crop benefits for farmers in developed and developing countries. Such evidence may help to gradually increase public trust in this technology.
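The pooling of mean impacts across studies in a meta-analysis like this typically follows the standard inverse-variance scheme. A fixed-effect sketch with hypothetical effect sizes and variances (the paper's meta-regressions additionally model study characteristics, and missing variance reporting, noted under Limitations, is exactly what breaks this weighting):

```python
import math

def fixed_effect_pool(effects, variances):
    """Fixed-effect inverse-variance pooled estimate: each study's
    effect is weighted by 1/variance, so more precise studies
    dominate. Returns the pooled effect and its standard error."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return pooled, se
```

With two equally precise hypothetical studies reporting yield gains of 20% and 24%, the pooled estimate is simply their average, 22%, with a standard error smaller than either study's alone.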