Showing papers in "PLOS ONE in 2011"
TL;DR: A procedure for constructing GBS libraries based on reducing genome complexity with restriction enzymes (REs) is reported, which is simple, quick, extremely specific, highly reproducible, and may reach important regions of the genome that are inaccessible to sequence capture approaches.
Abstract: Advances in next generation technologies have driven the costs of DNA sequencing down to the point that genotyping-by-sequencing (GBS) is now feasible for high diversity, large genome species. Here, we report a procedure for constructing GBS libraries based on reducing genome complexity with restriction enzymes (REs). This approach is simple, quick, extremely specific, highly reproducible, and may reach important regions of the genome that are inaccessible to sequence capture approaches. By using methylation-sensitive REs, repetitive regions of genomes can be avoided and lower copy regions targeted with two to three fold higher efficiency. This tremendously simplifies computationally challenging alignment problems in species with high levels of genetic diversity. The GBS procedure is demonstrated with maize (IBM) and barley (Oregon Wolfe Barley) recombinant inbred populations where roughly 200,000 and 25,000 sequence tags were mapped, respectively. An advantage in species like barley that lack a complete genome sequence is that a reference map need only be developed around the restriction sites, and this can be done in the process of sample genotyping. In such cases, the consensus of the read clusters across the sequence tagged sites becomes the reference. Alternatively, for kinship analyses in the absence of a reference genome, the sequence tags can simply be treated as dominant markers. Future application of GBS to breeding, conservation, and global species and population surveys may allow plant breeders to conduct genomic selection on a novel germplasm or species without first having to develop any prior molecular tools, or conservation biologists to determine population structure without prior knowledge of the genome or diversity in the species.
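The complexity-reduction step can be illustrated with a toy in-silico digest. This is a hedged sketch, not the published GBS protocol: it assumes the ApeKI recognition motif GCWGC (a methylation-sensitive enzyme commonly used for GBS) and, for simplicity, cuts at the start of each recognition site rather than at the enzyme's actual cleavage offset.

```python
import re

# ApeKI recognizes GCWGC, where W = A or T (degenerate base).
SITE = re.compile(r"GC[AT]GC")

def digest_fragments(seq):
    """Cut at the start of every recognition site and return fragment lengths.

    Simplification: real enzymes cut at a fixed offset within/near the site;
    cutting at the site start is enough to show the complexity-reduction idea.
    """
    cuts = [m.start() for m in SITE.finditer(seq)]
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]
```

Counting and size-selecting such fragments is what determines how much of the genome a given enzyme choice samples.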
TL;DR: REVIGO is a Web server that summarizes long, unintelligible lists of GO terms by finding a representative subset of the terms using a simple clustering algorithm that relies on semantic similarity measures.
Abstract: Outcomes of high-throughput biological experiments are typically interpreted by statistical testing for enriched gene functional categories defined by the Gene Ontology (GO). The resulting lists of GO terms may be large and highly redundant, and thus difficult to interpret. REVIGO is a Web server that summarizes long, unintelligible lists of GO terms by finding a representative subset of the terms using a simple clustering algorithm that relies on semantic similarity measures. Furthermore, REVIGO visualizes this non-redundant GO term set in multiple ways to assist in interpretation: multidimensional scaling and graph-based visualizations accurately render the subdivisions and the semantic relationships in the data, while treemaps and tag clouds are also offered as alternative views. REVIGO is freely available at http://revigo.irb.hr/.
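The redundancy-reduction idea can be sketched as a greedy loop over a pairwise semantic-similarity matrix: repeatedly find the most similar pair of terms and discard the less significant member. This is an illustrative simplification, not REVIGO's actual algorithm; the similarity values, p-values, and 0.7 cutoff below are invented for the example.

```python
def reduce_terms(terms, sim, pvals, cutoff=0.7):
    """Greedy redundancy reduction over GO terms (sketch).

    sim   : dict mapping frozenset({term_a, term_b}) -> semantic similarity
    pvals : dict mapping term -> enrichment p-value (lower = more significant)
    While any pair is more similar than `cutoff`, drop the less significant
    (higher p-value) member of the most similar pair.
    """
    kept = list(terms)
    while True:
        candidates = [(sim.get(frozenset((a, b)), 0.0), a, b)
                      for i, a in enumerate(kept) for b in kept[i + 1:]]
        score, a, b = max(candidates, default=(0.0, None, None))
        if a is None or score <= cutoff:
            return kept
        kept.remove(a if pvals[a] > pvals[b] else b)
```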
TL;DR: An improved quality-filtering pipeline was applied to several benchmarking studies; even with this stringent data curation, biases in the data generation pipeline and batch effects remained that could potentially confound the interpretation of microbial community data.
Abstract: The advent of next generation sequencing has coincided with a growth in interest in using these approaches to better understand the role of the structure and function of the microbial communities in human, animal, and environmental health. Yet, use of next generation sequencing to perform 16S rRNA gene sequence surveys has resulted in considerable controversy surrounding the effects of sequencing errors on downstream analyses. We analyzed 2.7×10⁶ reads distributed among 90 identical mock community samples, which were collections of genomic DNA from 21 different species with known 16S rRNA gene sequences; we observed an average error rate of 0.0060. To improve this error rate, we evaluated numerous methods of identifying bad sequence reads, identifying regions within reads of poor quality, and correcting base calls and were able to reduce the overall error rate to 0.0002. Implementation of the PyroNoise algorithm provided the best combination of error rate, sequence length, and number of sequences. Perhaps more problematic than sequencing errors was the presence of chimeras generated during PCR. Because we knew the true sequences within the mock community and the chimeras they could form, we identified 8% of the raw sequence reads as chimeric. After quality filtering the raw sequences and using the Uchime chimera detection program, the overall chimera rate decreased to 1%. The chimeras that could not be detected were largely responsible for the identification of spurious operational taxonomic units (OTUs) and genus-level phylotypes. The number of spurious OTUs and phylotypes increased with sequencing effort indicating that comparison of communities should be made using an equal number of sequences.
Finally, we applied our improved quality-filtering pipeline to several benchmarking studies and observed that even with our stringent data curation pipeline, biases in the data generation pipeline and batch effects were observed that could potentially confound the interpretation of microbial community data.
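The headline error rates are per-base mismatch fractions against the known mock-community references. A minimal sketch, assuming each read is already aligned to its reference and gap-free (the real pipeline handles alignment and indels):

```python
def error_rate(reads, reference):
    """Per-base error rate for a mock community: fraction of mismatched
    bases when comparing each read to its known reference sequence."""
    mismatches = total = 0
    for read in reads:
        total += len(read)
        mismatches += sum(r != t for r, t in zip(read, reference[:len(read)]))
    return mismatches / total
```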
TL;DR: A toolkit for the analysis of RS-fMRI data, namely the RESting-state fMRI data analysis Toolkit (REST), which was developed in MATLAB with a graphical user interface (GUI).
Abstract: Resting-state fMRI (RS-fMRI) has been drawing more and more attention in recent years. However, a publicly available, systematically integrated and easy-to-use tool for RS-fMRI data processing is still lacking. We developed a toolkit for the analysis of RS-fMRI data, namely the RESting-state fMRI data analysis Toolkit (REST). REST was developed in MATLAB with a graphical user interface (GUI). After data preprocessing with SPM or AFNI, a few analytic methods can be performed in REST, including functional connectivity analysis based on linear correlation, regional homogeneity, amplitude of low frequency fluctuation (ALFF), and fractional ALFF. A few additional functions were implemented in REST, including a DICOM sorter, linear trend removal, bandpass filtering, time course extraction, regression of covariates, image calculator, statistical analysis, and slice viewer (for result visualization, multiple comparison correction, etc.). REST is an open-source package and is freely available at http://www.restfmri.net.
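ALFF and fALFF can be sketched in a few lines, though REST itself is a MATLAB toolkit. This simplified Python version (an assumption about the definitions, not REST's exact implementation) sums spectral amplitude in the conventional 0.01–0.08 Hz band and, for fALFF, normalizes by the full-band amplitude:

```python
import numpy as np

def alff_falff(ts, tr, band=(0.01, 0.08)):
    """ALFF/fALFF sketch for a single voxel time series.

    ts   : 1-D array of BOLD signal values
    tr   : repetition time in seconds (sampling interval)
    ALFF : summed spectral amplitude within the low-frequency band
    fALFF: ALFF divided by the summed amplitude over all frequencies
    """
    amp = np.abs(np.fft.rfft(ts - ts.mean()))
    freqs = np.fft.rfftfreq(len(ts), d=tr)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    alff = amp[in_band].sum()
    falff = alff / amp.sum()
    return alff, falff
```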
TL;DR: A meta-analysis of 326 studies that have used remotely sensed images to map urban land conversion suggests that contemporary urban expansion is related to a variety of factors difficult to observe comprehensively at the global level, including international capital flows, the informal economy, land use policy, and generalized transport costs.
Abstract: The conversion of Earth's land surface to urban uses is one of the most irreversible human impacts on the global biosphere. It drives the loss of farmland, affects local climate, fragments habitats, and threatens biodiversity. Here we present a meta-analysis of 326 studies that have used remotely sensed images to map urban land conversion. We report a worldwide observed increase in urban land area of 58,000 km² from 1970 to 2000. India, China, and Africa have experienced the highest rates of urban land expansion, and the largest change in total urban extent has occurred in North America. Across all regions and for all three decades, urban land expansion rates are higher than or equal to urban population growth rates, suggesting that urban growth is becoming more expansive than compact. Annual growth in GDP per capita drives approximately half of the observed urban land expansion in China but only moderately affects urban expansion in India and Africa, where urban land expansion is driven more by urban population growth. In high income countries, rates of urban land expansion are slower and increasingly related to GDP growth. However, in North America, population growth contributes more to urban expansion than it does in Europe. Much of the observed variation in urban expansion was not captured by either population, GDP, or other variables in the model. This suggests that contemporary urban expansion is related to a variety of factors difficult to observe comprehensively at the global level, including international capital flows, the informal economy, land use policy, and generalized transport costs. Using the results from the global model, we develop forecasts for new urban land cover using SRES Scenarios. Our results show that by 2030, global urban land cover will increase between 430,000 km² and 12,568,000 km², with an estimate of 1,527,000 km² more likely.
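The 2030 forecasts come from a fitted global model, not a simple growth formula. As a purely illustrative aside, a constant-rate compound-growth projection shows how sensitive such multi-decade extrapolations are to the assumed annual rate:

```python
def project_area(area_now, annual_rate, years):
    """Compound-growth projection (illustrative only; the paper's forecasts
    come from a fitted global model driven by GDP and population, not this)."""
    return area_now * (1.0 + annual_rate) ** years
```

For example, projecting the same base area 30 years forward at 3% versus 6% per year roughly doubles versus quintuples it, which is the kind of spread reflected in the paper's wide forecast interval.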
TL;DR: This work has combined targeted and non-targeted NMR, GC-MS and LC-MS methods with computer-aided literature mining to identify and quantify a comprehensive, if not absolutely complete, set of metabolites commonly detected and quantified (with today's technology) in the human serum metabolome.
Abstract: Continuing improvements in analytical technology, along with increased interest in performing comprehensive, quantitative metabolic profiling, are leading to increased pressure within the metabolomics community to develop centralized metabolite reference resources for certain clinically important biofluids, such as cerebrospinal fluid, urine and blood. As part of an ongoing effort to systematically characterize the human metabolome through the Human Metabolome Project, we have undertaken the task of characterizing the human serum metabolome. In doing so, we have combined targeted and non-targeted NMR, GC-MS and LC-MS methods with computer-aided literature mining to identify and quantify a comprehensive, if not absolutely complete, set of metabolites commonly detected and quantified (with today's technology) in the human serum metabolome. Our use of multiple metabolomics platforms and technologies allowed us to substantially enhance the level of metabolome coverage while critically assessing the relative strengths and weaknesses of these platforms or technologies. Tables containing the complete set of 4229 confirmed and highly probable human serum compounds, their concentrations, related literature references and links to their known disease associations are freely available at http://www.serummetabolome.ca.
TL;DR: OSLOM (Order Statistics Local Optimization Method), the first method capable of detecting clusters in networks while accounting for edge directions, edge weights, overlapping communities, hierarchies and community dynamics, is presented.
Abstract: Community structure is one of the main structural features of networks, revealing both their internal organization and the similarity of their elementary units. Despite the large variety of methods proposed to detect communities in graphs, there is a great need for multi-purpose techniques, able to handle different types of datasets and the subtleties of community structure. In this paper we present OSLOM (Order Statistics Local Optimization Method), the first method capable of detecting clusters in networks while accounting for edge directions, edge weights, overlapping communities, hierarchies and community dynamics. It is based on the local optimization of a fitness function expressing the statistical significance of clusters with respect to random fluctuations, which is estimated with tools of Extreme and Order Statistics. OSLOM can be used alone or as a refinement procedure for partitions/covers delivered by other techniques. We have also implemented sequential algorithms combining OSLOM with other fast techniques, so that the community structure of very large networks can be uncovered. Our method performs comparably to the best existing algorithms on artificial benchmark graphs. Several applications to real networks are shown as well. OSLOM is implemented in freely available software (http://www.oslom.org), and we believe it will be a valuable tool in the analysis of networks.
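The statistical-significance idea can be caricatured with a binomial null model: how surprising is it that a node places so many of its links inside one community if links landed there at random? This is a deliberately simplified sketch of the intuition, not OSLOM's actual order-statistics estimator:

```python
from math import comb

def internal_link_pvalue(k_in, k, p):
    """P(a node with degree k has >= k_in neighbours inside a community),
    under a null where each link independently lands inside with prob. p.

    Small values suggest the node's attachment to the community is unlikely
    to be a random fluctuation (the core idea behind significance-based
    community detection; the real method is considerably more refined).
    """
    return sum(comb(k, i) * p**i * (1 - p) ** (k - i)
               for i in range(k_in, k + 1))
```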
TL;DR: The use of information embedded in the Twitter stream is examined to (1) track rapidly-evolving public sentiment with respect to H1N1 or swine flu, and (2) track and measure actual disease activity.
Abstract: Twitter is a free social networking and micro-blogging service that enables its millions of users to send and read each other's “tweets,” or short, 140-character messages. The service has more than 190 million registered users and processes about 55 million tweets per day. Useful information about news and geopolitical events lies embedded in the Twitter stream, which embodies, in the aggregate, Twitter users' perspectives and reactions to current events. By virtue of sheer volume, content embedded in the Twitter stream may be useful for tracking or even forecasting behavior if it can be extracted in an efficient manner. In this study, we examine the use of information embedded in the Twitter stream to (1) track rapidly-evolving public sentiment with respect to H1N1 or swine flu, and (2) track and measure actual disease activity. We also show that Twitter can be used as a measure of public interest or concern about health-related events. Our results show that estimates of influenza-like illness derived from Twitter chatter accurately track reported disease levels.
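A minimal version of the tracking signal is the daily fraction of tweets matching flu-related keywords, which can then be compared against reported disease levels. The keyword list below is an assumption for illustration; the study used more elaborate extraction methods:

```python
FLU_TERMS = ("flu", "h1n1", "swine flu")  # assumed keyword list, illustrative

def daily_flu_fraction(tweets_by_day):
    """For each day's list of tweets, return the fraction mentioning any
    flu-related term (case-insensitive substring match)."""
    return [sum(any(t in tw.lower() for t in FLU_TERMS) for tw in day) / len(day)
            for day in tweets_by_day]
```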
TL;DR: Western blotting and confocal microscopic analyses revealed that among the four 2As, the one derived from porcine teschovirus-1 (P2A) has the highest cleavage efficiency in all the contexts examined.
Abstract: When expression of more than one gene is required in cells, bicistronic or multicistronic expression vectors have been used. Among various strategies employed to construct bicistronic or multicistronic vectors, an internal ribosomal entry site (IRES) has been widely used. Due to the large size and difference in expression levels between genes before and after IRES, however, a new strategy was required to replace IRES. A self-cleaving 2A peptide could be a good candidate to replace IRES because of its small size and high cleavage efficiency between genes upstream and downstream of the 2A peptide. Despite the advantages of the 2A peptides, its use is not widespread because (i) there are no publicly available cloning vectors harboring a 2A peptide gene and (ii) comprehensive comparison of cleavage efficiency among various 2A peptides reported to date has not been performed in different contexts. Here, we generated four expression plasmids each harboring different 2A peptides derived from the foot-and-mouth disease virus, equine rhinitis A virus, Thosea asigna virus and porcine teschovirus-1, respectively, and evaluated their cleavage efficiency in three commonly used human cell lines, zebrafish embryos and adult mice. Western blotting and confocal microscopic analyses revealed that among the four 2As, the one derived from porcine teschovirus-1 (P2A) has the highest cleavage efficiency in all the contexts examined. We anticipate that the 2A-harboring cloning vectors we generated and the highest efficiency of the P2A peptide we demonstrated would help biomedical researchers easily adopt the 2A technology when bicistronic or multicistronic expression is required.
TL;DR: The resolution of the primate phylogeny provides an essential evolutionary framework with far-reaching applications including: human selection and adaptation, global emergence of zoonotic diseases, mammalian comparative genomics, primate taxonomy, and conservation of endangered species.
Abstract: Comparative genomic analyses of primates offer considerable potential to define and understand the processes that mold, shape, and transform the human genome. However, primate taxonomy is both complex and controversial, with marginal unifying consensus of the evolutionary hierarchy of extant primate species. Here we provide new genomic sequence (~8 Mb) from 186 primates representing 61 (~90%) of the described genera, and we include outgroup species from Dermoptera, Scandentia, and Lagomorpha. The resultant phylogeny is exceptionally robust and illuminates events in primate evolution from ancient to recent, clarifying numerous taxonomic controversies and providing new data on human evolution. Ongoing speciation, reticulate evolution, ancient relic lineages, unequal rates of evolution, and disparate distributions of insertions/deletions among the reconstructed primate lineages are uncovered. Our resolution of the primate phylogeny provides an essential evolutionary framework with far-reaching applications including: human selection and adaptation, global emergence of zoonotic diseases, mammalian comparative genomics, primate taxonomy, and conservation of endangered species.
TL;DR: It is estimated that daily occupation-related energy expenditure has decreased by more than 100 calories, and this reduction in energy expenditure accounts for a significant portion of the increase in mean U.S. body weights for women and men over the last 50 years.
Abstract: Background The true causes of the obesity epidemic are not well understood, and there are few longitudinal population-based data published examining this issue. The objective of this analysis was to examine trends in occupational physical activity during the past 5 decades and explore how these trends relate to concurrent changes in body weight in the US. Methodology/Principal Findings Analysis of energy expenditure for occupations in US private industry since 1960 using data from the US Bureau of Labor Statistics. Mean body weight was derived from the US National Health and Nutrition Examination Surveys (NHANES). In the early 1960s, almost half the jobs in private industry in the US required at least moderate-intensity physical activity, whereas now less than 20% demand this level of energy expenditure. Since 1960, the estimated mean daily energy expenditure due to work-related physical activity has dropped by more than 100 calories in both women and men. Energy balance model predicted weights, based on change in occupation-related daily energy expenditure since 1960 for each NHANES examination period, closely matched the actual change in weight for 40–50 year old men and women. For example, from 1960–62 to 2003–06 we estimated that the occupation-related daily energy expenditure decreased by 142 calories in men. Given a baseline weight of 76.9 kg in 1960–62, we estimated that a 142-calorie reduction would result in an increase in mean weight to 89.7 kg, which closely matched the mean NHANES weight of 91.8 kg in 2003–06. The results were similar for women. Conclusion Over the last 50 years in the US, we estimate that daily occupation-related energy expenditure has decreased by more than 100 calories, and this reduction in energy expenditure accounts for a significant portion of the increase in mean US body weights for women and men.
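The quoted figures imply a roughly linear energy-balance relationship: a 142-calorie daily reduction maps to about a 12.8 kg change in equilibrium weight, i.e. a slope near 11 kcal/day per kg. The sketch below uses that back-of-envelope slope, chosen only to be consistent with the numbers above; it is not the validated energy balance model the authors used:

```python
def equilibrium_weight_change(delta_kcal_per_day, kcal_per_kg_per_day=11.0):
    """Back-of-envelope energy-balance estimate (assumed linear model).

    If maintenance expenditure rises roughly linearly with body weight at
    `kcal_per_kg_per_day`, a sustained change in daily energy balance shifts
    the equilibrium weight by delta / slope. The default slope is reverse-
    engineered from the abstract's 142 kcal -> ~12.8 kg example.
    """
    return delta_kcal_per_day / kcal_per_kg_per_day
```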
TL;DR: Overall, HIV incidence in the United States was relatively stable during 2006–2009; however, among young MSM, particularly black/African American MSM, incidence increased, and expanded, improved, and targeted prevention is necessary to reduce HIV incidence.
Abstract: Background The estimated number of new HIV infections in the United States reflects the leading edge of the epidemic. Previously, CDC estimated HIV incidence in the United States in 2006 as 56,300 (95% CI: 48,200–64,500). We updated the 2006 estimate and calculated incidence for 2007–2009 using improved methodology. Methodology We estimated incidence using incidence surveillance data from 16 states and 2 cities and a modification of our previously described stratified extrapolation method based on a sample survey approach with multiple imputation, stratification, and extrapolation to account for missing data and heterogeneity of HIV testing behavior among population groups. Principal Findings Estimated HIV incidence among persons aged 13 years and older was 48,600 (95% CI: 42,400–54,700) in 2006, 56,000 (95% CI: 49,100–62,900) in 2007, 47,800 (95% CI: 41,800–53,800) in 2008 and 48,100 (95% CI: 42,200–54,000) in 2009. From 2006 to 2009 incidence did not change significantly overall or among specific race/ethnicity or risk groups. However, there was a 21% (95% CI:1.9%–39.8%; p = 0.017) increase in incidence for people aged 13–29 years, driven by a 34% (95% CI: 8.4%–60.4%) increase in young men who have sex with men (MSM). There was a 48% increase among young black/African American MSM (12.3%–83.0%; p<0.001). Among people aged 13–29, only MSM experienced significant increases in incidence, and among 13–29 year-old MSM, incidence increased significantly among young, black/African American MSM. In 2009, MSM accounted for 61% of new infections, heterosexual contact 27%, injection drug use (IDU) 9%, and MSM/IDU 3%. Conclusions/Significance Overall, HIV incidence in the United States was relatively stable 2006–2009; however, among young MSM, particularly black/African American MSM, incidence increased. 
HIV continues to be a major public health burden, disproportionately affecting several populations in the United States, especially MSM and racial and ethnic minorities. Expanded, improved, and targeted prevention is necessary to reduce HIV incidence.
TL;DR: Large scale programs, such as the NSF-sponsored DataNET will both bring attention and resources to the issue and make it easier for scientists to apply sound data management principles.
Abstract: Background: Scientific research in the 21st century is more data intensive and collaborative than in the past. It is important to study the data practices of researchers – data accessibility, discovery, re-use, preservation and, particularly, data sharing. Data sharing is a valuable part of the scientific method allowing for verification of results and extending research from prior results. Methodology/Principal Findings: A total of 1329 scientists participated in this survey exploring current data sharing practices and perceptions of the barriers and enablers of data sharing. Scientists do not make their data electronically available to others for various reasons, including insufficient time and lack of funding. Most respondents are satisfied with their current processes for the initial and short-term parts of the data or research lifecycle (collecting their research data; searching for, describing or cataloging, analyzing, and short-term storage of their data) but are not satisfied with long-term data preservation. Many organizations do not provide support to their researchers for data management in either the short or long term. If certain conditions are met (such as formal citation and sharing reprints) respondents agree they are willing to share their data. There are also significant differences in data management practices and approaches based on primary funding agency, subject discipline, age, work focus, and world region. Conclusions/Significance: Barriers to effective data sharing and preservation are deeply rooted in the practices and culture of the research process as well as the researchers themselves. New mandates for data management plans from NSF and other federal agencies and world-wide attention to the need to share and preserve data could lead to changes.
Large scale programs, such as the NSF-sponsored DataNET (including projects like DataONE) will both bring attention and resources to the issue and make it easier for scientists to apply sound data management principles.
TL;DR: Surprisingly, it is found that the strength of these inferred couplings is an excellent predictor of residue-residue proximity in folded structures, and the top-scoring residue couplings are sufficiently accurate and well-distributed to define the 3D protein fold with remarkable accuracy.
Abstract: The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. Deciphering the evolutionary record held in these sequences and exploiting it for predictive and engineering purposes presents a formidable challenge. The potential benefit of solving this challenge is amplified by the advent of inexpensive high-throughput genomic sequencing. In this paper we ask whether we can infer evolutionary constraints from a set of sequence homologs of a protein. The challenge is to distinguish true co-evolution couplings from the noisy set of observed correlations. We address this challenge using a maximum entropy model of the protein sequence, constrained by the statistics of the multiple sequence alignment, to infer residue pair couplings. Surprisingly, we find that the strength of these inferred couplings is an excellent predictor of residue-residue proximity in folded structures. Indeed, the top-scoring residue couplings are sufficiently accurate and well-distributed to define the 3D protein fold with remarkable accuracy. We quantify this observation by computing, from sequence alone, all-atom 3D structures of fifteen test proteins from different fold classes, ranging in size from 50 to 260 residues, including a G-protein coupled receptor. These blinded inferences are de novo, i.e., they do not use homology modeling or sequence-similar fragments from known structures. The co-evolution signals provide sufficient information to determine accurate 3D protein structure to 2.7–4.8 Å Cα-RMSD error relative to the observed structure, over at least two-thirds of the protein (method called EVfold, details at http://EVfold.org).
This discovery provides insight into essential interactions constraining protein evolution and will facilitate a comprehensive survey of the universe of protein structures, new strategies in protein and drug design, and the identification of functional genetic variants in normal and disease genomes.
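A far simpler co-variation score than the paper's maximum-entropy couplings is per-column-pair mutual information over the multiple sequence alignment. It illustrates the raw co-evolution signal but, unlike the maxent model, does not remove transitive correlations, which is precisely why the paper's approach works so much better for contact prediction:

```python
from collections import Counter
from math import log2

def column_mi(msa, i, j):
    """Mutual information (bits) between alignment columns i and j.

    msa: list of equal-length sequences. High MI means the two columns
    co-vary; the maxent couplings in the paper refine this raw signal by
    disentangling direct from indirect (transitive) correlations.
    """
    n = len(msa)
    pi = Counter(s[i] for s in msa)
    pj = Counter(s[j] for s in msa)
    pij = Counter((s[i], s[j]) for s in msa)
    return sum((c / n) * log2((c / n) / ((pi[a] / n) * (pj[b] / n)))
               for (a, b), c in pij.items())
```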
TL;DR: The use and reporting of the Delphi method for quality indicator selection need to be improved; this study provides guidance to improve the use and reporting of the method in future surveys.
Abstract: Objective The Delphi technique is a structured process commonly used to develop healthcare quality indicators, but there is little guidance for researchers who wish to use it. This study aimed 1) to describe reporting of the Delphi method to develop quality indicators, 2) to discuss specific methodological skills for quality indicator selection, and 3) to give guidance about this practice.
TL;DR: The process of selecting and refining a plant barcode is reviewed; the factors which influence the discriminatory power of the approach are evaluated; some early applications of plant barcoding and major emerging projects are summarised; and the tool development that will be necessary for plant DNA barcoding to advance is outlined.
Abstract: The main aim of DNA barcoding is to establish a shared community resource of DNA sequences that can be used for organismal identification and taxonomic clarification. This approach was successfully pioneered in animals using a portion of the cytochrome oxidase 1 (CO1) mitochondrial gene. In plants, establishing a standardized DNA barcoding system has been more challenging. In this paper, we review the process of selecting and refining a plant barcode; evaluate the factors which influence the discriminatory power of the approach; describe some early applications of plant barcoding and summarise major emerging projects; and outline tool development that will be necessary for plant DNA barcoding to advance.
TL;DR: It is shown that long-term inhibition of mTOR by rapamycin prevented AD-like cognitive deficits and lowered levels of Aβ42, a major toxic species in AD, in the PDAPP transgenic mouse model.
Abstract: Background Reduced TOR signaling has been shown to significantly increase lifespan in a variety of organisms. It was recently demonstrated that long-term treatment with rapamycin, an inhibitor of the mTOR pathway, or ablation of the mTOR target p70S6K, extends lifespan in mice, possibly by delaying aging. Whether inhibition of the mTOR pathway would delay or prevent age-associated disease such as AD remained to be determined.
TL;DR: A high level of biodiversity among MRSA, especially among strains harbouring SCCmec IV and V elements is shown, and the data indicate a high rate of genetic recombination in MRSA involving SCC elements, bacteriophages or other mobile genetic elements and large-scale chromosomal replacements.
Abstract: In recent years, methicillin-resistant Staphylococcus aureus (MRSA) have become a truly global challenge. In addition to the long-known healthcare-associated clones, novel strains have also emerged outside of the hospital settings, in the community as well as in livestock. The emergence and spread of virulent clones expressing Panton-Valentine leukocidin (PVL) is an additional cause for concern. In order to provide an overview of pandemic, epidemic and sporadic strains, more than 3,000 clinical and veterinary isolates of MRSA mainly from Germany, the United Kingdom, Ireland, France, Malta, Abu Dhabi, Hong Kong, Australia, Trinidad & Tobago as well as some reference strains from the United States have been genotyped by DNA microarray analysis. This technique allowed the assignment of the MRSA isolates to 34 distinct lineages which can be clearly defined based on non-mobile genes. The results were in accordance with data from multilocus sequence typing. More than 100 different strains were distinguished based on affiliation to these lineages, SCCmec type and the presence or absence of PVL. These strains are described here mainly with regard to clinically relevant antimicrobial resistance- and virulence-associated markers, but also in relation to epidemiology and geographic distribution. The findings of the study show a high level of biodiversity among MRSA, especially among strains harbouring SCCmec IV and V elements. The data also indicate a high rate of genetic recombination in MRSA involving SCC elements, bacteriophages or other mobile genetic elements and large-scale chromosomal replacements.
TL;DR: A compilation of continuous, high-resolution time series of upper ocean pH, collected using autonomous sensors, over a variety of ecosystems ranging from polar to tropical, open-ocean to coastal, kelp forest to coral reef, reveals a continuum of month-long pH variability with characteristic diel, semi-diurnal, and stochastic patterns of varying amplitudes.
Abstract: The effect of Ocean Acidification (OA) on marine biota is quasi-predictable at best. While perturbation studies, in the form of incubations under elevated pCO2, reveal sensitivities and responses of individual species, one missing link in the OA story results from a chronic lack of pH data specific to a given species’ natural habitat. Here, we present a compilation of continuous, high-resolution time series of upper ocean pH, collected using autonomous sensors, over a variety of ecosystems ranging from polar to tropical, open-ocean to coastal, kelp forest to coral reef. These observations reveal a continuum of month-long pH variability with standard deviations from 0.004 to 0.277 and ranges spanning 0.024 to 1.430 pH units. The nature of the observed variability was also highly site-dependent, with characteristic diel, semi-diurnal, and stochastic patterns of varying amplitudes. These biome-specific pH signatures disclose current levels of exposure to both high and low dissolved CO2, often demonstrating that resident organisms are already experiencing pH regimes that are not predicted until 2100. Our data provide a first step toward crystallizing the biophysical link between environmental history of pH exposure and physiological resilience of marine organisms to fluctuations in seawater CO2. Knowledge of this spatial and temporal variation in seawater chemistry allows us to improve the design of OA experiments: we can test organisms with a priori expectations of their tolerance guardrails, based on their natural range of exposure. Such hypothesis-testing will provide a deeper understanding of the effects of OA. Both intuitively simple to understand and powerfully informative, these and similar comparative time series can help guide management efforts to identify areas of marine habitat that can serve as refugia to acidification as well as areas that are particularly vulnerable to future ocean change.
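The reported variability statistics are straightforward to compute from a pH series. A minimal sketch of the two quantities quoted above, the month-long standard deviation and the total range:

```python
import statistics

def ph_variability(ph):
    """Return (standard deviation, range) of a pH time series, the two
    summary statistics reported for each sensor deployment in the text."""
    return statistics.pstdev(ph), max(ph) - min(ph)
```

Characterizing diel or semi-diurnal structure on top of this would require spectral or harmonic analysis of the same series.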
TL;DR: A hierarchical modular cloning system that allows the creation at will and with high efficiency of any eukaryotic multigene construct, starting from libraries of defined and validated basic modules containing regulatory and coding sequences.
Abstract: The field of synthetic biology promises to revolutionize biotechnology through the design of organisms with novel phenotypes useful for medicine, agriculture and industry. However, a limiting factor is the ability of current methods to assemble complex DNA molecules encoding multiple genetic elements in various predefined arrangements. We present here a hierarchical modular cloning system that allows the creation at will and with high efficiency of any eukaryotic multigene construct, starting from libraries of defined and validated basic modules containing regulatory and coding sequences. This system is based on the ability of type IIS restriction enzymes to assemble multiple DNA fragments in a defined linear order. We constructed a 33 kb DNA molecule containing 11 transcription units made from 44 individual basic modules in only three successive cloning steps. This modular cloning (MoClo) system can be readily automated and will be extremely useful for applications such as gene stacking and metabolic engineering.
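The ordering principle behind type IIS assembly can be illustrated with a toy model: each module carries defined overhangs (fusion sites) exposed by digestion, and ligation can only join matching ends, which fixes the linear order of the construct. The module names and 4-nt overhangs below are illustrative, not a lab protocol:

```python
def assemble(modules, start, end):
    """Order DNA modules into one linear construct by matching fusion sites.

    Each module is (left_overhang, name, right_overhang). Type IIS digestion
    exposes these overhangs, so ligation can only join matching ends, which
    determines the assembly order. Toy model for illustration only.
    """
    by_left = {left: (name, right) for left, name, right in modules}
    order, cursor = [], start
    while cursor != end:
        # the next module is the one whose left overhang matches the cursor
        name, cursor = by_left[cursor]
        order.append(name)
    return order

# Hypothetical modules with 4-nt fusion sites
modules = [
    ("AATG", "CDS", "GCTT"),
    ("GGAG", "promoter", "AATG"),
    ("GCTT", "terminator", "CGCT"),
]
print(assemble(modules, start="GGAG", end="CGCT"))
# → ['promoter', 'CDS', 'terminator']
```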
TL;DR: The data suggest the existence of a core pulmonary bacterial microbiome that includes Pseudomonas, Streptococcus, Prevotella, Fusobacterium, Haemophilus, Veillonella, and Porphyromonas, and reveal significant micro-anatomic differences in bacterial communities within the same lung of subjects with advanced COPD.
Abstract: Although culture-independent techniques have shown that the lungs are not sterile, little is known about the lung microbiome in chronic obstructive pulmonary disease (COPD). We used pyrosequencing of 16S amplicons to analyze the lung microbiome in two ways: first, using bronchoalveolar lavage (BAL) to sample the distal bronchi and air-spaces; and second, by examining multiple discrete tissue sites in the lungs of six subjects removed at the time of transplantation. We performed BAL on three never-smokers (NS) with normal spirometry, seven smokers with normal spirometry (“healthy smokers”, HS), and four subjects with COPD (CS). Bacterial 16S sequences were found in all subjects, without significant quantitative differences between groups. Both taxonomy-based and taxonomy-independent approaches disclosed heterogeneity in the bacterial communities between HS subjects that was similar to that seen in healthy NS and two mild COPD patients. The moderate and severe COPD patients had very limited community diversity, which was also noted in 28% of the healthy subjects. Both approaches revealed extensive membership overlap between the bacterial communities of the three study groups. No genera were common within a group but unique across groups. Our data suggest the existence of a core pulmonary bacterial microbiome that includes Pseudomonas, Streptococcus, Prevotella, Fusobacterium, Haemophilus, Veillonella, and Porphyromonas. Most strikingly, there were significant micro-anatomic differences in bacterial communities within the same lung of subjects with advanced COPD. These studies further demonstrate the existence of a pulmonary microbiome and highlight global and micro-anatomic changes in these bacterial communities in severe COPD patients.
TL;DR: It is suggested that by stepwise gain and loss of chromosomal and plasmid-encoded virulence factors, a highly pathogenic hybrid of EAEC and EHEC emerged as the current outbreak clone.
Abstract: An ongoing outbreak of exceptionally virulent Shiga toxin (Stx)-producing Escherichia coli O104:H4, centered in Germany, has caused over 830 cases of hemolytic uremic syndrome (HUS) and 46 deaths since May 2011. Serotype O104:H4, which has not been detected in animals, has rarely been associated with HUS in the past. To prospectively elucidate the unique characteristics of this strain in the early stages of this outbreak, we applied whole genome sequencing on the Life Technologies Ion Torrent PGM™ sequencer and Optical Mapping to characterize one outbreak isolate (LB226692) and a historic O104:H4 HUS isolate from 2001 (01-09591). Reference-guided draft assemblies of both strains were completed with the newly introduced PGM™ within 62 hours. The HUS-associated strains both carried genes typically found in two types of pathogenic E. coli, enteroaggregative E. coli (EAEC) and enterohemorrhagic E. coli (EHEC). Phylogenetic analyses of 1,144 core E. coli genes indicate that the HUS-causing O104:H4 strains and the previously published sequence of the EAEC strain 55989 show a close relationship but are only distantly related to common EHEC serotypes. Though closely related, the outbreak strain differs from the 2001 strain in plasmid content and fimbrial genes. We propose a model in which EAEC 55989 and EHEC O104:H4 strains evolved from a common EHEC O104:H4 progenitor, and suggest that by stepwise gain and loss of chromosomal and plasmid-encoded virulence factors, a highly pathogenic hybrid of EAEC and EHEC emerged as the current outbreak clone. In conclusion, rapid next-generation technologies facilitated prospective whole genome characterization in the early stages of an outbreak.
TL;DR: Expressions made on the online, global microblog and social networking service Twitter are examined, uncovering and explaining temporal variations in happiness and information levels over timescales ranging from hours to years.
Abstract: Individual happiness is a fundamental societal metric. Normally measured through self-report, happiness has often been indirectly characterized and overshadowed by more readily quantifiable economic indicators such as gross domestic product. Here, we examine expressions made on the online, global microblog and social networking service Twitter, uncovering and explaining temporal variations in happiness and information levels over timescales ranging from hours to years. Our data set comprises over 46 billion words contained in nearly 4.6 billion expressions posted over a 33 month span by over 63 million unique users. In measuring happiness, we construct a tunable, real-time, remote-sensing, non-invasive, text-based hedonometer. In building our metric, made available with this paper, we conducted a survey to obtain happiness evaluations of over 10,000 individual words, representing a tenfold size improvement over similar existing word sets. Rather than being ad hoc, our word list is chosen solely by frequency of usage, and we show how a highly robust and tunable metric can be constructed and defended.
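The core of such a text-based hedonometer is a frequency-weighted average of per-word happiness ratings, with a tunable "lens" that excludes near-neutral words. The sketch below is a minimal illustration; the word ratings are hypothetical stand-ins for the ~10,000-word survey-derived list described above:

```python
def hedonometer(text, happiness, exclude=(4.0, 6.0)):
    """Average word happiness of a text, with a tunable neutral band.

    happiness: dict mapping words to mean happiness ratings (e.g. on a 1-9
    scale); hypothetical values here, not the paper's published list.
    Words whose rating falls inside the `exclude` band are dropped,
    which is what makes the metric tunable.
    """
    scores = [happiness[w] for w in text.lower().split()
              if w in happiness and not exclude[0] <= happiness[w] <= exclude[1]]
    return sum(scores) / len(scores) if scores else None

# Hypothetical ratings for illustration only
ratings = {"love": 8.4, "happy": 8.3, "the": 4.98, "pandemic": 2.1}
print(hedonometer("love the happy pandemic", ratings))
# "the" is near-neutral and excluded; the other three words are averaged
```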
TL;DR: High intake of red and processed meat is associated with significantly increased risk of colorectal, colon, and rectal cancers, and the overall evidence from prospective studies supports limiting red and processed meat consumption as one of the dietary recommendations for the prevention of colorectal cancer.
Abstract: Background The evidence that red and processed meat influences colorectal carcinogenesis was judged convincing in the 2007 World Cancer Research Fund/American Institute of Cancer Research report. Since then, ten prospective studies have published new results. Here we update the evidence from prospective studies and explore whether there is a non-linear association of red and processed meats with colorectal cancer risk. Methods and Findings Relevant prospective studies were identified in PubMed until March 2011. For each study, relative risks and 95% confidence intervals (CI) were extracted and pooled with a random-effects model, weighting for the inverse of the variance, in highest- versus lowest-intake comparisons and dose-response meta-analyses. Intake of red and processed meat was associated with increased colorectal cancer risk. The summary relative risk (RR) of colorectal cancer for the highest versus the lowest intake was 1.22 (95% CI = 1.11−1.34) and the RR for every 100 g/day increase was 1.14 (95% CI = 1.04−1.24). Non-linear dose-response meta-analyses revealed that colorectal cancer risk increases approximately linearly with increasing intake of red and processed meats up to approximately 140 g/day, where the curve approaches its plateau. The associations were similar for colon and rectal cancer risk. When analyzed separately, colorectal cancer risk was related to intake of fresh red meat (RR for 100 g/day increase = 1.17, 95% CI = 1.05−1.31) and processed meat (RR for 50 g/day increase = 1.18, 95% CI = 1.10−1.28). Similar results were observed for colon cancer, but for rectal cancer, no significant associations were observed. Conclusions High intake of red and processed meat is associated with significantly increased risk of colorectal, colon and rectal cancers. The overall evidence from prospective studies supports limiting red and processed meat consumption as one of the dietary recommendations for the prevention of colorectal cancer.
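The inverse-variance random-effects pooling described in the Methods can be sketched with the standard DerSimonian-Laird estimator. The three study results below are hypothetical, not the studies from this meta-analysis:

```python
import math

def pool_rr(studies, z=1.96):
    """DerSimonian-Laird random-effects pooling of relative risks.

    studies: list of (rr, ci_low, ci_high) tuples. Variances are recovered
    from the 95% CIs on the log scale; a minimal sketch, not a full
    meta-analysis package.
    """
    y = [math.log(r) for r, lo, hi in studies]               # log relative risks
    v = [((math.log(hi) - math.log(lo)) / (2 * z)) ** 2      # variances from CIs
         for r, lo, hi in studies]
    w = [1 / vi for vi in v]                                 # fixed-effect weights
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, y))   # heterogeneity Q
    k = len(studies)
    tau2 = max(0.0, (q - (k - 1)) /
               (sum(w) - sum(wi ** 2 for wi in w) / sum(w))) # between-study variance
    wr = [1 / (vi + tau2) for vi in v]                       # random-effects weights
    mu = sum(wi * yi for wi, yi in zip(wr, y)) / sum(wr)
    se = math.sqrt(1 / sum(wr))
    return math.exp(mu), math.exp(mu - z * se), math.exp(mu + z * se)

# Hypothetical study results: (RR, 95% CI low, 95% CI high)
rr, lo, hi = pool_rr([(1.25, 1.05, 1.49), (1.10, 0.95, 1.27), (1.30, 1.08, 1.56)])
```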
TL;DR: It is found that transnational corporations form a giant bow-tie structure and that a large portion of control flows to a small tightly-knit core of financial institutions that can be seen as an economic “super-entity” that raises new important issues both for researchers and policy makers.
Abstract: The structure of the control network of transnational corporations affects global market competition and financial stability. So far, only small national samples were studied and there was no appropriate methodology to assess control globally. We present the first investigation of the architecture of the international ownership network, along with the computation of the control held by each global player. We find that transnational corporations form a giant bow-tie structure and that a large portion of control flows to a small tightly-knit core of financial institutions. This core can be seen as an economic “super-entity” that raises new important issues both for researchers and policy makers.
TL;DR: It is shown that LeaderRank outperforms PageRank in terms of ranking effectiveness as well as robustness against manipulation and noisy data; these results suggest that leaders who are aware of their clout may reinforce the development of social networks, and thus the power of collective search.
Abstract: Finding pertinent information is not limited to search engines. Online communities can amplify the influence of a small number of power users for the benefit of all other users. Users' information foraging in depth and breadth can be greatly enhanced by choosing suitable leaders. For instance, in delicious.com, users subscribe to leaders' collections, which leads to a deeper and wider reach not achievable with search engines. To consolidate such collective search, it is essential to utilize the leadership topology and identify influential users. Google's PageRank, a successful search algorithm in the World Wide Web, turns out to be less effective in networks of people. We thus devise an adaptive and parameter-free algorithm, LeaderRank, to quantify user influence. We show that LeaderRank outperforms PageRank in terms of ranking effectiveness, as well as robustness against manipulations and noisy data. These results suggest that leaders who are aware of their clout may reinforce the development of social networks, and thus the power of collective search.
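The idea behind LeaderRank is to replace PageRank's damping parameter with a "ground node" linked bidirectionally to every user, then run a plain random walk to convergence and redistribute the ground node's score. A minimal sketch of this scheme, on a toy follower network:

```python
def leaderrank(edges, n, iters=200):
    """LeaderRank on a directed graph with nodes 0..n-1.

    A ground node (index n) is linked bidirectionally to every node, which
    makes the walk parameter-free (no damping factor needed). After the
    iteration, the ground node's score is shared evenly among real nodes.
    Minimal sketch, not the authors' reference implementation.
    """
    # Add the ground node's bidirectional links
    all_edges = (list(edges)
                 + [(i, n) for i in range(n)]
                 + [(n, i) for i in range(n)])
    out_deg = [0] * (n + 1)
    in_nb = [[] for _ in range(n + 1)]
    for u, v in all_edges:
        out_deg[u] += 1
        in_nb[v].append(u)
    s = [1.0] * n + [0.0]               # unit score on real nodes, none on ground
    for _ in range(iters):
        # each node passes its score evenly along its outgoing links
        s = [sum(s[j] / out_deg[j] for j in in_nb[i]) for i in range(n + 1)]
    ground = s[n] / n                    # redistribute the ground node's score
    return [si + ground for si in s[:n]]

# Toy network: users 1-3 all follow user 0
scores = leaderrank([(1, 0), (2, 0), (3, 0)], n=4)
# user 0 ends up with the highest influence score
```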
TL;DR: This is the first large series to demonstrate a compositional change in the microbiota of colon cancer patients, with a possible impact on the mucosal immune response; 80% of all sequences could be assigned to a total of 819 taxa using the default parameters of the Classifier software.
Abstract: The composition of the human intestinal microbiota is linked to health status. The aim was to analyze the microbiota of normal and colon cancer patients in order to establish cancer-related dysbiosis. Patients and Methods: Stool bacterial DNA was extracted prior to colonoscopy from 179 patients: 60 with colorectal cancer, and 119 with normal colonoscopy. Bacterial genes obtained by pyrosequencing of 12 stool samples (6 Normal and 6 Cancer) were subjected to a validated Principal Component Analysis (PCA) test. The dominant and subdominant bacterial populations (C. leptum, C. coccoides, Bacteroides/Prevotella, and Lactobacillus/Leuconostoc/Pediococcus groups, Bifidobacterium genus, and E. coli and Faecalibacterium prausnitzii species) were quantified in all individuals using qPCR, and specific IL17-producing cells in the intestinal mucosa were characterized using immunohistochemistry. Findings: Pyrosequencing (minimum read length 200 nucleotides) revealed that 80% of all sequences could be assigned to a total of 819 taxa using the default parameters of the Classifier software. The phylogenetic core in Cancer individuals was different from that in Normal individuals according to the PCA analysis, with trends towards differences in the dominant and subdominant families of bacteria. All-bacteria counts [log10(bacteria/g of stool)] in Normal and Cancer individuals were similar [11.88 +/- 0.35 and 11.80 +/- 0.56, respectively (P = 0.16)] according to qPCR values, whereas among all dominant and subdominant species only those of Bacteroides/Prevotella were higher (All bacteria-specific bacterium; P = 0.009) in Cancer (-1.04 +/- 0.55) than in Normal (-1.40 +/- 0.83) individuals. IL17 immunoreactive cells were expressed significantly more in the normal mucosa of cancer patients than in those with normal colonoscopy.
Conclusion: This is the first large series to demonstrate a compositional change in the microbiota of colon cancer patients, with a possible impact on the mucosal immune response. These data open a new field for mass screening and pathophysiology investigations.
TL;DR: The results can help define a prioritization of control measures, based on preventive measures, case isolation, and class and school closures, that could reduce the disruption to education during epidemics.
Abstract: Little quantitative information is available on the mixing patterns of children in school environments. Describing and understanding contacts between children at school would help quantify the transmission opportunities of respiratory infections and identify situations within schools where the risk of transmission is higher. We report on measurements carried out in a French school (children aged 6-12 years), where we collected data on the time-resolved face-to-face proximity of children and teachers using a proximity-sensing infrastructure based on radio frequency identification devices. Data on face-to-face interactions were collected on Thursday, October 1st and Friday, October 2nd 2009. We recorded 77,602 contact events between 242 individuals (232 children and 10 teachers). In this setting, each child has on average 323 contacts per day with 47 other children, leading to an average daily interaction time of 176 minutes. Most contacts are brief, but long contacts are also observed. Contacts occur mostly within each class, and each child spends on average three times more time in contact with classmates than with children of other classes. We describe the temporal evolution of the contact network and the trajectories followed by the children in the school, which constrain the contact patterns. We determine an exposure matrix aimed at informing mathematical models. This matrix exhibits a class and age structure which is very different from the homogeneous mixing hypothesis. We report on important properties of the contact patterns between school children that are relevant for modeling the propagation of diseases and for evaluating control measures. We discuss public health implications related to the management of schools in case of epidemics and pandemics. Our results can help define a prioritization of control measures, based on preventive measures, case isolation, and class and school closures, that could reduce the disruption to education during epidemics.
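The exposure matrix mentioned above aggregates individual contact events into class-by-class contact time, the mixing structure that transmission models consume. A minimal sketch, with a hypothetical event format (the study's actual data schema may differ):

```python
from collections import defaultdict

def exposure_matrix(contacts, student_class):
    """Aggregate contact events into a class-by-class exposure matrix.

    contacts: iterable of (person_a, person_b, duration_seconds) tuples
    (hypothetical format). student_class: maps each person to a class label.
    Returns total contact time between every ordered pair of classes.
    """
    matrix = defaultdict(float)
    for a, b, dur in contacts:
        ca, cb = student_class[a], student_class[b]
        matrix[(ca, cb)] += dur
        if ca != cb:
            matrix[(cb, ca)] += dur   # keep cross-class entries symmetric
    return dict(matrix)

# Hypothetical pupils and contact events
classes = {"ann": "CP", "bob": "CP", "eve": "CE1"}
events = [("ann", "bob", 120), ("ann", "eve", 20), ("bob", "eve", 10)]
m = exposure_matrix(events, classes)
# Within-class contact time (CP, CP) dominates cross-class time (CP, CE1)
```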
TL;DR: It is found that exposure to even a single metaphor can induce substantial differences in opinion about how to solve social problems: differences that are larger, for example, than pre-existing differences in opinion between Democrats and Republicans.
Abstract: The way we talk about complex and abstract ideas is suffused with metaphor. In five experiments, we explore how these metaphors influence the way that we reason about complex issues and forage for further information about them. We find that even the subtlest instantiation of a metaphor (via a single word) can have a powerful influence over how people attempt to solve social problems like crime and how they gather information to make “well-informed” decisions. Interestingly, we find that the influence of the metaphorical framing effect is covert: people do not recognize metaphors as influential in their decisions; instead they point to more “substantive” (often numerical) information as the motivation for their problem-solving decision. Metaphors in language appear to instantiate frame-consistent knowledge structures and invite structurally consistent inferences. Far from being mere rhetorical flourishes, metaphors have profound influences on how we conceptualize and act with respect to important societal issues. We find that exposure to even a single metaphor can induce substantial differences in opinion about how to solve social problems: differences that are larger, for example, than pre-existing differences in opinion between Democrats and Republicans.
TL;DR: DeconSeq is a robust framework for the rapid, automated identification and removal of sequence contamination in longer-read datasets (150 bp mean read length) and allows scientists to automatically detect and efficiently remove unwanted sequence contamination from their datasets while eliminating critical limitations of current methods.
Abstract: High-throughput sequencing technologies have strongly impacted microbiology, providing a rapid and cost-effective way of generating draft genomes and exploring microbial diversity. However, sequences obtained from impure nucleic acid preparations may contain DNA from sources other than the sample. Such sequence contamination is a serious concern for the quality of the data used for downstream analysis, causing misassembly of sequence contigs and erroneous conclusions. Therefore, the removal of sequence contaminants is a necessary step for all sequencing projects. We developed DeconSeq, a robust framework for the rapid, automated identification and removal of sequence contamination in longer-read datasets (150 bp mean read length). DeconSeq is publicly available as standalone and web-based versions. The results can be exported for subsequent analysis, and the databases used for the web-based version are automatically updated on a regular basis. DeconSeq categorizes possible contamination sequences, eliminates redundant hits with higher similarity to non-contaminant genomes, and provides graphical visualizations of the alignment results and classifications. Using DeconSeq, we conducted an analysis of possible human DNA contamination in 202 previously published microbial and viral metagenomes and found possible contamination in 145 (72%) metagenomes with as high as 64% contaminating sequences. This new framework allows scientists to automatically detect and efficiently remove unwanted sequence contamination from their datasets while eliminating critical limitations of current methods. DeconSeq's web interface is simple and user-friendly. The standalone version allows offline analysis and integration into existing data processing pipelines. DeconSeq's results reveal whether the sequencing experiment has succeeded, whether the correct sample was sequenced, and whether the sample contains any sequence contamination from DNA preparation or host.
In addition, the analysis of 202 metagenomes demonstrated significant contamination of the non-human associated metagenomes, suggesting that this method is appropriate for screening all metagenomes. DeconSeq is available at http://deconseq.sourceforge.net/.
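The core classification step of such a contamination filter can be sketched simply: a read is flagged as contaminant when its best alignment to a contaminant reference (e.g. the human genome) exceeds identity and query-coverage cutoffs. The record format and threshold values below are illustrative assumptions, not DeconSeq's exact defaults:

```python
def split_reads(alignments, min_identity=94.0, min_coverage=90.0):
    """Partition reads into contaminant and clean sets.

    alignments: iterable of (read_id, percent_identity, percent_query_coverage)
    tuples against a contaminant reference (hypothetical format). A read with
    at least one hit above both cutoffs is treated as contamination.
    """
    contaminant, clean = set(), set()
    for read_id, identity, coverage in alignments:
        if identity >= min_identity and coverage >= min_coverage:
            contaminant.add(read_id)
        else:
            clean.add(read_id)
    # a read with any strong contaminant hit is removed, even if it
    # also has weak hits
    return contaminant, clean - contaminant

# Hypothetical alignment results: r1 has one strong human hit, r2 does not
hits = [("r1", 99.2, 98.0), ("r2", 80.0, 50.0), ("r1", 70.0, 40.0)]
bad, good = split_reads(hits)
# r1 is removed as contamination; r2 is kept
```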