Showing papers in "Nature Precedings in 2010"


Journal Article
TL;DR: An error model that uses the negative binomial distribution, with variance and mean linked by local regression, to model the null distribution of the count data is proposed and provides good detection power.
Abstract: Motivation: High-throughput nucleotide sequencing provides quantitative readouts in assays for RNA expression (RNA-Seq), protein-DNA binding (ChIP-Seq), and cell counting. Statistical inference of differential signal in these data needs to take into account their natural variability throughout the dynamic range. When the number of replicates is small, error modeling is needed to achieve statistical power. Results: We propose an error model that uses the negative binomial distribution, with variance and mean linked by local regression, to model the null distribution of the count data. The method controls type-I error and provides good detection power. Availability: A free open-source R/Bioconductor software package, called "DESeq", is available from http://www-huber.embl.de/users/anders/DESeq

2,033 citations
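To make the error model above more concrete, here is a minimal Python sketch of its ingredients: per-sample size factors (the median-of-ratios estimator described in the DESeq paper), per-gene means and extra-Poisson ("raw") variances of normalized counts, and a smooth variance-versus-mean curve that feeds a negative binomial variance. The Gaussian kernel smoother is a simplified stand-in for the local regression the method uses, and the function names and the `bandwidth` value are illustrative assumptions, not the DESeq package's API.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors for a genes x samples count matrix (list of rows)."""
    n = len(counts[0])
    # Only genes with a positive count in every sample have a finite log geometric mean.
    usable = [row for row in counts if all(c > 0 for c in row)]
    log_geo = [sum(math.log(c) for c in row) / n for row in usable]
    return [
        math.exp(median(math.log(row[j]) - lg for row, lg in zip(usable, log_geo)))
        for j in range(n)
    ]


def mean_and_raw_variance(counts, sf):
    """Per-gene mean of normalized counts and an estimate of the extra-Poisson variance."""
    m = len(sf)
    inv_sf_mean = sum(1.0 / s for s in sf) / m
    stats = []
    for row in counts:
        q = [c / s for c, s in zip(row, sf)]
        mu = sum(q) / m
        w = sum((x - mu) ** 2 for x in q) / (m - 1)
        # Subtract the expected Poisson (shot-noise) part, mu * mean(1 / s_j).
        stats.append((mu, w - mu * inv_sf_mean))
    return stats


def smooth_variance(stats, bandwidth=0.5):
    """Kernel-smoothed raw variance as a function of log mean (stand-in for local regression)."""
    pts = [(math.log(mu), v) for mu, v in stats if mu > 0]

    def fitted(mu):
        x0 = math.log(mu)
        weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x, _ in pts]
        # Clip at zero: a variance component cannot be negative.
        return max(sum(w * v for w, (_, v) in zip(weights, pts)) / sum(weights), 0.0)

    return fitted


def nb_variance(mu, s, fitted):
    """Count variance for a sample with size factor s: s*mu + s^2 * v(mu)."""
    return s * mu + s * s * fitted(mu)


if __name__ == "__main__":
    counts = [[10, 20], [20, 40], [30, 60], [5, 14]]
    sf = size_factors(counts)
    fit = smooth_variance(mean_and_raw_variance(counts, sf))
    print(sf, nb_variance(20.0, sf[0], fit))
```

The variance is modelled as a Poisson term plus a smooth function of the mean, so genes with similar expression share information about their dispersion, which is the point of the approach when replicates are few.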


Journal Article
TL;DR: This work compared the outcome of fitting models that were transformed in various ways with results from fitting models using Poisson and negative binomial models to untransformed count data, finding that the transformations performed poorly, except when the dispersion was small and the mean counts were large.
Abstract: 1. Ecological count data (e.g., number of individuals or species) are often log-transformed to satisfy parametric test assumptions. 2. Apart from the fact that generalized linear models are better suited to dealing with count data, a log-transformation of counts has the additional quandary of how to deal with zero observations. With just one zero observation (if this observation represents a sampling unit), the whole dataset needs to be fudged by adding a value (usually 1) before transformation. 3. Simulating data from a negative binomial distribution, we compared the outcome of fitting models that were transformed in various ways (log, square-root) with results from fitting models using Poisson and negative binomial models to untransformed count data. 4. We found that the transformations performed poorly, except when the dispersion was small and the mean counts were large. The Poisson and negative binomial models consistently performed well, with little bias.

553 citations
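A small simulation shows one source of the bias described above. This Python sketch draws negative binomial counts as a gamma-Poisson mixture and compares two estimates of the mean: the sample mean (which is the maximum-likelihood estimate for an intercept-only Poisson or negative binomial GLM with a log link) and the back-transformed average of log(y + 1). It covers only the intercept-only case, not the full models compared in the study, and the parameter values (`mu`, `k`, `n`, `seed`) are arbitrary assumptions.

```python
import math
import random


def poisson(lam, rng):
    """Poisson draw by counting unit-rate exponential arrivals before time lam."""
    k, t = 0, rng.expovariate(1.0)
    while t < lam:
        k += 1
        t += rng.expovariate(1.0)
    return k


def negative_binomial(mu, k, rng):
    """Gamma-Poisson mixture with mean mu and variance mu + mu**2 / k."""
    return poisson(rng.gammavariate(k, mu / k), rng)


def compare(mu=3.0, k=0.5, n=2000, seed=1):
    rng = random.Random(seed)
    y = [negative_binomial(mu, k, rng) for _ in range(n)]
    # Intercept-only Poisson/NB GLM (log link): the MLE of the mean is the sample mean.
    glm_mean = sum(y) / n
    # Average log(y + 1), then back-transform to the count scale.
    log_mean = math.exp(sum(math.log(v + 1) for v in y) / n) - 1
    return glm_mean, log_mean


if __name__ == "__main__":
    print(compare())
```

By Jensen's inequality the back-transformed log mean can never exceed the arithmetic mean, and the gap widens as dispersion grows (smaller `k`), consistent with the finding that transformations do worst when dispersion is large.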


Journal Article
TL;DR: A versatile and powerful tool that allows large-scale comparison of proteome profiles among discrete populations and will enable the discovery of novel biomarkers in a manner that is unencumbered by the incomplete knowledge of biology, thereby helping to advance the next generation of evidence-based medicine.
Abstract: Interrogation of the human proteome in a highly multiplexed and efficient manner remains a coveted and challenging goal in biology. We present a new aptamer-based proteomic technology for biomarker discovery capable of simultaneously measuring thousands of proteins from small sample volumes (15 µL of serum or plasma). Our current assay allows us to measure ~800 proteins with very low limits of detection (1 pM average), 7 logs of overall dynamic range, and 5% average coefficient of variation. This technology is enabled by a new generation of aptamers that contain chemically modified nucleotides, which greatly expand the physicochemical diversity of the large randomized nucleic acid libraries from which the aptamers are selected. Proteins in complex matrices such as plasma are measured with a process that transforms a signature of protein concentrations into a corresponding DNA aptamer concentration signature, which is then quantified with a DNA microarray. In essence, our assay takes advantage of the dual nature of aptamers as both folded binding entities with defined shapes and unique sequences recognizable by specific hybridization probes. To demonstrate the utility of our proteomics biomarker discovery technology, we applied it to a clinical study of chronic kidney disease (CKD). We identified two well-known CKD biomarkers as well as an additional 58 potential CKD biomarkers. These results demonstrate the potential utility of our technology to discover unique protein signatures characteristic of various disease states. More generally, we describe a versatile and powerful tool that allows large-scale comparison of proteome profiles among discrete populations. This unbiased and highly multiplexed search engine will enable the discovery of novel biomarkers in a manner that is unencumbered by our incomplete knowledge of biology, thereby helping to advance the next generation of evidence-based medicine.

363 citations


Journal Article
TL;DR: In endothelial and epithelial monolayers, and in breast cancer cell lines before but not after the epithelial-mesenchymal transition, the data indicate that collective migration is governed by a simple but unifying physiological principle: neighboring cells join forces to transmit appreciable intercellular normal stress across local cell-cell junctions, but migrate along orientations of minimal intercellular shear stress.
Abstract: Cells comprising a tissue migrate as part of a collective. In order to coordinate collective multicellular migration, each constituent cell integrates local information including chemical signals and mechanical stresses. The boundary between a constituent cell and its immediate neighbors comprises cell-cell junctions and cryptic lamellipodia, but the state of local mechanical stress exerted at that boundary has not been accessible experimentally. As such, it is not clear how collective mechanical processes could be coordinated over length scales spanning large multicellular assemblies. We report here maps of the stresses exerted within and between cells comprising a monolayer. Within the cell sheet there arise unanticipated fluctuations of mechanical stress that are severe, emerge spontaneously, and ripple across the monolayer. These fluctuations define a rugged stress landscape that becomes increasingly heterogeneous, sluggish, and cooperative with increasing system density. Within that persistently rugged stress landscape, cells are found to migrate along local orientations of maximal principal stress. Migrations of both endothelial and epithelial monolayers conform to this behavior, as do breast cancer cell lines before but not after the epithelial-mesenchymal transition. In these diverse cell types, our data indicate that collective migration is governed by a simple but unifying physiological principle: neighboring cells join forces to transmit appreciable intercellular normal stress across local cell-cell junctions, but migrate along orientations of minimal intercellular shear stress.

202 citations
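The two phrasings in the abstract ("maximal principal stress" and "minimal intercellular shear stress") describe the same orientation: in a 2D stress field, shear stress vanishes on the principal planes. A minimal Python sketch for a single point of a stress map, assuming a hypothetical in-plane stress tensor given by `sxx`, `syy`, `sxy`, computes the principal stresses, the orientation of the maximal one, and the maximum shear stress (the radius of Mohr's circle).

```python
import math


def principal_stresses(sxx, syy, sxy):
    """2D principal stresses, orientation of the larger one (radians), and max shear."""
    center = 0.5 * (sxx + syy)
    radius = math.hypot(0.5 * (sxx - syy), sxy)  # Mohr's circle radius = max in-plane shear
    s1, s2 = center + radius, center - radius
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)  # angle of s1 from the x-axis
    return s1, s2, theta, radius


def shear_on_plane(sxx, syy, sxy, theta):
    """Shear stress on a plane rotated by theta (standard transformation formula)."""
    return -0.5 * (sxx - syy) * math.sin(2 * theta) + sxy * math.cos(2 * theta)


if __name__ == "__main__":
    s1, s2, theta, tmax = principal_stresses(3.0, 1.0, 0.5)
    print(s1, s2, math.degrees(theta), tmax)
    print(shear_on_plane(3.0, 1.0, 0.5, theta))  # ~0 on the principal plane
```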


Journal Article
TL;DR: The notion of research objects, semantically rich aggregations of resources that possess some scientific intent or support some research objective, is discussed, and a number of principles that such objects and their associated services are expected to follow are presented.
Abstract: What will researchers be publishing in the future? Whilst there is little question that the Web will be the publication platform, as scholars move away from paper towards digital content, there is a need for mechanisms that support the production of self-contained units of knowledge and facilitate the publication, sharing and reuse of such entities. In this paper we discuss the notion of research objects, semantically rich aggregations of resources that possess some scientific intent or support some research objective. We present a number of principles that we expect such objects and their associated services to follow.

190 citations


Journal Article
TL;DR: Nowak et al. argue that the version of kin selection theory summarised by the formula R > c/b (c = cost of performing the 'altruistic' act, b = benefit derived by the recipient of the act, R = relatedness between the two) is of little utility for understanding the evolution of eusociality; this commentary contends that their account omits much that is relevant.
Abstract: Nowak et al.[1] wish to explain why the version of kin selection theory that is summarised by the formula R > c/b (c = cost of performing 'altruistic' act, b = benefit derived by recipient of act, R = relatedness between the two) is of little utility for understanding the evolution of eusociality. But in trying to do so they omit much that is relevant and risk misrepresenting the issue to anyone who is not familiar with the literature. A fairer account would include the following facts.

159 citations
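The inequality quoted above can be read as a simple threshold test on relatedness. This Python sketch is only an arithmetic illustration: the relatedness values (0.5 for full siblings and 0.125 for first cousins in a diploid species) are standard coefficients, while the benefit and cost values are arbitrary assumptions.

```python
def altruism_favoured(r, b, c):
    """Threshold form of the rule quoted above: R > c/b (requires b > 0)."""
    if b <= 0:
        raise ValueError("benefit b must be positive")
    return r > c / b


if __name__ == "__main__":
    # Same (hypothetical) cost and benefit, different recipients.
    print(altruism_favoured(0.5, b=3.0, c=1.0))    # full sibling: 0.5 > 1/3
    print(altruism_favoured(0.125, b=3.0, c=1.0))  # first cousin: 0.125 < 1/3
```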


Journal Article
TL;DR: PhyloCSF is a comparative genomics method that analyzes a multi-species nucleotide sequence alignment to determine whether it is likely to represent a conserved protein-coding region, based on a formal statistical comparison of phylogenetic codon models.
Abstract: As high-throughput transcriptome sequencing provides evidence for novel transcripts in many species, there is a renewed need for accurate methods to classify small genomic regions as protein-coding or non-coding. We present PhyloCSF, a novel comparative genomics method that analyzes a multi-species nucleotide sequence alignment to determine whether it is likely to represent a conserved protein-coding region, based on a formal statistical comparison of phylogenetic codon models. We show that PhyloCSF's classification performance in 12-species Drosophila genome alignments exceeds all other methods we compared in a previous study, and we provide a software implementation for use by the community. We anticipate that this method will be widely applicable as the transcriptomes of many additional species, tissues, and subcellular compartments are sequenced, particularly in the context of ENCODE and modENCODE.

104 citations


Journal Article
TL;DR: Because the analysis of hundreds of growth curves, which sometimes do not reveal ideal relationships, is quite labor intensive, the Grofit software was developed to support biologists in such analyses.
Abstract: Growth experiments are routinely used to analyze basic properties of a given organism or cellular model. Similarly, within the TRANSLUCENT project the characterization of cellular growth, particularly under conditions of high extracellular potassium or sodium, is an important task. Any growth analysis should ideally reveal a relationship between the concentration of a compound/substrate and its effect on a particular growth parameter. In view of the quite labor-intensive analysis of hundreds of growth curves, which sometimes do not reveal ideal relationships, the Grofit software was developed to support biologists in such analyses. Within the software package, specifically tailored regression and bootstrapping techniques are utilized to statistically estimate the effect of different growth conditions. Grofit was implemented in R, an open source statistical software environment. A complete description of the Grofit software and information about the implemented methods is given in a manuscript published by the Journal of Statistical Software. In addition, the software is available from CRAN repositories or the developers' homepage http://www.rheinahrcampus.de/Research-Group-of-Maik-Kschisc.2452.0.html

60 citations
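As a rough illustration of combining a growth-parameter estimate with bootstrapping, the Python sketch below estimates the maximum specific growth rate of one curve as the steepest least-squares slope of ln(OD) over a sliding window, then computes a percentile bootstrap confidence interval for the mean rate across replicate curves. Grofit itself is an R package that fits parametric growth models and smoothing splines; this window-slope estimator and the function names are simplified, hypothetical stand-ins, not Grofit's API.

```python
import math
import random


def max_growth_rate(times, od, window=3):
    """Largest least-squares slope of ln(OD) over any `window` consecutive points."""
    logs = [math.log(v) for v in od]
    best = float("-inf")
    for i in range(len(times) - window + 1):
        t, y = times[i:i + window], logs[i:i + window]
        tm, ym = sum(t) / window, sum(y) / window
        slope = sum((a - tm) * (b - ym) for a, b in zip(t, y)) / sum((a - tm) ** 2 for a in t)
        best = max(best, slope)
    return best


def bootstrap_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of replicate estimates."""
    rng = random.Random(seed)
    means = sorted(
        sum(rng.choice(values) for _ in values) / len(values) for _ in range(n_boot)
    )
    return means[int(n_boot * alpha / 2)], means[int(n_boot * (1 - alpha / 2)) - 1]


if __name__ == "__main__":
    times = list(range(10))
    curve = [0.01 * math.exp(0.5 * t) for t in times]  # synthetic exponential growth
    print(max_growth_rate(times, curve))
    print(bootstrap_ci([0.48, 0.50, 0.52, 0.49]))  # hypothetical replicate rates
```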


Journal Article
TL;DR: Based on comprehensive studies in Inner Mongolia grassland, the authors showed that species-level stoichiometric homeostasis was consistently positively correlated with dominance and stability on both 2-year and 27-year temporal scales and across a 1200-km spatial transect.
Abstract: Ecosystem structure, functioning, and stability have been a focus of ecological and environmental sciences during the past two decades. The mechanisms underlying their relationship, however, are not well understood. Based on comprehensive studies in Inner Mongolia grassland, here we show that species-level stoichiometric homeostasis was consistently positively correlated with dominance and stability on both 2-year and 27-year temporal scales and across a 1200-km spatial transect. At the community level, stoichiometric homeostasis was also positively correlated with ecosystem function and stability in most cases. Thus, homeostatic species tend to have high and stable biomass; and ecosystems dominated by more homeostatic species have higher productivity and greater stability. By modulating organism responses to key environmental drivers, stoichiometric homeostasis appears to be a major mechanism responsible for the structure, functioning, and stability of grassland ecosystems.

55 citations
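The abstract does not say how stoichiometric homeostasis was quantified. One common formulation, assumed here, is the homeostasis model y = c * x^(1/H), where x is resource stoichiometry (e.g., soil N:P), y is organism stoichiometry (e.g., plant N:P), and larger H means stronger homeostasis. Under that assumption, H is the reciprocal of the slope of log(y) on log(x), as in this Python sketch.

```python
import math


def homeostasis_coefficient(resource, organism):
    """H = 1 / slope of log(organism) on log(resource), from y = c * x**(1/H)."""
    xs = [math.log(v) for v in resource]
    ys = [math.log(v) for v in organism]
    n = len(xs)
    xm, ym = sum(xs) / n, sum(ys) / n
    slope = sum((x - xm) * (y - ym) for x, y in zip(xs, ys)) / sum((x - xm) ** 2 for x in xs)
    # A zero slope (perfect homeostasis) would make H infinite and raise ZeroDivisionError.
    return 1.0 / slope


if __name__ == "__main__":
    soil = [1, 2, 4, 8, 16]               # hypothetical resource ratios
    plant = [3 * x ** 0.25 for x in soil]  # synthetic data generated with H = 4
    print(homeostasis_coefficient(soil, plant))
```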


Journal Article
TL;DR: This is the final specification for Release 1 of *SBML Level 3 Version 1 Core*, an electronic model representation format for systems biology, oriented towards describing biological processes of the sort common in research on a number of topics.
Abstract: This is the final specification for Release 1 of *SBML Level 3 Version 1 Core*, an electronic model representation format for systems biology. SBML is oriented towards describing biological processes of the sort common in research on a number of topics, including metabolic pathways, cell signaling pathways, and many others. SBML is defined neutrally with respect to programming languages and software encoding; however, it is oriented primarily towards allowing models to be encoded using XML. This document contains many examples of SBML models written in XML. More information about SBML and this specification is available online at http://sbml.org/Documents/Specifications/.

48 citations


Journal Article
TL;DR: This work has applied DaDi to human data from Africa, Europe, and East Asia, building the most complex statistically well-characterized model of human migration out of Africa to date.
Abstract: Models of demographic history (population sizes, migration rates, and divergence times) inferred from genetic data complement archeology and serve as null models in genome scans for selection. Most current inference methods are computationally limited to considering simple models or non-recombining data. We introduce a method based on a diffusion approximation to the joint frequency spectrum of genetic variation between populations. Our implementation, DaDi, can model up to three interacting populations and scales well to genome-wide data. We have applied DaDi to human data from Africa, Europe, and East Asia, building the most complex statistically well-characterized model of human migration out of Africa to date.
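The method fits demographic models to the joint frequency spectrum rather than to raw genotypes. A minimal Python sketch of that summary statistic for two populations follows, assuming each SNP has already been reduced to derived-allele counts out of n1 and n2 sampled chromosomes (polarized, no missing data). This is a hypothetical helper for illustration, not DaDi's own data structure.

```python
def joint_sfs(counts1, counts2, n1, n2):
    """Joint site frequency spectrum: entry [i][j] counts SNPs with i derived alleles
    in population 1 (of n1 chromosomes) and j in population 2 (of n2 chromosomes)."""
    sfs = [[0] * (n2 + 1) for _ in range(n1 + 1)]
    for i, j in zip(counts1, counts2):
        sfs[i][j] += 1
    return sfs


if __name__ == "__main__":
    # Four hypothetical SNPs, two chromosomes sampled per population.
    for row in joint_sfs([0, 1, 1, 2], [1, 1, 0, 2], 2, 2):
        print(row)
```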

Journal Article
TL;DR: It is suggested that occupancy modeling approaches hold promise for improving models of species distributions; however, occupancy models did not always significantly improve the predictive performance of species distribution models.
Abstract: Models of species distributions are increasingly being used to address a variety of problems in ecology and conservation biology. In many applications, perfect detectability of species, given presence, is assumed. While this problem has been acknowledged and addressed through the development of occupancy models, we still know little regarding whether or not addressing the potential for imperfect detection improves the predictive performance of species distribution models in nature. Here, we ask if explicitly accounting for imperfect detection improves predictive performance of species distribution models relative to approaches that assume perfect detection. We contrast logistic regression models of species occurrence that do not correct for detectability with hierarchical occupancy models that explicitly estimate and adjust for detectability, and with maximum entropy models that circumvent the detectability problem by using data from known presence locations only. We use a large-scale, long-term monitoring database across western Montana and northern Idaho to contrast these models for nine landbird species that cover a broad spectrum in detectability. Overall, occupancy models were similar to or better than other approaches in terms of predictive accuracy, as measured by the Area Under the ROC Curve (AUC) and Kappa, with maximum entropy tending to provide the lowest predictive accuracy. Models varied in the types of errors associated with predictions, such that some model approaches may be preferred over others in certain situations. As expected, predictive performance varied across a gradient in species detectability, with logistic regression providing lower relative performance for less detectable species and Maxent providing lower performance for highly detectable species.
Occupancy models showed no strong relationship between detection probability and any source of predictive error, suggesting this approach can perform as well for highly detectable species as for difficult-to-detect species. We conclude by suggesting that occupancy modeling approaches hold promise for improving models of species distributions. However, occupancy models did not always significantly improve predictive performance of species distribution models. In these instances, issues of sampling design other than imperfect detection may be of greater importance for inference, and the onus is on ecologists to address the detection issue and other problems in sampling design in rigorous ways to evaluate potential biases in inference.
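The two accuracy measures named above can be computed directly from predictions at evaluation sites. In this Python sketch, AUC is the probability that a randomly chosen presence site receives a higher predicted score than a randomly chosen absence site (the Mann-Whitney form, ties counting one half), and Cohen's kappa compares thresholded presence/absence predictions with observations, correcting for chance agreement. The scores and labels in the usage lines are hypothetical.

```python
def auc(scores_present, scores_absent):
    """Probability that a presence site outscores an absence site (ties count 1/2)."""
    wins = 0.0
    for p in scores_present:
        for a in scores_absent:
            wins += 1.0 if p > a else 0.5 if p == a else 0.0
    return wins / (len(scores_present) * len(scores_absent))


def cohens_kappa(observed, predicted):
    """Chance-corrected agreement between 0/1 observations and 0/1 predictions."""
    n = len(observed)
    po = sum(o == p for o, p in zip(observed, predicted)) / n
    p_obs, p_pred = sum(observed) / n, sum(predicted) / n
    pe = p_obs * p_pred + (1 - p_obs) * (1 - p_pred)
    # Undefined (division by zero) when chance agreement pe equals 1.
    return (po - pe) / (1 - pe)


if __name__ == "__main__":
    print(auc([0.9, 0.7, 0.4], [0.5, 0.2, 0.1]))
    print(cohens_kappa([1, 0, 1, 0, 1], [1, 0, 0, 0, 1]))
```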

Journal Article
TL;DR: Two new supervised learning rules for spiking neurons with temporal coding of information (chronotrons) are introduced, one that is analytically derived and highly efficient, and one that has a high degree of biological plausibility.
Abstract: In many cases, neurons process information carried by the precise timing of spikes. Here we show how neurons can learn to generate specific temporally precise output spikes in response to input spike patterns, thus processing and memorizing information that is fully temporally coded, both as input and as output. We introduce two new supervised learning rules for spiking neurons with temporal coding of information (chronotrons), one that is analytically derived and highly efficient, and one that has a high degree of biological plausibility. We show how chronotrons can learn to classify their inputs and we study their memory capacity.

Journal Article
TL;DR: This is the specification for Release 1 Candidate of *SBML Level 3 Version 1 Core*, an electronic model representation format for systems biology, oriented primarily towards allowing models to be encoded using XML.
Abstract: This is the specification for Release 1 Candidate of *SBML Level 3 Version 1 Core*, an electronic model representation format for systems biology. SBML is oriented towards describing biological processes of the sort common in research on a number of topics, including metabolic pathways, cell signaling pathways, and many others. SBML is defined neutrally with respect to programming languages and software encoding; however, it is oriented primarily towards allowing models to be encoded using XML. This document contains many examples of SBML models written in XML. More information about SBML and this specification is available online at http://sbml.org/Documents/Specifications/.

Journal Article
TL;DR: This is the first experimental demonstration of either competitive or facilitative effects of an assemblage of native ungulates on domestic livestock in a savanna ecosystem, and a unique demonstration of a rainfall-dependent shift in competition-facilitation balance within any herbivore guild.
Abstract: Savanna ecosystems are vital for both economic and biodiversity values. In savannas worldwide, management decisions are based on the concept that wildlife and livestock compete for grassland resources[1-4], yet there are virtually no experimental data to support this assumption[1]. Specifically, the critical assessment of whether or not wild ungulates alter livestock performance (e.g., weight gain, reproduction or survival) has rarely been carried out, although diminished performance is an essential prerequisite for inferring competition[1]. Here we use a large-scale experiment in a semi-arid savanna in Kenya to show that wild ungulates do depress cattle performance (weight gain) during the dry season, indicating a competitive effect, but enhance cattle performance during the wet season, signifying facilitation. This is the first experimental demonstration of either competitive or facilitative effects of an assemblage of native ungulates on domestic livestock in a savanna ecosystem, and a unique demonstration of a rainfall-dependent shift in competition-facilitation balance within any herbivore guild. These results are critical for better understanding and management of wildlife-livestock coexistence in savanna ecosystems globally, and especially in the African savanna biome, which crucially hosts the last remnants of an intact large herbivore fauna.

Journal Article
TL;DR: In this paper, a bottom-up molecular-based mesoscale model that bridges the scales from Angstroms to hundreds of nanometers is used to show that the specific combination of a crystalline phase and a semi-amorphous matrix is crucial for the unique properties of silks.
Abstract: Spider silk is one of the strongest, most extensible and toughest biological materials known, exceeding the properties of many engineered materials including steel. Silks feature a hierarchical architecture where highly organized, densely H-bonded beta-sheet nanocrystals are arranged within a semi-amorphous protein matrix consisting of 3₁-helices and beta-turn protein structures. By using a bottom-up molecular-based mesoscale model that bridges the scales from Angstroms to hundreds of nanometers, here we show that the specific combination of a crystalline phase and a semi-amorphous matrix is crucial for the unique properties of silks. Specifically, our results reveal that the superior mechanical properties of spider silk can be explained solely by structural effects, where the geometric confinement of beta-sheet nanocrystals combined with highly extensible semi-amorphous domains with a large hidden length is the key to achieving great strength and great toughness, despite the dominance of mechanically inferior chemical interactions such as H-bonding. Our model directly shows that semi-amorphous regions unravel first when silk is being stretched, leading to the large extensibility of silk. Conversely, the large-deformation mechanical properties and ultimate tensile strength of silk are controlled by the strength of beta-sheet nanocrystals, which is directly related to their size, where small beta-sheet nanocrystals are crucial for reaching outstanding levels of strength and toughness. Our model agrees well with observations in recent experiments, where it was shown that a significant change in the strength and toughness can be achieved solely by tuning the size of beta-sheet nanocrystals.
Our findings unveil the material design strategy that enables silks to achieve superior material performance despite simple and inferior constituents, resulting in a new paradigm in materials design where enhanced functionality is not achieved using complex building blocks, but rather through the utilization of simple repetitive constitutive elements arranged in hierarchical structures.
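Toughness, as used above, is the energy absorbed per unit volume before failure, i.e. the area under the stress-strain curve, which is why extensibility and strength both contribute to it. This Python sketch integrates a sampled curve with the trapezoidal rule; with stress in MPa and dimensionless strain, the result is in MJ/m³. The two curves in the usage lines are hypothetical and only illustrate that, at equal strength, a more extensible material is tougher.

```python
def toughness(strain, stress):
    """Area under a sampled stress-strain curve (trapezoidal rule)."""
    return sum(
        0.5 * (stress[i] + stress[i + 1]) * (strain[i + 1] - strain[i])
        for i in range(len(strain) - 1)
    )


if __name__ == "__main__":
    # Hypothetical curves reaching the same 1000 MPa strength at different strains.
    stiff = toughness([0.0, 0.05, 0.10], [0.0, 600.0, 1000.0])
    extensible = toughness([0.0, 0.10, 0.30], [0.0, 300.0, 1000.0])
    print(stiff, extensible)
```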

Journal Article
TL;DR: In this paper, a pilot scale accelerated mineral carbonation (AMC) process was developed and tested by reacting flue gas with fly ash particles at one of the largest coal-fired power plants in the USA.
Abstract: Multiple CO2 capture and storage (CCS) processes are required to address anthropogenic CO2 problems. However, a method which can directly capture and mineralize CO2 at a point source, under actual field conditions, has advantages and could help offset the cost associated with the conventional CCS technologies. Mineral carbonation (MC), a process of converting CO2 into stable minerals (mineralization), has been studied extensively to store CO2. However, most MC studies have been conducted at laboratory scale. Objectives of this research were to develop a pilot scale AMC (accelerated mineral carbonation) process and test the effects of flue gas moisture content on carbonation of fly ash particles. A pilot scale AMC process consisting of a moisture reducing drum (MRD), a heater/humidifier, and a fluidized-bed reactor (FBR) was developed and tested by reacting flue gas with fly ash particles at one of the largest coal-fired power plants (2120 MW) in the USA. The experiments were conducted over a period of 2 hr at ~300 SCFM flow-rates, at a controlled pressure (115.1 kPa), and under different flue gas moisture contents (2-16%). The flue gas CO2 and SO2 concentrations were monitored before and during the experiments by an industrial grade gas analyzer. Fly ash samples were collected from the reactor sample port from 0-120 minutes and analyzed for total inorganic carbon (C), sulfur (S), and mercury (Hg). From C, S, and Hg concentrations, %calcium carbonate (CaCO3), %sulfate (SO₄²⁻), and %mercury carbonate (HgCO3) were calculated, respectively. Results suggested significant mineralization of flue gas CO2, SO2, and Hg within 10-15 minutes of reaction. Among the moisture conditions tested, ~16% showed the highest conversion of flue gas CO2 and SO2 to %CaCO3 and %SO₄²⁻ in fly ash samples. For example, an increase of almost 4% in CaCO3 content of fly ash was observed.
Overall, the AMC process is cost-effective with minimum carbon footprint and can be retrofitted to coal-fired power plants (existing and/or new) as a post-combustion unit to minimize flue gas CO2, SO2, and Hg emissions into the atmosphere. Used in conjunction with capture and geologic sequestration, the AMC process has the potential to reduce overall cost associated with CO2 separation/compression/transportation/pore space/brine water treatment. It could also help protect sensitive amines and carbon filters used in flue gas CO2 capture and separation process and extend their life.
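The abstract states that %CaCO3, %SO₄²⁻, and %HgCO3 were calculated from the measured C, S, and Hg concentrations but does not give the conversion. A plausible stoichiometric conversion, assumed here, scales each element's mass fraction by the ratio of compound to element molar mass, which presumes that all measured inorganic C is present as CaCO3, all S as sulfate, and all Hg as HgCO3. The function and constant names in this Python sketch are illustrative.

```python
# Standard atomic weights (g/mol), rounded.
MASS = {"Ca": 40.078, "C": 12.011, "O": 15.999, "S": 32.06, "Hg": 200.59}

CACO3 = MASS["Ca"] + MASS["C"] + 3 * MASS["O"]
SO4 = MASS["S"] + 4 * MASS["O"]
HGCO3 = MASS["Hg"] + MASS["C"] + 3 * MASS["O"]


def element_to_compound(percent_element, element, compound_mass):
    """Convert an element's mass fraction to that of the compound holding it
    (one atom of the element per formula unit; all of the element in that compound)."""
    return percent_element * compound_mass / MASS[element]


if __name__ == "__main__":
    print(element_to_compound(1.0, "C", CACO3))  # ~8.33 % CaCO3 per 1 % inorganic C
    print(element_to_compound(1.0, "S", SO4))    # ~3.00 % sulfate per 1 % S
    print(element_to_compound(1.0, "Hg", HGCO3))
```

Under this assumption, a rise of about 4 percentage points in CaCO3 corresponds to roughly 0.48 percentage points of inorganic carbon.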

Journal Article
TL;DR: In this article, gas-to-aqueous-phase standard-state free energies of solvation were calculated for a range of neutral and ionic inorganic and organic compounds using various levels and combinations of Hartree-Fock, density functional theory (DFT), and composite methods (CBS-Q//B3, G4MP2, and G4) with the IEFPCM-UFF, CPCM, and SMD solvation models in Gaussian 09 (G09); for a subset of highly polar and generally polyfunctional neutral organic compounds, the revised solvent models in G09 significantly reduced errors.
Abstract: Gas to aqueous phase standard state (1 atm to 1 mol/L; 298.15 K) free energies of solvation (ΔG°solv) were calculated for a range of neutral and ionic inorganic and organic compounds using various levels and combinations of Hartree-Fock and density functional theory (DFT) and composite methods (CBS-Q//B3, G4MP2, and G4) with the IEFPCM-UFF, CPCM, and SMD solvation models in Gaussian 09 (G09). For a subset of highly polar and generally polyfunctional neutral organic compounds previously identified as problematic for prior solvation models, we find significantly reduced ΔG°solv errors using the revised solvent models in G09. The use of composite methods for these compounds also substantially reduces their apparent ΔG°solv errors. In contrast, no general level-of-theory effects between the B3LYP/6-31+G** and G4 methods were observed on a suite of simpler neutral, anionic, and cationic molecules commonly used to benchmark solvation models. Further investigations on mono- and polyhalogenated short chain alkanes and alkenes and other possibly difficult functional groups also revealed significant ΔG°solv error reductions by increasing the level of theory from DFT to G4. Future solvent model benchmarking efforts should include high level composite method calculations to allow better discrimination of potential error sources between the levels of theory and the solvation models.
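The "1 atm to 1 mol/L" standard state in the abstract matters numerically: solvation free energies computed with a 1 mol/L gas-phase reference differ from the 1 atm convention by the free energy of compressing an ideal gas from 1 atm to 1 mol/L, RT ln(V_m / 1 L), where V_m is the ideal-gas molar volume at 1 atm. A short Python sketch of that correction, assuming ideal-gas behavior:

```python
import math

R = 8.314462618    # gas constant, J/(mol K)
P_STD = 101325.0   # 1 atm in Pa
KCAL = 4184.0      # J per kcal


def standard_state_correction(temperature=298.15):
    """RT ln(V_m / 1 L) in kcal/mol: ideal-gas compression from 1 atm to 1 mol/L."""
    molar_volume_l = R * temperature / P_STD * 1000.0  # m^3 -> L
    return R * temperature * math.log(molar_volume_l) / KCAL


if __name__ == "__main__":
    print(round(standard_state_correction(), 3))
```

At 298.15 K this is about +1.89 kcal/mol, so ΔG°solv (1 atm gas to 1 mol/L solution) equals the 1 mol/L-to-1 mol/L value plus this term.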

Journal Article
TL;DR: All reported categories of non-canonical splicing could be replicated using an in vitro reverse transcription system with highly purified RNA substrates, and the reproducible occurrence of ostensible trans-splicing, exon shuffling and sense-antisense fusions is observed.
Abstract: Trans-splicing, the in vivo joining of two RNA molecules, is well characterized in several groups of simple organisms but was long thought absent from fungi, plants and mammals. However, recent bioinformatic analyses of expressed sequence tag (EST) databases suggested widespread trans-splicing in mammals[1,2]. Splicing, including the characterized trans-splicing systems, involves conserved sequences at the splice junctions. Our analysis of a yeast non-coding RNA revealed that around 30% of the products of reverse transcription lacked an internal region of 117 nt, suggesting that the RNA was spliced. The junction sequences lacked canonical splice sites but were flanked by direct repeats, and further analyses indicated that the apparent splicing actually arose because reverse transcriptase can switch templates during transcription[3]. Many newly identified, apparently trans-spliced, RNAs lacked canonical splice sites but were flanked by short regions of homology, leading us to question their authenticity. Here we report that all reported categories of non-canonical splicing could be replicated using an in vitro reverse transcription system with highly purified RNA substrates. We observed the reproducible occurrence of ostensible trans-splicing, exon shuffling and sense-antisense fusions. The latter generate apparent antisense non-coding RNAs, which are also reported to be abundant in humans[4]. Different reverse transcriptases can generate different products of template switching, providing a simple diagnostic. Many reported examples of splicing in the absence of canonical splicing signals may be artefacts of cDNA preparation.
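The signature discussed above (junctions without canonical splice sites but flanked by direct repeats) can be checked mechanically. This Python sketch assumes a single reference sequence in DNA letters and an apparent internal deletion given by hypothetical 0-based coordinates; it reports the length of the direct repeat shared by the start of the removed segment and the sequence just after it, and whether the removed segment carries the GT...AG dinucleotides of major-class splice sites (other classes such as GC-AG are ignored). It is an illustrative helper, not the authors' analysis pipeline.

```python
def junction_features(seq, start, end):
    """Describe an apparent excision of seq[start:end] (0-based, end exclusive).

    Returns (repeat_length, canonical). A repeat of length k means excising
    seq[start + i:end + i] for any i in 0..k yields the same product, so the
    junction position is ambiguous, as expected from a template switch between repeats.
    """
    repeat = 0
    while (start + repeat < end and end + repeat < len(seq)
           and seq[start + repeat] == seq[end + repeat]):
        repeat += 1
    canonical = seq[start:start + 2] == "GT" and seq[end - 2:end] == "AG"
    return repeat, canonical


if __name__ == "__main__":
    print(junction_features("CCC" + "ATG" + "TTTTTT" + "ATG" + "GGG", 3, 12))
    print(junction_features("AAA" + "GTCCCAG" + "TTT", 3, 10))
```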

Journal Article
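TL;DR: A functional magnetic resonance imaging experiment showed different activation patterns in response to threatening facial or bodily expressions, which differed between male and female participants depending on the sex of the actor; male observers showed a clear motor preparation response to threatening male body language.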
Abstract: We report a functional magnetic resonance imaging experiment showing different activation patterns as a function of threatening signals from facial or bodily expressions and these differed between male and female participants as a function of male and female actors. Male observers showed a clear motor preparation response to threatening male body language.

Journal Article
TL;DR: Phenex is described, a platform-independent desktop application designed to facilitate efficient and consistent annotation of phenotypic similarities and differences using Entity-Quality syntax, drawing on terms from community ontologies for anatomical entities, phenotypic qualities, and taxonomic names.
Abstract: Phenotypic differences among species have long been systematically itemized and described by biologists in the process of investigating phylogenetic relationships and trait evolution. Traditionally, these descriptions have been expressed in natural language within the context of individual journal publications or monographs. Thus, this rich store of phenotype data has been largely unavailable for statistical and computational comparisons across studies or integration with other biological knowledge. Here we describe Phenex, a platform-independent desktop application designed to facilitate efficient and consistent annotation of phenotypic similarities and differences using Entity-Quality syntax, drawing on terms from community ontologies for anatomical entities, phenotypic qualities, and taxonomic names. Phenex can be configured to load ontologies for different taxonomic groups. The graphical user interface was developed for, and tested by, evolutionary biologists accustomed to working with lists of taxa, characters, character states, and character-by-taxon matrices. Annotation of phenotypic data using ontologies and globally unique taxonomic identifiers will allow biologists to better leverage decades of work in systematics and comparative morphology and contribute to an ever more useful web of linked biological data.

Journal Article
TL;DR: The Universal Genetic Test (UNIT), a non-invasive, saliva-based carrier test for more than 100 Mendelian diseases across all major population groups, represents a dramatic reduction in the cost and complexity of large-scale population screening.
Abstract: *Background:* Mendelian disorders are individually rare but collectively common, forming a "long tail" of genetic disease. More than 20 million people worldwide suffer from a disease in this long tail before the age of 25, with minorities and developing countries at highest risk and with the number of carriers far in excess of this figure. Importantly, the Jewish community’s campaign for universal Tay-Sachs screening shows that these incurable diseases can nevertheless be prevented if carrier status is known before conception. A single highly accurate assay for the long tail of Mendelian disease would allow us to scale this successful campaign up to the general population, thereby improving millions of lives, greatly benefiting minority health, and saving billions of dollars. *Methods and Findings:* We have addressed the need for such an assay by designing the Universal Genetic Test (UNIT), a non-invasive, saliva-based carrier test for more than 100 Mendelian diseases across all major population groups. We exhaustively validated the test with a median of 147 positive and 525 negative samples per variant. By combining probes for risk alleles with family history information, we show that we can achieve extremely high levels of accuracy (median 95% CI [0.99988, 0.999999]), precision (median 95% CI [0.99993, 0.99999]), sensitivity (median 95% CI [0.99988, 0.999999]), and specificity (median 95% CI [0.99643, 1]) at the level of individual mutations. In particular, through a combination of replicated probes and confirmatory testing, we are able to reliably detect rare alleles at q ≈ 1/1000 with positive predictive values above 0.995. To put this in context, this performance for a multiplex assay compares favorably with FDA-approved single-gene carrier tests. *Conclusions:* The UNIT represents a dramatic reduction in the cost and complexity of large-scale population screening.
With a single inexpensive assay for a substantial fraction of the global Mendelian disease burden, an end to many preventable genetic diseases is now in sight. Moreover, given that the assay requires only a saliva sample, it is for the first time feasible to contemplate an "at-home carrier test" as a successor to the at-home pregnancy test. *Authors' note:* _Nature Precedings is a preprint server used by scientists to communicate results in advance of the often lengthy publication process. This manuscript is a preprint and has not yet been accepted for publication by a peer-reviewed journal. It is currently undergoing expedited review at a peer-reviewed journal and is posted publicly to allow collegial feedback in advance of publication. Please address technical comments to balajis@stanford.edu_
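The positive predictive value (PPV) quoted above depends strongly on how rare the allele is. A minimal Bayes'-rule sketch follows, assuming (the abstract does not say this) that the prior probability of a tested allele being present can be approximated by q ≈ 1/1000 and that repeated, confirmatory tests err independently. The sensitivity and specificity values are illustrative numbers near the lower bounds of the reported intervals, not the authors' figures for any particular variant; `ppv` and `ppv_after_repeats` are hypothetical helper names.

```python
def ppv(sensitivity: float, specificity: float, prior: float) -> float:
    """Positive predictive value P(allele present | positive result), by Bayes' rule."""
    true_pos = sensitivity * prior
    false_pos = (1.0 - specificity) * (1.0 - prior)
    return true_pos / (true_pos + false_pos)


def ppv_after_repeats(sensitivity: float, specificity: float, prior: float, n_positive: int) -> float:
    """PPV after n consecutive positive results.

    Assumes each repeat errs independently, so the posterior after one positive
    becomes the prior for the next. Replicated probes on one sample may violate this.
    """
    p = prior
    for _ in range(n_positive):
        p = ppv(sensitivity, specificity, p)
    return p


q = 1 / 1000                   # assumed prior, taken from the q ≈ 1/1000 in the text
sens, spec = 0.99988, 0.99643  # illustrative values near the lower CI bounds quoted above
for n in (1, 2, 3):
    print(n, round(ppv_after_repeats(sens, spec, q, n), 4))
```

Under these assumptions a single positive gives a PPV of only about 0.22, while two or three independent positives raise it to roughly 0.99 and above 0.999, which is why confirmatory testing matters for rare alleles.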

Journal Article
TL;DR: This paper discusses how the EMBRACE data and methods ontology (EDAM) can be used as background knowledge for the composition of bioinformatics workflows and illustrates how the ability to flexibly formulate domain-specific and problem-specific constraints supports the workflow development process.
Abstract: Methods for the automatic composition of services into executable workflows need detailed knowledge about the application domain, in particular about the available services and their behavior in terms of input/output data descriptions. In this paper we discuss how the EMBRACE data and methods ontology (EDAM) can be used as background knowledge for the composition of bioinformatics workflows. We show by means of a small example domain that the EDAM knowledge facilitates finding possible workflows, but that additional knowledge is required to guide the search towards actually adequate solutions. We illustrate how the ability to flexibly formulate domain-specific and problem-specific constraints supports the workflow development process.
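The composition problem described above can be pictured as a search over services annotated with input and output data types. The sketch below is a plain breadth-first search, not the authors' synthesis method; the service names and type labels are hypothetical stand-ins for EDAM-annotated tools, and the `forbidden` set is a deliberately crude example of a problem-specific constraint.

```python
from collections import deque
from typing import AbstractSet, Dict, List, Optional, Tuple

# Hypothetical catalogue: service name -> (input data type, output data type).
SERVICES: Dict[str, Tuple[str, str]] = {
    "fetch_sequence": ("sequence identifier", "protein sequence"),
    "blast_search": ("protein sequence", "sequence hits"),
    "extract_ids": ("sequence hits", "identifier list"),
    "fetch_many": ("identifier list", "protein sequences"),
    "quick_tree": ("protein sequences", "phylogenetic tree"),
    "align": ("protein sequences", "multiple alignment"),
    "build_tree": ("multiple alignment", "phylogenetic tree"),
}


def compose(services: Dict[str, Tuple[str, str]], source: str, target: str,
            forbidden: AbstractSet[str] = frozenset(),
            max_len: int = 8) -> Optional[List[str]]:
    """Return one shortest chain of services turning `source` data into `target` data.

    Breadth-first search over data types: a service can follow another only if its
    input type matches the previous output type. Services in `forbidden` are skipped.
    """
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        dtype, path = queue.popleft()
        if dtype == target:
            return path
        if len(path) >= max_len:
            continue
        for name, (inp, out) in sorted(services.items()):
            if name in forbidden or inp != dtype or out in seen:
                continue
            seen.add(out)
            queue.append((out, path + [name]))
    return None


# Type matching alone finds a workflow, but not necessarily an adequate one...
print(compose(SERVICES, "sequence identifier", "phylogenetic tree"))
# ...so a constraint (here: no tree without an explicit alignment step) steers the search.
print(compose(SERVICES, "sequence identifier", "phylogenetic tree", forbidden={"quick_tree"}))
```

Tracking visited data types keeps the search small but returns only one workflow per reachable type; enumerating all candidate workflows would require tracking whole paths instead.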

Journal Article
TL;DR: The vaginal microbiomes of mono- and dizygotic twin pairs selected from the over 170,000 twin pairs in the Mid-Atlantic Twin Registry are examined to test the hypothesis that genes of both host and bacteria have important impacts on the vaginal microbiome.
Abstract: The vagina is an interactive interface between the host and the environment. Its surface is covered by a protective epithelium colonized by bacteria and other microorganisms. The ectocervix is nonsterile, whereas the endocervix and the upper genital tract are assumed to be sterile in healthy women. Therefore, the cervix plays a pivotal role as a gatekeeper to protect the upper genital tract from microbial invasion and subsequent reproductive pathology. Microorganisms that cross this barrier can cause preterm labor, pelvic inflammatory disease, and other gynecologic and reproductive disorders. Homeostasis of the microbiome in the vagina and ectocervix plays a paramount role in reproductive health. Depending on its composition, the microbiome may protect the vagina from infectious or non-infectious diseases, or it may enhance its susceptibility to them. Because of the nature of this organ, and the fact that it is continuously colonized by bacteria from birth to death, it is virtually certain that this rich environment evolved in concert with its microbial flora. Specific interactions dictated by the genetics of both the host and microbes are likely responsible for maintaining both the environment and the microbiome. However, the genetic basis of these interactions in both the host and the bacterial colonizers is currently unknown. _Lactobacillus_ species are associated with vaginal health, but the role of these species in the maintenance of health is not yet well defined. Similarly, other species, including those representing minor components of the overall flora, undoubtedly influence the ability of potential pathogens to thrive and cause disease. Gross alterations in the vaginal microbiome are frequently observed in women with bacterial vaginosis, but the exact etiology of this disorder is still unknown. 
There are also implications for vaginal flora in non-infectious conditions such as pregnancy, pre-term labor and birth, and possibly fertility and other aspects of women’s health. Conversely, the role of environmental factors in the maintenance of a healthy vaginal microbiome is largely unknown. To explore these issues, we have proposed to address the following questions: *1. Do the genes of the host contribute to the composition of the vaginal microbiome?* We hypothesize that genes of both host and bacteria have important impacts on the vaginal microbiome. We are addressing this question by examining the vaginal microbiomes of mono- and dizygotic twin pairs selected from the over 170,000 twin pairs in the Mid-Atlantic Twin Registry (MATR). Subsequent studies, beyond the scope of the current project, may investigate which host genes impact the microbial flora and how they do so. *2. What changes in the microbiome are associated with common non-infectious pathological states of the host?* We hypothesize that altered physiological (e.g., pregnancy) and pathologic (e.g., immune suppression) conditions, or environmental exposures (e.g., antibiotics) predictably alter the vaginal microbiome. Conversely, certain vaginal microbiome characteristics are thought to contribute to a woman’s risk for outcomes such as preterm delivery. We are addressing this question by recruiting study participants from the ~40,000 annual clinical visits to women’s clinics of the VCU Health System. *3. What changes in the vaginal microbiome are associated with relevant infectious diseases and conditions?* We hypothesize that susceptibility to infectious disease (e.g., HPV, Chlamydia infection, vaginitis, vaginosis, etc.) is impacted by the vaginal microbiome. In turn, these infectious conditions clearly can affect the ability of other bacteria to colonize and cause pathology. 
Again, we are exploring these issues by recruiting participants from visitors to women’s clinics in the VCU Health System. Three kinds of sequence data are generated in this project: i) rDNA sequences from vaginal microbes; ii) whole metagenome shotgun sequences from vaginal samples; and iii) whole genome shotgun sequences of bacterial clones selected from vaginal samples. The study includes samples from three vaginal sites: mid-vaginal, cervical, and introital. The data sets also include buccal and perianal samples from all twin participants. Samples from these additional sites are used to test the hypothesis of a per continuum spread of bacteria in relation to vaginal health. An extended set of clinical metadata associated with these sequences is deposited with dbGAP. So far we have collected over 4,400 samples from ~100 twins and over 450 clinical participants. We have analyzed and deposited data for 480 rDNA samples, eight whole metagenome shotgun samples, and over 50 complete bacterial genomes. These data are available to accredited investigators according to NIH and Human Microbiome Project (HMP) guidelines. The bacterial clones are deposited in the Biodefense and Emerging Infections Research Resources Repository ("http://www.beiresources.org/":http://www.beiresources.org/). In addition to the extensive sequence data obtained in this study, we are collecting metadata associated with each of the study participants. Specifically, participants are asked to complete an extensive health history questionnaire at the time samples are collected. Selected clinical data associated with the visit are also obtained, and relevant information is collected from the medical records when available. These data are maintained securely in a HIPAA-compliant data system as required by VCU’s Institutional Review Board (IRB). 
The preponderance of these data (i.e., those judged appropriate by NIH staff and VCU’s IRB) are deposited at dbGAP ("http://www.ncbi.nlm.nih.gov/gap":http://www.ncbi.nlm.nih.gov/gap). Selected fields of these data have been identified by NIH staff as ‘too sensitive’ and are not available in dbGAP. Individuals requiring access to these data fields are asked to contact the PI of this project or NIH Program Staff.

Journal Article
TL;DR: Dryad is a digital repository for data underlying published articles in the biosciences that grew out of a grassroots effort to support a joint data archiving policy adopted by a consortium of journals in ecology and evolutionary biology.
Abstract: Here we describe the motivation and workings of Dryad, a digital repository for data underlying published articles in the biosciences, which grew out of a grassroots effort to support a joint data archiving policy adopted by a consortium of journals in ecology and evolutionary biology.

Journal Article
TL;DR: The UniProt Knowledgebase (UniProtKB) acts as a central hub of protein knowledge by providing a unified view of protein sequence and functional information, and all information in UniProtKB is attributed to its original source.
Abstract: Data integration plays an increasingly important role in bringing together the large amounts of diverse information spread across disparate resources and presenting a comprehensive overview of these data to the scientific community. The UniProt Knowledgebase (UniProtKB) acts as a central hub of protein knowledge by providing a unified view of protein sequence and functional information. Manual and automatic annotation procedures are used to add data directly to the database while extensive cross-referencing to more than 120 external databases provides access to additional relevant information in more specialised data collections. UniProtKB also integrates data such as protein sequences, protein-protein interactions, Gene Ontology terms and official gene nomenclature from a range of resources. All information in UniProtKB is attributed to its original source, allowing users to trace the provenance of all data. In addition, UniProtKB data is made freely available in a range of formats to facilitate integration with other databases and the UniProt Consortium is committed to using and promoting common data exchange formats and technologies. This approach ensures that information is captured in the most appropriate resource for subsequent integration with other databases and also ensures maximum curation efficiency by preventing duplication of efforts across multiple resources. How UniProt achieves this data capture and integration will be presented. The UniProt resource is available at "www.uniprot.org":http://www.uniprot.org.

Journal Article
TL;DR: In this paper, the authors used glucagon as a model system for protein fibrillation and showed that fibrils grow in an intermittent fashion, with periods of growth followed by long pauses.
Abstract: Many human diseases are associated with protein aggregation and fibrillation. Using glucagon as a model system for protein fibrillation we show that fibrils grow in an intermittent fashion, with periods of growth followed by long pauses. Remarkably, even though the intrinsic transition rates vary considerably from experiment to experiment, the probability of being in the growing (stopping) state is very close to 1/4 (3/4), suggesting the presence of 4 independent conformations of the fibril tip. We discuss this possibility in terms of existing structural knowledge.
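For a tip that switches between a growing and a paused state, the long-run probability of growing in a two-state Markov model is P_grow = k_start / (k_start + k_stop), so it depends only on the ratio of the switching rates: P_grow = 1/4 whenever k_stop = 3 k_start. The sketch below assumes that simple two-state model with hypothetical rate values (the abstract gives no kinetic model) and shows that rates differing by a factor of 50 still yield an occupancy of 1/4 if their ratio is fixed. It does not reproduce the authors' four-conformation interpretation.

```python
import random


def fraction_time_growing(k_start: float, k_stop: float, n_dwells: int, seed: int = 0) -> float:
    """Simulate alternating pause/grow dwells and return the fraction of time spent growing.

    k_start: rate of leaving the paused state (pause -> grow), hypothetical units 1/s
    k_stop:  rate of leaving the growing state (grow -> pause)
    Dwell times in each state are exponentially distributed, as in a two-state Markov model.
    """
    rng = random.Random(seed)
    t_grow = t_pause = 0.0
    growing = False  # start paused; the starting state does not matter in the long run
    for _ in range(n_dwells):
        if growing:
            t_grow += rng.expovariate(k_stop)
        else:
            t_pause += rng.expovariate(k_start)
        growing = not growing
    return t_grow / (t_grow + t_pause)


def stationary_growing_probability(k_start: float, k_stop: float) -> float:
    """Analytical long-run occupancy of the growing state."""
    return k_start / (k_start + k_stop)


# Rates differing by a factor of 50 between hypothetical "experiments", same 3:1 ratio.
for k_start in (0.02, 0.2, 1.0):
    k_stop = 3 * k_start
    print(k_start, stationary_growing_probability(k_start, k_stop),
          round(fraction_time_growing(k_start, k_stop, 200_000), 3))
```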

Journal Article
TL;DR: The concept of noninvasive determination of the fetal genome by shotgun sequencing of maternal plasma is illustrated; translating this approach to maternal plasma DNA samples, together with increased sequencing depth and phase knowledge of additional parental SNPs, should enable clinically practical sequencing of the fetal genome.
Abstract: We illustrate the concept of noninvasive determination of the fetal genome by shotgun sequencing of maternal plasma. The approach is based on molecular counting of alleles in maternal cell-free DNA: the inheritance of paternal haplotypes can be determined by counting paternal-specific alleles present on each of the paternal haplotypes; the inheritance of maternal haplotypes can be revealed by counting the alleles on each maternal haplotype and determining the relative representation of the two maternal haplotypes. The concept was experimentally demonstrated by sequencing a synthetic mixture of genomic DNA samples from a child and her mother, whose whole-genome haplotypes (defined by ~800,000 SNPs), together with those of the father, were previously determined. Light sequencing (0.25x) of such a sample, containing ~16% DNA from the child, enabled the inheritance of parental haplotypes to be correctly resolved over most of the genome, and partially resolved when prior knowledge of paternal whole-genome haplotypes is absent. Translating this approach to maternal plasma DNA samples, together with increased sequencing depth and phase knowledge of additional parental SNPs, should enable clinically practical sequencing of the fetal genome.
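The two counting ideas above can be sketched in a few lines. For the paternal call, the sketch assumes SNPs where the father is heterozygous and the mother homozygous, so a paternal-specific allele can only come from fetal DNA; reads carrying such alleles are pooled per paternal haplotype within a block and the majority is called. For the maternal call, it pools reads at maternally heterozygous sites and calls the over-represented haplotype. The haplotype labels, the `min_excess` threshold and the naive majority rules are hypothetical simplifications, not the statistical procedure used in the study.

```python
from collections import Counter
from typing import Iterable, Tuple


def call_paternal_haplotype(sites: Iterable[Tuple[str, int]]) -> str:
    """Call the inherited paternal haplotype ("P1" or "P2") in one genomic block.

    Each site is (haplotype carrying the paternal-specific allele, number of plasma
    reads showing that allele). At ~0.25x depth most sites have 0 reads, so counts
    are pooled over the block before calling.
    """
    totals = Counter()
    for hap, reads in sites:
        totals[hap] += reads
    p1, p2 = totals["P1"], totals["P2"]
    if p1 == p2:
        return "undetermined"
    return "P1" if p1 > p2 else "P2"


def call_maternal_haplotype(m1_reads: int, m2_reads: int, min_excess: float = 0.02) -> str:
    """Call the inherited maternal haplotype from pooled allele counts in one block.

    m1_reads / m2_reads: reads carrying the alleles of maternal haplotype M1 / M2 at
    sites where the mother is heterozygous. Pooled over sites, the inherited haplotype
    is expected to be over-represented because fetal DNA adds copies of its alleles.
    """
    total = m1_reads + m2_reads
    if total == 0:
        return "undetermined"
    excess = m1_reads / total - 0.5
    if excess > min_excess:
        return "M1"
    if excess < -min_excess:
        return "M2"
    return "undetermined"


# Hypothetical pooled counts for one block; the stray P2 read stands in for a sequencing error.
print(call_paternal_haplotype([("P1", 1), ("P2", 0), ("P1", 2), ("P2", 1), ("P1", 0)]))
print(call_maternal_haplotype(560, 440))
```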

Journal Article
TL;DR: In this article, the effect of controlled atmospheric conditions on electrospun cellulose acetate (CA) nanofibers was evaluated for temperatures from 17.5 to 35 °C and relative humidities from 20% to 70%.
Abstract: To fabricate nanofibers with reproducible characteristics, an important requirement for many applications, the effect of controlled atmospheric conditions on the resulting electrospun cellulose acetate (CA) nanofibers was evaluated for temperatures from 17.5 to 35 °C and relative humidities from 20% to 70%. Given the potential application of nanofibers in many industries, especially membrane and filter fabrication, their reproducible production must be established to ensure commercial viability. A CA solution (0.2 g/ml) in a solvent mixture of acetone/DMF/ethanol (2:2:1) was electrospun into a nonwoven fibre mesh with fibre diameters ranging from 150 nm to 1 µm. The resulting nanofibers were observed and analyzed by scanning electron microscopy (SEM), which showed that the average fibre diameter decreased with increasing atmospheric temperature. Relative humidity had a less pronounced effect on fibre diameter, but increased humidity reduced fibre beading, yielding more consistent and therefore higher-quality fibres. Differential scanning calorimetry (DSC) showed lower melt enthalpies for finer CA nanofibers in the first heating cycle, confirming the SEM results. Of the conditions explored in this study, 25.0 °C and 50% RH gave the fibre mats most suitable for membrane use, with the highest fibre diameter uniformity and the lowest level of beading, while maintaining a low fibre diameter for increased surface area and increased pore size homogeneity. This study highlights the need to control atmospheric conditions during electrospinning in order to fabricate reproducible fibre mats.
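The solution recipe quoted above scales to any batch size with simple arithmetic. The sketch assumes, since the abstract does not say, that 0.2 g/ml means grams of CA per millilitre of mixed solvent and that the 2:2:1 acetone/DMF/ethanol ratio is by volume; `ca_solution_recipe` is a hypothetical helper, not part of the study.

```python
from typing import Dict, Sequence, Tuple


def ca_solution_recipe(solvent_ml: float, ca_g_per_ml: float = 0.2,
                       ratio: Sequence[Tuple[str, float]] = (("acetone", 2), ("DMF", 2), ("ethanol", 1)),
                       ) -> Dict[str, float]:
    """Split a solvent volume by the given volume ratio and compute the CA mass.

    Assumes the concentration is grams of cellulose acetate per ml of mixed solvent.
    """
    parts = sum(p for _, p in ratio)
    recipe = {f"{name} (ml)": solvent_ml * p / parts for name, p in ratio}
    recipe["cellulose acetate (g)"] = ca_g_per_ml * solvent_ml
    return recipe


print(ca_solution_recipe(10.0))
# acetone 4 ml, DMF 4 ml, ethanol 2 ml, cellulose acetate 2 g
```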

Journal Article
TL;DR: This study shows that porosity plays an important role in the mechanical properties of the scaffolds, which are affected not only by the macropore size but also by the interconnections of the scaffolds.
Abstract: The porous structure of biomaterials plays a critical role in improving the efficiency of biomaterials in tissue engineering. Here we successfully fabricate porous bioceramics with accurately controlled pore parameters and investigate the effect of these parameters on the mechanical properties, the proliferation of seeded cells, and the vascularization of the scaffolds. This study shows that porosity plays an important role in the mechanical properties of the scaffolds, which are affected not only by the macropore size but also by the interconnections of the scaffolds. Larger pores are beneficial for cell growth in scaffolds. In contrast, the interconnections do not affect cell growth much. The interconnections appear to limit the number of blood vessels penetrating through adjacent pores, and both the pore size and the interconnections can determine the size of blood vessels. These results may guide the design of porous biomaterial structures tailored to specific biological applications.