Showing papers in "PLOS Genetics in 2005"


Journal ArticleDOI
TL;DR: This work introduces “surrogate variable analysis” (SVA) to overcome the problems caused by heterogeneity in expression studies and shows that SVA increases the biological accuracy and reproducibility of analyses in genome-wide expression studies.
Abstract: It has unambiguously been shown that genetic, environmental, demographic, and technical factors may have substantial effects on gene expression levels. In addition to the measured variable(s) of interest, there will tend to be sources of signal due to factors that are unknown, unmeasured, or too complicated to capture through simple models. We show that failing to incorporate these sources of heterogeneity into an analysis can have widespread and detrimental effects on the study. Not only can this reduce power or induce unwanted dependence across genes, but it can also introduce sources of spurious signal to many genes. This phenomenon is true even for well-designed, randomized studies. We introduce “surrogate variable analysis” (SVA) to overcome the problems caused by heterogeneity in expression studies. SVA can be applied in conjunction with standard analysis techniques to accurately capture the relationship between expression and any modeled variables of interest. We apply SVA to disease class, time course, and genetics of gene expression studies. We show that SVA increases the biological accuracy and reproducibility of analyses in genome-wide expression studies.

1,779 citations


Journal ArticleDOI
TL;DR: In this paper, a genome-wide association scan was conducted to identify genetic variants associated with obesity-related quantitative traits in the genetically isolated population of Sardinia, and the results showed that common genetic variants in the FTO gene are associated with substantial changes in body weight.
Abstract: The obesity epidemic is responsible for a substantial economic burden in developed countries and is a major risk factor for type 2 diabetes and cardiovascular disease. The disease is the result not only of several environmental risk factors, but also of genetic predisposition. To take advantage of recent advances in gene-mapping technology, we executed a genome-wide association scan to identify genetic variants associated with obesity-related quantitative traits in the genetically isolated population of Sardinia. Initial analysis suggested that several SNPs in the FTO and PFKP genes were associated with increased BMI, hip circumference, and weight. Within the FTO gene, rs9930506 showed the strongest association with BMI (p = 8.6 × 10^-7), hip circumference (p = 3.4 × 10^-8), and weight (p = 9.1 × 10^-7). In Sardinia, homozygotes for the rare “G” allele of this SNP (minor allele frequency = 0.46) were 1.3 BMI units heavier than homozygotes for the common “A” allele. Within the PFKP gene, rs6602024 showed very strong association with BMI (p = 4.9 × 10^-6). Homozygotes for the rare “A” allele of this SNP (minor allele frequency = 0.12) were 1.8 BMI units heavier than homozygotes for the common “G” allele. To replicate our findings, we genotyped these two SNPs in the GenNet study. In European Americans (N = 1,496) and in Hispanic Americans (N = 839), we replicated significant association between rs9930506 in the FTO gene and BMI (p-value for meta-analysis of European American and Hispanic American follow-up samples, p = 0.001), weight (p = 0.001), and hip circumference (p = 0.0005). We did not replicate association between rs6602024 and obesity-related traits in the GenNet sample, although we found that in European Americans, Hispanic Americans, and African Americans, homozygotes for the rare “A” allele were, on average, 1.0–3.0 BMI units heavier than homozygotes for the more common “G” allele.
In summary, we have completed a whole-genome association scan for three obesity-related quantitative traits and report that common genetic variants in the FTO gene are associated with substantial changes in BMI, hip circumference, and body weight. These changes could have a significant impact on the risk of obesity-related morbidity in the general population.

1,619 citations


Journal ArticleDOI
TL;DR: It is shown here that MO off-targeting results in induction of a p53-dependent cell death pathway, and p53 inhibition could potentially be applicable to other systems to suppress off-target effects caused by other knockdown technologies.
Abstract: Morpholino phosphorodiamidate antisense oligonucleotides (MOs) and short interfering RNAs (siRNAs) are commonly used platforms to study gene function by sequence-specific knockdown. Both technologies, however, can elicit undesirable off-target effects. We have used several model genes to study these effects in detail in the zebrafish, Danio rerio. Using the zebrafish embryo as a template, correct and mistargeting effects are readily discernible through direct comparison of MO-injected animals with well-studied mutants. We show here indistinguishable off-targeting effects for both maternal and zygotic mRNAs and for both translational and splice-site targeting MOs. The major off-targeting effect is mediated through p53 activation, as detected through the transferase-mediated dUTP nick end labeling assay, acridine orange, and p21 transcriptional activation assays. Concurrent knockdown of p53 specifically ameliorates the cell death induced by MO off-targeting. Importantly, reversal of p53-dependent cell death by p53 knockdown does not affect specific loss of gene function, such as the cell death caused by loss of function of chordin. Interestingly, quantitative reverse-transcriptase PCR, microarrays and whole-mount in situ hybridization assays show that MO off-targeting effects are accompanied by diagnostic transcription of an N-terminal truncated p53 isoform that uses a recently recognized internal p53 promoter. We show here that MO off-targeting results in induction of a p53-dependent cell death pathway. p53 activation has also recently been shown to be an unspecified off-target effect of siRNAs. Both commonly used knockdown technologies can thus induce secondary but sequence-specific p53 activation. p53 inhibition could potentially be applicable to other systems to suppress off-target effects caused by other knockdown technologies.

1,019 citations


Journal ArticleDOI
TL;DR: It is concluded that interactions at the level of genes are not likely to generate much interaction at the level of variance, and that additive variance typically accounts for over half, and often close to 100%, of the total genetic variance.
Abstract: The relative proportion of additive and non-additive variation for complex traits is important in evolutionary biology, medicine, and agriculture. We address a long-standing controversy and paradox about the contribution of non-additive genetic variation, namely that knowledge about biological pathways and gene networks imply that epistasis is important. Yet empirical data across a range of traits and species imply that most genetic variance is additive. We evaluate the evidence from empirical studies of genetic variance components and find that additive variance typically accounts for over half, and often close to 100%, of the total genetic variance. We present new theoretical results, based upon the distribution of allele frequencies under neutral and other population genetic models, that show why this is the case even if there are non-additive effects at the level of gene action. We conclude that interactions at the level of genes are not likely to generate much interaction at the level of variance.

985 citations
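
The abstract's claim that non-additive gene action can still produce mostly additive variance can be illustrated with the standard single-locus model from quantitative genetics. The sketch below is not the paper's derivation: it covers only dominance at one locus (not epistasis), assumes Hardy-Weinberg proportions, and uses hypothetical function names. Genotypic values are +a, d, and -a, and p is the frequency of the allele whose homozygote has value +a.

```python
def variance_components(p, a, d):
    """Single-locus additive and dominance variance under Hardy-Weinberg.

    p: frequency of the allele whose homozygote has genotypic value +a
    a: half the difference between the two homozygote values
    d: heterozygote deviation from the homozygote midpoint
       (d = 0: purely additive, d = a: complete dominance)
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    q = 1.0 - p
    alpha = a + d * (q - p)          # average effect of an allele substitution
    v_a = 2.0 * p * q * alpha ** 2   # additive genetic variance
    v_d = (2.0 * p * q * d) ** 2     # dominance variance
    return v_a, v_d


def additive_fraction(p, a, d):
    """Share of the single-locus genetic variance that is additive."""
    v_a, v_d = variance_components(p, a, d)
    return v_a / (v_a + v_d)


if __name__ == "__main__":
    # Complete dominance (d = a) at three allele frequencies.
    for p in (0.1, 0.5, 0.9):
        print(p, round(additive_fraction(p, a=1.0, d=1.0), 3))
```

With complete dominance, the additive share is about 0.95 when the dominant allele is rare (p = 0.1), two-thirds at p = 0.5, and about 0.18 when the recessive allele is rare (p = 0.9). The share therefore depends strongly on allele frequency, which is why the abstract's argument rests on the distribution of allele frequencies across loci.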


Journal ArticleDOI
TL;DR: The data provide insights into the mechanism by which defects in an IFT protein, Tg737/Polaris, affect Shh signaling in the murine limb bud and support a model where cilia have a direct role in Gli processing and Shh signal transduction.
Abstract: Intraflagellar transport (IFT) proteins are essential for cilia assembly and have recently been associated with a number of developmental processes, such as left–right axis specification and limb and neural tube patterning. Genetic studies indicate that IFT proteins are required for Sonic hedgehog (Shh) signaling downstream of the Smoothened and Patched membrane proteins but upstream of the Glioma (Gli) transcription factors. However, the role that IFT proteins play in transduction of Shh signaling and the importance of cilia in this process remain unknown. Here we provide insights into the mechanism by which defects in an IFT protein, Tg737/Polaris, affect Shh signaling in the murine limb bud. Our data show that loss of Tg737 results in altered Gli3 processing that abrogates Gli3-mediated repression of Gli1 transcriptional activity. In contrast to the conclusions drawn from genetic analysis, the activity of Gli1 and truncated forms of Gli3 (Gli3R) are unaffected in Tg737 mutants at the molecular level, indicating that Tg737/Polaris is differentially involved in specific activities of the Gli proteins. Most important, a negative regulator of Shh signaling, Suppressor of fused, and the three full-length Gli transcription factors localize to the distal tip of cilia in addition to the nucleus. Thus, our data support a model where cilia have a direct role in Gli processing and Shh signal transduction.

901 citations


Journal ArticleDOI
TL;DR: A new mutation in MSTN found in the whippet dog breed that results in a double-muscled phenotype known as the “bully” whippet is described, marking the first time a mutation in the myostatin gene has been quantitatively linked to increased athletic performance.
Abstract: Double muscling is a trait previously described in several mammalian species including cattle and sheep and is caused by mutations in the myostatin (MSTN) gene (previously referred to as GDF8). Here we describe a new mutation in MSTN found in the whippet dog breed that results in a double-muscled phenotype known as the “bully” whippet. Individuals with this phenotype carry two copies of a two-base-pair deletion in the third exon of MSTN leading to a premature stop codon at amino acid 313. Individuals carrying only one copy of the mutation are, on average, more muscular than wild-type individuals (p = 7.43 × 10^-6; Kruskal-Wallis Test) and are significantly faster than individuals carrying the wild-type genotype in competitive racing events (Kendall's nonparametric measure, τ = 0.3619; p ≈ 0.00028). These results highlight the utility of performance-enhancing polymorphisms, marking the first time a mutation in MSTN has been quantitatively linked to increased athletic performance.

738 citations


Journal ArticleDOI
TL;DR: By combining the association results with results from linkage mapping in F2 crosses, this study identifies one previously known true positive and several promising new associations, but also demonstrates the existence of both false positives and false negatives.
Abstract: A potentially serious disadvantage of association mapping is the fact that marker-trait associations may arise from confounding population structure as well as from linkage to causative polymorphisms. Using genome-wide marker data, we have previously demonstrated that the problem can be severe in a global sample of 95 Arabidopsis thaliana accessions, and that established methods for controlling for population structure are generally insufficient. Here, we use the same sample together with a number of flowering-related phenotypes and data-perturbation simulations to evaluate a wider range of methods for controlling for population structure. We find that, in terms of reducing the false-positive rate while maintaining statistical power, a recently introduced mixed-model approach that takes genome-wide differences in relatedness into account via estimated pairwise kinship coefficients generally performs best. By combining the association results with results from linkage mapping in F2 crosses, we identify one previously known true positive and several promising new associations, but also demonstrate the existence of both false positives and false negatives. Our results illustrate the potential of genome-wide association scans as a tool for dissecting the genetics of natural variation, while at the same time highlighting the pitfalls. The importance of study design is clear; our study is severely under-powered both in terms of sample size and marker density. Our results also provide a striking demonstration of confounding by population structure. While statistical methods can be used to ameliorate this problem, they cannot always be effective and are certainly not a substitute for independent evidence, such as that obtained via crosses or transgenic experiments. Ultimately, association mapping is a powerful tool for identifying a list of candidates that is short enough to permit further genetic study.

735 citations


Journal ArticleDOI
TL;DR: It is proposed that the PMK-1 pathway is a specific, indispensable immunity pathway that mediates expression of secreted immune response genes, while the DAF-2–DAF-16 pathway appears to regulate immunity as part of a more general stress response.
Abstract: The PMK-1 p38 mitogen-activated protein kinase pathway and the DAF-2–DAF-16 insulin signaling pathway control Caenorhabditis elegans intestinal innate immunity. pmk-1 loss-of-function mutants have enhanced sensitivity to pathogens, while daf-2 loss-of-function mutants have enhanced resistance to pathogens that requires upregulation of the DAF-16 transcription factor. We used genetic analysis to show that the pathogen resistance of daf-2 mutants also requires PMK-1. However, genome-wide microarray analysis indicated that there was essentially no overlap between genes positively regulated by PMK-1 and DAF-16, suggesting that they form parallel pathways to promote immunity. We found that PMK-1 controls expression of candidate secreted antimicrobials, including C-type lectins, ShK toxins, and CUB-like genes. Microarray analysis demonstrated that 25% of PMK-1 positively regulated genes are induced by Pseudomonas aeruginosa infection. Using quantitative PCR, we showed that PMK-1 regulates both basal and infection-induced expression of pathogen response genes, while DAF-16 does not. Finally, we used genetic analysis to show that PMK-1 contributes to the enhanced longevity of daf-2 mutants. We propose that the PMK-1 pathway is a specific, indispensable immunity pathway that mediates expression of secreted immune response genes, while the DAF-2–DAF-16 pathway appears to regulate immunity as part of a more general stress response. The contribution of the PMK-1 pathway to the enhanced lifespan of daf-2 mutants suggests that innate immunity is an important determinant of longevity.

607 citations


Journal ArticleDOI
TL;DR: The network analysis suggests that the centrality-lethality rule is unrelated to the network architecture, but is explained by the simple fact that hubs have large numbers of PPIs, therefore high probabilities of engaging in essential PPIs.
Abstract: The protein–protein interaction (PPI) network has a small number of highly connected protein nodes (known as hubs) and many poorly connected nodes. Genome-wide studies show that deletion of a hub protein is more likely to be lethal than deletion of a non-hub protein, a phenomenon known as the centrality-lethality rule. This rule is widely believed to reflect the special importance of hubs in organizing the network, which in turn suggests the biological significance of network architectures, a key notion of systems biology. Despite the popularity of this explanation, the underlying cause of the centrality-lethality rule has never been critically examined. We here propose the concept of essential PPIs, which are PPIs that are indispensable for the survival or reproduction of an organism. Our network analysis suggests that the centrality-lethality rule is unrelated to the network architecture, but is explained by the simple fact that hubs have large numbers of PPIs, therefore high probabilities of engaging in essential PPIs. We estimate that ~3% of PPIs are essential in the yeast, accounting for ~43% of essential genes. As expected, essential PPIs are evolutionarily more conserved than nonessential PPIs. Considering the role of essential PPIs in determining gene essentiality, we find the yeast PPI network functionally more robust than random networks, yet far less robust than the potential optimum. These and other findings provide new perspectives on the biological relevance of network structure and robustness.

597 citations
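
The argument that hubs are essential simply because they have many interactions can be made concrete with a small calculation. The sketch below is a simplification, not the paper's full model: it assumes each interaction is essential independently with the same probability, takes the abstract's estimate of about 3% as the default, and ignores other causes of gene essentiality. The function name is hypothetical.

```python
def prob_essential(k, p_essential_ppi=0.03):
    """Probability that a protein with k interactions has at least one essential PPI.

    Assumes every interaction is essential independently with probability
    p_essential_ppi (the default follows the abstract's ~3% estimate).
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    return 1.0 - (1.0 - p_essential_ppi) ** k


if __name__ == "__main__":
    for k in (1, 5, 20, 50):
        print(k, round(prob_essential(k), 3))
```

Under these assumptions a protein with a single interaction is essential through a PPI with probability 0.03, while a hub with 20 interactions reaches roughly 0.46, without any appeal to the hub's position in the network.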


Journal ArticleDOI
TL;DR: The application shows that it is feasible to estimate genetic variance solely from within-family segregation and provides an independent validation of previously untestable assumptions, and will allow partitioning of genetic variation into additive and non-additive components.
Abstract: The study of continuously varying, quantitative traits is important in evolutionary biology, agriculture, and medicine. Variation in such traits is attributable to many, possibly interacting, genes whose expression may be sensitive to the environment, which makes their dissection into underlying causative factors difficult. An important population parameter for quantitative traits is heritability, the proportion of total variance that is due to genetic factors. Response to artificial and natural selection and the degree of resemblance between relatives are all a function of this parameter. Following the classic paper by R. A. Fisher in 1918, the estimation of additive and dominance genetic variance and heritability in populations is based upon the expected proportion of genes shared between different types of relatives, and explicit, often controversial and untestable models of genetic and non-genetic causes of family resemblance. With genome-wide coverage of genetic markers it is now possible to estimate such parameters solely within families using the actual degree of identity-by-descent sharing between relatives. Using genome scans on 4,401 quasi-independent sib pairs of which 3,375 pairs had phenotypes, we estimated the heritability of height from empirical genome-wide identity-by-descent sharing, which varied from 0.374 to 0.617 (mean 0.498, standard deviation 0.036). The variance in identity-by-descent sharing per chromosome and per genome was consistent with theory. The maximum likelihood estimate of the heritability for height was 0.80 with no evidence for non-genetic causes of sib resemblance, consistent with results from independent twin and family studies but using an entirely separate source of information. Our application shows that it is feasible to estimate genetic variance solely from within-family segregation and provides an independent validation of previously untestable assumptions. 
Given sufficient data, our new paradigm will allow the estimation of genetic variation for disease susceptibility and quantitative traits that is free from confounding with non-genetic factors and will allow partitioning of genetic variation into additive and non-additive components.

593 citations
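
The idea of estimating heritability from realized identity-by-descent (IBD) sharing between sibs can be sketched with a simple regression. The abstract reports a maximum likelihood estimate; the code below instead uses the older Haseman-Elston regression as a stand-in, under the assumption of a purely additive model in which the expected squared sib difference falls linearly with IBD sharing, with slope equal to minus twice the additive variance. All names and the toy data are hypothetical.

```python
def ols_slope(x, y):
    """Ordinary least squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation; slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def haseman_elston_h2(ibd_sharing, squared_diffs, phenotypic_variance):
    """Heritability from sib-pair IBD sharing (Haseman-Elston regression).

    Under an additive model, E[(y1 - y2)^2 | pi] = c - 2 * V_A * pi,
    so V_A = -slope / 2 and h2 = V_A / V_P.
    """
    slope = ols_slope(ibd_sharing, squared_diffs)
    additive_variance = -slope / 2.0
    return additive_variance / phenotypic_variance


if __name__ == "__main__":
    # Noise-free toy data built with V_P = 1 and h2 = 0.8:
    # expected squared difference = 2.0 - 1.6 * pi.
    pi_hat = [0.40, 0.45, 0.50, 0.55, 0.60]
    sq_diff = [2.0 - 1.6 * p for p in pi_hat]
    print(round(haseman_elston_h2(pi_hat, sq_diff, 1.0), 3))
```

Because realized sharing varies only slightly around 0.5 (the abstract gives a standard deviation of 0.036), real data are far noisier than this toy input, which is why thousands of sib pairs are needed.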


Journal ArticleDOI
TL;DR: The data suggest that TFL2/LHP1 recognizes specifically H3K27me3 in vivo as part of a mechanism that represses the expression of many genes targeted by PRC2.
Abstract: TERMINAL FLOWER 2/LIKE HETEROCHROMATIN PROTEIN 1 (TFL2/LHP1) is the only Arabidopsis protein with overall sequence similarity to the HETEROCHROMATIN PROTEIN 1 (HP1) family of metazoans and S. pombe. TFL2/LHP1 represses transcription of numerous genes, including the flowering-time genes FLOWERING LOCUS T (FT) and FLOWERING LOCUS C (FLC), as well as the floral organ identity genes AGAMOUS (AG) and APETALA 3 (AP3). These genes are also regulated by proteins of the Polycomb repressive complex 2 (PRC2), and it has been proposed that TFL2/LHP1 represents a potential stabilizing factor of PRC2 activity. Here we show by chromatin immunoprecipitation and hybridization to an Arabidopsis Chromosome 4 tiling array (ChIP-chip) that TFL2/LHP1 associates with hundreds of small domains, almost all of which correspond to genes located within euchromatin. We investigated the chromatin marks to which TFL2/LHP1 binds and show that, in vitro, TFL2/LHP1 binds to histone H3 di- or tri-methylated at lysine 9 (H3K9me2 or H3K9me3), the marks recognized by HP1, and to histone H3 trimethylated at lysine 27 (H3K27me3), the mark deposited by PRC2. However, in vivo TFL2/LHP1 association with chromatin occurs almost exclusively and co-extensively with domains marked by H3K27me3, but not H3K9me2 or -3. Moreover, the distribution of H3K27me3 is unaffected in lhp1 mutant plants, indicating that unlike PRC2 components, TFL2/LHP1 is not involved in the deposition of this mark. Rather, our data suggest that TFL2/LHP1 recognizes specifically H3K27me3 in vivo as part of a mechanism that represses the expression of many genes targeted by PRC2.

Journal ArticleDOI
TL;DR: A new framework for the analysis of association studies is introduced, designed to allow untyped variants to be more effectively and directly tested for association with a phenotype, and results in increased power to detect association, even in cases in which the causal variant is typed.
Abstract: We introduce a new framework for the analysis of association studies, designed to allow untyped variants to be more effectively and directly tested for association with a phenotype. The idea is to combine knowledge on patterns of correlation among SNPs (e.g., from the International HapMap project or resequencing data in a candidate region of interest) with genotype data at tag SNPs collected on a phenotyped study sample, to estimate (“impute”) unmeasured genotypes, and then assess association between the phenotype and these estimated genotypes. Compared with standard single-SNP tests, this approach results in increased power to detect association, even in cases in which the causal variant is typed, with the greatest gain occurring when multiple causal variants are present. It also provides more interpretable explanations for observed associations, including assessing, for each SNP, the strength of the evidence that it (rather than another correlated SNP) is causal. Although we focus on association studies with quantitative phenotype and a relatively restricted region (e.g., a candidate gene), the framework is applicable and computationally practical for whole genome association studies. Methods described here are implemented in a software package, Bim-Bam, available from the Stephens Lab website http://stephenslab.uchicago.edu/software.html.

Journal ArticleDOI
TL;DR: The results indicate that BMPs neither act as secondary signals downstream of Sonic Hedgehog in patterning the anteroposterior axis nor as signals from the interdigital mesenchyme in specifying digit identity, and it is found that the loss of both BMP2 and BMP4 results in a severe impairment of osteogenesis.
Abstract: Bone morphogenetic protein (BMP) family members, including BMP2, BMP4, and BMP7, are expressed throughout limb development. BMPs have been implicated in early limb patterning as well as in the process of skeletogenesis. However, due to complications associated with early embryonic lethality, particularly for Bmp2 and Bmp4, and with functional redundancy among BMP molecules, it has been difficult to decipher the specific roles of these BMP molecules during different stages of limb development. To circumvent these issues, we have constructed a series of mouse strains lacking one or more of these BMPs, using conditional alleles in the case of Bmp2 and Bmp4 to remove them specifically from the limb bud mesenchyme. Contrary to earlier suggestions, our results indicate that BMPs neither act as secondary signals downstream of Sonic Hedgehog (SHH) in patterning the anteroposterior axis nor as signals from the interdigital mesenchyme in specifying digit identity. We do find that a threshold level of BMP signaling is required for the onset of chondrogenesis, and hence some chondrogenic condensations fail to form in limbs deficient in both BMP2 and BMP4. However, in the condensations that do form, subsequent chondrogenic differentiation proceeds normally even in the absence of BMP2 and BMP7 or BMP2 and BMP4. In contrast, we find that the loss of both BMP2 and BMP4 results in a severe impairment of osteogenesis.

Journal ArticleDOI
TL;DR: The analysis of chromosome breakpoints in the proximal short arm of Chromosome 17 (17p) reveals nonallelic homologous recombination (NAHR) as a major mechanism for recurrent rearrangements, whereas nonhomologous end-joining (NHEJ) can be responsible for many of the nonrecurrent rearrangements.
Abstract: Rearrangements of our genome can be responsible for inherited as well as sporadic traits. The analyses of chromosome breakpoints in the proximal short arm of Chromosome 17 (17p) reveal nonallelic homologous recombination (NAHR) as a major mechanism for recurrent rearrangements, whereas nonhomologous end-joining (NHEJ) can be responsible for many of the nonrecurrent rearrangements. Genome architectural features consisting of low-copy repeats (LCRs), or segmental duplications, can stimulate and mediate NAHR, and there are hotspots for the crossovers within the LCRs. Rearrangements introduce variation into our genome for selection to act upon and as such serve an evolutionary function analogous to base pair changes. Genomic rearrangements may cause Mendelian diseases, produce complex traits such as behaviors, or represent benign polymorphic changes. The mechanisms by which rearrangements convey phenotypes are diverse and include gene dosage, gene interruption, generation of a fusion gene, position effects, unmasking of recessive coding region mutations (single nucleotide polymorphisms, SNPs, in coding DNA) or other functional SNPs, and perhaps by effects on transvection.

Journal ArticleDOI
TL;DR: The results suggest that regulatory polymorphism is widespread in the human genome and show that the 5-kb (phase I) HapMap has sufficient density to enable linkage disequilibrium mapping in humans.
Abstract: The exploration of quantitative variation in human populations has become one of the major priorities for medical genetics. The successful identification of variants that contribute to complex traits is highly dependent on reliable assays and genetic maps. We have performed a genome-wide quantitative trait analysis of 630 genes in 60 unrelated Utah residents with ancestry from Northern and Western Europe using the publicly available phase I data of the International HapMap project. The genes are located in regions of the human genome with elevated functional annotation and disease interest including the ENCODE regions spanning 1% of the genome, Chromosome 21 and Chromosome 20q12–13.2. We apply three different methods of multiple test correction, including Bonferroni, false discovery rate, and permutations. For the 374 expressed genes, we find many regions with statistically significant association of single nucleotide polymorphisms (SNPs) with expression variation in lymphoblastoid cell lines after correcting for multiple tests. Based on our analyses, the signal proximal (cis-) to the genes of interest is more abundant and more stable than distal and trans across statistical methodologies. Our results suggest that regulatory polymorphism is widespread in the human genome and show that the 5-kb (phase I) HapMap has sufficient density to enable linkage disequilibrium mapping in humans. Such studies will significantly enhance our ability to annotate the non-coding part of the genome and interpret functional variation. In addition, we demonstrate that the HapMap cell lines themselves may serve as a useful resource for quantitative measurements at the cellular level.
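
Two of the three multiple-test corrections named in the abstract can be sketched in a few lines. The abstract does not say which false discovery rate procedure was used, so the Benjamini-Hochberg step-up procedure below is an assumption chosen because it is the most common one. The function names and the p-values are made up for illustration, and permutation testing is not shown.

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject each hypothesis whose p-value is at most alpha / m."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the FDR at alpha.

    Finds the largest rank k with p_(k) <= (k / m) * alpha and rejects the
    k hypotheses with the smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            cutoff_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical p-values for eight SNP-expression tests.
    pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
    print(sum(bonferroni(pvals)), sum(benjamini_hochberg(pvals)))
```

On these toy values Bonferroni rejects one hypothesis and Benjamini-Hochberg rejects two, which shows the usual trade-off: controlling the false discovery rate is less conservative than controlling the family-wise error rate.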

Journal ArticleDOI
TL;DR: Analysis of the 993-locus dataset corroborates earlier results: if enough markers are used with a sufficiently large worldwide sample, individuals can be partitioned into genetic clusters that match major geographic subdivisions of the globe, with some individuals from intermediate geographic locations having mixed membership in the clusters that correspond to neighboring regions.
Abstract: Previously, we observed that without using prior information about individual sampling locations, a clustering algorithm applied to multilocus genotypes from worldwide human populations produced genetic clusters largely coincident with major geographic regions. It has been argued, however, that the degree of clustering is diminished by use of samples with greater uniformity in geographic distribution, and that the clusters we identified were a consequence of uneven sampling along genetic clines. Expanding our earlier dataset from 377 to 993 markers, we systematically examine the influence of several study design variables—sample size, number of loci, number of clusters, assumptions about correlations in allele frequencies across populations, and the geographic dispersion of the sample—on the “clusteredness” of individuals. With all other variables held constant, geographic dispersion is seen to have comparatively little effect on the degree of clustering. Examination of the relationship between genetic and geographic distance supports a view in which the clusters arise not as an artifact of the sampling scheme, but from small discontinuous jumps in genetic distance for most population pairs on opposite sides of geographic barriers, in comparison with genetic distance for pairs on the same side. Thus, analysis of the 993-locus dataset corroborates our earlier results: if enough markers are used with a sufficiently large worldwide sample, individuals can be partitioned into genetic clusters that match major geographic subdivisions of the globe, with some individuals from intermediate geographic locations having mixed membership in the clusters that correspond to neighboring regions.

Journal ArticleDOI
TL;DR: Significant evidence for heritability of many medically important traits, including cardiovascular function and personality is found, and evidence for heterogeneity by age and sex suggests that models allowing for these differences will be important in mapping quantitative traits.
Abstract: In family studies, phenotypic similarities between relatives yield information on the overall contribution of genes to trait variation. Large samples are important for these family studies, especially when comparing heritability between subgroups such as young and old, or males and females. We recruited a cohort of 6,148 participants, aged 14–102 y, from four clustered towns in Sardinia. The cohort includes 34,469 relative pairs. To extract genetic information, we implemented software for variance components heritability analysis, designed to handle large pedigrees, analyze multiple traits simultaneously, and model heterogeneity. Here, we report heritability analyses for 98 quantitative traits, focusing on facets of personality and cardiovascular function. We also summarize results of bivariate analyses for all pairs of traits and of heterogeneity analyses for each trait. We found a significant genetic component for every trait. On average, genetic effects explained 40% of the variance for 38 blood tests, 51% for five anthropometric measures, 25% for 20 measures of cardiovascular function, and 19% for 35 personality traits. Four traits showed significant evidence for an X-linked component. Bivariate analyses suggested overlapping genetic determinants for many traits, including multiple personality facets and several traits related to the metabolic syndrome; but we found no evidence for shared genetic determinants that might underlie the reported association of some personality traits and cardiovascular risk factors. Models allowing for heterogeneity suggested that, in this cohort, the genetic variance was typically larger in females and in younger individuals, but interesting exceptions were observed. For example, narrow heritability of blood pressure was approximately 26% in individuals more than 42 y old, but only approximately 8% in younger individuals. 
Despite the heterogeneity in effect sizes, the same loci appear to contribute to variance in young and old, and in males and females. In summary, we find significant evidence for heritability of many medically important traits, including cardiovascular function and personality. Evidence for heterogeneity by age and sex suggests that models allowing for these differences will be important in mapping quantitative traits.

Journal ArticleDOI
TL;DR: The findings demonstrate the existence of tissue-specific expression of tRNA species that strongly implicates a role for tRNA heterogeneity in regulating translation and possibly additional processes in vertebrate organisms.
Abstract: Over 450 transfer RNA (tRNA) genes have been annotated in the human genome. Reliable quantitation of tRNA levels in human samples using microarray methods presents a technical challenge. We have developed a microarray method to quantify tRNAs based on a fluorescent dye-labeling technique. The first-generation tRNA microarray consists of 42 probes for nuclear encoded tRNAs and 21 probes for mitochondrial encoded tRNAs. These probes cover tRNAs for all 20 amino acids and 11 isoacceptor families. Using this array, we report that the amounts of tRNA within the total cellular RNA vary widely among eight different human tissues. The brain expresses higher overall levels of nuclear encoded tRNAs than every tissue examined but one and higher levels of mitochondrial encoded tRNAs than every tissue examined. We found tissue-specific differences in the expression of individual tRNA species, and tRNAs decoding amino acids with similar chemical properties exhibited coordinated expression in distinct tissue types. Relative tRNA abundance exhibits a statistically significant correlation to the codon usage of a collection of highly expressed, tissue-specific genes in a subset of tissues or tRNA isoacceptors. Our findings demonstrate the existence of tissue-specific expression of tRNA species that strongly implicates a role for tRNA heterogeneity in regulating translation and possibly additional processes in vertebrate organisms.

Journal ArticleDOI
TL;DR: This response to floral induction by photoperiod and temperature has a genetic basis that is distinct from the known genetic pathways of floral transition, and appears to correlate with changes in RNA processing.
Abstract: The transition to flowering is an important event in the plant life cycle and is modulated by several environmental factors including photoperiod, light quality, vernalization, and growth temperature, as well as biotic and abiotic stresses. In contrast to light and vernalization, little is known about the pathways that mediate the responses to other environmental variables. A mild increase in growth temperature, from 23 °C to 27 °C, is equally efficient in inducing flowering of Arabidopsis plants grown in 8-h short days as is transfer to 16-h long days. There is extensive natural variation in this response, and we identify strains with contrasting thermal reaction norms. Exploiting this natural variation, we show that FLOWERING LOCUS C potently suppresses thermal induction, and that the closely related floral repressor FLOWERING LOCUS M is a major-effect quantitative trait locus modulating thermosensitivity. Thermal induction does not require the photoperiod effector CONSTANS, acts upstream of the floral integrator FLOWERING LOCUS T, and depends on the hormone gibberellin. Analysis of mutants defective in salicylic acid biosynthesis suggests that thermal induction is independent of previously identified stress-signaling pathways. Microarray analyses confirm that the genomic responses to floral induction by photoperiod and temperature differ. Furthermore, we report that gene products that participate in RNA splicing are specifically affected by thermal induction. Above a critical threshold, even small changes in temperature can act as cues for the induction of flowering. This response has a genetic basis that is distinct from the known genetic pathways of floral transition, and appears to correlate with changes in RNA processing.

Journal ArticleDOI
TL;DR: It is found that dietary restriction extends C. elegans' lifespan by down-regulating expression of key genes, including a gene required for methylation of many macromolecules, and that integrin signaling is likely to play a general, evolutionarily conserved role in lifespan regulation.
Abstract: Most of our knowledge about the regulation of aging comes from mutants originally isolated for other phenotypes. To ask whether our current view of aging has been affected by selection bias, and to deepen our understanding of known longevity pathways, we screened a genomic Caenorhabditis elegans RNAi library for clones that extend lifespan. We identified 23 new longevity genes affecting signal transduction, the stress response, gene expression, and metabolism and assigned these genes to specific longevity pathways. Our most important findings are (i) that dietary restriction extends C. elegans' lifespan by down-regulating expression of key genes, including a gene required for methylation of many macromolecules, (ii) that integrin signaling is likely to play a general, evolutionarily conserved role in lifespan regulation, and (iii) that specific lipophilic hormones may influence lifespan in a DAF-16/FOXO-dependent fashion. Surprisingly, of the new genes that have conserved sequence domains, only one could not be associated with a known longevity pathway. Thus, our current view of the genetics of aging has probably not been distorted substantially by selection bias.

Journal ArticleDOI
TL;DR: It is found that the majority of miRNAs are not essential for the viability or development of C. elegans, and mutations in most miRNA genes do not result in grossly abnormal phenotypes, consistent with the hypothesis that there is significant functional redundancy among miRNAs or among gene pathways regulated by miRNAs.
Abstract: MicroRNAs (miRNAs), a large class of short noncoding RNAs found in many plants and animals, often act to post-transcriptionally inhibit gene expression. We report the generation of deletion mutations in 87 miRNA genes in Caenorhabditis elegans, expanding the number of mutated miRNA genes to 95, or 83% of known C. elegans miRNAs. We find that the majority of miRNAs are not essential for the viability or development of C. elegans, and mutations in most miRNA genes do not result in grossly abnormal phenotypes. These observations are consistent with the hypothesis that there is significant functional redundancy among miRNAs or among gene pathways regulated by miRNAs. This study represents the first comprehensive genetic analysis of miRNA function in any organism and provides a unique, permanent resource for the systematic study of miRNAs.

Journal ArticleDOI
TL;DR: It is found that recent adaptation is strikingly pervasive in the human genome, with as much as 10% of the genome affected by linkage to a selective sweep.
Abstract: Identifying genomic locations that have experienced selective sweeps is an important first step toward understanding the molecular basis of adaptive evolution. Using statistical methods that account for the confounding effects of population demography, recombination rate variation, and single-nucleotide polymorphism ascertainment, while also providing fine-scale estimates of the position of the selected site, we analyzed a genomic dataset of 1.2 million human single-nucleotide polymorphisms genotyped in African-American, European-American, and Chinese samples. We identify 101 regions of the human genome with very strong evidence (p < 10⁻⁵) of a recent selective sweep and where our estimate of the position of the selective sweep falls within 100 kb of a known gene. Within these regions, genes of biological interest include genes in pigmentation pathways, components of the dystrophin protein complex, clusters of olfactory receptors, genes involved in nervous system development and function, immune system genes, and heat shock genes. We also observe consistent evidence of selective sweeps in centromeric regions. In general, we find that recent adaptation is strikingly pervasive in the human genome, with as much as 10% of the genome affected by linkage to a selective sweep.

Journal ArticleDOI
TL;DR: In this article, the authors describe the genomes of eight newly sequenced isolates and combine them with the first four genomes for a comprehensive analysis of the core (shared by all isolates) and flexible genes of the Prochlorococcus group, and the patterns of loss and gain of the flexible genes over the course of evolution.
Abstract: Prochlorococcus is a marine cyanobacterium that numerically dominates the mid-latitude oceans and is the smallest known oxygenic phototroph. Numerous isolates from diverse areas of the world’s oceans have been studied and shown to be physiologically and genetically distinct. All isolates described thus far can be assigned to either a tightly clustered high-light (HL)-adapted clade, or a more divergent low-light (LL)-adapted group. The 16S rRNA sequences of the entire Prochlorococcus group differ by at most 3%, and the four initially published genomes revealed patterns of genetic differentiation that help explain physiological differences among the isolates. Here we describe the genomes of eight newly sequenced isolates and combine them with the first four genomes for a comprehensive analysis of the core (shared by all isolates) and flexible genes of the Prochlorococcus group, and the patterns of loss and gain of the flexible genes over the course of evolution. There are 1,273 genes that represent the core shared by all 12 genomes. They are apparently sufficient, according to metabolic reconstruction, to encode a functional cell. We describe a phylogeny for all 12 isolates by subjecting their complete proteomes to three different phylogenetic analyses. For each non-core gene, we used a maximum parsimony method to estimate which ancestor likely first acquired or lost each gene. Many of the genetic differences among isolates, especially for genes involved in outer membrane synthesis and nutrient transport, are found within the same clade. Nevertheless, we identified some genes defining HL and LL ecotypes, and clades within these broad ecotypes, helping to demonstrate the basis of HL and LL adaptations in Prochlorococcus. Furthermore, our estimates of gene gain events allow us to identify highly variable genomic islands that are not apparent through simple pairwise comparisons. 
These results emphasize the functional roles, especially those connected to outer membrane synthesis and transport that dominate the flexible genome and set it apart from the core. Besides identifying islands and demonstrating their role throughout the history of Prochlorococcus, reconstruction of past gene gains and losses shows that much of the variability exists at the ‘‘leaves of the tree,’’ between the most closely related strains. Finally, the identification of core and flexible genes from this 12-genome comparison is largely consistent with the relative frequency of Prochlorococcus genes found in global ocean metagenomic databases, further closing the gap between our understanding of these organisms in the lab and the wild.

Journal ArticleDOI
TL;DR: Surprisingly, the results demonstrate that yellow skin does not originate from the red junglefowl, the presumed sole wild ancestor of the domestic chicken, but most likely from the closely related grey junglefowl (Gallus sonneratii).
Abstract: Yellow skin is an abundant phenotype among domestic chickens and is caused by a recessive allele (W*Y) that allows deposition of yellow carotenoids in the skin. Here we show that yellow skin is caused by one or more cis-acting and tissue-specific regulatory mutation(s) that inhibit expression of BCDO2 (beta-carotene dioxygenase 2) in skin. Our data imply that carotenoids are taken up from the circulation in both genotypes but are degraded by BCDO2 in skin from animals carrying the white skin allele (W*W). Surprisingly, our results demonstrate that yellow skin does not originate from the red junglefowl (Gallus gallus), the presumed sole wild ancestor of the domestic chicken, but most likely from the closely related grey junglefowl (Gallus sonneratii). This is the first conclusive evidence for a hybrid origin of the domestic chicken, and it has important implications for our views of the domestication process.

Journal ArticleDOI
TL;DR: Comparative genomics found 202 genomic elements that are highly conserved in vertebrates but show evidence of significantly accelerated substitution rates in human, and found the five most accelerated elements are dramatically changed in human but not in other primates.
Abstract: Comparative genomics allows us to search the human genome for segments that were extensively changed in the last approximately 5 million years since divergence from our common ancestor with chimpanzee, but are highly conserved in other species and thus are likely to be functional. We found 202 genomic elements that are highly conserved in vertebrates but show evidence of significantly accelerated substitution rates in human. These are mostly in non-coding DNA, often near genes associated with transcription and DNA binding. Resequencing confirmed that the five most accelerated elements are dramatically changed in human but not in other primates, with seven times more substitutions in human than in chimp. The accelerated elements, and in particular the top five, show a strong bias for adenine and thymine to guanine and cytosine nucleotide changes and are disproportionately located in high recombination and high guanine and cytosine content environments near telomeres, suggesting either biased gene conversion or isochore selection. In addition, there is some evidence of directional selection in the regions containing the two most accelerated regions. A combination of evolutionary forces has contributed to accelerated evolution of the fastest evolving elements in the human genome.

Journal ArticleDOI
TL;DR: This work uses microarray and genetic marker data from an F2 mouse intercross to examine the large-scale organization of the gene co-expression network in liver, and annotates several gene modules in terms of 22 physiological traits.
Abstract: Systems biology approaches that are based on the genetics of gene expression have been fruitful in identifying genetic regulatory loci related to complex traits. We use microarray and genetic marker data from an F2 mouse intercross to examine the large-scale organization of the gene co-expression network in liver, and annotate several gene modules in terms of 22 physiological traits. We identify chromosomal loci (referred to as module quantitative trait loci, mQTL) that perturb the modules and describe a novel approach that integrates network properties with genetic marker information to model gene/trait relationships. Specifically, using the mQTL and the intramodular connectivity of a body weight–related module, we describe which factors determine the relationship between gene expression profiles and weight. Our approach results in the identification of genetic targets that influence gene modules (pathways) that are related to the clinical phenotypes of interest.

Journal ArticleDOI
TL;DR: It is concluded that although there are clearly strains of S. cerevisiae specialized for the production of alcoholic beverages, these have been derived from natural populations unassociated with alcoholic beverage production, rather than the opposite.
Abstract: Saccharomyces cerevisiae is predominantly found in association with human activities, particularly the production of alcoholic beverages. S. paradoxus, the closest known relative of S. cerevisiae, is commonly found on exudates and bark of deciduous trees and in associated soils. This has led to the idea that S. cerevisiae is a domesticated species, specialized for the fermentation of alcoholic beverages, and isolates of S. cerevisiae from other sources simply represent migrants from these fermentations. We have surveyed DNA sequence diversity at five loci in 81 strains of S. cerevisiae that were isolated from a variety of human and natural fermentations as well as sources unrelated to alcoholic beverage production, such as tree exudates and immunocompromised patients. Diversity within vineyard

Journal ArticleDOI
TL;DR: The feasibility of genome-wide association mapping in A. thaliana was tested by searching for associations with flowering time and pathogen resistance in a sample of 95 accessions for which genome-wide polymorphism data were available, and known major genes were identified for all phenotypes tested.
Abstract: There is currently tremendous interest in the possibility of using genome-wide association mapping to identify genes responsible for natural variation, particularly for human disease susceptibility. The model plant Arabidopsis thaliana is in many ways an ideal candidate for such studies, because it is a highly selfing hermaphrodite. As a result, the species largely exists as a collection of naturally occurring inbred lines, or accessions, which can be genotyped once and phenotyped repeatedly. Furthermore, linkage disequilibrium in such a species will be much more extensive than in a comparable outcrossing species. We tested the feasibility of genome-wide association mapping in A. thaliana by searching for associations with flowering time and pathogen resistance in a sample of 95 accessions for which genome-wide polymorphism data were available. In spite of an extremely high rate of false positives due to population structure, we were able to identify known major genes for all phenotypes tested, thus demonstrating the potential of genome-wide association mapping in A. thaliana and other species with similar patterns of variation. The rate of false positives differed strongly between traits, with more clinal traits showing the highest rate. However, the false positive rates were always substantial regardless of the trait, highlighting the necessity of an appropriate genomic control in association studies.

Journal ArticleDOI
TL;DR: It is found that only 22%–24% of the bona fide human ERα binding sites were overlapping conserved regions in whole genome vertebrate alignments, which suggests limited conservation of functional binding sites.
Abstract: Using a chromatin immunoprecipitation-paired end diTag cloning and sequencing strategy, we mapped estrogen receptor alpha (ERalpha) binding sites in MCF-7 breast cancer cells. We identified 1,234 high confidence binding clusters of which 94% are projected to be bona fide ERalpha binding regions. Only 5% of the mapped estrogen receptor binding sites are located within 5 kb upstream of the transcriptional start sites of adjacent genes, regions containing the proximal promoters, whereas the vast majority of the sites are mapped to intronic or distal locations (>5 kb from 5' and 3' ends of adjacent transcript), suggesting transcriptional regulatory mechanisms over significant physical distances. Of all the identified sites, 71% harbored putative full estrogen response elements (EREs), 25% bore ERE half sites, and only 4% had no recognizable ERE sequences. Genes in the vicinity of ERalpha binding sites were enriched for regulation by estradiol in MCF-7 cells, and their expression profiles in patient samples segregate ERalpha-positive from ERalpha-negative breast tumors. The expression dynamics of the genes adjacent to ERalpha binding sites suggest a direct induction of gene expression through binding to ERE-like sequences, whereas transcriptional repression by ERalpha appears to be through indirect mechanisms. Our analysis also indicates a number of candidate transcription factor binding sites adjacent to occupied EREs at frequencies much greater than by chance, including the previously reported FOXA1 sites, and demonstrates the potential involvement of one such putative adjacent factor, Sp1, in the global regulation of ERalpha target genes. Unexpectedly, we found that only 22%-24% of the bona fide human ERalpha binding sites were overlapping conserved regions in whole genome vertebrate alignments, which suggests limited conservation of functional binding sites.
Taken together, this genome-scale analysis suggests complex but definable rules governing ERalpha binding and gene regulation.

Journal ArticleDOI
TL;DR: The results suggest that insulin-signaling pathways play a role in regulation of aging at any stage in life in Caenorhabditis elegans.
Abstract: Evolutionarily conserved mechanisms that control aging are predicted to have prereproductive functions in order to be subject to natural selection. Genes that are essential for growth and development are highly conserved in evolution, but their role in longevity has not previously been assessed. We screened 2,700 genes essential for Caenorhabditis elegans development and identified 64 genes that extend lifespan when inactivated postdevelopmentally. These candidate lifespan regulators are highly conserved from yeast to humans. Classification of the candidate lifespan regulators into functional groups identified the expected insulin and metabolic pathways but also revealed enrichment for translation, RNA, and chromatin factors. Many of these essential gene inactivations extend lifespan as much as the strongest known regulators of aging. Early gene inactivations of these essential genes caused growth arrest at larval stages, and some of these arrested animals live much longer than wild-type adults. daf-16 is required for the enhanced survival of arrested larvae, suggesting that the increased longevity is a physiological response to the essential gene inactivation. These results suggest that insulin-signaling pathways play a role in regulation of aging at any stage in life.