
Showing papers in "PLOS Genetics in 2009"


Journal Article
TL;DR: It is found that imputation accuracy can be greatly enhanced by expanding the reference panel to contain thousands of chromosomes and that IMPUTE v2 outperforms other methods in this setting at both rare and common SNPs, with overall error rates that are 15%–20% lower than those of the closest competing method.
Abstract: Genotype imputation methods are now being widely used in the analysis of genome-wide association studies. Most imputation analyses to date have used the HapMap as a reference dataset, but new reference panels (such as controls genotyped on multiple SNP chips and densely typed samples from the 1,000 Genomes Project) will soon allow a broader range of SNPs to be imputed with higher accuracy, thereby increasing power. We describe a genotype imputation method (IMPUTE version 2) that is designed to address the challenges presented by these new datasets. The main innovation of our approach is a flexible modelling framework that increases accuracy and combines information across multiple reference panels while remaining computationally feasible. We find that IMPUTE v2 attains higher accuracy than other methods when the HapMap provides the sole reference panel, but that the size of the panel constrains the improvements that can be made. We also find that imputation accuracy can be greatly enhanced by expanding the reference panel to contain thousands of chromosomes and that IMPUTE v2 outperforms other methods in this setting at both rare and common SNPs, with overall error rates that are 15%–20% lower than those of the closest competing method. One particularly challenging aspect of next-generation association studies is to integrate information across multiple reference panels genotyped on different sets of SNPs; we show that our approach to this problem has practical advantages over other suggested solutions.

3,902 citations


Journal Article
TL;DR: Combining the demographic model with a previously estimated distribution of selective effects among newly arising amino acid mutations accurately predicts the frequency spectrum of nonsynonymous variants across three continental populations (YRI, CHB, CEU).
Abstract: Demographic models built from genetic data play important roles in illuminating prehistorical events and serving as null models in genome scans for selection. We introduce an inference method based on the joint frequency spectrum of genetic variants within and between populations. For candidate models we numerically compute the expected spectrum using a diffusion approximation to the one-locus, two-allele Wright-Fisher process, involving up to three simultaneous populations. Our approach is a composite likelihood scheme, since linkage between neutral loci alters the variance but not the expectation of the frequency spectrum. We thus use bootstraps incorporating linkage to estimate uncertainties for parameters and significance values for hypothesis tests. Our method can also incorporate selection on single sites, predicting the joint distribution of selected alleles among populations experiencing a bevy of evolutionary forces, including expansions, contractions, migrations, and admixture. We model human expansion out of Africa and the settlement of the New World, using 5 Mb of noncoding DNA resequenced in 68 individuals from 4 populations (YRI, CHB, CEU, and MXL) by the Environmental Genome Project. We infer divergence between West African and Eurasian populations 140 thousand years ago (95% confidence interval: 40–270 kya). This is earlier than other genetic studies, in part because we incorporate migration. We estimate the European (CEU) and East Asian (CHB) divergence time to be 23 kya (95% c.i.: 17–43 kya), long after archeological evidence places modern humans in Europe. Finally, we estimate divergence between East Asians (CHB) and Mexican-Americans (MXL) of 22 kya (95% c.i.: 16.3–26.9 kya), and our analysis yields no evidence for subsequent migration. 
Furthermore, combining our demographic model with a previously estimated distribution of selective effects among newly arising amino acid mutations accurately predicts the frequency spectrum of nonsynonymous variants across three continental populations (YRI, CHB, CEU).

1,636 citations


Journal Article
TL;DR: An important adaptive role for metabolism diversification within group B2 and Shigella strains is found, but few or no extraintestinal virulence-specific genes are identified, which could make it difficult to develop a vaccine against extraintestinal infections.
Abstract: The Escherichia coli species represents one of the best-studied model organisms, but also encompasses a variety of commensal and pathogenic strains that diversify by high rates of genetic change. We uniformly (re-) annotated the genomes of 20 commensal and pathogenic E. coli strains and one strain of E. fergusonii (the closest E. coli related species), including seven that we sequenced to completion. Within the approximately 18,000 families of orthologous genes, we found approximately 2,000 common to all strains. Although recombination rates are much higher than mutation rates, we show, both theoretically and using phylogenetic inference, that this does not obscure the phylogenetic signal, which places the B2 phylogenetic group and one group D strain at the basal position. Based on this phylogeny, we inferred past evolutionary events of gain and loss of genes, identifying functional classes under opposite selection pressures. We found an important adaptive role for metabolism diversification within group B2 and Shigella strains, but identified few or no extraintestinal virulence-specific genes, which could render difficult the development of a vaccine against extraintestinal infections. Genome flux in E. coli is confined to a small number of conserved positions in the chromosome, which most often are not associated with integrases or tRNA genes. Core genes flanking some of these regions show higher rates of recombination, suggesting that a gene, once acquired by a strain, spreads within the species by homologous recombination at the flanking genes. Finally, the genome's long-scale structure of recombination indicates lower recombination rates, but not higher mutation rates, at the terminus of replication. The ensuing effect of background selection and biased gene conversion may thus explain why this region is A+T-rich and shows high sequence divergence but low sequence polymorphism. 
Overall, despite a very high gene flow, genes co-exist in an organised genome.

1,213 citations


Journal Article
TL;DR: It is demonstrated that resequencing studies can identify important genetic associations, provided that specialised analysis methods, such as the weighted-sum method, are used.
Abstract: Resequencing is an emerging tool for identification of rare disease-associated mutations. Rare mutations are difficult to tag with SNP genotyping, as genotyping studies are designed to detect common variants. However, studies have shown that genetic heterogeneity is a probable scenario for common diseases, in which multiple rare mutations together explain a large proportion of the genetic basis for the disease. Thus, we propose a weighted-sum method to jointly analyse a group of mutations in order to test for groupwise association with disease status. For example, such a group of mutations may result from resequencing a gene. We compare the proposed weighted-sum method to alternative methods and show that it is powerful for identifying disease-associated genes, both on simulated and Encode data. Using the weighted-sum method, a resequencing study can identify a disease-associated gene with an overall population attributable risk (PAR) of 2%, even when each individual mutation has much lower PAR, using 1,000 to 7,000 affected and unaffected individuals, depending on the underlying genetic model. This study thus demonstrates that resequencing studies can identify important genetic associations, provided that specialised analysis methods, such as the weighted-sum method, are used.

1,092 citations


Journal Article
TL;DR: This work provides novel insight into the role of aging and the environment in susceptibility to diseases such as cancer and critically informs the field of epigenomics by providing evidence of epigenetic dysregulation by age-related methylation alterations.
Abstract: Epigenetic control of gene transcription is critical for normal human development and cellular differentiation. While alterations of epigenetic marks such as DNA methylation have been linked to cancers and many other human diseases, interindividual epigenetic variations in normal tissues due to aging, environmental factors, or innate susceptibility are poorly characterized. The plasticity, tissue-specific nature, and variability of gene expression are related to epigenomic states that vary across individuals. Thus, population-based investigations are needed to further our understanding of the fundamental dynamics of normal individual epigenomes. We analyzed 217 non-pathologic human tissues from 10 anatomic sites at 1,413 autosomal CpG loci associated with 773 genes to investigate tissue-specific differences in DNA methylation and to discern how aging and exposures contribute to normal variation in methylation. Methylation profile classes derived from unsupervised modeling were significantly associated with age (P<0.0001) and were significant predictors of tissue origin (P<0.0001). In solid tissues (n = 119) we found striking, highly significant CpG island-dependent correlations between age and methylation; loci in CpG islands gained methylation with age, loci not in CpG islands lost methylation with age (P<0.001), and this pattern was consistent across tissues and in an analysis of blood-derived DNA. Our data clearly demonstrate age- and exposure-related differences in tissue-specific methylation and significant age-associated methylation patterns which are CpG island context-dependent. This work provides novel insight into the role of aging and the environment in susceptibility to diseases such as cancer and critically informs the field of epigenomics by providing evidence of epigenetic dysregulation by age-related methylation alterations. 
Collectively we reveal key issues to consider both in the construction of reference and disease-related epigenomes and in the interpretation of potentially pathologically important alterations.

1,005 citations


Journal Article
TL;DR: It is proposed that breakage of replication forks in stressed cells that are deficient in homologous recombination induces an aberrant repair process with features of break-induced replication (BIR), in which single-strand 3′ tails from broken replication forks anneal with microhomology on any single-stranded DNA nearby, priming low-processivity polymerization with multiple template switches that generate complex rearrangements, before processive replication is eventually re-established.
Abstract: Chromosome structural changes with nonrecurrent endpoints associated with genomic disorders offer windows into the mechanism of origin of copy number variation (CNV). A recent report of nonrecurrent duplications associated with Pelizaeus-Merzbacher disease identified three distinctive characteristics. First, the majority of events can be seen to be complex, showing discontinuous duplications mixed with deletions, inverted duplications, and triplications. Second, junctions at endpoints show microhomology of 2–5 base pairs (bp). Third, endpoints occur near pre-existing low copy repeats (LCRs). Using these observations and evidence from DNA repair in other organisms, we derive a model of microhomology-mediated break-induced replication (MMBIR) for the origin of CNV and, ultimately, of LCRs. We propose that breakage of replication forks in stressed cells that are deficient in homologous recombination induces an aberrant repair process with features of break-induced replication (BIR). Under these circumstances, single-strand 3′ tails from broken replication forks will anneal with microhomology on any single-stranded DNA nearby, priming low-processivity polymerization with multiple template switches generating complex rearrangements, and eventual re-establishment of processive replication.

763 citations


Journal Article
TL;DR: The demonstration that numerous epialleles across the genome can be stable over many generations in the absence of selection or extensive DNA sequence variation highlights the need to integrate epigenetic information into population genetics studies.
Abstract: Loss or gain of DNA methylation can affect gene expression and is sometimes transmitted across generations. Such epigenetic alterations are thus a possible source of heritable phenotypic variation in the absence of DNA sequence change. However, attempts to assess the prevalence of stable epigenetic variation in natural and experimental populations and to quantify its impact on complex traits have been hampered by the confounding effects of DNA sequence polymorphisms. To overcome this problem as much as possible, two parents with little DNA sequence differences, but contrasting DNA methylation profiles, were used to derive a panel of epigenetic Recombinant Inbred Lines (epiRILs) in the reference plant Arabidopsis thaliana. The epiRILs showed variation and high heritability for flowering time and plant height (~30%), as well as stable inheritance of multiple parental DNA methylation variants (epialleles) over at least eight generations. These findings provide a first rationale to identify epiallelic variants that contribute to heritable variation in complex traits using linkage or association studies. More generally, the demonstration that numerous epialleles across the genome can be stable over many generations in the absence of selection or extensive DNA sequence variation highlights the need to integrate epigenetic information into population genetics studies.

743 citations


Journal Article
TL;DR: A genome-wide association study in a homogenous case-control cohort from Bergen, Norway, with the top 100 single nucleotide polymorphisms (SNPs) evaluated in the family-based International COPD Genetics Network (ICGN), found two SNPs at the α-nicotinic acetylcholine receptor (CHRNA 3/5) locus that showed unambiguous replication and were significantly associated with lung function in both the ICGN and Boston Early-Onset COPD populations.
Abstract: There is considerable variability in the susceptibility of smokers to develop chronic obstructive pulmonary disease (COPD). The only known genetic risk factor is severe deficiency of α1-antitrypsin, which is present in 1–2% of individuals with COPD. We conducted a genome-wide association study (GWAS) in a homogenous case-control cohort from Bergen, Norway (823 COPD cases and 810 smoking controls) and evaluated the top 100 single nucleotide polymorphisms (SNPs) in the family-based International COPD Genetics Network (ICGN; 1891 Caucasian individuals from 606 pedigrees) study. The polymorphisms that showed replication were further evaluated in 389 subjects from the US National Emphysema Treatment Trial (NETT) and 472 controls from the Normative Aging Study (NAS) and then in a fourth cohort of 949 individuals from 127 extended pedigrees from the Boston Early-Onset COPD population. Logistic regression models with adjustments of covariates were used to analyze the case-control populations. Family-based association analyses were conducted for a diagnosis of COPD and lung function in the family populations. Two SNPs at the alpha-nicotinic acetylcholine receptor (CHRNA 3/5) locus were identified in the genome-wide association study. They showed unambiguous replication in the ICGN family-based analysis and in the NETT case-control analysis with combined p-values of 1.48 × 10⁻¹⁰ (rs8034191) and 5.74 × 10⁻¹⁰ (rs1051730). Furthermore, these SNPs were significantly associated with lung function in both the ICGN and Boston Early-Onset COPD populations. The C allele of the rs8034191 SNP was estimated to have a population attributable risk for COPD of 12.2%. The association of hedgehog interacting protein (HHIP) locus on chromosome 4 was also consistently replicated, but did not reach genome-wide significance levels.
Genome-wide significant association of the HHIP locus with lung function was identified in the Framingham Heart study (Wilk et al., companion article in this issue of PLoS Genetics; doi:10.1371/journal.pgen.1000429). The CHRNA 3/5 and the HHIP loci make a significant contribution to the risk of COPD. CHRNA3/5 is the same locus that has been implicated in the risk of lung cancer.

723 citations


Journal Article
TL;DR: It is shown that the scarcity of non-protein-coding RNAs identified in genetic screens is explicable by an historic emphasis, both phenotypically and technically, on mutations in protein-coding sequences, and by presumptions about the nature of regulatory mutations; most variations in regulatory sequences produce relatively subtle phenotypic changes, in contrast to mutations in protein-coding sequences, which frequently cause catastrophic component failure.
Abstract: The majority of the genome in animals and plants is transcribed in a developmentally regulated manner to produce large numbers of non–protein-coding RNAs (ncRNAs), whose incidence increases with developmental complexity. There is growing evidence that these transcripts are functional, particularly in the regulation of epigenetic processes, leading to the suggestion that they compose a hitherto hidden layer of genomic programming in humans and other complex organisms. However, to date, very few have been identified in genetic screens. Here I show that this is explicable by an historic emphasis, both phenotypically and technically, on mutations in protein-coding sequences, and by presumptions about the nature of regulatory mutations. Most variations in regulatory sequences produce relatively subtle phenotypic changes, in contrast to mutations in protein-coding sequences that frequently cause catastrophic component failure. Until recently, most mapping projects have focused on protein-coding sequences, and the limited number of identified regulatory mutations have been interpreted as affecting conventional cis-acting promoter and enhancer elements, although these regions are often themselves transcribed. Moreover, ncRNA-directed regulatory circuits underpin most, if not all, complex genetic phenomena in eukaryotes, including RNA interference-related processes such as transcriptional and post-transcriptional gene silencing, position effect variegation, hybrid dysgenesis, chromosome dosage compensation, parental imprinting and allelic exclusion, paramutation, and possibly transvection and transinduction. The next frontier is the identification and functional characterization of the myriad sequence variations that influence quantitative traits, disease susceptibility, and other complex characteristics, which are being shown by genome-wide association studies to lie mostly in noncoding, presumably regulatory, regions. 
There is every possibility that many of these variations will alter the interactions between regulatory RNAs and their targets, a prospect that should be borne in mind in future functional analyses.

687 citations


Journal Article
Cecilia M. Lindgren, Iris M. Heid, Joshua C. Randall, Claudia Lamina, and 152 more authors (36 institutions)
TL;DR: By focusing on anthropometric measures of central obesity and fat distribution, a meta-analysis of 16 genome-wide association studies informative for adult waist circumference and waist–hip ratio identified three loci implicated in the regulation of human adiposity.
Abstract: To identify genetic loci influencing central obesity and fat distribution, we performed a meta-analysis of 16 genome-wide association studies (GWAS, N = 38,580) informative for adult waist circumference (WC) and waist-hip ratio (WHR). We selected 26 SNPs for follow-up, for which the evidence of association with measures of central adiposity (WC and/or WHR) was strong and disproportionate to that for overall adiposity or height. Follow-up studies in a maximum of 70,689 individuals identified two loci strongly associated with measures of central adiposity; these map near TFAP2B (WC, P = 1.9 × 10⁻¹¹) and MSRA (WC, P = 8.9 × 10⁻⁹). A third locus, near LYPLAL1, was associated with WHR in women only (P = 2.6 × 10⁻⁸). The variants near TFAP2B appear to influence central adiposity through an effect on overall obesity/fat-mass, whereas LYPLAL1 displays a strong female-only association with fat distribution. By focusing on anthropometric measures of central obesity and fat distribution, we have identified three loci implicated in the regulation of human adiposity.

648 citations


Journal Article
TL;DR: These studies illustrate the importance of time sampling with respect to multiple testing, suggest caution in use of autonomous cellular models to study clock output, and demonstrate the existence of harmonics of circadian gene expression in the mouse.
Abstract: The circadian clock is a molecular and cellular oscillator found in most mammalian tissues that regulates rhythmic physiology and behavior. Numerous investigations have addressed the contribution of circadian rhythmicity to cellular, organ, and organismal physiology. We recently developed a method to look at transcriptional oscillations with unprecedented precision and accuracy using high-density time sampling. Here, we report a comparison of oscillating transcription from mouse liver, NIH3T3, and U2OS cells. Several surprising observations resulted from this study, including a 100-fold difference in the number of cycling transcripts in autonomous cellular models of the oscillator versus tissues harvested from intact mice. Strikingly, we found two clusters of genes that cycle at the second and third harmonic of circadian rhythmicity in liver, but not cultured cells. Validation experiments show that 12-hour oscillatory transcripts occur in several other peripheral tissues as well including heart, kidney, and lungs. These harmonics are lost ex vivo, as well as under restricted feeding conditions. Taken in sum, these studies illustrate the importance of time sampling with respect to multiple testing, suggest caution in use of autonomous cellular models to study clock output, and demonstrate the existence of harmonics of circadian gene expression in the mouse.

Journal Article
TL;DR: It is argued that the statistical power to detect a causative variant should be the major criterion in study design and that, when taking budgetary considerations into account, the most powerful design may not always correspond to the chip with the highest coverage.
Abstract: Genome-wide association studies are revolutionizing the search for the genes underlying human complex diseases. The main decisions to be made at the design stage of these studies are the choice of the commercial genotyping chip to be used and the numbers of case and control samples to be genotyped. The most common method of comparing different chips is using a measure of coverage, but this fails to properly account for the effects of sample size, the genetic model of the disease, and linkage disequilibrium between SNPs. In this paper, we argue that the statistical power to detect a causative variant should be the major criterion in study design. Because of the complicated pattern of linkage disequilibrium (LD) in the human genome, power cannot be calculated analytically and must instead be assessed by simulation. We describe in detail a method of simulating case-control samples at a set of linked SNPs that replicates the patterns of LD in human populations, and we used it to assess power for a comprehensive set of available genotyping chips. Our results allow us to compare the performance of the chips to detect variants with different effect sizes and allele frequencies, look at how power changes with sample size in different populations or when using multi-marker tags and genotype imputation approaches, and how performance compares to a hypothetical chip that contains every SNP in HapMap. A main conclusion of this study is that marked differences in genome coverage may not translate into appreciable differences in power and that, when taking budgetary considerations into account, the most powerful design may not always correspond to the chip with the highest coverage. We also show that genotype imputation can be used to boost the power of many chips up to the level obtained from a hypothetical “complete” chip containing all the SNPs in HapMap. 
Our results have been encapsulated into an R software package that allows users to design future association studies and our methods provide a framework with which new chip sets can be evaluated.

Journal Article
TL;DR: The first genome-wide association study (GWAS) whose sample size (1,053 Swedish subjects) is sufficiently powered to detect genome-wide significance (p < 1.5 × 10⁻⁷) for polymorphisms that modestly alter therapeutic warfarin dose is reported.
Abstract: We report the first genome-wide association study (GWAS) whose sample size (1,053 Swedish subjects) is sufficiently powered to detect genome-wide significance (p < 1.5 × 10⁻⁷) for polymorphisms that modestly alter therapeutic warfarin dose. The anticoagulant drug warfarin is widely prescribed for reducing the risk of stroke, thrombosis, pulmonary embolism, and coronary malfunction. However, Caucasians vary widely (20-fold) in the dose needed for therapeutic anticoagulation, and hence prescribed doses may be too low (risking serious illness) or too high (risking severe bleeding). Prior work established that approximately 30% of the dose variance is explained by single nucleotide polymorphisms (SNPs) in the warfarin drug target VKORC1 and another approximately 12% by two non-synonymous SNPs (*2, *3) in the cytochrome P450 warfarin-metabolizing gene CYP2C9. We initially tested each of 325,997 GWAS SNPs for association with warfarin dose by univariate regression and found the strongest statistical signals (p < 10⁻⁷⁸) at SNPs clustering near VKORC1 and the second lowest p-values (p < 10⁻³¹) emanating from CYP2C9. No other SNPs approached genome-wide significance. To enhance detection of weaker effects, we conducted multiple regression adjusting for known influences on warfarin dose (VKORC1, CYP2C9, age, gender) and identified a single SNP (rs2108622) with genome-wide significance (p = 8.3 × 10⁻¹⁰) that alters protein coding of the CYP4F2 gene. We confirmed this result in 588 additional Swedish patients (p < 0.0029) and, during our investigation, a second group provided independent confirmation from a scan of warfarin-metabolizing genes. We also thoroughly investigated copy number variations, haplotypes, and imputed SNPs, but found no additional highly significant warfarin associations.
We present power analysis of our GWAS that is generalizable to other studies, and conclude we had 80% power to detect genome-wide significance for common causative variants or markers explaining at least 1.5% of dose variance. These GWAS results provide further impetus for conducting large-scale trials assessing patient benefit from genotype-based forecasting of warfarin dose.
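
The genome-wide significance threshold quoted in this abstract (p < 1.5 × 10⁻⁷) is numerically consistent with a Bonferroni correction over the 325,997 SNPs tested. The abstract does not state how the threshold was derived, so the correction method and the family-wise error rate of 0.05 used below are assumptions; the helper name `bonferroni_threshold` is hypothetical. A minimal sketch of the arithmetic:

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# 325,997 is the SNP count given in the abstract.
# The family-wise alpha of 0.05 is an assumption, not stated in the abstract.
threshold = bonferroni_threshold(0.05, 325_997)
print(f"{threshold:.2e}")  # prints 1.53e-07
```

Under these assumptions the per-SNP threshold comes out at roughly 1.5 × 10⁻⁷, which matches the figure reported in the abstract.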

Journal Article
TL;DR: A meta-analysis of genome-wide association scans from 14 studies totalling 28,141 participants of European descent identified 954 SNPs, distributed across nine loci, that exceeded the threshold of genome-wide significance; five of the loci are novel.
Abstract: Elevated serum uric acid levels cause gout and are a risk factor for cardiovascular disease and diabetes. To investigate the polygenetic basis of serum uric acid levels, we conducted a meta-analysis of genome-wide association scans from 14 studies totalling 28,141 participants of European descent, resulting in identification of 954 SNPs distributed across nine loci that exceeded the threshold of genome-wide significance, five of which are novel. Overall, the common variants associated with serum uric acid levels fall in the following nine regions: SLC2A9 (p = 5.2 × 10⁻²⁰¹), ABCG2 (p = 3.1 × 10⁻²⁶), SLC17A1 (p = 3.0 × 10⁻¹⁴), SLC22A11 (p = 6.7 × 10⁻¹⁴), SLC22A12 (p = 2.0 × 10⁻⁹), SLC16A9 (p = 1.1 × 10⁻⁸), GCKR (p = 1.4 × 10⁻⁹), LRRC16A (p = 8.5 × 10⁻⁹), and near PDZK1 (p = 2.7 × 10⁻⁹). Identified variants were analyzed for gender differences. We found that the minor allele for rs734553 in SLC2A9 has greater influence in lowering uric acid levels in women and the minor allele of rs2231142 in ABCG2 elevates uric acid levels more strongly in men compared to women. To further characterize the identified variants, we analyzed their association with a panel of metabolites. rs12356193 within SLC16A9 was associated with DL-carnitine (p = 4.0 × 10⁻²⁶) and propionyl-L-carnitine (p = 5.0 × 10⁻⁸) concentrations, which in turn were associated with serum UA levels (p = 1.4 × 10⁻⁵⁷ and p = 8.1 × 10⁻⁵⁴, respectively), forming a triangle between SNP, metabolites, and UA levels. Taken together, these associations highlight additional pathways that are important in the regulation of serum uric acid levels and point toward novel potential targets for pharmacological intervention to prevent or treat hyperuricemia. In addition, these findings strongly support the hypothesis that transport proteins are key in regulating serum uric acid levels.

Journal Article
TL;DR: The first panel of MAGIC lines developed is presented, a set of 527 recombinant inbred lines descended from a heterogeneous stock of 19 intermated accessions of the plant Arabidopsis thaliana, and it is shown how the power to detect a QTL and the mapping accuracy vary, depending on QTL location.
Abstract: Identifying natural allelic variation that underlies quantitative trait variation remains a fundamental problem in genetics. Most studies have employed either simple synthetic populations with restricted allelic variation or performed association mapping on a sample of naturally occurring haplotypes. Both of these approaches have some limitations, therefore alternative resources for the genetic dissection of complex traits continue to be sought. Here we describe one such alternative, the Multiparent Advanced Generation Inter-Cross (MAGIC). This approach is expected to improve the precision with which QTL can be mapped, improving the outlook for QTL cloning. Here, we present the first panel of MAGIC lines developed: a set of 527 recombinant inbred lines (RILs) descended from a heterogeneous stock of 19 intermated accessions of the plant Arabidopsis thaliana. These lines and the 19 founders were genotyped with 1,260 single nucleotide polymorphisms and phenotyped for development-related traits. Analytical methods were developed to fine-map quantitative trait loci (QTL) in the MAGIC lines by reconstructing the genome of each line as a mosaic of the founders. We show by simulation that QTL explaining 10% of the phenotypic variance will be detected in most situations with an average mapping error of about 300 kb, and that if the number of lines were doubled the mapping error would be under 200 kb. We also show how the power to detect a QTL and the mapping accuracy vary, depending on QTL location. We demonstrate the utility of this new mapping population by mapping several known QTL with high precision and by finding novel QTL for germination data and bolting time. Our results provide strong support for similar ongoing efforts to produce MAGIC lines in other organisms.

Journal Article
TL;DR: HAPMIX will be of particular utility for mapping disease genes in recently admixed populations, as its accurate estimates of local ancestry permit admixture and case-control association signals to be combined, enabling more powerful tests of association than with either signal alone.
Abstract: Identifying the ancestry of chromosomal segments of distinct ancestry has a wide range of applications from disease mapping to learning about history. Most methods require the use of unlinked markers; but, using all markers from genome-wide scanning arrays, it should in principle be possible to infer the ancestry of even very small segments with exquisite accuracy. We describe a method, HAPMIX, which employs an explicit population genetic model to perform such local ancestry inference based on fine-scale variation data. We show that HAPMIX outperforms other methods, and we explore its utility for inferring ancestry, learning about ancestral populations, and inferring dates of admixture. We validate the method empirically by applying it to populations that have experienced recent and ancient admixture: 935 African Americans from the United States and 29 Mozabites from North Africa. HAPMIX will be of particular utility for mapping disease genes in recently admixed populations, as its accurate estimates of local ancestry permit admixture and case-control association signals to be combined, enabling more powerful tests of association than with either signal alone.

Journal ArticleDOI
TL;DR: Signaling allocation analysis showed that, contrary to current ideas, each of the JA, ET, and SA signaling sectors can positively contribute to immunity against both biotrophic and necrotrophic pathogens.
Abstract: Two modes of plant immunity against biotrophic pathogens, Effector Triggered Immunity (ETI) and Pattern-Triggered Immunity (PTI), are triggered by recognition of pathogen effectors and Microbe-Associated Molecular Patterns (MAMPs), respectively. Although the jasmonic acid (JA)/ethylene (ET) and salicylic acid (SA) signaling sectors are generally antagonistic and important for immunity against necrotrophic and biotrophic pathogens, respectively, their precise roles and interactions in ETI and PTI have not been clear. We constructed an Arabidopsis dde2/ein2/pad4/sid2-quadruple mutant. DDE2, EIN2, and SID2 are essential components of the JA, ET, and SA sectors, respectively. The pad4 mutation affects the SA sector and a poorly characterized sector. Although the ETI triggered by the bacterial effector AvrRpt2 (AvrRpt2-ETI) and the PTI triggered by the bacterial MAMP flg22 (flg22-PTI) were largely intact in plants with mutations in any one of these genes, they were mostly abolished in the quadruple mutant. For the purposes of this study, AvrRpt2-ETI and flg22-PTI were measured as relative growth of Pseudomonas syringae bacteria within leaves. Immunity to the necrotrophic fungal pathogen Alternaria brassicicola was also severely compromised in the quadruple mutant. Quantitative measurements of the immunity levels in all combinatorial mutants and wild type allowed us to estimate the effects of the wild-type genes and their interactions on the immunity by fitting a mixed general linear model. This signaling allocation analysis showed that, contrary to current ideas, each of the JA, ET, and SA signaling sectors can positively contribute to immunity against both biotrophic and necrotrophic pathogens. The analysis also revealed that while flg22-PTI and AvrRpt2-ETI use a highly overlapping signaling network, the way they use the common network is very different: synergistic relationships among the signaling sectors are evident in PTI, which may amplify the signal; compensatory relationships among the sectors dominate in ETI, explaining the robustness of ETI against genetic and pathogenic perturbations.

Journal ArticleDOI
TL;DR: A level of structural diversity between the inbred lines B73 and Mo17 that is unprecedented among higher eukaryotes is revealed, and hundreds of single-copy, expressed genes may contribute to heterosis and to the extraordinary phenotypic diversity of this important crop.
Abstract: Following the domestication of maize over the past ∼10,000 years, breeders have exploited the extensive genetic diversity of this species to mold its phenotype to meet human needs. The extent of structural variation, including copy number variation (CNV) and presence/absence variation (PAV), which are thought to contribute to the extraordinary phenotypic diversity and plasticity of this important crop, has not been elucidated. Whole-genome, array-based, comparative genomic hybridization (CGH) revealed a level of structural diversity between the inbred lines B73 and Mo17 that is unprecedented among higher eukaryotes. A detailed analysis of altered segments of DNA conservatively estimates that there are several hundred CNV sequences between the two genotypes, as well as several thousand PAV sequences that are present in B73 but not Mo17. Haplotype-specific PAVs contain hundreds of single-copy, expressed genes that may contribute to heterosis and to the extraordinary phenotypic diversity of this important crop.

Journal ArticleDOI
Gil McVean
TL;DR: For SNP data the projection of samples onto the principal components can be obtained directly from considering the average coalescent times between pairs of haploid genomes, which provides a framework for interpreting PCA projections in terms of underlying processes, including migration, geographical isolation, and admixture.
Abstract: Principal components analysis, PCA, is a statistical method commonly used in population genetics to identify structure in the distribution of genetic variation across geographical location and ethnic background. However, while the method is often used to inform about historical demographic processes, little is known about the relationship between fundamental demographic parameters and the projection of samples onto the primary axes. Here I show that for SNP data the projection of samples onto the principal components can be obtained directly from considering the average coalescent times between pairs of haploid genomes. The result provides a framework for interpreting PCA projections in terms of underlying processes, including migration, geographical isolation, and admixture. I also demonstrate a link between PCA and Wright's FST and show that SNP ascertainment has a largely simple and predictable effect on the projection of samples. Using examples from human genetics, I discuss the application of these results to empirical data and the implications for inference.
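
As a rough illustration of what "projection of samples onto the principal components" means for SNP data, the following is a minimal sketch and not the author's method: it centres a samples-by-SNPs matrix of 0/1 alleles and takes its singular value decomposition. The function name `pca_project`, the simulated allele frequencies, and the two hypothetical populations are all assumptions made for the example.

```python
import numpy as np

def pca_project(genotypes, n_components=2):
    """Project samples onto the leading principal components.

    `genotypes` is a (samples x SNPs) array of allele counts.
    """
    X = np.asarray(genotypes, dtype=float)
    X = X - X.mean(axis=0)  # centre each SNP column
    # SVD of the centred matrix; U * S gives the sample coordinates
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

# Simulated data (hypothetical): two populations, 20 haploid genomes each,
# with allele frequencies that differ at 200 SNPs.
rng = np.random.default_rng(0)
freq_a = rng.uniform(0.1, 0.9, size=200)
freq_b = np.clip(freq_a + rng.normal(0, 0.2, size=200), 0.05, 0.95)
pop_a = rng.binomial(1, freq_a, size=(20, 200))
pop_b = rng.binomial(1, freq_b, size=(20, 200))

coords = pca_project(np.vstack([pop_a, pop_b]))
print(coords.shape)
```

Plotting the two columns of `coords` against each other gives the familiar PCA scatter; how the clusters relate to coalescent times is the subject of the paper itself and is not reproduced here.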

Journal ArticleDOI
TL;DR: Sequence analysis localised a single MHC vitamin D response element (VDRE) to the promoter region of HLA-DRB1, and functional assays showed that vitamin D increases expression from the MS-associated HLA-DRB1*15 promoter through this element.
Abstract: Multiple sclerosis (MS) is a complex trait in which allelic variation in the MHC class II region exerts the single strongest effect on genetic risk. Epidemiological data in MS provide strong evidence that environmental factors act at a population level to influence the unusual geographical distribution of this disease. Growing evidence implicates sunlight or vitamin D as a key environmental factor in aetiology. We hypothesised that this environmental candidate might interact with inherited factors and sought responsive regulatory elements in the MHC class II region. Sequence analysis localised a single MHC vitamin D response element (VDRE) to the promoter region of HLA-DRB1. Sequencing of this promoter in greater than 1,000 chromosomes from HLA-DRB1 homozygotes showed absolute conservation of this putative VDRE on HLA-DRB1*15 haplotypes. In contrast, there was striking variation among non-MS-associated haplotypes. Electrophoretic mobility shift assays showed specific recruitment of vitamin D receptor to the VDRE in the HLA-DRB1*15 promoter, confirmed by chromatin immunoprecipitation experiments using lymphoblastoid cells homozygous for HLA-DRB1*15. Transient transfection using a luciferase reporter assay showed a functional role for this VDRE. B cells transiently transfected with the HLA-DRB1*15 gene promoter showed increased expression on stimulation with 1,25-dihydroxyvitamin D3 (P = 0.002) that was lost both on deletion of the VDRE or with the homologous "VDRE" sequence found in non-MS-associated HLA-DRB1 haplotypes. Flow cytometric analysis showed a specific increase in the cell surface expression of HLA-DRB1 upon addition of vitamin D only in HLA-DRB1*15 bearing lymphoblastoid cells. This study further implicates vitamin D as a strong environmental candidate in MS by demonstrating direct functional interaction with the major locus determining genetic susceptibility. These findings support a connection between the main epidemiological and genetic features of this disease with major practical implications for studies of disease mechanism and prevention.

Journal ArticleDOI
TL;DR: It is shown that increased oxidative stress caused by deletion of sod genes does not result in decreased lifespan in C. elegans and that deletion of sod-2 extends worm lifespan by altering mitochondrial function, a conclusion supported by decreased oxygen consumption in sod-2 mutant worms.
Abstract: The oxidative stress theory of aging postulates that aging results from the accumulation of molecular damage caused by reactive oxygen species (ROS) generated during normal metabolism. Superoxide dismutases (SODs) counteract this process by detoxifying superoxide. It has previously been shown that elimination of either cytoplasmic or mitochondrial SOD in yeast, flies, and mice results in decreased lifespan. In this experiment, we examine the effect of eliminating each of the five individual sod genes present in Caenorhabditis elegans. In contrast to what is observed in other model organisms, none of the sod deletion mutants shows decreased lifespan compared to wild-type worms, despite a clear increase in sensitivity to paraquat- and juglone-induced oxidative stress. In fact, even mutants lacking combinations of two or three sod genes survive at least as long as wild-type worms. Examination of gene expression in these mutants reveals mild compensatory up-regulation of other sod genes. Interestingly, we find that sod-2 mutants are long-lived despite a significant increase in oxidatively damaged proteins. Testing the effect of sod-2 deletion on known pathways of lifespan extension reveals a clear interaction with genes that affect mitochondrial function: sod-2 deletion markedly increases lifespan in clk-1 worms while clearly decreasing the lifespan of isp-1 worms. Combined with the mitochondrial localization of SOD-2 and the fact that sod-2 mutant worms exhibit phenotypes that are characteristic of long-lived mitochondrial mutants—including slow development, low brood size, and slow defecation—this suggests that deletion of sod-2 extends lifespan through a similar mechanism. This conclusion is supported by our demonstration of decreased oxygen consumption in sod-2 mutant worms. Overall, we show that increased oxidative stress caused by deletion of sod genes does not result in decreased lifespan in C. elegans and that deletion of sod-2 extends worm lifespan by altering mitochondrial function.

Journal ArticleDOI
TL;DR: Findings reveal an active and inducible mechanism of persister formation mediated by the SOS response, challenging the prevailing view that persisters are pre-existing and formed purely by stochastic means.
Abstract: Bacteria can survive antibiotic treatment without acquiring heritable antibiotic resistance. We investigated persistence to the fluoroquinolone ciprofloxacin in Escherichia coli. Our data show that a majority of persisters to ciprofloxacin were formed upon exposure to the antibiotic, in a manner dependent on the SOS gene network. These findings reveal an active and inducible mechanism of persister formation mediated by the SOS response, challenging the prevailing view that persisters are pre-existing and formed purely by stochastic means. SOS-induced persistence is a novel mechanism by which cells can counteract DNA damage and promote survival to fluoroquinolones. This unique survival mechanism may be an important factor influencing the outcome of antibiotic therapy in vivo.

Journal ArticleDOI
TL;DR: A statistical method that takes a list of disease regions and automatically assesses the degree of relatedness of implicated genes using 250,000 PubMed abstracts, and offers a statistically robust approach to identifying functionally related genes from across multiple disease regions—that likely represent key disease pathways.
Abstract: Translating a set of disease regions into insight about pathogenic mechanisms requires not only the ability to identify the key disease genes within them, but also the biological relationships among those key genes. Here we describe a statistical method, Gene Relationships Among Implicated Loci (GRAIL), that takes a list of disease regions and automatically assesses the degree of relatedness of implicated genes using 250,000 PubMed abstracts. We first evaluated GRAIL by assessing its ability to identify subsets of highly related genes in common pathways from validated lipid and height SNP associations from recent genome-wide studies. We then tested GRAIL, by assessing its ability to separate true disease regions from many false positive disease regions in two separate practical applications in human genetics. First, we took 74 nominally associated Crohn's disease SNPs and applied GRAIL to identify a subset of 13 SNPs with highly related genes. Of these, ten convincingly validated in follow-up genotyping; genotyping results for the remaining three were inconclusive. Next, we applied GRAIL to 165 rare deletion events seen in schizophrenia cases (less than one-third of which are contributing to disease risk). We demonstrate that GRAIL is able to identify a subset of 16 deletions containing highly related genes; many of these genes are expressed in the central nervous system and play a role in neuronal synapses. GRAIL offers a statistically robust approach to identifying functionally related genes from across multiple disease regions—that likely represent key disease pathways. An online version of this method is available for public use (http://www.broad.mit.edu/mpg/grail/).

Journal ArticleDOI
TL;DR: These data suggest that very few schizophrenia patients share identical genomic causation, potentially complicating efforts to personalize treatment regimens and support the emerging view that rare deleterious variants may be more important in schizophrenia predisposition than common polymorphisms.
Abstract: We report a genome-wide assessment of single nucleotide polymorphisms (SNPs) and copy number variants (CNVs) in schizophrenia. We investigated SNPs using 871 patients and 863 controls, following up the top hits in four independent cohorts comprising 1,460 patients and 12,995 controls, all of European origin. We found no genome-wide significant associations, nor could we provide support for any previously reported candidate gene or genome-wide associations. We went on to examine CNVs using a subset of 1,013 cases and 1,084 controls of European ancestry, and a further set of 60 cases and 64 controls of African ancestry. We found that eight cases and zero controls carried deletions greater than 2 Mb, of which two, at 8p22 and 16p13.11-p12.4, are newly reported here. A further evaluation of 1,378 controls identified no deletions greater than 2 Mb, suggesting a high prior probability of disease involvement when such deletions are observed in cases. We also provide further evidence for some smaller, previously reported, schizophrenia-associated CNVs, such as those in NRXN1 and APBA2. We could not provide strong support for the hypothesis that schizophrenia patients have a significantly greater “load” of large (>100 kb), rare CNVs, nor could we find common CNVs that associate with schizophrenia. Finally, we did not provide support for the suggestion that schizophrenia-associated CNVs may preferentially disrupt genes in neurodevelopmental pathways. Collectively, these analyses provide the first integrated study of SNPs and CNVs in schizophrenia and support the emerging view that rare deleterious variants may be more important in schizophrenia predisposition than common polymorphisms. While our analyses do not suggest that implicated CNVs impinge on particular key pathways, we do support the contribution of specific genomic regions in schizophrenia, presumably due to recurrent mutation. On balance, these data suggest that very few schizophrenia patients share identical genomic causation, potentially complicating efforts to personalize treatment regimens.

Journal ArticleDOI
TL;DR: It is shown that 88 putative TA system candidates are present in M. tuberculosis, considerably more than previously thought, and that four systems are specifically activated during stresses likely encountered in vivo, including hypoxia and phagocytosis by macrophages.
Abstract: Toxin-antitoxin (TA) systems, stress-responsive genetic elements ubiquitous in microbial genomes, are unusually abundant in the major human pathogen Mycobacterium tuberculosis. Why M. tuberculosis has so many TA systems and what role they play in the unique biology of the pathogen is unknown. To address these questions, we have taken a comprehensive approach to identify and functionally characterize all the TA systems encoded in the M. tuberculosis genome. Here we show that 88 putative TA system candidates are present in M. tuberculosis, considerably more than previously thought. Comparative genomic analysis revealed that the vast majority of these systems are conserved in the M. tuberculosis complex (MTBC), but largely absent from other mycobacteria, including close relatives of M. tuberculosis. We found that many of the M. tuberculosis TA systems are located within discernable genomic islands and were thus likely acquired recently via horizontal gene transfer. We discovered a novel TA system located in the core genome that is conserved across the genus, suggesting that it may fulfill a role common to all mycobacteria. By expressing each of the putative TA systems in M. smegmatis, we demonstrate that 30 encode a functional toxin and its cognate antitoxin. We show that the toxins of the largest family of TA systems, VapBC, act by inhibiting translation via mRNA cleavage. Expression profiling demonstrated that four systems are specifically activated during stresses likely encountered in vivo, including hypoxia and phagocytosis by macrophages. The expansion and maintenance of TA genes in the MTBC, coupled with the finding that a subset is transcriptionally activated by stress, suggests that TA systems are important for M. tuberculosis pathogenesis.

Journal ArticleDOI
TL;DR: The analyses reveal a dominant role for selection in shaping genomic diversity and divergence patterns, show that long-term selection explains the large intragenomic variation in human/chimpanzee divergence, and provide a baseline for investigating specific selective events.
Abstract: Selection acting on genomic functional elements can be detected by its indirect effects on population diversity at linked neutral sites. To illuminate the selective forces that shaped hominid evolution, we analyzed the genomic distributions of human polymorphisms and sequence differences among five primate species relative to the locations of conserved sequence features. Neutral sequence diversity in human and ancestral hominid populations is substantially reduced near such features, resulting in a surprisingly large genome average diversity reduction due to selection of 19–26% on the autosomes and 12–40% on the X chromosome. The overall trends are broadly consistent with “background selection” or hitchhiking in ancestral populations acting to remove deleterious variants. Average selection is much stronger on exonic (both protein-coding and untranslated) conserved features than non-exonic features. Long term selection, rather than complex speciation scenarios, explains the large intragenomic variation in human/chimpanzee divergence. Our analyses reveal a dominant role for selection in shaping genomic diversity and divergence patterns, clarify hominid evolution, and provide a baseline for investigating specific selective events.

Journal ArticleDOI
TL;DR: Current hypotheses regarding the biological roles of these evolutionarily successful small operons are discussed and the various selective forces that could drive the maintenance of TA systems in bacterial genomes are considered.
Abstract: Bacterial toxin–antitoxin (TA) systems are diverse and widespread in the prokaryotic kingdom. They are composed of closely linked genes encoding a stable toxin that can harm the host cell and its cognate labile antitoxin, which protects the host from the toxin's deleterious effect. TA systems are thought to invade bacterial genomes through horizontal gene transfer. Some TA systems might behave as selfish elements and favour their own maintenance at the expense of their host. As a consequence, they may contribute to the maintenance of plasmids or genomic islands, such as super-integrons, by post-segregational killing of the cell that loses these genes and so suffers the stable toxin's destructive effect. The function of the chromosomally encoded TA systems is less clear and still open to debate. This Review discusses current hypotheses regarding the biological roles of these evolutionarily successful small operons. We consider the various selective forces that could drive the maintenance of TA systems in bacterial genomes.

Journal ArticleDOI
TL;DR: This paper found that the average allele frequency divergence is highly predictive of the most extreme FST values across the whole genome and that the geographic distribution of putatively selected alleles almost invariably conforms to population clusters identified using randomly chosen genetic markers.
Abstract: Various observations argue for a role of adaptation in recent human evolution, including results from genome-wide studies and analyses of selection signals at candidate genes. Here, we use genome-wide SNP data from the HapMap and CEPH-Human Genome Diversity Panel samples to study the geographic distributions of putatively selected alleles at a range of geographic scales. We find that the average allele frequency divergence is highly predictive of the most extreme FST values across the whole genome. On a broad scale, the geographic distribution of putatively selected alleles almost invariably conforms to population clusters identified using randomly chosen genetic markers. Given this structure, there are surprisingly few fixed or nearly fixed differences between human populations. Among the nearly fixed differences that do exist, nearly all are due to fixation events that occurred outside of Africa, and most appear in East Asia. These patterns suggest that selection is often weak enough that neutral processes—especially population history, migration, and drift—exert powerful influences over the fate and geographic distribution of selected alleles.
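
For readers unfamiliar with FST, the following minimal sketch shows how the statistic follows from allele frequencies at a single biallelic SNP in two populations. It uses the textbook definition FST = (H_T − H_S) / H_T with equal population weights and no sample-size correction; the abstract does not say which estimator the study used, so this is an assumption, and the function name `fst_biallelic` is hypothetical.

```python
def fst_biallelic(p1, p2):
    """Wright's FST at one biallelic SNP for two equally weighted populations.

    p1, p2 are the frequencies of the same allele in each population.
    """
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                    # expected heterozygosity, pooled
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population value
    if h_t == 0:
        return 0.0  # SNP is monomorphic overall, so there is no differentiation
    return (h_t - h_s) / h_t

print(fst_biallelic(0.5, 0.5))  # identical frequencies
print(fst_biallelic(0.0, 1.0))  # a fixed difference
print(fst_biallelic(0.1, 0.9))  # a large frequency divergence
```

Identical frequencies give 0 and a fixed difference gives 1, which is why the rarity of fixed or nearly fixed differences reported above corresponds to few SNPs with FST close to 1.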

Journal ArticleDOI
TL;DR: It is found that, despite the many specific wiring changes documented between these species, the general phenotypes of orthologous transcriptional regulator knockouts are largely conserved, supporting the idea that many wiring changes affect the detailed architecture of the circuit, but not its overall output.
Abstract: Candida albicans is a normal resident of the gastrointestinal tract and also the most prevalent fungal pathogen of humans. It last shared a common ancestor with the model yeast Saccharomyces cerevisiae over 300 million years ago. We describe a collection of 143 genetically matched strains of C. albicans, each of which has been deleted for a specific transcriptional regulator. This collection represents a large fraction of the non-essential transcription circuitry. A phenotypic profile for each mutant was developed using a screen of 55 growth conditions. The results identify the biological roles of many individual transcriptional regulators; for many, this work represents the first description of their functions. For example, a quarter of the strains showed altered colony formation, a phenotype reflecting transitions among yeast, pseudohyphal, and hyphal cell forms. These transitions, which have been closely linked to pathogenesis, have been extensively studied, yet our work nearly doubles the number of transcriptional regulators known to influence them. As a second example, nearly a quarter of the knockout strains affected sensitivity to commonly used antifungal drugs; although a few transcriptional regulators have previously been implicated in susceptibility to these drugs, our work indicates many additional mechanisms of sensitivity and resistance. Finally, our results inform how transcriptional networks evolve. Comparison with the existing S. cerevisiae data (supplemented by additional S. cerevisiae experiments reported here) allows the first systematic analysis of phenotypic conservation by orthologous transcriptional regulators over a large evolutionary distance. We find that, despite the many specific wiring changes documented between these species, the general phenotypes of orthologous transcriptional regulator knockouts are largely conserved. These observations support the idea that many wiring changes affect the detailed architecture of the circuit, but not its overall output.

Journal ArticleDOI
TL;DR: To pinpoint genes likely to contribute to ASD etiology, high density genotyping was performed in 912 multiplex families from the Autism Genetics Resource Exchange collection and contrasted results to those obtained for 1,488 healthy controls.
Abstract: The genetics underlying the autism spectrum disorders (ASDs) is complex and remains poorly understood. Previous work has demonstrated an important role for structural variation in a subset of cases, but has lacked the resolution necessary to move beyond detection of large regions of potential interest to identification of individual genes. To pinpoint genes likely to contribute to ASD etiology, we performed high density genotyping in 912 multiplex families from the Autism Genetics Resource Exchange (AGRE) collection and contrasted results to those obtained for 1,488 healthy controls. Through prioritization of exonic deletions (eDels), exonic duplications (eDups), and whole gene duplication events (gDups), we identified more than 150 loci harboring rare variants in multiple unrelated probands, but no controls. Importantly, 27 of these were confirmed on examination of an independent replication cohort comprised of 859 cases and an additional 1,051 controls. Rare variants at known loci, including exonic deletions at NRXN1 and whole gene duplications encompassing UBE3A and several other genes in the 15q11-q13 region, were observed in the course of these analyses. Strong support was likewise observed for previously unreported genes such as BZRAP1, an adaptor molecule known to regulate synaptic transmission, with eDels or eDups observed in twelve unrelated cases but no controls (p = 2.3 × 10⁻⁵). Less is known about MDGA2, likewise observed to be case-specific (p = 1.3 × 10⁻⁴). But, it is notable that the encoded protein shows an unexpectedly high similarity to Contactin 4 (BLAST E-value = 3 × 10⁻³⁹), which has also been linked to disease. That hundreds of distinct rare variants were each seen only once further highlights complexity in the ASDs and points to the continued need for larger cohorts.