Showing papers in "PLOS Genetics in 2010"


Journal Article
TL;DR: Overall estimates of genetic diversity and differentiation among populations confirm the biogeographic hypothesis that large panmictic oceanic populations have repeatedly given rise to phenotypically divergent freshwater populations and identify several novel regions showing parallel differentiation across independent populations.
Abstract: Next-generation sequencing technology provides novel opportunities for gathering genome-scale sequence data in natural populations, laying the empirical foundation for the evolving field of population genomics. Here we conducted a genome scan of nucleotide diversity and differentiation in natural populations of threespine stickleback (Gasterosteus aculeatus). We used Illumina-sequenced RAD tags to identify and type over 45,000 single nucleotide polymorphisms (SNPs) in each of 100 individuals from two oceanic and three freshwater populations. Overall estimates of genetic diversity and differentiation among populations confirm the biogeographic hypothesis that large panmictic oceanic populations have repeatedly given rise to phenotypically divergent freshwater populations. Genomic regions exhibiting signatures of both balancing and divergent selection were remarkably consistent across multiple, independently derived populations, indicating that replicate parallel phenotypic evolution in stickleback may be occurring through extensive, parallel genetic evolution at a genome-wide scale. Some of these genomic regions co-localize with previously identified QTL for stickleback phenotypic variation identified using laboratory mapping crosses. In addition, we have identified several novel regions showing parallel differentiation across independent populations. Annotation of these regions revealed numerous genes that are candidates for stickleback phenotypic evolution and will form the basis of future genetic analyses in this and other organisms. This study represents the first high-density SNP–based genome scan of genetic diversity and differentiation for populations of threespine stickleback in the wild. 
These data illustrate the complementary nature of laboratory crosses and population genomic scans by confirming the adaptive significance of previously identified genomic regions, elucidating the particular evolutionary and demographic history of such regions in natural populations, and identifying new genomic regions and candidate genes of evolutionary significance.

1,406 citations


Journal Article
TL;DR: Results showing that trait-associated SNPs are more likely to be eQTLs, and that applying this information can enhance discovery of trait-associated SNPs for complex phenotypes, raise the possibility of using it both to increase the heritability explained by identifiable genetic factors and to gain a better understanding of the biology underlying complex traits.
Abstract: Although genome-wide association studies (GWAS) of complex traits have yielded more reproducible associations than had been discovered using any other approach, the loci characterized to date do not account for much of the heritability of such traits and, in general, have not led to improved understanding of the biology underlying complex phenotypes. Using a website we developed to serve results of expression quantitative trait locus (eQTL) studies in lymphoblastoid cell lines from HapMap samples (http://www.scandb.org), we show that single nucleotide polymorphisms (SNPs) associated with complex traits (from http://www.genome.gov/gwastudies/) are significantly more likely to be eQTLs than minor-allele-frequency–matched SNPs chosen from high-throughput GWAS platforms. These findings are robust across a range of thresholds for establishing eQTLs (p-values from 10⁻⁴ to 10⁻⁸), and a broad spectrum of human complex traits. Analyses of GWAS data from the Wellcome Trust studies confirm that annotating SNPs with a score reflecting the strength of the evidence that the SNP is an eQTL can improve the ability to discover true associations and clarify the nature of the mechanism driving the associations. Our results showing that trait-associated SNPs are more likely to be eQTLs and that application of this information can enhance discovery of trait-associated SNPs for complex phenotypes raise the possibility that we can utilize this information both to increase the heritability explained by identifiable genetic factors and to gain a better understanding of the biology underlying complex traits.

1,280 citations


Journal Article
TL;DR: A set of integrated experiments that investigate the effects of common genetic variability on DNA methylation and mRNA expression in four human brain regions each from 150 individuals find an abundance of genetic cis regulation of mRNA expression and show for the first time abundant quantitative trait loci for DNA CpG methylation across the genome.
Abstract: A fundamental challenge in the post-genome era is to understand and annotate the consequences of genetic variation, particularly within the context of human tissues. We present a set of integrated experiments that investigate the effects of common genetic variability on DNA methylation and mRNA expression in four human brain regions each from 150 individuals (600 samples total). We find an abundance of genetic cis regulation of mRNA expression and show for the first time abundant quantitative trait loci for DNA CpG methylation across the genome. We show peak enrichment for cis expression QTLs to be approximately 68,000 bp away from individual transcription start sites; however, the peak enrichment for cis CpG methylation QTLs is located much closer, only 45 bp from the CpG site in question. We observe that the largest magnitude quantitative trait loci occur across distinct brain tissues. Our analyses reveal that CpG methylation quantitative trait loci are more likely to occur for CpG sites outside of islands. Lastly, we show that while we can observe individual QTLs that appear to affect both the level of a transcript and a physically close CpG methylation site, these are quite rare. We believe these data, which we have made publicly available, will provide a critical step toward understanding the biological effects of genetic variation.

803 citations


Journal Article
TL;DR: The results identify novel circular RNA products emanating from the ANRIL locus and suggest causal variants at 9p21.3 regulate INK4/ARF expression and ASVD risk by modulating ANRIL expression and/or structure.
Abstract: Human genome-wide association studies have linked single nucleotide polymorphisms (SNPs) on chromosome 9p21.3 near the INK4/ARF (CDKN2a/b) locus with susceptibility to atherosclerotic vascular disease (ASVD). Although this locus encodes three well-characterized tumor suppressors, p16INK4a, p15INK4b, and ARF, the SNPs most strongly associated with ASVD are ∼120 kb from the nearest coding gene within a long non-coding RNA (ncRNA) known as ANRIL (CDKN2BAS). While individuals homozygous for the atherosclerotic risk allele show decreased expression of ANRIL and the coding INK4/ARF transcripts, the mechanism by which such distant genetic variants influence INK4/ARF expression is unknown. Here, using rapid amplification of cDNA ends (RACE) and analysis of next-generation RNA sequencing datasets, we determined the structure and abundance of multiple ANRIL species. Each of these species was present at very low copy numbers in primary and cultured cells; however, only the expression of ANRIL isoforms containing exons proximal to the INK4/ARF locus correlated with the ASVD risk alleles. Surprisingly, RACE also identified transcripts containing non-colinear ANRIL exonic sequences, whose expression also correlated with genotype and INK4/ARF expression. These non-polyadenylated RNAs resisted RNase R digestion and could be PCR amplified using outward-facing primers, suggesting they represent circular RNA structures that could arise as by-products of mRNA splicing. Next-generation DNA sequencing and splice prediction algorithms identified polymorphisms within the ASVD risk interval that may regulate ANRIL splicing and circular ANRIL (cANRIL) production. These results identify novel circular RNA products emanating from the ANRIL locus and suggest causal variants at 9p21.3 regulate INK4/ARF expression and ASVD risk by modulating ANRIL expression and/or structure.

789 citations


Journal Article
TL;DR: It is shown that IRs are expressed in olfactory organs across Protostomia, a major branch of the animal kingdom that encompasses arthropods, nematodes, and molluscs, indicating that they represent an ancestral protostome chemosensory receptor family.
Abstract: Ionotropic glutamate receptors (iGluRs) are a highly conserved family of ligand-gated ion channels present in animals, plants, and bacteria, which are best characterized for their roles in synaptic communication in vertebrate nervous systems. A variant subfamily of iGluRs, the Ionotropic Receptors (IRs), was recently identified as a new class of olfactory receptors in the fruit fly, Drosophila melanogaster, hinting at a broader function of this ion channel family in detection of environmental, as well as intercellular, chemical signals. Here, we investigate the origin and evolution of IRs by comprehensive evolutionary genomics and in situ expression analysis. In marked contrast to the insect-specific Odorant Receptor family, we show that IRs are expressed in olfactory organs across Protostomia, a major branch of the animal kingdom that encompasses arthropods, nematodes, and molluscs, indicating that they represent an ancestral protostome chemosensory receptor family. Two subfamilies of IRs are distinguished: conserved "antennal IRs," which likely define the first olfactory receptor family of insects, and species-specific "divergent IRs," which are expressed in peripheral and internal gustatory neurons, implicating this family in taste and food assessment. Comparative analysis of drosophilid IRs reveals the selective forces that have shaped the repertoires in flies with distinct chemosensory preferences. Examination of IR gene structure and genomic distribution suggests both non-allelic homologous recombination and retroposition contributed to the expansion of this multigene family. Together, these findings lay a foundation for functional analysis of these receptors in both neurobiological and evolutionary studies. Furthermore, this work identifies novel targets for manipulating chemosensory-driven behaviours of agricultural pests and disease vectors.

645 citations


Journal Article
TL;DR: It is demonstrated that genes with a high predicted probability of exhibiting haploinsufficiency are enriched among genes implicated in human dominant diseases and among genes causing abnormal phenotypes in heterozygous knockout mice.
Abstract: Haploinsufficiency, wherein a single functional copy of a gene is insufficient to maintain normal function, is a major cause of dominant disease. Human disease studies have identified several hundred haploinsufficient (HI) genes. We have compiled a map of 1,079 haplosufficient (HS) genes by systematic identification of genes unambiguously and repeatedly compromised by copy number variation among 8,458 apparently healthy individuals and contrasted the genomic, evolutionary, functional, and network properties between these HS genes and known HI genes. We found that HI genes are typically longer and have more conserved coding sequences and promoters than HS genes. HI genes exhibit higher levels of expression during early development and greater tissue specificity. Moreover, within a probabilistic human functional interaction network HI genes have more interaction partners and greater network proximity to other known HI genes. We built a predictive model on the basis of these differences and annotated 12,443 genes with their predicted probability of being haploinsufficient. We validated these predictions of haploinsufficiency by demonstrating that genes with a high predicted probability of exhibiting haploinsufficiency are enriched among genes implicated in human dominant diseases and among genes causing abnormal phenotypes in heterozygous knockout mice. We have transformed these gene-based haploinsufficiency predictions into haploinsufficiency scores for genic deletions, which we demonstrate to better discriminate between pathogenic and benign deletions than consideration of the deletion size or numbers of genes deleted. These robust predictions of haploinsufficiency support clinical interpretation of novel loss-of-function variants and prioritization of variants and genes for follow-up studies.
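The abstract does not state how gene-level probabilities were turned into a deletion-level score. As a purely illustrative sketch, assuming genes are independent, one natural rule is the probability that at least one deleted gene is haploinsufficient, 1 − ∏(1 − pᵢ). The function name `deletion_hi_score` and this combination rule are hypothetical, not the published method; the toy probabilities are invented.

```python
from math import prod


def deletion_hi_score(gene_probs):
    """Hypothetical deletion-level score: probability that at least one
    deleted gene is haploinsufficient, treating genes as independent.
    Illustrative assumption only, not the published scoring rule."""
    for p in gene_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    # An empty deletion (no genes) gets a score of 0.
    return 1.0 - prod(1.0 - p for p in gene_probs)


# Hypothetical per-gene probabilities for two deletions.
one_strong_gene = [0.9]
four_weak_genes = [0.05, 0.1, 0.02, 0.08]
print(round(deletion_hi_score(one_strong_gene), 3))
print(round(deletion_hi_score(four_weak_genes), 3))
```

Under this assumed rule, a deletion of one strongly predicted gene (about 0.9) outscores a larger deletion of four weakly predicted genes (about 0.23), which is the kind of distinction that gene count or deletion size alone would miss.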

601 citations


Journal Article
TL;DR: It is established that genetic material derived from all known viral genome types and replication strategies can enter the animal germ line, greatly broadening the scope of paleovirological studies and indicating a more significant evolutionary role for gene flow from virus to animal genomes than has previously been recognized.
Abstract: Integration into the nuclear genome of germ line cells can lead to vertical inheritance of retroviral genes as host alleles. For other viruses, germ line integration has only rarely been documented. Nonetheless, we identified endogenous viral elements (EVEs) derived from ten non-retroviral families by systematic in silico screening of animal genomes, including the first endogenous representatives of double-stranded RNA, reverse-transcribing DNA, and segmented RNA viruses, and the first endogenous DNA viruses in mammalian genomes. Phylogenetic and genomic analysis of EVEs across multiple host species revealed novel information about the origin and evolution of diverse virus groups. Furthermore, several of the elements identified here encode intact open reading frames or are expressed as mRNA. For one element in the primate lineage, we provide statistically robust evidence for exaptation. Our findings establish that genetic material derived from all known viral genome types and replication strategies can enter the animal germ line, greatly broadening the scope of paleovirological studies and indicating a more significant evolutionary role for gene flow from virus to animal genomes than has previously been recognized.

570 citations


Journal Article
TL;DR: Loss of brain DILPs blocked the responses of lifespan and fecundity to dietary restriction (DR), and the DR response of these mutants suggests that IIS extends lifespan both through mechanisms that overlap with those of DR and through additional, DR-independent mechanisms.
Abstract: Multicellular animals match costly activities, such as growth and reproduction, to the environment through nutrient-sensing pathways. The insulin/IGF signaling (IIS) pathway plays key roles in growth, metabolism, stress resistance, reproduction, and longevity in diverse organisms including mammals. Invertebrate genomes often contain multiple genes encoding insulin-like ligands, including seven Drosophila insulin-like peptides (DILPs). We investigated the evolution, diversification, redundancy, and functions of the DILPs, combining evolutionary analysis, based on the completed genome sequences of 12 Drosophila species, and functional analysis, based on newly generated knock-out mutations for all 7 dilp genes in D. melanogaster. Diversification of the 7 DILPs preceded diversification of Drosophila species, with stable gene diversification and family membership, suggesting stabilising selection for gene function. Gene knock-outs demonstrated both synergy and compensation of expression between different DILPs, notably with DILP3 required for normal expression of DILPs 2 and 5 in brain neurosecretory cells and expression of DILP6 in the fat body compensating for loss of brain DILPs. Loss of DILP2 increased lifespan and loss of DILP6 reduced growth, while loss of DILP7 did not affect fertility, contrary to its proposed role as a Drosophila relaxin. Importantly, loss of DILPs produced in the brain greatly extended lifespan but only in the presence of the endosymbiotic bacterium Wolbachia, demonstrating a specific interaction between IIS and Wolbachia in lifespan regulation. Furthermore, loss of brain DILPs blocked the responses of lifespan and fecundity to dietary restriction (DR), and the DR response of these mutants suggests that IIS extends lifespan both through mechanisms that overlap with those of DR and through additional mechanisms that are independent of DR.
Evolutionary conservation has thus been accompanied by synergy, redundancy, and functional differentiation between DILPs, and these features may themselves be of evolutionary advantage.

554 citations


Journal Article
TL;DR: It is found that, contrary to previous estimates, CGI abundance in humans and mice is very similar and many CGIs are at conserved locations relative to genes; in each species, CpG density correlates positively with the degree of H3K4 trimethylation, supporting the hypothesis that these two properties are mechanistically interdependent.
Abstract: CpG islands (CGIs) are vertebrate genomic landmarks that encompass the promoters of most genes and often lack DNA methylation. Calling their apparent importance into question, reports suggest that the number of CGIs varies widely between species and that many CGIs do not co-localise with annotated promoters. We set out to quantify the number of CGIs in mouse and human genomes using CXXC Affinity Purification plus deep sequencing (CAP-seq). We also asked whether CGIs not associated with annotated transcripts share properties with those at known promoters. We found that, contrary to previous estimates, CGI abundance in humans and mice is very similar and many are at conserved locations relative to genes. In each species CpG density correlates positively with the degree of H3K4 trimethylation, supporting the hypothesis that these two properties are mechanistically interdependent. Approximately half of mammalian CGIs (>10,000) are “orphans” that are not associated with annotated promoters. Many orphan CGIs show evidence of transcriptional initiation and dynamic expression during development. Unlike CGIs at known promoters, orphan CGIs are frequently subject to DNA methylation during development, and this is accompanied by loss of their active promoter features. In colorectal tumors, however, orphan CGIs are not preferentially methylated, suggesting that cancer does not recapitulate a developmental program. Human and mouse genomes have similar numbers of CGIs, over half of which are remote from known promoters. Orphan CGIs nevertheless have the characteristics of functional promoters, though they are much more likely than promoter CGIs to become methylated during development and hence lose these properties. The data indicate that orphan CGIs correspond to previously undetected promoters whose transcriptional activity may play a functional role during development.

554 citations


Journal Article
TL;DR: The frequency of numt insertions among 85 sequenced eukaryotic genomes reveals that numt content is strongly correlated with genome size, suggesting that the numt insertion rate might be limited by DSB frequency.
Abstract: The natural transfer of DNA from mitochondria to the nucleus generates nuclear copies of mitochondrial DNA (numts) and is an ongoing evolutionary process, as genome sequences attest. In humans, five different numts cause genetic disease and a dozen human loci are polymorphic for the presence of numts, underscoring the rapid rate at which mitochondrial sequences reach the nucleus over evolutionary time. In the laboratory and in nature, numts enter the nuclear DNA via non-homologous end joining (NHEJ) at double-strand breaks (DSBs). The frequency of numt insertions among 85 sequenced eukaryotic genomes reveals that numt content is strongly correlated with genome size, suggesting that the numt insertion rate might be limited by DSB frequency. Polymorphic numts in humans link maternally inherited mitochondrial genotypes to nuclear DNA haplotypes during the past, offering new opportunities to associate nuclear markers with mitochondrial markers back in time.

550 citations


Journal Article
TL;DR: This paper showed that PHR1 and PHL1 are partially redundant transcription factors acting as central integrators of starvation responses, both specific and generic, and they indicate that transcriptional repression responses are an integral part of adaptive responses to stress.
Abstract: Plants respond to different stresses by inducing or repressing transcription of partially overlapping sets of genes. In Arabidopsis, the PHR1 transcription factor (TF) has an important role in the control of phosphate (Pi) starvation stress responses. Using transcriptomic analysis of Pi starvation in phr1 single and phr1 phl1 (phr1-like) double mutants and in wild-type plants, we show that PHR1 in conjunction with PHL1 controls most transcriptional activation and repression responses to phosphate starvation, regardless of the Pi starvation specificity of these responses. Induced genes are enriched in PHR1 binding sequences (P1BS) in their promoters, whereas repressed genes do not show such enrichment, suggesting that PHR1(-like) control of transcriptional repression responses is indirect. In agreement with this, transcriptomic analysis of a transgenic plant expressing PHR1 fused to the hormone ligand domain of the glucocorticoid receptor showed that PHR1 direct targets (i.e., displaying altered expression after GR:PHR1 activation by dexamethasone in the presence of cycloheximide) corresponded largely to Pi starvation-induced genes that are highly enriched in P1BS. A minimal promoter containing a multimerised P1BS recapitulates Pi starvation-specific responsiveness. Likewise, mutation of P1BS in the promoters of two Pi starvation-responsive genes impaired their responsiveness to Pi starvation, but not to other stress types. Phylogenetic footprinting confirmed the importance of P1BS and PHR1 in Pi starvation responsiveness and indicated that P1BS acts in concert with other cis motifs. Altogether, our data show that PHR1 and PHL1 are partially redundant TFs acting as central integrators of Pi starvation responses, both specific and generic. In addition, they indicate that transcriptional repression responses are an integral part of adaptive responses to stress.

Journal Article
TL;DR: The results suggest that common variants affecting nuclear-encoded mitochondrial genes have at most a small genetic contribution to T2D susceptibility.
Abstract: Mitochondrial dysfunction has been observed in skeletal muscle of people with diabetes and insulin-resistant individuals. Furthermore, inherited mutations in mitochondrial DNA can cause a rare form of diabetes. However, it is unclear whether mitochondrial dysfunction is a primary cause of the common form of diabetes. To date, common genetic variants robustly associated with type 2 diabetes (T2D) are not known to affect mitochondrial function. One possibility is that multiple mitochondrial genes contain modest genetic effects that collectively influence T2D risk. To test this hypothesis, we developed a method named Meta-Analysis Gene-set Enrichment of variaNT Associations (MAGENTA; http://www.broadinstitute.org/mpg/magenta). MAGENTA, in analogy to Gene Set Enrichment Analysis, tests whether sets of functionally related genes are enriched for associations with a polygenic disease or trait. MAGENTA was specifically designed to exploit the statistical power of large genome-wide association (GWA) study meta-analyses whose individual genotypes are not available. This is achieved by combining variant association p-values into gene scores and then correcting for confounders, such as gene size, variant number, and linkage disequilibrium properties. Using simulations, we determined the range of parameters for which MAGENTA can detect associations likely missed by single-marker analysis. We verified MAGENTA’s performance on empirical data by identifying known relevant pathways in lipid and lipoprotein GWA meta-analyses. We then tested our mitochondrial hypothesis by applying MAGENTA to three gene sets: nuclear regulators of mitochondrial genes, oxidative phosphorylation genes, and approximately 1,000 nuclear-encoded mitochondrial genes. The analysis was performed using the most recent T2D GWA meta-analysis of 47,117 people and meta-analyses of seven diabetes-related glycemic traits (up to 46,186 non-diabetic individuals).
This well-powered analysis found no significant enrichment of associations to T2D or any of the glycemic traits in any of the gene sets tested. These results suggest that common variants affecting nuclear-encoded mitochondrial genes have at most a small genetic contribution to T2D susceptibility.
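The two ideas named in this abstract, collapsing variant p-values into a gene score and then asking whether a gene set holds more top-scoring genes than expected, can be sketched in a few lines. This is a simplified, hypothetical illustration, not MAGENTA itself: the Šidák adjustment for variant number stands in for MAGENTA's own correction for gene size, variant number, and LD, and the top-5% cutoff and hypergeometric test are assumptions chosen for clarity. All function names and toy data are invented.

```python
from math import comb


def gene_score(variant_pvalues):
    """Best variant p-value for one gene, Sidak-adjusted for the number of
    variants so genes with many variants are not favoured by chance.
    (Assumed simplification; MAGENTA's correction also models gene size
    and LD, which this sketch does not.)"""
    k = len(variant_pvalues)
    p_min = min(variant_pvalues)
    return 1.0 - (1.0 - p_min) ** k


def hypergeom_upper_tail(n_total, n_hits_total, n_set, n_hits_set):
    """P(X >= n_hits_set) when n_set genes are drawn without replacement
    from n_total genes, n_hits_total of which pass the cutoff."""
    upper = min(n_set, n_hits_total)
    favourable = sum(
        comb(n_hits_total, x) * comb(n_total - n_hits_total, n_set - x)
        for x in range(n_hits_set, upper + 1)
    )
    return favourable / comb(n_total, n_set)


def gene_set_enrichment(gene_to_pvalues, gene_set, top_fraction=0.05):
    """Count set genes among the top `top_fraction` of gene scores and
    return (hits_in_set, set_size, enrichment p-value)."""
    scores = {g: gene_score(ps) for g, ps in gene_to_pvalues.items()}
    ranked = sorted(scores, key=scores.get)  # smallest score = strongest
    n_hits_total = max(1, int(len(ranked) * top_fraction))
    top_genes = set(ranked[:n_hits_total])
    in_set = [g for g in gene_set if g in scores]
    hits_in_set = sum(1 for g in in_set if g in top_genes)
    p = hypergeom_upper_tail(len(ranked), n_hits_total, len(in_set), hits_in_set)
    return hits_in_set, len(in_set), p


# Hypothetical toy data: 100 genes, one variant each, evenly spaced p-values.
genes = {f"G{i}": [(i + 1) / 200] for i in range(100)}
print(gene_set_enrichment(genes, [f"G{i}" for i in range(10)]))
```

With these invented inputs, the set G0 to G9 contains all five genes above the cutoff, giving an enrichment p-value of roughly 3.3e-6; a set drawn from the bottom of the ranking has no hits and a p-value of 1.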

Journal Article
TL;DR: The results indicate that several key HIF-regulatory and targeted genes are responsible for adaptation to high altitude in Andeans and Tibetans, and several different chromosomal regions are implicated in the putative response to selection.
Abstract: High-altitude hypoxia (reduced inspired oxygen tension due to decreased barometric pressure) exerts severe physiological stress on the human body. Two high-altitude regions where humans have lived for millennia are the Andean Altiplano and the Tibetan Plateau. Populations living in these regions exhibit unique circulatory, respiratory, and hematological adaptations to life at high altitude. Although these responses have been well characterized physiologically, their underlying genetic basis remains unknown. We performed a genome scan to identify genes showing evidence of adaptation to hypoxia. We looked across each chromosome to identify genomic regions with previously unknown function with respect to altitude phenotypes. In addition, groups of genes functioning in oxygen metabolism and sensing were examined to test the hypothesis that particular pathways have been involved in genetic adaptation to altitude. Applying four population genetic statistics commonly used for detecting signatures of natural selection, we identified selection-nominated candidate genes and gene regions in these two populations (Andeans and Tibetans) separately. The Tibetan and Andean patterns of genetic adaptation are largely distinct from one another, with both populations showing evidence of positive natural selection in different genes or gene regions. Interestingly, one gene previously known to be important in cellular oxygen sensing, EGLN1 (also known as PHD2), shows evidence of positive selection in both Tibetans and Andeans. However, the pattern of variation for this gene differs between the two populations. Our results indicate that several key HIF-regulatory and targeted genes are responsible for adaptation to high altitude in Andeans and Tibetans, and several different chromosomal regions are implicated in the putative response to selection. 
These data suggest a genetic role in high-altitude adaptation and provide a basis for future genotype/phenotype association studies necessary to confirm the role of selection-nominated candidate genes and gene regions in adaptation to altitude.

Journal Article
TL;DR: This study proposes an empirical methodology, called Regulatory Trait Concordance (RTC), that accounts for local LD structure and integrates eQTL and GWAS results to reveal the subset of association signals that are due to cis eQTLs, and detects several potential disease-causing regulatory effects.
Abstract: The recent success of genome-wide association studies (GWAS) is now followed by the challenge to determine how the reported susceptibility variants mediate complex traits and diseases. Expression quantitative trait loci (eQTLs) have been implicated in disease associations through overlaps between eQTLs and GWAS signals. However, the abundance of eQTLs and the strong correlation structure (LD) in the genome make it likely that some of these overlaps are coincidental and not driven by the same functional variants. In the present study, we propose an empirical methodology, which we call Regulatory Trait Concordance (RTC), that accounts for local LD structure and integrates eQTLs and GWAS results in order to reveal the subset of association signals that are due to cis eQTLs. We simulate genomic regions of various LD patterns with either a single or two causal variants and show that our score outperforms SNP correlation metrics, be they statistical (r²) or historical (D′). Following the observation of a significant abundance of regulatory signals among currently published GWAS loci, we apply our method with the goal of prioritizing relevant genes for each of the respective complex traits. We detect several potential disease-causing regulatory effects, with a strong enrichment for immunity-related conditions, consistent with the nature of the cell line tested (LCLs). Furthermore, we present an extension of the method in trans, where interrogating the whole genome for downstream effects of the disease variant can be informative regarding its unknown primary biological effect. We conclude that integrating cellular phenotype associations with organismal complex traits will facilitate the biological interpretation of the genetic effects on these traits.
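The abstract describes RTC only at a high level. The sketch below follows one plausible reading under stated assumptions: for each SNP in an LD interval, regress that SNP's genotype out of the expression trait, measure how much association the lead eQTL SNP retains (here as absolute Pearson correlation, an assumption in place of a p-value), and score the GWAS SNP by how its correction ranks against the others. The function names, the 0-based rank (so a best-possible score is 1.0), and the toy genotypes are hypothetical; this is not the authors' implementation.

```python
def _center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def residualize(y, x):
    """Residuals of y after ordinary least-squares regression on x."""
    xc, yc = _center(x), _center(y)
    sxx = sum(a * a for a in xc)
    if sxx == 0:
        return yc
    beta = sum(a * b for a, b in zip(xc, yc)) / sxx
    return [b - beta * a for a, b in zip(xc, yc)]


def abs_corr(x, y):
    """Absolute Pearson correlation (0.0 if either vector is constant)."""
    xc, yc = _center(x), _center(y)
    sxx = sum(a * a for a in xc)
    syy = sum(b * b for b in yc)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sum(a * b for a, b in zip(xc, yc))) / (sxx * syy) ** 0.5


def rtc_score(genotypes, expression, gwas_idx, eqtl_idx, tol=1e-12):
    """Hypothetical RTC-style score for SNPs in one LD interval.

    genotypes: per-SNP dosage vectors (0/1/2) over the same individuals.
    rank = number of SNPs whose correction leaves strictly less eQTL
    signal than correcting for the GWAS SNP does (0-based, assumed).
    Returns (N - rank) / N, so 1.0 means no SNP beats the GWAS SNP.
    """
    eqtl_snp = genotypes[eqtl_idx]
    remaining = [abs_corr(eqtl_snp, residualize(expression, g)) for g in genotypes]
    gwas_remaining = remaining[gwas_idx]
    rank = sum(1 for r in remaining if r < gwas_remaining - tol)
    n = len(genotypes)
    return (n - rank) / n


# Hypothetical toy interval: SNP0 drives expression, SNP1 is in perfect LD
# with it, and SNP2 is only weakly correlated with SNP0.
snp0 = [0, 0, 1, 1, 2, 2, 0, 1, 2, 1]
snp2 = [1, 0, 1, 0, 1, 0, 2, 2, 0, 1]
noise = [0.1, -0.1, 0.05, -0.05, 0.0, 0.1, -0.1, 0.0, 0.05, -0.05]
expr = [2 * g + e for g, e in zip(snp0, noise)]
print(rtc_score([snp0, list(snp0), snp2], expr, gwas_idx=1, eqtl_idx=0))
print(rtc_score([snp0, list(snp0), snp2], expr, gwas_idx=2, eqtl_idx=0))
```

In this toy case, a GWAS SNP in perfect LD with the eQTL SNP removes the eQTL signal as well as any SNP can and scores 1.0, while the weakly linked SNP is outranked by two others and scores 1/3. Ranking within the interval, rather than reading off raw r², is what lets such a score reflect local LD structure.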

Journal Article
TL;DR: A study about the genetics of flowering time that differs from previous studies in two important ways: first, it is measured in a more complex and ecologically realistic environment; and, second, it combines the advantages of genome-wide association (GWA) and traditional linkage (QTL) mapping.
Abstract: Flowering time is a key life-history trait in the plant life cycle. Most studies to unravel the genetics of flowering time in Arabidopsis thaliana have been performed under greenhouse conditions. Here, we describe a study about the genetics of flowering time that differs from previous studies in two important ways: first, we measure flowering time in a more complex and ecologically realistic environment; and, second, we combine the advantages of genome-wide association (GWA) and traditional linkage (QTL) mapping. Our experiments involved phenotyping nearly 20,000 plants over 2 winters under field conditions, including 184 worldwide natural accessions genotyped for 216,509 SNPs and 4,366 RILs derived from 13 independent crosses chosen to maximize genetic and phenotypic diversity. Based on a photothermal time model, the flowering time variation scored in our field experiment was poorly correlated with the flowering time variation previously obtained under greenhouse conditions, reinforcing previous demonstrations of the importance of genotype by environment interactions in A. thaliana and the need to study adaptive variation under natural conditions. The use of 4,366 RILs provides great power for dissecting the genetic architecture of flowering time in A. thaliana under our specific field conditions. We describe more than 60 additive QTLs, all with relatively small to medium effects and organized in 5 major clusters. We show that QTL mapping increases our power to distinguish true from false associations in GWA mapping. QTL mapping also permits the identification of false negatives, that is, causative SNPs that are lost when applying GWA methods that control for population structure. Major genes underpinning flowering time in the greenhouse were not associated with flowering time in this study. Instead, we found a prevalence of genes involved in the regulation of the plant circadian clock. 
Furthermore, we identified new genomic regions lacking obvious candidate genes.

Journal Article
TL;DR: A novel research framework is developed that facilitates the parallel study of a wide assortment of traits within a single cohort and takes advantage of the interactivity of the Web both to gather data and to present genetic information to research participants, while taking care to correct for the population structure inherent to this study design.
Abstract: Despite the recent rapid growth in genome-wide data, much of human variation remains entirely unexplained. A significant challenge in the pursuit of the genetic basis for variation in common human traits is the efficient, coordinated collection of genotype and phenotype data. We have developed a novel research framework that facilitates the parallel study of a wide assortment of traits within a single cohort. The approach takes advantage of the interactivity of the Web both to gather data and to present genetic information to research participants, while taking care to correct for the population structure inherent to this study design. Here we report initial results from a participant-driven study of 22 traits. Replications of associations (in the genes OCA2, HERC2, SLC45A2, SLC24A4, IRF4, TYR, TYRP1, ASIP, and MC1R) for hair color, eye color, and freckling validate the Web-based, self-reporting paradigm. The identification of novel associations for hair morphology (rs17646946, near TCHH; rs7349332, near WNT10A; and rs1556547, near OFCC1), freckling (rs2153271, in BNC2), the ability to smell the methanethiol produced after eating asparagus (rs4481887, near OR2M7), and photic sneeze reflex (rs10427255, near ZEB2, and rs11856995, near NR2F2) illustrates the power of the approach.

Journal ArticleDOI
TL;DR: Rare copy number variants, including CNVs in genes previously implicated in other neurodevelopmental disorders, are likely to contribute to epilepsy, suggesting common etiological factors for seemingly diverse diseases such as ID, autism, schizophrenia, and epilepsy.
Abstract: Epilepsy is one of the most common neurological disorders in humans with a prevalence of 1% and a lifetime incidence of 3%. Several genes have been identified in rare autosomal dominant and severe sporadic forms of epilepsy, but the genetic cause is unknown in the vast majority of cases. Copy number variants (CNVs) are known to play an important role in the genetic etiology of many neurodevelopmental disorders, including intellectual disability (ID), autism, and schizophrenia. Genome-wide studies of copy number variation in epilepsy have not been performed. We have applied whole-genome oligonucleotide array comparative genomic hybridization to a cohort of 517 individuals with various idiopathic, non-lesional epilepsies. We detected one or more rare genic CNVs in 8.9% of affected individuals that are not present in 2,493 controls; five individuals had two rare CNVs. We identified CNVs in genes previously implicated in other neurodevelopmental disorders, including two deletions in AUTS2 and one deletion in CNTNAP2. Therefore, our findings indicate that rare CNVs are likely to contribute to a broad range of generalized and focal epilepsies. In addition, we find that 2.9% of patients carry deletions at 15q11.2, 15q13.3, or 16p13.11, genomic hotspots previously associated with ID, autism, or schizophrenia. In summary, our findings suggest common etiological factors for seemingly diverse diseases such as ID, autism, schizophrenia, and epilepsy.

Journal ArticleDOI
TL;DR: A Slit-miR-218-Robo1 regulatory circuit whose disruption may contribute to GC metastasis is described, and targeting miR-218 may provide a strategy for blocking tumor metastasis.
Abstract: MicroRNAs play key roles in tumor metastasis. Here, we describe the regulation and function of miR-218 in gastric cancer (GC) metastasis. miR-218 expression is decreased along with the expression of one of its host genes, Slit3, in metastatic GC. However, Robo1, one of several Slit receptors, is negatively regulated by miR-218, thus establishing a negative feedback loop. Decreased miR-218 levels eliminate Robo1 repression, which activates the Slit-Robo1 pathway through the interaction between Robo1 and Slit2, thus triggering tumor metastasis. The restoration of miR-218 suppresses Robo1 expression and inhibits tumor cell invasion and metastasis in vitro and in vivo. Taken together, our results describe a Slit-miR-218-Robo1 regulatory circuit whose disruption may contribute to GC metastasis. Targeting miR-218 may provide a strategy for blocking tumor metastasis.

Journal ArticleDOI
TL;DR: It is demonstrated that clonal pathogens that evolve under severely relaxed selection are uniquely suitable for studying mutational biases in bacteria, and that variation in nucleotide content cannot stem entirely from variation in mutational biases; natural selection and/or a natural selection-like process such as biased gene conversion strongly affect nucleotide content.
Abstract: Mutation is the engine that drives evolution and adaptation forward in that it generates the variation on which natural selection acts. Mutation is a random process that nevertheless occurs according to certain biases. Elucidating mutational biases and the way they vary across species and within genomes is crucial to understanding evolution and adaptation. Here we demonstrate that clonal pathogens that evolve under severely relaxed selection are uniquely suitable for studying mutational biases in bacteria. We estimate mutational patterns using sequence datasets from five such clonal pathogens belonging to four diverse bacterial clades that span most of the range of genomic nucleotide content. We demonstrate that across different types of sites and in all four clades mutation is consistently biased towards AT. This is true even in clades that have high genomic GC content. In all studied cases the mutational bias towards AT is primarily due to the high rate of C/G to T/A transitions. These results suggest that bacterial mutational biases are far less variable than previously thought. They further demonstrate that variation in nucleotide content cannot stem entirely from variation in mutational biases and that natural selection and/or a natural selection-like process such as biased gene conversion strongly affect nucleotide content.
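The conclusion above rests on sorting each inferred substitution by its effect on GC content and by type (transition or transversion). A minimal Python sketch of that bookkeeping follows, assuming ancestral and derived bases have already been inferred for each polymorphic site; the helpers `classify` and `tally` are hypothetical names for illustration, not the paper's methods.

```python
# Hypothetical bookkeeping for mutational-bias estimates.
# Assumes each substitution is given as (ancestral_base, derived_base),
# with ancestral states already inferred by some other means.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
STRONG = {"G", "C"}  # bases that count toward GC content
WEAK = {"A", "T"}


def classify(ancestral, derived):
    """Return (direction, is_transition) for one single-nucleotide substitution."""
    a, d = ancestral.upper(), derived.upper()
    if a == d or a not in "ACGT" or d not in "ACGT":
        raise ValueError(f"not a single-nucleotide substitution: {ancestral}->{derived}")
    if a in STRONG and d in WEAK:
        direction = "GC->AT"
    elif a in WEAK and d in STRONG:
        direction = "AT->GC"
    else:
        direction = "neutral"  # G<->C or A<->T leaves GC content unchanged
    # Transitions stay within purines or within pyrimidines.
    is_transition = (a in PURINES and d in PURINES) or (a in PYRIMIDINES and d in PYRIMIDINES)
    return direction, is_transition


def tally(substitutions):
    """Count substitutions by GC direction, plus C/G->T/A transitions separately."""
    counts = {"GC->AT": 0, "AT->GC": 0, "neutral": 0, "GC->AT transitions": 0}
    for ancestral, derived in substitutions:
        direction, is_transition = classify(ancestral, derived)
        counts[direction] += 1
        if direction == "GC->AT" and is_transition:
            counts["GC->AT transitions"] += 1
    return counts
```

On a toy input such as `[("C","T"), ("G","A"), ("C","A"), ("A","G"), ("G","C")]`, three changes run GC→AT (two of them C/G→T/A transitions), one runs AT→GC, and one is GC-neutral. A real analysis would also normalize by the number of available GC and AT sites.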

Journal ArticleDOI
TL;DR: FSHD muscle expresses a different splice form of DUX4 mRNA than control muscle, and together the findings indicate that full-length DUX4 is normally expressed at specific developmental stages and is suppressed in most somatic tissues.
Abstract: Each unit of the D4Z4 macrosatellite repeat contains a retrotransposed gene encoding the DUX4 double-homeobox transcription factor. Facioscapulohumeral dystrophy (FSHD) is caused by deletion of a subset of the D4Z4 units in the subtelomeric region of chromosome 4. Although it has been reported that the deletion of D4Z4 units induces the pathological expression of DUX4 mRNA, the association of DUX4 mRNA expression with FSHD has not been rigorously investigated, nor has any human tissue been identified that normally expresses DUX4 mRNA or protein. We show that FSHD muscle expresses a different splice form of DUX4 mRNA than control muscle. Control muscle produces low amounts of a splice form of DUX4 encoding only the amino-terminal portion of DUX4. FSHD muscle produces low amounts of a DUX4 mRNA that encodes the full-length DUX4 protein. The low abundance of full-length DUX4 mRNA in FSHD muscle cells reflects a small subset of nuclei producing a relatively high abundance of DUX4 mRNA and protein. In contrast to control skeletal muscle and most other somatic tissues, full-length DUX4 transcript and protein are expressed at relatively abundant levels in human testis, most likely in the germ-line cells. Induced pluripotent stem (iPS) cells also express full-length DUX4, and differentiation of control iPS cells to embryoid bodies suppresses expression of full-length DUX4, whereas expression of full-length DUX4 persists in differentiated FSHD iPS cells. Together, these findings indicate that full-length DUX4 is normally expressed at specific developmental stages and is suppressed in most somatic tissues. The contraction of the D4Z4 repeat in FSHD results in less efficient suppression of the full-length DUX4 mRNA in skeletal muscle cells. Therefore, FSHD represents the first human disease to be associated with the incomplete developmental silencing of a retrogene array normally expressed early in development.

Journal ArticleDOI
TL;DR: Results suggest that, although Mp10 suppresses flg22-triggered immunity, it triggers a defense response, resulting in an overall decrease in aphid performance in the fecundity assays.
Abstract: Aphids are amongst the most devastating sap-feeding insects of plants. Like most plant parasites, aphids require intimate associations with their host plants to gain access to nutrients. Aphid feeding induces responses such as clogging of phloem sieve elements and callose formation, which are suppressed by unknown molecules, probably proteins, in aphid saliva. Therefore, it is likely that aphids, like plant pathogens, deliver proteins (effectors) inside their hosts to modulate host cell processes, suppress plant defenses, and promote infestation. We exploited publicly available aphid salivary gland expressed sequence tags (ESTs) to apply a functional genomics approach for identification of candidate effectors from Myzus persicae (green peach aphid), based on common features of plant pathogen effectors. A total of 48 effector candidates were identified, cloned, and subjected to transient overexpression in Nicotiana benthamiana to assay for elicitation of a phenotype, suppression of the Pathogen-Associated Molecular Pattern (PAMP)–mediated oxidative burst, and effects on aphid reproductive performance. We identified one candidate effector, Mp10, which specifically induced chlorosis and local cell death in N. benthamiana and conferred avirulence to recombinant Potato virus X (PVX) expressing Mp10, PVX-Mp10, in N. tabacum, indicating that this protein may trigger plant defenses. The ubiquitin-ligase associated protein SGT1 was required for the Mp10-mediated chlorosis response in N. benthamiana. Mp10 also suppressed the oxidative burst induced by flg22, but not by chitin. Aphid fecundity assays revealed that in planta overexpression of Mp10 and Mp42 reduced aphid fecundity, whereas another effector candidate, MpC002, enhanced aphid fecundity. Thus, these results suggest that, although Mp10 suppresses flg22-triggered immunity, it triggers a defense response, resulting in an overall decrease in aphid performance in the fecundity assays. Overall, we identified aphid salivary proteins that share features with plant pathogen effectors and therefore may function as aphid effectors by perturbing host cellular processes.

Journal ArticleDOI
TL;DR: The mechanisms underlying resistance to the neonicotinoid insecticides were investigated using biological, biochemical, and genomic approaches and it was suggested that P450-mediated detoxification plays a primary role in resistance, although additional mechanism(s) may also contribute.
Abstract: The aphid Myzus persicae is a globally significant crop pest that has evolved high levels of resistance to almost all classes of insecticide. To date, the neonicotinoids, an economically important class of insecticides that target nicotinic acetylcholine receptors (nAChRs), have remained an effective control measure; however, recent reports of resistance in M. persicae represent a threat to the long-term efficacy of this chemical class. In this study, the mechanisms underlying resistance to the neonicotinoid insecticides were investigated using biological, biochemical, and genomic approaches. Bioassays on a resistant M. persicae clone (5191A) suggested that P450-mediated detoxification plays a primary role in resistance, although additional mechanism(s) may also contribute. Microarray analysis, using an array populated with probes corresponding to all known detoxification genes in M. persicae, revealed constitutive over-expression (22-fold) of a single P450 gene (CYP6CY3), and quantitative PCR showed that the over-expression is due, at least in part, to gene amplification. This is the first report of a P450 gene amplification event associated with insecticide resistance in an agriculturally important insect pest. The microarray analysis also showed over-expression of several gene sequences that encode cuticular proteins (2–16-fold), and artificial feeding assays and in vivo penetration assays using radiolabeled insecticide provided direct evidence of a role for reduced cuticular penetration in neonicotinoid resistance. Conversely, receptor radioligand binding studies and nucleotide sequencing of nAChR subunit genes suggest that target-site changes are unlikely to contribute to resistance to neonicotinoid insecticides in M. persicae.
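Fold differences like the 22-fold figure above are usually derived from cycle-threshold (Ct) values when measured by quantitative PCR. The abstract does not say how the qPCR data were normalized; the sketch below assumes the widely used 2^-ΔΔCt relative-quantification method, which presumes roughly 100% amplification efficiency. The function names and the Ct values in the usage note are hypothetical.

```python
import math


def fold_change_ddct(ct_target_test, ct_ref_test, ct_target_ctrl, ct_ref_ctrl):
    """Relative quantity of a target (e.g. a P450 gene) in a test sample vs. a control.

    Uses the 2^-ddCt method: each Ct is first normalized to a reference gene,
    then the test sample is compared with the control. Assumes ~100% PCR efficiency,
    i.e. the amount of product doubles every cycle.
    """
    delta_test = ct_target_test - ct_ref_test  # normalized Ct, resistant clone
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl  # normalized Ct, susceptible clone
    ddct = delta_test - delta_ctrl
    return 2 ** (-ddct)


def ddct_for_fold(fold):
    """Inverse: the ddCt that corresponds to a given fold difference."""
    return -math.log2(fold)


# Hypothetical Ct values: target crosses threshold 4.5 cycles earlier in the resistant clone.
print(round(fold_change_ddct(20.0, 18.0, 24.5, 18.0), 1))  # 22.6
```

Read in reverse, a 22-fold difference corresponds to a ΔΔCt of about -4.46 cycles; if amplification efficiency departs from 100%, an efficiency-corrected model would be needed instead.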

Journal ArticleDOI
TL;DR: DNA methylation at MEs was elevated in individuals conceived during the nutritionally challenged rainy season, providing the first evidence of a permanent, systemic effect of periconceptional environment on human epigenotype.
Abstract: Throughout most of the mammalian genome, genetically regulated developmental programming establishes diverse yet predictable epigenetic states across differentiated cells and tissues. At metastable epialleles (MEs), conversely, epigenotype is established stochastically in the early embryo then maintained in differentiated lineages, resulting in dramatic and systemic interindividual variation in epigenetic regulation. In the mouse, maternal nutrition affects this process, with permanent phenotypic consequences for the offspring. MEs have not previously been identified in humans. Here, using an innovative 2-tissue parallel epigenomic screen, we identified putative MEs in the human genome. In autopsy samples, we showed that DNA methylation at these loci is highly correlated across tissues representing all 3 embryonic germ layer lineages. Monozygotic twin pairs exhibited substantial discordance in DNA methylation at these loci, suggesting that their epigenetic state is established stochastically. We then tested for persistent epigenetic effects of periconceptional nutrition in rural Gambians, who experience dramatic seasonal fluctuations in nutritional status. DNA methylation at MEs was elevated in individuals conceived during the nutritionally challenged rainy season, providing the first evidence of a permanent, systemic effect of periconceptional environment on human epigenotype. At MEs, epigenetic regulation in internal organs and tissues varies among individuals and can be deduced from peripheral blood DNA. MEs should therefore facilitate an improved understanding of the role of interindividual epigenetic variation in human disease.

Journal ArticleDOI
TL;DR: A series of engineered bacterial artificial chromosomes was integrated into embryonic stem (ES) cells and their chromatin examined, showing that a 44 kb region corresponding to the Zfpm2 locus initiates de novo recruitment of PRC2.
Abstract: Polycomb proteins are epigenetic regulators that localize to developmental loci in the early embryo where they mediate lineage-specific gene repression. In Drosophila, these repressors are recruited to sequence elements by DNA binding proteins associated with Polycomb repressive complex 2 (PRC2). However, the sequences that recruit PRC2 in mammalian cells have remained obscure. To address this, we integrated a series of engineered bacterial artificial chromosomes into embryonic stem (ES) cells and examined their chromatin. We found that a 44 kb region corresponding to the Zfpm2 locus initiates de novo recruitment of PRC2. We then pinpointed a CpG island within this locus as both necessary and sufficient for PRC2 recruitment. Based on this causal demonstration and prior genomic analyses, we hypothesized that large GC-rich elements depleted of activating transcription factor motifs mediate PRC2 recruitment in mammals. We validated this model in two ways. First, we showed that a constitutively active CpG island is able to recruit PRC2 after excision of a cluster of activating motifs. Second, we showed that two 1 kb sequence intervals from the Escherichia coli genome with GC-contents comparable to a mammalian CpG island are both capable of recruiting PRC2 when integrated into the ES cell genome. Our findings demonstrate a causal role for GC-rich sequences in PRC2 recruitment and implicate a specific subset of CpG islands depleted of activating motifs as instrumental for the initial localization of this key regulator in mammalian genomes.
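Whether a sequence interval is "comparable to a mammalian CpG island" can be checked from its base composition. A minimal sketch follows, assuming the widely used Gardiner-Garden and Frommer thresholds (length of at least 200 bp, GC fraction above 0.5, observed/expected CpG ratio above 0.6). These thresholds and the helper names are assumptions for illustration, not the criteria reported in the paper.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a DNA sequence."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G).

    "CG" cannot overlap itself, so str.count gives the exact dinucleotide count.
    """
    s = seq.upper()
    c, g = s.count("C"), s.count("G")
    if c == 0 or g == 0:
        return 0.0
    return s.count("CG") * len(s) / (c * g)


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_oe=0.6):
    """Apply assumed Gardiner-Garden/Frommer-style thresholds to one interval."""
    return len(seq) >= min_len and gc_fraction(seq) > min_gc and cpg_obs_exp(seq) > min_oe
```

Note that GC content alone does not guarantee a high CpG ratio: a GC-rich bacterial interval can pass the GC test yet differ in CpG frequency, which is why the two measures are reported separately here.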

Journal ArticleDOI
TL;DR: This study provides strong evidence that multiple statistically distinct loci in this region affect smoking behavior, and is the first report of association between rs588765 (and correlates) and smoking that achieves genome-wide significance.
Abstract: Recently, genetic association findings for nicotine dependence, smoking behavior, and smoking-related diseases converged to implicate the chromosome 15q25.1 region, which includes the CHRNA5-CHRNA3-CHRNB4 cholinergic nicotinic receptor subunit genes. In particular, association with the nonsynonymous CHRNA5 SNP rs16969968 and correlates has been replicated in several independent studies. Extensive genotyping of this region has suggested additional statistically distinct signals for nicotine dependence, tagged by rs578776 and rs588765. One goal of the Consortium for the Genetic Analysis of Smoking Phenotypes (CGASP) is to elucidate the associations among these markers and dichotomous smoking quantity (heavy versus light smoking), lung cancer, and chronic obstructive pulmonary disease (COPD). We performed a meta-analysis across 34 datasets of European-ancestry subjects, including 38,617 smokers who were assessed for cigarettes-per-day, 7,700 lung cancer cases and 5,914 lung-cancer-free controls (all smokers), and 2,614 COPD cases and 3,568 COPD-free controls (all smokers). We demonstrate statistically independent associations of rs16969968 and rs588765 with smoking (mutually adjusted p < 10^-35 and p < 10^-8, respectively). Because the risk alleles at these loci are negatively correlated, their association with smoking is stronger in the joint model than when each SNP is analyzed alone. SNP rs578776 also demonstrates association with smoking after adjustment for rs16969968 (p < 10^-6). In models adjusting for cigarettes-per-day, we confirm the association between rs16969968 and lung cancer (p < 10^-20) and observe a nominally significant association with COPD (p = 0.01); the other loci are not significantly associated with either lung cancer or COPD after adjusting for rs16969968. This study provides strong evidence that multiple statistically distinct loci in this region affect smoking behavior. This study is also the first report of association between rs588765 (and correlates) and smoking that achieves genome-wide significance; these SNPs have previously been associated with mRNA levels of CHRNA5 in brain and lung tissue.
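Why negatively correlated risk alleles strengthen each other's signal in a joint model can be seen in a simple linear-model sketch. This is an illustrative assumption, not the consortium's actual models: let $x_1$ and $x_2$ be risk-allele dosages at the two loci, $y$ the smoking phenotype, and $\tilde{\beta}_1$ the slope obtained when $x_1$ is analyzed alone.

```latex
\begin{aligned}
y &= \beta_1 x_1 + \beta_2 x_2 + \varepsilon,
  \qquad \operatorname{Cov}(x_1,\varepsilon) = \operatorname{Cov}(x_2,\varepsilon) = 0,\\
\tilde{\beta}_1 &= \frac{\operatorname{Cov}(x_1, y)}{\operatorname{Var}(x_1)}
  = \beta_1 + \beta_2\,\frac{\operatorname{Cov}(x_1, x_2)}{\operatorname{Var}(x_1)}.
\end{aligned}
```

With $\beta_1, \beta_2 > 0$ and $\operatorname{Cov}(x_1, x_2) < 0$, the single-SNP slope is pulled below $\beta_1$. For hypothetical values $\beta_1 = 0.30$, $\beta_2 = 0.20$ and $\operatorname{Cov}(x_1,x_2)/\operatorname{Var}(x_1) = -0.25$, $\tilde{\beta}_1 = 0.30 - 0.05 = 0.25$. Fitting both SNPs jointly recovers $\beta_1$, so each signal appears stronger after mutual adjustment.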

Journal ArticleDOI
TL;DR: This study identified associations of ETS1 and WDFY4 with SLE and confirmed association of the HLA locus, STAT4, TNFSF4, BLK, BANK1, IRF5, and TNFAIP3 with SLE in Asians; these new genetic findings may help gain a better understanding of the disease and the functions of the genes involved.
Abstract: Systemic lupus erythematosus is a complex and potentially fatal autoimmune disease, characterized by autoantibody production and multi-organ damage. By a genome-wide association study (320 patients and 1,500 controls) and subsequent replication altogether involving a total of 3,300 Asian SLE patients from Hong Kong, Mainland China, and Thailand, as well as 4,200 ethnically and geographically matched controls, genetic variants in ETS1 and WDFY4 were found to be associated with SLE (ETS1: rs1128334, P = 2.33 × 10^-11, OR = 1.29; WDFY4: rs7097397, P = 8.15 × 10^-12, OR = 1.30). ETS1 encodes a transcription factor known to be involved in a wide range of immune functions, including Th17 cell development and terminal differentiation of B lymphocytes. SNP rs1128334 is located in the 3'-UTR of ETS1, and allelic expression analysis from peripheral blood mononuclear cells showed significantly lower expression level from the risk allele. WDFY4 is a conserved protein with unknown function, but is predominantly expressed in primary and secondary immune tissues, and rs7097397 in WDFY4 changes an arginine residue to glutamine (R1816Q) in this protein. Our study also confirmed association of the HLA locus, STAT4, TNFSF4, BLK, BANK1, IRF5, and TNFAIP3 with SLE in Asians. These new genetic findings may help us to gain a better understanding of the disease and the functions of the genes involved.
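Odds ratios like the ones quoted above summarize how much more often the risk allele appears in cases than in controls. The abstract does not state which model produced them (a simple allelic test, covariate-adjusted logistic regression, or a combined estimate across stages), so the sketch below shows only the basic allelic odds ratio from a 2×2 table of allele counts, with a Woolf (log-scale) confidence interval. The function name and the counts in the usage note are hypothetical, not the study's data.

```python
import math


def allelic_odds_ratio(case_risk, case_other, ctrl_risk, ctrl_other, z=1.96):
    """Allelic odds ratio from allele counts, with a Woolf confidence interval.

    case_risk / case_other: counts of the risk and non-risk allele among cases
    ctrl_risk / ctrl_other: the same counts among controls
    z: normal quantile for the interval (1.96 gives ~95%)
    """
    counts = (case_risk, case_other, ctrl_risk, ctrl_other)
    if min(counts) <= 0:
        raise ValueError("all allele counts must be positive")
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    # Standard error of log(OR) under Woolf's method.
    se = math.sqrt(sum(1.0 / n for n in counts))
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, (lower, upper)
```

For instance, hypothetical counts of 1,290 vs. 1,000 risk/non-risk alleles in cases and 1,000 vs. 1,000 in controls, `allelic_odds_ratio(1290, 1000, 1000, 1000)`, give an odds ratio of 1.29, the same magnitude as the ETS1 estimate.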

Journal ArticleDOI
Sandosh Padmanabhan, Olle Melander, Toby Johnson, Anna Maria Di Blasio, Wai K. Lee, Davide Gentilini, Claire E. Hastie, Cristina Menni, Maria Cristina Monti, Christian Delles, Stewart Laing, Barbara Corso, Gerjan Navis, Arjan J. Kwakernaak, Pim van der Harst, Murielle Bochud, Marc Maillard, Michel Burnier, Thomas Hedner, Sverre E. Kjeldsen, Björn Wahlstrand, Marketa Sjögren, Cristiano Fava, Martina Montagnana, Elisa Danese, Ole Torffvit, Bo Hedblad, Harold Snieder, John M. C. Connell, Morris Brown, Nilesh J. Samani, Martin Farrall, Giancarlo Cesana, Giuseppe Mancia, Stefano Signorini, Guido Grassi, Susana Eyheramendy, H.-Erich Wichmann, Maris Laan, David P. Strachan, Peter S. Sever, Denis C. Shields, Alice Stanton, Peter Vollenweider, Alexander Teumer, Henry Völzke, Rainer Rettig, Christopher Newton-Cheh, Pankaj Arora, Feng Zhang, Nicole Soranzo, Tim D. Spector, Gavin Lucas, Sekar Kathiresan, David S. Siscovick, Jian'an Luan, Ruth J. F. Loos, Nicholas J. Wareham, Brenda W.J.H. Penninx, Ilja M. Nolte, Martin W. McBride, William H. Miller, Stuart A. Nicklin, Andrew H. Baker, Delyth Graham, Robert A. McDonald, Jill P. Pell, Naveed Sattar, Paul Welsh, Patricia B. Munroe, Mark J. Caulfield, Alberto Zanchetti, Anna F. Dominiczak
TL;DR: The newly discovered UMOD locus for hypertension has the potential to give new insights into the role of uromodulin in BP regulation and to identify novel druggable targets for reducing cardiovascular risk.
Abstract: Hypertension is a heritable and major contributor to the global burden of disease. The sum of rare and common genetic variants robustly identified so far explains only 1%-2% of the population variation in blood pressure (BP) and hypertension. This suggests the existence of more undiscovered common variants. We conducted a genome-wide association study in 1,621 hypertensive cases and 1,699 controls and follow-up validation analyses in 19,845 cases and 16,541 controls using an extreme case-control design. We identified a locus on chromosome 16 in the 5′ region of Uromodulin (UMOD; rs13333226, combined P value of 3.6 × 10^-11). The minor G allele is associated with a lower risk of hypertension (OR [95% CI]: 0.87 [0.84-0.91]), reduced urinary uromodulin excretion, and better renal function, and each copy of the G allele is associated with a 7.7% reduction in risk of cardiovascular disease (CVD) events after adjusting for age, sex, BMI, and smoking status (HR = 0.923, 95% CI 0.860-0.991; p = 0.027). In a subset of 13,446 individuals with estimated glomerular filtration rate (eGFR) measurements, we show that rs13333226 is independently associated with hypertension (unadjusted for eGFR: 0.89 [0.83-0.96], p = 0.004; after eGFR adjustment: 0.89 [0.83-0.96], p = 0.003). In clinical functional studies, we also consistently show the minor G allele is associated with lower urinary uromodulin excretion. The exclusive expression of uromodulin in the thick ascending limb of the loop of Henle suggests a putative role of this variant in hypertension through an effect on sodium homeostasis. The newly discovered UMOD locus for hypertension has the potential to give new insights into the role of uromodulin in BP regulation and to identify novel druggable targets for reducing cardiovascular risk.
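The per-allele figures above follow directly from the hazard ratio. Assuming a multiplicative (log-additive) per-allele effect, which the abstract's "each copy" wording implies but does not spell out, the arithmetic is:

```latex
\begin{aligned}
1 - \mathrm{HR} &= 1 - 0.923 = 0.077
  &&\text{(7.7\,\% lower hazard per G allele)},\\
\mathrm{HR}_{\text{two copies vs. none}} &= 0.923^{2} \approx 0.852
  &&\text{(about 14.8\,\% lower hazard)}.
\end{aligned}
```

The two-copy figure is an extrapolation from the stated per-allele model, not a separately reported result.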

Journal ArticleDOI
TL;DR: A mass spectrometric method was developed to quantify tRNA modifications in Saccharomyces cerevisiae; it revealed several novel biosynthetic pathways for RNA modifications and led to the discovery of signature changes in the spectrum of tRNA modifications in the damage response to mechanistically different toxicants.
Abstract: Decades of study have revealed more than 100 ribonucleoside structures incorporated as post-transcriptional modifications mainly in tRNA and rRNA, yet the larger functional dynamics of this conserved system are unclear. To this end, we developed a highly precise mass spectrometric method to quantify tRNA modifications in Saccharomyces cerevisiae. Our approach revealed several novel biosynthetic pathways for RNA modifications and led to the discovery of signature changes in the spectrum of tRNA modifications in the damage response to mechanistically different toxicants. This is illustrated with the RNA modifications Cm, m⁵C, and m²₂G, which increase following hydrogen peroxide exposure but decrease or are unaffected by exposure to methyl methanesulfonate, arsenite, and hypochlorite. Cytotoxic hypersensitivity to hydrogen peroxide is conferred by loss of enzymes catalyzing the formation of Cm, m⁵C, and m²₂G, which demonstrates that tRNA modifications are critical features of the cellular stress response. The results of our study support a general model of dynamic control of tRNA modifications in cellular response pathways and add to the growing repertoire of mechanisms controlling translational responses in cells.

Journal ArticleDOI
TL;DR: It is shown that the accuracy of predicting genetic values is higher for traits with a proportion of large effects than for a trait with no loci of large effect (overall type), provided the method of analysis takes advantage of the distribution of loci effects.
Abstract: Prediction of genetic merit using dense SNP genotypes can be used for estimation of breeding values for selection of livestock, crops, and forage species; for prediction of disease risk; and for forensics. The accuracy of these genomic predictions depends in part on the genetic architecture of the trait, in particular the number of loci affecting the trait and the distribution of their effects. Here we investigate the difference among three traits in the distribution of effects and the consequences for the accuracy of genomic predictions. Proportion of black coat colour in Holstein cattle was used as one model complex trait. Three loci, KIT, MITF, and a locus on chromosome 8, together explain 24% of the variation in proportion of black. However, a surprisingly large number of loci of small effect are necessary to capture the remaining variation. A second trait, fat concentration in milk, had one locus of large effect and a host of loci with very small effects. Both these distributions of effects were in contrast to that for a third trait, an index of scores for a number of aspects of cow conformation (“overall type”), which had only loci of small effect. The differences in distribution of effects among the three traits were quantified by estimating the distribution of variance explained by chromosome segments containing 50 SNPs. This approach was taken to account for the imperfect linkage disequilibrium between the SNPs and the QTL affecting the traits. We also show that the accuracy of predicting genetic values is higher for traits with a proportion of large effects (proportion black and fat percentage) than for a trait with no loci of large effect (overall type), provided the method of analysis takes advantage of the distribution of loci effects.
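The chromosome-segment idea can be illustrated by summing each individual's genetic value over consecutive 50-SNP windows and then taking the variance across individuals. Because the summing happens before the variance is taken, covariance between SNPs in linkage disequilibrium within a window is included, which is the motivation the abstract gives for working with segments. This is a simplified, hypothetical post hoc calculation from given effect estimates (`segment_variances` is not from the paper), not the authors' estimation procedure.

```python
from statistics import pvariance


def segment_variances(genotypes, effects, segment_size=50):
    """Variance explained by consecutive SNP segments.

    genotypes: one list per individual of allele dosages (0/1/2), one per SNP
    effects: per-SNP effect estimates, in the same SNP order as the genotype columns
    segment_size: number of consecutive SNPs per segment (50 in the study)
    Returns, for each segment, the variance across individuals of the
    segment's summed genetic value (so within-segment LD covariances count).
    """
    n_snps = len(effects)
    if any(len(row) != n_snps for row in genotypes):
        raise ValueError("each genotype row must have one dosage per SNP")
    variances = []
    for start in range(0, n_snps, segment_size):
        stop = min(start + segment_size, n_snps)
        # Genetic value of this segment for every individual.
        values = [sum(row[j] * effects[j] for j in range(start, stop)) for row in genotypes]
        variances.append(pvariance(values))
    return variances
```

To see why segments matter: two SNPs in perfect LD, each with effect 1, contribute four times the single-SNP variance to their segment rather than twice, which a SNP-by-SNP sum of variances would miss.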

Journal ArticleDOI
TL;DR: It is concluded that there is selection to increase synonymous GC-content in many species; this pattern is probably not due to translational selection or biased gene conversion, because optimal codons tend to be AT-rich and the excess of GC→AT SNPs is observed in datasets with no evidence of recombination.
Abstract: The genomic GC-content of bacteria varies dramatically, from less than 20% to more than 70%. This variation is generally ascribed to differences in the pattern of mutation between bacteria. Here we test this hypothesis by examining patterns of synonymous polymorphism using datasets from 149 bacterial species. We find a large excess of synonymous GC→AT mutations over AT→GC mutations segregating in all but the most AT-rich bacteria, across a broad range of phylogenetically diverse species. We show that the excess of GC→AT mutations is inconsistent with mutation bias, since it would imply that most GC-rich bacteria are declining in GC-content; such a pattern would be unsustainable. We also show that the patterns are probably not due to translational selection or biased gene conversion, because optimal codons tend to be AT-rich, and the excess of GC→AT SNPs is observed in datasets with no evidence of recombination. We therefore conclude that there is selection to increase synonymous GC-content in many species. Since synonymous GC-content is highly correlated with genomic GC-content, we further conclude that there is selection on genomic base composition in many bacteria.
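The argument that a GC→AT excess cannot be a steady-state mutation pattern follows from a simple two-state mutation model. Writing $G$ for GC content, $u$ for the per-site AT→GC mutation rate and $v$ for the GC→AT rate (notation introduced here for illustration, not taken from the paper), mutation alone gives:

```latex
\frac{dG}{dt} = u\,(1 - G) - v\,G,
\qquad
G^{*} = \frac{u}{u + v}.
```

New GC→AT mutations arise at total rate $vG$ and AT→GC mutations at $u(1-G)$, so an excess of GC→AT mutations, $vG > u(1-G)$, is equivalent to $G > G^{*}$ and hence $dG/dt < 0$. If mutation were the only force, such a genome would be losing GC content, and observing this excess across most GC-rich species would mean they are all far from equilibrium at once.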