scispace - formally typeset

Showing papers by "Wellcome Trust Sanger Institute" published in 2007


Journal Article
14 Jun 2007-Nature
TL;DR: Functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project are reported, providing convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts.
Abstract: We report the generation and analysis of functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project. These data have been further integrated and augmented by a number of evolutionary and computational analyses. Together, our results advance the collective knowledge about human genome function in several major areas. First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another. Second, systematic examination of transcriptional regulation has yielded new understanding about transcription start sites, including their relationship to specific regulatory sequences and features of chromatin accessibility and histone modification. Third, a more sophisticated view of chromatin structure has emerged, including its inter-relationship with DNA replication and transcriptional regulation. Finally, integration of these new sources of information, in particular with respect to mammalian evolution based on inter- and intra-species sequence comparisons, has yielded new mechanistic and evolutionary insights concerning the functional landscape of the human genome. Together, these studies are defining a path for pursuit of a more comprehensive characterization of human genome function.

5,091 citations


Journal Article
18 Oct 2007-Nature
TL;DR: The Phase II HapMap is described, which characterizes over 3.1 million human single nucleotide polymorphisms genotyped in 270 individuals from four geographically diverse populations and includes 25–35% of common SNP variation in the populations surveyed, and increased differentiation at non-synonymous, compared to synonymous, SNPs is demonstrated.
Abstract: We describe the Phase II HapMap, which characterizes over 3.1 million human single nucleotide polymorphisms (SNPs) genotyped in 270 individuals from four geographically diverse populations and includes 25-35% of common SNP variation in the populations surveyed. The map is estimated to capture untyped common variation with an average maximum r² of between 0.9 and 0.96 depending on population. We demonstrate that the current generation of commercial genome-wide genotyping products captures common Phase II SNPs with an average maximum r² of up to 0.8 in African and up to 0.95 in non-African populations, and that potential gains in power in association studies can be obtained through imputation. These data also reveal novel aspects of the structure of linkage disequilibrium. We show that 10-30% of pairs of individuals within a population share at least one region of extended genetic identity arising from recent ancestry and that up to 1% of all common variants are untaggable, primarily because they lie within recombination hotspots. We show that recombination rates vary systematically around genes and between genes of different function. Finally, we demonstrate increased differentiation at non-synonymous, compared to synonymous, SNPs, resulting from systematic differences in the strength or efficacy of natural selection between populations.

4,565 citations
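The Phase II HapMap abstract reports coverage as an "average maximum r²". As a rough illustration only, pairwise r² between two biallelic SNPs is commonly computed from phased haplotype frequencies as D² / [p_A(1 − p_A) p_B(1 − p_B)], with D = p_AB − p_A p_B; an average maximum r² then takes, for each untyped SNP, its best r² with any typed SNP and averages those values. The sketch below assumes phased haplotypes coded 0/1 and uses made-up toy data; the function names are hypothetical and this is not the HapMap Consortium's pipeline.

```python
from typing import List, Sequence


def r_squared(snp1: Sequence[int], snp2: Sequence[int]) -> float:
    """Pairwise LD r^2 between two biallelic SNPs.

    Each argument gives the allele (0 or 1) carried at that SNP on each
    phased haplotype, with haplotypes in the same order in both lists.
    """
    n = len(snp1)
    p_a = sum(snp1) / n  # frequency of allele 1 at the first SNP
    p_b = sum(snp2) / n  # frequency of allele 1 at the second SNP
    p_ab = sum(1 for x, y in zip(snp1, snp2) if x == 1 and y == 1) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    # A monomorphic SNP carries no information about the other SNP.
    return 0.0 if denom == 0 else d * d / denom


def average_max_r2(untyped: List[Sequence[int]], typed: List[Sequence[int]]) -> float:
    """For each untyped SNP, take its best r^2 with any typed SNP, then average."""
    best = [max(r_squared(u, t) for t in typed) for u in untyped]
    return sum(best) / len(best)


# Toy panel of four phased haplotypes (hypothetical data).
typed = [[0, 0, 1, 1]]
untyped = [[0, 0, 1, 1], [0, 1, 0, 1]]
print(average_max_r2(untyped, typed))  # prints 0.5
```

Here the first untyped SNP is perfectly tagged (r² = 1) and the second is independent of the typed SNP (r² = 0), so the average maximum r² is 0.5.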


Journal Article
TL;DR: The overlap of miRNA sequences with annotated transcripts, both protein-coding and non-coding, is described, and graphical views of the locations of a wide range of genomic features in model organisms allow, for the first time, prediction of the likely boundaries of many miRNA primary transcripts.
Abstract: miRBase is the central online repository for microRNA (miRNA) nomenclature, sequence data, annotation and target prediction. The current release (10.0) contains 5071 miRNA loci from 58 species, expressing 5922 distinct mature miRNA sequences: a growth of over 2000 sequences in the past 2 years. miRBase provides a range of data to facilitate studies of miRNA genomics: all miRNAs are mapped to their genomic coordinates. Clusters of miRNA sequences in the genome are highlighted, and can be defined and retrieved with any inter-miRNA distance. The overlap of miRNA sequences with annotated transcripts, both protein- and non-coding, is described. Finally, graphical views of the locations of a wide range of genomic features in model organisms allow for the first time the prediction of the likely boundaries of many miRNA primary transcripts. miRBase is available at http://microrna.sanger.ac.uk/.

4,493 citations


Journal Article
11 May 2007-Science
TL;DR: A genome-wide search for type 2 diabetes–susceptibility genes identified a common variant in the FTO (fat mass and obesity associated) gene that predisposes to diabetes through an effect on body mass index (BMI).
Abstract: Obesity is a serious international health problem that increases the risk of several common diseases. The genetic factors predisposing to obesity are poorly understood. A genome-wide search for type 2 diabetes-susceptibility genes identified a common variant in the FTO (fat mass and obesity associated) gene that predisposes to diabetes through an effect on body mass index (BMI). An additive association of the variant with BMI was replicated in 13 cohorts with 38,759 participants. The 16% of adults who are homozygous for the risk allele weighed about 3 kilograms more and had 1.67-fold increased odds of obesity when compared with those not inheriting a risk allele. This association was observed from age 7 years upward and reflects a specific increase in fat mass.

4,184 citations


Journal Article
TL;DR: In this paper, the coding exons of the family of 518 protein kinases were sequenced in 210 cancers of diverse histological types to explore the nature of the information that will be derived from cancer genome sequencing.
Abstract: AACR Centennial Conference: Translational Cancer Medicine, Nov 4-8, 2007; Singapore. PL02-05. All cancers are due to abnormalities in DNA. The availability of the human genome sequence has led to the proposal that resequencing of cancer genomes will reveal the full complement of somatic mutations and hence all the cancer genes. To explore the nature of the information that will be derived from cancer genome sequencing we have sequenced the coding exons of the family of 518 protein kinases, ~1.3 Mb of DNA per cancer sample, in 210 cancers of diverse histological types. Despite the screen being directed toward the coding regions of a gene family that has previously been strongly implicated in oncogenesis, the results indicate that the majority of somatic mutations detected are “passengers”. There is considerable variation in the number and pattern of these mutations between individual cancers, indicating substantial diversity of processes of molecular evolution between cancers. The imprints of exogenous mutagenic exposures, mutagenic treatment regimes and DNA repair defects can all be seen in the distinctive mutational signatures of individual cancers. This systematic mutation screen and others have previously yielded a number of cancer genes that are frequently mutated in one or more cancer types and which are now anticancer drug targets (for example BRAF, PIK3CA, and EGFR). However, detailed analyses of the data from our screen additionally suggest that there exist a large number of additional “driver” mutations which are distributed across a substantial number of genes. It therefore appears that cells may be able to utilise mutations in a large repertoire of potential cancer genes to acquire the neoplastic phenotype. However, many of these genes are employed only infrequently. These findings may have implications for future anticancer drug development.

2,737 citations


Journal Article
08 Mar 2007-Nature
TL;DR: More than 1,000 somatic mutations found in 274 megabases of DNA corresponding to the coding exons of 518 protein kinase genes in 210 diverse human cancers reveal the evolutionary diversity of cancers and implicate a larger repertoire of cancer genes than previously anticipated.
Abstract: Cancers arise owing to mutations in a subset of genes that confer growth advantage. The availability of the human genome sequence led us to propose that systematic resequencing of cancer genomes for mutations would lead to the discovery of many additional cancer genes. Here we report more than 1,000 somatic mutations found in 274 megabases (Mb) of DNA corresponding to the coding exons of 518 protein kinase genes in 210 diverse human cancers. There was substantial variation in the number and pattern of mutations in individual cancers reflecting different exposures, DNA repair defects and cellular origins. Most somatic mutations are likely to be 'passengers' that do not contribute to oncogenesis. However, there was evidence for 'driver' mutations contributing to the development of the cancers studied in approximately 120 genes. Systematic sequencing of cancer genomes therefore reveals the evolutionary diversity of cancers and implicates a larger repertoire of cancer genes than previously anticipated.

2,732 citations


Journal Article
Douglas F. Easton, Karen A. Pooley, Alison M. Dunning, Paul D.P. Pharoah, Deborah J. Thompson, Dennis G. Ballinger, Jeffery P. Struewing, Jonathan J. Morrison, Helen I. Field, Robert Luben, Nicholas J. Wareham, Shahana Ahmed, Catherine S. Healey, Richard Bowman, Kerstin B. Meyer, Christopher A. Haiman, Laurence K. Kolonel, Brian E. Henderson, Loic Le Marchand, Paul Brennan, Suleeporn Sangrajrang, Valerie Gaborieau, Fabrice Odefrey, Chen-Yang Shen, Pei-Ei Wu, Hui-Chun Wang, Diana Eccles, D. Gareth Evans, Julian Peto, Olivia Fletcher, Nichola Johnson, Sheila Seal, Michael R. Stratton, Nazneen Rahman, Georgia Chenevix-Trench, Stig E. Bojesen, Børge G. Nordestgaard, C K Axelsson, Montserrat Garcia-Closas, Louise A. Brinton, Stephen J. Chanock, Jolanta Lissowska, Beata Peplonska, Heli Nevanlinna, Rainer Fagerholm, H Eerola, Daehee Kang, Keun-Young Yoo, Dong-Young Noh, Sei Hyun Ahn, David J. Hunter, Susan E. Hankinson, David G. Cox, Per Hall, Sara Wedrén, Jianjun Liu, Yen-Ling Low, Natalia Bogdanova, Peter Schürmann, Dörk Dörk, Rob A. E. M. Tollenaar, Catharina E. Jacobi, Peter Devilee, Jan G. M. Klijn, Alice J. Sigurdson, Michele M. Doody, Bruce H. Alexander, Jinghui Zhang, Angela Cox, Ian W. Brock, Gordon MacPherson, Malcolm W.R. Reed, Fergus J. Couch, Ellen L. Goode, Janet E. Olson, Hanne Meijers-Heijboer, Ans M.W. van den Ouweland, André G. Uitterlinden, Fernando Rivadeneira, Roger L. Milne, Gloria Ribas, Anna González-Neira, Javier Benitez, John L. Hopper, Margaret R. E. McCredie, Melissa C. Southey, Graham G. Giles, Chris Schroen, Christina Justenhoven, Hiltrud Brauch, Ute Hamann, Yon-Dschun Ko, Amanda B. Spurdle, Jonathan Beesley, Xiaoqing Chen, kConFab, Arto Mannermaa, Veli-Matti Kosma, Vesa Kataja, Jaana M. Hartikainen, Nicholas E. Day, David Cox, Bruce A.J. Ponder
28 Jun 2007-Nature
TL;DR: To identify further susceptibility alleles, a two-stage genome-wide association study in 4,398 breast cancer cases and 4,316 controls was conducted, followed by a third stage in which 30 single nucleotide polymorphisms were tested for confirmation.
Abstract: Breast cancer exhibits familial aggregation, consistent with variation in genetic susceptibility to the disease. Known susceptibility genes account for less than 25% of the familial risk of breast cancer, and the residual genetic variance is likely to be due to variants conferring more moderate risks. To identify further susceptibility alleles, we conducted a two-stage genome-wide association study in 4,398 breast cancer cases and 4,316 controls, followed by a third stage in which 30 single nucleotide polymorphisms (SNPs) were tested for confirmation in 21,860 cases and 22,578 controls from 22 studies. We used 227,876 SNPs that were estimated to correlate with 77% of known common SNPs in Europeans at r² > 0.5. SNPs in five novel independent loci exhibited strong and consistent evidence of association with breast cancer (P < 10⁻⁷). Four of these contain plausible causative genes (FGFR2, TNRC9, MAP3K1 and LSP1). At the second stage, 1,792 SNPs were significant at the P < 0.05 level compared with an estimated 1,343 that would be expected by chance, indicating that many additional common susceptibility alleles may be identifiable by this approach.

2,288 citations


Journal Article
Andrew G. Clark, Michael B. Eisen, Douglas Smith +426 more (70 institutions)
08 Nov 2007-Nature
TL;DR: These genome sequences augment the formidable genetic tools that have made Drosophila melanogaster a pre-eminent model for animal genetics, and will further catalyse fundamental research on mechanisms of development, cell biology, genetics, disease, neurobiology, behaviour, physiology and evolution.
Abstract: Comparative analysis of multiple genomes in a phylogenetic framework dramatically improves the precision and sensitivity of evolutionary inference, producing more robust results than single-genome analyses can provide. The genomes of 12 Drosophila species, ten of which are presented here for the first time (sechellia, simulans, yakuba, erecta, ananassae, persimilis, willistoni, mojavensis, virilis and grimshawi), illustrate how rates and patterns of sequence divergence across taxa can illuminate evolutionary processes on a genomic scale. These genome sequences augment the formidable genetic tools that have made Drosophila melanogaster a pre-eminent model for animal genetics, and will further catalyse fundamental research on mechanisms of development, cell biology, genetics, disease, neurobiology, behaviour, physiology and evolution. Despite remarkable similarities among these Drosophila species, we identified many putatively non-neutral changes in protein-coding genes, non-coding RNA genes, and cis-regulatory regions. These may prove to underlie differences in the ecology and behaviour of these diverse species.

2,057 citations


Journal Article
27 Apr 2007-Science
TL;DR: Mice deficient for bic/microRNA-155 are shown to be immunodeficient and to display increased lung airway remodeling, suggesting that bic/microRNA-155 plays a key role in the homeostasis and function of the immune system.
Abstract: MicroRNAs are a class of small RNAs that are increasingly being recognized as important regulators of gene expression. Although hundreds of microRNAs are present in the mammalian genome, genetic studies addressing their physiological roles are at an early stage. We have shown that mice deficient for bic/microRNA-155 are immunodeficient and display increased lung airway remodeling. We demonstrate a requirement of bic/microRNA-155 for the function of B and T lymphocytes and dendritic cells. Transcriptome analysis of bic/microRNA-155–deficient CD4+ T cells identified a wide spectrum of microRNA-155–regulated genes, including cytokines, chemokines, and transcription factors. Our work suggests that bic/microRNA-155 plays a key role in the homeostasis and function of the immune system.

1,880 citations


Journal Article
Pardis C. Sabeti, Patrick Varilly +255 more (50 institutions)
18 Oct 2007-Nature
TL;DR: An analysis of over 3 million HapMap Phase 2 polymorphisms used ‘long-range haplotype’ methods, which identify alleles segregating in a population that have undergone recent selection, together with newly developed cross-population methods that detect alleles that have swept to near-fixation within a population.
Abstract: With the advent of dense maps of human genetic variation, it is now possible to detect positive natural selection across the human genome. Here we report an analysis of over 3 million polymorphisms from the International HapMap Project Phase 2 (HapMap2). We used 'long-range haplotype' methods, which were developed to identify alleles segregating in a population that have undergone recent selection, and we also developed new methods that are based on cross-population comparisons to discover alleles that have swept to near-fixation within a population. The analysis reveals more than 300 strong candidate regions. Focusing on the strongest 22 regions, we develop a heuristic for scrutinizing these regions to identify candidate targets of selection. In a complementary analysis, we identify 26 non-synonymous, coding, single nucleotide polymorphisms showing regional evidence of positive selection. Examination of these candidates highlights three cases in which two genes in a common biological process have apparently undergone positive selection in the same population: LARGE and DMD, both related to infection by the Lassa virus, in West Africa; SLC24A5 and SLC45A2, both involved in skin pigmentation, in Europe; and EDAR and EDA2R, both involved in development of hair follicles, in Asia.

1,778 citations


Journal Article
09 Feb 2007-Science
TL;DR: To determine the overall contribution of CNVs to complex phenotypes, gene expression levels were tested for association with SNPs and CNVs in individuals who are part of the International HapMap project; the signals from the two types of variation showed little overlap.
Abstract: Extensive studies are currently being performed to associate disease susceptibility with one form of genetic variation, namely, single-nucleotide polymorphisms (SNPs). In recent years, another type of common genetic variation has been characterized, namely, structural variation, including copy number variants (CNVs). To determine the overall contribution of CNVs to complex phenotypes, we have performed association analyses of expression levels of 14,925 transcripts with SNPs and CNVs in individuals who are part of the International HapMap project. SNPs and CNVs captured 83.6% and 17.7% of the total detected genetic variation in gene expression, respectively, but the signals from the two types of variation had little overlap. Interrogation of the genome for both types of variants may be an effective way to elucidate the causes of complex phenotypes and disease in humans.

Journal Article
30 Nov 2007-Science
TL;DR: It is found that recombinant murine Fto catalyzes the Fe(II)- and 2OG-dependent demethylation of 3-methylthymine in single-stranded DNA, with concomitant production of succinate, formaldehyde, and carbon dioxide.
Abstract: Variants in the FTO (fat mass and obesity associated) gene are associated with increased body mass index in humans. Here, we show by bioinformatics analysis that FTO shares sequence motifs with Fe(II)- and 2-oxoglutarate–dependent oxygenases. We find that recombinant murine Fto catalyzes the Fe(II)- and 2OG-dependent demethylation of 3-methylthymine in single-stranded DNA, with concomitant production of succinate, formaldehyde, and carbon dioxide. Consistent with a potential role in nucleic acid demethylation, Fto localizes to the nucleus in transfected cells. Studies of wild-type mice indicate that Fto messenger RNA (mRNA) is most abundant in the brain, particularly in hypothalamic nuclei governing energy balance, and that Fto mRNA levels in the arcuate nucleus are regulated by feeding and fasting. Studies can now be directed toward determining the physiologically relevant FTO substrate and how nucleic acid methylation status is linked to increased fat mass.

Journal Article
TL;DR: A genotype-phenotype association study in Tanzanians, Kenyans and Sudanese identified three SNPs that are associated with lactase persistence and whose derived alleles significantly enhance transcription from the LCT promoter in vitro, providing a marked example of convergent evolution due to strong selective pressure resulting from shared cultural traits.
Abstract: A SNP in the gene encoding lactase (LCT) (C/T-13910) is associated with the ability to digest milk as adults (lactase persistence) in Europeans, but the genetic basis of lactase persistence in Africans was previously unknown. We conducted a genotype-phenotype association study in 470

Reference Entry
TL;DR: This unit introduces the concept of hidden Markov models in computational biology, describes them using simple biological examples that require as little mathematical knowledge as possible, and presents an overview of their current applications.
Abstract: This unit introduces the concept of hidden Markov models in computational biology. It describes them using simple biological examples, requiring as little mathematical knowledge as possible. The unit also presents a brief history of hidden Markov models and an overview of their current applications before concluding with a discussion of their limitations.
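The unit above describes hidden Markov models only in outline. As a hedged illustration of one standard HMM algorithm, the sketch below runs Viterbi decoding in log space to recover the most probable hidden-state path for a DNA sequence under a hypothetical two-state model ("bg" for background sequence, "island" for a GC-rich region). The states, probabilities and function name are invented for illustration and are not taken from the unit.

```python
import math
from typing import Dict, List, Sequence, Tuple


def viterbi(
    seq: Sequence[str],
    states: Sequence[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
) -> Tuple[List[str], float]:
    """Return the most probable hidden-state path and its log-probability."""
    # best[s]: log-probability of the best path ending in state s at the current position.
    best = {s: math.log(start_p[s]) + math.log(emit_p[s][seq[0]]) for s in states}
    back: List[Dict[str, str]] = []  # back-pointers, one dict per later position
    for symbol in seq[1:]:
        pointers: Dict[str, str] = {}
        scores: Dict[str, float] = {}
        for s in states:
            # Choose the predecessor state that maximises the path score into s.
            prev = max(states, key=lambda r: best[r] + math.log(trans_p[r][s]))
            pointers[s] = prev
            scores[s] = best[prev] + math.log(trans_p[prev][s]) + math.log(emit_p[s][symbol])
        back.append(pointers)
        best = scores
    # Trace back from the highest-scoring final state.
    last = max(states, key=lambda s: best[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


# Hypothetical two-state model: background vs GC-rich "island".
states = ["bg", "island"]
start_p = {"bg": 0.5, "island": 0.5}
trans_p = {"bg": {"bg": 0.9, "island": 0.1}, "island": {"bg": 0.1, "island": 0.9}}
emit_p = {
    "bg": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
    "island": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
}

seq = "ATATAT" + "GCGCGCGCGC" + "ATATAT"
path, log_p = viterbi(seq, states, start_p, trans_p, emit_p)
print("".join("I" if s == "island" else "." for s in path))  # prints ......IIIIIIIIII......
```

With these made-up parameters the ten-base GC run is long enough to outweigh the cost of two state switches, so the decoded path labels exactly that run as "island". Working in log space avoids numerical underflow on long sequences.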

Journal Article
Paul Burton, David Clayton, Lon R. Cardon, Nicholas John Craddock +221 more (30 institutions)
TL;DR: In this paper, the authors report initial association and independent replication in a North American sample of two new loci related to ankylosing spondylitis, ARTS1 and IL23R, and confirm the previously reported association of AITD with TSHR and FCRL3.
Abstract: We have genotyped 14,436 nonsynonymous SNPs (nsSNPs) and 897 major histocompatibility complex (MHC) tag SNPs from 1,000 independent cases of ankylosing spondylitis (AS), autoimmune thyroid disease (AITD), multiple sclerosis (MS) and breast cancer (BC). Comparing these data against a common control dataset derived from 1,500 randomly selected healthy British individuals, we report initial association and independent replication in a North American sample of two new loci related to ankylosing spondylitis, ARTS1 and IL23R, and confirmation of the previously reported association of AITD with TSHR and FCRL3. These findings, enabled in part by increased statistical power resulting from the expansion of the control reference group to include individuals from the other disease groups, highlight notable new possibilities for autoimmune regulation and suggest that IL23R may be a common susceptibility factor for the major 'seronegative' diseases.

Journal Article
13 Apr 2007-Science
TL;DR: The genome sequence of an Indian-origin Macaca mulatta female is determined and compared with chimpanzees and humans to reveal the structure of ancestral primate genomes and to identify evidence for positive selection and lineage-specific expansions and contractions of gene families.
Abstract: The rhesus macaque (Macaca mulatta) is an abundant primate species that diverged from the ancestors of Homo sapiens about 25 million years ago. Because they are genetically and physiologically similar to humans, rhesus monkeys are the most widely used nonhuman primate in basic and applied biomedical research. We determined the genome sequence of an Indian-origin Macaca mulatta female and compared the data with chimpanzees and humans to reveal the structure of ancestral primate genomes and to identify evidence for positive selection and lineage-specific expansions and contractions of gene families. A comparison of sequences from individual animals was used to investigate their underlying genetic diversity. The complete description of the macaque genome blueprint enhances the utility of this animal model for biomedical research and improves our understanding of the basic biology of the species.

Journal Article
TL;DR: The copy number of the salivary amylase gene (AMY1) is found to correlate positively with salivary amylase protein level, and individuals from populations with high-starch diets have, on average, more AMY1 copies than those with traditionally low-starch diets.
Abstract: Starch consumption is a prominent characteristic of agricultural societies and hunter-gatherers in arid environments. In contrast, rainforest and circum-arctic hunter-gatherers and some pastoralists consume much less starch. This behavioral variation raises the possibility that different selective pressures have acted on amylase, the enzyme responsible for starch hydrolysis. We found that copy number of the salivary amylase gene (AMY1) is correlated positively with salivary amylase protein level and that individuals from populations with high-starch diets have, on average, more AMY1 copies than those with traditionally low-starch diets. Comparisons with other loci in a subset of these populations suggest that the extent of AMY1 copy number differentiation is highly unusual. This example of positive selection on a copy number-variable gene is, to our knowledge, one of the first discovered in the human genome. Higher AMY1 copy numbers and protein levels probably improve the digestion of starchy foods and may buffer against the fitness-reducing effects of intestinal disease.

Journal Article
TL;DR: Four somatic gain-of-function mutations affecting JAK2 exon 12 define a distinctive myeloproliferative syndrome that affects patients who currently receive a diagnosis of polycythemia vera or idiopathic erythrocytosis.
Abstract: BACKGROUND The V617F mutation, which causes the substitution of phenylalanine for valine at position 617 of the Janus kinase (JAK) 2 gene (JAK2), is often present in patients with polycythemia vera, essential thrombocythemia, and idiopathic myelofibrosis. However, the molecular basis of these myeloproliferative disorders in patients without the V617F mutation is unclear.
METHODS We searched for new mutations in members of the JAK and signal transducer and activator of transcription (STAT) gene families in patients with V617F-negative polycythemia vera or idiopathic erythrocytosis. The mutations were characterized biochemically and in a murine model of bone marrow transplantation.
RESULTS We identified four somatic gain-of-function mutations affecting JAK2 exon 12 in 10 V617F-negative patients. Those with a JAK2 exon 12 mutation presented with an isolated erythrocytosis and distinctive bone marrow morphology, and several also had reduced serum erythropoietin levels. Erythroid colonies could be grown from their blood samples in the absence of exogenous erythropoietin. All such erythroid colonies were heterozygous for the mutation, whereas colonies homozygous for the mutation occur in most patients with V617F-positive polycythemia vera. BaF3 cells expressing the murine erythropoietin receptor and also carrying exon 12 mutations could proliferate without added interleukin-3. They also exhibited increased phosphorylation of JAK2 and extracellular regulated kinase 1 and 2, as compared with cells transduced by wild-type JAK2 or V617F JAK2. Three of the exon 12 mutations included a substitution of leucine for lysine at position 539 of JAK2. This mutation resulted in a myeloproliferative phenotype, including erythrocytosis, in a murine model of retroviral bone marrow transplantation.
CONCLUSIONS JAK2 exon 12 mutations define a distinctive myeloproliferative syndrome that affects patients who currently receive a diagnosis of polycythemia vera or idiopathic erythrocytosis.

Journal Article
TL;DR: A genome-wide association scan in individuals with Crohn's disease by the Wellcome Trust Case Control Consortium detected strong association at four novel loci, and 37 SNPs from these and other loci were tested for association in an independent case-control sample.
Abstract: A genome-wide association scan in individuals with Crohn's disease by the Wellcome Trust Case Control Consortium detected strong association at four novel loci. We tested 37 SNPs from these and other loci for association in an independent case-control sample. We obtained replication for the autophagy-inducing IRGM gene on chromosome 5q33.1 (replication P = 6.6 × 10⁻⁴, combined P = 2.1 × 10⁻¹⁰) and for nine other loci, including NKX2-3, PTPN2 and gene deserts on chromosomes 1q and 5p13.

Journal Article
TL;DR: Observations indicate the presence of at least two independent loci within 8q24 that contribute to prostate cancer in men of European ancestry, and the population attributable risk of the new locus, marked by rs6983267, is estimated to be higher than that of the locus marked by rs1447295.
Abstract: Recently, common variants on human chromosome 8q24 were found to be associated with prostate cancer risk. While conducting a genome-wide association study in the Cancer Genetic Markers of Susceptibility project with 550,000 SNPs in a nested case-control study (1,172 cases and 1,157 controls of European origin), we identified a new association at 8q24 with an independent effect on prostate cancer susceptibility. The most significant signal is 70 kb centromeric to the previously reported SNP, rs1447295, but shows little evidence of linkage disequilibrium with it. A combined analysis with four additional studies (total: 4,296 cases and 4,299 controls) confirms association with prostate cancer for rs6983267 in the centromeric locus (P = 9.42 × 10⁻¹³; heterozygote odds ratio (OR): 1.26, 95% confidence interval (c.i.): 1.13-1.41; homozygote OR: 1.58, 95% c.i.: 1.40-1.78). Each SNP remained significant in a joint analysis after adjusting for the other (rs1447295 P = 1.41 × 10⁻¹¹; rs6983267 P = 6.62 × 10⁻¹⁰). These observations, combined with compelling evidence for a recombination hotspot between the two markers, indicate the presence of at least two independent loci within 8q24 that contribute to prostate cancer in men of European ancestry. We estimate that the population attributable risk of the new locus, marked by rs6983267, is higher than that of the locus marked by rs1447295 (21% versus 9%).
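The prostate cancer abstract reports population attributable risks (PAR) of 21% and 9% but does not state the formula used. For background only, one standard definition for an exposure with several categories (for example, carrier genotypes) is Levin's formula generalised to k categories, where p_i is the population frequency of category i and RR_i its relative risk against the reference category; odds ratios are often substituted for RR_i when the disease is rare. This is not presented as the authors' calculation, and the abstract does not give the genotype frequencies needed to recompute its figures.

```latex
\mathrm{PAR} = \frac{\sum_{i=1}^{k} p_i\,(\mathrm{RR}_i - 1)}{1 + \sum_{i=1}^{k} p_i\,(\mathrm{RR}_i - 1)}
```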

Journal Article
TL;DR: It is found that gene expression is heritable and that differentiation between populations is in agreement with earlier small-scale studies, and the results strongly support an abundance of cis-regulatory variation in the human genome.
Abstract: Genetic variation influences gene expression, and this variation in gene expression can be efficiently mapped to specific genomic regions and variants. Here we have used gene expression profiling of Epstein-Barr virus‐transformed lymphoblastoid cell lines of all 270 individuals genotyped in the HapMap Consortium to elucidate the detailed features of genetic variation underlying gene expression variation. We find that gene expression is heritable and that differentiation between populations is in agreement with earlier small-scale studies. A detailed association analysis of over 2.2 million common SNPs per population (5% frequency in HapMap) with gene expression identified at least 1,348 genes with association signals in cis and at least 180 in trans. Replication in at least one independent population was achieved for 37% of cis signals and 15% of trans signals, respectively. Our results strongly support an abundance of cis-regulatory variation in the human genome. Detection of trans effects is limited but suggests that regulatory variation may be the key primary effect contributing to phenotypic variation in humans. We also explore several methodologies that improve the current state of analysis of gene expression variation.

Journal Article
TL;DR: Manipulation of the intestinal microbiota by the enteropathogenic bacterium Salmonella enterica subspecies 1 serovar Typhimurium in a mouse colitis model reveals a new concept in infectious disease: in contrast to current thinking, inflammation is not always detrimental for the pathogen.
Abstract: Most mucosal surfaces of the mammalian body are colonized by microbial communities ("microbiota"). A high density of commensal microbiota inhabits the intestine and shields the host from infection ("colonization resistance"). The virulence strategies allowing enteropathogenic bacteria to successfully compete with the microbiota and overcome colonization resistance are poorly understood. Here, we investigated manipulation of the intestinal microbiota by the enteropathogenic bacterium Salmonella enterica subspecies 1 serovar Typhimurium (S. Tm) in a mouse colitis model: we found that inflammatory host responses induced by S. Tm changed microbiota composition and suppressed its growth. In contrast to wild-type S. Tm, an avirulent invG sseD mutant failing to trigger colitis was outcompeted by the microbiota. This competitive defect was reverted if inflammation was provided concomitantly by mixed infection with wild-type S. Tm or in mice (IL10(-/-), VILLIN-HA(CL4-CD8)) with inflammatory bowel disease. Thus, inflammation is necessary and sufficient for overcoming colonization resistance. This reveals a new concept in infectious disease: in contrast to current thinking, inflammation is not always detrimental for the pathogen. Triggering the host's immune defence can shift the balance between the protective microbiota and the pathogen in favour of the pathogen.

Journal Article
TL;DR: The results show that PALB2 is a breast cancer susceptibility gene and further demonstrate the close relationship of the Fanconi anemia–DNA repair pathway and breast cancer predisposition.
Abstract: PALB2 interacts with BRCA2, and biallelic mutations in PALB2 (also known as FANCN), similar to biallelic BRCA2 mutations, cause Fanconi anemia. We identified monoallelic truncating PALB2 mutations in 10/923 individuals with familial breast cancer compared with 0/1,084 controls (P = 0.0004) and show that such mutations confer a 2.3-fold higher risk of breast cancer (95% confidence interval (c.i.) = 1.4-3.9, P = 0.0025). The results show that PALB2 is a breast cancer susceptibility gene and further demonstrate the close relationship of the Fanconi anemia-DNA repair pathway and breast cancer predisposition.
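The abstract gives the carrier counts (10/923 cases, 0/1,084 controls) and P = 0.0004 but does not name the test. As an illustration, a two-sided Fisher's exact test, a common choice for sparse 2×2 tables, can be written with only the Python standard library. On these counts it gives P ≈ 0.0004, in line with the reported value, though that alone does not show it is the test the authors used. The function name and argument order are our own.

```python
from math import comb


def fisher_exact_two_sided(a, n1, k, n2):
    """Two-sided Fisher's exact test for a 2x2 carrier table.

    a carriers among n1 cases, k carriers among n2 controls. With the table
    margins fixed, the number of carriers among cases is hypergeometric.
    """
    carriers = a + k
    denom = comb(n1 + n2, carriers)

    def pmf(x):
        # Probability that exactly x of the carriers fall among the cases.
        return comb(n1, x) * comb(n2, carriers - x) / denom

    lo, hi = max(0, carriers - n2), min(carriers, n1)
    p_obs = pmf(a)
    # Sum every table at least as improbable as the observed one; the small
    # relative tolerance guards against floating-point ties.
    return sum(p for p in map(pmf, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


p = fisher_exact_two_sided(10, 923, 0, 1084)
print(f"P = {p:.4f}")  # P = 0.0004
```

Because no control carried a truncating mutation, the sample odds ratio is infinite here, which is one reason an exact test on the counts is a natural fit for the carrier comparison.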

Journal Article
21 Dec 2007-Immunity
TL;DR: These results show an intrinsic requirement for miR-155 in B cell responses to thymus-dependent and -independent antigens and implicate post-transcriptional regulation of gene expression in establishing the terminal differentiation program of B cells.

Journal Article
10 May 2007-Nature
TL;DR: A high-quality draft of the genome sequence of the grey, short-tailed opossum is reported; distinctive features of its chromosomes point to a strong influence of biased gene conversion on nucleotide sequence composition and a relationship between chromosomal characteristics and X chromosome inactivation.
Abstract: We report a high-quality draft of the genome sequence of the grey, short-tailed opossum (Monodelphis domestica). As the first metatherian ('marsupial') species to be sequenced, the opossum provides a unique perspective on the organization and evolution of mammalian genomes. Distinctive features of the opossum chromosomes provide support for recent theories about genome evolution and function, including a strong influence of biased gene conversion on nucleotide sequence composition, and a relationship between chromosomal characteristics and X chromosome inactivation. Comparison of opossum and eutherian genomes also reveals a sharp difference in evolutionary innovation between protein-coding and non-coding functional elements. True innovation in protein-coding genes seems to be relatively rare, with lineage-specific differences being largely due to diversification and rapid turnover in gene families involved in environmental interactions. In contrast, about 20% of eutherian conserved non-coding elements (CNEs) are recent inventions that postdate the divergence of Eutheria and Metatheria. A substantial proportion of these eutherian-specific CNEs arose from sequence inserted by transposable elements, pointing to transposons as a major creative force in the evolution of mammalian gene regulation.

Journal Article
TL;DR: It is shown that pseudogene formation and gene loss are the principal forces shaping the different genomes of Leishmania, and genes that are differentially distributed between the species encode proteins implicated in host-pathogen interactions and parasite survival in the macrophage.
Abstract: Leishmania parasites cause a broad spectrum of clinical disease. Here we report the sequencing of the genomes of two species of Leishmania: Leishmania infantum and Leishmania braziliensis. The comparison of these sequences with the published genome of Leishmania major reveals marked conservation of synteny and identifies only 200 genes with a differential distribution between the three species. L. braziliensis, unlike the Leishmania species examined so far, possesses components of a putative RNA-mediated interference pathway, telomere-associated transposable elements and spliced leader–associated SLACS retrotransposons. We show that pseudogene formation and gene loss are the principal forces shaping the different genomes. Genes that are differentially distributed between the species encode proteins implicated in host-pathogen interactions and parasite survival in the macrophage.

Journal Article
TL;DR: A highly conserved, thermodynamically stable RNA G-quadruplex is discovered in the 5' untranslated region (UTR) of the gene transcript of the human NRAS proto-oncogene, and it is demonstrated that this NRAS RNA G-quadruplex modulates translation.
Abstract: Guanine-rich nucleic acid sequences can adopt noncanonical four-stranded secondary structures called guanine (G)-quadruplexes¹. Bioinformatics analysis suggests that G-quadruplex motifs are prevalent in genomes², which raises the need to elucidate their function. There is now evidence for the existence of DNA G-quadruplexes at telomeres with associated biological function³. A recent hypothesis supports the notion that gene promoter elements contain DNA G-quadruplex motifs that control gene expression at the transcriptional level⁴. We discovered a highly conserved, thermodynamically stable RNA G-quadruplex in the 5′ untranslated region (UTR) of the gene transcript of the human NRAS proto-oncogene. Using a cell-free translation system coupled to a reporter gene assay, we have demonstrated that this NRAS RNA G-quadruplex modulates translation. This is the first example of translational repression by an RNA G-quadruplex. Bioinformatics analysis has revealed 2,922 other 5′ UTR RNA G-quadruplex elements in the human genome. We propose that RNA G-quadruplexes in the 5′ UTR modulate gene expression at the translational level.
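The abstract mentions a bioinformatics search for 5′ UTR G-quadruplex elements but not the pattern used. A widely used consensus for putative G-quadruplex motifs is four runs of at least three G separated by loops of 1-7 nucleotides, and a scan for it can be sketched with a regular expression. This is an assumed pattern for illustration; the paper's exact search criteria may differ, and a sequence match says nothing about the thermodynamic stability or translational effect the authors measured experimentally.

```python
import re

# Assumed consensus for putative G-quadruplex motifs: four runs of >= 3 G
# separated by loops of 1-7 nucleotides (any base; T for DNA, U for RNA).
G4_PATTERN = re.compile(r"G{3,}(?:[ACGTU]{1,7}?G{3,}){3}")


def find_g4_motifs(seq):
    """Return (start, motif) pairs for non-overlapping putative G-quadruplexes."""
    seq = seq.upper()
    return [(m.start(), m.group()) for m in G4_PATTERN.finditer(seq)]


# Hypothetical RNA fragment (not the NRAS 5' UTR sequence).
print(find_g4_motifs("AAGGGUGGGUGGGUGGGAA"))  # [(2, 'GGGUGGGUGGGUGGG')]
```

Because `finditer` reports non-overlapping matches, overlapping alternative foldings of a long G-rich stretch are counted once; a genome-wide count such as the 2,922 elements above depends on choices like this.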

Journal Article
TL;DR: This paper tested 310,605 SNPs for association in 778 individuals with celiac disease and 1,422 controls, and the most significant finding (rs13119723; P = 2.0 × 10⁻⁷) was in the KIAA1109-TENR-IL2-IL21 linkage disequilibrium block.
Abstract: We tested 310,605 SNPs for association in 778 individuals with celiac disease and 1,422 controls. Outside the HLA region, the most significant finding (rs13119723; P = 2.0 × 10⁻⁷) was in the KIAA1109-TENR-IL2-IL21 linkage disequilibrium block. We independently confirmed association in two further collections (strongest association at rs6822844, 24 kb 5' of IL21; meta-analysis P = 1.3 × 10⁻¹⁴, odds ratio = 0.63), suggesting that genetic variation in this region predisposes to celiac disease.

Journal Article
TL;DR: A number of new features have been added to the BioGRID, including an improved user interface to display interactions based on different attributes, a mirror site and a dedicated interaction management system to coordinate curation across different locations.
Abstract: The Biological General Repository for Interaction Datasets (BioGRID) database (http://www.thebiogrid.org) was developed to house and distribute collections of protein and genetic interactions from major model organism species. BioGRID currently contains over 198,000 interactions from six different species, derived from both high-throughput studies and conventional focused studies. Through comprehensive curation efforts, BioGRID now includes a virtually complete set of interactions reported to date in the primary literature for both the budding yeast Saccharomyces cerevisiae and the fission yeast Schizosaccharomyces pombe. A number of new features have been added to the BioGRID, including an improved user interface to display interactions based on different attributes, a mirror site and a dedicated interaction management system to coordinate curation across different locations. The BioGRID provides monthly interaction data updates to the Saccharomyces Genome Database, FlyBase and Entrez Gene. Source code for the BioGRID and the linked Osprey network visualization system is now freely available without restriction.

Journal Article
08 Nov 2007-Nature
TL;DR: This work uses the genomes of 12 Drosophila species for the de novo discovery of functional elements in the fly, and identifies several classes of pre- and post-transcriptional regulatory motifs, and predicts individual motif instances with high confidence.
Abstract: Sequencing of multiple related species followed by comparative genomics analysis constitutes a powerful approach for the systematic understanding of any genome. Here, we use the genomes of 12 Drosophila species for the de novo discovery of functional elements in the fly. Each type of functional element shows characteristic patterns of change, or 'evolutionary signatures', dictated by its precise selective constraints. Such signatures enable recognition of new protein-coding genes and exons, spurious and incorrect gene annotations, and numerous unusual gene structures, including abundant stop-codon readthrough. Similarly, we predict non-protein-coding RNA genes and structures, and new microRNA (miRNA) genes. We provide evidence of miRNA processing and functionality from both hairpin arms and both DNA strands. We identify several classes of pre- and post-transcriptional regulatory motifs, and predict individual motif instances with high confidence. We also study how discovery power scales with the divergence and number of species compared, and we provide general guidelines for comparative studies.