Showing papers by "Roderic Guigó" published in 2011


Journal Article
TL;DR: An overview of the project and the resources it is generating and the application of ENCODE data to interpret the human genome are provided.
Abstract: The mission of the Encyclopedia of DNA Elements (ENCODE) Project is to enable the scientific and medical communities to interpret the human genome sequence and apply it to understand human biology and improve health. The ENCODE Consortium is integrating multiple technologies and approaches in a collective effort to discover and define the functional elements encoded in the human genome, including genes, transcripts, and transcriptional regulatory regions, together with their attendant chromatin states and DNA methylation patterns. In the process, standards to ensure high-quality data have been implemented, and novel algorithms have been developed to facilitate analysis. Data and derived results are made available through a freely accessible database. Here we provide an overview of the project and the resources it is generating and illustrate the application of ENCODE data to interpret the human genome.

1,446 citations


Journal Article
07 Jul 2011-Nature
TL;DR: The patterns of somatic mutation, supported by functional and clinical analyses, strongly indicate that the recurrent NOTCH1, MYD88 and XPO1 mutations are oncogenic changes that contribute to the clinical evolution of the disease.
Abstract: Chronic lymphocytic leukaemia (CLL), the most frequent leukaemia in adults in Western countries, is a heterogeneous disease with variable clinical presentation and evolution. Two major molecular subtypes can be distinguished, characterized respectively by a high or low number of somatic hypermutations in the variable region of immunoglobulin genes. The molecular changes leading to the pathogenesis of the disease are still poorly understood. Here we performed whole-genome sequencing of four cases of CLL and identified 46 somatic mutations that potentially affect gene function. Further analysis of these mutations in 363 patients with CLL identified four genes that are recurrently mutated: notch 1 (NOTCH1), exportin 1 (XPO1), myeloid differentiation primary response gene 88 (MYD88) and kelch-like 6 (KLHL6). Mutations in MYD88 and KLHL6 are predominant in cases of CLL with mutated immunoglobulin genes, whereas NOTCH1 and XPO1 mutations are mainly detected in patients with unmutated immunoglobulins. The patterns of somatic mutation, supported by functional and clinical analyses, strongly indicate that the recurrent NOTCH1, MYD88 and XPO1 mutations are oncogenic changes that contribute to the clinical evolution of the disease. To our knowledge, this is the first comprehensive analysis of CLL combining whole-genome sequencing with clinical characteristics and clinical outcomes. It highlights the usefulness of this approach for the identification of clinically relevant mutations in cancer.

1,435 citations


Journal Article
Christopher A. Maxwell1, Javier Benitez, Laia Gómez-Baldó, Ana Osorio, Núria Bonifaci, Ricardo Fernandez-Ramires, Sylvain V. Costes2, Elisabet Guinó, Helen Chen1, G Evans1, Pooja Mohan1, Isabel Catala, Anna Petit, Helena Aguilar, Alberto Villanueva, Alvaro Aytes, Jordi Serra-Musach, Gad Rennert3, Flavio Lejbkowicz3, Paolo Peterlongo, Siranoush Manoukian, Bernard Peissel, Carla B. Ripamonti, Bernardo Bonanni4, Alessandra Viel, Anna Allavena5, Loris Bernard4, Paolo Radice, Eitan Friedman6, Bella Kaufman7, Yael Laitman7, Maya Dubrovsky7, Roni Milgrom7, Anna Jakubowska8, Cezary Cybulski8, Bohdan Górski8, Katarzyna Jaworska8, Katarzyna Durda8, Grzegorz Sukiennicki8, Jan Lubinski8, Yin Yao Shugart9, Susan M. Domchek10, Richard Letrero10, Barbara L. Weber11, Frans B. L. Hogervorst12, Matti A. Rookus12, J. Margriet Collée13, Peter Devilee14, Marjolijn J. L. Ligtenberg15, Rob B. van der Luijt16, Cora M. Aalfs17, Quinten Waisfisz18, Juul T. Wijnen14, Cornelis E. P. van Roozendaal19, Douglas F. Easton20, Susan Peock20, Margaret Cook20, Clare Oliver20, Debra Frost20, Patricia Harrington20, D. Gareth Evans21, Fiona Lalloo, Rosalind A. Eeles22, Louise Izatt23, Carol Chu24, Diana Eccles25, Fiona Douglas26, Carole Brewer27, Heli Nevanlinna28, Tuomas Heikkinen28, Fergus J. Couch29, Noralane M. Lindor29, Xianshu Wang29, Andrew K. Godwin30, Maria A. Caligo31, Grazia Lombardi31, Niklas Loman, Per Karlsson32, Hans Ehrencrona33, Anna von Wachenfeldt34, Rosa B. Barkardottir, Ute Hamann35, Muhammad Usman Rashid35, Adriana Lasa36, Trinidad Caldés37, Raquel Andrés38, Michael Schmitt39, Volker Assmann40, Kristen N. Stevens41, Kenneth Offit42, Joao Curado43, Hagen Tilgner43, Roderic Guigó43, Gemma Aiza, Joan Brunet, Joan Castellsague, Griselda Martrat, Ander Urruticoechea, Ignacio Blanco, Laima Tihomirova44, David E. Goldgar45, Saundra S. Buys45, Esther M. John46, Alexander Miron47, Melissa C. Southey48, Mary B. Daly49, Rita K. Schmutzler50, Barbara Wappenschmidt50, Alfons Meindl51, Norbert Arnold52, Helmut Deissler53, Raymonda Varon-Mateeva54, Christian Sutter55, Dieter Niederacher56, Evgeny Imyanitov, Olga M. Sinilnikova, Dominique Stoppa-Lyonnet57, Sylvie Mazoyer58, Carole Verny-Pierre58, Laurent Castera57, Antoine De Pauw57, Yves-Jean Bignon, Nancy Uhrhammer, Jean-Philippe Peyrat, Philippe Vennin, Sandra Fert Ferrer, Marie-Agnès Collonge-Rame59, Isabelle Mortemousque, Amanda B. Spurdle60, Jonathan Beesley60, Xiaoqing Chen60, Sue Healey60, Mary Helen Barcellos-Hoff61, Marc Vidal47, Stephen B. Gruber41, Conxi Lázaro, Gabriel Capellá, Lesley McGuffog20, Katherine L. Nathanson20, Antonis C. Antoniou20, Georgia Chenevix-Trench60, Markus C. Fleisch56, Victor Moreno, Miguel Angel Pujana
Family Research Institute1, Lawrence Berkeley National Laboratory2, Technion – Israel Institute of Technology3, European Institute of Oncology4, University of Turin5, Tel Aviv University6, Sheba Medical Center7, Pomeranian Medical University8, National Institutes of Health9, University of Pennsylvania10, Novartis11, Netherlands Cancer Institute12, Erasmus University Rotterdam13, Leiden University14, Radboud University Nijmegen15, Utrecht University16, University of Amsterdam17, VU University Amsterdam18, Maastricht University19, University of Cambridge20, Central Manchester University Hospitals NHS Foundation Trust21, The Royal Marsden NHS Foundation Trust22, Guy's and St Thomas' NHS Foundation Trust23, St James's University Hospital24, Princess Anne Hospital25, Newcastle upon Tyne Hospitals NHS Foundation Trust26, Royal Devon and Exeter Hospital27, University of Helsinki28, Mayo Clinic29, University of Kansas30, University of Pisa31, University of Gothenburg32, Uppsala University33, Karolinska Institutet34, German Cancer Research Center35, Memorial Hospital of South Bend36, Complutense University of Madrid37, University of Zaragoza38, University of Rostock39, University of Hamburg40, University of Michigan41, Memorial Sloan Kettering Cancer Center42, Pompeu Fabra University43, Latvian Biomedical Research and Study centre44, University of Utah45, Cancer Prevention Institute of California46, Harvard University47, University of Melbourne48, Fox Chase Cancer Center49, University of Cologne50, Technische Universität München51, University of Kiel52, University of Ulm53, Charité54, Heidelberg University55, University of Düsseldorf56, University of Paris57, University of Lyon58, University of Franche-Comté59, QIMR Berghofer Medical Research Institute60, New York University61
TL;DR: Cell biological analysis of the protein product suggests a function in regulating development of the mammary gland and genetic analysis identifies the HMMR gene as a modifier of the breast cancer risk associated with BRCA1 gene mutation.
Abstract: Differentiated mammary epithelium shows apicobasal polarity, and loss of tissue organization is an early hallmark of breast carcinogenesis. In BRCA1 mutation carriers, accumulation of stem and progenitor cells in normal breast tissue and increased risk of developing tumors of basal-like type suggest that BRCA1 regulates stem/progenitor cell proliferation and differentiation. However, the function of BRCA1 in this process and its link to carcinogenesis remain unknown. Here we depict a molecular mechanism involving BRCA1 and RHAMM that regulates apicobasal polarity and, when perturbed, may increase risk of breast cancer. Starting from complementary genetic analyses across families and populations, we identified common genetic variation at the low-penetrance susceptibility HMMR locus (encoding for RHAMM) that modifies breast cancer risk among BRCA1, but probably not BRCA2, mutation carriers: n = 7,584, weighted hazard ratio ((w)HR) = 1.09 (95% CI 1.02-1.16), p(trend) = 0.017; and n = 3,965, (w)HR = 1.04 (95% CI 0.94-1.16), p(trend) = 0.43; respectively. Subsequently, studies of MCF10A apicobasal polarization revealed a central role for BRCA1 and RHAMM, together with AURKA and TPX2, in essential reorganization of microtubules. Mechanistically, reorganization is facilitated by BRCA1 and impaired by AURKA, which is regulated by negative feedback involving RHAMM and TPX2. Taken together, our data provide fundamental insight into apicobasal polarization through BRCA1 function, which may explain the expanded cell subsets and characteristic tumor type accompanying BRCA1 mutation, while also linking this process to sporadic breast cancer through perturbation of HMMR/RHAMM.

190 citations


Journal Article
TL;DR: The genomic distribution of CTCF in various human, mouse and chicken cell types is analyzed, demonstrating the existence of evolutionarily conserved CTCF-bound sites beyond mammals, and the polymorphic variants associated with multiple sclerosis are predicted and functionally demonstrated to impinge on the adjacent gene GFI1.
Abstract: Many genomic alterations associated with human diseases localize in noncoding regulatory elements located far from the promoters they regulate, making it challenging to link noncoding mutations or risk-associated variants with target genes. The range of action of a given set of enhancers is thought to be defined by insulator elements bound by the 11 zinc-finger nuclear factor CCCTC-binding protein (CTCF). Here we analyzed the genomic distribution of CTCF in various human, mouse and chicken cell types, demonstrating the existence of evolutionarily conserved CTCF-bound sites beyond mammals. These sites preferentially flank transcription factor-encoding genes, often associated with human diseases, and function as enhancer blockers in vivo, suggesting that they act as evolutionarily invariant gene boundaries. We then applied this concept to predict and functionally demonstrate that the polymorphic variants associated with multiple sclerosis located within the EVI5 gene impinge on the adjacent gene GFI1.

94 citations


Journal Article
TL;DR: The data supports a model whereby the acquisition of functional AS has occurred throughout vertebrate evolution and is considered alongside amino acid change as a key mechanism in gene evolution, and indicates that functional NMD-linked AS is more widespread and ancient than previously thought.
Abstract: Alternative splicing (AS) has the potential to greatly expand the functional repertoire of mammalian transcriptomes. However, few variant transcripts have been characterized functionally, making it difficult to assess the contribution of AS to the generation of phenotypic complexity and to study the evolution of splicing patterns. We have compared the AS of 309 protein-coding genes in the human ENCODE pilot regions against their mouse orthologs in unprecedented detail, utilizing traditional transcriptomic and RNAseq data. The conservation status of every transcript has been investigated, and each functionally categorized as coding (separated into coding sequence [CDS] or nonsense-mediated decay [NMD] linked) or noncoding. In total, 36.7% of human and 19.3% of mouse coding transcripts are species specific, and we observe a 3.6 times excess of human NMD transcripts compared with mouse; in contrast to previous studies, the majority of species-specific AS is unlinked to transposable elements. We observe one conserved CDS variant and one conserved NMD variant per 2.3 and 11.4 genes, respectively. Subsequently, we identify and characterize equivalent AS patterns for 22.9% of these CDS or NMD-linked events in nonmammalian vertebrate genomes, and our data indicate that functional NMD-linked AS is more widespread and ancient than previously thought. Furthermore, although we observe an association between conserved AS and elevated sequence conservation, as previously reported, we emphasize that 30% of conserved AS exons display sequence conservation below the average score for constitutive exons. In conclusion, we demonstrate the value of detailed comparative annotation in generating a comprehensive set of AS transcripts, increasing our understanding of AS evolution in vertebrates. 
Our data supports a model whereby the acquisition of functional AS has occurred throughout vertebrate evolution and is considered alongside amino acid change as a key mechanism in gene evolution.

76 citations


Journal Article
TL;DR: Characterization of the recurrent 8p11-12 amplicon identifies PPAPDC1B, a phosphatase protein, as a new therapeutic target in breast cancer.
m/s no. 4, vol. 27, April 2011. DOI: 10.1051/medsci/2011274009

23 citations


Journal Article
TL;DR: Cap analysis of gene expression (CAGE) has been an essential tool in transcriptome studies within several large-scale international projects, such as FANTOM3 and ENCODE, and the book edited by Piero Carninci provides an excellent guide to the CAGE method.
Center for Genomic Regulation, Universitat Pompeu Fabra, Barcelona, Catalonia, Spain. E-mail: roderic.guigo@crg.cat
Abstract: Methods to characterise transcriptomes (i.e. to identify and quantify all transcript species in a given population of cells) have, until very recently, focused either on the qualitative discovery and cataloguing of transcript species in a given cell type or condition through multiplex cDNA cloning and sequencing of normalised cDNA libraries [1] or, alternatively, on the quantification of known transcript species through DNA microarrays [2]. Recent technical developments in massively parallel sequencing seem to have provided, for the first time, the throughput required to offer both transcript discovery and quantification simultaneously. Before the advent of the new sequencing technologies, however, cap analysis of gene expression (CAGE) offered genome-wide transcript discovery and quantification. CAGE is a tagging technology in which short (20 or 27 base pair) sequences (tags) are extracted from the 5′ ends of the capped RNA molecules and concatenated, and the concatemers are sequenced. Because many tags (10 to 20) can be collated in a single sequence, even with classic Sanger sequencing, it is possible to perform high-throughput genome-wide transcriptome surveys. While CAGE does not provide information on the complete transcript structure, and therefore cannot discern between alternative transcript isoforms, it does provide quantitative information on usage of all transcription start sites (TSS), known and novel, in the cellular condition assayed. It is thus particularly appropriate to measure the genome-wide transcriptional activity of promoters. Combined with the throughput of the massively parallel sequencing instruments, which, incidentally, eliminate the need for tag concatenation, it is theoretically possible to identify and measure RNA molecules at a molecular level for each 1 to 10 cells.
Cap analysis of gene expression has been an essential tool in transcriptome studies within several large-scale international projects, such as FANTOM3 [3] and ENCODE [4]. CAGE data has enabled valuable insights into promoter architecture and evolution, the discovery of novel TSSs, biological variability in the use of alternative TSSs and the description of antisense transcription. CAGE is, indeed, one of the technologies underlying the paradigmatic shift that we are witnessing in our understanding of eukaryotic transcriptomes. From a view of the genome composed of well-defined genes isolated among large intergenic regions and mostly coding for proteins, we are transitioning to a view of the genome as a continuum of transcripts of many different classes with poorly defined boundaries, often overlapping in sense and antisense direction [4, 5]. The book edited by Piero Carninci provides an excellent guide to the CAGE method. Describing experimental procedures and bioinformatic analyses, it appeals to bench scientists and bioinformaticians alike, and even more so to the interested newcomer, as it gives examples of such applications and biological insights drawn from them. Starting with an introduction to CAGE and other technologies used for expression analyses, the book follows up with a well-commented (and established) protocol for constructing CAGE libraries and a description of paired end
DOI: 10.1002/bies.201000144