Showing papers by "Wellcome Trust Sanger Institute" published in 2002


Journal Article
27 Jun 2002-Nature
TL;DR: BRAF somatic missense mutations are reported in 66% of malignant melanomas and at lower frequency in a wide range of human cancers, with a single substitution (V599E) accounting for 80% of the mutations.
Abstract: Cancers arise owing to the accumulation of mutations in critical genes that alter normal programmes of cell proliferation, differentiation and death. As the first stage of a systematic genome-wide screen for these genes, we have prioritized for analysis signalling pathways in which at least one gene is mutated in human cancer. The RAS-RAF-MEK-ERK-MAP kinase pathway mediates cellular responses to growth signals. RAS is mutated to an oncogenic form in about 15% of human cancer. The three RAF genes code for cytoplasmic serine/threonine kinases that are regulated by binding RAS. Here we report BRAF somatic missense mutations in 66% of malignant melanomas and at lower frequency in a wide range of human cancers. All mutations are within the kinase domain, with a single substitution (V599E) accounting for 80%. Mutated BRAF proteins have elevated kinase activity and are transforming in NIH3T3 cells. Furthermore, RAS function is not required for the growth of cancer cell lines with the V599E mutation. As BRAF is a serine/threonine kinase that is commonly activated by somatic point mutation in human cancer, it may provide new therapeutic opportunities in malignant melanoma.

9,785 citations


Journal Article
Robert H. Waterston, Kerstin Lindblad-Toh, Ewan Birney, Jane Rogers, and 219 more authors (26 institutions)
05 Dec 2002-Nature
TL;DR: The results of an international collaboration to produce a high-quality draft sequence of the mouse genome are reported and an initial comparative analysis of the mouse and human genomes is presented, describing some of the insights that can be gleaned from the two sequences.
Abstract: The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.

6,643 citations


Journal Article
09 May 2002-Nature
TL;DR: The 8,667,507 base pair linear chromosome of Streptomyces coelicolor is reported, containing the largest number of genes so far discovered in a bacterium.
Abstract: Streptomyces coelicolor is a representative of the group of soil-dwelling, filamentous bacteria responsible for producing most natural antibiotics used in human and veterinary medicine. Here we report the 8,667,507 base pair linear chromosome of this organism, containing the largest number of genes so far discovered in a bacterium. The 7,825 predicted genes include more than 20 clusters coding for known or predicted secondary metabolites. The genome contains an unprecedented proportion of regulatory genes, predominantly those likely to be involved in responses to external stimuli and stresses, and many duplicated gene sets that may represent 'tissue-specific' isoforms operating in different phases of colonial development, a unique situation for a bacterium. An ancient synteny was revealed between the central 'core' of the chromosome and the whole chromosome of pathogens Mycobacterium tuberculosis and Corynebacterium diphtheriae. The genome sequence will greatly increase our understanding of microbial life in the soil as well as aiding the generation of new drug candidates by genetic engineering.

3,077 citations


Journal Article
Robert A. Holt, G. Mani Subramanian, Aaron L. Halpern, Granger G. Sutton, Rosane Charlab, Deborah R. Nusskern, Patrick Wincker, Andrew G. Clark, José M. C. Ribeiro, Ron Wides, Steven L. Salzberg, Brendan J. Loftus, Mark Yandell, William H. Majoros, Douglas B. Rusch, Zhongwu Lai, Cheryl L. Kraft, Josep F. Abril, Véronique Anthouard, Peter Arensburger, Peter W. Atkinson, Holly Baden, Véronique de Berardinis, Danita Baldwin, Vladimir Benes, Jim Biedler, Claudia Blass, Randall Bolanos, Didier Boscus, Mary Barnstead, Shuang Cai, Kabir Chatuverdi, George K. Christophides, Mathew A. Chrystal, Michele Clamp, Anibal Cravchik, Val Curwen, Ali N Dana, Arthur L. Delcher, Ian M. Dew, Cheryl A. Evans, Michael Flanigan, Anne Grundschober-Freimoser, Lisa Friedli, Zhiping Gu, Ping Guan, Roderic Guigó, Maureen E. Hillenmeyer, Susanne L. Hladun, James R. Hogan, Young S. Hong, Jeffrey Hoover, Olivier Jaillon, Zhaoxi Ke, Chinnappa D. Kodira, Kokoza Eb, Anastasios C. Koutsos, Ivica Letunic, Alex Levitsky, Yong Liang, Jhy-Jhu Lin, Neil F. Lobo, John Lopez, Joel A. Malek, Tina C. McIntosh, Stephan Meister, Jason R. Miller, Clark M. Mobarry, Emmanuel Mongin, Sean D. Murphy, David A. O'Brochta, Cynthia Pfannkoch, Rong Qi, Megan A. Regier, Karin A. Remington, Hongguang Shao, Maria V. Sharakhova, Cynthia Sitter, Jyoti Shetty, Thomas J. Smith, Renee Strong, Jingtao Sun, Dana Thomasova, Lucas Q. Ton, Pantelis Topalis, Zhijian Tu, Maria F. Unger, Brian P. Walenz, Aihui Wang, Jian Wang, Mei Wang, X. Wang, Kerry J. Woodford, Jennifer R. Wortman, Martin Wu, Alison Yao, Evgeny M. Zdobnov, Hongyu Zhang, Qi Zhao, Shaying Zhao, Shiaoping C. Zhu, Igor F. Zhimulev, Mario Coluzzi, Alessandra della Torre, Charles Roth, Christos Louis, Francis Kalush, Richard J. Mural, Eugene W. Myers, Mark Raymond Adams, Hamilton O. Smith, Samuel Broder, Malcolm J. Gardner, Claire M. Fraser, Ewan Birney, Peer Bork, Paul T. Brey, J. Craig Venter, Jean Weissenbach, Fotis C. Kafatos, Frank H. Collins, Stephen L. Hoffman
04 Oct 2002-Science
TL;DR: Analysis of the PEST strain of A. gambiae revealed strong evidence for about 14,000 protein-encoding transcripts, and prominent expansions in specific families of proteins likely involved in cell adhesion and immunity were noted.
Abstract: Anopheles gambiae is the principal vector of malaria, a disease that afflicts more than 500 million people and causes more than 1 million deaths each year. Tenfold shotgun sequence coverage was obtained from the PEST strain of A. gambiae and assembled into scaffolds that span 278 million base pairs. A total of 91% of the genome was organized in 303 scaffolds; the largest scaffold was 23.1 million base pairs. There was substantial genetic variation within this strain, and the apparent existence of two haplotypes of approximately equal frequency ("dual haplotypes") in a substantial fraction of the genome likely reflects the outbred nature of the PEST strain. The sequence produced a conservative inference of more than 400,000 single-nucleotide polymorphisms that showed a markedly bimodal density distribution. Analysis of the genome sequence revealed strong evidence for about 14,000 protein-encoding transcripts. Prominent expansions in specific families of proteins likely involved in cell adhesion and immunity were noted. An expressed sequence tag analysis of genes regulated by blood feeding provided insights into the physiological adaptations of a hematophagous insect.

2,033 citations


Journal Article
TL;DR: The overall architecture of the Bioperl toolkit is described, the problem domains that it addresses, and specific examples of how the toolkit can be used to solve common life-sciences problems are given.
Abstract: The Bioperl project is an international open-source collaboration of biologists, bioinformaticians, and computer scientists that has evolved over the past 7 yr into the most comprehensive library of Perl modules available for managing and manipulating life-science information. Bioperl provides an easy-to-use, stable, and consistent programming interface for bioinformatics application programmers. The Bioperl modules have been successfully and repeatedly used to reduce otherwise complex tasks to only a few lines of code. The Bioperl object model has been proven to be flexible enough to support enterprise-level applications such as EnsEMBL, while maintaining an easy learning curve for novice Perl programmers. Bioperl is capable of executing analyses and processing results from programs such as BLAST, ClustalW, or the EMBOSS suite. Interoperation with modules written in Python and Java is supported through the evolving BioCORBA bridge. Bioperl provides access to data stores such as GenBank and SwissProt via a flexible series of sequence input/output modules, and to the emerging common sequence data storage format of the Open Bioinformatics Database Access project. This study describes the overall architecture of the toolkit, the problem domains that it addresses, and gives specific examples of how the toolkit can be used to solve common life-sciences problems. We conclude with a discussion of how the open-source nature of the project has contributed to the development effort.

1,694 citations


Journal Article
Valerie Wood, R. Gwilliam, Marie-Adèle Rajandream, M. Lyne, Rachel Lyne, A. Stewart, J. Sgouros, N. Peat, Jacqueline Hayles, Stephen Baker, D. Basham, Sharen Bowman, Karen Brooks, D. Brown, Steve D.M. Brown, Tracey Chillingworth, Carol Churcher, Mark O. Collins, R. Connor, Ann Cronin, P. Davis, Theresa Feltwell, Andrew G. Fraser, S. Gentles, Arlette Goble, N. Hamlin, David Harris, J. Hidalgo, Geoffrey M. Hodgson, S. Holroyd, T. Hornsby, S. Howarth, Elizabeth J. Huckle, Sarah E. Hunt, Kay Jagels, Kylie R. James, L. Jones, Matthew Jones, S. Leather, S. McDonald, J. McLean, P. Mooney, Sharon Moule, Karen Mungall, Lee Murphy, D. Niblett, C. Odell, Karen Oliver, Susan O'Neil, D. Pearson, Michael A. Quail, Ester Rabbinowitsch, Kim Rutherford, Simon Rutter, David L. Saunders, Kathy Seeger, Sarah Sharp, Jason Skelton, Mark Simmonds, R. Squares, S. Squares, K. Stevens, K. Taylor, Ruth Taylor, Adrian Tivey, S. Walsh, T. Warren, S. Whitehead, John Woodward, Guido Volckaert, Rita Aert, Johan Robben, B. Grymonprez, I. Weltjens, E. Vanstreels, Michael A. Rieger, M. Schafer, S. Muller-Auer, C. Gabel, M. Fuchs, C. Fritzc, E. Holzer, D. Moestl, H. Hilbert, K. Borzym, I. Langer, Alfred Beck, Hans Lehrach, Richard Reinhardt, Thomas M. Pohl, P. Eger, Wolfgang Zimmermann, H. Wedler, R. Wambutt, Bénédicte Purnelle, André Goffeau, Edouard Cadieu, Stéphane Dréano, Stéphanie Gloux, Valerie Lelaure, Stéphanie Mottier, Francis Galibert, Stephen J. Aves, Z. Xiang, Cherryl Hunt, Karen Moore, S. M. Hurst, M. Lucas, M. Rochet, Claude Gaillardin, Victor A. Tallada, Andrés Garzón, G. Thode, Rafael R. Daga, L. Cruzado, Juan Jimenez, Miguel del Nogal Sánchez, F. del Rey, J. Benito, Angel Domínguez, José L. Revuelta, Sergio Moreno, John Armstrong, Susan L. Forsburg, L. Cerrutti, Todd M. Lowe, W. R. McCombie, Ian T. Paulsen, Judith A. Potashkin, G. V. Shpakovski, David W. Ussery, Bart Barrell, Paul Nurse
21 Feb 2002-Nature
TL;DR: The genome of fission yeast (Schizosaccharomyces pombe), which contains the smallest number of protein-coding genes yet recorded for a eukaryote, is sequenced and highly conserved genes important for eukaryotic cell organization including those required for the cytoskeleton, compartmentation, cell-cycle control, proteolysis, protein phosphorylation and RNA splicing are identified.
Abstract: We have sequenced and annotated the genome of fission yeast (Schizosaccharomyces pombe), which contains the smallest number of protein-coding genes yet recorded for a eukaryote: 4,824. The centromeres are between 35 and 110 kilobases (kb) and contain related repeats including a highly conserved 1.8-kb element. Regions upstream of genes are longer than in budding yeast (Saccharomyces cerevisiae), possibly reflecting more-extended control regions. Some 43% of the genes contain introns, of which there are 4,730. Fifty genes have significant similarity with human disease genes; half of these are cancer related. We identify highly conserved genes important for eukaryotic cell organization including those required for the cytoskeleton, compartmentation, cell-cycle control, proteolysis, protein phosphorylation and RNA splicing. These genes may have originated with the appearance of eukaryotic life. Few similarly conserved genes that are important for multicellular organization were identified, suggesting that the transition from prokaryotes to eukaryotes required more new genes than did the transition from unicellular to multicellular organization.

1,686 citations


Journal Article
Yasushi Okazaki, Masaaki Furuno, Takeya Kasukawa, Jun Adachi, Hidemasa Bono, S. Kondo, Itoshi Nikaido, Naoki Osato, Rintaro Saito, Harukazu Suzuki, Itaru Yamanaka, H. Kiyosawa, Ken Yagi, Yasuhiro Tomaru, Yuki Hasegawa, A. Nogami, Christian Schönbach, Takashi Gojobori, Richard M. Baldarelli, David P. Hill, Carol J. Bult, David A. Hume, John Quackenbush, Lynn M. Schriml, Alexander Kanapin, Hideo Matsuda, Serge Batalov, Kirk W. Beisel, Judith A. Blake, Dirck W. Bradt, Vladimir Brusic, Cyrus Chothia, Lori E. Corbani, S. Cousins, Emiliano Dalla, Tommaso A. Dragani, Colin F. Fletcher, Alistair R. R. Forrest, K. S. Frazer, Terry Gaasterland, Manuela Gariboldi, Carmela Gissi, Adam Godzik, Julian Gough, Sean M. Grimmond, Stefano Gustincich, Nobutaka Hirokawa, Ian J. Jackson, Erich D. Jarvis, Akio Kanai, Hideya Kawaji, Yuka Imamura Kawasawa, Rafal M. Kedzierski, Benjamin L. King, Akihiko Konagaya, Igor V. Kurochkin, Yong-Hwan Lee, Boris Lenhard, Paul A. Lyons, Donna Maglott, Lois J. Maltais, Luigi Marchionni, Louise M. McKenzie, Harukata Miki, Takeshi Nagashima, Koji Numata, Toshihisa Okido, William J. Pavan, Geo Pertea, Graziano Pesole, Nikolai Petrovsky, Ramesh S. Pillai, Joan Pontius, D. Qi, Sridhar Ramachandran, Timothy Ravasi, Jonathan C. Reed, Deborah J Reed, Jeffrey G. Reid, Brian Z. Ring, M. Ringwald, Albin Sandelin, Claudio Schneider, Colin A. Semple, Mitsutoshi Setou, K. Shimada, Razvan Sultana, Yoichi Takenaka, Martin S. Taylor, Rohan D. Teasdale, Masaru Tomita, Roberto Verardo, Lukas Wagner, Claes Wahlestedt, Y. Wang, Yoshiki Watanabe, Christine A. Wells, Laurens G. Wilming, Anthony Wynshaw-Boris, Masashi Yanagisawa, Ivana V. Yang, L. Yang, Zheng Yuan, Mihaela Zavolan, Yunhui Zhu, Anne M. Zimmer, Piero Carninci, N. Hayatsu, Tomoko Hirozane-Kishikawa, Hideaki Konno, M. Nakamura, Naoko Sakazume, K. Sato, Toshiyuki Shiraki, Kazunori Waki, Jun Kawai, Katsunori Aizawa, Takahiro Arakawa, S. Fukuda, A. Hara, W. Hashizume, K. Imotani, Y. Ishii, Masayoshi Itoh, Ikuko Kagawa, A. Miyazaki, K. Sakai, D. Sasaki, K. Shibata, Akira Shinagawa, Ayako Yasunishi, Masayasu Yoshino, Robert H. Waterston, Eric S. Lander, Jane Rogers, Ewan Birney, Yoshihide Hayashizaki
05 Dec 2002-Nature
TL;DR: The present work, completely supported by physical clones, provides the most comprehensive survey of a mammalian transcriptome so far, and is a valuable resource for functional genomics.
Abstract: Only a small proportion of the mouse genome is transcribed into mature messenger RNA transcripts. There is an international collaborative effort to identify all full-length mRNA transcripts from the mouse, and to ensure that each is represented in a physical collection of clones. Here we report the manual annotation of 60,770 full-length mouse complementary DNA sequences. These are clustered into 33,409 'transcriptional units', contributing 90.1% of a newly established mouse transcriptome database. Of these transcriptional units, 4,258 are new protein-coding and 11,665 are new non-coding messages, indicating that non-coding RNA is a major component of the transcriptome. 41% of all transcriptional units showed evidence of alternative splicing. In protein-coding transcripts, 79% of splice variations altered the protein product. Whole-transcriptome analyses resulted in the identification of 2,431 sense-antisense pairs. The present work, completely supported by physical clones, provides the most comprehensive survey of a mammalian transcriptome so far, and is a valuable resource for functional genomics.

1,663 citations


Journal Article
TL;DR: The Ensembl database project provides a bioinformatics framework to organise biology around the sequences of large genomes and is a comprehensive source of stable automatic annotation of the human genome sequence, with confirmed gene predictions that have been integrated with external data sources.
Abstract: The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organise biology around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of the human genome sequence, with confirmed gene predictions that have been integrated with external data sources, and is available as either an interactive web site or as flat files. It is also an open source software engineering project to develop a portable system able to handle very large genomes and associated requirements from sequence analysis to data storage and visualisation. The Ensembl site is one of the leading sources of human genome sequence annotation and provided much of the analysis for publication by the international human genome project of the draft genome. The Ensembl system is being installed around the world in both companies and academic sites on machines ranging from supercomputers to laptops.

1,540 citations


Journal Article
TL;DR: Three BRAF mutations identified in this study are novel, altering residues important in AKT-mediated BRAF phosphorylation and suggesting that disruption of AKT-induced BRAF inhibition can play a role in malignant transformation; to the authors' knowledge, this is the first report of mutations documenting this interaction in human cancers.
Abstract: BRAF encodes a RAS-regulated kinase that mediates cell growth and malignant transformation kinase pathway activation. Recently, we have identified activating BRAF mutations in 66% of melanomas and a smaller percentage of many other human cancers. To determine whether BRAF mutations account for the MAP kinase pathway activation common in non-small cell lung carcinomas (NSCLCs) and to extend the initial findings in melanoma, we screened DNA from 179 NSCLCs and 35 melanomas for BRAF mutations (exons 11 and 15). We identified BRAF mutations in 5 NSCLCs (3%; one V599 and four non-V599) and 22 melanomas (63%; 21 V599 and 1 non-V599). Three BRAF mutations identified in this study are novel, altering residues important in AKT-mediated BRAF phosphorylation and suggesting that disruption of AKT-induced BRAF inhibition can play a role in malignant transformation. To our knowledge, this is the first report of mutations documenting this interaction in human cancers. Although >90% of BRAF mutations in melanoma involve codon 599 (57 of 60), 8 of 9 BRAF mutations reported to date in NSCLC are non-V599 (89%; P < 10⁻⁷), strongly suggesting that BRAF mutations in NSCLC are qualitatively different from those in melanoma; thus, there may be therapeutic differences between lung cancer and melanoma in response to RAF inhibitors. Although uncommon, BRAF mutations in human lung cancers may identify a subset of tumors sensitive to targeted therapy.

1,097 citations


Journal Article
03 Oct 2002-Nature
TL;DR: A large-scale, high-accuracy mass spectrometric proteome analysis of selected stages of the human malaria parasite Plasmodium falciparum revealed 1,289 proteins, a subset of which contain domains that indicate a role in cell-cell interactions and can therefore be evaluated as potential components of a malaria vaccine formulation.
Abstract: The annotated genomes of organisms define a 'blueprint' of their possible gene products. Post-genome analyses attempt to confirm and modify the annotation and impose a sense of the spatial, temporal and developmental usage of genetic information by the organism. Here we describe a large-scale, high-accuracy (average deviation less than 0.02 Da at 1,000 Da) mass spectrometric proteome analysis of selected stages of the human malaria parasite Plasmodium falciparum. The analysis revealed 1,289 proteins of which 714 proteins were identified in asexual blood stages, 931 in gametocytes and 645 in gametes. The last two groups provide insights into the biology of the sexual stages of the parasite, and include conserved, stage-specific, secreted and membrane-associated proteins. A subset of these proteins contain domains that indicate a role in cell-cell interactions, and therefore can be evaluated as potential components of a malaria vaccine formulation. We also report a set of peptides with significant matches in the parasite genome but not in the protein set predicted by computational methods.

667 citations


Journal Article
TL;DR: Comparison with the meiotic program of the distantly related Saccharomyces cerevisiae reveals an unexpectedly small shared meiotic transcriptome, suggesting that the transcriptional regulation of meiosis evolved independently in both species.
Abstract: Sexual reproduction requires meiosis to produce haploid gametes, which in turn can fuse to regenerate a diploid organism. We have studied the transcriptional program that drives this developmental process in Schizosaccharomyces pombe using DNA microarrays. Here we show that hundreds of genes are regulated in successive waves of transcription that correlate with major biological events of meiosis and sporulation. Each wave is associated with specific promoter motifs. Clusters of neighboring genes (mostly close to telomeres) are co-expressed early in the process, which reflects a more global control of these genes. We find that two Atf-like transcription factors are essential for the expression of late genes and formation of spores, and identify dozens of potential Atf target genes. Comparison with the meiotic program of the distantly related Saccharomyces cerevisiae reveals an unexpectedly small shared meiotic transcriptome, suggesting that the transcriptional regulation of meiosis evolved independently in both species.

Journal Article
TL;DR: FlyBase biologists successfully used Apollo to annotate the Drosophila melanogaster genome and it is increasingly being used as a starting point for the development of customized annotation editing tools for other genome projects.
Abstract: The well-established inaccuracy of purely computational methods for annotating genome sequences necessitates an interactive tool to allow biological experts to refine these approximations by viewing and independently evaluating the data supporting each annotation. Apollo was developed to meet this need, enabling curators to inspect genome annotations closely and edit them. FlyBase biologists successfully used Apollo to annotate the Drosophila melanogaster genome and it is increasingly being used as a starting point for the development of customized annotation editing tools for other genome projects.

Journal Article
01 Aug 2002-Nature
TL;DR: This study demonstrates the feasibility of developing genome-wide maps of LD and shows a strong correlation between high LD and low recombination frequency in the extant genetic map, suggesting that historical and contemporary recombination rates are similar.
Abstract: DNA sequence variants in specific genes or regions of the human genome are responsible for a variety of phenotypes such as disease risk or variable drug response. These variants can be investigated directly, or through their non-random associations with neighbouring markers (called linkage disequilibrium (LD)). Here we report measurement of LD along the complete sequence of human chromosome 22. Duplicate genotyping and analysis of 1,504 markers in Centre d'Etude du Polymorphisme Humain (CEPH) reference families at a median spacing of 15 kilobases (kb) reveals a highly variable pattern of LD along the chromosome, in which extensive regions of nearly complete LD up to 804 kb in length are interspersed with regions of little or no detectable LD. The LD patterns are replicated in a panel of unrelated UK Caucasians. There is a strong correlation between high LD and low recombination frequency in the extant genetic map, suggesting that historical and contemporary recombination rates are similar. This study demonstrates the feasibility of developing genome-wide maps of LD.
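
The non-random association between two neighbouring markers can be illustrated with a short calculation. The sketch below is not taken from the paper, and the listing does not say which LD statistic the authors used; it assumes phased haplotype counts for two biallelic markers and computes the standard coefficients D, D' and r². The function name `ld_stats` and the example counts are hypothetical.

```python
def ld_stats(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, D', r^2) from counts of the four two-marker haplotypes.

    A/a are the alleles at the first marker, B/b at the second.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes supplied")
    p_AB = n_AB / total                  # frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / total          # allele frequency of A
    p_B = (n_AB + n_aB) / total          # allele frequency of B
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("both markers must be polymorphic")
    d = p_AB - p_A * p_B                 # raw disequilibrium coefficient
    # D' scales |D| by the largest value it could take given the allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max
    r2 = d * d / denom                   # squared correlation between the markers
    return d, d_prime, r2


if __name__ == "__main__":
    # Hypothetical counts: only two of the four haplotypes are observed.
    print(ld_stats(50, 0, 0, 50))
```

With counts of 50, 0, 0, 50 the two markers are perfectly correlated, so both D' and r² equal 1; with all four haplotypes equally frequent, all three values are 0.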

Journal Article
TL;DR: The data suggest that BRAF mutations are, to some extent, biologically similar to RAS mutations in colorectal cancer because both occur at approximately the same stage of the adenoma-carcinoma sequence, both are associated with villous morphology, and both are less common in adenomas from FAP cases.
Abstract: Activation of the RAS/RAF/extracellular signal-regulated kinase-mitogen-activated protein kinase/extracellular signal-regulated kinase/mitogen-activated protein kinase pathway by RAS mutations is commonly found in human cancers. Recently, we reported that mutation of BRAF provides an alternative route for activation of this signaling pathway and can be found in melanomas, colorectal cancers, and ovarian tumors. Here we perform an extensive characterization of BRAF mutations in a large series of colorectal tumors in various stages of neoplastic transformation. BRAF mutations were found in 11 of 215 (5.1%) colorectal adenocarcinomas, 3 of 108 (2.8%) sporadic adenomas, 1 of 63 (1.6%) adenomas from familial adenomatous polyposis (FAP) patients, and 1 of 3 (33%) hyperplastic polyps. KRAS mutations were detected in 34% of carcinomas, 31% of sporadic adenomas, 9% of FAP adenomas, and no hyperplastic polyps. Eight of 16 BRAF mutations were V599E, the previously described hotspot, and none of these was associated with a KRAS mutation in the same lesion. The remaining eight mutations involve other conserved amino acids in the kinase domain, and 62.5% have a KRAS mutation in the same tumor. Our data suggest that BRAF mutations are, to some extent, biologically similar to RAS mutations in colorectal cancer because both occur at approximately the same stage of the adenoma-carcinoma sequence, both are associated with villous morphology, and both are less common in adenomas from FAP cases. By contrast, colorectal adenocarcinomas with BRAF mutations are associated with early Dukes' tumor stages (P = 0.006) and no such relationship was observed for KRAS mutations. The presence in some colorectal neoplasms of mutations in both BRAF and KRAS suggests that modulation of the RAS-RAF-extracellular signal-regulated kinase-mitogen-activated protein kinase/extracellular signal-regulated kinase/mitogen-activated protein kinase signaling pathway may occur by mutation of multiple components.

Journal Article
15 Aug 2002-Nature
TL;DR: A physical map of the mouse genome that contains 296 contigs of overlapping bacterial clones and 16,992 unique markers is constructed, enabling identification of a mouse clone that corresponds to almost any position in the human genome.
Abstract: A physical map of a genome is an essential guide for navigation, allowing the location of any gene or other landmark in the chromosomal DNA. We have constructed a physical map of the mouse genome that contains 296 contigs of overlapping bacterial clones and 16,992 unique markers. The mouse contigs were aligned to the human genome sequence on the basis of 51,486 homology matches, thus enabling use of the conserved synteny (correspondence between chromosome blocks) of the two genomes to accelerate construction of the mouse map. The map provides a framework for assembly of whole-genome shotgun sequence data, and a tile path of clones for generation of the reference sequence. Definition of the human-mouse alignment at this level of resolution enables identification of a mouse clone that corresponds to almost any position in the human genome. The human sequence may be used to facilitate construction of other mammalian genome maps using the same strategy.

Journal Article
TL;DR: The results are best explained by extreme variability in the recombination rate at a fine scale, and provide the first empirical evidence that such recombination 'hot spots' are a general feature of the human genome and have a principal role in shaping genetic variation in the human population.
Abstract: Variation in the human genome sequence is key to understanding susceptibility to disease in modern populations and the history of ancestral populations. Unlocking this information requires knowledge of the patterns and underlying causes of human sequence diversity. By applying a new population-genetic framework to two genome-wide polymorphism surveys, we find that the human genome contains sizeable regions (stretching over tens of thousands of base pairs) that have intrinsically high and low rates of sequence variation. We show that the primary determinant of these patterns is shared genealogical history. Only a fraction of the variation (at most 25%) is due to the local mutation rate. By measuring the average distance over which genealogical histories are typically preserved, these data provide the first genome-wide estimate of the average extent of correlation among variants (linkage disequilibrium). The results are best explained by extreme variability in the recombination rate at a fine scale, and provide the first empirical evidence that such recombination 'hot spots' are a general feature of the human genome and have a principal role in shaping genetic variation in the human population.

Journal Article
TL;DR: It is demonstrated that the LGI1 protein, which contains several leucine-rich repeats, is expressed ubiquitously in the neuronal cell compartment of the brain and provides evidence for genetic heterogeneity within this disorder.
Abstract: Autosomal dominant lateral temporal epilepsy (EPT; OMIM 600512) is a form of epilepsy characterized by partial seizures, usually preceded by auditory signs. The gene for this disorder has been mapped by linkage studies to chromosomal region 10q24. Here we show that mutations in the LGI1 gene segregate with EPT in two families affected by this disorder. Both mutations introduce premature stop codons and thus prevent the production of the full-length protein from the affected allele. By immunohistochemical studies, we demonstrate that the LGI1 protein, which contains several leucine-rich repeats, is expressed ubiquitously in the neuronal cell compartment of the brain. Moreover, we provide evidence for genetic heterogeneity within this disorder, since several other families with a phenotype consistent with this type of epilepsy lack mutations in the LGI1 gene.

Journal Article
TL;DR: The entire 127,923-bp sequence of the toxin-encoding plasmid pBtoxis from Bacillus thuringiensis subsp. israelensis is presented and analyzed.
Abstract: The entire 127,923-bp sequence of the toxin-encoding plasmid pBtoxis from Bacillus thuringiensis subsp. israelensis is presented and analyzed. In addition to the four known Cry and two known Cyt toxins, a third Cyt-type sequence was found with an additional C-terminal domain previously unseen in such proteins. Many plasmid-encoded genes could be involved in several functions other than toxin production. The most striking of these are several genes potentially affecting host sporulation and germination and a set of genes for the production and export of a peptide antibiotic.

Journal ArticleDOI
TL;DR: How the SSD appears to function as a regulatory domain involved in linking vesicle trafficking and protein localization with such varied processes as cholesterol homeostasis, cell signalling and cytokinesis is discussed.

Journal ArticleDOI
TL;DR: QuickTree is a fast implementation of the popular Neighbor-Joining tree-building algorithm that allows the reconstruction of phylogenies for very large protein families that would be infeasible using other popular methods.
Abstract: We have written a fast implementation of the popular Neighbor-Joining tree building algorithm. QuickTree allows the reconstruction of phylogenies for very large protein families (including the largest Pfam alignment containing 27000 HIV GP120 glycoprotein sequences) that would be infeasible using other popular methods.
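The abstract names the Neighbor-Joining algorithm but does not describe it. As an illustration only, here is a minimal textbook sketch of Neighbor-Joining in Python. It is not QuickTree's code and has none of its speed optimizations. The function name `neighbor_joining`, the internal labels `node0`, `node1`, ... and the five-taxon distance table are assumptions made up for this example.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Plain O(n^3) Neighbor-Joining (illustrative sketch, not QuickTree).

    names: list of taxon labels.
    dist:  dict mapping frozenset({a, b}) -> distance between a and b.
    Returns a list of (node, neighbour, branch_length) edges.
    """
    d = dict(dist)          # work on a copy so the caller's table is untouched
    nodes = list(names)
    edges = []
    next_id = 0             # counter for hypothetical internal node labels

    while len(nodes) > 2:
        n = len(nodes)
        # Row sums: total distance from each node to all other active nodes.
        r = {a: sum(d[frozenset((a, b))] for b in nodes if b != a) for a in nodes}
        # Pick the pair minimising Q(i, j) = (n - 2) * d(i, j) - r(i) - r(j).
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        dij = d[frozenset((i, j))]
        # Branch lengths from i and j to the new internal node u.
        li = 0.5 * dij + (r[i] - r[j]) / (2 * (n - 2))
        lj = dij - li
        u = f"node{next_id}"
        next_id += 1
        edges += [(i, u, li), (j, u, lj)]
        # Distances from u to every remaining node.
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = 0.5 * (
                    d[frozenset((i, k))] + d[frozenset((j, k))] - dij
                )
        nodes = [k for k in nodes if k not in (i, j)] + [u]

    # Join the last two nodes with their remaining distance.
    a, b = nodes
    edges.append((a, b, d[frozenset((a, b))]))
    return edges


# Made-up five-taxon example (distances are illustrative, not from the paper).
example_names = ["a", "b", "c", "d", "e"]
_raw = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
}
example_dist = {frozenset(pair): value for pair, value in _raw.items()}

for child, parent, length in neighbor_joining(example_names, example_dist):
    print(f"{child} -- {parent}: {length}")
```

Each round recomputes all row sums and scans every pair, so the sketch is cubic in the number of sequences. It shows only the basic join-and-update step, which a fast implementation has to carry out efficiently on very large alignments.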

Journal ArticleDOI
TL;DR: It is concluded that a signal resembling the well known TATA box, together with flanking regions of C-G enrichment, are the most important sequence-based signals marking sites of transcriptional initiation at a large class of typical promoters.
Abstract: Transcription, the process whereby RNA copies are made from sections of the DNA genome, is directed by promoter regions. These define the transcription start site, and also the set of cellular conditions under which the promoter is active. At least in more complex species, it appears to be common for genes to have several different transcription start sites, which may be active under different conditions. Eukaryotic promoters are complex and fairly diffuse structures, which have proven hard to detect in silico. We show that a novel hybrid machine-learning method is able to build useful models of promoters for >50% of human transcription start sites. We estimate specificity to be >70%, and demonstrate good positional accuracy. Based on the structure of our learned models, we conclude that a signal resembling the well known TATA box, together with flanking regions of C-G enrichment, are the most important sequence-based signals marking sites of transcriptional initiation at a large class of typical promoters.

Journal ArticleDOI
03 Oct 2002-Nature
TL;DR: The sequence of chromosomes 1, 3–9 and 13 of P. falciparum clone 3D7 is reported—these chromosomes account for approximately 55% of the total genome, and a highly conserved sequence element is identified in the intergenic region of internal var genes that is not associated with their telomeric counterparts.
Abstract: Since the sequencing of the first two chromosomes of the malaria parasite, Plasmodium falciparum, there has been a concerted effort to sequence and assemble the entire genome of this organism. Here we report the sequence of chromosomes 1, 3-9 and 13 of P. falciparum clone 3D7--these chromosomes account for approximately 55% of the total genome. We describe the methods used to map, sequence and annotate these chromosomes. By comparing our assemblies with the optical map, we indicate the completeness of the resulting sequence. During annotation, we assign Gene Ontology terms to the predicted gene products, and observe clustering of some malaria-specific terms to specific chromosomes. We identify a highly conserved sequence element found in the intergenic region of internal var genes that is not associated with their telomeric counterparts.

Journal ArticleDOI
05 Dec 2002-Neuron
TL;DR: Pcdh-gamma genes are dispensable for at least some aspects of connectivity but required for survival of specific neuronal types, as shown in mutant mice lacking all 22 Pcdh-gamma genes.

Journal ArticleDOI
TL;DR: This work describes the previously uncharacterized PASTA domain and infers that it binds β-lactam antibiotics and their peptidoglycan analogues, and postulates that PknB-like kinases are key regulators of cell-wall biosynthesis.

Journal ArticleDOI
04 Jul 2002-Nature
TL;DR: A significant number of the genes show higher similarities to genes of vertebrates than to those of other fully sequenced eukaryotes, which strengthens the view that the evolutionary position of D. discoideum is located before the branching of metazoa and fungi but after the divergence of the plant kingdom.
Abstract: The genome of the lower eukaryote Dictyostelium discoideum comprises six chromosomes. Here we report the sequence of the largest, chromosome 2, which at 8 megabases (Mb) represents about 25% of the genome. Despite an A + T content of nearly 80%, the chromosome codes for 2,799 predicted protein coding genes and 73 transfer RNA genes. This gene density, about 1 gene per 2.6 kilobases (kb), is surpassed only by Saccharomyces cerevisiae (one per 2 kb) and is similar to that of Schizosaccharomyces pombe (one per 2.5 kb). If we assume that the other chromosomes have a similar gene density, we can expect around 11,000 genes in the D. discoideum genome. A significant number of the genes show higher similarities to genes of vertebrates than to those of other fully sequenced eukaryotes. This analysis strengthens the view that the evolutionary position of D. discoideum is located before the branching of metazoa and fungi but after the divergence of the plant kingdom, placing it close to the base of metazoan evolution.
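The genome-wide gene estimate in the abstract follows from a simple proportional extrapolation. The short sketch below reproduces that arithmetic from the two figures given there: 2,799 predicted protein-coding genes on a chromosome representing about 25% of the genome. The function name `extrapolate_gene_count` is a hypothetical helper, and the calculation makes the same assumption the abstract does, namely that the other chromosomes have a similar gene density.

```python
def extrapolate_gene_count(genes_on_chromosome: int, genome_fraction: float) -> float:
    """Scale a per-chromosome gene count up to the whole genome.

    Assumes uniform gene density across chromosomes, as the abstract does.
    """
    if not 0 < genome_fraction <= 1:
        raise ValueError("genome_fraction must be in the interval (0, 1]")
    return genes_on_chromosome / genome_fraction


# Figures taken from the abstract: 2,799 genes on ~25% of the genome.
estimate = extrapolate_gene_count(2799, 0.25)
print(f"Extrapolated protein-coding genes: {estimate:.0f}")
```

The result is consistent with the abstract's "around 11,000 genes". It is only as reliable as the approximate 25% figure and the uniform-density assumption.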

Journal ArticleDOI
TL;DR: A microarray-based CGH method is demonstrated that allows reliable detection of chromosomal deletions and amplifications with high resolution and is an obvious solution to all of the limitations of conventional CGH.
Abstract: Chromosomal imbalances such as deletions and amplifications are common rearrangements in most tumors. Specific rearrangements are consistently associated with specific tumor types or stages, implicating the role of the genes in a region of chromosomal imbalance in tumor initiation and progression. The development of comparative genomic hybridization (CGH) has obviated the need to obtain metaphase spreads from tumors, so that the chromosomal imbalances in many solid tumors may be revealed using an extracted genomic DNA sample. However, the resolution of the cytogenetic method remains limited, and the extreme technical difficulty of CGH has restricted its use. Conceptually, DNA microarray-based CGH is an obvious solution to all of the limitations of conventional CGH. Although arrays have been used for CGH studies, their success has been limited by poor specific signal-to-noise ratios. Here we demonstrate a microarray-based CGH method that allows reliable detection of chromosomal deletions and amplifications with high resolution. Our microarray system is fundamentally different from most current microarray technologies in that activated DNA is printed on natural glass surfaces while other systems almost exclusively focus on activating the surfaces, a strategy that invariably introduces hybridization backgrounds. The concept of using pre-modification may be generally applied for making arrays of other biological materials, as modifying the substrates will be more controllable in solution than on surfaces.

Journal ArticleDOI
TL;DR: Findings argue for dual roles for CT/CGRP gene products: prevention of bone resorption in hypercalcemic states and a regulatory role in bone formation.
Abstract: Calcitonin (CT) is a known inhibitor of bone resorption. Calcitonin gene-related peptide-alpha (CGRPalpha), produced by alternative RNA processing of the CT/CGRP gene, has no clearly defined role in bone. To better understand the physiologic role of the CT/CGRP gene we created a mouse in which the coding sequences for both CT and CGRPalpha were deleted by homologous recombination. The CT/CGRP(-/-) knockout (KO) mice procreated normally, there were no identifiable developmental defects at birth, and they had normal baseline calcium-related chemistry values. However, KO animals were more responsive to exogenous human parathyroid hormone as evidenced by a greater increase of the serum calcium concentration and urine deoxypyridinoline crosslinks, an effect reversed by CT and mediated by a greater increase in bone resorption than in controls. Surprisingly, KO mice have significantly greater trabecular bone volume and a 1.5- to 2-fold increase in bone formation at 1 and 3 months of age. This effect appears to be mediated by increased bone formation. In addition, KO mice maintain bone mass following ovariectomy, whereas wild-type mice lose approximately one-third of their bone mass over 2 months. These findings argue for dual roles for CT/CGRP gene products: prevention of bone resorption in hypercalcemic states and a regulatory role in bone formation.

Journal ArticleDOI
TL;DR: Analysis of the genomic architecture of protocadherin (Pcdh) gene clusters provides evidence that the transcription of individual Pcdh-gamma genes is under the control of a distinct but related promoter upstream of each Pcdh-gamma variable exon, and posttranscriptional processing of each Pcdh-gamma transcript is predominantly mediated through cis-alternative splicing.
Abstract: The genomic architecture of protocadherin (Pcdh) gene clusters is remarkably similar to that of the immunoglobulin and T cell receptor gene clusters, and can potentially provide significant molecular diversity. Pcdh genes are abundantly expressed in the central nervous system. These molecules are primary candidates for establishing specific neuronal connectivity. Despite the extensive analyses of the genomic structure of both human and mouse Pcdh gene clusters, the definitive molecular mechanisms that control Pcdh gene expression are still unknown. Four theories have been proposed, including (1) DNA recombination followed by cis-splicing, (2) single promoter and cis-alternative splicing, (3) multiple promoters and cis-alternative splicing, and (4) multiple promoters and trans-splicing. Using a combination of molecular and genetic analyses, we evaluated the four models at the Pcdh-gamma locus. Our analysis provides evidence that the transcription of individual Pcdh-gamma genes is under the control of a distinct but related promoter upstream of each Pcdh-gamma variable exon, and posttranscriptional processing of each Pcdh-gamma transcript is predominantly mediated through cis-alternative splicing.

Journal ArticleDOI
TL;DR: In this review, some of the classic and modern approaches that have fueled the recent dramatic explosion in mouse genetics are summarized.
Abstract: In the postgenomic era the mouse will be central to the challenge of ascribing a function to the 40,000 or so genes that constitute our genome. In this review, we summarize some of the classic and modern approaches that have fueled the recent dramatic explosion in mouse genetics. Together with the sequencing of the mouse genome, these tools will have a profound effect on our ability to generate new and more accurate mouse models and thus provide a powerful insight into the function of human genes during the processes of both normal development and disease.

Journal ArticleDOI
TL;DR: The first comprehensive microarray representing a human chromosome (chromosome 22) is constructed for analysis of DNA copy number variation; the array can also be used for comprehensive epigenetic profiling of 22q-located genes and high-resolution analysis of replication timing across the entire chromosome.
Abstract: We have constructed the first comprehensive microarray representing a human chromosome for analysis of DNA copy number variation. This chromosome 22 array covers 34.7 Mb, representing 1.1% of the genome, with an average resolution of 75 kb. To demonstrate the utility of the array, we have applied it to profile acral melanoma, dermatofibrosarcoma, DiGeorge syndrome and neurofibromatosis 2. We accurately diagnosed homozygous/heterozygous deletions, amplifications/gains, IGLV/IGLC locus instability, and breakpoints of an imbalanced translocation. We further identified the 14-3-3 eta isoform as a candidate tumor suppressor in glioblastoma. Two significant methodological advances in array construction were also developed and validated. These include a strictly sequence-defined, repeat-free, and non-redundant strategy for array preparation. This approach allows an increase in array resolution and analysis of any locus, regardless of common repeats, genomic clone availability and sequence redundancy. In addition, we report that the application of phi29 DNA polymerase is advantageous in microarray preparation. A broad spectrum of issues in medical research and diagnostics can be approached using the array. This well-annotated and gene-rich autosome contains numerous uncharacterized disease genes. It is therefore crucial to associate these genes with specific 22q-related conditions, and this array will be instrumental towards this goal. Furthermore, comprehensive epigenetic profiling of 22q-located genes and high-resolution analysis of replication timing across the entire chromosome can be studied using our array.