
Showing papers by "Wellcome Trust Sanger Institute" published in 2005


Journal Article
John W. Belmont, Andrew Boudreau, Suzanne M. Leal, Paul Hardenbol and 229 more authors (40 institutions)
27 Oct 2005
TL;DR: A public database of common variation in the human genome: more than one million single nucleotide polymorphisms for which accurate and complete genotypes have been obtained in 269 DNA samples from four populations, including ten 500-kilobase regions in which essentially all information about common DNA variation has been extracted.
Abstract: Inherited genetic variation has a critical but as yet largely uncharacterized role in human disease. Here we report a public database of common variation in the human genome: more than one million single nucleotide polymorphisms (SNPs) for which accurate and complete genotypes have been obtained in 269 DNA samples from four populations, including ten 500-kilobase regions in which essentially all information about common DNA variation has been extracted. These data document the generality of recombination hotspots, a block-like structure of linkage disequilibrium and low haplotype diversity, leading to substantial correlations of SNPs with many of their neighbours. We show how the HapMap resource can guide the design and analysis of genetic association studies, shed light on structural variation and recombination, and identify loci that may have been subject to natural selection during human evolution.

5,479 citations


Journal Article
Piero Carninci, Takeya Kasukawa, Shintaro Katayama, Julian Gough and 194 more authors (36 institutions)
02 Sep 2005-Science
TL;DR: Detailed polling of transcription start and termination sites and analysis of previously unidentified full-length complementary DNAs derived from the mouse genome provide a comprehensive platform for the comparative analysis of mammalian transcriptional regulation in differentiation and development.
Abstract: This study describes comprehensive polling of transcription start and termination sites and analysis of previously unidentified full-length complementary DNAs derived from the mouse genome. We identify the 5' and 3' boundaries of 181,047 transcripts with extensive variation in transcripts arising from alternative promoter usage, splicing, and polyadenylation. There are 16,247 new mouse protein-coding transcripts, including 5154 encoding previously unidentified proteins. Genomic mapping of the transcriptome reveals transcriptional forests, with overlapping transcription on both strands, separated by deserts in which few transcripts are observed. The data provide a comprehensive platform for the comparative analysis of mammalian transcriptional regulation in differentiation and development.

3,412 citations


Journal Article
08 Dec 2005-Nature
TL;DR: A high-quality draft genome sequence of the domestic dog is reported, together with a dense map of single nucleotide polymorphisms (SNPs) across breeds, to shed light on the structure and evolution of genomes and genes.
Abstract: Here we report a high-quality draft genome sequence of the domestic dog (Canis familiaris), together with a dense map of single nucleotide polymorphisms (SNPs) across breeds. The dog is of particular interest because it provides important evolutionary information and because existing breeds show great phenotypic diversity for morphological, physiological and behavioural traits. We use sequence comparison with the primate and rodent lineages to shed light on the structure and evolution of genomes and genes. Notably, the majority of the most highly conserved non-coding sequences in mammalian genomes are clustered near a small subset of genes with important roles in development. Analysis of SNPs reveals long-range haplotypes across the entire dog genome, and defines the nature of genetic diversity within and across breeds. The current SNP map now makes it possible for genome-wide association studies to identify genes responsible for diseases and traits, with important consequences for human and companion animal health.

2,431 citations


Journal Article
Matthew Berriman, Elodie Ghedin, Christiane Hertz-Fowler, Gaëlle Blandin, Hubert Renauld, Daniella Castanheira Bartholomeu, Nicola Lennard, Elisabet Caler, N. Hamlin, Brian J. Haas, Ulrike Böhme, Linda Hannick, Martin Aslett, Joshua Shallom, Lucio Marcello, Lihua Hou, Bill Wickstead, U. Cecilia M. Alsmark, Claire Arrowsmith, Rebecca Atkin, Andrew Barron, Frédéric Bringaud, Karen Brooks, Mark Carrington, Inna Cherevach, Tracey-Jane Chillingworth, Carol Churcher, Louise Clark, Craig Corton, Ann Cronin, Robert L. Davies, Jonathon Doggett, Appolinaire Djikeng, Tamara Feldblyum, Mark C. Field, Audrey Fraser, Ian Goodhead, Zahra Hance, David Harper, Barbara Harris, Heidi Hauser, Jessica B. Hostetler, Al Ivens, Kay Jagels, David W. Johnson, Justin Johnson, Kristine Jones, Arnaud Kerhornou, Hean Koo, Natasha Larke, Scott M. Landfear, Christopher Larkin, Vanessa Leech, Alexandra Line, Angela Lord, Annette MacLeod, P. Mooney, Sharon Moule, David M. A. Martin, Gareth W. Morgan, Karen Mungall, Halina Norbertczak, Doug Ormond, Grace Pai, Christopher S. Peacock, Jeremy Peterson, Michael A. Quail, Ester Rabbinowitsch, Marie-Adèle Rajandream, Chris P Reitter, Steven L. Salzberg, Mandy Sanders, Seth Schobel, Sarah Sharp, Mark Simmonds, Anjana J. Simpson, Luke J. Tallon, C. Michael R. Turner, Andrew Tait, Adrian Tivey, Susan Van Aken, Danielle Walker, David Wanless, Shiliang Wang, Brian White, Owen White, Sally Whitehead, John Woodward, Jennifer R. Wortman, Mark Raymond Adams, T. Martin Embley, Keith Gull, Elisabetta Ullu, J. David Barry, Alan H. Fairlamb, Fred R. Opperdoes, Barclay G. Barrell, John E. Donelson, Neil Hall, Claire M. Fraser, Sara E. Melville, Najib M. El-Sayed
15 Jul 2005-Science
TL;DR: Comparisons of the cytoskeleton and endocytic trafficking systems of Trypanosoma brucei with those of humans and other eukaryotic organisms reveal major differences.
Abstract: African trypanosomes cause human sleeping sickness and livestock trypanosomiasis in sub-Saharan Africa. We present the sequence and analysis of the 11 megabase-sized chromosomes of Trypanosoma brucei. The 26-megabase genome contains 9068 predicted genes, including ∼900 pseudogenes and ∼1700 T. brucei–specific genes. Large subtelomeric arrays contain an archive of 806 variant surface glycoprotein (VSG) genes used by the parasite to evade the mammalian immune system. Most VSG genes are pseudogenes, which may be used to generate expressed mosaic genes by ectopic recombination. Comparisons of the cytoskeleton and endocytic trafficking systems with those of humans and other eukaryotic organisms reveal major differences. A comparison of metabolic pathways encoded by the genomes of T. brucei, T. cruzi, and Leishmania major reveals the least overall metabolic capability in T. brucei and the greatest in L. major. Horizontal transfer of genes of bacterial origin has contributed to some of the metabolic differences in these parasites, and a number of novel potential drug targets have been identified.

1,631 citations


Journal Article
TL;DR: The Artemis Comparison Tool (ACT) allows an interactive visualisation of comparisons between complete genome sequences and associated annotations; because it uses Artemis components to display the sequences, it inherits powerful searching and analysis tools.
Abstract: The Artemis Comparison Tool (ACT) allows an interactive visualisation of comparisons between complete genome sequences and associated annotations. The comparison data can be generated with several different programs: BLASTN, TBLASTX or MUMmer comparisons between genomic DNA sequences, or orthologue tables generated by reciprocal FASTA comparison between protein sets. It is possible to identify regions of similarity, insertions and rearrangements at any level from the whole genome to base-pair differences. ACT uses Artemis components to display the sequences and so inherits powerful searching and analysis tools. ACT is part of the Artemis distribution and is similarly open source, written in Java and can run on any Java-enabled platform, including UNIX, Macintosh and Windows. Availability: ACT is freely available (under a GPL licence) for download from the Sanger Institute web site, http://www.sanger.ac.uk. Contact: artemis@sanger.ac.uk

1,565 citations


Journal Article
Alasdair Ivens, Christopher S. Peacock, Elizabeth A. Worthey, Lee Murphy, Gautam Aggarwal, Matthew Berriman, Ellen Sisk, Marie-Adèle Rajandream, Ellen Adlem, Rita Aert, Atashi Anupama, Zina Apostolou, Philip Attipoe, Nathalie Bason, Christopher Bauser, Alfred Beck, Stephen M. Beverley, Gabriella Bianchettin, K. Borzym, G. Bothe, Carlo V. Bruschi, Matt Collins, Eithon Cadag, Laura Ciarloni, Christine Clayton, Richard M.R. Coulson, Ann Cronin, Angela K. Cruz, Robert L. Davies, Javier G. De Gaudenzi, Deborah E. Dobson, Andreas Duesterhoeft, Gholam Fazelina, Nigel Fosker, Alberto C.C. Frasch, Audrey Fraser, Monika Fuchs, Claudia Gabel, Arlette Goble, André Goffeau, David Harris, Christiane Hertz-Fowler, Helmut Hilbert, David Horn, Yiting Huang, Sven Klages, Andrew J Knights, Michael Kube, Natasha Larke, Lyudmila Litvin, Angela Lord, Tin Louie, Marco A. Marra, David Masuy, Keith R. Matthews, Shulamit Michaeli, Jeremy C. Mottram, Silke Müller-Auer, Heather Munden, Siri Nelson, Halina Norbertczak, Karen Oliver, Susan O'Neil, Martin Pentony, Thomas M. Pohl, Claire Price, Bénédicte Purnelle, Michael A. Quail, Ester Rabbinowitsch, Richard Reinhardt, Michael A. Rieger, Joel Rinta, Johan Robben, Laura Robertson, Jeronimo C. Ruiz, Simon Rutter, David L. Saunders, Melanie Schäfer, Jacquie Schein, David C. Schwartz, Kathy Seeger, Amber Seyler, Sarah Sharp, Heesun Shin, Dhileep Sivam, Rob Squares, Steve Squares, Valentina Tosato, Christy Vogt, Guido Volckaert, Rolf Wambutt, T. Warren, Holger Wedler, John Woodward, Shiguo Zhou, Wolfgang Zimmermann, Deborah F. Smith, Jenefer M. Blackwell, Kenneth Stuart, Bart Barrell, Peter J. Myler
15 Jul 2005-Science
TL;DR: The organization of protein-coding genes into long, strand-specific, polycistronic clusters and the lack of general transcription factors in the L. major, Trypanosoma brucei and Trypanosoma cruzi (Tritryp) genomes suggest that the mechanisms regulating RNA polymerase II–directed transcription are distinct from those operating in other eukaryotes, although the trypanosomatids appear capable of chromatin remodeling.
Abstract: Leishmania species cause a spectrum of human diseases in tropical and subtropical regions of the world. We have sequenced the 36 chromosomes of the 32.8-megabase haploid genome of Leishmania major (Friedlin strain) and predict 911 RNA genes, 39 pseudogenes, and 8272 protein-coding genes, of which 36% can be ascribed a putative function. These include genes involved in host-pathogen interactions, such as proteolytic enzymes, and extensive machinery for synthesis of complex surface glycoconjugates. The organization of protein-coding genes into long, strand-specific, polycistronic clusters and lack of general transcription factors in the L. major, Trypanosoma brucei, and Trypanosoma cruzi (Tritryp) genomes suggest that the mechanisms regulating RNA polymerase II-directed transcription are distinct from those operating in other eukaryotes, although the trypanosomatids appear capable of chromatin remodeling. Abundant RNA-binding proteins are encoded in the Tritryp genomes, consistent with active posttranscriptional regulation of gene expression.

1,357 citations


Journal Article
William C. Nierman, Arnab Pain, Michael J. Anderson, Jennifer R. Wortman, H. Stanley Kim, Javier Arroyo, Matthew Berriman, Keietsu Abe, David B. Archer, Clara Bermejo, Joan W. Bennett, Paul Bowyer, Dan Chen, Matthew Collins, Richard Coulsen, Robert L. Davies, Paul S. Dyer, Mark L. Farman, Nadia Fedorova, Natalie D. Fedorova, T. Feldblyum, Reinhard Fischer, Nigel Fosker, Audrey Fraser, José Luis García, María Josefa Marcos García, Ariette Goble, Gustavo H. Goldman, Katsuya Gomi, Sam Griffith-Jones, R. Gwilliam, Brian J. Haas, Hubertus Haas, David Harris, H. Horiuchi, Jiaqi Huang, Sean Humphray, Javier Jiménez, Nancy P. Keller, H. Khouri, Katsuhiko Kitamoto, Tetsuo Kobayashi, Sven Konzack, Resham Kulkarni, Toshitaka Kumagai, Anne Lafton, Jean-Paul Latgé, Weixi Li, Angela Lord, Charles Lu, William H. Majoros, Gregory S. May, Bruce L. Miller, Yasmin Ali Mohamoud, María Molina, Michel Monod, Isabelle Mouyna, Stephanie Mulligan, Lee Murphy, Susan O'Neil, Ian T. Paulsen, Miguel A. Peñalva, Mihaela Pertea, Claire Price, Bethan L. Pritchard, Michael A. Quail, Ester Rabbinowitsch, Neil Rawlins, Marie Adele Rajandream, Utz Reichard, Hubert Renauld, Geoffrey D. Robson, Santiago Rodríguez de Córdoba, José Manuel Rodríguez-Peña, Catherine M. Ronning, Simon Rutter, Steven L. Salzberg, Miguel del Nogal Sánchez, Juan C. Sánchez-Ferrero, David L. Saunders, Kathy Seeger, Rob Squares, S. Squares, Michio Takeuchi, Fredj Tekaia, Geoffrey Turner, Carlos R. Vázquez de Aldana, J. Weidman, Owen White, John Woodward, Jae-Hyuk Yu, Claire M. Fraser, James E. Galagan, Kiyoshi Asai, Masayuki Machida, Neil Hall, Bart Barrell, David W. Denning
22 Dec 2005-Nature
TL;DR: The Af293 genome sequence provides an unparalleled resource for the future understanding of this remarkable fungus; microarray analysis revealed temperature-dependent expression of distinct sets of genes, as well as 700 A. fumigatus genes not present or significantly diverged in the closely related sexual species Neosartorya fischeri, many of which may have roles in the pathogenicity phenotype.
Abstract: Aspergillus fumigatus is exceptional among microorganisms in being both a primary and opportunistic pathogen as well as a major allergen. Its conidia production is prolific, and so human respiratory tract exposure is almost constant. A. fumigatus is isolated from human habitats and vegetable compost heaps. In immunocompromised individuals, the incidence of invasive infection can be as high as 50% and the mortality rate is often about 50% (ref. 2). The interaction of A. fumigatus and other airborne fungi with the immune system is increasingly linked to severe asthma and sinusitis. Although the burden of invasive disease caused by A. fumigatus is substantial, the basic biology of the organism is mostly obscure. Here we show the complete 29.4-megabase genome sequence of the clinical isolate Af293, which consists of eight chromosomes containing 9,926 predicted genes. Microarray analysis revealed temperature-dependent expression of distinct sets of genes, as well as 700 A. fumigatus genes not present or significantly diverged in the closely related sexual species Neosartorya fischeri, many of which may have roles in the pathogenicity phenotype. The Af293 genome sequence provides an unparalleled resource for the future understanding of this remarkable fungus.

1,356 citations


Journal Article
22 Dec 2005-Nature
TL;DR: The aspergilli comprise a diverse group of filamentous fungi spanning over 200 million years of evolution; the genome sequence of Aspergillus nidulans, compared with those of Aspergillus fumigatus and Aspergillus oryzae (used in the production of sake, miso and soy sauce), provides new insight into eukaryotic genome evolution and gene regulation.
Abstract: The aspergilli comprise a diverse group of filamentous fungi spanning over 200 million years of evolution. Here we report the genome sequence of the model organism Aspergillus nidulans, and a comparative study with Aspergillus fumigatus, a serious human pathogen, and Aspergillus oryzae, used in the production of sake, miso and soy sauce. Our analysis of genome structure provided a quantitative evaluation of forces driving long-term eukaryotic genome evolution. It also led to an experimentally validated model of mating-type locus evolution, suggesting the potential for sexual reproduction in A. fumigatus and A. oryzae. Our analysis of sequence conservation revealed over 5,000 non-coding regions actively conserved across all three species. Within these regions, we identified potential functional elements including a previously uncharacterized TPP riboswitch and motifs suggesting regulation in filamentous fungi by Puf family genes. We further obtained comparative and experimental evidence indicating widespread translational regulation by upstream open reading frames. These results enhance our understanding of these widely studied fungi as well as provide new insight into eukaryotic genome evolution and gene regulation.

1,297 citations


Journal Article
06 May 2005-Science
TL;DR: Injection of miR-430 miRNAs rescues the brain defects in MZdicer mutants, revealing essential roles for miRNAs during morphogenesis.
Abstract: MicroRNAs (miRNAs) are small RNAs that regulate gene expression posttranscriptionally. To block all miRNA formation in zebrafish, we generated maternal-zygotic dicer (MZdicer) mutants that disrupt the Dicer ribonuclease III and double-stranded RNA-binding domains. Mutant embryos do not process precursor miRNAs into mature miRNAs, but injection of preprocessed miRNAs restores gene silencing, indicating that the disrupted domains are dispensable for later steps in silencing. MZdicer mutants undergo axis formation and differentiate multiple cell types but display abnormal morphogenesis during gastrulation, brain formation, somitogenesis, and heart development. Injection of miR-430 miRNAs rescues the brain defects in MZdicer mutants, revealing essential roles for miRNAs during morphogenesis.

1,292 citations


Journal Article
Ludwig Eichinger, Justin A. Pachebat, Gernot Glöckner, Marie-Adèle Rajandream, Richard Sucgang, Matthew Berriman, J. Song, Rolf Olsen, Karol Szafranski, Qikai Xu, Budi Tunggal, Sarah K. Kummerfeld, Martin Madera, Bernard Anri Konfortov, Francisco Rivero, Alan T. Bankier, Rüdiger Lehmann, N. Hamlin, Robert L. Davies, Pascale Gaudet, Petra Fey, Karen E Pilcher, Guokai Chen, David L. Saunders, Erica Sodergren, P. Davis, Arnaud Kerhornou, X. Nie, Neil Hall, Christophe Anjard, Lisa Hemphill, Nathalie Bason, Patrick Farbrother, Brian A. Desany, Eric M. Just, Takahiro Morio, René Rost, Carol Churcher, J. Cooper, Stephen F. Haydock, N. van Driessche, Ann Cronin, Ian Goodhead, Donna M. Muzny, T. Mourier, Arnab Pain, Mingyang Lu, D. Harper, R. Lindsay, Heidi Hauser, Kylie R. James, M. Quiles, M. Madan Babu, Tsuneyuki Saito, Carmen Buchrieser, A. Wardroper, Marius Felder, M. Thangavelu, D. Johnson, Andrew J Knights, H. Loulseged, Karen Mungall, Karen Oliver, Claire Price, Michael A. Quail, Hideko Urushihara, Judith Hernandez, Ester Rabbinowitsch, David Steffen, Mandy Sanders, Jun Ma, Yuji Kohara, Sarah Sharp, Mark Simmonds, S. Spiegler, Adrian Tivey, Sumio Sugano, Brian White, Danielle Walker, John Woodward, Thomas Winckler, Yoshiaki Tanaka, Gad Shaulsky, Michael Schleicher, George M. Weinstock, André Rosenthal, Edward C. Cox, Rex L. Chisholm, Richard A. Gibbs, William F. Loomis, Matthias Platzer, Robert R. Kay, Jeffrey G. Williams, Paul H. Dear, Angelika A. Noegel, Bart Barrell, Adam Kuspa
05 May 2005-Nature
TL;DR: A proteome-based phylogeny shows that the amoebozoa diverged from the animal–fungal lineage after the plant–animal split, but Dictyostelium seems to have retained more of the diversity of the ancestral genome than have plants, animals or fungi.
Abstract: The social amoebae are exceptional in their ability to alternate between unicellular and multicellular forms. Here we describe the genome of the best-studied member of this group, Dictyostelium discoideum. The gene-dense chromosomes of this organism encode approximately 12,500 predicted proteins, a high proportion of which have long, repetitive amino acid tracts. There are many genes for polyketide synthases and ABC transporters, suggesting an extensive secondary metabolism for producing and exporting small molecules. The genome is rich in complex repeats, one class of which is clustered and may serve as centromeres. Partial copies of the extrachromosomal ribosomal DNA (rDNA) element are found at the ends of each chromosome, suggesting a novel telomere structure and the use of a common mechanism to maintain both the rDNA and chromosomal termini. A proteome-based phylogeny shows that the amoebozoa diverged from the animal-fungal lineage after the plant-animal split, but Dictyostelium seems to have retained more of the diversity of the ancestral genome than have plants, animals or fungi.

1,289 citations


Journal Article
Mark T. Ross, Darren Grafham, Alison J. Coffey, Steven E. Scherer and 279 more authors (15 institutions)
17 Mar 2005-Nature
TL;DR: This analysis illustrates the autosomal origin of the mammalian sex chromosomes, the stepwise process that led to the progressive loss of recombination between X and Y, and the extent of subsequent degradation of the Y chromosome.
Abstract: The human X chromosome has a unique biology that was shaped by its evolution as the sex chromosome shared by males and females. We have determined 99.3% of the euchromatic sequence of the X chromosome. Our analysis illustrates the autosomal origin of the mammalian sex chromosomes, the stepwise process that led to the progressive loss of recombination between X and Y, and the extent of subsequent degradation of the Y chromosome. LINE1 repeat elements cover one-third of the X chromosome, with a distribution that is consistent with their proposed role as way stations in the process of X-chromosome inactivation. We found 1,098 genes in the sequence, of which 99 encode proteins expressed in testis and in various tumour types. A disproportionately high number of mendelian diseases are documented for the X chromosome. Of this number, 168 have been explained by mutations in 113 X-linked genes, which in many cases were characterized with the aid of the DNA sequence.

Journal Article
24 Mar 2005-Nature
TL;DR: This work combined RNA-mediated interference, targeting 98% of all genes predicted in the C. elegans genome, with differential interference contrast time-lapse microscopy to develop a phenotypic profiling system that correlates well with cellular processes and biochemical pathways, making it possible to predict new functions for previously uncharacterized genes.
Abstract: A key challenge of functional genomics today is to generate well-annotated data sets that can be interpreted across different platforms and technologies. Large-scale functional genomics data often fail to connect to standard experimental approaches of gene characterization in individual laboratories. Furthermore, a lack of universal annotation standards for phenotypic data sets makes it difficult to compare different screening approaches. Here we address this problem in a screen designed to identify all genes required for the first two rounds of cell division in the Caenorhabditis elegans embryo. We used RNA-mediated interference to target 98% of all genes predicted in the C. elegans genome in combination with differential interference contrast time-lapse microscopy. Through systematic annotation of the resulting movies, we developed a phenotypic profiling system, which shows high correlation with cellular processes and biochemical pathways, thus enabling us to predict new functions for previously uncharacterized genes.

Journal Article
07 Jan 2005-Science
TL;DR: Posttranscriptional gene silencing through translational repression of messenger RNA is observed during sexual development, and a 47-base 3′ untranslated region motif is implicated in this process.
Abstract: Plasmodium berghei and Plasmodium chabaudi are widely used model malaria species. Comparison of their genomes, integrated with proteomic and microarray data, with the genomes of Plasmodium falciparum and Plasmodium yoelii revealed a conserved core of 4500 Plasmodium genes in the central regions of the 14 chromosomes and highlighted genes evolving rapidly because of stage-specific selective pressures. Four strategies for gene expression are apparent during the parasites' life cycle: (i) housekeeping; (ii) host-related; (iii) strategy-specific related to invasion, asexual replication, and sexual development; and (iv) stage-specific. We observed posttranscriptional gene silencing through translational repression of messenger RNA during sexual development, and a 47-base 3' untranslated region motif is implicated in this process.

Journal Article
TL;DR: The Sequence Ontology is a structured controlled vocabulary for the parts of a genomic annotation; it provides a common set of terms and definitions that will facilitate the exchange, analysis and management of genomic data.
Abstract: The Sequence Ontology (SO) is a structured controlled vocabulary for the parts of a genomic annotation. SO provides a common set of terms and definitions that will facilitate the exchange, analysis and management of genomic data. Because SO treats part-whole relationships rigorously, data described with it can become substrates for automated reasoning, and instances of sequence features described by the SO can be subjected to a group of logical operations termed extensional mereology operators.

Book Chapter
15 Apr 2005
TL;DR: The basic contents and availability of Pfam, a protein families database, are described, together with iPfam, a new resource that describes domain–domain interactions at the molecular level.
Abstract: Systematic analysis has shown that the majority of proteins can be grouped into approximately 1000 sequence families. These sequence families are often representative of domains. Pfam is a protein families database. The basic contents and availability of the Pfam database are described. Genome sequencing projects, including the human and fly, have used Pfam extensively for large-scale functional annotation of genomic data, while smaller research groups, devoted to a single protein or biochemical pathway, frequently use Pfam for their analyses. Typically, Pfam matches between 55 and 90% of proteins from complete proteome sets. Pfam also allows the domain distributions to be compared for completed genomes. In addition to sequence domain annotation, Pfam also contains information on domain–domain interactions. The new resource that describes domain–domain interactions at the molecular level is called iPfam. The contents of iPfam are briefly outlined. Keywords: Pfam; genome annotation; HMM; Markov; protein interaction

Journal Article
15 Jul 2005-Science
TL;DR: Comparison of the three trypanosomatid genomes reveals a conserved core proteome of about 6200 genes in large syntenic polycistronic gene clusters, and no evidence that these species are descended from an ancestor that contained a photosynthetic endosymbiont.
Abstract: A comparison of gene content and genome architecture of Trypanosoma brucei, Trypanosoma cruzi, and Leishmania major, three related pathogens with different life cycles and disease pathology, revealed a conserved core proteome of about 6200 genes in large syntenic polycistronic gene clusters. Many species-specific genes, especially large surface antigen families, occur at nonsyntenic chromosome-internal and subtelomeric regions. Retroelements, structural RNAs, and gene family expansion are often associated with syntenic discontinuities that, along with gene divergence, acquisition and loss, and rearrangement within the syntenic regions, have shaped the genomes of each parasite. Contrary to recent reports, our analyses reveal no evidence that these species are descended from an ancestor that contained a photosynthetic endosymbiont.

Journal Article
TL;DR: The SET-domain protein methyltransferase superfamily includes all but one of the proteins known to methylate histones on lysine.
Abstract: The SET-domain protein methyltransferase superfamily includes all but one of the proteins known to methylate histones on lysine. Histone methylation is important in the regulation of chromatin and gene expression.

Journal Article
TL;DR: The results suggest that regulatory polymorphism is widespread in the human genome and show that the 5-kb (phase I) HapMap has sufficient density to enable linkage disequilibrium mapping in humans.
Abstract: The exploration of quantitative variation in human populations has become one of the major priorities for medical genetics. The successful identification of variants that contribute to complex traits is highly dependent on reliable assays and genetic maps. We have performed a genome-wide quantitative trait analysis of 630 genes in 60 unrelated Utah residents with ancestry from Northern and Western Europe using the publicly available phase I data of the International HapMap project. The genes are located in regions of the human genome with elevated functional annotation and disease interest, including the ENCODE regions spanning 1% of the genome, Chromosome 21 and Chromosome 20q12–13.2. We apply three different methods of multiple test correction: Bonferroni, false discovery rate, and permutations. For the 374 expressed genes, we find many regions with statistically significant association of single nucleotide polymorphisms (SNPs) with expression variation in lymphoblastoid cell lines after correcting for multiple tests. Based on our analyses, the signal proximal (cis) to the genes of interest is more abundant than the distal (trans) signal, and more stable across statistical methodologies. Our results suggest that regulatory polymorphism is widespread in the human genome and show that the 5-kb (phase I) HapMap has sufficient density to enable linkage disequilibrium mapping in humans. Such studies will significantly enhance our ability to annotate the non-coding part of the genome and interpret functional variation. In addition, we demonstrate that the HapMap cell lines themselves may serve as a useful resource for quantitative measurements at the cellular level.

Journal Article
09 Sep 2005-Cell
TL;DR: It is shown that expression of clock genes in osteoblasts is regulated by the sympathetic nervous system and leptin, which determines the extent of bone formation by modulating, via sympathetic signaling, osteoblast proliferation through two antagonistic pathways, one of which involves the molecular clock.

Journal Article
21 Apr 2005-Nature
TL;DR: It is reported here that inner ears of Lcc/Lcc mice fail to establish a prosensory domain and neither hair cells nor supporting cells differentiate, resulting in a severe inner ear malformation, whereas the sensory epithelium of Ysb/Ysb mice shows abnormal development with disorganized and fewer hair cells.
Abstract: Sensory hair cells and their associated non-sensory supporting cells in the inner ear are fundamental for hearing and balance. They arise from a common progenitor, but little is known about the molecular events specifying this cell lineage. We recently identified two allelic mouse mutants, light coat and circling (Lcc) and yellow submarine (Ysb), that show hearing and balance impairment. Lcc/Lcc mice are completely deaf, whereas Ysb/Ysb mice are severely hearing impaired. We report here that inner ears of Lcc/Lcc mice fail to establish a prosensory domain and neither hair cells nor supporting cells differentiate, resulting in a severe inner ear malformation, whereas the sensory epithelium of Ysb/Ysb mice shows abnormal development with disorganized and fewer hair cells. These phenotypes are due to the absence (in Lcc mutants) or reduced expression (in Ysb mutants) of the transcription factor SOX2, specifically within the developing inner ear. SOX2 continues to be expressed in the inner ears of mice lacking Math1 (also known as Atoh1 and HATH1), a gene essential for hair cell differentiation, whereas Math1 expression is absent in Lcc mutants, suggesting that Sox2 acts upstream of Math1.

Journal ArticleDOI
TL;DR: The literature of C. rodentium is reviewed from its emergence in the mid‐1960s to the most contemporary reports of colonization, pathogenesis, transmission and immunity, providing an excellent in vivo model for A/E lesion forming pathogens.
Abstract: The major classes of enteric bacteria harbour a conserved core genomic structure, common to both commensal and pathogenic strains, that is most likely optimized to a lifestyle involving colonization of the host intestine and transmission via the environment. In pathogenic bacteria this core genome framework is decorated with novel genetic islands that are often associated with adaptive phenotypes such as virulence. This classical genome organization is well illustrated by a group of extracellular enteric pathogens, which includes enteropathogenic Escherichia coli (EPEC), enterohaemorrhagic E. coli (EHEC) and Citrobacter rodentium, all of which use attaching and effacing (A/E) lesion formation as a major mechanism of tissue targeting and infection. Both EHEC and EPEC are poorly pathogenic in mice but infect humans and domestic animals. In contrast, C. rodentium is a natural mouse pathogen that is related to E. coli, hence providing an excellent in vivo model for A/E lesion-forming pathogens. C. rodentium also provides a model of infections that are mainly restricted to the lumen of the intestine. The mechanisms by which the immune system deals with such infections have become a topic of great interest in recent years. Here we review the literature of C. rodentium from its emergence in the mid-1960s to the most contemporary reports of colonization, pathogenesis, transmission and immunity.

Journal ArticleDOI
TL;DR: The results suggest that several mutated protein kinases may be contributing to lung cancer development, but that mutations in each one are infrequent.
Abstract: Protein kinases are frequently mutated in human cancer and inhibitors of mutant protein kinases have proven to be effective anticancer drugs. We screened the coding sequences of 518 protein kinases (approximately 1.3 Mb of DNA per sample) for somatic mutations in 26 primary lung neoplasms and seven lung cancer cell lines. One hundred eighty-eight somatic mutations were detected in 141 genes. Of these, 35 were synonymous (silent) changes. This result indicates that most of the 188 mutations were "passenger" mutations that are not causally implicated in oncogenesis. However, an excess of approximately 40 nonsynonymous substitutions compared with that expected by chance (P = 0.07) suggests that some nonsynonymous mutations have been selected and are contributing to oncogenesis. There was considerable variation between individual lung cancers in the number of mutations observed and no mutations were found in lung carcinoids. The mutational spectra of most lung cancers were characterized by a high proportion of C:G > A:T transversions, compatible with the mutagenic effects of tobacco carcinogens. However, one neuroendocrine cancer cell line had a distinctive mutational spectrum reminiscent of UV-induced DNA damage. The results suggest that several mutated protein kinases may be contributing to lung cancer development, but that mutations in each one are infrequent.

Journal ArticleDOI
TL;DR: A novel combination of factors that explains almost 60% of variable response to warfarin is reported, and genotype-based dose predictions may in future enable personalised drug treatment from the start of warfarin therapy.
Abstract: We report a novel combination of factors that explains almost 60% of variable response to warfarin. Warfarin is a widely used anticoagulant, which acts through interference with vitamin K epoxide reductase that is encoded by VKORC1. In the next step of the vitamin K cycle, gamma-glutamyl carboxylase encoded by GGCX uses reduced vitamin K to activate clotting factors. We genotyped 201 warfarin-treated patients for common polymorphisms in VKORC1 and GGCX. All five VKORC1 single-nucleotide polymorphisms covary significantly with warfarin dose, and explain 29–30% of variance in dose. Thus, VKORC1 has a larger impact than cytochrome P450 2C9, which explains 12% of variance in dose. In addition, one GGCX SNP showed a small but significant effect on warfarin dose. Incorrect dosage, especially during the initial phase of treatment, carries a high risk of either severe bleeding or failure to prevent thromboembolism. Genotype-based dose predictions may in future enable personalised drug treatment from the start of warfarin therapy.
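To make "explains 29–30% of variance in dose" concrete: for a single SNP coded as an allele count (0, 1 or 2), the fraction of dose variance explained by a linear fit is the R² of regressing dose on genotype, which with one predictor equals the squared Pearson correlation. The sketch below uses invented genotypes and doses, not the study's data, and a one-SNP model is a simplification of the multi-factor analysis the abstract reports; `variance_explained` is a hypothetical helper.

```python
from statistics import mean
from typing import Sequence


def variance_explained(genotypes: Sequence[float], doses: Sequence[float]) -> float:
    """Return R^2 for an ordinary least-squares fit of dose on genotype.

    genotypes: allele counts per patient (0, 1 or 2), a hypothetical coding.
    doses: maintenance dose per patient, in any consistent unit.
    """
    if len(genotypes) != len(doses) or len(doses) < 2:
        raise ValueError("need two equal-length sequences with at least two entries")
    g_bar = mean(genotypes)
    d_bar = mean(doses)
    s_gd = sum((g - g_bar) * (d - d_bar) for g, d in zip(genotypes, doses))
    s_gg = sum((g - g_bar) ** 2 for g in genotypes)
    s_dd = sum((d - d_bar) ** 2 for d in doses)
    if s_gg == 0 or s_dd == 0:
        raise ValueError("genotype and dose must both vary")
    # With a single predictor, R^2 = r^2 = s_gd^2 / (s_gg * s_dd).
    return s_gd ** 2 / (s_gg * s_dd)


if __name__ == "__main__":
    # Invented example: dose falls as the allele count rises.
    genotypes = [0, 0, 1, 1, 2, 2]
    doses = [7.0, 6.0, 5.0, 5.0, 3.0, 4.0]
    print(f"R^2 = {variance_explained(genotypes, doses):.2f}")  # prints R^2 = 0.90
```

In this toy data genotype accounts for 90% of dose variance; the study reports 29–30% for each VKORC1 SNP, with the remaining variance attributed to other factors.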

Journal ArticleDOI
TL;DR: The complete genome sequence of a highly virulent isolate of F. tularensis is reported and an unexpectedly high proportion of disrupted pathways are found, explaining the fastidious nutritional requirements of the bacterium.
Abstract: Francisella tularensis is one of the most infectious human pathogens known. In the past, both the former Soviet Union and the US had programs to develop weapons containing the bacterium. We report the complete genome sequence of a highly virulent isolate of F. tularensis (1,892,819 bp). The sequence uncovers previously uncharacterized genes encoding type IV pili, a surface polysaccharide and iron-acquisition systems. Several virulence-associated genes were located in a putative pathogenicity island, which was duplicated in the genome. More than 10% of the putative coding sequences contained insertion-deletion or substitution mutations and seemed to be deteriorating. The genome is rich in IS elements, including IS630 Tc-1 mariner family transposons, which are not expected in a prokaryote. We used a computational method for predicting metabolic pathways and found an unexpectedly high proportion of disrupted pathways, explaining the fastidious nutritional requirements of the bacterium. The loss of biosynthetic pathways indicates that F. tularensis is an obligate host-dependent bacterium in its natural life cycle. Our results have implications for our understanding of how highly virulent human pathogens evolve and will expedite strategies to combat them.

Journal ArticleDOI
TL;DR: Cytogenetic analysis now extends beyond the simple description of the chromosomal status of a genome and allows the study of fundamental biological questions, such as the nature of inherited syndromes, the genomic changes that are involved in tumorigenesis and the three-dimensional organization of the human genome.
Abstract: Exciting advances in fluorescence in situ hybridization and array-based techniques are changing the nature of cytogenetics, in both basic research and molecular diagnostics. Cytogenetic analysis now extends beyond the simple description of the chromosomal status of a genome and allows the study of fundamental biological questions, such as the nature of inherited syndromes, the genomic changes that are involved in tumorigenesis and the three-dimensional organization of the human genome. The high resolution that is achieved by these techniques, particularly by microarray technologies such as array comparative genomic hybridization, is blurring the traditional distinction between cytogenetics and molecular biology.

Journal ArticleDOI
TL;DR: A previously unknown immunoglobulin class, immunoglobulin ζ, is identified and shown to be expressed in zebrafish and other teleosts; its discovery raises questions concerning the evolution of immunoglobulins and the regulation of the differential expression of ighz and ighm.
Abstract: The only immunoglobulin heavy-chain classes known so far in teleosts have been mu and delta. We identify here a previously unknown class, immunoglobulin zeta, expressed in zebrafish and other teleosts. In the zebrafish heavy-chain locus, variable (V) gene segments lie upstream of two tandem diversity, joining and constant (DJC) clusters, resembling the mouse T cell receptor alpha (Tcra) and delta (Tcrd) locus. V genes rearrange to (DJC)ζ or to (DJC)μ without evidence of switch rearrangement. The zebrafish immunoglobulin zeta gene (ighz) and mouse Tcrd, which are proximal to the V gene array, are expressed earlier in development. In adults, ighz was expressed only in kidney and thymus, which are primary lymphoid organs in teleosts. This additional class adds complexity to the immunoglobulin repertoire and raises questions concerning the evolution of immunoglobulins and the regulation of the differential expression of ighz and ighm.

Journal ArticleDOI
TL;DR: The embryonic origin, signalling roles and ultimate fate of the notochord are discussed, with an emphasis on structural aspects of notochord biology.
Abstract: The notochord is the defining structure of the chordates, and has essential roles in vertebrate development. It serves as a source of midline signals that pattern surrounding tissues and as a major skeletal element of the developing embryo. Genetic and embryological studies over the past decade have informed us about the development and function of the notochord. In this review, I discuss the embryonic origin, signalling roles and ultimate fate of the notochord, with an emphasis on structural aspects of notochord biology.

Journal ArticleDOI
TL;DR: The coding sequences of 518 protein kinases were examined in 25 breast cancers; many tumors had no detectable somatic mutations, but a few had numerous somatic mutations with distinctive patterns indicative of either a mutator phenotype or a past exposure.
Abstract: We examined the coding sequence of 518 protein kinases, approximately 1.3 Mb of DNA per sample, in 25 breast cancers. In many tumors, we detected no somatic mutations. But a few had numerous somatic mutations with distinctive patterns indicative of either a mutator phenotype or a past exposure.

Journal ArticleDOI
01 Jul 2005-Science
TL;DR: The genome sequence of Theileria parva is reported, an apicomplexan pathogen causing economic losses to smallholder farmers in Africa, and its plastid-like genome represents the first example where all apicoplast genes are encoded on one DNA strand.
Abstract: We report the genome sequence of Theileria parva, an apicomplexan pathogen causing economic losses to smallholder farmers in Africa. The parasite chromosomes exhibit limited conservation of gene synteny with Plasmodium falciparum, and its plastid-like genome represents the first example where all apicoplast genes are encoded on one DNA strand. We tentatively identify proteins that facilitate parasite segregation during host cell cytokinesis and contribute to persistent infection of transformed host cells. Several biosynthetic pathways are incomplete or absent, suggesting substantial metabolic dependence on the host cell. One protein family that may generate parasite antigenic diversity is not telomere-associated.

Journal ArticleDOI
TL;DR: Improved annotation permitted a detailed analysis of several multigene families, and comparative genomic studies showed that C. albicans has a far greater catabolic range, encoding respiratory Complex 1, several novel oxidoreductases and ketone body degrading enzymes, malonyl-CoA and enoyl-CoA carriers, and numerous transporters to assimilate the resulting nutrients.
Abstract: Recent sequencing and assembly of the genome for the fungal pathogen Candida albicans used simple automated procedures for the identification of putative genes. We have reviewed the entire assembly, both by hand and with additional bioinformatic resources, to accurately map and describe 6,354 genes and to identify 246 genes whose original database entries contained sequencing errors (or possibly mutations) that affect their reading frame. Comparison with other fungal genomes permitted the identification of numerous fungus-specific genes that might be targeted for antifungal therapy. We also observed that, compared to other fungi, the protein-coding sequences in the C. albicans genome are especially rich in short sequence repeats. Finally, our improved annotation permitted a detailed analysis of several multigene families, and comparative genomic studies showed that C. albicans has a far greater catabolic range, encoding respiratory Complex 1, several novel oxidoreductases and ketone body degrading enzymes, malonyl-CoA and enoyl-CoA carriers, several novel amino acid degrading enzymes, a variety of secreted catabolic lipases and proteases, and numerous transporters to assimilate the resulting nutrients. The results of these efforts will ensure that the Candida research community has uniform and comprehensive genomic information for medical research as well as for future diagnostic and therapeutic applications.