Author

Susan Van Aken

Bio: Susan Van Aken is an academic researcher from the J. Craig Venter Institute. The author has contributed to research on the topics Genome and Gene, has an h-index of 16, and has co-authored 16 publications that have received 11,022 citations.

Papers
Journal Article
Takashi Matsumoto, Jianzhong Wu, Hiroyuki Kanamori, Yuichi Katayose, and 262 more authors (25 institutions)
11 Aug 2005-Nature
TL;DR: A map-based, finished-quality sequence covers 95% of the 389 Mb rice genome, including virtually all of the euchromatin and two complete centromeres, and reveals evidence for widespread and recurrent gene transfer from the organelles to the nuclear chromosomes.
Abstract: Rice, one of the world's most important food plants, has important syntenic relationships with the other cereal species and is a model plant for the grasses. Here we present a map-based, finished quality sequence that covers 95% of the 389 Mb genome, including virtually all of the euchromatin and two complete centromeres. A total of 37,544 non-transposable-element-related protein-coding genes were identified, of which 71% had a putative homologue in Arabidopsis. In a reciprocal analysis, 90% of the Arabidopsis proteins had a putative homologue in the predicted rice proteome. Twenty-nine per cent of the 37,544 predicted genes appear in clustered gene families. The number and classes of transposable elements found in the rice genome are consistent with the expansion of syntenic regions in the maize and sorghum genomes. We find evidence for widespread and recurrent gene transfer from the organelles to the nuclear chromosomes. The map-based sequence has proven useful for the identification of genes underlying agronomic traits. The additional single-nucleotide polymorphisms and simple sequence repeats identified in our study should accelerate improvements in rice production.

3,423 citations

Journal Article
Matthew Berriman, Elodie Ghedin, Christiane Hertz-Fowler, Gaëlle Blandin, Hubert Renauld, Daniella Castanheira Bartholomeu, Nicola Lennard, Elisabet Caler, N. Hamlin, Brian J. Haas, Ulrike Böhme, Linda Hannick, Martin Aslett, Joshua Shallom, Lucio Marcello, Lihua Hou, Bill Wickstead, U. Cecilia M. Alsmark, Claire Arrowsmith, Rebecca Atkin, Andrew Barron, Frédéric Bringaud, Karen Brooks, Mark Carrington, Inna Cherevach, Tracey-Jane Chillingworth, Carol Churcher, Louise Clark, Craig Corton, Ann Cronin, Robert L. Davies, Jonathon Doggett, Appolinaire Djikeng, Tamara Feldblyum, Mark C. Field, Audrey Fraser, Ian Goodhead, Zahra Hance, David Harper, Barbara Harris, Heidi Hauser, Jessica B. Hostetler, Al Ivens, Kay Jagels, David W. Johnson, Justin Johnson, Kristine Jones, Arnaud Kerhornou, Hean Koo, Natasha Larke, Scott M. Landfear, Christopher Larkin, Vanessa Leech, Alexandra Line, Angela Lord, Annette MacLeod, P. Mooney, Sharon Moule, David M. A. Martin, Gareth W. Morgan, Karen Mungall, Halina Norbertczak, Doug Ormond, Grace Pai, Christopher S. Peacock, Jeremy Peterson, Michael A. Quail, Ester Rabbinowitsch, Marie-Adèle Rajandream, Chris P. Reitter, Steven L. Salzberg, Mandy Sanders, Seth Schobel, Sarah Sharp, Mark Simmonds, Anjana J. Simpson, Luke J. Tallon, C. Michael R. Turner, Andrew Tait, Adrian Tivey, Susan Van Aken, Danielle Walker, David Wanless, Shiliang Wang, Brian White, Owen White, Sally Whitehead, John Woodward, Jennifer R. Wortman, Mark Raymond Adams, T. Martin Embley, Keith Gull, Elisabetta Ullu, J. David Barry, Alan H. Fairlamb, Fred R. Opperdoes, Barclay G. Barrell, John E. Donelson, Neil Hall, Claire M. Fraser, Sara E. Melville, Najib M. El-Sayed
15 Jul 2005-Science
TL;DR: Comparisons of the cytoskeleton and endocytic trafficking systems of Trypanosoma brucei with those of humans and other eukaryotic organisms reveal major differences.
Abstract: African trypanosomes cause human sleeping sickness and livestock trypanosomiasis in sub-Saharan Africa. We present the sequence and analysis of the 11 megabase-sized chromosomes of Trypanosoma brucei. The 26-megabase genome contains 9068 predicted genes, including ∼900 pseudogenes and ∼1700 T. brucei–specific genes. Large subtelomeric arrays contain an archive of 806 variant surface glycoprotein (VSG) genes used by the parasite to evade the mammalian immune system. Most VSG genes are pseudogenes, which may be used to generate expressed mosaic genes by ectopic recombination. Comparisons of the cytoskeleton and endocytic trafficking systems with those of humans and other eukaryotic organisms reveal major differences. A comparison of metabolic pathways encoded by the genomes of T. brucei, T. cruzi, and Leishmania major reveals the least overall metabolic capability in T. brucei and the greatest in L. major. Horizontal transfer of genes of bacterial origin has contributed to some of the metabolic differences in these parasites, and a number of novel potential drug targets have been identified.

1,631 citations

Journal Article
Najib M. El-Sayed, Peter J. Myler, Daniella Castanheira Bartholomeu, Daniel Nilsson, Gautam Aggarwal, Anh-Nhi Tran, Elodie Ghedin, Elizabeth A. Worthey, Arthur L. Delcher, Gaëlle Blandin, Scott J. Westenberger, Elisabet Caler, Gustavo C. Cerqueira, Carole Branche, Brian J. Haas, Atashi Anupama, Erik Arner, Lena Åslund, Philip Attipoe, Esteban J. Bontempi, Frédéric Bringaud, Peter Burton, Eithon Cadag, David A. Campbell, Mark Carrington, Jonathan Crabtree, Hamid Darban, José Franco da Silveira, Pieter J. de Jong, Kimberly Edwards, Paul T. Englund, Gholam Fazelina, Tamara Feldblyum, Marcela Ferella, Alberto C.C. Frasch, Keith Gull, David Horn, Lihua Hou, Yiting Huang, Ellen Kindlund, Michele M. Klingbeil, Sindy Kluge, Hean Koo, Daniela R. Lacerda, Mariano J. Levin, Hernan Lorenzi, Tin Louie, Carlos Renato Machado, Richard McCulloch, Alan McKenna, Yumi Mizuno, Jeremy C. Mottram, Siri Nelson, Stephen Ochaya, Kazutoyo Osoegawa, Grace Pai, Marilyn Parsons, Martin Pentony, Ulf Pettersson, Mihai Pop, José Luis Ramírez, Joel Rinta, Laura Robertson, Steven L. Salzberg, Daniel O. Sánchez, Amber Seyler, Reuben Sunil Kumar Sharma, Jyoti Shetty, Anjana J. Simpson, Ellen Sisk, Martti T. Tammi, Rick L. Tarleton, Santuza M. R. Teixeira, Susan Van Aken, Christy Vogt, Pauline N. Ward, Bill Wickstead, Jennifer R. Wortman, Owen White, Claire M. Fraser, Kenneth Stuart, Björn Andersson
15 Jul 2005-Science
TL;DR: Although the Tritryp lack several classes of signaling molecules, their kinomes contain a large and diverse set of protein kinases and phosphatases; their size and diversity imply previously unknown interactions and regulatory processes, which may be targets for intervention.
Abstract: Whole-genome sequencing of the protozoan pathogen Trypanosoma cruzi revealed that the diploid genome contains a predicted 22,570 proteins encoded by genes, of which 12,570 represent allelic pairs. Over 50% of the genome consists of repeated sequences, such as retrotransposons and genes for large families of surface molecules, which include trans-sialidases, mucins, gp63s, and a large novel family (>1300 copies) of mucin-associated surface protein (MASP) genes. Analyses of the T. cruzi, T. brucei, and Leishmania major (Tritryp) genomes imply differences from other eukaryotes in DNA repair and initiation of replication and reflect their unusual mitochondrial DNA. Although the Tritryp lack several classes of signaling molecules, their kinomes contain a large and diverse set of protein kinases and phosphatases; their size and diversity imply previously unknown interactions and regulatory processes, which may be targets for intervention.

1,349 citations

Journal Article
TL;DR: The complete genome sequence of the model bacterial pathogen Pseudomonas syringae pathovar tomato DC3000 (DC3000), which is pathogenic on tomato and Arabidopsis thaliana, is reported and 1,159 genes unique to DC3000 are revealed, of which 811 lack a known function.
Abstract: We report the complete genome sequence of the model bacterial pathogen Pseudomonas syringae pathovar tomato DC3000 (DC3000), which is pathogenic on tomato and Arabidopsis thaliana. The DC3000 genome (6.5 megabases) contains a circular chromosome and two plasmids, which collectively encode 5,763 ORFs. We identified 298 established and putative virulence genes, including several clusters of genes encoding 31 confirmed and 19 predicted type III secretion system effector proteins. Many of the virulence genes were members of paralogous families and also were proximal to mobile elements, which collectively comprise 7% of the DC3000 genome. The bacterium possesses a large repertoire of transporters for the acquisition of nutrients, particularly sugars, as well as genes implicated in attachment to plant surfaces. Over 12% of the genes are dedicated to regulation, which may reflect the need for rapid adaptation to the diverse environments encountered during epiphytic growth and pathogenesis. Comparative analyses confirmed a high degree of similarity with two sequenced pseudomonads, Pseudomonas putida and Pseudomonas aeruginosa, yet revealed 1,159 genes unique to DC3000, of which 811 lack a known function.

835 citations

Journal Article
16 Dec 1999-Nature
TL;DR: The sequence of chromosome 2 from the Columbia ecotype is reported in two gap-free assemblies (contigs) of 3.6 and 16 megabases, which represents the longest published stretch of uninterrupted DNA sequence assembled from any organism to date.
Abstract: Arabidopsis thaliana (Arabidopsis) is unique among plant model organisms in having a small genome (130-140 Mb), excellent physical and genetic maps, and little repetitive DNA. Here we report the sequence of chromosome 2 from the Columbia ecotype in two gap-free assemblies (contigs) of 3.6 and 16 megabases (Mb). The latter represents the longest published stretch of uninterrupted DNA sequence assembled from any organism to date. Chromosome 2 represents 15% of the genome and encodes 4,037 genes, 49% of which have no predicted function. Roughly 250 tandem gene duplications were found in addition to large-scale duplications of about 0.5 and 4.5 Mb between chromosomes 2 and 1 and between chromosomes 2 and 4, respectively. Sequencing of nearly 2 Mb within the genetically defined centromere revealed a low density of recognizable genes, and a high density and diverse range of vestigial and presumably inactive mobile elements. More unexpected is what appears to be a recent insertion of a continuous stretch of 75% of the mitochondrial genome into chromosome 2.

792 citations


Cited by
28 Jul 2005
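TL;DR: Plasmodium falciparum erythrocyte membrane protein 1 (PfEMP1) interacts with one or more receptors on infected red blood cells, dendritic cells, and the placenta, and plays a key role in adhesion and immune evasion.
Abstract: Antigenic variation allows many pathogenic microorganisms to evade the host immune response. Plasmodium falciparum erythrocyte membrane protein 1 (PfEMP1), which is expressed on the surface of infected red blood cells, interacts with one or more receptors on infected red blood cells, endothelial cells, dendritic cells, and the placenta, and plays a key role in adhesion and immune evasion. The var gene family encodes about 60 members per haploid genome, and switching transcription between different var gene variants provides the molecular basis for antigenic variation.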

18,940 citations

Journal Article
J. Craig Venter, Mark Raymond Adams, Eugene W. Myers, Peter W. Li, and 269 more authors (12 institutions)
16 Feb 2001-Science
TL;DR: Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems.
Abstract: A 2.91-billion base pair (bp) consensus sequence of the euchromatic portion of the human genome was generated by the whole-genome shotgun sequencing method. The 14.8-billion bp DNA sequence was generated over 9 months from 27,271,853 high-quality sequence reads (5.11-fold coverage of the genome) from both ends of plasmid clones made from the DNA of five individuals. Two assembly strategies (a whole-genome assembly and a regional chromosome assembly) were used, each combining sequence data from Celera and the publicly funded genome effort. The public data were shredded into 550-bp segments to create a 2.9-fold coverage of those genome regions that had been sequenced, without including biases inherent in the cloning and assembly procedure used by the publicly funded group. This brought the effective coverage in the assemblies to eightfold, reducing the number and size of gaps in the final assembly over what would be obtained with 5.11-fold coverage. The two assembly strategies yielded very similar results that largely agree with independent mapping data. The assemblies effectively cover the euchromatic regions of the human chromosomes. More than 90% of the genome is in scaffold assemblies of 100,000 bp or more, and 25% of the genome is in scaffolds of 10 million bp or larger. Analysis of the genome sequence revealed 26,588 protein-encoding transcripts for which there was strong corroborating evidence and an additional approximately 12,000 computationally derived genes with mouse matches or other weak supporting evidence. Although gene-dense clusters are obvious, almost half the genes are dispersed in low G+C sequence separated by large tracts of apparently noncoding sequence. Only 1.1% of the genome is spanned by exons, whereas 24% is in introns, with 75% of the genome being intergenic DNA. Duplications of segmental blocks, ranging in size up to chromosomal lengths, are abundant throughout the genome and reveal a complex evolutionary history. Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems. DNA sequence comparisons between the consensus sequence and publicly funded genome data provided locations of 2.1 million single-nucleotide polymorphisms (SNPs). A random pair of human haploid genomes differed at a rate of 1 bp per 1250 on average, but there was marked heterogeneity in the level of polymorphism across the genome. Less than 1% of all SNPs resulted in variation in proteins, but the task of determining which SNPs have functional consequences remains an open challenge.

12,098 citations

Journal Article
14 Dec 2000-Nature
TL;DR: This is the first complete genome sequence of a plant and provides the foundations for more comprehensive comparison of conserved processes in all eukaryotes, identifying a wide range of plant-specific gene functions and establishing rapid systematic ways to identify genes for crop improvement.
Abstract: The flowering plant Arabidopsis thaliana is an important model system for identifying genes and determining their functions. Here we report the analysis of the genomic sequence of Arabidopsis. The sequenced regions cover 115.4 megabases of the 125-megabase genome and extend into centromeric regions. The evolution of Arabidopsis involved a whole-genome duplication, followed by subsequent gene loss and extensive local gene duplications, giving rise to a dynamic genome enriched by lateral gene transfer from a cyanobacterial-like ancestor of the plastid. The genome contains 25,498 genes encoding proteins from 11,000 families, similar to the functional diversity of Drosophila and Caenorhabditis elegans, the other sequenced multicellular eukaryotes. Arabidopsis has many families of new proteins but also lacks several common protein families, indicating that the sets of common proteins have undergone differential expansion and contraction in the three multicellular eukaryotes. This is the first complete genome sequence of a plant and provides the foundations for more comprehensive comparison of conserved processes in all eukaryotes, identifying a wide range of plant-specific gene functions and establishing rapid systematic ways to identify genes for crop improvement.

8,742 citations

Journal Article
24 Mar 2000-Science
TL;DR: The nucleotide sequence of nearly all of the approximately 120-megabase euchromatic portion of the Drosophila genome is determined using a whole-genome shotgun sequencing strategy supported by extensive clone-based sequence and a high-quality bacterial artificial chromosome physical map.
Abstract: The fly Drosophila melanogaster is one of the most intensively studied organisms in biology and serves as a model system for the investigation of many developmental and cellular processes common to higher eukaryotes, including humans. We have determined the nucleotide sequence of nearly all of the approximately 120-megabase euchromatic portion of the Drosophila genome using a whole-genome shotgun sequencing strategy supported by extensive clone-based sequence and a high-quality bacterial artificial chromosome physical map. Efforts are under way to close the remaining gaps; however, the sequence is of sufficient accuracy and contiguity to be declared substantially complete and to support an initial analysis of genome structure and preliminary gene annotation and interpretation. The genome encodes approximately 13,600 genes, somewhat fewer than the smaller Caenorhabditis elegans genome, but with comparable functional diversity.

6,180 citations

Journal Article
TL;DR: OrthoMCL provides a scalable method for constructing orthologous groups across multiple eukaryotic taxa, using a Markov Cluster algorithm to group (putative) orthologs and paralogs.
Abstract: The identification of orthologous groups is useful for genome annotation, studies on gene/protein evolution, comparative genomics, and the identification of taxonomically restricted sequences. Methods successfully exploited for prokaryotic genome analysis have proved difficult to apply to eukaryotes, however, as larger genomes may contain multiple paralogous genes, and sequence information is often incomplete. OrthoMCL provides a scalable method for constructing orthologous groups across multiple eukaryotic taxa, using a Markov Cluster algorithm to group (putative) orthologs and paralogs. This method performs similarly to the INPARANOID algorithm when applied to two genomes, but can be extended to cluster orthologs from multiple species. OrthoMCL clusters are coherent with groups identified by EGO, but improved recognition of "recent" paralogs permits overlapping EGO groups representing the same gene to be merged. Comparison with previously assigned EC annotations suggests a high degree of reliability, implying utility for automated eukaryotic genome annotation. OrthoMCL has been applied to the proteome data set from seven publicly available genomes (human, fly, worm, yeast, Arabidopsis, the malaria parasite Plasmodium falciparum, and Escherichia coli). A Web interface allows queries based on individual genes or user-defined phylogenetic patterns (http://www.cbil.upenn.edu/gene-family). Analysis of clusters incorporating P. falciparum genes identifies numerous enzymes that were incompletely annotated in first-pass annotation of the parasite genome.

5,321 citations