Author

Inna Dubchak

Bio: Inna Dubchak is an academic researcher at Lawrence Berkeley National Laboratory. The author has contributed to research on the topics Genome and Gene, has an h-index of 64, and has co-authored 122 publications receiving 41,115 citations. Previous affiliations of Inna Dubchak include the Joint Genome Institute and the United States Department of Energy.


Papers
Journal Article
Robert H. Waterston1, Kerstin Lindblad-Toh2, Ewan Birney, Jane Rogers3, and 219 more authors; 26 institutions
05 Dec 2002-Nature
TL;DR: The results of an international collaboration to produce a high-quality draft sequence of the mouse genome are reported, and an initial comparative analysis of the mouse and human genomes is presented, describing some of the insights that can be gleaned from the two sequences.
Abstract: The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.

6,643 citations

Journal Article
Gerald A. Tuskan1, Gerald A. Tuskan2, Stephen P. DiFazio1, Stephen P. DiFazio3, Stefan Jansson4, Joerg Bohlmann5, Igor V. Grigoriev6, Uffe Hellsten6, Nicholas H. Putnam6, Steven G. Ralph5, Stephane Rombauts7, Asaf Salamov6, Jacquie Schein, Lieven Sterck7, Andrea Aerts6, Rishikeshi Bhalerao4, Rishikesh P. Bhalerao8, Damien Blaudez9, Wout Boerjan7, Annick Brun9, Amy M. Brunner10, Victor Busov11, Malcolm M. Campbell12, John E. Carlson13, Michel Chalot9, Jarrod Chapman6, G.-L. Chen1, Dawn Cooper5, Pedro M. Coutinho14, Jérémy Couturier9, Sarah F. Covert15, Quentin C. B. Cronk5, R. Cunningham1, John M. Davis16, Sven Degroeve7, Annabelle Déjardin9, Claude W. dePamphilis13, John C. Detter6, Bill Dirks17, Inna Dubchak18, Inna Dubchak6, Sébastien Duplessis9, Jürgen Ehlting5, Brian E. Ellis5, Karla C Gendler19, David Goodstein6, Michael Gribskov20, Jane Grimwood21, Andrew Groover22, Lee E. Gunter1, Björn Hamberger5, Berthold Heinze, Yrjö Helariutta23, Yrjö Helariutta8, Yrjö Helariutta24, Bernard Henrissat14, D. Holligan15, Robert A. Holt, Wenyu Huang6, N. Islam-Faridi22, Steven J.M. Jones, M. Jones-Rhoades25, Richard A. Jorgensen19, Chandrashekhar P. Joshi11, Jaakko Kangasjärvi24, Jan Karlsson4, Colin T. Kelleher5, Robert Kirkpatrick, Matias Kirst16, Annegret Kohler9, Udaya C. Kalluri1, Frank W. Larimer1, Jim Leebens-Mack15, Jean-Charles Leplé9, Philip F. LoCascio1, Y. Lou6, Susan Lucas6, Francis Martin9, Barbara Montanini9, Carolyn A. Napoli19, David R. Nelson26, C D Nelson22, Kaisa Nieminen24, Ove Nilsson8, V. Pereda9, Gary F. Peter16, Ryan N. Philippe5, Gilles Pilate9, Alexander Poliakov18, J. Razumovskaya1, Paul G. Richardson6, Cécile Rinaldi9, Kermit Ritland5, Pierre Rouzé7, D. Ryaboy18, Jeremy Schmutz21, J. Schrader27, Bo Segerman4, H. Shin, Asim Siddiqui, Fredrik Sterky, Astrid Terry6, Chung-Jui Tsai11, Edward C. Uberbacher1, Per Unneberg, Jorma Vahala24, Kerr Wall13, Susan R. Wessler15, Guojun Yang15, T. Yin1, Carl J. Douglas5, Marco A. Marra, Göran Sandberg8, Y. Van de Peer7, Daniel S. Rokhsar17, Daniel S. Rokhsar6
15 Sep 2006-Science
TL;DR: The draft genome of the black cottonwood tree, Populus trichocarpa, has been reported in this paper, with more than 45,000 putative protein-coding genes identified.
Abstract: We report the draft genome of the black cottonwood tree, Populus trichocarpa. Integration of shotgun sequence assembly with genetic mapping enabled chromosome-scale reconstruction of the genome. More than 45,000 putative protein-coding genes were identified. Analysis of the assembled genome revealed a whole-genome duplication event; about 8000 pairs of duplicated genes from that event survived in the Populus genome. A second, older duplication event is indistinguishably coincident with the divergence of the Populus and Arabidopsis lineages. Nucleotide substitution, tandem gene duplication, and gross chromosomal rearrangement appear to proceed substantially more slowly in Populus than in Arabidopsis. Populus has more protein-coding genes than Arabidopsis, ranging on average from 1.4 to 1.6 putative Populus homologs for each Arabidopsis gene. However, the relative frequency of protein domains in the two genomes is similar. Overrepresented exceptions in Populus include genes associated with lignocellulosic wall biosynthesis, meristem development, disease resistance, and metabolite transport.

4,025 citations

Journal Article
29 Jan 2009-Nature
TL;DR: An initial analysis of the ∼730-megabase Sorghum bicolor (L.) Moench genome is presented, placing ∼98% of genes in their chromosomal context using whole-genome shotgun sequence validated by genetic, physical and syntenic information.
Abstract: Sorghum, an African grass related to sugar cane and maize, is grown for food, feed, fibre and fuel. We present an initial analysis of the approximately 730-megabase Sorghum bicolor (L.) Moench genome, placing approximately 98% of genes in their chromosomal context using whole-genome shotgun sequence validated by genetic, physical and syntenic information. Genetic recombination is largely confined to about one-third of the sorghum genome with gene order and density similar to those of rice. Retrotransposon accumulation in recombinationally recalcitrant heterochromatin explains the approximately 75% larger genome size of sorghum compared with rice. Although gene and repetitive DNA distributions have been preserved since palaeopolyploidization approximately 70 million years ago, most duplicated gene sets lost one member before the sorghum-rice divergence. Concerted evolution makes one duplicated chromosomal segment appear to be only a few million years old. About 24% of genes are grass-specific and 7% are sorghum-specific. Recent gene and microRNA duplications may contribute to sorghum's drought tolerance.

2,809 citations

Journal Article
Sabeeha S. Merchant1, Simon E. Prochnik2, Olivier Vallon3, Elizabeth H. Harris4, Steven J. Karpowicz1, George B. Witman5, Astrid Terry2, Asaf Salamov2, Lillian K. Fritz-Laylin6, Laurence Maréchal-Drouard7, Wallace F. Marshall8, Liang-Hu Qu9, David R. Nelson10, Anton A. Sanderfoot11, Martin H. Spalding12, Vladimir V. Kapitonov13, Qinghu Ren, Patrick J. Ferris14, Erika Lindquist2, Harris Shapiro2, Susan Lucas2, Jane Grimwood15, Jeremy Schmutz15, Pierre Cardol3, Pierre Cardol16, Heriberto Cerutti17, Guillaume Chanfreau1, Chun-Long Chen9, Valérie Cognat7, Martin T. Croft18, Rachel M. Dent6, Susan K. Dutcher19, Emilio Fernández20, Hideya Fukuzawa21, David González-Ballester22, Diego González-Halphen23, Armin Hallmann, Marc Hanikenne16, Michael Hippler24, William Inwood6, Kamel Jabbari25, Ming Kalanon26, Richard Kuras3, Paul A. Lefebvre11, Stéphane D. Lemaire27, Alexey V. Lobanov17, Martin Lohr28, Andrea L Manuell29, Iris Meier30, Laurens Mets31, Maria Mittag32, Telsa M. Mittelmeier33, James V. Moroney34, Jeffrey L. Moseley22, Carolyn A. Napoli33, Aurora M. Nedelcu35, Krishna K. Niyogi6, Sergey V. Novoselov17, Ian T. Paulsen, Greg Pazour5, Saul Purton36, Jean-Philippe Ral7, Diego Mauricio Riaño-Pachón37, Wayne R. Riekhof, Linda A. Rymarquis38, Michael Schroda, David B. Stern39, James G. Umen14, Robert D. Willows40, Nedra F. Wilson41, Sara L. Zimmer39, Jens Allmer42, Janneke Balk18, Katerina Bisova43, Chong-Jian Chen9, Marek Eliáš44, Karla C Gendler33, Charles R. Hauser45, Mary Rose Lamb46, Heidi K. Ledford6, Joanne C. Long1, Jun Minagawa47, M. Dudley Page1, Junmin Pan48, Wirulda Pootakham22, Sanja Roje49, Annkatrin Rose50, Eric Stahlberg30, Aimee M. Terauchi1, Pinfen Yang51, Steven G. Ball7, Chris Bowler25, Carol L. Dieckmann33, Vadim N. Gladyshev17, Pamela J. Green38, Richard A. Jorgensen33, Stephen P. Mayfield29, Bernd Mueller-Roeber37, Sathish Rajamani30, Richard T. Sayre30, Peter Brokstein2, Inna Dubchak2, David Goodstein2, Leila Hornick2, Y. Wayne Huang2, Jinal Jhaveri2, Yigong Luo2, Diego Martinez2, Wing Chi Abby Ngau2, Bobby Otillar2, Alexander Poliakov2, Aaron Porter2, Lukasz Szajkowski2, Gregory Werner2, Kemin Zhou2, Igor V. Grigoriev2, Daniel S. Rokhsar2, Daniel S. Rokhsar6, Arthur R. Grossman22
University of California, Los Angeles1, United States Department of Energy2, University of Paris3, Duke University4, University of Massachusetts Medical School5, University of California, Berkeley6, Centre national de la recherche scientifique7, University of California, San Francisco8, Sun Yat-sen University9, University of Tennessee Health Science Center10, University of Minnesota11, Iowa State University12, Genetic Information Research Institute13, Salk Institute for Biological Studies14, Stanford University15, University of Liège16, University of Nebraska–Lincoln17, University of Cambridge18, Washington University in St. Louis19, University of Córdoba (Spain)20, Kyoto University21, Carnegie Institution for Science22, National Autonomous University of Mexico23, University of Münster24, École Normale Supérieure25, University of Melbourne26, University of Paris-Sud27, University of Mainz28, Scripps Research Institute29, Ohio State University30, University of Chicago31, University of Jena32, University of Arizona33, Louisiana State University34, University of New Brunswick35, University College London36, University of Potsdam37, Delaware Biotechnology Institute38, Boyce Thompson Institute for Plant Research39, Macquarie University40, Oklahoma State University Center for Health Sciences41, İzmir University of Economics42, Academy of Sciences of the Czech Republic43, Charles University in Prague44, St. Edward's University45, University of Puget Sound46, Hokkaido University47, Tsinghua University48, Washington State University49, Appalachian State University50, Marquette University51
12 Oct 2007-Science
TL;DR: Analyses of the Chlamydomonas genome advance the understanding of the ancestral eukaryotic cell, reveal previously unknown genes associated with photosynthetic and flagellar functions, and establish links between ciliopathy and the composition and function of flagella.
Abstract: Chlamydomonas reinhardtii is a unicellular green alga whose lineage diverged from land plants over 1 billion years ago. It is a model system for studying chloroplast-based photosynthesis, as well as the structure, assembly, and function of eukaryotic flagella (cilia), which were inherited from the common ancestor of plants and animals, but lost in land plants. We sequenced the approximately 120-megabase nuclear genome of Chlamydomonas and performed comparative phylogenomic analyses, identifying genes encoding uncharacterized proteins that are likely associated with the function and biogenesis of chloroplasts or eukaryotic flagella. Analyses of the Chlamydomonas genome advance our understanding of the ancestral eukaryotic cell, reveal previously unknown genes associated with photosynthetic and flagellar functions, and establish links between ciliopathy and the composition and function of flagella.

2,554 citations

Journal Article
TL;DR: The VISTA family of tools created to assist biologists in carrying out comparative analysis of DNA sequences is described and capabilities of the site are illustrated by the analysis of a 180 kb interval on human chromosome 5 that encodes for the kinesin family member 3A (KIF3A) protein.
Abstract: Comparison of DNA sequences from different species is a fundamental method for identifying functional elements in genomes. Here, we describe the VISTA family of tools created to assist biologists in carrying out this task. Our first VISTA server at http://www-gsd.lbl.gov/vista/ was launched in the summer of 2000 and was designed to align long genomic sequences and visualize these alignments with associated functional annotations. Currently the VISTA site includes multiple comparative genomics tools and provides users with rich capabilities to browse pre-computed whole-genome alignments of large vertebrate genomes and other groups of organisms with VISTA Browser, to submit their own sequences of interest to several VISTA servers for various types of comparative analysis and to obtain detailed comparative analysis results for a set of cardiovascular genes. We illustrate capabilities of the VISTA site by the analysis of a 180 kb interval on human chromosome 5 that encodes for the kinesin family member 3A (KIF3A) protein.

1,986 citations


Cited by
28 Jul 2005
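TL;DR: PfPMP1 interacts with one or more receptors on infected erythrocytes, dendritic cells, and the placenta, and plays a key role in adhesion and immune evasion.
Abstract: Antigenic variation allows many pathogenic microorganisms to evade the host immune response. Plasmodium falciparum erythrocyte surface protein 1 (PfPMP1), expressed on the surface of infected erythrocytes, interacts with one or more receptors on infected erythrocytes, endothelial cells, dendritic cells, and the placenta, and plays a key role in adhesion and immune evasion. The var gene family in each haploid genome encodes about 60 members, and initiating transcription of different var gene variants provides the molecular basis for antigenic variation.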

18,940 citations

Journal Article
TL;DR: The new BLAST command-line applications, compared to the current BLAST tools, demonstrate substantial speed improvements for long queries as well as chromosome length database sequences.
Abstract: Sequence similarity searching is a very important bioinformatics task. While Basic Local Alignment Search Tool (BLAST) outperforms exact methods through its use of heuristics, the speed of the current BLAST software is suboptimal for very long queries or database sequences. There are also some shortcomings in the user-interface of the current command-line applications. We describe features and improvements of rewritten BLAST software and introduce new command-line applications. Long query sequences are broken into chunks for processing, in some cases leading to dramatically shorter run times. For long database sequences, it is possible to retrieve only the relevant parts of the sequence, reducing CPU time and memory usage for searches of short queries against databases of contigs or chromosomes. The program can now retrieve masking information for database sequences from the BLAST databases. A new modular software library can now access subject sequence data from arbitrary data sources. We introduce several new features, including strategy files that allow a user to save and reuse their favorite set of options. The strategy files can be uploaded to and downloaded from the NCBI BLAST web site. The new BLAST command-line applications, compared to the current BLAST tools, demonstrate substantial speed improvements for long queries as well as chromosome length database sequences. We have also improved the user interface of the command-line applications.

13,223 citations

Journal Article
14 Jan 2005-Cell
TL;DR: In a four-genome analysis of 3' UTRs, approximately 13,000 regulatory relationships were detected above the estimate of false-positive predictions, thereby implicating as miRNA targets more than 5300 human genes, which represented 30% of the gene set.

11,624 citations

Journal Article
TL;DR: In this article, the authors present an approach for efficient and intuitive visualization tools able to scale to very large data sets and to flexibly integrate multiple data types, including clinical data.
Abstract: Rapid improvements in sequencing and array-based platforms are resulting in a flood of diverse genome-wide data, including data from exome and whole-genome sequencing, epigenetic surveys, expression profiling of coding and noncoding RNAs, single nucleotide polymorphism (SNP) and copy number profiling, and functional assays. Analysis of these large, diverse data sets holds the promise of a more comprehensive understanding of the genome and its relation to human disease. Experienced and knowledgeable human review is an essential component of this process, complementing computational approaches. This calls for efficient and intuitive visualization tools able to scale to very large data sets and to flexibly integrate multiple data types, including clinical data. However, the sheer volume and scope of data pose a significant challenge to the development of such tools.

10,798 citations

Journal Article
TL;DR: A fully automated service for annotating bacterial and archaeal genomes that identifies protein-encoding, rRNA and tRNA genes, assigns functions to the genes, predicts which subsystems are represented in the genome, uses this information to reconstruct the metabolic network and makes the output easily downloadable for the user.
Abstract: The number of prokaryotic genome sequences becoming available is growing steadily and is growing faster than our ability to accurately annotate them. We describe a fully automated service for annotating bacterial and archaeal genomes. The service identifies protein-encoding, rRNA and tRNA genes, assigns functions to the genes, predicts which subsystems are represented in the genome, uses this information to reconstruct the metabolic network and makes the output easily downloadable for the user. In addition, the annotated genome can be browsed in an environment that supports comparative analysis with the annotated genomes maintained in the SEED environment. The service normally makes the annotated genome available within 12–24 hours of submission, but ultimately the quality of such a service will be judged in terms of accuracy, consistency, and completeness of the produced annotations. We summarize our attempts to address these issues and discuss plans for incrementally enhancing the service. By providing accurate, rapid annotation freely to the community we have created an important community resource. The service has now been utilized by over 120 external users annotating over 350 distinct genomes.

9,397 citations