Author

Nina Thayer

Other affiliations: Joint Genome Institute
Bio: Nina Thayer is an academic researcher from Los Alamos National Laboratory. The author has contributed to research on the topics Bacillus thuringiensis and Chromosome 16. The author has an h-index of 11 and has co-authored 14 publications receiving 2,274 citations. Previous affiliations of Nina Thayer include Joint Genome Institute.

Papers
Journal Article
TL;DR: This work assembled 89 scaffolds to generate 34 Mbp of nearly contiguous T. reesei genome sequence comprising 9,129 predicted gene models, providing a roadmap for constructing enhanced T. reesei strains for industrial applications such as biofuel production.
Abstract: Trichoderma reesei is the main industrial source of cellulases and hemicellulases used to depolymerize biomass to simple sugars that are converted to chemical intermediates and biofuels, such as ethanol. We assembled 89 scaffolds (sets of ordered and oriented contigs) to generate 34 Mbp of nearly contiguous T. reesei genome sequence comprising 9,129 predicted gene models. Unexpectedly, considering the industrial utility and effectiveness of the carbohydrate-active enzymes of T. reesei, its genome encodes fewer cellulases and hemicellulases than any other sequenced fungus able to hydrolyze plant cell wall polysaccharides. Many T. reesei genes encoding carbohydrate-active enzymes are distributed nonrandomly in clusters that lie between regions of synteny with other Sordariomycetes. Numerous genes encoding biosynthetic pathways for secondary metabolites may promote survival of T. reesei in its competitive soil habitat, but genome analysis provided little mechanistic insight into its extraordinary capacity for protein secretion. Our analysis, coupled with the genome sequence data, provides a roadmap for constructing enhanced T. reesei strains for industrial applications such as biofuel production.

1,085 citations

Journal Article
Jane Grimwood1, Laurie Gordon2, Laurie Gordon3, Anne S. Olsen3, Anne S. Olsen2, Astrid Terry3, Jeremy Schmutz1, Jane Lamerdin2, Jane Lamerdin3, Uffe Hellsten3, David Goodstein3, Olivier Couronne3, Mary Bao Tran-Gyamfi3, Mary Bao Tran-Gyamfi2, Andrea Aerts3, Michael R. Altherr4, Michael R. Altherr3, Linda K. Ashworth2, Linda K. Ashworth3, Eva Bajorek1, Stacey Black1, Elbert Branscomb2, Elbert Branscomb3, Sean Caenepeel3, Anthony V. Carrano2, Anthony V. Carrano3, Chenier Caoile1, Yee Man Chan1, Mari Christensen3, Mari Christensen2, Catherine A. Cleland3, Catherine A. Cleland4, Alex Copeland3, Eileen Dalin3, Paramvir S. Dehal3, Mirian Denys1, John C. Detter3, Julio Escobar1, Dave Flowers1, Dea Fotopulos1, Carmen Rosa Albacete García1, Anca M. Georgescu2, Anca M. Georgescu3, Tijana Glavina3, Maria Gomez1, Eidelyn Gonzales1, Matthew Groza2, Matthew Groza3, Nancy Hammon3, Trevor Hawkins3, Lauren Haydu1, Isaac Ho3, Wayne Huang3, Sanjay Israni3, Jamie Jett3, Kristen Kadner3, Heather Kimball3, Arthur Kobayashi3, Arthur Kobayashi2, Vladimer Larionov, Sun-Hee Leem, Frederick Lopez1, Yunian Lou3, Steve Lowry3, Stephanie Malfatti3, Stephanie Malfatti2, Diego Martinez3, Paula McCready3, Paula McCready2, Catherine Medina1, Jenna Morgan3, Kathryn Nelson4, Kathryn Nelson3, Matt Nolan3, Ivan Ovcharenko2, Ivan Ovcharenko3, Sam Pitluck3, Martin Pollard3, Anthony P. Popkie5, Paul Predki3, Glenda Quan2, Glenda Quan3, Lucía Ramírez1, Sam Rash3, James Retterer1, Alex Rodriguez1, Stephanine Rogers1, Asaf Salamov3, Angelica Salazar1, Xinwei She5, Doug Smith3, Tom Slezak2, Tom Slezak3, Victor V. Solovyev3, Nina Thayer4, Nina Thayer3, Hope Tice3, Ming Tsai1, Anna Ustaszewska3, Nu Vo1, Mark C. Wagner3, Mark C. Wagner2, Jeremy Wheeler1, Kevin Wu1, Gary Xie4, Gary Xie3, Joan Yang1, Inna Dubchak3, Terrence S. Furey6, Pieter J. deJong7, Mark Dickson1, David Gordon8, Evan E. Eichler5, Len A. Pennacchio3, Paul G. Richardson3, Lisa Stubbs2, Lisa Stubbs3, Daniel S. Rokhsar3, Richard M. Myers1, Edward M. Rubin3, Susan Lucas3
01 Apr 2004-Nature
TL;DR: Comparative analyses show a fascinating picture of conservation and divergence, revealing large blocks of gene orthology with rodents, scattered regions with more recent gene family expansions and deletions, and segments of coding and non-coding conservation with the distant fish species Takifugu.
Abstract: Chromosome 19 has the highest gene density of all human chromosomes, more than double the genome-wide average. The large clustered gene families, corresponding high G + C content, CpG islands and density of repetitive DNA indicate a chromosome rich in biological and evolutionary significance. Here we describe 55.8 million base pairs of highly accurate finished sequence representing 99.9% of the euchromatin portion of the chromosome. Manual curation of gene loci reveals 1,461 protein-coding genes and 321 pseudogenes. Among these are genes directly implicated in mendelian disorders, including familial hypercholesterolaemia and insulin-resistant diabetes. Nearly one-quarter of these genes belong to tandemly arranged families, encompassing more than 25% of the chromosome. Comparative analyses show a fascinating picture of conservation and divergence, revealing large blocks of gene orthology with rodents, scattered regions with more recent gene family expansions and deletions, and segments of coding and non-coding conservation with the distant fish species Takifugu.

307 citations

Journal Article
TL;DR: Whole-genome comparison of two members of the B. cereus group, B. thuringiensis 97-27 and B. cereus E33L, with pathogenic strains B. anthracis Ames and B. cereus G9241 and nonpathogenic strains B. cereus ATCC 10987 and B. cereus ATCC 14579 revealed differences in virulence, metabolic competence, structural components, and regulatory mechanisms.
Abstract: Bacillus anthracis, Bacillus cereus, and Bacillus thuringiensis are closely related gram-positive, spore-forming bacteria of the B. cereus sensu lato group. While independently derived strains of B. anthracis reveal conspicuous sequence homogeneity, environmental isolates of B. cereus and B. thuringiensis exhibit extensive genetic diversity. Here we report the sequencing and comparative analysis of the genomes of two members of the B. cereus group, B. thuringiensis 97-27 subsp. konkukian serotype H34, isolated from a necrotic human wound, and B. cereus E33L, which was isolated from a swab of a zebra carcass in Namibia. These two strains, when analyzed by amplified fragment length polymorphism within a collection of over 300 B. cereus, B. thuringiensis, and B. anthracis isolates, appear closely related to B. anthracis. The B. cereus E33L isolate appears to be the nearest relative to B. anthracis identified thus far. Whole-genome sequencing of B. thuringiensis 97-27 and B. cereus E33L was undertaken to identify shared and unique genes among these isolates in comparison to the genomes of pathogenic strains B. anthracis Ames and B. cereus G9241 and nonpathogenic strains B. cereus ATCC 10987 and B. cereus ATCC 14579. Comparison of these genomes revealed differences in terms of virulence, metabolic competence, structural components, and regulatory mechanisms.

231 citations

Journal Article
TL;DR: The complete DNA sequence of the aerobic cellulolytic soil bacterium Cytophaga hutchinsonii, which belongs to the phylum Bacteroidetes, is presented and many genes thought to encode proteins involved in cellulose utilization were identified.
Abstract: The complete DNA sequence of the aerobic cellulolytic soil bacterium Cytophaga hutchinsonii, which belongs to the phylum Bacteroidetes, is presented. The genome consists of a single, circular, 4.43-Mb chromosome containing 3,790 open reading frames, 1,986 of which have been assigned a tentative function. Two of the most striking characteristics of C. hutchinsonii are its rapid gliding motility over surfaces and its contact-dependent digestion of crystalline cellulose. The mechanism of C. hutchinsonii motility is not known, but its genome contains homologs for each of the gld genes that are required for gliding of the distantly related bacteroidete Flavobacterium johnsoniae. Cytophaga-Flavobacterium gliding appears to be novel and does not involve well-studied motility organelles such as flagella or type IV pili. Many genes thought to encode proteins involved in cellulose utilization were identified. These include candidate endo-β-1,4-glucanases and β-glucosidases. Surprisingly, obvious homologs of known cellobiohydrolases were not detected. Since such enzymes are needed for efficient cellulose digestion by well-studied cellulolytic bacteria, C. hutchinsonii either has novel cellobiohydrolases or has an unusual method of cellulose utilization. Genes encoding proteins with cohesin domains, which are characteristic of cellulosomes, were absent, but many proteins predicted to be involved in polysaccharide utilization had putative D5 domains, which are thought to be involved in anchoring proteins to the cell surface.

220 citations

Journal Article
Joel Martin1, Cliff Han2, Laurie Gordon1, Astrid Terry1, Shyam Prabhakar3, Xinwei She4, Gary Xie1, Gary Xie2, Uffe Hellsten1, Yee Man Chan5, Michael R. Altherr2, Michael R. Altherr1, Olivier Couronne3, Andrea Aerts1, Eva Bajorek5, Stacey Black5, Heather Blumer2, Elbert Branscomb1, Elbert Branscomb6, Nancy C. Brown2, William J. Bruno2, Judith M. Buckingham2, David F. Callen2, Connie S. Campbell2, Mary L. Campbell2, E.W. Campbell2, Chenier Caoile5, Jean F. Challacombe2, Leslie Chasteen2, Olga Chertkov2, Han C. Chi2, Mari Christensen6, Lynn M. Clark2, Judith D. Cohn2, Mirian Denys5, John C. Detter1, Mark Dickson5, Mira Dimitrijevic-Bussod2, Julio Escobar5, Joseph J. Fawcett2, Dave Flowers5, Dea Fotopulos5, Tijana Glavina1, Maria Gomez5, Eidelyn Gonzales5, David Goodstein1, Lynne Goodwin2, Deborah L. Grady2, Igor V. Grigoriev1, Matthew Groza6, Nancy Hammon1, Trevor Hawkins1, Lauren Haydu5, C.E. Hildebrand2, Wayne Huang1, Sanjay Israni1, Jamie Jett1, Phillip B. Jewett2, Kristen Kadner1, Heather Kimball1, Arthur Kobayashi6, Arthur Kobayashi1, Marie-Claude Krawczyk2, Tina Leyba2, Jonathan L. Longmire2, Frederick Lopez5, Yunian Lou1, Steve Lowry1, Thom Ludeman2, Chitra Manohar6, Graham A. Mark2, Kimberly L. Mcmurray2, Linda Meincke2, Jenna Morgan1, Robert K. Moyzis2, Mark Mundt2, A. Christine Munk2, Richard D. Nandkeshwar6, Sam Pitluck1, Martin Pollard1, Paul Predki1, B. Parson-Quintana2, Lucía Ramírez5, Sam Rash1, James Retterer5, Darryl O. Ricke2, Donna L. Robinson2, Alex Rodriguez5, Asaf Salamov1, Elizabeth Saunders2, D. Scott1, Timothy Shough2, Raymond L. Stallings2, Malinda Stalvey2, Robert D. Sutherland2, Roxanne Tapia2, Judith G. Tesmer2, Nina Thayer2, Nina Thayer1, Linda S. Thompson2, Hope Tice1, David C. Torney2, Mary Bao Tran-Gyamfi1, Ming Tsai5, Levy E. Ulanovsky2, Anna Ustaszewska1, Nu Vo5, P. Scott White2, Albert L. Williams2, Patricia L. Wills2, Jung-Rung Wu2, Kevin Wu5, Joan Yang5, Pieter J. deJong7, David Bruce2, Norman A. Doggett2, Larry L. Deaven2, Jeremy Schmutz5, Jane Grimwood5, Paul G. Richardson1, Daniel S. Rokhsar1, Evan E. Eichler4, Paul Gilna2, Susan Lucas1, Richard M. Myers5, Edward M. Rubin1, Edward M. Rubin3, Len A. Pennacchio3, Len A. Pennacchio1
23 Dec 2004-Nature
TL;DR: The 78,884,754 base pairs of finished chromosome 16 sequence, representing over 99.9% of its euchromatin, revealed 880 protein-coding genes, including metallothionein, cadherin and iroquois gene families, as well as the disease genes for polycystic kidney disease and acute myelomonocytic leukaemia.
Abstract: Human chromosome 16 features one of the highest levels of segmentally duplicated sequence among the human autosomes. We report here the 78,884,754 base pairs of finished chromosome 16 sequence, representing over 99.9% of its euchromatin. Manual annotation revealed 880 protein-coding genes confirmed by 1,670 aligned transcripts, 19 transfer RNA genes, 341 pseudogenes and three RNA pseudogenes. These genes include metallothionein, cadherin and iroquois gene families, as well as the disease genes for polycystic kidney disease and acute myelomonocytic leukaemia. Several large-scale structural polymorphisms spanning hundreds of kilobase pairs were identified and result in gene content differences among humans. Whereas the segmental duplications of chromosome 16 are enriched in the relatively gene-poor pericentromere of the p arm, some are involved in recent gene duplication and conversion events that are likely to have had an impact on the evolution of primates and human disease susceptibility.

146 citations


Cited by
Journal Article
21 Oct 2004-Nature
TL;DR: The current human genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps and is accurate to an error rate of approximately 1 event per 100,000 bases.
Abstract: The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers approximately 99% of the euchromatic genome and is accurate to an error rate of approximately 1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.

3,989 citations

Journal Article
LaDeana W. Hillier1, Webb Miller2, Ewan Birney, Wesley C. Warren1, and 171 more authors (39 institutions)
09 Dec 2004-Nature
TL;DR: A draft genome sequence of the red jungle fowl, Gallus gallus, provides a new perspective on vertebrate genome evolution, while also improving the annotation of mammalian genomes.
Abstract: We present here a draft genome sequence of the red jungle fowl, Gallus gallus. Because the chicken is a modern descendant of the dinosaurs and the first non-mammalian amniote to have its genome sequenced, the draft sequence of its genome (composed of approximately one billion base pairs of sequence and an estimated 20,000-23,000 genes) provides a new perspective on vertebrate genome evolution, while also improving the annotation of mammalian genomes. For example, the evolutionary distance between chicken and human provides high specificity in detecting functional elements, both non-coding and coding. Notably, many conserved non-coding sequences are far from genes and cannot be assigned to defined functional classes. In coding regions the evolutionary dynamics of protein domains and orthologous groups illustrate processes that distinguish the lineages leading to birds and mammals. The distinctive properties of avian microchromosomes, together with the inferred patterns of conserved synteny, provide additional insights into vertebrate chromosome architecture.

2,579 citations

Journal Article
TL;DR: Rapidly accumulating evidence indicates that structural variants can comprise millions of nucleotides of heterogeneity within every genome, and are likely to make an important contribution to human diversity and disease susceptibility.
Abstract: The first wave of information from the analysis of the human genome revealed SNPs to be the main source of genetic and phenotypic human variation. However, the advent of genome-scanning technologies has now uncovered an unexpectedly large extent of what we term 'structural variation' in the human genome. This comprises microscopic and, more commonly, submicroscopic variants, which include deletions, duplications and large-scale copy-number variants - collectively termed copy-number variants or copy-number polymorphisms - as well as insertions, inversions and translocations. Rapidly accumulating evidence indicates that structural variants can comprise millions of nucleotides of heterogeneity within every genome, and are likely to make an important contribution to human diversity and disease susceptibility.

1,804 citations

Journal Article
TL;DR: An analysis of 1,391 manually curated sequence-specific DNA-binding transcription factors, their functions, genomic organization and evolutionary conservation provides a solid foundation for future investigations to elucidate regulatory mechanisms underlying diverse mammalian biological processes.
Abstract: Transcription factors are key cellular components that control gene expression: their activities determine how cells function and respond to the environment. Currently, there is great interest in research into human transcriptional regulation. However, surprisingly little is known about these regulators themselves. For example, how many transcription factors does the human genome contain? How are they expressed in different tissues? Are they evolutionarily conserved? Here, we present an analysis of 1,391 manually curated sequence-specific DNA-binding transcription factors, their functions, genomic organization and evolutionary conservation. Much remains to be explored, but this study provides a solid foundation for future investigations to elucidate regulatory mechanisms underlying diverse mammalian biological processes.

1,489 citations

Journal Article
TL;DR: An increased understanding of the disorder's underlying genetic, molecular, and cellular mechanisms and a better appreciation of its progression and systemic manifestations have laid out the foundation for the development of clinical trials and potentially effective treatments.

1,319 citations