Author

Kerr Wall

Bio: Kerr Wall is an academic researcher from Pennsylvania State University. The author has contributed to research on the topics Gene and Genome. The author has an h-index of 9 and has co-authored 10 publications receiving 5,126 citations.

Papers
Journal ArticleDOI
Gerald A. Tuskan1, Gerald A. Tuskan2, Stephen P. DiFazio2, Stephen P. DiFazio3, Stefan Jansson4, Joerg Bohlmann5, Igor V. Grigoriev6, Uffe Hellsten6, Nicholas H. Putnam6, Steven G. Ralph5, Stephane Rombauts7, Asaf Salamov6, Jacquie Schein, Lieven Sterck7, Andrea Aerts6, Rishikeshi Bhalerao4, Rishikesh P. Bhalerao8, Damien Blaudez9, Wout Boerjan7, Annick Brun9, Amy M. Brunner10, Victor Busov11, Malcolm M. Campbell12, John E. Carlson13, Michel Chalot9, Jarrod Chapman6, G.-L. Chen2, Dawn Cooper5, Pedro M. Coutinho14, Jérémy Couturier9, Sarah F. Covert15, Quentin C. B. Cronk5, R. Cunningham2, John M. Davis16, Sven Degroeve7, Annabelle Déjardin9, Claude W. dePamphilis13, John C. Detter6, Bill Dirks17, Inna Dubchak6, Inna Dubchak18, Sébastien Duplessis9, Jürgen Ehlting5, Brian E. Ellis5, Karla C Gendler19, David Goodstein6, Michael Gribskov20, Jane Grimwood21, Andrew Groover22, Lee E. Gunter2, Björn Hamberger5, Berthold Heinze, Yrjö Helariutta8, Yrjö Helariutta23, Yrjö Helariutta24, Bernard Henrissat14, D. Holligan15, Robert A. Holt, Wenyu Huang6, N. Islam-Faridi22, Steven J.M. Jones, M. Jones-Rhoades25, Richard A. Jorgensen19, Chandrashekhar P. Joshi11, Jaakko Kangasjärvi23, Jan Karlsson4, Colin T. Kelleher5, Robert Kirkpatrick, Matias Kirst16, Annegret Kohler9, Udaya C. Kalluri2, Frank W. Larimer2, Jim Leebens-Mack15, Jean-Charles Leplé9, Philip F. LoCascio2, Y. Lou6, Susan Lucas6, Francis Martin9, Barbara Montanini9, Carolyn A. Napoli19, David R. Nelson26, C D Nelson22, Kaisa Nieminen23, Ove Nilsson8, V. Pereda9, Gary F. Peter16, Ryan N. Philippe5, Gilles Pilate9, Alexander Poliakov18, J. Razumovskaya2, Paul G. Richardson6, Cécile Rinaldi9, Kermit Ritland5, Pierre Rouzé7, D. Ryaboy18, Jeremy Schmutz21, J. Schrader27, Bo Segerman4, H. Shin, Asim Siddiqui, Fredrik Sterky, Astrid Terry6, Chung-Jui Tsai11, Edward C. Uberbacher2, Per Unneberg, Jorma Vahala23, Kerr Wall13, Susan R. Wessler15, Guojun Yang15, T. Yin2, Carl J. Douglas5, Marco A. Marra, Göran Sandberg8, Y. Van de Peer7, Daniel S. Rokhsar17, Daniel S. Rokhsar6
15 Sep 2006-Science
TL;DR: This paper reports the draft genome of the black cottonwood tree, Populus trichocarpa, in which more than 45,000 putative protein-coding genes were identified.
Abstract: We report the draft genome of the black cottonwood tree, Populus trichocarpa. Integration of shotgun sequence assembly with genetic mapping enabled chromosome-scale reconstruction of the genome. More than 45,000 putative protein-coding genes were identified. Analysis of the assembled genome revealed a whole-genome duplication event; about 8000 pairs of duplicated genes from that event survived in the Populus genome. A second, older duplication event is indistinguishably coincident with the divergence of the Populus and Arabidopsis lineages. Nucleotide substitution, tandem gene duplication, and gross chromosomal rearrangement appear to proceed substantially more slowly in Populus than in Arabidopsis. Populus has more protein-coding genes than Arabidopsis, ranging on average from 1.4 to 1.6 putative Populus homologs for each Arabidopsis gene. However, the relative frequency of protein domains in the two genomes is similar. Overrepresented exceptions in Populus include genes associated with lignocellulosic wall biosynthesis, meristem development, disease resistance, and metabolite transport.

4,025 citations

Journal ArticleDOI
24 Apr 2008-Nature
TL;DR: Papaya offers numerous advantages as a system for fruit-tree functional genomics, and this draft genome sequence provides the foundation for revealing the basis of Carica’s distinguishing morpho-physiological, medicinal and nutritional properties.
Abstract: Papaya, a fruit crop cultivated in tropical and subtropical regions, is known for its nutritional benefits and medicinal applications. Here we report a 3x draft genome sequence of 'SunUp' papaya, the first commercial virus-resistant transgenic fruit tree to be sequenced. The papaya genome is three times the size of the Arabidopsis genome, but contains fewer genes, including significantly fewer disease-resistance gene analogues. Comparison of the five sequenced genomes suggests a minimal angiosperm gene set of 13,311. A lack of recent genome duplication, atypical of other angiosperm genomes sequenced so far, may account for the smaller papaya gene number in most functional groups. Nonetheless, striking amplifications in gene number within particular functional groups suggest roles in the evolution of tree-like habit, deposition and remobilization of starch reserves, attraction of seed dispersal agents, and adaptation to tropical daylengths. Transgenesis at three locations is closely associated with chloroplast insertions into the nuclear genome, and with topoisomerase I recognition sites. Papaya offers numerous advantages as a system for fruit-tree functional genomics, and this draft genome sequence provides the foundation for revealing the basis of Carica's distinguishing morpho-physiological, medicinal and nutritional properties.

1,028 citations

Journal ArticleDOI
TL;DR: These studies indicate that the AP2 domain was duplicated prior to the divergence of the two major lineages of AP2-like genes, euAP2 and AINTEGUMENTA (ANT), and show that the euAP2 homologue from Amborella trichopoda is expressed in all floral organs as well as leaves.
Abstract: The combined processes of gene duplication, nucleotide substitution, domain duplication, and intron/exon shuffling can generate a complex set of related genes that may differ substantially in their expression patterns and functions. The APETALA2-like (AP2-like) gene family exhibits patterns of both gene and domain duplication, coupled with changes in sequence, exon arrangement, and expression. In angiosperms, these genes perform an array of functions including the establishment of the floral meristem, the specification of floral organ identity, the regulation of floral homeotic gene expression, the regulation of ovule development, and the growth of floral organs. To determine patterns of gene diversification, we conducted a series of broad phylogenetic analyses of AP2-like sequences from green plants. These studies indicate that the AP2 domain was duplicated prior to the divergence of the two major lineages of AP2-like genes, euAP2 and AINTEGUMENTA (ANT). Structural features of the AP2-like genes as well as phylogenetic analyses of nucleotide and amino acid (aa) sequences of the AP2-like gene family support the presence of the two major lineages. The ANT lineage is supported by a 10-aa insertion in the AP2-R1 domain and a 1-aa insertion in the AP2-R2 domain, relative to all other members of the AP2-like family. MicroRNA172-binding sequences, the function of which has been studied in some of the AP2-like genes in Arabidopsis, are restricted to the euAP2 lineage. Within the ANT lineage, the euANT lineage is characterized by four conserved motifs: one in the 10-aa insertion in the AP2-R1 domain (euANT1) and three in the predomain region (euANT2, euANT3, and euANT4). Our expression studies show that the euAP2 homologue from Amborella trichopoda, the putative sister to all other angiosperms, is expressed in all floral organs as well as leaves.

201 citations

Journal ArticleDOI
TL;DR: Comparative analysis of miRNA sequences from plants of phylogenetically critical basal lineages aids the study of the evolutionary gains and losses of miRNAs in plants as well as their conservation, and leads to discoveries about the miRNAs of even well-studied model organisms.
Abstract: MicroRNAs (miRNAs) negatively control gene expression by cleaving or inhibiting the translation of mRNA of target genes, and as such, they play an important role in plant development. Of the 79 plant miRNA families discovered to date, most are from the fully sequenced plant genomes of Arabidopsis, Populus and rice. Here, we identified miRNAs from leaves, roots, stems and flowers at different developmental stages of the basal eudicot species Eschscholzia californica (California poppy) using cloning and capillary sequencing, as well as ultrahigh-throughput pyrosequencing using the recently introduced 454 sequencing method. In total, we identified a minimum of 173 unique miRNA sequences belonging to 28 miRNA families and seven trans-acting small interfering RNAs (ta-siRNAs) conserved in eudicot and monocot species. miR529 and miR537, which have not yet been reported in eudicot species, were detected in California poppy; loci encoding these miRNAs were also found in Arabidopsis and Populus. miR535, which occurs in the moss Physcomitrella patens, was also detected in California poppy, but not in other angiosperms. Several potential miRNA targets were found in cDNA sequences of California poppy. Predicted target genes include transcription factors but also genes implicated in various metabolic processes and in stress defense. Comparative analysis of miRNAs from plants of phylogenetically critical basal lineages aids the study of the evolutionary gains and losses of miRNAs in plants as well as their conservation, and leads to discoveries about the miRNAs of even well-studied model organisms.

84 citations

Journal ArticleDOI
TL;DR: Two types of EST clustering error are identified and a novel statistical approach is proposed to correct ISO error to provide more accurate estimates of the true gene cluster profile.
Abstract: Motivation: The gene expression intensity information conveyed by Expressed Sequence Tag (EST) data can be used to infer important cDNA library properties, such as gene number and expression patterns. However, EST clustering errors, which often lead to greatly inflated estimates of the number of unique genes obtained, have become a major obstacle in these analyses. The EST clustering error structure, the relationship between clustering error and clustering criteria, and possible error correction methods need to be systematically investigated. Results: We identify and quantify two types of EST clustering error, namely Type I and Type II, in EST clustering using the CAP3 assembly program. A Type I error occurs when ESTs from the same gene do not form a cluster, whereas a Type II error occurs when ESTs from distinct genes are falsely clustered together. While the Type II error rate is <1.5% for both 5' and 3' EST clustering, the Type I error in the 5' EST case is ∼10 times higher than in the 3' EST case (30% versus 3%). An over-stringent identity rule, e.g., P ≥ 95%, may even inflate the Type I error in both cases. We demonstrate that ∼80% of the Type I error is due to insufficient overlap among sibling ESTs (ISO error) in 5' EST clustering. A novel statistical approach is proposed to correct ISO error to provide more accurate estimates of the true gene cluster profile. Availability: We have automated the methods developed in this paper in a web-based tool, ESTstat, at http://cwdg5.bio.psu.edu/eststat. Supplementary information: http://cwdg5.bio.psu.edu/eststat
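The two error types defined in this abstract can be illustrated with a minimal Python sketch. This is not the paper's method or the ESTstat implementation. The function name `clustering_error_rates`, the dictionary inputs, and the gene-level and cluster-level rate definitions are all assumptions made for illustration, since the abstract does not state how the rates are normalized.

```python
from collections import defaultdict


def clustering_error_rates(true_gene, cluster_id):
    """Return (type_i_rate, type_ii_rate) for an EST clustering.

    true_gene:  dict mapping EST name -> known gene of origin (hypothetical input)
    cluster_id: dict mapping EST name -> cluster assigned by the assembler

    Assumed definitions (not taken from the paper):
      Type I rate  = fraction of genes whose ESTs are split across >1 cluster
      Type II rate = fraction of clusters that contain ESTs from >1 gene
    """
    if not true_gene:
        raise ValueError("no ESTs supplied")

    clusters_per_gene = defaultdict(set)
    genes_per_cluster = defaultdict(set)
    for est, gene in true_gene.items():
        cluster = cluster_id[est]
        clusters_per_gene[gene].add(cluster)
        genes_per_cluster[cluster].add(gene)

    # Type I: ESTs from the same gene fail to form a single cluster.
    split_genes = sum(1 for c in clusters_per_gene.values() if len(c) > 1)
    # Type II: ESTs from distinct genes are clustered together.
    mixed_clusters = sum(1 for g in genes_per_cluster.values() if len(g) > 1)

    return (split_genes / len(clusters_per_gene),
            mixed_clusters / len(genes_per_cluster))


# Toy data: geneA is split over c1 and c2; geneB and geneC share c3.
toy_genes = {"e1": "geneA", "e2": "geneA", "e3": "geneB", "e4": "geneB", "e5": "geneC"}
toy_clusters = {"e1": "c1", "e2": "c2", "e3": "c3", "e4": "c3", "e5": "c3"}
type_i, type_ii = clustering_error_rates(toy_genes, toy_clusters)
```

In the toy data, one of three genes is split (a Type I error) and one of three clusters is mixed (a Type II error), so both rates come out to one third under the assumed definitions.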

72 citations


Cited by
Journal ArticleDOI
TL;DR: The Carbohydrate-Active Enzyme (CAZy) database is a knowledge-based resource specialized in the enzymes that build and break down complex carbohydrates and glycoconjugates and has been used to improve the quality of functional predictions of a number of genome projects by providing expert annotation.
Abstract: The Carbohydrate-Active Enzyme (CAZy) database is a knowledge-based resource specialized in the enzymes that build and break down complex carbohydrates and glycoconjugates. As of September 2008, the database describes the present knowledge on 113 glycoside hydrolase, 91 glycosyltransferase, 19 polysaccharide lyase, 15 carbohydrate esterase and 52 carbohydrate-binding module families. These families are created based on experimentally characterized proteins and are populated by sequences from public databases with significant similarity. Protein biochemical information is continuously curated based on the available literature and structural information. Over 6400 proteins have assigned EC numbers and 700 proteins have a PDB structure. The classification (i) reflects the structural features of these enzymes better than their sole substrate specificity, (ii) helps to reveal the evolutionary relationships between these enzymes and (iii) provides a convenient framework to understand mechanistic properties. This resource has been available for over 10 years to the scientific community, contributing to information dissemination and providing a transversal nomenclature to glycobiologists. More recently, this resource has been used to improve the quality of functional predictions of a number of genome projects by providing expert annotation. The CAZy resource resides at URL: http://www.cazy.org/.

6,028 citations

Journal ArticleDOI
14 Jan 2010-Nature
TL;DR: An accurate soybean genome sequence will facilitate the identification of the genetic basis of many soybean traits, and accelerate the creation of improved soybean varieties.
Abstract: Soybean (Glycine max) is one of the most important crop plants for seed protein and oil content, and for its capacity to fix atmospheric nitrogen through symbioses with soil-borne microorganisms. We sequenced the 1.1-gigabase genome by a whole-genome shotgun approach and integrated it with physical and high-density genetic maps to create a chromosome-scale draft sequence assembly. We predict 46,430 protein-coding genes, 70% more than Arabidopsis and similar to the poplar genome which, like soybean, is an ancient polyploid (palaeopolyploid). About 78% of the predicted genes occur in chromosome ends, which comprise less than one-half of the genome but account for nearly all of the genetic recombination. Genome duplications occurred at approximately 59 and 13 million years ago, resulting in a highly duplicated genome with nearly 75% of the genes present in multiple copies. The two duplication events were followed by gene diversification and loss, and numerous chromosome rearrangements. An accurate soybean genome sequence will facilitate the identification of the genetic basis of many soybean traits, and accelerate the creation of improved soybean varieties.

3,743 citations

Journal ArticleDOI
TL;DR: Phytozome provides a view of the evolutionary history of every plant gene at the level of sequence, gene structure, gene family and genome organization, while at the same time providing access to the sequences and functional annotations of a growing number of complete plant genomes.
Abstract: The number of sequenced plant genomes and associated genomic resources is growing rapidly with the advent of both an increased focus on plant genomics from funding agencies, and the application of inexpensive next generation sequencing. To interact with this increasing body of data, we have developed Phytozome (http://www.phytozome.net), a comparative hub for plant genome and gene family data and analysis. Phytozome provides a view of the evolutionary history of every plant gene at the level of sequence, gene structure, gene family and genome organization, while at the same time providing access to the sequences and functional annotations of a growing number (currently 25) of complete plant genomes, including all the land plants and selected algae sequenced at the Joint Genome Institute, as well as selected species sequenced elsewhere. Through a comprehensive plant genome database and web portal, these data and analyses are available to the broader plant science research community, providing powerful comparative genomics tools that help to link model systems with other plants of economic and ecological importance.

3,728 citations

Journal ArticleDOI
26 Aug 2007-Nature
TL;DR: A high-quality draft of the genome sequence of grapevine is obtained from a highly homozygous genotype, revealing the contribution of three ancestral genomes to the grapevine haploid content and explaining the chronology of previously described whole-genome duplication events in the evolution of flowering plants.
Abstract: The analysis of the first plant genomes provided unexpected evidence for genome duplication events in species that had previously been considered as true diploids on the basis of their genetics. These polyploidization events may have had important consequences in plant evolution, in particular for species radiation and adaptation and for the modulation of functional capacities. Here we report a high-quality draft of the genome sequence of grapevine (Vitis vinifera) obtained from a highly homozygous genotype. The draft sequence of the grapevine genome is the fourth one produced so far for flowering plants, the second for a woody species and the first for a fruit crop (cultivated for both fruit and beverage). Grapevine was selected because of its important place in the cultural heritage of humanity beginning during the Neolithic period. Several large expansions of gene families with roles in aromatic features are observed. The grapevine genome has not undergone recent genome duplication, thus enabling the discovery of ancestral traits and features of the genetic organization of flowering plants. This analysis reveals the contribution of three ancestral genomes to the grapevine haploid content. This ancestral arrangement is common to many dicotyledonous plants but is absent from the genome of rice, which is a monocotyledon. Furthermore, we explain the chronology of previously described whole-genome duplication events in the evolution of flowering plants.

3,311 citations

Book ChapterDOI
31 Jan 1963

2,885 citations