Author
Per Unneberg
Other affiliations: Centre national de la recherche scientifique, Royal Institute of Technology, Uppsala University
Bio: Per Unneberg is an academic researcher from Science for Life Laboratory. The author has contributed to research on the topics Gene and Expressed sequence tag. The author has an h-index of 17 and has co-authored 26 publications receiving 5,529 citations. Previous affiliations of Per Unneberg include Centre national de la recherche scientifique and Royal Institute of Technology.
Topics: Gene, Expressed sequence tag, Genome, Population, Candidate gene
Papers
••
University of Tennessee1, Oak Ridge National Laboratory2, West Virginia University3, Umeå University4, University of British Columbia5, United States Department of Energy6, Ghent University7, Swedish University of Agricultural Sciences8, Institut national de la recherche agronomique9, Virginia Tech10, Michigan Technological University11, University of Toronto12, Pennsylvania State University13, University of Provence14, University of Georgia15, University of Florida16, University of California, Berkeley17, Lawrence Berkeley National Laboratory18, University of Arizona19, Purdue University20, Stanford University21, United States Department of Agriculture22, University of Turku23, University of Helsinki24, Massachusetts Institute of Technology25, University of Tennessee Health Science Center26, University of Tübingen27
TL;DR: This paper reports the draft genome of the black cottonwood tree, Populus trichocarpa, in which more than 45,000 putative protein-coding genes were identified.
Abstract: We report the draft genome of the black cottonwood tree, Populus trichocarpa. Integration of shotgun sequence assembly with genetic mapping enabled chromosome-scale reconstruction of the genome. More than 45,000 putative protein-coding genes were identified. Analysis of the assembled genome revealed a whole-genome duplication event; about 8000 pairs of duplicated genes from that event survived in the Populus genome. A second, older duplication event is indistinguishably coincident with the divergence of the Populus and Arabidopsis lineages. Nucleotide substitution, tandem gene duplication, and gross chromosomal rearrangement appear to proceed substantially more slowly in Populus than in Arabidopsis. Populus has more protein-coding genes than Arabidopsis, ranging on average from 1.4 to 1.6 putative Populus homologs for each Arabidopsis gene. However, the relative frequency of protein domains in the two genomes is similar. Overrepresented exceptions in Populus include genes associated with lignocellulosic wall biosynthesis, meristem development, disease resistance, and metabolite transport.
4,025 citations
••
TL;DR: Characterization of genomic differentiation in a classic example of hybridization between all-black carrion crows and gray-coated hooded crows identified genome-wide introgression extending far beyond the morphological hybrid zone, indicating localized genomic selection can cause marked heterogeneity in introgressive landscapes while maintaining phenotypic divergence.
Abstract: The importance, extent, and mode of interspecific gene flow for the evolution of species has long been debated. Characterization of genomic differentiation in a classic example of hybridization between all-black carrion crows and gray-coated hooded crows identified genome-wide introgression extending far beyond the morphological hybrid zone. Gene expression divergence was concentrated in pigmentation genes expressed in gray versus black feather follicles. Only a small number of narrow genomic islands exhibited resistance to gene flow. One prominent genomic region (<2 megabases) harbored 81 of all 82 fixed differences (of 8.4 million single-nucleotide polymorphisms in total) linking genes involved in pigmentation and in visual perception, a genomic signal reflecting color-mediated prezygotic isolation. Thus, localized genomic selection can cause marked heterogeneity in introgression landscapes while maintaining phenotypic divergence.
495 citations
••
TL;DR: The H-Invitational Database (H-InvDB) is a human gene database built on an integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level.
Abstract: The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA genes. 
In addition, among 72,027 uniquely mapped SNPs and insertions/deletions localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology.
341 citations
••
TL;DR: The coding content of Populus and Arabidopsis genomes shows very high similarity, indicating that differences between these annual and perennial angiosperm life forms result primarily from differences in gene regulation.
Abstract: Trees present a life form of paramount importance for terrestrial ecosystems and human societies because of their ecological structure and physiological function and provision of energy and industrial materials. The genus Populus is the internationally accepted model for molecular tree biology. We have analyzed 102,019 Populus ESTs that clustered into 11,885 clusters and 12,759 singletons. We also provide >4,000 assembled full clone sequences to serve as a basis for the upcoming annotation of the Populus genome sequence. A public web-based EST database (populusdb) provides digital expression profiles for 18 tissues that comprise the majority of differentiated organs. The coding content of Populus and Arabidopsis genomes shows very high similarity, indicating that differences between these annual and perennial angiosperm life forms result primarily from differences in gene regulation. The high similarity between Populus and Arabidopsis will allow studies of Populus to directly benefit from the detailed functional genomic information generated for Arabidopsis, enabling detailed insights into tree development and adaptation. These data will also be valuable for functional genomic efforts in Arabidopsis.
340 citations
••
TL;DR: Data demonstrated that mutations in two genes, IRF6 and GRHL3, can lead to nearly identical phenotypes of orofacial cleft and supported the hypotheses that both genes are essential for the presence of a functional oral periderm and that failure of this process contributes to VWS.
Abstract: Mutations in interferon regulatory factor 6 (IRF6) account for ∼70% of cases of Van der Woude syndrome (VWS), the most common syndromic form of cleft lip and palate. In 8 of 45 VWS-affected families lacking a mutation in IRF6, we found coding mutations in grainyhead-like 3 (GRHL3). According to a zebrafish-based assay, the disease-associated GRHL3 mutations abrogated periderm development and were consistent with a dominant-negative effect, in contrast to haploinsufficiency seen in most VWS cases caused by IRF6 mutations. In mouse, all embryos lacking Grhl3 exhibited abnormal oral periderm and 17% developed a cleft palate. Analysis of the oral phenotype of double heterozygote (Irf6+/−;Grhl3+/−) murine embryos failed to detect epistasis between the two genes, suggesting that they function in separate but convergent pathways during palatogenesis. Taken together, our data demonstrated that mutations in two genes, IRF6 and GRHL3, can lead to nearly identical phenotypes of orofacial cleft. They supported the hypotheses that both genes are essential for the presence of a functional oral periderm and that failure of this process contributes to VWS.
186 citations
Cited by
••
TL;DR: The Carbohydrate-Active Enzyme (CAZy) database is a knowledge-based resource specialized in the enzymes that build and break down complex carbohydrates and glycoconjugates. It has been used to improve the quality of functional predictions of a number of genome projects by providing expert annotation.
Abstract: The Carbohydrate-Active Enzyme (CAZy) database is a knowledge-based resource specialized in the enzymes that build and break down complex carbohydrates and glycoconjugates. As of September 2008, the database describes the present knowledge on 113 glycoside hydrolase, 91 glycosyltransferase, 19 polysaccharide lyase, 15 carbohydrate esterase and 52 carbohydrate-binding module families. These families are created based on experimentally characterized proteins and are populated by sequences from public databases with significant similarity. Protein biochemical information is continuously curated based on the available literature and structural information. Over 6400 proteins have assigned EC numbers and 700 proteins have a PDB structure. The classification (i) reflects the structural features of these enzymes better than their sole substrate specificity, (ii) helps to reveal the evolutionary relationships between these enzymes and (iii) provides a convenient framework to understand mechanistic properties. This resource has been available for over 10 years to the scientific community, contributing to information dissemination and providing a transversal nomenclature to glycobiologists. More recently, this resource has been used to improve the quality of functional predictions of a number of genome projects by providing expert annotation. The CAZy resource resides at URL: http://www.cazy.org/.
6,028 citations
••
TL;DR: This research presents a novel and scalable approach to genome engineering that addresses the challenge of integrating RNAseq data to provide real-time information about the “silent” response of the immune system to DNA editing.
Authors and affiliations (list truncated at the start; no abstract text available): White1, J. Li1, W. Liang1, N. Bhagabati1, J. Braisted1, M. Klapa1, T. Currier1, M. Thiagarajan1, A. Sturn1, M. Snuffin2, A. Rezantsev2, D. Popov2, A. Ryltsov2, E. Kostukovich2, I. Borisovsky2, Z. Liu3, A. Vinsavich3, V. Trush3, and J. Quackenbush1,4. 1The Institute for Genomic Research, Rockville, MD; 2DataNaut, Bethesda, MD; 3Syntek Systems, Bethesda, MD; and 4Department of Biochemistry, George Washington University, Washington, D.C., USA
4,756 citations
01 Feb 2015
TL;DR: In this article, the authors describe the integrative analysis of 111 reference human epigenomes generated as part of the NIH Roadmap Epigenomics Consortium, profiled for histone modification patterns, DNA accessibility, DNA methylation and RNA expression.
Abstract: The reference human genome sequence set the stage for studies of genetic variation and its association with human disease, but epigenomic studies lack a similar reference. To address this need, the NIH Roadmap Epigenomics Consortium generated the largest collection so far of human epigenomes for primary cells and tissues. Here we describe the integrative analysis of 111 reference human epigenomes generated as part of the programme, profiled for histone modification patterns, DNA accessibility, DNA methylation and RNA expression. We establish global maps of regulatory elements, define regulatory modules of coordinated activity, and their likely activators and repressors. We show that disease- and trait-associated genetic variants are enriched in tissue-specific epigenomic marks, revealing biologically relevant cell types for diverse human traits, and providing a resource for interpreting the molecular basis of human disease. Our results demonstrate the central role of epigenomic information for understanding gene regulation, cellular differentiation and human disease.
4,409 citations
••
Agricultural Research Service1, Purdue University2, University of North Carolina at Charlotte3, University of California, Berkeley4, University of Arizona5, University of Maryland, College Park6, University of Missouri7, Joint Genome Institute8, National Center for Genome Resources9, Iowa State University10, University of Wisconsin–Stevens Point11, University of Nebraska–Lincoln12
TL;DR: An accurate soybean genome sequence will facilitate the identification of the genetic basis of many soybean traits, and accelerate the creation of improved soybean varieties.
Abstract: Soybean (Glycine max) is one of the most important crop plants for seed protein and oil content, and for its capacity to fix atmospheric nitrogen through symbioses with soil-borne microorganisms. We sequenced the 1.1-gigabase genome by a whole-genome shotgun approach and integrated it with physical and high-density genetic maps to create a chromosome-scale draft sequence assembly. We predict 46,430 protein-coding genes, 70% more than Arabidopsis and similar to the poplar genome which, like soybean, is an ancient polyploid (palaeopolyploid). About 78% of the predicted genes occur in chromosome ends, which comprise less than one-half of the genome but account for nearly all of the genetic recombination. Genome duplications occurred at approximately 59 and 13 million years ago, resulting in a highly duplicated genome with nearly 75% of the genes present in multiple copies. The two duplication events were followed by gene diversification and loss, and numerous chromosome rearrangements. An accurate soybean genome sequence will facilitate the identification of the genetic basis of many soybean traits, and accelerate the creation of improved soybean varieties.
3,743 citations