
The genome of black cottonwood, Populus trichocarpa (Torr. & Gray)

Gerald A. Tuskan, +115 more
- 15 Sep 2006 - 
- Vol. 313, Iss: 5793, pp 1596-1604
TLDR
This paper reports the draft genome of the black cottonwood tree, Populus trichocarpa, in which more than 45,000 putative protein-coding genes were identified.
Abstract
We report the draft genome of the black cottonwood tree, Populus trichocarpa. Integration of shotgun sequence assembly with genetic mapping enabled chromosome-scale reconstruction of the genome. More than 45,000 putative protein-coding genes were identified. Analysis of the assembled genome revealed a whole-genome duplication event; about 8000 pairs of duplicated genes from that event survived in the Populus genome. A second, older duplication event is indistinguishably coincident with the divergence of the Populus and Arabidopsis lineages. Nucleotide substitution, tandem gene duplication, and gross chromosomal rearrangement appear to proceed substantially more slowly in Populus than in Arabidopsis. Populus has more protein-coding genes than Arabidopsis, ranging on average from 1.4 to 1.6 putative Populus homologs for each Arabidopsis gene. However, the relative frequency of protein domains in the two genomes is similar. Overrepresented exceptions in Populus include genes associated with lignocellulosic wall biosynthesis, meristem development, disease resistance, and metabolite transport.


The genome of black cottonwood,
Populus trichocarpa (Torr. & Gray)
G. A. Tuskan [1,3], S. DiFazio [1,4]*, S. Jansson [9]*, J. Bohlmann [5]*, I. Grigoriev [8]*,
U. Hellsten [8]*, N. Putnam [8]*, S. Ralph [5]*, S. Rombauts [10]*, A. Salamov [8]*,
J. Schein [11]*, L. Sterck [10]*, A. Aerts [8], R. R. Bhalerao [9], R. P. Bhalerao [12],
D. Blaudez [13], W. Boerjan [10], A. Brun [13], A. Brunner [14], V. Busov [15], M. Campbell [16],
J. Carlson [17], M. Chalot [13], J. Chapman [8], G.-L. Chen [2], D. Cooper [5], P. M. Coutinho [19],
J. Couturier [13], S. Covert [20], Q. Cronk [6], R. Cunningham [1], J. Davis [22], S. Degroeve [10],
A. Déjardin [23], C. dePamphilis [18], J. Detter [8], B. Dirks [24], I. Dubchak [8,25],
S. Duplessis [13], J. Ehlting [6], B. Ellis [5], K. Gendler [26], D. Goodstein [8], M. Gribskov [27],
J. Grimwood [28], A. Groover [29], L. Gunter [1], B. Hamberger [6], B. Heinze [30],
Y. Helariutta [31,12,33], B. Henrissat [19], D. Holligan [21], R. Holt [11], W. Huang [8],
N. Islam-Faridi [34], S. Jones [11], M. Jones-Rhoades [35], R. Jorgensen [26], C. Joshi [15],
J. Kangasjärvi [32], J. Karlsson [9], C. Kelleher [5], R. Kirkpatrick [11], M. Kirst [22],
A. Kohler [13], U. Kalluri [1], F. Larimer [2], J. Leebens-Mack [21], J.-C. Leplé [23],
P. Locascio [2], Y. Lou [8], S. Lucas [8], F. Martin [13], B. Montanini [13], C. Napoli [26],
D. R. Nelson [36], C. Nelson [37], K. Nieminen [31], O. Nilsson [12], G. Peter [22], R. Philippe [5],
G. Pilate [23], A. Poliakov [25], J. Razumovskaya [2], P. Richardson [8], C. Rinaldi [13],
K. Ritland [7], P. Rouzé [10], D. Ryaboy [25], J. Schmutz [28], J. Schrader [38], B. Segerman [9],
H. Shin [11], A. Siddiqui [11], F. Sterky [39], A. Terry [8], C. Tsai [15], E. Uberbacher [2],
P. Unneberg [39], J. Vahala [32], K. Wall [18], S. Wessler [21], G. Yang [21], T. Yin [1],
C. Douglas [6]†, M. Marra [11]†, G. Sandberg [12]†, Y. Van de Peer [10]†, D. Rokhsar [8,24]†
[1] Environmental Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA.
[2] Life Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA.
[3] Plant Sciences Department, University of Tennessee, TN 37996, USA.
[4] Department of Biology, West Virginia University, Morgantown, WV 26506, USA.
[5] Michael Smith Laboratories, University of British Columbia, Vancouver, BC V6T 1Z4, Canada.
[6] Department of Botany, University of British Columbia, Vancouver, BC V6T 1Z4, Canada.
[7] Department of Forest Sciences, University of British Columbia, Vancouver, BC V6T 1Z4, Canada.
[8] U.S. Department of Energy, Joint Genome Institute, Walnut Creek, CA 94598, USA.
[9] Umeå Plant Science Centre, Department of Plant Physiology, Umeå University, SE-901 87 Umeå, Sweden.
[10] Department of Plant Systems Biology, Flanders Interuniversity Institute for Biotechnology (VIB), Ghent University, B-9052 Gent, Belgium.
[11] Genome Sciences Centre, 100-570 West 7th Avenue, Vancouver, BC V5Z 4S6, Canada.
[12] Umeå Plant Science Centre, Department of Forest Genetics and Plant Physiology, Swedish University of Agricultural Sciences, SE-901 83 Umeå, Sweden.
[13] Tree-Microbe Interactions Unit, INRA-Université Henri Poincaré, INRA-Nancy, 54280 Champenoux, France.
[14] Department of Forestry, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA.
[15] Biotechnology Research Center, School of Forest Resources and Environmental Science, Michigan Technological University, Houghton, MI 49931, USA.
[16] Department of Cell & Systems Biology, University of Toronto, 25 Willcocks St., Toronto, Ontario, M5S 3B2 Canada.
[17] School of Forest Resources and Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16802, USA.
[18] Department of Biology, Institute of Molecular Evolutionary Genetics, and Huck Institutes of Life Sciences, The Pennsylvania State University, University Park, PA 16802, USA.
[19] Architecture et Fonction des Macromolécules Biologiques, UMR6098, CNRS and Universities of Aix-Marseille I & II, case 932, 163 avenue de Luminy, 13288 Marseille, France.
[20] Warnell School of Forest Resources, University of Georgia, Athens, GA 30602, USA.
[21] Department of Plant Biology, University of Georgia, Athens, GA 30602, USA.
[22] School of Forest Resources and Conservation, Genetics Institute, and Plant Molecular and Cellular Biology Program, University of Florida, Gainesville, FL 32611, USA.
[23] Institut National de la Recherche Agronomique–Orléans, Unit of Forest Improvement, Genetics and Physiology, 45166 Olivet Cedex, France.
[24] Center for Integrative Genomics, University of California, Berkeley, CA 94720, USA.
[25] Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA.
[26] Department of Plant Sciences, University of Arizona, Tucson, AZ 85721, USA.
[27] Department of Biological Sciences, Purdue University, West Lafayette, IN 47907, USA.
[28] The Stanford Human Genome Center and the Department of Genetics, Stanford University School of Medicine, Palo Alto, CA 94305, USA.
[29] Institute of Forest Genetics, United States Department of Agriculture, Forest Service, Davis, CA 95616, USA.
[30] Federal Research Centre for Forests, Hauptstrasse 7, A-1140 Vienna, Austria.
[31] Plant Molecular Biology Laboratory, Institute of Biotechnology, University of Helsinki, FI-00014 Helsinki, Finland.
[32] Department of Biological and Environmental Sciences, University of Helsinki, FI-00014 Helsinki, Finland.
[33] Department of Biology, 200014, University of Turku, FI-20014 Turku, Finland.
[34] Southern Institute of Forest Genetics, United States Department of Agriculture, Forest Service and Department of Forest Science, Texas A&M University, College Station, TX 77843, USA.
[35] Whitehead Institute for Biomedical Research and Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02142, USA.
[36] Department of Molecular Sciences and Center of Excellence in Genomics and Bioinformatics, University of Tennessee, Memphis, TN 38163, USA.
[37] Southern Institute of Forest Genetics, United States Department of Agriculture, Forest Service, Saucier, MS 39574, USA.
[38] Developmental Genetics, University of Tübingen, D-72076 Tübingen, Germany.
[39] Department of Biotechnology, KTH, AlbaNova University Center, SE-106 91 Stockholm, Sweden.
*These authors contributed equally to this work as second authors.
†These authors contributed equally to this work as senior authors.

ABSTRACT
We report the draft genome of the black cottonwood tree, Populus trichocarpa.
Integration of shotgun sequence assembly with genetic mapping enabled chromosome-
scale reconstruction of the genome. Over 45,000 putative protein-coding genes were
identified. Analysis of the assembled genome revealed a whole-genome duplication
event, with approximately 8,000 pairs of duplicated genes from that event surviving in
the Populus genome. A second, older duplication event is indistinguishably coincident
with the divergence of the Populus and Arabidopsis lineages. Nucleotide substitution,
tandem gene duplication, and gross chromosomal rearrangement appear to proceed
substantially more slowly in Populus than in Arabidopsis. Populus has more protein-coding
genes than Arabidopsis, ranging on average from 1.4 to 1.6 putative Populus
homologs for each Arabidopsis gene. However, the relative frequency of protein
domains in the two genomes is similar. Overrepresented exceptions in Populus include
genes associated with disease resistance, meristem development, metabolite transport
and lignocellulosic wall biosynthesis.
KEYWORDS: Whole-genome shotgun sequencing, genome-wide duplication, perennial
habit, woody plant, poplar, Salix, Arabidopsis, angiosperm evolution

Forests cover thirty percent of the earth’s terrestrial surface (ca. 3.8 billion
hectares), harbor large amounts of biodiversity, and provide humanity with benefits
including clean air and water, lumber, fiber, and fuels. Worldwide, one quarter of all
industrial feedstocks originate from forest-based resources(1). Occurring in
extensive wild populations across continents, large and long-lived forest trees have
evolved under selective pressures unlike those of annual herbaceous plants. Their
growth and development involve extensive secondary growth, coordinated signaling
and distribution of water and nutrients over great distances, and strategic storage and
redistribution of metabolites in concert with inter-annual climatic cycles. The need
to survive and thrive in fixed locations for centuries under continually changing physical
and biotic stresses also sets them apart from short-lived plants. Many of the features
that distinguish trees from other organisms, especially their large size and long
generation times, make it difficult to study the cellular and molecular
mechanisms that underlie their distinctive biology. To facilitate such
investigations in a relatively well-studied model tree, we describe here the draft genome
of black cottonwood, Populus trichocarpa (Torr. & Gray), and compare it with other
sequenced plant genomes.
Populus trichocarpa was selected as the model forest species for genome
sequencing not only because of its modest genome size but also because of its rapid
growth, relative ease of experimental manipulation, and range of available genetic
tools(2, 3). The genus is phenotypically diverse, and interspecific hybrids facilitate the
genetic mapping of economically important traits related to growth rate, stature, wood
properties, and paper quality. Dozens of quantitative trait loci (QTL) have already been
mapped(4), and methods of genetic transformation have been developed(5). Under
appropriate conditions, Populus can reach reproductive maturity in as few as 4 to 6 years,
permitting selective breeding for large-scale sustainable plantation forestry. Finally, the rapid
growth of trees, coupled with thermochemical or biochemical conversion of the
lignocellulosic portion of the plant, has the potential to provide a renewable energy
resource with a concomitant reduction of greenhouse gases(6-8).
SEQUENCING AND ASSEMBLY
A single female genotype, ‘Nisqually-1’, was selected and used in a whole-genome
shotgun sequencing and assembly strategy(9). Approximately 7.6 million end
reads representing 4.2 billion high-quality (i.e., Q20 or higher) base pairs were
References

The evolutionary fate and consequences of duplicate genes
TL;DR: Although duplicate genes may only rarely evolve new functions, the stochastic silencing of such genes may play a significant role in the passive origin of new species.

Computational Identification of Plant MicroRNAs and Their Targets, Including a Stress-Induced miRNA
TL;DR: Comparative genomic approaches were developed to systematically identify both miRNAs and their targets that are conserved in Arabidopsis thaliana and rice, and the expression of miR395, the sulfurylase-targeting miRNA, increases upon sulfate starvation, showing that miRNAs can be induced by environmental stress.

Genome-Wide Analysis of the ERF Gene Family in Arabidopsis and Rice
TL;DR: It was concluded that the major functional diversification within the ERF family predated the monocot/dicot divergence and might have been due to chromosomal/segmental duplication and tandem duplication, as well as more ancient transposition and homing.

Biosynthesis of flavonoids and effects of stress
TL;DR: The accumulation of red or purple flavonoids is a hallmark of plant stress and mounting evidence points to diverse physiological functions for these compounds in the stress response.

Rfam: annotating non-coding RNAs in complete genomes
TL;DR: The Rfam database aims to facilitate the identification and classification of new members of known sequence families, and distributes annotation of ncRNAs in over 200 complete genome sequences.
Frequently Asked Questions (22)

5. R. Meilan, D. Ellis, G. Pilate, A. M. Brunner, J. Skinner, in Forest Biotechnology: Scientific Opportunities and Social Challenges, S. H. Strauss, H. D. Bradshaw Jr., Eds. (Resources for the Future Press, Washington, DC, 2004), pp. 36-51.
9. See Supplemental Materials for further information.

In total, 356 microsatellite markers were used to assign 155 scaffolds (335 Mb of sequence) to the 19 P. trichocarpa chromosome-scale linkage groups (LG) (13). 
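Assigning scaffolds to linkage groups from marker placements can be sketched as a simple vote. This is a hypothetical illustration, not the paper's procedure (which is described in its supplementary material): it assumes each mapped microsatellite has already been located on a scaffold, and the function name and identifiers (`assign_scaffolds`, `scaffold_1`, `LG_I`) are invented. Scaffolds whose markers split evenly between groups are left unassigned rather than guessed.

```python
from collections import Counter


def assign_scaffolds(marker_hits):
    """Assign each scaffold to the linkage group (LG) supported by most of its markers.

    marker_hits: iterable of (marker_id, scaffold_id, linkage_group) tuples, a
    hypothetical input with one row per mapped marker located on a scaffold.
    Returns {scaffold_id: linkage_group}; scaffolds with a tied vote are omitted.
    """
    votes = {}
    for _marker, scaffold, lg in marker_hits:
        votes.setdefault(scaffold, Counter())[lg] += 1

    assignment = {}
    for scaffold, counter in votes.items():
        ranked = counter.most_common(2)
        # Leave ambiguous scaffolds unplaced when the top two LGs tie.
        if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
            continue
        assignment[scaffold] = ranked[0][0]
    return assignment


# Invented toy input: scaffold_3 has one marker on each of two LGs.
hits = [
    ("m1", "scaffold_1", "LG_I"),
    ("m2", "scaffold_1", "LG_I"),
    ("m3", "scaffold_1", "LG_V"),
    ("m4", "scaffold_2", "LG_II"),
    ("m5", "scaffold_3", "LG_III"),
    ("m6", "scaffold_3", "LG_IV"),
]
print(assign_scaffolds(hits))  # {'scaffold_1': 'LG_I', 'scaffold_2': 'LG_II'}
```

A real pipeline would also check marker order along each scaffold against the genetic map to detect chimeric scaffolds; the vote alone only picks a chromosome.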

Cytokinins are thought to control the identity and proliferation of cell types relevant for wood formation as well as general cell division(67).

Approximately 89% of the predicted gene models had homology (E-value < 1e-8) to the non-redundant (NR) set of proteins from NCBI, including 60% with extensive homology over 75% of both model and NR protein lengths.
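The two homology tiers above (any hit with E-value below 1e-8, and "extensive" hits covering more than 75% of both the gene model and the NR protein) amount to a simple classification rule. A minimal sketch, assuming one best hit per gene model with aligned lengths already extracted from the search output; the `Hit` record and `classify` function are hypothetical names, and the paper's exact coverage calculation may differ:

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Best hit of one predicted gene model against an NR protein (hypothetical record)."""
    model_len: int        # length of the predicted protein, in residues
    nr_len: int           # length of the NR protein, in residues
    model_aligned: int    # residues of the model covered by the alignment
    nr_aligned: int       # residues of the NR protein covered by the alignment
    evalue: float


def classify(hit, max_evalue=1e-8, min_cov=0.75):
    """Return 'none', 'homology' or 'extensive' for a single best hit."""
    if hit.evalue >= max_evalue:          # homology requires E-value < 1e-8
        return "none"
    model_cov = hit.model_aligned / hit.model_len
    nr_cov = hit.nr_aligned / hit.nr_len
    # 'Extensive' homology: the alignment spans over 75% of BOTH protein lengths.
    if model_cov > min_cov and nr_cov > min_cov:
        return "extensive"
    return "homology"


print(classify(Hit(400, 420, 380, 390, 1e-50)))   # extensive
print(classify(Hit(400, 1000, 380, 390, 1e-30)))  # homology
print(classify(Hit(400, 400, 300, 300, 1e-5)))    # none
```

Requiring coverage on both sides keeps a short gene model that matches only one domain of a long NR protein (the second example) out of the "extensive" tier.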

Lignin, the second most abundant secondary cell wall polymer after cellulose, is a complex polymer of monolignols (hydroxycinnamyl alcohols) that encrusts and interacts with the cellulose/hemicellulose matrix of the secondary cell wall(43).

Phenolic glycosides and condensed tannins alone can constitute up to 35% of leaf dry weight and are abundant in buds, bark, and roots of Populus(50, 54, 55).

The higher complement of GA20-oxidase genes may have biological significance in Populus with respect to secondary xylem and fiber cell development. 

One of the InterPro domains in this protein, IPR008271, a serine/threonine protein kinase active site, was the most frequent domain in tandemly repeated genes in both species (SOM F8).

Within the reference gene set, 13,019 pairs of orthologs were identified between Populus and Arabidopsis genes using best bidirectional BLAST hits, with an average mutual coverage of these alignments of 93%; 11,654 ortholog pairs had coverage greater than 90% of gene lengths, and only 156 genes had less than 50% coverage.
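The best bidirectional (reciprocal best) hit criterion can be sketched as follows. This is an illustration under stated assumptions: BLAST results are summarized as bit scores in nested dictionaries, ties are broken by whichever subject appears first, and the coverage filtering reported above is not shown. Function and gene names are hypothetical.

```python
def best_hits(scores):
    """scores: {query: {subject: bit_score}} -> {query: highest-scoring subject}."""
    # max() keeps the first subject encountered when scores tie.
    return {q: max(hits, key=hits.get) for q, hits in scores.items() if hits}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (gene_a, gene_b) pairs in which each gene is the other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Invented bit scores: Pt2's best hit is At1, but At1's best hit is Pt1.
poplar_vs_arabidopsis = {
    "Pt1": {"At1": 500, "At2": 120},
    "Pt2": {"At1": 450},
    "Pt3": {"At3": 300},
}
arabidopsis_vs_poplar = {
    "At1": {"Pt1": 510, "Pt2": 440},
    "At2": {"Pt1": 130},
    "At3": {"Pt3": 310},
}
print(reciprocal_best_hits(poplar_vs_arabidopsis, arabidopsis_vs_poplar))
# [('Pt1', 'At1'), ('Pt3', 'At3')]
```

Because each gene has at most one best hit in each direction, the result is strictly one-to-one: a duplicated gene without a reciprocal partner (Pt2 in the toy data, standing in for one copy of a paralog pair) is left unpaired.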

Among the processes unique to tree biology, one of the most obvious is the yearly development of secondary xylem from the vascular cambium. 


As a long-lived, vegetatively propagated species, Populus has the potential to successfully contribute gametes to multiple generations.

Paired BAC-end sequences from most of the physical map were linked to the large-scale assembly, permitting 2,460 of the physical map contigs to be positioned on the genome assembly.

The total number of genes in such arrays was 4,839 and the total length of tandemly duplicated segments in Populus was 47.9 Mb or 15.6% of the genome (SOM F8). 
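Counting genes in tandem arrays and the fraction of sequence those arrays span can be sketched as below. The sketch assumes genes are sorted by start coordinate on one chromosome and already assigned to gene families, and it treats an array as two or more strictly adjacent genes from the same family; the paper's actual criteria (for example, whether intervening genes are tolerated) are in its supplementary material and may differ. All names and coordinates are hypothetical.

```python
from itertools import groupby


def tandem_arrays(genes):
    """Group strictly adjacent genes of the same family into arrays.

    genes: list of (gene_id, family_id, start, end) tuples sorted by start
    on a single chromosome. Returns a list of arrays with two or more genes.
    """
    arrays = []
    for _family, run in groupby(genes, key=lambda g: g[1]):
        run = list(run)
        if len(run) >= 2:
            arrays.append(run)
    return arrays


def summarize(arrays, genome_length):
    """Return (genes in arrays, bp spanned by arrays, fraction of genome_length)."""
    n_genes = sum(len(a) for a in arrays)
    # Span of an array: first gene start to the furthest gene end, inclusive.
    span = sum(max(g[3] for g in a) - a[0][2] + 1 for a in arrays)
    return n_genes, span, span / genome_length


genes = [
    ("g1", "F1", 100, 1100),
    ("g2", "F1", 1500, 2500),
    ("g3", "F2", 3000, 4000),
    ("g4", "F3", 5000, 6000),
    ("g5", "F3", 6500, 7500),
    ("g6", "F3", 8000, 9000),
]
print(summarize(tandem_arrays(genes), genome_length=20_000))
# (5, 6402, 0.3201)
```

The reported fraction depends on which length is used as the denominator (total assembly, mapped sequence, or estimated genome size), so the same array set can yield different percentages.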

The NBS-coding R gene family is one of the largest in Populus, with 399 members, approximately 2-fold higher than in Arabidopsis. 

Populus has a 1.3:1.0 ratio in the number of snRNAs compared with Arabidopsis, yet U1, U2, and U5 are overrepresented in Populus while U4 is underrepresented.

The distribution of 4DTV values for paralogous pairs of genes also shows that a large fraction of the Populus genome falls in a set of duplicated segments anchored by gene pairs with 4DTV at 0.364 ± 0.001, representing the residue of a more ancient, large-scale, apparently synchronous duplication event (Fig. 3A).

This relatively older duplication event covers approximately 59% of the Populus genome with 16% of genes in these segments present in two copies. 

A phylogenetic analysis using the known and predicted ARF protein sequences showed that Populus and Arabidopsis ARF gene families have expanded independently since they diverged from their common ancestor. 

The colinearity of genetic maps among multiple Populus species suggests that the genome reorganization must have occurred prior to the evolution of the modern taxa of Populus. 

On the basis of the depth of coverage of major scaffolds (~7.5X) and the total amount of non-organellar shotgun sequence that was generated, the Populus genome size was estimated to be 485 ± 10 Mb, in rough agreement with previous cytogenetic estimates of approximately 550 Mb(10).
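The size estimate rests on the standard relation between sequencing depth and genome size. A worked form, assuming mean depth equals total non-organellar sequence divided by genome length (the paper's exact procedure is described in its supplementary material and may include further corrections), where B is the non-organellar shotgun sequence and c̄ ≈ 7.5 is the mean depth over major scaffolds:

```latex
\hat{G} = \frac{B}{\bar{c}}
\quad\Longrightarrow\quad
B \approx \hat{G}\,\bar{c} = 485\ \text{Mb} \times 7.5 \approx 3.64\ \text{Gb}
```

The back-calculated B is implied by this relation rather than reported in the text; it falls below the 4.2 billion high-quality base pairs reported for all end reads, consistent with organellar sequence having been excluded.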