Durham Research Online
Deposited in DRO:
07 March 2017
Version of attached file:
Accepted Version
Peer-review status of attached file:
Peer-reviewed
Citation for published item:
Wang, M. and Tu, L. and Lin, M. and Lin, Z. and Wang, P. and Yang, Q. and Ye, Z. and Shen, C. and Zhou,
X. and Zhang, L. and Li, J. and Nie, X. and Li, Z. and Guo, K. and Ma, Y. and Jin, S. and Zhu, L. and Yang,
X. and Min, L. and Zhang, Q. and Lindsey, K. and Zhang, X. (2017) 'Asymmetric subgenome selection and
cis-regulatory divergence during cotton domestication.', Nature genetics., 49 (4). pp. 579-587.
Further information on publisher's website:
https://doi.org/10.1038/ng.3807
Publisher's copyright statement:
Additional information:
Use policy
The full-text may be used and/or reproduced, and given to third parties in any format or medium, without prior permission or charge, for
personal research or study, educational, or not-for-profit purposes provided that:
• a full bibliographic reference is made to the original source
• a link is made to the metadata record in DRO
• the full-text is not changed in any way
The full-text must not be sold in any format or medium without the formal permission of the copyright holders.
Please consult the full DRO policy for further details.
Durham University Library, Stockton Road, Durham DH1 3LY, United Kingdom
Tel : +44 (0)191 334 3042 | Fax : +44 (0)191 334 2971
https://dro.dur.ac.uk
Asymmetric subgenome selection and cis-regulatory divergence during cotton domestication

Maojun Wang¹, Lili Tu¹, Min Lin¹,², Zhongxu Lin¹, Pengcheng Wang¹, Qingyong Yang¹,², Lin Zhang¹, Zhengxiu Ye¹, Chao Shen¹, Jianying Li¹, Kai Guo¹, Xiaolin Zhou¹, Xinhui Nie³, Zhonghua Li¹, Yizan Ma¹, Cong Huang¹, Shuangxia Jin¹, Longfu Zhu¹, Xiyan Yang⁴, Ling Min⁴, Daojun Yuan⁴, Qinghua Zhang¹, Keith Lindsey⁵ & Xianlong Zhang¹

¹ National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, Hubei, China.
² Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, Hubei, China.
³ Key Laboratory of Oasis Eco-agriculture of the Xinjiang Production and Construction Corps, College of Agronomy, Shihezi University, Shihezi, Xinjiang, China.
⁴ College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, Hubei, China.
⁵ Department of Biosciences, Durham University, Durham DH1 3LE, United Kingdom.

Correspondence should be addressed to X.Z. (xlzhang@mail.hzau.edu.cn) or K.L. (keith.lindsey@durham.ac.uk)

Tel: +86-27-87280510
Fax: +86-27-87280196

Comparative population genomics offers an excellent opportunity for unravelling the genetic history of crop domestication. Upland cotton (Gossypium hirsutum) has long been an important economic crop, but a genome-wide and evolutionary understanding of the effects of human selection is largely unresolved. Here, we describe an integrated variation map for 352 wild and domesticated cotton accessions. This has allowed us to scan 93 domestication sweeps and identify 19 candidate loci for fiber quality-related traits by a genome-wide association study. We provide evidence to show asymmetric subgenome domestication for directional selection of long white fibers. Global analyses of DNase I-hypersensitive sites and 3-dimensional genome architecture, linking functional variants to gene transcription, reveal the effects of domestication on cis-regulatory divergence. This study provides new insights into the evolution of gene organization, regulation and adaptation in a major crop, and represents a rich resource for genome-based cotton improvement.

Early human domestication of wild plants represented the first step in the development of modern crop varieties, and migration and differential directional selection over millennia have contributed to the adaptation of species in different environments for improved yield and quality traits¹. In the current genomic era, high-throughput ‘omics’ technologies provide significant opportunities for a detailed analysis of genetic change through domestication and for new, targeted and precise genome-based crop breeding strategies²,³.
Cotton is one of the most important economic crops in the world, both as a source of natural and renewable fiber for textiles, and as a source of seed oil and protein⁴. Allotetraploid Upland cotton was formed from an inter-genomic hybridization event approximately 1–2 million years ago⁵. Originally native to the Yucatan peninsula in Mesoamerica, it was first domesticated at least 4,000 to 5,000 years ago, with subsequent directional selection⁶. Modern varieties of cultivated cotton produce spinnable fine white fiber, which is preferable to the sparser, coarse brown fiber of wild cotton. Previous molecular studies have shown that domestication has dramatically rewired the transcriptome during fiber development⁷,⁸. What remains largely unknown, however, is the effect of human selection on the organization of the cotton genome and its gene regulatory landscape. Using as a comparator the recently published genome sequence of Texas Marker-1 (TM-1)⁹,¹⁰, we can address this question through a comprehensive population genome analysis of multiple wild and cultivated cotton genotypes.

RESULTS
A genome variation map for cotton
To construct an integrated variation map of Upland cotton, we collected a total of 352 diverse accessions for genomic sequence analysis¹¹. These included 31 wild accessions and 321 cultivated accessions from around the world (Fig. 1a and Supplementary Table 1). A total of 6.1 Tb of sequence data were integrated, with an average depth of 6.9× (Supplementary Table 1). These data were mapped against the TM-1 genome⁹ to identify genomic variants. We detected a total of 7,497,568 SNPs, 351,013 small indels (shorter than 10 bp) and 93,786 structural variants (SVs) (Table 1, Supplementary Fig. 1 and Supplementary Tables 2-4). The accuracy of SNPs was estimated to be 98.2%, determined by Sanger sequencing of 300 randomly selected SNPs in 3 individual accessions. In addition, we selected 50 representative accessions (10 wild and 40 cultivated cottons) from the 352 accessions for RNA sequencing (Supplementary Table 5), and generated 78,728 SNPs, of which more than 93.6% overlapped with SNPs from re-sequencing data. This integrated variation data set represents a new resource for cotton genetics and breeding.

Cotton population properties and linkage disequilibrium
We explored the phylogenetic relationship between the 352 cotton accessions using a whole-genome SNP analysis. These cottons can be divided into 3 groups (Fig. 1b and Supplementary Fig. 2), as supported by a principal component analysis (PCA; Fig. 1c). Wild cotton accessions cluster together (Group-I; the Wild group) except for a few accessions which cluster into a second group (Group-II; the ABI group), which mainly comprises cottons from America, Brazil and India. The third group (Group-III; the Chinese group) mostly consists of cotton cultivars in China, which were collected from the major Chinese cotton cultivation regions: the Northwestern Inland Region (NIR), the Northern Specific Early Maturation Region (NSEMR), the Yellow River Region (YRR) and the Yangtze River Region (YtRR)¹². This group could be further classified into two subclades (Group-III-1 and Group-III-2; Fig. 1b), which exhibit different geographic distribution patterns. The subclade Group-III-1 is represented by cotton accessions from northern China (NIR and NSEMR), while Group-III-2 includes the majority of accessions from southern China (YtRR). We observed that a few cotton accessions, which were collected from North America, clustered into Group-III, which might be due to the introduction of Upland cotton to China from America during the first thirty years of the 20th century¹³.
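To make the PCA step concrete, the following is a minimal sketch of how samples can be projected onto principal components from a SNP genotype matrix. It assumes genotypes coded as alternate-allele counts (0, 1 or 2) with no missing data; the function name `genotype_pca` and the toy matrix are hypothetical illustrations, and this is not necessarily the procedure or software used in this study.

```python
import numpy as np


def genotype_pca(genotypes, n_components=2):
    """Project samples onto principal components of a genotype matrix.

    genotypes: array of shape (n_samples, n_snps) holding alternate-allele
    counts (0, 1 or 2). Missing data are assumed to be absent (assumption).
    Returns (coords, explained), where coords has shape
    (n_samples, n_components) and explained holds the fraction of total
    variance captured by each returned component.
    """
    g = np.asarray(genotypes, dtype=float)
    # Monomorphic SNPs carry no information about structure; drop them.
    g = g[:, g.std(axis=0) > 0]
    # Center each SNP so that components describe variation around the mean.
    centered = g - g.mean(axis=0)
    # SVD of the centered matrix yields the principal axes directly.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    coords = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return coords, explained[:n_components]


if __name__ == "__main__":
    # Hypothetical toy data: three samples from each of two diverged groups.
    toy = np.array([
        [0, 0, 2, 1],
        [0, 0, 2, 2],
        [0, 1, 2, 2],
        [2, 2, 0, 0],
        [2, 2, 0, 1],
        [2, 1, 0, 0],
    ])
    coords, explained = genotype_pca(toy)
    print(coords[:, 0])
    print(explained)
```

In such a projection, samples from differentiated groups tend to fall on opposite sides of the first component, which is the kind of separation a PCA plot of accessions displays.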
Crop species may experience population bottlenecks during domestication¹⁴. To examine this possibility in cotton, genetic diversity for each group was measured by calculating π values. We found that genetic diversity decreased from the Wild cotton group (π = 1.32 × 10⁻³; the A-subgenome (At, the lower case t denotes tetraploid), 1.36 × 10⁻³; the D-subgenome (Dt), 1.25 × 10⁻³) to the ABI group (π = 0.88 × 10⁻³; At, 0.96 × 10⁻³; Dt, 0.66 × 10⁻³) and to the Chinese group (π = 0.67 × 10⁻³; At, 0.72 × 10⁻³; Dt, 0.56 × 10⁻³) (Fig. 1d and Supplementary Fig. 3). This shows that a large amount of genetic variation in both subgenomes has been lost during cotton domestication, especially for the Dt. Compared with other major crops, cotton possesses narrow genetic diversity even within wild cotton accessions (Supplementary Table 6). To investigate population divergence, we calculated the population fixation statistics (FST) among groups (Fig. 1d). This reveals large population divergence between the Chinese group and the Wild group. Population divergence between the Chinese group and the ABI group was observed, suggesting that Upland cottons in China have undergone population divergence after their introduction.
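The two statistics used above can be illustrated with a short sketch. It assumes biallelic SNPs summarized as alternate-allele counts per group, computes π as the average number of pairwise differences per site, and uses Hudson's estimator of FST as a ratio of sums across SNPs. The choice of estimator and the function names `nucleotide_diversity`, `hudson_fst` and `relative_loss` are assumptions made for illustration; the text does not state which estimator or software was used.

```python
import numpy as np


def nucleotide_diversity(alt_counts, n_chrom, seq_len):
    """Per-site nucleotide diversity (pi) from biallelic SNP counts.

    alt_counts: alternate-allele count at each SNP.
    n_chrom: number of sampled chromosomes (assumed equal at every SNP).
    seq_len: number of sites surveyed, including invariant sites.
    """
    k = np.asarray(alt_counts, dtype=float)
    n = float(n_chrom)
    # Fraction of chromosome pairs that differ at each SNP.
    per_site = 2.0 * k * (n - k) / (n * (n - 1.0))
    return per_site.sum() / seq_len


def hudson_fst(alt1, n1, alt2, n2):
    """Hudson's FST between two groups, as a ratio of sums over SNPs."""
    p1 = np.asarray(alt1, dtype=float) / n1
    p2 = np.asarray(alt2, dtype=float) / n2
    # Numerator: squared frequency difference, corrected for sampling noise.
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    # Denominator: probability that two alleles, one per group, differ.
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num.sum() / den.sum()


def relative_loss(pi_before, pi_after):
    """Fraction of diversity lost between two groups."""
    return 1.0 - pi_after / pi_before


if __name__ == "__main__":
    # Subgenome pi values reported in the text (Wild vs Chinese group).
    print(relative_loss(1.36e-3, 0.72e-3))  # At
    print(relative_loss(1.25e-3, 0.56e-3))  # Dt
```

Applied to the reported values, the relative reduction from the Wild group to the Chinese group is about 47% for the At and about 55% for the Dt, consistent with the statement that the loss is more pronounced in the Dt.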
Linkage disequilibrium (LD; indicated by r²) was found to drop with physical distance between SNPs in all cotton groups (Fig. 1e). The LD extent for each group was measured as the chromosomal distance when LD dropped to half of its maximum