Author

Shengqiang Shu

Other affiliations: United States Department of Energy, Joint Genome Institute

Bio: Shengqiang Shu is an academic researcher from Lawrence Berkeley National Laboratory. The author has contributed to research on the topics Genome and Gene, has an h-index of 29, and has co-authored 63 publications that have received 12,867 citations. Previous affiliations of Shengqiang Shu include the United States Department of Energy and the Joint Genome Institute.

Topics: Genome, Gene, Genome evolution, Genomics, Reference genome, ...

Papers published on a yearly basis (chart; years shown: 2002, 2010, 2012 to 2023)

Papers

Journal Article

Genome sequence of the palaeopolyploid soybean

Jeremy Schmutz, Steven B. Cannon¹, Jessica A. Schlueter²,³, Jianxin Ma³, Therese Mitros⁴, William Nelson⁵, David L. Hyten¹, Qijian Song¹,⁶, Jay J. Thelen⁷, Jianlin Cheng⁷, Dong Xu⁷, Uffe Hellsten⁸, Gregory D. May⁹, Yeisoo Yu⁵, Tetsuya Sakurai, Taishi Umezawa, Madan K. Bhattacharyya¹⁰, Devinder Sandhu¹¹, Babu Valliyodan⁷, Erika Lindquist⁸, Myron Peto¹, David Grant¹, Shengqiang Shu⁸, David Goodstein⁸, Kerrie Barry⁸, Montona Futrell-Griggs³, Brian Abernathy³, Jianchang Du³, Zhixi Tian³, Liucun Zhu³, Navdeep Gill³, Trupti Joshi⁷, Marc Libault⁷, Anand Sethuraman, Xue-Cheng Zhang⁷, Kazuo Shinozaki, Henry T. Nguyen⁷, Rod A. Wing⁵, Perry B. Cregan¹, James E. Specht¹², Jane Grimwood⁸, Daniel S. Rokhsar⁸, Gary Stacey⁷, Randy C. Shoemaker¹, Scott A. Jackson³

Institutions: Agricultural Research Service¹, University of North Carolina at Charlotte², Purdue University³, University of California, Berkeley⁴, University of Arizona⁵, University of Maryland, College Park⁶, University of Missouri⁷, Joint Genome Institute⁸, National Center for Genome Resources⁹, Iowa State University¹⁰, University of Wisconsin–Stevens Point¹¹, University of Nebraska–Lincoln¹²

14 Jan 2010, Nature

TL;DR: An accurate soybean genome sequence will facilitate the identification of the genetic basis of many soybean traits, and accelerate the creation of improved soybean varieties.

Abstract: Soybean (Glycine max) is one of the most important crop plants for seed protein and oil content, and for its capacity to fix atmospheric nitrogen through symbioses with soil-borne microorganisms. We sequenced the 1.1-gigabase genome by a whole-genome shotgun approach and integrated it with physical and high-density genetic maps to create a chromosome-scale draft sequence assembly. We predict 46,430 protein-coding genes, 70% more than Arabidopsis and similar to the poplar genome which, like soybean, is an ancient polyploid (palaeopolyploid). About 78% of the predicted genes occur in chromosome ends, which comprise less than one-half of the genome but account for nearly all of the genetic recombination. Genome duplications occurred at approximately 59 and 13 million years ago, resulting in a highly duplicated genome with nearly 75% of the genes present in multiple copies. The two duplication events were followed by gene diversification and loss, and numerous chromosome rearrangements. An accurate soybean genome sequence will facilitate the identification of the genetic basis of many soybean traits, and accelerate the creation of improved soybean varieties.

3,743 citations

Journal Article

Phytozome: a comparative platform for green plant genomics

David Goodstein¹, Shengqiang Shu¹, Russell Howson¹, Rochak Neupane¹, Richard D. Hayes¹, Joni Fazo¹, Therese Mitros¹, William Dirks¹, Uffe Hellsten¹, Nicholas H. Putnam¹, Daniel S. Rokhsar¹

Institutions: United States Department of Energy¹

01 Jan 2012, Nucleic Acids Research

TL;DR: Phytozome provides a view of the evolutionary history of every plant gene at the level of sequence, gene structure, gene family and genome organization, while at the same time providing access to the sequences and functional annotations of a growing number of complete plant genomes.

Abstract: The number of sequenced plant genomes and associated genomic resources is growing rapidly with the advent of both an increased focus on plant genomics from funding agencies, and the application of inexpensive next generation sequencing. To interact with this increasing body of data, we have developed Phytozome (http://www.phytozome.net), a comparative hub for plant genome and gene family data and analysis. Phytozome provides a view of the evolutionary history of every plant gene at the level of sequence, gene structure, gene family and genome organization, while at the same time providing access to the sequences and functional annotations of a growing number (currently 25) of complete plant genomes, including all the land plants and selected algae sequenced at the Joint Genome Institute, as well as selected species sequenced elsewhere. Through a comprehensive plant genome database and web portal, these data and analyses are available to the broader plant science research community, providing powerful comparative genomics tools that help to link model systems with other plants of economic and ecological importance.

3,728 citations

Journal Article

Repeated polyploidization of Gossypium genomes and the evolution of spinnable cotton fibres

Andrew H. Paterson¹, Jonathan F. Wendel², Heidrun Gundlach, Hui Guo¹, Jerry Jenkins³, Dianchuan Jin, Danny J. Llewellyn⁴, Kurtis C. Showmaker⁵, Shengqiang Shu³, Joshua A. Udall⁶, Mi-Jeong Yoo², Robert L. Byers⁶, Wei Chen, Adi Doron-Faigenboim, Mary V. Duke⁷, Lei Gong², Jane Grimwood³, Corrinne E. Grover², Kara Grupp², Guanjing Hu², Tae-Ho Lee¹, Jingping Li¹, Lifeng Lin¹, Tao Liu, Barry S. Marler¹, Justin T. Page⁶, Alison W. Roberts⁸, Elisson Romanel⁹, William S. Sanders⁵, Emmanuel Szadkowski², Xu Tan¹, Haibao Tang¹,¹⁰, Chunming Xu²,¹¹, Jinpeng Wang, Zining Wang¹, Dong Zhang¹, Lan Zhang, Hamid Ashrafi¹², Frank Bedon⁴, John E. Bowers¹, Curt L. Brubaker⁴,¹³, Peng W. Chee¹⁴, Sayan Das¹, Alan R. Gingle¹, Candace H. Haigler¹⁵, David B. Harker⁶, Lucia Vieira Hoffmann¹⁶, Ran Hovav, Don C. Jones¹⁷, Cornelia Lemke¹, Shahid Mansoor¹,¹⁸, Mehboob-ur Rahman¹⁸, Lisa N. Rainville¹, Aditi Rambani⁶, Umesh K. Reddy¹⁹, Junkang Rong¹, Yehoshua Saranga²⁰, Brian E. Scheffler⁷, Jodi A. Scheffler⁷, David M. Stelly²¹, Barbara A. Triplett⁷, Allen Van Deynze¹², Maite F S Vaslin⁹, V. N. Waghmare²², Sally A. Walford⁴, Robert J. Wright²³, Essam A. Zaki, Tianzhen Zhang²⁴, Elizabeth S. Dennis⁴, Klaus F. X. Mayer, Daniel G. Peterson⁵, Daniel S. Rokhsar³, Xiyin Wang¹, Jeremy Schmutz³

Institutions: Plant Genome Mapping Laboratory¹, Iowa State University², Joint Genome Institute³, Commonwealth Scientific and Industrial Research Organisation⁴, Mississippi State University⁵, Brigham Young University⁶, Agricultural Research Service⁷, University of Rhode Island⁸, Federal University of Rio de Janeiro⁹, J. Craig Venter Institute¹⁰, Northeast Normal University¹¹, University of California, Davis¹², Bayer¹³, University of Georgia¹⁴, North Carolina State University¹⁵, Empresa Brasileira de Pesquisa Agropecuária¹⁶, Cotton Incorporated¹⁷, National Institute for Biotechnology and Genetic Engineering¹⁸, West Virginia State University¹⁹, Hebrew University of Jerusalem²⁰, Texas A&M University²¹, Central Institute for Cotton Research²², Texas Tech University²³, Nanjing Agricultural University²⁴

20 Dec 2012, Nature

TL;DR: It is shown that an abrupt five- to sixfold ploidy increase approximately 60 million years (Myr) ago, and allopolyploidy reuniting divergent Gossypium genomes approximately 1–2 Myr ago, conferred about 30–36-fold duplication of ancestral angiosperm genes in elite cottons, genetic complexity equalled only by Brassica among sequenced angiosperms.

Abstract: Polyploidy often confers emergent properties, such as the higher fibre productivity and quality of tetraploid cottons than diploid cottons bred for the same environments. Here we show that an abrupt five- to sixfold ploidy increase approximately 60 million years (Myr) ago, and allopolyploidy reuniting divergent Gossypium genomes approximately 1-2 Myr ago, conferred about 30-36-fold duplication of ancestral angiosperm (flowering plant) genes in elite cottons (Gossypium hirsutum and Gossypium barbadense), genetic complexity equalled only by Brassica among sequenced angiosperms. Nascent fibre evolution, before allopolyploidy, is elucidated by comparison of spinnable-fibred Gossypium herbaceum A and non-spinnable Gossypium longicalyx F genomes to one another and the outgroup D genome of non-spinnable Gossypium raimondii. The sequence of a G. hirsutum A(t)D(t) (in which 't' indicates tetraploid) cultivar reveals many non-reciprocal DNA exchanges between subgenomes that may have contributed to phenotypic innovation and/or other emergent properties such as ecological adaptation by polyploids. Most DNA-level novelty in G. hirsutum recombines alleles from the D-genome progenitor native to its New World habitat and the Old World A-genome progenitor in which spinnable fibre evolved. Coordinated expression changes in proximal groups of functionally distinct genes, including a nuclear mitochondrial DNA block, may account for clusters of cotton-fibre quantitative trait loci affecting diverse traits. Opportunities abound for dissecting emergent properties of other polyploids, particularly angiosperms, by comparison to diploid progenitors and outgroups.

1,015 citations

Journal Article

A reference genome for common bean and genome-wide analysis of dual domestications

Jeremy Schmutz¹, Phillip E. McClean², Sujan Mamidi², G Albert Wu¹, Steven B. Cannon³, Jane Grimwood, Jerry Jenkins, Shengqiang Shu¹, Qijian Song³, Carolina Chavarro⁴, Mirayda Torres-Torres⁴, Valérie Geffroy⁵, Samira Mafi Moghaddam², Dongying Gao⁴, Brian Abernathy⁴, Kerrie Barry¹, Matthew W. Blair⁶, Mark A. Brick⁷, Mansi Chovatia¹, Paul Gepts⁸, David Goodstein¹, Michael D. Gonzales⁴, Uffe Hellsten¹, David L. Hyten³, Gaofeng Jia³, James D. Kelly⁹, Dave Kudrna¹⁰, Rian Lee², Manon M.S. Richard¹¹, Phillip N. Miklas³, Juan M. Osorno², Josiane Rodrigues³, Vincent Thareau¹¹, Carlos A. Urrea¹², Mei Wang¹, Yeisoo Yu¹⁰, Ming Zhang¹, Rod A. Wing¹⁰, Perry B. Cregan³, Daniel S. Rokhsar¹, Scott A. Jackson⁴

Institutions: United States Department of Energy¹, North Dakota State University², United States Department of Agriculture³, University of Georgia⁴, Institut national de la recherche agronomique⁵, Tennessee State University⁶, Colorado State University⁷, University of California, Davis⁸, Michigan State University⁹, University of Arizona¹⁰, University of Paris-Sud¹¹, University of Nebraska–Lincoln¹²

01 Jul 2014, Nature Genetics

TL;DR: Two independent domestications from genetic pools that diverged before human colonization are confirmed, and a set of genes linked with increased leaf and seed size is identified and combined with quantitative trait locus data from Mesoamerican cultivars.

Abstract: Common bean (Phaseolus vulgaris L.) is the most important grain legume for human consumption and has a role in sustainable agriculture owing to its ability to fix atmospheric nitrogen. We assembled 473 Mb of the 587-Mb genome and genetically anchored 98% of this sequence in 11 chromosome-scale pseudomolecules. We compared the genome for the common bean against the soybean genome to find changes in soybean resulting from polyploidy. Using resequencing of 60 wild individuals and 100 landraces from the genetically differentiated Mesoamerican and Andean gene pools, we confirmed 2 independent domestications from genetic pools that diverged before human colonization. Less than 10% of the 74 Mb of sequence putatively involved in domestication was shared by the two domestication events. We identified a set of genes linked with increased leaf and seed size and combined these results with quantitative trait locus data from Mesoamerican cultivars. Genes affected by domestication may be useful for genomics-enabled crop improvement.

1,012 citations

Journal Article

The high-quality draft genome of peach (Prunus persica) identifies unique patterns of genetic diversity, domestication and genome evolution

Ignazio Verde¹, Albert G. Abbott², Simone Scalabrin, Sook Jung³, Shengqiang Shu⁴, Fabio Marroni, Tatyana Zhebentyayeva², Maria Teresa Dettori, Jane Grimwood⁴, Federica Cattonaro, Andrea Zuccolo, Laura Rossini⁵, Jerry Jenkins⁴, Elisa Vendramin, Lee A. Meisel⁶, Véronique Decroocq, Bryon Sosinski⁷, Simon E. Prochnik⁴, Therese Mitros⁸, Alberto Policriti, Guido Cipriani, Luca Dondini⁹, Stephen P. Ficklin³, David Goodstein⁴, Pengfei Xuan², Cristian Del Fabbro, Valeria Aramini, Dario Copetti, Susana González¹⁰, David S. Horner¹¹, Rachele Falchi¹², Susan Lucas⁴, Erica Mica, Jonathan Maldonado⁶, Barbara Lazzari⁵, Douglas G. Bielenberg², Raul Pirona¹¹, Mara Miculan, Abdelali Barakat², Raffaele Testolin, Alessandra Stella⁵, Stefano Tartarini⁹, Pietro Tonutti, Pere Arús¹³, Ariel Orellana¹⁰, Christina E. Wells, Dorrie Main³, Giannina Vizzotto¹², Herman Silva⁶, Francesco Salamini⁵, Jeremy Schmutz⁴, Michele Morgante, Daniel S. Rokhsar⁴

Institutions: Centra¹, Clemson University², Washington State University³, United States Department of Energy⁴, Parco Tecnologico Padano⁵, University of Chile⁶, North Carolina State University⁷, University of California, Berkeley⁸, University of Bologna⁹, Andrés Bello National University¹⁰, University of Milan¹¹, University of Udine¹², University of Barcelona¹³

01 May 2013, Nature Genetics

TL;DR: Comparisons showed that peach has not undergone recent whole-genome duplication, and even though the ancestral triplicated blocks in peach are fragmentary compared to those in grape, all seven paleosets of paralogs from the putative paleoancestor are detectable.

Abstract: Rosaceae is the most important fruit-producing clade, and its key commercially relevant genera (Fragaria, Rosa, Rubus and Prunus) show broadly diverse growth habits, fruit types and compact diploid genomes. Peach, a diploid Prunus species, is one of the best genetically characterized deciduous trees. Here we describe the high-quality genome sequence of peach obtained from a completely homozygous genotype. We obtained a complete chromosome-scale assembly using Sanger whole-genome shotgun methods. We predicted 27,852 protein-coding genes, as well as noncoding RNAs. We investigated the path of peach domestication through whole-genome resequencing of 14 Prunus accessions. The analyses suggest major genetic bottlenecks that have substantially shaped peach genome diversity. Furthermore, comparative analyses showed that peach has not undergone recent whole-genome duplication, and even though the ancestral triplicated blocks in peach are fragmentary compared to those in grape, all seven paleosets of paralogs from the putative paleoancestor are detectable.

935 citations

Cited by

Journal Article

Statistical method for testing the neutral mutation hypothesis by DNA polymorphism

Fumio Tajima¹

Institutions: Kyushu University¹

30 Oct 1989, Genetics

TL;DR: It is suggested that the natural selection against large insertion/deletion is so weak that a large amount of variation is maintained in a population.

11,521 citations

Journal Article

UniProt: the Universal Protein knowledgebase

Rolf Apweiler¹, Amos Marc Bairoch, Cathy H. Wu, Winona C. Barker, Brigitte Boeckmann, Serenella Ferro, Elisabeth Gasteiger, Hongzhan Huang, Rodrigo Lopez, Michele Magrane, Maria Jesus Martin, Darren A. Natale, Claire O'Donovan, Nicole Redaschi, Lai-Su L. Yeh

Institutions: European Bioinformatics Institute¹

01 Jan 2004, Nucleic Acids Research

TL;DR: The Swiss-Prot, TrEMBL and PIR protein database activities have united to form the Universal Protein Knowledgebase (UniProt), which aims to provide a comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase, with extensive cross-references and query interfaces.

Abstract: To provide the scientific community with a single, centralized, authoritative resource for protein sequences and functional information, the Swiss-Prot, TrEMBL and PIR protein database activities have united to form the Universal Protein Knowledgebase (UniProt) consortium. Our mission is to provide a comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase, with extensive cross-references and query interfaces. The central database will have two sections, corresponding to the familiar Swiss-Prot (fully manually curated entries) and TrEMBL (enriched with automated classification, annotation and extensive cross-references). For convenient sequence searches, UniProt also provides several non-redundant sequence databases. The UniProt NREF (UniRef) databases provide representative subsets of the knowledgebase suitable for efficient searching. The comprehensive UniProt Archive (UniParc) is updated daily from many public source databases. The UniProt databases can be accessed online (http://www.uniprot.org) or downloaded in several formats (ftp://ftp.uniprot.org/pub). The scientific community is encouraged to submit data for inclusion in UniProt.

7,298 citations

Journal Article

NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins

Kim D. Pruitt¹, Tatiana Tatusova¹, Donna Maglott¹

Institutions: National Institutes of Health¹

17 Dec 2004, Nucleic Acids Research

TL;DR: The National Center for Biotechnology Information Reference Sequence (RefSeq) database provides a non-redundant collection of sequences representing genomic data, transcripts and proteins that pragmatically includes sequence data that are currently publicly available in the archival databases.

Abstract: The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) provides a non-redundant collection of sequences representing genomic data, transcripts and proteins. Although the goal is to provide a comprehensive dataset representing the complete sequence information for any given species, the database pragmatically includes sequence data that are currently publicly available in the archival databases. The database incorporates data from over 2400 organisms and includes over one million proteins representing significant taxonomic diversity spanning prokaryotes, eukaryotes and viruses. Nucleotide and protein sequences are explicitly linked, and the sequences are linked to other resources including the NCBI Map Viewer and Gene. Sequences are annotated to include coding regions, conserved domains, variation, references, names, database cross-references, and other features using a combined approach of collaboration and other input from the scientific community, automated annotation, propagation from GenBank and curation by NCBI staff.

4,229 citations

Journal Article

Phytozome: a comparative platform for green plant genomics

Institutions: United States Department of Energy¹

01 Jan 2012, Nucleic Acids Research

3,728 citations

Journal Article

Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes

Adam Siepel¹, Gill Bejerano, Jakob Skou Pedersen², Angie S. Hinrichs, Minmei Hou, Kate R. Rosenbloom, Hiram Clawson, John Spieth, LaDeana W. Hillier, Stephen Richards, George M. Weinstock, Richard K. Wilson, Richard A. Gibbs, W. James Kent, Webb Miller, David Haussler

Institutions: University of California, Santa Cruz¹, Aarhus University²

01 Aug 2005, Genome Research

TL;DR: A comprehensive search for conserved elements in vertebrate genomes is conducted using genome-wide multiple alignments of five vertebrate species (human, mouse, rat, chicken, and Fugu rubripes) and a two-state phylogenetic hidden Markov model (phylo-HMM).

Abstract: We have conducted a comprehensive search for conserved elements in vertebrate genomes, using genome-wide multiple alignments of five vertebrate species (human, mouse, rat, chicken, and Fugu rubripes). Parallel searches have been performed with multiple alignments of four insect species (three species of Drosophila and Anopheles gambiae), two species of Caenorhabditis, and seven species of Saccharomyces. Conserved elements were identified with a computer program called phastCons, which is based on a two-state phylogenetic hidden Markov model (phylo-HMM). PhastCons works by fitting a phylo-HMM to the data by maximum likelihood, subject to constraints designed to calibrate the model across species groups, and then predicting conserved elements based on this model. The predicted elements cover roughly 3%-8% of the human genome (depending on the details of the calibration procedure) and substantially higher fractions of the more compact Drosophila melanogaster (37%-53%), Caenorhabditis elegans (18%-37%), and Saccharomyces cerevisiae (47%-68%) genomes. From yeasts to vertebrates, in order of increasing genome size and general biological complexity, increasing fractions of conserved bases are found to lie outside of the exons of known protein-coding genes. In all groups, the most highly conserved elements (HCEs), by log-odds score, are hundreds or thousands of bases long. These elements share certain properties with ultraconserved elements, but they tend to be longer and less perfectly conserved, and they overlap genes of somewhat different functional categories. In vertebrates, HCEs are associated with the 3' UTRs of regulatory genes, stable gene deserts, and megabase-sized regions rich in moderately conserved noncoding sequences. Noncoding HCEs also show strong statistical evidence of an enrichment for RNA secondary structure.

3,719 citations
