Author

Wang Jun

Bio: Wang Jun is an academic researcher who has contributed to research on the topics Phylogenetic tree and Genome evolution. The author has an h-index of 2 and has co-authored 2 publications receiving 1466 citations.

Topics: Phylogenetic tree, Genome evolution, Coalescent theory, Neognathae, Neoaves ...

Papers

Journal Article

Whole-genome analyses resolve early branches in the tree of life of modern birds

Erich D. Jarvis¹, Siavash Mirarab², Andre J. Aberer³, Bo Li⁴, Bo Li⁵, Bo Li⁶, Peter Houde⁷, Cai Li⁵, Cai Li⁶, Simon Y. W. Ho⁸, Brant C. Faircloth⁹, Benoit Nabholz, Jason T. Howard¹, Alexander Suh¹⁰, Claudia C. Weber¹⁰, Rute R. da Fonseca¹¹, Jianwen Li, Fang Zhang Zhang, Hui Li, Long Zhou, Nitish Narula⁷, Nitish Narula¹², Liang Liu¹³, Ganesh Ganapathy¹, Bastien Boussau, Shamsuzzoha Bayzid², Volodymyr Zavidovych¹, Sankar Subramanian¹⁴, Toni Gabaldón¹⁵, Salvador Capella-Gutierrez, Jaime Huerta-Cepas, Bhanu Rekepalli¹⁶, Bhanu Rekepalli¹⁷, Kasper Munch¹⁸, Mikkel H. Schierup¹⁸, Bent E. K. Lindow¹¹, Wesley C. Warren¹⁹, David A. Ray, Richard E. Green²⁰, Michael William Bruford²¹, Xiangjiang Zhan²¹, Xiangjiang Zhan²², Andrew Dixon, Shengbin Li⁴, Ning Li²³, Yinhua Huang²³, Elizabeth P. Derryberry²⁴, Elizabeth P. Derryberry²⁵, Mads F. Bertelsen²⁶, Frederick H. Sheldon²⁵, Robb T. Brumfield²⁵, Claudio V. Mello²⁷, Claudio V. Mello²⁸, Peter V. Lovell²⁸, Morgan Wirthlin²⁸, Maria Paula Cruz Schneider²⁷, Francisco Prosdocimi²⁷, José Alfredo Samaniego¹¹, Amhed Missael Vargas Velazquez¹¹, Alonzo Alfaro-Núñez¹¹, Paula F. Campos¹¹, Bent O. Petersen²⁹, Thomas Sicheritz-Pontén²⁹, An Pas, Thomas L. Bailey, R. Paul Scofield³⁰, Michael Bunce³¹, David M. Lambert¹⁴, Qi Zhou, Polina L. Perelman³², Amy C. Driskell³³, Beth Shapiro²⁰, Zijun Xiong, Yongli Zeng, Shiping Liu, Zhenyu Li, Binghang Liu, Kui Wu, Jin Xiao, Xiong Yinqi, Quiemei Zheng, Yong Zhang, Huanming Yang, Jian Wang, Linnéa Smeds¹⁰, Frank E. Rheindt³⁴, Michael J. Braun³⁵, Jon Fjeldså¹¹, Ludovic Orlando¹¹, F. Keith Barker⁶, Knud A. Jønsson⁶, Warren E. Johnson³³, Klaus-Peter Koepfli³³, Stephen J. O'Brien³⁶, David Haussler, Oliver A. Ryder, Carsten Rahbek⁶, Eske Willerslev¹¹, Gary R. Graves⁶, Gary R. Graves³³, Travis C. Glenn¹³, John E. McCormack³⁷, Dave Burt³⁸, Hans Ellegren¹⁰, Per Alström, Scott V. Edwards³⁹, Alexandros Stamatakis³, David P. Mindell⁴⁰, Joel Cracraft⁶, Edward L. Braun⁴¹, Tandy Warnow⁴², Tandy Warnow², Wang Jun, M. Thomas P. Gilbert³¹, M. Thomas P. Gilbert⁶, Guojie Zhang¹¹, Guojie Zhang⁵

Institutions (42)

Duke University¹, University of Texas at Austin², Heidelberg Institute for Theoretical Studies³, Xi'an Jiaotong University⁴, Beijing Genomics Institute⁵, American Museum of Natural History⁶, New Mexico State University⁷, University of Sydney⁸, University of California⁹, Uppsala University¹⁰, University of Copenhagen¹¹, Okinawa Institute of Science and Technology¹², University of Georgia¹³, Griffith University¹⁴, Catalan Institution for Research and Advanced Studies¹⁵, Oak Ridge National Laboratory¹⁶, Joint Institute for Nuclear Research¹⁷, Aarhus University¹⁸, Washington University in St. Louis¹⁹, University of California, Santa Cruz²⁰, Cardiff University²¹, Kunming Institute of Zoology²², China Agricultural University²³, Tulane University²⁴, Louisiana State University²⁵, Copenhagen Zoo²⁶, Federal University of Pará²⁷, Oregon Health & Science University²⁸, Technical University of Denmark²⁹, Canterbury Museum³⁰, Curtin University³¹, Novosibirsk State University³², Smithsonian Institution³³, National University of Singapore³⁴, National Museum of Natural History³⁵, Nova Southeastern University³⁶, Occidental College³⁷, University of Edinburgh³⁸, Harvard University³⁹, University of California, San Francisco⁴⁰, University of Florida⁴¹, University of Illinois at Urbana–Champaign⁴²

12 Dec 2014-Science

TL;DR: A genome-scale phylogenetic analysis of 48 species representing all orders of Neoaves recovered a highly resolved tree that confirms previously controversial sister or close relationships and identifies the first divergence in Neoaves, two groups the authors named Passerea and Columbea.

Abstract: To better determine the history of modern birds, we performed a genome-scale phylogenetic analysis of 48 species representing all orders of Neoaves using phylogenomic methods created to handle genome-scale data. We recovered a highly resolved tree that confirms previously controversial sister or close relationships. We identified the first divergence in Neoaves, two groups we named Passerea and Columbea, representing independent lineages of diverse and convergently evolved land and water bird species. Among Passerea, we infer the common ancestor of core landbirds to have been an apex predator and confirm independent gains of vocal learning. Among Columbea, we identify pigeons and flamingoes as belonging to sister clades. Even with whole genomes, some of the earliest branches in Neoaves proved challenging to resolve, which was best explained by massive protein-coding sequence convergence and high levels of incomplete lineage sorting that occurred during a rapid radiation after the Cretaceous-Paleogene mass extinction event about 66 million years ago.

1,624 citations

Journal Article

Phylogenomic analyses data of the avian phylogenomics project.

Erich D. Jarvis¹, Siavash Mirarab², Andre J. Aberer³, Bo Li⁴, Bo Li⁵, Peter Houde⁶, Cai Li⁴, Simon Y. W. Ho⁷, Brant C. Faircloth⁸, Brant C. Faircloth⁹, Benoit Nabholz¹⁰, Jason T. Howard¹, Alexander Suh¹¹, Claudia C. Weber¹¹, Rute R. da Fonseca⁴, Alonzo Alfaro-Núñez⁴, Nitish Narula¹², Nitish Narula⁶, Liang Liu¹³, Dave Burt¹⁴, Hans Ellegren¹¹, Scott V. Edwards¹⁵, Alexandros Stamatakis¹⁶, Alexandros Stamatakis³, David P. Mindell¹⁷, Joel Cracraft¹⁸, Edward L. Braun¹⁹, Tandy Warnow², Wang Jun, M. Thomas P. Gilbert²⁰, M. Thomas P. Gilbert⁴, Guojie Zhang⁴

Institutions (20)

Howard Hughes Medical Institute¹, University of Texas at Austin², Heidelberg Institute for Theoretical Studies³, University of Copenhagen⁴, Xi'an Jiaotong University⁵, New Mexico State University⁶, University of Sydney⁷, University of California, Los Angeles⁸, Louisiana State University⁹, University of Montpellier¹⁰, Uppsala University¹¹, Okinawa Institute of Science and Technology¹², University of Georgia¹³, University of Edinburgh¹⁴, Harvard University¹⁵, Karlsruhe Institute of Technology¹⁶, University of California, San Francisco¹⁷, American Museum of Natural History¹⁸, University of Florida¹⁹, Curtin University²⁰

12 Feb 2015-GigaScience

TL;DR: The Avian Phylogenomics Project is the largest vertebrate phylogenomics project to date and the sequence, alignment, and tree data are expected to accelerate analyses in phylogenomics and other related areas.

Abstract: Background: Determining the evolutionary relationships among the major lineages of extant birds has been one of the biggest challenges in systematic biology. To address this challenge, we assembled or collected the genomes of 48 avian species spanning most orders of birds, including all Neognathae and two of the five Palaeognathae orders. We used these genomes to construct a genome-scale avian phylogenetic tree and perform comparative genomic analyses. Findings: Here we present the datasets associated with the phylogenomic analyses, which include sequence alignment files consisting of nucleotides, amino acids, indels, and transposable elements, as well as tree files containing gene trees and species trees. Inferring an accurate phylogeny required generating: 1) A well annotated data set across species based on genome synteny; 2) Alignments with unaligned or incorrectly overaligned sequences filtered out; and 3) Diverse data sets, including genes and their inferred trees, indels, and transposable elements. Our total evidence nucleotide tree (TENT) data set (consisting of exons, introns, and UCEs) gave what we consider our most reliable species tree when using the concatenation-based ExaML algorithm or when using statistical binning with the coalescence-based MP-EST algorithm (which we refer to as MP-EST*). Other data sets, such as the coding sequence of some exons, revealed other properties of genome evolution, namely convergence. Conclusions: The Avian Phylogenomics Project is the largest vertebrate phylogenomics project to date that we are aware of. The sequence, alignment, and tree data are expected to accelerate analyses in phylogenomics and other related areas.

84 citations

Cited by

Journal Article

Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation

Nuala A. O'Leary¹, Mathew W. Wright¹, J. Rodney Brister¹, Stacy Ciufo¹, Diana Haddad¹, Richard McVeigh¹, Bhanu Rajput¹, Barbara Robbertse¹, Brian Smith-White¹, Danso Ako-adjei¹, Alexander Astashyn¹, Azat Badretdin¹, Yiming Bao¹, Olga Blinkova¹, Vyacheslav Brover¹, Vyacheslav Chetvernin¹, Jinna Choi¹, Eric Cox¹, Olga Ermolaeva¹, Catherine M. Farrell¹, Tamara Goldfarb¹, Tripti Gupta¹, Daniel H. Haft¹, Eneida L. Hatcher¹, Wratko Hlavina¹, Vinita Joardar¹, Vamsi K. Kodali¹, Wenjun Li¹, Donna Maglott¹, Patrick Masterson¹, Kelly M. McGarvey¹, Michael R. Murphy¹, Kathleen O'Neill¹, Shashikant Pujar¹, Sanjida H. Rangwala¹, Daniel Rausch¹, Lillian D. Riddick¹, Conrad L. Schoch¹, Andrei Shkeda¹, Susan S. Storz¹, Hanzhen Sun¹, Françoise Thibaud-Nissen¹, Igor Tolstoy¹, Raymond E. Tully¹, Anjana R. Vatsan¹, Craig Wallin¹, David Webb¹, Wendy Wu¹, Melissa J. Landrum¹, Avi Kimchi¹, Tatiana Tatusova¹, Michael DiCuccio¹, Paul Kitts¹, Terence Murphy¹, Kim D. Pruitt¹

Institutions (1)

National Institutes of Health¹

04 Jan 2016-Nucleic Acids Research

TL;DR: The approach to utilizing available RNA-Seq and other data types in the authors' manual curation process for vertebrate, plant, and other species is summarized, and a new direction for prokaryotic genomes and protein name management is described.

Abstract: The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.nih.gov/refseq/). The RefSeq project leverages the data submitted to the International Nucleotide Sequence Database Collaboration (INSDC) against a combination of computation, manual curation, and collaboration to produce a standard set of stable, non-redundant reference sequences. The RefSeq project augments these reference sequences with current knowledge including publications, functional features and informative nomenclature. The database currently represents sequences from more than 55,000 organisms (>4800 viruses, >40,000 prokaryotes and >10,000 eukaryotes; RefSeq release 71), ranging from a single record to complete genomes. This paper summarizes the current status of the viral, prokaryotic, and eukaryotic branches of the RefSeq project, reports on improvements to data access and details efforts to further expand the taxonomic representation of the collection. We also highlight diverse functional curation initiatives that support multiple uses of RefSeq data including taxonomic validation, genome annotation, comparative genomics, and clinical testing. We summarize our approach to utilizing available RNA-Seq and other data types in our manual curation process for vertebrate, plant, and other species, and describe a new direction for prokaryotic genomes and protein name management.

4,104 citations

Journal Article

PartitionFinder 2: New Methods for Selecting Partitioned Models of Evolution for Molecular and Morphological Phylogenetic Analyses.

Robert Lanfear¹, Paul B. Frandsen², April M. Wright³, Tereza Senfeld⁴, Brett Calcott⁵

Institutions (5)

Australian National University¹, Smithsonian Institution², Iowa State University³, Macquarie University⁴, University of Sydney⁵

23 Dec 2016-Molecular Biology and Evolution

TL;DR: PartitionFinder 2 is a program for automatically selecting best-fit partitioning schemes and models of evolution for phylogenetic analyses that includes the ability to analyze morphological datasets, new methods to analyze genome-scale datasets, and new output formats to facilitate interoperability with downstream software.

Abstract: PartitionFinder 2 is a program for automatically selecting best-fit partitioning schemes and models of evolution for phylogenetic analyses. PartitionFinder 2 is substantially faster and more efficient than version 1, and incorporates many new methods and features. These include the ability to analyze morphological datasets, new methods to analyze genome-scale datasets, new output formats to facilitate interoperability with downstream software, and many new models of molecular evolution. PartitionFinder 2 is freely available under an open source license and works on Windows, OSX, and Linux operating systems. It can be downloaded from www.robertlanfear.com/partitionfinder. The source code is available at https://github.com/brettc/partitionfinder.

3,445 citations

Journal Article

BUSCO Applications from Quality Assessments to Gene Prediction and Phylogenomics.

Robert M. Waterhouse¹, Mathieu Seppey¹, Felipe A. Simão¹, Mosè Manni¹, Panagiotis Ioannidis¹, Guennadi Klioutchnikov¹, Evgenia V. Kriventseva¹, Evgeny M. Zdobnov¹

Institutions (1)

Swiss Institute of Bioinformatics¹

01 Mar 2018-Molecular Biology and Evolution

TL;DR: This work presents BUSCO v3 with example analyses that highlight the wide‐ranging utility of BUSCO assessments, which extend beyond quality control of genomics data sets to applications in comparative genomics analyses, gene predictor training, metagenomics, and phylogenomics.

Abstract: Genomics promises comprehensive surveying of genomes and metagenomes, but rapidly changing technologies and expanding data volumes make evaluation of completeness a challenging task. Technical sequencing quality metrics can be complemented by quantifying completeness of genomic data sets in terms of the expected gene content of Benchmarking Universal Single-Copy Orthologs (BUSCO, http://busco.ezlab.org). The latest software release implements a complete refactoring of the code to make it more flexible and extendable to facilitate high-throughput assessments. The original six lineage assessment data sets have been updated with improved species sampling, 34 new subsets have been built for vertebrates, arthropods, fungi, and prokaryotes that greatly enhance resolution, and data sets are now also available for nematodes, protists, and plants. Here, we present BUSCO v3 with example analyses that highlight the wide-ranging utility of BUSCO assessments, which extend beyond quality control of genomics data sets to applications in comparative genomics analyses, gene predictor training, metagenomics, and phylogenomics.

1,575 citations

Journal Article

ETE 3: Reconstruction, analysis and visualization of phylogenomic data

Jaime Huerta-Cepas, François Serra¹, Peer Bork²

Institutions (2)

Pompeu Fabra University¹, Molecular Medicine Partnership Unit²

26 Feb 2016-Molecular Biology and Evolution

TL;DR: The Environment for Tree Exploration v3 is presented, featuring numerous improvements in the underlying library of methods, and providing a novel set of standalone tools to perform common tasks in comparative genomics and phylogenetics.

Abstract: The Environment for Tree Exploration (ETE) is a computational framework that simplifies the reconstruction, analysis, and visualization of phylogenetic trees and multiple sequence alignments. Here, we present ETE v3, featuring numerous improvements in the underlying library of methods, and providing a novel set of standalone tools to perform common tasks in comparative genomics and phylogenetics. The new features include (i) building gene-based and supermatrix-based phylogenies using a single command, (ii) testing and visualizing evolutionary models, (iii) calculating distances between trees of different size or including duplications, and (iv) providing seamless integration with the NCBI taxonomy database. ETE is freely available at http://etetoolkit.org.

1,452 citations

Journal Article

ASTRAL-III: polynomial time species tree reconstruction from partially resolved gene trees.

Chao Zhang¹, Maryam Rabiee¹, Erfan Sayyari¹, Siavash Mirarab¹

Institutions (1)

University of California, San Diego¹

08 May 2018-BMC Bioinformatics

TL;DR: ASTRAL-III is a faster version of the ASTRAL method for phylogenetic reconstruction and can scale up to 10,000 species and removes low support branches from gene trees, resulting in improved accuracy.

Abstract: Evolutionary histories can be discordant across the genome, and such discordances need to be considered in reconstructing the species phylogeny. ASTRAL is one of the leading methods for inferring species trees from gene trees while accounting for gene tree discordance. ASTRAL uses dynamic programming to search for the tree that shares the maximum number of quartet topologies with input gene trees, restricting itself to a predefined set of bipartitions. We introduce ASTRAL-III, which substantially improves the running time of ASTRAL-II and guarantees polynomial running time as a function of both the number of species ($n$) and the number of genes ($k$). ASTRAL-III limits the bipartition constraint set ($X$) to grow at most linearly with $n$ and $k$. Moreover, it handles polytomies more efficiently than ASTRAL-II, exploits similarities between gene trees better, and uses several techniques to avoid searching parts of the search space that are mathematically guaranteed not to include the optimal tree. The asymptotic running time of ASTRAL-III in the presence of polytomies is $O\left((nk)^{1.726} D\right)$, where $D = O(nk)$ is the sum of degrees of all unique nodes in input trees. The running time improvements enable us to test whether contracting low support branches in gene trees improves the accuracy by reducing noise. In extensive simulations, we show that removing branches with very low support (e.g., below 10%) improves accuracy while overly aggressive filtering is harmful. We observe on a biological avian phylogenomic dataset of 14K genes that contracting low support branches greatly improves results. ASTRAL-III is a faster version of the ASTRAL method for phylogenetic reconstruction and can scale up to 10,000 species. With ASTRAL-III, low support branches can be removed, resulting in improved accuracy.

1,261 citations
