Author
Chengran Zhou
Other affiliations: China Agricultural University
Bio: Chengran Zhou is an academic researcher from Sichuan University. The author has contributed to research in topics: Phylogenomics & Cricetidae. The author has an h-index of 9 and has co-authored 15 publications receiving 1932 citations. Previous affiliations of Chengran Zhou include China Agricultural University.
Topics: Phylogenomics, Cricetidae, Medicine, Gene, Illumina dye sequencing
Papers
Commonwealth Scientific and Industrial Research Organisation1, Rutgers University2, Heidelberg Institute for Theoretical Studies3, University of Jena4, University of Bonn5, Naturhistorisches Museum6, University of Vienna7, University of Tsukuba8, Landcare Research9, Johns Hopkins University10, University of Hamburg11, Ehime University12, Florida Museum of Natural History13, Staatliches Museum für Naturkunde Stuttgart14, Macquarie University15, Australian National University16, National Evolutionary Synthesis Center17, American Museum of Natural History18, University of Memphis19, University of Guadalajara20, Bavarian Academy of Sciences and Humanities21, Natural History Museum22, Karlsruhe Institute of Technology23, California Academy of Sciences24, South China Agricultural University25, North Carolina State University26, Hokkaido University27
TL;DR: The phylogeny of all major insect lineages reveals how and when insects diversified and provides a comprehensive reliable scaffold for future comparative analyses of evolutionary innovations among insects.
Abstract: Insects are the most speciose group of animals, but the phylogenetic relationships of many major lineages remain unresolved. We inferred the phylogeny of insects from 1478 protein-coding genes. Phylogenomic analyses of nucleotide and amino acid sequences, with site-specific nucleotide or domain-specific amino acid substitution models, produced statistically robust and congruent results resolving previously controversial phylogenetic relationships. We dated the origin of insects to the Early Ordovician [~479 million years ago (Ma)], of insect flight to the Early Devonian (~406 Ma), of major extant lineages to the Mississippian (~345 Ma), and the major diversification of holometabolous insects to the Early Cretaceous. Our phylogenomic study provides a comprehensive reliable scaffold for future comparative analyses of evolutionary innovations among insects.
1,998 citations
University of Copenhagen1, Kunming Institute of Zoology2, Landcare Research3, University of Otago4, Chinese Academy of Sciences5, University of Oxford6, Bowling Green State University7, Otago Polytechnic8, Copenhagen Zoo9, University of Washington10, University of La Rochelle11, Canterbury of New Zealand12, University of Tasmania13, University of Western Australia14, University of Missouri–St. Louis15, Natural Environment Research Council16, University of Giessen17, Percy FitzPatrick Institute of African Ornithology18, Hastings Entertainment19, National Institute of Water and Atmospheric Research20, Norwegian University of Science and Technology21, Museum of New Zealand Te Papa Tongarewa22, Ludwig Maximilian University of Munich23, Wellington Management Company24, Massey University25, La Trobe University26, National Scientific and Technical Research Council27
TL;DR: A novel dataset of 19 high-coverage genomes, together with 2 previously published genomes, encompasses all extant penguin species and demonstrates that the genus Aptenodytes is basal and sister to all other extant penguin genera, providing intriguing new insights into the adaptation of penguins to Antarctica.
Abstract: Penguins (Sphenisciformes) are a remarkable order of flightless wing-propelled diving seabirds distributed widely across the southern hemisphere. They share a volant common ancestor with Procellariiformes close to the Cretaceous-Paleogene boundary (66 million years ago) and subsequently lost the ability to fly but enhanced their diving capabilities. With ∼20 species among 6 genera, penguins range from the tropical Galapagos Islands to the oceanic temperate forests of New Zealand, the rocky coastlines of the sub-Antarctic islands, and the sea ice around Antarctica. To inhabit such diverse and extreme environments, penguins evolved many physiological and morphological adaptations. However, they are also highly sensitive to climate change. Therefore, penguins provide an exciting target system for understanding the evolutionary processes of speciation, adaptation, and demography. Genomic data are an emerging resource for addressing questions about such processes.
89 citations
TL;DR: A new Illumina-based pipeline is presented that recovers full-length COI barcodes from mixed arthropod samples and more species-level operational taxonomic units (OTUs) from bulk insect samples, with fewer untraceable (novel) OTUs.
Abstract:
Metabarcoding of mixed arthropod samples for biodiversity assessment has mostly been carried out on the 454 GS FLX sequencer (Roche, Branford, Connecticut, USA), due to its ability to produce long reads (≥400 bp) that are believed to allow higher taxonomic resolution. The Illumina sequencing platforms, with their much higher throughputs, could potentially reduce sequencing costs and improve sequence quality, but the associated shorter read length (typically <150 bp) has deterred their usage in next-generation-sequencing (NGS)-based analyses of eukaryotic biodiversity, which often utilize standard barcode markers (e.g. COI, rbcL, matK, ITS) that are hundreds of nucleotides long.
We present a new Illumina-based pipeline to recover full-length COI barcodes from mixed arthropod samples. Our new assembly program, SOAPBarcode, a variant of the genome assembly program SOAPdenovo, uses paired-end reads of the standard COI barcode region as anchors to extract the correct pathways (sequences) out of otherwise chaotic ‘de Bruijn graphs’, which are caused by the presence of large numbers of COI homologs of high sequence similarity.
Two bulk insect samples of known species composition were analysed in a recently published 454 metabarcoding study (Yu et al. 2012) and are re-analysed here with our pipeline. Compared to the results of Roche 454 (c. 400-bp reads), our pipeline recovered full-length COI barcodes (658 bp) and 17–31% more species-level operational taxonomic units (OTUs) from bulk insect samples, with fewer untraceable (novel) OTUs. On the other hand, our PCR-based pipeline also revealed higher rates of contamination across samples, due to Illumina's increased sequencing depth. On balance, the assembled full-length barcodes and increased OTU recovery rates resulted in more resolved taxonomic assignments and more accurate beta diversity estimation.
The HiSeq 2000 and the SOAPBarcode pipeline together can achieve more accurate biodiversity assessment at a much reduced sequencing cost in metabarcoding analyses. However, greater precaution is needed to prevent cross-sample contamination during field preparation and laboratory operation because of the greater ability to detect non-target DNA amplicons present in low copy numbers.
56 citations
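The idea of using anchors to pull the correct path out of a tangled de Bruijn graph can be illustrated with a toy sketch. This is not SOAPBarcode's implementation: the function names `build_de_bruijn` and `anchored_paths`, the tiny k-mer size, and the two made-up "homolog" sequences are all assumptions chosen for illustration. The sketch only shows why fixing both a start and an end anchor excludes chimeric paths that mix two similar sequences.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in any read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def anchored_paths(graph, start, end, max_len):
    """Return sequences spelled by simple paths from node `start` to node `end`.

    Paths are abandoned once the spelled sequence reaches `max_len` bases
    without arriving at `end`.
    """
    results = []
    stack = [(start, start, {start})]  # (current node, sequence so far, visited nodes)
    while stack:
        node, seq, seen = stack.pop()
        if node == end:
            results.append(seq)
            continue
        if len(seq) >= max_len:
            continue
        for nxt in sorted(graph.get(node, ())):
            if nxt not in seen:
                stack.append((nxt, seq + nxt[-1], seen | {nxt}))
    return sorted(results)


# Two hypothetical homologs that share their internal k-mers but differ at both ends.
homolog_a = "ACGTTGCA"
homolog_b = "TCGTTGCC"
graph = build_de_bruijn([homolog_a, homolog_b], k=4)

# Anchoring on homolog A's own start and end recovers only homolog A,
# even though the graph also contains a chimeric route ending in "GCC".
print(anchored_paths(graph, "ACG", "GCA", max_len=8))
```

In this toy graph the shared middle section joins the two sequences, so a walk from `ACG` could also end at `GCC`. Requiring the matching end anchor discards that chimera.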
01 Jan 2014
TL;DR: Misof et al. performed a phylogenomic analysis of protein-coding genes from all major insect orders and close relatives, and used the resolved phylogenetic tree together with fossil analysis to date the origin of insects to ~479 million years ago and to resolve long-controversial subjects in insect phylogeny.
Abstract: Toward an insect evolution resolution. Insects are the most diverse group of animals, with the largest number of species. However, many of the evolutionary relationships between insect species have been controversial and difficult to resolve. Misof et al. performed a phylogenomic analysis of protein-coding genes from all major insect orders and close relatives, resolving the placement of taxa. The authors used this resolved phylogenetic tree together with fossil analysis to date the origin of insects to ~479 million years ago and to resolve long-controversial subjects in insect phylogeny. Science, this issue p. 763. The phylogeny of all major insect lineages reveals how and when insects diversified. Insects are the most speciose group of animals, but the phylogenetic relationships of many major lineages remain unresolved. We inferred the phylogeny of insects from 1478 protein-coding genes. Phylogenomic analyses of nucleotide and amino acid sequences, with site-specific nucleotide or domain-specific amino acid substitution models, produced statistically robust and congruent results resolving previously controversial phylogenetic relationships. We dated the origin of insects to the Early Ordovician [~479 million years ago (Ma)], of insect flight to the Early Devonian (~406 Ma), of major extant lineages to the Mississippian (~345 Ma), and the major diversification of holometabolous insects to the Early Cretaceous. Our phylogenomic study provides a comprehensive reliable scaffold for future comparative analyses of evolutionary innovations among insects.
52 citations
TL;DR: Genomic and proteomic analyses of reconstructed bacterial draft genomes from all seven uncultured phylotypes in F1RT indicate that its constituent microbes cooperate in both cellulose-degrading and other important metabolic processes.
Abstract: Reaching a comprehensive understanding of how nature solves the problem of degrading recalcitrant biomass may eventually allow development of more efficient biorefining processes. Here we interpret genomic and proteomic information generated from a cellulolytic microbial consortium (termed F1RT) enriched from soil. Analyses of reconstructed bacterial draft genomes from all seven uncultured phylotypes in F1RT indicate that its constituent microbes cooperate in both cellulose-degrading and other important metabolic processes. Support for cellulolytic inter-species cooperation came from the discovery of F1RT microbes that encode and express complementary enzymatic inventories that include both extracellular cellulosomes and secreted free-enzyme systems. Metabolic reconstruction of the seven F1RT phylotypes predicted a wider genomic rationale as to how this particular community functions as well as possible reasons as to why biomass conversion in nature relies on a structured and cooperative microbial community.
42 citations
Cited by
TL;DR: PartitionFinder 2 is a program for automatically selecting best-fit partitioning schemes and models of evolution for phylogenetic analyses that includes the ability to analyze morphological datasets, new methods to analyze genome-scale datasets, and new output formats to facilitate interoperability with downstream software.
Abstract: PartitionFinder 2 is a program for automatically selecting best-fit partitioning schemes and models of evolution for phylogenetic analyses. PartitionFinder 2 is substantially faster and more efficient than version 1, and incorporates many new methods and features. These include the ability to analyze morphological datasets, new methods to analyze genome-scale datasets, new output formats to facilitate interoperability with downstream software, and many new models of molecular evolution. PartitionFinder 2 is freely available under an open source license and works on Windows, OSX, and Linux operating systems. It can be downloaded from www.robertlanfear.com/partitionfinder. The source code is available at https://github.com/brettc/partitionfinder.
3,445 citations
01 Jan 2011
TL;DR: The sheer volume and scope of genome-wide data pose a significant challenge to the development of efficient and intuitive visualization tools able to scale to very large data sets and to flexibly integrate multiple data types, including clinical data.
Abstract: Rapid improvements in sequencing and array-based platforms are resulting in a flood of diverse genome-wide data, including data from exome and whole-genome sequencing, epigenetic surveys, expression profiling of coding and noncoding RNAs, single nucleotide polymorphism (SNP) and copy number profiling, and functional assays. Analysis of these large, diverse data sets holds the promise of a more comprehensive understanding of the genome and its relation to human disease. Experienced and knowledgeable human review is an essential component of this process, complementing computational approaches. This calls for efficient and intuitive visualization tools able to scale to very large data sets and to flexibly integrate multiple data types, including clinical data. However, the sheer volume and scope of data pose a significant challenge to the development of such tools.
2,187 citations
TL;DR: RAxML-NG is presented, a from-scratch re-implementation of the established greedy tree search algorithm of RAxML/ExaML, which offers improved accuracy, flexibility, speed, scalability, and usability compared with RAxML/ExaML.
Abstract: MOTIVATION Phylogenies are important for fundamental biological research, but also have numerous applications in biotechnology, agriculture and medicine. Finding the optimal tree under the popular maximum likelihood (ML) criterion is known to be NP-hard. Thus, highly optimized and scalable codes are needed to analyze constantly growing empirical datasets. RESULTS We present RAxML-NG, a from-scratch re-implementation of the established greedy tree search algorithm of RAxML/ExaML. RAxML-NG offers improved accuracy, flexibility, speed, scalability, and usability compared with RAxML/ExaML. On taxon-rich datasets, RAxML-NG typically finds higher-scoring trees than IQTree, an increasingly popular recent tool for ML-based phylogenetic inference (although IQ-Tree shows better stability). Finally, RAxML-NG introduces several new features, such as the detection of terraces in tree space and the recently introduced transfer bootstrap support metric. AVAILABILITY AND IMPLEMENTATION The code is available under GNU GPL at https://github.com/amkozlov/raxml-ng. RAxML-NG web service (maintained by Vital-IT) is available at https://raxml-ng.vital-it.ch/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
1,765 citations
TL;DR: This work presents BUSCO v3 with example analyses that highlight the wide‐ranging utility of BUSCO assessments, which extend beyond quality control of genomics data sets to applications in comparative genomics analyses, gene predictor training, metagenomics, and phylogenomics.
Abstract: Genomics promises comprehensive surveying of genomes and metagenomes, but rapidly changing technologies and expanding data volumes make evaluation of completeness a challenging task. Technical sequencing quality metrics can be complemented by quantifying completeness of genomic data sets in terms of the expected gene content of Benchmarking Universal Single-Copy Orthologs (BUSCO, http://busco.ezlab.org). The latest software release implements a complete refactoring of the code to make it more flexible and extendable to facilitate high-throughput assessments. The original six lineage assessment data sets have been updated with improved species sampling, 34 new subsets have been built for vertebrates, arthropods, fungi, and prokaryotes that greatly enhance resolution, and data sets are now also available for nematodes, protists, and plants. Here, we present BUSCO v3 with example analyses that highlight the wide-ranging utility of BUSCO assessments, which extend beyond quality control of genomics data sets to applications in comparative genomics analyses, gene predictor training, metagenomics, and phylogenomics.
1,575 citations
TL;DR: The results of the divergence time analyses are congruent with the palaeontological record, supporting a major radiation of crown birds in the wake of the Cretaceous–Palaeogene (K–Pg) mass extinction.
Abstract: Although reconstruction of the phylogeny of living birds has progressed tremendously in the last decade, the evolutionary history of Neoaves--a clade that encompasses nearly all living bird species--remains the greatest unresolved challenge in dinosaur systematics. Here we investigate avian phylogeny with an unprecedented scale of data: >390,000 bases of genomic sequence data from each of 198 species of living birds, representing all major avian lineages, and two crocodilian outgroups. Sequence data were collected using anchored hybrid enrichment, yielding 259 nuclear loci with an average length of 1,523 bases for a total data set of over 7.8 × 10^7 bases. Bayesian and maximum likelihood analyses yielded highly supported and nearly identical phylogenetic trees for all major avian lineages. Five major clades form successive sister groups to the rest of Neoaves: (1) a clade including nightjars, other caprimulgiforms, swifts, and hummingbirds; (2) a clade uniting cuckoos, bustards, and turacos with pigeons, mesites, and sandgrouse; (3) cranes and their relatives; (4) a comprehensive waterbird clade, including all diving, wading, and shorebirds; and (5) a comprehensive landbird clade with the enigmatic hoatzin (Opisthocomus hoazin) as the sister group to the rest. Neither of the two main, recently proposed Neoavian clades--Columbea and Passerea--were supported as monophyletic. The results of our divergence time analyses are congruent with the palaeontological record, supporting a major radiation of crown birds in the wake of the Cretaceous-Palaeogene (K-Pg) mass extinction.
1,094 citations
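The data-set size quoted in the abstract above can be cross-checked with simple arithmetic. The sketch below assumes every species contributes every locus at the average length, which ignores missing data and the two outgroups. The variable names are illustrative only.

```python
# Figures taken from the abstract; the assumption of complete, average-length
# loci for every species is a simplification.
loci = 259
mean_locus_length = 1523  # bases
species = 198

bases_per_species = loci * mean_locus_length
total_bases = bases_per_species * species

print(bases_per_species)  # 394457, consistent with ">390,000 bases" per species
print(total_bases)        # 78102486, roughly 78 million bases in total
```

Under these assumptions the per-species and total figures both agree with the numbers reported in the abstract.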