Author

Jing Zhang

Other affiliations: University of Texas at Dallas
Bio: Jing Zhang is an academic researcher at the University of Texas Southwestern Medical Center. The author has contributed to research on the topics Genus and Subspecies, has an h-index of 9, and has co-authored 26 publications receiving 218 citations. Previous affiliations of Jing Zhang include the University of Texas at Dallas.

Papers
Journal ArticleDOI
11 Nov 2021-Science
TL;DR: Protein-protein interactions play critical roles in biology, but the structures of many eukaryotic protein complexes are unknown, and many interactions have likely not yet been identified.
Abstract: Protein-protein interactions play critical roles in biology, but the structures of many eukaryotic protein complexes are unknown, and there are likely many interactions not yet identified. We take ...

215 citations

Journal ArticleDOI
TL;DR: The genomes of 250 representative species of skippers reveal rampant inconsistencies between their current classification and a genome-based phylogeny, and a dated genomic tree is used to define tribes, overhaul genera, and display convergence in wing patterns that fooled researchers for decades.
Abstract: For centuries, biologists have used phenotypes to infer evolution. For decades, a handful of gene markers have given us a glimpse of the genotype to combine with phenotypic traits. Today, we can sequence entire genomes from hundreds of species and gain yet closer scrutiny. To illustrate the power of genomics, we have chosen skipper butterflies (Hesperiidae). The genomes of 250 representative species of skippers reveal rampant inconsistencies between their current classification and a genome-based phylogeny. We use a dated genomic tree to define tribes (six new) and subtribes (six new), to overhaul genera (nine new) and subgenera (three new), and to display convergence in wing patterns that fooled researchers for decades. We find that many skippers with similar appearance are distantly related, and several skippers with distinct morphology are close relatives. These conclusions are strongly supported by different genomic regions and are consistent with some morphological traits. Our work is a forerunner to genomic biology shaping biodiversity research.

82 citations

Journal ArticleDOI
TL;DR: The gypsy moth genome is reported and the genetic features that distinguish gypsy moths from other Lepidoptera are explored, and insights into gene-expression changes of the EGM in response to virus infection are presented, which may assist in the design of viral bioinsecticides.
Abstract: Since its accidental introduction to Massachusetts in the late 1800s, the European gypsy moth (EGM; Lymantria dispar dispar) has become a major defoliator in North American forests. However, in part because females are flightless, the spread of the EGM across the United States and Canada has been relatively slow over the past 150 years. In contrast, females of the Asian gypsy moth (AGM; Lymantria dispar asiatica) subspecies have fully developed wings and can fly, thereby posing a serious economic threat if populations are established in North America. To explore the genetic determinants of these phenotypic differences, we sequenced and annotated a draft genome of L. dispar and used it to identify genetic variation between EGM and AGM populations. The 865-Mb gypsy moth genome is the largest Lepidoptera genome sequenced to date and encodes ∼13,300 proteins. Gene ontology analyses of EGM and AGM samples revealed divergence between these populations in genes enriched for several gene ontology categories related to muscle adaptation, chemosensory communication, detoxification of food plant foliage, and immunity. These genetic differences likely contribute to variations in flight ability, chemical sensing, and pathogen interactions among EGM and AGM populations. Finally, we use our new genomic and transcriptomic tools to provide insights into genome-wide gene-expression changes of the gypsy moth after viral infection. Characterizing the immunological response of gypsy moths to virus infection may aid in the improvement of virus-based bioinsecticides currently used to control larval populations.

33 citations

Journal ArticleDOI
TL;DR: Firetip skipper butterflies (subfamily Pyrrhopyginae) were chosen as a model because of their phenotypic diversity and abundance of mimicry, and whole genomes of nearly 120 representative species were sequenced and analysed.
Abstract: Biologists marvel at the powers of adaptive convergence, when distantly related animals look alike. While mimetic wing patterns of butterflies have fooled predators for millennia, entomologists inferred that mimics were distant relatives despite similar appearance. However, the obverse question has not been frequently asked. Who are the close relatives of mimetic butterflies and what are their features? As opposed to close convergence, divergence from a non-mimetic relative would also be extreme. When closely related animals look unalike, it is challenging to pair them. Genomic analysis promises to elucidate evolutionary relationships and shed light on molecular mechanisms of divergence. We chose the firetip skipper butterfly as a model due to its phenotypic diversity and abundance of mimicry. We sequenced and analysed whole genomes of nearly 120 representative species. Genomes partitioned this subfamily Pyrrhopyginae into five tribes (1 new), 23 genera and, additionally, 22 subgenera (10 new). The largest tribe Pyrrhopygini is divided into four subtribes (three new). Surprisingly, we found five cases where a uniquely patterned butterfly was formerly placed in a genus of its own and separately from its close relatives. In several cases, extreme and rapid phenotypic divergence involved not only wing patterns but also the structure of the male genitalia. The visually striking wing pattern difference between close relatives frequently involves disappearance or suffusion of spots and colour exchange between orange and blue. These differences (in particular, a transition between unspotted black and striped wings) happen recurrently on a short evolutionary time scale, and are therefore probably achieved by a small number of mutations.

32 citations

Posted ContentDOI
04 Nov 2019-bioRxiv
TL;DR: All 845 species of butterflies recorded from North America north of Mexico are sequenced, revealing the pattern of diversification and adaptation in this phylogenetic lineage as it spread over the continent, a pattern that cannot be seen in a sample of selected species.
Abstract: Never before have we had the luxury of choosing a continent, picking a large phylogenetic group of animals, and obtaining genomic data for its every species. Here, we sequence all 845 species of butterflies recorded from North America north of Mexico. Our comprehensive approach reveals the pattern of diversification and adaptation occurring in this phylogenetic lineage as it has spread over the continent, which cannot be seen on a sample of selected species. We observe bursts of diversification that generated taxonomic ranks: subfamily, tribe, subtribe, genus, and species. The older burst around 70 Mya resulted in the butterfly subfamilies, with the major evolutionary inventions being unique phenotypic traits shaped by high positive selection and gene duplications. The recent burst around 5 Mya is caused by explosive radiation in diverse butterfly groups associated with diversification in transcription and mRNA regulation, morphogenesis, and mate selection. Rapid radiation correlates with more frequent introgression of speciation-promoting and beneficial genes among radiating species. Radiation and extinction patterns over the last 100 million years suggest the following general model of animal evolution. A population spreads over the land, adapts to various conditions through mutations, and diversifies into several species. Occasional hybridization between these species results in accumulation of beneficial alleles in one, which eventually survives, while others become extinct. Not only butterflies, but also the hominids may have followed this path.

25 citations


Cited by
10 Dec 2007
TL;DR: The experiments on both rice and human genome sequences demonstrate that EVM produces automated gene structure annotation approaching the quality of manual curation.
Abstract: EVidenceModeler (EVM) is presented as an automated eukaryotic gene structure annotation tool that reports eukaryotic gene structures as a weighted consensus of all available evidence. EVM, when combined with the Program to Assemble Spliced Alignments (PASA), yields a comprehensive, configurable annotation system that predicts protein-coding genes and alternatively spliced isoforms. Our experiments on both rice and human genome sequences demonstrate that EVM produces automated gene structure annotation approaching the quality of manual curation.

1,528 citations

Journal ArticleDOI
TL;DR: STRING collects and integrates protein-protein interactions, both physical interactions and functional associations, from a number of sources: automated text mining of the scientific literature, computational interaction predictions from co-expression, conserved genomic context, databases of interaction experiments and known complexes/pathways from curated sources.
Abstract: Much of the complexity within cells arises from functional and regulatory interactions among proteins. The core of these interactions is increasingly known, but novel interactions continue to be discovered, and the information remains scattered across different database resources, experimental modalities and levels of mechanistic detail. The STRING database (https://string-db.org/) systematically collects and integrates protein–protein interactions, both physical interactions as well as functional associations. The data originate from a number of sources: automated text mining of the scientific literature, computational interaction predictions from co-expression, conserved genomic context, databases of interaction experiments and known complexes/pathways from curated sources. All of these interactions are critically assessed, scored, and subsequently automatically transferred to less well-studied organisms using hierarchical orthology information. The data can be accessed via the website, but also programmatically and via bulk downloads. The most recent developments in STRING (version 12.0) are: (i) it is now possible to create, browse and analyze a full interaction network for any novel genome of interest, by submitting its complement of encoded proteins, (ii) the co-expression channel now uses variational auto-encoders to predict interactions, and it covers two new sources, single-cell RNA-seq and experimental proteomics data and (iii) the confidence in each experimentally derived interaction is now estimated based on the detection method used, and communicated to the user in the web-interface. Furthermore, STRING continues to enhance its facilities for functional enrichment analysis, which are now fully available also for user-submitted genomes.

127 citations

Journal ArticleDOI
21 Jul 2022-Science
TL;DR: Wang et al. describe two deep learning methods for designing proteins that contain prespecified functional sites, enabling the scaffolding of desired functional residues within a well-folded designed protein.
Abstract: The binding and catalytic functions of proteins are generally mediated by a small number of functional residues held in place by the overall protein structure. Here, we describe deep learning approaches for scaffolding such functional sites without needing to prespecify the fold or secondary structure of the scaffold. The first approach, “constrained hallucination,” optimizes sequences such that their predicted structures contain the desired functional site. The second approach, “inpainting,” starts from the functional site and fills in additional sequence and structure to create a viable protein scaffold in a single forward pass through a specifically trained RoseTTAFold network. We use these two methods to design candidate immunogens, receptor traps, metalloproteins, enzymes, and protein-binding proteins and validate the designs using a combination of in silico and experimental tests.
Description: Designing around function. Protein design has had success in finding sequences that fold into a desired conformation, but designing functional proteins remains challenging. Wang et al. describe two deep-learning methods to design proteins that contain prespecified functional sites. In the first, they found sequences predicted to fold into stable structures that contain the functional site. In the second, they retrained a structure prediction network to recover the sequence and full structure of a protein given only the functional site. The authors demonstrate their methods by designing proteins containing a variety of functional motifs. (VV) Deep-learning methods enable the scaffolding of desired functional residues within a well-folded designed protein.

118 citations

Posted ContentDOI
06 Sep 2022-bioRxiv
TL;DR: A sequence-to-sequence transformer with invariant geometric input processing layers achieves 51% native sequence recovery on structurally held-out backbones with 72% recovery for buried residues, an overall improvement of almost 10 percentage points over existing methods.
Abstract: We consider the problem of predicting a protein sequence from its backbone atom coordinates. Machine learning approaches to this problem to date have been limited by the number of available experimentally determined protein structures. We augment training data by nearly three orders of magnitude by predicting structures for 12M protein sequences using AlphaFold2. Trained with this additional data, a sequence-to-sequence transformer with invariant geometric input processing layers achieves 51% native sequence recovery on structurally held-out backbones with 72% recovery for buried residues, an overall improvement of almost 10 percentage points over existing methods. The model generalizes to a variety of more complex tasks including design of protein complexes, partially masked structures, binding interfaces, and multiple states.

109 citations