scispace - formally typeset
Author

Jianlin Cheng

Bio: Jianlin Cheng is an academic researcher at the University of Missouri. The author has contributed to research in the topics of protein structure prediction and computer science. The author has an h-index of 55 and has co-authored 240 publications receiving 13,909 citations. Previous affiliations of Jianlin Cheng include the University of Central Florida and the University of Missouri–St. Louis.


Papers
Journal Article
14 Jan 2010, Nature
TL;DR: An accurate soybean genome sequence will facilitate the identification of the genetic basis of many soybean traits, and accelerate the creation of improved soybean varieties.
Abstract: Soybean (Glycine max) is one of the most important crop plants for seed protein and oil content, and for its capacity to fix atmospheric nitrogen through symbioses with soil-borne microorganisms. We sequenced the 1.1-gigabase genome by a whole-genome shotgun approach and integrated it with physical and high-density genetic maps to create a chromosome-scale draft sequence assembly. We predict 46,430 protein-coding genes, 70% more than Arabidopsis and similar to the poplar genome which, like soybean, is an ancient polyploid (palaeopolyploid). About 78% of the predicted genes occur in chromosome ends, which comprise less than one-half of the genome but account for nearly all of the genetic recombination. Genome duplications occurred at approximately 59 and 13 million years ago, resulting in a highly duplicated genome with nearly 75% of the genes present in multiple copies. The two duplication events were followed by gene diversification and loss, and numerous chromosome rearrangements. An accurate soybean genome sequence will facilitate the identification of the genetic basis of many soybean traits, and accelerate the creation of improved soybean varieties.

3,743 citations

Journal Article
TL;DR: SCRATCH is a server for predicting protein tertiary structure and structural features and includes predictors for secondary structure, relative solvent accessibility, disordered regions, domains, disulfide bridges, single mutation stability, residue contacts versus average, individual residue contacts and tertiary structure.
Abstract: SCRATCH is a server for predicting protein tertiary structure and structural features. The SCRATCH software suite includes predictors for secondary structure, relative solvent accessibility, disordered regions, domains, disulfide bridges, single mutation stability, residue contacts versus average, individual residue contacts and tertiary structure. The user simply provides an amino acid sequence and selects the desired predictions, then submits to the server. Results are emailed to the user. The server is available at http://www.igb.uci.edu/servers/psss.html.

914 citations

Journal Article
Predrag Radivojac, Wyatt T. Clark, Tal Ronnen Oron, Alexandra M. Schnoes, Tobias Wittkop, Artem Sokolov, Kiley Graim, Christopher S. Funk, Karin Verspoor, Asa Ben-Hur, Gaurav Pandey, Jeffrey M. Yunes, Ameet Talwalkar, Susanna Repo, Michael L Souza, Damiano Piovesan, Rita Casadio, Zheng Wang, Jianlin Cheng, Hai Fang, Julian Gough, Patrik Koskinen, Petri Törönen, Jussi Nokso-Koivisto, Liisa Holm, Domenico Cozzetto, Daniel W. A. Buchan, Kevin Bryson, David T. Jones, Bhakti Limaye, Harshal Inamdar, Avik Datta, Sunitha K Manjari, Rajendra Joshi, Meghana Chitale, Daisuke Kihara, Andreas Martin Lisewski, Serkan Erdin, Eric Venner, Olivier Lichtarge, Robert Rentzsch, Haixuan Yang, Alfonso E. Romero, Prajwal Bhat, Alberto Paccanaro, Tobias Hamp, Rebecca Kaßner, Stefan Seemayer, Esmeralda Vicedo, Christian Schaefer, Dominik Achten, Florian Auer, Ariane Boehm, Tatjana Braun, Maximilian Hecht, Mark Heron, Peter Hönigschmid, Thomas A. Hopf, Stefanie Kaufmann, Michael Kiening, Denis Krompass, Cedric Landerer, Yannick Mahlich, Manfred Roos, Jari Björne, Tapio Salakoski, Andrew Wong, Hagit Shatkay, Fanny Gatzmann, Ingolf Sommer, Mark N. Wass, Michael J.E. Sternberg, Nives Škunca, Fran Supek, Matko Bošnjak, Panče Panov, Sašo Džeroski, Tomislav Šmuc, Yiannis A. I. Kourmpetis, Aalt D. J. van Dijk, Cajo J. F. ter Braak, Yuanpeng Zhou, Qingtian Gong, Xinran Dong, Weidong Tian, Marco Falda, Paolo Fontana, Enrico Lavezzo, Barbara Di Camillo, Stefano Toppo, Liang Lan, Nemanja Djuric, Yuhong Guo, Slobodan Vucetic, Amos Marc Bairoch, Michal Linial, Patricia C. Babbitt, Steven E. Brenner, Christine A. Orengo, Burkhard Rost, Sean D. Mooney, Iddo Friedberg
TL;DR: Today's best protein function prediction algorithms substantially outperform widely used first-generation methods, with large gains on all types of targets; however, there is still considerable need for improvement of currently available tools.
Abstract: Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the overwhelming majority of protein products can only be annotated computationally. If computational predictions are to be relied upon, it is crucial that the accuracy of these methods be high. Here we report the results from the first large-scale community-based critical assessment of protein function annotation (CAFA) experiment. Fifty-four methods representing the state of the art for protein function prediction were evaluated on a target set of 866 proteins from 11 organisms. Two findings stand out: (i) today's best protein function prediction algorithms substantially outperform widely used first-generation methods, with large gains on all types of targets; and (ii) although the top methods perform well enough to guide experiments, there is considerable need for improvement of currently available tools.

859 citations

Journal Article
21 Dec 2005, Proteins
TL;DR: Because the method can accurately predict protein stability changes using primary sequence information only, it is applicable to many situations where the tertiary structure is unknown, overcoming a major limitation of previous methods, which require tertiary information.
Abstract: Accurate prediction of protein stability changes resulting from single amino acid mutations is important for understanding protein structures and designing new proteins. We use support vector machines to predict protein stability changes for single amino acid mutations leveraging both sequence and structural information. We evaluate our approach using cross-validation methods on a large dataset of single amino acid mutations. When only the sign of the stability changes is considered, the predictive method achieves 84% accuracy, a significant improvement over previously published results. Moreover, the experimental results show that the prediction accuracy obtained using sequence alone is close to the accuracy obtained using tertiary structure information. Because our method can accurately predict protein stability changes using primary sequence information only, it is applicable to many situations where the tertiary structure is unknown, overcoming a major limitation of previous methods, which require tertiary information. The web server for predictions of protein stability changes upon mutations (MUpro), software, and datasets are available at http://www.igb.uci.edu/servers/servers.html.

801 citations

Journal Article
Yuxiang Jiang, Tal Ronnen Oron, Wyatt T. Clark, Asma R. Bankapur, and 153 more (59 institutions)
TL;DR: The second critical assessment of functional annotation (CAFA2), a timed challenge to assess computational methods that automatically assign protein function, showed that the top-performing methods in CAFA2 outperformed those from CAFA1, although the interpretation of results and the usefulness of individual methods remain context-dependent.
Abstract: BACKGROUND: A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging. RESULTS: We conducted the second critical assessment of functional annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. We evaluated 126 methods from 56 research groups for their ability to predict biological functions using Gene Ontology and gene-disease associations using Human Phenotype Ontology on a set of 3681 proteins from 18 species. CAFA2 featured expanded analysis compared with CAFA1, with regards to data set size, variety, and assessment metrics. To review progress in the field, the analysis compared the best methods from CAFA1 to those of CAFA2. CONCLUSIONS: The top-performing methods in CAFA2 outperformed those from CAFA1. This increased accuracy can be attributed to a combination of the growing number of experimental annotations and improved methods for function prediction. The assessment also revealed that the definition of top-performing algorithms is ontology specific, that different performance metrics can be used to probe the nature of accurate predictions, and the relative diversity of predictions in the biological process and human phenotype ontologies. While there was methodological improvement between CAFA1 and CAFA2, the interpretation of results and usefulness of individual methods remain context-dependent.

330 citations


Cited by
Journal Article
TL;DR: This historical survey compactly summarizes relevant work, much of it from the previous millennium, and reviews deep supervised learning, unsupervised learning, reinforcement learning and evolutionary computation, and indirect search for short programs encoding deep and large networks.

14,635 citations

Journal Article
15 Jul 2021, Nature
TL;DR: AlphaFold predicts protein structures with an accuracy competitive with experimental structures in the majority of cases using a novel deep learning architecture, even in cases in which no similar structure is known.
Abstract: Proteins are essential to life, and understanding their structure can facilitate a mechanistic understanding of their function. Through an enormous experimental effort [1–4], the structures of around 100,000 unique proteins have been determined [5], but this represents a small fraction of the billions of known protein sequences [6,7]. Structural coverage is bottlenecked by the months to years of painstaking effort required to determine a single protein structure. Accurate computational approaches are needed to address this gap and to enable large-scale structural bioinformatics. Predicting the three-dimensional structure that a protein will adopt based solely on its amino acid sequence (the structure prediction component of the ‘protein folding problem’ [8]) has been an important open research problem for more than 50 years [9]. Despite recent progress [10–14], existing methods fall far short of atomic accuracy, especially when no homologous structure is available. Here we provide the first computational method that can regularly predict protein structures with atomic accuracy even in cases in which no similar structure is known. We validated an entirely redesigned version of our neural network-based model, AlphaFold, in the challenging 14th Critical Assessment of protein Structure Prediction (CASP14) [15], demonstrating accuracy competitive with experimental structures in a majority of cases and greatly outperforming other methods. Underpinning the latest version of AlphaFold is a novel machine learning approach that incorporates physical and biological knowledge about protein structure, leveraging multi-sequence alignments, into the design of the deep learning algorithm.

10,601 citations

01 Jan 2002

9,314 citations

Proceedings Article
13 Aug 2016
TL;DR: node2vec learns a mapping of nodes to a low-dimensional space of features that maximizes the likelihood of preserving network neighborhoods of nodes, using a biased random walk procedure to explore diverse neighborhoods efficiently.
Abstract: Prediction tasks over nodes and edges in networks require careful effort in engineering features used by learning algorithms. Recent research in the broader field of representation learning has led to significant progress in automating prediction by learning the features themselves. However, present feature learning approaches are not expressive enough to capture the diversity of connectivity patterns observed in networks. Here we propose node2vec, an algorithmic framework for learning continuous feature representations for nodes in networks. In node2vec, we learn a mapping of nodes to a low-dimensional space of features that maximizes the likelihood of preserving network neighborhoods of nodes. We define a flexible notion of a node's network neighborhood and design a biased random walk procedure, which efficiently explores diverse neighborhoods. Our algorithm generalizes prior work which is based on rigid notions of network neighborhoods, and we argue that the added flexibility in exploring neighborhoods is the key to learning richer representations. We demonstrate the efficacy of node2vec over existing state-of-the-art techniques on multi-label classification and link prediction in several real-world networks from diverse domains. Taken together, our work represents a new way for efficiently learning state-of-the-art task-independent representations in complex networks.

7,072 citations
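The biased random walk described in the node2vec abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes an unweighted, undirected graph stored as a hypothetical adjacency dict, and the return parameter `p` and in-out parameter `q` are not named in the abstract above (they follow the published node2vec method). The function name `node2vec_walk` is hypothetical, and the later skip-gram step that turns walks into feature vectors is omitted.

```python
import random
from typing import Dict, List, Optional, Set

# Hypothetical graph representation: node -> set of neighbouring nodes.
Graph = Dict[str, Set[str]]


def node2vec_walk(graph: Graph, start: str, length: int,
                  p: float = 1.0, q: float = 1.0,
                  rng: Optional[random.Random] = None) -> List[str]:
    """Generate one second-order biased random walk of up to `length` nodes."""
    if rng is None:
        rng = random.Random()
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbours = sorted(graph[cur])  # sorted for reproducible sampling
        if not neighbours:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            # First step has no previous node, so sample uniformly.
            walk.append(rng.choice(neighbours))
            continue
        prev = walk[-2]
        weights = []
        for x in neighbours:
            if x == prev:
                weights.append(1.0 / p)   # step back to the previous node
            elif x in graph[prev]:
                weights.append(1.0)       # stay at distance 1 from prev (BFS-like)
            else:
                weights.append(1.0 / q)   # move outward, distance 2 (DFS-like)
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk


if __name__ == "__main__":
    toy = {"a": {"b", "c"}, "b": {"a", "c", "d"}, "c": {"a", "b"}, "d": {"b"}}
    print(node2vec_walk(toy, "a", length=8, p=0.5, q=2.0, rng=random.Random(0)))
```

A small `p` makes the walk likely to backtrack and stay local, while a small `q` pushes it outward; with `p = q = 1` the sketch reduces to an ordinary uniform random walk.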

01 Aug 2000
TL;DR: A Bioentrepreneur course on assessing medical technology in the context of commercialization, addressing many issues unique to biomedical products.
Abstract: BIOE 402. Medical Technology Assessment. 2 or 3 hours. Bioentrepreneur course. Assessment of medical technology in the context of commercialization. Objectives, competition, market share, funding, pricing, manufacturing, growth, and intellectual property; many issues unique to biomedical products. Course Information: 2 undergraduate hours. 3 graduate hours. Prerequisite(s): Junior standing or above and consent of the instructor.

4,833 citations