scispace - formally typeset
Author

Dong Xu

Bio: Dong Xu is an academic researcher from the University of Missouri. The author has contributed to research in topics including protein structure prediction and computer science. The author has an h-index of 67 and has co-authored 483 publications receiving 18,242 citations. Previous affiliations of Dong Xu include the University of Missouri–St. Louis and the University of Missouri–Kansas City.


Papers
Journal Article
14 Jan 2010 · Nature
TL;DR: An accurate soybean genome sequence will facilitate the identification of the genetic basis of many soybean traits, and accelerate the creation of improved soybean varieties.
Abstract: Soybean (Glycine max) is one of the most important crop plants for seed protein and oil content, and for its capacity to fix atmospheric nitrogen through symbioses with soil-borne microorganisms. We sequenced the 1.1-gigabase genome by a whole-genome shotgun approach and integrated it with physical and high-density genetic maps to create a chromosome-scale draft sequence assembly. We predict 46,430 protein-coding genes, 70% more than Arabidopsis and similar to the poplar genome which, like soybean, is an ancient polyploid (palaeopolyploid). About 78% of the predicted genes occur in chromosome ends, which comprise less than one-half of the genome but account for nearly all of the genetic recombination. Genome duplications occurred at approximately 59 and 13 million years ago, resulting in a highly duplicated genome with nearly 75% of the genes present in multiple copies. The two duplication events were followed by gene diversification and loss, and numerous chromosome rearrangements. An accurate soybean genome sequence will facilitate the identification of the genetic basis of many soybean traits, and accelerate the creation of improved soybean varieties.

3,743 citations

Journal Article
TL;DR: Differences between the interfacial hydrogen bonding patterns and the intra-chain ones further substantiate the notion that protein complexes formed by rigid binding may be far away from the global minimum conformations.
Abstract: To further understand and utilize the interactions across protein-protein interfaces, we analyzed the hydrogen bonds and salt bridges in a collection of 319 non-redundant protein-protein interfaces derived from high-quality X-ray structures. We found that the geometry of hydrogen bonds across protein interfaces is generally less optimal and more widely distributed than that typically observed within chains. This difference arises because more hydrophilic side chains are buried in the binding interface than in the interior of the folded monomer. Protein folding differs from protein binding. In folding, practically all degrees of freedom are available to the chain to attain its optimal configuration; this is not the case for rigid binding, where the protein molecules are already folded and only six degrees of freedom (three translational and three rotational) are available to achieve the most favorable bound configuration. These constraints force many polar and charged residues buried in the interface to form weak hydrogen bonds with protein atoms rather than strong hydrogen bonds with the solvent. Because interfacial hydrogen bonds are weaker than intra-chain ones and compete less effectively with water binding, more water molecules participate in bridging hydrogen-bond networks across the protein interface than in the protein interior. Interfacial water molecules both mediate non-complementary donor-donor or acceptor-acceptor pairs and connect non-optimally oriented donor-acceptor pairs. These differences between interfacial and intra-chain hydrogen-bonding patterns further support the notion that protein complexes formed by rigid binding may be far from their global minimum conformations. Moreover, we summarize the patterns of charge complementarity and of hydrogen-bond network conservation across binding interfaces.
We further illustrate the utility of this study in understanding the specificity of protein-protein associations, and hence in docking prediction and molecular (inhibitor) design.
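The abstract compares hydrogen-bond geometry at interfaces with that inside chains. As a minimal sketch of how such geometry is commonly quantified, the Python below computes the donor-acceptor distance and the donor-H...acceptor angle from atomic coordinates and applies a cutoff test. The 3.5 Å and 120° cutoffs and all function names are illustrative assumptions, not the criteria used in the study.

```python
import math
from typing import Sequence, Tuple

# Illustrative geometric cutoffs: common textbook values, assumed here,
# not the criteria used in the study.
MAX_DA_DISTANCE = 3.5  # donor-acceptor distance, in angstroms
MIN_DHA_ANGLE = 120.0  # donor-H...acceptor angle, in degrees


def _sub(a: Sequence[float], b: Sequence[float]) -> Tuple[float, ...]:
    """Component-wise difference a - b."""
    return tuple(x - y for x, y in zip(a, b))


def _angle_deg(a: Sequence[float], vertex: Sequence[float], c: Sequence[float]) -> float:
    """Angle a-vertex-c in degrees."""
    u, w = _sub(a, vertex), _sub(c, vertex)
    cos_t = sum(x * y for x, y in zip(u, w)) / (math.hypot(*u) * math.hypot(*w))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def hbond_geometry(donor, hydrogen, acceptor) -> Tuple[float, float]:
    """Return (donor-acceptor distance, donor-H...acceptor angle)."""
    return math.dist(donor, acceptor), _angle_deg(donor, hydrogen, acceptor)


def is_hbond(donor, hydrogen, acceptor,
             max_da: float = MAX_DA_DISTANCE,
             min_angle: float = MIN_DHA_ANGLE) -> bool:
    """True if both the distance and the angle pass the (assumed) cutoffs."""
    distance, angle = hbond_geometry(donor, hydrogen, acceptor)
    return distance <= max_da and angle >= min_angle
```

Collecting the (distance, angle) pairs for interfacial and intra-chain bonds separately would let one compare how widely the two distributions spread.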

435 citations

Journal Article
TL;DR: Microarray data suggest that DEIRA cells efficiently coordinate their recovery by a complex network, within which both DNA repair and metabolic functions play critical roles, including a predicted distinct ATP-dependent DNA ligase and metabolic pathway switching that could prevent additional genomic damage elicited by metabolism-induced free radicals.
Abstract: Deinococcus radiodurans R1 (DEIRA) is a bacterium best known for its extreme resistance to the lethal effects of ionizing radiation, but the molecular mechanisms underlying this phenotype remain poorly understood. To define the repertoire of DEIRA genes responding to acute irradiation (15 kGy), transcriptome dynamics were examined in cells representing early, middle, and late phases of recovery by using DNA microarrays covering ≈94% of its predicted genes. At one or more time points during DEIRA recovery, 832 genes (28% of the genome) were induced and 451 genes (15%) were repressed 2-fold or more. The expression patterns of the majority of the induced genes resemble the previously characterized expression profile of recA after irradiation. DEIRA recA, which is central to genomic restoration after irradiation, is substantially up-regulated upon DNA damage (early phase) and down-regulated before the onset of exponential growth (late phase). Many other genes were expressed later in recovery, displaying a growth-related pattern of induction. Genes induced in the early phase of recovery included those involved in DNA replication, repair, and recombination; cell wall metabolism; cellular transport; and many encoding uncharacterized proteins. Collectively, the microarray data suggest that DEIRA cells efficiently coordinate their recovery by a complex network, within which both DNA repair and metabolic functions play critical roles. Components of this network include a predicted distinct ATP-dependent DNA ligase and metabolic pathway switching that could prevent additional genomic damage elicited by metabolism-induced free radicals.
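The 2-fold induction/repression criterion can be sketched as a filter over per-gene expression ratios. This is a minimal illustration assuming irradiated/control ratios at each recovery time point are already available; the gene IDs and values are hypothetical toy data, and the study's actual normalization and statistics are not reproduced.

```python
import math
from typing import Dict, List, Set, Tuple


def classify_by_fold_change(
    ratios: Dict[str, List[float]], fold: float = 2.0
) -> Tuple[Set[str], Set[str]]:
    """Split genes into induced and repressed sets.

    ratios maps a gene ID to its irradiated/control expression ratio at each
    recovery time point. A gene counts as induced (repressed) if its ratio is
    >= fold (<= 1/fold) at one or more time points, so it may land in both sets.
    """
    cutoff = math.log2(fold)  # log2 makes up and down thresholds symmetric
    induced: Set[str] = set()
    repressed: Set[str] = set()
    for gene, series in ratios.items():
        log_ratios = [math.log2(r) for r in series]
        if any(v >= cutoff for v in log_ratios):
            induced.add(gene)
        if any(v <= -cutoff for v in log_ratios):
            repressed.add(gene)
    return induced, repressed


# Toy ratios (hypothetical genes and values) for early, middle, late phases.
toy = {
    "gene_a": [4.0, 2.5, 0.9],
    "gene_b": [1.1, 0.4, 0.5],
    "gene_c": [1.0, 1.2, 1.5],
}
up, down = classify_by_fold_change(toy)
print(sorted(up), sorted(down))  # ['gene_a'] ['gene_b']
```

Working in log2 space means a 2-fold increase is +1 and a 2-fold decrease is -1, so one cutoff serves both directions.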

353 citations

Journal Article
TL;DR: The expression patterns of genes implicated in nodulation, and also transcription factors, are investigated using both the Solexa sequence data and large-scale qRT-PCR, facilitating both basic and applied aspects of soybean research.
Abstract: Soybean (Glycine max L.) is a major crop providing an important source of protein and oil, which can also be converted into biodiesel. A major milestone in soybean research was the recent sequencing of its genome. The sequence predicts 69,145 putative soybean genes, 46,430 of them with high confidence. To examine the expression of these genes, we used the Illumina Solexa platform to sequence cDNA derived from 14 conditions (tissues). The result is a searchable soybean gene expression atlas accessible through a browser (http://digbio.missouri.edu/soybean_atlas). The data provide experimental support for the transcription of 55,616 annotated genes, indicate that 13,529 annotated soybean genes are putative pseudogenes, and show that 1,736 currently unannotated sequences are transcribed. An analysis of this atlas reveals strong differences in gene expression patterns between tissues, especially between root and aerial organs, but also reveals similarities in gene expression between other tissues, such as flower and leaf. To demonstrate the full utility of the atlas, we investigated the expression patterns of genes implicated in nodulation, as well as of transcription factors, using both the Solexa sequence data and large-scale qRT-PCR. The soybean gene expression atlas also allowed a comparison with gene expression documented in the two model legume species, Medicago truncatula and Lotus japonicus, as well as with data available for Arabidopsis thaliana, facilitating both basic and applied aspects of soybean research.
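One common way to quantify the tissue similarities and differences described in the abstract is to correlate expression profiles across genes. The sketch below assumes each tissue is given as a list of expression values over the same ordered genes, applies a log2(x + 1) transform, and computes pairwise Pearson correlations; the transform, the metric, and the toy values are assumptions for illustration, not the atlas's published analysis.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def tissue_similarity(atlas: Dict[str, List[float]]) -> Dict[Tuple[str, str], float]:
    """Pairwise correlation of tissue expression profiles.

    atlas maps a tissue name to expression values for the same ordered gene
    list; values are log2(x + 1)-transformed before correlating.
    """
    logged = {t: [math.log2(v + 1.0) for v in vals] for t, vals in atlas.items()}
    # Sorting the names gives a stable (alphabetical) key order for each pair.
    return {(a, b): pearson(logged[a], logged[b])
            for a, b in combinations(sorted(logged), 2)}


# Hypothetical toy values for four genes in three tissues.
toy_atlas = {
    "flower": [10.0, 200.0, 5.0, 80.0],
    "leaf": [12.0, 180.0, 6.0, 90.0],
    "root": [300.0, 2.0, 150.0, 1.0],
}
sims = tissue_similarity(toy_atlas)
```

In the toy data, the flower and leaf profiles are nearly proportional and correlate strongly, while root is anti-correlated with both.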

345 citations

Journal Article
TL;DR: A new framework for representing a set of multi-dimensional gene expression data as a Minimum Spanning Tree (MST), a concept from graph theory, which can overcome many of the problems faced by classical clustering algorithms.
Abstract: Motivation: Gene expression data clustering provides a powerful tool for studying functional relationships of genes in a biological process. Identifying correlated expression patterns of genes represents the basic challenge in this clustering problem. Results: This paper describes a new framework for representing a set of multi-dimensional gene expression data as a Minimum Spanning Tree (MST), a concept from graph theory. A key property of this representation is that each cluster of the expression data corresponds to one subtree of the MST, which rigorously converts a multi-dimensional clustering problem into a tree partitioning problem. We have demonstrated that although the inter-data relationships are greatly simplified in the MST representation, no essential information is lost for the purpose of clustering. Representing a set of multi-dimensional data as an MST has two key advantages: (1) the simple structure of a tree facilitates efficient implementations of rigorous clustering algorithms, which are otherwise highly computationally challenging; and (2) because MST-based clustering does not depend on the detailed geometric shape of a cluster, it can overcome many of the problems faced by classical clustering algorithms. Based on the MST representation, we have developed a number of rigorous and efficient clustering algorithms, including two with guaranteed global optimality. We have implemented these algorithms in a software package, EXpression data Clustering Analysis and VisualizATiOn Resource (EXCAVATOR). To demonstrate its effectiveness, we tested it on three data sets: expression data from the yeast Saccharomyces cerevisiae, expression data from the response of human fibroblasts to serum, and Arabidopsis expression data in response to chitin elicitation. The test results are highly encouraging. Availability: EXCAVATOR is available on request from the authors.
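Because each cluster corresponds to a subtree of the MST, the simplest partitioning is to build the tree and cut its longest edges. The Python sketch below does that, assuming Euclidean distance between expression vectors and a user-chosen number of clusters k (1 ≤ k ≤ n). It is one elementary MST partitioning, not a reproduction of EXCAVATOR or of the globally optimal algorithms the paper describes.

```python
import math
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[float, int, int]  # (length, node_u, node_v)


def prim_mst(points: Sequence[Sequence[float]]) -> List[Edge]:
    """Prim's algorithm on the complete graph with Euclidean edge lengths, O(n^2)."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # shortest known edge from the tree to each node
    link = [-1] * n        # tree node at the other end of that edge
    best[0] = 0.0
    edges: List[Edge] = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if link[u] >= 0:
            edges.append((best[u], link[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], link[v] = d, u
    return edges


def mst_clusters(points: Sequence[Sequence[float]], k: int) -> List[int]:
    """Cut the k-1 longest MST edges; each remaining subtree is one cluster."""
    edges = sorted(prim_mst(points))
    kept = edges[: len(edges) - (k - 1)]
    parent = list(range(len(points)))  # union-find over the kept edges

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for _, a, b in kept:
        parent[find(a)] = find(b)
    labels: Dict[int, int] = {}
    # Number clusters 0, 1, 2, ... in order of first appearance.
    return [labels.setdefault(find(i), len(labels)) for i in range(len(points))]
```

The O(n²) form of Prim's algorithm suits the complete distance graph used here; for very large data sets, a sparse neighbor graph with Kruskal's algorithm may be preferable.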

312 citations


Cited by
01 May 1993
TL;DR: Comparing the results to the fastest reported vectorized Cray Y-MP and C90 algorithm shows that the current generation of parallel machines is competitive with conventional vector supercomputers even for small problems.
Abstract: Three parallel algorithms for classical molecular dynamics are presented. The first assigns each processor a fixed subset of atoms; the second assigns each a fixed subset of inter-atomic forces to compute; the third assigns each a fixed spatial region. The algorithms are suitable for molecular dynamics models that can be difficult to parallelize efficiently: those with short-range forces where the neighbors of each atom change rapidly. They can be implemented on any distributed-memory parallel machine that allows message-passing of data between independently executing processors. The algorithms are tested on a standard Lennard-Jones benchmark problem for system sizes ranging from 500 to 100,000,000 atoms on several parallel supercomputers: the nCUBE 2, Intel iPSC/860 and Paragon, and Cray T3D. Comparing the results to the fastest reported vectorized Cray Y-MP and C90 algorithm shows that the current generation of parallel machines is competitive with conventional vector supercomputers even for small problems. For large problems, the spatial algorithm achieves parallel efficiencies of 90%, and a 1840-node Intel Paragon performs up to 165 times faster than a single Cray C90 processor. Trade-offs between the three algorithms and guidelines for adapting them to more complex molecular dynamics simulations are also discussed.
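The third strategy, spatial decomposition, assigns each processor a fixed region of the simulation box. The sketch below shows only the bookkeeping step of mapping atoms to their owning processors for a periodic box split into a Px × Py × Pz grid. The function names and the rank numbering are hypothetical, and the message passing of boundary (ghost) atoms and of atoms migrating between regions, which a real implementation needs, is omitted.

```python
from typing import Dict, List, Sequence, Tuple


def owner_rank(pos: Sequence[float], box: Tuple[float, float, float],
               grid: Tuple[int, int, int]) -> int:
    """Rank of the processor whose sub-box contains pos (periodic box)."""
    idx = []
    for x, length, procs in zip(pos, box, grid):
        x %= length  # wrap into [0, length) under periodic boundaries
        # min() guards against x rounding up to exactly `length`.
        idx.append(min(int(x / length * procs), procs - 1))
    ix, iy, iz = idx
    px, py, _ = grid
    return ix + px * (iy + py * iz)  # x index varies fastest


def decompose(positions: Sequence[Sequence[float]], box: Tuple[float, float, float],
              grid: Tuple[int, int, int]) -> Dict[int, List[int]]:
    """Map each rank to the indices of the atoms it currently owns."""
    owners: Dict[int, List[int]] = {r: [] for r in range(grid[0] * grid[1] * grid[2])}
    for atom, pos in enumerate(positions):
        owners[owner_rank(pos, box, grid)].append(atom)
    return owners
```

Because atoms move, this mapping would be recomputed periodically, and each processor would also need copies of atoms within the force cutoff of its region's faces.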

29,323 citations

28 Jul 2005
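TL;DR: PfEMP1 interacts with one or more receptors on infected erythrocytes, dendritic cells, and the placenta, and plays a key role in adhesion and immune evasion.
Abstract: Antigenic variation enables many pathogenic microorganisms to evade the host immune response. Plasmodium falciparum erythrocyte membrane protein 1 (PfEMP1), expressed on the surface of infected erythrocytes, interacts with one or more receptors on infected erythrocytes, endothelial cells, dendritic cells, and the placenta, and plays a key role in adhesion and immune evasion. The var gene family encodes about 60 members per haploid genome, and activating transcription of different var gene variants provides the molecular basis for antigenic variation.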

18,940 citations

Journal Article
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. 
Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).
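The e-mail example describes a system that learns which messages the user rejects. The abstract names no algorithm; as a minimal sketch, the Python below uses an online perceptron over a bag of words, which is one simple choice among many. The class name, tokenization, and training messages are illustrative assumptions.

```python
from typing import Dict, Set


class PerceptronMailFilter:
    """Online linear filter updated from the user's accept/reject decisions."""

    def __init__(self) -> None:
        self.weights: Dict[str, float] = {}
        self.bias = 0.0

    @staticmethod
    def features(text: str) -> Set[str]:
        # Bag of lower-cased, whitespace-separated words (a deliberately crude choice).
        return set(text.lower().split())

    def score(self, text: str) -> float:
        return self.bias + sum(self.weights.get(w, 0.0) for w in self.features(text))

    def predicts_reject(self, text: str) -> bool:
        return self.score(text) > 0

    def update(self, text: str, user_rejected: bool) -> None:
        """Perceptron rule: change the weights only when the prediction was wrong."""
        if self.predicts_reject(text) != user_rejected:
            step = 1.0 if user_rejected else -1.0
            self.bias += step
            for w in self.features(text):
                self.weights[w] = self.weights.get(w, 0.0) + step


# Hypothetical history of messages and the user's decisions.
history = [
    ("win a free prize now", True),
    ("meeting agenda for monday", False),
    ("free prize inside", True),
    ("monday project meeting", False),
]
mail_filter = PerceptronMailFilter()
for _ in range(3):
    for text, rejected in history:
        mail_filter.update(text, rejected)
print(mail_filter.predicts_reject("claim your free prize"))  # True
```

Because the weights change only after a mistake, the filter keeps adapting as the user's decisions change, without anyone rewriting rules by hand.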

13,246 citations

01 Jan 2014
TL;DR: These standards of care are intended to provide clinicians, patients, researchers, payors, and other interested individuals with the components of diabetes care, treatment goals, and tools to evaluate the quality of care.
Abstract: Diabetes is a chronic illness that requires continuing medical care and patient self-management education to prevent acute complications and to reduce the risk of long-term complications. Diabetes care is complex and requires that many issues, beyond glycemic control, be addressed. A large body of evidence exists that supports a range of interventions to improve diabetes outcomes. These standards of care are intended to provide clinicians, patients, researchers, payors, and other interested individuals with the components of diabetes care, treatment goals, and tools to evaluate the quality of care. While individual preferences, comorbidities, and other patient factors may require modification of goals, targets that are desirable for most patients with diabetes are provided. These standards are not intended to preclude more extensive evaluation and management of the patient by other specialists as needed. For more detailed information, refer to Bode (Ed.): Medical Management of Type 1 Diabetes (1), Burant (Ed.): Medical Management of Type 2 Diabetes (2), and Klingensmith (Ed.): Intensive Diabetes Management (3). The recommendations included are diagnostic and therapeutic actions that are known or believed to favorably affect health outcomes of patients with diabetes. A grading system (Table 1), developed by the American Diabetes Association (ADA) and modeled after existing methods, was used to clarify and codify the evidence that forms the basis for the recommendations. The level of evidence that supports each recommendation is listed after it using the letters A, B, C, or E.

9,618 citations

Journal Article
TL;DR: A new method, based on chemical thermodynamics, is developed for automatic detection of macromolecular assemblies in Protein Data Bank (PDB) entries that are the results of X-ray diffraction experiments. Biological units may be recovered at an 80-90% success rate, which makes X-ray crystallography an important source of experimental data on macromolecular complexes and protein-protein interactions.

8,377 citations