Author

Dong Xu

Other affiliations: University of Missouri–St. Louis, University of Missouri–Kansas City, South China University of Technology

Bio: Dong Xu is an academic researcher at the University of Missouri. The author has contributed to research on topics including protein structure prediction and computer science. The author has an h-index of 67 and has co-authored 483 publications receiving 18,242 citations. Previous affiliations of Dong Xu include University of Missouri–St. Louis and University of Missouri–Kansas City.

Topics: Protein structure prediction, Computer science, Deep learning, Medicine, Gene

Papers published on a yearly basis (chart; years covered: 1992 and 1994 to 2023)

Papers

Journal Article•DOI•

Genome sequence of the palaeopolyploid soybean

Jeremy Schmutz, Steven B. Cannon¹, Jessica A. Schlueter²,³, Jianxin Ma², Therese Mitros⁴, William Nelson⁵, David L. Hyten¹, Qijian Song¹,⁶, Jay J. Thelen⁷, Jianlin Cheng⁷, Dong Xu⁷, Uffe Hellsten⁸, Gregory D. May⁹, Yeisoo Yu⁵, Tetsuya Sakurai, Taishi Umezawa, Madan K. Bhattacharyya¹⁰, Devinder Sandhu¹¹, Babu Valliyodan⁷, Erika Lindquist⁸, Myron Peto¹, David Grant¹, Shengqiang Shu⁸, David Goodstein⁸, Kerrie Barry⁸, Montona Futrell-Griggs², Brian Abernathy², Jianchang Du², Zhixi Tian², Liucun Zhu², Navdeep Gill², Trupti Joshi⁷, Marc Libault⁷, Ananad Sethuraman, Xue-Cheng Zhang⁷, Kazuo Shinozaki, Henry T. Nguyen⁷, Rod A. Wing⁵, Perry B. Cregan¹, James E. Specht¹², Jane Grimwood⁸, Daniel S. Rokhsar⁸, Gary Stacey⁷, Randy C. Shoemaker¹, Scott A. Jackson²•Institutions (12)

Agricultural Research Service¹, Purdue University², University of North Carolina at Charlotte³, University of California, Berkeley⁴, University of Arizona⁵, University of Maryland, College Park⁶, University of Missouri⁷, Joint Genome Institute⁸, National Center for Genome Resources⁹, Iowa State University¹⁰, University of Wisconsin–Stevens Point¹¹, University of Nebraska–Lincoln¹²

14 Jan 2010-Nature

TL;DR: An accurate soybean genome sequence will facilitate the identification of the genetic basis of many soybean traits, and accelerate the creation of improved soybean varieties.

Abstract: Soybean (Glycine max) is one of the most important crop plants for seed protein and oil content, and for its capacity to fix atmospheric nitrogen through symbioses with soil-borne microorganisms. We sequenced the 1.1-gigabase genome by a whole-genome shotgun approach and integrated it with physical and high-density genetic maps to create a chromosome-scale draft sequence assembly. We predict 46,430 protein-coding genes, 70% more than Arabidopsis and similar to the poplar genome which, like soybean, is an ancient polyploid (palaeopolyploid). About 78% of the predicted genes occur in chromosome ends, which comprise less than one-half of the genome but account for nearly all of the genetic recombination. Genome duplications occurred at approximately 59 and 13 million years ago, resulting in a highly duplicated genome with nearly 75% of the genes present in multiple copies. The two duplication events were followed by gene diversification and loss, and numerous chromosome rearrangements. An accurate soybean genome sequence will facilitate the identification of the genetic basis of many soybean traits, and accelerate the creation of improved soybean varieties.

3,743 citations

Journal Article•DOI•

Hydrogen bonds and salt bridges across protein-protein interfaces.

Dong Xu¹, Chung-Jung Tsai, Ruth Nussinov•Institutions (1)

Science Applications International Corporation¹

01 Sep 1997-Protein Engineering

TL;DR: Differences between the interfacial hydrogen bonding patterns and the intra-chain ones further substantiate the notion that protein complexes formed by rigid binding may be far away from the global minimum conformations.

Abstract: To understand further, and to utilize, the interactions across protein-protein interfaces, we carried out an analysis of the hydrogen bonds and of the salt bridges in a collection of 319 non-redundant protein-protein interfaces derived from high-quality X-ray structures. We found that the geometry of the hydrogen bonds across protein interfaces is generally less optimal and has a wider distribution than typically observed within the chains. This difference originates from the more hydrophilic side chains buried in the binding interface than in the folded monomer interior. Protein folding differs from protein binding. Whereas in folding practically all degrees of freedom are available to the chain to attain its optimal configuration, this is not the case for rigid binding, where the protein molecules are already folded, with only six degrees of translational and rotational freedom available to the chains to achieve their most favorable bound configuration. These constraints enforce many polar/charged residues buried in the interface to form weak hydrogen bonds with protein atoms, rather than strongly hydrogen bonding to the solvent. Since interfacial hydrogen bonds are weaker than the intra-chain ones to compete with the binding of water, more water molecules are involved in bridging hydrogen bond networks across the protein interface than in the protein interior. Interfacial water molecules both mediate non-complementary donor-donor or acceptor-acceptor pairs, and connect non-optimally oriented donor-acceptor pairs. These differences between the interfacial hydrogen bonding patterns and the intra-chain ones further substantiate the notion that protein complexes formed by rigid binding may be far away from the global minimum conformations. Moreover, we summarize the pattern of charge complementarity and of the conservation of hydrogen bond network across binding interfaces. We further illustrate the utility of this study in understanding the specificity of protein-protein associations, and hence in docking prediction and molecular (inhibitor) design.

435 citations

Journal Article•DOI•

Transcriptome dynamics of Deinococcus radiodurans recovering from ionizing radiation

Yongqing Liu¹, Jizhong Zhou, Marina V. Omelchenko, Alexander S. Beliaev, Amudhan Venkateswaran, Julia Stair, Liyou Wu, Dorothea K. Thompson, Dong Xu, Igor B. Rogozin, Elena K. Gaidamakova, Min Zhai, Kira S. Makarova, Eugene V. Koonin, Michael J. Daly•Institutions (1)

Oak Ridge National Laboratory¹

01 Apr 2003-Proceedings of the National Academy of Sciences of the United States of America

TL;DR: Microarray data suggest that DEIRA cells efficiently coordinate their recovery by a complex network, within which both DNA repair and metabolic functions play critical roles, including a predicted distinct ATP-dependent DNA ligase and metabolic pathway switching that could prevent additional genomic damage elicited by metabolism-induced free radicals.

Abstract: Deinococcus radiodurans R1 (DEIRA) is a bacterium best known for its extreme resistance to the lethal effects of ionizing radiation, but the molecular mechanisms underlying this phenotype remain poorly understood. To define the repertoire of DEIRA genes responding to acute irradiation (15 kGy), transcriptome dynamics were examined in cells representing early, middle, and late phases of recovery by using DNA microarrays covering ≈94% of its predicted genes. At least at one time point during DEIRA recovery, 832 genes (28% of the genome) were induced and 451 genes (15%) were repressed 2-fold or more. The expression patterns of the majority of the induced genes resemble the previously characterized expression profile of recA after irradiation. DEIRA recA, which is central to genomic restoration after irradiation, is substantially up-regulated on DNA damage (early phase) and down-regulated before the onset of exponential growth (late phase). Many other genes were expressed later in recovery, displaying a growth-related pattern of induction. Genes induced in the early phase of recovery included those involved in DNA replication, repair, and recombination, cell wall metabolism, cellular transport, and many encoding uncharacterized proteins. Collectively, the microarray data suggest that DEIRA cells efficiently coordinate their recovery by a complex network, within which both DNA repair and metabolic functions play critical roles. Components of this network include a predicted distinct ATP-dependent DNA ligase and metabolic pathway switching that could prevent additional genomic damage elicited by metabolism-induced free radicals.

353 citations

Journal Article•DOI•

An integrated transcriptome atlas of the crop model Glycine max, and its use in comparative analyses in plants

Marc Libault¹, Andrew Farmer², Trupti Joshi¹, Kaori Takahashi¹, Raymond J. Langley², Levi D. Franklin¹, Ji He, Dong Xu¹, Gregory D. May², Gary Stacey¹•Institutions (2)

University of Missouri¹, National Center for Genome Resources²

01 Jul 2010-Plant Journal

TL;DR: The expression patterns of genes implicated in nodulation, and also transcription factors, are investigated using both the Solexa sequence data and large-scale qRT-PCR, facilitating both basic and applied aspects of soybean research.

Abstract: Soybean (Glycine max L.) is a major crop providing an important source of protein and oil, which can also be converted into biodiesel. A major milestone in soybean research was the recent sequencing of its genome. The sequence predicts 69 145 putative soybean genes, with 46 430 predicted with high confidence. In order to examine the expression of these genes, we utilized the Illumina Solexa platform to sequence cDNA derived from 14 conditions (tissues). The result is a searchable soybean gene expression atlas accessible through a browser (http://digbio.missouri.edu/soybean_atlas). The data provide experimental support for the transcription of 55 616 annotated genes and also demonstrate that 13 529 annotated soybean genes are putative pseudogenes, and 1736 currently unannotated sequences are transcribed. An analysis of this atlas reveals strong differences in gene expression patterns between different tissues, especially between root and aerial organs, but also reveals similarities between gene expression in other tissues, such as flower and leaf organs. In order to demonstrate the full utility of the atlas, we investigated the expression patterns of genes implicated in nodulation, and also transcription factors, using both the Solexa sequence data and large-scale qRT-PCR. The availability of the soybean gene expression atlas allowed a comparison with gene expression documented in the two model legume species, Medicago truncatula and Lotus japonicus, as well as data available for Arabidopsis thaliana, facilitating both basic and applied aspects of soybean research.

345 citations

Journal Article•DOI•

Clustering gene expression data using a graph-theoretic approach: an application of minimum spanning trees.

Ying Xu¹, Victor Olman¹, Dong Xu¹•Institutions (1)

Oak Ridge National Laboratory¹

01 Apr 2002-Bioinformatics

TL;DR: A new framework for representing a set of multi-dimensional gene expression data as a Minimum Spanning Tree (MST), a concept from the graph theory, which can overcome many of the problems faced by classical clustering algorithms.

Abstract: Motivation: Gene expression data clustering provides a powerful tool for studying functional relationships of genes in a biological process. Identifying correlated expression patterns of genes represents the basic challenge in this clustering problem. Results: This paper describes a new framework for representing a set of multi-dimensional gene expression data as a Minimum Spanning Tree (MST), a concept from the graph theory. A key property of this representation is that each cluster of the expression data corresponds to one subtree of the MST, which rigorously converts a multi-dimensional clustering problem to a tree partitioning problem. We have demonstrated that though the inter-data relationship is greatly simplified in the MST representation, no essential information is lost for the purpose of clustering. Two key advantages in representing a set of multi-dimensional data as an MST are: (1) the simple structure of a tree facilitates efficient implementations of rigorous clustering algorithms, which otherwise are highly computationally challenging; and (2) as an MST-based clustering does not depend on detailed geometric shape of a cluster, it can overcome many of the problems faced by classical clustering algorithms. Based on the MST representation, we have developed a number of rigorous and efficient clustering algorithms, including two with guaranteed global optimality. We have implemented these algorithms as a computer software EXpression data Clustering Analysis and VisualizATiOn Resource (EXCAVATOR). To demonstrate its effectiveness, we have tested it on three data sets, i.e. expression data from yeast Saccharomyces cerevisiae, expression data in response of human fibroblasts to serum, and Arabidopsis expression data in response to chitin elicitation. The test results are highly encouraging. Availability: EXCAVATOR is available on request from the authors.

312 citations
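
The MST representation described in the abstract above can be illustrated with a minimal sketch. It assumes Euclidean distance between expression profiles and uses the simplest tree-partitioning rule, cutting the k-1 longest MST edges so that each remaining subtree is one cluster. This is not the EXCAVATOR implementation, and the paper's globally optimal algorithms are not reproduced here; the function names `mst_edges` and `mst_clusters` are hypothetical.

```python
import math


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns a list of (weight, i, j) tuples, one per MST edge.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known distance from the tree to each point
    parent = [-1] * n       # tree vertex that realises best[v]
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the closest point that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        # Relax distances from the newly added point.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def mst_clusters(points, k):
    """Label points with cluster ids by cutting the k-1 longest MST edges."""
    edges = sorted(mst_edges(points))
    kept = edges[: max(len(edges) - (k - 1), 0)]

    # Union-find over the kept edges: each connected subtree is one cluster.
    root = list(range(len(points)))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    for _, i, j in kept:
        root[find(i)] = find(j)

    # Number the clusters in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(points))]
```

For example, `mst_clusters([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], 2)` separates the two well-spaced groups, because the single long edge joining them is the one that gets cut. Since the partition depends only on tree edges, it does not assume any particular geometric shape for a cluster, which is the advantage the abstract highlights.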


Cited by

Fast parallel algorithms for short-range molecular dynamics

Steven J. Plimpton¹•Institutions (1)

Sandia National Laboratories¹

01 May 1993

TL;DR: Comparing the results to the fastest reported vectorized Cray Y-MP and C90 algorithm shows that the current generation of parallel machines is competitive with conventional vector supercomputers even for small problems.

Abstract: Three parallel algorithms for classical molecular dynamics are presented. The first assigns each processor a fixed subset of atoms; the second assigns each a fixed subset of inter-atomic forces to compute; the third assigns each a fixed spatial region. The algorithms are suitable for molecular dynamics models which can be difficult to parallelize efficiently: those with short-range forces where the neighbors of each atom change rapidly. They can be implemented on any distributed-memory parallel machine which allows for message-passing of data between independently executing processors. The algorithms are tested on a standard Lennard-Jones benchmark problem for system sizes ranging from 500 to 100,000,000 atoms on several parallel supercomputers: the nCUBE 2, Intel iPSC/860 and Paragon, and Cray T3D. Comparing the results to the fastest reported vectorized Cray Y-MP and C90 algorithm shows that the current generation of parallel machines is competitive with conventional vector supercomputers even for small problems. For large problems, the spatial algorithm achieves parallel efficiencies of 90% and a 1840-node Intel Paragon performs up to 165 times faster than a single Cray C90 processor. Trade-offs between the three algorithms and guidelines for adapting them to more complex molecular dynamics simulations are also discussed.

29,323 citations

Changes in var gene switching rates in Plasmodium lead to antigenic variation [English] / Paul H, Robert P, Christodoulou Z, et al. // Proc Natl Acad Sci U S A

宁北芳, 朱淮民

28 Jul 2005

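TL;DR: PfEMP1 interacts with one or more receptors on infected erythrocytes, dendritic cells, and the placenta, and plays a key role in adhesion and immune evasion.

Abstract: Antigenic variation allows many pathogenic microorganisms to evade the host immune response. Plasmodium falciparum erythrocyte membrane protein 1 (PfEMP1), expressed on the surface of infected erythrocytes, interacts with one or more receptors on infected erythrocytes, endothelial cells, dendritic cells, and the placenta, and plays a key role in adhesion and immune evasion. The var gene family in each haploid genome encodes about 60 members, and switching transcription among different var gene variants provides the molecular basis for antigenic variation.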

18,940 citations

Journal Article•DOI•

Machine learning

Thomas G. Dietterich¹•Institutions (1)

Oregon State University¹

01 Dec 1996-ACM Computing Surveys

TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.

Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

13,246 citations

Standards of Medical Care in Diabetes

Harry J. Morris

01 Jan 2014

TL;DR: These standards of care are intended to provide clinicians, patients, researchers, payors, and other interested individuals with the components of diabetes care, treatment goals, and tools to evaluate the quality of care.

Abstract: Diabetes is a chronic illness that requires continuing medical care and patient self-management education to prevent acute complications and to reduce the risk of long-term complications. Diabetes care is complex and requires that many issues, beyond glycemic control, be addressed. A large body of evidence exists that supports a range of interventions to improve diabetes outcomes. These standards of care are intended to provide clinicians, patients, researchers, payors, and other interested individuals with the components of diabetes care, treatment goals, and tools to evaluate the quality of care. While individual preferences, comorbidities, and other patient factors may require modification of goals, targets that are desirable for most patients with diabetes are provided. These standards are not intended to preclude more extensive evaluation and management of the patient by other specialists as needed. For more detailed information, refer to Bode (Ed.): Medical Management of Type 1 Diabetes (1), Burant (Ed): Medical Management of Type 2 Diabetes (2), and Klingensmith (Ed): Intensive Diabetes Management (3). The recommendations included are diagnostic and therapeutic actions that are known or believed to favorably affect health outcomes of patients with diabetes. A grading system (Table 1), developed by the American Diabetes Association (ADA) and modeled after existing methods, was utilized to clarify and codify the evidence that forms the basis for the recommendations. The level of evidence that supports each recommendation is listed after each recommendation using the letters A, B, C, or E.

9,618 citations

Journal Article•DOI•

Inference of macromolecular assemblies from crystalline state.

E. Krissinel¹, Kim Henrick¹•Institutions (1)

European Bioinformatics Institute¹

21 Sep 2007-Journal of Molecular Biology

TL;DR: A new method, based on chemical thermodynamics, is developed for automatic detection of macromolecular assemblies in the Protein Data Bank (PDB) entries that are the results of X-ray diffraction experiments. As found, biological units may be recovered at an 80-90% success rate, which makes X-ray crystallography an important source of experimental data on macromolecular complexes and protein-protein interactions.

8,377 citations
