Home
/
Authors
/
John C. Wootton

Author

John C. Wootton

Bio: John C. Wootton is an academic researcher from National Institutes of Health. The author has contributed to research in topics: Genome & Plasmodium falciparum. The author has an hindex of 30, co-authored 40 publications receiving 10545 citations.

Topics: Genome, Plasmodium falciparum, Gene, Gene mapping, Matrix (mathematics) ...read more

Papers

PDF

Open Access

More filters

Journal Article•DOI•

Detecting Subtle Sequence Signals: A Gibbs Sampling Strategy for Multiple Alignment

[...]

Charles E. Lawrence¹, Stephen F. Altschul¹, Mark S. Boguski¹, Jun Liu², Andrew F. Neuwald¹, John C. Wootton¹ - Show less +2 more•Institutions (2)

National Institutes of Health¹, Harvard University²

08 Oct 1993-Science

TL;DR: A mathematical definition of this "local multiple alignment" problem suitable for full computer automation has been used to develop a new and sensitive algorithm, based on the statistical method of iterative sampling, that finds an optimized local alignment model for N sequences in N-linear time, requiring only seconds on current workstations.

...read moreread less

Abstract: A wealth of protein and DNA sequence data is being generated by genome projects and other sequencing efforts. A crucial barrier to deciphering these sequences and understanding the relations among them is the difficulty of detecting subtle local residue patterns common to multiple sequences. Such patterns frequently reflect similar molecular structures and biological properties. A mathematical definition of this "local multiple alignment" problem suitable for full computer automation has been used to develop a new and sensitive algorithm, based on the statistical method of iterative sampling. This algorithm finds an optimized local alignment model for N sequences in N-linear time, requiring only seconds on current workstations, and allows the simultaneous detection and optimization of multiple patterns and pattern repeats. The method is illustrated as applied to helix-turn-helix proteins, lipocalins, and prenyltransferases.

...read moreread less

1,991 citations

Journal Article•DOI•

Mutations in the P. falciparum Digestive Vacuole Transmembrane Protein PfCRT and Evidence for Their Role in Chloroquine Resistance

[...]

David A. Fidock¹, Takashi Nomura¹, Angela K. Talley¹, Roland A. Cooper¹, Sergey M. Dzekunov², Michael T. Ferdig¹, Lyann M. B. Ursos², Amar Bir Singh Sidhu³, Bronwen Naudé¹, Kirk W. Deitsch¹, Xin-zhuan Su¹, John C. Wootton¹, Paul D. Roepe², Thomas E. Wellems¹ - Show less +10 more•Institutions (3)

National Institutes of Health¹, University of Washington², Albert Einstein College of Medicine³

01 Oct 2000-Molecular Cell

TL;DR: The determinant of verapamil-reversible chloroquine resistance (CQR) in a Plasmodium falciparum genetic cross maps to a 36 kb segment of chromosome 7 that harbors a 13-exon gene, pfcrt, having point mutations that associate completely with CQR in parasite lines from Asia, Africa, and South America.

...read moreread less

1,358 citations

Journal Article•DOI•

Protein database searches using compositionally adjusted substitution matrices

[...]

Stephen F. Altschul¹, John C. Wootton¹, E. Michael Gertz¹, Richa Agarwala¹, Aleksandr Morgulis¹, Alejandro A. Schäffer¹, Yi-Kuo Yu¹ - Show less +3 more•Institutions (1)

National Institutes of Health¹

01 Oct 2005-FEBS Journal

TL;DR: This work has recently developed a general procedure for transforming a standard matrix into one appropriate for the comparison of two sequences with arbitrary, and possibly differing compositions.

...read moreread less

Abstract: Almost all protein database search methods use amino acid substitution matrices for scoring, optimizing, and assessing the statistical significance of sequence alignments. Much care and effort has therefore gone into constructing substitution matrices, and the quality of search results can depend strongly upon the choice of the proper matrix. A long-standing problem has been the comparison of sequences with biased amino acid compositions, for which standard substitution matrices are not optimal. To address this problem, we have recently developed a general procedure for transforming a standard matrix into one appropriate for the comparison of two sequences with arbitrary, and possibly differing compositions. Such adjusted matrices yield, on average, improved alignments and alignment scores when applied to the comparison of proteins with markedly biased compositions. Here we review the application of compositionally adjusted matrices and consider whether they may also be applied fruitfully to general purpose protein sequence database searches, in which related sequence pairs do not necessarily have strong compositional biases. Although it is not advisable to apply compositional adjustment indiscriminately, we describe several simple criteria under which invoking such adjustment is on average beneficial. In a typical database search, at least one of these criteria is satisfied by over half the related sequence pairs. Compositional substitution matrix adjustment is now available in NCBI's protein-protein version of BLAST.

...read moreread less

1,017 citations

Journal Article•DOI•

Issues in searching molecular sequence databases.

[...]

Stephen F. Altschul¹, Mark S. Boguski¹, Warren Gish¹, John C. Wootton¹•Institutions (1)

National Institutes of Health¹

01 Feb 1994-Nature Genetics

TL;DR: Here, a number of issues are considered, including the choice of scoring systems, the statistical significance of alignments, the masking of uninformative or potentially confounding sequence regions, the nature and extent of sequence redundancy in the databases and network access to similarity search services.

...read moreread less

Abstract: Sequence similarity search programs are versatile tools for the molecular biologist, frequently able to identify possible DNA coding regions and to provide clues to gene and protein structure and function. While much attention had been paid to the precise algorithms these programs employ and to their relative speeds, there is a constellation of associated issues that are equally important to realize the full potential of these methods. Here, we consider a number of these issues, including the choice of scoring systems, the statistical significance of alignments, the masking of uninformative or potentially confounding sequence regions, the nature and extent of sequence redundancy in the databases and network access to similarity search services.

...read moreread less

807 citations

Book Chapter•DOI•

Analysis of compositionally biased regions in sequence databases.

[...]

John C. Wootton¹, Scott Federhen•Institutions (1)

National Institutes of Health¹

01 Jan 1996-Methods in Enzymology

TL;DR: For genomic studies, it is essential to view compositional bias in the context of many types of other features, such as recognizable functional sites, transcripts, coding sequences, and homologies, which are being integrated into software packages that have graphic multilevel browsing facilities and include zoom functions.

...read moreread less

Abstract: Publisher Summary This chapter discusses the analysis of compositionally biased region in sequence databases Programs sequence (SEG) and protein sequence (PSEG) are tuned for amino acid sequences and nucleotide sequence (NSEG) for nucleotide sequences The programs can be applied to either individual sequences, including whole chromosomes if appropriate, or entire sequence databases Compositional complexity is based only on residue composition, regardless of the patterns or periodicity of sequence repetitiveness This contrasts with some alternative methods that use counts of k-grams to define residue patterns and clustering Complexity, pattern, and periodicity are distinct abstract attributes of simple sequences For genomic studies, it is essential to view compositional bias in the context of many types of other features, such as recognizable functional sites, transcripts, coding sequences, and homologies For this purpose, the SEG family of programs is being integrated into software packages, or workbenches, that have graphic multilevel browsing facilities and include zoom functions

...read moreread less

786 citations

1
2
3
4
…
5
6
7
8

Collapse

Cited by

PDF

Open Access

More filters

Journal Article•DOI•

Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.

[...]

Stephen F. Altschul¹, Thomas L. Madden, Alejandro A. Schäffer¹, Jinghui Zhang, Zheng Zhang², Webb Miller², David J. Lipman - Show less +3 more•Institutions (2)

National Institutes of Health¹, Pennsylvania State University²

01 Sep 1997-Nucleic Acids Research

TL;DR: A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original.

...read moreread less

Abstract: The BLAST programs are widely used tools for searching protein and DNA databases for sequence similarities. For protein comparisons, a variety of definitional, algorithmic and statistical refinements described here permits the execution time of the BLAST programs to be decreased substantially while enhancing their sensitivity to weak similarities. A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original. In addition, a method is introduced for automatically combining statistically significant alignments produced by BLAST into a position-specific score matrix, and searching the database using this matrix. The resulting Position-Specific Iterated BLAST (PSIBLAST) program runs at approximately the same speed per iteration as gapped BLAST, but in many cases is much more sensitive to weak but biologically relevant sequence similarities. PSI-BLAST is used to uncover several new and interesting members of the BRCT superfamily.

...read moreread less

70,111 citations

Journal Article•DOI•

Clustal w: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice

[...]

Julie D. Thompson, Desmond G. Higgins, Toby J. Gibson

11 Nov 1994-Nucleic Acids Research

TL;DR: The sensitivity of the commonly used progressive multiple sequence alignment method has been greatly improved and modifications are incorporated into a new program, CLUSTAL W, which is freely available.

...read moreread less

Abstract: The sensitivity of the commonly used progressive multiple sequence alignment method has been greatly improved for the alignment of divergent protein sequences. Firstly, individual weights are assigned to each sequence in a partial alignment in order to down-weight near-duplicate sequences and up-weight the most divergent ones. Secondly, amino acid substitution matrices are varied at different alignment stages according to the divergence of the sequences to be aligned. Thirdly, residue-specific gap penalties and locally reduced gap penalties in hydrophilic regions encourage new gaps in potential loop regions rather than regular secondary structure. Fourthly, positions in early alignments where gaps have been opened receive locally reduced gap penalties to encourage the opening up of new gaps at these positions. These modifications are incorporated into a new program, CLUSTAL W which is freely available.

...read moreread less

63,427 citations

Book•

Data Mining: Concepts and Techniques

[...]

Jiawei Han¹, Micheline Kamber², Jian Pei²•Institutions (2)

University of Illinois at Urbana–Champaign¹, Simon Fraser University²

08 Sep 2000

TL;DR: This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.

...read moreread less

Abstract: The increasing volume of data in modern business and science calls for more complex and sophisticated tools. Although advances in data mining technology have made extensive data collection much easier, it's still always evolving and there is a constant need for new techniques and tools that can help us transform this data into useful information and knowledge. Since the previous edition's publication, great advances have been made in the field of data mining. Not only does the third of edition of Data Mining: Concepts and Techniques continue the tradition of equipping you with an understanding and application of the theory and practice of discovering patterns hidden in large data sets, it also focuses on new, important topics in the field: data warehouses and data cube technology, mining stream, mining social networks, and mining spatial, multimedia and other complex data. Each chapter is a stand-alone guide to a critical topic, presenting proven algorithms and sound implementations ready to be used directly or with strategic modification against live data. This is the resource you need if you want to apply today's most powerful data mining techniques to meet real business challenges. * Presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects. * Addresses advanced topics such as mining object-relational databases, spatial databases, multimedia databases, time-series databases, text databases, the World Wide Web, and applications in several fields. *Provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data

...read moreread less

23,600 citations

疟原虫var基因转换速率变化导致抗原变异[英]／Paul H, Robert P, Christodoulou Z, et al//Proc Natl Acad Sci U S A

[...]

宁北芳, 朱淮民

28 Jul 2005

TL;DR: PfPMP1）与感染红细胞、树突状组胞以及胎盘的单个或多个受体作用，在黏附及免疫逃避中起关键的作�ly.

...read moreread less

Abstract: 抗原变异可使得多种致病微生物易于逃避宿主免疫应答。表达在感染红细胞表面的恶性疟原虫红细胞表面蛋白1（PfPMP1）与感染红细胞、内皮细胞、树突状细胞以及胎盘的单个或多个受体作用，在黏附及免疫逃避中起关键的作用。每个单倍体基因组var基因家族编码约60种成员，通过启动转录不同的var基因变异体为抗原变异提供了分子基础。

...read moreread less

18,940 citations

Journal Article•DOI•

The Pfam protein families database

[...]

Marco Punta¹, Penny Coggill¹, Ruth Y. Eberhardt¹, Jaina Mistry¹, John Tate¹, Chris Boursnell¹, Ningze Pang¹, Kristoffer Forslund¹, Goran Ceric¹, Jody Clements¹, Andreas Heger¹, Liisa Holm¹, Erik L. L. Sonnhammer¹, Sean R. Eddy¹, Alex Bateman¹, Robert D. Finn¹ - Show less +12 more•Institutions (1)

Wellcome Trust Sanger Institute¹

01 Jan 2000-Nucleic Acids Research

TL;DR: The definition and use of family-specific, manually curated gathering thresholds are explained and some of the features of domains of unknown function (also known as DUFs) are discussed, which constitute a rapidly growing class of families within Pfam.

...read moreread less

Abstract: Pfam is a widely used database of protein families and domains. This article describes a set of major updates that we have implemented in the latest release (version 24.0). The most important change is that we now use HMMER3, the latest version of the popular profile hidden Markov model package. This software is approximately 100 times faster than HMMER2 and is more sensitive due to the routine use of the forward algorithm. The move to HMMER3 has necessitated numerous changes to Pfam that are described in detail. Pfam release 24.0 contains 11,912 families, of which a large number have been significantly updated during the past two years. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/).

...read moreread less

14,075 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse