Author

James K. Bonfield

Other affiliations: European Bioinformatics Institute, Laboratory of Molecular Biology

Bio: James K. Bonfield is an academic researcher at the Wellcome Trust Sanger Institute. The author has contributed to research on topics including file formats and genomes. The author has an h-index of 24 and has co-authored 37 publications, which have received 8,048 citations. Previous affiliations of James K. Bonfield include the European Bioinformatics Institute and the Laboratory of Molecular Biology.

Topics: File format, Genome, Sequence assembly, Sequence analysis, Medicine ...

Papers

Journal Article

Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution

LaDeana W. Hillier¹, Webb Miller², Ewan Birney, Wesley C. Warren¹, and 171 more authors (39 institutions)

09 Dec 2004-Nature

TL;DR: A draft genome sequence of the red jungle fowl, Gallus gallus, provides a new perspective on vertebrate genome evolution, while also improving the annotation of mammalian genomes.

Abstract: We present here a draft genome sequence of the red jungle fowl, Gallus gallus. Because the chicken is a modern descendant of the dinosaurs and the first non-mammalian amniote to have its genome sequenced, the draft sequence of its genome, composed of approximately one billion base pairs of sequence and an estimated 20,000-23,000 genes, provides a new perspective on vertebrate genome evolution, while also improving the annotation of mammalian genomes. For example, the evolutionary distance between chicken and human provides high specificity in detecting functional elements, both non-coding and coding. Notably, many conserved non-coding sequences are far from genes and cannot be assigned to defined functional classes. In coding regions the evolutionary dynamics of protein domains and orthologous groups illustrate processes that distinguish the lineages leading to birds and mammals. The distinctive properties of avian microchromosomes, together with the inferred patterns of conserved synteny, provide additional insights into vertebrate chromosome architecture.

2,579 citations

Journal Article

Twelve years of SAMtools and BCFtools.

Petr Danecek¹, James K. Bonfield¹, Jennifer Liddle¹, John Marshall², Valeriu Ohan¹, Martin O. Pollard¹, Andrew Whitwham¹, Thomas M. Keane³, Shane A. McCarthy¹, Robert L. Davies¹, Heng Li⁴

Wellcome Trust Sanger Institute¹, University of Glasgow², European Bioinformatics Institute³, Harvard University⁴

01 Feb 2021-GigaScience

TL;DR: The SAMtools and BCFtools packages represent a unique collection of tools that have been used in numerous other software projects and countless genomic pipelines; both are freely available on GitHub under the permissive MIT licence, free for both non-commercial and commercial use.

Abstract: Background: SAMtools and BCFtools are widely used programs for processing and analysing high-throughput sequencing data. They include tools for file format conversion and manipulation, sorting, querying, statistics, variant calling, and effect analysis amongst other methods. Findings: The first version appeared online 12 years ago and has been maintained and further developed ever since, with many new features and improvements added over the years. The SAMtools and BCFtools packages represent a unique collection of tools that have been used in numerous other software projects and countless genomic pipelines. Conclusion: Both SAMtools and BCFtools are freely available on GitHub under the permissive MIT licence, free for both non-commercial and commercial use. Both packages have been installed >1 million times via Bioconda. The source code and documentation are available from https://www.htslib.org.

2,448 citations

Journal Article

2.2 Mb of contiguous nucleotide sequence from chromosome III of C. elegans

Richard K. Wilson¹, R. Ainscough², Karen E. Anderson¹, C. Baynes², Mary Berks², James K. Bonfield², James Burton², M. Connell¹, T. Copsey², John A. Cooper¹, Alan Coulson², M. Craxton², S. Dear², Zijin Du¹, Richard Durbin², Anthony Favello¹, A. Fraser², Lucinda Fulton¹, A. Gardner², Philip Green¹, T. Hawkins², LaDeana W. Hillier¹, M. Jier¹, L. Johnston¹, Martin K. Jones², J. Kershaw², J. Kirsten¹, N. Laisster², Phil Latreille¹, J. Lightning², C. Lloyd², Beverley J. Mortimore², M. O'Callaghan², J. Parsons¹, C. Percy², L. Rifken¹, A. Roopra¹, D. Saunders², Ratna Shownkeen², M. Sims², N. Smaldon², Andrew J.H. Smith², Michael D. Smith², Erik L. L. Sonnhammer², Rodger Staden², John Sulston², Jean Thierry-Mieg³, K. Thomas², M. Vaudin¹, K. Vaughan¹, Robert H. Waterston¹, A. Watson², L. Weinstock¹, J. Wilkinson-Sproat², P. Wohldman¹

Washington University in St. Louis¹, Laboratory of Molecular Biology², Centre national de la recherche scientifique³

03 Mar 1994-Nature

TL;DR: The nucleotide sequence of a contiguous 2,181,032 base pairs in the central gene cluster of chromosome III is completed, and comparison with the public sequence databases reveals similarities to previously known genes for about one gene in three.

Abstract: As part of our effort to sequence the 100-megabase (Mb) genome of the nematode Caenorhabditis elegans, we have completed the nucleotide sequence of a contiguous 2,181,032 base pairs in the central gene cluster of chromosome III. Analysis of the finished sequence has indicated an average density of about one gene per five kilobases; comparison with the public sequence databases reveals similarities to previously known genes for about one gene in three. In addition, the genomic sequence contains several intriguing features, including putative gene duplications and a variety of other repeats with potential evolutionary implications.

1,612 citations

Book Chapter

The Staden package, 1998.

Rodger Staden¹, Kathryn Beal¹, James K. Bonfield¹

Laboratory of Molecular Biology¹

01 Jan 2000-Methods in Molecular Biology

1,124 citations

Journal Article

A new DNA sequence assembly program

James K. Bonfield, Kathryn F. Smith, Rodger Staden

01 Jan 1995-Nucleic Acids Research

TL;DR: The Genome Assembly Program (GAP), a new program for DNA sequence assembly, is described; it retains the useful components of the previous work but includes many novel ideas and methods.

Abstract: We describe the Genome Assembly Program (GAP), a new program for DNA sequence assembly. The program is suitable for large and small projects, a variety of strategies and can handle data from a range of sequencing instruments. It retains the useful components of our previous work, but includes many novel ideas and methods. Many of these methods have been made possible by the program's completely new, and highly interactive, graphical user interface. The program provides many visual clues to the current state of a sequencing project and allows users to interact in intuitive and graphical ways with their data. The program has tools to display and manipulate the various types of data that help to solve and check difficult assemblies, particularly those in repetitive genomes. We have introduced the following new displays: the Contig Selector, the Contig Comparator, the Template Display, the Restriction Enzyme Map and the Stop Codon Map. We have also made it possible to have any number of Contig Editors and Contig Joining Editors running simultaneously even on the same contig. The program also includes a new 'Directed Assembly' algorithm and routines for automatically detecting unfinished segments of sequence, to which it suggests experimental solutions.

951 citations

Cited by

Journal Article

Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.

Stephen F. Altschul¹, Thomas L. Madden, Alejandro A. Schäffer¹, Jinghui Zhang, Zheng Zhang², Webb Miller², David J. Lipman

National Institutes of Health¹, Pennsylvania State University²

01 Sep 1997-Nucleic Acids Research

TL;DR: A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original.

Abstract: The BLAST programs are widely used tools for searching protein and DNA databases for sequence similarities. For protein comparisons, a variety of definitional, algorithmic and statistical refinements described here permits the execution time of the BLAST programs to be decreased substantially while enhancing their sensitivity to weak similarities. A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original. In addition, a method is introduced for automatically combining statistically significant alignments produced by BLAST into a position-specific score matrix, and searching the database using this matrix. The resulting Position-Specific Iterated BLAST (PSI-BLAST) program runs at approximately the same speed per iteration as gapped BLAST, but in many cases is much more sensitive to weak but biologically relevant sequence similarities. PSI-BLAST is used to uncover several new and interesting members of the BRCT superfamily.

70,111 citations

Journal Article

Initial sequencing and analysis of the human genome.

Eric S. Lander¹, Lauren Linton¹, Bruce W. Birren¹, Chad Nusbaum¹, and 245 more authors (29 institutions)

15 Feb 2001-Nature

TL;DR: The results of an international collaboration to produce and make freely available a draft sequence of the human genome are reported and an initial analysis is presented, describing some of the insights that can be gleaned from the sequence.

Abstract: The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

22,269 citations

Journal Article

Oligomerization and phosphorylation of the Ire1p kinase during intracellular signaling from the endoplasmic reticulum to the nucleus.

Caroline E. Shamu¹, Peter Walter¹

University of California, San Francisco¹

17 Jun 1996-The EMBO Journal

TL;DR: Molecular genetic and biochemical studies described here suggest that, as in the case of growth factor receptors of higher eukaryotic cells, Ire1p oligomerizes in response to the accumulation of unfolded proteins in the ER and is phosphorylated in trans by other Ire1p molecules as a result of oligomerization.

Abstract: The transmembrane kinase Ire1p is required for activation of the unfolded protein response (UPR), the increase in transcription of genes encoding endoplasmic reticulum (ER) resident proteins that occurs in response to the accumulation of unfolded proteins in the ER. Ire1p spans the ER membrane (or the nuclear membrane with which the ER is continuous), with its kinase domain localized in the cytoplasm or in the nucleus. Consistent with this arrangement, it has been proposed that Ire1p senses the accumulation of unfolded proteins in the ER and transmits the signal across the membrane toward the transcription machinery, possibly by phosphorylating downstream components of the UPR pathway. Molecular genetic and biochemical studies described here suggest that, as in the case of growth factor receptors of higher eukaryotic cells, Ire1p oligomerizes in response to the accumulation of unfolded proteins in the ER and is phosphorylated in trans by other Ire1p molecules as a result of oligomerization. In addition to its kinase domain, a C-terminal tail domain of Ire1p is required for induction of the UPR. The role of the tail is probably to bind other proteins that transmit the unfolded protein signal to the nucleus.

12,185 citations

Journal Article

Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets

Benjamin P. Lewis¹, Christopher B. Burge¹, David P. Bartel¹

Massachusetts Institute of Technology¹

14 Jan 2005-Cell

TL;DR: In a four-genome analysis of 3' UTRs, approximately 13,000 regulatory relationships were detected above the estimate of false-positive predictions, thereby implicating as miRNA targets more than 5300 human genes, which represented 30% of the gene set.

11,624 citations

Journal Article

Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence

Stewart T. Cole, Roland Brosch, Julian Parkhill¹, Thierry Garnier, Carol Churcher¹, David Harris¹, Stephen V. Gordon, Karin Eiglmeier, S. Gas, Clifton E. Barry², Fredj Tekaia, K. Badcock¹, D. Basham¹, D. Brown¹, Tracey Chillingworth¹, R. Connor¹, Robert L. Davies¹, K. Devlin¹, Theresa Feltwell¹, S. Gentles¹, N. Hamlin¹, S. Holroyd¹, T. Hornsby¹, Kay Jagels¹, Anders Krogh³, J. McLean¹, Sharon Moule¹, Lee Murphy¹, K. Oliver¹, J. Osborne¹, Michael A. Quail¹, Marie-Adèle Rajandream¹, Jane Rogers¹, S. Rutter¹, K. Seeger¹, Jason Skelton¹, Rob Squares¹, S. Squares¹, John Sulston¹, K. Taylor¹, Sally Whitehead¹, Bart Barrell¹

Wellcome Trust¹, National Institutes of Health², Technical University of Denmark³

11 Jun 1998-Nature

TL;DR: The complete genome sequence of the best-characterized strain of Mycobacterium tuberculosis, H37Rv, has been determined and analysed in order to improve the understanding of the biology of this slow-growing pathogen and to help the conception of new prophylactic and therapeutic interventions.

Abstract: Countless millions of people have died from tuberculosis, a chronic infectious disease caused by the tubercle bacillus. The complete genome sequence of the best-characterized strain of Mycobacterium tuberculosis, H37Rv, has been determined and analysed in order to improve our understanding of the biology of this slow-growing pathogen and to help the conception of new prophylactic and therapeutic interventions. The genome comprises 4,411,529 base pairs, contains around 4,000 genes, and has a very high guanine + cytosine content that is reflected in the biased amino-acid content of the proteins. M. tuberculosis differs radically from other bacteria in that a very large portion of its coding capacity is devoted to the production of enzymes involved in lipogenesis and lipolysis, and to two new families of glycine-rich proteins with a repetitive structure that may represent a source of antigenic variation.

7,779 citations
