scispace - formally typeset
Search or ask a question
Author

James K. Bonfield

Bio: James K. Bonfield is an academic researcher from Wellcome Trust Sanger Institute. The author has contributed to research in topics: File format & Genome. The author has an hindex of 24, co-authored 37 publications receiving 8048 citations. Previous affiliations of James K. Bonfield include European Bioinformatics Institute & Laboratory of Molecular Biology.

Papers
More filters
Journal ArticleDOI
LaDeana W. Hillier1, Webb Miller2, Ewan Birney, Wesley C. Warren1  +171 moreInstitutions (39)
09 Dec 2004-Nature
TL;DR: A draft genome sequence of the red jungle fowl, Gallus gallus, provides a new perspective on vertebrate genome evolution, while also improving the annotation of mammalian genomes.
Abstract: We present here a draft genome sequence of the red jungle fowl, Gallus gallus. Because the chicken is a modern descendant of the dinosaurs and the first non-mammalian amniote to have its genome sequenced, the draft sequence of its genome--composed of approximately one billion base pairs of sequence and an estimated 20,000-23,000 genes--provides a new perspective on vertebrate genome evolution, while also improving the annotation of mammalian genomes. For example, the evolutionary distance between chicken and human provides high specificity in detecting functional elements, both non-coding and coding. Notably, many conserved non-coding sequences are far from genes and cannot be assigned to defined functional classes. In coding regions the evolutionary dynamics of protein domains and orthologous groups illustrate processes that distinguish the lineages leading to birds and mammals. The distinctive properties of avian microchromosomes, together with the inferred patterns of conserved synteny, provide additional insights into vertebrate chromosome architecture.

2,579 citations

Journal ArticleDOI
TL;DR: The SAMtools and BCFtools packages represent a unique collection of tools that have been used in numerous other software projects and countless genomic pipelines and are freely available on GitHub under the permissive MIT licence, free for both noncommercial and commercial use.
Abstract: Background: SAMtools and BCFtools are widely used programs for processing and analysing high-throughput sequencing data. They include tools for file format conversion and manipulation, sorting, querying, statistics, variant calling, and effect analysis amongst other methods. Findings: The first version appeared online 12 years ago and has been maintained and further developed ever since, with many new features and improvements added over the years. The SAMtools and BCFtools packages represent a unique collection of tools that have been used in numerous other software projects and countless genomic pipelines. Conclusion: Both SAMtools and BCFtools are freely available on GitHub under the permissive MIT licence, free for both non-commercial and commercial use. Both packages have been installed >1 million times via Bioconda. The source code and documentation are available from https://www.htslib.org.

2,448 citations

Journal ArticleDOI
03 Mar 1994-Nature
TL;DR: The nucleotide sequence of a contiguous 2,181,032 base pairs in the central gene cluster of chromosome III is completed, and comparison with the public sequence databases reveals similarities to previously known genes for about one gene in three.
Abstract: As part of our effort to sequence the 100-megabase (Mb) genome of the nematode Caenorhabditis elegans, we have completed the nucleotide sequence of a contiguous 2,181,032 base pairs in the central gene cluster of chromosome III. Analysis of the finished sequence has indicated an average density of about one gene per five kilobases; comparison with the public sequence databases reveals similarities to previously known genes for about one gene in three. In addition, the genomic sequence contains several intriguing features, including putative gene duplications and a variety of other repeats with potential evolutionary implications.

1,612 citations

Journal ArticleDOI
TL;DR: The Genome Assembly Program (GAP), a new program for DNA sequence assembly, is described, which retains the useful components of the previous work, but includes many novel ideas and methods.
Abstract: We describe the Genome Assembly Program (GAP), a new program for DNA sequence assembly. The program is suitable for large and small projects, a variety of strategies and can handle data from a range of sequencing instruments. It retains the useful components of our previous work, but includes many novel ideas and methods. Many of these methods have been made possible by the program's completely new, and highly interactive, graphical user interface. The program provides many visual clues to the current state of a sequencing project and allows users to interact in intuitive and graphical ways with their data. The program has tools to display and manipulate the various types of data that help to solve and check difficult assemblies, particularly those in repetitive genomes. We have introduced the following new displays: the Contig Selector, the Contig Comparator, the Template Display, the Restriction Enzyme Map and the Stop Codon Map. We have also made it possible to have any number of Contig Editors and Contig Joining Editors running simultaneously even on the same contig. The program also includes a new 'Directed Assembly' algorithm and routines for automatically detecting unfinished segments of sequence, to which it suggests experimental solutions.

951 citations


Cited by
More filters
Journal ArticleDOI
TL;DR: A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original.
Abstract: The BLAST programs are widely used tools for searching protein and DNA databases for sequence similarities. For protein comparisons, a variety of definitional, algorithmic and statistical refinements described here permits the execution time of the BLAST programs to be decreased substantially while enhancing their sensitivity to weak similarities. A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original. In addition, a method is introduced for automatically combining statistically significant alignments produced by BLAST into a position-specific score matrix, and searching the database using this matrix. The resulting Position-Specific Iterated BLAST (PSIBLAST) program runs at approximately the same speed per iteration as gapped BLAST, but in many cases is much more sensitive to weak but biologically relevant sequence similarities. PSI-BLAST is used to uncover several new and interesting members of the BRCT superfamily.

70,111 citations

Journal ArticleDOI
Eric S. Lander1, Lauren Linton1, Bruce W. Birren1, Chad Nusbaum1  +245 moreInstitutions (29)
15 Feb 2001-Nature
TL;DR: The results of an international collaboration to produce and make freely available a draft sequence of the human genome are reported and an initial analysis is presented, describing some of the insights that can be gleaned from the sequence.
Abstract: The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

22,269 citations

Journal ArticleDOI
TL;DR: Molecular genetic and biochemical studies described here suggest that, as in the case of growth factor receptors of higher eukaryotic cells, Ire1p oligomerizes in response to the accumulation of unfolded proteins in the ER and is phosphorylated in trans by otherIre1p molecules as a result of oligomerization.
Abstract: The transmembrane kinase Ire1p is required for activation of the unfolded protein response (UPR), the increase in transcription of genes encoding endoplasmic reticulum (ER) resident proteins that occurs in response to the accumulation of unfolded proteins in the ER. Ire1p spans the ER membrane (or the nuclear membrane with which the ER is continuous), with its kinase domain localized in the cytoplasm or in the nucleus. Consistent with this arrangement, it has been proposed that Ire1p senses the accumulation of unfolded proteins in the ER and transmits the signal across the membrane toward the transcription machinery, possibly by phosphorylating downstream components of the UPR pathway. Molecular genetic and biochemical studies described here suggest that, as in the case of growth factor receptors of higher eukaryotic cells, Ire1p oligomerizes in response to the accumulation of unfolded proteins in the ER and is phosphorylated in trans by other Ire1p molecules as a result of oligomerization. In addition to its kinase domain, a C-terminal tail domain of Ire1p is required for induction of the UPR. The role of the tail is probably to bind other proteins that transmit the unfolded protein signal to the nucleus.

12,185 citations

Journal ArticleDOI
14 Jan 2005-Cell
TL;DR: In a four-genome analysis of 3' UTRs, approximately 13,000 regulatory relationships were detected above the estimate of false-positive predictions, thereby implicating as miRNA targets more than 5300 human genes, which represented 30% of the gene set.

11,624 citations

Journal ArticleDOI
11 Jun 1998-Nature
TL;DR: The complete genome sequence of the best-characterized strain of Mycobacterium tuberculosis, H37Rv, has been determined and analysed in order to improve the understanding of the biology of this slow-growing pathogen and to help the conception of new prophylactic and therapeutic interventions.
Abstract: Countless millions of people have died from tuberculosis, a chronic infectious disease caused by the tubercle bacillus. The complete genome sequence of the best-characterized strain of Mycobacterium tuberculosis, H37Rv, has been determined and analysed in order to improve our understanding of the biology of this slow-growing pathogen and to help the conception of new prophylactic and therapeutic interventions. The genome comprises 4,411,529 base pairs, contains around 4,000 genes, and has a very high guanine + cytosine content that is reflected in the biased amino-acid content of the proteins. M. tuberculosis differs radically from other bacteria in that a very large portion of its coding capacity is devoted to the production of enzymes involved in lipogenesis and lipolysis, and to two new families of glycine-rich proteins with a repetitive structure that may represent a source of antigenic variation.

7,779 citations