scispace - formally typeset
Search or ask a question
Author

Daniel Barker

Bio: Daniel Barker is an academic researcher from University of Newcastle. The author has contributed to research in topics: Population & Medicine. The author has an hindex of 30, co-authored 133 publications receiving 11609 citations. Previous affiliations of Daniel Barker include Edgewood Chemical Biological Center & University of St Andrews.


Papers
More filters
Journal ArticleDOI
21 Oct 2004-Nature
TL;DR: The current human genome sequence (Build 35) as discussed by the authors contains 2.85 billion nucleotides interrupted by only 341 gaps and is accurate to an error rate of approximately 1 event per 100,000 bases.
Abstract: The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers approximately 99% of the euchromatic genome and is accurate to an error rate of approximately 1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.

3,989 citations

Journal ArticleDOI
Andrew G. Clark1, Michael B. Eisen2, Michael B. Eisen3, Douglas Smith  +426 moreInstitutions (70)
08 Nov 2007-Nature
TL;DR: These genome sequences augment the formidable genetic tools that have made Drosophila melanogaster a pre-eminent model for animal genetics, and will further catalyse fundamental research on mechanisms of development, cell biology, genetics, disease, neurobiology, behaviour, physiology and evolution.
Abstract: Comparative analysis of multiple genomes in a phylogenetic framework dramatically improves the precision and sensitivity of evolutionary inference, producing more robust results than single-genome analyses can provide. The genomes of 12 Drosophila species, ten of which are presented here for the first time (sechellia, simulans, yakuba, erecta, ananassae, persimilis, willistoni, mojavensis, virilis and grimshawi), illustrate how rates and patterns of sequence divergence across taxa can illuminate evolutionary processes on a genomic scale. These genome sequences augment the formidable genetic tools that have made Drosophila melanogaster a pre-eminent model for animal genetics, and will further catalyse fundamental research on mechanisms of development, cell biology, genetics, disease, neurobiology, behaviour, physiology and evolution. Despite remarkable similarities among these Drosophila species, we identified many putatively non-neutral changes in protein-coding genes, non-coding RNA genes, and cis-regulatory regions. These may prove to underlie differences in the ecology and behaviour of these diverse species.

2,057 citations

Journal ArticleDOI
TL;DR: The Ensembl database project provides a bioinformatics framework to organise biology around the sequences of large genomes and is a comprehensive source of stable automatic annotation of the human genome sequence, with confirmed gene predictions that have been integrated with external data sources.
Abstract: The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organise biology around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of the human genome sequence, with confirmed gene predictions that have been integrated with external data sources, and is available as either an interactive web site or as flat files. It is also an open source software engineering project to develop a portable system able to handle very large genomes and associated requirements from sequence analysis to data storage and visualisation. The Ensembl site is one of the leading sources of human genome sequence annotation and provided much of the analysis for publication by the international human genome project of the draft genome. The Ensembl system is being installed around the world in both companies and academic sites on machines ranging from supercomputers to laptops.

1,540 citations

Journal ArticleDOI
TL;DR: This work describes how to combine information about the uncertainty of the phylogeny with uncertainty in the estimate of the ancestral state and shows how to reconstruct ancestral states of uncertain nodes using a most-recent-common-ancestor approach.
Abstract: Biologists frequently attempt to infer the character states at ancestral nodes of a phylogeny from the distribution of traits observed in contemporary organisms. Because phylogenies are normally inferences from data, it is desirable to account for the uncertainty in estimates of the tree and its branch lengths when making inferences about ancestral states or other comparative parameters. Here we present a general Bayesian approach for testing comparative hypotheses across statistically justified samples of phylogenies, focusing on the specific issue of reconstructing ancestral states. The method uses Markov chain Monte Carlo techniques for sampling phylogenetic trees and for investigating the parameters of a statistical model of trait evolution. We describe how to combine information about the uncertainty of the phylogeny with uncertainty in the estimate of the ancestral state. Our approach does not constrain the sample of trees only to those that contain the ancestral node or nodes of interest, and we show how to reconstruct ancestral states of uncertain nodes using a most-recent-common-ancestor approach. We illustrate the methods with data on ribonuclease evolution in the Artiodactyla. Software implementing the methods (BayesMultiState) is available from the authors.

1,177 citations

Journal ArticleDOI
22 Jul 2005-Science
TL;DR: It is found that rates of population growth are high at low population densities but, contrary to previous predictions, decline rapidly with increasing population size and then flatten out, for all four taxa.
Abstract: A key unresolved question in population ecology concerns the relationship between a population's size and its growth rate. We estimated this relationship for 1780 time series of mammals, birds, fish, and insects. We found that rates of population growth are high at low population densities but, contrary to previous predictions, decline rapidly with increasing population size and then flatten out, for all four taxa. This produces a strongly concave relationship between a population's growth rate and its size. These findings have fundamental implications for our understanding of animals' lives, suggesting in particular that many animals in these taxa will be found living at densities above the carrying capacity of their environments.

406 citations


Cited by
More filters
Journal ArticleDOI
TL;DR: Characteristics of patients who died were in line with the MuLBSTA score, an early warning model for predicting mortality in viral pneumonia, and further investigation is needed to explore the applicability of the Mu LBSTA scores in predicting the risk of mortality in 2019-nCoV infection.

16,282 citations

Journal ArticleDOI
TL;DR: DAMID is a web-accessible program that integrates functional genomic annotations with intuitive graphical summaries that assists in the interpretation of genome-scale datasets by facilitating the transition from data collection to biological meaning.
Abstract: The distributed nature of biological knowledge poses a major challenge to the interpretation of genome-scale datasets, including those derived from microarray and proteomic studies. This report describes DAVID, a web-accessible program that integrates functional genomic annotations with intuitive graphical summaries. Lists of gene or protein identifiers are rapidly annotated and summarized according to shared categorical data for Gene Ontology, protein domain, and biochemical pathway membership. DAVID assists in the interpretation of genome-scale datasets by facilitating the transition from data collection to biological meaning.

8,849 citations

Journal ArticleDOI
28 Oct 2010-Nature
TL;DR: The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype as mentioned in this paper, and the results of the pilot phase of the project, designed to develop and compare different strategies for genomewide sequencing with high-throughput platforms.
Abstract: The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother-father-child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10(-8) per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research.

7,538 citations

Journal ArticleDOI
TL;DR: The Swiss-Prot, TrEMBL and PIR protein database activities have united to form the Universal Protein Knowledgebase (UniProt), which is to provide a comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase, with extensive cross-references and query interfaces.
Abstract: To provide the scientific community with a single, centralized, authoritative resource for protein sequences and functional information, the Swiss-Prot, TrEMBL and PIR protein database activities have united to form the Universal Protein Knowledgebase (UniProt) consortium. Our mission is to provide a comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase, with extensive cross-references and query interfaces. The central database will have two sections, corresponding to the familiar Swiss-Prot (fully manually curated entries) and TrEMBL (enriched with automated classification, annotation and extensive cross-references). For convenient sequence searches, UniProt also provides several non-redundant sequence databases. The UniProt NREF (UniRef) databases provide representative subsets of the knowledgebase suitable for efficient searching. The comprehensive UniProt Archive (UniParc) is updated daily from many public source databases. The UniProt databases can be accessed online (http://www.uniprot.org) or downloaded in several formats (ftp://ftp.uniprot.org/pub). The scientific community is encouraged to submit data for inclusion in UniProt.

7,298 citations