Showing papers by "Simon A. Forbes" published in 2008


Journal Article
TL;DR: This unit describes the graphical system in detail, elaborating an example walkthrough and the many ways that the resulting information can be thoroughly investigated by combining data, respecializing the query, or viewing the results in different ways.
Abstract: COSMIC is currently the most comprehensive global resource for information on somatic mutations in human cancer, combining curation of the scientific literature with tumor resequencing data from the Cancer Genome Project at the Sanger Institute, U.K. Almost 4800 genes and 250000 tumors have been examined, resulting in over 50000 mutations available for investigation. This information can be accessed in a number of ways, the most convenient being the Web-based system which allows detailed data mining, presenting the results in easily interpretable formats. This unit describes the graphical system in detail, elaborating an example walkthrough and the many ways that the resulting information can be thoroughly investigated by combining data, respecializing the query, or viewing the results in different ways. Alternate protocols give an overview of the precompiled data files available for download.

898 citations


Journal Article
TL;DR: The primary aim of the MHC Haplotype Project was to provide a comprehensively annotated reference sequence of a single, human leukocyte antigen-homozygous MHC haplotype and to use it as a basis against which variations could be assessed from seven other similarly homozygous cell lines, representative of the most common MHC haplotypes in the European population.
Abstract: The human major histocompatibility complex (MHC) is contained within about 4 Mb on the short arm of chromosome 6 and is recognised as the most variable region in the human genome. The primary aim of the MHC Haplotype Project was to provide a comprehensively annotated reference sequence of a single, human leukocyte antigen-homozygous MHC haplotype and to use it as a basis against which variations could be assessed from seven other similarly homozygous cell lines, representative of the most common MHC haplotypes in the European population. Comparison of the haplotype sequences, including four haplotypes not previously analysed, resulted in the identification of >44,000 variations, both substitutions and indels (insertions and deletions), which have been submitted to the dbSNP database. The gene annotation uncovered haplotype-specific differences and confirmed the presence of more than 300 loci, including over 160 protein-coding genes. Combined analysis of the variation and annotation datasets revealed 122 gene loci with coding substitutions, of which 97 were non-synonymous. The haplotype (A3-B7-DR15; PGF cell line), designated as the new MHC reference sequence, has been incorporated into the human genome assembly (NCBI35 and subsequent builds), and constitutes the largest single-haplotype sequence of the human genome to date. The extensive variation and annotation data derived from the analysis of seven further haplotypes have been made publicly available and provide a framework and resource for future association studies of all MHC-associated diseases and transplant medicine.

314 citations



Journal Article
TL;DR: The findings of a linkage analysis, involving numerous markers from the human X chromosome, in an attempt to localise a putative gene causing apparent X‐linked spina bifida and anencephaly in a large Icelandic pedigree are reported.
Abstract: We report here the findings of a linkage analysis, involving numerous markers from the human X chromosome, in an attempt to localise a putative gene causing apparent X-linked spina bifida and anencephaly (SBA) in a large Icelandic pedigree. Two-point linkage analysis was performed using markers from 62 informative loci in this family. Although small positive lod scores were found at a number of these loci, none reached the significance level for linkage. Haplotypes were extensively analysed and found to exclude linkage to the X chromosome.

10 citations


01 Dec 2008
TL;DR: The current version of COSMIC is close to fulfilling its original intentions, with curation of most point-mutated genes in cancer complete. However, new challenges are emerging with the need to calculate the effect of high numbers of observed sequence changes to identify those driving tumour formation, and the need to meaningfully handle the increasing quantities of data from high-throughput screens and next-generation sequencing technologies.
Abstract: Background. COSMIC (http://www.sanger.ac.uk/cosmic) is a system designed to curate the world's literature on somatic mutations in known cancer genes. Initially conceived to capture the mutation spread in point-mutated genes, COSMIC has now grown to encompass gene fusion products of genome rearrangement events which generate completely novel transcripts, together with all the somatic mutation data from candidate gene screens at the Cancer Genome Project, UK (CGP), covering almost 5000 genes of potential interest in cancer genetics. Results. The latest release of COSMIC (version 37; July 2008) now holds full and up-to-date curation of over 5,900 scientific papers, examining over 268,000 tumours, in which over 59,000 mutations are detailed through 60 point-mutated genes. Fusion gene products have been curated for 16 pairs of genes, described through over 4200 tumours. 2246 papers were rejected during manual curation, usually due to significant inconsistencies in the publication. A relational database holds the captured information, which is warehoused for each release. The information is presented on the internet with a series of graphical and tabulated views aiding navigation and interpretation. Conclusions. The current version of COSMIC is close to fulfilling its original intentions, with curation of most point-mutated genes in cancer complete. However, new challenges are emerging with the need to calculate the effect of high numbers of observed sequence changes to identify those driving tumour formation, and the need to meaningfully handle the increasing quantities of data from high-throughput screens and next-generation sequencing technologies.