
Showing papers by "Wellcome Trust Sanger Institute" published in 2006


Journal Article
TL;DR: The miRBase database aims to provide integrated interfaces to comprehensive microRNA sequence data, annotation and predicted gene targets, and acts as an independent arbiter of microRNA gene nomenclature.
Abstract: The miRBase database aims to provide integrated interfaces to comprehensive microRNA sequence data, annotation and predicted gene targets. miRBase takes over functionality from the microRNA Registry and fulfils three main roles: the miRBase Registry acts as an independent arbiter of microRNA gene nomenclature, assigning names prior to publication of novel miRNA sequences. miRBase Sequences is the primary online repository for miRNA sequence data and annotation. miRBase Targets is a comprehensive new database of predicted miRNA target genes. miRBase is available at http://microrna.sanger.ac.uk/.

4,629 citations


Journal Article
23 Nov 2006-Nature
TL;DR: A first-generation CNV map of the human genome is constructed through the study of 270 individuals from four populations with ancestry in Europe, Africa or Asia, underscoring the importance of CNV in genetic diversity and evolution and the utility of this resource for genetic disease studies.
Abstract: Copy number variation (CNV) of DNA sequences is functionally significant but has yet to be fully ascertained. We have constructed a first-generation CNV map of the human genome through the study of 270 individuals from four populations with ancestry in Europe, Africa or Asia (the HapMap collection). DNA from these individuals was screened for CNV using two complementary technologies: single-nucleotide polymorphism (SNP) genotyping arrays, and clone-based comparative genomic hybridization. A total of 1,447 copy number variable regions (CNVRs), which can encompass overlapping or adjacent gains or losses, covering 360 megabases (12% of the genome) were identified in these populations. These CNVRs contained hundreds of genes, disease loci, functional elements and segmental duplications. Notably, the CNVRs encompassed more nucleotide content per genome than SNPs, underscoring the importance of CNV in genetic diversity and evolution. The data obtained delineate linkage disequilibrium patterns for many CNVs, and reveal marked variation in copy number among populations. We also demonstrate the utility of this resource for genetic disease studies.

4,275 citations
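The abstract describes CNVRs as regions that "can encompass overlapping or adjacent gains or losses". A minimal sketch of that merging step follows, assuming CNV calls are given as half-open (chromosome, start, end) intervals and that "adjacent" means touching end-to-start (gap of 0 bp). The function names and coordinates are hypothetical; this is not the study's pipeline or data.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open, 0-based

def merge_into_cnvrs(calls: List[Interval], max_gap: int = 0) -> List[Interval]:
    """Merge overlapping or adjacent CNV calls (gains or losses alike) into regions."""
    regions: List[Interval] = []
    for chrom, start, end in sorted(calls):
        if regions and regions[-1][0] == chrom and start <= regions[-1][2] + max_gap:
            # Overlaps or touches the current region on the same chromosome: extend it.
            last_chrom, last_start, last_end = regions[-1]
            regions[-1] = (last_chrom, last_start, max(last_end, end))
        else:
            regions.append((chrom, start, end))
    return regions

def total_coverage(regions: List[Interval]) -> int:
    """Total base pairs covered by non-overlapping merged regions."""
    return sum(end - start for _, start, end in regions)

# Hypothetical toy calls, not data from the study.
calls = [("chr1", 100, 500), ("chr1", 400, 900), ("chr1", 900, 1200),
         ("chr1", 5000, 5600), ("chr2", 100, 300)]
cnvrs = merge_into_cnvrs(calls)
print(cnvrs, total_coverage(cnvrs))
```

Summing the lengths of merged regions, rather than of the raw calls, avoids double-counting sequence covered by overlapping calls when reporting totals such as megabases or percentage of the genome.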


Journal Article
TL;DR: Improvements to the range of Pfam web tools and the first set of Pfam web services that allow programmatic access to the database and associated tools are presented.
Abstract: Pfam is a database of protein families that currently contains 7973 entries (release 18.0). A recent development in Pfam has enabled the grouping of related families into clans. Pfam clans are described in detail, together with the new associated web pages. Improvements to the range of Pfam web tools and the first set of Pfam web services that allow programmatic access to the database and associated tools are also presented. Pfam is available on the web in the UK (http://www.sanger.ac.uk/Software/Pfam/), the USA (http://pfam.wustl.edu/), France (http://pfam.jouy.inra.fr/) and Sweden (http://pfam.cgb.ki.se/).

2,241 citations


Journal Article
TL;DR: RPA couples isothermal recombinase-driven primer targeting of template material with strand-displacement DNA synthesis and achieves exponential amplification with no need for pretreatment of sample DNA; combined with detection of products in a simple sandwich assay, it establishes an instrument-free DNA testing system.
Abstract: DNA amplification is essential to most nucleic acid testing strategies, but established techniques require sophisticated equipment or complex experimental procedures, and their uptake outside specialised laboratories has been limited. Our novel approach, recombinase polymerase amplification (RPA), couples isothermal recombinase-driven primer targeting of template material with strand-displacement DNA synthesis. It achieves exponential amplification with no need for pretreatment of sample DNA. Reactions are sensitive, specific, and rapid and operate at constant low temperature. We have also developed a probe-based detection system. Key aspects of the combined RPA amplification/detection process are illustrated by a test for the pathogen methicillin-resistant Staphylococcus aureus. The technology proves to be sensitive to fewer than ten copies of genomic DNA. Furthermore, products can be detected in a simple sandwich assay, thereby establishing an instrument-free DNA testing system. This unique combination of properties is a significant advance in the development of portable and widely accessible nucleic acid-based tests.

1,655 citations


Journal Article
TL;DR: Analysis of six annotation categories showed that evolutionarily conserved regions are the predominant sites for differential DNA methylation and that a core region surrounding the transcriptional start site is an informative surrogate for promoter methylation.
Abstract: DNA methylation constitutes the most stable type of epigenetic modification modulating the transcriptional plasticity of mammalian genomes. Using bisulfite DNA sequencing, we report high-resolution methylation reference profiles of human chromosomes 6, 20 and 22, providing a resource of about 1.9 million CpG methylation values derived from 12 different tissues. Analysis of 6 annotation categories revealed evolutionarily conserved regions to be the predominant sites for differential DNA methylation and a core region surrounding the transcriptional start site to be an informative surrogate for promoter methylation. We find 17% of the 873 analyzed genes to be differentially methylated in their 5′-untranslated regions (5′-UTRs), and about one third of the differentially methylated 5′-UTRs to be inversely correlated with transcription. Although our study was controlled for factors reported to affect DNA methylation, such as sex and age, we did not find any significant attributable effects. Our data suggest DNA methylation to be ontogenetically more stable than previously thought.

1,335 citations


Journal Article
TL;DR: The complete genome sequence of Clostridium difficile strain 630, a virulent and multidrug-resistant strain, is determined; it indicates that a large proportion (11%) of the genome consists of mobile genetic elements, mainly in the form of conjugative transposons.
Abstract: We determined the complete genome sequence of Clostridium difficile strain 630, a virulent and multidrug-resistant strain. Our analysis indicates that a large proportion (11%) of the genome consists of mobile genetic elements, mainly in the form of conjugative transposons. These mobile elements are putatively responsible for the acquisition by C. difficile of an extensive array of genes involved in antimicrobial resistance, virulence, host interaction and the production of surface structures. The metabolic capabilities encoded in the genome show multiple adaptations for survival and growth within the gut environment. The extreme genome variability was confirmed by whole-genome microarray analysis; it may reflect the organism's niche in the gut and should provide information on the evolution of virulence in this organism.

892 citations


Journal Article
TL;DR: Current efforts are directed toward a more comprehensive cataloging and characterization of CNVs that will provide the basis for determining how genomic diversity impacts biological function, evolution, and common human diseases.
Abstract: DNA copy number variation has long been associated with specific chromosomal rearrangements and genomic disorders, but its ubiquity in mammalian genomes was not fully realized until recently. Although our understanding of the extent of this variation is still developing, it seems likely that, at least in humans, copy number variants (CNVs) account for a substantial amount of genetic variation. Since many CNVs include genes that result in differential levels of gene expression, CNVs may account for a significant proportion of normal phenotypic variation. Current efforts are directed toward a more comprehensive cataloging and characterization of CNVs that will provide the basis for determining how genomic diversity impacts biological function, evolution, and common human diseases.

855 citations


Journal Article
09 Nov 2006-Neuron
TL;DR: Arc/Arg3.1 knockout mice fail to form long-lasting memories for implicit and explicit learning tasks despite intact short-term memory, demonstrating a critical role for Arc/Arg3.1 in the consolidation of enduring synaptic plasticity and memory storage.

809 citations


Journal Article
TL;DR: The analysis provides informative tag SNPs that capture much of the common variation in the MHC region and that could be used in disease association studies, and it provides new insight into the evolutionary dynamics and ancestral origins of the HLA loci and their haplotypes.
Abstract: The proteins encoded by the classical HLA class I and class II genes in the major histocompatibility complex (MHC) are highly polymorphic and are essential in self versus non-self immune recognition. HLA variation is a crucial determinant of transplant rejection and susceptibility to a large number of infectious and autoimmune diseases. Yet identification of causal variants is problematic owing to linkage disequilibrium that extends across multiple HLA and non-HLA genes in the MHC. We therefore set out to characterize the linkage disequilibrium patterns between the highly polymorphic HLA genes and background variation by typing the classical HLA genes and >7,500 common SNPs and deletion-insertion polymorphisms across four population samples. The analysis provides informative tag SNPs that capture much of the common variation in the MHC region and that could be used in disease association studies, and it provides new insight into the evolutionary dynamics and ancestral origins of the HLA loci and their haplotypes.

780 citations
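Tag SNPs such as those described above are typically chosen so that every common variant is in strong linkage disequilibrium (LD) with at least one tag. The sketch below shows a generic greedy pairwise-r² tagging procedure, assuming a precomputed symmetric r² matrix and an r² threshold of 0.8. It illustrates the general idea only; it is not the selection method used in this study, and the matrix values are made up.

```python
from typing import List, Set

def greedy_tag_snps(r2: List[List[float]], threshold: float = 0.8) -> List[int]:
    """Greedily pick tag SNPs until every SNP has r^2 >= threshold with some tag.

    r2 is a symmetric matrix of pairwise r^2 values with 1.0 on the diagonal,
    so every SNP can always tag itself (requires threshold <= 1.0).
    """
    n = len(r2)
    untagged: Set[int] = set(range(n))
    tags: List[int] = []
    while untagged:
        # Pick the SNP that captures the most still-untagged SNPs; ties go to the lowest index.
        best = max(range(n),
                   key=lambda i: (sum(1 for j in untagged if r2[i][j] >= threshold), -i))
        tags.append(best)
        untagged -= {j for j in untagged if r2[best][j] >= threshold}
    return tags

# Hypothetical r^2 values for five SNPs, not data from the study.
r2 = [
    [1.0, 0.9, 0.85, 0.1, 0.1],
    [0.9, 1.0, 0.95, 0.1, 0.1],
    [0.85, 0.95, 1.0, 0.1, 0.1],
    [0.1, 0.1, 0.1, 1.0, 0.82],
    [0.1, 0.1, 0.1, 0.82, 1.0],
]
print(greedy_tag_snps(r2))
```

Raising the threshold shrinks each SNP's "bin", so more tags are needed; at a threshold above every off-diagonal value, each SNP ends up as its own tag.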


Journal Article
TL;DR: Inactivating truncating mutations of BRIP1, similar to those in BRCA2, cause Fanconi anemia in biallelic carriers and confer susceptibility to breast cancer in monoallelic carriers.
Abstract: We identified constitutional truncating mutations of the BRCA1-interacting helicase BRIP1 in 9/1,212 individuals with breast cancer from BRCA1/BRCA2 mutation-negative families but in only 2/2,081 controls (P = 0.0030), and we estimate that BRIP1 mutations confer a relative risk of breast cancer of 2.0 (95% confidence interval = 1.2-3.2, P = 0.012). Biallelic BRIP1 mutations were recently shown to cause Fanconi anemia complementation group J. Thus, inactivating truncating mutations of BRIP1, similar to those in BRCA2, cause Fanconi anemia in biallelic carriers and confer susceptibility to breast cancer in monoallelic carriers.

704 citations


Journal Article
TL;DR: A new method that uses SNP genotype data from parent-offspring trios to identify polymorphic deletions is reported, which will permit the identification of deletion polymorphisms in high-density SNP surveys of trio or other family data.
Abstract: Recent work has shown that copy number polymorphism is an important class of genetic variation in human genomes. Here we report a new method that uses SNP genotype data from parent-offspring trios to identify polymorphic deletions. We applied this method to data from the International HapMap Project to produce the first high-resolution population surveys of deletion polymorphism. Approximately 100 of these deletions have been experimentally validated using comparative genome hybridization on tiling-resolution oligonucleotide microarrays. Our analysis identifies a total of 586 distinct regions that harbor deletion polymorphisms in one or more of the families. Notably, we estimate that typical individuals are hemizygous for roughly 30-50 deletions larger than 5 kb, totaling around 550-750 kb of euchromatic sequence across their genomes. The detected deletions span a total of 267 known and predicted genes. Overall, however, the deleted regions are relatively gene-poor, consistent with the action of purifying selection against deletions. Deletion polymorphisms may well have an important role in the genetics of complex traits; however, they are not directly observed in most current gene mapping studies. Our new method will permit the identification of deletion polymorphisms in high-density SNP surveys of trio or other family data.
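A trio-based approach can exploit a simple signal. A parent who is hemizygous for a deletion is typed as homozygous for the remaining allele. A child who inherits the deletion is typed as homozygous for the other parent's allele. As a result, some SNPs in the region show genotype combinations that Mendelian inheritance cannot produce. The toy sketch below flags Mendelian-inconsistent SNPs and groups runs of consecutive flagged SNPs into candidate regions. It assumes biallelic genotypes coded as the count (0, 1 or 2) of one allele, and it is not the authors' statistical method; the minimum run length and the toy trio are hypothetical.

```python
from typing import List, Optional, Tuple

# Alleles (0 = A, 1 = B) a parent can transmit, by genotype coded as the count of B.
TRANSMITTABLE = {0: (0,), 1: (0, 1), 2: (1,)}

def is_mendelian(father: int, mother: int, child: int) -> bool:
    """True if the child's genotype can arise from one allele of each parent."""
    return child in {f + m for f in TRANSMITTABLE[father] for m in TRANSMITTABLE[mother]}

def candidate_deletions(trio: List[Tuple[int, int, int]], min_snps: int = 2) -> List[Tuple[int, int]]:
    """Return (first_index, last_index) runs of >= min_snps consecutive inconsistent SNPs."""
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, (f, m, c) in enumerate(trio):
        if not is_mendelian(f, m, c):
            if start is None:
                start = i  # open a new run of inconsistent SNPs
        else:
            if start is not None and i - start >= min_snps:
                runs.append((start, i - 1))
            start = None
    if start is not None and len(trio) - start >= min_snps:
        runs.append((start, len(trio) - 1))  # run reaching the last SNP
    return runs

# Hypothetical (father, mother, child) genotypes along a chromosome.
trio = [(1, 1, 1), (0, 2, 0), (0, 2, 0), (2, 0, 2), (1, 1, 0), (0, 2, 0), (2, 2, 2)]
print(candidate_deletions(trio))
```

Not every SNP inside a transmitted deletion produces an inconsistency (for example, where the non-carrier parent is heterozygous), so a simple run-based rule like this is only a rough illustration of the signal.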

Journal Article
TL;DR: The results demonstrate that ATM mutations that cause ataxia-telangiectasia in biallelic carriers are breast cancer susceptibility alleles in monoallelic carriers, with an estimated relative risk of breast cancer of 2.37.
Abstract: We screened individuals from 443 familial breast cancer pedigrees and 521 controls for ATM sequence variants and identified 12 mutations in affected individuals and two in controls (P = 0.0047). The results demonstrate that ATM mutations that cause ataxia-telangiectasia in biallelic carriers are breast cancer susceptibility alleles in monoallelic carriers, with an estimated relative risk of 2.37 (95% confidence interval (c.i.) = 1.51-3.78, P = 0.0003). There was no evidence that other classes of ATM variant confer a risk of breast cancer.

Journal Article
TL;DR: This work provides the sequences of the capsular biosynthetic genes of all 90 serotypes of Streptococcus pneumoniae and relates these to the known polysaccharide structures and patterns of immunological reactivity of typing sera, thereby providing the most complete understanding of the genetics and origins of bacterial polysaccharide diversity.
Abstract: Several major invasive bacterial pathogens are encapsulated. Expression of a polysaccharide capsule is essential for survival in the blood, and thus for virulence, but also is a target for host antibodies and the basis for effective vaccines. Encapsulated species typically exhibit antigenic variation and express one of a number of immunochemically distinct capsular polysaccharides that define serotypes. We provide the sequences of the capsular biosynthetic genes of all 90 serotypes of Streptococcus pneumoniae and relate these to the known polysaccharide structures and patterns of immunological reactivity of typing sera, thereby providing the most complete understanding of the genetics and origins of bacterial polysaccharide diversity, laying the foundations for molecular serotyping. This is the first time, to our knowledge, that a complete repertoire of capsular biosynthetic genes has been available, enabling a holistic analysis of a bacterial polysaccharide biosynthesis system. Remarkably, the total size of alternative coding DNA at this one locus exceeds 1.8 Mbp, almost equivalent to the entire S. pneumoniae chromosomal complement.

Journal Article
TL;DR: This snapshot annotation of Escherichia coli K-12 genes is based on the most recent genome sequences of two K-12 strains, whose availability allows comparison of their genotypes and the mutant status of their alleles.
Abstract: The goal of this group project has been to coordinate and bring up-to-date information on all genes of Escherichia coli K-12. Annotation of the genome of an organism entails identification of genes, the boundaries of genes in terms of precise start and end sites, and description of the gene products. Known and predicted functions were assigned to each gene product on the basis of experimental evidence or sequence analysis. Since both kinds of evidence are constantly expanding, no annotation is complete at any moment in time. This is a snapshot analysis based on the most recent genome sequences of two E.coli K-12 bacteria. An accurate and up-to-date description of E.coli K-12 genes is of particular importance to the scientific community because experimentally determined properties of its gene products provide fundamental information for annotation of innumerable genes of other organisms. Availability of the complete genome sequence of two K-12 strains allows comparison of their genotypes and mutant status of alleles.

Journal Article
TL;DR: The comprehensiveness of the GENCODE annotation was assessed by attempting to validate all the predicted exon boundaries outside the GENCODE annotation; only 40% of GENCODE exons are contained within the two sets, a reflection of the high number of annotated alternative splice forms with unique exons.
Abstract: Background: The GENCODE consortium was formed to identify and map all protein-coding genes within the ENCODE regions. This was achieved by a combination of initial manual annotation by the HAVANA team, experimental validation by the GENCODE consortium and a refinement of the annotation based on these experimental results.

Journal Article
20 Oct 2006-Cell
TL;DR: Otoferlin is essential for a late step of synaptic vesicle exocytosis and may act as the major Ca(2+) sensor triggering membrane fusion at the IHC ribbon synapse.

Journal Article
TL;DR: The genome of R. leguminosarum can be considered to have two main components: a 'core', which is higher in G+C, is mostly chromosomal, is shared with related organisms, and has a consistent phylogeny; and an 'accessory' component, which is sporadic in distribution, lower in G+C, and located on the plasmids and chromosomal islands.
Abstract: Rhizobium leguminosarum is an α-proteobacterial N2-fixing symbiont of legumes that has been the subject of more than a thousand publications. Genes for the symbiotic interaction with plants are well studied, but the adaptations that allow survival and growth in the soil environment are poorly understood. We have sequenced the genome of R. leguminosarum biovar viciae strain 3841. The 7.75 Mb genome comprises a circular chromosome and six circular plasmids, with 61% G+C overall. All three rRNA operons and 52 tRNA genes are on the chromosome; essential protein-encoding genes are largely chromosomal, but most functional classes occur on plasmids as well. Of the 7,263 protein-encoding genes, 2,056 had orthologs in each of three related genomes (Agrobacterium tumefaciens, Sinorhizobium meliloti, and Mesorhizobium loti), and these genes were over-represented in the chromosome and had above average G+C. Most supported the rRNA-based phylogeny, confirming A. tumefaciens to be the closest among these relatives, but 347 genes were incompatible with this phylogeny; these were scattered throughout the genome but were over-represented on the plasmids. An unexpectedly large number of genes were shared by all three rhizobia but were missing from A. tumefaciens. Overall, the genome can be considered to have two main components: a 'core', which is higher in G+C, is mostly chromosomal, is shared with related organisms, and has a consistent phylogeny; and an 'accessory' component, which is sporadic in distribution, lower in G+C, and located on the plasmids and chromosomal islands. The accessory genome has a different nucleotide composition from the core despite a long history of coexistence.
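The core/accessory distinction above rests partly on G+C content, with accessory regions lower in G+C than the genome-wide 61%. A minimal sketch of a common way to look for such compositional differences follows. It computes G+C content in non-overlapping windows and flags windows below a cutoff. The window size, cutoff and toy sequence are illustrative assumptions, not values or data from the study.

```python
from typing import List, Tuple

def gc_fraction(seq: str) -> float:
    """Fraction of G or C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0

def low_gc_windows(seq: str, window: int, cutoff: float) -> List[Tuple[int, float]]:
    """Return (window_start, gc) for non-overlapping windows with G+C below cutoff."""
    hits = []
    for start in range(0, len(seq) - window + 1, window):
        gc = gc_fraction(seq[start:start + window])
        if gc < cutoff:
            hits.append((start, gc))
    return hits

# Hypothetical toy sequence: a G+C-rich backbone with an A+T-rich insert.
genome = "GCGCGCATGC" * 3 + "ATATATGCAT" * 2 + "GGCCGCATGC"
print(round(gc_fraction(genome), 3), low_gc_windows(genome, window=10, cutoff=0.5))
```

In practice windows would be thousands of bases long and the cutoff set relative to the genome average, since very short windows give noisy G+C estimates.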

Journal Article
TL;DR: A class of highly connected 'hub' genes, all encoding chromatin regulators, is identified; it is proposed that these genes function as general buffers of genetic variation and may act as modifier genes in multiple, mechanistically unrelated genetic diseases in humans.
Abstract: Most heritable traits, including disease susceptibility, are affected by interactions between multiple genes. However, we understand little about how genes interact because very few possible genetic interactions have been explored experimentally. We have used RNA interference in Caenorhabditis elegans to systematically test ∼65,000 pairs of genes for their ability to interact genetically. We identify ∼350 genetic interactions between genes functioning in signaling pathways that are mutated in human diseases, including components of the EGF/Ras, Notch and Wnt pathways. Most notably, we identify a class of highly connected 'hub' genes: inactivation of these genes can enhance the phenotypic consequences of mutation of many different genes. These hub genes all encode chromatin regulators, and their activity as genetic hubs seems to be conserved across animals. We propose that these genes function as general buffers of genetic variation and that these hub genes may act as modifier genes in multiple, mechanistically unrelated genetic diseases in humans.

Journal Article
TL;DR: The HD-GYP domain regulatory protein RpfG of Xanthomonas campestris pv. campestris degrades the unusual nucleotide cyclic di-GMP, an activity associated with its HD-GYP domain.
Abstract: HD-GYP is a protein domain of unknown biochemical function implicated in bacterial signaling and regulation. In the plant pathogen Xanthomonas campestris pv. campestris, the synthesis of virulence factors and dispersal of biofilms are positively controlled by a two-component signal transduction system comprising the HD-GYP domain regulatory protein RpfG and cognate sensor RpfC and by cell–cell signaling mediated by the diffusible signal molecule DSF (diffusible signal factor). The RpfG/RpfC two-component system has been implicated in DSF perception and signal transduction. Here we show that the role of RpfG is to degrade the unusual nucleotide cyclic di-GMP, an activity associated with the HD-GYP domain. Mutation of the conserved H and D residues of the isolated HD-GYP domain resulted in loss of both the enzymatic activity against cyclic di-GMP and the regulatory activity in virulence factor synthesis. Two other protein domains, GGDEF and EAL, are already implicated in the synthesis and degradation respectively of cyclic di-GMP. As with GGDEF and EAL domains, the HD-GYP domain is widely distributed in free-living bacteria and occurs in plant and animal pathogens, as well as beneficial symbionts and organisms associated with a range of environmental niches. Identification of the role of the HD-GYP domain thus increases our understanding of a signaling network whose importance to the lifestyle of diverse bacteria is now emerging.

Journal Article
TL;DR: This Commentary is an invitation to an open discussion, started among various users of RNAi, to set forth accepted standards that would ensure the quality and accuracy of information in the large datasets coming out of genome-scale screens.
Abstract: Large-scale RNA interference (RNAi)-based analyses, very much like other ‘omic’ approaches, have inherent rates of false positives and negatives. The variability in the standards of care applied to validate results from these studies, if left unchecked, could eventually begin to undermine the credibility of RNAi as a powerful functional approach. This Commentary is an invitation to an open discussion started among various users of RNAi to set forth accepted standards that would ensure the quality and accuracy of information in the large datasets coming out of genome-scale screens.

Journal Article
TL;DR: These synapse proteome data sets offer a basis for future research in synaptic biology and will provide useful information in brain disease and mental disorder studies.
Abstract: Characterization of the composition of the postsynaptic proteome (PSP) provides a framework for understanding the overall organization and function of the synapse in normal and pathological conditions. We have identified 698 proteins from the postsynaptic terminal of mouse CNS synapses using a series of purification strategies and analysis by liquid chromatography tandem mass spectrometry and large-scale immunoblotting. Some 620 proteins were found in purified postsynaptic densities (PSDs), nine in AMPA-receptor immuno-purifications, 100 in isolates using an antibody against the NMDA receptor subunit NR1, and 170 by peptide-affinity purification of complexes with the C-terminus of NR2B. Together, the NR1 and NR2B complexes contain 186 proteins, collectively referred to as membrane-associated guanylate kinase-associated signalling complexes. We extracted data from six other synapse proteome experiments and combined these with our data to provide a consensus on the composition of the PSP. In total, 1124 proteins are present in the PSP, of which 466 were validated by their detection in two or more studies, forming what we have designated the Consensus PSD. These synapse proteome data sets offer a basis for future research in synaptic biology and will provide useful information in brain disease and mental disorder studies.
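The Consensus PSD rule described above keeps proteins detected in two or more studies. This amounts to counting how many data sets each protein appears in. A minimal sketch follows, assuming each study is represented as a set of protein identifiers. The study names and identifiers are hypothetical placeholders, not the actual data sets.

```python
from collections import Counter
from typing import Dict, Set

def consensus(studies: Dict[str, Set[str]], min_studies: int = 2) -> Set[str]:
    """Proteins detected in at least `min_studies` of the given studies."""
    counts = Counter(p for proteins in studies.values() for p in proteins)
    return {p for p, n in counts.items() if n >= min_studies}

# Hypothetical placeholder data, not the synapse proteome data sets.
studies = {
    "study_A": {"P1", "P2", "P3", "P4"},
    "study_B": {"P1", "P2", "P5"},
    "study_C": {"P1", "P3"},
}
union = set().union(*studies.values())   # analogous to "all proteins present in the PSP"
core = consensus(studies)                # analogous to the consensus set
print(len(union), sorted(core))
```

Because each study's list is a set, a protein counts at most once per study, so repeated detections within a single study do not inflate its support.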

Journal Article
TL;DR: The full yeast protein-protein interaction network is estimated to contain 37,800-75,500 interactions and the human network 154,000-369,000, but owing to a high false-positive rate, current maps are roughly only 50% and 10% complete, respectively.
Abstract: We estimate the full yeast protein-protein interaction network to contain 37,800-75,500 interactions and the human network 154,000-369,000, but owing to a high false-positive rate, current maps are roughly only 50% and 10% complete, respectively. Paradoxically, releasing raw, unfiltered assay data might help separate true from false interactions.

Journal Article
TL;DR: Identification of those cancer genes mutated in the NCI-60, in combination with pharmacologic and molecular profiles of the cells, will allow for more informed interpretation of anticancer agent screening and will enhance the use of the NCi-60 cell lines for molecularly targeted screens.
Abstract: The panel of 60 human cancer cell lines (the NCI-60) assembled by the National Cancer Institute for anticancer drug discovery is a widely used resource. The NCI-60 has been characterized pharmacologically and at the molecular level more extensively than any other set of cell lines. However, no systematic mutation analysis of genes causally implicated in oncogenesis has been reported. This study reports the sequence analysis of 24 known cancer genes in the NCI-60 and an assessment of 4 of the 24 genes for homozygous deletions. One hundred thirty-seven oncogenic mutations were identified in 14 (APC, BRAF, CDKN2, CTNNB1, HRAS, KRAS, NRAS, SMAD4, PIK3CA, PTEN, RB1, STK11, TP53, and VHL) of the 24 genes. All lines have at least one mutation among the cancer genes examined, with most lines (73%) having more than one. Identification of those cancer genes mutated in the NCI-60, in combination with pharmacologic and molecular profiles of the cells, will allow for more informed interpretation of anticancer agent screening and will enhance the use of the NCI-60 cell lines for molecularly targeted screens.

Journal Article
TL;DR: The evidence suggests that when MSH6 is inactivated in gliomas, alkylating agents convert from induction of tumor cell death to promotion of neoplastic progression, and the potential of large scale sequencing for revealing and elucidating mutagenic processes operative in individual human cancers is highlighted.
Abstract: Malignant gliomas have a very poor prognosis. The current standard of care for these cancers consists of extended adjuvant treatment with the alkylating agent temozolomide after surgical resection and radiotherapy. Although a statistically significant increase in survival has been reported with this regimen, nearly all gliomas recur and become insensitive to further treatment with this class of agents. We sequenced 500 kb of genomic DNA corresponding to the kinase domains of 518 protein kinases in each of nine gliomas. Large numbers of somatic mutations were observed in two gliomas recurrent after alkylating agent treatment. The pattern of mutations in these cases showed strong similarity to that induced by alkylating agents in experimental systems. Further investigation revealed inactivating somatic mutations of the mismatch repair gene MSH6 in each case. We propose that inactivating somatic mutations of MSH6 confer resistance to alkylating agents in gliomas in vivo and concurrently unleash accelerated mutagenesis in resistant clones as a consequence of continued exposure to alkylating agents in the presence of defective mismatch repair. The evidence therefore suggests that when MSH6 is inactivated in gliomas, alkylating agents convert from induction of tumor cell death to promotion of neoplastic progression. These observations highlight the potential of large scale sequencing for revealing and elucidating mutagenic processes operative in individual human cancers.

Journal Article
TL;DR: It is demonstrated that a simple time lag model provides a general, parsimonious explanation of the extensive variation in the dN/dS ratio seen when comparing closely related bacterial genomes, and a role for hitch-hiking in the accumulation of non-synonymous mutations is suggested.

Journal Article
19 Oct 2006-Neuron
TL;DR: The study identifies a PSD-MAGUK-specific regulation of AMPA-R synaptic expression that establishes and maintains glutamatergic synaptic transmission in the mammalian central nervous system.

Journal Article
TL;DR: It is shown that the IVOMs outperform state-of-the-art low and high order motif methods predicting not only the already characterized Salmonella Pathogenicity Islands but also three novel SPIs and other HGT events.
Abstract: Motivation: There is a growing literature on the detection of Horizontal Gene Transfer (HGT) events by means of parametric, non-comparative methods. Such approaches rely only on sequence information and utilize different low and high order indices to capture compositional deviation from the genome backbone; the superiority of the latter over the former has been shown elsewhere. However, even high order k-mers may be poor estimators of HGT when insufficient information is available, e.g. in short sliding windows. Most of the current HGT prediction methods require pre-existing annotation, which may restrict their application to newly sequenced genomes. Results: We introduce a novel computational method, Interpolated Variable Order Motifs (IVOMs), which exploits compositional biases using variable order motif distributions and captures the local composition of a sequence more reliably than fixed-order methods. For optimal localization of the boundaries of each predicted region, a second order, two-state hidden Markov model (HMM) is implemented in a change-point detection framework. We applied the IVOM approach to the genome of Salmonella enterica serovar Typhi CT18, a well-studied prokaryote in terms of HGT events, and we show that the IVOMs outperform state-of-the-art low and high order motif methods, predicting not only the already characterized Salmonella Pathogenicity Islands (SPI-1 to SPI-10) but also three novel SPIs (SPI-15, SPI-16, SPI-17) and other HGT events. Availability: The software is available under a GPL license as a standalone application at http://www.sanger.ac.uk/Software/analysis/alien_hunter Supplementary Information: Supplementary data are available at Bioinformatics online.
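The abstract contrasts fixed-order compositional indices with the variable-order IVOM approach. The sketch below implements only the simpler fixed-order baseline. It scores a window by the relative entropy (Kullback-Leibler divergence) of its k-mer distribution from the whole-genome distribution, using add-one pseudocounts. It is not IVOM and not the alien_hunter code; k, the pseudocount and the toy sequences are assumptions.

```python
import math
from collections import Counter
from itertools import product
from typing import Dict

def kmer_distribution(seq: str, k: int, pseudocount: float = 1.0) -> Dict[str, float]:
    """Pseudocount-smoothed frequency of every possible A/C/G/T k-mer in seq."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # k-mers containing other symbols (e.g. N) are left out of the total.
    total = sum(counts[km] for km in kmers) + pseudocount * len(kmers)
    return {km: (counts[km] + pseudocount) / total for km in kmers}

def relative_entropy(window: str, genome: str, k: int = 2) -> float:
    """KL divergence D(window || genome) between smoothed k-mer distributions, in bits."""
    p = kmer_distribution(window, k)
    q = kmer_distribution(genome, k)
    return sum(p[km] * math.log2(p[km] / q[km]) for km in p)

# Hypothetical toy sequences.
genome = "ACGT" * 50      # backbone composition
native = "ACGT" * 5       # short window resembling the backbone
foreign = "AT" * 10       # short window with atypical composition
print(round(relative_entropy(native, genome), 3), round(relative_entropy(foreign, genome), 3))
```

The atypical window scores higher, but even the native-like window gets a clearly non-zero score. Most of that comes from pseudocounts spreading probability onto k-mers the short window never contains, which illustrates the short-window weakness of fixed-order indices noted in the abstract.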

Journal Article
TL;DR: In each trio, the parent of origin of the deleted chromosome 17 carries at least one H2 chromosome; this region of 17q21.3 shows complex genomic architecture with well-described low-copy repeats (LCRs).
Abstract: Recently, the application of array-based comparative genomic hybridization (array CGH) has improved rates of detection of chromosomal imbalances in individuals with mental retardation and dysmorphic features. Here, we describe three individuals with learning disability and a heterozygous deletion at chromosome 17q21.3, detected in each case by array CGH. FISH analysis demonstrated that the deletions occurred as de novo events in each individual and were between 500 kb and 650 kb in size. A recently described 900-kb inversion that suppresses recombination between ancestral H1 and H2 haplotypes encompasses the deletion. We show that, in each trio, the parent of origin of the deleted chromosome 17 carries at least one H2 chromosome. This region of 17q21.3 shows complex genomic architecture with well-described low-copy repeats (LCRs). The orientation of LCRs flanking the deleted segment in inversion heterozygotes is likely to facilitate the generation of this microdeletion by means of non-allelic homologous recombination.

Journal Article
TL;DR: The results presented here contribute to the value of ongoing large-scale annotation projects and should guide further experimental methods when being scaled up to the entire human genome sequence.
Abstract: Background: We present the results of EGASP, a community experiment to assess the state of the art in genome annotation within the ENCODE regions, which span 1% of the human genome sequence. The experiment had two major goals: the assessment of the accuracy of computational methods to predict protein coding genes; and the overall assessment of the completeness of the current human genome annotations as represented in the ENCODE regions. For the computational prediction assessment, eighteen groups contributed gene predictions. We evaluated these submissions against each other based on a ‘reference set’ of annotations generated as part of the GENCODE project. These annotations were not available to the prediction groups prior to the submission deadline, so that their predictions were blind and an external advisory committee could perform a fair assessment. Results: The best methods had at least one gene transcript correctly predicted for close to 70% of the annotated genes. Nevertheless, the multiple transcript accuracy, taking into account alternative splicing, reached only approximately 40% to 50% accuracy. At the coding nucleotide level, the best programs reached an accuracy of 90% in both sensitivity and specificity. Programs relying on mRNA and protein sequences were the most accurate in reproducing the manually curated annotations. Experimental validation shows that only a very small percentage (3.2%) of
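The nucleotide-level sensitivity and specificity figures in the EGASP abstract are sketched below under the convention common in gene-prediction evaluation, assumed here to apply. Under that convention, sensitivity is TP/(TP+FN) and "specificity" is TP/(TP+FP), the fraction of predicted coding bases that are annotated as coding. The sketch assumes reference and predicted coding regions on one sequence are given as half-open intervals; the coordinates are made up.

```python
from typing import List, Set, Tuple

def coding_bases(intervals: List[Tuple[int, int]]) -> Set[int]:
    """Expand half-open [start, end) coding intervals into a set of positions."""
    return {pos for start, end in intervals for pos in range(start, end)}

def nucleotide_sn_sp(reference: List[Tuple[int, int]],
                     predicted: List[Tuple[int, int]]) -> Tuple[float, float]:
    """Nucleotide-level (sensitivity, specificity) of predicted vs reference coding bases."""
    ref, pred = coding_bases(reference), coding_bases(predicted)
    tp = len(ref & pred)
    sensitivity = tp / len(ref) if ref else 0.0    # TP / (TP + FN)
    specificity = tp / len(pred) if pred else 0.0  # TP / (TP + FP)
    return sensitivity, specificity

# Hypothetical coordinates, not EGASP data.
reference = [(100, 200), (300, 400)]  # 200 annotated coding bases
predicted = [(120, 200), (300, 450)]  # 230 predicted coding bases
sn, sp = nucleotide_sn_sp(reference, predicted)
print(round(sn, 3), round(sp, 3))
```

Expanding intervals into per-base sets keeps the example short but is memory-hungry; for whole chromosomes an interval-overlap sweep would be the more practical choice.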

Journal Article
TL;DR: The results indicate that RNA editing increases the diversity of miRNAs and their targets, and hence may modulate miRNA function.
Abstract: Background: MicroRNAs (miRNAs) are short RNAs of around 22 nucleotides that regulate gene expression. The primary transcripts of miRNAs contain double-stranded RNA and are therefore potential substrates for adenosine to inosine (A-to-I) RNA editing. Results: We have conducted a survey of RNA editing of miRNAs from ten human tissues by sequence comparison of PCR products derived from matched genomic DNA and total cDNA from the same individual. Six out of 99 (6%) miRNA transcripts from which data were obtained were subject to A-to-I editing in at least one tissue. Four out of seven edited adenosines were in the mature miRNA and were predicted to change the target sites in 3' untranslated regions. For a further six miRNAs, we identified A-to-I editing of transcripts derived from the opposite strand of the genome to the annotated miRNA. These miRNAs may have been annotated to the wrong genomic strand. Conclusion: Our results indicate that RNA editing increases the diversity of miRNAs and their targets, and hence may modulate miRNA function.
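The editing survey above compares sequences from matched genomic DNA and cDNA. Inosine is read as guanosine during reverse transcription and sequencing, so an A-to-I edit shows up as an A in the genomic sequence and a G in the cDNA. On the opposite strand, the same event appears as T versus C. A minimal sketch of that comparison follows, assuming the two sequences are already aligned, of equal length and on the same strand. Real sequencing traces also show mixed A/G signals from partial editing, which this toy ignores. The function name and sequences are hypothetical.

```python
from typing import List

def a_to_i_sites(genomic: str, cdna: str, antisense: bool = False) -> List[int]:
    """0-based positions where a genomic A reads as G in the cDNA (A-to-I editing).

    With antisense=True, report T->C differences instead, i.e. A-to-I editing of a
    transcript from the opposite strand, viewed on the strand given here.
    """
    if len(genomic) != len(cdna):
        raise ValueError("sequences must be aligned to the same length")
    ref, alt = ("T", "C") if antisense else ("A", "G")
    return [i for i, (g, c) in enumerate(zip(genomic.upper(), cdna.upper()))
            if g == ref and c == alt]

# Hypothetical aligned sequences.
print(a_to_i_sites("ACGTAAGT", "GCGTAGGT"))
```

Checking both orientations matters here because, as the abstract notes, some edited transcripts came from the strand opposite the annotated miRNA.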