Author

Nadia Chuzhanova

Bio: Nadia Chuzhanova is an academic researcher at Nottingham Trent University. The author has contributed to research on the topics Gene and Mutation, has an h-index of 40, and has co-authored 101 publications receiving 6808 citations. Previous affiliations of Nadia Chuzhanova include the Russian Academy of Sciences and the University of Central Lancashire.


Papers
Journal Article
TL;DR: A method for detecting chimeras and other anomalies in 16S rRNA sequence records is implemented as a program with a simple-to-use graphical user interface that runs on a range of computer platforms; the authors conclude that, as a conservative estimate, 1 in every 20 public database records is likely to be corrupt.
Abstract: A new method for detecting chimeras and other anomalies within 16S rRNA sequence records is presented. Using this method, we screened 1,399 sequences from 19 phyla, as defined by the Ribosomal Database Project, release 9, update 22, and found 5.0% to harbor substantial errors. Of these, 64.3% were obvious chimeras, 14.3% were unidentified sequencing errors, and 21.4% were highly degenerate. In all, 11 phyla contained obvious chimeras, accounting for 0.8 to 11% of the records for these phyla. Many chimeras (43.1%) were formed from parental sequences belonging to different phyla. While most comprised two fragments, 13.7% were composed of at least three fragments, often from three different sources. A separate analysis of the Bacteroidetes phylum (2,739 sequences) also revealed 5.8% records to be anomalous, of which 65.4% were apparently chimeric. Overall, we conclude that, as a conservative estimate, 1 in every 20 public database records is likely to be corrupt. Our results support concerns recently expressed over the quality of the public repositories. With 16S rRNA sequence data increasingly playing a dominant role in bacterial systematics and environmental biodiversity studies, it is vital that steps be taken to improve screening of sequences prior to submission. To this end, we have implemented our method as a program with a simple-to-use graphic user interface that is capable of running on a range of computer platforms. The program is called Pintail, is released under the terms of the GNU General Public License open source license, and is freely available from our website at http://www.cardiff.ac.uk/biosi/research/biosoft/.

802 citations

Journal Article
TL;DR: A new computer program, called Mallard, is presented for screening entire 16S rRNA gene libraries of up to 1,000 sequences for chimeras and other artifacts; the anomaly rates it reveals far exceed previous estimates of artifacts within public repositories and highlight the urgent need for all researchers to adequately screen their libraries prior to submission.
Abstract: A new computer program, called Mallard, is presented for screening entire 16S rRNA gene libraries of up to 1,000 sequences for chimeras and other artifacts. Written in the Java computer language and capable of running on all major operating systems, the program provides a novel graphical approach for visualizing phylogenetic relationships among 16S rRNA gene sequences. To illustrate its use, we analyzed most of the large libraries of cloned bacterial 16S rRNA gene sequences submitted to the public repository during 2005. Defining a large library as one containing 100 or more sequences of 1,200 bases or greater, we screened 25 of the 28 libraries and found that all but three contained substantial anomalies. Overall, 543 anomalous sequences were found. The average anomaly content per clone library was 9.0%, 4% higher than that previously estimated for the public repository overall. In addition, 90.8% of anomalies had characteristic chimeric patterns, a rise of 25.4% over that found previously. One library alone was found to contain 54 chimeras, representing 45.8% of its content. These figures far exceed previous estimates of artifacts within public repositories and further highlight the urgent need for all researchers to adequately screen their libraries prior to submission. Mallard is freely available from our website at http://www.cardiff.ac.uk/biosi/research/biosoft/.

711 citations

Journal Article
TL;DR: Current thinking about how gene conversion occurs is assessed, the key part it has played in fashioning extant human genes is explored, and a meta-analysis of gene-conversion events that are known to have caused human genetic disease is carried out.
Abstract: Gene conversion, one of the two mechanisms of homologous recombination, involves the unidirectional transfer of genetic material from a 'donor' sequence to a highly homologous 'acceptor'. Considerable progress has been made in understanding the molecular mechanisms that underlie gene conversion, its formative role in human genome evolution and its implications for human inherited disease. Here we assess current thinking about how gene conversion occurs, explore the key part it has played in fashioning extant human genes, and carry out a meta-analysis of gene-conversion events that are known to have caused human genetic disease.

609 citations

Journal Article
TL;DR: The proportion of disease‐causing nonsense mutations predicted to elicit nonsense‐mediated mRNA decay (NMD) is significantly higher than among nonobserved (potential) nonsense mutations, implying that nonsense mutations that elicit NMD are more likely to come to clinical attention.
Abstract: Nonsense mutations account for ∼11% of all described gene lesions causing human inherited disease and ∼20% of disease-associated single-basepair substitutions affecting gene coding regions. Pathological nonsense mutations resulting in TGA (38.5%), TAG (40.4%), and TAA (21.1%) occur in different proportions to naturally occurring stop codons. Of the 23 different nucleotide substitutions giving rise to nonsense mutations, the most frequent are CGA → TGA (21%; resulting from methylation-mediated deamination) and CAG → TAG (19%). The differing nonsense mutation frequencies are largely explicable in terms of variable nucleotide substitution rates such that it is unnecessary to invoke differential translational termination efficiency or differential codon usage. Some genes are characterized by numerous nonsense mutations but relatively few if any missense mutations (e.g., CHM) whereas other genes exhibit many missense mutations but few if any nonsense mutations (e.g., PSEN1). Genes in the latter category have a tendency to encode proteins characterized by multimer formation. Consistent with the operation of a clinical selection bias, genes exhibiting an excess of nonsense mutations are also likely to display an excess of frameshift mutations. Tumor suppressor (TS) genes exhibit a disproportionate number of nonsense mutations while most mutations in oncogenes are missense. A total of 12% of somatic nonsense mutations in TS genes were found to occur recurrently in the hypermutable CpG dinucleotide. In a comparison of somatic and germline mutational spectra for 17 TS genes, ∼43% of somatic nonsense mutations had counterparts in the germline (rising to 98% for CpG mutations). 
Finally, the proportion of disease-causing nonsense mutations predicted to elicit nonsense-mediated mRNA decay (NMD) is significantly higher (P = 1.56 × 10⁻⁹) than among nonobserved (potential) nonsense mutations, implying that nonsense mutations that elicit NMD are more likely to come to clinical attention.
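The figure of 23 different nucleotide substitutions quoted in the abstract can be reproduced by enumeration. The sketch below is an illustration, not the authors' code: it assumes the standard genetic code with stop codons TAA, TAG, and TGA, and the function name `nonsense_substitutions` is hypothetical. It lists every single-base change that turns a sense codon into a stop codon.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
# Assumption: standard genetic code, written as DNA codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def nonsense_substitutions():
    """Return (sense_codon, stop_codon) pairs that differ at exactly one base."""
    pairs = []
    for codon in ("".join(p) for p in product(BASES, repeat=3)):
        if codon in STOP_CODONS:
            continue  # a stop-to-stop change is not a nonsense mutation
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue  # not a substitution
                mutant = codon[:pos] + base + codon[pos + 1:]
                if mutant in STOP_CODONS:
                    pairs.append((codon, mutant))
    return pairs


pairs = nonsense_substitutions()
per_stop = Counter(stop for _, stop in pairs)
print(len(pairs))      # total number of distinct nonsense substitutions
print(dict(per_stop))  # how many lead to each stop codon
```

The two most frequent substitutions named in the abstract, CGA → TGA and CAG → TAG, both appear in the enumerated list. TAA is reachable from one fewer sense codon than TAG or TGA because two of its single-base neighbours (TAG and TGA) are themselves stop codons.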

331 citations

Journal Article
TL;DR: These data represent results from the first study to correlate a specific small mutation of the NF1 gene to the expression of a particular clinical phenotype, and the biological mechanism that relates this specific mutation to the suppression of cutaneous neurofibroma development is unknown.
Abstract: Neurofibromatosis type 1 (NF1) is characterized by café-au-lait spots, skinfold freckling, and cutaneous neurofibromas. No obvious relationships between small mutations (<20 bp) of the NF1 gene and a specific phenotype have previously been demonstrated, which suggests that interaction with either unlinked modifying genes and/or the normal NF1 allele may be involved in the development of the particular clinical features associated with NF1. We identified 21 unrelated probands with NF1 (14 familial and 7 sporadic cases) who were all found to have the same c.2970-2972 delAAT (p.990delM) mutation but no cutaneous neurofibromas or clinically obvious plexiform neurofibromas. Molecular analysis identified the same 3-bp inframe deletion (c.2970-2972 delAAT) in exon 17 of the NF1 gene in all affected subjects. The ΔAAT mutation is predicted to result in the loss of one of two adjacent methionines (codon 991 or 992) (ΔMet991), in conjunction with silent ACA→ACG change of codon 990. These two methionine residues are located in a highly conserved region of neurofibromin and are expected, therefore, to have a functional role in the protein. Our data represent results from the first study to correlate a specific small mutation of the NF1 gene to the expression of a particular clinical phenotype. The biological mechanism that relates this specific mutation to the suppression of cutaneous neurofibroma development is unknown.

308 citations


Cited by
Journal Article
TL;DR: UCHIME has better sensitivity than ChimeraSlayer (previously the most sensitive database method), especially with short, noisy sequences, and in testing on artificial bacterial communities with known composition, UCHIME de novo sensitivity is shown to be comparable to Perseus.
Abstract: Motivation: Chimeric DNA sequences often form during polymerase chain reaction amplification, especially when sequencing single regions (e.g. 16S rRNA or fungal Internal Transcribed Spacer) to assess diversity or compare populations. Undetected chimeras may be misinterpreted as novel species, causing inflated estimates of diversity and spurious inferences of differences between populations. Detection and removal of chimeras is therefore of critical importance in such experiments. Results: We describe UCHIME, a new program that detects chimeric sequences with two or more segments. UCHIME either uses a database of chimera-free sequences or detects chimeras de novo by exploiting abundance data. UCHIME has better sensitivity than ChimeraSlayer (previously the most sensitive database method), especially with short, noisy sequences. In testing on artificial bacterial communities with known composition, UCHIME de novo sensitivity is shown to be comparable to Perseus. UCHIME is >100× faster than Perseus and >1000× faster than ChimeraSlayer. Availability: Source, binaries and data: http://drive5.com/uchime. Supplementary information: Supplementary data are available at Bioinformatics online.

11,904 citations

Christopher M. Bishop
01 Jan 2006
TL;DR: This work covers probability distributions, linear models for regression and classification, neural networks, kernel methods, graphical models, mixture models and EM, approximate inference, sampling methods, and combining models.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

10,141 citations

Journal Article
TL;DR: A 16S rRNA gene database (http://greengenes.lbl.gov) provides chimera screening, standard alignment, and taxonomic classification using multiple published taxonomies.
Abstract: A 16S rRNA gene database (http://greengenes.lbl.gov) addresses limitations of public repositories by providing chimera screening, standard alignment, and taxonomic classification using multiple published taxonomies. It was found that there is incongruent taxonomic nomenclature among curators even at the phylum level. Putative chimeras were identified in 3% of environmental sequences and in 0.2% of records derived from isolates. Environmental sequences were classified into 100 phylum-level lineages in the Archaea and Bacteria.

9,593 citations

Journal Article
TL;DR: SILVA (from Latin silva, forest), was implemented to provide a central comprehensive web resource for up to date, quality controlled databases of aligned rRNA sequences from the Bacteria, Archaea and Eukarya domains.
Abstract: Sequencing ribosomal RNA (rRNA) genes is currently the method of choice for phylogenetic reconstruction, nucleic acid based detection and quantification of microbial diversity. The ARB software suite with its corresponding rRNA datasets has been accepted by researchers worldwide as a standard tool for large scale rRNA analysis. However, the rapid increase of publicly available rRNA sequence data has recently hampered the maintenance of comprehensive and curated rRNA knowledge databases. A new system, SILVA (from Latin silva, forest), was implemented to provide a central comprehensive web resource for up to date, quality controlled databases of aligned rRNA sequences from the Bacteria, Archaea and Eukarya domains. All sequences are checked for anomalies, carry a rich set of sequence associated contextual information, have multiple taxonomic classifications, and the latest validly described nomenclature. Furthermore, two precompiled sequence datasets compatible with ARB are offered for download on the SILVA website: (i) the reference (Ref) datasets, comprising only high quality, nearly full length sequences suitable for in-depth phylogenetic analysis and probe design and (ii) the comprehensive Parc datasets with all publicly available rRNA sequences longer than 300 nucleotides suitable for biodiversity analyses. The latest publicly available database release 91 (August 2007) hosts 547 521 sequences split into 461 823 small subunit and 85 689 large subunit rRNAs.

5,733 citations

Journal Article
TL;DR: A basic taxonomy of feature selection techniques is provided, and their use, variety, and potential in a number of common and upcoming bioinformatics applications are discussed.
Abstract: Feature selection techniques have become an apparent need in many bioinformatics applications. In addition to the large pool of techniques that have already been developed in the machine learning and data mining fields, specific applications in bioinformatics have led to a wealth of newly proposed techniques. In this article, we make the interested reader aware of the possibilities of feature selection, providing a basic taxonomy of feature selection techniques, and discussing their use, variety and potential in a number of both common as well as upcoming bioinformatics applications. Contact: yvan.saeys@psb.ugent.be Supplementary information: http://bioinformatics.psb.ugent.be/supplementary_data/yvsae/fsreview

4,706 citations