scispace - formally typeset
Search or ask a question
Author

Natasha G. Caminsky

Bio: Natasha G. Caminsky is an academic researcher from University of Western Ontario. The author has contributed to research in topics: Medicine & Gene. The author has an hindex of 5, co-authored 8 publications receiving 160 citations.

Papers
More filters
Journal ArticleDOI
TL;DR: A review of nearly 20 years of research on the applications of information theory to interpret coding and non-coding mutations that alter mRNA splicing in rare and common diseases offers guidelines for optimal use of IT software for interpretation of mRNA splice mutations.
Abstract: The interpretation of genomic variants has become one of the paramount challenges in the post-genome sequencing era. In this review we summarize nearly 20 years of research on the applications of information theory (IT) to interpret coding and non-coding mutations that alter mRNA splicing in rare and common diseases. We compile and summarize the spectrum of published variants analyzed by IT, to provide a broad perspective of the distribution of deleterious natural and cryptic splice site variants detected, as well as those affecting splicing regulatory sequences. Results for natural splice site mutations can be interrogated dynamically with Splicing Mutation Calculator, a companion software program that computes changes in information content for any splice site substitution, linked to corresponding publications containing these mutations. The accuracy of IT-based analysis was assessed in the context of experimentally validated mutations. Because splice site information quantifies binding affinity, IT-based analyses can discern the differences between variants that account for the observed reduced (leaky) versus abolished mRNA splicing. We extend this principle by comparing predicted mutations in natural, cryptic, and regulatory splice sites with observed deleterious phenotypic and benign effects. Our analysis of 1727 variants revealed a number of general principles useful for ensuring portability of these analyses and accurate input and interpretation of mutations. We offer guidelines for optimal use of IT software for interpretation of mRNA splicing mutations.

86 citations

Journal ArticleDOI
TL;DR: Information theory (IT) is applied to predict and prioritize noncoding variants of uncertain significance in regulatory, coding, and intronic regions based on changes in binding sites in these genes.
Abstract: BRCA1 and BRCA2 testing for hereditary breast and ovarian cancer (HBOC) does not identify all pathogenic variants. Sequencing of 20 complete genes in HBOC patients with uninformative test results (N = 287), including noncoding and flanking sequences of ATM, BARD1, BRCA1, BRCA2, CDH1, CHEK2, EPCAM, MLH1, MRE11A, MSH2, MSH6, MUTYH, NBN, PALB2, PMS2, PTEN, RAD51B, STK11, TP53, and XRCC2, identified 38,372 unique variants. We apply information theory (IT) to predict and prioritize noncoding variants of uncertain significance in regulatory, coding, and intronic regions based on changes in binding sites in these genes. Besides mRNA splicing, IT provides a common framework to evaluate potential affinity changes in transcription factor (TFBSs), splicing regulatory (SRBSs), and RNA-binding protein (RBBSs) binding sites following mutation. We prioritized variants affecting the strengths of 10 splice sites (four natural, six cryptic), 148 SRBS, 36 TFBS, and 31 RBBS. Three variants were also prioritized based on their predicted effects on mRNA secondary (2°) structure and 17 for pseudoexon activation. Additionally, four frameshift, two in-frame deletions, and five stop-gain mutations were identified. When combined with pedigree information, complete gene sequence analysis can focus attention on a limited set of variants in a wide spectrum of functional mutation types for downstream functional and co-segregation analysis.

39 citations

Journal ArticleDOI
TL;DR: A strategy for complete gene sequence analysis followed by a unified framework for interpreting non-coding variants that may affect gene expression is presented and large numbers of variants detected by NGS are distilled to a limited set of variants prioritized as potential deleterious changes.
Abstract: Sequencing of both healthy and disease singletons yields many novel and low frequency variants of uncertain significance (VUS). Complete gene and genome sequencing by next generation sequencing (NGS) significantly increases the number of VUS detected. While prior studies have emphasized protein coding variants, non-coding sequence variants have also been proven to significantly contribute to high penetrance disorders, such as hereditary breast and ovarian cancer (HBOC). We present a strategy for analyzing different functional classes of non-coding variants based on information theory (IT) and prioritizing patients with large intragenic deletions. We captured and enriched for coding and non-coding variants in genes known to harbor mutations that increase HBOC risk. Custom oligonucleotide baits spanning the complete coding, non-coding, and intergenic regions 10 kb up- and downstream of ATM, BRCA1, BRCA2, CDH1, CHEK2, PALB2, and TP53 were synthesized for solution hybridization enrichment. Unique and divergent repetitive sequences were sequenced in 102 high-risk, anonymized patients without identified mutations in BRCA1/2. Aside from protein coding and copy number changes, IT-based sequence analysis was used to identify and prioritize pathogenic non-coding variants that occurred within sequence elements predicted to be recognized by proteins or protein complexes involved in mRNA splicing, transcription, and untranslated region (UTR) binding and structure. This approach was supplemented by in silico and laboratory analysis of UTR structure. 15,311 unique variants were identified, of which 245 occurred in coding regions. With the unified IT-framework, 132 variants were identified and 87 functionally significant VUS were further prioritized. An intragenic 32.1 kb interval in BRCA2 that was likely hemizygous was detected in one patient. We also identified 4 stop-gain variants and 3 reading-frame altering exonic insertions/deletions (indels). We have presented a strategy for complete gene sequence analysis followed by a unified framework for interpreting non-coding variants that may affect gene expression. This approach distills large numbers of variants detected by NGS to a limited set of variants prioritized as potential deleterious changes.

25 citations

Journal ArticleDOI
TL;DR: A prototype software system with sufficient capacity and speed to estimate radiation exposures in a mass casualty event by counting dicentric chromosomes (DCs) in metaphase cells from many individuals is presented.
Abstract: We present a prototype software system with sufficient capacity and speed to estimate radiation exposures in a mass casualty event by counting dicentric chromosomes (DCs) in metaphase cells from many individuals. Top-ranked metaphase cell images are segmented by classifying and defining chromosomes with an active contour gradient vector field (GVF) and by determining centromere locations along the centreline. The centreline is extracted by discrete curve evolution (DCE) skeleton branch pruning and curve interpolation. Centromere detection minimises the global width and DAPI-staining intensity profiles along the centreline. A second centromere is identified by reapplying this procedure after masking the first. Dicentrics can be identified from features that capture width and intensity profile characteristics as well as local shape features of the object contour at candidate pixel locations. The correct location of the centromere is also refined in chromosomes with sister chromatid separation. The overall algorithm has both high sensitivity (85 %) and specificity (94 %). Results are independent of the shape and structure of chromosomes in different cells, or the laboratory preparation protocol followed. The prototype software was recoded in C++/OpenCV; image processing was accelerated by data and task parallelisation with Message Passaging Interface and Intel Threading Building Blocks and an asynchronous non-blocking I/O strategy. Relative to a serial process, metaphase ranking, GVF and DCE are, respectively, 100 and 300-fold faster on an 8-core desktop and 64-core cluster computers. The software was then ported to a 1024-core supercomputer, which processed 200 metaphase images each from 1025 specimens in 1.4 h.

22 citations

Posted ContentDOI
11 Nov 2015-bioRxiv
TL;DR: This approach distills large numbers of variants detected by NGS to a limited set of variants prioritized as potential deleterious changes and presents a strategy for complete gene sequence analysis followed by a unified framework for interpreting non-coding variants that may affect gene expression.
Abstract: Background: Sequencing of both healthy and disease singletons yields many novel and low frequency variants of uncertain significance (VUS). Complete gene and genome sequencing by next generation sequencing (NGS) significantly increases the number of VUS detected. While prior studies have emphasized protein coding variants, non-coding sequence variants have also been proven to significantly contribute to high penetrance disorders, such as hereditary breast and ovarian cancer (HBOC). We present a strategy for analyzing different functional classes of non-coding variants based on information theory (IT). Methods: We captured and enriched for coding and non-coding variants in genes known to harbor mutations that increase HBOC risk. Custom oligonucleotide baits spanning the complete coding, non-coding, and intergenic regions 10 kb up- and downstream of ATM, BRCA1, BRCA2, CDH1, CHEK2, PALB2, and TP53 were synthesized for solution hybridization enrichment. Unique and divergent repetitive sequences were sequenced in 102 high-risk patients without identified mutations in BRCA1/2. Aside from protein coding changes, IT-based sequence analysis was used to identify and prioritize pathogenic non-coding variants that occurred within sequence elements predicted to be recognized by proteins or protein complexes involved in mRNA splicing, transcription, and untranslated region (UTR) binding and structure. This approach was supplemented by in silico and laboratory analysis of UTR structure. Results: 15,311 unique variants were identified, of which 245 occurred in coding regions. With the unified IT-framework, 132 variants were identified and 87 functionally significant VUS were further prioritized. We also identified 4 stop-gain variants and 3 reading-frame altering exonic insertions/deletions (indels). Conclusions: We have presented a strategy for complete gene sequence analysis followed by a unified framework for interpreting non-coding variants that may affect gene expression. This approach distills large numbers of variants detected by NGS to a limited set of variants prioritized as potential deleterious changes.

9 citations


Cited by
More filters
Journal ArticleDOI

3,734 citations

01 Jan 2011
TL;DR: The sheer volume and scope of data posed by this flood of data pose a significant challenge to the development of efficient and intuitive visualization tools able to scale to very large data sets and to flexibly integrate multiple data types, including clinical data.
Abstract: Rapid improvements in sequencing and array-based platforms are resulting in a flood of diverse genome-wide data, including data from exome and whole-genome sequencing, epigenetic surveys, expression profiling of coding and noncoding RNAs, single nucleotide polymorphism (SNP) and copy number profiling, and functional assays. Analysis of these large, diverse data sets holds the promise of a more comprehensive understanding of the genome and its relation to human disease. Experienced and knowledgeable human review is an essential component of this process, complementing computational approaches. This calls for efficient and intuitive visualization tools able to scale to very large data sets and to flexibly integrate multiple data types, including clinical data. However, the sheer volume and scope of data pose a significant challenge to the development of such tools.

2,187 citations

Journal ArticleDOI
Aude Nicolas1, Kevin P. Kenna2, Alan E. Renton1, Alan E. Renton3  +432 moreInstitutions (78)
21 Mar 2018-Neuron
TL;DR: Interestingly, mutations predominantly in the N-terminal motor domain of KIF5A are causative for two neurodegenerative diseases: hereditary spastic paraplegia and Charcot-Marie-Tooth type 2.

444 citations

Journal ArticleDOI
TL;DR: This article summarizes the current knowledge about the “splicing mutations” and methods that help to identify such changes in clinical diagnosis and recommends bioinformatic algorithms as a tool to assess the possible effect of the identified changes.
Abstract: Precise pre-mRNA splicing, essential for appropriate protein translation, depends on the presence of consensus “cis” sequences that define exon-intron boundaries and regulatory sequences recognized by splicing machinery. Point mutations at these consensus sequences can cause improper exon and intron recognition and may result in the formation of an aberrant transcript of the mutated gene. The splicing mutation may occur in both introns and exons and disrupt existing splice sites or splicing regulatory sequences (intronic and exonic splicing silencers and enhancers), create new ones, or activate the cryptic ones. Usually such mutations result in errors during the splicing process and may lead to improper intron removal and thus cause alterations of the open reading frame. Recent research has underlined the abundance and importance of splicing mutations in the etiology of inherited diseases. The application of modern techniques allowed to identify synonymous and nonsynonymous variants as well as deep intronic mutations that affected pre-mRNA splicing. The bioinformatic algorithms can be applied as a tool to assess the possible effect of the identified changes. However, it should be underlined that the results of such tests are only predictive, and the exact effect of the specific mutation should be verified in functional studies. This article summarizes the current knowledge about the “splicing mutations” and methods that help to identify such changes in clinical diagnosis.

383 citations

Journal ArticleDOI
TL;DR: Evidence from mRNA analysis and entire genomic sequencing indicates that pathogenic mutations can occur deep within the introns of over 75 disease-associated genes, highlighting the importance of studying variation in deep intronic sequence as a cause of monogenic disorders as well as hereditary cancer syndromes.
Abstract: Next-generation sequencing has revolutionized clinical diagnostic testing. Yet, for a substantial proportion of patients, sequence information restricted to exons and exon-intron boundaries fails to identify the genetic cause of the disease. Here we review evidence from mRNA analysis and entire genomic sequencing indicating that pathogenic mutations can occur deep within the introns of over 75 disease-associated genes. Deleterious DNA variants located more than 100 base pairs away from exon-intron junctions most commonly lead to pseudo-exon inclusion due to activation of non-canonical splice sites or changes in splicing regulatory elements. Additionally, deep intronic mutations can disrupt transcription regulatory motifs and non-coding RNA genes. This review aims to highlight the importance of studying variation in deep intronic sequence as a cause of monogenic disorders as well as hereditary cancer syndromes.

284 citations