Discretized Gaussian mixture for genotyping of microsatellite loci containing homopolymer runs
TLDR
GenoTan, a program using a discretized Gaussian mixture model combined with a rules-based approach to identify inherited variation of microsatellite loci from short sequence reads without paired-end information, effectively distinguishes length variants from noise, including insertion/deletion errors in homopolymer runs, by addressing the bidirectional aspect of insertion and deletion errors in sequence reads.
Abstract:
Motivation: Inferring lengths of inherited microsatellite alleles with single base pair resolution from short sequence reads is challenging due to several sources of noise caused by the repetitive nature of microsatellites and the technologies used to generate raw sequence data.

Results: We have developed a program, GenoTan, using a discretized Gaussian mixture model combined with a rules-based approach to identify inherited variation of microsatellite loci from short sequence reads without paired-end information. It effectively distinguishes length variants from noise, including insertion/deletion errors in homopolymer runs, by addressing the bidirectional aspect of insertion and deletion errors in sequence reads. Here we first introduce a homopolymer decomposition method which estimates error bias toward insertion or deletion in homopolymer sequence runs. Combining these approaches, GenoTan was able to genotype 94.9% of microsatellite loci accurately and quickly from simulated data with 40× sequence coverage, while the other programs showed <90% correct calls for the same data and required 5 to 30 times more computational time than GenoTan. It also showed the highest true-positive rate for real data using mixed sequence data of two Drosophila inbred lines, which was a novel validation approach for genotyping.

Availability: GenoTan is open-source software available at http://genotan.sourceforge.net.
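As a rough illustration of what a discretized Gaussian mixture over allele lengths can look like, the Python sketch below scores diploid genotypes against a histogram of observed repeat-tract lengths. It is not GenoTan's implementation: the function names, the fixed spread `sigma`, the equal 50/50 weighting of the two alleles, and the exhaustive search over observed lengths are all assumptions made for this example, and the rules-based filtering and homopolymer decomposition described in the abstract are left out. Each allele contributes a Gaussian centred on its length, and the mass for an integer length k is the Gaussian integral over [k - 0.5, k + 0.5), so a component spreads probability symmetrically to both shorter and longer reads.

```python
import math
from itertools import combinations_with_replacement


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a Gaussian with mean mu and s.d. sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def discretized_gaussian_pmf(k, mu, sigma):
    """Mass that Gaussian(mu, sigma) assigns to the integer bin [k - 0.5, k + 0.5)."""
    return normal_cdf(k + 0.5, mu, sigma) - normal_cdf(k - 0.5, mu, sigma)


def genotype_log_likelihood(length_counts, alleles, sigma):
    """Log-likelihood of a read-length histogram under an equal-weight two-allele mixture."""
    a, b = alleles
    total = 0.0
    for length, count in length_counts.items():
        p = 0.5 * discretized_gaussian_pmf(length, a, sigma) \
            + 0.5 * discretized_gaussian_pmf(length, b, sigma)
        # Guard against log(0) when a length lies far from both alleles.
        total += count * math.log(max(p, 1e-300))
    return total


def call_genotype(length_counts, sigma=0.6):
    """Return the pair of observed lengths (a <= b) with the highest log-likelihood.

    sigma=0.6 is an arbitrary illustrative value, not a published parameter.
    """
    candidates = sorted(length_counts)
    return max(
        combinations_with_replacement(candidates, 2),
        key=lambda g: genotype_log_likelihood(length_counts, g, sigma),
    )


if __name__ == "__main__":
    # Hypothetical read-length histograms (tract length in bp -> read count).
    heterozygous = {11: 2, 12: 18, 13: 3, 15: 2, 16: 17, 17: 1}
    homozygous = {10: 1, 11: 20, 12: 2}
    print(call_genotype(heterozygous))
    print(call_genotype(homozygous))
```

In this toy setting, the off-by-one reads around each peak are absorbed by the nearest component instead of being called as separate alleles. A fuller model would estimate the spread and the mixing weights from the data and could let the error distribution be asymmetric.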