Discretized Gaussian mixture for genotyping of microsatellite loci containing homopolymer runs
Abstract:
Motivation: Inferring the lengths of inherited microsatellite alleles at single-base-pair resolution from short sequence reads is challenging because of several sources of noise, which arise from the repetitive nature of microsatellites and from the technologies used to generate raw sequence data.
Results: We have developed a program, GenoTan, that uses a discretized Gaussian mixture model combined with a rules-based approach to identify inherited variation of microsatellite loci from short sequence reads without paired-end information. It effectively distinguishes length variants from noise, including insertion/deletion errors in homopolymer runs, by addressing the bidirectional aspect of insertion and deletion errors in sequence reads. Here we first introduce a homopolymer decomposition method that estimates error bias toward insertion or deletion in homopolymer sequence runs. Combining these approaches, GenoTan genotyped 94.9% of microsatellite loci accurately from simulated data with 40x sequence coverage, whereas the other programs showed <90% correct calls for the same data and required 5 to 30 times more computational time than GenoTan. It also showed the highest true-positive rate for real data using mixed sequence data of two Drosophila inbred lines, which was a novel validation approach for genotyping.
Availability: GenoTan is open-source software available at http://genotan.sourceforge.net.
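To make the phrase "discretized Gaussian mixture" concrete, the following is a minimal Python sketch of the general idea, not GenoTan's actual implementation. It assumes that each read reports an integer repeat-tract length, that each allele contributes a Gaussian centred on its true length and integrated over unit-width bins, that the two alleles have equal mixture weights, and that the noise width `sigma` is fixed. The function names, the equal weights, the fixed `sigma`, and the exhaustive search over allele pairs are all illustrative assumptions; the rules-based filtering and the homopolymer decomposition described in the abstract are not modelled here.

```python
import math
from itertools import combinations_with_replacement


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a Gaussian N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def discretized_gaussian_pmf(k, mu, sigma):
    """Probability mass the Gaussian assigns to the integer bin [k - 0.5, k + 0.5)."""
    return normal_cdf(k + 0.5, mu, sigma) - normal_cdf(k - 0.5, mu, sigma)


def log_likelihood(lengths, alleles, sigma):
    """Log-likelihood of observed lengths under an equal-weight two-allele mixture.

    Equal weights (0.5 each) are an assumption of this sketch.
    """
    a, b = alleles
    total = 0.0
    for k in lengths:
        p = 0.5 * discretized_gaussian_pmf(k, a, sigma) \
            + 0.5 * discretized_gaussian_pmf(k, b, sigma)
        total += math.log(max(p, 1e-300))  # guard against log(0)
    return total


def call_genotype(lengths, sigma=0.5):
    """Return the allele-length pair (a <= b) that maximises the likelihood.

    Candidate alleles are every integer between the shortest and longest
    observed length; a homozygous call is a pair with a == b.
    """
    candidates = range(min(lengths), max(lengths) + 1)
    return max(
        combinations_with_replacement(candidates, 2),
        key=lambda pair: log_likelihood(lengths, pair, sigma),
    )


if __name__ == "__main__":
    # Hypothetical read lengths: mostly 20 bp with a little +/-1 bp noise.
    print(call_genotype([20] * 18 + [19, 21]))
    # Hypothetical read lengths split evenly between two alleles.
    print(call_genotype([20] * 10 + [24] * 10))
```

In this sketch a small number of reads that are one base longer or shorter than the majority are absorbed by the tails of the discretized Gaussian rather than being called as a second allele. A symmetric Gaussian cannot by itself express a bias toward insertion or deletion errors, which is the kind of asymmetry the abstract says the homopolymer decomposition method estimates.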