Open Access, Journal Article

Discretized Gaussian mixture for genotyping of microsatellite loci containing homopolymer runs

TLDR
GenoTan, a program using a discretized Gaussian mixture model combined with a rules-based approach to identify inherited variation of microsatellite loci from short sequence reads without paired-end information, effectively distinguishes length variants from noise including insertion/deletion errors in homopolymer runs by addressing the bidirectional aspect of insertion and deletion errors in sequence reads.
Abstract
Motivation: Inferring lengths of inherited microsatellite alleles with single base pair resolution from short sequence reads is challenging due to several sources of noise caused by the repetitive nature of microsatellites and the technologies used to generate raw sequence data.

Results: We have developed a program, GenoTan, using a discretized Gaussian mixture model combined with a rules-based approach to identify inherited variation of microsatellite loci from short sequence reads without paired-end information. It effectively distinguishes length variants from noise including insertion/deletion errors in homopolymer runs by addressing the bidirectional aspect of insertion and deletion errors in sequence reads. Here we first introduce a homopolymer decomposition method which estimates error bias toward insertion or deletion in homopolymer sequence runs. Combining these approaches, GenoTan was able to genotype 94.9% of microsatellite loci accurately from simulated data with 40× sequence coverage quickly while the other programs showed <90% correct calls for the same data and required 5 to 30 times more computational time than GenoTan. It also showed the highest true-positive rate for real data using mixed sequence data of two Drosophila inbred lines, which was a novel validation approach for genotyping.

Availability: GenoTan is open-source software available at http://genotan.sourceforge.net.
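The core idea of a discretized Gaussian mixture over allele lengths can be illustrated with a minimal sketch. This is not GenoTan's implementation: the function names, the fixed two-component (diploid) assumption, the initialisation, and the moment-based M-step (an approximation for a discretized Gaussian) are all assumptions made for illustration, and the rules-based filtering and homopolymer decomposition described in the abstract are omitted. Each integer repeat length k receives the Gaussian probability mass on the interval [k - 0.5, k + 0.5], and expectation-maximisation assigns observed read lengths to two allele components.

```python
import math
from collections import Counter

def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a Gaussian N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def discretized_gaussian_pmf(k, mu, sigma):
    """Mass assigned to integer length k: Gaussian integrated over [k-0.5, k+0.5]."""
    return normal_cdf(k + 0.5, mu, sigma) - normal_cdf(k - 0.5, mu, sigma)

def fit_two_allele_mixture(lengths, n_iter=50, min_sigma=0.3):
    """Fit a two-component discretized Gaussian mixture by EM (illustrative only).

    lengths: observed microsatellite lengths (integers), one per read.
    Returns (weights, means, sigmas) for the two components.
    """
    counts = Counter(lengths)
    support = sorted(counts)
    n = len(lengths)

    # Assumption: start the two means at the two most frequent lengths.
    top = [k for k, _ in counts.most_common(2)]
    top = (top * 2)[:2]  # duplicate if only one distinct length was observed
    mus = [float(m) for m in top]
    sigmas = [1.0, 1.0]
    weights = [0.5, 0.5]

    for _ in range(n_iter):
        # E-step: responsibility of each component for each observed length.
        resp = {}
        for k in support:
            p = [w * discretized_gaussian_pmf(k, mu, s)
                 for w, mu, s in zip(weights, mus, sigmas)]
            total = sum(p) or 1e-300  # guard against numerical underflow
            resp[k] = [pi / total for pi in p]

        # M-step: weighted moment updates (approximate for a discretized Gaussian).
        for j in range(2):
            nj = sum(counts[k] * resp[k][j] for k in support)
            if nj < 1e-9:
                continue  # component has no support; leave it unchanged
            weights[j] = nj / n
            mus[j] = sum(counts[k] * resp[k][j] * k for k in support) / nj
            var = sum(counts[k] * resp[k][j] * (k - mus[j]) ** 2
                      for k in support) / nj
            sigmas[j] = max(math.sqrt(var), min_sigma)  # floor avoids collapse

    return weights, mus, sigmas

# Hypothetical read lengths at one locus: two alleles plus +/-1 bp noise.
example_lengths = [20] * 18 + [19] * 3 + [21] * 2 + [24] * 15 + [23] * 4 + [25]
example_weights, example_means, example_sigmas = fit_two_allele_mixture(example_lengths)
print(sorted(round(m) for m in example_means))
```

Rounding the fitted means to the nearest integer gives a candidate genotype for the locus. A real caller would also need to decide between homozygous and heterozygous models and to account for directional insertion or deletion bias, which this sketch does not attempt.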


Citations
Journal Article

Detecting Expansions of Tandem Repeats in Cohorts Sequenced with Short-Read Sequencing Data

TL;DR: It is demonstrated that exSTRa can be effectively utilized as a screening tool for detecting repeat expansions in WES and WGS data, although the best performance would be produced by consensus calling, wherein at least two out of the four currently available screening methods call an expansion.
Journal Article

A unified analytic framework for prioritization of non-coding variants of uncertain significance in heritable breast and ovarian cancer

TL;DR: A strategy for complete gene sequence analysis followed by a unified framework for interpreting non-coding variants that may affect gene expression is presented and large numbers of variants detected by NGS are distilled to a limited set of variants prioritized as potential deleterious changes.
Journal Article

MicNeSs: genotyping microsatellite loci from a collection of (NGS) reads

TL;DR: An algorithm to automatically and efficiently genotype microsatellites from a collection of reads sorted by individual, which can be used to genotype any microsatellite locus from any organism and has been tested on 454 pyrosequencing data of several loci from fruit flies and red deer.
Journal Article

Low temperature isothermal amplification of microsatellites drastically reduces stutter artifact formation and improves microsatellite instability detection in cancer.

TL;DR: It is shown that LT-RPA improves the limit of detection of MSI compared to PCR up to four times, notably for small deletions, and simplifies the identification of the mutant alleles.
References
Journal Article

lobSTR: A short tandem repeat profiler for personal genomes.

TL;DR: The speed and reliability of lobSTR exceed the performance of current mainstream algorithms for STR profiling, and the algorithm was used to conduct a comprehensive survey of STR variations in a deeply sequenced personal genome.
Journal Article

Alta-Cyclic: a self-optimizing base caller for next-generation sequencing

TL;DR: Alta-Cyclic substantially improved the number of accurate reads for sequencing runs up to 78 bases and reduced systematic biases, facilitating confident identification of sequence variants.
Journal Article

pIRS: Profile-based Illumina pair-end reads simulator

TL;DR: A software package, pIRS (profile-based Illumina pair-end reads simulator), which simulates Illumina reads with empirical Base-Calling and GC%-depth profiles trained from real re-sequencing data, fits the properties of real sequencing data better than existing simulators.
Journal Article

Accurate human microsatellite genotypes from high-throughput resequencing data using informed error profiles

TL;DR: A tool for genotyping microsatellite repeats called RepeatSeq is presented, which uses Bayesian model selection guided by an empirically derived error model that incorporates sequence and read properties.
Journal Article

The Drosophila melanogaster circadian pacemaker circuit

Vasu Sheeba, 31 Dec 2008
TL;DR: This work has shown that the D. melanogaster circadian pacemaker circuit presents a relatively simple and attractive model for the study of neuronal circuits and their functions.