Discretized Gaussian mixture for genotyping of microsatellite loci containing homopolymer runs

doi:10.1093/BIOINFORMATICS/BTT595

Open Access Journal Article

Hongseok Tae, +4 more

- 01 Mar 2014 -

Bioinformatics

- Vol. 30, Iss: 5, pp 652-659

TLDR

GenoTan, a program using a discretized Gaussian mixture model combined with a rules-based approach to identify inherited variation of microsatellite loci from short sequence reads without paired-end information, effectively distinguishes length variants from noise including insertion/deletion errors in homopolymer runs by addressing the bidirectional aspect of insertion and deletion errors in sequence reads.

Abstract:

Motivation: Inferring lengths of inherited microsatellite alleles with single base pair resolution from short sequence reads is challenging due to several sources of noise caused by the repetitive nature of microsatellites and the technologies used to generate raw sequence data. Results: We have developed a program, GenoTan, using a discretized Gaussian mixture model combined with a rules-based approach to identify inherited variation of microsatellite loci from short sequence reads without paired-end information. It effectively distinguishes length variants from noise, including insertion/deletion errors in homopolymer runs, by addressing the bidirectional aspect of insertion and deletion errors in sequence reads. Here we first introduce a homopolymer decomposition method which estimates error bias toward insertion or deletion in homopolymer sequence runs. Combining these approaches, GenoTan was able to genotype 94.9% of microsatellite loci accurately and quickly from simulated data with 40x sequence coverage, while the other programs showed <90% correct calls for the same data and required 5 to 30 times more computational time than GenoTan. It also showed the highest true-positive rate for real data using mixed sequence data of two Drosophila inbred lines, which was a novel validation approach for genotyping. Availability: GenoTan is open-source software available at http://genotan.sourceforge.net.
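To make the phrase "discretized Gaussian mixture" concrete, the following is a minimal sketch and not GenoTan's actual implementation. It assumes that each allele contributes a Gaussian centred on its length, that the Gaussian is discretized by integrating it over the unit-width bin around each integer read length, that the two alleles are mixed with equal weight, and that the spread `sigma` is fixed. The function names, the default `sigma=0.5`, and the probability floor are all hypothetical choices made for illustration; the homopolymer decomposition and the rules-based filtering described in the abstract are not modelled here.

```python
import math
from itertools import combinations_with_replacement


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a Gaussian(mu, sigma)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def discretized_gaussian_pmf(k, mu, sigma):
    """Probability mass a Gaussian(mu, sigma) places on the bin [k-0.5, k+0.5]."""
    return normal_cdf(k + 0.5, mu, sigma) - normal_cdf(k - 0.5, mu, sigma)


def mixture_log_likelihood(lengths, alleles, sigma=0.5, floor=1e-12):
    """Log-likelihood of observed repeat lengths under an equal-weight mixture
    with one discretized Gaussian per allele (assumed model, see lead-in)."""
    total = 0.0
    for k in lengths:
        p = sum(discretized_gaussian_pmf(k, a, sigma) for a in alleles) / len(alleles)
        total += math.log(max(p, floor))  # floor avoids log(0) for far-off reads
    return total


def call_genotype(lengths, sigma=0.5):
    """Return the diploid genotype (pair of allele lengths) that maximizes the
    mixture likelihood, searching over the lengths actually observed."""
    candidates = sorted(set(lengths))
    return max(
        combinations_with_replacement(candidates, 2),
        key=lambda genotype: mixture_log_likelihood(lengths, genotype, sigma),
    )


if __name__ == "__main__":
    # Hypothetical locus: two alleles (12 and 15) plus a few +/-1 error reads.
    reads = [12] * 9 + [15] * 8 + [11, 13, 14, 16]
    print(call_genotype(reads))
```

In this toy setting the reads one base away from an allele are absorbed by the tails of the nearest component rather than being called as separate alleles. A symmetric Gaussian treats insertions and deletions as equally likely, which is exactly the simplification that the error-bias estimate described in the abstract is meant to address.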

Citations

Posted Content

Detecting known repeat expansions with standard protocol next generation sequencing, towards developing a single screening test for neurological repeat expansion disorders

Rick M. Tankard, +3 more

- 30 Jun 2017 -

bioRxiv

TL;DR: It is demonstrated that exSTRa can be effectively utilized as a screening tool to interrogate WES and WGS sequencing data generated with PCR-based library preparations which can then be followed up with specific diagnostic tests.

Journal Article

Exceptionally long-range haplotypes in Plasmodium falciparum chromosome 6 maintained in an endemic African population.

Alfred Amambua-Ngwa, +8 more

- 21 Oct 2016 -

Malaria Journal

TL;DR: The occurrence of several long haplotypes at intermediate frequencies suggests an unusual mode of selection in chromosome 6, possibly combined with recombination suppression on specific haplotypes in The Gambia.

Journal Article

CAGm: A repository of germline microsatellite variations in the 1000 genomes project

Nicholas A. Kinney, +12 more

- 08 Jan 2019 -

Nucleic Acids Research

TL;DR: A key novelty of CAGm is the ability to aggregate microsatellite variation by population, ethnicity (super population) and gender, and the database provides advanced searching for microsatellites embedded in genes and functional elements.

Posted Content

A unified analytic framework for prioritization of non-coding variants of uncertain significance in heritable breast and ovarian cancer

Eliseos J. Mucaki, +7 more

- 11 Nov 2015 -

bioRxiv

TL;DR: This approach distills large numbers of variants detected by NGS to a limited set of variants prioritized as potential deleterious changes and presents a strategy for complete gene sequence analysis followed by a unified framework for interpreting non-coding variants that may affect gene expression.

Tandem repeats finder: a program to analyze DNA sequences

Gary Benson

- 01 Jan 1999 -

Nucleic Acids Research

TL;DR: A new algorithm for finding tandem repeats which works without the need to specify either the pattern or pattern size is presented and its ability to detect tandem repeats that have undergone extensive mutational change is demonstrated.

Related Papers (5)

Accurate human microsatellite genotypes from high-throughput resequencing data using informed error profiles

Gareth Highnam, +5 more

- 01 Jan 2013 -

Nucleic Acids Research

Tandem repeats mediating genetic plasticity in health and disease.

Anthony J. Hannan

- 05 Feb 2018 -

Nature Reviews Genetics

Tandem repeats finder: a program to analyze DNA sequences

Gary Benson

- 01 Jan 1999 -

Nucleic Acids Research

Microsatellites: simple sequences with complex evolution

Hans Ellegren

- 01 Jun 2004 -

Nature Reviews Genetics

Citations

Detecting known repeat expansions with standard protocol next generation sequencing, towards developing a single screening test for neurological repeat expansion disorders

Exceptionally long-range haplotypes in Plasmodium falciparum chromosome 6 maintained in an endemic African population.

CAGm: A repository of germline microsatellite variations in the 1000 genomes project

A unified analytic framework for prioritization of non-coding variants of uncertain significance in heritable breast and ovarian cancer

Novel variation at chr11p13 associated with cystic fibrosis lung disease severity

References

The Sequence Alignment/Map format and SAMtools

Fast and accurate short read alignment with Burrows–Wheeler transform

The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data

Base-calling of automated sequencer traces using Phred. I. accuracy assessment

Tandem repeats finder: a program to analyze DNA sequences

Related Papers (5)

Accurate human microsatellite genotypes from high-throughput resequencing data using informed error profiles

Tandem repeats mediating genetic plasticity in health and disease.

Tandem repeats finder: a program to analyze DNA sequences

A framework for variation discovery and genotyping using next-generation DNA sequencing data

Microsatellites: simple sequences with complex evolution