scispace - formally typeset
Search or ask a question
Author

Harold R. Garner

Bio: Harold R. Garner is an academic researcher from National Institutes of Health. The author has contributed to research in topics: Contig. The author has an hindex of 1, co-authored 2 publications receiving 22 citations.
Topics: Contig

Papers
More filters
Journal ArticleDOI
TL;DR: GenoTan, a program using a discretized Gaussian mixture model combined with a rules-based approach to identify inherited variation of microsatellite loci from short sequence reads without paired-end information, effectively distinguishes length variants from noise including insertion/deletion errors in homopolymer runs by addressing the bidirectional aspect of insertion and deletion errors in sequence reads.
Abstract: Motivation: Inferring lengths of inherited microsatellite alleles with single base pair resolution from short sequence reads is challenging due to several sources of noise caused by the repetitive nature of microsatellites and the technologies used to generate raw sequence data. Results: We have developed a program, GenoTan, using a discretized Gaussian mixture model combined with a rules-based approach to identify inherited variation of microsatellite loci from short sequence reads without paired-end information. It effectively distinguishes length variants from noise including insertion/deletion errors in homopolymer runs by addressing the bidirectional aspect of insertion and deletion errors in sequence reads. Here we first introduce a homopolymer decomposition method which estimates error bias toward insertion or deletion in homopolymer sequence runs. Combining these approaches, GenoTan was able to genotype 94.9% of microsatellite loci accurately from simulated data with 40x sequence coverage quickly while the other programs showed590% correct calls for the same data and required 5� 30� more computational time than GenoTan. It also showed the highest true-positive rate for real data using mixed sequence data of two Drosophila inbred lines, which was a novel validation approach for genotyping. Availability: GenoTan is open-source software available at http://gen otan.sourceforge.net.

25 citations

01 Jan 2012
TL;DR: VBI22 strain isolated from raw milk from an infected cow in Texas was sequenced via the Illumina GAIIx se-quencer following the standardIllumina76-cyclepaired-endpro-tocols generating 24,600,000 sequence read pairs per sample.
Abstract: VBI22 strain isolated from raw milkfrom an infected cow in Texas.Genomic DNA was sequenced via the Illumina GAIIx se-quencerfollowingthestandardIllumina76-cyclepaired-endpro-tocols generating 24,600,000 sequence read pairs (49,200,000reads) per sample. All low-quality bases (0.99 of quality score)were trimmed from the sequence reads, and contig sequenceswere assembled using two

Cited by
More filters
Journal ArticleDOI
TL;DR: It is demonstrated that exSTRa can be effectively utilized as a screening tool for detecting repeat expansions in WES and WGS data, although the best performance would be produced by consensus calling, wherein at least two out of the four currently available screening methods call an expansion.
Abstract: Repeat expansions cause more than 30 inherited disorders, predominantly neurogenetic. These can present with overlapping clinical phenotypes, making molecular diagnosis challenging. Single-gene or small-panel PCR-based methods can help to identify the precise genetic cause, but they can be slow and costly and often yield no result. Researchers are increasingly performing genomic analysis via whole-exome and whole-genome sequencing (WES and WGS) to diagnose genetic disorders. However, until recently, analysis protocols could not identify repeat expansions in these datasets. We developed exSTRa (expanded short tandem repeat algorithm), a method that uses either WES or WGS to identify repeat expansions. Performance of exSTRa was assessed in a simulation study. In addition, four retrospective cohorts of individuals with eleven different known repeat-expansion disorders were analyzed with exSTRa. We assessed results by comparing the findings to known disease status. Performance was also compared to three other analysis methods (ExpansionHunter, STRetch, and TREDPARSE), which were developed specifically for WGS data. Expansions in the assessed STR loci were successfully identified in WES and WGS datasets by all four methods with high specificity and sensitivity. Overall, exSTRa demonstrated more robust and superior performance for WES data than did the other three methods. We demonstrate that exSTRa can be effectively utilized as a screening tool for detecting repeat expansions in WES and WGS data, although the best performance would be produced by consensus calling, wherein at least two out of the four currently available screening methods call an expansion.

89 citations

Journal ArticleDOI
TL;DR: A strategy for complete gene sequence analysis followed by a unified framework for interpreting non-coding variants that may affect gene expression is presented and large numbers of variants detected by NGS are distilled to a limited set of variants prioritized as potential deleterious changes.
Abstract: Sequencing of both healthy and disease singletons yields many novel and low frequency variants of uncertain significance (VUS). Complete gene and genome sequencing by next generation sequencing (NGS) significantly increases the number of VUS detected. While prior studies have emphasized protein coding variants, non-coding sequence variants have also been proven to significantly contribute to high penetrance disorders, such as hereditary breast and ovarian cancer (HBOC). We present a strategy for analyzing different functional classes of non-coding variants based on information theory (IT) and prioritizing patients with large intragenic deletions. We captured and enriched for coding and non-coding variants in genes known to harbor mutations that increase HBOC risk. Custom oligonucleotide baits spanning the complete coding, non-coding, and intergenic regions 10 kb up- and downstream of ATM, BRCA1, BRCA2, CDH1, CHEK2, PALB2, and TP53 were synthesized for solution hybridization enrichment. Unique and divergent repetitive sequences were sequenced in 102 high-risk, anonymized patients without identified mutations in BRCA1/2. Aside from protein coding and copy number changes, IT-based sequence analysis was used to identify and prioritize pathogenic non-coding variants that occurred within sequence elements predicted to be recognized by proteins or protein complexes involved in mRNA splicing, transcription, and untranslated region (UTR) binding and structure. This approach was supplemented by in silico and laboratory analysis of UTR structure. 15,311 unique variants were identified, of which 245 occurred in coding regions. With the unified IT-framework, 132 variants were identified and 87 functionally significant VUS were further prioritized. An intragenic 32.1 kb interval in BRCA2 that was likely hemizygous was detected in one patient. We also identified 4 stop-gain variants and 3 reading-frame altering exonic insertions/deletions (indels). We have presented a strategy for complete gene sequence analysis followed by a unified framework for interpreting non-coding variants that may affect gene expression. This approach distills large numbers of variants detected by NGS to a limited set of variants prioritized as potential deleterious changes.

25 citations

Journal ArticleDOI
TL;DR: An algorithm to automatically and efficiently genotype microsatellites from a collection of reads sorted by individual, which can be used to genotype any microsatellite locus from any organism and has been tested on 454 pyrosequencing data of several loci from fruit flies and red deers.
Abstract: Microsatellites are widely used in population genetics to uncover recent evolutionary events. They are typically genotyped using capillary sequencer, which capacity is usually limited to 9, at most 12 loci for each run, and which analysis is a tedious task that is performed by hand. With the rise of next-generation sequencing (NGS), a much larger number of loci and individuals are available from sequencing: for example, on a single run of a GS Junior, 28 loci from 96 individuals are sequenced with a 30X cover. We have developed an algorithm to automatically and efficiently genotype microsatellites from a collection of reads sorted by individual (e.g. specific PCR amplifications of a locus or a collection of reads that encompass a locus of interest). As the sequencing and the PCR amplification introduce artefactual insertions or deletions, the set of reads from a single microsatellite allele shows several length variants. The algorithm infers, without alignment, the true unknown allele(s) of each individual from the observed distributions of microsatellites length of all individuals. MicNeSs, a python implementation of the algorithm, can be used to genotype any microsatellite locus from any organism and has been tested on 454 pyrosequencing data of several loci from fruit flies (a model species) and red deers (a nonmodel species). Without any parallelization, it automatically genotypes 22 loci from 441 individuals in 11 hours on a standard computer. The comparison of MicNeSs inferences to the standard method shows an excellent agreement, with some differences illustrating the pros and cons of both methods.

22 citations

Journal ArticleDOI
TL;DR: The Pheno2Geno package makes use of genome-wide molecular profiling and provides a tool for high-throughput de novo map construction and saturation of existing genetic maps.
Abstract: Background: Genetic markers and maps are instrumental in quantitative trait locus (QTL) mapping in segregating populations. The resolution of QTL localization depends on the number of informative recombinations in the population and how well they are tagged by markers. Larger populations and denser marker maps are better for detecting and locating QTLs. Marker maps that are initially too sparse can be saturated or derived de novo from high-throughput omics data, (e.g. gene expression, protein or metabolite abundance). If these molecular phenotypes are affected by genetic variation due to a major QTL they will show a clear multimodal distribution. Using this information, phenotypes can be converted into genetic markers. Results: The Pheno2Geno tool uses mixture modeling to select phenotypes and transform them into genetic markers suitable for construction and/or saturation of a genetic map. Pheno2Geno excludes candidate genetic markers that show evidence for multiple possibly epistatically interacting QTL and/or interaction with the environment, in order to provide a set of robust markers for follow-up QTL mapping. We demonstrate the use of Pheno2Geno on gene expression data of 370,000 probes in 148 A. thaliana recombinant inbred lines. Pheno2Geno is able to saturate the existing genetic map, decreasing the average distance between markers from 7.1 cM to 0.89 cM, close to the theoretical limit of 0.68 cM (with 148 individuals we expect a recombination every 100/148=0.68 cM); this pinpointed almost all of the informative recombinations in the population. Conclusion: The Pheno2Geno package makes use of genome-wide molecular profiling and provides a tool for high-throughput de novo map construction and saturation of existing genetic maps. Processing of the showcase dataset takes less than 30 minutes on an average desktop PC. Pheno2Geno improves QTL mapping results at no additional laboratory cost and with minimum computational effort. Its results are formatted for direct use in R/qtl, the leading R package for QTL studies. Pheno2Geno is freely available on CRAN under “GNU GPL v3”. The Pheno2Geno package as well as the tutorial can also be found at: http://pheno2geno.nl.

15 citations

Journal ArticleDOI
TL;DR: It is shown that LT-RPA improves the limit of detection of MSI compared to PCR up to four times, notably for small deletions, and simplifies the identification of the mutant alleles.
Abstract: Microsatellites are polymorphic short tandem repeats of 1-6 nucleotides ubiquitously present in the genome that are extensively used in living organisms as genetic markers and in oncology to detect microsatellite instability (MSI). While the standard analysis method of microsatellites is based on PCR followed by capillary electrophoresis, it generates undesirable frameshift products known as 'stutter peaks' caused by the polymerase slippage that can greatly complicate the analysis and interpretation of the data. Here we present an easy multiplexable approach replacing PCR that is based on low temperature isothermal amplification using recombinase polymerase amplification (LT-RPA) that drastically reduces and sometimes completely abolishes the formation of stutter artifacts, thus greatly simplifying the calling of the alleles. Using HT17, a mononucleotide DNA repeat that was previously proposed as an optimal marker to detect MSI in tumor DNA, we showed that LT-RPA improves the limit of detection of MSI compared to PCR up to four times, notably for small deletions, and simplifies the identification of the mutant alleles. It was successfully applied to clinical colorectal cancer samples and enabled detection of MSI. This easy-to-handle, rapid and cost-effective approach may deeply improve the analysis of microsatellites in several biological and clinical applications.

14 citations