FastHap: fast and accurate single individual haplotype reconstruction using fuzzy conflict graphs.
Sepideh Mazrouee, Wei Wang
TL;DR: This article introduces FastHap, a fast and accurate haplotype reconstruction approach, which is up to one order of magnitude faster than the state-of-the-art haplotype inference algorithms while also delivering higher accuracy than these algorithms.
Abstract:
Motivation: Understanding the exact structure of an individual's haplotype plays a significant role in various fields of human genetics. Despite tremendous research effort in recent years, fast and accurate haplotype reconstruction remains an active research topic, mainly owing to the computational challenges involved. Existing haplotype assembly algorithms focus primarily on improving the accuracy of the assembly, which makes them computationally expensive to apply to large high-throughput sequence data. Therefore, there is a need for haplotype reconstruction algorithms that are not only accurate but also highly scalable.
Results: In this article, we introduce FastHap, a fast and accurate haplotype reconstruction approach, which is up to one order of magnitude faster than the state-of-the-art haplotype inference algorithms while also delivering higher accuracy than these algorithms. FastHap leverages a new similarity metric that allows us to precisely measure distances between pairs of fragments. The distance is then used in building the fuzzy conflict graphs of fragments. Given that optimal haplotype reconstruction based on minimum error correction is known to be NP-hard, we use our fuzzy conflict graphs to develop a fast heuristic for fragment partitioning and haplotype reconstruction.
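The abstract does not define the similarity metric, the construction of the fuzzy conflict graph, or the partitioning heuristic, so the following Python sketch only illustrates the general idea under stated assumptions and is not FastHap's actual method. It assumes fragments are equal-length strings over `0`, `1` and `-` (site not covered), uses a hypothetical distance (the mismatch fraction over shared sites) as the edge weight of a weighted conflict graph, splits fragments greedily into two groups, and scores the result with the standard minimum error correction (MEC) count. All function names are illustrative.

```python
from itertools import combinations


def fragment_distance(f, g):
    """Hypothetical distance in [0, 1]: mismatch fraction over sites both fragments cover."""
    shared = [(a, b) for a, b in zip(f, g) if a != "-" and b != "-"]
    if not shared:
        return 0.5  # assumption: no overlap means no evidence either way
    return sum(a != b for a, b in shared) / len(shared)


def partition(fragments):
    """Greedy two-way split (assumed heuristic, needs at least two fragments)."""
    n = len(fragments)
    # Edge weights of the weighted conflict graph, keyed by (i, j) with i < j.
    dist = {
        (i, j): fragment_distance(fragments[i], fragments[j])
        for i, j in combinations(range(n), 2)
    }
    # Seed the two groups with the most dissimilar pair of fragments.
    a, b = max(dist, key=dist.get)
    groups = ([a], [b])
    for i in range(n):
        if i in (a, b):
            continue
        # Assign fragment i to the group it conflicts with least on average.
        avg = [
            sum(dist[(min(i, j), max(i, j))] for j in g) / len(g) for g in groups
        ]
        groups[0 if avg[0] <= avg[1] else 1].append(i)
    return groups


def consensus(fragments, group):
    """Majority-vote haplotype for one group; ties resolve to '0'."""
    hap = []
    for site in range(len(fragments[0])):
        ones = sum(fragments[i][site] == "1" for i in group)
        zeros = sum(fragments[i][site] == "0" for i in group)
        hap.append("-" if ones == zeros == 0 else ("1" if ones > zeros else "0"))
    return "".join(hap)


def mec(fragments, groups):
    """MEC score: alleles that must be flipped to match each group's consensus."""
    total = 0
    for group in groups:
        hap = consensus(fragments, group)
        for i in group:
            total += sum(
                a != "-" and h != "-" and a != h for a, h in zip(fragments[i], hap)
            )
    return total


# Toy input: reads sampled from 010110 and 101001, with one error in the last read.
fragments = ["0101--", "1010--", "--0110", "--1001", "-1011-", "-0101-"]
groups = partition(fragments)
print(groups)                               # ([0, 2, 4], [1, 3, 5])
print([consensus(fragments, g) for g in groups])  # ['010110', '101001']
print(mec(fragments, groups))               # 1
```

Because exact MEC minimization is NP-hard, a heuristic of this kind trades optimality for speed: it computes pairwise distances once and then makes a single pass over the fragments.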
Availability: An implementation of FastHap is available for sharing on request.
Contact: sepideh@cs.ucla.edu