Repetitive DNA and next-generation sequencing: computational challenges and solutions.
TLDR
The computational problems surrounding repeats are discussed and strategies used by current bioinformatics systems to solve them are described.Abstract:
Repetitive DNA sequences are abundant in a broad range of species, from bacteria to mammals, and they cover nearly half of the human genome. Repeats have always presented technical challenges for sequence alignment and assembly programs. Next-generation sequencing projects, with their short read lengths and high data volumes, have made these challenges more difficult. From a computational perspective, repeats create ambiguities in alignment and assembly, which, in turn, can produce biases and errors when interpreting results. Simply ignoring repeats is not an option, as this creates problems of its own and may mean that important biological phenomena are missed. We discuss the computational problems surrounding repeats and describe strategies used by current bioinformatics systems to solve them.read more
Citations
More filters
Integrative Genomics Viewer
James T. Robinson,Helga Thorvaldsdottir,Wendy Winckler,Mitchell Guttman,Eric S. Lander,Eric S. Lander,Gad Getz,Jill P. Mesirov +7 more
TL;DR: The sheer volume and scope of data posed by this flood of data pose a significant challenge to the development of efficient and intuitive visualization tools able to scale to very large data sets and to flexibly integrate multiple data types, including clinical data.
Journal ArticleDOI
The Harvest suite for rapid core-genome alignment and visualization of thousands of intraspecific microbial genomes.
TL;DR: The Harvest suite of core-genome alignment and visualization tools for the rapid and simultaneous analysis of thousands of intraspecific microbial strains is presented, demonstrating that the approach exhibits unrivaled speed while maintaining the accuracy of existing methods.
Journal ArticleDOI
Genetic variance estimation with imputed variants finds negligible missing heritability for human height and body mass index.
Jian Yang,Andrew Bakshi,Zhihong Zhu,Gibran Hemani,Gibran Hemani,Anna A. E. Vinkhuyzen,Sang Hong Lee,Sang Hong Lee,Matthew R. Robinson,John R. B. Perry,Ilja M. Nolte,Jana V. van Vliet-Ostaptchouk,Harold Snieder,Tõnu Esko,Lili Milani,Reedik Mägi,Andres Metspalu,Anders Hamsten,Patrik K. E. Magnusson,Nancy L. Pedersen,Erik Ingelsson,Erik Ingelsson,Nicole Soranzo,Nicole Soranzo,Matthew C. Keller,Naomi R. Wray,Michael E. Goddard,Peter M. Visscher +27 more
TL;DR: It is demonstrated using simulations based on whole-genome sequencing data that ∼97% and ∼68% of variation at common and rare variants, respectively, can be captured by imputation, and evidence that height- and BMI-associated variants have been under natural selection is found.
Journal ArticleDOI
ChIP-seq and Beyond: new and improved methodologies to detect and characterize protein-DNA interactions
TL;DR: The latest advances in methods to detect and functionally characterize DNA-bound proteins are described, which are being used to identify variability in the functions of DNA-binding proteins across genomes and individuals.
Journal ArticleDOI
Discovery of microbial natural products by activation of silent biosynthetic gene clusters.
TL;DR: This Review discusses the strategies that have been developed in bacteria and fungi to identify and induce the expression of silent BGCs, and briefly summarize methods for the isolation and structural characterization of their metabolic products.
References
More filters
Journal ArticleDOI
The Sequence Alignment/Map format and SAMtools
Heng Li,Bob Handsaker,Alec Wysoker,T. J. Fennell,Jue Ruan,Nils Homer,Gabor T. Marth,Gonçalo R. Abecasis,Richard Durbin +8 more
TL;DR: SAMtools as discussed by the authors implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments.
Journal ArticleDOI
Fast and accurate short read alignment with Burrows–Wheeler transform
Heng Li,Richard Durbin +1 more
TL;DR: Burrows-Wheeler Alignment tool (BWA) is implemented, a new read alignment package that is based on backward search with Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps.
Journal ArticleDOI
Ultrafast and memory-efficient alignment of short DNA sequences to the human genome
TL;DR: Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches and can be used simultaneously to achieve even greater alignment speeds.
Journal ArticleDOI
Full-length transcriptome assembly from RNA-Seq data without a reference genome.
Manfred Grabherr,Brian J. Haas,Moran Yassour,Moran Yassour,Joshua Z. Levin,Dawn Thompson,Ido Amit,Xian Adiconis,Lin Fan,Raktima Raychowdhury,Qiandong Zeng,Zehua Chen,Evan Mauceli,Nir Hacohen,Andreas Gnirke,Nicholas Rhind,Federica Di Palma,Bruce W. Birren,Chad Nusbaum,Kerstin Lindblad-Toh,Kerstin Lindblad-Toh,Nir Friedman,Aviv Regev +22 more
TL;DR: The Trinity method for de novo assembly of full-length transcripts and evaluate it on samples from fission yeast, mouse and whitefly, whose reference genome is not yet available, providing a unified solution for transcriptome reconstruction in any sample.
Journal ArticleDOI
Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation
Cole Trapnell,Cole Trapnell,Brian A. Williams,Geo Pertea,Ali Mortazavi,Gordon Kwan,Marijke J. van Baren,Steven L. Salzberg,Barbara J. Wold,Lior Pachter +9 more
TL;DR: The results suggest that Cufflinks can illuminate the substantial regulatory flexibility and complexity in even this well-studied model of muscle development and that it can improve transcriptome-based genome annotation.
Related Papers (5)
BEDTools: a flexible suite of utilities for comparing genomic features
Aaron R. Quinlan,Ira M. Hall +1 more
Initial sequencing and analysis of the human genome.
Eric S. Lander,Lauren Linton,Bruce W. Birren,Chad Nusbaum,Michael C. Zody,Jennifer Baldwin,Keri Devon,Ken Dewar,Michael Doyle,William Fitzhugh,Roel Funke,Diane Gage,Katrina Harris,Andrew Heaford,John Howland,Lisa Kann,Jessica A. Lehoczky,Rosie Levine,Paul A. McEwan,Kevin McKernan,James Meldrim,Jill P. Mesirov,Cher Miranda,William Morris,Jerome Naylor,Christina Raymond,Mark Rosetti,Ralph Santos,Andrew Sheridan,Carrie Sougnez,Nicole Stange-Thomann,Nikola Stojanovic,Aravind Subramanian,Dudley Wyman,Jane Rogers,John Sulston,R Ainscough,Stephan Beck,David Bentley,John Burton,C M Clee,Nigel P. Carter,Alan Coulson,Rebecca Deadman,Panos Deloukas,Andrew Dunham,Ian Dunham,Richard Durbin,Lisa French,Darren Grafham,Simon G. Gregory,Tim Hubbard,Sean Humphray,Adrienne Hunt,Matthew Jones,Christine Lloyd,Amanda McMurray,Lucy Matthews,Simon Mercer,Sarah Milne,James C. Mullikin,Andrew J. Mungall,Robert W. Plumb,Mark T. Ross,Ratna Shownkeen,Sarah Sims,Robert H. Waterston,Richard K. Wilson,LaDeana W. Hillier,John Douglas Mcpherson,Marco A. Marra,Elaine R. Mardis,Lucinda Fulton,Asif T. Chinwalla,Kymberlie H. Pepin,Warren Gish,Stephanie L. Chissoe,Michael C. Wendl,Kim D. Delehaunty,Tracie L. Miner,Andrew Delehaunty,Jason B. Kramer,Lisa Cook,Robert S. Fulton,Douglas L. Johnson,Patrick Minx,Sandra W. Clifton,Trevor Hawkins,Elbert Branscomb,Paul Predki,Paul G. Richardson,Sarah Wenning,Tom Slezak,Norman A. Doggett,Jan Fang Cheng,Anne S. Olsen,Susan Lucas,Christopher J. Elkin,Edward Uberbacher,Marvin Frazier,Richard A. Gibbs,Donna M. Muzny,Steven E. Scherer,John Bouck,Erica Sodergren,Kim C. Worley,Catherine M. Rives,James H. Gorrell,Michael L. Metzker,Susan L. Naylor,Raju Kucherlapati,David L. Nelson,George M. Weinstock,Yoshiyuki Sakaki,Asao Fujiyama,Masahira Hattori,Tetsushi Yada,Atsushi Toyoda,Takehiko Itoh,Chiharu Kawagoe,Hidemi Watanabe,Yasushi Totoki,Todd D. Taylor,Jean Weissenbach,Roland Heilig,William Saurin,François Artiguenave,Philippe Brottier,Thomas Brüls,Eric Pelletier,Catherine Robert,Patrick Wincker,André Rosenthal,Matthias Platzer,Gerald Nyakatura,Stefan Taudien,Andreas Rump,Douglas R. Smith,Lynn Doucette-Stamm,Marc Rubenfield,Keith Weinstock,Mei Lee Hong,Joann Dubois,Huanming Yang,Jun Yu,Jian Wang,Guyang Huang,Jun Gu,Leroy Hood,Lee Rowen,Anup Madan,Shizen Qin,Ronald W. Davis,Nancy A. Federspiel,A. Pia Abola,Michael Proctor,Bruce A. Roe,Feng Chen,Huaqin Pan,Juliane Ramser,Hans Lehrach,Richard Reinhardt,W. Richard McCombie,Melissa De La Bastide,Neilay Dedhia,H. Blöcker,K. Hornischer,Gabriele Nordsiek,Richa Agarwala,L. Aravind,Jeffrey A. Bailey,Alex Bateman,Serafim Batzoglou,Ewan Birney,Peer Bork,Daniel G. Brown,Christopher B. Burge,Lorenzo Cerutti,Hsiu Chuan Chen,Deanna M. Church,Michele Clamp,Richard R. Copley,Tobias Doerks,Sean R. Eddy,Evan E. Eichler,Terrence S. Furey,James E. Galagan,James G. R. Gilbert,Cyrus L. Harmon,Yoshihide Hayashizaki,David Haussler,Henning Hermjakob,Karsten Hokamp,Wonhee Jang,L. Steven Johnson,Thomas A. Jones,Simon Kasif,Arek Kaspryzk,Scot Kennedy,W. James Kent,Paul Kitts,Eugene V. Koonin,Ian F Korf,David Kulp,Doron Lancet,Todd M. Lowe,Aoife McLysaght,Tarjei S. Mikkelsen,John V. Moran,Nicola Mulder,Victor J. Pollara,Chris P. Ponting,Greg Schuler,Jörg Schultz,Guy Slater,Arian F.A. Smit,Elia Stupka,Joseph Szustakowki,Danielle Thierry-Mieg,Jean Thierry-Mieg,Lukas Wagner,John W. Wallis,Raymond Wheeler,Alan Williams,Yuri I. Wolf,Kenneth H. Wolfe,Shiaw Pyng Yang,Ru Fang Yeh,Francis S. Collins,Mark S. Guyer,Jane Peterson,Adam Felsenfeld,Kris A. Wetterstrand,Richard M. Myers,Jeremy Schmutz,Mark Dickson,Jane Grimwood,David R. Cox,Maynard V. Olson,Rajinder Kaul,Christopher K. Raymond,Nobuyoshi Shimizu,Kazuhiko Kawasaki,Shinsei Minoshima,Glen A. Evans,Maria Athanasiou,Roger A. Schultz,Aristides Patrinos,Michael J. Morgan +248 more