Optimization of de novo transcriptome assembly from next-generation sequencing data
TL;DR: Two new methods for substantially improving de novo transcriptome assembly were used to successfully assemble the transcripts of the core set of genes regulating tooth development in vertebrates, whereas classic de novo assembly failed.
Abstract:
Transcriptome analysis has important applications in many biological fields. However, assembling a transcriptome without a known reference remains a challenging task that requires algorithmic improvements. We present two methods that substantially improve de novo transcriptome assembly. The first method relies on the observation that current de novo assemblers use a single k-mer length, which is suboptimal for assembling transcriptomes in which sequence coverage is highly heterogeneous across transcripts. We present the Multiple-k method, in which several k-mer lengths are used for de novo transcriptome assembly. We demonstrate its good performance by assembling de novo a published next-generation transcriptome sequence data set of Aedes aegypti, using the existing genome to check the accuracy of our method. The second method uses a reference proteome to improve the de novo assembly. We developed the Scaffolding using Translation Mapping (STM) method, which maps contigs against the closest available reference proteome and scaffolds contigs that map onto the same protein. In a controlled experiment using simulated data, we show that the STM method considerably improves the assembly, with few errors. We applied these two methods to assemble the transcriptome of the non-model catfish Loricaria gr. cataphracta. With the Multiple-k and STM methods, the assembly improves in both contiguity and gene identification, showing that our methods clearly improve quality and can be widely used. The new methods were used to successfully assemble the transcripts of the core set of genes regulating tooth development in vertebrates, whereas classic de novo assembly failed.
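The abstract argues that no single k-mer length suits transcripts with very different coverage, but it does not say how the assemblies obtained at different k values are combined. The toy sketch below is an illustration under stated assumptions and is not the authors' pipeline. It builds a minimal de Bruijn graph at each k, extracts the unbranched paths (unitigs), pools them, and drops any contig that is an exact substring of a longer one. A small k can join weakly covered reads that overlap only briefly, while a large k keeps a repeated short motif from breaking a well-covered transcript. The names `unitigs` and `multiple_k_assembly`, the substring-based redundancy filter, and the error-free, single-stranded reads are all hypothetical simplifications.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every overlapping substring of length k."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def unitigs(reads, k):
    """Toy de Bruijn assembly: (k-1)-mer nodes, one edge per distinct k-mer.

    Returns the maximal non-branching paths (unitigs). Assumed simplifications:
    reads are error-free and single-stranded, and isolated cycles are ignored.
    """
    kmer_set = {km for read in reads for km in kmers(read, k)}
    succ, pred = defaultdict(set), defaultdict(set)
    for km in kmer_set:
        succ[km[:-1]].add(km[1:])
        pred[km[1:]].add(km[:-1])
    nodes = set(succ) | set(pred)

    def internal(node):
        # exactly one way in and one way out: the path continues through it
        return len(pred[node]) == 1 and len(succ[node]) == 1

    contigs = []
    for start in nodes:
        if internal(start):
            continue  # unitigs begin only at tips or branch points
        for nxt in succ[start]:
            contig, node = start + nxt[-1], nxt
            while internal(node):
                node = next(iter(succ[node]))
                contig += node[-1]
            contigs.append(contig)
    return contigs


def multiple_k_assembly(reads, ks):
    """Assemble at every k in ks, pool the contigs and drop redundant ones.

    Hypothetical redundancy rule: a contig is discarded when it is an exact
    substring of a longer contig that has already been kept.
    """
    pooled = {c for k in ks for c in unitigs(reads, k)}
    kept = []
    for contig in sorted(pooled, key=len, reverse=True):
        if not any(contig in longer for longer in kept):
            kept.append(contig)
    return kept


if __name__ == "__main__":
    t1 = "ACAGAGTAGACC"      # weakly covered: two reads overlapping by 4 nt
    t2 = "GGATGCTTCCATGCAA"  # well covered, but contains the 4-mer ATGC twice
    reads = [t1[0:8], t1[4:12], t2[0:10], t2[3:13], t2[6:16]]
    print(len(unitigs(reads, 5)), len(unitigs(reads, 7)))  # 4 3
    print(sorted(multiple_k_assembly(reads, (5, 7))))
```

In this example, k = 5 joins the two reads of `t1` but breaks `t2` at its repeated 4-mer, and k = 7 does the opposite. The pooled, de-duplicated set therefore contains both full-length sequences, which no single k produced on its own.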
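The STM idea can be illustrated with a short sketch, which is not the published implementation: contigs whose translations map onto the same reference protein are ordered by their position along that protein and joined into one scaffold. The hit format (`Hit` with 1-based protein coordinates), the one-best-hit-per-contig rule, the assumption that contigs are already on the coding strand, and the "3 N per missing residue" gap size are all hypothetical. In practice the hits would come from a translated similarity search against the closest reference proteome.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Hit(NamedTuple):
    """One translated-search hit of a contig on a reference protein (hypothetical format)."""
    contig: str    # contig identifier
    protein: str   # reference protein the contig maps to
    p_start: int   # first aligned protein residue (1-based)
    p_end: int     # last aligned protein residue (1-based, inclusive)


def scaffold_by_protein(contigs: Dict[str, str], hits: List[Hit],
                        min_gap_nt: int = 1) -> Dict[str, str]:
    """Join contigs that hit the same protein, ordered along that protein.

    Assumes one best hit per contig and contigs already on the coding strand.
    The N-run between neighbours is 3 nt per missing residue, a rough estimate
    that ignores any unaligned sequence at contig ends; overlapping hits are
    separated by min_gap_nt Ns.
    """
    by_protein = defaultdict(list)
    for hit in hits:
        by_protein[hit.protein].append(hit)

    scaffolds = {}
    for protein, group in by_protein.items():
        group.sort(key=lambda h: h.p_start)  # order contigs along the protein
        seq = contigs[group[0].contig]
        for prev, cur in zip(group, group[1:]):
            gap_aa = cur.p_start - prev.p_end - 1  # residues not covered by either hit
            seq += "N" * max(min_gap_nt, 3 * gap_aa) + contigs[cur.contig]
        scaffolds[protein] = seq
    return scaffolds
```

For example, two contigs aligned to residues 1-2 and 10-11 of the same protein leave 7 residues uncovered, so they are joined with 21 Ns. A contig that is the only hit on its protein is returned unchanged.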