Topic

2 base encoding

About: 2 base encoding is a research topic. Over its lifetime, 210 publications have been published within this topic, receiving 81,608 citations.


Papers
Journal Article
TL;DR: The results indicate that it is possible to map SMS reads with high accuracy and speed, and the inferences made on the mapability of SMS reads using the combinatorial model of sequencing error are in agreement with the mapping accuracy demonstrated on simulated reads.
Abstract: Recent methods have been developed to perform high-throughput sequencing of DNA by Single Molecule Sequencing (SMS). While Next-Generation sequencing methods may produce reads up to several hundred bases long, SMS sequencing produces reads up to tens of kilobases long. Existing alignment methods are either too inefficient for high-throughput datasets, or not sensitive enough to align SMS reads, which have a higher error rate than Next-Generation sequencing. We describe the method BLASR (Basic Local Alignment with Successive Refinement) for mapping Single Molecule Sequencing (SMS) reads that are thousands of bases long, with divergence between the read and genome dominated by insertion and deletion error. The method is benchmarked using both simulated reads and reads from a bacterial sequencing project. We also present a combinatorial model of sequencing error that motivates why our approach is effective. The results indicate that it is possible to map SMS reads with high accuracy and speed. Furthermore, the inferences made on the mapability of SMS reads using our combinatorial model of sequencing error are in agreement with the mapping accuracy demonstrated on simulated reads.

1,085 citations

Journal Article
01 Jan 2016-Genomics
TL;DR: This article traverses those years, iterating through the different generations of sequencing technology, highlighting some of the key discoveries, researchers, and sequences along the way.

880 citations

Journal Article
TL;DR: The relevant concepts and issues raised by current high‐throughput DNA sequencing technologies are reviewed and compared, and the ways future developments may overcome their limitations are analyzed.
Abstract: Recent advances in DNA sequencing have revolutionized the field of genomics, making it possible for even single research groups to generate large amounts of sequence data very rapidly and at a substantially lower cost. These high-throughput sequencing technologies make deep transcriptome sequencing and transcript quantification, whole genome sequencing and resequencing available to many more researchers and projects. However, while the cost and time have been greatly reduced, the error profiles and limitations of the new platforms differ significantly from those of previous sequencing technologies. The selection of an appropriate sequencing platform for particular types of experiments is an important consideration, and requires a detailed understanding of the technologies available; including sources of error, error rate, as well as the speed and cost of sequencing. We review the relevant concepts and compare the issues raised by the current high-throughput DNA sequencing technologies. We analyze how future developments may overcome these limitations and what challenges remain.

651 citations

Journal Article
TL;DR: The 454 Sequencer has dramatically increased the volume of sequencing conducted by the scientific community and expanded the range of problems that can be addressed by the direct readouts of DNA sequence, leading to a better understanding of the structure of the human genome and opening up new approaches to identify small RNAs.
Abstract: The 454 Sequencer has dramatically increased the volume of sequencing conducted by the scientific community and expanded the range of problems that can be addressed by the direct readouts of DNA sequence. Key breakthroughs in the development of the 454 sequencing platform included higher throughput, simplified all in vitro sample preparation and the miniaturization of sequencing chemistries, enabling massively parallel sequencing reactions to be carried out at a scale and cost not previously possible. Together with other recently released next-generation technologies, the 454 platform has started to democratize sequencing, providing individual laboratories with access to capacities that rival those previously found only at a handful of large sequencing centers. Over the past 18 months, 454 sequencing has led to a better understanding of the structure of the human genome, allowed the first non-Sanger sequence of an individual human and opened up new approaches to identify small RNAs. To make next-generation technologies more widely accessible, they must become easier to use and less costly. In the longer term, the principles established by 454 sequencing might reduce cost further, potentially enabling personalized genomics.

568 citations

Journal Article
TL;DR: SSAKE is a tool for aggressively assembling millions of short nucleotide sequences by progressively searching through a prefix tree for the longest possible overlap between any two sequences to help leverage the information from short sequence reads by stringently assembling them into contiguous sequences that can be used to characterize novel sequencing targets.
Abstract: Summary: Novel DNA sequencing technologies with the potential for up to three orders of magnitude more sequence throughput than conventional Sanger sequencing are emerging. The instrument now available from Solexa Ltd produces millions of short DNA sequences of 25 nt each. Due to ubiquitous repeats in large genomes and the inability of short sequences to uniquely and unambiguously characterize them, the short read length limits applicability for de novo sequencing. However, given the sequencing depth and the throughput of this instrument, stringent assembly of highly identical sequences can be achieved. We describe SSAKE, a tool for aggressively assembling millions of short nucleotide sequences by progressively searching through a prefix tree for the longest possible overlap between any two sequences. SSAKE is designed to help leverage the information from short sequence reads by stringently assembling them into contiguous sequences that can be used to characterize novel sequencing targets. Availability: http://www.bcgsc.ca/bioinfo/software/ssake

542 citations
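
The overlap search described in the SSAKE abstract above can be illustrated with a minimal sketch. It is not the SSAKE implementation: the function names (build_prefix_trie, reads_with_prefix, extend_contig) and the min_overlap parameter are hypothetical, the trie is a plain nested dictionary, and reverse complements, coverage thresholds and ambiguity handling are deliberately left out. The sketch only shows the idea of extending a seed greedily by looking up, in a prefix tree, an unused read whose prefix matches the longest possible suffix of the growing contig.

```python
from collections import Counter


def build_prefix_trie(reads):
    """Build a nested-dict trie; the '$' key lists reads ending at a node."""
    root = {}
    for read in reads:
        node = root
        for base in read:
            node = node.setdefault(base, {})
        node.setdefault("$", []).append(read)
    return root


def reads_with_prefix(trie, prefix):
    """Return every read stored in the trie that starts with `prefix`."""
    node = trie
    for base in prefix:
        node = node.get(base)
        if node is None:
            return []
    found, stack = [], [node]
    while stack:
        current = stack.pop()
        for key, child in current.items():
            if key == "$":
                found.extend(child)
            else:
                stack.append(child)
    return found


def extend_contig(seed, reads, min_overlap=3):
    """Greedily extend `seed` to the right (illustrative only)."""
    unused = Counter(reads)
    unused[seed] -= 1  # the seed itself is consumed
    trie = build_prefix_trie(reads)
    longest_read = max(len(r) for r in reads)
    contig = seed
    extended = True
    while extended:
        extended = False
        # Try the longest candidate overlap first, then shorter ones.
        for k in range(min(len(contig), longest_read - 1), min_overlap - 1, -1):
            suffix = contig[-k:]
            candidates = [
                r for r in reads_with_prefix(trie, suffix)
                if unused[r] > 0 and len(r) > k
            ]
            if candidates:
                read = sorted(candidates)[0]  # deterministic tie-break
                unused[read] -= 1
                contig += read[k:]  # append only the non-overlapping tail
                extended = True
                break
    return contig


if __name__ == "__main__":
    toy_reads = ["ATGCGT", "GCGTAC", "TACGGA"]
    print(extend_contig("ATGCGT", toy_reads, min_overlap=3))
```

On the toy reads, the seed is first joined to the read sharing a 4-base overlap and then to the read sharing a 3-base overlap; extension stops once no unused read overlaps the contig end by at least min_overlap bases.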


Network Information
Related Topics (5)
Genome: 74.2K papers, 3.8M citations, 76% related
Gene: 211.7K papers, 10.3M citations, 73% related
Regulation of gene expression: 85.4K papers, 5.8M citations, 72% related
RNA: 111.6K papers, 5.4M citations, 72% related
DNA: 107.1K papers, 4.7M citations, 71% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2018    1
2017    9
2016    18
2015    22
2014    7
2013    25