Unicycler: resolving bacterial genome assemblies from short and long sequencing reads
Citations
SPAdes, a new genome assembly algorithm and its applications to single-cell sequencing (7th Annual SFAF Meeting, 2012)
Plasmid-encoded tet(X) genes that confer high-level tigecycline resistance in Escherichia coli.
Accurate and Complete Genomes from Metagenomes
Oxford Nanopore R10.4 long-read sequencing enables the generation of near-finished bacterial genomes from pure cultures and metagenomes without short-read or reference polishing
Genomic diversity affects the accuracy of bacterial single-nucleotide polymorphism-calling pipelines.
References
Fast gapped-read alignment with Bowtie 2
SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing
SPAdes, a new genome assembly algorithm and its applications to single-cell sequencing (7th Annual SFAF Meeting, 2012)
Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM
T-Coffee: A novel method for fast and accurate multiple sequence alignment.
Related Papers (5)
TruSPAdes: barcode assembly of TruSeq synthetic long reads
Frequently Asked Questions (14)
Q2. What are the requirements for Unicycler to be appropriate in such cases?
Algorithmic improvements to long-read alignment, path finding and graph manipulations will all be required for Unicycler to be appropriate in such cases.
Q3. What is the main source of information for creating bridges?
There are two primary sources of information available for creating bridges: paired-end short reads, which can resolve small repeats, and long reads, which can resolve much larger repeats.
Q4. What is the cost-effective method of achieving a complete bacterial genome?
Hybrid assembly, which requires fewer long reads than long-read-only assembly, is the most cost-effective means of achieving this goal.
Q5. How many assemblies were performed for the short-read sets?
For the short-read sets, the authors performed five assemblies: Unicycler in each of its modes (conservative, normal and bold), SPAdes and ABySS.
Q6. How many reads did npScarf take to complete the assembly?
npScarf required 76 minutes of reads (9.0x) to complete the assembly, SPAdes took 102 minutes of reads (12.1x) and miniasm took 213 minutes of reads (25.3x).
Q7. Why was it included in the tests?
It was included in these tests because of its speed: it takes only a few minutes to run, which makes it potentially suitable for real-time analysis.
Q8. How many assemblies were performed using PBSIM?
All tests were performed in five replicates using separately generated synthetic reads, resulting in 16920 total assemblies.
Q9. How many rRNA operons were found in the reference genome?
The authors produced seven query sequences from the reference genome: each rRNA operon along with 2 kbp of neighbouring sequence on each end.
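As a rough illustration of how such a query sequence could be cut out, the sketch below extracts a region plus a fixed flank on each side. The function name, the 0-based half-open coordinates and the treatment of the genome as circular (so that flanks wrap around the origin) are assumptions made for this example, not details taken from the paper.

```python
def extract_with_flanks(genome: str, start: int, end: int, flank: int = 2000) -> str:
    """Return genome[start:end] plus `flank` bases on each side.

    Coordinates are 0-based and half-open. The genome is assumed to be
    circular, so a flank that runs past either end wraps around.
    """
    n = len(genome)
    if not (0 <= start < end <= n):
        raise ValueError("region must satisfy 0 <= start < end <= len(genome)")
    length = (end - start) + 2 * flank
    if length > n:
        raise ValueError("region plus flanks is longer than the genome")
    # Shift the start back by the flank size, wrapping around the origin.
    begin = (start - flank) % n
    # Doubling the sequence lets a plain slice cross the origin.
    return (genome + genome)[begin:begin + length]
```

Calling this once per annotated operon would give one query sequence per operon.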
Q10. What is the way to polish the genome?
By iteratively polishing the genome with both short and long reads, this process can correct many remaining errors in a completed assembly, including those in repeat regions.
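The iterative part of this process can be sketched as a loop that repeats a polishing round until a round makes no further changes. This is only a generic outline: `polish_round` is a hypothetical callable standing in for whatever short-read or long-read polishing tool is used, and the `max_rounds` cap is an assumption added to guarantee termination.

```python
from typing import Callable, Tuple


def polish_until_stable(
    assembly: str,
    polish_round: Callable[[str], Tuple[str, int]],
    max_rounds: int = 10,
) -> Tuple[str, int]:
    """Apply polishing rounds until one reports zero changes.

    `polish_round` (hypothetical) takes a sequence and returns the
    polished sequence together with the number of changes it made.
    Returns the final sequence and the number of rounds that were run.
    """
    rounds = 0
    while rounds < max_rounds:
        assembly, changes = polish_round(assembly)
        rounds += 1
        if changes == 0:
            # The last round changed nothing, so the sequence is stable.
            break
    return assembly, rounds
```

In practice each round would wrap a read alignment step followed by a consensus-correction step.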
Q11. What assemblers performed the fastest on the simulated data?
SPAdes and npScarf performed the fastest, both having a median time of eight minutes and maximum time of less than 25 minutes on the same data.
Q12. What is the way to polish the assembly graph?
As a final step, Unicycler uses Bowtie2 and Pilon to polish the assembly using short-read alignments, reducing the rate of small errors (Fig 1G)[21,22].
Q13. How many reads were generated in a four hour period?
To investigate each assembler’s suitability for such real-time analysis, the authors generated 240 sub-sets of reads, one set per minute of sequencing, each containing all reads generated up to that minute (e.g. set 60 contained all reads generated in the first hour of sequencing).
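The cumulative sub-setting described above can be sketched as follows. The input format (a read ID paired with the number of seconds since the start of the run) and the function name are assumptions for illustration; the paper does not specify how the sub-sets were produced.

```python
from typing import Dict, List, Tuple


def cumulative_read_sets(
    reads: List[Tuple[str, float]], total_minutes: int = 240
) -> Dict[int, List[str]]:
    """Build one read set per minute of sequencing.

    `reads` holds (read_id, seconds_since_run_start) pairs. Set number m
    contains the IDs of every read generated in the first m minutes.
    """
    ordered = sorted(reads, key=lambda r: r[1])
    subsets: Dict[int, List[str]] = {}
    current: List[str] = []
    i = 0
    for minute in range(1, total_minutes + 1):
        cutoff = minute * 60.0
        # Add every read that finished at or before this minute's cutoff.
        while i < len(ordered) and ordered[i][1] <= cutoff:
            current.append(ordered[i][0])
            i += 1
        subsets[minute] = list(current)
    return subsets
```

With the default of 240 minutes this yields 240 sets, and set 60 holds all reads from the first hour, matching the example in the text.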
Q14. What is the way to correct errors in the sequence?
They may first use short reads to correct errors in long reads, followed by assembly of the corrected long reads[6,7].