
Showing papers by "René L. Warren" published in 2021


Journal Article
TL;DR: This work predicted HLA class I and II alleles from the transcriptome sequencing data prepared from the bronchoalveolar lavage fluid samples of five patients at the early stage of the COVID-19 outbreak and identified the HLA-I allele A*24:02 in four out of five patients, which is higher than the expected frequency in the South Han Chinese population.
Abstract: We are in the midst of a global viral pandemic, one with no cure and a high mortality rate. The Human Leukocyte Antigen (HLA) gene complex plays a critical role in host immunity. We predicted HLA class I and II alleles from the transcriptome sequencing data prepared from the bronchoalveolar lavage fluid samples of five patients at the early stage of the COVID-19 outbreak. We identified the HLA-I allele A*24:02 in four out of five patients, which is higher than the expected frequency (17.2%) in the South Han Chinese population. The difference is statistically significant with a p-value less than 10^-4. Our analysis results may help provide future insights on disease susceptibility.

30 citations


Journal Article
TL;DR: LongStitch as discussed by the authors is a scalable pipeline that corrects and scaffolds draft genome assemblies exclusively using long reads, which can be used for resolving problematic regions and helping generate more complete draft assemblies.
Abstract: Generating high-quality de novo genome assemblies is foundational to the genomics study of model and non-model organisms. In recent years, long-read sequencing has greatly benefited genome assembly and scaffolding, a process by which assembled sequences are ordered and oriented through the use of long-range information. Long reads are better able to span repetitive genomic regions compared to short reads, and thus have tremendous utility for resolving problematic regions and helping generate more complete draft assemblies. Here, we present LongStitch, a scalable pipeline that corrects and scaffolds draft genome assemblies exclusively using long reads. LongStitch incorporates multiple tools developed by our group and runs in up to three stages, which include initial assembly correction (Tigmint-long), followed by two incremental scaffolding stages (ntLink and ARKS-long). Tigmint-long and ARKS-long are misassembly correction and scaffolding utilities, respectively, previously developed for linked reads, that we adapted for long reads. Here, we describe the LongStitch pipeline and introduce our new long-read scaffolder, ntLink, which utilizes lightweight minimizer mappings to join contigs. LongStitch was tested on short and long-read assemblies of Caenorhabditis elegans, Oryza sativa, and three different human individuals using corresponding nanopore long-read data, and improves the contiguity of each assembly from 1.2-fold up to 304.6-fold (as measured by NGA50 length). Furthermore, LongStitch generates more contiguous and correct assemblies compared to state-of-the-art long-read scaffolder LRScaf in most tests, and consistently improves upon human assemblies in under five hours using less than 23 GB of RAM. Due to its effectiveness and efficiency in improving draft assemblies using long reads, we expect LongStitch to benefit a wide variety of de novo genome assembly projects.
The LongStitch pipeline is freely available at https://github.com/bcgsc/longstitch.
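The minimizer idea mentioned in the abstract can be illustrated with a minimal sketch. This is not ntLink's implementation: the function names `minimizers` and `shared_minimizers` are hypothetical, and k-mers are ordered lexicographically here for readability, whereas production tools typically order them by a hash value. For every window of `w` consecutive k-mers, the smallest k-mer is kept, so two sequences that overlap tend to select the same k-mers.

```python
def minimizers(seq, k, w):
    """Return (position, k-mer) pairs chosen as window minimizers.

    Assumption: lexicographic ordering, ties broken by leftmost position.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    selected = set()
    # Slide a window over w consecutive k-mers and keep the smallest one.
    for start in range(len(kmers) - w + 1):
        best = min(range(start, start + w), key=lambda i: (kmers[i], i))
        selected.add(best)
    return [(i, kmers[i]) for i in sorted(selected)]


def shared_minimizers(a, b, k, w):
    """Return the set of minimizer k-mers selected in both sequences."""
    kmers_a = {kmer for _, kmer in minimizers(a, k, w)}
    kmers_b = {kmer for _, kmer in minimizers(b, k, w)}
    return kmers_a & kmers_b


if __name__ == "__main__":
    contig = "ACGTACGT"  # toy sequence, not real data
    print(minimizers(contig, k=3, w=2))
```

Because only a subset of k-mers is stored, a sketch like this keeps the index small, which is the sense in which minimizer mappings are described as lightweight.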

14 citations


Journal Article
15 Oct 2021-PeerJ
TL;DR: In this article, the authors have identified prevalent HLA class I and class II alleles, including DPA1*02:02, in two small patient cohorts at the COVID-19 pandemic onset.
Abstract: Background: The Human Leukocyte Antigen (HLA) gene locus plays a fundamental role in human immunity, and it is established that certain HLA alleles are disease determinants. Previously, we have identified prevalent HLA class I and class II alleles, including DPA1*02:02, in two small patient cohorts at the COVID-19 pandemic onset. Methods: We have since analyzed a larger public patient cohort dataset (n = 126 patients) with controls, associated demographic and clinical data. By combining the predictive power of multiple in silico HLA predictors, we report on HLA-I and HLA-II alleles, along with their associated risk significance. Results: We observe HLA-II DPA1*02:02 at a higher frequency in the COVID-19 positive cohort (29%) when compared to the COVID-negative control group (Fisher's exact test [FET] p = 0.0174). Having this allele, however, does not appear to put this cohort's patients at an increased risk of hospitalization. Inspection of COVID-19 disease severity outcomes, including admission to intensive care, reveals nominally significant risk associations with A*11:01 (FET p = 0.0078) and C*04:01 (FET p = 0.0087). The association with severe disease outcome is especially evident for patients with C*04:01, where disease prognosis measured by mechanical ventilation-free days was statistically significant after multiple hypothesis correction (Bonferroni p = 0.0323). While prevalence of some of these alleles falls below statistical significance after Bonferroni correction, COVID-19 patients with HLA-I C*04:01 tend to fare worse overall. This HLA allele may hold potential clinical value.
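The two statistical steps named in the abstract, Fisher's exact test on a 2x2 table and Bonferroni correction, can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the counts in the usage example are toy values rather than the study's data, and the number of hypotheses corrected for is not stated in the abstract, so `m` below is an assumption.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def bonferroni(p, m):
    """Bonferroni-adjusted p-value for m tested hypotheses (m is assumed)."""
    return min(1.0, p * m)


if __name__ == "__main__":
    # Hypothetical counts: carriers / non-carriers in cases and controls.
    p = fisher_exact_two_sided(3, 1, 1, 3)
    print(round(p, 4))                    # 0.4857
    print(round(bonferroni(p, m=2), 4))   # 0.9714
```

An association that is nominally significant can lose significance once multiplied by the number of alleles tested, which is the situation the abstract describes for some alleles.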

8 citations


Posted Content
18 Jun 2021-bioRxiv
TL;DR: The LongStitch pipeline is described and a new long-read scaffolder is introduced, ntLink, which utilizes lightweight minimizer mappings to join contigs and is expected to benefit a wide variety of de novo genome assembly projects.
Abstract: Background: Generating high-quality de novo genome assemblies is foundational to the genomics study of model and non-model organisms. In recent years, long-read sequencing has greatly benefited genome assembly and scaffolding, a process by which assembled sequences are ordered and oriented through the use of long-range information. Long reads are better able to span repetitive genomic regions compared to short reads, and thus have tremendous utility for resolving problematic regions and helping generate more complete draft assemblies. Here, we present LongStitch, a scalable pipeline that corrects and scaffolds draft genome assemblies exclusively using long reads. Results: LongStitch incorporates multiple tools developed by our group and runs in up to three stages, which include initial assembly correction (Tigmint-long), followed by two incremental scaffolding stages (ntLink and ARKS-long). Tigmint-long and ARKS-long are misassembly correction and scaffolding utilities, respectively, previously developed for linked reads, that we adapted for long reads. Here, we describe the LongStitch pipeline and introduce our new long-read scaffolder, ntLink, which utilizes lightweight minimizer mappings to join contigs. LongStitch was tested on short and long-read assemblies of three different human individuals using corresponding nanopore long-read data, and improves the contiguity of each assembly from 2.0-fold up to 304.6-fold (as measured by NGA50 length). Furthermore, LongStitch generates more contiguous and correct assemblies compared to state-of-the-art long-read scaffolder LRScaf in most tests, and consistently runs in under five hours using less than 23 GB of RAM. Conclusions: Due to its effectiveness and efficiency in improving draft assemblies using long reads, we expect LongStitch to benefit a wide variety of de novo genome assembly projects. The LongStitch pipeline is freely available at https://github.com/bcgsc/longstitch.

6 citations


Journal Article
TL;DR: This work built interactive scalable vector graphics maps that show daily nucleotide variations in genomes from the six most populated continents compared to that of the initial, ground-zero SARS-CoV-2 isolate sequenced at the beginning of the year.
Abstract: As the year 2020 draws to an end, several new strains have been reported for the SARS-CoV-2 coronavirus, the agent responsible for the COVID-19 pandemic that has afflicted us all this past year. However, it is difficult to comprehend the scale, in sequence space, geographical location and time, at which SARS-CoV-2 mutates and evolves in its human hosts. To get an appreciation for the rapid evolution of the coronavirus, we built interactive scalable vector graphics maps that show daily nucleotide variations in genomes from the six most populated continents compared to that of the initial, ground-zero SARS-CoV-2 isolate sequenced at the beginning of the year. Availability: Mutation time maps are available from https://bcgsc.github.io/SARS2/.
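Comparing a genome to a reference isolate, as described above, amounts to listing the positions where the two differ. The sketch below is a simplified illustration and not the authors' pipeline: it assumes the two sequences are already aligned to the same length, the function name `substitutions` is hypothetical, and ambiguous bases and gap characters are skipped rather than reported.

```python
def substitutions(reference, sample):
    """List (1-based position, reference base, sample base) differences.

    Assumes pre-aligned sequences of equal length. Positions where either
    sequence holds something other than A, C, G or T are ignored.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    return [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != alt_base and ref_base in valid and alt_base in valid
    ]


if __name__ == "__main__":
    # Toy sequences for illustration, not SARS-CoV-2 data.
    print(substitutions("ACGTAC", "ACTTAN"))  # [(3, 'G', 'T')]
```

Grouping such lists by collection date and region would give the per-day, per-continent counts that a time map displays.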

1 citation


Posted Content
20 Nov 2021-bioRxiv
TL;DR: Meta-NanoSim, as described in this paper, is a fast and versatile utility that characterizes and simulates the unique properties of nanopore metagenomic reads and improves upon state-of-the-art methods on microbial abundance estimation through a base-level quantification algorithm.
Abstract: Nanopore sequencing is crucial to metagenomic studies as its kilobase-long reads can contribute to resolving genomic structural differences among microbes. However, platform-specific challenges, including high base-call error rate, non-uniform read lengths, and the presence of chimeric artifacts, necessitate specifically designed analytical tools. Here, we present Meta-NanoSim, a fast and versatile utility that characterizes and simulates the unique properties of nanopore metagenomic reads. Further, Meta-NanoSim improves upon state-of-the-art methods on microbial abundance estimation through a base-level quantification algorithm. We demonstrate that Meta-NanoSim simulated data can facilitate the development of metagenomic algorithms and guide experimental design through a metagenomic assembly benchmarking task.

1 citation


Posted Content
Eric Chen, Justin Chu, Jessica Zhang, René L. Warren, Inanc Birol
TL;DR: GapPredict, as discussed by the authors, uses a character-level language model to predict unresolved nucleotides in scaffold gaps, and can fill 65.6% of the sampled gaps that were left unfilled by the state-of-the-art gap-filling tool Sealer.
Abstract: Short-read DNA sequencing instruments can yield over 1e+12 bases per run, typically composed of reads 150 bases long. Despite this high throughput, de novo assembly algorithms have difficulty reconstructing contiguous genome sequences using short reads due to both repetitive and difficult-to-sequence regions in these genomes. Some of the short read assembly challenges are mitigated by scaffolding assembled sequences using paired-end reads. However, unresolved sequences in these scaffolds appear as "gaps". Here, we introduce GapPredict, a tool that uses a character-level language model to predict unresolved nucleotides in scaffold gaps. We benchmarked GapPredict against the state-of-the-art gap-filling tool Sealer, and observed that the former can fill 65.6% of the sampled gaps that were left unfilled by the latter, demonstrating the practical utility of deep learning approaches to the gap-filling problem in genome sequence assembly.
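To make the notion of a character-level language model concrete, the sketch below predicts the next nucleotide from the preceding `order` characters using simple k-mer counts. This is a deliberately swapped-in, much simpler technique: GapPredict is described as a deep learning approach, and nothing here reflects its architecture or training procedure. The function names `train` and `extend` and the toy sequence are hypothetical.

```python
from collections import Counter, defaultdict


def train(sequences, order):
    """Count which base follows each context of `order` characters."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(len(seq) - order):
            counts[seq[i:i + order]][seq[i + order]] += 1
    return counts


def extend(model, seed, order, n_bases):
    """Greedily predict up to n_bases characters following the seed.

    Stops early if the current context was never seen in training.
    Ties are broken alphabetically so the output is deterministic.
    """
    out = seed
    for _ in range(n_bases):
        context = out[-order:]
        if context not in model:
            break
        followers = model[context]
        out += max(sorted(followers), key=followers.get)
    return out[len(seed):]


if __name__ == "__main__":
    model = train(["ACGTACGTACGT"], order=3)  # toy training data
    print(extend(model, "ACG", order=3, n_bases=5))  # TACGT
```

In a gap-filling setting, the seed would be the flanking sequence on one side of the gap and the predicted characters would be candidate gap bases; a neural model replaces the count table with learned parameters that can generalize to unseen contexts.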