Showing papers by "René L. Warren" published in 2011


Journal Article
TL;DR: It is shown that the sensitivity of sequence-based repertoire profiling is limited by both sequencing depth and sequencing accuracy, and a new, directly measured, lower limit on individual T-cell repertoire size is established.
Abstract: Massively parallel sequencing is a useful approach for characterizing T-cell receptor diversity. However, immune receptors are extraordinarily difficult sequencing targets because any given receptor variant may be present in very low abundance and may differ legitimately by only a single nucleotide. We show that the sensitivity of sequence-based repertoire profiling is limited by both sequencing depth and sequencing accuracy. At two timepoints, 1 wk apart, we isolated bulk PBMC plus naive (CD45RA+/CD45RO-) and memory (CD45RA-/CD45RO+) T-cell subsets from a healthy donor. From T-cell receptor beta chain (TCRB) mRNA we constructed and sequenced multiple libraries to obtain a total of 1.7 billion paired sequence reads. The sequencing error rate was determined empirically and used to inform a high stringency data filtering procedure. The error filtered data yielded 1,061,522 distinct TCRB nucleotide sequences from this subject which establishes a new, directly measured, lower limit on individual T-cell repertoire size and provides a useful reference set of sequences for repertoire analysis. TCRB nucleotide sequences obtained from two additional donors were compared to those from the first donor and revealed limited sharing (up to 1.1%) of nucleotide sequences among donors, but substantially higher sharing (up to 14.2%) of inferred amino acid sequences. For each donor, shared amino acid sequences were encoded by a much larger diversity of nucleotide sequences than were unshared amino acid sequences. We also observed a highly statistically significant association between numbers of shared sequences and shared HLA class I alleles.

367 citations


Journal Article
01 Mar 2011-Mbio
TL;DR: Genetic comparisons revealed that the genomes of the two representative C. gattii strains (genotypes VGI and VGIIa) are colinear for the majority of chromosomes, with some minor rearrangements; however, multiortholog phylogenetic analysis and an evaluation of gene/sequence conservation support the existence of speciation within the C. gattii complex.
Abstract: Cryptococcus gattii recently emerged as the causative agent of cryptococcosis in healthy individuals in western North America, despite previous characterization of the fungus as a pathogen in tropical or subtropical regions. As a foundation to study the genetics of virulence in this pathogen, we sequenced the genomes of a strain (WM276) representing the predominant global molecular type (VGI) and a clinical strain (R265) of the major genotype (VGIIa) causing disease in North America. We compared these C. gattii genomes with each other and with the genomes of representative strains of the two varieties of Cryptococcus neoformans that generally cause disease in immunocompromised people. Our comparisons included chromosome alignments, analysis of gene content and gene family evolution, and comparative genome hybridization (CGH). These studies revealed that the genomes of the two representative C. gattii strains (genotypes VGI and VGIIa) are colinear for the majority of chromosomes, with some minor rearrangements. However, multiortholog phylogenetic analysis and an evaluation of gene/sequence conservation support the existence of speciation within the C. gattii complex. More extensive chromosome rearrangements were observed upon comparison of the C. gattii and the C. neoformans genomes. Finally, CGH revealed considerable variation in clinical and environmental isolates as well as changes in chromosome copy numbers in C. gattii isolates displaying fluconazole heteroresistance.

190 citations


Journal Article
13 May 2011-PLOS ONE
TL;DR: An approach for mining large sequence data sets for the presence of microbial sequences and demonstrating the sensitivity of this approach by sequencing human RNA-seq libraries spiked with decreasing amounts of an RNA-virus.
Abstract: Massively parallel sequencing technology now provides the opportunity to sample the transcriptome of a given tissue comprehensively. Transcripts at only a few copies per cell are readily detectable, allowing the discovery of low abundance viral and bacterial transcripts in human tissue samples. Here we describe an approach for mining large sequence data sets for the presence of microbial sequences. Further, we demonstrate the sensitivity of this approach by sequencing human RNA-seq libraries spiked with decreasing amounts of an RNA-virus. At a modest depth of sequencing, viral transcripts can be detected at frequencies less than 1 in 1,000,000. With current sequencing platforms approaching outputs of one billion reads per run, this is a highly sensitive method for detecting putative infectious agents associated with human tissues.
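
The link between sequencing depth and detection sensitivity can be illustrated with a small calculation. The sketch below is not taken from the paper: it assumes reads are sampled independently, so the chance of seeing at least one read from a transcript present at frequency f among N reads is 1 - (1 - f)^N. The function name `detection_probability` and the depths used are hypothetical.

```python
import math


def detection_probability(frequency: float, depth: int) -> float:
    """Probability of sampling at least one read from a transcript.

    Assumes (for illustration only) that each of `depth` reads is drawn
    independently and comes from the transcript with probability `frequency`.
    """
    # 1 - (1 - f)^N, written with log1p/expm1 to stay accurate for tiny f.
    return -math.expm1(depth * math.log1p(-frequency))


if __name__ == "__main__":
    # Hypothetical depths; the frequency matches the 1 in 1,000,000 figure.
    for depth in (100_000, 1_000_000, 10_000_000):
        p = detection_probability(1e-6, depth)
        print(f"depth={depth:>10,d}  P(at least one read)={p:.4f}")
```

Under this simplified model, a single read is only weak evidence; in practice a detection threshold of several reads would lower these probabilities.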

60 citations


Journal Article
11 May 2011-PLOS ONE
TL;DR: TASR is presented, a streamlined assembler that interrogates very large NGS data sets for the presence of specific variants by only considering reads within the sequence space of input target sequences provided by the user.
Abstract: As next-generation sequence (NGS) production continues to increase, analysis is becoming a significant bottleneck. However, in situations where information is required only for specific sequence variants, it is not necessary to assemble or align whole genome data sets in their entirety. Rather, NGS data sets can be mined for the presence of sequence variants of interest by localized assembly, which is a faster, easier, and more accurate approach. We present TASR, a streamlined assembler that interrogates very large NGS data sets for the presence of specific variants by only considering reads within the sequence space of input target sequences provided by the user. The NGS data set is searched for reads with an exact match to all possible short words within the target sequence, and these reads are then assembled stringently to generate a consensus of the target and flanking sequence. Typically, variants of a particular locus are provided as different target sequences, and the presence of the variant in the data set being interrogated is revealed by a successful assembly outcome. However, TASR can also be used to find unknown sequences that flank a given target. We demonstrate that TASR has utility in finding or confirming genomic mutations, polymorphisms, fusions and integration events. Targeted assembly is a powerful method for interrogating large data sets for the presence of sequence variants of interest. TASR is a fast, flexible and easy to use tool for targeted assembly.
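
The read-recruitment step described in this abstract (keeping only reads that share an exact short word with the target) can be sketched as follows. This is a minimal illustration, not TASR's code: the function names, the word length `k`, and the handling of reverse-complement reads are assumptions, and the stringent assembly step that follows recruitment is not shown.

```python
from typing import Iterable, List, Set


def target_words(target: str, k: int) -> Set[str]:
    """Return every length-k word (k-mer) found in the target sequence."""
    target = target.upper()
    return {target[i:i + k] for i in range(len(target) - k + 1)}


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA sequence made of A, C, G, T."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def recruit_reads(reads: Iterable[str], target: str, k: int = 15) -> List[str]:
    """Keep reads sharing at least one exact k-mer with the target.

    Each read is tested in both orientations (an assumption made here);
    recruited reads are returned in the orientation that matched.
    """
    words = target_words(target, k)
    recruited = []
    for read in reads:
        read = read.upper()
        for candidate in (read, reverse_complement(read)):
            kmers = (candidate[i:i + k] for i in range(len(candidate) - k + 1))
            if any(kmer in words for kmer in kmers):
                recruited.append(candidate)
                break
    return recruited


if __name__ == "__main__":
    target = "ACGTACGTTAGCCGATAGGCTA"  # hypothetical target sequence
    reads = ["TTAGCCGATAGGGG", "GGGGGGGGGGGG", "CCTATCGGCTAA"]
    print(recruit_reads(reads, target, k=8))
```

Because only reads within the sequence space of the target are kept, the subsequent assembly works on a small fraction of the full data set, which is the source of the speed advantage the abstract claims.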

50 citations


Journal Article
29 Apr 2011-PLOS ONE
TL;DR: HERV-W env diversity together with Syncytin-1 abundance and host immune gene profiles were examined in the nervous system using a multiplatform approach to reinforce the potential contributions of HERV expression to neuroinflammatory diseases.
Abstract: Background: The glycoprotein, Syncytin-1, is encoded by a human endogenous retrovirus (HERV)-W env gene and is capable of inducing neuroinflammation. The specific allele(s) responsible for Syncytin-1 expression in the brain is uncertain. Herein, HERV-W env diversity together with Syncytin-1 abundance and host immune gene profiles were examined in the nervous system using a multiplatform approach. Results: HERV-W env sequences were encoded by multiple chromosomal encoding loci in primary human neurons compared with less chromosomal diversity in astrocytes and microglia (p<0.05). HERV-W env RNA sequences cloned from brains of patients with systemic or neurologic diseases were principally derived from chromosomal locus 7q21.2. Within the same specimens, HERV-W env transcript levels were correlated with the expression of multiple proinflammatory genes (p<0.05). Deep sequencing of brain transcriptomes disclosed the env transcripts to be the most abundant HERV-W transcripts, showing greater expression in fetal compared with healthy adult brain specimens. Syncytin-1's expression in healthy brain specimens was derived from multiple encoding loci and linked to distinct immune and developmental gene profiles. Conclusions: Syncytin-1 expression in the brain during disease was associated with neuroinflammation and was principally encoded by a full length provirus. The present studies also highlighted the diversity in HERV gene expression within the brain and reinforce the potential contributions of HERV expression to neuroinflammatory diseases.

31 citations


Journal Article
TL;DR: TASR is presented, a streamlined assembler that interrogates very large NGS data sets for the presence of specific variants, by only considering reads within the sequence space of input target sequences provided by the user.
Abstract: As next-generation sequence (NGS) production continues to increase, analysis is becoming a significant bottleneck. However, in situations where information is required only for specific sequence variants, it is not necessary to assemble or align whole genome data sets in their entirety. Rather, NGS data sets can be mined for the presence of sequence variants of interest by localized assembly, which is a faster, easier, and more accurate approach. We present TASR, a streamlined assembler that interrogates very large NGS data sets for the presence of specific variants, by only considering reads within the sequence space of input target sequences provided by the user. The NGS data set is searched for reads with an exact match to all possible short words within the target sequence, and these reads are then assembled stringently to generate a consensus of the target and flanking sequence. Typically, variants of a particular locus are provided as different target sequences, and the presence of the variant in the data set being interrogated is revealed by a successful assembly outcome. However, TASR can also be used to find unknown sequences that flank a given target. We demonstrate that TASR has utility in finding or confirming genomic mutations, polymorphism, fusion and integration events. Targeted assembly is a powerful method for interrogating large data sets for the presence of sequence variants of interest. TASR is a fast, flexible and easy to use tool for targeted assembly.

6 citations


Proceedings Article
09 Feb 2011
TL;DR: A parallel version of SSAKE, one of the first and more popular implementations of the sequence hashing algorithm, is presented, which enables a fast, lightweight and scalable solution for modern genome assembly.
Abstract: Present advances in sequencing technology make it possible to generate large amounts of data in a short time. The problem is that fragments produced by these high-throughput methods are much shorter than in traditional Sanger sequencing, which makes the need for an efficient sequence assembly algorithm pressing. While two common approaches are currently applied to genome assembly, overlap graph and sequence hashing, the latter allows aggressive assembly of millions of short fragments at a reasonable memory and computational cost. In particular, SSAKE, one of the first and more popular implementations of the sequence hashing algorithm, was designed to leverage the information from short sequence reads by stringently assembling them into contiguous sequences that can be used to characterize novel sequencing targets. In this paper we present a parallel version of this tool that enables a fast, lightweight and scalable solution for modern genome assembly.
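
The general idea of hashing-based greedy assembly mentioned in this abstract can be illustrated with a toy example. The sketch below is an assumption-laden simplification and not SSAKE's implementation or the parallel version presented in the paper: reads are indexed in a hash table by their first `k` bases, and a seed is extended to the right only while exactly one unused read overlaps the current end. All names and the fixed overlap length `k` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def index_by_prefix(reads: List[str], k: int) -> Dict[str, List[str]]:
    """Hash each read by its first k bases for constant-time overlap lookup."""
    index: Dict[str, List[str]] = defaultdict(list)
    for read in reads:
        if len(read) > k:  # a read must add at least one new base
            index[read[:k]].append(read)
    return index


def extend_right(seed: str, reads: List[str], k: int) -> str:
    """Greedily extend `seed` in the 3' direction using k-base overlaps.

    Stringency rule (an assumption for this sketch): extend only when
    exactly one distinct unused read overlaps the current contig end.
    """
    index = index_by_prefix(reads, k)
    used = set()
    contig = seed
    while True:
        candidates = {r for r in index.get(contig[-k:], []) if r not in used}
        if len(candidates) != 1:
            break  # no overlap, or an ambiguous extension
        read = candidates.pop()
        used.add(read)
        contig += read[k:]  # append the part of the read past the overlap
    return contig


if __name__ == "__main__":
    reads = ["GTACGGA", "CGGATTC", "ATTCAA"]  # hypothetical short reads
    print(extend_right("ACGTAC", reads, k=4))
```

Each lookup touches only the hash bucket for the current contig end, which is why this style of assembly keeps memory and computation modest, and why independent seeds are natural units of work for a parallel version.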

2 citations