Showing papers by "Jo Vandesompele" published in 2019


Journal ArticleDOI
TL;DR: The fifth release of the human lncRNA database LNCipedia is presented; the most notable improvements include manual literature curation of 2482 lncRNA articles and the use of official gene symbols when available.
Abstract: While long non-coding RNA (lncRNA) research in the past has primarily focused on the discovery of novel genes, today it has shifted towards functional annotation of this large class of genes. With thousands of lncRNA studies published every year, the current challenge lies in keeping track of which lncRNAs are functionally described. This is further complicated by the fact that lncRNA nomenclature is not straightforward and lncRNA annotation is scattered across different resources with their own quality metrics and definition of a lncRNA. To overcome this issue, large scale curation and annotation is needed. Here, we present the fifth release of the human lncRNA database LNCipedia (https://lncipedia.org). The most notable improvements include manual literature curation of 2482 lncRNA articles and the use of official gene symbols when available. In addition, an improved filtering pipeline results in a higher quality reference lncRNA gene set.

352 citations


Journal ArticleDOI
TL;DR: A new genome mapping pipeline that identifies genomic locations for ncRNA sequences in 296 species is added and several new types of functional annotations, such as tRNA secondary structures, Gene Ontology annotations, and miRNA-target interactions are added.
Abstract: RNAcentral is a comprehensive database of non-coding RNA (ncRNA) sequences, collating information on ncRNA sequences of all types from a broad range of organisms. We have recently added a new genome mapping pipeline that identifies genomic locations for ncRNA sequences in 296 species. We have also added several new types of functional annotations, such as tRNA secondary structures, Gene Ontology annotations, and miRNA-target interactions. A new quality control mechanism based on Rfam family assignments identifies potential contamination, incomplete sequences, and more. The RNAcentral database has become a vital component of many workflows in the RNA community, serving as both the primary source of sequence data for academic and commercial groups, as well as a source of stable accessions for the annotation of genomic and functional features. These examples are facilitated by an improved RNAcentral web interface, which features an updated genome browser, a new sequence feature viewer, and improved text search functionality. RNAcentral is freely available at https://rnacentral.org.

177 citations


Journal ArticleDOI
TL;DR: The challenges related to long noncoding RNA expression profiling are discussed, the new opportunities that cancer long noncoding RNAs provide for cancer diagnosis and treatment are highlighted, and future developments are reflected upon.
Abstract: In recent years, technological advances in transcriptome profiling revealed that the repertoire of human RNA molecules is more diverse and extended than originally thought. This diversity and complexity mainly derive from a large ensemble of noncoding RNAs. Because of their key roles in cellular processes important for normal development and physiology, disruption of noncoding RNA expression is intrinsically linked to human disease, including cancer. Therefore, studying the noncoding portion of the transcriptome offers the prospect of identifying novel therapeutic and diagnostic targets. Although evidence of the relevance of noncoding RNAs in cancer is accumulating, we still face many challenges when it comes to accurately profiling their expression levels. Some of these challenges are inherent to the technologies employed, whereas others are associated with characteristics of the noncoding RNAs themselves. In this review, we discuss the challenges related to long noncoding RNA expression profiling, highlight how cancer long noncoding RNAs provide new opportunities for cancer diagnosis and treatment, and reflect on future developments.

102 citations


Journal ArticleDOI
TL;DR: HIV-derived gag proteins are used to assemble recombinant fluorescent EV as a trackable reference material resembling the physical and biochemical properties of sample EV, which will aid EV-based sample preparation and analysis, data normalization, method development and instrument calibration in various research and biomedical applications.
Abstract: Recent years have seen an increase of extracellular vesicle (EV) research geared towards biological understanding, diagnostics and therapy. However, EV data interpretation remains challenging owing to complexity of biofluids and technical variation introduced during sample preparation and analysis. To understand and mitigate these limitations, we generated trackable recombinant EV (rEV) as a biological reference material. Employing complementary characterization methods, we demonstrate that rEV are stable and bear physical and biochemical traits characteristic of sample EV. Furthermore, rEV can be quantified using fluorescence-, RNA- and protein-based technologies available in routine laboratories. Spiking rEV in biofluids allows recovery efficiencies of commonly implemented EV separation methods to be identified, intra-method and inter-user variability induced by sample handling to be defined, and to normalize and improve sensitivity of EV enumerations. We anticipate that rEV will aid EV-based sample preparation and analysis, data normalization, method development and instrument calibration in various research and biomedical applications.

85 citations


Journal ArticleDOI
TL;DR: The authors identify a long noncoding RNA, lncNB1, that is over-expressed in MYCN-amplified neuroblastoma and show that it promotes tumorigenesis by binding to the ribosomal protein RPL35 to enhance E2F1 protein synthesis, leading to DEPDC1B transcription; DEPDC1B in turn induces ERK phosphorylation, which stabilises N-Myc.
Abstract: The majority of patients with neuroblastoma due to MYCN oncogene amplification and consequent N-Myc oncoprotein over-expression die of the disease. Here our analyses of RNA sequencing data identify the long noncoding RNA lncNB1 as one of the transcripts most over-expressed in MYCN-amplified, compared with MYCN-non-amplified, human neuroblastoma cells and also the most over-expressed in neuroblastoma compared with all other cancers. lncNB1 binds to the ribosomal protein RPL35 to enhance E2F1 protein synthesis, leading to DEPDC1B gene transcription. The GTPase-activating protein DEPDC1B induces ERK protein phosphorylation and N-Myc protein stabilization. Importantly, lncNB1 knockdown abolishes neuroblastoma cell clonogenic capacity in vitro and leads to neuroblastoma tumor regression in mice, while high levels of lncNB1 and RPL35 in human neuroblastoma tissues predict poor patient prognosis. This study therefore identifies lncNB1 and its binding protein RPL35 as key factors for promoting E2F1 protein synthesis, N-Myc protein stability and N-Myc-driven oncogenesis, and as therapeutic targets.

58 citations



Journal ArticleDOI
TL;DR: In conclusion, this work is the first to show that the SMARTer method can be used for unbiased unraveling of the complete transcriptome of a wide range of biofluids and their extracellular vesicles.
Abstract: RNA profiling has emerged as a powerful tool to investigate the biomarker potential of human biofluids. However, despite enormous interest in extracellular nucleic acids, RNA sequencing methods to quantify the total RNA content outside cells are rare. Here, we evaluate the performance of the SMARTer Stranded Total RNA-Seq method in human platelet-rich plasma, platelet-free plasma, urine, conditioned medium, and extracellular vesicles (EVs) from these biofluids. We found the method to be accurate, precise, compatible with low-input volumes and able to quantify a few thousand genes. We picked up distinct classes of RNA molecules, including mRNA, lncRNA, circRNA, miscRNA and pseudogenes. Notably, the read distribution and gene content drastically differ among biofluids. In conclusion, we are the first to show that the SMARTer method can be used for unbiased unraveling of the complete transcriptome of a wide range of biofluids and their extracellular vesicles.

44 citations


Journal ArticleDOI
TL;DR: This work developed a novel single cell strand-specific total RNA library preparation method addressing all the shortcomings of existing methods and demonstrating that the method detects an equal or higher number of genes compared to classic polyA[+] RNA-seq, including novel and non-polyadenylated genes.
Abstract: Single cell RNA sequencing methods have been increasingly used to understand cellular heterogeneity. Nevertheless, most of these methods suffer from one or more limitations, such as focusing only on polyadenylated RNA, sequencing of only the 3' end of the transcript, an exuberant fraction of reads mapping to ribosomal RNA, and the unstranded nature of the sequencing data. Here, we developed a novel single cell strand-specific total RNA library preparation method addressing all the aforementioned shortcomings. Our method was validated on a microfluidics system using three different cancer cell lines undergoing a chemical or genetic perturbation and on two other cancer cell lines sorted in microplates. We demonstrate that our total RNA-seq method detects an equal or higher number of genes compared to classic polyA[+] RNA-seq, including novel and non-polyadenylated genes. The obtained RNA expression patterns also recapitulate the expected biological signal. Inherent to total RNA-seq, our method is also able to detect circular RNAs. Taken together, SMARTer single cell total RNA sequencing is very well suited for any single cell sequencing experiment in which transcript level information is needed beyond polyadenylated genes.

33 citations


Journal ArticleDOI
TL;DR: The resulting method, called double-mismatch allele-specific qPCR (DMAS-qPCR), was successfully validated using 12 SNPs and 15 clinically relevant somatic mutations on 48 cancer cell lines and is characterized by high analytical sensitivity and specificity.
Abstract: For a wide range of diseases, SNPs in the genome are the underlying mechanism of dysfunction. Therefore, targeted detection of these variations is of high importance for early diagnosis and (familial) screenings. While allele-specific PCR has been around for many years, its adoption for SNP genotyping or somatic mutation detection has been hampered by its low discriminating power and high costs. To tackle this, we developed a cost-effective qPCR based method, able to detect SNPs in a robust and specific manner. This study describes how to combine the basic principles of allele-specific PCR (the combination of a wild type and variant primer) with the straightforward readout of DNA-binding dye based qPCR technology. To enhance the robustness and discriminating power, an artificial mismatch in the allele-specific primer was introduced. The resulting method, called double-mismatch allele-specific qPCR (DMAS-qPCR), was successfully validated using 12 SNPs and 15 clinically relevant somatic mutations on 48 cancer cell lines. It is easy to use, does not require labeled probes and is characterized by high analytical sensitivity and specificity. DMAS-qPCR comes with a complimentary online assay design tool, available for the whole scientific community, enabling researchers to design custom assays and implement those as a diagnostic test.

30 citations


Journal ArticleDOI
TL;DR: Although PC patients with oligometastatic disease had a more favorable prognosis, no serum-derived biomarkers allowing for prospective discrimination of oligo- and polymetastatic prostate cancer patients could be identified.
Abstract: Patients with oligometastatic prostate cancer (PC) may benefit from metastasis-directed therapy (MDT), delaying disease progression and the start of palliative systemic treatment. However, a significant proportion of oligometastatic PC patients progress to polymetastatic PC within a year following MDT, suggesting an underestimation of the metastatic load by current staging modalities. Molecular markers could help to identify true oligometastatic patients eligible for MDT. Patients with asymptomatic biochemical recurrence following primary PC treatment were classified as oligo- or polymetastatic based on 18F-choline PET/CT imaging. Oligometastatic patients had up to three metastases at baseline and did not progress to more than three lesions following MDT or surveillance within 1 year of diagnosis of metastases. Polymetastatic patients had > 3 metastases at baseline or developed > 3 metastases within 1 year following imaging. A model aiming to prospectively distinguish oligo- and polymetastatic PC patients was trained using clinicopathological parameters and serum-derived microRNA expression profiles from a discovery cohort of 20 oligometastatic and 20 polymetastatic PC patients. To confirm the model's predictive performance, it was applied to biomarker data obtained from an independent validation cohort of 44 patients with oligometastatic and 39 patients with polymetastatic disease. Oligometastatic PC patients had a more favorable prognosis compared to polymetastatic ones, as defined by a significantly longer median CRPC-free survival (not reached versus 38 months; 95% confidence interval 31–45 months with P < 0.001).
Despite the good performance of a predictive model trained on the discovery cohort, with an AUC of 0.833 (0.693–0.973; 95% CI) and a sensitivity of 0.894 (0.714–1.000; 95% CI) for oligometastatic disease, none of the miRNA targets were found to be differentially expressed between oligo- and polymetastatic PC patients in the signature validation cohort. The multivariate model had an AUC of 0.393 (0.534 after cross-validation) and therefore, no predictive ability. Although PC patients with oligometastatic disease had a more favorable prognosis, no serum-derived biomarkers allowing for prospective discrimination of oligo- and polymetastatic prostate cancer patients could be identified.
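
The cohort definition in this abstract (at most three metastases at baseline and no progression beyond three lesions within one year, versus more than three at either point) can be written as a small rule. The following is only an illustrative Python sketch of that stated rule, not the authors' code; the `Patient` class and its field names are hypothetical and chosen here for readability.

```python
from dataclasses import dataclass

# Threshold taken from the abstract: "up to three metastases".
MAX_OLIGO_LESIONS = 3


@dataclass
class Patient:
    # Hypothetical field names, introduced only for this illustration.
    lesions_at_baseline: int      # lesions seen on baseline imaging
    max_lesions_within_1y: int    # highest lesion count observed within 1 year


def classify(patient: Patient) -> str:
    """Label a patient as oligo- or polymetastatic using the abstract's rule."""
    if (patient.lesions_at_baseline > MAX_OLIGO_LESIONS
            or patient.max_lesions_within_1y > MAX_OLIGO_LESIONS):
        return "polymetastatic"
    return "oligometastatic"
```

Note that the label depends on one year of follow-up, which is why the study looked for baseline biomarkers that could predict it prospectively.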

28 citations


Posted ContentDOI
14 Jul 2019-bioRxiv
TL;DR: In conclusion, this work is the first to show that the SMARTer method can be used for unbiased unraveling of the complete transcriptome of a wide range of biofluids and their extracellular vesicles.
Abstract: RNA profiling has emerged as a powerful tool to investigate the biomarker potential of human biofluids. However, despite enormous interest in extracellular nucleic acids, RNA sequencing methods to quantify the total RNA content outside cells are rare. Here, we evaluate the performance of the SMARTer Stranded Total RNA-Seq method in human platelet-rich plasma, platelet-free plasma, urine, conditioned medium, and extracellular vesicles (EVs) from these biofluids. We found the method to be accurate, precise, compatible with low-input volumes and able to quantify a few thousand genes. We picked up distinct classes of RNA molecules, including mRNA, lncRNA, circRNA, miscRNA and pseudogenes. Notably, the read distribution and gene content drastically differ among biofluids. In conclusion, we are the first to show that the SMARTer method can be used for unbiased unraveling of the complete transcriptome of a wide range of biofluids and their extracellular vesicles.

Journal ArticleDOI
TL;DR: This analysis revealed strong associations between the neuroblastoma lincRNAs MIAT and MEG3 and MYCN and PHOX2B activity or expression, as well as a strong association between stromal cell composition and driver gene status, resulting in differential expression of stromal-expressed lincRNAs.
Abstract: Long intergenic non-coding RNAs (lincRNAs) are emerging as integral components of signaling pathways in various cancer types. In neuroblastoma, only a handful of lincRNAs are known as upstream regulators or downstream effectors of oncogenes. Here, we exploit RNA sequencing data of primary neuroblastoma tumors, neuroblast precursor cells, neuroblastoma cell lines and various cellular perturbation model systems to define the neuroblastoma lincRNome and map lincRNAs up- and downstream of neuroblastoma driver genes MYCN, ALK and PHOX2B. Each of these driver genes controls the expression of a particular subset of lincRNAs, several of which are associated with poor survival and are differentially expressed in neuroblastoma tumors compared to neuroblasts. By integrating RNA sequencing data from both primary tumor tissue and cancer cell lines, we demonstrate that several of these lincRNAs are expressed in stromal cells. Deconvolution of primary tumor gene expression data revealed a strong association between stromal cell composition and driver gene status, resulting in differential expression of these lincRNAs. We also explored lincRNAs that putatively act upstream of neuroblastoma driver genes, either as presumed modulators of driver gene activity, or as modulators of effectors regulating driver gene expression. This analysis revealed strong associations between the neuroblastoma lincRNAs MIAT and MEG3 and MYCN and PHOX2B activity or expression. Together, our results provide a comprehensive catalogue of the neuroblastoma lincRNome, highlighting lincRNAs up- and downstream of key neuroblastoma driver genes. This catalogue forms a solid basis for further functional validation of candidate neuroblastoma lincRNAs.

Posted ContentDOI
17 Oct 2019-bioRxiv
TL;DR: This study reports on thousands of novel RNA species across all major RNA biotypes, including a hitherto poorly-cataloged class of non-polyadenylated single-exon long non-coding RNAs.
Abstract: The human transcriptome consists of various RNA biotypes including multiple types of non-coding RNAs (ncRNAs). Current ncRNA compendia remain incomplete partially because they are almost exclusively derived from the interrogation of small- and polyadenylated RNAs. Here, we present a more comprehensive atlas of the human transcriptome that is derived from matching polyA-, total-, and small-RNA profiles of a heterogeneous collection of nearly 300 human tissues and cell lines. We report on thousands of novel RNA species across all major RNA biotypes, including a hitherto poorly-cataloged class of non-polyadenylated single-exon long non-coding RNAs. In addition, we exploit intron abundance estimates from total RNA-sequencing to test and verify functional regulation by novel non-coding RNAs. Our study represents a substantial expansion of the current catalogue of human ncRNAs and their regulatory interactions. All data, analyses, and results are available in the R2 web portal and serve as a basis to further explore RNA biology and function.

Journal ArticleDOI
TL;DR: Although B elements are suitable as an alternative normalization strategy in the hippocampus, they do not represent a universal normalization approach in the APP23 model.

Posted ContentDOI
08 Oct 2019-bioRxiv
TL;DR: Using a novel technique, reduced representation bisulfite sequencing on cell-free DNA (cf-RRBS), the authors show that it is feasible to obtain the histopathological diagnosis with a minimally invasive test on either plasma or cerebrospinal fluid.
Abstract: In the clinical management of pediatric solid tumors, histological examination of tumor tissue obtained by a biopsy remains the gold standard to establish a conclusive pathological diagnosis. The DNA methylation pattern of a tumor is known to correlate with the histopathological diagnosis across cancer types and is showing promise in the diagnostic workup of tumor samples. This methylation pattern can be detected in the cell-free DNA. Here, we provide proof-of-concept of histopathologic classification of pediatric tumors using cell-free reduced representation bisulfite sequencing (cf-RRBS) from retrospectively collected plasma and cerebrospinal fluid samples. We determined the correct tumor type in 49 out of 60 (81.6%) samples starting from minute amounts (less than 10 ng) of cell-free DNA. We demonstrate that the majority of misclassifications were associated with sample quality and not with the extent of disease. Our approach has the potential to help tackle some of the remaining diagnostic challenges in pediatric oncology in a cost-effective and minimally invasive manner.
Translational relevance: Obtaining a correct diagnosis in pediatric oncology can be challenging in some tumor types, especially in renal tumors or central nervous system tumors. Furthermore, the diagnostic odyssey can result in anxiety and discomfort for these children. By applying a novel technique, reduced representation bisulfite sequencing on cell-free DNA (cf-RRBS), we show the feasibility of obtaining the histopathological diagnosis with a minimally invasive test on either plasma or cerebrospinal fluid. Furthermore, we were able to derive the copy number profile or tumor subtype from the same assay. Given that primary tumor material might be difficult to obtain, in particular in critically ill children or depending on the tumor location, and might be limited in terms of quantity or quality, our assay could become complementary to the classical tissue biopsy in difficult cases.

Posted ContentDOI
21 Jun 2019-bioRxiv
TL;DR: This method constructs an empirical distribution of gene expression data from a given source RNA-seq experiment to faithfully capture the data characteristics of real data, and can be used to simulate a wide range of scenarios, such as single or multiple biological groups, systematic variations, and different sample sizes.
Abstract: Summary: SPsimSeq is a semi-parametric simulation method for bulk and single cell RNA sequencing data. It simulates data from a good estimate of the actual distribution of a given real RNA-seq dataset. In contrast to existing approaches that assume a particular data distribution, our method constructs an empirical distribution of gene expression data from a given source RNA-seq experiment to faithfully capture the data characteristics of real data. Importantly, our method can be used to simulate a wide range of scenarios, such as single or multiple biological groups, systematic variations (e.g. confounding batch effects), and different sample sizes. It can also be used to simulate different gene expression units resulting from different library preparation protocols, such as read counts or UMI counts. Availability and implementation: The R package and associated documentation are available from https://github.com/CenterForStatistics-UGent/SPsimSeq. Supplementary information: Supplementary data are available at bioRχiv online.
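
To make the idea of simulating from an empirical rather than an assumed distribution concrete, here is a deliberately simplified Python sketch. It is not the SPsimSeq algorithm and does not use the SPsimSeq R package: it swaps in plain inverse-transform sampling from a linearly interpolated empirical quantile function, treats genes independently, and ignores groups and batch effects. All function names are hypothetical.

```python
import random


def empirical_quantile(sorted_vals, u):
    """Linearly interpolated quantile of a sorted sample at probability u in [0, 1]."""
    pos = u * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] + frac * (sorted_vals[hi] - sorted_vals[lo])


def simulate_gene(source_values, n_samples, rng):
    """Draw n_samples new values for one gene from its empirical distribution."""
    sorted_vals = sorted(source_values)
    return [empirical_quantile(sorted_vals, rng.random()) for _ in range(n_samples)]


def simulate_dataset(source, n_samples, seed=0):
    """Simulate every gene independently (a simplifying assumption of this sketch).

    source maps gene name -> list of observed expression values (e.g. log-CPM).
    """
    rng = random.Random(seed)
    return {gene: simulate_gene(vals, n_samples, rng) for gene, vals in source.items()}
```

Because values are drawn from the observed quantiles, simulated data stay within the range of the source data and inherit its shape without any distributional assumption.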

Journal ArticleDOI
TL;DR: This work evaluated two suitable RNA isolation kits and determined that sorting cells directly into lysis buffer is a critical step for success, and demonstrated that an additional genomic DNA removal step after RNA isolation is required to completely clear the RNA from any contaminating genomic DNA.
Abstract: Transgenic zebrafish lines that express a fluorescent reporter under the control of a cell-type specific promoter enable transcriptome analysis of FACS sorted cell populations. RNA quality and yield are key determinant factors for accurate expression profiling. Limited cell number and FACS induced cellular stress make RNA isolation of sorted zebrafish cells a delicate process. We aimed to optimize a workflow to extract sufficient amounts of high-quality RNA from a limited number of FACS sorted cells from Tg(fli1a:GFP) zebrafish embryos, which can be used for accurate gene expression analysis. We evaluated two suitable RNA isolation kits (the RNAqueous micro and the RNeasy plus micro kit) and determined that sorting cells directly into lysis buffer is a critical step for success. For low cell numbers, this ensures direct cell lysis, protects RNA from degradation and results in a higher RNA quality and yield. We showed that this works well up to 0.5× dilution of the lysis buffer with sorted cells. In our sort settings, this corresponded to 30,000 and 75,000 cells for the RNAqueous micro kit and RNeasy plus micro kit respectively. Sorting more cells dilutes the lysis buffer too much and requires the use of a collection buffer. We also demonstrated that an additional genomic DNA removal step after RNA isolation is required to completely clear the RNA from any contaminating genomic DNA. For cDNA synthesis and library preparation, we combined SmartSeq v4 full length cDNA library amplification, Nextera XT tagmentation and sample barcoding. Using this workflow, we were able to generate highly reproducible RNA sequencing results. The presented optimized workflow enables the generation of high quality RNA and allows accurate transcriptome profiling of small populations of sorted zebrafish cells.

Posted ContentDOI
30 Jul 2019-bioRxiv
TL;DR: In this article, a semi-parametric approach based on probabilistic index models (PIM) was proposed for differential expression (DE) detection in single-cell RNA sequencing (scRNA-seq) data.
Abstract: Single-cell RNA sequencing (scRNA-seq) technologies profile gene expression patterns in individual cells. It is often of interest to test for differential expression (DE) between conditions, e.g. treatment and control or between cell types. Simulation studies have shown that non-parametric tests, such as the Wilcoxon rank-sum test, can robustly detect significant DE, with better performance than many parametric tools specifically developed for scRNA-seq data analysis. However, these classical rank tests cannot be used for complex experimental designs involving multiple groups, multiple factors and confounding variables. Further, rank-based tests do not provide an interpretable measure of the effect size. We propose a semi-parametric approach based on probabilistic index models (PIM) that form a flexible class of models that generalize classical rank tests. Our method does not rely on strong distributional assumptions and it allows accounting for confounding factors. Moreover, our method allows for the estimation of the effect size in terms of a probabilistic index. Real data analysis demonstrated that PIM is capable of identifying biologically meaningful DE. Our simulation studies also show that tests for DE succeed well in controlling the false discovery rate at its nominal level, while maintaining good sensitivity as compared to competing methods.
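
For readers unfamiliar with the effect size mentioned here, the two-group probabilistic index is commonly defined as P(Y_control < Y_treated) + 0.5 · P(Y_control = Y_treated). The Python sketch below estimates it by counting pairs; it covers only the simple two-group case without covariates and is not the PIM regression method the abstract proposes. The function name is hypothetical.

```python
def probabilistic_index(control, treated):
    """Estimate P(control < treated) + 0.5 * P(control == treated) over all pairs.

    0.5 means no tendency either way; values near 1 mean expression tends to be
    higher in the treated group, values near 0 mean it tends to be lower.
    """
    if not control or not treated:
        raise ValueError("both groups must be non-empty")
    score = 0.0
    for x in control:
        for y in treated:
            if x < y:
                score += 1.0
            elif x == y:
                score += 0.5  # ties count half, which matters for zero-inflated counts
    return score / (len(control) * len(treated))


# Example: expression of one gene in four control and four treated cells.
pi = probabilistic_index([0, 0, 1, 3], [0, 2, 4, 5])  # 12 of 16 pair-credits -> 0.75
```

This pairwise estimate equals the Mann-Whitney U statistic divided by the number of pairs, which is the sense in which probabilistic index models generalize the Wilcoxon rank-sum test.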

Posted ContentDOI
08 Jul 2019-bioRxiv
TL;DR: The data of polyA[+] and total RNA sequencing in the context of in vitro TLX1 knockdown in ALL-SIL cells and a primary T-ALL cohort are presented and it is shown that ATAC and H3K4me3 ChIP peaks are enriched at transcription start sites.
Abstract: Most currently available transcriptome data of T-cell acute lymphoblastic leukemia (T-ALL) are based on polyA[+] RNA sequencing methods and thus lack non-polyadenylated transcripts. Here, we present the data of polyA[+] and total RNA sequencing in the context of in vitro TLX1 knockdown in ALL-SIL cells and a primary T-ALL cohort. We extended this dataset with ATAC sequencing and H3K4me1 and H3K4me3 ChIP sequencing data to map putative gene regulatory regions. In this data descriptor, we present a detailed report of how the data were generated and which bioinformatics analyses were performed. Through several technical validations, we showed that our sequencing data are of high quality and that our in vitro TLX1 knockdown was successful. We also validated the quality of the ATAC and ChIP sequencing data and showed that ATAC and H3K4me3 ChIP peaks are enriched at transcription start sites. We believe that this comprehensive set of sequencing data can be reused by others to further unravel the complex biology of T-ALL in general and TLX1 in particular.