Author
Manuel Tardaguila
Other affiliations: Swiss Institute of Bioinformatics, University of Florida, Spanish National Research Council
Bio: Manuel Tardaguila is an academic researcher at the Wellcome Trust Sanger Institute. The author has contributed to research on the topics Genome-wide association study and Allele. The author has an h-index of 12 and has co-authored 22 publications receiving 562 citations. Previous affiliations of Manuel Tardaguila include the Swiss Institute of Bioinformatics and the University of Florida.
Topics: Genome-wide association study, Allele, Chromatin, Gene, Genome
Papers
••
National Institute for Health Research (1), Harvard University (2), Montreal Heart Institute (3), University of North Carolina at Chapel Hill (4), Wellcome Trust Sanger Institute (5), VA Boston Healthcare System (6), Osaka University (7), Icahn School of Medicine at Mount Sinai (8), University of Wisconsin–Milwaukee (9), Kyushu University (10), University of Washington (11), University of Bristol (12), University of Copenhagen (13), Erasmus University Medical Center (14), National Institutes of Health (15), Veterans Health Administration (16), Kaiser Permanente (17), International Agency for Research on Cancer (18), Wake Forest University (19), Imperial College London (20), Broad Institute (21), Greifswald University Hospital (22), University of Pennsylvania (23), British Heart Foundation (24), Fred Hutchinson Cancer Research Center (25), Chinese National Human Genome Center (26), Technische Universität München (27), University of Tampere (28), University of Tokyo (29), University of Ioannina (30), University of Colorado Denver (31), Duke University (32), University of Virginia (33), University of Minnesota (34), Turku University Hospital (35), Los Angeles Biomedical Research Institute (36), Stanford University (37), Mashhad University of Medical Sciences (38), NHS Blood and Transplant (39), Brigham and Women's Hospital (40), University of Oxford (41), University of Liège (42), European Bioinformatics Institute (43), John Radcliffe Hospital (44)
TL;DR: The results show the power of large-scale blood cell trait GWAS to interrogate clinically meaningful variants across a wide allelic spectrum of human variation.
284 citations
••
TL;DR: SQANTI allows the user to maximize the analytical outcome of long-read technologies by providing the tools to deliver quality-evaluated and curated full-length transcriptomes and shows that these new transcripts have a major impact in the correct quantification of transcript levels by state-of-the-art short-read-based quantification algorithms.
Abstract: High-throughput sequencing of full-length transcripts using long reads has paved the way for the discovery of thousands of novel transcripts, even in well-annotated mammalian species. The advances in sequencing technology have created a need for studies and tools that can characterize these novel variants. Here, we present SQANTI, an automated pipeline for the classification of long-read transcripts that can assess the quality of data and the preprocessing pipeline using 47 unique descriptors. We apply SQANTI to a neuronal mouse transcriptome using Pacific Biosciences (PacBio) long reads and illustrate how the tool is effective in characterizing and describing the composition of the full-length transcriptome. We perform extensive evaluation of ToFU PacBio transcripts by PCR to reveal that an important number of the novel transcripts are technical artifacts of the sequencing approach and that SQANTI quality descriptors can be used to engineer a filtering strategy to remove them. Most novel transcripts in this curated transcriptome are novel combinations of existing splice sites, resulting more frequently in novel ORFs than novel UTRs, and are enriched in both general metabolic and neural-specific functions. We show that these new transcripts have a major impact in the correct quantification of transcript levels by state-of-the-art short-read-based quantification algorithms. By comparing our iso-transcriptome with public proteomics databases, we find that alternative isoforms are elusive to proteogenomics detection. SQANTI allows the user to maximize the analytical outcome of long-read technologies by providing the tools to deliver quality-evaluated and curated full-length transcriptomes.
271 citations
••
Wellcome Trust Sanger Institute (1), National Institute for Health Research (2), Boston Children's Hospital (3), Broad Institute (4), Montreal Heart Institute (5), British Heart Foundation (6), University of North Carolina at Chapel Hill (7), VA Boston Healthcare System (8), NHS Blood and Transplant (9), University of Cambridge (10), Osaka University (11), Icahn School of Medicine at Mount Sinai (12), University of Wisconsin–Milwaukee (13), Kyushu University (14), University of Washington (15), University of Bristol (16), University of Copenhagen (17), Hasso Plattner Institute (18), Erasmus University Medical Center (19), National Institutes of Health (20), Harvard University (21), Brigham and Women's Hospital (22), Kaiser Permanente (23), University of Mississippi Medical Center (24), International Agency for Research on Cancer (25), University of Ioannina (26), Wake Forest University (27), Greifswald University Hospital (28), University of Pennsylvania (29), Fred Hutchinson Cancer Research Center (30), Chinese National Human Genome Center (31), Technische Universität München (32), University of Tampere (33), University of Tokyo (34), University of Colorado Denver (35), Frederiksberg Hospital (36), Duke University (37), University of Virginia (38), University of Minnesota (39), University of Turku (40), Turku University Hospital (41), Los Angeles Biomedical Research Institute (42), Stanford University (43), Université de Montréal (44), Veterans Health Administration (45), University of Oxford (46), University of Liège (47), European Bioinformatics Institute (48), Imperial College London (49), John Radcliffe Hospital (50), Churchill Hospital (51)
TL;DR: These results show the power of large-scale blood cell GWAS to interrogate clinically meaningful variants across the full allelic spectrum of human variation.
Abstract: Blood cells play essential roles in human health, underpinning physiological processes such as immunity, oxygen transport, and clotting, which when perturbed cause a significant health burden. Here we integrate data from UK Biobank and a large-scale international collaborative effort, including 563,946 European ancestry participants, and discover 5,106 new genetic variants independently associated with 29 blood cell phenotypes covering the full allele frequency spectrum of variation impacting hematopoiesis. We holistically characterize the genetic architecture of hematopoiesis, assess the relevance of the omnigenic model to blood cell phenotypes, delineate relevant hematopoietic cell states influenced by regulatory genetic variants and gene networks, identify novel splice-altering variants mediating the associations, and assess the polygenic prediction potential for blood cell traits and clinical disorders at the interface of complex and Mendelian genetics. These results show the power of large-scale blood cell GWAS to interrogate clinically meaningful variants across the full allelic spectrum of human variation.
162 citations
••
TL;DR: The findings support the conclusion that CX3CL1 acts as a positive modifier of breast cancer in concert with ErbB receptors: genetic deletion of CX3CL1 delayed mammary tumorigenesis and reduced tumor multiplicity in HER2/neu mice, but not in polyoma middle T-antigen oncomice.
Abstract: Chemokines are relevant molecules in shaping the tumor microenvironment, although their contributions to tumorigenesis are not fully understood. We studied the influence of the chemokine CX3CL1/fractalkine in de novo breast cancer formation using HER2/neu transgenic mice. CX3CL1 expression was downmodulated in HER2/neu tumors, yet, paradoxically, adenovirus-mediated CX3CL1 expression in the tumor milieu enhanced mammary tumor numbers in a dose-dependent manner. Increased tumor multiplicity was not a consequence of CX3CL1-induced metastatic dissemination of the primary tumor, although CX3CL1 induced epithelial-to-mesenchymal transition in breast cancer cells in vitro. Instead, CX3CL1 triggered cell proliferation by induction of ErbB receptors through the proteolytic shedding of an ErbB ligand. This effect was important insofar as mammary tumorigenesis was delayed and tumor multiplicity was reduced by genetic deletion of CX3CL1 in HER2/neu mice, but not in polyoma middle T-antigen oncomice. Our findings support the conclusion that CX3CL1 acts as a positive modifier of breast cancer in concert with ErbB receptors.
67 citations
••
TL;DR: A corrigendum to "SQANTI: extensive characterization of long-read transcript sequences for quality control in full-length transcriptome identification and quantification".
Abstract: Corrigendum: SQANTI: extensive characterization of long-read transcript sequences for quality control in full-length transcriptome identification and quantification. Manuel Tardaguila, Lorena de la Fuente, Cristina Marti, Cécile Pereira, Francisco Jose Pardo-Palacios, Hector del Risco, Marc Ferrell, Maravillas Mellado, Marissa Macchietto, Kenneth Verheggen, Mariola Edelmann, Iakes Ezkurdia, Jesus Vazquez, Michael Tress, Ali Mortazavi, Lennart Martens, Susana Rodriguez-Navarro, Victoria Moreno-Manzano, and Ana Conesa
65 citations
Cited by
••
TL;DR: Key statistics on the current data contents and volume of downloads are outlined, along with how PRIDE data are starting to be disseminated to added-value resources including Ensembl, UniProt and Expression Atlas.
Abstract: The PRoteomics IDEntifications (PRIDE) database (https://www.ebi.ac.uk/pride/) is the world’s largest data repository of mass spectrometry-based proteomics data, and is one of the founding members of the global ProteomeXchange (PX) consortium. In this manuscript, we summarize the developments in PRIDE resources and related tools since the previous update manuscript was published in Nucleic Acids Research in 2016. In the last 3 years, public data sharing through PRIDE (as part of PX) has definitely become the norm in the field. In parallel, re-use of public proteomics data has increased enormously, with multiple applications. We first describe the new architecture of PRIDE Archive, the archival component of PRIDE. PRIDE Archive and the related data submission framework have been further developed to support the increase in submitted data volumes and additional data types. A new scalable and fault-tolerant storage backend, Application Programming Interface and web interface have been implemented, as part of an ongoing process. Additionally, we emphasize the improved support for quantitative proteomics data through the mzTab format. Finally, we outline key statistics on the current data contents and volume of downloads, and how PRIDE data are starting to be disseminated to added-value resources including Ensembl, UniProt and Expression Atlas.
5,735 citations
••
TL;DR: This review surveys the current landscape of long-read analysis tools, focuses on the principles of error correction, base modification detection, and long-read transcriptomics analysis, and highlights the challenges that remain.
Abstract: Long-read technologies are overcoming early limitations in accuracy and throughput, broadening their application domains in genomics. Dedicated analysis tools that take into account the characteristics of long-read data are thus required, but the fast pace of development of such tools can be overwhelming. To assist in the design and analysis of long-read sequencing projects, we review the current landscape of available tools and present an online interactive database, long-read-tools.org, to facilitate their browsing. We further focus on the principles of error correction, base modification detection, and long-read transcriptomics analysis and highlight the challenges that remain.
1,172 citations
••
TL;DR: StringTie2 is a reference-guided transcriptome assembler that works with both short and long reads and offers the ability to work with full-length super-reads assembled from short reads, which further improves the quality of short-read assemblies.
Abstract: RNA sequencing using the latest single-molecule sequencing instruments produces reads that are thousands of nucleotides long. The ability to assemble these long reads can greatly improve the sensitivity of long-read analyses. Here we present StringTie2, a reference-guided transcriptome assembler that works with both short and long reads. StringTie2 includes new methods to handle the high error rate of long reads and offers the ability to work with full-length super-reads assembled from short reads, which further improves the quality of short-read assemblies. StringTie2 is more accurate and faster and uses less memory than all comparable short-read and long-read analysis tools.
635 citations
••
TL;DR: StringTie2 is a reference-guided transcriptome assembler that works with both short and long reads and includes new computational methods to handle the high error rate of long-read sequencing technology, which previous assemblers could not tolerate.
Abstract: RNA sequencing using the latest single-molecule sequencing instruments produces reads that are thousands of nucleotides long. The ability to assemble these long reads can greatly improve the sensitivity of long-read analyses. Here we present StringTie2, a reference-guided transcriptome assembler that works with both short and long reads. StringTie2 includes new computational methods to handle the high error rate of long-read sequencing technology, which previous assemblers could not tolerate. It also offers the ability to work with full-length super-reads assembled from short reads, which further improves the quality of assemblies. On 33 short-read datasets from humans and two plant species, StringTie2 is 47.3% more precise and 3.9% more sensitive than Scallop. On multiple long read datasets, StringTie2 on average correctly assembles 8.3 and 2.6 times as many transcripts as FLAIR and Traphlor, respectively, with substantially higher precision. StringTie2 is also faster and has a smaller memory footprint than all comparable tools.
390 citations
••
TL;DR: This Review discusses bioinformatics tools that have been devised to handle the numerous characteristic features of these long-range data types, with applications in genome assembly, genetic variant detection, haplotype phasing, transcriptomics and epigenomics.
Abstract: Several new genomics technologies have become available that offer long-read sequencing or long-range mapping with higher throughput and higher resolution analysis than ever before. These long-range technologies are rapidly advancing the field with improved reference genomes, more comprehensive variant identification and more complete views of transcriptomes and epigenomes. However, they also require new bioinformatics approaches to take full advantage of their unique characteristics while overcoming their complex errors and modalities. Here, we discuss several of the most important applications of the new technologies, focusing on both the currently available bioinformatics tools and opportunities for future research.
381 citations