scispace - formally typeset

Anthony Bolger

Other affiliations: Max Planck Society
Bio: Anthony Bolger is an academic researcher from RWTH Aachen University. The author has contributed to research in topics including Genome and Gene. The author has an h-index of 17 and has co-authored 29 publications receiving 28,766 citations. Previous affiliations of Anthony Bolger include the Max Planck Society.

Topics: Genome, Gene, Nanopore sequencing

Open access | Journal Article | DOI: 10.1093/BIOINFORMATICS/BTU170
Anthony Bolger, Marc Lohse, Bjoern Usadel | Institutions (1)
01 Aug 2014 | Bioinformatics
Abstract: Motivation: Although many next-generation sequencing (NGS) read preprocessing tools already existed, we could not find any tool or combination of tools that met our requirements in terms of flexibility, correct handling of paired-end data and high performance. We have developed Trimmomatic as a more flexible and efficient preprocessing tool, which could correctly handle paired-end data. Results: The value of NGS read preprocessing is demonstrated for both reference-based and reference-free tasks. Trimmomatic is shown to produce output that is at least competitive with, and in many cases superior to, that produced by other tools, in all scenarios tested. Availability and implementation: Trimmomatic is licensed under GPL V3. It is cross-platform (Java 1.5+ required) and available at (…). Contact: usadel@bio1.rwth-aachen.de Supplementary information: Supplementary data are available at Bioinformatics online.

26,464 Citations
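"Correct handling of paired-end data" means that when one mate of a pair is discarded, the surviving mate must not be left in a paired output where it would fall out of sync with its partner file; Trimmomatic's paired-end mode writes paired and unpaired outputs separately for each read direction. The sketch below illustrates only that routing idea. The trailing-quality trim, the thresholds, the `Read` class and all function names are hypothetical simplifications, not Trimmomatic's code or its command-line options.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Read:
    name: str
    seq: str
    qual: list[int]  # decoded Phred quality scores, one per base


def trim_trailing(read: Read, min_qual: int, min_len: int) -> Optional[Read]:
    """Hypothetical, simplified trimming step: drop 3' bases below min_qual,
    then discard the read entirely if fewer than min_len bases remain."""
    end = len(read.seq)
    while end > 0 and read.qual[end - 1] < min_qual:
        end -= 1
    if end < min_len:
        return None
    return Read(read.name, read.seq[:end], read.qual[:end])


def route_pair(r1: Read, r2: Read, min_qual: int = 20, min_len: int = 3) -> list[tuple[str, Read]]:
    """Trim both mates, then assign each survivor to a paired or unpaired output."""
    t1 = trim_trailing(r1, min_qual, min_len)
    t2 = trim_trailing(r2, min_qual, min_len)
    if t1 is not None and t2 is not None:
        # Both mates survived: emit them together so the two paired outputs stay in sync.
        return [("paired_1", t1), ("paired_2", t2)]
    routed: list[tuple[str, Read]] = []
    if t1 is not None:
        routed.append(("unpaired_1", t1))  # mate 2 was discarded
    if t2 is not None:
        routed.append(("unpaired_2", t2))  # mate 1 was discarded
    return routed  # empty when both mates were discarded


if __name__ == "__main__":
    r1 = Read("pair1/1", "ACGTACGT", [30] * 8)
    r2 = Read("pair1/2", "TTGCA", [30, 30, 10, 10, 10])
    for bucket, read in route_pair(r1, r2):
        print(bucket, read.name, read.seq)  # prints: unpaired_1 pair1/1 ACGTACGT
```

Keeping orphaned mates in their own outputs means a downstream aligner that expects strictly matched pairs never receives an unmatched record, while the surviving single-end data is still available.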

Open access | Journal Article | DOI: 10.1093/NAR/GKS540
Marc Lohse, Anthony Bolger, Axel Nagel, Alisdair R. Fernie +3 more | Institutions (1)
Abstract: Recent rapid advances in next generation RNA sequencing (RNA-Seq)-based methods provide researchers with unprecedentedly large data sets and open new perspectives in transcriptomics. Furthermore, RNA-Seq-based transcript profiling can be applied to non-model and newly discovered organisms because it does not require a predefined measuring platform (like e.g. microarrays). However, these novel technologies pose new challenges: the raw data need to be rigorously quality checked and filtered prior to analysis, and proper statistical methods have to be applied to extract biologically relevant information. Given the sheer volume of data, this is no trivial task and requires a combination of considerable technical resources along with bioinformatics expertise. To aid the individual researcher, we have developed RobiNA as an integrated solution that consolidates all steps of RNA-Seq-based differential gene-expression analysis in one user-friendly cross-platform application featuring a rich graphical user interface. RobiNA accepts raw FastQ files, SAM/BAM alignment files and counts tables as input. It supports quality checking, flexible filtering and statistical analysis of differential gene expression based on state-of-the-art biostatistical methods developed in the R/Bioconductor projects. In-line help and a step-by-step manual guide users through the analysis. Installer packages for Mac OS X, Windows and Linux are available under the LGPL licence from robin.

Topics: Bioconductor (53%), FASTQ format (51%), Graphical user interface (50%)

744 Citations

Open access | Journal Article | DOI: 10.1038/NG.3046
Anthony Bolger, Federico Scossa, Marie E. Bolger, Christa Lanz +39 more | Institutions (10)
01 Sep 2014 | Nature Genetics
Abstract: Solanum pennellii is a wild tomato species endemic to Andean regions in South America, where it has evolved to thrive in arid habitats. Because of its extreme stress tolerance and unusual morphology, it is an important donor of germplasm for the cultivated tomato Solanum lycopersicum. Introgression lines (ILs) in which large genomic regions of S. lycopersicum are replaced with the corresponding segments from S. pennellii can show remarkably superior agronomic performance. Here we describe a high-quality genome assembly of the parents of the IL population. By anchoring the S. pennellii genome to the genetic map, we define candidate genes for stress tolerance and provide evidence that transposable elements had a role in the evolution of these traits. Our work paves a path toward further tomato improvement and for deciphering the mechanisms underlying the myriad other agronomic traits that can be improved with S. pennellii germplasm.

Topics: Wild tomato (54%), Solanum (52%), Population (52%)

299 Citations

Open access | Journal Article | DOI: 10.1073/PNAS.1309606110
Abstract: Although applied over extremely short timescales, artificial selection has dramatically altered the form, physiology, and life history of cultivated plants. We have used RNAseq to define both gene sequence and expression divergence between cultivated tomato and five related wild species. Based on sequence differences, we detect footprints of positive selection in over 50 genes. We also document thousands of shifts in gene-expression level, many of which resulted from changes in selection pressure. These rapidly evolving genes are commonly associated with environmental response and stress tolerance. The importance of environmental inputs during evolution of gene expression is further highlighted by large-scale alteration of the light response coexpression network between wild and cultivated accessions. Human manipulation of the genome has heavily impacted the tomato transcriptome through directed admixture and by indirectly favoring nonsynonymous over synonymous substitutions. Taken together, our results shed light on the pervasive effects artificial and natural selection have had on the transcriptomes of tomato and its wild relatives.

292 Citations

Open access | Journal Article | DOI: 10.1016/J.MOLP.2019.01.003
03 Jun 2019 | Molecular Plant
Abstract: Genome sequences from over 200 plant species have already been published, with this number expected to increase rapidly due to advances in sequencing technologies. Once a new genome has been assembled and the genes identified, the functional annotation of their putative translational products, proteins, using ontologies is of key importance as it places the sequencing data in a biological context. Furthermore, to keep pace with rapid production of genome sequences, this functional annotation process must be fully automated. Here we present a redesigned and significantly enhanced MapMan4 framework, together with a revised version of the associated online Mercator annotation tool. Compared with the original MapMan, the new ontology has been expanded almost threefold and enforces stricter assignment rules. This framework was then incorporated into Mercator4, which has been upgraded to reflect current knowledge across the land plant group, providing protein annotations for all embryophytes with a comparably high quality. The annotation process has been optimized to allow a plant genome to be annotated in a matter of minutes. The output results continue to be compatible with the established MapMan desktop application.

Topics: Protein Annotation (61%), Annotation (53%)

147 Citations

Cited by

Open access | Journal Article | DOI: 10.1093/BIOINFORMATICS/BTU638
15 Jan 2015 | Bioinformatics
Abstract: Motivation: A large choice of tools exists for many standard tasks in the analysis of high-throughput sequencing (HTS) data. However, once a project deviates from standard workflows, custom scripts are needed. Results: We present HTSeq, a Python library to facilitate the rapid development of such scripts. HTSeq offers parsers for many common data formats in HTS projects, as well as classes to represent data, such as genomic coordinates, sequences, sequencing reads, alignments, gene model information and variant calls, and provides data structures that allow for querying via genomic coordinates. We also present htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes. Availability and implementation: HTSeq is released as open-source software under the GNU General Public Licence and available from (…) or from the Python Package Index at (…). Contact: (…)

11,833 Citations
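htseq-count is described above as counting the overlap of reads with genes. As a rough illustration only, the sketch below counts ungapped read intervals against exon intervals grouped by gene, loosely following htseq-count's default "union" behaviour: a read touching exactly one gene is counted for that gene, a read touching several genes goes to `__ambiguous`, and a read touching none goes to `__no_feature`. The data layout and the function names are hypothetical; this is not HTSeq's API, and it ignores strandedness, spliced alignments, alignment quality and multi-mapping reads.

```python
from __future__ import annotations

from collections import Counter


def overlapping_genes(read_iv: tuple[str, int, int],
                      genes: dict[str, list[tuple[str, int, int]]]) -> set[str]:
    """Return the ids of genes with at least one exon overlapping the read.
    Intervals are (chrom, start, end), half-open, in a hypothetical layout."""
    chrom, start, end = read_iv
    hits: set[str] = set()
    for gene_id, exons in genes.items():
        # A gene is hit if any of its exons shares at least one base with the read.
        if any(c == chrom and s < end and start < e for c, s, e in exons):
            hits.add(gene_id)
    return hits


def count_reads(reads: list[tuple[str, int, int]],
                genes: dict[str, list[tuple[str, int, int]]]) -> Counter:
    """Tally reads per gene, with special counters for unassigned reads."""
    counts: Counter = Counter()
    for read_iv in reads:
        hits = overlapping_genes(read_iv, genes)
        if not hits:
            counts["__no_feature"] += 1
        elif len(hits) == 1:
            counts[next(iter(hits))] += 1
        else:
            counts["__ambiguous"] += 1
    return counts


if __name__ == "__main__":
    genes = {
        "geneA": [("chr1", 100, 200), ("chr1", 300, 400)],
        "geneB": [("chr1", 350, 500)],
    }
    reads = [("chr1", 120, 170), ("chr1", 360, 380), ("chr1", 210, 260),
             ("chr2", 120, 170), ("chr1", 450, 480)]
    print(sorted(count_reads(reads, genes).items()))
    # [('__ambiguous', 1), ('__no_feature', 2), ('geneA', 1), ('geneB', 1)]
```

The linear scan over all genes keeps the example short; a real counter would index features by genomic position so each read is resolved without visiting every gene.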

Open access
01 Jun 2012
Abstract: The lion's share of bacteria in various environments cannot be cloned in the laboratory and thus cannot be sequenced using existing technologies. A major goal of single-cell genomics is to complement gene-centric metagenomic data with whole-genome assemblies of uncultivated organisms. Assembly of single-cell data is challenging because of highly non-uniform read coverage as well as elevated levels of sequencing errors and chimeric reads. We describe SPAdes, a new assembler for both single-cell and standard (multicell) assembly, and demonstrate that it improves on the recently released E+V-SC assembler (specialized for single-cell data) and on popular assemblers Velvet and SoapDeNovo (for multicell data). SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies. SPAdes is available online (…). It is distributed as open source software.

Topics: Single cell sequencing (61%), Genomics (60%), Sequence assembly (57%)

10,124 Citations

Open access | Journal Article | DOI: 10.1038/S41586-020-2008-3
Fan Wu, Su Zhao, Bin Yu, Yan-Mei Chen +17 more | Institutions (4)
03 Feb 2020 | Nature
Abstract: Emerging infectious diseases, such as severe acute respiratory syndrome (SARS) and Zika virus disease, present a major threat to public health[1–3]. Despite intense research efforts, how, when and where new diseases appear are still a source of considerable uncertainty. A severe respiratory disease was recently reported in Wuhan, Hubei province, China. As of 25 January 2020, at least 1,975 cases had been reported since the first patient was hospitalized on 12 December 2019. Epidemiological investigations have suggested that the outbreak was associated with a seafood market in Wuhan. Here we study a single patient who was a worker at the market and who was admitted to the Central Hospital of Wuhan on 26 December 2019 while experiencing a severe respiratory syndrome that included fever, dizziness and a cough. Metagenomic RNA sequencing[4] of a sample of bronchoalveolar lavage fluid from the patient identified a new RNA virus strain from the family Coronaviridae, which is designated here ‘WH-Human 1’ coronavirus (and has also been referred to as ‘2019-nCoV’). Phylogenetic analysis of the complete viral genome (29,903 nucleotides) revealed that the virus was most closely related (89.1% nucleotide similarity) to a group of SARS-like coronaviruses (genus Betacoronavirus, subgenus Sarbecovirus) that had previously been found in bats in China[5]. This outbreak highlights the ongoing ability of viral spill-over from animals to cause severe disease in humans. Phylogenetic and metagenomic analyses of the complete viral genome of a new coronavirus from the family Coronaviridae reveal that the virus is closely related to a group of SARS-like coronaviruses found in bats in China.

Topics: Coronavirus (62%), Betacoronavirus (59%), Zika virus disease (54%)

6,266 Citations

Open access | Journal Article | DOI: 10.1038/NPROT.2013.084
Brian J. Haas, Alexie Papanicolaou, Moran Yassour +21 more | Institutions (16)
01 Aug 2013 | Nature Protocols
Abstract: De novo assembly of RNA-seq data enables researchers to study transcriptomes without the need for a genome sequence; this approach can be usefully applied, for instance, in research on 'non-model organisms' of ecological and evolutionary importance, cancer samples or the microbiome. In this protocol we describe the use of the Trinity platform for de novo transcriptome assembly from RNA-seq data in non-model organisms. We also present Trinity-supported companion utilities for downstream applications, including RSEM for transcript abundance estimation, R/Bioconductor packages for identifying differentially expressed transcripts across samples and approaches to identify protein-coding genes. In the procedure, we provide a workflow for genome-independent transcriptome analysis leveraging the Trinity platform. The software, documentation and demonstrations are freely available from (…). The run time of this protocol is highly dependent on the size and complexity of data to be analyzed. The example data set analyzed in the procedure detailed herein can be processed in less than 5 h.

5,056 Citations

Open access | Journal Article | DOI: 10.1093/BIOINFORMATICS/BTY560
Shifu Chen, Yanqing Zhou, Yaru Chen, Jia Gu | Institutions (1)
01 Sep 2018 | Bioinformatics
Abstract: Motivation: Quality control and preprocessing of FASTQ files are essential to providing clean data for downstream analysis. Traditionally, a different tool is used for each operation, such as quality control, adapter trimming and quality filtering. These tools are often insufficiently fast as most are developed using high-level programming languages (e.g. Python and Java) and provide limited multi-threading support. Reading and loading data multiple times also renders preprocessing slow and I/O inefficient. Results: We developed fastp as an ultra-fast FASTQ preprocessor with useful quality control and data-filtering features. It can perform quality control, adapter trimming, quality filtering, per-read quality pruning and many other operations with a single scan of the FASTQ data. This tool is developed in C++ and has multi-threading support. Based on our evaluation, fastp is 2-5 times faster than other FASTQ preprocessing tools such as Trimmomatic or Cutadapt despite performing far more operations than similar tools. Availability and implementation: The open-source code and corresponding instructions are available at (…).

Topics: FASTQ format (65%), Preprocessor (50%)

2,501 Citations


Author's H-index: 17


Top Attributes


Author's top 5 most impactful journals


4 papers, 22 citations

Molecular Plant

2 papers, 211 citations

Frontiers in Plant Science

2 papers, 32 citations

Plant Physiology

2 papers, 43 citations

The Plant Cell

2 papers, 261 citations

Network Information
Related Authors (5)
Marie E. Bolger

16 papers, 1.1K citations

93% related
Alexander Vogel

12 papers, 606 citations

93% related
Björn Usadel

167 papers, 14.3K citations

85% related
Marc Lohse

30 papers, 32.5K citations

82% related
Rainer Schwacke

21 papers, 2.8K citations

82% related