Author

Marcel Martin

Bio: Marcel Martin is an academic researcher from Science for Life Laboratory. The author has contributed to research in topics: Exome sequencing & Population. The author has an h-index of 24 and has co-authored 42 publications receiving 15,979 citations. Previous affiliations of Marcel Martin include Max Planck Society & University of Duisburg-Essen.

Papers
Journal ArticleDOI
TL;DR: The command-line tool cutadapt is developed, which supports 454, Illumina and SOLiD (color space) data, offers two adapter trimming algorithms, and has other useful features.
Abstract: When small RNA is sequenced on current sequencing machines, the resulting reads are usually longer than the RNA and therefore contain parts of the 3' adapter. That adapter must be found and removed error-tolerantly from each read before read mapping. Previous solutions are either hard to use or do not offer required features, in particular support for color space data. As an easy to use alternative, we developed the command-line tool cutadapt, which supports 454, Illumina and SOLiD (color space) data, offers two adapter trimming algorithms, and has other useful features. Cutadapt, including its MIT-licensed source code, is available for download at http://code.google.com/p/cutadapt/

20,255 citations

Journal ArticleDOI
TL;DR: Using exome sequencing, recurrent somatic mutations in EIF1AX and SF3B1 are identified in uveal melanomas with disomy 3, which rarely metastasize.
Abstract: Michael Zeschnigk and colleagues identify recurrent somatic mutations of EIF1AX and SF3B1 in uveal melanomas with disomy 3. The EIF1AX mutations specifically alter the N-terminal tail of the protein and were found exclusively in tumors lacking SF3B1 mutations.

407 citations

Posted ContentDOI
02 Nov 2016-bioRxiv
TL;DR: WhatsHap is a production-ready tool for highly accurate read-based phasing that was designed from the beginning to leverage third-generation sequencing technologies, whose long reads can span many variants and are therefore ideal for phasing.
Abstract: Read-based phasing reconstructs the haplotype structure of a sample purely from sequencing reads. While phasing is a required step for answering questions about population genetics and compound heterozygosity, and for aiding clinical decision making, accurate, usable and standards-based software has been lacking. WhatsHap is a production-ready tool for highly accurate read-based phasing. It was designed from the beginning to leverage third-generation sequencing technologies, whose long reads can span many variants and are therefore ideal for phasing. WhatsHap also works well with second-generation data, is easy to use and will phase not only SNVs, but also indels and other variants. It is unique in its ability to combine read-based with genetic phasing, which further improves accuracy if multiple related samples are provided.

230 citations

Journal ArticleDOI
TL;DR: Already available approaches to construct and use pan-genomes are examined, the potential benefits of future technologies and methodologies are discussed, and open challenges from the vantage point of the above-mentioned biological disciplines are reviewed.
Abstract: Many disciplines, from human genetics and oncology to plant breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines will not be sufficient for leveraging the full potential of such rich genomic data sets. Instead, novel, qualitatively different computational methods and paradigms are needed. We will witness the rapid extension of computational pan-genomics, a new sub-area of research in computational biology. In this article, we generalize existing definitions and understand a pan-genome as any collection of genomic sequences to be analyzed jointly or to be used as a reference. We examine already available approaches to construct and use pan-genomes, discuss the potential benefits of future technologies and methodologies and review open challenges from the vantage point of the above-mentioned biological disciplines. As a prominent example for a computational paradigm shift, we particularly highlight the transition from the representation of reference genomes as strings to representations as graphs. We outline how this and other challenges from different application domains translate into common computational problems, point out relevant bioinformatics techniques and identify open problems in computer science. With this review, we aim to increase awareness that a joint approach to computational pan-genomics can help address many of the problems currently faced in various domains.

220 citations

Journal ArticleDOI
TL;DR: It is shown that mutations in ARID1B are the main cause of CSS, accounting for 76% of identified mutations, and proposed genotype-phenotype correlations are important for molecular screening strategies.
Abstract: Chromatin remodeling complexes are known to modify chemical marks on histones or to induce conformational changes in the chromatin in order to regulate transcription. De novo dominant mutations in different members of the SWI/SNF chromatin remodeling complex have recently been described in individuals with Coffin-Siris (CSS) and Nicolaides-Baraitser (NCBRS) syndromes. Using a combination of whole-exome sequencing, NGS-based sequencing of 23 SWI/SNF complex genes, and molecular karyotyping in 46 previously undescribed individuals with CSS and NCBRS, we identified a de novo 1-bp deletion (c.677delG, p.Gly226Glufs*53) and a de novo missense mutation (c.914G>T, p.Cys305Phe) in PHF6 in two individuals diagnosed with CSS. PHF6 interacts with the nucleosome remodeling and deacetylation (NuRD) complex implicating dysfunction of a second chromatin remodeling complex in the pathogenesis of CSS-like phenotypes. Altogether, we identified mutations in 60% of the studied individuals (28/46), located in the genes ARID1A, ARID1B, SMARCB1, SMARCE1, SMARCA2, and PHF6. We show that mutations in ARID1B are the main cause of CSS, accounting for 76% of identified mutations. ARID1B and SMARCB1 mutations were also found in individuals with the initial diagnosis of NCBRS. These individuals apparently belong to a small subset who display an intermediate CSS/NCBRS phenotype. Our proposed genotype-phenotype correlations are important for molecular screening strategies.

181 citations


Cited by
Journal ArticleDOI
TL;DR: Trimmomatic is developed as a more flexible and efficient preprocessing tool that correctly handles paired-end data, and is shown to produce output that is at least competitive with, and in many cases superior to, that produced by other tools, in all scenarios tested.
Abstract: Motivation: Although many next-generation sequencing (NGS) read preprocessing tools already existed, we could not find any tool or combination of tools that met our requirements in terms of flexibility, correct handling of paired-end data and high performance. We have developed Trimmomatic as a more flexible and efficient preprocessing tool, which could correctly handle paired-end data. Results: The value of NGS read preprocessing is demonstrated for both reference-based and reference-free tasks. Trimmomatic is shown to produce output that is at least competitive with, and in many cases superior to, that produced by other tools, in all scenarios tested. Availability and implementation: Trimmomatic is licensed under GPL V3. It is cross-platform (Java 1.5+ required) and available at http://www.usadellab.org/cms/index.php?page=trimmomatic Contact: usadel@bio1.rwth-aachen.de Supplementary information: Supplementary data are available at Bioinformatics online.

39,291 citations

Journal ArticleDOI
TL;DR: The command-line tool cutadapt is developed, which supports 454, Illumina and SOLiD (color space) data, offers two adapter trimming algorithms, and has other useful features.
Abstract: When small RNA is sequenced on current sequencing machines, the resulting reads are usually longer than the RNA and therefore contain parts of the 3' adapter. That adapter must be found and removed error-tolerantly from each read before read mapping. Previous solutions are either hard to use or do not offer required features, in particular support for color space data. As an easy to use alternative, we developed the command-line tool cutadapt, which supports 454, Illumina and SOLiD (color space) data, offers two adapter trimming algorithms, and has other useful features. Cutadapt, including its MIT-licensed source code, is available for download at http://code.google.com/p/cutadapt/

20,255 citations

Journal ArticleDOI
TL;DR: Fastp is developed as an ultra-fast FASTQ preprocessor with useful quality control and data-filtering features that can perform quality control, adapter trimming, quality filtering, per-read quality pruning and many other operations with a single scan of the FASTQ data.
Abstract: Motivation: Quality control and preprocessing of FASTQ files are essential to providing clean data for downstream analysis. Traditionally, a different tool is used for each operation, such as quality control, adapter trimming and quality filtering. These tools are often insufficiently fast as most are developed using high-level programming languages (e.g. Python and Java) and provide limited multi-threading support. Reading and loading data multiple times also renders preprocessing slow and I/O inefficient. Results: We developed fastp as an ultra-fast FASTQ preprocessor with useful quality control and data-filtering features. It can perform quality control, adapter trimming, quality filtering, per-read quality pruning and many other operations with a single scan of the FASTQ data. This tool is developed in C++ and has multi-threading support. Based on our evaluation, fastp is 2-5 times faster than other FASTQ preprocessing tools such as Trimmomatic or Cutadapt despite performing far more operations than similar tools. Availability and implementation: The open-source code and corresponding instructions are available at https://github.com/OpenGene/fastp.

7,461 citations

Journal ArticleDOI
TL;DR: This protocol provides a workflow for genome-independent transcriptome analysis leveraging the Trinity platform and presents Trinity-supported companion utilities for downstream applications, including RSEM for transcript abundance estimation, R/Bioconductor packages for identifying differentially expressed transcripts across samples and approaches to identify protein-coding genes.
Abstract: De novo assembly of RNA-seq data enables researchers to study transcriptomes without the need for a genome sequence; this approach can be usefully applied, for instance, in research on 'non-model organisms' of ecological and evolutionary importance, cancer samples or the microbiome. In this protocol we describe the use of the Trinity platform for de novo transcriptome assembly from RNA-seq data in non-model organisms. We also present Trinity-supported companion utilities for downstream applications, including RSEM for transcript abundance estimation, R/Bioconductor packages for identifying differentially expressed transcripts across samples and approaches to identify protein-coding genes. In the procedure, we provide a workflow for genome-independent transcriptome analysis leveraging the Trinity platform. The software, documentation and demonstrations are freely available from http://trinityrnaseq.sourceforge.net. The run time of this protocol is highly dependent on the size and complexity of data to be analyzed. The example data set analyzed in the procedure detailed herein can be processed in less than 5 h.

6,369 citations

01 Aug 2000
TL;DR: A Bioentrepreneur course on the assessment of medical technology in the context of commercialization, covering many issues unique to biomedical products.
Abstract: BIOE 402. Medical Technology Assessment. 2 or 3 hours. Bioentrepreneur course. Assessment of medical technology in the context of commercialization. Objectives, competition, market share, funding, pricing, manufacturing, growth, and intellectual property; many issues unique to biomedical products. Course Information: 2 undergraduate hours. 3 graduate hours. Prerequisite(s): Junior standing or above and consent of the instructor.

4,833 citations