Open Access Journal Article

Computational pan-genomics: status, promises and challenges.

Tobias Marschall, +61 more
01 Jan 2018, Vol. 19, Iss. 1, pp. 118-135
TLDR
Already available approaches to construct and use pan-genomes are examined, the potential benefits of future technologies and methodologies are discussed, and open challenges from the vantage point of the above-mentioned biological disciplines are reviewed.
Abstract
Many disciplines, from human genetics and oncology to plant breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In the case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines will not be sufficient for leveraging the full potential of such rich genomic data sets. Instead, novel, qualitatively different computational methods and paradigms are needed. We will witness the rapid extension of computational pan-genomics, a new sub-area of research in computational biology. In this article, we generalize existing definitions and understand a pan-genome as any collection of genomic sequences to be analyzed jointly or to be used as a reference. We examine already available approaches to construct and use pan-genomes, discuss the potential benefits of future technologies and methodologies and review open challenges from the vantage point of the above-mentioned biological disciplines. As a prominent example of a computational paradigm shift, we particularly highlight the transition from the representation of reference genomes as strings to representations as graphs. We outline how this and other challenges from different application domains translate into common computational problems, point out relevant bioinformatics techniques and identify open problems in computer science. With this review, we aim to increase awareness that a joint approach to computational pan-genomics can help address many of the problems currently faced in various domains.
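
The shift from string references to graph references mentioned in the abstract can be illustrated with a toy example. The sketch below is not taken from the article: the class name SequenceGraph, its methods, and the example sequences are all hypothetical, chosen only to show how one graph can hold a reference sequence and a single-base variant as two paths through shared nodes. Real pan-genome graph tools use far more elaborate models (for example, strand-aware edges and compressed indexes).

```python
from dataclasses import dataclass, field


@dataclass
class SequenceGraph:
    """Toy sequence graph (hypothetical): nodes carry DNA strings, directed edges join them."""
    nodes: dict = field(default_factory=dict)   # node id -> sequence
    edges: dict = field(default_factory=dict)   # node id -> set of successor ids

    def add_node(self, node_id: int, sequence: str) -> None:
        self.nodes[node_id] = sequence
        self.edges.setdefault(node_id, set())

    def add_edge(self, src: int, dst: int) -> None:
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges[src].add(dst)

    def spell(self, path: list) -> str:
        """Concatenate node sequences along a path, checking that each step is an edge."""
        for src, dst in zip(path, path[1:]):
            if dst not in self.edges[src]:
                raise ValueError(f"no edge {src} -> {dst}")
        return "".join(self.nodes[n] for n in path)

    def haplotypes(self, source: int, sink: int) -> list:
        """List every sequence spelled by a source-to-sink walk (acyclic graphs only)."""
        if source == sink:
            return [self.nodes[sink]]
        result = []
        for nxt in sorted(self.edges[source]):
            for suffix in self.haplotypes(nxt, sink):
                result.append(self.nodes[source] + suffix)
        return result


# Illustrative data: a shared prefix and suffix, with a T/A single-base bubble between them.
graph = SequenceGraph()
for node_id, seq in [(1, "ACG"), (2, "T"), (3, "A"), (4, "TGCA")]:
    graph.add_node(node_id, seq)
for src, dst in [(1, 2), (1, 3), (2, 4), (3, 4)]:
    graph.add_edge(src, dst)

reference = graph.spell([1, 2, 4])   # the linear "string" reference as one path
alternate = graph.spell([1, 3, 4])   # a second genome sharing nodes 1 and 4
all_haplotypes = graph.haplotypes(1, 4)
```

In this sketch a linear reference is just one path among several, so adding another genome extends the graph rather than replacing the reference string.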

