PHYLOViZ 2.0: providing scalable data integration and visualization for multiple phylogenetic inference methods

Abstract
Summary: High Throughput Sequencing provides a cost-effective means of generating high-resolution data for hundreds or even thousands of strains, and is rapidly superseding methodologies based on a few genomic loci. The wealth of genomic data deposited in public databases such as the Sequence Read Archive/European Nucleotide Archive provides a powerful resource for evolutionary analysis and epidemiological surveillance. However, many of the analysis tools currently available do not scale well to these large datasets, nor do they provide the means to fully integrate ancillary data. Here we present PHYLOViZ 2.0, an extension of the PHYLOViZ tool, a platform-independent Java tool that allows phylogenetic inference and data visualization for large datasets of sequence-based typing methods, including Single Nucleotide Polymorphism (SNP) and whole genome/core genome Multilocus Sequence Typing (wg/cgMLST) analysis. PHYLOViZ 2.0 incorporates new data analysis algorithms and new visualization modules, as well as the capability of saving projects for subsequent work or for dissemination of results.
Availability and Implementation: http://www.phyloviz.net/ (licensed under GPLv3).
Contact: cvaz@inesc-id.pt
Supplementary information: Supplementary data are available at Bioinformatics online.


Phylogenetics
Marta Nascimento¹,², Adriano Sousa³, Mário Ramirez⁴, Alexandre P. Francisco¹,²,†, João A. Carriço⁴,† and Cátia Vaz¹,³,†,*
¹INESC-ID, 1000-029 Lisboa, Portugal
²Instituto Superior Técnico, Universidade de Lisboa, 1049-001 Lisboa, Portugal
³Instituto Superior de Engenharia de Lisboa, 1959-007 Lisboa, Portugal
⁴Instituto de Microbiologia, Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa, 1649-028 Lisboa, Portugal
*To whom correspondence should be addressed.
†The authors wish it to be known that, in their opinion, the last 3 authors should be regarded as Joint Last Authors.
Associate Editor: Janet Kelso
Received on April 13, 2016; revised on July 18, 2016; accepted on September 2, 2016
1 Introduction
DNA sequencing has made it possible to obtain comparable and reproducible microbial typing data, effectively replacing other molecular and phenotypic techniques. Although single-gene approaches exist, MultiLocus Sequence Typing (MLST) (Maiden et al., 2013; Spratt, 1999) remains the most popular. While traditional MLST relies on a few genes to discriminate bacterial isolates, the advent of High-Throughput Sequencing (HTS) has allowed thousands of loci to be analyzed, and HTS is being presented as the ultimate tool for defining bacterial clones, with application in clinical settings (Carriço et al., 2013; Maiden et al., 2013). However, traditional algorithms and software are proving inadequate to handle the increase in the number of loci and strains to be analyzed, which is growing from tens to thousands as HTS costs decrease.
Several tools are available for the analysis and visualization of microbial typing data, such as START (Jolley et al., 2001), eBURST (Feil et al., 2004) and goeBURST (Francisco et al., 2009). However, these tools lack the capacity to integrate ancillary epidemiological data. Several well-known tools (Suderman and Hallett, 2007) allow network visualization and data integration. However, these are generic and not specifically directed towards population or evolutionary analysis; they depend on other software for inferring trees and require non-trivial customization.
© The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
Bioinformatics, 33(1), 2017, 128–129
doi: 10.1093/bioinformatics/btw582
Advance Access Publication Date: 6 September 2016
Applications Note
PHYLOViZ (Francisco et al., 2012) is a flexible and expandable plugin-based tool, able to handle datasets that are large in both the number of analyzed samples and the number of loci. PHYLOViZ includes data integration and visualization capabilities for molecular epidemiological data (such as SNP or cg/wgMLST data), allowing a visual analytics approach. However, the previous version lacked important features such as storing data manipulations (projects) for sharing or later work. Moreover, it offered limited clustering methods for the creation of trees, lacking methods widely used in microbial population and epidemiological analyses, such as Neighbor-Joining (Saitou and Nei, 1987) and Hierarchical Clustering methods (Sneath and Sokal, 1973).
We developed PHYLOViZ 2.0 to meet needs identified by users of PHYLOViZ. It includes new data analysis algorithms and visualization modules, namely a dendrogram view and a weighted tree view. PHYLOViZ 2.0 can also save ongoing projects and dynamically update saved projects, a time-saving feature when working with large datasets and essential for sharing results efficiently.
2 Methods
PHYLOViZ 2.0 is based on the NetBeans Platform and includes revisions of existing plugins and six new plugins (see Table 1).
PHYLOViZ now includes implementations of hierarchical clustering methods, namely methods that belong to a common class defined as Globally Closest Pair (GPC) clustering algorithms (Gronau and Shlomo, 2007). In each step of a GPC algorithm, the pair of clusters with minimal dissimilarity under a given criterion is selected. The selected clusters are merged, and the next step considers a new cluster that corresponds to their union. We have implemented four variants of GPC algorithms, namely the unweighted and weighted pair group methods with arithmetic mean (UPGMA and WPGMA, respectively), Single Linkage and Complete Linkage. These methods are distinguished only by the dissimilarity condition considered, so we have implemented a generic method that takes this condition as a parameter. Through the lookup mechanism of the NetBeans Platform, dissimilarity conditions are provided as services and remain independent of the algorithm. Hence, new dissimilarity conditions can be made available through independent plugins, which may be developed and added at any time. We have also added to PHYLOViZ the Neighbor-Joining method with two different branch length estimators (Saitou and Nei, 1987; Studier and Keppler, 1988).
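The idea of a single generic procedure parameterized by the dissimilarity condition can be illustrated with a minimal Python sketch. This is not the PHYLOViZ implementation (which is written in Java and uses NetBeans lookup services): the function names, the dictionary-based distance matrix and the example distances are all assumptions made for illustration, and the closest-pair search is a naive one rather than the optimized implementations discussed by Gronau and Shlomo (2007).

```python
from itertools import combinations

# Linkage rules (hypothetical names): each returns the dissimilarity between the
# merged cluster (a + b) and another cluster c, computed from d(a, c), d(b, c)
# and the sizes of a and b.
def upgma(d_ac, d_bc, n_a, n_b):
    return (n_a * d_ac + n_b * d_bc) / (n_a + n_b)

def wpgma(d_ac, d_bc, n_a, n_b):
    return (d_ac + d_bc) / 2

def single_linkage(d_ac, d_bc, n_a, n_b):
    return min(d_ac, d_bc)

def complete_linkage(d_ac, d_bc, n_a, n_b):
    return max(d_ac, d_bc)

def gcp_cluster(labels, dist, linkage):
    """Naive globally-closest-pair clustering.

    labels:  list of leaf names
    dist:    dict mapping frozenset({x, y}) to the dissimilarity of x and y
    linkage: one of the update rules above (or any function with that signature)
    Returns the merges in order as (left, right, dissimilarity at merge);
    merged clusters are represented as nested tuples.
    """
    d = dict(dist)                    # work on a copy of the input distances
    size = {x: 1 for x in labels}
    active = list(labels)
    merges = []
    while len(active) > 1:
        # Select the pair of active clusters with minimal dissimilarity.
        a, b = min(combinations(active, 2), key=lambda pair: d[frozenset(pair)])
        height = d[frozenset((a, b))]
        merged = (a, b)
        rest = [c for c in active if c != a and c != b]
        # Only this update step depends on the chosen linkage rule.
        for c in rest:
            d[frozenset((merged, c))] = linkage(
                d[frozenset((a, c))], d[frozenset((b, c))], size[a], size[b])
        size[merged] = size[a] + size[b]
        active = rest + [merged]
        merges.append((a, b, height))
    return merges

# Made-up example distances between four isolates.
labels = ["A", "B", "C", "D"]
dist = {frozenset(pair): value for pair, value in {
    ("A", "B"): 2, ("A", "C"): 4, ("B", "C"): 6,
    ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 8}.items()}

for rule in (upgma, wpgma, single_linkage, complete_linkage):
    heights = [h for _, _, h in gcp_cluster(labels, dist, rule)]
    print(rule.__name__, heights)
```

Because the loop never inspects the linkage rule itself, a new dissimilarity condition only requires a new function with the same four-argument signature, which mirrors the plugin-style extension described above.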
The new version includes two new visualization modules: a dendrogram view and a weighted tree view. To make these modules generic, we defined different types of abstractions associated with each visualization, including specific JSON schemas. The main goal is to allow other plugins to reuse the visualizations. To visually emphasize the hierarchical groups within some predefined distance, we also implemented a visual cutoff functionality. Together with the ability to integrate ancillary data, this provides a unique and user-friendly platform for visual analytics in microbial typing and phylogenetics.
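As a rough illustration of a distance cutoff on a dendrogram, the Python sketch below groups the leaves whose merges all occurred at or below a chosen distance. The nested-dictionary layout (keys "name", "height" and "children") is a hypothetical stand-in and not the JSON schema actually used by PHYLOViZ 2.0; the sketch also assumes that merge heights never decrease towards the root, as is the case for the hierarchical methods named in the text.

```python
import json

# Hypothetical dendrogram encoding: a leaf has a "name"; an internal node has
# the "height" (dissimilarity) at which its "children" were merged.
tree = {
    "height": 8.0,
    "children": [
        {"name": "D"},
        {"height": 4.0, "children": [
            {"name": "C"},
            {"height": 2.0, "children": [{"name": "A"}, {"name": "B"}]},
        ]},
    ],
}

def leaves(node):
    """List the leaf names below a node."""
    if "children" not in node:
        return [node["name"]]
    return [name for child in node["children"] for name in leaves(child)]

def cut(node, cutoff):
    """Split the tree into groups of leaves merged at height <= cutoff."""
    if "children" not in node or node["height"] <= cutoff:
        return [sorted(leaves(node))]
    return [group for child in node["children"] for group in cut(child, cutoff)]

print(cut(tree, 5.0))   # [['D'], ['A', 'B', 'C']]

# The same structure round-trips through JSON, so another component could
# reuse it without depending on the clustering code.
restored = json.loads(json.dumps(tree))
print(restored == tree)  # True
```

A viewer could then color or collapse each returned group to highlight the clusters that fall within the chosen distance.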
In the previous version of PHYLOViZ, users could analyze and visualize their data and export results in several commonly used graphic formats. However, it was not possible to save studies for subsequent work or for sharing with others. This limitation is particularly important when working with large datasets, for which running algorithms and optimizing visualizations can take considerable time. In version 2.0, each study can be saved as a project. Each project includes the data under analysis, the results of inference algorithms, visualization serializations and related graphical layout customizations.
Tutorial screencasts on the new functionalities of PHYLOViZ 2.0 are available at http://www.phyloviz.net/tutorials.html, and its documentation is available at http://phyloviz.readthedocs.org. An example of the visualizations of the novel algorithms can be found in the supplementary material.
Funding
This work was partially supported by the Fundação para a Ciência e a Tecnologia [EXCL/EEI-ESS/0257/2012, UID/CEC/50021/2013].
Conflict of Interest: none declared.
References
Carriço,J.A. et al. (2013) Bioinformatics in bacterial molecular epidemiology and public health: databases, tools and the next-generation sequencing revolution. Euro Surv., 18, 20382.
Feil,E.J. et al. (2004) eBURST: inferring patterns of evolutionary descent among clusters of related bacterial genotypes from multilocus sequence typing data. J. Bacteriol., 186, 1518–1530.
Francisco,A.P. et al. (2009) Global optimal eBURST analysis of multilocus typing data using a graphic matroid approach. BMC Bioinformatics, 10, 152.
Francisco,A.P. et al. (2012) PHYLOViZ: phylogenetic inference and data visualization for sequence based typing methods. BMC Bioinformatics, 13, 87.
Gronau,I. and Shlomo,M. (2007) Optimal implementations of UPGMA and other common clustering algorithms. Inf. Process. Lett., 104, 205–210.
Jolley,K.A. et al. (2001) Sequence type analysis and recombinational tests (START). Bioinformatics, 17, 1230–1231.
Maiden,M.C.J. et al. (2013) MLST revisited: the gene-by-gene approach to bacterial genomics. Nat. Rev. Microbiol., 11, 728–736.
Saitou,N. and Nei,M. (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol., 4, 406–425.
Sneath,P.H.A. and Sokal,R.R. (1973) Numerical Taxonomy: The Principles and Practice of Numerical Classification, 1st ed., WF Freeman and Co., San Francisco, 573.
Spratt,B.G. (1999) Multilocus sequence typing: molecular typing of bacterial pathogens in an era of rapid DNA sequencing and the internet. Curr. Opin. Microbiol., 2, 312–316.
Studier,J. and Keppler,K. (1988) A note on the neighbor-joining algorithm of Saitou and Nei. Mol. Biol. Evol., 5, 729–731.
Suderman,M. and Hallett,M. (2007) Tools for visually exploring biological networks. Bioinformatics, 23, 2651–2659.
Table 1. New plugins in PHYLOViZ 2.0
Plugin                    Description
Clustering Tree Viewer    Hierarchical clustering tree viewer
Dendrogram Viewer         UPGMA dendrogram/tree viewer
GPC Commons               Globally Closest Pair clustering utilities
Neighbor-Joining          Neighbor-Joining algorithm implementation
Projects                  Save and load features for Projects
UPGMA                     UPGMA algorithm implementation