PHYLOViZ 2.0: providing scalable data integration and visualization for multiple phylogenetic inference methods

doi:10.1093/BIOINFORMATICS/BTW582

Phylogenetics


Marta Nascimento(1,2), Adriano Sousa(3), Mário Ramirez(4), Alexandre P. Francisco(1,2,†), João A. Carriço(4,†) and Cátia Vaz(1,3,†,*)

(1) INESC-ID, 1000-029 Lisboa, Portugal

(2) Instituto Superior Técnico, Universidade de Lisboa, 1049-001 Lisboa, Portugal

(3) Instituto Superior de Engenharia de Lisboa, 1959-007 Lisboa, Portugal

(4) Instituto de Microbiologia, Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa, 1649-028 Lisboa, Portugal

*To whom correspondence should be addressed.

†The authors wish it to be known that, in their opinion, the last three authors should be regarded as Joint Last Authors.

Associate Editor: Janet Kelso

Received on April 13, 2016; revised on July 18, 2016; accepted on September 2, 2016

Abstract

Summary: High Throughput Sequencing provides a cost-effective means of generating high-resolution data for hundreds or even thousands of strains, and is rapidly superseding methodologies based on a few genomic loci. The wealth of genomic data deposited in public databases such as the Sequence Read Archive/European Nucleotide Archive provides a powerful resource for evolutionary analysis and epidemiological surveillance. However, many of the analysis tools currently available do not scale well to these large datasets, nor do they provide the means to fully integrate ancillary data. Here we present PHYLOViZ 2.0, an extension of the PHYLOViZ tool, a platform-independent Java tool that allows phylogenetic inference and data visualization for large datasets of sequence-based typing methods, including Single Nucleotide Polymorphism (SNP) and whole genome/core genome Multilocus Sequence Typing (wg/cgMLST) analysis. PHYLOViZ 2.0 incorporates new data analysis algorithms and new visualization modules, as well as the capability of saving projects for subsequent work or for dissemination of results.

Availability and Implementation: http://www.phyloviz.net/ (licensed under GPLv3).

Contact: cvaz@inesc-id.pt

Supplementary information: Supplementary data are available at Bioinformatics online.

1 Introduction

DNA sequencing facilitated obtaining comparable and reproducible microbial typing data, effectively replacing other molecular and phenotypic techniques. Although single-gene approaches exist, MultiLocus Sequence Typing (MLST) (Maiden et al., 2013; Spratt, 1999) remains the most popular. While traditional MLST relies on a few genes for discriminating bacterial isolates, the advent of High-Throughput Sequencing (HTS) allowed analyzing thousands of loci and is being presented as the ultimate tool for defining bacterial clones with application in clinical settings (Carriço et al., 2013; Maiden et al., 2013). On the other hand, traditional algorithms and software are proving inadequate to handle the increase in the number of loci and strains to be analyzed, which is going from tens to thousands given the decrease in HTS costs.

Several tools are available for data analysis and visualization of microbial typing data, such as START (Jolley et al., 2001), eBURST (Feil et al., 2004) and goeBURST (Francisco et al., 2009). However, these tools lack the capacity to integrate ancillary epidemiological data. Several well-known tools (Suderman and Hallett, 2007) allow network visualization and data integration. However, these are of generic use and are not specifically directed towards population or evolutionary analysis, depending on other software for inferring trees and requiring non-trivial customization.

Bioinformatics, 33(1), 2017, 128–129. doi: 10.1093/bioinformatics/btw582. Advance Access Publication Date: 6 September 2016. Applications Note.

PHYLOViZ (Francisco et al., 2012) is a flexible and expandable plugin-based tool, able to handle large datasets, both in number of analyzed samples and loci. PHYLOViZ includes data integration and visualization capabilities for molecular epidemiological data (such as SNP or cg/wgMLST data), allowing a visual analytics approach. However, the original version lacked important features such as storing data manipulations (projects) for sharing or later work. Moreover, it offered limited clustering methods for the creation of trees, lacking methods widely used in microbial population and epidemiological analyses, such as Neighbor-Joining (Saitou and Nei, 1987) and Hierarchical Clustering methods (Sneath and Sokal, 1973).

We developed PHYLOViZ 2.0 to meet needs identified by users of PHYLOViZ, including new data analysis algorithms and visualization modules, namely a dendrogram view and a weighted tree view. PHYLOViZ 2.0 is also now capable of saving ongoing projects and of dynamically updating the saved projects, a time-saving feature when working with large datasets and essential for efficiently sharing results.

2 Methods

PHYLOViZ 2.0 is based on the NetBeans Platform and includes revisions of existing plugins and six new plugins (see Table 1).

PHYLOViZ now includes implementations of hierarchical clustering methods, namely methods that belong to a common class defined as Globally Closest Pair (GPC) clustering algorithms (Gronau and Shlomo, 2007). In each step of a GPC algorithm, one pair of clusters is selected such that it satisfies a criterion of minimal dissimilarity. The selected clusters are merged and the next step considers a new cluster that corresponds to their union. We have implemented four variants of GPC algorithms, namely unweighted and weighted pair group method with arithmetic mean (UPGMA and WPGMA, respectively), Single Linkage and Complete Linkage. These methods are distinguished by the dissimilarity condition considered, so we have implemented a generic method where this condition is given as a parameter. By using the lookup mechanism implemented in the NetBeans platform, dissimilarity conditions are provided as services and remain independent of the algorithm. Hence, new dissimilarity conditions can be made available through independent plugins, which may be developed and added at any time. We have also added to PHYLOViZ the Neighbor-Joining method with two different branch length estimators (Saitou and Nei, 1987; Studier and Keppler, 1988).
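The generic scheme described above, in which the dissimilarity condition is passed to a single clustering routine, can be illustrated with a minimal Python sketch. This is not the PHYLOViZ implementation (which is written in Java on the NetBeans Platform); the names `LINKAGES` and `gcp_cluster` are hypothetical, the dictionary merely stands in for a service registry, and the quadratic search for the closest pair is chosen for clarity rather than efficiency.

```python
from itertools import combinations

# Hypothetical registry of dissimilarity update rules. Each rule receives
# d(A, K), d(B, K) and the sizes of A and B, and returns d(A u B, K).
LINKAGES = {
    "single":   lambda dak, dbk, na, nb: min(dak, dbk),
    "complete": lambda dak, dbk, na, nb: max(dak, dbk),
    "upgma":    lambda dak, dbk, na, nb: (na * dak + nb * dbk) / (na + nb),
    "wpgma":    lambda dak, dbk, na, nb: (dak + dbk) / 2,
}


def gcp_cluster(dist, linkage):
    """Cluster items given a symmetric dissimilarity matrix (list of lists).

    Returns the merges in order as (cluster_a, cluster_b, height) tuples.
    Leaves are numbered 0..n-1; the cluster created by the i-th merge
    (counting from 0) gets the id n + i.
    """
    n = len(dist)
    size = {i: 1 for i in range(n)}  # active clusters and their sizes
    d = {frozenset((i, j)): dist[i][j] for i, j in combinations(range(n), 2)}
    merges = []
    next_id = n
    while len(size) > 1:
        # Select the globally closest pair among the active clusters.
        a, b = min(combinations(sorted(size), 2),
                   key=lambda pair: d[frozenset(pair)])
        height = d[frozenset((a, b))]
        # Compute dissimilarities between the union and every other cluster.
        for k in size:
            if k not in (a, b):
                d[frozenset((next_id, k))] = linkage(
                    d[frozenset((a, k))], d[frozenset((b, k))],
                    size[a], size[b])
        size[next_id] = size[a] + size[b]
        del size[a], size[b]
        merges.append((a, b, height))
        next_id += 1
    return merges
```

Calling `gcp_cluster(matrix, LINKAGES["upgma"])` and `gcp_cluster(matrix, LINKAGES["single"])` runs the same loop with different update rules, which is the separation between algorithm and dissimilarity condition that the text describes. A new rule can be added to the registry without touching the clustering routine.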

The new version includes two new visualization modules: a dendrogram view and a weighted tree view. In order to make these modules generic, we defined different types of abstractions associated with each visualization, including specific JSON schemas. The main goal is to allow reusing visualizations by other plugins. In order to visually emphasize the hierarchical groups within some predefined distance, we also implemented a visual cutoff functionality. Together with the ability to integrate ancillary data, this provides a unique and user-friendly platform for visual analytics in microbial typing and phylogenetics.
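As an illustration of what a distance cutoff on a dendrogram computes, the following Python sketch groups leaves that are joined at or below a chosen height. It is only a sketch under stated assumptions, not the PHYLOViZ code: the function name `groups_at_cutoff` and the merge-list representation (tuples of two cluster ids and a merge height, with the i-th merge creating cluster id n_leaves + i) are hypothetical, and merge heights are assumed to be non-decreasing, as they are for UPGMA, WPGMA, Single Linkage and Complete Linkage.

```python
def groups_at_cutoff(n_leaves, merges, cutoff):
    """Return the groups of leaves joined at a height <= cutoff.

    merges: list of (cluster_a, cluster_b, height); the i-th merge creates
    the cluster with id n_leaves + i (a hypothetical representation).
    """
    parent = list(range(n_leaves + len(merges)))

    def find(x):
        # Follow parent links to the representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for step, (a, b, height) in enumerate(merges):
        if height <= cutoff:
            new_id = n_leaves + step
            parent[find(a)] = new_id
            parent[find(b)] = new_id

    groups = {}
    for leaf in range(n_leaves):
        groups.setdefault(find(leaf), []).append(leaf)
    return sorted(groups.values())
```

For example, with four leaves and merges at heights 2, 4 and 5, a cutoff of 3 keeps only the first merge, so two leaves form one group and the other two remain singletons. A viewer could then color or collapse each returned group.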

In the previous PHYLOViZ version, users could analyze and visualize their data and export results in several commonly used graphic formats. However, it was not possible to save studies for subsequent work or for sharing with others. This limitation is of particular importance when working with large datasets, for which running algorithms and optimizing visualizations can take considerable time. In version 2.0 it is possible to save each study as a project. Each project includes the data under analysis, results of inference algorithms, visualization serializations and related graphical layout customizations.

Tutorial screencasts on the new functionalities of PHYLOViZ 2.0 are available at http://www.phyloviz.net/tutorials.html, and its documentation is available at http://phyloviz.readthedocs.org. An example of the visualizations of the novel algorithms can be found in the supplementary material.

Funding

This work was partially supported by the Fundação para a Ciência e a Tecnologia [EXCL/EEI-ESS/0257/2012, UID/CEC/50021/2013].

Conflict of Interest: none declared.

References

Carriço,J.A. et al. (2013) Bioinformatics in bacterial molecular epidemiology and public health: databases, tools and the next-generation sequencing revolution. Euro Surv., 18, 20382.

Feil,E.J. et al. (2004) eBURST: inferring patterns of evolutionary descent among clusters of related bacterial genotypes from multilocus sequence typing data. J. Bacteriol., 186, 1518–1530.

Francisco,A.P. et al. (2009) Global optimal eBURST analysis of multilocus typing data using a graphic matroid approach. BMC Bioinformatics, 10, 152.

Francisco,A.P. et al. (2012) PHYLOViZ: phylogenetic inference and data visualization for sequence based typing methods. BMC Bioinformatics, 13, 87.

Gronau,I. and Shlomo,M. (2007) Optimal implementations of UPGMA and other common clustering algorithms. Inf. Process. Lett., 104, 205–210.

Jolley,K.A. et al. (2001) Sequence type analysis and recombinational tests (START). Bioinformatics, 17, 1230–1231.

Maiden,M.C.J. et al. (2013) MLST revisited: the gene-by-gene approach to bacterial genomics. Nat. Rev. Microbiol., 11, 728–736.

Saitou,N. and Nei,M. (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol., 4, 406–425.

Sneath,P.H.A. and Sokal,R.R. (1973) Numerical Taxonomy: The Principles and Practice of Numerical Classification, 1st ed., W.H. Freeman and Co., San Francisco, 573.

Spratt,B.G. (1999) Multilocus sequence typing: molecular typing of bacterial pathogens in an era of rapid DNA sequencing and the internet. Curr. Opin. Microbiol., 2, 312–316.

Studier,J. and Keppler,K. (1988) A note on the neighbor-joining algorithm of Saitou and Nei. Mol. Biol. Evol., 5, 729–731.

Suderman,M. and Hallett,M. (2007) Tools for visually exploring biological networks. Bioinformatics, 23, 2651–2659.

Table 1. New plugins in PHYLOViZ 2.0

Plugin                    Description
Clustering Tree Viewer    Hierarchical clustering tree viewer
Dendrogram Viewer         UPGMA dendrogram/tree viewer
GPC Commons               Globally Closest Pair clustering utilities
Neighbor-Joining          Neighbor-Joining algorithm implementation
Projects                  Save and load features for Projects
UPGMA                     UPGMA algorithm implementation
