Author

Cátia Vaz

Bio: Cátia Vaz is an academic researcher from the Instituto Superior de Engenharia de Lisboa. The author has contributed to research on topics including Web services and process calculi. The author has an h-index of 9 and has co-authored 21 publications that have received 1,080 citations. Previous affiliations of Cátia Vaz include the Instituto Politécnico Nacional and INESC-ID.

Papers
Journal Article
TL;DR: PHYLOViZ is platform-independent Java software that allows the integrated analysis of sequence-based typing methods, including SNP data generated from whole genome sequence approaches, and associated epidemiological data.
Abstract: With the decrease in DNA sequencing costs, sequence-based typing methods are rapidly becoming the gold standard for epidemiological surveillance. These methods provide the reproducible and comparable results needed for global-scale bacterial population analysis, while retaining their usefulness for local epidemiological surveys. Online databases that collect the generated allelic profiles and associated epidemiological data are available, but this wealth of data remains underused and is frequently poorly annotated, since no user-friendly tool exists to analyze and explore it. PHYLOViZ is platform-independent Java software that allows the integrated analysis of sequence-based typing methods, including SNP data generated from whole genome sequence approaches, and associated epidemiological data. goeBURST and its Minimum Spanning Tree expansion are used to visualize the possible evolutionary relationships between isolates. The results can be displayed as an annotated graph overlaid with the query results of any other available epidemiological data. PHYLOViZ is user-friendly software that allows the combined analysis of multiple data sources for microbial epidemiological and population studies. It is freely available at http://www.phyloviz.net.
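The abstract names goeBURST and its Minimum Spanning Tree (MST) expansion as the way relationships between isolates are visualised. As a rough illustration of the underlying idea only, the sketch below builds a plain Kruskal MST over allelic profiles, using the number of differing loci as the distance. It assumes complete profiles of equal length and breaks ties simply by isolate identifier; goeBURST applies its own tie-breaking rules, which are not reproduced here. The names `hamming` and `kruskal_mst` and the example profiles are hypothetical.

```python
from itertools import combinations


def hamming(p, q):
    """Number of loci at which two equal-length allelic profiles differ."""
    return sum(a != b for a, b in zip(p, q))


def kruskal_mst(profiles):
    """Kruskal MST over {isolate_id: profile}; returns (dist, u, v) edges.

    Ties are broken only by isolate identifiers (hypothetical choice);
    goeBURST's own tie-breaking rules are NOT implemented here.
    """
    parent = {i: i for i in profiles}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # All pairwise edges, sorted by distance, then by ids.
    edges = sorted(
        (hamming(profiles[u], profiles[v]), u, v)
        for u, v in combinations(sorted(profiles), 2)
    )
    tree = []
    for d, u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:  # edge joins two components: keep it
            parent[ru] = rv
            tree.append((d, u, v))
    return tree


# Hypothetical 7-locus profiles.
profiles = {
    "ST1": (1, 1, 1, 1, 1, 1, 1),
    "ST2": (1, 1, 1, 1, 1, 1, 2),
    "ST3": (1, 1, 1, 1, 1, 2, 2),
    "ST4": (3, 3, 3, 1, 1, 2, 2),
}
print(kruskal_mst(profiles))
# [(1, 'ST1', 'ST2'), (1, 'ST2', 'ST3'), (3, 'ST3', 'ST4')]
```

Computing all pairwise distances is quadratic in the number of isolates, which is fine for a sketch but is exactly the cost that larger-scale tools try to manage.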

452 citations

Journal Article
TL;DR: GrapeTree is a stand-alone package for investigating phylogenetic trees plus associated metadata and is also integrated into EnteroBase to facilitate cutting-edge navigation of genomic relationships among bacterial pathogens.
Abstract: Current methods struggle to reconstruct and visualize the genomic relationships of large numbers of bacterial genomes. GrapeTree facilitates the analyses of large numbers of allelic profiles by a static "GrapeTree Layout" algorithm that supports interactive visualizations of large trees within a web browser window. GrapeTree also implements a novel minimum spanning tree algorithm (MSTree V2) to reconstruct genetic relationships despite high levels of missing data. GrapeTree is a stand-alone package for investigating phylogenetic trees plus associated metadata and is also integrated into EnteroBase to facilitate cutting-edge navigation of genomic relationships among bacterial pathogens.
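Missing loci are a practical obstacle when comparing core- or whole-genome allelic profiles. The sketch below shows one common convention for a missing-data-tolerant distance: count allele differences only at loci called in both profiles, optionally rescaled to the full number of loci. This is an assumption for illustration, not the MSTree V2 algorithm, whose own treatment of missing data is described in the GrapeTree paper. `None` marks an uncalled locus, and `allelic_distance` is a hypothetical helper.

```python
from typing import Optional, Sequence


def allelic_distance(p: Sequence[Optional[int]],
                     q: Sequence[Optional[int]],
                     rescale: bool = True) -> float:
    """Allele differences between two profiles, ignoring missing loci.

    Assumption: None marks an uncalled locus. Only loci called in both
    profiles are compared; with rescale=True the count is scaled up to the
    full profile length, so pairs with many missing calls do not look
    artificially close.
    """
    if len(p) != len(q):
        raise ValueError("profiles must cover the same loci")
    shared = [(a, b) for a, b in zip(p, q) if a is not None and b is not None]
    if not shared:
        return float("inf")  # no locus can be compared
    diffs = sum(a != b for a, b in shared)
    return diffs * len(p) / len(shared) if rescale else float(diffs)


# Two hypothetical 4-locus profiles with one missing call each.
print(allelic_distance([1, 2, None, 4], [1, 3, 3, None]))  # 2.0
```

Rescaling is a design choice: it keeps distances comparable across pairs with different amounts of missing data, at the cost of extrapolating from the loci that were actually compared.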

448 citations

Journal Article
TL;DR: PHYLOViZ 2.0 is an extension of the PHYLOViZ tool, a platform-independent Java tool that allows phylogenetic inference and data visualization for large datasets of sequence-based typing methods, including Single Nucleotide Polymorphism (SNP) and whole genome/core genome Multilocus Sequence Typing (wg/cgMLST) analysis.
Abstract: Summary: High-throughput sequencing provides a cost-effective means of generating high-resolution data for hundreds or even thousands of strains, and is rapidly superseding methodologies based on a few genomic loci. The wealth of genomic data deposited in public databases such as the Sequence Read Archive/European Nucleotide Archive provides a powerful resource for evolutionary analysis and epidemiological surveillance. However, many of the analysis tools currently available neither scale well to these large datasets nor provide the means to fully integrate ancillary data. Here we present PHYLOViZ 2.0, an extension of the PHYLOViZ tool, a platform-independent Java tool that allows phylogenetic inference and data visualization for large datasets of sequence-based typing methods, including Single Nucleotide Polymorphism (SNP) and whole genome/core genome Multilocus Sequence Typing (wg/cgMLST) analysis. PHYLOViZ 2.0 incorporates new data analysis algorithms and new visualization modules, as well as the capability of saving projects for subsequent work or for dissemination of results. Availability and Implementation: http://www.phyloviz.net/ (licensed under GPLv3). Contact: cvaz@inesc-id.pt. Supplementary information: Supplementary data are available at Bioinformatics online.

257 citations

Posted Content
09 Nov 2017, bioRxiv
TL;DR: GrapeTree implements a novel minimum spanning tree algorithm to reconstruct genetic relationships despite missing data, together with a static “GrapeTree Layout” algorithm to render interactive visualisations of large trees.
Abstract: Current methods struggle to reconstruct and visualise the genomic relationships of ≥100,000 bacterial genomes. GrapeTree facilitates the analyses of allelic profiles from tens of thousands of core genomes within a web browser window. GrapeTree implements a novel minimum spanning tree algorithm to reconstruct genetic relationships despite missing data, together with a static “GrapeTree Layout” algorithm to render interactive visualisations of large trees. GrapeTree is a stand-alone package for investigating Newick trees plus associated metadata and is also integrated into EnteroBase to facilitate cutting-edge navigation of genomic relationships among >160,000 genomes from bacterial pathogens. The GrapeTree package was released under the GPL v3.0 Licence.

170 citations

Journal Article
TL;DR: PHYLOViZ Online offers a RESTful API for programmatic access to data and algorithms, allowing it to be seamlessly integrated into any third-party web service or software.
Abstract: High-throughput sequencing methods have generated allele and single nucleotide polymorphism information for thousands of bacterial strains that is publicly available in online repositories, and have made it possible to generate similar information for hundreds to thousands more strains in a single study. Minimum spanning tree analysis of allelic data offers a scalable and reproducible methodological alternative to traditional phylogenetic inference approaches, useful in epidemiological investigations and population studies of bacterial pathogens. PHYLOViZ Online was developed to allow users to perform these analyses without installing software and to enable easy access to, and sharing of, data and analysis results from any Internet-enabled computer. PHYLOViZ Online also offers a RESTful API for programmatic access to data and algorithms, allowing it to be seamlessly integrated into any third-party web service or software. PHYLOViZ Online is freely available at https://online.phyloviz.net.

113 citations


Cited by
01 Jun 2012
TL;DR: SPAdes is a new assembler for both single-cell and standard (multicell) assembly that improves on the recently released E+V-SC assembler and on the popular assemblers Velvet and SoapDeNovo (for multicell data).
Abstract: The lion's share of bacteria in various environments cannot be cloned in the laboratory and thus cannot be sequenced using existing technologies. A major goal of single-cell genomics is to complement gene-centric metagenomic data with whole-genome assemblies of uncultivated organisms. Assembly of single-cell data is challenging because of highly non-uniform read coverage as well as elevated levels of sequencing errors and chimeric reads. We describe SPAdes, a new assembler for both single-cell and standard (multicell) assembly, and demonstrate that it improves on the recently released E+V-SC assembler (specialized for single-cell data) and on popular assemblers Velvet and SoapDeNovo (for multicell data). SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies. SPAdes is available online ( http://bioinf.spbau.ru/spades ). It is distributed as open source software.

10,124 citations

Journal Article

3,734 citations

Journal Article
TL;DR: FastTree uses sequence profiles of internal nodes in the tree to implement neighbor-joining, uses heuristics to quickly identify candidate joins, and then uses nearest-neighbor interchanges to reduce the length of the tree.
Abstract: Gene families are growing rapidly, but standard methods for inferring phylogenies do not scale to alignments with over 10,000 sequences. We present FastTree, a method for constructing large phylogenies and for estimating their reliability. Instead of storing a distance matrix, FastTree stores sequence profiles of internal nodes in the tree. FastTree uses these profiles to implement neighbor-joining and uses heuristics to quickly identify candidate joins. FastTree then uses nearest-neighbor interchanges to reduce the length of the tree. For an alignment with N sequences, L sites, and an alphabet of size a, a distance matrix requires O(N^2) space and O(N^2 L) time, but FastTree requires just O(NLa + N sqrt(N)) memory and O(N sqrt(N) log(N) L a) time. To estimate the tree's reliability, FastTree uses local bootstrapping, which gives another 100-fold speedup over a distance matrix. For example, FastTree computed a tree and support values for 158,022 distinct 16S ribosomal RNAs in 17 hours and 2.4 gigabytes of memory. Just computing pairwise Jukes-Cantor distances and storing them, without inferring a tree or bootstrapping, would require 17 hours and 50 gigabytes of memory. In simulations, FastTree was slightly more accurate than neighbor joining, BIONJ, or FastME; on genuine alignments, FastTree's topologies had higher likelihoods. FastTree is available at http://microbesonline.org/fasttree.
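The Jukes-Cantor comparison in the FastTree abstract can be made concrete. The sketch computes the JC69 distance, d = -3/4 ln(1 - 4p/3), where p is the proportion of differing sites, and estimates the memory a full pairwise matrix would need. Assuming 4-byte floating-point entries and storing only one triangle of the symmetric matrix, 158,022 sequences need roughly 50 GB, in line with the figure quoted in the abstract. The function names are hypothetical, and gap handling is deliberately omitted.

```python
import math


def jukes_cantor(seq1: str, seq2: str) -> float:
    """JC69 distance between two aligned nucleotide sequences.

    d = -3/4 * ln(1 - 4/3 * p), with p the proportion of differing sites.
    Simplification: every site is compared (gaps are not treated specially).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    p = sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)
    if p >= 0.75:
        return math.inf  # saturated: the correction is undefined
    return -0.75 * math.log(1 - 4 * p / 3)


def distance_matrix_bytes(n: int, bytes_per_entry: int = 4) -> int:
    """Bytes to store one triangle of an n x n distance matrix.

    Assumption: 4-byte floats, diagonal not stored.
    """
    return n * (n - 1) // 2 * bytes_per_entry


print(jukes_cantor("AAAA", "AAAT"))             # p = 0.25, d is about 0.304
print(distance_matrix_bytes(158_022) / 1e9)     # roughly 49.9 (GB)
```

The quadratic growth of `distance_matrix_bytes` is what FastTree avoids by storing O(N) profiles instead of O(N^2) distances.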

2,436 citations

Journal Article
24 Sep 2018
TL;DR: Developments in the BIGSdb software made from publication to June 2018 are described and it is shown how the platform realises microbial population genomics for a wide range of applications.
Abstract: The PubMLST.org website hosts a collection of open-access, curated databases that integrate population sequence data with provenance and phenotype information for over 100 different microbial species and genera. Although the PubMLST website was conceived as part of the development of the first multi-locus sequence typing (MLST) scheme in 1998, the software it uses, the Bacterial Isolate Genome Sequence database (BIGSdb, published in 2010), enables PubMLST to include all levels of sequence data, from single gene sequences up to and including complete, finished genomes. Here we describe developments in the BIGSdb software made from publication to June 2018 and show how the platform realises microbial population genomics for a wide range of applications. The system is based on the gene-by-gene analysis of microbial genomes, with each deposited sequence annotated and curated to identify the genes present and systematically catalogue their variation. Although the system was originally intended as a means of characterising isolates with typing schemes, the synthesis of sequences and records of genetic variation with provenance and phenotype data provides a highly scalable means (whole genome sequence data for tens of thousands of isolates) of addressing a wide range of functional questions, including: the prediction of antimicrobial resistance; likely cross-reactivity with vaccine antigens; and the functional activities of different variants that lead to key phenotypes. There are no limitations to the number of sequences, genetic loci, allelic variants or schemes (combinations of loci) that can be included, enabling each database to represent an expanding catalogue of the genetic variation of the population in question.
In addition to providing web-accessible analyses and links to third-party analysis and visualisation tools, the BIGSdb software includes a RESTful application programming interface (API) that enables access to all the underlying data for third-party applications and data analysis pipelines.
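The gene-by-gene approach in the BIGSdb abstract (each distinct sequence at a locus catalogued as an allelic variant, and schemes defined as combinations of loci) can be illustrated with a toy catalogue. This is a hypothetical in-memory sketch, not BIGSdb code: it assigns the next integer allele number to each new sequence at a locus, and the next sequence type (ST) number to each new allelic profile. The class name and locus names are placeholders.

```python
from typing import Dict, List, Tuple


class AlleleCatalogue:
    """Toy gene-by-gene catalogue (hypothetical; not the BIGSdb implementation)."""

    def __init__(self, loci: List[str]):
        self.loci = loci  # the scheme: an ordered combination of loci
        self.alleles: Dict[str, Dict[str, int]] = {locus: {} for locus in loci}
        self.profiles: Dict[Tuple[int, ...], int] = {}

    def allele_id(self, locus: str, sequence: str) -> int:
        """Return the allele number for a sequence, registering new variants."""
        known = self.alleles[locus]
        if sequence not in known:
            known[sequence] = len(known) + 1  # next allele number at this locus
        return known[sequence]

    def type_isolate(self, sequences: Dict[str, str]) -> Tuple[Tuple[int, ...], int]:
        """Map an isolate's locus sequences to (allelic profile, ST number)."""
        profile = tuple(self.allele_id(l, sequences[l]) for l in self.loci)
        if profile not in self.profiles:
            self.profiles[profile] = len(self.profiles) + 1  # new sequence type
        return profile, self.profiles[profile]


cat = AlleleCatalogue(["locus_A", "locus_B"])
print(cat.type_isolate({"locus_A": "ATG", "locus_B": "GGC"}))  # ((1, 1), 1)
print(cat.type_isolate({"locus_A": "ATG", "locus_B": "GGT"}))  # ((1, 2), 2)
```

A real curated database adds validation, curation workflows and persistent storage; the sketch only shows why the catalogue can keep growing without limits on loci, alleles or schemes.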

1,349 citations

Journal Article
TL;DR: The calculus's contribution to analyzing mobile processes is a major topic, dealt with extensively from part three onwards, and the book shows how π-calculus can be employed in studying practical, modern software engineering concepts such as object-oriented programming.
Abstract: The π-Calculus: A Theory of Mobile Processes, by Davide Sangiorgi and David Walker.

Formal methods have formed the foundation of computer science since its inception. Although these formal methods initially dealt with processes and systems on an individual basis, the paradigm has shifted with the dawn of the age of computer networks. When dealing with systems with interconnected, communicating, dependent, cooperative, and competitive components, the older outlook of analyzing and developing singular systems (and the tools that went with it) was hardly suitable. This led to the development of theories and tools that would support the new paradigm. On the tools end, the development has been widespread and satisfactory: programming languages, development frameworks, databases, and even end-user software products such as word processors have gained network-awareness. However, on the theoretical end, the development has been far less satisfactory. The major work was done by Robin Milner, Joachim Parrow, and David Walker, who developed the formalism known as π-calculus in 1989. π-calculus is a process calculus that treats communication between its components as the basic form of computation. It has been quite successful as a foundation for several other calculi in the field and, as Milner puts it, it has become common to express ideas about interactions and mobility in variants of the calculus.

Introduction

The current book serves as a comprehensive reference to π-calculus. Besides Milner's own book on the subject, this is the only other book-length publication on the topic. In many ways, it is much more comprehensive than Milner's: a much wider range of topics is dealt with, and in more detail as well.

Contents

The book is split into seven parts. The first part presents the basic theory of π-calculus. However, basic does not mean concise: every topic is discussed in great detail.
The section on bisimulation is particularly intensive and provides several insights about the motivation for the theory. Part two discusses several variants of the original calculus, obtained by varying characteristics of the calculus such as whether a process can communicate with more than one process at a time. A number of interesting properties of the language are studied by the authors when discussing these variants. As the title suggests, the calculus's contribution to analyzing mobile processes is a major topic, and it is dealt with extensively starting from part three. The basics are introduced by way of a sophisticated typing system, whose application in specifying complex processes is presented in part four. Part five looks at higher-order π-calculus, in which composed systems are considered first-class citizens. Part six is one of the best in the book and discusses the relation between π-calculus and lambda-calculus, which is an older and more basic calculus. Finally, part seven shows how π-calculus can be employed in studying practical, modern software engineering concepts such as object-oriented programming.

Impressions

One of my disappointments with this book is how often the reader is left perplexed by some of the theoretical developments, especially in part three. π-calculus is a complicated topic, even for the graduate student audience to which this book is directed, and the authors would have done much better by reducing the number of topics and instead focusing on more lucid and detailed explanations. There are several experimental digressions throughout the book which, although interesting, take away some of the momentum of sequential study. For example, topics such as the comparison and encoding of one language into another could easily be moved to a separate section in order to make the content more suitable for self-study.
Another issue is the little effort made towards connecting the theoretical to the practical. The main reason formal methods have not been adopted in mainstream software development practices is that it is often unclear to developers how formalisms can contribute to the software engineering process. The book would have served its purpose much better if it had spent part of each chapter discussing the practical application of that chapter's content. For example, congruence checking and bisimulation can be incredibly exciting topics for programmers to learn if they can see practical applications of such powerful techniques. Beyond the above criticism, the book is absolutely indispensable to students and researchers in the field of formal methods. Currently it serves as the primary reference for anyone who wishes to learn the various aspects of π-calculus in detail.

Raheel Ahmad
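The review's summary of π-calculus as treating communication as the basic form of computation corresponds to the calculus's central reaction rule, shown below in one common notation (notation varies between texts): a process sending the name z on channel x synchronises with a process receiving on x, and z is substituted for the bound name y in the receiver. The second line is a small worked instance: because the transmitted name z is itself a channel, the receiver ends up sending w on a link it did not previously know, which is the sense in which the calculus models mobility.

```latex
\begin{align*}
  \overline{x}\langle z\rangle.P \;\mid\; x(y).Q
    &\;\longrightarrow\; P \;\mid\; Q\{z/y\} \\
  \overline{x}\langle z\rangle.\mathbf{0} \;\mid\; x(y).\overline{y}\langle w\rangle.\mathbf{0}
    &\;\longrightarrow\; \mathbf{0} \;\mid\; \overline{z}\langle w\rangle.\mathbf{0}
\end{align*}
```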

484 citations