Proceedings ArticleDOI

Trelliscope: A system for detailed visualization in the deep analysis of large complex data

TL;DR: Trelliscope is introduced, a system that scales Trellis to large complex data and provides a way to create displays with a very large number of panels and an interactive viewer that allows the analyst to sort, filter, and sample the panels in a meaningful way.
Abstract: Trelliscope emanates from the Trellis Display framework for visualization and the Divide and Recombine (D&R) approach to analyzing large complex data. In Trellis, the data are broken up into subsets, a visualization method is applied to each subset, and the display result is an array of panels, one per subset. This is a powerful framework for visualization of data, both small and large. In D&R, the data are broken up into subsets, and any analytic method from statistics and machine learning is applied to each subset independently. Then the outputs are recombined. This provides not only a powerful framework for analysis, but also feasible and practical computations using distributed computational facilities. It enables deep analysis of the data: study of both data summaries as well as the detailed data at their finest granularity. This is critical to full understanding of the data. It also enables the analyst to program using an interactive high-level language for data analysis such as R, which allows the analyst to focus more on the data and less on code. In this paper we introduce Trelliscope, a system that scales Trellis to large complex data. It provides a way to create displays with a very large number of panels and an interactive viewer that allows the analyst to sort, filter, and sample the panels in a meaningful way. We discuss the underlying principles, design, and scalable architecture of Trelliscope, and illustrate its use on three analysis projects in proteomics, high intensity physics, and power systems engineering.
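The divide-apply-recombine pattern described above can be sketched in a few lines of plain R. The following is a minimal in-memory illustration of the idea only, using the built-in mtcars data and an arbitrary grouping variable; Trelliscope and D&R run these same steps distributed across a cluster.

```r
# Minimal in-memory sketch of Divide and Recombine (D&R) in base R.
# Divide: split the data into subsets, one per group.
subsets <- split(mtcars, mtcars$cyl)

# Apply: fit the same analytic method independently to each subset.
fits <- lapply(subsets, function(d) lm(mpg ~ wt, data = d))

# Recombine: collect the per-subset outputs into a single summary.
coefs <- do.call(rbind, lapply(fits, coef))
print(coefs)

# Trellis analogue: one panel per subset, arranged in an array.
library(lattice)
xyplot(mpg ~ wt | factor(cyl), data = mtcars, layout = c(3, 1))
```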
Citations
Journal ArticleDOI
TL;DR: Voyager is found to facilitate exploration of previously unseen data and to increase data variable coverage; design implications are distilled, in particular the need to balance rapid exploration and targeted question-answering in visualization tools.
Abstract: General visualization tools typically require manual specification of views: analysts must select data variables and then choose which transformations and visual encodings to apply. These decisions often involve both domain and visualization design expertise, and may impose a tedious specification process that impedes exploration. In this paper, we seek to complement manual chart construction with interactive navigation of a gallery of automatically-generated visualizations. We contribute Voyager, a mixed-initiative system that supports faceted browsing of recommended charts chosen according to statistical and perceptual measures. We describe Voyager's architecture, motivating design principles, and methods for generating and interacting with visualization recommendations. In a study comparing Voyager to a manual visualization specification tool, we find that Voyager facilitates exploration of previously unseen data and leads to increased data variable coverage. We then distill design implications for visualization tools, in particular the need to balance rapid exploration and targeted question-answering.

443 citations


Cites background from "Trelliscope: A system for detailed ..."

  • ..., [13]) is how to best fit such large plots in small spaces, perhaps via paging and “scrubbing” interactions common to videos or image collections....

    [...]

Journal ArticleDOI
TL;DR: An automated imaging approach is demonstrated that uses label-free nanoproteomics to analyze tissue voxels, generating quantitative cell-type-specific images for >2000 proteins at 100-µm spatial resolution across mouse uterine tissue sections preparing for blastocyst implantation.
Abstract: Biological tissues exhibit complex spatial heterogeneity that directs the functions of multicellular organisms. Quantifying protein expression is essential for elucidating processes within complex biological assemblies. Imaging mass spectrometry (IMS) is a powerful emerging tool for mapping the spatial distribution of metabolites and lipids across tissue surfaces, but technical challenges have limited the application of IMS to the analysis of proteomes. Methods for probing the spatial distribution of the proteome have generally relied on the use of labels and/or antibodies, which limits multiplexing and requires a priori knowledge of protein targets. Past efforts to make spatially resolved proteome measurements across tissues have had limited spatial resolution and proteome coverage and have relied on manual workflows. Here, we demonstrate an automated approach to imaging that utilizes label-free nanoproteomics to analyze tissue voxels, generating quantitative cell-type-specific images for >2000 proteins with 100-µm spatial resolution across mouse uterine tissue sections preparing for blastocyst implantation.

150 citations

Book
01 Jan 2019
TL;DR: The Handbook of Big Data describes modern, scalable approaches for analyzing increasingly large datasets and defines the underlying concepts of the available analytical tools and techniques, instilling a working understanding of key statistical and computing ideas that can be readily applied in research and practice.
Abstract: Handbook of Big Data provides a state-of-the-art overview of the analysis of large-scale datasets. Featuring contributions from well-known experts in statistics and computer science, this handbook presents a carefully curated collection of techniques from both industry and academia. Thus, the text instills a working understanding of key statistical and computing ideas that can be readily applied in research and practice. Offering balanced coverage of methodology, theory, and applications, this handbook:

  • Describes modern, scalable approaches for analyzing increasingly large datasets
  • Defines the underlying concepts of the available analytical tools and techniques
  • Details intercommunity advances in computational statistics and machine learning

Handbook of Big Data also identifies areas in need of further development, encouraging greater communication and collaboration between researchers in big data sub-specialties such as genomics, computational biology, and finance.

51 citations

Journal ArticleDOI
TL;DR: This study represents the largest (>2000 proteins measured) set of longitudinal expression profiles of the human plasma proteome in T1D research and shows that oxidative stress-related proteins have consistently different dysregulation patterns in the T1D group than in age- and sex-matched healthy controls, even prior to the appearance of islet autoantibodies.

35 citations

Journal ArticleDOI
TL;DR: In this paper, the authors present ftmsRanalysis, an R package which provides an extensive collection of data formatting and processing, filtering, visualization, and sample and group comparison functionalities.
Abstract: The high-resolution and mass accuracy of Fourier transform mass spectrometry (FT-MS) has made it an increasingly popular technique for discerning the composition of soil, plant and aquatic samples containing complex mixtures of proteins, carbohydrates, lipids, lignins, hydrocarbons, phytochemicals and other compounds. Thus, there is a growing demand for informatics tools to analyze FT-MS data that will aid investigators seeking to understand the availability of carbon compounds to biotic and abiotic oxidation and to compare fundamental chemical properties of complex samples across groups. We present ftmsRanalysis, an R package which provides an extensive collection of data formatting and processing, filtering, visualization, and sample and group comparison functionalities. The package provides a suite of plotting methods and enables expedient, flexible and interactive visualization of complex datasets through functions which link to a powerful and interactive visualization user interface, Trelliscope. Example analysis using FT-MS data from a soil microbiology study demonstrates the core functionality of the package and highlights the capabilities for producing interactive visualizations.
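As an illustration of the per-sample panel displays described above, the following sketch draws Van Krevelen-style panels (H:C vs. O:C elemental ratios, a standard FT-MS view) with plain ggplot2 and simulated data. The data frame and its column names are hypothetical stand-ins, not ftmsRanalysis's actual data format or API; the point is only the one-panel-per-sample structure that Trelliscope scales up interactively.

```r
# Illustration only: Van Krevelen-style panels, one per sample, of the
# kind ftmsRanalysis renders via Trelliscope. The data frame and its
# column names (sample, o_to_c, h_to_c) are hypothetical stand-ins.
library(ggplot2)

set.seed(1)
peaks <- data.frame(
  sample = rep(c("soil_A", "soil_B"), each = 100),
  o_to_c = runif(200, 0, 1.2),   # O:C elemental ratio per peak
  h_to_c = runif(200, 0.2, 2.2)  # H:C elemental ratio per peak
)

# One panel per sample; Trelliscope scales this idea to thousands of
# panels with interactive sort, filter, and sample operations.
ggplot(peaks, aes(o_to_c, h_to_c)) +
  geom_point(alpha = 0.5) +
  facet_wrap(~ sample) +
  labs(x = "O:C ratio", y = "H:C ratio")
```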

27 citations

References
Journal Article
TL;DR: Copyright (©) 1999–2012 R Foundation for Statistical Computing; permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and permission notice are preserved on all copies.
Abstract: Copyright (©) 1999–2012 R Foundation for Statistical Computing. Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies. Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one. Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the R Core Team.

272,030 citations

Journal ArticleDOI
TL;DR: The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing.
Abstract: Genomic sequencing has made it clear that a large fraction of the genes specifying the core biological functions are shared by all eukaryotes. Knowledge of the biological role of such shared proteins in one organism can often be transferred to other organisms. The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing. To this end, three independent ontologies accessible on the World-Wide Web (http://www.geneontology.org) are being constructed: biological process, molecular function and cellular component.

35,225 citations

Book
13 Aug 2009
TL;DR: This book describes ggplot2, a new data visualization package for R that uses the insights from Leland Wilkinson's Grammar of Graphics to create a powerful and flexible system for creating data graphics.
Abstract: This book describes ggplot2, a new data visualization package for R that uses the insights from Leland Wilkinson's Grammar of Graphics to create a powerful and flexible system for creating data graphics. With ggplot2, it's easy to:

  • produce handsome, publication-quality plots, with automatic legends created from the plot specification
  • superpose multiple layers (points, lines, maps, tiles, box plots, to name a few) from different data sources, with automatically adjusted common scales
  • add customisable smoothers that use the powerful modelling capabilities of R, such as loess, linear models, generalised additive models and robust regression
  • save any ggplot2 plot (or part thereof) for later modification or reuse
  • create custom themes that capture in-house or journal style requirements, and that can easily be applied to multiple plots
  • approach your graph from a visual perspective, thinking about how each component of the data is represented on the final plot

This book will be useful to everyone who has struggled with displaying their data in an informative and attractive way. You will need some basic knowledge of R (i.e. you should be able to get your data into R), but ggplot2 is a mini-language specifically tailored for producing graphics, and you'll learn everything you need in the book. After reading this book you'll be able to produce graphics customized precisely for your problems, and you'll find it easy to get graphics out of your head and on to the screen or page.
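A short example of the layered grammar the book describes, using ggplot2's built-in mpg dataset: raw points, a loess smoother, and faceting are each independent layers composed with +.

```r
# Layered grammar in practice: each + adds an independent layer.
library(ggplot2)

ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(colour = class)) +           # layer 1: raw data
  geom_smooth(method = "loess", se = TRUE) +  # layer 2: loess smoother
  facet_wrap(~ drv) +                         # small multiples by drive
  labs(x = "Engine displacement (L)", y = "Highway miles per gallon")
```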

29,504 citations


"Trelliscope: A system for detailed ..." refers methods in this paper

  • ...By using the R-based interfaces of base R graphics, lattice, or ggplot [20, 17], it is possible to create nearly any visual representation that a user requires....

    [...]

Journal ArticleDOI
TL;DR: DAVID is a web-accessible program that integrates functional genomic annotations with intuitive graphical summaries, assisting in the interpretation of genome-scale datasets by facilitating the transition from data collection to biological meaning.
Abstract: The distributed nature of biological knowledge poses a major challenge to the interpretation of genome-scale datasets, including those derived from microarray and proteomic studies. This report describes DAVID, a web-accessible program that integrates functional genomic annotations with intuitive graphical summaries. Lists of gene or protein identifiers are rapidly annotated and summarized according to shared categorical data for Gene Ontology, protein domain, and biochemical pathway membership. DAVID assists in the interpretation of genome-scale datasets by facilitating the transition from data collection to biological meaning.

8,849 citations

Book
29 May 2009
TL;DR: This comprehensive resource demonstrates how to use Hadoop to build reliable, scalable, distributed systems: programmers will find details for analyzing large datasets, and administrators will learn how to set up and run Hadoop clusters.
Abstract: Hadoop: The Definitive Guide helps you harness the power of your data. Ideal for processing large datasets, the Apache Hadoop framework is an open source implementation of the MapReduce algorithm on which Google built its empire. This comprehensive resource demonstrates how to use Hadoop to build reliable, scalable, distributed systems: programmers will find details for analyzing large datasets, and administrators will learn how to set up and run Hadoop clusters. Complete with case studies that illustrate how Hadoop solves specific problems, this book helps you:

  • Use the Hadoop Distributed File System (HDFS) for storing large datasets, and run distributed computations over those datasets using MapReduce
  • Become familiar with Hadoop's data and I/O building blocks for compression, data integrity, serialization, and persistence
  • Discover common pitfalls and advanced features for writing real-world MapReduce programs
  • Design, build, and administer a dedicated Hadoop cluster, or run Hadoop in the cloud
  • Use Pig, a high-level query language for large-scale data processing
  • Take advantage of HBase, Hadoop's database for structured and semi-structured data
  • Learn ZooKeeper, a toolkit of coordination primitives for building distributed systems

If you have lots of data -- whether it's gigabytes or petabytes -- Hadoop is the perfect solution. Hadoop: The Definitive Guide is the most thorough book available on the subject. "Now you have the opportunity to learn about Hadoop from a master, not only of the technology, but also of common sense and plain talk." -- Doug Cutting, Hadoop Founder, Yahoo!
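The MapReduce data flow the book is organized around can be sketched in memory in a few lines of R (keeping to this page's R register). This word-count toy only illustrates the key/value shape of the map and reduce phases; Hadoop's contribution is running those phases fault-tolerantly across a cluster.

```r
# In-memory word-count sketch of the MapReduce pattern, in plain R.
# No Hadoop involved; this only shows the key/value data flow.
docs <- c("divide and recombine", "divide the data", "recombine outputs")

# Map: emit a (key = word, value = 1) pair for each word occurrence.
words  <- unlist(strsplit(docs, " "))
values <- rep(1L, length(words))

# Shuffle + Reduce: group the values by key and sum within each group.
counts <- tapply(values, words, sum)
print(counts)
```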

3,797 citations


"Trelliscope: A system for detailed ..." refers methods in this paper

  • ...Panel data or images stored on HDFS are stored as key/value pair Hadoop mapfiles [19] with the key being the panel key....

    [...]
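The storage scheme this snippet describes, each panel stored under its panel key, can be mimicked in memory with a named list; the following is a conceptual stand-in only, not the actual HDFS mapfile code.

```r
# Conceptual stand-in for key/value panel storage: each panel stored
# under its panel key. In Trelliscope this lives in a Hadoop mapfile
# on HDFS; here a named list plays that role.
library(lattice)

panels <- lapply(split(mtcars, mtcars$cyl), function(d) {
  xyplot(mpg ~ wt, data = d)
})
names(panels)  # the panel keys: "4", "6", "8"

# Retrieval by key, as the viewer does when a panel is requested.
print(panels[["6"]])
```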
