Topic

Biological data

About: Biological data is a research topic. Over its lifetime, 3,435 publications have been published within this topic, receiving 80,702 citations.


Papers
Journal Article
TL;DR: The toolkit incorporates over 130 functions, which are designed to meet the increasing demand for big-data analyses, ranging from bulk sequence processing to interactive data visualization, and a new plotting engine developed to maximize their interactive ability.

5,173 citations

Journal Article
TL;DR: G:Profiler is now capable of analysing data from any organism, including vertebrates, plants, fungi, insects and parasites, and the 2019 update introduces an extensive technical rewrite making the services faster and more flexible.
Abstract: Biological data analysis often deals with lists of genes arising from various studies. The g:Profiler toolset is widely used for finding biological categories enriched in gene lists, conversions between gene identifiers and mappings to their orthologs. The mission of g:Profiler is to provide a reliable service based on up-to-date high quality data in a convenient manner across many evidence types, identifier spaces and organisms. g:Profiler relies on Ensembl as a primary data source and follows their quarterly release cycle while updating the other data sources simultaneously. The current update provides a better user experience due to a modern responsive web interface, standardised API and libraries. The results are delivered through an interactive and configurable web design. Results can be downloaded as publication ready visualisations or delimited text files. In the current update we have extended the support to 467 species and strains, including vertebrates, plants, fungi, insects and parasites. By supporting user uploaded custom GMT files, g:Profiler is now capable of analysing data from any organism. All past releases are maintained for reproducibility and transparency. The 2019 update introduces an extensive technical rewrite making the services faster and more flexible. g:Profiler is freely available at https://biit.cs.ut.ee/gprofiler.

2,959 citations

Journal Article
TL;DR: This paper will discuss how geometry and topology can be applied to make useful contributions to the analysis of various kinds of data, particularly high throughput data from microarray or other sources.
Abstract: An important feature of modern science and engineering is that data of various kinds is being produced at an unprecedented rate. This is so in part because of new experimental methods, and in part because of the increase in the availability of high powered computing technology. It is also clear that the nature of the data we are obtaining is significantly different. For example, it is now often the case that we are given data in the form of very long vectors, where all but a few of the coordinates turn out to be irrelevant to the questions of interest, and further that we don’t necessarily know which coordinates are the interesting ones. A related fact is that the data is often very high-dimensional, which severely restricts our ability to visualize it. The data obtained is also often much noisier than in the past and has more missing information (missing data). This is particularly so in the case of biological data, particularly high throughput data from microarray or other sources. Our ability to analyze this data, both in terms of quantity and the nature of the data, is clearly not keeping pace with the data being produced. In this paper, we will discuss how geometry and topology can be applied to make useful contributions to the analysis of various kinds of data. Geometry and topology are very natural tools to apply in this direction, since geometry can be regarded as the study of distance functions, and what one often works with are distance functions on large finite sets of data. The mathematical formalism which has been developed for incorporating geometric and topological techniques deals with point clouds, i.e. finite sets of points equipped with a distance function. It then adapts tools from the various branches of geometry to the study of point clouds. The point clouds are intended to be thought of as finite samples taken from a geometric object, perhaps with noise. Here are some of the key points which come up when applying these geometric methods to data analysis.
• Qualitative information is needed: One important goal of data analysis is to allow the user to obtain knowledge about the data, i.e. to understand how it is organized on a large scale. For example, if we imagine that we are looking at a data set constructed somehow from diabetes patients, it would be important to develop the understanding that there are two types of the disease, namely the juvenile and adult onset forms. Once that is established, one of course wants to develop quantitative methods for distinguishing them, but the first insight about the distinct forms of the disease is key.

2,203 citations
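
The abstract above describes a point cloud as a finite set of points equipped with a distance function, and stresses qualitative, large-scale structure. As a minimal sketch of that idea (not the paper's own method), the following Python snippet links points whose Euclidean distance is at most a chosen scale and counts the resulting connected components. The function name `components_at_scale`, the sample `cloud`, and the scale values are all hypothetical choices made for illustration.

```python
import math
from itertools import combinations


def components_at_scale(points, eps):
    """Count connected components when points at distance <= eps are linked."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= eps:
            parent[find(i)] = find(j)  # merge the two components

    return len({find(i) for i in range(len(points))})


# Hypothetical point cloud: two well-separated groups in the plane.
cloud = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]

for eps in (0.05, 0.5, 10.0):
    print(eps, components_at_scale(cloud, eps))
```

At a very small scale every point is its own component, at an intermediate scale the two groups appear, and at a large scale everything merges. Watching how this count changes with scale is one simple way to read off the kind of qualitative structure the abstract refers to.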

Journal Article
TL;DR: In this comprehensive survey, a large number of existing approaches to biclustering are analyzed, and they are classified in accordance with the type of biclusters they can find, the patterns of biclusters that are discovered, the methods used to perform the search, the approaches used to evaluate the solution, and the target applications.
Abstract: A large number of clustering approaches have been proposed for the analysis of gene expression data obtained from microarray experiments. However, the results from the application of standard clustering methods to genes are limited. This limitation is imposed by the existence of a number of experimental conditions where the activity of genes is uncorrelated. A similar limitation exists when clustering of conditions is performed. For this reason, a number of algorithms that perform simultaneous clustering on the row and column dimensions of the data matrix have been proposed. The goal is to find submatrices, that is, subgroups of genes and subgroups of conditions, where the genes exhibit highly correlated activities for every condition. In this paper, we refer to this class of algorithms as biclustering. Biclustering is also referred to in the literature as coclustering and direct clustering, among other names, and has also been used in fields such as information retrieval and data mining. In this comprehensive survey, we analyze a large number of existing approaches to biclustering, and classify them in accordance with the type of biclusters they can find, the patterns of biclusters that are discovered, the methods used to perform the search, the approaches used to evaluate the solution, and the target applications.

2,123 citations
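
The motivation in the abstract above, that genes can be highly correlated on a subset of conditions yet uncorrelated across all of them, can be illustrated with a small sketch. This is not any algorithm from the survey: it only evaluates one hand-picked submatrix rather than searching for one. The matrix `expr`, the helper names `pearson` and `submatrix`, and the chosen row and column subsets are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def submatrix(matrix, rows, cols):
    """Extract the rows x cols submatrix (a candidate bicluster)."""
    return [[matrix[r][c] for c in cols] for r in rows]


# Hypothetical expression matrix: rows are genes, columns are conditions.
expr = [
    [1.0, 2.0, 3.0, 9.0, 1.0],  # gene 0
    [2.0, 4.0, 6.0, 1.0, 7.0],  # gene 1
    [5.0, 1.0, 4.0, 2.0, 3.0],  # gene 2
]

# Candidate bicluster: genes 0 and 1 restricted to conditions 0, 1, 2.
gene0_sub, gene1_sub = submatrix(expr, rows=[0, 1], cols=[0, 1, 2])

r_sub = pearson(gene0_sub, gene1_sub)  # correlation inside the bicluster
r_all = pearson(expr[0], expr[1])      # correlation over all conditions

print("within bicluster:", r_sub)
print("all conditions:  ", r_all)
```

In this toy data the two genes move together on the first three conditions but not on the remaining two, so the correlation over all conditions hides the pattern that the submatrix exposes. A real biclustering method would have to search for such row and column subsets rather than being given them.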

Journal Article
TL;DR: MixOmics is introduced, an R package dedicated to the multivariate analysis of biological data sets with a specific focus on data exploration, dimension reduction and visualisation and extends Projection to Latent Structure models for discriminant analysis.
Abstract: The advent of high throughput technologies has led to a wealth of publicly available 'omics data coming from different sources, such as transcriptomics, proteomics, metabolomics. Combining such large-scale biological data sets can lead to the discovery of important biological insights, provided that relevant information can be extracted in a holistic manner. Current statistical approaches have been focusing on identifying small subsets of molecules (a 'molecular signature') to explain or predict biological conditions, but mainly for a single type of 'omics. In addition, commonly used methods are univariate and consider each biological feature independently. We introduce mixOmics, an R package dedicated to the multivariate analysis of biological data sets with a specific focus on data exploration, dimension reduction and visualisation. By adopting a systems biology approach, the toolkit provides a wide range of methods that statistically integrate several data sets at once to probe relationships between heterogeneous 'omics data sets. Our recent methods extend Projection to Latent Structure (PLS) models for discriminant analysis, for data integration across multiple 'omics data or across independent studies, and for the identification of molecular signatures. We illustrate our latest mixOmics integrative frameworks for the multivariate analyses of 'omics data available from the package.

1,862 citations


Network Information
Related Topics (5)
Genome
74.2K papers, 3.8M citations
80% related
Cluster analysis
146.5K papers, 2.9M citations
79% related
Gene expression profiling
26.9K papers, 1.7M citations
78% related
Support vector machine
73.6K papers, 1.7M citations
77% related
Gene
211.7K papers, 10.3M citations
76% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    46
2022    87
2021    160
2020    180
2019    193
2018    195