A comparative study of statistical methods used to identify dependencies between gene expression signals
TLDR
This work seeks to summarize the main methods used to identify dependency between random variables, especially gene expression data, and also to evaluate the strengths and limitations of each method.
Abstract:
One major task in molecular biology is to understand the dependency among genes in order to model gene regulatory networks. Pearson's correlation is the most common method used to measure dependence between gene expression signals, but it works well only when the data are linearly associated. For other types of association, such as non-linear or non-functional relationships, rank correlation and information theory-based measures are more adequate than Pearson's correlation, but they are less used in applications, most probably because of a lack of clear guidelines for their use. This work seeks to summarize the main methods (Pearson's, Spearman's and Kendall's correlations; distance correlation; Hoeffding's D measure; Heller-Heller-Gorfine measure; mutual information and maximal information coefficient) used to identify dependency between random variables, especially gene expression data, and also to evaluate the strengths and limitations of each method. Systematic Monte Carlo simulation analyses covering sample size, local dependence, and linear, non-linear and non-functional relationships are shown. Moreover, comparisons on actual gene expression data are carried out. Finally, we provide a suggested list of methods that can be used for each type of data set.
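The abstract's central claim, that Pearson's correlation can miss a non-linear association which other measures detect, can be illustrated with a minimal sketch. The code below is not the authors' implementation or simulation design: the toy data (a symmetric quadratic relationship) and all function names are assumptions chosen for illustration, and the distance correlation shown is the plain biased sample version written with the standard library only.

```python
import math


def pearson(x, y):
    """Pearson's product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(v):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman's correlation: Pearson's correlation applied to the ranks."""
    return pearson(ranks(x), ranks(y))


def _centered_distances(v):
    """Pairwise |v_i - v_j| matrix, double-centered by row, column and grand means."""
    n = len(v)
    d = [[abs(a - b) for b in v] for a in v]
    row = [sum(r) / n for r in d]  # the matrix is symmetric, so column means equal row means
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]


def distance_correlation(x, y):
    """Biased sample distance correlation, a value in [0, 1]."""
    a, b = _centered_distances(x), _centered_distances(y)
    n = len(x)
    cells = [(i, j) for i in range(n) for j in range(n)]
    dcov2 = sum(a[i][j] * b[i][j] for i, j in cells) / n**2
    dvar_x = sum(a[i][j] ** 2 for i, j in cells) / n**2
    dvar_y = sum(b[i][j] ** 2 for i, j in cells) / n**2
    return math.sqrt(max(dcov2, 0.0) / math.sqrt(dvar_x * dvar_y))


# Hypothetical toy data: y depends on x exactly, but not monotonically.
x = list(range(-20, 21))
y = [v * v for v in x]

print("Pearson :", pearson(x, y))                # close to 0 by symmetry
print("Spearman:", spearman(x, y))               # close to 0 by symmetry
print("dCor    :", distance_correlation(x, y))   # clearly above 0
```

On this toy example both Pearson's and Spearman's coefficients are essentially zero, because the relationship is neither linear nor monotonic, while the distance correlation is clearly positive. This is only one hand-picked case; it says nothing about power at small sample sizes or under noise, which is what the simulations described in the abstract address.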
Citations
Detecting Novel Associations in Large Data Sets
David N. Reshef, Yakir A. Reshef, Hilary K. Finucane, Sharon R. Grossman, Gilean McVean, Peter J. Turnbaugh, Eric S. Lander, Michael Mitzenmacher, Pardis C. Sabeti
TL;DR: The maximal information coefficient (MIC) as mentioned in this paper is a measure of dependence for two-variable relationships that captures a wide range of associations, both functional and not, and for functional relationships provides a score that roughly equals the coefficient of determination (R²) of the data relative to the regression function.
Journal Article
Variations in sustainable development goal interactions: Population, regional, and income disaggregation
TL;DR: In this article, the authors conduct a cross-sectional correlational analysis for 2016 to understand SDG interactions across the entire development spectrum, and apply several correlation methods to classify each interaction as a synergy or a trade-off and to characterize it according to its monotonicity and linearity.
Journal Article
Gene Networks in Plant Biology: Approaches in Reconstruction and Analysis
TL;DR: Methods to analyze gene networks using gene expression data are discussed, specifically focusing on four common statistical approaches used to reconstruct networks: correlation, feature selection in supervised learning, probabilistic graphical model, and meta-prediction.
Journal Article
Ranking genome-wide correlation measurements improves microarray and RNA-seq based global and targeted co-expression networks.
Franziska Liesecke, Dimitri Daudu, Rodolphe Dugé de Bernonville, Sébastien Besseau, Marc Clastre, Vincent Courdavault, Johan-Owen De Craene, Joël Crèche, Nathalie Giglioli-Guivarc’h, Gaëlle Glévarec, Olivier Pichon, Thomas Dugé de Bernonville
TL;DR: This work highlights how PCC ranked with HRR is better suited for global network construction and PLC with microarray and RNA-seq data than other distance methods, especially to cluster genes in partitions similar to biological subpathways.
Journal Article
Selecting biologically informative genes in co-expression networks with a centrality score
TL;DR: This work investigates a method that identifies potential biologically meaningful genes based on a weighted connectivity score and indicators of statistical relevance and outperformed standard centrality measures by highlighting more biologically informative genes in different gene co-expression networks and biological research domains.
References
Journal Article
R: A language and environment for statistical computing.
TL;DR: Copyright (©) 1999–2012 R Foundation for Statistical Computing; permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and permission notice are preserved on all copies.
Journal Article
Investigating Causal Relations by Econometric Models and Cross-Spectral Methods
TL;DR: In this article, the cross spectrum between two variables can be decomposed into two parts, each relating to a single causal arm of a feedback situation, and measures of causal lag and causal strength can then be constructed.
Journal Article
The mathematical theory of communication
Claude E. Shannon, Warren Weaver
TL;DR: The Mathematical Theory of Communication (MTOC) as discussed by the authors was originally published as a paper on communication theory more than fifty years ago and has since gone through four hardcover and sixteen paperback printings.
Book Chapter
Investigating causal relations by econometric models and cross-spectral methods
TL;DR: In this article, it is shown that the cross spectrum between two variables can be decomposed into two parts, each relating to a single causal arm of a feedback situation, and measures of causal lag and causal strength can then be constructed.
Journal Article
A new measure of rank correlation
TL;DR: Rank correlation as mentioned in this paper is a measure of similarity between two rankings of the same set of individuals, and it has been used in psychological work to compare two different rankings of individuals in order to indicate similarity of taste.