
Showing papers by "Valen E. Johnson" published in 2022


Journal ArticleDOI
TL;DR: The authors propose "nonlocal" alternative hypotheses to resolve a paradox: default implementations of Bayesian tests prevent the accumulation of strong evidence in favor of true null hypotheses, because the associated default alternative hypotheses assign high probability to data that are most consistent with a null effect.
Abstract: Bayesian hypothesis testing procedures have gained increased acceptance in recent years. A key advantage that Bayesian tests have over classical testing procedures is their potential to quantify information in support of true null hypotheses. Ironically, default implementations of Bayesian tests prevent the accumulation of strong evidence in favor of true null hypotheses because associated default alternative hypotheses assign a high probability to data that are most consistent with a null effect. We propose the use of "nonlocal" alternative hypotheses to resolve this paradox. The resulting class of Bayesian hypothesis tests permits more rapid accumulation of evidence in favor of both true null hypotheses and alternative hypotheses that are compatible with standardized effect sizes of most interest in psychology. (PsycInfo Database Record (c) 2022 APA, all rights reserved).
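
A minimal numerical sketch of the contrast the abstract describes, in R (an illustration, not the paper's exact procedure): for a z-statistic with z ~ N(theta, 1), compare the Bayes factor obtained under a local N(0, tau^2) alternative prior with one obtained under a nonlocal normal-moment prior, pi(theta) = (theta^2/tau^2) * dnorm(theta, 0, tau), which vanishes at theta = 0. The choice tau = 1 is arbitrary and for illustration only.

    # Bayes factor BF10 for a z-statistic, by numerical integration of the
    # marginal likelihood under the alternative prior.
    bf10 <- function(z, prior, tau = 1) {
      m1 <- integrate(function(theta) dnorm(z, theta, 1) * prior(theta, tau),
                      -Inf, Inf)$value   # marginal likelihood under H1
      m1 / dnorm(z, 0, 1)                # relative support for H1 over H0
    }
    local_prior    <- function(theta, tau) dnorm(theta, 0, tau)
    nonlocal_prior <- function(theta, tau) (theta^2 / tau^2) * dnorm(theta, 0, tau)
    bf10(0, local_prior)      # ~0.71: at z = 0, only weak evidence for the null
    bf10(0, nonlocal_prior)   # ~0.35: the nonlocal prior favors the null more strongly

Because the nonlocal prior assigns negligible mass near theta = 0, data consistent with the null are far less probable under the alternative, so evidence for a true null accumulates faster.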

2 citations


Journal ArticleDOI
TL;DR: A clustering algorithm based on the normalized left Gram matrix, G = XX^T/P, provides the most accurate estimate of the underlying cluster configuration more than twice as often as its closest competitors.
Abstract: For high dimensional data, where P features for N objects (P ≫ N) are represented in an N × P matrix X, we describe a clustering algorithm based on the normalized left Gram matrix, G = XX^T/P. Under certain regularity conditions, the rows in G that correspond to objects in the same cluster converge to the same mean vector. By clustering on the row means, the algorithm does not require preprocessing by dimension reduction or feature selection techniques, and does not require specification of tuning or hyperparameter values. Because it is based on the N × N matrix G, it has lower computational cost than many methods based on clustering the feature matrix X. When compared to 14 other clustering algorithms applied to 32 benchmarked microarray datasets, the proposed algorithm provided the most accurate estimate of the underlying cluster configuration more than twice as often as its closest competitors.
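
The core of the method can be sketched in a few lines of R (a simplification: the number of clusters is supplied to kmeans here, whereas the paper's method estimates it from the data):

    # Cluster the rows of the normalized left Gram matrix G = X %*% t(X) / P.
    cluster_gram <- function(X, k) {
      G <- tcrossprod(X) / ncol(X)                 # N x N matrix, cheap when N << P
      kmeans(G, centers = k, nstart = 10)$cluster  # cluster the N rows of G
    }
    set.seed(1)
    X <- rbind(matrix(rnorm(10 * 1000, mean = 0), 10, 1000),
               matrix(rnorm(10 * 1000, mean = 1), 10, 1000))
    cluster_gram(X, 2)   # recovers the two simulated groups

Only the N × N matrix G is ever clustered, which is why the cost stays low even when P runs to the thousands.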

1 citation


Journal ArticleDOI
TL;DR: BFFs depend on a single non-centrality parameter that can be expressed as a function of standardized effect sizes, and plots of BFFs versus effect size provide informative summaries of hypothesis tests that can be easily aggregated across studies.
Abstract: Significance: Bayes factors represent an informative alternative to P-values for reporting outcomes of hypothesis tests. They provide direct measures of the relative support that data provide to competing hypotheses and are able to quantify support for true null hypotheses. However, their use has been limited by several factors, including the requirement to specify alternative hypotheses and difficulties encountered in their calculation. Bayes factor functions (BFFs) overcome these difficulties by defining Bayes factors from classical test statistics and using standardized effect sizes to define alternative hypotheses. BFFs provide clear summaries of the outcome from a single experiment, eliminate arbitrary significance thresholds, and are ideal for combining evidence from replicated studies.
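
A deliberately simplified illustration of the idea in R (a point-alternative stand-in, not the paper's nonlocal-prior construction): for a z-statistic with z ~ N(theta, 1), the Bayes factor for theta = omega against theta = 0 is exp(z*omega - omega^2/2), and plotting it against the standardized effect size omega gives a curve summarizing the evidence from one experiment.

    # Bayes factor as a function of the standardized effect size omega,
    # computed directly from the observed test statistic z.
    bf_curve <- function(z, omega) exp(z * omega - omega^2 / 2)
    omega <- seq(0, 2, by = 0.01)
    plot(omega, bf_curve(z = 2.2, omega), type = "l",
         xlab = "standardized effect size", ylab = "Bayes factor")

Curves from replicated studies can then be combined by pointwise multiplication, since Bayes factors for the same pair of hypotheses multiply across independent experiments.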

1 citation


TL;DR: Rahman et al. released RJcluster (version 3.2.4), an R package implementing a fast clustering algorithm for high dimensional data based on the Gram matrix decomposition; the package includes a function for simulating test data and documentation on how to use the clustering algorithm.
Abstract: CRAN reference manual, October 12, 2022.
Title: A Fast Clustering Algorithm for High Dimensional Data Based on the Gram Matrix Decomposition
Version: 3.2.4
Author: Shahina Rahman [aut], Valen E. Johnson [aut], Suhasini Subba Rao [aut], Rachael Shudde [aut, cre, trl]
Maintainer: Rachael Shudde
Description: Clustering algorithm for high dimensional data. Assuming that P feature measurements on N objects are arranged in an N × P matrix X, this package provides clustering based on the left Gram matrix XX^T. To simulate test data, type help('simulate_HD_data'); to learn how to use the clustering algorithm, type help('RJclust'). To cite this package, type citation("RJcluster").
License: GPL (>= 2)
Encoding: UTF-8
Imports: Rcpp (>= 1.0.2), matrixStats, infotheo, rlang, stats, graphics, profvis, mclust, foreach, utils
LinkingTo: Rcpp, RcppArmadillo
Suggests: testthat (>= 2.1.0), knitr, rmarkdown
RoxygenNote: 7.1.1
VignetteBuilder: knitr
Depends: R (>= 2.10)
NeedsCompilation: yes
Repository: CRAN
Date/Publication: 2022-02-14 21:30:02 UTC
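
A minimal usage sketch in R, using only the functions named in the DESCRIPTION above; the exact arguments and return structures are assumptions here, so consult the help pages cited in the Description field.

    install.packages("RJcluster")   # from CRAN
    library(RJcluster)
    sim <- simulate_HD_data()       # simulated N x P test data; see help('simulate_HD_data')
    str(sim)                        # inspect which component holds the data matrix
    res <- RJclust(sim$X)           # 'X' component assumed here; see help('RJclust')
    str(res)                        # estimated cluster configuration
    citation("RJcluster")           # how to cite the package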

Journal ArticleDOI
TL;DR: In this paper, a simple transformation of the Gram matrix, combined with an application of the strong law of large numbers to the transformed matrix, is proposed to group objects into clusters in high dimensional, small sample size settings, based on features measured on each object.
Abstract: Clustering is a challenging problem in machine learning in which one attempts to group $N$ objects into $K_{0}$ groups based on $P$ features measured on each object. In this article, we examine the case where $N \ll P$ and $K_{0}$ is not known. Clustering in such high dimensional, small sample size settings has numerous applications in biology, medicine, the social sciences, clinical trials, and other scientific and experimental fields. Whereas most existing clustering algorithms either require the number of clusters to be known a priori or are sensitive to the choice of tuning parameters, our method does not require the prior specification of $K_{0}$ or any tuning parameters. This represents an important advantage for our method because training data are not available in the applications we consider (i.e., in unsupervised learning problems). Without training data, estimating $K_{0}$ and other hyperparameters (and thus applying alternative clustering algorithms) can be difficult and lead to inaccurate results. Our method is based on a simple transformation of the Gram matrix and application of the strong law of large numbers to the transformed matrix. If the correlation between features decays as the number of features grows, we show that the transformed feature vectors concentrate tightly around their respective cluster expectations in a low-dimensional space. This result simplifies the detection and visualization of the unknown cluster configuration. We illustrate the algorithm by applying it to 32 benchmarked microarray datasets, each containing thousands of genomic features measured on a relatively small number of tissue samples. Compared to 21 other commonly used clustering methods, we find that the proposed algorithm is faster and twice as accurate in determining the “best” cluster configuration.
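
The concentration result admits a one-line sketch under a simplified model (i.i.d. noise and fixed cluster means; the paper's actual regularity conditions, which allow decaying feature correlation, are weaker). Writing $x_i = \mu_{k(i)} + \varepsilon_i$, with $k(i)$ the cluster label of object $i$ and noise entries i.i.d. with mean zero and variance $\sigma^2$, the strong law of large numbers gives

$$G_{ij} = \frac{1}{P}\sum_{p=1}^{P} x_{ip}\,x_{jp} \xrightarrow{\ \text{a.s.}\ } \lim_{P \to \infty} \frac{\mu_{k(i)}^{\top}\mu_{k(j)}}{P} + \sigma^{2}\,\mathbf{1}\{i = j\},$$

assuming the limit on the right exists. Apart from the diagonal term, the limit depends on $i$ and $j$ only through their cluster labels, so rows of $G$ arising from the same cluster share a common limiting vector, which is exactly the structure the algorithm detects.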