# A similarity index for comparing coupled matrices

TL;DR: A 2‐stage similarity index framework for comparing 2 matrices is proposed; applications include the investigation of redundancy in spectroscopic data and the investigation of assessor consistency or deviations in sensory science.

Abstract: Application of different multivariate measurement technologies to the same set of samples is an interesting challenge in many fields of applied data analysis. Our proposal is a 2‐stage similarity index framework for comparing 2 matrices in this type of situation. The first step is to identify factors (and associated subspaces) of the matrices by methods such as principal component analysis or partial least squares regression to provide good (low‐dimensional) summaries of their information content. Thereafter, statistical significances are assigned to the similarity values obtained at various factor subset combinations by considering orthogonal projections or Procrustes rotations, and the results are expressed compactly in corresponding summary plots. Applications of the methodology include the investigation of redundancy in spectroscopic data and the investigation of assessor consistency or deviations in sensory science. The proposed methodology is implemented in the R‐package “MatrixCorrelation” available online from CRAN.
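The two stages described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the code of the “MatrixCorrelation” package: the function names `pca_scores` and `smi` are hypothetical, PCA is used for the first stage, and the two index formulas (mean of squared singular values for orthogonal projections, squared mean of singular values for Procrustes rotations) are assumptions chosen to agree with the inequality quoted in excerpt (9) further down the page. The significance testing mentioned in the abstract is not covered.

```python
import numpy as np


def pca_scores(X, ncomp):
    """Orthonormal PCA score vectors of the column-centred matrix X."""
    Xc = X - X.mean(axis=0)
    # Left singular vectors span the same subspaces as the PCA scores.
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :ncomp]


def smi(X1, X2, ncomp1, ncomp2, projection="OP"):
    """Similarity of two PCA subspaces (assumed formulas, see lead-in).

    projection="OP": orthogonal projections, mean of squared singular values.
    projection="PR": Procrustes rotations, squared mean of singular values.
    """
    T = pca_scores(X1, ncomp1)
    U = pca_scores(X2, ncomp2)
    # Singular values of T'U are the cosines of the principal angles.
    s = np.linalg.svd(T.T @ U, compute_uv=False)
    r = min(ncomp1, ncomp2)
    if projection == "OP":
        return float(np.sum(s ** 2) / r)
    if projection == "PR":
        return float((np.sum(s) / r) ** 2)
    raise ValueError("projection must be 'OP' or 'PR'")


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.normal(size=(30, 6))   # synthetic data, for illustration only
    B = A[:, :4] + 0.1 * rng.normal(size=(30, 4))
    print(smi(A, B, 2, 2, "OP"), smi(A, B, 2, 2, "PR"))
```

Evaluating `smi` over a grid of `(ncomp1, ncomp2)` pairs gives the table of similarity values at "various factor subset combinations" that the abstract refers to.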

##### Citations


•

TL;DR: In this article, a synthetic measure is proposed for the similarity between two pairs of A matrices; the matching measure is defined as the cosine of the angle between vectors of proportions of variance explained by the factors.

Abstract: The usual approach, proposed by several authors, to matching two factor solutions obtained on two different samples measured by the same set of variables is the determination of the matrix where and are either matrices of correlations between variables and orthogonal or oblique factors or parallel projections on oblique factors, as reported by Fruchter (1966). When oblique factor solutions are compared, the difference in values obtained by the two definitions of A matrices can be substantial, so it is recommended to calculate the congruences of both pairs, the structure and pattern matrices. In this paper, a new approach is proposed in which a single synthetic measure represents the matching of both pairs of A matrices. The matching measure is defined as the cosine of the angle between vectors of proportions of variance explained by the factors.
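The cosine at the core of this measure is simple to compute. The sketch below shows only that final step: the abstract does not say how the vectors of proportions of variance are assembled from the structure and pattern matrices, so the function name `cosine_similarity` and the input vectors are hypothetical placeholders.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Hypothetical proportions of variance explained by three factors
    # in two samples; the numbers are made up for illustration.
    sample_1 = [0.45, 0.30, 0.10]
    sample_2 = [0.40, 0.33, 0.12]
    print(cosine_similarity(sample_1, sample_2))
```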

65 citations

••

TL;DR: An improved integrated approach to identify effective 3CLpro inhibitors from effective Chinese herbal formulas can be utilized in the discovery of antiviral drugs to achieve rapid acquisition of drugs with specific effects on antiviral targets.

Abstract: The severe situation of coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has not been reversed and continues to pose a great threat to global health. Therefore, there is an urgent need to find effective antiviral drugs. The 3-chymotrypsin-like protease (3CLpro) of SARS-CoV-2 serves as a promising antiviral target due to its essential role in the regulation of virus reproduction. Here, we report an improved integrated approach to identify effective 3CLpro inhibitors from effective Chinese herbal formulas. With this approach, we identified five natural products (NPs), namely narcissoside, kaempferol-3-O-gentiobioside, rutin, vicenin-2 and isoschaftoside, as potential anti-SARS-CoV-2 candidates. Subsequent molecular dynamics simulation additionally revealed that these molecules can be tightly bound to 3CLpro and confirmed effectiveness against COVID-19. Moreover, kaempferol-3-O-gentiobioside, vicenin-2 and isoschaftoside were reported for the first time to have SARS-CoV-2 3CLpro inhibitory activity. In summary, this optimized integrated strategy for drug screening can be utilized in the discovery of antiviral drugs to achieve rapid acquisition of drugs with specific effects on antiviral targets.

14 citations

••

09 Apr 2021

TL;DR: In this article, the performance of a wide range of peptide encodings on multiple datasets from different biomedical domains was investigated, and the results demonstrate that none of the encodings are superior for all biomedical domains.

Abstract: Owing to the great variety of distinct peptide encodings, working on a biomedical classification task at hand is challenging. Researchers have to determine encodings capable of representing underlying patterns as numerical input for the subsequent machine learning. A general guideline is lacking in the literature; thus, we present here the first large-scale comprehensive study to investigate the performance of a wide range of encodings on multiple datasets from different biomedical domains. For the sake of completeness, we added additional sequence- and structure-based encodings. In particular, we collected 50 biomedical datasets and defined a fixed parameter space for 48 encoding groups, leading to a total of 397 700 encoded datasets. Our results demonstrate that none of the encodings are superior for all biomedical domains. Nevertheless, some encodings often outperform others, thus reducing the initial encoding selection substantially. Our work allows researchers to objectively compare novel encodings to the state of the art. Our findings pave the way for a more sophisticated encoding optimization, for example, as part of automated machine learning pipelines. The work presented here is implemented as a large-scale, end-to-end workflow designed for easy reproducibility and extensibility. All standardized datasets and results are available for download to comply with FAIR standards.

13 citations

••

TL;DR: The results show how decades of quantitative research on human choice behavior can be synthesized within a single representational framework.

Abstract: Decision models are essential theoretical tools in the study of choice behavior, but there is little consensus about the best model for describing choice, with different fields and different research programs favoring their own idiosyncratic sets of models. Even within a given field, decision models are seldom studied alongside each other, and insights obtained using 1 model are not typically generalized to others. We present the results of a large-scale computational analysis that uses landscaping techniques to generate a representational structure for describing decision models. Our analysis includes 89 prominent models of risky and intertemporal choice, and results in an ontology of decision models, interpretable in terms of model spaces, clusters, hierarchies, and graphs. We use this ontology to measure the properties of individual models and quantify the relationships between different models. Our results show how decades of quantitative research on human choice behavior can be synthesized within a single representational framework.

13 citations

••

TL;DR: Although hoggorm shares some statistical methods with the Python library scikit-learn for machine learning, it follows the chemometrics paradigm for data analysis where great attention is paid to understanding and interpretation of the variance in the data.

Abstract: hoggorm is a Python library for explorative analysis of multivariate data that implements statistical methods typically used in the field of chemometrics (Tormod Næs & Martens, 1988). Although hoggorm shares some statistical methods with the Python library scikit-learn for machine learning, it follows the chemometrics paradigm for data analysis where great attention is paid to understanding and interpretation of the variance in the data.

10 citations

##### References


••

TL;DR: In this article, the authors investigated the problem of translating, rotating, reflecting and scaling configurations of points in p-dimensional space to minimize a goodness-of-fit criterion measured relative to the centroid of corresponding points.

Abstract: Suppose $P_j^{(i)}$ ($i = 1, 2, \ldots, m$; $j = 1, 2, \ldots, n$) give the locations of $mn$ points in $p$-dimensional space. Collectively these may be regarded as $m$ configurations, or scalings, each of $n$ points in $p$ dimensions. The problem is investigated of translating, rotating, reflecting and scaling the $m$ configurations to minimize the goodness-of-fit criterion $\sum_{i=1}^{m} \sum_{j=1}^{n} \Delta^2(P_j^{(i)} G_j)$, where $G_j$ is the centroid of the $m$ points $P_j^{(i)}$ ($i = 1, 2, \ldots, m$). The rotated positions of each configuration may be regarded as individual analyses with the centroid configuration representing a consensus, and this relationship with individual scaling analysis is discussed. A computational technique is given, the results of which can be summarized in analysis of variance form. The special case $m = 2$ corresponds to classical Procrustes analysis, but the choice of criterion that fits each configuration to the common centroid configuration avoids difficulties that arise when one set is fitted to the other, regarded as fixed.

2,852 citations

••

TL;DR: In this article, the use of Partial Least Squares (PLS) for handling collinearities among the independent variables X in multiple regression is discussed, and successive estimates are obtained using the residuals from previous rank as a new dependent variable y.

Abstract: The use of partial least squares (PLS) for handling collinearities among the independent variables X in multiple regression is discussed. Consecutive estimates $({\text{rank }}1,2,\cdots )$ are obtained using the residuals from previous rank as a new dependent variable y. The PLS method is equivalent to the conjugate gradient method used in Numerical Analysis for related problems. To estimate the “optimal” rank, cross validation is used. Jackknife estimates of the standard errors are thereby obtained with no extra computation. The PLS method is compared with ridge regression and principal components regression on a chemical example of modelling the relation between the measured biological activity and variables describing the chemical structure of a set of substituted phenethylamines.

2,290 citations

•

11 Sep 2013

1,790 citations

### Additional excerpts

...$r_1(X_1, X_2) = \operatorname{trace}(X_1^{\top} X_2) \big/ \left(\operatorname{trace}(X_1^{\top} X_1)\,\operatorname{trace}(X_2^{\top} X_2)\right)^{1/2}$ (10)...

[...]
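Excerpt (10) can be read as a trace-based correlation between two matrices of the same shape. The sketch below assumes that reading, including transposes on the first factor of each product, which the extracted excerpt does not show. The function name `r1` mirrors the symbol in the excerpt and is not taken from any package.

```python
import numpy as np


def r1(X1, X2):
    """trace(X1'X2) / sqrt(trace(X1'X1) * trace(X2'X2)) for same-shape matrices."""
    X1 = np.asarray(X1, dtype=float)
    X2 = np.asarray(X2, dtype=float)
    if X1.shape != X2.shape:
        raise ValueError("X1 and X2 must have the same shape")
    numerator = np.trace(X1.T @ X2)
    denominator = np.sqrt(np.trace(X1.T @ X1) * np.trace(X2.T @ X2))
    return float(numerator / denominator)


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    X = rng.normal(size=(10, 3))   # synthetic data, for illustration only
    noise = rng.normal(size=(10, 3))
    print(r1(X, X + 0.5 * noise))
```

Under this reading the coefficient is the cosine of the angle between the two matrices viewed as long vectors, so it lies in [-1, 1] and is unchanged when either matrix is multiplied by a positive constant.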

••

TL;DR: In this article, the shape-space $\Sigma_m^k$ whose points represent the shapes of not totally degenerate $k$-ads in $\mathbb{R}^m$ is introduced as a quotient space carrying the quotient metric.

Abstract: The shape-space $\Sigma_m^k$ whose points $\sigma$ represent the shapes of not totally degenerate $k$-ads in $\mathbb{R}^m$ is introduced as a quotient space carrying the quotient metric. When $m = 1$, we find that $\Sigma_1^k = S^{k-2}$; when $m \ge 3$, the shape-space contains singularities. This paper deals mainly with the case $m = 2$, when the shape-space $\Sigma_2^k$ can be identified with a version of $\mathbb{CP}^{k-2}$. Of special importance are the shape-measures induced on $\mathbb{CP}^{k-2}$ by any assigned diffuse law of distribution for the $k$ vertices. We determine several such shape-measures, we resolve some of the technical problems associated with the graphic presentation and statistical analysis of empirical shape distributions, and among applications we discuss the relevance of these ideas to testing for the presence of non-accidental multiple alignments in collections of (i) neolithic stone monuments and (ii) quasars. Finally the recently introduced Ambartzumian density is examined from the present point of view, its norming constant is found, and its connexion with random Crofton polygons is established.

1,468 citations

### Additional excerpts

...$\mathrm{SMI}_{OP}(T, U) - \mathrm{SMI}_{PR}(T, U) = \frac{1}{r} \sum_{k=1}^{r} (s_k - \bar{s})^2 \ge 0$, (9)...

[...]
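The inequality in excerpt (9) follows from the usual variance identity. The derivation below assumes, which the excerpt itself does not state, that $s_1, \ldots, s_r$ are the singular values of $T^{\top} U$, that $\bar{s}$ is their mean, that $\mathrm{SMI}_{OP}$ is the mean of the squared singular values, and that $\mathrm{SMI}_{PR}$ is the squared mean:

$$
\mathrm{SMI}_{OP} - \mathrm{SMI}_{PR}
= \frac{1}{r} \sum_{k=1}^{r} s_k^2 - \bar{s}^2
= \frac{1}{r} \sum_{k=1}^{r} \left( s_k^2 - 2 s_k \bar{s} + \bar{s}^2 \right)
= \frac{1}{r} \sum_{k=1}^{r} (s_k - \bar{s})^2 \ge 0 .
$$

The middle step uses $\frac{1}{r} \sum_{k} s_k = \bar{s}$, so the cross term contributes $-2\bar{s}^2$ and the constant term $+\bar{s}^2$. Under these assumptions, equality holds exactly when all $s_k$ are equal.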