
AperTO - Archivio Istituzionale Open Access dell'Università di Torino

Original citation:
Matrix Factorization with Interval-Valued Data
Published version DOI: 10.1109/TKDE.2019.2942310

Availability:
This is the author's manuscript, available at http://hdl.handle.net/2318/1726448 since 2020-02-03.

Terms of use: Open Access
Anyone can freely access the full text of works made available as "Open Access". Works made available under a Creative Commons license can be used according to the terms and conditions of that license. Use of all other works requires the consent of the rights holder (author or publisher), unless exempted from copyright protection by the applicable law.

(Article begins on next page)

Matrix Factorization with Interval-Valued Data

Journal: Transactions on Knowledge and Data Engineering
Manuscript ID: TKDE-2018-12-1264.R1
Manuscript Type: Regular
Keywords: Matrix factorization; interval-valued data; data mining (H.2.8.d); mining methods and algorithms (H.2.8.i); singular value decomposition (G.1.3.h); uncertainty, "fuzzy," and probabilistic reasoning (I.2.3.l)

Matrix Factorization with Interval-Valued Data

Mao-Lin Li, Francesco Di Mauro, K. Selçuk Candan, and Maria Luisa Sapino

Mao-Lin Li and K. Selçuk Candan are with the School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, Tempe, AZ 85281, USA. E-mail: {mao-lin.li,candan}@asu.edu
Francesco Di Mauro and Maria Luisa Sapino are with the Department of Computer Science, University of Turin, Italy. E-mail: {dimauro,mlsapino}@di.unito.it
Abstract—With many applications relying on multi-dimensional datasets for decision making, matrix factorization (or decomposition) is becoming the basis for many knowledge discovery and machine learning tasks, from clustering, trend detection, and anomaly detection to correlation analysis. Unfortunately, a major shortcoming of matrix analysis operations is that, despite their effectiveness when the data is scalar, these operations become difficult to apply in the presence of non-scalar data, as they are not designed for data that include non-scalar observations, such as intervals. Yet, in many applications, the available data are inherently non-scalar for various reasons, including imprecision in data collection, conflicts in aggregated data, data summarization, or privacy issues, where one is provided with a reduced, clustered, or intentionally noisy and obfuscated version of the data to hide information. In this paper, we propose matrix decomposition techniques that consider the existence of interval-valued data. We show that naive ways to deal with such imperfect data may introduce errors in analysis, and we present factorization techniques that are especially effective when the amount of imprecise information is large.
Index Terms—Matrix factorization, interval-valued data.
1 INTRODUCTION
With many machine-learning applications requiring latent semantics underlying the data sets, matrix factorization has emerged as a successful tool for discovering latent patterns in data [1], [2]: matrices are used to encode relationships among pairs of entities, and data are analyzed for their latent semantics through matrix decomposition operations, such as singular value decomposition (SVD [3]) or principal component analysis (PCA [4]).
1.1 Challenge: Interval-Valued Data
In many applications, data need to be represented as ranges or intervals of possible values, as opposed to scalar data:
Summarized data. Analyzing reduced or summarized data sets can be more efficient, especially for implementing interactive applications [5], [6]. When several observations are grouped and collapsed into a single observation, data may need to be represented as value ranges. While it may sometimes be possible to associate statistical meanings to the intervals and (assuming that appropriate generative models, probability distributions, and conditioning strategies are found) it might be possible to leverage probabilistic matrix factorization techniques, such as [7], this approach may be infeasible or ineffective due to the lack of appropriate statistical representations and/or the cost.
Data with conflicts. When a data set reflects knowledge integrated from different data sources, it might not be possible to assign a single scalar weight to each observation, and an interval of possible values might be a more appropriate representation [6]. Moreover, when analyzing such integrated data, the resulting intervals may not have a statistical interpretation beyond presenting the spread (i.e., minimum and maximum values) of the data.
Anonymized data. Various privacy-preserving data publishing algorithms, such as recoding techniques [8], replace precise scalar values with less precise value ranges or intervals (such as those obtained through value generalization [8]). The resulting intervals (intentionally) do not represent any specific data distribution; consequently, associating a statistical interpretation to the interval is not necessarily appropriate. This means that probabilistic techniques for data analysis are not appropriate for anonymized data sets. In Section 6, we see that the proposed approach is highly effective for interval-valued data generated through generalization.
Imprecise data. Data imprecision may be caused by various reasons, including limitations in measurement. For example, minute variations in multiple facial images from a single individual may be represented using interval-valued data (see [9] and Section 6.1.2). As we further discuss in Section 6.1.3, ambiguities in users' ratings in a collaborative filtering application may also be captured using interval-valued data [10], [11].
The key challenge in performing decompositions over interval-valued matrices is that definitions of basic algebraic operations, such as multiplication and inversion (needed to implement factorization operations), are not as straightforward for intervals as they are for scalars (see Section 2.1). Also, unlike scalars, which are totally ordered, intervals are often only partially ordered. Furthermore, a naive approach (which would exhaustively enumerate all possible decompositions) would significantly increase the computational complexity of the problem (which is already high for the case with scalar weights). Therefore, many basic operations need to be redefined to accommodate such non-scalar data, and these need to be implemented in ways that avoid increases in computational costs.
1.2 Contributions of this Paper
In this paper, we study the problem of obtaining decompositions of interval-valued matrices:
- We present the decomposition problem for interval-valued data sets. We introduce interval-valued algebra and discuss the core challenges presented by the decomposition of interval-valued data.
- We propose an interval-valued latent semantics alignment scheme and, relying on this, we develop algorithms for obtaining eigenvalue-based (such as SVD [3]) or probabilistic (such as PMF [7]) decompositions for interval-valued data matrices.
- We finally study the effectiveness of the proposed schemes in several applications, including face image analysis and collaborative filtering.
1.3 Organization of the Paper
The paper is organized as follows: We introduce the background and review the related work in Section 2. Section 3 introduces the problem and presents the key observations and mathematical formulations regarding interval-valued latent spaces. Section 4 presents the proposed interval singular value decomposition (ISVD) algorithm to obtain SVD decompositions in the presence of interval-valued data. Section 5 shows that the proposed semantic alignment technique can also be used in probabilistic matrix factorization scenarios. Section 6 reports our experimental results under diverse scenarios. We conclude the paper in Section 7.
2 BACKGROUND AND RELATED WORK
2.1 Interval Algebra
We first formalize the definitions for interval-valued data and its algebraic operations:
Definition 1 (Interval representation). An interval $\mathbf{a}$ is a pair
$$\mathbf{a} = [\underline{a}, \overline{a}], \quad \underline{a} \leq \overline{a},$$
where $\underline{a}$ is the minimum value and $\overline{a}$ is the maximum value of the interval $\mathbf{a}$. If $\underline{a} = \overline{a}$, then $\mathbf{a}$ is scalar.

Definition 2 (Interval span). Given an interval $\mathbf{a}$, the corresponding span is computed as
$$\mathrm{span}(\mathbf{a}) = \mathrm{span}([\underline{a}, \overline{a}]) = (\overline{a} - \underline{a}) \in \mathbb{R}.$$
Definition 3 (Interval algebraic operations). Given two intervals, $[\underline{a}, \overline{a}]$ and $[\underline{b}, \overline{b}]$, we adopt the following interval algebraic operations on them [12]:
- addition: $[\underline{a}, \overline{a}] + [\underline{b}, \overline{b}] = [\underline{a} + \underline{b},\ \overline{a} + \overline{b}]$,
- subtraction: $[\underline{a}, \overline{a}] - [\underline{b}, \overline{b}] = [\underline{a} - \overline{b},\ \overline{a} - \underline{b}]$,
- multiplication: $[\underline{a}, \overline{a}] \times [\underline{b}, \overline{b}] = [\min(\underline{a}\underline{b},\ \underline{a}\overline{b},\ \overline{a}\underline{b},\ \overline{a}\overline{b}),\ \max(\underline{a}\underline{b},\ \underline{a}\overline{b},\ \overline{a}\underline{b},\ \overline{a}\overline{b})]$.
When one of the values, say $a$, is scalar, the multiplication $[a, a] \times [\underline{b}, \overline{b}]$ can be written as $[\min(a\underline{b},\ a\overline{b}),\ \max(a\underline{b},\ a\overline{b})]$, and the corresponding value of the span is $\mathrm{span}(a \times [\underline{b}, \overline{b}]) = |a| \times \mathrm{span}([\underline{b}, \overline{b}])$.
Note that, given the above definition of interval algebraic operations, more complex interval-valued operations, such as interval-valued matrix algebra, can be defined by replacing scalar addition, subtraction, and multiplication operations with their interval-valued counterparts.
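As an illustration, here is a minimal Python sketch of the interval operations in Definitions 1-3; the `Interval` class and the example values are ours, for illustration only, not from the paper.

```python
# Minimal sketch of Definitions 1-3: an interval as a (lo, hi) pair.
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    lo: float  # minimum value (underlined a in Definition 1)
    hi: float  # maximum value (overlined a in Definition 1)

    def __post_init__(self):
        assert self.lo <= self.hi, "an interval requires lo <= hi"

    @property
    def span(self) -> float:
        # Definition 2: span([lo, hi]) = hi - lo
        return self.hi - self.lo

    def __add__(self, other: "Interval") -> "Interval":
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other: "Interval") -> "Interval":
        # note the cross terms: [a, a'] - [b, b'] = [a - b', a' - b]
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other: "Interval") -> "Interval":
        # Definition 3: min/max over all four endpoint products
        products = (self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi)
        return Interval(min(products), max(products))

# e.g. [1, 2] * [-3, 4] = [-6, 8]; span([1, 2] + [0, 1]) = 2.0
print(Interval(1, 2) * Interval(-3, 4), (Interval(1, 2) + Interval(0, 1)).span)
```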
2.2 Matrix Factorization
Feature selection and dimensionality reduction techniques [13] usually involve some (often linear) transformation of the vector space containing the data to help focus on a few features (or combinations of features) that best discriminate the data in a given corpus. Among these transformations, the Karhunen-Loève Transform, KLT (also known as principal component analysis, PCA [14]), and singular value decomposition, SVD [3], have the key property that the vectors selected as the dimensions of the space are mutually orthogonal and, hence, linearly independent (i.e., there is no redundancy among the dimensions). The resulting basis vectors are referred to as the latent variables [15] or the latent semantics of the data [3]. While KLT and SVD may result in negative values, in non-negative matrix factorization (NMF) [16], [17], the factor matrices are non-negative, which enables a probabilistic interpretation of the results and the discovery of generative models. Below, we outline three common matrix factorization schemes: singular value decomposition (SVD [3]), non-negative matrix factorization (NMF [16], [17]), and probabilistic matrix factorization (PMF [7]).
2.2.1 Singular Value Decomposition (SVD)
Let $M \in \mathbb{R}^{n \times m}$ represent the input matrix, and let the rank, $r$, be a positive integer with $r \leq \min(n, m)$. In this paper, we denote the value at the $i$-th row and $j$-th column of $M$ as $M[i, j]$; the $j$-th column vector of $M$ is similarly denoted as $M[j]$. $M$ can be decomposed into $M = U \Sigma V^T$ through singular value decomposition (SVD), where
- $U \in \mathbb{R}^{n \times r}$, with orthonormal columns, i.e., $U^T U = I_r$;
- $\Sigma \in \mathrm{diag}(\mathbb{R}^r_+)$;
- $V^T = \mathrm{transpose}(V)$, $V \in \mathbb{R}^{m \times r}$, with orthonormal columns, i.e., $V^T V = I_r$.
The columns of $U$, also called the left singular vectors of matrix $M$, are the eigenvectors of the $n \times n$ matrix $M M^T$. The columns of $V$, or the right singular vectors of $M$, are the eigenvectors of the $m \times m$ matrix $M^T M$. Note that both the columns of $U$ and the columns of $V$ are mutually orthogonal.
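For readers who want to experiment, the following short NumPy sketch computes a rank-$r$ SVD as described above; the input matrix and the target rank are arbitrary illustrative choices.

```python
# Rank-r SVD with NumPy; matrix and rank are arbitrary examples.
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 4))   # n x m input matrix
r = 2                             # target rank, r <= min(n, m)

U_full, s, Vt_full = np.linalg.svd(M, full_matrices=False)
U, Sigma, Vt = U_full[:, :r], np.diag(s[:r]), Vt_full[:r, :]

# Columns of U (resp. V) are eigenvectors of M M^T (resp. M^T M);
# U^T U = I_r and V^T V = I_r up to floating-point error.
print(np.allclose(U.T @ U, np.eye(r)))     # True
print(np.linalg.norm(M - U @ Sigma @ Vt))  # rank-r reconstruction residual
```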
2.2.2 Non-negative Matrix Factorization (NMF)
Given a non-negative matrix $M \in \mathbb{R}^{n \times m}_+$, NMF factorizes $M$ into two non-negative matrices $U \in \mathbb{R}^{n \times r}_+$ and $V \in \mathbb{R}^{m \times r}_+$ with target rank $r$, which minimize the $L_2$ loss function
$$\mathcal{L}_{NMF} = \| M - U V^T \|^2_F,$$
where $U \geq 0$, $V \geq 0$, and $\|\cdot\|^2_F$ denotes the (squared) Frobenius norm. Approximate solutions for $U$ and $V$ are commonly found by iterative update rules, such as [17]
$$U[i, j] \leftarrow U[i, j]\,\frac{(M V)[i, j]}{(U V^T V)[i, j]}, \qquad V^T[i, j] \leftarrow V^T[i, j]\,\frac{(U^T M)[i, j]}{(U^T U V^T)[i, j]}.$$
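The multiplicative updates above translate directly into a few lines of NumPy; the sketch below is an illustrative implementation (the initialization, iteration count, and the small epsilon guarding against division by zero are our choices, not from [17]).

```python
# Multiplicative-update NMF following the rules of [17] shown above.
import numpy as np

def nmf(M, r, iters=200, eps=1e-9, seed=0):
    rng = np.random.default_rng(seed)
    n, m = M.shape
    U = rng.random((n, r))
    V = rng.random((m, r))
    for _ in range(iters):
        # U[i,j] <- U[i,j] * (M V)[i,j] / (U V^T V)[i,j]
        U *= (M @ V) / (U @ V.T @ V + eps)
        # V update, written row-wise for V (equivalent to the V^T rule)
        V *= (M.T @ U) / (V @ U.T @ U + eps)
    return U, V

M = np.abs(np.random.default_rng(1).standard_normal((6, 4)))
U, V = nmf(M, r=2)
print(np.linalg.norm(M - U @ V.T))  # Frobenius reconstruction error
```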
[9] extended these to interval-valued matrices as follows:
$$\mathcal{L}_{INMF} = \| \underline{M} - U \underline{V}^T \|^2_F + \| \overline{M} - U \overline{V}^T \|^2_F$$
$$U[i, j] \leftarrow U[i, j]\,\frac{(\underline{M}\,\underline{V} + \overline{M}\,\overline{V})[i, j]}{\big(U (\underline{V}^T \underline{V} + \overline{V}^T \overline{V})\big)[i, j]}$$
$$\underline{V}^T[i, j] \leftarrow \underline{V}^T[i, j]\,\frac{(U^T \underline{M})[i, j]}{(U^T U \underline{V}^T)[i, j]}, \qquad \overline{V}^T[i, j] \leftarrow \overline{V}^T[i, j]\,\frac{(U^T \overline{M})[i, j]}{(U^T U \overline{V}^T)[i, j]}.$$
Note that this scheme, called I-NMF, factorizes the matrix into a scalar-valued $U$ and an interval-valued $\mathbf{V} = [\underline{V}, \overline{V}]$.
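The following is a hedged NumPy sketch of this I-NMF scheme as reconstructed above: a shared non-negative scalar-valued $U$ with an interval-valued $\mathbf{V} = [\underline{V}, \overline{V}]$. The $U$ update implements the multiplicative rule derived from the combined loss; it reflects our reading of [9] rather than code from that paper.

```python
# Sketch of I-NMF (our reconstruction, not verbatim from [9]):
# scalar-valued U shared across both interval endpoints of V.
import numpy as np

def inmf(M_lo, M_hi, r, iters=200, eps=1e-9, seed=0):
    rng = np.random.default_rng(seed)
    n, m = M_lo.shape
    U = rng.random((n, r))
    V_lo, V_hi = rng.random((m, r)), rng.random((m, r))
    for _ in range(iters):
        # multiplicative U update derived from the combined loss
        U *= (M_lo @ V_lo + M_hi @ V_hi) / (
            U @ (V_lo.T @ V_lo + V_hi.T @ V_hi) + eps)
        # endpoint-wise V updates, mirroring the scalar NMF rule
        V_lo *= (M_lo.T @ U) / (V_lo @ U.T @ U + eps)
        V_hi *= (M_hi.T @ U) / (V_hi @ U.T @ U + eps)
    return U, V_lo, V_hi

# toy usage: sorted random endpoints so that M_lo <= M_hi elementwise
M = np.sort(np.random.default_rng(1).random((2, 6, 4)), axis=0)
U, V_lo, V_hi = inmf(M[0], M[1], r=2)
```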
2.2.3 Probabilistic Matrix Factorization (PMF)
Probabilistic matrix factorization (PMF) [7] assumes that matrix entries are drawn from Gaussian distributions. In particular, given a matrix $M \in \mathbb{R}^{n \times m}$, the conditional distribution over the observed values is defined as
$$p(M \mid U, V, \sigma^2) = \prod_{i=1}^{n} \prod_{j=1}^{m} \big[\mathcal{N}(M[i, j] \mid U_{[i,:]} V_{[j,:]}^T,\ \sigma^2)\big]^{I_{ij}},$$
where $\mathcal{N}(\cdot \mid \mu, \sigma^2)$ is the probability density function of the Gaussian distribution with mean $\mu$ and variance $\sigma^2$, and $I_{ij}$ is the indicator function that is equal to 1 if $M[i, j]$ is not null and 0 otherwise. Above, $U_{[i,:]}$ and $V_{[j,:]}$ are row vectors¹ in $U$ and $V$, such that $M[i, j] \approx U_{[i,:]} V_{[j,:]}^T$, and zero-mean spherical Gaussian priors are placed on the latent semantics². The factors, $U$ and $V$, are computed via the loss function
$$\mathcal{L}_{PMF} = \| M - U V^T \|^2_F + \lambda_U \| U \|^2_F + \lambda_V \| V^T \|^2_F,$$
where $\lambda_U = \sigma^2/\sigma_U^2$, $\lambda_V = \sigma^2/\sigma_V^2$, and $\|\cdot\|^2_F$ denotes the (squared) Frobenius norm. A local minimum of the loss function $\mathcal{L}_{PMF}$ can be found via gradient descent in $U_{[i,:]}$ and $V_{[j,:]}^T$:
$$\frac{\partial \mathcal{L}_{PMF}}{\partial U_{[i,:]}} = \sum_{j=1}^{m} \big(U_{[i,:]} V_{[j,:]}^T - M[i, j]\big) V_{[j,:]} + \lambda_U U_{[i,:]}$$
$$\frac{\partial \mathcal{L}_{PMF}}{\partial V_{[j,:]}^T} = \sum_{i=1}^{n} \big(U_{[i,:]} V_{[j,:]}^T - M[i, j]\big) U_{[i,:]}^T + \lambda_V V_{[j,:]}^T.$$
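As a concrete illustration, the sketch below minimizes the regularized PMF loss by full-batch gradient descent over the observed entries; the learning rate, regularization weights, and initialization are illustrative assumptions (PMF implementations often use stochastic updates instead).

```python
# Full-batch gradient descent on the regularized PMF loss above.
import numpy as np

def pmf(M, mask, r, lam_u=0.1, lam_v=0.1, lr=0.01, iters=500, seed=0):
    rng = np.random.default_rng(seed)
    n, m = M.shape
    U = 0.1 * rng.standard_normal((n, r))
    V = 0.1 * rng.standard_normal((m, r))
    for _ in range(iters):
        # residuals restricted to observed entries (I_ij = mask[i, j])
        E = mask * (U @ V.T - M)
        U -= lr * (E @ V + lam_u * U)    # gradient with respect to U
        V -= lr * (E.T @ U + lam_v * V)  # gradient with respect to V
    return U, V

# toy usage: a rank-1 5x4 matrix with one entry treated as unobserved
M = np.outer(np.arange(1, 6.0), np.arange(1, 5.0))
mask = np.ones_like(M)
mask[0, 0] = 0
U, V = pmf(M, mask, r=2)
print(np.linalg.norm(mask * (M - U @ V.T)))  # small observed-entry error
```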
2.3 Analysis of Symbolic and Interval-Valued Data
In the real world, data rarely come in simple scalar form. Variables of interest may take complex, often symbolic, forms, including sets, histograms, vectors, intervals, or probability distributions [18], [19], [20], [21]. This is especially true when data is aggregated [22] or anonymized [8]. Consequently, several data analysis tools, including regression [23], [24], canonical analysis [25], and multi-dimensional scaling [26], have been developed for symbolic and interval-valued data. Given the popularity of PCA in data analysis, several interval-valued PCA algorithms have also been proposed [27], [28], [29], [30], most of which leverage the specific statistical and geometric meanings of the principal components of a system of variables. As discussed above, interval NMF and PMF [9] have also been studied, to resolve alignment approximation in face analysis and rating approximation in collaborative filtering. In contrast, we develop a more general interval-valued latent semantic alignment algorithm, which can be integrated into common matrix factorization approaches and directly leverages interval-valued properties.

1. In this paper, we use $X_{[i,:]}$ to denote the $i$-th row vector and $X_{[:,j]}$ to denote the $j$-th column vector of matrix $X$.
2. Note that here, and in the rest of the paper, we use $\approx$ to denote "approximately equal".

[Fig. 1: Scalar latent semantic spaces]
[Fig. 2: Interval-valued latent semantic spaces; (a) an interval-valued latent space, (b) a point in an interval-valued latent space]
3 INTERVAL-VALUED LATENT SPACES
In Section 2.2.1, we have seen that a scalar matrix can be decomposed into factor matrices ($U$ and $V$) and a core matrix ($\Sigma$). The columns in the factor matrices are referred to as latent semantics (LS) and, preferably, they are mutually orthogonal, so that they can serve as a basis of the transformed space. Figure 1 shows a scalar-valued latent semantic space superimposed on the original space; the figure also shows a scalar-valued data point projected onto both the original and latent spaces. Unfortunately, scalar-valued latent spaces are not sufficient to represent interval-valued data.

3.1 Interval-Valued Decomposition
Here, we first extend the definition of singular value decomposition to take into account the presence of interval-valued data.

Definition 4 (Interval-valued decomposition). Given an interval-valued matrix $\mathbf{M} \in \mathbb{R}^{n \times m}$ and a target rank $r \leq \min(n, m)$, interval-valued SVD would decompose $\mathbf{M}$ into $\mathbf{M} \approx \mathbf{U} \mathbf{\Sigma} \mathbf{V}^T$, such that
- $\mathbf{U} \in \mathbb{R}^{n \times r}$ and $\mathbf{V} \in \mathbb{R}^{m \times r}$ are (potentially interval-valued) matrices, such that the columns of $\mathbf{U}$ and $\mathbf{V}$ are quasi-orthonormal; i.e., given column indexes $h$ and $l$,

Citations

Tensor-Train Decomposition in the Presence of Interval-Valued Data (Journal Article)
TL;DR: Extends the known Tensor-Train technique for tensor decomposition to handle uncertain data, here modeled as intervals.

SIRTEM: Spatially Informed Rapid Testing for Epidemic Modeling and Response to COVID-19 (Journal Article)
TL;DR: Presents SIRTEM, a novel extended spatially informed epidemic model that integrates a multi-modal testing strategy accounting for test accuracies, and develops an optimization model that provides a cost-effective testing strategy when multiple test types are available.

Matrix Factorization with Interval-Valued Data (Conference Paper)
TL;DR: Proposes matrix decomposition techniques that consider the existence of interval-valued data, shows that naive ways to deal with such imperfect data may introduce errors in analysis, and presents factorization techniques that are especially effective when the amount of imprecise information is large.
References

Elements of Information Theory (Book)
TL;DR: Examines the role of entropy, inequality, and randomness in the design and construction of codes.

Indexing by Latent Semantic Analysis (Journal Article)
TL;DR: A new method for automatic indexing and retrieval that takes advantage of implicit higher-order structure in the association of terms with documents ("semantic structure") in order to improve the detection of relevant documents on the basis of terms found in queries.

Learning the Parts of Objects by Non-negative Matrix Factorization (Journal Article)
TL;DR: Demonstrates an algorithm for non-negative matrix factorization that is able to learn parts of faces and semantic features of text, in contrast to other methods that learn holistic, not parts-based, representations.

Learning Parts of Objects by Non-negative Matrix Factorization (D. D. Lee)
TL;DR: Uses non-negative matrix factorization to learn parts of faces and semantic features of text, in contrast to principal components analysis and vector quantization, which learn holistic, not parts-based, representations.

Probabilistic Matrix Factorization (Conference Paper)
TL;DR: Presents the Probabilistic Matrix Factorization (PMF) model, which scales linearly with the number of observations, performs well on the large, sparse, and very imbalanced Netflix dataset, and can be extended to include an adaptive prior on the model parameters.
Frequently Asked Questions

Q1. What are the contributions in "Matrix Factorization with Interval-Valued Data"?
In many applications, the available data are inherently non-scalar for various reasons, including imprecision in data collection, conflicts in aggregated data, data summarization, or privacy issues, where one is provided with a reduced, clustered, or intentionally noisy and obfuscated version of the data to hide information. In this paper, the authors propose matrix decomposition techniques that consider the existence of interval-valued data. The authors show that naive ways to deal with such imperfect data may introduce errors in analysis and present factorization techniques that are especially effective when the amount of imprecise information is large.

Additional excerpts from the paper:

- Let matrices $V_*$ and $V^*$ capture the eigenvectors, and $\Sigma_*$ and $\Sigma^*$ be diagonal matrices encoding the square roots of the eigenvalues, of the corresponding matrices …

- Corollary 2: Corollary 1 further implies that $U_\dagger U_\dagger^T$ and $V_\dagger V_\dagger^T$ cannot be equal to the scalar-valued identity matrix $I$, which means that an exact decomposition of interval-valued matrices is not possible.

- The result is visualized in Figure 5(b): after the recomputation step, the $V_*$ and $V^*$ matrices become much more similar, indicating more precise factor matrices, an improvement which (as detailed in Section 6) contributes to more accurate decompositions.

- This approach results in $U$ and $V$ matrices that are scalar and orthonormal and a $\Sigma$ core matrix that is also scalar-valued (i.e., it is compatible only with the decomposition target-c, discussed in Section 3.4).

- Figure 6 provides an overview of the accuracy and execution time results for the default configuration: the highest accuracies are obtained using the ISVD#-b class of techniques (returning both scalar-valued factors and an interval-valued core), with the highest overall accuracy provided by ISVD4-b, which leverages both semantic alignment and latent space recomputation; the ISVD#-c class of techniques (returning scalar-valued factor and core matrices) approximates the accuracy of the ISVD0 technique but includes redundant work; linear-programming-based competitors [33], [35] have poor accuracy and massive execution times, since (as also acknowledged by their authors) those approaches are effective only when the interval ranges are very small, while the proposed approaches handle intervals of varying sizes effectively.

- Given this observation, the authors present an interval latent semantic alignment problem, which optimally combines minimum and maximum vectors to form an interval-valued latent space.