Proceedings Article • DOI: 10.1109/ICDM.2010.115 • 2010 IEEE International Conference on Data Mining, 13 Dec 2010, pp. 1037-1042
Interval-valued Matrix Factorization with Applications
Zhiyong Shen¹³, Liang Du²¹, Xukun Shen³, Yidong Shen²
¹ Hewlett Packard Labs China, zhiyongs@hp.com
² State Key Laboratory of Computer Science, China, {duliang,ydshen}@ios.ac.cn
³ State Key Laboratory of Virtual Reality Technology and Systems, China, xkshen@vrlab.buaa.edu.cn
Abstract—In this paper, we propose the Interval-valued Matrix Factorization (IMF) framework. Matrix Factorization (MF) is a fundamental building block of data mining. MF techniques, such as Nonnegative Matrix Factorization (NMF) and Probabilistic Matrix Factorization (PMF), are widely used in data mining applications. For example, NMF has shown its advantage in Face Analysis (FA), while PMF has been successfully applied to Collaborative Filtering (CF). In this paper, we analyze the data approximation in FA as well as CF applications and construct interval-valued matrices to capture these approximation phenomena. We adapt the basic NMF and PMF models to interval-valued matrices and propose Interval-valued NMF (I-NMF) as well as Interval-valued PMF (I-PMF). We conduct extensive experiments to show that the proposed I-NMF and I-PMF significantly outperform their single-valued counterparts in FA and CF applications.
Keywords: matrix factorization, uncertainty
I. INTRODUCTION
Exploring data approximation has attracted much attention in uncertain data mining [1] and privacy preserving data mining [2]. Data approximation might be caused by limitations of measurement, delayed data updates, or intentional data perturbation. When traditional data mining techniques are employed, taking data approximation into account may improve the quality of the results. Thus, various data mining techniques, such as clustering, classification, and association mining, have been adapted to handle data approximation. In this paper, we aim to inject data approximation into Matrix Factorization (MF) techniques. MF, also known as matrix decomposition, underlies many data mining techniques including clustering, dimensionality reduction, and missing data prediction. It decomposes an input data matrix into a number of low-rank factor matrices, which leads to a more compact linear approximation of the original data matrix. Variations of MF have been extensively studied in the literature.
In this paper, we pay special attention to Nonnegative Matrix Factorization (NMF) [3], [4] and Probabilistic Matrix Factorization (PMF) [5]. Each of these MF techniques is suited for a particular class of applications. For example, NMF has shown its advantage in Face Analysis (FA) [4]. In FA applications, each face is represented by a feature vector. NMF factorizes the matrix of multiple face feature vectors into factor matrices and thus achieves a more compact representation of the original face data. On the other hand, PMF has been successfully applied to Collaborative Filtering (CF) [6]. CF is one of the most successful techniques for automatic recommendation systems, which need only an observed rating matrix as input. PMF decomposes the sparse rating matrix into a user profile matrix and an item profile matrix, and then makes predictions for the unknown entries. However, traditional NMF and PMF ignore the following data approximation phenomena in FA and CF.
Alignment approximation in FA: The faces need to be rotated and aligned so that the same columns in the data matrix correspond to the same positions in faces. Such alignment is rarely perfect in practice, i.e., there is approximation in the alignment in FA applications (see Section II-A for details).
Rating approximation in CF: When a user rates an item in a real-life rating system, she/he usually selects a discretized rating value which is close to the ideal numerical preference value (the exact preference degree). Thus, the rating matrix does contain approximation to some degree (see Section II-B for details).
Interval bounds are better suited than single-valued variables to describe the above approximation phenomena. Many application areas have taken advantage of interval-valued data analysis (see for instance [7]), such as object tracking, market analysis, quantitative economics and so on. In traditional MF techniques, the input data matrices might contain real values, non-negative values or binary values, all of which are single-valued. In this paper, we introduce a new type of data matrix, the interval-valued matrix, to MF, which captures approximation in the observed data matrix. Then, we propose a novel MF framework, Interval-valued Matrix Factorization (IMF), to decompose such matrices. Under the IMF framework, we inject data approximation into NMF and PMF and extend them to interval-valued NMF (I-NMF for short) and interval-valued PMF (I-PMF for short). Therefore, our work is a marriage between interval-valued data analysis [7] and MF, and our contributions to both research areas are summarized as follows:
• We analyze the alignment approximation in FA as well as the rating approximation in CF, and formalize them with interval-valued matrices (Section II).
• We propose the IMF framework, under which we extend two representative basic MF techniques, NMF
and PMF, to I-NMF and I-PMF, which are capable of handling interval-valued matrices (Section IV).
• We conduct extensive experiments to show that the proposed I-NMF and I-PMF significantly outperform their traditional single-valued counterparts in FA and CF applications (Section V).
II. INTERVAL-VALUED MATRIX AND DATA APPROXIMATION
In this section we formalize the approximation in CF and FA problems with interval-valued matrices. First of all, we give formal definitions of the interval-valued matrix.
Let $\mathbf{X} \in \mathbb{R}^{n \times d}$ denote the input data matrix, with entries denoted as $X_{ij}$. Let $I(\mathbf{X})$ denote the interval-valued matrix corresponding to $\mathbf{X}$; we have the following two equivalent representations for $I(\mathbf{X})$.
Deļ¬nition 1 (Center-radius representation). We denote the
interval with center š‘‹
š‘–š‘—
and radius š›æ
š‘–š‘—
as
š¼(š‘‹
š‘–š‘—
)=āŸØš‘‹
š‘–š‘—
,š›æ
š‘–š‘—
āŸ© (1)
For entire matrices, we have š¼(š‘æ)=āŸØš‘æ, šœ¹āŸ©.
Deļ¬nition 2 (Min-max representation). We denote the in-
terval bounds as š‘‹
low
š‘–š‘—
= š‘‹
š‘–š‘—
āˆ’ š›æ
š‘–š‘—
and š‘‹
up
š‘–š‘—
= š‘‹
š‘–š‘—
+ š›æ
š‘–š‘—
.
š¼(š‘‹
š‘–š‘—
)=[š‘‹
low
š‘–š‘—
,š‘‹
up
š‘–š‘—
] (2)
For entire matrices, we have š¼(š‘æ)=[š‘æ
low
, š‘æ
up
].
In practice, we might only observe single-valued data matrices rather than interval-valued ones. In the following subsections we give empirical methods to construct $I(\mathbf{X})$ based on $\mathbf{X}$. The above definitions have already been adopted in interval-valued data analysis [8]. In our work, we use the center-radius representation (Definition 1) to formalize the rating approximation in CF and the alignment approximation in FA, and then construct interval-valued matrices. The min-max representation (Definition 2) will be used as input for the proposed IMF models introduced in Section IV.
A. Alignment Approximation in FA
In many FA techniques, we need to align the face images such that, ideally, pixels with the same coordinates correspond to identical positions of a face. In Figure 1, we take the position of the nose tip as an example to show that the alignment is not perfect. Although the same position of a face is not exactly aligned across images, the corresponding pixels should be near to each other. Taking the first row as an example, the pixel with coordinates (33,35) may correspond to the face position with coordinates (41,34) in the second image, or (33,40) in the third, and so on. Formally, the value of a pixel with coordinates $(x, y)$, $x \in \{1, \ldots, d_x\}$, $y \in \{1, \ldots, d_y\}$, might correspond to a pixel with coordinates $(x + \Delta x, y + \Delta y)$, $0 \le \Delta x, \Delta y \le r$.
[Figure 1. Illustration of alignment approximation: face images with the coordinates of the nose tip marked, e.g. (33,35), (41,34), (33,40), (25,32), (44,33), (34,32), (29,38), (32,36), (35,38), (37,37).]
[Figure 2. An example of the $\boldsymbol{\delta}$ matrix corresponding to the faces in Figure 1.]
In MF, the $i$'th face is represented by a vector $\mathbf{X}_{i\cdot}$ with dimensionality $d = d_x \times d_y$. We use $(x^{(i,j)}, y^{(i,j)})$ to denote the coordinates of the pixel in the $i$'th image which corresponds to the $j$'th element of the vector $\mathbf{X}_{i\cdot}$, namely $X_{ij}$. Then, we define the following set of entries in $\mathbf{X}$ for each $X_{ij}$:
$$\mathcal{S}^{\mathrm{FA}(r)}_{ij} = \{X_{ij'} \mid |x^{(i,j')} - x^{(i,j)}| \le r \,\wedge\, |y^{(i,j')} - y^{(i,j)}| \le r\} \quad (3)$$
The elements in $\mathcal{S}^{\mathrm{FA}(r)}_{ij}$ correspond to pixels around $(x^{(i,j)}, y^{(i,j)})$ within a range $r$. Intuitively, $X_{ij}$ may correspond to a value in the interval $[\min(\mathcal{S}^{\mathrm{FA}(r)}_{ij}), \max(\mathcal{S}^{\mathrm{FA}(r)}_{ij})]$, which coincides with the min-max definition (Definition 2). However, min-max statistics are not robust in practice; alternatively, we construct $I(X_{ij})$ based on the standard deviation, to capture the variation in $\mathcal{S}^{\mathrm{FA}(r)}_{ij}$. According to Definition 1, we set $X_{ij}$ as the center of $I(X_{ij})$ and calculate the radius via
$$\delta^{\mathrm{FA}(r)}_{ij} := \alpha \cdot \mathrm{std}(\mathcal{S}^{\mathrm{FA}(r)}_{ij}) \quad (4)$$
where $\alpha \in \mathbb{R}_{+}$ is a multiplicative scale coefficient. Based on Definition 2, it is then easy to calculate the bounds of the interval-valued input for I-NMF according to the min-max representation. Examples of the $\boldsymbol{\delta}^{\mathrm{FA}(r)}_{i\cdot}$ corresponding to the faces in Figure 1 are shown in Figure 2, where a lighter gray level represents a larger radius. In Figure 2, we can see that positions such as the eyes or nose have larger radii. These positions are more sensitive to alignment errors, which may hurt the performance of single-valued techniques. With a relatively large radius, the interval-valued techniques may be more tolerant to such alignment errors.
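To make the construction in (3)-(4) concrete, the following is a minimal NumPy sketch of how the per-pixel radius could be computed for one aligned face image; the function name, array layout and default values of r and alpha are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def fa_radius(face, r=5, alpha=2.5):
    """Sketch of eqs. (3)-(4): interval radius delta^{FA(r)} for each pixel.

    face  : 2-D array (d_y x d_x) holding one aligned face image.
    r     : neighborhood range; S^{FA(r)}_ij collects pixels within +/- r in x and y.
    alpha : multiplicative scale coefficient applied to the local standard deviation.
    """
    d_y, d_x = face.shape
    delta = np.zeros_like(face, dtype=float)
    for y in range(d_y):
        for x in range(d_x):
            # S^{FA(r)}_ij: pixels whose coordinates differ from (x, y) by at most r
            neighborhood = face[max(0, y - r):y + r + 1, max(0, x - r):x + r + 1]
            delta[y, x] = alpha * neighborhood.std()
    return delta

# Min-max bounds (Definition 2) for the flattened face vector X_i.:
# x = face.reshape(-1); d = fa_radius(face).reshape(-1)
# x_low_row, x_up_row = x - d, x + d
```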
Table I
EXAMPLES OF SINGLE-VALUED AND INTERVAL-VALUED RATING MATRICES FOR CF
(a) A single-valued rating matrix $\mathbf{X}$ over users $u_1, \ldots, u_6$ and items $m_1, \ldots, m_5$ (blank entries are unobserved); the observed ratings per user are:
$u_1$: 1, 4, 5;  $u_2$: 3, 1, 2;  $u_3$: 1, 4;  $u_4$: 5;  $u_5$: 1, 4, 2;  $u_6$: 3, 2, 5
(b) The corresponding interval-valued rating matrix $I(\mathbf{X})$:
$u_1$: [0.6,1.4], [3.5,4.5], [4.8,5.2];  $u_2$: [2.8,3.2], [0.5,1.5], [1.5,2.5];  $u_3$: [0.7,1.3], [3.5,4.5];  $u_4$: [4.5,5.5];  $u_5$: [0.4,1.6], [3.7,4.3], [1.8,2.2];  $u_6$: [2.7,3.3], [1.4,2.6], [4.2,5.8]
B. Rating Approximation in CF
In CF, the rating degree is actually an approximation of the actual preference degree of a user $u$ for an item. For example, suppose a web site allows users to rate items from one star to five stars. User $u$ may think the two items $a$ and $b$ are worth more than two stars but not worth four stars, and he may prefer $a$ to $b$. Suppose the continuous preference degrees of user $u$ on $a$ and $b$ are 3.4 and 2.8, respectively. Due to the constraint of the rating system, $u$ can only rate both $a$ and $b$ as three stars, and the difference between $a$ and $b$ disappears. This also indicates that the rating degree actually represents a continuous interval, which may include the ideal preference degree. Intuitively, the rating degree $X_{ij}$ is affected by both the $i$'th user and the $j$'th item. Therefore, we define the observations relevant to $X_{ij}$ with the following set:
$$\mathcal{S}^{\mathrm{CF}}_{ij} = \{X_{i'j'} \mid (i' = i \vee j' = j) \wedge (i', j') \in (\mathbf{i}, \mathbf{j})\} \quad (5)$$
where $(\mathbf{i}, \mathbf{j})$ denotes the set of observed entries.
š’®
CF
š‘–š‘—
is actually constructed by the observed rating degrees
in the š‘–-th row and š‘—-th column of the rating matrix in CF.
Again, we calculate the radius š›æ
CF
š‘–š‘—
for each observed r ating
degree š‘‹
š‘–š‘—
according to Deļ¬nition 1 based on the standard
deviation of the ratings in š’®
CF
š‘–š‘—
:
š›æ
CF
š‘–š‘—
:= š›¼ ā‹… std(š’®
CF
š‘–š‘—
) (6)
where š›¼ āˆˆ ā„
+
is again a multiplicative scale coefļ¬cient.
Intuitively, a userā€™s ratings on different items (or the ratings
of a item from different users) vary greatly, we should assign
a big value of interval radius to this entry. Then, itā€™s easy
to calculate the bounds of interval-valued input for I-PMF
according to min-max representation (Deļ¬nition 2). A exam-
ple of interval-valued rating matrix with its corresponding
single-valued matrix in min-max representation are shown
in Table II-A
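As a companion to (5)-(6), here is a small NumPy sketch of how the interval bounds of an observed rating matrix could be derived; the function name, the boolean `observed` mask and the default alpha are assumptions made for illustration.

```python
import numpy as np

def cf_bounds(X, observed, alpha=2.5):
    """Sketch of eqs. (5)-(6): min-max bounds for each observed rating.

    X        : (n x d) rating matrix (values at unobserved entries are ignored).
    observed : boolean (n x d) mask marking the observed entries.
    alpha    : multiplicative scale coefficient.
    """
    n, d = X.shape
    X_low = X.astype(float).copy()
    X_up = X.astype(float).copy()
    for i in range(n):
        for j in range(d):
            if not observed[i, j]:
                continue
            # S^CF_ij: observed ratings in the i-th row or the j-th column
            row_vals = X[i, observed[i, :]]
            col_mask = observed[:, j].copy()
            col_mask[i] = False              # avoid counting X[i, j] twice
            s = np.concatenate([row_vals, X[col_mask, j]])
            delta = alpha * s.std()          # eq. (6)
            X_low[i, j] = X[i, j] - delta    # Definition 2
            X_up[i, j] = X[i, j] + delta
    return X_low, X_up
```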
III. MATRIX FACTORIZATION WITH APPLICATIONS
In this section we briefly discuss MF techniques and their applications. We devote special attention to the NMF and PMF techniques, since they serve as the single-valued counterparts of the proposed IMF models.
MF provides a linear approximate representation of the original data matrix $\mathbf{X} \in \mathbb{R}^{n \times d}$. Generally, we have
$$\mathbf{X} \rightarrow \mathbf{U}\mathbf{V} \quad (7)$$
where $\mathbf{U} \in \mathbb{R}^{n \times k}$ and $\mathbf{V} \in \mathbb{R}^{k \times d}$. Each data instance $\mathbf{X}_{i\cdot}$ is approximated by a linear combination of the rows of $\mathbf{V}$ with weight vector $\mathbf{U}_{i\cdot}$, the $i$'th row of $\mathbf{U}$. Thus, we call $\mathbf{U}$ the weight matrix and $\mathbf{V}$ the basis matrix. The ranks of $\mathbf{U}$ and $\mathbf{V}$ are always much lower than the rank of $\mathbf{X}$, i.e., $k \ll \min(n, d)$. After learning $\mathbf{U}$ and $\mathbf{V}$, we can reconstruct $\mathbf{X}$ as follows:
$$\hat{\mathbf{X}} \leftarrow \mathbf{U}\mathbf{V} \quad (8)$$
Various assumptions over $\mathbf{U}$ and $\mathbf{V}$ lead to different MF models, which have been widely used in data mining applications. The following two series of applications are relevant to this paper:
Parts-based representation: MF naturally represents the original data matrix $\mathbf{X}$ by parts. The rows of $\mathbf{V}$, the so-called basis vectors, are optimized for the linear approximation of $\mathbf{X}$, and $\mathbf{U}_{i\cdot}$ can be regarded as a lower-dimensional representation of $\mathbf{X}_{i\cdot}$. NMF has been successfully applied to find additive parts-based representations for face images (see Section III-A for details).
Missing data prediction: The reconstructed matrix $\hat{\mathbf{X}}$ is a full matrix. Therefore, when $\mathbf{X}$ is sparse, we can make predictions for its missing entries based on $\hat{\mathbf{X}}$. For example, PMF has been successfully applied to predict the missing entries of the rating matrices in CF (see Section III-B for details).
A. Nonnegative Matrix Factorization
NMF aims to factorize a nonnegative matrix $\mathbf{X} \in \mathbb{R}^{n \times d}_{+}$ into two nonnegative matrices $\mathbf{U} \in \mathbb{R}^{n \times k}_{+}$ and $\mathbf{V} \in \mathbb{R}^{k \times d}_{+}$ which minimize the following $L_2$ loss function:
$$\mathcal{L}_{\mathrm{NMF}} = \|\mathbf{X} - \mathbf{U}\mathbf{V}\|^2_F \quad \text{s.t. } \mathbf{U} \ge 0, \; \mathbf{V} \ge 0 \quad (9)$$
where $\|\cdot\|_F$ denotes the Frobenius norm. The estimates of $\mathbf{U}$ and $\mathbf{V}$ can be found via the multiplicative update rules proposed in [3], which iteratively update $\mathbf{U}$ and $\mathbf{V}$ as follows:
š‘ˆ
š‘–š‘—
ā† š‘ˆ
š‘–š‘—
(š‘暝‘½
š‘‡
)
š‘–š‘—
(š‘¼š‘½ š‘½
š‘‡
)
š‘–š‘—
š‘‰
š‘–š‘—
ā† š‘‰
š‘–š‘—
(š‘¼
š‘‡
š‘æ)
š‘–š‘—
(š‘¼
š‘‡
š‘¼š‘½ )
š‘–š‘—
(10)
The update rules in (10) can be derived from the Karush-Kuhn-Tucker optimality conditions [9] for the inequality constraints (see [10] for details). In [3], it is proved that the updates in (10) lead to a local minimum of (9). The non-negativity constraints on $\mathbf{U}$ and $\mathbf{V}$ allow only additive linear combinations of the basis vectors in $\mathbf{V}$, the so-called parts-based representation [4]. NMF is suited to many real-world applications such as human face analysis [4]. In human face analysis, the resulting matrix $\mathbf{U}$ constitutes an optimized representation of the original data instances. Many FA algorithms, such as face recognition and face clustering, may be directly applied on $\mathbf{U}$ instead of the original data matrix $\mathbf{X}$.
B. Probabilistic Matrix Factorization
In CF, the PMF model [5] assumes that the ratings are drawn from a Gaussian distribution:
$$p(X_{ij} \mid i, j, \mathbf{U}, \mathbf{V}, \sigma^2) = \mathcal{G}(X_{ij} \mid \mathbf{U}_{i\cdot}\mathbf{V}_{\cdot j}, \sigma^2) \quad (11)$$
For š‘¼ and š‘½ , they place zero-mean spherical Gaussian
priors
š‘(š‘¼ āˆ£šœŽ
2
1
)=
āˆ
š‘–
G(š‘¼
š‘–ā‹…
āˆ£0,šœŽ
2
1
š‘°),š‘(š‘½ āˆ£šœŽ
2
1
)=
āˆ
š‘—
G(š‘½
ā‹…š‘—
āˆ£0,šœŽ
2
1
š‘°)
(12)
The š‘¼ and š‘½ are computed via over the observed ratings
ā„’
PMF
= āˆ„š‘æ āˆ’ š‘¼š‘½ āˆ„
2
F
+ šœ†
[
āˆ„š‘¼ āˆ„
2
F
+ āˆ„š‘½ āˆ„
2
F
]
(13)
where šœ† = šœŽ
2
/šœŽ
2
1
. A local minimum of (13) can be found
via gradient decent in š‘¼
š‘–ā‹…
and š‘½
ā‹…š‘—
āˆ‚ā„’
PMF
āˆ‚š‘¼
š‘–ā‹…
=
āˆ‘
š‘—āˆˆj
š‘–
(š‘¼
š‘–ā‹…
š‘½
ā‹…š‘—
āˆ’ š‘‹
š‘–š‘—
)š‘½
š‘‡
ā‹…š‘—
+ šœ†š‘¼
š‘–ā‹…
āˆ‚ā„’
PMF
āˆ‚š‘½
ā‹…š‘—
=
āˆ‘
š‘–āˆˆi
š‘—
(š‘¼
š‘–ā‹…
š‘½
š‘‡
ā‹…š‘—
āˆ’ š‘‹
š‘–š‘—
)š‘¼
š‘‡
š‘–ā‹…
+ šœ†š‘½
ā‹…š‘—
(14)
Based on the learned $\mathbf{U}$ and $\mathbf{V}$, we can estimate the unknown ratings in $\mathbf{X}$ via
$$\hat{X}_{ij} = \mathbf{U}_{i\cdot}\mathbf{V}_{\cdot j} \quad (15)$$
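The following NumPy sketch illustrates how a local minimum of (13) might be reached; it sweeps over the observed entries and applies the per-entry terms of the gradients (14) in a stochastic fashion, so the learning rate, regularization weight and epoch count shown here are illustrative assumptions.

```python
import numpy as np

def pmf(X, observed, k, lam=0.05, lr=0.005, n_epochs=100, seed=0):
    """Sketch of PMF: stochastic gradient steps on the loss (13), gradients (14)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    U = 0.1 * rng.standard_normal((n, k))
    V = 0.1 * rng.standard_normal((k, d))
    rows, cols = np.nonzero(observed)
    for _ in range(n_epochs):
        for i, j in zip(rows, cols):
            err = U[i] @ V[:, j] - X[i, j]      # prediction error for entry (i, j)
            grad_u = err * V[:, j] + lam * U[i]
            grad_v = err * U[i] + lam * V[:, j]
            U[i] -= lr * grad_u
            V[:, j] -= lr * grad_v
    return U, V

# Unknown ratings are estimated via eq. (15): X_hat[i, j] = U[i] @ V[:, j]
```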
IV. INTERVAL-VALUED MATRIX FACTORIZATION
In this section, we introduce the IMF framework. The proposed framework is based on the min-max representation of the interval-valued matrix: $I(\mathbf{X}) = [\mathbf{X}^{\mathrm{low}}, \mathbf{X}^{\mathrm{up}}]$. We extend the original MF over $\mathbf{X}$ to a joint MF over $\mathbf{X}^{\mathrm{low}}$ and $\mathbf{X}^{\mathrm{up}}$. First, we assume each $X_{ij}$ is drawn from a uniform distribution with parameters $X^{\mathrm{low}}_{ij}$ and $X^{\mathrm{up}}_{ij}$:
$$X_{ij} \sim \mathrm{uniform}(X^{\mathrm{low}}_{ij}, X^{\mathrm{up}}_{ij}) \quad (16)$$
Based on this assumption, we have
$$\mathbb{E}(X_{ij}) = \tfrac{1}{2}(X^{\mathrm{low}}_{ij} + X^{\mathrm{up}}_{ij}) \quad (17)$$
Therefore, we propose to first estimate the bounds of $I(\mathbf{X})$ via the following joint MF:
$$\mathbf{X}^{\mathrm{low}} \rightarrow \mathbf{U}\mathbf{V}^{\mathrm{low}}, \qquad \mathbf{X}^{\mathrm{up}} \rightarrow \mathbf{U}\mathbf{V}^{\mathrm{up}} \quad (18)$$
We ļ¬x the weight matrix š‘¼ to make a unique proļ¬le for
each data instance and use š‘½
low
, š‘½
up
to maintain the data
approximation. The reconstructions of š‘æ
low
and š‘æ
up
could
be calculated as follows
Ė†
š‘æ
low
ā† š‘¼š‘½
low
,
Ė†
š‘æ
up
ā† š‘¼š‘½
up
(19)
According to (17) and (19), we can reconstruct š‘æ via
Ė†
š‘æ ā†
1
2
(
Ė†
š‘æ
low
+
Ė†
š‘æ
up
) (20)
A. Interval-valued NMF
According to (9) and (18), the $L_2$ loss function of interval-valued NMF (I-NMF for short) is
$$\mathcal{L}_{\mathrm{I\text{-}NMF}} = \|\mathbf{X}^{\mathrm{low}} - \mathbf{U}\mathbf{V}^{\mathrm{low}}\|^2_F + \|\mathbf{X}^{\mathrm{up}} - \mathbf{U}\mathbf{V}^{\mathrm{up}}\|^2_F \quad \text{s.t. } \mathbf{U} \ge 0, \; \mathbf{V}^{\mathrm{low}} \ge 0, \; \mathbf{V}^{\mathrm{up}} \ge 0 \quad (21)$$
Similar to traditional NMF, we have the following multiplicative update rules for $\mathbf{U}$, $\mathbf{V}^{\mathrm{low}}$ and $\mathbf{V}^{\mathrm{up}}$:
š‘ˆ
š‘”+1
š‘–š‘—
ā† š‘ˆ
š‘”
š‘–š‘—
[š‘æ
low
(š‘½
low
)
š‘‡
+ š‘æ
up
(š‘½
up
)
š‘‡
]
š‘–š‘—
[š‘¼š‘½
low
(š‘½
low
)
š‘‡
+ š‘¼š‘½
up
(š‘½
up
)
š‘‡
]
š‘–š‘—
š‘‰
low,š‘”+1
š‘–š‘—
ā† š‘‰
low,š‘”
š‘–š‘—
(š‘¼
š‘‡
š‘æ
low
)
š‘–š‘—
(š‘¼
š‘‡
š‘¼š‘½
low
)
š‘–š‘—
š‘‰
up,š‘”+1
š‘–š‘—
ā† š‘‰
up,š‘”
š‘–š‘—
(š‘¼
š‘‡
š‘æ
up
)
š‘–š‘—
(š‘¼
š‘‡
š‘¼š‘½
up
)
š‘–š‘—
(22)
Similar to traditional NMF, the $L_2$ loss function $\mathcal{L}_{\mathrm{I\text{-}NMF}}$ in (21) is nonincreasing under the multiplicative update rules in (22).
Traditional NMF decomposes the original data matrix into two low-rank factor matrices: one profiles the data instances while the other profiles the features. In I-NMF, the proposed joint matrix factorization framework lets the feature profile factor matrices $\mathbf{V}^{\mathrm{low}}$ and $\mathbf{V}^{\mathrm{up}}$ capture the data approximation while preserving a unique profile $\mathbf{U}_{i\cdot}$ for each data instance. We can directly apply face analysis techniques over $\mathbf{U}$.
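A minimal NumPy sketch of the I-NMF updates (22) on the joint loss (21) is shown below; as with the NMF sketch above, the random initialization, iteration count and epsilon are assumptions made for illustration.

```python
import numpy as np

def i_nmf(X_low, X_up, k, n_iter=200, eps=1e-9, seed=0):
    """Sketch of I-NMF: multiplicative updates (22) with a shared weight matrix U."""
    rng = np.random.default_rng(seed)
    n, d = X_low.shape
    U = rng.random((n, k))
    V_low = rng.random((k, d))
    V_up = rng.random((k, d))
    for _ in range(n_iter):
        num = X_low @ V_low.T + X_up @ V_up.T
        den = U @ V_low @ V_low.T + U @ V_up @ V_up.T + eps
        U *= num / den                                      # update U
        V_low *= (U.T @ X_low) / (U.T @ U @ V_low + eps)    # update V^low
        V_up *= (U.T @ X_up) / (U.T @ U @ V_up + eps)       # update V^up
    return U, V_low, V_up

# Reconstruction via (19)-(20): X_hat = 0.5 * (U @ V_low + U @ V_up)
```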
B. Interval-valued PMF
In this section we introduce interval-valued PMF (I-PMF for short). Analogously to (13), and according to (18), we have the following regularized $L_2$ loss:
$$\mathcal{L}_{\mathrm{I\text{-}PMF}} = \|\mathbf{X}^{\mathrm{low}} - \mathbf{U}\mathbf{V}^{\mathrm{low}}\|^2_F + \|\mathbf{X}^{\mathrm{up}} - \mathbf{U}\mathbf{V}^{\mathrm{up}}\|^2_F + \lambda \left( \|\mathbf{U}\|^2_F + \|\mathbf{V}^{\mathrm{low}}\|^2_F + \|\mathbf{V}^{\mathrm{up}}\|^2_F \right) \quad (23)$$
[Figure 3. Performance comparison in face analysis: F1 measure for face recognition, RE for face reconstruction, and ACC and NMI for face clustering, plotted against the number of factors $k$ (20 to 40), for Raw, NMF and I-NMF on ORL32 and ORL64.]
It is easy to derive a gradient descent in $\mathbf{U}_{i\cdot}$, $\mathbf{V}^{\mathrm{low}}_{\cdot j}$ and $\mathbf{V}^{\mathrm{up}}_{\cdot j}$ to find a local minimum of (23):
$$\frac{\partial \mathcal{L}_{\mathrm{I\text{-}PMF}}}{\partial \mathbf{U}_{i\cdot}} = \sum_{j \in \mathbf{j}_i} \left[ (\mathbf{U}_{i\cdot}\mathbf{V}^{\mathrm{low}}_{\cdot j} - X^{\mathrm{low}}_{ij})\,(\mathbf{V}^{\mathrm{low}}_{\cdot j})^T + (\mathbf{U}_{i\cdot}\mathbf{V}^{\mathrm{up}}_{\cdot j} - X^{\mathrm{up}}_{ij})\,(\mathbf{V}^{\mathrm{up}}_{\cdot j})^T \right] + \lambda \mathbf{U}_{i\cdot}$$
$$\frac{\partial \mathcal{L}_{\mathrm{I\text{-}PMF}}}{\partial \mathbf{V}^{\mathrm{low}}_{\cdot j}} = \sum_{i \in \mathbf{i}_j} (\mathbf{U}_{i\cdot}\mathbf{V}^{\mathrm{low}}_{\cdot j} - X^{\mathrm{low}}_{ij})\,\mathbf{U}^T_{i\cdot} + \lambda \mathbf{V}^{\mathrm{low}}_{\cdot j}$$
$$\frac{\partial \mathcal{L}_{\mathrm{I\text{-}PMF}}}{\partial \mathbf{V}^{\mathrm{up}}_{\cdot j}} = \sum_{i \in \mathbf{i}_j} (\mathbf{U}_{i\cdot}\mathbf{V}^{\mathrm{up}}_{\cdot j} - X^{\mathrm{up}}_{ij})\,\mathbf{U}^T_{i\cdot} + \lambda \mathbf{V}^{\mathrm{up}}_{\cdot j} \quad (24)$$
For the CF application, we can use the learned $\mathbf{U}$, $\mathbf{V}^{\mathrm{low}}$ and $\mathbf{V}^{\mathrm{up}}$ to compute the unknown ratings via (19) and (20).
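As with PMF above, a local minimum of (23) can be sketched with stochastic steps over the observed bounds, following the per-entry terms of the gradients (24); the hyper-parameters below are illustrative assumptions.

```python
import numpy as np

def i_pmf(X_low, X_up, observed, k, lam=0.05, lr=0.005, n_epochs=100, seed=0):
    """Sketch of I-PMF: stochastic gradient steps on the loss (23), gradients (24)."""
    rng = np.random.default_rng(seed)
    n, d = X_low.shape
    U = 0.1 * rng.standard_normal((n, k))
    V_low = 0.1 * rng.standard_normal((k, d))
    V_up = 0.1 * rng.standard_normal((k, d))
    rows, cols = np.nonzero(observed)
    for _ in range(n_epochs):
        for i, j in zip(rows, cols):
            e_low = U[i] @ V_low[:, j] - X_low[i, j]   # error on the lower bound
            e_up = U[i] @ V_up[:, j] - X_up[i, j]      # error on the upper bound
            U[i] -= lr * (e_low * V_low[:, j] + e_up * V_up[:, j] + lam * U[i])
            V_low[:, j] -= lr * (e_low * U[i] + lam * V_low[:, j])
            V_up[:, j] -= lr * (e_up * U[i] + lam * V_up[:, j])
    return U, V_low, V_up

# Unknown ratings via (19)-(20): X_hat = 0.5 * (U @ V_low + U @ V_up)
```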
V. EXPERIMENTAL RESULTS
We divide the experiments into two parts: In Section V-A
we conduct the comparison of I-NMF against the basic NMF
for FA applications, and in Section V-B we compare the
performance of I-PMF and PMF over CF applications.
A. Comparison of I-NMF against NMF
We compare the performance of NMF and I-NMF on
various FA applications including face recognition, face
reconstruction and face clustering.
1) Data Description and Evaluation Setting: We use the Olivetti Research Laboratory (ORL) face data sets to evaluate the NMF and I-NMF models, which contain ten different images of each of 40 distinct persons ($n = 10 \times 40 = 400$ in total). Two versions of the processed data sets (http://www.cs.uiuc.edu/homes/dengcai2/Data/FaceData.html), one with resolution 32×32 (ORL32) and the other with 64×64 (ORL64), are used for our experimental evaluation. In ORL32, each face image is represented by a vector with dimensionality $d = 32 \times 32 = 1024$, while in ORL64, $d = 64 \times 64 = 4096$.
We implement I-NMF based on the multiplicative update rules introduced in Section IV-A. The experiments for NMF are based on the DTU NMF toolbox (http://isp.imm.dtu.dk/toolbox/nmf/nmf_toolbox_ver1.4.zip). Various classifiers have been adopted for face recognition; in this paper, we apply the nearest neighbor method for its simplicity. For face clustering, we choose the popular K-means algorithm. All the classification and clustering algorithms are applied on the output weight matrices $\mathbf{U}$ from NMF and I-NMF, and we also report the performance of these algorithms over the raw data matrix $\mathbf{X}$ as a baseline. In the construction of interval-valued matrices (4), we set $r = 5$ and $\alpha = 2.5$.
We evaluate the proposed models in terms of face recognition and clustering effectiveness. Note that face recognition is actually a classification problem; to evaluate classification effectiveness, we use the standard F1 measure. We adopt two popular metrics, Normalized Mutual Information (NMI) [11] and Clustering Accuracy (ACC), for cluster evaluation. Based on NMF, the faces are reconstructed as weighted summations of the basis vectors. We use the following Reconstruction Error (RE):
$$\mathrm{RE}(\hat{\mathbf{X}}, \mathbf{X}) = \sqrt{\frac{\sum_{i=1}^{n}\sum_{j=1}^{d}(\hat{X}_{ij} - X_{ij})^2}{n \times d}}$$
to evaluate the goodness of the reconstructed matrix $\hat{\mathbf{X}}$ with respect to the original data matrix $\mathbf{X}$.
Note that larger values of F1, NMI and ACC indicate better face recognition or clustering results, while smaller values of RE indicate better face reconstruction performance.
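For clarity, the RE defined above is simply the root mean squared difference over all $n \times d$ entries; a one-line NumPy sketch (the function name is ours):

```python
import numpy as np

def reconstruction_error(X_hat, X):
    """RE as defined above: root mean squared difference over all n x d entries."""
    return np.sqrt(np.mean((X_hat - X) ** 2))
```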
2) Evaluation Results: We compare the models with varying rank of the factor matrices ($k$) and varying interval sizes.
Evaluation with varying $k$: The face clustering and face reconstruction tasks are evaluated over the entire data sets. For the face recognition task, we make ten rounds of random sampling of 50% of the data for training. In general, the performance of NMF and I-NMF on all the face analysis tasks varies with the number of latent factors ($k$). For each value of $k$, we run 100 rounds of NMF and I-NMF. The average values of the performance metrics are plotted for each model in Figure 3, where each sub-figure corresponds to a face analysis task with a specific evaluation metric and each line corresponds to a model on a specific data set. From Figure 3, we see that I-NMF outperforms NMF with statistical significance over all evaluation metrics on both data sets.
B. Comparison of I-PMF against PMF
1) Data Description and Evaluation Setting: In this part of the experiments, we also use two data sets for evaluation. The Movielens data set (http://www.grouplens.org/system/files/ml-data_0.zip) is downloaded from the web site of the GroupLens research group; we use the subset which contains 100,000 ratings for $d = 1682$ movies by $n = 943$ users of the online movie recommender service, and name this data set Movielens-100K. The Netflix data set (http://archive.ics.uci.edu/ml/datasets/Netflix+Prize) is the official data set used in the Netflix Prize competition. Again,

Citations

Journal Article (DOI)
This paper proposes matrix decomposition techniques that consider the existence of interval-valued data. It shows that naive ways to deal with such imperfect data may introduce errors in analysis, and presents factorization techniques that are especially effective when the amount of imprecise information is large. (1 citation)
It cites background or methods from "Interval-valued Matrix Factorization with Applications", for example:
• "As discussed above, interval NMF and PMF [9] also have been studied to resolve alignment approximation in face analysis and rating approximation in collaborative filtering."
• "As the chart shows, the prediction accuracy of all algorithms improves as we consider higher decomposition ranks and the proposed latent semantic alignment based approach, AIPMF, leads to better prediction performance than both PMF and I-PMF, for decomposition ranks > 60."
• "[9] extended these to interval-valued matrices as follows: ..."
• "As described in Section 6.1.2, we also compare proposed ISVD approaches with NMF and I-NMF [9] for the face analysis tasks: data reconstruction and classification."
• "For collaborative filtering with social media data, discussed in Section 6.1.3, we used PMF and I-PMF [9] as competitors."

Proceedings Article (DOI), 01 May 2019
This paper proposes a probabilistic model for analyzing the generalized interval-valued matrix, a matrix that has scalar-valued elements and bounded/unbounded interval-valued elements. A majorization-minimization algorithm is derived for parameter estimation, and it is proved that the objective function is monotonically decreasing under the parameter updates. An experiment shows that the proposed model handles interval-valued elements well and offers improved performance.

References
T. M. Cover and J. A. Thomas, Elements of Information Theory. Wiley, 1991.
D. D. Lee and H. S. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature, vol. 401, pp. 788-791, 21 Oct. 1999.
D. D. Lee and H. S. Seung, "Algorithms for non-negative matrix factorization," in Advances in Neural Information Processing Systems (NIPS), 2000.
