TL;DR: The Interval-valued Matrix Factorization (IMF) framework is proposed, and the proposed I-NMF and I-PMF are shown to significantly outperform their single-valued counterparts in FA and CF applications.
Abstract: In this paper, we propose the Interval-valued Matrix Factorization (IMF) framework. Matrix Factorization (MF) is a fundamental building block of data mining. MF techniques, such as Nonnegative Matrix Factorization (NMF) and Probabilistic Matrix Factorization (PMF), are widely used in data mining applications. For example, NMF has shown its advantage in Face Analysis (FA), while PMF has been successfully applied to Collaborative Filtering (CF). We analyze the data approximation in FA and CF applications and construct interval-valued matrices to capture these approximation phenomena. We adapt the basic NMF and PMF models to interval-valued matrices and propose Interval-valued NMF (I-NMF) as well as Interval-valued PMF (I-PMF). We conduct extensive experiments to show that the proposed I-NMF and I-PMF significantly outperform their single-valued counterparts in FA and CF applications.
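To make the setup concrete, an interval-valued matrix can be represented as a pair of bound matrices, while the single-valued baseline the paper compares against simply factors one scalar matrix (e.g., the interval midpoints) with plain NMF. The sketch below is our illustration of that baseline, not the paper's I-NMF algorithm; the per-entry uncertainty `delta` is a hypothetical value chosen for the example.

```python
import numpy as np

def nmf(V, k, iters=200, eps=1e-9):
    """Basic NMF via Lee-Seung multiplicative updates: V ~= W @ H."""
    rng = np.random.default_rng(0)
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update coefficients
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update basis
    return W, H

# An interval-valued matrix stored as a (lower, upper) pair of bound
# matrices, built here from a toy rating matrix R with a hypothetical
# per-entry uncertainty delta (an assumption for illustration).
R = np.array([[5.0, 3.0], [4.0, 1.0]])
delta = 0.5
R_lo, R_hi = R - delta, R + delta

# Single-valued baseline: ignore the widths and factor the midpoints.
W, H = nmf((R_lo + R_hi) / 2, k=2)
```

An interval-aware method such as I-NMF would instead use both bounds, rather than collapsing them to midpoints before factorization.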
TL;DR: This paper proposes matrix decomposition techniques that account for interval-valued data, shows that naive ways of handling such imperfect data may introduce errors in analysis, and presents factorization techniques that are especially effective when the amount of imprecise information is large.
Abstract: With many applications relying on multi-dimensional datasets for decision making, matrix factorization (or decomposition) is becoming the basis for many knowledge discoveries and machine learning tasks, from clustering, trend detection, anomaly detection, to correlation analysis. Unfortunately, a major shortcoming of matrix analysis operations is that, despite their effectiveness when the data is scalar, these operations become difficult to apply in the presence of non-scalar data, as they are not designed for data that include non-scalar observations, such as intervals. Yet, in many applications, the available data are inherently non-scalar for various reasons, including imprecision in data collection, conflicts in aggregated data, data summarization, or privacy issues, where one is provided with a reduced, clustered, or intentionally noisy and obfuscated version of the data to hide information. In this paper, we propose matrix decomposition techniques that consider the existence of interval-valued data. We show that naive ways to deal with such imperfect data may introduce errors in analysis and present factorization techniques that are especially effective when the amount of imprecise information is large.
4 citations
Cites background or methods from "Interval-valued Matrix Factorizatio..."
...As discussed above, interval NMF and PMF [9] also have been studied to resolve alignment approximation in face analysis and rating approximation in collaborative filtering....
[...]
...As the chart shows, the prediction accuracy of all algorithms improves as we consider higher decomposition ranks and the proposed latent semantic alignment based approach, AIPMF, leads to better prediction performance than both PMF and I-PMF, for decomposition ranks > 60....
[...]
...[9] extended these to interval-valued matrices as follows:...
[...]
...As described in Section 6.1.2, we also compare proposed ISVD approaches with NMF and I-NMF [9] for the face analysis tasks: data reconstruction and classification....
[...]
...For collaborative filtering with social media data, discussed in Section 6.1.3, we used PMF and I-PMF [9] as competitors....
TL;DR: In this article, the Tensor-Train technique for tensor decomposition is extended to handle uncertain data, modeled as intervals.
Abstract: In many fields of computer science, tensor decomposition techniques are increasingly being adopted as the core of many applications that rely on multi-dimensional datasets for implementing knowledge discovery tasks. Unfortunately, a major shortcoming of state-of-the-art tensor analyses is that, despite their effectiveness when the data is certain, these operations become difficult to apply, or altogether inapplicable, in the presence of uncertainty in the data, a circumstance common to many real-world scenarios. In this paper we propose a way to address this issue by extending the known Tensor-Train technique for tensor factorization in order to deal with uncertain data, here modeled as intervals. Working with interval-valued data, however, presents numerous challenges, since many algebraic operations that form the building blocks of the factorization process, as well as the properties that make these procedures useful for knowledge discovery, cannot be easily extended from their scalar counterparts, and often require some approximation (including, though it is not only the case, for keeping computational costs manageable). These challenges notwithstanding, our proposed techniques proved to be reasonably effective, and are supported by a thorough experimental validation.
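One of the algebraic building blocks the abstract alludes to is multiplying interval-valued matrices. The sketch below is textbook interval arithmetic (for scalar intervals, [a,b]*[c,d] has bounds given by the min/max of the four endpoint products), not the paper's specific Tensor-Train algorithm:

```python
import numpy as np

def interval_matmul(A_lo, A_hi, B_lo, B_hi):
    """Interval matrix product: each entry sums interval products of the
    corresponding row/column, taking min/max over endpoint products."""
    n, k = A_lo.shape
    _, m = B_lo.shape
    C_lo = np.zeros((n, m))
    C_hi = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            lo = hi = 0.0
            for t in range(k):
                prods = (A_lo[i, t] * B_lo[t, j], A_lo[i, t] * B_hi[t, j],
                         A_hi[i, t] * B_lo[t, j], A_hi[i, t] * B_hi[t, j])
                lo += min(prods)   # tightest lower bound for this term
                hi += max(prods)   # tightest upper bound for this term
            C_lo[i, j] = lo
            C_hi[i, j] = hi
    return C_lo, C_hi
```

Degenerate intervals (lower bound equal to upper bound) reduce this to the ordinary matrix product, and widening any input interval can only widen the output, which is the inclusion-monotonicity property such factorizations rely on.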
TL;DR: A probabilistic model is proposed for analyzing the generalized interval-valued matrix, a matrix with both scalar-valued elements and bounded/unbounded interval-valued elements, and the objective function is proved to be monotonically decreasing under the parameter updates.
Abstract: In this paper, we propose a probabilistic model for analyzing the generalized interval-valued matrix, a matrix that has scalar-valued elements and bounded/unbounded interval-valued elements. We derive a majorization-minimization algorithm for parameter estimation and prove that the objective function is monotonically decreasing under the parameter update. An experiment shows that the proposed model handles interval-valued elements well and offers improved performance.
TL;DR: Privacy-Preserving Data Mining: Models and Algorithms proposes a number of techniques to perform the data mining tasks in a privacy-preserving way and is designed for researchers, professors, and advanced-level students in computer science.
Abstract: Advances in hardware technology have increased the capability to store and record personal data about consumers and individuals, causing concerns that personal data may be used for a variety of intrusive or malicious purposes. Privacy-Preserving Data Mining: Models and Algorithms proposes a number of techniques to perform the data mining tasks in a privacy-preserving way. These techniques generally fall into the following categories: data modification techniques, cryptographic methods and protocols for data sharing, statistical techniques for disclosure and inference control, query auditing methods, randomization and perturbation-based techniques. This edited volume contains surveys by distinguished researchers in the privacy field. Each survey includes the key research content as well as future research directions. Privacy-Preserving Data Mining: Models and Algorithms is designed for researchers, professors, and advanced-level students in computer science, and is also suitable for industry practitioners.
575 citations
"Interval-valued Matrix Factorizatio..." refers background in this paper
...Keywords: Matrix factorization, uncertainty. I. INTRODUCTION: Exploring data approximation has attracted much attention in uncertain data mining [1] and privacy preserving data mining [2]....
TL;DR: This paper provides a survey of uncertain data mining and management applications, and discusses different methodologies to process and mine uncertain data in a variety of forms.
Abstract: In recent years, a number of indirect data collection methodologies have lead to the proliferation of uncertain data. Such data points are often represented in the form of a probabilistic function, since the corresponding deterministic value is not known. This increases the challenge of mining and managing uncertain data, since the precise behavior of the underlying data is no longer known. In this paper, we provide a survey of uncertain data mining and management applications. In the field of uncertain data management, we will examine traditional methods such as join processing, query processing, selectivity estimation, OLAP queries, and indexing. In the field of uncertain data mining, we will examine traditional mining problems such as classification and clustering. We will also examine a general transform based technique for mining uncertain data. We discuss the models for uncertain data, and how they can be leveraged in a variety of applications. We discuss different methodologies to process and mine uncertain data in a variety of forms.
497 citations
"Interval-valued Matrix Factorizatio..." refers background in this paper
...Keywords: Matrix factorization, uncertainty. I. INTRODUCTION: Exploring data approximation has attracted much attention in uncertain data mining [1] and privacy preserving data mining [2]....
TL;DR: This paper construct an affinity graph to encode the geometrical information and seek a matrix factorization which respects the graph structure and demonstrates the success of this novel algorithm by applying it on real world problems.
Abstract: Recently non-negative matrix factorization (NMF) has received a lot of attentions in information retrieval, computer vision and pattern recognition. NMF aims to find two non-negative matrices whose product can well approximate the original matrix. The sizes of these two matrices are usually smaller than the original matrix. This results in a compressed version of the original data matrix. The solution of NMF yields a natural parts-based representation for the data. When NMF is applied for data representation, a major disadvantage is that it fails to consider the geometric structure in the data. In this paper, we develop a graph based approach for parts-based data representation in order to overcome this limitation. We construct an affinity graph to encode the geometrical information and seek a matrix factorization which respects the graph structure. We demonstrate the success of this novel algorithm by applying it on real world problems.
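The graph-regularized variant described above augments the NMF objective with a Laplacian penalty, min ||X - U V^T||_F^2 + lam * tr(V^T L V) with L = D - Wg over an affinity graph Wg on the data points. The sketch below follows the multiplicative updates reported for graph-regularized NMF, as a minimal illustration rather than a faithful reimplementation of the cited algorithm:

```python
import numpy as np

def gnmf(X, Wg, k, lam=1.0, iters=100, eps=1e-9):
    """Graph-regularized NMF sketch: X (features x samples) ~= U @ V.T,
    with Laplacian smoothing of V over the sample affinity graph Wg."""
    rng = np.random.default_rng(1)
    n, m = X.shape
    U = rng.random((n, k)) + eps
    V = rng.random((m, k)) + eps
    D = np.diag(Wg.sum(axis=1))       # degree matrix of the graph
    for _ in range(iters):
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        # graph term: Wg pulls neighbors together, D appears in the denominator
        V *= (X.T @ U + lam * Wg @ V) / (V @ (U.T @ U) + lam * D @ V + eps)
    return U, V

def objective(X, U, V, Wg, lam=1.0):
    """Reconstruction error plus Laplacian smoothness penalty."""
    L = np.diag(Wg.sum(axis=1)) - Wg
    return np.linalg.norm(X - U @ V.T) ** 2 + lam * np.trace(V.T @ L @ V)
```

Because the updates are multiplicative, nonnegativity of U and V is preserved, and the regularized objective is non-increasing across iterations.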
TL;DR: A new approach to fitting a linear regression model to symbolic interval-valued data is introduced, evaluated via the average behaviour of the root mean square error and the squared correlation coefficient in a Monte Carlo experiment.
223 citations
"Interval-valued Matrix Factorizatio..." refers methods in this paper
...1) Data Description and Evaluation Setting: In this part of experiments, we also use two data sets for evaluation....
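A common strategy in this interval-regression literature is the center-and-range idea: fit one least-squares model to the interval midpoints and another to the half-widths, then recombine the two predictions into an interval. The sketch below is our illustration of that general idea, not necessarily the exact method of the cited paper:

```python
import numpy as np

def fit_interval_regression(X_lo, X_hi, y_lo, y_hi):
    """Center-and-range sketch for a single interval-valued feature:
    OLS on interval centers plus OLS on interval half-widths."""
    Xc, Xr = (X_lo + X_hi) / 2, (X_hi - X_lo) / 2
    yc, yr = (y_lo + y_hi) / 2, (y_hi - y_lo) / 2
    Ac = np.c_[np.ones(len(Xc)), Xc]              # design matrix with intercept
    bc, *_ = np.linalg.lstsq(Ac, yc, rcond=None)  # center model
    Ar = np.c_[np.ones(len(Xr)), Xr]
    br, *_ = np.linalg.lstsq(Ar, yr, rcond=None)  # range (half-width) model
    return bc, br

def predict(bc, br, x_lo, x_hi):
    """Predict an interval [c - r, c + r] from a new interval input."""
    xc, xr = (x_lo + x_hi) / 2, (x_hi - x_lo) / 2
    c = bc[0] + bc[1] * xc
    r = abs(br[0] + br[1] * xr)   # keep the predicted half-width nonnegative
    return c - r, c + r
```

Fitting centers and widths separately keeps each sub-problem an ordinary least-squares fit while still producing interval-valued predictions.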
Q1. What have the authors contributed in "Interval-valued matrix factorization with applications" ?
In this paper, the authors propose the Interval-valued Matrix Factorization (IMF) framework. They analyze the data approximation in FA as well as CF applications and construct interval-valued matrices to capture these approximation phenomena. They adapt the basic NMF and PMF models to interval-valued matrices and propose Interval-valued NMF (I-NMF) as well as Interval-valued PMF (I-PMF). They conduct extensive experiments to show that the proposed I-NMF and I-PMF significantly outperform their single-valued counterparts in FA and CF applications.
Q2. What is the way to evaluate the IMF framework?
The evaluations over multiple real-life data sets with different experimental settings show that I-NMF and I-PMF, which take these interval-valued matrices as input, significantly outperform their corresponding single-valued counterparts.
Q3. How do the authors propose the IMF framework?
In this paper, the authors propose the IMF framework, which injects data approximation into traditional MF by taking interval-valued matrices as input.