Dynamic Boolean Matrix Factorizations

doi:10.1109/ICDM.2012.118

Proceedings Article

Pauli Miettinen¹

Max Planck Society¹

10 Dec 2012, pp. 519-528

TL;DR: This paper proposes a method to dynamically update a Boolean matrix factorization when new data is added to the database, and extends it with a mechanism that improves the factorization at the cost of computation speed.

Abstract: Boolean matrix factorization is a method to decompose a binary matrix into two binary factor matrices. Akin to other matrix factorizations, the factor matrices can be used for various data analysis tasks. Many (if not most) real-world data sets are dynamic, though, meaning that new information is recorded over time. Incorporating this new information into the factorization can require a re-computation of the factorization, something we cannot do if we want to keep our factorization up-to-date after each update. This paper proposes a method to dynamically update the Boolean matrix factorization when new data is added to the database. This method is extended with a mechanism to improve the factorization with a trade-off in speed of computation. The method is tested with a number of real-world and synthetic data sets, including a study of its efficiency against offline methods. The results show that with good initialization the proposed online and dynamic methods can beat the state-of-the-art offline Boolean matrix factorization algorithms.
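
To make the terms in the abstract concrete, the following is a minimal sketch of a Boolean matrix product, the reconstruction error of a factorization, and a naive way to fit one newly arrived row against a fixed factor matrix. The function names, the toy matrices, and the greedy update are assumptions made for illustration; they are not the algorithm proposed in the paper.

```python
import numpy as np

def boolean_product(B, C):
    """Boolean product: entry (i, j) is 1 iff some l has B[i, l] = C[l, j] = 1."""
    return (B.astype(int) @ C.astype(int) > 0).astype(np.uint8)

def reconstruction_error(A, B, C):
    """Number of cells where A and the Boolean product of B and C disagree."""
    return int(np.sum(A != boolean_product(B, C)))

def fit_new_row(a, C):
    """Greedily choose rows of C to OR together so that they approximate row a.

    Hypothetical helper for illustration only; not the paper's update rule.
    """
    k = C.shape[0]
    b = np.zeros(k, dtype=np.uint8)
    covered = np.zeros_like(a)
    while True:
        base = int(np.sum(a != covered))
        best_gain, best_l = 0, -1
        for l in range(k):
            if b[l]:
                continue
            gain = base - int(np.sum(a != (covered | C[l])))
            if gain > best_gain:
                best_gain, best_l = gain, l
        if best_l < 0:
            return b  # no remaining factor reduces the error
        b[best_l] = 1
        covered = covered | C[best_l]

# Toy data (assumed): a 3x3 binary matrix with an exact rank-2 Boolean factorization.
A = np.array([[1, 1, 0], [1, 1, 1], [0, 1, 1]], dtype=np.uint8)
B = np.array([[1, 0], [1, 1], [0, 1]], dtype=np.uint8)
C = np.array([[1, 1, 0], [0, 1, 1]], dtype=np.uint8)

print(reconstruction_error(A, B, C))                           # 0
print(fit_new_row(np.array([1, 1, 0], dtype=np.uint8), C))     # [1 0]
```

In this sketch the existing factor matrix C is left untouched when a row arrives, which keeps the update cheap but cannot repair factors that no longer fit the data; the abstract's trade-off between factorization quality and speed of computation concerns exactly that kind of choice.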

Citations

Journal Article

A Unifying Framework for Mining Approximate Top-k Binary Patterns

Claudio Lucchese¹, Salvatore Orlando², Raffaele Perego¹

Istituto di Scienza e Tecnologie dell'Informazione¹, Ca' Foscari University of Venice²

01 Dec 2014, IEEE Transactions on Knowledge and Data Engineering

TL;DR: This work reviews several greedy algorithms and discusses PaNDa+, an algorithmic framework able to optimize different cost functions generalized into a unifying formulation, and evaluates the algorithm by measuring the quality of the extracted patterns.

Abstract: A major mining task for binary matrices is the extraction of approximate top-k patterns that are able to concisely describe the input data. The top-k pattern discovery problem is commonly stated as an optimization one, where the goal is to minimize a given cost function, e.g., the accuracy of the data description. In this work, we review several greedy algorithms, and discuss PaNDa+, an algorithmic framework able to optimize different cost functions generalized into a unifying formulation. We evaluated the goodness of the algorithm by measuring the quality of the extracted patterns. We adapted standard quality measures to assess the capability of the algorithm to discover both the items and transactions of the patterns embedded in the data. The evaluation was conducted on synthetic data, where patterns were artificially embedded, and on a real-world text collection, where each document is labeled with a topic. Finally, in order to qualitatively evaluate the usefulness of the discovered patterns, we exploited PaNDa+ to detect overlapping communities in a bipartite network. The results show that PaNDa+ is able to discover high-quality patterns in both synthetic and real-world datasets.

76 citations

Journal Article

Automatic Extraction of Behavioral Patterns for Elderly Mobility and Daily Routine Analysis

Chen Li¹, William W. L. Cheung¹, Jiming Liu¹, Joseph Kee-Yin Ng¹

Hong Kong Baptist University¹

01 Jun 2018, ACM Transactions on Intelligent Systems and Technology

TL;DR: This article focuses on automatic detection of behavioral patterns from the trajectory data of an individual for activity identification as well as daily routine discovery and proposes a novel nominal matrix factorization method under a Bayesian framework with Lasso to extract highly interpretable daily routines.

Abstract: The elderly living in smart homes can have their daily movement recorded and analyzed. As different elders can have their own living habits, a methodology that can automatically identify their daily activities and discover their daily routines will be useful for better elderly care and support. In this article, we focus on automatic detection of behavioral patterns from the trajectory data of an individual for activity identification as well as daily routine discovery. The underlying challenges lie in the need to consider longer-range dependency of the sensor triggering events and spatiotemporal variations of the behavioral patterns exhibited by humans. We propose to represent the trajectory data using a behavior-aware flow graph that is a probabilistic finite state automaton with its nodes and edges attributed with some local behavior-aware features. We identify the underlying subflows as the behavioral patterns using the kernel k-means algorithm. Given the identified activities, we propose a novel nominal matrix factorization method under a Bayesian framework with Lasso to extract highly interpretable daily routines. For empirical evaluation, the proposed methodology has been compared with a number of existing methods based on both synthetic and publicly available real smart home datasets with promising results obtained. We also discuss how the proposed unsupervised methodology can be used to support exploratory behavior analysis for elderly care.

13 citations

Cites background from "Dynamic Boolean Matrix Factorizatio..."

...Boolean matrix factorization (Miettinen 2012) introduces the Boolean operation to deal with binary valued matrices....

Design of Virtual Local Area Network Scheme Based on Genetic Optimization and Visual Analysis

Igor Saenko, Igor Kotenko

01 Jan 2014

TL;DR: A number of improvements were implemented in the proposed genetic algorithm, concerning the formation of initial population, kind of fitness function, coding chromosomes, and operation of crossing and mutation.

Abstract: The paper considers an approach to genetic optimization of a Virtual Local Area Network (VLAN) scheme using the developed software, a VLAN scheme design tool. The authors give a formal statement of the VLAN scheme optimization problem, whose solution can improve the reliability and security of corporate computer networks. The paper shows that the problem considered is related to one of the forms of Boolean Matrix Factorization. A number of improvements were implemented in the proposed genetic algorithm, concerning the formation of the initial population, the kind of fitness function, the coding of chromosomes, and the crossover and mutation operations. The VLAN scheme design tool makes it possible to solve the problem by genetic optimization, visualizes the progress of the solution, and provides an estimate of the genetic algorithm's performance. Experimental results show the proposed genetic algorithm has high effectiveness.

8 citations

Proceedings Article

Identifying Dynamics of Brain Function Via Boolean Matrix Factorization

Ali Haddad¹, Foroogh Shamsi¹, Li Zhu¹, Laleh Najafizadeh¹

Rutgers University¹

01 Oct 2018

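TL;DR: Two hierarchical frameworks are presented that are capable of identifying cortical activity that i) exhibits “common” spatio-temporal characteristics across trials of a given task (or classes of similar tasks), and ii) exhibits “discriminatory” spatio-temporal characteristics across trials of different (classes of) tasks.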

Abstract: One of the fundamental problems in the field of neuroscience is characterizing the dynamic changes that occur in the functional interactions among brain networks. In this paper, utilizing Boolean matrix factorization (BMF), we present two hierarchical frameworks capable of identifying cortical activity that i) exhibits “common” spatio-temporal characteristics across trials of a given task (or classes of similar tasks), and ii) exhibits “discriminatory” spatio-temporal characteristics across trials of different (classes of) tasks. The frameworks are developed to analyze data obtained through electroencephalography (EEG). Both frameworks are applied to motor task EEG data, and results are presented and discussed. By probing features of the networks in the spatial and temporal domains, the proposed frameworks provide valuable tools for exploring the dynamics of brain function.

8 citations

Cites methods from "Dynamic Boolean Matrix Factorizatio..."

...Different algorithms for the implementation of the factorization have been recently proposed, as the technique has been receiving increased attention for the analysis of data in a variety of applications including bioinformatics and recommendation algorithms [9]–[13]....

Book Chapter

A Genetic Approach for Virtual Computer Network Design

Igor Saenko¹, Igor Kotenko¹

Russian Academy of Sciences¹

01 Jan 2015

TL;DR: The paper formulates the virtual subnet design task and proposes genetic algorithms as a means to solve it and shows that the problem considered is related to one of the forms of Boolean Matrix Factorization.

Abstract: One possible level of computer protection consists in splitting computer networks into logical chunks that are known as virtual computer networks or virtual subnets. The paper considers a novel approach to determining virtual subnets that is based on a given matrix of logical connectivity between computers. The paper shows that the problem considered is related to one of the forms of Boolean Matrix Factorization. It formulates the virtual subnet design task and proposes genetic algorithms as a means to solve it. The basic improvements proposed in the paper are using trivial solutions to generate an initial population, taking the criterion of a minimum number of virtual subnets into account in the fitness function, and using columns of the connectivity matrix as genes of chromosomes. Experimental results show the proposed genetic algorithm has high effectiveness.

7 citations

References

Journal Article

Indexing by Latent Semantic Analysis

Scott Deerwester¹, Susan T. Dumais², George W. Furnas², Thomas K. Landauer², Richard A. Harshman³

University of Chicago¹, Telcordia Technologies², University of Western Ontario³

01 Sep 1990, Journal of the Association for Information Science and Technology

TL;DR: A new method for automatic indexing and retrieval to take advantage of implicit higher-order structure in the association of terms with documents (“semantic structure”) in order to improve the detection of relevant documents on the basis of terms found in queries.

Abstract: A new method for automatic indexing and retrieval is described. The approach is to take advantage of implicit higher-order structure in the association of terms with documents (“semantic structure”) in order to improve the detection of relevant documents on the basis of terms found in queries. The particular technique used is singular-value decomposition, in which a large term by document matrix is decomposed into a set of ca. 100 orthogonal factors from which the original matrix can be approximated by linear combination. Documents are represented by ca. 100 item vectors of factor weights. Queries are represented as pseudo-document vectors formed from weighted combinations of terms, and documents with supra-threshold cosine values are returned. Initial tests find this completely automatic method for retrieval to be promising.

12,443 citations

"Dynamic Boolean Matrix Factorizatio..." refers methods in this paper

...Extending a matrix factorization is a common problem in Information Retrieval (IR) when latent factor models, such as Latent Semantic Indexing [16], are used....

Journal Article

Paper: Modeling by shortest data description

Jorma Rissanen¹

IBM¹

01 Sep 1978, Automatica

TL;DR: The number of digits it takes to write down an observed sequence x1,...,xN of a time series depends on the model with its parameters that one assumes to have generated the observed data.

6,254 citations

"Dynamic Boolean Matrix Factorizatio..." refers methods in this paper

...The third algorithm we consider is in many ways a combination of the Asso algorithm with the idea of Panda: Asso+MDL [7], [8] uses the Minimum Description Length (MDL) principle [9] to decide the best parameters τ and k for the Asso algorithm....

Book

Probability and Computing: Randomized Algorithms and Probabilistic Analysis

Michael Mitzenmacher¹, Eli Upfal²

Harvard University¹, Brown University²

01 Jan 2005

Abstract: Preface 1. Events and probability 2. Discrete random variables and expectation 3. Moments and deviations 4. Chernoff bounds 5. Balls, bins and random graphs 6. The probabilistic method 7. Markov chains and random walks 8. Continuous distributions and the Poisson process 9. Entropy, randomness and information 10. The Monte Carlo method 11. Coupling of Markov chains 12. Martingales 13. Pairwise independence and universal hash functions 14. Balanced allocations References.

2,543 citations

Journal Article

Cuckoo hashing

Rasmus Pagh¹, Flemming Friche Rodler²

IT University of Copenhagen¹, Aalborg University²

01 May 2004

TL;DR: In this paper, a simple dictionary with worst case constant lookup time was presented, equaling the theoretical performance of the classic dynamic perfect hashing scheme of Dietzfelbinger et al.

Abstract: We present a simple dictionary with worst case constant lookup time, equaling the theoretical performance of the classic dynamic perfect hashing scheme of Dietzfelbinger et al. [SIAM J. Comput. 23 (4) (1994) 738-761]. The space usage is similar to that of binary search trees. Besides being conceptually much simpler than previous dynamic dictionaries with worst case constant lookup time, our data structure is interesting in that it does not use perfect hashing, but rather a variant of open addressing where keys can be moved back in their probe sequences. An implementation inspired by our algorithm, but using weaker hash functions, is found to be quite practical. It is competitive with the best known dictionaries having an average case (but no nontrivial worst case) guarantee on lookup time.

963 citations

Journal Article

Probability and computing: randomized algorithms and probabilistic analysis.

Harald Niederreiter

01 Jan 2006, Mathematics of Computation

TL;DR: For many applications, a randomized algorithm is often the simplest algorithm available, the fastest, or both.

716 citations