Proceedings Article

Dynamic Boolean Matrix Factorizations

10 Dec 2012, pp. 519-528
TL;DR: This paper proposes a method to dynamically update the Boolean matrix factorization when new data is added to the database, and extends it with a mechanism that improves the factorization at the cost of computation speed.
Abstract: Boolean matrix factorization is a method to decompose a binary matrix into two binary factor matrices. Akin to other matrix factorizations, the factor matrices can be used for various data analysis tasks. Many (if not most) real-world data sets are dynamic, though, meaning that new information is recorded over time. Incorporating this new information into the factorization can require re-computing the factorization, something we cannot do if we want to keep the factorization up to date after each update. This paper proposes a method to dynamically update the Boolean matrix factorization when new data is added to the database. The method is extended with a mechanism that improves the factorization at the cost of computation speed. The method is tested on a number of real-world and synthetic data sets, including a study of its efficiency against offline methods. The results show that, with good initialization, the proposed online and dynamic methods can beat the state-of-the-art offline Boolean matrix factorization algorithms.
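To make the terms concrete, here is a minimal Python sketch of the Boolean matrix product, the reconstruction error, and one hypothetical way to absorb a newly arrived row: the factor matrix C is held fixed, and basis rows are greedily switched on for the new row while that lowers its error. The function names and the greedy rule are assumptions for illustration, not the update procedure proposed in the paper.

```python
# Minimal BMF sketch in pure Python; binary matrices are lists of 0/1 lists.
# Function names and the greedy row update are illustrative assumptions.

def boolean_product(B, C):
    """(B o C)[i][j] = OR over l of (B[i][l] AND C[l][j])."""
    k, n_cols = len(C), len(C[0])
    return [[int(any(row[l] and C[l][j] for l in range(k))) for j in range(n_cols)]
            for row in B]


def reconstruction_error(A, B, C):
    """Count the cells where A and B o C disagree."""
    P = boolean_product(B, C)
    return sum(a != p for row_a, row_p in zip(A, P) for a, p in zip(row_a, row_p))


def add_row_greedy(new_row, C):
    """Hypothetical online step: keep C fixed and build a usage vector for a
    newly arrived row by switching on basis rows while that lowers its error."""
    usage = [0] * len(C)
    best = reconstruction_error([new_row], [usage], C)
    improved = True
    while improved:
        improved = False
        for l in range(len(C)):
            if usage[l]:
                continue
            trial = usage[:l] + [1] + usage[l + 1:]
            err = reconstruction_error([new_row], [trial], C)
            if err < best:
                usage, best, improved = trial, err, True
    return usage, best


A = [[1, 1, 0], [0, 1, 1], [1, 1, 1]]
B = [[1, 0], [0, 1], [1, 1]]
C = [[1, 1, 0], [0, 1, 1]]
print(reconstruction_error(A, B, C))  # 0
print(add_row_greedy([1, 1, 1], C))   # ([1, 1], 0)
```

The third row of A is the OR of both basis rows, so the Boolean product reconstructs it exactly, whereas an ordinary integer product would put a 2 in the middle column.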


Citations
Journal Article
TL;DR: This work reviews several greedy algorithms, discusses PaNDa+, an algorithmic framework able to optimize different cost functions generalized into a unifying formulation, and evaluates the algorithm by measuring the quality of the extracted patterns.
Abstract: A major mining task for binary matrices is the extraction of approximate top-k patterns that are able to concisely describe the input data. The top-k pattern discovery problem is commonly stated as an optimization problem, where the goal is to minimize a given cost function, e.g., the accuracy of the data description. In this work, we review several greedy algorithms and discuss PaNDa+, an algorithmic framework able to optimize different cost functions generalized into a unifying formulation. We evaluated the algorithm by measuring the quality of the extracted patterns. We adapted standard quality measures to assess the capability of the algorithm to discover both the items and the transactions of the patterns embedded in the data. The evaluation was conducted on synthetic data, where patterns were artificially embedded, and on a real-world text collection, where each document is labeled with a topic. Finally, to qualitatively evaluate the usefulness of the discovered patterns, we used PaNDa+ to detect overlapping communities in a bipartite network. The results show that PaNDa+ is able to discover high-quality patterns in both synthetic and real-world datasets.
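The optimization view of top-k pattern discovery can be illustrated with a small Python sketch. The cost below (cells where the data and the pattern cover disagree, plus a weight `gamma` times the total pattern size) and the greedy selection loop are assumptions chosen for illustration; PaNDa+'s actual unified cost formulation is defined in the article itself.

```python
# Illustrative top-k pattern mining cost; the cost form and greedy loop are
# assumptions, not PaNDa+'s exact formulation.
# A pattern is (items, transactions): column indices and row indices.

def covered_cells(patterns, n_rows, n_cols):
    """Binary matrix with 1 wherever some pattern covers the cell."""
    cover = [[0] * n_cols for _ in range(n_rows)]
    for items, transactions in patterns:
        for t in transactions:
            for i in items:
                cover[t][i] = 1
    return cover


def pattern_cost(D, patterns, gamma=1.0):
    """Noise (cells where D and the cover disagree) plus gamma * pattern size."""
    cover = covered_cells(patterns, len(D), len(D[0]))
    noise = sum(d != c for row_d, row_c in zip(D, cover) for d, c in zip(row_d, row_c))
    size = sum(len(items) + len(transactions) for items, transactions in patterns)
    return noise + gamma * size


def greedy_top_k(D, candidates, k, gamma=1.0):
    """Repeatedly add the candidate that lowers the cost most; stop after k
    patterns or when no remaining candidate improves the cost."""
    chosen, best = [], pattern_cost(D, [], gamma)
    for _ in range(k):
        options = [(pattern_cost(D, chosen + [p], gamma), idx)
                   for idx, p in enumerate(candidates) if p not in chosen]
        if not options:
            break
        cost, idx = min(options)
        if cost >= best:
            break
        chosen.append(candidates[idx])
        best = cost
    return chosen, best


D = [[1, 1, 0, 0],
     [1, 1, 0, 0],
     [0, 0, 1, 1],
     [0, 0, 1, 1]]
candidates = [((0, 1), (0, 1)), ((2, 3), (2, 3)), ((0,), (0,))]
print(greedy_top_k(D, candidates, k=3, gamma=0.5))
# ([((0, 1), (0, 1)), ((2, 3), (2, 3))], 4.0)
```

The small single-cell candidate is rejected because, once the two blocks are chosen, it adds size without removing any noise; a larger `gamma` makes the search correspondingly more reluctant to add patterns.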

76 citations

Journal Article
TL;DR: This article focuses on automatic detection of behavioral patterns from the trajectory data of an individual for activity identification as well as daily routine discovery and proposes a novel nominal matrix factorization method under a Bayesian framework with Lasso to extract highly interpretable daily routines.
Abstract: The elderly living in smart homes can have their daily movement recorded and analyzed. As different elders can have their own living habits, a methodology that can automatically identify their daily activities and discover their daily routines will be useful for better elderly care and support. In this article, we focus on automatic detection of behavioral patterns from the trajectory data of an individual for activity identification as well as daily routine discovery. The underlying challenges lie in the need to consider longer-range dependency of the sensor triggering events and spatiotemporal variations of the behavioral patterns exhibited by humans. We propose to represent the trajectory data using a behavior-aware flow graph, a probabilistic finite state automaton whose nodes and edges are attributed with local behavior-aware features. We identify the underlying subflows as the behavioral patterns using the kernel k-means algorithm. Given the identified activities, we propose a novel nominal matrix factorization method under a Bayesian framework with Lasso to extract highly interpretable daily routines. For empirical evaluation, the proposed methodology has been compared with a number of existing methods on both synthetic and publicly available real smart home datasets, with promising results. We also discuss how the proposed unsupervised methodology can be used to support exploratory behavior analysis for elderly care.
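Kernel k-means, which the abstract uses to group subflows, replaces explicit centroids with kernel evaluations: the squared feature-space distance from point i to cluster c is K_ii - (2/|c|) Σ_j K_ij + (1/|c|²) Σ_j Σ_l K_jl, with j and l ranging over the members of c. A minimal pure-Python sketch follows; the RBF kernel, the bandwidth `sigma`, and the farthest-first seeding are assumptions, not details from the article.

```python
import math

# Minimal kernel k-means sketch (pure Python). The RBF kernel, sigma and the
# farthest-first seeding are assumptions made for illustration.

def rbf_kernel(x, y, sigma=1.0):
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2 * sigma ** 2))


def kernel_kmeans(points, k, sigma=1.0, max_iter=100):
    n = len(points)
    K = [[rbf_kernel(p, q, sigma) for q in points] for p in points]

    # Seed deterministically: start from point 0, then repeatedly add the point
    # least similar (in kernel terms) to every seed chosen so far.
    seeds = [0]
    while len(seeds) < k:
        seeds.append(min(range(n), key=lambda i: max(K[i][s] for s in seeds)))
    labels = [max(range(k), key=lambda c: K[i][seeds[c]]) for i in range(n)]

    for _ in range(max_iter):
        members = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # Within-cluster term (1/|c|^2) * sum of K over all pairs in cluster c.
        within = [sum(K[j][l] for j in m for l in m) / len(m) ** 2 if m else 0.0
                  for m in members]
        new_labels = []
        for i in range(n):
            # Squared feature-space distance to each non-empty cluster.
            dists = [(K[i][i] - 2 * sum(K[i][j] for j in m) / len(m) + within[c], c)
                     for c, m in enumerate(members) if m]
            new_labels.append(min(dists)[1])
        if new_labels == labels:
            break
        labels = new_labels
    return labels


points = [(0, 0), (0, 1), (0, 2), (6, 0), (6, 1)]
print(kernel_kmeans(points, k=2))  # [0, 0, 0, 1, 1]
```

Because only kernel values are used, the same loop works for any object type with a suitable similarity function, which is what makes the method applicable to graph-structured subflows rather than plain vectors.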

13 citations


Cites background from "Dynamic Boolean Matrix Factorizatio..."

  • ...Boolean matrix factorization (Miettinen 2012) introduces the Boolean operation to deal with binary valued matrices....


01 Jan 2014
TL;DR: A number of improvements were implemented in the proposed genetic algorithm, concerning the formation of the initial population, the choice of fitness function, the chromosome encoding, and the crossover and mutation operators.
Abstract: The paper considers an approach to the genetic optimization of Virtual Local Area Network (VLAN) schemes using the developed software, the VLAN scheme design tool. The authors suggest a formal statement of the VLAN scheme optimization problem, whose solution can improve the reliability and security of corporate computer networks. The paper shows that the problem considered is related to one of the forms of Boolean Matrix Factorization. A number of improvements were implemented in the proposed genetic algorithm, concerning the formation of the initial population, the choice of fitness function, the chromosome encoding, and the crossover and mutation operators. The VLAN scheme design tool solves the problem by genetic optimization, visualizes the progress of the solution, and provides an evaluation of the genetic algorithm. Experimental results show that the proposed genetic algorithm is highly effective.

8 citations

Proceedings Article
01 Oct 2018
TL;DR: Two hierarchical frameworks are presented that can identify cortical activity which (i) exhibits “common” spatio-temporal characteristics across trials of a given task (or classes of similar tasks) and (ii) exhibits “discriminatory” spatio-temporal characteristics across trials of different (classes of) tasks.
Abstract: One of the fundamental problems in the field of neuroscience is characterizing the dynamic changes that occur in the functional interactions among brain networks. In this paper, utilizing Boolean matrix factorization (BMF), we present two hierarchical frameworks capable of identifying cortical activity that i) exhibits “common” spatio-temporal characteristics across trials of a given task (or classes of similar tasks), and ii) exhibits “discriminatory” spatio-temporal characteristics across trials of different (classes of) tasks. The frameworks are developed to analyze data obtained through electroencephalography (EEG). Both frameworks are applied to motor task EEG data, and results are presented and discussed. By probing features of the networks in the spatial and temporal domains, the proposed frameworks provide valuable tools for exploring the dynamics of brain function.

8 citations


Cites methods from "Dynamic Boolean Matrix Factorizatio..."

  • ...Different algorithms for the implementation of the factorization have been recently proposed, as the technique has been receiving increased attention for the analysis of data in a variety of applications including bioinformatics and recommendation algorithms [9]–[13]....


Book Chapter
01 Jan 2015
TL;DR: The paper formulates the virtual subnet design task, proposes genetic algorithms as a means to solve it, and shows that the problem considered is related to one of the forms of Boolean Matrix Factorization.
Abstract: One possible level of computer protection consists in splitting computer networks into logical chunks known as virtual computer networks or virtual subnets. The paper considers a novel approach to determining virtual subnets based on a given matrix of logical connectivity between computers. The paper shows that the problem considered is related to one of the forms of Boolean Matrix Factorization. It formulates the virtual subnet design task and proposes genetic algorithms as a means to solve it. The main improvements proposed in the paper are using trivial solutions to generate the initial population, including the criterion of a minimum number of virtual subnets in the fitness function, and using columns of the connectivity matrix as the genes of chromosomes. Experimental results show that the proposed genetic algorithm is highly effective.

7 citations

References
Journal Article
TL;DR: A new method for automatic indexing and retrieval to take advantage of implicit higher-order structure in the association of terms with documents (“semantic structure”) in order to improve the detection of relevant documents on the basis of terms found in queries.
Abstract: A new method for automatic indexing and retrieval is described. The approach is to take advantage of implicit higher-order structure in the association of terms with documents (“semantic structure”) in order to improve the detection of relevant documents on the basis of terms found in queries. The particular technique used is singular-value decomposition, in which a large term by document matrix is decomposed into a set of ca. 100 orthogonal factors from which the original matrix can be approximated by linear combination. Documents are represented by ca. 100 item vectors of factor weights. Queries are represented as pseudo-document vectors formed from weighted combinations of terms, and documents with supra-threshold cosine values are returned. Initial tests find this completely automatic method for retrieval to be promising.
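In the usual LSI notation (the symbols are chosen here for illustration and do not appear in the abstract), the term-by-document matrix X is approximated by a rank-k truncated SVD; a query vector q of term weights, or a newly added document column d, is folded into the factor space as a pseudo-document, and documents are ranked by cosine similarity:

```latex
X \approx X_k = U_k \Sigma_k V_k^{\top}, \qquad
\hat{q} = q^{\top} U_k \Sigma_k^{-1}, \qquad
\hat{d} = d^{\top} U_k \Sigma_k^{-1}, \qquad
\cos(\hat{q}, \hat{d}_j) = \frac{\hat{q}\, \hat{d}_j^{\top}}{\lVert \hat{q} \rVert \, \lVert \hat{d}_j \rVert}
```

Here \hat{d}_j is the j-th row of V_k; some presentations scale both sides by Σ_k before taking the cosine. Folding in new documents this way extends the factorization without recomputing the SVD, at the price of the factors U_k and Σ_k gradually going stale as more data arrives, which is the same tension a dynamic factorization has to manage.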

12,443 citations


"Dynamic Boolean Matrix Factorizatio..." refers methods in this paper

  • ...Extending a matrix factorization is a common problem in Information Retrieval (IR) when latent factor models, such as Latent Semantic Indexing [16], are used....


Journal Article
Jorma Rissanen
TL;DR: The number of digits it takes to write down an observed sequence x1,...,xN of a time series depends on the model with its parameters that one assumes to have generated the observed data.

6,254 citations


"Dynamic Boolean Matrix Factorizatio..." refers methods in this paper

  • ...The third algorithm we consider is in many ways a combination of the Asso algorithm with the idea of Panda: Asso+MDL [7], [8] uses the Minimum Description Length (MDL) principle [9] to decide the best parameters τ and k for the Asso algorithm....

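The role MDL plays in Asso+MDL, choosing the model size by total code length, can be sketched as a two-part code: bits for the factor matrices (the model) plus bits for the error matrix (the data given the model). The encoding below (for each binary matrix, the number of ones followed by an index into all matrices with that many ones) is a deliberately naive assumption; Asso+MDL [7], [8] uses its own encodings, and the threshold τ is left out here by assuming the candidate factorizations for each k have already been computed.

```python
import math

# Naive two-part MDL sketch for choosing k in BMF. The encoding is an
# assumption for illustration; Asso+MDL uses its own encodings.

def matrix_bits(M):
    """Bits for a binary matrix: its number of ones, then which cells they are."""
    cells = len(M) * len(M[0])
    ones = sum(map(sum, M))
    return math.log2(cells + 1) + math.log2(math.comb(cells, ones))


def boolean_product(B, C):
    k, n_cols = len(C), len(C[0])
    return [[int(any(row[l] and C[l][j] for l in range(k))) for j in range(n_cols)]
            for row in B]


def total_bits(A, B, C):
    """L(model) + L(data | model): code B and C, then the error A XOR (B o C)."""
    P = boolean_product(B, C)
    E = [[a ^ p for a, p in zip(row_a, row_p)] for row_a, row_p in zip(A, P)]
    return matrix_bits(B) + matrix_bits(C) + matrix_bits(E)


def select_k(A, candidates):
    """candidates maps k (>= 1) to a precomputed (B, C); return the k with fewest bits."""
    return min(candidates, key=lambda k: total_bits(A, *candidates[k]))


A = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
candidates = {
    1: ([[1], [1], [0], [0]], [[1, 1, 0, 0]]),
    2: ([[1, 0], [1, 0], [0, 1], [0, 1]], [[1, 1, 0, 0], [0, 0, 1, 1]]),
    3: ([[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0]],
        [[1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 0, 0]]),
}
print(select_k(A, candidates))  # 2
```

k = 1 leaves four uncovered ones that must be paid for in the error matrix, while k = 3 adds an unused factor that costs model bits without reducing the error, so the two-part code favors k = 2.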

Book
01 Jan 2005
Abstract: Preface 1. Events and probability 2. Discrete random variables and expectation 3. Moments and deviations 4. Chernoff bounds 5. Balls, bins and random graphs 6. The probabilistic method 7. Markov chains and random walks 8. Continuous distributions and the Poisson process 9. Entropy, randomness and information 10. The Monte Carlo method 11. Coupling of Markov chains 12. Martingales 13. Pairwise independence and universal hash functions 14. Balanced allocations References.

2,543 citations

Journal Article
01 May 2004
TL;DR: A simple dictionary with worst-case constant lookup time is presented, equaling the theoretical performance of the classic dynamic perfect hashing scheme of Dietzfelbinger et al.
Abstract: We present a simple dictionary with worst-case constant lookup time, equaling the theoretical performance of the classic dynamic perfect hashing scheme of Dietzfelbinger et al. [SIAM J. Comput. 23 (4) (1994) 738-761]. The space usage is similar to that of binary search trees. Besides being conceptually much simpler than previous dynamic dictionaries with worst-case constant lookup time, our data structure is interesting in that it does not use perfect hashing, but rather a variant of open addressing where keys can be moved back in their probe sequences. An implementation inspired by our algorithm, but using weaker hash functions, is found to be quite practical. It is competitive with the best known dictionaries having an average-case (but no nontrivial worst-case) guarantee on lookup time.
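The scheme of moving keys back in their probe sequences is what is commonly known as cuckoo hashing: every key has exactly one candidate slot in each of two tables, so a lookup inspects at most two cells, and an insertion that finds its slot occupied evicts the occupant to that key's slot in the other table. The Python sketch below assumes Python's built-in `hash`, salted with the table index, as the two hash functions, plus a fixed eviction limit before the tables are doubled; the abstract itself notes that the theory relies on stronger hash functions than a practical implementation uses.

```python
# Minimal cuckoo hashing sketch. The hash functions (built-in hash salted with
# the table index), max_kicks and the doubling policy are assumptions.

class CuckooDict:
    def __init__(self, capacity=8, max_kicks=32):
        self.capacity = capacity
        self.max_kicks = max_kicks
        self.tables = [[None] * capacity, [None] * capacity]
        self.size = 0

    def __len__(self):
        return self.size

    def _slot(self, t, key):
        return hash((t, key)) % self.capacity

    def get(self, key):
        # Worst-case constant lookup: a key can only sit in one of two cells.
        for t in (0, 1):
            entry = self.tables[t][self._slot(t, key)]
            if entry is not None and entry[0] == key:
                return entry[1]
        raise KeyError(key)

    def put(self, key, value):
        for t in (0, 1):  # overwrite in place if the key is already stored
            i = self._slot(t, key)
            entry = self.tables[t][i]
            if entry is not None and entry[0] == key:
                self.tables[t][i] = (key, value)
                return
        item, t = (key, value), 0
        for _ in range(self.max_kicks):
            i = self._slot(t, item[0])
            # Place the item and pick up whatever occupied the cell.
            item, self.tables[t][i] = self.tables[t][i], item
            if item is None:
                self.size += 1
                return
            t = 1 - t  # the evicted key moves to its slot in the other table
        # Too many evictions: grow, rehash everything, then place the homeless item.
        self._rehash(2 * self.capacity)
        self.put(*item)

    def _rehash(self, new_capacity):
        old = [e for table in self.tables for e in table if e is not None]
        self.capacity = new_capacity
        self.tables = [[None] * new_capacity, [None] * new_capacity]
        self.size = 0
        for key, value in old:
            self.put(key, value)


d = CuckooDict()
for n in range(100):
    d.put(n, n * n)
print(len(d), d.get(7))  # 100 49
```

Lookups never loop, which is where the worst-case guarantee comes from; all the variable cost is pushed into insertions, which occasionally trigger a chain of evictions or a full rebuild.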

963 citations

Journal Article
TL;DR: For many applications, a randomized algorithm is often the simplest algorithm available, the fastest, or both.

716 citations