Showing papers on "Logical matrix" published in 2015


Proceedings Article
06 Jul 2015
TL;DR: This paper considers the matrix completion problem when the observations are one-bit measurements of some underlying matrix M, and in particular the observed samples consist only of ones and no zeros, and proposes a "shifted matrix completion" method that recovers M using only a subset of indices corresponding to ones.
Abstract: In this paper, we consider the matrix completion problem when the observations are one-bit measurements of some underlying matrix M, and in particular the observed samples consist only of ones and no zeros. This problem is motivated by modern applications such as recommender systems and social networks where only "likes" or "friendships" are observed. The problem is an instance of PU (positive-unlabeled) learning, i.e., learning from only positive and unlabeled examples, which has been studied in the context of binary classification. Under the assumption that M has bounded nuclear norm, we provide recovery guarantees for two different observation models: 1) M parameterizes a distribution that generates a binary matrix, 2) M is thresholded to obtain a binary matrix. For the first case, we propose a "shifted matrix completion" method that recovers M using only a subset of indices corresponding to ones; for the second case, we propose a "biased matrix completion" method that recovers the (thresholded) binary matrix. Both methods yield strong error bounds: if $M \in \mathbb{R}^{n \times n}$, the error is bounded as $O\left(\frac{1}{(1-\rho)n}\right)$, where $1-\rho$ denotes the fraction of ones observed. This implies a sample complexity of $O(n \log n)$ ones to achieve a small error, when M is dense and n is large. We extend our analysis to the inductive matrix completion problem, where rows and columns of M have associated features. We develop efficient and scalable optimization procedures for both the proposed methods and demonstrate their effectiveness for link prediction (on real-world networks consisting of over 2 million nodes and 90 million links) and semi-supervised clustering tasks.

113 citations


Journal ArticleDOI
TL;DR: The results emphasize the significance of factorizations that provide from-below approximations of the input matrix that help measure the possibly different significance of different matrix entries and suggest where to focus when computing factors.

69 citations


Journal ArticleDOI
TL;DR: In this article, the authors prove that the low-rank matrix approximation problem with respect to the component-wise $\ell_1$-norm ($\ell_1$-LRA) is NP-hard, already in the rank-one case, using a reduction from MAX CUT.
Abstract: The low-rank matrix approximation problem with respect to the component-wise $\ell_1$-norm ($\ell_1$-LRA), which is closely related to robust principal component analysis (PCA), has become a very popular tool in data mining and machine learning. Robust PCA aims at recovering a low-rank matrix that was perturbed with sparse noise, with applications for example in foreground-background video separation. Although $\ell_1$-LRA is strongly believed to be NP-hard, there is, to the best of our knowledge, no formal proof of this fact. In this paper, we prove that $\ell_1$-LRA is NP-hard, already in the rank-one case, using a reduction from MAX CUT. Our derivations draw interesting connections between $\ell_1$-LRA and several other well-known problems, namely, robust PCA, $\ell_0$-LRA, binary matrix factorization, a particular densest bipartite subgraph problem, the computation of the cut norm of $\{-1,+1\}$ matrices, and the discrete basis problem, which we all prove to be NP-hard.

66 citations


Posted Content
TL;DR: A gamma process dynamic Poisson factor analysis model is proposed to factorize a dynamic count matrix, whose columns are sequentially observed count vectors, and applies to text and music analysis, with state-of-the-art results.
Abstract: A gamma process dynamic Poisson factor analysis model is proposed to factorize a dynamic count matrix, whose columns are sequentially observed count vectors. The model builds a novel Markov chain that sends the latent gamma random variables at time $(t-1)$ as the shape parameters of those at time $t$, which are linked to observed or latent counts under the Poisson likelihood. The significant challenge of inferring the gamma shape parameters is fully addressed, using unique data augmentation and marginalization techniques for the negative binomial distribution. The same nonparametric Bayesian model also applies to the factorization of a dynamic binary matrix, via a Bernoulli-Poisson link that connects a binary observation to a latent count, with closed-form conditional posteriors for the latent counts and efficient computation for sparse observations. We apply the model to text and music analysis, with state-of-the-art results.

48 citations


Journal ArticleDOI
TL;DR: A novel method for alleviating the cold start problem for new users and new items by incorporating content-based information about users and items, i.e., tags and keywords; it not only outperforms other state-of-the-art CF algorithms for historical data, but also has good scalability for new data.
Abstract: The cold start problem for new users and new items is a major challenge facing most collaborative filtering systems. Existing collaborative filtering (CF) methods emphasize scaling well to large and sparse datasets but lack a scalable approach to dealing with new data. In this paper, we consider a novel method for alleviating the problem by incorporating content-based information about users and items, i.e., tags and keywords. The user-item ratings imply the relevance of users' tags to items' keywords, so we convert the direct prediction on the user-item rating matrix into the indirect prediction on the tag-keyword relation matrix, which adapts to the emergence of new data. We first propose a novel neighborhood approach for building the tag-keyword relation matrix based on the statistics of tag-keyword pairs in the ratings. Then, with the relation matrix, we propose a 3-factor matrix factorization model over the rating matrix, for learning every user's interest vector for selected tags and every item's correlation vector for extracted keywords. Finally, we integrate the relation matrix with the two kinds of vectors to make recommendations. Experiments on a real dataset demonstrate that our method not only outperforms other state-of-the-art CF algorithms for historical data, but also has good scalability for new data.

45 citations


Book ChapterDOI
Huacheng Yu1
06 Jul 2015
TL;DR: This work presents a new combinatorial algorithm for triangle finding and Boolean matrix multiplication that runs in \(\hat{O}(n^3/\log ^4 n)\) time, where the \(\hat {O}\) notation suppresses poly(loglog) factors.
Abstract: We present a new combinatorial algorithm for triangle finding and Boolean matrix multiplication that runs in \(\hat{O}(n^3/\log ^4 n)\) time, where the \(\hat{O}\) notation suppresses poly(loglog) factors. This improves the previous best combinatorial algorithm by Chan [4] that runs in \(\hat{O}(n^3/\log ^3 n)\) time. Our algorithm generalizes the divide-and-conquer strategy of Chan’s algorithm.

43 citations
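
For the Boolean matrix multiplication and triangle finding entry above, the two problems can be illustrated with a minimal sketch. This is the naive cubic-time baseline, not the combinatorial algorithm of the paper; the function names are illustrative, and the graph is assumed to be undirected with no self-loops.

```python
def bool_matmul(a, b):
    """Boolean product: c[i][j] is True iff a[i][k] and b[k][j] hold for some k."""
    n, m, p = len(a), len(b), len(b[0])
    return [[any(a[i][k] and b[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]


def has_triangle(adj):
    """True if the graph with 0/1 adjacency matrix adj contains a triangle.

    Assumes an undirected graph without self-loops (zero diagonal).
    """
    # paths2[i][j] is True iff some vertex k is adjacent to both i and j.
    paths2 = bool_matmul(adj, adj)
    n = len(adj)
    # A triangle exists iff some edge (i, j) also has a common neighbour.
    return any(adj[i][j] and paths2[i][j] for i in range(n) for j in range(n))


triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(has_triangle(triangle), has_triangle(path))
```

The reduction from triangle finding to one Boolean product is what ties the two problems together: any speed-up for the product carries over to the triangle test in this form.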


Proceedings ArticleDOI
10 Sep 2015
TL;DR: It is proved that a Boolean control network is observable if and only if every row of Uk∗ has a one in its last column.
Abstract: The observability of Boolean control networks is investigated. The pairs of states are classified into three classes: (i) diagonal, (ii) h-distinguishable, and (iii) h-indistinguishable. For h-indistinguishable pairs, we construct a matrix W called the transferable matrix, which indicates the control-transferability among h-indistinguishable pairs. Modifying W yields a matrix U0, which is used as the initial matrix for an iterative algorithm. After finitely many iterations a stable Uk∗ is reached, which is called the observability matrix. It is proved that a Boolean control network is observable if and only if every row of Uk∗ has a one in its last column. Some numerical examples are presented.

40 citations


Journal ArticleDOI
TL;DR: This note investigates the logical matrix factorization with application to the topological structure analysis of Boolean networks and presents a size-reduced structure-equivalent logical network for the given system.
Abstract: This note investigates the logical matrix factorization with application to the topological structure analysis of Boolean networks. First, the concepts of both factorization and rank are defined for logical matrices, and two factorization problems are then studied. Using nonsingular logical matrix transformations, several necessary and sufficient conditions for the factorization of a given logical matrix are presented. Second, the logical matrix factorization is applied to the topological structure analysis of a given Boolean network, and a size-reduced structure-equivalent logical network is constructed for the given system. It is shown that the topological structure (including all the fixed points and cycles) of the resulting size-reduced logical network is the same as that of the original Boolean network. The study of an illustrative example shows that the new results presented in this note are effective in analyzing the topological structure of Boolean networks.

29 citations
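
For the Boolean network entry above, the "topological structure" (fixed points and cycles) can be computed directly from a state-transition map. The sketch below is not the factorization method of the note. It assumes, as in the usual algebraic form x(t+1) = L x(t), that every column of the logical matrix L holds exactly one 1, so L can be stored as a list `succ` with `succ[j]` the row index of the 1 in column j. The names `succ` and `attractors` are hypothetical.

```python
def attractors(succ):
    """Return all cycles of the map j -> succ[j], each as a list of states.

    A cycle of length 1 is a fixed point.
    """
    n = len(succ)
    state = [0] * n  # 0 = unvisited, 1 = on the current walk, 2 = finished
    cycles = []
    for start in range(n):
        path = []
        j = start
        while state[j] == 0:
            state[j] = 1
            path.append(j)
            j = succ[j]
        if state[j] == 1:
            # The walk re-entered itself at j: everything from j onward is a new cycle.
            cycles.append(path[path.index(j):])
        for v in path:
            state[v] = 2
    return cycles


# States 0 -> 1 -> 2 -> 1 form a cycle {1, 2}; state 3 maps to itself.
cycles = attractors([1, 2, 1, 3])
fixed_points = [c[0] for c in cycles if len(c) == 1]
print(cycles, fixed_points)
```

Each state is visited once, so the cost is linear in the number of states, which is itself exponential in the number of Boolean variables; that growth is the motivation for size reduction.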


Proceedings Article
01 Jan 2015
TL;DR: In this paper, a gamma process dynamic Poisson factor analysis model is proposed to factorize a dynamic count matrix, whose columns are sequentially observed count vectors, and the model builds a novel Markov chain that sends the latent gamma random variables at time $(t-1)$ as the shape parameters of those at time $t$, which are linked to observed or latent counts under the Poisson likelihood.
Abstract: A gamma process dynamic Poisson factor analysis model is proposed to factorize a dynamic count matrix, whose columns are sequentially observed count vectors. The model builds a novel Markov chain that sends the latent gamma random variables at time $(t-1)$ as the shape parameters of those at time $t$, which are linked to observed or latent counts under the Poisson likelihood. The significant challenge of inferring the gamma shape parameters is fully addressed, using unique data augmentation and marginalization techniques for the negative binomial distribution. The same nonparametric Bayesian model also applies to the factorization of a dynamic binary matrix, via a Bernoulli-Poisson link that connects a binary observation to a latent count, with closed-form conditional posteriors for the latent counts and efficient computation for sparse observations. We apply the model to text and music analysis, with state-of-the-art results.

23 citations


Journal ArticleDOI
TL;DR: Wang et al. propose a weighted symmetric binary matrix factorization (wSBMF) framework to detect overlapping communities in bipartite networks, which describe the relationships between two types of nodes.
Abstract: In this paper, we propose a weighted symmetric binary matrix factorization (wSBMF) framework to detect overlapping communities in bipartite networks, which describe the relationships between two types of nodes. Our method improves performance by recognizing the distinction between two types of missing edges: ones among the nodes in each node type and the others between two node types. Our method can also explicitly assign community membership and distinguish outliers from overlapping nodes, as well as incorporate existing knowledge on the network. We propose a generalized partition density for bipartite networks as a quality function, which identifies the most appropriate number of communities. The experimental results on both synthetic and real-world networks demonstrate the effectiveness of our method.

19 citations


Patent
04 Feb 2015
TL;DR: A video classification method is disclosed that uses a deep neural network to overcome the prior-art defect that videos cannot be accurately classified and to improve the accuracy of video classification.
Abstract: The embodiment of the invention discloses a video classification method, and is used for overcoming the defect in the prior art that videos cannot be accurately classified, and improving the degree of accuracy of video classification. The video classification method includes obtaining information in a video, the information in the video including image information, light stream information and acoustic information; utilizing a deep neural network to generate first reference information corresponding to the image information, second reference information corresponding to the light stream information and third reference information corresponding to the acoustic information; processing the video according to the first reference information, the second reference information and the third reference information to obtain a confidence degree matrix of the video and a category relation matrix of the video; and substituting the confidence degree matrix of the video and the category relation matrix of the video into an objective function to obtain an objective fusion parameter of the video, wherein the objective fusion parameter is used for classifying the video.

Posted Content
TL;DR: This empirical study demonstrates that message passing is able to recover low-rank Boolean matrices within the boundaries of theoretically possible recovery and compares favorably with the state of the art in real-world applications, such as collaborative filtering with large-scale Boolean data.
Abstract: Boolean matrix factorization and Boolean matrix completion from noisy observations are desirable unsupervised data-analysis methods due to their interpretability, but hard to perform due to their NP-hardness. We treat these problems as maximum a posteriori inference problems in a graphical model and present a message passing approach that scales linearly with the number of observations and factors. Our empirical study demonstrates that message passing is able to recover low-rank Boolean matrices within the boundaries of theoretically possible recovery and compares favorably with the state of the art in real-world applications, such as collaborative filtering with large-scale Boolean data.

Proceedings Article
01 Jan 2015
TL;DR: This work proposes and studies an approach for collaborative filtering, which is based on Boolean matrix factorisation and exploits additional (context) information about users and items, and uses an adjusted type of projection of a target user to the obtained factor space.
Abstract: In this work we propose and study an approach for collaborative filtering, which is based on Boolean matrix factorisation and exploits additional (context) information about users and items. To avoid similarity loss in case of Boolean representation we use an adjusted type of projection of a target user to the obtained factor space. We have compared the proposed method with an SVD-based approach on the MovieLens dataset. The experiments demonstrate that the proposed method has better MAE and Precision and comparable Recall and F-measure. We also report an increase of quality in the presence of context information.

Proceedings ArticleDOI
28 Dec 2015
TL;DR: This work proposes introducing a prior distribution over the spatial correlation matrices called the DOA mixture model instead of using the weighted sum model, and shows that the proposed method provided a 1.94 dB improvement compared with the previous method in terms of the signal-to-distortion ratios of separated signals.
Abstract: We deal with the problems of blind source separation, dereverberation, audio event detection and direction-of-arrival (DOA) estimation. We previously proposed a generative model of multichannel signals called the multichannel factorial hidden Markov model, which allows us to simultaneously solve these problems through a joint optimization problem formulation. In this approach, we modeled the spatial correlation matrix of each source as a weighted sum of the spatial correlation matrices corresponding to all possible DOAs. However, it became clear through real environment experiments that the estimate of the spatial correlation matrix tended to deviate from the actual correlation matrix since the plane wave assumption does not hold due to reverberation and noise components. To handle such deviations, we propose introducing a prior distribution over the spatial correlation matrices called the DOA mixture model instead of using the weighted sum model. The experiment showed that the proposed method provided a 1.94 dB improvement compared with our previous method in terms of the signal-to-distortion ratios of separated signals.

Patent
23 Dec 2015
TL;DR: A method for analyzing and recognizing the structure of a handwritten mathematical formula in a natural scene image is presented.
Abstract: The invention provides a method for analyzing and recognizing the structure of a handwritten mathematical formula in a natural scene image. The method comprises the steps of S1, converting the gray matrix of a natural scene image into a local contrast matrix, and conducting the binary classification on the local contrast matrix based on the Otsu method to obtain a binary matrix; S2, analyzing the connected domains of the binary matrix obtained in the step S1, and removing non-character type connected domains to obtain character type connected domains; S3, detecting formula structural elements and other special structural elements in the character type connected domains based on the correlation coefficient method, and separately marking out all detected special structural elements; S4, dividing the binary matrix obtained in the step S1 based on the horizontal projection method; S5, recognizing each character type connected domain via a convolutional neural network; S6, defining an output sequence and outputting recognized results according to the corresponding sequence in the LaTeX layout format. According to the technical scheme of the invention, by means of the method, the expression problem of elementary mathematical formulas during the OCR recognition process can be effectively solved.
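
The binarization in step S1 can be sketched as follows. This is a minimal illustration of Otsu thresholding only. It assumes the input is already a matrix of integer levels in the range 0 to `levels - 1`; how the patent computes the local contrast matrix is not stated in the abstract and is not reproduced here. The function names are illustrative.

```python
def otsu_threshold(values, levels=256):
    """Return the level t that maximizes between-class variance.

    Class 0 holds values <= t, class 1 holds values > t.
    """
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_score = 0, -1.0
    count0 = 0  # number of values in class 0
    sum0 = 0    # sum of the values in class 0
    for t in range(levels):
        count0 += hist[t]
        sum0 += t * hist[t]
        count1 = total - count0
        if count0 == 0 or count1 == 0:
            continue  # one class is empty, the split is undefined
        mean0 = sum0 / count0
        mean1 = (sum_all - sum0) / count1
        # Proportional to the between-class variance (constant factor dropped).
        score = count0 * count1 * (mean0 - mean1) ** 2
        if score > best_score:
            best_score, best_t = score, t
    return best_t


def binarize(matrix, levels=256):
    """Map a matrix of integer levels to a 0/1 matrix using Otsu's threshold."""
    t = otsu_threshold([v for row in matrix for v in row], levels)
    return [[1 if v > t else 0 for v in row] for row in matrix]


print(binarize([[10, 12, 200], [11, 210, 205]]))
```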

Posted Content
20 Jul 2015
TL;DR: In this paper, the authors provide a framework to design and analyze various recommendation algorithms, including item-item collaborative filtering, which achieves good cold-start performance by quickly making good recommendations to new users about whom there is little information.
Abstract: There is much empirical evidence that item-item collaborative filtering works well in practice. Motivated to understand this, we provide a framework to design and analyze various recommendation algorithms. The setup amounts to online binary matrix completion, where at each time a random user requests a recommendation and the algorithm chooses an entry to reveal in the user's row. The goal is to minimize regret, or equivalently to maximize the number of +1 entries revealed at any time. We analyze an item-item collaborative filtering algorithm that can achieve fundamentally better performance compared to user-user collaborative filtering. The algorithm achieves good "cold-start" performance (appropriately defined) by quickly making good recommendations to new users about whom there is little information.

Proceedings ArticleDOI
12 Nov 2015
TL;DR: A novel Non-negative Matrix Factorization based on the logistic link function for decomposition of binary data finds that choosing the number of components is an essential part in the modelling and interpretation, that is still unresolved.
Abstract: We propose the Logistic Non-negative Matrix Factorization for decomposition of binary data. Binary data are frequently generated in e.g. text analysis, sensory data, market basket data etc. A common method for analysing non-negative data is the Non-negative Matrix Factorization, though this is in theory not appropriate for binary data, and thus we propose a novel Non-negative Matrix Factorization based on the logistic link function. Furthermore we generalize the method to handle missing data. The formulation of the method is compared to a previously proposed logistic matrix factorization without non-negativity constraint on the features. We compare the performance of the Logistic Non-negative Matrix Factorization to Least Squares Non-negative Matrix Factorization and Kullback-Leibler (KL) Non-negative Matrix Factorization on sets of binary data: a synthetic dataset, a set of student comments on their professors collected in a binary term-document matrix and a sensory dataset. We find that choosing the number of components is an essential part in the modelling and interpretation, that is still unresolved.

Patent
11 Dec 2015
TL;DR: In this paper, an adaptive tile matrix representation of input matrix A may then be created, which is then used to perform dynamic tile-granular optimization based on density estimates and a cost model.
Abstract: According to some embodiments, matrix A data may be loaded into a temporary, unordered starting representation that contains coordinates and values for each element of matrix A. Z-curve ordering of matrix A may be performed to create a two-dimensional density map of matrix A by counting matrix elements that are contained in logical two-dimensional block cells of a given size. A quad-tree recursion may be executed on the two-dimensional density map structure in reduced Z-space to identify areas of different densities in the two dimensional matrix space. An adaptive tile matrix representation of input matrix A may then be created. According to some embodiments, an adaptive tile matrix multiplication operation may perform dynamic tile-granular optimization based on density estimates and a cost model.
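
The Z-curve ordering and block density map described above can be sketched in a few lines. This is an illustration under assumptions, not the patented procedure: the bit-interleaving convention (column bits in even positions, row bits in odd positions), the 16-bit coordinate limit, and the names `morton_index` and `density_map` are all choices made for this example.

```python
def morton_index(row, col, bits=16):
    """Interleave the bits of row and col into a Z-curve (Morton) index."""
    z = 0
    for b in range(bits):
        z |= ((col >> b) & 1) << (2 * b)      # column bit goes to an even position
        z |= ((row >> b) & 1) << (2 * b + 1)  # row bit goes to an odd position
    return z


def density_map(coords, block):
    """Count stored elements per block x block cell, keyed by (block_row, block_col)."""
    counts = {}
    for r, c in coords:
        cell = (r // block, c // block)
        counts[cell] = counts.get(cell, 0) + 1
    return counts


# Coordinates of the nonzero elements of a small sparse matrix.
coords = [(7, 7), (0, 1), (5, 6), (0, 0), (1, 1)]
ordered = sorted(coords, key=lambda rc: morton_index(*rc))
print(ordered)
print(density_map(coords, block=4))
```

Sorting by the Morton index keeps elements of the same square block contiguous, which is why a density map over blocks pairs naturally with a quad-tree recursion.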

Posted Content
TL;DR: The first results for the general case of low rank approximation of binary matrices for both $\mathrm{GF}(2)$ and Boolean model are given and it is shown that the low rank binary matrix approximation problem is NP-hard even for $k=1$, solving a conjecture in \cite{Koyuturk03}.
Abstract: We consider the problem of low rank approximation of binary matrices. Here we are given a $d \times n$ binary matrix $A$ and a small integer $k < d$. The goal is to find two binary matrices $U$ and $V$ of sizes $d \times k$ and $k \times n$ respectively, so that the Frobenius norm of $A-U V$ is minimized. There are two models of this problem, depending on the definition of the product of binary matrices: The $\mathrm{GF}(2)$ model and the Boolean semiring model. Previously, the only known results are $2$-approximation algorithms for the special case $k=1$ \cite{KDD:ShenJY09, Jiang14} (where the two models are equivalent). In this paper, we give the first results for the general case $k>1$ for both $\mathrm{GF}(2)$ and Boolean model. For the $\mathrm{GF}(2)$ model, we show that a simple column-selection algorithm achieves $O(k)$-approximation. For the Boolean model, we develop a new algorithm and show that it is $O(2^k)$-approximation. For constant $k$, both algorithms run in polynomial time in the size of the matrix. We also show that the low rank binary matrix approximation problem is NP-hard even for $k=1$, solving a conjecture in \cite{Koyuturk03}.
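
The difference between the two product models, and their agreement for $k=1$, can be checked on small matrices. The sketch below only evaluates products and the squared Frobenius error; it does not implement the approximation algorithms of the paper, and the function names and the `model` flag are illustrative.

```python
def product(u, v, model):
    """Multiply binary matrices u (d x k) and v (k x n).

    model == "gf2": sums are taken modulo 2.
    Any other value: Boolean semiring, i.e. 1 + 1 = 1.
    """
    d, k, n = len(u), len(v), len(v[0])
    out = [[0] * n for _ in range(d)]
    for i in range(d):
        for j in range(n):
            s = sum(u[i][t] & v[t][j] for t in range(k))
            out[i][j] = s % 2 if model == "gf2" else int(s > 0)
    return out


def squared_error(a, b):
    """Squared Frobenius norm of a - b; for binary matrices, the number of differing entries."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


u = [[1, 1], [0, 1]]
v = [[1, 0], [1, 1]]
print(product(u, v, "gf2"), product(u, v, "boolean"))

# With k = 1 every entry is a single AND, so the two models coincide.
u1, v1 = [[1], [0]], [[1, 0, 1]]
print(product(u1, v1, "gf2") == product(u1, v1, "boolean"))
```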

Patent
25 Mar 2015
TL;DR: An active learning method and device based on network data are presented, in which a sample relation matrix is established from unlabeled samples; the matrix contains the information content of each sample and the relation information content between every two unlabeled samples.
Abstract: The invention provides an active learning method and device based on network data. A sample relation matrix is established by means of unlabeled samples, the sample relation matrix contains the information content of each sample and the relation information content between every two unlabeled samples, and the sample relation matrix considers the non-independent distribution property of the samples, namely the relation property between every two samples; an objective function is established based on the sample relation matrix, the larger the value of the objective function is, the larger the information content of a selected sample is, multiple unlabelled samples enabling the value of the objective function to be the largest are obtained by resolving the objective function, and an optimal sample subset is formed by the multiple unlabeled samples. According to the method and device, the optimal sample subset can be selected from a large amount of unlabelled network data so that a classifier model can be reconstructed after the selected sample subset is labeled manually, and then the network data classification performance of the classifier model is improved.

Patent
07 Oct 2015
TL;DR: An article scoring and recommending method for a social network is proposed. Its first step establishes a user-article score bigraph from the scores given by a user set U to the scored article set I, calculates the similarity between two users in U, predicts the score of a target user u in U for a to-be-scored article j, and obtains a prediction user-article score bigraph.
Abstract: The present invention relates to an article scoring and recommending method of a social network. The method comprises the following steps of 1) establishing a user-article score bigraph according to the scores of a user set U to the scored article set I, calculating the similarity between two users in the user set U, predicting the score of a target user u in the user set U to a to-be-scored article j, and obtaining a prediction user-article score bigraph; 2) establishing a user-user friendly relation nonseparable graph according to the social network of the user set U, and calculating to obtain a user-article score matrix R and a user-user friendly relation matrix A according to the prediction user-article score bigraph and the user-user friendly relation nonseparable graph; 3) selecting a weight a, and establishing an article recommending fusion matrix X by making the user-article score matrix R and the user-user friendly relation matrix A in identical trend; 4) according to the article recommending fusion matrix X, recommending an article to the target user u in the user set U. Compared with the prior art, the article scoring and recommending method of the social network of the present invention has the advantages of advanced method, high feasibility, etc.

Proceedings ArticleDOI
01 Nov 2015
TL;DR: A novel gray image encryption algorithm based on chaotic mapping and DNA (Deoxyribonucleic Acid) encoding is proposed that can resist several attacks such as the chosen plaintext attack, brute-force attack, and statistical attack.
Abstract: With the rapid development of networks, more and more digital images need to be stored and communicated. Due to the openness and network sharing, the problems of digital image security become an important threat. In this paper, we propose a novel gray image encryption algorithm based on chaotic mapping and DNA (Deoxyribonucleic Acid) encoding. We fix the irreversibility error of a previous work, which can only encrypt the plain image, cannot decrypt the cipher image with the correct secret key, and is vulnerable to a chosen plaintext attack. To make the algorithm invertible, we encode the input gray image by DNA encoding and generate a random matrix based on the logistic chaotic mapping. The DNA addition operation is conducted on the random matrix, followed by the DNA complement operation guided by a random binary matrix generated by two logistic chaotic mapping sequences. We solve the problem of the irreversibility successfully. In addition, the algorithm can now resist several attacks such as the chosen plaintext attack, brute-force attack, and statistical attack.
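
How a binary matrix might be derived from two logistic chaotic sequences can be sketched as follows. The abstract does not give the rule, so everything specific here is an assumption made for illustration: the comparison rule (entry is 1 where the first sequence exceeds the second), the parameter r = 3.99, the burn-in length, and the function names. The sketch is not a secure cipher.

```python
def logistic_sequence(x0, r, length, burn_in=100):
    """Iterate the logistic map x <- r * x * (1 - x); return `length` values after a burn-in."""
    x = x0
    for _ in range(burn_in):
        x = r * x * (1.0 - x)
    seq = []
    for _ in range(length):
        x = r * x * (1.0 - x)
        seq.append(x)
    return seq


def chaotic_binary_matrix(rows, cols, x0, y0, r=3.99):
    """Build a rows x cols 0/1 matrix from two logistic sequences.

    Assumed rule: an entry is 1 where the first sequence exceeds the second.
    The seeds x0 and y0 play the role of key material and must lie in (0, 1).
    """
    a = logistic_sequence(x0, r, rows * cols)
    b = logistic_sequence(y0, r, rows * cols)
    bits = [1 if p > q else 0 for p, q in zip(a, b)]
    return [bits[i * cols:(i + 1) * cols] for i in range(rows)]


mask = chaotic_binary_matrix(4, 6, x0=0.3141, y0=0.2718)
for row in mask:
    print(row)
```

The same seeds always reproduce the same matrix, which is the property a decryption step would rely on.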

Patent
18 Nov 2015
TL;DR: A data reading algorithm is disclosed that receives an image, locates at least a portion of the data modules within the image without using a fixed pattern, fits a model of the module positions from the image, extrapolates the model to obtain predicted module positions, determines module values from the image at the predicted module positions, and extracts a binary matrix from the module values.
Abstract: Systems and methods for reading a two-dimensional matrix symbol or for determining if a two-dimensional matrix symbol is decodable are disclosed. The systems and methods can include a data reading algorithm that receives an image, locates at least a portion of the data modules within the image without using a fixed pattern, fits a model of the module positions from the image, extrapolates the model resulting in predicted module positions, determines module values from the image at the predicted module positions, and extracts a binary matrix from the module values.

Journal ArticleDOI
TL;DR: It is shown that OIM is almost the same as DRM in the tiling phase, and becomes less precise after interpretation, and it is proved that the consistency of a complete basic OIM network can be decided in cubic time.
Abstract: How to express and reason with cardinal directions between extended objects such as lines and regions is an important problem in qualitative spatial reasoning (QSR), a common subfield of geographical information science and Artificial Intelligence (AI). The direction relation matrix (DRM) model, proposed by Goyal and Egenhofer in 1997, is one very expressive relation model for this purpose. Unlike many other relation models in QSR, the set-theoretic converse of a DRM relation is not necessarily representable in DRM. Schneider et al. regard this as a serious shortcoming and propose, in their work published in ACM TODS (2012), the objects interaction matrix (OIM) model for modelling cardinal directions between complex regions. OIM is also a tiling-based model that consists of two phases: the tiling phase and the interpretation phase. Although it was claimed that OIM is a novel concept, we show that it is not so different from DRM if we represent the cardinal direction of two regions a and b by both the DRM ...

Journal ArticleDOI
TL;DR: An integrated evaluation approach based on the HDSM with emphasis on its application for supporting the top-down design process of CoPS is explored and the effectiveness and potential of the proposed approach is highlighted through a real-world case dealing with the development of a large tonnage crawler crane.
Abstract: To enable a more optimised process model to effectively support complex product and system (CoPS) development, the hierarchical design structure matrix (HDSM) is refined from the traditional design structure matrix to exhibit the correlative iterations and structure-evolvement within design activities. When implementing a top-down design process, the required modelling method must reflect the traits of hierarchy, coupling and complexity, which are of great consideration in the decomposition of design activities. Explored in this paper is an integrated evaluation approach based on the HDSM with emphasis on its application for supporting the top-down design process of CoPS. After the dependence degree between different levels in the top-down design process is analysed when the HDSM is constructed, a concordant method based on triangular fuzzy numbers converts the binary matrix design to the weighted form representing the coupling strength. The purposes served by this approach, including strengthening the internal polymerisation and sorting, can be demonstrated among design activities. With the relevant issues developed and the coupling analysis implemented, the effectiveness and potential of the proposed approach is highlighted through a real-world case dealing with the development of a large tonnage crawler crane.

Journal ArticleDOI
TL;DR: A new feature selection method, the binary matrix shuffling filter, was applied to neuronal morphology classification; it showed optimal performance and broad generalization ability in five random replications of neuron datasets.
Abstract: A prerequisite for understanding neuronal function and characteristics is to classify neurons correctly. Existing classification techniques are usually based on structural characteristics and employ principal component analysis to reduce the feature dimension. In this work, we classify neurons based on neuronal morphology. A new feature selection method, the binary matrix shuffling filter, was used in neuronal morphology classification. This method, coupled with a support vector machine for implementation, usually selects a small number of features for easy interpretation. The retained features are used to build classification models with support vector classification and two other commonly used classifiers. Compared with the reference feature selection methods, the binary matrix shuffling filter showed optimal performance and exhibited broad generalization ability in five random replications of neuron datasets. In addition, the binary matrix shuffling filter was able to distinguish each neuron type from the other types correctly; for each neuron type, type-specific ("private") features were also obtained.

Book ChapterDOI
07 Sep 2015
TL;DR: A unified theory, based on generalized outer product operators, that encompasses many pattern set mining tasks; its results apply immediately to a large number of data mining problems and may allow future results and algorithms to be generalized as well.
Abstract: Matrix factorizations are a popular tool to mine regularities from data. There are many ways to interpret the factorizations, but one particularly suited for data mining utilizes the fact that a matrix product can be interpreted as a sum of rank-1 matrices. Then the factorization of a matrix becomes the task of finding a small number of rank-1 matrices, the sum of which is a good representation of the original matrix. Seen this way, it becomes obvious that many problems in data mining can be expressed as matrix factorizations, given appropriate definitions of what a rank-1 matrix and a sum of rank-1 matrices mean. This paper develops a unified theory, based on generalized outer product operators, that encompasses many pattern set mining tasks. The focus is on the computational aspects of the theory and on studying the computational complexity and approximability of many problems related to generalized matrix factorizations. The results immediately apply to a large number of data mining problems, and hopefully allow generalizing future results and algorithms as well.
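The rank-1 view of a matrix product can be made concrete with a small sketch. The code below builds a product as a combination of outer products, with the elementwise "multiply" and "add" operations passed in as parameters. The function names and the two operator choices (ordinary arithmetic and Boolean AND/OR) are illustrative assumptions, not the operators defined in the paper.

```python
from functools import reduce

def outer(u, v, mul):
    """Generalized outer product: entry (i, j) is mul(u[i], v[j])."""
    return [[mul(a, b) for b in v] for a in u]

def combine(matrices, add):
    """Fold a list of equally sized matrices together with an elementwise 'add'."""
    def add_two(X, Y):
        return [[add(x, y) for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]
    return reduce(add_two, matrices)

def generalized_product(B, C, mul, add):
    """Compute B * C as a combination of k rank-1 matrices (column j of B, row j of C)."""
    k = len(C)
    columns = [[row[j] for row in B] for j in range(k)]
    return combine([outer(columns[j], C[j], mul) for j in range(k)], add)

B = [[1, 0], [1, 1], [0, 1]]
C = [[1, 1, 0], [0, 1, 1]]

# Ordinary arithmetic gives the usual matrix product.
standard = generalized_product(B, C, lambda a, b: a * b, lambda a, b: a + b)
# AND as product and OR as sum gives the Boolean matrix product.
boolean = generalized_product(B, C, lambda a, b: a & b, lambda a, b: a | b)
```

The two results differ only in the centre entry (2 versus 1), which is where the two rank-1 matrices overlap; changing the meaning of "sum" changes how overlapping patterns are counted.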

Journal ArticleDOI
TL;DR: A generalized partition density for bipartite networks as a quality function, which identifies the most appropriate number of communities and can also explicitly assign community membership and distinguish outliers from overlapping nodes.
Abstract: In this paper we propose a weighted symmetric binary matrix factorization (wSBMF) framework to detect overlapping communities in bipartite networks, which describe relationships between two types of nodes. Our method improves performance by recognizing the distinction between two types of missing edges: those among the nodes within each node type and those between the two node types. Our method can also explicitly assign community membership, distinguish outliers from overlapping nodes, and incorporate existing knowledge about the network. We propose a generalized partition density for bipartite networks as a quality function, which identifies the most appropriate number of communities. The experimental results on both synthetic and real-world networks demonstrate the effectiveness of our method.

Posted Content
TL;DR: In this paper, the authors consider the case where defective items are random and follow simple probability distributions and show how distance, a pairwise property of the columns of the matrix, translates to a (t+1)-wise property.
Abstract: In a group testing scheme, a set of tests is designed to identify a small number t of defective items that are present among a large number N of items. Each test takes as input a group of items and produces a binary output indicating whether any defective item is present in the group. In a non-adaptive scheme, the tests have to be designed in one shot. In this setting, designing a testing scheme is equivalent to the construction of a disjunct matrix, an M × N binary matrix where the union of the supports of any t columns does not contain the support of any other column. In principle, one wants such a matrix with the minimum possible number M of rows. In this paper we consider the scenario where defective items are random and follow simple probability distributions. In particular, we consider the cases where 1) each item can be defective independently with probability t/N and 2) each t-set of items can be defective with uniform probability. In both cases our aim is to design a testing matrix that successfully identifies the set of defectives with high probability. Both of these models have been studied in the literature before, and it is known that O(t log N) tests are necessary as well as sufficient (via random coding) in both cases. Our main focus is the explicit deterministic construction of test matrices amenable to the above scenarios. One of the most popular ways of constructing test matrices relies on constant-weight error-correcting codes and their minimum distance. In particular, it is known that codes result in test matrices with O(t^2 log N) rows that identify any t defectives. We go beyond the minimum distance analysis and connect the average distance of a constant-weight code to the parameters of the resulting test matrix. Indeed, we show how distance, a pairwise property of the columns of the matrix, translates to a (t+1)-wise property.
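The disjunctness condition described above can be checked by brute force on small matrices. The sketch below is only an illustration of the definition, not the paper's construction: the function names are hypothetical, the example matrix (columns are the 2-subsets of 4 tests) is an assumed toy design, and the decoder is the standard "clear every item that appears in a negative test" rule.

```python
from itertools import combinations

def supports(matrix):
    """Support of each column: the set of tests (rows) that include that item."""
    n_items = len(matrix[0])
    return [frozenset(i for i, row in enumerate(matrix) if row[j])
            for j in range(n_items)]

def is_disjunct(matrix, t):
    """True if no column's support lies inside the union of any t other columns."""
    sup = supports(matrix)
    n_items = len(sup)
    for group in combinations(range(n_items), t):
        union = frozenset().union(*(sup[j] for j in group))
        for other in range(n_items):
            if other not in group and sup[other] <= union:
                return False
    return True

def run_tests(matrix, defectives):
    """A test is positive (1) if it contains at least one defective item."""
    return [int(any(row[j] for j in defectives)) for row in matrix]

def decode(matrix, outcomes):
    """Keep the items whose tests are all positive."""
    n_items = len(matrix[0])
    return {j for j in range(n_items)
            if all(outcomes[i] for i, row in enumerate(matrix) if row[j])}

# Toy design: 4 tests, 6 items, item j is placed in the j-th pair of tests.
pairs = list(combinations(range(4), 2))
matrix = [[int(i in pair) for pair in pairs] for i in range(4)]

outcomes = run_tests(matrix, {2})
recovered = decode(matrix, outcomes)
```

This toy matrix is 1-disjunct but not 2-disjunct, so the decoder is guaranteed to recover a single defective exactly, while two defectives may produce false positives.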

Journal ArticleDOI
TL;DR: The adaptation consists of determining types of logical functions for composite components of the logical network by representing it in the form of a polynomial whose coefficients are specified by a Hadamard matrix or a Zhegalkin polynomial.
Abstract: The authors consider the problem of adaptation of a logical network composed of universal logical elements to the solution of the problem of classification of input sets of binary vectors. The adaptation consists of determining types of logical functions for composite components of the logical network by representing it in the form of a polynomial whose coefficients are specified by a Hadamard matrix or a Zhegalkin polynomial.
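For readers unfamiliar with Zhegalkin polynomials, the sketch below shows one standard way to obtain the coefficients of a Boolean function's Zhegalkin polynomial (its XOR-of-ANDs form) from a truth table, using the binary Möbius transform. This is a generic illustration and not the authors' adaptation procedure; the function names and the indexing convention (bit k of an index corresponds to variable x_k) are assumptions of the sketch.

```python
def zhegalkin_coefficients(truth_table):
    """Binary Moebius transform: truth table -> Zhegalkin coefficients.

    truth_table[i] is f(x) where bit k of i is the value of x_k.
    The result c satisfies: c[m] is the coefficient of the monomial
    formed by the variables whose bits are set in m (c[0] is the constant).
    """
    coeffs = list(truth_table)
    n = len(coeffs)
    if n == 0 or n & (n - 1):
        raise ValueError("truth table length must be a power of two")
    step = 1
    while step < n:
        for start in range(0, n, 2 * step):
            for i in range(start, start + step):
                coeffs[i + step] ^= coeffs[i]
        step *= 2
    return coeffs

def evaluate(coeffs, x):
    """Evaluate the polynomial at input index x (XOR over monomials contained in x)."""
    value = 0
    for m, c in enumerate(coeffs):
        if c and (m & x) == m:
            value ^= 1
    return value

xor_table = [0, 1, 1, 0]   # f(x0, x1) = x0 XOR x1
or_table = [0, 1, 1, 1]    # f(x0, x1) = x0 OR x1

xor_coeffs = zhegalkin_coefficients(xor_table)
or_coeffs = zhegalkin_coefficients(or_table)
```

With this convention, `xor_coeffs` encodes x0 ⊕ x1 and `or_coeffs` encodes x0 ⊕ x1 ⊕ x0·x1; applying the transform twice returns the original truth table.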