Proceedings ArticleDOI

Behavior Analysis through Routine Cluster Discovery in Ubiquitous Sensor Data

TL;DR: This work proposes a novel clustering technique for BA that finds hidden routines in ubiquitous data, captures the patterns within those routines, and works efficiently on high-dimensional data without any computationally expensive reduction operations.
Abstract: Behavioral analysis (BA) on ubiquitous sensor data is the task of finding the latent distribution of features for modeling user-specific characteristics. These characteristics, in turn, can be used for a number of tasks including resource management, power efficiency, and smart home applications. In recent years, the employment of topic models for BA has been found to successfully extract the dynamics of the sensed data. Topic modeling is popularly performed on text data for mining inherent topics; the task of finding the latent topics in textual data is done in an unsupervised manner. In this work, we propose a novel clustering technique for BA which can find hidden routines in ubiquitous data and also capture the patterns within those routines. Our approach works efficiently on high-dimensional data for BA without performing any computationally expensive reduction operations. For a comparative study, we evaluate three techniques, namely Latent Dirichlet Allocation (LDA), Non-negative Matrix Factorization (NMF), and Probabilistic Latent Semantic Analysis (PLSA). We analyze the efficiency of the methods using performance indices such as perplexity and silhouette on three real-world ubiquitous sensor datasets, namely the Intel Lab Data, the Kyoto Data, and the MERL Data. Through rigorous experiments, we achieve silhouette scores of 0.7049 on the Intel Lab dataset, 0.6547 on the Kyoto dataset, and 0.8312 on the MERL dataset for clustering.
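
The evaluation pipeline the abstract describes — fit a topic model to a matrix of windowed sensor counts, treat each window's dominant topic as its routine cluster, and score the clustering with the silhouette index — can be sketched as follows. This is a minimal sketch, not the paper's implementation: the random placeholder matrix X, the window count, and k = 10 routines are assumptions standing in for the preprocessed Intel Lab, Kyoto, or MERL data.

```python
# Minimal sketch: topic models as routine-cluster discovery on sensor counts.
# X is a placeholder for a (time window x sensor) activation-count matrix.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation, NMF
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
X = rng.poisson(1.0, size=(500, 60))  # assumed stand-in for windowed sensor counts

k = 10  # assumed number of latent routines (topics)
for name, model in [("LDA", LatentDirichletAllocation(n_components=k, random_state=0)),
                    ("NMF", NMF(n_components=k, init="nndsvda", random_state=0))]:
    theta = model.fit_transform(X)   # window-by-routine weight matrix
    labels = theta.argmax(axis=1)    # hard routine assignment per window
    print(name, "silhouette:", silhouette_score(theta, labels))
```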
Citations
Journal ArticleDOI
TL;DR: This survey provides a comprehensive summary of recent research on AI-based algorithms for intelligent sensing and presents a comparative analysis of algorithms, models, influential parameters, available datasets, applications and projects in the area of intelligent sensing.
Abstract: In recent years, intelligent sensing has gained significant attention because of its autonomous decision-making ability to solve complex problems. Today, smart sensors complement and enhance the capabilities of human beings and have been widely embraced in numerous application areas. Artificial intelligence (AI) has made astounding growth in the domains of natural language processing, machine learning (ML), and computer vision. The methods based on AI enable a computer to learn and monitor activities by sensing the source of information in a real-time environment. The combination of these two technologies provides a promising solution in intelligent sensing. This survey provides a comprehensive summary of recent research on AI-based algorithms for intelligent sensing. This work also presents a comparative analysis of algorithms, models, influential parameters, available datasets, applications, and projects in the area of intelligent sensing. Furthermore, we present a taxonomy of AI models along with cutting-edge approaches. Finally, we highlight challenges and open issues, followed by future research directions pertaining to this exciting and fast-moving field.

5 citations

References
Journal ArticleDOI
TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
Abstract: We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document. We present efficient approximate inference techniques based on variational methods and an EM algorithm for empirical Bayes parameter estimation. We report results in document modeling, text classification, and collaborative filtering, comparing to a mixture of unigrams model and the probabilistic LSI model.
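
As a rough illustration of the three-level generative process this abstract describes, the sketch below samples a single document: draw topic-word distributions and a per-document topic mixture from Dirichlet priors, then draw a topic and a word for each position. The hyperparameters alpha and eta, the vocabulary size, and the document length are illustrative assumptions, not values from the paper.

```python
# Sketch of LDA's generative process; all constants are illustrative.
import numpy as np

rng = np.random.default_rng(0)
K, V, doc_len = 3, 20, 50      # topics, vocabulary size, words per document
alpha, eta = 0.5, 0.1          # assumed Dirichlet hyperparameters

phi = rng.dirichlet([eta] * V, size=K)        # topic-word distributions (K x V)
theta = rng.dirichlet([alpha] * K)            # this document's topic mixture
z = rng.choice(K, size=doc_len, p=theta)      # a topic for each word slot
words = [rng.choice(V, p=phi[t]) for t in z]  # each word drawn from its topic
```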

30,570 citations


Additional excerpts

  • ...such data, we first show a comparative analysis of three topic models, namely Latent Dirichlet Allocation (LDA) [3], which is a generative statistical model where a set of observations is explained by unobserved groups....


Proceedings Article
03 Jan 2001
TL;DR: This paper proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).
Abstract: We propose a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams [6], and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI) [3]. In the context of text modeling, our model posits that each document is generated as a mixture of topics, where the continuous-valued mixture proportions are distributed as a latent Dirichlet random variable. Inference and learning are carried out efficiently via variational algorithms. We present empirical results on applications of this model to problems in text modeling, collaborative filtering, and text classification.

25,546 citations

Posted Content
TL;DR: This paper proposes two novel model architectures for computing continuous vector representations of words from very large datasets; the quality of these representations is measured in a word similarity task, and the results are compared to the previously best-performing techniques based on different types of neural networks.
Abstract: We propose two novel model architectures for computing continuous vector representations of words from very large data sets. The quality of these representations is measured in a word similarity task, and the results are compared to the previously best performing techniques based on different types of neural networks. We observe large improvements in accuracy at much lower computational cost, i.e. it takes less than a day to learn high quality word vectors from a 1.6 billion words data set. Furthermore, we show that these vectors provide state-of-the-art performance on our test set for measuring syntactic and semantic word similarities.
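
The word-vector training this abstract describes can be approximated with gensim's Word2Vec, used here as a stand-in for the authors' original tool; the toy corpus and every hyperparameter below are illustrative assumptions.

```python
# Minimal sketch of learning word vectors (skip-gram) with gensim,
# a stand-in for the paper's original implementation.
from gensim.models import Word2Vec

corpus = [["sensor", "fires", "at", "night"],
          ["sensor", "fires", "in", "the", "kitchen"],
          ["resident", "sleeps", "at", "night"]]  # assumed toy corpus

model = Word2Vec(corpus, vector_size=32, window=2, min_count=1, sg=1, epochs=50)
print(model.wv.most_similar("night", topn=2))  # nearest words by cosine similarity
```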

20,077 citations

Journal ArticleDOI
TL;DR: The decomposition of A is called the singular value decomposition (SVD), and the diagonal elements of $\Sigma$ are the non-negative square roots of the eigenvalues of $A^T A$; they are called singular values.
Abstract: Let A be a real m×n matrix with m ≥ n. It is well known (cf. [4]) that $$A = U \Sigma V^T \quad (1)$$ where $$U^T U = V^T V = V V^T = I_n \quad \text{and} \quad \Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_n).$$ The matrix U consists of n orthonormalized eigenvectors associated with the n largest eigenvalues of $AA^T$, and the matrix V consists of the orthonormalized eigenvectors of $A^T A$. The diagonal elements of $\Sigma$ are the non-negative square roots of the eigenvalues of $A^T A$; they are called singular values. We shall assume that $$\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_n \geq 0.$$ Thus if $\mathrm{rank}(A) = r$, then $\sigma_{r+1} = \sigma_{r+2} = \cdots = \sigma_n = 0$. The decomposition (1) is called the singular value decomposition (SVD).
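
A quick numerical check of decomposition (1) and the properties stated above; the matrix here is random and its dimensions are arbitrary.

```python
# Verify A = U @ diag(sigma) @ V.T, the ordering of the singular values,
# and that sigma^2 are the eigenvalues of A.T @ A.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))               # m x n with m >= n

U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
assert np.all(sigma[:-1] >= sigma[1:])        # sigma_1 >= ... >= sigma_n >= 0
assert np.allclose(A, U @ np.diag(sigma) @ Vt)
assert np.allclose(np.sort(sigma**2), np.linalg.eigvalsh(A.T @ A))
```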

3,036 citations


"Behavior Analysis through Routine C..." refers methods in this paper

  • ...So multiplying the three matrices gives $P(w|d) = \sum_{t \in Z} P(w|t)P(t|d)$. Let the number of documents be m and the size of the vocabulary be n, i.e., A has dimension m × n. After SVD, U will have dimension m × k, S will be k × k, and V will be k × n (see the sketch after these excerpts). 2) Non-Negative Matrix Factorization (NMF): This is also a matrix factorization method similar to SVD, with the constraint that the matrices have to be non-negative....


  • ...SVD decomposes the input matrix into three matrices as its output....


  • ...For the implementation of PLSA, we use Singular Value Decomposition (SVD) [13] to factorize the matrix, allowing us to work with matrices of lower dimensions....

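The rank-k truncation described in the first excerpt, and NMF as the analogous non-negative factorization, can be sketched as follows; the matrix sizes and k are illustrative assumptions.

```python
# Rank-k truncated SVD: keep the top k singular triplets, giving factors of
# shape (m, k), (k, k), and (k, n) as stated in the excerpt above.
import numpy as np
from sklearn.decomposition import NMF

m, n, k = 100, 40, 5                          # assumed sizes
rng = np.random.default_rng(0)
A = np.abs(rng.standard_normal((m, n)))       # non-negative, so NMF also applies

U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, S_k, Vt_k = U[:, :k], np.diag(s[:k]), Vt[:k, :]
print(U_k.shape, S_k.shape, Vt_k.shape)       # (100, 5) (5, 5) (5, 40)

# NMF factorizes the same matrix under a non-negativity constraint instead.
W = NMF(n_components=k, init="nndsvda", random_state=0).fit_transform(A)
print(W.shape)                                # (100, 5)
```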

Proceedings Article
30 Jul 1999
TL;DR: This work proposes a widely applicable generalization of maximum likelihood model fitting by tempered EM, based on a mixture decomposition derived from a latent class model, resulting in a more principled approach with a solid foundation in statistics.
Abstract: Probabilistic Latent Semantic Analysis is a novel statistical technique for the analysis of two-mode and co-occurrence data, which has applications in information retrieval and filtering, natural language processing, machine learning from text, and in related areas. Compared to standard Latent Semantic Analysis which stems from linear algebra and performs a Singular Value Decomposition of co-occurrence tables, the proposed method is based on a mixture decomposition derived from a latent class model. This results in a more principled approach which has a solid foundation in statistics. In order to avoid overfitting, we propose a widely applicable generalization of maximum likelihood model fitting by tempered EM. Our approach yields substantial and consistent improvements over Latent Semantic Analysis in a number of experiments.
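
A compact way to see the mixture decomposition and the tempered EM fit this abstract describes is the sketch below, which models $P(w|d) = \sum_z P(z|d)P(w|z)$ and raises the E-step terms to a power beta < 1. The count matrix, number of topics, iteration count, and beta are illustrative assumptions, not the paper's settings.

```python
# Minimal PLSA fit by tempered EM; all constants are illustrative.
import numpy as np

rng = np.random.default_rng(0)
N = rng.poisson(1.0, size=(200, 50))       # n(d, w): document-word counts
D, W = N.shape
K, beta = 8, 0.9                           # topics; beta < 1 tempers the E-step

p_z_d = rng.dirichlet([1.0] * K, size=D)   # P(z|d)
p_w_z = rng.dirichlet([1.0] * W, size=K)   # P(w|z)

for _ in range(50):
    # E-step: P(z|d,w) proportional to (P(z|d) * P(w|z)) ** beta
    post = (p_z_d[:, :, None] * p_w_z[None, :, :]) ** beta      # D x K x W
    post /= post.sum(axis=1, keepdims=True) + 1e-12
    # M-step: re-estimate both distributions from expected counts
    nz = N[:, None, :] * post
    p_w_z = nz.sum(axis=0); p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    p_z_d = nz.sum(axis=2); p_z_d /= p_z_d.sum(axis=1, keepdims=True)
```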

2,306 citations


"Behavior Analysis through Routine C..." refers methods in this paper

  • ...The input to PLSA is the document-word matrix, whose entries are modified to contain the counts of the occurrences of each word in each document....


  • ...Here A represents P(w|d), X represents P(w|t), and D represents P(t|d), which is similar to the matrix factorization described in the previous models, so LDA is also based on the same concept as PLSA and NMF....


  • ...1) Probabilistic Latent Semantic Analysis (PLSA): PLSA has not been applied to BA before....


  • ...Perplexity and coherence are both scored for probabilistic models like LDA, PLSA, and NMF, where the scores essentially represent the chances of discovering a new cluster (see the sketch after these excerpts)....


  • ...For the implementation of PLSA, we use Singular Value Decomposition (SVD) [13] to factorize the matrix, allowing us to work with matrices of lower dimensions....

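The perplexity scoring mentioned in the excerpts can be illustrated with sklearn's held-out perplexity for LDA; the placeholder data, the train/test split, and the number of topics are assumptions, and this sketch covers only the LDA case.

```python
# Held-out perplexity for an LDA routine model; lower is better.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)
X = rng.poisson(1.0, size=(400, 60))   # placeholder windowed sensor counts
X_train, X_test = X[:300], X[300:]

lda = LatentDirichletAllocation(n_components=10, random_state=0).fit(X_train)
print("held-out perplexity:", lda.perplexity(X_test))
```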