Open Access · Proceedings Article · DOI

Sparse Latent Semantic Analysis.

TLDR
A new model called Sparse LSA is proposed, which produces a sparse projection matrix via ℓ1 regularization; it achieves performance similar to LSA but is more efficient in projection computation and storage, and better explains topic-word relationships.
Abstract
Latent semantic analysis (LSA), one of the most popular unsupervised dimension reduction tools, has a wide range of applications in text mining and information retrieval. The key idea of LSA is to learn a projection matrix that maps the high dimensional vector space representations of documents to a lower dimensional latent space, i.e., the so-called latent topic space. In this paper, we propose a new model called Sparse LSA, which produces a sparse projection matrix via ℓ1 regularization. Compared to traditional LSA, Sparse LSA selects only a small number of relevant words for each topic and hence provides a compact representation of topic-word relationships. Moreover, Sparse LSA is computationally very efficient, with much less memory usage for storing the projection matrix. Furthermore, we propose two important extensions of Sparse LSA: group structured Sparse LSA and non-negative Sparse LSA. We conduct experiments on several benchmark datasets and compare Sparse LSA and its extensions with several widely used methods, e.g. LSA, Sparse Coding, and LDA. Empirical results suggest that Sparse LSA achieves performance similar to LSA, but is more efficient in projection computation and storage, and also better explains topic-word relationships.
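The abstract's core idea, a projection matrix made sparse by ℓ1 regularization, can be illustrated with a small alternating-minimization sketch: minimize ||X − UA||²_F + λ||A||₁ with orthonormal U, so the A-step reduces to soft-thresholding. This is an illustrative reconstruction rather than the paper's reference implementation; the function name `sparse_lsa`, the random initialization, and the fixed iteration count are assumptions.

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def sparse_lsa(X, k, lam=0.1, n_iter=50, seed=0):
    """Sketch of Sparse LSA via alternating minimization (names/initialization
    are assumptions, not the paper's code):
        min_{U,A} 0.5 * ||X - U A||_F^2 + lam * ||A||_1   s.t.  U^T U = I
    X is the n x d document-term matrix; A (k x d) is the sparse projection
    that maps a d-dimensional document vector into the k-dimensional topic space.
    """
    n, d = X.shape
    rng = np.random.default_rng(seed)
    A = 0.01 * rng.standard_normal((k, d))        # small random start
    for _ in range(n_iter):
        # U-step: orthogonal Procrustes problem; with X A^T = P S Q^T,
        # the optimum over orthonormal U is U = P Q^T.
        P, _, Qt = np.linalg.svd(X @ A.T, full_matrices=False)
        U = P @ Qt
        # A-step: because U^T U = I, the subproblem has the closed form
        # A = soft_threshold(U^T X, lam), which zeroes out weak topic-word links.
        A = soft_threshold(U.T @ X, lam)
    return U, A
```

Larger `lam` drives more entries of A to exactly zero, which is what gives Sparse LSA its compact topic-word representation and cheap projection.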



Citations
Proceedings Article

Angular Quantization-based Binary Codes for Fast Similarity Search

TL;DR: This work introduces a novel angular quantization-based binary coding (AQBC) technique for high-dimensional non-negative data, which arises in vision and text applications where counts or frequencies are used as features. It proposes a method for mapping feature vectors to their smallest-angle binary vertices that scales as O(d log d).
Proceedings Article

Learning Topics in Short Texts by Non-negative Matrix Factorization on Term Correlation Matrix

TL;DR: This paper introduces a novel way to compute term correlation in short texts by representing each term with its co-occurring terms, and formulates the topic learning problem as symmetric non-negative matrix factorization of the term correlation matrix.
Journal Article · DOI

Automated risk identification using NLP in cloud based development environments

TL;DR: This work addresses the need for automated risk assessment, using NLP to automatically identify risks through analysis of weaknesses and vulnerabilities.
Proceedings Article · DOI

Regularized latent semantic indexing

TL;DR: Regularized Latent Semantic Indexing (RLSI), a new method designed for parallelization, is introduced; it is as effective as existing topic models and scales to larger datasets without reducing the input vocabulary.
Proceedings Article

Harmonious hashing

TL;DR: A novel hashing algorithm called Harmonious Hashing is introduced, which aims to learn hash functions with low information loss; it learns a set of optimized projections that preserve the maximum cumulative energy while, as far as possible, satisfying the constraint of equal variance on each dimension.
References
Journal Article · DOI

LIBSVM: A library for support vector machines

TL;DR: Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
Journal Article · DOI

Regression Shrinkage and Selection via the Lasso

TL;DR: A new method for estimation in linear models called the lasso, which minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant, is proposed.
Journal Article · DOI

Gene Ontology: tool for the unification of biology

TL;DR: The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing.
Journal Article · DOI

Latent dirichlet allocation

TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
Proceedings Article

Latent Dirichlet Allocation

TL;DR: This paper proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models, including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).