Proceedings ArticleDOI
A Hybrid Document Feature Extraction Method Using Latent Dirichlet Allocation and Word2Vec
Zhibo Wang,Long Ma,Yan-Qing Zhang +2 more
- pp 98-103
TLDR
Experimental results indicate that document features generated by the hybrid method are useful to improve classification performance by consolidating both global and local relationships.
Abstract:
Latent Dirichlet Allocation (LDA) is a probabilistic topic model that discovers latent topics in a collection of documents and describes each document as a probability distribution over those topics. It defines a global hierarchical relationship from words to topics and from topics to documents. Word2Vec is a word-embedding model that predicts a target word from its surrounding context words. In this paper, we propose a hybrid approach that extracts document features as a bag-of-distances in a semantic space. By using both Word2Vec and LDA, our hybrid method captures the relationships between documents and topics and also integrates the contextual relationships among words. Experimental results indicate that the document features generated by our hybrid method help improve classification performance by consolidating both global and local relationships.
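The abstract does not spell out how the bag-of-distances is built, so the following is only a minimal sketch under stated assumptions: each LDA topic is represented by the probability-weighted centroid of its top words' Word2Vec vectors, each document by the centroid of its tokens' vectors, and the feature vector concatenates the LDA document-topic distribution with the Euclidean distances from the document to every topic centroid. The toy `word_vectors` and `topic_word` values, and the function names, are hypothetical stand-ins for trained models, not the authors' implementation.

```python
import math

# Toy stand-ins for trained models (all numbers are made up for illustration).
# word_vectors: what a trained Word2Vec model would supply.
word_vectors = {
    "goal":   [0.9, 0.1],
    "match":  [0.8, 0.2],
    "stock":  [0.1, 0.9],
    "market": [0.2, 0.8],
}
# topic_word: top words of each LDA topic with their probabilities.
topic_word = [
    {"goal": 0.6, "match": 0.4},    # topic 0
    {"stock": 0.5, "market": 0.5},  # topic 1
]


def weighted_centroid(weights):
    """Weighted mean of the word vectors named in `weights` (word -> weight)."""
    total = sum(weights.values())
    dim = len(next(iter(word_vectors.values())))
    centroid = [0.0] * dim
    for word, w in weights.items():
        for i, x in enumerate(word_vectors[word]):
            centroid[i] += (w / total) * x
    return centroid


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def bag_of_distances(tokens):
    """Distances from the document centroid to each topic centroid (assumed design)."""
    counts = {}
    for t in tokens:
        if t in word_vectors:  # skip out-of-vocabulary tokens
            counts[t] = counts.get(t, 0) + 1
    if not counts:
        raise ValueError("document has no in-vocabulary tokens")
    doc_vec = weighted_centroid(counts)
    return [euclidean(doc_vec, weighted_centroid(t)) for t in topic_word]


def hybrid_features(tokens, doc_topic):
    """Concatenate the LDA document-topic distribution with the distance features."""
    return list(doc_topic) + bag_of_distances(tokens)


# doc_topic would come from LDA inference; here it is a made-up distribution.
features = hybrid_features(["goal", "match", "goal"], [0.9, 0.1])
```

In this sketch the first half of `features` carries the global document-topic relationship from LDA, while the second half reflects local word context through the embedding space; a smaller distance means the document sits closer to that topic. The resulting vector could then be passed to any standard classifier.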
Citations
Journal ArticleDOI
A Review of Text Corpus-Based Tourism Big Data Mining
TL;DR: A detailed and up-to-date review of text mining techniques that have been, or have the potential to be, applied to modern tourism big data analysis and their applications in tourist profiling, destination image analysis, market demand, etc.
Proceedings ArticleDOI
LDA Meets Word2Vec: A Novel Model for Academic Abstract Clustering
Chang-Zhou Li,Yao Lu,Jun-Feng Wu,Yongrui Zhang,Xia Zhongzhou,Tianchen Wang,Dantian Yu,Xurui Chen,Peidong Liu,Guo Junyu +9 more
TL;DR: A novel clustering model that clusters on abstract text rather than keywords, because keywords can be ambiguous and have led to unsatisfactory clustering results in previous work. Experimental results show that the clustering results of PW-LDA are much more accurate and stable than those of state-of-the-art techniques.
Journal ArticleDOI
Tourism Review Sentiment Classification Using a Bidirectional Recurrent Neural Network with an Attention Mechanism and Topic-Enriched Word Vectors
TL;DR: A bidirectional gated recurrent unit neural network model (BiGRULA) is proposed for sentiment analysis by combining a topic model (lda2vec) and an attention mechanism that allows for more coherent topics from these reviews and achieves good performance in sentiment classification.
Journal ArticleDOI
Exploring the donation allocation of online charitable crowdfunding based on topical and spatial analysis: Evidence from the Tencent GongYi
TL;DR: A comparative analysis of four types of crowdfunding projects to examine differences in their general characteristics and donation allocation suggests that the success rates for these four different types of online charitable crowdfunding vary and that the key influencing factor among them is the type of project executors.
Journal ArticleDOI
Classification of movie reviews using term frequency-inverse document frequency and optimized machine learning algorithms
TL;DR: This study provides the implementation of various machine learning models to measure the polarity of the sentiments presented in user reviews on the IMDb website and indicates that the SVM obtains the highest accuracy when used with TF-IDF features and achieves an accuracy of 89.55%.
References
Journal Article
Scikit-learn: Machine Learning in Python
Fabian Pedregosa,Gaël Varoquaux,Alexandre Gramfort,Vincent Michel,Bertrand Thirion,Olivier Grisel,Mathieu Blondel,Peter Prettenhofer,Ron Weiss,Vincent Dubourg,Jake Vanderplas,Alexandre Passos,David Cournapeau,Matthieu Brucher,Matthieu Perrot,Edouard Duchesnay +15 more
TL;DR: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems, focusing on bringing machine learning to non-specialists using a general-purpose high-level language.
Journal ArticleDOI
Latent Dirichlet Allocation
TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
Proceedings ArticleDOI
Glove: Global Vectors for Word Representation
TL;DR: A new global log-bilinear regression model that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
Posted Content
Scikit-learn: Machine Learning in Python
Fabian Pedregosa,Gaël Varoquaux,Alexandre Gramfort,Vincent Michel,Bertrand Thirion,Olivier Grisel,Mathieu Blondel,Andreas Müller,Joel Nothman,Gilles Louppe,Peter Prettenhofer,Ron Weiss,Vincent Dubourg,Jake Vanderplas,Alexandre Passos,David Cournapeau,Matthieu Brucher,Matthieu Perrot,Edouard Duchesnay +18 more
TL;DR: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems.
Proceedings Article
Latent Dirichlet Allocation
TL;DR: This paper proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).