Open Access Book
The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data
Ronen Feldman, James Sanger
TL;DR: Providing an in-depth examination of core text mining and link detection algorithms and operations, this text examines advanced pre-processing techniques, knowledge representation considerations, and visualization approaches.
Abstract:
1. Introduction to text mining
2. Core text mining operations
3. Text mining preprocessing techniques
4. Categorization
5. Clustering
6. Information extraction
7. Probabilistic models for information extraction
8. Preprocessing applications using probabilistic and hybrid approaches
9. Presentation-layer considerations for browsing and query refinement
10. Visualization approaches
11. Link analysis
12. Text mining applications
Appendix
Bibliography
Citations
Book
Natural Language Processing with Python
TL;DR: This book offers a highly accessible introduction to natural language processing, the field that supports a variety of language technologies, from predictive text and email filtering to automatic summarization and translation.
Proceedings ArticleDOI
Detecting large-scale system problems by mining console logs
TL;DR: This paper proposes a general methodology for mining console logs to automatically detect system runtime problems. It combines source code analysis with information retrieval to create composite features, then analyzes these features using machine learning to detect operational problems.
Proceedings Article
Detecting Large-Scale System Problems by Mining Console Logs
TL;DR: This work first parses console logs by combining source code analysis with information retrieval to create composite features, and then analyzes these features using machine learning to automatically detect system runtime problems.
Proceedings Article
From throw-away traffic to bots: detecting the rise of DGA-based malware
Manos Antonakakis, Roberto Perdisci, Yacin Nadji, Nikolaos Vasiloglou, Saeed Abu-Nimeh, Wenke Lee, David Dagon
TL;DR: A new technique to detect randomly generated domains without reversing is presented. It builds on two observations: most of the DGA-generated domains that a bot queries result in Non-Existent Domain (NXDomain) responses, and bots from the same botnet (with the same DGA) generate similar NXDomain traffic.
Journal ArticleDOI
The power of social media analytics
Weiguo Fan, Michael D. Gordon
TL;DR: How to use, and influence, consumer social communications to improve business performance, reputation, and profit.
References
Book
Elements of information theory
Thomas M. Cover, Joy A. Thomas
TL;DR: The authors examine the role of entropy, inequalities, and randomness in the design and construction of codes.
Book
The Nature of Statistical Learning Theory
TL;DR: Setting of the learning problem; consistency of learning processes; bounds on the rate of convergence of learning processes; controlling the generalization ability of learning processes; constructing learning algorithms; what is important in learning theory?
Journal ArticleDOI
Latent Dirichlet allocation
TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
Proceedings Article
Latent Dirichlet Allocation
TL;DR: This paper proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models, including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).
Journal ArticleDOI
A tutorial on hidden Markov models and selected applications in speech recognition
TL;DR: This tutorial provides an overview of the basic theory of hidden Markov models (HMMs) as originated by L.E. Baum and T. Petrie (1966) and gives practical details on methods of implementing the theory, along with a description of selected applications of HMMs to distinct problems in speech recognition.