Blogger-Link-Topic model for blog mining
24 May 2011, pp. 28-39
TL;DR: This paper proposes the blogger-link-topic model for blog mining based on the multiple attributes of blog content, bloggers, and links and presents a unique blog classification framework that computes the normalized document-topic matrix, which is applied to retrieve the classification results.
Abstract: Blog mining is an important area of behavior informatics because it produces effective techniques for analyzing and understanding human behaviors from social media. In this paper, we propose the blogger-link-topic model for blog mining based on the multiple attributes of blog content, bloggers, and links. In addition, we present a unique blog classification framework that computes the normalized document-topic matrix, to which our model is applied to retrieve the classification results. After comparing the results for blog classification on real-world blog data, we find that our blogger-link-topic model outperforms the other techniques in terms of overall precision and recall. This demonstrates that the additional information contained in blog-specific attributes can help improve blog classification and retrieval results.
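The classification step the abstract describes can be sketched without the full model: given a document-topic matrix (however it was produced), normalize each row into a topic distribution and label each document by its dominant topic. This is a minimal illustration, not the paper's implementation; the matrix values below are made up.

```python
# Sketch of the normalized document-topic classification step: L1-normalize
# each row of a document-topic matrix, then assign each document to its
# highest-weight topic. Values are illustrative only.

def normalize_rows(doc_topic):
    """L1-normalize each row so it becomes a probability distribution."""
    normalized = []
    for row in doc_topic:
        total = sum(row)
        normalized.append([w / total for w in row] if total else list(row))
    return normalized

def classify(doc_topic):
    """Assign each document to its dominant topic index."""
    return [max(range(len(row)), key=row.__getitem__)
            for row in normalize_rows(doc_topic)]

doc_topic = [
    [3.0, 1.0, 0.5],   # document 0: weighted toward topic 0
    [0.2, 0.2, 4.0],   # document 1: weighted toward topic 2
]
print(classify(doc_topic))  # [0, 2]
```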
Citations
16 Jan 2020
TL;DR: The state-of-the-art text mining approaches and techniques used for analyzing transcripts and speeches, meeting transcripts, and academic journal articles, as well as websites, emails, blogs, and social media platforms, are investigated.
Abstract: Text mining in big data analytics is emerging as a powerful tool for harnessing the power of unstructured textual data by analyzing it to extract new knowledge and to identify significant patterns and correlations hidden in the data. This study seeks to determine the state of text mining research by examining the developments within published literature over past years and provide valuable insights for practitioners and researchers on the predominant trends, methods, and applications of text mining research. In accordance with this, more than 200 academic journal articles on the subject are included and discussed in this review; the state-of-the-art text mining approaches and techniques used for analyzing transcripts and speeches, meeting transcripts, and academic journal articles, as well as websites, emails, blogs, and social media platforms, across a broad range of application areas are also investigated. Additionally, the benefits and challenges related to text mining are also briefly outlined.
103 citations
TL;DR: Two methods, K-Nearest Neighbor (KNN) and Artificial Neural Networks (ANNs), are applied to the Kohkiloye and Boyer Ahmad province bloggers dataset, classifying each blogger from its input features more accurately than previously proposed algorithms.
Abstract: Blogs are one of the effective tools of Web 2.0 and a major module of the social and interactive capabilities that make the IT world wonderful for cyber and virtual living. Two methods were used in this paper: K-Nearest Neighbor (KNN) and Artificial Neural Networks (ANNs). These methods classify the Kohkiloye and Boyer Ahmad province bloggers dataset using the input features of each blogger, and compare favorably with previously proposed algorithms. Our simulations and experiments provide not only promising results but also higher prediction and classification rates.
50 citations
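The k-nearest-neighbor approach used in the study above can be sketched in a few lines: classify a query point by majority vote among its k closest training points. The feature vectors and labels below are hypothetical stand-ins, not the bloggers dataset.

```python
# Minimal KNN classifier of the kind applied to blogger features.
# Training data and labels here are made up for illustration.
import math
from collections import Counter

def knn_predict(train, labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    nearest = sorted(range(len(train)),
                     key=lambda i: math.dist(train[i], query))[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

train = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]]
labels = ["inactive", "inactive", "pro", "pro"]
print(knn_predict(train, labels, [4.8, 5.1]))  # prints "pro"
```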
TL;DR: The factors that influence a blogger to behave professionally are identified based on the classifier with the best results, and the causes behind the varying performance of algorithms are elaborated.
27 citations
TL;DR: This study analyzes domestic articles related to AI using a topic modeling method based on the LDA algorithm to determine the new value that can be created through the convergence between artificial intelligence technology (AIT) and all industries.
Abstract: The present study identifies new value that can be created through the convergence between artificial intelligence technology (AIT) and all industries by deriving and thoroughly analyzing major issues related to artificial intelligence (AI). This study analyzes domestic articles related to AI using a topic modeling method based on the LDA algorithm.
6 citations
Additional excerpts
...Park Jahyun and Song Min (2013) analyzed domestic library and information science research trends through topic modeling [2], and Flora (2011) analyzed content trends of web blogs using topic modeling [3]....
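The LDA-based topic modeling the study above applies can be illustrated with a toy collapsed Gibbs sampler: repeatedly resample each token's topic from counts, then read off a normalized document-topic matrix. This is a teaching sketch, not the study's pipeline; the corpus and hyperparameters are invented.

```python
# Toy collapsed Gibbs sampler for LDA. Corpus, alpha, and beta are
# illustrative; real applications use a library implementation.
import random
from collections import defaultdict

def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    rng = random.Random(seed)
    V = len({w for d in docs for w in d})              # vocabulary size
    ndk = [[0] * n_topics for _ in docs]               # doc-topic counts
    nkw = [defaultdict(int) for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                                # topic totals
    z = []                                             # token topic assignments
    for d, doc in enumerate(docs):                     # random initialization
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
        z.append(zs)
    for _ in range(n_iter):                            # Gibbs sweeps
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                            # remove token's count
                ndk[d][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta)
                           / (nk[t] + V * beta) for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k                            # re-add under new topic
                ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    # smoothed, row-normalized document-topic matrix
    return [[(c + alpha) / (sum(row) + n_topics * alpha) for c in row]
            for row in ndk]

docs = [["ai", "model", "ai"], ["market", "trade", "market"], ["ai", "model"]]
theta = lda_gibbs(docs, n_topics=2)
print(theta)
```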
References
TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
Abstract: We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document. We present efficient approximate inference techniques based on variational methods and an EM algorithm for empirical Bayes parameter estimation. We report results in document modeling, text classification, and collaborative filtering, comparing to a mixture of unigrams model and the probabilistic LSI model.
30,570 citations
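The three-level generative process summarized in the abstract above can be written out explicitly; the following is the standard LDA formulation for a single document of N words, with Dirichlet parameter α and per-topic word distributions β:

```latex
% LDA generative process for one document
\theta \sim \mathrm{Dirichlet}(\alpha)
\quad\text{(per-document topic proportions)}

\text{for } n = 1, \dots, N:\quad
z_n \sim \mathrm{Multinomial}(\theta), \qquad
w_n \mid z_n \sim \mathrm{Multinomial}(\beta_{z_n})

% joint distribution of topic mixture, topic assignments, and words
p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta)
  = p(\theta \mid \alpha) \prod_{n=1}^{N} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta)
```

Inference then amounts to approximating the posterior over θ and z, which the paper does with variational methods.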
03 Jan 2001
TL;DR: This paper proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).
Abstract: We propose a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams [6], and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI) [3]. In the context of text modeling, our model posits that each document is generated as a mixture of topics, where the continuous-valued mixture proportions are distributed as a latent Dirichlet random variable. Inference and learning are carried out efficiently via variational algorithms. We present empirical results on applications of this model to problems in text modeling, collaborative filtering, and text classification.
25,546 citations
07 Jul 2004
TL;DR: The author-topic model is introduced, a generative model for documents that extends Latent Dirichlet Allocation to include authorship information, and applications to computing similarity between authors and entropy of author output are demonstrated.
Abstract: We introduce the author-topic model, a generative model for documents that extends Latent Dirichlet Allocation (LDA; Blei, Ng, & Jordan, 2003) to include authorship information. Each author is associated with a multinomial distribution over topics and each topic is associated with a multinomial distribution over words. A document with multiple authors is modeled as a distribution over topics that is a mixture of the distributions associated with the authors. We apply the model to a collection of 1,700 NIPS conference papers and 160,000 CiteSeer abstracts. Exact inference is intractable for these datasets and we use Gibbs sampling to estimate the topic and author distributions. We compare the performance with two other generative models for documents, which are special cases of the author-topic model: LDA (a topic model) and a simple author model in which each author is associated with a distribution over words rather than a distribution over topics. We show topics recovered by the author-topic model, and demonstrate applications to computing similarity between authors and entropy of author output.
1,554 citations
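The generative story in the abstract above is easy to simulate: each word of a multi-author document picks one of the authors, then a topic from that author's topic distribution, then a word from that topic's word distribution. The distributions below are invented for illustration, not taken from the paper.

```python
# Sketch of the author-topic model's document-generation process.
# Author-topic and topic-word distributions here are made up.
import random

def generate_doc(authors, author_topic, topic_word, n_words, rng):
    """Generate one document under the author-topic generative story."""
    doc = []
    topics = list(range(len(topic_word)))
    for _ in range(n_words):
        a = rng.choice(authors)                              # pick an author uniformly
        k = rng.choices(topics, weights=author_topic[a])[0]  # topic from that author
        words = list(topic_word[k])
        w = rng.choices(words,                               # word from that topic
                        weights=[topic_word[k][v] for v in words])[0]
        doc.append(w)
    return doc

author_topic = {"ana": [0.9, 0.1], "ben": [0.2, 0.8]}
topic_word = [{"graph": 0.7, "node": 0.3}, {"gene": 0.6, "cell": 0.4}]
print(generate_doc(["ana", "ben"], author_topic, topic_word, 6, random.Random(1)))
```

Inference in the paper runs this story in reverse, using Gibbs sampling to recover the author-topic and topic-word distributions from observed documents.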
22 Aug 2004
TL;DR: The methodology is applied to a large corpus of 160,000 abstracts and 85,000 authors from the well-known CiteSeer digital library, and a model with 300 topics is learned using a Markov chain Monte Carlo algorithm.
Abstract: We propose a new unsupervised learning technique for extracting information from large text collections. We model documents as if they were generated by a two-stage stochastic process. Each author is represented by a probability distribution over topics, and each topic is represented as a probability distribution over words for that topic. The words in a multi-author paper are assumed to be the result of a mixture of each authors' topic mixture. The topic-word and author-topic distributions are learned from data in an unsupervised manner using a Markov chain Monte Carlo algorithm. We apply the methodology to a large corpus of 160,000 abstracts and 85,000 authors from the well-known CiteSeer digital library, and learn a model with 300 topics. We discuss in detail the interpretation of the results discovered by the system including specific topic and author models, ranking of authors by topic and topics by author, significant trends in the computer science literature between 1990 and 2002, parsing of abstracts by topics and authors and detection of unusual papers by specific authors. An online query interface to the model is also discussed that allows interactive exploration of author-topic models for corpora such as CiteSeer.
618 citations
01 Jan 2000
TL;DR: A joint probabilistic model for modeling the contents and inter-connectivity of document collections such as sets of web pages or research paper archives is described, based on a probabilistic factor decomposition.
Abstract: We describe a joint probabilistic model for modeling the contents and inter-connectivity of document collections such as sets of web pages or research paper archives. The model is based on a probabilistic factor decomposition and allows identifying principal topics of the collection as well as authoritative documents within those topics. Furthermore, the relationships between topics is mapped out in order to build a predictive model of link content. Among the many applications of this approach are information retrieval and search, topic identification, query disambiguation, focused web crawling, web authoring, and bibliometric analysis.
519 citations