Sentiment-Based Topic Suggestion for Micro-Reviews

Ziyu Lu (The University of Hong Kong), Nikos Mamoulis, Evaggelia Pitoura, and Panayiotis Tsaparas (University of Ioannina)
zylu@cs.hku.hk, {nikos,pitoura,tsap}@cs.uoi.gr
Abstract
Location-based social sites, such as Foursquare or Yelp,
are gaining increasing popularity. These sites allow users to
check in at venues and leave a short commentary in the form
of a micro-review. Micro-reviews are rich in content as they
offer a distilled and concise account of user experience. In
this paper we consider the problem of predicting the topic
of a micro-review by a user who visits a new venue. Such a
prediction can help users make informed decisions, and also
help venue owners personalize users’ experiences. However,
topic modeling for micro-reviews is particularly difficult, due
to their short and fragmented nature. We address this issue
using pooling strategies, which aggregate micro-reviews at
the venue or user level, and we propose novel probabilistic
models based on Latent Dirichlet Allocation (LDA) for extracting the topics related to a user-venue pair. Our best topic model integrates influences from both venue-inherent properties and user preferences, considering at the same time the sentiment orientation of the users. Experimental results on real
datasets demonstrate the superiority of this model compared
to simpler models and previous work; they also show that
venue-inherent properties have higher influence on the top-
ics of micro-reviews.
Introduction
In the past few years, location-based social sites, such as Foursquare (https://foursquare.com/), Yelp (http://www.yelp.com/) and Facebook Places (https://www.facebook.com/places/), have emerged as prime online destinations, where users can record their footprints via check-ins, as well as their experience through micro-reviews. Micro-reviews, or tips, accompany a check-in at a venue, and they contain a short commentary on the venue. Tips may offer information about the venue, opinions on what is good, or advice for new customers. They are very targeted and concise, and they provide a distilled account of the experience of the users in the venue. They are a fast-growing corpus, and they have recently attracted considerable research interest (Aggarwal, Almeida, and Kumaraguru 2013; Moraes et al. 2013; Nguyen, Lauw, and Tsaparas 2015) for the rich content they contain.

Copyright © 2016, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
In this paper, we consider the problem of predicting the
topic that a user will comment on in her tip when checking
in to a new venue. This is a problem of great practical im-
portance for both users and venues. For venues, knowing in
advance the aspect of the venue that a user will most likely
focus on, allows them to offer a personalized experience to
the user. For example, if we can predict that when visiting a
restaurant a specific user is likely to comment on the quality
of the service, the manager of the venue can make sure to
fine-tune their service in order to meet the customer’s needs.
Furthermore, exposing the topic prediction to the users pro-
vides fine-grain information about the venue. For example,
for a user who is a wine enthusiast, recommending a restau-
rant, and predicting that the user is likely to comment on the
extensive wine selection of the place adds texture and con-
text to the recommendation.
Similar problems have been considered in the context of
full-text reviews, where the goal is to estimate the rating
of specific aspects of venues (Wang, Lu, and Zhai 2011;
Hai et al. 2014; Wang, Lu, and Zhai 2010; Moghaddam and
Ester 2011). These approaches rely on generative models
that extract latent topical aspects and their ratings. However,
applying such techniques to micro-reviews is not straightfor-
ward. Micro-reviews have special characteristics, very dif-
ferent from those of reviews. First, they are very short (up
to 200 characters), and provide very limited context and few
word co-occurrences (Lin et al. 2014). Second, due to the
length limitation, the expression is very dense, and the text
is often fragmented and poorly structured. Finally, micro-
reviews often contain diverse pieces of information stitched
together in a few sentences, resulting in incoherent seman-
tics. These characteristics make opinion mining and analysis
on micro-reviews harder compared to full-text reviews.
To the best of our knowledge, we are the first to consider the
problem of topic prediction for micro-reviews.
In order to address the limited and incomplete nature of micro-reviews, we use pooling strategies for document collaborative filtering (Mehrotra et al. 2013; Weng et al. 2010) and contextualization (Tang, Zhang, and Mei 2013) to integrate multiple contexts. We aggregate all micro-
reviews of a venue and a user into a single venue-document
and user-document respectively, and we consider probabilis-
tic topic models on the aggregated documents for the prob-
lem of topic prediction. We first define two simple mod-

els that apply Latent Dirichlet Allocation (LDA) (Blei, Ng,
and Jordan 2003) individually on the venue and user doc-
uments. Building on these models, we propose two novel
models, Authority-LDA (ALDA) and Authority-Sentiment
LDA (ASLDA), which integrate both venue inherent proper-
ties and user personalized preferences. In the ASLDA model
we add a sentiment layer to cluster topics into different sen-
timent groups based on the assumption that users might have
personalized sentiment orientation (e.g. some users tend to
be negative), and they are more likely to comment on as-
pects (topics) that match their sentiment orientation (e.g.,
negative users will tend to comment on negative aspects).
The ASLDA model can predict not just the topic of a fu-
ture tip, but also the most probable sentiment orientation.
We evaluate the proposed models on datasets from two real location-based social sites, Foursquare and Yelp. Experimental results show that our methods outperform competing approaches.
In summary, in this paper we make the following contributions:
• We define the problem of topic prediction for micro-reviews when a user checks in at an unvisited venue. To the best of our knowledge, this is the first work that deals with topic prediction and suggestion in the context of micro-reviews at location-based social sites.
• We define four probabilistic models for the problem, including two novel probabilistic models that leverage both venue-inherent aspects and user personalized preferences.
• The Authority-Sentiment LDA (ASLDA) model introduces a novel way to incorporate the influence of the user sentiment orientation in the topic prediction, and it is able to predict the sentiment orientation of a future tip.
• We evaluate the proposed methods for topic prediction on four datasets from two real location-based social sites, Foursquare and Yelp, and compare against other approaches.
Related Work
Micro-reviews are a relatively new corpus that has only recently drawn the attention of the research community. There is work on micro-reviews for spam detection (Aggarwal, Almeida, and Kumaraguru 2013), polarity analysis (Moraes et al. 2013), and micro-review summarization (Nguyen, Lauw, and Tsaparas 2015). To the best of our knowledge we are the
first to consider the problem of topic prediction for micro-
reviews.
Topic modeling algorithms have been widely adopted in
text mining (Blei, Ng, and Jordan 2003; Rosen-Zvi et al.
2004). One of the first such models, proposed by Blei et al.
(Blei, Ng, and Jordan 2003), is Latent Dirichlet allocation
(LDA). Many topic models based on LDA have been de-
veloped to address review mining problems. For example,
Moghaddam and Ester (Moghaddam and Ester 2011) intro-
duced an Interdependent Latent Dirichlet Allocation (ILDA)
model to infer latent aspects and their ratings for online
product reviews. Lin and He (Lin and He 2009) proposed
a joint sentiment-topic model (JST) for sentiment analysis
of movie reviews, by extending LDA with a new sentiment
layer. JST is based on the assumption that topic genera-
tion depends on sentiments, and word generation depends
on sentiment-topic pairs.
However, (Lin et al. 2014) showed that the characteris-
tics of short text reduce the effectiveness of topic model-
ing methods. Micro-reviews in location-based social sites
are very short, and have a relatively small vocabulary
and a broad range of topics. The probability of word co-
occurrence in the micro-reviews is very small, compromis-
ing the performance of topic models originally designed for
long reviews. In order to address this data sparsity problem,
heuristics such as document pooling (Mehrotra et al. 2013;
Weng et al. 2010) or contextualization (Tang, Zhang, and
Mei 2013) have been proposed to improve the performance
of topic modeling on short text. For instance, Mehrotra et al.
(Mehrotra et al. 2013) proposed to aggregate all documents
by the same author or all documents with specific hashtags
and form pooling documents on which topic modeling can
be applied effectively. Contextualized topic models have been proposed to integrate particular types of contexts into classical models like LDA, either by introducing additional layers to the topic model (Jo and Oh 2011; Rosen-Zvi et al. 2004; Lin and He 2009) or by using a coin-flipping selection process to select among contexts (Paul and Girju 2010; Tang, Zhang, and Mei 2013; Zhao et al. 2011). The author-topic model (AT) of (Rosen-Zvi et al. 2004), which utilizes authorship information for modeling scientific publications, can also be viewed as a contextualized topic model. Tang et al. (Tang,
Zhang, and Mei 2013) proposed a model, which formulates
different types of contexts as multiple views of the partition
of the corpus and uses voting to determine consensus topics.
Our models adopt both pooling methods and contextu-
alization in order to facilitate topic discovery for micro-
reviews in location-based social sites. We aggregate micro-
reviews on the same venue or micro-reviews by the same au-
thor to construct aggregated pooling documents. To the ag-
gregated documents, we add additional context such as au-
thority information and the sentiment orientation of users to
improve latent topic learning. Prior work on sentiment-topic
models (Lin and He 2009; Moghaddam and Ester 2011) in-
troduced sentiment as a latent (unknown) variable based on
some assumptions about the dependencies between the sentiment variable and the topic variable, and then jointly learned the sentiments and topics. In our work, on the other hand, we assume that the sentiment information is observed (from an existing sentiment lexicon for short text) and utilize the sentiment orientation of users to enhance the process of topic discovery.
Problem Definition
In this section we introduce some terminology and define
our problem.
A location-based social site consists of a set of users, a set of venues, and a collection of micro-reviews. Formally, we will use V to denote the set of venues in the site, and A to denote the set of all users (authors) in the site. A tip t is a short piece of text written by a user u ∈ A for a venue v ∈ V. A tip is defined over a vocabulary W consisting of the union of all terms that appear in all tips in our collection. We assume that stop-words have been removed and that terms have been stemmed. We define a micro-review r = ⟨u, v, t⟩ as a triplet consisting of a user u ∈ A, a venue v ∈ V, and a tip t that was left by the user u about the venue v. We use R to denote the set of all micro-reviews.
As tips consist of short text, studying them individually is not very informative. We thus use pooling methods to construct aggregated documents for a venue or a user. For a venue v, we use A_v to denote the set of all users that have written a micro-review for venue v, and R_v to denote the collection of all micro-reviews for venue v. We use d_v to denote the venue-document defined by taking the union of all the tips in R_v, and W_{d_v} = {w_1, w_2, ..., w_m} to denote the vocabulary of the document d_v. In a symmetric fashion, for a user u we define the venue set V_u, the micro-review set R_u, the user-document d_u, and the vocabulary W_u.
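The pooling step just described can be sketched as follows; a minimal illustration (function and variable names are ours, not from the paper), assuming micro-reviews arrive as (user, venue, tip) triples and whitespace tokenization suffices:

```python
from collections import defaultdict

def build_pooled_documents(micro_reviews):
    """Aggregate micro-reviews into one venue-document per venue (d_v)
    and one user-document per user (d_u); also collect A_v, the set of
    users who have written a micro-review for each venue."""
    venue_docs = defaultdict(list)    # d_v: all tip terms left at venue v
    user_docs = defaultdict(list)     # d_u: all tip terms written by user u
    venue_authors = defaultdict(set)  # A_v: users who reviewed venue v
    for user, venue, tip in micro_reviews:
        terms = tip.split()
        venue_docs[venue].extend(terms)
        user_docs[user].extend(terms)
        venue_authors[venue].add(user)
    return dict(venue_docs), dict(user_docs), dict(venue_authors)

# toy collection of micro-reviews r = (u, v, t)
R = [("u1", "v1", "great coffee"),
     ("u2", "v1", "slow service"),
     ("u1", "v2", "great wine list")]
venue_docs, user_docs, A = build_pooled_documents(R)
```

Stop-word removal and stemming, which the paper assumes have already been applied, would precede this aggregation.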
Given a collection of (user or venue) documents D = {d_1, d_2, ...}, using topic-modeling techniques we can extract a set of K latent topics Z = {z_1, ..., z_K}. Each topic z_i is defined as a distribution over the vocabulary W. Our goal is, given a user-venue pair (u, v) for which there is currently no micro-review r, to predict the latent topic of that tip. Formally, we define our problem as follows.

Problem Definition: Given a social site consisting of a collection of users A, venues V, and micro-reviews R, a set of latent topics Z, a user-venue pair (u, v), and a number N, identify a set of N latent topics Z_N ⊆ Z that the user u is most likely to comment on about venue v.
Proposed models
Figure 1: LDA models: (a) VLDA, (b) ULDA
LDA model
The LDA model was proposed in (Blei, Ng, and Jordan 2003). It represents each document as a multinomial distribution over K latent topics, and each topic as a multinomial distribution over terms (words). We apply LDA to our pooled documents (venue or user documents). From each type (venue/user) of document collection, we derive two distributions. Using the venue document collection, we derive a venue-topic distribution and a topic-word distribution; this model is denoted as the Venue-LDA model (VLDA). Similarly, using the user document collection, we extract a user-topic distribution and a topic-word distribution to form the User-LDA model (ULDA). Each of these two models captures the different influence that the venues or users, respectively, have on the topics of the tip to be given by the target user to the target venue. The graphical representations of VLDA and ULDA are shown in Figure 1. In the diagrams, M is the number of terms in a venue/user document, V is the number of venues, U is the number of users, and K is the number of topics (aspects) z.
In both models, φ is the topic-term distribution; θ is the venue-topic distribution and χ is the user-topic distribution, in Figure 1(a) and Figure 1(b), respectively; α and β are prior parameters. The generative process of the LDA models on venue documents (VLDA) or user documents (ULDA) is as follows:
1. Sample θ (VLDA) or χ (ULDA) from the Dirichlet prior Dir(α).
2. For each topic z, sample φ_z from the Dirichlet prior Dir(β).
3. For each term w_{di} in the (venue or user) document d:
   - Draw a topic z_{di} from Mult(θ_d) (VLDA) or Mult(χ_d) (ULDA).
   - Draw a term w_{di} from Mult(φ_{z_{di}}).
Parameter Estimation. The probability of the document collection D is defined as follows:

$$p(D \mid \alpha, \beta) = \prod_{d=1}^{N} \int p(\theta_d \mid \alpha) \Big( \prod_{m=1}^{M} \sum_{z} p(z \mid \theta_d)\, p(w_{dm} \mid z, \beta) \Big)\, d\theta_d$$

For the venue document collection, each document is a venue document and N = V. For the user document collection, each document is a user document and N = U, and θ is replaced by χ.
We use Gibbs sampling (Griffiths and Steyvers 2004) to perform approximate inference and to estimate the unknown parameters {θ, φ}. The conditional distribution for Gibbs sampling is as follows:

$$p(z_{di} \mid \mathbf{z}_{\neg di}, \mathbf{w}, d) \propto \frac{n_{dz_{di}}^{\neg di} + \alpha_{z_{di}}}{\sum_{z} (n_{dz}^{\neg di} + \alpha_{z})} \times \frac{n_{z_{di} w_{di}}^{\neg di} + \beta_{w_{di}}}{\sum_{w} (n_{z_{di} w}^{\neg di} + \beta_{w})}$$

where n_{dz} is the number of times that topic z has been sampled from the multinomial distribution specific to document d, and n_{zw} is the number of times that term w has been sampled for topic z. The superscript ¬di denotes a quantity excluding the specified instance (e.g., the i-th word in document d).
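For intuition, this conditional can be evaluated directly from the two count matrices; a minimal numpy sketch with toy counts and symmetric priors (illustrative, not the authors' code), assuming the counts already exclude the current token:

```python
import numpy as np

def lda_conditional(d, w, n_dz, n_zw, alpha, beta):
    """Normalized Gibbs conditional p(z_di | z_not_di, w, d) for one token,
    assuming n_dz (documents x topics) and n_zw (topics x words) already
    exclude the token being resampled."""
    K, W = n_zw.shape
    left = (n_dz[d] + alpha) / (n_dz[d].sum() + K * alpha)       # document-topic factor
    right = (n_zw[:, w] + beta) / (n_zw.sum(axis=1) + W * beta)  # topic-word factor
    p = left * right
    return p / p.sum()  # normalize so we can sample a topic

rng = np.random.default_rng(0)
n_dz = np.array([[3, 1], [0, 4]])          # toy document-topic counts
n_zw = np.array([[2, 1, 1], [1, 3, 0]])    # toy topic-word counts
p = lda_conditional(d=0, w=2, n_dz=n_dz, n_zw=n_zw, alpha=0.1, beta=0.01)
z_new = rng.choice(len(p), p=p)            # resample the token's topic
```

With symmetric priors, the sums of α_z and β_w reduce to Kα and Wβ, which is how the denominators are written above.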
After sampling for a sufficient number of iterations, θ or χ and φ are calculated as follows:

$$\hat{\theta}_{vz} = \frac{n_{vz} + \alpha_{z}}{\sum_{z'} (n_{vz'} + \alpha_{z'})} \qquad \hat{\chi}_{uz} = \frac{n_{uz} + \alpha_{z}}{\sum_{z'} (n_{uz'} + \alpha_{z'})} \qquad \hat{\phi}_{zw} = \frac{n_{zw} + \beta_{w}}{\sum_{w'} (n_{zw'} + \beta_{w'})} \quad (1)$$
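The estimates in Equation (1) amount to smoothing the final count matrices by the priors and row-normalizing; a small numpy sketch under symmetric priors (toy counts, names are ours):

```python
import numpy as np

def estimate_params(n_vz, n_zw, alpha, beta):
    """Point estimates of Eq. (1): theta_hat (venue-topic) and
    phi_hat (topic-word) from the final Gibbs sample counts."""
    theta = (n_vz + alpha) / (n_vz + alpha).sum(axis=1, keepdims=True)
    phi = (n_zw + beta) / (n_zw + beta).sum(axis=1, keepdims=True)
    return theta, phi

n_vz = np.array([[5, 1], [2, 2]])        # venues x topics counts
n_zw = np.array([[3, 0, 1], [0, 2, 2]])  # topics x words counts
theta, phi = estimate_params(n_vz, n_zw, alpha=0.5, beta=0.1)
# each row of theta and phi is now a probability distribution
```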
Authority-LDA model
VLDA and ULDA only consider influences from one side (either the venue or the user). We propose a new Authority-LDA model (ALDA), which integrates influences from both users and venues. ALDA employs venue-wise pooling to construct venue documents for the representation of venue influences, and leverages the author information to represent user influences as in Author-Topic (AT) modeling (Rosen-Zvi et al. 2004). The latent topics then depend on both the inherent aspects of venues and the personal preferences of users.

Figure 2: Authority-LDA model

Table 1: Notations used in ALDA

  V, U      the number of venues; the number of users
  K         the number of topics
  M         the number of words in a document
  A_v       the set of users for the venue document d_v
  u, z      the user variable; the latent topic variable
  c         the switch variable
  θ_v       distribution of topics specific to venue v
  χ_u       distribution of topics specific to user u
  φ_z       distribution of words specific to topic z
  α, σ, β   Dirichlet priors for θ, χ, φ
  λ_u       parameter of the Bernoulli distribution specific to user u for sampling the binary switch c
  γ         Beta prior for λ, where γ = {γ, γ'}
We use a mixing parameter λ to control the weights of the influence from both sides. λ_u is the parameter of a Bernoulli distribution, from which we sample a binary variable c that switches between the influence of venue-inherent aspects and that of user preferences. In other words, when a user u comments on a venue, we assume that the tip is influenced by the user's personal preferences with probability λ_u (c=1) and by the inherent aspects of the venue with probability 1 − λ_u (c=0). The latent topics are still multinomial distributions over terms. Figure 2 shows a graphical representation of the ALDA model, while Table 1 summarizes the symbols used in ALDA.
Note that the latent topics z depend on both the venue-topic distribution θ and the user-topic distribution χ. A_v is the set of users who have written micro-reviews about venue v, namely the authority users for the venue document d_v. φ represents the topic-term distribution. M is the number of terms in the venue documents, V is the number of venues, U is the number of users, and K is the number of topics (aspects) z. The generative process of the proposed Authority-LDA model is as follows:
1. For each topic z, draw φ_z from Dir(β).
2. For each user u, draw χ_u from Dir(σ) and λ_u from Beta(γ).
3. For each venue document d_v:
   (a) Draw θ_v from Dir(α).
   (b) For each term w_{d_v,i} in the venue document d_v:
       - Draw a user u from A_v uniformly.
       - Draw the switch c ~ Bernoulli(λ_u).
       - If c = 0, draw a topic z_{d_v,i} from Mult(θ_v).
       - If c = 1, draw a topic z_{d_v,i} from Mult(χ_u).
       - Draw a term w_{d_v,i} from Mult(φ_{z_{d_v,i}}).
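The generative process can be simulated forward to see the role of the switch c; a toy sketch with fixed, hypothetical parameter values (one user, two topics, two words — all numbers are ours for illustration):

```python
import numpy as np

def generate_venue_document(theta_v, chi, lam, phi, A_v, n_terms, rng):
    """Forward-simulate ALDA for one venue document: each term picks an
    author u from A_v uniformly, flips the switch c ~ Bernoulli(lam[u]),
    then draws its topic from the venue (c=0) or user (c=1) distribution,
    and finally draws the term from the topic's word distribution."""
    terms = []
    for _ in range(n_terms):
        u = rng.choice(A_v)                        # authority user, uniform
        c = rng.random() < lam[u]                  # switch: user vs. venue influence
        z = rng.choice(len(theta_v), p=chi[u] if c else theta_v)
        w = rng.choice(phi.shape[1], p=phi[z])     # term from topic z
        terms.append(w)
    return terms

rng = np.random.default_rng(1)
theta_v = np.array([0.8, 0.2])              # venue-topic distribution
chi = {0: np.array([0.1, 0.9])}             # user-topic distributions
lam = {0: 0.3}                              # P(c=1) per user
phi = np.array([[0.7, 0.3], [0.2, 0.8]])    # topic-word distributions
doc = generate_venue_document(theta_v, chi, lam, phi, A_v=[0], n_terms=20, rng=rng)
```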
Parameter Estimation. We also use Gibbs sampling to estimate the unknown parameters {θ, χ, φ, λ}. In the Gibbs sampling procedure, we first compute the posterior distribution on u, c and z, and then estimate {θ, χ, φ, λ}. The posterior distribution of the hidden variables for each word w_{d_v,i} is calculated as follows:

$$P(u_{d_v i} = u, z_{d_v i} = z, c_{d_v i} = 1 \mid \mathbf{u}_{\neg d_v i}, \mathbf{z}_{\neg d_v i}, \mathbf{c}_{\neg d_v i}, \mathbf{w}, A_v) \propto \frac{n_{uc}^{\neg d_v i}(1) + \gamma}{n_{uc}^{\neg d_v i} + \gamma + \gamma'} \times \frac{n_{uz}^{\neg d_v i} + \sigma}{\sum_{z'} (n_{uz'}^{\neg d_v i} + \sigma)} \times \frac{n_{zw}^{\neg d_v i} + \beta}{\sum_{w'} (n_{zw'}^{\neg d_v i} + \beta)}$$

$$P(u_{d_v i} = u, z_{d_v i} = z, c_{d_v i} = 0 \mid \mathbf{u}_{\neg d_v i}, \mathbf{z}_{\neg d_v i}, \mathbf{c}_{\neg d_v i}, \mathbf{w}, A_v) \propto \frac{n_{uc}^{\neg d_v i}(0) + \gamma'}{n_{uc}^{\neg d_v i} + \gamma + \gamma'} \times \frac{n_{vz}^{\neg d_v i} + \alpha}{\sum_{z'} (n_{vz'}^{\neg d_v i} + \alpha)} \times \frac{n_{zw}^{\neg d_v i} + \beta}{\sum_{w'} (n_{zw'}^{\neg d_v i} + \beta)}$$
where n_{uc}(1) and n_{uc}(0) are the number of times that c=1 and c=0, respectively, has been sampled for user u, and n_{uc} = n_{uc}(1) + n_{uc}(0). n_{vz} is the number of times that topic z has been sampled from the distribution θ_v specific to venue v, and n_{uz} is the number of times that topic z has been sampled from the distribution χ_u. n_{zw} is the number of times that term w has been sampled from the distribution φ_z specific to topic z. The superscript ¬d_v i again denotes a quantity excluding the current instance d_v i.
After Gibbs sampling, {θ, χ, φ, λ} can be estimated as follows:

$$\hat{\theta}_{vz} = \frac{n_{vz} + \alpha_{z}}{\sum_{z'} (n_{vz'} + \alpha_{z'})} \qquad \hat{\chi}_{uz} = \frac{n_{uz} + \sigma_{z}}{\sum_{z'} (n_{uz'} + \sigma_{z'})} \qquad \hat{\phi}_{zw} = \frac{n_{zw} + \beta_{w}}{\sum_{w'} (n_{zw'} + \beta_{w'})} \qquad \hat{\lambda}_{u} = \frac{n_{uc}(1) + \gamma}{n_{uc} + \gamma + \gamma'} \quad (2)$$
Authority-Sentiment-LDA model
Quite frequently, the commenting behavior of users is af-
fected by their sentiment. For example, there exist negative
users who tend to comment on negative aspects of products;
at the same time they do not bother to post their opinions
for positive or neutral aspects. Similarly for positive users.
Motivated by this observation, we label the users based on
their sentiment orientation, i.e., tendency to give positive or
negative comments. We then predict the tip aspects for the
target venue, taking into consideration the user sentiments.

The resulting model is an Authority-Sentiment-LDA model (ASLDA), which extends ALDA by adding a sentiment orientation layer that captures the users' sentiment preferences. Sentiment orientation is not a latent variable, but a known label. We use an existing sentiment lexicon (Hu and Liu 2004), suitable for short text such as micro-reviews, to get the sentiment polarity s of each term w. From this, we can derive for any venue document d_v a set of ⟨w, s⟩ pairs of terms and their polarity, together with the authority user set A_v. The main difference between ALDA and ASLDA is that in ASLDA we assert that the authority users are sentiment-oriented and that their sentiments determine the predicted topic.
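Deriving the observed polarity labels reduces to a lexicon lookup; a minimal sketch, where the tiny word lists stand in for a real opinion lexicon such as Hu and Liu (2004) and are not part of it:

```python
# toy stand-in for an opinion lexicon such as Hu and Liu (2004)
POSITIVE = {"great", "delicious", "friendly"}
NEGATIVE = {"slow", "rude", "bland"}

def term_polarity(term):
    """Map a term to its observed sentiment label s
    (positive, negative, or neutral)."""
    if term in POSITIVE:
        return "positive"
    if term in NEGATIVE:
        return "negative"
    return "neutral"

def label_document(terms):
    """Derive the <w, s> pairs used by ASLDA for a venue document."""
    return [(w, term_polarity(w)) for w in terms]

pairs = label_document(["great", "coffee", "slow", "service"])
# -> [('great', 'positive'), ('coffee', 'neutral'),
#     ('slow', 'negative'), ('service', 'neutral')]
```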
Similarly to ALDA, a mixing parameter λ is introduced to weight the influence from the inherent aspects of venues and the sentiment-oriented user preferences, by sampling a binary variable c. A graphical representation of ASLDA is shown in Figure 3.

Figure 3: Authority-Sentiment-LDA model

The notations used in the description of ASLDA are the same as those used for ALDA (Table 1), except χ and π. In this model, χ is the multinomial distribution of users over sentiments, and χ_u represents the probability distribution of sentiment orientation specific to u; π is the multinomial distribution of sentiments over topics, representing the impact of each sentiment orientation on topic selection; π_s is the distribution of topics specific to sentiment orientation s; s is the known sentiment polarity for each term, taking one of three labels: positive, negative, and neutral. The inherent properties of venues are still represented as a multinomial distribution over topics θ, together with the topic-term distribution φ. S is the number of sentiment orientation labels (S=3: positive, negative, neutral). The generative process of ASLDA (for venue document collections) is as follows:
1. For each topic z, draw φ_z from Dir(β).
2. For each user u, draw χ_u from Dir(σ) and λ_u from Beta(γ).
3. For each venue document d_v:
   (a) Draw θ_v from Dir(α).
   (b) For each term w_{d_v,i} in the venue document d_v:
       - Draw a user u from A_v uniformly.
       - Draw the switch c ~ Bernoulli(λ_u).
       - If c = 0, draw a topic z_{d_v,i} from Mult(θ_v).
       - If c = 1, draw a sentiment s_u from Mult(χ_u), then draw a topic z_{d_v,i} from Mult(π_{s_u}).
       - Draw a term w_{d_v,i} from Mult(φ_{z_{d_v,i}}).
Collapsed Gibbs sampling is used to estimate the unknown parameters {θ, χ, π, φ, λ}. First, we calculate the posterior probability as follows:

$$P(u_{d_v i} = u, z_{d_v i} = z, c_{d_v i} = 1 \mid \mathbf{u}_{\neg d_v i}, \mathbf{z}_{\neg d_v i}, \mathbf{c}_{\neg d_v i}, \mathbf{w}, \mathbf{s}, A_v) \propto \frac{n_{uc}^{\neg d_v i}(1) + \gamma}{n_{uc}^{\neg d_v i} + \gamma + \gamma'} \times \frac{n_{us}^{\neg d_v i} + \sigma}{\sum_{s'} (n_{us'}^{\neg d_v i} + \sigma)} \times \frac{n_{sz}^{\neg d_v i} + \eta}{\sum_{z'} (n_{sz'}^{\neg d_v i} + \eta)} \times \frac{n_{zw}^{\neg d_v i} + \beta}{\sum_{w'} (n_{zw'}^{\neg d_v i} + \beta)}$$

$$P(u_{d_v i} = u, z_{d_v i} = z, c_{d_v i} = 0 \mid \mathbf{u}_{\neg d_v i}, \mathbf{z}_{\neg d_v i}, \mathbf{c}_{\neg d_v i}, \mathbf{w}, \mathbf{s}, A_v) \propto \frac{n_{uc}^{\neg d_v i}(0) + \gamma'}{n_{uc}^{\neg d_v i} + \gamma + \gamma'} \times \frac{n_{vz}^{\neg d_v i} + \alpha}{\sum_{z'} (n_{vz'}^{\neg d_v i} + \alpha)} \times \frac{n_{zw}^{\neg d_v i} + \beta}{\sum_{w'} (n_{zw'}^{\neg d_v i} + \beta)}$$

where η is the Dirichlet prior for the sentiment-topic distribution π,
and n_{uc}(1), n_{uc}(0), n_{uc}, n_{vz}, and n_{zw} have the same meaning as in ALDA. n_{us} is the number of times that the sentiment s has been sampled from the distribution χ_u specific to user u. n_{sz} is the number of times that topic z has been sampled from the distribution π_s specific to the sentiment orientation s.
After sufficient iterations of Gibbs sampling, {θ, χ, π, φ, λ} can be estimated as follows:

$$\hat{\theta}_{vz} = \frac{n_{vz} + \alpha_{z}}{\sum_{z'} (n_{vz'} + \alpha_{z'})} \qquad \hat{\chi}_{us} = \frac{n_{us} + \sigma_{s}}{\sum_{s'} (n_{us'} + \sigma_{s'})} \qquad \hat{\pi}_{sz} = \frac{n_{sz} + \eta_{z}}{\sum_{z'} (n_{sz'} + \eta_{z'})} \qquad \hat{\phi}_{zw} = \frac{n_{zw} + \beta_{w}}{\sum_{w'} (n_{zw'} + \beta_{w'})} \qquad \hat{\lambda}_{u} = \frac{n_{uc}(1) + \gamma}{n_{uc} + \gamma + \gamma'} \quad (3)$$
Topic Suggestion
After training the above models, our task is to estimate
p(z|u, v), i.e., the probability of each topic/aspect z given a new user-venue pair (u, v).
Suggestion by base LDA models
For basic models like VLDA and ULDA, venues and users
are considered independently. In other words, the latent top-
ics detected from them are only based on one perspective:
the venue or the user. p(z|u, v) from VLDA is proportional to θ_v, while p(z|u, v) from ULDA is proportional to χ_u:

$$p(z \mid u, v) \propto p(z \mid v) = \theta_{vz}, \qquad p(z \mid u, v) \propto p(z \mid u) = \chi_{uz} \quad (4)$$
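Given the estimated distributions, top-N suggestion with the base models then reduces to ranking the entries of the corresponding topic distribution; a brief numpy sketch (names and numbers are ours):

```python
import numpy as np

def suggest_topics(p_z, N):
    """Return the indices of the N most probable topics, best first."""
    return list(np.argsort(p_z)[::-1][:N])

theta_v = np.array([0.1, 0.5, 0.05, 0.35])  # p(z|v) for the target venue (VLDA)
top2 = suggest_topics(theta_v, N=2)          # -> [1, 3]
```

For ULDA the same ranking would be applied to χ_u instead of θ_v.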
Suggestion by ALDA
The Authority-LDA model (ALDA) considers both the
venues’ inherent aspects and the users’ commenting prefer-
ences. The detected topics are interdependently influenced
by the venue-topic distribution θ and the user-topic distribu-
tion χ. Given a query pair (v, u), the predicted topics depend

References

Blei, D. M.; Ng, A. Y.; and Jordan, M. I. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research 3:993–1022.
Griffiths, T. L., and Steyvers, M. 2004. Finding scientific topics. Proceedings of the National Academy of Sciences 101(suppl. 1):5228–5235.
Hu, M., and Liu, B. 2004. Mining and summarizing customer reviews. In Proceedings of KDD.
Weng, J.; Lim, E.-P.; Jiang, J.; and He, Q. 2010. TwitterRank: finding topic-sensitive influential twitterers. In Proceedings of WSDM.