Sentiment-Based Topic Suggestion for Micro-Reviews

Ziyu Lu (The University of Hong Kong), Nikos Mamoulis, Evaggelia Pitoura, and Panayiotis Tsaparas (University of Ioannina)
zylu@cs.hku.hk, {nikos,pitoura,tsap}@cs.uoi.gr
Abstract
Location-based social sites, such as Foursquare or Yelp,
are gaining increasing popularity. These sites allow users to
check in at venues and leave a short commentary in the form
of a micro-review. Micro-reviews are rich in content as they
offer a distilled and concise account of user experience. In
this paper we consider the problem of predicting the topic
of a micro-review by a user who visits a new venue. Such a
prediction can help users make informed decisions, and also
help venue owners personalize users’ experiences. However,
topic modeling for micro-reviews is particularly difficult, due
to their short and fragmented nature. We address this issue
using pooling strategies, which aggregate micro-reviews at
the venue or user level, and we propose novel probabilistic
models based on Latent Dirichlet Allocation (LDA) for extracting the topics related to a user-venue pair. Our best topic model integrates influences from both venue-inherent properties and user preferences, considering at the same time the sentiment orientation of the users. Experimental results on real
datasets demonstrate the superiority of this model compared
to simpler models and previous work; they also show that
venue-inherent properties have higher influence on the top-
ics of micro-reviews.
Introduction
In the past few years, location-based social sites, such as Foursquare (https://foursquare.com/), Yelp (http://www.yelp.com/) and Facebook Places (https://www.facebook.com/places/), have emerged as prime online destinations, where users can record their footprints via check-ins, as well as their experience through micro-reviews. Micro-reviews, or tips, accompany a check-in at a venue, and they contain a short commentary on the venue. Tips may offer information about the venue, opinions on what is good, or advice for new customers. They are very targeted and concise, and they provide a distilled account of the experience of the users in the venue. They are a fast-growing corpus, and they have recently attracted considerable research interest (Aggarwal, Almeida, and Kumaraguru 2013; Moraes et al. 2013; Nguyen, Lauw, and Tsaparas 2015) for the rich content they contain.

Copyright © 2016, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
In this paper, we consider the problem of predicting the
topic that a user will comment on in her tip when checking
in to a new venue. This is a problem of great practical im-
portance for both users and venues. For venues, knowing in
advance the aspect of the venue that a user will most likely
focus on, allows them to offer a personalized experience to
the user. For example, if we can predict that when visiting a
restaurant a specific user is likely to comment on the quality
of the service, the manager of the venue can make sure to
fine-tune their service in order to meet the customer’s needs.
Furthermore, exposing the topic prediction to the users pro-
vides fine-grain information about the venue. For example,
for a user who is a wine enthusiast, recommending a restau-
rant, and predicting that the user is likely to comment on the
extensive wine selection of the place adds texture and con-
text to the recommendation.
Similar problems have been considered in the context of
full-text reviews, where the goal is to estimate the rating
of specific aspects of venues (Wang, Lu, and Zhai 2011;
Hai et al. 2014; Wang, Lu, and Zhai 2010; Moghaddam and
Ester 2011). These approaches rely on generative models
that extract latent topical aspects and their ratings. However,
applying such techniques to micro-reviews is not straightfor-
ward. Micro-reviews have special characteristics, very dif-
ferent from those of reviews. First, they are very short (up
to 200 characters), and provide very limited context and few
word co-occurrences (Lin et al. 2014). Second, due to the
length limitation, the expression is very dense, and the text
is often fragmented and poorly structured. Finally, micro-
reviews often contain diverse pieces of information stitched
together in a few sentences, resulting in incoherent seman-
tics. These characteristics make opinion mining and analysis
on micro-reviews harder compared to full-text reviews.
To the best of our knowledge, we are the first to consider the
problem of topic prediction for micro-reviews.
In order to address the limited and incomplete nature of micro-reviews, we use pooling strategies for document collaborative filtering (Mehrotra et al. 2013; Weng et al. 2010) and contextualization (Tang, Zhang, and Mei 2013) to integrate multiple contexts. We aggregate all micro-
reviews of a venue and a user into a single venue-document
and user-document respectively, and we consider probabilis-
tic topic models on the aggregated documents for the prob-
lem of topic prediction. We first define two simple mod-

els that apply Latent Dirichlet Allocation (LDA) (Blei, Ng,
and Jordan 2003) individually on the venue and user doc-
uments. Building on these models, we propose two novel
models, Authority-LDA (ALDA) and Authority-Sentiment
LDA (ASLDA), which integrate both venue inherent proper-
ties and user personalized preferences. In the ASLDA model
we add a sentiment layer to cluster topics into different sen-
timent groups based on the assumption that users might have
personalized sentiment orientation (e.g. some users tend to
be negative), and they are more likely to comment on as-
pects (topics) that match their sentiment orientation (e.g.,
negative users will tend to comment on negative aspects).
The ASLDA model can predict not just the topic of a fu-
ture tip, but also the most probable sentiment orientation.
We evaluate the proposed models on datasets from two real location-based social sites, Foursquare and Yelp. Experimental results show that our methods outperform competing approaches.
In summary, in this paper we make the following contributions:
• We define the problem of topic prediction for micro-reviews when a user checks in at an unvisited venue. To the best of our knowledge, this is the first work that deals with topic prediction and suggestion in the context of micro-reviews at location-based social sites.
• We define four probabilistic models for the problem, including two novel probabilistic models that leverage both venue-inherent aspects and user personalized preferences.
• The Authority-Sentiment LDA (ASLDA) model introduces a novel way to incorporate the influence of the user sentiment orientation in the topic prediction, and it is able to predict the sentiment orientation of a future tip.
• We evaluate the proposed methods for topic prediction on four datasets from two real location-based social sites, Foursquare and Yelp, and compare against other approaches.
Related Work
Micro-reviews are a relatively new corpus that has only recently drawn the attention of the research community. There is work on micro-reviews for spam detection (Aggarwal, Almeida, and Kumaraguru 2013), polarity analysis (Moraes et al. 2013), and micro-review summarization (Nguyen, Lauw, and Tsaparas 2015). To the best of our knowledge we are the
first to consider the problem of topic prediction for micro-
reviews.
Topic modeling algorithms have been widely adopted in
text mining (Blei, Ng, and Jordan 2003; Rosen-Zvi et al.
2004). One of the first such models, proposed by Blei et al.
(Blei, Ng, and Jordan 2003), is Latent Dirichlet allocation
(LDA). Many topic models based on LDA have been de-
veloped to address review mining problems. For example,
Moghaddam and Ester (Moghaddam and Ester 2011) intro-
duced an Interdependent Latent Dirichlet Allocation (ILDA)
model to infer latent aspects and their ratings for online
product reviews. Lin and He (Lin and He 2009) proposed
a joint sentiment-topic model (JST) for sentiment analysis
of movie reviews, by extending LDA with a new sentiment
layer. JST is based on the assumption that topic genera-
tion depends on sentiments, and word generation depends
on sentiment-topic pairs.
However, (Lin et al. 2014) showed that the characteris-
tics of short text reduce the effectiveness of topic model-
ing methods. Micro-reviews in location-based social sites
are very short, and have a relatively small vocabulary
and a broad range of topics. The probability of word co-
occurrence in the micro-reviews is very small, compromis-
ing the performance of topic models originally designed for
long reviews. In order to address this data sparsity problem,
heuristics such as document pooling (Mehrotra et al. 2013;
Weng et al. 2010) or contextualization (Tang, Zhang, and
Mei 2013) have been proposed to improve the performance
of topic modeling on short text. For instance, Mehrotra et al.
(Mehrotra et al. 2013) proposed to aggregate all documents
by the same author or all documents with specific hashtags
and form pooling documents on which topic modeling can
be applied effectively. Contextualized topic models have been proposed to integrate particular types of contexts into classical models like LDA, either by introducing additional layers to the topic model (Jo and Oh 2011; Rosen-Zvi et al. 2004; Lin and He 2009) or by using a coin-flipping selection process to select among contexts (Paul and Girju 2010; Tang, Zhang, and Mei 2013; Zhao et al. 2011). The author-topic model (AT) of (Rosen-Zvi et al. 2004), which utilizes authorship information for modeling scientific publications, can also be viewed as a contextualized topic model. Tang et al. (Tang,
Zhang, and Mei 2013) proposed a model, which formulates
different types of contexts as multiple views of the partition
of the corpus and uses voting to determine consensus topics.
Our models adopt both pooling methods and contextu-
alization in order to facilitate topic discovery for micro-
reviews in location-based social sites. We aggregate micro-
reviews on the same venue or micro-reviews by the same au-
thor to construct aggregated pooling documents. To the ag-
gregated documents, we add additional context such as au-
thority information and the sentiment orientation of users to
improve latent topic learning. Prior work on sentiment-topic
models (Lin and He 2009; Moghaddam and Ester 2011) in-
troduced sentiment as a latent (unknown) variable based on
some assumptions about the dependencies between the sentiment variable and the topic variable, and then jointly learned the sentiments and topics. In our work, on the other hand, we assume that the sentiment information is observed (from an existing sentiment lexicon for short text) and utilize the sentiment orientation of users to enhance the process of topic discovery.
Problem Definition
In this section we introduce some terminology and define
our problem.
A location-based social site consists of a set of users, a set of venues, and a collection of micro-reviews. Formally, we will use V to denote the set of venues in the site, and A to denote the set of all users (authors) in the site. A tip t is a short piece of text written by a user u ∈ A for a venue v ∈ V. A tip is defined over a vocabulary W consisting of the union of all terms that appear in all tips in our collection. We assume that stop-words have been removed and that terms have been stemmed. We define a micro-review r = ⟨u, v, t⟩ as a triplet consisting of a user u ∈ A, a venue v ∈ V, and a tip t that was left by the user u about the venue v. We use R to denote the set of all micro-reviews.
As tips consist of short text, studying them individually is not very informative. We thus use pooling methods to construct aggregated documents for a venue or a user. For a venue v, we use A_v to denote the set of all users that have written a micro-review for venue v, and R_v to denote the collection of all micro-reviews for venue v. We use d_v to denote the venue-document defined by taking the union of all the tips in R_v, and W_{d_v} = {w_1, w_2, ..., w_m} to denote the vocabulary of the document d_v. In a symmetric fashion, for a user u we define the venue set V_u, the micro-review set R_u, the user-document d_u, and the vocabulary W_u.
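The pooling step just described can be sketched as follows; a minimal illustration (function and variable names are ours, not from the paper), assuming micro-reviews arrive as (user, venue, tip) triples and whitespace tokenization suffices:

```python
from collections import defaultdict

def build_pooled_documents(micro_reviews):
    """Aggregate micro-reviews into one venue-document per venue (d_v)
    and one user-document per user (d_u); also collect A_v, the set of
    users who have written a micro-review for each venue."""
    venue_docs = defaultdict(list)    # d_v: all tip terms left at venue v
    user_docs = defaultdict(list)     # d_u: all tip terms written by user u
    venue_authors = defaultdict(set)  # A_v: users who reviewed venue v
    for user, venue, tip in micro_reviews:
        terms = tip.split()
        venue_docs[venue].extend(terms)
        user_docs[user].extend(terms)
        venue_authors[venue].add(user)
    return dict(venue_docs), dict(user_docs), dict(venue_authors)

# toy collection of micro-reviews r = (u, v, t)
R = [("u1", "v1", "great coffee"),
     ("u2", "v1", "slow service"),
     ("u1", "v2", "great wine list")]
venue_docs, user_docs, A = build_pooled_documents(R)
```

Stop-word removal and stemming, which the paper assumes have already been applied, would precede this aggregation.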
Given a collection of (user or venue) documents D = {d_1, d_2, ...}, using topic-modeling techniques we can extract a set of K latent topics Z = {z_1, ..., z_K}. Each topic z_i is defined as a distribution over the vocabulary W. Our goal is, given a user-venue pair (u, v) for which there is currently no micro-review r, to predict the latent topic of that tip. Formally, we define our problem as follows.

Problem Definition: Given a social site consisting of a collection of users A, venues V, and micro-reviews R, a set of latent topics Z, a user-venue pair (u, v), and a number N, identify a set of N latent topics Z_N ⊆ Z that the user u is most likely to comment on about venue v.
Proposed models
Figure 1: LDA models: (a) VLDA, (b) ULDA
LDA model
The LDA model was proposed in (Blei, Ng, and Jordan 2003). It represents each document as a multinomial distribution over K latent topics, and each topic as a multinomial distribution over terms (words). We apply LDA to our pooled documents (venue or user documents). From each type (venue/user) of document collection, we derive two distributions. Using the venue document collection, we derive a venue-topic distribution and a topic-word distribution; this model is denoted as the Venue-LDA model (VLDA). Similarly, using the user document collection, we extract a user-topic distribution and a topic-word distribution to form the User-LDA model (ULDA). Each of these two models captures the different influence that the venues or users, respectively, have on the topics of the tip to be given by the target user to the target venue. The graphical representations of VLDA and ULDA are shown in Figure 1. In the diagrams, M is the number of terms in a venue/user document, V is the number of venues, U is the number of users, and K is the number of topics (aspects) z.
In both models, φ is the topic-term distribution; θ is the venue-topic distribution and χ is the user-topic distribution, in Figure 1(a) and Figure 1(b), respectively; α and β are prior parameters. The generative process of the LDA models on venue documents (VLDA) or user documents (ULDA) is as follows:
1. Sample θ (VLDA) or χ (ULDA) from the Dirichlet prior Dir(α).
2. For each topic z, sample φ_z from the Dirichlet prior Dir(β).
3. For each term w_{di} in the (venue or user) document d:
   - Draw a topic z_{di} from Mult(θ_d) (VLDA) or Mult(χ_d) (ULDA).
   - Draw a term w_{di} from Mult(φ_{z_{di}}).
Parameter Estimation. The probability of the document collection D is defined as follows:

$$p(D \mid \alpha, \beta) = \prod_{d=1}^{N} \int p(\theta_d \mid \alpha) \Big( \prod_{m=1}^{M} \sum_{z} p(z \mid \theta_d)\, p(w_{dm} \mid z, \beta) \Big)\, d\theta_d$$

For the venue document collection, each document is a venue document and N = V. For the user document collection, each document is a user document and N = U, and θ is replaced by χ.
We use Gibbs sampling (Griffiths and Steyvers 2004) to perform approximate inference and to estimate the unknown parameters {θ, φ}. The conditional distribution for Gibbs sampling is as follows:

$$p(z_{di} \mid \mathbf{z}_{\neg di}, \mathbf{w}, d) \propto \frac{n_{dz_{di}}^{\neg di} + \alpha_{z_{di}}}{\sum_{z} (n_{dz}^{\neg di} + \alpha_{z})} \times \frac{n_{z_{di} w_{di}}^{\neg di} + \beta_{w_{di}}}{\sum_{w} (n_{z_{di} w}^{\neg di} + \beta_{w})}$$

where n_{dz} is the number of times that topic z has been sampled from the multinomial distribution specific to document d, and n_{zw} is the number of times that term w has been sampled for topic z. The superscript ¬di denotes a quantity excluding the specified instance (e.g., the i-th word in document d).
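For intuition, this conditional can be evaluated directly from the two count matrices; a minimal numpy sketch with toy counts and symmetric priors (illustrative, not the authors' code), assuming the counts already exclude the current token:

```python
import numpy as np

def lda_conditional(d, w, n_dz, n_zw, alpha, beta):
    """Normalized Gibbs conditional p(z_di | z_not_di, w, d) for one token,
    assuming n_dz (documents x topics) and n_zw (topics x words) already
    exclude the token being resampled."""
    K, W = n_zw.shape
    left = (n_dz[d] + alpha) / (n_dz[d].sum() + K * alpha)       # document-topic factor
    right = (n_zw[:, w] + beta) / (n_zw.sum(axis=1) + W * beta)  # topic-word factor
    p = left * right
    return p / p.sum()  # normalize so we can sample a topic

rng = np.random.default_rng(0)
n_dz = np.array([[3, 1], [0, 4]])          # toy document-topic counts
n_zw = np.array([[2, 1, 1], [1, 3, 0]])    # toy topic-word counts
p = lda_conditional(d=0, w=2, n_dz=n_dz, n_zw=n_zw, alpha=0.1, beta=0.01)
z_new = rng.choice(len(p), p=p)            # resample the token's topic
```

With symmetric priors, the sums of α_z and β_w reduce to Kα and Wβ, which is how the denominators are written above.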
After sampling for a sufficient number of iterations, θ or χ and φ are calculated as follows:

$$\hat{\theta}_{vz} = \frac{n_{vz} + \alpha_{z}}{\sum_{z'} (n_{vz'} + \alpha_{z'})} \qquad \hat{\chi}_{uz} = \frac{n_{uz} + \alpha_{z}}{\sum_{z'} (n_{uz'} + \alpha_{z'})} \qquad \hat{\phi}_{zw} = \frac{n_{zw} + \beta_{w}}{\sum_{w'} (n_{zw'} + \beta_{w'})} \quad (1)$$
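The estimates in Equation (1) amount to smoothing the final count matrices by the priors and row-normalizing; a small numpy sketch under symmetric priors (toy counts, names are ours):

```python
import numpy as np

def estimate_params(n_vz, n_zw, alpha, beta):
    """Point estimates of Eq. (1): theta_hat (venue-topic) and
    phi_hat (topic-word) from the final Gibbs sample counts."""
    theta = (n_vz + alpha) / (n_vz + alpha).sum(axis=1, keepdims=True)
    phi = (n_zw + beta) / (n_zw + beta).sum(axis=1, keepdims=True)
    return theta, phi

n_vz = np.array([[5, 1], [2, 2]])        # venues x topics counts
n_zw = np.array([[3, 0, 1], [0, 2, 2]])  # topics x words counts
theta, phi = estimate_params(n_vz, n_zw, alpha=0.5, beta=0.1)
# each row of theta and phi is now a probability distribution
```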
Authority-LDA model
VLDA and ULDA only consider influences from one side (either the venue or the user). We propose a new Authority-LDA model (ALDA), which integrates influences from both users and venues. ALDA employs venue-wise pooling to construct venue documents for the representation of venue influences, and leverages the author information to represent user influences as in Author-Topic (AT) modeling (Rosen-Zvi et al. 2004). The latent topics then depend on both the inherent aspects of venues and the personal preferences of users.

Figure 2: Authority-LDA model

Table 1: Notations used in ALDA

  V, U      the number of venues; the number of users
  K         the number of topics
  M         the number of words in a document
  A_v       the set of users for the venue document d_v
  u, z      the user variable; the latent topic variable
  c         the switch variable
  θ_v       distribution of topics specific to venue v
  χ_u       distribution of topics specific to user u
  φ_z       distribution of words specific to topic z
  α, σ, β   Dirichlet priors for θ, χ, φ
  λ_u       parameter of the Bernoulli distribution specific to user u for sampling the binary switch c
  γ         Beta prior for λ, where γ = {γ, γ'}
We use a mixing parameter λ to control the weights of the influence from both sides. λ_u is the parameter of a Bernoulli distribution, from which we sample a binary variable c that switches between the influence of venue-inherent aspects and that of user preferences. In other words, when a user u comments on a venue, we assume that the tip is influenced by the user's personal preferences with probability λ_u (c=1) and by the inherent aspects of the venue with probability 1 − λ_u (c=0). The latent topics are still multinomial distributions over terms. Figure 2 shows a graphical representation of the ALDA model, while Table 1 summarizes the symbols used in ALDA.
Note that the latent topics z depend on both the venue-topic distribution θ and the user-topic distribution χ. A_v is the set of users who have written micro-reviews about venue v, namely the authority users for the venue document d_v. φ represents the topic-term distribution. M is the number of terms in the venue documents, V is the number of venues, U is the number of users, and K is the number of topics (aspects) z. The generative process of the proposed Authority-LDA model is as follows:
1. For each topic z, draw φ_z from Dir(β).
2. For each user u, draw χ_u from Dir(σ) and λ_u from Beta(γ).
3. For each venue document d_v:
   (a) Draw θ_v from Dir(α).
   (b) For each term w_{d_v,i} in the venue document d_v:
       - Draw a user u from A_v uniformly.
       - Draw the switch c ~ Bernoulli(λ_u).
       - If c = 0, draw a topic z_{d_v,i} from Mult(θ_v).
       - If c = 1, draw a topic z_{d_v,i} from Mult(χ_u).
       - Draw a term w_{d_v,i} from Mult(φ_{z_{d_v,i}}).
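The generative process can be simulated forward to see the role of the switch c; a toy sketch with fixed, hypothetical parameter values (one user, two topics, two words — all numbers are ours for illustration):

```python
import numpy as np

def generate_venue_document(theta_v, chi, lam, phi, A_v, n_terms, rng):
    """Forward-simulate ALDA for one venue document: each term picks an
    author u from A_v uniformly, flips the switch c ~ Bernoulli(lam[u]),
    then draws its topic from the venue (c=0) or user (c=1) distribution,
    and finally draws the term from the topic's word distribution."""
    terms = []
    for _ in range(n_terms):
        u = rng.choice(A_v)                        # authority user, uniform
        c = rng.random() < lam[u]                  # switch: user vs. venue influence
        z = rng.choice(len(theta_v), p=chi[u] if c else theta_v)
        w = rng.choice(phi.shape[1], p=phi[z])     # term from topic z
        terms.append(w)
    return terms

rng = np.random.default_rng(1)
theta_v = np.array([0.8, 0.2])              # venue-topic distribution
chi = {0: np.array([0.1, 0.9])}             # user-topic distributions
lam = {0: 0.3}                              # P(c=1) per user
phi = np.array([[0.7, 0.3], [0.2, 0.8]])    # topic-word distributions
doc = generate_venue_document(theta_v, chi, lam, phi, A_v=[0], n_terms=20, rng=rng)
```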
Parameter Estimation. We also use Gibbs sampling to estimate the unknown parameters {θ, χ, φ, λ}. In the Gibbs sampling procedure, we first compute the posterior distribution on u, c and z, and then estimate {θ, χ, φ, λ}. The posterior distribution of the hidden variables for each word w_{d_v,i} is calculated as follows:

$$P(u_{d_v i} = u, z_{d_v i} = z, c_{d_v i} = 1 \mid \mathbf{u}_{\neg d_v i}, \mathbf{z}_{\neg d_v i}, \mathbf{c}_{\neg d_v i}, \mathbf{w}, A_v) \propto \frac{n_{uc}^{\neg d_v i}(1) + \gamma}{n_{uc}^{\neg d_v i} + \gamma + \gamma'} \times \frac{n_{uz}^{\neg d_v i} + \sigma}{\sum_{z'} (n_{uz'}^{\neg d_v i} + \sigma)} \times \frac{n_{zw}^{\neg d_v i} + \beta}{\sum_{w'} (n_{zw'}^{\neg d_v i} + \beta)}$$

$$P(u_{d_v i} = u, z_{d_v i} = z, c_{d_v i} = 0 \mid \mathbf{u}_{\neg d_v i}, \mathbf{z}_{\neg d_v i}, \mathbf{c}_{\neg d_v i}, \mathbf{w}, A_v) \propto \frac{n_{uc}^{\neg d_v i}(0) + \gamma'}{n_{uc}^{\neg d_v i} + \gamma + \gamma'} \times \frac{n_{vz}^{\neg d_v i} + \alpha}{\sum_{z'} (n_{vz'}^{\neg d_v i} + \alpha)} \times \frac{n_{zw}^{\neg d_v i} + \beta}{\sum_{w'} (n_{zw'}^{\neg d_v i} + \beta)}$$
where n_{uc}(1) and n_{uc}(0) are the number of times that c=1 and c=0, respectively, has been sampled for user u, and n_{uc} = n_{uc}(1) + n_{uc}(0). n_{vz} is the number of times that topic z has been sampled from the distribution θ_v specific to venue v, and n_{uz} is the number of times that topic z has been sampled from the distribution χ_u. n_{zw} is the number of times that term w has been sampled from the distribution φ_z specific to topic z. The superscript ¬d_v i again denotes a quantity excluding the current instance d_v i.
After Gibbs sampling, {θ, χ, φ, λ} can be estimated as follows:

$$\hat{\theta}_{vz} = \frac{n_{vz} + \alpha_{z}}{\sum_{z'} (n_{vz'} + \alpha_{z'})} \qquad \hat{\chi}_{uz} = \frac{n_{uz} + \sigma_{z}}{\sum_{z'} (n_{uz'} + \sigma_{z'})} \qquad \hat{\phi}_{zw} = \frac{n_{zw} + \beta_{w}}{\sum_{w'} (n_{zw'} + \beta_{w'})} \qquad \hat{\lambda}_{u} = \frac{n_{uc}(1) + \gamma}{n_{uc} + \gamma + \gamma'} \quad (2)$$
Authority-Sentiment-LDA model
Quite frequently, the commenting behavior of users is af-
fected by their sentiment. For example, there exist negative
users who tend to comment on negative aspects of products;
at the same time they do not bother to post their opinions
for positive or neutral aspects. Similarly for positive users.
Motivated by this observation, we label the users based on
their sentiment orientation, i.e., tendency to give positive or
negative comments. We then predict the tip aspects for the
target venue, taking into consideration the user sentiments.

The resulting model is an Authority-Sentiment-LDA model (ASLDA), which extends ALDA by adding a sentiment orientation layer that captures the users' sentiment preferences. Sentiment orientation is not a latent variable, but a known label. We use an existing sentiment lexicon (Hu and Liu 2004), suitable for short text such as micro-reviews, to get the sentiment polarity s of each term w. From this, we can derive for any venue document d_v a set of ⟨w, s⟩ pairs of terms and their polarity, together with the authority user set A_v. The main difference between ALDA and ASLDA is that in ASLDA we assert that the authority users are sentiment-oriented and that their sentiments determine the predicted topic.
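Deriving the observed polarity labels reduces to a lexicon lookup; a minimal sketch, where the tiny word lists stand in for a real opinion lexicon such as Hu and Liu (2004) and are not part of it:

```python
# toy stand-in for an opinion lexicon such as Hu and Liu (2004)
POSITIVE = {"great", "delicious", "friendly"}
NEGATIVE = {"slow", "rude", "bland"}

def term_polarity(term):
    """Map a term to its observed sentiment label s
    (positive, negative, or neutral)."""
    if term in POSITIVE:
        return "positive"
    if term in NEGATIVE:
        return "negative"
    return "neutral"

def label_document(terms):
    """Derive the <w, s> pairs used by ASLDA for a venue document."""
    return [(w, term_polarity(w)) for w in terms]

pairs = label_document(["great", "coffee", "slow", "service"])
# -> [('great', 'positive'), ('coffee', 'neutral'),
#     ('slow', 'negative'), ('service', 'neutral')]
```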
Similarly to ALDA, a mixing parameter λ is introduced to weight the influence from the inherent aspects of venues and the sentiment-oriented user preferences, by sampling a binary variable c. A graphical representation of ASLDA is shown in Figure 3.

Figure 3: Authority-Sentiment-LDA model

The notations used in the description of ASLDA are the same as those used for ALDA (Table 1), except χ and π. In this model, χ is the multinomial distribution of users over sentiments, and χ_u represents the probability distribution of sentiment orientation specific to u; π is the multinomial distribution of sentiments over topics, representing the impact of each sentiment orientation on topic selection; π_s is the distribution of topics specific to sentiment orientation s; s is the known sentiment polarity for each term, taking one of three labels: positive, negative, and neutral. The inherent properties of venues are still represented as a multinomial distribution over topics θ, together with the topic-term distribution φ. S is the number of sentiment orientation labels (S=3: positive, negative, neutral). The generative process of ASLDA (for venue document collections) is as follows:
1. For each topic z, draw φ_z from Dir(β).
2. For each user u, draw χ_u from Dir(σ) and λ_u from Beta(γ).
3. For each venue document d_v:
   (a) Draw θ_v from Dir(α).
   (b) For each term w_{d_v,i} in the venue document d_v:
       - Draw a user u from A_v uniformly.
       - Draw the switch c ~ Bernoulli(λ_u).
       - If c = 0, draw a topic z_{d_v,i} from Mult(θ_v).
       - If c = 1, draw a sentiment s_u from Mult(χ_u), then draw a topic z_{d_v,i} from Mult(π_{s_u}).
       - Draw a term w_{d_v,i} from Mult(φ_{z_{d_v,i}}).
Collapsed Gibbs sampling is used to estimate the unknown parameters {θ, χ, π, φ, λ}. First, we calculate the posterior probability as follows:

$$P(u_{d_v i} = u, z_{d_v i} = z, c_{d_v i} = 1 \mid \mathbf{u}_{\neg d_v i}, \mathbf{z}_{\neg d_v i}, \mathbf{c}_{\neg d_v i}, \mathbf{w}, \mathbf{s}, A_v) \propto \frac{n_{uc}^{\neg d_v i}(1) + \gamma}{n_{uc}^{\neg d_v i} + \gamma + \gamma'} \times \frac{n_{us}^{\neg d_v i} + \sigma}{\sum_{s'} (n_{us'}^{\neg d_v i} + \sigma)} \times \frac{n_{sz}^{\neg d_v i} + \eta}{\sum_{z'} (n_{sz'}^{\neg d_v i} + \eta)} \times \frac{n_{zw}^{\neg d_v i} + \beta}{\sum_{w'} (n_{zw'}^{\neg d_v i} + \beta)}$$

$$P(u_{d_v i} = u, z_{d_v i} = z, c_{d_v i} = 0 \mid \mathbf{u}_{\neg d_v i}, \mathbf{z}_{\neg d_v i}, \mathbf{c}_{\neg d_v i}, \mathbf{w}, \mathbf{s}, A_v) \propto \frac{n_{uc}^{\neg d_v i}(0) + \gamma'}{n_{uc}^{\neg d_v i} + \gamma + \gamma'} \times \frac{n_{vz}^{\neg d_v i} + \alpha}{\sum_{z'} (n_{vz'}^{\neg d_v i} + \alpha)} \times \frac{n_{zw}^{\neg d_v i} + \beta}{\sum_{w'} (n_{zw'}^{\neg d_v i} + \beta)}$$

where η is the Dirichlet prior for the sentiment-topic distribution π,
and n_{uc}(1), n_{uc}(0), n_{uc}, n_{vz}, and n_{zw} have the same meaning as in ALDA. n_{us} is the number of times that the sentiment s has been sampled from the distribution χ_u specific to user u. n_{sz} is the number of times that topic z has been sampled from the distribution π_s specific to the sentiment orientation s.
After sufficient iterations of Gibbs sampling, {θ, χ, π, φ, λ} can be estimated as follows:

$$\hat{\theta}_{vz} = \frac{n_{vz} + \alpha_{z}}{\sum_{z'} (n_{vz'} + \alpha_{z'})} \qquad \hat{\chi}_{us} = \frac{n_{us} + \sigma_{s}}{\sum_{s'} (n_{us'} + \sigma_{s'})} \qquad \hat{\pi}_{sz} = \frac{n_{sz} + \eta_{z}}{\sum_{z'} (n_{sz'} + \eta_{z'})} \qquad \hat{\phi}_{zw} = \frac{n_{zw} + \beta_{w}}{\sum_{w'} (n_{zw'} + \beta_{w'})} \qquad \hat{\lambda}_{u} = \frac{n_{uc}(1) + \gamma}{n_{uc} + \gamma + \gamma'} \quad (3)$$
Topic Suggestion
After training the above models, our task is to estimate
p(z|u, v), i.e., the probability of each topic/aspect z given a new user-venue pair (u, v).
Suggestion by base LDA models
For basic models like VLDA and ULDA, venues and users
are considered independently. In other words, the latent top-
ics detected from them are only based on one perspective:
the venue or the user. p(z|u, v) from VLDA is proportional to θ_v, while p(z|u, v) from ULDA is proportional to χ_u:

$$p(z \mid u, v) \propto p(z \mid v) = \theta_{vz}, \qquad p(z \mid u, v) \propto p(z \mid u) = \chi_{uz} \quad (4)$$
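Given the estimated distributions, top-N suggestion with the base models then reduces to ranking the entries of the corresponding topic distribution; a brief numpy sketch (names and numbers are ours):

```python
import numpy as np

def suggest_topics(p_z, N):
    """Return the indices of the N most probable topics, best first."""
    return list(np.argsort(p_z)[::-1][:N])

theta_v = np.array([0.1, 0.5, 0.05, 0.35])  # p(z|v) for the target venue (VLDA)
top2 = suggest_topics(theta_v, N=2)          # -> [1, 3]
```

For ULDA the same ranking would be applied to χ_u instead of θ_v.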
Suggestion by ALDA
The Authority-LDA model (ALDA) considers both the
venues’ inherent aspects and the users’ commenting prefer-
ences. The detected topics are interdependently influenced
by the venue-topic distribution θ and the user-topic distribu-
tion χ. Given a query pair (v, u), the predicted topics depend

References

Blei, D. M.; Ng, A. Y.; and Jordan, M. I. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research 3:993–1022.
Griffiths, T. L., and Steyvers, M. 2004. Finding scientific topics. Proceedings of the National Academy of Sciences 101(suppl. 1):5228–5235.
Hu, M., and Liu, B. 2004. Mining and summarizing customer reviews. In Proceedings of KDD.
Weng, J.; Lim, E.-P.; Jiang, J.; and He, Q. 2010. TwitterRank: finding topic-sensitive influential twitterers. In Proceedings of WSDM.