Probabilistic Question Recommendation for Question Answering Communities

Mingcheng Qu¹, Guang Qiu¹, Xiaofei He², Cheng Zhang³, Hao Wu¹, Jiajun Bu¹, and Chun Chen¹

¹,² College of Computer Science and Technology, Zhejiang University, China
³ China Disabled Persons' Federation Information Center

¹ {qumingcheng, qiuguang, haowu, bjj, chenc}@zju.edu.cn, ² xiaofeihe@cad.zju.edu.cn, ³ zhangcheng@cdpf.org.cn
ABSTRACT
User-Interactive Question Answering (QA) communities such as Yahoo! Answers are growing in popularity. However, as these QA sites always have thousands of new questions posted daily, it is difficult for users to find the questions that are of interest to them. Consequently, this may delay the answering of the new questions. This gives rise to question recommendation techniques that help users locate interesting questions. In this paper, we adopt the Probabilistic Latent Semantic Analysis (PLSA) model for question recommendation and propose a novel metric to evaluate the performance of our approach. The experimental results show our recommendation approach is effective.
Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: information
filtering
General Terms
Algorithms, Design, Experimentation
Keywords
Question Recommendation, Question Answering, PLSA
1. INTRODUCTION
Nowadays, the User-Interactive Question Answering (QA) community has become a popular medium for online information seeking and knowledge sharing. For example, Yahoo! Answers¹, one of the largest QA communities nowadays, has approximately 23 million resolved questions, which are posted and answered by users. In addition, thousands of new questions are posted daily. However, with the exponential growth in data volume, it is becoming more and more time-consuming for users to find the questions that are of interest to them. As a result, the asker would have to wait for a long time before getting answers to his/her question.

∗ Supported by the National Key Technology R&D Program of China (No. 2008BAH26B02)
¹ http://answers.yahoo.com

Copyright is held by the author/owner(s).
WWW 2009, April 20–24, 2009, Madrid, Spain.
ACM 978-1-60558-487-4/09/04.
To help users find interesting questions and expedite the answering of new questions, some question recommendation attempts can be seen in QA communities like Yahoo! Answers, such as maintaining in user home pages a question list automatically generated based on features like posting time and ratings.

However, these systems are not typical recommender systems in essence, in that they do not take users' interest into account. In our work, we employ PLSA [3] to analyze a user's interest by investigating his/her previously asked questions, and accordingly generate fine-grained question recommendations. Meanwhile, because traditional evaluation metrics cannot meet the special requirements of QA communities, we also propose a novel metric to evaluate the recommendation performance. Experimental results show the PLSA model works effectively for recommending questions.
2. PLSA FOR QUESTION RECOMMENDATION
Aiming to improve a QA community's efficiency, question recommendation is to recommend questions to users who are interested in, and capable of answering, them. Therefore, the key to a question recommender is to capture users' interest.

In our work, we propose to analyze a user's interest by investigating his/her previously asked questions. In a typical question answering cycle, users always answer questions by first identifying their topics in an implicit way. The PLSA model [2], known for its ability to capture underlying topics, suits our problem well. The latent variables in PLSA denote the topics of the corresponding questions. Therefore, given a question collection, the joint distribution of users and their answered questions can be formulated as follows:
Pr(u, q) = \sum_{z} Pr(u|z) Pr(q|z) Pr(z)    (1)

where u ∈ {u_1, u_2, ..., u_n} are users, q ∈ {q_1, q_2, ..., q_m} are questions, and z ∈ {z_1, z_2, ..., z_k} are k latent topic variables, each capturing one topic.
However, in a real QA community, each user can only answer a small percentage of the overall questions, which means that most observations (u, q) are zero. To deal with this sparsity, we use a user-word aspect model instead, where the co-occurrence data represent the event that users type words in a particular question:
Pr(u, w) = \sum_{z} Pr(u|z) Pr(w|z) Pr(z)    (2)
where w ∈ {w_1, w_2, ..., w_l} are the words that the questions contain.
Note that the PLSA model allows multiple topics per user, reflecting the fact that each user has a variety of interests.
The log likelihood L of the question collection is then

L = \sum_{u,w} c(u, w) \log Pr(u, w)    (3)

where c(u, w) is the sum of word w's counts over all the questions that user u answers.
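For concreteness, a minimal sketch of building the c(u, w) counts follows; the record format and the naive whitespace tokenizer are our own illustrative assumptions, not details from the paper (a real system would add stopword removal and stemming).

from collections import defaultdict

def build_counts(answer_records):
    """Build c(u, w): the total count of word w over all questions user u answers.

    `answer_records` is a hypothetical list of (user_id, question_text) pairs,
    one pair per answer a user posted; tokenization is a lowercase whitespace
    split, purely for illustration.
    """
    c = defaultdict(int)
    for user_id, question_text in answer_records:
        for word in question_text.lower().split():
            c[(user_id, word)] += 1  # accumulate across all answered questions
    return c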
The model parameters can be learned using Expectation Maximization (EM), which finds a local maximum of the log likelihood of the question collection:
Pr(z|u, w) = \frac{Pr(u|z) Pr(w|z) Pr(z)}{\sum_{z'} Pr(u|z') Pr(w|z') Pr(z')}    (4)

Pr(u|z) ∝ \sum_{w} c(u, w) Pr(z|u, w)    (5)

Pr(w|z) ∝ \sum_{u} c(u, w) Pr(z|u, w)    (6)

Pr(z) ∝ \sum_{u,w} c(u, w) Pr(z|u, w)    (7)
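A dense-matrix EM sketch of updates (4)-(7) follows, assuming the counts have been packed into a NumPy array; it is a toy illustration (a real QA corpus would require sparse iteration over the observed (u, w) pairs), and all names are ours.

import numpy as np

def train_plsa(c, k, n_iter=50, seed=0):
    """EM for the user-word aspect model, eqs. (4)-(7).

    c: (n_users, n_words) array with c[u, w] = count of word w for user u.
    Returns Pr(u|z), Pr(w|z), Pr(z) with shapes (n_users, k), (n_words, k), (k,).
    """
    rng = np.random.default_rng(seed)
    n_users, n_words = c.shape
    # Random normalized initialization: each column (fixed z) sums to 1.
    p_u_z = rng.random((n_users, k)); p_u_z /= p_u_z.sum(axis=0)  # Pr(u|z)
    p_w_z = rng.random((n_words, k)); p_w_z /= p_w_z.sum(axis=0)  # Pr(w|z)
    p_z = np.full(k, 1.0 / k)                                     # Pr(z)
    for _ in range(n_iter):
        # E-step, eq. (4): Pr(z|u, w) over an (n_users, n_words, k) tensor.
        joint = p_u_z[:, None, :] * p_w_z[None, :, :] * p_z
        post = joint / joint.sum(axis=2, keepdims=True)
        # M-step, eqs. (5)-(7): weight by counts, then renormalize.
        weighted = c[:, :, None] * post          # c(u, w) Pr(z|u, w)
        p_u_z = weighted.sum(axis=1); p_u_z /= p_u_z.sum(axis=0)
        p_w_z = weighted.sum(axis=0); p_w_z /= p_w_z.sum(axis=0)
        p_z = weighted.sum(axis=(0, 1)); p_z /= p_z.sum()
    return p_u_z, p_w_z, p_z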
We then model recommending questions to users via the posterior probability Pr(u|q), that is, according to how likely it is that user u will access the corresponding question q. By Bayes' rule, Pr(u|q) ∝ Pr(u, q), which we calculate as the product of the probabilities of the words q contains, normalized by the question length:
Pr(u, q) = \left( \prod_{i} Pr(u, w_i) \right)^{1/|q|}    (8)
where w_i are the words in the question q, and |q| is the length of q. Consequently, a ranking list of users will be maintained for the question q according to the score. The recommendation can be conducted by recommending q to the top-n users.
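As a sketch, eq. (8) can be evaluated in log space (a geometric mean of the per-word joints from eq. (2)), reusing the factors returned by the EM sketch above; the function and variable names are again illustrative, not from the paper.

def score_question_for_user(p_u_z, p_w_z, p_z, u, word_ids):
    """Eq. (8) for one user: geometric mean of Pr(u, w_i) over q's words."""
    # Pr(u, w) = sum_z Pr(u|z) Pr(w|z) Pr(z), eq. (2), one value per word.
    p_uw = (p_u_z[u] * p_w_z[word_ids] * p_z).sum(axis=1)
    # product^(1/|q|) computed in log space; epsilon guards against log(0).
    return float(np.exp(np.log(p_uw + 1e-12).mean()))

def recommend(p_u_z, p_w_z, p_z, word_ids, top_n=10):
    """Rank all users for a new question and return the top-n user indices."""
    scores = np.array([score_question_for_user(p_u_z, p_w_z, p_z, u, word_ids)
                       for u in range(p_u_z.shape[0])])
    return np.argsort(scores)[::-1][:top_n]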
3. EXPERIMENTS AND RESULTS
To obtain the data sets for our experiments, we crawled questions from three categories of Yahoo! Answers: Astronomy, Global Warming, and Philosophy, and filtered out all questions that have only one answer. Questions in each data set are already labeled with their best answers. The data set statistics are listed in Table 1. For each category, a PLSA model is trained on 85% of the question set (questions and their corresponding answers), and the rest is used for testing. We empirically choose the number of latent variables k = 100.
In traditional recommender systems, precision can be used to evaluate performance. However, the precision metric does not suit the QA context: users in a QA community can access only a small portion of all questions. While the questions a user has accessed are those he/she is interested in, there is no guarantee that the questions he/she has not accessed are those he/she does not like.
Table 1: Yahoo! Answers data set.
Category          Questions   Answers    Users
Astronomy             8,920    49,297   16,391
Global Warming        8,330    82,788   22,015
Philosophy            9,477    84,953   22,822

Table 2: Comparison of recommending methods.
Category          Cosine    PLSA
Astronomy          0.621   0.648
Global Warming     0.627   0.674
Philosophy         0.634   0.709

Here we propose a new metric for the evaluation of question recommendation. For each question in the testing data, we recommend it only to the users who actually answered it, instead of to all possible users in the whole data set. The accuracy for this question is then defined according to the rank of the user who provided the best answer. Since the choice of the best answer is subject to the asker's personal viewpoint, one may question whether the best answer is objectively the best of all, or just reflects the asker's prejudice. Adamic et al. [1] examined questions from different categories in Yahoo! Answers and concluded that answers selected as best answers are mostly indeed the best answers to their questions. Therefore, in this paper we use the best answerer's rank as the ground truth of our evaluation metric:
accuracy = \frac{|R| - R_B}{|R| - 1}    (9)

where |R| is the length of the recommendation list, which equals the number of answers to the question, and R_B is the rank of the best answerer.
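In code, the metric is a one-liner; this sketch assumes a 1-based rank for the best answerer and relies on the single-answer questions having been filtered out (so |R| ≥ 2 and the denominator is nonzero).

def accuracy(ranked_answerers, best_answerer):
    """Eq. (9): 1.0 if the best answerer is ranked first, 0.0 if ranked last."""
    R = len(ranked_answerers)                        # |R|: number of answerers
    R_B = ranked_answerers.index(best_answerer) + 1  # 1-based rank of best answerer
    return (R - R_B) / (R - 1)                       # safe: |R| >= 2 after filtering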
As there is no previous work on recommending questions to users according to their interest in QA communities, for comparison we implement the cosine similarity between user and question vectors, with tf.idf weights:
s(u, q) = \frac{\sum_{w} tf.idf(u, w) \cdot tf.idf(q, w)}{\sqrt{\sum_{w} tf.idf(u, w)^2} \cdot \sqrt{\sum_{w} tf.idf(q, w)^2}}    (10)
where tf.idf(q, w) is the word w’s tf.idf weight in q, and
tf.idf(u, w) is the sum of w’s tf.idf weights in questions that
u posts/answers.
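A sketch of this baseline, assuming the tf.idf vectors are represented as sparse dicts mapping words to weights (our representation, not the paper's):

import math

def cosine_score(u_vec, q_vec):
    """Eq. (10): cosine similarity between user and question tf.idf vectors."""
    # Only words present in both vectors contribute to the dot product.
    num = sum(u_vec[w] * q_vec[w] for w in u_vec.keys() & q_vec.keys())
    denom = (math.sqrt(sum(x * x for x in u_vec.values()))
             * math.sqrt(sum(x * x for x in q_vec.values())))
    return num / denom if denom else 0.0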
Table 2 shows the experimental results. We observe that our PLSA model outperforms the cosine similarity measure on all three data sets. This shows PLSA can capture users' interest and recommend questions effectively.
4. CONCLUSION
In this paper, we introduce the novel problem of question recommendation in Question Answering communities and adopt the PLSA model to tackle it. We also propose a novel evaluation metric to measure the performance. The results show that the PLSA model can improve the quality of recommendation. In conclusion, our study opens a promising direction for question recommendation.
5. REFERENCES
[1] L. A. Adamic, J. Zhang, E. Bakshy, and M. S. Ackerman. Knowledge sharing and Yahoo Answers: everyone knows something. In WWW '08.
[2] T. Hofmann. Probabilistic latent semantic indexing. In SIGIR '99.
[3] A. Popescul, L. H. Ungar, D. M. Pennock, and S. Lawrence. Probabilistic models for unified collaborative and content-based recommendation in sparse-data environments. In UAI '01.