Learning the Latent Topics for Question Retrieval in Community QA

Open Access Proceedings Article

Li Cai, +3 more

- pp 273-281

TLDR

This paper proposes a topic model that incorporates category information into the process of discovering the latent topics in the content of questions, and combines the latent-topic-based semantic similarity with the translation-based language model in a unified framework for question retrieval.

Abstract:

Community-based Question Answering (cQA) is a popular online service where users can ask and answer questions on any topic. This paper is concerned with the problem of question retrieval. Question retrieval in cQA aims to find historical questions that are semantically equivalent or relevant to the queried questions. Although the translation-based language model (Xue et al., 2008) has achieved state-of-the-art performance for question retrieval, it ignores latent topic information when calculating the semantic similarity between questions. In this paper, we propose a topic model that incorporates category information into the process of discovering the latent topics in the content of questions. We then combine the latent-topic-based semantic similarity with the translation-based language model in a unified framework for question retrieval. Experiments are carried out on a real-world cQA data set from Yahoo! Answers. The results show that our proposed method can significantly improve the question retrieval performance of the translation-based language model.
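The combination described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how the two scores are combined, so the sketch assumes a linear interpolation with a hypothetical weight `mix`; the topic similarity is assumed to be the cosine between per-question topic distributions; and the translation-based score only loosely follows the general form of a translation-smoothed language model, with hypothetical smoothing parameters `lam` and `beta`. All function names, the `EPS` floor, the translation table and the toy archive are invented for illustration.

```python
import math
from collections import Counter

EPS = 1e-12  # assumed floor so log() stays defined for zero-probability words


def translation_lm(query, question, collection, trans_prob, lam=0.2, beta=0.5):
    """Length-normalized likelihood of the query under a translation-smoothed
    language model of an archived question (hypothetical parameterization)."""
    q_counts, c_counts = Counter(question), Counter(collection)
    log_prob = 0.0
    for w in query:
        # Maximum-likelihood estimate of w in the archived question.
        p_ml = q_counts[w] / len(question)
        # Probability of reaching w by "translating" a word t of the question.
        p_tr = sum(trans_prob.get((w, t), 0.0) * n / len(question)
                   for t, n in q_counts.items())
        p_mx = (1 - beta) * p_ml + beta * p_tr
        # Smooth with the collection-wide background model.
        p = (1 - lam) * p_mx + lam * c_counts[w] / len(collection)
        log_prob += math.log(max(p, EPS))
    # Geometric mean per word, so the score lies in [0, 1].
    return math.exp(log_prob / len(query))


def cosine(u, v):
    """Cosine similarity between two topic distributions."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank(query, query_topics, archive, trans_prob, mix=0.5):
    """Rank archived questions by an assumed linear mix of both scores."""
    collection = [w for words, _ in archive.values() for w in words]
    scored = []
    for qid, (words, topics) in archive.items():
        lm = translation_lm(query, words, collection, trans_prob)
        sim = cosine(query_topics, topics)
        scored.append((qid, mix * lm + (1 - mix) * sim))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy archive: question id -> (tokens, topic distribution). Invented data.
archive = {
    "q1": (["how", "to", "fix", "laptop", "screen"], [0.9, 0.1]),
    "q2": (["best", "pasta", "recipe"], [0.1, 0.9]),
}
# Invented word-to-word translation probabilities P(query word | archive word).
trans_prob = {
    ("repair", "fix"): 0.6,
    ("notebook", "laptop"): 0.7,
    ("display", "screen"): 0.5,
}
ranked = rank(["repair", "notebook", "display"], [0.8, 0.2], archive, trans_prob)
print(ranked)
```

In this toy setup the query shares no words with either archived question, so a purely lexical match would score both at zero; the translation table and the topic distributions are what separate them. In practice the topic distributions would come from a trained topic model and the translation probabilities from a learned translation table, neither of which is shown here.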

Citations

Proceedings Article

CQArank: jointly model topics and expertise in community question answering

Liu Yang, +6 more

TL;DR: This work proposes the Topic Expertise Model (TEM), a novel probabilistic generative model with a GMM hybrid, to jointly model topics and expertise by integrating a textual content model and link structure analysis, and proposes CQARank to measure user interests and expertise scores under different topics.

Proceedings Article

Learning Continuous Word Embedding with Metadata for Question Retrieval in Community Question Answering

Guangyou Zhou, +3 more

TL;DR: This paper proposes to learn continuous word embeddings with metadata of category information within cQA pages for question retrieval, using the Fisher kernel framework to deal with the variable size of word embedding vectors.

Proceedings Article

Question-answer topic model for question retrieval in community question answering

Zongcheng Ji, +3 more

TL;DR: A novel Question-Answer Topic Model (QATM) is proposed to learn the latent topics aligned across the question-answer pairs to alleviate the lexical gap problem, with the assumption that a question and its paired answer share the same topic distribution.

Proceedings Article

Question Retrieval with High Quality Answers in Community Question Answering

Kai Zhang, +4 more

TL;DR: A topic-based language model is proposed that matches questions not only on a term level but also on a topic level, and that can significantly outperform state-of-the-art retrieval models in CQA.

Proceedings Article

Learning Hybrid Representations to Retrieve Semantically Equivalent Questions

Cicero Nogueira dos Santos, +3 more

TL;DR: Retrieving similar questions in online QA (2) BOW-CNN is more robust than the pure CNN for long texts.

References

Journal Article

Latent dirichlet allocation

David M. Blei, +2 more

- 01 Mar 2003 -

Journal of Machine Learning Research

TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.

Proceedings Article

Latent Dirichlet Allocation

David M. Blei, +2 more

TL;DR: This paper proposed a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).

Journal Article

A vector space model for automatic indexing

Gerard Salton, +2 more

- 01 Nov 1975 -

Communications of The ACM

TL;DR: An approach based on space density computations is used to choose an optimum indexing vocabulary for a collection of documents, demonstrating the usefulness of the model.

Journal Article

Finding scientific topics

Thomas L. Griffiths, +1 more

- 06 Apr 2004 -

Proceedings of the National Academy of S...

TL;DR: A generative model for documents, introduced by Blei, Ng, and Jordan, is described, and a Markov chain Monte Carlo algorithm is presented for inference in this model. The algorithm is used to analyze abstracts from PNAS, with Bayesian model selection establishing the number of topics.

Journal Article

The mathematics of statistical machine translation: parameter estimation

Peter Fitzhugh Brown, +3 more

- 01 Jun 1993 -

Computational Linguistics

TL;DR: The authors describe a series of five statistical models of the translation process and give algorithms for estimating the parameters of these models given a set of pairs of sentences that are translations of one another.

A syntactic tree matching approach to finding similar questions in community-based qa services

Kai Wang, +2 more

Bridging the lexical chasm: statistical approaches to answer-finding

Adam L. Berger, +4 more

Related Papers (5)

Retrieval models for question and answer archives

Finding similar questions in large question and answer archives

Latent dirichlet allocation

A syntactic tree matching approach to finding similar questions in community-based qa services

Bridging the lexical chasm: statistical approaches to answer-finding