Open Access · Proceedings Article
Learning the Latent Topics for Question Retrieval in Community QA
Li Cai, Guangyou Zhou, Kang Liu, Jun Zhao +3 more
- pp 273-281
TLDR
This paper proposes a topic model that incorporates category information into the discovery of latent topics in question content, and combines latent-topic-based semantic similarity with the translation-based language model in a unified framework for question retrieval.
Abstract:
Community-based Question Answering (cQA) is a popular online service where users can ask and answer questions on any topic. This paper is concerned with the problem of question retrieval. Question retrieval in cQA aims to find historical questions that are semantically equivalent or relevant to the queried questions. Although the translation-based language model (Xue et al., 2008) has achieved state-of-the-art performance for question retrieval, it ignores latent topic information when calculating the semantic similarity between questions. In this paper, we propose a topic model that incorporates category information into the process of discovering the latent topics in the content of questions. We then combine the latent-topic-based semantic similarity with the translation-based language model in a unified framework for question retrieval. Experiments are carried out on a real-world cQA data set from Yahoo! Answers. The results show that our proposed method can significantly improve the question retrieval performance of the translation-based language model.
Citations
Proceedings Article
CQArank: jointly model topics and expertise in community question answering
TL;DR: This work proposed Topic Expertise Model (TEM), a novel probabilistic generative model with GMM hybrid, to jointly model topics and expertise by integrating textual content model and link structure analysis, and proposed CQARank to measure user interests and expertise score under different topics.
Proceedings Article
Learning Continuous Word Embedding with Metadata for Question Retrieval in Community Question Answering
TL;DR: This paper proposes to learn continuous word embeddings with metadata of category information within cQA pages for question retrieval, using the Fisher kernel framework to deal with the variable size of word embedding vectors.
Proceedings Article
Question-answer topic model for question retrieval in community question answering
TL;DR: A novel Question-Answer Topic Model (QATM) is proposed to learn the latent topics aligned across the question-answer pairs to alleviate the lexical gap problem, with the assumption that a question and its paired answer share the same topic distribution.
Proceedings Article
Question Retrieval with High Quality Answers in Community Question Answering
TL;DR: A topic-based language model is proposed that matches questions not only on a term level but also on a topic level, and that can significantly outperform state-of-the-art retrieval models in CQA.
Proceedings Article
Learning Hybrid Representations to Retrieve Semantically Equivalent Questions
TL;DR: For retrieving similar questions in online QA, BOW-CNN is more robust than the pure CNN for long texts.
References
Journal Article
Latent dirichlet allocation
TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
Proceedings Article
Latent Dirichlet Allocation
TL;DR: This paper proposed a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).
Journal Article
A vector space model for automatic indexing
Gerard Salton, A. Wong, C. S. Yang +2 more
TL;DR: An approach based on space density computations is used to choose an optimum indexing vocabulary for a collection of documents, demonstrating the usefulness of the model.
Journal Article
Finding scientific topics
TL;DR: A generative model for documents is described, introduced by Blei, Ng, and Jordan, and a Markov chain Monte Carlo algorithm is presented for inference in this model, which is used to analyze abstracts from PNAS by using Bayesian model selection to establish the number of topics.
Journal Article
The mathematics of statistical machine translation: parameter estimation
TL;DR: The authors describe a series of five statistical models of the translation process and give algorithms for estimating the parameters of these models given a set of pairs of sentences that are translations of one another.