Journal ArticleDOI

Understanding and Predicting Question Subjectivity in Social Question and Answering

TL;DR: This paper models the task of intent detection as a binary classification problem, defining two classes for each question, subjective and objective, and finds that the two types of questions exhibit very different characteristics.
Abstract: The explosive popularity of social networking sites has provided an additional venue for online information seeking. By posting questions in their status updates, more and more people are turning to social networks to fulfill their information needs. Given that understanding individuals’ information needs could improve the performance of question answering, in this paper we model the task of intent detection as a binary classification problem; thus, for each question, two classes are defined: subjective and objective. We use a comprehensive set of lexical, syntactical, and contextual features to build the classifier, and the experimental results show satisfactory classification performance. By applying the classifier on a larger dataset, we then present in-depth analyses to compare subjective and objective questions in terms of the way they are asked and answered. We find that the two types of questions exhibit very different characteristics, and further validate the expected benefits of differentiating questions according to their subjectivity orientations.
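The binary classification setup described above can be sketched with a small Naive Bayes text classifier. The tiny training set below is an illustrative assumption, not the paper's dataset, and a real system would add the lexical, syntactic, and contextual features the authors describe rather than plain word counts.

```python
# Minimal sketch of a subjective-vs-objective question classifier.
# Training data and features are illustrative assumptions only.
import math
from collections import Counter, defaultdict

TRAIN = [
    ("what do you think of the new phone", "subjective"),
    ("which laptop should i buy any opinions", "subjective"),
    ("is this movie worth watching", "subjective"),
    ("what time does the library close", "objective"),
    ("how many ounces are in a pound", "objective"),
    ("when was the eiffel tower built", "objective"),
]

def train(data):
    word_counts = defaultdict(Counter)  # class label -> word frequencies
    class_counts = Counter()            # class label -> document count
    vocab = set()
    for text, label in data:
        words = text.split()
        word_counts[label].update(words)
        class_counts[label] += 1
        vocab.update(words)
    return word_counts, class_counts, vocab

def predict(text, word_counts, class_counts, vocab):
    total_docs = sum(class_counts.values())
    best_label, best_lp = None, float("-inf")
    for label in class_counts:
        # log prior + Laplace-smoothed log likelihood of each word
        lp = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.split():
            lp += math.log((word_counts[label][word] + 1) /
                           (total_words + len(vocab)))
        if lp > best_lp:
            best_label, best_lp = label, lp
    return best_label

model = train(TRAIN)
print(predict("any opinions on this camera", *model))
```

Opinion-bearing words such as "opinions" appear only in the subjective training questions, so they pull the prediction toward the subjective class.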
Citations
Journal ArticleDOI
TL;DR: The performance of the proposed hybrid expertise retrieval system is found to be superior to that of 18 state-of-the-art algorithms on these four real-world datasets.
Abstract: In this paper, we propose a hybrid expertise retrieval system for community question answering services. The proposed system consists of two segments: a text-based segment and a network-based segment. For a given question, the text-based segment estimates users’ knowledge by introducing two new concepts: question hardness and question–answerer association. The network-based segment, moreover, incorporates users’ relative performances into the network structure. We denote the outputs of these two segments as the knowledge score and the authority score, respectively. We aggregate these two scores using a fusion technique to quantify the expertise of a given user for a given question. We have generated four datasets by downloading questions and answers from Yahoo! Answers. The performance of the proposed system is found to be superior to that of 18 state-of-the-art algorithms on these four real-world datasets.
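The two-score fusion step can be sketched as follows. The convex-combination rule, the weight `alpha`, and the example scores are all assumptions for illustration; the abstract does not specify the paper's actual fusion technique.

```python
# Sketch of fusing a text-based knowledge score with a network-based
# authority score into a single expertise score per candidate answerer.
# The linear weighting and example scores are illustrative assumptions.

def expertise(knowledge, authority, alpha=0.6):
    """Convex combination of the two segment scores (alpha is assumed)."""
    return alpha * knowledge + (1 - alpha) * authority

candidates = {
    "user_a": (0.9, 0.4),   # strong answer text, weaker network standing
    "user_b": (0.5, 0.95),  # average text, very authoritative in the network
}
ranked = sorted(candidates, key=lambda u: expertise(*candidates[u]),
                reverse=True)
print(ranked)
```

With these assumed scores, the text-heavy weighting ranks `user_a` ahead of `user_b`; shifting `alpha` toward 0 would favor network authority instead.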

18 citations

Journal ArticleDOI
TL;DR: A morpheme growth model is presented that enhances the memories of key elements in questions, then extracts the "label-indicators" and germinates expansion vectors around them; it serves as a useful system for the automatic understanding of patient questions.

10 citations

Journal ArticleDOI
20 Mar 2018
TL;DR: This study proposes a conceptual framework for comparison between the web and mobile platforms of social Q&A from the user’s perspective; its comparative results could give social Q&A service providers useful information about users’ differences between the web and mobile platforms of social Q&A services.
Abstract: As an increasing number of users acquire information across the web and mobile platforms for social question and answering (Q&A), this study compares the two platforms from the user’s perspective: mobile users perceive higher affinity with Zhihu.com than web users, and higher information-seeking intention than web users do. Regarding the theoretical aspect, this study proposes a conceptual framework for comparison between the web and mobile platforms of social Q&A from the user’s perspective. Regarding the practical aspect, the comparative results of this study could give social Q&A service providers useful information about users’ differences between the web and mobile platforms of social Q&A services.

10 citations

01 Jan 2017
TL;DR: Results from this study reveal the potential research issues, namely morphology analysis, question classification, and term-weighting algorithms for question classification in a question answering framework.
Abstract: A question answering system can automatically provide an answer to a question posed by a human in natural language. Such a system consists of question analysis, document processing, and answer extraction modules. The question analysis module translates the query into a form that can be processed by the document processing module. Document processing is a technique for identifying candidate documents containing answers relevant to the user query. The answer extraction module then receives the set of passages from the document processing module and determines the best answers for the user. The challenge in optimizing a question answering framework is to increase the performance of all modules in the framework; modules whose performance has not been optimized lead to less accurate answers. Based on these issues, the objective of this study is to review the current state of question analysis, document processing, and answer extraction techniques. Results from this study reveal the potential research issues, namely morphology analysis, question classification, and term-weighting algorithms for question classification.
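The three-module pipeline described above can be sketched end to end. The toy corpus, the stopword list, and the keyword-overlap scoring are illustrative assumptions, not a production QA system.

```python
# Sketch of the QA pipeline: question analysis -> document processing
# -> answer extraction. Corpus and scoring are illustrative assumptions.

STOPWORDS = {"what", "is", "the", "of", "a", "an", "who", "when", "where"}

def analyze_question(question):
    """Question analysis: reduce the question to content-word query terms."""
    return [w for w in question.lower().strip("?").split()
            if w not in STOPWORDS]

def process_documents(query, corpus):
    """Document processing: rank candidate passages by query-term overlap."""
    scored = [(sum(w in p.lower() for w in query), p) for p in corpus]
    return [p for s, p in sorted(scored, reverse=True) if s > 0]

def extract_answer(passages):
    """Answer extraction: return the top-ranked passage as the answer."""
    return passages[0] if passages else None

corpus = [
    "The capital of France is Paris.",
    "Mount Everest is the highest mountain on Earth.",
]
query = analyze_question("What is the capital of France?")
print(extract_answer(process_documents(query, corpus)))
```

Each function stands in for one module; in a real system, question analysis would also classify the question type, and document processing would use an inverted index rather than a linear scan.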

5 citations


Cites background from "Understanding and Predicting Questi..."

  • ...Some of the areas that have implemented QAS such as social media ([10],[18],[19]), geographic ([5],[20–23]), geology [24], software engineering ([25],[26]),...

    [...]

01 Jan 2017
TL;DR: The SLR found that the majority of studies use a machine learning approach to identify and learn subjective text, since subjectivity analysis is naturally viewed as a classification problem; this approach outperformed the others, though its performance is currently only at a satisfactory level.
Abstract: Subjectivity analysis determines the existence of subjectivity in text using subjective clues. It is the first task in the opinion mining process. The difference between subjectivity analysis and polarity determination is that the latter processes subjective text to determine its orientation as positive or negative. Many techniques have been used to solve the problem of segregating subjective and objective text. This paper used a systematic literature review (SLR) to compile the studies undertaken in subjectivity analysis. An SLR is a literature review that collects and critically analyses multiple studies to answer research questions. Eight research questions were drawn up for this purpose. Information such as technique, corpus, subjective-clue representation, and performance was extracted from 97 articles known as primary studies. This information was analysed to identify the strengths and weaknesses of each technique, the elements affecting performance, and the elements missing from subjectivity analysis. The SLR found that the majority of the studies use a machine learning approach to identify and learn subjective text, because subjectivity analysis is naturally viewed as a classification problem. This approach outperformed the others, though its performance is currently only at a satisfactory level. Therefore, more studies are needed to improve the performance of subjectivity analysis.

4 citations


Additional excerpts

  • ...Table 4: Selected primary studies Year Primary Studies 2007 [15][16][17] 2008 [18][19][20][21][22][23][24][25] 2009 [26][27][28][29] 2010 [12][30][31][32][33][34] 2011 [35][36][37][38][39][40][41][42] 2012 [43][44][45][46][47][48][49][50][49][51] 2013 [52][53][54][55][56][57][58][59][60][61][62][63][64][65][66][67][68] [69][70][71][72][73] 2014 [74][75][76][77][78][79][80][81][82][83][84][85][86][87] 2015 [88][89][90][91][92][93][94][95][96][97][98][99][100][101][102][103] 2016 [104][105][106][107][108] 2017 [7]...

    [...]

References
Journal ArticleDOI
TL;DR: This paper provides an introduction to the WEKA workbench, reviews the history of the project, and, in light of the recent 3.6 stable release, briefly discusses what has been added since the last stable version (Weka 3.4) released in 2003.
Abstract: More than twelve years have elapsed since the first public release of WEKA. In that time, the software has been rewritten entirely from scratch, evolved substantially, and now accompanies a text on data mining [35]. These days, WEKA enjoys widespread acceptance in both academia and business, has an active community, and has been downloaded more than 1.4 million times since being placed on SourceForge in April 2000. This paper provides an introduction to the WEKA workbench, reviews the history of the project, and, in light of the recent 3.6 stable release, briefly discusses what has been added since the last stable version (Weka 3.4) released in 2003.

19,603 citations


"Understanding and Predicting Questi..." refers methods in this paper

  • ...We experimented with Naïve Bayes, support vector machines (SVMs) (sequential minimal optimization), and decision trees (J48) as implemented in WEKA [29]....

    [...]

  • ...using the information gain criterion [28] as implemented in WEKA [29]....

    [...]

  • ...First, due to the large number of features extracted, before conducting the classification, we performed feature selection using the information gain criterion [28] as implemented in WEKA [29]....

    [...]

Journal ArticleDOI
TL;DR: An algorithm for suffix stripping is described, which has been implemented as a short, fast program in BCPL, and performs slightly better than a much more elaborate system with which it has been compared.
Abstract: The automatic removal of suffixes from words in English is of particular interest in the field of information retrieval. An algorithm for suffix stripping is described, which has been implemented as a short, fast program in BCPL. Although simple, it performs slightly better than a much more elaborate system with which it has been compared. It effectively works by treating complex suffixes as compounds made up of simple suffixes, and removing the simple suffixes in a number of steps. In each step the removal of the suffix is made to depend upon the form of the remaining stem, which usually involves a measure of its syllable length.
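The stepwise suffix-removal idea can be sketched as below. The rule set is a small illustrative subset, not the full algorithm: the real Porter stemmer conditions each removal on a syllable-based measure of the remaining stem and applies five ordered rule steps.

```python
# Simplified sketch of stepwise suffix stripping in the spirit of the
# algorithm described above. Rules are an illustrative subset only;
# `min_stem` crudely stands in for the stem-measure condition.

RULE_STEPS = [
    # Step 1: plural-like endings ("ss" -> "ss" blocks bare-"s" removal).
    [("sses", "ss"), ("ies", "i"), ("ss", "ss"), ("s", "")],
    # Step 2: compound suffixes reduced to simpler ones.
    [("ational", "ate"), ("ization", "ize"), ("fulness", "ful")],
    # Step 3: simple suffixes removed outright.
    [("ness", ""), ("ful", ""), ("ative", "")],
]

def stem(word, min_stem=2):
    for step in RULE_STEPS:
        for suffix, replacement in step:
            if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
                word = word[: len(word) - len(suffix)] + replacement
                break  # only the first matching rule in each step applies
    return word

print(stem("hopefulness"))  # "fulness" -> "ful", then "ful" -> ""
print(stem("relational"))   # "ational" -> "ate"
```

This shows the paper's key idea of treating a complex suffix as a compound of simple suffixes removed over several steps: "hopefulness" loses "fulness" to "ful" in one step and "ful" entirely in the next, yielding "hope".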

7,572 citations

Proceedings Article
08 Jul 1997
TL;DR: This paper finds strong correlations between the DF, IG, and CHI values of a term, and suggests that DF thresholding, the simplest method with the lowest computational cost, can be reliably used instead of IG or CHI when the computation of those measures is too expensive.
Abstract: This paper is a comparative study of feature selection methods in statistical learning of text categorization. The focus is on aggressive dimensionality reduction. Five methods were evaluated, including term selection based on document frequency (DF), information gain (IG), mutual information (MI), a χ² test (CHI), and term strength (TS). We found IG and CHI most effective in our experiments. Using IG thresholding with a k-nearest-neighbor classifier on the Reuters corpus, removal of up to 98% of unique terms actually yielded an improved classification accuracy (measured by average precision). DF thresholding performed similarly. Indeed, we found strong correlations between the DF, IG, and CHI values of a term. This suggests that DF thresholding, the simplest method with the lowest cost in computation, can be reliably used instead of IG or CHI when the computation of these measures is too expensive. TS compares favorably with the other methods with up to 50% vocabulary reduction, but is not competitive at higher vocabulary reduction levels. In contrast, MI had relatively poor performance due to its bias towards favoring rare terms and its sensitivity to probability estimation errors.
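Information gain, the criterion the study found most effective, can be computed as the drop in label entropy once a term's presence is known. The tiny labeled corpus below is an illustrative assumption.

```python
# Sketch of ranking terms by information gain for feature selection.
# IG(t) = H(labels) - H(labels | t present/absent). Corpus is a toy example.
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n)
                for c in Counter(labels).values())

def information_gain(term, docs, labels):
    present = [lab for doc, lab in zip(docs, labels) if term in doc]
    absent = [lab for doc, lab in zip(docs, labels) if term not in doc]
    n = len(labels)
    conditional = sum(len(part) / n * entropy(part)
                      for part in (present, absent) if part)
    return entropy(labels) - conditional

docs = [
    {"goal", "match", "team"}, {"team", "coach"},         # sports docs
    {"stock", "market", "team"}, {"market", "earnings"},  # finance docs
]
labels = ["sports", "sports", "finance", "finance"]

ranked = sorted({t for d in docs for t in d},
                key=lambda t: information_gain(t, docs, labels), reverse=True)
print(ranked[0])
```

Here "market" perfectly separates the two classes (IG = 1 bit), so it ranks first; a term like "team" that appears in both classes gains far less. Thresholding on such scores is exactly the aggressive vocabulary reduction the paper studies.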

5,366 citations


"Understanding and Predicting Questi..." refers methods in this paper

  • ...using the information gain criterion [28] as implemented in WEKA [29]....

    [...]

Proceedings ArticleDOI
27 May 2003
TL;DR: A new part-of-speech tagger is presented that demonstrates the following ideas: explicit use of both preceding and following tag contexts via a dependency network representation, broad use of lexical features, and effective use of priors in conditional loglinear models.
Abstract: We present a new part-of-speech tagger that demonstrates the following ideas: (i) explicit use of both preceding and following tag contexts via a dependency network representation, (ii) broad use of lexical features, including jointly conditioning on multiple consecutive words, (iii) effective use of priors in conditional loglinear models, and (iv) fine-grained modeling of unknown word features. Using these ideas together, the resulting tagger gives a 97.24% accuracy on the Penn Treebank WSJ, an error reduction of 4.4% on the best previous single automatically learned tagging result.
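The tagger's core idea, conditioning on both the preceding and the following context, can be illustrated with a deliberately tiny lexicon-based sketch. The lexicon and the two disambiguation rules are assumptions for illustration; the actual system uses a learned cyclic dependency network, not hand-written rules.

```python
# Toy illustration of why both left and right context matter for POS
# tagging. Lexicon and rules are illustrative assumptions only.

LEXICON = {
    "the": ["DT"], "dog": ["NN"], "is": ["VBZ"], "rusty": ["JJ"],
    "can": ["MD", "NN"],  # ambiguous: modal verb or noun
    "run": ["VB", "NN"],  # ambiguous: verb or noun
}

def disambiguate(options, prev_w, next_w):
    # Preceding context: right after a determiner, prefer the noun reading.
    if prev_w is not None and LEXICON.get(prev_w) == ["DT"]:
        return "NN"
    # Following context: right before a possible verb, prefer the modal.
    if next_w is not None and "VB" in LEXICON.get(next_w, []) \
            and "MD" in options:
        return "MD"
    return options[0]

def tag(words):
    tags = []
    for i, w in enumerate(words):
        options = LEXICON.get(w, ["NN"])
        if len(options) == 1:
            tags.append(options[0])
        else:
            prev_w = words[i - 1] if i > 0 else None
            next_w = words[i + 1] if i + 1 < len(words) else None
            tags.append(disambiguate(options, prev_w, next_w))
    return tags

print(tag("the dog can run".split()))   # "can" resolved by following verb
print(tag("the can is rusty".split()))  # "can" resolved by preceding "the"
```

The same ambiguous word "can" gets different tags in the two sentences, one decided by its right neighbor and one by its left, which is the bidirectional dependency the paper's network representation makes explicit.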

3,466 citations


"Understanding and Predicting Questi..." refers methods in this paper

  • ...To tag the POS of each tweet, we used the Stanford tagger [27]....

    [...]

Proceedings ArticleDOI
28 Mar 2011
TL;DR: There are measurable differences in the way messages propagate that can be used to classify them automatically as credible or not credible, with precision and recall in the range of 70% to 80%.
Abstract: We analyze the information credibility of news propagated through Twitter, a popular microblogging service. Previous research has shown that most of the messages posted on Twitter are truthful, but the service is also used to spread misinformation and false rumors, often unintentionally. In this paper we focus on automatic methods for assessing the credibility of a given set of tweets. Specifically, we analyze microblog postings related to "trending" topics and classify them as credible or not credible, based on features extracted from them. We use features from users' posting and re-posting ("re-tweeting") behavior, from the text of the posts, and from citations to external sources. We evaluate our methods using a significant number of human assessments about the credibility of items on a recent sample of Twitter postings. Our results show that there are measurable differences in the way messages propagate that can be used to classify them automatically as credible or not credible, with precision and recall in the range of 70% to 80%.
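The feature-extraction step can be sketched as below. The tweet schema, the feature names, and the toy decision rule are all illustrative assumptions standing in for the paper's trained classifier.

```python
# Sketch of aggregating per-topic propagation and text features before
# credibility classification. Schema, features, and the threshold rule
# are illustrative assumptions, not the paper's actual model.

def topic_features(tweets):
    """tweets: list of dicts with 'text', 'retweets', 'has_url' (assumed)."""
    n = len(tweets)
    return {
        "avg_retweets": sum(t["retweets"] for t in tweets) / n,
        "frac_with_url": sum(t["has_url"] for t in tweets) / n,
        "frac_question": sum("?" in t["text"] for t in tweets) / n,
    }

def looks_credible(feats):
    # Toy rule standing in for the trained classifier: topics whose tweets
    # cite external sources and rarely question the claim score higher.
    return feats["frac_with_url"] >= 0.5 and feats["frac_question"] < 0.5

tweets = [
    {"text": "Official report confirms it", "retweets": 12, "has_url": True},
    {"text": "Confirmed by the agency", "retweets": 8, "has_url": True},
    {"text": "Is this real?", "retweets": 1, "has_url": False},
]
feats = topic_features(tweets)
print(looks_credible(feats))
```

In the full system these aggregated features would feed a supervised classifier trained on human credibility judgments rather than a fixed threshold.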

2,123 citations