Journal ArticleDOI

Understanding and Predicting Question Subjectivity in Social Question and Answering

TL;DR: This paper models the task of intent detection as a binary classification problem, defining two classes for each question, subjective and objective, and finds that the two types of questions exhibit very different characteristics.
Abstract: The explosive popularity of social networking sites has provided an additional venue for online information seeking. By posting questions in their status updates, more and more people are turning to social networks to fulfill their information needs. Given that understanding individuals’ information needs could improve the performance of question answering, in this paper we model the task of intent detection as a binary classification problem; thus, for each question, two classes are defined: subjective and objective. We use a comprehensive set of lexical, syntactical, and contextual features to build the classifier, and the experimental results show satisfactory classification performance. By applying the classifier on a larger dataset, we then present in-depth analyses to compare subjective and objective questions in terms of the way they are asked and answered. We find that the two types of questions exhibit very different characteristics, and further validate the expected benefits of differentiating questions according to their subjectivity orientations.
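The binary classification setup described above can be sketched with a small Naive Bayes text classifier. The tiny training set below is an illustrative assumption, not the paper's dataset, and a real system would add the lexical, syntactic, and contextual features the authors describe rather than plain word counts.

```python
# Minimal sketch of a subjective-vs-objective question classifier.
# Training data and features are illustrative assumptions only.
import math
from collections import Counter, defaultdict

TRAIN = [
    ("what do you think of the new phone", "subjective"),
    ("which laptop should i buy any opinions", "subjective"),
    ("is this movie worth watching", "subjective"),
    ("what time does the library close", "objective"),
    ("how many ounces are in a pound", "objective"),
    ("when was the eiffel tower built", "objective"),
]

def train(data):
    word_counts = defaultdict(Counter)  # class label -> word frequencies
    class_counts = Counter()            # class label -> document count
    vocab = set()
    for text, label in data:
        words = text.split()
        word_counts[label].update(words)
        class_counts[label] += 1
        vocab.update(words)
    return word_counts, class_counts, vocab

def predict(text, word_counts, class_counts, vocab):
    total_docs = sum(class_counts.values())
    best_label, best_lp = None, float("-inf")
    for label in class_counts:
        # log prior + Laplace-smoothed log likelihood of each word
        lp = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.split():
            lp += math.log((word_counts[label][word] + 1) /
                           (total_words + len(vocab)))
        if lp > best_lp:
            best_label, best_lp = label, lp
    return best_label

model = train(TRAIN)
print(predict("any opinions on this camera", *model))
```

Opinion-bearing words such as "opinions" appear only in the subjective training questions, so they pull the prediction toward the subjective class.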
Citations
Journal ArticleDOI
TL;DR: The performance of the proposed hybrid expertise retrieval system is found to be superior to that of 18 state-of-the-art algorithms on these four real-world datasets.
Abstract: In this paper, we propose a hybrid expertise retrieval system for community question answering services. The proposed system consists of two segments: a text-based segment and a network-based segment. For a given question, the text-based segment estimates users’ knowledge by introducing two new concepts: question hardness and question–answerer association. The network-based segment, moreover, incorporates users’ relative performances into the network structure. We denote the outputs of these two segments as the knowledge score and the authority score, respectively. We aggregate these two scores using a fusion technique to quantify the expertise of a given user for a given question. We have generated four datasets by downloading questions and answers from Yahoo! Answers. The performance of the proposed system is found to be superior to that of 18 state-of-the-art algorithms on these four real-world datasets.
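The two-score fusion step can be sketched as follows. The convex-combination rule, the weight `alpha`, and the example scores are all assumptions for illustration; the abstract does not specify the paper's actual fusion technique.

```python
# Sketch of fusing a text-based knowledge score with a network-based
# authority score into a single expertise score per candidate answerer.
# The linear weighting and example scores are illustrative assumptions.

def expertise(knowledge, authority, alpha=0.6):
    """Convex combination of the two segment scores (alpha is assumed)."""
    return alpha * knowledge + (1 - alpha) * authority

candidates = {
    "user_a": (0.9, 0.4),   # strong answer text, weaker network standing
    "user_b": (0.5, 0.95),  # average text, very authoritative in the network
}
ranked = sorted(candidates, key=lambda u: expertise(*candidates[u]),
                reverse=True)
print(ranked)
```

With these assumed scores, the text-heavy weighting ranks `user_a` ahead of `user_b`; shifting `alpha` toward 0 would favor network authority instead.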

18 citations

Journal ArticleDOI
TL;DR: A morpheme growth model is presented that enhances the memories of key elements in questions, then extracts the "label-indicators" and germinates expansion vectors around them; it serves as a useful system for the automatic understanding of patient questions.

10 citations

Journal ArticleDOI
20 Mar 2018
TL;DR: This study proposes a conceptual framework for comparison between the web and mobile platforms of social Q&A from the user’s perspective; its comparative results could give social Q&A service providers useful information about users’ differences between the web and mobile platforms of social Q&A services.
Abstract: As an increasing number of users acquire information across the web and mobile platforms for social question and answering (Q&A), this study compares the two platforms from the user’s perspective: mobile users perceive higher affinity with Zhihu.com than web users, and higher information-seeking intention than web users do. Regarding the theoretical aspect, this study proposes a conceptual framework for comparison between the web and mobile platforms of social Q&A from the user’s perspective. Regarding the practical aspect, the comparative results of this study could give social Q&A service providers useful information about users’ differences between the web and mobile platforms of social Q&A services.

10 citations

01 Jan 2017
TL;DR: Results from this study reveal the potential research issues, namely morphology analysis, question classification, and term-weighting algorithms for question classification in a question answering framework.
Abstract: A question answering system can automatically provide an answer to a question posed by a human in natural language. Such a system consists of question analysis, document processing, and answer extraction modules. The question analysis module translates the query into a form that can be processed by the document processing module. Document processing is a technique for identifying candidate documents containing answers relevant to the user query. The answer extraction module then receives the set of passages from the document processing module and determines the best answers for the user. The challenge in optimizing a question answering framework is to increase the performance of all modules in the framework; modules whose performance has not been optimized lead to less accurate answers. Based on these issues, the objective of this study is to review the current state of question analysis, document processing, and answer extraction techniques. Results from this study reveal the potential research issues, namely morphology analysis, question classification, and term-weighting algorithms for question classification.
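The three-module pipeline described above can be sketched end to end. The toy corpus, the stopword list, and the keyword-overlap scoring are illustrative assumptions, not a production QA system.

```python
# Sketch of the QA pipeline: question analysis -> document processing
# -> answer extraction. Corpus and scoring are illustrative assumptions.

STOPWORDS = {"what", "is", "the", "of", "a", "an", "who", "when", "where"}

def analyze_question(question):
    """Question analysis: reduce the question to content-word query terms."""
    return [w for w in question.lower().strip("?").split()
            if w not in STOPWORDS]

def process_documents(query, corpus):
    """Document processing: rank candidate passages by query-term overlap."""
    scored = [(sum(w in p.lower() for w in query), p) for p in corpus]
    return [p for s, p in sorted(scored, reverse=True) if s > 0]

def extract_answer(passages):
    """Answer extraction: return the top-ranked passage as the answer."""
    return passages[0] if passages else None

corpus = [
    "The capital of France is Paris.",
    "Mount Everest is the highest mountain on Earth.",
]
query = analyze_question("What is the capital of France?")
print(extract_answer(process_documents(query, corpus)))
```

Each function stands in for one module; in a real system, question analysis would also classify the question type, and document processing would use an inverted index rather than a linear scan.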

5 citations


Cites background from "Understanding and Predicting Questi..."

  • ...Some of the areas that have implemented QAS such as social media ([10],[18],[19]), geographic ([5],[20–23]), geology [24], software engineering ([25],[26]),...

    [...]

01 Jan 2017
TL;DR: The SLR found that the majority of studies use a machine learning approach to identify and learn subjective text, since subjectivity analysis is naturally viewed as a classification problem; this approach outperformed the others, though its performance is currently only at a satisfactory level.
Abstract: Subjectivity analysis determines the existence of subjectivity in text using subjective clues. It is the first task in the opinion mining process. The difference between subjectivity analysis and polarity determination is that the latter processes subjective text to determine its orientation as positive or negative. Many techniques have been used to solve the problem of segregating subjective and objective text. This paper used a systematic literature review (SLR) to compile the studies undertaken in subjectivity analysis. An SLR is a literature review that collects and critically analyses multiple studies to answer research questions. Eight research questions were drawn up for this purpose. Information such as technique, corpus, subjective-clue representation, and performance was extracted from 97 articles known as primary studies. This information was analysed to identify the strengths and weaknesses of each technique, the elements affecting performance, and the elements missing from subjectivity analysis. The SLR found that the majority of the studies use a machine learning approach to identify and learn subjective text, because subjectivity analysis is naturally viewed as a classification problem. This approach outperformed the others, though its performance is currently only at a satisfactory level. Therefore, more studies are needed to improve the performance of subjectivity analysis.

4 citations


Additional excerpts

  • ...Table 4: Selected primary studies Year Primary Studies 2007 [15][16][17] 2008 [18][19][20][21][22][23][24][25] 2009 [26][27][28][29] 2010 [12][30][31][32][33][34] 2011 [35][36][37][38][39][40][41][42] 2012 [43][44][45][46][47][48][49][50][49][51] 2013 [52][53][54][55][56][57][58][59][60][61][62][63][64][65][66][67][68] [69][70][71][72][73] 2014 [74][75][76][77][78][79][80][81][82][83][84][85][86][87] 2015 [88][89][90][91][92][93][94][95][96][97][98][99][100][101][102][103] 2016 [104][105][106][107][108] 2017 [7]...

    [...]

References
Journal ArticleDOI
TL;DR: This paper provides an introduction to the WEKA workbench, reviews the history of the project, and, in light of the recent 3.6 stable release, briefly discusses what has been added since the last stable version (Weka 3.4) released in 2003.
Abstract: More than twelve years have elapsed since the first public release of WEKA. In that time, the software has been rewritten entirely from scratch, evolved substantially, and now accompanies a text on data mining [35]. These days, WEKA enjoys widespread acceptance in both academia and business, has an active community, and has been downloaded more than 1.4 million times since being placed on SourceForge in April 2000. This paper provides an introduction to the WEKA workbench, reviews the history of the project, and, in light of the recent 3.6 stable release, briefly discusses what has been added since the last stable version (Weka 3.4) released in 2003.

19,603 citations


"Understanding and Predicting Questi..." refers methods in this paper

  • ...We experimented with Naïve Bayes, support vector machines (SVMs) (sequential minimal optimization), and decision trees (J48) as implemented in WEKA [29]....

    [...]

  • ...using the information gain criterion [28] as implemented in WEKA [29]....

    [...]

  • ...First, due to the large number of features extracted, before conducting the classification, we performed feature selection using the information gain criterion [28] as implemented in WEKA [29]....

    [...]

Journal ArticleDOI
TL;DR: An algorithm for suffix stripping is described, which has been implemented as a short, fast program in BCPL, and performs slightly better than a much more elaborate system with which it has been compared.
Abstract: The automatic removal of suffixes from words in English is of particular interest in the field of information retrieval. An algorithm for suffix stripping is described, which has been implemented as a short, fast program in BCPL. Although simple, it performs slightly better than a much more elaborate system with which it has been compared. It effectively works by treating complex suffixes as compounds made up of simple suffixes, and removing the simple suffixes in a number of steps. In each step the removal of the suffix is made to depend upon the form of the remaining stem, which usually involves a measure of its syllable length.
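The stepwise suffix-removal idea can be sketched as below. The rule set is a small illustrative subset, not the full algorithm: the real Porter stemmer conditions each removal on a syllable-based measure of the remaining stem and applies five ordered rule steps.

```python
# Simplified sketch of stepwise suffix stripping in the spirit of the
# algorithm described above. Rules are an illustrative subset only;
# `min_stem` crudely stands in for the stem-measure condition.

RULE_STEPS = [
    # Step 1: plural-like endings ("ss" -> "ss" blocks bare-"s" removal).
    [("sses", "ss"), ("ies", "i"), ("ss", "ss"), ("s", "")],
    # Step 2: compound suffixes reduced to simpler ones.
    [("ational", "ate"), ("ization", "ize"), ("fulness", "ful")],
    # Step 3: simple suffixes removed outright.
    [("ness", ""), ("ful", ""), ("ative", "")],
]

def stem(word, min_stem=2):
    for step in RULE_STEPS:
        for suffix, replacement in step:
            if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
                word = word[: len(word) - len(suffix)] + replacement
                break  # only the first matching rule in each step applies
    return word

print(stem("hopefulness"))  # "fulness" -> "ful", then "ful" -> ""
print(stem("relational"))   # "ational" -> "ate"
```

This shows the paper's key idea of treating a complex suffix as a compound of simple suffixes removed over several steps: "hopefulness" loses "fulness" to "ful" in one step and "ful" entirely in the next, yielding "hope".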

7,572 citations

Proceedings Article
08 Jul 1997
TL;DR: This paper finds strong correlations between the DF, IG, and CHI values of a term, and suggests that DF thresholding, the simplest method with the lowest computational cost, can be reliably used instead of IG or CHI when the computation of those measures is too expensive.
Abstract: This paper is a comparative study of feature selection methods in statistical learning of text categorization. The focus is on aggressive dimensionality reduction. Five methods were evaluated, including term selection based on document frequency (DF), information gain (IG), mutual information (MI), a χ² test (CHI), and term strength (TS). We found IG and CHI most effective in our experiments. Using IG thresholding with a k-nearest-neighbor classifier on the Reuters corpus, removal of up to 98% of unique terms actually yielded an improved classification accuracy (measured by average precision). DF thresholding performed similarly. Indeed, we found strong correlations between the DF, IG, and CHI values of a term. This suggests that DF thresholding, the simplest method with the lowest cost in computation, can be reliably used instead of IG or CHI when the computation of these measures is too expensive. TS compares favorably with the other methods with up to 50% vocabulary reduction, but is not competitive at higher vocabulary reduction levels. In contrast, MI had relatively poor performance due to its bias towards favoring rare terms and its sensitivity to probability estimation errors.
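Information gain, the criterion the study found most effective, can be computed as the drop in label entropy once a term's presence is known. The tiny labeled corpus below is an illustrative assumption.

```python
# Sketch of ranking terms by information gain for feature selection.
# IG(t) = H(labels) - H(labels | t present/absent). Corpus is a toy example.
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n)
                for c in Counter(labels).values())

def information_gain(term, docs, labels):
    present = [lab for doc, lab in zip(docs, labels) if term in doc]
    absent = [lab for doc, lab in zip(docs, labels) if term not in doc]
    n = len(labels)
    conditional = sum(len(part) / n * entropy(part)
                      for part in (present, absent) if part)
    return entropy(labels) - conditional

docs = [
    {"goal", "match", "team"}, {"team", "coach"},         # sports docs
    {"stock", "market", "team"}, {"market", "earnings"},  # finance docs
]
labels = ["sports", "sports", "finance", "finance"]

ranked = sorted({t for d in docs for t in d},
                key=lambda t: information_gain(t, docs, labels), reverse=True)
print(ranked[0])
```

Here "market" perfectly separates the two classes (IG = 1 bit), so it ranks first; a term like "team" that appears in both classes gains far less. Thresholding on such scores is exactly the aggressive vocabulary reduction the paper studies.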

5,366 citations


"Understanding and Predicting Questi..." refers methods in this paper

  • ...using the information gain criterion [28] as implemented in WEKA [29]....

    [...]

Proceedings ArticleDOI
27 May 2003
TL;DR: A new part-of-speech tagger is presented that demonstrates the following ideas: explicit use of both preceding and following tag contexts via a dependency network representation, broad use of lexical features, and effective use of priors in conditional loglinear models.
Abstract: We present a new part-of-speech tagger that demonstrates the following ideas: (i) explicit use of both preceding and following tag contexts via a dependency network representation, (ii) broad use of lexical features, including jointly conditioning on multiple consecutive words, (iii) effective use of priors in conditional loglinear models, and (iv) fine-grained modeling of unknown word features. Using these ideas together, the resulting tagger gives a 97.24% accuracy on the Penn Treebank WSJ, an error reduction of 4.4% on the best previous single automatically learned tagging result.
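The tagger's core idea, conditioning on both the preceding and the following context, can be illustrated with a deliberately tiny lexicon-based sketch. The lexicon and the two disambiguation rules are assumptions for illustration; the actual system uses a learned cyclic dependency network, not hand-written rules.

```python
# Toy illustration of why both left and right context matter for POS
# tagging. Lexicon and rules are illustrative assumptions only.

LEXICON = {
    "the": ["DT"], "dog": ["NN"], "is": ["VBZ"], "rusty": ["JJ"],
    "can": ["MD", "NN"],  # ambiguous: modal verb or noun
    "run": ["VB", "NN"],  # ambiguous: verb or noun
}

def disambiguate(options, prev_w, next_w):
    # Preceding context: right after a determiner, prefer the noun reading.
    if prev_w is not None and LEXICON.get(prev_w) == ["DT"]:
        return "NN"
    # Following context: right before a possible verb, prefer the modal.
    if next_w is not None and "VB" in LEXICON.get(next_w, []) \
            and "MD" in options:
        return "MD"
    return options[0]

def tag(words):
    tags = []
    for i, w in enumerate(words):
        options = LEXICON.get(w, ["NN"])
        if len(options) == 1:
            tags.append(options[0])
        else:
            prev_w = words[i - 1] if i > 0 else None
            next_w = words[i + 1] if i + 1 < len(words) else None
            tags.append(disambiguate(options, prev_w, next_w))
    return tags

print(tag("the dog can run".split()))   # "can" resolved by following verb
print(tag("the can is rusty".split()))  # "can" resolved by preceding "the"
```

The same ambiguous word "can" gets different tags in the two sentences, one decided by its right neighbor and one by its left, which is the bidirectional dependency the paper's network representation makes explicit.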

3,466 citations


"Understanding and Predicting Questi..." refers methods in this paper

  • ...To tag the POS of each tweet, we used the Stanford tagger [27]....

    [...]

Proceedings ArticleDOI
28 Mar 2011
TL;DR: There are measurable differences in the way messages propagate that can be used to classify them automatically as credible or not credible, with precision and recall in the range of 70% to 80%.
Abstract: We analyze the information credibility of news propagated through Twitter, a popular microblogging service. Previous research has shown that most of the messages posted on Twitter are truthful, but the service is also used to spread misinformation and false rumors, often unintentionally. In this paper we focus on automatic methods for assessing the credibility of a given set of tweets. Specifically, we analyze microblog postings related to "trending" topics and classify them as credible or not credible, based on features extracted from them. We use features from users' posting and re-posting ("re-tweeting") behavior, from the text of the posts, and from citations to external sources. We evaluate our methods using a significant number of human assessments about the credibility of items on a recent sample of Twitter postings. Our results show that there are measurable differences in the way messages propagate that can be used to classify them automatically as credible or not credible, with precision and recall in the range of 70% to 80%.
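The feature-extraction step can be sketched as below. The tweet schema, the feature names, and the toy decision rule are all illustrative assumptions standing in for the paper's trained classifier.

```python
# Sketch of aggregating per-topic propagation and text features before
# credibility classification. Schema, features, and the threshold rule
# are illustrative assumptions, not the paper's actual model.

def topic_features(tweets):
    """tweets: list of dicts with 'text', 'retweets', 'has_url' (assumed)."""
    n = len(tweets)
    return {
        "avg_retweets": sum(t["retweets"] for t in tweets) / n,
        "frac_with_url": sum(t["has_url"] for t in tweets) / n,
        "frac_question": sum("?" in t["text"] for t in tweets) / n,
    }

def looks_credible(feats):
    # Toy rule standing in for the trained classifier: topics whose tweets
    # cite external sources and rarely question the claim score higher.
    return feats["frac_with_url"] >= 0.5 and feats["frac_question"] < 0.5

tweets = [
    {"text": "Official report confirms it", "retweets": 12, "has_url": True},
    {"text": "Confirmed by the agency", "retweets": 8, "has_url": True},
    {"text": "Is this real?", "retweets": 1, "has_url": False},
]
feats = topic_features(tweets)
print(looks_credible(feats))
```

In the full system these aggregated features would feed a supervised classifier trained on human credibility judgments rather than a fixed threshold.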

2,123 citations