Proceedings Article

Classifying stack overflow posts on API issues

TLDR
A supervised learning approach based on a Conditional Random Field (CRF), a statistical modeling method, is developed to identify API issue-related sentences, and a technique called CAPS is built on top of it to classify SO posts concerning API issues.
Abstract
The design and maintenance of APIs are complex tasks due to the constantly changing requirements of their users. Despite the efforts of their designers, APIs may suffer from a number of issues (such as incomplete or erroneous documentation, poor performance, and backward incompatibility). To maintain a healthy client base, API designers must learn about these issues in order to fix them. Question answering sites, such as Stack Overflow (SO), have become a popular place for discussing API issues. Posts about API issues are invaluable to API designers, not only because they help designers learn more about the problem but also because they reveal the requirements of API users. However, the unstructured nature of posts and the abundance of non-issue posts make detecting SO posts concerning API issues challenging. In this paper, we first develop a supervised learning approach using a Conditional Random Field (CRF), a statistical modeling method, to identify API issue-related sentences. We then combine this sentence-level information with different features of posts and the experience of users to build a technique, called CAPS, that can classify SO posts concerning API issues. Evaluation of CAPS using carefully curated SO posts on three popular API types reveals that the technique outperforms all three baseline approaches we consider in this study. We also conduct studies to test the generalizability of the CAPS results and to understand the effects of different sources of information on them.
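The two-stage idea in the abstract (label sentences as a sequence, then combine the labels with post and user information) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the cue-word list, the feature names, the `ISSUE` label, and the helper functions are hypothetical and are not taken from the paper, which does not list its features in the abstract. The sketch only prepares per-sentence feature dictionaries of the kind a linear-chain CRF toolkit typically consumes; it does not train a CRF.

```python
import re

# Hypothetical cue words; the actual CAPS feature set is not given in the abstract.
ISSUE_CUES = {"bug", "error", "crash", "deprecated", "slow", "broken", "undocumented"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]


def _has_cue(sentence):
    """True if the sentence contains any of the hypothetical cue words."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return any(w in ISSUE_CUES for w in words)


def sentence_features(sentences, i):
    """Feature dict for sentence i, including context from the previous sentence."""
    sent = sentences[i]
    feats = {
        "position": i,
        "n_words": len(re.findall(r"[a-z']+", sent.lower())),
        "has_issue_cue": _has_cue(sent),
        "is_question": sent.endswith("?"),
    }
    # Neighbouring-sentence context is what makes this a sequence-labeling input.
    if i > 0:
        feats["prev_has_issue_cue"] = _has_cue(sentences[i - 1])
    else:
        feats["BOS"] = True  # beginning of the post
    return feats


def post_to_sequence(text):
    """Turn a post body into (sentences, list of per-sentence feature dicts)."""
    sents = split_sentences(text)
    return sents, [sentence_features(sents, i) for i in range(len(sents))]


def post_features(sentence_labels, post_score, user_reputation):
    """Combine predicted sentence labels with post and user information."""
    n = len(sentence_labels)
    n_issue = sum(1 for label in sentence_labels if label == "ISSUE")
    return {
        "issue_ratio": n_issue / n if n else 0.0,
        "post_score": post_score,
        "user_reputation": user_reputation,
    }
```

In a full pipeline, the feature sequences from `post_to_sequence` would be passed to a CRF implementation to predict one label per sentence, and the dictionary from `post_features` would feed a separate post-level classifier. Which learner and which features CAPS actually uses should be taken from the paper itself.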


Citations
Journal Article

Bug severity prediction using question-and-answer pairs from Stack Overflow

TL;DR: This paper extracts all the posts related to bug repositories from Stack Overflow and combines them with bug reports to obtain enhanced versions of bug reports and achieves severity prediction on three popular open source projects with Naive Bayesian, k-Nearest Neighbor algorithm (KNN), and Long Short-Term Memory (LSTM).
Proceedings Article

An empirical study on challenges of application development in serverless computing

TL;DR: In this article, the authors mine and analyze 22,731 relevant questions from Stack Overflow (a popular Q&A website for developers), and show the increasing popularity trend and the high difficulty level of serverless computing for developers.
Journal Article

Mining API usage scenarios from stack overflow

TL;DR: A framework for automatically mining API usage scenarios from Stack Overflow, supported by three novel algorithms, is proposed, implemented, and deployed in the proof-of-concept online tool Opiner.
Journal Article

You broke my code: understanding the motivations for breaking changes in APIs

TL;DR: It is revealed that breaking changes have an important impact on clients, since 45% of the questions are from clients asking how to overcome specific breaking changes; they are also common in other ecosystems—JavaScript, .NET, etc.
Journal Article

CAPS: a supervised technique for classifying Stack Overflow posts concerning API issues

TL;DR: This paper develops a supervised learning approach using a Conditional Random Field (CRF), a statistical modeling method, to identify API issue-related sentences, and evaluates the performance of the CRF-based technique for classifying issue sentences; the evaluation reveals that the technique has high potential.