Classifying stack overflow posts on API issues

doi:10.1109/SANER.2018.8330213

Proceedings Article

Ahasanuzzaman, +3 more

- pp 244-254

TLDR

A supervised learning approach is developed using a Conditional Random Field (CRF), a statistical modeling method, to identify API issue-related sentences, and a technique called CAPS is built on it to classify SO posts concerning API issues.

Abstract:

The design and maintenance of APIs are complex tasks due to the constantly changing requirements of their users. Despite the efforts of their designers, APIs may suffer from a number of issues (such as incomplete or erroneous documentation, poor performance, and backward incompatibility). To maintain a healthy client base, API designers must learn about these issues in order to fix them. Question answering sites, such as Stack Overflow (SO), have become popular places for discussing API issues. These posts about API issues are invaluable to API designers, not only because they help designers learn more about the problem but also because they reveal the requirements of API users. However, the unstructured nature of posts and the abundance of non-issue posts make detecting SO posts concerning API issues challenging. In this paper, we first develop a supervised learning approach using a Conditional Random Field (CRF), a statistical modeling method, to identify API issue-related sentences. We combine this information with different features of posts and the experience of users to build a technique, called CAPS, that can classify SO posts concerning API issues. Evaluation of CAPS using carefully curated SO posts on three popular API types reveals that the technique outperforms all three baseline approaches we consider in this study. We also conduct studies to test the generalizability of CAPS results and to understand the effects of different sources of information on it.

Citations

Journal Article

Bug severity prediction using question-and-answer pairs from Stack Overflow

You-shuai Tan, +5 more

- 01 Jul 2020 -

Journal of Systems and Software

TL;DR: This paper extracts all the posts related to bug repositories from Stack Overflow and combines them with bug reports to obtain enhanced versions of bug reports and achieves severity prediction on three popular open source projects with Naive Bayesian, k-Nearest Neighbor algorithm (KNN), and Long Short-Term Memory (LSTM).

Proceedings Article

An empirical study on challenges of application development in serverless computing

Jinfeng Wen, +7 more

TL;DR: In this article, the authors mine and analyze 22,731 relevant questions from Stack Overflow (a popular Q&A website for developers), and show the increasing popularity trend and the high difficulty level of serverless computing for developers.

Journal Article

Mining API usage scenarios from stack overflow

Gias Uddin, +2 more

- 01 Jun 2020 -

Information & Software Technology

TL;DR: A framework to automatically mine API usage scenarios from Stack Overflow, supported by three novel algorithms, is proposed, implemented, and deployed in the proof-of-concept online tool, Opiner.

Journal Article

You broke my code: understanding the motivations for breaking changes in APIs

Aline Brito, +3 more

- 01 Mar 2020 -

Empirical Software Engineering

TL;DR: It is revealed that breaking changes have an important impact on clients, since 45% of the questions are from clients asking how to overcome specific breaking changes; they are also common in other ecosystems—JavaScript, .NET, etc.

Journal Article

CAPS: a supervised technique for classifying Stack Overflow posts concerning API issues

Ahasanuzzaman, +3 more

- 01 Mar 2020 -

Empirical Software Engineering

TL;DR: This paper develops a supervised learning approach using a Conditional Random Field (CRF), a statistical modeling method, to identify API issue-related sentences, and evaluates the performance of the CRF-based technique for classifying issue sentences, which reveals that the technique has high potential.

References

Journal Article

Latent dirichlet allocation

David M. Blei, +2 more

- 01 Mar 2003 -

Journal of Machine Learning Research

TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.

Proceedings Article

Latent Dirichlet Allocation

David M. Blei, +2 more

TL;DR: This paper proposed a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).

Journal Article

The WEKA data mining software: an update

Mark Hall, +5 more

- 16 Nov 2009 -

Sigkdd Explorations

TL;DR: This paper provides an introduction to the WEKA workbench, reviews the history of the project, and, in light of the recent 3.6 stable release, briefly discusses what has been added since the last stable version (Weka 3.4) released in 2003.

Proceedings Article

The Stanford CoreNLP Natural Language Processing Toolkit

Christopher D. Manning, +5 more

TL;DR: The design and use of the Stanford CoreNLP toolkit is described, an extensible pipeline that provides core natural language analysis, and it is suggested that this follows from a simple, approachable design, straightforward interfaces, the inclusion of robust and good quality analysis components, and not requiring use of a large amount of associated baggage.

Proceedings Article

Feature-rich part-of-speech tagging with a cyclic dependency network

Kristina Toutanova, +3 more

TL;DR: A new part-of-speech tagger is presented that demonstrates the following ideas: explicit use of both preceding and following tag contexts via a dependency network representation, broad use of lexical features, and effective use of priors in conditional loglinear models.
