Open Access | Proceedings Article

An empirical study on developer interactions in StackOverflow

TLDR
Latent Dirichlet Allocation (LDA), a well-known topic modeling approach, is used to analyze the contents of tens of thousands of StackOverflow questions and answers; it provides an alternative to the perspective of Treude et al. for categorizing StackOverflow questions.
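The page does not say which LDA implementation, inference method, or hyperparameters were used. As a rough illustration only, the sketch below fits LDA with collapsed Gibbs sampling (one common way to fit LDA, not necessarily the authors' choice) and then estimates a topic mixture for an unseen question by "fold-in" sampling with the trained topic-word counts held fixed. The function names `train_lda` and `infer_topics`, the values `alpha=0.1` and `beta=0.01`, and the toy questions are all hypothetical. The study reports five topics; the toy corpus uses two.

```python
import random


def train_lda(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling (hypothetical helper).

    docs: list of token lists, e.g. preprocessed question texts.
    Returns (word_index, topic_word_counts, topic_totals, doc_topic_probs).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_index = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    topics = range(K)
    n_kw = [[0] * V for _ in topics]  # times word v is assigned to topic k
    n_k = [0] * K                     # tokens assigned to topic k overall
    n_dk = [[0] * K for _ in docs]    # tokens in doc d assigned to topic k
    z = []                            # current topic of every token

    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            n_kw[k][word_index[w]] += 1
            n_k[k] += 1
            n_dk[d][k] += 1
            z_d.append(k)
        z.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_index[w], z[d][i]
                # Take the token out of the counts before resampling it.
                n_kw[k][v] -= 1
                n_k[k] -= 1
                n_dk[d][k] -= 1
                # p(z=t | rest) is proportional to
                # (n_dt + alpha) * (n_tv + beta) / (n_t + V*beta)
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][v] + beta)
                           / (n_k[t] + V * beta) for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                n_kw[k][v] += 1
                n_k[k] += 1
                n_dk[d][k] += 1
                z[d][i] = k

    # Each question gets a probability for every topic (rows sum to 1).
    theta = [[(n_dk[d][t] + alpha) / (len(doc) + K * alpha) for t in topics]
             for d, doc in enumerate(docs)]
    return word_index, n_kw, n_k, theta


def infer_topics(doc, word_index, n_kw, n_k, alpha=0.1, beta=0.01,
                 iters=100, seed=0):
    """Topic mixture for a new question via Gibbs fold-in (hypothetical helper).

    Trained topic-word counts stay fixed; words unseen in training are ignored.
    """
    rng = random.Random(seed)
    K, V = len(n_k), len(word_index)
    topics = range(K)
    tokens = [word_index[w] for w in doc if w in word_index]
    z = [rng.randrange(K) for _ in tokens]
    counts = [0] * K
    for k in z:
        counts[k] += 1
    for _ in range(iters):
        for i, v in enumerate(tokens):
            counts[z[i]] -= 1
            weights = [(counts[t] + alpha) * (n_kw[t][v] + beta)
                       / (n_k[t] + V * beta) for t in topics]
            z[i] = rng.choices(topics, weights=weights)[0]
            counts[z[i]] += 1
    return [(counts[t] + alpha) / (len(tokens) + K * alpha) for t in topics]


# Toy, made-up questions (already tokenized).
questions = [
    "android activity intent crash".split(),
    "android layout view xml".split(),
    "sql join query index".split(),
    "mysql query slow index".split(),
    "android intent service crash".split(),
    "sql query group by".split(),
]
word_index, n_kw, n_k, theta = train_lda(questions, num_topics=2)
new_q = "slow sql query on android".split()
print(infer_topics(new_q, word_index, n_kw, n_k))
```

Because every question receives a full probability vector rather than a single label, a question can belong to several topics at once, which is the property the abstract contrasts with manual single-category labeling.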
Abstract
StackOverflow provides a popular platform where developers post and answer questions. Recently, Treude et al. manually labeled 385 StackOverflow questions and grouped them into 10 categories based on their contents. They also analyzed how tags are used in StackOverflow. In this study, we extend their work to obtain a deeper understanding of how developers interact with one another on such a question-and-answer website. First, we analyze the distributions of developers who ask and answer questions. We also investigate whether the StackOverflow community is segregated into questioners and answerers. Next, we perform automated text mining to find the kinds of topics developers ask about. We use Latent Dirichlet Allocation (LDA), a well-known topic modeling approach, to analyze the contents of tens of thousands of questions and answers, and produce five topics. Our topic modeling strategy provides an alternative to the perspective of Treude et al. for categorizing StackOverflow questions: each question can be categorized into several topics with different probabilities, and the learned topic model can automatically assign a new question to several categories with varying probabilities. Finally, we show the distributions of questions and developers across the topics generated by LDA.
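The abstract does not describe how the questioner/answerer segregation was measured. One simple way to look at it, assuming each post has been reduced to a hypothetical `(user_id, post_type)` pair, is to count users who only ask, only answer, or do both, and to tabulate how many questions and answers each user posts. The function `role_breakdown` and the sample data are illustrative assumptions, not the authors' method.

```python
from collections import Counter


def role_breakdown(posts):
    """Split users by role (hypothetical helper).

    posts: iterable of (user_id, post_type), post_type "question" or "answer".
    """
    posts = list(posts)  # iterated twice below
    questions = Counter(u for u, kind in posts if kind == "question")
    answers = Counter(u for u, kind in posts if kind == "answer")
    askers, answerers = set(questions), set(answers)
    return {
        "ask_only": len(askers - answerers),
        "answer_only": len(answerers - askers),
        "both": len(askers & answerers),
        "questions_per_user": questions,
        "answers_per_user": answers,
        # How many users asked exactly n questions, for each n.
        "question_count_histogram": Counter(questions.values()),
    }


# Made-up posts for illustration.
posts = [
    ("alice", "question"), ("alice", "question"),
    ("bob", "answer"), ("bob", "answer"),
    ("carol", "question"), ("carol", "answer"),
]
summary = role_breakdown(posts)
print(summary["ask_only"], summary["answer_only"], summary["both"])  # 1 1 1
```

A community that is strongly segregated would show most users in `ask_only` or `answer_only` and few in `both`; the per-user counters give the asker and answerer distributions directly.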



Citations
Proceedings Article

You Get Where You're Looking for: The Impact of Information Sources on Code Security

TL;DR: Analyzing how the use of information resources impacts code security confirms that API documentation is secure but hard to use, while informal documentation such as Stack Overflow is more accessible but often leads to insecurity.
Proceedings Article

Mining questions about software energy consumption

TL;DR: This paper presents the first empirical study on understanding the views of application programmers on software energy consumption problems, using StackOverflow as the primary data source and analyzes a carefully curated sample of more than 300 questions and 550 answers.
Proceedings Article

How do API changes trigger stack overflow discussions? a study on the Android SDK

TL;DR: It is suggested that Android developers usually have more questions when the behavior of APIs is modified, and deleting public methods from APIs is a trigger for questions that are more discussed and of major interest for the community, and posted by more experienced developers.
Proceedings Article

Mining duplicate questions in stack overflow

TL;DR: A manual investigation is performed to understand why users submit duplicate questions in Stack Overflow and a classification technique is proposed that uses a number of carefully chosen features to identify duplicate questions with reasonable accuracy.
Proceedings Article

SOTorrent: reconstructing and analyzing the evolution of stack overflow posts

TL;DR: SOTorrent provides access to the version history of Stack Overflow content at the level of whole posts and individual text or code blocks. It also connects SO posts to other platforms by aggregating URLs from text blocks and collecting references from GitHub files to SO posts.
References
Journal Article

Latent dirichlet allocation

TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
Proceedings Article

Latent Dirichlet Allocation

TL;DR: This paper proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models, including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).
Book Chapter

Comparing twitter and traditional media using topic models

TL;DR: This paper empirically compares the content of Twitter with a traditional news medium, the New York Times, using unsupervised topic modeling, and reports findings useful for downstream IR or DM applications.
Proceedings Article

Social coding in GitHub: transparency and collaboration in an open software repository

TL;DR: It is found that people make a surprisingly rich set of social inferences from the networked activity information in GitHub, such as inferring someone else's technical goals and vision when they edit code, or guessing which of several similar projects has the best chance of thriving in the long term.
Journal Article

The nested chinese restaurant process and bayesian nonparametric inference of topic hierarchies

TL;DR: The nested Chinese restaurant process (nCRP) as discussed by the authors is a stochastic process that assigns probability distributions to ensembles of infinitely deep, infinitely branching trees, and it can be used as a prior distribution in a Bayesian nonparametric model of document collections.