An empirical study on developer interactions in StackOverflow

doi:10.1145/2480362.2480557

Open Access, Proceedings Article

pp. 1019-1024

TLDR

Latent Dirichlet Allocation (LDA), a well-known topic modeling approach, is used to analyze the contents of tens of thousands of StackOverflow questions and answers, providing an alternative to the manual categorization of Treude et al.

Abstract:

StackOverflow provides a popular platform where developers post and answer questions. Recently, Treude et al. manually labeled 385 StackOverflow questions and grouped them into 10 categories based on their contents. They also analyzed how tags are used in StackOverflow. In this study, we extend their work to obtain a deeper understanding of how developers interact with one another on such a question-and-answer web site. First, we analyze the distributions of developers who ask and answer questions. We also investigate whether the StackOverflow community is segregated into questioners and answerers. Next, we perform automated text mining to find the kinds of topics developers ask about. We use Latent Dirichlet Allocation (LDA), a well-known topic modeling approach, to analyze the contents of tens of thousands of questions and answers, and produce five topics. Our topic modeling strategy provides an alternative to the categorization of Treude et al.: each question can be assigned to several topics with different probabilities, and the learned topic model can automatically assign a new question to several categories with varying probabilities. Last but not least, we show the distributions of questions and developers across the topics generated by LDA.
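To make the idea of assigning one question to several topics with different probabilities concrete, here is a minimal sketch of LDA fitted with collapsed Gibbs sampling. The abstract does not say which inference method or toolkit the authors used, so the sampler, the function name `fit_lda`, the hyperparameters `alpha` and `beta`, the choice of two topics, and the toy tokenized questions are all assumptions made for illustration only.

```python
import random


def fit_lda(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative, not the paper's code).

    docs is a list of token lists. Returns one topic-probability list per
    document, plus the topic-word counts and the sorted vocabulary.
    """
    rng = random.Random(seed)
    vocab = sorted({word for doc in docs for word in doc})
    word_id = {word: i for i, word in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        doc_assignments = []
        for word in doc:
            z = rng.randrange(num_topics)
            doc_assignments.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[word]] += 1
            topic_total[z] += 1
        assignments.append(doc_assignments)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                v = word_id[word]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1

    # Smoothed per-document topic mixture; each row sums to 1.
    theta = [
        [(count + alpha) / (len(doc) + num_topics * alpha) for count in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    return theta, topic_word, vocab


# Hypothetical, already-tokenized questions (not data from the study).
questions = [
    ["java", "exception", "thread", "java", "class"],
    ["css", "html", "layout", "css", "browser"],
    ["java", "class", "thread", "exception"],
    ["html", "browser", "layout", "css", "java"],
]

theta, _, _ = fit_lda(questions, num_topics=2)
for tokens, mixture in zip(questions, theta):
    print(tokens, [round(p, 2) for p in mixture])
```

Each printed row is a probability distribution over topics for one question, which is the soft categorization the abstract contrasts with assigning each question to a single manually defined category.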

Citations

Proceedings Article

You Get Where You're Looking for: The Impact of Information Sources on Code Security

Yasemin Acar, +5 more

TL;DR: Analyzing how the use of information resources impacts code security confirms that API documentation is secure but hard to use, while informal documentation such as Stack Overflow is more accessible but often leads to insecurity.

Proceedings Article

Mining questions about software energy consumption

Gustavo Pinto, +2 more

TL;DR: This paper presents the first empirical study on understanding the views of application programmers on software energy consumption problems. Using StackOverflow as the primary data source, it analyzes a carefully curated sample of more than 300 questions and 550 answers.

Proceedings Article

How do API changes trigger stack overflow discussions? a study on the Android SDK

Mario Linares-Vasquez, +4 more

TL;DR: The study suggests that Android developers usually ask more questions when the behavior of APIs is modified, and that deleting public methods from APIs triggers questions that are more discussed, of major interest to the community, and posted by more experienced developers.

Proceedings Article

Mining duplicate questions in stack overflow

Muhammad Ahasanuzzaman, +3 more

TL;DR: A manual investigation is performed to understand why users submit duplicate questions in Stack Overflow and a classification technique is proposed that uses a number of carefully chosen features to identify duplicate questions with reasonable accuracy.

Proceedings Article

SOTorrent: reconstructing and analyzing the evolution of stack overflow posts

Sebastian Baltes, +3 more

TL;DR: SOTorrent provides access to the version history of Stack Overflow content at the level of whole posts and individual text or code blocks, aggregating URLs from text blocks and collecting references from GitHub files to SO posts.

References

Journal Article

Latent dirichlet allocation

David M. Blei, +2 more

- 01 Mar 2003 -

Journal of Machine Learning Research

TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.

Proceedings Article

Latent Dirichlet Allocation

David M. Blei, +2 more

TL;DR: This paper proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).

Book Chapter

Comparing Twitter and traditional media using topic models

Wayne Xin Zhao, +6 more

TL;DR: This paper empirically compares the content of Twitter with that of a traditional news medium, the New York Times, using unsupervised topic modeling, and reports findings useful for downstream IR or DM applications.

Proceedings Article

Social coding in GitHub: transparency and collaboration in an open software repository

Laura Dabbish, +3 more

TL;DR: It is found that people make a surprisingly rich set of social inferences from the networked activity information in GitHub, such as inferring someone else's technical goals and vision when they edit code, or guessing which of several similar projects has the best chance of thriving in the long term.

Journal Article

The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies

David M. Blei, +2 more

- 08 Feb 2010 -

Journal of the ACM

TL;DR: The nested Chinese restaurant process (nCRP) is a stochastic process that assigns probability distributions to ensembles of infinitely deep, infinitely branching trees, and it can be used as a prior distribution in a Bayesian nonparametric model of document collections.

Related Papers (5)

What are developers talking about? An analysis of topics and trends in Stack Overflow

Anton Barua, +2 more

- 01 Jun 2014 -

Empirical Software Engineering

Discovering value from community activity on focused question answering sites: a case study of stack overflow

Ashton Anderson, +3 more

How do programmers ask and answer questions on the web? (NIER track)

What makes a good code example?: A study of programming Q&A in StackOverflow

What are mobile developers asking about? A large scale study using stack overflow