Proceedings ArticleDOI

Questions for data scientists in software engineering: a replication

08 Nov 2020-pp 568-579
TL;DR: In this paper, the authors present a study at ING, a software-defined enterprise in banking in which over 15,000 IT staff provides in-house software solutions, and find that software engineering questions for data scientists in the software defined enterprise are largely similar to the software company, albeit with exceptions.
Abstract: In 2014, a Microsoft study investigated the sort of questions that data science applied to software engineering should answer. This resulted in 145 questions that developers considered relevant for data scientists to answer, thus providing a research agenda to the community. Fast forward to five years, no further studies investigated whether the questions from the software engineers at Microsoft hold for other software companies, including software-intensive companies with different primary focus (to which we refer as software-defined enterprises). Furthermore, it is not evident that the problems identified five years ago are still applicable, given the technological advances in software engineering. This paper presents a study at ING, a software-defined enterprise in banking in which over 15,000 IT staff provides in-house software solutions. This paper presents a comprehensive guide of questions for data scientists selected from the previous study at Microsoft along with our current work at ING. We replicated the original Microsoft study at ING, looking for questions that impact both software companies and software-defined enterprises and continue to impact software engineering. We also add new questions that emerged from differences in the context of the two companies and the five years gap in between. Our results show that software engineering questions for data scientists in the software-defined enterprise are largely similar to the software company, albeit with exceptions. We hope that the software engineering research community builds on the new list of questions to create a useful body of knowledge.

Summary (5 min read)

1 INTRODUCTION

  • Software engineering researchers try solving problems that are relevant to software developers, teams, and organizations.
  • As the authors started looking for existing resources, they came across the 145 software engineering questions for data scientists presented in the Microsoft study [7] .
  • Microsoft is a large software company, while ING is a FinTech company using software to improve its banking solutions (a software-defined enterprise).
  • The authors try to understand whether the questions relevant for a software company extend to a software-defined enterprise.
  • The authors shared subsets of these 171 descriptive questions with another random sample of 1,296 ING engineers for ranking.

2 IMPACT OF THE MICROSOFT 2014 STUDY

  • In order to gain a good insight into the further course of the Microsoft 2014 study after it was published, including any implications for research, the authors conducted a citation analysis.
  • The authors notice that all citing Microsoft studies use a survey among a large number of SE practitioners (ranging from 16 to 793 respondents with a median of 311), whereas other studies based on a survey generally reach substantially lower numbers of participants.
  • The third sub-table shows that most cited studies are about software analytics, often combined with a focus on the role of the software engineer and its perceptions, e.g. [42, 51] .
  • In addition, studies also relate to continuous delivery pipelines and pipeline automation [74, 78] .

3 STUDY DESIGN

  • In part one, the authors replicate the original Microsoft study at ING.
  • The authors follow the step-by-step procedure prescribed in the original study, with slight modifications appropriate for their context. Figure 1 depicts the research methodology they followed; the figure is an exact copy of the approach used in the original Microsoft 2014 study with numbers from their study.
  • In the next step, the authors compare the questions identified in the Microsoft study to ours for similarities and differences including addition of new questions and removal of previous questions to answer their research questions.

3.1 The Initial Survey

  • The authors sent the initial survey to 1,002 ING software engineers randomly chosen from a group of 2,342 employees working within the IT department of ING in May 2018.
  • Unlike the Microsoft study, the authors did not offer any reward to increase the participation.
  • This is a deviation from the original study but aligns with the policy of ING.
  • Out of the 1,002 engineers, 387 started the survey; 271 of them filled in the demographics but stopped when asked to write questions.
  • Table 3 shows the distribution of responses across discipline and role.

3.2 Coding and Categorization

  • Next the authors did an open card sort to group 336 questions into categories.
  • To create independent codes, the first author who did a majority of the coding did not study the Microsoft paper before or during the replication.
  • Questions 81 to 90 were tagged by both the second and the fourth author.
  • The authors distilled them into a set of so-called descriptive questions that more concisely describe each category (and sub-category).

3.3 The Rating Survey

  • The authors created a second survey to rate the 171 descriptive questions.
  • The authors split the questionnaire into eight component blocks (similar to the Microsoft study) and sent component blocks to potential respondents.
  • The authors rank each question based on the above percentages, with the top rank (#1) having the highest percentage in a dimension (Essential, Worthwhile+, or Unwise).
  • Table 5 and Table 6 present the most desired (Top 10 Essential, Top 10 Worthwhile+) and the most undesired (Top 10 Unwise) descriptive questions.

3.3.2 Rating by Demographics.

  • Unlike the Microsoft study, the authors did not have an employee database to rank responses based on demographics, and privacy regulations prevented them from asking people-related aspects such as years of experience (another deviation from the original study).
  • The authors built their own models since the referenced study did not share scripts to run statistical tests, although they did follow its procedure as is.
  • For each of the 171 questions, the authors built a model with Essential response (yes/no) as a dependent variable and professional background as independent variable.
  • The authors built similar models for Worthwhile+ and Unwise responses.
  • In total, the authors built 513 models, three for each of the 171 descriptive questions.

3.4 Comparison of Questions

  • Then for each theme, the authors see how the prominent questions in ING compare against the prominent questions at Microsoft.
  • First, the authors ran word counts on the questions from both the companies presenting a text-based comparison to identify broad differences.
  • Further, the first two authors manually analyzed top 100 essential questions from the two companies in detail.
  • The authors drew affinity diagrams using Microsoft questions and appended related questions from ING to it.
  • Analyses of the three clusters and the frequency distribution of questions (in addition to the previous three analyses) present insights into their research question.

4 RESULTS

  • The original Microsoft study came up with 145 questions that software engineers want data scientists to answer.
  • Replicating the original study at ING, the authors identified 171 data science questions.
  • This section presents a comparison of the two sets of questions based on category, type of questions within categories, top-rated questions, bottom-rated questions, and questions relevant for different demographics.
  • Next, the authors compare the questions from the two companies using word count and affinity diagrams to answer their research question.
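As a rough illustration of the word-count comparison mentioned above, a script along the following lines could be used. This is a hypothetical sketch (variable names, stop-word list, and inputs are assumptions), not the authors' analysis:

```python
# Hypothetical sketch: compare word frequencies in the two companies' question
# lists to surface broad differences in vocabulary (cf. Table 8 in the paper).
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "are", "how", "what", "we", "for"}

def top_words(questions, n=15):
    """Return the n most frequent non-stopword terms across a list of questions."""
    words = []
    for q in questions:
        words.extend(w for w in re.findall(r"[a-z]+", q.lower()) if w not in STOPWORDS)
    return Counter(words).most_common(n)

# Example usage with hypothetical question lists:
# ing_top = dict(top_words(ing_questions))       # e.g. code, test, software, quality, ...
# ms_top = dict(top_words(microsoft_questions))  # e.g. code, test, bugs, cost, customers, ...
# shared = ing_top.keys() & ms_top.keys()        # themes common to both companies
```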

4.1 Categories

  • The authors noticed that some of their categories directly match the Microsoft study.
  • Other categories, however, can be mapped to one or more categories of the Microsoft study.
  • The absence of any new emergent category in their study indicates that, broadly, the questions for a software-defined enterprise do not differ from those for a software company.
  • Next, the authors explore the essential questions at ING and their distinguishing link to the questions from the Microsoft study.

4.1.3 Development Best Practices (BEST)

  • This category emphasized best (or worst) development practices relating to technology selection, effectiveness, and choice of tools.
  • Questions here ranged from automated test data generation, on-demand provisioning of test environments, and testing of high volumes, to questions like "Should we let loose Chaos Monkey?" [35] [5].
  • Notably, questions relating to development trade-offs such as backward compatibility or the impact of testing in production appeared in the Microsoft study but not ours.

4.2 Top-Rated Questions

  • Interestingly, only two out of the top 15 "Essential" questions were a part of the top 10 "Worthwhile or higher" questions and none vice-versa.
  • The authors also noticed that in their study topics like the effects of automated continuous delivery pipeline popped up which were not seen in the Microsoft study.
  • This suggests that for Microsoft, customer benefit is the most important, or perhaps one of the most important, questions.
  • Overall, it seems that Microsoft has a strong focus on the customer, while ING emphasizes the engineering team itself.
  • Finally, seven questions in the Microsoft study (marked with the icon ⋆) were about quality-related issues (as in their study, which had eleven such questions).

4.3 Bottom-Rated Questions

  • Table 6 shows the top 10 unwise questions.
  • The most "Unwise" question (Q27) at ING is the use of domain-specific language for use by non-experts.
  • This effect can be seen in their study too (two of the top ten unwise questions, Q161 and Q30, relate to measuring the performance of individual engineers), but not nearly as strongly as in the Microsoft study.
  • It indicates resistance against comparing departments based on key performance indicators like the time to market.

4.4.1 Discipline.

  • The Microsoft study showed testers as a specific discipline mainly interested in test suites, bugs, and product quality.
  • This can be seen in Table 7, in which overall scores relating to "Test" are low and scores are highest for "Development".
  • Questions that are also in Table 5 are shown in italics.
  • The role "Manager" includes the responses for "Manager" and "Lead".
  • Testers are for example significantly interested in the testability of software code, and the quality of software related to an agile way of working and working in DevOps teams.

4.5 Comparing ING and Microsoft Questions

  • A comparison of the top 15 words from each company (see Table 8 ) shows that a majority of the popular themes are the same (e.g., code, test, software, and quality).
  • Apart from this, Microsoft questions focused more on bugs, cost, time, customers, and tools while ING employees talked about version, problem, systems, process, and impact.
  • Next, the authors inferred 24 themes from the clusters in the affinity diagram organically merging into three broad categories: relating to code (like understanding code, testing, quality), developers (individual and team productivity) and customers (note that while customers did not make it to the top-10 essential questions, they were important in the top-100).
  • In the ING study, however, the authors do not see such questions.
  • Another subtle difference between the two companies is relating to code size.

5 DISCUSSION

  • The authors discuss potential explanations for the differences in the list of questions found in their study compared to the Microsoft study.
  • The authors saw questions eliciting the need for agile methods in the Microsoft study, while at ING the questions related to functional aspects.
  • One potential explanation for the observation can be that software systems at ING are not of the same scale as Microsoft.
  • The authors noticed that employees often talked about security, but no real finance-related questions appear.
  • One explanation for this observation can be that the data science challenges relating to software development are independent of the actual field to which it is applied.

5.1 Implications

  • One of the key findings of this paper is a list of 171 questions that software engineers in a large, software-driven organization would like to see answered, in order to optimize their software development activities.
  • From a practical perspective, their study offers a new way of thinking to software development organizations who care about their development processes.
  • This is exactly how ING intends to use the questions, and the authors believe companies around the world can follow suit.
  • From a research perspective, the authors have seen that the original Microsoft study has generated a series of papers that apply some form of Machine Learning to address the questions raised in that study.
  • The authors' study aims to add urgency and direction to this emerging field, by highlighting not just which questions can be answered, but which ones should be answered, from a practitioner perspective.

5.2 Threats to Validity

  • While their study expands the external validity of the original study, the fact remains that the two lists of questions are based on just two companies, which are both large organizations with over 10,000 software developers.
  • The authors tried mitigating the threat of researcher bias by limiting their exposure to the previous study, not involving authors from the Microsoft study, and having multiple authors generate codes independently.
  • Especially mapping the professional background "Discipline" of the original study on the demographic "Discipline" as applied within ING was challenging.
  • Another potential threat is sensitivity of the ranks which mostly occurs at the extreme sides of the ranking, when, e.g., none of the participants label a question as 'Unwise'.
  • Furthermore, researchers may have their biases which can potentially influence the results.

6 CONCLUSION

  • Conducted at ING-a software-defined enterprise providing banking solutions-this study presents 171 questions that software engineers at ING would like data scientists to answer.
  • The authors compared the two lists of questions and found that the core software development challenges (relating to code, developer, and customer) remain the same.
  • The authors complete their analysis with a report on the impact the Microsoft 2014 study generated, also indicating the impact that their own study is capable of generating.
  • A thorough understanding of key questions software engineers have that can be answered by data scientists is of crucial importance to both the research community and modern software engineering practice.
  • The authors' study aims to contribute to this understanding.




Questions for Data Scientists in Software Engineering: A Replication
Hennie Huijgens
Delft University of Technology
Delft, The Netherlands
h.k.m.huijgens@tudelft.nl
Ayushi Rastogi
Ernst Mulders
Delft University of Technology
Delft, The Netherlands
a.rastogi@tudelft.nl
ernst@mulde.rs
Georgios Gousios
Arie van Deursen
Delft University of Technology
Delft, The Netherlands
g.gousios@tudelft.nl
arie.vandeursen@tudelft.nl
ABSTRACT
In 2014, a Microsoft study investigated the sort of questions that data science applied to software engineering should answer. This resulted in 145 questions that developers considered relevant for data scientists to answer, thus providing a research agenda to the community. Fast forward to five years, no further studies investigated whether the questions from the software engineers at Microsoft hold for other software companies, including software-intensive companies with different primary focus (to which we refer as software-defined enterprises). Furthermore, it is not evident that the problems identified five years ago are still applicable, given the technological advances in software engineering.

This paper presents a study at ING, a software-defined enterprise in banking in which over 15,000 IT staff provides in-house software solutions. This paper presents a comprehensive guide of questions for data scientists selected from the previous study at Microsoft along with our current work at ING. We replicated the original Microsoft study at ING, looking for questions that impact both software companies and software-defined enterprises and continue to impact software engineering. We also add new questions that emerged from differences in the context of the two companies and the five years gap in between. Our results show that software engineering questions for data scientists in the software-defined enterprise are largely similar to the software company, albeit with exceptions. We hope that the software engineering research community builds on the new list of questions to create a useful body of knowledge.
CCS CONCEPTS
• General and reference → Surveys and overviews.
KEYWORDS
Data Science, Software Engineering, Software Analytics.
Work completed during an internship at ING.
ACM Reference Format:
Hennie Huijgens, Ayushi Rastogi, Ernst Mulders, Georgios Gousios, and Arie van Deursen. 2020. Questions for Data Scientists in Software Engineering: A Replication. In Proceedings of the 28th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE '20), November 8–13, 2020, Virtual Event, USA. ACM, New York, NY, USA, 21 pages. https://doi.org/10.1145/3368089.3409717
1 INTRODUCTION
Software engineering researchers try solving problems that are relevant to software developers, teams, and organizations. Historically, researchers identified these problems from their experience, connections in industry and/or prior research. In 2014, however, a study at Microsoft [7] systematically analyzed software engineering questions that data scientists can answer and made it accessible to a wider audience.

Switching context, in the past few years ING transformed itself from a finance-oriented company to a software-defined, data-driven enterprise. From a software engineering perspective, this includes the implementation of fully automated release engineering pipelines for software development activities in more than 600 teams performing 2,500+ deployments per month for 750+ applications. These activities leave a trove of data, suggesting that data scientists using, e.g., modern machine learning techniques could offer valuable and actionable insights to ING.

To that end, ING needs questions that are relevant for their engineers which their data scientists can answer. As we started looking for existing resources, we came across the 145 software engineering questions for data scientists presented in the Microsoft study [7]. However, before adopting the list, we wanted to know:

RQ: To what extent do software engineering questions relevant for Microsoft apply to ING, five years later?

Microsoft is a large software company, while ING is a FinTech company using software to improve its banking solutions (a software-defined enterprise). Moreover, the two companies are at different scale. In 2014, Microsoft had more than 30,000 engineers while even today ING is almost half its size with approximately 15,000 IT employees (on a total of 45,000). More details on the differences in the context of the two companies are available in Table 1. We try to understand whether the questions relevant for a software company extend to a software-defined enterprise. We compare the results of the original Microsoft study [7] with our results at ING to understand the relevance of the questions beyond Microsoft but also as a guide for other software-defined enterprises that are undergoing their digital transformation. We further explore

whether the technological advances in the last five years changed the way we develop software. To answer this question, we carried out a replication of the original Microsoft study at ING. Similar to the original study, we conducted two surveys: one, to find data science problems in software engineering, and second, to rank the questions in the order of their relevance (see Figure 1). For the first survey, we randomly sampled 1,002 ING engineers and received 116 responses with 336 questions. We grouped the 336 questions on similarities resulting in 171 descriptive questions. We shared subsets of these 171 descriptive questions with another random sample of 1,296 ING engineers for ranking. In the end, we received 21,888 rankings from 128 ING engineers. These ranked 171 questions are the questions that engineers at ING would like data scientists to solve. Further, we compare our list of 171 questions to the original list of 145 questions to answer our research question. Our study shows that the core software development problems, relating to code (e.g. understanding code, testing, and quality), developer productivity (both individuals and team) and customer are the same for the software company and the software-defined enterprise. Nonetheless, subtle differences in the type of questions point to changes in the market as well as differences in the context of the two companies.
2 IMPACT OF THE MICROSOFT 2014 STUDY
In order to gain a good insight into the further course of the Microsoft 2014 study after it was published, including any implications for research, we conducted a citation analysis. In addition, we looked at studies that have not quoted the Microsoft study, but that are relevant to our study. Hence this section also serves as our discussion of related work. We investigated the 136 studies that, according to Google Scholar, quote the Microsoft study. First of all, we looked at the number of times that the 136 studies themselves were cited by other studies; we limited the further analysis to 70 studies with a citation per year greater than 1.00. We then characterized studies into empirical approach, reference characterization, SE topic, and machine learning (ML) topic (see Table 2). Note that one paper can belong to multiple topics. We made the following observations:

Microsoft itself is building on its study. 11% of the citations come from Microsoft studies itself, mostly highly cited studies on SE culture, such as [18, 41, 51]. We notice that all citing Microsoft studies use a survey among a large number of SE practitioners (ranging from 16 to 793 respondents with a median of 311), whereas other studies based on a survey generally reach substantially lower numbers of participants.
Table 1: Context of Microsoft in 2014 and ING in 2019.

Aspect                | Microsoft 2014                                                 | ING 2019
Branch                | Software Company                                               | Banking (FinTech)
Organization Size     | Approx. 100,000 (in 2014), about 30,000 engineers              | 45,000 employees, of which 15,000 IT
Team Structure        | Typically size 5 ± 2                                           | 600 teams of size 9 ± 2
Development Model     | Agile/Scrum (60%+)                                             | Agile (Scrum / Kanban)
Pipeline automation   | Every team is different; Continuous Integration in many teams | Continuous Delivery as a Service
Development Practice  | DevOps                                                         | (Biz)DevOps
Table 2: Characterizations of Citing Studies.

Empirical Approach (n = 70)                | Number of studies | Percentage
Analysis of SE process data (e.g. IDE)     | 30                | 43%
Survey SE practitioners                    | 17                | 24%
Interview SE practitioners                 | 7                 | 10%
Literature review                          | 5                 | 7%
Experiment, case, or field study           | 5                 | 7%

Reference characterization (n = 70)        | Number of studies | Percentage
Plain reference in related work            | 38                | 54%
Reference as example for study setup       | 27                | 39%
Study partly answers MS question           | 9                 | 13%
Study explicitly answers MS question       | 3                 | 4%

Software Engineering Topic (n = 70)        | Number of studies | Percentage
Software analytics, data science           | 20                | 29%
Testing, debugging, quality, code review   | 15                | 21%
Software engineering process               | 12                | 17%
Software engineering culture               | 9                 | 13%
Mobile apps                                | 3                 | 4%

Machine Learning Topic (n = 24)            | Number of studies | Percentage
Examples of Machine Learning applications  | 8                 | 11%
Natural Language Processing                | 5                 | 7%
Ensemble Algorithms                        | 3                 | 4%
Instance-based Algorithms                  | 2                 | 3%
Deep Learning Algorithms                   | 2                 | 3%
Other                                      | 4                 | 5%
Half of the citing studies analyze SE process data, and 24% uses a survey. Looking at the empirical approach (see the first sub-table in Table 2) indicates that 43% of the studies contain a quantitative component, in which analysis of SE process data in particular is part of the study. Good examples are [9, 28]. Furthermore, 24% of the citing studies uses a survey among SE practitioners, for example [18, 22, 45, 69, 75]. Ten percent is based on interviews with SE practitioners, such as [20, 41, 42, 50]. Seven percent contains a literature review, for example [12, 45, 73]. Another 7% conducts an experiment [33, 62], case study [49, 59], or field study [9, 10].
Only three out of 70 studies explicitly answer a question from the initial Microsoft study. The second sub-table in Table 2 shows that only 3 studies (4%) explicitly refer their research question to an initial Microsoft one: [16, 28, 33]. Nine studies (13%) partly try to answer a MS question: [8–10, 30, 52, 62, 64, 65, 70]. 29 studies (39%) refer to the original Microsoft study because they used it as an example for their own study [17, 59], either with regard to the study design [20, 22, 29, 37, 46, 48, 67], the rating approach (Kano) [51, 61], or the card sorting technique [19, 54, 60, 63]. Furthermore, a large part (38 studies, 54%) of the citing studies simply refers to the original Microsoft study in a simple related work way.
A majority of citing studies is about Software Analytics, Testing related studies, and SE Process. The third sub-table shows that most cited studies are about software analytics, often combined with a focus on the role of the software engineer and its perceptions, e.g. [42, 51]. In other cases the emphasis on software analytics is
combined with a more technical focus on machine learning, e.g. [21, 48]. Other studies within the topic software analytics are about a variety of methods, tools, and techniques [2, 3, 11, 14, 15, 27, 38, 47, 55, 71–73]. Many of the studies that cite the Microsoft study—and which are often quoted themselves—relate to testing or test automation. Fifteen studies (21%) are about testing [8–10, 13, 23, 24, 33, 45, 66], debugging [80] and code review [25, 46].
12 studies (17%) handle SE process related topics, such as productivity of software engineers [52], visualization [6, 31], and continuous delivery [74, 76]. In addition, studies also relate to continuous delivery pipelines and pipeline automation [74, 78]. Another frequent topic in citing studies is data and models, including aspects of cloud development [32, 49, 55]. Driven by a tendency toward automation of pipelines, software generates a large amount of data. Many different data sources—such as version control systems, peer code review systems, issue tracking systems, mail archives—are available for mining purposes [29, 79].
34% of the cited studies includes some form of Machine Learning. One third of the citing papers do include some form of machine learning (ML), ranging from applying a ML technique for analysis purposes to coming up with examples of the application of ML in practice. As the fourth sub-table in Table 2 shows, 8 studies include examples of applications of ML in practice, e.g. [11, 41, 55]. Text related techniques such as NLP occur 5 times, e.g. [23, 61], ensemble techniques 3 times [30, 37, 60], and instance-based and deep learning both 2 times [14, 21, 27, 48]. Four other techniques—neural networks, clustering, decision trees, and regression—occur one time. Perhaps this finding supports a trend that is visible in SE research, where more and more machine learning techniques are being used in SE analyses and vice versa, also called AI-for-Software-Engineering [1, 40, 53].
13% are about the cultural aspects of software engineering. Software analytics is an area of extensive growth [56]. The original Microsoft 2014 study influenced ongoing research; looking at the 136 papers citing it gives the impression that it certainly did inspire other researchers and practitioners in setting up studies on software developers' needs. Nine studies (13%) of the citing studies are about cultural aspects of software engineering, such as topic selection in experiments [58], characteristics of software engineers [20, 50, 67], causes for frustration [19], or challenges for software engineers [29, 63, 69].
3 STUDY DESIGN
Our study design comprises two parts. In part one, we replicate the original Microsoft study at ING. We follow the step-by-step procedure prescribed in the original study, with slight modifications appropriate for our context. Figure 1 depicts the research methodology we followed; the figure is an exact copy of the approach used in the original Microsoft 2014 study with numbers from our study. In the next step, we compare the questions identified in the Microsoft study to ours for similarities and differences, including the addition of new questions and removal of previous questions, to answer our research questions.

Figure 1: Overview of the research methodology. (This figure is a copy from the original Microsoft 2014 study, with numbers from our study; it was re-used with permission of the Microsoft 2014 study authors.)
3.1 The Initial Survey
We sent the initial survey to 1,002 ING software engineers randomly chosen from a group of 2,342 employees working within the IT department of ING in May 2018. Unlike the Microsoft study, we did not offer any reward to increase participation. This is a deviation from the original study but aligns with the policy of ING. Out of the 1,002 engineers, 387 started the survey; 271 of them filled in the demographics but stopped when asked to write questions. In the end, we received 336 questions from 116 responses, for a response rate of 11.6%. Table 3 shows the distribution of responses across discipline and role.
3.2 Coding and Categorization
Next we did an open card sort to group the 336 questions into categories. Our card sort was open, meaning that we coded independently from the Microsoft study. To create independent codes, the first author, who did a majority of the coding, did not study the Microsoft paper before or during the replication. The other authors knew the paper from before and merely skimmed the methodology section for replication.

We let the groups emerge and evolve during the sorting process. This process comprised three phases. In the preparation phase, we created a card for each question. Questions 1 to 40 were tagged by
the second author. Questions 41 to 80 were tagged by the fourth author. Questions 81 to 90 were tagged by both the second and the fourth author. The tags of questions 1 to 90 were discussed by both the second and fourth author and, based on their discussion, final tags were prepared. The remaining questions 91 to 336 were then tagged by the first author, based on the tags from the previous step. We discarded cards that made general comments on software development and did not inquire any specific topic.

In the execution phase, cards were sorted into meaningful groups and were assigned a descriptive title. Similar to the Microsoft study, the questions were not easy to work with; many questions were same or similar to one another, most were quite verbose while others were overly specific. We distilled them into a set of so-called descriptive questions that more concisely describe each category (and sub-category). In this step, out of the 336 questions, 49 questions were discarded and the remaining 287 questions were divided into 35 sub-categories. An example of reaching a descriptive question is presented below¹:

  ◦ 'What factors affect the composition of DevOps teams?'

from the following respondents' questions:

  • "Would it be better to create specialized development teams instead of DevOps teams?"
  • "What is your idea of an ideal team that should develop software? How many and what kind of people should be part of it?"

Finally, in the analysis phase, we created abstract hierarchies to deduce general categories and themes. In total, we created 171 descriptive questions, a full list of which is available in the appendix.
3.3 The Rating Survey
We created a second survey to rate the 171 descriptive questions. We split the questionnaire into eight component blocks (similar to the Microsoft study) and sent component blocks to potential respondents. The idea behind using the split questionnaire survey design is to avoid a low response rate. Each participant received a block of questions along with the text "In your opinion, how important is it to have a software data analytics team answer this question?", with possible answers "Essential", "Worthwhile", "Unimportant", "Unwise", and "I don't understand" [39].
¹ A closed balloon (rendered here as •) indicates a respondent question; an open balloon (◦) indicates a descriptive question.
Table 3: Distribution of responses based on discipline and role in the initial survey as well as the rating survey.

Discipline                          | Initial Survey | Rating Survey
Development & Testing               | 62.0%          | 68.8%
Project Management                  | 2.0%           | 3.9%
Other Engineering (e.g. architect)  | 28.0%          | 19.5%
Non-Engineering                     | 8.0%           | 7.8%

Current Role                        | Initial Survey | Rating Survey
Developer                           | 51.1%          | 20.0%
Lead                                | 14.3%          | 18.7%
Architect                           | 9.0%           | 11.8%
Manager & Executive                 | 8.3%           | 20.0%
Other                               | 17.3%          | 29.6%
The rating survey was sent to the remaining 1,296 software engineers at ING. Here too, 360 engineers started the survey (28%), but many of them did not complete it (36% drop-out rate). Finally, we received 128 responses, for a somewhat low response rate of 10%. On average, each question received 21,888/177 = 123 ratings, making the resulting ranks stable. Table 3 shows the distribution of responses for the rating survey based on discipline and current role.
3.3.1 Top-Rated/Boom-Rated estions. Finally, to rank each
question, we dichotomized the ordinal Kano scale avoiding any
scale violations [
44
]. We computed the following percentages for
each descriptive question:
Percentage of
Essential
responses among all the responses:
Essential
Essential + Worthwhile + Unimportant + Unwise
Percentage of ’Essential’ and ’Worthwhile responses among
all the responses (to which we refer as Worthwhile+):
Essential + W orthwhile
Essential + W orthwhile + Unimportant + Unwise
Percentage of
Unwise
responses among all the responses:
U nw ise
Essential + W orthwhile + Unimportant + Unwise
We rank each question based on the above percentages, with
the top rank (#1) having the highest percentage in a dimension
(Essential, Worthwhile+, or Unwise). Table 5 and Table 6 presents
the most desired (Top 10 Essential, Top 10 Worthwhile+) and the
most undesired (Top 10 Unwise) descriptive questions. For all 171
questions and their rank, see the appendix.
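The dichotomization and ranking above can be made concrete with a small sketch; the data structures are hypothetical and this is not the authors' own script:

```python
# Hypothetical sketch: dichotomize Kano-style ratings per descriptive question
# and rank questions by the Essential, Worthwhile+, and Unwise percentages.
from collections import Counter

SCALE = ("Essential", "Worthwhile", "Unimportant", "Unwise")

def rank_questions(ratings):
    """ratings maps a question id to a list of responses; "I don't understand"
    responses are excluded from the denominator, as implied by the formulas above."""
    scores = {}
    for qid, answers in ratings.items():
        counts = Counter(answers)
        total = sum(counts[k] for k in SCALE)
        if total == 0:
            continue
        scores[qid] = {
            "essential": counts["Essential"] / total,
            "worthwhile_plus": (counts["Essential"] + counts["Worthwhile"]) / total,
            "unwise": counts["Unwise"] / total,
        }
    ranks = {}
    for dim in ("essential", "worthwhile_plus", "unwise"):
        ordered = sorted(scores, key=lambda q: scores[q][dim], reverse=True)
        ranks[dim] = {qid: i + 1 for i, qid in enumerate(ordered)}  # rank #1 = highest percentage
    return scores, ranks
```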
3.3.2 Rating by Demographics. Unlike the Microsoft study, we did not have an employee database to rank responses based on demographics, and privacy regulations prevented us from asking people-related aspects such as years of experience (another deviation from the original study). Nonetheless, in both the initial and the rating survey, we asked the following professional background data from the participants:

  • Discipline: Participants were asked to indicate their primary working area: Development, Test, Project Management, Other Engineer (e.g. architect, lead), or Other Non-Engineer (only one selection was possible).
  • Current Role: Participants were asked to indicate their current role: Individual Contributor, Lead, Architect, Manager, Executive, or Other (more selections were possible).

To investigate the relations of descriptive questions to professional background (discipline or current role), we built stepwise logistic regression models. We built our own models since the referenced study did not share scripts to run statistical tests, although we did follow their procedure as is. Stepwise regression eliminated professional backgrounds that did not improve the model for a given question and a response. In addition, we removed professional backgrounds for which the coefficient in the model was not statistically significant at p-value < 0.01. For each of the 171 questions, we built a model with the Essential response (yes/no) as a dependent variable and professional background as an independent variable; we built similar models for the Worthwhile+ and Unwise responses, resulting in 513 models in total, three for each of the 171 descriptive questions.
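A minimal sketch of what these per-question models could look like is shown below; the DataFrame layout and column names are assumptions for illustration, and the stepwise elimination is simplified to a single full-model fit:

```python
# Hypothetical sketch: one logistic regression per descriptive question and
# response type, with professional background (discipline, current role) as
# categorical predictors. A real replication would add stepwise elimination
# and drop terms whose coefficients are not significant at p < 0.01.
import pandas as pd
import statsmodels.formula.api as smf

def fit_models(responses: pd.DataFrame, question_ids):
    """responses: one row per (participant, question) with hypothetical columns
    'question_id', 'rating', 'discipline', and 'role'."""
    models = {}
    for qid in question_ids:
        df = responses[responses["question_id"] == qid].copy()
        for label in ("Essential", "Worthwhile+", "Unwise"):
            if label == "Worthwhile+":
                df["y"] = df["rating"].isin(["Essential", "Worthwhile"]).astype(int)
            else:
                df["y"] = (df["rating"] == label).astype(int)
            fit = smf.logit("y ~ C(discipline) + C(role)", data=df).fit(disp=0)
            models[(qid, label)] = fit  # three models per question, 513 in total
    return models
```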

Citations
Proceedings ArticleDOI
28 Jun 2021
TL;DR: In this paper, the authors identify, analyzes, and synthesizes the challenges of ML-enabled software development, which differs from traditional software development. But, with the adoption of the SE technique to engineer MLenabled software, the study was able to identify advancement for ML-based software like automation of mismatch detection, which occurs due to the nature of different perspectives of stakeholders involved.
Abstract: As the data increase keeps on getting more extensive due to technology evolvement from the rational database, online transaction, cloud computing, data warehouse to big data analytics. This changes influences organizations to advance from data mining support to machine learning-enabled software platform. Seemingly, the study summarised secondary data from non-grey and grey academic literature as the research field recently started getting attention. Consequently, the work identifies, analyzes, and synthesizes the challenges of ML-enabled software development, which differs from traditional software development. But, with the adoption of the SE technique to engineer ML-enabled software development, the study was able to identify advancement for ML-enabled software likes automation of mismatch detection, which occurs due to the nature of different perspectives of stakeholders involved. Another one is integrating ML and SE data end-to-end pipeline to allow Systematic test mechanism and test automation where necessary when ML is complex in format to enable standard SE test logs. Then, education, training, and cooperation between the stakeholders, especially SE and ML, to gain more experience, knowledge, put rifts aside to join hands, and work together to ascertain user requirements. Finally, the work reframed the traditional SE development process to engineer the ML software development process. Therefore, the study can benefit stakeholders in the ML and SE communities in handling ML development challenges and may benefits academicians in conduction future research on software engineering for artificial intelligence.

2 citations

Proceedings ArticleDOI
07 Nov 2022
TL;DR: Nalanda as discussed by the authors is a large scale data platform for information overload and discovery in software development, which contains two subsystems: (1) a large-scale socio-technical graph system, named Nalanda graph system and (2) a big data index system, which aims at satisfying the information needs of software developers.
Abstract: Software development is information-dense knowledge work that requires collaboration with other developers and awareness of artifacts such as work items, pull requests, and file changes. With the speed of development increasing, information overload and information discovery are challenges for people developing and maintaining these systems. Finding information about similar code changes and experts is difficult for software engineers, especially when they work in large software systems or have just recently joined a project. In this paper, we build a large scale data platform named Nalanda platform to address the challenges of information overload and discovery. Nalanda contains two subsystems: (1) a large scale socio-technical graph system, named Nalanda graph system, and (2) a large scale index system, named Nalanda index system that aims at satisfying the information needs of software developers. To show the versatility of the Nalanda platform, we built two applications: (1) a software analytics application with a news feed named MyNalanda that has Daily Active Users (DAU) of 290 and Monthly Active Users (MAU) of 590, and (2) a recommendation system for related work items and pull requests that accomplished similar tasks (artifact recommendation) and a recommendation system for subject matter experts (expert recommendation), augmented by the Nalanda socio-technical graph. Initial studies of the two applications found that developers and engineering managers are favorable toward continued use of the news feed application for information discovery. The studies also found that developers agreed that a system like Nalanda artifact and expert recommendation application could reduce the time spent and the number of places needed to visit to find information.

1 citations

Proceedings ArticleDOI
01 May 2022
TL;DR: In this paper , the authors provide an industry perspective on why this is a challenging and worthy problem that needs to be addressed and outline an approach to quickly gauge the greenness of a software project based on the choices made across different SDLC dimensions.
Abstract: As sustainability takes center stage across businesses, green and energy-efficient choices are more crucial than ever. While it is becoming increasingly evident that software and the software industry are substantial and rapidly evolving contributors to carbon emissions, there is a dearth of approaches to create actionable awareness about this during the software development lifecycle (SDLC). Can software teams comprehend how green are their projects? Here we provide an industry perspective on why this is a challenging and worthy problem that needs to be addressed. We also outline an approach to quickly gauge the “greenness” of a software project based on the choices made across different SDLC dimensions and present the initial encouraging feedback this approach has received.

1 citations

Proceedings ArticleDOI
01 May 2022
TL;DR: The usefulness of this data is demonstrated by reporting the findings from two small studies: a topic model analysis providing an overview of open-source community dynamics since 2011 and a qualitative analysis of a smaller community-oriented sample within the authors' dataset to gain a better understanding of why contributors leave open- source software.
Abstract: Talks at practitioner-focused open-source software conferences are a valuable source of information for software engineering researchers. They provide a pulse of the community and are valuable source material for grey literature analysis. We curated a dataset of 24,669 talks from 87 open-source conferences between 2010 and 2021. We stored all relevant metadata from these conferences and provide scripts to collect the transcripts. We believe this data is useful for answering many kinds of questions, such as: What are the important/highly discussed topics within practitioner communities? How do practitioners interact? And how do they present themselves to the public? We demonstrate the usefulness of this data by reporting our findings from two small studies: a topic model analysis providing an overview of open-source community dynamics since 2011 and a qualitative analysis of a smaller community-oriented sample within our dataset to gain a better understanding of why contributors leave open-source software.
Proceedings ArticleDOI
20 Aug 2021
TL;DR: Huijgens et al. as discussed by the authors reported the use of the 145 software engineering questions for data scientists presented in the Microsoft study in a recent FSE '20 paper.
Abstract: We report here the use of the 145 software engineering questions for data scientists presented in the Microsoft study in a recent FSE '20 paper by Huijgens et al. The study by Begel et al. was replicated by Huijgens et al.
References
Proceedings ArticleDOI
13 May 2014
TL;DR: It is concluded that using snowballing, as a first search strategy, may very well be a good alternative to the use of database searches.
Abstract: Background: Systematic literature studies have become common in software engineering, and hence it is important to understand how to conduct them efficiently and reliably.Objective: This paper presents guidelines for conducting literature reviews using a snowballing approach, and they are illustrated and evaluated by replicating a published systematic literature review.Method: The guidelines are based on the experience from conducting several systematic literature reviews and experimenting with different approaches.Results: The guidelines for using snowballing as a way to search for relevant literature was successfully applied to a systematic literature review.Conclusions: It is concluded that using snowballing, as a first search strategy, may very well be a good alternative to the use of database searches.

2,279 citations

Journal ArticleDOI
TL;DR: In this paper, a correlation analysis was done between the five factor scores of the Ng et al. reanalysis and the four dimension scores of Hofstede's this paper study.
Abstract: Ng et al. (1982) collected data among students in nine Asian and Pacific countries using a modified version of the Rokeach Value Survey. Their data were reanalyzed by the present authors through an ecological factor analysis that produced five factors. Six of the countries covered also appear in Hofstede's (1983) extended study of work-related values among employees of a multinational corporation in 53 countries and regions. For the overlapping countries a correlation analysis was done between the five factor scores of the Ng et al. reanalysis and the four dimension scores of Hofstede. This correlation analysis revealed that each of Hofstede's dimensions can be distinctly identified in the Ng et al. data as well. This article is presented as an example of synergy between different cross-cultural studies.

1,391 citations

Book ChapterDOI
01 Jan 2008
TL;DR: This chapter uses examples of three software engineering surveys to illustrate the advantages and pitfalls of using surveys and discusses the six most important stages in survey-based research.
Abstract: Although surveys are an extremely common research method, surveybased research is not an easy option. In this chapter, we use examples of three software engineering surveys to illustrate the advantages and pitfalls of using surveys. We discuss the six most important stages in survey-based research: setting the survey’s objectives; selecting the most appropriate survey design; constructing the survey instrument (concentrating on self-administered questionnaires); assessing the reliability and validity of the survey instrument; administering the instrument; and, finally, analysing the collected data. This chapter provides only an introduction to survey-based research; readers should consult the referenced literature for more detailed advice.

386 citations

Journal ArticleDOI
TL;DR: This paper identifies two types of replications: exact replications, in which the procedures of an experiment are followed as closely as possible; and conceptual replication, inWhich the same research question is evaluated by using a different experimental procedure.
Abstract: Replications play a key role in Empirical Software Engineering by allowing the community to build knowledge about which results or observations hold under which conditions. Therefore, not only can a replication that produces similar results as the original experiment be viewed as successful, but a replication that produce results different from those of the original experiment can also be viewed as successful. In this paper we identify two types of replications: exact replications, in which the procedures of an experiment are followed as closely as possible; and conceptual replications, in which the same research question is evaluated by using a different experimental procedure. The focus of this paper is on exact replications. We further explore them to identify two sub-categories: dependent replications, where researchers attempt to keep all the conditions of the experiment the same or very similar and independent replications, where researchers deliberately vary one or more major aspects of the conditions of the experiment. We then discuss the role played by each type of replication in terms of its goals, benefits, and limitations. Finally, we highlight the importance of producing adequate documentation for an experiment (original or replication) to allow for replication. A properly documented replication provides the details necessary to gain a sufficient understanding of the study being replicated without requiring the replicator to slavishly follow the given procedures.

318 citations

Frequently Asked Questions (3)
Q1. What have the authors contributed in "Questions for data scientists in software engineering: a replication" ?

In 2014, a Microsoft study investigated the sort of questions that data science applied to software engineering should answer. This resulted in 145 questions that developers considered relevant for data scientists to answer, thus providing a research agenda to the community. This paper presents a study at ING, a software-defined enterprise in banking in which over 15,000 IT staff provides in-house software solutions. This paper presents a comprehensive guide of questions for data scientists selected from the previous study at Microsoft along with their current work at ING. The authors replicated the original Microsoft study at ING, looking for questions that impact both software companies and software-defined enterprises and continue to impact software engineering. The authors hope that the software engineering research community builds on the new list of questions to create a useful body of knowledge. Fast forward to five years, no further studies investigated whether the questions from the software engineers at Microsoft hold for other software companies, including software-intensive companies with different primary focus ( to which the authors refer as software-defined enterprises ). Furthermore, it is not evident that the problems identified five years ago are still applicable, given the technological advances in software engineering. 

Fifteen studies (21%) are about testing [8–10, 13, 23, 24, 33, 45, 66], debugging [80] and code review [25, 46]. 12 studies (17%) handle SE process related topics, such as productivity of software engineers [52], visualization [6, 31], and continuous delivery [74, 76].

The authors saw questions eliciting the need of agile methods in the Microsoft study while at ING the questions related to functional aspects. 

Trending Questions (1)
Can a software developer become data analyst?

Our results show that software engineering questions for data scientists in the software-defined enterprise are largely similar to the software company, albeit with exceptions.