Proceedings ArticleDOI

Questions for data scientists in software engineering: a replication

08 Nov 2020-pp 568-579
TL;DR: In this paper, the authors present a study at ING, a software-defined enterprise in banking in which over 15,000 IT staff provides in-house software solutions, and find that software engineering questions for data scientists in the software defined enterprise are largely similar to the software company, albeit with exceptions.
Abstract: In 2014, a Microsoft study investigated the sort of questions that data science applied to software engineering should answer. This resulted in 145 questions that developers considered relevant for data scientists to answer, thus providing a research agenda to the community. Fast forward to five years, no further studies investigated whether the questions from the software engineers at Microsoft hold for other software companies, including software-intensive companies with different primary focus (to which we refer as software-defined enterprises). Furthermore, it is not evident that the problems identified five years ago are still applicable, given the technological advances in software engineering. This paper presents a study at ING, a software-defined enterprise in banking in which over 15,000 IT staff provides in-house software solutions. This paper presents a comprehensive guide of questions for data scientists selected from the previous study at Microsoft along with our current work at ING. We replicated the original Microsoft study at ING, looking for questions that impact both software companies and software-defined enterprises and continue to impact software engineering. We also add new questions that emerged from differences in the context of the two companies and the five years gap in between. Our results show that software engineering questions for data scientists in the software-defined enterprise are largely similar to the software company, albeit with exceptions. We hope that the software engineering research community builds on the new list of questions to create a useful body of knowledge.

Summary (5 min read)

1 INTRODUCTION

  • Software engineering researchers try solving problems that are relevant to software developers, teams, and organizations.
  • As the authors started looking for existing resources, they came across the 145 software engineering questions for data scientists presented in the Microsoft study [7] .
  • Microsoft is a large software company, while ING is a FinTech company using software to improve its banking solutions (a software-defined enterprise).
  • The authors try to understand whether the questions relevant for a software company extend to a software-defined enterprise.
  • The authors shared subsets of these 171 descriptive questions with another random sample of 1,296 ING engineers for ranking.

2 IMPACT OF THE MICROSOFT 2014 STUDY

  • In order to gain a good insight into the further course of the Microsoft 2014 study after it was published, including any implications for research, the authors conducted a citation analysis.
  • The authors notice that all citing Microsoft studies use a survey among a large number of SE practitioners (ranging from 16 to 793 respondents with a median of 311), whereas other studies based on a survey generally reach substantially lower numbers of participants.
  • The third sub-table shows that most cited studies are about software analytics, often combined with a focus on the role of the software engineer and its perceptions, e.g. [42, 51] .
  • In addition, studies also relate to continuous delivery pipelines and pipeline automation [74, 78] .

3 STUDY DESIGN

  • In part one, the authors replicate the original Microsoft study at ING.
  • The authors follow the step-by-step procedure prescribed in the original study, with slight modifications appropriate for their context. Figure 1 depicts the research methodology they followed; the figure is an exact copy of the approach used in the original Microsoft 2014 study with numbers from their study.
  • In the next step, the authors compare the questions identified in the Microsoft study to ours for similarities and differences including addition of new questions and removal of previous questions to answer their research questions.

3.1 The Initial Survey

  • The authors sent the initial survey to 1,002 ING software engineers randomly chosen from a group of 2,342 employees working within the IT department of ING in May 2018.
  • Unlike the Microsoft study, the authors did not offer any reward to increase the participation.
  • This is a deviation from the original study but aligns with the policy of ING.
  • Out of the 1,002 engineers, 387 started the survey; 271 of them filled in the demographics but stopped when asked to write questions.
  • Table 3 shows the distribution of responses across discipline and role.

3.2 Coding and Categorization

  • Next the authors did an open card sort to group 336 questions into categories.
  • To create independent codes, the first author who did a majority of the coding did not study the Microsoft paper before or during the replication.
  • Questions 81 to 90 were tagged by both the second and the fourth author.
  • The authors distilled them into a set of so-called descriptive questions that more concisely describe each category (and sub-category).

3.3 The Rating Survey

  • The authors created a second survey to rate the 171 descriptive questions.
  • The authors split the questionnaire into eight component blocks (similar to the Microsoft study) and sent component blocks to potential respondents.
  • The authors rank each question based on the above percentages, with the top rank (#1) having the highest percentage in a dimension (Essential, Worthwhile+, or Unwise).
  • Table 5 and Table 6 present the most desired (Top 10 Essential, Top 10 Worthwhile+) and the most undesired (Top 10 Unwise) descriptive questions.

3.3.2 Rating by Demographics.

  • Unlike the Microsoft study, the authors did not have an employee database to rank responses based on demographics, and privacy regulations prevented them from asking people-related aspects such as years of experience (another deviation from the original study).
  • The authors built their own models since the referenced study did not share scripts to run statistical tests, although they did follow its procedure as is.
  • For each of the 171 questions, the authors built a model with Essential response (yes/no) as a dependent variable and professional background as independent variable.
  • The authors built similar models for Worthwhile+ and Unwise responses.
  • In total, the authors built 513 models, three for each of the 171 descriptive questions.

3.4 Comparison of Questions

  • Then for each theme, the authors see how the prominent questions in ING compare against the prominent questions at Microsoft.
  • First, the authors ran word counts on the questions from both the companies presenting a text-based comparison to identify broad differences.
  • Further, the first two authors manually analyzed top 100 essential questions from the two companies in detail.
  • The authors drew affinity diagrams using Microsoft questions and appended related questions from ING to it.
  • Analyses of the three clusters and the frequency distribution of questions (in addition to the previous three analyses) present insights into their research question.

4 RESULTS

  • The original Microsoft study came up with 145 questions that software engineers want data scientists to answer.
  • Replicating the original study at ING, the authors identified 171 data science questions.
  • This section presents a comparison of the two sets of questions based on category, type of questions within categories, top-rated questions, bottom-rated questions, and questions relevant for different demographics.
  • Next, the authors compare the questions from the two companies using word count and affinity diagrams to answer their research question.
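As a rough illustration of the word-count comparison mentioned above, a script along the following lines could be used. This is a hypothetical sketch (variable names, stop-word list, and inputs are assumptions), not the authors' analysis:

```python
# Hypothetical sketch: compare word frequencies in the two companies' question
# lists to surface broad differences in vocabulary (cf. Table 8 in the paper).
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "are", "how", "what", "we", "for"}

def top_words(questions, n=15):
    """Return the n most frequent non-stopword terms across a list of questions."""
    words = []
    for q in questions:
        words.extend(w for w in re.findall(r"[a-z]+", q.lower()) if w not in STOPWORDS)
    return Counter(words).most_common(n)

# Example usage with hypothetical question lists:
# ing_top = dict(top_words(ing_questions))       # e.g. code, test, software, quality, ...
# ms_top = dict(top_words(microsoft_questions))  # e.g. code, test, bugs, cost, customers, ...
# shared = ing_top.keys() & ms_top.keys()        # themes common to both companies
```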

4.1 Categories

  • The authors noticed that some of their categories directly match the Microsoft study.
  • Other categories, however, can be mapped to one or more categories of the Microsoft study.
  • The absence of any new emergent category in their study indicates that, broadly, the questions for a software-defined enterprise do not differ from those for a software company.
  • Next, the authors explore the essential questions at ING and their distinguishing link to the questions from the Microsoft study.

4.1.3 Development Best Practices (BEST)

  • This category emphasized best (or worst) development practices relating to technology selection, effectiveness, and choice of tools.
  • Questions here ranged from automated test data generation, on-demand provisioning of test environments, and testing of high volumes, to questions like "Should we let loose Chaos Monkey?" [35] [5].
  • Notably, questions relating to development trade-offs such as backward compatibility or the impact of testing in production appeared in the Microsoft study but not ours.

4.2 Top-Rated Questions

  • Interestingly, only two out of the top 15 "Essential" questions were a part of the top 10 "Worthwhile or higher" questions and none vice-versa.
  • The authors also noticed that in their study topics like the effects of automated continuous delivery pipeline popped up which were not seen in the Microsoft study.
  • This suggests that for Microsoft, customer benefit is the most important, or perhaps one of the most important, questions.
  • Overall, it seems that Microsoft has a strong focus on the customer, while ING emphasizes the engineering team itself.
  • Finally, seven questions in the Microsoft study (marked with the icon ⋆) were about quality-related issues (as in their study, which had eleven such questions).

4.3 Bottom-Rated Questions

  • Table 6 shows the top 10 unwise questions.
  • The most "Unwise" question (Q27) at ING is the use of domain-specific language for use by non-experts.
  • This effect can be seen in their study too (two of the top ten unwise questions, Q161 and Q30, relate to measuring the performance of individual engineers), but not nearly as strongly as in the Microsoft study.
  • It indicates resistance against comparing departments based on key performance indicators like the time to market.

4.4.1 Discipline.

  • The Microsoft study showed testers as a specific discipline mainly interested in test suites, bugs, and product quality.
  • This can be seen in Table 7, in which overall scores relating to "Test" are low and scores are highest for "Development".
  • Questions that are also in Table 5 are shown in italics.
  • The role "Manager" includes the responses for "Manager" and "Lead".
  • Testers are for example significantly interested in the testability of software code, and the quality of software related to an agile way of working and working in DevOps teams.

4.5 Comparing ING and Microsoft Questions

  • A comparison of the top 15 words from each company (see Table 8 ) shows that a majority of the popular themes are the same (e.g., code, test, software, and quality).
  • Apart from this, Microsoft questions focused more on bugs, cost, time, customers, and tools while ING employees talked about version, problem, systems, process, and impact.
  • Next, the authors inferred 24 themes from the clusters in the affinity diagram organically merging into three broad categories: relating to code (like understanding code, testing, quality), developers (individual and team productivity) and customers (note that while customers did not make it to the top-10 essential questions, they were important in the top-100).
  • In the ING study, however, the authors do not see such questions.
  • Another subtle difference between the two companies is relating to code size.

5 DISCUSSION

  • The authors discuss potential explanations for the differences in the list of questions found in their study compared to the Microsoft study.
  • The authors saw questions eliciting the need for agile methods in the Microsoft study, while at ING the questions related to functional aspects.
  • One potential explanation for the observation can be that software systems at ING are not of the same scale as Microsoft.
  • The authors noticed that employees often talked about security, but no real finance-related questions appear.
  • One explanation for this observation can be that the data science challenges relating to software development are independent of the actual field to which it is applied.

5.1 Implications

  • One of the key findings of this paper is a list of 171 questions that software engineers in a large, software-driven organization would like to see answered, in order to optimize their software development activities.
  • From a practical perspective, their study offers a new way of thinking to software development organizations who care about their development processes.
  • This is exactly how ING intends to use the questions, and the authors believe companies around the world can follow suit.
  • From a research perspective, the authors have seen that the original Microsoft study has generated a series of papers that apply some form of Machine Learning to address the questions raised in that study.
  • The authors' study aims to add urgency and direction to this emerging field, by highlighting not just which questions can be answered, but which ones should be answered, from a practitioner perspective.

5.2 Threats to Validity

  • While their study expands the external validity of the original study, the fact remains that the two lists of questions are based on just two companies, which are both large organizations with over 10,000 software developers.
  • The authors tried mitigating the threat of researcher bias by limiting their exposure to the previous study, not involving authors from the Microsoft study, and having multiple authors generate codes independently.
  • Especially mapping the professional background "Discipline" of the original study on the demographic "Discipline" as applied within ING was challenging.
  • Another potential threat is sensitivity of the ranks which mostly occurs at the extreme sides of the ranking, when, e.g., none of the participants label a question as 'Unwise'.
  • Furthermore, researchers may have their biases which can potentially influence the results.

6 CONCLUSION

  • Conducted at ING-a software-defined enterprise providing banking solutions-this study presents 171 questions that software engineers at ING would like data scientists to answer.
  • The authors compared the two lists of questions and found that the core software development challenges (relating to code, developer, and customer) remain the same.
  • The authors complete their analysis with a report on the impact the Microsoft 2014 study generated, also indicating the impact that their own study is capable of generating.
  • A thorough understanding of key questions software engineers have that can be answered by data scientists is of crucial importance to both the research community and modern software engineering practice.
  • The authors' study aims to contribute to this understanding.




Questions for Data Scientists in Software Engineering: A Replication
Hennie Huijgens
Delft University of Technology
Delft, The Netherlands
h.k.m.huijgens@tudelft.nl
Ayushi Rastogi
Ernst Mulders
Delft University of Technology
Delft, The Netherlands
a.rastogi@tudelft.nl
ernst@mulde.rs
Georgios Gousios
Arie van Deursen
Delft University of Technology
Delft, The Netherlands
g.gousios@tudelft.nl
arie.vandeursen@tudelft.nl
ABSTRACT
In 2014, a Microsoft study investigated the sort of questions that data science applied to software engineering should answer. This resulted in 145 questions that developers considered relevant for data scientists to answer, thus providing a research agenda to the community. Fast forward to five years, no further studies investigated whether the questions from the software engineers at Microsoft hold for other software companies, including software-intensive companies with different primary focus (to which we refer as software-defined enterprises). Furthermore, it is not evident that the problems identified five years ago are still applicable, given the technological advances in software engineering.

This paper presents a study at ING, a software-defined enterprise in banking in which over 15,000 IT staff provides in-house software solutions. This paper presents a comprehensive guide of questions for data scientists selected from the previous study at Microsoft along with our current work at ING. We replicated the original Microsoft study at ING, looking for questions that impact both software companies and software-defined enterprises and continue to impact software engineering. We also add new questions that emerged from differences in the context of the two companies and the five years gap in between. Our results show that software engineering questions for data scientists in the software-defined enterprise are largely similar to the software company, albeit with exceptions. We hope that the software engineering research community builds on the new list of questions to create a useful body of knowledge.
CCS CONCEPTS
• General and reference → Surveys and overviews.
KEYWORDS
Data Science, Software Engineering, Software Analytics.
Work completed during an internship at ING.
ACM Reference Format:
Hennie Huijgens, Ayushi Rastogi, Ernst Mulders, Georgios Gousios, and Arie van Deursen. 2020. Questions for Data Scientists in Software Engineering: A Replication. In Proceedings of the 28th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE '20), November 8–13, 2020, Virtual Event, USA. ACM, New York, NY, USA, 21 pages. https://doi.org/10.1145/3368089.3409717
1 INTRODUCTION
Software engineering researchers try solving problems that are relevant to software developers, teams, and organizations. Historically, researchers identified these problems from their experience, connections in industry and/or prior research. In 2014, however, a study at Microsoft [7] systematically analyzed software engineering questions that data scientists can answer and made it accessible to a wider audience.

Switching context, in the past few years ING transformed itself from a finance-oriented company to a software-defined, data-driven enterprise. From a software engineering perspective, this includes the implementation of fully automated release engineering pipelines for software development activities in more than 600 teams performing 2,500+ deployments per month for 750+ applications. These activities leave a trove of data, suggesting that data scientists using, e.g., modern machine learning techniques could offer valuable and actionable insights to ING.

To that end, ING needs questions that are relevant for their engineers which their data scientists can answer. As we started looking for existing resources, we came across the 145 software engineering questions for data scientists presented in the Microsoft study [7]. However, before adopting the list, we wanted to know:

RQ: To what extent do software engineering questions relevant for Microsoft apply to ING, five years later?

Microsoft is a large software company, while ING is a FinTech company using software to improve its banking solutions (a software-defined enterprise). Moreover, the two companies are at different scale. In 2014, Microsoft had more than 30,000 engineers while even today ING is almost half its size with approximately 15,000 IT employees (on a total of 45,000). More details on the differences in the context of the two companies are available in Table 1. We try to understand whether the questions relevant for a software company extend to a software-defined enterprise. We compare the results of the original Microsoft study [7] with our results at ING to understand the relevance of the questions beyond Microsoft but also as a guide for other software-defined enterprises that are undergoing their digital transformation. We further explore

whether the technological advances in the last five years changed the way we develop software. To answer this question, we carried out a replication of the original Microsoft study at ING. Similar to the original study, we conducted two surveys: one, to find data science problems in software engineering, and second, to rank the questions in the order of their relevance (see Figure 1). For the first survey, we randomly sampled 1,002 ING engineers and received 116 responses with 336 questions. We grouped the 336 questions on similarities resulting in 171 descriptive questions. We shared subsets of these 171 descriptive questions with another random sample of 1,296 ING engineers for ranking. In the end, we received 21,888 rankings from 128 ING engineers. These ranked 171 questions are the questions that engineers at ING would like data scientists to solve. Further, we compare our list of 171 questions to the original list of 145 questions to answer our research question. Our study shows that the core software development problems, relating to code (e.g. understanding code, testing, and quality), developer productivity (both individuals and team) and customer are the same for the software company and the software-defined enterprise. Nonetheless, subtle differences in the type of questions point to changes in the market as well as differences in the context of the two companies.
2 IMPACT OF THE MICROSOFT 2014 STUDY
In order to gain a good insight into the further course of the Microsoft 2014 study after it was published, including any implications for research, we conducted a citation analysis. In addition, we looked at studies that have not quoted the Microsoft study, but that are relevant to our study. Hence this section also serves as our discussion of related work. We investigated the 136 studies that, according to Google Scholar, quote the Microsoft study. First of all, we looked at the number of times that the 136 studies themselves were cited by other studies; we limited the further analysis to 70 studies with a citation per year greater than 1.00. We then characterized studies into empirical approach, reference characterization, SE topic, and machine learning (ML) topic (see Table 2). Note that one paper can belong to multiple topics. We made the following observations:

Microsoft itself is building on its study. 11% of the citations come from Microsoft studies itself, mostly highly cited studies on SE culture, such as [18, 41, 51]. We notice that all citing Microsoft studies use a survey among a large number of SE practitioners (ranging from 16 to 793 respondents with a median of 311), whereas other studies based on a survey generally reach substantially lower numbers of participants.
Table 1: Context of Microsoft in 2014 and ING in 2019.

Aspect                | Microsoft 2014                                                 | ING 2019
Branch                | Software Company                                               | Banking (FinTech)
Organization Size     | Approx. 100,000 (in 2014), about 30,000 engineers              | 45,000 employees, of which 15,000 IT
Team Structure        | Typically size 5 ± 2                                           | 600 teams of size 9 ± 2
Development Model     | Agile/Scrum (60%+)                                             | Agile (Scrum / Kanban)
Pipeline automation   | Every team is different; Continuous Integration in many teams | Continuous Delivery as a Service
Development Practice  | DevOps                                                         | (Biz)DevOps
Table 2: Characterizations of Citing Studies.

Empirical Approach (n = 70)                | Number of studies | Percentage
Analysis of SE process data (e.g. IDE)     | 30                | 43%
Survey SE practitioners                    | 17                | 24%
Interview SE practitioners                 | 7                 | 10%
Literature review                          | 5                 | 7%
Experiment, case, or field study           | 5                 | 7%

Reference characterization (n = 70)        | Number of studies | Percentage
Plain reference in related work            | 38                | 54%
Reference as example for study setup       | 27                | 39%
Study partly answers MS question           | 9                 | 13%
Study explicitly answers MS question       | 3                 | 4%

Software Engineering Topic (n = 70)        | Number of studies | Percentage
Software analytics, data science           | 20                | 29%
Testing, debugging, quality, code review   | 15                | 21%
Software engineering process               | 12                | 17%
Software engineering culture               | 9                 | 13%
Mobile apps                                | 3                 | 4%

Machine Learning Topic (n = 24)            | Number of studies | Percentage
Examples of Machine Learning applications  | 8                 | 11%
Natural Language Processing                | 5                 | 7%
Ensemble Algorithms                        | 3                 | 4%
Instance-based Algorithms                  | 2                 | 3%
Deep Learning Algorithms                   | 2                 | 3%
Other                                      | 4                 | 5%
Half of the citing studies analyze SE process data, and 24% uses a survey. Looking at the empirical approach (see the first sub-table in Table 2) indicates that 43% of the studies contain a quantitative component, in which analysis of SE process data in particular is part of the study. Good examples are [9, 28]. Furthermore, 24% of the citing studies uses a survey among SE practitioners, for example [18, 22, 45, 69, 75]. Ten percent is based on interviews with SE practitioners, such as [20, 41, 42, 50]. Seven percent contains a literature review, for example [12, 45, 73]. Another 7% conducts an experiment [33, 62], case study [49, 59], or field study [9, 10].
Only three out of 70 studies explicitly answer a question from the initial Microsoft study. The second sub-table in Table 2 shows that only 3 studies (4%) explicitly refer their research question to an initial Microsoft one: [16, 28, 33]. Nine studies (13%) partly try to answer a MS question: [8–10, 30, 52, 62, 64, 65, 70]. 29 studies (39%) refer to the original Microsoft study because they used it as an example for their own study [17, 59], either with regard to the study design [20, 22, 29, 37, 46, 48, 67], the rating approach (Kano) [51, 61], or the card sorting technique [19, 54, 60, 63]. Furthermore, a large part (38 studies, 54%) of the citing studies simply refers to the original Microsoft study in a simple related work way.
A majority of citing studies is about Software Analytics, Testing related studies, and SE Process. The third sub-table shows that most cited studies are about software analytics, often combined with a focus on the role of the software engineer and its perceptions, e.g. [42, 51]. In other cases the emphasis on software analytics is
combined with a more technical focus on machine learning, e.g. [21, 48]. Other studies within the topic software analytics are about a variety of methods, tools, and techniques [2, 3, 11, 14, 15, 27, 38, 47, 55, 71–73]. Many of the studies that cite the Microsoft study—and which are often quoted themselves—relate to testing or test automation. Fifteen studies (21%) are about testing [8–10, 13, 23, 24, 33, 45, 66], debugging [80] and code review [25, 46].
12 studies (17%) handle SE process related topics, such as productivity of software engineers [52], visualization [6, 31], and continuous delivery [74, 76]. In addition, studies also relate to continuous delivery pipelines and pipeline automation [74, 78]. Another frequent topic in citing studies is data and models, including aspects of cloud development [32, 49, 55]. Driven by a tendency toward automation of pipelines, software generates a large amount of data. Many different data sources—such as version control systems, peer code review systems, issue tracking systems, mail archives—are available for mining purposes [29, 79].
34% of the cited studies includes some form of Machine Learning. One third of the citing papers do include some form of machine learning (ML), ranging from applying a ML technique for analysis purposes to coming up with examples of the application of ML in practice. As the fourth sub-table in Table 2 shows, 8 studies include examples of applications of ML in practice, e.g. [11, 41, 55]. Text related techniques such as NLP occur 5 times, e.g. [23, 61], ensemble techniques 3 times [30, 37, 60], and instance-based and deep learning both 2 times [14, 21, 27, 48]. Four other techniques—neural networks, clustering, decision trees, and regression—occur one time. Perhaps this finding supports a trend that is visible in SE research, where more and more machine learning techniques are being used in SE analyses and vice versa, also called AI-for-Software-Engineering [1, 40, 53].
13% are about the cultural aspects of software engineering. Software analytics is an area of extensive growth [56]. The original Microsoft 2014 study influenced ongoing research; looking at the 136 papers citing it gives the impression that it certainly did inspire other researchers and practitioners in setting up studies on software developers' needs. Nine studies (13%) of the citing studies are about cultural aspects of software engineering, such as topic selection in experiments [58], characteristics of software engineers [20, 50, 67], causes for frustration [19], or challenges for software engineers [29, 63, 69].
3 STUDY DESIGN
Our study design comprises two parts. In part one, we replicate the original Microsoft study at ING. We follow the step-by-step procedure prescribed in the original study, with slight modifications appropriate for our context. Figure 1 depicts the research methodology we followed; the figure is an exact copy of the approach used in the original Microsoft 2014 study with numbers from our study. In the next step, we compare the questions identified in the Microsoft study to ours for similarities and differences, including the addition of new questions and removal of previous questions, to answer our research questions.

Figure 1: Overview of the research methodology. (This figure is a copy from the original Microsoft 2014 study, with numbers from our study; it was re-used with permission of the Microsoft 2014 study authors.)
3.1 The Initial Survey
We sent the initial survey to 1,002 ING software engineers randomly chosen from a group of 2,342 employees working within the IT department of ING in May 2018. Unlike the Microsoft study, we did not offer any reward to increase participation. This is a deviation from the original study but aligns with the policy of ING. Out of the 1,002 engineers, 387 started the survey; 271 of them filled in the demographics but stopped when asked to write questions. In the end, we received 336 questions from 116 responses, for a response rate of 11.6%. Table 3 shows the distribution of responses across discipline and role.
3.2 Coding and Categorization
Next we did an open card sort to group the 336 questions into categories. Our card sort was open, meaning that we coded independently from the Microsoft study. To create independent codes, the first author, who did a majority of the coding, did not study the Microsoft paper before or during the replication. The other authors knew the paper from before and merely skimmed the methodology section for replication.

We let the groups emerge and evolve during the sorting process. This process comprised three phases. In the preparation phase, we created a card for each question. Questions 1 to 40 were tagged by
the second author. Questions 41 to 80 were tagged by the fourth author. Questions 81 to 90 were tagged by both the second and the fourth author. The tags of questions 1 to 90 were discussed by both the second and fourth author and, based on their discussion, final tags were prepared. The remaining questions 91 to 336 were then tagged by the first author, based on the tags from the previous step. We discarded cards that made general comments on software development and did not inquire any specific topic.

In the execution phase, cards were sorted into meaningful groups and were assigned a descriptive title. Similar to the Microsoft study, the questions were not easy to work with; many questions were same or similar to one another, most were quite verbose while others were overly specific. We distilled them into a set of so-called descriptive questions that more concisely describe each category (and sub-category). In this step, out of the 336 questions, 49 questions were discarded and the remaining 287 questions were divided into 35 sub-categories. An example of reaching a descriptive question is presented below¹:

  ◦ 'What factors affect the composition of DevOps teams?'

from the following respondents' questions:

  • "Would it be better to create specialized development teams instead of DevOps teams?"
  • "What is your idea of an ideal team that should develop software? How many and what kind of people should be part of it?"

Finally, in the analysis phase, we created abstract hierarchies to deduce general categories and themes. In total, we created 171 descriptive questions, a full list of which is available in the appendix.
3.3 The Rating Survey
We created a second survey to rate the 171 descriptive questions. We split the questionnaire into eight component blocks (similar to the Microsoft study) and sent component blocks to potential respondents. The idea behind using the split questionnaire survey design is to avoid a low response rate. Each participant received a block of questions along with the text "In your opinion, how important is it to have a software data analytics team answer this question?", with possible answers "Essential", "Worthwhile", "Unimportant", "Unwise", and "I don't understand" [39].
¹ A closed balloon (rendered here as •) indicates a respondent question; an open balloon (◦) indicates a descriptive question.
Table 3: Distribution of responses based on discipline and role in the initial survey as well as the rating survey.

Discipline                          | Initial Survey | Rating Survey
Development & Testing               | 62.0%          | 68.8%
Project Management                  | 2.0%           | 3.9%
Other Engineering (e.g. architect)  | 28.0%          | 19.5%
Non-Engineering                     | 8.0%           | 7.8%

Current Role                        | Initial Survey | Rating Survey
Developer                           | 51.1%          | 20.0%
Lead                                | 14.3%          | 18.7%
Architect                           | 9.0%           | 11.8%
Manager & Executive                 | 8.3%           | 20.0%
Other                               | 17.3%          | 29.6%
The rating survey was sent to the remaining 1,296 software engineers at ING. Here too, 360 engineers started the survey (28%), but many of them did not complete it (36% drop-out rate). Finally, we received 128 responses, for a somewhat low response rate of 10%. On average, each question received 21,888/177 = 123 ratings, making the resulting ranks stable. Table 3 shows the distribution of responses for the rating survey based on discipline and current role.
3.3.1 Top-Rated/Boom-Rated estions. Finally, to rank each
question, we dichotomized the ordinal Kano scale avoiding any
scale violations [
44
]. We computed the following percentages for
each descriptive question:
Percentage of
Essential
responses among all the responses:
Essential
Essential + Worthwhile + Unimportant + Unwise
Percentage of ’Essential’ and ’Worthwhile responses among
all the responses (to which we refer as Worthwhile+):
Essential + W orthwhile
Essential + W orthwhile + Unimportant + Unwise
Percentage of
Unwise
responses among all the responses:
U nw ise
Essential + W orthwhile + Unimportant + Unwise
We rank each question based on the above percentages, with
the top rank (#1) having the highest percentage in a dimension
(Essential, Worthwhile+, or Unwise). Table 5 and Table 6 presents
the most desired (Top 10 Essential, Top 10 Worthwhile+) and the
most undesired (Top 10 Unwise) descriptive questions. For all 171
questions and their rank, see the appendix.
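The dichotomization and ranking above can be made concrete with a small sketch; the data structures are hypothetical and this is not the authors' own script:

```python
# Hypothetical sketch: dichotomize Kano-style ratings per descriptive question
# and rank questions by the Essential, Worthwhile+, and Unwise percentages.
from collections import Counter

SCALE = ("Essential", "Worthwhile", "Unimportant", "Unwise")

def rank_questions(ratings):
    """ratings maps a question id to a list of responses; "I don't understand"
    responses are excluded from the denominator, as implied by the formulas above."""
    scores = {}
    for qid, answers in ratings.items():
        counts = Counter(answers)
        total = sum(counts[k] for k in SCALE)
        if total == 0:
            continue
        scores[qid] = {
            "essential": counts["Essential"] / total,
            "worthwhile_plus": (counts["Essential"] + counts["Worthwhile"]) / total,
            "unwise": counts["Unwise"] / total,
        }
    ranks = {}
    for dim in ("essential", "worthwhile_plus", "unwise"):
        ordered = sorted(scores, key=lambda q: scores[q][dim], reverse=True)
        ranks[dim] = {qid: i + 1 for i, qid in enumerate(ordered)}  # rank #1 = highest percentage
    return scores, ranks
```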
3.3.2 Rating by Demographics. Unlike the Microsoft study, we did not have an employee database to rank responses based on demographics, and privacy regulations prevented us from asking people-related aspects such as years of experience (another deviation from the original study). Nonetheless, in both the initial and the rating survey, we asked the following professional background data from the participants:

  • Discipline: Participants were asked to indicate their primary working area: Development, Test, Project Management, Other Engineer (e.g. architect, lead), or Other Non-Engineer (only one selection was possible).
  • Current Role: Participants were asked to indicate their current role: Individual Contributor, Lead, Architect, Manager, Executive, or Other (more selections were possible).

To investigate the relations of descriptive questions to professional background (discipline or current role), we built stepwise logistic regression models. We built our own models since the referenced study did not share scripts to run statistical tests, although we did follow their procedure as is. Stepwise regression eliminated professional backgrounds that did not improve the model for a given question and a response. In addition, we removed professional backgrounds for which the coefficient in the model was not statistically significant at p-value < 0.01. For each of the 171 questions, we built a model with the Essential response (yes/no) as a dependent variable and professional background as an independent variable; we built similar models for the Worthwhile+ and Unwise responses, resulting in 513 models in total, three for each of the 171 descriptive questions.
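A minimal sketch of what these per-question models could look like is shown below; the DataFrame layout and column names are assumptions for illustration, and the stepwise elimination is simplified to a single full-model fit:

```python
# Hypothetical sketch: one logistic regression per descriptive question and
# response type, with professional background (discipline, current role) as
# categorical predictors. A real replication would add stepwise elimination
# and drop terms whose coefficients are not significant at p < 0.01.
import pandas as pd
import statsmodels.formula.api as smf

def fit_models(responses: pd.DataFrame, question_ids):
    """responses: one row per (participant, question) with hypothetical columns
    'question_id', 'rating', 'discipline', and 'role'."""
    models = {}
    for qid in question_ids:
        df = responses[responses["question_id"] == qid].copy()
        for label in ("Essential", "Worthwhile+", "Unwise"):
            if label == "Worthwhile+":
                df["y"] = df["rating"].isin(["Essential", "Worthwhile"]).astype(int)
            else:
                df["y"] = (df["rating"] == label).astype(int)
            fit = smf.logit("y ~ C(discipline) + C(role)", data=df).fit(disp=0)
            models[(qid, label)] = fit  # three models per question, 513 in total
    return models
```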

Citations
Proceedings ArticleDOI
28 Jun 2021
TL;DR: In this paper, the authors identify, analyzes, and synthesizes the challenges of ML-enabled software development, which differs from traditional software development. But, with the adoption of the SE technique to engineer MLenabled software, the study was able to identify advancement for ML-based software like automation of mismatch detection, which occurs due to the nature of different perspectives of stakeholders involved.
Abstract: As the data increase keeps on getting more extensive due to technology evolvement from the rational database, online transaction, cloud computing, data warehouse to big data analytics. This changes influences organizations to advance from data mining support to machine learning-enabled software platform. Seemingly, the study summarised secondary data from non-grey and grey academic literature as the research field recently started getting attention. Consequently, the work identifies, analyzes, and synthesizes the challenges of ML-enabled software development, which differs from traditional software development. But, with the adoption of the SE technique to engineer ML-enabled software development, the study was able to identify advancement for ML-enabled software likes automation of mismatch detection, which occurs due to the nature of different perspectives of stakeholders involved. Another one is integrating ML and SE data end-to-end pipeline to allow Systematic test mechanism and test automation where necessary when ML is complex in format to enable standard SE test logs. Then, education, training, and cooperation between the stakeholders, especially SE and ML, to gain more experience, knowledge, put rifts aside to join hands, and work together to ascertain user requirements. Finally, the work reframed the traditional SE development process to engineer the ML software development process. Therefore, the study can benefit stakeholders in the ML and SE communities in handling ML development challenges and may benefits academicians in conduction future research on software engineering for artificial intelligence.

2 citations

Proceedings ArticleDOI
07 Nov 2022
TL;DR: Nalanda as discussed by the authors is a large scale data platform for information overload and discovery in software development, which contains two subsystems: (1) a large-scale socio-technical graph system, named Nalanda graph system and (2) a big data index system, which aims at satisfying the information needs of software developers.
Abstract: Software development is information-dense knowledge work that requires collaboration with other developers and awareness of artifacts such as work items, pull requests, and file changes. With the speed of development increasing, information overload and information discovery are challenges for people developing and maintaining these systems. Finding information about similar code changes and experts is difficult for software engineers, especially when they work in large software systems or have just recently joined a project. In this paper, we build a large scale data platform named Nalanda platform to address the challenges of information overload and discovery. Nalanda contains two subsystems: (1) a large scale socio-technical graph system, named Nalanda graph system, and (2) a large scale index system, named Nalanda index system that aims at satisfying the information needs of software developers. To show the versatility of the Nalanda platform, we built two applications: (1) a software analytics application with a news feed named MyNalanda that has Daily Active Users (DAU) of 290 and Monthly Active Users (MAU) of 590, and (2) a recommendation system for related work items and pull requests that accomplished similar tasks (artifact recommendation) and a recommendation system for subject matter experts (expert recommendation), augmented by the Nalanda socio-technical graph. Initial studies of the two applications found that developers and engineering managers are favorable toward continued use of the news feed application for information discovery. The studies also found that developers agreed that a system like Nalanda artifact and expert recommendation application could reduce the time spent and the number of places needed to visit to find information.

1 citations

Proceedings ArticleDOI
01 May 2022
TL;DR: In this paper , the authors provide an industry perspective on why this is a challenging and worthy problem that needs to be addressed and outline an approach to quickly gauge the greenness of a software project based on the choices made across different SDLC dimensions.
Abstract: As sustainability takes center stage across businesses, green and energy-efficient choices are more crucial than ever. While it is becoming increasingly evident that software and the software industry are substantial and rapidly evolving contributors to carbon emissions, there is a dearth of approaches to create actionable awareness about this during the software development lifecycle (SDLC). Can software teams comprehend how green are their projects? Here we provide an industry perspective on why this is a challenging and worthy problem that needs to be addressed. We also outline an approach to quickly gauge the “greenness” of a software project based on the choices made across different SDLC dimensions and present the initial encouraging feedback this approach has received.

1 citations

Proceedings ArticleDOI
01 May 2022
TL;DR: The usefulness of this data is demonstrated by reporting the findings from two small studies: a topic model analysis providing an overview of open-source community dynamics since 2011 and a qualitative analysis of a smaller community-oriented sample within the authors' dataset to gain a better understanding of why contributors leave open- source software.
Abstract: Talks at practitioner-focused open-source software conferences are a valuable source of information for software engineering researchers. They provide a pulse of the community and are valuable source material for grey literature analysis. We curated a dataset of 24,669 talks from 87 open-source conferences between 2010 and 2021. We stored all relevant metadata from these conferences and provide scripts to collect the transcripts. We believe this data is useful for answering many kinds of questions, such as: What are the important/highly discussed topics within practitioner communities? How do practitioners interact? And how do they present themselves to the public? We demonstrate the usefulness of this data by reporting our findings from two small studies: a topic model analysis providing an overview of open-source community dynamics since 2011 and a qualitative analysis of a smaller community-oriented sample within our dataset to gain a better understanding of why contributors leave open-source software.
Proceedings ArticleDOI
20 Aug 2021
TL;DR: Huijgens et al. as discussed by the authors reported the use of the 145 software engineering questions for data scientists presented in the Microsoft study in a recent FSE '20 paper.
Abstract: We report here the use of the 145 software engineering questions for data scientists presented in the Microsoft study in a recent FSE '20 paper by Huijgens et al. The study by Begel et al. was replicated by Huijgens et al.
References
Proceedings ArticleDOI
13 May 2014
TL;DR: It is concluded that using snowballing, as a first search strategy, may very well be a good alternative to the use of database searches.
Abstract: Background: Systematic literature studies have become common in software engineering, and hence it is important to understand how to conduct them efficiently and reliably.Objective: This paper presents guidelines for conducting literature reviews using a snowballing approach, and they are illustrated and evaluated by replicating a published systematic literature review.Method: The guidelines are based on the experience from conducting several systematic literature reviews and experimenting with different approaches.Results: The guidelines for using snowballing as a way to search for relevant literature was successfully applied to a systematic literature review.Conclusions: It is concluded that using snowballing, as a first search strategy, may very well be a good alternative to the use of database searches.

2,279 citations

Journal ArticleDOI
TL;DR: In this paper, a correlation analysis was done between the five factor scores of the Ng et al. reanalysis and the four dimension scores of Hofstede's this paper study.
Abstract: Ng et al. (1982) collected data among students in nine Asian and Pacific countries using a modified version of the Rokeach Value Survey. Their data were reanalyzed by the present authors through an ecological factor analysis that produced five factors. Six of the countries covered also appear in Hofstede's (1983) extended study of work-related values among employees of a multinational corporation in 53 countries and regions. For the overlapping countries a correlation analysis was done between the five factor scores of the Ng et al. reanalysis and the four dimension scores of Hofstede. This correlation analysis revealed that each of Hofstede's dimensions can be distinctly identified in the Ng et al. data as well. This article is presented as an example of synergy between different cross-cultural studies.

1,391 citations

Book ChapterDOI
01 Jan 2008
TL;DR: This chapter uses examples of three software engineering surveys to illustrate the advantages and pitfalls of using surveys and discusses the six most important stages in survey-based research.
Abstract: Although surveys are an extremely common research method, surveybased research is not an easy option. In this chapter, we use examples of three software engineering surveys to illustrate the advantages and pitfalls of using surveys. We discuss the six most important stages in survey-based research: setting the survey’s objectives; selecting the most appropriate survey design; constructing the survey instrument (concentrating on self-administered questionnaires); assessing the reliability and validity of the survey instrument; administering the instrument; and, finally, analysing the collected data. This chapter provides only an introduction to survey-based research; readers should consult the referenced literature for more detailed advice.

386 citations

Journal ArticleDOI
TL;DR: This paper identifies two types of replications: exact replications, in which the procedures of an experiment are followed as closely as possible; and conceptual replication, inWhich the same research question is evaluated by using a different experimental procedure.
Abstract: Replications play a key role in Empirical Software Engineering by allowing the community to build knowledge about which results or observations hold under which conditions. Therefore, not only can a replication that produces similar results as the original experiment be viewed as successful, but a replication that produce results different from those of the original experiment can also be viewed as successful. In this paper we identify two types of replications: exact replications, in which the procedures of an experiment are followed as closely as possible; and conceptual replications, in which the same research question is evaluated by using a different experimental procedure. The focus of this paper is on exact replications. We further explore them to identify two sub-categories: dependent replications, where researchers attempt to keep all the conditions of the experiment the same or very similar and independent replications, where researchers deliberately vary one or more major aspects of the conditions of the experiment. We then discuss the role played by each type of replication in terms of its goals, benefits, and limitations. Finally, we highlight the importance of producing adequate documentation for an experiment (original or replication) to allow for replication. A properly documented replication provides the details necessary to gain a sufficient understanding of the study being replicated without requiring the replicator to slavishly follow the given procedures.

318 citations

Frequently Asked Questions (3)
Q1. What have the authors contributed in "Questions for data scientists in software engineering: a replication" ?

In 2014, a Microsoft study investigated the sort of questions that data science applied to software engineering should answer. This resulted in 145 questions that developers considered relevant for data scientists to answer, thus providing a research agenda to the community. This paper presents a study at ING, a software-defined enterprise in banking in which over 15,000 IT staff provides in-house software solutions. This paper presents a comprehensive guide of questions for data scientists selected from the previous study at Microsoft along with their current work at ING. The authors replicated the original Microsoft study at ING, looking for questions that impact both software companies and software-defined enterprises and continue to impact software engineering. The authors hope that the software engineering research community builds on the new list of questions to create a useful body of knowledge. Fast forward to five years, no further studies investigated whether the questions from the software engineers at Microsoft hold for other software companies, including software-intensive companies with different primary focus ( to which the authors refer as software-defined enterprises ). Furthermore, it is not evident that the problems identified five years ago are still applicable, given the technological advances in software engineering. 

Fifteen studies (21%) are about testing [8–10, 13, 23, 24, 33, 45, 66], debugging [80] and code review [25, 46]. 12 studies (17%) handle SE process related topics, such as productivity of software engineers [52], visualization [6, 31], and continuous delivery [74, 76].

The authors saw questions eliciting the need of agile methods in the Microsoft study while at ING the questions related to functional aspects. 

Trending Questions (1)
Can a software developer become data analyst?

Our results show that software engineering questions for data scientists in the software-defined enterprise are largely similar to the software company, albeit with exceptions.