Data Scientist Skills

01 Apr 2016-Vol. 03, Iss: 04, pp 52-61
IOSR Journal of Mobile Computing & Application (IOSR-JMCA)
e-ISSN: 2394-0050, P-ISSN: 2394-0042.Volume 3, Issue 4 (Jul. - Aug. 2016), PP 52-61
www.iosrjournals.org
DOI: 10.9790/0050-03045261 www.iosrjournals.org 52 | Page
Data Scientist Skills
Nur Amie Ismail, Wardah Zainal Abidin
(Advance Informatics School, University of Technology Malaysia, Kuala Lumpur)
Abstract: Decision making is one of the most important aspects of enhancing service delivery to citizens
and businesses, gaining more profit, and helping stakeholders to strategize their business functions. Nowadays, most
stakeholders make decisions based on data that is precise, concise, appropriate, and accurate. Even though
Big Data Analytics (BDA) tools and software can assist in this matter, the skills and competency of the personnel who
handle and manage the data are more crucial. Thus, the aim of this paper is to identify data scientist
skills from global best practices and examine the most important data scientist skills required by Information
Technology (IT) personnel. We found 44 data scientist skills, of which the top five (5) are
business, statistics, machine learning, communication, and analysis.
Keywords: Data Science, Data Scientist, Data Scientist Skill
I. Introduction
Nowadays, with the vast amounts of data available in the world, companies across industries are focusing
on exploiting data for competitive advantage. Hence, they have realized that they need to hire more data scientists
or equip their employees with data scientist skills. A data scientist is an expert who is capable of extracting meaningful
value from data and of managing the whole data lifecycle [1]. Data scientists also help to bridge the
communication gap between business and IT functions by proposing meaningful measures, modelling the data,
visualizing the output, sharing the technique, and automating the process [2]. According to the McKinsey Global
Institute, the United States of America (USA) alone needs another 140,000 to 190,000 data scientists by
2018. In Malaysia, the Multimedia Development Corporation (MDeC) has set an ambitious target to
produce 1,500 data scientists by 2020. According to [3], there are currently only eighty (80) data
scientists across the country. To increase the number of data scientists, various programs have been arranged,
such as Big Data conferences, training, certification, and Massive Open Online Courses (MOOCs). However,
these programs are still insufficient to meet the ambitious target. Thus, this paper
identifies data scientist skills from global best practices and examines the most important data scientist skills required
by IT personnel in order to be recognized as data scientists.
This paper is organized as follows. Section 2 explains the review methodology
used in this study. Section 3 briefly explains the definition of data science, its fundamental concepts, and
the difference between a Data Scientist and a Data Analyst. Section 4 discusses data scientist skills from global
best practices, followed by the findings in Section 5. Finally, the conclusion and future work are in Section 6.
II. Review Methodology
To gain a clear understanding of the research topic, a literature review was conducted using
various resources such as books, articles, journals, and web sites. The computerized databases used in this paper
are the Association for Computing Machinery (ACM) Digital Library, IEEE, Science Direct, Springer Link, Wiley
Online Library, ERIC, Gartner, and Google Scholar. The resources reviewed were published between 2011
and 2016. Fig. 1 shows the review methodology used in this paper:
Fig. 1: Research review methodology

III. Data Science Definition, Data Science Fundamental Concepts And Difference Between
Data Scientist And Data Analyst
3.1 Data Science Definition
Several authors have defined data science, as listed in Table 1.
Table 1: Data Science Definitions
| No. | Definition |
|---|---|
| 1 | A set of fundamental principles that support and guide the principled extraction of information and knowledge from data [4]. |
| 2 | Data science is the study of the generalizable extraction of knowledge from data [5]. |
| 3 | Data science is a combination of statistics, computer science, and information design [2]. |
From Table 1, we can summarize that data science is a combination of fields of study related to the extraction and
transformation of data.
3.2 Data Science Fundamental Concepts
According to [2], the fundamental concepts of data science are as follows. Extracting useful knowledge from data to
solve business problems can be treated systematically by following a process with reasonably well-defined
stages. Evaluating data-science results requires careful consideration of the context in which they will be used. The
relationship between the business problem and the analytics solution can often be decomposed into tractable
sub-problems via the framework of analyzing expected value. Information technology can be used to find informative
data items from within a large body of data. In addition, entities that are similar with respect to known features or
attributes are often similar with respect to unknown features or attributes; patterns found in a data set might not
generalize beyond the observed data; and drawing causal conclusions requires attention to the presence of
confounding factors, possibly unseen ones.
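The expected-value framework mentioned above can be illustrated with a minimal sketch. It assumes a binary outcome (for example, a customer either responds to an offer or does not); the function name `expected_value` and all numbers are hypothetical and are not taken from [2].

```python
def expected_value(p_positive: float, value_positive: float, value_negative: float) -> float:
    """Expected value of acting on one case: p * v_pos + (1 - p) * v_neg."""
    if not 0.0 <= p_positive <= 1.0:
        raise ValueError("p_positive must be a probability in [0, 1]")
    return p_positive * value_positive + (1.0 - p_positive) * value_negative


# Hypothetical numbers: sending an offer costs 1 unit, and a response
# yields a net profit of 99 units.
ev = expected_value(p_positive=0.05, value_positive=99.0, value_negative=-1.0)

# Decision rule: act only when the expected value is positive.
should_target = ev > 0
```

The decomposition separates the problem into two sub-problems: estimating the probability (a modelling task) and estimating the values (a business task).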
3.3 Difference Between Data Scientist And Data Analyst
According to [6], a Data Analyst focuses on the movement and interpretation of data, typically concentrating on the
past and present, whereas a Data Scientist focuses on summarizing data and forecasting based on patterns
identified in past and current data. [7] defines and differentiates the Data Scientist and the Data Analyst as
described in Table 2.
Table 2: Data Scientist vs Data Analyst
| Data Scientist | Data Analyst |
|---|---|
| Building statistical models that make decisions based on data. Each decision can be hard, e.g. block a page from rendering, or soft, e.g. assign a score for the maliciousness of a page that is used by downstream systems or humans. | Writing custom queries to answer complex business questions. |
| Conducting causality experiments that attempt to attribute the root cause of an observed phenomenon. This can be done by designing A/B experiments or, if an A/B experiment is not possible, by applying an epidemiological approach to the problem. | Conceiving and implementing new metrics for capturing previously poorly understood parts of the business / product. |
| Identifying new products or features that come from unlocking the value of data; being a thought leader on the value of data. A good example is the product recommendations feature that Amazon first made available to a mass audience. | Addressing data quality issues, such as data gaps or biases in data acquisition. Working with the rest of engineering to instrument incremental new data acquisition. |
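As an illustration of the A/B experiments mentioned in Table 2, the following minimal sketch compares the conversion rates of two variants with a two-proportion z-test. This is one common way to analyze such an experiment, not a method prescribed by [7]; the function name and the counts are hypothetical.

```python
import math
from typing import Tuple


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in conversion rates B - A."""
    if n_a <= 0 or n_b <= 0:
        raise ValueError("sample sizes must be positive")
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    if se == 0.0:
        raise ValueError("standard error is zero; rates are all 0 or all 1")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical experiment: variant A converts 100 of 1000 users, variant B 150 of 1000.
z, p_value = two_proportion_z_test(100, 1000, 150, 1000)
significant = p_value < 0.05
```

A small p-value suggests the difference is unlikely to be due to chance alone; it does not by itself rule out confounding from a flawed experimental design.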
IV. Data Scientist Skills
To explore the list of data scientist skills, this study takes a global perspective and also
includes the Malaysian public sector. Basically, the data scientist applies advanced analytical approaches using
sophisticated analytics and data visualization software or tools in order to discover patterns in the data. The data
scientist then performs data migration and integration, data cleaning, and analysis, and delivers the outcomes. According
to [8], the data scientist must be able to write in different programming languages such as Python, R, Java, Ruby,
Clojure, Matlab, Pig, and SQL. In addition, the data scientist needs to understand Hive, Hadoop, and
MapReduce. They also suggest that the data scientist must be familiar with Natural Language Processing (NLP),
machine learning, conceptual modelling, statistical analysis, predictive modelling, and hypothesis testing. Even
though the data scientist has to learn the new skills explained above, they should at least have capabilities in
communication, database querying, understanding business strategy, designing simple prototypes
for top management, and system architecture.
Educated data scientists are a rarely sighted breed, especially within business and government. To
tackle this scenario, we need to produce more graduates and also equip employees with the necessary skills in
data science. [2] suggests that the data scientist should have skills in data mining, data modelling, data visualization,
and machine learning. According to [9], the data scientist uses advanced analytics such as predictive analysis, data
visualization and modelling, and machine learning to predict what is going to happen in the future and to give
recommendations for enhancing existing business processes. They also describe the data scientist as a combination
of three (3) main fields: computer science, statistics, and domain knowledge.
Fig. 2 shows the relationship and skills for each area.
Fig. 2: Data scientist skills (Ayankoya et al., 2014)
[5] emphasizes that machine learning is the most important skill and is necessary for all data scientists. In
machine learning, the data scientist should master all three (3) classes of skills listed in Table 3.
Besides machine learning, the data scientist also requires knowledge of text mining, markup languages such as XML,
mathematics, and artificial intelligence (AI).
Table 3: Three (3) classes of skills in Machine Learning
| No. | Class | Skills |
|---|---|---|
| 1 | Statistics | Bayesian statistics and probability. |
| 2 | Computer Science | Data structures, algorithms, and distributed computing. |
| 3 | Correlation and Causation | Modelling. |
A data scientist is an expert who has the ability to manipulate data, extract knowledge from it, and turn it into
meaningful value [1]. According to that study, there is currently no accepted and effective data science
professional curriculum. [10] found that the top two (2) skills companies look for in a data scientist are
programming and statistics. The details of these two (2) skills are illustrated in Fig. 3.
Fig. 3: Programming and Statistics
According to [11], data science is now being benchmarked against the practices employed by highly
skilled professionals. Data scientists use scientific methods to discover knowledge and patterns in data.
Fig. 4: The data science benefits-realization process (Viaene, 2013)
Fig. 4 illustrates how to use data to improve the business. This process involves modelling,
discovering, operationalizing, and cultivating knowledge. The data scientist must have strong skills in the
business domain, analysis, and communication. According to [12], data scientist is the sexiest job of this
century, sexy in the sense of having rare qualities that are in high demand. Data scientists are urgently needed by
organizations because they know how to use the analysis of big data to make effective decisions. Among the skills
they should have are programming languages, computer science, mathematics, economics, probability, and business. In
the O’Reilly book Analyzing the Analyzers, [13] surveyed more than 200 data scientists
to discover and analyze which skills data scientists need. They found the 22 generic skills shown in Fig. 5.
Fig. 5: Data Scientist generic skills
The Malaysian public sector started its BDA project in 2014, led by MAMPU. Since data science is
still new in Malaysia, it does not have internal expertise in this area. Therefore, external
consultants have been hired to develop the project. Even though external parties are used, knowledge transfer, training, and
technology updates are given to Government IT officers. The Malaysian Government soon realized the
importance of having internal expertise in this field. Hence, data scientist skills were identified to enhance
the competency and knowledge of Government IT officers. According to [14], the skills required of a data scientist
consist of modelling and analysis, data processing, statistics, business domain, soft skills, and technical skills, as
illustrated in Fig. 6.
Fig. 6: Data scientist skill (Suhailis, 2016)
In 2013, [15] announced the Digital Malaysia Roadmap, a plan that addresses
three ICT areas: access to, adoption of, and usage of ICT services. One of the goals in the roadmap is to
improve Big Data literacy in Malaysia. Therefore, in October 2013, MDeC surveyed 17
Big Data experts. The participants came from different backgrounds, such as telecommunication companies,
universities, marketing agencies, software development companies, and others. Based on the survey, the top five
skills needed are:
(i) Big and Distributed Data (e.g., Hadoop, MapReduce)
(ii) Algorithms (e.g., computational complexity, CS theory)
(iii) Machine Learning (e.g., decision trees, neural nets, SVM, clustering)
(iv) Back-End Programming (e.g., Java/Rails/Objective-C)
(v) Visualization (e.g., statistical graphics, mapping, web-based dataviz)
In the last few years, interest in the data science field has soared. Most companies in the USA are
seeking and recruiting employees who have skills related to data science. [16]
emphasizes that the data scientist must have both technical and non-technical skills, as listed in Table 4 below:
Table 4: Skills needed when recruiting data science employees
| No. | Skills |
|---|---|
| 1 | Analytics, SAS, R, Python, coding, Hadoop, SQL, and databases. |
| 2 | Intellectual curiosity, business acumen, and communication skills. |
In the United Kingdom, data science is among the most rapidly emerging fields, based on trends in the ICT market.
The key to success in business nowadays is to understand customers’ preferences, needs, and behavior. Thus, the data
scientist plays an important role in making predictions and decisions in this area. [17] concludes that
data scientists need the multi-faceted skills illustrated in Fig. 7.
