Proceedings ArticleDOI

The need for new processes, methodologies and tools to support big data teams and improve big data project effectiveness

29 Oct 2015, pp. 2066–2071
TL;DR: This paper discusses the key research questions relating to methodologies, tools and frameworks that could improve big data team effectiveness, as well as the potential goals for a big data process methodology.
Abstract: As data continues to be produced in massive amounts, with increasing volume, velocity and variety, big data projects are growing in frequency and importance. However, the growth in the use of big data has outstripped the knowledge of how to support teams that need to do big data projects. In fact, while much has been written in terms of the use of algorithms that can help generate insightful analysis, much less has been written about methodologies, tools and frameworks that could enable teams to more effectively and efficiently "do" big data projects. Hence, this paper discusses the key research questions relating to methodologies, tools and frameworks to improve big data team effectiveness as well as the potential goals for a big data process methodology. Finally, the paper also discusses related domains, such as software development, operations research and business intelligence, since these fields might provide insight into how to define a big data process methodology.
Citations
Proceedings ArticleDOI
01 Dec 2016
TL;DR: There is no agreed-upon standard for executing Big Data projects, but there is a growing research focus in this area and an improved process methodology would be useful; the synthesis also provides practical suggestions to help practitioners execute their projects.
Abstract: This paper reports on our review of published research relating to how teams work together to execute Big Data projects. Our findings suggest that there is no agreed upon standard for executing these projects but that there is a growing research focus in this area and that an improved process methodology would be useful. In addition, our synthesis also provides useful suggestions to help practitioners execute their projects, specifically our identified list of 33 important success factors for executing Big Data efforts, which are grouped by our six identified characteristics of a mature Big Data organization.

69 citations


Cites background from "The need for new processes, methodo..."

  • ...Current Big Data research has typically focused on improving data models and algorithms, but not on understanding the best approach to execute projects [1] [2]....


  • ...The goals of such a process methodology could be to improve coordination with others, ensure quality, ensure data ownership (as well as security and privacy), analyze requirements, prioritize requirements and enable analytical solutions to be deployed [2]....


  • ...Process & Phase [2] [17] [24] [25] [28] [30] [31] [32] -- --...


  • ...Conference Journal Search Process & Phase [2] [17] [24] [25] [28] [30] [31] [32] -- -- Process [1] [38] [46] -- [19] [20]...


  • ...Furthermore, it has also been reported that a project management methodology from a related field, such as software development or operations research, fails to address specific big data challenges [2]....


Proceedings ArticleDOI
04 Jan 2017
TL;DR: The results from an experiment comparing four different methodologies to manage and coordinate a data science project demonstrate that there are significant differences based on the methodology used, with an Agile Kanban methodology being the most effective and, surprisingly, an Agile Scrum methodology being the least effective.
Abstract: Data Science is an emerging field with a significant research focus on improving the techniques available to analyze data. However, there has been much less focus on how people should work together on a data science project. In this paper, we report on the results of an experiment comparing four different methodologies to manage and coordinate a data science project. We first introduce a model to compare different project management methodologies and then report on the results of our experiment. The results from our experiment demonstrate that there are significant differences based on the methodology used, with an Agile Kanban methodology being the most effective and surprisingly, an Agile Scrum methodology being the least effective.

64 citations


Cites background or methods from "The need for new processes, methodo..."

  • ...However, if one wants to compare different methodologies to improve a group’s overall performance, then it becomes critical to be able to evaluate and compare the effectiveness of the different teams [3]....


  • ...For example, in the field of data science, there is no known “best” process to do a data science project [3]....


  • ...The step-by-step data science process described by Jagadish and others does not provide much guidance about the process a data science team should use to work together [3]....


  • ...For example, compared to software development, data science projects have an increased focus on data, what data is needed and the availability, quality and timeliness of the data [1, 3, 21]....


Journal ArticleDOI
TL;DR: A model examining the relationship between the application of big data analytics (ABDA) and organizational performance (OP) in small and medium enterprises (SMEs); the results indicated that ABDA had a positive and significant impact on OP.
Abstract: Drawing from tenets of the resource-based theory, we propose and test a model that examines the relationship between the application of big data analytics (ABDA) and organizational performance (OP) in small and medium enterprises (SMEs). Further, this study examines the mediating role of knowledge management practices (KMP) in relation to the ABDA and OP. Data were collected from respondents working in SMEs through an adapted instrument. This research study adopts the Baron–Kenny approach to test the mediation. The results indicated that the ABDA had a positive and significant impact on OP. Also, KMP partially mediated the relationship between ABDA and OP in SMEs. The dataset was solely comprised of SMEs from Pakistan-administered Kashmir and may not reflect insights from other regions, which limits the generalizability of the results. Findings highlight both strategic and practical implications related to decision making in organizations for top management, particularly in developing countries. This study attempts to contribute to the literature through novel findings and recommendations. These findings will help top management during the key decision-making process and encourage practitioners who seek competitive advantage through enhanced organizational performance in SMEs.

38 citations


Cites background from "The need for new processes, methodo..."

  • ...A firm needs to get an analytical insight into a huge volume of data to apply big data analytics which will certainly help an organization to improve business performance [19, 60]....


Journal ArticleDOI
TL;DR: A set of case studies where researchers were embedded within data science teams; the researchers' observations and analysis focused on the attributes that help describe data science projects and the challenges faced by the teams executing these projects, as opposed to the algorithms and technologies used to perform the analytics.
Abstract: The challenge in executing a data science project is more than just identifying the best algorithm and tool set to use. Additional sociotechnical challenges include items such as how to define the project goals and how to ensure the project is effectively managed. This paper reports on a set of case studies where researchers were embedded within data science teams and where the researcher observations and analysis was focused on the attributes that can help describe data science projects and the challenges faced by the teams executing these projects, as opposed to the algorithms and technologies that were used to perform the analytics. Based on our case studies, we identified 14 characteristics that can help describe a data science project. We then used these characteristics to create a model that defines two key dimensions of the project. Finally, by clustering the projects within these two dimensions, we identified four types of data science projects, and based on the type of project, we identified some of the sociotechnical challenges that project teams should expect to encounter when executing data science projects.

35 citations


Cites background from "The need for new processes, methodo..."

  • ...While there are certainly parallels to other types of technical projects such as software development, Saltz (2015) noted that there are differences as compared to these other types of projects, and that this suggests that the desired organizational processes, and description of the project, could be different than those used in existing domains such as software development....



  • ...While the use of a step-by-step description of data science provides some understanding of the tasks involved, it does not provide much guidance about the roles required within a data science team (Saltz, 2015)....


  • ...…items such as understanding what data might be available, what might be the goals of the effort, what project management framework should be used, how to engage and coordinate with the proper extended team, and how to establish realistic project timelines (Ahangama & Poo, 2015; Saltz, 2015)....


  • ...” It has been suggested that an improved data science process methodology could help improve the success rate of these projects (Saltz, 2015)....


Proceedings ArticleDOI
29 Oct 2015
TL;DR: This paper characterizes data-driven practice and research and explores how to design effective methods for systematizing such practice and research, as well as how to develop automated computing methods that require significant and expensive computing power in order to scale effectively.
Abstract: Big Data is characterized by the five V's — of Volume, Velocity, Variety, Veracity and Value. Research on Big Data, that is, the practice of gaining insights from it, challenges the intellectual, process, and computational limits of an enterprise. Leveraging the correct and appropriate toolset requires careful consideration of a large software ecosystem. Powerful algorithms exist, but the exploratory and often ad-hoc nature of analytic demands and a distinct lack of established processes and methodologies make it difficult for Big Data teams to set expectations or even create valid project plans. The exponential growth of data generated exceeds the capacity of humans to process it, and compels us to develop automated computing methods that require significant and expensive computing power in order to scale effectively. In this paper, we characterize data-driven practice and research and explore how we might design effective methods for systematizing such practice and research [19, 22]. Brief case studies are presented in order to ground our conclusions and insights.

32 citations

References
Journal ArticleDOI
TL;DR: This paper discusses many of the important IS success research contributions of the last decade, focusing especially on research efforts that apply, validate, challenge, and propose enhancements to the original model.
Abstract: Ten years ago, we presented the DeLone and McLean Information Systems (IS) Success Model as a framework and model for measuring the complex-dependent variable in IS research. In this paper, we discuss many of the important IS success research contributions of the last decade, focusing especially on research efforts that apply, validate, challenge, and propose enhancements to our original model. Based on our evaluation of those contributions, we propose minor refinements to the model and propose an updated DeLone and McLean IS Success Model. We discuss the utility of the updated model for measuring e-commerce system success. Finally, we make a series of recommendations regarding current and future measurement of IS success.

9,544 citations


"The need for new processes, methodo..." refers to methods in this paper

  • ...DeLone and McLean’s model has three components: the creation of a system, the use of the system, and the consequences of this system use....


  • ...[17] DeLone, W. H., & McLean, E. R. (2003). The DeLone and McLean model of information systems success: a ten-year update....


  • ...A commonly cited model for IS Success is from DeLone and McLean [17], which is based on the system and information quality that drives use and user satisfaction, which drives individual impact and leads to organizational impact....


Journal ArticleDOI
TL;DR: An overview of this emerging field is provided, clarifying how data mining and knowledge discovery in databases are related both to each other and to related fields, such as machine learning, statistics, and databases.
Abstract: ■ Data mining and knowledge discovery in databases have been attracting a significant amount of research, industry, and media attention of late. What is all the excitement about? This article provides an overview of this emerging field, clarifying how data mining and knowledge discovery in databases are related both to each other and to related fields, such as machine learning, statistics, and databases. The article mentions particular real-world applications, specific data-mining techniques, challenges involved in real-world applications of knowledge discovery, and current and future research directions in the field.

4,782 citations


"The need for new processes, methodo..." refers to methods in this paper

  • ...For example, they are similar to the KDD (Knowledge Discovery in Databases) process described nearly twenty years ago [5]....


Journal ArticleDOI
TL;DR: This introduction to the MIS Quarterly Special Issue on Business Intelligence Research first provides a framework that identifies the evolution, applications, and emerging research areas of BI&A, and introduces and characterized the six articles that comprise this special issue in terms of the proposed BI &A research framework.
Abstract: Business intelligence and analytics (BI&A) has emerged as an important area of study for both practitioners and researchers, reflecting the magnitude and impact of data-related problems to be solved in contemporary business organizations. This introduction to the MIS Quarterly Special Issue on Business Intelligence Research first provides a framework that identifies the evolution, applications, and emerging research areas of BI&A. BI&A 1.0, BI&A 2.0, and BI&A 3.0 are defined and described in terms of their key characteristics and capabilities. Current research in BI&A is analyzed and challenges and opportunities associated with BI&A research and education are identified. We also report a bibliometric study of critical BI&A publications, researchers, and research topics based on more than a decade of related academic and industry publications. Finally, the six articles that comprise this special issue are introduced and characterized in terms of the proposed BI&A research framework.

4,610 citations


"The need for new processes, methodo..." refers to background in this paper

  • ...In fact, many in the field believe that big data research needs to continue to focus on analytics [1]....


01 Jan 1987

2,580 citations

Journal ArticleDOI
TL;DR: The capability maturity model (CMM), developed to present sets of recommended practices in a number of key process areas that have been shown to enhance software-development and maintenance capability, is discussed.
Abstract: The capability maturity model (CMM), developed to present sets of recommended practices in a number of key process areas that have been shown to enhance software-development and maintenance capability, is discussed. The CMM was designed to help developers select process-improvement strategies by determining their current process maturity and identifying the issues most critical to improving their software quality and process. The initial release of the CMM, version 1.0, was reviewed and used by the software community during 1991 and 1992. A workshop on CMM 1.0, held in April 1992, was attended by about 200 software professionals. The current version of the CMM is the result of the feedback from that workshop and ongoing feedback from the software community. The technical report that describes version 1.1 is summarised.

1,179 citations


"The need for new processes, methodo..." refers to background in this paper

  • ...Hence, not surprisingly, Bhardwaj [8] noted that teams doing data analysis and data science work in an ad hoc fashion, using trial and error to identify the right tools, that is, at a low level of process maturity [9]....


  • ...(CMM) framework provides a standard definition of process maturity, which includes five levels of process maturity [9]...
