Journal Article

Data Mining Practical Machine Learning Tools and Techniques

About: This article was published in Journal of management science on 2014-01-01 and is currently open access. It has received 9185 citations to date.
Citations
Journal Article
TL;DR: This introduction to the MIS Quarterly Special Issue on Business Intelligence Research first provides a framework that identifies the evolution, applications, and emerging research areas of BI&A, and then introduces and characterizes the six articles that comprise the special issue in terms of the proposed BI&A research framework.
Abstract: Business intelligence and analytics (BI&A) has emerged as an important area of study for both practitioners and researchers, reflecting the magnitude and impact of data-related problems to be solved in contemporary business organizations. This introduction to the MIS Quarterly Special Issue on Business Intelligence Research first provides a framework that identifies the evolution, applications, and emerging research areas of BI&A. BI&A 1.0, BI&A 2.0, and BI&A 3.0 are defined and described in terms of their key characteristics and capabilities. Current research in BI&A is analyzed and challenges and opportunities associated with BI&A research and education are identified. We also report a bibliometric study of critical BI&A publications, researchers, and research topics based on more than a decade of related academic and industry publications. Finally, the six articles that comprise this special issue are introduced and characterized in terms of the proposed BI&A research framework.

4,610 citations


Cites background from "Data Mining Practical Machine Learn..."

  • ...Most of these popular data mining algorithms have been incorporated in commercial and open source data mining systems (Witten et al. 2011)....


Journal Article
TL;DR: This survey paper formally defines transfer learning, presents information on current solutions, and reviews applications of transfer learning; the surveyed solutions are independent of data size and can be applied to big data environments.
Abstract: Machine learning and data mining techniques have been used in numerous real-world applications. An assumption of traditional machine learning methodologies is the training data and testing data are taken from the same domain, such that the input feature space and data distribution characteristics are the same. However, in some real-world machine learning scenarios, this assumption does not hold. There are cases where training data is expensive or difficult to collect. Therefore, there is a need to create high-performance learners trained with more easily obtained data from different domains. This methodology is referred to as transfer learning. This survey paper formally defines transfer learning, presents information on current solutions, and reviews applications applied to transfer learning. Lastly, there is information listed on software downloads for various transfer learning solutions and a discussion of possible future research work. The transfer learning solutions surveyed are independent of data size and can be applied to big data environments.

2,900 citations


Cites background from "Data Mining Practical Machine Learn..."

  • ...The field of data mining and machine learning has been widely and successfully used in many applications where patterns from past information (training data) can be extracted in order to predict future outcomes [129]....


  • ...Witten IH, Frank E. Data mining, practical machine learning tools and techniques....


  • ...For more information on machine learning see Witten [129]....


Journal Article
TL;DR: Tapping into the "folk knowledge" needed to advance machine learning applications is a natural next step in the development of artificial intelligence systems.
Abstract: Machine learning algorithms can figure out how to perform important tasks by generalizing from examples. This is often feasible and cost-effective where manual programming is not. As more data becomes available, more ambitious problems can be tackled. As a result, machine learning is widely used in computer science and other fields. However, developing successful machine learning applications requires a substantial amount of “black art” that is hard to find in textbooks. This article summarizes twelve key lessons that machine learning researchers and practitioners have learned. These include pitfalls to avoid, important issues to focus on, and answers to common questions.

2,482 citations


Cites background from "Data Mining Practical Machine Learn..."

  • ...Several fine textbooks are available to interested practitioners and researchers (e.g., [16, 24])....


Journal Article
TL;DR: This paper surveys the state of the art in HAR based on wearable sensors and proposes a two-level taxonomy according to the learning approach and the response time.
Abstract: Providing accurate and opportune information on people's activities and behaviors is one of the most important tasks in pervasive computing. Innumerable applications can be visualized, for instance, in medical, security, entertainment, and tactical scenarios. Despite human activity recognition (HAR) being an active field for more than a decade, there are still key aspects that, if addressed, would constitute a significant turn in the way people interact with mobile devices. This paper surveys the state of the art in HAR based on wearable sensors. A general architecture is first presented along with a description of the main components of any HAR system. We also propose a two-level taxonomy in accordance to the learning approach (either supervised or semi-supervised) and the response time (either offline or online). Then, the principal issues and challenges are discussed, as well as the main solutions to each one of them. Twenty eight systems are qualitatively evaluated in terms of recognition performance, energy consumption, obtrusiveness, and flexibility, among others. Finally, we present some open problems and ideas that, due to their high relevance, should be addressed in future research.

2,184 citations


Cites methods from "Data Mining Practical Machine Learn..."

  • ...For instance, classification algorithms such as Instance Based Learning [32] and Bagging [49] are very expensive in their evaluation phase, which makes them not convenient for mobile HAR....


Journal Article
TL;DR: Given the growing application of ML methods in cancer research, this work reviews the most recent publications that use these techniques to model cancer risk or patient outcomes.
Abstract: Cancer has been characterized as a heterogeneous disease consisting of many different subtypes. The early diagnosis and prognosis of a cancer type have become a necessity in cancer research, as it can facilitate the subsequent clinical management of patients. The importance of classifying cancer patients into high or low risk groups has led many research teams, from the biomedical and the bioinformatics field, to study the application of machine learning (ML) methods. Therefore, these techniques have been utilized as an aim to model the progression and treatment of cancerous conditions. In addition, the ability of ML tools to detect key features from complex datasets reveals their importance. A variety of these techniques, including Artificial Neural Networks (ANNs), Bayesian Networks (BNs), Support Vector Machines (SVMs) and Decision Trees (DTs) have been widely applied in cancer research for the development of predictive models, resulting in effective and accurate decision making. Even though it is evident that the use of ML methods can improve our understanding of cancer progression, an appropriate level of validation is needed in order for these methods to be considered in the everyday clinical practice. In this work, we present a review of recent ML approaches employed in the modeling of cancer progression. The predictive models discussed here are based on various supervised ML techniques as well as on different input features and data samples. Given the growing trend on the application of ML methods in cancer research, we present here the most recent publications that employ these techniques as an aim to model cancer risk or patient outcomes.

1,991 citations


Cites methods from "Data Mining Practical Machine Learn..."

  • ...Scientists applied different methods, such as screening in early stage, in order to find types of cancer before they cause symptoms....

