Book Chapter

An Empirical Comparison of Six Supervised Machine Learning Techniques on Spark Platform for Health Big Data

Abstract
Health care is one of the prominent industries that generate voluminous data, creating a need for machine learning techniques combined with big data solutions. The goal of this paper is to (i) compare the performance of six machine learning techniques on the Spark platform, specifically for health big data, and (ii) discuss the results of experiments conducted on datasets with different characteristics, drawing inferences and conclusions from them. Five benchmark health datasets are considered for experimentation. The metrics chosen for comparison are accuracy and the computational time of the algorithms, and the experiments are conducted with different proportions of training data. The experimental results show that logistic regression and random forests perform well in terms of accuracy, while naive Bayes outperforms the other techniques in terms of computational time on health big datasets.
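
The evaluation protocol described in the abstract (accuracy and computational time, measured at several training proportions) can be sketched roughly as follows. This is only an illustrative plain-Python harness, not the authors' Spark code: the synthetic dataset, the two toy classifiers (a majority-class baseline and a small Gaussian naive Bayes), the chosen proportions, and every function name are assumptions made for the example.

```python
import math
import random
import time
from collections import Counter, defaultdict


class MajorityClass:
    """Toy baseline (assumption): always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label for _ in X]


class GaussianNaiveBayes:
    """Minimal Gaussian naive Bayes for numeric features (stand-in, not Spark MLlib)."""

    def fit(self, X, y):
        by_class = defaultdict(list)
        for row, label in zip(X, y):
            by_class[label].append(row)
        self.stats = {}
        for label, rows in by_class.items():
            prior = len(rows) / len(X)
            means = [sum(col) / len(rows) for col in zip(*rows)]
            # Small constant keeps every variance strictly positive.
            variances = [
                sum((v - m) ** 2 for v in col) / len(rows) + 1e-9
                for col, m in zip(zip(*rows), means)
            ]
            self.stats[label] = (prior, means, variances)
        return self

    def _log_posterior(self, row, label):
        prior, means, variances = self.stats[label]
        score = math.log(prior)
        for v, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score

    def predict(self, X):
        return [max(self.stats, key=lambda c: self._log_posterior(row, c)) for row in X]


def make_synthetic(n, seed=0):
    """Two well-separated Gaussian blobs; purely illustrative, not a health dataset."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 3.0 * label
        X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
        y.append(label)
    return X, y


def compare(models, X, y, train_fractions, seed=0):
    """Return {(model_name, fraction): (accuracy, seconds)} for each combination."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    results = {}
    for frac in train_fractions:
        cut = int(frac * len(idx))
        train, test = idx[:cut], idx[cut:]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        X_te, y_te = [X[i] for i in test], [y[i] for i in test]
        for name, factory in models.items():
            start = time.perf_counter()  # wall-clock time for fit + predict
            preds = factory().fit(X_tr, y_tr).predict(X_te)
            elapsed = time.perf_counter() - start
            acc = sum(p == t for p, t in zip(preds, y_te)) / len(y_te)
            results[(name, frac)] = (acc, elapsed)
    return results


if __name__ == "__main__":
    X, y = make_synthetic(1000)
    models = {"majority": MajorityClass, "gaussian_nb": GaussianNaiveBayes}
    for (name, frac), (acc, secs) in sorted(compare(models, X, y, [0.5, 0.7, 0.9]).items()):
        print(f"{name:12s} train={frac:.1f} accuracy={acc:.3f} time={secs:.4f}s")
```

In a Spark setting the same loop would presumably wrap distributed classifiers and a distributed train/test split; the point here is only the shape of the comparison: one accuracy and one timing figure per technique and per training proportion.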


Citations
Journal Article

Health big data analytics: current perspectives, challenges and potential solutions

TL;DR: The characteristics of Health Big Data as well as the challenges and solutions for health Big Data Analytics (BDA) are discussed and a pipelined framework for use as a guideline/reference in health BDA is designed and evaluated.
Proceedings Article

TrORF: Building Trading Areas Around Organizations Based on Machine Learning Techniques

TL;DR: This paper proposes a machine-learning-based methodology for determining business types and trading areas, named TrORF, to help investors select business types and build trading areas around organizations; the results indicate that this approach has a strong ability to build suitable business types around each university's trading area.
Journal Article

An Intelligent Technique to Predict the Autism Spectrum Disorder Using Big Data Platform

TL;DR: In this paper, the Intelligent Classification Based on Association rules (ICBA) algorithm is proposed for finding correlations between features to decide whether an individual has autism at an early stage, especially in childhood.
References
Proceedings Article

Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing

TL;DR: Resilient Distributed Datasets is presented, a distributed memory abstraction that lets programmers perform in-memory computations on large clusters in a fault-tolerant manner and is implemented in a system called Spark, which is evaluated through a variety of user applications and benchmarks.
Proceedings Article

An empirical comparison of supervised learning algorithms

TL;DR: A large-scale empirical comparison between ten supervised learning methods: SVMs, neural nets, logistic regression, naive bayes, memory-based learning, random forests, decision trees, bagged trees, boosted trees, and boosted stumps is presented.
Journal Article

Logistic regression and artificial neural network classification models: a methodology review

TL;DR: In this paper, the differences and similarities of logistic regression and artificial neural network models are summarized from a technical point of view, and the models are compared with other machine learning algorithms using a set of quality criteria.
Journal Article

A preliminary performance comparison of five machine learning algorithms for practical IP traffic flow classification

TL;DR: The performance impact of feature set reduction, using Consistency-based and Correlation-based feature selection, is demonstrated on Naïve Bayes, C4.5, Bayesian Network and Naïve Bayes Tree algorithms.
Journal Article

Deep neural networks, gradient-boosted trees, random forests: Statistical arbitrage on the S&P 500

TL;DR: In this article, the effectiveness of deep neural networks (DNN), gradient-boosted-trees (GBT), random forests (RAF), and several ensembles of these methods in the context of statistical arbitrage was evaluated.