Author

Satish Gopalani

Bio: Satish Gopalani is an academic researcher. The author has contributed to research in topics: Spark (mathematics) & Cluster analysis. The author has an hindex of 1, co-authored 1 publications receiving 105 citations.

Papers

PDF

Open Access

More filters

Journal Article•DOI•

Comparing Apache Spark and Map Reduce with Performance Analysis using K-Means

[...]

Satish Gopalani, Rohan Arora

18 Mar 2015-International Journal of Computer Applications

TL;DR: Two of the comparison of - Hadoop Map Reduce and the recently introduced Apache Spark - both of which provide a processing model for analyzing big data are discussed, both of whom vary significantly based on the use case under implementation.

...read moreread less

Abstract: Data has long been the topic of fascination for Computer Science enthusiasts around the world, and has gained even more prominence in the recent times with the continuous explosion of data resulting from the likes of social media and the quest for tech giants to gain access to deeper analysis of their data This paper discusses two of the comparison of - Hadoop Map Reduce and the recently introduced Apache Spark - both of which provide a processing model for analyzing big data Although both of these options are based on the concept of Big Data, their performance varies significantly based on the use case under implementation This is what makes these two options worthy of analysis with respect to their variability and variety in the dynamic field of Big Data In this paper we compare these two frameworks along with providing the performance analysis using a standard machine learning algorithm for clustering (K- Means)

...read moreread less

124 citations

Cited by

PDF

Open Access

More filters

Journal Article•DOI•

Big data in healthcare: management, analysis and future prospects

[...]

Sabyasachi Dash¹, Sushil Kumar Shakyawar², Mohit Sharma³, Sandeep Kaushik⁴•Institutions (4)

Cornell University¹, University of Minho², Jagiellonian University³, European Institute⁴

19 Jun 2019-Journal of Big Data

TL;DR: To provide relevant solutions for improving public health, healthcare providers are required to be fully equipped with appropriate infrastructure to systematically generate and analyze big data.

...read moreread less

Abstract: ‘Big data’ is massive amounts of information that can work wonders. It has become a topic of special interest for the past two decades because of a great potential that is hidden in it. Various public and private sector industries generate, store, and analyze big data with an aim to improve the services they provide. In the healthcare industry, various sources for big data include hospital records, medical records of patients, results of medical examinations, and devices that are a part of internet of things. Biomedical research also generates a significant portion of big data relevant to public healthcare. This data requires proper management and analysis in order to derive meaningful information. Otherwise, seeking solution by analyzing big data quickly becomes comparable to finding a needle in the haystack. There are various challenges associated with each step of handling big data which can only be surpassed by using high-end computing solutions for big data analysis. That is why, to provide relevant solutions for improving public health, healthcare providers are required to be fully equipped with appropriate infrastructure to systematically generate and analyze big data. An efficient management, analysis, and interpretation of big data can change the game by opening new avenues for modern healthcare. That is exactly why various industries, including the healthcare industry, are taking vigorous steps to convert this potential into better services and financial advantages. With a strong integration of biomedical and healthcare data, modern healthcare organizations can possibly revolutionize the medical therapies and personalized medicine.

...read moreread less

615 citations

Journal Article•DOI•

Big data analytics on Apache Spark

[...]

Salman Salloum¹, Ruslan Dautov¹, Xiaojun Chen¹, Patrick Xiaogang Peng¹, Joshua Zhexue Huang¹ - Show less +1 more•Institutions (1)

Shenzhen University¹

13 Oct 2016-Journal of data science

TL;DR: This review shows what Apache Spark has for designing and implementing big data algorithms and pipelines for machine learning, graph analysis and stream processing and highlights some research and development directions on Apache Spark for big data analytics.

...read moreread less

Abstract: Apache Spark has emerged as the de facto framework for big data analytics with its advanced in-memory programming model and upper-level libraries for scalable machine learning, graph analysis, streaming and structured data processing. It is a general-purpose cluster computing framework with language-integrated APIs in Scala, Java, Python and R. As a rapidly evolving open source project, with an increasing number of contributors from both academia and industry, it is difficult for researchers to comprehend the full body of development and research behind Apache Spark, especially those who are beginners in this area. In this paper, we present a technical review on big data analytics using Apache Spark. This review focuses on the key components, abstractions and features of Apache Spark. More specifically, it shows what Apache Spark has for designing and implementing big data algorithms and pipelines for machine learning, graph analysis and stream processing. In addition, we highlight some research and development directions on Apache Spark for big data analytics.

...read moreread less

241 citations

Journal Article•DOI•

Statistical Learning Theory and ELM for Big Social Data Analysis

[...]

Luca Oneto¹, Federica Bisio¹, Erik Cambria², Davide Anguita¹•Institutions (2)

University of Genoa¹, Nanyang Technological University²

01 Aug 2016-IEEE Computational Intelligence Magazine

TL;DR: This paper shows how to exploit the most recent technological tools and advances in Statistical Learning Theory (SLT) in order to efficiently build an Extreme Learning Machine (ELM) and assess the resultant model's performance when applied to big social data analysis.

...read moreread less

Abstract: The science of opinion analysis based on data from social networks and other forms of mass media has garnered the interest of the scientific community and the business world. Dealing with the increasing amount of information present on the Web is a critical task and requires efficient models developed by the emerging field of sentiment analysis. To this end, current research proposes an efficient approach to support emotion recognition and polarity detection in natural language text. In this paper, we show how to exploit the most recent technological tools and advances in Statistical Learning Theory (SLT) in order to efficiently build an Extreme Learning Machine (ELM) and assess the resultant model's performance when applied to big social data analysis. ELM represents a powerful learning tool, developed to overcome some issues in back-propagation networks. The main problem with ELM is in training them to work in the event of a large number of available samples, where the generalization performance has to be carefully assessed. For this reason, we propose an ELM implementation that exploits the Spark distributed in memory technology and show how to take advantage of the most recent advances in SLT in order to address the issue of selecting ELM hyperparameters that give the best generalization performance.

...read moreread less

101 citations

Proceedings Article•DOI•

Big data machine learning using apache spark MLlib

[...]

Mehdi Assefi¹, Ehsun Behravesh, Guangchi Liu, Ahmad P. Tafti²•Institutions (2)

University of Georgia¹, Marshfield Clinic²

01 Dec 2017

TL;DR: This contribution explores the expanding body of the Apache Spark MLlib 2.0 as an open-source, distributed, scalable, and platform independent machine learning library, and performs several real world machine learning experiments to examine the qualitative and quantitative attributes of the platform.

...read moreread less

Abstract: Artificial intelligence, and particularly machine learning, has been used in many ways by the research community to turn a variety of diverse and even heterogeneous data sources into high quality facts and knowledge, providing premier capabilities to accurate pattern discovery. However, applying machine learning strategies on big and complex datasets is computationally expensive, and it consumes a very large amount of logical and physical resources, such as data file space, CPU, and memory. A sophisticated platform for efficient big data analytics is becoming more important these days as the data amount generated in a daily basis exceeds over quintillion bytes. Apache Spark MLlib is one of the most prominent platforms for big data analysis which offers a set of excellent functionalities for different machine learning tasks ranging from regression, classification, and dimension reduction to clustering and rule extraction. In this contribution, we explore, from the computational perspective, the expanding body of the Apache Spark MLlib 2.0 as an open-source, distributed, scalable, and platform independent machine learning library. Specifically, we perform several real world machine learning experiments to examine the qualitative and quantitative attributes of the platform. Furthermore, we highlight current trends in big data machine learning research and provide insights for future work.

...read moreread less

66 citations

Journal Article•DOI•

A three-way cluster ensemble approach for large-scale data

[...]

Hong Yu¹, Chen Yun¹, Pawan Lingras², Guoyin Wang¹•Institutions (2)

Chongqing University of Posts and Telecommunications¹, Saint Mary's University²

01 Dec 2019-International Journal of Approximate Reasoning

TL;DR: The experimental results show that the proposed three-way cluster ensemble approach can effectively deal with large-scale data, and the proposed consensus clustering algorithm has a lower time cost and does not sacrifice the clustering quality.

...read moreread less

52 citations

Collapse