Author

C. Ranichandra

Bio: C. Ranichandra is an academic researcher from VIT University. The author has contributed to research on the following topics: MNIST database and NumPy. The author has co-authored 1 publication.

Papers

Book Chapter

Logistic Regression on Hadoop Using PySpark

Krishna Kumar Mahto¹, C. Ranichandra¹

VIT University¹

16 Dec 2019

TL;DR: The purpose of this work was to see how effective Hadoop can be in terms of increasing the efficiency of working with Machine Learning for a given problem by implementing and training three Logistic Regression models.

Abstract: Training a Machine Learning (ML) model on large datasets is difficult, especially when a high-end hardware configuration is not available. Even a relatively good configuration may not always produce quick results: depending on the dataset size, training can take anywhere from seconds to several hours. More often than not, the tasks we are interested in involve big datasets and complex models. The purpose of our work was to see how effective Hadoop can be in increasing the efficiency of working with Machine Learning for a given problem. Out of the many models to choose from, Logistic Regression was chosen because it is relatively simple to implement. Three Logistic Regression models were implemented and trained on the MNIST Handwritten Digits dataset. The first was implemented in Python using NumPy without any ML libraries. The second used the LogisticRegression class that comes with the Scikit-learn Python package, and the third was implemented using PySpark MLlib. Towards the end of the paper, we present the observations and results obtained from the execution of each.
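
To make the first of the three approaches concrete, here is a minimal sketch of multiclass logistic regression written with NumPy only. It is not the authors' code, which this page does not show. The function names, the hyperparameters (lr, epochs), the choice of full-batch gradient descent, and the small synthetic dataset standing in for MNIST are all assumptions made for illustration.

```python
import numpy as np


def softmax(z):
    # Subtract the row-wise max for numerical stability before exponentiating.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def train_softmax_regression(X, y, num_classes, lr=0.1, epochs=200):
    """Fit multinomial logistic regression with full-batch gradient descent.

    Hypothetical helper for illustration; lr and epochs are assumed values.
    """
    n, d = X.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    Y = np.eye(num_classes)[y]      # one-hot labels, shape (n, num_classes)
    for _ in range(epochs):
        P = softmax(X @ W + b)      # predicted class probabilities
        grad = (P - Y) / n          # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (X.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b


def predict(X, W, b):
    # The predicted class is the one with the largest logit.
    return np.argmax(X @ W + b, axis=1)


# Synthetic stand-in for MNIST (assumption): three well-separated 2-D blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 4.0], [4.0, -4.0], [-4.0, -4.0]])
y_demo = rng.integers(0, 3, size=300)
X_demo = centers[y_demo] + rng.normal(scale=0.5, size=(300, 2))

W_demo, b_demo = train_softmax_regression(X_demo, y_demo, num_classes=3)
accuracy = np.mean(predict(X_demo, W_demo, b_demo) == y_demo)
```

For MNIST itself, X would presumably hold flattened 28×28 images (784 features per row) and num_classes would be 10. At that scale, mini-batches and feature scaling would likely be needed to keep training time reasonable.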
