Author

Aarati Mahajan

Bio: Aarati Mahajan is an academic researcher. The author has contributed to research in the topic of Big Data. The author has an h-index of 1 and has co-authored 1 publication receiving 89 citations.
Topics: Big data

Papers
Journal ArticleDOI
TL;DR: This document provides insights into the challenges of managing such huge volumes of data, popularly known as Big Data, the solutions offered by Big Data management tools and techniques, and the opportunities it has created.
Abstract: In today's world, every tiny gadget is a potential data source, adding to the huge data bank. Every day, we create 2.5 quintillion bytes of data, structured and unstructured, so much that 90% of the data in the world today has been created in the last two years alone. This data, generated through large volumes of customer transactions and social networking sites, is varied, voluminous, and rapidly growing. All this data poses a storage and processing challenge for enterprises. While more data enables more realistic analysis and thus helps in making accurate business decisions, it is equally difficult to manage and analyze such a huge amount of data. This document provides insights into the challenges of managing such huge data, popularly known as Big Data, the solutions offered by Big Data management tools and techniques, and the opportunities it has created. General Terms: Big Data, Big Data Opportunities, Big Data Challenges

118 citations


Cited by
Journal ArticleDOI
TL;DR: This review provides an update of important developments in the analysis of big data and forward strategies to accelerate the global transition to personalized medicine.
Abstract: Recent advances in high-throughput technologies have led to the emergence of systems biology as a holistic science to achieve more precise modeling of complex diseases. Many predict the emergence of personalized medicine in the near future. We are, however, moving from two-tiered health systems to a two-tiered personalized medicine. Omics facilities are restricted to affluent regions, and personalized medicine is likely to widen the growing gap in health systems between high- and low-income countries. This is mirrored by an increasing lag between our ability to generate and our ability to analyze big data. Several bottlenecks slow down the transition from conventional to personalized medicine: generation of cost-effective high-throughput data; hybrid education and multidisciplinary teams; data storage and processing; data integration and interpretation; and individual and global economic relevance. This review provides an update on important developments in the analysis of big data and forward strategies to accelerate the global transition to personalized medicine.

415 citations

Journal ArticleDOI
TL;DR: This article provides an overview on the topic of Big Data, and how the current problem can be addressed from the perspective of Cloud Computing and its programming frameworks, and focuses on systems for large‐scale analytics based on the MapReduce scheme and Hadoop, its open‐source implementation.
Abstract: The term 'Big Data' has spread rapidly in the framework of Data Mining and Business Intelligence. This new scenario can be defined by means of those problems that cannot be effectively or efficiently addressed using the standard computing resources that we currently have. We must emphasize that Big Data does not just imply large volumes of data but also the necessity for scalability, i.e., to ensure a response in an acceptable elapsed time. When the scalability term is considered, usually traditional parallel-type solutions are contemplated, such as the Message Passing Interface or high performance and distributed Database Management Systems. Nowadays there is a new paradigm that has gained popularity over the latter due to the number of benefits it offers. This model is Cloud Computing, and among its main features we must stress its elasticity in the use of computing resources and space, less management effort, and flexible costs. In this article, we provide an overview on the topic of Big Data, and how the current problem can be addressed from the perspective of Cloud Computing and its programming frameworks. In particular, we focus on those systems for large-scale analytics based on the MapReduce scheme and Hadoop, its open-source implementation. We identify several libraries and software projects that have been developed for aiding practitioners in addressing this new programming model. We also analyze the advantages and disadvantages of MapReduce, in contrast to the classical solutions in this field. Finally, we present a number of programming frameworks that have been proposed as an alternative to MapReduce, developed under the premise of solving the shortcomings of this model in certain scenarios and platforms. WIREs Data Mining Knowl Discov 2014, 4:380-409. doi: 10.1002/widm.1134
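The MapReduce scheme discussed in this abstract can be illustrated with a minimal, single-machine sketch in plain Python; in Hadoop the map and reduce functions run distributed across a cluster and the framework performs the shuffle, whereas here the shuffle is simulated with a dictionary. The word-count task and all function names are illustrative, not taken from the article.

```python
# Minimal single-machine illustration of the MapReduce scheme.
# Hadoop distributes map/reduce tasks across a cluster; here the
# shuffle/sort step is simulated with an in-memory dictionary.
from collections import defaultdict

def map_phase(document):
    # Emit (word, 1) pairs for every word in one input record.
    for word in document.split():
        yield word.lower(), 1

def reduce_phase(word, counts):
    # Aggregate all values emitted for the same key.
    return word, sum(counts)

def mapreduce(documents):
    shuffled = defaultdict(list)
    for doc in documents:                         # map + shuffle
        for key, value in map_phase(doc):
            shuffled[key].append(value)
    return dict(reduce_phase(k, v) for k, v in shuffled.items())  # reduce

if __name__ == "__main__":
    docs = ["big data needs scalability", "big data needs new tools"]
    print(mapreduce(docs))  # e.g. {'big': 2, 'data': 2, 'needs': 2, ...}
```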

221 citations

Journal ArticleDOI
TL;DR: A novel distributed partitioning methodology for prototype reduction techniques in nearest neighbor classification is proposed; it enables prototype reduction algorithms to be applied to big data classification problems without significant accuracy loss and is a suitable tool for enhancing the performance of the nearest neighbor classifier with big data.
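The TL;DR above describes a distributed partitioning approach to prototype reduction; the following is a simplified, sequential sketch of that general idea, not the authors' exact algorithm: the training set is split into partitions, each partition is reduced to a few prototypes (here, per-class centroids as a deliberately simple stand-in for a real reduction technique), and the merged prototypes are used for 1-NN classification. The data and all names are hypothetical.

```python
# Simplified sketch of distributed prototype reduction for nearest
# neighbor classification (sequential here; a distributed version would
# run the per-partition reduction in parallel, e.g. as MapReduce tasks).
import numpy as np

def reduce_partition(X, y):
    # Replace each class inside one partition by its centroid.
    protos, labels = [], []
    for c in np.unique(y):
        protos.append(X[y == c].mean(axis=0))
        labels.append(c)
    return np.array(protos), np.array(labels)

def fit_prototypes(X, y, n_partitions=4, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    P, L = [], []
    for part in np.array_split(idx, n_partitions):  # reduce each partition independently
        p, l = reduce_partition(X[part], y[part])
        P.append(p)
        L.append(l)
    return np.vstack(P), np.concatenate(L)          # merge the reduced sets

def predict_1nn(prototypes, proto_labels, X_query):
    # Classify queries against the (much smaller) merged prototype set.
    d = np.linalg.norm(X_query[:, None, :] - prototypes[None, :, :], axis=2)
    return proto_labels[d.argmin(axis=1)]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 1, (500, 2)), rng.normal(4, 1, (500, 2))])
    y = np.array([0] * 500 + [1] * 500)
    protos, labels = fit_prototypes(X, y)
    print(predict_1nn(protos, labels, np.array([[0.1, 0.0], [4.2, 3.9]])))  # -> [0 1]
```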

212 citations

Journal ArticleDOI
TL;DR: The proposed Spatial-Temporal Weighted K-Nearest Neighbor model, named STW-KNN, is implemented on the widely adopted Hadoop distributed computing platform with the MapReduce parallel processing paradigm to enhance the accuracy and efficiency of short-term traffic flow forecasting.
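As background for the TL;DR above, a generic distance-weighted KNN forecaster for a univariate traffic flow series is sketched below. It is only a simplified stand-in for STW-KNN, which additionally weights neighbors by spatial and temporal closeness and runs distributed on Hadoop with MapReduce; the lag length, weighting scheme, and synthetic data are assumptions for illustration.

```python
# Distance-weighted KNN forecast for a univariate traffic flow series:
# compare the most recent "state" (the last few observations) against
# historical states and average the values that followed the k nearest
# ones, weighted by inverse distance.
import numpy as np

def knn_forecast(series, lag=3, k=5, eps=1e-9):
    series = np.asarray(series, dtype=float)
    # Historical states and the value that followed each of them.
    states = np.array([series[i:i + lag] for i in range(len(series) - lag)])
    targets = series[lag:]
    current = series[-lag:]                         # most recent state
    dists = np.linalg.norm(states - current, axis=1)
    nearest = np.argsort(dists)[:k]                 # k closest historical states
    weights = 1.0 / (dists[nearest] + eps)          # inverse-distance weights
    return float(np.sum(weights * targets[nearest]) / np.sum(weights))

if __name__ == "__main__":
    # Synthetic flow with a daily cycle plus noise (hypothetical data).
    t = np.arange(300)
    flow = 100 + 30 * np.sin(2 * np.pi * t / 24)
    flow += np.random.default_rng(0).normal(0, 3, size=300)
    print(round(knn_forecast(flow, lag=6, k=10), 1))
```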

160 citations

Journal ArticleDOI
TL;DR: Galactica, as presented in this paper, is a large language model that can store, combine, and reason about scientific knowledge; it outperforms existing models on a range of scientific tasks, including technical knowledge probes such as LaTeX equations.
Abstract: Information overload is a major obstacle to scientific progress. The explosive growth in scientific literature and data has made it ever harder to discover useful insights in a large mass of information. Today scientific knowledge is accessed through search engines, but they are unable to organize scientific knowledge alone. In this paper we introduce Galactica: a large language model that can store, combine and reason about scientific knowledge. We train on a large scientific corpus of papers, reference material, knowledge bases and many other sources. We outperform existing models on a range of scientific tasks. On technical knowledge probes such as LaTeX equations, Galactica outperforms the latest GPT-3 by 68.2% versus 49.0%. Galactica also performs well on reasoning, outperforming Chinchilla on mathematical MMLU by 41.3% to 35.7%, and PaLM 540B on MATH with a score of 20.4% versus 8.8%. It also sets a new state-of-the-art on downstream tasks such as PubMedQA and MedMCQA dev of 77.6% and 52.9%. And despite not being trained on a general corpus, Galactica outperforms BLOOM and OPT-175B on BIG-bench. We believe these results demonstrate the potential for language models as a new interface for science. We open source the model for the benefit of the scientific community.

142 citations