Author

Aarati Mahajan

Bio: Aarati Mahajan is an academic researcher. The author has contributed to research in the topic of Big Data. The author has an h-index of 1 and has co-authored 1 publication receiving 89 citations.
Topics: Big data

Papers
Journal ArticleDOI
TL;DR: This document provides insights into the challenges of managing such huge volumes of data, popularly known as Big Data, the solutions offered by Big Data management tools and techniques, and the opportunities it has created.
Abstract: In today's world, every tiny gadget is a potential data source, adding to the huge data bank. Every day, we create 2.5 quintillion bytes of data, structured and unstructured, so much that 90% of the data in the world today has been created in the last two years alone. This data, generated through large volumes of customer transactions and social networking sites, is varied, voluminous, and rapidly growing. All this data poses a storage and processing challenge for enterprises. While more data enables more realistic analysis and thus helps in making accurate business decisions, it is equally difficult to manage and analyze such a huge amount of data. This document provides insights into the challenges of managing such huge data, popularly known as Big Data, the solutions offered by Big Data management tools and techniques, and the opportunities it has created. General Terms: Big Data, Big Data Opportunities, Big Data Challenges

118 citations


Cited by
Journal ArticleDOI
TL;DR: This review provides an update of important developments in the analysis of big data and forward strategies to accelerate the global transition to personalized medicine.
Abstract: Recent advances in high-throughput technologies have led to the emergence of systems biology as a holistic science to achieve more precise modeling of complex diseases. Many predict the emergence of personalized medicine in the near future. We are, however, moving from two-tiered health systems to a two-tiered personalized medicine. Omics facilities are restricted to affluent regions, and personalized medicine is likely to widen the growing gap in health systems between high- and low-income countries. This is mirrored by an increasing lag between our ability to generate and our ability to analyze big data. Several bottlenecks slow down the transition from conventional to personalized medicine: generation of cost-effective high-throughput data; hybrid education and multidisciplinary teams; data storage and processing; data integration and interpretation; and individual and global economic relevance. This review provides an update on important developments in the analysis of big data and forward strategies to accelerate the global transition to personalized medicine.

415 citations

Journal ArticleDOI
TL;DR: This article provides an overview on the topic of Big Data, and how the current problem can be addressed from the perspective of Cloud Computing and its programming frameworks, and focuses on systems for large‐scale analytics based on the MapReduce scheme and Hadoop, its open‐source implementation.
Abstract: The term 'Big Data' has spread rapidly in the framework of Data Mining and Business Intelligence. This new scenario can be defined by means of those problems that cannot be effectively or efficiently addressed using the standard computing resources that we currently have. We must emphasize that Big Data does not just imply large volumes of data but also the necessity for scalability, i.e., to ensure a response in an acceptable elapsed time. When the scalability term is considered, usually traditional parallel-type solutions are contemplated, such as the Message Passing Interface or high performance and distributed Database Management Systems. Nowadays there is a new paradigm that has gained popularity over the latter due to the number of benefits it offers. This model is Cloud Computing, and among its main features we must stress its elasticity in the use of computing resources and space, less management effort, and flexible costs. In this article, we provide an overview on the topic of Big Data, and how the current problem can be addressed from the perspective of Cloud Computing and its programming frameworks. In particular, we focus on those systems for large-scale analytics based on the MapReduce scheme and Hadoop, its open-source implementation. We identify several libraries and software projects that have been developed for aiding practitioners in addressing this new programming model. We also analyze the advantages and disadvantages of MapReduce, in contrast to the classical solutions in this field. Finally, we present a number of programming frameworks that have been proposed as an alternative to MapReduce, developed under the premise of solving the shortcomings of this model in certain scenarios and platforms. WIREs Data Mining Knowl Discov 2014, 4:380-409. doi: 10.1002/widm.1134
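The MapReduce scheme discussed in this abstract can be illustrated with a minimal, single-machine sketch in plain Python; in Hadoop the map and reduce functions run distributed across a cluster and the framework performs the shuffle, whereas here the shuffle is simulated with a dictionary. The word-count task and all function names are illustrative, not taken from the article.

```python
# Minimal single-machine illustration of the MapReduce scheme.
# Hadoop distributes map/reduce tasks across a cluster; here the
# shuffle/sort step is simulated with an in-memory dictionary.
from collections import defaultdict

def map_phase(document):
    # Emit (word, 1) pairs for every word in one input record.
    for word in document.split():
        yield word.lower(), 1

def reduce_phase(word, counts):
    # Aggregate all values emitted for the same key.
    return word, sum(counts)

def mapreduce(documents):
    shuffled = defaultdict(list)
    for doc in documents:                         # map + shuffle
        for key, value in map_phase(doc):
            shuffled[key].append(value)
    return dict(reduce_phase(k, v) for k, v in shuffled.items())  # reduce

if __name__ == "__main__":
    docs = ["big data needs scalability", "big data needs new tools"]
    print(mapreduce(docs))  # e.g. {'big': 2, 'data': 2, 'needs': 2, ...}
```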

221 citations

Journal ArticleDOI
TL;DR: A novel distributed partitioning methodology for prototype reduction techniques in nearest neighbor classification is proposed; it enables prototype reduction algorithms to be applied to big data classification problems without significant accuracy loss and is a suitable tool for enhancing the performance of the nearest neighbor classifier with big data.
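The TL;DR above describes a distributed partitioning approach to prototype reduction; the following is a simplified, sequential sketch of that general idea, not the authors' exact algorithm: the training set is split into partitions, each partition is reduced to a few prototypes (here, per-class centroids as a deliberately simple stand-in for a real reduction technique), and the merged prototypes are used for 1-NN classification. The data and all names are hypothetical.

```python
# Simplified sketch of distributed prototype reduction for nearest
# neighbor classification (sequential here; a distributed version would
# run the per-partition reduction in parallel, e.g. as MapReduce tasks).
import numpy as np

def reduce_partition(X, y):
    # Replace each class inside one partition by its centroid.
    protos, labels = [], []
    for c in np.unique(y):
        protos.append(X[y == c].mean(axis=0))
        labels.append(c)
    return np.array(protos), np.array(labels)

def fit_prototypes(X, y, n_partitions=4, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    P, L = [], []
    for part in np.array_split(idx, n_partitions):  # reduce each partition independently
        p, l = reduce_partition(X[part], y[part])
        P.append(p)
        L.append(l)
    return np.vstack(P), np.concatenate(L)          # merge the reduced sets

def predict_1nn(prototypes, proto_labels, X_query):
    # Classify queries against the (much smaller) merged prototype set.
    d = np.linalg.norm(X_query[:, None, :] - prototypes[None, :, :], axis=2)
    return proto_labels[d.argmin(axis=1)]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 1, (500, 2)), rng.normal(4, 1, (500, 2))])
    y = np.array([0] * 500 + [1] * 500)
    protos, labels = fit_prototypes(X, y)
    print(predict_1nn(protos, labels, np.array([[0.1, 0.0], [4.2, 3.9]])))  # -> [0 1]
```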

212 citations

Journal ArticleDOI
TL;DR: The proposed Spatial-Temporal Weighted K-Nearest Neighbor model, named STW-KNN, is implemented on the widely adopted Hadoop distributed computing platform with the MapReduce parallel processing paradigm to enhance the accuracy and efficiency of short-term traffic flow forecasting.
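As background for the TL;DR above, a generic distance-weighted KNN forecaster for a univariate traffic flow series is sketched below. It is only a simplified stand-in for STW-KNN, which additionally weights neighbors by spatial and temporal closeness and runs distributed on Hadoop with MapReduce; the lag length, weighting scheme, and synthetic data are assumptions for illustration.

```python
# Distance-weighted KNN forecast for a univariate traffic flow series:
# compare the most recent "state" (the last few observations) against
# historical states and average the values that followed the k nearest
# ones, weighted by inverse distance.
import numpy as np

def knn_forecast(series, lag=3, k=5, eps=1e-9):
    series = np.asarray(series, dtype=float)
    # Historical states and the value that followed each of them.
    states = np.array([series[i:i + lag] for i in range(len(series) - lag)])
    targets = series[lag:]
    current = series[-lag:]                         # most recent state
    dists = np.linalg.norm(states - current, axis=1)
    nearest = np.argsort(dists)[:k]                 # k closest historical states
    weights = 1.0 / (dists[nearest] + eps)          # inverse-distance weights
    return float(np.sum(weights * targets[nearest]) / np.sum(weights))

if __name__ == "__main__":
    # Synthetic flow with a daily cycle plus noise (hypothetical data).
    t = np.arange(300)
    flow = 100 + 30 * np.sin(2 * np.pi * t / 24)
    flow += np.random.default_rng(0).normal(0, 3, size=300)
    print(round(knn_forecast(flow, lag=6, k=10), 1))
```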

160 citations

Journal ArticleDOI
TL;DR: Galactica, as presented in this paper, is a large language model that can store, combine, and reason about scientific knowledge; it outperforms existing models on a range of scientific tasks, including technical knowledge probes such as LaTeX equations.
Abstract: Information overload is a major obstacle to scientific progress. The explosive growth in scientific literature and data has made it ever harder to discover useful insights in a large mass of information. Today scientific knowledge is accessed through search engines, but they are unable to organize scientific knowledge alone. In this paper we introduce Galactica: a large language model that can store, combine and reason about scientific knowledge. We train on a large scientific corpus of papers, reference material, knowledge bases and many other sources. We outperform existing models on a range of scientific tasks. On technical knowledge probes such as LaTeX equations, Galactica outperforms the latest GPT-3 by 68.2% versus 49.0%. Galactica also performs well on reasoning, outperforming Chinchilla on mathematical MMLU by 41.3% to 35.7%, and PaLM 540B on MATH with a score of 20.4% versus 8.8%. It also sets a new state-of-the-art on downstream tasks such as PubMedQA and MedMCQA dev of 77.6% and 52.9%. And despite not being trained on a general corpus, Galactica outperforms BLOOM and OPT-175B on BIG-bench. We believe these results demonstrate the potential for language models as a new interface for science. We open source the model for the benefit of the scientific community.

142 citations