Journal ArticleDOI

Big Challenges? Big Data …

17 Dec 2015 - International Journal of Computer Applications (Foundation of Computer Science (FCS), NY, USA) - Vol. 131, Iss. 11, pp. 14-18
TL;DR: This document provides insights into the challenges of managing such huge data, popularly known as Big Data, the solutions offered by Big Data management tools and techniques, and the opportunities it has created.
Abstract: In today's world, every tiny gadget is a potential data source, adding to the huge data bank. Every day, we create 2.5 quintillion bytes of data, structured and unstructured; so much that 90% of the data in the world today has been created in the last two years alone. The data generated through large volumes of customer transactions and social networking sites is varied, voluminous and rapidly growing. All of this data poses a storage and processing crisis for enterprises. While more data enables more realistic analysis and thus helps in making accurate business decisions, it is equally difficult to manage and analyze such a huge amount of data. This document provides insights into the challenges of managing such huge data, popularly known as Big Data, the solutions offered by Big Data management tools and techniques, and the opportunities it has created.

General Terms: Big Data, Big Data Opportunities, Big Data Challenges


Citations
Journal ArticleDOI
TL;DR: This review provides an update of important developments in the analysis of big data and forward strategies to accelerate the global transition to personalized medicine.
Abstract: Recent advances in high-throughput technologies have led to the emergence of systems biology as a holistic science to achieve more precise modeling of complex diseases. Many predict the emergence of personalized medicine in the near future. We are, however, moving from two-tiered health systems to a two-tiered personalized medicine. Omics facilities are restricted to affluent regions, and personalized medicine is likely to widen the growing gap in health systems between high- and low-income countries. This is mirrored by an increasing lag between our ability to generate big data and our ability to analyze it. Several bottlenecks slow down the transition from conventional to personalized medicine: generation of cost-effective high-throughput data; hybrid education and multidisciplinary teams; data storage and processing; data integration and interpretation; and individual and global economic relevance. This review provides an update of important developments in the analysis of big data and forward strategies to accelerate the global transition to personalized medicine.

415 citations

Journal ArticleDOI
TL;DR: This article provides an overview of the topic of Big Data and how the problem can be addressed from the perspective of Cloud Computing and its programming frameworks, focusing on systems for large-scale analytics based on the MapReduce scheme and Hadoop, its open-source implementation.
Abstract: The term 'Big Data' has spread rapidly in the framework of Data Mining and Business Intelligence. This new scenario can be defined by means of those problems that cannot be effectively or efficiently addressed using the standard computing resources that we currently have. We must emphasize that Big Data does not just imply large volumes of data but also the necessity for scalability, i.e., to ensure a response in an acceptable elapsed time. When the scalability term is considered, usually traditional parallel-type solutions are contemplated, such as the Message Passing Interface or high-performance and distributed Database Management Systems. Nowadays there is a new paradigm that has gained popularity over the latter due to the number of benefits it offers. This model is Cloud Computing, and among its main features we have to stress its elasticity in the use of computing resources and space, less management effort, and flexible costs. In this article, we provide an overview of the topic of Big Data and how the current problem can be addressed from the perspective of Cloud Computing and its programming frameworks. In particular, we focus on those systems for large-scale analytics based on the MapReduce scheme and Hadoop, its open-source implementation. We identify several libraries and software projects that have been developed to aid practitioners in addressing this new programming model. We also analyze the advantages and disadvantages of MapReduce, in contrast to the classical solutions in this field. Finally, we present a number of programming frameworks that have been proposed as an alternative to MapReduce, developed under the premise of solving the shortcomings of this model in certain scenarios and platforms. WIREs Data Mining Knowl Discov 2014, 4:380-409. doi: 10.1002/widm.1134
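To make the MapReduce scheme referred to above concrete, here is a minimal, self-contained sketch of the model in Python using the canonical word-count example. It only simulates the map, shuffle and reduce phases in memory; it is not the Hadoop API, and on a real cluster the input splitting, shuffle/sort and fault tolerance are handled by the framework, so every name below is illustrative.

```python
# Minimal in-memory sketch of the MapReduce scheme (word count).
# Illustrative only: real MapReduce/Hadoop distributes these phases across a
# cluster and handles the shuffle/sort and fault tolerance for you.
from collections import defaultdict
from itertools import chain

def map_phase(document):
    # map: emit (key, value) pairs, here (word, 1) for every word
    for word in document.split():
        yield word.lower(), 1

def shuffle(pairs):
    # group all values by key, as the framework does between map and reduce
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups.items()

def reduce_phase(key, values):
    # reduce: aggregate all values observed for one key
    return key, sum(values)

if __name__ == "__main__":
    documents = ["big data needs scalable processing",
                 "mapreduce splits work into map and reduce tasks"]
    mapped = chain.from_iterable(map_phase(d) for d in documents)
    counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped))
    print(counts)
```

The same map/shuffle/reduce decomposition is what the libraries and frameworks surveyed in the article parallelize across many machines.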

221 citations

Journal ArticleDOI
TL;DR: A novel distributed partitioning methodology for prototype reduction techniques in nearest neighbor classification that enables prototype reduction algorithms to be applied to big data classification problems without significant accuracy loss, making it a suitable tool for enhancing the performance of the nearest neighbor classifier with big data.

212 citations
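As a rough sketch of the partitioning idea summarized in the TL;DR above: split the training set, reduce each partition to a few prototypes independently, then pool the prototypes and classify with 1-NN. The per-partition reduction used here (per-class centroids) is a deliberately naive stand-in and the function names are invented for illustration; this is not the authors' MRPR algorithm.

```python
# Illustrative sketch only: partitioned prototype reduction followed by 1-NN.
# Per-class centroids stand in for a real prototype reduction method; MRPR
# itself runs the per-partition reduction as MapReduce jobs on Hadoop.
import numpy as np

def reduce_partition(X, y):
    # "map" step: compress one partition into one prototype per class
    prototypes, labels = [], []
    for cls in np.unique(y):
        prototypes.append(X[y == cls].mean(axis=0))
        labels.append(cls)
    return np.array(prototypes), np.array(labels)

def partitioned_reduction(X, y, n_partitions=4, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    P, L = [], []
    for part in np.array_split(idx, n_partitions):
        p, l = reduce_partition(X[part], y[part])   # independent per partition
        P.append(p)
        L.append(l)
    return np.vstack(P), np.concatenate(L)          # "reduce" step: pool prototypes

def predict_1nn(prototypes, labels, queries):
    d = np.linalg.norm(queries[:, None, :] - prototypes[None, :, :], axis=2)
    return labels[d.argmin(axis=1)]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (500, 2)), rng.normal(3, 1, (500, 2))])
    y = np.array([0] * 500 + [1] * 500)
    P, L = partitioned_reduction(X, y)
    print(predict_1nn(P, L, np.array([[0.0, 0.0], [3.0, 3.0]])))   # expect [0 1]
```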


Cites methods from "Big Challenges? Big Data …"

  • ...MRPR: A MapReduce solution for prototype reduction in big data classification. Isaac Triguero, Daniel Peralta, Jaume Bacardit, Salvador García, Francisco Herrera. Department of Computer Science and Artificial Intelligence, CITIC-UGR (Research Center on Information and Communications Technology), University of Granada, 18071 Granada, Spain; School of Computing Science, Newcastle University, NE1 7RU, Newcastle, UK; Department of Computer Science....

    [...]


Journal ArticleDOI
TL;DR: The proposed Spatial-Temporal Weighted K-Nearest Neighbor model, named STW-KNN, is implemented on a widely adopted Hadoop distributed computing platform with the MapReduce parallel processing paradigm, to enhance the accuracy and efficiency of short-term traffic flow forecasting.

160 citations
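As a rough illustration of the model family the TL;DR above builds on, the sketch below implements plain distance-weighted k-nearest-neighbor forecasting over historical time windows: the current traffic state is matched against past windows, and the next value is a distance-weighted average of what followed the k closest matches. The spatial-temporal weighting and the Hadoop/MapReduce parallelization of the actual STW-KNN model are not reproduced here, and the function and parameters are illustrative assumptions.

```python
# Illustrative sketch: distance-weighted KNN forecasting on a univariate series.
# STW-KNN additionally weights neighbors by spatial and temporal correlation
# and runs on Hadoop/MapReduce; none of that is reproduced here.
import numpy as np

def knn_forecast(series, window=4, k=5):
    series = np.asarray(series, dtype=float)
    # historical (window -> next value) pairs
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    state = series[-window:]                      # current traffic state
    d = np.linalg.norm(X - state, axis=1)
    nearest = np.argsort(d)[:k]
    w = 1.0 / (d[nearest] + 1e-9)                 # closer neighbors weigh more
    return float(np.dot(w, y[nearest]) / w.sum())

if __name__ == "__main__":
    # synthetic daily-cycle "traffic flow" just to exercise the function
    t = np.arange(300)
    flow = 100 + 30 * np.sin(2 * np.pi * t / 24)
    print(round(knn_forecast(flow), 2))
```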


Cites background from "Big Challenges? Big Data …"

  • ...It has witnessed the big data era [1] for transportation coming...

    [...]

Journal ArticleDOI
TL;DR: Galactica is a large language model that can store, combine and reason about scientific knowledge; it outperforms existing models on a range of scientific tasks, including technical knowledge probes such as LaTeX equations.
Abstract: Information overload is a major obstacle to scientific progress. The explosive growth in scientific literature and data has made it ever harder to discover useful insights in a large mass of information. Today scientific knowledge is accessed through search engines, but they are unable to organize scientific knowledge alone. In this paper we introduce Galactica: a large language model that can store, combine and reason about scientific knowledge. We train on a large scientific corpus of papers, reference material, knowledge bases and many other sources. We outperform existing models on a range of scientific tasks. On technical knowledge probes such as LaTeX equations, Galactica outperforms the latest GPT-3 by 68.2% versus 49.0%. Galactica also performs well on reasoning, outperforming Chinchilla on mathematical MMLU by 41.3% to 35.7%, and PaLM 540B on MATH with a score of 20.4% versus 8.8%. It also sets a new state-of-the-art on downstream tasks such as PubMedQA and MedMCQA dev of 77.6% and 52.9%. And despite not being trained on a general corpus, Galactica outperforms BLOOM and OPT-175B on BIG-bench. We believe these results demonstrate the potential for language models as a new interface for science. We open source the model for the benefit of the scientific community.

142 citations

References
Journal ArticleDOI
TL;DR: A preference hierarchy is described that can be used to specify the structure of a naming system's inference mechanism, together with criteria by which different naming systems can be evaluated.
Abstract: This paper reasons about naming systems as specialized inference mechanisms. It describes a preference hierarchy that can be used to specify the structure of a naming system's inference mechanism and defines criteria by which different naming systems can be evaluated. For example, the preference hierarchy allows one to compare naming systems based on how discriminating they are and to identify the class of names for which a given naming system is sound and complete. A study of several example naming systems demonstrates how the preference hierarchy can be used as a formal tool for designing naming systems.

412 citations

Journal ArticleDOI
18 Jul 2017
TL;DR: It is likely that somewhere between 70–80% of the information in the electronic health record is pathology data, and these data are relatively readily available and represent the collective experience of the pathology profession.
Abstract: It is likely that somewhere between 70–80% of the information in the electronic health record is pathology data (1). These data are relatively readily available and represent the collective experience of the pathology profession. The amount of this data is enormous. There are more than 500 million pathology tests performed each year in Australia alone (2).

11 citations