Journal ArticleDOI

Big Challenges? Big Data …

17 Dec 2015 - International Journal of Computer Applications (Foundation of Computer Science (FCS), NY, USA) - Vol. 131, Iss. 11, pp. 14-18
TL;DR: This document provides insights into the challenges of managing such huge data, popularly known as Big Data, the solutions offered by Big Data management tools and techniques, and the opportunities it has created.
Abstract: In today's world, every tiny gadget is a potential data source, adding to the huge data bank. Every day, we create 2.5 quintillion bytes of data, structured and unstructured; so much that 90% of the data in the world today has been created in the last two years alone. The data generated through large volumes of customer transactions and social networking sites is varied, voluminous and rapidly growing. All of this data poses a storage and processing crisis for enterprises. While more data enables more realistic analysis and thus helps in making accurate business decisions, it is equally difficult to manage and analyze such a huge amount of data. This document provides insights into the challenges of managing such huge data, popularly known as Big Data, the solutions offered by Big Data management tools and techniques, and the opportunities it has created.

General Terms: Big Data, Big Data Opportunities, Big Data Challenges


Citations
Journal ArticleDOI
TL;DR: This review provides an update of important developments in the analysis of big data and forward strategies to accelerate the global transition to personalized medicine.
Abstract: Recent advances in high-throughput technologies have led to the emergence of systems biology as a holistic science to achieve more precise modeling of complex diseases. Many predict the emergence of personalized medicine in the near future. We are, however, moving from two-tiered health systems to a two-tiered personalized medicine. Omics facilities are restricted to affluent regions, and personalized medicine is likely to widen the growing gap in health systems between high- and low-income countries. This is mirrored by an increasing lag between our ability to generate big data and our ability to analyze it. Several bottlenecks slow down the transition from conventional to personalized medicine: generation of cost-effective high-throughput data; hybrid education and multidisciplinary teams; data storage and processing; data integration and interpretation; and individual and global economic relevance. This review provides an update of important developments in the analysis of big data and forward strategies to accelerate the global transition to personalized medicine.

415 citations

Journal ArticleDOI
TL;DR: This article provides an overview of the topic of Big Data and how the problem can be addressed from the perspective of Cloud Computing and its programming frameworks, focusing on systems for large-scale analytics based on the MapReduce scheme and Hadoop, its open-source implementation.
Abstract: The term 'Big Data' has spread rapidly in the framework of Data Mining and Business Intelligence. This new scenario can be defined by means of those problems that cannot be effectively or efficiently addressed using the standard computing resources that we currently have. We must emphasize that Big Data does not just imply large volumes of data but also the necessity for scalability, i.e., to ensure a response in an acceptable elapsed time. When the scalability term is considered, usually traditional parallel-type solutions are contemplated, such as the Message Passing Interface or high-performance and distributed Database Management Systems. Nowadays there is a new paradigm that has gained popularity over the latter due to the number of benefits it offers. This model is Cloud Computing, and among its main features we have to stress its elasticity in the use of computing resources and space, less management effort, and flexible costs. In this article, we provide an overview of the topic of Big Data and how the current problem can be addressed from the perspective of Cloud Computing and its programming frameworks. In particular, we focus on those systems for large-scale analytics based on the MapReduce scheme and Hadoop, its open-source implementation. We identify several libraries and software projects that have been developed to aid practitioners in addressing this new programming model. We also analyze the advantages and disadvantages of MapReduce, in contrast to the classical solutions in this field. Finally, we present a number of programming frameworks that have been proposed as an alternative to MapReduce, developed under the premise of solving the shortcomings of this model in certain scenarios and platforms. WIREs Data Mining Knowl Discov 2014, 4:380-409. doi: 10.1002/widm.1134
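To make the MapReduce scheme referred to above concrete, here is a minimal, self-contained sketch of the model in Python using the canonical word-count example. It only simulates the map, shuffle and reduce phases in memory; it is not the Hadoop API, and on a real cluster the input splitting, shuffle/sort and fault tolerance are handled by the framework, so every name below is illustrative.

```python
# Minimal in-memory sketch of the MapReduce scheme (word count).
# Illustrative only: real MapReduce/Hadoop distributes these phases across a
# cluster and handles the shuffle/sort and fault tolerance for you.
from collections import defaultdict
from itertools import chain

def map_phase(document):
    # map: emit (key, value) pairs, here (word, 1) for every word
    for word in document.split():
        yield word.lower(), 1

def shuffle(pairs):
    # group all values by key, as the framework does between map and reduce
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups.items()

def reduce_phase(key, values):
    # reduce: aggregate all values observed for one key
    return key, sum(values)

if __name__ == "__main__":
    documents = ["big data needs scalable processing",
                 "mapreduce splits work into map and reduce tasks"]
    mapped = chain.from_iterable(map_phase(d) for d in documents)
    counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped))
    print(counts)
```

The same map/shuffle/reduce decomposition is what the libraries and frameworks surveyed in the article parallelize across many machines.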

221 citations

Journal ArticleDOI
TL;DR: A novel distributed partitioning methodology for prototype reduction techniques in nearest neighbor classification that enables prototype reduction algorithms to be applied to big data classification problems without significant accuracy loss, making it a suitable tool for enhancing the performance of the nearest neighbor classifier with big data.

212 citations
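As a rough sketch of the partitioning idea summarized in the TL;DR above: split the training set, reduce each partition to a few prototypes independently, then pool the prototypes and classify with 1-NN. The per-partition reduction used here (per-class centroids) is a deliberately naive stand-in and the function names are invented for illustration; this is not the authors' MRPR algorithm.

```python
# Illustrative sketch only: partitioned prototype reduction followed by 1-NN.
# Per-class centroids stand in for a real prototype reduction method; MRPR
# itself runs the per-partition reduction as MapReduce jobs on Hadoop.
import numpy as np

def reduce_partition(X, y):
    # "map" step: compress one partition into one prototype per class
    prototypes, labels = [], []
    for cls in np.unique(y):
        prototypes.append(X[y == cls].mean(axis=0))
        labels.append(cls)
    return np.array(prototypes), np.array(labels)

def partitioned_reduction(X, y, n_partitions=4, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    P, L = [], []
    for part in np.array_split(idx, n_partitions):
        p, l = reduce_partition(X[part], y[part])   # independent per partition
        P.append(p)
        L.append(l)
    return np.vstack(P), np.concatenate(L)          # "reduce" step: pool prototypes

def predict_1nn(prototypes, labels, queries):
    d = np.linalg.norm(queries[:, None, :] - prototypes[None, :, :], axis=2)
    return labels[d.argmin(axis=1)]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (500, 2)), rng.normal(3, 1, (500, 2))])
    y = np.array([0] * 500 + [1] * 500)
    P, L = partitioned_reduction(X, y)
    print(predict_1nn(P, L, np.array([[0.0, 0.0], [3.0, 3.0]])))   # expect [0 1]
```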


Cites methods from "Big Challenges? Big Data …"

  • ...MRPR: A MapReduce solution for prototype reduction in big data classification. Isaac Triguero, Daniel Peralta, Jaume Bacardit, Salvador García, Francisco Herrera. Department of Computer Science and Artificial Intelligence, CITIC-UGR (Research Center on Information and Communications Technology), University of Granada, 18071 Granada, Spain; School of Computing Science, Newcastle University, NE1 7RU, Newcastle, UK; Department of Computer Science....

    [...]


Journal ArticleDOI
TL;DR: The proposed Spatial-Temporal Weighted K-Nearest Neighbor model, named STW-KNN, is implemented on a widely adopted Hadoop distributed computing platform with the MapReduce parallel processing paradigm, to enhance the accuracy and efficiency of short-term traffic flow forecasting.

160 citations
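As a rough illustration of the model family the TL;DR above builds on, the sketch below implements plain distance-weighted k-nearest-neighbor forecasting over historical time windows: the current traffic state is matched against past windows, and the next value is a distance-weighted average of what followed the k closest matches. The spatial-temporal weighting and the Hadoop/MapReduce parallelization of the actual STW-KNN model are not reproduced here, and the function and parameters are illustrative assumptions.

```python
# Illustrative sketch: distance-weighted KNN forecasting on a univariate series.
# STW-KNN additionally weights neighbors by spatial and temporal correlation
# and runs on Hadoop/MapReduce; none of that is reproduced here.
import numpy as np

def knn_forecast(series, window=4, k=5):
    series = np.asarray(series, dtype=float)
    # historical (window -> next value) pairs
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    state = series[-window:]                      # current traffic state
    d = np.linalg.norm(X - state, axis=1)
    nearest = np.argsort(d)[:k]
    w = 1.0 / (d[nearest] + 1e-9)                 # closer neighbors weigh more
    return float(np.dot(w, y[nearest]) / w.sum())

if __name__ == "__main__":
    # synthetic daily-cycle "traffic flow" just to exercise the function
    t = np.arange(300)
    flow = 100 + 30 * np.sin(2 * np.pi * t / 24)
    print(round(knn_forecast(flow), 2))
```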


Cites background from "Big Challenges? Big Data …"

  • ...It has witnessed the big data era [1] for transportation coming...

    [...]

Journal ArticleDOI
TL;DR: Galactica is a large language model that can store, combine and reason about scientific knowledge; it outperforms existing models on a range of scientific tasks, including technical knowledge probes such as LaTeX equations.
Abstract: Information overload is a major obstacle to scientific progress. The explosive growth in scientific literature and data has made it ever harder to discover useful insights in a large mass of information. Today scientific knowledge is accessed through search engines, but they are unable to organize scientific knowledge alone. In this paper we introduce Galactica: a large language model that can store, combine and reason about scientific knowledge. We train on a large scientific corpus of papers, reference material, knowledge bases and many other sources. We outperform existing models on a range of scientific tasks. On technical knowledge probes such as LaTeX equations, Galactica outperforms the latest GPT-3 by 68.2% versus 49.0%. Galactica also performs well on reasoning, outperforming Chinchilla on mathematical MMLU by 41.3% to 35.7%, and PaLM 540B on MATH with a score of 20.4% versus 8.8%. It also sets a new state-of-the-art on downstream tasks such as PubMedQA and MedMCQA dev of 77.6% and 52.9%. And despite not being trained on a general corpus, Galactica outperforms BLOOM and OPT-175B on BIG-bench. We believe these results demonstrate the potential for language models as a new interface for science. We open source the model for the benefit of the scientific community.

142 citations

References
Journal ArticleDOI
TL;DR: A preference hierarchy is described that can be used to specify the structure of a naming system's inference mechanism, together with criteria by which different naming systems can be evaluated.
Abstract: This paper reasons about naming systems as specialized inference mechanisms. It describes a preference hierarchy that can be used to specify the structure of a naming system's inference mechanism and defines criteria by which different naming systems can be evaluated. For example, the preference hierarchy allows one to compare naming systems based on how discriminating they are and to identify the class of names for which a given naming system is sound and complete. A study of several example naming systems demonstrates how the preference hierarchy can be used as a formal tool for designing naming systems.

412 citations

Journal ArticleDOI
18 Jul 2017
TL;DR: It is likely that somewhere between 70–80% of the information in the electronic health record is pathology data, and these data are relatively readily available and represent the collective experience of the pathology profession.
Abstract: It is likely that somewhere between 70–80% of the information in the electronic health record is pathology data (1). These data are relatively readily available and represent the collective experience of the pathology profession. The amount of this data is enormous. There are more than 500 million pathology tests performed each year in Australia alone (2).

11 citations