scispace - formally typeset
Search or ask a question
Author

Katarina Grolinger

Other affiliations: University of Zagreb
Bio: Katarina Grolinger is an academic researcher from University of Western Ontario. The author has contributed to research in topics: Deep learning & Energy consumption. The author has an hindex of 18, co-authored 51 publications receiving 1783 citations. Previous affiliations of Katarina Grolinger include University of Zagreb.


Papers
More filters
Journal ArticleDOI
TL;DR: This paper compiles, summarizes, and organizes machine learning challenges with Big Data, highlighting the cause–effect relationship by organizing challenges according to Big Data Vs or dimensions that instigated the issue: volume, velocity, variety, or veracity.
Abstract: The Big Data revolution promises to transform how we live, work, and think by enabling process optimization, empowering insight discovery and improving decision making. The realization of this grand potential relies on the ability to extract value from such massive data through data analytics; machine learning is at its core because of its ability to learn from data and provide data driven insights, decisions, and predictions. However, traditional machine learning approaches were developed in a different era, and thus are based upon multiple assumptions, such as the data set fitting entirely into memory, what unfortunately no longer holds true in this new context. These broken assumptions, together with the Big Data characteristics, are creating obstacles for the traditional techniques. Consequently, this paper compiles, summarizes, and organizes machine learning challenges with Big Data. In contrast to other research that discusses challenges, this work highlights the cause–effect relationship by organizing challenges according to Big Data Vs or dimensions that instigated the issue: volume, velocity, variety, or veracity. Moreover, emerging machine learning approaches and techniques are discussed in terms of how they are capable of handling the various challenges with the ultimate objective of helping practitioners select appropriate solutions for their use cases. Finally, a matrix relating the challenges and approaches is presented. Through this process, this paper provides a perspective on the domain, identifies research gaps and opportunities, and provides a strong foundation and encouragement for further research in the field of machine learning with Big Data.

592 citations

Journal ArticleDOI
01 Dec 2013
TL;DR: This study has identified challenges in the field, including the immense diversity and inconsistency of terminologies, limited documentation, sparse comparison and benchmarking criteria, and nonexistence of standardized query languages.
Abstract: Advances in Web technology and the proliferation of mobile devices and sensors connected to the Internet have resulted in immense processing and storage requirements. Cloud computing has emerged as a paradigm that promises to meet these requirements. This work focuses on the storage aspect of cloud computing, specifically on data management in cloud environments. Traditional relational databases were designed in a different hardware and software era and are facing challenges in meeting the performance and scale requirements of Big Data. NoSQL and NewSQL data stores present themselves as alternatives that can handle huge volume of data. Because of the large number and diversity of existing NoSQL and NewSQL solutions, it is difficult to comprehend the domain and even more challenging to choose an appropriate solution for a specific task. Therefore, this paper reviews NoSQL and NewSQL solutions with the objective of: (1) providing a perspective in the field, (2) providing guidance to practitioners and researchers to choose the appropriate data store, and (3) identifying challenges and opportunities in the field. Specifically, the most prominent solutions are compared focusing on data models, querying, scaling, and security related capabilities. Features driving the ability to scale read requests and write requests, or scaling data storage are investigated, in particular partitioning, replication, consistency, and concurrency control. Furthermore, use cases and scenarios in which NoSQL and NewSQL data stores have been used are discussed and the suitability of various solutions for different sets of applications is examined. Consequently, this study has identified challenges in the field, including the immense diversity and inconsistency of terminologies, limited documentation, sparse comparison and benchmarking criteria, and nonexistence of standardized query languages.

304 citations

Proceedings ArticleDOI
01 Dec 2015
TL;DR: This paper proposes an architecture to create a flexible and scalable machine learning as a service, using real-world sensor and weather data by running different algorithms at the same time.
Abstract: The demand for knowledge extraction has been increasing. With the growing amount of data being generated by global data sources (e.g., social media and mobile apps) and the popularization of context-specific data (e.g., the Internet of Things), companies and researchers need to connect all these data and extract valuable information. Machine learning has been gaining much attention in data mining, leveraging the birth of new solutions. This paper proposes an architecture to create a flexible and scalable machine learning as a service. An open source solution was implemented and presented. As a case study, a forecast of electricity demand was generated using real-world sensor and weather data by running different algorithms at the same time.

281 citations

Journal ArticleDOI
TL;DR: A new pattern-based anomaly classifier is proposed, the collective contextual anomaly detection using sliding window (CCAD-SW) framework, which improved the anomaly detection capacity of the CCAD- SW by 3.6% and reduced false alarm rate by 2.7%.

162 citations

Proceedings ArticleDOI
27 Jun 2014
TL;DR: The identified issues and challenges MapReduce faces when handling Big Data are grouped into four main categories corresponding to Big Data tasks types: data storage, Big Data analytics, online processing, and security and privacy.
Abstract: In the Big Data community, MapReduce has been seen as one of the key enabling approaches for meeting continuously increasing demands on computing resources imposed by massive data sets. The reason for this is the high scalability of the MapReduce paradigm which allows for massively parallel and distributed execution over a large number of computing nodes. This paper identifies MapReduce issues and challenges in handling Big Data with the objective of providing an overview of the field, facilitating better planning and management of Big Data projects, and identifying opportunities for future research in this field. The identified challenges are grouped into four main categories corresponding to Big Data tasks types: data storage (relational databases and NoSQL stores), Big Data analytics (machine learning and interactive analytics), online processing, and security and privacy. Moreover, current efforts aimed at improving and extending MapReduce to address identified challenges are presented. Consequently, by identifying issues and challenges MapReduce faces when handling Big Data, this study encourages future Big Data research.

149 citations


Cited by
More filters
Christopher M. Bishop1
01 Jan 2006
TL;DR: Probability distributions of linear models for regression and classification are given in this article, along with a discussion of combining models and combining models in the context of machine learning and classification.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

10,141 citations

Book ChapterDOI
01 Jan 1982
TL;DR: In this article, the authors discuss leading problems linked to energy that the world is now confronting and propose some ideas concerning possible solutions, and conclude that it is necessary to pursue actively the development of coal, natural gas, and nuclear power.
Abstract: This chapter discusses leading problems linked to energy that the world is now confronting and to propose some ideas concerning possible solutions. Oil deserves special attention among all energy sources. Since the beginning of 1981, it has merely been continuing and enhancing the downward movement in consumption and prices caused by excessive rises, especially for light crudes such as those from Africa, and the slowing down of worldwide economic growth. Densely-populated oil-producing countries need to produce to live, to pay for their food and their equipment. If the economic growth of the industrialized countries were to be 4%, even if investment in the rational use of energy were pushed to the limit and the development of nonpetroleum energy sources were also pursued actively, it would be extremely difficult to prevent a sharp rise in prices. It is evident that it is absolutely necessary to pursue actively the development of coal, natural gas, and nuclear power if a physical shortage of energy is not to block economic growth.

2,283 citations

Journal ArticleDOI
TL;DR: An application-oriented review of smart meter data analytics identifies the key application areas as load analysis, load forecasting, and load management and reviews the techniques and methodologies adopted or developed to address each application.
Abstract: The widespread popularity of smart meters enables an immense amount of fine-grained electricity consumption data to be collected. Meanwhile, the deregulation of the power industry, particularly on the delivery side, has continuously been moving forward worldwide. How to employ massive smart meter data to promote and enhance the efficiency and sustainability of the power grid is a pressing issue. To date, substantial works have been conducted on smart meter data analytics. To provide a comprehensive overview of the current research and to identify challenges for future research, this paper conducts an application-oriented review of smart meter data analytics. Following the three stages of analytics, namely, descriptive, predictive, and prescriptive analytics, we identify the key application areas as load analysis, load forecasting, and load management. We also review the techniques and methodologies adopted or developed to address each application. In addition, we also discuss some research trends, such as big data issues, novel machine learning technologies, new business models, the transition of energy systems, and data privacy and security.

621 citations