Journal ArticleDOI

Beyond the hype

01 Apr 2015-International Journal of Information Management (Pergamon)-Vol. 35, Iss: 2, pp 137-144
TL;DR: The need to develop appropriate and efficient analytical methods to leverage massive volumes of heterogeneous data in unstructured text, audio, and video formats is highlighted and the need to devise new tools for predictive analytics for structured big data is reinforced.
About: This article is published in International Journal of Information Management. The article was published on 2015-04-01 and is currently open access. It has received 2962 citations to date. The article focuses on the topics: Analytics & Big data.
Citations
Journal ArticleDOI
TL;DR: In this article, the authors present a state-of-the-art review that presents a holistic view of the BD challenges and BDA methods theorized/proposed/employed by organizations to help others understand this landscape with the objective of making robust investment decisions.

1,267 citations


Cites background or methods from "Beyond the hype"

  • ...On the other hand, the challenges are significant such as data integration complexities (Gandomi & Haider, 2015), lack of skilled personal and sufficient resources (Kim, Trimi, & Chung, 2014), data security and privacy issues (Barnaghi, Sheth, & Henson, 2013), inadequate infrastructure and…...

    [...]

  • ...…say 3Vs [volume, velocity and variety] of data (e.g. Shah, Rabhi, & Ray, 2015), others reported 4Vs [volume, velocity, variety, and variability] of data (e.g. Liao, Yin, Huang, & Sheng, 2014) and 6Vs [volume, velocity, variety, veracity, variability, and value] of data (Gandomi & Haider, 2015)....

    [...]

  • ...Gandomi and Haider (2015) asserts the need to develop new solutions for predictive analytics for structured BD. Predictive analytics are principally based on statistical methods and seeks to uncover patterns and capture relationships in data....

    [...]

  • ...Big Data analytical methods – related to Q2 To facilitate evidence-based decision-making, organizations need efficient methods to process large volumes of assorted data into meaningful comprehensions (Gandomi & Haider, 2015)....

    [...]

  • ...Thus, the necessity to deal with inaccurate and ambiguous data is another facet of BD, which is addressed using tools and analytics developed for management and mining of unreliable data (Gandomi & Haider, 2015)....

    [...]

Journal ArticleDOI
TL;DR: A new CNN based on LeNet-5 is proposed for fault diagnosis which can extract the features of the converted 2-D images and eliminate the effect of handcrafted features and has achieved significant improvements.
Abstract: Fault diagnosis is vital in manufacturing system, since early detections on the emerging problem can save invaluable time and cost. With the development of smart manufacturing, the data-driven fault diagnosis becomes a hot topic. However, the traditional data-driven fault diagnosis methods rely on the features extracted by experts. The feature extraction process is an exhausted work and greatly impacts the final result. Deep learning (DL) provides an effective way to extract the features of raw data automatically. Convolutional neural network (CNN) is an effective DL method. In this study, a new CNN based on LeNet-5 is proposed for fault diagnosis. Through a conversion method converting signals into two-dimensional (2-D) images, the proposed method can extract the features of the converted 2-D images and eliminate the effect of handcrafted features. The proposed method which is tested on three famous datasets, including motor bearing dataset, self-priming centrifugal pump dataset, and axial piston hydraulic pump dataset, has achieved prediction accuracy of 99.79%, 99.481%, and 100%, respectively. The results have been compared with other DL and traditional methods, including adaptive deep CNN, sparse filter, deep belief network, and support vector machine. The comparisons show that the proposed CNN-based data-driven fault diagnosis method has achieved significant improvements.

1,240 citations


Cites methods from "Beyond the hype"

  • ...This provides new opportunities for the data-driven fault diagnosis methods to make full use of the massive mechanical data [5], and has received more and more attentions from the researchers and engineers....

    [...]

Journal ArticleDOI
TL;DR: This paper presents a comprehensive discussion of state-of-the-art big data technologies for batch and stream data processing, framed by the structuralism and functionalism paradigms, and analyzes the strengths and weaknesses of these technologies.

964 citations

Journal ArticleDOI
TL;DR: The role of big data in supporting smart manufacturing is discussed, a historical perspective on the data lifecycle in manufacturing is given, and a conceptual framework is proposed.

937 citations

Journal ArticleDOI
Qinglin Qi1, Fei Tao1
TL;DR: The similarities and differences between big data and digital twin are compared from the general and data perspectives, and how they can be integrated to promote smart manufacturing is discussed.
Abstract: With the advances in new-generation information technologies, especially big data and digital twin, smart manufacturing is becoming the focus of global manufacturing transformation and upgrading. Intelligence comes from data. Integrated analysis for the manufacturing big data is beneficial to all aspects of manufacturing. Besides, the digital twin paves a way for the cyber-physical integration of manufacturing, which is an important bottleneck to achieve smart manufacturing. In this paper, the big data and digital twin in manufacturing are reviewed, including their concept as well as their applications in product design, production planning, manufacturing, and predictive maintenance. On this basis, the similarities and differences between big data and digital twin are compared from the general and data perspectives. Since the big data and digital twin can be complementary, how they can be integrated to promote smart manufacturing are discussed.

856 citations

References
Journal ArticleDOI
01 Nov 2011
TL;DR: Methods for video structure analysis, including shot boundary detection, key frame extraction and scene segmentation, extraction of features including static key frame features, object features and motion features, video data mining, video annotation, and video retrieval including query interfaces are analyzed.
Abstract: Video indexing and retrieval have a wide spectrum of promising applications, motivating the interest of researchers worldwide. This paper offers a tutorial and an overview of the landscape of general strategies in visual content-based video indexing and retrieval, focusing on methods for video structure analysis, including shot boundary detection, key frame extraction and scene segmentation, extraction of features including static key frame features, object features and motion features, video data mining, video annotation, video retrieval including query interfaces, similarity measure and relevance feedback, and video browsing. Finally, we analyze future research directions.

606 citations

Book ChapterDOI
17 Mar 2011
TL;DR: This article surveys some representative link prediction methods by categorizing them by the type of models, largely considering three types of models: first, the traditional (non-Bayesian) models which extract a set of features to train a binary classification model, and second, the probabilistic approaches which model the joint-probability among the entities in a network by Bayesian graphical models.
Abstract: Link prediction is an important task for analyzing social networks which also has applications in other domains like information retrieval, bioinformatics and e-commerce. There exist a variety of techniques for link prediction, ranging from feature-based classification and kernel-based methods to matrix factorization and probabilistic graphical models. These methods differ from each other with respect to model complexity, prediction performance, scalability, and generalization ability. In this article, we survey some representative link prediction methods by categorizing them by the type of the models. We largely consider three types of models: first, the traditional (non-Bayesian) models which extract a set of features to train a binary classification model. Second, the probabilistic approaches which model the joint-probability among the entities in a network by Bayesian graphical models. And, finally, the linear algebraic approach which computes the similarity between the nodes in a network by rank-reduced similarity matrices. We discuss various existing link prediction models that fall in these broad categories and analyze their strength and weakness. We conclude the survey with a discussion on recent developments and future research direction.

566 citations


Additional excerpts

  • ...In biology, link prediction techniques are used to discover links or associations in biological networks (e.g., protein–protein interaction networks), eliminating the need for expensive experiments (Hasan & Zaki, 2011)....

    [...]

Journal ArticleDOI
TL;DR: A research model is proposed to explain the acquisition intention of big data analytics mainly from the theoretical perspectives of data quality management and data usage experience; empirical investigation reveals that a firm's intention for big data analytics can be positively affected by its competence in maintaining the quality of corporate data.

550 citations

Proceedings ArticleDOI
Doug Beaver1, Sanjeev Kumar1, Harry C. Li1, Jason Sobel1, Peter Vajgel1 
04 Oct 2010
TL;DR: This paper describes Haystack, an object storage system optimized for Facebook's Photos application, which provides a less expensive and higher performing solution than the previous approach, which leveraged network attached storage appliances over NFS.
Abstract: This paper describes Haystack, an object storage system optimized for Facebook's Photos application. Facebook currently stores over 260 billion images, which translates to over 20 petabytes of data. Users upload one billion new photos (∼60 terabytes) each week and Facebook serves over one million images per second at peak. Haystack provides a less expensive and higher performing solution than our previous approach, which leveraged network attached storage appliances over NFS. Our key observation is that this traditional design incurs an excessive number of disk operations because of metadata lookups. We carefully reduce this per photo metadata so that Haystack storage machines can perform all metadata lookups in main memory. This choice conserves disk operations for reading actual data and thus increases overall throughput.

473 citations

Book
05 May 2010
TL;DR: This book discusses graph-based community detection techniques and many important extensions that handle dynamic, heterogeneous networks in social media, and demonstrates how discovered patterns of communities can be used for social media mining.
Abstract: This book, from a data mining perspective, introduces characteristics of social media, reviews representative tasks of computing with social media, and illustrates associated challenges. It introduces basic concepts, presents state-of-the-art algorithms with easy-to-understand examples, and recommends effective evaluation methods. In particular, we discuss graph-based community detection techniques and many important extensions that handle dynamic, heterogeneous networks in social media. We also demonstrate how discovered patterns of communities can be used for social media mining. The concepts, algorithms, and methods presented in this lecture can help harness the power of social media and support building socially-intelligent systems. This book is an accessible introduction to the study of community detection and mining in social media. It is an essential reading for students, researchers, and practitioners in disciplines and applications where social media is a key source of data that piques our curiosity to understand, manage, innovate, and excel. This book is supported by additional materials, including lecture slides, the complete set of figures, key references, some toy data sets used in the book, and the source code of representative algorithms. The readers are encouraged to visit the book website http://dmml.asu.edu/cdm/ for the latest information.

373 citations