scispace - formally typeset

Journal ArticleDOI

Beyond the hype

01 Apr 2015-International Journal of Information Management (Pergamon)-Vol. 35, Iss: 2, pp 137-144

TL;DR: The need to develop appropriate and efficient analytical methods to leverage massive volumes of heterogeneous data in unstructured text, audio, and video formats is highlighted and the need to devise new tools for predictive analytics for structured big data is reinforced.
Abstract: We define what is meant by big data.We review analytics techniques for text, audio, video, and social media data.We make the case for new statistical techniques for big data.We highlight the expected future developments in big data analytics. Size is the first, and at times, the only dimension that leaps out at the mention of big data. This paper attempts to offer a broader definition of big data that captures its other unique and defining characteristics. The rapid evolution and adoption of big data by industry has leapfrogged the discourse to popular outlets, forcing the academic press to catch up. Academic journals in numerous disciplines, which will benefit from a relevant discussion of big data, have yet to cover the topic. This paper presents a consolidated description of big data by integrating definitions from practitioners and academics. The paper's primary focus is on the analytic methods used for big data. A particular distinguishing feature of this paper is its focus on analytics related to unstructured data, which constitute 95% of big data. This paper highlights the need to develop appropriate and efficient analytical methods to leverage massive volumes of heterogeneous data in unstructured text, audio, and video formats. This paper also reinforces the need to devise new tools for predictive analytics for structured big data. The statistical methods in practice were devised to infer from sample data. The heterogeneity, noise, and the massive size of structured big data calls for developing computationally efficient algorithms that may avoid big data pitfalls, such as spurious correlation.
Topics: Analytics (65%), Big data (64%), Unstructured data (61%), Predictive analytics (54%)
Citations
More filters

Journal ArticleDOI
Abstract: Big Data (BD), with their potential to ascertain valued insights for enhanced decision-making process, have recently attracted substantial interest from both academics and practitioners. Big Data Analytics (BDA) is increasingly becoming a trending practice that many organizations are adopting with the purpose of constructing valuable information from BD. The analytics process, including the deployment and use of BDA tools, is seen by organizations as a tool to improve operational efficiency though it has strategic potential, drive new revenue streams and gain competitive advantages over business rivals. However, there are different types of analytic applications to consider. Therefore, prior to hasty use and buying costly BD tools, there is a need for organizations to first understand the BDA landscape. Given the significant nature of the BD and BDA, this paper presents a state-of-the-art review that presents a holistic view of the BD challenges and BDA methods theorized/proposed/employed by organizations to help others understand this landscape with the objective of making robust investment decisions. In doing so, systematically analysing and synthesizing the extant research published on BD and BDA area. More specifically, the authors seek to answer the following two principal questions: Q1 – What are the different types of BD challenges theorized/proposed/confronted by organizations? and Q2 – What are the different types of BDA methods theorized/proposed/employed to overcome BD challenges? . This systematic literature review (SLR) is carried out through observing and understanding the past trends and extant patterns/themes in the BDA research area, evaluating contributions, summarizing knowledge, thereby identifying limitations, implications and potential further research avenues to support the academic community in exploring research themes/patterns. Thus, to trace the implementation of BD strategies, a profiling method is employed to analyze articles (published in English-speaking peer-reviewed journals between 1996 and 2015) extracted from the Scopus database. The analysis presented in this paper has identified relevant BD research studies that have contributed both conceptually and empirically to the expansion and accrual of intellectual wealth to the BDA in technology and organizational resource management discipline.

883 citations


Cites background or methods from "Beyond the hype"

  • ...On the other hand, the challenges are significant such as data integration complexities (Gandomi & Haider, 2015), lack of skilled personal and sufficient resources (Kim, Trimi, & Chung, 2014), data security and privacy issues (Barnaghi, Sheth, & Henson, 2013), inadequate infrastructure and…...

    [...]

  • ...…say 3Vs [volume, velocity and variety] of data (e.g. Shah, Rabhi, & Ray, 2015), others reported 4Vs [volume, velocity, variety, and variability] of data (e.g. Liao, Yin, Huang, & Sheng, 2014) and 6Vs [volume, velocity, variety, veracity, variability, and value] of data (Gandomi & Haider, 2015)....

    [...]

  • ...Gandomi and Haider (2015) asserts the need to develop new solutions for predictive analytics for structured BD. Predictive analytics are principally based on statistical methods and seeks to uncover patterns and capture relationships in data....

    [...]

  • ...Big Data analytical methods – related to Q2 To facilitate evidence-based decision-making, organizations need efficient methods to process large volumes of assorted data into meaningful comprehensions (Gandomi & Haider, 2015)....

    [...]

  • ...Thus, the necessity to deal with inaccurate and ambiguous data is another facet of BD, which is addressed using tools and analytics developed for management and mining of unreliable data (Gandomi & Haider, 2015)....

    [...]


Journal ArticleDOI
TL;DR: This paper presents a comprehensive discussion on state-of-the-art big data technologies based on batch and stream data processing based on structuralism and functionalism paradigms and strengths and weaknesses of these technologies are analyzed.
Abstract: We use structuralism and functionalism paradigms to analyze the origins of big data applications.Current trends and sources of big data.Processing technologies, methods and analysis techniques for big data are compared in detail.We analyze major challenges with big data and also discussed several opportunities.Case studies and emerging technologies for big data problems are discussed. Big data is a potential research area receiving considerable attention from academia and IT communities. In the digital world, the amounts of data generated and stored have expanded within a short period of time. Consequently, this fast growing rate of data has created many challenges. In this paper, we use structuralism and functionalism paradigms to analyze the origins of big data applications and its current trends. This paper presents a comprehensive discussion on state-of-the-art big data technologies based on batch and stream data processing. Moreover, strengths and weaknesses of these technologies are analyzed. This study also discusses big data analytics techniques, processing methods, some reported case studies from different vendors, several open research challenges, and the opportunities brought about by big data. The similarities and differences of these techniques and technologies based on important parameters are also investigated. Emerging technologies are recommended as a solution for big data problems.

877 citations


Journal ArticleDOI
TL;DR: A new CNN based on LeNet-5 is proposed for fault diagnosis which can extract the features of the converted 2-D images and eliminate the effect of handcrafted features and has achieved significant improvements.
Abstract: Fault diagnosis is vital in manufacturing system, since early detections on the emerging problem can save invaluable time and cost. With the development of smart manufacturing, the data-driven fault diagnosis becomes a hot topic. However, the traditional data-driven fault diagnosis methods rely on the features extracted by experts. The feature extraction process is an exhausted work and greatly impacts the final result. Deep learning (DL) provides an effective way to extract the features of raw data automatically. Convolutional neural network (CNN) is an effective DL method. In this study, a new CNN based on LeNet-5 is proposed for fault diagnosis. Through a conversion method converting signals into two-dimensional (2-D) images, the proposed method can extract the features of the converted 2-D images and eliminate the effect of handcrafted features. The proposed method which is tested on three famous datasets, including motor bearing dataset, self-priming centrifugal pump dataset, and axial piston hydraulic pump dataset, has achieved prediction accuracy of 99.79%, 99.481%, and 100%, respectively. The results have been compared with other DL and traditional methods, including adaptive deep CNN, sparse filter, deep belief network, and support vector machine. The comparisons show that the proposed CNN-based data-driven fault diagnosis method has achieved significant improvements.

654 citations


Cites methods from "Beyond the hype"

  • ...This provides new opportunities for the data-driven fault diagnosis methods to make full use of the massive mechanical data [5], and has received more and more attentions from the researchers and engineers....

    [...]


Journal ArticleDOI
Fei Tao1, Qinglin Qi1, Ang Liu2, Andrew Kusiak3Institutions (3)
TL;DR: The role of big data in supporting smart manufacturing is discussed, a historical perspective to data lifecycle in manufacturing is overviewed, and a conceptual framework proposed in the paper is proposed.
Abstract: The advances in the internet technology, internet of things, cloud computing, big data, and artificial intelligence have profoundly impacted manufacturing. The volume of data collected in manufacturing is growing. Big data offers a tremendous opportunity in the transformation of today’s manufacturing paradigm to smart manufacturing. Big data empowers companies to adopt data-driven strategies to become more competitive. In this paper, the role of big data in supporting smart manufacturing is discussed. A historical perspective to data lifecycle in manufacturing is overviewed. The big data perspective is supported by a conceptual framework proposed in the paper. Typical application scenarios of the proposed framework are outlined.

544 citations


Journal ArticleDOI
29 Mar 2017-IEEE Access
TL;DR: The state-of-the-art research efforts directed toward big IoT data analytics are investigated, the relationship between big data analytics and IoT is explained, and several opportunities brought by data analytics in IoT paradigm are discussed.
Abstract: Voluminous amounts of data have been produced, since the past decade as the miniaturization of Internet of things (IoT) devices increases. However, such data are not useful without analytic power. Numerous big data, IoT, and analytics solutions have enabled people to obtain valuable insight into large data generated by IoT devices. However, these solutions are still in their infancy, and the domain lacks a comprehensive survey. This paper investigates the state-of-the-art research efforts directed toward big IoT data analytics. The relationship between big data analytics and IoT is explained. Moreover, this paper adds value by proposing a new architecture for big IoT data analytics. Furthermore, big IoT data analytic types, methods, and technologies for big data mining are discussed. Numerous notable use cases are also presented. Several opportunities brought by data analytics in IoT paradigm are then discussed. Finally, open research challenges, such as privacy, big data mining, visualization, and integration, are presented as future research directions.

467 citations


Cites background from "Beyond the hype"

  • ...Furthermore, customer buying predictions and social media trends are analyzed through predictive analytics [61] (see Table 2)....

    [...]


References
More filters

Book
Bo Pang1, Lillian Lee2Institutions (2)
08 Jul 2008-
TL;DR: This survey covers techniques and approaches that promise to directly enable opinion-oriented information-seeking systems and focuses on methods that seek to address the new challenges raised by sentiment-aware applications, as compared to those that are already present in more traditional fact-based analysis.
Abstract: An important part of our information-gathering behavior has always been to find out what other people think. With the growing availability and popularity of opinion-rich resources such as online review sites and personal blogs, new opportunities and challenges arise as people now can, and do, actively use information technologies to seek out and understand the opinions of others. The sudden eruption of activity in the area of opinion mining and sentiment analysis, which deals with the computational treatment of opinion, sentiment, and subjectivity in text, has thus occurred at least in part as a direct response to the surge of interest in new systems that deal directly with opinions as a first-class object. This survey covers techniques and approaches that promise to directly enable opinion-oriented information-seeking systems. Our focus is on methods that seek to address the new challenges raised by sentiment-aware applications, as compared to those that are already present in more traditional fact-based analysis. We include material on summarization of evaluative text and on broader issues regarding privacy, manipulation, and economic impact that the development of opinion-oriented information-access services gives rise to. To facilitate future work, a discussion of available resources, benchmark datasets, and evaluation campaigns is also provided.

7,180 citations


"Beyond the hype" refers background in this paper

  • ...For example, sentiment analysis (opinion mining) have been known 1 l of In s n u T s t b h n i g d t m i o A s R A B B C C C D F F F G G H H H H H 44 A. Gandomi, M. Haider / International Journa ince the early 2000s (Pang & Lee, 2008)....

    [...]


Book
13 May 2011-
TL;DR: The amount of data in the authors' world has been exploding, and analyzing large data sets will become a key basis of competition, underpinning new waves of productivity growth, innovation, and consumer surplus, according to research by MGI and McKinsey.
Abstract: The amount of data in our world has been exploding, and analyzing large data sets—so-called big data— will become a key basis of competition, underpinning new waves of productivity growth, innovation, and consumer surplus, according to research by MGI and McKinsey's Business Technology Office. Leaders in every sector will have to grapple with the implications of big data, not just a few data-oriented managers. The increasing volume and detail of information captured by enterprises, the rise of multimedia, social media, and the Internet of Things will fuel exponential growth in data for the foreseeable future.

4,551 citations


Journal ArticleDOI
TL;DR: This introduction to the MIS Quarterly Special Issue on Business Intelligence Research first provides a framework that identifies the evolution, applications, and emerging research areas of BI&A, and introduces and characterized the six articles that comprise this special issue in terms of the proposed BI &A research framework.
Abstract: Business intelligence and analytics (BI&A) has emerged as an important area of study for both practitioners and researchers, reflecting the magnitude and impact of data-related problems to be solved in contemporary business organizations. This introduction to the MIS Quarterly Special Issue on Business Intelligence Research first provides a framework that identifies the evolution, applications, and emerging research areas of BI&A. BI&A 1.0, BI&A 2.0, and BI&A 3.0 are defined and described in terms of their key characteristics and capabilities. Current research in BI&A is analyzed and challenges and opportunities associated with BI&A research and education are identified. We also report a bibliometric study of critical BI&A publications, researchers, and research topics based on more than a decade of related academic and industry publications. Finally, the six articles that comprise this special issue are introduced and characterized in terms of the proposed BI&A research framework.

4,008 citations


Book
Bing Liu1Institutions (1)
01 May 2012-
Abstract: Sentiment analysis and opinion mining is the field of study that analyzes people's opinions, sentiments, evaluations, attitudes, and emotions from written language. It is one of the most active research areas in natural language processing and is also widely studied in data mining, Web mining, and text mining. In fact, this research has spread outside of computer science to the management sciences and social sciences due to its importance to business and society as a whole. The growing importance of sentiment analysis coincides with the growth of social media such as reviews, forum discussions, blogs, micro-blogs, Twitter, and social networks. For the first time in human history, we now have a huge volume of opinionated data recorded in digital form for analysis. Sentiment analysis systems are being applied in almost every business and social domain because opinions are central to almost all human activities and are key influencers of our behaviors. Our beliefs and perceptions of reality, and the choices we make, are largely conditioned on how others see and evaluate the world. For this reason, when we need to make a decision we often seek out the opinions of others. This is true not only for individuals but also for organizations. This book is a comprehensive introductory and survey text. It covers all important topics and the latest developments in the field with over 400 references. It is suitable for students, researchers and practitioners who are interested in social media analysis in general and sentiment analysis in particular. Lecturers can readily use it in class for courses on natural language processing, social media analysis, text mining, and data mining. Lecture slides are also available online.

3,931 citations


Journal ArticleDOI
Jianqing Fan1, Jinchi Lv2Institutions (2)
Abstract: Summary. Variable selection plays an important role in high dimensional statistical modelling which nowadays appears in many areas and is key to various scientific discoveries. For problems of large scale or dimensionality p, accuracy of estimation and computational cost are two top concerns. Recently, Candes and Tao have proposed the Dantzig selector using L1-regularization and showed that it achieves the ideal risk up to a logarithmic factor log (p). Their innovative procedure and remarkable result are challenged when the dimensionality is ultrahigh as the factor log (p) can be large and their uniform uncertainty principle can fail. Motivated by these concerns, we introduce the concept of sure screening and propose a sure screening method that is based on correlation learning, called sure independence screening, to reduce dimensionality from high to a moderate scale that is below the sample size. In a fairly general asymptotic framework, correlation learning is shown to have the sure screening property for even exponentially growing dimensionality. As a methodological extension, iterative sure independence screening is also proposed to enhance its finite sample performance. With dimension reduced accurately from high to below sample size, variable selection can be improved on both speed and accuracy, and can then be accomplished by a well-developed method such as smoothly clipped absolute deviation, the Dantzig selector, lasso or adaptive lasso. The connections between these penalized least squares methods are also elucidated.

1,913 citations


Network Information
Related Papers (5)
Performance
Metrics
No. of citations received by the Paper in previous years
YearCitations
202218
2021443
2020460
2019540
2018420
2017307