Journal ArticleDOI

Big-Data Processing Techniques and Their Challenges in Transport Domain

TL;DR: The strengths and weaknesses of various big-data cloud processing techniques are highlighted in order to help the big-data community select the appropriate processing technique.
Abstract: This paper describes the fundamentals of cloud computing and current big-data key technologies. We categorize big-data processing as batch-based, stream-based, graph-based, DAG-based, interactive-based, or visual-based according to the processing technique. We highlight the strengths and weaknesses of the various big-data cloud processing techniques to help the big-data community select the appropriate technique. We also present big-data research challenges and future directions with respect to transportation management systems.
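To make the batch/stream distinction from the abstract concrete, here is a minimal, illustrative sketch in plain Python (not taken from the paper; the sensor records and function names are hypothetical): a batch job sees the whole dataset before it starts, while a stream job folds each arriving record into a running state.

```python
# Hypothetical sketch contrasting batch- and stream-based processing
# of vehicle-speed records in a transport setting (illustrative only).

records = [("sensor_a", 55), ("sensor_b", 62), ("sensor_a", 49), ("sensor_b", 70)]

# Batch-based: the full dataset is available before processing starts.
def batch_average(dataset):
    total = sum(speed for _, speed in dataset)
    return total / len(dataset)

# Stream-based: records arrive one at a time; only running state is kept.
def stream_averages(stream):
    count, total = 0, 0
    for _, speed in stream:
        count += 1
        total += speed
        yield total / count  # the result is refined incrementally

print("batch:", batch_average(records))
print("stream:", list(stream_averages(iter(records))))
```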
Citations
Journal ArticleDOI
TL;DR: This research paper investigates the current trends and identifies the existing challenges in the development of a big scholarly data platform, maps them to the different phases of the big data lifecycle, and focuses on directions for future research.
Abstract: Highlights: surveys big scholarly data with respect to the different phases of the big data lifecycle; identifies the big data tools and technologies that can be used for the development of scholarly applications; investigates research challenges and limitations specific to big scholarly data and its applications; provides research directions and paves the way towards the development of a generic and comprehensive big scholarly data platform. Recently, there has been a shift in the focus of organizations and governments towards digitization of academic and technical documents, adding a new facet to the concept of digital libraries. The volume, variety and velocity of this generated data satisfy the big data definition, as a result of which this scholarly reserve is popularly referred to as big scholarly data. To facilitate data analytics for big scholarly data, architectures and services for it need to be developed. The evolving nature of research problems has made them essentially interdisciplinary. As a result, there is a growing demand for scholarly applications such as collaborator discovery, expert finding and research recommendation systems, among several others. This research paper investigates the current trends and identifies the existing challenges in the development of a big scholarly data platform, maps them to the different phases of the big data lifecycle, and focuses on directions for future research.

104 citations

Journal ArticleDOI
TL;DR: An overview of Big Data technologies in the context of transportation, with a specific focus on railways, is given, together with insight into how the existing data modules from the transport authority can be combined with Big Data and incorporated into maintenance decision making.

102 citations

Journal ArticleDOI
TL;DR: The paper presents a Big Data smart library system that has the potential to create new values and data-driven decisions by incorporating multiple sources of differential data.
Abstract: Purpose: With the exponential growth of the amount of data, even the most sophisticated systems of traditional libraries are not able to fulfill the demands of modern business and user needs. The purpose of this paper is to present the possibility of creating a Big Data smart library as an integral and enhanced part of the educational system that will improve user service and increase motivation in the continuous learning process through content-aware recommendations. Design/methodology/approach: This paper presents an approach to the design of a Big Data system for collecting, analyzing, processing and visualizing data from different sources to a smart library specifically suitable for application in educational institutions. Findings: As an integrated recommender system of the educational institution, the practical application of the Big Data smart library meets user needs and assists in finding personalized content from several sources, resulting in economic benefits for the institution and long-term user satisfaction. Practical implications: The need for continuous education alters business processes in libraries, with requirements to adopt new technologies, business demands, and interactions with users. To be able to engage in a new era of business in the Big Data environment, librarians need to modernize their infrastructure for data collection, data analysis, and data visualization. Originality/value: A unique value of this paper is its perspective on the implementation of a Big Data solution for smart libraries as a part of a continuous learning process, with the aim of improving the results of library operations by integrating traditional systems with Big Data technology. The paper presents a Big Data smart library system that has the potential to create new values and data-driven decisions by incorporating multiple sources of differential data.
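The "content-aware recommendations" mentioned above can be illustrated with a minimal bag-of-words cosine-similarity sketch in plain Python. This is illustrative only; the catalog titles, descriptions, and scoring are hypothetical, not the system described in the paper.

```python
# Hypothetical sketch of content-aware recommendation via bag-of-words
# cosine similarity (illustrative; not the paper's actual system).
import math
from collections import Counter

catalog = {
    "Intro to Data Mining": "data mining clustering classification",
    "Stream Processing Basics": "stream processing real time data",
    "Library Management 101": "library catalog circulation users",
}

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def recommend(user_history: str, top_n: int = 2):
    # Build a profile from the user's reading history, then rank the
    # catalog by similarity between profile and item description.
    profile = Counter(user_history.split())
    scored = [(cosine(profile, Counter(desc.split())), title)
              for title, desc in catalog.items()]
    return [title for _, title in sorted(scored, reverse=True)[:top_n]]

print(recommend("data processing stream"))
```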

47 citations

Book ChapterDOI
28 Jun 2021
TL;DR: In this paper, the authors highlight the impact of big data on decision-making and discuss applications of big-data-influenced decision making, along with state-of-the-art big data techniques and technologies.
Abstract: Big Data (BD) has shifted the paradigm of conventional data analysis through the exploitation of emerging technologies. Analysis using BD helps foresee and extract value from large data, expose hidden information, and expedite the decision-making process. This study highlights the impact and effect of BD on decision-making. The investigation's rationale is to provide deep insight into the buzzword so that stakeholders can understand the challenges and opportunities that BD has brought to current business scenarios. It also discusses applications of BD-influenced decision-making, along with state-of-the-art BD techniques and technologies. The study is a review article based on the research articles, conference proceedings, books, and web articles available on Google Scholar and Google from the period 2010 to 2020. Given BD's extreme importance, the available techniques and technologies should facilitate effective data collection, storage, analysis, and visualization. Every opportunity comes with greater challenges; this paper summarizes the strengths and weaknesses of the different tools associated with three broad categories of BD technologies, enabling researchers to glance quickly at the available tools' pros and cons in one place. This emerging field is still very young and immature. Various techniques and technologies have been designed to deal with such humongous data, but they still offer only limited efficacy in dealing with BD problems completely. It is high time that technologists, researchers, and governments pay significant attention to this vast and evolving field by investing their time and money in developing efficient tools that maximize the value extracted from it. BD also means big opportunities, big challenges, and big systems; therefore, it also requires big attention from researchers to close the research gaps that exist in this big field.

42 citations

Journal ArticleDOI
01 Jun 2019
TL;DR: The principles of Industry 4.0 are presented, new directions for research in system modeling, big data analysis, health management, cyber-physical systems, human-machine interaction, uncertainty, joint optimization, communication, and interfaces are proposed, and some of the resulting challenges and opportunities for reliability engineering are discussed.
Abstract: With the development of Industry 4.0 and the increasing integration of the digital, physical and human worlds, reliability engineering must evolve to address the existing and future challenges this brings. In this paper, the principles of Industry 4.0 are presented and some of these challenges and opportunities for reliability engineering are discussed. New directions for research in system modeling, big data analysis, health management, cyber-physical systems, human-machine interaction, uncertainty, joint optimization, communication, and interfaces are proposed. Each topic can be investigated individually, but this paper summarizes them and prepares a vision of reliability engineering for consideration and discussion by the interested scientific community.

24 citations


Cites background or methods from "Big-Data Processing Techniques and ..."

  • ...Cloud-based big-data processing techniques have been investigated as an interesting topic over the past decades, and different computing models have been proposed based on different platforms and focuses, such as batch-based, stream-based, graph-based, directed-acyclic-graph-based, interactive-based, and visual-based processing [43]....


  • ...Neural networks are a useful tool in reliability engineering, especially for remaining-useful-lifetime prediction [43]; Farsi and Hosseini used an ANN to reduce noise effects and estimate a bearing's lifetime [44].... (a minimal illustrative sketch of this idea follows these excerpts)

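As a hedged illustration of the remaining-useful-lifetime idea mentioned in the excerpt above, the sketch below fits a small neural network to synthetic degradation data. It assumes numpy and scikit-learn are available; the data, network size, and numbers are all hypothetical, not taken from [43] or [44].

```python
# Hypothetical sketch: a small neural-network regressor mapping a noisy
# degradation signal to remaining useful lifetime (RUL). Illustrative only.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
t = np.linspace(0, 100, 200)                          # operating hours
health = 1.0 - t / 100 + rng.normal(0, 0.05, t.size)  # noisy health index
rul = 100 - t                                         # ground-truth RUL

model = MLPRegressor(hidden_layer_sizes=(16, 16), max_iter=2000, random_state=0)
model.fit(health.reshape(-1, 1), rul)

# Predict RUL for a freshly observed (noisy) health reading.
print(model.predict(np.array([[0.4]])))  # roughly 40 hours for health ~ 0.4
```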

References
Journal ArticleDOI
Jeffrey Dean, Sanjay Ghemawat
06 Dec 2004
TL;DR: This paper presents MapReduce, a programming model and an associated implementation for processing and generating large data sets; the implementation runs on large clusters of commodity machines and is highly scalable.
Abstract: MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model, as shown in the paper. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system. Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google's clusters every day.
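The map/reduce contract described above can be sketched in a few lines of plain Python. This is a minimal, single-machine illustration of the model, not Google's distributed implementation: the map function emits intermediate key/value pairs, a shuffle groups them by key, and the reduce function merges the values for each key.

```python
# Minimal single-machine sketch of the MapReduce model (illustrative only):
# word count, the canonical example.
from itertools import groupby

def map_fn(document):
    for word in document.split():
        yield word, 1                # emit intermediate key/value pairs

def reduce_fn(word, counts):
    return word, sum(counts)         # merge all values for one key

documents = ["the quick brown fox", "the lazy dog", "the fox"]

# Map phase: apply map_fn to every input record.
intermediate = [pair for doc in documents for pair in map_fn(doc)]

# Shuffle phase: group intermediate pairs by key.
intermediate.sort(key=lambda kv: kv[0])
grouped = groupby(intermediate, key=lambda kv: kv[0])

# Reduce phase: merge all values associated with the same key.
result = [reduce_fn(word, (v for _, v in pairs)) for word, pairs in grouped]
print(result)  # e.g. [('brown', 1), ('dog', 1), ('fox', 2), ...]
```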

20,309 citations

Journal ArticleDOI
Jeffrey Dean, Sanjay Ghemawat
TL;DR: This presentation explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.
Abstract: MapReduce is a programming model and an associated implementation for processing and generating large datasets that is amenable to a broad variety of real-world tasks. Users specify the computation in terms of a map and a reduce function, and the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks. Programmers find the system easy to use: more than ten thousand distinct MapReduce programs have been implemented internally at Google over the past four years, and an average of one hundred thousand MapReduce jobs are executed on Google's clusters every day, processing a total of more than twenty petabytes of data per day.

17,663 citations

ReportDOI
28 Sep 2011
TL;DR: This cloud model promotes availability and is composed of five essential characteristics, three service models, and four deployment models.
Abstract: Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. This cloud model promotes availability and is composed of five essential characteristics, three service models, and four deployment models.

15,145 citations

Journal ArticleDOI
TL;DR: This paper defines Cloud computing and provides an architecture for creating Clouds with market-oriented resource allocation by leveraging technologies such as Virtual Machines (VMs), and it provides insights into market-based resource management strategies that encompass both customer-driven service management and computational risk management to sustain Service Level Agreement (SLA)-oriented resource allocation.

5,850 citations


"Big-Data Processing Techniques and ..." refers background in this paper

  • ...SOA services are flexible, scalable, and loosely coupled [17]....


  • ...In an SOA, services are interoperable, which means that distributed systems can communicate and exchange data with one another [17].... (a minimal sketch follows these excerpts)

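To illustrate the loose coupling and interoperability mentioned in the excerpts above, here is a minimal hedged sketch; the service name and message fields are hypothetical. The point is that two components agree only on a language-neutral JSON contract, not on each other's internal types.

```python
# Hypothetical sketch of SOA-style loose coupling: services agree only on a
# JSON message contract, not on each other's internal representations.
import json

def traffic_service_response():
    """Producer side: serialize a result to the agreed JSON contract."""
    result = {"road_id": "A1", "avg_speed_kmh": 57.2, "congested": False}
    return json.dumps(result)

def consume(payload: str):
    """Consumer side: any client that speaks JSON can use the service."""
    msg = json.loads(payload)
    if msg["congested"]:
        print(f"Reroute around {msg['road_id']}")
    else:
        print(f"{msg['road_id']} flowing at {msg['avg_speed_kmh']} km/h")

consume(traffic_service_response())
```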

Journal ArticleDOI
TL;DR: The bulk-synchronous parallel (BSP) model is introduced as a candidate bridging model for parallel computation, with results quantifying its efficiency both in implementing high-level language features and algorithms and in being implemented in hardware.
Abstract: The success of the von Neumann model of sequential computation is attributable to the fact that it is an efficient bridge between software and hardware: high-level languages can be efficiently compiled onto this model, yet it can be efficiently implemented in hardware. The author argues that an analogous bridge between software and hardware is required if parallel computation is to become as widely used. This article introduces the bulk-synchronous parallel (BSP) model as a candidate for this role, and gives results quantifying its efficiency both in implementing high-level language features and algorithms and in being implemented in hardware.

3,885 citations


"Big-Data Processing Techniques and ..." refers methods in this paper

  • ...The Bulk Synchronous Parallel (BSP) computing paradigm was introduced by Leslie Valiant in [21]....


  • ...A BSP algorithm [21], [22] proceeds as a series of super-steps, in each of which a user-defined function is executed in parallel, performing its computations asynchronously.... (a minimal sketch follows these excerpts)

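As a hedged illustration of the super-step structure described in the excerpt above, the following sketch runs a toy BSP computation with Python threads (illustrative only; real BSP frameworks distribute workers across a cluster): in each super-step every worker computes locally and asynchronously, sends values to a neighbor, then waits at a barrier before the next super-step begins.

```python
# Hypothetical sketch of the BSP super-step pattern using threads
# (illustrative only; real BSP systems run workers across many machines).
import threading

NUM_WORKERS = 4
SUPERSTEPS = 3
values = [1, 2, 3, 4]                  # one local value per worker
inbox = [[] for _ in range(NUM_WORKERS)]
barrier = threading.Barrier(NUM_WORKERS)

def worker(wid: int):
    for step in range(SUPERSTEPS):
        # 1. Local computation, performed asynchronously by each worker.
        local = values[wid] + step
        # 2. Communication: send the result to the next worker's inbox.
        inbox[(wid + 1) % NUM_WORKERS].append(local)
        # 3. Barrier synchronization: the super-step ends for all workers.
        barrier.wait()
        values[wid] = sum(inbox[wid])   # consume received messages
        inbox[wid].clear()
        barrier.wait()                  # ensure all inboxes are drained

threads = [threading.Thread(target=worker, args=(w,)) for w in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(values)
```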