Proceedings ArticleDOI

Big data and advanced analytics tools

TL;DR: The paper studies the 5V's definition of big data, analysis requirements, tools and frameworks, the different cloud-based big data analytics tools provided by various companies, and the functioning of the Hadoop/MapReduce process.
Abstract: Big data analytics is now a very broad area in both academia and industry. It has recently attracted intense interest for its attempt to extract knowledge, information, and wisdom from big data. Big data and cloud computing are two of the most important trends defining the new emerging analytical tools. Delivering big data analytical capabilities through cloud models could ease adoption for many industries and, most importantly for cost saving, could simplify the extraction of useful insights that provide different kinds of competitive advantage. Many companies provide online big data analytics tools; among the best known are the Amazon Big Data Analytics Platform, the HIVE web-based interface, SAP Big Data Analytics, IBM InfoSphere BigInsights, Teradata Big Data Analytics, the 1010data Big Data Platform, and the Cloudera Big Data Solution. These companies analyze huge amounts of data with different kinds of tools and also provide simple user interfaces for analyzing that data. This paper deals with the study of the big data 5V's definition, analysis requirements, tools and frameworks, the different cloud-based big data analytics tools provided by these companies, and the functioning of the Hadoop/MapReduce process.
Citations
Book ChapterDOI
21 Nov 2016
TL;DR: The research method shows that personalized activities and academic advising can be generated for students, and the paper introduces some opportunities for Big Data analytics to improve the efficiency and effectiveness of student learning and maximize knowledge retention.
Abstract: The use of Big Data systems in the field of education makes it possible to envisage new approaches and new learning contexts. Indeed, the rapid emergence of new e-learning platforms has attracted much interest. However, the quality of the teaching service rendered depends on the capacity of the learning approaches to provide learners with content and a learning path tailored to their needs. In this paper, we present how Big Data helps to solve education issues by reaching learning objectives. We then introduce some opportunities for Big Data analytics to improve the efficiency and effectiveness of student learning and maximize knowledge retention. Finally, our research method shows that personalized activities and academic advising can be generated for students. Big Data can expose the capabilities of learners, predict their future performance, and help educational organizations make strategic decisions.

22 citations

Proceedings ArticleDOI
15 May 2018
TL;DR: This paper presents how big data technologies are used in the context of smart cities to implement a framework with a prototype R Shiny application that analyzes road traffic and pollution data, taking a step towards smart mobility.
Abstract: A smart city is a modern and visionary approach for a city to provide intelligent and smart urban services by using information and communication technologies (ICT). The Internet of Things (IoT), emerging through intelligent networking and sensing technologies, is seen as the data-driven enabler for smart cities with current and future infrastructures. The Open Data Aarhus datasets have been created from sensor data in the city of Aarhus in Denmark. This paper presents how big data technologies in the context of smart cities are used to implement a framework with a prototype R Shiny application to analyze road traffic and pollution data, taking a step towards smart mobility. The main objective of the approach is the calculation and visualization of the least polluted route from a chosen start point to an end point by applying an algorithm utilizing the MapReduce framework running on a Hadoop cluster.
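
The paper computes this route with an algorithm running on MapReduce over a Hadoop cluster; purely as an illustration of the underlying idea, here is a minimal single-machine sketch in Python that runs Dijkstra's algorithm over pollution-weighted road segments. The graph, node names, and pollution scores below are hypothetical, not taken from the Open Data Aarhus datasets.

```python
import heapq

def least_polluted_route(graph, start, end):
    """Dijkstra's algorithm where edge weights are pollution measurements
    rather than distances, so the 'shortest' path is the least polluted."""
    # Queue entries: (accumulated pollution, node, path taken so far).
    queue = [(0.0, start, [start])]
    visited = set()
    while queue:
        pollution, node, path = heapq.heappop(queue)
        if node == end:
            return pollution, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, edge_pollution in graph.get(node, []):
            if neighbor not in visited:
                heapq.heappush(
                    queue, (pollution + edge_pollution, neighbor, path + [neighbor])
                )
    return float("inf"), []

# Hypothetical road segments annotated with sensor-derived pollution scores.
roads = {
    "A": [("B", 3.0), ("C", 1.0)],
    "B": [("D", 2.0)],
    "C": [("B", 0.5), ("D", 4.0)],
}
print(least_polluted_route(roads, "A", "D"))  # -> (3.5, ['A', 'C', 'B', 'D'])
```

In the paper's setting, the distributed part of the work (aggregating sensor readings into per-segment pollution weights) is what MapReduce handles; the routing itself is a standard shortest-path search over those weights.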

20 citations


Cites methods from "Big data and advanced analytics too..."

  • ...It is used to store and retrieve data within a Hadoop architecture [7]....


Journal ArticleDOI
TL;DR: An energy-efficient model for Mobile Big Data was developed that addresses key limitations in mobile device processing and analytics, reducing execution time and the drain on limited battery resources; it is supported by three new algorithms for effective resource use, energy saving, parallel processing, and analytics customization.

15 citations

Journal ArticleDOI
TL;DR: This paper provides an efficient mechanism for opinion mining by building an end-to-end pipeline with the help of Apache Flume, Apache HDFS, and Apache Pig.
Abstract: Twitter, one of the largest and most famous social media sites, receives millions of tweets every day on a variety of important topics. This large amount of raw data can be used for industrial, social, economic, government-policy, or business purposes by organizing and processing it according to need. Hadoop is one of the best tool options for Twitter data analysis, as it works for distributed big data, streaming data, time-stamped data, text data, and so on. This paper discusses how to use Flume to extract Twitter data and store it in HDFS for opinion mining: because Twitter contains a variety of opinions on various topics, these opinions are analyzed using Hadoop and its ecosystem to check each tweet's polarity, i.e., whether it expresses a positive, negative, or neutral opinion on a particular topic. The paper provides an efficient mechanism for opinion mining by building an end-to-end pipeline with the help of Apache Flume, Apache HDFS, and Apache Pig. A dictionary-based approach is used for the analysis, implemented through Pig statements that process the complex Twitter data and check the polarity of each tweet against a polarity dictionary, indicating which tweets carry negative opinions and which carry positive ones.
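
The paper implements the dictionary-based scoring as Pig Latin statements; as a language-neutral sketch of the same idea, the following Python fragment scores tweets against a tiny hand-made polarity dictionary. The word list and tweets are hypothetical stand-ins (real pipelines typically use a published word list such as AFINN).

```python
# Hypothetical polarity dictionary; a real pipeline would load a full word list.
POLARITY = {"good": 1, "great": 2, "love": 2, "bad": -1, "awful": -2, "hate": -2}

def tweet_polarity(text):
    """Sum the dictionary scores of the words in a tweet and map the
    total to a positive / negative / neutral label."""
    score = sum(POLARITY.get(word, 0) for word in text.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

tweets = ["I love this great phone", "awful battery I hate it", "just a phone"]
for t in tweets:
    print(tweet_polarity(t), "-", t)
```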

11 citations

Proceedings ArticleDOI
01 Oct 2019
TL;DR: This paper sheds light on IoT technology, shows its relationship with the evolution of big data, and then presents the importance of the new paradigm known as fog computing in overcoming most of these problems.
Abstract: Many vital domains, such as smart buildings, smart cities, healthcare, agriculture, and environmental monitoring, benefit from the emergence of IoT (Internet of Things) technology. Recently, the amount of IoT-generated data has become huge, creating many research challenges in data management; moreover, transmitting these amounts of data requires large bandwidth and generates significant delay. With this vision, we prepared this paper to give new researchers interested in IoT data management a good starting point. To that end, we shed light on IoT technology and show its relationship with the evolution of big data, then show the importance of the new paradigm known as fog computing in overcoming most of these problems. We first introduce the five elements of an IoT object and the common communication models used by smart objects, then compare hardware IoT platforms and give recommendations to consider when choosing one. In the following pages we introduce cloud IoT infrastructures, followed by a presentation of the advantages and challenges of fog computing.

9 citations

References
Journal ArticleDOI
Jeffrey Dean, Sanjay Ghemawat
06 Dec 2004
TL;DR: This paper presents MapReduce, a programming model and an associated implementation for processing and generating large data sets; the implementation runs on a large cluster of commodity machines and is highly scalable.
Abstract: MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model, as shown in the paper. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system. Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google's clusters every day.
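
To make the map/reduce contract concrete, here is a minimal single-process sketch of the canonical word-count computation in Python. The real system partitions the input and runs many map and reduce tasks in parallel across a cluster, with a shuffle phase between them; the function names here are illustrative only.

```python
from collections import defaultdict

def map_fn(document):
    """Map: emit an intermediate (word, 1) pair for every word."""
    for word in document.split():
        yield word, 1

def reduce_fn(word, counts):
    """Reduce: merge all intermediate values sharing the same key."""
    return word, sum(counts)

documents = ["the quick brown fox", "the lazy dog", "the fox"]

# Shuffle phase: group intermediate values by key (the framework does this).
intermediate = defaultdict(list)
for doc in documents:
    for word, count in map_fn(doc):
        intermediate[word].append(count)

results = dict(reduce_fn(w, c) for w, c in intermediate.items())
print(results)  # {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```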

20,309 citations


"Big data and advanced analytics too..." refers methods in this paper

  • ...Master node which is typically used to store the replica of metadata of the NameNode for handling letdowns [14][15][16][17]....


Journal ArticleDOI
Jeffrey Dean, Sanjay Ghemawat
TL;DR: This presentation explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.
Abstract: MapReduce is a programming model and an associated implementation for processing and generating large datasets that is amenable to a broad variety of real-world tasks. Users specify the computation in terms of a map and a reduce function, and the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks. Programmers find the system easy to use: more than ten thousand distinct MapReduce programs have been implemented internally at Google over the past four years, and an average of one hundred thousand MapReduce jobs are executed on Google's clusters every day, processing a total of more than twenty petabytes of data per day.

17,663 citations

Book
29 May 2009
TL;DR: This comprehensive resource demonstrates how to use Hadoop to build reliable, scalable, distributed systems: programmers will find details for analyzing large datasets, and administrators will learn how to set up and run Hadoop clusters.
Abstract: Hadoop: The Definitive Guide helps you harness the power of your data. Ideal for processing large datasets, the Apache Hadoop framework is an open source implementation of the MapReduce algorithm on which Google built its empire. This comprehensive resource demonstrates how to use Hadoop to build reliable, scalable, distributed systems: programmers will find details for analyzing large datasets, and administrators will learn how to set up and run Hadoop clusters. Complete with case studies that illustrate how Hadoop solves specific problems, this book helps you:

  • Use the Hadoop Distributed File System (HDFS) for storing large datasets, and run distributed computations over those datasets using MapReduce
  • Become familiar with Hadoop's data and I/O building blocks for compression, data integrity, serialization, and persistence
  • Discover common pitfalls and advanced features for writing real-world MapReduce programs
  • Design, build, and administer a dedicated Hadoop cluster, or run Hadoop in the cloud
  • Use Pig, a high-level query language for large-scale data processing
  • Take advantage of HBase, Hadoop's database for structured and semi-structured data
  • Learn ZooKeeper, a toolkit of coordination primitives for building distributed systems

If you have lots of data -- whether it's gigabytes or petabytes -- Hadoop is the perfect solution. Hadoop: The Definitive Guide is the most thorough book available on the subject. "Now you have the opportunity to learn about Hadoop from a master -- not only of the technology, but also of common sense and plain talk." -- Doug Cutting, Hadoop Founder, Yahoo!
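
As a small illustration of the first bullet above, the following sketch stores and retrieves a file in HDFS from Python. It assumes the third-party hdfs package (a WebHDFS client) is installed and a NameNode has WebHDFS enabled; the host, port, user, and paths are hypothetical.

```python
# Minimal sketch of writing to and reading from HDFS via WebHDFS,
# assuming the `hdfs` PyPI package; endpoint and paths are hypothetical.
from hdfs import InsecureClient

client = InsecureClient("http://namenode:9870", user="hadoop")

# Write a small dataset into HDFS (overwrite if it already exists).
client.write("/data/example.txt", data=b"hello hadoop\n", overwrite=True)

# Read it back; read() returns a context-managed file-like object.
with client.read("/data/example.txt") as reader:
    print(reader.read().decode())

# List the directory to confirm the file landed where expected.
print(client.list("/data"))
```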

3,797 citations

Proceedings ArticleDOI
01 Dec 2012
TL;DR: In this paper, the authors present a prototype implementation of a Hadoop cluster, HDFS storage, and the MapReduce framework for processing large data sets, considering prototypical big data application scenarios.
Abstract: The size of the databases used in today's enterprises has been growing at exponential rates day by day. Simultaneously, the need to process and analyze large volumes of data for business decision making has also increased. Several business and scientific applications need to process terabytes of data in an efficient manner on a daily basis. This has contributed to the big data problem faced by industry, owing to the inability of conventional database systems and software tools to manage or process big data sets within tolerable time limits. Processing the data can involve various operations depending on usage, such as culling, tagging, highlighting, indexing, searching, and faceting. It is not possible for a single machine, or a few machines, to store or process this huge amount of data in a finite time period. This paper reports experimental work on the big data problem and its optimal solution using a Hadoop cluster, the Hadoop Distributed File System (HDFS) for storage, and parallel processing of large data sets using the MapReduce programming framework. We have done a prototype implementation of a Hadoop cluster, HDFS storage, and the MapReduce framework for processing large data sets, considering prototypical big data application scenarios. The results obtained from various experiments indicate that the above approach addresses the big data problem favorably.

203 citations


"Big data and advanced analytics too..." refers methods in this paper

  • ...Master node which is typically used to store the replica of metadata of the NameNode for handling letdowns [14][15][16][17]....
