Author

Daniel Puschmann

Bio: Daniel Puschmann is an academic researcher from the University of Surrey. The author has contributed to research in topics: Data stream mining & Raw data. The author has an h-index of 7 and has co-authored 9 publications receiving 478 citations.

Papers
Journal ArticleDOI
TL;DR: The CityPulse framework supports smart city service creation by means of a distributed system for semantic discovery, data analytics, and interpretation of large-scale (near-)real-time Internet of Things data and social media data streams to break away from silo applications and enable cross-domain data integration.
Abstract: Our world and our lives are changing in many ways. Communication, networking, and computing technologies are among the most influential enablers that shape our lives today. Digital data and connected worlds of physical objects, people, and devices are rapidly changing the way we work, travel, socialize, and interact with our surroundings, and they have a profound impact on different domains, such as healthcare, environmental monitoring, urban systems, and control and management applications, among several other areas. Cities currently face an increasing demand for providing services that can have an impact on people’s everyday lives. The CityPulse framework supports smart city service creation by means of a distributed system for semantic discovery, data analytics, and interpretation of large-scale (near-)real-time Internet of Things data and social media data streams. The goal is to break away from silo applications and enable cross-domain data integration. The CityPulse framework integrates multimodal, mixed-quality, uncertain, and incomplete data to create reliable, dependable information and continuously adapts data processing techniques to meet the quality-of-information requirements of end users. Unlike existing solutions, which mainly offer unified views of the data, the CityPulse framework is also equipped with powerful data analytics modules that perform intelligent data aggregation, event detection, quality assessment, contextual filtering, and decision support. This paper presents the framework, describes its components, and demonstrates how they interact to support easy development of custom-made applications for citizens. The benefits and the effectiveness of the framework are demonstrated in a use-case scenario implementation presented in this paper.
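To make the decision-support idea in this abstract concrete, the following minimal sketch shows how events from two city domains could be fused by location after quality filtering. The event fields, thresholds, and combine() rule are illustrative assumptions, not the CityPulse framework's actual API.

```python
# Illustrative sketch only: cross-domain event fusion in the spirit of the
# decision-support and contextual-filtering modules described above.
from dataclasses import dataclass

@dataclass
class CityEvent:
    domain: str      # e.g. "traffic", "air-quality" (assumed field names)
    location: str    # shared spatial key used for cross-domain joins
    severity: float  # normalised 0..1 score from event detection
    quality: float   # quality-of-information score from quality assessment

def combine(events, min_quality=0.6):
    """Fuse events from different domains that refer to the same location,
    keeping only observations with a sufficiently high quality score."""
    by_location = {}
    for e in events:
        if e.quality < min_quality:
            continue  # contextual filtering: drop unreliable observations
        by_location.setdefault(e.location, []).append(e)
    # Simple decision rule: flag locations where several domains report problems.
    return {loc: evs for loc, evs in by_location.items()
            if len({e.domain for e in evs}) > 1 and max(e.severity for e in evs) > 0.5}

alerts = combine([
    CityEvent("traffic", "junction-42", severity=0.8, quality=0.9),
    CityEvent("air-quality", "junction-42", severity=0.6, quality=0.7),
])
print(alerts)  # {'junction-42': [...]} -> candidate for a citizen-facing alert
```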

199 citations

Journal ArticleDOI
TL;DR: A survey of the requirements, solutions, and challenges in the area of information abstraction is provided, together with an efficient workflow for extracting meaningful information from raw sensor data based on the current state of the art in this area.
Abstract: The term Internet of Things (IoT) refers to the interaction and communication between billions of devices that produce and exchange data related to real-world objects (i.e. things). Extracting higher level information from the raw sensory data captured by the devices and representing this data as machine-interpretable or human-understandable information has several interesting applications. Deriving raw data into higher level information representations demands mechanisms to find, extract, and characterize meaningful abstractions from the raw data. This meaningful abstractions then have to be presented in a human and/or machine-understandable representation. However, the heterogeneity of the data originated from different sensor devices and application scenarios such as e-health, environmental monitoring, and smart home applications, and the dynamic nature of sensor data make it difficult to apply only one particular information processing technique to the underlying data. A considerable amount of methods from machine-learning, the semantic web, as well as pattern and data mining have been used to abstract from sensor observations to information representations. This paper provides a survey of the requirements and solutions and describes challenges in the area of information abstraction and presents an efficient workflow to extract meaningful information from raw sensor data based on the current state-of-the-art in this area. This paper also identifies research directions at the edge of information abstraction for sensor data. To ease the understanding of the abstraction workflow process, we introduce a software toolkit that implements the introduced techniques and motivates to apply them on various data sets.
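As a minimal sketch of the raw-data-to-abstraction workflow this survey describes, the snippet below segments a stream into windows, extracts simple features, and maps them to human-readable labels. The window size, features, and label thresholds are assumptions for illustration, not values prescribed by the paper.

```python
# Sketch of a simple abstraction workflow: segment -> extract features -> label.
import numpy as np

def abstract_stream(readings, window=10):
    labels = []
    for start in range(0, len(readings) - window + 1, window):
        segment = np.asarray(readings[start:start + window])
        mean, spread = segment.mean(), segment.std()
        # Map numeric features to a symbolic, human-understandable abstraction.
        if spread > 2.0:
            labels.append("unstable")
        elif mean > 25.0:
            labels.append("hot")
        else:
            labels.append("normal")
    return labels

temperatures = list(np.random.normal(24, 1, 50)) + list(np.random.normal(30, 1, 50))
print(abstract_stream(temperatures))  # e.g. ['normal', ..., 'hot', ...]
```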

139 citations

Proceedings ArticleDOI
01 Sep 2014
TL;DR: A framework for real-time semantic annotation of streaming IoT data to support dynamic integration into the Web using the Advanced Message Queuing Protocol (AMQP) will enable delivery of large volumes of data that can influence the performance of the smart city systems that use IoT data.
Abstract: Internet of Things is a generic term that refers to the interconnection of real-world services provided by smart objects and sensors that enable interaction with the physical world. Cities are also evolving into large interconnected ecosystems in an effort to improve the sustainability and operational efficiency of city services and infrastructure. However, it is often difficult to perform real-time analysis of the large amounts of heterogeneous data and sensory information provided by various sources. This paper describes a framework for real-time semantic annotation of streaming IoT data to support dynamic integration into the Web using the Advanced Message Queuing Protocol (AMQP). This will enable the delivery of large volumes of data, which can influence the performance of the smart city systems that use IoT data. We present an information model to represent summarisation and reliability of stream data. The framework is evaluated in terms of data size and average message exchange time using summarised and raw sensor data. Based on a statistical analysis, a detailed comparison between various sensor points is made to investigate the memory and computational cost of the stream annotation framework.
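For readers unfamiliar with AMQP, the following sketch shows what publishing an annotated sensor observation over AMQP looks like with the pika client. The broker address, exchange name, routing key, and annotation fields are assumptions for illustration; the paper's actual information model for summarisation and reliability is richer than this.

```python
# Minimal sketch: publish a semantically annotated observation over AMQP (pika).
import json
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()
channel.exchange_declare(exchange="iot.annotated", exchange_type="topic")

observation = {
    "sensorId": "urn:sensor:traffic:42",     # hypothetical identifier
    "observedProperty": "vehicleCount",
    "value": 17,
    "unit": "vehicles/min",
    "timestamp": "2014-09-01T08:00:00Z",
    "reliability": 0.93,                     # stream-quality annotation
    "summary": "aabbcc",                     # placeholder for a summarised form
}

channel.basic_publish(
    exchange="iot.annotated",
    routing_key="traffic.city-centre.42",
    body=json.dumps(observation),
)
connection.close()
```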

108 citations

Journal ArticleDOI
TL;DR: This work proposes a method that determines how many different clusters can be found in a stream based on the data distribution, and demonstrates how the number of clusters in a real-world data stream can be determined by analyzing its data distribution.
Abstract: The emergence of the Internet of Things (IoT) has led to the production of huge volumes of real-world streaming data. We need effective techniques to process IoT data streams and to gain insights and actionable information from real-world observations and measurements. Most existing approaches are application or domain dependent. We propose a method which determines how many different clusters can be found in a stream based on the data distribution. After selecting the number of clusters, we use an online clustering mechanism to cluster the incoming data from the streams. Our approach remains adaptive to drifts by adjusting itself as the data changes. We benchmark our approach against state-of-the-art stream clustering algorithms on data streams with data drift. We show how our method can be applied in a use case scenario involving near real-time traffic data. Our results allow us to cluster, label, and interpret IoT data streams dynamically according to the data distribution. This enables adaptive online processing of large volumes of dynamic data based on the current situation. We show how our method adapts itself to the changes. We demonstrate how the number of clusters in a real-world data stream can be determined by analyzing the data distributions.
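The sketch below illustrates the general idea only: estimate a suitable number of clusters from an initial sample of the stream, then cluster arriving batches online so the centroids can drift with the data. Silhouette-based selection and MiniBatchKMeans are stand-ins chosen for illustration, not the exact method proposed in the paper.

```python
# Illustrative sketch: pick k from the data distribution, then cluster online.
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
initial_sample = np.vstack([rng.normal(c, 0.3, size=(200, 2)) for c in (0, 3, 6)])

# 1) Choose the number of clusters from an initial sample of the stream.
scores = {}
for k in range(2, 8):
    labels = MiniBatchKMeans(n_clusters=k, n_init=3, random_state=0).fit_predict(initial_sample)
    scores[k] = silhouette_score(initial_sample, labels)
best_k = max(scores, key=scores.get)

# 2) Cluster the incoming stream online; partial_fit lets the centroids adapt,
#    giving a degree of robustness to drift in the data.
model = MiniBatchKMeans(n_clusters=best_k, random_state=0)
for _ in range(100):                       # stand-in for reading stream batches
    batch = rng.normal(rng.choice([0, 3, 6]), 0.3, size=(32, 2))
    model.partial_fit(batch)

print(best_k, model.cluster_centers_.round(2))
```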

86 citations

Journal ArticleDOI
TL;DR: A framework for real-time semantic annotation and aggregation of data streams to support dynamic integration into the Web using the Advanced Message Queuing Protocol; the results suggest that, regardless of the segmentation approach used, it is desirable to find the optimal data aggregation parameters in order to reduce energy consumption and improve data aggregation quality.
Abstract: With the growing popularity of information and communications technologies and information sharing and integration, cities are evolving into large interconnected ecosystems by using smart objects and sensors that enable interaction with the physical world. However, it is often difficult to perform real-time analysis of the large amounts of heterogeneous data and sensory information provided by various sources. This paper describes a framework for real-time semantic annotation and aggregation of data streams to support dynamic integration into the Web using the Advanced Message Queuing Protocol (AMQP). We provide a comprehensive analysis of the effect of adaptive and nonadaptive window sizes in the segmentation of time series using the SensorSAX and symbolic aggregate approximation (SAX) approaches for data streams with different variations and sampling rates in real-time processing. The framework is evaluated over three parameters, namely the window size parameter of the SAX algorithm and the sensitivity level and minimum window size parameters of the SensorSAX algorithm, based on the average data aggregation and annotation time, CPU consumption, data size, and data reconstruction rate. Based on a statistical analysis, a detailed comparison between various sensor points is made to investigate the memory and computational cost of the stream-processing framework. Our results suggest that, regardless of the segmentation approach used, each geographically distinct sensory environment has a different level of dynamicity, and it is therefore desirable to find the optimal data aggregation parameters in order to reduce energy consumption and improve data aggregation quality.
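As background for the SAX segmentation referenced in this abstract, the following from-scratch sketch z-normalises a window, reduces it with piecewise aggregate approximation (PAA), and maps each segment mean to a symbol using Gaussian breakpoints. The window and alphabet sizes are illustrative, and SensorSAX's adaptive window selection is not shown.

```python
# Minimal SAX sketch: z-normalise, PAA, then symbolise with Gaussian breakpoints.
import numpy as np
from scipy.stats import norm

def sax(window, n_segments=4, alphabet="abcd"):
    x = np.asarray(window, dtype=float)
    x = (x - x.mean()) / (x.std() + 1e-12)                 # z-normalisation
    paa = x.reshape(n_segments, -1).mean(axis=1)           # PAA: per-segment means
    # Breakpoints that split the standard normal into equiprobable regions.
    breakpoints = norm.ppf(np.linspace(0, 1, len(alphabet) + 1)[1:-1])
    return "".join(alphabet[np.searchsorted(breakpoints, v)] for v in paa)

print(sax([1, 1, 2, 2, 8, 9, 9, 10]))  # 'aadd': low segments map to early letters
```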

36 citations


Cited by
Journal ArticleDOI
TL;DR: A combined IoT-based system for smart city development and urban planning using Big Data analytics is proposed, consisting of various types of sensor deployments, including smart home sensors, vehicular networking, weather and water sensors, smart parking sensors, and surveillance objects.

701 citations

Journal ArticleDOI
TL;DR: This article assesses the different machine learning methods that deal with the challenges in IoT data by considering smart cities as the main use case and presents a taxonomy of machine learning algorithms explaining how different techniques are applied to the data in order to extract higher level information.

690 citations

Journal ArticleDOI
TL;DR: This paper comprehensively presents a tutorial on three typical edge computing technologies, namely mobile edge computing, cloudlets, and fog computing, and the standardization efforts, principles, architectures, and applications of these three technologies are summarized and compared.

442 citations

Journal ArticleDOI
TL;DR: A review is conducted to map the research landscape of smart homes based on the Internet of Things into a coherent taxonomy, identifying the basic characteristics of this emerging field in the following aspects: the motivation for using IoT in smart home applications, open challenges hindering utilization, and recommendations to improve the acceptance and use of smart home IoT applications in the literature.

413 citations

Journal ArticleDOI
TL;DR: This paper provides a list of criteria for selecting machine learning tools for big data, along with an analysis of the advantages and drawbacks of three different processing paradigms and a comparison of engines that implement them, including MapReduce, Spark, Flink, Storm, and H2O.
Abstract: With an ever-increasing amount of options, the task of selecting machine learning tools for big data can be difficult. The available tools have advantages and drawbacks, and many have overlapping uses. The world’s data is growing rapidly, and traditional tools for machine learning are becoming insufficient as we move towards distributed and real-time processing. This paper is intended to aid the researcher or professional who understands machine learning but is inexperienced with big data. In order to evaluate tools, one should have a thorough understanding of what to look for. To that end, this paper provides a list of criteria for making selections along with an analysis of the advantages and drawbacks of each. We do this by starting from the beginning, and looking at what exactly the term “big data” means. From there, we go on to the Hadoop ecosystem for a look at many of the projects that are part of a typical machine learning architecture and an understanding of how everything might fit together. We discuss the advantages and disadvantages of three different processing paradigms along with a comparison of engines that implement them, including MapReduce, Spark, Flink, Storm, and H2O. We then look at machine learning libraries and frameworks, including Mahout, MLlib, and SAMOA, and evaluate them based on criteria such as scalability, ease of use, and extensibility. There is no single toolkit that truly embodies a one-size-fits-all solution, so this paper aims to help make decisions smoother by providing as much information as possible and quantifying what the tradeoffs will be. Additionally, throughout this paper, we review recent research in the field using these tools and talk about possible future directions for toolkit-based learning.
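To give a feel for the toolkit-based learning the survey compares, here is a small example using one of the libraries it discusses, Spark's MLlib (the DataFrame-based pyspark.ml API), run locally. The data and parameters are made up; this is only an illustration of the API surface, not an endorsement of any particular engine.

```python
# Illustrative example: k-means with Spark MLlib on a tiny local DataFrame.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.master("local[*]").appName("mllib-demo").getOrCreate()

df = spark.createDataFrame(
    [(0.0, 0.1), (0.2, 0.0), (9.0, 9.1), (9.2, 8.9)],
    ["x", "y"],
)
# Assemble raw columns into the "features" vector column MLlib estimators expect.
features = VectorAssembler(inputCols=["x", "y"], outputCol="features").transform(df)

model = KMeans(k=2, seed=1).fit(features)
print(model.clusterCenters())  # two centroids, roughly (0.1, 0.05) and (9.1, 9.0)

spark.stop()
```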

379 citations