
NStreamAware: Real-Time Visual Analytics for Data
Streams to Enhance Situational Awareness
Fabian Fischer
University of Konstanz, Germany
Fabian.Fischer@uni-konstanz.de
Daniel A. Keim
University of Konstanz, Germany
Daniel.Keim@uni-konstanz.de
ABSTRACT
The analysis of data streams is important in many security-
related domains to gain situational awareness. To provide
monitoring and visual analysis of such data streams, we
propose a system, called NStreamAware, that uses mod-
ern distributed processing technologies to analyze streams
using stream slices, which are presented to analysts in a
web-based visual analytics application, called NVisAware.
Furthermore, we visually guide the user in the feature se-
lection process to summarize the slices to focus on the most
interesting parts of the stream based on introduced expert
knowledge of the analyst. We show through case studies how
the system can be used to gain situational awareness
and eventually enhance network security. Furthermore, we
apply the system to a social media data stream to compete
in an international challenge to evaluate the applicability of
our approach to other domains.
Categories and Subject Descriptors
C.2.0 [Computer-Communication Networks]: General—
Security and protection; I.3.8 [Computer Graphics]: Ap-
plications; H.5.2 [Information Interfaces and Presenta-
tion]: User Interfaces
Keywords
Real-Time Processing, Data Streams, Situational Aware-
ness, Network Security, Visual Analytics
1. INTRODUCTION
In many security-related scenarios the analysis and situ-
ational assessment of data streams is crucial to detect sus-
picious behavior, to monitor and understand ongoing activ-
ities, or to reduce streams to focus on the most relevant
parts. For example, in the field of system and network ad-
ministration, network routers and servers produce a con-
tinuous stream of NetFlow records or system log messages,
and hundreds of system metrics and performance data. Sometimes
analysts perform close real-time monitoring, while in other
situations they have no choice but to focus
only on the most important parts of a data stream. The
same is true in the field of law enforcement in the analysis
of criminal activities of ongoing threats to maintain situ-
ational awareness (SA). In this scenario, analysts need to
handle streams of possibly important social media messages
and call center messages. Both scenarios are technically re-
lated and show the high importance of research in the field
of data stream analysis with the analyst in the loop that
is a key to enhance situational awareness. The challenge
in this field is also to merge and aggregate heterogeneous
high-velocity data streams. While we do have a wide variety
of highly scalable databases and there has been much re-
search in intrusion and anomaly detection, fully automated
systems do not yet work sufficiently well. To support
understanding, generate insights, and evaluate hypotheses,
analysts need to have a central role in such a system, so as
not to lose context and to be able to judge data provenance.
The ultimate goal is to allow the analysts to actually get an
idea of what is going on in a data stream to gain situational
awareness. Such analysts are often “being asked to make
decisions on ill-defined problems. These problems may con-
tain uncertain or incomplete data, and are often complex to
piece together. Consequently, decision makers rely heavily
on intuition, knowledge and experience” [14], which high-
lights the need to guide analysts to the right parts of a data
stream, because it is impossible to analyze everything at the
same level of detail.
In this paper, we introduce NStreamAware, which is a vi-
sual analytics system designed to address this challenge us-
ing latest analysis technologies available from the big data
analysis community [20] and real-time visual analytics re-
search [12].
The main contributions of our work are the following:
Firstly, a system architecture, called NStreamAware, based
on Apache Spark Streaming [2] to summarize incoming data
streams in sliding slices. Secondly, a web-based visual an-
alytics application, called NVisAware, using a novel com-
bination of various visualization techniques within multiple
sliding slices to visually summarize the data stream based
on selected features steered by a visual analytics interface.
The remainder of this paper is structured as follows: Sec-
tion 2 elaborates on important design considerations. Sec-
tion 3 gives an overview of related work. Section 4 describes
the different aspects of our approach, while the evaluation is
discussed in Section 5. Section 6 discusses limitations and
future work, and Section 7 concludes.
Konstanzer Online-Publikations-System (KOPS)
URL: http://nbn-resolving.de/urn:nbn:de:bsz:352-0-267315
Published in: Proceedings of the Eleventh Workshop on Visualization for Cyber Security (VizSec '14), Paris, France, November 10, 2014 / Lane
Harrison et al. (Eds.). New York, NY: ACM, 2014, pp. 65-72. ISBN 978-1-4503-2826-5

2. DESIGN CONSIDERATIONS
Based on the given problem, experience, and expert feed-
back from earlier work in the field, we identified the following
design considerations and principles as crucial for our ap-
proach.
DC1 Incorporate novel scalable analytics methods:
Scalable, distributed, and proven large-scale analysis
frameworks must be building blocks of a system able
to address big data problems. We need to take ad-
vantage of such novel technologies from the big data
community and use them in visual analytics applica-
tion. We need to bring those worlds together and keep
the analyst in the loop to address complex problems.
DC2 Enabling real-time monitoring: While it is not
possible to present all raw messages of high-speed
streams, in many scenarios analysts still want to closely
monitor messages from a particular system, or based
on specific filter criteria, in real-time. Many available
visual analytics systems, however, still require static
batch loading first. We see the need to be able to
directly push data to our system in a streaming fashion,
and to smoothly switch between monitoring and
exploration.
DC3 Deterministic screen updates, independent from
underlying data streams: The problem in systems
supporting DC2 is the high cognitive load for analysts
when analyzing real-time streams. Because of
the unpredictable characteristics of data streams with
respect to volume, velocity, variety, and veracity, we
additionally need visualizations that decouple the
flow rate of a data stream from screen updates and
keep the latter constant and predictable, so as not to
overwhelm the user. There is a trade-off between DC2
and DC3 when trying to achieve both at the same time.
DC4 Fusion of heterogeneous data sources: Many avail-
able systems focus on individual data sources and
provide little flexibility to incorporate and correlate
various heterogeneous data sources. Admittedly, focusing
on particular individual data sources helps to develop
highly effective, specialized visualization systems. On the
other hand, it is important to cover a broader range of
scenarios and tasks to provide better situational as-
sessment.
DC5 User-steered feature selection: Feature selection
is an important field in which to support analysts using
appropriate visualization and interaction techniques. Our
goal is to enhance understanding of data streams and
provide more compact overviews. In this process, we
want to integrate the human in the workflow, which re-
quires a tight coupling of visual representations, inter-
action, and analytic methods.
3. RELATED WORK
The contributions of our work are related to various re-
search fields, which we discuss in the following
section. Many researchers focus on the algorithmic analysis
of data streams, especially in the field of stream clustering
[1] and event detection. In recent years, there has been a focus
on social data streams because of the wide availability of
such data. While most of these systems focus on the
detection of events, our work contributes more to the field of
visualizing a condensed heterogeneous data stream to fo-
cus on more interesting changes, omitting or merging less
interesting ranges to eventually focus on important parts
in more detail. This idea is related to the work of Xie et
al. [19] proposing a fully-automated merging algorithm for
time-series data streams.
A recent study by Wanner et al. [18] takes a look at the
evolution of visual analytics applications for event detection
for text streams and concludes that “visualizations were pri-
marily used as presentation, but had no interaction possible
to steer the underlying data processing algorithm”. This
confirms our assumption that many systems do not cover
DC5 appropriately. Our approach differs in that we provide
interactions so that users are able to steer the feature se-
lection process. Therefore, the system does not rely only
on the fully automated selection of interesting parts, but
also on the user-adjusted feature set.
analytics systems for data streams is to enhance situational
awareness to facilitate decision making. Endsley provides a
widely used generic definition of SA. It “is the perception
of the elements in the environment within a volume of time
and space, the comprehension of their meaning, and the pro-
jection of their status in the near future” [6]. Further work
makes it clear that situation awareness primarily resides “in
the minds of humans”, while situation assessment better de-
scribes the “process or set of processes” leading to the state
of SA [16]. In the complex field of computer network secu-
rity operations, only a combination of various tools used by
experienced domain experts will eventually be able to guide
the user to such a cognitive state. Franke and Brynielsson [9]
give a systematic literature overview specifically for the field
of cyber situational awareness.
Furthermore, there is not only work on SA systems, but
also visualization techniques (e.g., [7]) designed to convey
the current state of the network to best support situational
assessment. ELVIS [10] is a highly interactive system to an-
alyze system log data, but cannot be applied to real-time
streams. SnortView [11] focus on the specific analysis of
intrusion detection alerts and does satisfy DC2. The focus
of Event Visualizer [8], is to provide real-time visualizations
for event data streams (e.g., system log data) to provide
real-time monitoring and possibilities to smoothly switch to
exploration mode covering DC2 and DC4. In contrast to
this event-based approach, Best et al. [3] proposes another
real-time system to enhance situational awareness using the
analysis of network traffic based on LiveRAC [13]. The ana-
lyzed and aggregated time-series are displayed in a zoomable
tabular interface to provide the analyst an interactive explo-
ration interface for time-series data, while our approach is
more general to include also other data types (e.g., frequent
words or users, hierarchical overviews) addressing DC4.
Additionally, Shiravi et al. [15] provide an extensive
overview of various visualization systems for network secu-
rity based on five major use case classes: Host/Server Moni-
toring, Internal/External Monitoring, Port Activity, Attack
Patterns, and Routing Behavior. The authors also identi-
fied that most security visualization systems, in
their current state, are “mostly suitable for offline forensics
analysis”, while “real-time processing of network events re-
quires extensive resources, both in terms of the computation
power required to process an event, as well as the amount
of memory needed to store the aggregated statistics” [15].
Compared to work specifically found in one of these use case
classes, our approach tries to combine multiple use cases into
a real-time visual analytics system and addresses the scala-
bility issues using Apache Spark.
4. VISUAL ANALYTICS SYSTEM
In the following, we describe the building blocks of NStrea-
mAware. The overall architecture can be seen in Figure 1.
To process the data stream, we made use of various mod-
ern technologies to provide a scalable infrastructure for our
modular visual analytics system. Our architecture consists
of our REST Service, Spark Service and a web application
with various visualizations, called NVisAware. To provide
proven and scalable data processing, we make use of Apache
Spark (https://spark.apache.org/), RabbitMQ (http://www.rabbitmq.com/),
ElasticSearch (http://www.elasticsearch.org/), and MongoDB (http://www.mongodb.com/).
The REST Service (1) connects to the data streams (2)
and preprocesses the data and calculates various additional
information for the incoming events. The service also
provides a REST interface to retrieve historical data or man-
age insights. All events are stored in a distributed Elas-
ticSearch cluster and are forwarded to our message broker
RabbitMQ.
The Spark Service (3), which runs on top of the Apache
Spark Streaming platform for analytics, generates real-time
summaries on sliding windows, and stores them to a Mon-
goDB database (4). Spark Streaming is a development frame-
work that helps to implement analytical algorithms executed
in large distributed cluster environments, providing scala-
bility even in big data scenarios. The Spark Service is im-
plemented using Scala and calculates various statistics and
features based on sliding windows. Table 1 shows a selec-
tion of calculated example features for a network security
use case. We call these summaries, which are generated in
a regular interval, sliding slices. Those slices and also a se-
lection of raw messages are eventually forwarded to our web
application NVisAware (5), so that they can be visualized
in the graphical user interface to the analyst using various
interactive real-time displays. All modules are loosely cou-
pled, so that they can be run on separate computers or in
cluster environments to achieve best performance for large-
scale data streams.
4.1 REST Service Module
The REST Service (1), which is implemented as a multi-
threaded standalone Java application, provides a REST in-
terface accessible by all other modules, especially the web
application. This REST service is used to handle job queu-
ing and to answer data requests. To attach new data streams,
the respective jobs can be sent to the service via a defined
REST API. The job is added as a new thread and the API
can be used to control or retrieve status information about
these running jobs. Incoming messages from the data stream
are then preprocessed, fields are extracted, and eventually
treated as individual events, enriched with various addi-
tional attributes. The procedure is based on the assigned
scenario configuration. For social media messages, sentiment
values are calculated, while for IP-related data geo lookups
can be made. In practice, many servers do not provide very
accurate timestamps; therefore, a new field with the current
timestamp is added as well, to have more accurate tim-
ings in cases where the source machine does not make use of the
network time protocol or uses deviating time settings.
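The enrichment step can be sketched as follows. This is a minimal, hypothetical Python illustration only (the actual module is a multi-threaded Java service); the syslog-style pattern and field names are assumptions for demonstration.

```python
from datetime import datetime, timezone
import re

# Hypothetical field-extraction pattern; not the authors' actual parser.
SYSLOG_RE = re.compile(r"^(?P<host>\S+)\s+(?P<program>[\w\-/]+):\s+(?P<message>.*)$")

def enrich(raw_line: str) -> dict:
    """Extract fields from a raw syslog-style line and attach a receive
    timestamp, since sender clocks may be inaccurate or unsynchronized."""
    event = {"raw": raw_line}
    m = SYSLOG_RE.match(raw_line)
    if m:
        event.update(m.groupdict())
    # Timestamp added on arrival: more reliable than sender clocks when the
    # source does not use NTP or has deviating time settings.
    event["received_at"] = datetime.now(timezone.utc).isoformat()
    return event

e = enrich("webserver sshd: Failed password for invalid user admin")
```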
Figure 1: System Architecture: NStreamAware uses
various modern systems, including Apache Spark,
RabbitMQ, MongoDB, and ElasticSearch, to pro-
vide the needed scalability for an interactive visual
analytics application.
4.2 Module for Spark Streaming
Apache Spark provides a distributed memory abstraction
that is fault-tolerant and efficient. This helps to program
distributed data processing applications without worrying
about fault tolerance. Apache Spark introduces a program-
ming model, called Resilient Distributed Datasets (RDDs),
which provides an interface of coarse-grained transformations
(e.g., map, group-by, filter, join). The RDDs can be ad-
dressed within Scala similar to normal collections; however,
they are in fact spread over the underlying cluster machines.
If a transformation is called on an RDD, the execution is ac-
tually performed on various worker machines. When an action is
called (e.g., count), the result is retrieved from all workers
to return final results. We use the streaming extension of
Apache Spark and use the same programming model to an-
alyze data streams in real-time. We define a sliding window
and connect to a RabbitMQ queue to receive messages for-
warded by the REST Service. Currently, we defined various
feature types to be calculated on the incoming messages:
count, set, new-set, key-value list, and key-array list. All
features seen in Table 1, for example, belong to one of
these feature types. After calculating the various features,
they are directly stored to a MongoDB collection. When all
features are ready, NVisAware is notified via RabbitMQ to
retrieve the sliding slice content via the REST API using
the appropriate database queries. Count provides a simple
count of the number of messages. A set stores the unique
values that occurred within a sliding window, while a new-set
feature will only include values, which have never been seen
in the whole stream before. A key-value list can be used
to count the number of occurrences for all words to gather
a list of frequent words. The key-array list can be used to
store for each key an array of values. This can be used, for
example, to track for each IP address all used port numbers
in the sliding window.
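The five feature types can be sketched in a few lines. This is a plain-Python illustration rather than the distributed Scala/Spark implementation described here; the event field names (`host`, `src`, `port`) are assumptions.

```python
from collections import Counter

def window_features(events: list, seen_hosts: set) -> dict:
    """Compute one example of each feature type over a sliding window.
    seen_hosts carries the stream-global state needed for the new-set."""
    hosts = [e["host"] for e in events]
    features = {
        "#events": len(events),               # count
        "hosts_set": set(hosts),              # set: unique values in the window
        "newHosts": set(hosts) - seen_hosts,  # new-set: never seen in the stream
        "hosts": Counter(hosts),              # key-value list: value frequencies
        "topTalker": {},                      # key-array list
    }
    for e in events:  # track all used port numbers per source address
        features["topTalker"].setdefault(e["src"], []).append(e["port"])
    seen_hosts.update(hosts)                  # advance the global stream state
    return features

seen = set()
slice1 = window_features([
    {"host": "a", "src": "10.0.0.1", "port": 22},
    {"host": "a", "src": "10.0.0.1", "port": 80},
    {"host": "b", "src": "10.0.0.2", "port": 443},
], seen)
```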
Feature          Type            Stream
#events          count           Syslog
timestamps       set             Syslog
#programs        count           Syslog
#hosts           count           Syslog
#frequentWords   count           Syslog
programs         key-value list  Syslog
hosts            key-value list  Syslog
frequentWords    key-value list  Syslog
newHosts         new-set         Syslog
newPrograms      new-set         Syslog
srcAddr          key-value list  NetFlow
dstAddr          key-value list  NetFlow
srcPorts         key-value list  NetFlow
dstPorts         key-value list  NetFlow
topTalker        key-array list  NetFlow
#srcAddr         count           NetFlow
#dstAddr         count           NetFlow
#srcPorts        count           NetFlow
#dstPorts        count           NetFlow
ossecAlerts      key-value list  OSSEC
Table 1: Selection of aggregation features for each
sliding slice generated by our implemented analysis
and aggregation module.
4.3 NVisAware Web Application
The graphical user interface is provided by our web ap-
plication NVisAware, which offers various displays. The
application is written in HTML5 and JavaScript using sev-
eral visualization libraries. The display consists of multiple
configuration and parameter views and six main tabs: Real-
Time Data Stream, Real-Time Sliding Slices, Visual Feature
Selection, Summarized Sliding Slices, Event Timeline & In-
sights, and Search & Exploration. The first display can be
seen in Figure 3 and is used to take a look at the raw mes-
sages in the data stream.
4.4 Real-Time Sliding Slices
To visually represent the generated sliding slices, we pro-
vide a novel visualization with various embedded charts like
word clouds, node-link diagrams, treemaps, and counters
within each slice. The slices are juxtaposed next to
each other to provide a timeline based on consecutive slices,
as seen in Figure 4. The prominent background color uses
a colormap from dark green over white to pink based on a
diverging ColorBrewer set. The color encodes a similarity
score to the previous slice to alert the analyst. In the up-
per left corner, a star icon can be used to store the slice for
further investigations. The slice will also be added to the
Event Timeline & Insights view, where all starred objects
are presented in a traditional interactive timeline to explore
the events flagged and labeled by the analysts.
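The similarity-based background coloring could be sketched as follows. The Jaccard measure and the concrete RGB endpoints are assumptions; the paper specifies only a diverging dark-green/white/pink ColorBrewer scheme.

```python
def jaccard(a: set, b: set) -> float:
    """Example similarity between two slices' value sets (an assumption;
    the paper does not name the similarity measure)."""
    return len(a & b) / len(a | b) if a | b else 1.0

def diverging_color(similarity: float) -> tuple:
    """Map similarity in [0, 1] onto a diverging scale:
    1.0 -> dark green (unchanged), 0.5 -> white, 0.0 -> pink (alarming)."""
    green, white, pink = (27, 120, 55), (247, 247, 247), (197, 27, 125)
    lo, hi = (pink, white) if similarity < 0.5 else (white, green)
    t = similarity * 2 if similarity < 0.5 else (similarity - 0.5) * 2
    # Linear interpolation per RGB channel between the two endpoints.
    return tuple(round(l + (h - l) * t) for l, h in zip(lo, hi))
```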
4.5 Visual Feature Selection
In many situations, the analyst is not interested in fol-
lowing the data stream in real-time. However, in some
cases a summary of the current data stream should be pro-
vided. Fully-automated summarizations are hard to achieve
for complex heterogeneous data streams. Therefore, we pro-
vide a visual feature selection interface, to steer the merging
algorithm based on the user’s criteria.
All count features in Table 1 can directly be used in the
feature timelines in Figure 2. More features can be derived
from key-value lists, for example, the occurrences over time
of a specific word found in the stream. Each feature time-
line contains many values, one value for each sliding slice
observed so far. This data is processed on the server side
and each feature timeline is cut into segments: Each time-
line is clustered using the DBSCAN algorithm. Afterwards,
consecutive slices belonging to the same cluster are merged
to a segment. The start and end points of these possibly
important segments are visible as vertical colored lines and
through the background shading within the timelines. The
analyst can visually interpret these segments, modify them,
or add new segments for interesting parts, which were not
detected by the algorithm. The analyst can remove or re-
order the features using drag and drop.
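The segmentation described above can be sketched with a small one-dimensional DBSCAN over a feature timeline (one value per sliding slice), followed by merging consecutive same-cluster slices into segments. The `eps`/`min_pts` parameters and the value-space distance are illustrative assumptions; the server-side implementation may differ.

```python
def dbscan_1d(values, eps=1.0, min_pts=2):
    """Label each timeline value with a cluster id; -1 marks noise."""
    labels = [None] * len(values)  # None = unvisited
    cluster = -1
    for i in range(len(values)):
        if labels[i] is not None:
            continue
        neighbors = [j for j in range(len(values)) if abs(values[j] - values[i]) <= eps]
        if len(neighbors) < min_pts:
            labels[i] = -1  # noise (may be absorbed by a later cluster)
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:  # expand the cluster from density-reachable points
            j = queue.pop()
            if labels[j] in (None, -1):
                was_noise = labels[j] == -1
                labels[j] = cluster
                if not was_noise:
                    nbrs = [k for k in range(len(values)) if abs(values[k] - values[j]) <= eps]
                    if len(nbrs) >= min_pts:
                        queue.extend(nbrs)
    return labels

def segments(labels):
    """Merge consecutive slices with the same label into (start, end, label)."""
    out = []
    for i, lab in enumerate(labels):
        if out and out[-1][2] == lab:
            out[-1] = (out[-1][0], i, lab)
        else:
            out.append((i, i, lab))
    return out
```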
Figure 2: Visual Feature Selection: The analyst is in
the loop to steer the merging algorithm to provide
meaningful summaries of sliding slices.
The final feature order and selection are sent to the REST
service, where all segments are merged together under the
given constraints, ignoring low-ranked conflicting fea-
tures and keeping non-conflicting and more specific segments.
Eventually, the original sliding slices can be compressed
according to the resulting heuristic merge and importance
model. Less important segments are merged together pro-
viding a multi-focal scaling of the data stream steered by
the analyst according to the tasks at hand.
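The compression step can be illustrated as follows, under the assumption that the selected segments mark certain slice indices as important and that every maximal run of unimportant slices collapses into one summary slice; the data model is hypothetical.

```python
def compress(n_slices, important):
    """important: set of slice indices covered by selected segments.
    Returns a list of (start, end) ranges; kept slices appear as (i, i),
    merged summary slices as wider ranges."""
    out, i = [], 0
    while i < n_slices:
        if i in important:
            out.append((i, i))  # keep important slices individually
            i += 1
        else:
            j = i
            while j + 1 < n_slices and j + 1 not in important:
                j += 1
            out.append((i, j))  # one merged summary slice for the whole run
            i = j + 1
    return out
```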

5. EVALUATION
In general, it is quite challenging to evaluate complex vi-
sual analytics applications. Individual design decisions can
be formally evaluated in user studies and many decisions are
indeed based on perception studies. However, proper evalua-
tion of complex expert applications is more than to evaluate
all individual design decisions. Describing convincing use
cases or presenting case studies with experts are often the
only reasonable ways. However, these results are often
subjective and hard to compare to alternative approaches.
Another reason is that “insight, the major aim of visual
analytics, is ill-defined and hard to measure” [17]. This is
even more true if we are talking about a mental state of sit-
uational awareness as the goal of the system. Generally, there
is also a lack of proper ground truth, and the sensitive na-
ture of the involved data streams makes it hard to share the
data. In that respect, international challenges that pro-
vide complex but anonymized data streams are very helpful
for a proper evaluation based on gained insights.
With this in mind, we decided to pursue two directions
of evaluation. Firstly, we describe a case study of how our
system can be used in the operational computer network of
a working group to help the system administrator stay
informed about the most important activities. Secondly, to
evaluate the real-time capabilities of our system and the in-
sights management, we actively participated in the VAST Chal-
lenge 2014 with an early version of our prototype.
5.1 Application for Network Security
To show the capabilities of our system, we deployed
it in the computer network of a working group with
about 85 active local devices including workstations, mobile
devices, and servers, producing about 1.4 million NetFlow
records per day with peaks up to 10 000 records per minute.
13 servers are connected to a central syslog server, produc-
ing 30 000 to 80 000 messages per day with individual peaks
of up to 5 000 messages per minute. These servers are also
monitored using OSSEC [4], which is a widely used “host-
based intrusion detection system that performs log analysis,
file integrity checking, policy monitoring, rootkit detection,
real-time alerting and active response”. The generated alerts
are also pushed to the central syslog server. With this infras-
tructure in place, we were able to forward the data streams
to our REST Service to make them available for NStrea-
mAware. In the following, we made use of the system log
stream (SL), NetFlow stream (NF), and OSSEC alert stream
(OS). It would be easy to further include additional data
from the underlying network, for example, system metrics,
Snort alerts, or web server access logs.
The analyst opened the web application NVisAware in a
modern web browser and added the data streams as jobs to
the server-side REST Service. Seconds later, the first mes-
sages appeared in the Real-Time Data Streams tab as seen
in Figure 3. This view is a split-screen showing the real-
time events of SL and OS as textual messages, similar to a
traditional tail -f command on UNIX systems. The bottom
window presents a zoomable geographical map to plot and
cluster extracted geographic locations. NF records are not
plotted to the geographic map, because a geographic map of
the total IP traffic will most likely not provide actionable in-
sights. However, mapping specific IP addresses of successful
logins can be worth monitoring to identify suspicious be-
havior or to reveal misuse of login credentials. Furthermore,
Figure 3: Real-Time Data Stream: Display to mon-
itor the incoming live streams as raw messages and
plot extracted geographic locations to a map.
real-time filtering and search can be applied to reduce the
number of live events shown in the display.
The Spark Service was operated in local mode on a stan-
dard workstation (Dell OptiPlex 980, Core i7-860, 4× 2.80 GHz,
8 GB RAM) with 10 separate worker threads. To provide
further scalability the service could also be deployed to a
cluster of hardware machines running Apache Spark or to a
cloud-based deployment. To provide a new sliding slice every
30 seconds, we initialized the system with a batch and slide
interval of 30s and a window length of 60s. These settings
depend on the general characteristics of the data streams.
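With a batch/slide interval of 30 s and a window length of 60 s, each slice covers 60 s of the stream and overlaps its predecessor by 30 s. A pure-Python sketch of the resulting window boundaries (the real windowing is done by Spark Streaming):

```python
def window_bounds(stream_start, stream_end, slide=30, length=60):
    """Yield (window_start, window_end) pairs in seconds: one sliding
    slice per slide step, each spanning the full window length."""
    end = stream_start + length
    while end <= stream_end:
        yield (end - length, end)
        end += slide

# First two minutes of a stream: three overlapping 60 s slices.
ws = list(window_bounds(0, 120))
```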
To reduce the cognitive load, the analyst decided to switch
to the real-time sliding slices visualization as seen in Figure 4
showing an example of five consecutive slices. The interac-
tive display can be explored by the analyst while new slices
are continuously added to the right in regular intervals to
support situational awareness. The first slice contains criti-
cal OSSEC alerts (L5, L10, L3) visualized in a small treemap
widget (1). Alerts with a severity of 10 should warn the an-
alyst of ongoing security issues, which should be explored
using drill-down functions. Those alerts are related to au-
thentication issues as seen in the word cloud (2). Another
treemap widget in the first slice (3) gives an overview of
involved programs. The third slice suddenly reveals high
port usage (4), which can be recognized in the port counter.
The treemap of source hosts (5) reveals the originating host. The
analyst can use the IP-Port node-link diagram based on NF
(6) to visually explore those suspicious connections.
Later on, the analyst decided to not look on all sliding
slices, but to compress the view based on specific features.
Figure 5 shows that the analyst is interested in slices with
highly critical OSSEC alerts of level 10, segments based on
the number of syslog messages received, and based on the
number of destination ports utilized in the computer net-
work. Based on this selection the slices are merged accord-
ingly. (1) relates to the segments relating to a port scan.
After that, there were no important slices according to the
feature selection, so a long time span is merged to a single
summary slice (2). The analyst was also interested in the
message drop in (3). Then various OSSEC alerts occurred
in multiple sliding slices (4). This area seams to be highly
suspicious, leading to many individual summary slices to
provide more details. Eventually, there are further suspi-
cious events based on NF data in (5) and another peak with
OSSEC alerts in (6) related to invalid SSH logins.
References (selection)
[1] C. C. Aggarwal, J. Han, J. Wang, and P. S. Yu. A framework for clustering evolving data streams.
[2] M. Zaharia, T. Das, H. Li, S. Shenker, and I. Stoica. Discretized streams: An efficient and fault-tolerant model for stream processing on large clusters.
[6] M. R. Endsley. Toward a theory of situation awareness in dynamic systems.
[9] U. Franke and J. Brynielsson. Cyber situational awareness – a systematic review of the literature.
[15] H. Shiravi, A. Shiravi, and A. A. Ghorbani. A survey of visualization systems for network security.
Frequently Asked Questions (14)
Q1. What contributions have the authors mentioned in the paper "NStreamAware: real-time visual analytics for data streams to enhance situational awareness"?

To provide monitoring and visual analysis of such data streams, the authors propose a system, called NStreamAware, that uses modern distributed processing technologies to analyze streams using stream slices, which are presented to analysts in a web-based visual analytics application, called NVisAware. Furthermore, the authors visually guide the user in the feature selection process to summarize the slices and focus on the most interesting parts of the stream, based on expert knowledge introduced by the analyst. The authors show through case studies how the system can be used to gain situational awareness and eventually enhance network security. Furthermore, the authors apply the system to a social media data stream to compete in an international challenge and evaluate the applicability of their approach to other domains.

However, the system still needs to be applied to a larger computer network, which is part of the future work. Automatically determining good sizes for the sliding windows is also planned for the future. The merging model based on the feature selection process could be applied to the real-time stream in the future, to actually merge sliding slices in real time, which is not fully implemented yet. Tracking individual events over time was not the focus of this work; however, further work to extend the approach in that respect seems promising.

Apache Spark introduces a programming model called Resilient Distributed Datasets (RDDs), which provides an interface to coarse-grained transformations (e.g., map, group-by, filter, join).
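The idea behind coarse-grained transformations is that each operation applies to the dataset as a whole rather than mutating individual records. The following plain-Python sketch illustrates that style on a toy syslog-like dataset; it is not Spark code, and the helper names are invented for illustration:

```python
# Illustrative sketch of RDD-style coarse-grained transformations
# (map, filter, group-by) in plain Python. Not actual Spark code;
# function names are hypothetical.

from collections import defaultdict

def rdd_map(data, fn):
    # Apply fn to every element, producing a new dataset.
    return [fn(x) for x in data]

def rdd_filter(data, pred):
    # Keep only elements satisfying the predicate.
    return [x for x in data if pred(x)]

def rdd_group_by(data, key_fn):
    # Group elements by a key, as Spark's groupBy would.
    groups = defaultdict(list)
    for x in data:
        groups[key_fn(x)].append(x)
    return dict(groups)

# Example: (host, process) records -> message count per host
records = [("hostA", "sshd"), ("hostB", "cron"), ("hostA", "kernel")]
by_host = rdd_group_by(records, key_fn=lambda r: r[0])
counts = {host: len(msgs) for host, msgs in by_host.items()}
# counts == {"hostA": 2, "hostB": 1}
```

In Spark these transformations are additionally lazy and distributed across a cluster, which is what makes the model suitable for high-volume streams.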

The ultimate goal of visual analytics systems for data streams is to enhance situational awareness to facilitate decision making. 

Because of the unpredictable characteristics of data streams with respect to volume, velocity, variety, and veracity, the authors additionally need visualizations that can decouple the flow rate of a data stream from screen updates and keep the latter constant and predictable, so as not to overwhelm the user.
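One simple way to realize this decoupling (an illustrative sketch with hypothetical names, not the authors' implementation) is to buffer incoming events as they arrive and refresh the display on a fixed timer, independent of the arrival rate:

```python
# Sketch: decouple the stream's (unpredictable) arrival rate from
# screen updates by buffering events and refreshing at a fixed rate.
# Class and method names are invented for illustration.

from collections import deque

class DecoupledView:
    def __init__(self, max_buffer=10_000):
        # Bounded buffer: oldest events are dropped on overflow.
        self.buffer = deque(maxlen=max_buffer)

    def on_event(self, event):
        """Called at the unpredictable rate of the stream."""
        self.buffer.append(event)

    def refresh(self):
        """Called at a constant rate (e.g., once per second) by a UI timer."""
        batch = list(self.buffer)
        self.buffer.clear()
        return batch  # render this batch; update rate is bounded by the timer

view = DecoupledView()
for i in range(5):
    view.on_event(i)
frame = view.refresh()
# frame == [0, 1, 2, 3, 4]; a second refresh with no new events is empty
```

The bounded buffer additionally caps memory use during bursts, at the cost of dropping the oldest unrendered events.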

Their architecture consists of a REST Service, a Spark Service, and a web application with various visualizations, called NVisAware.

To provide a new sliding slice every 30 seconds, the authors initialized the system with a batch and slide interval of 30s and a window length of 60s. 
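With these parameters, every 60 s window overlaps its predecessor by one 30 s batch. The resulting slice boundaries can be sketched as follows (a minimal illustration with a hypothetical helper, not the authors' code):

```python
# Sketch of sliding-window boundaries for a 60 s window length and a
# 30 s slide interval, matching the parameters described above.
# Illustrative only; helper name is hypothetical.

def window_starts(stream_length_s, window_s=60, slide_s=30):
    """Yield (start, end) boundaries of each complete sliding window."""
    t = 0
    while t + window_s <= stream_length_s:
        yield (t, t + window_s)
        t += slide_s

slices = list(window_starts(150))
# slices == [(0, 60), (30, 90), (60, 120), (90, 150)]
```

Each consecutive pair of slices shares a 30 s overlap, so every batch of events contributes to exactly two sliding slices.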

To provide further scalability, the service could also be deployed to a cluster of hardware machines running Apache Spark or to a cloud-based deployment.

Thirteen servers are connected to a central syslog server, producing 30,000 to 80,000 messages per day, with individual peaks of up to 5,000 messages per minute.

The Spark Service was operated in local mode with 10 separate worker threads on a standard workstation (Dell OptiPlex 980, Core i7-860 quad-core at 2.80 GHz, 8 GB RAM).

When displaying hundreds of sliding slices at the same time, performance decreased because of browser and memory restrictions on the workstation.

A first analysis had to be sent to the organizers within three hours of first connecting to the final data stream, which was available from 20:00 to 21:30. The stream could be played only once, forcing participants to do real-time processing and provide an immediate situational assessment under time pressure.

Because of an ongoing conflict involving an organization known as the Protectors of Kronos (POK), the group is suspected in the disappearance.

The slice will also be added to the Event Timeline & Insights view, where all starred objects are presented in a traditional interactive timeline to explore the events flagged and labeled by the analysts.