Proceedings ArticleDOI

Blurring snapshots: Temporal inference of missing and uncertain data

20 May 2010, pp. 40-50
TL;DR: This paper defines a decay function and a set of inference approaches to filling in missing and uncertain data in this continuous query, and evaluates the usefulness of this abstraction in its application to complex spatio-temporal pattern queries in pervasive computing networks.
Abstract: Many pervasive computing applications continuously monitor state changes in the environment by acquiring, interpreting and responding to information from sensors embedded in the environment. However, it is extremely difficult and expensive to obtain a continuous, complete, and consistent picture of a continuously evolving operating environment. One standard technique to mitigate this problem is to employ mathematical models that compute missing data from sampled observations thereby approximating a continuous and complete stream of information. However, existing models have traditionally not incorporated a notion of temporal validity, or the quantification of imprecision associated with inferring data values from past or future observations. In this paper, we support continuous monitoring of dynamic pervasive computing phenomena through the use of a series of snapshot queries. We define a decay function and a set of inference approaches to filling in missing and uncertain data in this continuous query. We evaluate the usefulness of this abstraction in its application to complex spatio-temporal pattern queries in pervasive computing networks.

Summary (4 min read)

Introduction

  • The emergence of pervasive computing is characterized by increased instrumentation of the physical world, including small sensing devices that allow applications to query a local area using a dynamic and distributed network for support.
  • First, the authors must be able to provide estimates of the continuous query result between adjacent snapshot queries.
  • The authors' approach relies on a simple abstraction called a decay function (Section III) that quantifies the temporal validity associated with sensing a particular phenomenon.
  • The inference and its associated confidence can also provide the application a concrete sense of what the degree of the uncertainty is.

II. BACKGROUND

  • This paper builds on their previous approaches defining snapshot and continuous query fidelity and an associated middleware [15], [18].
  • These approaches approximate a continuous query using a sequence of snapshot queries evaluated over the network at discrete times.
  • Using the values of the host triple, the authors can derive physical and logical connectivity relations.
  • As one example, if the host’s context, ζ, includes the host’s location, the authors can define a physical connectivity relation based on communication range.
  • The environment evolves as the network changes, values change, and hosts exchange messages.

III. MODELING UNCERTAINTY

  • The authors' approach to query processing allows users to pose continuous queries to an evolving network and receive a result that resembles a data stream even though it is obtained using discrete snapshot queries.
  • Missing and uncertain sensed items can be a bane to this process, especially in monitoring the evolution of the data.
  • On a construction site, a site supervisor may use a continuous query to monitor the total number of available bricks on the site.
  • When their snapshot queries are not impacted by any missing or uncertain data, the stable set the trend analysis generates is the actual stable set.
  • The black circles represent hosts the snapshot query directly sampled; gray circles represent hosts for which data values have been inferred.

IV. TEMPORAL INFERENCE FOR CONTINUOUS QUERIES

  • Decay functions allow applications to define the validity of projecting information across time.
  • The authors now address the question of what the value of that projected data should be.
  • Specifically, the authors present a suite of simple techniques that estimate inferred values.
  • The authors also demonstrate how this inference can be combined with decay functions to associate confidence with inferred values.
  • In later sections, the authors evaluate the applicability of these inference approaches to real phenomena.

A. Nearest Neighbor Inference

  • For some applications, data value changes may be difficult to predict, for instance when the underlying process observed is unknown or arbitrary.
  • In such cases, one technique to estimate missing data is to assume the sampled value closest in time is still correct.
  • The application then sums across all of the data readings to generate a total number of bricks on the site.
  • In interpolation, the observed values are fit on a function, where the domain is typically the time of observation and the range is the attribute’s value.
  • As with interpolation, regression comes in several flavors ranging from simple techniques like linear regression to more complex non-linear variants.

C. Determining Inferencing Error

  • When employing statistical techniques like interpolation and regression, the observed data acts as the only source of ground truth and serves as input to generate a function that estimates missing or uncertain data.
  • To measure how well the model fits the ground truth, the authors define metrics that estimate the distance between the model and reality.
  • Similar measures of error for interpolation are difficult to define because the interpolation function is defined such that there is no error in fitting the sampled points to the function.
  • As Fig. 7 demonstrates, wildly different interpolation functions (e.g., polynomials of different orders) can fit the same set of sampled points.
  • That is, from among the interpolation functions that “fit” the data, the authors favor those that minimize the function’s rate of change (see the sketch below).
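A minimal sketch of these two error measures, assuming Python with numpy and scipy; the names regression_rmse and smoothest_interpolant are illustrative labels rather than the paper's own:

```python
import numpy as np
from scipy.interpolate import interp1d

def regression_rmse(times, values, degree=1):
    """Root mean squared error of a least-squares polynomial regression
    against the samples it was fit to (an error measure for regression)."""
    coeffs = np.polyfit(times, values, degree)
    predicted = np.polyval(coeffs, times)
    return float(np.sqrt(np.mean((np.asarray(values) - predicted) ** 2)))

def smoothest_interpolant(times, values, kinds=("linear", "quadratic", "cubic")):
    """Interpolants fit the samples exactly, so there is no residual error
    to compare; instead favor the candidate whose rate of change (mean
    absolute slope) over the sampled interval is smallest."""
    min_points = {"linear": 2, "quadratic": 3, "cubic": 4}
    grid = np.linspace(min(times), max(times), 200)
    best_kind, best_slope = None, None
    for kind in kinds:
        if len(times) < min_points[kind]:
            continue  # not enough samples for this interpolant
        f = interp1d(times, values, kind=kind)
        slope = np.abs(np.gradient(f(grid), grid)).mean()
        if best_slope is None or slope < best_slope:
            best_kind, best_slope = kind, slope
    return best_kind
```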

D. Computing Confidence from Decay and Error

  • The authors compute confidence by combining decay functions with the error measures defined above.
  • If the application is inferring values between samples with low confidence, it should likely increase the frequency of sampling.
  • Things are slightly more complicated for interpolation and regression.
  • In addition to applying the decay function to the area between successful samples, the authors can also use information about the estimated inferencing error to strengthen or weaken their confidence in inferred values.
  • Minimizing error measures such as the root mean squared error of a regression or the derivative of an interpolation can increase the confidence in an inferred value (one possible combination is sketched below).
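The excerpt does not spell out the exact combination rule, so the following is only a plausible illustration of discounting the decay-based confidence by an estimated inferencing error; the paper's actual formula may differ:

```python
def combined_confidence(decay_value, inference_error, error_scale=1.0):
    # Illustrative only: zero error leaves the decay-based confidence
    # untouched; a large error (relative to error_scale) drives it toward 0.
    return decay_value * (1.0 / (1.0 + inference_error / error_scale))
```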

V. USAGE SCENARIOS

  • This approach to inferring missing and uncertain data applies to a wide variety of pervasive computing applications that are increasingly supported by sensor networks.
  • In the introduction, the authors overviewed an intelligent construction site, where people, vehicles, pieces of equipment, parts of the building, and assets are all equipped with sensors that can monitor conditions on the site and share information with applications.
  • Broadly speaking, given a series of snapshot queries formed into a continuous query, an application can issue two types of requests for information from the continuous query: point requests, for a value of the continuous query at a single point in time, and range requests, which monitor the continuous query over a specified period of time (see the sketch after this list).
  • It may prove difficult to use a continuous function to infer missing values for this type of phenomenon; the authors also define a third query, Q3, that more intuitively fits a continuous function.
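A small illustration of the two request types served from a blurred continuous query, assuming Python; the class and method names are invented for this sketch, and nearest-neighbor inference stands in for whichever model the application would actually use:

```python
from bisect import bisect_left

class BlurredContinuousQuery:
    """Toy view over a sorted list of (time, value) snapshot results."""

    def __init__(self, snapshots):
        self.snapshots = sorted(snapshots)
        self.times = [t for t, _ in self.snapshots]

    def point_request(self, t):
        """Value of the continuous query at a single instant t."""
        i = bisect_left(self.times, t)
        candidates = [j for j in (i - 1, i) if 0 <= j < len(self.times)]
        nearest = min(candidates, key=lambda j: abs(self.times[j] - t))
        return self.snapshots[nearest][1]

    def range_request(self, start, end, step):
        """Monitor the continuous query over the period [start, end]."""
        out, t = [], start
        while t <= end:
            out.append((t, self.point_request(t)))
            t += step
        return out

# q = BlurredContinuousQuery([(0, 120), (10, 90), (20, 95)])
# q.point_request(7)          -> 90 (closest snapshot is at t = 10)
# q.range_request(0, 20, 5)   -> values at t = 0, 5, 10, 15, 20
```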

VI. EVALUATION

  • The authors have prototyped their framework using OMNeT++ and the MiXiM framework [12], [13].
  • The authors implemented the queries given in the previous section and evaluate their framework’s performance.
  • Requests are flooded through the network, and each node has a reply probability of 0.5, i.e., every sensor node responds to half of the requests it receives (a toy version is sketched after this list).
  • Each experiment was repeated for at least 50 runs.
  • Trucks moved randomly; when they left the construction site, bricks were randomly added to or removed from the truck.
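A toy reproduction of how this setup induces missing data, assuming Python; the names are illustrative and not taken from the OMNeT++/MiXiM implementation:

```python
import random

def run_snapshot_query(ground_truth, reply_probability=0.5, rng=random):
    """One flooded snapshot query: every node receives the request, but
    each replies only with the given probability, so roughly half of the
    readings are missing from the result."""
    return {node: value for node, value in ground_truth.items()
            if rng.random() < reply_probability}

# Example: with 8 pallet sensors, a single snapshot typically returns
# about 4 readings; the rest must be inferred or treated as missing.
# rho = run_snapshot_query({f"pallet-{i}": 100 for i in range(8)})
```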

A. Measuring Confidence

  • The authors first evaluate the correctness and usefulness of applying their decay functions to determine confidence in inferred responses.
  • The authors executed all three queries described previously and attempted to infer missing and uncertain data for each of them using each of the three aforementioned inference strategies.
  • Fig. 10 plots the inferencing error versus the confidence reported by their decay function; specifically the figure shows the results for applying linear interpolation to the results of Q3.
  • When their framework reports a higher confidence in an inferred value, the error of that value from the ground truth should be lower.
  • These initial experiments served simply to validate their query inferencing and decay function framework.

B. Cost Savings

  • Employing inference models allows applications to trade expense for error.
  • Given that their models allow them to blur across these dynamics, the authors are able to query the network far less frequently.
  • In addition, instead of querying every node in the network, the authors can intentionally skip some nodes in each snapshot query, also reducing the communication overhead.
  • The authors omit charts plotting the communication overhead of their approach for brevity; however, they achieve approximately a 6x reduction in communication overhead in comparison to a flooding approach that queries the network frequently enough to catch every significant change in the dynamic data (which they estimate to be every 5 seconds in their case).
  • This reduction in communication translates directly to a reduction in energy expenditures, a significant concern in resource constrained pervasive computing networks.

C. Application Performance

  • The authors next evaluate the usefulness of blurring the snapshot queries in forming a continuous query.
  • This is due to the fact that the phenomenon under observation here is subject to very local and discrete data changes.
  • Fig. 12 plots the total number of bricks, an aggregate measure that sums samples from multiple nodes.
  • The authors expected nearest neighbor inference to be preferred in this situation due to the discrete data; it would likely be the application’s choice due to its consistent performance and simplicity in comparison to interpolation.
  • While their decay function provides a strong measure of confidence for individual data estimates, it is possible to develop a more sophisticated metric for confidence in aggregate estimates that combine data values to produce a combinatorial measure (e.g., a sum, average, maximum, etc.).

VIII. CONCLUSIONS

  • Pervasive computing is increasingly supported by sensor networks that perform continuous monitoring of network or physical phenomena.
  • Continuous monitoring can be expensive in terms of communication and energy costs.
  • Therefore, continuous queries in pervasive computing applications inevitably contain missing or uncertain data items.
  • Attempting to understand trends and patterns in measuring these continuous phenomena is hindered by this inherent uncertainty.
  • The authors designed a framework that employs statistical modeling to both infer missing and uncertain data and to understand the degree of confidence an application should place in that inferred information.


Blurring Snapshots: Temporal
Inference of Missing and
Uncertain Data
TR-UTEDGE-2009-005
Vasanth Rajamani, The University of Texas at Austin
Christine Julien, The University of Texas at Austin
© Copyright 2009
The University of Texas at Austin

Blurring Snapshots: Temporal Inference of
Missing and Uncertain Data
Vasanth Rajamani and Christine Julien
Department of Electrical and Computer Engineering
The University of Texas at Austin
{vasanthrajamani,c.julien}@mail.utexas.edu
Abstract—Many pervasive computing applications continu-
ously monitor state changes in the environment by acquiring,
interpreting and responding to information from sensors embed-
ded in the environment. However, it is extremely difficult and
expensive to obtain a continuous, complete, and consistent picture
of a continuously evolving operating environment. One standard
technique to mitigate this problem is to employ mathematical
models that compute missing data from sampled observations
thereby approximating a continuous and complete stream of
information. However, existing models have traditionally not
incorporated a notion of temporal validity, or the quantification
of imprecision associated with inferring data values from past
or future observations. In this paper, we support continuous
monitoring of dynamic pervasive computing phenomena through
the use of a series of snapshot queries. We define a decay
function and a set of inference approaches to filling in missing
and uncertain data in this continuous query. We evaluate the
usefulness of this abstraction in its application to complex spatio-
temporal pattern queries in pervasive computing networks.
Keywords-sensor networks, queries, dynamics, interpolation
I. INTRODUCTION
As applications place an increased focus on using dis-
tributed embedded networks to monitor both physical and
network phenomena, it becomes necessary to support efficient
and robust continuous monitoring that can communicate the
uncertainty associated with data collected from a dynamic net-
work. The emergence of pervasive computing is characterized
by increased instrumentation of the physical world, including
small sensing devices that allow applications to query a local
area using a dynamic and distributed network for support. On
the roadways, all vehicles may be equipped with devices that
sense and share location, and that information can be queried
by other nearby vehicles to understand traffic flow patterns.
On an intelligent construction site, workers, equipment, assets,
and even parts of buildings may be equipped with sensors
to measure location, temperature, humidity, stress, etc., with
the goal of generating meaningful pictures of the project’s
progress and maintaining safe working conditions.
Central to these and other applications is the ability to
monitor some condition and its evolution over a period of
time. On a construction site, the amount of an available
material at a particular time may be useful, but it may be
just as useful to monitor how that material is consumed
(and resupplied) over time. Such trends are usually measured
through continuous queries that are often registered at the
remote information sources and periodically push sensed data
back to the consumers [2], [9]. Such a “push” approach to
continuous query processing requires maintaining a distributed
data structure, which can be costly in dynamic settings. In
addition, this often requires that a query issuer interact with
a collector that is known in advance and reachable at any
instant, which is often unreasonable. We have demonstrated
that, in dynamic networks, it often makes sense to generate a
continuous query using a sequence of snapshot queries [18].
A snapshot query is distributed through the network at a
particular point in time, takes measurements of the target
phenomenon, and sends the results back to the query issuer.
In our model (Section II), a continuous query is the integration
over time across a sequence of snapshot queries.
In generating a continuous and accurate reflection of an
evolving environment, uncertainty is introduced in several
ways [15], [16]. First, there is a significant tradeoff between
the cost of generating the continuous query result and the
quality of the result. For instance, the more frequently the
snapshot queries execute, the more closely the continuous
query reflects the ground truth, but the more expensive it is
to execute in terms of communication bandwidth and battery
power. In addition, the snapshot queries can be executed
using different protocols that consider the same tradeoff (e.g.,
consider the differences in quality and cost of a query flooded
to all hosts in the network and one probabilistically gossiped
to some subset). On a more fundamental level, the quality of
any interaction with a dynamic network is inherently affected
by the unreliability of the network—packets may be dropped
or corrupted, and communication links may break. The fact
that a continuous query fails to sense a value at a particular
instant may simply be a reflection of this inherent uncertainty.
Even when these uncertainties weaken a continuous query,
applications can still benefit if the query processing can
provide some knowledge about the degree of the uncertainty.
For example, in a continuous query on a construction site for
the amount of available material, it would be useful to know
that, with some degree of certainty (i.e., a confidence) there
is a given amount of available material. This may be based
on information collected directly from the environment (in
which case the confidence is quite high), historical trends, or
knowledge about the nature of the phenomenon. Model-driven
approaches that estimate missing data using mathematical
models can alleviate these uncertainties [6], [7]. In these
approaches, the goal is to build a model of the phenomenon
being observed and to only query the network to rebuild the

model when the confidence in the model has degraded to
make relying on it unacceptable. Section VII examines these
approaches and the relationship to our work in more detail.
Because we build a continuous query from a sequence of
snapshot queries, handling uncertainty is twofold. First, we
must be able to provide estimates of the continuous query
result between adjacent snapshot queries. Second, even if we
fail to sample a data point in a given snapshot, we may
have some information about that data point at a previous
time (and potentially a future time) that we may use to infer
something about the missing data. In both cases, we are not
actually changing the amount of information available to the
application; instead we are blurring the snapshot queries and
associating a level of confidence with inferred results.
Our approach relies on a simple abstraction called a decay
function (Section III) that quantifies the temporal validity
associated with sensing a particular phenomenon. We use this
decay function as the basis for performing model-assisted
inference (Section IV) to use sampled data values from the
snapshot queries to infer values into the past and future. This
inference can allow us to fill in gaps in the sequence of snap-
shot queries to enable trend analysis on the components of the
continuous query. The inference and its associated confidence
can also provide the application a concrete sense of what the
degree of the uncertainty is. Finally, by smoothing across the
available data, this inference makes the information that is
available more viewable and understandable by the application
and its user. We examine these benefits in Sections V and VI.
Our novel contributions are threefold. First, we introduce
decay functions that allow applications to define temporal
validity in a principled way. Second, we build a set of simple
statistical models that allow us to effectively blur snapshot
queries into continuous queries and use them to study the use
of model-assisted inference for a variety of different types
of dynamic phenomena. Finally, we demonstrate through an
implementation and evaluation and a set of usage scenarios
the efficacy and usefulness of using inference to fill in missing
data in real world situations. If the network supporting data
collection is highly dynamic, our approaches help mitigate the
impact of the dynamics on the inherent uncertainty; however,
even in less dynamic situations, our approach helps applica-
tions reasonably trade off the cost of executing continuous
queries for the quality of the result.
II. BACKGROUND
This paper builds on our previous approaches defining snap-
shot and continuous query fidelity and an associated middle-
ware [15], [18]. These approaches approximate a continuous
query using a sequence of snapshot queries evaluated over
the network at discrete times. We model a dynamic pervasive
computing network as a closed system of hosts, where each
host has a location and data value (though a single data value
may represent a collection of values). A host is represented as a
triple (ι, ζ, ν), where ι is the host’s identifier, ζ is its context,
and ν is its data value. The context can be simply a host’s
location, but it can be extended to include a list of neighbors,
routing tables, and other system or network information.
The global state of a network, a configuration (C), is a set
of host tuples. Given a host h in a configuration, an effective
configuration (E) is the projection of the configuration with
respect to the hosts reachable from h. Practically, h is a host
initiating a query, and E contains the hosts expected to receive
and respond to the query. To capture connectivity, we define a
binary logical connectivity relation, K, to express the ability
of a host to communicate with a neighboring host. Using the
values of the host triple, we can derive physical and logical
connectivity relations. As one example, if the host’s context,
ζ, includes the host’s location, we can define a physical
connectivity relation based on communication range. K is not
necessarily symmetric; in the cases that it is symmetric, K
specifies bi-directional communication.
The environment evolves as the network changes, values
change, and hosts exchange messages. We model network evo-
lution as a state transition system where the state space is the
set of possible configurations, and transitions are configuration
changes. A single configuration change consists of one of
the following: 1) a neighbor change: changes in hosts’ states
impact the connectivity relation, K; 2) a value change: a single
host changes its stored data value; or 3) a message exchange:
a host sends a message that is received by one or more
neighboring nodes. To refer to the connectivity relation for
a particular configuration, we assign configurations subscripts
(e.g., C_0, C_1, etc.) and use K_i to refer to the connectivity
of configuration C_i. We have also extended K to define
query reachability. Informally, this determines whether it was
possible to deliver a one-time query to and receive a response
from some host h within the sequence of configurations [17].
A snapshot query’s result (ρ) is a subset of a configuration:
it is a collection of host tuples that constitute responses to the
query. No host in the network is represented more than once
in ρ, though it is possible that a host is not represented at all
(e.g., because it was never reachable from the query issuer).
Depending on both the protocol used to execute the snapshot
query (e.g., whether the query was flooded to all hosts in the
network or whether it was gossiped) and inherent network
failures, only a subset of the reachable hosts may respond.
This results in missing and uncertain data in the results of
snapshot queries, which may result in a degradation in the
quality of and confidence in the continuous query’s result.
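One way to encode this model, assuming Python; the type and function names are illustrative rather than part of the paper's formalism:

```python
from dataclasses import dataclass
from typing import Any, Dict, Set, Tuple

@dataclass
class Host:
    iota: str                 # identifier
    zeta: Dict[str, Any]      # context, e.g. {"location": (x, y)}
    nu: Any                   # data value (may stand for a collection)

Configuration = Dict[str, Host]          # the global state C
Connectivity = Set[Tuple[str, str]]      # the relation K (not necessarily symmetric)

def physical_connectivity(config: Configuration, comm_range: float) -> Connectivity:
    """Derive a physical connectivity relation from host locations, as in
    the example where the context zeta includes the host's location."""
    def dist(a: Host, b: Host) -> float:
        (ax, ay), (bx, by) = a.zeta["location"], b.zeta["location"]
        return ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5
    return {(i, j) for i in config for j in config
            if i != j and dist(config[i], config[j]) <= comm_range}

def snapshot_result(config: Configuration, responders: Set[str]) -> Configuration:
    """A snapshot query result rho: the subset of host tuples that actually
    responded; each host appears at most once and some may be missing."""
    return {i: config[i] for i in responders if i in config}
```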
III. MODELING UNCERTAINTY
Our approach to query processing allows users to pose
continuous queries to an evolving network and receive a result
that resembles a data stream even though it is obtained using
discrete snapshot queries. This stream can then be analyzed
to evaluate trends in the sensed data. However, missing and
uncertain sensed items can be a bane to this process, especially
in monitoring the evolution of the data. For example, on a
construction site, a site supervisor may use a continuous query
to monitor the total number of available bricks on the site.
This query may be accomplished by associating a sensor with
each pallet of bricks; the snapshot queries collect the identity
of the pallets and the number of bricks the pallet holds. If
consecutive snapshot queries do not sample the same subset

of pallets, the sums they report are not comparable, resulting
in inconsistent information supplied to the site supervisor.
Consider the continuous query in Fig. 1. The three networks
on the left of the dark line show the results of the continuous
query’s first three snapshot queries. Each circle represents a
host; a circle’s color represents the host’s data value; and
lines represent connectivity. Throughout the continuous query,
some hosts depart, some arrive, and others change their data
value. In this case, the trend the application is analyzing is the
data items that remain available and unchanged throughout the
continuous query. When our snapshot queries are not impacted
by any missing or uncertain data, the stable set the trend
analysis generates is the actual stable set.
Fig. 1: A Continuous Query
Consider, however, what happens when data is missing or
uncertain, as depicted in Fig. 2. In this situation, the ground
truth (i.e., what the snapshot queries should have returned)
is equivalent to that shown in Fig. 1, but due to network
dynamics or other sources of uncertainty, the sample from host
A was not collected in the second snapshot query (ρ_1), and the
sample from host B was not collected in the third snapshot
query (ρ_2). Consequently the result of the trend analysis in
Fig. 2 is quite different from that in Fig. 1. On a construction
site, if the data items represent pallets of bricks, this trend
analysis may cause the site supervisor to have additional
supplies delivered when it is unnecessary or even impractical.
Fig. 2: A Continuous Query with Missing Data
One way to handle this uncertainty is to blur the snapshot
queries. In Fig. 2, given the fact that we know the network
to be dynamic, we can say with some confidence that host A
should have been represented in ρ_1; the level of this confidence
depends on the temporal validity of the phenomenon sensed
(i.e., how long do we expect a data value to remain valid), the
frequency with which the snapshot queries are issued, and the
degree of network dynamics. The fact that A “reappeared” in
ρ_2 further increases our confidence that it may have, in fact,
been present in ρ_1 as well. Fig. 3 shows a simple example
of how this inference can be used to project data values into
future snapshots (e.g., from ρ_1 to ρ_2) and into past snapshots
(e.g., from ρ_1 to ρ_0). In this figure, the black circles represent
hosts the snapshot query directly sampled; gray circles
represent hosts for which data values have been inferred. The
question that remains, however, is how to determine both the
values that should be associated with the inferred results and
the confidence we have in their correctness. We deal with
the former concern in the next section; here we introduce
decay functions to ascribe temporal validity to observations
and calculate confidence in unsampled (inferred) values.
Fig. 3: Projection Forward and Backwards in Time
To address temporal validity, we rely on the intuitive ob-
servation that the closer in time an inferred value is to a
sensed sample, the more likely it is to be a correct inference.
For example, in Fig. 3, the value projected from ρ_0 to ρ_1 is
more likely to be correct than the value projected from ρ_0
to ρ_2. If the sample missing in ρ_1 is also missing in ρ_2, it
becomes increasingly likely that the host generating the sample
has, in fact, departed. We exploit this observation by allowing
applications to specify the temporal validity of different sensed
phenomena using a decay function that defines the validity of
a measured observation as a function of time.
Formally, a decay function is a function d(t) = f(|t - t_l|),
where t is the current time and t_l is a time from either the
future or the past of the nearest (in time) actual sample of the
data value. The period |t - t_l| is the period of uncertainty; the
larger the period of uncertainty, the less likely it is that the
sampled value retains any correlation with the actual value.
The decay function’s value falls between 0 and 1; it is a
measure of percentage likelihood. These decay functions are
an intuitive representation of confidence and are easy for
application developers to grasp. It is also straightforward to
define decay functions to describe a variety of phenomena.
For instance, on a construction site, a moving truck’s GPS
location might be associated with a decay function of the
form d(t) = e^(-|t - t_l|), which is a rapid exponential drop in
confidence over time. On the other hand a GPS mounted on
a stationary sensor on the site might have a decay function of
the form d(t) = 1 because the location value, once measured,
is not expected to change. Possibilities for formulating decay
functions are numerous and depend on the nature of the
phenomenon being sensed and the sensing environment.

Given a user-defined decay function, it is straightforward
to determine a confidence measure of an inferred value. We
measure this confidence probabilistically. At any time instant
t, the inferred data value’s degree of confidence p, is updated
using the following rule.
  • if time t is the time at which an actual data reading was acquired, then the value of p at time t is set to 1;
  • otherwise, p is updated using the formula p_t = d(t).
Thus, at every point in time, a data value of interest has an
imprecision that ranges from one to zero depending on when
it was last sampled. The further in time the inferred value is
from an actual sensed value, the less confidence it has. With
this understanding, we look next at how to estimate how a
sampled value may have changed during periods where it is
not sampled, allowing us to infer its value.
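A minimal sketch of the decay functions and the confidence update rule above, assuming Python; function names are illustrative:

```python
import math

# Two decay functions from the text: a rapid exponential drop for a moving
# truck's GPS location, and a constant for a stationary sensor's location.
def truck_location_decay(t, t_l):
    return math.exp(-abs(t - t_l))

def stationary_location_decay(t, t_l):
    return 1.0

def confidence(t, sample_times, decay):
    """p = 1 when t coincides with an actual reading; otherwise p = d(t),
    computed against the nearest (in time) actual sample t_l."""
    if t in sample_times:
        return 1.0
    t_l = min(sample_times, key=lambda s: abs(s - t))
    return decay(t, t_l)

# confidence(7, [0, 10], truck_location_decay)       ~= exp(-3) ~= 0.05
# confidence(7, [0, 10], stationary_location_decay)  == 1.0
```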
IV. TEMPORAL INFERENCE FOR CONTINUOUS QUERIES
Decay functions allow applications to define the validity of
projecting information across time. We now address the ques-
tion what the value of that projected data should be. Specif-
ically, we present a suite of simple techniques that estimate
inferred values. We also demonstrate how this inference can be
combined with decay functions to associate confidence with
inferred values. In later sections, we evaluate the applicability
of these inference approaches to real phenomena.
A. Nearest Neighbor Inference
For some applications, data value changes may be difficult
to predict, for instance when the underlying process observed
is unknown or arbitrary. These changes are usually discrete;
at some instant in time, the value changes to some potentially
unpredictable value. Consider a construction site where pallets
of bricks are distributed to different locations around the site
for storage and use. A distributed query may execute across the
site, measuring how many bricks are present at each location
at query time. The bricks are laid and restocked during the
day as trucks and construction workers perform their tasks.
Without any knowledge of the project’s goals and the rate of
brick laying at different sites, it is difficult to create a model
that effectively estimates the number of bricks at any given
location for instants that have no recorded observations.
In such cases, one technique to estimate missing data is to
assume the sampled value closest in time is still correct. As the
temporal validity decays, the sensed value is increasingly un-
reliable. Consider again the pallets of bricks on a construction
site and an application that samples the number of available
bricks periodically (e.g., every 10 minutes). The application
then sums across all of the data readings to generate a total
number of bricks on the site. Fig. 4 shows an example where
the value for the number of pallets at node A changes between
the two samples. Up until t = 5, the total number of pallets is
estimated using the original sample; after that, it is assumed
that the value is the sample taken at t = 10.
Fig. 4: Nearest Neighbor Inference for Uncertain Data

The example in Fig. 4 focuses on uncertain data; i.e.,
inferring data values that the application did not attempt to
sample. The same approach can be used to infer missing data,
e.g., if the application failed to sample a value for node A
at time t = 10 but did resample it at time t = 20. This
example also demonstrates the importance of inferring missing
data. Because this data is used to monitor the total number of
pallets of bricks on the site, if data values are missing from a
particular snapshot, the site supervisor might observe radical
fluctuations in the number of bricks that actually did not occur.
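A minimal sketch of nearest neighbor inference as just described, assuming Python; the per-node sample structure and function names are illustrative:

```python
def nearest_neighbor_value(t, samples):
    """samples: {time: value} of actual readings for one node; return the
    value of the sample closest in time to t (hold the nearest sample)."""
    t_l = min(samples, key=lambda s: abs(s - t))
    return samples[t_l]

def total_bricks(t, per_node_samples):
    """Sum the inferred per-node counts at time t, e.g. to estimate the
    total number of bricks on the site even when some nodes were not
    sampled at exactly time t."""
    return sum(nearest_neighbor_value(t, s) for s in per_node_samples.values())

# With samples for node A at t = 0 and t = 10, the estimate holds the
# earlier value until the midpoint t = 5 and the later value afterwards,
# matching the behavior described for Fig. 4.
```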
B. Interpolation and Regression
The evolution of many pervasive computing phenomena
can be fairly accurately represented by continuous functions.
If a truck is driving at a steady speed across the site, and
we sample its location at t = 0 and t = 10 it may be
reasonable to infer that at t = 5, the truck was at the midpoint
of a line drawn between the two sample points. In such
cases, standard statistical techniques like interpolation and
regression can be employed to infer data across snapshots.
In interpolation, the observed values are fit on a function,
where the domain is typically the time of observation and
the range is the attribute’s value. For any point in time where
there is no recorded observation, the value is estimated using
the function. Interpolation approaches range from simple (e.g.,
linear interpolation) to complex (e.g., spline interpolation).
Linear interpolation connects consecutive observations of
a data item with a line segment. Polynomial interpolation
generalizes the function to a degree higher than one; in general,
one can fit a curve through n data points using a function of
degree n - 1. Spline interpolation breaks the set of data points into
subsets, and applies polynomial interpolation to each subset.
Fig. 5 shows an example of interpolation. The data values
sensed are the locations of the devices on a 3x4 grid; the
moving truck’s data is missing from snapshots ρ_1 and ρ_3. The
bottom figures show how linear interpolation and an example
of polynomial interpolation estimate the missing data.
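A brief sketch of linear and polynomial interpolation of a missing sample, assuming Python with numpy; the sample times and positions are made up for illustration:

```python
import numpy as np

def linear_interpolate(t, times, values):
    """Connect consecutive observations with line segments and read off
    the value at time t."""
    return float(np.interp(t, times, values))

def polynomial_interpolate(t, times, values):
    """Fit a single polynomial of degree n-1 through all n observations
    and evaluate it at time t."""
    coeffs = np.polyfit(times, values, deg=len(times) - 1)
    return float(np.polyval(coeffs, t))

# Example: a truck's x-coordinate sampled at times 0, 20, and 40; infer
# its position for the snapshots at times 10 and 30 that missed it.
times, xs = [0, 20, 40], [0.0, 2.0, 4.0]
linear_interpolate(10, times, xs)       # ~ 1.0
polynomial_interpolate(30, times, xs)   # ~ 3.0
```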
Regression identifies relationships between a dependent
sensed variable (e.g., location or temperature at a particular
device) and an independent variable (e.g., time). However,
regression does not try to fit a curve or a function through
every observed data point. Instead, the end result of regression
encodes an approximation of the relationship between the
independent and dependent variables. As with interpolation,
regression comes in several flavors ranging from simple
techniques like linear regression to more complex non-linear
variants. Effectively, regression provides a “looser fit” function
for the data; this can be effective when the underlying data is
noisy (e.g., when the samples may contain errors), and it may
not be useful to fit a curve through every observed data point,
since those data points may not be an accurate reflection of

Citations
01 Jan 2003
TL;DR: In this article, a method of learning a Bayesian model of a traveler moving through an urban environment is presented, which simultaneously learns a unified model of the traveler's current mode of transportation as well as his most likely route, in an unsupervised manner.
Abstract: We present a method of learning a Bayesian model of a traveler moving through an urban environment. This technique is novel in that it simultaneously learns a unified model of the traveler’s current mode of transportation as well as his most likely route, in an unsupervised manner. The model is implemented using particle filters and learned using Expectation-Maximization. The training data is drawn from a GPS sensor stream that was collected by the authors over a period of three months. We demonstrate that by adding more external knowledge about bus routes and bus stops, accuracy is improved.

30 citations

Book ChapterDOI
06 Dec 2011
TL;DR: This paper studies a novel way of resolving context inconsistency with the aim of minimizing such side effect for an application, and presents an efficient framework to minimize it during context inconsistency resolution.
Abstract: Applications in ubiquitous computing adapt their behavior based on contexts. The adaptation can be faulty if the contexts are subject to inconsistency. Various techniques have been proposed to identify key contexts from inconsistencies. By removing these contexts, an application is expected to run with inconsistencies resolved. However, existing practice largely overlooks an application’s internal requirements on using these contexts for adaptation. It may lead to unexpected side effect from inconsistency resolution. This paper studies a novel way of resolving context inconsistency with the aim of minimizing such side effect for an application. We model and analyze the side effect for rule-based ubiquitous applications, and experimentally measure and compare it for various inconsistency resolution strategies. We confirm the significance of such side effect if not controlled, and present an efficient framework to minimize it during context inconsistency resolution.

16 citations


Cites background from "Blurring snapshots: Temporal infere..."

  • ...Context inconsistency may also come from the failure of synchronizing all contexts [22] or the absence of a global consistency of all environmental conditions [19]....


Journal ArticleDOI
TL;DR: The main objective of this work is to develop a generic scheduling mechanism for collaborative sensors to achieve the error-bounded scheduling control in monitoring applications and show that the approach is effective and efficient in tracking the dramatic temperature shift in dynamic environments.
Abstract: Due to the limited power constraint in sensors, dynamic scheduling with data quality management is strongly preferred in the practical deployment of long-term wireless sensor network applications. We could reduce energy consumption by turning off (i.e., duty cycling) sensor, however, at the cost of low-sensing fidelity due to sensing gaps introduced. Typical techniques treat data quality management as an isolated process for individual nodes. And existing techniques have investigated how to collaboratively reduce the sensing gap in space and time domain; however, none of them provides a rigorous approach to confine sensing error is within desirable bound when seeking to optimize the tradeoff between energy consumption and accuracy of predictions. In this paper, we propose and evaluate a scheduling algorithm based on error inference between collaborative sensor pairs, called CIES. Within a node, we use a sensing probability bound to control tolerable sensing error. Within a neighborhood, nodes can trigger additional sensing activities of other nodes when inferred sensing error has aggregately exceeded the tolerance. The main objective of this work is to develop a generic scheduling mechanism for collaborative sensors to achieve the error-bounded scheduling control in monitoring applications. We conducted simulations to investigate system performance using historical soil temperature data in Wisconsin-Minnesota area. The simulation results demonstrate that the system error is confined within the specified error tolerance bounds and that a maximum of 60 percent of the energy savings can be achieved, when the CIES is compared to several fixed probability sensing schemes such as eSense. And further simulation results show the CIES scheme can achieve an improved performance when comparing the metric of a prediction error with baseline schemes. We further validated the simulation and algorithms by constructing a lab test bench to emulate actual environment monitoring applications. The results show that our approach is effective and efficient in tracking the dramatic temperature shift in dynamic environments.

15 citations

Proceedings ArticleDOI
16 Dec 2011
TL;DR: This work proposes a collaborative scheme called CIES, based on the novel concept of error inference between collaborative sensor pairs, which is effective and efficient in tracking the dramatic temperature shift in highly dynamic environments.
Abstract: Energy constraint is a critical hurdle hindering the practical deployment of long-term wireless sensor network applications. Turning off (i.e., duty cycling) sensors could reduce energy consumption, however at the cost of low sensing fidelity due to sensing gaps introduced. Existing techniques have studied how to collaboratively reduce the sensing gap in space and time, however none of them provides a rigorous approach to confine sensing error within desirable bounds. In this work, we propose a collaborative scheme called CIES, based on the novel concept of error inference between collaborative sensor pairs. Within a node, we use a sensing probability bound to control tolerable sensing error. Within a neighborhood, nodes can trigger additional sensing activities of other nodes when inferred sensing error has aggregately exceed the tolerance. We conducted simulations to investigate system performance using historical soil temperature data in Wisconsin-Minnesota area. The simulation results demonstrate that the system error is confined within the specified error tolerance bounds and that a maximum of 60 percent of the energy savings can be achieved, when the CIES is compared to several fixed probability sensing schemes such as eSense. We further validated the simulation and algorithms by constructing a lab test-bench to emulate actual environment monitoring applications. The results show that our approach is effective and efficient in tracking the dramatic temperature shift in highly dynamic environments.

11 citations


Cites result from "Blurring snapshots: Temporal infere..."

  • ...The observations that sensor nodes demonstrate spatial correlations found in [16], [17], [18] are also supported by our preliminary experiments described in Section III....


Journal ArticleDOI
TL;DR: A model of context inference in pervasive computing, the associated research challenges, and the significant practical impact of intelligent use of such context in pervasive health-care environments are described.
Abstract: Many emerging pervasive health-care applications require the determination of a variety of context attributes of an individual's activities and medical parameters and her surrounding environment. Context is a high-level representation of an entity's state, which captures activities, relationships, capabilities, etc. In practice, high-level context measures are often difficult to sense from a single data source and must instead be inferred using multiple sensors embedded in the environment. A key challenge in deploying context-driven health-care applications involves energy-efficient determination or inference of high-level context information from low-level sensor data streams. Because this abstraction has the potential to reduce the quality of the context information, it is also necessary to model the tradeoff between the cost of sensor data collection and the quality of the inferred context. This article describes a model of context inference in pervasive computing, the associated research challenges, and the significant practical impact of intelligent use of such context in pervasive health-care environments.

9 citations

References
Proceedings Article
01 Sep 2009
TL;DR: In this paper, the authors present a data stream system that captures data uncertainty from data collection to query processing to final result generation using probabilistic modeling and inference to generate uncertainty description for raw data, and then a suite of statistical techniques to capture changes of uncertainty as data propagates through query operators.
Abstract: We present the design and development of a data stream system that captures data uncertainty from data collection to query processing to final result generation. Our system focuses on data that is naturally modeled as continuous random variables such as many types of sensor data. To provide an end-to-end solution, our system employs probabilistic modeling and inference to generate uncertainty description for raw data, and then a suite of statistical techniques to capture changes of uncertainty as data propagates through query operators. To cope with high-volume streams, we explore advanced approximation techniques for both space and time efficiency. We are currently working with a group of scientists to evaluate our system using traces collected from real-world applications for hazardous weather monitoring and for object tracking and monitoring.

52 citations

Journal ArticleDOI
TL;DR: A novel and effective scheme for optimizing the placement of data within a distributed storage subsystem employing retention value functions is provided, to keep the data of highest overall value, while simultaneously balancing the read load to the file system.
Abstract: We consider storage in an extremely large-scale distributed computer system designed for stream processing applications. In such systems, both incoming data and intermediate results may need to be stored to enable analyses at unknown future times. The quantity of data of potential use would dominate even the largest storage system. Thus, a mechanism is needed to keep the data most likely to be used. One recently introduced approach is to employ retention value functions, which effectively assign each data object a value that changes over time in a prespecified way lDouglis et al.2004r. Storage space for data entering the system is reclaimed automatically by deleting data of the lowest current value. In such large systems, there will naturally be multiple file systems available, each with different properties. Choosing the right file system for a given incoming stream of data presents a challenge. In this article we provide a novel and effective scheme for optimizing the placement of data within a distributed storage subsystem employing retention value functions. The goal is to keep the data of highest overall value, while simultaneously balancing the read load to the file system. The key aspects of such a scheme are quite different from those that arise in traditional file assignment problems. We further motivate this optimization problem and describe a solution, comparing its performance to other reasonable schemes via simulation experiments.

20 citations

Journal ArticleDOI
TL;DR: The authors propose a novel model-based view, the Markovian stream, to represent correlated probabilistic sequences, and suggest applications interested in evaluating event queries-extracting sophisticated state sequences-can improve robustness by querying a MarkOVian stream view instead of querying raw data directly.
Abstract: Building applications on top of sensor data streams is challenging because sensor data is noisy. A model-based view can reduce noise by transforming raw sensor streams into streams of probabilistic state estimates, which smooth out errors and gaps. The authors propose a novel model-based view, the Markovian stream, to represent correlated probabilistic sequences. Applications interested in evaluating event queries-extracting sophisticated state sequences-can improve robustness by querying a Markovian stream view instead of querying raw data directly. The primary challenge is to properly handle the Markovian stream's correlations.

11 citations

Journal ArticleDOI
TL;DR: A protocol for query processing that automatically assesses and adaptively provides an achievable degree of consistency given the operational environment throughout its execution is proposed.
Abstract: Queries are convenient abstractions for the discovery of information and services, as they offer content-based information access. In distributed settings, query semantics are well-defined, for example, queries are often designed to satisfy ACID transactional properties. When query processing is introduced in a dynamic network setting, achieving transactional semantics becomes complex due to the open and unpredictable environment. In this article, we propose a query processing model for mobile ad hoc and sensor networks that is suitable for expressing a wide range of query semantics; the semantics differ in the degree of consistency with which query results reflect the state of the environment during query execution. We introduce several distinct notions of consistency and formally express them in our model. A practical and significant contribution of this article is a protocol for query processing that automatically assesses and adaptively provides an achievable degree of consistency given the operational environment throughout its execution. The protocol attaches an assessment of the achieved guarantee to returned query results, allowing precise reasoning about a query with a range of possible semantics. We evaluate the performance of this protocol and demonstrate the benefits accrued to applications through examples drawn from an industrial application.

10 citations


"Blurring snapshots: Temporal infere..." refers background in this paper

  • ...In generating a continuous and accurate reflection of an evolving environment, uncertainty is introduced in several ways [15], [16]....


  • ...This paper builds on our previous approaches defining snapshot and continuous query fidelity and an associated middleware [15], [18]....


Book ChapterDOI
28 Mar 2009
TL;DR: This paper introduces the notion of inquiry mode to allow the user to exercise control over the query processing policy so as to match the level of accuracy to the requirements of the task.
Abstract: This paper focuses on the information gathering support needs for enterprises that operate over wireless mobile ad hoc networks While queries are a convenient way to obtain information, the highly dynamic nature of such networks makes it difficult to ensure a precise match between the results returned by a query and the actual state of the enterprise However, decisions can be made based on the perceived quality of the information retrieved; specialized query support is needed to control and assess the accuracy of the query results In this paper, we introduce the notion of inquiry mode to allow the user to exercise control over the query processing policy so as to match the level of accuracy to the requirements of the task In addition, we describe the use of query introspection, a process for assessing the fitness of a particular inquiry mode Both concepts are formalized, illustrated, and evaluated

9 citations


"Blurring snapshots: Temporal infere..." refers background in this paper

  • ...Informally, this determines whether it was possible to deliver a one-time query to and receive a response from some host h within the sequence of configurations [17]....


Frequently Asked Questions (1)
Q1. What have the authors contributed in "Blurring snapshots: temporal inference of missing and uncertain data tr-utedge-2009-005" ?

In this paper, the authors support continuous monitoring of dynamic pervasive computing phenomena through the use of a series of snapshot queries. The authors evaluate the usefulness of this abstraction in its application to complex spatiotemporal pattern queries in pervasive computing networks.