Proceedings ArticleDOI

Blurring snapshots: Temporal inference of missing and uncertain data

20 May 2010, pp. 40-50
TL;DR: This paper defines a decay function and a set of inference approaches to filling in missing and uncertain data in this continuous query, and evaluates the usefulness of this abstraction in its application to complex spatio-temporal pattern queries in pervasive computing networks.
Abstract: Many pervasive computing applications continuously monitor state changes in the environment by acquiring, interpreting and responding to information from sensors embedded in the environment. However, it is extremely difficult and expensive to obtain a continuous, complete, and consistent picture of a continuously evolving operating environment. One standard technique to mitigate this problem is to employ mathematical models that compute missing data from sampled observations, thereby approximating a continuous and complete stream of information. However, existing models have traditionally not incorporated a notion of temporal validity, or the quantification of imprecision associated with inferring data values from past or future observations. In this paper, we support continuous monitoring of dynamic pervasive computing phenomena through the use of a series of snapshot queries. We define a decay function and a set of inference approaches to filling in missing and uncertain data in this continuous query. We evaluate the usefulness of this abstraction in its application to complex spatio-temporal pattern queries in pervasive computing networks.

Summary (4 min read)

Introduction

  • The emergence of pervasive computing is characterized by increased instrumentation of the physical world, including small sensing devices that allow applications to query a local area using a dynamic and distributed network for support.
  • First, the authors must be able to provide estimates of the continuous query result between adjacent snapshot queries.
  • The authors' approach relies on a simple abstraction called a decay function (Section III) that quantifies the temporal validity associated with sensing a particular phenomenon.
  • The inference and its associated confidence can also provide the application a concrete sense of what the degree of the uncertainty is.

II. BACKGROUND

  • This paper builds on their previous approaches defining snapshot and continuous query fidelity and an associated middleware [15], [18].
  • These approaches approximate a continuous query using a sequence of snapshot queries evaluated over the network at discrete times.
  • Using the values of the host triple, the authors can derive physical and logical connectivity relations.
  • As one example, if the host’s context, ζ, includes the host’s location, the authors can define a physical connectivity relation based on communication range.
  • The environment evolves as the network changes, values change, and hosts exchange messages.

III. MODELING UNCERTAINTY

  • The authors' approach to query processing allows users to pose continuous queries to an evolving network and receive a result that resembles a data stream even though it is obtained using discrete snapshot queries.
  • Missing and uncertain sensed items can be a bane to this process, especially in monitoring the evolution of the data.
  • On a construction site, a site supervisor may use a continuous query to monitor the total number of available bricks on the site.
  • When their snapshot queries are not impacted by any missing or uncertain data, the stable set the trend analysis generates is the actual stable set.
  • The black circles represent hosts the snapshot query directly sampled; gray circles represent hosts for which data values have been inferred.

IV. TEMPORAL INFERENCE FOR CONTINUOUS QUERIES

  • Decay functions allow applications to define the validity of projecting information across time.
  • The authors now address the question of what the value of that projected data should be.
  • Specifically, the authors present a suite of simple techniques that estimate inferred values.
  • The authors also demonstrate how this inference can be combined with decay functions to associate confidence with inferred values.
  • In later sections, the authors evaluate the applicability of these inference approaches to real phenomena.

A. Nearest Neighbor Inference

  • For some applications, data value changes may be difficult to predict, for instance when the underlying process observed is unknown or arbitrary.
  • In such cases, one technique to estimate missing data is to assume the sampled value closest in time is still correct.
  • The application then sums across all of the data readings to generate a total number of bricks on the site.
  • In interpolation, the observed values are fit on a function, where the domain is typically the time of observation and the range is the attribute’s value.
  • As with interpolation, regression comes in several flavors ranging from simple techniques like linear regression to more complex non-linear variants.

C. Determining Inferencing Error

  • When employing statistical techniques like interpolation and regression, the observed data acts as the only source of ground truth and serves as input to generate a function that estimates missing or uncertain data.
  • To measure how well the model fits the ground truth, the authors define metrics that estimate the distance between the model and reality.
  • Similar measures of error for interpolation are difficult to define because the interpolation function is defined such that there is no error in fitting the sampled points to the function.
  • As Fig. 7 demonstrates, wildly different interpolation functions (e.g., polynomials of different orders) can fit the same set of sampled points.
  • That is, from among the interpolation functions that “fit” the data, the authors favor those that minimize the function’s rate of change.
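As a rough illustration of these error measures, the following Python sketch computes a regression RMSE and ranks candidate interpolants by their rate of change; the sample times and values are invented, and simple polynomial fits stand in for whichever interpolation and regression models an application actually uses.

```python
# Illustrative sketch (not the paper's implementation): an RMSE measure for
# regression, and a rate-of-change score for competing interpolation functions.
import numpy as np

def regression_rmse(times, values, degree=1):
    """Root mean squared error of a polynomial regression fit."""
    coeffs = np.polyfit(times, values, degree)
    predicted = np.polyval(coeffs, times)
    return float(np.sqrt(np.mean((predicted - values) ** 2)))

def mean_rate_of_change(interpolant, t_start, t_end, steps=200):
    """Average |slope| of an interpolant between samples (smaller is smoother).

    Interpolation fits the sampled points exactly, so there is no residual
    error to measure; instead, candidate interpolants are ranked by how
    quickly they change, following the 'minimize the rate of change' heuristic.
    """
    ts = np.linspace(t_start, t_end, steps)
    ys = interpolant(ts)
    return float(np.mean(np.abs(np.diff(ys) / np.diff(ts))))

times = np.array([0.0, 10.0, 20.0, 30.0])
values = np.array([40.0, 38.0, 35.0, 36.0])

linear = lambda ts: np.interp(ts, times, values)          # piecewise-linear interpolant
poly = np.poly1d(np.polyfit(times, values, len(times) - 1))  # exact-fit polynomial

print("regression RMSE:", regression_rmse(times, values, degree=1))
print("linear fit rate of change:", mean_rate_of_change(linear, 0.0, 30.0))
print("polynomial fit rate of change:", mean_rate_of_change(poly, 0.0, 30.0))
```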

D. Computing Confidence from Decay and Error

  • Confidence in an inferred value is computed from a combination of the decay functions and the error measures defined above.
  • If the application is inferring values between samples with low confidence, it should likely increase the frequency of sampling.
  • Things are slightly more complicated for interpolation and regression.
  • In addition to applying the decay function to the area between successful samples, the authors can also use information about the estimated inferencing error to strengthen or weaken their confidence in inferred values.
  • Minimizing error measures such as the root mean squared error of a regression or the derivative of an interpolation can increase the confidence in an inferred value.
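The paper's exact formula for combining decay and error is not reproduced in this summary; the short sketch below shows one plausible combination (scaling the decay value by 1/(1 + error)) purely for illustration, not as the authors' definition.

```python
# Hedged sketch: one possible way to combine a decay value with an estimated
# inferencing error. The specific combination is an illustrative assumption.
import math

def decay(t, t_last_sample, rate=0.1):
    """Exponential decay of temporal validity since the nearest sample."""
    return math.exp(-rate * abs(t - t_last_sample))

def confidence(t, t_last_sample, estimated_error, rate=0.1):
    """Decay-based confidence, weakened as the model's estimated error grows."""
    return decay(t, t_last_sample, rate) / (1.0 + estimated_error)

print(confidence(t=17.0, t_last_sample=10.0, estimated_error=0.8))
```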

V. USAGE SCENARIOS

  • This approach to inferring missing and uncertain data applies to a wide variety of pervasive computing applications that are increasingly supported by sensor networks.
  • In the introduction, the authors overviewed an intelligent construction site, where people, vehicles, pieces of equipment, parts of the building, and assets are all equipped with sensors that can monitor conditions on the site and share information with applications.
  • Broadly speaking, given a series of snapshot queries formed into a continuous query, an application can issue two types of requests for information from the continuous query: point requests, for a value of the continuous query at a single point in time, and range requests, that monitor the continuous query over a specified period of time.
  • It may prove difficult to use a continuous function to infer missing values for this type of phenomenon; the authors therefore also define a third query, Q3, that more intuitively fits a continuous function.

VI. EVALUATION

  • The authors have prototyped their framework using OMNeT++ and the MiXiM framework [12], [13].
  • The authors implemented the queries given in the previous section and evaluate their framework’s performance.
  • Requests are flooded through the network, and each node has a reply probability of 0.5, i.e., each sensor node responds to any given request it receives with probability 0.5.
  • Each experiment was repeated for at least 50 runs.
  • Trucks moved randomly; when they left the construction site, bricks were randomly added to or removed from the truck.

A. Measuring Confidence

  • The authors first evaluate the correctness and usefulness of applying their decay functions to determine confidence in inferred responses.
  • The authors executed all three queries described previously and attempted to infer missing and uncertain data for each of them using each of the three aforementioned inference strategies.
  • Fig. 10 plots the inferencing error versus the confidence reported by their decay function; specifically the figure shows the results for applying linear interpolation to the results of Q3.
  • When their framework reports a higher confidence in an inferred value, the error of that value from the ground truth should be lower.
  • These initial experiments served simply to validate their query inferencing and decay function framework.

B. Cost Savings

  • Employing inference models allows applications to trade expense for error.
  • Given that their models allow them to blur across these dynamics, the authors are able to query the network far less frequently.
  • In addition, instead of querying every node in the network, the authors can intentionally skip some nodes in each snapshot query, also reducing the communication overhead.
  • The authors omit charts plotting the communication overhead of their approach for brevity; however, they achieve approximately a 6x reduction in communication overhead in comparison to a flooding approach that queries the network frequently enough to catch every significant change in the dynamic data (which they estimate to be every 5 seconds in their case).
  • This reduction in communication translates directly to a reduction in energy expenditures, a significant concern in resource constrained pervasive computing networks.

C. Application Performance

  • The authors next evaluate the usefulness of blurring the snapshot queries in forming a continuous query.
  • This is due to the fact that the phenomenon under observation here is subject to very local and discrete data changes.
  • Fig. 12 plots the total number of bricks, an aggregate measure that sums samples from multiple nodes.
  • The authors expected nearest neighbor inference to be preferred in this situation due to the discrete data; it would likely be the application’s choice due to its consistent performance and simplicity in comparison to interpolation.
  • While their decay function provides a strong measure of confidence for individual data estimates, it is possible to develop a more sophisticated metric for confidence in aggregate estimates that combine data values to produce a combinatorial measure (e.g., a sum, average, maximum, etc.).

VIII. CONCLUSIONS

  • Pervasive computing is increasingly supported by sensor networks that perform continuous monitoring of network or physical phenomena.
  • Continuous monitoring can be expensive in terms of communication and energy costs.
  • Therefore, continuous queries in pervasive computing applications inevitably contain missing or uncertain data items.
  • Attempting to understand trends and patterns in measuring these continuous phenomena is hindered by this inherent uncertainty.
  • The authors designed a framework that employs statistical modeling to both infer missing and uncertain data and to understand the degree of confidence an application should place in that inferred information.


Blurring Snapshots: Temporal
Inference of Missing and
Uncertain Data
TR-UTEDGE-2009-005
Vasanth Rajamani, The University of Texas at Austin
Christine Julien, The University of Texas at Austin
© Copyright 2009
The University of Texas at Austin

Blurring Snapshots: Temporal Inference of
Missing and Uncertain Data
Vasanth Rajamani and Christine Julien
Department of Electrical and Computer Engineering
The University of Texas at Austin
{vasanthrajamani,c.julien}@mail.utexas.edu
Abstract—Many pervasive computing applications continu-
ously monitor state changes in the environment by acquiring,
interpreting and responding to information from sensors embed-
ded in the environment. However, it is extremely difficult and
expensive to obtain a continuous, complete, and consistent picture
of a continuously evolving operating environment. One standard
technique to mitigate this problem is to employ mathematical
models that compute missing data from sampled observations
thereby approximating a continuous and complete stream of
information. However, existing models have traditionally not
incorporated a notion of temporal validity, or the quantification
of imprecision associated with inferring data values from past
or future observations. In this paper, we support continuous
monitoring of dynamic pervasive computing phenomena through
the use of a series of snapshot queries. We define a decay
function and a set of inference approaches to filling in missing
and uncertain data in this continuous query. We evaluate the
usefulness of this abstraction in its application to complex spatio-
temporal pattern queries in pervasive computing networks.
Keywords-sensor networks, queries, dynamics, interpolation
I. INTRODUCTION
As applications place an increased focus on using dis-
tributed embedded networks to monitor both physical and
network phenomena, it becomes necessary to support efficient
and robust continuous monitoring that can communicate the
uncertainty associated with data collected from a dynamic net-
work. The emergence of pervasive computing is characterized
by increased instrumentation of the physical world, including
small sensing devices that allow applications to query a local
area using a dynamic and distributed network for support. On
the roadways, all vehicles may be equipped with devices that
sense and share location, and that information can be queried
by other nearby vehicles to understand traffic flow patterns.
On an intelligent construction site, workers, equipment, assets,
and even parts of buildings may be equipped with sensors
to measure location, temperature, humidity, stress, etc., with
the goal of generating meaningful pictures of the project’s
progress and maintaining safe working conditions.
Central to these and other applications is the ability to
monitor some condition and its evolution over a period of
time. On a construction site, the amount of an available
material at a particular time may be useful, but it may be
just as useful to monitor how that material is consumed
(and resupplied) over time. Such trends are usually measured
through continuous queries that are often registered at the
remote information sources and periodically push sensed data
back to the consumers [2], [9]. Such a “push” approach to
continuous query processing requires maintaining a distributed
data structure, which can be costly in dynamic settings. In
addition, this often requires that a query issuer interact with
a collector that is known in advance and reachable at any
instant, which is often unreasonable. We have demonstrated
that, in dynamic networks, it often makes sense to generate a
continuous query using a sequence of snapshot queries [18].
A snapshot query is distributed through the network at a
particular point in time, takes measurements of the target
phenomenon, and sends the results back to the query issuer.
In our model (Section II), a continuous query is the integration
over time across a sequence of snapshot queries.
In generating a continuous and accurate reflection of an
evolving environment, uncertainty is introduced in several
ways [15], [16]. First, there is a significant tradeoff between
the cost of generating the continuous query result and the
quality of the result. For instance, the more frequently the
snapshot queries execute, the more closely the continuous
query reflects the ground truth, but the more expensive it is
to execute in terms of communication bandwidth and battery
power. In addition, the snapshot queries can be executed
using different protocols that consider the same tradeoff (e.g.,
consider the differences in quality and cost of a query flooded
to all hosts in the network and one probabilistically gossiped
to some subset). On a more fundamental level, the quality of
any interaction with a dynamic network is inherently affected
by the unreliability of the network—packets may be dropped
or corrupted, and communication links may break. The fact
that a continuous query fails to sense a value at a particular
instant may simply be a reflection of this inherent uncertainty.
Even when these uncertainties weaken a continuous query,
applications can still benefit if the query processing can
provide some knowledge about the degree of the uncertainty.
For example, in a continuous query on a construction site for
the amount of available material, it would be useful to know
that, with some degree of certainty (i.e., a confidence) there
is a given amount of available material. This may be based
on information collected directly from the environment (in
which case the confidence is quite high), historical trends, or
knowledge about the nature of the phenomenon. Model-driven
approaches that estimate missing data using mathematical
models can alleviate these uncertainties [6], [7]. In these
approaches, the goal is to build a model of the phenomenon
being observed and to only query the network to rebuild the

model when the confidence in the model has degraded to
make relying on it unacceptable. Section VII examines these
approaches and the relationship to our work in more detail.
Because we build a continuous query from a sequence of
snapshot queries, handling uncertainty is twofold. First, we
must be able to provide estimates of the continuous query
result between adjacent snapshot queries. Second, even if we
fail to sample a data point in a given snapshot, we may
have some information about that data point at a previous
time (and potentially a future time) that we may use to infer
something about the missing data. In both cases, we are not
actually changing the amount of information available to the
application; instead we are blurring the snapshot queries and
associating a level of confidence with inferred results.
Our approach relies on a simple abstraction called a decay
function (Section III) that quantifies the temporal validity
associated with sensing a particular phenomenon. We use this
decay function as the basis for performing model-assisted
inference (Section IV) to use sampled data values from the
snapshot queries to infer values into the past and future. This
inference can allow us to fill in gaps in the sequence of snap-
shot queries to enable trend analysis on the components of the
continuous query. The inference and its associated confidence
can also provide the application a concrete sense of what the
degree of the uncertainty is. Finally, by smoothing across the
available data, this inference makes the information that is
available more viewable and understandable by the application
and its user. We examine these benefits in Sections V and VI.
Our novel contributions are threefold. First, we introduce
decay functions that allow applications to define temporal
validity in a principled way. Second, we build a set of simple
statistical models that allow us to effectively blur snapshot
queries into continuous queries and use them to study the use
of model-assisted inference for a variety of different types
of dynamic phenomena. Finally, we demonstrate through an
implementation and evaluation and a set of usage scenarios
the efficacy and usefulness of using inference to fill in missing
data in real world situations. If the network supporting data
collection is highly dynamic, our approaches help mitigate the
impact of the dynamics on the inherent uncertainty; however,
even in less dynamic situations, our approach helps applica-
tions reasonably trade off the cost of executing continuous
queries for the quality of the result.
II. BACKGROUND
This paper builds on our previous approaches defining snap-
shot and continuous query fidelity and an associated middle-
ware [15], [18]. These approaches approximate a continuous
query using a sequence of snapshot queries evaluated over
the network at discrete times. We model a dynamic pervasive
computing network as a closed system of hosts, where each
host has a location and data value (though a single data value
may represent a collection of values). A host is represented as a
triple (ι, ζ, ν), where ι is the host’s identifier, ζ is its context,
and ν is its data value. The context can be simply a host’s
location, but it can be extended to include a list of neighbors,
routing tables, and other system or network information.
The global state of a network, a configuration (C), is a set
of host tuples. Given a host h in a configuration, an effective
configuration (E) is the projection of the configuration with
respect to the hosts reachable from h. Practically, h is a host
initiating a query, and E contains the hosts expected to receive
and respond to the query. To capture connectivity, we define a
binary logical connectivity relation, K, to express the ability
of a host to communicate with a neighboring host. Using the
values of the host triple, we can derive physical and logical
connectivity relations. As one example, if the host’s context,
ζ, includes the host’s location, we can define a physical
connectivity relation based on communication range. K is not
necessarily symmetric; in the cases that it is symmetric, K
specifies bi-directional communication.
The environment evolves as the network changes, values
change, and hosts exchange messages. We model network evo-
lution as a state transition system where the state space is the
set of possible configurations, and transitions are configuration
changes. A single configuration change consists of one of
the following: 1) a neighbor change: changes in hosts’ states
impact the connectivity relation, K; 2) a value change: a single
host changes its stored data value; or 3) a message exchange:
a host sends a message that is received by one or more
neighboring nodes. To refer to the connectivity relation for
a particular configuration, we assign configurations subscripts
(e.g., C_0, C_1, etc.) and use K_i to refer to the connectivity
of configuration C_i. We have also extended K to define
query reachability. Informally, this determines whether it was
possible to deliver a one-time query to and receive a response
from some host h within the sequence of configurations [17].
A snapshot query’s result (ρ) is a subset of a configuration:
it is a collection of host tuples that constitute responses to the
query. No host in the network is represented more than once
in ρ, though it is possible that a host is not represented at all
(e.g., because it was never reachable from the query issuer).
Depending on both the protocol used to execute the snapshot
query (e.g., whether the query was flooded to all hosts in the
network or whether it was gossiped) and inherent network
failures, only a subset of the reachable hosts may respond.
This results in missing and uncertain data in the results of
snapshot queries, which may result in a degradation in the
quality of and confidence in the continuous query’s result.
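As a concrete (and purely illustrative) reading of this model, the following Python sketch encodes hosts as (ι, ζ, ν) triples, treats a configuration as a set of host tuples, and derives a physical connectivity relation K from location; the names, values, and the 10-unit communication range are assumptions for illustration, not part of the paper.

```python
# Minimal sketch of the system model in this section, under assumed names.
from dataclasses import dataclass

@dataclass(frozen=True)
class Host:
    ident: str          # ι: host identifier
    context: tuple      # ζ: here, simply an (x, y) location
    value: float        # ν: the host's data value

def connected(a: Host, b: Host, comm_range: float = 10.0) -> bool:
    """Physical connectivity relation K derived from communication range."""
    (ax, ay), (bx, by) = a.context, b.context
    return ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5 <= comm_range

# A configuration C is a set of host tuples; a snapshot query result ρ is a
# subset of a configuration (only the hosts that actually responded).
configuration = {Host("A", (0.0, 0.0), 42.0), Host("B", (6.0, 8.0), 17.0)}
snapshot_result = {h for h in configuration if h.ident != "B"}  # B's reply was lost
```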
III. MODELING UNCERTAINTY
Our approach to query processing allows users to pose
continuous queries to an evolving network and receive a result
that resembles a data stream even though it is obtained using
discrete snapshot queries. This stream can then be analyzed
to evaluate trends in the sensed data. However, missing and
uncertain sensed items can be a bane to this process, especially
in monitoring the evolution of the data. For example, on a
construction site, a site supervisor may use a continuous query
to monitor the total number of available bricks on the site.
This query may be accomplished by associating a sensor with
each pallet of bricks; the snapshot queries collect the identity
of the pallets and the number of bricks the pallet holds. If
consecutive snapshot queries do not sample the same subset

of pallets, the sums they report are not comparable, resulting
in inconsistent information supplied to the site supervisor.
Consider the continuous query in Fig. 1. The three networks
on the left of the dark line show the results of the continuous
query’s first three snapshot queries. Each circle represents a
host; a circle’s color represents the host’s data value; and
lines represent connectivity. Throughout the continuous query,
some hosts depart, some arrive, and others change their data
value. In this case, the trend the application is analyzing is the
data items that remain available and unchanged throughout the
continuous query. When our snapshot queries are not impacted
by any missing or uncertain data, the stable set the trend
analysis generates is the actual stable set.
!""#"$%&'('$
)*(*$+*,-#$./*01#'$
2
3
$ 2
4
$ 2
5
$
60*7'/&($8-#9:#'$
6(*;,#$6#($
Fig. 1: A Continuous Query
Consider, however, what happens when data is missing or
uncertain, as depicted in Fig. 2. In this situation, the ground
truth (i.e., what the snapshot queries should have returned)
is equivalent to that shown in Fig. 1, but due to network
dynamics or other sources of uncertainty, the sample from host
A was not collected in the second snapshot query (ρ_1), and the
sample from host B was not collected in the third snapshot
query (ρ_2). Consequently the result of the trend analysis in
Fig. 2 is quite different from that in Fig. 1. On a construction
site, if the data items represent pallets of bricks, this trend
analysis may cause the site supervisor to have additional
supplies delivered when it is unnecessary or even impractical.
!""#"$%&'('$
)*(*$+*,-#$./*01#'$
2
3
$ 2
4
$ 2
5
$
60*7'/&($8-#9:#'$
6(*;,#$6#($
Fig. 2: A Continuous Query with Missing Data
One way to handle this uncertainty is to blur the snapshot
queries. In Fig. 2, given the fact that we know the network
to be dynamic, we can say with some confidence that host A
should have been represented in ρ_1; the level of this confidence
depends on the temporal validity of the phenomenon sensed
(i.e., how long do we expect a data value to remain valid), the
frequency with which the snapshot queries are issued, and the
degree of network dynamics. The fact that A “reappeared” in ρ_2
further increases our confidence that it may have, in fact,
been present in ρ_1 as well. Fig. 3 shows a simple example
of how this inference can be used to project data values into
future snapshots (e.g., from ρ_1 to ρ_2) and into past snapshots
(e.g., from ρ_1 to ρ_0). In this figure, the black circles repre-
sent hosts the snapshot query directly sampled; gray circles
represent hosts for which data values have been inferred. The
question that remains, however, is how to determine both the
values that should be associated with the inferred results and
the confidence we have in their correctness. We deal with
the former concern in the next section; here we introduce
decay functions to ascribe temporal validity to observations
and calculate confidence in unsampled (inferred) values.
!"#$%%$&'()*+*' ,-./0$&'()*+*'
1
2
' 1
3
'1
4
'
Fig. 3: Projection Forward and Backwards in Time
To address temporal validity, we rely on the intuitive ob-
servation that the closer in time an inferred value is to a
sensed sample, the more likely it is to be a correct inference.
For example, in Fig. 3, the value projected from ρ
0
to ρ
1
is
more likely to be correct than the value projected from ρ
0
to ρ
2
. If the sample missing in ρ
1
is also missing in ρ
2
, it
becomes increasingly likely that the host generating the sample
has, in fact, departed. We exploit this observation by allowing
applications to specify the temporal validity of different sensed
phenomena using a decay function that defines the validity of
a measured observation as a function of time.
Formally, a decay function is a function d(t) = f(|t - t_l|),
where t is the current time and t_l is a time, from either the
future or the past, of the nearest (in time) actual sample of the
data value. The period |t - t_l| is the period of uncertainty; the
larger the period of uncertainty, the less likely it is that the
sampled value retains any correlation with the actual value.
The decay function’s value falls between 0 and 1; it is a
measure of percentage likelihood. These decay functions are
an intuitive representation of confidence and are easy for
application developers to grasp. It is also straightforward to
define decay functions to describe a variety of phenomena.
For instance, on a construction site, a moving truck’s GPS
location might be associated with a decay function of the
form: d(t) = e^(-|t - t_l|), which is a rapid exponential drop in
confidence over time. On the other hand a GPS mounted on
a stationary sensor on the site might have a decay function of
the form: d(t) = 1 because the location value, once measured,
is not expected to change. Possibilities for formulating decay
functions are numerous and depend on the nature of the
phenomenon being sensed and the sensing environment.
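For concreteness, the two example decay functions above can be written out directly; this is only a sketch of the examples in the text, not library code from the paper.

```python
# The two example decay functions from the text, written as plain functions.
import math

def truck_gps_decay(t, t_l):
    """Rapid exponential drop in confidence: d(t) = e^(-|t - t_l|)."""
    return math.exp(-abs(t - t_l))

def stationary_gps_decay(t, t_l):
    """A value not expected to change: d(t) = 1 regardless of elapsed time."""
    return 1.0

# e.g., a truck location sampled at t_l = 10 has decayed confidence by t = 13:
print(truck_gps_decay(13.0, 10.0))       # ~0.0498
print(stationary_gps_decay(13.0, 10.0))  # 1.0
```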

Given a user-defined decay function, it is straightforward
to determine a confidence measure of an inferred value. We
measure this confidence probabilistically. At any time instant
t, the inferred data value’s degree of confidence p, is updated
using the following rule:
  • if time t is the time at which an actual data reading was acquired, then the value of p at time t is set to 1;
  • otherwise, p is updated using the formula: p_t = d(t).
Thus, at every point in time, a data value of interest has an
imprecision that ranges from one to zero depending on when
it was last sampled. The further in time the inferred value is
from an actual sensed value, the less confidence it has. With
this understanding, we look next at how to estimate how a
sampled value may have changed during periods where it is
not sampled, allowing us to infer its value.
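The update rule above translates directly into code; the following sketch assumes an exponential decay function and invented sample times.

```python
# Direct sketch of the confidence-update rule: p is 1 at an actual sample
# time and d(t), measured from the nearest sample, otherwise.
import math

def confidence_at(t, sample_times, decay=lambda gap: math.exp(-gap)):
    """Return p_t for time t, given the times at which real samples were taken."""
    if t in sample_times:
        return 1.0                      # an actual reading was acquired at t
    nearest = min(sample_times, key=lambda s: abs(s - t))
    return decay(abs(t - nearest))      # p_t = d(t)

samples = [0.0, 10.0, 20.0]
print(confidence_at(10.0, samples))  # 1.0
print(confidence_at(14.0, samples))  # exp(-4) ~ 0.018
```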
IV. TEMPORAL INFERENCE FOR CONTINUOUS QUERIES
Decay functions allow applications to define the validity of
projecting information across time. We now address the ques-
tion what the value of that projected data should be. Specif-
ically, we present a suite of simple techniques that estimate
inferred values. We also demonstrate how this inference can be
combined with decay functions to associate confidence with
inferred values. In later sections, we evaluate the applicability
of these inference approaches to real phenomena.
A. Nearest Neighbor Inference
For some applications, data value changes may be difficult
to predict, for instance when the underlying process observed
is unknown or arbitrary. These changes are usually discrete;
at some instant in time, the value changes to some potentially
unpredictable value. Consider a construction site where pallets
of bricks are distributed to different locations around the site
for storage and use. A distributed query may execute across the
site, measuring how many bricks are present at each location
at query time. The bricks are laid and restocked during the
day as trucks and construction workers perform their tasks.
Without any knowledge of the project’s goals and the rate of
brick laying at different sites, it is difficult to create a model
that effectively estimates the number of bricks at any given
location for instants that have no recorded observations.
In such cases, one technique to estimate missing data is to
assume the sampled value closest in time is still correct. As the
temporal validity decays, the sensed value is increasingly un-
reliable. Consider again the pallets of bricks on a construction
site and an application that samples the number of available
bricks periodically (e.g., every 10 minutes). The application
then sums across all of the data readings to generate a total
number of bricks on the site. Fig. 4 shows an example where
the value for the number of pallets at node A changes between
the two samples. Up until t = 5, the total number of pallets is
estimated using the original sample; after that, it is assumed
that the value is the sample taken at t = 10.
The example in Fig. 4 focuses on uncertain data; i.e.,
inferring data values that the application did not attempt to
sample. The same approach can be used to infer missing data,
e.g., if the application failed to sample a value for node A
Fig. 4: Nearest Neighbor Inference for Uncertain Data
at time t = 10 but did resample it at time t = 20. This
example also demonstrates the importance of inferring missing
data. Because this data is used to monitor the total number of
pallets of bricks on the site, if data values are missing from a
particular snapshot, the site supervisor might observe radical
fluctuations in the number of bricks that actually did not occur.
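A minimal sketch of nearest-neighbor inference for this brick-counting example follows; the pallet identifiers, sample times, and counts are invented for illustration.

```python
# Nearest-neighbor inference sketch: each pallet's missing reading is filled
# with its sample closest in time, then readings are summed into a site total.
def nearest_neighbor_value(t, samples):
    """samples: list of (time, value); return the value of the sample nearest to t."""
    nearest_time, nearest_value = min(samples, key=lambda s: abs(s[0] - t))
    return nearest_value

pallets = {
    "pallet_A": [(0, 500), (10, 420)],   # sampled at t=0 and t=10
    "pallet_B": [(0, 300)],              # its t=10 sample was lost
}

def total_bricks(t):
    return sum(nearest_neighbor_value(t, readings) for readings in pallets.values())

print(total_bricks(4))   # uses the t=0 samples: 500 + 300 = 800
print(total_bricks(10))  # pallet_A switches to its t=10 sample: 420 + 300 = 720
```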
B. Interpolation and Regression
The evolution of many pervasive computing phenomena
can be fairly accurately represented by continuous functions.
If a truck is driving at a steady speed across the site, and
we sample its location at t = 0 and t = 10 it may be
reasonable to infer that at t = 5, the truck was at the midpoint
of a line drawn between the two sample points. In such
cases, standard statistical techniques like interpolation and
regression can be employed to infer data across snapshots.
In interpolation, the observed values are fit on a function,
where the domain is typically the time of observation and
the range is the attribute’s value. For any point in time where
there is no recorded observation, the value is estimated using
the function. Interpolation approaches range from simple (e.g.,
linear interpolation) to complex (e.g., spline interpolation).
Linear interpolation connects consecutive observations of
a data item with a line segment. Polynomial interpolation
generalizes the function to a degree higher than one; in general,
one can fit a curve through n data points using a function of
degree n-1. Spline interpolation breaks a set of data points into
subsets and applies polynomial interpolation to each subset.
Fig. 5 shows an example of interpolation. The data values
sensed are the locations of the devices on a 3x4 grid; the
moving truck’s data is missing from snapshots ρ_1 and ρ_3. The
bottom figures show how linear interpolation and an example
of polynomial interpolation estimate the missing data.
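As an illustration of the interpolation options described here (using invented coordinates rather than the data of Fig. 5), the following sketch fills a truck's missing samples with both linear and polynomial interpolation via NumPy.

```python
# Illustrative sketch of filling a truck's missing location samples.
import numpy as np

sample_times = np.array([0.0, 20.0, 40.0])   # snapshots where the truck replied
xs = np.array([0.0, 2.0, 3.0])               # observed x positions on the grid

missing_times = np.array([10.0, 30.0])       # snapshots with no reply

# Linear interpolation: connect consecutive observations with line segments.
linear_estimates = np.interp(missing_times, sample_times, xs)

# Polynomial interpolation: fit a degree n-1 polynomial through the n samples.
poly = np.poly1d(np.polyfit(sample_times, xs, len(sample_times) - 1))
poly_estimates = poly(missing_times)

print(linear_estimates)  # [1.0, 2.5]
print(poly_estimates)    # a (possibly different) curve through the same points
```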
Regression identifies relationships between a dependent
sensed variable (e.g., location or temperature at a particular
device) and an independent variable (e.g., time). However,
regression does not try to fit a curve or a function through
every observed data point. Instead, the end result of regression
encodes an approximation of the relationship between the
independent and dependent variables. As with interpolation,
regression comes in several flavors ranging from simple
techniques like linear regression to more complex non-linear
variants. Effectively, regression provides a “looser fit” function
for the data; this can be effective when the underlying data is
noisy (e.g., when the samples may contain errors), and it may
not be useful to fit a curve through every observed data point,
since those data points may not be an accurate reflection of

Citations
01 Jan 2003
TL;DR: In this article, a method of learning a Bayesian model of a traveler moving through an urban environment is presented, which simultaneously learns a unified model of the traveler's current mode of transportation as well as his most likely route, in an unsupervised manner.
Abstract: We present a method of learning a Bayesian model of a traveler moving through an urban environment. This technique is novel in that it simultaneously learns a unified model of the traveler’s current mode of transportation as well as his most likely route, in an unsupervised manner. The model is implemented using particle filters and learned using Expectation-Maximization. The training data is drawn from a GPS sensor stream that was collected by the authors over a period of three months. We demonstrate that by adding more external knowledge about bus routes and bus stops, accuracy is improved.

30 citations

Book ChapterDOI
06 Dec 2011
TL;DR: This paper studies a novel way of resolving context inconsistency with the aim of minimizing such side effect for an application, and presents an efficient framework to minimize it during context inconsistency resolution.
Abstract: Applications in ubiquitous computing adapt their behavior based on contexts. The adaptation can be faulty if the contexts are subject to inconsistency. Various techniques have been proposed to identify key contexts from inconsistencies. By removing these contexts, an application is expected to run with inconsistencies resolved. However, existing practice largely overlooks an application’s internal requirements on using these contexts for adaptation. It may lead to unexpected side effect from inconsistency resolution. This paper studies a novel way of resolving context inconsistency with the aim of minimizing such side effect for an application. We model and analyze the side effect for rule-based ubiquitous applications, and experimentally measure and compare it for various inconsistency resolution strategies. We confirm the significance of such side effect if not controlled, and present an efficient framework to minimize it during context inconsistency resolution.

16 citations


Cites background from "Blurring snapshots: Temporal infere..."

  • ...Context inconsistency may also come from the failure of synchronizing all contexts [22] or the absence of a global consistency of all environmental conditions [19]....


Journal ArticleDOI
TL;DR: The main objective of this work is to develop a generic scheduling mechanism for collaborative sensors to achieve the error-bounded scheduling control in monitoring applications and show that the approach is effective and efficient in tracking the dramatic temperature shift in dynamic environments.
Abstract: Due to the limited power constraint in sensors, dynamic scheduling with data quality management is strongly preferred in the practical deployment of long-term wireless sensor network applications. We could reduce energy consumption by turning off (i.e., duty cycling) sensor, however, at the cost of low-sensing fidelity due to sensing gaps introduced. Typical techniques treat data quality management as an isolated process for individual nodes. And existing techniques have investigated how to collaboratively reduce the sensing gap in space and time domain; however, none of them provides a rigorous approach to confine sensing error is within desirable bound when seeking to optimize the tradeoff between energy consumption and accuracy of predictions. In this paper, we propose and evaluate a scheduling algorithm based on error inference between collaborative sensor pairs, called CIES. Within a node, we use a sensing probability bound to control tolerable sensing error. Within a neighborhood, nodes can trigger additional sensing activities of other nodes when inferred sensing error has aggregately exceeded the tolerance. The main objective of this work is to develop a generic scheduling mechanism for collaborative sensors to achieve the error-bounded scheduling control in monitoring applications. We conducted simulations to investigate system performance using historical soil temperature data in Wisconsin-Minnesota area. The simulation results demonstrate that the system error is confined within the specified error tolerance bounds and that a maximum of 60 percent of the energy savings can be achieved, when the CIES is compared to several fixed probability sensing schemes such as eSense. And further simulation results show the CIES scheme can achieve an improved performance when comparing the metric of a prediction error with baseline schemes. We further validated the simulation and algorithms by constructing a lab test bench to emulate actual environment monitoring applications. The results show that our approach is effective and efficient in tracking the dramatic temperature shift in dynamic environments.

15 citations

Proceedings ArticleDOI
16 Dec 2011
TL;DR: This work proposes a collaborative scheme called CIES, based on the novel concept of error inference between collaborative sensor pairs, which is effective and efficient in tracking the dramatic temperature shift in highly dynamic environments.
Abstract: Energy constraint is a critical hurdle hindering the practical deployment of long-term wireless sensor network applications. Turning off (i.e., duty cycling) sensors could reduce energy consumption, however at the cost of low sensing fidelity due to sensing gaps introduced. Existing techniques have studied how to collaboratively reduce the sensing gap in space and time, however none of them provides a rigorous approach to confine sensing error within desirable bounds. In this work, we propose a collaborative scheme called CIES, based on the novel concept of error inference between collaborative sensor pairs. Within a node, we use a sensing probability bound to control tolerable sensing error. Within a neighborhood, nodes can trigger additional sensing activities of other nodes when inferred sensing error has aggregately exceed the tolerance. We conducted simulations to investigate system performance using historical soil temperature data in Wisconsin-Minnesota area. The simulation results demonstrate that the system error is confined within the specified error tolerance bounds and that a maximum of 60 percent of the energy savings can be achieved, when the CIES is compared to several fixed probability sensing schemes such as eSense. We further validated the simulation and algorithms by constructing a lab test-bench to emulate actual environment monitoring applications. The results show that our approach is effective and efficient in tracking the dramatic temperature shift in highly dynamic environments.

11 citations


Cites result from "Blurring snapshots: Temporal infere..."

  • ...The observations that sensor nodes demonstrate spatial correlations found in [16], [17], [18] are also supported by our preliminary experiments described in Section III....


Journal ArticleDOI
TL;DR: A model of context inference in pervasive computing, the associated research challenges, and the significant practical impact of intelligent use of such context in pervasive health-care environments are described.
Abstract: Many emerging pervasive health-care applications require the determination of a variety of context attributes of an individual's activities and medical parameters and her surrounding environment. Context is a high-level representation of an entity's state, which captures activities, relationships, capabilities, etc. In practice, high-level context measures are often difficult to sense from a single data source and must instead be inferred using multiple sensors embedded in the environment. A key challenge in deploying context-driven health-care applications involves energy-efficient determination or inference of high-level context information from low-level sensor data streams. Because this abstraction has the potential to reduce the quality of the context information, it is also necessary to model the tradeoff between the cost of sensor data collection and the quality of the inferred context. This article describes a model of context inference in pervasive computing, the associated research challenges, and the significant practical impact of intelligent use of such context in pervasive health-care environments.

9 citations

References
Journal ArticleDOI
TL;DR: In this article, the authors explore and evaluate the use of directed diffusion for a simple remote-surveillance sensor network analytically and experimentally and demonstrate that directed diffusion can achieve significant energy savings and can outperform idealized traditional schemes under the investigated scenarios.
Abstract: Advances in processor, memory, and radio technology will enable small and cheap nodes capable of sensing, communication, and computation. Networks of such nodes can coordinate to perform distributed sensing of environmental phenomena. In this paper, we explore the directed-diffusion paradigm for such coordination. Directed diffusion is data-centric in that all communication is for named data. All nodes in a directed-diffusion-based network are application aware. This enables diffusion to achieve energy savings by selecting empirically good paths and by caching and processing data in-network (e.g., data aggregation). We explore and evaluate the use of directed diffusion for a simple remote-surveillance sensor network analytically and experimentally. Our evaluation indicates that directed diffusion can achieve significant energy savings and can outperform idealized traditional schemes (e.g., omniscient multicast) under the investigated scenarios.

2,550 citations

Book ChapterDOI
31 Aug 2004
TL;DR: This paper enrichs interactive sensor querying with statistical modeling techniques, and demonstrates that such models can help provide answers that are both more meaningful, and, by introducing approximations with probabilistic confidences, significantly more efficient to compute in both time and energy.
Abstract: Declarative queries are proving to be an attractive paradigm for ineracting with networks of wireless sensors. The metaphor that "the sensornet is a database" is problematic, however, because sensors do not exhaustively represent the data in the real world. In order to map the raw sensor readings onto physical reality, a model of that reality is required to complement the readings. In this paper, we enrich interactive sensor querying with statistical modeling techniques. We demonstrate that such models can help provide answers that are both more meaningful, and, by introducing approximations with probabilistic confidences, significantly more efficient to compute in both time and energy. Utilizing the combination of a model and live data acquisition raises the challenging optimization problem of selecting the best sensor readings to acquire, balancing the increase in the confidence of our answer against the communication and data acquisition costs in the network. We describe an exponential time algorithm for finding the optimal solution to this optimization problem, and a polynomial-time heuristic for identifying solutions that perform well in practice. We evaluate our approach on several real-world sensor-network data sets, taking into account the real measured data and communication quality, demonstrating that our model-based approach provides a high-fidelity representation of the real phenomena and leads to significant performance gains versus traditional data acquisition techniques.

1,218 citations


"Blurring snapshots: Temporal infere..." refers background in this paper

  • ...Model-driven approaches that estimate missing data using mathematical models can alleviate these uncertainties [6], [7]....


Proceedings ArticleDOI
09 Jun 2003
TL;DR: The current version of TelegraphCQ is shown, which is implemented by leveraging the code base of the open source PostgreSQL database system, which found that a significant portion of the PostgreSQL code was easily reusable.
Abstract: At Berkeley, we are developing TelegraphCQ [1, 2], a dataflow system for processing continuous queries over data streams. TelegraphCQ is based on a novel, highly-adaptive architecture supporting dynamic query workloads in volatile data streaming environments. In this demonstration we show our current version of TelegraphCQ, which we implemented by leveraging the code base of the open source PostgreSQL database system. Although TelegraphCQ differs significantly from a traditional database system, we found that a significant portion of the PostgreSQL code was easily reusable. We also found the extensibility features of PostgreSQL very useful, particularly its rich data types and the ability to load user-developed functions. Challenges: As discussed in [1], sharing and adaptivity are our main techniques for implementing a continuous query system. Doing this in the codebase of a conventional database posed a number of challenges:

767 citations


"Blurring snapshots: Temporal infere..." refers background in this paper

  • ...Such trends are usually measured through continuous queries that are often registered at the remote information sources and periodically push sensed data back to the consumers [2], [9]....


Proceedings ArticleDOI
09 Jun 2003
TL;DR: This paper addresses the important issue of measuring the quality of the answers to query evaluation based upon uncertain data, and provides algorithms for efficiently pulling data from relevant sensors or moving objects in order to improve thequality of the executing queries.
Abstract: Many applications employ sensors for monitoring entities such as temperature and wind speed. A centralized database tracks these entities to enable query processing. Due to continuous changes in these values and limited resources (e.g., network bandwidth and battery power), it is often infeasible to store the exact values at all times. A similar situation exists for moving object environments that track the constantly changing locations of objects. In this environment, it is possible for database queries to produce incorrect or invalid results based upon old data. However, if the degree of error (or uncertainty) between the actual value and the database value is controlled, one can place more confidence in the answers to queries. More generally, query answers can be augmented with probabilistic estimates of the validity of the answers. In this paper we study probabilistic query evaluation based upon uncertain data. A classification of queries is made based upon the nature of the result set. For each class, we develop algorithms for computing probabilistic answers. We address the important issue of measuring the quality of the answers to these queries, and provide algorithms for efficiently pulling data from relevant sensors or moving objects in order to improve the quality of the executing queries. Extensive experiments are performed to examine the effectiveness of several data update policies.

632 citations

Book ChapterDOI
12 Oct 2003
TL;DR: In this paper, a method of learning a Bayesian model of a traveler moving through an urban environment is presented, which simultaneously learns a unified model of the traveler's current mode of transportation as well as his most likely route, in an unsupervised manner.
Abstract: We present a method of learning a Bayesian model of a traveler moving through an urban environment. This technique is novel in that it simultaneously learns a unified model of the traveler’s current mode of transportation as well as his most likely route, in an unsupervised manner. The model is implemented using particle filters and learned using Expectation-Maximization. The training data is drawn from a GPS sensor stream that was collected by the authors over a period of three months. We demonstrate that by adding more external knowledge about bus routes and bus stops, accuracy is improved.

601 citations

Frequently Asked Questions (1)
Q1. What have the authors contributed in "Blurring snapshots: temporal inference of missing and uncertain data tr-utedge-2009-005"?

In this paper, the authors support continuous monitoring of dynamic pervasive computing phenomena through the use of a series of snapshot queries. The authors evaluate the usefulness of this abstraction in its application to complex spatiotemporal pattern queries in pervasive computing networks.