# Blurring snapshots: Temporal inference of missing and uncertain data

TL;DR: This paper defines a decay function and a set of inference approaches for filling in missing and uncertain data in continuous queries built from a series of snapshot queries, and evaluates the usefulness of this abstraction for complex spatio-temporal pattern queries in pervasive computing networks.

Abstract: Many pervasive computing applications continuously monitor state changes in the environment by acquiring, interpreting and responding to information from sensors embedded in the environment. However, it is extremely difficult and expensive to obtain a continuous, complete, and consistent picture of a continuously evolving operating environment. One standard technique to mitigate this problem is to employ mathematical models that compute missing data from sampled observations thereby approximating a continuous and complete stream of information. However, existing models have traditionally not incorporated a notion of temporal validity, or the quantification of imprecision associated with inferring data values from past or future observations. In this paper, we support continuous monitoring of dynamic pervasive computing phenomena through the use of a series of snapshot queries. We define a decay function and a set of inference approaches to filling in missing and uncertain data in this continuous query. We evaluate the usefulness of this abstraction in its application to complex spatio-temporal pattern queries in pervasive computing networks.

## Summary

### Introduction

- The emergence of pervasive computing is characterized by increased instrumentation of the physical world, including small sensing devices that allow applications to query their local area through a dynamic, distributed network.
- A first requirement is to provide estimates of the continuous query result between adjacent snapshot queries.
- The authors' approach relies on a simple abstraction called a decay function (Section III) that quantifies the temporal validity associated with sensing a particular phenomenon.
- The inference and its associated confidence can also give the application a concrete sense of the degree of uncertainty.

### II. BACKGROUND

- This paper builds on the authors' previous work defining snapshot and continuous query fidelity and an associated middleware [15], [18].
- These approaches approximate a continuous query using a sequence of snapshot queries evaluated over the network at discrete times.
- Each host is described by a triple that includes its context ζ; using the values of this host triple, the authors can derive physical and logical connectivity relations.
- As one example, if the host’s context, ζ, includes the host’s location, the authors can define a physical connectivity relation based on communication range.
- The environment evolves as the network changes, values change, and hosts exchange messages.
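A physical connectivity relation of this kind can be sketched in Python. This is a minimal illustration, not the authors' middleware: the `Host` fields (an identifier, a context reduced to a 2-D location, and a sensed value) and the helper names are hypothetical, and "within communication range" is modeled as Euclidean distance at most `comm_range`.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    """Hypothetical host triple: identifier, context (reduced to a 2-D location), sensed value."""
    ident: str
    location: tuple[float, float]
    value: float


def physically_connected(a: Host, b: Host, comm_range: float) -> bool:
    """Two distinct hosts are connected when their locations are within communication range."""
    return a.ident != b.ident and math.dist(a.location, b.location) <= comm_range


def neighbors(host: Host, hosts: list[Host], comm_range: float) -> list[str]:
    """Identifiers of every host physically connected to `host`."""
    return [h.ident for h in hosts if physically_connected(host, h, comm_range)]


hosts = [
    Host("h1", (0.0, 0.0), 40.0),
    Host("h2", (3.0, 4.0), 25.0),   # 5 units from h1
    Host("h3", (10.0, 0.0), 10.0),  # 10 units from h1, about 8.06 from h2
]
print(neighbors(hosts[0], hosts, 5.0))  # ['h2']
```

The relation is symmetric by construction; a logical connectivity relation could swap the distance test for any other predicate over the hosts' triples.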

### III. MODELING UNCERTAINTY

- The authors' approach to query processing lets users pose continuous queries to an evolving network and receive a result that resembles a data stream, even though it is obtained through discrete snapshot queries.
- Missing and uncertain sensed items are a particular problem for this process, especially when monitoring how the data evolve.
- On a construction site, a site supervisor may use a continuous query to monitor the total number of available bricks on the site.
- When the snapshot queries are not affected by any missing or uncertain data, the stable set generated by trend analysis is the actual stable set.
- In the paper's illustration, black circles represent hosts that the snapshot query sampled directly; gray circles represent hosts whose data values were inferred.
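To make the sampled/inferred distinction concrete, the following Python sketch represents a continuous query as a series of snapshots over the brick-counting example. The readings and host names are hypothetical; `None` marks a host that did not reply. Summing only the replies shows how missing data can masquerade as a trend.

```python
# Hypothetical continuous query: snapshot time (s) -> {host id: brick count, or None if no reply}.
snapshots = {
    0: {"h1": 40, "h2": 25, "h3": 10},
    10: {"h1": 38, "h2": None, "h3": 10},
    20: {"h1": None, "h2": 22, "h3": 12},
}


def classify(snaps):
    """Split (time, host) cells into directly sampled ("black") and to-be-inferred ("gray")."""
    sampled, missing = [], []
    for t, readings in sorted(snaps.items()):
        for host, value in sorted(readings.items()):
            (sampled if value is not None else missing).append((t, host))
    return sampled, missing


def naive_totals(snaps):
    """Sum only the replies in each snapshot, silently ignoring missing hosts."""
    return {t: sum(v for v in readings.values() if v is not None)
            for t, readings in sorted(snaps.items())}


sampled, missing = classify(snapshots)
print(missing)                  # [(10, 'h2'), (20, 'h1')]
print(naive_totals(snapshots))  # {0: 75, 10: 48, 20: 34}
```

The apparent drop from 75 to 34 bricks is largely an artifact of the two non-replying hosts, which is the distortion that inferring the gray cells is meant to prevent.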

### IV. TEMPORAL INFERENCE FOR CONTINUOUS QUERIES

- Decay functions allow applications to define the validity of projecting information across time.
- The authors now address the question of what the value of that projected data should be.
- Specifically, the authors present a suite of simple techniques that estimate inferred values.
- The authors also demonstrate how this inference can be combined with decay functions to associate confidence with inferred values.
- In later sections, the authors evaluate the applicability of these inference approaches to real phenomena.
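This summary does not fix a particular functional form for the decay function, so the Python sketch below assumes two common shapes: a linear decay that reaches zero at a validity horizon, and an exponential decay parameterized by a half-life. Both map the time elapsed since an observation to a confidence in [0, 1]; the names and parameters are illustrative.

```python
def linear_decay(dt: float, horizon: float) -> float:
    """Confidence in a value projected |dt| seconds from its observation.

    Falls linearly from 1 at the observation to 0 at `horizon` (an assumed
    validity window) and stays at 0 beyond it. Works for past or future projection.
    """
    if horizon <= 0:
        raise ValueError("horizon must be positive")
    return max(0.0, 1.0 - abs(dt) / horizon)


def exponential_decay(dt: float, half_life: float) -> float:
    """Confidence that halves every `half_life` seconds; never quite reaches 0."""
    if half_life <= 0:
        raise ValueError("half_life must be positive")
    return 0.5 ** (abs(dt) / half_life)


print(linear_decay(5, 10), linear_decay(-15, 10))  # 0.5 0.0
print(exponential_decay(20, 10))                   # 0.25
```

A slowly changing phenomenon (a stockpile, say) would warrant a long horizon or half-life; a fast-moving one, a short one.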

### A. Nearest Neighbor Inference

- For some applications, data value changes may be difficult to predict, for instance when the underlying process observed is unknown or arbitrary.
- In such cases, one technique to estimate missing data is to assume the sampled value closest in time is still correct.
- In the brick-counting example, the application then sums all of the data readings, sampled and inferred, to generate the total number of bricks on the site.
- Other techniques fit a function to the observations. In interpolation, the observed values are fit to a function whose domain is typically the time of observation and whose range is the attribute's value.
- Like interpolation, regression comes in several flavors, ranging from simple techniques such as linear regression to more complex non-linear variants.
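The three inference techniques can be sketched over a single attribute sampled at sorted times. This is a plain-Python illustration under simple assumptions (ties in nearest neighbor go to the earlier sample, interpolation is piecewise linear and clamps outside the sampled range, regression is ordinary least squares); the function names are hypothetical, not the authors' implementation.

```python
from bisect import bisect_left


def nearest_neighbor(times, values, t):
    """Assume the sample closest in time to t is still correct (ties go to the earlier sample)."""
    k = min(range(len(times)), key=lambda i: (abs(times[i] - t), times[i]))
    return values[k]


def linear_interpolate(times, values, t):
    """Piecewise-linear interpolation between sorted samples; clamps outside the sampled range."""
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    j = bisect_left(times, t)
    if times[j] == t:
        return values[j]
    t0, t1, v0, v1 = times[j - 1], times[j], values[j - 1], values[j]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def linear_regression(times, values):
    """Ordinary least-squares line through the samples; returns (slope, intercept)."""
    n = len(times)
    mean_t, mean_v = sum(times) / n, sum(values) / n
    sxx = sum((x - mean_t) ** 2 for x in times)
    sxy = sum((x - mean_t) * (y - mean_v) for x, y in zip(times, values))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_t


times, values = [0, 10, 20], [40, 38, 30]
print(nearest_neighbor(times, values, 6))     # 38
print(linear_interpolate(times, values, 15))  # 34.0
slope, intercept = linear_regression(times, values)
print(slope, intercept, slope * 15 + intercept)  # -0.5 41.0 33.5
```

Note that the interpolant passes through every sample, while the regression line (41.0 at t = 0) does not; that difference matters when defining inferencing error.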

### C. Determining Inferencing Error

- When employing statistical techniques like interpolation and regression, the observed data acts as the only source of ground truth and serves as input to generate a function that estimates missing or uncertain data.
- To measure how well a model fits the ground truth, the authors define metrics that estimate the distance between the model and reality (for regression, for example, the root mean squared error).
- Similar measures of error for interpolation are difficult to define because the interpolation function is defined such that there is no error in fitting the sampled points to the function.
- As Fig. 7 demonstrates, wildly different interpolation functions (e.g., polynomials of different orders) can fit the same set of sampled points.
- Because goodness of fit cannot distinguish among such functions, the authors instead prefer smoother ones: from among the interpolation functions that "fit" the data, they favor those that minimize the function's rate of change.
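The Python sketch below illustrates both points on a hypothetical step-like series. Regression has a natural residual error (root mean squared error), whereas two different interpolants, piecewise linear and a cubic through the same four points, both have zero error at the samples yet differ in their maximum rate of change. The finite-difference smoothness measure is an assumption chosen for illustration.

```python
import math


def rmse(times, values, predict):
    """Root mean squared error of a model `predict` against the sampled points."""
    return math.sqrt(sum((predict(t) - v) ** 2 for t, v in zip(times, values)) / len(times))


def piecewise_linear(times, values):
    """Interpolant that joins consecutive samples with straight lines."""
    def f(t):
        for t0, t1, v0, v1 in zip(times, times[1:], values, values[1:]):
            if t0 <= t <= t1:
                return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
        raise ValueError("t is outside the sampled range")
    return f


def lagrange(times, values):
    """Polynomial interpolant of degree len(times) - 1 (here, a cubic)."""
    def p(t):
        total = 0.0
        for i, (ti, vi) in enumerate(zip(times, values)):
            term = float(vi)
            for j, tj in enumerate(times):
                if j != i:
                    term *= (t - tj) / (ti - tj)
            total += term
        return total
    return p


def max_rate_of_change(f, start, end, steps=1000):
    """Approximate max |f'| on [start, end] using finite differences on a uniform grid."""
    grid = [start + (end - start) * k / steps for k in range(steps + 1)]
    return max(abs(f(b) - f(a)) / (b - a) for a, b in zip(grid, grid[1:]))


times, values = [0, 10, 20, 30], [0, 0, 10, 10]
linear, cubic = piecewise_linear(times, values), lagrange(times, values)
print(rmse(times, values, linear), rmse(times, values, cubic))  # 0.0 0.0
print(rmse(times, values, lambda t: 0.4 * t - 1))               # ~2.236 (least-squares line)
print(max_rate_of_change(linear, 0, 30), max_rate_of_change(cubic, 0, 30))
```

Under the smoothness criterion described above, the piecewise-linear interpolant (maximum rate of change about 1.0) would be preferred over the cubic (about 1.17), even though both fit the samples exactly.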

### D. Computing Confidence from Decay and Error

- Confidence in an inferred value combines the decay function with the error measures defined above.
- If the application is inferring values between samples with low confidence, it should likely increase the frequency of sampling.
- Things are slightly more complicated for interpolation and regression.
- In addition to applying the decay function to the area between successful samples, the authors can also use information about the estimated inferencing error to strengthen or weaken their confidence in inferred values.
- A smaller error, such as a lower root mean squared error for a regression or a smaller derivative for an interpolation, increases confidence in an inferred value.
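This summary describes the combination qualitatively, so the Python sketch below assumes one simple rule: start from a linear decay based on the distance to the closest successful sample, then divide by `1 + error / error_scale`, so that a larger RMSE (regression) or rate of change (interpolation) lowers confidence. The formula, the threshold, and all names are hypothetical.

```python
def linear_decay(dt: float, horizon: float) -> float:
    """Assumed decay: 1 at a sample, falling linearly to 0 at `horizon` seconds away."""
    return max(0.0, 1.0 - abs(dt) / horizon)


def confidence(t, sample_times, horizon, error=0.0, error_scale=1.0):
    """Hypothetical confidence in a value inferred at time t.

    With error left at 0, only the decay applies (e.g. nearest neighbor inference).
    For interpolation or regression, pass an error measure such as RMSE or max rate of change.
    """
    dt = min(abs(t - s) for s in sample_times)
    return linear_decay(dt, horizon) / (1.0 + error / error_scale)


def needs_more_sampling(t, sample_times, horizon, threshold=0.5, **kw):
    """Suggest raising the sampling rate when an inferred value falls below `threshold`."""
    return confidence(t, sample_times, horizon, **kw) < threshold


samples = [0, 10, 20]
print(confidence(12, samples, horizon=10))             # 0.8 (decay only)
print(confidence(12, samples, horizon=10, error=1.0))  # 0.4 (decay halved by error)
print(needs_more_sampling(15, samples, horizon=10, threshold=0.6))  # True
```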

### V. USAGE SCENARIOS

- This approach to inferring missing and uncertain data applies to a wide variety of pervasive computing applications that are increasingly supported by sensor networks.
- In the introduction, the authors described an intelligent construction site, where people, vehicles, pieces of equipment, parts of the building, and assets are all equipped with sensors that can monitor conditions on the site and share information with applications.
- Broadly speaking, given a series of snapshot queries formed into a continuous query, an application can issue two types of requests for information from the continuous query: point requests, for a value of the continuous query at a single point in time, and range requests, that monitor the continuous query over a specified period of time.
- A continuous function may be poorly suited to inferring missing values for this type of phenomenon, so the authors also define a third query, Q3, that more naturally fits a continuous function.
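The two request types can be sketched as thin wrappers over an inference function. In this hypothetical Python example, each answer pairs an inferred value with a decay-based confidence; nearest neighbor inference and a linear decay are assumptions, integer timestamps keep the range request exact, and the function names are illustrative rather than the authors' API.

```python
def nearest_with_age(times, values, t):
    """Nearest-neighbor estimate at t plus its distance in time from the sample used."""
    k = min(range(len(times)), key=lambda i: (abs(times[i] - t), times[i]))
    return values[k], abs(times[k] - t)


def point_request(times, values, t, horizon):
    """Value of the continuous query at a single time t, with a linear-decay confidence."""
    value, age = nearest_with_age(times, values, t)
    return value, max(0.0, 1.0 - age / horizon)


def range_request(times, values, start, end, step, horizon):
    """Monitor the continuous query over [start, end] by issuing a point request every `step`."""
    return [(t, *point_request(times, values, t, horizon))
            for t in range(start, end + 1, step)]


times, values = [0, 10, 20], [40, 38, 30]
print(point_request(times, values, 12, horizon=10))  # (38, 0.8)
for row in range_request(times, values, 0, 20, 5, horizon=10):
    print(row)
```

The dips to 0.5 at the midpoints between samples show where a range request is least trustworthy.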

### VI. EVALUATION

- The authors have prototyped their framework using OMNeT++ and the MiXiM framework [12], [13].
- The authors implemented the queries given in the previous section and evaluated their framework's performance.
- Requests are flooded through the network, and each node has a reply probability of 0.5, i.e., every sensor node responds to half of the requests it receives.
- Each experiment was repeated at least 50 times.
- Trucks moved randomly; when they left the construction site, bricks were randomly added to or removed from the truck.
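The reply model described above can be mimicked in a few lines of Python. This is only a sketch: it assumes each node replies independently with probability `reply_prob` and leaves out the network simulation itself (OMNeT++/MiXiM) and the truck mobility; node names and readings are hypothetical.

```python
import random


def lossy_snapshot(readings, reply_prob, rng):
    """One snapshot query: each node's reading survives with probability reply_prob, else None."""
    return {node: (v if rng.random() < reply_prob else None) for node, v in readings.items()}


rng = random.Random(42)  # fixed seed so runs are repeatable
readings = {f"n{i}": i for i in range(100)}
snaps = [lossy_snapshot(readings, 0.5, rng) for _ in range(200)]
replied = sum(v is not None for s in snaps for v in s.values())
print(replied / (100 * 200))  # close to 0.5
```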

### A. Measuring Confidence

- The authors first evaluate the correctness and usefulness of applying their decay functions to determine confidence in inferred responses.
- The authors executed all three queries described previously and attempted to infer missing and uncertain data for each of them using each of the three aforementioned inference strategies.
- Fig. 10 plots the inferencing error against the confidence reported by the decay function, specifically for linear interpolation applied to the results of Q3.
- When their framework reports a higher confidence in an inferred value, the error of that value from the ground truth should be lower.
- These initial experiments served simply to validate their query inferencing and decay function framework.
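One way to check the property stated above on logged (confidence, error) pairs is a rank correlation: if higher confidence tracks lower error, Spearman's coefficient should be strongly negative. The Python sketch below computes it with average ranks for ties; the sample pairs are made up, and the summary does not say which statistic, if any, the authors used.

```python
def ranks(xs):
    """1-based ranks of xs, averaging the ranks of tied values."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def spearman(a, b):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


confidences = [1.0, 0.8, 0.5, 0.2]  # hypothetical logged values
errors = [0.1, 0.3, 0.9, 2.0]
print(spearman(confidences, errors))  # -1.0
```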

### B. Cost Savings

- Employing inference models allows applications to trade expense for error.
- Because the models let the authors blur across the data's dynamics between snapshots, they can query the network far less frequently.
- In addition, instead of querying every node in the network, the authors can intentionally skip some nodes in each snapshot query, also reducing the communication overhead.
- The authors omit charts plotting the communication overhead of their approach for brevity; however, they achieve approximately a 6x reduction in communication overhead in comparison to a flooding approach that queries the network frequently enough to catch every significant change in the dynamic data (which they estimate to be every 5 seconds in their case).
- This reduction in communication translates directly to a reduction in energy expenditures, a significant concern in resource constrained pervasive computing networks.
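The two savings multiply, which a back-of-the-envelope Python calculation makes clear. The parameters below are hypothetical (the summary gives only the 5-second baseline and the roughly 6x result, not the authors' reduced sampling period or skip rate), and the cost model counts one request/reply exchange per queried node per snapshot, ignoring multi-hop forwarding.

```python
def snapshot_messages(duration_s: int, period_s: int, nodes: int,
                      fraction_queried: float = 1.0) -> int:
    """Hypothetical cost: one request/reply exchange per queried node per snapshot."""
    snapshots = duration_s // period_s
    return snapshots * round(nodes * fraction_queried)


baseline = snapshot_messages(3600, period_s=5, nodes=100)                        # query every 5 s
blurred = snapshot_messages(3600, period_s=15, nodes=100, fraction_queried=0.5)  # assumed values
print(baseline, blurred, baseline / blurred)  # 72000 12000 6.0
```

Here a 3x longer period combined with skipping half the nodes gives 6x; other combinations could produce the same figure.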

### C. Application Performance

- The authors next evaluate the usefulness of blurring the snapshot queries in forming a continuous query.
- These results reflect the fact that the phenomenon under observation here is subject to very local, discrete data changes.
- Fig. 12 plots the total number of bricks, an aggregate measure that sums samples from multiple nodes.
- The authors expected nearest neighbor inference to be preferred in this situation because of the discrete data; its consistent performance and simplicity compared with interpolation would likely make it the application's choice.
- While the decay function provides a strong measure of confidence for individual data estimates, a more sophisticated metric could be developed for confidence in aggregate estimates that combine data values into a single measure (e.g., a sum, average, or maximum).
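For reference, the Python sketch below shows two simple aggregate confidences for the total-brick sum, the kind of baseline a more sophisticated metric would improve on: the minimum of the contributing confidences (conservative) and a value-weighted mean. Both rules and the sample data are assumptions, not the authors' metric.

```python
def aggregate_sum(estimates):
    """estimates: {host: (value, confidence)} -> (total, min confidence, value-weighted confidence)."""
    total = sum(v for v, _ in estimates.values())
    min_conf = min(c for _, c in estimates.values())
    weighted = sum(v * c for v, c in estimates.values()) / total if total else 0.0
    return total, min_conf, weighted


# Hypothetical: h1 sampled directly, h2 and h3 inferred with decayed confidence.
estimates = {"h1": (40, 1.0), "h2": (25, 0.6), "h3": (10, 0.9)}
print(aggregate_sum(estimates))  # (75, 0.6, ~0.853)
```

The minimum ignores how much each host contributes to the total, while the weighted mean lets a large, well-sampled reading mask a poorly inferred one; neither captures correlation between errors.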

### VIII. CONCLUSIONS

- Pervasive computing is increasingly supported by sensor networks that perform continuous monitoring of network or physical phenomena.
- Continuous monitoring can be expensive in terms of communication and energy costs.
- Because applications cannot afford to sample continuously, continuous queries in pervasive computing applications inevitably contain missing or uncertain data items.
- This inherent uncertainty hinders efforts to understand trends and patterns in these continuous phenomena.
- The authors designed a framework that employs statistical modeling to both infer missing and uncertain data and to understand the degree of confidence an application should place in that inferred information.

