TL;DR: This paper defines a decay function and a set of inference approaches for filling in missing and uncertain data in continuous queries, and evaluates the usefulness of this abstraction by applying it to complex spatio-temporal pattern queries in pervasive computing networks.
Abstract: Many pervasive computing applications continuously monitor state changes in the environment by acquiring, interpreting and responding to information from sensors embedded in the environment. However, it is extremely difficult and expensive to obtain a continuous, complete, and consistent picture of a continuously evolving operating environment. One standard technique to mitigate this problem is to employ mathematical models that compute missing data from sampled observations, thereby approximating a continuous and complete stream of information. However, existing models have traditionally not incorporated a notion of temporal validity, or the quantification of imprecision associated with inferring data values from past or future observations. In this paper, we support continuous monitoring of dynamic pervasive computing phenomena through the use of a series of snapshot queries. We define a decay function and a set of inference approaches to filling in missing and uncertain data in this continuous query. We evaluate the usefulness of this abstraction in its application to complex spatio-temporal pattern queries in pervasive computing networks.
The emergence of pervasive computing is characterized by increased instrumentation of the physical world, including small sensing devices that allow applications to query a local area using a dynamic and distributed network for support.
First, the authors must be able to provide estimates of the continuous query result between adjacent snapshot queries.
The authors' approach relies on a simple abstraction called a decay function (Section III) that quantifies the temporal validity associated with sensing a particular phenomenon.
The inference and its associated confidence can also give the application a concrete sense of the degree of uncertainty involved.
II. BACKGROUND
This paper builds on the authors' previous approaches to defining snapshot and continuous query fidelity and an associated middleware [15], [18].
These approaches approximate a continuous query using a sequence of snapshot queries evaluated over the network at discrete times.
Using the values of the host triple, the authors can derive physical and logical connectivity relations.
As one example, if the host’s context, ζ, includes the host’s location, the authors can define a physical connectivity relation based on communication range.
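To make the relation concrete, here is a minimal sketch (not drawn from the paper's implementation) in which each host's context ζ carries a location and two hosts are physically connected when they lie within an assumed communication range; the dictionary layout and the `COMM_RANGE` constant are illustrative assumptions.

```python
import math

COMM_RANGE = 30.0  # assumed communication range in meters (illustrative)

def connected(host_a, host_b):
    """Physical connectivity relation derived from the hosts' contexts (zeta)."""
    xa, ya = host_a["zeta"]["location"]
    xb, yb = host_b["zeta"]["location"]
    return math.hypot(xa - xb, ya - yb) <= COMM_RANGE
```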
The environment evolves as the network changes, values change, and hosts exchange messages.
III. MODELING UNCERTAINTY
The authors' approach to query processing allows users to pose continuous queries to an evolving network and receive a result that resembles a data stream even though it is obtained using discrete snapshot queries.
Missing and uncertain sensed items undermine this process, especially when monitoring the evolution of the data.
On a construction site, a site supervisor may use a continuous query to monitor the total number of available bricks on the site.
When their snapshot queries are not impacted by any missing or uncertain data, the stable set the trend analysis generates is the actual stable set.
The black circles represent hosts the snapshot query directly sampled; gray circles represent hosts for which data values have been inferred.
IV. TEMPORAL INFERENCE FOR CONTINUOUS QUERIES
Decay functions allow applications to define the validity of projecting information across time.
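The paper leaves the decay function's shape to the application; the sketch below assumes a simple exponential form, in which confidence is 1 at the moment of observation and falls off with elapsed time at an application-chosen rate `lam`.

```python
import math

def exponential_decay(t_query, t_sample, lam=0.1):
    """Temporal validity of projecting a sample taken at t_sample to t_query.

    Returns 1.0 at the observation time and decays toward 0 as the time
    gap grows; lam encodes how quickly the phenomenon is expected to change.
    """
    return math.exp(-lam * abs(t_query - t_sample))
```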
The authors now address the question of what the value of that projected data should be.
Specifically, the authors present a suite of simple techniques that estimate inferred values.
The authors also demonstrate how this inference can be combined with decay functions to associate confidence with inferred values.
In later sections, the authors evaluate the applicability of these inference approaches to real phenomena.
A. Nearest Neighbor Inference
For some applications, data value changes may be difficult to predict, for instance, when the underlying observed process is unknown or arbitrary.
In such cases, one technique to estimate missing data is to assume the sampled value closest in time is still correct.
The application then sums across all of the data readings to generate the total number of bricks on the site.
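A minimal sketch of this strategy, assuming each host keeps a list of `(time, value)` samples: a missing reading at time `t` is replaced by that host's sample closest in time, and the per-host estimates are then summed for the aggregate.

```python
def nearest_neighbor(samples, t):
    """Reuse the sampled value closest in time to t; samples: [(time, value)]."""
    return min(samples, key=lambda s: abs(s[0] - t))[1]

def total_bricks(per_host_samples, t):
    """Sum nearest-neighbor estimates across all hosts on the site."""
    return sum(nearest_neighbor(s, t) for s in per_host_samples.values())
```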
In interpolation, the observed values are fit to a function, where the domain is typically the time of observation and the range is the attribute's value.
As with interpolation, regression comes in several flavors ranging from simple techniques like linear regression to more complex non-linear variants.
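The following sketch contrasts the two estimators on a toy series of observations (the numbers are made up): piecewise-linear interpolation passes through every sample, while least-squares linear regression fits a trend that need not touch any of them.

```python
import numpy as np

times = np.array([0.0, 5.0, 10.0, 15.0])     # observation times (illustrative)
values = np.array([40.0, 38.0, 35.0, 30.0])  # observed attribute values

def interpolate(t):
    """Piecewise-linear interpolation: domain is time, range is the value."""
    return np.interp(t, times, values)

def regress(t):
    """Least-squares linear regression over the same observations."""
    slope, intercept = np.polyfit(times, values, 1)
    return slope * t + intercept
```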
C. Determining Inferencing Error
When employing statistical techniques like interpolation and regression, the observed data acts as the only source of ground truth and serves as input to generate a function that estimates missing or uncertain data.
To measure how well the model fits the ground truth, the authors define metrics that estimate the distance between the model and reality.
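For regression, one such metric, named later in the text, is the root mean squared error; a minimal sketch:

```python
import numpy as np

def rmse(model, times, values):
    """Root mean squared error of a model against the observed ground truth."""
    predictions = np.array([model(t) for t in times])
    return np.sqrt(np.mean((predictions - np.asarray(values)) ** 2))
```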
Similar measures of error for interpolation are difficult to define because an interpolation function, by definition, fits the sampled points with zero error.
As Fig. 7 demonstrates, wildly different interpolation functions (e.g., polynomials of different orders) can fit the same set of sampled points.
That is, from among the interpolation functions that “fit” the data, the authors favor those that minimize the function’s rate of change.
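A sketch of this selection rule under illustrative assumptions: build several interpolants that all pass through the samples and keep the one whose derivative has the smallest average magnitude over the sampled interval. The candidate set and the numeric-derivative scoring are assumptions, not the paper's procedure.

```python
import numpy as np
from scipy.interpolate import CubicSpline, interp1d

def smoothest_interpolant(times, values):
    """Among interpolants that fit the samples exactly, prefer the one
    whose rate of change is smallest over the sampled interval."""
    ts = np.linspace(times[0], times[-1], 200)
    candidates = {
        "piecewise-linear": interp1d(times, values),
        "cubic-spline": CubicSpline(times, values),
        "polynomial": np.poly1d(np.polyfit(times, values, len(times) - 1)),
    }
    def roughness(f):
        ys = np.asarray(f(ts), dtype=float)
        return np.mean(np.abs(np.gradient(ys, ts)))  # mean |f'|, numerically
    name = min(candidates, key=lambda k: roughness(candidates[k]))
    return name, candidates[name]
```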
D. Computing Confidence from Decay and Error
Confidence in an inferred value can be computed by combining decay functions with the error measures defined above.
If the application is inferring values between samples with low confidence, it should likely increase the frequency of sampling.
Things are slightly more complicated for interpolation and regression.
In addition to applying the decay function to the area between successful samples, the authors can also use information about the estimated inferencing error to strengthen or weaken their confidence in inferred values.
Minimizing error measures such as the root mean squared error of a regression or the derivative of an interpolation increases the confidence in an inferred value.
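As a sketch of one possible combination rule (the paper's exact formula is not reproduced here; this decay-times-discount form is an assumption), confidence starts from the decay of the nearest sample and is weakened as the model's estimated error grows:

```python
import math

def confidence(t, sample_times, model_error, lam=0.1, error_scale=1.0):
    """Confidence in a value inferred at time t: the decay of the nearest
    sample, discounted by the model's estimated error."""
    decay = max(math.exp(-lam * abs(t - ts)) for ts in sample_times)
    return decay / (1.0 + error_scale * model_error)
```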
V. USAGE SCENARIOS
This approach to inferring missing and uncertain data applies to a wide variety of pervasive computing applications that are increasingly supported by sensor networks.
In the introduction, the authors described an intelligent construction site, where people, vehicles, pieces of equipment, parts of the building, and assets are all equipped with sensors that can monitor conditions on the site and share information with applications.
Broadly speaking, given a series of snapshot queries formed into a continuous query, an application can issue two types of requests for information from the continuous query: point requests, which ask for the value of the continuous query at a single point in time, and range requests, which monitor the continuous query over a specified period of time.
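As a hypothetical API sketch (names and signatures are illustrative, not drawn from the paper), the two request types might look like this, where `infer` is any of the inference strategies above:

```python
def point_request(infer, t):
    """Value of the continuous query at a single point in time."""
    return infer(t)

def range_request(infer, t_start, t_end, step=1.0):
    """Monitor the continuous query over [t_start, t_end]."""
    t = t_start
    while t <= t_end:
        yield t, infer(t)
        t += step
```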
It may prove difficult to use a continuous function to infer missing values for this type of phenomenon; the authors therefore also define a third query, Q3, that more intuitively fits a continuous function.
VI. EVALUATION
The authors have prototyped their framework using OMNeT++ and the MiXiM framework [12], [13].
The authors implemented the queries given in the previous section and evaluated their framework's performance.
Requests are flooded through the network, and each node has a reply probability of 0.5, i.e., each sensor node responds to a given request with probability 0.5, answering half of the requests it receives on average.
Each experiment comprised at least 50 runs.
Trucks moved randomly; when they left the construction site, bricks were randomly added to or removed from the truck.
A. Measuring Confidence
The authors first evaluate the correctness and usefulness of applying their decay functions to determine confidence in inferred responses.
The authors executed all three queries described previously and attempted to infer missing and uncertain data for each of them using each of the three aforementioned inference strategies.
Fig. 10 plots the inferencing error versus the confidence reported by their decay function; specifically, the figure shows the results of applying linear interpolation to the results of Q3.
When their framework reports a higher confidence in an inferred value, the error of that value from the ground truth should be lower.
These initial experiments served simply to validate their query inferencing and decay function framework.
B. Cost Savings
Employing inference models allows applications to trade expense for error.
Given that their models allow them to blur across these dynamics, the authors can query the network far less frequently.
In addition, instead of querying every node in the network, the authors can intentionally skip some nodes in each snapshot query, also reducing the communication overhead.
The authors omit charts plotting the communication overhead of their approach for brevity; however, they achieve approximately a 6x reduction in communication overhead in comparison to a flooding approach that queries the network frequently enough to catch every significant change in the dynamic data (which they estimate to be every 5 seconds in their case).
This reduction in communication translates directly to a reduction in energy expenditures, a significant concern in resource constrained pervasive computing networks.
C. Application Performance
The authors next evaluate the usefulness of blurring the snapshot queries in forming a continuous query.
This is because the phenomenon under observation here is subject to very local and discrete data changes.
Fig. 12 plots the total number of bricks, an aggregate measure that sums samples from multiple nodes.
The authors expected nearest neighbor inference to be preferred in this situation due to the discrete data; it would likely be the application’s choice due to its consistent performance and simplicity in comparison to interpolation.
While their decay function provides a strong measure of confidence for individual data estimates, it is possible to develop a more sophisticated metric for confidence in aggregate estimates that combine data values to produce a combinatorial measure (e.g., a sum, average, maximum, etc.).
VII. RELATED WORK
The authors' work relates to a variety of approaches, from querying sensor networks to understanding uncertainty in databases.
The authors' approach can execute temporal queries capable of understanding trends in data, even in the presence of missing and uncertain information.
The authors' work also overlaps with systems that use statistical models for different purposes.
These approaches both focus on observing phenomena with predictably low dynamics (e.g., sensing temperature reduction at night) rather than highly dynamic phenomena.
This work does not rely on a priori models fit to the sensed data but on the live data itself.
VIII. CONCLUSIONS
Pervasive computing is increasingly supported by sensor networks that perform continuous monitoring of network or physical phenomena.
Continuous monitoring can be expensive in terms of communication and energy costs.
Therefore, continuous queries in pervasive computing applications inevitably contain missing or uncertain data items.
Attempting to understand trends and patterns in measuring these continuous phenomena is hindered by this inherent uncertainty.
The authors designed a framework that employs statistical modeling both to infer missing and uncertain data and to understand the degree of confidence an application should place in that inferred information.
TL;DR: In this article, a method of learning a Bayesian model of a traveler moving through an urban environment is presented, which simultaneously learns a unified model of the traveler's current mode of transportation as well as his most likely route, in an unsupervised manner.
Abstract: We present a method of learning a Bayesian model of a traveler moving through an urban environment. This technique is novel in that it simultaneously learns a unified model of the traveler’s current mode of transportation as well as his most likely route, in an unsupervised manner. The model is implemented using particle filters and learned using Expectation-Maximization. The training data is drawn from a GPS sensor stream that was collected by the authors over a period of three months. We demonstrate that by adding more external knowledge about bus routes and bus stops, accuracy is improved.
TL;DR: This paper studies a novel way of resolving context inconsistency with the aim of minimizing such side effect for an application, and presents an efficient framework to minimize it during context inconsistency resolution.
Abstract: Applications in ubiquitous computing adapt their behavior based on contexts. The adaptation can be faulty if the contexts are subject to inconsistency. Various techniques have been proposed to identify key contexts from inconsistencies. By removing these contexts, an application is expected to run with inconsistencies resolved. However, existing practice largely overlooks an application’s internal requirements on using these contexts for adaptation. It may lead to unexpected side effect from inconsistency resolution. This paper studies a novel way of resolving context inconsistency with the aim of minimizing such side effect for an application. We model and analyze the side effect for rule-based ubiquitous applications, and experimentally measure and compare it for various inconsistency resolution strategies. We confirm the significance of such side effect if not controlled, and present an efficient framework to minimize it during context inconsistency resolution.
16 citations
Cites background from "Blurring snapshots: Temporal infere..."
...Context inconsistency may also come from the failure of synchronizing all contexts [22] or the absence of a global consistency of all environmental conditions [19]....
TL;DR: The main objective of this work is to develop a generic scheduling mechanism for collaborative sensors to achieve the error-bounded scheduling control in monitoring applications and show that the approach is effective and efficient in tracking the dramatic temperature shift in dynamic environments.
Abstract: Due to the limited power constraint in sensors, dynamic scheduling with data quality management is strongly preferred in the practical deployment of long-term wireless sensor network applications. We could reduce energy consumption by turning off (i.e., duty cycling) sensors, however, at the cost of low sensing fidelity due to the sensing gaps introduced. Typical techniques treat data quality management as an isolated process for individual nodes, and existing techniques have investigated how to collaboratively reduce the sensing gap in the space and time domains; however, none of them provides a rigorous approach to confine sensing error within a desirable bound when seeking to optimize the tradeoff between energy consumption and accuracy of predictions. In this paper, we propose and evaluate a scheduling algorithm based on error inference between collaborative sensor pairs, called CIES. Within a node, we use a sensing probability bound to control tolerable sensing error. Within a neighborhood, nodes can trigger additional sensing activities of other nodes when the inferred sensing error has aggregately exceeded the tolerance. The main objective of this work is to develop a generic scheduling mechanism for collaborative sensors to achieve error-bounded scheduling control in monitoring applications. We conducted simulations to investigate system performance using historical soil temperature data from the Wisconsin-Minnesota area. The simulation results demonstrate that the system error is confined within the specified error tolerance bounds and that a maximum of 60 percent energy savings can be achieved when CIES is compared to several fixed-probability sensing schemes such as eSense. Further simulation results show that the CIES scheme achieves improved performance when comparing the metric of prediction error with baseline schemes. We further validated the simulation and algorithms by constructing a lab test bench to emulate actual environment monitoring applications. The results show that our approach is effective and efficient in tracking dramatic temperature shifts in dynamic environments.
TL;DR: This work proposes a collaborative scheme called CIES, based on the novel concept of error inference between collaborative sensor pairs, which is effective and efficient in tracking the dramatic temperature shift in highly dynamic environments.
Abstract: Energy constraint is a critical hurdle hindering the practical deployment of long-term wireless sensor network applications. Turning off (i.e., duty cycling) sensors could reduce energy consumption, however at the cost of low sensing fidelity due to the sensing gaps introduced. Existing techniques have studied how to collaboratively reduce the sensing gap in space and time; however, none of them provides a rigorous approach to confine sensing error within desirable bounds. In this work, we propose a collaborative scheme called CIES, based on the novel concept of error inference between collaborative sensor pairs. Within a node, we use a sensing probability bound to control tolerable sensing error. Within a neighborhood, nodes can trigger additional sensing activities of other nodes when the inferred sensing error has aggregately exceeded the tolerance. We conducted simulations to investigate system performance using historical soil temperature data from the Wisconsin-Minnesota area. The simulation results demonstrate that the system error is confined within the specified error tolerance bounds and that a maximum of 60 percent energy savings can be achieved when CIES is compared to several fixed-probability sensing schemes such as eSense. We further validated the simulation and algorithms by constructing a lab test bench to emulate actual environment monitoring applications. The results show that our approach is effective and efficient in tracking dramatic temperature shifts in highly dynamic environments.
11 citations
Cites result from "Blurring snapshots: Temporal infere..."
...The observations that sensor nodes demonstrate spatial correlations found in [16], [17], [18] are also supported by our preliminary experiments described in Section III....
TL;DR: A model of context inference in pervasive computing, the associated research challenges, and the significant practical impact of intelligent use of such context in pervasive health-care environments are described.
Abstract: Many emerging pervasive health-care applications require the determination of a variety of context attributes of an individual's activities and medical parameters and her surrounding environment. Context is a high-level representation of an entity's state, which captures activities, relationships, capabilities, etc. In practice, high-level context measures are often difficult to sense from a single data source and must instead be inferred using multiple sensors embedded in the environment. A key challenge in deploying context-driven health-care applications involves energy-efficient determination or inference of high-level context information from low-level sensor data streams. Because this abstraction has the potential to reduce the quality of the context information, it is also necessary to model the tradeoff between the cost of sensor data collection and the quality of the inferred context. This article describes a model of context inference in pervasive computing, the associated research challenges, and the significant practical impact of intelligent use of such context in pervasive health-care environments.
TL;DR: In this article, the authors explore and evaluate the use of directed diffusion for a simple remote-surveillance sensor network analytically and experimentally and demonstrate that directed diffusion can achieve significant energy savings and can outperform idealized traditional schemes under the investigated scenarios.
Abstract: Advances in processor, memory, and radio technology will enable small and cheap nodes capable of sensing, communication, and computation. Networks of such nodes can coordinate to perform distributed sensing of environmental phenomena. In this paper, we explore the directed-diffusion paradigm for such coordination. Directed diffusion is data-centric in that all communication is for named data. All nodes in a directed-diffusion-based network are application aware. This enables diffusion to achieve energy savings by selecting empirically good paths and by caching and processing data in-network (e.g., data aggregation). We explore and evaluate the use of directed diffusion for a simple remote-surveillance sensor network analytically and experimentally. Our evaluation indicates that directed diffusion can achieve significant energy savings and can outperform idealized traditional schemes (e.g., omniscient multicast) under the investigated scenarios.
TL;DR: This paper enriches interactive sensor querying with statistical modeling techniques, and demonstrates that such models can help provide answers that are both more meaningful and, by introducing approximations with probabilistic confidences, significantly more efficient to compute in both time and energy.
Abstract: Declarative queries are proving to be an attractive paradigm for interacting with networks of wireless sensors. The metaphor that "the sensornet is a database" is problematic, however, because sensors do not exhaustively represent the data in the real world. In order to map the raw sensor readings onto physical reality, a model of that reality is required to complement the readings. In this paper, we enrich interactive sensor querying with statistical modeling techniques. We demonstrate that such models can help provide answers that are both more meaningful, and, by introducing approximations with probabilistic confidences, significantly more efficient to compute in both time and energy. Utilizing the combination of a model and live data acquisition raises the challenging optimization problem of selecting the best sensor readings to acquire, balancing the increase in the confidence of our answer against the communication and data acquisition costs in the network. We describe an exponential time algorithm for finding the optimal solution to this optimization problem, and a polynomial-time heuristic for identifying solutions that perform well in practice. We evaluate our approach on several real-world sensor-network data sets, taking into account the real measured data and communication quality, demonstrating that our model-based approach provides a high-fidelity representation of the real phenomena and leads to significant performance gains versus traditional data acquisition techniques.
1,218 citations
"Blurring snapshots: Temporal infere..." refers background in this paper
...Model-driven approaches that estimate missing data using mathematical models can alleviate these uncertainties [6], [7]....
TL;DR: The current version of TelegraphCQ is shown, which is implemented by leveraging the code base of the open source PostgreSQL database system, which found that a significant portion of the PostgreSQL code was easily reusable.
Abstract: At Berkeley, we are developing TelegraphCQ [1, 2], a dataflow system for processing continuous queries over data streams. TelegraphCQ is based on a novel, highly-adaptive architecture supporting dynamic query workloads in volatile data streaming environments. In this demonstration we show our current version of TelegraphCQ, which we implemented by leveraging the code base of the open source PostgreSQL database system. Although TelegraphCQ differs significantly from a traditional database system, we found that a significant portion of the PostgreSQL code was easily reusable. We also found the extensibility features of PostgreSQL very useful, particularly its rich data types and the ability to load user-developed functions. Challenges: As discussed in [1], sharing and adaptivity are our main techniques for implementing a continuous query system. Doing this in the codebase of a conventional database posed a number of challenges:
767 citations
"Blurring snapshots: Temporal infere..." refers background in this paper
...Such trends are usually measured through continuous queries that are often registered at the remote information sources and periodically push sensed data back to the consumers [2], [9]....
TL;DR: This paper addresses the important issue of measuring the quality of the answers to query evaluation based upon uncertain data, and provides algorithms for efficiently pulling data from relevant sensors or moving objects in order to improve the quality of the executing queries.
Abstract: Many applications employ sensors for monitoring entities such as temperature and wind speed. A centralized database tracks these entities to enable query processing. Due to continuous changes in these values and limited resources (e.g., network bandwidth and battery power), it is often infeasible to store the exact values at all times. A similar situation exists for moving object environments that track the constantly changing locations of objects. In this environment, it is possible for database queries to produce incorrect or invalid results based upon old data. However, if the degree of error (or uncertainty) between the actual value and the database value is controlled, one can place more confidence in the answers to queries. More generally, query answers can be augmented with probabilistic estimates of the validity of the answers. In this paper we study probabilistic query evaluation based upon uncertain data. A classification of queries is made based upon the nature of the result set. For each class, we develop algorithms for computing probabilistic answers. We address the important issue of measuring the quality of the answers to these queries, and provide algorithms for efficiently pulling data from relevant sensors or moving objects in order to improve the quality of the executing queries. Extensive experiments are performed to examine the effectiveness of several data update policies.
Q1. What have the authors contributed in "Blurring snapshots: temporal inference of missing and uncertain data tr-utedge-2009-005"?
In this paper, the authors support continuous monitoring of dynamic pervasive computing phenomena through the use of a series of snapshot queries. The authors evaluate the usefulness of this abstraction in its application to complex spatio-temporal pattern queries in pervasive computing networks.