Q2. How are accuracy guarantees for a query usually stated?
Accuracy guarantees are stated in terms of a pair of user-specified parameters, $\varepsilon$ and $\delta$. The error in answering a query is within a factor of $\varepsilon$ (for a point query, an additive error of at most $\varepsilon\|a\|_1$) with probability at least $1 - \delta$.
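To show how $\varepsilon$ and $\delta$ translate into sketch size, here is a minimal Python sketch. It assumes the usual Count-Min sizing: width $w = \lceil e/\varepsilon \rceil$ and depth $d = \lceil \ln(1/\delta) \rceil$. Under that sizing, a point-query estimate on non-negative data overshoots the true count by more than $\varepsilon\|a\|_1$ with probability at most $\delta$. The function name `cm_dimensions` is hypothetical.

```python
import math


def cm_dimensions(eps: float, delta: float) -> tuple[int, int]:
    """Return (width, depth) for a Count-Min sketch.

    Assumes the standard sizing w = ceil(e / eps), d = ceil(ln(1 / delta)).
    """
    if not (0 < eps < 1 and 0 < delta < 1):
        raise ValueError("eps and delta must lie in (0, 1)")
    width = math.ceil(math.e / eps)         # counters per row: controls error size
    depth = math.ceil(math.log(1 / delta))  # number of rows: controls failure probability
    return width, depth


w, d = cm_dimensions(0.01, 0.01)
print(w, d, w * d)  # 272 5 1360
```

With $\varepsilon = \delta = 0.01$ this gives 5 rows of 272 counters, 1360 counters in total, independent of $n$.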
Q3. What are the advantages of the Count-Min sketch?
The sketch has several advantages: (1) the space used is proportional to $1/\varepsilon$; (2) the update time is significantly sublinear in the size of the sketch; (3) it requires only pairwise-independent hash functions, which are simple to construct; (4) the same sketch can be used for several different queries and multiple applications; and (5) all the constants are explicit and small.
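The properties listed above can be seen in a small Python sketch. This is illustrative, not the authors' implementation. The class name `CountMinSketch`, the Mersenne-prime modulus, and the seeded `random.Random` are assumptions. Each row hashes with $h(x) = ((\alpha x + \beta) \bmod p) \bmod w$, a pairwise-independent family. Each update touches exactly one counter per row, so its cost is $O(d)$ rather than $O(wd)$.

```python
import math
import random


class CountMinSketch:
    """Minimal Count-Min sketch over integer items (illustrative only)."""

    P = (1 << 61) - 1  # Mersenne prime; assumes item ids are integers below P

    def __init__(self, eps: float, delta: float, seed: int = 0):
        self.w = math.ceil(math.e / eps)
        self.d = math.ceil(math.log(1 / delta))
        rng = random.Random(seed)
        # One (alpha, beta) pair per row: h(x) = ((alpha*x + beta) mod P) mod w
        self.hashes = [(rng.randrange(1, self.P), rng.randrange(self.P))
                       for _ in range(self.d)]
        self.table = [[0] * self.w for _ in range(self.d)]

    def _bucket(self, row: int, x: int) -> int:
        alpha, beta = self.hashes[row]
        return ((alpha * x + beta) % self.P) % self.w

    def update(self, x: int, c: int = 1) -> None:
        # Exactly one counter per row changes: O(d) work, independent of w.
        for j in range(self.d):
            self.table[j][self._bucket(j, x)] += c

    def point_query(self, x: int) -> int:
        # With non-negative counts every row overestimates, so take the minimum.
        return min(self.table[j][self._bucket(j, x)] for j in range(self.d))


cms = CountMinSketch(eps=0.01, delta=0.01)
for item in [3, 3, 3, 7, 7, 42]:
    cms.update(item)
print(cms.point_query(3))  # at least 3; larger only if item 3 collides in every row
```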
Q4. How many dyadic range queries does a range query reduce to?
Any range query can be reduced to at most $2\log_2 n$ dyadic range queries, each of which can in turn be reduced to a single point query.
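The reduction to dyadic ranges can be illustrated with a short greedy decomposition. This is a sketch under assumptions: items are 0-based, a dyadic range has the form $[i \cdot 2^j, (i+1) \cdot 2^j - 1]$, and `dyadic_cover` is a hypothetical name. At each step it takes the largest aligned power-of-two block that still fits inside the remaining range.

```python
def dyadic_cover(l: int, r: int) -> list[tuple[int, int]]:
    """Split the inclusive range [l, r] (0 <= l <= r) into maximal dyadic ranges.

    Over a universe of size n = 2**k the cover has at most 2 * log2(n) pieces.
    """
    pieces = []
    while l <= r:
        # Largest power of two that l is aligned to (any power works when l == 0).
        size = l & -l if l > 0 else 1 << (r - l + 1).bit_length()
        # Shrink until the block [l, l + size - 1] fits inside [l, r].
        while size > r - l + 1:
            size //= 2
        pieces.append((l, l + size - 1))
        l += size
    return pieces


print(dyadic_cover(3, 12))  # [(3, 3), (4, 7), (8, 11), (12, 12)]
```

Each piece then becomes one point query against the sketch for its level.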
Q5. By how much does the CM sketch improve the query and update times?
Choosing CM sketches over Random Subset Sums improves both the query time and the update time, which were $O\big(\frac{1}{\varepsilon^2}\log^2(n)\log\frac{\log n}{\varepsilon\delta}\big)$, by a factor of more than $\frac{34}{\varepsilon^2}\log n$.
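Q6. How do the CM sketch's join-size guarantees compare with existing methods on skewed data?
When the distribution of items is non-uniform, for example when certain items contribute a large amount to the join size, the two norms are closer: $\|a\|_1\|b\|_1$ in the CM sketch's error bound and $\|a\|_2\|b\|_2$ in the error bound of existing methods. As a result, the guarantees of the CM sketch method are closer to those of the existing method.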
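The effect of skew on the two norms can be checked numerically. This is a minimal sketch that assumes the comparison is between an error scale of $\|a\|_1\|b\|_1$ (CM sketch) and $\|a\|_2\|b\|_2$ (earlier methods). The helper `norm_ratio` and the two example distributions are hypothetical. A ratio near 1 means the two guarantees are similar.

```python
import math


def norm_ratio(a: list[int], b: list[int]) -> float:
    """Ratio ||a||_1 ||b||_1 / (||a||_2 ||b||_2); always >= 1 for non-zero vectors."""
    l1 = sum(abs(x) for x in a) * sum(abs(y) for y in b)
    l2 = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return l1 / l2


uniform = [10] * 100          # 100 items, each with count 10 (total 1000)
skewed = [901] + [1] * 99     # same total, one dominant item
print(round(norm_ratio(uniform, uniform), 2))  # 100.0
print(round(norm_ratio(skewed, skewed), 2))    # 1.23
```

With the same total mass, the uniform data makes the $L_1$-based bound 100 times looser, while the skewed data brings the two bounds within about 23% of each other.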
Q7. What were the previously best known space bounds for finding approximate quantiles?
The previously best known space bounds for finding approximate quantiles are $O\big(\frac{1}{\varepsilon}(\log^2\frac{1}{\varepsilon} + \log^2\log\frac{1}{\delta})\big)$ space for a randomized sampling solution and $O\big(\frac{1}{\varepsilon}\log(\varepsilon\|a\|_1)\big)$ space for a deterministic solution [14].
Q8. How is a range query answered using dyadic ranges?
Given a range query $Q(l, r)$, compute the at most $2\log_2 n$ dyadic ranges that canonically cover the range, pose that many point queries to the sketches, and return the sum of their answers as the estimate.
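The range-query procedure can be sketched in Python. This is a minimal illustration, not the authors' code. The class `DyadicRangeSketch`, its parameters `w` and `d`, and the 0-based universe $0 \ldots n-1$ with $n$ a power of two are assumptions. One small Count-Min table is kept per dyadic level, item $x$ is counted at level $j$ under index `x >> j`, and a range query sums one point query per covering dyadic range.

```python
import random


class DyadicRangeSketch:
    """Range sums over items 0..n-1 (n a power of two) from one Count-Min
    table per dyadic level; illustrative only."""

    P = (1 << 61) - 1  # prime modulus for the pairwise-independent hashes

    def __init__(self, n: int, w: int, d: int, seed: int = 0):
        self.n, self.w, self.d = n, w, d
        self.levels = n.bit_length()  # levels 0..log2(n); level j holds ranges of size 2**j
        rng = random.Random(seed)
        self.hashes = [[(rng.randrange(1, self.P), rng.randrange(self.P))
                        for _ in range(d)] for _ in range(self.levels)]
        self.tables = [[[0] * w for _ in range(d)] for _ in range(self.levels)]

    def _bucket(self, level: int, row: int, idx: int) -> int:
        a, b = self.hashes[level][row]
        return ((a * idx + b) % self.P) % self.w

    def update(self, x: int, c: int = 1) -> None:
        # Item x lies in dyadic range number (x >> j) at every level j.
        for j in range(self.levels):
            for row in range(self.d):
                self.tables[j][row][self._bucket(j, row, x >> j)] += c

    def _point(self, level: int, idx: int) -> int:
        return min(self.tables[level][row][self._bucket(level, row, idx)]
                   for row in range(self.d))

    def range_sum(self, l: int, r: int) -> int:
        if l < 0 or r >= self.n:
            raise ValueError("range must lie inside 0..n-1")
        total = 0
        while l <= r:
            # Grow the block while it stays aligned at l and inside [l, r].
            j = 0
            while l % (2 ** (j + 1)) == 0 and l + 2 ** (j + 1) - 1 <= r:
                j += 1
            total += self._point(j, l >> j)  # one point query per dyadic range
            l += 2 ** j
        return total


rs = DyadicRangeSketch(n=16, w=64, d=4)
for x in [1, 2, 2, 5, 9, 9, 9, 14]:
    rs.update(x)
print(rs.range_sum(2, 9))  # at least the true sum 6 (items 2, 2, 5, 9, 9, 9)
```

Because all counts are non-negative, every estimate is an overestimate, so the returned sum never falls below the true range sum.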
Q9. What are the applications of the CM sketch?
These have applications to computing correlations between data streams and tracking the number of distinct elements in streams, both of which are of great interest.
Q10. What are the $\phi$-heavy hitters of a multiset of integers?
The $\phi$-heavy hitters of a multiset of $\|a\|_1$ (integer) values, each in the range $1 \ldots n$, consist of those items whose multiplicity is at least a fraction $\phi$ of the total cardinality, i.e., $a_i \ge \phi\|a\|_1$.
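The definition can be made concrete with an exact, linear-space baseline. A sketch-based method approximates this answer in much less space. The function name `heavy_hitters` is hypothetical, and the stream is assumed to be insert-only, so $\|a\|_1$ is simply the stream length.

```python
from collections import Counter


def heavy_hitters(stream: list[int], phi: float) -> set[int]:
    """Exact phi-heavy hitters: items i with a_i >= phi * ||a||_1."""
    counts = Counter(stream)
    total = sum(counts.values())  # ||a||_1 for an insert-only stream
    return {i for i, a_i in counts.items() if a_i >= phi * total}


print(heavy_hitters([1, 1, 1, 2, 2, 3, 4, 5, 6, 7], phi=0.2))  # {1, 2}
```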
Q11. Why does the size of a sketch limit its update speed?
Sketches are typically a few kilobytes up to a megabyte or so, and processing this much data for every update severely limits the update speed.
Q12. How are the accuracy and probability parameters set when range sums are used to find quantiles?
By keeping $\log n$ sketches, one for each level of dyadic ranges, and setting the accuracy parameter of each to $\varepsilon/\log n$ and the probability guarantee to $\delta\phi/\log n$, the overall probability guarantee for all $1/\phi$ quantiles is achieved.
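The parameter accounting can be checked with a short derivation. It rests on three assumptions. First, each prefix sum combines at most one dyadic range per level, so at most $\log n$ estimates, each with additive error at most $\frac{\varepsilon}{\log n}\|a\|_1$. Second, each quantile is located using at most $\log n$ such estimates. Third, a union bound is taken over all $1/\phi$ quantiles.

```latex
\begin{aligned}
\text{error of one prefix sum} &\le \log n \cdot \frac{\varepsilon}{\log n}\,\|a\|_1 = \varepsilon\,\|a\|_1,\\
\Pr[\text{any estimate fails}] &\le \frac{1}{\phi}\cdot \log n \cdot \frac{\delta\phi}{\log n} = \delta.
\end{aligned}
```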
Q13. How are heavy hitters found by searching the dyadic hierarchy?
Nodes in the hierarchy (corresponding to dyadic ranges) whose estimated weight exceeds the threshold $(\phi + \varepsilon)\|a\|_1$ are split into their two child ranges and investigated recursively.
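The recursive search can be sketched in Python. This is a hedged illustration, not the authors' code. The class `HierarchicalHeavyHitters`, the sizes `w` and `d`, the 0-based universe with $n$ a power of two, and the use of a "reaches the threshold" test are assumptions. One Count-Min table is kept per dyadic level. The search starts at the root range and only splits ranges whose estimated weight reaches $(\phi + \varepsilon)\|a\|_1$.

```python
import random


class HierarchicalHeavyHitters:
    """Heavy-hitter search over a dyadic hierarchy of Count-Min tables
    (illustrative; items are integers in 0..n-1, n a power of two)."""

    P = (1 << 61) - 1  # prime modulus for the pairwise-independent hashes

    def __init__(self, n: int, w: int, d: int, seed: int = 0):
        self.log_n = n.bit_length() - 1  # leaves at level 0, root at level log_n
        self.w, self.d, self.total = w, d, 0
        rng = random.Random(seed)
        self.hashes = [[(rng.randrange(1, self.P), rng.randrange(self.P))
                        for _ in range(d)] for _ in range(self.log_n + 1)]
        self.tables = [[[0] * w for _ in range(d)] for _ in range(self.log_n + 1)]

    def _bucket(self, level: int, row: int, idx: int) -> int:
        a, b = self.hashes[level][row]
        return ((a * idx + b) % self.P) % self.w

    def update(self, x: int, c: int = 1) -> None:
        self.total += c  # running ||a||_1 (non-negative counts assumed)
        for j in range(self.log_n + 1):
            for row in range(self.d):
                self.tables[j][row][self._bucket(j, row, x >> j)] += c

    def estimate(self, level: int, idx: int) -> int:
        return min(self.tables[level][row][self._bucket(level, row, idx)]
                   for row in range(self.d))

    def heavy_hitters(self, phi: float, eps: float) -> list[int]:
        threshold = (phi + eps) * self.total
        found, frontier = [], [(self.log_n, 0)]  # start at the root range
        while frontier:
            level, idx = frontier.pop()
            if self.estimate(level, idx) < threshold:
                continue  # prune: this range is too light to hold a heavy hitter
            if level == 0:
                found.append(idx)  # a single item passed the test
            else:
                # Split range idx at this level into its two child ranges.
                frontier += [(level - 1, 2 * idx), (level - 1, 2 * idx + 1)]
        return sorted(found)


hh = HierarchicalHeavyHitters(n=16, w=64, d=4)
for x in [5] * 50 + [9] * 30 + list(range(16)):
    hh.update(x)
print(hh.heavy_hitters(phi=0.2, eps=0.05))  # contains 5 and 9
```

Since estimates never undercount non-negative data, every ancestor of a truly heavy item survives pruning. A light item is reported only if collisions inflate its estimate in every row.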
Q14. What were the previous bounds for range queries in the turnstile model?
The best previous bounds for this problem in the turnstile model are given in [13], where range queries are answered by keeping $O(\log n)$ sketches, each of size $O\big(\frac{1}{\varepsilon'^2}\log(n)\log\frac{\log n}{\delta}\big)$, to give approximations with additive error $\varepsilon\|a\|_1$ with probability $1 - \delta'$.
Q15. How can the authors estimate the join size of two relations on a particular attribute?
Corollary 1. The join size of two relations on a particular attribute can be approximated up to $\varepsilon\|a\|_1\|b\|_1$ with probability $1 - \delta$ by keeping space $O\big(\frac{1}{\varepsilon}\log\frac{1}{\delta}\big)$.
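A join-size estimate can be sketched as follows. One Count-Min table is built per relation over the join attribute, using the same hash functions. The estimate is the row-wise inner product of the two tables, minimized over rows. This is a minimal sketch: the helper names `make_hashes`, `sketch`, and `join_size_estimate` and the toy relations are hypothetical. With non-negative counts each row's inner product can only overcount, so the minimum is never below the true join size.

```python
import math
import random

P = (1 << 61) - 1  # prime modulus for the pairwise-independent hash family


def make_hashes(d: int, seed: int = 0) -> list[tuple[int, int]]:
    rng = random.Random(seed)
    return [(rng.randrange(1, P), rng.randrange(P)) for _ in range(d)]


def sketch(stream: list[int], hashes: list[tuple[int, int]], w: int) -> list[list[int]]:
    """Build a d x w Count-Min table for a stream of join-attribute values."""
    table = [[0] * w for _ in hashes]
    for x in stream:
        for row, (a, b) in enumerate(hashes):
            table[row][((a * x + b) % P) % w] += 1
    return table


def join_size_estimate(ta: list[list[int]], tb: list[list[int]]) -> int:
    """Inner product of matching rows, minimized over rows."""
    return min(sum(u * v for u, v in zip(ra, rb)) for ra, rb in zip(ta, tb))


eps, delta = 0.01, 0.01
w, d = math.ceil(math.e / eps), math.ceil(math.log(1 / delta))
hashes = make_hashes(d, seed=1)  # both relations must share these hashes
R = [1, 1, 2, 3, 3, 3]           # join-attribute values in relation R
S = [1, 3, 3, 4]                 # join-attribute values in relation S
print(join_size_estimate(sketch(R, hashes, w), sketch(S, hashes, w)))  # at least 8
```

The true join size here is $2 \cdot 1 + 3 \cdot 2 = 8$. Sharing the hash functions is essential, since the inner product only makes sense when equal values land in the same buckets of both tables.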
Q16. How can approximate $\phi$-quantiles be found when the data is subject to insertions and deletions?
In [13] the authors showed that finding the approximate $\phi$-quantiles of data subject to insertions and deletions can be reduced to the problem of computing range sums.
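One plausible form of this reduction, not necessarily the exact procedure of [13], is a binary search over prefix range sums for each target rank $k\phi\|a\|_1$. In the sketch below, `approx_quantiles` and `exact_range_sum` are hypothetical names. The exact oracle is only a stand-in, and a sketch-based range-sum estimator could be passed in its place.

```python
from typing import Callable


def approx_quantiles(range_sum: Callable[[int, int], float], n: int,
                     phi: float) -> list[int]:
    """For k = 1..floor(1/phi), return the smallest x in 0..n-1 whose prefix
    sum range_sum(0, x) reaches k * phi * ||a||_1, found by binary search."""
    total = range_sum(0, n - 1)  # ||a||_1, or its estimate
    result = []
    for k in range(1, int(1 / phi) + 1):
        target = k * phi * total
        lo, hi = 0, n - 1
        while lo < hi:  # O(log n) range-sum queries per quantile
            mid = (lo + hi) // 2
            if range_sum(0, mid) >= target:
                hi = mid
            else:
                lo = mid + 1
        result.append(lo)
    return result


counts = [0] * 16
for x in [1, 2, 2, 3, 5, 5, 5, 8, 12, 15]:
    counts[x] += 1


def exact_range_sum(l: int, r: int) -> int:
    # Hypothetical exact oracle standing in for a sketch-based estimator.
    return sum(counts[l:r + 1])


print(approx_quantiles(exact_range_sum, 16, 0.25))  # [2, 5, 8, 15]
```

With estimated range sums the prefix values need not be exactly monotone, but the search still terminates after $O(\log n)$ queries per quantile.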
Q17. How much space should the algorithm use?
The space used by the algorithm should be small: at most polylogarithmic in $n$, the space required to represent $a$ explicitly.
Q18. What space bound does the CM sketch give for finding $\varepsilon$-approximate $\phi$-quantiles?
Theorem 4. $\varepsilon$-approximate $\phi$-quantiles can be found with probability at least $1 - \delta$ by keeping a data structure with space $O\big(\frac{1}{\varepsilon}\log^2(n)\log\frac{\log n}{\phi\delta}\big)$.