Finding a "Kneedle" in a Haystack: Detecting Knee Points in System Behavior
References
MapReduce: simplified data processing on large clusters
A density-based algorithm for discovering clusters in large spatial databases with noise
Detection of abrupt changes: theory and application
Frequently Asked Questions
Q2. How can the authors identify true knees in NoisyGaussian data?
Using the closed-form approximation for the point of maximum curvature in their NoisyGaussian data sets, the authors can identify “true” knees in the data.
Q3. How is curvature determined for discrete data points?
Since an approximation of curvature requires at least three points—the minimum number of points that define a circle—end-points in a data set do not have curvature values by definition.
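To illustrate why end-points have no curvature value, here is a minimal sketch (not the authors' code) that approximates curvature at each interior point from the point and its two neighbors, using three-point finite differences and the standard formula K = |f''| / (1 + f'^2)^(3/2). The helper name `discrete_curvature` and the particular finite-difference scheme are assumptions made for illustration.

```python
from typing import List, Optional, Sequence


def discrete_curvature(x: Sequence[float], y: Sequence[float]) -> List[Optional[float]]:
    """Approximate curvature at each sample of a curve (hypothetical helper).

    Each interior point uses itself and its two neighbors (three points), so
    the first and last points have no curvature value and are returned as None.
    """
    n = len(x)
    result: List[Optional[float]] = [None] * n
    for i in range(1, n - 1):
        left = (y[i] - y[i - 1]) / (x[i] - x[i - 1])    # slope of the left segment
        right = (y[i + 1] - y[i]) / (x[i + 1] - x[i])   # slope of the right segment
        span = x[i + 1] - x[i - 1]
        d1 = (y[i + 1] - y[i - 1]) / span               # first-derivative estimate
        d2 = 2.0 * (right - left) / span                # second-derivative estimate
        result[i] = abs(d2) / (1.0 + d1 * d1) ** 1.5
    return result
```

For samples of y = x^2 at x = -1, 0, 1 this returns [None, 2.0, None]: the middle value matches the exact curvature of 2 at the vertex, and both end-points are left undefined.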
Q4. What is the benefit of evaluating the knee detection algorithms using NoisyGaussian?
The benefit of evaluating the knee detection algorithms using NoisyGaussian is that an approximate closed-form solution exists for the point of maximum curvature.
Q5. How did the authors use Kneedle to reduce the total completion time of a batch job?
When Kneedle returned a knee, the authors simply reallocated unfinished tasks to idle nodes, reducing the total completion time from 827 seconds down to 143 seconds.
Q6. How did the authors test the effectiveness of Kneedle?
To test the effectiveness of Kneedle in their own MapReduce-like setting, the authors integrated their algorithm into a prototypical distributed batch computing system that farms out tasks to PlanetLab nodes [18].
Q7. What is the definition of a knee?
In this work, as in [8], the authors use the mathematical definition of curvature for a continuous function as the basis for their knee definition.
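For reference, the standard curvature of the graph of a twice-differentiable function y = f(x) is shown below. The sign of f'' distinguishes concave from convex regions, so some presentations drop the absolute value.

```latex
K_f(x) = \frac{\lvert f''(x) \rvert}{\left(1 + f'(x)^2\right)^{3/2}}
```

A straight line has f''(x) = 0 and therefore zero curvature everywhere, which is why curvature measures how much a function departs from a straight line.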
Q8. How do the authors compute the point of maximum curvature?
The authors derive the point of maximum curvature by computing it for the underlying Gaussian CDF in terms of standard deviation σ and mean µ.
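The closed-form approximation itself is not reproduced here. As a hedged numerical cross-check, the sketch below evaluates the curvature formula using the exact first and second derivatives of the Gaussian CDF, f(x) = Φ((x - μ)/σ), on a dense grid over the concave side x > μ (where a knee lies) and returns the maximizer. The function name and the grid parameters (5σ range, 100,000 steps) are assumptions.

```python
import math


def gaussian_cdf_knee(mu: float, sigma: float, steps: int = 100_000) -> float:
    """Numerically locate the maximum-curvature point of a Gaussian CDF for x > mu."""

    def curvature(x: float) -> float:
        z = (x - mu) / sigma
        pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        f1 = pdf / sigma             # f'(x)  = phi(z) / sigma
        f2 = -z * pdf / sigma ** 2   # f''(x) = -z * phi(z) / sigma^2
        return abs(f2) / (1.0 + f1 * f1) ** 1.5

    # Dense grid over (mu, mu + 5*sigma], the concave side of the CDF.
    grid = [mu + 5.0 * sigma * k / steps for k in range(1, steps + 1)]
    return max(grid, key=curvature)
```

For the standard normal (μ = 0, σ = 1) the maximizer lies a little above x = 1. Because curvature is not invariant to rescaling the axes, the location expressed in units of σ changes with σ, so a sketch like this is only a sanity check for a specific scale.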
Q9. What is the threshold value for detecting knees?
For each local maximum $(x_{lmx_i}, y_{lmx_i})$ in the difference curve, the authors define a unique threshold value, $T_{lmx_i}$, that is based on the average difference between consecutive x-values and a sensitivity parameter, $S$. The sensitivity parameter controls how aggressive Kneedle is when detecting knees.
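The detection step can be sketched as follows. This is a minimal offline sketch for a concave, increasing curve, assuming x is strictly increasing and the data are already smooth (the smoothing step is omitted). It normalizes both axes to the unit square, forms the difference curve y_d = y_sn - x_sn, and uses the threshold $T_{lmx_i} = y_{lmx_i} - S \cdot \frac{1}{n-1}\sum_{j=1}^{n-1}(x_{j+1} - x_j)$ on the normalized x-values. As a simplification, a local maximum is declared a knee if the difference curve drops below its threshold before the next local maximum; the threshold-reset rule and convex or decreasing curves are not handled. The function name is hypothetical.

```python
from typing import List, Sequence


def kneedle_concave_increasing(
    x: Sequence[float], y: Sequence[float], S: float = 1.0
) -> List[float]:
    """Simplified offline Kneedle sketch for a concave, increasing curve."""
    n = len(x)
    if n < 3 or n != len(y):
        raise ValueError("need at least three (x, y) pairs")

    # Step 1: normalize both axes to the unit square.
    x_min, x_max, y_min, y_max = min(x), max(x), min(y), max(y)
    xs = [(v - x_min) / (x_max - x_min) for v in x]
    ys = [(v - y_min) / (y_max - y_min) for v in y]

    # Step 2: difference curve for a concave, increasing curve.
    yd = [b - a for a, b in zip(xs, ys)]

    # Step 3: interior local maxima of the difference curve.
    lmx = [i for i in range(1, n - 1) if yd[i] > yd[i - 1] and yd[i] >= yd[i + 1]]

    # Average gap between consecutive normalized x-values (the sum telescopes).
    mean_dx = (xs[-1] - xs[0]) / (n - 1)

    knees = []
    for k, i in enumerate(lmx):
        threshold = yd[i] - S * mean_dx  # T_lmx_i
        stop = lmx[k + 1] if k + 1 < len(lmx) else n
        # Step 4: knee if y_d falls below the threshold before the next local max.
        if any(yd[j] < threshold for j in range(i + 1, stop)):
            knees.append(x[i])
    return knees
```

For the piecewise-linear curve y = min(x, 5) sampled at x = 0, 1, ..., 10, the difference curve peaks at x = 5 and the sketch returns [5]. A larger S lowers each threshold, so the difference curve must fall further before a knee is declared, which makes detection more conservative.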
Q10. Why use the point of maximum curvature to define a knee?
The point of maximum curvature is well-matched to the ad hoc methods operators use to select a knee, since curvature is a mathematical measure of how much a function differs from a straight line.
Q11. How do the authors use Kneedle online to choose a target sending rate?
The authors increment the rate every time a packet is transmitted and pace the packets evenly; for every 100 packets sent, the authors compute the knee point and use it as the new target rate.
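A hedged sketch of this control loop follows. Everything about the environment is hypothetical: `measure_throughput` stands in for a real measurement and models a link that saturates at a fixed capacity, packet pacing is not modeled, and `difference_curve_knee` is a simplified stand-in (the x-value where the normalized difference curve y_sn - x_sn is largest) rather than the full Kneedle detector. Only the structure (raise the rate on every packet, compute the knee after a 100-packet window, adopt it as the target) follows the description above.

```python
from typing import List, Tuple


def measure_throughput(rate: float, capacity: float) -> float:
    """Hypothetical link model: throughput tracks the sending rate up to capacity."""
    return min(rate, capacity)


def difference_curve_knee(samples: List[Tuple[float, float]]) -> float:
    """Simplified stand-in for Kneedle: x at the maximum of y_sn - x_sn."""
    xs = [s[0] for s in samples]
    ys = [s[1] for s in samples]
    x_lo, x_hi, y_lo, y_hi = min(xs), max(xs), min(ys), max(ys)
    best = max(
        range(len(samples)),
        key=lambda i: (ys[i] - y_lo) / (y_hi - y_lo) - (xs[i] - x_lo) / (x_hi - x_lo),
    )
    return xs[best]


def next_target_rate(
    start_rate: float, step: float, capacity: float, window: int = 100
) -> float:
    """Raise the rate after every packet; after `window` packets, return the knee."""
    samples: List[Tuple[float, float]] = []
    rate = start_rate
    for _ in range(window):
        samples.append((rate, measure_throughput(rate, capacity)))
        rate += step  # increment the rate every time a packet is sent
    return difference_curve_knee(samples)
```

With start_rate=10.0, step=1.0 and capacity=50.0, the window covers rates 10 to 109 and the function returns 50.0, the rate at which throughput stops growing.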
Q12. How can Kneedle be integrated into existing systems?
Figure 10 demonstrates that Kneedle can be successfully integrated into existing systems with minimal effort: the only change required to their work allocation system was a single function call.