What are the contributions in "Finding a “kneedle” in a haystack: detecting knee points in system behavior" ?

While prior work largely uses ad hoc, system-specific approaches to detect knees, the authors present Kneedle, a general approach to online and offline knee detection that is applicable to a wide range of systems. The authors then evaluate Kneedle’s accuracy against existing algorithms on both synthetic and real data sets, and evaluate its performance in two different applications.

How can the authors identify true knees in NoisyGaussian data?

Using the closed-form approximation for the point of maximum curvature in their NoisyGaussian data sets, the authors can identify “true” knees in the data.

What is the way to determine curvature?

Since an approximation of curvature requires at least three points—the minimum number of points that define a circle—end-points in a data set do not have curvature values by definition.

How did the authors use Kneedle to reduce total completion time?

When Kneedle returned a knee, the authors simply reallocated unfinished tasks to idle nodes, reducing the total completion time from 827 seconds down to 143 seconds.

What is the way to test the effectiveness of Kneedle?

To test the effectiveness of Kneedle in their own MapReduce-like setting, the authors integrated their algorithm into a prototypical distributed batch computing system that farms out tasks to PlanetLab nodes [18].

What is the definition of a knee?

In this work, as in [8], the authors use the mathematical definition of curvature for a continuous function as the basis for their knee definition.

How do the authors use Kneedle to find the knee?

The authors increment the rate every time a packet is transmitted and pace the packets evenly; for every 100 packets sent, the authors compute the knee point and use it as the new target rate.

How can Kneedle be integrated into existing systems?

Figure 10 demonstrates that Kneedle can be successfully integrated into existing systems with minimal effort: the only change required to their work allocation system was a single function call.

(Open Access) Finding a "Kneedle" in a Haystack: Detecting Knee Points in System Behavior (2011) | Ville Satopää

Q: What is the benefit of evaluating the knee detection algorithms using NoisyGaussian?

The benefit of evaluating the knee detection algorithms using NoisyGaussian is that an approximate closed-form solution exists for the point of maximum curvature.

Q: How do the authors compute the point of maximum curvature?

The authors derive the point of maximum curvature by computing it for the underlying Gaussian CDF in terms of standard deviation σ and mean µ.

Q: What is the threshold value for detecting knees?

For each local maximum $(x_{lmx_i}, y_{lmx_i})$ in the difference curve, the authors define a unique threshold value, $T_{lmx_i}$, that is based on the average difference between consecutive x-values and a sensitivity parameter, S. The sensitivity parameter lets the authors adjust how aggressively Kneedle detects knees.

Finding a “Kneedle” in a Haystack: Detecting Knee Points in System Behavior

Ville Satopää†, Jeannie Albrecht†, David Irwin‡, and Barath Raghavan§

†Williams College, Williamstown, MA
‡University of Massachusetts Amherst, Amherst, MA
§International Computer Science Institute, Berkeley, CA

Abstract: Computer systems often reach a point at which the relative cost to increase some tunable parameter is no longer worth the corresponding performance benefit. These “knees” typically represent beneficial points that system designers have long selected to best balance inherent trade-offs. While prior work largely uses ad hoc, system-specific approaches to detect knees, we present Kneedle, a general approach to online and offline knee detection that is applicable to a wide range of systems. We define a knee formally for continuous functions using the mathematical concept of curvature and compare our definition against alternatives. We then evaluate Kneedle’s accuracy against existing algorithms on both synthetic and real data sets, and evaluate its performance in two different applications.

I. INTRODUCTION

Selecting the “right” operating point for a given system is often thought of as an art form, since the direct and indirect costs and benefits of changing different system parameters are difficult or even impossible to quantify. For example, an important operating point in a large MapReduce job occurs when the job should no longer wait for “slow” tasks to finish, but instead speculatively re-execute work on other nodes in hopes of finishing the job sooner [1]. Since MapReduce’s goal is to finish all tasks as fast as possible, it must decide when the cost, in terms of a job’s running time and cluster utilization, is worth the corresponding performance benefit, in terms of task completion percentage. Congestion-responsive network protocols face a related challenge when setting a sending rate: a protocol must choose a rate that maximizes performance without exceeding its fair share and causing congestion.

In prior work, the issue has frequently been couched as identifying one or more “knees”: operating points, based on recent trends, where the perceived cost to alter a system parameter is no longer worth the expected performance benefit. For MapReduce, triggering speculative execution after observing a knee in the task completion percentage ensures that the system re-executes tasks that are significantly slower than other similar tasks that have finished execution. In the case of a network protocol, successive increases to the sending rate should cease if delay signals congestion by increasing steeply, forming a knee. However, while the problem of knee detection (finding “good” operating points in system behavior) seems straightforward, to the best of our knowledge there exists neither an accepted definition of a knee nor a general systematic approach for detecting one.

Numerous researchers in widely disparate areas frequently encounter knee detection problems similar to those we describe [1], [2], [3], [4], [5]. In these systems, researchers either use ad hoc or system-specific approaches to detect knees, or defer the problem to future work. While a finely-crafted system-specific approach will perform better than a general knee detection approach, a designer may not take the time to design one. Thus, our aim is not to improve or optimize a specific system or protocol, but to provide system designers a general tool for improving the parts of their system they generally do not take the time to optimize.

In network protocol and system design, rules-of-thumb often serve researchers and operators well in the absence of an optimal solution. We believe that a tool for knee detection adds to their problem-solving arsenal. Our hypothesis is that a knee detection algorithm that does not require tuning for a specific system or operational characteristics is applicable in a wide range of settings where developers do not take the time to design, test, and optimize a system-specific algorithm.

II. DEFINING AND DETECTING KNEES

While the notion of a knee is well-known, we are not aware of a broadly accepted definition in prior literature. The confusion stems from the fact that researchers, in many cases unknowingly, use knees as a substitute for a more comprehensive cost-benefit analysis that is either difficult or impossible to perform. Performing a direct cost-benefit analysis is often complex, since it is inherently system-, platform-, and workload-specific. Further, many systems are not predictable due to volatile operating conditions.

For example, unpredictable failure rates in large clusters, which may change over time, are the root cause of stragglers in MapReduce jobs [1]. Likewise, since multiple flows share network links in the Internet, network protocols cannot predict in advance the rapidly changing level of TCP-friendly bandwidth available, but must instead continuously adapt to the indirect signals of packet loss and delay [6]. In lieu of a complex system-specific analysis, operators tend to select operating points, or knees, that are “good enough” by observing where performance improvements start to level off as a function of one or more tunable system parameters. Note that we focus on knee detection for complex systems that change their behavior according to volatile, and potentially unpredictable, operating conditions, and not for simple systems that permit standard closed-form models, e.g., M/M/1 queues [7].

[Figure 1 (“Gaussian Curvature”): the top panel shows the CDF (x-axis: Time; y-axis: Arrivals) and the bottom panel shows the associated curvature, both over x from −4 to 4. The vertical line marks the point of maximum curvature, i.e., the knee.]

Fig. 1: CDF of a standard Gaussian distribution with mean = 0 and standard deviation = 1. Vertical bar indicates point of maximum curvature. The inflection point of this curve occurs at x = 0.

A. Knee Definition

The difficulty with defining a knee formally is that “good enough” in one system may not be “good enough” in another. Since knees only serve as an approximation, operators interpret them differently in different situations. Thus, knee detection is an inherently heuristic process. However, to design a general application-independent knee detection algorithm, we require a consistent definition applicable to any system. In this work, as in [8], we use the mathematical definition of curvature for a continuous function as the basis for our knee definition. For any continuous function f, there exists a standard closed-form $K_f(x)$ that defines the curvature of f at any point as a function of its first and second derivative:

$$K_f(x) = \frac{f''(x)}{\left(1 + f'(x)^2\right)^{1.5}}$$

The point of maximum curvature is well-matched to the ad hoc methods operators use to select a knee, since curvature is a mathematical measure of how much a function differs from a straight line. As a result, maximum curvature captures the leveling-off effect operators use to identify knees. Importantly, unlike other common definitions, curvature is application-independent: it does not (i) depend on the relationship between system parameters and performance, or (ii) require setting system-specific thresholds. Note that knee detection does depend on the selection of proper adjustable system parameters and performance metrics, as we show for our examples in Section V.

It is important to realize why a knee definition based only on the first derivative is not enough to identify a knee. Consider the simple example in Figure 1, where the y-axis represents some performance metric, the x-axis represents a tunable system parameter, and the vertical bar represents the point of maximum curvature. The maximum of the first derivative is the inflection point of the curve, which occurs at x = 0 in Figure 1. The inflection point is not representative of the knee since performance continues to improve significantly beyond it. Instead, the inflection point only captures where the rate of performance increase reaches a maximum. In contrast, the curvature definition precisely matches the concept of a knee. Reference [8] surveys a range of other knee definitions from prior work, primarily in the context of clustering algorithms [7], [9], [10], [11], [12]. We discuss alternative definitions below.
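To make the contrast with Figure 1 concrete, here is a small numerical sketch (our own illustration, not code from the paper). It evaluates the curvature formula for a Gaussian CDF on a fine grid, using the analytic derivatives f'(x) = φ(x) (the density) and f''(x) = −((x − µ)/σ²)φ(x). Because the formula is signed, the knee of the concave-down region appears as the most negative curvature. The helper names and the grid resolution are illustrative assumptions.

```python
import math


def gaussian_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of N(mu, sigma^2); this is f'(x) for the Gaussian CDF f."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def cdf_curvature(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Signed curvature K_f(x) = f''(x) / (1 + f'(x)^2)^1.5 of the Gaussian CDF."""
    f1 = gaussian_pdf(x, mu, sigma)          # f'(x)
    f2 = -((x - mu) / sigma ** 2) * f1       # f''(x): derivative of the density
    return f2 / (1.0 + f1 * f1) ** 1.5


def grid(mu: float, sigma: float, span: float = 4.0, steps: int = 80_000):
    """Evenly spaced points covering mu +/- span * sigma."""
    lo, hi = mu - span * sigma, mu + span * sigma
    return [lo + (hi - lo) * i / steps for i in range(steps + 1)]


def inflection_point(mu: float = 0.0, sigma: float = 1.0) -> float:
    """Grid point where the first derivative f'(x) is largest."""
    return max(grid(mu, sigma), key=lambda x: gaussian_pdf(x, mu, sigma))


def max_curvature_point(mu: float = 0.0, sigma: float = 1.0) -> float:
    """Grid point with the most negative signed curvature (the concave-down knee)."""
    return min(grid(mu, sigma), key=lambda x: cdf_curvature(x, mu, sigma))


print(f"inflection ~ {inflection_point():.3f}")    # 0.000
print(f"knee       ~ {max_curvature_point():.3f}")  # about 1.08
```

The inflection point sits at the mean, while the curvature maximum lies roughly one standard deviation to its right, which is where the curve visibly levels off.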

While curvature is well-defined for continuous functions, it is not well-defined for discrete data sets. In the discrete case, we could determine curvature by fitting a continuous function to the data and using the function’s point of maximum curvature. However, fitting a continuous function to a set of arbitrary data points is difficult, especially if the data is noisy. Further, determining the maximum curvature of the resulting function may not be sufficient, since the curvature at any point of a function is dependent on the entire function, including points not in the relevant data set. Thus, maximum curvature may fall outside the data’s valid range or be one of the set’s end-points. Since an approximation of curvature requires at least three points (the minimum number of points that define a circle), end-points in a data set do not have curvature values by definition. Thus, using the closed-form formulation as a direct basis for knee detection on discrete data is not possible.

B. Knee Detection in Discrete Data Sets

Researchers have proposed multiple previous approaches to detecting knees in discrete data. Before formulating our curvature-inspired algorithm in Section III, we present two existing approaches from prior research for comparison, Angle-based and EWMA, as well as another approach we formulate based on Menger curvature, a direct discrete equivalent of continuous curvature. Note that the Angle-based and Menger algorithms are designed specifically for offline cases, where the entire data set is known in advance, while EWMA is designed to detect knees online as data points become known.

Angle-based. The geometric “angle-based” approach of Zhao et al. [13] is an extension of the L-method for detecting knees in clustering applications [8]. The Angle-based approach first finds the local minima of the successive differences $(y_{i-1} + y_{i+1} - 2y_i)$ for each consecutive triple of points. For example, consider a straight line that goes through the consecutive points $(x_{i-1}, y_{i-1})$, $(x_i, y_i)$, and $(x_{i+1}, y_{i+1})$. Assuming x-values are evenly spaced, then $y_{i-1} + y_{i+1} - 2y_i = 0$ for any straight segment. However, if these three points form a knee, $(x_i, y_i)$ must be above the straight line that goes through $(x_{i-1}, y_{i-1})$ and $(x_{i+1}, y_{i+1})$. In this case $y_{i-1} + y_{i+1} - 2y_i < 0$. “Sharper” knees have more negative difference values.

Next, since successive differences are local measures and ignore the overall trend of the curve, the algorithm combines the differences with an angle value. After obtaining the local minima of the successive differences, the algorithm sorts the minima, and, starting from the point with the largest difference value, calculates the two angles formed by the y-axis and the line going through each successive pair of points associated with the corresponding difference value. The sum of these two angles is the angle value. Knees are detected at the local maxima of these angle values.
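The first stage of the Angle-based approach, computing successive differences and locating their local minima, can be sketched as follows. This is a minimal illustration assuming evenly spaced x-values; the function names are hypothetical, and the angle-value stage that follows is deliberately not reproduced here.

```python
from typing import List, Sequence


def successive_differences(ys: Sequence[float]) -> List[float]:
    """d for each interior point i: y[i-1] + y[i+1] - 2*y[i] (evenly spaced x assumed)."""
    return [ys[i - 1] + ys[i + 1] - 2 * ys[i] for i in range(1, len(ys) - 1)]


def local_minima_indices(ys: Sequence[float]) -> List[int]:
    """Indices into ys of interior points whose difference is a strict local minimum."""
    d = successive_differences(ys)
    # d[k] belongs to point k + 1; pad both ends with +inf so the first and last
    # differences can also qualify as minima.
    padded = [float("inf")] + list(d) + [float("inf")]
    return [
        k + 1
        for k in range(len(d))
        if padded[k + 1] < padded[k] and padded[k + 1] < padded[k + 2]
    ]


# Slope 1 up to x = 3, then slope 0.5: the bend at index 3 gives the only negative difference.
print(local_minima_indices([0, 1, 2, 3, 3.5, 4, 4.5]))  # [3]
```

A straight segment yields all-zero differences and therefore no minima, matching the argument above.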

[Figure 2: three panels, (a) to (c), each with both axes running from 0 to 1; panel (c) plots the Difference and Threshold curves.]

Fig. 2: Kneedle algorithm for online knee detection. (a) depicts the smoothed and normalized data, with dashed bars indicating the perpendicular distance from y = x with the maximum distance indicated. (b) shows the same data, but this time the dashed bars are rotated 45 degrees. The magnitude of these bars corresponds to the difference values used in Kneedle. (c) shows the plot of these difference values and the corresponding threshold values (with S = 1). The knee is found at x = 0.22 and is detected after receiving the point x = 0.55.

Menger Curvature. While curvature is not well-defined for arbitrary discrete data sets, Menger curvature defines the curvature for three discrete points as the curvature of the circle circumscribed about those points [14]. Thus, we define the Menger curvature for each point $p_i = (x_i, y_i)$ in an n-point data set as being equal to $1/r$ for the circle of radius r circumscribed about $p_{i-1}$, $p_i$, and $p_{i+1}$. The curvature of the circumscribed circle is straightforward to compute and is simply a function of the lengths of the sides of the triangle with the points as vertices. However, as we show in Section IV, while Menger closely approximates curvature for offline data drawn from ideal continuous functions, it does not work well for the noisy online data sets typical of computing systems.
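The Menger computation described above can be sketched with the identity 1/r = 4A/(abc), where a, b, c are the triangle's side lengths and A its area. This is an illustrative sketch, not the authors' code: the area is taken from a cross product (equivalent to Heron's formula on the side lengths, but better behaved numerically), the curvature is unsigned, and `menger_knee` simply returns the interior point with the largest value.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def menger_curvature(p1: Point, p2: Point, p3: Point) -> float:
    """1/r of the circle through three points, computed as 4 * area / (a * b * c)."""
    a = math.dist(p1, p2)
    b = math.dist(p2, p3)
    c = math.dist(p1, p3)
    if a == 0 or b == 0 or c == 0:
        return 0.0  # repeated point: no unique circle, so report zero curvature
    # Triangle area via the cross product of two edge vectors.
    area = abs((p2[0] - p1[0]) * (p3[1] - p1[1]) - (p3[0] - p1[0]) * (p2[1] - p1[1])) / 2.0
    return 4.0 * area / (a * b * c)


def menger_knee(xs: Sequence[float], ys: Sequence[float]) -> Optional[float]:
    """x-value of the interior point with the largest Menger curvature, if any."""
    best_x, best_k = None, -1.0
    for i in range(1, len(xs) - 1):  # end-points have no curvature value
        k = menger_curvature((xs[i - 1], ys[i - 1]), (xs[i], ys[i]), (xs[i + 1], ys[i + 1]))
        if k > best_k:
            best_x, best_k = xs[i], k
    return best_x


print(menger_knee([0, 1, 2, 3, 4], [0, 1, 2, 2, 2]))  # 2
```

Three collinear points give zero curvature, and three points on the unit circle give 1, as expected for a circle of radius 1. Since the value depends on the relative scale of the axes, noisy or unnormalized data can shift which triple wins.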

EWMA. The EWMA approach uses techniques similar to those employed by Bollinger Bands [15] and Geometric Moving Average algorithms for change detection [16]. The algorithm that we use is based on the methodology described by Albrecht et al. in their work on partial barriers [3], which derives from previous work on MONET [17]. EWMA is an online algorithm that uses two exponentially weighted moving averages. The first EWMA, called arr, is used to smooth the input data, which is viewed as host arrival times. The second EWMA, arrvar, keeps track of the average deviation from arr, and is an estimate of the variance in arrival times. Finally, these two values are used to compute a maximum wait threshold of arr + 4 · arrvar, which represents the maximum amount of time to wait for the next point to arrive. If the point arrives after this threshold, or the threshold is reached without seeing the next arrival, EWMA declares a knee. One important attribute of this algorithm is that EWMA does not directly report where the knee point is; it only determines if a knee has been passed. As a result, EWMA is only applicable in an online setting.
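A minimal sketch of the EWMA detector described above follows. Two points are assumptions rather than details from the text. First, arr is read as a smoothed inter-arrival gap. Second, the weights `alpha` and `beta` and the initialization are borrowed from TCP's RTT estimator, whose timeout has the same mean + 4 · deviation form. The class and method names are hypothetical.

```python
import math
from typing import Optional


class EwmaKneeDetector:
    """Online 'has a knee passed?' test built on two exponentially weighted averages.

    alpha/beta are hypothetical TCP-style defaults; the weights used in [3] are not given here.
    """

    def __init__(self, alpha: float = 0.125, beta: float = 0.25) -> None:
        self.alpha = alpha
        self.beta = beta
        self.last: Optional[float] = None  # time of the previous arrival
        self.arr: Optional[float] = None   # smoothed inter-arrival gap
        self.arrvar = 0.0                  # smoothed deviation from arr

    def deadline(self) -> float:
        """Latest time the next arrival may appear before a knee is declared."""
        if self.last is None or self.arr is None:
            return math.inf
        return self.last + self.arr + 4.0 * self.arrvar

    def observe(self, t: float) -> bool:
        """Record an arrival at time t; return True if it arrived after the threshold."""
        late = t > self.deadline()
        if self.last is not None:
            gap = t - self.last
            if self.arr is None:
                self.arr, self.arrvar = gap, gap / 2.0  # TCP-style initialization
            else:
                self.arrvar = (1 - self.beta) * self.arrvar + self.beta * abs(gap - self.arr)
                self.arr = (1 - self.alpha) * self.arr + self.alpha * gap
        self.last = t
        return late
```

As in the description, the detector only says that a knee has been passed, not where it lies. A caller would also declare a knee if its clock passes `deadline()` before the next arrival shows up.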

III. KNEEDLE ALGORITHM

Kneedle is based on the notion that the points of maximum curvature in a data set (the knees) are approximately the set of points in a curve that are local maxima if the curve is rotated θ degrees clockwise about $(x_{\min}, y_{\min})$ through the line formed by the points $(x_{\min}, y_{\min})$ and $(x_{\max}, y_{\max})$. We choose this line because we want to preserve the overall behavior of the data set; using a line of best fit, for example, risks cutting off the end points due to a higher concentration of points in the middle of the curve. After rotating about this line, the local maxima (and thus knees) are the points at which the curve differs most from the straight line segment connecting the first and last data point, thereby approximating the point of maximum curvature for a discrete set of points. Since maximum curvature is an inherent measure of the point where a continuous function differs most from a straight line, Kneedle uses a literal measure of the point that differs most from the straight line connecting the set’s end-points.
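The rotation view can be checked numerically. The sketch below assumes the data are sorted by x and increasing, so that $(x_{\min}, y_{\min})$ and $(x_{\max}, y_{\max})$ are the first and last points. After a clockwise rotation by the chord's angle θ, each point's new y-coordinate is its signed perpendicular distance from the chord. For data already normalized to the unit square, θ = 45°, so this height is just $(y - x)/\sqrt{2}$, the difference curve scaled by a constant. `rotated_heights` is a hypothetical helper name.

```python
import math
from typing import List, Sequence


def rotated_heights(xs: Sequence[float], ys: Sequence[float]) -> List[float]:
    """y-coordinates after rotating the curve clockwise about its first point
    so that the chord to its last point becomes horizontal."""
    x0, y0 = xs[0], ys[0]
    theta = math.atan2(ys[-1] - y0, xs[-1] - x0)  # angle of the end-point chord
    # Clockwise rotation: y' = -(x - x0) sin(theta) + (y - y0) cos(theta),
    # i.e. the signed perpendicular distance from the chord.
    return [-(x - x0) * math.sin(theta) + (y - y0) * math.cos(theta) for x, y in zip(xs, ys)]


xs = [i / 100 for i in range(101)]
ys = [math.sqrt(x) for x in xs]           # concave, already in the unit square
h = rotated_heights(xs, ys)
print(xs[max(range(len(h)), key=h.__getitem__)])  # 0.25
```

For y = √x the farthest point from the chord y = x is at x = 0.25, and the same index maximizes y − x.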

Figure 2 depicts how Kneedle works for data points drawn from the curve y = −1/x + 5 where x-values are between 0 and 1. Note that we assume that the curves under consideration have negative concavity. For curves with consistently positive concavity (e.g., forming “elbows” rather than knees) it is trivial to invert the graph by replacing each $y_i$ with $y_{\max} - y_i$ and $x_i$ with $x_{\max} - x_i$.
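The inversion just described is a one-line transformation per axis. This sketch (with a hypothetical function name) applies it and re-sorts the points by their new x-values. A knee found at x′ on the transformed curve corresponds to x = x_max − x′ on the original one.

```python
from typing import List, Sequence, Tuple


def elbow_to_knee(xs: Sequence[float], ys: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Turn a convex ('elbow') curve into a concave one: y -> y_max - y, x -> x_max - x."""
    x_max, y_max = max(xs), max(ys)
    # Flip both axes, then re-sort so the new x-values are increasing.
    flipped = sorted((x_max - x, y_max - y) for x, y in zip(xs, ys))
    return [p[0] for p in flipped], [p[1] for p in flipped]


print(elbow_to_knee([0, 1, 2, 3, 4], [0, 1, 4, 9, 16]))  # ([0, 1, 2, 3, 4], [0, 7, 12, 15, 16])
```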

We summarize Kneedle below. Put simply, knees occur when a curve becomes more “flat,” indicating a decrease in curvature. The algorithm works as follows:

1. First we use a smoothing spline to preserve the shape of the original data set as much as possible, although other smoothing techniques, such as an exponentially weighted moving average, could also be used. Let $D_s$ represent the finite set of x- and y-values that define a smooth curve, i.e., one that has been fit to a smoothing spline:

$$D_s = \{(x_{s_i}, y_{s_i}) \in \mathbb{R}^2 \mid x_{s_i} \ge 0\}.$$

2. We want our algorithm to function in the same way regardless of the magnitude of the values in the underlying data. Thus, we next normalize the points of the smooth curve to the unit square, as shown in Figure 2(a). This does not change the shape or trends of the data set:

$$D_n = \{(x_{n_i}, y_{n_i})\}, \text{ where } x_{n_i} = \frac{x_{s_i} - \min\{x_s\}}{\max\{x_s\} - \min\{x_s\}},\quad y_{n_i} = \frac{y_{s_i} - \min\{y_s\}}{\max\{y_s\} - \min\{y_s\}}.$$

3. Next, we let $D_d$ represent the set of differences between the x- and y-values, i.e., the set of points (x, y − x) as illustrated in Figure 2(b). The goal is to find out when the difference curve changes from horizontal to sharply decreasing, since this indicates the presence of a knee in the original data set. Note that the actual values of the difference points are irrelevant. We are only interested in observing the trends of the difference curve, as seen in Figure 2(c):

$$D_d = \{(x_{d_i}, y_{d_i})\}, \text{ where } x_{d_i} = x_{n_i},\quad y_{d_i} = y_{n_i} - x_{n_i}.$$

[Figure 3: knees marked by the maximum-curvature definition and by Kneedle, Menger, Angle-based, and EWMA on one synthetic curve; x-axis from 0 to 100.]

Fig. 3: Kneedle, Menger, Angle-based, and EWMA for synthetic data set. Maximum curvature occurs at x = 60.

[Figure 4: F-Score (0 to 0.5) versus allowable error in number of points (1 to 6) for Kneedle, Menger, and Angle-based.]

Fig. 4: Measured offline F-Score of knee detection algorithms using NoisyGaussian data.

[Figure 5: probability density versus difference to maximum curvature (−50 to 150) for Kneedle, Menger, and Angle-based.]

Fig. 5: Histogram showing measured offline distances (numbers of x-values) to “correct” knees.

4. To find the knee points in the normalized curve, i.e., the places where the curve flattens out, we calculate the local maxima of the difference curve. These points indicate the instances where the rate of increase of y begins to decrease. Each of these local maximum points is a candidate knee point in the original data curve:

$$D_{lmx} = \{(x_{lmx_i}, y_{lmx_i})\}, \text{ where } x_{lmx_i} = x_{d_i},\quad y_{lmx_i} = y_{d_i} \mid y_{d_{i-1}} < y_{d_i} > y_{d_{i+1}}.$$

5. For each local maximum $(x_{lmx_i}, y_{lmx_i})$ in the difference curve, we define a unique threshold value, $T_{lmx_i}$, that is based on the average difference between consecutive x-values and a sensitivity parameter, S. The sensitivity parameter allows us to adjust how aggressive we want Kneedle to be when detecting knees. Smaller values for S detect knees quicker, while larger values are more conservative. Put simply, S is a measure of how many “flat” points we expect to see in the unmodified data curve before declaring a knee. We explore the choice of S in Section IV. In Figure 2(c), the threshold line is plotted with S = 1:

$$T_{lmx_i} = y_{lmx_i} - S \cdot \frac{\sum_{k=1}^{n-1} \left(x_{n_{k+1}} - x_{n_k}\right)}{n - 1}.$$

6. If any difference value $(x_{d_j}, y_{d_j})$, where j > i, drops below the threshold $y = T_{lmx_i}$ for $(x_{lmx_i}, y_{lmx_i})$ before the next local maximum in the difference curve is reached, Kneedle declares a knee at the x-value of the corresponding local maximum, $x = x_{lmx_i}$. If the difference values reach a local minimum and start to increase before $y = T_{lmx_i}$ is reached, we reset the threshold value to 0 and wait for another local maximum to be reached.
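Steps 2 to 6 can be sketched in a few dozen lines of plain Python. This is a hedged, offline illustration rather than the authors' implementation. It skips step 1 and assumes the input is already smooth (a smoothing spline such as SciPy's `UnivariateSpline` could be applied first), increasing, and concave-down. It also models the "reset the threshold to 0" rule as disarming the pending candidate, which is its effect whenever the difference values stay non-negative. Because candidates are re-armed at every new local maximum, the loop can report several knees.

```python
from typing import List, Optional, Sequence


def kneedle(xs: Sequence[float], ys: Sequence[float], S: float = 1.0) -> List[float]:
    """Offline sketch of Kneedle steps 2-6 for smooth, increasing, concave-down data.

    Returns the x-values (original units) at which knees are declared.
    """
    n = len(xs)
    if n < 3 or max(xs) == min(xs) or max(ys) == min(ys):
        return []
    # Step 2: normalize both axes to the unit square.
    x_lo, x_hi, y_lo, y_hi = min(xs), max(xs), min(ys), max(ys)
    xn = [(x - x_lo) / (x_hi - x_lo) for x in xs]
    yn = [(y - y_lo) / (y_hi - y_lo) for y in ys]
    # Step 3: difference curve y_d = y_n - x_n.
    yd = [y - x for x, y in zip(xn, yn)]
    # Step 5 (shared term): average gap between consecutive normalized x-values.
    mean_gap = sum(xn[k + 1] - xn[k] for k in range(n - 1)) / (n - 1)

    knees: List[float] = []
    candidate: Optional[int] = None  # index of the pending local maximum, if any
    threshold = 0.0
    for j in range(1, n):
        # Step 6: the pending candidate becomes a knee once y_d drops below its threshold.
        if candidate is not None and yd[j] < threshold:
            knees.append(xs[candidate])
            candidate = None
        if j == n - 1:
            break  # the last point has no right neighbour to compare against
        if yd[j - 1] < yd[j] > yd[j + 1]:      # Step 4: local maximum -> new candidate
            candidate = j
            threshold = yd[j] - S * mean_gap    # Step 5: T_lmx for this maximum
        elif yd[j - 1] > yd[j] < yd[j + 1]:    # Step 6: local minimum reached first
            candidate = None                    # reset: wait for the next local maximum


    return knees


xs = list(range(101))
ys = [x ** 0.5 for x in xs]
print(kneedle(xs, ys))  # [25]
```

For y = √x the difference curve peaks at x = 25. With S = 1 the knee is declared a few points later, once y_d falls below the peak minus one average x-gap; a very large S never reaches its threshold and reports nothing.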

Note that Kneedle can be run offline or online. In the online case, Kneedle can “correct” old knee values if necessary as points are received. Kneedle’s online run time for any given n pairs of x- and y-values is bounded by $\sum_{i=1}^{n} i = O(n^2)$.

IV. EVALUATING KNEEDLE

We compare the performance of Kneedle to the offline (Angle-based, Menger) and online (EWMA) algorithms separately, since their goals are different. In offline settings, our aim is to determine a baseline accuracy for each algorithm using synthetic data sets drawn from continuous functions where the true knees are well-known. After showing that Kneedle closely approximates the true knees, we then compare its online behavior against EWMA to evaluate how quickly it is able to detect knees once they “appear” in the data.

A. Detecting Knees in Synthetic Data Sets

To evaluate Kneedle, we developed a synthetic data source which we call NoisyGaussian that yields data similar to many of the real data sets of interest, but allows us to vary the overall shape of the curve. To generate a NoisyGaussian, we start with a Gaussian function with a randomly selected standard deviation and mean. Then we generate the NoisyGaussian data set using the cumulative count of the randomly generated points whose value is less than x. The resulting curve is similar to a Gaussian cumulative distribution function in overall shape.
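A NoisyGaussian-style curve can be generated along the lines described above. In this sketch the sample count, the x-grid, and the seed are illustrative assumptions, and µ and σ are passed in rather than drawn at random (a caller could draw them from any distribution it likes). `bisect_left` on the sorted samples gives, for each x, how many samples are strictly less than x.

```python
import bisect
import random
from typing import List, Optional, Sequence, Tuple


def noisy_gaussian(
    mu: float,
    sigma: float,
    xs: Sequence[float],
    n_samples: int = 1000,
    seed: Optional[int] = None,
) -> Tuple[List[float], List[int]]:
    """Cumulative count of N(mu, sigma^2) samples below each x: a noisy, CDF-shaped curve."""
    rng = random.Random(seed)
    samples = sorted(rng.gauss(mu, sigma) for _ in range(n_samples))
    # bisect_left returns the number of sorted samples strictly less than x.
    counts = [bisect.bisect_left(samples, x) for x in xs]
    return list(xs), counts


xs, ys = noisy_gaussian(50.0, 10.0, range(101), seed=0)
```

The resulting counts rise from near 0 to near `n_samples`, with the steepest growth around µ and sampling noise along the way.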

The benefit of evaluating the knee detection algorithms using NoisyGaussian is that an approximate closed-form solution exists for the point of maximum curvature. We derive the point of maximum curvature by computing it for the underlying Gaussian CDF in terms of standard deviation σ and mean µ. Although we omit the details for brevity, the point of maximum curvature is approximately x ≈ µ + σ with a small bounded error. We use this closed-form expression to represent the “correct” knee in our evaluation.
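Since the derivation is omitted above, here is one way to arrive at x ≈ µ + σ. It is our own reconstruction, not the authors' text. It assumes the curve is the Gaussian CDF $f(x) = \Phi((x-\mu)/\sigma)$ with values in [0, 1], and writes $z = (x-\mu)/\sigma$, with φ denoting the standard normal density.

```latex
\begin{aligned}
f'(x) &= \frac{\varphi(z)}{\sigma}, \qquad f''(x) = -\frac{z\,\varphi(z)}{\sigma^{2}},\\
K_f(x) &= \frac{f''(x)}{\left(1 + f'(x)^{2}\right)^{1.5}}
        = -\frac{z\,\varphi(z)/\sigma^{2}}{\left(1 + \varphi(z)^{2}/\sigma^{2}\right)^{1.5}},\\
\text{if } \varphi(z)^{2}/\sigma^{2} \ll 1:\quad
|K_f(x)| &\approx \frac{z\,\varphi(z)}{\sigma^{2}} \qquad (z > 0),\\
\frac{d}{dz}\bigl[z\,\varphi(z)\bigr] &= \varphi(z)\,\bigl(1 - z^{2}\bigr) = 0
\;\Longrightarrow\; z = 1 \;\Longrightarrow\; x \approx \mu + \sigma .
\end{aligned}
```

Keeping the denominator shifts the maximizer slightly to the right of z = 1. Because φ(z) ≤ 1/√(2π) ≈ 0.4, the shift shrinks as σ grows; for σ = 1 it lands near z ≈ 1.08. This is consistent with the small bounded error stated above.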

To illustrate the general behavior of each knee detector, we plot the knees each algorithm detects in Figure 3 for a sample NoisyGaussian data set with µ = 50 and σ = 10.

B. Offline Accuracy

To evaluate offline accuracy, we use three common statistical metrics: precision, recall, and F-Score. Precision measures the correctness of each knee an algorithm detects. A low precision value indicates the presence of numerous false positives, where a false positive is any detected knee that does not align with maximum curvature. Recall measures completeness by quantifying the percentage of correct knees an algorithm detects out of the total number of correct knees. Note, however, that recall does not penalize for incorrect detections. Our third metric, F-Score, is the harmonic mean of precision and recall. Since an ideal knee detection algorithm has both high recall and high precision, we use F-Score to capture both measures of accuracy in a single value. An F-Score value of 1 is best.

[Figure 6: probability density versus latency of detection (−40 to 20) for Kneedle and EWMA.]

Fig. 6: Online detection latency. Negative values indicate early detections.

[Figure 7: F-Score versus sensitivity parameter S (0 to 10).]

Fig. 7: Measured offline F-Scores for varying sensitivity values in Kneedle.

[Figure 8: F-Score versus percent of points received (0 to 100) for sensitivity values S = 0.001, 1.0, and 5.0.]

Fig. 8: Measured online F-Scores for varying sensitivity values in Kneedle.

To evaluate our algorithms, we generate 10,000 NoisyGaussian data sets. Since none of the algorithms detect knees at exactly the point of maximum curvature, we vary how many data points we allow for error. For example, suppose our data set includes points at x = 1, 2, 3, 4, 5, and the point of maximum curvature is x = 4. With an allowable error of 1, we declare the algorithm as finding a “correct” knee if it detects a knee at x = 3, 4, or 5. Figure 4 shows that Kneedle’s F-Score is better than that of the Angle-based or Menger algorithms.
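The allowable-error scoring can be sketched as below, with positions measured in numbers of points as in the example above. The matching rule is an assumption, since the text does not say how detections and true knees are paired. Here a detection counts toward precision if it lies within `tol` points of any true knee, and a true knee counts toward recall if any detection lies within `tol` of it.

```python
from typing import Sequence


def f_score(detected: Sequence[int], truth: Sequence[int], tol: int) -> float:
    """Harmonic mean of precision and recall with an allowable error of tol points."""
    if not detected or not truth:
        return 0.0

    def near(p: int, refs: Sequence[int]) -> bool:
        return any(abs(p - r) <= tol for r in refs)

    precision = sum(near(d, truth) for d in detected) / len(detected)
    recall = sum(near(t, detected) for t in truth) / len(truth)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# The example from the text: points x = 1..5, true knee at x = 4, allowable error 1.
print(f_score([3], [4], tol=1))  # 1.0
print(f_score([2], [4], tol=1))  # 0.0
```

Adding a spurious detection lowers precision but not recall; for instance, detections at 3 and 1 give an F-Score of 2/3.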

Using the closed-form approximation for the point of maximum curvature in our NoisyGaussian data sets, we can identify “true” knees in the data. This allows us to quantify the accuracy of each algorithm by measuring the distance, in terms of the number of x-values, between the true knees and the detected knees. Figure 5 shows a histogram of these distances. Kneedle approximates the point of maximum curvature much more closely than either Menger or Angle-based: its density is highest between 0 and 25, while Menger and Angle-based show a wider variation.

C. Online Detection Latency

In this section, we evaluate detection latency (the number of data points beyond the knee required for detection) for both EWMA and Kneedle. For online Kneedle, we execute the knee detection algorithm after receiving each new data point, in order of increasing x. For both EWMA and Kneedle, we compute the detection latency as the number of data points between when the algorithm detects a knee and the actual knee point as determined by the point of maximum curvature. For example, suppose the data set has points at x = 1, 2, 3, 4, and 5, with a true knee at x = 3. Now suppose that after receiving the point at x = 5, the knee detection algorithm detects a knee. In this case, we compute the latency as 5 − 3 = 2.
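The latency measurement can be sketched as a small harness that re-runs a detector on every prefix of the data, as online Kneedle does. Re-running on each prefix is also where the Σ i = O(n²) online cost comes from. The `Detector` callable type and the function name are hypothetical. Any detector that reports whether it currently signals a knee can be plugged in, and positions are measured in point indices.

```python
from typing import Callable, Optional, Sequence

# A detector sees the points received so far and reports whether it now signals a knee.
Detector = Callable[[Sequence[float], Sequence[float]], bool]


def detection_latency(
    xs: Sequence[float], ys: Sequence[float], detect: Detector, true_knee: int
) -> Optional[int]:
    """Points between the true knee index and the index at which detect first fires."""
    for i in range(len(xs)):
        if detect(xs[: i + 1], ys[: i + 1]):  # re-run on every prefix received so far
            return i - true_knee               # negative values mean early detection
    return None                                # the detector never fired


# The example from the text: true knee at x = 3 (index 2), detection after receiving x = 5.
xs, ys = [1, 2, 3, 4, 5], [0, 2, 3, 3, 3]
print(detection_latency(xs, ys, lambda px, py: len(px) == 5, true_knee=2))  # 2
```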

In Figure 6 we plot a histogram of the detection latency for EWMA and Kneedle with S = 1. The experiment highlights the fact that Kneedle rarely has a significant detection latency, while EWMA often has high detection latencies.

D. Sensitivity

To better understand the importance of sensitivity, S, to Kneedle’s performance, we again use F-Score. Figures 7 and 8 show the results of our sensitivity analysis in offline and online settings, respectively. In both graphs, we compute Kneedle’s F-Score using a wide range of sensitivity values. We compare the F-Score from 10,000 data sets for each value of S. In the offline graph, we use the points of maximum curvature as the true knees, and compute the F-Score based on those values. In the online graph, our goal is to determine how quickly Kneedle approaches the offline case, and thus we use the knees detected by offline Kneedle as the correct knees. Not surprisingly, in offline settings where Kneedle has perfect information, the highest F-Score occurs when S = 0. In online settings, the results vary depending on the number of points received, but overall S = 1 has the best results.

V. APPLICATION RESULTS

This section demonstrates Kneedle’s usefulness in real applications. First, we identify knees in a data set from prior work, and show that we find close to the same knees that the authors found with system-specific techniques. Next we evaluate Kneedle’s performance for two sample applications: a MapReduce-like system and a TCP-friendly network protocol.

A. Using Kneedle in Existing Applications

Figure 9 applies knees to object replication, where the knees represent the optimal degrees of replication for high availability given various object distributions (data from Figure 5 in [5]). The application requires the detection of multiple knees in object popularity curves, each of which has considerable noise. Unlike other knee detection algorithms, such as Menger, Kneedle is capable of detecting multiple knees, where the sensitivity of this detection depends on the selected value of S. Note that we consider this knee detection application to be offline, since Zhong et al. observe: “[w]e expect that the replica adjustment overhead due to object request popularity changes would not be excessive in practice...our analysis of real system object request traces in Section 3.2 suggests that the popularities of most data objects tend to remain stable over multi-week periods.” The knees found by Kneedle in this graph concur with those identified by the original authors.


References

MapReduce: simplified data processing on large clusters

A density-based algorithm for discovering clusters in large spatial databases with noise

Detection of abrupt changes: theory and application
