HiCS: High Contrast Subspaces
for Density-Based Outlier Ranking
Fabian Keller, Emmanuel Müller, Klemens Böhm
Institute for Program Structures and Data Organization
Karlsruhe Institute of Technology (KIT), Germany
{fabian.keller, emmanuel.mueller, klemens.boehm}@kit.edu
Abstract—Outlier mining is a major task in data analysis.
Outliers are objects that highly deviate from regular objects in
their local neighborhood. Density-based outlier ranking methods
score each object based on its degree of deviation. In many
applications, these ranking methods degenerate to random list-
ings due to low contrast between outliers and regular objects.
Outliers do not show up in the scattered full space; they are hidden in multiple high contrast subspace projections of the data.
Measuring the contrast of such subspaces for outlier rankings is
an open research challenge.
In this work, we propose a novel subspace search method that
selects high contrast subspaces for density-based outlier ranking.
It is designed as a pre-processing step for outlier ranking algorithms.
It searches for high contrast subspaces with a significant amount
of conditional dependence among the subspace dimensions. With
our approach, we propose a first measure for the contrast of
subspaces. Thus, we enhance the quality of traditional outlier
rankings by computing outlier scores in high contrast projections
only. The evaluation on real and synthetic data shows that
our approach outperforms traditional dimensionality reduction
techniques, naive random projections as well as state-of-the-art
subspace search techniques and provides enhanced quality for
outlier ranking.
I. INTRODUCTION
Outlier mining is an important task in the field of knowl-
edge discovery. In applications such as fraud detection, gene-
expression analysis or environmental surveillance, one is in-
terested in rare, suspicious, and unexpected objects. Outlier
analysis searches for such highly deviating objects in contrast
to regular objects. An outlier has highly deviating attribute
values compared to its local neighborhood. For example, in
environmental surveillance (cf. Fig. 1) a sensor node might be
an outlier as it shows an abnormally high deviation w.r.t. air
pollution index and noise level. For instance, outlier 1 shows a high deviation in this specific subset of attributes only. Another sensor node (outlier 2) shows high deviation w.r.t. humidity
and temperature, independent of its air pollution index and its
noise level. Thus, a sensor node might be an outlier in one of
these attribute combinations and a regular object in all other
attributes. In general, these multiple roles (outlying vs. regular
behavior) of objects can be observed in other domains as well:
Suspicious customers show fraud activity only w.r.t. some
financial transactions, and genes show unexpected expression
only under specific medical conditions.
Traditional outlier mining [26], [16], [5], [13], [7], [25] is
unable to detect such outliers hidden in subsets of all given
attributes.
Fig. 1. Environmental surveillance example: suspicious sensor readings (three attribute-pair plots: air pollution index vs. noise level showing outlier 1, and temperature vs. humidity showing outlier 2, both high contrast; noise level vs. humidity, low contrast)
Most outlier mining techniques search for outliers
w.r.t. all given attributes. Considering object distances in the
full data space, these methods fall prey to randomly distributed
attribute combinations. In our example, humidity and noise
level in combination show no clear outlier objects and hinder
outlier detection. Furthermore, due to the increasing number
of attributes in today’s databases, distances between objects
grow more and more alike [6]. Outlier ranking techniques
score each object based on the degree of deviation, e.g.,
by computing its density in the full data space [7]. Thus,
for high dimensional data, outlier rankings degenerate to
random listings, as outliers do not show up in the full space.
Other common statistical techniques try to detect outliers in
single attributes [26]. However, by ignoring the dependencies
between several attributes, these techniques miss outliers that
appear only due to correlations in multi-dimensional spaces.
We focus on such outliers, which are neither visible in the full
space nor in a single attribute.
Subspace mining has been proposed as a novel data mining
paradigm to tackle this challenge. It detects highly deviating
objects in any possible attribute combination (low dimensional
projection). While dimensionality reduction techniques aim at
such lower dimensional projections, they are not designed as
a pre-processing step for outlier ranking. General measures, such
as the variance of the data in PCA [14], are not appropriate
objective functions for outlier ranking. Novel quality criteria
and processing schemes are required for subspace outlier
mining. In particular, we search for high contrast subspaces.
Such subspaces have the defining characteristic that outliers
can be clearly distinguished from regular objects within the
subspace context. Our general aim is a two-step processing:
(1) Subspace search: measuring the contrast of subspaces
(2) Outlier ranking: score objects in high contrast subspaces
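To make the decoupling concrete, here is a minimal Python sketch of the two-step pipeline; the function names and the restriction to two-dimensional candidate subspaces are illustrative assumptions, not part of the paper.

```python
from itertools import combinations
import numpy as np

def subspace_search(data, contrast, top_k=10):
    """Step 1: rank candidate attribute subsets by a contrast measure
    and keep the top_k high contrast subspaces."""
    D = data.shape[1]
    candidates = list(combinations(range(D), 2))  # 2-D candidates for brevity
    return sorted(candidates, key=lambda S: contrast(data, S), reverse=True)[:top_k]

def outlier_ranking(data, subspaces, score_S):
    """Step 2: average an outlier score over the selected subspaces
    (cf. Definition 1 in Section III)."""
    return np.mean([score_S(data, S) for S in subspaces], axis=0)
```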

We consider the decoupling of these two steps to be an open
research issue. Current subspace outlier mining techniques [1],
[18], [23], [21] focus on interleaved algorithms only, which
select subspaces during outlier mining. We propose to consider
subspace outlier mining as a decoupled process, divided into
“subspace search” and “outlier ranking”. By treating these
two steps as independent problems, one can design and
combine the respective algorithms in a modular fashion. It
also allows both research fields to evolve independently. Consequently, any improvement in either of these steps will lead
to an improvement in the overall outlier detection quality.
Thus, future research in outlier mining may benefit from the
proposed decoupling.
In this work, we focus on the first step and propose a novel
subspace search method that selects high contrast subspaces
for density-based outlier ranking. As outlier score for the
ranking we rely on the commonly used local outlier factor
(LOF) [7]. However, any other outlier score could be used as
instantiation of the second step. Our subspace search technique
is based on a novel selection of high contrast subspaces
(HiCS). It provides three main contributions:
• The decoupling of subspace search as a generalized pre-processing step for outlier ranking
• A contrast measure based on the conditional dependence of dimensions in the selected subspaces
• Two statistical instantiations of our contrast measure, ensuring a robust parametrization of our technique
Our contrast measure is based on statistical tests and enables
a high quality outlier ranking of outliers hidden in arbitrary
subspace projections. Our approach searches for high contrast
subspaces with a significant amount of conditional dependence
among the selected dimensions. Thus, we enhance the quality
of traditional outlier rankings by computing outlier scores in
high contrast projections only. The evaluation on real and
synthetic data shows that our approach outperforms tradi-
tional dimensionality reduction techniques [14], naive random
projections [20] as well as state-of-the-art subspace search
techniques [8], [15] and provides enhanced quality for outlier
rankings.
II. RELATED WORK
In this section, we review existing techniques in the areas
of outlier discovery and subspace mining. In particular, we
explain the differences of existing paradigms compared to our
novel subspace search approach.
a) Traditional Outlier Ranking: Different outlier detection paradigms have been proposed in the literature, ranging from deviation-based methods [26] and distance-based methods [16], [5], [13] to density-based methods [7], [25]. We
focus on the density-based outlier ranking paradigm, which
computes a score for each object by measuring its degree
of deviation w.r.t. a local neighborhood. Thus, one is able to
detect local density variations between low density outliers and
their high density (clustered) neighborhood. However, all of
those traditional outlier mining approaches have one drawback.
They cannot detect outliers in subspaces, as their degree of
deviation considers only the full data space.
b) Subspace Outlier Ranking: Outlier detection in sub-
spaces has first been proposed by [1]. Recent approaches have
enhanced subspace outlier mining by ranking objects based
on any possible subspace projection [11], [20], [18], [23],
[21]. These techniques differ in their choice of subspaces. The
majority of approaches uses specialized heuristics for subspace
selection that are integrated into the outlier ranking [11], [18],
[23], [21]. In general, all of these techniques use an integrated
processing of subspaces and outliers. This implies that scoring
functions and subspace selection are tightly coupled such that
none of these techniques would benefit from a novel scoring
function or a novel subspace selection technique.
The only approach with a decoupled processing serves as a baseline for our technique. It selects several subspace
projections randomly [20]. Obviously, this random selection
does not guarantee high quality results. Selection of arbitrary
projections will result in random rankings just as in the full
data space. With our work we aim at a decoupled processing
with two steps as proposed in [20]. In contrast to a naive
random selection of subspaces, we aim at an enhanced contrast
measure based on sound statistical foundations.
c) Subspace Search: Based on the general idea of sub-
space mining in arbitrary projections of the data, several pre-
processing techniques for the selection of subspaces have
been proposed [8], [15], [24], [4]. All of these techniques
focus on the related domain of subspace clustering. They
try to decouple the detection of clusters and the selection of
individual subspaces for each cluster. However, each of the
four subspace search models depends on a specific cluster
definition.
First, the Enclus approach proposes a selection based on the
entropy measure [8]. Its quality measure for subspaces highly
depends on the subspace clustering algorithm CLIQUE [2]. It
partitions the data space in equally sized grid cells. A subspace
is selected if it has low entropy, i.e., if it shows a large variation
in the densities of the grid cells. With our approach we follow
this basic idea of contrast, however, we do not rely on fixed
grid cells. This is because they induce several drawbacks for
density estimation in high dimensional spaces.
Other techniques, namely RIS [15] and SURFING [4], have
been proposed for the detection of density-based subspace
clusters based on the DBSCAN paradigm [10]. For instance,
RIS counts the core objects in a subspace projection and
uses them as a measure for its subspace selection criterion.
Recently, a subspace search method has been proposed for
spectral clustering as well [24].
In general, all of the proposed subspace search methods fo-
cus on specific clustering tasks. Their selection highly depends
on the underlying clustering model. In contrast to this, our
technique is based on a more general analysis of conditional
dependence. Furthermore, we propose an instantiation of our
objective function that aims at high contrast w.r.t. density-
based outlier ranking, and thus, is tailored to detect low density
regions as required for many outlier models.

III. HIGH CONTRAST SUBSPACES (HICS)
The main idea of our HiCS approach is the statistical
selection of high contrast subspaces. We propose a processing
based on a series of statistical tests. Each test compares the
data distribution in a local subspace region to its marginal
distribution. Dependencies between attributes highlight the
high contrast of a subspace. Based on these statistical tests
and the detected dependence between attributes we derive our
contrast measure. It provides the means for high quality outlier
ranking in a selection of high contrast subspaces.
Overall, HiCS establishes a first statistical subspace search
technique for density-based outlier ranking. In the following,
we will introduce the necessary notation in Section III-A, and
define the general objective for our high contrast subspaces in
Section III-B. We will introduce the notion of subspace slices
that specify local subspace regions in Section III-C, and define
the contrast measure in Section III-D. In Section III-E we will
show how different statistical tests can be used to instantiate
our contrast definition.
A. Notation
Let DB be a database containing N objects, each described by a D-dimensional real-valued data vector $x = (x_1, \dots, x_D)$. The set $A = \{1, \dots, D\}$ denotes the full data space of all given attributes. Any attribute subset $S = \{s_1, \dots, s_d\} \subseteq A$ will be called a d-dimensional subspace projection. We denote the distance between objects x and y as $dist_A(x, y)$, which can be instantiated for instance by the widely used Euclidean distance $dist_A(x, y) = \sqrt{\sum_{s \in A} (x_s - y_s)^2}$.
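As a small illustration of this notation, the following Python sketch (the function name and array layout are our own, for illustration only) computes the Euclidean distance restricted to an attribute subset S:

```python
import numpy as np

def dist(x, y, S):
    """Euclidean distance between objects x and y, restricted to
    the attribute subset S (a list of column indices)."""
    x, y = np.asarray(x), np.asarray(y)
    return np.sqrt(np.sum((x[list(S)] - y[list(S)]) ** 2))

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 2.0, 1.0, 4.0])
print(dist(x, y, S=[0, 1, 2, 3]))  # full-space distance dist_A
print(dist(x, y, S=[0, 2]))        # subspace distance dist_S
```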
As a general property of any outlier ranking method, we have
to consider the underlying scoring function. It measures the
outlierness of an object. Traditionally, each object is sorted
according to a single outlier score score(x) measuring the
degree of deviation in all given attributes A. Traditional
density-based outlier scores measure the density p(x) of an
object and compare it to the density in the local neighborhood
of x. Local outlier ranking based on density deviation in
local neighborhoods has first been proposed by LOF [7]. In
recent years, this outlier mining paradigm has been extended
by enhanced scoring functions and efficient outlier ranking
algorithms [25], [5], [13], [19], [17], [23], [9].
The problem with all of these full space approaches is intro-
duced by the curse of dimensionality. As pointed out in [6], the
definition of a local neighborhood becomes meaningless for
a large number of attributes. Furthermore, distances between objects grow more and more alike; thus
$$\lim_{|A| \to \infty} \left( \max_{z \in DB} dist_A(z, x) - \min_{z \in DB} dist_A(z, x) \right) = 0$$
Since local outlier ranking calculates the density based on the
object distances, we observe the same effect for the minimal
and maximal value of score(x). As a result, all mentioned
outlier score functions will suffer from a loss of contrast, i.e.:
$$score(x) \approx score(y) \quad \forall\, x, y \in DB$$
Any outlier ranking obtained for a sufficiently high dimen-
sional database will degenerate into a random ranking with
very similar scores for all objects.
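This loss of contrast is easy to reproduce; a short illustrative Python experiment (our own construction, with uniformly distributed data) shows the relative gap between the farthest and nearest neighbor shrinking as the dimensionality D grows:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000
for D in [2, 10, 100, 1000]:
    data = rng.random((N, D))    # uniformly scattered objects
    query = rng.random(D)
    dists = np.linalg.norm(data - query, axis=1)
    # relative contrast (max - min) / min shrinks toward 0 as D grows
    print(D, (dists.max() - dists.min()) / dists.min())
```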
Subspace outlier rankings address this problem by evalu-
ating the score function in lower dimensional subspace pro-
jections. They simply restrict the distance computation to a
selected subspace S, i.e., compute $dist_S$. Thus, any outlier ranking with score(x) can be extended to a subspace score $score_S(x)$. The idea is to aggregate these $score_S(x)$ values over several subspaces. Each score provides some insights about the deviation of x in a lower dimensional projection S. The final ranking is derived from the aggregation of these scores:
Definition 1 (Outlier Score):
$$score(x) = \frac{1}{|RS|} \sum_{S \in RS} score_S(x)$$
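A minimal Python sketch of Definition 1, using scikit-learn's LocalOutlierFactor as one possible instantiation of $score_S(x)$; the fixed subspace list RS below is a placeholder for illustration, whereas HiCS selects RS by contrast:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

def subspace_outlier_scores(data, RS, k=10):
    """Average the LOF score of every object over the subspaces in RS.
    data: (N, D) array; RS: list of attribute index tuples."""
    scores = np.zeros(len(data))
    for S in RS:
        lof = LocalOutlierFactor(n_neighbors=k)
        lof.fit(data[:, list(S)])
        # negative_outlier_factor_ is -LOF; negate so higher = more outlying
        scores += -lof.negative_outlier_factor_
    return scores / len(RS)

rng = np.random.default_rng(1)
data = rng.normal(size=(200, 5))
print(subspace_outlier_scores(data, RS=[(0, 1), (2, 3, 4)])[:5])
```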
In the most basic approach [20], RS is a selection of
random subspaces that contribute to the overall ranking. A
major drawback of this approach is that irrelevant subspaces
in RS might blur the overall order of objects. To tackle this
challenge, we propose a novel method to select high contrast
subspaces only. Our subspace search technique excludes low
contrast subspaces, which inhibit a clear distinction between
outliers and regular objects.
For our experiments, we instantiate $score_S(x)$ with the commonly used local outlier factor [7]. It has been used for the subspace extension based on random projections [20] as well. However, our technique is not restricted to LOF. Any other density-based scoring function could be used for $score_S(x)$. This flexibility w.r.t. the score function is a main
advantage of our method. We only consider the contrast of
subspaces and their selection as pre-processing step. Any
improvement in the area of outlier scoring can be applied
directly to our approach as well. In recent years several
extensions of LOF have addressed specific challenges for this
local outlier ranking [25], [19], [23], [17]. While each of these
publications proposes an individual score function, they all
have an assumption in common: An outlier has low density
compared to its local neighborhood. Our technique relies
only on this general assumption.
To derive our criterion for subspace contrast, we treat the
attributes in DB as random variables. We use the notion of
probability density functions (pdf) to derive the formal back-
ground of our contrast criterion. We will adapt the notation for
subspaces as follows. For a given subspace $S = \{s_1, \dots, s_d\}$, we refer to the projected data vectors as $x_S = (x_{s_1}, \dots, x_{s_d})$.
Notation 1: The subspace data vector $x_S$ is distributed by an unknown joint pdf of S:
$$p_{s_1,\dots,s_d}(x_{s_1}, \dots, x_{s_d})$$
By integration over all attributes $s \in A \setminus \{s_i\}$ we obtain:
Notation 2: The marginal pdf of attribute $s_i$:
$$p_{s_i}(x_{s_i})$$

Fig. 2. High vs. low contrast and the effects on outlier ranking: (a) dataset A, an example of an uncorrelated joint pdf; (b) dataset B, an example of a correlated joint pdf
Please note that the marginal densities are simply one-
dimensional projections, independent from any subspace. Fur-
thermore, we can require a condition on the attributes $s \in S \setminus \{s_i\}$, which leads to the following notion.
Notation 3: The conditional pdf of attribute $s_i$:
$$p_{s_i \mid s \in S \setminus \{s_i\}}\left(x_{s_i} \mid \{x_s : s \in S \setminus \{s_i\}\}\right)$$
Thus, we express the probability density function of $s_i$ w.r.t. $|S| - 1$ conditions on all other attributes in the subspace.
B. High Contrast Improves Outlier Ranking
Given the notion of probability density in any subspace
S, we measure the contrast by comparing conditional prob-
ability densities to the corresponding marginal densities for
all attributes $s_i \in S$. This idea is based on the following key hypothesis: the detection of non-trivial outliers is only possible in a subspace S that shows high dependence between all attributes $s_i \in S$. The notion of non-trivial outliers is
a new concept and we will postpone the formal definition
for a moment. Intuitively, a non-trivial outlier is an outlier
in subspace S, but it is not visible as outlier in any one-
dimensional projection of S, i.e., all its one-dimensional
attribute values are located in regions of high density. Based on
the one-dimensional projections, a non-trivial outlier appears
to be a clustered object.
1) Motivation Example:
We illustrate the relationship between correlated subspaces
and non-trivial outliers by a toy example (cf. Figure 2). It
consists of two two-dimensional datasets. Both datasets were
generated from the same marginal distributions. In dataset A, $s_1$ and $s_2$ are completely uncorrelated. As a result, this two-dimensional subspace is filled by a random scattering of objects in consistency with the marginal distribution. Nevertheless, the dataset contains an outlier object $o_1$. By considering the one-dimensional projections of this subspace, the existence of $o_1$ is not a surprise: $o_1$ could trivially be detected by the examination of the one-dimensional distribution of attribute $s_2$. We call such an object a trivial outlier. In summary, the
evaluation of the two-dimensional subspace does not reveal
any new information for this dataset.
The other dataset features marginal distributions identical to
the ones of dataset A. The difference is that dataset B shows a
significant correlation. The correlation allows the data objects
to form regions of varying or unexpected densities over the
total possible area that would be consistent with the marginal
distribution. We observe (a) cluster-like dense agglomerations
of objects and (b) sparse or even empty regions. Besides a
trivial outlier $o_1$, the subspace also features another outlier $o_2$. This time the outlier is hidden in all one-dimensional subspace projections, where it even appears to be a clustered object. We will call this type of object a non-trivial outlier.
For dataset B the evaluation of the two-dimensional subspace
was worthwhile and reveals significant insight regarding the
data structure. Accordingly, we have found an example for a
high contrast subspace in this case.
Once we have found such a high contrast subspace we
can apply any density-based outlier ranking algorithm: for
instance in dataset B, $o_1$ and $o_2$ both exhibit a much lower density compared to the local neighborhood. Thus, determining the outlierness in the two-dimensional subspace of dataset B would result in a detection of $o_1$ and $o_2$, i.e., $score_S(o_{1/2}) \gg score_S(o_i)$ for all other objects $o_i$ in the database.
We can also explain the essential idea of our approach
to identify high contrast subspaces using this toy example.
Depicted on top of each plot in Figure 2, we show two different
histograms for the $s_1$ axis of both datasets. The first one (red) represents the full data sample, i.e., corresponds to the marginal probability distribution $p_{s_1}(x_{s_1})$. The blue one shows the conditional probability distribution that is generated by the sample according to the selection range w.r.t. the $s_2$ axis (blue
area). The comparison of the blue vs. the red histograms for both datasets shows a basic property of correlation: Whereas
the histograms for dataset A are in good agreement, we see
a significant discrepancy between the two histograms for the
high contrast subspace B. The proposed HiCS algorithm is
based on the evaluation of this discrepancy.
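The following Python sketch mimics this comparison on synthetic stand-ins for datasets A and B (the data generation and the fixed selection range on $s_2$ are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 10_000
# Dataset A: s1 and s2 independent; dataset B: s1 and s2 correlated.
a_s1, a_s2 = rng.normal(size=N), rng.normal(size=N)
b_s1 = rng.normal(size=N)
b_s2 = b_s1 + 0.3 * rng.normal(size=N)

for name, s1, s2 in [("A", a_s1, a_s2), ("B", b_s1, b_s2)]:
    cond = s1[(s2 > 0.5) & (s2 < 1.5)]   # conditional sample: range on s2
    bins = np.linspace(-4, 4, 30)
    marg_hist, _ = np.histogram(s1, bins=bins, density=True)
    cond_hist, _ = np.histogram(cond, bins=bins, density=True)
    # total absolute discrepancy between conditional and marginal histogram:
    # small for the uncorrelated dataset A, large for the correlated dataset B
    print(name, np.abs(marg_hist - cond_hist).sum())
```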
Please note that we design our contrast measure as a
conservative subspace selection criterion. The set of selected
subspaces is a proper superset of the subspaces containing
non-trivial outliers. We will later show that high contrast is
a necessary condition for non-trivial outliers. Still, the result
may contain subspaces without any outliers.
In the following we will focus on non-trivial outliers only.
The reason is simple: A user might already know about the
existence of one-dimensional outliers; one can detect these
outliers by existing methods [26] without difficulty. Moreover,
our subspace search can detect trivial outliers as a by-product
of the search for non-trivial outliers. For instance in dataset B,
we will always detect $o_1$ as an outlier as soon as attribute $s_2$ is
part of any high contrast subspace. In any case, the detection
of non-trivial outliers will provide a much higher information
gain to the user. Therefore, we focus on the detection of
correlated subspaces containing such non-trivial outliers.
2) Contrast based on correlation of dimensions:
In probability theory, two events A and B are called inde-
pendent and uncorrelated, if and only if the probability of
the combined event is given by the product of the individual
probabilities, i.e.:
$$p(A \cap B) = p(A) \cdot p(B) \qquad (1)$$
By putting the notion of correlation in the context of sub-
spaces, we obtain:
Definition 2: A subspace S is called an uncorrelated
subspace if and only if:
$$p_{s_1,\dots,s_d}(x_{s_1}, \dots, x_{s_d}) = \prod_{i=1}^{d} p_{s_i}(x_{s_i}) \qquad (2)$$
Please note that the formal distinction between statistical
dependence and correlation is not important for our purpose.
Strictly speaking, the term set of independent attributes would
be the appropriate expression. Instead we prefer to use the
more concise term uncorrelated subspace to express the sta-
tistical independence within a subspace.
To support the observations regarding Figure 2, we want to
examine the characteristics of outlier mining in uncorrelated
subspaces more formally. The observation of a high value of
$score_S(x)$ implies that the object x is located in a region with a low value of the joint pdf $p_{s_1,\dots,s_d}(x_{s_1}, \dots, x_{s_d})$. On the other hand, we can evaluate the expected density for x under the assumption of an uncorrelated subspace:
$$p_{expected}(x_{s_1}, \dots, x_{s_d}) = \prod_{i=1}^{d} p_{s_i}(x_{s_i}) \qquad (3)$$
We define the notion of non-trivial outliers via a comparison of the expected density with the joint density:
Definition 3: We call an object $x_S$ a non-trivial outlier w.r.t. subspace S if
$$p_{s_1,\dots,s_d}(x_{s_1}, \dots, x_{s_d}) \ll p_{expected}(x_{s_1}, \dots, x_{s_d}) \qquad (4)$$
Comparing the definition of an uncorrelated subspace (Eq. 2)
with the definition of non-trivial outliers leads to:
Theorem 1: An uncorrelated subspace S does not contain
any non-trivial outlier.
For an uncorrelated subspace, the joint probability density
function $p_{s_1,\dots,s_d}(x_{s_1}, \dots, x_{s_d})$ is by definition equal to the product of the marginal pdfs and thus will never fulfill Eq. 4. On the other hand, a correlated subspace allows significantly smaller values of $p_{s_1,\dots,s_d}(x_{s_1}, \dots, x_{s_d})$ compared to the
expected density. Thus, we define subspace correlation as the objective function for the subspace contrast.
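To illustrate Definition 3 and Theorem 1, a small Python sketch (using SciPy's gaussian_kde; the data and the evaluation point are our own) compares an estimated joint density with the product of the estimated marginals (Eq. 3):

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(3)
N = 5000
s1 = rng.normal(size=N)
s2 = s1 + 0.2 * rng.normal(size=N)         # correlated subspace {s1, s2}

joint = gaussian_kde(np.vstack([s1, s2]))  # estimate of the joint pdf
m1, m2 = gaussian_kde(s1), gaussian_kde(s2)

# A point off the correlation line whose 1-D coordinates are clustered:
x = np.array([[1.0], [-1.0]])
p_joint = joint(x)[0]
p_expected = m1(x[0])[0] * m2(x[1])[0]     # Eq. 3: product of marginals
# p_joint << p_expected: a non-trivial outlier region (Eq. 4);
# in an uncorrelated subspace the two values would agree (Theorem 1)
print(p_joint, p_expected)
```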
3) Measuring Correlation:
We propose to quantify the subspace contrast by a comparison
of different probability density functions. To simplify the
notation, we will express all following conditional probability
densities only for $s_1$ without loss of generality. In the case of an uncorrelated subspace, Eq. 2 simplifies the definition of all conditional probability densities within the subspace, i.e.:
$$p_{s_1}(x_{s_1} \mid x_{s_2}, \dots, x_{s_d}) = \frac{p_{s_1,\dots,s_d}(x_{s_1}, \dots, x_{s_d})}{p_{s_2,\dots,s_d}(x_{s_2}, \dots, x_{s_d})} = p_{s_1}(x_{s_1}) \qquad (5)$$
This allows us to measure the contrast of a subspace by determining the degree of violation of Eq. 5. In other words, we have to compare a conditional pdf of $s_1$ to the corresponding
marginal pdf, and we assign a high contrast to a subspace
if we observe a significant deviation between the two pdfs.
Please note that the correlation analysis within subspaces
goes beyond classical correlation analysis approaches, since
we may be faced with high contrast subspaces with more
than two dimensions. In contrast to, say, the Pearson or
Spearman correlation coefficient [28], the proposed approach
is not limited in the subspace dimensionality. Furthermore,
it is possible to detect any kind of non-linear correlation.
Above all, our approach does not require an evaluation of a
high dimensional joint pdf, but is based on one-dimensional
densities only. Hence, it does not fall prey to the curse of
dimensionality.
In the following sections we will discuss (1) how to empirically analyze the conditional pdf by introducing the notion
of subspace slices, (2) how to compare the conditional pdf to
the marginal pdf by means of statistical tests, and (3) how to
instantiate these statistical tests in our contrast measure.
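As a preview, one simple way to instantiate such a test is a two-sample Kolmogorov-Smirnov test on a conditional sample of $s_1$ versus its marginal sample; using SciPy's ks_2samp here is an illustrative choice on our own synthetic data, and the concrete instantiations follow in Section III-E:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(4)
N = 10_000
s1 = rng.normal(size=N)
s2_indep = rng.normal(size=N)              # uncorrelated with s1
s2_corr = s1 + 0.3 * rng.normal(size=N)    # correlated with s1

for name, s2 in [("uncorrelated", s2_indep), ("correlated", s2_corr)]:
    cond = s1[np.abs(s2 - 1.0) < 0.25]     # conditional sample: slice on s2
    res = ks_2samp(cond, s1)               # deviation conditional vs marginal
    print(name, round(res.statistic, 3))   # larger statistic = higher contrast
```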
C. Evaluation of conditional densities
The main challenge for the proposed calculation of the
subspace contrast is the empirical analysis of the conditional
probability densities $p_{s_1|\dots} \equiv p_{s_1 \mid s_2,\dots,s_d}(x_{s_1} \mid x_{s_2}, \dots, x_{s_d})$.
Since we do not require any knowledge of the underlying

References
I. T. Jolliffe. Principal Component Analysis. Springer.
M. Ester, H.-P. Kriegel, J. Sander, X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proc. KDD, 1996.
R. Agrawal, R. Srikant. Fast algorithms for mining association rules. In Proc. VLDB, 1994.