HiCS: High Contrast Subspaces for Density-Based Outlier Ranking
Citations
Outlier Analysis
Graph based anomaly detection and description: a survey
Deep Autoencoding Gaussian Mixture Model for Unsupervised Anomaly Detection
A survey on unsupervised outlier detection in high-dimensional numerical data
References
Principal Component Analysis
A density-based algorithm for discovering clusters in large spatial databases with noise
Fast algorithms for mining association rules
Frequently Asked Questions (13)
Q2. What have the authors stated for future works in "Hics: high contrast subspaces for density-based outlier ranking" ?
For future work, the authors aim at further evaluations with other outlier scores such as ORCA [5] or OUTRES [23]; both seem to be very promising extensions of LOF with enhanced outlier scoring. Furthermore, the authors would like to extend the research on subspace selection and enhance their subspace search based on other outlier ranking paradigms. Due to the decoupled processing, their subspace search can be applied directly to these or any other outlier score.
Q3. What is the advantage of subspace slices over any other density estimation method?
The advantage of these subspace slices over any grid-based density estimation is that the authors can construct the subspace slices in a way that does not suffer from the curse of dimensionality.
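One way to see why adaptive slices avoid the emptiness problem of a fixed grid: if each of the d − 1 conditioning attributes keeps a fraction α^(1/(d−1)) of the data, the final slice is expected to retain a constant fraction α of the objects regardless of d. A minimal sketch, assuming independent, uniformly distributed attributes (the helper name `slice_width` is ours):

```python
def slice_width(alpha, d):
    """Per-attribute slice width that keeps the expected fraction of
    objects in a slice conditioned on (d - 1) attributes constant at
    alpha, assuming independent uniform attributes (illustrative
    sketch, not the paper's exact formulation)."""
    return alpha ** (1.0 / (d - 1))

# The per-attribute width grows with d, so slices never run empty:
for d in (2, 5, 10):
    print(d, round(slice_width(0.1, d), 3))  # 0.1, 0.562, 0.774
```

A fixed grid with cell width 0.1 would instead keep an expected fraction 0.1^(d−1) of the objects, which vanishes exponentially in d.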
Q4. How did the authors generate clusters in the subspaces?
The authors randomly selected 2-5 dimensional subspaces out of the full data space and generated high density clusters in these subspaces.
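The generation scheme described above can be sketched as follows; the function name, cluster sizes, and noise level are illustrative assumptions, not the paper's exact setup:

```python
import random

def generate_dataset(n_objects=1000, n_dims=10, n_clusters=3):
    """Sketch of the synthetic-data scheme: start from uniform noise,
    then pick a random 2-5 dimensional subspace per cluster and place
    a high-density cluster there; the remaining attributes of the
    cluster members stay uniformly distributed."""
    data = [[random.uniform(0.0, 1.0) for _ in range(n_dims)]
            for _ in range(n_objects)]
    for _ in range(n_clusters):
        subspace = random.sample(range(n_dims),
                                 random.randint(2, min(5, n_dims)))
        members = random.sample(range(n_objects), n_objects // 10)
        center = [random.uniform(0.2, 0.8) for _ in subspace]
        for i in members:
            for c, d in zip(center, subspace):
                data[i][d] = random.gauss(c, 0.01)  # dense around center
    return data

data = generate_dataset()
print(len(data), len(data[0]))  # 1000 10
```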
Q5. What is the heuristic for the subspace generation process?
The subspace generation process terminates when the Apriori merge step produces an empty list of (d + 1)-dimensional subspace candidates.
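The Apriori-style merge step can be sketched as follows: two sorted d-dimensional subspaces sharing their first d − 1 attributes are joined into a (d + 1)-dimensional candidate, and the search stops once this list comes back empty. This is a minimal sketch of the candidate generation only; it omits any pruning the full algorithm may apply:

```python
def apriori_merge(subspaces):
    """Join pairs of sorted d-dimensional subspaces that share a
    common (d - 1)-prefix into (d + 1)-dimensional candidates
    (illustrative sketch of the Apriori merge step)."""
    subspaces = sorted(tuple(sorted(s)) for s in subspaces)
    candidates = []
    for i in range(len(subspaces)):
        for j in range(i + 1, len(subspaces)):
            a, b = subspaces[i], subspaces[j]
            if a[:-1] == b[:-1]:          # common (d - 1)-prefix
                candidates.append(a + b[-1:])
    return candidates

print(apriori_merge([(0, 1), (0, 2), (1, 2)]))  # [(0, 1, 2)]
print(apriori_merge([(0, 1, 2)]))               # [] -> terminate
```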
Q6. How do the authors denote the distance between objects x and y?
The authors denote the distance between objects x and y as dist_A(x, y), which can be instantiated, for instance, by the widely used Euclidean distance: dist_A(x, y) = √( Σ_{s ∈ A} (x_s − y_s)² ).
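The distance restricted to an attribute subset A is straightforward to express in code; the helper below mirrors the dist_A notation (the function name and dict-based object representation are our illustration):

```python
import math

def dist_A(x, y, A):
    """Euclidean distance between objects x and y restricted to the
    attribute subset A, mirroring the paper's dist_A notation."""
    return math.sqrt(sum((x[s] - y[s]) ** 2 for s in A))

x = {"a": 1.0, "b": 2.0, "c": 5.0}
y = {"a": 4.0, "b": 6.0, "c": 0.0}
print(dist_A(x, y, ["a", "b"]))  # 5.0 (3-4-5 triangle in subspace {a, b})
```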
Q7. How does HiCS perform on a broad variety of datasets?
HiCS shows excellent results on a broad variety of datasets, with robust and easy-to-use parameters and scalable processing with respect to the dimensionality of the database.
Q8. What are the parameters that are required to perform the adaptive selection of the subspace?
The algorithm operates according to the sampling formalism in Section III-D. Besides the set of attributes that belong to the specific subspace, the algorithm requires two parameters.
Q9. What is the effect of the outlier ranking?
As a result, all mentioned outlier score functions will suffer from a loss of contrast, i.e. score(x) ≈ score(y) ∀ x, y ∈ DB. Any outlier ranking obtained for a sufficiently high-dimensional database will degenerate into a random ranking with very similar scores for all objects.
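The loss of contrast follows from the concentration of distances in high dimensions, which can be demonstrated with a few lines of code. This is an illustration of the general effect, not the paper's experiment; the `contrast` helper simply takes the ratio of the largest to the smallest distance from uniform points to the center of the unit cube:

```python
import math
import random

def contrast(dim, n=500):
    """Ratio of max to min distance from n uniform points to the cube
    center; it shrinks toward 1 as dimensionality grows, so distance-
    and density-based scores lose their ability to discriminate."""
    random.seed(0)
    pts = [[random.random() for _ in range(dim)] for _ in range(n)]
    center = [0.5] * dim
    dists = [math.dist(p, center) for p in pts]
    return max(dists) / min(dists)

for d in (2, 10, 100):
    print(d, round(contrast(d), 2))  # the ratio approaches 1 as d grows
```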
Q10. how do the authors measure the contrast between the attributes in subspace S?
Given the notion of probability density in any subspace S, the authors measure the contrast by comparing conditional probability densities to the corresponding marginal densities for all attributes s_i ∈ S.
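The comparison of conditional against marginal densities can be sketched via random subspace slices and a two-sample statistic. The sketch below uses a Kolmogorov-Smirnov statistic as the deviation measure and treats the slice construction in a simplified way; the parameter names `m` and `alpha` and the exact slicing are our assumptions, not the paper's precise formulation:

```python
import bisect
import random

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic (plain-Python helper)."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for v in a + b:
        fa = bisect.bisect_right(a, v) / len(a)
        fb = bisect.bisect_right(b, v) / len(b)
        d = max(d, abs(fa - fb))
    return d

def subspace_contrast(data, S, m=50, alpha=0.2):
    """Average, over m random slices, of the deviation between the
    conditional distribution of one attribute s_i (conditioned on the
    other attributes of S via random windows) and its marginal."""
    n = len(data)
    w = max(2, int(alpha * n))        # objects kept per conditioning step
    total = 0.0
    for _ in range(m):
        s_i = random.choice(S)
        idx = list(range(n))
        for s in S:
            if s == s_i:
                continue
            idx.sort(key=lambda i: data[i][s])
            start = random.randint(0, max(0, len(idx) - w))
            idx = idx[start:start + w]
        marginal = [row[s_i] for row in data]
        conditional = [data[i][s_i] for i in idx]
        total += ks_statistic(marginal, conditional)
    return total / m

random.seed(7)
dependent = [[x, x + random.gauss(0.0, 0.05)] for x in
             (random.random() for _ in range(400))]
independent = [[random.random(), random.random()] for _ in range(400)]
print(round(subspace_contrast(dependent, [0, 1]), 2))    # high contrast
print(round(subspace_contrast(independent, [0, 1]), 2))  # low contrast
```

For independent attributes the conditional distribution matches the marginal and the contrast stays near zero; for correlated attributes conditioning narrows the conditional sharply, yielding a high contrast.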
Q11. How does the algorithm determine the sample size?
In their implementation the authors specified the size by a ratio α ∈ (0, 1) that determines the sample size dynamically in relation to the total size of the database.
Q12. What is the effect of local outlier ranking?
Since local outlier ranking calculates the density based on the object distances, the authors observe the same effect for the minimal and maximal values of score(x).
Q13. How can the authors improve the quality of HiCS?
Therefore, it might be possible to improve the quality of HiCS even further by applying a pre-processing step that takes care of detecting trivial outliers.