# A Clustering Model for Uncertain Preferences Based on Belief Functions

## Summary (3 min read)

### 1 Introduction

- Community detection is a popular topic in network science and has received a great deal of attention.
- It is a key task for identifying groups (i.e. clusters) of objects that share common properties and/or interact with each other.
- In [17], the authors introduced a new community detection algorithm based on a preference network.
- To form meaningful groups of agents according to their preferences, a clustering algorithm needs to capture the structure of the preference data and to cope with imperfect information.
- In previous works [19, 11], a qualitative and expressive preference modeling strategy based on the theory of belief functions to model imperfect preferences was proposed.

### 2.1 Preference Order

- A binary relation may satisfy any of the following properties: reflexive, irreflexive, symmetric, antisymmetric, asymmetric, complete, strongly complete, transitive, negatively transitive, semitransitive, and Ferrers relation [13].
- The detailed definitions of these properties are outside the scope of this article.
- Inspired by a four-valued logic introduced in [1, 11, 4], the authors introduce four relations between alternatives.

### 2.2 Dissimilarity between orders

- To measure the dissimilarity between two preferences represented by total orders, metrics such as Euclidean distance and Kendall distance are often adopted.
- Given two preference orders O1 and O2 on the same alternatives, the authors give some basic concepts on such metrics.
- The rank function r(O, a) denotes the position of the alternative a according to the order O.
- Intuitively, both partial orders agree that ai and aj are tied.
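
The two metrics named above can be illustrated for total orders with a minimal sketch. It assumes each order is a Python list of alternatives from most to least preferred; the function names and the example orders are hypothetical and are not taken from the paper.

```python
from itertools import combinations
from math import sqrt


def rank(order, a):
    """r(O, a): 1-based position of alternative a in the total order O."""
    return order.index(a) + 1


def euclidean_distance(o1, o2):
    """Euclidean distance between the rank vectors of two total orders."""
    return sqrt(sum((rank(o1, a) - rank(o2, a)) ** 2 for a in o1))


def kendall_distance(o1, o2):
    """Number of pairs of alternatives that o1 and o2 rank in opposite order."""
    return sum(
        1
        for a, b in combinations(o1, 2)
        if (rank(o1, a) - rank(o1, b)) * (rank(o2, a) - rank(o2, b)) < 0
    )


# Hypothetical example: the two orders differ only by swapping a1 and a2.
o1 = ["a1", "a2", "a3", "a4"]
o2 = ["a2", "a1", "a3", "a4"]
print(kendall_distance(o1, o2))    # 1 (only the pair a1, a2 is discordant)
print(euclidean_distance(o1, o2))  # sqrt(2)
```

Both functions assume the two orders rank the same set of alternatives without ties.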

### 3.2 Preference model on belief functions

- The objective is to cluster the experts represented by their preferences under uncertainty.
- The authors consider the model from [19] to represent this uncertain preference by the theory of belief functions.
- This procedure consists of two steps: (1) initialization of mass functions; (2) clustering on quasi orders represented by mass functions.

### 4 Contribution: agent clustering based on their preferences

- The authors explain how the agents are represented and clustered from two sources of preferences.
- The first block concerns the representation of agents and modeling of mass functions from two preference sources S1, S2 (sections 4.1 and 4.2).
- The second block concerns the measure of dissimilarity between agents (section 4.3).
- The third block concerns the clustering algorithm; the authors use the Ek-NNclus algorithm in their work (section 4.4).

### 4.3 Dissimilarity between different agents

- The dissimilarity measure is based on Jousselme distance [8] for mass functions.
- The distance is computed between two mass functions that model the preference relation between alternatives i and j, given by agents u1 and u2 who express preference orders O1 and O2.
- To simplify the expression, the authors use "BF model" to refer to both their model and the corresponding dissimilarity function.
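
As an illustration of the Jousselme distance for a single pair of alternatives, the sketch below uses its standard definition, d(m1, m2) = sqrt(0.5 (m1 - m2)ᵀ D (m1 - m2)) with D(A, B) = |A ∩ B| / |A ∪ B|. The representation of a mass function as a dict from `frozenset` to mass, and the four relation labels, are assumptions made for this example. How the paper aggregates these pairwise distances into a dissimilarity between agents is not shown here.

```python
from itertools import product
from math import sqrt

# Hypothetical labels for the four relations between two alternatives.
OMEGA = frozenset({"pref", "inv_pref", "indiff", "incomp"})


def jousselme_distance(m1, m2):
    """Jousselme distance between two mass functions.

    Each mass function is a dict {frozenset of relations: mass}.
    Assumes no mass is assigned to the empty set.
    """
    focal = set(m1) | set(m2)
    diff = {A: m1.get(A, 0.0) - m2.get(A, 0.0) for A in focal}
    total = 0.0
    for A, B in product(focal, repeat=2):
        jaccard = len(A & B) / len(A | B)  # D(A, B)
        total += diff[A] * jaccard * diff[B]
    # Guard against tiny negative values caused by rounding.
    return sqrt(max(0.5 * total, 0.0))


certain_pref = {frozenset({"pref"}): 1.0}
certain_inv = {frozenset({"inv_pref"}): 1.0}
ignorance = {OMEGA: 1.0}

print(jousselme_distance(certain_pref, certain_inv))  # 1.0
print(jousselme_distance(certain_pref, ignorance))    # sqrt(0.75)
```

In this example, two agents who are certain of opposite strict preferences are at the maximum distance of 1, while an agent expressing total ignorance sits closer to each of them.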

### 4.4 Unsupervised classifier–Ek-NN [3]

- In dissimilarity spaces where only pairwise distances are given (such as Kendall distance), finding the centroid of several agents is a metric k-center problem, which is proved to be NP-hard.
- Therefore, the authors avoid clustering methods that require calculating a centroid, such as k-means.
- The authors apply the Ek-NNclus method [3] as the classifier.

### 5 Experiments

- The authors also want to assess the model's clustering quality on certain preferences.
- Thus, the clustering quality of their model is evaluated in two settings: on certain preferences and on uncertain preferences.

### 5.1 Evaluation criteria

- For the reasons given above, calculating centroids is NP-hard.
- Thus, the authors choose two evaluation criteria that do not require a cluster centroid calculation: Adjusted Rand Index (ARI) [7] for data with ground truth and silhouette coefficient [15] for any dataset.
- The authors tested different metrics on synthetic certain and uncertain preferences.
- The authors also compared different metrics on real-world certain preferences from the SUSHI data set [9].
- In the following parts, the authors introduce the method of generating synthetic preferences and compare the clustering quality of different metrics.
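
Both criteria can be computed without any centroid, as the following minimal pure-Python sketch suggests. It follows the standard definitions of the Adjusted Rand Index and the silhouette coefficient; the function names, the toy dissimilarity matrix and the labels are hypothetical, and the conventions for degenerate cases are assumptions of this sketch.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(truth, pred):
    """ARI between a ground-truth labeling and a predicted labeling."""
    n = len(truth)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    sum_a = sum(comb(c, 2) for c in Counter(truth).values())
    sum_b = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case (assumed convention)
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def mean_silhouette(dist, labels):
    """Average silhouette from a precomputed pairwise dissimilarity matrix.

    Requires at least two clusters; singleton clusters score 0 by convention.
    """
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    scores = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)
            continue
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy data: agents 0 and 1 are close, agents 2 and 3 are close.
dist = [[0, 1, 5, 5], [1, 0, 5, 5], [5, 5, 0, 1], [5, 5, 1, 0]]
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
print(mean_silhouette(dist, [0, 0, 1, 1]))
```

Any of the dissimilarities discussed in the text could supply the `dist` matrix, since only pairwise values are needed.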

### 5.2 Certain preferences

- On synthetic data: certain preferences are those that come from non-conflicting sources.
- To study the clustering quality, the authors first generate preferences with different ranges around their centroids.
- The authors test various values of K and report the one that yields the largest ARI and average silhouette coefficient.
- On real data: the SUSHI preference dataset [9] was collected from a survey of Japanese consumers' preferences over different kinds of sushi.
- Kendall distance and the BF model have similar quality.

### 5.3 Uncertain preferences

- The authors consider a case in which two preferences are given in different representations: a ranking and a score.
- Indifference relations are introduced, causing conflicts between two preference sources.
- Given a ranking Or of 10 alternatives a1 to a10, the scores are generated by the following rules:
  - For the two least preferred alternatives (the two at the end of Or, i.e. ranks 9 and 10), the authors give score 1.
  - For alternatives at positions 7 and 8, they give score 2.
  - [...]
- As indifference relations exist in Os, the authors apply Fagin distance for Os.
- The results illustrated by these figures show the advantage of BF model over Euclidean distance and Kendall distance when dealing with two sources.
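
To make the handling of indifference concrete, the sketch below assumes that "Fagin distance" refers to the Kendall distance with a penalty parameter p defined by Fagin et al. for rankings with ties. A pair ordered oppositely by the two orders costs 1, a pair tied in exactly one order costs p, and a pair tied in both costs 0. The dict representation, the function name and the default p = 0.5 are assumptions of this example.

```python
from itertools import combinations


def kendall_with_ties(r1, r2, p=0.5):
    """Kendall distance with penalty p between two orders that may contain ties.

    r1 and r2 map each alternative to its rank level; equal levels mean a tie.
    """
    total = 0.0
    for a, b in combinations(sorted(r1), 2):
        d1 = r1[a] - r1[b]
        d2 = r2[a] - r2[b]
        if d1 == 0 and d2 == 0:
            continue        # both orders agree that a and b are tied
        if d1 == 0 or d2 == 0:
            total += p      # tied in exactly one of the two orders
        elif d1 * d2 < 0:
            total += 1.0    # strictly ordered in opposite directions
    return total


# Hypothetical example: a strict ranking against an order that ties a1 and a2.
ranking = {"a1": 1, "a2": 2, "a3": 3}
with_tie = {"a1": 1, "a2": 1, "a3": 2}
print(kendall_with_ties(ranking, with_tie))  # 0.5
```

When neither order contains ties, this reduces to the ordinary Kendall distance.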

### 6 Conclusion and perspectives

- The authors investigate the problem of clustering individuals according to their preferences, when dealing with multiple and conflicting sources (two in their case study).
- To cope with this issue, the authors apply the theory of belief functions (BF model) to express and interpret the contradictions and conflicts from different sources as uncertainty and ignorance.
- To highlight the relevance of the proposed solution, the authors perform experiments on synthetic and real data to compare their method with other preference models, and find an advantage in expressing the uncertainty and the incomparability of preference orders.
- On certain preferences, the BF model has clustering quality equivalent to Kendall distance and outperforms Euclidean distance.

### "A Clustering Model for Uncertain Pr..." refers methods in this paper

...The theory of belief functions (also referred to as Dempster-Shafer or Evidence Theory) was firstly introduced by Dempster [2] then developed by Shafer [16] as a general model of uncertainties....

[...]

...In our work, we take p = 0.5...

##### Frequently Asked Questions (2)

###### Q2. What future work is proposed in "A clustering model for uncertain preferences based on belief functions"?

In the future, the authors will work on an improved BF model. Moreover, a more general dissimilarity measure for incomplete orders (i.e. quasi-orders) is also within the scope of their future work. In fact, the combination of preferences from multiple sources is a social choice problem, and different combination rules can be applied, each with a different complexity.