# Using Multi-armed Bandit to Solve Cold-Start Problems in Recommender Systems at Telco

## Summary (2 min read)

### 1 Introduction

- Telcos do not commonly supply a lot of services; most generally they supply subscriptions, or rate-plans, either pre-paid or post-paid.
- Compared with a more traditional recommender problem, where the user-item matrix might be sparse, in this setting the matrix is completely empty.
- To solve this cold-start problem, given that no prior information on the new user exists, one might think of recommending rate-plans at random.
- The authors approach this by applying multi-armed bandit algorithms.
- Section 5 presents some experimental results and discussions.

### 3 Problem Definition

- Recommending a rate-plan for a new mobile telephony user differs from traditional recommender systems.
- Traditionally, recommender systems operate in a context where users can purchase and own several products, such as books.
- Finally, no explicit ratings for the rate-plans exist.
- Among k (k ≥ 1) suggested rate-plans, the new user can only select one plan at any given time.
- Below, the authors suggest taking into account the two most popular measurements: i) the indicator function and ii) the correlation value.
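
The two measurements can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, plans are assumed to be identified by integer index, and the "correlation value" is assumed here to be the Pearson correlation between two plan feature vectors.

```python
import math
from typing import Sequence


def indicator_similarity(recommended: int, chosen: int) -> float:
    """Return 1.0 when the recommended plan is the plan the user chose, else 0.0."""
    return 1.0 if recommended == chosen else 0.0


def correlation_similarity(p: Sequence[float], q: Sequence[float]) -> float:
    """Pearson correlation between two plan feature vectors (assumed measure)."""
    n = len(p)
    mean_p = sum(p) / n
    mean_q = sum(q) / n
    cov = sum((a - mean_p) * (b - mean_q) for a, b in zip(p, q))
    var_p = sum((a - mean_p) ** 2 for a in p)
    var_q = sum((b - mean_q) ** 2 for b in q)
    if var_p == 0 or var_q == 0:
        # Correlation is undefined for a constant vector; 0.0 is a convention.
        return 0.0
    return cov / math.sqrt(var_p * var_q)
```

The indicator only rewards an exact match, while the correlation gives partial credit to a plan whose features resemble those of the chosen plan.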

### 4 Bandit Algorithms for the CSAR Problem

- The game of the recommender system is to repeatedly pick one of the rate-plans and suggest it to a new user whenever she enters the system.
- The ultimate goal is to maximize the cumulative reward.
- Note that the setting in the present context is slightly different from traditional MABs.
- In fact, when the indicator function is used, the rate-plans not selected by users get a zero reward.
- Since the distributions of the rate-plans being selected are still unknown, the idea of using the MAB algorithms for the CSAR problem is still valid.
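
One reading of this difference is that, once the new user has picked a plan, a reward can be assigned to every plan and not only to the recommended one. The sketch below illustrates that reading under stated assumptions: the function names are hypothetical, plans are indexed by integers, and `similarity` is any caller-supplied function on feature vectors.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def indicator_rewards(chosen: int, n_plans: int) -> List[float]:
    """Reward 1.0 for the plan the user picked and 0.0 for every other plan."""
    return [1.0 if j == chosen else 0.0 for j in range(n_plans)]


def similarity_rewards(
    chosen: int,
    features: Sequence[Vector],
    similarity: Callable[[Vector, Vector], float],
) -> List[float]:
    """Reward of plan j is its similarity to the plan the user picked."""
    target = features[chosen]
    return [similarity(f, target) for f in features]
```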

### 5 Experiments and Results

- This section details the datasets used in the experiments and the experimental settings, including the detailed implementation of the proposed methods and of the competing algorithms, and then analyses and discusses the experimental results.
- The authors use two different real-world client datasets from two brands of a major international Telco operator.
- These two datasets were collected during the first quarter of 2013.
- The first brand's dataset contains the descriptive features of 16 rate-plans, as well as information about the plans used by 3066 users.
- They have each chosen a rate-plan that fits their needs.

### 5.2 Experimental Settings

- The first and most naïve approach for cold-start recommendation systems at Telcos is to choose a rate-plan at random to recommend to a new user.
- This algorithm is very efficient, especially when no description of the users is available, and it seems reasonable.
- The second trivial approach is to recommend the most popular rate-plan (Most common) to the new user.
- This is a sensible approach in terms of efficiency, and many operators apply it.
- The UCB algorithm estimates the value $\mathrm{UCB}_{tj}$ for each plan.
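
The two baselines and the UCB selection rule can be sketched as below. This is a hedged illustration, not the authors' code: the function names are hypothetical, and the UCB score is assumed to be the standard UCB1 form (average reward plus the exploration term sqrt(2 ln t / t_j)), where `t` is the number of users seen so far and `counts[j]` is the number of times plan j has been selected.

```python
import math
import random
from collections import Counter
from typing import Sequence


def recommend_random(n_plans: int, rng: random.Random) -> int:
    """Naive baseline: pick a plan uniformly at random."""
    return rng.randrange(n_plans)


def recommend_most_common(past_choices: Sequence[int]) -> int:
    """Baseline: recommend the plan chosen most often so far."""
    return Counter(past_choices).most_common(1)[0][0]


def recommend_ucb(mean_reward: Sequence[float], counts: Sequence[int], t: int) -> int:
    """UCB1-style rule: maximise average reward plus an uncertainty bonus."""
    # Plans that were never selected have no estimate yet, so try them first.
    for j, c in enumerate(counts):
        if c == 0:
            return j
    scores = [
        mean_reward[j] + math.sqrt(2.0 * math.log(t) / counts[j])
        for j in range(len(counts))
    ]
    return max(range(len(scores)), key=scores.__getitem__)
```

With equal average rewards, the rule prefers the plan that has been selected less often, because its exploration bonus is larger.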

### 5.3 Results and Analysis

- Table 2 shows the performances of the six different approaches for the cold-start problem on the two different real-world client datasets DS1 and DS2.
- It can be seen from the table that the random approach provided very poor results in both datasets.
- This forces the ε-greedy algorithm to follow the best rate-plan (i.e. the rate-plan that has the maximal average reward value) all the time.
- UCB gave surprisingly good precision and prediction results.
- The reason is that the UCB approach has a good strategy for balancing the exploitation of the currently best rate-plan against the exploration of other rate-plans that are also of interest to new users.
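
For contrast with UCB, a minimal ε-greedy sketch is given below. It assumes the textbook formulation (explore uniformly with probability ε, otherwise exploit the plan with the highest average reward); the function name and the `rng` parameter are hypothetical and not taken from the paper.

```python
import random
from typing import Sequence


def recommend_epsilon_greedy(
    mean_reward: Sequence[float], epsilon: float, rng: random.Random
) -> int:
    """Explore with probability epsilon, otherwise pick the best plan so far."""
    if rng.random() < epsilon:
        return rng.randrange(len(mean_reward))
    return max(range(len(mean_reward)), key=mean_reward.__getitem__)
```

When ε is zero or very small, the rule keeps recommending the plan with the maximal average reward, which is the behaviour described above.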

### 6 Conclusions and Future Research

- This work approaches recommending rate-plans to completely new users at Telco, without any prior information on them.
- An experiment was conducted on two different real-world client datasets from two brands of a major international Telco operator.
- From the experimental results, the authors observed that the UCB algorithm clearly outperforms traditional naïve approaches, as well as other classical multi-armed bandit algorithms.
- Improving the precision and AFP would still be preferable.

##### References

9,873 citations

### "Using Multi-armed Bandit to Solve C..." refers background in this paper

...This problem has also been identified under slightly different names, such as: the new user problem [2], the cold start problem [3] or new-user ramp-up problem [4]....

[...]

6,361 citations

### "Using Multi-armed Bandit to Solve C..." refers background or methods in this paper

...The UCB algorithm estimates the value $\mathrm{UCB}_{tj}$ for each plan....

[...]

...The UCB gave us a surprisingly good precision and prediction results....

[...]

...UCB [7] consists of selecting the rate-plan that maximises the following function:...

[...]

...The following three MAB algorithms are being used: ε-greedy [7] aims at picking up the rate-plan that is currently considered the best (i....

[...]

...The multi-armed bandit (MAB) is a classical problem in decision theory [5,6,7]....

[...]

3,883 citations

2,370 citations

### "Using Multi-armed Bandit to Solve C..." refers background or methods in this paper

...The case of EXP3 shows even worse performance than the ε-greedy....

[...]

...EXP3 [19] selects a rate-plan according to a distribution, which is a mixture of the uniform distribution and a distribution that assigns each plan a probability mass exponential in the estimated cumulative rewards for that plan....

[...]

...Finally, EXP3 selects a plan according to a given distribution, as described in [19]....

[...]

...In this equation, $\hat{\mu}_j$ favours a greedy selection (exploitation) while the second term $\sqrt{2 \ln t / t_j}$ favours exploration driven by uncertainty; it is a confidence interval on the true value of the expectation of reward for plan $j$. EXP3 [19] selects a rate-plan according to a distribution, which is a mixture of the uniform distribution and a distribution that assigns each plan a probability mass exponential in the estimated cumulative rewards for that plan....
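
The EXP3 mixture described in this excerpt can be sketched as follows. This is an illustration under assumptions: the mixing weight `gamma` and the exponent scaling `gamma / K` follow the common textbook EXP3 formulation and are not confirmed by the excerpt, and the function name is hypothetical.

```python
import math
from typing import List, Sequence


def exp3_probabilities(
    cum_reward_estimates: Sequence[float], gamma: float
) -> List[float]:
    """Mix a uniform distribution with one exponential in the estimated rewards."""
    k = len(cum_reward_estimates)
    # Subtracting the maximum leaves the normalised weights unchanged
    # and avoids overflow in exp for large cumulative estimates.
    m = max(cum_reward_estimates)
    weights = [math.exp(gamma * (g - m) / k) for g in cum_reward_estimates]
    total = sum(weights)
    return [(1.0 - gamma) * w / total + gamma / k for w in weights]
```

A plan is then drawn from these probabilities, so every plan keeps a probability of at least `gamma / K` of being recommended.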

[...]

2,143 citations

##### Frequently Asked Questions (9)

###### Q2. What have the authors stated for future works in "Using multi-armed bandit to solve cold-start problems in recommender systems at telco" ?

The authors would like to extend their gratitude to Professor Helge Langseth at the Department of Computer and Information Science at the Norwegian University of Science and Technology (NTNU), and Dr. Humberto N. Castejón Martínez and Dr. Kenth Engø-Monsen at Telenor Research, without whom this work would not have been possible.

###### Q3. What is the common way to measure the similarity between rate-plans?

If the authors use the indicator function as the similarity measurement, then the problem becomes designing an algorithm that predicts the rate-plan $p^*_t$ chosen by the new user.

###### Q4. What is the reward of the non-selected rate-plans?

In the case of using the correlation value, the rewards of the non-selected rate-plans will be the correlation value between the two vectors $p$ and $p^*$.

###### Q5. What is the game of the recommender system?

The game of the recommender system is to repeatedly pick one of the rate-plans and suggest it to a new user whenever she enters the system.

###### Q6. What is the objective of the CSAR problem?

If the authors denote the similarity value between the recommended plan $p_t$ and the actual demand of the new user $u_t$ by $\mathrm{similarity}(need_t, p_t)$, then the objective when solving the CSAR problem is to select the rate-plans $p_t$ that maximize the following so-called "cumulative reward" (Reward) over all $T$ new users:

$$\mathrm{Reward}_T = \sum_{t=1}^{T} \mathrm{similarity}(need_t, p_t)$$

The CSAR problem would be easy to solve if the authors knew the user's needs $need_t$.

###### Q7. What is the purpose of the UCB algorithm?

Looking at the UCB algorithm as described in the previous section, the authors see that the recommendation of a rate-plan results from a trade-off between the plan's average reward and the number of times the plan has been selected so far by users.

###### Q8. What is the way to evaluate the algorithm?

To evaluate any algorithm solving this problem, the authors can use the classical precision measurement:

$$\mathrm{Precision}_T = \frac{1}{T}\,\mathrm{Reward}_T \qquad (1)$$
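
As a small worked illustration of this measurement, the sketch below replays a list of recommendations against the plans users actually chose. It assumes the indicator function as the similarity, so the cumulative reward is the number of correct recommendations; the function names and the toy data in the usage note are hypothetical.

```python
from typing import Sequence


def cumulative_reward(recommended: Sequence[int], chosen: Sequence[int]) -> float:
    """Sum of indicator rewards: one point per correctly recommended plan."""
    if len(recommended) != len(chosen):
        raise ValueError("recommended and chosen must have the same length")
    return sum(1.0 if r == c else 0.0 for r, c in zip(recommended, chosen))


def precision(recommended: Sequence[int], chosen: Sequence[int]) -> float:
    """Cumulative reward divided by the number of new users T."""
    return cumulative_reward(recommended, chosen) / len(chosen)
```

For example, `precision([0, 1, 2, 3], [0, 2, 2, 2])` gives 0.5, since two of the four recommendations match the users' choices.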

###### Q9. What is the probability of a rate-plan being accepted?

This is also true for the second dataset, where a randomly recommended rate-plan only has a $1/13 \approx 7.69\%$ probability of being correct.
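
The arithmetic behind the random baseline can be checked with a short sketch. It assumes a uniform random recommendation over 16 plans for the first dataset and 13 plans for the second; the function name is hypothetical.

```python
def random_hit_probability(n_plans: int) -> float:
    """Chance that a uniformly random recommendation matches the chosen plan."""
    return 1.0 / n_plans


print(f"{random_hit_probability(16):.2%}")  # prints 6.25%
print(f"{random_hit_probability(13):.2%}")  # prints 7.69%
```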