A New Information Theory-Based
Serendipitous Algorithm Design
Xiaosong Zhou¹, Zhan Xu¹, Xu Sun¹ (✉), and Qingfeng Wang²
¹ Faculty of Science and Engineering, University of Nottingham Ningbo China, Ningbo, China
Xu.sun@nottingham.edu.cn
² Business School, University of Nottingham Ningbo China, Ningbo, China
Abstract. The development of information technology has prompted a growing number of researchers to investigate how to provide users with serendipitous experiences in the digital environment, especially in the fields of information research and recommendation systems. Although considerable progress has been made in understanding the nature of serendipity in the context of information research, few of these findings have been applied to the design of information systems. This paper proposes a new serendipitous recommendation algorithm, based on previous empirical studies, that takes into consideration the three important elements of serendipity, namely “unexpectedness”, “insight” and “value”. We regard the design of this algorithm as an important attempt to bridge the research findings of the two areas of information research and recommendation systems. By applying the algorithm to a game-based application in a real-life experiment with target users, we found that, compared with a conventionally designed method, the proposed algorithm provided participants with more opportunities to experience serendipitous encounters.
Keywords: Serendipity · Recommendation system · Information theory
1 Introduction
Serendipity has been widely experienced throughout human history; it is defined as “an unexpected experience prompted by an individual’s valuable interaction with ideas, information, objects, or phenomena” [1]. So far, studies of serendipity have mainly followed two directions: theoretical studies in the area of information research, which investigate the nature of serendipity [2–4], and empirical studies that develop applications or algorithms to provide users with serendipitous encounters, especially in the digital environment [5–7].
One area that tries to apply serendipity is the design of recommender systems. Because of the information overload in cyberspace, users are no longer satisfied with being recommended merely “accurate” information; instead, they want to be recommended information that is more serendipitous and interesting to them [8–10]. However, a concern identified in our review of relevant studies is that the findings of information research on the nature of serendipity have not received sufficient attention in recommender system design. This paper proposes a new algorithm to support serendipitous recommendation by applying recent research findings on serendipity from the area of information research.
© Springer International Publishing AG 2017
S. Yamamoto (Ed.): HIMI 2017, Part II, LNCS 10274, pp. 314–327, 2017.
DOI: 10.1007/978-3-319-58524-6_26
2 Problem and Research Question
Recommender system researchers often regard serendipity as “unexpected” and “useful” [11], and have designed recommendation algorithms based on either content-based filtering [12] or collaborative filtering [13]. However, most of these algorithms focus mainly on providing “unexpectedness” to users, and treat “usefulness” only as a metric for evaluating their effectiveness rather than as a design clue [14].
In comparison, serendipity in information research is often characterized by three main elements: unexpectedness, insight and value [4]. “Unexpectedness” means that the encountered information should be unexpected or surprising to the information actor, while “value” means that the encountered information should be considered useful and beneficial by the information actor. These two understandings of “unexpectedness” and “value” are consistent with the current view of serendipity in recommender system design [11, 14]; however, the “insight” aspect tends to be neglected.
“Insight” is considered to be the ability to notice a clue in the current environment, “make connections” between the clue and one’s previous knowledge or experience, and finally shift attention to the newly discovered clue [15]. Researchers have found that this ability to make connections is a key facet of experiencing serendipity [4], and that it can differ considerably among individuals, producing a range of encounterers from super-encounterers to occasional encounterers [16]. Connections can be made between different pieces of information, people and ideas [3]; therefore, supporting or “triggering” connection-making in order to create more opportunities for serendipity has long been considered an important design clue by information researchers [17, 18].
Based on these issues, we raise the following research question: is it possible to integrate the theoretical findings on serendipity from information research, especially the neglected aspect of “insight” or “making connections”, into recommender system design?
To address this question, we propose a collaborative-filtering-based algorithm that takes into account the theoretical findings on serendipity from information research. Because information research has found that serendipity is often encountered in a relaxed, leisurely state [1, 3], we applied the algorithm in a game-based application and conducted an empirical experiment.
3 Proposed Algorithm
There are two major concerns in providing serendipitous encounters through recommender system design. The first is how to balance “unexpectedness” and “usefulness”. As pointed out by [14], there should be “a most preferred distance” between the two values: a high level of unexpectedness may leave users dissatisfied with the recommended information, while users may also lose interest in information with low unexpectedness. The second concern is how to incorporate “insight” into system design so as to stimulate the process of “making connections”.
The two concerns are addressed from the perspective of “relevance”, through two hypotheses:
• Hypothesis 1: Information that is highly relevant to a user’s personal profile is also of high potential value to the user;
• Hypothesis 2: Information that is relevant to a user’s profile but not previously known to the user will be unexpected to that user.
Consider a target user A, who will receive the recommended information; a user B, who is highly relevant to user A; and a user C, who is highly relevant to user B but not known to user A. User A may experience serendipity when provided with information about user C, which is unexpected to him or her, together with the relationship between user B and user C, which may make the recommendation more interesting or useful to user A. The rest of this section describes a detailed implementation of the algorithm.
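The A, B, C scenario can be sketched as a two-hop lookup over a relevance table. This is a minimal illustration only, assuming a hypothetical dictionary `relevance` of pairwise relevance scores and a hypothetical set `known` of users the target already knows; the paper does not prescribe either structure.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical relevance table: relevance[x][y] is how relevant user y is to user x.
Relevance = Dict[str, Dict[str, float]]


def bridged_candidates(relevance: Relevance, known: Set[str], target: str) -> List[Tuple[str, str]]:
    """Return (candidate C, bridge B) pairs: B is relevant to the target,
    C is relevant to B, and C is not yet known to the target (Hypothesis 2)."""
    pairs = []
    for bridge in relevance.get(target, {}):
        for candidate in relevance.get(bridge, {}):
            if candidate not in (target, bridge) and candidate not in known:
                pairs.append((candidate, bridge))
    return pairs


relevance = {"A": {"B": 0.9}, "B": {"A": 0.9, "C": 0.8, "D": 0.2}}
known_to_a = {"B", "D"}
print(bridged_candidates(relevance, known_to_a, "A"))  # [('C', 'B')]
```

Returning the bridge alongside each candidate keeps the B to C relationship available, so it can be shown to user A as the "connection" part of the recommendation.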
1. Target user
Consider a target user profile $U_1$ with a category set $C = \{C_1, C_2, C_3, \ldots, C_i, \ldots, C_n\}$, where $C_i$ represents the i-th category of the user profile. All categories are arranged by their weights in the user profile. A weight can either be given by the dataset or calculated through clustering analysis [19]. To simplify the presentation of the proposed algorithm, we assume here that the weight of each $C_i$ is given by the dataset from the outset. The categories are ordered so that the weight of $C_i$ is no larger than that of $C_j$ whenever $i > j$:

$$w_C = \{w_{C_1}, w_{C_2}, \ldots, w_{C_i}, \ldots, w_{C_j}, \ldots, w_{C_n}\}, \qquad w_{C_i} \le w_{C_j},\ i > j \tag{1}$$
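The ordering required by Eq. (1) amounts to sorting the categories by descending weight. A minimal sketch, assuming the weights arrive as a plain dictionary (a hypothetical representation) and reusing the category weights of the worked example later in this section:

```python
from typing import Dict, List, Tuple


def rank_by_weight(weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Order items so that w_{C_i} <= w_{C_j} whenever i > j, as in Eq. (1)."""
    return sorted(weights.items(), key=lambda item: item[1], reverse=True)


# Category weights assumed to be given by the dataset (values from the Ann example).
ann_categories = {"B": 0.3, "A": 0.5, "C": 0.2}
print(rank_by_weight(ann_categories))  # [('A', 0.5), ('B', 0.3), ('C', 0.2)]
```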
For each category $C_i$, let $C_i = \{a_1, a_2, a_3, \ldots, a_i, \ldots, a_n\}$, where $a_i$ is the i-th attribute of $C_i$. Each $a_i$ represents a dimension according to which a new user profile may be produced (e.g. the authors of literature, or musicians). The attributes in each $C_i$ are likewise arranged by their weights, which can be calculated through semantic analysis such as the tf*idf weight (term frequency times inverse document frequency) [20]:
$$w(t,d) = \frac{tf_{t,d}\,\log\frac{N}{df_t}}{\sqrt{\sum_i \left(tf_{t_i,d}\right)^2 \left(\log\frac{N}{df_{t_i}}\right)^2}} \tag{2}$$
where $w(t,d)$ is the weight of a term $t$ in a document $d$; it is a function of the frequency of $t$ in the document ($tf_{t,d}$), the number of documents that contain the term ($df_t$) and the number of documents in the collection ($N$). The weight of a category $C_i$ is then determined by the weights of the attributes in it:
$$w_{C_i} = \{w_{a_1}, w_{a_2}, \ldots, w_{a_i}, \ldots, w_{a_j}, \ldots, w_{a_n}\}, \qquad w_{a_i} \le w_{a_j},\ i > j \tag{3}$$
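The cosine-normalized tf*idf weight of Eq. (2) can be sketched as follows. The mapping is an assumption for illustration: each "document" is taken to be the list of author names of the items a user stores in one category, the collection is all such lists, and the natural logarithm is used (the base cancels out after normalization). The toy data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_weights(doc: List[str], collection: List[List[str]]) -> Dict[str, float]:
    """Cosine-normalized tf*idf weight w(t, d) for every term t of `doc` (Eq. 2).

    `doc` must itself be one of the documents in `collection`, so that df_t >= 1.
    """
    n_docs = len(collection)                                                # N
    tf = Counter(doc)                                                       # tf_{t,d}
    df = {t: sum(1 for other in collection if t in other) for t in tf}     # df_t
    raw = {t: tf[t] * math.log(n_docs / df[t]) for t in tf}                # numerator
    norm = math.sqrt(sum(v * v for v in raw.values()))                     # denominator
    return {t: (v / norm if norm > 0 else 0.0) for t, v in raw.items()}


# Hypothetical data: each document lists the author of every item stored in a category.
ann_a = ["a1", "a1", "a1", "a2", "a2", "a3"]
collection = [ann_a, ["a2", "a3", "x1"], ["a3", "y1"]]
w = tfidf_weights(ann_a, collection)
print(sorted(w, key=w.get, reverse=True))  # ['a1', 'a2', 'a3']
```

Sorting the resulting weights in descending order gives the attribute ordering of Eq. (3). An author who appears in every document receives an idf of zero, and hence weight 0.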
2. Screen the weights
As defined above, $C_1$ has the largest weight in the set $C$, and $a_1$ has the largest weight in the set $C_i$. A threshold $s$ is set to eliminate low-weight categories from the user profile $U_1$:
$$w_C = \{w_{C_1}, w_{C_2}, \ldots, w_{C_i} \mid w_{C_i} \ge s\} \tag{4}$$
Similarly, a threshold $h$ is set to eliminate low-weight attributes from each set $C_i$:

$$w_{C_i} = \{w_{C_i,a_1}, w_{C_i,a_2}, w_{C_i,a_3}, \ldots, w_{C_i,a_i} \mid w_{C_i,a_i} \ge h\} \tag{5}$$
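Screening with $s$ and $h$ can be sketched as a single filter applied at both levels. This sketch assumes the thresholds are inclusive (a weight equal to the threshold is kept) and uses hypothetical threshold values; the category and attribute weights come from the Ann example.

```python
from typing import Dict


def screen(weights: Dict[str, float], threshold: float) -> Dict[str, float]:
    """Keep only entries whose weight reaches the threshold (Eqs. 4 and 5)."""
    return {key: w for key, w in weights.items() if w >= threshold}


categories = {"A": 0.5, "B": 0.3, "C": 0.2}        # category weights of U_1
authors_in_a = {"a1": 0.6, "a2": 0.3, "a3": 0.1}    # attribute weights in category A
s, h = 0.25, 0.25                                   # hypothetical thresholds
print(screen(categories, s))      # {'A': 0.5, 'B': 0.3}
print(screen(authors_in_a, h))    # {'a1': 0.6, 'a2': 0.3}
```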
3. Generate a new user profile
A new user profile $U_{i+1}$ is produced for each attribute $a_i$ in the set $C_i$. The profiles are generated in order of attribute weight, from the largest, $w_{C_i,a_1}$, to the smallest, $w_{C_i,a_i}$.
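The generation order of Step 3 can be sketched as follows, assuming a hypothetical profile store keyed by attribute (e.g. author name). Skipping attributes that have no profile in the store is also an assumption, since no next user can be generated from them.

```python
from typing import Dict, List

# Hypothetical profile store: user name -> {category: {attribute: weight}}.
ProfileStore = Dict[str, Dict[str, Dict[str, float]]]


def next_user_order(screened_attrs: Dict[str, float], store: ProfileStore) -> List[str]:
    """Order in which new profiles U_{i+1} are produced: one per retained attribute,
    from the largest weight w_{C_i,a_1} to the smallest (assumed: skip unknown users)."""
    ordered = sorted(screened_attrs, key=lambda a: screened_attrs[a], reverse=True)
    return [a for a in ordered if a in store]


store = {"a1": {"D": {"d1": 0.5}}, "a2": {"E": {"e1": 1.0}}}
print(next_user_order({"a2": 0.3, "a1": 0.6}, store))  # ['a1', 'a2']
```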
4. Iteration and End Conditions
Given the weight ordering in a user profile, it is intuitive that the larger the weight of an attribute $a_i$, the more likely the current user is to already know about $a_i$. In other words, the probability that a current user $U_i$ makes a connection with the next user profile $U_{i+1}$ is proportional to the weight of the category and the weight of the attribute in the current user profile:

$$P(U_{i+1} \mid U_i) = k\, w_{C_i}\, w_{C_i,a_i} \tag{6}$$

where $k$ is the proportionality coefficient relating the probability to the relevant weights.
Provided that each generated user is always new with respect to the previously generated ones, the probability that the target user $U_1$ makes a connection with the i-th user extends to the product of the step probabilities:

$$P(U_i \mid U_1) = P(U_2 \mid U_1)\, P(U_3 \mid U_2) \cdots P(U_i \mid U_{i-1}) \tag{7}$$
The iteration to find the next user continues until either of the following end conditions is met:
• the generated user is no longer new, i.e. it duplicates a previously generated user;
• $P(U_i \mid U_1)$ reaches a threshold $d$, where $d$ is an appropriately chosen probability threshold.
The threshold $d$ ensures the effectiveness of the iteration process. If $P(U_i \mid U_1)$ is too large, the recommended information may fail to give the target user a sense of unexpectedness, since the user has probably already encountered it; if $P(U_i \mid U_1)$ is too small, the recommended information may be too irrelevant to the target user, who may then lose interest in it. Setting $d$ is therefore an important step in the iteration process, and an appropriate value needs to be determined through future empirical studies. Once the recommendation list has been generated within the threshold $d$, items can be recommended to the target user by selecting those with the highest values of $P(U_i \mid U_1)$.
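One possible reading of the iteration, sketched under stated assumptions: at each step the walk follows only the top-weighted category and its top-weighted attribute (as in the worked example), profiles are stored in a hypothetical `user -> {category: (weight, {attribute: weight})}` layout, and "reaches $d$" is taken to mean that the chained probability has fallen to $d$ or below. Apart from the weights quoted in the Ann example, the profile values below are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: user -> {category: (category weight, {attribute: weight})}.
# Attributes name other users (e.g. authors) whose profiles may exist in `profiles`.
Profiles = Dict[str, Dict[str, Tuple[float, Dict[str, float]]]]


def find_candidate(profiles: Profiles, target: str, d: float, k: float = 1.0) -> Tuple[List[str], float]:
    """Follow the top-weighted category and attribute from user to user, multiplying
    the step probabilities (Eqs. 6 and 7), until the next user is not new, has no
    profile, or P(U_i | U_1) reaches d. Returns (path of users, P(U_i | U_1))."""
    path, prob = [target], 1.0
    while True:
        categories = profiles[path[-1]]
        top_cat = max(categories, key=lambda c: categories[c][0])
        w_cat, attrs = categories[top_cat]
        top_attr = max(attrs, key=attrs.get)
        if top_attr in path or top_attr not in profiles:
            break  # end condition: the generated user is not new (or cannot be profiled)
        path.append(top_attr)
        prob *= k * w_cat * attrs[top_attr]
        if prob <= d + 1e-12:  # end condition: P(U_i | U_1) has come down to d
            break
    return path, prob


profiles = {
    "Ann": {"A": (0.5, {"a1": 0.6, "a2": 0.3, "a3": 0.1}),
            "B": (0.3, {"b1": 1.0}), "C": (0.2, {"c1": 1.0})},
    "a1": {"D": (0.4, {"d1": 0.5, "d2": 0.2, "d3": 0.2, "d4": 0.1})},
    "d1": {"F": (0.6, {"f1": 1.0})},
}
path, p = find_candidate(profiles, "Ann", d=0.06)
print(path, round(p, 4))  # ['Ann', 'a1', 'd1'] 0.06
```

The small tolerance on the threshold comparison guards against floating-point rounding when the product lands exactly on $d$, as it does in the worked example.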
5. Recommendation
When the iteration finishes, the content of the highest-weighted category in the current candidate’s profile is provided to the target user, together with relevant information about the previously searched users that led to the current candidate.
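The recommendation step can be sketched as picking the top category of the last user on the search path and attaching the intermediate users as an explanation. The profile layout and the wording of the explanation string are hypothetical; the paper only gives one example phrasing.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: user -> {category: (category weight, {attribute: weight})}.
Profiles = Dict[str, Dict[str, Tuple[float, Dict[str, float]]]]


def recommend(profiles: Profiles, path: List[str]) -> Tuple[str, str]:
    """Return the highest-weighted category of the final candidate on `path`,
    plus a short explanation naming the users that led to it."""
    candidate = path[-1]
    categories = profiles[candidate]
    top_cat = max(categories, key=lambda c: categories[c][0])
    via = " -> ".join(path[1:-1]) or "(direct)"
    return top_cat, f"category {top_cat} is stored most by {candidate}, reached via {via}"


profiles = {"d1": {"F": (0.7, {"f1": 0.9}), "G": (0.3, {"g1": 1.0})}}
print(recommend(profiles, ["Ann", "a1", "d1"]))
```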
6. An example of the proposed algorithm
An example of the proposed algorithm is provided in Fig. 1. Consider Ann as the target user ($U_1$), with literature categories {A, B, C} in her personal library, whose weights are {0.5, 0.3, 0.2} (Fig. 1a). The author names of the literature are set as the attributes of each category; according to the tf*idf weight calculation, category A has three attributes $\{a_1, a_2, a_3\}$ with weights $W'_A = \{0.6, 0.3, 0.1\}$. Setting $k = 1$ for every step probability, the probability for Ann to find $a_1$’s profile ($U_2$) is calculated according to Eq. (6):
$$P(U_2 \mid U_1) = w_A\, w_{A,a_1} = 0.5 \times 0.6 = 0.3 \tag{8}$$
The profile of $a_1$ is then produced, as shown in Fig. 1b. Likewise, among the four authors in category D of this profile, author $d_1$ ($U_3$) has the largest weight, so $d_1$’s profile is produced next (Fig. 1c):
$$P(U_3 \mid U_2) = w_D\, w_{D,d_1} = 0.4 \times 0.5 = 0.2 \tag{9}$$
According to Eq. (7), the probability for Ann ($U_1$) to find $d_1$’s profile ($U_3$) is:

$$P(U_3 \mid U_1) = P(U_2 \mid U_1)\, P(U_3 \mid U_2) = 0.3 \times 0.2 = 0.06 \tag{10}$$
With the threshold $d$ set to 0.06, the iteration stops here, and the algorithm recommends the literature in category F of $d_1$’s profile to Ann, together with relevant information about $d_1$ and $a_1$. For example, the recommendation could read: “these papers (category F) are the ones most stored by d1, who has previously published papers (d1, d2, d3, d4) with a1”.
7. Description of the Proposed Algorithm
Because the proposed algorithm is based on collaborative filtering, it is best suited to datasets whose content is generated by many different users, since such data make it easier to produce the next user’s profile from the current one.