A cross-cultural user evaluation of product recommender interfaces

doi:10.1145/1454008.1454022

A Cross-Cultural User Evaluation of Product

Recommender Interfaces

Li Chen and Pearl Pu

Human Computer Interaction Group, School of Computer and Communication Sciences

Swiss Federal Institute of Technology in Lausanne (EPFL)

CH-1015, Lausanne, Switzerland

{li.chen, pearl.pu}@epfl.ch

ABSTRACT

We present a cross-cultural user evaluation of an organization-

based product recommender interface, by comparing it with the

traditional list view. The results show that it performed

significantly better, for all study participants, in improving on

their competence perceptions, including perceived

recommendation quality, perceived ease of use and perceived

usefulness, and positively impacting users’ behavioral intentions

such as intention to save effort in the next visit. Additionally,

oriental users were observed reacting more significantly strongly

to the organization interface regarding some subjective aspects,

compared to western subjects. Through this user study, we also

identified the dominating role of the recommender system’s

decision-aiding competence in stimulating both oriental and

western users’ return intention to an e-commerce website where

the system is applied.

Categories and Subject Descriptors

H.5.2 [Information interfaces and presentation]: User

Interfaces – evaluation/methodology, graphical user interfaces

(GUI), user-centered design.

General Terms

Design, Experimentation, Human Factors.

Keywords

Product recommender systems, organization interface, list view,

cross-cultural user study.

1. INTRODUCTION

Online systems that help users select the most preferential item

from a large electronic catalog are known as product search and

recommender systems. In recent years, much research work has

emphasized on developing and improving the underlying

algorithms, whereas many of the user issues such as acceptance of

recommendations and trust building received little attention.

Trust is seen as a long-term relationship between a user and the

organization that the online technology represents. It is critical to

study especially for e-commerce environments where the

traditional salesperson, and subsequent relationship, is replaced

by a virtual vendor or a more intelligent product recommender

agent. Studies show that customer trust is positively associated

with customers’ intentions to transact, purchase a product, and

return to the website [9]. However, these results have mainly been

derived from online shops’ ability to ensure security, privacy, and

reputation (i.e., the integrity and benevolence aspects of trust

formation) [8], and less from the website’s competence such as its

decision agent’s ability in providing good recommendations and

explaining its results.

We have always been engaged in investigating the effective

recommender design factors that may positively impact the

promotion of users’ trust and furthermore their behavioral

intentions. Previously, we have conceptualized a competence-based

trust model for recommender systems [4]. We have primarily

studied trust-building by the different design dimensions of

explanation interfaces, given explanations’ potential benefits to

improve users’ confidence about recommendations and their

acceptance of the system [10,18].

The traditional strategy of displaying and explaining

recommendations, as popularly adopted in most of case-based

reasoning recommender systems [15] and commercial websites

(www.activedecisions.com), is to display the recommendation

content in a rank ordered list and use a “why” component along

with each item to explain the computational reasoning behind it.

In order to accelerate users’ decision process by saving their

information-searching effort in reviewing all recommended items,

we have proposed a so called preference-based organization

technique. The main idea is that, rather than explaining each item

one by one, a group of products can be explained together by a

category title, provided that they have shared tradeoff characteristics

compared to a reference product (e.g., the top candidate) [17]. In the

following, we first summarize previous studies on the organization

method and then give the contribution of our current work.

1.1 Summary of Previous Studies

A carefully conducted user survey (53 subjects) first showed

some interesting observations regarding the influence of

explanations on trust building and the effectiveness of the

organization-based recommender interface [4]. That is, most of

surveyed users strongly agreed that they shall trust more in a

system with the explanation of how it computed the

recommended items. Moreover, the organized view of

recommendations was largely favored than the traditional “why”-

based list view, since it was perceived to more likely accelerate

the process of product comparison and choice making.

Permission to make digital or hard copies of all or part of this work for

personal or classroom use is granted without fee provided that copies are

not made or distributed for profit or commercial advantage and that

copies bear this notice and the full citation on the first page. To copy

otherwise, or republish, to post on servers or to redistribute to lists,

requires prior specific permission and/or a fee.

RecSys’08, October 23–25, 2008, Lausanne, Switzerland.

75

A follow-up user study asked 72 participants to evaluate the two

types of recommender interfaces in a within-subject procedure

[17]. The user task was to find a product s/he most preferred

among a set of most popular products recommended in either an

organized view or a list view with “why” components.

Results show that while both interfaces enabled trust-building, the

organized view was significantly more effective in increasing

users’ task efficiency, saving their cognitive effort and prompting

them to intend to return to the interface for future use.

1.2 Contribution of Our Current Work

The previous two experiments pointed out promising benefits of

the organization interface regarding its trust-inspiring ability.

They motivated us to further evaluate the interface’s practical

performance in a more realistic and interactive system where it

serves as the computation and explanation of personalized

recommendations according to users’ preferences (rather than

based on products’ general popularity). In such system,

preference specification/revision tools are provided for users to

input and refine their preferences, and the recommender interface

is returned whenever the user’s preferences are revised.

In addition, we were interested in identifying whether people

from different categories of cultural backgrounds (i.e., oriental

and western cultures) would all react actively to the organization-

based system. Thus, a relatively larger scale cross-cultural

experiment was set up, and a comparative user study was

additionally involved to compare the organization interface with

the “why”-based list view which was implemented in a similar

interactive system setting.

An extended evaluation framework based on previous

measurements was also established to assess the system’s actual

benefits in respect of three design aspects: recommendation

quality, transparency, and user-control. As for upper-level

competence perceptions, perceived ease of use and perceived

usefulness, the two primary determining elements of convincing

users to accept a technology [6], were included, besides decision

confidence, perceived effort and satisfaction. Three trust-induced

behavioral intentions were also contained, which are intention to

purchase, intention to return and intention to save effort in the

next visit.

This paper is hence organized as follows: section 2 and 3 describes

the organization-based interface and its function in an implemented

prototype system; section 4 introduces the cross-cultural user

evaluation’s design and experimental procedure; section 5 presents

results from the study; and section 6 concludes the paper’s work.

2. ORGANIZATION-BASED

RECOMMENDER INTERFACE

The organization interface has been developed to compute and

categorize recommended products, and use the category title (e.g.

“these products have cheaper price and longer battery life, but

slower processor speed and heavier weight”) as the explanation of

multiple products (see Figure 1). Each presented title essentially

details the representative tradeoff properties shared by a set of

recommended products by comparing them with the top candidate

(the best matching product according to the user’s current

preferences). It exposes the recommendation opportunities and

indicates the reason of why these products are recommended, by

revealing their superior values on some important attributes, and

compromises on less important ones.

To derive effective principles for this interface design, we tested

13 paper prototypes by means of pilot studies and user interviews,

and finally concluded five design principles. The principles

include: proposing improvements and compromises in the

category title using the conversational language, keeping the

number of tradeoff attributes in the category title under five,

including a few of actual products within each category, and

diversifying the proposed category titles as well as associated

products (see details in [17]). We accordingly proposed an

algorithm to generate such organization interfaces [5]. Briefly

speaking, the algorithm contains three main steps:

Step 1: the user preferences over all products are represented as a

weighted additive form of value functions according to the multi-

attribute utility theory (MAUT) [11]. Based on this compensatory

preference model, we can resolve conflicting values explicitly by

considering tradeoffs between different attributes;

Step 2: all alternatives are ranked by their weighted utilities

calculated according to the MAUT model. Then, each of them,

except the ranked first one (i.e., the top candidate), is converted

into a tradeoff vector. Each tradeoff vector is a set of (attribute,

tradeoff) pairs, where tradeoff indicates the improved (denoted as ↑)

or compromised (↓) property of the product’s attribute value

compared to the same attribute of the top candidate. For the

attributes without explicitly stated preferences, default properties

are suggested (e.g., the cheaper, the better). For example, a

tradeoff vector is {(price, ↑), (processor speed, ↓), (memory, ↓),

(hard drive size, ↑), …}, meaning that the corresponding laptop has

lower price, slower processor speed, less memory, more hard drive

size, etc, in comparison with the top recommended laptop;

Step 3: all of the tradeoff vectors are then organized into different

categories by utilizing an association rule mining tool [1] to

discover the recurring subsets of (attribute, tradeoff) pairs among

them. Each subset hence represents a category of products with

the same tradeoff properties. Since a large amount of category

candidates would be produced by the mining algorithm, they are

further ranked and diversified. We select ones with higher tradeoff

utilities (i.e., gains against losses relative to the top candidate and

user preferences) in consideration of both category titles and their

associated products.

Therefore, the presented category titles can in nature stimulate users

to consider hidden needs and even guide them to conduct tradeoff

navigations for a better choice. For instance, after the user saw the

products that “have faster processor speed and longer battery life,

although they are slightly more expensive”, she may likely

change to that direction from the top candidate, if she realized that

the processor speed is more important than the price to her, or she

likes “longer battery life” although she did not state any

preference on this attribute before. The support for this kind of

tradeoff navigation process has been demonstrated to have

significant effect on increasing users’ decision accuracy and

preference certainty [16]. We have previously compared our

organization algorithm with other typical tradeoff supporting

approaches (such as the data-driven dynamic critiquing system

[14]), and found that it achieved significantly higher accuracy in

predicting tradeoff criteria and targeted products that users

actually made, mainly owing to its preference-focused clustering

and selection strategies [5].

76

Figure 1. Screenshot of the organization-based recommender interface.

Figure 2. Screenshot of the list view of recommendations.

77

3. PROTOTYPE SYSTEM

We implemented the organization interface in a product

recommender system, which is in particular to assist users in

searing for high-involvement products (e.g., notebooks, digital

cameras, and cars) for which people will be willing to spend

considerable effort in locating a desired choice, in order to avoid

any financial damage or emotional burden.

A typical interaction procedure with the system can be as follows.

A user initially starts her search by specifying any number of

preferences in a query area. Each preference is composed of one

acceptable attribute value and its relative weight from 1 “least

important” to 5 “most important”. A preference structure is hence

a set of (attribute value, weight) pairs of all participating

attributes, as required by the MAUT model. After a user states her

initial preferences, the best matching product will be computed

and returned at the top, followed by k categories of other

recommended products as outcomes of the organization algorithm

(k = 4 in our prototype, see Figure 1). If the user is interested in

one of the suggested categories, she can click “Show All” to see

more products (up to 6) belonging to it. Among these products,

the user can either choose one as her final choice, or select a near-

target and click “Better Features” to view recommended products

with some better values than the selected one. In the latter case,

the user’s preference model will be automatically refined to

respect her current needs. Specifically, the weight of improved

attribute(s) that appears in the examined category title will be

increased and the weight of compromised one(s) be decreased. All

attributes’ acceptable values will be also updated according to the

selected new reference product.

On the other hand, the user can revise preferences on her own

through clicking the button “Specify your own criteria for ‘Better

Features’”. A critiquing page will be then activated that provides

her with options for making self-specified tradeoff criteria to a

near-target. For example, the user could choose to optimize any

attributes’ values (e.g., $100 cheaper) and accept compromise(s)

on one or more less important attributes, which revisions will be

directly reflected in her preference model. A small set of tradeoff

alternatives that best satisfy the stated tradeoff criteria will be

then returned, among which she either makes the final choice or

proceeds to conduct any further tradeoff navigations in either the

organization interface or by her self-initiated way.

Moreover, the system allows the user to view the product's

detailed specifications via a “detail” link, and to record all of her

interesting products in a consideration set to facilitate comparing

them before checking out.

4. CROSS-CULTURAL EVALUATION

4.1 Cultural Difference

It is commonly recognized that elements of a user interface

appropriate for one culture may not be appropriate for another.

For example, Barber and Badre [2] claimed that Americans prefer

websites with a white background, while Japanese dislike the

white and Chinese favor the red background.

People are deeply influenced by the cultural values and norms

they hold. Many researchers have classified cultures around the

world in various categories. The most typical classification is

Oriental vs. Western cultures. The Oriental culture, influenced by

the ancient Chinese culture, focuses on holistic thought,

continuity, and interrelationships of objects. On the contrary, the

Western culture, influenced by the ancient Greek culture, puts

greater emphasis on analytical thought, detachment, and attributes

of objects [13].

In online user-experience researches, one primary reason

identified for consumer behavior differences has been based on

the belief that western countries generally have individualism and

a low context culture, whereas eastern countries generally have

collectivism and a high context culture [3].

Thus, we were interested in recruiting people from the two

different cultural backgrounds to see whether the culture

difference would influence their actual behavior and subjective

perceptions with our product recommender system, while they use

it to make a purchase decision. In our experiment, the participants

were mainly coming from two nations respectively representing

the two different cultures: China (oriental culture) and

Switzerland (western culture).

4.2 Participants and Materials

In total, 120 participants volunteered to take part in the

experiment. In collaboration with the HCI lab at Tsinghua

University in China, we recruited 60 native Chinese. Most of

them are students in the university pursuing Bachelor, Master or

PhD degrees, and a few of them work as engineers in domains of

software development, architecture, etc. Another 60 subjects are

mainly students in our university, and 41 of them are Swiss and

the others are from European countries nearby like France, Italy

and Germany. Table 1 lists demographical profiles of study

subjects from the two cultural backgrounds.

Table 1. Demographical profiles of study subjects from two

cultures (the number of users is in the bracket)

Oriental Culture (60) Western Culture (60)

Nation China (60) Switzerland (41); Other

European countries (19)

Gender Female (23); Male (37) Female (15); Male (45)

Average age 21~30 (57); >30 (3) <21 (14); 21~30 (44);

>30 (2)

Major/

job domain

Computer, mathematics,

environment, electronics,

architecture, etc.

Computer, education,

mechanics, electronics,,

architecture, etc.

Computer

knowledge

4.34 (advanced) 4.08 (advanced)

Internet

usage

4.83 (almost daily) 4.98 (almost daily)

e-commerce

site visits

3.69 (1-3 times a month) 3.36 (a few times every 3

months)

e-shopping

experiences

3.25 (a few times every 3

months)

2.92 (a few times every 3

months)

Two systems were prepared for this user study. One is the

prototype system with the organization-based recommender

interface, as described in Section 3. Another system differs from

it only in respect of the recommendation display. That is, it does

not show an organized view of recommendations, but a traditional

ranked list with a “why” component to explain each

recommended product. More specifically, in the list view, k

products (e.g., k = 25 in our implementation) that are with the

78

highest weighted utilities according to the user’s current

preferences are listed, and the “why” gives the reason of why the

corresponding product is presented (i.e., its pros and cons

compared to the top candidate) (see Figure 2). In this system,

users can also freely specify and revise preferences, examine

products’ detailed specifications, and in-depth compare near-

targets in a consideration set.

Henceforth, the two compared systems are respectively

abbreviated as ORG and LIST. They were both developed with

two product catalogs: 64 digital cameras each constrained by 8

main attributes (manufacturer, price, resolution, optical zoom,

etc), and 55 tablet PCs by 10 main attributes (manufacturer, price,

processor speed, weight, etc). All products were extracted from a

real e-commerce website.

4.3 Evaluation Criteria

In this experiment, the measured variables used in previous user

studies (e.g., perceived effort, return intention) [17] were

extended to include more subjective aspects, which are essentially

related to the competence-based trust model we have established

for recommender systems [4]. The model consists of three main

constructs: system-design features, competence-inspired trust, and

trust-induced behavioral intentions. As for system-design features

that may directly contribute to the promotion of overall

competence perceptions, we included three dimensions:

recommendation quality, transparency, and user-control. The

overall competence is composed of two crucial variables:

perceived ease of use and perceived usefulness, which have been

determined as the primary factors of persuading users to accept

and use a technology [6]. Besides, we included questions about

decision confidence, cognitive effort, and satisfaction. Trusting

intentions are behavioral attitudes expected from users once their

trust has been built. In addition to commonly addressed purchase

and return intentions, we were interested in the intention to save

effort, because it examines whether users will potentially reduce

their decision-making effort in repeated visits upon establishing a

certain trust level with the recommender system.

Table 2 lists all of the questions as measurements of these

subjective variables. Most of them came from existing literatures

where they have been repeatedly shown to exhibit strong content

validity [12]. Each question was required to respond on a 5-point

Likert scale from “strongly disagree” to “strongly agree”.

Except for these subjective criteria, we also measured

participants’ objective decision accuracy and effort. The objective

accuracy was defined as the percentage of users who stood by

their choice found using the assigned recommender system, when

they have the chance to review all alternatives in the database.

The objective effort was quantitatively measured in terms of both

task completion time and interaction cycles.

4.4 Experiment Design and Procedure

A 2

2

full-factorial between-group experiment design was used.

The manipulated factors are: (oriental culture, western culture)

and (ORG, LIST). Participants were evenly distributed into the

four conditions, resulting in a sample size of 30 for each condition

cell. Each participant was further randomly assigned one product

catalog (digital camera or tablet PC) to search.

An online procedure containing instructions, evaluated interfaces

and questionnaires was implemented, so that participants could

easily follow and we could record all of their actions in a log file.

At the beginning, the participant was required to fill in a pre-

questionnaire about her/his personal information and subjective

opinions on the priority order of different factors in influencing

her/his general trust formation in an e-commerce website. Then

s/he was asked to use the assigned system to locate a product that

s/he most preferred and would purchase if given the opportunity.

After the choice was made, the participant was asked to answer

post-study questions related to all of the measured subjective

variables. Then the interface’s decision accuracy was assessed by

revealing all of products to the participant to determine whether

s/he prefers another product in the catalog or sticks with the

choice just made using the recommender system.

Table 2. Questions to measure subjective variables

Measured

variable

Question responded on a 5-point Likert scale

from “strongly disagree” to “strongly agree”

Subjective perceptions of system-design features

Recommendation

quality

This interface gave me some really good

recommendations.

Transparency

I understand why the products were returned

through the explanations in the interface.

User control

I felt in control of specifying and changing my

preferences in this interface.

Overall competence perceptions

Perceived ease of

use

I find this interface easy to use.

This interface is competent to help me effectively

find products I really like.

I find this interface is useful to improve my

“shopping” performance.

Perceived

usefulness

Cronbach’s alpha = 0.69

Decision

confidence

I am confident that the product I just “purchased” is

really the best choice for me.

I easily found the information I was looking for.

Looking for a product using this interface required

too much effort (reverse scale).

Perceived effort

Cronbach’s alpha = 0.54

Satisfaction My overall satisfaction with the interface is high.

Trusting intentions

Intention to

purchase

I would purchase the product I just chose if given

the opportunity.

If I had to search for a product online in the future

and an interface like this was available, I would be

very likely to use it.

I don't like this interface, so I would not use it again

(reverse scale).

Intention to return

Cronbach’s alpha = 0.80

Intention to save

effort in next visit

If I had a chance to use this interface again, I

would likely make my choice more quickly.

Note: The Cronbach’s alpha value represents how well the two items are

related and unified to one construct.

4.5 Hypotheses

Regarding the culture difference, we postulated that it would not

have significant influence on users’ decision behavior in either

ORG or LIST. That is, people would react similarly to the system

no matter which cultural background s/he is from. The ORG

system was further hypothesized to outperform LIST, especially

in terms of subjective constructs related to user trust, owing to the

replacement of the list view of recommendations with the

organized view.

79

A cross-cultural user evaluation of product recommender interfaces

Figures

Citations

The Cultural Impact on User Interface Design: The Case of e-Government services of Kingdom of Bahrain

A Cross-Cultural Analysis of Explanations for Product Reviews.

Inferring users' multi-attribute preferences from the reviews for augmenting recommender systems in e-commerce

Improvement of the performance of Recommender System Using the National Culture of Buyer

Exploring and Eliciting Needs and Preferences from Editors for Wikidata Recommendations

References

Perceived Usefulness, Perceived Ease of Use, and User

Perceived usefulness, perceived ease of use, and user acceptance of information technology

Mining association rules between sets of items in large databases

Decisions with Multiple Objectives: Preferences and Value Trade-Offs

E-commerce: the role of familiarity and trust

Related Papers (5)

Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions

Evaluating collaborative filtering recommender systems

Trust building with explanation interfaces

Being accurate is not enough: how accuracy metrics have hurt recommender systems

A user-centric evaluation framework for recommender systems