
Information Research, vol. 21 no. 4, December 2016
Moving beyond text highlights: inferring users' interests to
improve the relevance of retrieval
Vimala Balakrishnan, Yasir Mehmood and Yoganathan Nagappan
Abstract
Introduction. Studies have indicated that users' text highlighting
behaviour can be exploited to improve the relevance of retrieved
results. This article reports on a study that examined users' text
highlight frequency and length, and their copy-paste actions.
Method. A binary voting mechanism was employed to determine the
weights for the feedback, which were then used to re-rank the original
search results. A search engine prototype was built using the
Communications of the ACM test collection, with the well-known
BM25 acting as the baseline model.
Analysis. The proposed enhanced model's performance was
evaluated using the mean average precision and F-score metrics,
and results were compared at the top 5, 10 and 15 documents.
Additionally, comparisons were made based on the number of terms
used in a query, that is, single, double and triple terms.
Results. The findings show that the enhanced model significantly
outperformed BM25 and the other models at all document levels.
In particular, the enhanced model showed significant improvements
over the frequency model. Additionally, retrieval relevance was found
to be best for two-term queries.
Conclusions. Users' post-click behaviour may serve as a significant
indicator of their interests, and thus can be used to improve the
relevance of the retrieved results. Future studies could look into
further extending this model by including other post-click behaviour
such as printing or saving.
Introduction
Online searching has become part of many people's work and daily lives, including activities such as research, shopping and entertainment (Clay and Esparza, 2012). For example, it is common for people to seek information from Wikipedia, search through Google, or buy products from eBay or Amazon.com. However, searching for relevant items or services can be a daunting task due to the amount of information available, and this is further exacerbated by a lack of searching skills. Web users tend not to know (or care) about the heterogeneity of Web content, the syntax of query languages and the art of phrasing queries, often resulting in them spending a lot of time looking for relevant items on the Internet (Manning, Raghavan and Schutze, 2009; Varathan, Tengku Sembok, Abdul Kadir and Omar, 2014).
To address users' lack of query skills, relevance feedback is commonly used. Relevance feedback is a process that involves users in the development of information retrieval systems, and aims to improve search results and increase user satisfaction. According to Baeza-Yates and Ribeiro-Neto (1999), 'in a relevance feedback cycle, the user is presented with a list of the retrieved documents and, after examining them, marks those which are relevant'. In fact, relevance feedback has been shown to be an indicator of users' interests, which can then be used to improve their satisfaction (Claypool, Le, Wased and Brown, 2001; Fox, Karnawat, Mydland, Dumais and White, 2005; Liu, Gwizdka and Liu, 2010). Two components of relevance feedback have evolved, namely query expansion (i.e., automatic relevance feedback) and term reweighting. Query expansion involves automatically adding new terms to the initial query, using techniques such as pseudo-relevance feedback (i.e., users get improved retrieval performance without further interaction), thesaurus-based expansion or other types of expansion, with studies demonstrating that this technique improves the accuracy with which the original queries are interpreted (Belkin et al., 2004; Crabtree, Andreae and Gao, 2007; Walker, Robertson, Boughanem, Jones and Sparck Jones, 1998). In contrast, term reweighting refers to the modification of term weights according to users' relevance judgements (Baeza-Yates and Ribeiro-Neto, 1999). In other words, this technique increases the weights of terms occurring in relevant documents whilst decreasing those of terms occurring in irrelevant documents.
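To make term reweighting concrete, a standard formulation (given here as an illustration; the article does not commit to this particular scheme) is the Rocchio update, which moves the query vector towards terms from documents judged relevant and away from terms from documents judged irrelevant:

    \vec{q}_m = \alpha\,\vec{q}_0 + \frac{\beta}{|D_r|}\sum_{\vec{d}_j \in D_r}\vec{d}_j - \frac{\gamma}{|D_{nr}|}\sum_{\vec{d}_j \in D_{nr}}\vec{d}_j

where \vec{q}_0 is the original query vector, D_r and D_{nr} are the sets of documents judged relevant and non-relevant, and \alpha, \beta and \gamma weight how strongly the feedback is applied. With explicit feedback these sets come from the user's own judgements; with implicit feedback, signals such as clicks or text highlights stand in for those judgements.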
Relevance feedback is usually gathered through explicit and/or implicit feedback. Explicit feedback requires the users to provide feedback for products or services rendered, with methods including ranking, rating, commenting and answering questions (Balakrishnan, Ahmadi and Ravana, 2016; Claypool et al., 2001; Núñez-Valdéz et al., 2012). Explicit feedback is well understood, easy to implement and fairly precise. The approach, however, requires the users to engage in additional activities beyond their normal searching behaviour, hence resulting in higher user costs in time and effort. Additionally, not all users like to be involved in providing explicit feedback, as the repetitive and frequent way of obtaining the relevance judgement causes a cognitive overload for the users (Claypool et al., 2001; Zemerli, 2012). Users are generally busy reading or looking for the right document or item during a search session, and thus they do not often provide explicit feedback. In fact, during an experiment it was found that only a small group of users agreed to give relevance judgements, and sometimes they had to be paid to provide this information (Spink, Jansen and Ozmultu, 2000). Similarly, previous work on GroupLens found that users rated many fewer documents than they read (Sarwar et al., 1998). Thus, even though explicit ratings are fairly precise in recognizing user interests, their efficacy is limited.
On the other hand, implicit feedback such as mouse clicks (i.e., click-through data) can be mined unobtrusively and used to determine users' preferences (Agichtein, Brill, Dumais and Ragno, 2006; Agrawal, Halverson, Kenthapadi, Mishra and Tsaparas, 2009; Carterette and Jones, 2008; Chuklin, Markov and Rijke, 2015; Claypool et al., 2001; Dupret and Liao, 2010; Feimin et al., 2010; Joachims, Granka, Pan, Hembrooke and Gay, 2005; Yu, Lu, Sun, and Zhang, 2012). For instance, mouse clicks have been used for online advertising to estimate the relevance of an advertisement to a search result page or a document (Chatterjee, Hoffman and Novak, 2003; Graepel, Candela, Borchert and Herbrich, 2010).
One of the main difficulties in estimating relevance from click data is position bias, that is, a document appearing in a higher position is more likely to attract user clicks even if it is irrelevant. Subsequent studies on implicit feedback hence progressed to include users' post-click behaviour, which refers to users' actions on the selected documents or results. These include dwell time (i.e., reading or display time) (Balakrishnan and Zhang, 2014; Fox et al., 2005; Oard and Kim, 1998), printing (Oard and Kim, 1998), scrolling (Claypool et al., 2001; Guo and Agichtein, 2012; Oard and Kim, 1998), and mouse or cursor movements (Buscher, White, Dumais and Huang, 2012; Guo and Agichtein, 2010; Huang, White and Buscher, 2012), with results indicating that post-click behaviour provides valuable implicit feedback on the relevance of selected documents. Huge volumes of implicit feedback data can be gathered easily and unobtrusively. Furthermore, no mental effort is required from the users (Claypool et al., 2001; Manning et al., 2009).
One of the implicit feedback techniques explored more recently is text highlighting or text selection, which involves selecting a block of text to indicate its relevance to the user. Generally, people make some form of mark, such as highlights, annotations, comments or circles, on documents to indicate interest or relevance (Shipman, Price, Marshall, Golovchinsky and Schilit, 2003). Similar assumptions have been made in information retrieval studies, whereby users' annotations and text highlighting behaviour have been used to improve document relevance. However, research focusing on such behaviour is scarce.
Studies of users' text highlighting behaviour thus far have examined the frequency of text highlighting, that is, it is assumed that the more text highlights a document contains, the more relevant the document is to the user (Balakrishnan and Zhang, 2014; White and Buscher, 2012). Determining a document's relevance based only on the frequency of text highlighting may be inadequate, because factors such as the length of the highlighted text and users' post-selection actions may also indicate users' interests (White and Buscher, 2012). Furthermore, according to Buscher et al. (2012), copy-paste and reading aid were the two main reasons leading users to highlight text. In fact, both text highlighting and copying are considered to be very strong indicators of users' interests (Hauger, Paramythis, and Weibelzahl, 2011; Hijikata, 2004). Therefore, these indicators can potentially be used together to improve retrieval relevance.
The current study aims to extend the above-mentioned works by further exploring and exploiting users' text highlighting behaviour. Specifically, the study intends to improve document retrieval relevance by analysing three parameters: (i) frequency of text highlights, (ii) length of text highlights, and (iii) users' copy-paste actions. The traditional ranking algorithm Okapi BM25 was used as the baseline (i.e., without users' feedback) and Communications of the ACM (CACM) was used as the test collection. To evaluate the retrieval effectiveness of the proposed model, an experiment was conducted using a self-developed prototype search engine. The retrieved results were analysed at the top 5, 10 and 15 document levels, and also compared by query length. As will be shown, the findings show that the proposed model consistently yields significant improvements over BM25 and the other feedback models.
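To make the baseline and the re-ranking idea concrete, the sketch below (in Python) pairs a minimal BM25 scorer with a simple binary-vote re-ranker over the three post-click signals studied here: highlight frequency, highlight length and copy-paste. The thresholds, the mixing weight alpha and the function names are illustrative assumptions for exposition only; they are not the exact binary voting mechanism or weights used in the study.

    import math
    from collections import Counter

    K1, B = 1.2, 0.75   # common BM25 defaults

    def bm25_scores(query_terms, docs):
        # docs is a list of token lists; returns one BM25 score per document
        n_docs = len(docs)
        avgdl = sum(len(d) for d in docs) / n_docs
        df = Counter()
        for d in docs:
            df.update(set(d))
        scores = []
        for d in docs:
            tf = Counter(d)
            score = 0.0
            for t in query_terms:
                if df[t] == 0:
                    continue
                idf = math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
                norm = tf[t] + K1 * (1 - B + B * len(d) / avgdl)
                score += idf * tf[t] * (K1 + 1) / norm
            scores.append(score)
        return scores

    def binary_votes(n_highlights, highlight_chars, copied):
        # one vote per post-click signal that crosses a (hypothetical) threshold
        return int(n_highlights >= 1) + int(highlight_chars >= 30) + int(copied)

    def rerank(docs, query_terms, feedback, alpha=0.5):
        # feedback maps doc index -> (n_highlights, highlight_chars, copied);
        # alpha is an assumed mixing weight, not a value from the article
        base = bm25_scores(query_terms, docs)
        combined = []
        for i, s in enumerate(base):
            votes = binary_votes(*feedback.get(i, (0, 0, False)))
            combined.append((s + alpha * votes / 3.0, i))
        return [i for _, i in sorted(combined, reverse=True)]

    # toy usage: document 1 was highlighted twice (60 characters) and copied
    docs = [["text", "retrieval", "model"],
            ["relevance", "feedback", "retrieval"],
            ["user", "interface", "design"]]
    print(rerank(docs, ["relevance", "retrieval"], {1: (2, 60, True)}))

In this sketch each document receives one vote per signal that crosses its threshold, and the normalised vote count is added to the BM25 score before the results are re-ranked.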
Related work
Implicit user feedback can generally be divided into two categories: query actions and physical user reactions. Query actions refer to the ways in which the user interacts with the search engine (e.g., clicks, keystrokes), whereas physical reactions are users' unconscious behaviour (e.g., eye movements, heart rate). Unlike the latter category, which requires special devices to collect data, users' query actions can be easily captured during a search session. The current study intends to exploit these query actions, specifically users' text highlighting behaviour.

Inferences drawn from implicit feedback are considered to be less reliable than those from explicit feedback, but on the other hand, large quantities of data can be gathered unobtrusively (Jung, Herlocker, and Webster, 2007). Studies focusing on implicit feedback have investigated various user behaviour, such as mouse clicks (Agichtein et al., 2006; Agrawal et al., 2009; Balakrishnan and Zhang, 2014; Claypool et al., 2001; Dupret and Liao, 2010; Feimin et al., 2010; Yu et al., 2012), dwell time (Balakrishnan and Zhang, 2014; Fox et al., 2005; Hassan, Jones and Klinkner, 2010; Huang, White and Dumais, 2011), eye tracking (Joachims et al., 2007), and mouse movements (Buscher et al., 2012; Guo and Agichtein, 2010; Huang et al., 2012), among others. Techniques such as mouse clicks are based on the assumption that the clicked documents are relevant to the search queries; however, this may not be accurate. Joachims et al. (2005) reported two main issues: trust bias, in which users trusted the ranking quality of the search engine and only clicked the first few results, and quality bias, which refers to users' varying behaviour for the same query in different search engines.
Generally, studies examine search logs to understand users' behaviour and interests because such logs automatically capture user interaction details. In addition, these data can be analysed to optimize retrieval performance (Jordan, Simone, Thomas, and Alexander, 2010), support query suggestion (Huanhuan et al., 2008) and enhance the ranked results (Agichtein et al., 2006; Balakrishnan and Zhang, 2014). More recent studies have looked into ways to improve retrieval by investigating other clicking behaviour; for example, Xu, Chen, Xu, Li, and Elbio (2010) used click rate and last click to predict relevant labels or Uniform Resource Locators (URLs). Although an overall improvement was observed, using the last click as an interest indicator may also be inaccurate, because there are different reasons behind the last click. For example, users who left the last document may have either succeeded in finding useful documents (good abandonment) or failed to find relevant documents (bad abandonment), and hence began a new search (Huang et al., 2011).
Studies have also progressed to examining users' post-click behaviour (i.e., actions performed after clicking on a link). A simple technique is dwell time, whereby it is assumed that if a document is relevant, the user will spend more time on it than on other documents (Buscher, Elst, and Dengel, 2009). In a more recent study, dwell time was examined from three different angles, that is, display time (i.e., the interval between opening and closing the document), dwell time (i.e., reading time) and decision time (i.e., the time taken to decide to select a document), with results indicating that dwell time topped the list in predicting document relevance compared to the other two

References
Robertson, S. & Zaragoza, H. (2009). The probabilistic relevance framework: BM25 and beyond. Foundations and Trends in Information Retrieval, 3(4), 333-389.

Jansen, B. J., Spink, A. & Saracevic, T. (2000). Real life, real users, and real needs: a study and analysis of user queries on the Web. Information Processing & Management, 36(2), 207-227.

Claypool, M., Le, P., Wased, M. & Brown, D. (2001). Implicit interest indicators. In Proceedings of the 6th International Conference on Intelligent User Interfaces (IUI '01) (pp. 33-40). New York, NY: ACM Press.

Joachims, T., Granka, L., Pan, B., Hembrooke, H., Radlinski, F. & Gay, G. (2007). Evaluating the accuracy of implicit feedback from clicks and query reformulations in Web search. ACM Transactions on Information Systems, 25(2), article 7.

Fox, S., Karnawat, K., Mydland, M., Dumais, S. & White, T. (2005). Evaluating implicit measures to improve Web search. ACM Transactions on Information Systems, 23(2), 147-168.