A Case Study of Collaboration and Reputation in Social Web Search
Summary
1. INTRODUCTION
- The scale of the Web and the heterogeneous nature of its content [Signorini and Gulli 2005] introduce many significant information discovery challenges.
- Part of the problem rests with the searchers themselves: with an average of only 2-3 terms [Lawrence and Giles 1998; Spink and Jansen 2004], the typical Web search query is often vague with respect to the searcher’s true intentions or information needs [Song et al. 2007].
- Moreover, searchers sometimes choose query terms that are not well represented in the page that they are seeking and so simply increasing the length of queries will not necessarily improve search performance.
- This paper focuses on the second idea, that of collaboration.
2 · McNally et al.
- Information discovery now spans the traditional world of web search and the information-sharing world of social networks, referred to here as discovery worlds.
- Only a few years ago, the majority of people located information of interest through their favourite mainstream search engine; increasingly, information is also discovered through social networks.
- This shift in information discovery habits has led to an explosion in the number and variety of new social-search services, all of which can influence users' information discovery activities, bringing the world of web search and social networks even closer together (see Figure 1).
- A key contribution of this paper is a detailed description of a recent live-user trial of HeyStaks in order to understand the usage and collaboration patterns of users and also the quality of HeyStaks’ social recommendations relative to the organic results of mainstream search engines.
2. BACKGROUND
- This paper focuses on discussing HeyStaks as a collaborative information retrieval technology, augmented by a reputation system based on the collaborations that implicitly take place between searchers in the HeyStaks social search utility.
- As such this background section covers recent, relevant work in the two broad areas of collaborative information retrieval and reputation systems.
2.1 Collaborative Information Retrieval
- Approaches to collaborative information retrieval can be usefully distinguished in terms of two important dimensions, time — synchronous versus asynchronous search — and place — that is, co-located versus remote searchers.
- Co-located systems offer a collaborative search experience for multiple searchers at a single location, typically sharing a single PC [Amershi and Morris 2008; Smeaton et al. 2008], whereas remote approaches allow searchers to perform their searches at different locations across multiple devices [Morris and Horvitz 2007a; 2007b; Smyth et al. 2009b].
- The former enjoy the obvious benefit of an increased faculty for direct collaboration that is enabled by the face-to-face nature of co-located search, while the latter offer a greater opportunity for collaborative search.
ACM Transactions on Intelligent Systems and Technology, Vol. 2, No. 3, 09 2001.
- Once again, preliminary studies speak to the potential for such an approach to improve overall search productivity and collaboration, at least in specific types of information access tasks.
- The iBingo system allows a group of users to collaborate on an image search task, with each user using an iPod Touch device as their primary search/feedback device (although conventional PCs appear to be just as applicable).
- SearchTogether supports synchronous collaborative search by allowing searchers to invite others to join in specific search tasks, allowing cooperating searchers to synchronously view the results of each others’ searches via a split-screen style results interface.
- Work by Pickens et al. [2008] describes an approach to collaborative search that is more tightly integrated with the underlying search engine resource so that the operation of the search engine is itself influenced by the activities of collaborating searchers.
2.2 Reputation Systems
- Recently there has been considerable interest in reputation systems to provide a mechanism to evaluate user reputation and inter-user trust across a growing number of social web and e-commerce applications [Jøsang and Golbeck 2009; O’Donovan and Smyth 2005; 2006; Sabater and Sierra 2005; Resnick and Zeckhauser 2002; Resnick et al. 2000].
- This work is, in part, motivated by the idea that an understanding of user reputation can serve as the basis for strategies to guard against malicious users [Lazzari 2010; Hoffman et al.].
- Here, the authors present a brief review of the work that has been undertaken in this regard.
- Jøsang et al. [2007] confirm this, stating that such systems require manual curation and protection from malicious users.
- Unlike in conventional reputation systems such as eBay's, reputation here is not calculated by examining feedback received directly from users.
- The collaborative filtering algorithm is modified to add a user-user trust score to complement the normal profile- or item-based similarity score, so that recommendation partners are chosen from users who are not only similar to the target user but who also have a positive recommendation history with that user.
- Using this metric, average prediction error is improved by 22%.
- Similar to O’Donovan and Smyth [2005], Massa and Avesani [2007] propose a reputation algorithm called MoleTrust that can be used to augment an existing collaborative filtering system.
- Other recent research has examined reputation systems employed in social networking platforms.
- Applying reputation globally affords malicious users influence over the entire system, which adds to its vulnerability.
3. HEYSTAKS: A SOCIAL SEARCH UTILITY
- In designing HeyStaks, the authors' primary goal is to provide social Web search enhancements while allowing searchers to continue to use their favourite search engine.
- First, it allows users to create search staks, a type of folder for their search experiences; the stak creator can invite initial members by providing their email addresses.
- As shown in Figure 2, HeyStaks takes the form of two basic components: a client-side browser toolbar and a back-end server.
- In the following sections the authors review how HeyStaks captures search activities within search staks and how this search knowledge is used to generate and filter result recommendations at search time; further technical detail can be found in [Smyth et al. 2009a; 2009b].
3.1 Profiling Stak Pages
- In HeyStaks each search stak (S) serves as a profile of the search activities of the stak members.
- A number of primary actions are facilitated, for example Selections (or click-thrus): a user selects a search result (whether organic or recommended).
- It is also a weak indicator of relevance, because users will frequently select pages that turn out to be irrelevant.
- In this way, each result page is associated with a set of term data (query terms and/or tag terms) and a set of usage data (the selection, tag, share, and voting counts).
- At search time, recommendations are produced in a number of stages: first, relevant results are retrieved and ranked from the stak index; next, these recommendation candidates are filtered based on the usage evidence to eliminate noisy recommendations; and, finally, the remaining results are added to the Google result-list according to a set of recommendation rules.
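The three-stage pipeline above (retrieve and rank from the stak index, filter by usage evidence, then promote into the engine's result list) can be sketched as follows. All names, thresholds, and the shape of the index are illustrative assumptions, not HeyStaks' actual API:

```python
def recommend(query, search_stak_index, usage_evidence, organic_results,
              min_evidence=2, max_promotions=3):
    """Sketch of the retrieve -> filter -> promote recommendation pipeline."""
    # Stage 1: probe the stak index with the query; the index is assumed to
    # be a callable returning (url, relevance_score) pairs.
    candidates = sorted(search_stak_index(query), key=lambda c: c[1], reverse=True)
    # Stage 2: drop candidates without enough usage evidence (selections,
    # tags, shares, votes) -- a stand-in for HeyStaks' evidence model.
    filtered = [url for url, _ in candidates
                if usage_evidence.get(url, 0) >= min_evidence]
    # Stage 3: insert the top survivors ahead of the organic results.
    promoted = filtered[:max_promotions]
    return promoted + [r for r in organic_results if r not in promoted]
```

In this sketch the evidence filter runs before ranking cut-off, mirroring the text's point that noisy candidates are eliminated before results are added to the Google result-list.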
3.2 Retrieval & Ranking
- Briefly, there are two types of recommendation candidates: primary recommendations are results that come from the active stak St; whereas secondary recommendations come from other staks in the searcher’s stak-list.
- To generate these recommendation candidates, the HeyStaks server uses the current query qt as a probe into each stak index, Si, to identify a set of relevant stak results R(Si, qt).
- Each candidate result, r, is assigned a relevance score using a TF*IDF-based retrieval function as per Equation 2, which serves as the basis for an initial recommendation ranking.
- Staks are inevitably noisy, in the sense that they will frequently contain results that are not on topic.
- The precise details of this model are beyond the scope of this paper but suffice it to say that any results which do not meet the necessary evidence thresholds are eliminated from further consideration; further detail can be found in [Smyth et al. 2009a; 2009b].
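As a rough illustration of the kind of TF*IDF scoring that Equation 2 describes, the sketch below weights each query-term hit by its frequency in the candidate result's profile and its inverse document frequency across the stak. This is a generic reconstruction under stated assumptions; the paper's exact formulation is in the cited Smyth et al. papers.

```python
import math

def tfidf_score(query_terms, result_terms, stak, n_results):
    """Generic TF*IDF relevance score for one candidate result.

    result_terms: {term: count} profile of the candidate result.
    stak: {url: {term: count}} -- every result profile in the stak index.
    """
    score = 0.0
    for t in query_terms:
        tf = result_terms.get(t, 0)
        if tf == 0:
            continue  # term absent from this result's profile
        # Document frequency: how many stak results mention the term.
        df = sum(1 for profile in stak.values() if t in profile)
        idf = math.log(n_results / df)
        score += tf * idf
    return score
```

Note that a term appearing in every stak result contributes nothing (idf = 0), which is the usual TF*IDF behaviour.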
3.3 Summary Discussion
- HeyStaks is designed to help users to collaborate during Web search tasks and, importantly, it succeeds in integrating collaborative recommendation techniques with mainstream search engines.
- In the next section the authors introduce their user reputation model, which is based on the collaboration events that inherently occur between users who share their search experiences.
- In turn, the authors show how this model can be employed to further enhance the quality of recommendations provided by HeyStaks by using reputation to influence the ranking of recommended results.
4. A REPUTATION MODEL FOR SOCIAL SEARCH
- The many different types of activity that a user can perform on a web page (click-thrus, tagging, voting, sharing) are ultimately combined and leveraged by HeyStaks to make recommendations at search time.
- Intuitively, the authors might expect that some users are more experienced searchers than others and, as such, perhaps their activities should be considered more reliable at recommendation time.
- This is particularly important given the potential for malicious users to disrupt stak quality by introducing dubious results to a stak.
- If unchecked, this type of gaming has the potential to significantly degrade recommendation quality; see also recent related research on malicious users and robustness by the recommender systems community [Bryan et al.].
- In the following section, the authors describe how user activities in HeyStaks can be harnessed to generate a computational model of user reputation, based on the collaboration events that naturally occur between HeyStaks users who share their search experiences.
4.1 From Activities to Reputation
- It seems natural that the reputation of searchers should be linked to the search knowledge that they contribute to HeyStaks.
- Each activity on the part of users causes the creation of new search knowledge.
- If the target page is new to a stak, then its selection, sharing, voting, or tagging will cause it to be added to the stak for the first time.
- Under the heading of “more search knowledge is better than less search knowledge” it might make sense to model reputation as a direct function of the sheer volume of activity that a given searcher engages in.
- On the contrary, one of the major concerns in any social recommender is the potential for malicious users to game the system.
4.2 Reputation as Collaboration
- The long-term value of HeyStaks as a social search service depends critically on the ability of users to benefit from its quality search knowledge and if, for example, all of the best search experiences are tied up in private staks and never shared, then this long-term value will be greatly diminished.
- The key idea is that, ultimately, the quality of shared search knowledge can be estimated by looking at the frequency of search collaborations within HeyStaks.
- In other words, the producer created search knowledge that was deemed to be relevant enough to be recommended and useful enough for the consumer to act upon it.
- This collaboration-based model of reputation incentivizes users not just to create search knowledge of high quality but also to share it with others.
4.3 A Computational Model of Reputation
- The conferral of reputation by a single consumer on a single producer (Figure 4(a)) is the simplest case of their reputation model.
- A specific producer may have been the first to select the result in a given stak, but subsequent users may have selected it for different queries, or they may have voted on it or tagged it or shared it with others independently of its other producers.
- In this way reputation is shared equally among its k contributing producers; see Figure 5 for an example of how user reputation can evolve over time.
- Given the formulation of the reputation model, some protection against malicious activity is inherently provided because users only benefit if their results are recommended and selected by other users.
- The problem is that the current reputation model distributes reputation equally among all producers.
- The intuition is that a producer should receive more reputation if many of their past contributions have been consumed by other users, but less reputation if most of their contributions have not been consumed.
- More formally, for a producer pi, let nt(pi, t− 1) be the total number of distinct results that this user has added to the stak in question prior to time t; remember that pi refers to a single user and a specific stak.
- Further, let nr(pi, t − 1) be the number of these results that have been subsequently recommended and consumed by other users.
- Accordingly, if a producer has a high consumption ratio it means that many of their contributions have been consumed by other users, suggesting that the producer has consistently added useful content to the stak.
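The consumption ratio described above, nr(pi, t−1) / nt(pi, t−1), might be computed as in the sketch below. The function and argument names are hypothetical; it simply counts what fraction of a producer's distinct contributions to a stak have since been consumed by other users:

```python
def consumption_ratio(contributed_urls, consumed_urls):
    """Fraction of a producer's stak contributions later consumed by others.

    contributed_urls: distinct results this producer added to the stak (n_t).
    consumed_urls: results that were subsequently recommended and acted
    on by other users; only the overlap with the producer's own
    contributions counts towards n_r.
    """
    n_t = len(set(contributed_urls))
    if n_t == 0:
        return 0.0  # no contributions yet, so no reputation evidence
    n_r = len(set(contributed_urls) & set(consumed_urls))
    return n_r / n_t
```

A high ratio means most of the producer's additions proved useful to others, and so, per the model, they should receive a larger share of conferred reputation.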
4.4 Reputation and Result Recommendation
- In the previous sections the authors have described a reputation model for users.
- In this section the authors describe how this reputation information can be used to produce better recommendations at search time.
- One option is to simply add the reputation scores of the producers.
- In their work the authors have found a third option to work best.
- Thus, as long as at least some of the producers are considered reputable then this result will receive a high reputation score, even if many of the producers have low reputation scores.
5. EVALUATION
- The authors describe a live-user trial of HeyStaks, designed to evaluate the utility of HeyStaks' brand of collaborative search in fact-finding, information discovery tasks.
- In addition the authors also have the opportunity to evaluate the potential benefits of their new reputation model when it comes to boosting the relevance of HeyStaks’ default promotions.
- It is worth highlighting that this present evaluation complements earlier evaluations of HeyStaks such as that carried out by Smyth et al. [2009b].
- These earlier evaluations had the benefit of being open-ended trials, following users during routine search tasks, but were limited in their ability to evaluate the relevance of HeyStaks recommendations.
- Instead, these earlier evaluations reported on typical usage by HeyStaks users, focusing on stak creation and sharing behaviour.
5.1 Dataset and Methodology
- The authors' experiment involves 64 first-year undergraduate university students with varying degrees of search expertise.
- It was highly unlikely that students would be able to answer any significant number of these questions from their own general knowledge and so the purpose of this experiment was to look at how the students used HeyStaks and Google to help them answer these questions.
- The solitary staks served as a straightforward benchmark to evaluate the search effectiveness of individual users in a non-collaborative search setting, whereas the different sizes of shared staks provided an opportunity to examine the effectiveness of collaborative search across a range of different group sizes.
- During the 60 minute trial a total of 3,124 queries and 1,998 result activities (selections, tagging, voting, popouts) were logged, and 724 unique results were selected.
- Result pages were categorised as: not relevant (the page content had no relevance with respect to a question); partially relevant (the page contains an implicit reference to the answer, or to part of the answer, to a question); or relevant (the page contains an answer to a question).
- Figure 6(b) shows a relevance breakdown of the result pages logged during the course of the trial.
- 66% of result pages acted on were categorised as being not relevant with respect to the questions posed, while only 14% were deemed relevant.
- These findings demonstrate the difficulty of the questions presented as mentioned above.
- The authors will return to this relevance information later in this section when they use it to evaluate the relevance of HeyStaks recommendations.
5.2 Research Questions
- Using this trial data the authors can explore a number of important questions pertaining to the benefits, or otherwise, of social web search and the value of reputation during result recommendation.
- To answer these questions the authors look at the outcome of the quiz as the core search task.
5.3 Quiz Performance
- To begin with it is worth looking at the overall performance of students during the quiz as a basic outcome measure for this search task.
- These results point to the benefit of sharing and collaboration during this search task.
- By comparison, the median values across shared staks are between 5.5 and 8 questions attempted and between 4 and 7 questions correctly answered.
- In general the influence of stak size is less clear in terms of these measures of overall performance.
- It is likely that the search expertise of individual users is playing a role here and, as such, a simple measure such as stak size is unlikely to be a powerful predictor of overall performance, given the variation in expertise that likely exists between the individual members of a stak.
5.4 Search Queries & Result Activities
- The authors have presented evidence above to show how the members of their shared staks perform better than solitary searchers in their search task.
- The authors' key hypothesis is that this is due, at least in part, to the benefits of the type of search collaboration that HeyStaks is designed to facilitate.
- Specifically, the authors posit that the members of shared staks will benefit from relevant results, promoted due to the activities of other stak members, results that might otherwise be difficult to find.
- The authors will look in more detail at these promotions in the next section but first it is useful to look at the level of granular search activity across the different search staks.
- The authors view the number of queries submitted by a searcher as a proxy for their search effort, and the number of activities (result selections, tagging, etc.) they generate as an indicator of the relevance of the results returned for these queries.
- Now the authors can see a very significant difference between the activities per query for the solitary searchers (approximately 0.4 activities per query) and the collaborating searchers in the shared staks (approximately 0.6 – 0.8 activities per query).
- The former benefit significantly from results that are, apparently at least, more relevant than those experienced by the latter.
- In the case of the former, on average they correctly answer 0.044 questions per query, but for the latter this ratio increases to 0.15.
5.5 Recommendations & Relevance
- Given that the members of shared search staks seem to be enjoying improved search productivity when compared to their solitary counterparts, the authors now turn their attention to the likely source of this improvement: the recommendations that are generated by HeyStaks.
- To begin with it is worth looking at how often HeyStaks is able to recommend results to the members of the different staks.
- This is presented in Figure 10 as the percentage of queries that result in at least one HeyStaks recommendation.
- As expected, larger staks mean more recommendations, because there are more search experiences to act as a source of recommendations.
- For the solitary staks, the authors find that only 16% of the queries lead to recommendations, and, by definition, these recommendations are due to the solitary searcher submitting queries that are similar to those they have used previously.
- For the 5-person stak, nearly 40% of queries lead to recommendations, growing to over 62% for the largest 25-person stak.
- Comparing the graphs for the recommended results versus the organic results the authors can see a significant relevance benefit for the former.
- Similarly, the authors find that, on average, 41% of the organic result activities are for not relevant results compared to only 21% for the recommended result activities.
- To better quantify this relevance benefit the authors compute a relevance ratio for organic and recommended results as per Equation 9: relevance ratio = a_r / a_nr, i.e. the ratio of activities on relevant results to activities on not-relevant results.
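Equation 9 is then simple arithmetic; assuming a_r counts user activities on relevant results and a_nr counts activities on not-relevant results (an interpretation of the garbled equation, not a quote from the paper):

```python
def relevance_ratio(a_r, a_nr):
    """Relevance ratio: activities on relevant results divided by
    activities on not-relevant results. A ratio above 1.0 means users
    acted on relevant results more often than on not-relevant ones."""
    return a_r / a_nr
```

Computed separately for organic and recommended results, a higher ratio for the recommended set would quantify the relevance benefit the text describes.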
- This stak is the best performer (e.g. more questions answered correctly per user), most likely because its members are better searchers to begin with.
5.6 Searcher Reputation
- The results of the previous section highlight the potential benefits of the HeyStaks form of collaborative web search in the context of the target search task.
- Recommended results turned out to be significantly more relevant, according to their independent relevance metric, than conventional organic results.
- Even absent overtly malicious users, recommendation quality can degrade if prolific, but inexperienced, searchers contribute large quantities of irrelevant results to a stak.
- Clearly there is a diverse range of reputation scores across all of these users.
- Figure 14 plots the reputation score of a user versus the number of distinct results contributed to collaboration events by that user.
- Members of the 9-person stak achieve a very similar median reputation score (14.5) to that of the 19-person stak (14.9), despite having ten fewer members.
- The authors know from their earlier performance results that the users in the 9-person stak perform particularly well, both in terms of their quiz performance (e.g. median questions correct per queries submitted) and the relevance of their search results.
- It is also interesting to look at how reputation builds during the course of the trial.
- To examine this the authors note the number of users with non-zero reputation score at 5-minute intervals during the trial; they do this retrospectively by analysing the collaboration logs.
- The authors see a consistent reputation profile across the 4 staks with reputation beginning to accumulate from an early stage, albeit more slowly, as expected, for the 5-person stak.
5.7 Reputation for Recommendation Ranking
- As discussed previously the motivation for incorporating a reputation model into the HeyStaks recommendation engine is to provide a way for searcher expertise to influence recommendation.
- In principle, by increasing the reputation threshold in this way the authors should experience an improvement in recommendation quality, but at the same time it will reduce recommendation coverage — the number of recommendations that can be made — because none of the recommendations for certain queries will exceed the threshold.
- The results of this experiment are presented in Figure 17(b) as the relative benefit (percentage increase in relevance ratio) of reputation-based ranking, in comparison to the default HeyStaks recommendation ranking, for different values of the reputation weight (w), from 0 to 1, and for 3 different reputation thresholds (0, 0.3, and 0.5).
- As the reputation weight is increased, initially the authors see a rapid increase in its relative benefit score but as the reputation weight exceeds 0.6 they see relative benefit fall back as it begins to over-influence the recommendation rankings.
- This holds at least in this experiment, most likely because of the limits of the trial.
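One plausible reading of reputation-based ranking with a reputation weight w and a reputation threshold is a linear blend of relevance and reputation after thresholding; the blend rule is an assumption, as the summary does not give the exact combination:

```python
def rerank(candidates, w=0.4, rep_threshold=0.3):
    """Sketch of reputation-weighted re-ranking.

    candidates: list of (url, relevance, reputation) tuples, with both
    scores normalised to [0, 1]. Raising rep_threshold should improve
    quality but reduces coverage, since some queries lose all their
    candidates; raising w lets reputation dominate the ranking.
    """
    # Threshold: drop candidates whose producers lack reputation.
    kept = [(url, rel, rep) for url, rel, rep in candidates
            if rep >= rep_threshold]
    # Blend: linear combination of relevance and reputation.
    scored = [(url, (1 - w) * rel + w * rep) for url, rel, rep in kept]
    return [url for url, _ in sorted(scored, key=lambda x: x[1], reverse=True)]
```

Under this sketch the trade-off reported in the text is visible directly: a higher threshold filters more candidates (lower coverage), and a weight beyond a certain point lets reputation over-influence the ordering.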
5.8 Limitations & Results Summary
- In this evaluation the authors have described the results of a live-user trial of HeyStaks.
- Fact-finding is, for example, just one of the many reasons why users avail of search engines, and there is clearly an opportunity for further work to broaden the evaluation to cover more open-ended search and discovery tasks; preliminary results for such open-ended evaluations have been presented elsewhere in Smyth et al. [2009b].
- Given these trial limitations, the outcome of their evaluation has been very positive.
- These recommendations effectively amplified the relevance of results selected by search leaders and benefitted search followers accordingly.
- Prior work (e.g. Keane et al. [2008]) shows that searchers focus on top-ranked results and, as such, it is generally accepted that if one can produce rankings where top-ranked results are more relevant, then these rankings are likely to meet with a better user response.
6. CONCLUSIONS
- Many of users' information needs are now being met by sharing through social networks as much as through queries to search engines.
- As web search evolves, there is a significant opportunity for search engines to embrace this more social and collaborative model of information discovery.
Frequently Asked Questions
Q2. What is the main focus of this paper?
The focus of this paper is the HeyStaks search service (www.heystaks.com), which adds a layer of collaboration on top of mainstream search engines: so users continue to search as normal but benefit from a more collaborative/social search experience.
Q3. What other forms of action must the user choose to use?
The authors refer to the three other forms of action (voting, sharing, tagging) as explicit actions, in the sense that they are not part of the normal search process but are HeyStaks-specific actions that the user must choose to use.
Q4. What is the problem with Web search?
Part of the problem rests with the searchers themselves: with an average of only 2-3 terms [Lawrence and Giles 1998; Spink and Jansen 2004], the typical Web search query is often vague with respect to the searcher’s true intentions or information needs [Song et al. 2007].
Q5. How is the relevance score assigned to a candidate result?
Each candidate result, r, is assigned a relevance score using a TF*IDF-based retrieval function as per Equation 2, which serves as the basis for an initial recommendation ranking.
Q6. What is the name of the algorithm?
Similar to O’Donovan and Smyth [2005], Massa and Avesani [2007] propose a reputation algorithm called MoleTrust that can be used to augment an existing collaborative filtering system.
Q7. What motivates this work?
This work is, in part, motivated by the idea that an understanding of user reputation can serve as the basis for strategies to guard against malicious users [Lazzari 2010; Hoffman et al.].
Q8. How is the standard collaborative filtering algorithm modified?
The standard collaborative filtering algorithm is modified to add a user-user trust score to complement the normal profile- or item-based similarity score, so that recommendation partners are chosen from users who are not only similar to the target user but who also have a positive recommendation history with that user.
Q9. What is the main contribution of this paper?
In addition, a second contribution of this paper is a novel enhanced reputation model for HeyStaks, which has been developed in order to evaluate the reputation of individual searchers based on their search contributions.