
Terms of a feather: content-based news recommendation and discovery using twitter

Abstract
User-generated content has dominated the web's recent growth and today the so-called real-time web provides us with unprecedented access to the real-time opinions, views, and ratings of millions of users. For example, Twitter's 200m+ users are generating in the region of 1000+ tweets per second. In this work, we propose that this data can be harnessed as a useful source of recommendation knowledge. We describe a social news service called Buzzer that is capable of adapting to the conversations that are taking place on Twitter to ranking personal RSS subscriptions. This is achieved by a content-based approach of mining trending terms from both the public Twitter timeline and from the timeline of tweets published by a user's own Twitter friend subscriptions. We also present results of a live-user evaluation which demonstrates how these ranking strategies can add better item filtering and discovery value to conventional recency-based RSS ranking techniques.


Terms of a Feather: Content-Based News
Recommendation and Discovery Using Twitter*
Owen Phelan, Kevin McCarthy, Mike Bennett, and Barry Smyth
CLARITY: Centre for Sensor Web Technologies
School of Computer Science & Informatics
University College Dublin
firstname.lastname@ucd.ie
1 Introduction
The real-time web (RTW) is emerging as new technologies enable a growing
number of users to share information in multi-dimensional contexts. Sites such as
Twitter (www.twitter.com) and Foursquare (www.foursquare.com) are platforms
for real-time blogging, messaging and live video broadcasting to friends and a
wider global audience. Companies can get instantaneous feedback on products
and services from RTW sites such as Blippr (www.blippr.com). Our research
focusses on the real-time web, in all of its various forms, as a potentially
powerful source of recommendation data. For example, we consider the possibility
of mining user profiles based on their Twitter postings. If this is feasible, we can
use this profile information as a way to rank items, user recommendations, products
and services for these users, even in the absence of more traditional forms of
preference data or transaction histories [6]. We may also provide a practical solution
to the cold-start problem [13] of sparse profiles of users’ interests, an issue that
has plagued many item discovery and recommender systems to date.
* This work is gratefully supported by Science Foundation Ireland under Grant No.
07/CE/11147 CLARITY CSET.
P. Clough et al. (Eds.): ECIR 2011, LNCS 6611, pp. 448–459, 2011.
© Springer-Verlag Berlin Heidelberg 2011

Online news is a well-trodden research field, with many good reasons why
IR and AI techniques have the potential to improve the way we consume news
online. For a start there is the sheer volume of news stories that users must deal
with, plus we have varied tastes and preferences with respect to what we are
interested in reading about. At the same time, news is a biased form of media that
is increasingly driven by the stories that are capable of selling advertising. Niche
stories that may be of interest to a small portion of readers often get buried. All
of this has contributed to a long history of using recommender systems to help
users navigate through the sea of stories that are published every day, based on
learned profiles of users. For example, Google News (http://news.google.com)
is a topically segregated mashup of a number of feeds, with automatic ranking
strategies based on user interactions (click-histories & click-thrus) [5]. It is an
example of a hybrid technique for news recommendation, as it utilises a user’s
search keywords from Google itself as a support for explicit ratings. Another
popular example is Digg (www.digg.com), whose webpage rating service generally
leads to a high overlap of selected topical news items [12].
This paper extends some of the previous work presented in [15], which
described an early prototype of the Buzzer system, in two ways. First, we describe
a more comprehensive and robust recommendation framework that has been
extended both in terms of the different sources of recommendation knowledge and
the recommendation strategies that it uses. Secondly, we describe the results of
a live-user evaluation with 35 users over a one-month period, based on more
than 30,000 news stories and in excess of 50 million Twitter messages, which
reveal interesting usage patterns compared to recency benchmarks.
2 Background
Many research opportunities remain when considering how to adapt recommendation
techniques to tackle the so-called information explosion on the web. Digg,
for example, mines implicit click-thrus of articles as well as ratings and
user-tagging folksonomies as a basis of content retrieval for users [12]. One of the
byproducts of Digg’s operation is that users’ browsing and sharing activities
generally involve socially or temporally topical items, and as such it has been
branded as a sort of news service [12]. Difficulties arise because many users
must implicitly (click, share, tag) and explicitly (star or digg) rate
items many times before those items emerge as topical. There is also
considerable item churn, that is, the corpus of data is constantly updating
and item relevances are constantly fluctuating. The space of documents itself
could be characterised, following Brusilovsky and Henze, as an Open Corpus Adaptive
Hypermedia System in that there is an open corpus of documents (though topic
specific) that can constantly change and expand [3].
Google News is a popular service that uses (mostly unpublished) recommendation
techniques to filter 4500 partner news providers to present an aggregated
view for registered users of popular and topical content [5]. Items are usually
between several seconds and 30 days old, and appear on the “front page” based
on click-thrus and keyword terms from Google search queries. The ranking itself is
mostly based on click-thru rates of items: higher ranked items have more clicks.
Issues arise with new and topical items struggling to get to the “front page”, as
this requires a critical mass of clicks from many users. Das et al. [5] mostly
describe scalability of the system as an issue with the service, and propose
several techniques known to be capable of dealing with such issues. These included
MinHash, Probabilistic Latent Semantic Indexing (PLSI) and Locality Sensitive
Hashing (LSH) as component algorithms in an overall hybrid system.
Content-based approaches are widely discussed in many branches of recommender
systems [2,9,13,14]. Examples such as the News@Hand semantic system
by Cantador et al. [4] show encouraging moves towards considering the content of
the news items themselves. The authors use semantic annotation and ontologies
to structure items into groups, while matching this to similarly structured user
profiles of preferred items. Unfortunately, the success of these approaches depends
on the quality and existence of established domain ontologies. Our approach is to look
at the most atomic components of the content, the individual terms themselves.
There is currently considerable research attention being paid to Twitter and
the real-time web in general. RTW services provide access to new types of
information and the real-time nature of these data streams provides as many
opportunities as it does challenges. In addition, companies like Twitter and Yahoo
have adopted a very open approach to making their data available; Twitter’s
developer API, for example, provides researchers with access to a huge volume of
information. It is no surprise then that the recent literature includes analyses of
Twitter’s real-time data, largely with a view to developing an early understanding
of why and how people are using services like Twitter [7,8,11]. For instance,
the work of Kwak et al. [11] describes a very comprehensive analysis of Twitter
users and Twitter usage, covering almost 42m users, nearly 1.5bn social
connections, and over 100m tweets. In this work, the authors have examined reciprocity
and homophily among Twitter users, compared a number of different
ways to evaluate user influence, and investigated how information diffuses
through the Twitter ecosystem as a result of social relationships and retweeting
behaviour. Similarly, Krishnamurthy et al. identify classes of Twitter users
based on behaviours and geographical dispersion [10]. They highlight the
process of producing and consuming content based on retweet actions, where users
source and disseminate information through the network.
We are interested in the potential to use near-ubiquitous user-generated
content as a source of preference and profiling information in order to drive
recommendation. As such, in this research context Buzzer is termed a content-based
recommender. User-generated content is inherently noisy but it is plentiful, and
recently researchers have started to consider its utility in recommendation. There
has been some recent work [17] on the role of tags in recommender systems, and
researchers have also started to leverage user-generated reviews as a way to
recommend and filter products and services. For example, Acair et al. look at the
use of user-generated movie reviews from IMDb as part of a movie recommender
system [1] and similar ideas are discussed in [18].

Fig. 1. A screenshot of Buzzer, with personalized news results for a given user

Both of these instances of related work look to mine review content as an
additional source of recommendation knowledge (in a similar way to the
content-boosted collaborative filtering technique in Melville et al. [13]), but they rely on
the availability of detailed item reviews, which may run to hundreds of words
but which may not always be available. In this paper, we consider trending and
emerging topics on user-generated content sites like Twitter as a way to
automatically derive recommendation data for topical news and web-item discovery.
3 The Buzzer System
People talk about news and events on Twitter all of the time. They share web
pages about news stories. They express their views on recent stories. They even
report on emerging news stories as they happen. Surely then it is logical to
think of Twitter as a source of news information and news preferences? The
challenge of course is that Twitter is borderline chaotic: tweets are little more
than impressions of information through fleeting moments of time. Can we really
hope to make sense of this signal and noise and harness the chaos as a way
to search, filter and rank news stories? This is the objective of the research
presented in this paper. Specifically, we aim to mine Twitter information, from
both public data streams, and the streams of related users, as a way to identify
discriminating terms that are capable of being used to highlight breaking and
interesting news stories.
As such, the Buzzer system adopts a content-based technique for recommending
news articles, but instead of using structured user profiles we use unstructured
real-time feeds from Twitter. In effect, the user messages (tweets) themselves act
as an implicit ratings system for promoting and filtering content for retrieval in
a large space of items of varied topicality or relevance to users.

[Figure 2: flow diagram. The input data feeds for strategy x are a Tweets Index
(Public or User social graph) and an RSS Index (Community or User). A Co-occurring
Term Gatherer gathers a weighted vector of terms from both indexes and finds the
co-occurring terms {Q}. The Recommendation Engine queries the RSS Index to find
articles, aggregates scores, ranks the articles based on the summed scores,
generates a term-frequency tag cloud, and returns the ranked list of articles
(RecList) as the result list for strategy x.]
Fig. 2. Generating results for a given strategy. System mines a specified RSS and
Twitter source and uses the co-occurring technique described to generate a set of results,
which will be interleaved with other sets to produce the final list shown to users.
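
The flow in Figure 2 can be sketched roughly as follows. This is a minimal illustration and not the Buzzer implementation: the tokenizer, the stopword list, the use of raw term frequency as the term weight, and the function names (`tokenize`, `term_weights`, `recommend`) are all assumptions made for the example, since the text here does not specify the weighting scheme.

```python
import re
from collections import Counter

# Assumed, deliberately tiny stopword list for the illustration.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "on", "for"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords and tokens under 3 letters."""
    return [t for t in re.findall(r"[a-z]+", text.lower())
            if len(t) > 2 and t not in STOPWORDS]


def term_weights(texts):
    """Assumed weighting: raw term frequency summed over a collection of texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


def recommend(tweets, articles):
    """Rank articles by the terms they share with recent tweets.

    tweets: list of tweet strings; articles: dict of title -> body text.
    Returns (ranked list of (title, score), tag cloud of shared terms).
    """
    tweet_terms = term_weights(tweets)
    rss_terms = term_weights(articles.values())
    # Co-occurring terms: present in both the tweet side and the RSS side.
    shared = set(tweet_terms) & set(rss_terms)

    scores = {}
    for title, body in articles.items():
        article_terms = Counter(tokenize(body))
        # Aggregate a per-article score by summing over the shared terms.
        score = sum(tweet_terms[t] * article_terms[t] for t in shared)
        if score > 0:
            scores[title] = score

    # Highest summed score first; ties broken alphabetically by title.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    # Term-frequency tag cloud over the co-occurring terms.
    tag_cloud = {t: tweet_terms[t] for t in shared}
    return ranked, tag_cloud
```

Articles that share no terms with the tweet stream receive no score and are left out of the ranked list; a production system would more plausibly use a search index with a weighting such as TF-IDF in place of the raw counts assumed here.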
3.1 System Architecture
The high-level Buzzer system architecture is presented in Figure 2. In summary,
Buzzer generates two content indexes, one from Twitter (including public tweets
and Buzzer-user tweets as discussed below) and one from the RSS feeds of Buzzer
users. Buzzer looks for correlations between the terms that are present in tweets
and RSS articles and ranks articles accordingly. In this way, articles with content
that appears to match the content of recent Twitter chatter (whether public or
user related) will receive high scores during recommendation. Figure 1 shows a
sample list of recommendations for a particular user. Buzzer itself is developed
as a web application and can take the place of a user’s normal RSS reader: the
user continues to have access to their favourite RSS feeds but in addition, by
syncing Buzzer with their Twitter account, they have the potential to benefit
from a more informative ranking of news stories based on their inferred interests.
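
To make the two-index idea concrete, the sketch below builds a small inverted index for each side and lists, per article, the terms it has in common with the tweet index. The data layout (term mapped to per-document counts) and the names `build_index` and `articles_matching_chatter` are hypothetical choices for illustration; the text does not say how Buzzer stores its indexes.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}.

    docs is a dict of doc_id -> text (tweets or RSS article bodies).
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in re.findall(r"[a-z]+", text.lower()):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def articles_matching_chatter(tweet_index, rss_index):
    """Return {article_id: alphabetically sorted terms shared with the tweet index}."""
    matches = defaultdict(list)
    # Only terms present in both indexes can link a tweet to an article.
    for term in sorted(tweet_index.keys() & rss_index.keys()):
        for article_id in rss_index[term]:
            matches[article_id].append(term)
    return dict(matches)
```

Keeping the two indexes separate means the tweet side can be swapped (public timeline or a user's social graph) without rebuilding the RSS side.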
3.2 Strategies
Each Buzzer user brings two types of information to the system, (1) their RSS
feeds and (2) their Twitter social graph, and this suggests a number of different
ways of combining tweets and RSS during recommendation. In this paper, we
explore 4 different news retrieval strategies (S1 to S4) as outlined in Figure 3.
For example, stories/articles can be mined from a user’s personal RSS feeds or
from the RSS feeds of the wider Buzzer community. Moreover, stories can be
ranked based on the tweets of the user’s own Twitter social graph, that is the
tweets of their friends and followers, or from the tweets of the public Twitter
timeline. This gives us 4 different retrieval strategies as follows (as visualized in
Figure 3):
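
The two-by-two design can be written down as a small configuration table. The pairing used below follows the evaluation description reproduced on this page (public tweets for S1 and S3, social-graph tweets for S2 and S4; personal feeds for S1 and S2, community feeds for S3 and S4). The `Strategy` class, the source labels, and the round-robin `interleave` policy are assumptions for illustration: the Figure 2 caption says result sets are interleaved, but not how.

```python
from dataclasses import dataclass
from itertools import zip_longest


@dataclass(frozen=True)
class Strategy:
    name: str
    tweet_source: str  # "public" or "social_graph" (labels are hypothetical)
    rss_source: str    # "personal" or "community" (labels are hypothetical)


STRATEGIES = [
    Strategy("S1", tweet_source="public", rss_source="personal"),
    Strategy("S2", tweet_source="social_graph", rss_source="personal"),
    Strategy("S3", tweet_source="public", rss_source="community"),
    Strategy("S4", tweet_source="social_graph", rss_source="community"),
]


def interleave(result_lists):
    """Round-robin merge of ranked lists, keeping the first copy of any repeat.

    Assumed policy: take the top item of each list in turn, then the second
    item of each list, and so on.
    """
    seen, merged = set(), []
    for tier in zip_longest(*result_lists):
        for item in tier:
            if item is not None and item not in seen:
                seen.add(item)
                merged.append(item)
    return merged
```

With per-strategy result lists in hand, `interleave` yields a single list in which every strategy contributes its top-ranked story before any strategy contributes its second.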

References
Machine learning in automated text categorization
What is Twitter, a social network or a news media?
Fab: content-based, collaborative recommendation
Why we twitter: understanding microblogging usage and communities
Content-based recommendation systems
Frequently Asked Questions (9)
Q1. What are the contributions mentioned in the paper "Terms of a feather: content-based news recommendation and discovery using twitter"?

User-generated content has dominated the web’s recent growth and today the so-called real-time web provides us with unprecedented access to the real-time opinions, views, and ratings of millions of users. In this work, the authors propose that this data can be harnessed as a useful source of recommendation knowledge. The authors describe a social news service called Buzzer that is capable of adapting to the conversations that are taking place on Twitter to ranking personal RSS subscriptions. The authors also present results of a live-user evaluation which demonstrates how these ranking strategies can add better item filtering and discovery value to conventional recency-based RSS ranking techniques.

There are many opportunities for further work within the scope of this research. Some suggestions include considering preference rankings and click-thrus as part of the recommendation algorithm. Also, it will be interesting to consider whether the reputation of users on Twitter has a bearing on how useful their tweets are during ranking. 

In addition, the 35 users registered a total of 281 unique RSS feeds as story sources and during the evaluation period these feeds generated a total of 31,137 unique stories/articles. 

During this timeframe the authors gathered a total of 56 million public tweets (for use in strategies S1 and S3) and 537,307 tweets from the social graphs of the 35 registered users (for use in S2 and S4). 

The authors can see that drawing stories from the larger community of RSS feeds (S3 + S4) attracts fewer click-thrus (approximately 150) than stories that are drawn from the user’s personal RSS feeds (strategies S1 + S2), which attract about 225 click-thrus, which is acceptable and expected.
