What have the authors stated for future works in "Terms of a feather: content-based news recommendation and discovery using twitter!" ?

There are many opportunities for further work within the scope of this research. Some suggestions include considering preference rankings and click-thrus as part of the recommendation algorithm. Also, it will be interesting to consider whether the reputation of users on Twitter has a bearing on how useful their tweets are during ranking.

How many unique stories were generated during the evaluation period?

In addition, the 35 users registered a total of 281 unique RSS feeds as story sources and during the evaluation period these feeds generated a total of 31,137 unique stories/articles.

What is the meaning of the term content-based recommender?

The authors are interested in the potential to use near-ubiquitous user-generated content as a source of preference and profiling information in order to drive recommendation; as such, in this research context Buzzer is termed a content-based recommender.

How many tweets were retrieved from the social graphs of the 35 registered users?

During this timeframe the authors gathered a total of 56 million public tweets (for use in strategies S1 and S3) and 537,307 tweets from the social graphs of the 35 registered users (for use in S2 and S4).

How many click-thrus are expected to be generated by the strategy?

The authors observe that drawing stories from the larger community of RSS feeds (S3 + S4) attracts fewer click-thrus (approximately 150) than stories drawn from the user’s personal RSS feeds (S1 + S2), which attract about 225 click-thrus; they describe this as acceptable and expected.

(Open Access) Terms of a feather: content-based news recommendation and discovery using twitter (2011) | Owen Phelan

Q: What are the contributions mentioned in the paper "Terms of a feather: content-based news recommendation and discovery using twitter!" ?

User-generated content has dominated the web’s recent growth and today the so-called real-time web provides us with unprecedented access to the real-time opinions, views, and ratings of millions of users. In this work, the authors propose that this data can be harnessed as a useful source of recommendation knowledge. The authors describe a social news service called Buzzer that is capable of adapting to the conversations that are taking place on Twitter to rank personal RSS subscriptions. The authors also present results of a live-user evaluation which demonstrates how these ranking strategies can add better item filtering and discovery value to conventional recency-based RSS ranking techniques.

Q: What is the purpose of the paper?

In this paper, the authors consider trending and emerging topics on user-generated content sites like Twitter as a way to automatically derive recommendation data for topical news and web-item discovery.

Q: What is the role of the user’s RSS feed?

Buzzer itself is developed as a web application and can take the place of a user’s normal RSS reader: the user continues to have access to their favourite RSS feeds but in addition, by syncing Buzzer with their Twitter account, they have the potential to benefit from a more informative ranking of news stories based on their inferred interests.

Q: What is the definition of the space of documents?

The space of documents themselves could be defined by Brusilovsky and Henze as an Open Corpus Adaptive Hypermedia System in that there is an open corpus of documents (though topic specific), that can constantly change and expand [3].

Terms of a Feather: Content-Based News Recommendation and Discovery Using Twitter

Owen Phelan, Kevin McCarthy, Mike Bennett, and Barry Smyth

CLARITY: Centre for Sensor Web Technologies

School of Computer Science & Informatics

University College Dublin

firstname.lastname@ucd.ie

Abstract. User-generated content has dominated the web’s recent growth and today the so-called real-time web provides us with unprecedented access to the real-time opinions, views, and ratings of millions of users. For example, Twitter’s 200m+ users are generating in the region of 1000+ tweets per second. In this work, we propose that this data can be harnessed as a useful source of recommendation knowledge. We describe a social news service called Buzzer that is capable of adapting to the conversations that are taking place on Twitter to rank personal RSS subscriptions. This is achieved by a content-based approach of mining trending terms from both the public Twitter timeline and from the timeline of tweets published by a user’s own Twitter friend subscriptions. We also present results of a live-user evaluation which demonstrates how these ranking strategies can add better item filtering and discovery value to conventional recency-based RSS ranking techniques.

1 Introduction

The real-time web (RTW) is emerging as new technologies enable a growing number of users to share information in multi-dimensional contexts. Sites such as Twitter (www.twitter.com) and Foursquare (www.foursquare.com) are platforms for real-time blogging, messaging and live video broadcasting to friends and a wider global audience. Companies can get instantaneous feedback on products and services from RTW sites such as Blippr (www.blippr.com). Our research focusses on the real-time web, in all of its various forms, as a potentially powerful source of recommendation data. For example, we consider the possibility of mining user profiles based on their Twitter postings. If so, we can use this profile information as a way to rank items, user recommendations, products and services for these users, even in the absence of more traditional forms of preference data or transaction histories [6]. We may also provide a practical solution to the cold-start problem [13] of sparse profiles of users’ interests, an issue that has plagued many item discovery and recommender systems to date.

This work is gratefully supported by Science Foundation Ireland under Grant No. 07/CE/11147 CLARITY CSET.

P. Clough et al. (Eds.): ECIR 2011, LNCS 6611, pp. 448–459, 2011.

© Springer-Verlag Berlin Heidelberg 2011

Online news is a well-trodden research field, with many good reasons why IR and AI techniques have the potential to improve the way we consume news online. For a start there is the sheer volume of news stories that users must deal with, plus we have varied tastes and preferences with respect to what we are interested in reading about. At the same time, news is a biased form of media that is increasingly driven by the stories that are capable of selling advertising. Niche stories that may be of interest to a small portion of readers often get buried. All of this has contributed to a long history of using recommender systems to help users navigate through the sea of stories that are published every day based on learned profiles of users. For example, Google News (http://news.google.com) is a topically segregated mashup of a number of feeds, with automatic ranking strategies based on user interactions (click-histories & click-thrus) [5]. It is an example of a hybrid technique for news recommendation, as it utilises a user’s search keywords from Google itself as a support for explicit ratings. Another popular example is Digg (www.digg.com), whose webpage rating service generally leads to a high overlap of selected topical news items [12].

This paper extends some of the previous work presented in [15], which described an early prototype of the Buzzer system, in two ways. First, we describe a more comprehensive and robust recommendation framework that has been extended both in terms of the different sources of recommendation knowledge and the recommendation strategies that it uses. Secondly, we describe the result of a live-user evaluation with 35 users over a 1 month period, and based on more than 30,000 news stories and in excess of 50 million Twitter messages, the results of which describe interesting usage patterns compared to recency benchmarks.
2 Background

Many research opportunities remain when considering how to adapt recommendation techniques to tackle the so-called information explosion on the web. Digg, for example, mines implicit click-thrus of articles as well as ratings and user-tagging folksonomies as a basis of content retrieval for users [12]. One of the byproducts of Digg’s operation is that users’ browsing and sharing activities generally involve socially or temporally topical items, so as such it has been branded as a sort of news service [12]. Difficulties arise where it is necessary for many users to implicitly (click, share, tag) and explicitly (star or digg) rate items many times for those items to emerge as topical things. Also, there would be considerable item churn, that is, the corpus of data is constantly updating and item relevances are constantly fluctuating. The space of documents itself could be defined, following Brusilovsky and Henze, as an Open Corpus Adaptive Hypermedia System, in that there is an open corpus of documents (though topic-specific) that can constantly change and expand [3].

Google News is a popular service that uses (mostly unpublished) recommendation techniques to filter 4500 partner news providers to present an aggregated view for registered users of popular and topical content [5]. Items are usually between several seconds and 30 days old, and appear on the “front page” based on click-thrus and keyword terms from Google search queries. The ranking itself is mostly based on click-thru rates of items: higher ranked items have more clicks. Issues arise with new and topical items struggling to get to the “front page”, as it is necessary for a critical mass of clicks from many users. Das et al. [5] mostly describe scalability of the system as an issue with the service, and propose several techniques known to be capable of dealing with such issues. These included MinHash, Probabilistic Latent Semantic Indexing (PLSI) and Locality-Sensitive Hashing (LSH) as component algorithms in an overall hybrid system.
Content-based approaches are widely discussed in many branches of recommender systems [2,9,13,14]. Examples such as the News@Hand semantic system by Cantador et al. [4] show encouraging moves towards considering the content of the news items themselves. The authors use semantic annotation and ontologies to structure items into groups, while matching this to similarly structured user profiles of preferred items; unfortunately, the success of these is based on the quality and existence of established domain ontologies. Our approach is to look at the most atomic components of the content, the individual terms themselves.

There is currently considerable research attention being paid to Twitter and the real-time web in general. RTW services provide access to new types of information and the real-time nature of these data streams provides as many opportunities as it does challenges. In addition, companies like Twitter and Yahoo have adopted a very open approach to making their data available and Twitter’s developer API, for example, provides researchers with access to a huge volume of information. It is no surprise then that the recent literature includes analyses of Twitter’s real-time data, largely with a view to developing an early understanding of why and how people are using services like Twitter [7,8,11]. For instance, the work of Kwak et al. [11] describes a very comprehensive analysis of Twitter users and Twitter usage, covering almost 42m users, nearly 1.5bn social connections, and over 100m tweets. In this work, the authors have examined reciprocity and homophily among Twitter users, they have compared a number of different ways to evaluate user influence, as well as investigating how information diffuses through the Twitter ecosystem as a result of social relationships and retweeting behaviour. Similarly, Krishnamurthy et al. identify classes of Twitter users based on behaviours and geographical dispersion [10]. They highlight the process of producing and consuming content based on retweet actions, where users source and disseminate information through the network.

We are interested in the potential to use near-ubiquitous user-generated content as a source of preference and profiling information in order to drive recommendation; as such, in this research context Buzzer is termed a content-based recommender. User-generated content is inherently noisy but it is plentiful, and recently researchers have started to consider its utility in recommendation. There has been some recent work [17] on the role of tags in recommender systems, and researchers have also started to leverage user-generated reviews as a way to recommend and filter products and services. For example, Acair et al. look at the use of user-generated movie reviews from IMDb as part of a movie recommender system [1] and similar ideas are discussed in [18].


Fig. 1. A screenshot of Buzzer, with personalized news results for a given user

Both of these instances of related work look to mine review content as an additional source of recommendation knowledge (in a similar way to the content-boosted collaborative filtering technique in Melville et al. [13]), but they rely on the availability of detailed item reviews, which may run to hundreds of words but which may not always be available. In this paper, we consider trending and emerging topics on user-generated content sites like Twitter as a way to automatically derive recommendation data for topical news and web-item discovery.

3 The Buzzer System

People talk about news and events on Twitter all of the time. They share web pages about news stories. They express their views on recent stories. They even report on emerging news stories as they happen. Surely then it is logical to think of Twitter as a source of news information and news preferences? The challenge of course is that Twitter is borderline chaotic: tweets are little more than impressions of information through fleeting moments of time. Can we really hope to make sense of this signal and noise and harness the chaos as a way to search, filter and rank news stories? This is the objective of the research presented in this paper. Specifically, we aim to mine Twitter information, from both public data streams and the streams of related users, as a way to identify discriminating terms that are capable of being used to highlight breaking and interesting news stories.

As such the Buzzer system adopts a content-based technique to recommending news articles, but instead of using structured user profiles we use unstructured real-time feeds from Twitter. In effect, the user messages (tweets) themselves act as an implicit ratings system for promoting and filtering content for retrieval in a large space of items of varied topicality or relevance to users.

[Figure 2 diagram: the Tweets Index (public or user social graph) and the RSS Index (community or user) are strategy x's input data feeds. The Co-occurring Term Gatherer gathers a weighted vector of terms from both and finds co-occurring terms {Q}. The Recommendation Engine queries the RSS Index to find articles, aggregates scores, ranks the articles based on the summed scores, generates a term-frequency tag cloud, and returns the ranked list of articles (RecList) as the strategy x result list.]

Fig. 2. Generating results for a given strategy. The system mines a specified RSS and Twitter source and uses the co-occurring technique described to generate a set of results, which will be interleaved with other sets to produce the final list shown to users.

3.1 System Architecture

The high-level Buzzer system architecture is presented in Figure 2. In summary, Buzzer generates two content indexes, one from Twitter (including public tweets and Buzzer-user tweets as discussed below) and one from the RSS feeds of Buzzer users. Buzzer looks for correlations between the terms that are present in tweets and RSS articles and ranks articles accordingly. In this way, articles with content that appears to match the content of recent Twitter chatter (whether public or user related) will receive high scores during recommendation. Figure 1 shows a sample list of recommendations for a particular user. Buzzer itself is developed as a web application and can take the place of a user’s normal RSS reader: the user continues to have access to their favourite RSS feeds but in addition, by syncing Buzzer with their Twitter account, they have the potential to benefit from a more informative ranking of news stories based on their inferred interests.
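
The term-matching step can be illustrated with a short Python sketch. It is not Buzzer's implementation: this excerpt does not specify the tokenizer, stop-word list, or weighting scheme, so the raw term-frequency product, the tiny `STOPWORDS` set, and all function names below are assumptions made purely for illustration.

```python
import re
from collections import Counter

# Assumed, deliberately tiny stop-word list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "on", "for"}


def tokenize(text):
    """Lowercase the text and keep alphanumeric terms that are not stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def term_weights(documents):
    """Build a weighted term vector (here: raw term frequency) over documents."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    return counts


def rank_articles(tweets, articles):
    """Score each article by the terms it shares with the tweets, then rank.

    `articles` maps a title to its body text. An article's score is the sum,
    over co-occurring terms, of (tweet frequency * article frequency).
    Articles that share no term with the tweets are left out.
    """
    tweet_terms = term_weights(tweets)
    scores = {}
    for title, body in articles.items():
        article_terms = Counter(tokenize(body))
        shared = tweet_terms.keys() & article_terms.keys()
        score = sum(tweet_terms[t] * article_terms[t] for t in shared)
        if score > 0:
            scores[title] = score
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

Passing a user's social-graph tweets rather than public tweets, or community articles rather than personal ones, changes only the inputs, which is how the same scoring step could serve each retrieval strategy.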

3.2 Strategies

Each Buzzer user brings two types of information to the system, (1) their RSS feeds and (2) their Twitter social graph, and this suggests a number of different ways of combining tweets and RSS during recommendation. In this paper, we explore 4 different news retrieval strategies (S1–S4) as outlined in Figure 3. For example, stories/articles can be mined from a user’s personal RSS feeds or from the RSS feeds of the wider Buzzer community. Moreover, stories can be ranked based on the tweets of the user’s own Twitter social graph, that is the tweets of their friends and followers, or from the tweets of the public Twitter timeline. This gives us 4 different retrieval strategies as follows (as visualized in Figure 3):
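
The four strategies form a 2×2 grid of story source and tweet source. The sketch below enumerates that grid in Python. The S1–S4 numbering is inferred from the evaluation figures quoted in the Q&A above (S1 and S3 rank with public tweets, S2 and S4 with social-graph tweets; S1 and S2 draw stories from personal feeds, S3 and S4 from community feeds), and the class, field, and constant names are illustrative rather than taken from the paper.

```python
from dataclasses import dataclass
from itertools import product

# Assumed labels for the two RSS story sources and the two tweet sources.
RSS_SOURCES = ("personal", "community")
TWEET_SOURCES = ("public", "social_graph")


@dataclass(frozen=True)
class Strategy:
    name: str          # "S1" .. "S4"
    rss_source: str    # where candidate stories come from
    tweet_source: str  # whose tweets are used to rank them


def build_strategies():
    """Enumerate every (RSS source, tweet source) pair, numbered in order."""
    combos = product(RSS_SOURCES, TWEET_SOURCES)
    return [
        Strategy(f"S{i}", rss, tweets)
        for i, (rss, tweets) in enumerate(combos, start=1)
    ]


for s in build_strategies():
    print(f"{s.name}: {s.rss_source} RSS ranked by {s.tweet_source} tweets")
```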
