
Book Chapter DOI: 10.1007/978-3-319-19578-0_25

A Hybrid Model to Improve Filtering Systems

20 May 2015-pp 303-314

TL;DR: This work models a collaborative filtering system by using the Friend Of A Friend (FOAF) formalism to represent the users and the Dublin Core (DC) vocabulary to represent the resources "items", and adopts the Resource Description Framework (RDF) syntax to describe the various modules of the system.

Abstract: There is a continuous information overload on the Web. The problem treated is how to obtain relevant information (documents, products, services, etc.) in time and without difficulty. Filtering systems, also called recommender systems, have been widely used to recommend relevant resources to users by a similarity process, for example at Amazon, MovieLens, CDNow, etc. The trend is to improve the information filtering approaches to better answer the users' expectations. In this work, we model a collaborative filtering system by using the Friend Of A Friend (FOAF) formalism to represent the users and the Dublin Core (DC) vocabulary to represent the resources "items". In addition, to ensure the interoperability and openness of this model, we adopt the Resource Description Framework (RDF) syntax to describe the various modules of the system. A hybrid function is introduced for the calculation of the prediction. Empirical tests on various real data sets (Book-Crossing, FoafPub) showed satisfactory performances in terms of relevance and precision.

Topics: Recommender system (63%), Collaborative filtering (61%), MovieLens (61%), RDF (57%), Friend of a friend (57%)

Summary

1 Introduction

  • The multiplicity of the services offered via the Web incites Net surfers to expose and communicate an enormous traffic of data in various formats.
  • This phenomenon, known under the name big data, imposes multiple difficulties such as the management, storage, control and security of circulated data.
  • Many commercial and educational sites rely on filtering algorithms to recommend their products, such as Amazon, MovieLens, Netflix and EducationWorld [5].
  • In their study, the authors adopted the RDF model to represent all elements of the system in an open and interoperable manner.
  • Section 3 presents the details of their proposal.

2 State of the Art

  • The number of Internet users reached 38.8% of the world population in 2013, against 0.4% in 1995, according to statistics provided by the ITU (http://www.itu.int/en/ITU-D/Statistics/Pages/stat/default.aspx).
  • According to how the relevance is estimated, researchers classify recommendation algorithms into three main approaches: content-based, collaborative and hybrid [4].
  • Hybrid methods combine the two previous approaches in various manners to attenuate the insufficiencies of each.

3 Proposed approach

  • The authors' study focuses on reducing the sparsity problem through the similarity of items via the values of DC properties, as well as the similarity of users via the values of FOAF properties.
  • The property values are of heterogeneous types (nominal, ordinal, qualitative, etc.), so the authors defined several encoding and normalization functions to convert these properties to a numeric scale.
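The encoding step above can be sketched with a few small functions; the Python below is a minimal illustration (the function names and the example ordinal scale are hypothetical, not taken from the paper), mapping nominal, ordinal and quantitative property values onto the [0, 1] range:

```python
def encode_nominal(a, b):
    # Nominal values (e.g. gender, language) only support equality:
    # 1.0 if the two values match, 0.0 otherwise.
    return 1.0 if a == b else 0.0

def encode_ordinal(value, scale):
    # Map an ordinal value to its rank on the scale, normalized into [0, 1].
    return scale.index(value) / (len(scale) - 1)

def encode_numeric(value, lo, hi):
    # Min-max normalization of a quantitative value into [0, 1].
    return (value - lo) / (hi - lo)
```

For example, `encode_ordinal("good", ["bad", "average", "good", "excellent"])` yields 2/3, and `encode_numeric(5, 0, 10)` yields 0.5.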

3.1 RDF specification

  • The Resource Description Framework RDF (http://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210/) is a data model for describing various types of resources (person, web page, movie, service, book, etc.).
  • It treats the data, its properties and the relationships between them; in other words, it is a formal specification by meta-data, originally designed by the W3C, whose purpose is to allow a community of users to share the same meta-data for shared resources.
  • One of the great advantages of RDF is its extensibility through RDF schemas, which can be integrated and are not mutually exclusive, using the namespace and URI (Uniform Resource Identifier) concepts [7].
  • Thus, in order to keep the collaborative filtering approach, the authors took the feedback of the users into account in the similarity computation; moreover, they used a hybrid function to define the prediction value.

3.2 Item's representation

  • A social FS consists of resources "items", user profiles, and histories that memorize the interactions of users with the recommended items.
  • The authors exploited the meta-data of the Dublin Core vocabulary as a standardized description of items; the attribute values of the vocabulary allowed them to calculate the degree of similarity between items and group them into communities.
  • Dublin Core (DC, http://dublincore.org) is a set of simple and effective elements to describe a wide variety of web resources; the standard version of this format, recommended by the W3C, includes 15 elements whose semantics has been established by an international consensus spanning various disciplines.
  • These elements are gathered in three categories: those which describe the content (Coverage, Description, Relation, Source, Subject, Title, Type), those which describe intellectual property (Contributor, Creator, Publisher, Rights), and those for instantiation (Date, Format, Identifier, Language); the current version is known as 1.1, validated in 2007 and revised in 2012 by the DCMI (Dublin Core Metadata Initiative, http://dublincore.org/documents/dces/).

3.3 User's Representation

  • The objective of the FS is to deliver relevant items to the user; the formation of the communities depends on the attribute values defined in the user profile.
  • Among the most common current practices, the authors adopted the FOAF vocabulary to represent the profiles.
  • It can be used to search for individuals and communities: CVs, social networks and management of online communities, online identification, management of participation in projects, etc.
  • A FOAF file can contain various information (name, family name, dateOfBirth, gender, mbox, homepage, weblog, interest, accountName, knows, etc.).
  • Given the very high number of interacting users, it is very important to form the communities well, as a building block of the FS, assuming "one for all and all for one".
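One possible reading of this profile-based grouping is a weighted agreement score over FOAF attributes. The sketch below is hypothetical: the weights and the plain equality test stand in for the paper's encoded, normalized property values.

```python
def foaf_similarity(u, v, weights):
    """Weighted agreement between two FOAF profiles (dicts of property -> value).
    Each weighted property contributes 1 when both profiles share the value."""
    total = sum(weights.values())
    score = 0.0
    for prop, w in weights.items():
        if prop in u and prop in v and u[prop] == v[prop]:
            score += w
    return score / total

# Hypothetical profiles using FOAF-style property names.
alice = {"gender": "f", "interest": "semantic web", "knows": "bob"}
carol = {"gender": "f", "interest": "databases", "knows": "bob"}
weights = {"gender": 1.0, "interest": 2.0, "knows": 1.0}
```

Here `foaf_similarity(alice, carol, weights)` gives (1 + 0 + 1) / 4 = 0.5; profiles agreeing on the heavily weighted "interest" property would score higher.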

3.4 Recommendation engine

  • Simdc: similarity computed using the Dublin Core vocabulary that describes the items.
  • By identifying each item through its URI and exploiting its own metadata, the sparsity problem is reduced.
  • Simf: similarity that depends on the representation of the profiles by means of the FOAF formalism; thanks to the variety of the fields and the availability of the data in the profile, the authors can overcome the cold-start problem of a new user and form the communities even better.
  • The recommendation process is purely automatic and directly related to the prediction value: a given item is deemed relevant, and deserves to be sent to the user, if and only if its predictive value is greater than a given threshold.
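The engine described above can be outlined as follows. This sketch is an assumption-laden illustration: the blending parameter `alpha`, the similarity-weighted average predictor and the threshold value are stand-ins, since the summary does not give the paper's exact hybrid function.

```python
def hybrid_sim(sim_dc, sim_f, alpha=0.5):
    # Blend the DC-based similarity and the FOAF-based similarity.
    # alpha is an assumed weighting parameter, not taken from the paper.
    return alpha * sim_dc + (1 - alpha) * sim_f

def predict(neighbour_ratings, sims):
    # Similarity-weighted average of the neighbours' ratings (item-CF style).
    den = sum(abs(s) for s in sims)
    if den == 0:
        return 0.0
    return sum(s * r for s, r in zip(sims, neighbour_ratings)) / den

def recommend(prediction, threshold):
    # An item is deemed relevant iff its predicted value exceeds the threshold.
    return prediction > threshold
```

For example, two neighbour items rated 4 and 2 with equal similarities give a prediction of 3.0, which is recommended under a threshold of 2.5.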

4 Experimentation

  • This section is devoted to the experimental results of their hybrid solution on real data sets.
  • For evaluation and comparison, the authors implemented the item-CF (item-based collaborative filtering) approach, widely referenced in collaborative filtering research [6].

4.2 Relevance metrics

  • To evaluate the method presented in this article, the authors retained a metric widely used in FS, the MAE, and two other metrics, recall and precision, from the information retrieval field [16, 9].
  • MAE (Mean Absolute Error) calculates the mean absolute difference between the predictions pi retained by the system and the real evaluations ei given by users.
  • This measure is simple to implement and directly interpretable.
  • Precision: P = Npr / Np, the ratio between the number of relevant items returned by the system (Npr) and the total number of items returned (Np). Recall: R = Npr / Nr, the ratio between the number of relevant items returned and the total number of existing relevant items in the database (Nr).
  • These metrics respectively measure the error, the effectiveness and the quality of the FS.
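These three metrics are straightforward to compute; the following Python mirrors the standard definitions of MAE, precision and recall used in FS evaluation:

```python
def mae(predictions, evaluations):
    # Mean absolute difference between predictions p_i and real evaluations e_i.
    return sum(abs(p - e) for p, e in zip(predictions, evaluations)) / len(evaluations)

def precision_recall(returned, relevant):
    # Npr: relevant items among those the system returned.
    hits = len(set(returned) & set(relevant))
    precision = hits / len(returned)   # P = Npr / Np
    recall = hits / len(relevant)      # R = Npr / Nr
    return precision, recall
```

For instance, returning items {1, 2, 3, 4} when the relevant set is {2, 4, 5} gives a precision of 0.5 and a recall of 2/3.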

4.3 Results and discussion

  • The authors discuss the experimental results obtained; for that, they divide the dataset in two parts, a proportion of 80% dedicated to the training phase and the remaining 20% dedicated to the test phase.
  • The authors also observe that the DC curve shows a slightly better result than the FOAF curve, as the items are identified and enriched by descriptions and meta-data with a certain stability, which works better than valorising links and subjective opinions within a user's network.
  • Moreover, the URI clause for unique resource identification in RDF documents lowers the effect of scalability.
  • The authors also observe that the recall rate, which reaches a maximum of 45% for the optimal hybrid solution, reflects the role of the property values of the adopted vocabularies in filtering only the relevant items.


HAL Id: hal-01789930
https://hal.inria.fr/hal-01789930
Submitted on 11 May 2018
Distributed under a Creative Commons Attribution 4.0 International License
A Hybrid Model to Improve Filtering Systems
Kharroubi Sahraoui, Dahmani youcef, Nouali Omar
To cite this version:
Kharroubi Sahraoui, Dahmani Youcef, Nouali Omar. A Hybrid Model to Improve Filtering Systems. 5th International Conference on Computer Science and Its Applications (CIIA), May 2015, Saida, Algeria. pp. 303-314, DOI: 10.1007/978-3-319-19578-0_25. hal-01789930.

A hybrid model to improve filtering systems
KHARROUBI Sahraoui (1), DAHMANI Youcef (2), and NOUALI Omar (3)
(1) National High School of Computer Science E.S.I, & Ibn Khaldoun University Tiaret, Algeria, s kharoubi@esi.dz
(2) Department of Computer Science, Ibn Khaldoun University, Tiaret, Algeria, dahmani y@yahoo.fr
(3) Basic Software Laboratory, C.E.R.I.S.T, Ben Aknoun, Algeria, o nouali@cerist.dz
Abstract. There is a continuous information overload on the Web. The problem treated is how to obtain relevant information (documents, products, services, etc.) in time and without difficulty. Filtering systems, also called recommender systems, have been widely used to recommend relevant resources to users by a similarity process, for example at Amazon, MovieLens, CDNow, etc. The trend is to improve the information filtering approaches to better answer the users' expectations. In this work, we model a collaborative filtering system by using the Friend Of A Friend (FOAF) formalism to represent the users and the Dublin Core (DC) vocabulary to represent the resources "items". In addition, to ensure the interoperability and openness of this model, we adopt the Resource Description Framework (RDF) syntax to describe the various modules of the system. A hybrid function is introduced for the calculation of the prediction. Empirical tests on various real data sets (Book-Crossing, FoafPub) showed satisfactory performances in terms of relevance and precision.

Keywords: Recommender systems, Resource description framework, Dublin core, FOAF, Semantic.
1 Introduction

The multiplicity of the services offered via the Web incites Net surfers to expose and communicate an enormous traffic of data in various formats. The gigantic mass of existing information and the speed of its instantaneous production trigger the problem of informational overload. This phenomenon, known under the name big data, imposes multiple difficulties such as the management, storage, control and security of circulated data. On the other hand, access to relevant information in time is a major occupation of developers and users; in spite of its availability, the information is lost in the mass. The performance of existing tools degrades when handling a large volume of data; more precisely, search engines are affected by this phenomenon in terms of recall and precision, as well as in the indexing process. Our work falls more particularly under the information filtering tab, specifically custom filtering, in order to submit useful information to the users. Many commercial and educational sites are based on filtering algorithms to recommend their products, such as Amazon, MovieLens, Netflix, EducationWorld etc. [5]. Filtering systems (FS), known as "recommender systems", have become essential with the increasing variety of web resources such as news, games, videos, documents and others [10]. The majority of recent FS explore semantic information and share the metadata of the resources in order to improve the relevance factor [8]. Additionally, another type of these systems is based on ontologies for conceptualizing and valorising the application domain, which makes it possible to increase their performance [1]. However, FS suffer from some common weaknesses, such as cold start, sparsity and scalability. In our study, we adopted the RDF model to represent all elements of the system in an open and interoperable manner. With the Friend Of A Friend (FOAF) formalism, we weighted the attributes of the user profiles in order to gather them by degree of similarity. In addition, the items of the system are represented by the Dublin Core (DC) vocabulary in the RDF model to describe the web resources formally. These two formalisms, recommended by the W3C, ensure interoperability and easy integration of the data. This approach allowed us to avoid focusing on a specific and closed field, and to treat all kinds of resources using the URI and namespace clauses. The rest of the paper is organized as follows: in Section 2 we briefly review the various forms of FS; Section 3 presents the details of our proposal; the results of experiments, followed by discussions, are exposed in Section 4; finally, we conclude our work with a conclusion and perspectives.
2 State of the Art

The number of Internet users reached 38.8% of the world population in 2013, against 0.4% in 1995, according to statistics provided by the ITU (http://www.itu.int/en/ITU-D/Statistics/Pages/stat/default.aspx). On the other hand, resources, commonly called items, appear at an incredible speed, produced either by users or by companies. Current tools cannot cope with this huge volume of data when it comes to analyzing it, controlling it, or obtaining relevant information in time. FS were born to manage information overload by filtering [3, 8]. Items can be extremely varied: DVDs, books, images, web pages, restaurants, etc. These systems are now increasingly present on the web and will certainly become essential in the future with the continuous increase of data [12]. According to how the relevance is estimated, researchers classify recommendation algorithms into three main approaches: content-based, collaborative and hybrid [4]. In the first approach, the system relies on the content of the thematic items "documents" to compare them with a user profile, itself consisting of topics explaining the user's interests; that is to say, the system compares the document themes with those of the profile and decides whether the document is recommended or rejected according to the threshold of a satisfaction function [17]. In the second approach, also known as social, the system uses the ratings of certain items or users in order to recommend them to other users through the application of a similarity process, without it being necessary to analyze the content of the items [2]. In this approach, there are two main techniques: one builds on memory-based algorithms, which operate on a portion or all of the ratings to generate a new prediction [12], and the other is founded on model-based algorithms, which create a descriptive model of the user and then estimate the prediction. Collaborative approaches are widely adopted in recommender systems such as Tapestry [4], GroupLens [15], Amazon, Netflix, etc. The hybrid methods combine the two previous approaches in various manners to attenuate the insufficiencies of each. Recently, a new generation of FS has been boosted by semantic web formalisms or adapted to contexts that use taxonomies or ontologies [13]. Commonly, these systems have shortcomings that hamper the recommendation process and degrade their performance: the funnel effect, where the user does not profit from the innovation and diversity of the recommended items in content-based filtering; scalability, where the system handles a large number of users and items online, which makes it difficult to predict in time; the sparsity problem, where there is a lack of sufficient evaluations to estimate the prediction; as well as the cold-start problem for a user and/or item recently integrated into the system [11]. In this paper, we extend filtering systems into an open and interoperable specification; each component of the system is formalized by an appropriate RDF vocabulary. The following section explains the basic concepts of this specification.
3 Proposed approach

Our study focuses on reducing the sparsity problem through the similarity of items via the values of DC properties, as well as the similarity of users through the values of FOAF properties. The property values are of heterogeneous types (nominal, ordinal, qualitative, etc.), so we have defined several encoding and normalization functions to convert these properties to a numeric scale, i.e. quantitative values in the range [0, 1].
3.1 RDF specification

The Resource Description Framework RDF (http://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210/) is a data model for the description of various types of resources (person, web page, movie, service, book, etc.). It treats the data, its properties and the relationships between them; in other words, it is a formal specification by meta-data, originally designed by the W3C, whose purpose is to allow a community of users to share the same meta-data for shared resources. An RDF document is a set of triples <subject, predicate, object>, where the subject is the resource to be described, the predicate is a property of this resource, and the object is the value of this property or another resource. One of the great advantages of RDF is its extensibility through RDF schemas, which can be integrated and are not mutually exclusive, using the namespace and URI (Uniform Resource Identifier) concepts [7]. It is always possible to present an RDF document as a labelled directed graph. For example, "the book Semantic Web for the Working Ontologist written by Dean Allemang on July 5, 2011", in RDF/XML syntax:
<?xml version="1.0"?>
<rdf:RDF xmlns:ss="http://workingontologist.org/"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdf:Description rdf:about="http://www.amazon.fr/Semantic-Web-Working-Ontologist-Effective/dp/0123859654/">
    <ss:writtenBy rdf:resource="http://www.cs.bu.edu/fac/allemang/"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://www.amazon.fr/Semantic-Web-Working-Ontologist-Effective/dp/0123859654/">
    <ss:hasTitle>Semantic Web for the Working Ontologist</ss:hasTitle>
  </rdf:Description>
  <rdf:Description rdf:about="http://www.amazon.fr/Semantic-Web-Working-Ontologist-Effective/dp/0123859654/">
    <ss:hasDate>July 5, 2011</ss:hasDate>
  </rdf:Description>
</rdf:RDF>
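The RDF/XML above states three triples about a single subject. A minimal Python sketch of the same triple model might look as follows (the predicate for the author link is written here as `ss:writtenBy`, a normalization of the garbled "written by" element name in the extracted source):

```python
# The three statements above, as (subject, predicate, object) triples.
book = "http://www.amazon.fr/Semantic-Web-Working-Ontologist-Effective/dp/0123859654/"
triples = [
    (book, "ss:writtenBy", "http://www.cs.bu.edu/fac/allemang/"),
    (book, "ss:hasTitle", "Semantic Web for the Working Ontologist"),
    (book, "ss:hasDate", "July 5, 2011"),
]

def objects_of(triples, subject, predicate):
    # All object values attached to a given subject/predicate pair.
    return [o for s, p, o in triples if s == subject and p == predicate]
```

Querying `objects_of(triples, book, "ss:hasTitle")` returns the single title value, illustrating how a predicate acts as a named edge from the subject to its object.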
Our solution (Figure 1) is based on a modelling in RDF through the FOAF and Dublin Core standards, describing the set of users and items.
Fig. 1. Overall scheme of the proposal

References

- Adomavicius, G., Tuzhilin, A.: Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering 17(6), 734-749 (2005)
- Sarwar, B., Karypis, G., Konstan, J., Riedl, J.: Item-based collaborative filtering recommendation algorithms. In: Proceedings of the 10th International Conference on World Wide Web (WWW), pp. 285-295 (2001)
- Resnick, P., Iacovou, N., Suchak, M., Bergstrom, P., Riedl, J.: GroupLens: an open architecture for collaborative filtering of netnews. In: Proceedings of the ACM Conference on Computer Supported Cooperative Work (CSCW), pp. 175-186 (1994)
- Herlocker, J.L., Konstan, J.A., Terveen, L.G., Riedl, J.T.: Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems 22(1), 5-53 (2004)
- Goldberg, D., Nichols, D., Oki, B.M., Terry, D.: Using collaborative filtering to weave an information tapestry. Communications of the ACM 35(12), 61-70 (1992)