
A Hybrid Model to Improve Filtering Systems


Summary

1 Introduction

  • The multiplicity of services offered via the Web encourages users to publish and exchange an enormous volume of data in various formats.
  • This phenomenon, known as big data, raises multiple difficulties such as the management, storage, control and security of circulating data.
  • Many commercial and educational sites rely on filtering algorithms to recommend their products, such as Amazon, MovieLens, Netflix and EducationWorld [5].
  • In their study, the authors adopt the RDF model to represent all elements of the system in an open and interoperable manner.
  • Section 3 presents the details of their proposal.

2 State of the Art

  • The number of Internet users reached 38.8% of the world population in 2013, against 0.4% in 1995, according to statistics provided by the ITU (http://www.itu.int/en/ITU-D/Statistics/Pages/stat/default.aspx).
  • Depending on how relevance is estimated, researchers classify recommendation algorithms into three main approaches: content-based, collaborative and hybrid [4].
  • Hybrid methods attenuate the shortcomings of each of the two previous approaches by combining them in various ways.

3 Proposed approach

  • The authors' study focuses on reducing the sparsity problem through the similarity of items, computed from the values of DC properties, and the similarity of users, computed from the values of FOAF properties.
  • Property values are of heterogeneous types (nominal, ordinal, qualitative, etc.), so the authors define several encoding and normalization functions to convert them to a numeric scale.

3.1 RDF specification

  • The Resource Description Framework (RDF) (http://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210/) is a data model for describing various types of resources (persons, web pages, movies, services, books, etc.).
  • It describes data, their properties and the relationships between them; in other words, it is a formal metadata specification, originally designed by the W3C, whose purpose is to allow a community of users to share the same metadata for shared resources.
  • One great advantage of RDF is its extensibility: RDF schemas can be combined rather than being mutually exclusive, thanks to the namespace and URI (Uniform Resource Identifier) concepts [7].
  • To retain the collaborative filtering approach, the authors take user feedback into account when computing similarity, and they use a hybrid function to define the prediction value.

3.2 Item's representation

  • A social FS consists of resources (items), user profiles, and histories that record the users' interactions with the recommended items.
  • The authors use the Dublin Core metadata vocabulary as a standardized description of items; the attribute values of this vocabulary allow them to compute the degree of similarity between items and to group items into communities.
  • Dublin Core (DC) (http://dublincore.org), recommended by the W3C, is a set of simple and effective elements for describing a wide variety of web resources; the standard version of this format includes 15 elements whose semantics were established by an international consensus spanning various disciplines.
  • These elements fall into three categories: those describing content (Coverage, Description, Type, Relation, Source, Subject, Title), those describing intellectual property (Contributor, Creator, Publisher, Rights), and those describing instantiation (Date, Format, Identifier, Language). The current version, 1.1, was validated in 2007 and revised in 2012 by the DCMI (Dublin Core Metadata Initiative, http://dublincore.org/documents/dces/).
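As an illustration only, the three DC element groups can be held in a lookup table, and an item similarity over DC values can be sketched as the mean Jaccard overlap of the values two items carry for a few selected elements. The element names follow DCMES 1.1; the choice of Jaccard, the equal weighting, the selected elements and the function name `sim_dc` are assumptions for this sketch, not the authors' actual Sim_dc.

```python
# DCMES 1.1 elements grouped into content, intellectual property and instantiation.
DC_GROUPS = {
    "content": ["coverage", "description", "type", "relation", "source", "subject", "title"],
    "intellectual_property": ["contributor", "creator", "publisher", "rights"],
    "instantiation": ["date", "format", "identifier", "language"],
}


def jaccard(a: set, b: set) -> float:
    """Jaccard overlap |a & b| / |a | b|; 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def sim_dc(item_a: dict, item_b: dict,
           elements=("subject", "creator", "type", "language")) -> float:
    """Hypothetical Sim_dc: mean Jaccard overlap over selected DC elements.

    Each item maps a DC element name to a set of values (e.g. subject keywords).
    Elements that neither item provides are skipped rather than counted as 0.
    """
    scores = []
    for el in elements:
        va, vb = set(item_a.get(el, ())), set(item_b.get(el, ()))
        if va or vb:
            scores.append(jaccard(va, vb))
    return sum(scores) / len(scores) if scores else 0.0


book1 = {"subject": {"semantic web", "rdf"}, "creator": {"Allemang"}, "language": {"en"}}
book2 = {"subject": {"rdf", "owl"}, "creator": {"Hendler"}, "language": {"en"}}
```

Here `sim_dc(book1, book2)` averages 1/3 (subject), 0 (creator) and 1 (language), giving 4/9; `type` is skipped because neither item sets it.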

3.3 User's Representation

  • The objective of an FS is to deliver relevant items to the user, and the formation of communities depends on the attribute values defined in the user profile.
  • Following a common current practice, the authors adopt the FOAF vocabulary to represent user profiles.
  • FOAF can be used to search for individuals and communities, for example for CVs, social networks and the management of online communities, online identification, and managing participation in projects.
  • A FOAF file can contain various information (name, family name, dateOfBirth, gender, mbox, homepage, weblog, interest, accountName, knows, etc.).
  • Given the very large number of interacting users, it is important to form communities well, since they are a building block of the FS ("one for all and all for one").
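A minimal sketch of a profile similarity over FOAF-style fields, assuming profiles are plain dictionaries: `gender` is treated as nominal (equality), `age` as numeric (scaled gap), and `interest` and `knows` as sets (Jaccard overlap). The field selection, the `max_age_gap` parameter and the equal weighting are hypothetical; the summary does not give the authors' Sim_f formula.

```python
def sim_foaf(p: dict, q: dict, max_age_gap: float = 50.0) -> float:
    """Hypothetical Sim_f between two FOAF-style profiles (dicts).

    - 'gender': nominal, 1.0 if equal else 0.0
    - 'age': numeric, 1 - |gap| / max_age_gap, floored at 0.0
    - 'interest', 'knows': sets, Jaccard overlap
    A field is used only when both profiles provide it, so sparse
    (cold-start) profiles are still compared on what they do contain.
    """
    parts = []
    if "gender" in p and "gender" in q:
        parts.append(1.0 if p["gender"] == q["gender"] else 0.0)
    if "age" in p and "age" in q:
        parts.append(max(0.0, 1.0 - abs(p["age"] - q["age"]) / max_age_gap))
    for key in ("interest", "knows"):
        a, b = set(p.get(key, ())), set(q.get(key, ()))
        if a and b:
            parts.append(len(a & b) / len(a | b))
    return sum(parts) / len(parts) if parts else 0.0


alice = {"gender": "female", "age": 30, "interest": {"rdf", "music"}, "knows": {"bob"}}
carol = {"gender": "female", "age": 40, "interest": {"rdf"}}
```

For these two profiles the components are 1.0 (gender), 0.8 (age) and 0.5 (interest); `knows` is skipped because Carol lists no acquaintances, so the score is their mean.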

3.4 Recommendation engine

  • $Sim_{dc}$: a similarity computed from the Dublin Core description of items.
  • Identifying each item by its URI and exploiting its own metadata helps reduce the sparsity problem.
  • $Sim_{f}$: a similarity based on representing profiles in the FOAF formalism; thanks to the variety of fields and the availability of profile data, the authors can overcome the cold-start problem for a new user and form communities better.
  • The recommendation process is fully automatic and directly tied to the prediction value: a given item is deemed relevant and sent to the user if and only if its predicted value is greater than a given threshold.
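The summary does not spell out the hybrid prediction function, so the following is only a hedged sketch of one plausible shape: a linear blend, with a hypothetical weight `alpha`, of an item-side prediction (the user's own ratings weighted by an item similarity standing in for Sim_dc) and a user-side prediction (other users' ratings of the item weighted by a user similarity standing in for Sim_f). Items are recommended only when the prediction strictly exceeds the threshold, as the text states; the threshold value 3.5 is an assumption.

```python
from typing import Callable, Dict, Optional, Tuple

Ratings = Dict[Tuple[str, str], float]  # (user, item) -> rating


def weighted_avg(pairs) -> Optional[float]:
    """Similarity-weighted mean of (similarity, rating) pairs; None if no positive weight."""
    num = sum(s * r for s, r in pairs if s > 0)
    den = sum(s for s, _ in pairs if s > 0)
    return num / den if den > 0 else None


def hybrid_predict(user: str, item: str, ratings: Ratings,
                   item_sim: Callable[[str, str], float],
                   user_sim: Callable[[str, str], float],
                   alpha: float = 0.5) -> Optional[float]:
    """Hypothetical hybrid: alpha * item-side + (1 - alpha) * user-side prediction."""
    item_side = weighted_avg([(item_sim(item, j), r) for (u, j), r in ratings.items()
                              if u == user and j != item])
    user_side = weighted_avg([(user_sim(user, v), r) for (v, j), r in ratings.items()
                              if j == item and v != user])
    if item_side is None:
        return user_side  # fall back to whichever side has evidence
    if user_side is None:
        return item_side
    return alpha * item_side + (1 - alpha) * user_side


def recommend(user, items, ratings, item_sim, user_sim, threshold=3.5, alpha=0.5):
    """Return unrated items whose predicted value is strictly greater than the threshold."""
    out = []
    for i in items:
        if (user, i) in ratings:
            continue  # the user already rated this item
        p = hybrid_predict(user, i, ratings, item_sim, user_sim, alpha)
        if p is not None and p > threshold:
            out.append(i)
    return out


ratings = {("u1", "a"): 5, ("u1", "b"): 2, ("u2", "c"): 4, ("u3", "c"): 1}
item_sim = lambda i, j: {("c", "a"): 0.8, ("c", "b"): 0.2}.get((i, j), 0.0)
user_sim = lambda u, v: {("u1", "u2"): 0.9, ("u1", "u3"): 0.1}.get((u, v), 0.0)
```

With this toy data the item side for (u1, c) is 4.4 and the user side is 3.7, so the blended prediction is about 4.05 and `c` is recommended at threshold 3.5.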

4 Experimentation

  • This section presents the experimental results of their hybrid solution on real data sets.
  • For evaluation and comparison, the authors implemented item-based collaborative filtering (item-CF), an approach widely referenced in collaborative filtering research [6].
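For reference, a minimal item-based CF baseline can be sketched as follows. This uses plain cosine similarity over co-rated users and a similarity-weighted mean of the user's own ratings; adjusted cosine (subtracting each user's mean) is a common alternative, and the summary does not say which variant the authors implemented.

```python
import math


def item_cosine(ratings: dict, i: str, j: str) -> float:
    """Cosine similarity between items i and j over users who rated both."""
    common = [u for u in ratings if i in ratings[u] and j in ratings[u]]
    if not common:
        return 0.0
    dot = sum(ratings[u][i] * ratings[u][j] for u in common)
    ni = math.sqrt(sum(ratings[u][i] ** 2 for u in common))
    nj = math.sqrt(sum(ratings[u][j] ** 2 for u in common))
    return dot / (ni * nj)


def item_cf_predict(ratings: dict, user: str, item: str):
    """Predict a rating as the similarity-weighted mean of the user's other ratings."""
    num = den = 0.0
    for j, r in ratings[user].items():
        if j == item:
            continue
        s = item_cosine(ratings, item, j)
        if s > 0:
            num += s * r
            den += s
    return num / den if den > 0 else None  # None: no similar rated item


ratings = {"u1": {"a": 5, "b": 3}, "u2": {"a": 4, "b": 2, "c": 5}, "u3": {"a": 1, "c": 2}}
```

Here `ratings` maps each user to an `{item: rating}` dictionary; `item_cf_predict(ratings, "u1", "c")` returns roughly 3.995.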

4.2 Relevance metrics

  • To evaluate the method presented in this article, the authors use a metric specific to and widely used in FS, the MAE, and two metrics from the information retrieval field, recall and precision [16, 9].
  • Mean Absolute Error (MAE) is the mean absolute difference between the predictions $p_i$ produced by the system and the actual ratings $e_i$ given by users: $\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N} |p_i - e_i|$, where $N$ is the number of predictions.
  • This measure is simple to implement and directly interpretable.
  • Precision: $P = N_{pr}/N_{r}$, the ratio between the number of relevant items returned by the system ($N_{pr}$) and the total number of items returned ($N_{r}$).
  • Recall: the ratio between the number of relevant items returned by the system and the total number of relevant items in the database.
  • These metrics respectively measure the error, the effectiveness and the quality of the FS.
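The three metrics can be computed as in this minimal sketch, which follows the standard definitions above. Returned and relevant items are treated as sets of item identifiers, and precision or recall is reported as 0.0 when its denominator is empty (a convention assumed here).

```python
def mae(predictions, actual) -> float:
    """Mean absolute error between predicted p_i and actual ratings e_i."""
    if len(predictions) != len(actual) or not predictions:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - e) for p, e in zip(predictions, actual)) / len(predictions)


def precision_recall(returned, relevant):
    """Precision = N_pr / N_r and recall = N_pr / (number of relevant items)."""
    returned, relevant = set(returned), set(relevant)
    n_pr = len(returned & relevant)  # relevant items actually returned
    precision = n_pr / len(returned) if returned else 0.0
    recall = n_pr / len(relevant) if relevant else 0.0
    return precision, recall
```

For example, `mae([4, 3, 5], [5, 3, 3])` is 1.0. Returning four items of which two are among three relevant ones gives precision 0.5 and recall 2/3.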

4.3 Results and discussion

  • To obtain the experimental results, the authors split the dataset into two parts: 80% for the training phase and 20% for the test phase.
  • They observe that the DC curve gives slightly better results than the FOAF curve, since items are identified and enriched by descriptions and metadata that are more stable than the links and subjective opinions within a user's network.
  • Moreover, identifying each resource uniquely by its URI in RDF documents mitigates the scalability problem.
  • They also observe that recall reaches a maximum of 45% for the optimal hybrid solution, which they attribute to the role of the property values of the adopted vocabularies in filtering only relevant items.
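The 80/20 protocol can be sketched as below. The summary does not say whether the split is random or per user, so this assumes a random split at the level of individual ratings with a fixed seed for reproducibility; `split_ratings` is a hypothetical helper name.

```python
import random


def split_ratings(ratings, train_fraction=0.8, seed=42):
    """Randomly split (user, item, rating) triples into training and test lists."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)  # local RNG: does not touch global state
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


data = [("u%d" % k, "i%d" % (k % 3), float(k % 5 + 1)) for k in range(10)]
train, test = split_ratings(data)
```

A per-user split, which keeps 20% of each user's ratings for testing, is a common alternative when every user should appear in both parts.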


HAL Id: hal-01789930
https://hal.inria.fr/hal-01789930
Submitted on 11 May 2018
Distributed under a Creative Commons Attribution 4.0 International License
A Hybrid Model to Improve Filtering Systems
Kharroubi Sahraoui, Dahmani Youcef, Nouali Omar
To cite this version:
Kharroubi Sahraoui, Dahmani Youcef, Nouali Omar. A Hybrid Model to Improve Filtering Systems.
5th International Conference on Computer Science and Its Applications (CIIA), May 2015, Saida,
Algeria. pp. 303-314, 10.1007/978-3-319-19578-0_25. hal-01789930

KHARROUBI Sahraoui (1), DAHMANI Youcef (2), and NOUALI Omar (3)
(1) National High School of Computer Science E.S.I, & Ibn Khaldoun University, Tiaret, Algeria, s kharoubi@esi.dz
(2) Department of Computer Science, Ibn Khaldoun University, Tiaret, Algeria, dahmani y@yahoo.fr
(3) Basic Software Laboratory, C.E.R.I.S.T, Ben Aknoun, Algeria, o nouali@cerist.dz
Abstract. There is continuous information overload on the Web. The problem
addressed is how to obtain relevant information (documents, products, services,
etc.) in time and without difficulty. Filtering systems, also called recommender
systems, are widely used to recommend relevant resources to users through a
similarity process, for example on Amazon, MovieLens and CDNow. The trend is to
improve information filtering approaches to better meet users' expectations. In
this work, we model a collaborative filtering system by using the Friend Of A
Friend (FOAF) formalism to represent users and the Dublin Core (DC) vocabulary
to represent resources ("items"). In addition, to ensure the interoperability
and openness of this model, we adopt the Resource Description Framework (RDF)
syntax to describe the various modules of the system. A hybrid function is
introduced to compute predictions. Empirical tests on real data sets
(Book-Crossing, FoafPub) showed satisfactory performance in terms of relevance
and precision.
Keywords: Recommender systems, Resource description framework,
Dublin core, FOAF, Semantic.
1 Introduction
The multiplicity of services offered via the Web encourages users to publish and
exchange an enormous volume of data in various formats. The gigantic mass of
existing information and the speed at which it is produced give rise to the
problem of information overload. This phenomenon, known as big data, raises
multiple difficulties such as the management, storage, control and security of
circulating data. At the same time, timely access to relevant information is a
major concern for developers and users: although the information is available,
it is lost in the mass. The performance of existing tools degrades when they
handle large volumes of data; in particular, search engines are affected by this
phenomenon in terms of recall and precision, as well as in the indexing process.
Our work falls more particularly within information filtering, specifically
personalized filtering, whose aim is to deliver useful information to users.
Many commercial and educational sites rely on filtering algorithms to recommend
their products, such as Amazon, MovieLens, Netflix and EducationWorld [5].
Filtering systems (FS), known as "recommender systems", have become essential
with the increasing variety of web resources such as news, games, videos and
documents [10]. Most recent FS exploit semantic information and share resource
metadata in order to improve relevance [8]. Another type of these systems relies
on ontologies to conceptualize the application domain, which can increase their
performance [1]. However, FS suffer from some common weaknesses, such as cold
start, sparsity and scalability. In our study, we adopt the RDF model to
represent all elements of the system in an open and interoperable manner. With
the Friend Of A Friend (FOAF) formalism, we weight the attributes of user
profiles in order to group users by degree of similarity. In addition, the items
of the system are represented with the Dublin Core (DC) vocabulary in the RDF
model in order to describe web resources formally. These two formalisms, which
are recommended by the W3C, ensure interoperability and easy data integration.
This approach avoids confining the system to a specific, closed domain and
handles all kinds of resources through URIs and namespaces. The rest of the
paper is organized as follows: section 2 briefly reviews the various forms of
FS, section 3 presents the details of our proposal, section 4 presents the
experimental results followed by a discussion, and we end with a conclusion and
perspectives.
2 State of the Art
The number of Internet users reached 38.8% of the world population in 2013,
against 0.4% in 1995, according to statistics provided by the ITU
(http://www.itu.int/en/ITU-D/Statistics/Pages/stat/default.aspx). Meanwhile,
resources, commonly called items, are produced at an incredible speed by users
and companies alike. Current tools are not adapted to this huge volume of data
when it comes to analyzing it, controlling it or obtaining relevant information
in time. FS were introduced to manage information overload through filtering
[3, 8]. Items can be extremely varied: DVDs, books, images, web pages,
restaurants, etc. These systems are increasingly present on the web and will
certainly become essential in the future as data keep growing [12]. Depending on
how relevance is estimated, researchers classify recommendation algorithms into
three main approaches: content-based, collaborative and hybrid [4]. In the first
approach, the system relies on the thematic content of the items ("documents")
and compares it with a user profile, itself made of topics describing the user's
interests; that is, the system compares the document themes with those of the
profile and decides whether the document is recommended or rejected according to
the threshold of a satisfaction function [17]. In the second approach, also
known as social, the system uses the ratings that users have given to certain
items in order to recommend those items to other users through a similarity
process, without needing to analyze the content of the items [2]. Within this
approach there are two main techniques: memory-based algorithms, which use part
or all of the ratings to generate a new prediction [12], and model-based
algorithms, which build a descriptive model of the user in order to estimate the
prediction. Collaborative approaches are widely adopted in recommender systems
such as Tapestry [4], GroupLens [15], Amazon and Netflix. Hybrid methods
attenuate the shortcomings of each of the two previous approaches by combining
them in various ways. Recently, a new generation of FS has emerged, boosted by
semantic web formalisms or adaptable to contexts, that uses taxonomies or
ontologies [13]. Commonly, these systems have shortcomings that hinder the
recommendation process and degrade their performance: the funnel effect in
content-based filtering, where the user does not benefit from novelty and
diversity among the recommended items; scalability, where the system handles a
large number of users and items online, which makes timely prediction
difficult; sparsity, where there are not enough ratings to estimate predictions
well; and cold start for a user and/or item newly added to the system [11]. In
this paper, we extend filtering systems with an open and interoperable
specification in which each component of the system is formalized by an
appropriate RDF vocabulary. The following section explains the basic concepts
of this specification.
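The content-based decision described above (compare document themes with profile topics, then accept or reject against a threshold) can be sketched as follows. Cosine similarity over sparse topic-weight vectors and the default threshold of 0.5 are assumptions chosen for illustration; reference [17] may use a different satisfaction function.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse topic-weight vectors."""
    dot = sum(w * b[t] for t, w in a.items())  # missing topics count as 0 in a Counter
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def content_based_filter(profile: Counter, documents: dict, threshold: float = 0.5):
    """Keep documents whose themes are at least `threshold`-similar to the profile."""
    return [doc_id for doc_id, themes in documents.items()
            if cosine(profile, themes) >= threshold]


profile = Counter({"rdf": 2, "music": 1})
docs = {"d1": Counter({"rdf": 1}), "d2": Counter({"cooking": 3})}
```

With this profile, `d1` scores 2/sqrt(5) (about 0.89) and is recommended. `d2` shares no topic with the profile, scores 0.0 and is rejected.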
3 Proposed approach
Our study focuses on reducing the sparsity problem through the similarity of
items, computed from the values of DC properties, and the similarity of users,
computed from the values of FOAF properties. Property values are of
heterogeneous types (nominal, ordinal, qualitative, etc.), so we defined several
encoding and normalization functions to convert them to a numeric scale, i.e.
quantitative values in the range [0, 1].
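The paper does not list its encoding functions. As a hedged sketch, three common choices map heterogeneous property values to [0, 1]: min-max scaling for numeric values, rank scaling for ordinal labels, and an equality test for nominal values. The function names and the example education scale below are hypothetical.

```python
from typing import List


def minmax(x: float, lo: float, hi: float) -> float:
    """Numeric value -> [0, 1] by min-max scaling, clipped to the range."""
    if hi == lo:
        return 0.0
    return min(1.0, max(0.0, (x - lo) / (hi - lo)))


def ordinal(value: str, scale: List[str]) -> float:
    """Ordinal label -> [0, 1] by its rank in an ordered scale (first 0, last 1).

    Raises ValueError if the label is not in the scale.
    """
    if len(scale) < 2:
        return 0.0
    return scale.index(value) / (len(scale) - 1)


def nominal_match(a: str, b: str) -> float:
    """Nominal values carry no order, so only equality is meaningful: 1.0 or 0.0."""
    return 1.0 if a == b else 0.0


EDUCATION = ["none", "bachelor", "master", "phd"]  # hypothetical ordered scale
```

For instance, an age of 25 on a 0 to 100 range encodes to 0.25, and "master" on the scale above encodes to 2/3.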
3.1 RDF specification
The Resource Description Framework (RDF)
(http://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210/) is a data model for
describing various types of resources (persons, web pages, movies, services,
books, etc.). It describes data, their properties and the relationships between
them; in other words, it is a formal metadata specification, originally designed
by the W3C, whose purpose is to allow a community of users to share the same
metadata for shared resources. An RDF document is a set of triples <subject,
predicate, object>, where the subject is the resource being described, the
predicate is a property of this resource, and the object is the value of this
property or another resource. One great advantage of RDF is its extensibility:
RDF schemas can be combined rather than being mutually exclusive, thanks to the
namespace and URI (Uniform Resource Identifier) concepts [7]. An RDF document
can always be represented as a labelled directed graph. For example, the
statement "the book Semantic Web for the Working Ontologist, written by Dean
Allemang and dated July 5, 2011" is expressed in
RDF/XML syntax as follows:

```xml
<?xml version="1.0"?>
<rdf:RDF xmlns:ss="http://workingontologist.org/"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <!-- The book is linked to its author, itself a resource identified by a URI -->
  <rdf:Description rdf:about="http://www.amazon.fr/Semantic-Web-Working-Ontologist-Effective/dp/0123859654/">
    <ss:written_by rdf:resource="http://www.cs.bu.edu/fac/allemang/"/>
  </rdf:Description>
  <!-- Title as a literal value -->
  <rdf:Description rdf:about="http://www.amazon.fr/Semantic-Web-Working-Ontologist-Effective/dp/0123859654/">
    <ss:hasTitle>Semantic Web for the Working Ontologist</ss:hasTitle>
  </rdf:Description>
  <!-- Date as a literal value -->
  <rdf:Description rdf:about="http://www.amazon.fr/Semantic-Web-Working-Ontologist-Effective/dp/0123859654/">
    <ss:hasDate>July 5, 2011</ss:hasDate>
  </rdf:Description>
</rdf:RDF>
```
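To make the triple model concrete, the three statements of the listing can be written as plain (subject, predicate, object) tuples and indexed as a labelled directed graph keyed by subject. This stdlib-only sketch takes the predicate local names (`written_by`, `hasTitle`, `hasDate`) from the listing. In practice, an RDF library such as rdflib would parse the RDF/XML directly.

```python
from collections import defaultdict

SS = "http://workingontologist.org/"
BOOK = "http://www.amazon.fr/Semantic-Web-Working-Ontologist-Effective/dp/0123859654/"

# Each rdf:Description above contributes one (subject, predicate, object) triple.
triples = {
    (BOOK, SS + "written_by", "http://www.cs.bu.edu/fac/allemang/"),  # object is a resource
    (BOOK, SS + "hasTitle", "Semantic Web for the Working Ontologist"),  # literal
    (BOOK, SS + "hasDate", "July 5, 2011"),  # literal
}


def as_graph(triples):
    """Index triples as a labelled directed graph: subject -> [(predicate, object), ...]."""
    graph = defaultdict(list)
    for s, p, o in sorted(triples):
        graph[s].append((p, o))
    return graph


g = as_graph(triples)
```

All three descriptions share the same `rdf:about` URI, so they merge into one node with three outgoing labelled edges. This merging by URI is what lets separately published RDF statements be combined.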
Our solution (Fig. 1) is based on RDF modelling through the FOAF and Dublin Core
standards, which describe the set of users and the set of items respectively.
Fig. 1. Overall scheme of the proposal
