HAL Id: hal-01789930
https://hal.inria.fr/hal-01789930
Submitted on 11 May 2018
HAL is a multi-disciplinary open access
archive for the deposit and dissemination of
scientific research documents, whether they are
published or not. The documents may come from
teaching and research institutions in France or
abroad, or from public or private research centers.
Distributed under a Creative Commons Attribution 4.0 International License
A Hybrid Model to Improve Filtering Systems
Kharroubi Sahraoui, Dahmani Youcef, Nouali Omar
To cite this version:
Kharroubi Sahraoui, Dahmani Youcef, Nouali Omar. A Hybrid Model to Improve Filtering Systems.
5th International Conference on Computer Science and Its Applications (CIIA), May 2015, Saida,
Algeria. pp.303-314, ⟨10.1007/978-3-319-19578-0_25⟩. ⟨hal-01789930⟩
A hybrid model to improve filtering systems
KHARROUBI Sahraoui (1), DAHMANI Youcef (2), and NOUALI Omar (3)
(1) National High School of Computer Science E.S.I, &
Ibn Khaldoun University, Tiaret, Algeria, s_kharoubi@esi.dz
(2) Department of Computer Science,
Ibn Khaldoun University, Tiaret, Algeria, dahmani_y@yahoo.fr
(3) Basic Software Laboratory, C.E.R.I.S.T, Ben Aknoun, Algeria, o_nouali@cerist.dz
Abstract. There is a continuous information overload on the Web. The
problem addressed is how to obtain relevant information (documents,
products, services, etc.) in time and without difficulty. Filtering systems,
also called recommender systems, have been widely used to recommend
relevant resources to users through a similarity process, as on Amazon,
MovieLens, CDnow, etc. The trend is to improve information filtering
approaches to better meet users' expectations. In this work, we model a
collaborative filtering system by using the Friend Of A Friend (FOAF)
formalism to represent the users and the Dublin Core (DC) vocabulary to
represent the resources ("items"). In addition, to ensure the interoperability
and openness of this model, we adopt the Resource Description Framework
(RDF) syntax to describe the various modules of the system. A hybrid
function is introduced for the calculation of predictions. Empirical tests
on various real data sets (Book-Crossing, FoafPub) showed satisfactory
performance in terms of relevance and precision.
Keywords: Recommender systems, Resource Description Framework,
Dublin Core, FOAF, Semantic.
1 Introduction
The multiplicity of services offered via the Web encourages Internet users to
publish and exchange an enormous volume of data in various formats. The
huge mass of existing information and the speed at which it is produced
cause the problem of information overload. This phenomenon, known as
big data, raises multiple difficulties such as the management, storage,
control and security of the circulated data. Moreover, timely access to
relevant information is a major concern of developers and users: although
the information is available, it is lost in the mass. The performance of
existing tools degrades when they handle large volumes of data; in particular,
search engines are affected in terms of recall and precision as well as in the
indexing process. Our work falls within the field of information filtering,
specifically personalized filtering, which aims to deliver useful information
to users. Many commercial and educational sites rely on filtering algorithms
to recommend their products, such as Amazon, MovieLens, Netflix,
EducationWorld, etc. [5]. Filtering systems (FS), known
as "recommender systems", have become essential with the increasing variety of
web resources such as news, games, videos, documents and others [10]. The
majority of recent FS exploit semantic information and share the metadata of
the resources in order to improve relevance [8]. Additionally, another
type of these systems relies on an ontology to conceptualize and enrich the
application domain, which makes it possible to increase their performance [1].
However, FS suffer from some common weaknesses, such as cold start, sparsity
and scalability. In our study, we adopted the RDF model to represent all
elements of the system in an open and interoperable manner. With the
Friend Of A Friend (FOAF) formalism, we weighted the attributes of the user
profiles in order to group users by degree of similarity. In addition, the items
of the system are represented by the Dublin Core (DC) vocabulary in the RDF
model to describe web resources formally. These two vocabularies, both
expressible in the W3C-recommended RDF model, ensure interoperability and
easy integration of the data. This approach avoids restricting the system to a
specific, closed domain and handles all kinds of resources through URIs and
namespaces. The rest of the paper is organized as follows. Section 2 briefly
reviews the various forms of FS. Section 3 presents the details of our proposal.
The experimental results and their discussion are presented in Section 4.
Finally, we conclude the paper and outline future work.
2 State of the Art
The number of Internet users reached 38.8% of the world population in 2013,
against 0.4% in 1995, according to statistics provided by the ITU
(http://www.itu.int/en/ITU-D/Statistics/Pages/stat/default.aspx). Meanwhile,
resources, commonly called items, are produced at an incredible speed by both
users and companies. Current tools cannot cope with this huge volume of
data, whether to analyze it, control it or retrieve relevant information in time.
FS emerged to manage information overload by filtering [3, 8]. Items can be
extremely varied: DVDs, books, images, web pages, restaurants, etc. These
systems are increasingly present on the web and will certainly become
essential in the future with the continuous increase of data [12]. According to
how relevance is estimated, researchers classify recommendation algorithms
into three main approaches: content-based, collaborative and hybrid [4]. In the
first approach, the system relies on the thematic content of the items
("documents") and compares it with a user profile, which itself consists of
topics describing the user's interests; that is, the system compares the
document themes with those of the profile and decides whether the document is
recommended or rejected according to the threshold of a satisfaction function
[17]. In the second approach, also known as social filtering, the system uses
the ratings that certain users have given to items in order to recommend these
items to other users through a similarity process, without needing to analyze
the content of the items [2]. Within this approach there are two main
techniques: memory-based algorithms, which use part or all of the ratings to
generate a new prediction [12], and model-based algorithms, which build a
descriptive model of the user and then estimate the prediction. The
collaborative approaches are
widely adopted in recommender systems such as Tapestry [4], GroupLens [15],
Amazon, Netflix, etc. Hybrid methods attenuate the shortcomings of the two
previous approaches by combining them in various ways. Recently, a new
generation of FS has emerged, boosted by semantic web formalisms or
adaptable to contexts, that uses taxonomies or ontologies [13]. These systems
commonly have shortcomings that hinder the recommendation process and
degrade their performance: the funnel effect, where the user does not benefit
from innovation and diversity in the items recommended by content-based
filtering; scalability, where the system handles a large number of users and
items online, which makes timely prediction difficult; the sparsity problem,
where there are too few ratings to estimate the prediction well; and the cold
start problem for a user and/or item recently added to the system [11]. In this
paper, we extend filtering systems with an open and interoperable
specification in which each component of the system is formalized by an
appropriate RDF vocabulary. The following section explains the basic concepts
of this specification.
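As a hedged illustration of the memory-based collaborative technique reviewed above, the following minimal Python sketch predicts a missing rating from the ratings of similar users. It uses a common textbook formulation (Pearson similarity and a mean-centred weighted average), not the hybrid function proposed in this paper. The users, items and ratings are invented toy values, and the function names are hypothetical.

```python
from math import sqrt

# Toy ratings: user -> {item: rating}. All names and values are illustrative.
ratings = {
    "alice": {"book_a": 5, "book_b": 3, "book_c": 4},
    "bob":   {"book_a": 4, "book_b": 2, "book_c": 5, "book_d": 4},
    "carol": {"book_a": 1, "book_b": 5, "book_d": 2},
}

def mean(user):
    """Average of all ratings given by `user`."""
    vals = ratings[user].values()
    return sum(vals) / len(vals)

def pearson(u, v):
    """Pearson-style correlation over the items both users rated.

    Deviations are taken from each user's overall mean rating, which is
    one common variant of the measure.
    """
    common = set(ratings[u]) & set(ratings[v])
    if len(common) < 2:
        return 0.0
    mu, mv = mean(u), mean(v)
    num = sum((ratings[u][i] - mu) * (ratings[v][i] - mv) for i in common)
    den_u = sqrt(sum((ratings[u][i] - mu) ** 2 for i in common))
    den_v = sqrt(sum((ratings[v][i] - mv) ** 2 for i in common))
    den = den_u * den_v
    return num / den if den else 0.0

def predict(user, item):
    """Mean-centred weighted average of the neighbours' ratings for `item`."""
    neighbours = [v for v in ratings if v != user and item in ratings[v]]
    num = sum(pearson(user, v) * (ratings[v][item] - mean(v)) for v in neighbours)
    den = sum(abs(pearson(user, v)) for v in neighbours)
    # Fall back to the user's own mean when no neighbour information exists.
    return mean(user) if den == 0 else mean(user) + num / den
```

With this toy data, `predict("alice", "book_d")` combines the ratings of the two other users, weighting each by its similarity to alice. When nobody has rated the item, the function falls back to the user's mean rating, which is one simple way to see why sparsity and cold start degrade predictions.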
3 Proposed approach
Our study focuses on reducing the sparsity problem through the similarity of
items, computed from the values of DC properties, as well as the similarity of
users, computed from the values of FOAF properties. The property values are of
heterogeneous types (nominal, ordinal, qualitative, etc.), so we defined several
encoding and normalization functions to convert these properties to a numeric
scale, i.e. quantitative values in the range [0, 1].
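The encoding and normalization functions are not spelled out at this point, so the following Python sketch only illustrates what such functions could look like. Min-max scaling, rank encoding and exact-match comparison are assumptions chosen for illustration, as are the attribute names (`age`, `education`, `country`) and the weights; none of them is taken from the paper.

```python
def min_max(value, low, high):
    """Scale a numeric value to [0, 1]; values outside the bounds are clipped."""
    if high == low:
        return 0.0
    return min(1.0, max(0.0, (value - low) / (high - low)))

def encode_ordinal(value, ordered_levels):
    """Map an ordinal level to its rank, scaled to [0, 1]."""
    if len(ordered_levels) < 2:
        return 0.0
    return ordered_levels.index(value) / (len(ordered_levels) - 1)

def nominal_match(a, b):
    """Nominal values have no order: 1.0 if equal, else 0.0."""
    return 1.0 if a == b else 0.0

def profile_similarity(p, q, weights):
    """Weighted average of per-attribute closeness, in [0, 1].

    String attributes are compared as nominal values; numeric attributes
    are assumed to be already normalized to [0, 1].
    """
    total = sum(weights.values())
    score = 0.0
    for attr, w in weights.items():
        a, b = p[attr], q[attr]
        if isinstance(a, str):
            score += w * nominal_match(a, b)
        else:
            score += w * (1.0 - abs(a - b))
    return score / total

# Hypothetical user profiles built from encoded property values.
levels = ["primary", "secondary", "bachelor", "master", "phd"]
u1 = {"age": min_max(25, 15, 75),
      "education": encode_ordinal("master", levels),
      "country": "dz"}
u2 = {"age": min_max(31, 15, 75),
      "education": encode_ordinal("phd", levels),
      "country": "dz"}
weights = {"age": 1.0, "education": 2.0, "country": 1.0}
similarity = profile_similarity(u1, u2, weights)
```

Once every property is mapped to [0, 1], users (or items) with very different kinds of attributes can be compared with a single weighted measure, which is the purpose of the conversion described above.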
3.1 RDF specification
The Resource Description Framework (RDF)
(http://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210/) is a data model
for describing various types of resources (person, web page, movie, service,
book, etc.). It covers the data, their properties and the relationships between
them; in other words, it is a formal specification based on metadata, originally
designed by the W3C, whose purpose is to allow a community of users to share
the same metadata for shared resources. An RDF document is a set of triples
<subject, predicate, object>, where the subject is the resource to be described,
the predicate is a property of this resource, and the object is the value of this
property or another resource. One of the great advantages of RDF is its
extensibility through RDF schemas, which can be combined rather than being
mutually exclusive thanks to the namespace and URI (Uniform Resource
Identifier) concepts [7]. An RDF document can always be represented as a
labelled directed graph. For example, the statement "the book Semantic Web for
the Working Ontologist was written by Dean Allemang on July 5, 2011" is
expressed in RDF/XML syntax as:
<?xml version="1.0"?>
<rdf:RDF xmlns:ss="http://workingontologist.org/"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <!-- Triple 1: the book was written by the author resource -->
  <rdf:Description rdf:about="http://www.amazon.fr/Semantic-Web-Working-Ontologist-Effective/dp/0123859654/">
    <ss:written_by rdf:resource="http://www.cs.bu.edu/fac/allemang/"/>
  </rdf:Description>
  <!-- Triple 2: the book has a title (literal value) -->
  <rdf:Description rdf:about="http://www.amazon.fr/Semantic-Web-Working-Ontologist-Effective/dp/0123859654/">
    <ss:hasTitle>Semantic Web for the Working Ontologist</ss:hasTitle>
  </rdf:Description>
  <!-- Triple 3: the book has a date (literal value) -->
  <rdf:Description rdf:about="http://www.amazon.fr/Semantic-Web-Working-Ontologist-Effective/dp/0123859654/">
    <ss:hasDate>July 5, 2011</ss:hasDate>
  </rdf:Description>
</rdf:RDF>
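To make the triple and graph view of this example concrete, the following Python sketch holds the three statements as plain (subject, predicate, object) tuples and groups them into a labelled directed graph. It is only an illustration using the standard library, not an RDF parser. The helper names `to_graph` and `objects` are hypothetical, and the property name `written_by` is spelled with an underscore as an assumption.

```python
from collections import defaultdict

BOOK = "http://www.amazon.fr/Semantic-Web-Working-Ontologist-Effective/dp/0123859654/"
SS = "http://workingontologist.org/"  # namespace bound to the "ss" prefix

# Each statement is a (subject, predicate, object) triple.
triples = [
    (BOOK, SS + "written_by", "http://www.cs.bu.edu/fac/allemang/"),
    (BOOK, SS + "hasTitle", "Semantic Web for the Working Ontologist"),
    (BOOK, SS + "hasDate", "July 5, 2011"),
]

def to_graph(triples):
    """Group triples into subject -> [(predicate, object), ...] edges."""
    graph = defaultdict(list)
    for s, p, o in triples:
        graph[s].append((p, o))
    return dict(graph)

def objects(graph, subject, predicate):
    """Return every object reached from `subject` via `predicate`."""
    return [o for p, o in graph.get(subject, []) if p == predicate]

graph = to_graph(triples)
title = objects(graph, BOOK, SS + "hasTitle")
```

Here the subject is a node, each predicate labels an outgoing edge, and each object is either a literal or another resource node. Because predicates are full URIs built from a namespace, properties from different vocabularies can coexist in the same graph without clashing.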
Our solution (Fig. 1) is based on RDF modelling through the FOAF and Dublin
Core standards, which describe the sets of users and items.
Fig. 1. Overall scheme of the proposal