A Hybrid Model to Improve Filtering Systems

doi:10.1007/978-3-319-19578-0_25


Kharroubi Sahraoui, Dahmani youcef, Nouali Omar

20 May 2015-pp 303-314

TL;DR: This work models a collaborative filtering system, using the Friend Of A Friend (FOAF) formalism to represent the users and the Dublin Core (DC) vocabulary to represent the resources ("items"), and adopts the Resource Description Framework (RDF) syntax to describe the various modules of the system.


Abstract: There is a continuous information overload on the Web. The problem addressed is how to obtain relevant information (documents, products, services, etc.) in time and without difficulty. Filtering systems, also called recommender systems, have been widely used to recommend relevant resources to users through a similarity process, as on Amazon, MovieLens, Cdnow, etc. The trend is to improve information filtering approaches to better meet users' expectations. In this work, we model a collaborative filtering system by using the Friend Of A Friend (FOAF) formalism to represent the users and the Dublin Core (DC) vocabulary to represent the resources ("items"). In addition, to ensure the interoperability and openness of this model, we adopt the Resource Description Framework (RDF) syntax to describe the various modules of the system. A hybrid function is introduced for the calculation of the prediction. Empirical tests on various real data sets (Book-Crossing, FoafPub) showed satisfactory performance in terms of relevance and precision.


Summary (3 min read)


1 Introduction

The multiplicity of services offered via the Web encourages Net surfers to publish and exchange an enormous volume of data in various formats.
This phenomenon, known as big data, raises multiple difficulties such as the management, storage, control and security of the data in circulation.
Many commercial and educational sites rely on filtering algorithms to recommend their products, such as Amazon, Movielens, Netflix, EducationWorld, etc. [5].
In their study, the authors adopted the RDF model to represent all elements of the system in an open and interoperable manner.
Section 3 presents the details of their proposal.

2 State of the Art

The number of Internet users reached 38.8% of the world population in 2013, against 0.4% in 1995, according to statistics provided by the ITU (http://www.itu.int/en/ITU-D/Statistics/Pages/stat/default.aspx).
Depending on how relevance is estimated, researchers classify recommendation algorithms into three main approaches: content-based, collaborative and hybrid [4].
Hybrid methods attenuate the shortcomings of each of the two previous approaches by combining them in various ways.

3 Proposed approach

The authors' study focuses on reducing the sparsity problem through the similarity of items, via the values of DC properties, as well as the similarity of users, through the values of FOAF properties.
The property values are of heterogeneous types (nominal, ordinal, qualitative, etc.), so the authors defined several encoding and normalization functions to convert these properties to a numeric scale.

3.1 RDF specification

The Resource Description Framework, RDF (http://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210/), is a data model for describing various types of resources (person, web page, movie, service, book, etc.).
It covers data, their properties and the relationships between them; in other words, it is a formal metadata specification, originally designed by the W3C, whose purpose is to allow a community of users to share the same metadata for shared resources.
One of the great advantages of RDF is its extensibility: RDF schemas can be combined rather than being mutually exclusive, through the use of namespaces and URIs (Uniform Resource Identifiers) [7].
Thus, to preserve the collaborative filtering approach, the authors took users' feedback into account when computing similarity; moreover, they used a hybrid function to define the prediction value.

3.2 Item's representation

A social FS consists of resources (items), user profiles, and histories that record the users' interactions with the recommended items.
The authors used the metadata of the Dublin Core vocabulary as a standardized description of items; the attribute values of the vocabulary allowed them to calculate the degree of similarity between items and to group them into communities.
The element set known as Dublin Core, DC (http://dublincore.org), is a set of simple and effective elements for describing a wide variety of web resources. The standard version of this format includes 15 elements whose semantics were established by an international consensus across various disciplines, and it is recommended by the W3C.
These elements fall into three categories: those that describe the content (Coverage, Description, Type, Relation, Source, Subject, Title), those that describe intellectual property (Contributor, Creator, Publisher, Rights), and those that describe the instantiation (Date, Format, Identifier, Language). The current version, 1.1, was validated in 2007 and revised in 2012 by the DCMI (Dublin Core Metadata Initiative, http://dublincore.org/documents/dces/).
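The summary does not give the formula used to compare items, so the following is only a minimal sketch of one way a similarity over Dublin Core attribute values could be computed. The function name `dc_similarity`, the dictionary representation of an item, and the "fraction of shared elements with equal values" rule are all assumptions made for illustration, not the authors' method.

```python
def dc_similarity(item_a: dict, item_b: dict) -> float:
    """Fraction of the DC elements present in both items whose values match.

    Hypothetical measure: items are plain dicts keyed by DC element name
    (e.g. "creator", "language"); values are compared case-insensitively.
    """
    shared = [key for key in item_a if key in item_b]
    if not shared:
        return 0.0  # nothing to compare
    matches = sum(
        1
        for key in shared
        if str(item_a[key]).strip().lower() == str(item_b[key]).strip().lower()
    )
    return matches / len(shared)


# Two illustrative book descriptions (made-up values).
book_a = {"creator": "Dean Allemang", "language": "en",
          "subject": "semantic web", "date": "2011"}
book_b = {"creator": "dean allemang", "language": "en",
          "subject": "databases", "format": "print"}
similarity = dc_similarity(book_a, book_b)  # 2 of the 3 shared elements match
```

Items whose similarity exceeds some chosen cut-off could then be grouped into the same community; that cut-off is likewise not specified in the text.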

3.3 User's Representation

The objective of an FS is to deliver relevant items to the user; the formation of communities depends on the attribute values defined in the user profile.
Among the most common current practices, the authors adopted the FOAF vocabulary to represent user profiles.
It can be used to search for individuals and communities: CVs, social networks, management of online communities, online identification, management of participation in projects, etc.
A FOAF file can contain various information (name, family name, dateOfBirth, gender, mbox, Home Page, weblog, interest, accountName, Knows, etc.).
Given the very high number of interacting users, it is very important to form communities well, since the community is a building block of the FS, on the principle of one for all and all for one.

3.4 Recommendation engine

Simdc: the similarity that uses the Dublin Core vocabulary describing the items.
Identifying each item by its URI and exploiting its own metadata helps reduce the sparsity problem.
Simf: the similarity that depends on the representation of profiles in the FOAF formalism. Thanks to the variety of fields and the availability of data in the profile, the authors can overcome the cold-start problem for a new user and form communities better.
The recommendation process is fully automatic and directly tied to the prediction value: a given item is deemed relevant and is sent to the user if and only if its predicted value is greater than a given threshold.
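The threshold rule above can be sketched as follows. The summary does not state the hybrid function itself, so the linear blend below (weight `alpha`) of an item-side prediction and a user-side prediction, each a similarity-weighted average of known ratings, is an assumption chosen for illustration; all function names are hypothetical and similarities are assumed to be non-negative.

```python
def weighted_prediction(neighbours):
    """Similarity-weighted average of ratings.

    neighbours: list of (similarity, rating) pairs; similarities >= 0.
    """
    total = sum(sim for sim, _ in neighbours)
    if total == 0:
        return 0.0  # no usable neighbour
    return sum(sim * rating for sim, rating in neighbours) / total


def hybrid_prediction(item_neighbours, user_neighbours, alpha=0.5):
    """Assumed hybrid: blend of the item-side (Simdc-like) and
    user-side (Simf-like) predictions, weighted by alpha."""
    return (alpha * weighted_prediction(item_neighbours)
            + (1 - alpha) * weighted_prediction(user_neighbours))


def should_recommend(prediction, threshold):
    """An item is sent iff its prediction is strictly above the threshold."""
    return prediction > threshold


# Illustrative values only.
pred = hybrid_prediction([(1.0, 4.0), (1.0, 2.0)], [(0.5, 5.0)], alpha=0.5)
decision = should_recommend(pred, threshold=3.5)
```

The strict inequality mirrors the "greater than a given threshold" wording; an item whose prediction equals the threshold is not recommended.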

4 Experimentation

This section is devoted to the experimental results of their hybrid solution on real data sets.
For evaluation and comparison, the authors implemented the item-CF (item-based collaborative filtering) approach, which is widely referenced in collaborative filtering research [6].

4.2 Relevance metrics

To evaluate the method presented in this article, the authors used MAE, a metric specific to and widely used in FS, and two other metrics, recall and precision, from the information retrieval field [16, 9].
Mean Absolute Error (MAE) is the mean absolute difference between the predictions $p_i$ produced by the system and the real ratings $e_i$ given by users.
This measure is simple to implement and directly interpretable.
Precision: $P = N_{pr} / N_{r}$. Recall: the ratio between the number of relevant items returned by the system and the total number of relevant items existing in the database.
These metrics respectively measure the error, the effectiveness and the quality of the FS.
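The three metrics can be sketched in a few lines. MAE follows the definition given above; for precision and recall the standard information-retrieval definitions are assumed (relevant returned items over all returned items, and over all relevant items, respectively), since the summary does not spell out its symbols. Function names are illustrative.

```python
def mean_absolute_error(predictions, ratings):
    """MAE = (1/n) * sum(|p_i - e_i|) over paired predictions and real ratings."""
    if len(predictions) != len(ratings) or not predictions:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - e) for p, e in zip(predictions, ratings)) / len(predictions)


def precision_recall(returned: set, relevant: set):
    """Return (precision, recall) for a set of returned items."""
    hits = len(returned & relevant)  # relevant items actually returned
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Illustrative values only.
mae = mean_absolute_error([4, 3, 5], [5, 3, 3])
precision, recall = precision_recall({"a", "b", "c", "d"}, {"a", "b", "e"})
```

A lower MAE means predictions are closer to the users' ratings, while higher precision and recall mean fewer irrelevant items sent and fewer relevant items missed.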

4.3 Results and discussion

To discuss the experimental results, the authors divide the dataset into two parts: 80% is dedicated to the training phase and the remaining 20% to the test phase.
The authors also observe that the DC curve shows a slightly more favourable result than the FOAF curve, as items are identified and enriched by descriptions and metadata that are more stable than the links and subjective opinions within a user's network.
Moreover, the URI clause for unique resource identification in RDF documents lessens the scalability problem.
The authors also observe that the recall rate, which reaches a maximum of 45% for the optimal Hybrid solution, reflects the role of the property values of the adopted vocabularies in filtering only the relevant items.
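The 80%/20% division into training and test parts can be sketched as below. Whether the authors shuffled the ratings before splitting is not stated; the seeded random shuffle, the function name and the `(user, item, rating)` tuple layout are assumptions for illustration.

```python
import random


def train_test_split(ratings, train_fraction=0.8, seed=0):
    """Shuffle a list of ratings reproducibly and cut it into two parts."""
    shuffled = list(ratings)          # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# 100 made-up (user, item, rating) triples.
ratings = [(user, item, 3) for user in range(10) for item in range(10)]
train, test = train_test_split(ratings)
```

The training part would be used to compute similarities and predictions, and the test part to measure MAE, precision and recall against held-out ratings.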


HAL Id: hal-01789930

https://hal.inria.fr/hal-01789930

Submitted on 11 May 2018


Distributed under a Creative Commons Attribution 4.0 International License


To cite this version:

Kharroubi Sahraoui, Dahmani youcef, Nouali Omar. A Hybrid Model to Improve Filtering Systems. 5th International Conference on Computer Science and Its Applications (CIIA), May 2015, Saida, Algeria. pp.303-314, ⟨10.1007/978-3-319-19578-0_25⟩. ⟨hal-01789930⟩

A hybrid model to improve filtering systems

KHARROUBI Sahraoui, DAHMANI Youcef, and NOUALI Omar

National High School of Computer Science E.S.I, & Ibn Khaldoun University Tiaret, Algeria, s kharoubi@esi.dz

Department of Computer Science, Ibn Khaldoun University, Tiaret, Algeria, dahmani y@yahoo.fr

Basic Software Laboratory, C.E.R.I.S.T, Ben Aknoun, Algeria, o nouali@cerist.dz


Keywords: Recommender systems, Resource description framework, Dublin core, FOAF, Semantic.

1 Introduction

The multiplicity of services offered via the Web encourages Net surfers to publish and exchange an enormous volume of data in various formats. The gigantic mass of existing information and the speed at which it is produced trigger the problem of information overload. This phenomenon, known as big data, raises multiple difficulties such as the management, storage, control and security of the data in circulation. On the other hand, timely access to relevant information is a major concern of developers and users: although the information is available, it is lost in the mass. The performance of existing tools degrades when large volumes of data are handled; more precisely, search engines are affected by this phenomenon in terms of recall and precision as well as in the indexing process. Our work falls within information filtering, specifically personalized filtering, whose aim is to deliver useful information to users. Many commercial and educational sites rely on filtering algorithms to recommend their products, such as Amazon, Movielens, Netflix, EducationWorld, etc. [5]. Filtering systems (FS), known as "recommender systems", have become essential with the increasing variety of web resources such as news, games, videos, documents and others [10]. The majority of recent FS explore semantic information and share the metadata of resources in order to improve relevance [8]. Additionally, another type of these systems relies on an ontology to conceptualize and enrich the application domain, which makes it possible to increase their performance [1].

However, FS suffer from some common weaknesses, such as cold start, sparsity and scalability. In our study, we adopted the RDF model to represent all elements of the system in an open and interoperable manner. With the Friend Of A Friend (FOAF) formalism, we weighted the attributes of the user profiles in order to group users by degree of similarity. In addition, the items of the system are represented by the Dublin Core (DC) vocabulary in the RDF model to describe web resources formally. These two formalisms, which are recommended by the W3C, ensure interoperability and easy integration of the data. This approach allowed us to avoid restricting the approach to a specific, closed field, and to treat all kinds of resources using the URI and namespace clauses. The rest of the paper is organized as follows. Section 2 briefly reviews the various forms of FS. Section 3 presents the details of our proposal. The results of the experiments, followed by discussion, are presented in Section 4. Finally, we conclude with a summary and perspectives.

2 State of the Art

The number of Internet users reached 38.8% of the world population in 2013, against 0.4% in 1995, according to statistics provided by the ITU (http://www.itu.int/en/ITU-D/Statistics/Pages/stat/default.aspx). On the other hand, resources, commonly called items, are produced at an incredible speed by both users and companies. Current tools cannot cope with this huge volume of data in order to analyze it, control it or obtain relevant information in time. FS emerged to manage information overload by filtering [3, 8]. Items can be extremely varied: DVDs, books, images, web pages, restaurants, etc. These systems are now increasingly present on the web and will certainly become essential in the future with the continuous increase of data [12].

Depending on how relevance is estimated, researchers classify recommendation algorithms into three main approaches: content-based, collaborative and hybrid [4]. In the first approach, the system compares the thematic content of the items ("documents") with a user profile, itself made up of topics describing the user's interests; that is to say, the system compares the themes of the document with those of the profile and decides whether the document is recommended or rejected according to the threshold of a satisfaction function [17]. In the second approach, also known as social, the system uses the ratings given to certain items by certain users in order to recommend those items to other users through a similarity process, without needing to analyze the content of the items [2]. In this approach there are two main techniques: one builds on memory-based algorithms, which use a portion or all of the ratings to generate a new prediction [12], and the other is founded on model-based algorithms, which create a descriptive model of the user and then estimate the prediction. Collaborative approaches are widely adopted in recommender systems such as Tapestry [4], GroupLens [15], Amazon, Netflix, etc. Hybrid methods attenuate the shortcomings of each of the two previous approaches by combining them in various ways. Recently, a new generation of FS has emerged, boosted by semantic web formalisms or adaptable to context, that uses taxonomies or ontologies [13].

Commonly, these systems have shortcomings that hinder the recommendation process and degrade their performance: the funnel effect, where the user does not benefit from the novelty and diversity of the recommended items in content-based filtering; scalability, where the system handles a large number of users and items online, which makes it difficult to predict in time; the sparsity problem, where there are not enough ratings to estimate the prediction; and the cold-start problem for a user and/or item recently integrated into the system [11]. In this paper, we extend filtering systems with an open and interoperable specification in which each component of the system is formalized by an appropriate RDF vocabulary. The following section explains the basic concepts of this specification.

3 Proposed approach

Our study focuses on reducing the sparsity problem through the similarity of items, via the values of DC properties, as well as the similarity of users, through the values of FOAF properties. The property values are of heterogeneous types (nominal, ordinal, qualitative, etc.), so we defined several encoding and normalization functions to convert these properties to a numeric scale, i.e. quantitative values in the range [0, 1].
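The text does not list the encoding and normalization functions themselves. As a minimal sketch, assuming one helper per value type, heterogeneous property values could be mapped into [0, 1] as follows; the three function names and their rules (rank scaling, min-max scaling, exact match) are illustrative assumptions rather than the functions defined by the authors.

```python
from typing import Sequence


def encode_ordinal(value: str, ordered_levels: Sequence[str]) -> float:
    """Map an ordinal value to [0, 1] by its rank among the ordered levels."""
    if len(ordered_levels) < 2:
        return 0.0
    return ordered_levels.index(value) / (len(ordered_levels) - 1)


def min_max(value: float, lo: float, hi: float) -> float:
    """Min-max normalize a number into [0, 1], clamping out-of-range input."""
    if hi == lo:
        return 0.0
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))


def nominal_match(a: str, b: str) -> float:
    """Compare two nominal values: 1.0 if equal (ignoring case), else 0.0."""
    return 1.0 if a.strip().lower() == b.strip().lower() else 0.0


# Illustrative values only.
level = encode_ordinal("medium", ["low", "medium", "high"])
age = min_max(30, 20, 60)
same_language = nominal_match("Fr ", "fr")
```

Once every property is on the same [0, 1] scale, values of different kinds can be combined in a single similarity score.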

3.1 RDF speciﬁcation

The Resource Description Framework, RDF (http://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210/), is a data model for describing various types of resources (person, web page, movie, service, book, etc.). It covers data, their properties and the relationships between them; in other words, it is a formal metadata specification, originally designed by the W3C, whose purpose is to allow a community of users to share the same metadata for shared resources. An RDF document is a set of triples <subject, predicate, object>, where the subject is the resource to be described, the predicate is a property of this resource, and the object is the value of this property or another resource. One of the great advantages of RDF is its extensibility: RDF schemas can be combined rather than being mutually exclusive, through the use of namespaces and URIs (Uniform Resource Identifiers) [7]. It is always possible to represent an RDF document as a labelled directed graph. For example, "the book Semantic Web for the Working Ontologist written by Dean Allemang on July 5, 2011" in RDF/XML syntax:

    <?xml version="1.0"?>
    <rdf:RDF xmlns:ss="http://workingontologist.org/"
             xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
             xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
      <rdf:Description rdf:about="http://www.amazon.fr/Semantic-Web-Working-Ontologist-Effective/dp/0123859654/">
        <ss:written_by rdf:resource="http://www.cs.bu.edu/fac/allemang/"/>
      </rdf:Description>
      <rdf:Description rdf:about="http://www.amazon.fr/Semantic-Web-Working-Ontologist-Effective/dp/0123859654/">
        <ss:hasTitle>Semantic Web for the Working Ontologist</ss:hasTitle>
      </rdf:Description>
      <rdf:Description rdf:about="http://www.amazon.fr/Semantic-Web-Working-Ontologist-Effective/dp/0123859654/">
        <ss:hasDate>July 5, 2011</ss:hasDate>
      </rdf:Description>
    </rdf:RDF>
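To make the triple structure concrete, the sketch below reads a small RDF/XML document of the same flat shape and lists its <subject, predicate, object> triples using only Python's standard `xml.etree.ElementTree`. The `example.org` URIs and the function name `extract_triples` are hypothetical, and the code handles only simple `rdf:Description` blocks with `rdf:about`; it is not a full RDF/XML parser, for which a dedicated RDF library would be the usual choice.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Small illustrative document; the example.org URIs are placeholders.
RDF_XML = """<?xml version="1.0"?>
<rdf:RDF xmlns:ss="http://workingontologist.org/"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="http://example.org/book/1">
    <ss:written_by rdf:resource="http://example.org/person/allemang"/>
    <ss:hasTitle>Semantic Web for the Working Ontologist</ss:hasTitle>
  </rdf:Description>
</rdf:RDF>"""


def extract_triples(rdf_xml: str):
    """Return (subject, predicate, object) tuples from flat rdf:Description blocks."""
    root = ET.fromstring(rdf_xml)
    triples = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # The object is either another resource or a literal value.
            obj = prop.get(f"{{{RDF_NS}}}resource")
            if obj is None:
                obj = (prop.text or "").strip()
            triples.append((subject, prop.tag, obj))
    return triples


triples = extract_triples(RDF_XML)
```

Each predicate comes back in ElementTree's `{namespace}localname` form, which is the namespace URI joined with the property name, so properties from different vocabularies stay distinguishable.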

Our solution (Figure 1) is based on modelling in RDF through the FOAF and Dublin Core standards, which describe the set of users and items.

Fig. 1. Overall scheme of the proposal
