Book Chapter

Gathering and mining information from web log files

TLDR
A general methodology for gathering and mining information from Web log files is proposed, formed by abstracting from the analysis of logs that use a well-defined standard format, such as the Extended Log File Format proposed by the W3C.
Abstract
In this paper, a general methodology for gathering and mining information from Web log files is proposed. A series of tools to retrieve, store, and analyze the data extracted from log files has been designed and implemented. The aim is to form general methods by abstracting from the analysis of logs that use a well-defined standard format, such as the Extended Log File Format proposed by the W3C. The methodology has been tested on the Web log files of The European Library portal; the experimental analyses led to personal, technical, geographical and temporal findings about usage and traffic load. Considerations about more accurate tracking of users and user profiles, and about better management of crawler accesses using authentication, are presented.
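
To make the gathering step concrete, the following is a minimal sketch of how entries in the W3C Extended Log File Format might be read and summarized. It is not the authors' tooling: the function name `parse_extended_log`, the sample log lines, the IP addresses and the chosen fields are all hypothetical, and quoted strings are handled in a simplified way (no escaped quotes).

```python
import re
from collections import Counter

# Matches either a double-quoted string or a run of non-whitespace characters.
# Simplification (assumption): quotes inside quoted strings are not supported.
_TOKEN = re.compile(r'"[^"]*"|\S+')


def parse_extended_log(lines):
    """Yield one dict per log entry, keyed by the latest #Fields directive."""
    fields = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Directive lines look like "#Name: value"; only #Fields is used here.
            name, _, value = line[1:].partition(":")
            if name.strip().lower() == "fields":
                fields = value.split()
            continue
        tokens = [t.strip('"') for t in _TOKEN.findall(line)]
        # Skip entries that do not match the declared field list.
        if fields and len(tokens) == len(fields):
            yield dict(zip(fields, tokens))


# Hypothetical sample data, invented for illustration only.
SAMPLE_LOG = """\
#Version: 1.0
#Date: 2007-10-01 00:00:00
#Fields: date time c-ip cs-method cs-uri sc-status
2007-10-01 00:00:12 192.0.2.10 GET /index.html 200
2007-10-01 00:00:15 192.0.2.11 GET /search?q=dante 200
2007-10-01 00:01:03 192.0.2.10 GET /missing.html 404
""".splitlines()

entries = list(parse_extended_log(SAMPLE_LOG))
requests_per_client = Counter(e["c-ip"] for e in entries)
status_counts = Counter(e["sc-status"] for e in entries)
```

Because the field names come from the `#Fields` directive rather than from fixed column positions, the same reader works for logs that record different sets of fields, which is the kind of format-level abstraction the abstract describes. Per-client and per-status counts such as `requests_per_client` and `status_counts` are simple examples of the technical findings that could be derived from the parsed entries.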

Citations
Journal Article

Web log analysis: a review of a decade of studies about information acquisition, inspection and interpretation of user interaction

TL;DR: An overview of 10 years of research on log analysis is presented, covering two main themes: Web search engine log analysis and Digital Library System log analysis.
Journal Article

Understanding user requirements and preferences for a digital library Web portal

TL;DR: The analysis conducted shed light on likely motivations for both participant usage and reluctance to use the services provided, leading to more informed decisions on how to refine, improve, and present Web portal services to their future users.
Book Chapter

LogCLEF 2009: the CLEF 2009 multilingual logfile analysis track overview

TL;DR: The interest in multilingual log analysis was promoted by the Cross Language Evaluation Forum (CLEF) for the first time with a track named LogCLEF and five groups using a variety of approaches submitted experiments.
Journal Article

Personalizing search using socially enhanced interest model, built from the stream of user's activity

TL;DR: A method which can infer additional keywords for a search query by leveraging a social network context and a method to build this network from the stream of user's activity on the Web is proposed.

LogCLEF 2010: the CLEF 2010 Multilingual Logfile Analysis Track Overview

TL;DR: This article presents the results of the second LogCLEF track at the Cross-Language Evaluation Forum (CLEF), in which the languages of queries, activities within sessions and success of searches were analyzed.
References
Book

The entity-relationship model: toward a unified view of data

TL;DR: A data model, called the entity-relationship model, is proposed that incorporates some of the important semantic information about the real world and can be used as a basis for unification of different views of data: the network model, the relational model, and the entity set model.
Proceedings Article

The entity-relationship model: toward a unified view of data

TL;DR: A data model, called the entity-relationship model, which incorporates the semantic information in the real world is proposed, and a special diagrammatic technique is introduced for exhibiting entities and relationships.
Journal Article

Scholarly journal usage: the results of deep log analysis

TL;DR: This is the second study to look at the information seeking behaviour of academics and researchers in regard to digital journal libraries, and concentrates on the users and usage of Blackwell Synergy.
Journal Article

The impact of site structure and user environment on session reconstruction in Web usage analysis

TL;DR: In this paper, the authors investigated the impact of site structure on the quality of constructed sessions, compared sessionizing on a frame-based and a frame-free version of a site, and showed that session reconstruction heuristics can be recommended depending on the characteristics of the site.
Book

WEBKDD 2002 - Mining Web Data for Discovering Usage Patterns and Profiles

TL;DR: This paper describes the further development of this work into a prototype service called LumberJack, a push-button analysis system that is both more automated and accurate than past systems.