
A Fuzzy Approach for Feature Evaluation and Dimensionality Reduction to Improve the Quality of Web Usage Mining Results

TLDR
The fuzzy set theoretic approach to perform feature selection (or dimensionality reduction) and session weight assignment and the traditional approach of direct elimination of small sessions and low support count URLs are compared.
Abstract
The explosive growth of information available on the Web has created a need for Web personalization systems that understand user preferences and dynamically serve customized content to individual users. Web server access logs contain substantial data about users' accesses to a Web site. Hence, if properly exploited, the log data can reveal useful information about the navigational behaviour of users on a site. Web Usage Mining (WUM) is performed to extract this information about user preferences from the logs. WUM is the application of data mining techniques to web usage log repositories in order to discover usage patterns that can be used to analyze users' navigational behaviour. WUM consists of three main steps: preprocessing, knowledge extraction and results analysis. During the preprocessing stage, raw web log data is transformed into a set of user profiles. Each user profile captures a set of URLs representing a user session. Clustering can be applied to this sessionized data in order to capture similar interests and trends among users' navigational patterns. Since the sessionized data may contain thousands of user sessions and each user session may consist of hundreds of URL accesses, dimensionality reduction is traditionally achieved by eliminating the low support URLs. Very small sessions are also removed in order to filter noise out of the data. However, direct elimination of low support URLs and small sessions may result in the loss of a significant amount of information, especially when the number of low support URLs and small sessions is large. We propose a fuzzy solution to this problem that assigns weights to URLs and user sessions based on a fuzzy membership function. After assigning the weights, we apply a Fuzzy c-Means clustering algorithm to discover the clusters of user profiles.
In this paper, we describe our fuzzy set theoretic approach to feature selection (or dimensionality reduction) and session weight assignment. Finally, we compare our soft computing based approach to dimensionality reduction with the traditional approach of direct elimination of small sessions and low support count URLs. Our results show that fuzzy feature evaluation and dimensionality reduction yield better performance and validity indices for the discovered clusters.
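
The weighting and clustering steps outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the membership function or how the weights enter the clustering, so the linear ramp membership, the thresholds `low` and `high`, the function names, and the way session weights scale each session's contribution to the cluster centres are all assumptions made for illustration.

```python
import random

def ramp_membership(x, low, high):
    """Assumed linear ramp: 0 at or below `low`, 1 at or above `high`."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)

def url_weights(sessions, low, high):
    """Weight each URL by the fuzzy membership of its support count."""
    support = {}
    for session in sessions:
        for url in set(session):
            support[url] = support.get(url, 0) + 1
    return {url: ramp_membership(n, low, high) for url, n in support.items()}

def session_weights(sessions, low, high):
    """Weight each session by the fuzzy membership of its size."""
    return [ramp_membership(len(set(s)), low, high) for s in sessions]

def session_vectors(sessions, weights):
    """Encode each session as a vector of URL weights (0 if not visited)."""
    urls = sorted(weights)
    return [[weights[u] if u in s else 0.0 for u in urls] for s in sessions]

def fuzzy_c_means(points, weights, c, m=2.0, iters=50, seed=0):
    """Fuzzy c-means in which each point's pull on the centres is scaled
    by its weight (an assumed way of using the session weights)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([r / total for r in row])
    centers = []
    for _ in range(iters):
        # Centre update: weighted mean with coefficients w_i * u_ik^m.
        centers = []
        for k in range(c):
            coef = [weights[i] * u[i][k] ** m for i in range(n)]
            denom = sum(coef) or 1e-12
            centers.append([
                sum(coef[i] * points[i][j] for i in range(n)) / denom
                for j in range(dim)
            ])
        # Membership update from squared distances, exponent 1 / (m - 1).
        for i in range(n):
            d2 = [max(sum((a - b) ** 2 for a, b in zip(points[i], ck)), 1e-12)
                  for ck in centers]
            inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
            total = sum(inv)
            u[i] = [v / total for v in inv]
    return centers, u

if __name__ == "__main__":
    # Toy sessions, invented purely for illustration.
    toy = [["/a", "/b"], ["/a", "/b", "/c"], ["/x", "/y"], ["/x"]]
    w = url_weights(toy, low=0, high=2)
    sw = session_weights(toy, low=0, high=2)
    centers, u = fuzzy_c_means(session_vectors(toy, w), sw, c=2)
    for session, row in zip(toy, u):
        print(session, [round(v, 3) for v in row])
```

In this sketch a low support URL or a small session is down-weighted rather than discarded, so it still contributes (weakly) to the cluster centres, which is the contrast with direct elimination that the abstract draws.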

