Proceedings ArticleDOI

Rough set based clustering in dense web domain

TL;DR: The clustering task for sequence data (web page visits) is demonstrated in three ways: capturing content information, capturing sequence information, and a combination of both. Results suggest that the measure capturing both content and sequence forms compact clusters, thus grouping web users with similar interests together.
Abstract: Clustering is a widely used technique in data mining applications. It groups objects on the basis of similarity among them. The web has evolved enormously in the past few years, resulting in a sharp increase in the number of web users and web pages. Web personalization has become a challenging task for e-Commerce companies due to the information overload on the web and the growth in the number of web users. Web users are matched with the available information in order to make personalization effective. Web usage data coming from a single domain tends to be dense, as many web users fetch pages from the same domain/application area; this scenario is prevalent for e-Commerce websites. Rough set theory is a soft computing technique that is effective in dealing with ambiguities present in data. In this paper we utilize rough set based clustering using the similarity upper approximation to derive the clusters. The clusters evolve in steps and finally converge into a well-defined clustering scheme. Developers are trying to customize web sites to the needs of specific users with the help of knowledge acquired from users' navigational behaviour. Since user page visits are intrinsically sequential in nature, efficient clustering algorithms with suitable distance/similarity measures for sequential data are needed. In the current paper, we demonstrate the clustering task for sequence data (web page visits) in three ways: capturing content information, capturing sequence information, and a combination of both. Experimental results suggest that the measure which captures both content and sequence forms compact clusters, thus putting web users with similar interests in one group.
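The three similarity variants discussed in the abstract can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the session representation (a list of page identifiers), the Jaccard measure for content, the normalised longest-common-subsequence measure for order, and the weighting parameter `alpha` are all assumptions made for the sketch.

```python
def content_similarity(s1, s2):
    """Content-only similarity: Jaccard index on the sets of pages visited
    (visit order is ignored)."""
    a, b = set(s1), set(s2)
    return len(a & b) / len(a | b) if a | b else 1.0

def lcs_length(s1, s2):
    """Length of the longest common subsequence, by dynamic programming."""
    m, n = len(s1), len(s2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if s1[i] == s2[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    return dp[m][n]

def sequence_similarity(s1, s2):
    """Order-aware similarity: LCS length normalised by the longer session."""
    if not s1 or not s2:
        return 0.0
    return lcs_length(s1, s2) / max(len(s1), len(s2))

def combined_similarity(s1, s2, alpha=0.5):
    """Weighted mix of content and sequence information (alpha is assumed)."""
    return alpha * content_similarity(s1, s2) + (1 - alpha) * sequence_similarity(s1, s2)
```

For the sessions `["a", "b", "c"]` and `["a", "c", "d"]`, the content measure gives 2/4 = 0.5 (two shared pages out of four distinct), while the sequence measure gives 2/3 (LCS "a", "c" against length 3), so the combined measure distinguishes users who visit the same pages in a similar order from those who merely visit the same pages.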
Citations
06 Oct 2013
TL;DR: This paper presents a complete survey of the techniques available for retrieving web documents and outlines research directions for applying soft computing techniques to web document retrieval.
Abstract: Web documents are of vital importance, as they serve user needs ranging from basic to core information requirements. Web documents exhibit similarity based on the content and information they represent. The key information in web documents can be analyzed and represented as features. This makes similarity measures important for categorizing web documents and presenting them back to users. This is a significant challenge, as the documents may be clustered and represented in multiple dimensions for retrieval. This paper presents a complete survey of the different techniques available for retrieving web documents. It also investigates the different performance parameters available for measuring the quality of results, and presents research directions for applying soft computing techniques to web document retrieval.
References
Journal ArticleDOI
TL;DR: This approach seems to be of fundamental importance to artificial intelligence (AI) and cognitive sciences, especially in the areas of machine learning, knowledge acquisition, decision analysis, knowledge discovery from databases, expert systems, decision support systems, inductive reasoning, and pattern recognition.
Abstract: Rough set theory, introduced by Zdzislaw Pawlak in the early 1980s [11, 12], is a new mathematical tool to deal with vagueness and uncertainty. This approach seems to be of fundamental importance to artificial intelligence (AI) and cognitive sciences, especially in the areas of machine learning, knowledge acquisition, decision analysis, knowledge discovery from databases, expert systems, decision support systems, inductive reasoning, and pattern recognition.
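The lower and upper approximations at the heart of rough set theory can be sketched in a few lines. The partition-based representation of the indiscernibility relation and the function name below are illustrative assumptions, not a reproduction of Pawlak's notation.

```python
def approximations(partition, target):
    """Pawlak-style rough set approximations.

    `partition` is the set of equivalence classes of the indiscernibility
    relation over the universe; `target` is the set being approximated.
    Returns (lower, upper): elements certainly in the set, and elements
    possibly in the set.
    """
    target = set(target)
    lower, upper = set(), set()
    for block in partition:
        if block <= target:
            lower |= block   # class wholly inside the target: certain members
        if block & target:
            upper |= block   # class overlapping the target: possible members
    return lower, upper
```

For a universe partitioned into {1, 2}, {3, 4}, {5}, the target set {1, 2, 3} has lower approximation {1, 2} and upper approximation {1, 2, 3, 4}; the boundary region {3, 4} between them is exactly the vagueness rough sets are designed to model.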

7,185 citations

Journal ArticleDOI
TL;DR: This work presents a new coarsening heuristic (called the heavy-edge heuristic) for which the size of the partition of the coarse graph is within a small factor of the size of the final partition obtained after multilevel refinement, along with a much faster variation of the Kernighan--Lin (KL) algorithm for refinement during uncoarsening.
Abstract: Recently, a number of researchers have investigated a class of graph partitioning algorithms that reduce the size of the graph by collapsing vertices and edges, partition the smaller graph, and then uncoarsen it to construct a partition for the original graph [Bui and Jones, Proc. of the 6th SIAM Conference on Parallel Processing for Scientific Computing, 1993, 445--452; Hendrickson and Leland, A Multilevel Algorithm for Partitioning Graphs, Tech. report SAND 93-1301, Sandia National Laboratories, Albuquerque, NM, 1993]. From the early work it was clear that multilevel techniques held great promise; however, it was not known if they can be made to consistently produce high quality partitions for graphs arising in a wide range of application domains. We investigate the effectiveness of many different choices for all three phases: coarsening, partition of the coarsest graph, and refinement. In particular, we present a new coarsening heuristic (called heavy-edge heuristic) for which the size of the partition of the coarse graph is within a small factor of the size of the final partition obtained after multilevel refinement. We also present a much faster variation of the Kernighan--Lin (KL) algorithm for refining during uncoarsening. We test our scheme on a large number of graphs arising in various domains including finite element methods, linear programming, VLSI, and transportation. Our experiments show that our scheme produces partitions that are consistently better than those produced by spectral partitioning schemes in substantially smaller time. Also, when our scheme is used to compute fill-reducing orderings for sparse matrices, it produces orderings that have substantially smaller fill than the widely used multiple minimum degree algorithm.
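The heavy-edge heuristic described above can be illustrated with a minimal matching pass: each unmatched vertex is paired with the unmatched neighbour joined by the heaviest edge, and the matched pairs are then collapsed to form the coarser graph. The adjacency-dict graph format and the traversal order below are assumptions for the sketch, not the authors' implementation.

```python
def heavy_edge_matching(adj):
    """One coarsening pass: greedily match each unmatched vertex with its
    unmatched neighbour of maximum edge weight.

    `adj` maps each vertex to a dict {neighbour: edge_weight}.
    Returns a dict mapping every vertex to its match (itself if unmatched).
    """
    matched = {}
    for u in adj:
        if u in matched:
            continue
        best, best_w = None, -1
        for v, w in adj[u].items():
            if v != u and v not in matched and w > best_w:
                best, best_w = v, w
        if best is not None:
            matched[u] = best    # collapse u and best into one coarse vertex
            matched[best] = u
        else:
            matched[u] = u       # no free neighbour: vertex carries over alone
    return matched
```

Preferring heavy edges means the collapsed edge weight is removed from the graph, which is what keeps the coarse graph's partition size close to that of the final refined partition.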

5,629 citations

Book ChapterDOI
Pavel Berkhin1
01 Jan 2006
TL;DR: This survey concentrates on clustering algorithms from a data mining perspective as a data modeling technique that provides for concise summaries of the data.
Abstract: Clustering is the division of data into groups of similar objects. In clustering, some details are disregarded in exchange for data simplification. Clustering can be viewed as a data modeling technique that provides for concise summaries of the data. Clustering is therefore related to many disciplines and plays an important role in a broad range of applications. The applications of clustering usually deal with large datasets and data with many attributes. Exploration of such data is a subject of data mining. This survey concentrates on clustering algorithms from a data mining perspective.

3,047 citations