Book Chapter

Clustering for Knowledgeable Web Mining

TL;DR: In this chapter, statistical features are extracted from the pixel map of each image and presented to the fuzzy c-means (FCM) clustering algorithm and the Gath-Geva algorithm.
Abstract: Web pages today contain many forms and types of content. Web content may take the form of pictures, videos, audio files, and text files in different languages, and it can be multilingual, heterogeneous, and unstructured. Mining should therefore be independent of both language and software. Statistical features of the images are extracted from their pixel maps and presented to the fuzzy c-means (FCM) algorithm and the Gath–Geva algorithm, which use Euclidean distance and Gaussian distance, respectively, as the similarity metric. The accuracy of the two algorithms is compared and presented.
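The abstract does not give implementation details. As an illustration only, a minimal sketch of standard fuzzy c-means with squared Euclidean distance (the metric the abstract names for FCM) might look like the following. The fuzzifier `m=2.0`, the random initial partition, the tolerance, and the input format (a list of equal-length feature tuples, such as per-image statistics) are assumptions, not values taken from the chapter.

```python
import random

def fcm(points, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Fuzzy c-means with squared Euclidean distance (illustrative sketch).

    points: list of equal-length feature tuples
    c: number of clusters; m > 1: fuzzifier (assumed default 2.0)
    Returns (centers, u) where u[k][i] is the membership of point k in cluster i.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial fuzzy partition; each row is normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        s = sum(row)
        u.append([v / s for v in row])
    centers = []
    for _ in range(max_iter):
        # Centre update: mean of the points weighted by u^m.
        centers = []
        for i in range(c):
            w = [u[k][i] ** m for k in range(n)]
            sw = sum(w)
            centers.append(tuple(sum(w[k] * points[k][d] for k in range(n)) / sw
                                 for d in range(dim)))
        # Membership update: u_ik = 1 / sum_j (d2_ik / d2_jk)^(1/(m-1)).
        max_change = 0.0
        for k in range(n):
            d2 = [sum((points[k][d] - centers[i][d]) ** 2 for d in range(dim))
                  for i in range(c)]
            if min(d2) == 0.0:
                # Point coincides with a centre: give it crisp membership there.
                new_row = [1.0 if v == 0.0 else 0.0 for v in d2]
                s = sum(new_row)
                new_row = [v / s for v in new_row]
            else:
                new_row = [1.0 / sum((d2[i] / d2[j]) ** (1.0 / (m - 1.0)) for j in range(c))
                           for i in range(c)]
            max_change = max(max_change, max(abs(a - b) for a, b in zip(new_row, u[k])))
            u[k] = new_row
        if max_change < tol:
            break
    return centers, u
```

A crisp label for each point can be read off as the cluster with the highest membership, for example `max(range(c), key=lambda i: u[k][i])`.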
Citations
Journal Article
TL;DR: A modified unconscious search and its k-means hybrid for data clustering are proposed with two main modifications: generating the initial population by combining k-means solutions with random solutions, and replacing the usual local search step of the original US with an existing Heuristic Search method.
Abstract: Clustering is a widely used data mining technique with a diverse set of applications. Since clustering is an NP-hard problem, finding high-quality solutions for large-scale clustering problems can be an arduous and computationally expensive task. Therefore, many metaheuristics are used to solve these problems efficiently. In this paper, a modified unconscious search (US) and its k-means hybrid for data clustering are proposed with two main modifications: (1) generating the initial population by combining k-means solutions with random solutions, and (2) replacing the usual local search step of the original US with an existing Heuristic Search method. The modified US is tested on seven well-known benchmarks from the UCI Machine Learning Repository: Iris, Wine, Glass, Cancer, Vowel, CMC, and Ecoli. The results are then compared against other metaheuristics, such as a genetic algorithm, particle swarm optimization (PSO), the black hole algorithm, a hybrid of PSO and k-means, and accelerated chaotic PSO. The experimental results show that, averaged over all seven datasets, the best solutions obtained by the proposed methods are 0.176% better in quality than those of the other six algorithms used in the experiments.
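The abstract describes the first modification only at a high level. The sketch below shows one plausible way to build such a mixed initial population, encoding each candidate solution as a list of k centroids. The 50/50 split (`kmeans_share`), sampling random centroids inside the data's bounding box, and scoring candidates by total point-to-nearest-centroid distance are all assumptions made for illustration, not details from the paper.

```python
import math
import random

def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's k-means, started from k distinct random data points."""
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        new = []
        for i, g in enumerate(groups):
            if g:
                new.append([sum(coord) / len(g) for coord in zip(*g)])
            else:
                new.append(centroids[i])  # leave an empty cluster's centroid in place
        if new == centroids:
            break  # assignments are stable
        centroids = new
    return centroids

def random_solution(points, k, rng):
    """k centroids drawn uniformly inside the data's bounding box (assumption)."""
    dims = list(zip(*points))
    lo, hi = [min(d) for d in dims], [max(d) for d in dims]
    return [[rng.uniform(a, b) for a, b in zip(lo, hi)] for _ in range(k)]

def initial_population(points, k, size, kmeans_share=0.5, seed=0):
    """Mix k-means solutions with random ones; kmeans_share is hypothetical."""
    rng = random.Random(seed)
    n_km = int(round(size * kmeans_share))
    population = [kmeans(points, k, rng) for _ in range(n_km)]
    population += [random_solution(points, k, rng) for _ in range(size - n_km)]
    return population

def intra_cluster_distance(points, centroids):
    """Sum of distances from each point to its nearest centroid (lower is better)."""
    return sum(math.dist(p, centroids[nearest(p, centroids)]) for p in points)
```

Seeding part of the population with k-means results gives the metaheuristic some good starting points, while the random members keep diversity so the search is not confined to the k-means local optima.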
Journal Article
TL;DR: This study focuses on extracting information from data after it has been digitized; the method is generic and uses pixel maps of the data, which makes it independent of software and language.
Abstract: Objectives: The Internet is a repository containing enormous amounts of information about the past and present, which can be used to predict the future. To learn what they do not know, users are inclined to search the Internet rather than consult a library because of its easy availability. This creates a need to find the content of a web page in the shortest possible time, whatever form the page takes. Information and content extraction therefore needs to work at a basic, generic level and be easy to implement without depending on any major software. Methods: The study focuses on extracting information from data after it has been digitized. The digitized data is converted to pixel maps, which are universal, so the form and format of the web page content pose no problem. A statistical method extracts attributes from the images, so that language (and hence text script) and format do not cause issues; the extracted features are presented to the back propagation algorithm. Findings: The accuracy is presented, along with how content extraction within certain bounds could be possible. The method was tested using unstructured word sets chosen from web pages and is demonstrated for monolingual, multilingual, and transliterated documents, so its applicability is universal. Applications/Improvement: The method is generic and uses pixel maps of the data, which makes it independent of software and language.
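The abstract says only that a statistical method extracts attributes from the pixel map; it does not list which statistics are used. As an illustration, the sketch below computes a few common first-order statistics of a grayscale pixel map (mean, standard deviation, skewness, kurtosis, and histogram entropy). This particular feature set is an assumption, chosen because none of these values depend on the script or language of any text in the image.

```python
import math
from collections import Counter

def pixel_map_features(pixels):
    """Return (mean, std, skewness, kurtosis, entropy) of a grayscale pixel map.

    pixels: list of rows, each a list of integer intensities (e.g. 0-255).
    The choice of statistics is illustrative, not taken from the paper.
    """
    values = [v for row in pixels for v in row]
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n  # population variance
    std = math.sqrt(var)
    if std > 0:
        skew = sum((v - mean) ** 3 for v in values) / (n * std ** 3)
        kurt = sum((v - mean) ** 4 for v in values) / (n * std ** 4)
    else:
        skew, kurt = 0.0, 0.0  # flat image: higher moments undefined, report 0
    # Shannon entropy (in bits) of the intensity histogram.
    counts = Counter(values)
    entropy = -sum((cnt / n) * math.log2(cnt / n) for cnt in counts.values())
    return mean, std, skew, kurt, entropy
```

Each image then becomes a fixed-length vector regardless of its size or content, which is the kind of input a classifier such as a back propagation network expects.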
References
Proceedings Article
18 Jul 2010
TL;DR: A recursive extension of the Gath-Geva clustering algorithm is proposed as a basis for online tuning and development of neuro-fuzzy models, and the resulting model performs better than other online modeling approaches.
Abstract: This paper proposes a recursive extension of the Gath-Geva clustering algorithm, which is used as a basis for online tuning and development of neuro-fuzzy models. Unlike other online modeling approaches, which use spherical clusters to define the validity region of neurons, the proposed evolving neuro-fuzzy model (ENFM) can take advantage of elliptical clusters. This extension increases the ability of the local linear neurons of ENFM to capture system behavior in more complex regions, which reduces the number of neurons and increases modeling ability. The proposed model can adapt to changes in system behavior by adding new neurons or merging similar existing fuzzy rules. The efficiency of the evolving neuro-fuzzy model is investigated in predicting the Mackey-Glass and smoothed sunspot number time series. The simulation results show that the proposed model performs better than other online modeling approaches.
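The elliptical clusters mentioned above come from the per-cluster fuzzy covariance matrix used in the Gath-Geva distance. The sketch below shows one common batch formulation, restricted to two-dimensional features for brevity: the fuzzy covariance is weighted by u^m, and the squared distance is (2π)^(n/2) · sqrt(det F) / P · exp(½ (x − v)ᵀ F⁻¹ (x − v)), where P is the cluster's prior probability. The exact weighting and constant factor vary between formulations, so treat them as assumptions; this is not the recursive (online) extension the paper proposes.

```python
import math

def fuzzy_covariance_2d(points, weights, center, m=2.0):
    """Fuzzy covariance of one cluster: sum u^m (x-v)(x-v)^T / sum u^m, as a 2x2 list."""
    w = [u ** m for u in weights]
    sw = sum(w)
    sxx = sum(wk * (p[0] - center[0]) ** 2 for wk, p in zip(w, points)) / sw
    syy = sum(wk * (p[1] - center[1]) ** 2 for wk, p in zip(w, points)) / sw
    sxy = sum(wk * (p[0] - center[0]) * (p[1] - center[1]) for wk, p in zip(w, points)) / sw
    return [[sxx, sxy], [sxy, syy]]

def gath_geva_distance2(x, center, cov, prior):
    """Squared Gath-Geva (exponential) distance from point x to one cluster.

    prior: the cluster's prior probability, e.g. the mean membership over all points.
    """
    (a, b), (c, d) = cov
    det = a * d - b * c  # must be positive for a valid covariance
    inv = [[d / det, -b / det], [-c / det, a / det]]  # closed-form 2x2 inverse
    dx = (x[0] - center[0], x[1] - center[1])
    # Mahalanobis term (x - v)^T F^{-1} (x - v): small along the cluster's long axis.
    maha = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
            + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    return (2 * math.pi) ** (len(x) / 2) * math.sqrt(det) / prior * math.exp(0.5 * maha)
```

For a cluster stretched along the x-axis, the point (1, 0) is much "closer" than (0, 1) under this distance, even though both are one unit away from the centre in Euclidean terms. In a full Gath-Geva iteration these distances take the place of squared Euclidean distances in the usual fuzzy membership update.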

35 citations