scispace - formally typeset
Search or ask a question
Author

Michalis Vazirgiannis

Bio: Michalis Vazirgiannis is an academic researcher from École Polytechnique. The author has contributed to research in topics: Graph (abstract data type) & Cluster analysis. The author has an hindex of 49, co-authored 326 publications receiving 13390 citations. Previous affiliations of Michalis Vazirgiannis include French Institute for Research in Computer Science and Automation & Télécom ParisTech.


Papers
More filters
Journal ArticleDOI
02 Dec 2001
TL;DR: The fundamental concepts of clustering are introduced while it surveys the widely known clustering algorithms in a comparative way and the issues that are under-addressed by the recent algorithms are illustrated.
Abstract: Cluster analysis aims at identifying groups of similar objects and, therefore helps to discover distribution of patterns and interesting correlations in large data sets. It has been subject of wide research since it arises in many application domains in engineering, business and social sciences. Especially, in the last years the availability of huge transactional and experimental data sets and the arising requirements for data mining created needs for clustering algorithms that scale and can be applied in diverse domains. This paper introduces the fundamental concepts of clustering while it surveys the widely known clustering algorithms in a comparative way. Moreover, it addresses an important issue of clustering process regarding the quality assessment of the clustering results. This is also related to the inherent features of the data set under concern. A review of clustering validity measures and approaches available in the literature is presented. Furthermore, the paper illustrates the issues that are under-addressed by the recent algorithms and gives the trends in clustering process.

2,643 citations

Journal ArticleDOI
TL;DR: This article introduces the modules that comprise a Web personalization system, emphasizing the Web usage mining module, and presents a review of the most common methods that are used as well as technical issues that occur.
Abstract: Web personalization is the process of customizing a Web site to the needs of specific users, taking advantage of the knowledge acquired from the analysis of the user's navigational behavior (usage data) in correlation with other information collected in the Web context, namely, structure, content, and user profile data. Due to the explosive growth of the Web, the domain of Web personalization has gained great momentum both in the research and commercial areas. In this article we present a survey of the use of Web mining for Web personalization. More specifically, we introduce the modules that comprise a Web personalization system, emphasizing the Web usage mining module. A review of the most common methods that are used as well as technical issues that occur is given, along with a brief overview of the most popular tools and applications available from software vendors. Moreover, the most important research initiatives in the Web usage mining and personalization areas are presented.

941 citations

Journal ArticleDOI
TL;DR: The paper formally defines the types and operations, offers detailed insight into the considerations that went into the design, and exemplifies the use of the abstract data types using SQL.
Abstract: Spatio-temporal databases deal with geometries changing over time. The goal of our work is to provide a DBMS data model and query language capable of handling such time-dependent geometries, including those changing continuously that describe moving objects. Two fundamental abstractions are moving point and moving region, describing objects for which only the time-dependent position, or position and extent, respectively, are of interest. We propose to present such time-dependent geometries as attribute data types with suitable operations, that is, to provide an abstract data type extension to a DBMS data model and query language. This paper presents a design of such a system of abstract data types. It turns out that besides the main types of interest, moving point and moving region, a relatively large number of auxiliary data types are needed. For example, one needs a line type to represent the projection of a moving point into the plane, or a “moving real” to represent the time-dependent distance of two points. It then becomes crucial to achieve (i) orthogonality in the design of the system, i.e., type constructors can be applied unifomly; (ii) genericity and consistency of operations, i.e., operations range over as many types as possible and behave consistently; and (iii) closure and consistency between structure and operations of nontemporal and related temporal types. Satisfying these goal leads to a simple and expressive system of abstract data types that may be integrated into a query language to yield a powerful language for querying spatio-temporal data, including moving objects. The paper formally defines the types and operations, offers detailed insight into the considerations that went into the design, and exemplifies the use of the abstract data types using SQL. The paper offers a precise and conceptually clean foundation for implementing a spatio-temporal DBMS extension.

784 citations

Journal ArticleDOI
01 Jun 2002
TL;DR: This paper presents a review of the clustering validity and methods based on external and internal criteria and discusses the cluster validity approaches based on internal and external criteria.
Abstract: Clustering is an unsupervised process since there are no predefined classes and no examples that would indicate grouping properties in the data set. The majority of the clustering algorithms behave differently depending on the features of the data set and the initial assumptions for defining groups. Therefore, in most applications the resulting clustering scheme requires some sort of evaluation as regards its validity. Evaluating and assessing the results of a clustering algorithm is the main subject of cluster validity. In this paper we present a review of the clustering validity and methods. More specifically, Part I of the paper discusses the cluster validity approaches based on external and internal criteria.

564 citations

Journal ArticleDOI
TL;DR: In this article, the authors present an in-depth comparative review of the methods presented so far for clustering directed networks along with the relevant necessary methodological background and also related applications.

533 citations


Cited by
More filters
Book
08 Sep 2000
TL;DR: This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.
Abstract: The increasing volume of data in modern business and science calls for more complex and sophisticated tools. Although advances in data mining technology have made extensive data collection much easier, it's still always evolving and there is a constant need for new techniques and tools that can help us transform this data into useful information and knowledge. Since the previous edition's publication, great advances have been made in the field of data mining. Not only does the third of edition of Data Mining: Concepts and Techniques continue the tradition of equipping you with an understanding and application of the theory and practice of discovering patterns hidden in large data sets, it also focuses on new, important topics in the field: data warehouses and data cube technology, mining stream, mining social networks, and mining spatial, multimedia and other complex data. Each chapter is a stand-alone guide to a critical topic, presenting proven algorithms and sound implementations ready to be used directly or with strategic modification against live data. This is the resource you need if you want to apply today's most powerful data mining techniques to meet real business challenges. * Presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects. * Addresses advanced topics such as mining object-relational databases, spatial databases, multimedia databases, time-series databases, text databases, the World Wide Web, and applications in several fields. *Provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data

23,600 citations

Journal ArticleDOI
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

13,246 citations

01 Jan 2002

9,314 citations

01 Jan 2009

7,241 citations

Journal ArticleDOI
TL;DR: Clustering algorithms for data sets appearing in statistics, computer science, and machine learning are surveyed, and their applications in some benchmark data sets, the traveling salesman problem, and bioinformatics, a new field attracting intensive efforts are illustrated.
Abstract: Data analysis plays an indispensable role for understanding various phenomena. Cluster analysis, primitive exploration with little or no prior knowledge, consists of research developed across a wide variety of communities. The diversity, on one hand, equips us with many tools. On the other hand, the profusion of options causes confusion. We survey clustering algorithms for data sets appearing in statistics, computer science, and machine learning, and illustrate their applications in some benchmark data sets, the traveling salesman problem, and bioinformatics, a new field attracting intensive efforts. Several tightly related topics, proximity measure, and cluster validation, are also discussed.

5,744 citations