
Showing papers by "University of Paderborn" published in 2003


Journal ArticleDOI
TL;DR: The Chemistry Development Kit provides methods for many common tasks in molecular informatics, including 2D and 3D rendering of chemical structures, I/O routines, SMILES parsing and generation, ring searches, isomorphism checking, structure diagram generation, etc.
Abstract: The Chemistry Development Kit (CDK) is a freely available open-source Java library for Structural Chemo- and Bioinformatics. Its architecture and capabilities as well as the development as an open-source project by a team of international collaborators from academic and industrial institutions are described. The CDK provides methods for many common tasks in molecular informatics, including 2D and 3D rendering of chemical structures, I/O routines, SMILES parsing and generation, ring searches, isomorphism checking, structure diagram generation, etc. Application scenarios as well as access information for interested users and potential contributors are given.

936 citations


Proceedings ArticleDOI
24 Apr 2003
TL;DR: The Sigma method is introduced as a new method for finding best local guides for each particle of the population from a set of Pareto-optimal solutions and the results are compared with the results of a multi-objective evolutionary algorithm (MOEA).
Abstract: In multi-objective particle swarm optimization (MOPSO) methods, selecting the best local guide (the global best particle) for each particle of the population from a set of Pareto-optimal solutions has a great impact on the convergence and diversity of solutions, especially when optimizing problems with a high number of objectives. This paper introduces the Sigma method as a new method for finding best local guides for each particle of the population. The Sigma method is implemented and compared with another method, which uses the strategy of an existing MOPSO method for finding the local guides. These methods are examined for different test functions and the results are compared with the results of a multi-objective evolutionary algorithm (MOEA).
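
The abstract does not state how the Sigma method assigns guides. The sketch below assumes a two-objective formulation in which each point gets the value sigma = (f1^2 - f2^2) / (f1^2 + f2^2) and each particle takes the archive member with the closest sigma value as its local guide. The function names, the handling of the origin, and the sample data are illustrative assumptions, not taken from the paper.

```python
def sigma(f1, f2):
    """Sigma value of a point in a two-objective space (assumed definition)."""
    denom = f1 * f1 + f2 * f2
    if denom == 0:
        return 0.0  # convention chosen here for the origin; an assumption
    return (f1 * f1 - f2 * f2) / denom


def pick_local_guide(particle, archive):
    """Return the archive member whose sigma value is closest to the particle's."""
    s = sigma(*particle)
    return min(archive, key=lambda member: abs(sigma(*member) - s))


# Hypothetical archive of non-dominated objective vectors (f1, f2).
archive = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0)]
guide = pick_local_guide((3.0, 1.5), archive)
```

Points on the same line through the origin share a sigma value, so under this assumption a particle is steered toward the archive member lying in roughly the same direction in objective space.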

679 citations


Proceedings Article
21 Aug 2003
TL;DR: This work presents a new approach that is especially designed to construct batches and incorporates a diversity measure that has low computational requirements making it feasible for large scale problems with several thousands of examples.
Abstract: In many real world applications, active selection of training examples can significantly reduce the number of labelled training examples to learn a classification function. Different strategies in the field of support vector machines have been proposed that iteratively select a single new example from a set of unlabelled examples, query the corresponding class label and then perform retraining of the current classifier. However, to reduce computational time for training, it might be necessary to select batches of new training examples instead of single examples. Strategies for single examples can be extended straightforwardly to select batches by choosing the h > 1 examples that get the highest values for the individual selection criterion. We present a new approach that is especially designed to construct batches and incorporates a diversity measure. It has low computational requirements making it feasible for large scale problems with several thousands of examples. Experimental results indicate that this approach provides a faster method to attain a level of generalization accuracy in terms of the number of labelled examples.
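
One way to picture batch construction with a diversity measure is a greedy loop that trades closeness to the decision boundary against similarity to examples already placed in the batch. The sketch below is an illustration under stated assumptions: the scoring rule (a weighted sum of |f(x)| and the largest absolute cosine similarity to the batch), the weight `lam`, the linear kernel, and all names are hypothetical and are not claimed to be the paper's exact criterion.

```python
import math


def cosine(u, v):
    """Cosine similarity of two nonzero vectors (linear kernel assumed)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def select_batch(pool, decision_values, h, lam=0.5):
    """Greedily pick h pool indices with small |f(x)| and low similarity to the batch."""
    h = min(h, len(pool))
    chosen = []
    while len(chosen) < h:
        best, best_score = None, float("inf")
        for i, x in enumerate(pool):
            if i in chosen:
                continue
            # Redundancy is 0 for the first pick, when the batch is still empty.
            redundancy = max((abs(cosine(x, pool[j])) for j in chosen), default=0.0)
            score = lam * abs(decision_values[i]) + (1 - lam) * redundancy
            if score < best_score:
                best, best_score = i, score
        chosen.append(best)
    return chosen


# Hypothetical pool: the first two points are near-duplicates close to the boundary.
pool = [(1.0, 0.0), (0.99, 0.1), (0.0, 1.0)]
decision_values = [0.1, 0.12, 0.3]
diverse_batch = select_batch(pool, decision_values, h=2, lam=0.5)
top_h_batch = select_batch(pool, decision_values, h=2, lam=1.0)
```

With `lam=1.0` the loop reduces to the straightforward extension mentioned in the abstract (take the h best examples by the individual criterion), which here selects the two near-duplicates; with `lam=0.5` the second pick moves to the dissimilar point.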

515 citations


Book ChapterDOI
23 Jun 2003
TL;DR: The design principles, the basic concepts, and the underlying XML technology of PNML are discussed to stimulate discussion on and contributions to a standard Petri net interchange format.
Abstract: The Petri Net Markup Language (PNML) is an XML-based interchange format for Petri nets. In order to support different versions of Petri nets and, in particular, future versions of Petri nets, PNML allows the definition of Petri net types. Due to this flexibility, PNML is a starting point for a standard interchange format for Petri nets. This paper discusses the design principles, the basic concepts, and the underlying XML technology of PNML. The main purpose of this paper is to disseminate the ideas of PNML and to stimulate discussion on and contributions to a standard Petri net interchange format.

391 citations


Book ChapterDOI
27 Jan 2003
TL;DR: An implementation-independent fault attack on AES is presented, able to determine the complete 128-bit secret key of a sealed tamper-proof smartcard by generating 128 faulty cipher texts.
Abstract: In this paper we describe several fault attacks on the Advanced Encryption Standard (AES). First, using optical/eddy current fault induction attacks as recently publicly presented by Skorobogatov, Anderson and Quisquater, Samyde (SA,QS), we present an implementation-independent fault attack on AES. This attack is able to determine the complete 128-bit secret key of a sealed tamper-proof smartcard by generating 128 faulty cipher texts. Second, we present several implementation-dependent fault attacks on AES. These attacks rely on the observation that due to the AES's known timing analysis vulnerability (as pointed out by Koeune and Quisquater (KQ)), any implementation of the AES must ensure a data independent timing behavior for the so-called AES xtime operation. We present fault attacks on AES based on various timing analysis resistant implementations of the xtime operation. Our strongest attack in this direction uses a very liberal fault model and requires only 256 faulty encryptions to determine a 128-bit key.
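
For readers unfamiliar with it, xtime is multiplication by x in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1 (0x11B). The sketch below contrasts a data-dependent version with a branch-free one of the kind the abstract alludes to. The function names are illustrative, the code does not reproduce any specific implementation analysed in the paper, and Python itself gives no timing guarantees, so this only shows the arithmetic structure.

```python
def xtime_branching(a):
    """xtime with a data-dependent branch: reduce only when the high bit was set."""
    a <<= 1
    if a & 0x100:
        a ^= 0x11B
    return a & 0xFF


def xtime_branch_free(a):
    """xtime without a branch: build the reduction mask from the high bit."""
    mask = -(a >> 7) & 0x1B  # 0x1B if bit 7 of a is set, otherwise 0
    return ((a << 1) & 0xFF) ^ mask


# Both variants agree on every byte value.
all_equal = all(xtime_branching(a) == xtime_branch_free(a) for a in range(256))
```

The branching variant performs the reduction only for inputs with the high bit set, which is the data dependence a timing analysis can exploit; the branch-free variant executes the same operations for every input.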

334 citations


Journal ArticleDOI
15 Feb 2003-Proteins
TL;DR: For both solutes, the distribution from the QM/MM simulation shows greater similarity with the distribution in high‐resolution protein structures than is the case for any of the MM simulations.
Abstract: We compare the conformational distributions of Ace-Ala-Nme and Ace-Gly-Nme sampled in long simulations with several molecular mechanics (MM) force fields and with a fast combined quantum mechanics/molecular mechanics (QM/MM) force field, in which the solute's intramolecular energy and forces are calculated with the self-consistent charge density functional tight binding method (SCCDFTB), and the solvent is represented by either one of the well-known SPC and TIP3P models. All MM force fields give two main states for Ace-Ala-Nme, beta and alpha separated by free energy barriers, but the ratio in which these are sampled varies by a factor of 30, from a high in favor of beta of 6 to a low of 1/5. The frequency of transitions between states is particularly low with the amber and charmm force fields, for which the distributions are noticeably narrower, and the energy barriers between states higher. The lower of the two barriers lies between alpha and beta at values of psi near 0 for all MM simulations except for charmm22. The results of the QM/MM simulations vary less with the choice of MM force field; the ratio beta/alpha varies between 1.5 and 2.2, the easy pass lies at psi near 0, and transitions between states are more frequent than for amber and charmm, but less frequent than for cedar. For Ace-Gly-Nme, all force fields locate a diffuse stable region around phi = pi and psi = pi, whereas the amber force field gives two additional densely sampled states near phi = +/-100 degrees and psi = 0, which are also found with the QM/MM force field. For both solutes, the distribution from the QM/MM simulation shows greater similarity with the distribution in high-resolution protein structures than is the case for any of the MM simulations.

266 citations


Journal ArticleDOI
TL;DR: In this article, new functional forms have been developed for multiparameter equations of state for non- and weakly polar fluids and for polar fluids, which were established with an optimization algorithm that considers data sets for different fluids simultaneously.
Abstract: New functional forms have been developed for multiparameter equations of state for non- and weakly polar fluids and for polar fluids. The resulting functional forms, which were established with an optimization algorithm which considers data sets for different fluids simultaneously, are suitable as a basis for equations of state for a broad variety of fluids. The functional forms were designed to fulfil typical demands of advanced technical application with regard to the achieved accuracy. They are numerically very stable and their substance-specific coefficients can easily be fitted to restricted data sets. In this way, a fast extension of the group of fluids for which accurate empirical equations of state are available is now possible. This article deals with the results found for the polar fluids CFC-11 (trichlorofluoromethane), CFC-12 (dichlorodifluoromethane), HCFC-22 (chlorodifluoromethane), HFC-32 (difluoromethane), CFC-113 (1,1,2-trichlorotrifluoroethane), HCFC-123 (2,2-dichloro-1,1,1-trifluoroethane), HFC-125 (pentafluoroethane), HFC-134a (1,1,1,2-tetrafluoroethane), HFC-143a (1,1,1-trifluoroethane), HFC-152a (1,1-difluoroethane), carbon dioxide, and ammonia. The substance-specific parameters of the new equations of state are given as well as statistical and graphical comparisons with experimental data. General features of the new class of equations of state such as their extrapolation behavior or their numerical stability and results for non- and weakly polar fluids have been discussed in preceding articles.

242 citations


Journal ArticleDOI
TL;DR: In this article, new functional forms for multiparameter equations of state were developed for non- and weakly polar fluids and for polar fluids, which were established with an optimization algorithm that considers data sets for different fluids simultaneously.
Abstract: New functional forms for multiparameter equations of state have been developed for non- and weakly polar fluids and for polar fluids. The resulting functional forms, which were established with an optimization algorithm which considers data sets for different fluids simultaneously, are suitable as a basis for equations of state for a broad variety of fluids. With regard to the achieved accuracy, the functional forms were designed to fulfill typical demands of advanced technical application. They are numerically very stable, and their substance-specific coefficients can easily be fitted to restricted data sets. In this way, a fast extension of the group of fluids for which accurate empirical equations of state are available becomes possible. This article deals with characteristic features of the new class of simultaneously optimized equations of state. Shortcomings of existing multiparameter equations of state widely used in technical applications are briefly discussed, and demands on the new class of equations of state are formulated. Substance specific parameters and detailed comparisons are given in subsequent articles for the non- and weakly polar fluids (methane, ethane, propane, isobutane, n-butane, n-pentane, n-hexane, n-heptane, n-octane, argon, oxygen, nitrogen, ethylene, cyclohexane, and sulfur hexafluoride) and for the polar fluids (trichlorofluoromethane (CFC-11), dichlorodifluoromethane (CFC-12), chlorodifluoromethane (HCFC-22), difluoromethane (HFC-32), 1,1,2-trichlorotrifluoroethane (CFC-113), 2,2-dichloro-1,1,1-trifluoroethane (HCFC-123), pentafluoroethane (HFC-125), 1,1,1,2-tetrafluoroethane (HFC-134a), 1,1,1-trifluoroethane (HFC-143a), 1,1-difluoroethane (HFC-152a), carbon dioxide, and ammonia) considered to date.

237 citations


Journal ArticleDOI
TL;DR: In this article, it was shown that every height-three complete intersection has the weak or strong Lefschetz property, and a sharp bound on the graded Betti numbers of K-algebras with the weak or strong Lefschetz property and fixed Hilbert function was given.

216 citations


Proceedings ArticleDOI
09 Jun 2003
TL;DR: This work gives a polynomial time construction that guarantees Racke's bounds, and more generally gives the true optimal ratio for any network.
Abstract: A recent seminal result of Racke is that for any network there is an oblivious routing algorithm with a polylog competitive ratio with respect to congestion. Unfortunately, Racke's construction is not polynomial time. We give a polynomial time construction that guarantees Racke's bounds, and more generally gives the true optimal ratio for any network.

208 citations


Journal ArticleDOI
TL;DR: In this paper, a software system called ADAPCRACK3D is developed by the authors to predict fatigue crack growth in arbitrary 3D geometries under complex loading by the use of the finite element method.

Journal ArticleDOI
TL;DR: In this article, the authors focus on two important soft factors, corporate culture and human resource management, that are necessary for a successful implementation of a service-oriented strategy in industrial marketing companies and analyze the mediating role of these two soft factors in the causal chain leading from a service-oriented strategy to organizational performance.
Abstract: It has been recognized that in today's highly competitive industrial markets, one of the few ways left to gain differentiation from competitors is by offering value-added services. To do so, however, requires a service-oriented strategy and the active implementation of this strategy which includes significant internal changes in management philosophy and approach. Unfortunately, no study has examined the implementation aspects of a service-oriented strategy. In this context, our research focuses on two important “soft factors,” corporate culture and human resource management, that are necessary for a successful implementation of a service-oriented strategy in industrial marketing companies. We analyze the mediating role of these two soft factors in the causal chain leading from a service-oriented strategy to organizational performance. We find that the soft factors play an important mediating role in the link between a service-oriented strategy and organizational performance.

Journal ArticleDOI
TL;DR: In this article, a self-consistent charge density functional tight-binding scheme (SCC-DFTB) is proposed to extend its applicability to biomolecular structures, which has been implemented into quantum mechanical/molecular mechanics and linear scaling schemes and augmented with an empirical treatment of the dispersion forces.
Abstract: In recent years, we have developed a computationally efficient approximation to density functional theory, the so-called self-consistent charge density functional tight-binding scheme (SCC-DFTB). To extend its applicability to biomolecular structures, this method has been implemented into quantum mechanical/molecular mechanics (QM/MM) and linear scaling schemes and augmented with an empirical treatment of the dispersion forces. We review here applications of the SCC-DFTB QM/MM method to proton transfer (PT) reactions in enzymes like liver alcohol dehydrogenase and triosephosphate isomerase. The computational speed of SCC-DFTB makes it possible to compute not only minimum energy pathways for the PT but also the potential of mean force. Further applications concern the dynamics of polypeptides in solution and of ligands in their biological environment. The developments reviewed allowed, for the first time, realistic QM simulations of polypeptides, a protein and a DNA dodecamer on the nanosecond time scale.

Proceedings ArticleDOI
12 Jan 2003
TL;DR: This paper shows how very deep min-max and duality theorems from Graph Minors can be used to obtain essential speed-up to many known algorithms on different domination problems.
Abstract: Graph minors theory, developed by Robertson & Seymour, provides a list of powerful theoretical results and tools. However, the widespread opinion in the graph algorithms community is that this theory is mainly of theoretical importance. The main purpose of this paper is to show how very deep min-max and duality theorems from Graph Minors can be used to obtain essential speed-ups for many known algorithms on different domination problems.

Proceedings ArticleDOI
01 Sep 2003
TL;DR: This work provides a domain specific formal semantic definition for a subset of the UML 2.0 component model and an integrated sequence of design steps which prescribe how to compose complex software systems from domain-specific patterns which model a particular part of the system behavior in a well-defined context.
Abstract: Current techniques for the verification of software, such as model checking, are limited when it comes to the verification of complex distributed embedded real-time systems. Our approach addresses this problem and in particular the state explosion problem for the software controlling mechatronic systems, as we provide a domain specific formal semantic definition for a subset of the UML 2.0 component model and an integrated sequence of design steps. These steps prescribe how to compose complex software systems from domain-specific patterns which model a particular part of the system behavior in a well-defined context. The correctness of these patterns can be verified individually because they have only simple communication behavior and have only a fixed number of participating roles. The composition of these patterns to describe the complete component behavior and the overall system behavior is prescribed by a rigorous syntactic definition which guarantees that the verification of component and system behavior can exploit the results of the verification of individual patterns.


Book ChapterDOI
24 Jun 2003
TL;DR: Nearly all existing HPC systems are operated by resource management systems based on the queuing approach, but with the increasing acceptance of grid middleware like Globus, new requirements for the underlying local resource management system arise.
Abstract: Nearly all existing HPC systems are operated by resource management systems based on the queuing approach. With the increasing acceptance of grid middleware like Globus, new requirements for the underlying local resource management systems arise. Features like advanced reservation or quality of service are needed to implement high level functions like co-allocation. However, it is difficult to realize these features with a resource management system based on the queuing concept, since it considers only the present resource usage.

Proceedings ArticleDOI
27 Oct 2003
TL;DR: This paper describes a new algorithm to prevent fault attacks on RSA signature algorithms using the Chinese Remainder Theorem (CRT-RSA), and proves that the new algorithm is secure against the Bellcore attack.
Abstract: In this paper we describe a new algorithm to prevent fault attacks on RSA signature algorithms using the Chinese Remainder Theorem (CRT-RSA). This variant of the RSA signature algorithm is widely used on smartcards. Smartcards on the other hand are particularly susceptible to fault attacks like the one described in [7]. Recent results have shown that fault attacks are practical and easy to accomplish ([21], [17]). Therefore, they establish a practical need for fault attack protected CRT-RSA schemes. Starting from a careful derivation and classification of fault models, we describe a new variant of the CRT-RSA algorithm. For the most realistic fault model described, we rigorously analyze the success probability of an adversary against our new CRT-RSA algorithm. Thereby, we prove that our new algorithm is secure against the Bellcore attack.
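
To see why unprotected CRT-RSA is fragile, the sketch below reproduces the well-known Bellcore attack on toy parameters: if one of the two half-exponentiations is faulty, a gcd reveals a prime factor of N. It illustrates the attack only, not the countermeasure proposed in the paper. The tiny primes, the one-bit fault model, and the function name `crt_sign` are assumptions made for the example.

```python
from math import gcd


def crt_sign(m, p, q, d, fault_in_q=False):
    """Toy CRT-RSA signature; optionally inject a one-bit fault into the mod-q half."""
    s_p = pow(m, d % (p - 1), p)
    s_q = pow(m, d % (q - 1), q)
    if fault_in_q:
        s_q ^= 1  # hypothetical fault model: flip the lowest bit of s_q
    # Garner recombination; pow(q, -1, p) needs Python 3.8 or later.
    h = (pow(q, -1, p) * (s_p - s_q)) % p
    return s_q + q * h


# Textbook-sized toy key, far too small for real use.
p, q, e, d = 61, 53, 17, 2753
n = p * q
m = 65

good = crt_sign(m, p, q, d)
bad = crt_sign(m, p, q, d, fault_in_q=True)

# Both signatures agree modulo p but differ modulo q, so the gcd exposes p.
factor_from_pair = gcd(good - bad, n)
# The faulty signature alone also suffices when the message is known.
factor_from_single = gcd(pow(bad, e, n) - m, n)
```

A common generic defence is to verify the signature before releasing it; the paper instead analyses a dedicated CRT-RSA variant, which is not reproduced here.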

Journal ArticleDOI
TL;DR: Parameters for the zinc ion have been developed in the self‐consistent charge density functional tight‐binding (SCC‐DFTB) framework and the approach was tested against B3LYP calculations for a range of systems, including small molecules that contain the typical coordination environment of zinc in biological systems.
Abstract: Parameters for the zinc ion have been developed in the self-consistent charge density functional tight-binding (SCC-DFTB) framework. The approach was tested against B3LYP calculations for a range of systems, including small molecules that contain the typical coordination environment of zinc in biological systems (cysteine, histidine, glutamic/aspartic acids, and water) and active site models for a number of enzymes such as alcohol dehydrogenase, carbonic anhydrase, and aminopeptidase. The SCC-DFTB approach reproduces structural and energetic properties rather reliably (e.g., total and relative ligand binding energies and deprotonation energies of ligands and barriers for zinc-assisted proton transfers), as compared with B3LYP/6-311+G** or MP2/6-311+G** calculations.

Journal ArticleDOI
TL;DR: The present data indicate a potential role of lycopene degradation products in cell signaling enhancing cell-to-cell communication via gap junctions in rat liver epithelial WB-F344 cells.

Proceedings ArticleDOI
01 Sep 2003
TL;DR: This work proposes UML models of both the architectural style of the platform and the application scenario; based on a formal interpretation of these as graphs and graph transformation systems, the consistency between platform and application can be validated.
Abstract: Most applications developed today rely on a given middleware platform which governs the interaction between components, the access to resources, etc. To decide which platform is suitable for a given application (or more generally, to understand the interaction between application and platform), we propose UML models of both the architectural style of the platform and the application scenario. Based on a formal interpretation of these as graphs and graph transformation systems, we are able to validate the consistency between platform and application.

Book ChapterDOI
17 Aug 2003
TL;DR: It is shown that, for RSA with a small public exponent, half of the bits of \(d_p = d \bmod (p-1)\) suffice to find the factorization of N in polynomial time; the method therefore belongs to the strongest known partial key exposure attacks.
Abstract: In 1998, Boneh, Durfee and Frankel [4] presented several attacks on RSA when an adversary knows a fraction of the secret key bits. The motivation for these so-called partial key exposure attacks mainly arises from the study of side-channel attacks on RSA. With side-channel attacks an adversary gets either most significant or least significant bits of the secret key. The polynomial time algorithms given in [4] only work provided that the public key e is smaller than \(N^{\frac{1}{2}}\). It was raised as an open question whether there are polynomial time attacks beyond this bound. We answer this open question in the present work both in the case of most and least significant bits. Our algorithms make use of Coppersmith’s heuristic method for solving modular multivariate polynomial equations [8]. For known most significant bits, we provide an algorithm that works for public exponents e in the interval \([N^{\frac{1}{2}}, N^{0.725}]\). Surprisingly, we get an even stronger result for known least significant bits: an algorithm that works for all \(e < N^{\frac{7}{8}}\).

Book
01 Jan 2003
TL;DR: This edited volume collects eleven chapters on SystemC-based design, ranging from a system-on-chip modelling and design methodology and transaction-level models to embedded software generation and the modeling and refinement of mixed-signal systems with SystemC.
Abstract: Foreword. Preface. 1: A SystemC Based System On Chip Modelling and Design Methodology Y. Vanderperren, M. Pauwels, W. Dehaene, A.Berna, F. OEzdemir. 2: Using Transactional Level Models in a SoC Design Flow A. Clouard, K. Jain, F. Ghenassia, L. Maillet-Contoz, J.-P. Strassen. 3: Refining a High Level SystemC Model B. Niemann, F. Mayer, F.J. Rabano Rubio, M. Speitel. 4: An ASM Based SystemC Simulation Semantics W. Muller, J. Ruf, W. Rosenstiel. 5: SystemC as a Complete Design and Validation Environment A. Fin, F. Fummi, G. Pravadelli. 6: System Level Performance Estimation N. Pazos, W. Brunnbauer, J. Foag, T. Wild. 7: Design of Protocol Dominated Digital Systems R. Siegmund, U. Pross, D. Muller. 8: Object Oriented Hardware Design and Synthesis Based on SystemC 2.0 E. Grimpe, W. Nebel, F. Oppenheimer, T. Schubert. 9: Embedded Software Generation from SystemC for Platform Based Design F. Herrera, V. Fernandez, P. Sanchez, E. Villar. 10: SystemC-AMS: Rationales, State of the Art, and Examples K. Einwich, P. Schwarz, C. Grimm, C. Meise. 11: Modeling and Refinement of Mixed-Signal Systems with SystemC C. Grimm. References. Index.

01 Jan 2003
TL;DR: One of the most debatable features of Event-driven Process Chains (EPCs) is their non-local semantics; most proposed non-local semantics either have a formal flaw or lack a formal definition.
Abstract: One of the most debatable features of Event-driven Process Chains (EPCs) is their non-local semantics. Most non-local semantics for EPCs either have a formal flaw or are given no formal definition at all.

Journal ArticleDOI
01 Feb 2003
TL;DR: The Paderborn University BSP (PUB) library is a C communication library based on the BSP model that supports buffered as well as unbuffered non-blocking communication between any pair of processors and a mechanism for synchronizing the processors in a barrier style.
Abstract: The Paderborn University BSP (PUB) library is a C communication library based on the BSP model. The basic library supports buffered as well as unbuffered non-blocking communication between any pair of processors and a mechanism for synchronizing the processors in a barrier style. In addition, PUB provides non-blocking collective communication operations on arbitrary subsets of processors, the ability to partition the processors into independent groups that execute asynchronously from each other, and a zero-cost synchronization mechanism. Furthermore, some techniques used in the implementation of the PUB library deviate significantly from the techniques used in other BSP libraries.

Book ChapterDOI
30 Jun 2003
TL;DR: The first result is an \(O(nm^2)\) time algorithm for Nashification, which can be used in combination with any approximation algorithm for the routing problem to compute a Nash equilibrium of the same quality and yields a PTAS for the computation of a best Nash equilibrium.
Abstract: We study the problem of n users selfishly routing traffic through a network consisting of m parallel related links. Users route their traffic by choosing private probability distributions over the links with the aim of minimizing their private latency. In such an environment Nash equilibria represent stable states of the system: no user can improve its private latency by unilaterally changing its strategy. Nashification is the problem of converting any given non-equilibrium routing into a Nash equilibrium without increasing the social cost. Our first result is an \(O(nm^2)\) time algorithm for Nashification. This algorithm can be used in combination with any approximation algorithm for the routing problem to compute a Nash equilibrium of the same quality. In particular, this approach yields a PTAS for the computation of a best Nash equilibrium. Furthermore, we prove a lower bound of \(\Omega(2^{\sqrt{n}})\) and an upper bound of \(O(2^n)\) for the number of greedy selfish steps for identical link capacities in the worst case. In the second part of the paper we introduce a new structural parameter which allows us to slightly improve the upper bound on the coordination ratio for pure Nash equilibria in [3]. The new bound holds for the individual coordination ratio and is asymptotically tight. Additionally, we prove that the known upper bound of \(\frac{1+\sqrt{4m-3}}{2}\) on the coordination ratio for pure Nash equilibria also holds for the individual coordination ratio in case of mixed Nash equilibria, and we determine the range of m for which this bound is tight.
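
The notions of a pure Nash equilibrium and of a selfish step on parallel related links can be made concrete with a small sketch. It assumes pure strategies only, a link latency equal to load divided by capacity, and a simple "first user that can improve moves to its best link" rule. It is plain best-response iteration, not the Nashification algorithm of the paper, and all names and data are hypothetical.

```python
def link_loads(assign, weights, m):
    """Total traffic on each of the m links for a pure assignment."""
    loads = [0.0] * m
    for user, link in enumerate(assign):
        loads[link] += weights[user]
    return loads


def improving_move(assign, weights, caps):
    """Return (user, link) for the first user who can lower its latency, else None."""
    loads = link_loads(assign, weights, len(caps))
    for user, cur in enumerate(assign):
        def latency_if_on(k):
            extra = 0.0 if k == cur else weights[user]
            return (loads[k] + extra) / caps[k]
        best = min(range(len(caps)), key=latency_if_on)
        if latency_if_on(best) < latency_if_on(cur):
            return user, best
    return None


def selfish_steps(assign, weights, caps, max_steps=10_000):
    """Apply selfish steps until no user can improve; return (assignment, step count)."""
    assign = list(assign)
    for steps in range(max_steps + 1):
        move = improving_move(assign, weights, caps)
        if move is None:
            return assign, steps
        user, link = move
        assign[user] = link
    raise RuntimeError("no equilibrium reached within max_steps")


# Hypothetical instance: three users, two identical links, everyone starts on link 0.
weights = [3.0, 2.0, 2.0]
caps = [1.0, 1.0]
equilibrium, steps = selfish_steps([0, 0, 0], weights, caps)
```

An assignment is a pure Nash equilibrium exactly when `improving_move` returns `None`; the step counter corresponds to the quantity bounded in the abstract only in spirit, since the order in which users move is an arbitrary choice made here.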

Journal ArticleDOI
TL;DR: In this article, a review of the properties of silicon-based two-dimensional (2D) photonic crystals is given, covering essentially infinite 2D photonic crystals made from macroporous silicon and photonic crystal slabs based on silicon-on-insulator.
Abstract: A review of the properties of silicon-based two-dimensional (2D) photonic crystals is given, covering essentially infinite 2D photonic crystals made from macroporous silicon and photonic crystal slabs based on silicon-on-insulator. We discuss the bulk photonic crystal properties with particular attention to the light cone and its impact on the band structure. The application for wave guiding is discussed for both material systems, and compared to classical waveguides based on index-guiding. Losses of resonant waveguide modes above the light line are discussed in detail. © 2003 Elsevier B.V. All rights reserved.

Proceedings ArticleDOI
08 Dec 2003
TL;DR: The results show that the ε-dominance method can find solutions much faster than the clustering technique with comparable and even in some cases better convergence and diversity.
Abstract: In this paper, the influence of ε-dominance on multi-objective particle swarm optimization (MOPSO) methods is studied. The most important role of ε-dominance is to bound the number of non-dominated solutions stored in the archive (archive size), which has influences on computational time, convergence and diversity of solutions. Here, ε-dominance is compared with the existing clustering technique for fixing the archive size and the solutions are compared in terms of computational time, convergence and diversity. A new diversity metric is also suggested. The results show that the ε-dominance method can find solutions much faster than the clustering technique with comparable and even in some cases better convergence and diversity.
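
How ε-dominance bounds the archive can be sketched as follows. The example assumes minimization and the additive form of ε-dominance (a ε-dominates b if a_i - ε <= b_i for every objective i); a multiplicative form also appears in the literature, and the abstract does not say which one the paper uses. The update rule and names are illustrative only.

```python
def dominates(a, b):
    """Ordinary Pareto dominance for minimization."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def eps_dominates(a, b, eps):
    """Additive epsilon-dominance for minimization (assumed variant)."""
    return all(x - eps <= y for x, y in zip(a, b))


def update_archive(archive, candidate, eps):
    """Reject candidates that are eps-dominated; otherwise insert and prune."""
    if any(eps_dominates(member, candidate, eps) for member in archive):
        return archive
    kept = [member for member in archive if not dominates(candidate, member)]
    kept.append(candidate)
    return kept


def build_archive(points, eps):
    archive = []
    for point in points:
        archive = update_archive(archive, point, eps)
    return archive


# Hypothetical objective vectors; the second is only marginally different from the first.
points = [(1.0, 3.0), (1.2, 2.9), (3.0, 1.0)]
coarse = build_archive(points, eps=0.5)
exact = build_archive(points, eps=0.0)
```

With `eps=0.5` the near-duplicate point is rejected and the archive stays at two members, while `eps=0.0` keeps all three mutually non-dominated points, which is the size-limiting effect the abstract describes.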

Journal ArticleDOI
TL;DR: Template wetting provides customized nanotubes and allows us to investigate how the wall curvature affects the structure formation.
Abstract: Nanotubes have an outstanding potential both for applications in nanotechnology and as the subject of basic research. Wetting of porous templates is a simple technique that overcomes many limitations of established preparation methods. It extends the range of processable materials, for example, by a broad range of multicomponent mixtures or by high-performance polymers such as poly(oxy-1,4-phenyleneoxy-1,4-phenylenecarbonyl-1,4-phenylene) (PEEK) and polytetrafluoroethylene (PTFE). Inducing controlled phase transitions generates a large specific surface, a specific nanoporosity, or oriented crystalline domains within the nanotube walls. Template wetting provides customized nanotubes and allows us to investigate how the wall curvature affects the structure formation.

01 Jan 2003
TL;DR: This analysis includes the classical cluster validity measures from Dunn and Davies-Bouldin as well as the new graph-based measures Λ (weighted edge connectivity) and ρ (expected edge density); on real-world document clustering data, the classical measures are clearly outperformed by the expected edge density ρ.
Abstract: In the field of information retrieval, clustering algorithms are used to analyze large collections of documents with the objective to form groups of similar documents. Clustering a document collection is an ambiguous task: A clustering, i.e. a set of document groups, depends on the chosen clustering algorithm as well as on the algorithm’s parameter settings. To find the best among several clusterings, it is common practice to evaluate their internal structures with a cluster validity measure. A clustering is considered to be useful to a user if particular structural properties are well developed. Nevertheless, the presence of certain structural properties may not guarantee usefulness from an information retrieval standpoint, say, whether or not the found document groups resemble the classification of a human editor. The present paper investigates this point: Based on already classified document collections we generate clusterings and compare the predicted quality to their real quality. Our analysis includes the classical cluster validity measures from Dunn and Davies-Bouldin as well as the new graph-based measures Λ (weighted edge connectivity) and ρ (expected edge density). The experiments show interesting results: The classical measures behave in a consistent manner insofar as mediocre and poor clusterings are identified as such. On real-world document clustering data, however, they are definitely outperformed by the expected edge density ρ. This superiority of the graph-based measures can be explained by their independence of cluster forms and distances.
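
As a point of reference for the classical measures, the Dunn index in its original form is the smallest distance between points of different clusters divided by the largest cluster diameter. The sketch below assumes Euclidean distance and tiny hand-made clusterings; the graph-based measures Λ and ρ are not defined in the abstract and are therefore not reproduced.

```python
from itertools import combinations
from math import dist  # Python 3.8 or later


def dunn_index(clusters):
    """Dunn index: minimum inter-cluster distance over maximum cluster diameter."""
    min_separation = min(
        dist(a, b)
        for c1, c2 in combinations(clusters, 2)
        for a in c1
        for b in c2
    )
    max_diameter = max(
        (dist(a, b) for c in clusters for a, b in combinations(c, 2)),
        default=0.0,
    )
    if max_diameter == 0.0:
        return float("inf")  # all clusters are single points
    return min_separation / max_diameter


# Two hypothetical clusterings of the same four points.
compact = [[(0, 0), (0, 1)], [(5, 0), (5, 1)]]
stretched = [[(0, 0), (5, 0)], [(0, 1), (5, 1)]]
compact_score = dunn_index(compact)
stretched_score = dunn_index(stretched)
```

Higher values indicate compact, well-separated groups. Because the index depends directly on distances and diameters, it is sensitive to cluster shape, which is the kind of dependence the abstract contrasts with the graph-based measures.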