
Showing papers on "Metric (mathematics) published in 2002"


Proceedings ArticleDOI
01 Jan 2002
TL;DR: The wide-baseline stereo problem, i.e. the problem of establishing correspondences between a pair of images taken from different viewpoints, is studied and an efficient and practically fast detection algorithm is presented for an affinely-invariant stable subset of extremal regions, the maximally stable extremal region (MSER).
Abstract: The wide-baseline stereo problem, i.e. the problem of establishing correspondences between a pair of images taken from different viewpoints, is studied. A new set of image elements that are put into correspondence, the so-called extremal regions, is introduced. Extremal regions possess highly desirable properties: the set is closed under (1) continuous (and thus projective) transformation of image coordinates and (2) monotonic transformation of image intensities. An efficient (near linear complexity) and practically fast detection algorithm (near frame rate) is presented for an affinely invariant stable subset of extremal regions, the maximally stable extremal regions (MSER). A new robust similarity measure for establishing tentative correspondences is proposed. The robustness ensures that invariants from multiple measurement regions (regions obtained by invariant constructions from extremal regions), some that are significantly larger (and hence discriminative) than the MSERs, may be used to establish tentative correspondences. The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes. Significant change of scale (3.5×), illumination conditions, out-of-plane rotation, occlusion, locally anisotropic scale change and 3D translation of the viewpoint are all present in the test problems. Good estimates of epipolar geometry (average distance from corresponding points to the epipolar line below 0.09 of the inter-pixel distance) are obtained.
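
For orientation, MSER detectors are now part of standard libraries; a minimal usage sketch with OpenCV (a later implementation, not the authors' original code; the input filename is hypothetical) might look like:

```python
# MSER detection sketch using OpenCV (cv2); "scene.png" is a hypothetical
# input file and the detector parameters are the library defaults.
import cv2

gray = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)

mser = cv2.MSER_create()                    # maximally stable extremal regions
regions, bboxes = mser.detectRegions(gray)  # each region is an array of pixel coords

print(f"detected {len(regions)} MSERs")
```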

3,400 citations


Book ChapterDOI
07 Mar 2002
TL;DR: In this paper, the authors describe a peer-to-peer distributed hash table with provable consistency and performance in a fault-prone environment, which routes queries and locates nodes using a novel XOR-based metric topology.
Abstract: We describe a peer-to-peer distributed hash table with provable consistency and performance in a fault-prone environment. Our system routes queries and locates nodes using a novel XOR-based metric topology that simplifies the algorithm and facilitates our proof. The topology has the property that every message exchanged conveys or reinforces useful contact information. The system exploits this information to send parallel, asynchronous query messages that tolerate node failures without imposing timeout delays on users.
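
As a rough illustration of the XOR-based metric (a sketch based on the description above, not code from the paper):

```python
# Sketch of the XOR distance used by Kademlia-style DHTs. Node IDs are
# modeled here as plain integers; real systems use 160-bit hashes.
def xor_distance(a: int, b: int) -> int:
    return a ^ b

# Metric properties: d(a, a) == 0, d(a, b) == d(b, a), and the triangle
# inequality holds because a ^ c == (a ^ b) ^ (b ^ c) and x ^ y <= x + y
# for non-negative integers.
assert xor_distance(0b1010, 0b1010) == 0
assert xor_distance(0b1010, 0b0110) == xor_distance(0b0110, 0b1010)
```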

3,196 citations


Proceedings Article
01 Jan 2002
TL;DR: This paper presents an algorithm that, given examples of similar (and, if desired, dissimilar) pairs of points in ℝn, learns a distance metric over ℝn that respects these relationships.
Abstract: Many algorithms rely critically on being given a good metric over their inputs. For instance, data can often be clustered in many "plausible" ways, and if a clustering algorithm such as K-means initially fails to find one that is meaningful to a user, the only recourse may be for the user to manually tweak the metric until sufficiently good clusters are found. For these and other applications requiring good metrics, it is desirable that we provide a more systematic way for users to indicate what they consider "similar." For instance, we may ask them to provide examples. In this paper, we present an algorithm that, given examples of similar (and, if desired, dissimilar) pairs of points in ℝn, learns a distance metric over ℝn that respects these relationships. Our method is based on posing metric learning as a convex optimization problem, which allows us to give efficient, local-optima-free algorithms. We also demonstrate empirically that the learned metrics can be used to significantly improve clustering performance.
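
Learned metrics of this kind are commonly parametrized as Mahalanobis-type distances d_A(x, y) = sqrt((x-y)^T A (x-y)) with A positive semidefinite; a minimal evaluation sketch, assuming A has already been learned by some procedure, is:

```python
# Sketch of evaluating a Mahalanobis-type learned metric
# d_A(x, y) = sqrt((x-y)^T A (x-y)); the matrix A here is a toy example,
# standing in for one produced by a metric-learning algorithm.
import numpy as np

def mahalanobis(x: np.ndarray, y: np.ndarray, A: np.ndarray) -> float:
    d = x - y
    return float(np.sqrt(d @ A @ d))

A = np.diag([2.0, 0.5])              # PSD: stretch dimension 0, shrink dimension 1
x, y = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(mahalanobis(x, y, A))          # distance under the learned metric
```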

3,176 citations


Journal ArticleDOI
TL;DR: In this paper, a method for combining results across independent-groups and repeated measures designs is described, and the conditions under which such an analysis is appropriate are discussed, and a meta-analysis procedure using design-specific estimates of sampling variance is described.
Abstract: When a meta-analysis on results from experimental studies is conducted, differences in the study design must be taken into consideration. A method for combining results across independent-groups and repeated measures designs is described, and the conditions under which such an analysis is appropriate are discussed. Combining results across designs requires that (a) all effect sizes be transformed into a common metric, (b) effect sizes from each design estimate the same treatment effect, and (c) meta-analysis procedures use design-specific estimates of sampling variance to reflect the precision of the effect size estimates.
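
For concreteness, the standardized mean differences involved and a common conversion between them can be written as follows (standard formulas, not quoted from the paper; ρ denotes the pre-post correlation):

```latex
% Standardized mean differences for independent-groups (IG) and
% repeated-measures (RM) designs, and a standard conversion between them
% (illustrative, not quoted from the paper); \rho is the pre-post correlation.
d_{IG} = \frac{M_T - M_C}{SD_{\text{pooled}}}, \qquad
d_{RM} = \frac{M_D}{SD_D}, \qquad SD_D = SD\,\sqrt{2(1-\rho)},
\qquad\text{so}\qquad d_{IG} = d_{RM}\,\sqrt{2(1-\rho)}.
```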

1,949 citations


Book ChapterDOI
01 Jan 2002
TL;DR: Dynamic time warping (DTW) is a much more robust distance measure for time series, allowing similar shapes to match even if they are out of phase in the time axis, but does not obey the triangular inequality and, thus, has resisted attempts at exact indexing.
Abstract: The indexing of very large time series databases has attracted much research interest in the database community in recent years. Most algorithms used to index time series utilize the Euclidean distance or some variation thereof. However, it has been forcefully shown that the Euclidean distance is a very brittle distance measure. Dynamic time warping (DTW) is a much more robust distance measure for time series, allowing similar shapes to match even if they are out of phase in the time axis. Because of this flexibility, DTW is widely used in science, medicine, industry, and finance. Unfortunately, however, DTW does not obey the triangular inequality and thus has resisted attempts at exact indexing. Instead, many researchers have introduced approximate indexing techniques, or have abandoned the idea of indexing and concentrated on speeding up sequential search.
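
A minimal sketch of the classic DTW dynamic program (illustrative; the chapter is about indexing DTW, not this computation itself):

```python
# Classic O(n*m) dynamic time warping between two 1-D sequences
# (illustrative sketch, not the chapter's code).
import numpy as np

def dtw(a, b):
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Two similar shapes, out of phase in the time axis: DTW distance stays small.
print(dtw([0, 0, 1, 2, 1, 0], [0, 1, 2, 1, 0, 0]))
```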

1,033 citations


Journal ArticleDOI
TL;DR: A simple technique is adopted that ensures metric cancellation, and thus freestream preservation, even on highly distorted curvilinear meshes; the cancellation is guaranteed regardless of the manner in which grid speeds are defined.

950 citations


Proceedings ArticleDOI
07 Nov 2002
TL;DR: An efficient method to estimate the distance between discrete 3D surfaces represented by triangular 3D meshes based on an approximation of the Hausdorff distance is proposed.
Abstract: This paper proposes an efficient method to estimate the distance between discrete 3D surfaces represented by triangular 3D meshes. The metric used is based on an approximation of the Hausdorff distance, which has been appropriately implemented in order to reduce unnecessary computation and memory usage. Results show that when compared to similar tools, a significant gain in both memory and speed can be achieved.
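
As a sketch of the underlying computation (point-sampled surfaces and a k-d tree; the paper's implementation additionally samples across triangles and optimizes memory usage):

```python
# Sampled (approximate) Hausdorff distance between two surfaces represented
# as point sets (illustrative sketch, not the paper's tool).
import numpy as np
from scipy.spatial import cKDTree

def one_sided_hausdorff(A: np.ndarray, B: np.ndarray) -> float:
    d, _ = cKDTree(B).query(A)   # nearest-neighbor distance from each point of A to B
    return float(d.max())

def hausdorff(A: np.ndarray, B: np.ndarray) -> float:
    return max(one_sided_hausdorff(A, B), one_sided_hausdorff(B, A))

A = np.random.rand(1000, 3)                     # toy samples of two surfaces
B = A + 0.01 * np.random.randn(1000, 3)
print(hausdorff(A, B))
```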

751 citations


Proceedings ArticleDOI
Robert Malouf1
31 Aug 2002
TL;DR: A number of algorithms for estimating the parameters of ME models are considered, including iterative scaling, gradient ascent, conjugate gradient, and variable metric methods.
Abstract: Conditional maximum entropy (ME) models provide a general purpose machine learning technique which has been successfully applied to fields as diverse as computer vision and econometrics, and which is used for a wide variety of classification problems in natural language processing. However, the flexibility of ME models is not without cost. While parameter estimation for ME models is conceptually straightforward, in practice ME models for typical natural language tasks are very large, and may well contain many thousands of free parameters. In this paper, we consider a number of algorithms for estimating the parameters of ME models, including iterative scaling, gradient ascent, conjugate gradient, and variable metric methods. Surprisingly, the standardly used iterative scaling algorithms perform quite poorly in comparison to the others, and for all of the test problems, a limited-memory variable metric algorithm outperformed the other choices.
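
A minimal sketch of the winning approach, fitting a conditional ME (multinomial logistic) model with a limited-memory variable metric method via SciPy (toy data and shapes; not the paper's estimation code):

```python
# Fitting a conditional maximum entropy (multinomial logistic) model with
# L-BFGS, a limited-memory variable metric method (illustrative sketch).
import numpy as np
from scipy.optimize import minimize

X = np.random.randn(200, 5)                 # toy feature vectors
y = np.random.randint(0, 3, size=200)       # one of k = 3 labels
k, d = 3, X.shape[1]

def neg_log_likelihood(w_flat):
    W = w_flat.reshape(k, d)
    scores = X @ W.T
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    log_z = np.log(np.exp(scores).sum(axis=1))
    return -(scores[np.arange(len(y)), y] - log_z).sum()

res = minimize(neg_log_likelihood, np.zeros(k * d), method="L-BFGS-B")
print(res.fun, res.success)
```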

730 citations


Proceedings ArticleDOI
10 Dec 2002
TL;DR: A no-reference blur metric based on the analysis of the spread of the edges in an image is presented, which is shown to perform well over a range of image content.
Abstract: We present a no-reference blur metric for images and video. The blur metric is based on the analysis of the spread of the edges in an image. Its perceptual significance is validated through subjective experiments. The novel metric is near real-time, has low computational complexity and is shown to perform well over a range of image content. Potential applications include optimization of source coding, network resource management and autofocus of an image capturing device.
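
A rough sketch of the edge-spread idea (an illustrative reconstruction from the description above, not the authors' implementation):

```python
# Crude no-reference blur estimate: find strong horizontal intensity
# transitions in each row, then measure the pixel extent of each transition.
# A larger mean edge width suggests a blurrier image.
import numpy as np

def mean_edge_width(gray: np.ndarray, thresh: float = 30.0) -> float:
    widths = []
    for r in range(gray.shape[0]):
        row = gray[r].astype(float)
        grad = np.diff(row)
        for c in np.where(np.abs(grad) > thresh)[0]:
            sign = np.sign(grad[c])
            left = c
            while left > 0 and np.sign(row[left] - row[left - 1]) == sign:
                left -= 1
            right = c + 1
            while right < len(row) - 1 and np.sign(row[right + 1] - row[right]) == sign:
                right += 1
            widths.append(right - left)    # spread of this edge, in pixels
    return float(np.mean(widths)) if widths else 0.0
```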

643 citations


Journal ArticleDOI
TL;DR: The first nontrivial polynomial-time approximation algorithms are provided for a general family of classification problems of this type, the metric labeling problem, which contains as special cases a number of standard classification frameworks, including several arising from the theory of Markov random fields.
Abstract: In a traditional classification problem, we wish to assign one of k labels (or classes) to each of n objects, in a way that is consistent with some observed data that we have about the problem. An active line of research in this area is concerned with classification when one has information about pairwise relationships among the objects to be classified; this issue is one of the principal motivations for the framework of Markov random fields, and it arises in areas such as image processing, biometry, and document analysis. In its most basic form, this style of analysis seeks to find a classification that optimizes a combinatorial function consisting of assignment costs---based on the individual choice of label we make for each object---and separation costs---based on the pair of choices we make for two "related" objects.We formulate a general classification problem of this type, the metric labeling problem; we show that it contains as special cases a number of standard classification frameworks, including several arising from the theory of Markov random fields. From the perspective of combinatorial optimization, our problem can be viewed as a substantial generalization of the multiway cut problem, and equivalent to a type of uncapacitated quadratic assignment problem.We provide the first nontrivial polynomial-time approximation algorithms for a general family of classification problems of this type. Our main result is an O(log k log log k)-approximation algorithm for the metric labeling problem, with respect to an arbitrary metric on a set of k labels, and an arbitrary weighted graph of relationships on a set of objects. For the special case in which the labels are endowed with the uniform metric---all distances are the same---our methods provide a 2-approximation algorithm.
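
In symbols, the objective described above can be written as follows, where d is the metric on the label set L and w_uv weights the relationship between objects u and v:

```latex
% Metric labeling objective (as described above): assignment costs plus
% metric-weighted separation costs over related pairs of objects.
\min_{f : V \to L}\;\; \sum_{u \in V} c\bigl(u, f(u)\bigr)
  \;+\; \sum_{(u,v) \in E} w_{uv}\, d\bigl(f(u), f(v)\bigr)
```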

502 citations


Journal ArticleDOI
TL;DR: A simple modification to the Pk metric is proposed, called WindowDiff, which moves a fixed-sized window across the text and penalizes the algorithm whenever the number of boundaries within the window does not match the true number of boundaries for that window of text.

Abstract: The Pk evaluation metric, initially proposed by Beeferman, Berger, and Lafferty (1997), is becoming the standard measure for assessing text segmentation algorithms. However, a theoretical analysis of the metric finds several problems: the metric penalizes false negatives more heavily than false positives, overpenalizes near misses, and is affected by variation in segment size distribution. We propose a simple modification to the Pk metric that remedies these problems. This new metric, called WindowDiff, moves a fixed-sized window across the text and penalizes the algorithm whenever the number of boundaries within the window does not match the true number of boundaries for that window of text.
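
A minimal sketch of WindowDiff as described (the usual convention of setting k to half the mean true segment length is standard but assumed here):

```python
# WindowDiff sketch: slide a window of size k and penalize any window where
# the hypothesized boundary count differs from the true boundary count.
def window_diff(ref, hyp, k):
    """ref, hyp: 0/1 lists where 1 marks a boundary after position i."""
    n = len(ref)
    errors = 0
    for i in range(n - k):
        r = sum(ref[i:i + k])     # true boundaries inside the window
        h = sum(hyp[i:i + k])     # hypothesized boundaries inside the window
        if r != h:
            errors += 1
    return errors / (n - k)

ref = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
hyp = [0, 1, 0, 0, 0, 0, 1, 0, 0, 0]   # a near miss on the first boundary
print(window_diff(ref, hyp, k=3))      # small penalty, not a double error
```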

Posted Content
TL;DR: The method of dual fitting and the idea of factor-revealing LP are formalized and used to design and analyze two greedy algorithms for the metric uncapacitated facility location problem.
Abstract: In this paper, we will formalize the method of dual fitting and the idea of factor-revealing LP. This combination is used to design and analyze two greedy algorithms for the metric uncapacitated facility location problem. Their approximation factors are 1.861 and 1.61, with running times of O(mlog m) and O(n^3), respectively, where n is the total number of vertices and m is the number of edges in the underlying complete bipartite graph between cities and facilities. The algorithms are used to improve recent results for several variants of the problem.

01 Nov 2002
TL;DR: This document refers to a metric for variation in delay of packets across Internet paths based on the difference in the One-Way-Delay of selected packets.
Abstract: This document refers to a metric for variation in delay of packets across Internet paths. The metric is based on the difference in the One-Way-Delay of selected packets. This difference in delay is called "IP Packet Delay Variation (ipdv)".
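
A minimal sketch of the computation described (here differencing consecutive packets; the metric allows other selection rules):

```python
# IP packet delay variation (ipdv) sketch: differences in one-way delay
# between selected packets, here taken to be consecutive ones.
def ipdv(one_way_delays):
    """one_way_delays: per-packet one-way delays in seconds, in send order."""
    return [b - a for a, b in zip(one_way_delays, one_way_delays[1:])]

delays = [0.100, 0.103, 0.099, 0.120]   # toy measurements
print(ipdv(delays))                     # ≈ [0.003, -0.004, 0.021]
```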

Journal ArticleDOI
TL;DR: In this paper, a method is described for representing human movement compactly, in terms of a linear superimposition of simpler movements termed primitives; it is part of a larger research project aimed at modeling motor control and imitation using the notion of perceptuo-motor primitives.
Abstract: We describe a new method for representing human movement compactly, in terms of a linear superimposition of simpler movements termed primitives. This method is part of a larger research project aimed at modeling motor control and imitation using the notion of perceptuo-motor primitives, a basis set of coupled perceptual and motor routines. In our model, the perceptual system is biased by the set of motor behaviors the agent can execute. Thus, an agent can automatically classify observed movements into its executable repertoire. In this paper, we describe a method for automatically deriving a set of primitives directly from human movement data. We used movement data gathered from a psychophysical experiment on human imitation to derive the primitives. The data were first filtered, then segmented, and principal component analysis was applied to the segments. The eigenvectors corresponding to a few of the highest eigenvalues provide us with a basis set of primitives. These are used, through superposition and sequencing, to reconstruct the training movements as well as novel ones. The validation of the method was performed on a humanoid simulation with physical dynamics. The effectiveness of the motion reconstruction was measured through an error metric. We also explored and evaluated a technique of clustering in the space of primitives for generating controllers for executing frequently used movements.
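
A toy sketch of the derivation step described, PCA over movement segments (illustrative shapes and random data, not the authors' pipeline):

```python
# Deriving movement "primitives" by PCA over segmented movement data, then
# reconstructing a segment as a linear superposition (illustrative sketch).
import numpy as np

segments = np.random.randn(500, 30)   # 500 segments, each flattened to 30 dims
segments -= segments.mean(axis=0)     # center the data

# Principal components via SVD; rows of Vt are candidate primitives.
U, S, Vt = np.linalg.svd(segments, full_matrices=False)
primitives = Vt[:5]                   # keep components with the largest variance

# Reconstruct a segment as a superposition of primitives.
coeffs = segments[0] @ primitives.T
reconstruction = coeffs @ primitives
print(np.linalg.norm(segments[0] - reconstruction))   # residual error
```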

Journal ArticleDOI
TL;DR: A chi-squared distance analysis is used to compute a flexible metric for producing neighborhoods that are highly adaptive to query locations and the class conditional probabilities are smoother in the modified neighborhoods, whereby better classification performance can be achieved.
Abstract: Nearest-neighbor classification assumes locally constant class conditional probabilities. This assumption becomes invalid in high dimensions with finite samples due to the curse of dimensionality. Severe bias can be introduced under these conditions when using the nearest-neighbor rule. We propose a locally adaptive nearest-neighbor classification method to try to minimize bias. We use a chi-squared distance analysis to compute a flexible metric for producing neighborhoods that are highly adaptive to query locations. Neighborhoods are elongated along less relevant feature dimensions and constricted along most influential ones. As a result, the class conditional probabilities are smoother in the modified neighborhoods, whereby better classification performance can be achieved. The efficacy of our method is validated and compared against other techniques using both simulated and real-world data.

Book ChapterDOI
14 Apr 2002
TL;DR: In this paper, an alternative information theoretic measure of anonymity is proposed that takes into account the probabilities of users sending and receiving the messages, and it is shown how to calculate it for a message in a standard mix-based anonymity system.
Abstract: In this paper we look closely at the popular metric of anonymity, the anonymity set, and point out a number of problems associated with it. We then propose an alternative information theoretic measure of anonymity which takes into account the probabilities of users sending and receiving the messages and show how to calculate it for a message in a standard mix-based anonymity system. We also use our metric to compare a pool mix to a traditional threshold mix, which was impossible using anonymity sets. We also show how the maximum route length restriction which exists in some fielded anonymity systems can lead to the attacker performing more powerful traffic analysis. Finally, we discuss open problems and future work on anonymity measurements.
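
A minimal sketch of an entropy-based anonymity measure in this spirit (illustrative; the paper develops the measure for concrete mix systems):

```python
# Entropy of the attacker's distribution over possible senders; the
# "effective anonymity set size" is then 2**H. Illustrative sketch.
import math

def anonymity_entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

uniform = [0.25] * 4                # classic anonymity set of size 4
skewed = [0.7, 0.1, 0.1, 0.1]       # same set size, much less anonymity
print(anonymity_entropy(uniform))   # 2.0 bits -> effective size 4
print(anonymity_entropy(skewed))    # ~1.36 bits -> effective size ~2.6
```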


Journal ArticleDOI
Pengzi Miao1
TL;DR: In this paper, a class of non-smooth asymptotically flat manifolds is studied on which the metric fails to be $C^1$ across a hypersurface $\Sigma$, and the Positive Mass Theorem is shown to still hold on these manifolds if a geometric boundary condition is satisfied by the metrics separated by $\Sigma$.
Abstract: We study a class of non-smooth asymptotically flat manifolds on which the metric fails to be $C^1$ across a hypersurface $\Sigma$. We first give an approximation scheme to mollify the metric, then we prove that the Positive Mass Theorem still holds on these manifolds if a geometric boundary condition is satisfied by the metrics separated by $\Sigma$.

Journal ArticleDOI
Alexander Barg1, D.Yu. Nogin
TL;DR: The Gilbert-Varshamov and Hamming bounds for packings of spheres (codes) in the Grassmann manifolds over R and C are derived.
Abstract: We derive the Gilbert-Varshamov and Hamming bounds for packings of spheres (codes) in the Grassmann manifolds over R and C. Asymptotic expressions are obtained for the geodesic metric and projection Frobenius (chordal) metric on the manifold.
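
For reference, the two metrics compared can be written in terms of the principal angles θ_1, ..., θ_k between two k-dimensional subspaces P and Q (standard definitions, stated here for orientation):

```latex
% Geodesic and projection Frobenius (chordal) metrics on the Grassmann
% manifold, in terms of principal angles (standard definitions).
d_{\mathrm{geo}}(P,Q) = \Bigl(\sum_{i=1}^{k} \theta_i^{2}\Bigr)^{1/2},
\qquad
d_{\mathrm{chord}}(P,Q) = \Bigl(\sum_{i=1}^{k} \sin^{2}\theta_i\Bigr)^{1/2}.
```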

Journal ArticleDOI
TL;DR: In this paper, a uniformization theory for a different type of generalized conformal structure is developed for a smooth Riemannian surface Z homeomorphic to the 2-sphere.
Abstract: According to the classical uniformization theorem, every smooth Riemannian surface Z homeomorphic to the 2-sphere is conformally diffeomorphic to S2 (the unit sphere in R3 equipped with the Riemannian metric induced by the ambient Euclidean metric). The availability of a similar uniformization procedure for spheres with a “generalized conformal structure” is highly desirable, in particular in connection with Thurston’s hyperbolization conjecture. This was addressed by Cannon in his combinatorial Riemann mapping theorem [7]. He considers topological surfaces equipped with a sequence of “shinglings”—a combinatorial structure that leads to a notion of approximate conformal moduli of rings. He then finds conditions that imply the existence of coordinate systems on the surface that relate these combinatorial moduli to classical analytic moduli in the plane. In this paper we develop a uniformization theory for a different type of generalized conformal structure. We start with a metric space Z homeomorphic to S2 and ask for conditions under which Z can be mapped onto S2 by a quasisymmetric homeomorphism. The class of quasisymmetries is an appropriate analog of conformal mappings in a metric space context. Quasisymmetric homeomorphisms also arise in the theory of Gromov hyperbolic metric spaces—quasi-isometries between Gromov hyperbolic spaces induce quasisymmetric boundary homeomorphisms. Our setup has the advantage that we can exploit recent notions and methods from analysis.
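
For reference, the standard definition of an η-quasisymmetric homeomorphism between metric spaces (with η a homeomorphism of [0, ∞)) is:

```latex
% Standard definition of a quasisymmetric homeomorphism f: X -> Y
% (stated for reference; distances written with |.| for brevity).
\frac{|f(x)-f(a)|}{|f(x)-f(b)|} \le
\eta\!\left(\frac{|x-a|}{|x-b|}\right)
\quad \text{for all distinct } x, a, b \in X.
```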

Posted Content
TL;DR: This work provides a summary and some new results concerning bounds among some important probability metrics/distances used by statisticians and probabilists, together with examples showing that rates of convergence can strongly depend on the metric chosen.
Abstract: When studying convergence of measures, an important issue is the choice of probability metric. In this review, we provide a summary and some new results concerning bounds among ten important probability metrics/distances that are used by statisticians and probabilists. We focus on these metrics because they are either well-known, commonly used, or admit practical bounding techniques. We summarize these relationships in a handy reference diagram, and also give examples to show how rates of convergence can depend on the metric chosen.
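
As a small illustration of the kind of relationships surveyed, here is total variation distance together with Pinsker's inequality bounding it by Kullback-Leibler divergence (a standard bound, not a new result of the review):

```python
# Total variation distance between discrete distributions, and Pinsker's
# inequality TV <= sqrt(KL/2) (natural log). Illustrative sketch.
import math

def total_variation(p, q):
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def kl_divergence(p, q):
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
tv, kl = total_variation(p, q), kl_divergence(p, q)
print(tv, math.sqrt(kl / 2), tv <= math.sqrt(kl / 2))   # bound holds
```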

Book ChapterDOI
Maxim Sviridenko1
27 May 2002
TL;DR: A new approximation algorithm for the metric uncapacitated facility location problem is designed, of LP rounding type and is based on a rounding technique developed in [5,6,7].
Abstract: We design a new approximation algorithm for the metric uncapacitated facility location problem. This algorithm is of LP rounding type and is based on a rounding technique developed in [5,6,7].

Journal ArticleDOI
TL;DR: This investigation shows that, even in the large-system limit, jammed systems of hard spheres can be generated with a wide range of packing fractions from φ ≈ 0.52 to the fcc limit, indicating that the density alone does not uniquely characterize a packing.
Abstract: Recently the conventional notion of random close packing has been supplanted by the more appropriate concept of the maximally random jammed (MRJ) state. This inevitably leads to the necessity of distinguishing the MRJ state among the entire collection of jammed packings. While the ideal method of addressing this question would be to enumerate and classify all possible jammed hard-sphere configurations, practical limitations prevent such a method from being employed. Instead, we generate numerically a large number of representative jammed hard-sphere configurations (primarily relying on a slight modification of the Lubachevsky-Stillinger algorithm to do so) and evaluate several commonly employed order metrics for each of these packings. Our investigation shows that, even in the large-system limit, jammed systems of hard spheres can be generated with a wide range of packing fractions from φ ≈ 0.52 to the fcc limit (φ ≈ 0.74). Moreover, at a fixed packing fraction, the variation in the order can be substantial, indicating that the density alone does not uniquely characterize a packing. Interestingly, each order metric evaluated yielded a relatively consistent estimate for the packing fraction of the maximally random jammed state (φ_MRJ ≈ 0.63). This estimate, however, is compromised by the weaknesses in the order metrics available, and we propose several guiding principles for future efforts to define more broadly applicable metrics.

Proceedings ArticleDOI
26 Jul 2002
TL;DR: This work builds a surface parametrization specialized to its signal, derived from a Taylor expansion of signal error, which is pre-integrated over the surface as a metric tensor for fast evaluation.
Abstract: To reduce memory requirements for texture mapping a model, we build a surface parametrization specialized to its signal (such as color or normal). Intuitively, we want to allocate more texture samples in regions with greater signal detail. Our approach is to minimize signal approximation error --- the difference between the original surface signal and its reconstruction from the sampled texture. Specifically, our signal-stretch parametrization metric is derived from a Taylor expansion of signal error. For fast evaluation, this metric is pre-integrated over the surface as a metric tensor. We minimize this nonlinear metric using a novel coarse-to-fine hierarchical solver, further accelerated with a fine-to-coarse propagation of the integrated metric tensor. Use of metric tensors permits anisotropic squashing of the parametrization along directions of low signal gradient. Texture area can often be reduced by a factor of 4 for a desired signal accuracy compared to non-specialized parametrizations.

Proceedings ArticleDOI
03 Jun 2002
TL;DR: The concept of compatibility matrix is introduced as the means to provide a probabilistic connection from the observation to the underlying true value and a new metric match is proposed to capture the "real support" of a pattern which would be expected if a noise-free environment is assumed.
Abstract: Pattern discovery in long sequences is of great importance in many applications including computational biology study, consumer behavior analysis, system performance analysis, etc. In a noisy environment, an observed sequence may not accurately reflect the underlying behavior. For example, in a protein sequence, the amino acid N is likely to mutate to D with little impact to the biological function of the protein. It would be desirable if the occurrence of D in the observation can be related to a possible mutation from N in an appropriate manner. Unfortunately, the support measure (i.e., the number of occurrences) of a pattern does not serve this purpose. In this paper, we introduce the concept of compatibility matrix as the means to provide a probabilistic connection from the observation to the underlying true value. A new metric match is also proposed to capture the "real support" of a pattern which would be expected if a noise-free environment is assumed. In addition, in the context we address, a pattern could be very long. The standard pruning technique developed for the market basket problem may not work efficiently. As a result, a novel algorithm that combines statistical sampling and a new technique (namely border collapsing) is devised to discover long patterns in a minimal number of scans of the sequence database with sufficiently high confidence. Empirical results demonstrate the robustness of the match model (with respect to the noise) and the efficiency of the probabilistic algorithm.
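
A toy sketch of the match idea as described (the compatibility matrix values and names here are illustrative assumptions, not the paper's):

```python
# Sketch of the "match" measure: weight each window of the sequence by the
# probability, given by a compatibility matrix, that the observed symbols
# reflect the pattern's true symbols. Toy matrix and names, not the paper's.
compat = {                        # compat[observed][true] = P(true | observed)
    "D": {"D": 0.9, "N": 0.1},
    "N": {"N": 0.9, "D": 0.1},
}

def match(pattern, sequence):
    total = 0.0
    for i in range(len(sequence) - len(pattern) + 1):
        p = 1.0
        for obs, true in zip(sequence[i:i + len(pattern)], pattern):
            p *= compat.get(obs, {}).get(true, 0.0)
        total += p                # "real support" contributed by this window
    return total

print(match("ND", "NDDN"))        # windows ND, DD, DN: 0.81 + 0.09 + 0.01 = 0.91
```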

Journal ArticleDOI
TL;DR: Two experiments tested predictions from a theory in which processing load depends on relational complexity (RC), the number of variables related in a single decision, and indicated that the RC approach to defining cognitive complexity is applicable to different content domains.

Proceedings ArticleDOI
05 Jun 2002
TL;DR: In the last several years, a number of very interesting results have been proved about finite metric spaces; this talk surveys these results and the many fascinating open problems in the area.
Abstract: In the last several years a number of very interesting results were proved about finite metric spaces. Some of this work is motivated by practical considerations: Large data sets (coming e.g. from computational molecular biology, brain research or data mining) can be viewed as large metric spaces that should be analyzed (e.g. correctly clustered).On the other hand, these investigations connect to some classical areas of geometry - the asymptotic theory of finite-dimensional normed spaces and differential geometry. Finally, the metric theory of finite graphs has proved very useful in the study of graphs per se and the design of approximation algorithms for hard computational problems. In this talk I will try to explain some of the results and review some of the emerging new connections and the many fascinating open problems in this area.

Proceedings ArticleDOI
03 Jun 2002
TL;DR: This paper defines what constitutes a good choice of a reference set, proposes sampling-based algorithms to identify them, and demonstrates the practical utility of the solutions using large collections of real and synthetic XML data sets.
Abstract: XML is widely recognized as the data interchange standard for tomorrow, because of its ability to represent data from a wide variety of sources. Hence, XML is likely to be the format through which data from multiple sources is integrated. In this paper we study the problem of integrating XML data sources through correlations realized as join operations. A challenging aspect of this operation is the XML document structure. Two documents might convey approximately or exactly the same information but may be quite different in structure. Consequently, approximate match in structure, in addition to content, has to be folded into the join operation. We quantify approximate match in structure and content using well defined notions of distance. For structure, we propose computationally inexpensive lower and upper bounds for the tree edit distance metric between two trees. We then show how the tree edit distance, and other metrics that quantify distance between trees, can be incorporated in a join framework. We introduce the notion of reference sets to facilitate this operation. Intuitively, a reference set consists of data elements used to project the data space. We characterize what constitutes a good choice of a reference set and we propose sampling-based algorithms to identify them. This gives rise to a variety of algorithmic approaches for the problem, which we formulate and analyze. We demonstrate the practical utility of our solutions using large collections of real and synthetic XML data sets.

Book ChapterDOI
TL;DR: The Non-negative Matrix Factorization (NMF) technique is introduced in the context of face classification, and a direct comparison with Principal Component Analysis (PCA) is also analyzed.
Abstract: The computer vision problem of face classification under several ambient and unfavorable conditions is considered in this study. Changes in expression, different lighting conditions, and occlusions are the relevant factors studied in this contribution. The Non-negative Matrix Factorization (NMF) technique is introduced in the context of face classification, and a direct comparison with Principal Component Analysis (PCA) is also analyzed. Two leading techniques in face recognition are also considered in this study, noticing that NMF is able to improve these techniques when a high-dimensional feature space is used. Finally, different distance metrics (L1, L2, and correlation) are evaluated in the feature space defined by NMF in order to determine the best one for this specific problem. Experiments demonstrate that correlation is the most suitable metric for this problem.
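
A minimal sketch of the pipeline described, NMF features compared under a correlation metric, using scikit-learn (toy random data; not the study's code):

```python
# NMF features for face classification with a correlation metric
# (illustrative sketch; toy non-negative data stands in for face images).
import numpy as np
from sklearn.decomposition import NMF

faces = np.abs(np.random.rand(100, 64 * 64))    # 100 non-negative "images"
model = NMF(n_components=25, init="nndsvd", max_iter=500)
W = model.fit_transform(faces)                  # per-image feature coefficients

def correlation_distance(a, b):
    a, b = a - a.mean(), b - b.mean()
    return 1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

# Nearest neighbor in NMF feature space under the correlation metric.
query = W[0]
dists = [correlation_distance(query, w) for w in W[1:]]
print(int(np.argmin(dists)) + 1)
```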

Journal ArticleDOI
Kentaro Toyama1, Andrew Blake1
TL;DR: A new, exemplar-based, probabilistic paradigm for visual tracking is presented, which provides alternatives to standard learning algorithms by allowing the use of metrics that are not embedded in a vector space and uses a noise model that is learned from training data.
Abstract: A new, exemplar-based, probabilistic paradigm for visual tracking is presented. Probabilistic mechanisms are attractive because they handle fusion of information, especially temporal fusion, in a principled manner. Exemplars are selected representatives of raw training data, used here to represent probabilistic mixture distributions of object configurations. Their use avoids tedious hand-construction of object models, and problems with changes of topology. Using exemplars in place of a parameterized model poses several challenges, addressed here with what we call the “Metric Mixture” (M2) approach, which has a number of attractions. Principally, it provides alternatives to standard learning algorithms by allowing the use of metrics that are not embedded in a vector space. Secondly, it uses a noise model that is learned from training data. Lastly, it eliminates any need for an assumption of probabilistic pixelwise independence. Experiments demonstrate the effectiveness of the M2 model in two domains: tracking walking people using “chamfer” distances on binary edge images, and tracking mouth movements by means of a shuffle distance.
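
As an illustration of the kind of non-vector-space metric the M2 approach accommodates, here is a minimal chamfer-distance sketch between binary edge maps (illustrative, using SciPy's distance transform):

```python
# Chamfer distance between binary edge images: average, over the template's
# edge pixels, of the distance to the nearest image edge pixel.
import numpy as np
from scipy.ndimage import distance_transform_edt

def chamfer(template_edges: np.ndarray, image_edges: np.ndarray) -> float:
    """Both inputs are boolean edge maps of equal shape."""
    dist_to_edges = distance_transform_edt(~image_edges)   # distance to nearest edge
    return float(dist_to_edges[template_edges].mean())

img = np.zeros((64, 64), dtype=bool); img[32, 10:50] = True
tpl = np.zeros((64, 64), dtype=bool); tpl[34, 10:50] = True   # shifted by 2 rows
print(chamfer(tpl, img))   # ≈ 2.0 pixels
```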