
Showing papers on "Metric (mathematics)" published in 2010


Journal ArticleDOI
TL;DR: It is shown that targeted transport processes without global topology knowledge are maximally efficient, according to all efficiency measures, in networks with strongest heterogeneity and clustering, and that this efficiency is remarkably robust with respect to even catastrophic disturbances and damages to the network structure.
Abstract: We develop a geometric framework to study the structure and function of complex networks. We assume that hyperbolic geometry underlies these networks, and we show that with this assumption, heterogeneous degree distributions and strong clustering in complex networks emerge naturally as simple reflections of the negative curvature and metric property of the underlying hyperbolic geometry. Conversely, we show that if a network has some metric structure, and if the network degree distribution is heterogeneous, then the network has an effective hyperbolic geometry underneath. We then establish a mapping between our geometric framework and statistical mechanics of complex networks. This mapping interprets edges in a network as noninteracting fermions whose energies are hyperbolic distances between nodes, while the auxiliary fields coupled to edges are linear functions of these energies or distances. The geometric network ensemble subsumes the standard configuration model and classical random graphs as two limiting cases with degenerate geometric structures. Finally, we show that targeted transport processes without global topology knowledge, made possible by our geometric framework, are maximally efficient, according to all efficiency measures, in networks with strongest heterogeneity and clustering, and that this efficiency is remarkably robust with respect to even catastrophic disturbances and damages to the network structure.
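For readers who want to experiment with this model, here is a minimal Python sketch of the two ingredients the abstract describes: the hyperbolic distance between nodes placed at polar coordinates in the hyperbolic plane, and a Fermi-Dirac connection probability in which that distance plays the role of an energy. The parameter names R (radius, acting as a chemical potential) and T (temperature, controlling clustering) follow the usual convention for such models and are assumptions rather than quotations from the paper.

```python
import math

def hyperbolic_distance(r1, theta1, r2, theta2):
    """Distance between two points (r, theta) in the hyperbolic plane (K = -1)."""
    dtheta = math.pi - abs(math.pi - abs(theta1 - theta2))  # angular separation in [0, pi]
    x = (math.cosh(r1) * math.cosh(r2)
         - math.sinh(r1) * math.sinh(r2) * math.cos(dtheta))
    return math.acosh(max(x, 1.0))  # guard against rounding just below 1

def connection_probability(d, R, T):
    """Fermi-Dirac form: an edge is a 'fermion' whose energy is the distance d;
    R acts like a chemical potential and T like a temperature."""
    return 1.0 / (1.0 + math.exp((d - R) / (2.0 * T)))
```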

1,002 citations


Journal ArticleDOI
TL;DR: The generalized metric, as discussed by the authors, is a T-duality covariant symmetric matrix constructed from the metric and two-form gauge field that arises in generalized geometry; it can be viewed as a metric on the doubled spacetime and used to give a simple formulation, with manifest T-duality, of the double field theory that describes the massless sector of closed strings.
Abstract: The generalized metric is a T-duality covariant symmetric matrix constructed from the metric and two-form gauge field and arises in generalized geometry. We view it here as a metric on the doubled spacetime and use it to give a simple formulation with manifest T-duality of the double field theory that describes the massless sector of closed strings. The gauge transformations are written in terms of a generalized Lie derivative whose commutator algebra is defined by a double field theory extension of the Courant bracket.
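For context, one standard parametrization of such a generalized metric in terms of the spacetime metric $g$ and the two-form $B$ (given here as background from generalized geometry, not quoted from the paper) is

\[
\mathcal{H}_{MN} =
\begin{pmatrix}
 g_{ij} - B_{ik}\, g^{kl} B_{lj} & B_{ik}\, g^{kj} \\
 -\, g^{ik} B_{kj} & g^{ij}
\end{pmatrix},
\]

a symmetric $2D \times 2D$ matrix that transforms covariantly under the $O(D,D)$ T-duality group acting on the doubled coordinates.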

743 citations


Book ChapterDOI
08 Nov 2010
TL;DR: This paper proposes a new method, named Cosine Similarity Metric Learning (CSML), for learning a distance metric for face verification, which achieves the highest accuracy reported in the literature.
Abstract: Face verification is the task of deciding by analyzing face images, whether a person is who he/she claims to be. This is very challenging due to image variations in lighting, pose, facial expression, and age. The task boils down to computing the distance between two face vectors. As such, appropriate distance metrics are essential for face verification accuracy. In this paper we propose a new method, named the Cosine Similarity Metric Learning (CSML) for learning a distance metric for facial verification. The use of cosine similarity in our method leads to an effective learning algorithm which can improve the generalization ability of any given metric. Our method is tested on the state-of-the-art dataset, the Labeled Faces in the Wild (LFW), and has achieved the highest accuracy in the literature.
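A minimal sketch of the quantity such a method optimizes: the cosine similarity between face descriptors after a learned linear transform A. The transform, threshold, and dimensions below are illustrative placeholders, not values from the paper.

```python
import numpy as np

def cosine_similarity(x, y, A):
    """Cosine similarity between descriptors x, y after a learned linear map A."""
    u, v = A @ x, A @ y
    return float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)

def same_person(x, y, A, threshold=0.5):
    """Verification decision: accept the claimed identity if the transformed
    cosine similarity exceeds a threshold chosen on validation data."""
    return cosine_similarity(x, y, A) >= threshold
```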

626 citations


Journal ArticleDOI
TL;DR: A no-reference metric Q, based on the singular value decomposition of the local image gradient matrix, is proposed; it provides a quantitative measure of true image content in the presence of noise and other disturbances and is used to automatically and effectively set the parameters of two leading image denoising algorithms.
Abstract: Across the field of inverse problems in image and video processing, nearly all algorithms have various parameters which need to be set in order to yield good results. In practice, usually the choice of such parameters is made empirically with trial and error if no “ground-truth” reference is available. Some analytical methods such as cross-validation and Stein's unbiased risk estimate (SURE) have been successfully used to set such parameters. However, these methods tend to be strongly reliant on restrictive assumptions on the noise, and also computationally heavy. In this paper, we propose a no-reference metric Q which is based upon the singular value decomposition of the local image gradient matrix, and provides a quantitative measure of true image content (i.e., sharpness and contrast as manifested in visually salient geometric features such as edges) in the presence of noise and other disturbances. This measure 1) is easy to compute, 2) reacts reasonably to both blur and random noise, and 3) works well even when the noise is not Gaussian. The proposed measure is used to automatically and effectively set the parameters of two leading image denoising algorithms. Ample simulated and real data experiments support our claims. Furthermore, tests using the TID2008 database show that this measure correlates well with subjective quality evaluations for both blur and noise distortions.
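A hedged sketch of the per-patch quantity behind such a measure, as I read the abstract: form the N x 2 matrix of pixel gradients in a patch, take its singular values s1 >= s2, and combine them so that the value is large only when the patch has strong, coherent structure; the full metric aggregates this over selected (anisotropic) patches. Function and variable names are illustrative.

```python
import numpy as np

def patch_content_measure(patch):
    """Content/sharpness measure of a grayscale patch from the singular values
    of its local gradient matrix: s1 * (s1 - s2) / (s1 + s2)."""
    gy, gx = np.gradient(patch.astype(float))
    G = np.column_stack([gx.ravel(), gy.ravel()])   # N x 2 gradient matrix
    s = np.linalg.svd(G, compute_uv=False)          # singular values, s[0] >= s[1]
    s1, s2 = float(s[0]), float(s[1])
    if s1 + s2 == 0.0:                              # flat patch: no content
        return 0.0
    return s1 * (s1 - s2) / (s1 + s2)
```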

388 citations


Journal ArticleDOI
TL;DR: In this paper, a physical explanation of the Kontsevich-Soibelman wall-crossing formula for the BPS spectrum in Seiberg-Witten theories is given.
Abstract: We give a physical explanation of the Kontsevich-Soibelman wall-crossing formula for the BPS spectrum in Seiberg-Witten theories. In the process we give an exact description of the BPS instanton corrections to the hyperkahler metric of the moduli space of the theory on \({\mathbb R^3 \times S^1}\). The wall-crossing formula reduces to the statement that this metric is continuous. Our construction of the metric uses a four-dimensional analogue of the two-dimensional tt* equations.

373 citations


Proceedings Article
21 Jun 2010
TL;DR: A general metric learning algorithm is presented, based on the structural SVM framework, to learn a metric such that rankings of data induced by distance from a query can be optimized against various ranking measures, such as AUC, Precision-at-k, MRR, MAP or NDCG.
Abstract: We study metric learning as a problem of information retrieval. We present a general metric learning algorithm, based on the structural SVM framework, to learn a metric such that rankings of data induced by distance from a query can be optimized against various ranking measures, such as AUC, Precision-at-k, MRR, MAP or NDCG. We demonstrate experimental results on standard classification data sets, and a large-scale online dating recommendation problem.

371 citations


Journal ArticleDOI
TL;DR: The present findings indicate that the FCR is a sensitive, valid, and reliable acoustic metric for distinguishing dysarthric from unimpaired speech and for monitoring treatment effects, probably because of reduced sensitivity to interspeaker variability and enhanced sensitivity to vowel centralization.
Abstract: Purpose The vowel space area (VSA) has been used as an acoustic metric of dysarthric speech, but with varying degrees of success. In this study, the authors aimed to test an alternative metric to t...

316 citations


Journal Article
TL;DR: It is shown that within this framework, one can prove a theorem analogous to one of Kleinberg (2002), in which one obtains an existence and uniqueness theorem instead of a non-existence result.
Abstract: We study hierarchical clustering schemes under an axiomatic view. We show that within this framework, one can prove a theorem analogous to one of Kleinberg (2002), in which one obtains an existence and uniqueness theorem instead of a non-existence result. We explore further properties of this unique scheme: stability and convergence are established. We represent dendrograms as ultrametric spaces and use tools from metric geometry, namely the Gromov-Hausdorff distance, to quantify the degree to which perturbations in the input metric space affect the result of hierarchical methods.
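The dendrogram-as-ultrametric view can be tried directly with SciPy: single linkage produces a dendrogram whose cophenetic distances form an ultrametric on the input points. This is an illustrative sketch of that correspondence, not the authors' code.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage, cophenet

X = np.random.rand(10, 3)            # ten points in R^3 (toy data)
D = pdist(X)                         # condensed pairwise distances
Z = linkage(D, method='single')      # single-linkage dendrogram
U = squareform(cophenet(Z))          # cophenetic distances: U[i, j] is the
                                     # merge height of i and j, an ultrametric
                                     # satisfying U[i, k] <= max(U[i, j], U[j, k])
```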

306 citations


Journal ArticleDOI
TL;DR: This paper explores the applicability of diffusion distances within the Gromov-Hausdorff framework and finds that in addition to the relatively low complexity involved in the computation of the diffusion distances between surface points, its recognition and matching performances favorably compare to the classical geodesic distances in the presence of topological changes between the non-rigid shapes.
Abstract: In this paper, the problem of non-rigid shape recognition is studied from the perspective of metric geometry. In particular, we explore the applicability of diffusion distances within the Gromov-Hausdorff framework. While the traditionally used geodesic distance exploits the shortest path between points on the surface, the diffusion distance averages all paths connecting the points. The diffusion distance constitutes an intrinsic metric which is robust, in particular, to topological changes. Such changes in the form of shortcuts, holes, and missing data may be a result of natural non-rigid deformations as well as acquisition and representation noise due to inaccurate surface construction. The presentation of the proposed framework is complemented with examples demonstrating that in addition to the relatively low complexity involved in the computation of the diffusion distances between surface points, its recognition and matching performances favorably compare to the classical geodesic distances in the presence of topological changes between the non-rigid shapes.
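For reference, the standard spectral form of the diffusion distance (background, not quoted from the paper): with Laplace-Beltrami eigenvalues $\lambda_i$ and eigenfunctions $\phi_i$, the distance at diffusion time $t$ between surface points $x$ and $y$ is

\[
 d_t^2(x, y) = \sum_{i \ge 1} e^{-2\lambda_i t}\,\bigl(\phi_i(x) - \phi_i(y)\bigr)^2,
\]

so it aggregates over all connecting paths, whereas the geodesic distance is the length of a single shortest path; this averaging is what makes it robust to shortcuts, holes, and missing data.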

306 citations


Proceedings ArticleDOI
Ravi Kumar1, Sergei Vassilvitskii1
26 Apr 2010
TL;DR: This work extends Spearman's footrule and Kendall's tau to those with position and element weights, and shows that a variant of the Diaconis-Graham inequality still holds - the generalized two measures remain within a constant factor of each other for all permutations.
Abstract: Spearman's footrule and Kendall's tau are two well-established distances between rankings. They, however, fail to take into account concepts crucial to evaluating a result set in information retrieval: element relevance and positional information. That is, changing the rank of a highly-relevant document should result in a higher penalty than changing the rank of an irrelevant document; a similar logic holds for the top versus the bottom of the result ordering. In this work, we extend both of these metrics to those with position and element weights, and show that a variant of the Diaconis-Graham inequality still holds - the two generalized measures remain within a constant factor of each other for all permutations. We continue by extending the element weights into a distance metric between elements. For example, in search evaluation, swapping the order of two nearly duplicate results should result in little penalty, even if these two are highly relevant and appear at the top of the list. We extend the distance measures to this more general case and show that they remain within a constant factor of each other. We conclude by conducting simple experiments on web search data with the proposed measures. Our experiments show that the weighted generalizations are more robust and consistent with each other than their unweighted counterparts.
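As a concrete illustration of element weighting (a simplified variant in the spirit of the paper, not its exact definition, and ignoring position weights), a weighted Kendall distance can charge each discordant pair the product of its element weights:

```python
from itertools import combinations

def element_weighted_kendall(sigma, tau, w):
    """Element-weighted Kendall distance between two rankings.
    sigma, tau: dicts mapping element -> rank; w: dict mapping element -> weight.
    Each discordant pair (i, j) contributes w[i] * w[j]; unit weights recover
    the classical Kendall tau distance."""
    total = 0.0
    for i, j in combinations(list(sigma), 2):
        if (sigma[i] - sigma[j]) * (tau[i] - tau[j]) < 0:   # pair ordered differently
            total += w[i] * w[j]
    return total
```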

288 citations


Journal ArticleDOI
TL;DR: A simulation study using data models and analysis of real microarray data shows that for small samples the root mean square differences of the estimated and true metrics are considerable, and even for large samples, there is only weak correlation between the true and estimated metrics.
Abstract: Motivation: The receiver operator characteristic (ROC) curves are commonly used in biomedical applications to judge the performance of a discriminant across varying decision thresholds. The estimated ROC curve depends on the true positive rate (TPR) and false positive rate (FPR), with the key metric being the area under the curve (AUC). With small samples these rates need to be estimated from the training data, so a natural question arises: How well do the estimates of the AUC, TPR and FPR compare with the true metrics? Results: Through a simulation study using data models and analysis of real microarray data, we show that (i) for small samples the root mean square differences of the estimated and true metrics are considerable; (ii) even for large samples, there is only weak correlation between the true and estimated metrics; and (iii) generally, there is weak regression of the true metric on the estimated metric. For classification rules, we consider linear discriminant analysis, linear support vector machine (SVM) and radial basis function SVM. For error estimation, we consider resubstitution, three kinds of cross-validation and bootstrap. Using resampling, we show the unreliability of some published ROC results. Availability: Companion web site at http://compbio.tgen.org/paper_supp/ROC/roc.html Contact: edward@mail.ece.tamu.edu
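The estimation issue being studied can be reproduced in a few lines: compare a resubstitution AUC (scored on the training data) with a cross-validated AUC on a small synthetic data set. This is an illustrative sketch using scikit-learn, not the paper's experimental protocol or data.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 200))            # 60 samples, 200 features: small-sample regime
y = rng.integers(0, 2, size=60)           # labels carry no signal here

# Resubstitution: score the same data the classifier was trained on (optimistic).
clf = LinearSVC().fit(X, y)
auc_resub = roc_auc_score(y, clf.decision_function(X))

# Cross-validation: scores come from held-out folds.
scores_cv = cross_val_predict(LinearSVC(), X, y, cv=5, method='decision_function')
auc_cv = roc_auc_score(y, scores_cv)

print(f"resubstitution AUC = {auc_resub:.3f}, cross-validated AUC = {auc_cv:.3f}")
```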

Journal ArticleDOI
TL;DR: In this paper, the authors reformulate the Hamiltonian form of bosonic eleven-dimensional supergravity in terms of an object that unifies the three-form and the metric.
Abstract: We reformulate the Hamiltonian form of bosonic eleven-dimensional supergravity in terms of an object that unifies the three-form and the metric. For the case of four spatial dimensions, the duality group is manifest and the metric and C-field are on an equal footing even though no dimensional reduction is required for our results to hold. One may also describe our results using the generalized geometry that emerges from membrane duality. The relationship between the twisted Courant algebra and the gauge symmetries of eleven-dimensional supergravity is described in detail.

Journal ArticleDOI
TL;DR: A new metric is presented which combines the numerical information of the votes with independent information derived from those values, based on the proportions of common and uncommon votes between each pair of users, and which outperforms traditional metrics in accuracy.
Abstract: Recommender systems are typically provided as Web 2.0 services and are part of the range of applications that give support to large-scale social networks, enabling on-line recommendations to be made based on the use of networked databases. The operating core of recommender systems is based on the collaborative filtering stage, which, in current user-to-user recommender processes, usually uses the Pearson correlation metric. In this paper, we present a new metric which combines the numerical information of the votes with independent information from those values, based on the proportions of the common and uncommon votes between each pair of users. Likewise, we define the reasoning and experiments on which the design of the metric is based and the restriction of being applied to recommender systems where the possible range of votes is not greater than 5. In order to demonstrate the superior nature of the proposed metric, we provide the comparative results of a set of experiments based on the MovieLens, FilmAffinity and NetFlix databases. In addition to the traditional levels of accuracy, results are also provided on the metrics' coverage, the percentage of hits obtained and the precision/recall.

Proceedings Article
16 May 2010
TL;DR: This paper studies the impact of different attribute- and topology-based sampling strategies on the discovery of an important social media phenomenon, information diffusion, and develops a series of metrics for evaluating the quality of the sample.
Abstract: Platforms such as Twitter have provided researchers with ample opportunities to analytically study social phenomena. There are, however, significant computational challenges due to the enormous rate of production of new information: researchers are therefore often forced to analyze a judiciously selected “sample” of the data. Like other social media phenomena, information diffusion is a social process: it is affected by user context and topic, in addition to the graph topology. This paper studies the impact of different attribute- and topology-based sampling strategies on the discovery of an important social media phenomenon, information diffusion. We examine several widely adopted sampling methods that select nodes based on attribute (random, location, and activity) and topology (forest fire), as well as study the impact of attribute-based seed selection on topology-based sampling. Then we develop a series of metrics for evaluating the quality of the sample, based on user activity (e.g. volume, number of seeds), topological (e.g. reach, spread) and temporal characteristics (e.g. rate). We additionally correlate the diffusion volume metric with two external variables, search and news trends. Our experiments reveal that for small sample sizes (30%), a sample that incorporates both topology and user context (e.g. location, activity) can improve on naive methods by a significant margin of ~15-20%.

Posted Content
TL;DR: In this paper, a generalization of stochastic bandits where the set of arms is allowed to be a generic measurable space and the mean-payoff function is locally Lipschitz with respect to a dissimilarity function that is known to the decision maker is considered.
Abstract: We consider a generalization of stochastic bandits where the set of arms, $\mathcal{X}$, is allowed to be a generic measurable space and the mean-payoff function is "locally Lipschitz" with respect to a dissimilarity function that is known to the decision maker. Under this condition we construct an arm selection policy, called HOO (hierarchical optimistic optimization), with improved regret bounds compared to previous results for a large class of problems. In particular, our results imply that if $\mathcal{X}$ is the unit hypercube in a Euclidean space and the mean-payoff function has a finite number of global maxima around which the behavior of the function is locally continuous with a known smoothness degree, then the expected regret of HOO is bounded up to a logarithmic factor by $\sqrt{n}$, i.e., the rate of growth of the regret is independent of the dimension of the space. We also prove the minimax optimality of our algorithm when the dissimilarity is a metric. Our basic strategy has quadratic computational complexity as a function of the number of time steps and does not rely on the doubling trick. We also introduce a modified strategy, which relies on the doubling trick but runs in linearithmic time. Both results are improvements with respect to previous approaches.

Journal ArticleDOI
TL;DR: In this article, the authors extend the celebrated result of W. A. Kirk, that a metric space X is complete if and only if every Caristi self-mapping of X has a fixed point, to partial metric spaces.
Abstract: We extend the celebrated result of W. A. Kirk, that a metric space X is complete if and only if every Caristi self-mapping of X has a fixed point, to partial metric spaces.
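For readers unfamiliar with the setting, a partial metric (standard definition, supplied here as background rather than taken from the abstract) is a map $p : X \times X \to [0, \infty)$ satisfying, for all $x, y, z \in X$:

\[
\begin{aligned}
&\text{(p1)}\quad x = y \iff p(x,x) = p(x,y) = p(y,y),\\
&\text{(p2)}\quad p(x,x) \le p(x,y),\\
&\text{(p3)}\quad p(x,y) = p(y,x),\\
&\text{(p4)}\quad p(x,y) \le p(x,z) + p(z,y) - p(z,z).
\end{aligned}
\]

Unlike a metric, the self-distance $p(x,x)$ need not be zero, which is what makes the extension of Kirk's completeness characterization nontrivial.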

Proceedings ArticleDOI
10 Apr 2010
TL;DR: It is argued that this metric and graph serve as a representation of the overall emotional health of the nation, and the importance of tracking such metrics is discussed.
Abstract: I analyze the use of emotion words for approximately 100 million Facebook users since September of 2007. "Gross national happiness" is operationalized as a standardized difference between the use of positive and negative words, aggregated across days, and I present a graph of this metric. I begin to validate this metric by showing that positive and negative word use in status updates covaries with self-reported satisfaction with life (convergent validity), and also note that the graph shows peaks and valleys on days that are culturally and emotionally significant (face validity). I discuss the development and computation of this metric, argue that this metric and graph serve as a representation of the overall emotional health of the nation, and discuss the importance of tracking such metrics.
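A minimal sketch of one plausible reading of the "standardized difference" described above; the variable names and the exact standardization are assumptions, not the paper's code.

```python
import numpy as np

def gross_national_happiness(pos_rate, neg_rate):
    """One value per day: z-scored positive-word rate minus z-scored
    negative-word rate, where each rate is e.g. the fraction of status-update
    words that day that are positive (resp. negative)."""
    pos_rate = np.asarray(pos_rate, dtype=float)
    neg_rate = np.asarray(neg_rate, dtype=float)
    z_pos = (pos_rate - pos_rate.mean()) / pos_rate.std()
    z_neg = (neg_rate - neg_rate.mean()) / neg_rate.std()
    return z_pos - z_neg
```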

Journal ArticleDOI
TL;DR: It is demonstrated that the novel approach is very effective in discovering the modular structures in microarray data, both for genes and for samples, and may be applied to large data sets where the number of clusters is difficult to estimate.
Abstract: Co-expression network-based approaches have become popular in analyzing microarray data, such as for detecting functional gene modules. However, co-expression networks are often constructed by ad hoc methods, and network-based analyses have not been shown to outperform the conventional cluster analyses, partially due to the lack of an unbiased evaluation metric. Here, we develop a general co-expression network-based approach for analyzing both genes and samples in microarray data. Our approach consists of a simple but robust rank-based network construction method, a parameter-free module discovery algorithm and a novel reference network-based metric for module evaluation. We report some interesting topological properties of rank-based co-expression networks that are very different from that of value-based networks in the literature. Using a large set of synthetic and real microarray data, we demonstrate the superior performance of our approach over several popular existing algorithms. Applications of our approach to yeast, Arabidopsis and human cancer microarray data reveal many interesting modules, including a fatal subtype of lymphoma and a gene module regulating yeast telomere integrity, which were missed by the existing methods. We demonstrated that our novel approach is very effective in discovering the modular structures in microarray data, both for genes and for samples. As the method is essentially parameter-free, it may be applied to large data sets where the number of clusters is difficult to estimate. The method is also very general and can be applied to other types of data. A MATLAB implementation of our algorithm can be downloaded from http://cs.utsa.edu/~jruan/Software.html .

Journal ArticleDOI
TL;DR: This work shows that molecular simulations driven by adaptive sampling of networks called Markov State Models (MSMs) can yield tremendous time and resource savings, allowing previously intractable calculations to be performed on a routine basis on existing hardware.
Abstract: Computer simulations can complement experiments by providing insight into molecular kinetics with atomic resolution. Unfortunately, even the most powerful supercomputers can only simulate small systems for short time scales, leaving modeling of most biologically relevant systems and time scales intractable. In this work, however, we show that molecular simulations driven by adaptive sampling of networks called Markov State Models (MSMs) can yield tremendous time and resource savings, allowing previously intractable calculations to be performed on a routine basis on existing hardware. We also introduce a distance metric (based on the relative entropy) for comparing MSMs. We primarily employ this metric to judge the convergence of various sampling schemes but it could also be employed to assess the effects of perturbations to a system (e.g., determining how changing the temperature or making a mutation changes a system’s dynamics).
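One natural form such a relative-entropy-based comparison can take (stated here as background; the paper's exact definition may differ): for two MSMs with the same state decomposition, transition matrices $P$ and $Q$, and reference stationary distribution $\pi$,

\[
 D(P \,\|\, Q) = \sum_i \pi_i \sum_j P_{ij} \,\log\frac{P_{ij}}{Q_{ij}},
\]

the relative entropy rate between the two Markov chains, which vanishes only when the models assign identical transition probabilities on states with nonzero weight.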

Proceedings ArticleDOI
Sergey Ioffe1
13 Dec 2010
TL;DR: A novel method of mapping hashes to short bit-strings is proposed and applied to Weighted Minhash, achieving more accurate distance estimates from sketches than existing methods, as long as the inputs are sufficiently distinct.
Abstract: We propose a new Consistent Weighted Sampling method, where the probability of drawing identical samples for a pair of inputs is equal to their Jaccard similarity. Our method takes deterministic constant time per non-zero weight, improving on the best previous approach which takes expected constant time. The samples can be used as Weighted Minhash for efficient retrieval and compression (sketching) under Jaccard or L1 metric. A method is presented for using simple data statistics to reduce the running time of hash computation by two orders of magnitude. We compare our method with the random projection method and show that it has better characteristics for retrieval under L1. We present a novel method of mapping hashes to short bit-strings, apply it to Weighted Minhash, and achieve more accurate distance estimates from sketches than existing methods, as long as the inputs are sufficiently distinct. We show how to choose the optimal number of bits per hash for sketching, and demonstrate experimental results which agree with the theoretical analysis.
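A compact Python sketch of a consistent weighted sampling hash in the spirit of this construction; the specific Gamma/Uniform draws and formulas below reflect my reading of the published algorithm and should be treated as an approximation, not a verbatim implementation.

```python
import numpy as np

def weighted_minhash(weights, seed=0):
    """One (k, t) hash for a non-negative weight vector. Using the same seed for
    two inputs makes the per-feature randomness consistent, so the probability
    that both inputs produce the same hash approximates their weighted Jaccard
    similarity. Repeat with different seeds to build a full signature."""
    weights = np.asarray(weights, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(weights)
    r = rng.gamma(2.0, 1.0, size=n)
    c = rng.gamma(2.0, 1.0, size=n)
    beta = rng.uniform(0.0, 1.0, size=n)

    best_a, best = np.inf, None
    for k in np.flatnonzero(weights > 0):
        t = np.floor(np.log(weights[k]) / r[k] + beta[k])
        y = np.exp(r[k] * (t - beta[k]))
        a = c[k] / (y * np.exp(r[k]))
        if a < best_a:
            best_a, best = a, (int(k), int(t))
    return best
```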

Journal ArticleDOI
TL;DR: In this article, the fixed point existence results of multivalued mappings defined on cone metric spaces are discussed, and it is shown that most of the new results are merely copies of the classical ones and require neither the underlying Banach space nor the associated cone.
Abstract: In this work we discuss some recent results about KKM mappings in cone metric spaces. We also discuss the fixed point existence results of multivalued mappings defined on such metric spaces. In particular we show that most of the new results are merely copies of the classical ones and require neither the underlying Banach space nor the associated cone.

Proceedings ArticleDOI
03 Dec 2010
TL;DR: A novel feature based IQA model, namely Riesz-transform based Feature SIMilarity metric (RFSIM), is proposed based on the fact that the human vision system (HVS) perceives an image mainly according to its low-level features.
Abstract: Image quality assessment (IQA) aims to provide computational models to measure the image quality in a perceptually consistent manner. In this paper, a novel feature based IQA model, namely Riesz-transform based Feature SIMilarity metric (RFSIM), is proposed based on the fact that the human vision system (HVS) perceives an image mainly according to its low-level features. The 1st-order and 2nd-order Riesz transform coefficients of the image are taken as image features, while a feature mask is defined as the edge locations of the image. The similarity index between the reference and distorted images is measured by comparing the two feature maps at key locations marked by the feature mask. Extensive experiments on the comprehensive TID2008 database indicate that the proposed RFSIM metric is more consistent with the subjective evaluation than all the other competing methods evaluated.

Journal ArticleDOI
TL;DR: This work presents a boosting framework for distance metric learning that aims to preserve both visual and semantic similarities and shows that the boosting framework compares favorably to state-of-the-art approaches for distance metric learning in retrieval accuracy, with much lower computational cost.
Abstract: Similarity measurement is a critical component in content-based image retrieval systems, and learning a good distance metric can significantly improve retrieval performance. However, despite extensive study, there are several major shortcomings with the existing approaches for distance metric learning that can significantly affect their application to medical image retrieval. In particular, “similarity” can mean very different things in image retrieval: resemblance in visual appearance (e.g., two images that look like one another) or similarity in semantic annotation (e.g., two images of tumors that look quite different yet are both malignant). Current approaches for distance metric learning typically address only one goal without consideration of the other. This is problematic for medical image retrieval where the goal is to assist doctors in decision making. In these applications, given a query image, the goal is to retrieve similar images from a reference library whose semantic annotations could provide the medical professional with greater insight into the possible interpretations of the query image. If the system were to retrieve images that did not look like the query, then users would be less likely to trust the system; on the other hand, retrieving images that appear superficially similar to the query but are semantically unrelated is undesirable because that could lead users toward an incorrect diagnosis. Hence, learning a distance metric that preserves both visual resemblance and semantic similarity is important. We emphasize that, although our study is focused on medical image retrieval, the problem addressed in this work is critical to many image retrieval systems. We present a boosting framework for distance metric learning that aims to preserve both visual and semantic similarities. The boosting framework first learns a binary representation using side information, in the form of labeled pairs, and then computes the distance as a weighted Hamming distance using the learned binary representation. A boosting algorithm is presented to efficiently learn the distance function. We evaluate the proposed algorithm on a mammographic image reference library with an interactive search-assisted decision support (ISADS) system and on the medical image data set from ImageCLEF. Our results show that the boosting framework compares favorably to state-of-the-art approaches for distance metric learning in retrieval accuracy, with much lower computational cost. Additional evaluation with the COREL collection shows that our algorithm works well for regular image data sets.
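At query time, the distance described above reduces to a weighted Hamming distance over learned binary codes. A minimal sketch follows; the weak learners h_t and weights alpha_t are placeholders, whereas in the paper they are learned by boosting from labeled pairs.

```python
import numpy as np

def weighted_hamming_distance(x, y, hash_funcs, alphas):
    """Distance between two image descriptors x and y: each weak learner h_t
    maps a descriptor to a bit, and disagreements are summed with weights."""
    bits_x = np.array([h(x) for h in hash_funcs])
    bits_y = np.array([h(y) for h in hash_funcs])
    return float(np.dot(alphas, (bits_x != bits_y).astype(float)))

# Toy usage with threshold-on-one-feature weak learners (illustrative only):
hash_funcs = [lambda v, i=i: int(v[i] > 0.0) for i in range(5)]
alphas = np.array([0.9, 0.7, 0.4, 0.3, 0.1])
d = weighted_hamming_distance(np.random.randn(5), np.random.randn(5), hash_funcs, alphas)
```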

Proceedings Article
31 Mar 2010
TL;DR: A lower bound on the regret of any algorithm is proved, expressed in terms of the packing dimensions of the query space and the ad space respectively, and this gives an almost matching upper and lower bound for finite spaces or convex bounded subsets of Euclidean spaces.
Abstract: We study contextual multi-armed bandit problems where the context comes from a metric space and the payoff satisfies a Lipschitz condition with respect to the metric. Abstractly, a contextual multi-armed bandit problem models a situation where, in a sequence of independent trials, an online algorithm chooses, based on a given context (side information), an action from a set of possible actions so as to maximize the total payoff of the chosen actions. The payoff depends on both the action chosen and the context. In contrast, context-free multi-armed bandit problems, a focus of much previous research, model situations where no side information is available and the payoff depends only on the action chosen. Our problem is motivated by sponsored web search, where the task is to display ads to a user of an Internet search engine based on her search query so as to maximize the click-through rate (CTR) of the ads displayed. We cast this problem as a contextual multi-armed bandit problem where queries and ads form metric spaces and the payoff function is Lipschitz with respect to both the metrics. For any $\epsilon > 0$ we present an algorithm with regret $O(T^{(a+b+1)/(a+b+2)+\epsilon})$ where $a, b$ are the covering dimensions of the query space and the ad space respectively. We prove a lower bound $\Omega(T^{(\tilde{a}+\tilde{b}+1)/(\tilde{a}+\tilde{b}+2)-\epsilon})$ for the regret of any algorithm, where $\tilde{a}, \tilde{b}$ are the packing dimensions of the query space and the ad space respectively. For finite spaces or convex bounded subsets of Euclidean spaces, this gives an almost matching upper and lower bound.

Journal ArticleDOI
TL;DR: A supervised feature selection approach, based on a metric applied to continuous and discrete data representations, builds a dissimilarity space using information theoretic measures, in particular the conditional mutual information between features with respect to a relevant variable that represents the class labels.

Journal ArticleDOI
TL;DR: A novel semi-supervised distance metric learning technique, called “Laplacian Regularized Metric Learning” (LRML), for learning robust distance metrics for CIR, which shows that reliable metrics can be learned from real log data even when they are noisy and limited at the beginning stage of a CIR system.
Abstract: Learning a good distance metric plays a vital role in many multimedia retrieval and data mining tasks. For example, a typical content-based image retrieval (CBIR) system often relies on an effective distance metric to measure similarity between any two images. Conventional CBIR systems simply adopting Euclidean distance metric often fail to return satisfactory results mainly due to the well-known semantic gap challenge. In this article, we present a novel framework of Semi-Supervised Distance Metric Learning for learning effective distance metrics by exploring the historical relevance feedback log data of a CBIR system and utilizing unlabeled data when log data are limited and noisy. We formally formulate the learning problem into a convex optimization task and then present a new technique, named as “Laplacian Regularized Metric Learning” (LRML). Two efficient algorithms are then proposed to solve the LRML task. Further, we apply the proposed technique to two applications. One direct application is for Collaborative Image Retrieval (CIR), which aims to explore the CBIR log data for improving the retrieval performance of CBIR systems. The other application is for Collaborative Image Clustering (CIC), which aims to explore the CBIR log data for enhancing the clustering performance of image pattern clustering tasks. We conduct extensive evaluation to compare the proposed LRML method with a number of competing methods, including 2 standard metrics, 3 unsupervised metrics, and 4 supervised metrics with side information. Encouraging results validate the effectiveness of the proposed technique.

Posted Content
Charles H. Bennett1, Peter Gacs, Ming Li, Paul M. B. Vitányi, Wojciech H. Zurek 
TL;DR: In this paper, a universal information metric is proposed, based on the length of the shortest programs for either ordinary computations or reversible (dissipationless) computations, giving an absolute notion of the information distance between two individual objects, for example two pictures, analogous to Kolmogorov complexity as the absolute measure of information content of an individual finite object.
Abstract: While Kolmogorov complexity is the accepted absolute measure of information content in an individual finite object, a similarly absolute notion is needed for the information distance between two individual objects, for example, two pictures. We give several natural definitions of a universal information metric, based on length of shortest programs for either ordinary computations or reversible (dissipationless) computations. It turns out that these definitions are equivalent up to an additive logarithmic term. We show that the information distance is a universal cognitive similarity distance. We investigate the maximal correlation of the shortest programs involved, the maximal uncorrelation of programs (a generalization of the Slepian-Wolf theorem of classical information theory), and the density properties of the discrete metric spaces induced by the information distances. A related distance measures the amount of nonreversibility of a computation. Using the physical theory of reversible computation, we give an appropriate (universal, anti-symmetric, and transitive) measure of the thermodynamic work required to transform one object into another object by the most efficient process. Information distance between individual objects is needed in pattern recognition where one wants to express effective notions of "pattern similarity" or "cognitive similarity" between individual objects and in thermodynamics of computation where one wants to analyse the energy dissipation of a computation from a particular input to a particular output.
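The central object, stated compactly (this is the standard form in this line of work): up to an additive logarithmic term, the information distance between strings $x$ and $y$ is

\[
 E(x, y) = \max\{\, K(x \mid y),\; K(y \mid x) \,\},
\]

where $K(\cdot \mid \cdot)$ is conditional Kolmogorov complexity; it is, up to the stated precision, the length of the shortest program that converts $x$ into $y$ and $y$ into $x$, and it is minimal, up to an additive term, among all admissible distances satisfying a natural density constraint.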

Book ChapterDOI
05 Sep 2010
TL;DR: This paper proposes to learn a metric using those labeled pairs of bags, leading to MildML, for multiple instance logistic discriminant metric learning, and finds that MildML using the bag-level annotation performs as well as fully supervised metric learning using instance-level annotation.
Abstract: Metric learning aims at finding a distance that approximates a task-specific notion of semantic similarity. Typically, a Mahalanobis distance is learned from pairs of data labeled as being semantically similar or not. In this paper, we learn such metrics in a weakly supervised setting where "bags" of instances are labeled with "bags" of labels. We formulate the problem as a multiple instance learning (MIL) problem over pairs of bags. If two bags share at least one label, we label the pair positive, and negative otherwise. We propose to learn a metric using those labeled pairs of bags, leading to MildML, for multiple instance logistic discriminant metric learning. MildML iterates between updates of the metric and selection of putative positive pairs of examples from positive pairs of bags. To evaluate our approach, we introduce a large and challenging data set, Labeled Yahoo! News, which we have manually annotated and contains 31147 detected faces of 5873 different people in 20071 images. We group the faces detected in an image into a bag, and group the names detected in the caption into a corresponding set of labels. When the labels come from manual annotation, we find that MildML using the bag-level annotation performs as well as fully supervised metric learning using instance-level annotation. We also consider performance in the case of automatically extracted labels for the bags, where some of the bag labels do not correspond to any example in the bag. In this case MildML works substantially better than relying on noisy instance-level annotations derived from the bag-level annotation by resolving face-name associations in images with their captions.

Journal ArticleDOI
TL;DR: This article proposes a new distance measure based on the biharmonic differential operator that has all the desired properties and provides a nice trade-off between nearly geodesic distances for small distances and global shape-awareness for large distances.
Abstract: Measuring distances between pairs of points on a 3D surface is a fundamental problem in computer graphics and geometric processing. For most applications, the important properties of a distance are that it is a metric, smooth, locally isotropic, globally “shape-aware,” isometry-invariant, insensitive to noise and small topology changes, parameter-free, and practical to compute on a discrete mesh. However, the basic methods currently popular in computer graphics (e.g., geodesic and diffusion distances) do not have these basic properties. In this article, we propose a new distance measure based on the biharmonic differential operator that has all the desired properties. This new surface distance is related to the diffusion and commute-time distances, but applies different (inverse squared) weighting to the eigenvalues of the Laplace-Beltrami operator, which provides a nice trade-off between nearly geodesic distances for small distances and global shape-awareness for large distances. The article provides theoretical and empirical analysis for a large number of meshes.
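In terms of the Laplace-Beltrami eigenvalues $\lambda_k$ and eigenfunctions $\phi_k$, the distance described above takes the spectral form

\[
 d_B^2(x, y) = \sum_{k \ge 1} \frac{\bigl(\phi_k(x) - \phi_k(y)\bigr)^2}{\lambda_k^{2}},
\]

the same expansion as the commute-time distance (weights $1/\lambda_k$) and the diffusion distance (weights $e^{-2\lambda_k t}$), but with inverse-squared eigenvalue weights, which is what gives the stated trade-off between nearly geodesic behavior at small scales and global shape awareness at large scales.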

Proceedings ArticleDOI
07 Jun 2010
TL;DR: This paper proves an Ω(n) lower bound of the social cost approximation ratio for deterministic strategy-proof mechanisms and provides the first randomized strategy-proof mechanism with a constant approximation ratio of 4, which works in general metric spaces.
Abstract: We consider the problem of locating facilities in a metric space to serve a set of selfish agents. The cost of an agent is the distance between her own location and the nearest facility. The social cost is the total cost of the agents. We are interested in designing strategy-proof mechanisms without payment that have a small approximation ratio for social cost. A mechanism is a (possibly randomized) algorithm which maps the locations reported by the agents to the locations of the facilities. A mechanism is strategy-proof if no agent can benefit from misreporting her location in any configuration. This setting was first studied by Procaccia and Tennenholtz [21]. They focused on the facility game where agents and facilities are located on the real line. Alon et al. studied the mechanisms for the facility games in a general metric space [1]. However, they focused on the games with only one facility. In this paper, we study the two-facility game in a general metric space, which extends both previous models. We first prove an Ω(n) lower bound of the social cost approximation ratio for deterministic strategy-proof mechanisms. Our lower bound even holds for the line metric space. This significantly improves the previous constant lower bounds [21, 17]. Notice that there is a matching linear upper bound in the line metric space [21]. Next, we provide the first randomized strategy-proof mechanism with a constant approximation ratio of 4. Our mechanism works in general metric spaces. For randomized strategy-proof mechanisms, the previous best upper bound is O(n) which works only in the line metric space.