Showing papers by "Kilian Q. Weinberger" published in 2008

Open Access

Proceedings Article•DOI•

Fast solvers and efficient implementations for distance metric learning

Kilian Q. Weinberger¹, Lawrence K. Saul²•Institutions (2)

Yahoo!¹, University of California, San Diego²

05 Jul 2008

TL;DR: A highly efficient solver for the particular instance of semidefinite programming that arises in LMNN classification is described; this solver can handle problems with billions of large margin constraints in a few hours.

Abstract: In this paper we study how to improve nearest neighbor classification by learning a Mahalanobis distance metric. We build on a recently proposed framework for distance metric learning known as large margin nearest neighbor (LMNN) classification. Our paper makes three contributions. First, we describe a highly efficient solver for the particular instance of semidefinite programming that arises in LMNN classification; our solver can handle problems with billions of large margin constraints in a few hours. Second, we show how to reduce both training and testing times using metric ball trees; the speedups from ball trees are further magnified by learning low dimensional representations of the input space. Third, we show how to learn different Mahalanobis distance metrics in different parts of the input space. For large data sets, the use of locally adaptive distance metrics leads to even lower error rates.

295 citations
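The Mahalanobis metric mentioned in the abstract above can be illustrated with a small sketch. This is not the LMNN solver from the paper: the linear map `L_learned` is a hand-picked, hypothetical stand-in for a learned transformation, and the toy data are invented purely for illustration. The sketch only shows how a linear map changes which neighbor is nearest, and how a low-rank map yields a lower-dimensional representation.

```python
import math

def mahalanobis(x, y, L):
    """Distance between x and y after applying the linear map L.

    Equivalent to sqrt((x - y)^T M (x - y)) with M = L^T L.
    L is given as a list of rows.
    """
    diff = [a - b for a, b in zip(x, y)]
    # Project the difference vector: z = L @ diff
    z = [sum(l_ij * d_j for l_ij, d_j in zip(row, diff)) for row in L]
    return math.sqrt(sum(v * v for v in z))

def nearest_neighbor_label(query, points, labels, L):
    """Label of the training point closest to query under the map L."""
    best = min(range(len(points)), key=lambda i: mahalanobis(query, points[i], L))
    return labels[best]

# Toy data (invented): feature 0 carries the class signal,
# feature 1 is large-scale noise.
points = [[0.0, 10.0], [1.0, -10.0]]
labels = ["a", "b"]
query = [0.1, -9.0]

identity = [[1.0, 0.0], [0.0, 1.0]]  # plain Euclidean distance
L_learned = [[1.0, 0.0]]             # hypothetical rank-1 map keeping feature 0 only

euclidean_label = nearest_neighbor_label(query, points, labels, identity)
metric_label = nearest_neighbor_label(query, points, labels, L_learned)
```

Under the Euclidean metric the noisy feature dominates and the query is assigned to the wrong neighbor; under the hypothetical map the informative feature decides.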

Proceedings Article•DOI•

Resolving tag ambiguity

Kilian Q. Weinberger¹, Malcolm Slaney¹, Roelof van Zwol¹•Institutions (1)

Yahoo!¹

26 Oct 2008

TL;DR: A probabilistic framework is introduced that allows us to find two tags that appear in different contexts but are both likely to co-occur with the original tag set and suggest new tags that disambiguate the original tags.

Abstract: Tagging is an important way for users to succinctly describe the content they upload to the Internet. However, most tag-suggestion systems recommend words that are highly correlated with the existing tag set, and thus add little information to a user's contribution. This paper describes a means to determine the ambiguity of a set of (user-contributed) tags and suggests new tags that disambiguate the original tags. We introduce a probabilistic framework that allows us to find two tags that appear in different contexts but are both likely to co-occur with the original tag set. If such tags can be found, the current description is considered "ambiguous" and the two tags are recommended to the user for further clarification. In contrast to previous work, we only query the user when information is most needed and good suggestions are available. We verify the efficacy of our approach using geographical, temporal and semantic metadata, and a user study. We built our system using statistics from a large (100M) database of images and their tags.

143 citations

Proceedings Article•

Learning a metric for music similarity

Malcolm Slaney, Kilian Q. Weinberger, William White

01 Jan 2008

TL;DR: Five different principled ways to embed songs into a Euclidean metric space are described. Each of the six approaches rotates and scales the raw feature space with a linear transform, and the parameters of these models are tuned using a song-classification task with content-based features.

Abstract: This paper describes five different principled ways to embed songs into a Euclidean metric space. In particular, we learn embeddings so that the pairwise Euclidean distance between two songs reflects semantic dissimilarity. This allows distance-based analysis, such as straightforward nearest-neighbor classification, to detect and potentially suggest similar songs within a collection. Each of the six approaches (baseline, whitening, LDA, NCA, LMNN and RCA) rotates and scales the raw feature space with a linear transform. We tune the parameters of these models using a song-classification task with content-based features.

93 citations

Proceedings Article•

Large Margin Taxonomy Embedding for Document Categorization

Kilian Q. Weinberger, Olivier Chapelle

01 Jan 2008

TL;DR: This work presents a novel algorithm that goes beyond hierarchical classification and estimates the latent semantic space that underlies the class hierarchy and shows that the optimization is convex and can be solved efficiently for large data sets.

Abstract: Applications of multi-class classification, such as document categorization, often appear in cost-sensitive settings. Recent work has significantly improved the state of the art by moving beyond "flat" classification through incorporation of class hierarchies [Cai and Hofmann 04]. We present a novel algorithm that goes beyond hierarchical classification and estimates the latent semantic space that underlies the class hierarchy. In this space, each class is represented by a prototype and classification is done with the simple nearest neighbor rule. The optimization of the semantic space incorporates large margin constraints that ensure that for each instance the correct class prototype is closer than any other. We show that our optimization is convex and can be solved efficiently for large data sets. Experiments on the OHSUMED medical journal database yield state-of-the-art results on topic categorization.

77 citations

Proceedings Article•

Large margin taxonomy embedding with an application to document categorization

Kilian Q. Weinberger¹, Olivier Chapelle¹•Institutions (1)

Yahoo!¹

08 Dec 2008

Abstract: Applications of multi-class classification, such as document categorization, often appear in cost-sensitive settings. Recent work has significantly improved the state of the art by moving beyond "flat" classification through incorporation of class hierarchies [4]. We present a novel algorithm that goes beyond hierarchical classification and estimates the latent semantic space that underlies the class hierarchy. In this space, each class is represented by a prototype and classification is done with the simple nearest neighbor rule. The optimization of the semantic space incorporates large margin constraints that ensure that for each instance the correct class prototype is closer than any other. We show that our optimization is convex and can be solved efficiently for large data sets. Experiments on the OHSUMED medical journal database yield state-of-the-art results on topic categorization.

30 citations

Patent•

Playful incentive for labeling content

Kilian Q. Weinberger¹, Anirban Dasgupta¹, Raghu Ramakrishnan¹, David Reiley¹, Martin Zinkevich¹, Bo Pang¹, Daniel Kifer¹•Institutions (1)

Yahoo!¹

26 Jun 2008

TL;DR: A playful incentive, such as an animated pet, encourages users to provide feedback that can be used to train a classifier.

Abstract: Embodiments are directed towards employing a playful incentive to encourage users to provide feedback that is useable to train a classifier. The classifier may be associated with any of a variety of different settings, including but not limited to classifying: messages as ham/spam, images, advertising, bookmarking, music, videos, photographs, shopping, or the like. An animated image, such as a pet, provides an interface to the classifier that encourages and responds to user feedback. Users may share their classifiers or aspects thereof with other users to enable a community of knowledge to be applied to a classification task, while preserving privacy of the user feedback. One form of sharing may be within the context of a competitive game. Various evaluations may be performed on a classifier to indicate user feedback consistency, or quality. Classifiers may also be used to provide users with advertisements, products, or services based on the user's feedback.

28 citations

Patent•

Distributed personal spam filtering

Kilian Q. Weinberger¹, John Langford¹•Institutions (1)

Yahoo!¹

19 May 2008

TL;DR: In this article, the authors use a community of weighted results from local and global message classifiers to determine whether a message is spam, where each local classifier receives the message and performs a classification of the message.

Abstract: Embodiments are directed towards using a community of weighted results from local and global message classifiers to determine whether a message is spam. Each local classifier may receive a message that is to be evaluated to determine whether it is spam. A local classifier receives the message and performs a classification of the message. The local classifier may receive predictions of whether the message is spam from at least one global classifier. The local and global predictions are combined using, in one embodiment, a regression analysis to generate a single local message classification. Combining the local and global predictions is directed towards enabling a community of predictions to be used to classify messages. The user may then re-classify this output, which in turn is used as feedback to modify weights to the local and received global predictions for a next message.

27 citations
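The combine-then-reweight loop in the abstract above might look roughly like the following sketch. The abstract only says that a regression analysis is used "in one embodiment"; the choice of online logistic regression, the class name `CombinedSpamClassifier`, the learning rate, and the toy predictors are all assumptions made here for illustration, not details taken from the patent.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class CombinedSpamClassifier:
    """Hypothetical combiner: weights one local and several global predictions."""

    def __init__(self, n_predictors, learning_rate=0.5):
        self.weights = [0.0] * n_predictors
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, predictions):
        """Combine per-classifier spam scores into a single spam probability."""
        z = self.bias + sum(w * p for w, p in zip(self.weights, predictions))
        return sigmoid(z)

    def feedback(self, predictions, is_spam):
        """Use the user's re-classification to adjust the weights (one SGD step)."""
        error = self.predict(predictions) - (1.0 if is_spam else 0.0)
        self.weights = [w - self.learning_rate * error * p
                        for w, p in zip(self.weights, predictions)]
        self.bias -= self.learning_rate * error

# Toy setup (invented): predictor 0 is reliable, predictor 1 always answers 0.5.
combiner = CombinedSpamClassifier(n_predictors=2)
for _ in range(100):
    combiner.feedback([1.0, 0.5], is_spam=True)
    combiner.feedback([0.0, 0.5], is_spam=False)

spam_score = combiner.predict([1.0, 0.5])
ham_score = combiner.predict([0.0, 0.5])
```

After repeated feedback, the weight on the reliable predictor grows while the uninformative one contributes little, so the combined score follows the reliable source.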

Patent•

System and method for improved classification

Marc'Aurelio Ranzato¹, Kilian Q. Weinberger¹, Eva Hoerster¹, Malcolm Slaney¹•Institutions (1)

Yahoo!¹

22 Dec 2008

TL;DR: A first classifier is trained on a first set of training images relating to a class of images; the first classifier then selects additional images from a source accessible to the computing device, and a second classifier that employs a different classification method is trained on the merged set.

Abstract: A system and method for improved classification. A first classifier is trained using a first process running on at least one computing device using a first set of training images relating to a class of images. A set of additional images are selected using the first classifier from a source of additional images accessible to the computing device. The first set of training images and the set of additional images are merged using the computing device to create a second set of training images. A second classifier is trained using a second process running on the computing device using the second set of training images. A set of unclassified images are classified using the second classifier thereby creating a set of classified images. The first classifier and the second classifier employ different classification methods.

16 citations

Proceedings Article•DOI•

Mapping Uncharted Waters: Exploratory Analysis, Visualization, and Clustering of Oceanographic Data

Joshua M. Lewis¹, Pincelli M. Hull¹, Kilian Q. Weinberger², Lawrence K. Saul¹•Institutions (2)

University of California, San Diego¹, Yahoo!²

11 Dec 2008

TL;DR: This work provides the first quantitative classification of open ocean biomes from an automated statistical analysis of multivariate data and provides a valuable case study in the use (and misuse) of recently developed algorithms for high dimensional data analysis.

Abstract: In this paper we describe an interdisciplinary collaboration between researchers in machine learning and oceanography. The collaboration was formed to study the problem of open ocean biome classification. Biomes are regions on Earth with similar climate (e.g., temperature and rainfall) and vegetation structure (e.g., grasslands, coniferous forests, and deserts). To discover biomes in the open ocean, we apply leading methods in high dimensional data analysis, clustering, and visualization to oceanographic measurements culled from multiple existing databases. We compare traditional approaches, such as k-means clustering and principal component analysis, to newer approaches such as Isomap and maximum variance unfolding. Our work provides the first quantitative classification of open ocean biomes from an automated statistical analysis of multivariate data. It also provides a valuable case study in the use (and misuse) of recently developed algorithms for high dimensional data analysis.

13 citations

Patent•

Generating congruous metadata for multimedia

Malcolm Slaney¹, Kilian Q. Weinberger¹•Institutions (1)

Yahoo!¹

04 Mar 2008

TL;DR: In this paper, a method of generating congruous metadata is presented, which includes receiving a similarity measure between at least two multimedia objects and comparing the associated metadata of each of the objects.

Abstract: A method of generating congruous metadata is provided. The method includes receiving a similarity measure between at least two multimedia objects. Each multimedia object has associated metadata. If the at least two multimedia objects are similar based on the similarity measure and a similarity threshold, the associated metadata of each of the multimedia objects are compared. Then, based on the comparison of the associated metadata of each of the at least two multimedia objects, the method further includes generating congruous metadata. Metadata may be tags, for example.

6 citations
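The thresholded comparison described in the abstract above can be sketched as follows. The abstract does not say how the metadata are compared or how the congruous metadata are produced, so the set-based comparison, the function name `congruous_tags`, the convention that higher similarity means more alike, and the default threshold are all assumptions for illustration.

```python
def congruous_tags(tags_a, tags_b, similarity, threshold=0.8):
    """Compare the tags of two multimedia objects if they are similar enough.

    Returns None when the objects fall below the (assumed) similarity threshold;
    otherwise returns the shared tags and the tags each object could borrow.
    """
    if similarity < threshold:
        return None
    a, b = set(tags_a), set(tags_b)
    return {
        "shared": a & b,
        "suggest_for_a": b - a,  # tags present on b but missing on a
        "suggest_for_b": a - b,  # tags present on a but missing on b
    }

# Toy example (invented tags and similarity scores).
result = congruous_tags(["beach", "sunset"], ["beach", "ocean"], similarity=0.9)
skipped = congruous_tags(["beach"], ["office"], similarity=0.2)
```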

Patent•

Distributed spam filtering utilizing a plurality of global classifiers and a local classifier

Kilian Q. Weinberger¹, John Langford¹•Institutions (1)

Yahoo!¹

19 May 2008

TL;DR: In this paper, the authors use a community of weighted results from local and global message classifiers to determine whether a message is spam, where each local classifier receives the message and performs a classification of the message.

Patent•

System and method for disambiguating text labeling content objects

Malcolm Slaney¹, Kilian Q. Weinberger¹, Roelof van Zwol¹•Institutions (1)

Yahoo!¹

28 Jun 2008

TL;DR: In this article, an improved system and method for disambiguating text strings labeling content objects is provided, which is based on a weighted KL divergence of text string distributions that maximizes the value of divergence when a text string set may occur in different contexts.

Abstract: An improved system and method for disambiguating text strings labeling content objects is provided. A text string set may be received from a user. Frequencies of co-occurring text strings in a text collection may be obtained, and a disambiguation measure may be determined for a pair of text strings that each co-occur with a text string in the text string set. The disambiguation measure may be based on a weighted KL divergence of text string distributions that maximizes the value of divergence when a text string set may occur in different contexts. A disambiguation measure may be determined for a list of the top most common pairs of text strings that co-occur with the text string set, and the pairs of text strings may be output in decreasing order by disambiguation measure for those pairs of text strings with a disambiguation measure that exceeds a threshold.
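As a rough illustration of a divergence-based disambiguation measure, the sketch below compares the co-occurrence distributions of two candidate text strings. The abstract refers to a weighted KL divergence but does not give the weighting, so this uses a plain symmetric KL divergence with add-one smoothing instead; the function names and the co-occurrence counts are invented for the example.

```python
import math

def distribution(counts, vocabulary, smoothing=1.0):
    """Smoothed probability distribution over a shared vocabulary."""
    total = sum(counts.get(w, 0) for w in vocabulary) + smoothing * len(vocabulary)
    return {w: (counts.get(w, 0) + smoothing) / total for w in vocabulary}

def kl_divergence(p, q):
    """KL(p || q) for two distributions over the same keys."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)

def symmetric_kl(counts_a, counts_b):
    """Unweighted symmetric KL divergence between two co-occurrence profiles."""
    vocabulary = sorted(set(counts_a) | set(counts_b))
    p = distribution(counts_a, vocabulary)
    q = distribution(counts_b, vocabulary)
    return kl_divergence(p, q) + kl_divergence(q, p)

# Invented co-occurrence counts for candidates that co-occur with the tag "apple".
fruit = {"tree": 40, "red": 30, "juice": 20}
orchard = {"tree": 50, "red": 20, "harvest": 10}
computer = {"mac": 60, "laptop": 30, "keyboard": 10}

different_contexts = symmetric_kl(fruit, computer)
similar_contexts = symmetric_kl(fruit, orchard)
```

A pair such as "fruit" and "computer" scores higher than "fruit" and "orchard", which is the behavior one would want from a measure that flags text strings occurring in different contexts.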

Patent•

Hierarchical Recognition Through Semantic Embedding

Olivier Chapelle¹, Kilian Q. Weinberger¹•Institutions (1)

Yahoo!¹

29 Apr 2008

TL;DR: A set of classes on which a loss function is defined is embedded into a semantic space, and an input mapping between the input space and the semantic space is learned and then applied to test objects for matching and classification.

Abstract: Computer-implemented systems and methods, including servers, perform structure-based recognition processes that include matching and classification. Preprocessing subsystems and sub-methods embed a set of classes on which a loss function is defined into a semantic space and learn an input mapping between an input space and the semantic space. Recognition subsystems and methods accept a test object, representable in the input space, and apply the input mapping to the test object as part of a recognition process.
