
Showing papers by Kilian Q. Weinberger published in 2013


Proceedings Article
16 Jun 2013
TL;DR: This work proposes FastTag, a novel algorithm that achieves comparable results with two simple linear mappings that are co-regularized in a joint convex loss function, and demonstrates that FastTag matches the current state-of-the-art in tagging quality, yet reduces the training and testing times and has lower asymptotic complexity.
Abstract: Automatic image annotation is a difficult and highly relevant machine learning task. Recent advances have significantly improved the state-of-the-art in retrieval accuracy with algorithms based on nearest neighbor classification in carefully learned metric spaces. But this comes at a price of increased computational complexity during training and testing. We propose FastTag, a novel algorithm that achieves comparable results with two simple linear mappings that are co-regularized in a joint convex loss function. The loss function can be efficiently optimized in closed form updates, which allows us to incorporate a large number of image descriptors cheaply. On several standard real-world benchmark data sets, we demonstrate that FastTag matches the current state-of-the-art in tagging quality, yet reduces the training and testing times by several orders of magnitude and has lower asymptotic complexity.
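The core computational idea, two linear mappings fit by alternating ridge-style closed-form solves, can be sketched as follows. This is a deliberately simplified sketch: the actual FastTag loss additionally marginalizes over corruption of the partial tag vectors, and the function name, regularizer, and data layout below are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

def fasttag_sketch(X, Y, lam=1.0, iters=10):
    """Toy sketch of two co-regularized linear mappings with closed-form updates.

    X: (d, n) image descriptors; Y: (t, n) observed, incomplete tag indicators.
    Alternately minimizes ||B Y - W X||^2 + lam * (||W||^2 + ||B - I||^2) over the
    tag-enrichment map B and the feature-to-tag map W; each step is a ridge solve.
    Simplified relative to the real FastTag objective (no marginalized corruption).
    """
    d, n = X.shape
    t = Y.shape[0]
    B = np.eye(t)
    for _ in range(iters):
        Z = B @ Y                                                   # current enriched tags
        W = Z @ X.T @ np.linalg.inv(X @ X.T + lam * np.eye(d))      # ridge solve for W
        P = W @ X                                                   # current tag predictions
        B = (P @ Y.T + lam * np.eye(t)) @ np.linalg.inv(Y @ Y.T + lam * np.eye(t))
    return W, B                      # at test time, tags of a new image x are scored by W @ x
```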

222 citations


Proceedings Article
16 Jun 2013
TL;DR: This work proposes to corrupt training examples with noise from known distributions within the exponential family and presents a novel learning algorithm, called marginalized corrupted features (MCF), that trains robust predictors by minimizing the expected value of the loss function under the corrupting distribution.
Abstract: The goal of machine learning is to develop predictors that generalize well to test data. Ideally, this is achieved by training on very large (infinite) training data sets that capture all variations in the data distribution. In the case of finite training data, an effective solution is to extend the training set with artificially created examples--which, however, is also computationally costly. We propose to corrupt training examples with noise from known distributions within the exponential family and present a novel learning algorithm, called marginalized corrupted features (MCF), that trains robust predictors by minimizing the expected value of the loss function under the corrupting distribution-- essentially learning with infinitely many (corrupted) training examples. We show empirically on a variety of data sets that MCF classifiers can be trained efficiently, may generalize substantially better to test data, and are more robust to feature deletion at test time.
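For the quadratic loss under blankout (feature deletion) corruption, the marginalization has a particularly simple closed form, sketched below; the helper name and data layout are assumptions, and other losses and corrupting distributions require different derivations (see the paper).

```python
import numpy as np

def mcf_quadratic_blankout(X, y, q):
    """Minimal sketch: MCF-style closed form for quadratic loss under blankout noise.

    Each feature is zeroed independently with probability q. For the quadratic loss,
    the expected loss under the corrupting distribution only needs E[x~] and Var[x~]:
        E[(w.x~ - y)^2] = (w.E[x~] - y)^2 + sum_d w_d^2 Var[x~_d],
    with E[x~_d] = (1-q) x_d and Var[x~_d] = q(1-q) x_d^2, yielding a ridge-like solve.
    X: (n, d) training features, y: (n,) targets.
    """
    mu = (1.0 - q) * X                        # expected corrupted features
    var = q * (1.0 - q) * X ** 2              # per-feature corruption variances
    A = mu.T @ mu + np.diag(var.sum(axis=0))  # expected second-moment matrix
    b = mu.T @ y
    return np.linalg.solve(A, b)              # weights minimizing the expected loss
```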

177 citations


Proceedings Article
16 Jun 2013
TL;DR: This paper addresses the challenge of balancing the test-time cost and the classifier accuracy in a principled fashion by constructing a tree of classifiers, through which test inputs traverse along individual paths.
Abstract: Recently, machine learning algorithms have successfully entered large-scale real-world industrial applications (e.g. search engines and email spam filters). Here, the CPU cost during test-time must be budgeted and accounted for. In this paper, we address the challenge of balancing the test-time cost and the classifier accuracy in a principled fashion. The test-time cost of a classifier is often dominated by the computation required for feature extraction--which can vary drastically across features. We decrease this extraction time by constructing a tree of classifiers, through which test inputs traverse along individual paths. Each path extracts different features and is optimized for a specific subpartition of the input space. By only computing features for inputs that benefit from them the most, our cost-sensitive tree of classifiers can match the high accuracies of the current state-of-the-art at a small fraction of the computational cost.
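The test-time behaviour described above, where an input pays for a feature only if a node along its path actually uses it, can be illustrated schematically. The node structure and routing rule below are illustrative assumptions; the training of the weights and of the tree structure is not shown.

```python
import numpy as np

class Node:
    """One node of a schematic cost-sensitive tree of classifiers.

    Each node owns a weight vector that is non-zero only on the features it uses;
    an input extracts those features on demand and is routed by the sign of the
    node's score.
    """
    def __init__(self, w, left=None, right=None):
        self.w = np.asarray(w)    # sparse weights: only nonzero entries trigger extraction
        self.left = left
        self.right = right

def predict(root, extract_feature, d):
    """Traverse the tree, paying each feature's extraction cost at most once."""
    x = np.zeros(d)
    extracted = set()

    def score(node):
        for j in np.nonzero(node.w)[0]:
            if j not in extracted:
                x[j] = extract_feature(j)      # feature computed only when first needed
                extracted.add(j)
        return float(node.w @ x)

    node = root
    while node.left is not None:               # internal node: route by the sign of the score
        node = node.right if score(node) > 0 else node.left
    return score(node)                         # leaf: final prediction from gathered features
```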

71 citations


Posted Content
TL;DR: This paper proposes Dense Cohort of Terms (dCoT), an unsupervised algorithm to learn improved sBoW document features and demonstrates empirically, on several benchmark datasets, that dCoT features significantly improve the classification accuracy across several document classification tasks.
Abstract: In text mining, information retrieval, and machine learning, text documents are commonly represented through variants of sparse Bag of Words (sBoW) vectors (e.g. TF-IDF). Although simple and intuitive, sBoW style representations suffer from their inherent over-sparsity and fail to capture word-level synonymy and polysemy. Especially when labeled data is limited (e.g. in document classification), or the text documents are short (e.g. emails or abstracts), many features are rarely observed within the training corpus. This leads to overfitting and reduced generalization accuracy. In this paper we propose Dense Cohort of Terms (dCoT), an unsupervised algorithm to learn improved sBoW document features. dCoT explicitly models absent words by removing and reconstructing random sub-sets of words in the unlabeled corpus. With this approach, dCoT learns to reconstruct frequent words from co-occurring infrequent words and maps the high dimensional sparse sBoW vectors into a low-dimensional dense representation. We show that the feature removal can be marginalized out and that the reconstruction can be solved for in closed-form. We demonstrate empirically, on several benchmark datasets, that dCoT features significantly improve the classification accuracy across several document classification tasks.
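The marginalized removal-and-reconstruction step admits a closed form very similar to that of marginalized Stacked Denoising Autoencoders. A minimal sketch, assuming a uniform removal probability and omitting the bias term, stacking, and the dimensionality-reduction step, is shown below; the function name is an assumption.

```python
import numpy as np

def dcot_layer(X, p):
    """Sketch of a marginalized-corruption reconstruction layer (mSDA/dCoT style).

    X: (d, n) sparse Bag-of-Words matrix (columns are documents).
    p: probability of removing (zeroing out) each word feature.
    We learn W minimizing the *expected* reconstruction error E||X - W X~||_F^2
    over the corruption, which has the closed form W = P Q^{-1} with
    P = E[X X~^T] and Q = E[X~ X~^T]. Returns dense features tanh(W X).
    """
    q = 1.0 - p                              # probability that a feature survives
    S = X @ X.T                              # (d, d) scatter matrix
    Q = S * (q * q)
    np.fill_diagonal(Q, q * np.diag(S))      # diagonal entries scale by q, not q^2
    P = S * q                                # cross term E[X X~^T]
    W = P @ np.linalg.pinv(Q)                # closed-form reconstruction mapping
    return np.tanh(W @ X)                    # dense representation of the documents
```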

19 citations


Proceedings Article
16 Jun 2013
TL;DR: Anytime Feature Representations (AFR) is introduced, a novel algorithm that explicitly addresses the trade-off between accuracy and test-time cost in the data representation rather than in the classifier, allowing conventional classifiers to be turned into test-time cost sensitive anytime classifiers.
Abstract: Evaluation cost during test-time is becoming increasingly important as many real-world applications need fast evaluation (e.g. web search engines, email spam filtering) or use expensive features (e.g. medical diagnosis). We introduce Anytime Feature Representations (AFR), a novel algorithm that explicitly addresses this trade-off in the data representation rather than in the classifier. This enables us to turn conventional classifiers, in particular Support Vector Machines, into test-time cost sensitive anytime classifiers-- combining the advantages of anytime learning and large-margin classification.
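As a generic illustration of the anytime setting (not the AFR construction itself), test-time evaluation can extract feature groups in a fixed, cost-aware order and keep one classifier per prefix, so that a valid prediction exists whenever evaluation is interrupted. All names and interfaces below are assumptions.

```python
import numpy as np

def anytime_predict(raw_input, extractors, classifiers, budget):
    """Schematic anytime evaluation; illustrates the setting AFR targets, not AFR itself.

    extractors:  list of (cost, extract) pairs in the extraction order fixed at training.
    classifiers: classifiers[k] is trained on the first k feature groups, with
                 classifiers[0] a constant/prior predictor, so a prediction is
                 available after every step.
    """
    feature_groups = []
    k = 0
    for cost, extract in extractors:
        if cost > budget:
            break                                    # budget exhausted: stop extracting
        feature_groups.append(extract(raw_input))
        budget -= cost
        k += 1
    x = np.concatenate(feature_groups) if feature_groups else np.zeros(1)
    return classifiers[k].predict(x.reshape(1, -1))  # best prediction for the features seen
```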

18 citations


Proceedings Article
16 Jun 2013
TL;DR: Maximum Variance Correction (MVC) is introduced, which finds large-scale feasible solutions to Maximum Variance Unfolding (MVU) by post-processing embeddings from any manifold learning algorithm; this increases the scale of MVU embeddings by several orders of magnitude and is naturally parallel.
Abstract: In this paper we introduce Maximum Variance Correction (MVC), which finds largescale feasible solutions to Maximum Variance Unfolding (MVU) by post-processing embeddings from any manifold learning algorithm. It increases the scale of MVU embeddings by several orders of magnitude and is naturally parallel. This unprecedented scalability opens up new avenues of applications for manifold learning, in particular the use of MVU embeddings as effective heuristics to speed-up A* search. We demonstrate unmatched reductions in search time across several non-trivial A* benchmark search problems and bridge the gap between the manifold learning literature and one of its most promising high impact applications.
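MVU constrains the embedding to preserve the input distances exactly on a k-nearest-neighbour graph while maximizing variance, and MVC's role, per the abstract, is to turn an arbitrary embedding into a feasible solution of that program. The helper below only measures how badly those local constraints are violated; it does not implement MVC, and its name and the use of scikit-learn are assumptions.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def mvu_constraint_violation(X, Y, k=5):
    """Worst-case local-isometry violation of an embedding Y for inputs X.

    MVU requires || Y[i] - Y[j] || to equal || X[i] - X[j] || for every edge (i, j)
    of the k-NN graph of X (plus centering); a feasible MVU solution would return 0.
    """
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)
    worst = 0.0
    for i, neighbors in enumerate(idx):
        for j in neighbors[1:]:                    # skip the point itself
            d_x = np.linalg.norm(X[i] - X[j])
            d_y = np.linalg.norm(Y[i] - Y[j])
            worst = max(worst, abs(d_y - d_x))     # distance distortion on this edge
    return worst
```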

14 citations


Proceedings Article
03 Jul 2013
TL;DR: This work proposes a machine-learning based multi-parametric approach that uses radiologist-generated labels to train a classifier that is able to classify tissue on a voxel-wise basis and automatically generate a tumor segmentation.
Abstract: Glioblastoma Multiforme is highly infiltrative, making precise delineation of the tumor margin difficult. Multimodality or multi-parametric MR imaging sequences promise an advantage over anatomic sequences such as post-contrast enhancement as methods for determining the spatial extent of tumor involvement. In considering multi-parametric imaging sequences, however, manual image segmentation and classification is time-consuming and prone to error. As a preliminary step toward integration of multi-parametric imaging into clinical assessments of primary brain tumors, we propose a machine-learning based multi-parametric approach that uses radiologist-generated labels to train a classifier that is able to classify tissue on a voxel-wise basis and automatically generate a tumor segmentation. A random forests classifier was trained using a leave-one-out experimental paradigm. A simple linear classifier was also trained for comparison. The random forests classifier accurately predicted radiologist-generated segmentations and tumor extent.
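A minimal sketch of the leave-one-out, voxel-wise setup described above, assuming the features are multi-parametric MR intensities stacked per voxel and using scikit-learn's random forest; variable names, feature choices, and hyperparameters are illustrative, not the paper's exact pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def loo_voxel_segmentation(patient_features, patient_labels):
    """Leave-one-patient-out, voxel-wise tissue classification (illustrative sketch).

    patient_features: list of (n_voxels_i, n_sequences) arrays, one per patient,
                      with each row stacking the multi-parametric MR intensities
                      of one voxel.
    patient_labels:   matching arrays of radiologist-generated tissue labels.
    """
    predictions = []
    for held_out in range(len(patient_features)):
        X_train = np.vstack([f for i, f in enumerate(patient_features) if i != held_out])
        y_train = np.concatenate([l for i, l in enumerate(patient_labels) if i != held_out])
        clf = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)
        predictions.append(clf.predict(patient_features[held_out]))  # voxel-wise labels
    return predictions
```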

13 citations




DOI
01 Jan 2013
TL;DR: This work focuses on making use of additional data, which is readily available or can be obtained easily but comes from a different distribution than the testing data, to aid learning, and introduces two strategies, manifested in five ways, to cope with the difference between the training and testing distributions.
Abstract: The generalization properties of most existing machine learning techniques are predicated on the assumptions that 1) a sufficiently large quantity of training data is available; 2) the training and testing data come from some common distribution. Although these assumptions are often met in practice, there are also many scenarios in which training data from the relevant distribution is insufficient. We focus on making use of additional data, which is readily available or can be obtained easily but comes from a different distribution than the testing data, to aid learning. We present five learning scenarios, depending on how the distribution used to sample the additional training data differs from the testing distribution: 1) learning with weak supervision; 2) domain adaptation; 3) learning from multiple domains; 4) learning from corrupted data; 5) learning with partial supervision. We introduce two strategies and manifest them in five ways to cope with the difference between the training and testing distributions. The first strategy, which gives rise to Pseudo Multi-view Co-training (PMC) and Co-training for Domain Adaptation (CODA), is inspired by the co-training algorithm for multi-view data. PMC generalizes co-training to the more common single-view data and allows us to learn from weakly labeled data retrieved for free from the web. CODA integrates PMC with another feature selection component to address the feature incompatibility between domains for domain adaptation. PMC and CODA are evaluated on a variety of real datasets, and both yield record performance. The second strategy, marginalized dropout, leads to marginalized Stacked Denoising Autoencoders (mSDA), Marginalized Corrupted Features (MCF), and FastTag. mSDA diminishes the difference between distributions associated with different domains by learning a new representation through marginalized corruption and reconstruction. MCF learns from a known distribution which is created by corrupting a small set of training data, and improves the robustness of learned classifiers by training on "infinitely" many examples sampled from that distribution. FastTag applies marginalized dropout to the output of partially labeled data to recover missing labels for multi-label tasks. These three algorithms not only achieve state-of-the-art performance on various tasks, but also deliver orders-of-magnitude speed-ups at training and testing compared to competing algorithms.

4 citations


Proceedings Article
14 Jul 2013
TL;DR: A goal-oriented manifold learning scheme is proposed that optimizes the Euclidean distance to goals in the embedding while maintaining admissibility and consistency and a state heuristic enhancement technique is proposed to reduce the gap between heuristic and true distances.
Abstract: Recently, a Euclidean heuristic (EH) has been proposed for A* search. EH exploits manifold learning methods to construct an embedding of the state space graph, and derives an admissible heuristic distance between two states from the Euclidean distance between their respective embedded points. EH has shown good performance and memory efficiency in comparison to other existing heuristics such as differential heuristics. However, its potential has not been fully explored. In this paper, we propose a number of techniques that can significantly improve the quality of EH. We propose a goal-oriented manifold learning scheme that optimizes the Euclidean distance to goals in the embedding while maintaining admissibility and consistency. We also propose a state heuristic enhancement technique to reduce the gap between heuristic and true distances. The enhanced heuristic is admissible but no longer consistent. We then employ a modified search algorithm, known as the B′ algorithm, that achieves optimality with inconsistent heuristics using consistency checks and propagation. We demonstrate the effectiveness of the above techniques and report unmatched reductions in search costs across several non-trivial benchmark search problems.
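For reference, the baseline Euclidean heuristic plugs directly into A*: embed the states, then use the Euclidean distance to the goal's embedded point as h. The sketch below shows that baseline only; the goal-oriented embedding, heuristic enhancement, and B′ search from the abstract are not implemented, and the graph and embedding interfaces are assumptions.

```python
import heapq
import numpy as np

def astar_euclidean(graph, embedding, start, goal):
    """A* search with a Euclidean heuristic derived from a state-space embedding.

    graph:     dict mapping a state to a list of (neighbor, edge_cost) pairs.
    embedding: dict mapping each state to its embedded point (np.array). If the
               embedding satisfies ||e(u) - e(v)|| <= d(u, v) for all states, the
               heuristic below is admissible and consistent.
    Returns the cost of a shortest start-goal path (inf if unreachable).
    """
    def h(u):
        return float(np.linalg.norm(embedding[u] - embedding[goal]))

    g = {start: 0.0}
    frontier = [(h(start), start)]
    closed = set()
    while frontier:
        _, u = heapq.heappop(frontier)
        if u == goal:
            return g[u]
        if u in closed:
            continue
        closed.add(u)
        for v, cost in graph[u]:
            new_g = g[u] + cost
            if new_g < g.get(v, float("inf")):
                g[v] = new_g
                heapq.heappush(frontier, (new_g + h(v), v))  # f = g + Euclidean heuristic
    return float("inf")
```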

Proceedings Article
01 Jan 2013
TL;DR: This short paper improves EH by exploiting the landmark structure derived from the SAS+ planning formalism, which provides richer semantics than the simple state-space graph model.
Abstract: An important problem in AI is to construct high-quality heuristics for optimal search. Recently, the Euclidean heuristic (EH) has been proposed, which embeds a state space graph into a Euclidean space and uses Euclidean distances as approximations for the graph distances. The embedding process leverages recent research results from manifold learning, a subfield in machine learning, and guarantees that the heuristic is provably admissible and consistent. EH has shown good performance and memory efficiency in comparison to other existing heuristics. Our recent works have further improved the scalability and quality of EH. In this short paper, we present our latest progress on applying EH to problems in planning formalisms, which provide richer semantics than the simple state-space graph model. In particular, we improve EH by exploiting the landmark structure derived from the SAS+ planning formalism.