Author
Ji Zhao
Other affiliations: Samsung
Bio: Ji Zhao is an academic researcher from Carnegie Mellon University. The author has contributed to research in the topics of support vector machines and feature selection. The author has an h-index of 3 and has co-authored 3 publications receiving 45 citations. Previous affiliations of Ji Zhao include Samsung.
Papers
TL;DR: This article presents a method for feature selection and region selection in the visual BoW model that handles both regions in images and spatio-temporal regions in videos in a unified way.
Abstract: Visual learning problems such as object classification and action recognition are typically approached using extensions of the popular bag-of-words (BoW) model. Despite its great success, it is unclear what visual features the BoW model is learning: Which regions in the image or video are used to discriminate among classes? Which are the most discriminative visual words? Answering these questions is fundamental for understanding existing BoW models and inspiring better models for visual recognition.
To answer these questions, this paper presents a method for feature selection and region selection in the visual BoW model. This allows for an intermediate visualization of the features and regions that are important for visual learning. The main idea is to assign latent weights to the features or regions, and jointly optimize these latent variables with the parameters of a classifier (e.g., a support vector machine). There are four main benefits of our approach: (1) it accommodates non-linear additive kernels such as the popular $\chi^2$ and intersection kernels; (2) it handles both regions in images and spatio-temporal regions in videos in a unified way; (3) the feature selection problem is convex, and both problems can be solved using a scalable reduced gradient method; (4) we point out strong connections with multiple kernel learning and multiple instance learning approaches. Experimental results on PASCAL VOC 2007, MSR Action Dataset II, and YouTube illustrate the benefits of our approach.
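To make benefit (1) concrete: an additive kernel is a sum of per-dimension terms, so a non-negative latent weight per visual word can scale down or switch off each term. The Python sketch below is a minimal illustration under our own assumptions (L1-normalized BoW histograms, one weight per visual word, and the common additive form $2ab/(a+b)$ for the $\chi^2$ kernel). The function names are hypothetical, and this is not the paper's optimization, which learns the weights jointly with the SVM using a reduced gradient method.

```python
from typing import Callable, Sequence

TermFn = Callable[[float, float], float]


def chi2_term(a: float, b: float) -> float:
    """Per-bin additive chi-squared kernel term 2ab/(a+b), taken as 0 when a+b == 0."""
    s = a + b
    return 0.0 if s == 0 else 2.0 * a * b / s


def intersection_term(a: float, b: float) -> float:
    """Per-bin histogram intersection kernel term min(a, b)."""
    return min(a, b)


def weighted_additive_kernel(x: Sequence[float], y: Sequence[float],
                             weights: Sequence[float], term: TermFn) -> float:
    """K(x, y) = sum_j d_j * k(x_j, y_j) for non-negative histograms and weights d_j.

    Hypothetical helper: a weight of 0 removes visual word j from the comparison.
    """
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y and weights must have the same length")
    if any(d < 0 for d in weights):
        raise ValueError("latent weights must be non-negative")
    return sum(d * term(a, b) for a, b, d in zip(x, y, weights))


if __name__ == "__main__":
    x = [0.5, 0.25, 0.25]   # toy L1-normalized BoW histograms
    y = [0.25, 0.25, 0.5]
    print(weighted_additive_kernel(x, y, [1.0, 1.0, 1.0], intersection_term))  # 0.75
    print(weighted_additive_kernel(x, y, [1.0, 0.0, 1.0], intersection_term))  # 0.5 (word 2 switched off)
```

Because each weighted term is itself a valid kernel when $d_j \ge 0$, the weighted sum remains positive semi-definite, which is one way to see the connection to multiple kernel learning noted in benefit (4).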
18 citations
TL;DR: Inspired by the successful application of bag-of-words (BoW) to feature representation, this work leverages it at instance-level to model the distributions of the positive class and negative class, and then incorporates the BoW learning and instance labeling in a single optimization formulation.
Abstract: In this paper, we aim at localizing irregularly shaped objects under weak supervision. With over-segmentation, this task can be cast as a multiple-instance learning problem. However, most multiple-instance learning methods emphasize only the single most positive instance in a positive bag to optimize bag-level classification, which leads to imprecise or incomplete localization. To address this issue, we propose a scheme for instance annotation in which all positive instances are detected by labeling each instance in each positive bag. Inspired by the successful application of bag-of-words (BoW) to feature representation, we leverage it at the instance level to model the distributions of the positive class and negative class, and then incorporate BoW learning and instance labeling in a single optimization formulation. We also demonstrate that the scheme is well suited to weakly supervised localization of irregularly shaped objects. Experimental results validate its effectiveness, compared with some existing methods, both for generic instance annotation and for the application of weakly supervised object localization.
16 citations
TL;DR: A method for feature selection and region selection in the visual BoW model is presented; its main idea is to assign latent weights to the features or regions and jointly optimize these latent variables with the parameters of a classifier.
Abstract: Visual learning problems, such as object classification and action recognition, are typically approached using extensions of the popular bag-of-words (BoW) model. Despite its great success, it is unclear what visual features the BoW model is learning. Which regions in the image or video are used to discriminate among classes? Which are the most discriminative visual words? Answering these questions is fundamental for understanding existing BoW models and inspiring better models for visual recognition. To answer these questions, this paper presents a method for feature selection and region selection in the visual BoW model. This allows for an intermediate visualization of the features and regions that are important for visual learning. The main idea is to assign latent weights to the features or regions, and jointly optimize these latent variables with the parameters of a classifier (e.g., a support vector machine). There are four main benefits of our approach: 1) it accommodates non-linear additive kernels, such as the popular $\chi^2$ and intersection kernels; 2) it handles both regions in images and spatio-temporal regions in videos in a unified way; 3) the feature selection problem is convex, and both problems can be solved using a scalable reduced gradient method; and 4) we point out strong connections with multiple kernel learning and multiple instance learning approaches. Experimental results on PASCAL VOC 2007, MSR Action Dataset II, and YouTube illustrate the benefits of our approach.
15 citations
Cited by
TL;DR: This article proposes a simple yet effective similarity guidance network (SG-One) to tackle the one-shot segmentation problem, aiming at predicting the segmentation mask of a query image with reference to one densely labeled support image of the same category.
Abstract: One-shot image semantic segmentation is the challenging task of recognizing object regions from unseen categories with only one annotated example as supervision. In this article, we propose a simple yet effective similarity guidance network (SG-One) to tackle the one-shot segmentation problem. We aim at predicting the segmentation mask of a query image with reference to one densely labeled support image of the same category. To obtain a robust representative feature of the support image, we first adopt a masked average pooling strategy that produces the guidance features by taking into account only the pixels belonging to the support image. We then use cosine similarity to relate the guidance features to the features of pixels from the query image. In this way, the possibilities embedded in the produced similarity maps can be used to guide the process of segmenting objects. Furthermore, SG-One is a unified framework that can efficiently process both support and query images within one network and be learned in an end-to-end manner. We conduct extensive experiments on Pascal VOC 2012. In particular, SG-One achieves an mIoU score of 46.3%, surpassing the baseline methods.
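The two steps described in this abstract, masked average pooling over the support features followed by cosine similarity against every query pixel, can be sketched in plain Python. This is a simplified illustration under stated assumptions: feature maps are given as flattened lists of per-pixel C-dimensional vectors, the support mask is binary and already at feature-map resolution, and SG-One's CNN backbone and segmentation branch are omitted. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def masked_average_pooling(features: Sequence[Vector], mask: Sequence[float]) -> List[float]:
    """Average the per-pixel feature vectors selected by a binary support mask."""
    if len(features) != len(mask):
        raise ValueError("features and mask must cover the same pixels")
    total = sum(mask)
    if total == 0:
        raise ValueError("mask selects no pixels")
    dim = len(features[0])
    pooled = [0.0] * dim
    for vec, m in zip(features, mask):
        for c in range(dim):
            pooled[c] += m * vec[c]
    return [v / total for v in pooled]


def cosine_similarity(u: Vector, v: Vector, eps: float = 1e-8) -> float:
    """Cosine of the angle between u and v; eps guards against zero-norm vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / max(norm_u * norm_v, eps)


def similarity_map(guidance: Vector, query_features: Sequence[Vector]) -> List[float]:
    """One similarity score per query pixel against the pooled guidance vector."""
    return [cosine_similarity(guidance, vec) for vec in query_features]


if __name__ == "__main__":
    support = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]   # 3 pixels, 2 channels
    support_mask = [1, 0, 1]                          # pixels 0 and 2 are labeled object
    query = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]
    g = masked_average_pooling(support, support_mask)  # [1.0, 0.0]
    print(similarity_map(g, query))                    # approximately [1.0, 0.0, 0.707]
```

Cosine similarity ignores feature magnitude, so a query pixel scores high whenever its feature points in the same direction as the pooled support feature.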
325 citations
TL;DR: A unique survey of the state-of-the-art image matching methods based on feature descriptor is presented, from which future research may benefit.
Abstract: Image registration is an important technique in many computer vision applications, such as image fusion, image retrieval, object tracking, face recognition, and change detection. Local feature descriptors, i.e., how features are detected and how they are described, play a fundamental role in the image registration process and directly influence its accuracy and robustness. This paper focuses on a variety of local feature descriptors, covering theoretical research, mathematical models, and methods or algorithms, along with their applications in the context of image registration. The existing local feature descriptors are roughly classified into six categories, and the advantages of each are demonstrated and analyzed comprehensively. Current and future challenges of local feature descriptors are discussed. The major goal of the paper is to present a unique survey of state-of-the-art image matching methods based on feature descriptors, from which future research may benefit.
82 citations
Posted Content
TL;DR: In this paper, a similarity guidance network is proposed to predict the segmentation mask of a query image with reference to one densely labeled support image of the same category; it processes both support and query images within a single network and can be learned in an end-to-end manner.
Abstract: One-shot image semantic segmentation is the challenging task of recognizing object regions from unseen categories with only one annotated example as supervision. In this paper, we propose a simple yet effective Similarity Guidance network for One-shot segmentation (SG-One). We aim at predicting the segmentation mask of a query image with reference to one densely labeled support image of the same category. To obtain a robust representative feature of the support image, we first adopt a masked average pooling strategy that produces the guidance features by taking into account only the pixels belonging to the support image. We then use cosine similarity to relate the guidance features to the features of pixels from the query image. In this way, the possibilities embedded in the produced similarity maps can be used to guide the process of segmenting objects. Furthermore, SG-One is a unified framework that can efficiently process both support and query images within one network and be learned in an end-to-end manner. We conduct extensive experiments on Pascal VOC 2012. In particular, SG-One achieves an mIoU score of 46.3%, surpassing the baseline methods.
30 citations
01 Nov 2019
TL;DR: A unique approach that uses the underlying standard metrics to create an internal 3D evaluation of fitness factors, as part of an enhanced machine learning engine, yielding an enhanced Weighted Performance Metric.
Abstract: The accuracy metric has become a gold standard for measuring various models and systems. However, recent developments and research have shown its limitations, especially during training and testing of machine learning models: it fails to preserve the reliability of the outcome when modeling imbalanced data. This problem is known as the Accuracy Paradox. This paper presents a unique approach that uses the underlying standard metrics to create an internal 3D evaluation of fitness factors, as part of an enhanced machine learning engine. The model is trained with internal tuning of parameters, and the trade-off is handled as a collateral benefit using built-in parallelism. Moreover, the way this metric is engineered opens up generalized options for data engineers and model creators to incorporate (mathematically) the metrics of their choice (standard or new), based on how the model is trained, for any data domain. This approach shows improved integrity of the internal metric measurements, using mathematical constructs and a governing algorithm for an enhanced Weighted Performance Metric. Sample experiments on real-world datasets are included, and the results support the contribution of this research.
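The Accuracy Paradox described in this abstract is easy to reproduce. The sketch below assumes a binary task with 95 negatives and 5 positives and a degenerate model that always predicts the majority class: accuracy looks strong while recall and F1 collapse to zero. The `weighted_score` helper is a hypothetical, user-weighted mean of standard metrics that only illustrates combining metrics of one's choice; it is not the paper's Weighted Performance Metric, whose formula is not given here.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def standard_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1; undefined ratios are reported as 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def weighted_score(metrics: Dict[str, float], weights: Dict[str, float]) -> float:
    """Hypothetical combination: weighted mean of chosen metrics (not the paper's WPM)."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * metrics[name] for name, w in weights.items()) / total_weight


if __name__ == "__main__":
    y_true = [0] * 95 + [1] * 5   # imbalanced: 5% positives
    y_pred = [0] * 100            # always predicts the majority class
    m = standard_metrics(y_true, y_pred)
    print(m)  # accuracy 0.95, but precision, recall and F1 are all 0.0
    print(weighted_score(m, {"accuracy": 1.0, "recall": 1.0}))  # 0.475
```

Any combined score that gives the minority-class metrics a non-zero weight exposes the degenerate model that accuracy alone rewards.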
16 citations