Bio: Yueting Zhuang is an academic researcher from Zhejiang University. The author has contributed to research in topics: Computer science & Image retrieval. The author has an hindex of 53, co-authored 432 publications receiving 11299 citations. Previous affiliations of Yueting Zhuang include University of Illinois at Urbana–Champaign & University of Southern California.
Papers published on a yearly basis
••01 Oct 2017
TL;DR: This paper proposes a simple yet effective human part-aligned representation for handling the body part misalignment problem, and shows state-of-the-art results over standard datasets, Market-1501,CUHK03, CUHK01 and VIPeR.
Abstract: In this paper, we address the problem of person re-identification, which refers to associating the persons captured from different cameras. We propose a simple yet effective human part-aligned representation for handling the body part misalignment problem. Our approach decomposes the human body into regions (parts) which are discriminative for person matching, accordingly computes the representations over the regions, and aggregates the similarities computed between the corresponding regions of a pair of probe and gallery images as the overall matching score. Our formulation, inspired by attention models, is a deep neural network modeling the three steps together, which is learnt through minimizing the triplet loss function without requiring body part labeling information. Unlike most existing deep learning algorithms that learn a global or spatial partition-based local representation, our approach performs human body partition, and thus is more robust to pose changes and various human spatial distributions in the person bounding box. Our approach shows state-of-the-art results over standard datasets, Market-1501, CUHK03, CUHK01 and VIPeR. 1
••04 Oct 1998
TL;DR: A new algorithm for key frame extraction based on unsupervised clustering is introduced, both computationally simple and able to adapt to the visual content, which is validated by large amount of real-world videos.
Abstract: Key frame extraction has been recognized as one of the important research issues in video information retrieval. Although progress has been made in key frame extraction, the existing approaches are either computationally expensive or ineffective in capturing salient visual content. We first discuss the importance of key frame selection; and then review and evaluate the existing approaches. To overcome the shortcomings of the existing approaches, we introduce a new algorithm for key frame extraction based on unsupervised clustering. The proposed algorithm is both computationally simple and able to adapt to the visual content. The efficiency and effectiveness are validated by large amount of real-world videos.
TL;DR: This paper proposes a multi-task deep saliency model based on a fully convolutional neural network with global input (whole raw images) and global output (Whole saliency maps) and presents a graph Laplacian regularized nonlinear regression model for saliency refinement.
Abstract: A key problem in salient object detection is how to effectively model the semantic properties of salient objects in a data-driven manner. In this paper, we propose a multi-task deep saliency model based on a fully convolutional neural network with global input (whole raw images) and global output (whole saliency maps). In principle, the proposed saliency model takes a data-driven strategy for encoding the underlying saliency prior information, and then sets up a multi-task learning scheme for exploring the intrinsic correlations between saliency detection and semantic image segmentation. Through collaborative feature learning from such two correlated tasks, the shared fully convolutional layers produce effective features for object perception. Moreover, it is capable of capturing the semantic information on salient objects across different levels using the fully convolutional layers, which investigate the feature-sharing properties of salient object detection with a great reduction of feature redundancy. Finally, we present a graph Laplacian regularized nonlinear regression model for saliency refinement. Experimental results demonstrate the effectiveness of our approach in comparison with the state-of-the-art approaches.
••15 Jun 2019
TL;DR: A self-supervised spatiotemporal learning technique which leverages the chronological order of videos to learn the spatiotmporal representation of the video by predicting the order of shuffled clips from the video.
Abstract: We propose a self-supervised spatiotemporal learning technique which leverages the chronological order of videos. Our method can learn the spatiotemporal representation of the video by predicting the order of shuffled clips from the video. The category of the video is not required, which gives our technique the potential to take advantage of infinite unannotated videos. There exist related works which use frames, while compared to frames, clips are more consistent with the video dynamics. Clips can help to reduce the uncertainty of orders and are more appropriate to learn a video representation. The 3D convolutional neural networks are utilized to extract features for clips, and these features are processed to predict the actual order. The learned representations are evaluated via nearest neighbor retrieval experiments. We also use the learned networks as the pre-trained models and finetune them on the action recognition task. Three types of 3D convolutional neural networks are tested in experiments, and we gain large improvements compared to existing self-supervised methods.
TL;DR: A new semi-supervised algorithm called ranking with Local Regression and Global Alignment (LRGA) to learn a robust Laplacian matrix for data ranking and a semi- supervised long-term RF algorithm to refine the multimedia data representation.
Abstract: We present a new framework for multimedia content analysis and retrieval which consists of two independent algorithms. First, we propose a new semi-supervised algorithm called ranking with Local Regression and Global Alignment (LRGA) to learn a robust Laplacian matrix for data ranking. In LRGA, for each data point, a local linear regression model is used to predict the ranking scores of its neighboring points. A unified objective function is then proposed to globally align the local models from all the data points so that an optimal ranking score can be assigned to each data point. Second, we propose a semi-supervised long-term Relevance Feedback (RF) algorithm to refine the multimedia data representation. The proposed long-term RF algorithm utilizes both the multimedia data distribution in multimedia feature space and the history RF information provided by users. A trace ratio optimization problem is then formulated and solved by an efficient algorithm. The algorithms have been applied to several content-based multimedia retrieval applications, including cross-media retrieval, image retrieval, and 3D motion/pose data retrieval. Comprehensive experiments on four data sets have demonstrated its advantages in precision, robustness, scalability, and computational efficiency.
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).
01 Jan 2006
TL;DR: Probability distributions of linear models for regression and classification are given in this article, along with a discussion of combining models and combining models in the context of machine learning and classification.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.
01 Jan 2002