Author

Yan-Tao Zheng

Bio: Yan-Tao Zheng is an academic researcher from the Institute for Infocomm Research, Singapore. The author has contributed to research in topics including TRECVID and relevance feedback. The author has an h-index of 18 and has co-authored 53 publications receiving 3,778 citations. Previous affiliations of Yan-Tao Zheng include the National University of Singapore and Google.


Papers
Proceedings Article
08 Jul 2009
TL;DR: The benchmark results indicate that it is possible to learn effective models from a sufficiently large image dataset to facilitate general image retrieval, and four research issues on web image annotation and retrieval are identified.
Abstract: This paper introduces a web image dataset created by NUS's Lab for Media Search. The dataset includes: (1) 269,648 images and the associated tags from Flickr, with a total of 5,018 unique tags; (2) six types of low-level features extracted from these images, including 64-D color histogram, 144-D color correlogram, 73-D edge direction histogram, 128-D wavelet texture, 225-D block-wise color moments extracted over 5x5 fixed grid partitions, and 500-D bag of words based on SIFT descriptors; and (3) ground truth for 81 concepts that can be used for evaluation. Based on this dataset, we highlight characteristics of Web image collections and identify four research issues on web image annotation and retrieval. We also provide baseline results for web image annotation by learning from the tags using the traditional k-NN algorithm. The benchmark results indicate that it is possible to learn effective models from a sufficiently large image dataset to facilitate general image retrieval.

2,648 citations
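
To make the k-NN annotation baseline in the web image dataset entry concrete, here is a minimal sketch that scores tags for a query image by voting among its nearest training images. It assumes Euclidean distance and toy 2-D features; the function name `knn_tag_scores` and the data are hypothetical, and the paper's actual features, distance measure, and value of k may differ.

```python
import math
from collections import Counter


def knn_tag_scores(query, train, k=3):
    """Score each tag by the fraction of the k nearest training images carrying it.

    train: list of (feature_vector, set_of_tags) pairs. The toy 2-D vectors below
    stand in for real low-level features such as color histograms.
    """
    # Rank training images by Euclidean distance to the query (an assumed choice).
    neighbours = sorted(train, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(tag for _, tags in neighbours for tag in tags)
    return {tag: count / k for tag, count in votes.items()}


# Hypothetical toy data: feature vectors paired with user tags.
train = [
    ((0.9, 0.1), {"sky", "clouds"}),
    ((0.8, 0.2), {"sky"}),
    ((0.1, 0.9), {"grass"}),
    ((0.2, 0.8), {"grass", "animal"}),
]
print(sorted(knn_tag_scores((0.85, 0.15), train, k=2).items()))
# [('clouds', 0.5), ('sky', 1.0)]
```

The score for a tag is simply its share of the k neighbour votes, so a threshold or ranking over these scores yields the predicted annotations.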

Proceedings Article
20 Jun 2009
TL;DR: This paper leverages the vast amount of multimedia data on the Web, the availability of an Internet image search engine, and advances in object recognition and clustering techniques, to address issues of modeling and recognizing landmarks at world-scale.
Abstract: Modeling and recognizing landmarks at world scale is a useful yet challenging task. There exists no readily available list of worldwide landmarks. Obtaining reliable visual models for each landmark can also pose problems, and efficiency is another challenge for such a large-scale system. This paper leverages the vast amount of multimedia data on the Web, the availability of an Internet image search engine, and advances in object recognition and clustering techniques to address these issues. First, a comprehensive list of landmarks is mined from two sources: (1) ~20 million GPS-tagged photos and (2) online tour guide Web pages. Candidate images for each landmark are then obtained from photo-sharing Websites or by querying an image search engine. Second, landmark visual models are built by pruning candidate images using efficient image matching and unsupervised clustering techniques. Finally, the landmarks and their visual models are validated by checking the authorship of their member images. The resulting landmark recognition engine incorporates 5,312 landmarks from 1,259 cities in 144 countries. The experiments demonstrate that the engine can deliver satisfactory recognition performance with high efficiency.

355 citations
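
The final validation step in the landmark recognition entry, checking the authorship of member images, can be sketched as a filter on the number of distinct photographers per landmark. This is a minimal illustration under assumptions: the function `validate_landmarks`, the `(image_id, author_id)` input format, and the `min_authors` threshold are hypothetical, not the paper's exact rule.

```python
def validate_landmarks(landmarks, min_authors=3):
    """Keep landmarks whose member images come from enough distinct authors.

    landmarks: landmark name -> list of (image_id, author_id) pairs.
    min_authors: hypothetical threshold; a cluster shot by a single user is
    more likely a personal scene than a real landmark.
    """
    validated = {}
    for name, members in landmarks.items():
        authors = {author for _, author in members}
        if len(authors) >= min_authors:
            validated[name] = members
    return validated


# Hypothetical clusters: one photographed by three users, one by a single user.
landmarks = {
    "clock_tower": [("img1", "alice"), ("img2", "bob"), ("img3", "carol")],
    "my_backyard": [("img4", "dave"), ("img5", "dave"), ("img6", "dave")],
}
print(sorted(validate_landmarks(landmarks)))  # ['clock_tower']
```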

Journal Article
TL;DR: This study aims to leverage the wealth of these enriched online photos to analyze people’s travel patterns at the local level of a tour destination by building a statistically reliable database of travel paths from a noisy pool of community-contributed geotagged photos on the Internet.
Abstract: Recently, the phenomenal advent of photo-sharing services, such as Flickr and Panoramio, has led to voluminous community-contributed photos with text tags, timestamps, and geographic references on the Internet. The photos, together with their time and geo-references, become the digital footprints of photo takers and implicitly document their spatiotemporal movements. This study aims to leverage the wealth of these enriched online photos to analyze people’s travel patterns at the local level of a tour destination. Specifically, we focus our analysis on two aspects: (1) tourist movement patterns in relation to the regions of attraction (RoA), and (2) topological characteristics of travel routes by different tourists. To do so, we first build a statistically reliable database of travel paths from a noisy pool of community-contributed geotagged photos on the Internet. We then investigate the tourist traffic flow among different RoAs by exploiting the Markov chain model. Finally, the topological characteristics of travel routes are analyzed by performing sequence clustering on tour routes. Tests on four major cities demonstrate promising results for the proposed system.

223 citations
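
The tourist-traffic analysis in the travel-pattern entry uses a Markov chain over regions of attraction. Below is a minimal sketch of the standard maximum-likelihood estimate of first-order transition probabilities: count each observed RoA-to-RoA move, then normalize per source region. The function name and toy paths are hypothetical, and the study's own path-building and noise-filtering steps are not reproduced.

```python
from collections import defaultdict


def transition_probabilities(paths):
    """Estimate first-order Markov transition probabilities between RoAs.

    paths: one time-ordered list of RoA labels per tourist.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for path in paths:
        # Count every consecutive pair of visited regions.
        for src, dst in zip(path, path[1:]):
            counts[src][dst] += 1
    probs = {}
    for src, row in counts.items():
        total = sum(row.values())
        probs[src] = {dst: n / total for dst, n in row.items()}
    return probs


# Hypothetical travel paths reconstructed from geotagged photos.
paths = [["museum", "park", "harbour"], ["museum", "harbour"], ["park", "harbour"]]
P = transition_probabilities(paths)
print(P["museum"])  # {'park': 0.5, 'harbour': 0.5}
```

Regions that never appear as the source of a move (here `harbour`) get no row; a fuller treatment would decide how to handle such absorbing states.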

Journal Article
TL;DR: A novel active learning approach based on the optimum experimental design criteria in statistics is proposed that simultaneously exploits each sample's local structure, relevance, density, and diversity information, and makes use of both labeled and unlabeled data.
Abstract: Video indexing, also called video concept detection, has attracted increasing attention from both academia and industry. To reduce human labeling cost, active learning has recently been introduced to video indexing. In this paper, we propose a novel active learning approach based on the optimum experimental design criteria in statistics. Unlike existing optimum experimental design methods, our approach simultaneously exploits each sample's local structure, relevance, density, and diversity information, and makes use of both labeled and unlabeled data. Specifically, we develop a local learning model to exploit the local structure of each sample. Our assumption is that for each sample, its label can be well estimated based on its neighbors. By globally aligning the local models from all the samples, we obtain a local learning regularizer, based on which a local learning regularized least squares model is proposed. Finally, a unified sample selection approach is developed for interactive video indexing, which takes into account sample relevance, density, and diversity information, as well as sample efficacy in minimizing the parameter variance of the proposed local learning regularized least squares model. We compare the performance of our approach with state-of-the-art approaches on the TREC Video Retrieval Evaluation (TRECVID) benchmark and report superior performance for the proposed approach.

140 citations
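
The active learning entry combines relevance, density, and diversity when choosing samples to label. The sketch below is a simplified, hypothetical stand-in: a greedy linear score over those three terms using cosine similarity. It does not implement the paper's local learning regularizer or its variance-minimizing optimum experimental design criterion; `select_batch`, the weights, and the data are illustrative assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def select_batch(features, relevance, batch_size, weights=(1.0, 1.0, 1.0)):
    """Greedily choose sample indices to label (hypothetical heuristic, not the paper's criterion)."""
    w_rel, w_den, w_div = weights
    n = len(features)
    # Density: mean similarity to the other candidates, so representative samples score higher.
    density = [
        sum(cosine(features[i], features[j]) for j in range(n) if j != i) / max(n - 1, 1)
        for i in range(n)
    ]
    selected = []
    while len(selected) < min(batch_size, n):
        best, best_score = None, -math.inf
        for i in range(n):
            if i in selected:
                continue
            # Diversity: penalize candidates similar to samples already selected.
            redundancy = max((cosine(features[i], features[j]) for j in selected), default=0.0)
            score = w_rel * relevance[i] + w_den * density[i] + w_div * (1.0 - redundancy)
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return selected


# Hypothetical candidates: samples 0 and 1 are near-duplicates, sample 2 differs.
features = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0)]
relevance = [0.9, 0.6, 0.5]
print(select_batch(features, relevance, 2))  # [0, 2]
```

The diversity term is what stops the second pick from being the near-duplicate sample 1, even though its relevance and density are higher than sample 2's.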

Journal Article
TL;DR: A comprehensive survey of recent research and applications on online georeferenced media; based on current technical achievements, open research issues and challenges are identified, and directions that can lead to compelling applications are suggested.
Abstract: In recent years, the emergence of georeferenced media, like geotagged photos, on the Internet has opened up a new world of possibilities for geography-related research and applications. Despite its short history, georeferenced media has been attracting attention from several major research communities: Computer Vision, Multimedia, Digital Libraries, and KDD. This paper provides a comprehensive survey of recent research and applications on online georeferenced media. Specifically, the survey focuses on four aspects: (1) organizing and browsing georeferenced media resources, (2) mining semantic/social knowledge from georeferenced media, (3) learning landmarks in the world, and (4) estimating the geographic location of a photo. Furthermore, based on current technical achievements, open research issues and challenges are identified, and directions that can lead to compelling applications are suggested.

100 citations


Cited by
Christopher M. Bishop
01 Jan 2006
TL;DR: This book covers probability distributions, linear models for regression and classification, neural networks, kernel methods, graphical models, and related topics in machine learning, concluding with methods for combining models.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

10,141 citations

Proceedings Article
20 Jun 2009
TL;DR: The experiments show that, by using an attribute layer, it is indeed possible to build a learning object detection system that does not require any training images of the target classes. The authors also assembled a new large-scale dataset, “Animals with Attributes”, of over 30,000 animal images that match the 50 classes in Osherson's classic table of how strongly humans associate 85 semantic attributes with animal classes.
Abstract: We study the problem of object classification when training and test classes are disjoint, i.e. no training examples of the target classes are available. This setup has hardly been studied in computer vision research, but it is the rule rather than the exception, because the world contains tens of thousands of different object classes, and image collections have been formed and annotated with suitable class labels for only very few of them. In this paper, we tackle the problem by introducing attribute-based classification. It performs object detection based on a human-specified high-level description of the target objects instead of training images. The description consists of arbitrary semantic attributes, like shape, color or even geographic information. Because such properties transcend the specific learning task at hand, they can be pre-learned, e.g. from image datasets unrelated to the current task. Afterwards, new classes can be detected based on their attribute representation, without the need for a new training phase. In order to evaluate our method and to facilitate research in this area, we have assembled a new large-scale dataset, “Animals with Attributes”, of over 30,000 animal images that match the 50 classes in Osherson's classic table of how strongly humans associate 85 semantic attributes with animal classes. Our experiments show that by using an attribute layer it is indeed possible to build a learning object detection system that does not require any training images of the target classes.

2,228 citations
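
Attribute-based classification, as described in the entry above, scores classes that have no training images by comparing predicted attributes with each class's attribute description. A minimal sketch, assuming attribute classifiers already output per-attribute probabilities, that class descriptions are binary signatures, and that attributes are independent; the function name, attributes, and numbers are hypothetical.

```python
import math


def classify_by_attributes(attr_probs, class_signatures, eps=1e-6):
    """Return the best-matching class and per-class log-scores.

    attr_probs: probability that each attribute is present in the image, from
        attribute classifiers trained on other (seen) classes.
    class_signatures: class name -> binary attribute vector (1 = has attribute).
    """
    scores = {}
    for name, signature in class_signatures.items():
        total = 0.0
        for p, has in zip(attr_probs, signature):
            p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
            # Treat attributes as independent and add the log-probability of the
            # value this class prescribes.
            total += math.log(p if has else 1.0 - p)
        scores[name] = total
    return max(scores, key=scores.get), scores


# Hypothetical attributes: [striped, four_legs, swims]
attr_probs = [0.9, 0.8, 0.1]
signatures = {"zebra": [1, 1, 0], "dolphin": [0, 0, 1]}
best, scores = classify_by_attributes(attr_probs, signatures)
print(best)  # zebra
```

No zebra or dolphin images are needed at this stage: only the attribute classifiers and the class signatures.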

Journal Article
TL;DR: The concept of ensemble learning is introduced, traditional, novel and state‐of‐the‐art ensemble methods are reviewed and current challenges and trends in the field are discussed.
Abstract: Ensemble methods are considered the state‐of‐the‐art solution for many machine learning challenges. Such methods improve the predictive performance of a single model by training multiple models and combining their predictions. This paper introduces the concept of ensemble learning, reviews traditional, novel and state‐of‐the‐art ensemble methods, and discusses current challenges and trends in the field.

1,381 citations
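
The ensemble entry describes combining the predictions of multiple models. One of the simplest combination rules is a plurality vote, sketched below with hypothetical threshold classifiers; real ensembles typically use trained base models and may weight their votes.

```python
from collections import Counter


def threshold_model(threshold):
    """Hypothetical base model: predicts 'pos' when the input exceeds a threshold."""
    return lambda x: "pos" if x > threshold else "neg"


def majority_vote(models, x):
    """Combine the models' predicted labels by plurality vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


models = [threshold_model(t) for t in (0.3, 0.5, 0.7)]
print(majority_vote(models, 0.6))  # pos
print(majority_vote(models, 0.4))  # neg
```

`Counter.most_common` orders tied counts by first occurrence, so a tie goes to the label predicted by the earliest model in the list.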

01 Jan 2015
TL;DR: The abstract should follow the structure of the article (relevance, degree of exploration of the problem, the goal, the main results, conclusion) and characterize the theoretical and practical significance of the study results.
Abstract: The abstract should follow the structure of the article (relevance, degree of exploration of the problem, the goal, the main results, conclusion) and characterize the theoretical and practical significance of the study results. The abstract should not contain wording echoing the title, cumbersome grammatical structures, or abbreviations. The text should be written in a scientific style. The volume of abstracts (summaries) depends on the content of the article, but should not be less than 250 words. All abbreviations must be spelled out in the summary (even though they will also be spelled out in the main text of the article), and references to publication numbers from the reference list should not be made. The sentences of the abstract should constitute an integral text, which can be achieved by using words such as “consequently”, “for example”, and “as a result”. Avoid unnecessary introductory phrases (e.g., “the author of the article considers...”, “The article presents...”, and so on).

1,229 citations

Posted Content
TL;DR: The history of person re-identification and its relationship with image classification and instance retrieval is introduced and two new re-ID tasks which are much closer to real-world applications are described and discussed.
Abstract: Person re-identification (re-ID) has become increasingly popular in the community due to its application and research significance. It aims at spotting a person of interest in other cameras. In the early days, hand-crafted algorithms and small-scale evaluation were predominantly reported. Recent years have witnessed the emergence of large-scale datasets and deep learning systems which make use of large data volumes. Considering different tasks, we classify most current re-ID methods into two classes, i.e., image-based and video-based; in both tasks, hand-crafted and deep learning systems will be reviewed. Moreover, two new re-ID tasks which are much closer to real-world applications are described and discussed, i.e., end-to-end re-ID and fast re-ID in very large galleries. This paper: 1) introduces the history of person re-ID and its relationship with image classification and instance retrieval; 2) surveys a broad selection of the hand-crafted systems and the large-scale methods in both image- and video-based re-ID; 3) describes critical future directions in end-to-end re-ID and fast retrieval in large galleries; and 4) finally outlines some important yet under-developed issues.

984 citations
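
The re-ID survey relates person re-identification to instance retrieval: given a query person from one camera, rank gallery images from other cameras by similarity. A minimal sketch, assuming feature vectors have already been extracted (by hand-crafted or deep models) and using cosine similarity; `rank_gallery` and the data are hypothetical.

```python
import math


def rank_gallery(query, gallery):
    """Return gallery identities sorted from most to least similar to the query.

    query: feature vector of the person of interest from one camera.
    gallery: list of (identity, feature_vector) pairs from other cameras.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.hypot(*a) * math.hypot(*b))

    ranked = sorted(gallery, key=lambda item: cosine(query, item[1]), reverse=True)
    return [identity for identity, _ in ranked]


# Hypothetical pre-extracted features.
query = (1.0, 0.2)
gallery = [("id_7", (0.1, 1.0)), ("id_3", (0.9, 0.3)), ("id_5", (0.5, 0.5))]
print(rank_gallery(query, gallery))  # ['id_3', 'id_5', 'id_7']
```

This exhaustive ranking scans every gallery entry; the fast re-ID setting the survey mentions targets galleries too large for that.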