Showing papers by "Sheng Tang" published in 2008

TRECVID 2008 High-Level Feature Extraction By MCG-ICT-CAS*

Sheng Tang, Jintao Li, Ming Li, Cheng Xie, Yizhi Liu, Kun Tao, Shao-Xi Xu¹

01 Jan 2008

TL;DR: This paper proposes a novel method based on Latent Dirichlet Allocation (LDA), LDA-based multiple-SVM (LDASVM), to improve training efficiency and to explore the knowledge between concepts or hidden sub-domains more easily and efficiently.

Abstract: For the TRECVID 2008 concept detection task, we principally focus on: (1) Early fusion of texture, edge and color features (TECM), an abbreviation for the combination of TF*IDF weights based on SIFT features, Edge Histogram, and Color Moments. (2) To improve the training efficiency and explore the knowledge between concepts or hidden sub-domains more easily and efficiently, we propose a novel method based on Latent Dirichlet Allocation (LDA): LDA-based multiple-SVM (LDASVM). We first use LDA to cluster all the keyframes into topics according to the maximum element of the topic-simplex representation vector (TRV) of each keyframe. Then, for each concept, we train a model on the annotated data in each topic. During training, unlike multi-bag SVM, we use only the positive samples in the current topic, rather than all positive samples in the whole training set, in order to retain the samples' separability, and we ignore topics with too few positive samples. When testing a keyframe for a given concept, we use the TRV as the weight vector, instead of an equal weighting strategy, to combine the SVM outputs of the topic models. (3) Introduction of Pseudo Relevance Feedback (PRF) into our concept detection system to make the re-trained models more adaptive to the test data: unlike existing PRF techniques in text and video retrieval, we propose a preliminary strategy that explores the visual features of positive training samples to improve the quality of pseudo-positive samples. Experimental results demonstrate that our proposed LDASVM approach is both effective and efficient.
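
The test-time fusion step in (2) can be illustrated with a minimal sketch. It assumes the per-topic classifier scores are already available; the function names, the example numbers, and the renormalization over topics that actually have a model are assumptions made for illustration, not details given in the abstract.

```python
from typing import Dict, List


def assign_topic(trv: List[float]) -> int:
    """Hard topic assignment: index of the largest TRV element."""
    return max(range(len(trv)), key=lambda k: trv[k])


def fuse_scores(trv: List[float], topic_scores: Dict[int, float]) -> float:
    """Combine per-topic classifier outputs using the TRV as weights.

    topic_scores maps topic index -> score for one concept. Topics that were
    skipped in training (too few positives) are simply absent. Renormalizing
    over the available topics is an assumption of this sketch.
    """
    total = sum(trv[k] for k in topic_scores)
    if total == 0.0:
        return 0.0
    return sum(trv[k] * s for k, s in topic_scores.items()) / total


# Hypothetical keyframe: mostly topic 0; topic 1 has no model for this concept.
trv = [0.7, 0.2, 0.1]
scores = {0: 0.9, 2: 0.1}

fused = fuse_scores(trv, scores)              # TRV-weighted combination
equal = sum(scores.values()) / len(scores)    # equal-weighting baseline
topic = assign_topic(trv)
```

With these made-up numbers the TRV-weighted score leans toward the model of the keyframe's dominant topic, whereas equal weighting treats both topic models alike.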

63 citations

Proceedings Article

A Novel Image Text Extraction Method Based on K-Means Clustering

Yan Song¹, An-An Liu¹, Lin Pang¹, Shouxun Lin¹, Yongdong Zhang¹, Sheng Tang¹

Chinese Academy of Sciences¹

14 May 2008

TL;DR: A coarse-to-fine text location method is implemented, a multi-scale approach is adopted to locate texts with different font sizes, and color-based k-means clustering is adopted in text segmentation.

Abstract: Texts in web pages, images and videos contain important clues for information indexing and retrieval. Most existing text extraction methods depend on the language type and text appearance. In this paper, a novel and universal method of image text extraction is proposed. A coarse-to-fine text location method is implemented. First, a multi-scale approach is adopted to locate texts with different font sizes. Second, projection profiles are used in the location refinement step. Color-based k-means clustering is adopted for text segmentation. Compared with the grayscale images used in most existing methods, color images are more suitable for clustering-based segmentation. The method treats corner points, edge points and other points equally, which solves the problem of handling multilingual text. Experimental results show that the best performance is obtained when k is 3. Comparative experiments on a large number of images show that our method is accurate and robust in various conditions.
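
The color-based clustering step can be sketched as plain k-means over RGB pixels with k = 3. This is only an illustration: the deterministic initialization, the iteration count, and the toy pixel values are assumptions, and the paper's text location stages are not reproduced here.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[float, float, float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two colors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_colors(pixels: List[Pixel], k: int = 3, iters: int = 20):
    """Cluster pixels by color; returns (labels, centers)."""
    # Deterministic init (an assumption of this sketch): k evenly spaced pixels.
    step = max(1, len(pixels) // k)
    centers = [pixels[min(i * step, len(pixels) - 1)] for i in range(k)]
    labels = [0] * len(pixels)
    for _ in range(iters):
        # Assignment step: nearest center in RGB space.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in pixels]
        # Update step: mean color of each non-empty cluster.
        for c in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(ch) / len(members)
                                   for ch in zip(*members))
    return labels, centers


# Toy text box: light background, dark outline, red glyph pixels.
pixels = [(250, 250, 250), (255, 255, 255),
          (10, 10, 10), (0, 0, 0),
          (200, 30, 30), (210, 20, 25)]
labels, centers = kmeans_colors(pixels, k=3)
```

In a segmentation setting, one of the three resulting color clusters would then be chosen as the text layer; how that choice is made is not described in the abstract.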

44 citations

Proceedings Article

Attention Model Based SIFT Keypoints Filtration for Image Retrieval

Ke Gao¹, Shouxun Lin¹, Yongdong Zhang¹, Sheng Tang¹, Huamin Ren¹

Chinese Academy of Sciences¹

14 May 2008

TL;DR: Experiments demonstrate that the attention model based SIFT keypoints filtration algorithm provides significant benefits both in retrieval accuracy and matching speed.

Abstract: Effective feature extraction is a fundamental component of content-based image retrieval. Scale Invariant Feature Transform (SIFT) has been proven to be the most robust local invariant feature descriptor. However, the SIFT algorithm generates hundreds of thousands of keypoints per image, and most of them come from the background. This has seriously affected the application of SIFT in real-time image retrieval. This paper addresses this problem and proposes a novel method to filter the SIFT keypoints using an attention model. Based on visual attention analysis, all of the keypoints in an image are ranked by their attention saliency, and only the most distinctive keypoints are retained. Then we use Bag of Words to efficiently index these features. Experiments demonstrate that the attention model based SIFT keypoints filtration algorithm provides significant benefits in both retrieval accuracy and matching speed.
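
The ranking-and-filtering idea can be shown in a few lines. The sketch assumes a precomputed saliency map and keypoints given as (x, y) pixel positions; the function name, the `keep` parameter, and the toy values are hypothetical, and neither SIFT detection nor the attention model itself is implemented here.

```python
from typing import List, Sequence, Tuple

Keypoint = Tuple[int, int]  # (x, y) pixel position


def filter_keypoints(keypoints: List[Keypoint],
                     saliency: Sequence[Sequence[float]],
                     keep: int) -> List[Keypoint]:
    """Rank keypoints by the saliency at their location and keep the top ones."""
    ranked = sorted(keypoints,
                    key=lambda kp: saliency[kp[1]][kp[0]],  # row = y, col = x
                    reverse=True)
    return ranked[:keep]


# Toy 2x2 saliency map (rows are y, columns are x) and four keypoints.
saliency = [[0.1, 0.9],
            [0.5, 0.2]]
keypoints = [(0, 0), (1, 0), (0, 1), (1, 1)]

kept = filter_keypoints(keypoints, saliency, keep=2)
```

Only the retained keypoints would then be quantized into visual words for indexing, which is where the speed gain described in the abstract would come from.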

35 citations

Proceedings Article

Personalized multimedia web summarizer for tourist

Xiao Wu¹, Jintao Li¹, Yongdong Zhang¹, Sheng Tang¹, Shi-Yong Neo²

Chinese Academy of Sciences¹, National University of Singapore²

21 Apr 2008

TL;DR: The use of multimedia technology in generating intrinsic summaries of tourism related information through an automated process to gather, filter and classify information on various tourist spots on the Web made retrievable for mobile devices is highlighted.

Abstract: In this paper, we highlight the use of multimedia technology in generating intrinsic summaries of tourism-related information. The system utilizes an automated process to gather, filter and classify information on various tourist spots on the Web. The end result presented to the user is a personalized multimedia summary, generated with respect to the user's queries, that combines text, image, video and real-time news and is made retrievable for mobile devices. Preliminary experiments demonstrate the superiority of our presentation scheme to traditional methods.

18 citations

Proceedings Article

An Innovative Model of Tempo and Its Application in Action Scene Detection for Movie Analysis

An-An Liu¹, Jintao Li², Yongdong Zhang², Sheng Tang², Yan Song², Zhaoxuan Yang¹

Tianjin University¹, Chinese Academy of Sciences²

07 Jan 2008

TL;DR: An innovative model of tempo and its application in action scene detection for movie analysis is presented, and it is proposed for the first time that tempo indicates the rhythm of both movie scenarios and human perception.

Abstract: In this paper, we present an innovative model of tempo and its application in action scene detection for movie analysis. For the first time, we clearly propose that tempo indicates the rhythm of both movie scenarios and human perception. By thoroughly analyzing both aspects, we classify the factors of tempo into two sorts. The first is based on film grammar, and we use the low-level features of shot length and camera motion to describe filmmaking by directors. The second is based on human perception, and we propose an information measure for perception that draws on cognitive informatics, a newly emerging and significant subject. With the information in both visual and auditory modalities, the low-level features of motion intensity, motion complexity, audio energy and audio pace are integrated into the formulation of information to describe the viewers' emotional changes in response to the continuously developing storyline. With both aspects, tempo is defined and a tempo flow plot is derived as the clue of the storyline. On the basis of video structuralization and movie tempo analysis, we build a system for hierarchical browsing and editing with action scene annotation. Large-scale experiments demonstrate the effectiveness and generality of tempo for action movie analysis.

12 citations

Proceedings Article

A Hierarchical Scheme for Rapid Video Copy Detection

Xiao Wu¹, Yongdong Zhang¹, Sheng Tang¹, Xia Tian¹, Jintao Li¹

Chinese Academy of Sciences¹

07 Jan 2008

TL;DR: This paper presents a hierarchical scheme to detect video copies, especially temporally attacked and re-encoded ones; the algorithm, based on the ordinal signature of intra frames and an effective R*-tree indexing structure, achieves real-time performance.

Abstract: With the rapidly increasing popularity of web video sharing, digital copyright protection faces many difficulties. Video copy detection schemes are emerging to cope with digital video piracy and illegal distribution. However, the large amount of video data and the diversity of copy attacks make copy detection difficult. This paper presents a hierarchical scheme to detect video copies, especially temporally attacked and re-encoded ones. Our algorithm, which is based on the ordinal signature of intra frames and an effective R*-tree indexing structure, achieves real-time performance. Comparison experiments conducted on the benchmark database of the CIVR 2007 copy detection showcase demonstrate the promising results of the proposed approach.
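
An ordinal signature is commonly computed by dividing a frame into blocks, averaging the intensity of each block, and recording the rank of every block. The sketch below follows that general recipe under stated assumptions: the block means are taken as given, the block layout and the L1 rank distance are illustrative choices, and the R*-tree indexing used in the paper is not shown.

```python
from typing import List, Sequence


def ordinal_signature(block_means: Sequence[float]) -> List[int]:
    """Replace each block's mean intensity by its rank among all blocks."""
    order = sorted(range(len(block_means)), key=lambda i: block_means[i])
    ranks = [0] * len(block_means)
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    return ranks


def signature_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """L1 distance between two rank vectors (an illustrative choice)."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical 2x2 block means of an intra frame.
original = [12.0, 200.0, 90.0, 45.0]
# A brightened, contrast-stretched copy keeps the same block ordering.
copy = [v * 1.2 + 10.0 for v in original]
# An unrelated frame has a different ordering.
other = [200.0, 12.0, 45.0, 90.0]

sig_original = ordinal_signature(original)
sig_copy = ordinal_signature(copy)
sig_other = ordinal_signature(other)
```

Because only the ordering of block intensities is kept, global brightness or contrast changes such as those introduced by re-encoding leave the signature unchanged.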

12 citations

Proceedings Article

A statistical framework for replay detection in soccer video

Ying Yang¹, Shouxun Lin¹, Yongdong Zhang¹, Sheng Tang¹

Chinese Academy of Sciences¹

18 May 2008

TL;DR: Experimental results on soccer video are promising, demonstrating the effectiveness of the proposed framework, which segments the video stream and classifies it into replay and non-replay shots simultaneously.

Abstract: A novel statistical framework for replay detection is presented in this paper. Unlike current methods, the proposed framework exploits both the inherent characteristics and the transition relations of replay and non-replay scenes through annotation of the video, which segments the video stream and classifies it into replay and non-replay shots simultaneously. After annotation, each detected replay segment is further verified, and its boundaries are adjusted using the probability distribution of the lengths of replay and non-replay shots to obtain a more accurate replay segment. Experimental results on soccer video are promising, demonstrating the effectiveness of the proposed framework.

11 citations

Book Chapter

Document Clustering Based on Spectral Clustering and Non-negative Matrix Factorization

Lei Bao¹, Sheng Tang¹, Jintao Li¹, Yongdong Zhang¹, Weiping Ye²

Chinese Academy of Sciences¹, Beijing Normal University²

18 Jun 2008

TL;DR: A novel non-negative matrix factorization of the affinity matrix is proposed for document clustering; it enforces non-negativity and orthogonality constraints simultaneously and presents a much more reasonable clustering interpretation than previous NMF-based clustering methods.

Abstract: In this paper, we propose a novel non-negative matrix factorization (NMF) of the affinity matrix for document clustering, which enforces non-negativity and orthogonality constraints simultaneously. With the help of the orthogonality constraints, this NMF provides a solution to spectral clustering, which inherits the advantages of spectral clustering and presents a much more reasonable clustering interpretation than previous NMF-based clustering methods. Furthermore, with the help of the non-negativity constraints, the proposed method is also superior to traditional eigenvector-based spectral clustering, as it inherits the benefit of NMF-based methods that the non-negative solution is intuitive and the final clusters can be derived from it directly. As a result, the proposed method combines the advantages of spectral clustering and NMF-based methods, and hence outperforms both of them, as demonstrated by experimental results on the TDT2 and Reuters-21578 corpora.
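
One standard way to write a factorization with both constraints is given below. It is an assumed formulation for illustration, not necessarily the exact objective used in the chapter: $A$ denotes the $n \times n$ document affinity matrix, $H$ an $n \times k$ indicator-like matrix, and $k$ the number of clusters.

```latex
\min_{H \in \mathbb{R}^{n \times k}} \; \lVert A - H H^{\top} \rVert_F^2
\quad \text{subject to} \quad H \ge 0, \quad H^{\top} H = I_k .
```

When both constraints hold exactly, the columns of $H$ are non-negative and mutually orthogonal, so they have disjoint supports and each row of $H$ has at most one non-zero entry. Document $i$ can then be assigned to cluster $\arg\max_j H_{ij}$ without a separate clustering step on the embedding, which matches the abstract's remark that the final clusters can be derived directly from the non-negative solution.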

6 citations

Proceedings Article

A hierarchical framework for movie content analysis: Let computers watch films like humans

An-An Liu¹, Sheng Tang¹, Yongdong Zhang¹, Yan Song¹, Jintao Li¹, Zhaoxuan Yang²

Chinese Academy of Sciences¹, Tianjin University²

23 Jun 2008

TL;DR: The promising results of users' subjective assessment indicate that the proposed framework for movie content analysis is applicable for automatic analysis of movie content by computers.

Abstract: In this paper, we propose a hierarchical framework for movie content analysis. The purpose of our work is to realize computers' understanding of movie content, especially the "who, what, where, how" that occur in the storyline, by imitating human perception and cognition. The framework consists of two hierarchies. In the low-level part, we construct a human attention model with temporal information, motivated by the Weber-Fechner Law, to depict the variation of human perception in multiple modalities. In the high-level part, we focus on semantic understanding of different granularities of videos and simulate human cognition for movie content. Based on this hierarchical framework, we present its applications to semantic retrieval, video summarization and content filtering. The promising results of users' subjective assessment indicate that the proposed framework is applicable for automatic analysis of movie content by computers.

5 citations

Book Chapter

A more topologically stable locally linear embedding algorithm based on R*-tree

Xia Tian¹, Jintao Li¹, Yongdong Zhang¹, Sheng Tang¹

Chinese Academy of Sciences¹

20 May 2008

TL;DR: A new variant of LLE is presented that effectively prunes "short circuit" edges by performing spatial search on an R*-tree built on the dataset; the pruning turns the original fixed neighborhood size into a self-tuning value and thus makes the algorithm more topologically stable than LLE.

Abstract: Locally linear embedding (LLE) is a popular manifold learning algorithm for nonlinear dimensionality reduction. However, the success of LLE depends greatly on an input parameter, the neighborhood size, and how to find its optimal value is still an open problem. This paper focuses on this parameter, proposes that it should be self-tuning according to local density rather than a uniform value for all the data as in LLE, and presents a new variant of LLE that effectively prunes "short circuit" edges by performing spatial search on an R*-tree built on the dataset. This pruning turns the original fixed neighborhood size into a self-tuning value and thus makes our algorithm more topologically stable than LLE. The experiments support the correctness of our idea and method.
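
The idea of a neighborhood size that adapts to local density can be illustrated with a small sketch. It is not the chapter's procedure: brute-force search stands in for the R*-tree, and the pruning rule (drop candidates farther than `alpha` times the median candidate distance) as well as the names `adaptive_neighbors`, `k_max` and `alpha` are hypothetical choices made for this example.

```python
import math
from statistics import median
from typing import List, Sequence


def adaptive_neighbors(points: Sequence[Sequence[float]],
                       k_max: int = 4,
                       alpha: float = 1.5) -> List[List[int]]:
    """Start from k_max nearest neighbors, then prune unusually long edges.

    The per-point cutoff depends on that point's own candidate distances,
    so the effective neighborhood size varies with local density.
    """
    neighborhoods = []
    for i, p in enumerate(points):
        # Brute-force k-NN (the paper uses an R*-tree for spatial search).
        candidates = sorted((math.dist(p, q), j)
                            for j, q in enumerate(points) if j != i)[:k_max]
        cutoff = alpha * median([d for d, _ in candidates])
        neighborhoods.append([j for d, j in candidates if d <= cutoff])
    return neighborhoods


# Four points along a line plus one point far off to the side.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (3.0, 5.0)]
nb = adaptive_neighbors(points)
```

In this toy data the points on the line drop the long edge to the distant point, while the distant point, whose candidates are all similarly far away, keeps all four neighbors.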

4 citations

Book Chapter

Object-based image retrieval with attention analysis and spatial re-ranking

Ke Gao¹, Shouxun Lin¹, Yongdong Zhang¹, Sheng Tang¹

Chinese Academy of Sciences¹

19 Oct 2008

TL;DR: A novel object-based image retrieval framework that integrates effective pre-treatment and re-ranking is presented, and a new feature filtration method based on attention analysis is proposed for pre-treatment.

Abstract: In this paper, a new method is proposed for object-based image retrieval. The user supplies a query object by selecting a region from a query image, and the system returns a ranked list of images that contain the same object, retrieved from a large image database. The main outcomes of this research are as follows: (1) a novel object-based image retrieval framework that integrates effective pre-treatment and re-ranking is presented; (2) a new feature filtration method based on attention analysis is proposed for pre-treatment; (3) to further improve object retrieval precision, an efficient spatial configuration model is added to re-rank the primary retrieval result obtained with the Bag of Words method. Experimental results demonstrate the effectiveness of our method.

Proceedings Article

Spatio-temporal visual consistency for video copy detection

Xiao Wu¹, Jintao Li¹, Yongdong Zhang¹, Sheng Tang¹

Chinese Academy of Sciences¹

01 Jan 2008

TL;DR: Based on spatio-temporal consistency, the algorithm utilizes the invariant pattern of visual information for video matching, and experiments verify its robustness and efficiency.

Abstract: Video copy detection is essentially a problem of large-scale pattern matching. Various copy attacks that change the visual appearance make this task difficult. Based on spatio-temporal consistency, our algorithm aims to utilize the invariant pattern of visual information for video matching. The position correlation of trajectory feature points is calculated as the signature for fast detection. Experiments using a benchmark dataset and common copy attacks verify the robustness and efficiency of our algorithm.

Proceedings Article

Personalized event-based news video retrieval with dynamic user-log

Ming Li, Yan-Tao Zheng, Shi-Yong Neo, Xiangdong Wang¹, Sheng Tang¹, Shouxun Lin¹

Chinese Academy of Sciences¹

01 Jun 2008

TL;DR: This paper presents a personalized news video retrieval engine, which exploits the individual user's previous browsing history to customize and enhance their future search results.

Abstract: Personalization, especially in the domain of information retrieval, is important, as users might pose the same query even when they are searching for different information. It is thus necessary to create a retrieval engine that takes into consideration the dynamic information needs of different users. This paper presents our personalized news video retrieval engine, which exploits the individual user's previous browsing history to customize and enhance their future search results. Specifically, the system utilizes the news topic hierarchy, a hierarchical news topic structure derived from unsupervised clustering on the news video corpus and event entities from news video and online news articles. We then dynamically project the user's browsing history onto this topic hierarchy to provide the basis for re-ranking relevant news videos. The system is tested on one month of the TRECVID 2006 dataset, consisting of 80 hours of news video, and is found to return results in a more intuitive and personalized manner.

Book Chapter

Local Separability Assessment: A Novel Feature Selection Method for Multimedia Applications

Kun Tao¹, Shouxun Lin¹, Yongdong Zhang¹, Sheng Tang¹

Chinese Academy of Sciences¹

09 Dec 2008

TL;DR: This paper proposes a novel feature selection method named Local Separability Assessment, which measures the separation level of samples in subregions of feature space and integrates these measurements to evaluate the separability of features.

Abstract: Feature selection can help to reduce feature redundancy and improve classification performance. Most general feature selection methods do not perform well on the high-dimensional, large-scale data sets of multimedia applications. In this paper we propose a novel feature selection method named Local Separability Assessment. We measure the separation level of samples in subregions of feature space and integrate these measurements to evaluate the separability of features. Our method performs favorably on large-scale continuous data sets and requires no a priori hypothesis on the data distribution. Experiments on various applications demonstrate its effectiveness.

Book Chapter

Local Subspace-Based Denoising for Shot Boundary Detection

Xuefeng Pan¹, Yongdong Zhang¹, Jintao Li¹, Xiaoyuan Cao¹, Sheng Tang¹

Chinese Academy of Sciences¹

18 Jun 2008

TL;DR: The shortcoming of using a common feature space for content representation and continuity computation is demonstrated, and a denoising method that can effectively restrain in-shot change for SBD is proposed.

Abstract: Shot boundary detection (SBD) has long been an important problem in content-based video analysis. In existing works, researchers have proposed various methods to analyze the continuity of a video sequence for SBD. However, the conventional methods focus on analyzing adjacent-frame continuity information in some common feature space. The feature space for content representation and continuity computation is seldom specialized for different parts of the video content. In this paper, we demonstrate the shortcoming of using a common feature space and propose a denoising method that can effectively restrain in-shot change for SBD. A local subspace specialized for every period of video content is used to develop the denoising method. The experimental results show that the proposed method can remove the noise effectively and improve the performance of SBD.
