Showing papers by "Sheng Tang" published in 2008


Sheng Tang, Jintao Li, Ming Li, Cheng Xie, Yizhi Liu, Kun Tao, Shao-Xi Xu1 •
01 Jan 2008
TL;DR: This paper proposes a novel method based on Latent Dirichlet Allocation (LDA), LDA-based multiple-SVM (LDASVM), to improve training efficiency and to explore the knowledge between concepts or hidden sub-domains more easily and efficiently.
Abstract: For the TRECVID 2008 concept detection task, we principally focus on: (1) Early fusion of texture, edge and color features into TECM, an abbreviation for the combination of TF*IDF weights based on SIFT features, Edge Histogram, and Color Moments. (2) To improve the training efficiency and explore the knowledge between concepts or hidden sub-domains more easily and efficiently, we propose a novel method based on Latent Dirichlet Allocation (LDA): LDA-based multiple-SVM (LDASVM). We first use LDA to cluster all the keyframes into topics according to the maximum element of the topic-simplex representation vector (TRV) of each keyframe. Then, we train on the annotated data in each topic for each concept. During training, unlike multi-bag SVM, we use only the positive samples in the current topic, instead of all positive samples in the whole training set, for the sake of retaining the samples' separability, and we ignore topics with too few positive samples. While testing a keyframe for a given concept, we adopt the TRV as the weight vector, instead of an equal weighting strategy, to combine the SVM outputs of the topic models. (3) Introduction of Pseudo Relevance Feedback (PRF) into our concept detection system for the purpose of making re-trained models more adaptive to the test data: unlike existing PRF techniques in text and video retrieval, we propose a preliminary strategy to explore the visual features of positive training samples to improve the quality of pseudo positive samples. Experimental results demonstrate that our proposed LDASVM approach is both effective and efficient.
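The topic assignment and the TRV-weighted combination described in (2) can be illustrated with a minimal sketch. The function names, the dictionary of per-topic SVM outputs, and the renormalization over topics that actually have a model are assumptions made for illustration; the abstract does not state how skipped topics are handled at test time.

```python
from typing import Dict, List


def assign_topic(trv: List[float]) -> int:
    """Return the index of the largest TRV element (the keyframe's topic)."""
    return max(range(len(trv)), key=lambda k: trv[k])


def fuse_topic_scores(trv: List[float], topic_scores: Dict[int, float]) -> float:
    """Combine per-topic SVM outputs for one keyframe and one concept.

    trv: topic-simplex representation vector (non-negative entries).
    topic_scores: SVM output per topic index; topics skipped during
        training (too few positive samples) are simply absent.
    """
    # Assumption: weights of the topics that have a model are renormalized.
    total_weight = sum(trv[k] for k in topic_scores)
    if total_weight == 0.0:
        return 0.0
    weighted = sum(trv[k] * score for k, score in topic_scores.items())
    return weighted / total_weight
```

With an equal weighting strategy every topic model would contribute the same amount; here a topic model contributes in proportion to how strongly the keyframe belongs to that topic.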

63 citations


Proceedings Article•DOI•
Yan Song1, An-An Liu1, Lin Pang1, Shouxun Lin1, Yongdong Zhang1, Sheng Tang1 •
14 May 2008
TL;DR: A coarse-to-fine text location method is implemented, a multi-scale approach is adopted to locate texts with different font sizes, and color-based k-means clustering is adopted in text segmentation.
Abstract: Texts in web pages, images and videos contain important clues for information indexing and retrieval. Most existing text extraction methods depend on the language type and text appearance. In this paper, a novel and universal method of image text extraction is proposed. A coarse-to-fine text location method is implemented. Firstly, a multi-scale approach is adopted to locate texts with different font sizes. Secondly, projection profiles are used in the location refinement step. Color-based k-means clustering is adopted in text segmentation. Compared to the grayscale images used in most existing methods, color images are more suitable for segmentation based on clustering. The method treats corner points, edge points and other points equally, which allows it to handle multilingual text. Experimental results show that the best performance is obtained when k is 3. Comparative experimental results on a large number of images show that our method is accurate and robust in various conditions.
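As a rough illustration of the color-based clustering step, the sketch below runs plain k-means with k = 3 on RGB pixels of a located text region. It is not the authors' implementation: the deterministic initialization from the first k distinct colors, the iteration cap, and the function names are assumptions.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[float, float, float]


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two colors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_colors(pixels: List[Pixel], k: int = 3, iters: int = 20):
    """Cluster RGB pixels into k color groups; returns (labels, centers)."""
    # Assumption: initialize centers with the first k distinct colors.
    centers: List[Pixel] = []
    for p in pixels:
        if p not in centers:
            centers.append(p)
        if len(centers) == k:
            break
    if len(centers) < k:
        raise ValueError("need at least k distinct colors")

    labels = [0] * len(pixels)
    for _ in range(iters):
        # Assignment step: nearest center for every pixel.
        labels = [min(range(k), key=lambda c: _dist2(p, centers[c])) for p in pixels]
        # Update step: mean color of each cluster (keep old center if empty).
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(ch) / len(members) for ch in zip(*members)))
            else:
                new_centers.append(centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers
```

One of the three clusters would then be selected as the text layer; how that selection is made is not described in the abstract.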

44 citations


Proceedings Article•DOI•
Ke Gao1, Shouxun Lin1, Yongdong Zhang1, Sheng Tang1, Huamin Ren1 •
14 May 2008
TL;DR: Experiments demonstrate that the attention model based SIFT keypoints filtration algorithm provides significant benefits both in retrieval accuracy and matching speed.
Abstract: Effective feature extraction is a fundamental component of content-based image retrieval. Scale Invariant Feature Transform (SIFT) has been proven to be the most robust local invariant feature descriptor. However, the SIFT algorithm generates hundreds of thousands of keypoints per image, and most of them come from the background. This has seriously affected the application of SIFT in real-time image retrieval. This paper addresses this problem and proposes a novel method to filter the SIFT keypoints using an attention model. Based on visual attention analysis, all of the keypoints in an image are ranked by their attention saliency, and only the most distinctive keypoints are retained. Then we use Bag of Words to efficiently index these features. Experiments demonstrate that the attention-model-based SIFT keypoint filtration algorithm provides significant benefits in both retrieval accuracy and matching speed.
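The ranking-and-retaining step can be sketched as follows. The saliency map is taken as given (how the attention model computes it is not covered here), and the `keep_ratio` parameter and function name are hypothetical.

```python
from typing import List, Sequence, Tuple

Keypoint = Tuple[int, int]  # (x, y) pixel position of a keypoint


def filter_keypoints(
    keypoints: List[Keypoint],
    saliency: Sequence[Sequence[float]],
    keep_ratio: float = 0.3,
) -> List[Keypoint]:
    """Keep the most salient fraction of keypoints.

    saliency is a 2D map indexed as saliency[y][x]; keep_ratio is an
    illustrative parameter, not a value taken from the paper.
    """
    if not keypoints:
        return []
    # Rank keypoints by the saliency value at their location, highest first.
    ranked = sorted(keypoints, key=lambda kp: saliency[kp[1]][kp[0]], reverse=True)
    n_keep = max(1, int(len(ranked) * keep_ratio))
    return ranked[:n_keep]
```

Only the descriptors of the retained keypoints would then be quantized and indexed, which is where the reduction in matching time comes from.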

35 citations


Proceedings Article•DOI•
21 Apr 2008
TL;DR: The use of multimedia technology in generating intrinsic summaries of tourism related information through an automated process to gather, filter and classify information on various tourist spots on the Web made retrievable for mobile devices is highlighted.
Abstract: In this paper, we highlight the use of multimedia technology in generating intrinsic summaries of tourism-related information. The system utilizes an automated process to gather, filter and classify information on various tourist spots on the Web. The end result presented to the user is a personalized multimedia summary, generated with respect to the user's queries, that combines text, image, video and real-time news and is retrievable on mobile devices. Preliminary experiments demonstrate the superiority of our presentation scheme to traditional methods.

18 citations


Proceedings Article•DOI•
07 Jan 2008
TL;DR: An innovative model of tempo and its application in action scene detection for movie analysis is presented, for the first time, and it is clearly proposed that tempo indicates the rhythm of both movie scenarios and human perception.
Abstract: In this paper, we present an innovative model of tempo and its application in action scene detection for movie analysis. For the first time, we clearly propose that tempo indicates the rhythm of both movie scenarios and human perception. By thoroughly analyzing both aspects, we classify the factors of tempo into two sorts. The first is based on film grammar, and we use the low-level features of shot length and camera motion to describe filmmaking by directors. The second is based on human perception, and we propose an original information measure for perception based on cognitive informatics, a newly emerging and significant subject. With the information in both visual and auditory modalities, the low-level features of motion intensity, motion complexity, audio energy and audio pace are integrated to formulate the information that describes the viewers' emotional changes in response to the continuously developing storyline. With both aspects, tempo is defined and a tempo flow plot is derived as the clue to the storyline. On the basis of video structuralization and movie tempo analysis, we build a system for hierarchical browsing and editing with action scene annotation. The large-scale experiments demonstrate the effectiveness and generality of tempo for action movie analysis.

12 citations


Proceedings Article•DOI•
Xiao Wu1, Yongdong Zhang1, Sheng Tang1, Xia Tian1, Jintao Li1 •
07 Jan 2008
TL;DR: This paper presents a hierarchical scheme to detect video copies, especially temporally attacked and re-encoded ones; the algorithm, based on the ordinal signature of intra frames and an effective R*-tree indexing structure, achieves real-time performance.
Abstract: Today, with the rapidly increasing popularity of web video sharing, digital copyright protection encounters many difficulties. Video copy detection schemes are emerging to cope with the problems of digital video piracy and illegal distribution. But the large amount of video data and the diversity of copy attacks make copy detection difficult. This paper presents a hierarchical scheme to detect video copies, especially temporally attacked and re-encoded ones. Our algorithm, which is based on the ordinal signature of intra frames and an effective R*-tree indexing structure, achieves real-time performance. Comparison experiments are conducted on the benchmark database of the CIVR 2007 copy detection showcase and demonstrate the promising results of the proposed approach.
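For readers unfamiliar with ordinal signatures, the sketch below shows the classical block-based ordinal measure on a single grayscale frame: the frame is split into a grid, each block's mean intensity is computed, and the signature is the rank of each block. The abstract does not specify the exact signature used on intra frames, so the grid size, the L1 distance, and the function names are assumptions.

```python
from typing import List, Sequence


def ordinal_signature(frame: Sequence[Sequence[float]], rows: int = 3, cols: int = 3) -> List[int]:
    """Rank of each block's mean intensity (0 = darkest block).

    frame is a 2D grid of grayscale values with at least `rows` rows
    and `cols` columns, so that no block is empty.
    """
    h, w = len(frame), len(frame[0])
    means = []
    for r in range(rows):
        for c in range(cols):
            y0, y1 = r * h // rows, (r + 1) * h // rows
            x0, x1 = c * w // cols, (c + 1) * w // cols
            block = [frame[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            means.append(sum(block) / len(block))
    # Convert mean intensities to ranks.
    order = sorted(range(len(means)), key=lambda i: means[i])
    ranks = [0] * len(means)
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    return ranks


def signature_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """L1 distance between two ordinal signatures (0 means identical ranks)."""
    return sum(abs(x - y) for x, y in zip(a, b))
```

Because only the ordering of block intensities matters, a global brightness shift or mild re-encoding distortion leaves the signature unchanged, which is why such signatures suit copy detection.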

12 citations


Proceedings Article•DOI•
18 May 2008
TL;DR: Experimental results on soccer video are promising, demonstrating the effectiveness of the proposed framework, which segments the video stream and classifies it into replay and non-replay shots simultaneously.
Abstract: A novel statistical framework for replay detection is presented in this paper. Unlike current methods, the proposed framework exploits both the inherent characteristics and the transition relations of replay and non-replay scenes based on annotation of the video, which segments the video stream and classifies it into replay and non-replay shots simultaneously. After annotation, each detected replay segment is further verified and its boundaries are adjusted, taking into account the probability distribution of the lengths of replay and non-replay shots, to obtain a more accurate replay segment. Experimental results on soccer video are promising, demonstrating the effectiveness of the proposed framework.

11 citations


Book Chapter•DOI•
18 Jun 2008
TL;DR: A novel non-negative matrix factorization to the affinity matrix for document clustering, which enforces non-negativity and orthogonality constraints simultaneously and presents a much more reasonable clustering interpretation than the previous NMF-based clustering methods.
Abstract: In this paper, we propose a novel non-negative matrix factorization (NMF) of the affinity matrix for document clustering, which enforces non-negativity and orthogonality constraints simultaneously. With the help of the orthogonality constraints, this NMF provides a solution to spectral clustering, which inherits the advantages of spectral clustering and presents a much more reasonable clustering interpretation than previous NMF-based clustering methods. Furthermore, with the help of the non-negativity constraints, the proposed method is also superior to traditional eigenvector-based spectral clustering, as it inherits the benefit of NMF-based methods that the non-negative solution is intuitive and the final clusters can be derived from it directly. As a result, the proposed method combines the advantages of spectral clustering and NMF-based methods, and hence outperforms both of them, as demonstrated by experimental results on the TDT2 and Reuters-21578 corpora.

6 citations


Proceedings Article•DOI•
23 Jun 2008
TL;DR: The promising results of users' subjective assessment indicate that the proposed framework for movie content analysis is applicable for automatic analysis of movie content by computers.
Abstract: In this paper, we propose a hierarchical framework for movie content analysis. The purpose of our work is to realize computers' understanding of movie content, especially the "who, what, where, how" that occur in the storyline, by imitating human perception and cognition. The framework consists of two hierarchies. For the low-level part, we construct an original human attention model with temporal information, motivated by the Weber-Fechner Law, to depict the variation of human perception in multiple modalities. For the high-level part, we focus on semantic understanding of videos at different granularities and simulate human cognition of movie content. Based on this hierarchical framework, we present its applications to semantic retrieval, video summarization and content filtering. The promising results of users' subjective assessment indicate that the proposed framework is applicable for automatic analysis of movie content by computers.
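For reference, the Weber-Fechner Law that motivates the attention model is usually written as below, where $p$ is the perceived intensity, $S$ the stimulus magnitude, $S_0$ the detection threshold, and $k$ a modality-dependent constant. This is the textbook form only; how the paper adapts it to temporal information is not stated in the abstract.

```latex
p = k \,\ln\!\left(\frac{S}{S_0}\right)
```

The logarithm captures the idea that equal ratios of stimulus change, rather than equal absolute changes, are perceived as equal steps.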

5 citations


Book Chapter•DOI•
20 May 2008
TL;DR: A new variant of LLE is presented, which effectively prunes "short circuit" edges by performing spatial search on an R*-Tree built on the dataset; this turns the original fixed neighborhood size into a self-tuning value and makes the algorithm more topologically stable than LLE.
Abstract: Locally linear embedding (LLE) is a popular manifold learning algorithm for nonlinear dimensionality reduction. However, the success of LLE depends greatly on an input parameter, the neighborhood size, and how to find its optimal value is still an open problem. This paper focuses on this parameter, proposes that it should be self-tuning according to local density rather than a uniform value for all the data as in LLE, and presents a new variant of LLE that effectively prunes "short circuit" edges by performing spatial search on an R*-Tree built on the dataset. This pruning turns the original fixed neighborhood size into a self-tuning value, and thus makes our algorithm more topologically stable than LLE. The experiments prove that our idea and method are correct.

4 citations


Book Chapter•DOI•
19 Oct 2008
TL;DR: A novel object-based image retrieval framework that integrates effective pre-treatment and re-ranking is presented, and a new feature filtration method based on attention analysis is proposed for pre-treatment.
Abstract: In this paper, a new method is proposed for object-based image retrieval. The user supplies a query object by selecting a region from a query image, and the system returns a ranked list of images that contain the same object, retrieved from a large image database. The main outcomes of this research are as follows: (1) a novel object-based image retrieval framework that integrates effective pre-treatment and re-ranking is presented; (2) a new feature filtration method based on attention analysis is proposed for pre-treatment; (3) to further improve object retrieval precision, we add an efficient spatial configuration model to re-rank the primary retrieval result obtained using the Bag of Words method. Experimental results demonstrate the effectiveness of our method.

Proceedings Article•DOI•
01 Jan 2008
TL;DR: Based on spatio-temporal consistency, the algorithm utilizes the invariant pattern of visual information for video matching; experiments verify its robustness and efficiency.
Abstract: Video copy detection is essentially a problem of large-scale pattern matching. Various copy attacks that change the visual appearance make this task hazardous. Based on spatio-temporal consistency, our algorithm aims to utilize the invariant pattern of visual information for video matching. The position correlation of trajectory feature points is calculated as the signature for fast detection. Experiments using a benchmark dataset and common copy attacks verify the robustness and efficiency of our algorithm.

Proceedings Article•DOI•
01 Jun 2008
TL;DR: This paper presents a personalized news video retrieval engine, which exploits the individual user's previous browsing history to customize and enhance their future search results.
Abstract: Personalization, especially in the domain of information retrieval, is essential, as users might pose the same query even when they are searching for different information. It is thus necessary to create a retrieval engine which takes into consideration the dynamic information needs of different users. This paper presents our personalized news video retrieval engine, which exploits the individual user's previous browsing history to customize and enhance their future search results. Specifically, the system utilizes the news topic hierarchy, a hierarchical news topic structure derived from unsupervised clustering on the news video corpus and event entities from news video and online news articles. We then dynamically project the user's browsing history onto this topic hierarchy to provide the basis for re-ranking relevant news videos. This system is tested on one month of the TRECVID 2006 dataset, consisting of 80 hours of news video, and found to return results in a more intuitive and personalized manner.

Book Chapter•DOI•
09 Dec 2008
TL;DR: This paper tries to measure the separation level of samples in subregions of feature space, and integrate them for evaluating the separability of features, and proposes a novel feature selection method named Local Separability Assessment.
Abstract: Feature selection can help to reduce feature redundancy and improve classification performance. Most general feature selection methods do not perform well on the high-dimensional, large-scale data sets of multimedia applications. In this paper we propose a novel feature selection method named Local Separability Assessment. We try to measure the separation level of samples in subregions of the feature space, and integrate these measurements to evaluate the separability of features. Our method has favorable performance on large-scale continuous data sets, and requires no a priori hypothesis on the data distribution. The experiments on various applications have proved its excellence.

Book Chapter•DOI•
Xuefeng Pan1, Yongdong Zhang1, Jintao Li1, Xiaoyuan Cao1, Sheng Tang1 •
18 Jun 2008
TL;DR: The shortcoming of using a common feature space for content representation and continuity computation is demonstrated, and a denoising method that can effectively restrain in-shot change for SBD is proposed.
Abstract: Shot boundary detection (SBD) has long been an important problem in content-based video analysis. In existing works, researchers have proposed various methods to analyze the continuity of a video sequence for SBD. However, the conventional methods focus on analyzing adjacent-frame continuity information in some common feature space. The feature space for content representation and continuity computation is seldom specialized for different parts of the video content. In this paper, we demonstrate the shortcoming of using a common feature space, and propose a denoising method that can effectively restrain in-shot change for SBD. A local subspace specialized for every period of video content is used to develop the denoising method. The experimental results show that the proposed method can remove the noise effectively and improve the performance of SBD.