
Showing papers by "Sheng Tang published in 2007"


01 Jan 2007
TL;DR: An EMD-based bag-of-feature method is proposed to exploit visual/spatial information, and WordNet is utilized to expand the semantic meanings of text to boost the generalization of detectors.
Abstract: We participated in the high-level feature extraction task in TRECVID 2007. This paper describes the details of our system for the task. For feature extraction, we propose an EMD-based bag-of-feature method to exploit visual/spatial information, and utilize WordNet to expand the semantic meanings of text to boost the generalization of detectors. We also explore audio features and extract motion cues in the compressed domain for detecting concepts highly associated with audio/motion. We use the Ordered Weighted Average (OWA) fusion method to combine the SVM-based multi-modal concept detection results. Experimental results show that our methods are effective.
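The OWA fusion step can be sketched compactly. This is a minimal illustration, assuming per-concept detector scores in [0, 1] and position weights summing to 1; the scores and weight values below are invented for the example, not taken from the paper:

```python
def owa_fuse(scores, weights):
    """Ordered Weighted Averaging: sort the detector scores in
    descending order, then weight by *position* rather than by source."""
    assert len(scores) == len(weights)
    assert abs(sum(weights) - 1.0) < 1e-9
    ordered = sorted(scores, reverse=True)
    return sum(w * s for w, s in zip(weights, ordered))

# Hypothetical per-modality SVM scores for one concept on one shot
# (visual, text, audio, motion):
scores = [0.9, 0.4, 0.7, 0.2]
# An "optimistic" weight vector that emphasizes the strongest evidence.
fused = owa_fuse(scores, [0.5, 0.3, 0.15, 0.05])
```

Because the weights attach to ranks rather than to fixed modalities, OWA interpolates between a max and a plain mean depending on how top-heavy the weight vector is.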

43 citations


Proceedings ArticleDOI
01 Feb 2007
TL;DR: A new method for highlight extraction in soccer videos is proposed, based on the observations that the appearance of the goal-mouth indicates a high likelihood of exciting action, and that a highlight is composed of certain types of scene views exhibiting certain transition rules.
Abstract: A new method is proposed for highlight extraction in soccer videos based on goal-mouth detection. This approach is based on the observations that the appearance of the goal-mouth indicates a high likelihood of exciting action in soccer videos, and that a highlight is composed of certain types of scene views which exhibit certain transition rules. To exploit these observations, the goal-mouth is first detected and segmented in soccer videos by the Top-Hat Transform and some domain rules; then, using scene transition rules, highlights are extracted based on the goal-mouth detections. The effectiveness and efficiency of this approach are demonstrated by experimental results on shot detection.
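The white Top-Hat Transform mentioned here is standard grayscale morphology: the image minus its morphological opening, which keeps thin bright structures such as goal-post lines. A minimal NumPy sketch with a square structuring element; the frame below is synthetic, and the paper's domain rules are not reproduced:

```python
import numpy as np

def _morph(img, k, op):
    """Apply min (erosion) or max (dilation) over k-by-k neighborhoods."""
    pad = k // 2
    p = np.pad(img, pad, mode='edge')
    out = np.empty_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = op(p[i:i + k, j:j + k])
    return out

def white_tophat(img, k=3):
    """Image minus its opening (erosion followed by dilation)."""
    opened = _morph(_morph(img, k, np.min), k, np.max)
    return img - opened

# A dark field with one thin bright vertical line, roughly a goal-post edge:
frame = np.zeros((9, 9))
frame[:, 4] = 1.0
response = white_tophat(frame, k=3)
```

Structures narrower than the structuring element survive the subtraction intact, while broad bright regions (sky, stands) are suppressed, which is why top-hat works as a line detector here.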

19 citations


Proceedings ArticleDOI
17 Sep 2007
TL;DR: This paper proposes a lexicon-guided two-level LDA retrieval framework, which uses HowNet to guide the first-level LDA model's parameter estimation and then constructs the second-level LDA models based on the first level's inference results.
Abstract: Topic-based language models have attracted much attention with the rise of semantic retrieval in recent years. Especially for ASR text with errors, a topic representation is more reasonable than an exact term representation. Among these models, Latent Dirichlet Allocation (LDA) has been noted for its ability to discover latent topic structure, and it is broadly applied in many text-related tasks. But up to now its application in information retrieval (IR) has been limited to being a supplement to the standard document models, and, furthermore, it has been pointed out that directly employing the basic LDA model hurts retrieval performance. In this paper, we propose a lexicon-guided two-level LDA retrieval framework. It uses HowNet to guide the first-level LDA model's parameter estimation, and further constructs the second-level LDA models based on the first level's inference results. We use the TRECVID 2005 ASR collection to evaluate it, and compare it with the vector space model (VSM) and Latent Semantic Indexing (LSI). Our experiments show the proposed method is very competitive.
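Only the final matching step of such topic-based retrieval can be sketched compactly. Assuming documents and the query have already been mapped to topic mixtures by the (two-level, HowNet-guided) LDA inference, ranking reduces to similarity in topic space; the mixtures below are invented for illustration:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def rank_by_topics(query_mix, doc_mixes):
    """Rank document indices by topic-space cosine similarity to the query."""
    sims = [(cosine(query_mix, d), i) for i, d in enumerate(doc_mixes)]
    return [i for _, i in sorted(sims, reverse=True)]

# Hypothetical 3-topic mixtures (each sums to 1):
docs = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]
order = rank_by_topics([0.7, 0.2, 0.1], docs)
```

Matching in topic space rather than term space is what gives tolerance to ASR errors: a misrecognized word shifts the topic mixture only slightly, while it can miss an exact-term match entirely.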

18 citations


Journal ArticleDOI
TL;DR: A secure and incidental-distortion-tolerant signature method for image authentication based on Hotelling's T-square Statistic via Principal Component Analysis (PCA) of block DCT coefficients, yielding a Structural and Statistical Signature (SSS).
Abstract: In this paper, a secure and incidental-distortion-tolerant signature method for image authentication is proposed. The authentication signature is generated from Hotelling's T-square Statistic (HTS) via Principal Component Analysis (PCA) of block DCT coefficients. The HTS values of all blocks construct a unique and stable "block-edge image", i.e., the Structural and Statistical Signature (SSS). The characteristics of SSS are that it is short, tolerates content-preserving manipulations while remaining sensitive to content-changing attacks, and easily localizes tampering. During signature matching, the Fisher criterion is used to obtain an optimal threshold for automatically and universally distinguishing incidental manipulations from malicious attacks. Moreover, the security of SSS is achieved by encrypting the DCT coefficients with chaotic sequences before PCA. Experiments show that the novel method is effective for authentication.
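The per-block Hotelling statistic can be sketched with plain NumPy. This is the generic T-square computation over PCA scores, not the paper's exact pipeline; the random matrix below merely stands in for (chaotically encrypted) block-DCT coefficients:

```python
import numpy as np

def hotelling_t2(blocks, n_components=3):
    """One T-square value per row: the sum of squared PCA scores,
    each normalized by its component's eigenvalue."""
    X = blocks - blocks.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
    order = np.argsort(eigvals)[::-1][:n_components]
    lam, V = eigvals[order], eigvecs[:, order]
    scores = X @ V                       # principal-component scores
    return np.sum(scores ** 2 / lam, axis=1)

rng = np.random.default_rng(0)
blocks = rng.normal(size=(64, 8))        # 64 blocks, 8 DCT coefficients each
t2 = hotelling_t2(blocks)
```

Normalizing each score by its eigenvalue makes T-square a Mahalanobis-style distance, so the statistic is stable under incidental distortions that barely move a block within the learned coefficient distribution.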

12 citations


Book ChapterDOI
18 Nov 2007
TL;DR: A novel statistical framework is proposed that performs shot segmentation and classification jointly: segmentation exploits the intra-shot character used for classification, while classification exploits the inter-shot character used for segmentation, yielding more accurate results.
Abstract: In this paper, a novel statistical framework is proposed for shot segmentation and classification. The proposed framework segments and classifies shots simultaneously, using the same difference features based on statistical inference. The task of shot segmentation and classification is taken as finding the most probable shot sequence given the feature sequence, and it can be formulated by a conditional probability which can be divided into a shot sequence probability and a feature sequence probability. The shot sequence probability is derived from relations between adjacent shots via a bi-gram model, and the feature sequence probability depends on the inherent character of each shot, modeled by an HMM. Thus, the proposed framework segments shots using the intra-shot character needed for classification, while it classifies shots using the inter-shot character needed for segmentation, which yields more accurate results. Experimental results on soccer and badminton videos are promising, and demonstrate the effectiveness of the proposed framework.
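Finding the most probable shot sequence given these two probability terms is a standard Viterbi decoding. A minimal sketch in log space, where the per-shot HMM likelihoods and bigram transition probabilities are assumed to be given; the toy numbers are invented:

```python
import math

def viterbi(emission_ll, trans_ll, init_ll):
    """emission_ll[t][c]: log-likelihood of shot t under the class-c HMM;
    trans_ll[p][c]: bigram log-probability of class c following class p.
    Returns the most probable class label sequence."""
    T, C = len(emission_ll), len(init_ll)
    dp = [[init_ll[c] + emission_ll[0][c] for c in range(C)]]
    back = []
    for t in range(1, T):
        row, ptr = [], []
        for c in range(C):
            p = max(range(C), key=lambda q: dp[-1][q] + trans_ll[q][c])
            row.append(dp[-1][p] + trans_ll[p][c] + emission_ll[t][c])
            ptr.append(p)
        dp.append(row)
        back.append(ptr)
    path = [max(range(C), key=lambda c: dp[-1][c])]
    for ptr in reversed(back):           # trace back-pointers to recover path
        path.append(ptr[path[-1]])
    return path[::-1]

# Two shot classes, three shots; emissions strongly favor classes 0, 0, 1:
ll = [[0.0, -5.0], [0.0, -5.0], [-5.0, 0.0]]
uniform = [[math.log(0.5)] * 2] * 2
labels = viterbi(ll, uniform, [math.log(0.5)] * 2)
```

With non-uniform transition scores, the bigram term would override weak emission evidence, which is exactly how inter-shot relations help segment ambiguous shots.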

10 citations


01 Jan 2007
TL;DR: The overall framework of the video search and retrieval system for both automated and interactive search is shown; the interactive part focuses on designing a high-performance feedback system in which users can apply several auto-feedback and active learning functions to improve retrieval performance.
Abstract: This paper describes the details of our systems for automated and interactive search in TRECVID 2007. The shift from news video to documentary video this year has prompted a series of changes to the processing techniques developed over the past few years. For the automated search task, we employ our previous query-dependent retrieval, which automatically discovers the query class and query-high-level-features (query-HLF) to fuse the available multimodal features. Different from previous work, our system this year gives more emphasis to visual features such as color, texture and motion in the video source. The reasons are: (a) given the low quality of ASR text and the more visual and motion oriented queries, we expect the visual features to be as discriminating as the text feature; and (b) the appropriate use of motion features is highly effective for queries, as they are able to model intra-frame changes. For the interactive task, we first use the automated search results for user feedback. The user can make use of our intuitive retrieval interface with a variety of relevance feedback techniques to refine the search results. In addition, we introduce motion-icons, which allow users to see a dynamic series of keyframes instead of a single keyframe during assessment. Results show that the approach can help provide better discrimination.

1. INTRODUCTION. The overall framework of our video search and retrieval system for both automated and interactive search is shown in Figure 1. There are two main stages: the auto search stage and the interactive search stage. Retrieval starts with the user query, which can simply be a free text query, or be coupled with image and video (a multimedia query). The auto search first processes the multimedia query and performs the retrieval. The emphasis is on understanding the query to infer the roles of HLF, motion and visual features in query processing.
For the interactive search, the user makes use of the automated search results to indicate whether the results are indeed relevant or otherwise. The emphasis is on designing a high-performance feedback system, in which users can apply several auto-feedback and active learning functions to improve retrieval performance. This year's corpus is Dutch documentary video. The videos are preprocessed and segmented into shots, with the speech track automatically recognized using a commercial automated speech recognition (ASR) engine and translated to English text. As a result of ASR and translation, the quality of the ASR text is quite low. This, coupled with a large number of visual and motion oriented queries, suggests that ASR text may not play a critical role in the retrieval process. In fact, visual and motion information will be as important as text as we move from news video to Dutch documentary video retrieval.

9 citations


Proceedings ArticleDOI
Na Cai1, Ming Li1, Shouxun Lin1, Yongdong Zhang1, Sheng Tang1 
26 Jul 2007
TL;DR: The AP-based weighting scheme, along with the more adaptive formulae in the method, makes it outperform the standard Adaboost algorithm as well as many other fusion methods.
Abstract: We propose an improved fusion method for high-level feature extraction at TRECVID: average-precision-based Adaboost (AP-based Adaboost). The AP-based weighting scheme makes use of both the weight and the rank of each sample, all of which contribute to the final average precision. This weighting scheme, along with the more adaptive formulae in our method, makes it outperform the standard Adaboost algorithm as well as many other fusion methods. Experimental results on the TRECVID 2005 development set show that our method is an effective and relatively robust fusion method.
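The quantity the boosting weights are built around is average precision over a ranked list. A minimal sketch of non-interpolated AP follows; the relevance list is invented, and the full AP-based Adaboost update is not reproduced here:

```python
def average_precision(ranked_relevance):
    """Non-interpolated AP: mean of precision@i over the relevant
    positions of a ranked result list (1 = relevant, 0 = not)."""
    hits, total = 0, 0.0
    for i, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            total += hits / i
    return total / hits if hits else 0.0

# A toy ranking with relevant items at ranks 1 and 3:
ap = average_precision([1, 0, 1, 0])
```

Because AP depends on the rank of every relevant sample, a weighting scheme driven by AP penalizes detectors that push relevant shots down the list, not just detectors that misclassify them, which is the intuition behind replacing the classification-error weight of standard Adaboost.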

8 citations


Proceedings ArticleDOI
26 Jul 2007
TL;DR: Based on video structuralization and human attention analysis, action concept annotation for interest-oriented navigation by viewers is realized, and the effectiveness and generality of the human attention model for action movie analysis are demonstrated.
Abstract: Nowadays, automatic highlight detection for movies is indispensable for video management and browsing. In this paper, we present the formulation of a human attention model and its application to action movie analysis. Based on the relationship between stimuli in the visual and audio modalities and changes in human attention, we construct visual and audio attention sub-models respectively. By integrating both sub-models, the human attention model is formulated, and an attention flow plot is derived to simulate the change of human attention over time. Based on video structuralization and human attention analysis, we realize action concept annotation for interest-oriented navigation by viewers. Large-scale experiments demonstrate the effectiveness and generality of the human attention model for action movie analysis.

7 citations


Proceedings ArticleDOI
10 Sep 2007
TL;DR: This paper presents a novel anchorperson detection algorithm based on the spatio-temporal slice (STS); with STS pattern analysis, clustering and decision fusion, anchorperson shots can be detected for browsing news video.
Abstract: For conveniently navigating and editing news programs, it is very important to segment the video into meaningful units. Effective indexing of news videos can be achieved via anchorperson shots, since an anchorperson shot indicates the occurrence of an upcoming news story. This paper presents a novel anchorperson detection algorithm based on the spatio-temporal slice (STS). With STS pattern analysis, clustering and decision fusion, anchorperson shots can be detected for browsing news video. Large-scale experimental results demonstrate that the algorithm is accurate, robust and effective.

6 citations


Proceedings ArticleDOI
02 Jul 2007
TL;DR: The use of a spatio-temporal visual map (STVM) model to supplement Web video retrieval is described, employing spatio-temporal visual similarity to rerank the text-retrieval results and find new results.
Abstract: The massive amount of multimedia information, especially video, available on the Web requires more precise and interactive retrieval. Current operational video retrieval systems do not make use of implicit visual features but rely only on textual metadata supplied by the user during uploading. This greatly affects retrieval performance, as the metadata may not be comprehensive or consistent. In this paper, we describe the use of a spatio-temporal visual map (STVM) model to supplement Web video retrieval. This is done by employing spatio-temporal visual similarity to rerank the text-retrieval results and find new results. Experimental results on a dynamic Web video corpus show significant improvement based on the STVM model, with good usability scores from human users.

5 citations


Proceedings ArticleDOI
26 Jul 2007
TL;DR: A novel approach to SVM (support vector machine) classification named CGSVM (clustering-guided SVM) is presented, which utilizes the clustering result to select the most informative image samples to be labeled and to optimize the penalty coefficient.
Abstract: SVMs (support vector machines) enable effective image classification for semantic image retrieval. However, training accurate image classifiers in a high-dimensional feature space suffers from the problem of choosing proper training samples. To solve this problem, a novel approach named CGSVM (clustering-guided SVM) is presented, which utilizes the clustering result to select the most informative image samples to be labeled and to optimize the penalty coefficient. Experimental results show that our algorithm achieves higher search accuracy than a regular SVM for semantic image retrieval.
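The sample-selection idea can be sketched as: cluster the unlabeled pool, then hand the annotator one representative per cluster first. This is a simplified reading of the abstract; the feature matrix, labels and centers below are made up, and the centers are assumed to come from any prior clustering step:

```python
import numpy as np

def cluster_representatives(X, labels, centers):
    """For each cluster, return the index of the member nearest its
    centroid -- one simple notion of 'most informative sample to label'."""
    reps = []
    for c, center in enumerate(centers):
        members = np.where(labels == c)[0]
        dists = np.linalg.norm(X[members] - center, axis=1)
        reps.append(int(members[dists.argmin()]))
    return reps

# Toy 2-D features, two clusters, precomputed assignments and centers:
X = np.array([[0.0, 0.0], [0.4, 0.0], [5.0, 5.0], [5.0, 5.6]])
labels = np.array([0, 0, 1, 1])
centers = np.array([[0.0, 0.1], [5.0, 5.5]])
reps = cluster_representatives(X, labels, centers)
```

Labeling one representative per cluster spends the annotation budget across the whole feature space instead of on redundant near-duplicates, which is the intuition behind clustering-guided sample selection.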

Book ChapterDOI
Xuefeng Pan1, Jintao Li1, Shan Ba, Yongdong Zhang1, Sheng Tang1 
09 Jan 2007
TL;DR: The experimental results show that the proposed feature extraction method based on spatiotemporal slice analysis is effective and robust across varying video content and formats.
Abstract: In this paper we propose a novel feature extraction method based on spatiotemporal slice analysis. To date, video features have focused on the character of every single video frame. With our method, the video content is no longer represented by every single frame; the temporal variation of visual information is taken as an important feature of the video. We examined this kind of feature with experiments in this paper. The experimental results show that the proposed feature is effective and robust across varying video content and formats.
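A spatiotemporal slice itself is simple to construct: fix one row (or column) of pixels and stack it across frames, so the resulting image has axes of space and time. A minimal sketch with a synthetic frame stack (the frame sizes are arbitrary):

```python
import numpy as np

def horizontal_slice(frames, row=None):
    """frames: (n_frames, height, width) array.
    Returns an (n_frames, width) slice: one pixel row per frame."""
    frames = np.asarray(frames)
    if row is None:
        row = frames.shape[1] // 2       # default: the middle row
    return frames[:, row, :]

# 10 synthetic 4x6 frames whose middle row brightens over time:
frames = np.zeros((10, 4, 6))
for t in range(10):
    frames[t, 2, :] = t / 10.0
sts = horizontal_slice(frames)
```

Camera and object motion leave characteristic stripe patterns in such a slice, which is why its temporal texture can serve as a video feature without examining every full frame.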

Proceedings ArticleDOI
02 Jul 2007
TL;DR: This paper describes an automated retrieval framework which fuses the multimodal features and event structures present in news video to support precise news video retrieval, employing temporal event clusters to provide additional information during story-level retrieval.
Abstract: Current state-of-the-art news video retrieval systems mainly rely on automated speech recognition (ASR) text to perform retrieval. This paradigm greatly affects retrieval performance, as ASR text alone is not sufficient to provide an accurate representation of the entire news video. In this paper, we describe our automated retrieval framework, which fuses the multimodal features and event structures present in news video to support precise news video retrieval. The contributions of this paper are: (a) we uncover and employ temporal event clusters to provide additional information during story-level retrieval; and (b) we integrate other modality features with text features and incorporate event clusters for pseudo relevance feedback (PRF) in shot-level re-ranking. Experiments performed on the video search task using the TRECVID 2005/06 dataset show that the proposed approach is effective.

Proceedings ArticleDOI
09 Jul 2007
TL;DR: A spatio-temporal visual map (STVM) retrieval system coupled with active learning to support user-centered interactive retrieval of news video segments is proposed.
Abstract: Interactive news video retrieval requires the effective communication between the human searchers and the search engine to locate relevant video segments. We propose a spatio-temporal visual map (STVM) retrieval [1] system coupled with active learning to support user-centered interactive retrieval.

Book ChapterDOI
Xuefeng Pan1, Jintao Li1, Yongdong Zhang1, Sheng Tang1, Juan Cao1 
02 Apr 2007
TL;DR: A robust video content retrieval method based on spatiotemporal features is proposed; the experimental results show that the proposed feature is robust across varying video formats.
Abstract: In this paper a robust video content retrieval method based on spatiotemporal features is proposed. To date, most video retrieval methods use the character of video key frames. Such frame-based methods are not robust enough across different video formats. With our method, the temporal variation of visual information is represented using a spatiotemporal slice. The DCT is then used to extract features from the slice. With this kind of feature, a robust video content retrieval algorithm is developed. The experimental results show that the proposed feature is robust across varying video formats.
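The DCT feature extraction over slice rows can be sketched with a naive DCT-II; a real system would use a fast transform, and the number of coefficients kept below is an arbitrary choice for illustration:

```python
import math

def dct2(x):
    """Naive DCT-II of a 1-D sequence."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * N))
                for n in range(N))
            for k in range(N)]

def slice_feature(slice_rows, keep=4):
    """Keep the first few low-frequency DCT coefficients of each slice
    row -- a compact, format-robust description of the slice."""
    return [dct2(row)[:keep] for row in slice_rows]

# A constant row concentrates all its energy in the DC coefficient:
coeffs = dct2([1.0, 1.0, 1.0, 1.0])
```

Keeping only low-frequency coefficients discards the fine detail that re-encoding and resolution changes corrupt, which is what makes the feature robust across video formats.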

Book ChapterDOI
Juan Cao1, Sheng Tang1, Jintao Li1, Yongdong Zhang1, Xuefeng Pan1 
11 Dec 2007
TL;DR: This paper uses lexicon-guided semantic clustering to effectively remove the noise introduced by news video's additional content, and uses cluster-based LSI to automatically mine the semantic structure underlying the term expression.
Abstract: Many researchers try to utilize the semantic information extracted from visual features to directly realize semantic video retrieval or to supplement automated speech recognition (ASR) text retrieval. But bridging the gap between low-level visual features and semantic content is still a challenging task. In this paper, we study how to effectively use Latent Semantic Indexing (LSI) to improve semantic video retrieval through the ASR texts. The basic LSI method has been shown effective in both traditional text retrieval and noisy ASR text retrieval. In this paper, we further use lexicon-guided semantic clustering to effectively remove the noise introduced by news video's additional content, and use cluster-based LSI to automatically mine the semantic structure underlying the term expression. Tests on the TRECVID 2005 dataset show that the above two enhancements achieve 21.3% and 6.9% improvements in performance over the traditional vector space model (VSM) and the basic LSI respectively.
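The LSI step itself is a truncated SVD of a term-document matrix; the paper applies it per semantic cluster rather than globally. A minimal sketch of the document projection, with an invented toy count matrix:

```python
import numpy as np

def lsi_doc_vectors(term_doc, k=2):
    """Project each document (column) into a k-dimensional latent
    space via truncated SVD: docs_k = (S_k V_k^T)^T."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    return (np.diag(s[:k]) @ Vt[:k]).T   # one k-dim vector per document

# 5 terms x 4 documents (toy term counts):
A = np.array([[2., 0., 1., 0.],
              [1., 0., 2., 0.],
              [0., 3., 0., 1.],
              [0., 2., 0., 2.],
              [1., 1., 1., 1.]])
doc_vecs = lsi_doc_vectors(A, k=2)
```

Running this decomposition inside each semantic cluster, rather than over the whole collection, keeps each latent space from being dominated by the off-topic terms that clustering was meant to separate out.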

DOI
An-An Liu1, Sheng Tang1, Yongdong Zhang1, Jintao Li1, Zhaoxuan Yang1 
30 May 2007
TL;DR: Face detection and audio classification are implemented to detect the "face" and "speech" concepts for each shot; by integrating the audiovisual information, the "interview" concept is finally detected.
Abstract: According to the concepts of the Large-Scale Concept Ontology for Multimedia (LSCOM) and the requirements of the 4th task in TRECVID 2006, i.e., rushes exploitation, the "interview" concept is important for rushes content analysis. This paper presents a shot-level "interview" concept detection method. Face detection and audio classification are implemented to detect the "face" and "speech" concepts for each shot. By integrating the audiovisual information, the "interview" concept is finally detected. The method will clearly benefit video editing. Large-scale experimental results demonstrate the accuracy and effectiveness of the proposed method.
