Author

Shouxun Lin

Bio: Shouxun Lin is an academic researcher from the Chinese Academy of Sciences. The author has contributed to research in the topics Object detection and TRECVID. The author has an h-index of 14 and has co-authored 70 publications receiving 735 citations.

Papers
Journal Article
Wu Yuan, Shouxun Lin, Yongdong Zhang, Wen Yuan, Haiyong Luo
TL;DR: A novel adaptive coding characteristics prediction scheme is presented to improve the accuracy of R-D modeling by exploiting spatio-temporal correlations, and a simple closed-form solution to the problem of optimum bit allocation is derived.
Abstract: For the rate control of H.264/AVC, one of the most important things is to get the statistics of the current frame accurately. To achieve this, a novel adaptive coding characteristics prediction scheme is presented to improve the accuracy of R-D modeling by exploiting spatio-temporal correlations. With the proposed prediction scheme, we present a novel rate function and a linear distortion model, and then derive a simple closed-form solution to the problem of optimum bit allocation in a TMN-8-like way. Extensive experiments show that the proposed scheme achieves gains of up to 0.92 dB per frame over JVT-G012, the current standardized rate control scheme, for a variety of test sequences with less demanding bandwidth.

86 citations

Journal Article
TL;DR: Wang et al. present a system for automatically detecting and analyzing complex player actions in moving-background sports video sequences, aimed at action-based sports video indexing and at providing kinematic measurements for coach assistance and performance improvement.
Abstract: This paper presents a system for automatically detecting and analyzing complex player actions in moving-background sports video sequences, aimed at action-based sports video indexing and at providing kinematic measurements for coach assistance and performance improvement. The system works in a coarse-to-fine fashion. For an input video, at the coarse granularity level, we automatically segment the highlights, that is, the video clips containing the desired action, as summaries for general user viewing purposes; at the middle granularity level, we recognize the action types to support action-based video indexing and retrieval; and finally, at the fine granularity level, the critical kinematic parameters of the player action are obtained for sports professionals' training purposes. However, the complex and dynamic background of sports videos and the complexity of player actions bring considerable difficulty to the automatic analysis. To fulfill such a challenging task, robust algorithms including global motion estimation with adaptive outlier filtering, object segmentation based on adaptive background construction, and automatic human body tracking are proposed in this paper. Two visual analysis tools, motion panorama and overlay composition, are also introduced. Real diving and jump game videos are used to test the proposed system and algorithms, and the extensive and encouraging experimental results show their effectiveness.

62 citations

Proceedings Article
10 Oct 2004
TL;DR: An efficient block size mode selection algorithm for variable-size block matching (VSBM) in MPEG-2 to H.264 transcoding is presented; the whole transcoding time is reduced by 22% on average while the bit rate is slightly increased.
Abstract: In this paper, an efficient block size mode selection algorithm for variable-size block matching (VSBM) in MPEG-2 to H.264 transcoding is presented. By leveraging the available motion information carried by the MPEG-2 bit-streams, the proposed algorithm determines which one of the 16x16, 16x8, 8x16, and 8x8 block size modes should be used for each macroblock (MB). The simulation results show that the performance of the proposed algorithm is close to that of a cascaded pixel-domain transcoder (CPDT) when all seven block size modes are enabled and an exhaustive full search is used to determine the best block size modes. The whole transcoding time is reduced by 22% on average while the bit rate is slightly increased (2.9%).

46 citations

Proceedings Article
Yan Song, An-An Liu, Lin Pang, Shouxun Lin, Yongdong Zhang, Sheng Tang
14 May 2008
TL;DR: A coarse-to-fine text location method is implemented, a multi-scale approach is adopted to locate texts with different font sizes, and color-based k-means clustering is adopted in text segmentation.
Abstract: Texts in web pages, images and videos contain important clues for information indexing and retrieval. Most existing text extraction methods depend on the language type and text appearance. In this paper, a novel and universal method of image text extraction is proposed. A coarse-to-fine text location method is implemented. Firstly, a multi-scale approach is adopted to locate texts with different font sizes. Secondly, projection profiles are used in the location refinement step. Color-based k-means clustering is adopted in text segmentation. Compared to the grayscale images used in most existing methods, color images are more suitable for segmentation based on clustering. The method treats corner-points, edge-points and other points equally, so it solves the problem of handling multilingual text. Experimental results show that the best performance is obtained when k is 3. Comparative experimental results on a large number of images show that our method is accurate and robust in various conditions.

44 citations
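
The color-based k-means segmentation step mentioned in the abstract above can be illustrated with a minimal sketch. This is plain Lloyd's k-means over RGB triples with k = 3, not the authors' implementation: the function names (`sq_dist`, `init_centers`, `kmeans_colors`), the deterministic farthest-point initialization, and the toy pixel values are all assumptions made for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two RGB triples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centers(pixels, k):
    """Farthest-point initialization (an assumption, chosen to be deterministic):
    start from the first pixel, then repeatedly add the pixel that is
    farthest from all centers chosen so far."""
    centers = [pixels[0]]
    while len(centers) < k:
        centers.append(max(pixels, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers


def kmeans_colors(pixels, k=3, iters=20):
    """Cluster RGB pixels into k color groups; returns (labels, centers)."""
    centers = init_centers(pixels, k)
    labels = []
    for _ in range(iters):
        # Assignment step: each pixel goes to its nearest color center.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in pixels]
        if new_labels == labels:
            break  # assignments are stable, so the centers will not move either
        labels = new_labels
        # Update step: each center becomes the mean color of its members.
        for c in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(ch) / len(members) for ch in zip(*members))
    return labels, centers


# Toy "text region": dark strokes, a bright background, and a red outline.
pixels = [(0, 0, 0), (2, 2, 2), (255, 255, 255), (253, 253, 253), (200, 0, 0), (198, 2, 2)]
labels, centers = kmeans_colors(pixels, k=3)
```

In a text segmentation setting, one of the resulting color clusters would then be selected as the text layer; how that selection is made is not stated in the abstract and is left out here.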

Journal Article
TL;DR: A localized multiple kernel learning (L-MKL) algorithm is proposed for realistic human action recognition; it develops a locality gating model to partition the input space of heterogeneous representations into a set of localities of simpler data structure.
Abstract: Realistic human action recognition in videos has been a useful yet challenging task. Video shots of the same actions may present huge intra-class variations in terms of visual appearance, kinetic patterns, video shooting, and editing styles. Heterogeneous feature representations of videos pose another challenge: how to effectively handle the redundancy, complementariness and disagreement in these features. This paper proposes a localized multiple kernel learning (L-MKL) algorithm to tackle the issues above. L-MKL integrates localized classifier ensemble learning and multiple kernel learning in a unified framework to leverage the strengths of both. The basis of L-MKL is to build multiple kernel classifiers on diverse features at subspace localities of heterogeneous representations. L-MKL integrates the discriminability of complementary features locally and enables localized MKL classifiers to deliver better performance in their own regions of expertise. Specifically, L-MKL develops a locality gating model to partition the input space of heterogeneous representations into a set of localities of simpler data structure. Each locality then learns its localized optimal combination of Mercer kernels of heterogeneous features. Finally, the gating model coordinates the localized multiple kernel classifiers globally to perform action recognition. Experiments on two datasets show that the proposed approach delivers promising performance.

37 citations
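
The idea of a gating model that assigns inputs to localities, each with its own kernel combination, can be sketched as follows. This is only one plausible reading of the abstract above, not the paper's formulation: the softmax gate over linear scores, the combined kernel form (sum over localities m of g_m(x) g_m(z) times a weighted sum of base kernels), and every name and parameter value here are assumptions made for illustration.

```python
import math


def rbf_kernel(gamma):
    """Return an RBF base kernel with the given width parameter."""
    def k(a, b):
        return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))
    return k


def gate(x, gate_params):
    """Softmax over linear scores: soft membership of x in each locality.
    gate_params is a list of (weight_vector, bias) pairs, one per locality."""
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) + b for w, b in gate_params]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def localized_kernel(x, z, kernels, betas, gate_params):
    """Gated combination of base kernels.
    betas[m][j] is the (non-negative) weight of base kernel j in locality m."""
    gx, gz = gate(x, gate_params), gate(z, gate_params)
    base = [k(x, z) for k in kernels]
    return sum(
        gx[m] * gz[m] * sum(b * kv for b, kv in zip(betas[m], base))
        for m in range(len(betas))
    )


# Hypothetical setup: two localities, two base kernels on 2-D features.
kernels = [rbf_kernel(0.5), rbf_kernel(2.0)]
betas = [[0.8, 0.2], [0.3, 0.7]]
gate_params = [((1.0, 0.0), 0.0), ((-1.0, 0.0), 0.0)]
value = localized_kernel((1.0, 0.5), (0.8, 0.4), kernels, betas, gate_params)
```

With non-negative weights the combined function stays a valid Mercer kernel, so it could be plugged into a standard kernel classifier; how the gate and the kernel weights are actually learned is not described in the abstract and is omitted here.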


Cited by
Journal Article
TL;DR: This review provides a fundamental comparison and analysis of the remaining problems in the field, summarizes the fundamental problems, and enumerates factors that should be considered when addressing them.
Abstract: This paper analyzes, compares, and contrasts technical challenges, methods, and the performance of text detection and recognition research in color imagery. It summarizes the fundamental problems and enumerates factors that should be considered when addressing these problems. Existing techniques are categorized as either stepwise or integrated, and sub-problems are highlighted, including text localization, verification, segmentation and recognition. Special issues associated with the enhancement of degraded text and the processing of video text and of multi-oriented, perspectively distorted and multilingual text are also addressed. The categories and sub-categories of text are illustrated, benchmark datasets are enumerated, and the performance of the most representative approaches is compared. This review provides a fundamental comparison and analysis of the remaining problems in the field.

709 citations

Journal Article
01 Nov 2011
TL;DR: Methods for video structure analysis, including shot boundary detection, key frame extraction and scene segmentation, extraction of features including static key frame features, object features and motion features, video data mining, video annotation, and video retrieval including query interfaces are analyzed.
Abstract: Video indexing and retrieval have a wide spectrum of promising applications, motivating the interest of researchers worldwide. This paper offers a tutorial and an overview of the landscape of general strategies in visual content-based video indexing and retrieval, focusing on methods for video structure analysis, including shot boundary detection, key frame extraction and scene segmentation, extraction of features including static key frame features, object features and motion features, video data mining, video annotation, video retrieval including query interfaces, similarity measure and relevance feedback, and video browsing. Finally, we analyze future research directions.

606 citations

Book
26 May 2009
TL;DR: This paper presents a component-wise decomposition of such an interdisciplinary multimedia system, covering influences from information retrieval, computer vision, machine learning, and human–computer interaction and lays down the anatomy of a concept-based video search engine.
Abstract: In this paper, we review 300 references on video retrieval, indicating when text-only solutions are unsatisfactory and showing the promising alternatives, the majority of which are concept-based. Therefore, central to our discussion is the notion of a semantic concept: an objective linguistic description of an observable entity. Specifically, we present our view on how its automated detection, selection under uncertainty, and interactive usage might solve the major scientific problem for video retrieval: the semantic gap. To bridge the gap, we lay down the anatomy of a concept-based video search engine. We present a component-wise decomposition of such an interdisciplinary multimedia system, covering influences from information retrieval, computer vision, machine learning, and human–computer interaction. For each of the components we review state-of-the-art solutions in the literature, each having different characteristics and merits. Because of these differences, we cannot understand the progress in video retrieval without serious evaluation efforts such as those carried out in the NIST TRECVID benchmark. We discuss its data, tasks, results, and the many derived community initiatives in creating annotations and baselines for repeatable experiments. We conclude with our perspective on future challenges and opportunities.

416 citations

Journal Article
01 Jan 2010
TL;DR: Recent developments in three major issues involved in a general human motion analysis system, namely human detection, view-invariant pose representation and estimation, and behavior understanding, are presented.
Abstract: As the viewpoint issue is becoming a bottleneck for human motion analysis and its applications, researchers have in recent years devoted themselves to view-invariant human motion analysis and have achieved inspiring progress. The challenge here is to find a methodology that can recognize human motion patterns to reach increasingly sophisticated levels of human behavior description. This paper provides a comprehensive survey of this significant research, with the emphasis on view-invariant representation and on recognition of poses and actions. In order to help readers understand the integrated process of visual analysis of human motion, this paper presents recent developments in three major issues involved in a general human motion analysis system, namely human detection, view-invariant pose representation and estimation, and behavior understanding. Publicly available standard datasets are recommended. The concluding discussion assesses the progress so far and outlines some research challenges, future directions, and solutions essential to achieving the goals of human motion analysis.

237 citations

Proceedings Article
20 Jun 2017
TL;DR: The parameters and characteristics of a dataset for omnidirectional video are proposed and exemplarily instantiated to evaluate various aspects of such an ecosystem, namely bitrate overhead, bandwidth requirements, and quality aspects in terms of viewport PSNR.
Abstract: Real-time entertainment services such as streaming audiovisual content deployed over the open, unmanaged Internet account now for more than 70% during peak periods. More and more such bandwidth-hungry applications and services are being proposed, including immersive media services such as virtual reality and, specifically, omnidirectional/360-degree video. The adaptive streaming of omnidirectional video over HTTP imposes an important challenge on today's video delivery infrastructures, which calls for dedicated, thoroughly designed techniques for content generation, delivery, and consumption. This paper describes the usage of tiles (as specified within modern video codecs such as HEVC/H.265 and VP9) enabling bandwidth-efficient adaptive streaming of omnidirectional video over HTTP, and we define various streaming strategies. Therefore, the parameters and characteristics of a dataset for omnidirectional video are proposed and exemplarily instantiated to evaluate various aspects of such an ecosystem, namely bitrate overhead, bandwidth requirements, and quality aspects in terms of viewport PSNR. The results indicate bitrate savings from 40% (in a realistic scenario with recorded head movements from real users) up to 65% (in an ideal scenario with a centered/fixed viewport) and serve as a baseline and guidelines for advanced techniques, including the outline of a research roadmap for the near future.

194 citations