
Showing papers by "Junsong Yuan published in 2015"


Proceedings Article•DOI•
07 Jun 2015
TL;DR: Experimental results on two challenging datasets, MSRII and UCF101, validate the superior performance of the proposed action proposals, as well as competitive results on action detection and search.
Abstract: In this paper we target the generation of generic action proposals in unconstrained videos. Each action proposal corresponds to a temporal series of spatial bounding boxes, i.e., a spatio-temporal video tube, which has good potential to localize one human action. Assuming each action is performed by a human with meaningful motion, both appearance and motion cues are utilized to measure the actionness of the video tubes. After picking those spatio-temporal paths with high actionness scores, our action proposal generation is formulated as a maximum set coverage problem, where greedy search is performed to select a set of action proposals that maximizes the overall actionness score. Compared with existing action proposal approaches, our action proposals do not rely on video segmentation and can be generated in nearly real time. Experimental results on two challenging datasets, MSRII and UCF101, validate the superior performance of our action proposals as well as competitive results on action detection and search.
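
To make the selection step concrete, here is a minimal sketch of greedy selection for the maximum set coverage formulation; the actionness scores and voxel coverage sets are illustrative stand-ins for the paper's exact definitions (for the unweighted maximum coverage objective, the greedy rule carries the classic (1 − 1/e) guarantee).

```python
def greedy_action_proposals(scores, coverages, k):
    """Greedy maximum set coverage over candidate spatio-temporal paths.

    scores[i]    : actionness score of candidate path i
    coverages[i] : set of video voxels covered by candidate i
    Returns indices of up to k selected proposals.
    """
    selected, covered = [], set()
    for _ in range(k):
        best, best_gain = None, 0.0
        for i in range(len(scores)):
            if i in selected:
                continue
            # marginal gain: actionness weighted by newly covered volume
            gain = scores[i] * len(coverages[i] - covered)
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:          # no remaining candidate adds coverage
            break
        selected.append(best)
        covered |= coverages[best]
    return selected
```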

258 citations


Journal Article•DOI•
TL;DR: An incremental Maximal-Conditional-Mutual-Information scheme for LBP structure learning that handles pixel correlation demonstrates superior performance over state-of-the-art results on classifying both spatial patterns, such as texture classification, scene recognition, and face recognition, and spatial-temporal patterns, such as dynamic texture recognition.

72 citations


Journal Article•DOI•
TL;DR: Qualitative and quantitative evaluations on the benchmark data set containing 51 challenging image sequences demonstrate that the proposed algorithm outperforms the state-of-the-art methods.
Abstract: The appearance of an object can change continuously during tracking, and is therefore not independent and identically distributed. A good discriminative tracker often needs a large number of training samples to fit the underlying data distribution, which is impractical for visual tracking. In this paper, we present a new discriminative tracker via landmark-based label propagation (LLP) that is nonparametric and makes no specific assumption about the sample distribution. With an undirected graph representation of samples, the LLP locally approximates the soft label of each sample by a linear combination of labels on its nearby landmarks. It is able to effectively propagate a limited number of initial labels to a large number of unlabeled samples. To this end, we introduce a local landmark approximation method to compute the cross-similarity matrix between the whole data and landmarks. Moreover, a soft label prediction function incorporating the graph Laplacian regularizer is used to diffuse the known labels to all the unlabeled vertices in the graph, which explicitly considers the local geometrical structure of all samples. Tracking is then carried out within a Bayesian inference framework, where the soft label prediction value is used to construct the observation model. Both qualitative and quantitative evaluations on the benchmark data set containing 51 challenging image sequences demonstrate that the proposed algorithm outperforms the state-of-the-art methods.
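
As a rough illustration of landmark-based propagation, the sketch below builds a landmark cross-similarity matrix and diffuses soft labels through the resulting anchor graph. The Gaussian kernel, bandwidth heuristic, and anchor-graph construction are assumptions made for the sketch, not the paper's exact formulation.

```python
import numpy as np

def landmark_label_propagation(X, landmarks, Y, n_neighbors=5, alpha=0.99):
    """Minimal landmark-based label propagation sketch.

    X         : (n, d) all samples;  landmarks : (m, d) landmark points
    Y         : (n, c) initial soft labels (zero rows for unlabeled samples)
    Returns diffused soft labels F for all samples.
    """
    # cross-similarity Z: each sample as a convex combination of its
    # nearest landmarks (a simple Gaussian-kernel stand-in)
    d2 = ((X[:, None, :] - landmarks[None, :, :]) ** 2).sum(-1)
    Z = np.exp(-d2 / d2.mean())
    # keep only the n_neighbors closest landmarks per sample
    far = np.argsort(d2, axis=1)[:, n_neighbors:]
    np.put_along_axis(Z, far, 0.0, axis=1)
    Z /= Z.sum(axis=1, keepdims=True)                 # row-stochastic
    # low-rank graph affinity S = Z diag(Z^T 1)^-1 Z^T; labels diffuse
    # by solving the Laplacian-regularized system (I - alpha*S) F = Y
    D = np.maximum(Z.sum(axis=0), 1e-12)
    S = Z @ np.diag(1.0 / D) @ Z.T
    return np.linalg.solve(np.eye(len(X)) - alpha * S, Y)
```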

44 citations


Proceedings Article•DOI•
13 Oct 2015
TL;DR: This demo shows the possibility of interacting with 3D content using bare hands on wearable devices via two Augmented Reality applications: virtual teapot manipulation and fountain animation in hand.
Abstract: Wearable devices such as the Microsoft HoloLens and Google Glass have become highly popular in recent years. As traditional input hardware is difficult to use on such platforms, vision-based hand pose tracking and gesture control techniques are more suitable alternatives. This demo shows the possibility of interacting with 3D content using bare hands on wearable devices via two Augmented Reality applications: virtual teapot manipulation and fountain animation in hand. Technically, we use a head-mounted depth camera to capture RGB-D images from an egocentric view, and adopt a random forest to regress the palm pose and classify the hand gesture simultaneously via a spatial-voting framework. The predicted pose and gesture are used to render the 3D virtual objects, which are overlaid onto the hand region in the input RGB images using camera calibration parameters for seamless virtual and real scene synthesis.

43 citations


Journal Article•DOI•
TL;DR: This paper proposes a novel sparse representation method of SPD matrices in the data-dependent manifold kernel space, and designs two different positive definite kernel functions that can be readily transformed to the corresponding manifold kernels.
Abstract: The symmetric positive-definite (SPD) matrix, as a connected Riemannian manifold, has become increasingly popular for encoding image information. Most existing sparse models are still primarily developed in the Euclidean space. They do not consider the non-linear geometrical structure of the data space, and thus are not directly applicable to the Riemannian manifold. In this paper, we propose a novel sparse representation method of SPD matrices in the data-dependent manifold kernel space. The graph Laplacian is incorporated into the kernel space to better reflect the underlying geometry of SPD matrices. Under the proposed framework, we design two different positive definite kernel functions that can be readily transformed to the corresponding manifold kernels. The sparse representation obtained has more discriminating power. Extensive experimental results demonstrate good performance of manifold kernel sparse codes in image classification, face recognition, and visual tracking.
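
For intuition, one standard positive-definite kernel on SPD matrices is the Gaussian kernel under the log-Euclidean metric, sketched below; the paper's data-dependent kernels additionally fold in a graph Laplacian term, which is omitted here.

```python
import numpy as np

def spd_log(S):
    """Matrix logarithm of an SPD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.log(w)) @ V.T

def log_euclidean_kernel(spd_list, gamma=1.0):
    """k(X, Y) = exp(-gamma * ||log(X) - log(Y)||_F^2): a Gaussian
    kernel under the log-Euclidean metric, positive definite because
    the log map embeds SPD matrices into a Euclidean space.
    """
    logs = [spd_log(S) for S in spd_list]   # map to the tangent space
    n = len(logs)
    K = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            d2 = np.sum((logs[i] - logs[j]) ** 2)
            K[i, j] = K[j, i] = np.exp(-gamma * d2)
    return K
```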

33 citations


Journal Article•DOI•
TL;DR: A novel method is presented to predict the 3-D joint positions from depth images and the parsed hand parts obtained with a pretrained classifier, showing that the regressor learned on a synthesized dataset also gives accurate predictions on real-world depth images by enforcing the hand part correlations despite the discrepancies between the two domains.
Abstract: The positions of the hand joints are important high-level features for hand-based human-computer interaction. We present a novel method to predict the 3-D joint positions from the depth images and the parsed hand parts obtained with a pretrained classifier. The hand parts are utilized as an additional cue to resolve the multimodal predictions produced by the previous regression-based method without significantly increasing the computational cost. In addition, we further enforce hand motion constraints to fuse the per-pixel prediction results. The posterior distribution of the joints is formulated as a weighted product-of-experts model based on the individual pixel predictions, which is maximized via the expectation-maximization algorithm on a learned low-dimensional space of the hand joint parameters. The experimental results show that the proposed method improves the prediction accuracy considerably compared with rival methods that also regress the joint locations from depth images. In particular, we show that the regressor learned on a synthesized dataset also gives accurate predictions on real-world depth images by enforcing the hand part correlations, despite the discrepancies between the two domains.

31 citations


Journal Article•DOI•
TL;DR: Propagative generalized Hough voting (HV) is proposed to propagate the label and spatio-temporal configuration information of local features via HV, addressing the case where insufficient training data are provided.
Abstract: Generalized Hough voting (HV) has shown promising results in both object and action detection. However, most existing HV methods suffer when insufficient training data are provided. We propose propagative HV to address this limitation and apply it to human activity analysis. Instead of training a discriminative classifier for local feature voting, we match individual local features to propagate their label and spatio-temporal configuration information via HV. To enable fast local feature matching, we index the local features using random projection trees (RPTs). RPTs can reveal the low-dimensional manifold structure to provide adaptive local feature matching. Moreover, as the RPT index can be built on either labeled or unlabeled datasets, it can be applied to different tasks, such as activity search (limited training) and recognition (sufficient training). The superior performance on benchmark datasets validates that our propagative HV can outperform state-of-the-art techniques in various activity analysis tasks, such as activity search, recognition, and prediction.
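
A random projection tree of the kind used for indexing can be sketched compactly; the median split and leaf size below are common choices, not necessarily those of the paper.

```python
import numpy as np

class RPTree:
    """Minimal random projection tree for approximate feature matching.
    At each node, points are split at the median of their projection
    onto a random direction; a query descends to a leaf of candidates.
    """
    def __init__(self, X, leaf_size=16, rng=None):
        self.rng = rng if rng is not None else np.random.default_rng(0)
        self.X = X
        self.root = self._build(np.arange(len(X)), leaf_size)

    def _build(self, idx, leaf_size):
        if len(idx) <= leaf_size:
            return ("leaf", idx)
        w = self.rng.normal(size=self.X.shape[1])      # random direction
        proj = self.X[idx] @ w
        t = np.median(proj)
        left, right = idx[proj <= t], idx[proj > t]
        if len(left) == 0 or len(right) == 0:          # degenerate split
            return ("leaf", idx)
        return ("node", w, t,
                self._build(left, leaf_size), self._build(right, leaf_size))

    def query(self, q):
        """Return the index of the nearest candidate in q's leaf."""
        node = self.root
        while node[0] == "node":
            _, w, t, left, right = node
            node = left if q @ w <= t else right
        cand = node[1]
        return cand[np.argmin(((self.X[cand] - q) ** 2).sum(-1))]
```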

30 citations


Journal Article•DOI•
TL;DR: A chi-squared transformation (CST) is proposed to transform the LBP feature into one that better fits a Gaussian distribution, which leads to the formulation of a two-class classification problem.
Abstract: Local binary pattern (LBP) and its variants have been widely used in many recognition tasks. Subspace approaches are often applied to the LBP feature in order to remove unreliable dimensions or to derive a compact feature representation. It is well known that subspace approaches utilizing up to second-order statistics are optimal only when the underlying distribution is Gaussian. However, due to its nonnegativity and simplex constraints, the LBP feature deviates significantly from a Gaussian distribution. To alleviate this problem, we propose a chi-squared transformation (CST) to transform the LBP feature into one that better fits a Gaussian distribution. The proposed CST leads to the formulation of a two-class classification problem. Due to its asymmetric nature, we apply asymmetric principal component analysis (APCA) to better remove the unreliable dimensions in the CST feature space. The proposed CST-APCA is evaluated extensively on spatial LBP for face recognition and protein cellular classification, and on spatial-temporal LBP for dynamic texture recognition. All experiments show that the proposed feature transformation significantly enhances the recognition accuracy.
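
The paper's exact CST is not reproduced here, but a closely related and widely used trick is sketched below: an element-wise square root maps simplex-constrained histograms so that the squared Euclidean distance tracks the chi-squared distance within a factor of two.

```python
import numpy as np

def cst_sqrt(h, eps=1e-12):
    """Square-root (Hellinger-style) map of an LBP histogram: after the
    transform, squared Euclidean distance between two histograms
    approximates their chi-squared distance within a factor of two.
    This is a stand-in for the paper's CST, not its exact form.
    """
    h = np.asarray(h, dtype=float)
    h = h / (h.sum() + eps)          # enforce the simplex constraint
    return np.sqrt(h)
```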

29 citations


Journal Article•DOI•
TL;DR: A topic model is proposed that incorporates a word co-occurrence prior for efficient discovery of topical video objects from a set of key frames, and can discover different types of topical objects despite variations in scale, viewpoint, color and lighting changes, or even partial occlusions.
Abstract: A topical video object refers to an object that is frequently highlighted in a video. It could be, e.g., the product logo or the leading actor/actress in a TV commercial. We propose a topic model that incorporates a word co-occurrence prior for efficient discovery of topical video objects from a set of key frames. Previous work using topic models, such as latent Dirichlet allocation (LDA), for video object discovery often takes a bag-of-visual-words representation, which ignores important co-occurrence information among the local features. We show that such data-driven co-occurrence information from the bottom up can conveniently be incorporated into LDA with a Gaussian Markov prior, which combines top-down probabilistic topic modeling with bottom-up priors in a unified model. Our experiments on challenging videos demonstrate that the proposed approach can discover different types of topical objects despite variations in scale, viewpoint, color and lighting changes, or even partial occlusions. The efficacy of the co-occurrence prior is clearly demonstrated when compared with topic models without such priors.

26 citations


Journal Article•DOI•
TL;DR: A randomized approach to deriving spatial context, in the form of spatial random partition, is proposed, which offers three benefits: the aggregation of matching scores over multiple random patches provides robust local matching; the matched objects can be directly identified on the pixelwise confidence map, which results in efficient object localization; and the algorithm lends itself to easy parallelization while allowing a flexible tradeoff between accuracy and speed.
Abstract: Searching visual objects in large image or video data sets is a challenging problem, because it requires efficient matching and accurate localization of query objects that often occupy a small part of an image. Although spatial context has been shown to help produce more reliable detection than methods that match local features individually, how to extract appropriate spatial context remains an open problem. Instead of using fixed-scale spatial context, we propose a randomized approach to deriving spatial context, in the form of spatial random partition. The effect of spatial context is achieved by averaging the matching scores over multiple random patches. Our approach offers three benefits: 1) the aggregation of the matching scores over multiple random patches provides robust local matching; 2) the matched objects can be directly identified on the pixelwise confidence map, which results in efficient object localization; and 3) our algorithm lends itself to easy parallelization and also allows a flexible tradeoff between accuracy and speed through adjusting the number of partition times. Both theoretical studies and experimental comparisons with the state-of-the-art methods validate the advantages of our approach.
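
The core aggregation idea can be sketched as follows; the grid-based partitioning and the match_score callback are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def random_partition_confidence(match_score, H, W, n_partitions=20,
                                grid=(4, 4), rng=None):
    """Pixel-wise confidence via spatial random partition (sketch).

    match_score : function(y0, y1, x0, x1) -> scalar matching score of
                  the query against the image patch [y0:y1, x0:x1]
    The image is randomly partitioned into a grid several times; each
    patch's score is spread over its pixels, and the per-pixel scores
    are averaged over partitions to form the confidence map.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    conf = np.zeros((H, W))
    for _ in range(n_partitions):
        # random grid lines define one partition of the image
        ys = [0, *np.sort(rng.integers(1, H, grid[0] - 1)), H]
        xs = [0, *np.sort(rng.integers(1, W, grid[1] - 1)), W]
        for y0, y1 in zip(ys[:-1], ys[1:]):
            for x0, x1 in zip(xs[:-1], xs[1:]):
                if y1 > y0 and x1 > x0:
                    conf[y0:y1, x0:x1] += match_score(y0, y1, x0, x1)
    return conf / n_partitions
```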

24 citations


Proceedings Article•DOI•
01 Jun 2015
TL;DR: This paper investigates the problem of minimizing the total inter-server traffic among a cluster of OSN servers through joint partitioning and replication optimization and proposes a Traffic-Optimized Partitioning and Replication (TOPR) method based on an analysis of how replica allocation affects the inter- server communication.
Abstract: Distributed storage systems are the key infrastructures for hosting the user data of large-scale Online Social Networks (OSNs). The amount of inter-server communication is an important scalability indicator for these systems. Data partitioning and replication are two inter-related issues affecting the inter-server traffic caused by user-initiated read and write operations. This paper investigates the problem of minimizing the total inter-server traffic among a cluster of OSN servers through joint partitioning and replication optimization. We propose a Traffic-Optimized Partitioning and Replication (TOPR) method based on an analysis of how replica allocation affects the inter-server communication. Lightweight algorithms are developed to adjust partitioning and replication dynamically according to data read and write rates. Evaluations with real Facebook and Twitter social graphs show that TOPR significantly reduces the inter-server communication compared with state-of-the-art methods.
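
The read/write trade-off that drives replica allocation can be illustrated with a toy rule: replicate a user's data on a remote server only when the read traffic saved exceeds the write traffic added. This is the intuition only, not TOPR's actual algorithm.

```python
from collections import defaultdict

def decide_replication(read_rate, write_rate, home):
    """Illustrative read/write trade-off behind replica allocation.

    read_rate[(u, v)] : rate at which user u reads user v's data
    write_rate[v]     : rate at which v's master copy is updated
    home[x]           : server hosting user x's master replica
    A replica of v on server s saves one inter-server read for every
    read of v issued from s, but costs one inter-server write for every
    update of v; replicate when the saving exceeds the cost.
    """
    reads_from = defaultdict(float)        # (server, v) -> read traffic
    for (u, v), r in read_rate.items():
        if home[u] != home[v]:
            reads_from[(home[u], v)] += r
    return {(s, v) for (s, v), r in reads_from.items()
            if r > write_rate[v]}
```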

Proceedings Article•DOI•
10 Dec 2015
TL;DR: A Group Saliency Propagation model is proposed in which a single group saliency map is developed and propagated to segment the entire group, with the added advantage of speed-up.
Abstract: Most existing co-segmentation methods are usually complex and require pre-grouping of images, fine-tuning of a few parameters, initial segmentation masks, etc. These limitations become serious concerns for their application to large-scale datasets. In this paper, a Group Saliency Propagation (GSP) model is proposed in which a single group saliency map is developed, which can be propagated to segment the entire group. In addition, it is also shown how a pool of these group saliency maps can help in quickly segmenting new input images. Experiments demonstrate that the proposed method achieves competitive performance on several benchmark co-segmentation datasets, including ImageNet, with the added advantage of speed-up.

Journal Article•DOI•
TL;DR: Two enhanced NRLBPs are proposed that jointly utilize the sign and the magnitude of the current pixel difference, as well as the information of other LBP bits, and demonstrate superior performance compared with NRLBP and other LBP variants.
Abstract: Local binary pattern (LBP) is sensitive to image noise. Noise-resistant LBP (NRLBP) improves the robustness to noise by incorporating the prior knowledge of images and the information of other LBP bits into the encoding process. However, it encodes the small pixel difference in such a way that its sign and magnitude are ignored. Although the small pixel difference may be easily distorted by noise, some of its information is still useful for LBP encoding. In this letter, we propose two enhanced NRLBPs that jointly utilize the sign and the magnitude of the current pixel difference, and also the information of other LBP bits. The proposed approaches are validated on two benchmark databases and demonstrate a superior performance compared with NRLBP and other LBP variants. The performance gain is significant when the noise level is high.
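
For context, a basic LBP encoder and a noise-resistant variant that expands uncertain bits can be sketched as below; the uniform-pattern filtering step of NRLBP and the paper's enhanced encodings are not reproduced.

```python
import numpy as np
from itertools import product

# clockwise 8-neighbour offsets of the centre of a 3x3 patch
ROWS = [0, 0, 0, 1, 2, 2, 2, 1]
COLS = [0, 1, 2, 2, 2, 1, 0, 0]

def lbp_code(patch):
    """Basic LBP: threshold the 8 neighbours against the centre pixel."""
    bits = (patch[ROWS, COLS] > patch[1, 1]).astype(int)
    return int("".join(map(str, bits)), 2)

def nrlbp_codes(patch, tau=2.0):
    """Noise-resistant flavour (sketch): neighbours whose difference
    from the centre falls within [-tau, tau] are treated as uncertain
    bits and expanded to both 0 and 1, giving a set of candidate codes.
    """
    diff = patch[ROWS, COLS].astype(float) - patch[1, 1]
    choices = [(1,) if d > tau else (0,) if d < -tau else (0, 1)
               for d in diff]
    return {int("".join(map(str, bits)), 2) for bits in product(*choices)}
```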

Proceedings Article•DOI•
01 Dec 2015
TL;DR: The approach here is to iteratively update the saliency maps through co-saliency estimation depending upon quality scores, which indicate the degree of separation of foreground and background likelihoods (the easier the separation, the higher the quality of saliency map).
Abstract: Despite recent advances in the joint processing of images, it may sometimes be less effective than single-image processing for object discovery problems. In this paper, while aiming at common object detection, we address this problem by proposing QCCE, a novel Quality Constrained Co-saliency Estimation method. The approach is to iteratively update the saliency maps through co-saliency estimation depending upon quality scores, which indicate the degree of separation of foreground and background likelihoods (the easier the separation, the higher the quality of the saliency map). In this way, joint processing is automatically constrained by the quality of the saliency maps. Moreover, the proposed method can be applied to both unsupervised and supervised scenarios, unlike other methods that are designed for one scenario only. Experimental results demonstrate the superior performance of the proposed method compared to state-of-the-art methods.
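
One plausible way to score how well a saliency map separates foreground from background is the histogram-overlap measure sketched below; it is a stand-in for the paper's quality score, not its definition.

```python
import numpy as np

def saliency_quality(saliency, threshold=0.5, bins=32):
    """Illustrative quality score: 1 minus the histogram overlap of the
    foreground and background saliency values, so an easily separable
    map scores high.
    """
    s = saliency.ravel()
    fg, bg = s[s >= threshold], s[s < threshold]
    if fg.size == 0 or bg.size == 0:
        return 0.0                 # degenerate map: nothing to separate
    hf, edges = np.histogram(fg, bins=bins, range=(0, 1), density=True)
    hb, _ = np.histogram(bg, bins=bins, range=(0, 1), density=True)
    overlap = np.minimum(hf, hb).sum() * (edges[1] - edges[0])
    return float(1.0 - overlap)
```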

Book•DOI•
26 Sep 2015
TL;DR: This is the first book to describe how Autonomous Virtual Humans and Social Robots can interact with real people, be aware of the environment around them, and react to various situations.
Abstract: This is the first book to describe how Autonomous Virtual Humans and Social Robots can interact with real people, be aware of the environment around them, and react to various situations. Researchers from around the world present the main techniques for tracking and analysing humans and their behaviour, and contemplate the potential for these virtual humans and robots to replace or stand in for their human counterparts, tackling areas such as awareness of and reactions to real-world stimuli, and using the same modalities as humans do: verbal and body gestures, facial expressions and gaze to aid seamless human-computer interaction (HCI). The research presented in this volume is split into three sections:
User Understanding through Multisensory Perception: deals with the analysis and recognition of a given situation or stimuli, addressing issues of facial recognition, body gestures and sound localization.
Facial and Body Modelling Animation: presents the methods used in modelling and animating faces and bodies to generate realistic motion.
Modelling Human Behaviours: presents the behavioural aspects of virtual humans and social robots when interacting and reacting to real humans and each other.
Context Aware Human-Robot and Human-Agent Interaction would be of great use to students, academics and industry specialists in areas like Robotics, HCI, and Computer Graphics.

Proceedings Article•DOI•
19 Apr 2015
TL;DR: This work proposes to determine the fuzzy membership function by the sign of the pixel difference only, shows that this approach is more robust to noise, and demonstrates a superior performance to FLBP and many other LBP variants.
Abstract: Face recognition under large illumination variations is challenging. Local binary pattern (LBP) is robust to illumination variation, but sensitive to noise. Fuzzy LBP (FLBP) partially solves the noise-sensitivity problem by incorporating fuzzy logic into the representation of local binary patterns. The fuzzy membership function is determined by both the sign and the magnitude of the pixel difference. However, the magnitude is easily altered by noise, and hence can be unreliable. Thus, we propose to determine the fuzzy membership function by the sign only. We name the proposed approach Quantized Fuzzy LBP (QFLBP). On two challenging face recognition datasets, it is shown to be more robust to noise, and demonstrates a superior performance to FLBP and many other LBP variants.

Proceedings Article•DOI•
06 Aug 2015
TL;DR: This paper demonstrates the possibility to recover both the articulated hand pose and its distance from the camera with a single RGB camera in egocentric view with good performance on both a synthesized dataset and several real-world color image sequences that are captured in different environments.
Abstract: Articulated hand pose recovery in egocentric vision is useful for in-air interaction with wearable devices, such as Google Glass. Despite the progress obtained with depth cameras, this task is still challenging with ordinary RGB cameras. In this paper we demonstrate the possibility of recovering both the articulated hand pose and its distance from the camera with a single RGB camera in egocentric view. We address this problem by modeling the distance as a hidden variable and use the Conditional Regression Forest to infer the pose and distance jointly. In particular, we find that the pose estimation accuracy can be further enhanced by incorporating the hand part semantics. The experimental results show that the proposed method achieves good performance on both a synthesized dataset and several real-world color image sequences captured in different environments. In addition, our system runs in real time at more than 10 fps.

Journal Article•DOI•
TL;DR: This work proposes a novel transductive learning approach that considers multiple feature types simultaneously to improve the classification performance, and allows all feature types to collaborate simultaneously.
Abstract: Much existing work on multi-feature learning relies on the agreement among different feature types to improve clustering or classification performance. However, as different feature types can have different data characteristics, such a forced agreement among feature types may not bring a satisfactory result. We propose a novel transductive learning approach that considers multiple feature types simultaneously to improve the classification performance. Instead of forcing different feature types to agree with each other, we perform spectral clustering in each feature type separately. Each data sample is then described by a co-occurrence of feature patterns among the different feature types, and we apply these feature co-occurrence representations to perform transductive learning, such that data samples with similar feature co-occurrence patterns share the same label. As the spectral clustering results in different feature types and the formed co-occurrence patterns influence each other under the transductive learning formulation, an iterative optimization approach is proposed to decouple these factors. Unlike co-training, which needs to iteratively update each individual feature type, our method allows all feature types to collaborate simultaneously. It can naturally handle multiple feature types together and is less sensitive to noisy feature types. The experimental results on synthetic, object, and action recognition datasets all validate the advantages of our method compared to state-of-the-art methods.
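
The co-occurrence representation itself is easy to sketch: cluster each feature type separately and describe every sample by its tuple of cluster labels. The spectral-clustering settings below are assumptions for the sketch; the paper's iterative transductive optimization is omitted.

```python
from sklearn.cluster import SpectralClustering

def co_occurrence_codes(feature_views, n_clusters=8):
    """Feature co-occurrence representation (sketch).

    feature_views : list of (n, d_k) arrays, one per feature type
    Returns one co-occurrence pattern (tuple of cluster ids across
    feature types) per sample.
    """
    labels = [
        SpectralClustering(n_clusters=n_clusters,
                           affinity="nearest_neighbors",
                           random_state=0).fit_predict(X)
        for X in feature_views          # cluster each feature type alone
    ]
    return list(zip(*labels))
```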

Journal Article•DOI•
TL;DR: This work proposes a novel branch-and-bound co-occurrence feature mining algorithm that can directly mine both optimal conjunctions and disjunctions of individual features at arbitrary orders simultaneously.
Abstract: Co-occurrence features are compositions of base features that have more discriminative power than individual base features. Although they show promising performance in visual recognition applications such as object, scene, and action recognition, the discovery of optimal co-occurrence features is usually a computationally demanding task. Unlike previous feature mining methods that fix the order of the co-occurrence features or rely on a two-stage frequent pattern mining to select the optimal co-occurrence feature, we propose a novel branch-and-bound search-based co-occurrence feature mining algorithm that can directly mine both optimal conjunctions (AND) and disjunctions (OR) of individual features at arbitrary orders simultaneously. This feature mining process is integrated into the multi-class boosting framework AdaBoost.MH such that the weighted training error is minimized by the discovered co-occurrence features in each boosting step. Experiments on UCI benchmark datasets, a scene recognition dataset, and an action recognition dataset validate both the effectiveness and efficiency of our proposed method.

Proceedings Article•DOI•
05 Jan 2015
TL;DR: A flexible 3D trajectory indexing method for complex 3D motion recognition based on both point level and primitive-level descriptors that is suitable for spatial motion trajectory, which is view-invariant in 3D space.
Abstract: Motion trajectory analysis is important for human motion recognition and human-computer interaction. In this paper, we propose a flexible 3D trajectory indexing method for complex 3D motion recognition. Based on both point-level and primitive-level descriptors, trajectories are represented at the sub-primitive level, the level between the point level and the primitive level. Primitives are flexibly segmented into sub-primitives at various scales, and the sub-primitives retain more detailed information than primitives. The level of detail of the sub-primitives can be adjusted by controlling segmentation scales according to motion complexity. The proposed approach is suitable for spatial motion trajectories, which are view-invariant in 3D space. A cluster model is also proposed to represent motion classes, and motion recognition is performed based on the maximum a posteriori (MAP) criterion. Experiments on benchmark datasets validate the effectiveness of the proposed approach.

Book Chapter•DOI•
11 Mar 2015
TL;DR: An Augmented Reality solution to allow users to manipulate and inspect 3D virtual objects freely with their bare hands on wearable devices is presented and a unified framework to jointly recover the 6D palm pose and recognize the hand gesture from the depth images is proposed.
Abstract: We present an Augmented Reality solution that allows users to manipulate and inspect 3D virtual objects freely with their bare hands on wearable devices. To this end, we use a head-mounted depth camera to capture RGB-D hand images from an egocentric view, and propose a unified framework to jointly recover the 6D palm pose and recognize the hand gesture from the depth images. A random forest is utilized to regress the palm pose and classify the hand gesture simultaneously via a spatial-voting framework. With a real-world annotated training dataset, the proposed method is shown to predict the palm pose and gesture accurately. The output of the forest is used to render the 3D virtual objects, which are overlaid onto the hand region in the input RGB images with camera calibration parameters to provide seamless virtual and real scene synthesis.

Proceedings Article•DOI•
10 Dec 2015
TL;DR: The proposed method significantly improves the localized search accuracy over the baseline, which treats each frame independently, and is able to find the top 100 object trajectories in the 5.5-hour dataset within 30 seconds.
Abstract: We present an efficient approach to search for and locate all occurrences of a specific object in large video volumes, given a single query example. Locations of object occurrences are returned as spatio-temporal trajectories in the 3D video volume. Despite much work on object instance search in image datasets, these methods locate the object independently in each image, therefore do not preserve the spatio-temporal consistency in consecutive video frames. This results in sub-optimal performance if directly applied to videos, as will be shown in our experiments. We propose to locate the object jointly across video frames using spatio-temporal search. The efficiency and effectiveness of the proposed approach is demonstrated on a consumer video dataset consisting of crawled YouTube videos and mobile captured consumer clips. Our method significantly improves the localized search accuracy over the baseline, which treats each frame independently. Moreover, it is able to find the top 100 object trajectories in the 5.5-hour dataset within 30 seconds.

Proceedings Article•DOI•
01 Dec 2015
TL;DR: Experiments demonstrate that the proposed initialization method significantly reduces the iterations and related processing time required for existing online or offline algorithms to achieve the same reconstructed peak signal-to-noise ratio (PSNR), and presents a better subjective reconstruction using the same computation resources.
Abstract: In this paper, we propose a method to optimize a two-layer light field display using depth initialization. In contrast to existing work that trades off between performance and processing time, this paper first models the display principle of the layered light field display, then performs layered initialization with the known depth prior of the 3D objects, and finally optimizes the layered images for light field display. Experiments demonstrate that the proposed initialization method significantly reduces the iterations and related processing time required for existing online or offline algorithms to achieve the same reconstructed peak signal-to-noise ratio (PSNR), and presents a better subjective reconstruction using the same computation resources.

Proceedings Article•DOI•
01 Dec 2015
TL;DR: Comparisons with average and exponential filtering, as well as state-of-the-art methods, validate that the proposed adaptive exponential filtering method can effectively refine the pixel prediction maps, without using the original video again.
Abstract: We propose an efficient online video filtering method, called adaptive exponential filtering (AES) to refine pixel prediction maps. Assuming each pixel is associated with a discriminative prediction score, the proposed AES applies exponentially decreasing weights over time to smooth the prediction score of each pixel, similar to classic exponential smoothing. However, instead of fixing the spatial pixel location to perform temporal filtering, we trace each pixel in the past frames by finding the optimal path that can bring the maximum exponential smoothing score, thus performing adaptive and non-linear filtering. Thanks to the pixel tracing, AES can better address object movements and avoid over-smoothing. To enable real-time filtering, we propose a linear-complexity dynamic programming scheme that can trace all pixels simultaneously. We apply the proposed filtering method to improve both saliency detection maps and scene parsing maps. The comparisons with average and exponential filtering, as well as state-of-the-art methods, validate that our AES can effectively refine the pixel prediction maps, without using the original video again.
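
A simplified version of the per-pixel tracing can be written as a max-filter recurrence over consecutive frames, as sketched below; the recurrence and its wrap-around border handling are illustrative simplifications of the paper's formulation.

```python
import numpy as np

def adaptive_exponential_filter(scores, alpha=0.7, radius=1):
    """Sketch of AES-style filtering by per-pixel dynamic programming.

    scores : (T, H, W) per-frame prediction maps
    Each pixel's smoothed value combines the current score with the best
    accumulated score among spatially nearby pixels in the previous
    frame, weighted by alpha, so every pixel implicitly traces its best
    predecessor path instead of smoothing at a fixed location.
    """
    scores = np.asarray(scores, dtype=float)
    T, _, _ = scores.shape
    acc = scores[0].copy()
    out = np.empty_like(scores)
    out[0] = acc
    for t in range(1, T):
        # max-filter the previous accumulation over a (2r+1)^2 window
        # (np.roll wraps at borders; acceptable for a sketch)
        best = acc.copy()
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                shifted = np.roll(np.roll(acc, dy, 0), dx, 1)
                best = np.maximum(best, shifted)
        acc = (1 - alpha) * scores[t] + alpha * best
        out[t] = acc
    return out
```

The cost per frame is O(H·W·(2r+1)^2), i.e., linear in the number of pixels, consistent with the linear-complexity scheme the abstract describes.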

Journal Article•DOI•
Shizheng Wang, Mingyu Sun, Phil Surman, Junsong Yuan, Xiao Wei Sun
01 Jun 2015
TL;DR: Experiments demonstrate that the proposed method provides a relatively desirable improvement for the whole visual effect of compressive light field display, especially for the performance in the non-target display region.
Abstract: In this paper, we propose a method to extend the viewing field of a compressive light field display by optimizing the maximum viewing angle. The difference from existing work is the improvement in the overall visual effect of the compressive light field display rather than only extending the viewing field region. Sliding-window scanning and viewer detection are also used for determining the target region and optimizing the display. Experiments demonstrate that the proposed method provides a relatively desirable improvement in the whole visual effect of the compressive light field display, especially for the performance in the non-target display region.
Keywords: light field; compressive display; glass-free 3D; face detection.

Proceedings Article•DOI•
13 Oct 2015
TL;DR: A graph-based optimization framework is proposed to leverage category-independent object proposals (candidate object regions) for logo search in a large-scale image database, together with an efficient feature descriptor, EdgeBoW, which can yield promising results, especially for object categories primarily defined by their shape.
Abstract: We propose a graph-based optimization framework to leverage category-independent object proposals (candidate object regions) for logo search in a large-scale image database. The proposed contour-based feature descriptor, EdgeBoW, is robust to view-angle changes and varying illumination conditions, and can implicitly capture significant object shape information. Being equipped with a local descriptor, it can handle a fair amount of the occlusion and deformation frequently present in real-life scenarios. Given a small set of initially retrieved candidate object proposals, a fast graph-based short-listing scheme is designed to exploit the mutual similarities among these proposals to eliminate outliers. In contrast to a coarse image-level pairwise similarity measure, this search focused on a few specific image regions provides a more accurate method for matching. The proposed query expansion strategy assesses each of the remaining better-matched proposals against all its neighbors within the same image for precise localization. Combined with the efficient feature descriptor EdgeBoW, a set of more insightful edge weights and node-utility measures can yield promising results, especially for object categories primarily defined by their shape. An extensive set of experiments performed on a number of benchmark datasets demonstrates its effectiveness and superior generalization ability on both clutter-intensive real-life images and poor-quality binary document images.

Book Chapter•DOI•
01 Jan 2015
TL;DR: This chapter proposes a very fast action retrieval system which can effectively locate the subvolumes similar to the query video and proposes a coarse-to-fine subvolume search scheme, which results in a dramatic speedup over the existing video branch-and-bound method.
Abstract: Action search is an interesting problem for human action analysis, with many potential applications in industry. In this chapter, we propose a very fast action retrieval system that can effectively locate the subvolumes similar to a query video. Random-indexing-trees-based visual vocabularies are introduced for database indexing. By increasing the number of vocabularies, the large intra-class variance problem can be relieved even with only one query sample available. In addition, we use a mutual-information-based formulation, which makes it easy to leverage feedback from the user. A coarse-to-fine subvolume search scheme is also proposed, which results in a dramatic speedup over the existing video branch-and-bound method. Cross-dataset experiments demonstrate that our proposed method is not only fast when searching higher-resolution videos, but also robust to action variations, partial occlusions, and cluttered and dynamic backgrounds. Beyond its superior performance, our system is fast enough for online applications; for example, we can finish an action search in 24 s on a 1 h database and in 37 s on a 5 h database.

Proceedings Article•DOI•
01 Dec 2015
TL;DR: A demo system that realistically displays the glasses-free light field 3D effect with a triple-layer structure by combining multi-layer panels, high refresh rates, and directional backlighting together with a thin form factor is presented.
Abstract: This paper presents a demo system that realistically displays the glasses-free light field 3D effect with a triple-layer structure. By combining multi-layer panels, high refresh rates, and directional backlighting together, we achieve a wide field of view and large depth of field with a thin form factor. Additionally, using some off-the-shelf hardware, this system demonstrates an interesting light field display.

Book Chapter•DOI•
01 Jan 2015
TL;DR: The superior performance on benchmark datasets validates that propagative Hough voting can outperform state-of-the-art techniques in various action analysis tasks, such as action search and recognition.
Abstract: Generalized Hough voting has shown promising results in both object and action detection. However, most existing Hough voting methods suffer when insufficient training data are provided. To address this limitation, we propose propagative Hough voting in this chapter. Instead of training a discriminative classifier for local feature voting, we first match labeled feature points to unlabeled feature points, then propagate the label and spatio-temporal configuration information via Hough voting. To enable fast and robust matching, we index the unlabeled data using random projection trees (RPT). RPT can leverage the low-dimensional manifold structure to provide adaptive local feature matching. Moreover, as the RPT index can be built on either labeled or unlabeled datasets, it can be applied to different tasks such as action search (limited training) and recognition (sufficient training). The superior performance on benchmark datasets validates that our propagative Hough voting can outperform state-of-the-art techniques in various action analysis tasks, such as action search and recognition.

Book Chapter•DOI•
01 Jan 2015
TL;DR: This chapter develops a spatial-temporal implicit shape model (STISM), which characterizes the space-time structure of the sparse local features extracted from a video, and proposes a new random forest structure, called multiclass balanced random forest, which makes a good trade-off between the balance of the trees and the discriminative abilities.
Abstract: Early recognition and prediction of human activities are of great importance in video surveillance. In this chapter, we target this problem by developing a spatial-temporal implicit shape model (STISM), which characterizes the space-time structure of the sparse local features extracted from a video. The recognition of human activities is accomplished by pattern matching through STISM. To enable efficient and robust matching, we propose a new random forest structure, called multiclass balanced random forest, which makes a good trade-off between the balance of the trees and the discriminative abilities. The prediction is done simultaneously for multiple classes, which saves both the memory and computational cost. The experiments show that our algorithm significantly outperforms the state-of-the-art for the human activity prediction problem.