Author

Omar Oreifej

Other affiliations: University of Central Florida
Bio: Omar Oreifej is an academic researcher at the University of California, Berkeley. He has contributed to research in topics including object detection and motion estimation, has an h-index of 8, and has co-authored 19 publications receiving 1,836 citations. His previous affiliations include the University of Central Florida.

Papers
Proceedings ArticleDOI
23 Jun 2013
TL;DR: A new descriptor for activity recognition from videos acquired by a depth sensor is presented that better captures the joint shape-motion cues in the depth sequence, and thus outperforms the state-of-the-art on all relevant benchmarks.
Abstract: We present a new descriptor for activity recognition from videos acquired by a depth sensor. Previous descriptors mostly compute shape and motion features independently, thus, they often fail to capture the complex joint shape-motion cues at pixel-level. In contrast, we describe the depth sequence using a histogram capturing the distribution of the surface normal orientation in the 4D space of time, depth, and spatial coordinates. To build the histogram, we create 4D projectors, which quantize the 4D space and represent the possible directions for the 4D normal. We initialize the projectors using the vertices of a regular polychoron. Consequently, we refine the projectors using a discriminative density measure, such that additional projectors are induced in the directions where the 4D normals are more dense and discriminative. Through extensive experiments, we demonstrate that our descriptor better captures the joint shape-motion cues in the depth sequence, and thus outperforms the state-of-the-art on all relevant benchmarks.
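The core idea lends itself to a compact sketch. The following is our own illustration, not the authors' code: it computes 4D surface normals of a depth sequence and soft-votes them into a histogram over a set of projector directions. The `depth` array, the `projectors` array, and the voting scheme are assumptions, and the paper's spatiotemporal cells and discriminative projector refinement are omitted.

```python
# Minimal sketch in the spirit of the 4D-normal histogram, assuming
# `depth` is a (T, H, W) numpy array of depth frames and `projectors`
# is a (P, 4) array of unit 4D directions (in the paper, vertices of a
# regular polychoron, later refined); both are hypothetical inputs.
import numpy as np

def hon4d_histogram(depth, projectors):
    # Gradients of z = f(x, y, t) give the (unnormalized) 4D surface
    # normal (-dz/dx, -dz/dy, -dz/dt, 1) at every pixel of every frame.
    dz_dt, dz_dy, dz_dx = np.gradient(depth.astype(np.float64))
    ones = np.ones_like(depth, dtype=np.float64)
    normals = np.stack([-dz_dx, -dz_dy, -dz_dt, ones], axis=-1)
    normals /= np.linalg.norm(normals, axis=-1, keepdims=True)

    # Quantize: distribute each normal over the projector directions,
    # keeping only positive components (a soft voting scheme).
    votes = np.maximum(normals.reshape(-1, 4) @ projectors.T, 0.0)
    hist = votes.sum(axis=0)
    return hist / (hist.sum() + 1e-12)  # normalized descriptor
```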

978 citations

Proceedings ArticleDOI
16 Jun 2012
TL;DR: This paper proposes a robust part-based tracking-by-detection framework that learns part-based, person-specific SVM classifiers which capture the articulations of human bodies against dynamically changing appearance and background.
Abstract: Single camera-based multiple-person tracking is often hindered by difficulties such as occlusion and changes in appearance. In this paper, we address such problems by proposing a robust part-based tracking-by-detection framework. Human detection using part models has become quite popular, yet its extension in tracking has not been fully explored. Our approach learns part-based person-specific SVM classifiers which capture the articulations of the human bodies in dynamically changing appearance and background. With the part-based model, our approach is able to handle partial occlusions in both the detection and the tracking stages. In the detection stage, we select the subset of parts which maximizes the probability of detection, which significantly improves the detection performance in crowded scenes. In the tracking stage, we dynamically handle occlusions by distributing the score of the learned person classifier among its corresponding parts, which allows us to detect and predict partial occlusions, and prevent the performance of the classifiers from being degraded. Extensive experiments using the proposed method on several challenging sequences demonstrate state-of-the-art performance in multiple-people tracking.
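As a hedged sketch of the score-distribution idea, the snippet below combines per-part linear SVM responses and drops parts whose response falls below a threshold, treating them as occluded; all names (`part_svms`, `features`, `occl_thresh`) are illustrative, not the authors' API.

```python
# Illustrative only: a person-specific score is aggregated from per-part
# SVM responses, and parts scoring below `occl_thresh` are flagged as
# (partially) occluded and excluded so they do not degrade the score.
import numpy as np

def person_score(part_svms, features, occl_thresh=0.0):
    """part_svms: list of (w, b) linear SVMs, one per body part.
    features: list of per-part feature vectors for a candidate window."""
    scores = np.array([w @ f + b for (w, b), f in zip(part_svms, features)])
    visible = scores > occl_thresh  # parts predicted un-occluded
    if not visible.any():
        return -np.inf, visible
    return scores[visible].mean(), visible
```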

363 citations

Journal ArticleDOI
TL;DR: A novel three-term low-rank matrix decomposition approach in which a Gaussian-based turbulence model can be employed to enforce an additional constraint on the search space of the minimization.
Abstract: Turbulence mitigation refers to the stabilization of videos with nonuniform deformations due to the influence of optical turbulence. Typical approaches for turbulence mitigation follow averaging or dewarping techniques. Although these methods can reduce the turbulence, they distort the independently moving objects, which can often be of great interest. In this paper, we address the novel problem of simultaneous turbulence mitigation and moving object detection. We propose a novel three-term low-rank matrix decomposition approach in which we decompose the turbulence sequence into three components: the background, the turbulence, and the object. We simplify this extremely difficult problem into a minimization of the nuclear norm, the Frobenius norm, and the ℓ2,1 norm. Our method is based on two observations: First, the turbulence causes dense, Gaussian-like noise and can therefore be captured by the Frobenius norm, while the moving objects are sparse and can thus be captured by the ℓ2,1 norm. Second, since the object's motion is linear and intrinsically different from the Gaussian-like turbulence, a Gaussian-based turbulence model can be employed to enforce an additional constraint on the search space of the minimization. We demonstrate the robustness of our approach on challenging sequences which are significantly distorted with atmospheric turbulence and include extremely tiny moving objects.
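For concreteness, one plausible way to write the three-term objective described above is the following; the symbols, the weights, and whether the Frobenius term is squared are our assumptions rather than the paper's exact formulation.

```latex
% One plausible form of the three-term objective (our notation):
% F is the matrix of vectorized frames, B the low-rank background,
% T the dense Gaussian-like turbulence, O the sparse moving object.
\min_{B,\,T,\,O} \; \|B\|_{*}
  + \lambda_{1}\,\|T\|_{F}^{2}
  + \lambda_{2}\,\|O\|_{2,1}
\qquad \text{subject to} \quad F = B + T + O
```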

208 citations

Proceedings ArticleDOI
06 Nov 2011
TL;DR: A novel approach is proposed that does not follow the standard preprocessing steps and accordingly avoids their difficulties; it is based on Lagrangian particle trajectories, a set of dense trajectories obtained by advecting optical flow over time, thus capturing the ensemble motions of a scene.
Abstract: Recognition of human actions in a video acquired by a moving camera typically requires standard preprocessing steps such as motion compensation, moving object detection, and object tracking. The errors from the motion compensation step propagate to the object detection stage, resulting in missed detections, which further complicate the tracking stage, resulting in cluttered and incorrect tracks. Therefore, action recognition from a moving camera is considered very challenging. In this paper, we propose a novel approach which does not follow the standard steps, and accordingly avoids the aforementioned difficulties. Our approach is based on Lagrangian particle trajectories, a set of dense trajectories obtained by advecting optical flow over time, thus capturing the ensemble motions of a scene. This is done in frames of unaligned video, and no object detection is required. In order to handle the moving camera, we propose a novel approach based on low-rank optimization, where we decompose the trajectories into their camera-induced and object-induced components. Having obtained the relevant object motion trajectories, we compute a compact set of chaotic invariant features which captures the characteristics of the trajectories. An SVM is then employed to learn and recognize the human actions using the computed motion features. We performed extensive experiments on multiple benchmark datasets and two new aerial datasets called ARG and APHill, and obtained promising results.
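A minimal sketch of the advection step, under our own assumptions about the data layout: given precomputed dense optical-flow fields, every pixel is treated as a particle and carried forward frame by frame, yielding dense trajectories with no object detection involved.

```python
# Sketch of Lagrangian particle advection. `flows` is a hypothetical
# (T-1, H, W, 2) array of per-frame dense flow fields (dx, dy).
import numpy as np

def advect_particles(flows, step=1.0):
    T1, H, W, _ = flows.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(np.float64)
    xs, ys = xs.ravel(), ys.ravel()
    tracks = [np.stack([xs, ys], axis=1)]
    for t in range(T1):
        # Sample the flow at each particle's (rounded) position and
        # advect the particle by the local flow vector.
        xi = np.clip(xs.round().astype(int), 0, W - 1)
        yi = np.clip(ys.round().astype(int), 0, H - 1)
        xs = xs + step * flows[t, yi, xi, 0]
        ys = ys + step * flows[t, yi, xi, 1]
        tracks.append(np.stack([xs, ys], axis=1))
    return np.stack(tracks)  # (T, H*W, 2) dense trajectories
```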

181 citations

Proceedings ArticleDOI
13 Jun 2010
TL;DR: This paper proposes an algorithm to address the novel problem of human identity recognition over a set of unordered, low-quality aerial images by implementing a weighted voter-candidate formulation, and identifies the candidate with the highest weighted vote as the target.
Abstract: Human identity recognition is an important yet under-addressed problem. Previous methods were strictly limited to high-quality photographs, where the principal techniques rely heavily on body details such as face detection. In this paper, we propose an algorithm to address the novel problem of human identity recognition over a set of unordered, low-quality aerial images. Assuming a user was able to manually locate a target in some images of the set, we find the target in every other query image by implementing a weighted voter-candidate formulation. In this framework, every manually located target is a voter, and the set of humans in a query image are candidates. In order to locate the target, we detect and align blobs of voters and candidates. We then use PageRank to extract distinguishing regions and match multiple regions of a voter to multiple regions of a candidate using the Earth Mover's Distance (EMD). This generates a robust similarity measure between every voter-candidate pair. Finally, we identify the candidate with the highest weighted vote as the target. We tested our technique on several aerial image sets that we collected, along with publicly available sets, and obtained promising results.
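The final voting step reduces to a small computation. The sketch below is illustrative only: given a voter-by-candidate similarity matrix (e.g., from EMD over the matched regions) and per-voter weights, the candidate with the highest weighted vote is returned; `similarity` and `weights` are assumed inputs, not the authors' API.

```python
# Illustrative weighted voter-candidate tally.
import numpy as np

def identify_target(similarity, weights):
    """similarity: (V, C) voter-vs-candidate similarity matrix.
    weights: (V,) voter reliabilities. Returns the winning candidate."""
    votes = weights @ similarity  # weighted vote per candidate
    return int(np.argmax(votes)), votes
```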

135 citations


Cited by
Proceedings ArticleDOI
07 Jun 2015
TL;DR: Inception is a deep convolutional neural network architecture that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
Abstract: We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. By a carefully crafted design, we increased the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our submission for ILSVRC14 is called GoogLeNet, a 22 layers deep network, the quality of which is assessed in the context of classification and detection.
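A minimal PyTorch sketch of a single Inception module conveys the main hallmark: parallel 1x1, 3x3, and 5x5 convolutions plus a pooled branch, with 1x1 reductions keeping the computational budget in check. The channel counts are left as parameters; the published GoogLeNet configuration is not reproduced here.

```python
import torch
import torch.nn as nn

def conv_relu(c_in, c_out, k, **kw):
    return nn.Sequential(nn.Conv2d(c_in, c_out, k, **kw),
                         nn.ReLU(inplace=True))

class InceptionModule(nn.Module):
    def __init__(self, c_in, c1, c3r, c3, c5r, c5, cp):
        super().__init__()
        self.b1 = conv_relu(c_in, c1, 1)                  # 1x1 branch
        self.b3 = nn.Sequential(conv_relu(c_in, c3r, 1),  # 1x1 reduce
                                conv_relu(c3r, c3, 3, padding=1))
        self.b5 = nn.Sequential(conv_relu(c_in, c5r, 1),  # 1x1 reduce
                                conv_relu(c5r, c5, 5, padding=2))
        self.bp = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                conv_relu(c_in, cp, 1))   # pool branch

    def forward(self, x):
        # Branches run in parallel and are concatenated along channels.
        return torch.cat([self.b1(x), self.b3(x),
                          self.b5(x), self.bp(x)], dim=1)
```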

40,257 citations

Proceedings ArticleDOI
01 Dec 2013
TL;DR: Dense trajectories were recently shown to be an efficient video representation for action recognition, achieving state-of-the-art results on a variety of datasets; this paper improves their performance by taking camera motion into account and correcting for it.
Abstract: Recently, dense trajectories were shown to be an efficient video representation for action recognition and achieved state-of-the-art results on a variety of datasets. This paper improves their performance by taking camera motion into account to correct them. To estimate camera motion, we match feature points between frames using SURF descriptors and dense optical flow, which are shown to be complementary. These matches are then used to robustly estimate a homography with RANSAC. Human motion is in general different from camera motion and generates inconsistent matches. To improve the estimation, a human detector is employed to remove these matches. Given the estimated camera motion, we remove trajectories consistent with it. We also use this estimation to cancel out camera motion from the optical flow. This significantly improves motion-based descriptors such as HOF and MBH. Experimental results on four challenging action datasets (Hollywood2, HMDB51, Olympic Sports, and UCF50) significantly outperform the current state of the art.
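The camera-motion estimation step can be sketched with standard OpenCV calls; this is an approximation, not the authors' pipeline: ORB stands in for SURF (which requires opencv-contrib), the dense-flow matches and the human-detector masking are omitted, and the frames are assumed grayscale.

```python
# Match features between consecutive frames, then robustly fit a
# homography with RANSAC; warping by H cancels the camera motion.
import cv2
import numpy as np

def estimate_camera_homography(prev, curr):
    orb = cv2.ORB_create(2000)
    kp1, des1 = orb.detectAndCompute(prev, None)
    kp2, des2 = orb.detectAndCompute(curr, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)
    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    return H  # e.g. cv2.warpPerspective(prev, H, prev.shape[::-1])
```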

3,487 citations

Book ChapterDOI
08 Oct 2016
TL;DR: A new pair of precision-recall performance measures that treats errors of all types uniformly and emphasizes correct identification over sources of error is presented to help accelerate progress in multi-target, multi-camera tracking systems.
Abstract: To help accelerate progress in multi-target, multi-camera tracking systems, we present (i) a new pair of precision-recall measures of performance that treats errors of all types uniformly and emphasizes correct identification over sources of error; (ii) the largest fully annotated and calibrated dataset to date, with more than 2 million frames of 1080p, 60 fps video taken by 8 cameras observing more than 2,700 identities over 85 minutes; and (iii) a reference software system as a comparison baseline. We show that (i) our measures properly account for bottom-line identity match performance in the multi-camera setting; (ii) our dataset poses realistic challenges to current trackers; and (iii) the performance of our system is comparable to the state of the art.
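The identity-level measures reduce to simple ratios once truth and computed identities have been matched. The sketch below shows that final step only; the bipartite min-cost matching that produces `idtp`, `idfp`, and `idfn` is elided, and the function names are ours.

```python
# Identification precision/recall/F1 from identity-level counts:
# idtp/idfp/idfn are identity true positives, false positives, and
# false negatives accumulated over all frames after the optimal
# one-to-one truth-to-computed identity matching.
def identification_scores(idtp, idfp, idfn):
    idp = idtp / (idtp + idfp)                     # identification precision
    idr = idtp / (idtp + idfn)                     # identification recall
    idf1 = 2 * idtp / (2 * idtp + idfp + idfn)     # identification F1
    return idp, idr, idf1
```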

1,775 citations

Journal ArticleDOI
TL;DR: The MBH descriptor is shown to consistently outperform other state-of-the-art descriptors, in particular on real-world videos that contain a significant amount of camera motion.
Abstract: This paper introduces a video representation based on dense trajectories and motion boundary descriptors. Trajectories capture the local motion information of the video. A dense representation guarantees good coverage of foreground motion as well as of the surrounding context. A state-of-the-art optical flow algorithm enables a robust and efficient extraction of dense trajectories. As descriptors we extract features aligned with the trajectories to characterize shape (point coordinates), appearance (histograms of oriented gradients), and motion (histograms of optical flow). Additionally, we introduce a descriptor based on motion boundary histograms (MBH) which relies on differential optical flow. The MBH descriptor is shown to consistently outperform other state-of-the-art descriptors, in particular on real-world videos that contain a significant amount of camera motion. We evaluate our video representation in the context of action classification on nine datasets, namely KTH, YouTube, Hollywood2, UCF Sports, IXMAS, UIUC, Olympic Sports, UCF50, and HMDB51. On all datasets our approach outperforms current state-of-the-art results.
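A rough sketch of the MBH computation, under our own simplifications: each optical-flow component is treated as an image and its spatial gradients are histogrammed by orientation (HOG-style), so locally constant camera motion cancels out. The `flow` layout is assumed, and the paper's cell/block pooling along trajectories is omitted.

```python
# Motion boundary histogram over a single flow field. `flow` is a
# hypothetical (H, W, 2) dense optical-flow array.
import numpy as np

def mbh_histogram(flow, bins=8):
    hists = []
    for c in range(2):                        # MBHx and MBHy channels
        gy, gx = np.gradient(flow[..., c])    # differential optical flow
        mag = np.hypot(gx, gy)
        ang = np.arctan2(gy, gx) % (2 * np.pi)
        idx = (ang / (2 * np.pi) * bins).astype(int) % bins
        h = np.bincount(idx.ravel(), weights=mag.ravel(), minlength=bins)
        hists.append(h / (h.sum() + 1e-12))
    return np.concatenate(hists)              # (2 * bins,) descriptor
```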

1,726 citations

Journal ArticleDOI
TL;DR: It is demonstrated that trackers can be evaluated objectively by survival curves, Kaplan-Meier statistics, and Grubbs testing, and it is found that in evaluation practice the F-score is as effective as the object tracking accuracy (OTA) score.
Abstract: A large variety of trackers has been proposed in the literature during the last two decades, with mixed success. Object tracking in realistic scenarios is a difficult problem and therefore remains one of the most active areas of research in computer vision. A good tracker should perform well in a large number of videos involving illumination changes, occlusion, clutter, camera motion, low contrast, specularities, and at least six more aspects. However, the performance of proposed trackers has typically been evaluated on fewer than ten videos, or on special-purpose datasets. In this paper, we aim to evaluate trackers systematically and experimentally on 315 video fragments covering the above aspects. We selected a set of nineteen trackers to include a wide variety of algorithms often cited in the literature, supplemented with trackers appearing in 2010 and 2011 for which code was publicly available. We demonstrate that trackers can be evaluated objectively by survival curves, Kaplan-Meier statistics, and Grubbs testing. We find that in evaluation practice the F-score is as effective as the object tracking accuracy (OTA) score. The analysis under a large variety of circumstances provides objective insight into the strengths and weaknesses of trackers.
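A survival curve of the kind used here is straightforward to compute. The sketch below is our own illustration, not the paper's protocol: per-video F-scores are sorted, and for each score level the fraction of videos on which the tracker "survives" (scores at least that well) is reported.

```python
# Tracker survival curve from per-video F-scores. `f_scores` is a
# hypothetical array of per-video F-scores in [0, 1].
import numpy as np

def survival_curve(f_scores):
    s = np.sort(np.asarray(f_scores))[::-1]      # best videos first
    frac = np.arange(1, len(s) + 1) / len(s)     # fraction surviving
    return s, frac                               # plot frac against s
```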

1,604 citations