Author

Ferda Ofli

Bio: Ferda Ofli is an academic researcher at the Qatar Computing Research Institute. He has contributed to research in topics including social media and computer science, has an h-index of 28, and has co-authored 92 publications receiving 3,437 citations. Previous affiliations of Ferda Ofli include Khalifa University and the Massachusetts Institute of Technology.


Papers
Proceedings ArticleDOI
15 Jan 2013
TL;DR: A controlled multimodal dataset consisting of temporally synchronized and geometrically calibrated data from an optical motion capture system, multi-baseline stereo cameras from multiple views, depth sensors, accelerometers and microphones provides researchers with an inclusive testbed to develop and benchmark new algorithms across multiple modalities under known capture conditions in various research domains.
Abstract: Over the years, a large number of methods have been proposed to analyze human pose and motion information from images, videos, and recently from depth data. Most methods, however, have been evaluated on datasets that were too specific to each application, limited to a particular modality, and more importantly, captured under unknown conditions. To address these issues, we introduce the Berkeley Multimodal Human Action Database (MHAD), consisting of temporally synchronized and geometrically calibrated data from an optical motion capture system, multi-baseline stereo cameras from multiple views, depth sensors, accelerometers and microphones. This controlled multimodal dataset provides researchers with an inclusive testbed to develop and benchmark new algorithms across multiple modalities under known capture conditions in various research domains. To demonstrate a possible use of MHAD for action recognition, we compare results using the popular Bag-of-Words algorithm adapted to each modality independently with the results of various combinations of modalities using Multiple Kernel Learning. Our comparative results show that multimodal analysis of human motion yields better action recognition rates than unimodal analysis.
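
The fusion scheme the abstract describes can be illustrated compactly. The Python sketch below is a minimal illustration, not the authors' pipeline: it builds a bag-of-words histogram per modality from toy local descriptors, forms one linear kernel per modality, and combines the kernels with fixed weights before training an SVM. True multiple kernel learning would optimize those weights jointly with the classifier; all sizes and data here are invented for the example.

# Illustrative sketch (not the authors' exact pipeline): per-modality
# bag-of-words histograms combined with fixed kernel weights, in the
# spirit of multiple kernel learning. Descriptors are random stand-ins.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def bow_histogram(local_features, codebook):
    """Quantize local features against a codebook and return a
    normalized word-count histogram."""
    words = codebook.predict(local_features)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

# Toy data: 40 sequences, two modalities with arbitrary descriptor sizes,
# each sequence yielding a variable number of local descriptors.
n_seq, n_words = 40, 16
labels = rng.integers(0, 2, n_seq)
modalities = []
for dim in (32, 9):
    feats = [rng.normal(size=(rng.integers(20, 50), dim)) for _ in range(n_seq)]
    codebook = KMeans(n_clusters=n_words, n_init=5, random_state=0)
    codebook.fit(np.vstack(feats))
    X = np.array([bow_histogram(f, codebook) for f in feats])
    modalities.append(X)

# One linear kernel per modality, combined as a convex sum (fixed
# weights here; MKL would learn them).
weights = [0.5, 0.5]
K = sum(w * (X @ X.T) for w, X in zip(weights, modalities))
clf = SVC(kernel="precomputed").fit(K, labels)
print("train accuracy:", clf.score(K, labels))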

462 citations

Journal ArticleDOI
TL;DR: A new representation of human actions called the Sequence of the Most Informative Joints (SMIJ), which is easy to interpret and performs better than several state-of-the-art algorithms for the task of human action recognition.
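
As a rough illustration of the SMIJ idea, the following Python sketch splits a joint-angle sequence into temporal segments, ranks joints within each segment by the variance of their angles, and keeps the top-ranked joint indices as the descriptor. The segment count and variance-based ranking are assumptions for the example, not necessarily the paper's exact choices.

# Illustrative SMIJ-style descriptor: rank joints by informativeness
# (here, joint-angle variance) within each temporal segment.
import numpy as np

def smij_descriptor(joint_angles, n_segments=4, top_k=3):
    """joint_angles: (T, J) array of joint-angle time series.
    Returns an (n_segments, top_k) array of joint indices, ordered
    from most to least informative within each segment."""
    segments = np.array_split(joint_angles, n_segments, axis=0)
    descriptor = []
    for seg in segments:
        variance = seg.var(axis=0)            # informativeness per joint
        ranked = np.argsort(variance)[::-1]   # most informative first
        descriptor.append(ranked[:top_k])
    return np.array(descriptor)

rng = np.random.default_rng(0)
angles = rng.normal(size=(120, 20))           # 120 frames, 20 joints
angles[40:80, 5] += np.linspace(0, 3, 40)     # joint 5 moves mid-sequence
print(smij_descriptor(angles))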

373 citations

Proceedings ArticleDOI
12 Nov 2012
TL;DR: This paper compares the Kinect pose estimation (skeletonization) with more established techniques for pose estimation from motion capture data, examining the accuracy of joint localization and the robustness of pose estimation with respect to orientation and occlusions.
Abstract: The Microsoft Kinect camera is becoming increasingly popular in many areas aside from entertainment, including human activity monitoring and rehabilitation. Many people, however, fail to consider the reliability and accuracy of the Kinect human pose estimation when they depend on it as a measuring system. In this paper we compare the Kinect pose estimation (skeletonization) with more established techniques for pose estimation from motion capture data, examining the accuracy of joint localization and the robustness of pose estimation with respect to orientation and occlusions. We have evaluated six physical exercises aimed at coaching of the elderly population. Experimental results present pose estimation accuracy rates and corresponding error bounds for the Kinect system.
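
The kind of accuracy evaluation the abstract describes reduces, at its core, to measuring joint-localization error against the motion-capture reference. The Python sketch below is a minimal, hypothetical version: it root-centers both skeletons to remove global translation and reports a per-joint RMSE on synthetic data; the joint count, alignment choice, and noise level are all assumptions.

# Minimal sketch: per-joint error between a Kinect skeleton estimate
# and a motion-capture reference, after root-centering both skeletons.
import numpy as np

def per_joint_rmse(kinect, mocap, root=0):
    """kinect, mocap: (T, J, 3) arrays of 3D joint positions.
    Returns a length-J array of RMSE per joint (same units as input)."""
    k = kinect - kinect[:, root:root + 1, :]   # root-center each frame
    m = mocap - mocap[:, root:root + 1, :]
    err = np.linalg.norm(k - m, axis=2)        # (T, J) Euclidean errors
    return np.sqrt((err ** 2).mean(axis=0))

rng = np.random.default_rng(0)
mocap = rng.normal(size=(300, 15, 3))                      # reference poses
kinect = mocap + rng.normal(scale=0.02, size=mocap.shape)  # ~2 cm noise
print(per_joint_rmse(kinect, mocap).round(3))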

356 citations

Proceedings ArticleDOI
21 Jul 2017
TL;DR: This paper introduces Recipe1M, a new large-scale, structured corpus of over one million cooking recipes and 800k food images, and demonstrates that regularization via the addition of a high-level classification objective both improves retrieval performance to rival that of humans and enables semantic vector arithmetic.
Abstract: In this paper, we introduce Recipe1M, a new large-scale, structured corpus of over one million cooking recipes and 800k food images. As the largest publicly available collection of recipe data, Recipe1M affords the ability to train high-capacity models on aligned, multi-modal data. Using these data, we train a neural network to find a joint embedding of recipes and images that yields impressive results on an image-recipe retrieval task. Additionally, we demonstrate that regularization via the addition of a high-level classification objective both improves retrieval performance to rival that of humans and enables semantic vector arithmetic. We postulate that these embeddings will provide a basis for further exploration of the Recipe1M dataset and of food and cooking in general. Code, data and models are publicly available.
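
A minimal PyTorch sketch of the training setup described: two encoders map recipe and image features into a shared space, a cosine-similarity loss aligns paired embeddings, and an auxiliary classification head acts as the high-level semantic regularizer. This is not the paper's architecture; the input dimensions, encoder depth, and loss weighting are assumptions.

# Illustrative joint-embedding sketch (not the paper's architecture):
# cosine alignment of paired embeddings plus a classification regularizer.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointEmbedding(nn.Module):
    def __init__(self, recipe_dim=300, image_dim=512, embed_dim=128, n_classes=10):
        super().__init__()
        self.recipe_enc = nn.Linear(recipe_dim, embed_dim)
        self.image_enc = nn.Linear(image_dim, embed_dim)
        self.classifier = nn.Linear(embed_dim, n_classes)  # semantic regularizer

    def forward(self, recipe_feats, image_feats):
        r = F.normalize(self.recipe_enc(recipe_feats), dim=1)
        v = F.normalize(self.image_enc(image_feats), dim=1)
        return r, v

model = JointEmbedding()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Toy batch: paired recipe/image features sharing semantic classes.
recipes, images = torch.randn(32, 300), torch.randn(32, 512)
classes = torch.randint(0, 10, (32,))
target = torch.ones(32)                     # +1 marks aligned pairs

r, v = model(recipes, images)
align_loss = F.cosine_embedding_loss(r, v, target)
cls_loss = F.cross_entropy(model.classifier(r), classes) \
         + F.cross_entropy(model.classifier(v), classes)
loss = align_loss + 0.1 * cls_loss          # weighting is an assumption
loss.backward()
opt.step()
print(float(loss))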

346 citations

Proceedings ArticleDOI
23 Jun 2013
TL;DR: This work proposes a hierarchy of dynamic medial axis structures at several spatio-temporal scales that can be modeled using a set of Linear Dynamical Systems (LDSs), and proposes novel discriminative metrics for comparing these sets of LDSs for the task of human activity recognition.
Abstract: Over the last few years, with the immense popularity of the Kinect, there has been renewed interest in developing methods for human gesture and action recognition from 3D data. A number of approaches have been proposed that extract representative features from 3D depth data, a reconstructed 3D surface mesh or, more commonly, from the recovered estimate of the human skeleton. Recent advances in neuroscience have identified a neural encoding of static 3D shapes in the primate infero-temporal cortex that can be represented as a hierarchy of medial axis and surface features. We hypothesize that a similar neural encoding might also exist for 3D shapes in motion and propose a hierarchy of dynamic medial axis structures at several spatio-temporal scales that can be modeled using a set of Linear Dynamical Systems (LDSs). We then propose novel discriminative metrics for comparing these sets of LDSs for the task of human activity recognition. Combined with simple classification frameworks, our proposed features and corresponding hierarchical dynamical models provide the highest human activity recognition rates compared to state-of-the-art methods on several skeletal datasets.
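
To make the modeling step concrete: each feature time series is summarized by a linear dynamical system x_{t+1} = A x_t, y_t = C x_t, and actions are compared through a distance between systems. The Python sketch below uses a common simplified recipe (PCA for C, least squares for A, principal angles between extended observability subspaces as the distance); it is not necessarily the authors' estimator or metric, and the toy signals are invented.

# Simplified LDS fit and comparison; not the paper's exact method.
import numpy as np

def fit_lds(Y, n_states=2):
    """Y: (T, D) observations. Returns (A, C) of a linear system."""
    Y = Y - Y.mean(axis=0)
    U, S, Vt = np.linalg.svd(Y, full_matrices=False)
    C = Vt[:n_states].T                      # (D, n) observation matrix
    X = U[:, :n_states] * S[:n_states]       # latent states, (T, n)
    W, *_ = np.linalg.lstsq(X[:-1], X[1:], rcond=None)
    return W.T, C                            # x_{t+1} = A x_t with A = W.T

def lds_distance(sys1, sys2, horizon=5):
    """Distance from principal angles between extended observability
    subspaces O = [C; CA; CA^2; ...]."""
    def obs(A, C):
        blocks, M = [], C
        for _ in range(horizon):
            blocks.append(M)
            M = M @ A
        Q, _ = np.linalg.qr(np.vstack(blocks))
        return Q
    s = np.linalg.svd(obs(*sys1).T @ obs(*sys2), compute_uv=False)
    return np.linalg.norm(np.arccos(np.clip(s, -1, 1)))

rng = np.random.default_rng(0)
t = np.linspace(0, 8 * np.pi, 200)
walk = np.c_[np.sin(t), np.cos(t), np.sin(2 * t)] + 0.05 * rng.normal(size=(200, 3))
jump = np.c_[np.sin(3 * t), np.cos(t / 2), t / t.max()] + 0.05 * rng.normal(size=(200, 3))
print("same:", lds_distance(fit_lds(walk), fit_lds(walk)).round(3),
      "different:", lds_distance(fit_lds(walk), fit_lds(jump)).round(3))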

194 citations


Cited by
01 Jan 2016
Remote Sensing and Image Interpretation

1,802 citations

Proceedings ArticleDOI
07 Jun 2015
TL;DR: This paper proposes an end-to-end hierarchical RNN for skeleton-based action recognition, and demonstrates that the model achieves state-of-the-art performance with high computational efficiency.
Abstract: Human actions can be represented by the trajectories of skeleton joints. Traditional methods generally model the spatial structure and temporal dynamics of the human skeleton with hand-crafted features and recognize human actions with well-designed classifiers. In this paper, considering that a recurrent neural network (RNN) can model the long-term contextual information of temporal sequences well, we propose an end-to-end hierarchical RNN for skeleton-based action recognition. Instead of taking the whole skeleton as the input, we divide the human skeleton into five parts according to the human physical structure, and then separately feed them to five subnets. As the number of layers increases, the representations extracted by the subnets are hierarchically fused to become the inputs of higher layers. The final representations of the skeleton sequences are fed into a single-layer perceptron, and the temporally accumulated output of the perceptron is the final decision. We compare with five other deep RNN architectures derived from our model to verify the effectiveness of the proposed network, and also compare with several other methods on three publicly available datasets. Experimental results demonstrate that our model achieves state-of-the-art performance with high computational efficiency.
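
A minimal PyTorch sketch of the part-based design described above: five body-part subnets encode their joints, the part representations are fused and passed to a single-layer classifier, and the classifier's outputs are accumulated over time. The paper fuses representations hierarchically over several stages; this sketch collapses that to one fusion stage, and all layer sizes are assumptions.

# Part-based recurrent sketch; a one-stage simplification of the
# paper's hierarchical fusion, with invented dimensions.
import torch
import torch.nn as nn

class PartBasedRNN(nn.Module):
    def __init__(self, joints_per_part=4, hidden=32, n_classes=10, n_parts=5):
        super().__init__()
        in_dim = joints_per_part * 3                      # xyz per joint
        self.part_rnns = nn.ModuleList(
            nn.LSTM(in_dim, hidden, batch_first=True) for _ in range(n_parts))
        self.fusion = nn.LSTM(n_parts * hidden, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, n_classes)    # single-layer perceptron

    def forward(self, parts):
        # parts: list of n_parts tensors, each (batch, time, joints_per_part*3)
        encoded = [rnn(p)[0] for rnn, p in zip(self.part_rnns, parts)]
        fused, _ = self.fusion(torch.cat(encoded, dim=2))
        logits = self.classifier(fused)                   # (batch, time, classes)
        return logits.mean(dim=1)                         # accumulate over time

model = PartBasedRNN()
parts = [torch.randn(8, 60, 12) for _ in range(5)]        # 8 clips, 60 frames
print(model(parts).shape)                                 # torch.Size([8, 10])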

1,642 citations

Journal ArticleDOI
TL;DR: This review suggests that deep learning approaches could be the vehicle for translating big biomedical data into improved human health, and calls for holistic, meaningful and interpretable architectures to bridge deep learning models and human interpretability.
Abstract: Gaining knowledge and actionable insights from complex, high-dimensional and heterogeneous biomedical data remains a key challenge in transforming health care. Various types of data have been emerging in modern biomedical research, including electronic health records, imaging, -omics, sensor data and text, which are complex, heterogeneous, poorly annotated and generally unstructured. Traditional data mining and statistical learning approaches typically need to first perform feature engineering to obtain effective and more robust features from those data, and then build prediction or clustering models on top of them. Both steps are challenging in scenarios with complicated data and a lack of sufficient domain knowledge. The latest advances in deep learning technologies provide new effective paradigms to obtain end-to-end learning models from complex data. In this article, we review the recent literature on applying deep learning technologies to advance the health care domain. Based on the analyzed work, we suggest that deep learning approaches could be the vehicle for translating big biomedical data into improved human health. However, we also note limitations and the need for improved methods development and applications, especially in terms of ease of understanding for domain experts and citizen scientists. We discuss such challenges and suggest developing holistic and meaningful interpretable architectures to bridge deep learning models and human interpretability.

1,573 citations

Journal ArticleDOI
TL;DR: A comprehensive review of recent Kinect-based computer vision algorithms and applications covering topics including preprocessing, object tracking and recognition, human activity analysis, hand gesture analysis, and indoor 3-D mapping.
Abstract: With the invention of the low-cost Microsoft Kinect sensor, high-resolution depth and visual (RGB) sensing has become available for widespread use. The complementary nature of the depth and visual information provided by the Kinect sensor opens up new opportunities to solve fundamental problems in computer vision. This paper presents a comprehensive review of recent Kinect-based computer vision algorithms and applications. The reviewed approaches are classified according to the type of vision problems that can be addressed or enhanced by means of the Kinect sensor. The covered topics include preprocessing, object tracking and recognition, human activity analysis, hand gesture analysis, and indoor 3-D mapping. For each category of methods, we outline their main algorithmic contributions and summarize their advantages/differences compared to their RGB counterparts. Finally, we give an overview of the challenges in this field and future research trends. This paper is expected to serve as a tutorial and source of references for Kinect-based computer vision researchers.

1,513 citations

Proceedings ArticleDOI
23 Jun 2014
TL;DR: A new skeletal representation that explicitly models the 3D geometric relationships between various body parts using rotations and translations in 3D space is proposed and outperforms various state-of-the-art skeleton-based human action recognition approaches.
Abstract: Recently introduced cost-effective depth sensors coupled with the real-time skeleton estimation algorithm of Shotton et al. [16] have generated a renewed interest in skeleton-based human action recognition. Most of the existing skeleton-based approaches use either the joint locations or the joint angles to represent a human skeleton. In this paper, we propose a new skeletal representation that explicitly models the 3D geometric relationships between various body parts using rotations and translations in 3D space. Since 3D rigid body motions are members of the special Euclidean group SE(3), the proposed skeletal representation lies in the Lie group SE(3)×…×SE(3), which is a curved manifold. Using the proposed representation, human actions can be modeled as curves in this Lie group. Since classification of curves in this Lie group is not an easy task, we map the action curves from the Lie group to its Lie algebra, which is a vector space. We then perform classification using a combination of dynamic time warping, Fourier temporal pyramid representation and linear SVM. Experimental results on three action datasets show that the proposed representation performs better than many existing skeletal representations. The proposed approach also outperforms various state-of-the-art skeleton-based human action recognition approaches.
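
The core representation step can be sketched directly: the relative rigid-body transform between two body parts is an element of SE(3), and the matrix logarithm maps it to the Lie algebra se(3), a vector space where standard classifiers apply. The Python sketch below builds crude bone frames and computes that 6D Lie-algebra vector; the frame construction and toy data are assumptions, and the rest of the pipeline (DTW, Fourier temporal pyramid, SVM) is omitted.

# Illustrative SE(3) relative transform and its se(3) coordinates.
import numpy as np
from scipy.linalg import logm

def se3_between(bone_a, bone_b):
    """Build a 4x4 SE(3) transform taking bone_a's frame to bone_b's.
    Each bone is (start, end) 3D points; frames are crude (z along the
    bone) just to keep the sketch short."""
    def frame(start, end):
        z = (end - start) / np.linalg.norm(end - start)
        tmp = np.array([1.0, 0, 0]) if abs(z[0]) < 0.9 else np.array([0, 1.0, 0])
        x = np.cross(tmp, z); x /= np.linalg.norm(x)
        y = np.cross(z, x)
        T = np.eye(4)
        T[:3, :3] = np.column_stack([x, y, z])
        T[:3, 3] = start
        return T
    return np.linalg.inv(frame(*bone_a)) @ frame(*bone_b)

def log_se3(T):
    """Map an SE(3) matrix to its 6D Lie-algebra coordinates."""
    xi = logm(T).real                                  # 4x4 matrix in se(3)
    omega = np.array([xi[2, 1], xi[0, 2], xi[1, 0]])   # rotation coordinates
    return np.concatenate([omega, xi[:3, 3]])          # + translational twist

upper_arm = (np.array([0.0, 0, 0]), np.array([0.0, 0, 0.3]))
forearm = (np.array([0.0, 0, 0.3]), np.array([0.2, 0, 0.5]))
print(log_se3(se3_between(upper_arm, forearm)).round(3))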

1,432 citations