Author

Roozbeh Jafari

Bio: Roozbeh Jafari is an academic researcher from Texas A&M University. The author has contributed to research in topics: Wireless sensor network & Wearable computer. The author has an h-index of 40 and has co-authored 224 publications receiving 5,901 citations. Previous affiliations of Roozbeh Jafari include the University of California and Cornell University.


Papers
Proceedings ArticleDOI
10 Dec 2015
TL;DR: Describes UTD-MHAD, a freely available dataset of four temporally synchronized data modalities (RGB video, depth video, skeleton positions, and inertial signals) captured by a Kinect camera and a wearable inertial sensor for a comprehensive set of 27 human actions.
Abstract: Human action recognition has a wide range of applications including biometrics, surveillance, and human computer interaction. The use of multimodal sensors for human action recognition is steadily increasing. However, there are limited publicly available datasets where depth camera and inertial sensor data are captured at the same time. This paper describes a freely available dataset, named UTD-MHAD, which consists of four temporally synchronized data modalities. These modalities include RGB videos, depth videos, skeleton positions, and inertial signals from a Kinect camera and a wearable inertial sensor for a comprehensive set of 27 human actions. Experimental results are provided to show how this database can be used to study fusion approaches that involve using both depth camera data and inertial sensor data. This public domain dataset is of benefit to multimodality research activities being conducted for human action recognition by various research groups.

606 citations
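For readers who download the dataset, the following is a minimal sketch of loading one trial's inertial signal in Python. The a{action}_s{subject}_t{trial}_inertial.mat file pattern and the d_iner variable name are assumptions about the released .mat files and should be verified against the actual download.

```python
# Minimal loader sketch for one UTD-MHAD trial (file/variable names assumed).
from pathlib import Path

import numpy as np
from scipy.io import loadmat

def load_inertial(root: str, action: int, subject: int, trial: int) -> np.ndarray:
    """Return one trial's inertial signal as an (n_samples, 6) array:
    3-axis accelerometer followed by 3-axis gyroscope (assumed layout)."""
    path = Path(root) / f"a{action}_s{subject}_t{trial}_inertial.mat"
    mat = loadmat(str(path))
    return np.asarray(mat["d_iner"], dtype=np.float64)  # assumed variable name

if __name__ == "__main__":
    signal = load_inertial("UTD-MHAD/Inertial", action=1, subject=1, trial=1)
    print(signal.shape)
```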

Journal ArticleDOI
TL;DR: Analyzes the most important requirements for an effective BSN-specific software framework enabling efficient signal-processing applications, and presents Signal Processing In Node Environment (SPINE), an open-source programming framework designed to support rapid, flexible prototyping and management of BSN applications.
Abstract: Wireless body sensor networks (BSNs) possess enormous potential for changing people's daily lives. They can enhance many human-centered application domains such as m-Health, sport and wellness, and human-centered applications that involve physical/virtual social interactions. However, there are still challenging issues that limit their wide diffusion in real life: primarily, the programming complexity of these systems, due to the lack of high-level software abstractions, and the hardware constraints of wearable devices. In contrast with low-level programming and general-purpose middleware, domain-specific frameworks are an emerging programming paradigm designed to address the lack of suitable BSN programming support by providing proper abstraction layers. This paper analyzes the most important requirements for an effective BSN-specific software framework, enabling efficient signal-processing applications. Specifically, we present signal processing in node environment (SPINE), an open-source programming framework designed to support rapid and flexible prototyping and management of BSN applications. We describe how SPINE efficiently addresses the identified requirements while providing performance analysis on the most common hardware/software sensor platforms. We also report a few high-impact BSN applications that have been entirely implemented using SPINE to demonstrate practical examples of its effectiveness and flexibility. This development experience has notably led to the definition of a SPINE-based design methodology for BSN applications. Finally, lessons learned from the development of such applications and from feedback received by the SPINE community are discussed.

388 citations
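SPINE itself is implemented in Java on the coordinator side with node-side components for TinyOS-class platforms, so the sketch below is not SPINE's API. It only illustrates, in Python, the pattern such frameworks abstract: computing windowed features on the node so that compact summaries, rather than raw samples, cross the radio. Window and shift sizes are illustrative.

```python
# Illustrative pattern only -- NOT SPINE's actual API (SPINE is Java/TinyOS).
from collections import deque
from statistics import mean, pstdev

WINDOW = 50  # samples per window, e.g. 1 s at 50 Hz (assumed)
SHIFT = 25   # 50% window overlap (assumed)

def feature_stream(samples):
    """Yield (mean, std) per sliding window of a 1-D sensor stream."""
    window = deque(maxlen=WINDOW)
    for i, x in enumerate(samples):
        window.append(x)
        if len(window) == WINDOW and (i + 1 - WINDOW) % SHIFT == 0:
            # ship a 2-value summary instead of WINDOW raw samples
            yield mean(window), pstdev(window)
```

A coordinator subscribing to such feature streams instead of raw samples is what cuts radio traffic and energy use; providing this kind of abstraction, along with sensor management, is the role the paper describes for SPINE.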

Journal ArticleDOI
TL;DR: The thrust of this survey is the utilization of depth cameras and inertial sensors, as these two types of sensors are cost-effective, commercially available, and, more significantly, both provide 3D human action data.
Abstract: A number of review or survey articles have previously appeared on human action recognition where either vision sensors or inertial sensors are used individually. Considering that each sensor modality has its own limitations, a number of previously published papers have shown that the fusion of vision and inertial sensor data improves the accuracy of recognition. This survey article provides an overview of the recent investigations where both vision and inertial sensors are used together and simultaneously to perform human action recognition more effectively. The thrust of this survey is on the utilization of depth cameras and inertial sensors, as these two types of sensors are cost-effective, commercially available, and, more significantly, both provide 3D human action data. An overview of the components necessary to achieve fusion of data from depth and inertial sensors is provided. In addition, a review of the publicly available datasets in which depth and inertial data are captured simultaneously is presented.

294 citations

Journal ArticleDOI
TL;DR: The results indicate that, because the data from these sensors are complementary, the introduced fusion approaches improve recognition rates by 2% to 23%, depending on the action, over using each sensor individually.
Abstract: This paper presents a fusion approach for improving human action recognition based on two differing modality sensors consisting of a depth camera and an inertial body sensor. Computationally efficient action features are extracted from depth images provided by the depth camera and from accelerometer signals provided by the inertial body sensor. These features consist of depth motion maps and statistical signal attributes. For action recognition, both feature-level fusion and decision-level fusion are examined by using a collaborative representation classifier. In the feature-level fusion, features generated from the two differing modality sensors are merged before classification, while in the decision-level fusion, the Dempster–Shafer theory is used to combine the classification outcomes from two classifiers, each corresponding to one sensor. The introduced fusion framework is evaluated using the Berkeley multimodal human action database. The results indicate that, because of the complementary aspect of the data from these sensors, the introduced fusion approaches yield recognition rate improvements of 2% to 23%, depending on the action, over using each sensor individually.

240 citations
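As a rough illustration of the feature-level path, the sketch below concatenates a crude depth-motion-map descriptor with per-axis accelerometer statistics. Both descriptors are simplified stand-ins for the paper's features, and any off-the-shelf classifier can then be trained on the fused vectors (the paper uses a collaborative representation classifier); the Dempster–Shafer decision-level path is omitted for brevity.

```python
# Simplified feature-level fusion sketch (stand-in descriptors, see note above).
import numpy as np

def dmm_features(depth_frames: np.ndarray, size: int = 16) -> np.ndarray:
    """Crude front-view depth motion map: accumulate absolute frame
    differences over a (T, H, W) clip, then subsample to a size x size grid."""
    motion = np.abs(np.diff(depth_frames.astype(np.float64), axis=0)).sum(axis=0)
    h, w = motion.shape
    small = motion[:: max(1, h // size), :: max(1, w // size)][:size, :size]
    return small.ravel()

def accel_features(accel: np.ndarray) -> np.ndarray:
    """Per-axis mean, standard deviation, and RMS of an (n, 3) accel signal."""
    return np.concatenate([accel.mean(0), accel.std(0),
                           np.sqrt((accel ** 2).mean(0))])

def fused_vector(depth_frames: np.ndarray, accel: np.ndarray) -> np.ndarray:
    """Feature-level fusion: merge both modalities before classification."""
    return np.concatenate([dmm_features(depth_frames), accel_features(accel)])
```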

Journal ArticleDOI
01 Apr 2009
TL;DR: Proposes the distributed sparsity classifier (DSC), a distributed recognition framework that classifies continuous human actions using a low-bandwidth wearable motion sensor network; experiments validate the framework's robustness on an unreliable wireless network and demonstrate DSC's ability to conserve sensor energy for communication while preserving accurate global classification.
Abstract: We propose a distributed recognition framework to classify continuous human actions using a low-bandwidth wearable motion sensor network, called the distributed sparsity classifier (DSC). The algorithm classifies human actions using a set of training motion sequences as prior examples. It is also capable of rejecting outlying actions that are not in the training categories. The classification is operated in a distributed fashion on individual sensor nodes and a base station computer. We model the distribution of multiple action classes as a mixture subspace model, one subspace for each action class. Given a new test sample, we seek the sparsest linear representation of the sample w.r.t. all training examples. We show that the dominant coefficients in the representation correspond only to the action class of the test sample, and hence its membership is encoded in the sparse representation. Fast linear solvers are provided to compute such representations via ℓ1-minimization. To validate the accuracy of the framework, a public wearable action recognition database is constructed, called the wearable action recognition database (WARD). The database comprises 20 human subjects performing 13 action categories. Using up to five motion sensors in the WARD database, DSC achieves state-of-the-art performance. We further show that the recognition precision degrades only gracefully as smaller subsets of active sensors are used. This validates the robustness of the distributed recognition framework on an unreliable wireless network. It also demonstrates the ability of DSC to conserve sensor energy for communication while preserving accurate global classification. (This work was partially supported by ARO MURI W911NF-06-1-0076, the NSF TRUST Center, and startup funding from the University of Texas and Texas Instruments.)

212 citations
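The classification core described here is the standard sparse-representation recipe: express the test sample as a sparse linear combination of all training samples and assign the class whose coefficients reconstruct it best. A compact sketch follows, with scikit-learn's Lasso standing in for the paper's ℓ1-minimization solver; the solver choice and the alpha value are assumptions.

```python
# Sparse-representation classification sketch; Lasso is a stand-in l1 solver.
import numpy as np
from sklearn.linear_model import Lasso

def src_classify(A: np.ndarray, labels: np.ndarray, y: np.ndarray,
                 alpha: float = 0.01) -> int:
    """A: (d, n) matrix whose n columns are training samples; labels: (n,)
    class ids; y: (d,) test sample. Returns the class whose coefficients
    give the smallest reconstruction residual."""
    x = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000).fit(A, y).coef_
    best_class, best_res = -1, np.inf
    for c in np.unique(labels):
        mask = labels == c
        res = np.linalg.norm(y - A[:, mask] @ x[mask])  # class-wise residual
        if res < best_res:
            best_class, best_res = int(c), res
    return best_class
```

Outlier actions can additionally be rejected when no single class's coefficients dominate the sparse representation, which mirrors the rejection mechanism described in the abstract.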


Cited by
Journal ArticleDOI
29 Apr 2010
TL;DR: This review paper highlights a few representative examples of how the interaction between sparse signal representation and computer vision can enrich both fields, and raises a number of open questions for further study.
Abstract: Techniques from sparse signal representation are beginning to see significant impact in computer vision, often on nontraditional applications where the goal is not just to obtain a compact high-fidelity representation of the observed signal, but also to extract semantic information. The choice of dictionary plays a key role in bridging this gap: unconventional dictionaries consisting of, or learned from, the training samples themselves provide the key to obtaining state-of-the-art results and to attaching semantic meaning to sparse signal representations. Understanding the good performance of such unconventional dictionaries in turn demands new algorithmic and analytical techniques. This review paper highlights a few representative examples of how the interaction between sparse signal representation and computer vision can enrich both fields, and raises a number of open questions for further study.

1,871 citations

Posted Content
TL;DR: In this paper, a large-scale dataset for RGB+D human action recognition was introduced with more than 56 thousand video samples and 4 million frames, collected from 40 distinct subjects.
Abstract: Recent approaches in depth-based human activity analysis achieved outstanding performance and proved the effectiveness of 3D representation for classification of action classes. Currently available depth-based and RGB+D-based action recognition benchmarks have a number of limitations, including the lack of training samples, distinct class labels, camera views and variety of subjects. In this paper we introduce a large-scale dataset for RGB+D human action recognition with more than 56 thousand video samples and 4 million frames, collected from 40 distinct subjects. Our dataset contains 60 different action classes including daily, mutual, and health-related actions. In addition, we propose a new recurrent neural network structure to model the long-term temporal correlation of the features for each body part, and utilize them for better action classification. Experimental results show the advantages of applying deep learning methods over state-of-the-art hand-crafted features on the suggested cross-subject and cross-view evaluation criteria for our dataset. The introduction of this large scale dataset will enable the community to apply, develop and adapt various data-hungry learning techniques for the task of depth-based and RGB+D-based human activity analysis.

1,448 citations

Proceedings ArticleDOI
01 Jun 2016
TL;DR: A large-scale dataset for RGB+D human action recognition with more than 56 thousand video samples and 4 million frames, collected from 40 distinct subjects is introduced and a new recurrent neural network structure is proposed to model the long-term temporal correlation of the features for each body part, and utilize them for better action classification.
Abstract: Recent approaches in depth-based human activity analysis achieved outstanding performance and proved the effectiveness of 3D representation for classification of action classes. Currently available depth-based and RGB+D-based action recognition benchmarks have a number of limitations, including the lack of training samples, distinct class labels, camera views and variety of subjects. In this paper we introduce a large-scale dataset for RGB+D human action recognition with more than 56 thousand video samples and 4 million frames, collected from 40 distinct subjects. Our dataset contains 60 different action classes including daily, mutual, and health-related actions. In addition, we propose a new recurrent neural network structure to model the long-term temporal correlation of the features for each body part, and utilize them for better action classification. Experimental results show the advantages of applying deep learning methods over state-of-the-art hand-crafted features on the suggested cross-subject and cross-view evaluation criteria for our dataset. The introduction of this large scale dataset will enable the community to apply, develop and adapt various data-hungry learning techniques for the task of depth-based and RGB+D-based human activity analysis.

1,391 citations
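A hedged PyTorch sketch of the part-based recurrent idea follows: one LSTM per group of skeleton joints, with the per-part states merged for the final prediction. The five-way joint split, layer sizes, and the use of plain LSTMs are illustrative choices, not the paper's exact architecture.

```python
# Illustrative part-based recurrent model (not the paper's exact design).
import torch
import torch.nn as nn

class PartAwareRNN(nn.Module):
    def __init__(self, joints_per_part=(5, 5, 5, 5, 5), hidden=64, classes=60):
        super().__init__()
        # one LSTM per body part, each consuming that part's 3-D joints
        self.parts = nn.ModuleList(
            nn.LSTM(input_size=3 * j, hidden_size=hidden, batch_first=True)
            for j in joints_per_part
        )
        self.splits = list(joints_per_part)
        self.head = nn.Linear(hidden * len(joints_per_part), classes)

    def forward(self, skel):  # skel: (batch, time, 25 joints, 3)
        parts = torch.split(skel, self.splits, dim=2)
        states = []
        for part, lstm in zip(parts, self.parts):
            b, t = part.shape[:2]
            _, (h, _) = lstm(part.reshape(b, t, -1))
            states.append(h[-1])  # final hidden state of this part's LSTM
        return self.head(torch.cat(states, dim=1))

model = PartAwareRNN()
logits = model(torch.randn(2, 30, 25, 3))  # 2 clips, 30 frames, 25 joints
```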

Journal ArticleDOI
TL;DR: The current state-of-the-art of WBANs is surveyed based on the latest standards and publications, and open issues and challenges within each area are explored as a source of inspiration towards future developments in WBANs.
Abstract: Recent developments and technological advancements in wireless communication, MicroElectroMechanical Systems (MEMS) technology, and integrated circuits have enabled low-power, intelligent, miniaturized, invasive/non-invasive micro- and nano-technology sensor nodes strategically placed in or around the human body to be used in various applications, such as personal health monitoring. This exciting new area of research is called Wireless Body Area Networks (WBANs) and leverages the emerging IEEE 802.15.6 and IEEE 802.15.4j standards, specifically standardized for medical WBANs. The aim of WBANs is to simplify and improve the speed, accuracy, and reliability of communication of sensors/actuators within, on, and in the immediate proximity of a human body. The vast scope of challenges associated with WBANs has led to numerous publications. In this paper, we survey the current state-of-the-art of WBANs based on the latest standards and publications. Open issues and challenges within each area are also explored as a source of inspiration towards future developments in WBANs.

1,359 citations