Showing papers in "IEEE Transactions on Human-Machine Systems in 2023"


Journal ArticleDOI
TL;DR: In this paper, a threshold-free method using a long short-term memory recurrent neural network is constructed to segment four typical gait phases in a gait sequence for temporal parameter analysis.
Abstract: Gait analysis is a promising tool for clinical evaluation and diagnosis. In this article, a portable gait analysis system based on foot-mounted inertial sensors is established. A threshold-free method using a long short-term memory recurrent neural network is constructed to segment four typical gait phases in a gait sequence for temporal parameter analysis. Segmentation accuracy reaches over 95% across recruited subjects with distinct gait patterns, which is significantly superior to traditional machine learning methods. The zero-velocity indicator is then generated from the segmented sequence to accomplish the zero-velocity update for spatial parameter calculation. The accuracy of the proposed system is also validated against an OptiTrack motion-capture system in the laboratory. The comparison of stride length shows that the error between the two systems is less than 2%, which demonstrates that our system can satisfy clinical demands.
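
For illustration only, the following minimal PyTorch sketch shows the kind of sequence labeller the abstract describes: a bidirectional LSTM that assigns one of four gait phases to each IMU sample. Channel count, hidden size, and window length are assumptions, not values from the paper.

```python
import torch
import torch.nn as nn

class GaitPhaseLSTM(nn.Module):
    """Sequence labeller assigning one of four gait phases to every IMU sample."""
    def __init__(self, n_channels=6, hidden=64, n_phases=4):
        super().__init__()
        self.lstm = nn.LSTM(n_channels, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_phases)

    def forward(self, x):            # x: (batch, time, channels)
        h, _ = self.lstm(x)
        return self.head(h)          # (batch, time, n_phases) phase logits

model = GaitPhaseLSTM()
imu = torch.randn(8, 200, 6)         # 8 gait sequences, 200 samples, 3-axis acc + 3-axis gyro
phases = model(imu).argmax(dim=-1)   # per-sample phase labels 0..3 (threshold-free segmentation)
```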

5 citations


Journal ArticleDOI
TL;DR: In this paper, a system that uses a smartphone with an off-the-shelf WiFi router for human activity recognition at various scales is presented, where the smartphone is configured with customized firmware and developed software for capturing WiFi channel state information (CSI) data.
Abstract: In this article, we present a system that uses a smartphone with an off-the-shelf WiFi router for human activity recognition at various scales. The router serves as a hotspot transmitting WiFi packets. The smartphone is configured with customized firmware and developed software for capturing WiFi channel state information (CSI) data. We extract features from the CSI data associated with specific human activities and use them to classify the activities with machine learning models. To evaluate system performance, we test 20 types of human activities at different scales, including seven small motions, four medium motions, and nine big motions. We recruit 60 participants, spend 140 hours on data collection under various experimental settings, and collect 36 000 data points in total. For comparison, we adopt three distinct machine learning models: convolutional neural networks (CNNs), decision tree, and long short-term memory. The results demonstrate that our system can predict these human activities with an overall accuracy of 97.25%. Specifically, our system achieves a mean accuracy of 97.57% for recognizing small-scale motions, which are particularly useful for gesture recognition. We then consider the adaptability of the machine learning algorithms in classifying the motions, where the CNN achieves the best prediction accuracy. As a result, our system enables human activity recognition in a more ubiquitous and mobile fashion that can potentially enhance a wide range of applications such as gesture control and sign language recognition.
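
As a hedged sketch of the general pipeline, the snippet below computes simple statistics over a CSI amplitude window and feeds them to a decision tree, one of the three classifiers the paper compares. The window length, subcarrier count, and feature set are assumptions, and the data here are synthetic placeholders.

```python
import numpy as np
from scipy.stats import skew, kurtosis
from sklearn.tree import DecisionTreeClassifier

def csi_features(window):
    """window: (time, subcarriers) CSI amplitude; simple per-subcarrier statistics."""
    stats = [window.mean(0), window.std(0), window.max(0), window.min(0),
             skew(window, axis=0), kurtosis(window, axis=0)]
    return np.concatenate(stats)

rng = np.random.default_rng(0)
X = np.stack([csi_features(rng.normal(size=(300, 30))) for _ in range(200)])  # placeholder windows
y = rng.integers(0, 20, size=200)                                             # 20 activity classes
clf = DecisionTreeClassifier().fit(X, y)
print(clf.predict(X[:5]))
```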

4 citations


Journal ArticleDOI
TL;DR: In this article, the influence of different physical human-exoskeleton interfaces on subjective and objective biomechanical parameters was quantified, and the authors found that increasing the interaction area is necessary to improve interaction quality at a subjective level.
Abstract: Despite exoskeletons becoming widespread tools in industrial applications, the impact of the design of human–exoskeleton physical interfaces has received little attention. This study aims at thoroughly quantifying the influence of different physical human–exoskeleton interfaces on subjective and objective biomechanical parameters. To this aim, 18 participants performed elbow flexion/extension movements while wearing an active exoskeleton with three different physical interfaces: a strap with no degrees of freedom, and a thermoformed orthosis with either one degree of freedom (translation) or three degrees of freedom (translation and rotations). Interaction efforts, kinematic parameters, electromyographic activities, and subjective feelings were collected and examined during the experiment. Results showed that increasing the interaction area is necessary to improve the interaction quality at a subjective level. The addition of passive degrees of freedom yields significant improvements in both subjective and objective measurements. Outcomes of this study may provide fundamental insights for selecting physical interfaces when designing future exoskeletons.

3 citations


Journal ArticleDOI
TL;DR: Zhang et al. proposed an efficient hand gesture detection framework based on deep learning for dexterous robot hand-based visual teleoperation using an RGB-D camera, which achieved high accuracy with fast speed on public and custom hand datasets.
Abstract: Aiming at the problems of accurate and fast hand gesture detection and teleoperation mapping in the hand-based visual teleoperation of dexterous robots, an efficient hand gesture detection framework based on deep learning is proposed in this article. It can achieve accurate and fast hand gesture detection and teleoperation of dexterous robots based on an anchor-free network architecture by using an RGB-D camera. First, an RGB-D early-fusion method based on the HSV space is proposed, effectively reducing background interference and enhancing hand information. Second, a hand detection and localization network is proposed that detects the center and corner points of hands, and a hand gesture classification network (HandClasNet) is proposed to realize gesture recognition using a parallel EfficientNet structure. Then, a dexterous robot hand-arm teleoperation system based on the hand gesture detection framework is designed to realize the hand-based teleoperation of a dexterous robot. Our method achieves high accuracy with fast speed on public and custom hand datasets and outperforms some state-of-the-art methods. In addition, applying the proposed method in the hand-based teleoperation system allows a dexterous hand-arm system to grasp various objects accurately and in real time, which verifies the efficiency of our method.
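
The abstract names an HSV-space RGB-D early-fusion step but gives no formula; the sketch below shows one plausible way such a fusion could work, substituting a range-gated depth map for the value channel so hands in the working range stand out. The depth range, channel assignment, and function name are assumptions, not the paper's method.

```python
import cv2
import numpy as np

def hsv_depth_early_fusion(bgr, depth, d_near=400, d_far=1200):
    """Illustrative HSV-space RGB-D early fusion: keep hue/saturation from colour and
    replace the value channel with a normalised, range-gated depth map (nearer = brighter)."""
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    d = np.clip((depth.astype(np.float32) - d_near) / (d_far - d_near), 0.0, 1.0)
    hsv[..., 2] = (255 * (1.0 - d)).astype(np.uint8)
    return hsv

bgr = np.zeros((480, 640, 3), np.uint8)
depth = np.full((480, 640), 800, np.uint16)    # depth in millimetres from an RGB-D camera
fused = hsv_depth_early_fusion(bgr, depth)
```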

3 citations


Journal ArticleDOI
TL;DR: In this article, an underwater attentional generative adversarial network (UAGAN) is proposed to suppress unhelpful underwater noise features and effectively avoid overenhancement.
Abstract: In this article, an underwater attentional generative adversarial network (UAGAN) is established to suppress unhelpful underwater noise features while avoiding overenhancement. The main contributions are as follows: combining dense concatenation with global maximum and average pooling, a cascade dense-channel attention (CDCA) module is devised to adaptively distinguish noise features and recalibrate channel weights so that low-contribution feature maps can be effectively suppressed; to capture long-range dependencies between any two nonlocal spatial patches, a position attention (PA) module is created so that deviations among independent patches can be eliminated, thereby avoiding overenhancement; and, in conjunction with the CDCA and PA modules, the entire UAGAN framework is developed in an end-to-end manner. Comprehensive experiments on the underwater image enhancement benchmark (UIEB) and underwater robot professional contest (URPC) datasets demonstrate the effectiveness and superiority of the proposed UAGAN scheme in comparison with typical underwater image enhancement approaches, including the unsupervised color correction method, image blurriness and light absorption, underwater dark channel prior, underwater generative adversarial network, underwater convolutional neural network, and WaterNet, in terms of peak signal-to-noise ratio, underwater color image quality evaluation, underwater image quality measures, etc.
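
To make the channel-attention idea concrete, here is a generic PyTorch block that combines global average and max pooling to recalibrate channel weights, in the spirit of the CDCA module. The actual CDCA architecture (dense concatenation, cascading) is not reproduced, and all sizes are assumptions.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Generic channel attention: pool globally (avg and max), score channels, rescale."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))

    def forward(self, x):                       # x: (N, C, H, W)
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        w = torch.sigmoid(avg + mx).unsqueeze(-1).unsqueeze(-1)
        return x * w                            # low-contribution channels are attenuated

feat = torch.randn(2, 64, 32, 32)
out = ChannelAttention(64)(feat)
```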

3 citations


Journal ArticleDOI
TL;DR: In this article, the authors propose a new weighting-based deep ensemble model for recognizing interventionalists' hand motions in manual and robotic intravascular catheterization, which achieves 97.52% and 47.80% recognition performance on test samples from the in-vitro and in-vivo data, respectively.
Abstract: Robot-assisted intravascular interventions have evolved as a unique treatment approach for cardiovascular diseases. However, the technology currently offers limited support for catheterization skill evaluation, has a slow learning curve, and cannot transfer the experience gained from manual interventions. This study proposes a new weighting-based deep ensemble model for recognizing interventionalists' hand motions in manual and robotic intravascular catheterization. The model has a module of neural layers for extracting features from electromyography data and an ensemble of machine learning methods for classifying interventionalists' hand gestures as one of the six hand motions used during catheterization. A soft-weighting technique is applied to guide the contribution of each base learner. The model is validated with electromyography data recorded during in-vitro and in-vivo trials and labeled as many-to-one sequences. The results show the proposed model achieves 97.52% and 47.80% recognition performance on test samples from the in-vitro and in-vivo data, respectively. For the latter, transfer learning was applied to update weights from the in-vitro data, and the retrained model was used to recognize the hand motions in the in-vivo data. The weighting-based ensemble was evaluated against the base learners, and the results show it has a more stable performance across the six hand motion classes. The proposed model was also compared with four existing methods used for hand motion recognition in intravascular catheterization; the results show our model has the best recognition performance for both the in-vitro and in-vivo catheterization datasets. This study is a step toward increasing interventionalists' skills in robot-assisted catheterization.
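
A minimal NumPy sketch of the soft-weighting idea: per-learner class probabilities are averaged with normalized weights before the argmax. The number of learners, the six-class setup, and the weight values are illustrative assumptions; the paper's weighting scheme may differ.

```python
import numpy as np

def soft_weighted_vote(probas, weights):
    """Fuse per-learner class probabilities (L, N, C) with soft weights (L,)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    fused = np.tensordot(w, probas, axes=1)     # (N, C) weighted average of probabilities
    return fused.argmax(axis=1)

rng = np.random.default_rng(1)
probas = rng.dirichlet(np.ones(6), size=(3, 10))  # 3 base learners, 10 samples, 6 hand motions
weights = [0.5, 0.3, 0.2]                          # e.g., proportional to validation accuracy
print(soft_weighted_vote(probas, weights))
```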

3 citations


Journal ArticleDOI
TL;DR: In this paper , the authors highlight contemporary methods used for hand tracking and gesture recognition by collecting publications of systems developed in the last decade, that employ contactless devices as RGB cameras, IR, and depth sensors, along with some preceding pillar works.
Abstract: Hand tracking and gesture recognition are fundamental in a multitude of applications. Various sensors have been used for this purpose, however, all monocular vision systems face limitations caused by occlusions. Wearable equipment overcome said limitations, although deemed impractical in some cases. Using more than one sensor provides a way to overcome this problem, but necessitates more complicated designs. In this work, we aim to highlight contemporary methods used for hand tracking and gesture recognition by collecting publications of systems developed in the last decade, that employ contactless devices as RGB cameras, IR, and depth sensors, along with some preceding pillar works. Additionally, we briefly present common steps, techniques, and basic algorithms used during the process of developing modern hand tracking and gesture recognition systems and, finally, we derive the trend for the next future.

3 citations


Journal ArticleDOI
TL;DR: In this paper, an eye-brain hybrid brain-computer interface (BCI) interaction system is introduced for intention detection through the fusion of multimodal eye-tracker and event-related potential (ERP) features.
Abstract: Intention decoding is an indispensable procedure in hands-free human–computer interaction (HCI). A conventional eye-tracker system using a single-model fixation duration may issue commands that ignore users' real expectations. Here, an eye-brain hybrid brain–computer interface (BCI) interaction system was introduced for intention detection through the fusion of multimodal eye-tracker and event-related potential (ERP) [a measurement derived from electroencephalography (EEG)] features. Eye-tracking and EEG data were recorded from 64 healthy participants as they performed a 40-min customized free search task of a fixed target icon among 25 icons. The corresponding fixation duration of eye tracking and ERP were extracted. Five previously validated linear discriminant analysis (LDA)-based classifiers [including regularized LDA, stepwise LDA, Bayesian LDA, shrinkage linear discriminant analysis (SKLDA), and spatial-temporal discriminant analysis] and the widely used convolutional neural network (CNN) method were adopted to verify the efficacy of feature fusion in both offline and pseudo-online analysis, and the optimal approach was evaluated by modulating the training set and system response duration. Our study demonstrated that the input of multimodal eye-tracking and ERP features achieved superior intention-detection performance in single-trial classification of active search tasks. Compared with the single-model ERP feature, this new strategy also yielded consistent accuracy across classifiers. Moreover, in comparison with other classification methods, SKLDA exhibited superior performance when fusing features in offline tests (ACC = 0.8783, AUC = 0.9004) and online simulations with various sample amounts and duration lengths. In summary, this study revealed a novel and effective approach for intention classification using an eye-brain hybrid BCI and further supports the real-life application of hands-free HCI in a more precise and stable manner.
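
As a hedged illustration of the feature-fusion step, the snippet below concatenates ERP and fixation-duration features and trains scikit-learn's shrinkage LDA, a stand-in for the SKLDA classifier named in the abstract, on synthetic data. Feature dimensions and labels are assumptions.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(2)
erp = rng.normal(size=(400, 32))      # placeholder ERP features (e.g., downsampled epochs)
fix = rng.normal(size=(400, 1))       # placeholder fixation-duration feature per trial
y = rng.integers(0, 2, size=400)      # target vs. non-target icon

X = np.hstack([erp, fix])             # multimodal fusion by feature concatenation
clf = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto")  # shrinkage LDA
clf.fit(X[:300], y[:300])
print(clf.score(X[300:], y[300:]))
```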

Journal ArticleDOI
TL;DR: In this article, the authors present a systematic method for the design of limited information shared control (LISC), used in applications such as partially human-controlled systems, in which some subsystems are fully controlled by automation while others are controlled by a human.
Abstract: This paper presents a systematic method for the design of a limited information shared control (LISC). LISC is used in applications where not all system states or reference trajectories are measurable by the automation. Typical examples are partially human-controlled systems, in which some subsystems are fully controlled by automation while others are controlled by a human. The proposed design method uses a novel class of games to model human-machine interaction: near potential differential games (NPDGs). We provide a necessary and sufficient condition for the existence of an NPDG and derive an algorithm for finding an NPDG that completely describes a given differential game. The proposed design method is applied to the control of a large vehicle-manipulator system, in which the manipulator is controlled by a human operator and the vehicle is fully automated. The suitability of the NPDG for modeling differential games is verified in simulations, leading to a faster and more accurate controller design compared with manual tuning. Furthermore, the overall design process is validated in a study with sixteen test subjects, indicating the applicability of the proposed concept in real applications.

Journal ArticleDOI
TL;DR: In this paper, the authors compare an optical-based AR setup and a semi-immersive setup based on a VR table for anatomy training, using the same anatomy training software and the same interaction system.
Abstract: Research on alternative ways to provide anatomy learning and training has increased over the past few years, especially since the COVID-19 pandemic. Virtual reality (VR) and augmented reality (AR) represent two promising alternatives in this regard. For this reason, in this work, we analyze the suitability of VR and AR for anatomy training, comparing an optical-based AR setup and a semi-immersive setup based on a VR table, using the same anatomy training software and the same interaction system. The AR-based setup uses a Magic Leap One, whereas the VR table is configured with stereoscopic TV displays and a motion-capture system. This experiment builds on a previous one (Vergel et al., 2020); compared with that study, we have improved the AR-based setup and increased the complexity of one of the two tasks. The goal of this new experiment is to confirm whether the changes made to the setups modify the previous conclusions. Our hypothesis is that the improved AR-based setup will be more suitable for anatomy training than the VR-based setup. To test this, we conducted an experimental study with 45 participants comparing the use of the anatomy training software. Objective and subjective data were collected. The results show that the AR-based setup is the preferred choice. The differences in measurable performance were small but also favorable to the AR setup. In addition, participants provided better subjective ratings for the AR-based setup, confirming our initial hypothesis. Nevertheless, both setups offer similar overall performance and provide excellent results in the subjective measures, with both systems approaching the highest possible values.

Journal ArticleDOI
TL;DR: In this paper, the authors classify six types of eye movements from electromyogram signals of the extraocular muscles using the Fourier-Bessel series expansion-based empirical wavelet transform (FBSE-EWT) together with time- and frequency-domain (TAFD) features.
Abstract: Accurate automated eye movement classification is gaining importance in the field of human–computer interaction (HCI). The present article aims at classifying six types of eye movements from electromyogram (EMG) signals of the extraocular muscles (EOM) using the Fourier–Bessel series expansion-based empirical wavelet transform (FBSE-EWT) with time- and frequency-domain (TAFD) features. The FBSE-EWT of EMG signals yields Fourier–Bessel intrinsic mode functions (FBIMFs), which correspond to the frequency contents of the signal. A hybrid approach is used to select the prominent FBIMFs, followed by statistical and signal-complexity-based feature extraction. Furthermore, metaheuristic optimization algorithms are employed to reduce the feature space dimension. The discrimination ability of the reduced feature set is verified by the Kruskal–Wallis statistical test. A multiclass support vector machine (MSVM) is employed for classification. Classification is first performed with TAFD features and then with the combination of TAFD and the FBSE-EWT-based reduced feature set, which provides good classification performance. This study demonstrates the efficacy of FBSE-EWT and subsequent metaheuristic feature selection algorithms in classifying eye movements from EMG of EOM signals. The combination of TAFD and the features selected through the salp swarm optimization algorithm provides a maximum classification accuracy of 98.91% with MSVM employing Gaussian and radial basis function kernels. Thus, the proposed approach has the potential to be used in HCI applications involving biomedical signals.
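
The sketch below shows only the downstream part of such a pipeline: simple time- and frequency-domain features computed per EMG segment and fed to an RBF-kernel multiclass SVM. The FBSE-EWT decomposition and the metaheuristic feature selection are omitted, and the feature list, sampling rate, and data are assumptions.

```python
import numpy as np
from scipy.signal import welch
from sklearn.svm import SVC

def tafd_features(sig, fs=250):
    """A few time- and frequency-domain features for one EMG segment."""
    f, pxx = welch(sig, fs=fs, nperseg=128)
    return np.array([sig.mean(), sig.std(), np.abs(sig).mean(), np.ptp(sig),
                     pxx.sum(), f[np.argmax(pxx)]])

rng = np.random.default_rng(3)
X = np.stack([tafd_features(rng.normal(size=500)) for _ in range(120)])
y = rng.integers(0, 6, size=120)                  # six eye-movement classes
clf = SVC(kernel="rbf").fit(X, y)                 # one-vs-one multiclass SVM
print(clf.predict(X[:5]))
```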

Journal ArticleDOI
TL;DR: In this paper, a user study with experienced workers from the shop floor (n = 25) was conducted to evaluate workers' preferences in a practical context of human-robot interaction (HRI) in assembly.
Abstract: Collaborative industrial robotic arms (cobots) are integrated into industrial assembly systems, relieving their human coworkers of monotonous tasks and achieving productivity gains. The question of task allocation arises in the organization of these human–robot interactions. The state of the art shows static, compensatory task allocation approaches in current assembly systems and flexible, adaptive task sharing (ATS) approaches in human factors research. The latter should exploit the economic and ergonomic advantages of cobot usage. Previous research has not provided clear insight into whether industrial workers prefer static or adaptive task allocation or which tasks workers prefer to assign to cobots. Therefore, we set up a cobot demonstrator with a realistic industrial assembly use case and conducted a user study with experienced workers from the shop floor (n = 25). The aim of the user study is to provide a systematic understanding and evaluation of workers' preferences in a practical context of human–robot interaction (HRI) in assembly. Our main findings are that participants preferred the ATS concept to a predetermined task allocation and reported increased satisfaction with the allocation. Results show that participants are more likely to give manual tasks to the cobot than cognitive tasks, indicating that workers do not entrust all tasks to robots but like to take over cognitive tasks themselves. This work contributes to the design of human-centered HRI in industrial assembly systems.

Journal ArticleDOI
TL;DR: In this article, a human-aware haptic feedback pipeline that renders distinguishable and interpretable physical interaction based on contact states is explored, addressing the mismatch between low-level modulated sensor signals and human-aware haptic stimuli.
Abstract: Teleoperation with haptic feedback is especially useful for contact-rich manipulation. Humans can easily perceive physical interaction events and effects of hands-on manipulation. However, current measurement-based haptic rendering methods with distorting and confusing transmission limit teleoperation performance. The major problem is a mismatch between low-level modulated sensor signals and human-aware haptic stimuli. We explore a human-aware haptic feedback pipeline that renders a distinguishable and interpretable physical interaction based on the contact states. Manipulation tasks are modeled as time-series sequences of four contact states: 1) noncontact, 2) contact, 3) stick, and 4) slip. The temporal convolutional network model fuses force/torque sensor and velocity signals in real time to identify the four contact states with a 91.3% accuracy. Meanwhile, state-dependent haptic feedback, which combines transient and continuous feedback, brings more cues for physical interaction events and effects, corresponding to human fast- and slow-adapting receptors. We formulate a two-peak waveform for transient feedback based on a second-order superimposed exponentially decaying sinusoid model and adopt the orthogonal decomposition filter method for "inequable" continuous feedback. We demonstrate the effectiveness of our method through contact state and teleoperation experiments under different haptic conditions. The results indicate that the proposed method helps the operator to perceive and understand physical interaction and significantly improves teleoperation performance.
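
The transient cue is described as a second-order superimposed exponentially decaying sinusoid; the sketch below generates one plausible two-peak waveform of that form. The amplitudes, frequencies, decay constants, and inter-peak delay are illustrative assumptions, not values from the paper.

```python
import numpy as np

def transient_waveform(t, a=(1.0, 0.5), f=(150.0, 250.0), tau=(0.01, 0.02), delay=0.015):
    """Sum of two exponentially decaying sinusoids, the second delayed to form a second peak."""
    w1 = a[0] * np.exp(-t / tau[0]) * np.sin(2 * np.pi * f[0] * t)
    t2 = np.clip(t - delay, 0.0, None)                  # second component starts after `delay`
    w2 = a[1] * np.exp(-t2 / tau[1]) * np.sin(2 * np.pi * f[1] * t2) * (t >= delay)
    return w1 + w2

t = np.linspace(0.0, 0.1, 1000)                         # 100 ms of transient feedback
signal = transient_waveform(t)
```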


Journal ArticleDOI
TL;DR: In this paper, a multimodal sensing approach is used to monitor situation awareness (SA) in automated driving scenarios; a neural network model achieved the best classification accuracy (90.6%).
Abstract: Maintaining situation awareness (SA) is essential for drivers to deal with situations that Society of Automotive Engineers (SAE) Level 3 automated vehicle systems are not designed to handle. Although advanced physiological sensors can enable continuous SA assessment, previous single-modality approaches may not be sufficient to capture SA. To address this limitation, the current study demonstrates a multimodal sensing approach for objective SA monitoring. Physiological sensor data from electroencephalography and eye tracking were recorded for 30 participants as they performed three secondary tasks during automated driving scenarios consisting of a pre-takeover-request (pre-TOR) segment and a post-TOR segment. The tasks varied in how visual attention was allocated in the pre-TOR segment. In the post-TOR segment, drivers were expected to gather information from the driving environment in preparation for a vehicle-to-driver transition. Participants' ground-truth SA level was measured using the Situation Awareness Global Assessment Technique (SAGAT) after the post-TOR segment. A total of 23 physiological features were extracted from the post-TOR segment to train computational intelligence models. Results compared the performance of five different classifiers, the ground-truth labeling strategies, and the features included in the model. Overall, the proposed neural network model outperformed the other machine learning models and achieved the best classification accuracy (90.6%); a model with 11 features was optimal. In addition, the multimodal physiological sensing model outperformed the single-sensor model in prediction performance. Our results suggest that a multimodal sensing model can objectively predict SA and provide new insight into how physiological features contribute to SA assessment.
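
A hedged sketch of the classification stage: standardized EEG and eye-tracking features feeding a small neural network classifier (scikit-learn's MLPClassifier here as a stand-in for the paper's model). The 11-feature input mirrors the optimal feature count reported, but the data, the binary labeling, and the architecture are assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 11))           # 11 selected EEG + eye-tracking features per trial
y = rng.integers(0, 2, size=300)         # low vs. high SA label (illustrative labeling only)

model = make_pipeline(StandardScaler(),
                      MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500))
model.fit(X[:240], y[:240])
print(model.score(X[240:], y[240:]))
```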

Journal ArticleDOI
TL;DR: In this article, the effects of misalignment between the camera frame and the operator frame in a teleoperated eye-in-hand robot are investigated, and a simple correction method in the view display is proposed.
Abstract: Misalignment between the camera frame and the operator frame is common in teleoperated systems and usually degrades operation performance. The effects of such misalignment have not been fully investigated for eye-in-hand systems, i.e., systems with the camera (eye) mounted on the end-effector (hand) to gain compactness in confined spaces such as endoscopic surgery. This paper provides a systematic study of the effects of camera frame misalignment in a teleoperated eye-in-hand robot and proposes a simple correction method in the view display. A simulation is designed to compare the effects of the misalignment under different conditions. Users are asked to move a rigid body from its initial position to a specified target position via teleoperation, with different levels of misalignment simulated. It is found that misalignment between the input motion and the output view is much more difficult for operators to compensate when it is in the orthogonal direction (~40 s) than in the opposite direction (~20 s). An experiment on a real concentric tube robot with an eye-in-hand configuration is also conducted, in which users telemanipulate the robot to complete a pick-and-place task. Results show that with the correction enabled, there is a significant improvement in operation performance in terms of completion time (mean 40.6%, median 38.6%), trajectory length (mean 34.3%, median 28.1%), difficulty (50.5%), unsteadiness (49.4%), and mental stress (60.9%).
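
To make the frame-alignment idea concrete, here is a minimal sketch that rotates an operator's translation command from the displayed-view frame into the robot base frame when the eye-in-hand camera is yawed by a known angle. The paper's correction operates on the view display and may differ; the function, axis choice, and angle here are assumptions.

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def correct_input(v_operator, cam_yaw_deg):
    """Map a translation command expressed in the displayed-view frame into the base frame,
    compensating a known yaw rotation of the eye-in-hand camera about the vertical axis."""
    R_base_cam = R.from_euler("z", cam_yaw_deg, degrees=True)
    return R_base_cam.apply(v_operator)

cmd_in_view = np.array([1.0, 0.0, 0.0])         # "move right" as seen on the display
cmd_in_base = correct_input(cmd_in_view, 90)    # orthogonal misalignment example
print(cmd_in_base)                              # approximately [0, 1, 0]
```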

Journal ArticleDOI
TL;DR: In this article, a hip exoskeleton with a parallel structure is developed that eliminates the joint misalignment problem and enables unrestricted walking, together with a model-based controller that coordinates the actuation within the parallel structure.
Abstract: High motion compatibility with the human body is essential for lower limb exoskeletons. However, most exoskeletons do not provide internal/external rotational degrees of freedom, which makes accurate alignment between the biological and mechanical joints difficult to achieve. To solve this problem, a novel hip exoskeleton with a parallel structure is developed in this article. The unique parallel structure eliminates the misalignment problem and enables walking free of restrictions. On the other hand, it requires coordinated control of the actuators within the parallel exoskeleton structure. In this light, a model-based controller is proposed. The controller is based on a human–machine integrated dynamic model and generates coordinated force control references that increase the closed-loop system's sensitivity to its wearer's movements. The controller requires only kinematic information from the wearer, not the interaction force data that most existing exoskeletons require in their control design, which saves space and makes the system compact. Experiments were conducted to demonstrate the kinematic compatibility and assistive performance of the proposed hip exoskeleton.

Journal ArticleDOI
TL;DR: Zhang et al. proposed a deep learning model consisting of a convolutional part, a long short-term memory layer, and a multilabel output layer for predicting which activities will occur in a coming time period in a smart home.
Abstract: Activity prediction aims to predict what activities will occur in the future. In smart homes, automated or assistive services are provided to facilitate the daily living of the residents, and activity prediction is necessary to provide these services. Most existing works focus on predicting information about the next activity. In a smart home environment, however, another type of activity prediction problem has more practical value: predicting what activities will occur in a coming time period of a certain length. The need for this type of prediction stems from the purpose of smart homes and the nature of the activities: many activities in smart homes need preparation time before being performed. With this type of prediction, activities can be predicted sufficiently far in advance, leaving adequate time for the smart home system to prepare the corresponding automated or assistive services. As more than one activity could occur within the time period for which the prediction is made, this is a multilabel classification problem. In this article, we first formulate the problem and then propose a deep learning model to address it. The proposed model consists of a convolutional part, a long short-term memory layer, and a multilabel output layer. Experiments on real-world datasets show the effectiveness of our model.
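
A minimal PyTorch sketch of the three-part architecture described above: convolutional layers, an LSTM layer, and a multilabel (sigmoid) output head. The sensor-feature encoding, window length, layer sizes, and number of activity classes are assumptions; training would typically use a binary cross-entropy loss with one output per activity.

```python
import torch
import torch.nn as nn

class ActivityPredictor(nn.Module):
    """Convolutional front-end, LSTM layer, and multilabel sigmoid output head."""
    def __init__(self, n_sensors=32, n_activities=10, hidden=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_sensors, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=5, padding=2), nn.ReLU())
        self.lstm = nn.LSTM(32, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_activities)

    def forward(self, x):                        # x: (batch, time, sensor features)
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)
        _, (hn, _) = self.lstm(h)
        return torch.sigmoid(self.head(hn[-1]))  # probability of each activity in the window

x = torch.randn(4, 120, 32)                      # 4 windows of sensor-event features
probs = ActivityPredictor()(x)                   # (4, 10) multilabel predictions
```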

Journal ArticleDOI
TL;DR: Li et al. proposed a lightweight multiscale fusion network (LMFNet) with a hierarchical structure based on single-mode data for low-quality 3-D face recognition.
Abstract: Three-dimensional (3-D) face recognition (FR) can improve the usability and user-friendliness of human–machine interaction. In general, 3-D FR can be divided into high-quality and low-quality 3-D FR according to the interaction scenario. Low-quality data can be obtained easily, so its application prospects are broader; the challenge, however, is balancing the trade-off between data accuracy and real-time performance. To address this, we propose a lightweight multiscale fusion network (LMFNet) with a hierarchical structure based on single-mode data for low-quality 3-D FR. First, we design a backbone network with only five feature extraction blocks to reduce computational complexity and improve inference speed. Second, we devise a mid-low adjacent layer with a multiscale feature fusion (ML-MSFF) module to extract facial texture and contour information, and a mid-high adjacent layer with a multiscale feature fusion (MH-MSFF) module to obtain discriminative information from high-level features. A hierarchical multiscale feature fusion (HMSFF) module is then formed by combining these two modules to acquire local information at different scales. Finally, we enhance the expression of features by integrating HMSFF with a global convolutional neural network to improve recognition accuracy. Experiments on the Lock3DFace, KinectFaceDB, and IIIT-D datasets demonstrate that our proposed LMFNet achieves superior performance on low-quality datasets. Furthermore, experiments on a cross-quality database based on Bosphorus and on low-quality datasets with different noise intensities based on UMB-DB and Bosphorus show that our network is robust and generalizes well. It also satisfies the real-time requirement, laying a foundation for a smooth and user-friendly interactive experience.
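
The exact ML-MSFF/MH-MSFF designs are not given in the abstract; the PyTorch block below is a generic two-scale feature-fusion module (1x1 reductions, upsampling, concatenation, 3x3 fusion) meant only to illustrate the idea. Channel counts and spatial sizes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleFusion(nn.Module):
    """Fuses feature maps from two adjacent backbone stages at different scales."""
    def __init__(self, c_low, c_high, c_out):
        super().__init__()
        self.reduce_low = nn.Conv2d(c_low, c_out, kernel_size=1)
        self.reduce_high = nn.Conv2d(c_high, c_out, kernel_size=1)
        self.fuse = nn.Sequential(nn.Conv2d(2 * c_out, c_out, kernel_size=3, padding=1),
                                  nn.BatchNorm2d(c_out), nn.ReLU())

    def forward(self, f_low, f_high):            # f_low: finer resolution, f_high: coarser
        up = F.interpolate(self.reduce_high(f_high), size=f_low.shape[-2:],
                           mode="bilinear", align_corners=False)
        return self.fuse(torch.cat([self.reduce_low(f_low), up], dim=1))

f_low, f_high = torch.randn(1, 64, 28, 28), torch.randn(1, 128, 14, 14)
out = MultiScaleFusion(64, 128, 64)(f_low, f_high)   # (1, 64, 28, 28)
```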

Journal ArticleDOI
TL;DR: In this article, a multidimensional analysis framework is developed to improve cognitive load prediction using a fusion of spatial, temporal, and spectral EEG features; the results suggest that all three feature types can serve as signatures of cognitive load and that their fusion improves multilevel prediction.
Abstract: Cognitive load prediction is one of the most important issues in the nascent field of neuroergonomics, and it has significant value in real-world applications. Most previous studies of cognitive load prediction utilized only electroencephalography (EEG)-based spectral signatures or interchannel connectivity, ignoring abundant temporal microstate features, which may represent the transient topologies of EEG signals. Furthermore, previous studies have mostly focused on binary classification of cognitive load for single-type cognitive tasks; to date, there are few studies on multilevel prediction of cognitive load during mixed cognitive tasks. Here, we first designed a new paradigm termed the "finding fault game," mixing memory, counting, and visual search tasks, and then developed a multidimensional analysis framework to improve cognitive load prediction using a fusion of spatial, temporal, and spectral EEG features. Specifically, EEG-based functional connectivity, microstates, and power spectral densities (PSDs) were calculated for three cognitive load levels. Twelve adult subjects participated in the study. The experimental results show that increased cognitive load was associated with elevated theta and degraded alpha power and with significant changes in interchannel connectivity and microstates, and that fusing the three types of EEG features improved the performance of three-level cognitive load prediction, achieving accuracies greater than 80% in cross-validation, real-time, and over-time prediction. The findings suggest that all three types of EEG features can serve as signatures of cognitive load and that their fusion can improve multilevel prediction.
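
As a small illustration of the spectral feature type mentioned above, the snippet computes per-channel theta and alpha band power with Welch's method on synthetic EEG. The channel count, sampling rate, and the theta/alpha ratio readout are assumptions; connectivity and microstate features are not shown.

```python
import numpy as np
from scipy.signal import welch

def band_power(eeg, fs, lo, hi):
    """Average PSD in the [lo, hi] Hz band for each channel; eeg has shape (channels, samples)."""
    f, pxx = welch(eeg, fs=fs, nperseg=2 * fs)
    band = (f >= lo) & (f <= hi)
    return pxx[:, band].mean(axis=1)

rng = np.random.default_rng(5)
eeg = rng.normal(size=(32, 10 * 256))        # 32 channels, 10 s at 256 Hz (synthetic data)
theta = band_power(eeg, 256, 4, 8)
alpha = band_power(eeg, 256, 8, 13)
print(theta / alpha)                         # theta/alpha ratio often rises with load
```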

Journal ArticleDOI
TL;DR: In this article, a fully automatic subject modeling framework is presented that reconstructs human pose, shape, and body texture in a challenging optimization scenario by integrating powerful differentiable rendering into the subject-specific modeling pipeline.
Abstract: 3-D human pose estimation, or human tracking, has always been a focus of research in the human–computer interaction community. As the calibration step of human pose estimation, subject-specific modeling is crucially important to the subsequent pose estimation process: it not only provides a priori knowledge but also clearly defines the tracking target. This article presents a fully automatic subject modeling framework that reconstructs human pose, shape, and body texture in a challenging optimization scenario. By integrating powerful differentiable rendering into the subject-specific modeling pipeline, the proposed method transforms the texture reconstruction problem into analysis-by-synthesis minimization and solves it efficiently with a gradient-based method. Furthermore, a novel covariance matrix adaptation annealing algorithm is proposed to attack the high-dimensional multimodal optimization problem in an adaptive manner. Domain knowledge of hierarchical human anatomy is seamlessly injected into the annealing optimization process through a soft covariance matrix mask. Together, these contributions make the algorithm robust to local minima. Experiments on the Human3.6M dataset and the People-Snapshot dataset demonstrate results that are competitive with the state of the art, both qualitatively and quantitatively.