
Showing papers by "Antonis A. Argyros published in 2021"


Proceedings ArticleDOI
10 Jan 2021
TL;DR: In this article, a real-time method is presented that estimates the 3D human pose directly in the Bio Vision Hierarchy (BVH) format, given estimations of the 2D body joints originating from monocular color images.
Abstract: We introduce a real-time method that estimates the 3D human pose directly in the popular Bio Vision Hierarchy (BVH) format, given estimations of the 2D body joints originating from monocular color images. Our contributions include: (a) A novel and compact 2D pose representation. (b) A human body orientation classifier and an ensemble of orientation-tuned neural networks that regress the 3D human pose by also allowing for the decomposition of the body to an upper and lower kinematic hierarchy. This permits the recovery of the human pose even in the case of significant occlusions. (c) An efficient Inverse Kinematics solver that refines the neural-network-based solution providing 3D human pose estimations that are consistent with the limb sizes of a target person (if known). All the above yield a 33% accuracy improvement on the Human 3.6 Million (H3.6M) dataset compared to the baseline method (MocapNET) while maintaining real-time performance (70 fps in CPU-only execution).
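
A minimal sketch of the orientation-gated ensemble idea described above, in Python; all model objects (orientation_clf, pose_nets, ik_refine) are hypothetical stand-ins, not the authors' MocapNET code:

    import numpy as np

    def estimate_3d_pose(joints_2d, orientation_clf, pose_nets, ik_refine=None):
        """joints_2d: (J, 2) array of detected 2D body joints."""
        # 1. Compact 2D representation: pairwise joint distances,
        #    normalized by a reference pair to remove scale.
        dists = np.linalg.norm(joints_2d[:, None, :] - joints_2d[None, :, :], axis=-1)
        feats = (dists / (dists[0, 1] + 1e-8))[np.triu_indices(len(joints_2d), k=1)]

        # 2. Classify coarse body orientation and dispatch to the network
        #    pair tuned for it (upper/lower kinematic hierarchy).
        orientation = orientation_clf.predict(feats[None])[0]
        upper_net, lower_net = pose_nets[orientation]
        pose_bvh = np.concatenate([upper_net.predict(feats[None])[0],
                                   lower_net.predict(feats[None])[0]])

        # 3. Optional IK refinement so limb sizes match a known target person.
        if ik_refine is not None:
            pose_bvh = ik_refine(pose_bvh, joints_2d)
        return pose_bvh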

12 citations


Journal ArticleDOI
TL;DR: A distributed framework that enables a team of heterogeneous robots to dynamically generate actions from a common, user-defined goal specification is presented and the integration of various robotic capabilities into a common task allocation and planning formalism is discussed.

7 citations


Journal ArticleDOI
08 Apr 2021
TL;DR: In this article, a 1D-CNN-based prediction model was developed and trained with data from 4,800 trials recorded from 40 participants, achieving a prediction accuracy of more than 92% with less than 550 ms of IMU data.
Abstract: Most people touch their faces unconsciously, for instance to scratch an itch or to rest one's chin in their hands. To reduce the spread of the novel coronavirus (COVID-19), public health officials recommend against touching one's face, as the virus is transmitted through mucous membranes in the mouth, nose and eyes. Students, office workers, medical personnel and people on trains were found to touch their faces between 9 and 23 times per hour. This paper introduces FaceGuard, a system that utilizes deep learning to predict hand movements that result in touching the face, and provides sensory feedback to stop the user from touching the face. The system utilizes an inertial measurement unit (IMU) to obtain features that characterize hand movement involving face touching. Time-series data can be efficiently classified using 1D-Convolutional Neural Network (CNN) with minimal feature engineering; 1D-CNN filters automatically extract temporal features in IMU data. Thus, a 1D-CNN based prediction model is developed and trained with data from 4,800 trials recorded from 40 participants. Training data are collected for hand movements involving face touching during various everyday activities such as sitting, standing, or walking. Results showed that while the average time needed to touch the face is 1,200 ms, a prediction accuracy of more than 92% is achieved with less than 550 ms of IMU data. As for the sensory response, the paper presents a psychophysical experiment to compare the response time for three sensory feedback modalities, namely visual, auditory, and vibrotactile. Results demonstrate that the response time is significantly smaller for vibrotactile feedback (427.3 ms) compared to visual (561.70 ms) and auditory (520.97 ms). Furthermore, the success rate (to avoid face touching) is also statistically higher for vibrotactile and auditory feedback compared to visual feedback. These results demonstrate the feasibility of predicting a hand movement and providing timely sensory feedback within less than a second in order to avoid face touching.
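
A minimal 1D-CNN sketch in PyTorch for the "face-touch imminent" classifier; the six input channels (3-axis accelerometer plus gyroscope) and all layer sizes are illustrative assumptions, not FaceGuard's published architecture:

    import torch
    import torch.nn as nn

    class FaceTouchCNN(nn.Module):
        def __init__(self, channels=6):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv1d(channels, 32, kernel_size=5, padding=2), nn.ReLU(),
                nn.MaxPool1d(2),
                nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
                nn.AdaptiveAvgPool1d(1),   # temporal features pooled per window
            )
            self.head = nn.Linear(64, 2)   # touch vs. no-touch

        def forward(self, x):              # x: (batch, channels, time)
            return self.head(self.features(x).squeeze(-1))

    model = FaceTouchCNN()
    logits = model(torch.randn(8, 6, 64))  # eight 64-sample IMU windows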

6 citations


Proceedings ArticleDOI
29 Jun 2021
TL;DR: In this paper, self-supervised embeddings are also used to temporally align the action sequences prior to quality assessment, which further increases the accuracy, provides robustness to variance in execution speed, and enables fine-grained interpretability of the assessment score.
Abstract: Action Quality Assessment (AQA) is a video understanding task aiming at the quantification of the execution quality of an action. One of the main challenges in relevant, deep learning-based approaches is the collection of training data annotated by experts. Current methods perform fine-tuning on pre-trained backbone models and aim to improve performance by modeling the subjects and the scene. In this work, we consider embeddings extracted using a self-supervised training method based on a differential cycle consistency loss between sequences of actions. These are shown to improve the state-of-the-art without the need for additional annotations or scene modeling. The same embeddings are also used to temporally align the sequences prior to quality assessment which further increases the accuracy, provides robustness to variance in execution speed and enables us to provide fine-grained interpretability of the assessment score. The experimental evaluation of the method on the MTL-AQA dataset demonstrates significant accuracy gain compared to the state-of-the-art baselines, which grows even more when the action execution sequences are not well aligned.
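
A sketch of the alignment step, assuming per-frame embeddings from a cycle-consistency-trained encoder (the encoder itself is not shown); the nearest-neighbor warp below is a simplification used only to illustrate the idea, not the paper's exact alignment:

    import numpy as np

    def align_to_reference(test_emb, ref_emb):
        """test_emb: (Tt, D), ref_emb: (Tr, D). For each reference frame,
        pick the closest test frame in embedding space (a simple warp)."""
        d = np.linalg.norm(ref_emb[:, None] - test_emb[None, :], axis=-1)
        warp = d.argmin(axis=1)
        return np.maximum.accumulate(warp)   # crude monotonicity constraint

    ref, test = np.random.randn(40, 128), np.random.randn(55, 128)
    aligned = test[align_to_reference(test, ref)]  # test resampled onto ref timeline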

4 citations


Journal ArticleDOI
TL;DR: In social and industrial facilities of the future such as hospitals, hotels, and warehouses, teams of robots will be deployed to assist humans in accomplishing everyday tasks like object handling, transportation, or pickup and delivery operations as discussed by the authors.
Abstract: In social and industrial facilities of the future such as hospitals, hotels, and warehouses, teams of robots will be deployed to assist humans in accomplishing everyday tasks like object handling, transportation, or pickup and delivery operations. In such a context, different robots (e.g., mobile platforms, static manipulators, or mobile manipulators) with different actuation, manipulation, and perception capabilities must be coordinated to achieve various complex tasks (e.g., cooperative parts assembly in the automotive industry or loading and unloading of palettes in warehouses) that require collaborative actions with each other and with human operators (Figure 1).

4 citations


Book ChapterDOI
22 Sep 2021
TL;DR: In this paper, an approach is proposed that builds time series representations of the performance of the humans and the objects involved; such a representation of an ongoing action is then compared to prototype actions.
Abstract: Action prediction is defined as the inference of an action label while the action is still ongoing. Such a capability is extremely useful for early response and further action planning. In this paper, we consider the problem of action prediction in scenarios involving humans interacting with objects. We formulate an approach that builds time series representations of the performance of the humans and the objects. Such a representation of an ongoing action is then compared to prototype actions. This is achieved by a Dynamic Time Warping (DTW)-based time series alignment framework which identifies the best match between the ongoing action and the prototype ones. Our approach is evaluated quantitatively on three standard benchmark datasets. Our experimental results reveal the importance of the fusion of human- and object-centered action representations in the accuracy of action prediction. Moreover, we demonstrate that the proposed approach achieves significantly higher action prediction accuracy compared to competitive methods.
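
A minimal sketch of prefix-based prediction with an open-end DTW: the whole partial observation is aligned against any prefix of each prototype, and the cheapest prototype wins. Feature fusion (human- plus object-centered descriptors) is assumed to happen upstream by per-frame concatenation; this is an illustration, not the authors' exact formulation:

    import numpy as np

    def open_end_dtw(partial, prototype):
        """Cost of aligning all of `partial` to some prefix of `prototype`."""
        Tp, Tq = len(partial), len(prototype)
        cost = np.linalg.norm(partial[:, None] - prototype[None, :], axis=-1)
        acc = np.full((Tp + 1, Tq + 1), np.inf)
        acc[0, 0] = 0.0
        for i in range(1, Tp + 1):
            for j in range(1, Tq + 1):
                acc[i, j] = cost[i-1, j-1] + min(acc[i-1, j],
                                                 acc[i, j-1],
                                                 acc[i-1, j-1])
        return acc[Tp, 1:].min() / Tp   # best endpoint anywhere in the prototype

    def predict_action(partial, prototypes):
        """prototypes: dict mapping action label -> (T, D) series."""
        return min(prototypes, key=lambda lbl: open_end_dtw(partial, prototypes[lbl]))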

3 citations


Proceedings ArticleDOI
18 Jul 2021
TL;DR: HandGAN (H-GAN), as discussed by the authors, is a cycle-consistent adversarial learning approach implementing multi-scale perceptual discriminators, designed to translate synthetic images of hands to the real domain by improving their appearance to approximate the statistical distribution underlying a collection of real images of hands.
Abstract: We present HandGAN (H-GAN), a cycle-consistent adversarial learning approach implementing multi-scale perceptual discriminators. It is designed to translate synthetic images of hands to the real domain. Synthetic hands provide complete ground-truth annotations, yet they are not representative of the target distribution of real-world data. We strive to provide the perfect blend of a realistic hand appearance with synthetic annotations. Relying on image-to-image translation, we improve the appearance of synthetic hands to approximate the statistical distribution underlying a collection of real images of hands. H-GAN tackles not only the cross-domain tone mapping but also structural differences in localized areas such as shading discontinuities. Results are evaluated on a qualitative and quantitative basis improving previous works. Furthermore, we relied on the hand classification task to claim our generated hands are statistically similar to the real domain of hands.
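
A sketch of the multi-scale discriminator idea in PyTorch: the same PatchGAN-style critic applied to progressively downsampled inputs. The layer configuration is illustrative, not H-GAN's exact design:

    import torch
    import torch.nn as nn

    def patch_critic(in_ch=3):
        return nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 1, 4, padding=1),   # per-patch real/fake scores
        )

    class MultiScaleDiscriminator(nn.Module):
        def __init__(self, scales=3):
            super().__init__()
            self.critics = nn.ModuleList(patch_critic() for _ in range(scales))
            self.down = nn.AvgPool2d(3, stride=2, padding=1)

        def forward(self, x):
            outs = []
            for critic in self.critics:
                outs.append(critic(x))
                x = self.down(x)   # next critic judges half the resolution
            return outs            # one score map per scale

    score_maps = MultiScaleDiscriminator()(torch.randn(2, 3, 128, 128))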

2 citations


Journal ArticleDOI
TL;DR: The most important novelty presented in this work is the eye shape estimation and the generation of 3D point meshes, which have the potential to allow clinicians to perform measurements on 3D representations of the eye, instead of doing so in 2D images that contain distortions induced by the projection onto the image space.

2 citations


Posted Content
TL;DR: In this article, the authors present two novel optimizations that accelerate clock-based spiking neural network (SNN) simulators, one targeting spike timing dependent plasticity (STDP) and the other spike delivery.
Abstract: We present two novel optimizations that accelerate clock-based spiking neural network (SNN) simulators. The first one targets spike timing dependent plasticity (STDP). It combines lazy- with event-driven plasticity and efficiently facilitates the computation of pre- and post-synaptic spikes using bitfields and integer intrinsics. It offers higher bandwidth than event-driven plasticity alone and achieves a 1.5x-2x speedup over our closest competitor. The second optimization targets spike delivery. We partition our graph representation in a way that bounds the number of neurons that need be updated at any given time which allows us to perform said update in shared memory instead of global memory. This is 2x-2.5x faster than our closest competitor. Both optimizations represent the final evolutionary stages of years of iteration on STDP and spike delivery inside "Spice" (/spaIk/), our state of the art SNN simulator. The proposed optimizations are not exclusive to our graph representation or pipeline but are applicable to a multitude of simulator designs. We evaluate our performance on three well-established models and compare ourselves against three other state of the art simulators.
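
A toy illustration of the bitfield idea: each neuron's recent spike history lives in one machine word (one bit per simulation step), so pre/post coincidence counting for STDP reduces to shifts, masks, and a popcount. This is a sketch of the concept only, not Spice's implementation:

    HISTORY_BITS = 64
    MASK = (1 << HISTORY_BITS) - 1

    def record_spike(history, fired):
        """Shift the spike window one step; bit 0 means 'fired this step'."""
        return ((history << 1) | int(fired)) & MASK

    def coincidences(pre_hist, post_hist, window=16):
        """Steps within `window` where both neurons spiked; bin().count('1')
        stands in for the integer popcount intrinsic used on real hardware."""
        both = pre_hist & post_hist & ((1 << window) - 1)
        return bin(both).count("1")

    h = 0
    for fired in (1, 0, 1, 1):
        h = record_spike(h, fired)
    print(coincidences(h, 0b1011))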

1 citation


Posted Content
TL;DR: In this article, an SNN simulator is presented that scales to millions of neurons, billions of synapses, and 8 GPUs, made possible by a novel cache-aware spike transmission algorithm, a model-parallel multi-GPU distribution scheme, and a static yet effective load balancing strategy.
Abstract: We present a SNN simulator which scales to millions of neurons, billions of synapses, and 8 GPUs. This is made possible by 1) a novel, cache-aware spike transmission algorithm 2) a model parallel multi-GPU distribution scheme and 3) a static, yet very effective load balancing strategy. The simulator further features an easy to use API and the ability to create custom models. We compare the proposed simulator against two state of the art ones on a series of benchmarks using three well-established models. We find that our simulator is faster, consumes less memory, and scales linearly with the number of GPUs.
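
A sketch of one way to do static load balancing of the kind described: assign neuron blocks to GPUs so that per-GPU synapse counts (the dominant cost) stay roughly equal, via a greedy longest-processing-time heuristic. This is an assumption-laden illustration, not the simulator's actual scheme:

    import heapq

    def balance(block_synapse_counts, n_gpus=8):
        """Returns gpu_of[i], the GPU assigned to neuron block i."""
        heap = [(0, gpu) for gpu in range(n_gpus)]   # (current load, gpu id)
        heapq.heapify(heap)
        order = sorted(range(len(block_synapse_counts)),
                       key=lambda i: -block_synapse_counts[i])
        gpu_of = [0] * len(block_synapse_counts)
        for i in order:                              # heaviest blocks first
            load, gpu = heapq.heappop(heap)
            gpu_of[i] = gpu
            heapq.heappush(heap, (load + block_synapse_counts[i], gpu))
        return gpu_of

    print(balance([900, 400, 400, 300, 250, 100], n_gpus=2))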

1 citation


Proceedings ArticleDOI
10 Jan 2021
TL;DR: In this paper, a coarse-to-fine action hierarchy based on linguistic label associations is proposed, and the potential benefits and drawbacks of the hierarchical organization of action classes in different levels of granularity are investigated.
Abstract: Human activity recognition is a fundamental and challenging task in computer vision. Its solution can support multiple and diverse applications in areas including but not limited to smart homes, surveillance, daily living assistance, Human-Robot Collaboration (HRC), etc. In realistic conditions, the complexity of human activities ranges from simple coarse actions, such as siting or standing up, to more complex activities that consist of multiple actions with subtle variations in appearance and motion patterns. A large variety of existing datasets target specific action classes, with some of them being coarse and others being fine-grained. In all of them, a description of the action and its complexity is manifested in the action label sentence. As the action/activity complexity increases, so is the label sentence size and the amount of action-related semantic information contained in this description. In this paper, we propose an approach to exploit the information content of these action labels to formulate a coarse-to-fine action hierarchy based on linguistic label associations, and investigate the potential benefits and drawbacks. Moreover, in a series of quantitative and qualitative experiments, we show that the exploitation of this hierarchical organization of action classes in different levels of granularity improves the learning speed and overall performance of a range of baseline and mid-range deep architectures for human action recognition (HAR).
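
A minimal sketch of deriving a coarse-to-fine hierarchy from the label sentences themselves; the grouping rule (coarse class = leading verb) is a deliberately simple stand-in for the paper's linguistic label associations:

    from collections import defaultdict

    def build_hierarchy(labels):
        coarse = defaultdict(list)
        for label in labels:
            verb = label.split()[0].lower()   # e.g. "open" from "open fridge"
            coarse[verb].append(label)
        return dict(coarse)

    labels = ["open microwave door", "open fridge", "cut tomato", "cut bread"]
    hierarchy = build_hierarchy(labels)
    # {'open': ['open microwave door', 'open fridge'], 'cut': ['cut tomato', 'cut bread']}
    coarse_of = {fine: verb for verb, fines in hierarchy.items() for fine in fines}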

Proceedings ArticleDOI
18 Jul 2021
TL;DR: In this article, a cache-aware spike transmission algorithm and load balancing strategy are proposed to scale to millions of neurons, billions of synapses, and 8 GPUs, which is made possible by a novel, cacheaware, spike transmission and a model parallel multi-GPU distribution scheme.
Abstract: We present a SNN simulator which scales to millions of neurons, billions of synapses, and 8 GPUs. This is made possible by 1) a novel, cache-aware spike transmission algorithm 2) a model parallel multi-GPU distribution scheme and 3) a static, yet very effective load balancing strategy. The simulator further features an easy to use API and the ability to create custom models. We compare the proposed simulator against two state of the art ones on a series of benchmarks using three well-established models. We find that our simulator is faster, consumes less memory, and scales linearly with the number of GPUs.

Posted Content
TL;DR: HandGAN (H-GAN) as discussed by the authors is a cycle-consistent adversarial learning approach implementing multi-scale perceptual discriminators, which is designed to translate synthetic images of hands to the real domain.
Abstract: We present HandGAN (H-GAN), a cycle-consistent adversarial learning approach implementing multi-scale perceptual discriminators. It is designed to translate synthetic images of hands to the real domain. Synthetic hands provide complete ground-truth annotations, yet they are not representative of the target distribution of real-world data. We strive to provide the perfect blend of a realistic hand appearance with synthetic annotations. Relying on image-to-image translation, we improve the appearance of synthetic hands to approximate the statistical distribution underlying a collection of real images of hands. H-GAN tackles not only the cross-domain tone mapping but also structural differences in localized areas such as shading discontinuities. Results are evaluated on a qualitative and quantitative basis improving previous works. Furthermore, we relied on the hand classification task to claim our generated hands are statistically similar to the real domain of hands.

Posted Content
TL;DR: In this paper, the authors employ a publicly available, multi-camera dataset of hands (InterHand2.6M), and perform effective image-based refinement to improve on the imperfect ground truth annotations, yielding a better dataset.
Abstract: The amount and quality of datasets and tools available in the research field of hand pose and shape estimation act as evidence to the significant progress that has been made. We find that there is still room for improvement in both fronts, and even beyond. Even the datasets of the highest quality, reported to date, have shortcomings in annotation. There are tools in the literature that can assist in that direction and yet they have not been considered, so far. To demonstrate how these gaps can be bridged, we employ such a publicly available, multi-camera dataset of hands (InterHand2.6M), and perform effective image-based refinement to improve on the imperfect ground truth annotations, yielding a better dataset. The image-based refinement is achieved through raytracing, a method that has not been employed so far to relevant problems and is hereby shown to be superior to the approximative alternatives that have been employed in the past. To tackle the lack of reliable ground truth, we resort to realistic synthetic data, to show that the improvement we induce is indeed significant, qualitatively, and quantitatively, too.
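
A toy sketch of the raytracing consistency check behind such refinement: cast camera rays through annotated 2D keypoints and test where they meet the hand surface, with misses flagging annotations to refine. It uses trimesh's ray casting on a stand-in sphere mesh; all data are toy values, and this is not the paper's pipeline:

    import numpy as np
    import trimesh

    mesh = trimesh.creation.icosphere(radius=0.1)     # stand-in for a hand mesh
    origins = np.tile([0.0, 0.0, -1.0], (2, 1))       # camera center, repeated per ray
    dirs = np.array([[0.0, 0.0, 1.0],                 # ray through a good keypoint
                     [0.3, 0.3, 1.0]])                # ray through a bad keypoint (misses)
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)

    locations, ray_idx, _ = mesh.ray.intersects_location(origins, dirs)
    for i, p in zip(ray_idx, locations):
        print(f"ray {i} hits the mesh at {p}")        # absent rays indicate bad annotations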