
Showing papers by Majid Mirmehdi published in 2021


Proceedings ArticleDOI
15 Jan 2021
TL;DR: Temporal-Relational CrossTransformers (TRX) as mentioned in this paper constructs class prototypes using the CrossTransformer attention mechanism to observe relevant sub-sequences of all support videos, rather than using class averages or single best matches.
Abstract: We propose a novel approach to few-shot action recognition, finding temporally-corresponding frame tuples between the query and videos in the support set. Distinct from previous few-shot works, we construct class prototypes using the CrossTransformer attention mechanism to observe relevant sub-sequences of all support videos, rather than using class averages or single best matches. Video representations are formed from ordered tuples of varying numbers of frames, which allows sub-sequences of actions at different speeds and temporal offsets to be compared. Our proposed Temporal-Relational CrossTransformers (TRX) achieve state-of-the-art results on few-shot splits of Kinetics, Something-Something V2 (SSv2), HMDB51 and UCF101. Importantly, our method outperforms prior work on SSv2 by a wide margin (12%) due to its ability to model temporal relations. A detailed ablation showcases the importance of matching to multiple support set videos and learning higher-order relational CrossTransformers.
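The core matching step can be pictured in a few lines. The snippet below is an illustrative PyTorch reading of the idea, not the authors' implementation: it uses frame pairs only (TRX combines several tuple cardinalities), and all dimensions and projection names are hypothetical. It shows how query tuples attend over all support-set tuples of a candidate class to form a query-specific prototype, rather than a class average.

```python
# Illustrative sketch of tuple-based cross-attention matching (assumed
# shapes; frame pairs only, while TRX combines several tuple sizes).
import itertools
import torch
import torch.nn.functional as F

T, D, H = 8, 256, 128  # frames per video, feature dim, attention dim (assumed)
PAIRS = list(itertools.combinations(range(T), 2))  # ordered frame pairs

Wq = torch.randn(2 * D, H) / (2 * D) ** 0.5  # query, key, value projections
Wk = torch.randn(2 * D, H) / (2 * D) ** 0.5
Wv = torch.randn(2 * D, H) / (2 * D) ** 0.5

def tuple_reps(frame_feats):
    """(..., T, D) frame features -> (..., P, 2*D) ordered-pair features."""
    a = frame_feats[..., [i for i, _ in PAIRS], :]
    b = frame_feats[..., [j for _, j in PAIRS], :]
    return torch.cat([a, b], dim=-1)

def trx_distance(query, support):
    """query: (T, D); support: (K, T, D), K support videos of one class."""
    qt = tuple_reps(query)                         # (P, 2*D)
    st = tuple_reps(support).reshape(-1, 2 * D)    # (K*P, 2*D)
    attn = F.softmax((qt @ Wq) @ (st @ Wk).t() / H ** 0.5, dim=-1)
    prototype = attn @ (st @ Wv)   # query-specific class prototype, (P, H)
    return F.mse_loss(qt @ Wv, prototype)          # lower = better match

# 5-way 5-shot episode with random stand-in features:
query = torch.randn(T, D)
dists = torch.stack([trx_distance(query, torch.randn(5, T, D))
                     for _ in range(5)])
print(int(dists.argmin()))  # index of the predicted class
```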

88 citations


Journal ArticleDOI
16 Jun 2021 - Sensors
TL;DR: In this article, a multimodal deep learning approach for discriminating between people with and without Parkinson's disease (PD) is presented, which uses two data modalities, acquired from vision and accelerometer sensors in a home environment, to train variational autoencoder (VAE) models.
Abstract: Parkinson's disease (PD) is a chronic neurodegenerative condition that affects a patient's everyday life. It has been proposed that a machine learning and sensor-based approach that continuously monitors patients in naturalistic settings can provide constant evaluation of PD and objectively analyse its progression. In this paper, we make progress toward such PD evaluation by presenting a multimodal deep learning approach for discriminating between people with and without PD. Specifically, our proposed architecture, named MCPD-Net, uses two data modalities, acquired from vision and accelerometer sensors in a home environment, to train variational autoencoder (VAE) models. These are modality-specific VAEs that predict effective representations of human movements to be fused and given to a classification module. During our end-to-end training, we minimise the difference between the latent spaces corresponding to the two data modalities. This makes our method capable of dealing with missing modalities during inference. We show that our proposed multimodal method outperforms unimodal and other multimodal approaches by an average increase in F1-score of 0.25 and 0.09, respectively, on a data set with real patients. We also show that our method still outperforms other approaches by an average increase in F1-score of 0.17 when a modality is missing during inference, demonstrating the benefit of training on multiple modalities.
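As a rough illustration of the training objective described above (all module sizes and names below are assumptions, not the published MCPD-Net code), the sketch trains one VAE per modality, adds an alignment term that pulls the two latent spaces together, and classifies from the fused latents. The alignment term is what lets one modality stand in for a missing one at inference.

```python
# Minimal sketch, assuming toy encoders and feature sizes: two modality
# VAEs with latent alignment, so a missing modality can be substituted.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityVAE(nn.Module):
    def __init__(self, in_dim, z_dim=32):
        super().__init__()
        self.enc = nn.Linear(in_dim, 64)
        self.mu, self.logvar = nn.Linear(64, z_dim), nn.Linear(64, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, 64), nn.ReLU(),
                                 nn.Linear(64, in_dim))

    def forward(self, x):
        h = torch.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterise
        return self.dec(z), mu, logvar

def vae_loss(x, recon, mu, logvar):
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1).mean()
    return F.mse_loss(recon, x) + 1e-3 * kl

vision_vae, accel_vae = ModalityVAE(512), ModalityVAE(128)
classifier = nn.Linear(64, 2)  # fused latents -> PD / non-PD

def training_loss(xv, xa, y):
    rv, mv, lv = vision_vae(xv)
    ra, ma, la = accel_vae(xa)
    align = F.mse_loss(mv, ma)  # minimise the gap between latent spaces
    logits = classifier(torch.cat([mv, ma], dim=1))
    return (vae_loss(xv, rv, mv, lv) + vae_loss(xa, ra, ma, la)
            + align + F.cross_entropy(logits, y))

# If the accelerometer stream is missing at inference, reuse the vision
# latent in both slots (plausible because of the alignment term):
xv = torch.randn(4, 512)
mv = vision_vae.mu(torch.relu(vision_vae.enc(xv)))
print(classifier(torch.cat([mv, mv], dim=1)).argmax(dim=1))
```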

9 citations


Proceedings ArticleDOI
01 Jan 2021
TL;DR: This work improves the architecture of a prior silhouette-to-accelerometer matching approach and combines it with tracking functionality so that it can be deployed in real-world homes, and shows a novel first example of subject-tailored health monitoring by applying the methodology to a sit-to-stand detector to generate clinically relevant rehabilitation trends.
Abstract: The majority of Ambient Assisted Living (AAL) systems, designed for home or lab settings, monitor one participant at a time – this avoids the complexities of pre-fusion correspondence between different sensors, since carers, guests, and visitors may be involved in real-world scenarios. Previous work by Masullo et al. (2020) presented a solution to this problem that matches video sequences of silhouettes to accelerations from wearable sensors to identify members of a household while respecting their privacy. In this work, we take that approach to the next stage by improving its architecture and combining it with a tracking functionality that makes it deployable in real-world homes. We present experiments on a new dataset recorded in participants' own houses, which includes multiple participants visited by guests, and show an auROC score of 90.2%. We also show a novel first example of subject-tailored health monitoring by applying our methodology to a sit-to-stand detector to generate clinically relevant rehabilitation trends.
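The matching step can be pictured as a shared embedding space for the two signals. The sketch below is a hypothetical minimal version (the encoder choice, feature sizes, and window lengths are assumptions, not the paper's architecture): silhouette windows and accelerometer windows are embedded by separate encoders, and each tracked person is assigned to the wearable whose embedding matches best.

```python
# Hedged sketch of video-to-wearable matching in a shared embedding space
# (GRU encoders and all sizes are illustrative assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SeqEncoder(nn.Module):
    def __init__(self, in_dim, out_dim=64):
        super().__init__()
        self.gru = nn.GRU(in_dim, out_dim, batch_first=True)

    def forward(self, x):                  # x: (batch, time, in_dim)
        _, h = self.gru(x)
        return F.normalize(h[-1], dim=1)   # unit-norm embedding per window

video_enc = SeqEncoder(in_dim=128)  # e.g. per-frame silhouette features
accel_enc = SeqEncoder(in_dim=3)    # raw 3-axis accelerations

def assign_tracks(sil_windows, accel_windows):
    """sil_windows: (n_tracks, T, 128); accel_windows: (n_wearables, T, 3)."""
    sims = video_enc(sil_windows) @ accel_enc(accel_windows).t()
    return sims.argmax(dim=1)  # wearable index per tracked person

print(assign_tracks(torch.randn(3, 50, 128), torch.randn(2, 50, 3)))
```

In training, such encoders would typically be optimised with a contrastive objective so that time-synchronised silhouette and acceleration windows embed close together; the tracker then lets the assignment persist across windows.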

3 citations


Proceedings ArticleDOI
01 Jan 2021
TL;DR: In this paper, an end-to-end deep learning framework was proposed to measure PD severity in two important components, hand movement and gait, of the Unified Parkinson's Disease Rating Scale (UPDRS).
Abstract: Evaluating neurological disorders such as Parkinson's disease (PD) is a challenging task that requires the assessment of several motor and non-motor functions. In this paper, we present an end-to-end deep learning framework to measure PD severity in two important components, hand movement and gait, of the Unified Parkinson's Disease Rating Scale (UPDRS). Our method leverages an Inflated 3D CNN trained by a temporal segment framework to learn spatial and long temporal structure in video data. We also deploy a temporal attention mechanism to boost the performance of our model. Further, motion boundaries are explored as an extra input modality to assist in obfuscating the effects of camera motion for better movement assessment. We ablate the effects of different data modalities on the accuracy of the proposed network and compare with other popular architectures. We evaluate our proposed method on a dataset of 25 PD patients, obtaining 72.3% and 77.1% top-1 accuracy on hand movement and gait tasks, respectively.
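The two video-modelling ingredients named above, temporal-segment sampling and temporal attention pooling, can be sketched as follows. The backbone here is a trivial stand-in for the Inflated 3D CNN, and all sizes and the five-way output are illustrative assumptions.

```python
# Minimal sketch: sample one snippet per temporal segment of a long clip,
# then pool per-segment features with learned temporal attention.
import torch
import torch.nn as nn
import torch.nn.functional as F

def sample_segments(video, n_seg=8, snip_len=16):
    """video: (frames, C, H, W) -> (n_seg, snip_len, C, H, W)."""
    seg = len(video) // n_seg
    starts = [i * seg + torch.randint(seg - snip_len + 1, (1,)).item()
              for i in range(n_seg)]
    return torch.stack([video[s:s + snip_len] for s in starts])

backbone = nn.Sequential(nn.Flatten(1), nn.LazyLinear(256))  # I3D stand-in
attn = nn.Linear(256, 1)
head = nn.Linear(256, 5)  # e.g. UPDRS-style scores 0-4 (assumed)

def forward(video):
    snippets = sample_segments(video)          # (S, L, C, H, W)
    feats = backbone(snippets)                 # (S, 256)
    w = F.softmax(attn(feats), dim=0)          # temporal attention weights
    return head((w * feats).sum(dim=0))        # severity logits

print(forward(torch.randn(200, 3, 32, 32)).shape)  # torch.Size([5])
```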

3 citations


Posted Content
TL;DR: In this paper, a holistic attention network based super-resolution approach and a custom-built altitude data exploitation network are integrated into standard recognition pipelines for animal detection in real-world settings.
Abstract: Visuals captured by high-flying aerial drones are increasingly used to assess biodiversity and animal population dynamics around the globe. Yet, challenging acquisition scenarios and tiny animal depictions in airborne imagery, despite ultra-high resolution cameras, have so far been limiting factors for applying computer vision detectors successfully with high confidence. In this paper, we address the problem for the first time by combining deep object detectors with super-resolution techniques and altitude data. In particular, we show that the integration of a holistic attention network based super-resolution approach and a custom-built altitude data exploitation network into standard recognition pipelines can considerably increase the detection efficacy in real-world settings. We evaluate the system on two public, large aerial-capture animal datasets, SAVMAP and AED. We find that the proposed approach can consistently improve over ablated baselines and the state-of-the-art performance for both datasets. In addition, we provide a systematic analysis of the relationship between animal resolution and detection performance. We conclude that super-resolution and altitude knowledge exploitation techniques can significantly increase benchmarks across settings and, thus, should be used routinely when detecting minutely resolved animals in aerial imagery.
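The integration point can be sketched as a pre-detection super-resolution stage plus altitude-conditioned feature modulation. Everything below is an assumption-laden stand-in: the paper uses a holistic attention SR network and a custom-built altitude network, whereas here the SR model is a single pixel-shuffle layer and the fusion is simple FiLM-style channel gating.

```python
# Hedged sketch of the pipeline: super-resolve the aerial tile, then let an
# altitude embedding gate the detector's feature channels (all stand-ins).
import torch
import torch.nn as nn

sr_net = nn.Sequential(                    # stand-in for the SR network
    nn.Conv2d(3, 3 * 4, 3, padding=1), nn.PixelShuffle(2))  # 2x upscale
alt_net = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 32))
backbone = nn.Conv2d(3, 32, 3, stride=2, padding=1)
det_head = nn.Conv2d(32, 5, 1)             # 4 box params + objectness

def detect(tile, altitude_m):
    """tile: (1, 3, H, W) aerial crop; altitude_m: flight altitude."""
    x = sr_net(tile)                           # recover small-animal detail
    f = backbone(x)
    a = alt_net(torch.tensor([[altitude_m]]))  # altitude-aware channel gains
    f = f * a.view(1, -1, 1, 1).sigmoid()      # FiLM-style modulation
    return det_head(f)                         # dense detection map

print(detect(torch.randn(1, 3, 64, 64), 80.0).shape)
```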

1 citation


Posted Content
TL;DR: In this article, an unsupervised approach that learns to extract view-invariant 3D human pose representation from a 2D image without using 3D joint data is presented.
Abstract: Most recent view-invariant action recognition and performance assessment approaches rely on a large amount of annotated 3D skeleton data to extract view-invariant features. However, acquiring 3D skeleton data can be cumbersome, if not impractical, in in-the-wild scenarios. To overcome this problem, we present a novel unsupervised approach that learns to extract view-invariant 3D human pose representation from a 2D image without using 3D joint data. Our model is trained by exploiting the intrinsic view-invariant properties of human pose between simultaneous frames from different viewpoints and their equivariant properties between augmented frames from the same viewpoint. We evaluate the learned view-invariant pose representations on two downstream tasks. We perform comparative experiments that show improvements over the state-of-the-art unsupervised cross-view action classification accuracy on NTU RGB+D by a significant margin, on both RGB and depth images. We also show the efficiency of transferring the learned representations from NTU RGB+D to obtain the first ever unsupervised cross-view and cross-subject rank correlation results on the multi-view human movement quality dataset, QMAR, and marginally improve on the state-of-the-art supervised results for this dataset. Finally, we carry out ablation studies to examine the contributions of the different components of our proposed network.
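The two self-supervision signals described above can be written down compactly. The sketch below uses toy 2D-pose inputs and an in-plane rotation as the augmentation; the encoder, joint count, and loss formulation are illustrative assumptions rather than the paper's architecture.

```python
# Sketch of the two training signals, under assumed shapes: representations
# of simultaneous frames from different views should agree (invariance), and
# rotating the input should rotate the predicted 3D pose (equivariance).
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

J = 17  # number of joints (assumed)
enc = nn.Sequential(nn.Linear(2 * J, 128), nn.ReLU(), nn.Linear(128, 3 * J))

def rot_z(theta):
    c, s = float(torch.cos(theta)), float(torch.sin(theta))
    return torch.tensor([[c, -s, 0.], [s, c, 0.], [0., 0., 1.]])

def losses(pose_view_a, pose_view_b):
    """Two (J, 2) detections of the same instant from different cameras."""
    za = enc(pose_view_a.reshape(-1)).reshape(J, 3)
    zb = enc(pose_view_b.reshape(-1)).reshape(J, 3)
    invariance = F.mse_loss(za, zb)  # same pose, different viewpoints

    theta = torch.rand(1) * math.pi
    R = rot_z(theta)
    rotated_in = pose_view_a @ R[:2, :2].t()      # rotate the 2D input
    z_rot = enc(rotated_in.reshape(-1)).reshape(J, 3)
    equivariance = F.mse_loss(z_rot, za @ R.t())  # representation rotates too
    return invariance, equivariance

print(losses(torch.randn(J, 2), torch.randn(J, 2)))
```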