
Showing papers by "Jana Kosecka published in 2021"


Proceedings ArticleDOI
01 Jan 2021
TL;DR: In this paper, a pose-guided pooling method was proposed for word-level sign recognition from American Sign Language (ASL) video; the method uses both motion and hand shape cues while being robust to variations in execution.
Abstract: Gestures in American Sign Language (ASL) are characterized by fast, highly articulate motion of the upper body, including arm movements with complex hand shapes and facial expressions. In this work, we propose a new method for word-level sign recognition from American Sign Language (ASL) video. Our method uses both motion and hand shape cues while being robust to variations in execution. We exploit knowledge of the body pose, estimated with an off-the-shelf pose estimator. Using the pose as a guide, we pool spatio-temporal feature maps from different layers of a 3D convolutional neural network. We train separate classifiers using pose-guided pooled features from different resolutions and fuse their prediction scores at test time. This leads to a significant improvement in performance on the WLASL benchmark dataset [25]. The proposed approach achieves 10%, 12%, 9.5%, and 6.5% performance gains on the WLASL100, WLASL300, WLASL1000, and WLASL2000 subsets, respectively. To demonstrate the robustness of the pose-guided pooling and the proposed fusion mechanism, we also evaluate our method by fine-tuning the model on another dataset. This yields a 10% performance improvement for the proposed method while using only 0.4% of the training data during the fine-tuning stage.
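The core idea of pose-guided pooling can be illustrated with a minimal sketch: instead of pooling a CNN feature map globally, features are aggregated only in small neighbourhoods around estimated body joints. The function name, window size, and toy data below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def pose_guided_pool(feature_map, joints, window=1):
    """Average-pool features in a small window around each joint.

    feature_map: (H, W, C) array from some CNN layer.
    joints: list of (row, col) joint coordinates in feature-map space.
    window: half-size of the pooling neighbourhood (assumed here).
    Returns the per-joint descriptors concatenated into one vector.
    """
    H, W, C = feature_map.shape
    pooled = []
    for r, c in joints:
        r0, r1 = max(0, r - window), min(H, r + window + 1)
        c0, c1 = max(0, c - window), min(W, c + window + 1)
        pooled.append(feature_map[r0:r1, c0:c1].mean(axis=(0, 1)))
    return np.concatenate(pooled)

# Toy example: an 8x8 feature map with 4 channels and two "joints".
fmap = np.random.rand(8, 8, 4)
feat = pose_guided_pool(fmap, [(2, 3), (5, 6)])
```

In the paper this pooling is applied to spatio-temporal feature maps at several network resolutions, with one classifier per resolution fused at test time; the sketch above shows only the spatial pooling step.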

25 citations


Proceedings ArticleDOI
07 Apr 2021
TL;DR: In this article, the loss function is regulated by reverse knowledge distillation, forcing each new member to learn different features and map to a latent space safely distanced from those of existing members.
Abstract: This paper proposes an ensemble learning model that is resistant to adversarial attacks. To build resilience, we introduce a training process in which each member learns a radically distinct latent space. Member models are added to the ensemble one at a time. Simultaneously, the loss function is regulated by reverse knowledge distillation, forcing the new member to learn different features and map to a latent space safely distanced from those of existing members. We assessed the security and performance of the proposed solution on image classification tasks using the CIFAR10 and MNIST datasets and showed improvements in both security and performance compared to state-of-the-art defense methods.
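Standard knowledge distillation pulls a student's representation toward a teacher's; the "reverse" term described here instead pushes a new member's latent code away from existing members'. A minimal sketch of such a penalty, assuming cosine similarity as the distance measure (the paper's exact formulation is not given in this abstract):

```python
import math

def cosine(u, v):
    """Cosine similarity between two latent vectors (as plain lists)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-8)

def reverse_distillation_loss(z_new, z_olds, lam=1.0):
    """Opposite of distillation: penalise *similarity* between the new
    member's latent code and those of existing members, so minimising
    this term pushes the new latent space away from theirs."""
    return lam * sum(cosine(z_new, z) for z in z_olds) / len(z_olds)

# Identical latents are strongly penalised; orthogonal ones are not.
same = reverse_distillation_loss([1.0, 0.0], [[1.0, 0.0]])
diff = reverse_distillation_loss([1.0, 0.0], [[0.0, 1.0]])
```

During training this penalty would be added to each new member's classification loss, so gradient descent trades accuracy against latent-space separation from the rest of the ensemble.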

7 citations


Posted Content
TL;DR: In this article, the authors exploit additional predictions of semantic segmentation models and quantify their confidences, then classify object hypotheses as known vs. unknown, out-of-distribution objects.
Abstract: Recent efforts in deploying Deep Neural Networks for object detection in real-world applications, such as autonomous driving, assume that all relevant object classes have been observed during training. Quantifying the performance of these models when the test data is not represented in the training set has mostly focused on pixel-level uncertainty estimation techniques for models trained for semantic segmentation. This paper proposes to exploit additional predictions of semantic segmentation models and quantify their confidences, followed by classification of object hypotheses as known vs. unknown, out-of-distribution objects. We use object proposals generated by a Region Proposal Network (RPN) and adapt distance-aware uncertainty estimation for semantic segmentation using Radial Basis Function Networks (RBFN) for class-agnostic object mask prediction. The augmented object proposals are then used to train a classifier for known vs. unknown object categories. Experimental results demonstrate that the proposed method achieves performance on par with state-of-the-art methods for unknown object detection and can also be used effectively to reduce object detectors' false positive rate. Our method is well suited for applications where the prediction of non-object background categories obtained by semantic segmentation is reliable.
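The distance-aware idea behind RBF-network uncertainty can be sketched simply: confidence is an RBF kernel evaluated against known-class centroids in feature space, so features far from every centroid score low and can be flagged as unknown. The centroids, sigma, and threshold below are illustrative assumptions, not values from the paper.

```python
import math

def rbf_confidence(feature, centroids, sigma=1.0):
    """Distance-aware confidence: the best RBF kernel response over all
    known-class centroids. Out-of-distribution features lie far from
    every centroid and therefore receive near-zero confidence."""
    best = 0.0
    for c in centroids:
        d2 = sum((f - ci) ** 2 for f, ci in zip(feature, c))
        best = max(best, math.exp(-d2 / (2 * sigma ** 2)))
    return best

# Toy 2-D feature space with two known-class centroids (assumed).
centroids = [[0.0, 0.0], [5.0, 5.0]]
conf_known = rbf_confidence([0.1, -0.1], centroids)    # near a centroid
conf_unknown = rbf_confidence([20.0, 20.0], centroids)  # far from all
```

In the paper this kind of confidence is computed for RPN object proposals; proposals with low confidence become candidates for the unknown / out-of-distribution class.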

4 citations


Posted Content
TL;DR: SLAW as discussed by the authors balances learning between tasks by estimating the magnitude of each task's gradient without performing any extra backward passes; it matches the performance of the best existing methods while being much more efficient.
Abstract: Multi-task learning (MTL) is a subfield of machine learning with important applications, but the multi-objective nature of optimization in MTL leads to difficulties in balancing training between tasks. The best MTL optimization methods require individually computing the gradient of each task's loss function, which impedes scalability to a large number of tasks. In this paper, we propose Scaled Loss Approximate Weighting (SLAW), a method for multi-task optimization that matches the performance of the best existing methods while being much more efficient. SLAW balances learning between tasks by estimating the magnitude of each task's gradient without performing any extra backward passes. We provide theoretical and empirical justification for SLAW's estimation of gradient magnitudes. Experimental results on non-linear regression, multi-task computer vision, and virtual screening for drug discovery demonstrate that SLAW is significantly more efficient than strong baselines without sacrificing performance and is applicable to a diverse range of domains.
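The efficiency argument can be made concrete with a toy sketch: instead of one backward pass per task to measure gradient magnitudes, keep cheap running statistics of each task's scalar loss and use the loss standard deviation as a proxy, down-weighting tasks whose gradients are estimated to be large. This is a simplified stand-in for SLAW, not the paper's exact estimator; the class name, EMA scheme, and normalisation are assumptions.

```python
import math

class SlawLikeWeighter:
    """Toy re-weighting in the spirit of SLAW: exponential moving
    averages of each task's loss and squared loss give a variance
    estimate, whose square root serves as a gradient-magnitude proxy.
    Tasks are weighted inversely so no single task dominates, with no
    extra backward passes."""

    def __init__(self, n_tasks, beta=0.9, eps=1e-8):
        self.beta = beta
        self.eps = eps
        self.m1 = [0.0] * n_tasks  # EMA of loss
        self.m2 = [0.0] * n_tasks  # EMA of squared loss

    def step(self, losses):
        weights = []
        for i, loss in enumerate(losses):
            self.m1[i] = self.beta * self.m1[i] + (1 - self.beta) * loss
            self.m2[i] = self.beta * self.m2[i] + (1 - self.beta) * loss ** 2
            var = max(self.m2[i] - self.m1[i] ** 2, 0.0)
            weights.append(1.0 / (math.sqrt(var) + self.eps))
        # Normalise so the weights sum to the number of tasks.
        total = sum(weights)
        return [w * len(losses) / total for w in weights]

# Task 0 has stable losses; task 1 is volatile and gets down-weighted.
weighter = SlawLikeWeighter(2)
weights = None
for l0, l1 in [(1.0, 10.0), (1.2, 2.0), (0.9, 9.0)]:
    weights = weighter.step([l0, l1])
```

Each training step would then minimise the weighted sum of task losses, so the cost of balancing stays constant in the number of tasks rather than requiring one backward pass per task.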