
Showing papers on "Task analysis" published in 2020


Proceedings ArticleDOI
14 Jun 2020
TL;DR: This paper investigates the relationships between vision-and-language tasks by developing a large-scale multi-task model, culminating in a single model trained on 12 datasets from four broad task categories: visual question answering, caption-based image retrieval, grounding referring expressions, and multimodal verification.
Abstract: Much of vision-and-language research focuses on a small but diverse set of independent tasks and supporting datasets, often studied in isolation; however, the visually-grounded language understanding skills required for success at these tasks overlap significantly. In this work, we investigate these relationships between vision-and-language tasks by developing a large-scale, multi-task model. Our approach culminates in a single model trained on 12 datasets from four broad task categories: visual question answering, caption-based image retrieval, grounding referring expressions, and multimodal verification. Compared to independently trained single-task models, this represents a reduction from approximately 3 billion parameters to 270 million while simultaneously improving performance by 2.05 points on average across tasks. We use our multi-task framework to perform an in-depth analysis of the effects of jointly training diverse tasks. Further, we show that finetuning task-specific models from our single multi-task model can lead to further improvements, achieving performance at or above the state of the art.
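
A minimal sketch of the shared-trunk, per-task-head pattern the abstract describes, in PyTorch; the task names, dimensions, and round-robin schedule are illustrative assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class MultiTaskModel(nn.Module):
    def __init__(self, task_output_dims, hidden=768):
        super().__init__()
        # Stand-in for a large shared vision-and-language encoder.
        self.trunk = nn.Sequential(nn.Linear(2048, hidden), nn.ReLU())
        # One lightweight head per task; everything below the heads is shared.
        self.heads = nn.ModuleDict(
            {task: nn.Linear(hidden, dim) for task, dim in task_output_dims.items()})

    def forward(self, features, task):
        return self.heads[task](self.trunk(features))

model = MultiTaskModel({"vqa": 3129, "retrieval": 1, "refexp": 4})
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(task, features, targets, loss_fn):
    # Round-robin over task loaders: each step updates the shared trunk with
    # one task's loss, which is how 12 single-task models collapse into one.
    opt.zero_grad()
    loss = loss_fn(model(features, task), targets)
    loss.backward()
    opt.step()
    return loss.item()

# Example step with dummy VQA features/labels.
train_step("vqa", torch.randn(4, 2048), torch.randint(0, 3129, (4,)),
           nn.CrossEntropyLoss())
```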

267 citations


Proceedings ArticleDOI
14 Jun 2020
TL;DR: Zhang et al. as mentioned in this paper proposed an object relational graph (ORG) based encoder, which captures more detailed interaction features to enrich visual representation, and designed a teacher-recommended learning (TRL) method to make full use of the successful external language model (ELM) to integrate the abundant linguistic knowledge into the caption model.
Abstract: Taking full advantage of the information from both vision and language is critical for the video captioning task. Existing models lack adequate visual representation because they neglect the interaction between objects, and suffer from insufficient training for content-related words due to the long-tailed problem. In this paper, we propose a complete video captioning system including both a novel model and an effective training strategy. Specifically, we propose an object relational graph (ORG) based encoder, which captures more detailed interaction features to enrich the visual representation. Meanwhile, we design a teacher-recommended learning (TRL) method that makes full use of a successful external language model (ELM) to integrate abundant linguistic knowledge into the caption model. The ELM generates semantically similar word proposals which extend the ground-truth words used for training, addressing the long-tailed problem. Experimental evaluations on three benchmarks (MSVD, MSR-VTT and VATEX) show the proposed ORG-TRL system achieves state-of-the-art performance. Extensive ablation studies and visualizations illustrate the effectiveness of our system.
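
As an illustration of the TRL idea, the following hedged sketch combines the usual cross-entropy with a KL term toward the ELM's soft word proposals; the top-k, temperature, and mixing weight are assumptions, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def trl_loss(caption_logits, gt_word, elm_logits, topk=5, alpha=0.5, tau=2.0):
    # Standard cross-entropy against the single ground-truth word.
    ce = F.cross_entropy(caption_logits, gt_word)

    # Build a soft distribution from the ELM's top-k word proposals.
    elm_probs = F.softmax(elm_logits / tau, dim=-1)
    vals, idx = elm_probs.topk(topk, dim=-1)
    soft = torch.zeros_like(elm_probs).scatter_(-1, idx, vals)
    soft = soft / soft.sum(dim=-1, keepdim=True)

    # KL term pulls the caption model toward the teacher's proposals,
    # giving rare (long-tailed) words extra training signal.
    kl = F.kl_div(F.log_softmax(caption_logits / tau, dim=-1), soft,
                  reduction="batchmean")
    return ce + alpha * kl

# Dummy usage: batch of 2, vocabulary of 100 words.
loss = trl_loss(torch.randn(2, 100), torch.tensor([3, 7]), torch.randn(2, 100))
```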

225 citations


Proceedings ArticleDOI
14 Jun 2020
TL;DR: In this article, the authors design a simple but surprisingly effective visual recognition benchmark for studying bias mitigation and provide a thorough analysis of a wide range of techniques, highlighting the shortcomings of popular adversarial training approaches for bias mitigation.
Abstract: Computer vision models learn to perform a task by capturing relevant statistics from training data. It has been shown that models learn spurious age, gender, and race correlations when trained for seemingly unrelated tasks like activity recognition or image captioning. Various mitigation techniques have been presented to prevent models from utilizing or learning such biases. However, there has been little systematic comparison between these techniques. We design a simple but surprisingly effective visual recognition benchmark for studying bias mitigation. Using this benchmark, we provide a thorough analysis of a wide range of techniques. We highlight the shortcomings of popular adversarial training approaches for bias mitigation, propose a simple but similarly effective alternative to the inference-time Reducing Bias Amplification method of Zhao et al., and design a domain-independent training technique that outperforms all other methods. Finally, we validate our findings on the attribute classification task in the CelebA dataset, where attribute presence is known to be correlated with the gender of people in the image, and demonstrate that the proposed technique is effective at mitigating real-world gender bias.

221 citations


Proceedings ArticleDOI
14 Jun 2020
TL;DR: This work proposes a new method to estimate the drift of features, called semantic drift, and compensate for it without the need for any exemplars, and shows that the proposed SDC, when combined with existing methods to prevent forgetting, consistently improves results.
Abstract: Class-incremental learning of deep networks sequentially increases the number of classes to be classified. During training, the network only has access to the data of one task at a time, where each task contains several classes. In this setting, networks suffer from catastrophic forgetting, which refers to the drastic drop in performance on previous tasks. The vast majority of methods have studied this scenario for classification networks, where for each new task the classification layer of the network must be augmented with additional weights to make room for the newly added classes. Embedding networks have the advantage that new classes can be naturally included into the network without adding new weights. Therefore, we study incremental learning for embedding networks. In addition, we propose a new method to estimate the drift, called semantic drift, of features and compensate for it without the need for any exemplars. We approximate the drift of previous tasks based on the drift that is experienced by current task data. We perform experiments on fine-grained datasets, CIFAR100 and ImageNet-Subset. We demonstrate that embedding networks suffer significantly less from catastrophic forgetting. We outperform existing methods which do not require exemplars and obtain competitive results compared to methods which store exemplars. Furthermore, we show that our proposed SDC, when combined with existing methods to prevent forgetting, consistently improves results.
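
The drift-compensation step can be sketched compactly; the Gaussian weighting follows the abstract's idea of approximating old-prototype drift from current-task data, with sigma as an assumed hyperparameter:

```python
import numpy as np

# Minimal sketch of semantic drift compensation (SDC): estimate how an old
# class prototype moved in embedding space using only current-task samples,
# embedded by the model before and after learning the new task.
def compensate_prototype(proto_old, z_before, z_after, sigma=0.3):
    delta = z_after - z_before                      # per-sample drift vectors
    d2 = np.sum((z_before - proto_old) ** 2, axis=1)
    w = np.exp(-d2 / (2 * sigma ** 2))              # closer samples weigh more
    drift = (w[:, None] * delta).sum(0) / (w.sum() + 1e-8)
    return proto_old + drift                        # compensated prototype

# Usage: embed N current-task samples with the old and updated models, then
# shift each stored class prototype -- no old exemplars are needed.
rng = np.random.default_rng(0)
z_before = rng.normal(size=(128, 64))
z_after = z_before + 0.1 * rng.normal(size=(128, 64))
proto = compensate_prototype(rng.normal(size=64), z_before, z_after)
```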

195 citations


Journal ArticleDOI
TL;DR: This paper considers a two-user MEC network, where each WD has a sequence of tasks to execute, and proves that the optimal offloading decisions follow a one-climb policy, based on which a reduced-complexity Gibbs sampling algorithm is proposed to obtain the optimal offloading decisions.
Abstract: Mobile-edge computing (MEC) has recently emerged as a cost-effective paradigm to enhance the computing capability of hardware-constrained wireless devices (WDs). In this paper, we first consider a two-user MEC network, where each WD has a sequence of tasks to execute. In particular, we consider task dependency between the two WDs, where the input of a task at one WD requires the final task output at the other WD. Under the considered task-dependency model, we study the optimal task offloading policy and resource allocation (e.g., offloading transmit power and local CPU frequencies) that minimize the weighted sum of the WDs’ energy consumption and task execution time. The problem is challenging due to the combinatorial nature of the offloading decisions among all tasks and the strong coupling with resource allocation. To tackle this problem, we first assume that the offloading decisions are given and derive closed-form expressions for the optimal offloading transmit power and local CPU frequencies. Then, an efficient bi-section search method is proposed to obtain the optimal solutions. Furthermore, we prove that the optimal offloading decisions follow a one-climb policy, based on which a reduced-complexity Gibbs sampling algorithm is proposed to obtain the optimal offloading decisions. We then extend the investigation to a general multi-user scenario, where the input of a task at one WD requires the final task outputs from multiple other WDs. Numerical results show that the proposed method can significantly outperform the other representative benchmarks and efficiently achieve low complexity with respect to the call graph size.
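
A hedged sketch of a Gibbs-sampling search over binary offloading decisions; the cooling schedule and toy cost are assumptions, whereas the paper evaluates each candidate with its closed-form optimal power and CPU allocation:

```python
import numpy as np

def gibbs_offloading(cost, n_tasks, iters=500, T0=1.0):
    x = np.zeros(n_tasks, dtype=int)       # 0 = local execution, 1 = offload
    best, best_cost = x.copy(), cost(x)
    for t in range(iters):
        T = T0 / np.log(2 + t)             # cooling schedule (assumed)
        i = np.random.randint(n_tasks)     # flip one task's decision at a time
        cand = x.copy()
        cand[i] ^= 1
        c0, c1 = cost(x), cost(cand)
        # Accept states with probability proportional to exp(-cost / T).
        p = np.exp(-c1 / T) / (np.exp(-c0 / T) + np.exp(-c1 / T))
        if np.random.rand() < p:
            x = cand
        if cost(x) < best_cost:
            best, best_cost = x.copy(), cost(x)
    return best, best_cost

# Toy cost: offloading is cheaper per task, but every local/offload switch
# costs extra -- echoing why contiguous "one-climb" plans tend to win.
toy = lambda x: 0.8 * x.sum() + 1.0 * (1 - x).sum() + 0.5 * np.abs(np.diff(x)).sum()
print(gibbs_offloading(toy, n_tasks=6))
```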

180 citations


Proceedings ArticleDOI
14 Jun 2020
TL;DR: AuxRN is introduced, a framework with four self-supervised auxiliary reasoning tasks that exploit additional training signals derived from semantic information in the environment, helping the agent acquire knowledge of semantic representations in order to reason about its activities and build a thorough perception of environments.
Abstract: Vision-Language Navigation (VLN) is a task where an agent learns to navigate following a natural language instruction. The key to this task is to perceive both the visual scene and the natural language sequentially. Conventional approaches fully exploit vision and language features in cross-modal grounding. However, the VLN task remains challenging, since previous works have implicitly neglected the rich semantic information contained in environments (such as navigation graphs or sub-trajectory semantics). In this paper, we introduce Auxiliary Reasoning Navigation (AuxRN), a framework with four self-supervised auxiliary reasoning tasks that exploit additional training signals derived from this semantic information. The auxiliary tasks have four reasoning objectives: explaining the previous actions, evaluating the trajectory consistency, estimating the progress, and predicting the next direction. As a result, these additional training signals help the agent acquire knowledge of semantic representations in order to reason about its activities and build a thorough perception of environments. Our experiments demonstrate that auxiliary reasoning tasks improve both the performance of the main task and the model's generalizability by a large margin. We further demonstrate empirically that an agent trained with self-supervised auxiliary reasoning tasks substantially outperforms the previous state-of-the-art method, making it the best existing approach on the standard benchmark.

165 citations


Proceedings ArticleDOI
14 Jun 2020
TL;DR: A dataset of varied and complex robot tasks, described in natural language, in terms of objects visible in a large set of real images, and a novel Interactive Navigator-Pointer model is proposed that provides a strong baseline on the task.
Abstract: One of the long-term challenges of robotics is to enable robots to interact with humans in the visual world via natural language, as humans are visual animals that communicate through language. Overcoming this challenge requires the ability to perform a wide variety of complex tasks in response to multifarious instructions from humans. In the hope that it might drive progress towards more flexible and powerful human interactions with robots, we propose a dataset of varied and complex robot tasks, described in natural language, in terms of objects visible in a large set of real images. Given an instruction, success requires navigating through a previously-unseen environment to identify an object. This represents a practical challenge, but one that closely reflects one of the core visual problems in robotics. Several state-of-the-art vision-and-language navigation and referring-expression models are tested to verify the difficulty of this new task, but none of them show promising results, because there are many fundamental differences between our task and previous ones. A novel Interactive Navigator-Pointer model is also proposed that provides a strong baseline on the task. The proposed model achieves the best performance on the unseen test split, but still leaves substantial room for improvement compared to human performance. Repository: https://github.com/YuankaiQi/REVERIE.

164 citations


Journal ArticleDOI
TL;DR: In this article, a convolutional neural network based on different word embeddings was evaluated and compared to a classification based on user-level linguistic metadata, which achieved state-of-the-art results in a current early detection task.
Abstract: Depression is ranked as the largest contributor to global disability and is also a major reason for suicide. Still, many individuals suffering from forms of depression are not treated for various reasons. Previous studies have shown that depression also has an effect on language usage and that many depressed individuals use social media platforms or the internet in general to get information or discuss their problems. This paper addresses the early detection of depression using machine learning models based on messages on a social platform. In particular, a convolutional neural network based on different word embeddings is evaluated and compared to a classification based on user-level linguistic metadata. An ensemble of both approaches is shown to achieve state-of-the-art results in a current early detection task. Furthermore, the currently popular ERDE score as a metric for early detection systems is examined in detail and its drawbacks in the context of shared tasks are illustrated. A slightly modified metric is proposed and compared to the original score. Finally, a new word embedding was trained on a large corpus of the same domain as the described task and is evaluated as well.
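
For reference, the ERDE_o metric the paper examines is commonly defined as follows (Losada & Crestani, 2016); the cost constants used here are the usual shared-task values and should be treated as assumptions:

```python
import math

# ERDE_o (Early Risk Detection Error): a true positive is penalized by a
# latency cost that grows with k, the number of user writings observed
# before the decision was made; o controls where the penalty kicks in.
def erde(decision, truth, k, o, c_fp=0.1296, c_fn=1.0, c_tp=1.0):
    if decision and not truth:
        return c_fp
    if not decision and truth:
        return c_fn
    if decision and truth:
        latency_cost = 1 - 1 / (1 + math.exp(k - o))   # ~0 if early, ~1 if late
        return latency_cost * c_tp
    return 0.0

# A correct but late detection (k=60, o=50) is penalized almost like a miss,
# which is one of the drawbacks the paper discusses.
print(round(erde(True, True, k=60, o=50), 4))   # ~1.0
print(round(erde(True, True, k=5, o=50), 4))    # ~0.0
```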

152 citations


Journal ArticleDOI
TL;DR: In this paper, a teacher-student curriculum learning (TSCL) framework is proposed, where the student tries to learn a complex task, and the teacher automatically chooses subtasks from a given set for the student to train on.
Abstract: We propose Teacher–Student Curriculum Learning (TSCL), a framework for automatic curriculum learning, where the Student tries to learn a complex task, and the Teacher automatically chooses subtasks from a given set for the Student to train on. We describe a family of Teacher algorithms that rely on the intuition that the Student should practice more those tasks on which it makes the fastest progress, i.e., where the slope of the learning curve is highest. In addition, the Teacher algorithms address the problem of forgetting by also choosing tasks where the Student’s performance is getting worse. We demonstrate that TSCL matches or surpasses the results of carefully hand-crafted curricula in two tasks: addition of decimal numbers with long short-term memory (LSTM) and navigation in Minecraft. Our automatically ordered curriculum of submazes enabled the Student to solve a Minecraft maze that could not be solved at all when training directly on that maze, and learning was an order of magnitude faster than with uniform sampling of those submazes.
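
A minimal sketch of a TSCL-style Teacher as an epsilon-greedy bandit over the absolute slope of each task's learning curve; the exponential smoothing is an assumption of this sketch:

```python
import random

class Teacher:
    def __init__(self, n_tasks, eps=0.1, lr=0.3):
        self.slope = [0.0] * n_tasks   # smoothed learning-curve slopes
        self.last = [None] * n_tasks   # last observed score per task
        self.eps, self.lr = eps, lr

    def choose(self):
        # Explore occasionally; otherwise pick the task with the largest
        # absolute progress, so fast-improving AND fast-forgetting tasks
        # both get practiced.
        if random.random() < self.eps:
            return random.randrange(len(self.slope))
        return max(range(len(self.slope)), key=lambda i: abs(self.slope[i]))

    def update(self, task, score):
        if self.last[task] is not None:
            delta = score - self.last[task]
            # Exponential moving average of the slope.
            self.slope[task] += self.lr * (delta - self.slope[task])
        self.last[task] = score

# Usage: the Student trains on teacher.choose(), then reports its score back
# via teacher.update(task, score) after each training episode.
```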

146 citations


Proceedings ArticleDOI
14 Jun 2020
TL;DR: In this article, task-specific gating modules are used to select which filters to apply on the given input, ensuring no loss in the performance of the model for previously learned tasks.
Abstract: Convolutional Neural Networks experience catastrophic forgetting when optimized on a sequence of learning problems: as they meet the objective of the current training examples, their performance on previous tasks drops drastically. In this work, we introduce a novel framework to tackle this problem with conditional computation. We equip each convolutional layer with task-specific gating modules, selecting which filters to apply on the given input. This way, we achieve two appealing properties. Firstly, the execution patterns of the gates allow us to identify and protect important filters, ensuring no loss in the performance of the model for previously learned tasks. Secondly, by using a sparsity objective, we can promote the selection of a limited set of kernels, allowing the model to retain sufficient capacity to digest new tasks. Existing solutions require, at test time, awareness of the task to which each example belongs. This knowledge, however, may not be available in many practical scenarios. Therefore, we additionally introduce a task classifier that predicts the task label of each example, to deal with settings in which a task oracle is not available. We validate our proposal on four continual learning datasets. Results show that our model consistently outperforms existing methods both in the presence and in the absence of a task oracle. Notably, on the Split SVHN and ImageNet-50 datasets, our model yields up to 23.98% and 17.42% improvements in accuracy w.r.t. competing methods.
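
A hedged sketch of per-task channel gating on one convolutional layer; the straight-through binarization and the sparsity term are assumptions standing in for the paper's exact gating modules:

```python
import torch
import torch.nn as nn

class TaskGatedConv(nn.Module):
    def __init__(self, in_ch, out_ch, n_tasks):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        # One learnable gate vector over output filters per task.
        self.gate_logits = nn.Parameter(torch.zeros(n_tasks, out_ch))

    def forward(self, x, task, hard=True):
        g = torch.sigmoid(self.gate_logits[task])
        if hard:  # binarize forward, keep gradients via straight-through
            g = (g > 0.5).float() + g - g.detach()
        # Filters gated off for this task are untouched by its gradients,
        # which is what protects filters important to earlier tasks.
        return self.conv(x) * g.view(1, -1, 1, 1)

    def sparsity_loss(self, task):
        # Encourages each task to select only a limited set of filters.
        return torch.sigmoid(self.gate_logits[task]).mean()

layer = TaskGatedConv(3, 16, n_tasks=4)
out = layer(torch.randn(2, 3, 32, 32), task=0)
```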

136 citations


Proceedings ArticleDOI
14 Jun 2020
TL;DR: This work presents a novel technique, coined implicit affordances, to effectively leverage RL for urban driving, including lane keeping, pedestrian and vehicle avoidance, and traffic light detection, and is the first to present a successful RL agent handling such a complex task, especially regarding traffic light detection.
Abstract: Reinforcement Learning (RL) aims at learning an optimal behavior policy from an agent's own experiments, as opposed to rule-based control methods. However, no RL algorithm has yet been capable of handling a task as difficult as urban driving. We present a novel technique, coined implicit affordances, to effectively leverage RL for urban driving, including lane keeping, pedestrian and vehicle avoidance, and traffic light detection. To our knowledge, we are the first to present a successful RL agent handling such a complex task, especially regarding traffic light detection. Furthermore, we demonstrated the effectiveness of our method by winning the Camera Only track of the CARLA challenge.

Journal ArticleDOI
16 Jan 2020
TL;DR: The proposed Improved WOA for Cloud task scheduling (IWC) has better convergence speed and accuracy in searching for the optimal task scheduling plans, compared to the current metaheuristic algorithms, and can also achieve better performance on system resource utilization.
Abstract: Task scheduling in cloud computing can directly affect the resource usage and operational cost of a system. To improve the efficiency of task executions in a cloud, various metaheuristic algorithms, as well as their variations, have been proposed to optimize the scheduling. In this article, for the first time, we apply the latest metaheuristic, the whale optimization algorithm (WOA), to cloud task scheduling with a multiobjective optimization model, aiming at improving the performance of a cloud system with given computing resources. On that basis, we propose an advanced approach called Improved WOA for Cloud task scheduling (IWC) to further improve the optimal-solution search capability of the WOA-based method. We present the detailed implementation of IWC, and our simulation-based experiments show that the proposed IWC has better convergence speed and accuracy in searching for the optimal task scheduling plans, compared to the current metaheuristic algorithms. Moreover, it can also achieve better performance on system resource utilization, in the presence of both small- and large-scale tasks.
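
For orientation, a generic sketch of the core WOA update that IWC builds on; the paper's specific improvements are not reproduced, and in a scheduling setting a position vector would be decoded into a task-to-resource assignment:

```python
import numpy as np

def woa(fitness, dim, n=20, iters=100, lb=0.0, ub=1.0, b=1.0):
    X = np.random.uniform(lb, ub, (n, dim))
    best = min(X, key=fitness).copy()
    for t in range(iters):
        a = 2 - 2 * t / iters                      # decreases linearly 2 -> 0
        for i in range(n):
            A = 2 * a * np.random.rand() - a
            C = 2 * np.random.rand()
            if np.random.rand() < 0.5:
                if abs(A) < 1:                     # exploit: encircle the best
                    X[i] = best - A * np.abs(C * best - X[i])
                else:                              # explore: follow a random whale
                    Xr = X[np.random.randint(n)]
                    X[i] = Xr - A * np.abs(C * Xr - X[i])
            else:                                  # bubble-net spiral update
                l = np.random.uniform(-1, 1)
                X[i] = np.abs(best - X[i]) * np.exp(b * l) * np.cos(2 * np.pi * l) + best
            X[i] = np.clip(X[i], lb, ub)
            if fitness(X[i]) < fitness(best):
                best = X[i].copy()
    return best

# Toy use: minimize a sphere function in place of a scheduling objective.
print(woa(lambda x: float(np.sum(x ** 2)), dim=5).round(3))
```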

Journal ArticleDOI
18 Feb 2020
TL;DR: RLBench as discussed by the authors is a large-scale benchmark and learning environment for robot learning, featuring 100 hand-designed tasks ranging from simple target reaching and door opening to longer multi-stage tasks such as opening an oven and placing a tray in it, and it is the basis for the first large-scale few-shot challenge in robotics.
Abstract: We present a challenging new benchmark and learning environment for robot learning: RLBench. The benchmark features 100 completely unique, hand-designed tasks, ranging in difficulty from simple target reaching and door opening to longer multi-stage tasks, such as opening an oven and placing a tray in it. We provide an array of both proprioceptive observations and visual observations, which include RGB, depth, and segmentation masks from an over-the-shoulder stereo camera and an eye-in-hand monocular camera. Uniquely, each task comes with an infinite supply of demos through the use of motion planners operating on a series of waypoints given at task-creation time, enabling an exciting flurry of demonstration-based learning possibilities. RLBench has been designed with scalability in mind; new tasks, along with their motion-planned demos, can be easily created and then verified by a series of tools, allowing users to submit their own tasks to the RLBench task repository. This large-scale benchmark aims to accelerate progress in a number of vision-guided manipulation research areas, including reinforcement learning, imitation learning, multi-task learning, geometric computer vision, and, in particular, few-shot learning. With the benchmark's breadth of tasks and demonstrations, we propose the first large-scale few-shot challenge in robotics. We hope that the scale and diversity of RLBench offers unparalleled research opportunities in the robot learning community and beyond. Benchmarking code and videos can be found at https://sites.google.com/view/rlbench.

Proceedings ArticleDOI
14 Jun 2020
TL;DR: This article proposed a task-agnostic meta-learning approach that learns a set of generalized parameters that are neither specific to old nor new tasks, which is ensured by a new meta-update rule which avoids catastrophic forgetting.
Abstract: Humans can continuously learn new knowledge as their experience grows. In contrast, previous learning in deep neural networks can quickly fade out when they are trained on a new task. In this paper, we hypothesize this problem can be avoided by learning a set of generalized parameters that are neither specific to old nor new tasks. In this pursuit, we introduce a novel meta-learning approach that seeks to maintain an equilibrium between all the encountered tasks. This is ensured by a new meta-update rule which avoids catastrophic forgetting. In comparison to previous meta-learning techniques, our approach is task-agnostic. When presented with a continuum of data, our model automatically identifies the task and quickly adapts to it with just a single update. We perform extensive experiments on five datasets in a class-incremental setting, leading to significant improvements over state-of-the-art methods (e.g., a 21.3% boost on CIFAR100 with 10 incremental tasks). Specifically, on large-scale datasets that generally prove difficult cases for incremental learning, our approach delivers absolute gains as high as 19.1% and 7.4% on the ImageNet and MS-Celeb datasets, respectively.

Journal ArticleDOI
TL;DR: A self-regulated EMTO (SREMTO) algorithm is proposed to automatically adapt the intensity of cross-task knowledge transfer to different and varying degrees of relatedness between different tasks as the search proceeds so that the useful knowledge in common for solving related tasks can be captured, shared, and utilized to a great extent.
Abstract: Evolutionary multitask optimization (EMTO) is a newly emerging research area in the field of evolutionary computation. It investigates how to solve multiple optimization problems (tasks) at the same time via evolutionary algorithms (EAs) to improve on the performance of solving each task independently, assuming if some component tasks are related then the useful knowledge (e.g., promising candidate solutions) acquired during the process of solving one task may assist in (and also benefit from) solving the other tasks. In EMTO, task relatedness is typically unknown in advance and needs to be captured via EA’s population. Since the population of an EA can only cover a subregion of the solution space and keeps evolving during the search, thus captured task relatedness is local and dynamic. The multifactorial EA (MFEA) is one of the most representative EMTO techniques, inspired by the bio-cultural model of multifactorial inheritance, which transmits both biological and cultural traits from the parents to the offspring. MFEA has succeeded in solving various multitask optimization (MTO) problems. However, the intensity of knowledge transfer in MFEA is determined via its algorithmic configuration without considering the degree of task relatedness, which may prevent the effective sharing and utilization of the useful knowledge acquired in related tasks. To address this issue, we propose a self-regulated EMTO (SREMTO) algorithm to automatically adapt the intensity of cross-task knowledge transfer to different and varying degrees of relatedness between different tasks as the search proceeds so that the useful knowledge in common for solving related tasks can be captured, shared, and utilized to a great extent. We compare SREMTO with MFEA and its variants as well as the single-task optimization counterpart of SREMTO on two MTO test suites, which demonstrates the superiority of SREMTO.

Journal ArticleDOI
TL;DR: A deep reinforcement learning (DRL) framework based on the actor-critic learning structure is proposed, which achieves up to 99.1% of the optimal performance while significantly reducing the computational complexity compared to the existing optimization methods.
Abstract: In this paper, we consider a mobile-edge computing (MEC) system, where an access point (AP) assists a mobile device (MD) to execute an application consisting of multiple tasks following a general task call graph. The objective is to jointly determine the offloading decision of each task and the resource allocation (e.g., CPU computing power) under time-varying wireless fading channels and stochastic edge computing capability, so that the energy-time cost (ETC) of the MD is minimized. Solving the problem is particularly hard due to the combinatorial offloading decisions and the strong coupling among task executions under the general dependency model. Conventional numerical optimization methods are inefficient to solve such a problem, especially when the problem size is large. To address the issue, we propose a deep reinforcement learning (DRL) framework based on the actor-critic learning structure. In particular, the actor network utilizes a DNN to learn the optimal mapping from the input states (i.e., wireless channel gains and edge CPU frequency) to the binary offloading decision of each task. Meanwhile, by analyzing the structure of the optimal solution, we derive a low-complexity algorithm for the critic network to quickly evaluate the ETC performance of the offloading decisions output by the actor network. With the low-complexity critic network, we can quickly select the best offloading action and subsequently store the state-action pair in an experience replay memory as the training dataset to continuously improve the action-generation DNN. To further reduce the complexity, we show that the optimal offloading decision exhibits a one-climb structure, which can be utilized to significantly reduce the search space of action generation. Numerical results show that for various types of task graphs, the proposed algorithm achieves up to 99.1% of the optimal performance while significantly reducing the computational complexity compared to the existing optimization methods.
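
A hedged sketch of the actor-critic loop described above: the actor DNN proposes a relaxed offloading vector, nearby binary candidates are generated, and a fast critic (a toy stand-in here for the paper's low-complexity ETC evaluation) selects the best action, which also serves as the training label:

```python
import torch
import torch.nn as nn

n_tasks = 4
actor = nn.Sequential(nn.Linear(n_tasks + 1, 64), nn.ReLU(),
                      nn.Linear(64, n_tasks), nn.Sigmoid())
opt = torch.optim.Adam(actor.parameters(), lr=1e-3)

def candidates(relaxed, k=8):
    # Quantization heuristic (assumed): flip the least-confident bits first.
    base = (relaxed > 0.5).int()
    idx = torch.argsort(torch.abs(relaxed - 0.5))[:k]
    cands = [base.clone()]
    for i in idx:
        c = cands[-1].clone(); c[i] ^= 1; cands.append(c)
    return cands

def etc(state, action):
    # Toy stand-in for the critic's fast energy-time cost evaluation.
    return float(torch.sum(action) * 0.7 + torch.sum(1 - action) * state[-1])

def step(state):
    relaxed = actor(state)
    best = min(candidates(relaxed.detach()), key=lambda a: etc(state, a))
    # Train the actor toward the best candidate (replay memory omitted).
    loss = nn.functional.binary_cross_entropy(relaxed, best.float())
    opt.zero_grad(); loss.backward(); opt.step()
    return best

step(torch.rand(n_tasks + 1))   # state = channel gains + edge CPU frequency
```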

Proceedings ArticleDOI
14 Jun 2020
TL;DR: The authors propose a multimodal transformer architecture accompanied by a rich representation for text in images, which naturally fuses different modalities homogeneously by embedding them into a common semantic space where self-attention is applied to model inter- and intra-modality context.
Abstract: Many visual scenes contain text that carries crucial information, and it is thus essential to understand text in images for downstream reasoning tasks. For example, a "deep water" label on a warning sign warns people about the danger in the scene. Recent work has explored the TextVQA task, which requires reading and understanding text in images to answer a question. However, existing approaches for TextVQA are mostly based on custom pairwise fusion mechanisms between a pair of modalities and are restricted to a single prediction step by casting TextVQA as a classification task. In this work, we propose a novel model for the TextVQA task based on a multimodal transformer architecture accompanied by a rich representation for text in images. Our model naturally fuses the different modalities homogeneously by embedding them into a common semantic space, where self-attention is applied to model inter- and intra-modality context. Furthermore, it enables iterative answer decoding with a dynamic pointer network, allowing the model to form an answer through multi-step prediction instead of one-step classification. Our model outperforms existing approaches on three benchmark datasets for the TextVQA task by a large margin.

Journal ArticleDOI
TL;DR: Zhang et al. as mentioned in this paper proposed to integrate the opportunistic and participatory modes in a two-phased hybrid framework called HyTasker, which jointly optimizes them with a total incentive budget constraint.
Abstract: Task allocation is a major challenge in Mobile Crowd Sensing (MCS). While previous task allocation approaches follow either the opportunistic or participatory mode, this paper proposes to integrate these two complementary modes in a two-phased hybrid framework called HyTasker. In the offline phase, a group of workers (called opportunistic workers) are selected, and they complete MCS tasks during their daily routines (i.e., opportunistic mode). In the online phase, we assign another set of workers (called participatory workers) and require them to move specifically to perform tasks that are not completed by the opportunistic workers (i.e., participatory mode). Instead of considering these two phases separately, HyTasker jointly optimizes them with a total incentive budget constraint. In particular, when selecting opportunistic workers in the offline phase of HyTasker, we propose a novel algorithm that simultaneously considers the predicted task assignment for the participatory workers, in which the density and mobility of participatory workers are taken into account. Experiments on two real-world mobility datasets demonstrate that HyTasker outperforms other methods with more completed tasks under the same budget constraint.

Journal ArticleDOI
TL;DR: A task-based taxonomy of 154 cognitive biases organized in 7 main categories is proposed that will help visualization researchers relate their design to the corresponding possible biases, and lead to new research that detects and addresses biased judgment and decision making in data visualization.
Abstract: Information visualization designers strive to design data displays that allow for efficient exploration, analysis, and communication of patterns in data, leading to informed decisions. Unfortunately, human judgment and decision making are imperfect and often plagued by cognitive biases. There is limited empirical research documenting how these biases affect visual data analysis activities. Existing taxonomies are organized by cognitive theories that are hard to associate with visualization tasks. Based on a survey of the literature we propose a task-based taxonomy of 154 cognitive biases organized in 7 main categories. We hope the taxonomy will help visualization researchers relate their design to the corresponding possible biases, and lead to new research that detects and addresses biased judgment and decision making in data visualization.

BookDOI
13 Feb 2020
TL;DR: An edited volume whose chapters include "Proficient Autonomous Learning: Problems and Prospects" (J.W. Thomas and W.D. Rohwer, Jr.), "Toward Integrated Curricula: Possibilities From Anchored Instruction" (the Cognition and Technology Group at Vanderbilt), and "Cognitive Task Analysis as a Basis for Instructional Design" (B. Means).
Abstract: Contents: Preface. J.W. Thomas, W.D. Rohwer, Jr., Proficient Autonomous Learning: Problems and Prospects. The Cognition and Technology Group at Vanderbilt, Toward Integrated Curricula: Possibilities From Anchored Instruction. D. Cervone, The Role of Self-Referent Cognitions in Goal Setting, Motivation, and Performance. B. Means, Cognitive Task Analysis as a Basis for Instructional Design. G. Gabrys, A. Weiner, A. Lesgold, Learning by Problem Solving in a Coached Apprenticeship System. A.C. Graesser, N.K. Person, J. Huber, Question Asking During Tutoring and in the Design of Educational Software. S.J. Ceci, A.I. Ruiz, Inserting Context into our Thinking About Thinking: Implications for a Theory of Everyday Intelligent Behavior. A. Elstein, M. Rabinowitz, Medical Cognition: Research and Evaluation. G.A. Klein, R.R. Hoffman, Seeing the Invisible: Perceptual-Cognitive Aspects of Expertise.

Journal ArticleDOI
TL;DR: This paper presents an original hardware/software architecture, the Motion Analysis System (MAS), for digitizing and analyzing the human body during the execution of manufacturing/assembly tasks at a common industrial workstation.

Journal ArticleDOI
01 Jun 2020
TL;DR: In the proposed MaTEA, an adaptive selection mechanism selects a suitable “assisted” task for a given task by considering the similarity between tasks and the accumulated rewards of knowledge transfer during the evolution.
Abstract: Multi-task optimization is an emerging research topic in the computational intelligence community. In this paper, we propose a novel evolutionary framework, the many-task evolutionary algorithm (MaTEA), for many-task optimization. In the proposed MaTEA, an adaptive selection mechanism is proposed to select a suitable “assisted” task for a given task by considering the similarity between tasks and the accumulated rewards of knowledge transfer during the evolution. Besides, a knowledge transfer scheme via crossover is adopted to exchange information among tasks to improve search efficiency. In addition, to facilitate measuring similarity between tasks and transferring knowledge among tasks that arrive at different time instances, multiple archives are integrated into the proposed MaTEA. Experiments on both single-objective and multi-objective optimization problems have demonstrated that the proposed MaTEA can outperform state-of-the-art multi-task evolutionary algorithms in terms of search efficiency and solution accuracy. Besides, the proposed MaTEA is also capable of solving dynamic many-task optimization where tasks arrive at different time instances.

Journal ArticleDOI
TL;DR: A Multi-granularity Image-text Alignments (MIA) model is proposed to alleviate the cross-modal fine-grained problem for better similarity evaluation in description-based person Re-id and obtains the state-of-the-art performance on the CUHK-PEDES dataset.
Abstract: Description-based person re-identification (Re-id) is an important task in video surveillance that requires discriminative cross-modal representations to distinguish different people. It is difficult to directly measure the similarity between images and descriptions due to the modality heterogeneity (the cross-modal problem). Moreover, the fact that all samples belong to a single category (the fine-grained problem) makes this task even harder than the conventional image-description matching task. In this paper, we propose a Multi-granularity Image-text Alignments (MIA) model to alleviate the cross-modal fine-grained problem for better similarity evaluation in description-based person Re-id. Specifically, three different granularities, i.e., global-global, global-local and local-local alignments, are carried out hierarchically. Firstly, the global-global alignment in the Global Contrast (GC) module matches the global contexts of images and descriptions. Secondly, the global-local alignment employs the potential relations between local components and global contexts to highlight the distinguishable components while adaptively eliminating the uninvolved ones in the Relation-guided Global-local Alignment (RGA) module. Thirdly, for the local-local alignment, we match visual human parts with noun phrases in the Bi-directional Fine-grained Matching (BFM) module. The whole network combining multiple granularities can be trained end-to-end without complex pre-processing. To address the difficulties in training the combination of multiple granularities, an effective step training strategy is proposed to train these granularities step-by-step. Extensive experiments and analysis have shown that our method obtains state-of-the-art performance on the CUHK-PEDES dataset and outperforms the previous methods by a significant margin.

Journal ArticleDOI
TL;DR: This work proposes a joint learning framework called Self-Paced Fine-Tuning Network (SPFTN) for localizing and segmenting objects in weakly labelled videos and achieves superior performance when compared with other state-of-the-art methods and the baseline networks/models.
Abstract: Object localization and segmentation in weakly labeled videos are two interesting yet challenging tasks. Models built for simultaneous object localization and segmentation have been explored in the conventional fully supervised learning scenario to boost the performance of each task. However, none of the existing works has attempted to jointly learn object localization and segmentation models under weak supervision. To this end, we propose a joint learning framework called Self-Paced Fine-Tuning Network (SPFTN) for localizing and segmenting objects in weakly labelled videos. Learning the deep model jointly for object localization and segmentation under weak supervision is very challenging as the learning process of each single task would face serious ambiguity issue due to the lack of bounding-box or pixel-level supervision. To address this problem, our proposed deep SPFTN model is carefully designed with a novel multi-task self-paced learning objective, which leverages the task-specific prior knowledge and the knowledge that has been already captured to infer the confident training samples for each task. By aggregating the confident knowledge from each single task to mine reliable patterns and learning deep feature representation for both tasks, the proposed learning framework can address the ambiguity issue under weak supervision with simple optimization. Comprehensive experiments on the large-scale YouTube-Objects and DAVIS datasets demonstrate that the proposed approach achieves superior performance when compared with other state-of-the-art methods and the baseline networks/models.

Journal ArticleDOI
TL;DR: This paper proposes a failure prediction algorithm based on multi-layer Bidirectional Long Short-Term Memory (Bi-LSTM) to identify task and job failures in the cloud.
Abstract: A large-scale cloud data center needs to provide high service reliability and availability with a low failure occurrence probability. However, current large-scale cloud data centers still face high failure rates due to many reasons such as hardware and software failures, which often result in task and job failures. Such failures can severely reduce the reliability of cloud services and also occupy huge amounts of resources to recover the service from failures. Therefore, it is important to predict task or job failures before they occur with high accuracy to avoid unexpected wastage. Many machine learning and deep learning based methods have been proposed for task or job failure prediction by analyzing past system message logs and identifying the relationship between the data and the failures. To further improve the failure prediction accuracy of previous machine learning and deep learning based methods, in this paper we propose a failure prediction algorithm based on multi-layer Bidirectional Long Short-Term Memory (Bi-LSTM) to identify task and job failures in the cloud. Trace-driven experiments show that our algorithm outperforms other state-of-the-art prediction methods, with 93% and 87% accuracy for task failures and job failures, respectively.
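
A minimal PyTorch sketch of a multi-layer Bi-LSTM failure predictor over per-interval task features; the feature dimension, layer sizes, and classification head are assumptions:

```python
import torch
import torch.nn as nn

class BiLSTMFailurePredictor(nn.Module):
    def __init__(self, n_features=12, hidden=64, layers=2):
        super().__init__()
        # Multi-layer bidirectional LSTM over the task's time series
        # (e.g., per-interval resource usage and log-derived features).
        self.lstm = nn.LSTM(n_features, hidden, num_layers=layers,
                            bidirectional=True, batch_first=True, dropout=0.2)
        self.head = nn.Linear(2 * hidden, 1)   # 2x for the two directions

    def forward(self, x):                       # x: (batch, time, features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])            # failure logit from last step

model = BiLSTMFailurePredictor()
logits = model(torch.randn(8, 30, 12))          # 8 tasks, 30 time steps
loss = nn.functional.binary_cross_entropy_with_logits(
    logits.squeeze(1), torch.randint(0, 2, (8,)).float())
```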

Proceedings ArticleDOI
14 Jun 2020
TL;DR: This work proposes a composite transformer that can be seamlessly plugged into a CNN to selectively preserve and transform the visual features conditioned on language semantics, thus yielding an expressive representation for effective image search.
Abstract: Image search with text feedback has promising impacts in various real-world applications, such as e-commerce and internet search. Given a reference image and text feedback from the user, the goal is to retrieve images that not only resemble the input image, but also change certain aspects in accordance with the given text. This is a challenging task as it requires the synergistic understanding of both image and text. In this work, we tackle this task with a novel Visiolinguistic Attention Learning (VAL) framework. Specifically, we propose a composite transformer that can be seamlessly plugged into a CNN to selectively preserve and transform the visual features conditioned on language semantics. By inserting multiple composite transformers at varying depths, VAL is incentivized to encapsulate the multi-granular visiolinguistic information, thus yielding an expressive representation for effective image search. We conduct comprehensive evaluation on three datasets: Fashion200k, Shoes and FashionIQ. Extensive experiments show our model exceeds existing approaches on all datasets, demonstrating consistent superiority in coping with various kinds of text feedback, including attribute-like and natural language descriptions.

Proceedings ArticleDOI
14 Jun 2020
TL;DR: This work proposes task-and-layer-wise attenuation on the compromised initialization of model-agnostic meta-learning to reduce its influence and names the method as L2F (Learn to Forget).
Abstract: Few-shot learning is a challenging problem where the goal is to achieve generalization from only a few examples. Model-agnostic meta-learning (MAML) tackles the problem by formulating prior knowledge as a common initialization across tasks, which is then used to quickly adapt to unseen tasks. However, forcibly sharing an initialization can lead to conflicts among tasks and a compromised (undesired by some tasks) location on the optimization landscape, thereby hindering task adaptation. Further, we observe that the degree of conflict differs not only among tasks but also among layers of a neural network. Thus, we propose task-and-layer-wise attenuation of the compromised initialization to reduce its influence. As the attenuation dynamically controls (or selectively forgets) the influence of prior knowledge for a given task and each layer, we name our method L2F (Learn to Forget). The experimental results demonstrate that the proposed method provides faster adaptation and greatly improves performance. Furthermore, L2F can be easily applied to and improves other state-of-the-art MAML-based frameworks, illustrating its simplicity and generalizability.
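
A hedged sketch of the attenuation step: each layer of the shared initialization is scaled by a task-conditioned factor before inner-loop adaptation; feeding per-layer gradient norms from the task's support loss to a tiny attenuator network is an assumption of this sketch:

```python
import torch
import torch.nn as nn

def attenuated_init(shared_params, support_grads, attenuator):
    # One statistic per layer summarizes how the task "pulls" on that layer.
    feats = torch.stack([g.norm() for g in support_grads])
    gamma = torch.sigmoid(attenuator(feats))       # per-layer factor in (0, 1)
    # Scale each layer's initialization: gamma near 0 "forgets" conflicting
    # prior knowledge, gamma near 1 keeps it.
    return [gamma[i] * p for i, p in enumerate(shared_params)]

n_layers = 4
attenuator = nn.Linear(n_layers, n_layers)         # tiny meta-network
params = [torch.randn(8, 8, requires_grad=True) for _ in range(n_layers)]
grads = [torch.randn(8, 8) for _ in range(n_layers)]
adapted_start = attenuated_init(params, grads, attenuator)
# The inner loop then takes gradient steps from adapted_start; the outer loop
# meta-trains both the shared initialization and the attenuator across tasks.
```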

Journal ArticleDOI
TL;DR: In this article, self-supervision is used to learn a compact and multimodal representation of sensory inputs, which can then be used to improve the sample efficiency of policy learning.
Abstract: Contact-rich manipulation tasks in unstructured environments often require both haptic and visual feedback. It is nontrivial to manually design a robot controller that combines these modalities, which have very different characteristics. While deep reinforcement learning has shown success in learning control policies for high-dimensional inputs, these algorithms are generally intractable to train directly on real robots due to sample complexity. In this article, we use self-supervision to learn a compact and multimodal representation of our sensory inputs, which can then be used to improve the sample efficiency of our policy learning. Evaluating our method on a peg insertion task, we show that it generalizes over varying geometries, configurations, and clearances, while being robust to external perturbations. We also systematically study different self-supervised learning objectives and representation learning architectures. Results are presented in simulation and on a physical robot.

Journal ArticleDOI
TL;DR: A task-oriented user selection incentive mechanism (TRIM) is proposed, in an effort toward a task-centered design framework in MCS, which achieves feasible and efficient user selection while ensuring the privacy and security of the sensing user in M CS.
Abstract: The designs of existing incentive mechanisms in mobile crowdsensing (MCS) are primarily platform-centered or user-centered, overlooking the multidimensional consideration of sensing-task requirements. As a result, user selection fails to effectively address the task requirements or the relevant maximization and diversification problems. To tackle these issues, in this paper, with the aid of edge computing, we propose a task-oriented user selection incentive mechanism (TRIM), in an effort toward a task-centered design framework in MCS. Initially, an edge node is deployed to publish the sensing task according to its requirements and constructs a task vector from multiple dimensions to maximize the satisfaction of the task requirements. Meanwhile, a sensing user constructs a user vector to formalize personalized preferences for participating in the task response. Furthermore, by introducing a privacy-preserving cosine similarity computing protocol, the similarity between the task vector and the user vector can be calculated, and a target user candidate set can subsequently be obtained according to the similarity level. In addition, considering the constraint of the task budget, the edge node performs a secondary sensing-user selection based on the ratio of the similarity level to the expected reward of the sensing user. By designing a secure multi-party sorting protocol, enhanced by fuzzy closeness and the fuzzy comprehensive evaluation method, the target user set is determined with the aim of maximizing the match between the task requirements and the users' preferences, while minimizing the payment of the edge node and ensuring the fairness of the sensing users being selected. The simulation results show that TRIM achieves feasible and efficient user selection while ensuring the privacy and security of the sensing users in MCS. Under dynamically changing task requirements, TRIM achieves nearly 90% on the data-quality compliance rate and 70% on the task-budget consumption ratio, outperforming the other incentive mechanisms.
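
A plaintext sketch of the matching and budget-constrained secondary selection; the privacy-preserving protocols from the paper are deliberately omitted, and the dimensions, threshold, and ranking rule are assumptions:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def select_users(task_vec, user_vecs, rewards, budget, tau=0.5):
    # Candidate set: users whose vector (over requirement dimensions such as
    # location, time, sensor type, expected quality) is similar enough.
    scored = [(cosine(task_vec, u), r, i)
              for i, (u, r) in enumerate(zip(user_vecs, rewards))
              if cosine(task_vec, u) >= tau]
    # Secondary selection: rank by similarity per unit of expected reward,
    # then admit users greedily until the task budget is exhausted.
    scored.sort(key=lambda t: t[0] / t[1], reverse=True)
    chosen, spent = [], 0.0
    for sim, r, i in scored:
        if spent + r <= budget:
            chosen.append(i)
            spent += r
    return chosen

rng = np.random.default_rng(1)
users = rng.random((20, 5))
print(select_users(rng.random(5), users, rewards=rng.uniform(1, 3, 20), budget=8))
```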

Journal ArticleDOI
TL;DR: The H-SIA method analyses the system as a whole, rather than focusing on each component separately, allowing identification of dependent tasks between agents and visualization of the propagation of failure between the agents’ tasks.