
Showing papers by the French Institute for Research in Computer Science and Automation (Inria), published in 2019


Posted Content
TL;DR: Previous efforts to define explainability in Machine Learning are summarized, a novel definition is established that covers prior conceptual propositions with a major focus on the audience for which explainability is sought, and a taxonomy of recent contributions related to the explainability of different Machine Learning models is proposed.
Abstract: In recent years, Artificial Intelligence (AI) has gained notable momentum and may deliver on high expectations across many application sectors. For this to occur, the entire community faces the barrier of explainability, a problem inherent to the sub-symbolic AI techniques (e.g. ensembles or Deep Neural Networks) that was not present in the previous wave of AI. Paradigms underlying this problem fall within the so-called eXplainable AI (XAI) field, which is acknowledged as a crucial feature for the practical deployment of AI models. This overview examines the existing literature in the field of XAI, including a prospect toward what is yet to be reached. We summarize previous efforts to define explainability in Machine Learning, establishing a novel definition that covers prior conceptual propositions with a major focus on the audience for which explainability is sought. We then propose and discuss a taxonomy of recent contributions related to the explainability of different Machine Learning models, including those aimed at Deep Learning methods, for which a second taxonomy is built. This literature analysis serves as the background for a series of challenges faced by XAI, such as the crossroads between data fusion and explainability. Our prospects lead toward the concept of Responsible Artificial Intelligence, namely, a methodology for the large-scale implementation of AI methods in real organizations with fairness, model explainability, and accountability at its core. Our ultimate goal is to provide newcomers to XAI with reference material that stimulates future research advances, and to encourage experts and professionals from other disciplines to embrace the benefits of AI in their activity sectors, without any prior bias due to its lack of interpretability.

1,602 citations


Posted Content
TL;DR: It is demonstrated that a text-video embedding trained on this data leads to state-of-the-art results for text-to-video retrieval and action localization on instructional video datasets such as YouCook2 or CrossTask.
Abstract: Learning text-video embeddings usually requires a dataset of video clips with manually provided captions. However, such datasets are expensive and time-consuming to create and therefore difficult to obtain on a large scale. In this work, we propose instead to learn such embeddings from video data with readily available natural language annotations in the form of automatically transcribed narrations. The contributions of this work are three-fold. First, we introduce HowTo100M: a large-scale dataset of 136 million video clips sourced from 1.22M narrated instructional web videos depicting humans performing and describing over 23k different visual tasks. Our data collection procedure is fast, scalable and does not require any additional manual annotation. Second, we demonstrate that a text-video embedding trained on this data leads to state-of-the-art results for text-to-video retrieval and action localization on instructional video datasets such as YouCook2 or CrossTask. Finally, we show that this embedding transfers well to other domains: fine-tuning on generic YouTube videos (MSR-VTT dataset) and movies (LSMDC dataset) outperforms models trained on these datasets alone. Our dataset, code and models will be publicly available at: this http URL.
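Joint text-video embeddings of this kind are typically trained with a bidirectional max-margin ranking loss over batches of aligned clip-narration pairs. A minimal PyTorch sketch of such an objective (illustrative only, not necessarily the paper's exact formulation; all names are ours):

import torch
import torch.nn.functional as F

def ranking_loss(video_emb, text_emb, margin=0.1):
    """Bidirectional max-margin ranking loss.

    video_emb, text_emb: (B, D) tensors; row i of each forms a positive
    pair, and every other row in the batch is treated as a negative.
    """
    v = F.normalize(video_emb, dim=1)
    t = F.normalize(text_emb, dim=1)
    sim = v @ t.T                                   # (B, B) cosine similarities
    pos = sim.diag().unsqueeze(1)                   # similarities of true pairs
    cost_t2v = (margin + sim - pos).clamp(min=0)    # text-to-video direction
    cost_v2t = (margin + sim - pos.T).clamp(min=0)  # video-to-text direction
    off_diag = ~torch.eye(sim.size(0), dtype=torch.bool)
    return (cost_t2v[off_diag] + cost_v2t[off_diag]).mean()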

440 citations


Proceedings ArticleDOI
27 Oct 2019
TL;DR: This article proposes to learn text-video embeddings from video data with readily available natural language annotations in the form of automatically transcribed narrations, leading to state-of-the-art results on instructional video datasets such as YouCook2 or CrossTask.
Abstract: Learning text-video embeddings usually requires a dataset of video clips with manually provided captions. However, such datasets are expensive and time-consuming to create and therefore difficult to obtain on a large scale. In this work, we propose instead to learn such embeddings from video data with readily available natural language annotations in the form of automatically transcribed narrations. The contributions of this work are three-fold. First, we introduce HowTo100M: a large-scale dataset of 136 million video clips sourced from 1.22M narrated instructional web videos depicting humans performing and describing over 23k different visual tasks. Our data collection procedure is fast, scalable and does not require any additional manual annotation. Second, we demonstrate that a text-video embedding trained on this data leads to state-of-the-art results for text-to-video retrieval and action localization on instructional video datasets such as YouCook2 or CrossTask. Finally, we show that this embedding transfers well to other domains: fine-tuning on generic YouTube videos (MSR-VTT dataset) and movies (LSMDC dataset) outperforms models trained on these datasets alone. Our dataset, code and models are publicly available.

402 citations


Proceedings ArticleDOI
15 Jun 2019
TL;DR: This work presents an end-to-end learnable model that exploits a novel contact loss that favors physically plausible hand-object constellations, and improves grasp quality metrics over baselines, using RGB images as input.
Abstract: Estimating hand-object manipulations is essential for interpreting and imitating human actions. Previous work has made significant progress towards reconstruction of hand poses and object shapes in isolation. Yet, reconstructing hands and objects during manipulation is a more challenging task due to significant occlusions of both the hand and object. While presenting challenges, manipulations may also simplify the problem since the physics of contact restricts the space of valid hand-object configurations. For example, during manipulation, the hand and object should be in contact but not interpenetrate. In this work, we regularize the joint reconstruction of hands and objects with manipulation constraints. We present an end-to-end learnable model that exploits a novel contact loss that favors physically plausible hand-object constellations. Our approach improves grasp quality metrics over baselines, using RGB images as input. To train and evaluate the model, we also propose a new large-scale synthetic dataset, ObMan, with hand-object manipulations. We demonstrate the transferability of ObMan-trained models to real data.
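The key idea, a differentiable penalty built from hand-to-object distances, can be sketched in a few lines. A toy version assuming a signed-distance function of the object is available (the paper's actual loss is defined on the meshes with separate attraction and repulsion terms; everything below is a simplification):

import torch

def contact_loss(hand_verts, obj_sdf, contact_thresh=0.01):
    """Toy attraction/repulsion contact penalty.

    hand_verts: (N, 3) hand mesh vertices.
    obj_sdf:    callable mapping (N, 3) points to signed distances to the
                object surface (negative inside the object).
    """
    d = obj_sdf(hand_verts)                               # (N,)
    repulsion = torch.relu(-d).sum()                      # no interpenetration
    attraction = torch.relu(d[d < contact_thresh]).sum()  # encourage touching
    return repulsion + attraction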

343 citations


Journal ArticleDOI
TL;DR: A novel decomposition based on the technique of Logic-Based Benders Decomposition is designed: it solves a relaxed master problem, with fewer constraints, and a subproblem whose resolution generates cuts that iteratively guide the master to tighten its search space.
Abstract: Multi-access edge computing (MEC) has recently emerged as a novel paradigm to facilitate access to advanced computing capabilities at the edge of the network, in close proximity to end devices, thereby enabling a rich variety of latency-sensitive services demanded by various emerging industry verticals. Internet-of-Things (IoT) devices, being highly ubiquitous and connected, can offload their computational tasks to be processed by applications hosted on the MEC servers due to their limited battery, computing, and storage capacities. Such IoT applications providing services to offloaded tasks of IoT devices are hosted on edge servers with limited computing capabilities. Given the heterogeneity in the requirements of the offloaded tasks (different computing requirements, latency, and so on) and limited MEC capabilities, we jointly decide on the task offloading (tasks to application assignment) and scheduling (order of executing them), which yields a challenging problem of combinatorial nature. Furthermore, we jointly decide on the computing resource allocation for the hosted applications, and we refer to this problem as the Dynamic Task Offloading and Scheduling problem, encompassing the three subproblems mentioned earlier. We mathematically formulate this problem, and owing to its complexity, we design a novel decomposition based on the technique of Logic-Based Benders Decomposition. This technique solves a relaxed master problem, with fewer constraints, and a subproblem whose resolution generates cuts that iteratively guide the master to tighten its search space. Ultimately, both the master and the subproblem converge to yield the optimal solution. We show that this technique offers improvements of several orders of magnitude (more than 140-fold) in run time for the studied instances. Another advantage of this method is its ability to provide solutions with performance guarantees. Finally, we use this method to highlight performance trends for different industry verticals as a function of multiple system parameters, with a focus on delay-sensitive use cases.
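The decomposition follows the generic Logic-Based Benders pattern described in the abstract: solve a relaxed master, check it against the subproblem, and add a cut when the check fails. A schematic sketch, with hypothetical solver wrappers standing in for the actual offloading and scheduling models:

def logic_based_benders(master, subproblem, max_iters=1000):
    """Generic Logic-Based Benders loop (schematic).

    The relaxed master assigns tasks to applications and yields a lower
    bound; the subproblem schedules the assigned tasks and returns their
    true cost (infinite if the assignment is unschedulable), plus a cut
    whenever the master's guess was too optimistic.
    """
    cuts = []
    for _ in range(max_iters):
        assignment, lower_bound = master.solve(cuts)
        cost, cut = subproblem.solve(assignment)
        if cost <= lower_bound:   # bounds have met: assignment is optimal
            return assignment, cost
        cuts.append(cut)          # tighten the master's search space
    raise RuntimeError("iteration budget exhausted before convergence")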

238 citations


Journal ArticleDOI
Naihui Zhou, Yuxiang Jiang, Timothy Bergquist, Alexandra J. Lee, +185 more (71 institutions)
TL;DR: The third CAFA challenge, CAFA3, which featured an expanded analysis over the previous CAFA rounds, both in terms of the volume of data analyzed and the types of analysis performed, concluded that while predictions of the molecular function and biological process annotations have slightly improved over time, those of the cellular component have not.
Abstract: The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function. Here, we report on the results of the third CAFA challenge, CAFA3, which featured an expanded analysis over the previous CAFA rounds, both in terms of the volume of data analyzed and the types of analysis performed. In a major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000 genes. Specifically, we performed experimental whole-genome mutation screening in Candida albicans and Pseudomonas aeruginosa genomes, which provided us with genome-wide experimental data for genes associated with biofilm formation and motility. We further performed targeted assays on selected genes in Drosophila melanogaster, which we suspected of being involved in long-term memory. We conclude that while predictions of the molecular function and biological process annotations have slightly improved over time, those of the cellular component have not. Term-centric prediction of experimental annotations remains equally challenging; although the performance of the top methods is significantly better than the expectations set by baseline methods in C. albicans and D. melanogaster, it leaves considerable room and need for improvement. Finally, we report that the CAFA community now involves a broad range of participants with expertise in bioinformatics, biological experimentation, biocuration, and bio-ontologies, working together to improve functional annotation, computational function prediction, and our ability to manage big data in the era of large experimental screens.

227 citations


Proceedings ArticleDOI
16 Jun 2019
TL;DR: This paper proposes a perspective-aware convolutional neural network (PACNN) for efficient crowd counting, which integrates perspective information into density regression to provide additional knowledge of the person scale change in an image.
Abstract: Crowd counting is the task of estimating the number of people in crowd images. Modern crowd counting methods employ deep neural networks to estimate crowd counts via crowd density regression. A major challenge of this task lies in perspective distortion, which results in drastic person scale change in an image. Density regression on small person areas is in general very hard. In this work, we propose a perspective-aware convolutional neural network (PACNN) for efficient crowd counting, which integrates the perspective information into density regression to provide additional knowledge of the person scale change in an image. Ground truth perspective maps are first generated for training; PACNN is then specifically designed to predict multi-scale perspective maps and encode them as perspective-aware weighting layers in the network to adaptively combine the outputs of multi-scale density maps. The weights are learned at every pixel of the maps such that the final density combination is robust to the perspective distortion. We conduct extensive experiments on the ShanghaiTech, WorldExpo'10, UCF_CC_50, and UCSD datasets, and demonstrate the effectiveness and efficiency of PACNN over the state-of-the-art.
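The fusion step described above amounts to a per-pixel convex combination of the multi-scale density maps, with weights derived from the perspective maps. A minimal PyTorch sketch of that combination (shapes and names are ours, not the paper's):

import torch

def combine_density_maps(densities, weight_logits):
    """Perspective-aware fusion of multi-scale density maps.

    densities:     list of S maps, each (B, 1, H, W), upsampled to a
                   common resolution.
    weight_logits: (B, S, H, W) perspective-derived logits; a softmax over
                   S turns them into a per-pixel convex combination.
    """
    w = torch.softmax(weight_logits, dim=1)        # (B, S, H, W)
    stacked = torch.cat(densities, dim=1)          # (B, S, H, W)
    return (w * stacked).sum(dim=1, keepdim=True)  # (B, 1, H, W) final map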

188 citations


Journal ArticleDOI
20 Mar 2019-Nature
TL;DR: A set of 355 self-assembling DNA ‘tiles’ can be reprogrammed to implement many different computer algorithms—including sorting, palindrome testing and divisibility by three—suggesting that molecular self-assembly could be a reliable algorithmic component in programmable chemical systems.
Abstract: Molecular biology provides an inspiring proof-of-principle that chemical systems can store and process information to direct molecular activities such as the fabrication of complex structures from molecular components. To develop information-based chemistry as a technology for programming matter to function in ways not seen in biological systems, it is necessary to understand how molecular interactions can encode and execute algorithms. The self-assembly of relatively simple units into complex products [1] is particularly well suited for such investigations. Theory that combines mathematical tiling and statistical-mechanical models of molecular crystallization has shown that algorithmic behaviour can be embedded within molecular self-assembly processes [2,3], and this has been experimentally demonstrated using DNA nanotechnology [4] with up to 22 tile types [5-11]. However, many information technologies exhibit a complexity threshold, such as the minimum transistor count needed for a general-purpose computer, beyond which the power of a reprogrammable system increases qualitatively, and it has been unclear whether the biophysics of DNA self-assembly allows that threshold to be exceeded. Here we report the design and experimental validation of a DNA tile set that contains 355 single-stranded tiles and can, through simple tile selection, be reprogrammed to implement a wide variety of 6-bit algorithms. We use this set to construct 21 circuits that execute algorithms including copying, sorting, recognizing palindromes and multiples of 3, random walking, obtaining an unbiased choice from a biased random source, electing a leader, simulating cellular automata, generating deterministic and randomized patterns, and counting to 63, with an overall per-tile error rate of less than 1 in 3,000. These findings suggest that molecular self-assembly could be a reliable algorithmic component within programmable chemical systems. The development of molecular machines that are reprogrammable, at a high level of abstraction and thus without requiring knowledge of the underlying physics, will establish a creative space in which molecular programmers can flourish.

186 citations


Journal ArticleDOI
TL;DR: In this article, a conditional variational autoencoder network is proposed to learn a low-dimensional probabilistic deformation model from data which can be used for the registration and the analysis of deformations.
Abstract: We propose to learn a low-dimensional probabilistic deformation model from data which can be used for the registration and the analysis of deformations. The latent variable model maps similar deformations close to each other in an encoding space. It makes it possible to compare deformations, to generate normal or pathological deformations for any new image, or to transport deformations from one image pair to any other image. Our unsupervised method is based on variational inference. In particular, we use a conditional variational autoencoder network and constrain transformations to be symmetric and diffeomorphic by applying a differentiable exponentiation layer with a symmetric loss function. We also present a formulation that includes spatial regularization such as diffusion-based filtering. In addition, our framework provides multi-scale velocity field estimations. We evaluated our method on 3-D intra-subject registration using 334 cardiac cine-MRIs. On this dataset, our method showed state-of-the-art performance, with a mean DICE score of 81.2% and a mean Hausdorff distance of 7.3 mm using 32 latent dimensions, compared to three state-of-the-art methods, while also demonstrating more regular deformation fields. The average time per registration was 0.32 s. Finally, we visualized the learned latent space and showed that the encoded deformations can be used to transport deformations and to cluster diseases, with a classification accuracy of 83% after applying a linear projection.
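The differentiable exponentiation layer mentioned above is usually implemented by scaling and squaring a stationary velocity field. A 2-D PyTorch sketch under that assumption (the paper works in 3-D; names and conventions are ours):

import torch
import torch.nn.functional as F

def exponentiate(velocity, steps=6):
    """Scaling-and-squaring exponentiation of a velocity field.

    velocity: (B, 2, H, W) stationary velocity field, channels (dx, dy)
              in pixel units. Returns a diffeomorphic displacement field.
    """
    B, _, H, W = velocity.shape
    disp = velocity / (2 ** steps)          # scale down
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, H),
                            torch.linspace(-1, 1, W), indexing="ij")
    grid = torch.stack((xs, ys), dim=-1).expand(B, H, W, 2)
    to_norm = torch.tensor([2.0 / max(W - 1, 1), 2.0 / max(H - 1, 1)])
    for _ in range(steps):                  # square: phi <- phi o phi
        offset = disp.permute(0, 2, 3, 1) * to_norm
        disp = disp + F.grid_sample(disp, grid + offset, align_corners=True)
    return disp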

173 citations


Proceedings ArticleDOI
15 Jun 2019
TL;DR: This paper proposes a point-supervised deep detection network that simultaneously detects the size and location of human heads and counts them in crowds, trained with a curriculum learning strategy that starts from images with relatively accurate and easy pseudo ground truth.
Abstract: Modern crowd counting methods usually employ deep neural networks (DNN) to estimate crowd counts via density regression. Despite their significant improvements, the regression-based methods are incapable of providing the detection of individuals in crowds. The detection-based methods, on the other hand, have not been largely explored in recent trends of crowd counting due to the need for expensive bounding box annotations. In this work, we instead propose a new deep detection network with only point supervision required. It can simultaneously detect the size and location of human heads and count them in crowds. We first mine useful person size information from point-level annotations and initialize the pseudo ground truth bounding boxes. An online updating scheme is introduced to refine the pseudo ground truth during training, while a locally-constrained regression loss is designed to provide additional constraints on the size of the predicted boxes in a local neighborhood. Finally, we propose a curriculum learning strategy to train the network from images with relatively accurate and easy pseudo ground truth first. Extensive experiments are conducted on both detection and counting tasks on several standard benchmarks, e.g. the ShanghaiTech, UCF_CC_50, WiderFace, and TRANCOS datasets, and the results show the superiority of our method over the state-of-the-art.
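The pseudo ground-truth initialization exploits a classic crowd-counting heuristic: in dense crowds, the distance between neighboring annotated heads is a proxy for head size. A simplified sketch of such an initializer (the parameters and the exact rule are ours, not the paper's):

import numpy as np
from scipy.spatial import cKDTree

def init_pseudo_boxes(points, k=3, beta=0.3):
    """Initialize pseudo ground-truth head boxes from point annotations.

    points: (N, 2) array of annotated head centers (x, y). The box side is
    guessed as a fraction of the mean distance to the k nearest neighbors.
    """
    tree = cKDTree(points)
    dists, _ = tree.query(points, k=k + 1)   # +1: each point matches itself
    side = beta * dists[:, 1:].mean(axis=1)  # (N,) estimated head sizes
    half = side[:, None] / 2.0
    return np.hstack([points - half, points + half])  # (N, 4) x1, y1, x2, y2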

166 citations


Posted Content
TL;DR: Continual Learning (CL) is a particular machine learning paradigm where the data distribution and learning objective change through time, or where all the training data and objective criteria are never available at once.
Abstract: Continual learning (CL) is a particular machine learning paradigm where the data distribution and learning objective change through time, or where all the training data and objective criteria are never available at once. The evolution of the learning process is modeled by a sequence of learning experiences where the goal is to be able to learn new skills all along the sequence without forgetting what has been previously learned. Continual learning also aims to optimize memory, computation power, and speed during the learning process. An important challenge for machine learning is not necessarily finding solutions that work in the real world but rather finding stable algorithms that can learn in the real world. Hence, the ideal approach would be to tackle the real world with an embodied platform: an autonomous agent. Continual learning would then be effective in an autonomous agent or robot, which would learn autonomously through time about the external world and incrementally develop a set of complex skills and knowledge. Robotic agents have to learn to adapt and interact with their environment using a continuous stream of observations. Some recent approaches aim at tackling continual learning for robotics, but most recent papers on continual learning only evaluate approaches in simulation or with static datasets. Unfortunately, the evaluation of those algorithms does not provide insights on whether their solutions may help continual learning in the context of robotics. This paper aims at reviewing the existing state of the art of continual learning, summarizing existing benchmarks and metrics, and proposing a framework for presenting and evaluating both robotics and non-robotics approaches in a way that makes transfer between the two fields easier.

Posted Content
TL;DR: In this article, the authors study the inductive bias of learning in such a regime by analyzing the neural tangent kernel and the corresponding function space (RKHS), and compare to other known kernels for similar architectures.
Abstract: State-of-the-art neural networks are heavily over-parameterized, making the optimization algorithm a crucial ingredient for learning predictive models with good generalization properties. A recent line of work has shown that in a certain over-parameterized regime, the learning dynamics of gradient descent are governed by a certain kernel obtained at initialization, called the neural tangent kernel. We study the inductive bias of learning in such a regime by analyzing this kernel and the corresponding function space (RKHS). In particular, we study smoothness, approximation, and stability properties of functions with finite norm, including stability to image deformations in the case of convolutional networks, and compare to other known kernels for similar architectures.
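For reference, the kernel in question is the expected inner product of parameter gradients at initialization, and in the over-parameterized regime gradient flow on the squared loss reduces to kernel regression with it. In standard notation (a textbook statement, not this paper's specific results):

\[
K(x, x') = \mathbb{E}_{\theta \sim p_0}\!\left[\left\langle \nabla_\theta f(x;\theta),\, \nabla_\theta f(x';\theta) \right\rangle\right],
\qquad
\frac{d f_t(x)}{dt} = -\sum_{i=1}^{n} K(x, x_i)\,\bigl(f_t(x_i) - y_i\bigr).
\]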

Journal ArticleDOI
TL;DR: This paper presents GAMA 1.8, the latest revision to date of the platform, with a focus on its modeling language and its capabilities to manage the spatial dimension of models.
Abstract: The agent-based modeling approach is now used in many domains such as geography, ecology, or economics, and more generally to study (spatially explicit) socio-environmental systems where the heterogeneity of the actors and the numerous feedback loops between them require a modular and incremental approach to modeling. One major reason for this success, besides this conceptual facility, can be found in the support provided by the development of increasingly powerful software platforms, which now allow modelers without a strong background in computer science to easily and quickly develop their own models. Another trend observed in recent years is the development of much more descriptive and detailed models able not only to better represent complex systems, but also to answer more intricate questions. In that respect, while all agent-based modeling platforms support the design of small to mid-size models, i.e. models with little heterogeneity between agents, a simple representation of the environment, simple agent decision-making processes, etc., very few are adapted to the design of large-scale models. GAMA is one of these few. It has been designed with the aim of supporting the writing (and composing) of fairly complex models, with strong support for the spatial dimension, while guaranteeing non-computer scientists easy access to high-level, otherwise complex, operations. This paper presents GAMA 1.8, the latest revision of the platform to date, with a focus on its modeling language and its capabilities to manage the spatial dimension of models. The capabilities of GAMA are illustrated by the presentation of applications that take advantage of its new features.

Journal ArticleDOI
TL;DR: It is shown that the phonation task was more efficient than speech tasks in detecting the disease, and the approach is compared with others that use the same data set.

Journal ArticleDOI
01 Jan 2019-Nature
TL;DR: In this article, it was shown that the jump from the ground state to an excited state of a superconducting artificial three-level atom can be tracked as it follows a predictable "flight", by monitoring the population of an auxiliary energy level coupled to the ground state.
Abstract: In quantum physics, measurements can fundamentally yield discrete and random results. Emblematic of this feature is Bohr's 1913 proposal of quantum jumps between two discrete energy levels of an atom [1]. Experimentally, quantum jumps were first observed in an atomic ion driven by a weak deterministic force while under strong continuous energy measurement [2-4]. The times at which the discontinuous jump transitions occur are reputed to be fundamentally unpredictable. Despite the non-deterministic character of quantum physics, is it possible to know if a quantum jump is about to occur? Here we answer this question affirmatively: we experimentally demonstrate that the jump from the ground state to an excited state of a superconducting artificial three-level atom can be tracked as it follows a predictable 'flight', by monitoring the population of an auxiliary energy level coupled to the ground state. The experimental results demonstrate that the evolution of each completed jump is continuous, coherent and deterministic. We exploit these features, using real-time monitoring and feedback, to catch and reverse quantum jumps mid-flight, thus deterministically preventing their completion. Our findings, which agree with theoretical predictions essentially without adjustable parameters, support the modern quantum trajectory theory [5-9] and should provide new ground for the exploration of real-time intervention techniques in the control of quantum systems, such as the early detection of error syndromes in quantum error correction. Experiment overturns Bohr's view of quantum jumps, demonstrating that they possess a degree of predictability and when completed are continuous, coherent and even deterministic.

Proceedings ArticleDOI
27 Oct 2019
TL;DR: In this article, the authors propose to detect visual relations in images of the form of triplets t = (subject, predicate, object), where training examples of the individual entities are available but their combinations are unseen at training.
Abstract: We seek to detect visual relations in images of the form of triplets t = (subject, predicate, object), such as "person riding dog", where training examples of the individual entities are available but their combinations are unseen at training. This is an important set-up due to the combinatorial nature of visual relations: collecting sufficient training data for all possible triplets would be very hard. The contributions of this work are three-fold. First, we learn a representation of visual relations that combines (i) individual embeddings for subject, object and predicate together with (ii) a visual phrase embedding that represents the relation triplet. Second, we learn how to transfer visual phrase embeddings from existing training triplets to unseen test triplets using analogies between relations that involve similar objects. Third, we demonstrate the benefits of our approach on three challenging datasets: on HICO-DET, our model achieves significant improvement over a strong baseline for both frequent and unseen triplets, and we observe similar improvement for the retrieval of unseen triplets with out-of-vocabulary predicates on the COCO-a dataset as well as the challenging unusual triplets in the UnRel dataset.
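The analogy transfer in the second contribution can be pictured as simple vector arithmetic in the embedding space. A schematic sketch under that reading (the paper's actual transformation is learned; this only conveys the intuition):

import numpy as np

def transfer_phrase(phrase_src, subj_src, obj_src, subj_tgt, obj_tgt):
    """Analogy-style transfer of a visual-phrase embedding.

    Moves the phrase embedding of a seen triplet, e.g. ("person", "riding",
    "horse"), toward an unseen one, e.g. ("person", "riding", "dog"), by
    adding the difference of the entity embeddings involved.
    """
    return phrase_src + (subj_tgt - subj_src) + (obj_tgt - obj_src)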

Book
01 Jul 2019
TL;DR: In this paper, the authors frame cluster analysis and classification in terms of statistical models, thus yielding principled estimation, testing and prediction methods, and sound answers to the central questions, such as: How many clusters are there? Which method should I use? How should I handle outliers?
Abstract: Cluster analysis finds groups in data automatically. Most methods have been heuristic and leave open such central questions as: how many clusters are there? Which method should I use? How should I handle outliers? Classification assigns new observations to groups given previously classified observations, and also has open questions about parameter tuning, robustness and uncertainty assessment. This book frames cluster analysis and classification in terms of statistical models, thus yielding principled estimation, testing and prediction methods, and sound answers to the central questions. It builds the basic ideas in an accessible but rigorous way, with extensive data examples and R code; describes modern approaches to high-dimensional data and networks; and explains such recent advances as Bayesian regularization, non-Gaussian model-based clustering, cluster merging, variable selection, semi-supervised and robust classification, clustering of functional data, text and images, and co-clustering. Written for advanced undergraduates in data science, as well as researchers and practitioners, it assumes basic knowledge of multivariate calculus, linear algebra, probability and statistics.

Journal ArticleDOI
TL;DR: In this article, the antagonistic philosophies behind two quantitative approaches are discussed: certifying robust effects in understandable variables, and evaluating how accurately a built model can forecast future outcomes.

Proceedings ArticleDOI
20 May 2019
TL;DR: In this paper, the authors address the problem of distributing GANs so that they are able to train over datasets that are spread on multiple workers, and propose a novel learning procedure for GAN, MD-GAN, to fit this distributed setup.
Abstract: A recent technical breakthrough in the domain of machine learning is the discovery and the multiple applications of Generative Adversarial Networks (GANs). Those generative models are computationally demanding, as a GAN is composed of two deep neural networks and trains on large datasets. A GAN is generally trained on a single server. In this paper, we address the problem of distributing GANs so that they are able to train over datasets that are spread across multiple workers. We propose MD-GAN as the first solution to this problem: a novel learning procedure that fits GANs to this distributed setup. We then compare the performance of MD-GAN to an adaptation of federated learning to GANs, using the MNIST, CIFAR10 and CelebA datasets. MD-GAN halves the learning complexity on each worker node while providing better or identical performance compared with the federated learning adaptation. We finally discuss the practical implications of distributing GANs.
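The distributed setup can be summarized as one generator on a server and one discriminator per data-holding worker, with only generated samples and gradients crossing the network. A schematic round with hypothetical object interfaces (this conveys the spirit of the method, not its exact protocol):

def average_grads(grad_lists):
    """Element-wise average of per-worker gradient lists."""
    return [sum(gs) / len(grad_lists) for gs in zip(*grad_lists)]

def distributed_gan_round(generator, workers):
    """One communication round of a distributed GAN."""
    feedback = []
    for w in workers:
        fake = generator.sample(w.batch_size)        # server -> worker
        w.train_discriminator(fake)                  # real data never leaves w
        feedback.append(w.generator_feedback(fake))  # worker -> server
    generator.apply_gradients(average_grads(feedback))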

Journal ArticleDOI
TL;DR: In this article, a U-net convolutional network was used to identify and segment natural forests and eucalyptus plantations, and an indicator of forest disturbance, the tree species Cecropia hololeuca, in very high resolution images (0.3 m) from the WorldView-3 satellite in the Brazilian Atlantic rainforest region.
Abstract: Mapping forest types and tree species at regional scales to provide information for ecologists and forest managers is a new challenge for the remote sensing community. Here, we assess the potential of a U‐net convolutional network, a recent deep learning algorithm, to identify and segment (1) natural forests and eucalyptus plantations, and (2) an indicator of forest disturbance, the tree species Cecropia hololeuca, in very high resolution images (0.3 m) from the WorldView‐3 satellite in the Brazilian Atlantic rainforest region. The networks for forest types and Cecropia trees were trained with 7611 and 1568 red‐green‐blue (RGB) images, respectively, and their dense labeled masks. Eighty per cent of the images were used for training and 20% for validation. The U‐net network segmented forest types with an overall accuracy >95% and an intersection over union (IoU) of 0.96. For C. hololeuca, the overall accuracy was 97% and the IoU was 0.86. The predictions were produced over a 1600 km2 region using WorldView‐3 RGB bands pan‐sharpened at 0.3 m. Natural and eucalyptus forests compose 79 and 21% of the region's total forest cover (82 250 ha). Cecropia crowns covered 1% of the natural forest canopy. An index to describe the level of disturbance of the natural forest fragments based on the spatial distribution of Cecropia trees was developed. Our work demonstrates how a deep learning algorithm can support applications such as vegetation, tree species distributions and disturbance mapping on a regional scale.
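The intersection over union reported above is the standard segmentation metric; for binary masks it is a few lines of NumPy (a generic implementation, not the authors' code):

import numpy as np

def iou(pred_mask, gt_mask):
    """Intersection over union of two boolean segmentation masks."""
    union = np.logical_or(pred_mask, gt_mask).sum()
    inter = np.logical_and(pred_mask, gt_mask).sum()
    return inter / union if union else 1.0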

Journal ArticleDOI
TL;DR: Simulation results confirmed the promising performance of the hybrid WOA over the other algorithms; a comprehensive comparison was carried out between the proposed optimization techniques.
Abstract: In this paper, a simulation model describing the operation of a PV/wind/diesel hybrid microgrid system with battery bank storage is proposed. Optimal sizing of the proposed system is presented to minimize the cost of energy (COE) supplied by the system while increasing its reliability and efficiency, represented by the loss of power supply probability (LPSP). The novel optimization algorithms Whale Optimization Algorithm (WOA), Water Cycle Algorithm (WCA), Moth-Flame Optimizer (MFO), and hybrid particle swarm-gravitational search algorithm (PSOGSA) were applied to design the optimized microgrid. Moreover, a comprehensive comparison was carried out between the proposed optimization techniques. The optimal sizing of the system components was performed using real-time meteorological data of Abu-Monqar village, located in the Western Desert of Egypt, for the first time, with the aim of developing this promising remote area. A statistical study of the capability of the optimization algorithms to find the optimal solution is presented. Simulation results confirmed the promising performance of the hybrid WOA over the other algorithms.
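All four metaheuristics minimize a fitness of the same general shape: the cost of energy of a candidate configuration, penalized when the simulated reliability misses the LPSP target. A toy Python version of such a fitness function (the component costs, dispatch rule, and penalty weight are illustrative, not the paper's):

import numpy as np

def sizing_objective(pv_kw, wt_kw, bat_kwh, pv_cf, wt_cf, load_kw,
                     lpsp_max=0.02):
    """COE of a PV/wind/battery configuration with an LPSP penalty.

    pv_cf, wt_cf: hourly capacity factors in [0, 1]; load_kw: hourly demand.
    """
    soc, deficit = bat_kwh / 2.0, 0.0
    for gen, load in zip(pv_kw * pv_cf + wt_kw * wt_cf, load_kw):
        surplus = gen - load
        if surplus >= 0:
            soc = min(bat_kwh, soc + surplus)  # charge the battery
        else:
            draw = min(soc, -surplus)
            soc -= draw
            deficit += -surplus - draw         # unmet load this hour
    lpsp = deficit / np.sum(load_kw)
    annual_cost = 1000 * pv_kw + 1500 * wt_kw + 300 * bat_kwh  # toy costs
    coe = annual_cost / max(np.sum(load_kw) - deficit, 1e-9)
    return coe + 1e3 * max(0.0, lpsp - lpsp_max)  # soft reliability constraint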

Proceedings ArticleDOI
18 Sep 2019
TL;DR: A novel skeleton image representation to be used as input of Convolutional Neural Networks (CNNs), named SkeleMotion, is introduced, which encodes the temporal dynamics by explicitly computing the magnitude and orientation values of the skeleton joints.
Abstract: Due to the availability of large-scale skeleton datasets, 3D human action recognition has recently attracted the attention of the computer vision community. Many works have focused on encoding skeleton data as skeleton image representations based on the spatial structure of the skeleton joints, in which the temporal dynamics of the sequence are encoded as variations in columns and the spatial structure of each frame is represented as rows of a matrix. To further improve such representations, we introduce a novel skeleton image representation to be used as input to Convolutional Neural Networks (CNNs), named SkeleMotion. The proposed approach encodes the temporal dynamics by explicitly computing the magnitude and orientation values of the skeleton joints. Different temporal scales are employed to compute motion values, aggregating more temporal dynamics into the representation, making it able to capture long-range joint interactions involved in actions as well as to filter noisy motion values. Experimental results demonstrate the effectiveness of the proposed representation for 3D action recognition, outperforming the state-of-the-art on the NTU RGB+D 120 dataset.
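The core encoding is easy to state: per-joint temporal differences, split into magnitude and orientation channels. A minimal NumPy sketch for 2-D joints (the paper also handles 3-D and stacks several temporal scales dt; the layout details are simplified here):

import numpy as np

def skelemotion(joints, dt=1):
    """Magnitude/orientation motion encoding of a skeleton sequence.

    joints: (T, J, 2) array of joint positions over T frames. Returns two
    (T - dt, J) arrays that can be stacked row-wise into a CNN input image.
    """
    motion = joints[dt:] - joints[:-dt]  # temporal differences
    magnitude = np.linalg.norm(motion, axis=-1)
    orientation = np.arctan2(motion[..., 1], motion[..., 0])
    return magnitude, orientation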

Proceedings ArticleDOI
03 Nov 2019
TL;DR: This work explores and proposes alternative evaluation measures based on Question Answering; the reported human-evaluation analysis shows that the proposed metrics compare favorably to ROUGE, with the additional property of not requiring reference summaries.
Abstract: Abstractive summarization approaches based on Reinforcement Learning (RL) have recently been proposed to overcome classical likelihood maximization. RL makes it possible to consider complex, possibly non-differentiable, metrics that globally assess the quality and relevance of the generated outputs. ROUGE, the most used summarization metric, is known to suffer from bias towards lexical similarity as well as from sub-optimal accounting for fluency and readability of the generated abstracts. We thus explore and propose alternative evaluation measures: the reported human-evaluation analysis shows that the proposed metrics, based on Question Answering, compare favorably to ROUGE, with the additional property of not requiring reference summaries. Training an RL-based model on these metrics leads to improvements (both in terms of human and automated metrics) over current approaches that use ROUGE as a reward.

Posted Content
TL;DR: This work experimentally prepares square and hexagonal GKP code states through a feedback protocol that incorporates non-destructive measurements, implemented with a superconducting microwave cavity playing the role of the oscillator, and demonstrates QEC of an encoded qubit with suppression of all logical errors.
Abstract: Quantum bits are more robust to noise when they are encoded non-locally. In such an encoding, errors affecting the underlying physical system can then be detected and corrected before they corrupt the encoded information. In 2001, Gottesman, Kitaev and Preskill (GKP) proposed a hardware-efficient instance of such a qubit, which is delocalised in the phase-space of a single oscillator. However, implementing measurements that reveal error syndromes of the oscillator while preserving the encoded information has proved experimentally challenging: the only realisation so far relied on post-selection, which is incompatible with quantum error correction (QEC). The novelty of our experiment is precisely that it implements these non-destructive error-syndrome measurements for a superconducting microwave cavity. We design and implement an original feedback protocol that incorporates such measurements to prepare square and hexagonal GKP code states. We then demonstrate QEC of an encoded qubit with unprecedented suppression of all logical errors, in quantitative agreement with a theoretical estimate based on the measured imperfections of the experiment. Our protocol is applicable to other continuous variable systems and, in contrast with previous implementations of QEC, can mitigate all logical errors generated by a wide variety of noise processes, and open a way towards fault-tolerant quantum computation.

Journal ArticleDOI
TL;DR: Fundamental concepts of urban computing leveraging Location-Based Social Networks data are discussed and a survey of recent urban computing studies that make use of LBSN data is presented.
Abstract: Urban computing is an emerging area of investigation in which researchers study cities using digital data. Location-Based Social Networks (LBSNs) generate one specific type of digital data that offers unprecedented geographic and temporal resolutions. We discuss fundamental concepts of urban computing leveraging LBSN data and present a survey of recent urban computing studies that make use of LBSN data. We also point out the opportunities and challenges that those studies open.

Journal ArticleDOI
TL;DR: Experimental results show that the proposed method outperforms state-of-the-art light field depth estimation methods, including prior methods based on deep neural architectures.
Abstract: In this paper, we propose a learning-based depth estimation framework suitable for both densely and sparsely sampled light fields. The proposed framework consists of three processing steps: initial depth estimation, fusion with occlusion handling, and refinement. The estimation can be performed from a flexible subset of input views. The fusion of initial disparity estimates, relying on two warping error measures, allows us to obtain an accurate estimation in occluded regions and along the contours. In contrast with methods relying on the computation of cost volumes, the proposed approach does not need any prior information on the disparity range. Experimental results show that the proposed method outperforms state-of-the-art light field depth estimation methods, including prior methods based on deep neural architectures.

Proceedings ArticleDOI
01 Oct 2019
TL;DR: This paper introduces the Tree Structure Reference Joints Image (TSRJI), a novel skeleton image representation to be used as input to CNNs, which has the advantage of combining the use of reference joints and a tree-structured skeleton.
Abstract: In recent years, the computer vision research community has studied how to model temporal dynamics in videos for 3D human action recognition. To that end, two main baseline approaches have been researched: (i) Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM); and (ii) skeleton image representations used as input to a Convolutional Neural Network (CNN). Although RNN approaches present excellent results, such methods lack the ability to efficiently learn the spatial relations between the skeleton joints. On the other hand, the representations used to feed CNN approaches have the advantage of the natural ability to learn structural information from 2D arrays (i.e., they learn spatial relations from the skeleton joints). To further improve such representations, we introduce the Tree Structure Reference Joints Image (TSRJI), a novel skeleton image representation to be used as input to CNNs. The proposed representation has the advantage of combining the use of reference joints and a tree-structured skeleton. While the former incorporates different spatial relationships between the joints, the latter preserves important spatial relations by traversing a skeleton tree with a depth-first order algorithm. Experimental results demonstrate the effectiveness of the proposed representation for 3D action recognition on two datasets, achieving state-of-the-art results on the recent NTU RGB+D 120 dataset.
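The tree-structure part of the representation boils down to ordering the joints by a depth-first traversal of the skeleton tree, so that anatomically connected joints end up adjacent in the image. A small sketch (the joint tree itself is illustrative):

def depth_first_joint_order(tree, root=0):
    """Depth-first ordering of skeleton joints.

    tree: dict mapping a joint index to the list of its child joints.
    """
    order, stack = [], [root]
    while stack:
        joint = stack.pop()
        order.append(joint)
        stack.extend(reversed(tree.get(joint, [])))  # preserve child order
    return order

# Toy 7-joint skeleton rooted at the spine:
# depth_first_joint_order({0: [1, 4], 1: [2, 3], 4: [5, 6]})
# -> [0, 1, 2, 3, 4, 5, 6]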

Journal ArticleDOI
TL;DR: A 1D repetition code based on so-called cat qubits is proposed as a viable approach toward hardware-efficient universal and fault-tolerant quantum computation; it builds a universal set of fully protected logical gates that avoids costly magic state preparation, distillation, and injection.
Abstract: A new implementation of quantum error-correcting "cat codes" could be extended to a fully fault-tolerant, universal quantum computer with minimal hardware overhead.

Journal ArticleDOI
TL;DR: In this paper, a lower bound on the secret key rate of continuous-variable quantum key distribution with a discrete modulation of coherent states was established, which is valid against collective attacks and is obtained by formulating the problem as a semidefinite program.
Abstract: We establish a lower bound on the asymptotic secret key rate of continuous-variable quantum key distribution with a discrete modulation of coherent states. The bound is valid against collective attacks and is obtained by formulating the problem as a semidefinite program. We illustrate our general approach with the quadrature-phase-shift-keying modulation scheme and show that distances over 100 km are achievable for realistic values of noise. We also discuss the application to more complex quadrature-amplitude-modulation schemes. This result opens the way to establishing the full security of continuous-variable protocols with a discrete modulation, and thereby to the large-scale deployment of these protocols for quantum key distribution.

Proceedings ArticleDOI
27 Oct 2019
TL;DR: This paper introduces a large real-world video dataset for activities of daily living, Toyota Smarthome, and proposes a pose-driven spatio-temporal attention mechanism through 3D ConvNets that outperforms state-of-the-art methods on benchmark datasets as well as on the Toyota Smarthome dataset.
Abstract: The performance of deep neural networks is strongly influenced by the quantity and quality of annotated data. Most of the large activity recognition datasets consist of data sourced from the web, which does not reflect challenges that exist in activities of daily living. In this paper, we introduce a large real-world video dataset for activities of daily living: Toyota Smarthome. The dataset consists of 16K RGB+D clips of 31 activity classes, performed by seniors in a smarthome. Unlike previous datasets, videos were fully unscripted. As a result, the dataset poses several challenges: high intra-class variation, high class imbalance, simple and composite activities, and activities with similar motion and variable duration. Activities were annotated with both coarse and fine-grained labels. These characteristics differentiate Toyota Smarthome from other datasets for activity recognition. As recent activity recognition approaches fail to address the challenges posed by Toyota Smarthome, we present a novel activity recognition method with attention mechanism. We propose a pose driven spatio-temporal attention mechanism through 3D ConvNets. We show that our novel method outperforms state-of-the-art methods on benchmark datasets, as well as on the Toyota Smarthome dataset. We release the dataset for research use.