
Showing papers by "Carnegie Mellon University" published in 2018


Proceedings ArticleDOI
18 Jun 2018
TL;DR: In this article, the non-local operation computes the response at a position as a weighted sum of the features at all positions, which can be used to capture long-range dependencies.
Abstract: Both convolutional and recurrent operations are building blocks that process one local neighborhood at a time. In this paper, we present non-local operations as a generic family of building blocks for capturing long-range dependencies. Inspired by the classical non-local means method [4] in computer vision, our non-local operation computes the response at a position as a weighted sum of the features at all positions. This building block can be plugged into many computer vision architectures. On the task of video classification, even without any bells and whistles, our non-local models can compete with or outperform current competition winners on both the Kinetics and Charades datasets. In static image recognition, our non-local models improve object detection/segmentation and pose estimation on the COCO suite of tasks. Code will be made available.

8,059 citations
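To make the non-local operation concrete, here is a minimal PyTorch sketch of an embedded-Gaussian non-local block: the response at each position is a softmax-weighted sum of the embedded features at all positions, added back residually. The layer sizes and 1×1-conv embeddings are illustrative assumptions, not the paper's exact configuration.

```python
# A minimal sketch of an embedded-Gaussian non-local block (2D features).
import torch
import torch.nn as nn

class NonLocalBlock2d(nn.Module):
    def __init__(self, channels, reduced=None):
        super().__init__()
        reduced = reduced or channels // 2
        self.theta = nn.Conv2d(channels, reduced, 1)  # query embedding
        self.phi = nn.Conv2d(channels, reduced, 1)    # key embedding
        self.g = nn.Conv2d(channels, reduced, 1)      # value embedding
        self.out = nn.Conv2d(reduced, channels, 1)    # restore channel count

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)  # (b, hw, c')
        k = self.phi(x).flatten(2)                    # (b, c', hw)
        v = self.g(x).flatten(2).transpose(1, 2)      # (b, hw, c')
        attn = torch.softmax(q @ k, dim=-1)           # weights over all positions
        y = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)  # weighted sum
        return x + self.out(y)                        # residual connection

x = torch.randn(2, 64, 14, 14)
print(NonLocalBlock2d(64)(x).shape)  # torch.Size([2, 64, 14, 14])
```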


Proceedings Article
24 Jun 2018
TL;DR: The proposed algorithm excels in discovering high-performance convolutional architectures for image classification and recurrent architectures for language modeling, while being orders of magnitude faster than state-of-the-art non-differentiable techniques.
Abstract: This paper addresses the scalability challenge of architecture search by formulating the task in a differentiable manner. Unlike conventional approaches of applying evolution or reinforcement learning over a discrete and non-differentiable search space, our method is based on the continuous relaxation of the architecture representation, allowing efficient search of the architecture using gradient descent. Extensive experiments on CIFAR-10, ImageNet, Penn Treebank and WikiText-2 show that our algorithm excels in discovering high-performance convolutional architectures for image classification and recurrent architectures for language modeling, while being orders of magnitude faster than state-of-the-art non-differentiable techniques. Our implementation has been made publicly available to facilitate further research on efficient architecture search algorithms.

2,466 citations
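The continuous relaxation the abstract describes can be sketched in a few lines: each candidate operation on an edge is weighted by a softmax over learnable architecture parameters, so those parameters receive gradients like any other weight. The candidate set below is an illustrative assumption, not the paper's full search space.

```python
# A minimal sketch of a softmax-weighted mixed operation on one edge.
import torch
import torch.nn as nn

class MixedOp(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Identity(),                                # skip connection
            nn.Conv2d(channels, channels, 3, padding=1),  # 3x3 conv
            nn.MaxPool2d(3, stride=1, padding=1),         # 3x3 max pool
        ])
        # one architecture parameter per candidate op on this edge
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        w = torch.softmax(self.alpha, dim=0)  # continuous relaxation
        return sum(wi * op(x) for wi, op in zip(w, self.ops))

x = torch.randn(1, 16, 8, 8)
op = MixedOp(16)
op(x).sum().backward()            # gradients flow to op.alpha
print(op.alpha.grad is not None)  # True
```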


Journal ArticleDOI
TL;DR: In this article, the authors review the current state-of-the-art of CO2 capture, transport, utilisation and storage from a multi-scale perspective, moving from the global to molecular scales.
Abstract: Carbon capture and storage (CCS) is broadly recognised as having the potential to play a key role in meeting climate change targets, delivering low carbon heat and power, decarbonising industry and, more recently, its ability to facilitate the net removal of CO2 from the atmosphere. However, despite this broad consensus and its technical maturity, CCS has not yet been deployed on a scale commensurate with the ambitions articulated a decade ago. Thus, in this paper we review the current state-of-the-art of CO2 capture, transport, utilisation and storage from a multi-scale perspective, moving from the global to molecular scales. In light of the COP21 commitments to limit warming to less than 2 °C, we extend the remit of this study to include the key negative emissions technologies (NETs) of bioenergy with CCS (BECCS), and direct air capture (DAC). Cognisant of the non-technical barriers to deploying CCS, we reflect on recent experience from the UK's CCS commercialisation programme and consider the commercial and political barriers to the large-scale deployment of CCS. In all areas, we focus on identifying and clearly articulating the key research challenges that could usefully be addressed in the coming decade.

2,088 citations


Journal ArticleDOI
TL;DR: It is found that deep learning has yet to revolutionize biomedicine or definitively resolve any of the most pressing challenges in the field, but promising advances have been made on the prior state of the art.
Abstract: Deep learning describes a class of machine learning algorithms that are capable of combining raw inputs into layers of intermediate features. These algorithms have recently shown impressive results across a variety of domains. Biology and medicine are data-rich disciplines, but the data are complex and often ill-understood. Hence, deep learning techniques may be particularly well suited to solve problems in these fields. We examine applications of deep learning to a variety of biomedical problems (patient classification, fundamental biological processes, and treatment of patients) and discuss whether deep learning will be able to transform these tasks or if the biomedical sphere poses unique challenges. Following from an extensive literature review, we find that deep learning has yet to revolutionize biomedicine or definitively resolve any of the most pressing challenges in the field, but promising advances have been made on the prior state of the art. Even though improvements over previous baselines have been modest in general, the recent progress indicates that deep learning methods will provide valuable means for speeding up or aiding human investigation. Though progress has been made linking a specific neural network's prediction to input features, understanding how users should interpret these models to make testable hypotheses about the system under study remains an open challenge. Furthermore, the limited amount of labelled data for training presents problems in some domains, as do legal and privacy constraints on work with sensitive health records. Nonetheless, we foresee deep learning enabling changes at both bench and bedside with the potential to transform several areas of biology and medicine.

1,491 citations


Proceedings ArticleDOI
12 Mar 2018
TL;DR: Dense upsampling convolution (DUC) is designed to generate pixel-level predictions that capture and decode detailed information generally missing in bilinear upsampling, and a hybrid dilated convolution (HDC) framework is proposed for the encoding phase.
Abstract: Recent advances in deep learning, especially deep convolutional neural networks (CNNs), have led to significant improvement over previous semantic segmentation systems. Here we show how to improve pixel-wise semantic segmentation by manipulating convolution-related operations that are of both theoretical and practical value. First, we design dense upsampling convolution (DUC) to generate pixel-level prediction, which is able to capture and decode more detailed information that is generally missing in bilinear upsampling. Second, we propose a hybrid dilated convolution (HDC) framework in the encoding phase. This framework 1) effectively enlarges the receptive fields (RF) of the network to aggregate global information; 2) alleviates what we call the "gridding issue" caused by the standard dilated convolution operation. We evaluate our approaches thoroughly on the Cityscapes dataset, and achieve a state-of-the-art result of 80.1% mIoU on the test set at the time of submission. We have also achieved state-of-the-art results overall on the KITTI road estimation benchmark and the PASCAL VOC2012 segmentation task. Our source code can be found at https://github.com/TuSimple/TuSimple-DUC.

1,358 citations
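Both ideas are simple to sketch. DUC predicts d·d·L channels at low resolution and rearranges them into a full-resolution, L-channel prediction (PixelShuffle performs exactly this rearrangement), while HDC stacks dilated convolutions with varying rates so the receptive field has no "gridding" holes. Sizes below are illustrative assumptions.

```python
# A minimal sketch of DUC and HDC, assuming a backbone at 1/8 resolution.
import torch
import torch.nn as nn

num_classes, d = 19, 8                 # d = downsampling factor of the backbone
duc = nn.Sequential(
    nn.Conv2d(2048, num_classes * d * d, 3, padding=1),  # dense prediction
    nn.PixelShuffle(d),                # (C*d*d, H, W) -> (C, H*d, W*d)
)
feat = torch.randn(1, 2048, 64, 128)   # backbone features at 1/8 resolution
print(duc(feat).shape)                 # torch.Size([1, 19, 512, 1024])

# HDC-style stack: 3x3 dilated convs with varied rates (e.g., 1, 2, 5) so the
# combined receptive field is dense, avoiding the gridding artifact.
hdc = nn.Sequential(*[
    nn.Conv2d(2048, 2048, 3, padding=r, dilation=r) for r in (1, 2, 5)
])
```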


Journal ArticleDOI
TL;DR: In machine learning, the concept of interpretability is both important and slippery: can you trust your model, will it work in deployment, and what else can it tell you about the world?
Abstract: Supervised machine-learning models boast remarkable predictive capabilities. But can you trust your model? Will it work in deployment? What else can it tell you about the world?

1,307 citations


Journal ArticleDOI
25 Apr 2018
TL;DR: An overview of core ideas in GSP and their connection to conventional digital signal processing are provided, along with a brief historical perspective to highlight how concepts recently developed build on top of prior research in other areas.
Abstract: Research in graph signal processing (GSP) aims to develop tools for processing data defined on irregular graph domains. In this paper, we first provide an overview of core ideas in GSP and their connection to conventional digital signal processing, along with a brief historical perspective to highlight how concepts recently developed in GSP build on top of prior research in other areas. We then summarize recent advances in developing basic GSP tools, including methods for sampling, filtering, or graph learning. Next, we review progress in several application areas using GSP, including processing and analysis of sensor network data, biological data, and applications to image processing and machine learning.

1,306 citations
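As a toy illustration of the basic GSP tools the paper surveys, the sketch below filters a signal on a 4-node path graph with a first-order polynomial of the graph Laplacian; the graph and filter taps are illustrative assumptions.

```python
# Filtering a graph signal with a polynomial of the graph Laplacian.
import numpy as np

A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)  # adjacency of a 4-node path graph
L = np.diag(A.sum(axis=1)) - A             # combinatorial graph Laplacian

x = np.array([1.0, -1.0, 1.0, -1.0])       # a "high-frequency" graph signal
h = [1.0, -0.5]                            # filter taps: y = h0*x + h1*(L x)
y = h[0] * x + h[1] * (L @ x)

# Spectral view: eigenvalues of L act as graph frequencies.
evals, evecs = np.linalg.eigh(L)
print("frequencies:", np.round(evals, 3))
print("filtered signal:", np.round(y, 3))
```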


Posted Content
TL;DR: In this article, the authors propose a differentiable architecture search algorithm based on a continuous relaxation of the architecture representation, replacing search over a discrete and non-differentiable space with gradient descent.
Abstract: This paper addresses the scalability challenge of architecture search by formulating the task in a differentiable manner. Unlike conventional approaches of applying evolution or reinforcement learning over a discrete and non-differentiable search space, our method is based on the continuous relaxation of the architecture representation, allowing efficient search of the architecture using gradient descent. Extensive experiments on CIFAR-10, ImageNet, Penn Treebank and WikiText-2 show that our algorithm excels in discovering high-performance convolutional architectures for image classification and recurrent architectures for language modeling, while being orders of magnitude faster than state-of-the-art non-differentiable techniques. Our implementation has been made publicly available to facilitate further research on efficient architecture search algorithms.

1,272 citations


Journal ArticleDOI
TL;DR: In this article, the authors ask whether you can trust a supervised machine learning model, whether it will work in deployment, and what else it can tell you about the world beyond its predictive capabilities.
Abstract: Supervised machine-learning models boast remarkable predictive capabilities. But can you trust your model? Will it work in deployment? What else can it tell you about the world?

1,197 citations


Posted ContentDOI
Spyridon Bakas, Mauricio Reyes, Andras Jakab, Stefan Bauer, +435 more (111 institutions)
TL;DR: This study assesses the state-of-the-art machine learning methods used for brain tumor image analysis in mpMRI scans, during the last seven instances of the International Brain Tumor Segmentation (BraTS) challenge, i.e., 2012-2018, and investigates the challenge of identifying the best ML algorithms for each of these tasks.
Abstract: Gliomas are the most common primary brain malignancies, with different degrees of aggressiveness, variable prognosis and various heterogeneous histologic sub-regions, i.e., peritumoral edematous/invaded tissue, necrotic core, active and non-enhancing core. This intrinsic heterogeneity is also portrayed in their radio-phenotype, as their sub-regions are depicted by varying intensity profiles disseminated across multi-parametric magnetic resonance imaging (mpMRI) scans, reflecting varying biological properties. Their heterogeneous shape, extent, and location are some of the factors that make these tumors difficult to resect, and in some cases inoperable. The amount of resected tumor is a factor also considered in longitudinal scans, when evaluating the apparent tumor for potential diagnosis of progression. Furthermore, there is mounting evidence that accurate segmentation of the various tumor sub-regions can offer the basis for quantitative image analysis towards prediction of patient overall survival. This study assesses the state-of-the-art machine learning (ML) methods used for brain tumor image analysis in mpMRI scans, during the last seven instances of the International Brain Tumor Segmentation (BraTS) challenge, i.e., 2012-2018. Specifically, we focus on i) evaluating segmentations of the various glioma sub-regions in pre-operative mpMRI scans, ii) assessing potential tumor progression by virtue of longitudinal growth of tumor sub-regions, beyond use of the RECIST/RANO criteria, and iii) predicting the overall survival from pre-operative mpMRI scans of patients that underwent gross total resection. Finally, we investigate the challenge of identifying the best ML algorithms for each of these tasks, considering that apart from being diverse on each instance of the challenge, the multi-institutional mpMRI BraTS dataset has also been a continuously evolving/growing dataset.

1,165 citations
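Segmentation entries in the BraTS challenge are scored chiefly by overlap metrics such as the Dice coefficient; a minimal sketch of that metric, with an illustrative toy mask, is below.

```python
# Dice overlap between a predicted and a reference binary segmentation mask.
import numpy as np

def dice(pred, truth, eps=1e-8):
    """Dice = 2|P ∩ T| / (|P| + |T|) for boolean masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    return 2.0 * inter / (pred.sum() + truth.sum() + eps)

seg = np.zeros((8, 8), dtype=int); seg[2:6, 2:6] = 1  # predicted mask
gt = np.zeros((8, 8), dtype=int); gt[3:7, 3:7] = 1    # reference mask
print(round(dice(seg, gt), 3))  # ~0.562
```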


Proceedings Article
15 Feb 2018
TL;DR: It is shown that one cause for such failures is the exponential moving average used in the algorithms, and it is suggested that the convergence issues can be fixed by endowing such algorithms with 'long-term memory' of past gradients.
Abstract: Several recently proposed stochastic optimization methods that have been successfully used in training deep networks such as RMSProp, Adam, Adadelta, Nadam are based on using gradient updates scaled by square roots of exponential moving averages of squared past gradients. In many applications, e.g. learning with large output spaces, it has been empirically observed that these algorithms fail to converge to an optimal solution (or a critical point in nonconvex settings). We show that one cause for such failures is the exponential moving average used in the algorithms. We provide an explicit example of a simple convex optimization setting where Adam does not converge to the optimal solution, and describe the precise problems with the previous analysis of the Adam algorithm. Our analysis suggests that the convergence issues can be fixed by endowing such algorithms with 'long-term memory' of past gradients, and we propose new variants of the Adam algorithm which not only fix the convergence issues but often also lead to improved empirical performance.
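A minimal sketch of the proposed fix, the AMSGrad variant: keeping a running maximum of the second-moment estimate provides the "long-term memory" of past gradients, so the effective step size never grows. Bias correction is omitted here for brevity, and the quadratic toy problem is illustrative.

```python
# AMSGrad-style update for a single parameter vector, in plain numpy.
import numpy as np

def amsgrad_step(theta, grad, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m, v, v_hat = state
    m = b1 * m + (1 - b1) * grad         # first moment (as in Adam)
    v = b2 * v + (1 - b2) * grad ** 2    # second moment (as in Adam)
    v_hat = np.maximum(v_hat, v)         # AMSGrad: non-decreasing v ("memory")
    theta = theta - lr * m / (np.sqrt(v_hat) + eps)
    return theta, (m, v, v_hat)

theta = np.array([1.0])
state = (np.zeros(1), np.zeros(1), np.zeros(1))
for _ in range(100):
    grad = 2 * theta                     # gradient of f(x) = x^2
    theta, state = amsgrad_step(theta, grad, state)
print(theta)                             # moves toward the minimizer at 0
```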

Proceedings ArticleDOI
15 May 2018
TL;DR: OpenFace 2.0 is an extension of OpenFace toolkit and is capable of more accurate facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation.
Abstract: Over the past few years, there has been an increased interest in automatic facial behavior analysis and understanding. We present OpenFace 2.0 - a tool intended for computer vision and machine learning researchers, the affective computing community, and people interested in building interactive applications based on facial behavior analysis. OpenFace 2.0 is an extension of the OpenFace toolkit and is capable of more accurate facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation. The computer vision algorithms which represent the core of OpenFace 2.0 demonstrate state-of-the-art results in all of the above mentioned tasks. Furthermore, our tool is capable of real-time performance and is able to run from a simple webcam without any specialist hardware. Finally, unlike a lot of modern approaches or toolkits, the OpenFace 2.0 source code for training models and running them is freely available for research purposes.


Book ChapterDOI
08 Sep 2018
TL;DR: This paper proposes AutoML for Model Compression (AMC), which leverages reinforcement learning to efficiently sample the design space and improve model compression quality, achieving state-of-the-art model compression results in a fully automated way without any human effort.
Abstract: Model compression is an effective technique to efficiently deploy neural network models on mobile devices which have limited computation resources and tight power budgets. Conventional model compression techniques rely on hand-crafted features and require domain experts to explore the large design space trading off among model size, speed, and accuracy, which is usually sub-optimal and time-consuming. In this paper, we propose AutoML for Model Compression (AMC) which leverages reinforcement learning to efficiently sample the design space and can improve the model compression quality. We achieved state-of-the-art model compression results in a fully automated way without any human effort. Under 4× FLOPs reduction, we achieved 2.7% better accuracy than the hand-crafted model compression method for VGG-16 on ImageNet. We applied this automated, push-the-button compression pipeline to MobileNet-V1 and achieved a speedup of 1.53× on the GPU (Titan Xp) and 1.95× on an Android phone (Google Pixel 1), with negligible loss of accuracy.
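The paper's RL agent emits a per-layer compression ratio; the sketch below shows one primitive such a ratio could drive, L1-magnitude channel pruning of a conv layer. The pruning criterion and sizes are assumptions for illustration, not the paper's exact pipeline.

```python
# Magnitude-based channel pruning of a Conv2d layer at a given keep ratio.
import torch
import torch.nn as nn

def prune_channels(conv: nn.Conv2d, keep_ratio: float) -> nn.Conv2d:
    w = conv.weight.data                            # (out, in, k, k)
    scores = w.abs().sum(dim=(1, 2, 3))             # L1 norm per output channel
    k = max(1, int(keep_ratio * w.shape[0]))
    keep = scores.topk(k).indices.sort().values     # keep the strongest channels
    pruned = nn.Conv2d(w.shape[1], k, conv.kernel_size,
                       conv.stride, conv.padding, bias=conv.bias is not None)
    pruned.weight.data = w[keep].clone()
    if conv.bias is not None:
        pruned.bias.data = conv.bias.data[keep].clone()
    return pruned

conv = nn.Conv2d(64, 128, 3, padding=1)
print(prune_channels(conv, keep_ratio=0.5))  # Conv2d(64, 64, ...)
```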

Proceedings ArticleDOI
26 Jun 2018
TL;DR: PoseCNN as discussed by the authors estimates the 3D translation of an object by localizing its center in the image and predicting its distance from the camera, and estimates the 3D rotation by regressing to a quaternion representation.
Abstract: Estimating the 6D pose of known objects is important for robots to interact with the real world. The problem is challenging due to the variety of objects as well as the complexity of a scene caused by clutter and occlusions between objects. In this work, we introduce PoseCNN, a new Convolutional Neural Network for 6D object pose estimation. PoseCNN estimates the 3D translation of an object by localizing its center in the image and predicting its distance from the camera. The 3D rotation of the object is estimated by regressing to a quaternion representation. We also introduce a novel loss function that enables PoseCNN to handle symmetric objects. In addition, we contribute a large scale video dataset for 6D object pose estimation named the YCB-Video dataset. Our dataset provides accurate 6D poses of 21 objects from the YCB dataset observed in 92 videos with 133,827 frames. We conduct extensive experiments on our YCB-Video dataset and the OccludedLINEMOD dataset to show that PoseCNN is highly robust to occlusions, can handle symmetric objects, and provide accurate pose estimation using only color images as input. When using depth data to further refine the poses, our approach achieves state-of-the-art results on the challenging OccludedLINEMOD dataset. Our code and dataset are available at this https URL.
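Since the abstract's rotation branch regresses to a quaternion, a small helper like the following shows how an unnormalized network output can be normalized and turned into a rotation matrix; the (w, x, y, z) convention is an assumption, not taken from the paper's code.

```python
# Normalize a regressed quaternion and convert it to a 3x3 rotation matrix.
import torch

def quat_to_rotmat(q: torch.Tensor) -> torch.Tensor:
    q = q / q.norm()                      # network outputs are unnormalized
    w, x, y, z = q
    return torch.stack([
        torch.stack([1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)]),
        torch.stack([2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)]),
        torch.stack([2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)]),
    ])

R = quat_to_rotmat(torch.tensor([0.9, 0.1, 0.3, 0.2]))
print(torch.allclose(R @ R.T, torch.eye(3), atol=1e-6))  # True: R is orthonormal
```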

Journal ArticleDOI
TL;DR: Electrical control of magnetism in a bilayer of CrI3 enables the realization of an electrically driven magnetic phase transition and the observation of the magneto-optical Kerr effect in 2D magnets.
Abstract: Controlling magnetism via electric fields addresses fundamental questions of magnetic phenomena and phase transitions1–3, and enables the development of electrically coupled spintronic devices, such as voltage-controlled magnetic memories with low operation energy4–6. Previous studies on dilute magnetic semiconductors such as (Ga,Mn)As and (In,Mn)Sb have demonstrated large modulations of the Curie temperatures and coercive fields by altering the magnetic anisotropy and exchange interaction2,4,7–9. Owing to their unique magnetic properties10–14, the recently reported two-dimensional magnets provide a new system for studying these features15–19. For instance, a bilayer of chromium triiodide (CrI3) behaves as a layered antiferromagnet with a magnetic field-driven metamagnetic transition15,16. Here, we demonstrate electrostatic gate control of magnetism in CrI3 bilayers, probed by magneto-optical Kerr effect (MOKE) microscopy. At fixed magnetic fields near the metamagnetic transition, we realize voltage-controlled switching between antiferromagnetic and ferromagnetic states. At zero magnetic field, we demonstrate a time-reversal pair of layered antiferromagnetic states that exhibit spin-layer locking, leading to a linear dependence of their MOKE signals on gate voltage with opposite slopes. Our results allow for the exploration of new magnetoelectric phenomena and van der Waals spintronics based on 2D materials.

Posted Content
TL;DR: OpenPose is released, the first open-source realtime system for multi-person 2D pose detection, including body, foot, hand, and facial keypoints, and the first combined body and foot keypoint detector, based on an internal annotated foot dataset.
Abstract: Realtime multi-person 2D pose estimation is a key component in enabling machines to have an understanding of people in images and videos. In this work, we present a realtime approach to detect the 2D pose of multiple people in an image. The proposed method uses a nonparametric representation, which we refer to as Part Affinity Fields (PAFs), to learn to associate body parts with individuals in the image. This bottom-up system achieves high accuracy and realtime performance, regardless of the number of people in the image. In previous work, PAFs and body part location estimation were refined simultaneously across training stages. We demonstrate that a PAF-only refinement rather than both PAF and body part location refinement results in a substantial increase in both runtime performance and accuracy. We also present the first combined body and foot keypoint detector, based on an internal annotated foot dataset that we have publicly released. We show that the combined detector not only reduces the inference time compared to running them sequentially, but also maintains the accuracy of each component individually. This work has culminated in the release of OpenPose, the first open-source realtime system for multi-person 2D pose detection, including body, foot, hand, and facial keypoints.
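A minimal numpy sketch of how a Part Affinity Field can be scored during the bottom-up association step: integrate the predicted vector field along the candidate limb and measure its alignment with the limb's direction. The synthetic field and sampling scheme are illustrative assumptions.

```python
# Score a candidate limb by sampling the PAF along the joining segment.
import numpy as np

def paf_score(paf, p1, p2, n_samples=10):
    """paf: (2, H, W) vector field; p1, p2: (x, y) candidate joints."""
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    d = p2 - p1
    norm = np.linalg.norm(d)
    if norm < 1e-6:
        return 0.0
    d /= norm
    score = 0.0
    for t in np.linspace(0.0, 1.0, n_samples):
        x, y = (p1 + t * (p2 - p1)).round().astype(int)
        score += paf[:, y, x] @ d          # alignment of field with the limb
    return score / n_samples

paf = np.zeros((2, 32, 32))
paf[0, 10, :] = 1.0                        # field pointing along +x on row y=10
print(paf_score(paf, (2, 10), (20, 10)))   # ~1.0: a well-aligned limb
```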

Journal ArticleDOI
29 Jun 2018-Science
TL;DR: In this paper, the authors examine barriers and opportunities associated with these difficult-to-decarbonize services and processes, including possible technological solutions and research and development priorities, and assess whether existing technologies could meet future demands for these services without net addition of CO2 to the atmosphere.
Abstract: Some energy services and industrial processes-such as long-distance freight transport, air travel, highly reliable electricity, and steel and cement manufacturing-are particularly difficult to provide without adding carbon dioxide (CO2) to the atmosphere. Rapidly growing demand for these services, combined with long lead times for technology development and long lifetimes of energy infrastructure, make decarbonization of these services both essential and urgent. We examine barriers and opportunities associated with these difficult-to-decarbonize services and processes, including possible technological solutions and research and development priorities. A range of existing technologies could meet future demands for these services and processes without net addition of CO2 to the atmosphere, but their use may depend on a combination of cost reductions via research and innovation, as well as coordinated deployment and integration of operations across currently discrete energy industries.

Proceedings Article
01 Feb 2018
TL;DR: In this paper, an end-to-end trainable model for image compression based on variational autoencoders is proposed, which incorporates a hyperprior to effectively capture spatial dependencies in the latent representation.
Abstract: We describe an end-to-end trainable model for image compression based on variational autoencoders. The model incorporates a hyperprior to effectively capture spatial dependencies in the latent representation. This hyperprior relates to side information, a concept universal to virtually all modern image codecs, but largely unexplored in image compression using artificial neural networks (ANNs). Unlike existing autoencoder compression methods, our model trains a complex prior jointly with the underlying autoencoder. We demonstrate that this model leads to state-of-the-art image compression when measuring visual quality using the popular MS-SSIM index, and yields rate-distortion performance surpassing published ANN-based methods when evaluated using a more traditional metric based on squared error (PSNR). Furthermore, we provide a qualitative comparison of models trained for different distortion metrics.
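A hedged sketch of the rate term such a model optimizes: the expected bits for a quantized latent under a Gaussian whose scale would come from the hyper-decoder. The shapes, constant scale, and λ weighting below are illustrative assumptions.

```python
# Rate term for a quantized latent under a Gaussian conditional, plus a
# simple rate-distortion objective.
import torch

def rate_bits(y, sigma, mu=0.0):
    """-log2 P(y), with P from integrating a Gaussian over quantization bins."""
    dist = torch.distributions.Normal(mu, sigma)
    p = dist.cdf(y + 0.5) - dist.cdf(y - 0.5)   # probability of the bin
    return -torch.log2(p.clamp_min(1e-9)).sum()

y = torch.randn(1, 192, 16, 16).round()         # quantized latent
sigma = torch.full_like(y, 1.5)                 # scale a hyper-decoder would predict
mse = torch.tensor(0.001)                       # distortion from the decoder
lam = 0.01
loss = rate_bits(y, sigma) + lam * mse * 255**2  # rate-distortion objective
print(loss.item())
```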

Journal ArticleDOI
TL;DR: It is demonstrated that Fe3GeTe2 (FGT), an exfoliable vdW magnet, exhibits robust 2D ferromagnetism with strong perpendicular anisotropy when thinned down to a monolayer.
Abstract: Discoveries of intrinsic two-dimensional (2D) ferromagnetism in van der Waals (vdW) crystals provide an interesting arena for studying fundamental 2D magnetism and devices that employ localized spins1–4. However, an exfoliable vdW material that exhibits intrinsic 2D itinerant magnetism remains elusive. Here we demonstrate that Fe3GeTe2 (FGT), an exfoliable vdW magnet, exhibits robust 2D ferromagnetism with strong perpendicular anisotropy when thinned down to a monolayer. Layer-number-dependent studies reveal a crossover from 3D to 2D Ising ferromagnetism for thicknesses less than 4 nm (five layers), accompanied by a fast drop of the Curie temperature (TC) from 207 K to 130 K in the monolayer. For FGT flakes thicker than ~15 nm, a distinct magnetic behaviour emerges in an intermediate temperature range, which we show is due to the formation of labyrinthine domain patterns. Our work introduces an atomically thin ferromagnetic metal that could be useful for the study of controllable 2D itinerant ferromagnetism and for engineering spintronic vdW heterostructures5. Metallic ferromagnetism is reported in an exfoliated monolayer of the van der Waals material Fe3GeTe2.

Proceedings ArticleDOI
27 Jun 2018
TL;DR: A novel deep learning framework, namely Long- and Short-term Time-series network (LSTNet), to address this open challenge of multivariate time series forecasting, using the Convolution Neural Network and the Recurrent Neural Network to extract short-term local dependency patterns among variables and to discover long-term patterns for time series trends.
Abstract: Multivariate time series forecasting is an important machine learning problem across many domains, including predictions of solar plant energy output, electricity consumption, and traffic congestion. Temporal data arising in these real-world applications often involve a mixture of long-term and short-term patterns, for which traditional approaches such as autoregressive models and Gaussian processes may fail. In this paper, we propose a novel deep learning framework, namely the Long- and Short-term Time-series network (LSTNet), to address this open challenge. LSTNet uses a convolutional neural network (CNN) and a recurrent neural network (RNN) to extract short-term local dependency patterns among variables and to discover long-term patterns in time series trends. Furthermore, we leverage a traditional autoregressive model to tackle the scale-insensitivity problem of the neural network model. In our evaluation on real-world data with complex mixtures of repetitive patterns, LSTNet achieved significant performance improvements over several state-of-the-art baseline methods. All the data and experiment code are available online.
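A condensed PyTorch sketch of the LSTNet decomposition described above: a 1D convolution for short-term local patterns, a GRU for longer-term dynamics, and a linear autoregressive path for scale sensitivity. Layer sizes are illustrative, and the paper's recurrent-skip component is omitted for brevity.

```python
# A compact LSTNet-style forecaster: CNN + GRU + linear autoregression.
import torch
import torch.nn as nn

class TinyLSTNet(nn.Module):
    def __init__(self, n_vars, hid=50, kernel=6, ar_window=7):
        super().__init__()
        self.ar_window = ar_window
        self.conv = nn.Conv1d(n_vars, hid, kernel)     # short-term patterns
        self.gru = nn.GRU(hid, hid, batch_first=True)  # long-term patterns
        self.fc = nn.Linear(hid, n_vars)
        self.ar = nn.Linear(ar_window, 1)              # per-variable linear AR

    def forward(self, x):                  # x: (batch, time, n_vars)
        c = torch.relu(self.conv(x.transpose(1, 2)))   # (batch, hid, time')
        _, h = self.gru(c.transpose(1, 2))             # h: (1, batch, hid)
        nonlinear = self.fc(h[-1])                     # (batch, n_vars)
        # autoregressive path over the last ar_window steps of each variable
        ar_in = x[:, -self.ar_window:, :].transpose(1, 2)  # (batch, n_vars, w)
        linear = self.ar(ar_in).squeeze(-1)            # (batch, n_vars)
        return nonlinear + linear

x = torch.randn(4, 24, 8)                  # 8 series, 24 time steps
print(TinyLSTNet(8)(x).shape)              # torch.Size([4, 8])
```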

Proceedings ArticleDOI
25 Sep 2018
TL;DR: HotpotQA as discussed by the authors is a dataset with 113k Wikipedia-based question-answer pairs with four key features: the questions require finding and reasoning over multiple supporting documents to answer; they are diverse and not constrained to any pre-existing knowledge bases or knowledge schemas; sentence-level supporting facts required for reasoning are provided; and a new type of factoid comparison question tests QA systems' ability to extract relevant facts and perform necessary comparison.
Abstract: Existing question answering (QA) datasets fail to train QA systems to perform complex reasoning and provide explanations for answers. We introduce HotpotQA, a new dataset with 113k Wikipedia-based question-answer pairs with four key features: (1) the questions require finding and reasoning over multiple supporting documents to answer; (2) the questions are diverse and not constrained to any pre-existing knowledge bases or knowledge schemas; (3) we provide sentence-level supporting facts required for reasoning, allowing QA systems to reason with strong supervision and explain the predictions; (4) we offer a new type of factoid comparison questions to test QA systems' ability to extract relevant facts and perform necessary comparison. We show that HotpotQA is challenging for the latest QA systems, and the supporting facts enable models to improve performance and make explainable predictions.

Proceedings ArticleDOI
18 Jun 2018
TL;DR: This work introduces new sparse convolutional operations that are designed to process spatially-sparse data more efficiently, and uses them to develop submanifold sparse convolutional networks, which outperform all prior state-of-the-art models on two tasks involving semantic segmentation of 3D point clouds.
Abstract: Convolutional networks are the de-facto standard for analyzing spatio-temporal data such as images, videos, and 3D shapes. Whilst some of this data is naturally dense (e.g., photos), many other data sources are inherently sparse. Examples include 3D point clouds that were obtained using a LiDAR scanner or RGB-D camera. Standard "dense" implementations of convolutional networks are very inefficient when applied on such sparse data. We introduce new sparse convolutional operations that are designed to process spatially-sparse data more efficiently, and use them to develop spatially-sparse convolutional networks. We demonstrate the strong performance of the resulting models, called submanifold sparse convolutional networks (SS-CNs), on two tasks involving semantic segmentation of 3D point clouds. In particular, our models outperform all prior state-of-the-art on the test set of a recent semantic segmentation competition.
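A naive, dictionary-based sketch of the submanifold idea: convolve only at active sites and emit outputs only at those same sites, so sparsity never dilates from layer to layer. The data layout and sizes are illustrative; real implementations use hash tables and custom GPU kernels.

```python
# Naive submanifold 2D convolution over a sparse set of active sites.
import numpy as np

def submanifold_conv2d(active, weights):
    """active: {(y, x): feature vector}; weights: {(dy, dx): (C_out, C_in)}."""
    out = {}
    for (y, x) in active:                      # outputs only at active sites
        acc = None
        for (dy, dx), w in weights.items():
            nb = active.get((y + dy, x + dx))
            if nb is not None:                 # skip empty neighbours entirely
                acc = (0 if acc is None else acc) + w @ nb
        out[(y, x)] = acc
    return out

rng = np.random.default_rng(0)
weights = {(dy, dx): rng.normal(size=(4, 3))   # 3x3 kernel, 3 -> 4 channels
           for dy in (-1, 0, 1) for dx in (-1, 0, 1)}
active = {(0, 0): rng.normal(size=3), (0, 1): rng.normal(size=3)}
print({k: v.shape for k, v in submanifold_conv2d(active, weights).items()})
```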

Journal ArticleDOI
15 Jun 2018-Science
TL;DR: This work reveals the possibility of pushing magnetic information storage to the atomically thin limit and highlights CrI3 as a superlative magnetic tunnel barrier for vdW heterostructure spintronic devices.
Abstract: Magnetic multilayer devices that exploit magnetoresistance are the backbone of magnetic sensing and data storage technologies. Here, we report multiple-spin-filter magnetic tunnel junctions (sf-MTJs) based on van der Waals (vdW) heterostructures in which atomically thin chromium triiodide (CrI3) acts as a spin-filter tunnel barrier sandwiched between graphene contacts. We demonstrate tunneling magnetoresistance that is drastically enhanced with increasing CrI3 layer thickness, reaching a record 19,000% for magnetic multilayer structures using four-layer sf-MTJs at low temperatures. Using magnetic circular dichroism measurements, we attribute these effects to the intrinsic layer-by-layer antiferromagnetic ordering of the atomically thin CrI3. Our work reveals the possibility to push magnetic information storage to the atomically thin limit and highlights CrI3 as a superlative magnetic tunnel barrier for vdW heterostructure spintronic devices.

Posted Content
TL;DR: In this article, a data-dependent latent generative representation of model parameters is learned and gradient-based meta-learning is performed in a low-dimensional latent space for few-shot learning.
Abstract: Gradient-based meta-learning techniques are both widely applicable and proficient at solving challenging few-shot learning and fast adaptation problems. However, they have practical difficulties when operating on high-dimensional parameter spaces in extreme low-data regimes. We show that it is possible to bypass these limitations by learning a data-dependent latent generative representation of model parameters, and performing gradient-based meta-learning in this low-dimensional latent space. The resulting approach, latent embedding optimization (LEO), decouples the gradient-based adaptation procedure from the underlying high-dimensional space of model parameters. Our evaluation shows that LEO can achieve state-of-the-art performance on the competitive miniImageNet and tieredImageNet few-shot classification tasks. Further analysis indicates LEO is able to capture uncertainty in the data, and can perform adaptation more effectively by optimizing in latent space.
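A compact sketch of the core move: adapt a low-dimensional latent code by gradient steps on the task loss, decoding the code into model parameters (here, a linear classifier). The decoder, dimensions, and toy task are illustrative assumptions, not the paper's architecture.

```python
# Gradient-based adaptation in a latent space that decodes to parameters.
import torch

latent_dim, n_feat, n_cls = 8, 16, 5
decoder = torch.nn.Linear(latent_dim, n_feat * n_cls)  # z -> classifier weights

def adapt(z, x, y, steps=3, lr=0.5):
    for _ in range(steps):
        w = decoder(z).view(n_cls, n_feat)             # decode parameters
        loss = torch.nn.functional.cross_entropy(x @ w.t(), y)
        (grad,) = torch.autograd.grad(loss, z, create_graph=True)
        z = z - lr * grad                              # step in latent space
    return z

x = torch.randn(10, n_feat)                            # few-shot support set
y = torch.randint(0, n_cls, (10,))
z0 = torch.zeros(latent_dim, requires_grad=True)       # would come from an encoder
print(adapt(z0, x, y).shape)                           # torch.Size([8])
```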

Journal ArticleDOI
31 Jan 2018
TL;DR: These 10 grand challenges are areas in which major breakthroughs, research advances, and/or socioeconomic impacts may occur in the next 5 to 10 years; the first seven represent underpinning technologies that have a wider impact on all application areas of robotics.
Abstract: One of the ambitions of Science Robotics is to deeply root robotics research in science while developing novel robotic platforms that will enable new scientific discoveries. Of our 10 grand challenges, the first 7 represent underpinning technologies that have a wider impact on all application areas of robotics. For the next two challenges, we have included social robotics and medical robotics as application-specific areas of development to highlight the substantial societal and health impacts that they will bring. Finally, the last challenge is related to responsible innovation and how ethics and security should be carefully considered as we develop the technology further.

Posted Content
TL;DR: LEAF is proposed, a modular benchmarking framework for learning in federated settings that includes a suite of open-source federated datasets, a rigorous evaluation framework, and a set of reference implementations, all geared towards capturing the obstacles and intricacies of practical federated environments.
Abstract: Modern federated networks, such as those comprised of wearable devices, mobile phones, or autonomous vehicles, generate massive amounts of data each day. This wealth of data can help to learn models that can improve the user experience on each device. However, the scale and heterogeneity of federated data presents new challenges in research areas such as federated learning, meta-learning, and multi-task learning. As the machine learning community begins to tackle these challenges, we are at a critical time to ensure that developments made in these areas are grounded with realistic benchmarks. To this end, we propose LEAF, a modular benchmarking framework for learning in federated settings. LEAF includes a suite of open-source federated datasets, a rigorous evaluation framework, and a set of reference implementations, all geared towards capturing the obstacles and intricacies of practical federated environments.

Book ChapterDOI
08 Sep 2018
TL;DR: The proposed graph representation achieves state-of-the-art results on the Charades and Something-Something datasets and obtains a huge gain when the model is applied in complex environments.
Abstract: How do humans recognize the action "opening a book"? We argue that there are two important cues: modeling temporal shape dynamics and modeling functional relationships between humans and objects. In this paper, we propose to represent videos as space-time region graphs which capture these two important cues. Our graph nodes are defined by the object region proposals from different frames in a long range video. These nodes are connected by two types of relations: (i) similarity relations capturing the long range dependencies between correlated objects and (ii) spatial-temporal relations capturing the interactions between nearby objects. We perform reasoning on this graph representation via Graph Convolutional Networks. We achieve state-of-the-art results on the Charades and Something-Something datasets. Especially for Charades, with its complex environments, our model obtains a large 4.4% gain.
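Reasoning over the region graph uses graph convolutions; below is a minimal sketch of one such layer, where each node updates its feature with a normalized weighted sum over its neighbours. The 3-node graph is an illustrative assumption.

```python
# One graph-convolution layer over node (region) features.
import torch

def gcn_layer(A, X, W):
    """A: (N, N) adjacency with self-loops; X: (N, d_in); W: (d_in, d_out)."""
    deg = A.sum(dim=1, keepdim=True)
    return torch.relu((A / deg) @ X @ W)  # row-normalized message passing

N, d_in, d_out = 3, 16, 8
A = torch.eye(N)
A[0, 1] = A[1, 0] = 1.0                   # e.g., a similarity relation
X = torch.randn(N, d_in)                  # node features from region proposals
W = torch.randn(d_in, d_out)
print(gcn_layer(A, X, W).shape)           # torch.Size([3, 8])
```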

Proceedings ArticleDOI
06 Mar 2018
TL;DR: The authors showed that a simple text categorization model can correctly classify the hypothesis alone in about 67% of SNLI and 53% of MultiNLI, and that specific linguistic phenomena such as negation and vagueness are highly correlated with certain inference classes.
Abstract: Large-scale datasets for natural language inference are created by presenting crowd workers with a sentence (premise), and asking them to generate three new sentences (hypotheses) that it entails, contradicts, or is logically neutral with respect to. We show that, in a significant portion of such data, this protocol leaves clues that make it possible to identify the label by looking only at the hypothesis, without observing the premise. Specifically, we show that a simple text categorization model can correctly classify the hypothesis alone in about 67% of SNLI (Bowman et al., 2015) and 53% of MultiNLI (Williams et al., 2017). Our analysis reveals that specific linguistic phenomena such as negation and vagueness are highly correlated with certain inference classes. Our findings suggest that the success of natural language inference models to date has been overestimated, and that the task remains a hard open problem.
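The paper's probe is easy to reproduce in spirit: train a bag-of-words classifier on hypotheses alone and check how far above chance it lands. The tiny inline dataset below is an illustrative stand-in for SNLI/MultiNLI.

```python
# A hypothesis-only baseline: the premise is never seen by the model.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

hypotheses = [
    "A man is sleeping.",          # artifacts like negation and universals
    "Nobody is outside.",          # tend to mark contradiction, while
    "A person is outdoors.",       # generalizations tend to mark entailment
    "The man is not eating.",
]
labels = ["contradiction", "contradiction", "entailment", "contradiction"]

clf = make_pipeline(CountVectorizer(), LogisticRegression())
clf.fit(hypotheses, labels)                       # hypothesis-only training
print(clf.predict(["The woman is not walking."]))
```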

Proceedings ArticleDOI
14 Dec 2018
TL;DR: FoldingNet as discussed by the authors proposes an end-to-end deep auto-encoder to address unsupervised learning challenges on point clouds, where a folding-based decoder deforms a canonical 2D grid onto the underlying 3D object surface of a point cloud.
Abstract: Recent deep networks that directly handle points in a point set, e.g., PointNet, have been state-of-the-art for supervised learning tasks on point clouds such as classification and segmentation. In this work, a novel end-to-end deep auto-encoder is proposed to address unsupervised learning challenges on point clouds. On the encoder side, a graph-based enhancement is enforced to promote local structures on top of PointNet. Then, a novel folding-based decoder deforms a canonical 2D grid onto the underlying 3D object surface of a point cloud, achieving low reconstruction errors even for objects with delicate structures. The proposed decoder only uses about 7% parameters of a decoder with fully-connected neural networks, yet leads to a more discriminative representation that achieves higher linear SVM classification accuracy than the benchmark. In addition, the proposed decoder structure is shown, in theory, to be a generic architecture that is able to reconstruct an arbitrary point cloud from a 2D grid. Our code is available at http://www.merl.com/research/license#FoldingNet
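A minimal sketch of the folding-based decoder described above: each point of a canonical 2D grid is concatenated with the codeword and passed through a shared MLP that "folds" the grid onto a 3D surface. Sizes are illustrative, and the paper's second folding stage is omitted for brevity.

```python
# Folding a canonical 2D grid into a 3D point cloud, conditioned on a codeword.
import torch
import torch.nn as nn

code_dim, grid_n = 512, 45                       # 45x45 grid = 2025 points
fold = nn.Sequential(
    nn.Linear(code_dim + 2, 256), nn.ReLU(),
    nn.Linear(256, 3),                           # grid point -> 3D point
)

u = torch.linspace(-1, 1, grid_n)
grid = torch.cartesian_prod(u, u)                # (grid_n^2, 2) canonical grid
codeword = torch.randn(code_dim)                 # from the graph-enhanced encoder
inp = torch.cat([codeword.expand(grid.shape[0], -1), grid], dim=1)
points = fold(inp)                               # reconstructed point cloud
print(points.shape)                              # torch.Size([2025, 3])
```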