
Showing papers by "Carnegie Mellon University" published in 2018


Proceedings ArticleDOI
18 Jun 2018
TL;DR: In this article, the non-local operation computes the response at a position as a weighted sum of the features at all positions, which can be used to capture long-range dependencies.
Abstract: Both convolutional and recurrent operations are building blocks that process one local neighborhood at a time. In this paper, we present non-local operations as a generic family of building blocks for capturing long-range dependencies. Inspired by the classical non-local means method [4] in computer vision, our non-local operation computes the response at a position as a weighted sum of the features at all positions. This building block can be plugged into many computer vision architectures. On the task of video classification, even without any bells and whistles, our non-local models can compete with or outperform current competition winners on both the Kinetics and Charades datasets. In static image recognition, our non-local models improve object detection/segmentation and pose estimation on the COCO suite of tasks. Code will be made available.

8,059 citations
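To make the non-local operation concrete, here is a minimal PyTorch sketch of an embedded-Gaussian non-local block: the response at each position is a softmax-weighted sum of the embedded features at all positions, added back residually. The layer sizes and 1×1-conv embeddings are illustrative assumptions, not the paper's exact configuration.

```python
# A minimal sketch of an embedded-Gaussian non-local block (2D features).
import torch
import torch.nn as nn

class NonLocalBlock2d(nn.Module):
    def __init__(self, channels, reduced=None):
        super().__init__()
        reduced = reduced or channels // 2
        self.theta = nn.Conv2d(channels, reduced, 1)  # query embedding
        self.phi = nn.Conv2d(channels, reduced, 1)    # key embedding
        self.g = nn.Conv2d(channels, reduced, 1)      # value embedding
        self.out = nn.Conv2d(reduced, channels, 1)    # restore channel count

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)  # (b, hw, c')
        k = self.phi(x).flatten(2)                    # (b, c', hw)
        v = self.g(x).flatten(2).transpose(1, 2)      # (b, hw, c')
        attn = torch.softmax(q @ k, dim=-1)           # weights over all positions
        y = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)  # weighted sum
        return x + self.out(y)                        # residual connection

x = torch.randn(2, 64, 14, 14)
print(NonLocalBlock2d(64)(x).shape)  # torch.Size([2, 64, 14, 14])
```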


Proceedings Article
24 Jun 2018
TL;DR: The proposed algorithm excels in discovering high-performance convolutional architectures for image classification and recurrent architectures for language modeling, while being orders of magnitude faster than state-of-the-art non-differentiable techniques.
Abstract: This paper addresses the scalability challenge of architecture search by formulating the task in a differentiable manner. Unlike conventional approaches of applying evolution or reinforcement learning over a discrete and non-differentiable search space, our method is based on the continuous relaxation of the architecture representation, allowing efficient search of the architecture using gradient descent. Extensive experiments on CIFAR-10, ImageNet, Penn Treebank and WikiText-2 show that our algorithm excels in discovering high-performance convolutional architectures for image classification and recurrent architectures for language modeling, while being orders of magnitude faster than state-of-the-art non-differentiable techniques. Our implementation has been made publicly available to facilitate further research on efficient architecture search algorithms.

2,466 citations
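The continuous relaxation the abstract describes can be sketched in a few lines: each candidate operation on an edge is weighted by a softmax over learnable architecture parameters, so those parameters receive gradients like any other weight. The candidate set below is an illustrative assumption, not the paper's full search space.

```python
# A minimal sketch of a softmax-weighted mixed operation on one edge.
import torch
import torch.nn as nn

class MixedOp(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Identity(),                                # skip connection
            nn.Conv2d(channels, channels, 3, padding=1),  # 3x3 conv
            nn.MaxPool2d(3, stride=1, padding=1),         # 3x3 max pool
        ])
        # one architecture parameter per candidate op on this edge
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        w = torch.softmax(self.alpha, dim=0)  # continuous relaxation
        return sum(wi * op(x) for wi, op in zip(w, self.ops))

x = torch.randn(1, 16, 8, 8)
op = MixedOp(16)
op(x).sum().backward()            # gradients flow to op.alpha
print(op.alpha.grad is not None)  # True
```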


Journal ArticleDOI
TL;DR: In this article, the authors review the current state-of-the-art of CO2 capture, transport, utilisation and storage from a multi-scale perspective, moving from the global to molecular scales.
Abstract: Carbon capture and storage (CCS) is broadly recognised as having the potential to play a key role in meeting climate change targets, delivering low carbon heat and power, decarbonising industry and, more recently, its ability to facilitate the net removal of CO2 from the atmosphere. However, despite this broad consensus and its technical maturity, CCS has not yet been deployed on a scale commensurate with the ambitions articulated a decade ago. Thus, in this paper we review the current state-of-the-art of CO2 capture, transport, utilisation and storage from a multi-scale perspective, moving from the global to molecular scales. In light of the COP21 commitments to limit warming to less than 2 °C, we extend the remit of this study to include the key negative emissions technologies (NETs) of bioenergy with CCS (BECCS), and direct air capture (DAC). Cognisant of the non-technical barriers to deploying CCS, we reflect on recent experience from the UK's CCS commercialisation programme and consider the commercial and political barriers to the large-scale deployment of CCS. In all areas, we focus on identifying and clearly articulating the key research challenges that could usefully be addressed in the coming decade.

2,088 citations


Journal ArticleDOI
TL;DR: It is found that deep learning has yet to revolutionize biomedicine or definitively resolve any of the most pressing challenges in the field, but promising advances have been made on the prior state of the art.
Abstract: Deep learning describes a class of machine learning algorithms that are capable of combining raw inputs into layers of intermediate features. These algorithms have recently shown impressive results across a variety of domains. Biology and medicine are data-rich disciplines, but the data are complex and often ill-understood. Hence, deep learning techniques may be particularly well suited to solve problems in these fields. We examine applications of deep learning to a variety of biomedical problems (patient classification, fundamental biological processes, and treatment of patients) and discuss whether deep learning will be able to transform these tasks or if the biomedical sphere poses unique challenges. Following from an extensive literature review, we find that deep learning has yet to revolutionize biomedicine or definitively resolve any of the most pressing challenges in the field, but promising advances have been made on the prior state of the art. Even though improvements over previous baselines have been modest in general, the recent progress indicates that deep learning methods will provide valuable means for speeding up or aiding human investigation. Though progress has been made linking a specific neural network's prediction to input features, understanding how users should interpret these models to make testable hypotheses about the system under study remains an open challenge. Furthermore, the limited amount of labelled data for training presents problems in some domains, as do legal and privacy constraints on work with sensitive health records. Nonetheless, we foresee deep learning enabling changes at both bench and bedside with the potential to transform several areas of biology and medicine.

1,491 citations


Proceedings ArticleDOI
12 Mar 2018
TL;DR: Dense upsampling convolution (DUC) is designed to generate pixel-level predictions that capture and decode detailed information generally missing in bilinear upsampling, and a hybrid dilated convolution (HDC) framework is proposed for the encoding phase.
Abstract: Recent advances in deep learning, especially deep convolutional neural networks (CNNs), have led to significant improvement over previous semantic segmentation systems. Here we show how to improve pixel-wise semantic segmentation by manipulating convolution-related operations that are of both theoretical and practical value. First, we design dense upsampling convolution (DUC) to generate pixel-level prediction, which is able to capture and decode more detailed information that is generally missing in bilinear upsampling. Second, we propose a hybrid dilated convolution (HDC) framework in the encoding phase. This framework 1) effectively enlarges the receptive fields (RF) of the network to aggregate global information; 2) alleviates what we call the "gridding issue" caused by the standard dilated convolution operation. We evaluate our approaches thoroughly on the Cityscapes dataset, and achieve a state-of-the-art result of 80.1% mIoU on the test set at the time of submission. We have also achieved state-of-the-art results overall on the KITTI road estimation benchmark and the PASCAL VOC2012 segmentation task. Our source code can be found at https://github.com/TuSimple/TuSimple-DUC.

1,358 citations
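Both ideas are simple to sketch. DUC predicts d·d·L channels at low resolution and rearranges them into a full-resolution, L-channel prediction (PixelShuffle performs exactly this rearrangement), while HDC stacks dilated convolutions with varying rates so the receptive field has no "gridding" holes. Sizes below are illustrative assumptions.

```python
# A minimal sketch of DUC and HDC, assuming a backbone at 1/8 resolution.
import torch
import torch.nn as nn

num_classes, d = 19, 8                 # d = downsampling factor of the backbone
duc = nn.Sequential(
    nn.Conv2d(2048, num_classes * d * d, 3, padding=1),  # dense prediction
    nn.PixelShuffle(d),                # (C*d*d, H, W) -> (C, H*d, W*d)
)
feat = torch.randn(1, 2048, 64, 128)   # backbone features at 1/8 resolution
print(duc(feat).shape)                 # torch.Size([1, 19, 512, 1024])

# HDC-style stack: 3x3 dilated convs with varied rates (e.g., 1, 2, 5) so the
# combined receptive field is dense, avoiding the gridding artifact.
hdc = nn.Sequential(*[
    nn.Conv2d(2048, 2048, 3, padding=r, dilation=r) for r in (1, 2, 5)
])
```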


Journal ArticleDOI
TL;DR: In machine learning, the concept of interpretability is both important and slippery: can you trust your model, will it work in deployment, and what else can it tell you about the world?
Abstract: Supervised machine-learning models boast remarkable predictive capabilities. But can you trust your model? Will it work in deployment? What else can it tell you about the world?

1,307 citations


Journal ArticleDOI
25 Apr 2018
TL;DR: An overview of core ideas in GSP and their connection to conventional digital signal processing are provided, along with a brief historical perspective to highlight how concepts recently developed build on top of prior research in other areas.
Abstract: Research in graph signal processing (GSP) aims to develop tools for processing data defined on irregular graph domains. In this paper, we first provide an overview of core ideas in GSP and their connection to conventional digital signal processing, along with a brief historical perspective to highlight how concepts recently developed in GSP build on top of prior research in other areas. We then summarize recent advances in developing basic GSP tools, including methods for sampling, filtering, or graph learning. Next, we review progress in several application areas using GSP, including processing and analysis of sensor network data, biological data, and applications to image processing and machine learning.

1,306 citations
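As a toy illustration of the basic GSP tools the paper surveys, the sketch below filters a signal on a 4-node path graph with a first-order polynomial of the graph Laplacian; the graph and filter taps are illustrative assumptions.

```python
# Filtering a graph signal with a polynomial of the graph Laplacian.
import numpy as np

A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)  # adjacency of a 4-node path graph
L = np.diag(A.sum(axis=1)) - A             # combinatorial graph Laplacian

x = np.array([1.0, -1.0, 1.0, -1.0])       # a "high-frequency" graph signal
h = [1.0, -0.5]                            # filter taps: y = h0*x + h1*(L x)
y = h[0] * x + h[1] * (L @ x)

# Spectral view: eigenvalues of L act as graph frequencies.
evals, evecs = np.linalg.eigh(L)
print("frequencies:", np.round(evals, 3))
print("filtered signal:", np.round(y, 3))
```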


Posted Content
TL;DR: In this article, the authors propose a differentiable architecture search algorithm based on a continuous relaxation of the architecture representation, replacing search over a discrete and non-differentiable space with gradient descent.
Abstract: This paper addresses the scalability challenge of architecture search by formulating the task in a differentiable manner. Unlike conventional approaches of applying evolution or reinforcement learning over a discrete and non-differentiable search space, our method is based on the continuous relaxation of the architecture representation, allowing efficient search of the architecture using gradient descent. Extensive experiments on CIFAR-10, ImageNet, Penn Treebank and WikiText-2 show that our algorithm excels in discovering high-performance convolutional architectures for image classification and recurrent architectures for language modeling, while being orders of magnitude faster than state-of-the-art non-differentiable techniques. Our implementation has been made publicly available to facilitate further research on efficient architecture search algorithms.

1,272 citations


Journal ArticleDOI
TL;DR: In this article, the authors ask whether you can trust a supervised machine learning model, whether it will work in deployment, and what else it can tell you about the world beyond its predictive capabilities.
Abstract: Supervised machine-learning models boast remarkable predictive capabilities. But can you trust your model? Will it work in deployment? What else can it tell you about the world?

1,197 citations


Posted ContentDOI
Spyridon Bakas, Mauricio Reyes, Andras Jakab, Stefan Bauer, +435 more (111 institutions)
TL;DR: This study assesses the state-of-the-art machine learning methods used for brain tumor image analysis in mpMRI scans, during the last seven instances of the International Brain Tumor Segmentation (BraTS) challenge, i.e., 2012-2018, and investigates the challenge of identifying the best ML algorithms for each of these tasks.
Abstract: Gliomas are the most common primary brain malignancies, with different degrees of aggressiveness, variable prognosis and various heterogeneous histologic sub-regions, i.e., peritumoral edematous/invaded tissue, necrotic core, active and non-enhancing core. This intrinsic heterogeneity is also portrayed in their radio-phenotype, as their sub-regions are depicted by varying intensity profiles disseminated across multi-parametric magnetic resonance imaging (mpMRI) scans, reflecting varying biological properties. Their heterogeneous shape, extent, and location are some of the factors that make these tumors difficult to resect, and in some cases inoperable. The amount of resected tumor is a factor also considered in longitudinal scans, when evaluating the apparent tumor for potential diagnosis of progression. Furthermore, there is mounting evidence that accurate segmentation of the various tumor sub-regions can offer the basis for quantitative image analysis towards prediction of patient overall survival. This study assesses the state-of-the-art machine learning (ML) methods used for brain tumor image analysis in mpMRI scans, during the last seven instances of the International Brain Tumor Segmentation (BraTS) challenge, i.e., 2012-2018. Specifically, we focus on i) evaluating segmentations of the various glioma sub-regions in pre-operative mpMRI scans, ii) assessing potential tumor progression by virtue of longitudinal growth of tumor sub-regions, beyond use of the RECIST/RANO criteria, and iii) predicting the overall survival from pre-operative mpMRI scans of patients that underwent gross total resection. Finally, we investigate the challenge of identifying the best ML algorithms for each of these tasks, considering that apart from being diverse on each instance of the challenge, the multi-institutional mpMRI BraTS dataset has also been a continuously evolving/growing dataset.

1,165 citations
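Segmentation entries in the BraTS challenge are scored chiefly by overlap metrics such as the Dice coefficient; a minimal sketch of that metric, with an illustrative toy mask, is below.

```python
# Dice overlap between a predicted and a reference binary segmentation mask.
import numpy as np

def dice(pred, truth, eps=1e-8):
    """Dice = 2|P ∩ T| / (|P| + |T|) for boolean masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    return 2.0 * inter / (pred.sum() + truth.sum() + eps)

seg = np.zeros((8, 8), dtype=int); seg[2:6, 2:6] = 1  # predicted mask
gt = np.zeros((8, 8), dtype=int); gt[3:7, 3:7] = 1    # reference mask
print(round(dice(seg, gt), 3))  # ~0.562
```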


Proceedings Article
15 Feb 2018
TL;DR: It is shown that one cause for such failures is the exponential moving average used in the algorithms, and it is suggested that the convergence issues can be fixed by endowing such algorithms with 'long-term memory' of past gradients.
Abstract: Several recently proposed stochastic optimization methods that have been successfully used in training deep networks such as RMSProp, Adam, Adadelta, Nadam are based on using gradient updates scaled by square roots of exponential moving averages of squared past gradients. In many applications, e.g. learning with large output spaces, it has been empirically observed that these algorithms fail to converge to an optimal solution (or a critical point in nonconvex settings). We show that one cause for such failures is the exponential moving average used in the algorithms. We provide an explicit example of a simple convex optimization setting where Adam does not converge to the optimal solution, and describe the precise problems with the previous analysis of the Adam algorithm. Our analysis suggests that the convergence issues can be fixed by endowing such algorithms with 'long-term memory' of past gradients, and we propose new variants of the Adam algorithm which not only fix the convergence issues but often also lead to improved empirical performance.
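A minimal sketch of the proposed fix, the AMSGrad variant: keeping a running maximum of the second-moment estimate provides the "long-term memory" of past gradients, so the effective step size never grows. Bias correction is omitted here for brevity, and the quadratic toy problem is illustrative.

```python
# AMSGrad-style update for a single parameter vector, in plain numpy.
import numpy as np

def amsgrad_step(theta, grad, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m, v, v_hat = state
    m = b1 * m + (1 - b1) * grad         # first moment (as in Adam)
    v = b2 * v + (1 - b2) * grad ** 2    # second moment (as in Adam)
    v_hat = np.maximum(v_hat, v)         # AMSGrad: non-decreasing v ("memory")
    theta = theta - lr * m / (np.sqrt(v_hat) + eps)
    return theta, (m, v, v_hat)

theta = np.array([1.0])
state = (np.zeros(1), np.zeros(1), np.zeros(1))
for _ in range(100):
    grad = 2 * theta                     # gradient of f(x) = x^2
    theta, state = amsgrad_step(theta, grad, state)
print(theta)                             # moves toward the minimizer at 0
```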

Proceedings ArticleDOI
15 May 2018
TL;DR: OpenFace 2.0 is an extension of OpenFace toolkit and is capable of more accurate facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation.
Abstract: Over the past few years, there has been an increased interest in automatic facial behavior analysis and understanding. We present OpenFace 2.0 - a tool intended for computer vision and machine learning researchers, the affective computing community, and people interested in building interactive applications based on facial behavior analysis. OpenFace 2.0 is an extension of the OpenFace toolkit and is capable of more accurate facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation. The computer vision algorithms which represent the core of OpenFace 2.0 demonstrate state-of-the-art results in all of the above mentioned tasks. Furthermore, our tool is capable of real-time performance and is able to run from a simple webcam without any specialist hardware. Finally, unlike a lot of modern approaches or toolkits, the OpenFace 2.0 source code for training models and running them is freely available for research purposes.


Book ChapterDOI
08 Sep 2018
TL;DR: This paper proposes AutoML for Model Compression (AMC), which leverages reinforcement learning to efficiently sample the design space and improve model compression quality, achieving state-of-the-art model compression results in a fully automated way without any human effort.
Abstract: Model compression is an effective technique to efficiently deploy neural network models on mobile devices which have limited computation resources and tight power budgets. Conventional model compression techniques rely on hand-crafted features and require domain experts to explore the large design space trading off among model size, speed, and accuracy, which is usually sub-optimal and time-consuming. In this paper, we propose AutoML for Model Compression (AMC) which leverages reinforcement learning to efficiently sample the design space and can improve the model compression quality. We achieved state-of-the-art model compression results in a fully automated way without any human effort. Under 4× FLOPs reduction, we achieved 2.7% better accuracy than the hand-crafted model compression method for VGG-16 on ImageNet. We applied this automated, push-the-button compression pipeline to MobileNet-V1 and achieved a speedup of 1.53× on the GPU (Titan Xp) and 1.95× on an Android phone (Google Pixel 1), with negligible loss of accuracy.
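The paper's RL agent emits a per-layer compression ratio; the sketch below shows one primitive such a ratio could drive, L1-magnitude channel pruning of a conv layer. The pruning criterion and sizes are assumptions for illustration, not the paper's exact pipeline.

```python
# Magnitude-based channel pruning of a Conv2d layer at a given keep ratio.
import torch
import torch.nn as nn

def prune_channels(conv: nn.Conv2d, keep_ratio: float) -> nn.Conv2d:
    w = conv.weight.data                            # (out, in, k, k)
    scores = w.abs().sum(dim=(1, 2, 3))             # L1 norm per output channel
    k = max(1, int(keep_ratio * w.shape[0]))
    keep = scores.topk(k).indices.sort().values     # keep the strongest channels
    pruned = nn.Conv2d(w.shape[1], k, conv.kernel_size,
                       conv.stride, conv.padding, bias=conv.bias is not None)
    pruned.weight.data = w[keep].clone()
    if conv.bias is not None:
        pruned.bias.data = conv.bias.data[keep].clone()
    return pruned

conv = nn.Conv2d(64, 128, 3, padding=1)
print(prune_channels(conv, keep_ratio=0.5))  # Conv2d(64, 64, ...)
```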

Proceedings ArticleDOI
26 Jun 2018
TL;DR: PoseCNN as discussed by the authors estimates the 3D translation of an object by localizing its center in the image and predicting its distance from the camera, and estimates the 3D rotation by regressing to a quaternion representation.
Abstract: Estimating the 6D pose of known objects is important for robots to interact with the real world. The problem is challenging due to the variety of objects as well as the complexity of a scene caused by clutter and occlusions between objects. In this work, we introduce PoseCNN, a new Convolutional Neural Network for 6D object pose estimation. PoseCNN estimates the 3D translation of an object by localizing its center in the image and predicting its distance from the camera. The 3D rotation of the object is estimated by regressing to a quaternion representation. We also introduce a novel loss function that enables PoseCNN to handle symmetric objects. In addition, we contribute a large scale video dataset for 6D object pose estimation named the YCB-Video dataset. Our dataset provides accurate 6D poses of 21 objects from the YCB dataset observed in 92 videos with 133,827 frames. We conduct extensive experiments on our YCB-Video dataset and the OccludedLINEMOD dataset to show that PoseCNN is highly robust to occlusions, can handle symmetric objects, and provide accurate pose estimation using only color images as input. When using depth data to further refine the poses, our approach achieves state-of-the-art results on the challenging OccludedLINEMOD dataset. Our code and dataset are available at this https URL.
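Since the abstract's rotation branch regresses to a quaternion, a small helper like the following shows how an unnormalized network output can be normalized and turned into a rotation matrix; the (w, x, y, z) convention is an assumption, not taken from the paper's code.

```python
# Normalize a regressed quaternion and convert it to a 3x3 rotation matrix.
import torch

def quat_to_rotmat(q: torch.Tensor) -> torch.Tensor:
    q = q / q.norm()                      # network outputs are unnormalized
    w, x, y, z = q
    return torch.stack([
        torch.stack([1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)]),
        torch.stack([2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)]),
        torch.stack([2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)]),
    ])

R = quat_to_rotmat(torch.tensor([0.9, 0.1, 0.3, 0.2]))
print(torch.allclose(R @ R.T, torch.eye(3), atol=1e-6))  # True: R is orthonormal
```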

Journal ArticleDOI
TL;DR: Electrical control of magnetism in a bilayer of CrI3 enables the realization of an electrically driven magnetic phase transition and the observation of the magneto-optical Kerr effect in 2D magnets.
Abstract: Controlling magnetism via electric fields addresses fundamental questions of magnetic phenomena and phase transitions1–3, and enables the development of electrically coupled spintronic devices, such as voltage-controlled magnetic memories with low operation energy4–6. Previous studies on dilute magnetic semiconductors such as (Ga,Mn)As and (In,Mn)Sb have demonstrated large modulations of the Curie temperatures and coercive fields by altering the magnetic anisotropy and exchange interaction2,4,7–9. Owing to their unique magnetic properties10–14, the recently reported two-dimensional magnets provide a new system for studying these features15–19. For instance, a bilayer of chromium triiodide (CrI3) behaves as a layered antiferromagnet with a magnetic field-driven metamagnetic transition15,16. Here, we demonstrate electrostatic gate control of magnetism in CrI3 bilayers, probed by magneto-optical Kerr effect (MOKE) microscopy. At fixed magnetic fields near the metamagnetic transition, we realize voltage-controlled switching between antiferromagnetic and ferromagnetic states. At zero magnetic field, we demonstrate a time-reversal pair of layered antiferromagnetic states that exhibit spin-layer locking, leading to a linear dependence of their MOKE signals on gate voltage with opposite slopes. Our results allow for the exploration of new magnetoelectric phenomena and van der Waals spintronics based on 2D materials.

Posted Content
TL;DR: OpenPose is released, the first open-source realtime system for multi-person 2D pose detection, including body, foot, hand, and facial keypoints, and the first combined body and foot keypoint detector, based on an internal annotated foot dataset.
Abstract: Realtime multi-person 2D pose estimation is a key component in enabling machines to have an understanding of people in images and videos. In this work, we present a realtime approach to detect the 2D pose of multiple people in an image. The proposed method uses a nonparametric representation, which we refer to as Part Affinity Fields (PAFs), to learn to associate body parts with individuals in the image. This bottom-up system achieves high accuracy and realtime performance, regardless of the number of people in the image. In previous work, PAFs and body part location estimation were refined simultaneously across training stages. We demonstrate that a PAF-only refinement rather than both PAF and body part location refinement results in a substantial increase in both runtime performance and accuracy. We also present the first combined body and foot keypoint detector, based on an internal annotated foot dataset that we have publicly released. We show that the combined detector not only reduces the inference time compared to running them sequentially, but also maintains the accuracy of each component individually. This work has culminated in the release of OpenPose, the first open-source realtime system for multi-person 2D pose detection, including body, foot, hand, and facial keypoints.
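A minimal numpy sketch of how a Part Affinity Field can be scored during the bottom-up association step: integrate the predicted vector field along the candidate limb and measure its alignment with the limb's direction. The synthetic field and sampling scheme are illustrative assumptions.

```python
# Score a candidate limb by sampling the PAF along the joining segment.
import numpy as np

def paf_score(paf, p1, p2, n_samples=10):
    """paf: (2, H, W) vector field; p1, p2: (x, y) candidate joints."""
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    d = p2 - p1
    norm = np.linalg.norm(d)
    if norm < 1e-6:
        return 0.0
    d /= norm
    score = 0.0
    for t in np.linspace(0.0, 1.0, n_samples):
        x, y = (p1 + t * (p2 - p1)).round().astype(int)
        score += paf[:, y, x] @ d          # alignment of field with the limb
    return score / n_samples

paf = np.zeros((2, 32, 32))
paf[0, 10, :] = 1.0                        # field pointing along +x on row y=10
print(paf_score(paf, (2, 10), (20, 10)))   # ~1.0: a well-aligned limb
```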

Journal ArticleDOI
29 Jun 2018-Science
TL;DR: In this paper, the authors examine barriers and opportunities associated with these difficult-to-decarbonize services and processes, including possible technological solutions and research and development priorities, and assess whether existing technologies could meet future demands for these services without net addition of CO2 to the atmosphere.
Abstract: Some energy services and industrial processes-such as long-distance freight transport, air travel, highly reliable electricity, and steel and cement manufacturing-are particularly difficult to provide without adding carbon dioxide (CO2) to the atmosphere. Rapidly growing demand for these services, combined with long lead times for technology development and long lifetimes of energy infrastructure, make decarbonization of these services both essential and urgent. We examine barriers and opportunities associated with these difficult-to-decarbonize services and processes, including possible technological solutions and research and development priorities. A range of existing technologies could meet future demands for these services and processes without net addition of CO2 to the atmosphere, but their use may depend on a combination of cost reductions via research and innovation, as well as coordinated deployment and integration of operations across currently discrete energy industries.

Proceedings Article
01 Feb 2018
TL;DR: In this paper, an end-to-end trainable model for image compression based on variational autoencoders is proposed, which incorporates a hyperprior to effectively capture spatial dependencies in the latent representation.
Abstract: We describe an end-to-end trainable model for image compression based on variational autoencoders. The model incorporates a hyperprior to effectively capture spatial dependencies in the latent representation. This hyperprior relates to side information, a concept universal to virtually all modern image codecs, but largely unexplored in image compression using artificial neural networks (ANNs). Unlike existing autoencoder compression methods, our model trains a complex prior jointly with the underlying autoencoder. We demonstrate that this model leads to state-of-the-art image compression when measuring visual quality using the popular MS-SSIM index, and yields rate-distortion performance surpassing published ANN-based methods when evaluated using a more traditional metric based on squared error (PSNR). Furthermore, we provide a qualitative comparison of models trained for different distortion metrics.
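A hedged sketch of the rate term such a model optimizes: the expected bits for a quantized latent under a Gaussian whose scale would come from the hyper-decoder. The shapes, constant scale, and λ weighting below are illustrative assumptions.

```python
# Rate term for a quantized latent under a Gaussian conditional, plus a
# simple rate-distortion objective.
import torch

def rate_bits(y, sigma, mu=0.0):
    """-log2 P(y), with P from integrating a Gaussian over quantization bins."""
    dist = torch.distributions.Normal(mu, sigma)
    p = dist.cdf(y + 0.5) - dist.cdf(y - 0.5)   # probability of the bin
    return -torch.log2(p.clamp_min(1e-9)).sum()

y = torch.randn(1, 192, 16, 16).round()         # quantized latent
sigma = torch.full_like(y, 1.5)                 # scale a hyper-decoder would predict
mse = torch.tensor(0.001)                       # distortion from the decoder
lam = 0.01
loss = rate_bits(y, sigma) + lam * mse * 255**2  # rate-distortion objective
print(loss.item())
```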

Journal ArticleDOI
TL;DR: It is demonstrated that Fe3GeTe2 (FGT), an exfoliable vdW magnet, exhibits robust 2D ferromagnetism with strong perpendicular anisotropy when thinned down to a monolayer.
Abstract: Discoveries of intrinsic two-dimensional (2D) ferromagnetism in van der Waals (vdW) crystals provide an interesting arena for studying fundamental 2D magnetism and devices that employ localized spins1–4. However, an exfoliable vdW material that exhibits intrinsic 2D itinerant magnetism remains elusive. Here we demonstrate that Fe3GeTe2 (FGT), an exfoliable vdW magnet, exhibits robust 2D ferromagnetism with strong perpendicular anisotropy when thinned down to a monolayer. Layer-number-dependent studies reveal a crossover from 3D to 2D Ising ferromagnetism for thicknesses less than 4 nm (five layers), accompanied by a fast drop of the Curie temperature (TC) from 207 K to 130 K in the monolayer. For FGT flakes thicker than ~15 nm, a distinct magnetic behaviour emerges in an intermediate temperature range, which we show is due to the formation of labyrinthine domain patterns. Our work introduces an atomically thin ferromagnetic metal that could be useful for the study of controllable 2D itinerant ferromagnetism and for engineering spintronic vdW heterostructures5. Metallic ferromagnetism is reported in an exfoliated monolayer of the van der Waals material Fe3GeTe2.

Proceedings ArticleDOI
27 Jun 2018
TL;DR: A novel deep learning framework, namely Long- and Short-term Time-series network (LSTNet), to address this open challenge of multivariate time series forecasting, using the Convolution Neural Network and the Recurrent Neural Network to extract short-term local dependency patterns among variables and to discover long-term patterns for time series trends.
Abstract: Multivariate time series forecasting is an important machine learning problem across many domains, including predictions of solar plant energy output, electricity consumption, and traffic congestion. Temporal data arising in these real-world applications often involve a mixture of long-term and short-term patterns, for which traditional approaches such as autoregressive models and Gaussian processes may fail. In this paper, we propose a novel deep learning framework, namely the Long- and Short-term Time-series network (LSTNet), to address this open challenge. LSTNet uses a convolutional neural network (CNN) and a recurrent neural network (RNN) to extract short-term local dependency patterns among variables and to discover long-term patterns in time series trends. Furthermore, we leverage a traditional autoregressive model to tackle the scale-insensitivity problem of the neural network model. In our evaluation on real-world data with complex mixtures of repetitive patterns, LSTNet achieved significant performance improvements over several state-of-the-art baseline methods. All the data and experiment code are available online.
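A condensed PyTorch sketch of the LSTNet decomposition described above: a 1D convolution for short-term local patterns, a GRU for longer-term dynamics, and a linear autoregressive path for scale sensitivity. Layer sizes are illustrative, and the paper's recurrent-skip component is omitted for brevity.

```python
# A compact LSTNet-style forecaster: CNN + GRU + linear autoregression.
import torch
import torch.nn as nn

class TinyLSTNet(nn.Module):
    def __init__(self, n_vars, hid=50, kernel=6, ar_window=7):
        super().__init__()
        self.ar_window = ar_window
        self.conv = nn.Conv1d(n_vars, hid, kernel)     # short-term patterns
        self.gru = nn.GRU(hid, hid, batch_first=True)  # long-term patterns
        self.fc = nn.Linear(hid, n_vars)
        self.ar = nn.Linear(ar_window, 1)              # per-variable linear AR

    def forward(self, x):                  # x: (batch, time, n_vars)
        c = torch.relu(self.conv(x.transpose(1, 2)))   # (batch, hid, time')
        _, h = self.gru(c.transpose(1, 2))             # h: (1, batch, hid)
        nonlinear = self.fc(h[-1])                     # (batch, n_vars)
        # autoregressive path over the last ar_window steps of each variable
        ar_in = x[:, -self.ar_window:, :].transpose(1, 2)  # (batch, n_vars, w)
        linear = self.ar(ar_in).squeeze(-1)            # (batch, n_vars)
        return nonlinear + linear

x = torch.randn(4, 24, 8)                  # 8 series, 24 time steps
print(TinyLSTNet(8)(x).shape)              # torch.Size([4, 8])
```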

Proceedings ArticleDOI
25 Sep 2018
TL;DR: HotpotQA as discussed by the authors is a dataset with 113k Wikipedia-based question-answer pairs with four key features: the questions require finding and reasoning over multiple supporting documents to answer; they are diverse and not constrained to any pre-existing knowledge bases or knowledge schemas; sentence-level supporting facts required for reasoning are provided; and a new type of factoid comparison question tests QA systems' ability to extract relevant facts and perform necessary comparison.
Abstract: Existing question answering (QA) datasets fail to train QA systems to perform complex reasoning and provide explanations for answers. We introduce HotpotQA, a new dataset with 113k Wikipedia-based question-answer pairs with four key features: (1) the questions require finding and reasoning over multiple supporting documents to answer; (2) the questions are diverse and not constrained to any pre-existing knowledge bases or knowledge schemas; (3) we provide sentence-level supporting facts required for reasoning, allowing QA systems to reason with strong supervision and explain the predictions; (4) we offer a new type of factoid comparison questions to test QA systems' ability to extract relevant facts and perform necessary comparison. We show that HotpotQA is challenging for the latest QA systems, and the supporting facts enable models to improve performance and make explainable predictions.

Proceedings ArticleDOI
18 Jun 2018
TL;DR: This work introduces new sparse convolutional operations that are designed to process spatially-sparse data more efficiently, and uses them to develop submanifold sparse convolutional networks, which outperform all prior state-of-the-art models on two tasks involving semantic segmentation of 3D point clouds.
Abstract: Convolutional networks are the de-facto standard for analyzing spatio-temporal data such as images, videos, and 3D shapes. Whilst some of this data is naturally dense (e.g., photos), many other data sources are inherently sparse. Examples include 3D point clouds that were obtained using a LiDAR scanner or RGB-D camera. Standard "dense" implementations of convolutional networks are very inefficient when applied on such sparse data. We introduce new sparse convolutional operations that are designed to process spatially-sparse data more efficiently, and use them to develop spatially-sparse convolutional networks. We demonstrate the strong performance of the resulting models, called submanifold sparse convolutional networks (SS-CNs), on two tasks involving semantic segmentation of 3D point clouds. In particular, our models outperform all prior state-of-the-art on the test set of a recent semantic segmentation competition.
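A naive, dictionary-based sketch of the submanifold idea: convolve only at active sites and emit outputs only at those same sites, so sparsity never dilates from layer to layer. The data layout and sizes are illustrative; real implementations use hash tables and custom GPU kernels.

```python
# Naive submanifold 2D convolution over a sparse set of active sites.
import numpy as np

def submanifold_conv2d(active, weights):
    """active: {(y, x): feature vector}; weights: {(dy, dx): (C_out, C_in)}."""
    out = {}
    for (y, x) in active:                      # outputs only at active sites
        acc = None
        for (dy, dx), w in weights.items():
            nb = active.get((y + dy, x + dx))
            if nb is not None:                 # skip empty neighbours entirely
                acc = (0 if acc is None else acc) + w @ nb
        out[(y, x)] = acc
    return out

rng = np.random.default_rng(0)
weights = {(dy, dx): rng.normal(size=(4, 3))   # 3x3 kernel, 3 -> 4 channels
           for dy in (-1, 0, 1) for dx in (-1, 0, 1)}
active = {(0, 0): rng.normal(size=3), (0, 1): rng.normal(size=3)}
print({k: v.shape for k, v in submanifold_conv2d(active, weights).items()})
```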

Journal ArticleDOI
15 Jun 2018-Science
TL;DR: This work reveals the possibility of pushing magnetic information storage to the atomically thin limit and highlights CrI3 as a superlative magnetic tunnel barrier for vdW heterostructure spintronic devices.
Abstract: Magnetic multilayer devices that exploit magnetoresistance are the backbone of magnetic sensing and data storage technologies. Here, we report multiple-spin-filter magnetic tunnel junctions (sf-MTJs) based on van der Waals (vdW) heterostructures in which atomically thin chromium triiodide (CrI3) acts as a spin-filter tunnel barrier sandwiched between graphene contacts. We demonstrate tunneling magnetoresistance that is drastically enhanced with increasing CrI3 layer thickness, reaching a record 19,000% for magnetic multilayer structures using four-layer sf-MTJs at low temperatures. Using magnetic circular dichroism measurements, we attribute these effects to the intrinsic layer-by-layer antiferromagnetic ordering of the atomically thin CrI3. Our work reveals the possibility to push magnetic information storage to the atomically thin limit and highlights CrI3 as a superlative magnetic tunnel barrier for vdW heterostructure spintronic devices.

Posted Content
TL;DR: In this article, a data-dependent latent generative representation of model parameters is learned and gradient-based meta-learning is performed in a low-dimensional latent space for few-shot learning.
Abstract: Gradient-based meta-learning techniques are both widely applicable and proficient at solving challenging few-shot learning and fast adaptation problems. However, they have practical difficulties when operating on high-dimensional parameter spaces in extreme low-data regimes. We show that it is possible to bypass these limitations by learning a data-dependent latent generative representation of model parameters, and performing gradient-based meta-learning in this low-dimensional latent space. The resulting approach, latent embedding optimization (LEO), decouples the gradient-based adaptation procedure from the underlying high-dimensional space of model parameters. Our evaluation shows that LEO can achieve state-of-the-art performance on the competitive miniImageNet and tieredImageNet few-shot classification tasks. Further analysis indicates LEO is able to capture uncertainty in the data, and can perform adaptation more effectively by optimizing in latent space.
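A compact sketch of the core move: adapt a low-dimensional latent code by gradient steps on the task loss, decoding the code into model parameters (here, a linear classifier). The decoder, dimensions, and toy task are illustrative assumptions, not the paper's architecture.

```python
# Gradient-based adaptation in a latent space that decodes to parameters.
import torch

latent_dim, n_feat, n_cls = 8, 16, 5
decoder = torch.nn.Linear(latent_dim, n_feat * n_cls)  # z -> classifier weights

def adapt(z, x, y, steps=3, lr=0.5):
    for _ in range(steps):
        w = decoder(z).view(n_cls, n_feat)             # decode parameters
        loss = torch.nn.functional.cross_entropy(x @ w.t(), y)
        (grad,) = torch.autograd.grad(loss, z, create_graph=True)
        z = z - lr * grad                              # step in latent space
    return z

x = torch.randn(10, n_feat)                            # few-shot support set
y = torch.randint(0, n_cls, (10,))
z0 = torch.zeros(latent_dim, requires_grad=True)       # would come from an encoder
print(adapt(z0, x, y).shape)                           # torch.Size([8])
```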

Journal ArticleDOI
31 Jan 2018
TL;DR: These 10 grand challenges are areas in which major breakthroughs, research advances, and/or socioeconomic impacts may occur in the next 5 to 10 years; the first seven represent underpinning technologies that have a wider impact on all application areas of robotics.
Abstract: One of the ambitions of Science Robotics is to deeply root robotics research in science while developing novel robotic platforms that will enable new scientific discoveries. Of our 10 grand challenges, the first 7 represent underpinning technologies that have a wider impact on all application areas of robotics. For the next two challenges, we have included social robotics and medical robotics as application-specific areas of development to highlight the substantial societal and health impacts that they will bring. Finally, the last challenge is related to responsible innovation and how ethics and security should be carefully considered as we develop the technology further.

Posted Content
TL;DR: LEAF is proposed, a modular benchmarking framework for learning in federated settings that includes a suite of open-source federated datasets, a rigorous evaluation framework, and a set of reference implementations, all geared towards capturing the obstacles and intricacies of practical federated environments.
Abstract: Modern federated networks, such as those comprised of wearable devices, mobile phones, or autonomous vehicles, generate massive amounts of data each day. This wealth of data can help to learn models that can improve the user experience on each device. However, the scale and heterogeneity of federated data presents new challenges in research areas such as federated learning, meta-learning, and multi-task learning. As the machine learning community begins to tackle these challenges, we are at a critical time to ensure that developments made in these areas are grounded with realistic benchmarks. To this end, we propose LEAF, a modular benchmarking framework for learning in federated settings. LEAF includes a suite of open-source federated datasets, a rigorous evaluation framework, and a set of reference implementations, all geared towards capturing the obstacles and intricacies of practical federated environments.

Book ChapterDOI
08 Sep 2018
TL;DR: The proposed graph representation achieves state-of-the-art results on the Charades and Something-Something datasets and obtains a huge gain when the model is applied in complex environments.
Abstract: How do humans recognize the action "opening a book"? We argue that there are two important cues: modeling temporal shape dynamics and modeling functional relationships between humans and objects. In this paper, we propose to represent videos as space-time region graphs which capture these two important cues. Our graph nodes are defined by the object region proposals from different frames in a long range video. These nodes are connected by two types of relations: (i) similarity relations capturing the long range dependencies between correlated objects and (ii) spatial-temporal relations capturing the interactions between nearby objects. We perform reasoning on this graph representation via Graph Convolutional Networks. We achieve state-of-the-art results on the Charades and Something-Something datasets. Especially for Charades, with its complex environments, our model obtains a large 4.4% gain.
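Reasoning over the region graph uses graph convolutions; below is a minimal sketch of one such layer, where each node updates its feature with a normalized weighted sum over its neighbours. The 3-node graph is an illustrative assumption.

```python
# One graph-convolution layer over node (region) features.
import torch

def gcn_layer(A, X, W):
    """A: (N, N) adjacency with self-loops; X: (N, d_in); W: (d_in, d_out)."""
    deg = A.sum(dim=1, keepdim=True)
    return torch.relu((A / deg) @ X @ W)  # row-normalized message passing

N, d_in, d_out = 3, 16, 8
A = torch.eye(N)
A[0, 1] = A[1, 0] = 1.0                   # e.g., a similarity relation
X = torch.randn(N, d_in)                  # node features from region proposals
W = torch.randn(d_in, d_out)
print(gcn_layer(A, X, W).shape)           # torch.Size([3, 8])
```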

Proceedings ArticleDOI
06 Mar 2018
TL;DR: The authors showed that a simple text categorization model can correctly classify the hypothesis alone in about 67% of SNLI and 53% of MultiNLI, and that specific linguistic phenomena such as negation and vagueness are highly correlated with certain inference classes.
Abstract: Large-scale datasets for natural language inference are created by presenting crowd workers with a sentence (premise), and asking them to generate three new sentences (hypotheses) that it entails, contradicts, or is logically neutral with respect to. We show that, in a significant portion of such data, this protocol leaves clues that make it possible to identify the label by looking only at the hypothesis, without observing the premise. Specifically, we show that a simple text categorization model can correctly classify the hypothesis alone in about 67% of SNLI (Bowman et al., 2015) and 53% of MultiNLI (Williams et al., 2017). Our analysis reveals that specific linguistic phenomena such as negation and vagueness are highly correlated with certain inference classes. Our findings suggest that the success of natural language inference models to date has been overestimated, and that the task remains a hard open problem.
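The paper's probe is easy to reproduce in spirit: train a bag-of-words classifier on hypotheses alone and check how far above chance it lands. The tiny inline dataset below is an illustrative stand-in for SNLI/MultiNLI.

```python
# A hypothesis-only baseline: the premise is never seen by the model.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

hypotheses = [
    "A man is sleeping.",          # artifacts like negation and universals
    "Nobody is outside.",          # tend to mark contradiction, while
    "A person is outdoors.",       # generalizations tend to mark entailment
    "The man is not eating.",
]
labels = ["contradiction", "contradiction", "entailment", "contradiction"]

clf = make_pipeline(CountVectorizer(), LogisticRegression())
clf.fit(hypotheses, labels)                       # hypothesis-only training
print(clf.predict(["The woman is not walking."]))
```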

Proceedings ArticleDOI
14 Dec 2018
TL;DR: FoldingNet as discussed by the authors proposes an end-to-end deep auto-encoder to address unsupervised learning challenges on point clouds, where a folding-based decoder deforms a canonical 2D grid onto the underlying 3D object surface of a point cloud.
Abstract: Recent deep networks that directly handle points in a point set, e.g., PointNet, have been state-of-the-art for supervised learning tasks on point clouds such as classification and segmentation. In this work, a novel end-to-end deep auto-encoder is proposed to address unsupervised learning challenges on point clouds. On the encoder side, a graph-based enhancement is enforced to promote local structures on top of PointNet. Then, a novel folding-based decoder deforms a canonical 2D grid onto the underlying 3D object surface of a point cloud, achieving low reconstruction errors even for objects with delicate structures. The proposed decoder only uses about 7% parameters of a decoder with fully-connected neural networks, yet leads to a more discriminative representation that achieves higher linear SVM classification accuracy than the benchmark. In addition, the proposed decoder structure is shown, in theory, to be a generic architecture that is able to reconstruct an arbitrary point cloud from a 2D grid. Our code is available at http://www.merl.com/research/license#FoldingNet
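A minimal sketch of the folding-based decoder described above: each point of a canonical 2D grid is concatenated with the codeword and passed through a shared MLP that "folds" the grid onto a 3D surface. Sizes are illustrative, and the paper's second folding stage is omitted for brevity.

```python
# Folding a canonical 2D grid into a 3D point cloud, conditioned on a codeword.
import torch
import torch.nn as nn

code_dim, grid_n = 512, 45                       # 45x45 grid = 2025 points
fold = nn.Sequential(
    nn.Linear(code_dim + 2, 256), nn.ReLU(),
    nn.Linear(256, 3),                           # grid point -> 3D point
)

u = torch.linspace(-1, 1, grid_n)
grid = torch.cartesian_prod(u, u)                # (grid_n^2, 2) canonical grid
codeword = torch.randn(code_dim)                 # from the graph-enhanced encoder
inp = torch.cat([codeword.expand(grid.shape[0], -1), grid], dim=1)
points = fold(inp)                               # reconstructed point cloud
print(points.shape)                              # torch.Size([2025, 3])
```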