Showing papers by "Amazon.com" published in 2019

Proceedings Article•DOI•

Meta-Learning With Differentiable Convex Optimization

Kwonjoon Lee¹, Subhransu Maji², Avinash Ravichandran³, Stefano Soatto³•Institutions (3)

University of California, San Diego¹, University of Massachusetts Amherst², Amazon.com³

15 Jun 2019

TL;DR: The objective is to learn feature embeddings that generalize well under a linear classification rule for novel categories and this work exploits two properties of linear classifiers: implicit differentiation of the optimality conditions of the convex problem and the dual formulation of the optimization problem.

Abstract: Many meta-learning approaches for few-shot learning rely on simple base learners such as nearest-neighbor classifiers. However, even in the few-shot regime, discriminatively trained linear predictors can offer better generalization. We propose to use these predictors as base learners to learn representations for few-shot learning and show they offer better tradeoffs between feature size and performance across a range of few-shot recognition benchmarks. Our objective is to learn feature embeddings that generalize well under a linear classification rule for novel categories. To efficiently solve the objective, we exploit two properties of linear classifiers: implicit differentiation of the optimality conditions of the convex problem and the dual formulation of the optimization problem. This allows us to use high-dimensional embeddings with improved generalization at a modest increase in computational overhead. Our approach, named MetaOptNet, achieves state-of-the-art performance on miniImageNet, tieredImageNet, CIFAR-FS, and FC100 few-shot learning benchmarks.
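The dual formulation mentioned in the abstract can be illustrated with ridge regression, used here only as a simple stand-in for a linear base learner (an assumption of this sketch; the abstract does not name the solver). With n support examples and d-dimensional embeddings, the primal solves a d x d system while the dual solves an n x n one, which is the cheaper route in the few-shot regime where n is much smaller than d. All function names below are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def ridge_primal(X, y, lam):
    """w = (X^T X + lam I)^-1 X^T y, a d x d linear system."""
    n, d = len(X), len(X[0])
    A = [[sum(X[k][i] * X[k][j] for k in range(n)) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(X[k][i] * y[k] for k in range(n)) for i in range(d)]
    return solve(A, b)

def ridge_dual(X, y, lam):
    """alpha = (X X^T + lam I)^-1 y, then w = X^T alpha, an n x n linear system."""
    n, d = len(X), len(X[0])
    K = [[sum(X[i][k] * X[j][k] for k in range(d)) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, y)
    return [sum(X[k][i] * alpha[k] for k in range(n)) for i in range(d)]

# Two support examples with four-dimensional embeddings (n = 2 < d = 4).
X = [[1.0, 0.0, 2.0, -1.0], [0.5, 1.0, 0.0, 3.0]]
y = [1.0, -1.0]
w_primal = ridge_primal(X, y, lam=0.1)
w_dual = ridge_dual(X, y, lam=0.1)
```

Both routes return the same weight vector up to rounding; only the size of the linear system differs.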

1,084 citations

Proceedings Article•DOI•

Bag of Tricks for Image Classification with Convolutional Neural Networks

Tong He¹, Zhi Zhang¹, Hang Zhang¹, Zhongyue Zhang¹, Junyuan Xie¹, Mu Li¹•Institutions (1)

Amazon.com¹

01 Jun 2019

TL;DR: This article examined a collection of such refinements and empirically evaluated their impact on the final model accuracy through ablation study, and showed that by combining these refinements together, they are able to improve various CNN models significantly.

Abstract: Much of the recent progress made in image classification research can be credited to training procedure refinements, such as changes in data augmentations and optimization methods. In the literature, however, most refinements are either briefly mentioned as implementation details or only visible in source code. In this paper, we will examine a collection of such refinements and empirically evaluate their impact on the final model accuracy through ablation study. We will show that, by combining these refinements together, we are able to improve various CNN models significantly. For example, we raise ResNet-50's top-1 validation accuracy from 75.3% to 79.29% on ImageNet. We will also demonstrate that improvement on image classification accuracy leads to better transfer learning performance in other application domains such as object detection and semantic segmentation.

980 citations

Journal Article•DOI•

Temporal Segment Networks for Action Recognition in Videos

Limin Wang¹, Yuanjun Xiong², Zhe Wang³, Yu Qiao⁴, Dahua Lin⁵, Xiaoou Tang⁵, Luc Van Gool⁶•Institutions (6)

Nanjing University¹, Amazon.com², University of California, Irvine³, Chinese Academy of Sciences⁴, The Chinese University of Hong Kong⁵, ETH Zurich⁶

01 Nov 2019-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: Temporal Segment Networks (TSN) model long-range temporal structure with a new segment-based sampling and aggregation scheme, which enables the framework to efficiently learn action models by using the whole video.

Abstract: We present a general and flexible video-level framework for learning action models in videos. This method, called temporal segment network (TSN), aims to model long-range temporal structure with a new segment-based sampling and aggregation scheme. This unique design enables the TSN framework to efficiently learn action models by using the whole video. The learned models could be easily deployed for action recognition in both trimmed and untrimmed videos with simple average pooling and multi-scale temporal window integration, respectively. We also study a series of good practices for the implementation of the TSN framework given limited training samples. Our approach obtains state-of-the-art performance on five challenging action recognition benchmarks: HMDB51 (71.0 percent), UCF101 (94.9 percent), THUMOS14 (80.1 percent), ActivityNet v1.2 (89.6 percent), and Kinetics400 (75.7 percent). In addition, using the proposed RGB difference as a simple motion representation, our method can still achieve competitive accuracy on UCF101 (91.0 percent) while running at 340 FPS. Furthermore, based on the proposed TSN framework, we won the video classification track at the ActivityNet challenge 2016 among 24 teams.
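The segment-based sampling and average-pooling aggregation described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the per-snippet scores are made-up numbers standing in for a network's output, and the sketch assumes the video has at least as many frames as segments.

```python
import random

def sample_segment_indices(num_frames, num_segments, rng):
    """Pick one frame index from each of num_segments equal-length segments."""
    seg_len = num_frames / num_segments
    indices = []
    for k in range(num_segments):
        start = int(k * seg_len)
        end = max(start + 1, int((k + 1) * seg_len))
        indices.append(rng.randrange(start, end))
    return indices

def average_consensus(snippet_scores):
    """Average per-snippet class scores into one video-level score vector."""
    n = len(snippet_scores)
    return [sum(col) / n for col in zip(*snippet_scores)]

rng = random.Random(0)
idx = sample_segment_indices(num_frames=90, num_segments=3, rng=rng)

# Hypothetical two-class scores for the three sampled snippets.
video_scores = average_consensus([[0.2, 0.8], [0.4, 0.6], [0.6, 0.4]])
```

Because one snippet is drawn from every segment, the sampled frames span the whole video regardless of its length, while the cost per video stays fixed at `num_segments` forward passes.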

562 citations

Proceedings Article•DOI•

OCGAN: One-Class Novelty Detection Using GANs With Constrained Latent Representations

Pramuditha Perera¹, Ramesh Nallapati², Bing Xiang²•Institutions (2)

Johns Hopkins University¹, Amazon.com²

01 Jan 2019

TL;DR: OCGAN learns latent representations with a de-noising auto-encoder network, explicitly constrains the latent space to exclusively represent the given class, and uses a gradient-descent based sampling technique to generate potential out-of-class examples.

Abstract: We present a novel model called OCGAN for the classical problem of one-class novelty detection, where, given a set of examples from a particular class, the goal is to determine if a query example is from the same class. Our solution is based on learning latent representations of in-class examples using a de-noising auto-encoder network. The key contribution of our work is our proposal to explicitly constrain the latent space to exclusively represent the given class. In order to accomplish this goal, firstly, we force the latent space to have bounded support by introducing a tanh activation in the encoder's output layer. Secondly, using a discriminator in the latent space that is trained adversarially, we ensure that encoded representations of in-class examples resemble uniform random samples drawn from the same bounded space. Thirdly, using a second adversarial discriminator in the input space, we ensure all randomly drawn latent samples generate examples that look real. Finally, we introduce a gradient-descent based sampling technique that explores points in the latent space that generate potential out-of-class examples, which are fed back to the network to further train it to generate in-class examples from those points. The effectiveness of the proposed method is measured across four publicly available datasets using two one-class novelty detection protocols where we achieve state-of-the-art results.

460 citations

Proceedings Article•DOI•

Region Proposal by Guided Anchoring

Jiaqi Wang¹, Kai Chen¹, Shuo Yang², Chen Change Loy³, Dahua Lin¹•Institutions (3)

The Chinese University of Hong Kong¹, Amazon.com², Nanyang Technological University³

10 Jan 2019

TL;DR: This paper presents an alternative scheme, named Guided Anchoring, which leverages semantic features to guide the anchoring, and jointly predicts the locations where the center of objects of interest are likely to exist as well as the scales and aspect ratios at different locations.

Abstract: Region anchors are the cornerstone of modern object detection techniques. State-of-the-art detectors mostly rely on a dense anchoring scheme, where anchors are sampled uniformly over the spatial domain with a predefined set of scales and aspect ratios. In this paper, we revisit this foundational stage. Our study shows that it can be done much more effectively and efficiently. Specifically, we present an alternative scheme, named Guided Anchoring, which leverages semantic features to guide the anchoring. The proposed method jointly predicts the locations where the center of objects of interest are likely to exist as well as the scales and aspect ratios at different locations. On top of predicted anchor shapes, we mitigate the feature inconsistency with a feature adaption module. We also study the use of high-quality proposals to improve detection performance. The anchoring scheme can be seamlessly integrated into proposal methods and detectors. With Guided Anchoring, we achieve 9.1% higher recall on MS COCO with 90% fewer anchors than the RPN baseline. We also adopt Guided Anchoring in Fast R-CNN, Faster R-CNN and RetinaNet, respectively improving the detection mAP by 2.2%, 2.7% and 1.2%. Code is available at https://github.com/open-mmlab/mmdetection.
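The contrast between dense anchoring and predicted anchors can be sketched as below. This is only an illustration under stated assumptions: the location probabilities and shapes are hand-written stand-ins for network predictions, and keeping a single anchor wherever the probability exceeds a fixed threshold is a simplification chosen for this sketch, not a detail given in the abstract. Function names are hypothetical.

```python
def dense_anchors(height, width, shapes):
    """Place every predefined (w, h) shape at every spatial location."""
    return [(x, y, w, h)
            for y in range(height)
            for x in range(width)
            for (w, h) in shapes]

def guided_anchors(loc_prob, pred_shape, threshold):
    """Keep one anchor per location whose predicted probability exceeds threshold."""
    anchors = []
    for y, row in enumerate(loc_prob):
        for x, p in enumerate(row):
            if p > threshold:
                w, h = pred_shape[y][x]
                anchors.append((x, y, w, h))
    return anchors

# Hypothetical 2 x 2 feature map: object-center probabilities and predicted shapes.
loc_prob = [[0.05, 0.90],
            [0.10, 0.02]]
pred_shape = [[(8, 8), (32, 64)],
              [(16, 16), (8, 8)]]

dense = dense_anchors(2, 2, shapes=[(8, 8), (16, 16), (32, 32)])
guided = guided_anchors(loc_prob, pred_shape, threshold=0.5)
```

On this toy map the dense scheme yields twelve anchors, while the guided scheme keeps a single anchor whose shape is taken from the prediction at that location.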

458 citations

Journal Article•DOI•

MeshCNN: a network with an edge

Rana Hanocka¹, Amir Hertz¹, Noa Fish¹, Raja Giryes¹, Shachar Fleishman², Daniel Cohen-Or¹•Institutions (2)

Tel Aviv University¹, Amazon.com²

12 Jul 2019-ACM Transactions on Graphics

TL;DR: This paper utilizes the unique properties of the mesh for a direct analysis of 3D shapes using MeshCNN, a convolutional neural network designed specifically for triangular meshes, and demonstrates the effectiveness of MeshCNN on various learning tasks applied to 3D meshes.

Abstract: Polygonal meshes provide an efficient representation for 3D shapes. They explicitly capture both shape surface and topology, and leverage non-uniformity to represent large flat regions as well as sharp, intricate features. This non-uniformity and irregularity, however, inhibits mesh analysis efforts using neural networks that combine convolution and pooling operations. In this paper, we utilize the unique properties of the mesh for a direct analysis of 3D shapes using MeshCNN, a convolutional neural network designed specifically for triangular meshes. Analogous to classic CNNs, MeshCNN combines specialized convolution and pooling layers that operate on the mesh edges, by leveraging their intrinsic geodesic connections. Convolutions are applied on edges and the four edges of their incident triangles, and pooling is applied via an edge collapse operation that retains surface topology, thereby generating new mesh connectivity for the subsequent convolutions. MeshCNN learns which edges to collapse, thus forming a task-driven process where the network exposes and expands the important features while discarding the redundant ones. We demonstrate the effectiveness of MeshCNN on various learning tasks applied to 3D meshes.

414 citations

Posted Content•

Algorithm Unrolling: Interpretable, Efficient Deep Learning for Signal and Image Processing

Vishal Monga¹, Yuelong Li², Yonina C. Eldar³•Institutions (3)

Pennsylvania State University¹, Amazon.com², Weizmann Institute of Science³

22 Dec 2019-arXiv: Image and Video Processing

TL;DR: The increasing popularity of unrolled deep networks is due, in part, to their potential in developing efficient, high-performance (yet interpretable) network architectures from reasonably sized training sets.

Abstract: Deep neural networks provide unprecedented performance gains in many real world problems in signal and image processing. Despite these gains, future development and practical deployment of deep networks is hindered by their blackbox nature, i.e., lack of interpretability, and by the need for very large training sets. An emerging technique called algorithm unrolling or unfolding offers promise in eliminating these issues by providing a concrete and systematic connection between iterative algorithms that are used widely in signal processing and deep neural networks. Unrolling methods were first proposed to develop fast neural network approximations for sparse coding. More recently, this direction has attracted enormous attention and is rapidly growing both in theoretic investigations and practical applications. The growing popularity of unrolled deep networks is due in part to their potential in developing efficient, high-performance and yet interpretable network architectures from reasonable size training sets. In this article, we review algorithm unrolling for signal and image processing. We extensively cover popular techniques for algorithm unrolling in various domains of signal and image processing including imaging, vision and recognition, and speech processing. By reviewing previous works, we reveal the connections between iterative algorithms and neural networks and present recent theoretical results. Finally, we provide a discussion on current limitations of unrolling and suggest possible future research directions.
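The link between an iterative algorithm and a layered network can be made concrete with a one-dimensional sparse-coding toy problem. The sketch below unrolls the iterative shrinkage-thresholding algorithm (ISTA), chosen here as an illustrative example (an assumption of this sketch), so that each iteration becomes one "layer" with its own step size and threshold. In a learned network those per-layer values would be trained; here they are fixed by hand, and all names are hypothetical.

```python
def soft_threshold(z, t):
    """Proximal operator of t * |x|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def unrolled_ista(y, a, layers):
    """Run one ISTA iteration per layer for min_x 0.5*(y - a*x)**2 + lam*|x|.

    Each layer is a (step, lam) pair; training would adjust these values,
    whereas this sketch keeps them fixed and tied across layers.
    """
    x = 0.0
    for step, lam in layers:
        grad = a * (a * x - y)  # gradient of the quadratic data term
        x = soft_threshold(x - step * grad, step * lam)
    return x

a, y, lam = 2.0, 3.0, 1.0
layers = [(0.1, lam)] * 50  # 50 layers with identical, untrained parameters
x_hat = unrolled_ista(y, a, layers)

# Closed-form minimizer of the same 1-D problem, used only as a reference.
x_star = soft_threshold(a * y, lam) / a ** 2
```

With fixed parameters the unrolled network simply reproduces the iterative algorithm; the appeal of unrolling is that a small number of layers with trained parameters can approximate the same solution at lower cost.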

398 citations

Proceedings Article•DOI•

Variational Information Distillation for Knowledge Transfer

Sungsoo Ahn¹, Shell Xu Hu, Andreas Damianou², Neil D. Lawrence², Zhenwen Dai²•Institutions (2)

KAIST¹, Amazon.com²

15 Jun 2019

TL;DR: In this article, the authors propose an information-theoretic framework for knowledge transfer which formulates knowledge transfer as maximizing the mutual information between the teacher and the student networks, and compare their method with existing knowledge transfer methods on both knowledge distillation and transfer learning tasks and show that their method consistently outperforms existing methods.

Abstract: Transferring knowledge from a teacher neural network pretrained on the same or a similar task to a student neural network can significantly improve the performance of the student neural network. Existing knowledge transfer approaches match the activations or the corresponding hand-crafted features of the teacher and the student networks. We propose an information-theoretic framework for knowledge transfer which formulates knowledge transfer as maximizing the mutual information between the teacher and the student networks. We compare our method with existing knowledge transfer methods on both knowledge distillation and transfer learning tasks and show that our method consistently outperforms existing methods. We further demonstrate the strength of our method on knowledge transfer across heterogeneous network architectures by transferring knowledge from a convolutional neural network (CNN) to a multi-layer perceptron (MLP) on CIFAR-10. The resulting MLP significantly outperforms the state-of-the-art methods and achieves similar performance to the CNN with a single convolutional layer.
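Mutual information between teacher and student features is generally intractable, and one standard way to maximize it is through a variational lower bound: I(t; s) >= H(t) + E[log q(t | s)], where q is any tractable conditional distribution. Since H(t) does not depend on the student, maximizing the bound amounts to minimizing the negative log-likelihood of the teacher features under q. The sketch below assumes a Gaussian q with a single shared scale, which is a choice made for this illustration and is not stated in the abstract; the function name and numbers are hypothetical.

```python
import math

def gaussian_nll(teacher_feats, predicted_means, sigma):
    """Average of -log q(t | s) for q = Normal(mean, sigma^2), constant term dropped."""
    total = 0.0
    for t, mu in zip(teacher_feats, predicted_means):
        total += math.log(sigma) + (t - mu) ** 2 / (2.0 * sigma ** 2)
    return total / len(teacher_feats)

# Hypothetical teacher activations and two candidate student predictions of them.
teacher = [1.0, -0.5, 2.0]
close_student = [0.9, -0.4, 2.1]
far_student = [0.0, 0.0, 0.0]

loss_close = gaussian_nll(teacher, close_student, sigma=1.0)
loss_far = gaussian_nll(teacher, far_student, sigma=1.0)
```

A student whose predictions track the teacher features gets the lower loss, which corresponds to a tighter lower bound on the mutual information.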

298 citations

Proceedings Article•

Meta-Learning with Implicit Gradients

Aravind Rajeswaran¹, Chelsea Finn², Sham M. Kakade³, Sergey Levine⁴•Institutions (4)

University of Washington¹, Massachusetts Institute of Technology², Amazon.com³, University of California, Berkeley⁴

01 Jan 2019

TL;DR: Theoretically, it is proved that implicit MAML can compute accurate meta-gradients with a memory footprint that is, up to small constant factors, no more than that which is required to compute a single inner loop gradient and at no overall increase in the total computational cost.

Abstract: A core capability of intelligent systems is the ability to quickly learn new tasks by drawing on prior experience. Gradient (or optimization) based meta-learning has recently emerged as an effective approach for few-shot learning. In this formulation, meta-parameters are learned in the outer loop, while task-specific models are learned in the inner-loop, by using only a small amount of data from the current task. A key challenge in scaling these approaches is the need to differentiate through the inner loop learning process, which can impose considerable computational and memory burdens. By drawing upon implicit differentiation, we develop the implicit MAML algorithm, which depends only on the solution to the inner level optimization and not the path taken by the inner loop optimizer. This effectively decouples the meta-gradient computation from the choice of inner loop optimizer. As a result, our approach is agnostic to the choice of inner loop optimizer and can gracefully handle many gradient steps without vanishing gradients or memory constraints. Theoretically, we prove that implicit MAML can compute accurate meta-gradients with a memory footprint that is, up to small constant factors, no more than that which is required to compute a single inner loop gradient and at no overall increase in the total computational cost. Experimentally, we show that these benefits of implicit MAML translate into empirical gains on few-shot image recognition benchmarks.
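The idea that the meta-gradient depends only on the inner solution, not on the optimizer's path, can be checked on a scalar toy problem. The sketch below assumes the inner objective is a task loss plus a proximal term (lam/2)*(phi - theta)^2 tying the task parameters to the meta-parameters; that regularized form and every name and number here are assumptions of this illustration. Under that form, implicit differentiation of the inner optimality condition gives d phi*/d theta = 1 / (1 + H / lam), where H is the second derivative of the task loss at the solution.

```python
def inner_solution(theta, a, c, lam):
    """Minimizer over phi of 0.5*a*(phi - c)**2 + 0.5*lam*(phi - theta)**2."""
    return (a * c + lam * theta) / (a + lam)

def outer_loss(phi, target):
    """Test loss evaluated at the adapted parameters."""
    return 0.5 * (phi - target) ** 2

def implicit_meta_gradient(theta, a, c, lam, target):
    """d outer_loss / d theta computed from the inner solution alone."""
    phi = inner_solution(theta, a, c, lam)
    outer_grad = phi - target  # gradient of the outer loss at phi
    hessian = a                # second derivative of the inner task loss
    return outer_grad / (1.0 + hessian / lam)

theta, a, c, lam, target = 0.5, 2.0, 1.0, 4.0, 3.0
g_implicit = implicit_meta_gradient(theta, a, c, lam, target)

# Central finite difference through the whole inner solve, for comparison.
eps = 1e-6
g_numeric = (outer_loss(inner_solution(theta + eps, a, c, lam), target)
             - outer_loss(inner_solution(theta - eps, a, c, lam), target)) / (2 * eps)
```

The two gradients agree, even though the implicit version never differentiates through the steps of an inner optimizer. In higher dimensions the scalar division becomes a linear solve against the matrix I + H / lam.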

280 citations

Proceedings Article•DOI•

Co-Occurrent Features in Semantic Segmentation

Hang Zhang¹, Han Zhang², Chenguang Wang¹, Junyuan Xie¹•Institutions (2)

Amazon.com¹, Google²

15 Jun 2019

TL;DR: This paper builds an Aggregated Co-occurrent Feature (ACF) Module, which learns a fine-grained spatial invariant representation to capture co-occurrent context information across the scene and significantly improves the segmentation results using FCN.

Abstract: Recent work has achieved great success in utilizing global contextual information for semantic segmentation, including increasing the receptive field and aggregating pyramid feature representations. In this paper, we go beyond global context and explore the fine-grained representation using co-occurrent features by introducing the Co-occurrent Feature Model, which predicts the distribution of co-occurrent features for a given target. To leverage the semantic context in the co-occurrent features, we build an Aggregated Co-occurrent Feature (ACF) Module by aggregating the probability of the co-occurrent feature with the co-occurrent context. The ACF Module learns a fine-grained spatial invariant representation to capture co-occurrent context information across the scene. Our approach significantly improves the segmentation results using FCN and achieves superior performance of 54.0% mIoU on Pascal Context, 87.2% mIoU on Pascal VOC 2012 and 44.89% mIoU on ADE20K datasets. The source code and complete system will be publicly available upon publication.

273 citations

Journal Article•DOI•

Compositional response of Amazon forests to climate change

Adriane Esquivel-Muelbert¹, Timothy R. Baker¹, Kyle G. Dexter², Simon L. Lewis³, Simon L. Lewis¹, Roel J. W. Brienen¹, Ted R. Feldpausch⁴, Jon Lloyd⁵, Abel Monteagudo-Mendoza⁶, Luzmila Arroyo⁷, Esteban Álvarez-Dávila, Niro Higuchi⁸, Beatriz Schwantes Marimon⁹, Ben Hur Marimon-Junior⁹, Marcos Silveira¹⁰, Emilio Vilanova¹¹, Emilio Vilanova¹², Emanuel Gloor¹, Yadvinder Malhi¹³, Jérôme Chave¹⁴, Jos Barlow¹⁵, Jos Barlow¹⁶, Damien Bonal¹⁷, Nallaret Davila Cardozo¹⁸, Terry L. Erwin¹⁹, Sophie Fauset¹, Bruno Hérault²⁰, Susan G. Laurance²¹, Lourens Poorter²², Lan Qie⁵, Clément Stahl²³, Martin J. P. Sullivan¹, Hans ter Steege²⁴, Hans ter Steege²⁵, Vincent A. Vos, Pieter A. Zuidema²², Everton Cristo de Almeida²⁶, Edmar Almeida de Oliveira⁹, Ana Andrade⁸, Simone Aparecida Vieira²⁷, Luiz E. O. C. Aragão²⁸, Luiz E. O. C. Aragão⁴, Alejandro Araujo-Murakami⁷, Eric Arets²², Gerardo A. Aymard C, Christopher Baraloto²⁹, Plínio Barbosa de Camargo³⁰, Jorcely Barroso¹⁰, Frans Bongers²², René G. A. Boot³¹, José Luís Camargo⁸, Wendeson Castro¹⁰, Victor Chama Moscoso⁶, James A. Comiskey¹⁹, Fernando Cornejo Valverde³², Antonio Carlos Lola da Costa³³, Jhon del Aguila Pasquel³², Jhon del Aguila Pasquel³⁴, Anthony Di Fiore³⁵, Luisa Fernanda Duque, Fernando Elias⁹, Julien Engel²⁰, Julien Engel²⁹, Gerardo Flores Llampazo, David W. Galbraith¹, Rafael Herrera Fernández³⁶, Rafael Herrera Fernández³⁷, Eurídice N. Honorio Coronado³⁴, Wannes Hubau³⁸, Eliana Jimenez-Rojas³⁹, Adriano José Nogueira Lima⁸, Ricardo Keichi Umetsu⁹, William F. Laurance²¹, Gabriela Lopez-Gonzalez¹, Thomas E. Lovejoy⁴⁰, Omar Aurelio Melo Cruz⁴¹, Paulo S. Morandi⁹, David A. Neill, Percy Núñez Vargas⁶, Nadir Pallqui Camacho⁶, Alexander Parada Gutierrez, Guido Pardo, Julie Peacock¹, Marielos Peña-Claros²², Maria Cristina Peñuela-Mora, Pascal Petronelli¹⁴, Georgia Pickavance¹, Nigel C. A. Pitman, Adriana Prieto⁴², Carlos A. Quesada⁸, Hirma Ramírez-Angulo¹¹, Maxime Réjou-Méchain⁴³, Zorayda Restrepo Correa, Anand Roopsind⁴⁴, Agustín Rudas⁴², Rafael de Paiva Salomão¹⁵, Natalino Silva, Javier Silva Espejo⁴⁵, James Singh⁴⁶, Juliana Stropp⁴⁷, John Terborgh⁴⁸, Raquel Thomas⁴⁴, Marisol Toledo⁷, Armando Torres-Lezama¹¹, Luis Valenzuela Gamarra, Peter J. van de Meer⁴⁹, Geertje M. F. van der Heijden⁵⁰, Peter van der Hout, Rodolfo Vásquez Martínez, César I.A. Vela⁶, Ima Célia Guimarães Vieira¹⁵, Oliver L. Phillips¹•Institutions (50)

University of Leeds¹, University of Edinburgh², University College London³, University of Exeter⁴, Imperial College London⁵, National University of Saint Anthony the Abbot in Cuzco⁶, Universidad Autónoma Gabriel René Moreno⁷, National Institute of Amazonian Research⁸, Universidade do Estado de Mato Grosso⁹, Universidade Federal do Acre¹⁰, University of Los Andes¹¹, University of Washington¹², Environmental Change Institute¹³, Centre national de la recherche scientifique¹⁴, Museu Paraense Emílio Goeldi¹⁵, Lancaster University¹⁶, University of Lorraine¹⁷, Universidad Nacional de la Amazonía Peruana¹⁸, Smithsonian Institution¹⁹, University of Montpellier²⁰, James Cook University²¹, Wageningen University and Research Centre²², Agro ParisTech²³, Naturalis²⁴, University of Amsterdam²⁵, Federal University of Western Pará²⁶, State University of Campinas²⁷, National Institute for Space Research²⁸, Florida International University²⁹, University of São Paulo³⁰, Tropenbos International³¹, Amazon.com³², Federal University of Pará³³, Michigan Technological University³⁴, University of Texas at Austin³⁵, Venezuelan Institute for Scientific Research³⁶, Polytechnic University of Valencia³⁷, Royal Museum for Central Africa³⁸, Tecnológico de Antioquia³⁹, George Mason University⁴⁰, Universidad del Tolima⁴¹, National University of Colombia⁴², Paul Sabatier University⁴³, Georgetown University⁴⁴, University of La Serena⁴⁵, Forestry Commission⁴⁶, Federal University of Alagoas⁴⁷, Duke University⁴⁸, Van Hall Larenstein University of Applied Sciences⁴⁹, University of Nottingham⁵⁰

01 Jan 2019-Global Change Biology

TL;DR: A slow shift to a more dry‐affiliated Amazonia is underway, with changes in compositional dynamics consistent with climate‐change drivers, but yet to significantly impact whole‐community composition.

Abstract: Most of the planet's diversity is concentrated in the tropics, which includes many regions undergoing rapid climate change. Yet, while climate‐induced biodiversity changes are widely documented elsewhere, few studies have addressed this issue for lowland tropical ecosystems. Here we investigate whether the floristic and functional composition of intact lowland Amazonian forests have been changing by evaluating records from 106 long‐term inventory plots spanning 30 years. We analyse three traits that have been hypothesized to respond to different environmental drivers (increase in moisture stress and atmospheric CO2 concentrations): maximum tree size, biogeographic water‐deficit affiliation and wood density. Tree communities have become increasingly dominated by large‐statured taxa, but to date there has been no detectable change in mean wood density or water deficit affiliation at the community level, despite most forest plots having experienced an intensification of the dry season. However, among newly recruited trees, dry‐affiliated genera have become more abundant, while the mortality of wet‐affiliated genera has increased in those plots where the dry season has intensified most. Thus, a slow shift to a more dry‐affiliated Amazonia is underway, with changes in compositional dynamics (recruits and mortality) consistent with climate‐change drivers, but yet to significantly impact whole‐community composition. The Amazon observational record suggests that the increase in atmospheric CO2 is driving a shift within tree communities to large‐statured species and that climate changes to date will impact forest composition, but long generation times of tropical trees mean that biodiversity change is lagging behind climate change.

Posted Content•

Meta-Learning with Implicit Gradients

Aravind Rajeswaran¹, Chelsea Finn², Sham M. Kakade³, Sergey Levine⁴•Institutions (4)

University of Washington¹, Massachusetts Institute of Technology², Amazon.com³, University of California, Berkeley⁴

10 Sep 2019-arXiv: Learning

TL;DR: The implicit MAML algorithm as discussed by the authors decouples the meta-gradient computation from the choice of inner-loop optimizer and can gracefully handle many gradient steps without vanishing gradients or memory constraints.

Journal Article•DOI•

Transformative effects of IoT, Blockchain and Artificial Intelligence on cloud computing: Evolution, vision, trends and open challenges

Sukhpal Singh Gill¹, Shreshth Tuli², Minxian Xu³, Inderpreet Singh⁴, Karan Singh⁵, Karan Singh⁶, Dominic Lindsay⁷, Shikhar Tuli², Daria Smirnova⁷, Manmeet Singh², Manmeet Singh⁸, Udit Jain², Haris Pervaiz⁷, Bhanu Sehgal⁹, Sukhwinder Singh Kaila, Sanjay Misra¹⁰, Sanjay Misra¹¹, Mohammad Sadegh Aslanpour¹², Harshit Mehta¹³, Vlado Stankovski¹⁴, Peter Garraghan⁷•Institutions (14)

Queen Mary University of London¹, Indian Institutes of Technology², Chinese Academy of Sciences³, Simon Fraser University⁴, University of Waterloo⁵, Amazon.com⁶, Lancaster University⁷, Indian Institute of Tropical Meteorology⁸, Accenture⁹, Atılım University¹⁰, Covenant University¹¹, Islamic Azad University¹², University of Texas at Austin¹³, University of Ljubljana¹⁴

01 Dec 2019

TL;DR: This study explores how three emerging paradigms (Blockchain, IoT and Artificial Intelligence) will influence future cloud computing systems, and proposes a conceptual model for cloud futurology to explore the influence of emerging paradigms and technologies on the evolution of cloud computing.

Abstract: Cloud computing plays a critical role in modern society and enables a range of applications from infrastructure to social media. Such systems must cope with varying load and evolving usage reflecting societies’ interaction with and dependency on automated computing systems whilst satisfying Quality of Service (QoS) guarantees. Enabling these systems is a cohort of conceptual technologies, synthesized to meet the demands of evolving computing applications. In order to understand current and future challenges of such systems, there is a need to identify key technologies enabling future applications. In this study, we aim to explore how three emerging paradigms (Blockchain, IoT and Artificial Intelligence) will influence future cloud computing systems. Further, we identify several technologies driving these paradigms and invite international experts to discuss the current status and future directions of cloud computing. Finally, we propose a conceptual model for cloud futurology to explore the influence of emerging paradigms and technologies on the evolution of cloud computing.

Journal Article•DOI•

The Geometry of Culture: Analyzing the Meanings of Class through Word Embeddings

Austin C. Kozlowski¹, Matt Taddy², James A. Evans³, James A. Evans¹•Institutions (3)

University of Chicago¹, Amazon.com², Santa Fe Institute³

25 Sep 2019-American Sociological Review

TL;DR: The authors argue that word embedding models are a useful tool for the study of culture using a historical analysis of shared understandings of social class as an empirical case, and they argue word embeddings represent semant...

Abstract: We argue word embedding models are a useful tool for the study of culture using a historical analysis of shared understandings of social class as an empirical case. Word embeddings represent semant...

Proceedings Article•DOI•

Learning Problem-Agnostic Speech Representations from Multiple Self-Supervised Tasks

Santiago Pascual¹, Mirco Ravanelli², Joan Serrà³, Antonio Bonafonte⁴, Yoshua Bengio²•Institutions (4)

Polytechnic University of Catalonia¹, Université de Montréal², Telefónica³, Amazon.com⁴

06 Apr 2019

TL;DR: This article proposes an improved self-supervised method, where a single neural encoder is followed by multiple workers that jointly solve different self-supervised tasks; the needed consensus across different tasks naturally imposes meaningful constraints on the encoder, helping it discover general representations and minimizing the risk of learning superficial ones.

Abstract: Learning good representations without supervision is still an open issue in machine learning, and is particularly challenging for speech signals, which are often characterized by long sequences with a complex hierarchical structure. Some recent works, however, have shown that it is possible to derive useful speech representations by employing a self-supervised encoder-discriminator approach. This paper proposes an improved self-supervised method, where a single neural encoder is followed by multiple workers that jointly solve different self-supervised tasks. The needed consensus across different tasks naturally imposes meaningful constraints to the encoder, contributing to discover general representations and to minimize the risk of learning superficial ones. Experiments show that the proposed approach can learn transferable, robust, and problem-agnostic features that carry on relevant information from the speech signal, such as speaker identity, phonemes, and even higher-level features such as emotional cues. In addition, a number of design choices make the encoder easily exportable, facilitating its direct usage or adaptation to different problems.

Posted Content•

Variational Information Distillation for Knowledge Transfer

Sungsoo Ahn¹, Shell Xu Hu², Andreas Damianou³, Neil D. Lawrence³, Zhenwen Dai³•Institutions (3)

KAIST¹, École des ponts ParisTech², Amazon.com³

11 Apr 2019-arXiv: Computer Vision and Pattern Recognition

TL;DR: An information-theoretic framework for knowledge transfer is proposed which formulates knowledge transfer as maximizing the mutual information between the teacher and the student networks and which consistently outperforms existing methods.

Proceedings Article•DOI•

Topical-Chat: Towards Knowledge-Grounded Open-Domain Conversations

Karthik Gopalakrishnan¹, Behnam Hedayatnia², Qinlang Chen, Anna Gottardi, Sanjeev Kwatra, Anu Venkatesh², Raefer Gabriel², Dilek Hakkani-Tur²•Institutions (2)

Indian Institute of Technology Patna¹, Amazon.com²

15 Sep 2019

TL;DR: Topical-Chat is introduced, a knowledge-grounded human-human conversation dataset where the underlying knowledge spans 8 broad topics and conversation partners don’t have explicitly defined roles, to help further research in open-domain conversational AI.

Abstract: Building socialbots that can have deep, engaging open-domain conversations with humans is one of the grand challenges of artificial intelligence (AI). To this end, bots need to be able to leverage world knowledge spanning several domains effectively when conversing with humans who have their own world knowledge. Existing knowledge-grounded conversation datasets are primarily stylized with explicit roles for conversation partners. These datasets also do not explore depth or breadth of topical coverage with transitions in conversations. We introduce Topical-Chat, a knowledge-grounded human-human conversation dataset where the underlying knowledge spans 8 broad topics and conversation partners don’t have explicitly defined roles, to help further research in open-domain conversational AI. We also train several state-of-the-art encoder-decoder conversational models on Topical-Chat and perform automated and human evaluation for benchmarking.

Proceedings Article•DOI•

Jasper: An End-to-End Convolutional Neural Acoustic Model.

Jason Li¹, Vitaly Lavrukhin¹, Boris Ginsburg¹, Ryan Leary¹, Oleksii Kuchaiev¹, Jonathan Cohen¹, Huyen Nguyen¹, Ravi Teja Gadde²•Institutions (2)

Nvidia¹, Amazon.com²

05 Apr 2019

TL;DR: This paper reports state-of-the-art results on LibriSpeech among end-to-end speech recognition models without any external training data and introduces a new layer-wise optimizer called NovoGrad to improve training.

Abstract: In this paper, we report state-of-the-art results on LibriSpeech among end-to-end speech recognition models without any external training data. Our model, Jasper, uses only 1D convolutions, batch normalization, ReLU, dropout, and residual connections. To improve training, we further introduce a new layer-wise optimizer called NovoGrad. Through experiments, we demonstrate that the proposed deep architecture performs as well or better than more complex choices. Our deepest Jasper variant uses 54 convolutional layers. With this architecture, we achieve 2.95% WER using a beam-search decoder with an external neural language model and 3.86% WER with a greedy decoder on LibriSpeech test-clean. We also report competitive results on the Wall Street Journal and the Hub5'00 conversational evaluation datasets.

Proceedings Article•DOI•

Unsupervised 3D Pose Estimation With Geometric Self-Supervision

Ching-Hang Chen¹, Ambrish Tyagi¹, Amit Agrawal¹, Dylan Drover¹, Rohith Mv², Stefan Stojanov², James M. Rehg¹•Institutions (2)

Amazon.com¹, Georgia Institute of Technology²

15 Jun 2019

TL;DR: It is shown that self-consistency alone is not sufficient to generate realistic skeletons; however, adding a 2D pose discriminator enables the lifter to output valid 3D poses, and the results demonstrate the usefulness of 2D pose data for unsupervised 3D lifting.

Abstract: We present an unsupervised learning approach to recover 3D human pose from 2D skeletal joints extracted from a single image. Our method does not require any multi-view image data, 3D skeletons, correspondences between 2D-3D points, or use previously learned 3D priors during training. A lifting network accepts 2D landmarks as inputs and generates a corresponding 3D skeleton estimate. During training, the recovered 3D skeleton is reprojected on random camera viewpoints to generate new ‘synthetic’ 2D poses. By lifting the synthetic 2D poses back to 3D and re-projecting them in the original camera view, we can define self-consistency loss both in 3D and in 2D. The training can thus be self supervised by exploiting the geometric self-consistency of the lift-reproject-lift process. We show that self-consistency alone is not sufficient to generate realistic skeletons, however adding a 2D pose discriminator enables the lifter to output valid 3D poses. Additionally, to learn from 2D poses ‘in the wild’, we train an unsupervised 2D domain adapter network to allow for an expansion of 2D data. This improves results and demonstrates the usefulness of 2D pose data for unsupervised 3D lifting. Results on Human3.6M dataset for 3D human pose estimation demonstrate that our approach improves upon the previous unsupervised methods by 30% and outperforms many weakly supervised approaches that explicitly use 3D data.

Journal Article•DOI•

Mapping 123 million neonatal, infant and child deaths between 2000 and 2017

Roy Burstein¹, Nathaniel J Henry¹, Michael Collison¹, Laurie B. Marczak¹ +663 more•Institutions (290)

16 Oct 2019-Nature

TL;DR: A high-resolution, global atlas of mortality of children under five years of age between 2000 and 2017 highlights subnational geographical inequalities in the distribution, rates and absolute counts of child deaths by age.

Abstract: Since 2000, many countries have achieved considerable success in improving child survival, but localized progress remains unclear. To inform efforts towards United Nations Sustainable Development Goal 3.2—to end preventable child deaths by 2030—we need consistently estimated data at the subnational level regarding child mortality rates and trends. Here we quantified, for the period 2000–2017, the subnational variation in mortality rates and number of deaths of neonates, infants and children under 5 years of age within 99 low- and middle-income countries using a geostatistical survival model. We estimated that 32% of children under 5 in these countries lived in districts that had attained rates of 25 or fewer child deaths per 1,000 live births by 2017, and that 58% of child deaths between 2000 and 2017 in these countries could have been averted in the absence of geographical inequality. This study enables the identification of high-mortality clusters, patterns of progress and geographical inequalities to inform appropriate investments and implementations that will help to improve the health of all populations.

Proceedings Article•DOI•

Few-Shot Learning With Embedded Class Models and Shot-Free Meta Training

Avinash Ravichandran¹, Rahul Bhotika¹, Stefano Soatto²•Institutions (2)

Amazon.com¹, University of California, Los Angeles²

10 May 2019

TL;DR: This work proposes a method for learning embeddings for few-shot learning that is suitable for use with any number of shots (shot-free), that encompasses metric learning, that facilitates adding new classes without crowding the class representation space.

Abstract: We propose a method for learning embeddings for few-shot learning that is suitable for use with any number of shots (shot-free). Rather than fixing the class prototypes to be the Euclidean average of sample embeddings, we allow them to live in a higher-dimensional space (embedded class models) and learn the prototypes along with the model parameters. The class representation function is defined implicitly, which allows us to deal with a variable number of shots per class with a simple constant-size architecture. The class embedding encompasses metric learning, that facilitates adding new classes without crowding the class representation space. Despite being general and not tuned to the benchmark, our approach achieves state-of-the-art performance on the standard few-shot benchmark datasets.

Proceedings Article•

Classification is a Strong Baseline for Deep Metric Learning.

Andrew Zhai¹, Hao-Yu Wu²•Institutions (2)

University of California, Berkeley¹, Amazon.com²

01 Jan 2019

TL;DR: This paper evaluates on several standard retrieval datasets such as CAR-196, CUB-200-2011, Stanford Online Product, and In-Shop datasets for image retrieval and clustering, and establishes that the classification-based approach is competitive across different feature dimensions and base feature networks.

Abstract: Deep metric learning aims to learn a function mapping image pixels to embedding feature vectors that model the similarity between images. Two major applications of metric learning are content-based image retrieval and face verification. For the retrieval tasks, the majority of current state-of-the-art (SOTA) approaches are triplet-based non-parametric training. For the face verification tasks, however, recent SOTA approaches have adopted classification-based parametric training. In this paper, we look into the effectiveness of classification based approaches on image retrieval datasets. We evaluate on several standard retrieval datasets such as CAR-196, CUB-200-2011, Stanford Online Product, and In-Shop datasets for image retrieval and clustering, and establish that our classification-based approach is competitive across different feature dimensions and base feature networks. We further provide insights into the performance effects of subsampling classes for scalable classification-based training, and the effects of binarization, enabling efficient storage and computation for practical applications.

Posted Content•DOI•

Transformers without Tears: Improving the Normalization of Self-Attention

Toan Q. Nguyen¹, Julian Salazar²•Institutions (2)

University of Notre Dame¹, Amazon.com²

02 Nov 2019-arXiv: Computation and Language

TL;DR: It is shown that pre-norm residual connections (PreNorm) and smaller initializations enable warmup-free, validation-based training with large learning rates, and ℓ2 normalization with a single scale parameter (ScaleNorm) is proposed for faster training and better performance.

Abstract: We evaluate three simple, normalization-centric changes to improve Transformer training. First, we show that pre-norm residual connections (PreNorm) and smaller initializations enable warmup-free, validation-based training with large learning rates. Second, we propose $\ell_2$ normalization with a single scale parameter (ScaleNorm) for faster training and better performance. Finally, we reaffirm the effectiveness of normalizing word embeddings to a fixed length (FixNorm). On five low-resource translation pairs from TED Talks-based corpora, these changes always converge, giving an average +1.1 BLEU over state-of-the-art bilingual baselines and a new 32.8 BLEU on IWSLT'15 English-Vietnamese. We observe sharper performance curves, more consistent gradient norms, and a linear relationship between activation scaling and decoder depth. Surprisingly, in the high-resource setting (WMT'14 English-German), ScaleNorm and FixNorm remain competitive but PreNorm degrades performance.
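
The ScaleNorm idea in the abstract above, $\ell_2$ normalization with a single scale parameter, can be illustrated with a minimal Python sketch. The function name `scale_norm`, the `eps` guard against division by zero, and the example values are assumptions made for illustration and are not taken from the paper's code; in training, `g` would be a learned scalar rather than a fixed number.

```python
import math

def scale_norm(x, g, eps=1e-5):
    """Rescale vector x so its L2 norm equals the single scalar g.

    eps is an assumed safeguard for near-zero vectors.
    """
    norm = math.sqrt(sum(v * v for v in x))
    scale = g / max(norm, eps)
    return [v * scale for v in x]

# Hypothetical activation vector with L2 norm 5, rescaled to norm g = 2.
activations = [3.0, 4.0]
normalized = scale_norm(activations, g=2.0)
print(normalized)
```

Unlike layer normalization, which carries a gain and bias per feature, this sketch has only one scalar per normalization site, which is what "a single scale parameter" refers to.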

Journal Article•DOI•

Dynamic Service Migration in Mobile Edge Computing Based on Markov Decision Process

Shiqiang Wang, Rahul Urgaonkar¹, Murtaza Zafer, Ting He², Kevin S. Chan³, Kin K. Leung⁴•Institutions (4)

Amazon.com¹, Pennsylvania State University², United States Army Research Laboratory³, Imperial College London⁴

01 Jun 2019-IEEE ACM Transactions on Networking

TL;DR: In this paper, the authors formulate the service migration problem as a Markov decision process (MDP) and provide a mathematical framework to design optimal service migration policies in mobile edge computing.

Abstract: In mobile edge computing, local edge servers can host cloud-based services, which reduces network overhead and latency but requires service migrations as users move to new locations. It is challenging to make migration decisions optimally because of the uncertainty in such a dynamic cloud environment. In this paper, we formulate the service migration problem as a Markov decision process (MDP). Our formulation captures general cost models and provides a mathematical framework to design optimal service migration policies. In order to overcome the complexity associated with computing the optimal policy, we approximate the underlying state space by the distance between the user and service locations. We show that the resulting MDP is exact for the uniform 1-D user mobility, while it provides a close approximation for uniform 2-D mobility with a constant additive error. We also propose a new algorithm and a numerical technique for computing the optimal solution, which is significantly faster than traditional methods based on the standard value or policy iteration. We illustrate the application of our solution in practical scenarios where many theoretical assumptions are relaxed. Our evaluations based on real-world mobility traces of San Francisco taxis show the superior performance of the proposed solution compared to baseline solutions.

Proceedings Article•DOI•

Multi-passage BERT: A Globally Normalized BERT Model for Open-domain Question Answering

Zhiguo Wang¹, Patrick Ng², Xiaofei Ma¹, Ramesh Nallapati¹, Bing Xiang¹•Institutions (2)

Amazon.com¹, Anschutz Medical Campus²

22 Aug 2019

TL;DR: The authors propose a multi-passage BERT model that globally normalizes answer scores across all passages of the same question, which enables the QA model to find better answers by utilizing more passages.

Abstract: The BERT model has been successfully applied to open-domain QA tasks. However, previous work trains BERT by viewing passages corresponding to the same question as independent training instances, which may cause incomparable scores for answers from different passages. To tackle this issue, we propose a multi-passage BERT model to globally normalize answer scores across all passages of the same question, and this change enables our QA model to find better answers by utilizing more passages. In addition, we find that splitting articles into passages with the length of 100 words by sliding window improves performance by 4%. By leveraging a passage ranker to select high-quality passages, multi-passage BERT gains an additional 2%. Experiments on four standard benchmarks showed that our multi-passage BERT outperforms all state-of-the-art models on all benchmarks. In particular, on the OpenSQuAD dataset, our model gains 21.4% EM and 21.5% F1 over all non-BERT models, and 5.8% EM and 6.5% F1 over BERT-based models.
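
The difference between per-passage and global normalization described in the abstract above can be shown with a small Python sketch. This is an illustration under assumptions, not the authors' implementation: the helper names (`per_passage_probs`, `global_probs`) and the candidate scores are hypothetical, and a real model would normalize start and end position scores produced by BERT rather than a hand-written list.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a flat list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def per_passage_probs(passage_scores):
    """Normalize each passage independently (scores are not comparable across passages)."""
    return [softmax(scores) for scores in passage_scores]

def global_probs(passage_scores):
    """Normalize once over the candidates of all passages for the same question."""
    flat = [s for scores in passage_scores for s in scores]
    probs = softmax(flat)
    out, start = [], 0
    for scores in passage_scores:
        out.append(probs[start:start + len(scores)])
        start += len(scores)
    return out

# Hypothetical raw answer scores: two passages, two candidates each.
scores = [[5.0, 1.0], [0.5, 0.0]]
local = per_passage_probs(scores)
joint = global_probs(scores)
print(local)
print(joint)
```

With independent normalization, the best candidate in the weak second passage still receives more than half of that passage's probability mass; under global normalization it has to compete with the strong candidate from the first passage and its probability drops sharply.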

Proceedings Article•DOI•

Semantic Correlation Promoted Shape-Variant Context for Segmentation

Henghui Ding¹, Xudong Jiang¹, Bing Shuai², Ai Qun Liu¹, Gang Wang³•Institutions (3)

Nanyang Technological University¹, Amazon.com², Alibaba Group³

01 Jun 2019

TL;DR: This work proposes a novel paired convolution that infers the semantic correlation of a pixel pair and uses it to generate a shape mask, together with a shape-variant convolution whose receptive field is controlled by the shape mask, which varies with the appearance of the input.

Abstract: Context is essential for semantic segmentation. Due to the diverse shapes of objects and their complex layout in various scene images, the spatial scales and shapes of contexts for different objects have very large variation. It is thus ineffective or inefficient to aggregate various context information from a predefined fixed region. In this work, we propose to generate a scale- and shape-variant semantic mask for each pixel to confine its contextual region. To this end, we first propose a novel paired convolution to infer the semantic correlation of the pair and based on that to generate a shape mask. Using the inferred spatial scope of the contextual region, we propose a shape-variant convolution, of which the receptive field is controlled by the shape mask that varies with the appearance of input. In this way, the proposed network aggregates the context information of a pixel from its semantic-correlated region instead of a predefined fixed region. Furthermore, this work also proposes a labeling denoising model to reduce wrong predictions caused by the noisy low-level features. Without bells and whistles, the proposed segmentation network achieves new state-of-the-arts consistently on the six public segmentation datasets.

Proceedings Article•DOI•

Task2Vec: Task Embedding for Meta-Learning

Alessandro Achille¹, Michael Lam², Rahul Tewari², Avinash Ravichandran², Subhransu Maji³, Charless C. Fowlkes⁴, Stefano Soatto¹, Pietro Perona⁵•Institutions (5)

University of California, Los Angeles¹, Amazon.com², University of Massachusetts Amherst³, University of California, Irvine⁴, California Institute of Technology⁵

10 Feb 2019

TL;DR: A method to generate vectorial representations of visual classification tasks which can be used to reason about the nature of those tasks and their relations, and is demonstrated to be capable of predicting task similarities that match the authors' intuition about semantic and taxonomic relations between different visual tasks.

Abstract: We introduce a method to generate vectorial representations of visual classification tasks which can be used to reason about the nature of those tasks and their relations. Given a dataset with ground-truth labels and a loss function, we process images through a "probe network" and compute an embedding based on estimates of the Fisher information matrix associated with the probe network parameters. This provides a fixed-dimensional embedding of the task that is independent of details such as the number of classes and requires no understanding of the class label semantics. We demonstrate that this embedding is capable of predicting task similarities that match our intuition about semantic and taxonomic relations between different visual tasks. We demonstrate the practical value of this framework for the meta-task of selecting a pre-trained feature extractor for a novel task. We present a simple meta-learning framework for learning a metric on embeddings that is capable of predicting which feature extractors will perform well on which task. Selecting a feature extractor with task embedding yields performance close to the best available feature extractor, with substantially less computational effort than exhaustively training and evaluating all available models.

Proceedings Article•DOI•

Self-attention Networks for Connectionist Temporal Classification in Speech Recognition

Julian Salazar¹, Katrin Kirchhoff¹, Zhiheng Huang¹•Institutions (1)

Amazon.com¹

22 Jan 2019

TL;DR: This work proposes SAN-CTC, a deep, fully self-attentional network for CTC, and shows it is tractable and competitive for end-to-end speech recognition, and explores how label alphabets affect attention heads and performance.

Abstract: The success of self-attention in NLP has led to recent applications in end-to-end encoder-decoder architectures for speech recognition. Separately, connectionist temporal classification (CTC) has matured as an alignment-free, non-autoregressive approach to sequence transduction, either by itself or in various multitask and decoding frameworks. We propose SAN-CTC, a deep, fully self-attentional network for CTC, and show it is tractable and competitive for end-to-end speech recognition. SAN-CTC trains quickly and outperforms existing CTC models and most encoder-decoder models, with character error rates (CERs) of 4.7% in 1 day on WSJ eval92 and 2.8% in 1 week on LibriSpeech test-clean, with a fixed architecture and one GPU. Similar improvements hold for WERs after LM decoding. We motivate the architecture for speech, evaluate position and down-sampling approaches, and explore how label alphabets (character, phoneme, subword) affect attention heads and performance.

Posted Content•

Bag of Freebies for Training Object Detection Neural Networks.

Zhi Zhang, Tong He, Hang Zhang, Zhongyue Zhang, Junyuan Xie, Mu Li¹•Institutions (1)

Amazon.com¹

11 Feb 2019-arXiv: Computer Vision and Pattern Recognition

TL;DR: This work explores training tweaks that apply to various models, including Faster R-CNN and YOLOv3, and that can improve absolute precision by up to 5% compared to state-of-the-art baselines.

Abstract: Training heuristics greatly improve the accuracy of various image classification models (He et al., 2018). Object detection models, however, have more complex neural network structures and optimization targets, and training strategies and pipelines vary dramatically among different models. In this work, we explore training tweaks that apply to various models including Faster R-CNN and YOLOv3. These tweaks do not change the model architectures, so the inference costs remain the same. Our empirical results demonstrate that these freebies can nonetheless improve absolute precision by up to 5% compared to state-of-the-art baselines.

Proceedings Article•DOI•

Nexus: a GPU cluster engine for accelerating DNN-based video analysis

Haichen Shen¹, Lequn Chen², Yuchen Jin², Liangyu Zhao², Bingyu Kong³, Matthai Philipose⁴, Arvind Krishnamurthy², Ravi Sundaram⁵•Institutions (5)

Amazon.com¹, University of Washington², Shanghai Jiao Tong University³, Microsoft⁴, Northeastern University⁵

27 Oct 2019

TL;DR: Nexus is a fully implemented system that includes cluster-scale resource management that performs detailed scheduling of GPUs, reasoning about groups of DNN invocations that need to be co-scheduled, and moving from the conventional whole-DNN execution model to executing fragments of DNNs.

Abstract: We address the problem of serving Deep Neural Networks (DNNs) efficiently from a cluster of GPUs. In order to realize the promise of very low-cost processing made by accelerators such as GPUs, it is essential to run them at sustained high utilization. Doing so requires cluster-scale resource management that performs detailed scheduling of GPUs, reasoning about groups of DNN invocations that need to be co-scheduled, and moving from the conventional whole-DNN execution model to executing fragments of DNNs. Nexus is a fully implemented system that includes these innovations. In large-scale case studies on 16 GPUs, when required to stay within latency constraints at least 99% of the time, Nexus can process requests at rates 1.8-12.7X higher than state of the art systems can. A long-running multi-application deployment stays within 84% of optimal utilization and, on a 100-GPU cluster, violates latency SLOs on 0.27% of requests.
