
Showing papers by "Amazon.com" published in 2017


Journal ArticleDOI
TL;DR: The authors suggest that data controllers offer a particular type of explanation, unconditional counterfactual explanations, to support these aims; such explanations describe the smallest change to the world that can be made to obtain a desirable outcome, or to arrive at the "closest possible world", without needing to explain the internal logic of the system.
Abstract: There has been much discussion of the “right to explanation” in the EU General Data Protection Regulation, and its existence, merits, and disadvantages. Implementing a right to explanation that opens the ‘black box’ of algorithmic decision-making faces major legal and technical barriers. Explaining the functionality of complex algorithmic decision-making systems and their rationale in specific cases is a technically challenging problem. Some explanations may offer little meaningful information to data subjects, raising questions around their value. Data controllers have an interest to not disclose information about their algorithms that contains trade secrets, violates the rights and freedoms of others (e.g. privacy), or allows data subjects to game or manipulate decision-making. Explanations of automated decisions need not hinge on the general public understanding how algorithmic systems function. Even though such interpretability is of great importance and should be pursued, explanations can, in principle, be offered without opening the black box. Looking at explanations as a means to help a data subject act rather than merely understand, one could gauge the scope and content of explanations according to the specific goal or action they are intended to support. From the perspective of individuals affected by automated decision-making, we propose three aims for explanations: (1) to inform and help the individual understand why a particular decision was reached, (2) to provide grounds to contest the decision if the outcome is undesired, and (3) to understand what would need to change in order to receive a desired result in the future, based on the current decision-making model. We assess how each of these goals finds support in the GDPR, and the extent to which they hinge on opening the ‘black box’. We suggest data controllers should offer a particular type of explanation, ‘unconditional counterfactual explanations’, to support these three aims. These counterfactual explanations describe the smallest change to the world that can be made to obtain a desirable outcome, or to arrive at the “closest possible world.” As multiple variables or sets of variables can lead to one or more desirable outcomes, multiple counterfactual explanations can be provided, corresponding to different choices of nearby possible worlds for which the counterfactual holds. Counterfactuals describe a dependency on the external facts that lead to that decision without the need to convey the internal state or logic of an algorithm. As a result, counterfactuals serve as a minimal solution that bypasses the current technical limitations of interpretability, while striking a balance between transparency and the rights and freedoms of others (e.g. privacy, trade secrets).
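
The search for such a counterfactual can be framed as a small optimisation problem. The following minimal sketch is not the paper's implementation; the toy credit model, feature values and weighting constant lam are hypothetical. It finds the smallest change to an input that flips a classifier's decision by minimising a distance-plus-prediction-penalty objective:

    import numpy as np
    from scipy.optimize import minimize
    from sklearn.linear_model import LogisticRegression

    # Toy "credit" model: approve when feature 0 outweighs feature 1 (hypothetical data).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 2))
    y = (X[:, 0] - X[:, 1] > 0).astype(int)
    clf = LogisticRegression().fit(X, y)

    x0 = np.array([-0.5, 0.5])   # an applicant who is currently denied
    target = 1.0                 # desired outcome: approval

    def objective(x_cf, lam=10.0):
        # lam * (f(x') - y')^2 + ||x' - x0||_1 : prediction penalty plus distance to the original input
        p = clf.predict_proba(x_cf.reshape(1, -1))[0, 1]
        return lam * (p - target) ** 2 + np.abs(x_cf - x0).sum()

    res = minimize(objective, x0, method="Nelder-Mead")
    print("counterfactual:", res.x)
    print("new approval probability:", clf.predict_proba(res.x.reshape(1, -1))[0, 1])

The difference res.x - x0 is the "smallest change to the world" reported to the data subject; the model internals never need to be exposed.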

1,167 citations


Proceedings Article
17 Jul 2017
TL;DR: This paper argues for the fundamental importance of the value distribution: the distribution of the random return received by a reinforcement learning agent, and designs a new algorithm which applies Bellman's equation to the learning of approximate value distributions.
Abstract: In this paper we argue for the fundamental importance of the value distribution: the distribution of the random return received by a reinforcement learning agent. This is in contrast to the common approach to reinforcement learning which models the expectation of this return, or value. Although there is an established body of literature studying the value distribution, thus far it has always been used for a specific purpose such as implementing risk-aware behaviour. We begin with theoretical results in both the policy evaluation and control settings, exposing a significant distributional instability in the latter. We then use the distributional perspective to design a new algorithm which applies Bellman's equation to the learning of approximate value distributions. We evaluate our algorithm using the suite of games from the Arcade Learning Environment. We obtain both state-of-the-art results and anecdotal evidence demonstrating the importance of the value distribution in approximate reinforcement learning. Finally, we combine theoretical and empirical evidence to highlight the ways in which the value distribution impacts learning in the approximate setting.
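
The core computational step is the distributional Bellman update followed by a projection back onto a fixed support of atoms. Below is a minimal sketch of that projection, in the spirit of the paper's categorical agent; the support range and number of atoms are illustrative choices:

    import numpy as np

    def project_categorical(p, r, gamma, z):
        """Project the shifted distribution (r + gamma * z, p) back onto the fixed support z."""
        v_min, v_max = z[0], z[-1]
        dz = z[1] - z[0]
        tz = np.clip(r + gamma * z, v_min, v_max)   # distributional Bellman target atoms
        b = (tz - v_min) / dz                       # fractional index of each shifted atom
        lower, upper = np.floor(b).astype(int), np.ceil(b).astype(int)
        m = np.zeros_like(p)
        for j in range(len(z)):
            if lower[j] == upper[j]:                # atom lands exactly on a support point
                m[lower[j]] += p[j]
            else:                                   # otherwise split its mass between neighbours
                m[lower[j]] += p[j] * (upper[j] - b[j])
                m[upper[j]] += p[j] * (b[j] - lower[j])
        return m

    z = np.linspace(-10.0, 10.0, 51)                # 51 atoms on [-10, 10]
    p = np.full(51, 1.0 / 51)                       # current (uniform) return distribution
    m = project_categorical(p, r=1.0, gamma=0.99, z=z)
    print(m.sum())                                  # still a valid probability distribution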

708 citations


Proceedings ArticleDOI
13 Feb 2017
TL;DR: This article proposed a bilateral multi-perspective matching (BiMPM) model under the "matching-aggregation" framework, which first encodes two sentences with a BiLSTM encoder and then matches the two encoded sentences in two directions.
Abstract: Natural language sentence matching is a fundamental technology for a variety of tasks. Previous approaches either match sentences from a single direction or only apply single granular (word-by-word or sentence-by-sentence) matching. In this work, we propose a bilateral multi-perspective matching (BiMPM) model under the "matching-aggregation" framework. Given two sentences $P$ and $Q$, our model first encodes them with a BiLSTM encoder. Next, we match the two encoded sentences in two directions $P \rightarrow Q$ and $P \leftarrow Q$. In each matching direction, each time step of one sentence is matched against all time-steps of the other sentence from multiple perspectives. Then, another BiLSTM layer is utilized to aggregate the matching results into a fixed-length matching vector. Finally, based on the matching vector, the decision is made through a fully connected layer. We evaluate our model on three tasks: paraphrase identification, natural language inference and answer sentence selection. Experimental results on standard benchmark datasets show that our model achieves state-of-the-art performance on all tasks.
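
At the heart of each matching direction is a multi-perspective cosine match between two hidden vectors, where every perspective reweights the dimensions with its own learnable vector. A minimal PyTorch sketch of that single operation follows; the number of perspectives and hidden size are illustrative, and the full model adds four matching strategies plus the aggregation BiLSTM:

    import torch
    import torch.nn.functional as F

    class MultiPerspectiveMatch(torch.nn.Module):
        def __init__(self, hidden_size, num_perspectives=20):
            super().__init__()
            # one learnable reweighting vector per perspective
            self.W = torch.nn.Parameter(torch.randn(num_perspectives, hidden_size))

        def forward(self, v1, v2):
            # v1, v2: (batch, hidden) time-step vectors from the two encoded sentences
            a = v1.unsqueeze(1) * self.W                # (batch, perspectives, hidden)
            b = v2.unsqueeze(1) * self.W
            return F.cosine_similarity(a, b, dim=-1)    # (batch, perspectives)

    match = MultiPerspectiveMatch(hidden_size=100)
    h_p, h_q = torch.randn(4, 100), torch.randn(4, 100)
    print(match(h_p, h_q).shape)                        # torch.Size([4, 20])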

563 citations


Journal ArticleDOI
TL;DR: In this update to their original paper, the authors discuss how Amazon's recommendations, which help customers discover items they might otherwise not have found, have changed as Amazon has grown.
Abstract: Amazon is well-known for personalization and recommendations, which help customers discover items they might otherwise not have found. In this update to their original paper, the authors discuss some of the changes as Amazon has grown.

439 citations


Posted Content
TL;DR: This work proposes a bilateral multi-perspective matching (BiMPM) model under the "matching-aggregation" framework that achieves the state-of-the-art performance on all tasks.
Abstract: Natural language sentence matching is a fundamental technology for a variety of tasks. Previous approaches either match sentences from a single direction or only apply single granular (word-by-word or sentence-by-sentence) matching. In this work, we propose a bilateral multi-perspective matching (BiMPM) model under the "matching-aggregation" framework. Given two sentences $P$ and $Q$, our model first encodes them with a BiLSTM encoder. Next, we match the two encoded sentences in two directions $P \rightarrow Q$ and $P \leftarrow Q$. In each matching direction, each time step of one sentence is matched against all time-steps of the other sentence from multiple perspectives. Then, another BiLSTM layer is utilized to aggregate the matching results into a fixed-length matching vector. Finally, based on the matching vector, the decision is made through a fully connected layer. We evaluate our model on three tasks: paraphrase identification, natural language inference and answer sentence selection. Experimental results on standard benchmark datasets show that our model achieves state-of-the-art performance on all tasks.

427 citations


Proceedings Article
01 Jan 2017
TL;DR: This work proposes a transfer framework for the scenario where the reward function changes between tasks but the environment's dynamics remain the same, and derives two theorems that set the approach in firm theoretical ground and presents experiments that show that it successfully promotes transfer in practice.
Abstract: Transfer in reinforcement learning refers to the notion that generalization should occur not only within a task but also across tasks. We propose a transfer framework for the scenario where the reward function changes between tasks but the environment's dynamics remain the same. Our approach rests on two key ideas: "successor features", a value function representation that decouples the dynamics of the environment from the rewards, and "generalized policy improvement", a generalization of dynamic programming's policy improvement operation that considers a set of policies rather than a single one. Put together, the two ideas lead to an approach that integrates seamlessly within the reinforcement learning framework and allows the free exchange of information across tasks. The proposed method also provides performance guarantees for the transferred policy even before any learning has taken place. We derive two theorems that set our approach in firm theoretical ground and present experiments that show that it successfully promotes transfer in practice, significantly outperforming alternative methods in a sequence of navigation tasks and in the control of a simulated robotic arm.
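
A compact way to state the two ideas, following the abstract's setup and assuming rewards factor through features $\phi$ as $r(s,a,s') = \phi(s,a,s')^\top w$ (this is a notational sketch, not a substitute for the paper's precise statements or its two theorems): the successor features of a policy $\pi$ are

    $\psi^{\pi}(s,a) = \mathbb{E}^{\pi}\big[\sum_{t=0}^{\infty} \gamma^{t}\, \phi(s_t,a_t,s_{t+1}) \mid s_0=s,\ a_0=a\big]$,

so its value on any new task $w'$ reduces to a dot product, $Q^{\pi}_{w'}(s,a) = \psi^{\pi}(s,a)^{\top} w'$, and generalized policy improvement then acts greedily over a set of previously learned policies $\pi_1,\dots,\pi_n$:

    $\pi_{\mathrm{GPI}}(s) \in \arg\max_{a} \max_{i} Q^{\pi_i}_{w'}(s,a)$.

The successor features isolate the environment's dynamics from the reward, which is what allows information learned on one task to be re-priced for another before any new learning takes place.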

341 citations


Journal ArticleDOI
TL;DR: The authors put forth a probabilistic framework based on Gaussian process regression and nonlinear autoregressive schemes that can learn complex nonlinear and space-dependent cross-correlations between models of variable fidelity, and can effectively safeguard against low-fidelity models that provide wrong trends.
Abstract: Multi-fidelity modelling enables accurate inference of quantities of interest by synergistically combining realizations of low-cost/low-fidelity models with a small set of high-fidelity observations. This is particularly effective when the low- and high-fidelity models exhibit strong correlations, and can lead to significant computational gains over approaches that solely rely on high-fidelity models. However, in many cases of practical interest, low-fidelity models can only be well correlated to their high-fidelity counterparts for a specific range of input parameters, and potentially return wrong trends and erroneous predictions if probed outside of their validity regime. Here we put forth a probabilistic framework based on Gaussian process regression and nonlinear autoregressive schemes that is capable of learning complex nonlinear and space-dependent cross-correlations between models of variable fidelity, and can effectively safeguard against low-fidelity models that provide wrong trends. This introduces a new class of multi-fidelity information fusion algorithms that provide a fundamental extension to the existing linear autoregressive methodologies, while still maintaining the same algorithmic complexity and overall computational cost. The performance of the proposed methods is tested in several benchmark problems involving both synthetic and real multi-fidelity datasets from computational fluid dynamics simulations.
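
A minimal sketch of the nonlinear autoregressive idea using scikit-learn follows; the toy functions and kernels are hypothetical stand-ins, and the paper's framework uses a more structured composite kernel and full posterior propagation. The high-fidelity GP is trained on the inputs augmented with the low-fidelity prediction, so it can learn a nonlinear, space-dependent cross-correlation:

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    f_lo = lambda x: np.sin(8 * np.pi * x)                    # cheap low-fidelity model
    f_hi = lambda x: (x - np.sqrt(2.0)) * f_lo(x) ** 2        # expensive high-fidelity model

    X_lo = np.linspace(0, 1, 50)[:, None]                     # many cheap runs
    X_hi = np.linspace(0, 1, 8)[:, None]                      # few expensive runs

    gp_lo = GaussianProcessRegressor(kernel=RBF()).fit(X_lo, f_lo(X_lo).ravel())

    # Nonlinear autoregression: condition the high-fidelity GP on (x, f_lo(x))
    aug = np.hstack([X_hi, gp_lo.predict(X_hi)[:, None]])
    gp_hi = GaussianProcessRegressor(kernel=RBF([1.0, 1.0])).fit(aug, f_hi(X_hi).ravel())

    X_test = np.linspace(0, 1, 5)[:, None]
    aug_test = np.hstack([X_test, gp_lo.predict(X_test)[:, None]])
    print(gp_hi.predict(aug_test))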

259 citations


Proceedings Article
08 Feb 2017
TL;DR: This research demonstrates that neural networks can be robust classifiers for brain signals, even outperforming traditional learning techniques.
Abstract: Emotion recognition is an important field of research in Brain Computer Interactions. As technology and the understanding of emotions are advancing, there are growing opportunities for automatic emotion recognition systems. Neural networks are a family of statistical learning models inspired by biological neural networks and are used to estimate functions that can depend on a large number of inputs that are generally unknown. In this paper we seek to use this effectiveness of Neural Networks to classify user emotions using EEG signals from the DEAP (Koelstra et al (2012)) dataset which represents the benchmark for Emotion classification research. We explore 2 different Neural Models, a simple Deep Neural Network and a Convolutional Neural Network for classification. Our model provides the state-of-the-art classification accuracy, obtaining 4.51 and 4.96 percentage point improvements over (Rozgic et al (2013)) classification of Valence and Arousal into 2 classes (High and Low) and 13.39 and 6.58 percentage point improvements over (Chung and Yoon(2012)) classification of Valence and Arousal into 3 classes (High, Normal and Low). Moreover our research is a testament that Neural Networks could be robust classifiers for brain signals, even outperforming traditional learning techniques.

239 citations


Proceedings Article
06 Aug 2017
TL;DR: It is shown that computational time can be dramatically reduced by exploiting the fact that many examples can be correctly classified using relatively efficient networks and that complex, computationally costly networks are only necessary for a small fraction of examples.
Abstract: We present an approach to adaptively utilize deep neural networks in order to reduce the evaluation time on new examples without loss of accuracy. Rather than attempting to redesign or approximate existing networks, we propose two schemes that adaptively utilize networks. We first pose an adaptive network evaluation scheme, where we learn a system to adaptively choose the components of a deep network to be evaluated for each example. By allowing examples correctly classified using early layers of the system to exit, we avoid the computational time associated with full evaluation of the network. We extend this to learn a network selection system that adaptively selects the network to be evaluated for each example. We show that computational time can be dramatically reduced by exploiting the fact that many examples can be correctly classified using relatively efficient networks and that complex, computationally costly networks are only necessary for a small fraction of examples. We pose a global objective for learning an adaptive early exit or network selection policy and solve it by reducing the policy learning problem to a layer-by-layer weighted binary classification problem. Empirically, these approaches yield dramatic reductions in computational cost, with up to a 2.8x speedup on state-of-the-art networks from the ImageNet image recognition challenge with minimal (< 1%) loss of top5 accuracy.
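
The control flow of the adaptive evaluation is simple; the hard part in the paper is learning the exit and network-selection policy. The sketch below replaces that learned policy with a fixed softmax-confidence threshold purely to illustrate the idea, and the two stand-in networks are hypothetical:

    import torch
    import torch.nn.functional as F

    def adaptive_predict(x, cheap_net, expensive_net, threshold=0.9):
        """Evaluate the cheap network first; fall back to the costly one only when confidence is low."""
        with torch.no_grad():
            probs = F.softmax(cheap_net(x), dim=-1)
            conf, pred = probs.max(dim=-1)
            if conf.item() >= threshold:            # confident -> exit early, skip the big network
                return pred.item(), "cheap"
            probs = F.softmax(expensive_net(x), dim=-1)
            return probs.argmax(dim=-1).item(), "expensive"

    # Hypothetical stand-ins for a small and a large classifier over 10 classes.
    cheap = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
    expensive = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 256),
                                    torch.nn.ReLU(), torch.nn.Linear(256, 10))
    x = torch.randn(1, 3, 32, 32)
    print(adaptive_predict(x, cheap, expensive))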

235 citations


Journal ArticleDOI
Martin J. P. Sullivan1, Joey Talbot1, Simon L. Lewis1, Simon L. Lewis2, Oliver L. Phillips1, Lan Qie1, Serge K. Begne3, Serge K. Begne1, Jérôme Chave4, Aida Cuni-Sanchez2, Wannes Hubau1, Gabriela Lopez-Gonzalez1, Lera Miles5, Abel Monteagudo-Mendoza6, Bonaventure Sonké3, Terry Sunderland7, Terry Sunderland8, Hans ter Steege9, Hans ter Steege10, Lee J. T. White11, Kofi Affum-Baffoe12, Shin-ichiro Aiba13, Everton Cristo de Almeida14, Edmar Almeida de Oliveira15, Patricia Alvarez-Loayza16, Esteban Alvarez Dávila, Ana Andrade17, Luiz E. O. C. Aragão18, Peter S. Ashton19, Gerardo A. Aymard C, Timothy R. Baker1, Michael Balinga8, Lindsay F. Banin, Christopher Baraloto20, Jean-François Bastin, Nicholas J. Berry21, Jan Bogaert22, Damien Bonal23, Frans Bongers24, Roel J. W. Brienen1, José Luís Camargo17, Carlos Cerón25, Victor Chama Moscoso6, Eric Chezeaux, Connie J. Clark16, Alvaro Cogollo Pacheco, James A. Comiskey26, James A. Comiskey27, Fernando Cornejo Valverde28, Eurídice N. Honorio Coronado28, Greta C. Dargie1, Stuart J. Davies29, Charles De Cannière30, Marie Noel Djuikouo K.31, Jean-Louis Doucet22, Terry L. Erwin26, Javier Silva Espejo6, Corneille E. N. Ewango32, Sophie Fauset1, Sophie Fauset33, Ted R. Feldpausch18, Rafael Herrera34, Rafael Herrera35, Martin Gilpin1, Emanuel Gloor1, Jefferson S. Hall29, David Harris36, Terese B. Hart37, Kuswata Kartawinata38, Lip Khoon Kho39, Kanehiro Kitayama40, Susan G. Laurance7, William F. Laurance7, Miguel E. Leal32, Thomas E. Lovejoy41, Jon C. Lovett1, Faustin Mpanya Lukasu42, Jean-Remy Makana32, Yadvinder Malhi43, Leandro Maracahipes44, Beatriz Schwantes Marimon15, Ben Hur Marimon Junior15, Andrew R. Marshall45, Paulo S. Morandi15, John Tshibamba Mukendi42, Jaques Mukinzi32, Reuben Nilus, Percy Núñez Vargas6, Nadir Pallqui Camacho6, Guido Pardo, Marielos Peña-Claros24, Pascal Petronelli, Georgia Pickavance1, Axel Dalberg Poulsen37, John R. Poulsen16, Richard B. Primack46, H. Priyadi47, H. Priyadi8, Carlos A. Quesada17, Jan Reitsma, Maxime Réjou-Méchain4, Zorayda Restrepo, Ervan Rutishauser, Kamariah Abu Salim48, Rafael de Paiva Salomão49, Ismayadi Samsoedin50, Douglas Sheil51, Douglas Sheil8, Rodrigo Sierra, Marcos Silveira52, J. W. Ferry Slik, Lisa Steel53, Hermann Taedoumg3, Sylvester Tan19, John Terborgh16, Sean C. Thomas54, Marisol Toledo, Peter M. Umunay55, Luis Valenzuela Gamarra, Ima Célia Guimarães Vieira49, Vincent A. Vos, Ophelia Wang56, Simon Willcock57, Simon Willcock58, Lise Zemagho3 
University of Leeds1, University College London2, University of Yaoundé I3, Paul Sabatier University4, United Nations Environment Programme5, National University of Saint Anthony the Abbot in Cuzco6, James Cook University7, Center for International Forestry Research8, Naturalis9, Utrecht University10, University of Stirling11, Forestry Commission12, Kagoshima University13, Federal University of Western Pará14, Universidade do Estado de Mato Grosso15, Duke University16, National Institute of Amazonian Research17, University of Exeter18, Harvard University19, Florida International University20, University of Edinburgh21, Gembloux Agro-Bio Tech22, Institut national de la recherche agronomique23, Wageningen University and Research Centre24, Central University of Ecuador25, Smithsonian Institution26, National Park Service27, Amazon.com28, Smithsonian Tropical Research Institute29, Université libre de Bruxelles30, University of Buea31, Wildlife Conservation Society32, State University of Campinas33, Venezuelan Institute for Scientific Research34, University of Vienna35, Royal Botanic Garden Edinburgh36, American Museum of Natural History37, Indonesian Institute of Sciences38, Malaysian Palm Oil Board39, Kyoto University40, George Mason University41, University of Kisangani42, University of Oxford43, Universidade Federal de Goiás44, University of York45, Boston University46, Swedish University of Agricultural Sciences47, Universiti Brunei Darussalam48, Museu Paraense Emílio Goeldi49, Ministry of Forestry50, Norwegian University of Life Sciences51, Universidade Federal do Acre52, World Wide Fund for Nature53, University of Toronto54, Yale University55, Northern Arizona University56, Bangor University57, University of Southampton58
TL;DR: In this article, the authors compile a pan-tropical dataset of 360 plots located in structurally intact old-growth closed-canopy forest, surveyed using standardised methods, allowing a multi-scale evaluation of diversity-carbon relationships in tropical forests.
Abstract: Tropical forests are global centres of biodiversity and carbon storage. Many tropical countries aspire to protect forest to fulfil biodiversity and climate mitigation policy targets, but the conservation strategies needed to achieve these two functions depend critically on the tropical forest tree diversity-carbon storage relationship. Assessing this relationship is challenging due to the scarcity of inventories where carbon stocks in aboveground biomass and species identifications have been simultaneously and robustly quantified. Here, we compile a unique pan-tropical dataset of 360 plots located in structurally intact old-growth closed-canopy forest, surveyed using standardised methods, allowing a multi-scale evaluation of diversity-carbon relationships in tropical forests. Diversity-carbon relationships among all plots at 1 ha scale across the tropics are absent, and within continents are either weak (Asia) or absent (Amazonia, Africa). A weak positive relationship is detectable within 1 ha plots, indicating that diversity effects in tropical forests may be scale dependent. The absence of clear diversity-carbon relationships at scales relevant to conservation planning means that carbon-centred conservation strategies will inevitably miss many high diversity ecosystems. As tropical forests can have any combination of tree diversity and carbon stocks both require explicit consideration when optimising policies to manage tropical carbon and biodiversity.

222 citations


Proceedings Article
01 Jan 2017
TL;DR: This paper presents the architecture of Peloton, the first self-driving DBMS, which enables new optimizations that are important for modern high-performance DBMSs but are not possible today because the complexity of managing these systems has surpassed the abilities of human experts.
Abstract: In the last two decades, both researchers and vendors have built advisory tools to assist database administrators (DBAs) in various aspects of system tuning and physical design. Most of this previous work, however, is incomplete because these tools still require humans to make the final decisions about any changes to the database and are reactionary measures that fix problems after they occur. What is needed for a truly “self-driving” database management system (DBMS) is a new architecture that is designed for autonomous operation. This is different than earlier attempts because all aspects of the system are controlled by an integrated planning component that not only optimizes the system for the current workload, but also predicts future workload trends so that the system can prepare itself accordingly. With this, the DBMS can support all of the previous tuning techniques without requiring a human to determine the right way and proper time to deploy them. It also enables new optimizations that are important for modern high-performance DBMSs, but which are not possible today because the complexity of managing these systems has surpassed the abilities of human experts. This paper presents the architecture of Peloton, the first self-driving DBMS. Peloton’s autonomic capabilities are now possible due to algorithmic advancements in deep learning, as well as improvements in hardware and adaptive database architectures.

Proceedings ArticleDOI
21 Jul 2017
TL;DR: The proposed system, called DenseReg, estimates dense image-to-template correspondences in a fully convolutional manner; it provides useful correspondence information as a stand-alone system and, when used as an initialization for Statistical Deformable Models, yields landmark localization results that largely outperform the current state of the art on the challenging 300W benchmark.
Abstract: In this paper we propose to learn a mapping from image pixels into a dense template grid through a fully convolutional network. We formulate this task as a regression problem and train our network by leveraging upon manually annotated facial landmarks in-the-wild. We use such landmarks to establish a dense correspondence field between a three-dimensional object template and the input image, which then serves as the ground-truth for training our regression system. We show that we can combine ideas from semantic segmentation with regression networks, yielding a highly-accurate quantized regression architecture. Our system, called DenseReg, allows us to estimate dense image-to-template correspondences in a fully convolutional manner. As such our network can provide useful correspondence information as a stand-alone system, while when used as an initialization for Statistical Deformable Models we obtain landmark localization results that largely outperform the current state-of-the-art on the challenging 300W benchmark. We thoroughly evaluate our method on a host of facial analysis tasks, and demonstrate its use for other correspondence estimation tasks, such as the human body and the human ear. DenseReg code is made available at http://alpguler.com/DenseReg.html along with supplementary materials.

Proceedings Article
31 Aug 2017
TL;DR: The experiments show that transfer learning helps word-based translation only slightly, but when used on top of a much stronger BPE baseline, it yields larger improvements of up to 4.3 BLEU.
Abstract: We present a simple method to improve neural translation of a low-resource language pair using parallel data from a related, also low-resource, language pair. The method is based on the transfer method of Zoph et al., but whereas their method ignores any source vocabulary overlap, ours exploits it. First, we split words using Byte Pair Encoding (BPE) to increase vocabulary overlap. Then, we train a model on the first language pair and transfer its parameters, including its source word embeddings, to another model and continue training on the second language pair. Our experiments show that transfer learning helps word-based translation only slightly, but when used on top of a much stronger BPE baseline, it yields larger improvements of up to 4.3 BLEU.
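
A minimal sketch of the parameter-transfer step in PyTorch follows; the model components and sizes are hypothetical, and the point is only that with a joint BPE vocabulary the parent's source embeddings can be carried over directly rather than re-initialised:

    import torch

    def make_nmt_model(vocab_size=8000, dim=256):
        # hypothetical stand-in for an encoder-decoder translation model with a joint BPE vocabulary
        return torch.nn.ModuleDict({
            "src_embed": torch.nn.Embedding(vocab_size, dim),
            "encoder": torch.nn.LSTM(dim, dim, batch_first=True),
            "decoder": torch.nn.LSTM(dim, dim, batch_first=True),
        })

    parent = make_nmt_model()
    # ... train `parent` on the related low-resource language pair here ...

    child = make_nmt_model()
    child.load_state_dict(parent.state_dict())   # transfer all parameters, including src_embed
    # ... continue training `child` on the target low-resource language pair ...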

Proceedings ArticleDOI
01 Aug 2017
TL;DR: In this article, the authors combine deep learning with active learning and show that this combination can outperform classical methods even with a significantly smaller amount of training data, without requiring large public datasets or a large budget for manually labeling data.
Abstract: Deep neural networks have advanced the state of the art in named entity recognition. However, under typical training procedures, advantages over classical methods emerge only with large datasets. As a result, deep learning is employed only when large public datasets or a large budget for manually labeling data is available. In this work, we show otherwise: by combining deep learning with active learning, we can outperform classical methods even with a significantly smaller amount of training data.

Proceedings ArticleDOI
09 May 2017
TL;DR: This paper describes the architecture of Aurora and the design considerations leading to that architecture, and describes how Aurora achieves consensus on durable state across numerous storage nodes using an efficient asynchronous scheme, avoiding expensive and chatty recovery protocols.
Abstract: Amazon Aurora is a relational database service for OLTP workloads offered as part of Amazon Web Services (AWS). In this paper, we describe the architecture of Aurora and the design considerations leading to that architecture. We believe the central constraint in high throughput data processing has moved from compute and storage to the network. Aurora brings a novel architecture to the relational database to address this constraint, most notably by pushing redo processing to a multi-tenant scale-out storage service, purpose-built for Aurora. We describe how doing so not only reduces network traffic, but also allows for fast crash recovery, failovers to replicas without loss of data, and fault-tolerant, self-healing storage. We then describe how Aurora achieves consensus on durable state across numerous storage nodes using an efficient asynchronous scheme, avoiding expensive and chatty recovery protocols. Finally, having operated Aurora as a production service for over 18 months, we share the lessons we have learnt from our customers on what modern cloud applications expect from databases.

Proceedings ArticleDOI
01 Oct 2017
TL;DR: In this paper, the authors propose to factorize the convolutional layer into a low-cost intra-channel spatial convolution and a linear channel projection, which effectively preserves spatial information and maintains accuracy with significantly less computation.
Abstract: In this paper, we propose to factorize the convolutional layer to reduce its computation. The 3D convolution operation in a convolutional layer can be considered as performing spatial convolution in each channel and linear projection across channels simultaneously. By unravelling them and arranging the spatial convolutions sequentially, the proposed layer is composed of a low-cost single intra-channel convolution and a linear channel projection. When combined with residual connection, it can effectively preserve the spatial information and maintain the accuracy with significantly less computation. We also introduce a topological subdivisioning to reduce the connection between the input and output channels. Our experiments demonstrate that the proposed layers outperform the standard convolutional layers on performance/complexity ratio. Our models achieve similar performance to VGG-16, ResNet-34, ResNet-50, and ResNet-101 while requiring 42x, 7.32x, 4.38x, and 5.85x less computation, respectively.
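
A minimal PyTorch sketch of the basic factorization follows: a per-channel spatial convolution followed by a 1x1 linear channel projection, wrapped in a residual connection. The paper's topological subdivisioning and its sequential arrangement of several spatial convolutions are omitted here:

    import torch

    class FactorizedConv(torch.nn.Module):
        """Per-channel spatial convolution + 1x1 channel projection, with a residual connection."""
        def __init__(self, channels, kernel_size=3):
            super().__init__()
            # intra-channel spatial convolution: groups=channels means no mixing across channels
            self.spatial = torch.nn.Conv2d(channels, channels, kernel_size,
                                           padding=kernel_size // 2, groups=channels, bias=False)
            # linear projection across channels (1x1 convolution)
            self.project = torch.nn.Conv2d(channels, channels, kernel_size=1, bias=False)

        def forward(self, x):
            return x + self.project(self.spatial(x))   # residual connection preserves spatial information

    x = torch.randn(2, 64, 56, 56)
    print(FactorizedConv(64)(x).shape)                  # torch.Size([2, 64, 56, 56])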

Posted Content
TL;DR: The authors propose a unified deep learning architecture and an end-to-end variational learning algorithm that can handle noise in questions and learn multi-hop reasoning simultaneously, achieving state-of-the-art performance on a recent benchmark dataset in the literature.
Abstract: Knowledge graph (KG) is known to be helpful for the task of question answering (QA), since it provides well-structured relational information between entities, and allows one to further infer indirect facts. However, it is challenging to build QA systems which can learn to reason over knowledge graphs based on question-answer pairs alone. First, when people ask questions, their expressions are noisy (for example, typos in texts, or variations in pronunciations), which is non-trivial for the QA system to match those mentioned entities to the knowledge graph. Second, many questions require multi-hop logic reasoning over the knowledge graph to retrieve the answers. To address these challenges, we propose a novel and unified deep learning architecture, and an end-to-end variational learning algorithm which can handle noise in questions, and learn multi-hop reasoning simultaneously. Our method achieves state-of-the-art performance on a recent benchmark dataset in the literature. We also derive a series of new benchmark datasets, including questions for multi-hop reasoning, questions paraphrased by neural translation model, and questions in human voice. Our method yields very promising results on all these challenging datasets.

Journal ArticleDOI
01 Oct 2017
TL;DR: The authors propose a hybrid multi-party computation protocol that combines Yao’s garbled circuits with tailored protocols for computing inner products, together with a Conjugate Gradient Descent algorithm suitable for secure computation because it uses an efficient fixed-point representation of real numbers while maintaining accuracy and convergence rates comparable to a classical solution using floating-point numbers.
Abstract: We propose privacy-preserving protocols for computing linear regression models, in the setting where the training dataset is vertically distributed among several parties. Our main contribution is a hybrid multi-party computation protocol that combines Yao’s garbled circuits with tailored protocols for computing inner products. Like many machine learning tasks, building a linear regression model involves solving a system of linear equations. We conduct a comprehensive evaluation and comparison of different techniques for securely performing this task, including a new Conjugate Gradient Descent (CGD) algorithm. This algorithm is suitable for secure computation because it uses an efficient fixed-point representation of real numbers while maintaining accuracy and convergence rates comparable to what can be obtained with a classical solution using floating point numbers. Our technique improves on Nikolaenko et al.’s method for privacy-preserving ridge regression (S&P 2013), and can be used as a building block in other analyses. We implement a complete system and demonstrate that our approach is highly scalable, solving data analysis problems with one million records and one hundred features in less than one hour of total running time.
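
The fixed-point representation mentioned above is the ingredient that makes iterative solvers such as CGD friendly to circuits that operate on integers. The sketch below illustrates only that representation; the number of fractional bits is an illustrative choice, and none of the cryptographic machinery (garbled circuits, secure inner products) is shown:

    import numpy as np

    FRAC_BITS = 16                                   # number of fractional bits (illustrative choice)
    SCALE = 1 << FRAC_BITS

    def to_fixed(x):
        return np.round(np.asarray(x) * SCALE).astype(np.int64)

    def from_fixed(q):
        return q / SCALE

    def fixed_mul(a, b):
        return (a * b) >> FRAC_BITS                  # rescale after integer multiplication

    a, b = to_fixed(3.14159), to_fixed(-0.5)
    print(from_fixed(fixed_mul(a, b)))               # approximately -1.5708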

Journal ArticleDOI
TL;DR: In this paper, the authors propose an offline algorithm that solves for the optimal configuration in a specific look-ahead time-window, and an online approximation algorithm with polynomial time-complexity to find the placement in real-time whenever an instance arrives.
Abstract: Mobile micro-clouds are promising for enabling performance-critical cloud applications. However, one challenge therein is the dynamics at the network edge. In this paper, we study how to place service instances to cope with these dynamics, where multiple users and service instances coexist in the system. Our goal is to find the optimal placement (configuration) of instances to minimize the average cost over time, leveraging the ability of predicting future cost parameters with known accuracy. We first propose an offline algorithm that solves for the optimal configuration in a specific look-ahead time-window. Then, we propose an online approximation algorithm with polynomial time-complexity to find the placement in real-time whenever an instance arrives. We analytically show that the online algorithm is $O(1)$-competitive for a broad family of cost functions. Afterwards, the impact of prediction errors is considered and a method for finding the optimal look-ahead window size is proposed, which minimizes an upper bound of the average actual cost. The effectiveness of the proposed approach is evaluated by simulations with both synthetic and real-world (San Francisco taxi) user-mobility traces. The theoretical methodology used in this paper can potentially be applied to a larger class of dynamic resource allocation problems.

Proceedings ArticleDOI
02 Feb 2017
TL;DR: This work presents a model based on Long Short-Term Memory to estimate when a user will return to a site and what their future listening behavior will be, and shows that the resulting multitask problem can be solved accurately when applied to two real-world datasets.
Abstract: The ability to predict future user activity is invaluable when it comes to content recommendation and personalization. For instance, knowing when users will return to an online music service and what they will listen to increases user satisfaction and therefore user retention. We present a model based on Long Short-Term Memory to estimate when a user will return to a site and what their future listening behavior will be. In doing so, we aim to solve the problem of Just-In-Time recommendation, that is, to recommend the right items at the right time. We use tools from survival analysis for return time prediction and exponential families for future activity analysis. We show that the resulting multitask problem can be solved accurately when applied to two real-world datasets.

Posted Content
TL;DR: This article introduces a CoMatch layer that learns to match second-order feature statistics with target styles, and builds on it a multi-style generative network (MSG-Net) that achieves real-time brush-size control in a purely feed-forward manner for style transfer.
Abstract: Despite the rapid progress in style transfer, existing approaches using feed-forward generative networks for multi-style or arbitrary-style transfer usually compromise image quality and model flexibility. We find it is fundamentally difficult to achieve comprehensive style modeling using 1-dimensional style embedding. Motivated by this, we introduce a CoMatch Layer that learns to match the second-order feature statistics with the target styles. With the CoMatch Layer, we build a Multi-style Generative Network (MSG-Net), which achieves real-time performance. We also employ a specific strategy of upsampled convolution which avoids checkerboard artifacts caused by fractionally-strided convolution. Our method achieves superior image quality compared to state-of-the-art approaches. The proposed MSG-Net, as a general approach for real-time style transfer, is compatible with most existing techniques including content-style interpolation, color-preserving, spatial control and brush stroke size control. MSG-Net is the first to achieve real-time brush-size control in a purely feed-forward manner for style transfer. Our implementations and pre-trained models for Torch, PyTorch and MXNet frameworks will be publicly available.
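
The "second-order feature statistics" referred to above are Gram matrices of convolutional activations. The sketch below shows only that statistic and a simple matching loss; the CoMatch layer itself learns to match it inside a feed-forward generator rather than by iterative optimisation, and the exact layer is not reproduced here:

    import torch
    import torch.nn.functional as F

    def gram_matrix(features):
        """Second-order feature statistics of a convolutional activation map."""
        b, c, h, w = features.shape
        f = features.view(b, c, h * w)
        return f @ f.transpose(1, 2) / (c * h * w)    # (b, c, c) Gram matrix

    # Toy check: content features whose Gram matrix should move towards the style target's.
    content = torch.randn(1, 64, 32, 32, requires_grad=True)
    style = torch.randn(1, 64, 32, 32)
    loss = F.mse_loss(gram_matrix(content), gram_matrix(style))
    loss.backward()
    print(loss.item())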

Journal ArticleDOI
23 Aug 2017-PLOS ONE
TL;DR: Together, these six dams are predicted to reduce the supply of sediments, phosphorus and nitrogen from the Andean region and to the entire Amazon basin by 64, 51 and 23%, respectively, which will have major impacts on channel geomorphology, floodplain fertility and aquatic productivity.
Abstract: Increased energy demand has led to plans for building many new dams in the western Amazon, mostly in the Andean region. Historical data and mechanistic scenarios are used to examine potential impacts above and below six of the largest dams planned for the region, including reductions in downstream sediment and nutrient supplies, changes in downstream flood pulse, changes in upstream and downstream fish yields, reservoir siltation, greenhouse gas emissions and mercury contamination. Together, these six dams are predicted to reduce the supply of sediments, phosphorus and nitrogen from the Andean region by 69, 67 and 57% and to the entire Amazon basin by 64, 51 and 23%, respectively. These large reductions in sediment and nutrient supplies will have major impacts on channel geomorphology, floodplain fertility and aquatic productivity. These effects will be greatest near the dams and extend to the lowland floodplains. Attenuation of the downstream flood pulse is expected to alter the survival, phenology and growth of floodplain vegetation and reduce fish yields below the dams. Reservoir filling times due to siltation are predicted to vary from 106 to 6240 years, affecting the storage performance of some dams. Total CO2-equivalent carbon emission from 4 Andean dams was expected to average 10 Tg y⁻¹ during the first 30 years of operation, resulting in a megawatt-weighted carbon emission factor of 0.139 tons C MWhr⁻¹. Mercury contamination in fish and local human populations is expected to increase both above and below the dams, creating significant health risks. Reservoir fish yields will compensate some downstream losses, but increased mercury contamination could offset these benefits.

Journal ArticleDOI
Adriane Esquivel-Muelbert1, Timothy R. Baker1, Kyle G. Dexter2, Simon L. Lewis3, Simon L. Lewis1, Hans ter Steege4, Gabriela Lopez-Gonzalez1, Abel Monteagudo Mendoza, Roel J. W. Brienen1, Ted R. Feldpausch5, Nigel C. A. Pitman6, Alfonso Alonso7, Geertje M. F. van der Heijden8, Marielos Peña-Claros9, Manuel Ahuite, Miguel Alexiaides10, Esteban Alvarez Dávila, Alejandro Araujo Murakami, Luzmila Arroyo, Milton Aulestia, Henrik Balslev11, Jorcely Barroso, René G. A. Boot12, Ángela Cano, Victor Chama Moscoso, James A. Comiskey13, Fernando Cornejo14, Francisco Dallmeier7, Douglas C. Daly15, Nállarett Dávila, Joost F. Duivenvoorden16, Álvaro Javier Duque Montoya, Terry L. Erwin17, Anthony Di Fiore18, Todd S. Fredericksen, Alfredo F. Fuentes19, Roosevelt García-Villacorta2, Therany Gonzales, Juan Ernesto Guevara Andino20, Eurídice N. Honorio Coronado, Isau Huamantupa-Chuquimaco, Rojas Eliana Maria Jiménez, Timothy J. Killeen21, Yadvinder Malhi, Casimiro Mendoza22, Hugo Mogollón, Peter M. Jørgensen19, Juan Carlos Montero23, Bonifacio Mostacedo24, William Nauray25, David A. Neill, Percy Núñez Vargas25, Sonia Palacios, Walter Palacios Cuenca, Nadir Pallqui Camacho25, Julie Peacock1, Juan Fernando Phillips, Georgia Pickavance1, Carlos A. Quesada, Hirma Ramírez-Angulo, Zorayda Restrepo, Carlos Reynel Rodriguez26, Marcos Ríos Paredes, Maria Cristina Peñuela-Mora, Rodrigo Sierra, Marcos Silveira, Pablo R. Stevenson, Juliana Stropp27, John Terborgh6, Milton Tirado18, Marisol Toledo, Armando Torres-Lezama, María Natalia Umaña28, Ligia E. Urrego, Rodolfo Vásquez Martínez, Luis Valenzuela Gamarra, César I.A. Vela, Emilio Vilanova Torre, Vincent A. Vos29, Patricio von Hildebrand, Corine Vriesendorp, Ophelia Wang30, Kenneth R. Young18, Charles E. Zartman, Oliver L. Phillips1 
TL;DR: It is found that the distributions of tree taxa are indeed nested along precipitation gradients in the western Neotropics, and the results suggest that the ‘dry tolerance’ hypothesis has broad applicability in the world's most species-rich forests.
Abstract: Within the tropics, the species richness of tree communities is strongly and positively associated with precipitation. Previous research has suggested that this macroecological pattern is driven by the negative effect of water-stress on the physiological processes of most tree species. This process implies that the range limits of taxa are defined by their ability to occur under dry conditions, and thus in terms of species distributions it predicts a nested pattern of taxa distribution from wet to dry areas. However, this ‘dry-tolerance’ hypothesis has yet to be adequately tested at large spatial and taxonomic scales. Here, using a dataset of 531 inventory plots of closed canopy forest distributed across the Western Neotropics we investigated how precipitation, evaluated both as mean annual precipitation and as the maximum climatological water deficit, influences the distribution of tropical tree species, genera and families. We find that the distributions of tree taxa are indeed nested along precipitation gradients in the western Neotropics. Taxa tolerant to seasonal drought are disproportionally widespread across the precipitation gradient, with most reaching even the wettest climates sampled; however, most taxa analysed are restricted to wet areas. Our results suggest that the ‘dry tolerance’ hypothesis has broad applicability in the world's most species-rich forests. In addition, the large number of species restricted to wetter conditions strongly indicates that an increased frequency of drought could severely threaten biodiversity in this region. Overall, this study establishes a baseline for exploring how tropical forest tree composition may change in response to current and future environmental changes in this region.

Proceedings ArticleDOI
Roy Bar-Haim1, Indrajit Bhattacharya1, Francesco Dinuzzo2, Amrita Saha1, Noam Slonim1 
01 Apr 2017
TL;DR: This work introduces the complementary task of Claim Stance Classification, along with the first benchmark dataset for this task, and describes an implementation of the authors' model, focusing on a novel algorithm for contrast detection.
Abstract: Recent work has addressed the problem of detecting relevant claims for a given controversial topic. We introduce the complementary task of Claim Stance Classification, along with the first benchmark dataset for this task. We decompose this problem into: (a) open-domain target identification for topic and claim (b) sentiment classification for each target, and (c) open-domain contrast detection between the topic and the claim targets. Manual annotation of the dataset confirms the applicability and validity of our model. We describe an implementation of our model, focusing on a novel algorithm for contrast detection. Our approach achieves promising results, and is shown to outperform several baselines, which represent the common practice of applying a single, monolithic classifier for stance classification.

Proceedings ArticleDOI
01 Sep 2017
TL;DR: Evaluations on POS datasets from 14 languages in the Universal Dependencies corpus show that the proposed transfer learning model improves the POS tagging performance of the target languages without exploiting any linguistic knowledge about the relation between the source language and the target language.
Abstract: Training a POS tagging model with crosslingual transfer learning usually requires linguistic knowledge and resources about the relation between the source language and the target language. In this paper, we introduce a cross-lingual transfer learning model for POS tagging without ancillary resources such as parallel corpora. The proposed cross-lingual model utilizes a common BLSTM that enables knowledge transfer from other languages, and private BLSTMs for language-specific representations. The cross-lingual model is trained with language-adversarial training and bidirectional language modeling as auxiliary objectives to better represent language-general information while not losing the information about a specific target language. Evaluating on POS datasets from 14 languages in the Universal Dependencies corpus, we show that the proposed transfer learning model improves the POS tagging performance of the target languages without exploiting any linguistic knowledge between the source language and the target language.
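
Below is a compact PyTorch sketch of the shared/private architecture with language-adversarial training via gradient reversal. Sizes, the tag inventory (17 classes, as in the UD universal POS tag set) and the language discriminator are illustrative, and the paper's auxiliary bidirectional language-modeling objective and training schedule are omitted:

    import torch

    class GradReverse(torch.autograd.Function):
        """Identity on the forward pass, negated gradient on the backward pass."""
        @staticmethod
        def forward(ctx, x, lambd):
            ctx.lambd = lambd
            return x.view_as(x)
        @staticmethod
        def backward(ctx, grad_output):
            return -ctx.lambd * grad_output, None

    class CrossLingualTagger(torch.nn.Module):
        def __init__(self, vocab, n_tags, n_langs, dim=64):
            super().__init__()
            self.embed = torch.nn.Embedding(vocab, dim)
            self.shared = torch.nn.LSTM(dim, dim, bidirectional=True, batch_first=True)
            self.private = torch.nn.LSTM(dim, dim, bidirectional=True, batch_first=True)
            self.tagger = torch.nn.Linear(4 * dim, n_tags)        # shared + private states
            self.lang_disc = torch.nn.Linear(2 * dim, n_langs)    # adversary on the shared states

        def forward(self, tokens, lambd=1.0):
            e = self.embed(tokens)
            s, _ = self.shared(e)
            p, _ = self.private(e)
            tag_logits = self.tagger(torch.cat([s, p], dim=-1))
            # language-adversarial branch: the reversed gradient pushes the shared BLSTM
            # towards language-general representations
            lang_logits = self.lang_disc(GradReverse.apply(s.mean(dim=1), lambd))
            return tag_logits, lang_logits

    model = CrossLingualTagger(vocab=1000, n_tags=17, n_langs=2)
    tags, langs = model(torch.randint(0, 1000, (3, 12)))
    print(tags.shape, langs.shape)    # torch.Size([3, 12, 17]) torch.Size([3, 2])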

Proceedings ArticleDOI
Zijun Yao1, Yifan Sun, Weicong Ding2, Nikhil Rao2, Hui Xiong1 
TL;DR: A dynamic statistical model is developed that simultaneously learns time-aware embeddings and solves the resulting alignment problem; it consistently outperforms state-of-the-art temporal embedding approaches on both semantic accuracy and alignment quality.
Abstract: Word evolution refers to the changing meanings and associations of words throughout time, as a byproduct of human language evolution. By studying word evolution, we can infer social trends and language constructs over different periods of human history. However, traditional techniques such as word representation learning do not adequately capture the evolving language structure and vocabulary. In this paper, we develop a dynamic statistical model to learn time-aware word vector representation. We propose a model that simultaneously learns time-aware embeddings and solves the resulting "alignment problem". This model is trained on a crawled NYTimes dataset. Additionally, we develop multiple intuitive evaluation strategies of temporal word embeddings. Our qualitative and quantitative tests indicate that our method not only reliably captures this evolution over time, but also consistently outperforms state-of-the-art temporal embedding approaches on both semantic accuracy and alignment quality.

Posted Content
TL;DR: By combining deep learning with active learning, the authors show that classical methods can be outperformed even with a significantly smaller amount of training data.
Abstract: Deep learning has yielded state-of-the-art performance on many natural language processing tasks including named entity recognition (NER). However, this typically requires large amounts of labeled data. In this work, we demonstrate that the amount of labeled training data can be drastically reduced when deep learning is combined with active learning. While active learning is sample-efficient, it can be computationally expensive since it requires iterative retraining. To speed this up, we introduce a lightweight architecture for NER, viz., the CNN-CNN-LSTM model consisting of convolutional character and word encoders and a long short term memory (LSTM) tag decoder. The model achieves nearly state-of-the-art performance on standard datasets for the task while being computationally much more efficient than best performing models. We carry out incremental active learning, during the training process, and are able to nearly match state-of-the-art performance with just 25% of the original training data.
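
The active-learning loop itself is model-agnostic. The sketch below uses a plain scikit-learn classifier and least-confidence sampling as a stand-in for the paper's CNN-CNN-LSTM tagger and its incremental retraining; the data, seed-set size and query size are hypothetical:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 20))
    y = (X[:, :2].sum(axis=1) > 0).astype(int)

    labeled = list(range(50))                        # small seed set of labeled examples
    pool = [i for i in range(len(X)) if i not in labeled]

    for round_ in range(5):
        clf = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
        probs = clf.predict_proba(X[pool])
        uncertainty = 1.0 - probs.max(axis=1)        # least-confidence acquisition
        query = np.argsort(uncertainty)[-25:]        # ask an oracle to label the 25 most uncertain
        labeled += [pool[i] for i in query]
        pool = [i for i in pool if i not in set(labeled)]
        print(f"round {round_}: {len(labeled)} labels, accuracy {clf.score(X, y):.3f}")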

Proceedings ArticleDOI
20 Aug 2017
TL;DR: This paper proposes to apply singular value decomposition (SVD) to further reduce TDNN complexity, and results show that the full-rank TDNN achieves a 19.7% DET AUC reduction compared to a similar-size deep neural network baseline.
Abstract: In this paper we investigate a time delay neural network (TDNN) for a keyword spotting task that requires low CPU, memory and latency. The TDNN is trained with transfer learning and multi-task learning. Temporal subsampling enabled by the time delay architecture reduces computational complexity. We propose to apply singular value decomposition (SVD) to further reduce TDNN complexity. This allows us to first train a larger full-rank TDNN model which is not limited by CPU/memory constraints. The larger TDNN usually achieves better performance. Afterwards, its size can be compressed by SVD to meet the budget requirements. Hidden Markov models (HMM) are used in conjunction with the networks to perform keyword detection and performance is measured in terms of area under the curve (AUC) for detection error tradeoff (DET) curves. Our experimental results on a large in-house far-field corpus show that the full-rank TDNN achieves a 19.7% DET AUC reduction compared to a similar-size deep neural network (DNN) baseline. If we train a larger size full-rank TDNN first and then reduce it via SVD to a size comparable to that of the DNN, we obtain a 37.6% reduction in DET AUC compared to the DNN baseline.
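
SVD-based compression of an affine layer amounts to replacing one weight matrix with two low-rank factors. A minimal NumPy sketch follows; the layer size and rank are illustrative, and in practice the compressed network is fine-tuned afterwards (trained TDNN weights are also far closer to low-rank than the random matrix used here):

    import numpy as np

    def svd_compress(W, rank):
        """Replace one weight matrix W (out x in) by two low-rank factors A (out x rank) and B (rank x in)."""
        U, s, Vt = np.linalg.svd(W, full_matrices=False)
        A = U[:, :rank] * s[:rank]        # absorb the singular values into the left factor
        B = Vt[:rank, :]
        return A, B

    W = np.random.randn(512, 1024)        # a hypothetical full-rank layer
    A, B = svd_compress(W, rank=64)
    x = np.random.randn(1024)
    err = np.linalg.norm(W @ x - A @ (B @ x)) / np.linalg.norm(W @ x)
    print(f"params kept: {(A.size + B.size) / W.size:.2f}, relative error: {err:.2f}")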

Proceedings ArticleDOI
26 Jul 2017
TL;DR: This paper proposes the first, to the best of the authors' knowledge, in-the-wild 3DMM by combining a powerful statistical model of facial shape, which describes both identity and expression, with an in-the-wild texture model, and presents the first 3D facial database captured under relatively unconstrained conditions.
Abstract: 3D Morphable Models (3DMMs) are powerful statistical models of 3D facial shape and texture, and among the state-of-the-art methods for reconstructing facial shape from single images. With the advent of new 3D sensors, many 3D facial datasets have been collected containing both neutral as well as expressive faces. However, all datasets are captured under controlled conditions. Thus, even though powerful 3D facial shape models can be learnt from such data, it is difficult to build statistical texture models that are sufficient to reconstruct faces captured in unconstrained conditions (in-the-wild). In this paper, we propose the first, to the best of our knowledge, in-the-wild 3DMM by combining a powerful statistical model of facial shape, which describes both identity and expression, with an in-the-wild texture model. We show that the employment of such an in-the-wild texture model greatly simplifies the fitting procedure, because there is no need to optimise with regards to the illumination parameters. Furthermore, we propose a new fast algorithm for fitting the 3DMM in arbitrary images. Finally, we have captured the first 3D facial database with relatively unconstrained conditions and report quantitative evaluations with state-of-the-art performance. Complementary qualitative reconstruction results are demonstrated on standard in-the-wild facial databases.

Journal ArticleDOI
TL;DR: In this paper, the authors used satellite data on cropland expansion, forest cover, and vegetation carbon stocks to estimate annual gross forest carbon emissions from cropland expansion in the Cerrado biome.
Abstract: Land use, land use change, and forestry accounted for two-thirds of Brazil's greenhouse gas emissions profile in 2005. Amazon deforestation has declined by more than 80% over the past decade, yet Brazil's forests extend beyond the Amazon biome. Rapid expansion of cropland in the neighboring Cerrado biome has the potential to undermine climate mitigation efforts if emissions from dry forest and woodland conversion negate some of the benefits of avoided Amazon deforestation. Here, we used satellite data on cropland expansion, forest cover, and vegetation carbon stocks to estimate annual gross forest carbon emissions from cropland expansion in the Cerrado biome. Nearly half of the Cerrado met Brazil's definition of forest cover in 2000 (≥0.5 ha with ≥10% canopy cover). In areas of established crop production, conversion of both forest and non-forest Cerrado formations for cropland declined during 2003-2013. However, forest carbon emissions from cropland expansion increased over the past decade in Matopiba, a new frontier of agricultural production that includes portions of Maranhão, Tocantins, Piauí, and Bahia states. Gross carbon emissions from cropland expansion in the Cerrado averaged 16.28 Tg C yr⁻¹ between 2003 and 2013, with forest-to-cropland conversion accounting for 29% of emissions. The fraction of forest carbon emissions from Matopiba was much higher; between 2010-2013, large-scale cropland conversion in Matopiba contributed 45% of total Cerrado forest carbon emissions. Carbon emissions from Cerrado-to-cropland transitions offset 5-7% of the avoided emissions from reduced Amazon deforestation rates during 2011-2013. Comprehensive national estimates of forest carbon fluxes, including all biomes, are critical to detect cross-biome leakage within countries and achieve climate mitigation targets to reduce emissions from land use, land use change, and forestry.