
Showing papers published by "IBM" in 2019


Journal ArticleDOI
TL;DR: In this paper, the authors consider the problem of learning model parameters from data distributed across multiple edge nodes, without sending raw data to a centralized place, and propose a control algorithm that determines the best tradeoff between local update and global parameter aggregation to minimize the loss function under a given resource budget.
Abstract: Emerging technologies and applications including Internet of Things, social networking, and crowd-sourcing generate large amounts of data at the network edge. Machine learning models are often built from the collected data, to enable the detection, classification, and prediction of future events. Due to bandwidth, storage, and privacy concerns, it is often impractical to send all the data to a centralized location. In this paper, we consider the problem of learning model parameters from data distributed across multiple edge nodes, without sending raw data to a centralized place. Our focus is on a generic class of machine learning models that are trained using gradient-descent-based approaches. We analyze the convergence bound of distributed gradient descent from a theoretical point of view, based on which we propose a control algorithm that determines the best tradeoff between local update and global parameter aggregation to minimize the loss function under a given resource budget. The performance of the proposed algorithm is evaluated via extensive experiments with real datasets, both on a networked prototype system and in a larger-scale simulated environment. The experimentation results show that our proposed approach performs near to the optimum with various machine learning models and different data distributions.
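
The local-update versus global-aggregation tradeoff that the control algorithm tunes can be sketched in a few lines. This is a minimal illustration, not the paper's algorithm: a hypothetical scalar model, a fixed number tau of local gradient steps between aggregations, and plain averaging across nodes.

```python
# Sketch of the local-update / global-aggregation loop (hypothetical scalar
# model): each edge node i holds data and minimizes f_i(w) = mean((w*x - y)^2);
# tau is the number of local steps between global aggregations -- the knob the
# paper's control algorithm tunes under a resource budget.

def local_sgd(w, data, tau, lr=0.05):
    """Run tau local gradient steps on one node's data."""
    for _ in range(tau):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def federated_gd(node_data, rounds, tau):
    w = 0.0                                   # global model parameter
    for _ in range(rounds):
        locals_ = [local_sgd(w, d, tau) for d in node_data]
        w = sum(locals_) / len(locals_)       # global aggregation (average)
    return w

# Two edge nodes whose data follow y = 2x with different samples.
nodes = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (0.5, 1.0)]]
w = federated_gd(nodes, rounds=50, tau=5)
```

A larger tau means fewer (costly) aggregations per unit of progress but more local drift; the paper's control algorithm picks tau to minimize the loss under the given resource budget.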

1,441 citations


Journal ArticleDOI
13 Mar 2019-Nature
TL;DR: In this article, two quantum algorithms for machine learning on a superconducting processor are proposed and experimentally implemented, using a variational quantum circuit to classify the data in a way similar to the method of conventional SVMs.
Abstract: Machine learning and quantum computing are two technologies that each have the potential to alter how computation is performed to address previously untenable problems. Kernel methods for machine learning are ubiquitous in pattern recognition, with support vector machines (SVMs) being the best known method for classification problems. However, there are limitations to the successful solution to such classification problems when the feature space becomes large, and the kernel functions become computationally expensive to estimate. A core element in the computational speed-ups enabled by quantum algorithms is the exploitation of an exponentially large quantum state space through controllable entanglement and interference. Here we propose and experimentally implement two quantum algorithms on a superconducting processor. A key component in both methods is the use of the quantum state space as feature space. The use of a quantum-enhanced feature space that is only efficiently accessible on a quantum computer provides a possible path to quantum advantage. The algorithms solve a problem of supervised learning: the construction of a classifier. One method, the quantum variational classifier, uses a variational quantum circuit1,2 to classify the data in a way similar to the method of conventional SVMs. The other method, a quantum kernel estimator, estimates the kernel function on the quantum computer and optimizes a classical SVM. The two methods provide tools for exploring the applications of noisy intermediate-scale quantum computers3 to machine learning.
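
The quantum kernel estimator reduces to a familiar recipe: estimate kernel entries as state overlaps, then hand the Gram matrix to a classical SVM. The sketch below simulates a hypothetical single-qubit feature map exactly; on hardware, the overlaps would be estimated by repeated measurement.

```python
import cmath

# Toy sketch of the "quantum kernel estimator" idea: a feature map encodes a
# data point x into a quantum state |phi(x)>, and the kernel is the overlap
# K(x, z) = |<phi(x)|phi(z)>|^2. The single-qubit map below is hypothetical,
# chosen only so the kernel can be computed exactly in a few lines.

def phi(x):
    """Single-qubit feature state (cos x, e^{i x^2} sin x), normalized."""
    return [cmath.cos(x), cmath.exp(1j * x * x) * cmath.sin(x)]

def quantum_kernel(x, z):
    overlap = sum(a.conjugate() * b for a, b in zip(phi(x), phi(z)))
    return abs(overlap) ** 2    # on hardware: estimated by repeated measurement

K = [[quantum_kernel(x, z) for z in (0.1, 0.5, 1.2)] for x in (0.1, 0.5, 1.2)]
# K can now be passed to any classical SVM as a precomputed kernel.
```

The quantum advantage argued for in the paper comes from feature maps whose overlaps are hard to compute classically, unlike this toy map.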

1,140 citations


Journal ArticleDOI
TL;DR: An overview of recent advances in physical reservoir computing, classified according to the type of reservoir, with the aim of expanding practical applications and developing next-generation machine learning systems.

959 citations


Book
03 Feb 2019
TL;DR: A “branch and bound” algorithm is presented for solving the traveling salesman problem, where the set of all tours (feasible solutions) is broken up into increasingly small subsets by a procedure called branching.
Abstract: A “branch and bound” algorithm is presented for solving the traveling salesman problem. The set of all tours (feasible solutions) is broken up into increasingly small subsets by a procedure called branching. For each subset a lower bound on the length of the tours therein is calculated. Eventually, a subset is found that contains a single tour whose length is less than or equal to some lower bound for every tour. The motivation of the branching and the calculation of the lower bounds are based on ideas frequently used in solving assignment problems. Computationally, the algorithm extends the size of problem that can reasonably be solved without using methods special to the particular problem.
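
The branching-and-bounding loop described above can be sketched as follows; the lower bound used here (every unvisited city must be left via at least its cheapest outgoing edge) is a simplification of the assignment-based bounds the algorithm actually uses.

```python
import math

# Minimal branch-and-bound sketch in the spirit of the algorithm: partial
# tours are extended city by city ("branching"), and a subset of tours is
# discarded when its lower bound meets or exceeds the best tour found so far.

def tsp_branch_and_bound(dist):
    n = len(dist)
    best = {"cost": math.inf, "tour": None}
    # Bound contribution: each unvisited city must be departed via at least
    # its cheapest outgoing edge.
    cheapest = [min(d for j, d in enumerate(row) if j != i)
                for i, row in enumerate(dist)]

    def branch(city, visited, cost, tour):
        bound = cost + sum(cheapest[c] for c in range(n) if c not in visited)
        if bound >= best["cost"]:
            return                                 # prune this subset of tours
        if len(visited) == n:
            total = cost + dist[city][0]           # close the tour
            if total < best["cost"]:
                best["cost"], best["tour"] = total, tour + [0]
            return
        for nxt in range(n):
            if nxt not in visited:
                branch(nxt, visited | {nxt}, cost + dist[city][nxt], tour + [nxt])

    branch(0, {0}, 0, [0])
    return best["cost"], best["tour"]

dist = [[0, 2, 9, 10],
        [1, 0, 6, 4],
        [15, 7, 0, 8],
        [6, 3, 12, 0]]
cost, tour = tsp_branch_and_bound(dist)   # optimal tour 0 -> 2 -> 3 -> 1 -> 0
```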

813 citations


Journal ArticleDOI
TL;DR: This paper provides a tutorial on fog computing and its related computing paradigms, including their similarities and differences, and provides a taxonomy of research topics in fog computing.

783 citations


Journal ArticleDOI
01 Mar 2019-Nature
TL;DR: This work applies the error mitigation protocol to mitigate errors in canonical single- and two-qubit experiments and extends its application to the variational optimization of Hamiltonians for quantum chemistry and magnetism.
Abstract: Quantum computation, a paradigm of computing that is completely different from classical methods, benefits from theoretically proved speed-ups for certain problems and can be used to study the properties of quantum systems1. Yet, because of the inherently fragile nature of the physical computing elements (qubits), achieving quantum advantages over classical computation requires extremely low error rates for qubit operations, as well as a substantial number of physical qubits, to realize fault tolerance via quantum error correction2,3. However, recent theoretical work4,5 has shown that the accuracy of computation (based on expectation values of quantum observables) can be enhanced through an extrapolation of results from a collection of experiments with varying noise. Here we demonstrate this error mitigation protocol on a superconducting quantum processor, enhancing its computational capability, with no additional hardware modifications. We apply the protocol to mitigate errors in canonical single- and two-qubit experiments and then extend its application to the variational optimization6–8 of Hamiltonians for quantum chemistry and magnetism9. We effectively demonstrate that the suppression of incoherent errors helps to achieve an otherwise inaccessible level of accuracy in the variational solutions using our noisy processor. These results demonstrate that error mitigation techniques will enable substantial improvements in the capabilities of near-term quantum computing hardware. The accuracy of computations on noisy, near-term quantum systems can be enhanced by extrapolating results from experiments with various noise levels, without requiring additional hardware modifications.

690 citations


Proceedings ArticleDOI
11 Nov 2019
TL;DR: This paper presents an alternative approach that utilizes both differential privacy and SMC to balance these trade-offs, reducing the growth of noise injection as the number of parties increases without sacrificing privacy while maintaining a pre-defined rate of trust.
Abstract: Federated learning facilitates the collaborative training of models without the sharing of raw data. However, recent attacks demonstrate that simply maintaining data locality during training processes does not provide sufficient privacy guarantees. Rather, we need a federated learning system capable of preventing inference over both the messages exchanged during training and the final trained model while ensuring the resulting model also has acceptable predictive accuracy. Existing federated learning approaches either use secure multiparty computation (SMC) which is vulnerable to inference or differential privacy which can lead to low accuracy given a large number of parties with relatively small amounts of data each. In this paper, we present an alternative approach that utilizes both differential privacy and SMC to balance these trade-offs. Combining differential privacy with secure multiparty computation enables us to reduce the growth of noise injection as the number of parties increases without sacrificing privacy while maintaining a pre-defined rate of trust. Our system is therefore a scalable approach that protects against inference threats and produces models with high accuracy. Additionally, our system can be used to train a variety of machine learning models, which we validate with experimental results on 3 different machine learning algorithms. Our experiments demonstrate that our approach out-performs state of the art solutions.
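
The noise-reduction argument can be illustrated numerically. In this sketch (hypothetical parameters, Gaussian noise), SMC is stood in for by ordinary summation: because only the decrypted sum is ever revealed, each of the n parties may add noise with variance sigma^2/t, where t is the assumed minimum number of non-colluding parties, and the aggregate still carries at least the variance sigma^2 needed for the differential-privacy guarantee.

```python
import random

# Sketch of the DP + SMC noise-reduction idea (hypothetical parameters):
# without SMC, every party would have to add noise at the full DP scale sigma;
# with SMC, only the aggregate is visible, so per-party noise can shrink to
# sigma / sqrt(t), t being the trust parameter (minimum honest parties).

def noisy_share(update, sigma, t):
    return update + random.gauss(0.0, sigma / t ** 0.5)

def secure_aggregate(updates, sigma, t):
    # Summation stands in for the SMC protocol: only the total is revealed.
    shares = [noisy_share(u, sigma, t) for u in updates]
    return sum(shares) / len(shares)

random.seed(0)
parties = [0.9, 1.1, 1.0, 1.05, 0.95]             # local model updates
avg = secure_aggregate(parties, sigma=0.1, t=4)   # trust: >= 4 honest parties
```

As the number of parties grows with t fixed, the per-party noise stays bounded instead of accumulating, which is the scalability property the paper claims.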

538 citations


Journal ArticleDOI
Andrew W. Cross, Lev S. Bishop, Sarah Sheldon, P. D. Nation, Jay M. Gambetta
TL;DR: A single-number metric, quantum volume, is introduced that can be measured using a concrete protocol on near-term quantum computers of modest size; measured on several state-of-the-art transmon devices, it reaches values as high as 16.
Abstract: We introduce a single-number metric, quantum volume, that can be measured using a concrete protocol on near-term quantum computers of modest size (n ≲ 50), and measure it on several state-of-the-art transmon devices, finding values as high as 16. The quantum volume is linked to system error rates, and is empirically reduced by uncontrolled interactions within the system. It quantifies the largest random circuit of equal width and depth that the computer successfully implements. Quantum computing systems with high-fidelity operations, high connectivity, large calibrated gate sets, and circuit rewriting toolchains are expected to have higher quantum volumes. The quantum volume is a pragmatic way to measure and compare progress toward improved system-wide gate error rates for near-term quantum computation and error-correction experiments.
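
Reading off the quantum volume from measured data is simple arithmetic: for each square random-circuit size n (width = depth = n), the device passes if its heavy-output probability exceeds 2/3, and the quantum volume is 2^n for the largest consecutively passed n. The numbers below are hypothetical, and the full protocol additionally requires statistical confidence in each pass.

```python
# How the quantum-volume figure is read off (illustrative numbers only):
# sizes must pass consecutively, starting from the smallest.

def quantum_volume(heavy_output_probs, threshold=2 / 3):
    qv = 1
    for n in sorted(heavy_output_probs):
        if heavy_output_probs[n] > threshold:
            qv = 2 ** n
        else:
            break                      # a failed size ends the streak
    return qv

# Hypothetical heavy-output probabilities measured on a transmon device:
measured = {2: 0.78, 3: 0.74, 4: 0.69, 5: 0.61}
print(quantum_volume(measured))        # -> 16
```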

532 citations


Proceedings ArticleDOI
01 Jan 2019
TL;DR: The core idea of the FLAIR framework is to present a simple, unified interface for conceptually very different types of word and document embeddings, which effectively hides all embedding-specific engineering complexity and allows researchers to “mix and match” various embeddings with little effort.
Abstract: We present FLAIR, an NLP framework designed to facilitate training and distribution of state-of-the-art sequence labeling, text classification and language models. The core idea of the framework is to present a simple, unified interface for conceptually very different types of word and document embeddings. This effectively hides all embedding-specific engineering complexity and allows researchers to “mix and match” various embeddings with little effort. The framework also implements standard model training and hyperparameter selection routines, as well as a data fetching module that can download publicly available NLP datasets and convert them into data structures for quick set up of experiments. Finally, FLAIR also ships with a “model zoo” of pre-trained models to allow researchers to use state-of-the-art NLP models in their applications. This paper gives an overview of the framework and its functionality. The framework is available on GitHub at https://github.com/zalandoresearch/flair .
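
The "simple, unified interface" design can be illustrated with a stripped-down mock (these are not FLAIR's actual classes): every embedding type exposes the same embed() method, so a stacked combination of embeddings is itself just another embedding.

```python
# Mock illustration of the mix-and-match design, NOT FLAIR's real API:
# because all embedding types share one interface, stacking them is trivial.

class WordEmbedding:
    def embed(self, token):
        return [float(len(token))]          # toy stand-in for a word vector

class CharEmbedding:
    def embed(self, token):
        # toy stand-in for a character-level / contextual embedding
        return [float(ord(token[0])), float(ord(token[-1]))]

class StackedEmbedding:
    """Concatenates any embeddings sharing the unified embed() interface."""
    def __init__(self, embeddings):
        self.embeddings = embeddings

    def embed(self, token):
        return [v for e in self.embeddings for v in e.embed(token)]

stacked = StackedEmbedding([WordEmbedding(), CharEmbedding()])
vec = stacked.embed("flair")
```

In FLAIR itself, the same pattern lets researchers combine classic word embeddings with contextual string embeddings in one line, which is the "mix and match" claim of the abstract.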

499 citations


Journal ArticleDOI
16 May 2019-Cell
TL;DR: This large-scale, single-cell atlas deepens the understanding of breast tumor ecosystems and suggests that ecosystem-based patient classification will facilitate identification of individuals for precision medicine approaches targeting the tumor and its immunoenvironment.

470 citations


Proceedings ArticleDOI
01 Oct 2019
TL;DR: A Self-similarity Grouping (SSG) approach that exploits the potential similarity of unlabeled samples to build multiple clusters from different views automatically, together with a clustering-guided semi-supervised extension, SSG++, for one-shot domain adaptation in an open-set setting.
Abstract: Domain adaptation in person re-identification (re-ID) has always been a challenging task. In this work, we explore how to harness the similar natural characteristics existing in the samples from the target domain for learning to conduct person re-ID in an unsupervised manner. Concretely, we propose a Self-similarity Grouping (SSG) approach, which exploits the potential similarity (from the global body to local parts) of unlabeled samples to build multiple clusters from different views automatically. These independent clusters are then assigned with labels, which serve as the pseudo identities to supervise the training process. We repeatedly and alternatively conduct such a grouping and training process until the model is stable. Despite its apparent simplicity, our SSG outperforms the state of the art by more than 4.6% (DukeMTMC→Market1501) and 4.4% (Market1501→DukeMTMC) in mAP, respectively. Upon our SSG, we further introduce a clustering-guided semi-supervised approach named SSG++ to conduct the one-shot domain adaptation in an open-set setting (i.e., the number of independent identities from the target domain is unknown). Without spending much effort on labeling, our SSG++ can further promote the mAP upon SSG by 10.7% and 6.9%, respectively. Our code is available at: https://github.com/OasisYang/SSG .

Journal ArticleDOI
TL;DR: This study paves the way for operators of smart environments to monitor their IoT assets for presence, functionality, and cyber-security without requiring any specialized devices or protocols.
Abstract: The Internet of Things (IoT) is being hailed as the next wave revolutionizing our society, and smart homes, enterprises, and cities are increasingly being equipped with a plethora of IoT devices. Yet, operators of such smart environments may not even be fully aware of their IoT assets, let alone whether each IoT device is functioning properly and safe from cyber-attacks. In this paper, we address this challenge by developing a robust framework for IoT device classification using traffic characteristics obtained at the network level. Our contributions are fourfold. First, we instrument a smart environment with 28 different IoT devices spanning cameras, lights, plugs, motion sensors, appliances, and health-monitors. We collect and synthesize traffic traces from this infrastructure for a period of six months, a subset of which we release as open data for the community to use. Second, we present insights into the underlying network traffic characteristics using statistical attributes such as activity cycles, port numbers, signalling patterns, and cipher suites. Third, we develop a multi-stage machine-learning-based classification algorithm and demonstrate its ability to identify specific IoT devices with over 99 percent accuracy based on their network activity. Finally, we discuss the trade-offs between cost, speed, and performance involved in deploying the classification framework in real-time. Our study paves the way for operators of smart environments to monitor their IoT assets for presence, functionality, and cyber-security without requiring any specialized devices or protocols.
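
The classification idea can be illustrated with a deliberately simplified sketch (not the paper's multi-stage pipeline): each device class gets a traffic profile over network-level attributes such as remote ports and mean flow volume, and an unseen trace is matched to the most similar profile.

```python
# Illustrative sketch of IoT device classification from network-level traffic
# attributes. The profiles, ports, and similarity measure are hypothetical;
# the paper uses a multi-stage machine-learning classifier over richer
# features (activity cycles, signalling patterns, cipher suites, ...).

def profile(flows):
    """Summarize a list of (remote_port, byte_count) flow records."""
    ports = {p for p, _ in flows}
    mean_bytes = sum(b for _, b in flows) / len(flows)
    return ports, mean_bytes

def similarity(p1, p2):
    ports1, vol1 = p1
    ports2, vol2 = p2
    jaccard = len(ports1 & ports2) / len(ports1 | ports2)   # port-set overlap
    vol = 1 / (1 + abs(vol1 - vol2) / max(vol1, vol2))      # volume closeness
    return jaccard + vol

def classify(flows, known_profiles):
    p = profile(flows)
    return max(known_profiles, key=lambda name: similarity(p, known_profiles[name]))

known = {
    "camera": ({443, 554}, 50_000.0),      # RTSP + TLS, heavy video flows
    "smart_plug": ({123, 8883}, 300.0),    # NTP + MQTT heartbeats, tiny flows
}
label = classify([(8883, 280.0), (123, 90.0)], known)
```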

Journal ArticleDOI
TL;DR: This manuscript describes the most recommendable methodologies for the fabrication, characterization, and simulation of RS devices, as well as the proper methods to display the data obtained.
Abstract: Resistive switching (RS) is an interesting property shown by some materials systems that, especially during the last decade, has gained a lot of interest for the fabrication of electronic devices, with electronic nonvolatile memories being those that have received the most attention. The presence and quality of the RS phenomenon in a materials system can be studied using different prototype cells, performing different experiments, displaying different figures of merit, and developing different computational analyses. Therefore, the real usefulness and impact of the findings presented in each study for the RS technology will be also different. This manuscript describes the most recommendable methodologies for the fabrication, characterization, and simulation of RS devices, as well as the proper methods to display the data obtained. The idea is to help the scientific community to evaluate the real usefulness and impact of an RS study for the development of RS technology.

Posted Content
TL;DR: This work proposes EvolveGCN, which adapts the graph convolutional network (GCN) model along the temporal dimension without resorting to node embeddings, and captures the dynamism of the graph sequence through using an RNN to evolve the GCN parameters.
Abstract: Graph representation learning resurges as a trending research subject owing to the widespread use of deep learning for Euclidean data, which inspires various creative designs of neural networks in the non-Euclidean domain, particularly graphs. With the success of these graph neural networks (GNN) in the static setting, we approach further practical scenarios where the graph dynamically evolves. Existing approaches typically resort to node embeddings and use a recurrent neural network (RNN, broadly speaking) to regulate the embeddings and learn the temporal dynamics. These methods require the knowledge of a node in the full time span (including both training and testing) and are less applicable to the frequent change of the node set. In some extreme scenarios, the node sets at different time steps may completely differ. To resolve this challenge, we propose EvolveGCN, which adapts the graph convolutional network (GCN) model along the temporal dimension without resorting to node embeddings. The proposed approach captures the dynamism of the graph sequence by using an RNN to evolve the GCN parameters. Two architectures are considered for the parameter evolution. We evaluate the proposed approach on tasks including link prediction, edge classification, and node classification. The experimental results indicate a generally higher performance of EvolveGCN compared with related approaches. The code is available at this https URL.
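
The core idea, evolving the GCN weights rather than node embeddings, can be sketched with toy dimensions. The paper evolves the weights with a GRU (or LSTM); a plain tanh recurrence stands in here, so this is an illustration of the architecture shape, not the paper's exact cell.

```python
import math

# Sketch of EvolveGCN: the GCN weight matrix itself is the hidden state of a
# recurrent update, so no per-node embedding must persist across time steps.
# Toy dimensions: 3 nodes, 2-dim features, pure-Python matrix ops.

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def gcn_layer(A_t, H_t, W_t):
    """One graph convolution at time t: relu(A_t @ H_t @ W_t)."""
    Z = matmul(matmul(A_t, H_t), W_t)
    return [[max(0.0, z) for z in row] for row in Z]

def evolve_weights(W, U):
    """Recurrent weight evolution (a GRU in the paper; tanh cell here)."""
    return [[math.tanh(z) for z in row] for row in matmul(W, U)]

# Two snapshots of a 3-node graph with 2-dim node features.
A = [[[1, 1, 0], [1, 1, 1], [0, 1, 1]],
     [[1, 0, 1], [0, 1, 1], [1, 1, 1]]]
H = [[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]] * 2
W = [[0.5, -0.2], [0.3, 0.4]]     # initial GCN weights
U = [[0.9, 0.1], [0.0, 0.8]]      # recurrence parameters (learned in practice)

for A_t, H_t in zip(A, H):
    out = gcn_layer(A_t, H_t, W)  # node representations at time t
    W = evolve_weights(W, U)      # the weights carry the temporal state
```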

Journal ArticleDOI
TL;DR: This work shows that a multihead attention Molecular Transformer model outperforms all algorithms in the literature, achieving a top-1 accuracy above 90% on a common benchmark data set and is able to handle inputs without a reactant–reagent split and including stereochemistry, which makes the method universally applicable.
Abstract: Organic synthesis is one of the key stumbling blocks in medicinal chemistry. A necessary yet unsolved step in planning synthesis is solving the forward problem: Given reactants and reagents, predict the products. Similar to other work, we treat reaction prediction as a machine translation problem between simplified molecular-input line-entry system (SMILES) strings (a text-based representation) of reactants, reagents, and the products. We show that a multihead attention Molecular Transformer model outperforms all algorithms in the literature, achieving a top-1 accuracy above 90% on a common benchmark data set. Molecular Transformer makes predictions by inferring the correlations between the presence and absence of chemical motifs in the reactant, reagent, and product present in the data set. Our model requires no handcrafted rules and accurately predicts subtle chemical transformations. Crucially, our model can accurately estimate its own uncertainty, with an uncertainty score that is 89% accurate in terms of classifying whether a prediction is correct. Furthermore, we show that the model is able to handle inputs without a reactant-reagent split and including stereochemistry, which makes our method universally applicable.
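
The machine-translation framing starts by tokenizing SMILES strings into chemically meaningful symbols. The regex below follows the tokenization scheme commonly used with Molecular Transformer-style models (bracket atoms, two-letter elements, and ring-closure digits become single tokens); the paper's exact vocabulary may differ.

```python
import re

# Regex SMILES tokenizer in the style used for SMILES-to-SMILES translation
# models: bracketed atoms like [Na+] stay whole, Cl/Br are single tokens, and
# every bond, branch, and ring-closure symbol becomes its own token.

SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p"
    r"|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)

def tokenize(smiles):
    tokens = SMILES_TOKEN.findall(smiles)
    # Round-trip check: every character must belong to exactly one token.
    assert "".join(tokens) == smiles, "untokenizable SMILES"
    return tokens

print(tokenize("CC(=O)Oc1ccccc1C(=O)O"))   # aspirin
```

The resulting token sequences for reactants and reagents are what a sequence-to-sequence Transformer consumes, with products as the target sequence.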

Proceedings ArticleDOI
01 Jun 2019
TL;DR: In this article, the authors present CommonsenseQA, a dataset for commonsense question answering, where crowd-workers are asked to create multiple-choice questions with complex semantics that often require prior knowledge.
Abstract: When answering a question, people often draw upon their rich world knowledge in addition to the particular context. Recent work has focused primarily on answering questions given some relevant document or context, and required very little general background. To investigate question answering with prior knowledge, we present CommonsenseQA: a challenging new dataset for commonsense question answering. To capture common sense beyond associations, we extract from ConceptNet (Speer et al., 2017) multiple target concepts that have the same semantic relation to a single source concept. Crowd-workers are asked to author multiple-choice questions that mention the source concept and discriminate in turn between each of the target concepts. This encourages workers to create questions with complex semantics that often require prior knowledge. We create 12,247 questions through this procedure and demonstrate the difficulty of our task with a large number of strong baselines. Our best baseline is based on BERT-large (Devlin et al., 2018) and obtains 56% accuracy, well below human performance, which is 89%.

Journal ArticleDOI
20 Sep 2019-Science
TL;DR: In this paper, cyclo[18]carbon (C18) was generated using atom manipulation on bilayer NaCl on Cu(111) at 5 kelvin by eliminating carbon monoxide from a cyclocarbon oxide molecule, C24O6.
Abstract: Carbon allotropes built from rings of two-coordinate atoms, known as cyclo[n]carbons, have fascinated chemists for many years, but until now they could not be isolated or structurally characterized because of their high reactivity. We generated cyclo[18]carbon (C18) using atom manipulation on bilayer NaCl on Cu(111) at 5 kelvin by eliminating carbon monoxide from a cyclocarbon oxide molecule, C24O6. Characterization of cyclo[18]carbon by high-resolution atomic force microscopy revealed a polyynic structure with defined positions of alternating triple and single bonds. The high reactivity of cyclocarbon and cyclocarbon oxides allows covalent coupling between molecules to be induced by atom manipulation, opening an avenue for the synthesis of other carbon allotropes and carbon-rich materials from the coalescence of cyclocarbon molecules.

Proceedings Article
06 Sep 2019
TL;DR: A highly automated platform is developed that enables gathering datasets with controls at scale; such datasets exercise models in new ways, providing valuable feedback to researchers.
Abstract: We collect a large real-world test set, ObjectNet, for object recognition with controls where object backgrounds, rotations, and imaging viewpoints are random. Most scientific experiments have controls, confounds which are removed from the data, to ensure that subjects cannot perform a task by exploiting trivial correlations in the data. Historically, large machine learning and computer vision datasets have lacked such controls. This has resulted in models that must be fine-tuned for new datasets and perform better on datasets than in real-world applications. When tested on ObjectNet, object detectors show a 40-45% drop in performance, with respect to their performance on other benchmarks, due to the controls for biases. Controls make ObjectNet robust to fine-tuning showing only small performance increases. We develop a highly automated platform that enables gathering datasets with controls by crowdsourcing image capturing and annotation. ObjectNet is the same size as the ImageNet test set (50,000 images), and by design does not come paired with a training set in order to encourage generalization. The dataset is both easier than ImageNet (objects are largely centered and unoccluded) and harder (due to the controls). Although we focus on object recognition here, data with controls can be gathered at scale using automated tools throughout machine learning to generate datasets that exercise models in new ways thus providing valuable feedback to researchers. This work opens up new avenues for research in generalizable, robust, and more human-like computer vision and in creating datasets where results are predictive of real-world performance.

Journal ArticleDOI
24 Apr 2019-Nature
TL;DR: In this paper, phase-dependent zero-bias conductance peaks were measured by tunnelling spectroscopy at the end of Josephson junctions realized on a heterostructure consisting of aluminium on indium arsenide.
Abstract: Majorana zero modes—quasiparticle states localized at the boundaries of topological superconductors—are expected to be ideal building blocks for fault-tolerant quantum computing1,2. Several observations of zero-bias conductance peaks measured by tunnelling spectroscopy above a critical magnetic field have been reported as experimental indications of Majorana zero modes in superconductor–semiconductor nanowires3–8. On the other hand, two-dimensional systems offer the alternative approach of confining Majorana channels within planar Josephson junctions, in which the phase difference φ between the superconducting leads represents an additional tuning knob that is predicted to drive the system into the topological phase at lower magnetic fields than for a system without phase bias9,10. Here we report the observation of phase-dependent zero-bias conductance peaks measured by tunnelling spectroscopy at the end of Josephson junctions realized on a heterostructure consisting of aluminium on indium arsenide. Biasing the junction to φ ≈ π reduces the critical field at which the zero-bias peak appears, with respect to φ = 0. The phase and magnetic-field dependence of the zero-energy states is consistent with a model of Majorana zero modes in finite-size Josephson junctions. As well as providing experimental evidence of phase-tuned topological superconductivity, our devices are compatible with superconducting quantum electrodynamics architectures11 and are scalable to the complex geometries needed for topological quantum computing9,12,13. Evidence is found for phase-tunable Majorana zero modes in scalable two-dimensional Josephson junctions produced by top-down fabrication.

Journal ArticleDOI
TL;DR: The authors adapt the image prior learned by GANs to the image statistics of an individual image, which lets them accurately reconstruct the input image and synthesize new content consistent with the appearance of the original image.
Abstract: Despite the recent success of GANs in synthesizing images conditioned on inputs such as a user sketch, text, or semantic labels, manipulating the high-level attributes of an existing natural photograph with GANs is challenging for two reasons. First, it is hard for GANs to precisely reproduce an input image. Second, after manipulation, the newly synthesized pixels often do not fit the original image. In this paper, we address these issues by adapting the image prior learned by GANs to image statistics of an individual image. Our method can accurately reconstruct the input image and synthesize new content, consistent with the appearance of the input image. We demonstrate our interactive system on several semantic image editing tasks, including synthesizing new objects consistent with background, removing unwanted objects, and changing the appearance of an object. Quantitative and qualitative comparisons against several existing methods demonstrate the effectiveness of our method.

Proceedings ArticleDOI
07 Jan 2019
TL;DR: In this article, distance metric learning (DML) is applied to object classification, both in the standard regime of rich training data and in the few-shot scenario, where each category is represented by only a few examples.
Abstract: Distance metric learning (DML) has been successfully applied to object classification, both in the standard regime of rich training data and in the few-shot scenario, where each category is represented by only a few examples. In this work, we propose a new method for DML that simultaneously learns the backbone network parameters, the embedding space, and the multi-modal distribution of each of the training categories in that space, in a single end-to-end training process. Our approach outperforms state-of-the-art methods for DML-based object classification on a variety of standard fine-grained datasets. Furthermore, we demonstrate the effectiveness of our approach on the problem of few-shot object detection, by incorporating the proposed DML architecture as a classification head into a standard object detection model. We achieve the best results on the ImageNet-LOC dataset compared to strong baselines, when only a few training examples are available. We also offer the community a new episodic benchmark based on the ImageNet dataset for the few-shot object detection task.

Journal ArticleDOI
17 Jul 2019
TL;DR: A simple yet effective Horizontal Pyramid Matching (HPM) approach to fully exploit various partial information of a given person, so that correct person candidates can still be identified even if some key parts are missing.
Abstract: Despite the remarkable progress in person re-identification (Re-ID), such approaches still suffer from the failure cases where the discriminative body parts are missing. To mitigate this type of failure, we propose a simple yet effective Horizontal Pyramid Matching (HPM) approach to fully exploit various partial information of a given person, so that correct person candidates can be identified even if some key parts are missing. With HPM, we make the following contributions to produce more robust feature representations for the Re-ID task: 1) we learn to classify using partial feature representations at different horizontal pyramid scales, which successfully enhance the discriminative capabilities of various person parts; 2) we exploit average and max pooling strategies to account for person-specific discriminative information in a global-local manner. To validate the effectiveness of our proposed HPM method, extensive experiments are conducted on three popular datasets including Market-1501, DukeMTMC-reID and CUHK03. Respectively, we achieve mAP scores of 83.1%, 74.5% and 59.7% on these challenging benchmarks, which set a new state of the art.
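
The horizontal pyramid pooling at the heart of HPM is easy to sketch on a toy feature map: the map is split into 1, 2, 4, ... horizontal strips, and each strip is summarized by average plus max pooling, so a missing body part corrupts only the strips covering it. Summing the two pooling results below is a simplification of how the paper combines them.

```python
# Toy sketch of Horizontal Pyramid Matching pooling: partition the feature
# map into horizontal strips at several scales and pool each strip, yielding
# 1 + 2 + 4 = 7 part descriptors for scales (1, 2, 4).

def hpm_features(feature_map, scales=(1, 2, 4)):
    h = len(feature_map)
    descriptors = []
    for s in scales:
        strip_h = h // s
        for i in range(s):
            strip = [v for row in feature_map[i * strip_h:(i + 1) * strip_h]
                     for v in row]
            avg = sum(strip) / len(strip)
            descriptors.append(avg + max(strip))  # avg + max pooling, combined
    return descriptors

# 8x4 toy feature map with a smooth vertical gradient.
fmap = [[0.1 * r + 0.01 * c for c in range(4)] for r in range(8)]
feats = hpm_features(fmap)
```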

Journal ArticleDOI
TL;DR: Direct prospective comparison of circulating tumor DNA and tissue biopsy sequencing shows the superiority of liquid biopsies for capturing clinically relevant alterations mediating resistance to targeted therapies in cancer patients.
Abstract: During cancer therapy, tumor heterogeneity can drive the evolution of multiple tumor subclones harboring unique resistance mechanisms in an individual patient1–3. Previous case reports and small case series have suggested that liquid biopsy (specifically, cell-free DNA (cfDNA)) may better capture the heterogeneity of acquired resistance4–8. However, the effectiveness of cfDNA versus standard single-lesion tumor biopsies has not been directly compared in larger-scale prospective cohorts of patients following progression on targeted therapy. Here, in a prospective cohort of 42 patients with molecularly defined gastrointestinal cancers and acquired resistance to targeted therapy, direct comparison of postprogression cfDNA versus tumor biopsy revealed that cfDNA more frequently identified clinically relevant resistance alterations and multiple resistance mechanisms, detecting resistance alterations not found in the matched tumor biopsy in 78% of cases. Whole-exome sequencing of serial cfDNA, tumor biopsies and rapid autopsy specimens elucidated substantial geographic and evolutionary differences across lesions. Our data suggest that acquired resistance is frequently characterized by profound tumor heterogeneity, and that the emergence of multiple resistance alterations in an individual patient may represent the ‘rule’ rather than the ‘exception’. These findings have profound therapeutic implications and highlight the potential advantages of cfDNA over tissue biopsy in the setting of acquired resistance. Direct prospective comparison of circulating tumor DNA and tissue biopsy sequencing shows the superiority of liquid biopsies for capturing clinically relevant alterations mediating resistance to targeted therapies in cancer patients.

Journal ArticleDOI
TL;DR: State-of-the-art machine-learning classifiers outperformed human experts in the diagnosis of pigmented skin lesions and should have a more important role in clinical practice.
Abstract: Summary Background Whether machine-learning algorithms can diagnose all pigmented skin lesions as accurately as human experts is unclear. The aim of this study was to compare the diagnostic accuracy of state-of-the-art machine-learning algorithms with human readers for all clinically relevant types of benign and malignant pigmented skin lesions. Methods For this open, web-based, international, diagnostic study, human readers were asked to diagnose dermatoscopic images selected randomly in 30-image batches from a test set of 1511 images. The diagnoses from human readers were compared with those of 139 algorithms created by 77 machine-learning labs, who participated in the International Skin Imaging Collaboration 2018 challenge and received a training set of 10 015 images in advance. The ground truth of each lesion fell into one of seven predefined disease categories: intraepithelial carcinoma including actinic keratoses and Bowen's disease; basal cell carcinoma; benign keratinocytic lesions including solar lentigo, seborrheic keratosis and lichen planus-like keratosis; dermatofibroma; melanoma; melanocytic nevus; and vascular lesions. The two main outcomes were the differences in the number of correct specific diagnoses per batch between all human readers and the top three algorithms, and between human experts and the top three algorithms. Findings Between Aug 4, 2018, and Sept 30, 2018, 511 human readers from 63 countries had at least one attempt in the reader study. 283 (55·4%) of 511 human readers were board-certified dermatologists, 118 (23·1%) were dermatology residents, and 83 (16·2%) were general practitioners. When comparing all human readers with all machine-learning algorithms, the algorithms achieved a mean of 2·01 (95% CI 1·97 to 2·04; p …). Interpretation State-of-the-art machine-learning classifiers outperformed human experts in the diagnosis of pigmented skin lesions and should have a more important role in clinical practice. However, a possible limitation of these algorithms is their decreased performance for out-of-distribution images, which should be addressed in future research. Funding None.

Journal ArticleDOI
TL;DR: It is proved that the Pockels effect remains strong even in nanoscale devices, and shown as a practical example data modulation up to 50 Gbit s−1.
Abstract: The electro-optical Pockels effect is an essential nonlinear effect used in many applications. The ultrafast modulation of the refractive index is, for example, crucial to optical modulators in photonic circuits. Silicon has emerged as a platform for integrating such compact circuits, but a strong Pockels effect is not available on silicon platforms. Here, we demonstrate a large electro-optical response in silicon photonic devices using barium titanate. We verify the Pockels effect to be the physical origin of the response, with r42 = 923 pm V−1, by confirming key signatures of the Pockels effect in ferroelectrics: the electro-optic response exhibits a crystalline anisotropy, remains strong at high frequencies, and shows hysteresis on changing the electric field. We prove that the Pockels effect remains strong even in nanoscale devices, and show as a practical example data modulation up to 50 Gbit s−1. We foresee that our work will enable novel device concepts with an application area largely extending beyond communication technologies. Electro-optic modulators based on epitaxial barium titanate (BTO) integrated on silicon exhibit speeds up to 50 Gbit s–1 while the Pockels coefficient of the BTO film is found to be approaching the bulk value.
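As background for the electro-optic response reported above, the Pockels effect produces a refractive-index change that is linear in the applied field. For an effective coefficient r_eff (here the measured r42), the textbook relations are (a generic sketch, not the device-specific model used in the paper):

```latex
% Linear electro-optic (Pockels) index change under an applied field E
\Delta n \approx -\tfrac{1}{2}\, n^{3}\, r_{\mathrm{eff}}\, E
% Resulting phase shift over an interaction length L at wavelength \lambda
\Delta\varphi = \frac{2\pi}{\lambda}\, \Delta n \, L
```

The cubic dependence on n and the linearity in E are why a large Pockels coefficient such as the reported 923 pm V−1 translates into compact, low-voltage modulators.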

Journal ArticleDOI
TL;DR: This article reviews the plethora of recent experimental results in this area and discusses the various theoretical models which have been used to describe the observations and summarises the current approaches to solving this fundamentally important problem in solid-state physics.
Abstract: Amorphous solids show surprisingly universal behaviour at low temperatures. The prevailing wisdom is that this can be explained by the existence of two-state defects within the material. The so-called standard tunneling model has become the established framework to explain these results, yet it still leaves the central question essentially unanswered-what are these two-level defects (TLS)? This question has recently taken on a new urgency with the rise of superconducting circuits in quantum computing, circuit quantum electrodynamics, magnetometry, electrometry and metrology. Superconducting circuits made from aluminium or niobium are fundamentally limited by losses due to TLS within the amorphous oxide layers encasing them. On the other hand, these circuits also provide a novel and effective method for studying the very defects which limit their operation. We can now go beyond ensemble measurements and probe individual defects-observing the quantum nature of their dynamics and studying their formation, their behaviour as a function of applied field, strain, temperature and other properties. This article reviews the plethora of recent experimental results in this area and discusses the various theoretical models which have been used to describe the observations. In doing so, it summarises the current approaches to solving this fundamentally important problem in solid-state physics.

Proceedings ArticleDOI
01 Jun 2019
TL;DR: This work proposes a method that dynamically aggregates contextualized embeddings of each unique string encountered, then applies a pooling operation to distill a "global" word representation from all contextualized instances.
Abstract: Contextual string embeddings are a recent type of contextualized word embedding that were shown to yield state-of-the-art results when utilized in a range of sequence labeling tasks. They are based on character-level language models which treat text as distributions over characters and are capable of generating embeddings for any string of characters within any textual context. However, such purely character-based approaches struggle to produce meaningful embeddings if a rare string is used in an underspecified context. To address this drawback, we propose a method in which we dynamically aggregate contextualized embeddings of each unique string that we encounter. We then use a pooling operation to distill a "global" word representation from all contextualized instances. We evaluate these "pooled contextualized embeddings" on common named entity recognition (NER) tasks such as CoNLL-03 and WNUT and show that our approach significantly improves the state-of-the-art for NER. We make all code and pre-trained models available to the research community for use and reproduction.
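The aggregation-and-pooling idea above can be sketched in a few lines. A minimal illustration assuming mean pooling and concatenation with the current contextual embedding (the class name and interface are hypothetical, not the authors' released API):

```python
import numpy as np

class PooledEmbedder:
    """Sketch of 'pooled contextualized embeddings': each time a word is
    seen, its contextual embedding is added to a per-word memory; the
    pooled ('global') vector is the element-wise mean over all instances
    seen so far, concatenated with the current contextual embedding."""

    def __init__(self):
        self.memory = {}  # word -> list of contextual embeddings seen so far

    def embed(self, word, contextual_vec):
        vec = np.asarray(contextual_vec, dtype=float)
        self.memory.setdefault(word, []).append(vec)
        pooled = np.mean(self.memory[word], axis=0)  # mean pooling over instances
        return np.concatenate([vec, pooled])         # local + global representation
```

On the first occurrence of a word the pooled part equals the contextual embedding itself; as more occurrences accumulate, the pooled part stabilizes toward a corpus-level representation of that string.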

Journal ArticleDOI
17 Jul 2019
TL;DR: AutoZOOM, a generic framework for query-efficient black-box attacks, combines an adaptive random gradient estimation strategy to balance query counts and distortion with an autoencoder that is either trained offline with unlabeled data or a bilinear resizing operation for attack acceleration.
Abstract: Recent studies have shown that adversarial examples in state-of-the-art image classifiers trained by deep neural networks (DNN) can be easily generated when the target model is transparent to an attacker, known as the white-box setting. However, when attacking a deployed machine learning service, one can only acquire the input-output correspondences of the target model; this is the so-called black-box attack setting. The major drawback of existing black-box attacks is the need for excessive model queries, which may give a false sense of model robustness due to inefficient query designs. To bridge this gap, we propose a generic framework for query-efficient black-box attacks. Our framework, AutoZOOM, which is short for Autoencoder-based Zeroth Order Optimization Method, has two novel building blocks towards efficient black-box attacks: (i) an adaptive random gradient estimation strategy to balance query counts and distortion, and (ii) an autoencoder that is either trained offline with unlabeled data or a bilinear resizing operation for attack acceleration. Experimental results suggest that, by applying AutoZOOM to a state-of-the-art black-box attack (ZOO), a significant reduction in model queries can be achieved without sacrificing the attack success rate and the visual quality of the resulting adversarial examples. In particular, when compared to the standard ZOO method, AutoZOOM can consistently reduce the mean query counts in finding successful adversarial examples (or reaching the same distortion level) by at least 93% on MNIST, CIFAR-10 and ImageNet datasets, leading to novel insights on adversarial robustness.
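The first building block, random gradient estimation, can be illustrated with a toy zeroth-order estimator. This is a simplified sketch (the scaling factor and sampling scheme here are assumptions, not AutoZOOM's exact adaptive scheme), showing how a gradient can be approximated using only function-value queries:

```python
import numpy as np

def random_gradient_estimate(f, x, q=10, beta=1e-4, rng=None):
    """Zeroth-order gradient estimate in the spirit of averaged random
    gradient estimation: average q one-sided finite differences of f
    along random unit directions, using only queries to f (no gradients)."""
    rng = np.random.default_rng() if rng is None else rng
    d = x.size
    g = np.zeros(d)
    fx = f(x)                                # one query for the base point
    for _ in range(q):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)               # random direction on the unit sphere
        g += (f(x + beta * u) - fx) / beta * u  # one query per direction
    return (d / q) * g                       # scale so the estimate is unbiased
```

Each estimate costs q + 1 model queries, which is the quantity AutoZOOM's adaptive strategy and dimension-reduced (autoencoder/bilinear) search space are designed to minimize.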

Proceedings ArticleDOI
25 Jul 2019
TL;DR: This paper introduces a pooling operator based on the graph Fourier transform, which can exploit both node features and local structures during the pooling process, and designs pooling layers based on this operator, which are combined with traditional GCN convolutional layers to form a graph neural network framework for graph classification.
Abstract: Graph neural networks, which generalize deep neural network models to graph-structured data, have attracted increasing attention in recent years. They usually learn node representations by transforming, propagating and aggregating node features and have been proven to improve the performance of many graph-related tasks such as node classification and link prediction. To apply graph neural networks to the graph classification task, approaches to generate the graph representation from node representations are demanded. A common way is to globally combine the node representations. However, rich structural information is overlooked. Thus a hierarchical pooling procedure is desired to preserve the graph structure during the graph representation learning. There are some recent works on hierarchically learning graph representations analogous to the pooling step in conventional convolutional neural networks (CNNs). However, the local structural information is still largely neglected during the pooling process. In this paper, we introduce a pooling operator based on the graph Fourier transform, which can utilize the node features and local structures during the pooling process. We then design pooling layers based on this operator, which are further combined with traditional GCN convolutional layers to form a graph neural network framework for graph classification. Theoretical analysis is provided to understand the pooling operator from both local and global perspectives. Experimental results of the graph classification task on six commonly used benchmarks demonstrate the effectiveness of the proposed framework.
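A minimal global variant of graph-Fourier-transform pooling can be sketched as follows. This is an assumption-laden illustration (the paper's operator pools over local subgraphs, while this sketch projects the whole graph onto its low-frequency Laplacian eigenbasis):

```python
import numpy as np

def graph_fourier_pool(adj, X, k):
    """Pool node features via the graph Fourier transform: project the
    feature matrix X (num_nodes x num_features) onto the k
    lowest-frequency eigenvectors of the graph Laplacian, keeping the
    smooth 'low-pass' component of the signal on the graph."""
    deg = np.diag(adj.sum(axis=1))
    L = deg - adj                          # combinatorial Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    U_k = eigvecs[:, :k]                   # low-frequency Fourier basis
    return U_k.T @ X                       # (k, num_features) pooled signal
```

Because the lowest Laplacian eigenvectors vary slowly across edges, the retained coefficients summarize structure-aware feature information rather than discarding connectivity, which is the motivation the abstract gives for Fourier-based pooling.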

Proceedings ArticleDOI
01 Oct 2019
TL;DR: This work visualizes mode collapse at both the distribution level and the instance level, deploying a semantic segmentation network to compare the distribution of segmented objects in the generated images with the target distribution in the training set.
Abstract: Despite the success of Generative Adversarial Networks (GANs), mode collapse remains a serious issue during GAN training. To date, little work has focused on understanding and quantifying which modes have been dropped by a model. In this work, we visualize mode collapse at both the distribution level and the instance level. First, we deploy a semantic segmentation network to compare the distribution of segmented objects in the generated images with the target distribution in the training set. Differences in statistics reveal object classes that are omitted by a GAN. Second, given the identified omitted object classes, we visualize the GAN's omissions directly. In particular, we compare specific differences between individual photos and their approximate inversions by a GAN. To this end, we relax the problem of inversion and solve the tractable problem of inverting a GAN layer instead of the entire generator. Finally, we use this framework to analyze several recent GANs trained on multiple datasets and identify their typical failure cases.
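The distribution-level check can be sketched by comparing per-class segment frequencies between real and generated data. A minimal illustration assuming the segmentation network's outputs are already available as flat class-label arrays (the function name and return convention are hypothetical):

```python
import numpy as np

def mode_coverage_gap(real_labels, gen_labels, num_classes):
    """Compare per-class frequencies of segmented objects in real vs.
    generated data. Returns gen_freq - real_freq per class; strongly
    negative entries flag object classes the GAN tends to omit
    (i.e., dropped modes)."""
    real_freq = np.bincount(real_labels, minlength=num_classes) / len(real_labels)
    gen_freq = np.bincount(gen_labels, minlength=num_classes) / len(gen_labels)
    return gen_freq - real_freq  # < 0 means the class is under-generated
```

Classes with the most negative gap are then candidates for the instance-level analysis the abstract describes, where individual photos are compared against their approximate GAN inversions.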