
Showing papers by "Amazon.com" published in 2021


Journal ArticleDOI
TL;DR: In this paper, an emerging technique called algorithm unrolling, or unfolding, is shown to offer promise in addressing the interpretability and data-efficiency issues of deep networks by providing a concrete and systematic connection between iterative algorithms that are widely used in signal processing and deep neural networks.
Abstract: Deep neural networks provide unprecedented performance gains in many real-world problems in signal and image processing. Despite these gains, the future development and practical deployment of deep networks are hindered by their black-box nature, i.e., a lack of interpretability and the need for very large training sets. An emerging technique called algorithm unrolling, or unfolding, offers promise in eliminating these issues by providing a concrete and systematic connection between iterative algorithms that are widely used in signal processing and deep neural networks. Unrolling methods were first proposed to develop fast neural network approximations for sparse coding. More recently, this direction has attracted enormous attention, and it is rapidly growing in both theoretic investigations and practical applications. The increasing popularity of unrolled deep networks is due, in part, to their potential in developing efficient, high-performance (yet interpretable) network architectures from reasonably sized training sets.

377 citations
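
To make the unrolling idea concrete, here is a minimal LISTA-style sketch in which each layer of a network mimics one iteration of ISTA for sparse coding, with the step matrices and thresholds promoted to learnable parameters. The dimensions, depth, and initialization are illustrative assumptions, not the paper's exact construction.

```python
# Unrolled ISTA (LISTA-style): depth iterations become depth layers
# whose weights are learned from data.
import torch
import torch.nn as nn

class UnrolledISTA(nn.Module):
    def __init__(self, m, n, depth=10):
        super().__init__()
        self.We = nn.Linear(m, n, bias=False)  # learned analogue of step * D^T
        self.S = nn.Linear(n, n, bias=False)   # learned analogue of I - step * D^T D
        self.theta = nn.Parameter(torch.full((depth,), 0.1))  # per-layer thresholds
        self.depth = depth

    @staticmethod
    def soft(x, t):
        # soft-thresholding, the proximal operator of the L1 norm
        return torch.sign(x) * torch.clamp(x.abs() - t, min=0.0)

    def forward(self, y):
        z = self.soft(self.We(y), self.theta[0])
        for k in range(1, self.depth):
            z = self.soft(self.We(y) + self.S(z), self.theta[k])
        return z
```

Trained end-to-end on pairs of measurements and sparse codes, a few such layers can approximate many ISTA iterations, which is the source of the efficiency and interpretability claims above.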


Journal ArticleDOI
TL;DR: This work provides a thorough review of the development of this problem in recent decades, inspects recent advances in various aspects, and proposes some interesting directions for future research.

340 citations


Proceedings ArticleDOI
03 May 2021
TL;DR: The Speech processing Universal PERformance Benchmark (SUPERB) as discussed by the authors is a leaderboard to benchmark the performance of a shared model across a wide range of speech processing tasks with minimal architecture changes and labeled data.
Abstract: Self-supervised learning (SSL) has proven vital for advancing research in natural language processing (NLP) and computer vision (CV). The paradigm pretrains a shared model on large volumes of unlabeled data and achieves state-of-the-art (SOTA) for various tasks with minimal adaptation. However, the speech processing community lacks a similar setup to systematically explore the paradigm. To bridge this gap, we introduce Speech processing Universal PERformance Benchmark (SUPERB). SUPERB is a leaderboard to benchmark the performance of a shared model across a wide range of speech processing tasks with minimal architecture changes and labeled data. Among multiple usages of the shared model, we especially focus on extracting the representation learned from SSL due to its preferable re-usability. We present a simple framework to solve SUPERB tasks by learning task-specialized lightweight prediction heads on top of the frozen shared model. Our results demonstrate that the framework is promising as SSL representations show competitive generalizability and accessibility across SUPERB tasks. We release SUPERB as a challenge with a leaderboard and a benchmark toolkit to fuel the research in representation learning and general speech processing.

138 citations
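
The SUPERB recipe of freezing the shared SSL model and learning only lightweight task heads can be sketched as follows; `upstream` stands in for any pretrained speech encoder, and the mean-pooling and optimizer choices are assumptions rather than requirements of the benchmark.

```python
# Frozen upstream + lightweight prediction head, SUPERB-style.
import torch
import torch.nn as nn

def build_downstream(upstream: nn.Module, feat_dim: int, num_classes: int):
    for p in upstream.parameters():           # freeze the shared model
        p.requires_grad = False
    head = nn.Linear(feat_dim, num_classes)   # task-specialized lightweight head
    optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
    return head, optimizer

def training_step(upstream, head, wav, labels):
    with torch.no_grad():                     # upstream stays frozen
        feats = upstream(wav)                 # assumed shape: (batch, time, feat_dim)
    logits = head(feats.mean(dim=1))          # mean-pool over time for utterance tasks
    return nn.functional.cross_entropy(logits, labels)
```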


Posted ContentDOI
TL;DR: In this article, a graph neural network estimator for estimated time of arrival (ETA) is presented, which has been deployed in production at Google Maps and has shown promising results.
Abstract: Travel-time prediction constitutes a task of high importance in transportation networks, with web mapping services like Google Maps regularly serving vast quantities of travel time queries from users and enterprises alike. Further, such a task requires accounting for complex spatiotemporal interactions (modelling both the topological properties of the road network and anticipating events -- such as rush hours -- that may occur in the future). Hence, it is an ideal target for graph representation learning at scale. Here we present a graph neural network estimator for estimated time of arrival (ETA) which we have deployed in production at Google Maps. While our main architecture consists of standard GNN building blocks, we further detail the usage of training schedule methods such as MetaGradients in order to make our model robust and production-ready. We also provide prescriptive studies: ablating on various architectural decisions and training regimes, and qualitative analyses on real-world situations where our model provides a competitive edge. Our GNN proved powerful when deployed, significantly reducing negative ETA outcomes in several regions compared to the previous production baseline (40+% in cities like Sydney).

115 citations
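
For illustration, a toy message-passing layer built from the kind of standard GNN blocks mentioned above might look like the sketch below; the road-graph encoding, feature sizes, and GRU-style update are simplified assumptions, not the production Google Maps architecture.

```python
# One round of message passing over road segments (nodes) and their
# connections (directed edges).
import torch
import torch.nn as nn

class SegmentGNNLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.msg = nn.Linear(2 * dim, dim)   # message from (source, target) features
        self.upd = nn.GRUCell(dim, dim)      # node update from aggregated messages

    def forward(self, h, edges):
        # h: (num_segments, dim); edges: (num_edges, 2) long tensor of src->dst
        src, dst = edges[:, 0], edges[:, 1]
        m = torch.relu(self.msg(torch.cat([h[src], h[dst]], dim=-1)))
        agg = torch.zeros_like(h).index_add_(0, dst, m)  # sum messages per node
        return self.upd(agg, h)
```

Stacking a few such layers and reading out per-route predictions gives the basic supervised ETA setup; robustness techniques like MetaGradients sit on top of this training loop.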


Proceedings ArticleDOI
20 Jun 2021
TL;DR: In this paper, a semi-supervised approach for contemporary object detectors following the teacher-student dual model framework is proposed, where the exponential moving averaging strategy is used to update the teacher from the student online, and a light-weighted detection-specific data ensemble for the teacher to generate more reliable pseudo-labels.
Abstract: We propose a semi-supervised approach for contemporary object detectors following the teacher-student dual model framework. Our method is featured with 1) the exponential moving averaging strategy to update the teacher from the student online, 2) using plenty of region proposals and soft pseudo-labels as the student’s training targets, and 3) a lightweight detection-specific data ensemble for the teacher to generate more reliable pseudo-labels. Compared to the recent state-of-the-art – STAC, which uses hard labels on sparsely selected hard pseudo samples, the teacher in our model exposes richer information to the student with soft labels on many proposals. Our model achieves COCO-style AP of 53.04% on the VOC07 val set, 8.4% better than STAC, when using VOC12 as unlabeled data. On MS-COCO, it outperforms prior work when only a small percentage of data is taken as labeled. It also reaches 53.8% AP on MS-COCO test-dev with a 3.1% gain over the fully supervised ResNet-152 Cascade R-CNN, by tapping into unlabeled data of a similar size to the labeled data.

102 citations
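
The two core mechanics described above, an EMA-updated teacher and soft pseudo-labels as the student's targets, can be sketched as follows; the momentum value and the KL-style loss wiring are assumptions, not the paper's exact configuration.

```python
# Teacher-student semi-supervised training: EMA teacher + soft targets.
import torch

@torch.no_grad()
def ema_update(teacher, student, momentum=0.999):
    # teacher <- momentum * teacher + (1 - momentum) * student
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(momentum).add_(ps, alpha=1.0 - momentum)

def soft_pseudo_label_loss(teacher, student, unlabeled):
    with torch.no_grad():
        targets = torch.softmax(teacher(unlabeled), dim=-1)   # soft pseudo-labels
    log_probs = torch.log_softmax(student(unlabeled), dim=-1)
    return torch.nn.functional.kl_div(log_probs, targets, reduction='batchmean')
```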


Journal ArticleDOI
TL;DR: In this article, a computational method leveraging deep learning and molecular dynamics simulations enables the rapid discovery of antimicrobial peptides with low toxicity and with high potency against diverse Gram-positive and Gram-negative pathogens.
Abstract: The de novo design of antimicrobial therapeutics involves the exploration of a vast chemical repertoire to find compounds with broad-spectrum potency and low toxicity. Here, we report an efficient computational method for the generation of antimicrobials with desired attributes. The method leverages guidance from classifiers trained on an informative latent space of molecules modelled using a deep generative autoencoder, and screens the generated molecules using deep-learning classifiers as well as physicochemical features derived from high-throughput molecular dynamics simulations. Within 48 days, we identified, synthesized and experimentally tested 20 candidate antimicrobial peptides, of which two displayed high potency against diverse Gram-positive and Gram-negative pathogens (including multidrug-resistant Klebsiella pneumoniae) and a low propensity to induce drug resistance in Escherichia coli. Both peptides have low toxicity, as validated in vitro and in mice. We also show using live-cell confocal imaging that the bactericidal mode of action of the peptides involves the formation of membrane pores. The combination of deep learning and molecular dynamics may accelerate the discovery of potent and selective broad-spectrum antimicrobials.

98 citations


Proceedings ArticleDOI
01 Jun 2021
TL;DR: This work proposes Supporting Clustering with Contrastive Learning (SCCL) – a novel framework to leverage contrastive learning to promote better separation in distance-based clustering – and demonstrates the effectiveness of SCCL in leveraging the strengths of both bottom-up instance discrimination and top-down clustering to achieve better intra-cluster and inter-cluster distances.
Abstract: Unsupervised clustering aims at discovering the semantic categories of data according to some distance measured in the representation space. However, different categories often overlap with each other in the representation space at the beginning of the learning process, which poses a significant challenge for distance-based clustering in achieving good separation between different categories. To this end, we propose Supporting Clustering with Contrastive Learning (SCCL) – a novel framework to leverage contrastive learning to promote better separation. We assess the performance of SCCL on short text clustering and show that SCCL significantly advances the state-of-the-art results on most benchmark datasets with 3%-11% improvement on Accuracy and 4%-15% improvement on Normalized Mutual Information. Furthermore, our quantitative analysis demonstrates the effectiveness of SCCL in leveraging the strengths of both bottom-up instance discrimination and top-down clustering to achieve better intra-cluster and inter-cluster distances when evaluated with the ground truth cluster labels.

95 citations
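
A rough sketch of the two objectives SCCL combines, instance-level contrastive learning on augmented pairs and a clustering loss, is shown below; the temperature, the sharpened auxiliary target, and the batch wiring follow common practice and are assumptions rather than the paper's exact settings.

```python
# Joint objective: instance discrimination (bottom-up) + clustering (top-down).
import torch
import torch.nn.functional as F

def info_nce(z1, z2, tau=0.5):
    # z1, z2: (B, d) embeddings of two augmentations of the same batch
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=-1)
    sim = z @ z.t() / tau
    sim.fill_diagonal_(float('-inf'))                 # exclude self-similarity
    B = z1.size(0)
    targets = torch.cat([torch.arange(B, 2 * B), torch.arange(0, B)])
    return F.cross_entropy(sim, targets)              # positive = the other view

def cluster_loss(q):
    # q: (B, K) soft cluster assignments; sharpen toward an auxiliary target p
    p = (q ** 2) / q.sum(dim=0, keepdim=True)
    p = p / p.sum(dim=1, keepdim=True)
    return F.kl_div(q.clamp(min=1e-8).log(), p.detach(), reduction='batchmean')
```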


Proceedings ArticleDOI
03 Mar 2021
TL;DR: In this paper, an energy-based learning framework for scene graph generation is proposed to incorporate the structure of scene graphs in the output space, which allows models to learn efficiently from a small number of labels.
Abstract: Traditional scene graph generation methods are trained using cross-entropy losses that treat objects and relationships as independent entities. Such a formulation, however, ignores the structure in the output space, in an inherently structured prediction problem. In this work, we introduce a novel energy-based learning framework for generating scene graphs. The proposed formulation allows for efficiently incorporating the structure of scene graphs in the output space. This additional constraint in the learning framework acts as an inductive bias and allows models to learn efficiently from a small number of labels. We use the proposed energy-based framework to train existing state-of-the-art models and obtain a significant performance improvement, of up to 21% and 27%, on the Visual Genome [9] and GQA [5] benchmark datasets, respectively. Furthermore, we showcase the learning efficiency of the proposed framework by demonstrating superior performance in the zero- and few-shot settings where data is scarce.

94 citations


Journal ArticleDOI
07 Jun 2021
TL;DR: In this article, the authors found that between 2000 and 2019, most soybean expansion in South America was on pastures converted originally for cattle production, especially in the Brazilian Amazon; across the continent, 9% of forest loss was converted to soybean by 2016.
Abstract: A prominent goal of policies mitigating climate change and biodiversity loss is to achieve zero deforestation in the global supply chain of key commodities, such as palm oil and soybean. However, the extent and dynamics of deforestation driven by commodity expansion are largely unknown. Here we mapped annual soybean expansion in South America between 2000 and 2019 by combining satellite observations and sample field data. From 2000 to 2019, the area cultivated with soybean more than doubled from 26.4 Mha to 55.1 Mha. Most soybean expansion occurred on pastures originally converted from natural vegetation for cattle production. The most rapid expansion occurred in the Brazilian Amazon, where soybean area increased more than tenfold, from 0.4 Mha to 4.6 Mha. Across the continent, 9% of forest loss was converted to soybean by 2016. Soybean-driven deforestation was concentrated at the active frontiers, nearly half located in the Brazilian Cerrado. Efforts to limit future deforestation must consider how soybean expansion may drive deforestation indirectly by displacing pasture or other land uses. Holistic approaches that track land use across all commodities coupled with vegetation monitoring are required to maintain critical ecosystem services.

91 citations


Proceedings ArticleDOI
03 Mar 2021
TL;DR: The Bias in Open-Ended Language Generation Dataset (BOLD) as mentioned in this paper is a large-scale dataset that consists of 23,679 English text generation prompts for bias benchmarking across five domains: profession, gender, race, religion and political ideology.
Abstract: Recent advances in deep learning techniques have enabled machines to generate cohesive open-ended text when prompted with a sequence of words as context. While these models now empower many downstream applications from conversation bots to automatic storytelling, they have been shown to generate texts that exhibit social biases. To systematically study and benchmark social biases in open-ended language generation, we introduce the Bias in Open-Ended Language Generation Dataset (BOLD), a large-scale dataset that consists of 23,679 English text generation prompts for bias benchmarking across five domains: profession, gender, race, religion, and political ideology. We also propose new automated metrics for toxicity, psycholinguistic norms, and text gender polarity to measure social biases in open-ended text generation from multiple angles. An examination of text generated from three popular language models reveals that the majority of these models exhibit a larger social bias than human-written Wikipedia text across all domains. With these results we highlight the need to benchmark biases in open-ended language generation and caution users of language generation models on downstream tasks to be cognizant of these embedded prejudices.
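
A hedged sketch of how BOLD-style prompts might be used in practice: generate one continuation per prompt in each domain and aggregate scores from an automated metric. `generate` and `toxicity_score` are hypothetical stand-ins, not APIs shipped with the dataset.

```python
# Per-domain bias benchmarking loop over open-ended generation prompts.
from collections import defaultdict

def benchmark_bias(prompts_by_domain, generate, toxicity_score):
    """prompts_by_domain: dict mapping domain name -> list of prompt strings."""
    scores = defaultdict(list)
    for domain, prompts in prompts_by_domain.items():
        for prompt in prompts:
            continuation = generate(prompt)            # model's open-ended text
            scores[domain].append(toxicity_score(continuation))
    return {d: sum(s) / len(s) for d, s in scores.items()}  # mean score per domain
```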

Proceedings ArticleDOI
Bing Shuai, Andrew Berneshawi, Xinyu Li, Davide Modolo, Joseph Tighe
25 May 2021
TL;DR: SiamMOT as discussed by the authors introduces a motion model that estimates the instance's movement between two frames such that detected instances are associated, and it runs at 17 FPS for 720P videos on a single modern GPU.
Abstract: In this paper, we focus on improving online multi-object tracking (MOT). In particular, we introduce a region-based Siamese Multi-Object Tracking network, which we name SiamMOT. SiamMOT includes a motion model that estimates the instance’s movement between two frames such that detected instances are associated. To explore how the motion modelling affects its tracking capability, we present two variants of Siamese tracker, one that implicitly models motion and one that models it explicitly. We carry out extensive quantitative experiments on three different MOT datasets: MOT17, TAO-person and Caltech Roadside Pedestrians, showing the importance of motion modelling for MOT and the ability of SiamMOT to substantially outperform the state-of-the-art. Finally, SiamMOT also outperforms the winners of ACM MM’20 HiEve Grand Challenge on HiEve dataset. Moreover, SiamMOT is efficient, and it runs at 17 FPS for 720P videos on a single modern GPU.

Journal ArticleDOI
TL;DR: MOTChallenge, as mentioned in this paper, is a benchmark for single-camera multiple object tracking (MOT) launched in late 2014 that provides a framework for the standardized evaluation of multiple object tracking methods.
Abstract: Standardized benchmarks have been crucial in pushing the performance of computer vision algorithms, especially since the advent of deep learning. Although leaderboards should not be over-claimed, they often provide the most objective measure of performance and are therefore important guides for research. We present MOTChallenge, a benchmark for single-camera Multiple Object Tracking (MOT) launched in late 2014, to collect existing and new data and create a framework for the standardized evaluation of multiple object tracking methods. The benchmark is focused on multiple people tracking, since pedestrians are by far the most studied object in the tracking community, with applications ranging from robot navigation to self-driving cars. This paper collects the first three releases of the benchmark: (i) MOT15, along with numerous state-of-the-art results that were submitted in the last years, (ii) MOT16, which contains new challenging videos, and (iii) MOT17, which extends MOT16 sequences with more precise labels and evaluates tracking performance on three different object detectors. The second and third releases not only offer a significant increase in the number of labeled boxes, but also provide labels for multiple object classes besides pedestrians, as well as the level of visibility for every single object of interest. We finally provide a categorization of state-of-the-art trackers and a broad error analysis. This will help newcomers understand the related work and research trends in the MOT community, and hopefully shed some light on potential future research directions.

Proceedings ArticleDOI
01 Jun 2021
TL;DR: This work develops a contrastive self-training framework, COSINE, to enable fine-tuning LMs with weak supervision, underpinned by contrastive regularization and confidence-based reweighting, which gradually improves model fitting while effectively suppressing error propagation.
Abstract: Fine-tuned pre-trained language models (LMs) have achieved enormous success in many natural language processing (NLP) tasks, but they still require excessive labeled data in the fine-tuning stage. We study the problem of fine-tuning pre-trained LMs using only weak supervision, without any labeled data. This problem is challenging because the high capacity of LMs makes them prone to overfitting the noisy labels generated by weak supervision. To address this problem, we develop a contrastive self-training framework, COSINE, to enable fine-tuning LMs with weak supervision. Underpinned by contrastive regularization and confidence-based reweighting, our framework gradually improves model fitting while effectively suppressing error propagation. Experiments on sequence, token, and sentence pair classification tasks show that our model outperforms the strongest baseline by large margins and achieves competitive performance with fully-supervised fine-tuning methods. Our implementation is available on https://github.com/yueyu1030/COSINE.
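
Confidence-based reweighting, one ingredient of COSINE described above, can be sketched as follows; the threshold and weight form are illustrative assumptions, not the paper's exact formulation.

```python
# Down-weight (or drop) low-confidence pseudo-labels to suppress
# error propagation during self-training.
import torch
import torch.nn.functional as F

def reweighted_self_training_loss(logits, pseudo_labels, threshold=0.9):
    probs = F.softmax(logits, dim=-1)
    confidence = probs.max(dim=-1).values
    weights = (confidence > threshold).float() * confidence  # 0 for noisy samples
    per_sample = F.cross_entropy(logits, pseudo_labels, reduction='none')
    return (weights * per_sample).sum() / weights.sum().clamp(min=1e-8)
```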

Proceedings ArticleDOI
01 Jun 2021
TL;DR: The fifth AI City Challenge as mentioned in this paper attracted 305 participating teams across 38 countries, who leveraged city-scale real traffic data and high-quality synthetic data to compete in five challenge tracks; Track 1 addressed video-based automatic vehicle counting, with evaluation conducted on both algorithmic effectiveness and computational efficiency.
Abstract: The AI City Challenge was created with two goals in mind: (1) pushing the boundaries of research and development in intelligent video analysis for smarter cities use cases, and (2) assessing tasks where the level of performance is enough to cause real-world adoption. Transportation is a segment ripe for such adoption. The fifth AI City Challenge attracted 305 participating teams across 38 countries, who leveraged city-scale real traffic data and high-quality synthetic data to compete in five challenge tracks. Track 1 addressed video-based automatic vehicle counting, with the evaluation conducted on both algorithmic effectiveness and computational efficiency. Track 2 addressed city-scale vehicle re-identification with augmented synthetic data to substantially increase the training set for the task. Track 3 addressed city-scale multi-target multi-camera vehicle tracking. Track 4 addressed traffic anomaly detection. Track 5 was a new track addressing vehicle retrieval using natural language descriptions. The evaluation system shows a general leader board of all submitted results, and a public leader board of results limited by the contest participation rules, where teams are not allowed to use external data in their work. The public leader board shows results closer to real-world situations where annotated data is limited. Results show the promise of AI in Smarter Transportation. State-of-the-art performance for some tasks shows that these technologies are ready for adoption in real-world systems.


Journal ArticleDOI
TL;DR: The authors release LaSOT, to their knowledge the largest densely annotated tracking benchmark, as a dedicated high-quality platform for both training and evaluation of trackers, with language specifications for each video that take advantage of the close connection between visual appearance and natural language.
Abstract: Despite great recent advances in visual tracking, its further development, including both algorithm design and evaluation, is limited due to a lack of dedicated large-scale benchmarks. To address this problem, we present LaSOT, a high-quality Large-scale Single Object Tracking benchmark. LaSOT contains a diverse selection of 85 object classes, and offers 1,550 videos totaling more than 3.87 million frames. Each video frame is carefully and manually annotated with a bounding box. This makes LaSOT, to our knowledge, the largest densely annotated tracking benchmark. Our goal in releasing LaSOT is to provide a dedicated high-quality platform for both training and evaluation of trackers. The average video length of LaSOT is around 2,500 frames, where each video contains various challenge factors that exist in real-world video footage, such as the targets disappearing and re-appearing. These longer video lengths allow for the assessment of long-term trackers. To take advantage of the close connection between visual appearance and natural language, we provide language specification for each video in LaSOT. We believe such additions will allow for future research to use linguistic features to improve tracking. Two protocols, full-overlap and one-shot, are designated for flexible assessment of trackers. We extensively evaluate 48 baseline trackers on LaSOT with in-depth analysis, and results reveal that there still exists significant room for improvement. The complete benchmark, tracking results as well as analysis are available at http://vision.cs.stonybrook.edu/~lasot/ .

Proceedings ArticleDOI
19 Jan 2021
TL;DR: In this article, the authors proposed Audio ALBERT, a lite version of the self-supervised speech representation model, which achieved performance comparable with massive pre-trained networks in the downstream tasks while having 91% fewer parameters.
Abstract: Self-supervised speech models are powerful speech representation extractors for downstream applications. Recently, larger models have been utilized in acoustic model training to achieve better performance. We propose Audio ALBERT, a lite version of the self-supervised speech representation model. We apply the lightweight representation extractor to two downstream tasks, speaker classification and phoneme classification. We show that Audio ALBERT achieves performance comparable with massive pre-trained networks in the downstream tasks while having 91% fewer parameters. Moreover, we design probing models to measure how much the latent representations can encode the speaker’s and phoneme’s information. We find that the representations encoded in internal layers of Audio ALBERT contain more information for both phoneme and speaker than the last layer, which is generally used for downstream tasks. Our findings provide a new avenue for using self-supervised networks to achieve better performance and efficiency.
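
The ALBERT-style parameter sharing that makes the model "lite" is easy to sketch: one transformer layer is reused at every depth, so the parameter count does not grow with the number of layers. The sizes below are illustrative assumptions.

```python
# Layer-shared transformer encoder: 12 "layers" of depth, 1 layer of weights.
import torch.nn as nn

class SharedLayerEncoder(nn.Module):
    def __init__(self, dim=768, heads=12, num_layers=12):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                                batch_first=True)
        self.num_layers = num_layers

    def forward(self, x):
        hidden_states = []
        for _ in range(self.num_layers):   # same weights applied repeatedly
            x = self.layer(x)
            hidden_states.append(x)        # internal layers can be probed
        return x, hidden_states
```

Returning all hidden states mirrors the probing analysis above, where internal layers carried more speaker and phoneme information than the last layer.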

DOI
Heidi L. Rehm, Angela Page, Lindsay Smith, +220 more (73 institutions)
10 Nov 2021
TL;DR: The Global Alliance for Genomics and Health (GA4GH) aims to accelerate biomedical advances by enabling the responsible sharing of clinical and genomic data through both harmonized data aggregation and federated approaches.
Abstract: Summary The Global Alliance for Genomics and Health (GA4GH) aims to accelerate biomedical advances by enabling the responsible sharing of clinical and genomic data through both harmonized data aggregation and federated approaches. The decreasing cost of genomic sequencing (along with other genome-wide molecular assays) and increasing evidence of its clinical utility will soon drive the generation of sequence data from tens of millions of humans, with increasing levels of diversity. In this perspective, we present the GA4GH strategies for addressing the major challenges of this data revolution. We describe the GA4GH organization, which is fueled by the development efforts of eight Work Streams and informed by the needs of 24 Driver Projects and other key stakeholders. We present the GA4GH suite of secure, interoperable technical standards and policy frameworks and review the current status of standards, their relevance to key domains of research and clinical care, and future plans of GA4GH. Broad international participation in building, adopting, and deploying GA4GH standards and frameworks will catalyze an unprecedented effort in data sharing that will be critical to advancing genomic medicine and ensuring that all populations can access its benefits.

Journal ArticleDOI
TL;DR: This work compares sparse and dense representations of predictive models in macroeconomics, microeconomics, and finance, and specifies a prior that allows for both variable selection and shrinkage.
Abstract: We compare sparse and dense representations of predictive models in macroeconomics, microeconomics, and finance. To deal with a large number of possible predictors, we specify a prior that allows for both variable selection and shrinkage. The posterior distribution does not typically concentrate on a single sparse model, but on a wide set of models that often include many predictors.
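
A standard way to write a prior that allows both variable selection and shrinkage is the spike-and-slab form below, where q governs how many predictors enter and γ² governs how much the included coefficients are shrunk; the paper's exact specification may differ in details.

```latex
% Spike-and-slab prior: a point mass at zero (selection) mixed with a
% Gaussian slab (shrinkage). q and \gamma^2 are hyperparameters.
\beta_j \mid q, \gamma^2 \;\sim\; q \,\mathcal{N}(0, \gamma^2) \;+\; (1 - q)\,\delta_0,
\qquad j = 1, \dots, k.
```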

Proceedings ArticleDOI
01 Jun 2021
TL;DR: In this paper, the authors propose Domain Consensus Clustering (DCC), which exploits the domain consensus knowledge to discover discriminative clusters on both common samples and private ones.
Abstract: In this paper, we investigate the Universal Domain Adaptation (UniDA) problem, which aims to transfer knowledge from source to target under an unaligned label space. The main challenge of UniDA lies in how to separate common classes (i.e., classes shared across domains) from private classes (i.e., classes that exist in only one domain). Previous works treat the private samples in the target as one generic class but ignore their intrinsic structure. Consequently, the resulting representations are not compact enough in the latent space and can be easily confused with common samples. To better exploit the intrinsic structure of the target domain, we propose Domain Consensus Clustering (DCC), which exploits the domain consensus knowledge to discover discriminative clusters on both common samples and private ones. Specifically, we draw the domain consensus knowledge from two aspects to facilitate the clustering and the private class discovery, i.e., the semantic-level consensus, which identifies the cycle-consistent clusters as the common classes, and the sample-level consensus, which utilizes the cross-domain classification agreement to determine the number of clusters and discover the private classes. Based on DCC, we are able to separate the private classes from the common ones, and differentiate the private classes themselves. Finally, we apply a class-aware alignment technique on identified common samples to minimize the distribution shift, and a prototypical regularizer to inspire discriminative target clusters. Experiments on four benchmarks demonstrate that DCC significantly outperforms previous state-of-the-art methods.

Journal ArticleDOI
TL;DR: In this article, the authors use textual analysis of high-dimensional data from patent documents to create new indicators of technological innovation and identify significant patents based on textual similarity of a given patent to previous and subsequent work: these patents are distinct from previous work but are related to subsequent innovations.
Abstract: We use textual analysis of high-dimensional data from patent documents to create new indicators of technological innovation. We identify significant patents based on textual similarity of a given patent to previous and subsequent work: these patents are distinct from previous work but are related to subsequent innovations. Our measure of patent significance is predictive of future citations and correlates strongly with measures of market value. We identify breakthrough innovations as the most significant patents – those in the right tail of our measure – to construct indices of technological change at the aggregate, sectoral, and firm level. Our technology indices span two centuries (1840-2010) and cover innovation by private and public firms, as well as non-profit organizations and the US government. These indices capture the evolution of technological waves over a long time span and are strong predictors of productivity at the aggregate and sectoral level.
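
One simple way to operationalize "distinct from previous work but related to subsequent work" is a backward/forward similarity ratio. The TF-IDF cosine below is a hedged stand-in for the paper's similarity measure, and the ratio form is an illustrative assumption.

```python
# Patent significance sketch: low similarity to prior patents,
# high similarity to later ones.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def significance(focal_text, prior_texts, later_texts):
    docs = [focal_text] + prior_texts + later_texts
    X = TfidfVectorizer(stop_words='english').fit_transform(docs)
    sims = cosine_similarity(X[0], X[1:]).ravel()
    backward = sims[:len(prior_texts)].mean()   # similarity to previous work
    forward = sims[len(prior_texts):].mean()    # similarity to subsequent work
    return forward / max(backward, 1e-8)        # high = novel and influential
```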

Journal ArticleDOI
Cecilia Blundo, Julieta Carilla, Ricardo Grau, Agustina Malizia, +549 more (176 institutions)
TL;DR: In this paper, the authors show how a global community is responding to the challenges of tropical ecosystem research with diverse teams measuring forests tree-by-tree in thousands of long-term plots.


Proceedings ArticleDOI
01 Jun 2021
TL;DR: In this paper, the influence of a subset of the training samples is removed from the weights of a network trained on large-scale image classification tasks, and strong computable bounds are provided on the amount of remaining information after forgetting.
Abstract: We show that the influence of a subset of the training samples can be removed – or "forgotten" – from the weights of a network trained on large-scale image classification tasks, and we provide strong computable bounds on the amount of remaining information after forgetting. Inspired by real-world applications of forgetting techniques, we introduce a novel notion of forgetting in mixed-privacy setting, where we know that a "core" subset of the training samples does not need to be forgotten. While this variation of the problem is conceptually simple, we show that working in this setting significantly improves the accuracy and guarantees of forgetting methods applied to vision classification tasks. Moreover, our method allows efficient removal of all information contained in non-core data by simply setting to zero a subset of the weights with minimal loss in performance. We achieve these results by replacing a standard deep network with a suitable linear approximation. With opportune changes to the network architecture and training procedure, we show that such linear approximation achieves comparable performance to the original network and that the forgetting problem becomes quadratic and can be solved efficiently even for large models. Unlike previous forgetting methods on deep networks, ours can achieve close to the state-of-the-art accuracy on large scale vision tasks. In particular, we show that our method allows forgetting without having to trade off the model accuracy.

Journal ArticleDOI
01 Jul 2021
TL;DR: This work proposes a method that finds Lyapunov functions fully automatically—using machine learning—while also providing formal guarantees—using satisfiability modulo theories (SMT); it synthesises Lyapunov functions faster and over wider spatial domains than the alternatives, while providing stronger or equal guarantees.
Abstract: We propose an automatic and formally sound method for synthesising Lyapunov functions for the asymptotic stability of autonomous non-linear systems. Traditional methods are either analytical and require manual effort, or are numerical but lack formal soundness. Symbolic computational methods for Lyapunov functions, which are in between, give formal guarantees but are typically semi-automatic because they rely on the user to provide appropriate function templates. We propose a method that finds Lyapunov functions fully automatically—using machine learning—while also providing formal guarantees—using satisfiability modulo theories (SMT). We employ a counterexample-guided approach where a numerical learner and a symbolic verifier interact to construct provably correct Lyapunov neural networks (LNNs). The learner trains a neural network that satisfies the Lyapunov criteria for asymptotic stability over a samples set; the verifier proves via SMT solving that the criteria are satisfied over the whole domain, or augments the samples set with counterexamples. Our method supports neural networks with polynomial activation functions and multiple depth and width, which display wide learning capabilities. We demonstrate our method over several non-trivial benchmarks and compare it favourably against a numerical optimisation-based approach, a symbolic template-based approach, and a cognate LNN-based approach. Our method synthesises Lyapunov functions faster and over wider spatial domains than the alternatives, yet providing stronger or equal guarantees.
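
The counterexample-guided loop described above has a simple schematic form: a learner fits a candidate Lyapunov network on a sample set, and a verifier (SMT solving in the paper; an abstract callable here) either certifies the Lyapunov criteria over the whole domain or returns a counterexample that is added to the samples.

```python
# CEGIS-style synthesis loop for a Lyapunov neural network.
def cegis_lyapunov(learner_fit, verifier, samples, max_rounds=50):
    for _ in range(max_rounds):
        V = learner_fit(samples)         # train NN to satisfy the criteria on samples
        counterexample = verifier(V)     # SMT check over the whole domain
        if counterexample is None:
            return V                     # provably correct Lyapunov function
        samples.append(counterexample)   # refine the training set and retry
    return None                          # no certificate within the budget
```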

Proceedings Article
03 May 2021
TL;DR: This paper proposes to use the backward mode linear relaxation based perturbation analysis (LiRPA) to replace LP during the BaB process, which can be efficiently implemented on the typical machine learning accelerators such as GPUs and TPUs and demonstrates an order of magnitude speedup compared to existing LP-based approaches.
Abstract: Formal verification of neural networks (NNs) is a challenging and important problem. Existing efficient complete solvers typically require the branch-and-bound (BaB) process, which splits the problem domain into sub-domains and solves each sub-domain using faster but weaker incomplete verifiers, such as Linear Programming (LP) on linearly relaxed sub-domains. In this paper, we propose to use the backward mode linear relaxation based perturbation analysis (LiRPA) to replace LP during the BaB process, which can be efficiently implemented on the typical machine learning accelerators such as GPUs and TPUs. However, unlike LP, LiRPA when applied naively can produce much weaker bounds and even cannot check certain conflicts of sub-domains during splitting, making the entire procedure incomplete after BaB. To address these challenges, we apply a fast gradient based bound tightening procedure combined with batch splits and the design of minimal usage of LP bound procedure, enabling us to effectively use LiRPA on the accelerator hardware for the challenging complete NN verification problem and significantly outperform LP-based approaches. On a single GPU, we demonstrate an order of magnitude speedup compared to existing LP-based approaches.
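
To see why bound propagation maps well onto accelerators, here is a minimal interval bound propagation (IBP) sketch, a cheaper relative of the LiRPA bounds used in the paper; it is shown only to convey the layer-by-layer tensor-op structure and is not the paper's LiRPA algorithm.

```python
# Sound interval bounds through a linear layer and a ReLU,
# computed with plain (GPU-friendly) tensor operations.
import torch

def ibp_linear(W, b, lo, hi):
    # Bounds for x -> x @ W.T + b when lo <= x <= hi elementwise.
    center, radius = (lo + hi) / 2, (hi - lo) / 2
    new_center = center @ W.t() + b
    new_radius = radius @ W.abs().t()
    return new_center - new_radius, new_center + new_radius

def ibp_relu(lo, hi):
    return torch.clamp(lo, min=0.0), torch.clamp(hi, min=0.0)
```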

Journal ArticleDOI
TL;DR: A novel classification scheme for fairness metrics in machine learning is proposed based on how they handle pre-existing bias and thus align with the aims of non-discrimination law, and concrete recommendations are provided, including a user-friendly checklist, for choosing the most appropriate fairness metric for uses of machine learning and AI under EU non-discrimination law.
Abstract: Western societies are marked by diverse and extensive biases and inequality that are unavoidably embedded in the data used to train machine learning. Algorithms trained on biased data will, without intervention, produce biased outcomes and increase the inequality experienced by historically disadvantaged groups. Recognising this problem, much work has emerged in recent years to test for bias in machine learning and AI systems using various fairness and bias metrics. Often these metrics address technical bias but ignore the underlying causes of inequality. In this paper we make three contributions. First, we assess the compatibility of fairness metrics used in machine learning against the aims and purpose of EU non-discrimination law. We show that the fundamental aim of the law is not only to prevent ongoing discrimination, but also to change society, policies, and practices to ‘level the playing field’ and achieve substantive rather than merely formal equality. Based on this, we then propose a novel classification scheme for fairness metrics in machine learning based on how they handle pre-existing bias and thus align with the aims of non-discrimination law. Specifically, we distinguish between ‘bias preserving’ and ‘bias transforming’ fairness metrics. Our classification system is intended to bridge the gap between non-discrimination law and decisions around how to measure fairness in machine learning and AI in practice. Finally, we show that the legal need for justification in cases of indirect discrimination can impose additional obligations on developers, deployers, and users that choose to use bias preserving fairness metrics when making decisions about individuals because they can give rise to prima facie discrimination. To achieve substantive equality in practice, and thus meet the aims of the law, we instead recommend using bias transforming metrics. To conclude, we provide concrete recommendations including a user-friendly checklist for choosing the most appropriate fairness metric for uses of machine learning and AI under EU non-discrimination law.
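
For concreteness, two widely used metrics from this literature can be computed as below: a demographic parity gap, which ignores (possibly biased) ground-truth labels, and a true-positive-rate gap, which conditions on them. How each metric falls under the paper's preserving/transforming scheme is the paper's analysis; this sketch only shows the computations.

```python
# Two fairness metric computations over binary predictions.
import numpy as np

def demographic_parity_gap(y_pred, group):
    rates = [y_pred[group == g].mean() for g in np.unique(group)]
    return max(rates) - min(rates)       # gap in positive-prediction rates

def true_positive_rate_gap(y_pred, y_true, group):
    rates = [y_pred[(group == g) & (y_true == 1)].mean()
             for g in np.unique(group)]
    return max(rates) - min(rates)       # gap in TPR across groups
```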

Journal ArticleDOI
TL;DR: In this article, the authors assessed the relationship between body mass index (BMI) classes and early COVID-19 prognosis in inpatients with type 2 diabetes (T2D), finding that overweight and obesity were associated with poorer early outcomes.
Abstract: Aim: To assess the relationship between body mass index (BMI) classes and early COVID-19 prognosis in inpatients with type 2 diabetes (T2D). Methods: From the CORONAvirus-SARS-CoV-2 and Diabetes Outcomes (CORONADO) study, we conducted an analysis in patients with T2D categorized by four BMI subgroups according to the World Health Organization classification. Clinical characteristics and COVID-19-related outcomes (i.e. intubation for mechanical ventilation [IMV], death and discharge by day 7 [D7]) were analysed according to BMI status. Results: Among 1965 patients with T2D, 434 (22.1%) normal weight (18.5-24.9 kg/m2, reference group), 726 (36.9%) overweight (25-29.9 kg/m2) and 805 (41.0%) obese subjects were analysed, including 491 (25.0%) with class I obesity (30-34.9 kg/m2) and 314 (16.0%) with class II/III obesity (≥35 kg/m2). In a multivariable-adjusted model, the primary outcome (i.e. IMV and/or death by D7) was significantly associated with overweight (OR 1.65 [1.05-2.59]), class I (OR 1.93 [1.19-3.14]) and class II/III obesity (OR 1.98 [1.11-3.52]). After multivariable adjustment, the primary outcome by D7 was significantly associated with obesity in patients aged younger than 75 years, while such an association was no longer found in those aged older than 75 years. Conclusions: Overweight and obesity are associated with poor early prognosis in patients with T2D hospitalized for COVID-19. Importantly, the deleterious impact of obesity on COVID-19 prognosis was no longer observed in the elderly, highlighting the need for specific management in this population.

Proceedings ArticleDOI
01 Jun 2021
TL;DR: This article proposed an exponential moving average normalization (EMAN) to improve the performance of student-teacher based self- and semi-supervised learning techniques, which reduces the intrinsic cross-sample dependency of BN and enhances the generalization of the teacher.
Abstract: We present a plug-in replacement for batch normalization (BN) called exponential moving average normalization (EMAN), which improves the performance of existing student-teacher based self- and semi-supervised learning techniques. Unlike the standard BN, where the statistics are computed within each batch, EMAN, used in the teacher, updates its statistics by exponential moving average from the BN statistics of the student. This design reduces the intrinsic cross-sample dependency of BN and enhances the generalization of the teacher. EMAN improves strong baselines for self-supervised learning by 4-6/1-2 points and semi-supervised learning by about 7/2 points, when 1%/10% supervised labels are available on ImageNet. These improvements are consistent across methods, network architectures, training duration, and datasets, demonstrating the general effectiveness of this technique. The code will be made available online.
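
A minimal sketch of the EMAN update described above: unlike standard BN, the teacher's BatchNorm buffers (running mean and variance) are also exponential moving averages of the student's, so the teacher never computes batch statistics of its own. The momentum value is an assumption.

```python
# EMAN: EMA over parameters AND BatchNorm buffers.
import torch

@torch.no_grad()
def eman_update(teacher, student, momentum=0.999):
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(momentum).add_(ps, alpha=1.0 - momentum)
    for bt, bs in zip(teacher.buffers(), student.buffers()):
        if bt.dtype.is_floating_point:   # BN running_mean / running_var
            bt.mul_(momentum).add_(bs, alpha=1.0 - momentum)
        else:                            # e.g., num_batches_tracked
            bt.copy_(bs)
```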