Showing papers on "Representation (systemics)" published in 2020


Journal ArticleDOI
TL;DR: A comprehensive survey of the recent achievements in generic object detection brought about by deep learning techniques, covering detection frameworks, object feature representation, object proposal generation, context modeling, training strategies, and evaluation metrics.
Abstract: Object detection, one of the most fundamental and challenging problems in computer vision, seeks to locate object instances from a large number of predefined categories in natural images. Deep learning techniques have emerged as a powerful strategy for learning feature representations directly from data and have led to remarkable breakthroughs in the field of generic object detection. Given this period of rapid evolution, the goal of this paper is to provide a comprehensive survey of the recent achievements in this field brought about by deep learning techniques. More than 300 research contributions are included in this survey, covering many aspects of generic object detection: detection frameworks, object feature representation, object proposal generation, context modeling, training strategies, and evaluation metrics. We finish the survey by identifying promising directions for future research.

1,897 citations


Book ChapterDOI
23 Aug 2020
TL;DR: This paper addresses the semantic segmentation problem with a focus on the context aggregation strategy, and presents a simple yet effective approach, object-contextual representations, characterizing a pixel by exploiting the representation of the corresponding object class.
Abstract: In this paper, we study the context aggregation problem in semantic segmentation. Motivated by the fact that the label of a pixel is the category of the object that the pixel belongs to, we present a simple yet effective approach, object-contextual representations, characterizing a pixel by exploiting the representation of the corresponding object class. First, we learn object regions under the supervision of the ground-truth segmentation. Second, we compute the object region representation by aggregating the representations of the pixels lying in the object region. Last, we compute the relation between each pixel and each object region, and augment the representation of each pixel with the object-contextual representation, which is a weighted aggregation of all the object region representations. We empirically demonstrate that our method achieves competitive performance on various benchmarks: Cityscapes, ADE20K, LIP, PASCAL-Context and COCO-Stuff. Our submission “HRNet + OCR + SegFix” achieves 1st place on the Cityscapes leaderboard by the ECCV 2020 submission deadline. Code is available at: https://git.io/openseg and https://git.io/HRNet.OCR.
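As a rough illustration of the three aggregation steps above, the following PyTorch sketch augments pixel features with object-contextual representations. Tensor names, shapes, and the similarity scaling are illustrative assumptions, not the authors' released code (see https://git.io/openseg for that).

import torch
import torch.nn.functional as F

def ocr_augment(pixel_feats, region_logits):
    # pixel_feats:   (B, C, H, W) per-pixel features from a backbone
    # region_logits: (B, K, H, W) coarse soft object-region scores (K classes/regions)
    B, C, H, W = pixel_feats.shape
    feats = pixel_feats.flatten(2)                                 # (B, C, HW)
    masks = F.softmax(region_logits.flatten(2), dim=-1)            # (B, K, HW) soft regions
    # 1) Object-region representation: pool pixel features inside each soft region.
    region_feats = torch.einsum('bkn,bcn->bkc', masks, feats)      # (B, K, C)
    # 2) Pixel-region relation: similarity between every pixel and every region.
    relation = torch.einsum('bcn,bkc->bnk', feats, region_feats)   # (B, HW, K)
    relation = F.softmax(relation / C ** 0.5, dim=-1)
    # 3) Object-contextual representation: weighted sum of region representations.
    context = torch.einsum('bnk,bkc->bcn', relation, region_feats).reshape(B, C, H, W)
    return torch.cat([pixel_feats, context], dim=1)                # (B, 2C, H, W) augmented pixels

out = ocr_augment(torch.randn(2, 64, 32, 32), torch.randn(2, 19, 32, 32))
print(out.shape)  # torch.Size([2, 128, 32, 32])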

952 citations


Proceedings ArticleDOI
14 Jun 2020
TL;DR: This work proposes a novel framework, called InterFaceGAN, for semantic face editing by interpreting the latent semantics learned by GANs, and finds that the latent code of well-trained generative models actually learns a disentangled representation after linear transformations.
Abstract: Despite the recent advance of Generative Adversarial Networks (GANs) in high-fidelity image synthesis, there lacks enough understanding of how GANs are able to map a latent code sampled from a random distribution to a photo-realistic image. Previous work assumes the latent space learned by GANs follows a distributed representation but observes the vector arithmetic phenomenon. In this work, we propose a novel framework, called InterFaceGAN, for semantic face editing by interpreting the latent semantics learned by GANs. In this framework, we conduct a detailed study on how different semantics are encoded in the latent space of GANs for face synthesis. We find that the latent code of well-trained generative models actually learns a disentangled representation after linear transformations. We explore the disentanglement between various semantics and manage to decouple some entangled semantics with subspace projection, leading to more precise control of facial attributes. Besides manipulating gender, age, expression, and the presence of eyeglasses, we can even vary the face pose as well as fix the artifacts accidentally generated by GAN models. The proposed method is further applied to achieve real image manipulation when combined with GAN inversion methods or some encoder-involved models. Extensive results suggest that learning to synthesize faces spontaneously brings a disentangled and controllable facial attribute representation.
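The editing operation the abstract describes reduces to moving a latent code along the normal of a hyperplane that separates an attribute in latent space. A minimal sketch, assuming a pretrained generator G and latent codes with binary attribute labels; the data below is a synthetic stand-in, not the authors' pipeline.

import numpy as np
from sklearn.svm import LinearSVC

def attribute_direction(latents, labels):
    # Fit a linear boundary in latent space; its unit normal is the semantic edit direction.
    svm = LinearSVC(C=1.0, max_iter=10000).fit(latents, labels)
    normal = svm.coef_.reshape(-1)
    return normal / np.linalg.norm(normal)

def edit(z, direction, alpha):
    # Move the latent code along the attribute direction; alpha sets strength and sign.
    return z + alpha * direction

latents = np.random.randn(1000, 512)                         # codes sampled from the GAN prior
labels = (latents @ np.random.randn(512) > 0).astype(int)    # stand-in for attribute labels
direction = attribute_direction(latents, labels)
z_edited = edit(np.random.randn(512), direction, alpha=3.0)  # feed z_edited to the generator G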

560 citations


Posted Content
TL;DR: This paper proposes the Local Implicit Image Function (LIIF), which takes an image coordinate and the 2D deep features around that coordinate as inputs and predicts the RGB value at the coordinate as output, building a bridge between discrete and continuous image representation in 2D.
Abstract: How to represent an image? While the visual world is presented in a continuous manner, machines store and see the images in a discrete way with 2D arrays of pixels. In this paper, we seek to learn a continuous representation for images. Inspired by the recent progress in 3D reconstruction with implicit neural representation, we propose Local Implicit Image Function (LIIF), which takes an image coordinate and the 2D deep features around the coordinate as inputs, predicts the RGB value at a given coordinate as an output. Since the coordinates are continuous, LIIF can be presented in arbitrary resolution. To generate the continuous representation for images, we train an encoder with LIIF representation via a self-supervised task with super-resolution. The learned continuous representation can be presented in arbitrary resolution even extrapolate to x30 higher resolution, where the training tasks are not provided. We further show that LIIF representation builds a bridge between discrete and continuous representation in 2D, it naturally supports the learning tasks with size-varied image ground-truths and significantly outperforms the method with resizing the ground-truths.
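A minimal sketch of the decoding step described above: an MLP conditioned on a local latent code and a continuous query coordinate returns an RGB value, so the image can be queried at any resolution. Shapes, the nearest-neighbour feature lookup, and layer sizes are assumptions, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LIIFDecoder(nn.Module):
    def __init__(self, feat_dim=64, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, feat_map, coords):
        # feat_map: (B, C, H, W) encoder features; coords: (B, N, 2) query points in [-1, 1].
        grid = coords.unsqueeze(1)                                       # (B, 1, N, 2)
        z = F.grid_sample(feat_map, grid, mode='nearest', align_corners=False)
        z = z.squeeze(2).permute(0, 2, 1)                                # (B, N, C) local latent codes
        return self.mlp(torch.cat([z, coords], dim=-1))                  # (B, N, 3) RGB values

decoder = LIIFDecoder()
rgb = decoder(torch.randn(1, 64, 48, 48), torch.rand(1, 1024, 2) * 2 - 1)
print(rgb.shape)  # torch.Size([1, 1024, 3])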

198 citations


Posted Content
TL;DR: This work combines a scene representation network with a low-dimensional morphable model which provides explicit control over pose and expressions and shows that this learned volumetric representation allows for photorealistic image generation that surpasses the quality of state-of-the-art video-based reenactment methods.
Abstract: We present dynamic neural radiance fields for modeling the appearance and dynamics of a human face. Digitally modeling and reconstructing a talking human is a key building-block for a variety of applications. Especially, for telepresence applications in AR or VR, a faithful reproduction of the appearance including novel viewpoints or head-poses is required. In contrast to state-of-the-art approaches that model the geometry and material properties explicitly, or are purely image-based, we introduce an implicit representation of the head based on scene representation networks. To handle the dynamics of the face, we combine our scene representation network with a low-dimensional morphable model which provides explicit control over pose and expressions. We use volumetric rendering to generate images from this hybrid representation and demonstrate that such a dynamic neural scene representation can be learned from monocular input data only, without the need of a specialized capture setup. In our experiments, we show that this learned volumetric representation allows for photo-realistic image generation that surpasses the quality of state-of-the-art video-based reenactment methods.

185 citations


Proceedings ArticleDOI
Yuan Yao, Chang Liu, Dezhao Luo, Yu Zhou, Qixiang Ye
14 Jun 2020
TL;DR: A novel self-supervised method, referred to as video Playback Rate Perception (PRP), learns spatio-temporal representation in a simple yet effective way and outperforms state-of-the-art self-supervised models by significant margins.
Abstract: In self-supervised spatio-temporal representation learning, the temporal resolution and long-short term characteristics are not yet fully explored, which limits representation capabilities of learned models. In this paper, we propose a novel self-supervised method, referred to as video Playback Rate Perception (PRP), to learn spatio-temporal representation in a simple-yet-effective way. PRP roots in a dilated sampling strategy, which produces self-supervision signals about video playback rates for representation model learning. PRP is implemented with a feature encoder, a classification module, and a reconstructing decoder, to achieve spatio-temporal semantic retention in a collaborative discrimination-generation manner. The discriminative perception model follows a feature encoder to prefer perceiving low temporal resolution and long-term representation by classifying fast-forward rates. The generative perception model acts as a feature decoder to focus on comprehending high temporal resolution and short-term representation by introducing a motion-attention mechanism. PRP is applied on typical video target tasks including action recognition and video retrieval. Experiments show that PRP outperforms state-of-the-art self-supervised models with significant margins. Code is available at github.com/yuanyao366/PRP.
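The self-supervision signal comes from how densely a clip is sampled in time. Below is a minimal sketch of the dilated sampling; clip length, rates, and data layout are illustrative assumptions rather than the released pipeline at github.com/yuanyao366/PRP.

import numpy as np

def playback_rate_sample(video, clip_len=16, rates=(1, 2, 4, 8)):
    # video: (T, H, W, 3) array of frames. Returns a dilated clip and its rate label.
    rate_label = np.random.randint(len(rates))
    stride = rates[rate_label]
    max_start = max(video.shape[0] - clip_len * stride, 1)
    start = np.random.randint(max_start)
    clip = video[start:start + clip_len * stride:stride]    # dilated (fast-forward) sampling
    return clip, rate_label                                 # label requires no human annotation

video = np.zeros((300, 112, 112, 3), dtype=np.uint8)        # toy stand-in for a decoded video
clip, label = playback_rate_sample(video)
print(clip.shape, label)                                    # (16, 112, 112, 3) and the rate class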

165 citations


Proceedings ArticleDOI
14 Jun 2020
TL;DR: A deep network based on PointNet++ is developed that predicts ANCSH from a single depth point cloud, including part segmentation, normalized coordinates, and joint parameters in the canonical object space, and the benefits of leveraging the canonicalized joints are demonstrated.
Abstract: This paper addresses the task of category-level pose estimation for articulated objects from a single depth image. We present a novel category-level approach that correctly accommodates object instances previously unseen during training. We introduce Articulation-aware Normalized Coordinate Space Hierarchy (ANCSH) – a canonical representation for different articulated objects in a given category. As the key to achieve intra-category generalization, the representation constructs a canonical object space as well as a set of canonical part spaces. The canonical object space normalizes the object orientation, scales and articulations (e.g. joint parameters and states) while each canonical part space further normalizes its part pose and scale. We develop a deep network based on PointNet++ that predicts ANCSH from a single depth point cloud, including part segmentation, normalized coordinates, and joint parameters in the canonical object space. By leveraging the canonicalized joints, we demonstrate: 1) improved performance in part pose and scale estimations using the induced kinematic constraints from joints; 2) high accuracy for joint parameter estimation in camera space.

142 citations


ReportDOI
30 Sep 2020
TL;DR: The Concise Binary Object Representation is a data format whose design goals include the possibility of extremely small code size, fairly small message size, and extensibility without the need for version negotiation.
Abstract: The Concise Binary Object Representation (CBOR) is a data format whose design goals include the possibility of extremely small code size, fairly small message size, and extensibility without the need for version negotiation. These design goals make it different from earlier binary serializations such as ASN.1 and MessagePack.
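A quick illustration of the format using the third-party cbor2 package (one of several CBOR implementations, installable with pip install cbor2); the record itself is arbitrary example data.

import json
import cbor2

record = {"id": 42, "name": "sensor-7", "ok": True, "readings": [1.5, 2.25, 3.0]}

cbor_bytes = cbor2.dumps(record)                  # compact, schema-free binary encoding
json_bytes = json.dumps(record).encode("utf-8")

print(len(cbor_bytes), "bytes as CBOR vs", len(json_bytes), "bytes as JSON")
assert cbor2.loads(cbor_bytes) == record          # round-trips without version negotiation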

138 citations


Journal ArticleDOI
TL;DR: It is demonstrated that changes in the geometry of the associated object manifolds underlie this improved capacity, and shed light on the functional roles different levels in the hierarchy play to achieve it, through orchestrated reduction of manifolds’ radius, dimensionality and inter-manifold correlations.
Abstract: Stimuli are represented in the brain by the collective population responses of sensory neurons, and an object presented under varying conditions gives rise to a collection of neural population responses called an 'object manifold'. Changes in the object representation along a hierarchical sensory system are associated with changes in the geometry of those manifolds, and recent theoretical progress connects this geometry with 'classification capacity', a quantitative measure of the ability to support object classification. Deep neural networks trained on object classification tasks are a natural testbed for the applicability of this relation. We show how classification capacity improves along the hierarchies of deep neural networks with different architectures. We demonstrate that changes in the geometry of the associated object manifolds underlie this improved capacity, and shed light on the functional roles different levels in the hierarchy play to achieve it, through orchestrated reduction of manifolds' radius, dimensionality and inter-manifold correlations.
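As a rough, simplified proxy for two of the quantities discussed above, the snippet below estimates each object manifold's radius and participation-ratio dimensionality from class-conditioned features; the paper's mean-field capacity measures are more involved.

import numpy as np

def manifold_geometry(features, labels):
    # features: (N, D) layer activations; labels: (N,) object classes.
    stats = {}
    for c in np.unique(labels):
        X = features[labels == c]
        X = X - X.mean(axis=0)                                   # center the object manifold
        eigs = np.clip(np.linalg.eigvalsh(np.cov(X, rowvar=False)), 0, None)
        radius = np.sqrt(eigs.sum())                             # overall spread of the manifold
        dim = eigs.sum() ** 2 / (eigs ** 2).sum()                # participation-ratio dimensionality
        stats[int(c)] = (radius, dim)
    return stats

feats = np.random.randn(500, 128)                                # toy stand-in for layer activations
labels = np.random.randint(0, 10, size=500)
print(manifold_geometry(feats, labels)[0])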

120 citations


Proceedings Article
30 Apr 2020
TL;DR: This work shows that state-of-the-art word representation learning methods maximize an objective function that is a lower bound on the mutual information between different parts of a word sequence (i.e., a sentence).
Abstract: We show state-of-the-art word representation learning methods maximize an objective function that is a lower bound on the mutual information between different parts of a word sequence (i.e., a sentence). Our formulation provides an alternative perspective that unifies classical word embedding models (e.g., Skip-gram) and modern contextual embeddings (e.g., BERT, XLNet). In addition to enhancing our theoretical understanding of these methods, our derivation leads to a principled framework that can be used to construct new self-supervised tasks. We provide an example by drawing inspirations from related methods based on mutual information maximization that have been successful in computer vision, and introduce a simple self-supervised objective that maximizes the mutual information between a global sentence representation and n-grams in the sentence. Our analysis offers a holistic view of representation learning methods to transfer knowledge and translate progress across multiple domains (e.g., natural language processing, computer vision, audio processing).
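A minimal sketch of the kind of objective the abstract proposes: an InfoNCE-style lower bound on the mutual information between a global sentence representation and an n-gram representation from the same sentence, with other sentences in the batch acting as negatives. The encoders producing the two representations are omitted and assumed.

import torch
import torch.nn.functional as F

def sentence_ngram_info_nce(sentence_repr, ngram_repr, temperature=0.1):
    # sentence_repr: (B, D) global sentence vectors; ngram_repr: (B, D) one positive n-gram each.
    s = F.normalize(sentence_repr, dim=-1)
    g = F.normalize(ngram_repr, dim=-1)
    logits = s @ g.t() / temperature                     # (B, B) similarities; diagonal = positives
    targets = torch.arange(s.size(0), device=s.device)
    return F.cross_entropy(logits, targets)              # minimizing this maximizes an MI lower bound

loss = sentence_ngram_info_nce(torch.randn(32, 256), torch.randn(32, 256))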

119 citations


Journal ArticleDOI
TL;DR: This novel 3D orientation estimation is based on a variant of the Denoising Autoencoder that is trained on simulated views of a 3D model using Domain Randomization and achieves state-of-the-art performance on the T-LESS dataset both in the RGB and RGB-D domain.
Abstract: We propose a real-time RGB-based pipeline for object detection and 6D pose estimation. Our novel 3D orientation estimation is based on a variant of the Denoising Autoencoder that is trained on simulated views of a 3D model using Domain Randomization. This so-called Augmented Autoencoder has several advantages over existing methods: It does not require real, pose-annotated training data, generalizes to various test sensors and inherently handles object and view symmetries. Instead of learning an explicit mapping from input images to object poses, it provides an implicit representation of object orientations defined by samples in a latent space. Our pipeline achieves state-of-the-art performance on the T-LESS dataset both in the RGB and RGB-D domain. We also evaluate on the LineMOD dataset where we can compete with other synthetically trained approaches. We further increase performance by correcting 3D orientation estimates to account for perspective errors when the object deviates from the image center and show extended results. Our code is available here https://github.com/DLR-RM/AugmentedAutoencoder.
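A minimal sketch of the implicit orientation representation: latent codes of many rendered views form a codebook, and the test-time orientation is read off the most similar code. The tiny encoder and identity rotations below are placeholders for the trained Augmented Autoencoder and its sampled poses.

import torch
import torch.nn.functional as F

@torch.no_grad()
def build_codebook(encoder, rendered_views, rotations):
    # rendered_views: (N, 3, H, W) synthetic crops; rotations: list of N rotation matrices.
    codes = F.normalize(encoder(rendered_views), dim=-1)
    return codes, rotations

@torch.no_grad()
def estimate_orientation(encoder, crop, codes, rotations):
    z = F.normalize(encoder(crop.unsqueeze(0)), dim=-1)
    sims = (z @ codes.t()).squeeze(0)                     # cosine similarity to every codebook entry
    return rotations[int(sims.argmax())]                  # pose of the most similar rendered view

encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 64 * 64, 128))  # placeholder
codes, rots = build_codebook(encoder, torch.randn(100, 3, 64, 64), [torch.eye(3)] * 100)
R = estimate_orientation(encoder, torch.randn(3, 64, 64), codes, rots)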

Journal ArticleDOI
TL;DR: This paper presents a feature representation and fusion model to combine the feature representation of the object in RGB and infrared modalities for object tracking and demonstrates the effectiveness of the proposed method.

Proceedings ArticleDOI
01 Jul 2020
TL;DR: An extraction system that uses knowledge of the types of the target fields to generate extraction candidates and a neural network architecture that learns a dense representation of each candidate based on neighboring words in the document is proposed.
Abstract: We propose a novel approach using representation learning for tackling the problem of extracting structured information from form-like document images. We propose an extraction system that uses knowledge of the types of the target fields to generate extraction candidates and a neural network architecture that learns a dense representation of each candidate based on neighboring words in the document. These learned representations are not only useful in solving the extraction task for unseen document templates from two different domains but are also interpretable, as we show using loss cases.

Posted Content
TL;DR: This paper demonstrates that self-supervised learned representations can extract task-relevant information and discard task-irrelevant information, and proposes a composite objective that bridges the gap between prior contrastive and predictive learning objectives, introducing an additional objective term to discard task-irrelevant information.
Abstract: As a subset of unsupervised representation learning, self-supervised representation learning adopts self-defined signals as supervision and uses the learned representation for downstream tasks, such as object detection and image captioning. Many proposed approaches for self-supervised learning follow naturally a multi-view perspective, where the input (e.g., original images) and the self-supervised signals (e.g., augmented images) can be seen as two redundant views of the data. Building from this multi-view perspective, this paper provides an information-theoretical framework to better understand the properties that encourage successful self-supervised learning. Specifically, we demonstrate that self-supervised learned representations can extract task-relevant information and discard task-irrelevant information. Our theoretical framework paves the way to a larger space of self-supervised learning objective design. In particular, we propose a composite objective that bridges the gap between prior contrastive and predictive learning objectives, and introduce an additional objective term to discard task-irrelevant information. To verify our analysis, we conduct controlled experiments to evaluate the impact of the composite objectives. We also explore our framework's empirical generalization beyond the multi-view perspective, where the cross-view redundancy may not be clearly observed.
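A minimal sketch of a composite objective in the spirit of the abstract: a contrastive term between two views, a predictive (reconstruction) term, and a crude penalty standing in for the term that discards task-irrelevant information. The weights and estimators are assumptions, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def composite_loss(z1, z2, x2_pred, x2, lambda_pred=1.0, lambda_discard=0.1, tau=0.1):
    # z1, z2: (B, D) codes of two views; x2_pred, x2: prediction and target for the other view.
    logits = F.normalize(z1, dim=-1) @ F.normalize(z2, dim=-1).t() / tau
    targets = torch.arange(z1.size(0), device=z1.device)
    contrastive = F.cross_entropy(logits, targets)        # keep information shared across views
    predictive = F.mse_loss(x2_pred, x2)                  # predict the other (self-supervised) view
    discard = z1.pow(2).mean()                            # stand-in capacity penalty on the code
    return contrastive + lambda_pred * predictive + lambda_discard * discard

loss = composite_loss(torch.randn(16, 128), torch.randn(16, 128),
                      torch.randn(16, 784), torch.randn(16, 784))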

Journal ArticleDOI
03 Apr 2020
TL;DR: This paper proposes a model augmented with hierarchical contextualized representation, which takes the different contributions of words in a single sentence into consideration to enhance the sentence representation learned from an independent BiLSTM via a label embedding attention mechanism.
Abstract: Named entity recognition (NER) models are typically based on the architecture of Bi-directional LSTM (BiLSTM). The constraints of sequential nature and the modeling of single input prevent the full utilization of global information from larger scope, not only in the entire sentence, but also in the entire document (dataset). In this paper, we address these two deficiencies and propose a model augmented with hierarchical contextualized representation: sentence-level representation and document-level representation. In sentence-level, we take different contributions of words in a single sentence into consideration to enhance the sentence representation learned from an independent BiLSTM via label embedding attention mechanism. In document-level, the key-value memory network is adopted to record the document-aware information for each unique word which is sensitive to similarity of context information. Our two-level hierarchical contextualized representations are fused with each input token embedding and corresponding hidden state of BiLSTM, respectively. The experimental results on three benchmark NER datasets (CoNLL-2003 and Ontonotes 5.0 English datasets, CoNLL-2002 Spanish dataset) show that we establish new state-of-the-art results.
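A minimal sketch of the sentence-level piece described above: BiLSTM word states attend to learned label embeddings, and the resulting word weights pool the states into a sentence representation. The dimensions and pooling rule are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LabelAttentionSentenceRepr(nn.Module):
    def __init__(self, hidden_dim=256, num_labels=9):
        super().__init__()
        self.label_emb = nn.Embedding(num_labels, hidden_dim)    # one embedding per NER label

    def forward(self, word_states):
        # word_states: (B, T, H) outputs of an independent BiLSTM over the sentence.
        scores = word_states @ self.label_emb.weight.t()         # (B, T, L) word-label affinities
        word_weight = F.softmax(scores.max(dim=-1).values, dim=-1)       # (B, T) word contributions
        return (word_weight.unsqueeze(-1) * word_states).sum(dim=1)      # (B, H) sentence vector

sent_repr = LabelAttentionSentenceRepr()(torch.randn(4, 20, 256))
print(sent_repr.shape)  # torch.Size([4, 256]); fused with token embeddings downstream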

Posted Content
TL;DR: The generative model is able to synthesize high-quality human grasps given only a 3D object point cloud, and achieves performance comparable to state-of-the-art methods for 3D hand reconstruction.
Abstract: Robotic grasping of household objects has made remarkable progress in recent years. Yet, human grasps are still difficult to synthesize realistically. There are several key reasons: (1) the human hand has many degrees of freedom (more than robotic manipulators); (2) the synthesized hand should conform to the surface of the object; and (3) it should interact with the object in a semantically and physically plausible manner. To make progress in this direction, we draw inspiration from the recent progress on learning-based implicit representations for 3D object reconstruction. Specifically, we propose an expressive representation for human grasp modelling that is efficient and easy to integrate with deep neural networks. Our insight is that every point in a three-dimensional space can be characterized by the signed distances to the surface of the hand and the object, respectively. Consequently, the hand, the object, and the contact area can be represented by implicit surfaces in a common space, in which the proximity between the hand and the object can be modelled explicitly. We name this 3D to 2D mapping the Grasping Field, parameterize it with a deep neural network, and learn it from data. We demonstrate that the proposed grasping field is an effective and expressive representation for human grasp generation. Specifically, our generative model is able to synthesize high-quality human grasps, given only a 3D object point cloud. The extensive experiments demonstrate that our generative model compares favorably with a strong baseline and approaches the level of natural human grasps. Our method improves the physical plausibility of the hand-object contact reconstruction and achieves comparable performance for 3D hand reconstruction compared to state-of-the-art methods.
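A minimal sketch of the core mapping: a network conditioned on an object encoding takes a 3D query point and returns two signed distances, one to the hand surface and one to the object surface. The architecture and conditioning are illustrative assumptions, not the authors' model.

import torch
import torch.nn as nn

class GraspingFieldMLP(nn.Module):
    def __init__(self, object_code_dim=256, hidden=512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + object_code_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),                        # (signed distance to hand, to object)
        )

    def forward(self, points, object_code):
        # points: (B, N, 3) query points; object_code: (B, D) from a point-cloud encoder.
        code = object_code.unsqueeze(1).expand(-1, points.size(1), -1)
        return self.mlp(torch.cat([points, code], dim=-1))       # (B, N, 2) signed distances

field = GraspingFieldMLP()
sdf = field(torch.randn(2, 1024, 3), torch.randn(2, 256))
print(sdf.shape)  # torch.Size([2, 1024, 2])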

Journal ArticleDOI
TL;DR: A deep neural network tensor factorization method, Avocado, is proposed that compresses epigenomic data into a dense, information-rich representation, and it is shown that machine learning models that exploit this representation outperform those trained directly on epigenomic data on a variety of genomics tasks.
Abstract: The human epigenome has been experimentally characterized by thousands of measurements for every basepair in the human genome. We propose a deep neural network tensor factorization method, Avocado, that compresses this epigenomic data into a dense, information-rich representation. We use this learned representation to impute epigenomic data more accurately than previous methods, and we show that machine learning models that exploit this representation outperform those trained directly on epigenomic data on a variety of genomics tasks. These tasks include predicting gene expression, promoter-enhancer interactions, replication timing, and an element of 3D chromatin architecture.
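A minimal sketch of the factorization idea: the signal for a (cell type, assay, genomic position) triple is predicted from learned embeddings of each axis, and those embeddings form the dense representation used downstream. Embedding sizes and the network are illustrative; Avocado's released model differs in detail.

import torch
import torch.nn as nn

class EpigenomeFactorization(nn.Module):
    def __init__(self, n_celltypes, n_assays, n_positions, dim=32):
        super().__init__()
        self.cell = nn.Embedding(n_celltypes, dim)
        self.assay = nn.Embedding(n_assays, dim)
        self.pos = nn.Embedding(n_positions, dim)
        self.head = nn.Sequential(nn.Linear(3 * dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, cell_idx, assay_idx, pos_idx):
        h = torch.cat([self.cell(cell_idx), self.assay(assay_idx), self.pos(pos_idx)], dim=-1)
        return self.head(h).squeeze(-1)                  # predicted epigenomic signal value

model = EpigenomeFactorization(n_celltypes=400, n_assays=84, n_positions=10000)
signal = model(torch.tensor([0, 1]), torch.tensor([3, 5]), torch.tensor([100, 200]))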


Journal ArticleDOI
TL;DR: As this paper shows, the right combination of quantum and classical computations allows for accurate quantum chemistry simulations using surprisingly few qubits.
Abstract: The right combination of quantum and classical computations allows for accurate quantum chemistry simulations using surprisingly few qubits.

Journal ArticleDOI
TL;DR: This article considers parameter estimation for a special bilinear system with colored noise and proposes a multi-innovation generalized extended stochastic gradient (MI-GESG) algorithm that can generate more accurate parameter estimates.

Posted Content
TL;DR: A detailed 2D-3D joint representation learning method for Human-Object Interaction (HOI) detection is proposed, along with a new benchmark named Ambiguous-HOI consisting of hard ambiguous images to better evaluate models' capacity to handle 2D ambiguity.
Abstract: Human-Object Interaction (HOI) detection lies at the core of action understanding. Besides 2D information such as human/object appearance and locations, 3D pose is also usually utilized in HOI learning since its view-independence. However, rough 3D body joints just carry sparse body information and are not sufficient to understand complex interactions. Thus, we need detailed 3D body shape to go further. Meanwhile, the interacted object in 3D is also not fully studied in HOI learning. In light of these, we propose a detailed 2D-3D joint representation learning method. First, we utilize the single-view human body capture method to obtain detailed 3D body, face and hand shapes. Next, we estimate the 3D object location and size with reference to the 2D human-object spatial configuration and object category priors. Finally, a joint learning framework and cross-modal consistency tasks are proposed to learn the joint HOI representation. To better evaluate the 2D ambiguity processing capacity of models, we propose a new benchmark named Ambiguous-HOI consisting of hard ambiguous images. Extensive experiments in large-scale HOI benchmark and Ambiguous-HOI show impressive effectiveness of our method. Code and data are available at this https URL.

Posted Content
TL;DR: This work is the first to introduce a technique for measuring the contribution of each error type in a way that isolates its effect on overall performance, and shows that such a representation is critical for drawing accurate, comprehensive conclusions through in-depth analysis across 4 datasets and 7 recognition models.
Abstract: We introduce TIDE, a framework and associated toolbox for analyzing the sources of error in object detection and instance segmentation algorithms. Importantly, our framework is applicable across datasets and can be applied directly to output prediction files without required knowledge of the underlying prediction system. Thus, our framework can be used as a drop-in replacement for the standard mAP computation while providing a comprehensive analysis of each model's strengths and weaknesses. We segment errors into six types and, crucially, are the first to introduce a technique for measuring the contribution of each error in a way that isolates its effect on overall performance. We show that such a representation is critical for drawing accurate, comprehensive conclusions through in-depth analysis across 4 datasets and 7 recognition models. Available at this https URL

Journal ArticleDOI
03 Apr 2020
TL;DR: A novel method of language attention-based VQA that learns decomposed linguistic representations of questions and utilizes the representations to infer answers for overcoming language priors is presented.
Abstract: Most existing Visual Question Answering (VQA) models overly rely on language priors between questions and answers. In this paper, we present a novel method of language attention-based VQA that learns decomposed linguistic representations of questions and utilizes the representations to infer answers for overcoming language priors. We introduce a modular language attention mechanism to parse a question into three phrase representations: type representation, object representation, and concept representation. We use the type representation to identify the question type and the possible answer set (yes/no or specific concepts such as colors or numbers), and the object representation to focus on the relevant region of an image. The concept representation is verified with the attended region to infer the final answer. The proposed method decouples the language-based concept discovery and vision-based concept verification in the process of answer inference to prevent language priors from dominating the answering process. Experiments on the VQA-CP dataset demonstrate the effectiveness of our method.

Journal ArticleDOI
TL;DR: The results warrant further investigation into the effects of preconception cannabis use in males and the potential effects on subsequent generations.
Abstract: Parental cannabis use has been associated with adverse neurodevelopmental outcomes in offspring, but how such phenotypes are transmitted is largely unknown. Using reduced representation bis...

Journal ArticleDOI
Min Oh1, Liqing Zhang1
TL;DR: This work proposes DeepMicro, a deep representation learning framework allowing for an effective representation of microbiome profiles that outperforms the current best approaches based on the strain-level marker profile in five different datasets in disease prediction.
Abstract: Human microbiota plays a key role in human health and growing evidence supports the potential use of microbiome as a predictor of various diseases. However, the high-dimensionality of microbiome data, often in the order of hundreds of thousands, yet low sample sizes, poses great challenge for machine learning-based prediction algorithms. This imbalance induces the data to be highly sparse, preventing from learning a better prediction model. Also, there has been little work on deep learning applications to microbiome data with a rigorous evaluation scheme. To address these challenges, we propose DeepMicro, a deep representation learning framework allowing for an effective representation of microbiome profiles. DeepMicro successfully transforms high-dimensional microbiome data into a robust low-dimensional representation using various autoencoders and applies machine learning classification algorithms on the learned representation. In disease prediction, DeepMicro outperforms the current best approaches based on the strain-level marker profile in five different datasets. In addition, by significantly reducing the dimensionality of the marker profile, DeepMicro accelerates the model training and hyperparameter optimization procedure with 8X-30X speedup over the basic approach. DeepMicro is freely available at https://github.com/minoh0201/DeepMicro.
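A minimal sketch of the recipe described above: compress high-dimensional microbiome profiles with an autoencoder, then train an ordinary classifier on the low-dimensional codes. The layer sizes, toy data, and downstream classifier are illustrative choices, not DeepMicro's exact configuration.

import numpy as np
import torch
import torch.nn as nn
from sklearn.linear_model import LogisticRegression

X = torch.rand(200, 5000)                         # toy stand-in: samples x marker features
y = np.random.randint(0, 2, size=200)             # toy disease labels

encoder = nn.Sequential(nn.Linear(5000, 512), nn.ReLU(), nn.Linear(512, 64))
decoder = nn.Sequential(nn.Linear(64, 512), nn.ReLU(), nn.Linear(512, 5000))
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

for _ in range(50):                                # train the autoencoder to reconstruct profiles
    opt.zero_grad()
    loss = nn.functional.mse_loss(decoder(encoder(X)), X)
    loss.backward()
    opt.step()

codes = encoder(X).detach().numpy()                # learned low-dimensional representation
clf = LogisticRegression(max_iter=1000).fit(codes, y)    # disease prediction on the codes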

Book ChapterDOI
23 Aug 2020
TL;DR: TIDE is a framework and associated toolbox (https://dbolya.github.io/tide/) for analyzing the sources of error in object detection and instance segmentation algorithms.
Abstract: We introduce TIDE, a framework and associated toolbox (https://dbolya.github.io/tide/) for analyzing the sources of error in object detection and instance segmentation algorithms. Importantly, our framework is applicable across datasets and can be applied directly to output prediction files without required knowledge of the underlying prediction system. Thus, our framework can be used as a drop-in replacement for the standard mAP computation while providing a comprehensive analysis of each model’s strengths and weaknesses. We segment errors into six types and, crucially, are the first to introduce a technique for measuring the contribution of each error in a way that isolates its effect on overall performance. We show that such a representation is critical for drawing accurate, comprehensive conclusions through in-depth analysis across 4 datasets and 7 recognition models.
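A hypothetical usage sketch of the released toolbox (pip install tidecv); the call names follow the project's README at the time of writing and the results filename is a placeholder, so check the repository for the current API.

from tidecv import TIDE, datasets

tide = TIDE()
tide.evaluate(datasets.COCO(),                               # ground-truth annotations
              datasets.COCOResult('my_bbox_results.json'),   # your prediction file (placeholder name)
              mode=TIDE.BOX)                                 # use TIDE.MASK for instance segmentation
tide.summarize()                                             # per-error-type contribution to mAP
tide.plot()                                                  # summary figure of the six error types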

Journal ArticleDOI
TL;DR: PERL is a representation learning model that extends contextualized word embedding models such as BERT with pivot-based fine-tuning; it outperforms strong baselines across 22 sentiment classification domain adaptation setups, improves in-domain model performance, yields effective reduced-size models, and increases model stability.
Abstract: Pivot-based neural representation models have led to significant progress in domain adaptation for NLP. However, previous research following this approach utilizes only labeled data from the source ...

Journal ArticleDOI
TL;DR: This paper examines the key features of CPSs and their relation with other system types, defines the dependencies between levels of automation and human roles in CPSs from a systems engineering perspective, and applies systems thinking to describe a multi-layered diagrammatic representation of CPSs for combined safety and security risk analysis.

Posted Content
TL;DR: This survey describes the variety of text representation methods and model designs that have blossomed in the context of NLP, including SOTA LMs, which can transform large volumes of text into effective vector representations capturing the same semantic information.
Abstract: Word representation has always been an important research area in the history of natural language processing (NLP). Understanding such complex text data is imperative, given that it is rich in information and can be used widely across various applications. In this survey, we explore different word representation models and their power of expression, from the classical to modern-day state-of-the-art word representation language models (LMs). We describe the variety of text representation methods and model designs that have blossomed in the context of NLP, including SOTA LMs. These models can transform large volumes of text into effective vector representations capturing the same semantic information. Further, such representations can be utilized by various machine learning (ML) algorithms for a variety of NLP-related tasks. In the end, this survey briefly discusses the commonly used ML- and DL-based classifiers, evaluation metrics and the applications of these word embeddings in different NLP tasks.

Journal ArticleDOI
TL;DR: This work evaluates a feed-forward neural network (FNN) model's prediction performance over five feature selection methods and nine ground-state properties from a public data set composed of ∼130k organic molecules and investigates the impact of choosing coordinate-free descriptors based on the Simplified Molecular Input Line Entry System (SMILES) representation.
Abstract: Machine learning (ML) models can potentially accelerate the discovery of tailored materials by learning a function that maps chemical compounds into their respective target properties. In this real...