Author

Yufei Ye

Bio: Yufei Ye is an academic researcher from Carnegie Mellon University. The author has contributed to research on the topics Computer science and Graph (abstract data type). The author has an h-index of 5 and has co-authored 9 publications receiving 471 citations. Previous affiliations of Yufei Ye include Tsinghua University.

Papers

Proceedings Article

Zero-Shot Recognition via Semantic Embeddings and Knowledge Graphs

Xiaolong Wang¹, Yufei Ye², Abhinav Gupta²

Facebook¹, Carnegie Mellon University²

18 Jun 2018

TL;DR: A graph convolutional network (GCN) takes a semantic embedding for each node of a learned knowledge graph (KG), where each node represents a visual category, and predicts the visual classifiers of unseen categories; the approach is robust to noise in the KG.

Abstract: We consider the problem of zero-shot recognition: learning a visual classifier for a category with zero training examples, just using the word embedding of the category and its relationship to other categories, for which visual data are provided. The key to dealing with the unfamiliar or novel category is to transfer knowledge obtained from familiar classes to describe the unfamiliar class. In this paper, we build upon the recently introduced Graph Convolutional Network (GCN) and propose an approach that uses both semantic embeddings and the categorical relationships to predict the classifiers. Given a learned knowledge graph (KG), our approach takes as input semantic embeddings for each node (representing visual category). After a series of graph convolutions, we predict the visual classifier for each category. During training, the visual classifiers for a few categories are given to learn the GCN parameters. At test time, these filters are used to predict the visual classifiers of unseen categories. We show that our approach is robust to noise in the KG. More importantly, our approach provides significant improvement in performance compared to the current state-of-the-art results (from 2-3% on some metrics to a whopping 20% on a few).

570 citations
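
The pipeline described in the zero-shot recognition abstract above (semantic embeddings in, graph convolutions, per-category classifiers out) can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy 4-node graph, the layer sizes, the ReLU activation, the symmetric adjacency normalization, and the mean squared error on seen categories (the abstract does not name the loss). Only the forward pass and the loss are shown; no training loop is included.

```python
import numpy as np

def normalize_adjacency(adj):
    """Symmetrically normalize (A + I); self-loops keep every degree >= 1."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_forward(a_norm, embeddings, weights):
    """Map one semantic embedding per node to one classifier vector per node."""
    h = embeddings
    for i, w in enumerate(weights):
        h = a_norm @ h @ w              # propagate over the graph, then project
        if i < len(weights) - 1:
            h = np.maximum(h, 0.0)      # ReLU between layers (assumed activation)
    return h

def seen_class_loss(predicted, known, seen_idx):
    """Assumed loss: MSE between predicted and given classifiers, seen rows only."""
    diff = predicted[seen_idx] - known[seen_idx]
    return float(np.mean(diff ** 2))

# Hypothetical toy setup: 4 categories linked in a chain, 3-d word
# embeddings, 5-d visual classifiers. All values are random placeholders.
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
a_norm = normalize_adjacency(adj)
embeddings = rng.normal(size=(4, 3))
weights = [0.1 * rng.normal(size=(3, 6)), 0.1 * rng.normal(size=(6, 5))]
known_classifiers = rng.normal(size=(4, 5))   # only the seen rows are used

seen_idx = np.array([0, 1])      # categories whose classifiers are given
unseen_idx = np.array([2, 3])    # categories with zero training examples

predicted = gcn_forward(a_norm, embeddings, weights)
loss = seen_class_loss(predicted, known_classifiers, seen_idx)
unseen_classifiers = predicted[unseen_idx]    # test-time output for unseen classes
```

In this sketch the same forward pass produces classifiers for every node, so supervision on the seen rows is what shapes the rows read out for unseen categories.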

Proceedings Article

Compositional Video Prediction

Yufei Ye¹, Maneesh Singh, Abhinav Gupta¹, Shubham Tulsiani²

Carnegie Mellon University¹, Facebook²

01 Oct 2019

TL;DR: Presents an approach for pixel-level future prediction from a single input image of a scene, built on the observation that a scene is composed of distinct entities that undergo motion, and empirically validates it against alternate representations and ways of incorporating multi-modality.

Abstract: We present an approach for pixel-level future prediction given an input image of a scene. We observe that a scene is comprised of distinct entities that undergo motion and present an approach that operationalizes this insight. We implicitly predict future states of independent entities while reasoning about their interactions, and compose future video frames using these predicted states. We overcome the inherent multi-modality of the task using a global trajectory-level latent random variable, and show that this allows us to sample diverse and plausible futures. We empirically validate our approach against alternate representations and ways of incorporating multi-modality. We examine two datasets, one comprising stacked objects that may fall, and the other containing videos of humans performing activities in a gym, and show that our approach allows realistic stochastic video prediction across these diverse settings. See project website (https://judyye.github.io/CVP/) for video predictions.

90 citations

Proceedings Article

Shelf-Supervised Mesh Prediction in the Wild

Yufei Ye¹, Shubham Tulsiani², Abhinav Gupta¹

Carnegie Mellon University¹, Facebook²

11 Feb 2021

TL;DR: Proposes a learning-based approach for inferring 3D shape and pose from a single image that can train from unstructured image collections, supervised only by segmentation outputs from off-the-shelf recognition systems (i.e., "shelf-supervised").

Abstract: We aim to infer the 3D shape and pose of an object from a single image and propose a learning-based approach that can train from unstructured image collections, supervised by only segmentation outputs from off-the-shelf recognition systems (i.e. ‘shelf-supervised’). We first infer a volumetric representation in a canonical frame, along with the camera pose. We enforce that the representation is geometrically consistent with both appearance and masks, and also that the synthesized novel views are indistinguishable from image collections. The coarse volumetric prediction is then converted to a mesh-based representation, which is further refined in the predicted camera frame. These two steps allow both shape-pose factorization from image collections and per-instance reconstruction in finer details. We examine the method on both synthetic and real-world datasets and demonstrate its scalability on 50 categories in the wild, an order of magnitude more classes than existing works.

54 citations

Posted Content

Object-centric Forward Modeling for Model Predictive Control

Yufei Ye¹, Dhiraj Gandhi², Abhinav Gupta², Shubham Tulsiani²

Carnegie Mellon University¹, Facebook²

08 Oct 2019-arXiv: Computer Vision and Pattern Recognition

TL;DR: Presents an approach to learn an object-centric forward model that can be leveraged to search for action sequences that lead to desired goal configurations and that, in conjunction with a learned correction module, allows for robust closed-loop execution.

Abstract: We present an approach to learn an object-centric forward model, and show that this allows us to plan for sequences of actions to achieve distant desired goals. We propose to model a scene as a collection of objects, each with an explicit spatial location and implicit visual feature, and learn to model the effects of actions using random interaction data. Our model allows capturing the robot-object and object-object interactions, and leads to more sample-efficient and accurate predictions. We show that this learned model can be leveraged to search for action sequences that lead to desired goal configurations, and that in conjunction with a learned correction module, this allows for robust closed-loop execution. We present experiments both in simulation and the real world, and show that our approach improves over alternate implicit or pixel-space forward models. Please see our project page for result videos.

24 citations

Proceedings Article

What's in your hands? 3D Reconstruction of Generic Objects in Hands

Yufei Ye, Abhinav Gupta, Shubham Tulsiani

14 Apr 2022

TL;DR: The key insight is that hand articulation is highly predictive of the object shape, and this work proposes an approach that conditionally reconstructs the object based on the articulation and the visual input and allows the hand pose estimation to further improve in test-time optimization.

Abstract: Our work aims to reconstruct hand-held objects given a single RGB image. In contrast to prior works that typically assume known 3D templates and reduce the problem to 3D pose estimation, our work reconstructs generic hand-held objects without knowing their 3D templates. Our key insight is that hand articulation is highly predictive of the object shape, and we propose an approach that conditionally reconstructs the object based on the articulation and the visual input. Given an image depicting a hand-held object, we first use off-the-shelf systems to estimate the underlying hand pose and then infer the object shape in a normalized hand-centric coordinate frame. We parameterize the object by signed distances, which are inferred by an implicit network that leverages information from both visual features and articulation-aware coordinates to process a query point. We perform experiments across three datasets and show that our method consistently outperforms baselines and is able to reconstruct a diverse set of objects. We analyze the benefits and robustness of explicit articulation conditioning and also show that this allows the hand pose estimation to further improve in test-time optimization.

17 citations

Cited by

Posted Content

Graph Neural Networks: A Review of Methods and Applications

Jie Zhou¹, Ganqu Cui¹, Shengding Hu¹, Zhengyan Zhang¹, Cheng Yang², Zhiyuan Liu¹, Lifeng Wang³, Changcheng Li³, Maosong Sun¹

Tsinghua University¹, Beijing University of Posts and Telecommunications², Tencent³

20 Dec 2018-arXiv: Learning

TL;DR: Provides a detailed review of existing graph neural network models, systematically categorizes their applications, and proposes four open problems for future research.

Abstract: Lots of learning tasks require dealing with graph data which contains rich relation information among elements. Modeling physics systems, learning molecular fingerprints, predicting protein interface, and classifying diseases demand a model to learn from graph inputs. In other domains such as learning from non-structural data like texts and images, reasoning on extracted structures (like the dependency trees of sentences and the scene graphs of images) is an important research topic which also needs graph reasoning models. Graph neural networks (GNNs) are neural models that capture the dependence of graphs via message passing between the nodes of graphs. In recent years, variants of GNNs such as graph convolutional network (GCN), graph attention network (GAT), graph recurrent network (GRN) have demonstrated ground-breaking performances on many deep learning tasks. In this survey, we propose a general design pipeline for GNN models and discuss the variants of each component, systematically categorize the applications, and propose four open problems for future research.

2,494 citations

Journal Article

Graph Neural Networks: A Review of Methods and Applications

Jie Zhou¹, Ganqu Cui¹, Shengding Hu¹, Zhengyan Zhang¹, Cheng Yang², Zhiyuan Liu¹, Lifeng Wang³, Changcheng Li³, Maosong Sun¹

Tsinghua University¹, Beijing University of Posts and Telecommunications², Tencent³

01 Jan 2020

TL;DR: In this paper, the authors propose a general design pipeline for GNN models and discuss the variants of each component, systematically categorize the applications, and propose four open problems for future research.

1,266 citations

Journal Article

A Survey on Knowledge Graphs: Representation, Acquisition and Applications

Shaoxiong Ji¹, Shirui Pan², Erik Cambria³, Pekka Marttinen¹, Philip S. Yu⁴

Aalto University¹, Monash University, Clayton campus², Nanyang Technological University³, University of Illinois at Chicago⁴

26 Apr 2021-IEEE Transactions on Neural Networks

TL;DR: A comprehensive review of knowledge graphs covering 1) knowledge graph representation learning; 2) knowledge acquisition and completion; 3) temporal knowledge graphs; and 4) knowledge-aware applications, with a summary of recent breakthroughs and perspective directions to facilitate future research.

Abstract: Human knowledge provides a formal understanding of the world. Knowledge graphs that represent structural relations between entities have become an increasingly popular research direction toward cognition and human-level intelligence. In this survey, we provide a comprehensive review of the knowledge graph covering overall research topics about: 1) knowledge graph representation learning; 2) knowledge acquisition and completion; 3) temporal knowledge graph; and 4) knowledge-aware applications and summarize recent breakthroughs and perspective directions to facilitate future research. We propose a full-view categorization and new taxonomies on these topics. Knowledge graph embedding is organized from four aspects of representation space, scoring function, encoding models, and auxiliary information. For knowledge acquisition, especially knowledge graph completion, embedding methods, path inference, and logical rule reasoning are reviewed. We further explore several emerging topics, including metarelational learning, commonsense reasoning, and temporal knowledge graphs. To facilitate future research on knowledge graphs, we also provide a curated collection of data sets and open-source libraries on different tasks. In the end, we have a thorough outlook on several promising research directions.

1,025 citations

Posted Content

Generalizing from a Few Examples: A Survey on Few-Shot Learning

Yaqing Wang¹, Quanming Yao², James T. Kwok¹, Lionel M. Ni¹

Hong Kong University of Science and Technology¹, Paradigm²

10 Apr 2019-arXiv: Learning

TL;DR: A thorough survey of Few-Shot Learning (FSL) that categorizes FSL methods from three perspectives: data, which uses prior knowledge to augment the supervised experience; model, which uses prior knowledge to reduce the size of the hypothesis space; and algorithm, which uses prior knowledge to alter the search for the best hypothesis in the given hypothesis space.

Abstract: Machine learning has been highly successful in data-intensive applications but is often hampered when the data set is small. Recently, Few-Shot Learning (FSL) has been proposed to tackle this problem. Using prior knowledge, FSL can rapidly generalize to new tasks containing only a few samples with supervised information. In this paper, we conduct a thorough survey to fully understand FSL. Starting from a formal definition of FSL, we distinguish FSL from several relevant machine learning problems. We then point out that the core issue in FSL is that the empirical risk minimizer is unreliable. Based on how prior knowledge can be used to handle this core issue, we categorize FSL methods from three perspectives: (i) data, which uses prior knowledge to augment the supervised experience; (ii) model, which uses prior knowledge to reduce the size of the hypothesis space; and (iii) algorithm, which uses prior knowledge to alter the search for the best hypothesis in the given hypothesis space. With this taxonomy, we review and discuss the pros and cons of each category. Promising directions, in the aspects of the FSL problem setups, techniques, applications and theories, are also proposed to provide insights for future research.

840 citations

Posted Content

Simplifying Graph Convolutional Networks

Felix Wu, Tianyi Zhang, Amauri Holanda de Souza, Christopher Fifty, Tao Yu, Kilian Q. Weinberger

19 Feb 2019-arXiv: Learning

TL;DR: In this paper, the authors reduce the complexity of GCN by successively removing nonlinearities and collapsing weight matrices between consecutive layers, which corresponds to a fixed low-pass filter followed by a linear classifier.

Abstract: Graph Convolutional Networks (GCNs) and their variants have experienced significant attention and have become the de facto methods for learning graph representations. GCNs derive inspiration primarily from recent deep learning approaches, and as a result, may inherit unnecessary complexity and redundant computation. In this paper, we reduce this excess complexity through successively removing nonlinearities and collapsing weight matrices between consecutive layers. We theoretically analyze the resulting linear model and show that it corresponds to a fixed low-pass filter followed by a linear classifier. Notably, our experimental evaluation demonstrates that these simplifications do not negatively impact accuracy in many downstream applications. Moreover, the resulting model scales to larger datasets, is naturally interpretable, and yields up to two orders of magnitude speedup over FastGCN.

666 citations
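
The simplification described in the abstract above (removing nonlinearities, then collapsing the weight matrices of consecutive layers) can be checked numerically with a small NumPy sketch. The toy graph, the feature and layer sizes, the random weights, and the function names are all hypothetical; the propagation matrix is assumed to be the symmetrically normalized adjacency with self-loops. The sketch only shows that a stack of linear graph-convolution layers equals a single fixed propagation step raised to the K-th power followed by one linear map; it does not reproduce the paper's experiments.

```python
import numpy as np

def normalized_adjacency(adj):
    """Symmetrically normalized adjacency with self-loops (assumed propagation matrix)."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def linear_gcn(s, x, layer_weights):
    """GCN layers with the nonlinearities removed: H <- S H W per layer."""
    h = x
    for w in layer_weights:
        h = s @ h @ w
    return h

def collapsed_logits(s, x, theta, k):
    """Collapsed form: fixed feature propagation S^k X, then one linear map."""
    smoothed = np.linalg.matrix_power(s, k) @ x   # parameter-free, can be precomputed
    return smoothed @ theta

# Hypothetical toy data: 4 nodes in a chain, 3 input features, 2 classes.
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
s = normalized_adjacency(adj)
x = rng.normal(size=(4, 3))
w1 = rng.normal(size=(3, 8))
w2 = rng.normal(size=(8, 2))

stacked = linear_gcn(s, x, [w1, w2])                  # two linear layers
collapsed = collapsed_logits(s, x, w1 @ w2, k=2)      # one collapsed weight matrix
same = bool(np.allclose(stacked, collapsed))
```

Because the propagated features contain no learnable parameters, they can be computed once, after which fitting the model reduces to training a linear classifier on them.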
