Home
/
Authors
/
Mathew Monfort

Author

Mathew Monfort

Other affiliations: University of Illinois at Chicago

Bio: Mathew Monfort is an academic researcher from Massachusetts Institute of Technology. The author has contributed to research in topics: Tensor (intrinsic definition) & Set (abstract data type). The author has an hindex of 12, co-authored 26 publications receiving 3809 citations. Previous affiliations of Mathew Monfort include University of Illinois at Chicago.

Papers

PDF

Open Access

More filters

Posted Content•

End to End Learning for Self-Driving Cars

[...]

Mariusz Bojarski, Davide Del Testa, Daniel Dworakowski, Bernhard Firner, Beat Flepp, Prasoon Goyal, Lawrence D. Jackel, Mathew Monfort, Urs A. Muller, Jiakai Zhang, Xin Zhang, Jake Zhao, Karol Zieba - Show less +9 more

25 Apr 2016-arXiv: Computer Vision and Pattern Recognition

TL;DR: A convolutional neural network is trained to map raw pixels from a single front-facing camera directly to steering commands and it is argued that this will eventually lead to better performance and smaller systems.

...read moreread less

Abstract: We trained a convolutional neural network (CNN) to map raw pixels from a single front-facing camera directly to steering commands. This end-to-end approach proved surprisingly powerful. With minimum training data from humans the system learns to drive in traffic on local roads with or without lane markings and on highways. It also operates in areas with unclear visual guidance such as in parking lots and on unpaved roads. The system automatically learns internal representations of the necessary processing steps such as detecting useful road features with only the human steering angle as the training signal. We never explicitly trained it to detect, for example, the outline of roads. Compared to explicit decomposition of the problem, such as lane marking detection, path planning, and control, our end-to-end system optimizes all processing steps simultaneously. We argue that this will eventually lead to better performance and smaller systems. Better performance will result because the internal components self-optimize to maximize overall system performance, instead of optimizing human-selected intermediate criteria, e.g., lane detection. Such criteria understandably are selected for ease of human interpretation which doesn't automatically guarantee maximum system performance. Smaller networks are possible because the system learns to solve the problem with the minimal number of processing steps. We used an NVIDIA DevBox and Torch 7 for training and an NVIDIA DRIVE(TM) PX self-driving car computer also running Torch 7 for determining where to drive. The system operates at 30 frames per second (FPS).

...read moreread less

3,379 citations

Journal Article•DOI•

Moments in Time Dataset: One Million Videos for Event Understanding

[...]

Mathew Monfort¹, Carl Vondrick², Aude Oliva¹, Alex Andonian¹, Bolei Zhou³, Kandan Ramakrishnan¹, Sarah Adel Bargal⁴, Tom Yan¹, Lisa M. Brown⁵, Quanfu Fan⁵, Dan Gutfreund⁵ - Show less +7 more•Institutions (5)

Massachusetts Institute of Technology¹, Columbia University², The Chinese University of Hong Kong³, Boston University⁴, IBM⁵

01 Feb 2020-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: The Moments in Time dataset, a large-scale human-annotated collection of one million short videos corresponding to dynamic events unfolding within three seconds, can serve as a new challenge to develop models that scale to the level of complexity and abstract reasoning that a human processes on a daily basis.

...read moreread less

Abstract: We present the Moments in Time Dataset, a large-scale human-annotated collection of one million short videos corresponding to dynamic events unfolding within three seconds. Modeling the spatial-audio-temporal dynamics even for actions occurring in 3 second videos poses many challenges: meaningful events do not include only people, but also objects, animals, and natural phenomena; visual and auditory events can be symmetrical in time (“opening” is “closing” in reverse), and either transient or sustained. We describe the annotation process of our dataset (each video is tagged with one action or activity label among 339 different classes), analyze its scale and diversity in comparison to other large-scale video datasets for action recognition, and report results of several baseline models addressing separately, and jointly, three modalities: spatial, temporal and auditory. The Moments in Time dataset, designed to have a large coverage and diversity of events in both visual and auditory modalities, can serve as a new challenge to develop models that scale to the level of complexity and abstract reasoning that a human processes on a daily basis.

...read moreread less

416 citations

Proceedings Article•DOI•

Multi-Agent Tensor Fusion for Contextual Trajectory Prediction

[...]

Tianyang Zhao¹, Yifei Xu², Mathew Monfort, Wongun Choi, Chris L. Baker, Yibiao Zhao, Yizhou Wang, Ying Nian Wu² - Show less +4 more•Institutions (2)

Peking University¹, University of California, Los Angeles²

15 Jun 2019

TL;DR: This work encodes multiple agents' past trajectories and the scene context into a Multi-Agent Tensor, then applies convolutional fusion to capture multiagent interactions while retaining the spatial structure of agents and thescene context.

...read moreread less

Abstract: Accurate prediction of others' trajectories is essential for autonomous driving. Trajectory prediction is challenging because it requires reasoning about agents' past movements, social interactions among varying numbers and kinds of agents, constraints from the scene context, and the stochasticity of human behavior. Our approach models these interactions and constraints jointly within a novel Multi-Agent Tensor Fusion (MATF) network. Specifically, the model encodes multiple agents' past trajectories and the scene context into a Multi-Agent Tensor, then applies convolutional fusion to capture multiagent interactions while retaining the spatial structure of agents and the scene context. The model decodes recurrently to multiple agents' future trajectories, using adversarial loss to learn stochastic predictions. Experiments on both highway driving and pedestrian crowd datasets show that the model achieves state-of-the-art prediction accuracy.

...read moreread less

339 citations

Posted Content•

Moments in Time Dataset: one million videos for event understanding

[...]

Mathew Monfort, Bolei Zhou, Sarah Adel Bargal, Alex Andonian, Tom Yan, Kandan Ramakrishnan, Lisa M. Brown, Quanfu Fan, Dan Gutfruend, Carl Vondrick, Aude Oliva - Show less +7 more

09 Jan 2018-arXiv: Computer Vision and Pattern Recognition

TL;DR: The Moments in Time dataset as mentioned in this paper is a large-scale human-annotated collection of one million short videos corresponding to dynamic events unfolding within three seconds, where each video is tagged with one action or activity label among 339 different classes.

...read moreread less

Abstract: We present the Moments in Time Dataset, a large-scale human-annotated collection of one million short videos corresponding to dynamic events unfolding within three seconds. Modeling the spatial-audio-temporal dynamics even for actions occurring in 3 second videos poses many challenges: meaningful events do not include only people, but also objects, animals, and natural phenomena; visual and auditory events can be symmetrical or not in time ("opening" means "closing" in reverse order), and transient or sustained. We describe the annotation process of our dataset (each video is tagged with one action or activity label among 339 different classes), analyze its scale and diversity in comparison to other large-scale video datasets for action recognition, and report results of several baseline models addressing separately and jointly three modalities: spatial, temporal and auditory. The Moments in Time dataset designed to have a large coverage and diversity of events in both visual and auditory modalities, can serve as a new challenge to develop models that scale to the level of complexity and abstract reasoning that a human processes on a daily basis.

...read moreread less

209 citations

"Quality vs Quantity": Improved Shot Prediction in Soccer using Strategic Features from Spatiotemporal Data

[...]

Patrick Lucey¹, Alina Bialkowski, Mathew Monfort, Peter W. Carr, Iain Matthews - Show less +1 more•Institutions (1)

Disney Research¹

01 Feb 2015

TL;DR: In this paper, the authors present a method which accurately estimates the likelihood of chances in soccer using strategic features from an entire season of player and ball tracking data taken from a professional league.

...read moreread less

Abstract: In this paper, we present a method which accurately estimates the likelihood of chances in soccer using strategic features from an entire season of player and ball tracking data taken from a professional league. From the data, we analyzed the spatiotemporal patterns of the ten-second window of play before a shot for nearly 10,000 shots. From our analysis, we found that not only is the game phase important (i.e., corner, free-kick, open-play, counter attack etc.), the strategic features such as defender proximity, interaction of surrounding players, speed of play, coupled with the shot location play an impact on determining the likelihood of a team scoring a goal. Using our spatiotemporal strategic features, we can accurately measure the likelihood of each shot. We use this analysis to quantify the efficiency of each team and their strategy.

...read moreread less

109 citations

1
2
3
4
…
5
6

Collapse

Cited by

PDF

Open Access

More filters

Journal Article•DOI•

Phd by thesis

[...]

Richard Lathe¹•Institutions (1)

French Institute of Health and Medical Research¹

01 Apr 1988-Nature

TL;DR: In this paper, a sedimentological core and petrographic characterisation of samples from eleven boreholes from the Lower Carboniferous of Bowland Basin (Northwest England) is presented.

...read moreread less

Abstract: Deposits of clastic carbonate-dominated (calciclastic) sedimentary slope systems in the rock record have been identified mostly as linearly-consistent carbonate apron deposits, even though most ancient clastic carbonate slope deposits fit the submarine fan systems better. Calciclastic submarine fans are consequently rarely described and are poorly understood. Subsequently, very little is known especially in mud-dominated calciclastic submarine fan systems. Presented in this study are a sedimentological core and petrographic characterisation of samples from eleven boreholes from the Lower Carboniferous of Bowland Basin (Northwest England) that reveals a >250 m thick calciturbidite complex deposited in a calciclastic submarine fan setting. Seven facies are recognised from core and thin section characterisation and are grouped into three carbonate turbidite sequences. They include: 1) Calciturbidites, comprising mostly of highto low-density, wavy-laminated bioclast-rich facies; 2) low-density densite mudstones which are characterised by planar laminated and unlaminated muddominated facies; and 3) Calcidebrites which are muddy or hyper-concentrated debrisflow deposits occurring as poorly-sorted, chaotic, mud-supported floatstones. These

...read moreread less

9,929 citations

Journal Article•DOI•

Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)

[...]

Amina Adadi¹, Mohammed Berrada¹•Institutions (1)

SIDI¹

17 Sep 2018-IEEE Access

TL;DR: This survey provides an entry point for interested researchers and practitioners to learn key aspects of the young and rapidly growing body of research related to XAI, and review the existing approaches regarding the topic, discuss trends surrounding its sphere, and present major research trajectories.

...read moreread less

Abstract: At the dawn of the fourth industrial revolution, we are witnessing a fast and widespread adoption of artificial intelligence (AI) in our daily life, which contributes to accelerating the shift towards a more algorithmic society. However, even with such unprecedented advancements, a key impediment to the use of AI-based systems is that they often lack transparency. Indeed, the black-box nature of these systems allows powerful predictions, but it cannot be directly explained. This issue has triggered a new debate on explainable AI (XAI). A research field holds substantial promise for improving trust and transparency of AI-based systems. It is recognized as the sine qua non for AI to continue making steady progress without disruption. This survey provides an entry point for interested researchers and practitioners to learn key aspects of the young and rapidly growing body of research related to XAI. Through the lens of the literature, we review the existing approaches regarding the topic, discuss trends surrounding its sphere, and present major research trajectories.

...read moreread less

2,258 citations

Journal Article•DOI•

Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks

[...]

Yu-Hsin Chen¹, Tushar Krishna¹, Joel Emer¹, Vivienne Sze¹•Institutions (1)

Massachusetts Institute of Technology¹

01 Jan 2017-IEEE Journal of Solid-state Circuits

TL;DR: Eyeriss as mentioned in this paper is an accelerator for state-of-the-art deep convolutional neural networks (CNNs) that optimizes for the energy efficiency of the entire system, including the accelerator chip and off-chip DRAM, by reconfiguring the architecture.

...read moreread less

Abstract: Eyeriss is an accelerator for state-of-the-art deep convolutional neural networks (CNNs). It optimizes for the energy efficiency of the entire system, including the accelerator chip and off-chip DRAM, for various CNN shapes by reconfiguring the architecture. CNNs are widely used in modern AI systems but also bring challenges on throughput and energy efficiency to the underlying hardware. This is because its computation requires a large amount of data, creating significant data movement from on-chip and off-chip that is more energy-consuming than computation. Minimizing data movement energy cost for any CNN shape, therefore, is the key to high throughput and energy efficiency. Eyeriss achieves these goals by using a proposed processing dataflow, called row stationary (RS), on a spatial architecture with 168 processing elements. RS dataflow reconfigures the computation mapping of a given shape, which optimizes energy efficiency by maximally reusing data locally to reduce expensive data movement, such as DRAM accesses. Compression and data gating are also applied to further improve energy efficiency. Eyeriss processes the convolutional layers at 35 frames/s and 0.0029 DRAM access/multiply and accumulation (MAC) for AlexNet at 278 mW (batch size $N = 4$ ), and 0.7 frames/s and 0.0035 DRAM access/MAC for VGG-16 at 236 mW ( $N = 3$ ).

...read moreread less

2,165 citations

Posted Content•

On Calibration of Modern Neural Networks

[...]

Chuan Guo¹, Geoff Pleiss², Yu Sun², Kilian Q. Weinberger²•Institutions (2)

University of Waterloo¹, Cornell University²

14 Jun 2017-arXiv: Learning

TL;DR: It is discovered that modern neural networks, unlike those from a decade ago, are poorly calibrated, and on most datasets, temperature scaling -- a single-parameter variant of Platt Scaling -- is surprisingly effective at calibrating predictions.

...read moreread less

Abstract: Confidence calibration -- the problem of predicting probability estimates representative of the true correctness likelihood -- is important for classification models in many applications. We discover that modern neural networks, unlike those from a decade ago, are poorly calibrated. Through extensive experiments, we observe that depth, width, weight decay, and Batch Normalization are important factors influencing calibration. We evaluate the performance of various post-processing calibration methods on state-of-the-art architectures with image and document classification datasets. Our analysis and experiments not only offer insights into neural network learning, but also provide a simple and straightforward recipe for practical settings: on most datasets, temperature scaling -- a single-parameter variant of Platt Scaling -- is surprisingly effective at calibrating predictions.

...read moreread less

1,883 citations

Proceedings Article•

On calibration of modern neural networks

[...]

Chuan Guo¹, Geoff Pleiss², Yu Sun², Kilian Q. Weinberger²•Institutions (2)

University of Waterloo¹, Cornell University²

17 Jul 2017

TL;DR: This article found that depth, width, weight decay, and batch normalization are important factors influencing confidence calibration of neural networks, and that temperature scaling is surprisingly effective at calibrating predictions.

...read moreread less

1,853 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse