Alexander C. Berg

University of North Carolina at Chapel Hill

Other affiliations: Facebook, Stanford University, Columbia University ...

Bio: Alexander C. Berg is an academic researcher from University of North Carolina at Chapel Hill. The author has contributed to research on the topics of object detection and natural language. The author has an h-index of 57 and has co-authored 109 publications that have received 67,829 citations. Previous affiliations of Alexander C. Berg include Facebook and Stanford University.

Papers published on a yearly basis: 2000, 2001, and 2003 to 2023 (chart not reproduced)

Papers

Proceedings Article•DOI•

Materials discovery: Fine-grained classification of X-ray scattering images

M. Hadi Kiapour¹, Kevin G. Yager², Alexander C. Berg¹, Tamara L. Berg¹•Institutions (2)

University of North Carolina at Chapel Hill¹, Brookhaven National Laboratory²

24 Mar 2014

TL;DR: An attribute-based approach to recognition in x-ray scattering images is devised and applications to image annotation and retrieval are demonstrated.

Abstract: We explore the use of computer vision methods for organizing, searching, and classifying x-ray scattering images. X-ray scattering is a technique that shines an intense beam of x-rays through a sample of interest. By recording the intensity of x-ray deflection as a function of angle, scientists can measure the structure of materials at the molecular and nano-scale. Current and planned synchrotron instruments are producing x-ray scattering data at an unprecedented rate, making the design of automatic analysis techniques crucial for future research. In this paper, we devise an attribute-based approach to recognition in x-ray scattering images and demonstrate applications to image annotation and retrieval.

15 citations

Posted Content•

A Mask-RCNN Baseline for Probabilistic Object Detection.

Phil Ammirato¹, Alexander C. Berg¹•Institutions (1)

University of North Carolina at Chapel Hill¹

09 Aug 2019-arXiv: Computer Vision and Pattern Recognition

TL;DR: This submission to the Probabilistic Object Detection Challenge is a fine-tuned version of Mask-RCNN with some additional post-processing. The authors hope it provides some insight into how detectors designed for mean average precision (mAP) evaluation behave under PDQ, as well as a strong baseline for future work.

Abstract: The Probabilistic Object Detection Challenge evaluates object detection methods using a new evaluation measure, Probability-based Detection Quality (PDQ), on a new synthetic image dataset. We present our submission to the challenge, a fine-tuned version of Mask-RCNN with some additional post-processing. Our method, submitted under username pammirato, is currently second on the leaderboard with a score of 21.432, while also achieving the highest spatial quality and average overall quality of detections. We hope this method can provide some insight into how detectors designed for mean average precision (mAP) evaluation behave under PDQ, as well as a strong baseline for future work.

14 citations

Journal Article•DOI•

Combining Multiple Cues for Visual Madlibs Question Answering

Tatiana Tommasi¹, Arun Mallya², Bryan A. Plummer², Svetlana Lazebnik², Alexander C. Berg³, Tamara L. Berg³•Institutions (3)

Istituto Italiano di Tecnologia¹, University of Illinois at Urbana–Champaign², University of North Carolina at Chapel Hill³

15 Jan 2019-International Journal of Computer Vision

TL;DR: In this paper, the authors present an approach for answering fill-in-the-blank multiple choice questions from the Visual Madlibs dataset, which employs a combination of networks trained for specialized tasks such as scene recognition, person activity classification and attribute prediction.

Abstract: This paper presents an approach for answering fill-in-the-blank multiple choice questions from the Visual Madlibs dataset. Instead of generic and commonly used representations trained on the ImageNet classification task, our approach employs a combination of networks trained for specialized tasks such as scene recognition, person activity classification, and attribute prediction. We also present a method for localizing phrases from candidate answers in order to provide spatial support for feature extraction. We map each of these features, together with candidate answers, to a joint embedding space through normalized canonical correlation analysis (nCCA). Finally, we solve an optimization problem to learn to combine scores from nCCA models trained on multiple cues to select the best answer. Extensive experimental results show a significant improvement over the previous state of the art and confirm that answering questions from a wide range of types benefits from examining a variety of image cues and carefully choosing the spatial support for feature extraction.

13 citations

Proceedings Article•DOI•

When Was That Made

Sirion Vittayakorn¹, Alexander C. Berg¹, Tamara L. Berg¹•Institutions (1)

University of North Carolina at Chapel Hill¹

24 Mar 2017

TL;DR: In this article, the authors explore deep learning methods for estimating when objects were made and demonstrate that the deep learning approach outperforms both a color-based baseline and a visual data mining approach, the previous state-of-the-art method for temporal estimation.

Abstract: In this paper, we explore deep learning methods for estimating when objects were made. Temporal estimation of objects is a challenging task that requires expertise in the object domain. With temporal information about objects, historians, genealogists, sociologists, archaeologists, or conservationists can study the past through the objects. Toward this goal, we utilize features from existing deep networks and fine-tune new networks for the temporal estimation task. The results demonstrate that the deep learning approach outperforms both a color-based baseline and a visual data mining approach, which is the previous state-of-the-art method for temporal estimation. To gain insight into the deep network performance, we provide analyses of neuron activations and their entropy, including neuron temporal sensitivity, neuron activity, and the correlation between discriminative parts from the deep network and the data mining approach. Finally, we demonstrate the potential of the temporal estimation pipeline for an interesting application such as fashion trend analysis.

11 citations

Posted Content•

Solving Visual Madlibs with Multiple Cues

Tatiana Tommasi¹, Arun Mallya², Bryan A. Plummer², Svetlana Lazebnik², Alexander C. Berg¹, Tamara L. Berg¹•Institutions (2)

University of North Carolina at Chapel Hill¹, University of Illinois at Urbana–Champaign²

11 Aug 2016-arXiv: Computer Vision and Pattern Recognition

TL;DR: In this paper, the authors focus on answering fill-in-the-blank style multiple choice questions from the Visual Madlibs dataset and employ features derived from networks trained for specialized tasks of scene classification, person activity prediction, and person and object attribute prediction.

Abstract: This paper focuses on answering fill-in-the-blank style multiple choice questions from the Visual Madlibs dataset. Previous approaches to Visual Question Answering (VQA) have mainly used generic image features from networks trained on the ImageNet dataset, despite the wide scope of questions. In contrast, our approach employs features derived from networks trained for specialized tasks of scene classification, person activity prediction, and person and object attribute prediction. We also present a method for selecting sub-regions of an image that are relevant for evaluating the appropriateness of a putative answer. Visual features are computed both from the whole image and from local regions, while sentences are mapped to a common space using a simple normalized canonical correlation analysis (CCA) model. Our results show a significant improvement over the previous state of the art, and indicate that answering different question types benefits from examining a variety of image cues and carefully choosing informative image sub-regions.

10 citations

Cited by

Proceedings Article•DOI•

Deep Residual Learning for Image Recognition

Kaiming He¹, Xiangyu Zhang¹, Shaoqing Ren¹, Jian Sun¹•Institutions (1)

Microsoft¹

27 Jun 2016

TL;DR: In this article, the authors propose a residual learning framework to ease the training of networks that are substantially deeper than those used previously; an ensemble of these residual networks won 1st place on the ILSVRC 2015 classification task.

Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers, 8× deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.

123,388 citations

Proceedings Article•

Very Deep Convolutional Networks for Large-Scale Image Recognition

Karen Simonyan¹, Andrew Zisserman¹•Institutions (1)

University of Oxford¹

04 Sep 2014

TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.

Abstract: In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.

55,235 citations

Proceedings Article•

Very Deep Convolutional Networks for Large-Scale Image Recognition

Karen Simonyan¹, Andrew Zisserman¹•Institutions (1)

University of Oxford¹

01 Jan 2015

TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.

49,914 citations

Posted Content•

Deep Residual Learning for Image Recognition

Kaiming He¹, Xiangyu Zhang¹, Shaoqing Ren¹, Jian Sun¹•Institutions (1)

Microsoft¹

10 Dec 2015-arXiv: Computer Vision and Pattern Recognition

TL;DR: This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.

Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers, 8x deeper than VGG nets but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.

44,703 citations

Book•

Deep Learning

Ian Goodfellow¹, Yoshua Bengio², Aaron Courville²•Institutions (2)

Google¹, Université de Montréal²

18 Nov 2016

TL;DR: Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts. It is used in many applications, such as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames.

Abstract: Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts. Because the computer gathers knowledge from experience, there is no need for a human computer operator to formally specify all the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones; a graph of these hierarchies would be many layers deep. This book introduces a broad range of topics in deep learning. The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning. It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology; and it surveys such applications as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames. Finally, the book offers research perspectives, covering such theoretical topics as linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, the partition function, approximate inference, and deep generative models. Deep Learning can be used by undergraduate or graduate students planning careers in either industry or research, and by software engineers who want to begin using deep learning in their products or platforms. A website offers supplementary material for both readers and instructors.

38,208 citations
