
Papers by Alexander C. Berg published in 2012


Proceedings Article
23 Apr 2012
TL;DR: A novel generation system that composes humanlike descriptions of images from computer vision detections; by leveraging syntactically informed word co-occurrence statistics, it automatically generates some of the most natural image descriptions to date.
Abstract: This paper introduces a novel generation system that composes humanlike descriptions of images from computer vision detections. By leveraging syntactically informed word co-occurrence statistics, the generator filters and constrains the noisy detections output from a vision system to generate syntactic trees that detail what the computer vision system sees. Results show that the generation system outperforms state-of-the-art systems, automatically generating some of the most natural image descriptions to date.

450 citations
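
To make the approach concrete, here is a minimal sketch of the filtering idea only, assuming a toy table of (modifier, noun) co-occurrence counts; the paper goes much further, generating full syntactic trees. Every name, count, and threshold below is a hypothetical stand-in for statistics mined from large text corpora:

```python
# Illustrative sketch, not the authors' code: prune noisy vision
# detections using word co-occurrence statistics.

# Hypothetical (modifier, noun) co-occurrence counts from a corpus.
COOC = {
    ("furry", "dog"): 1200, ("furry", "table"): 3,
    ("wooden", "table"): 2100, ("wooden", "dog"): 5,
}
NOUN_COUNTS = {"dog": 50000, "table": 40000}

def plausible(modifier, noun, min_ratio=1e-3):
    """Keep an attribute detection only if the modifier-noun pair is
    attested often enough relative to the noun's corpus frequency."""
    pair_count = COOC.get((modifier, noun), 0)
    return pair_count / NOUN_COUNTS.get(noun, 1) >= min_ratio

# Noisy detections: (object noun, attribute, detector confidence).
detections = [("dog", "furry", 0.9), ("table", "furry", 0.8),
              ("table", "wooden", 0.7)]

kept = [(n, m, c) for (n, m, c) in detections if plausible(m, n)]
print(kept)  # the implausible "furry table" is filtered out
```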


Proceedings Article
08 Jul 2012
TL;DR: A holistic data-driven approach to image description generation, exploiting the vast amount of (noisy) parallel image data and associated natural language descriptions available on the web to generate novel descriptions for query images.
Abstract: We present a holistic data-driven approach to image description generation, exploiting the vast amount of (noisy) parallel image data and associated natural language descriptions available on the web. More specifically, given a query image, we retrieve existing human-composed phrases used to describe visually similar images, then selectively combine those phrases to generate a novel description for the query image. We cast the generation process as constraint optimization problems, collectively incorporating multiple interconnected aspects of language composition for content planning, surface realization and discourse structure. Evaluation by human annotators indicates that our final system generates more semantically correct and linguistically appealing descriptions than two nontrivial baselines.

353 citations
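
As a rough illustration of the selective-combination step: the paper casts it as constraint optimization, jointly handling content planning, surface realization, and discourse structure; the sketch below substitutes a much simpler greedy rule that trades retrieval score against redundancy. The phrases and scores are made up:

```python
# Simplified stand-in for the paper's constraint optimization: greedily
# pick phrases that score well but overlap little with ones already chosen.

def overlap(a, b):
    """Jaccard word overlap as a crude redundancy measure."""
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / max(len(wa | wb), 1)

def compose(candidates, k=2, redundancy_penalty=2.0):
    """Greedily select k phrases maximizing score minus redundancy."""
    chosen, pool = [], dict(candidates)  # phrase -> similarity score
    while pool and len(chosen) < k:
        best = max(pool, key=lambda p: pool[p] -
                   redundancy_penalty * sum(overlap(p, c) for c in chosen))
        chosen.append(best)
        del pool[best]
    return ", ".join(chosen)

# Phrases retrieved from captions of visually similar images.
retrieved = [("a brown dog runs on the beach", 0.9),
             ("a dog running on sand", 0.85),
             ("waves crash on the shore", 0.6)]
print(compose(retrieved))  # picks complementary, not repetitive, phrases
```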


Proceedings Article
16 Jun 2012
TL;DR: This work proposes the Dual Accuracy Reward Trade-off Search (DARTS) algorithm and proves that, under practical conditions, it converges to an optimal solution.
Abstract: As visual recognition scales up to ever larger numbers of categories, maintaining high accuracy is increasingly difficult. In this work, we study the problem of optimizing accuracy-specificity trade-offs in large scale recognition, motivated by the observation that object categories form a semantic hierarchy consisting of many levels of abstraction. A classifier can select the appropriate level, trading off specificity for accuracy in case of uncertainty. By optimizing this trade-off, we obtain classifiers that try to be as specific as possible while guaranteeing an arbitrarily high accuracy. We formulate the problem as maximizing information gain while ensuring a fixed, arbitrarily small error rate with a semantic hierarchy. We propose the Dual Accuracy Reward Trade-off Search (DARTS) algorithm and prove that, under practical conditions, it converges to an optimal solution. Experiments demonstrate the effectiveness of our algorithm on datasets ranging from 65 to over 10,000 categories.

201 citations
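
A minimal sketch of the DARTS decision rule, assuming leaf-class posteriors are already available: predict the node that maximizes (information gain + lambda) times the node's probability mass, where a larger lambda makes the classifier hedge toward more abstract nodes. The hierarchy, posteriors, and lambda values below are hypothetical, and the paper's binary search for lambda (which guarantees the target accuracy) is omitted:

```python
import math

# Illustrative sketch of the DARTS prediction rule, not the authors' code.
HIERARCHY = {                      # node -> leaf classes underneath it
    "entity": ["dog", "cat", "car"],
    "animal": ["dog", "cat"],
    "dog": ["dog"], "cat": ["cat"], "car": ["car"],
}
N_LEAVES = 3

def info_gain(node):
    """Reward: bits of specificity gained over predicting the root."""
    return math.log2(N_LEAVES / len(HIERARCHY[node]))

def darts_predict(leaf_posteriors, lam):
    """Maximize expected reward (info_gain + lam) * P(node | image)."""
    def expected_reward(node):
        p_node = sum(leaf_posteriors[leaf] for leaf in HIERARCHY[node])
        return (info_gain(node) + lam) * p_node
    return max(HIERARCHY, key=expected_reward)

posteriors = {"dog": 0.45, "cat": 0.45, "car": 0.10}
print(darts_predict(posteriors, lam=0.0))  # "dog": maximally specific
print(darts_predict(posteriors, lam=2.0))  # "animal": hedges under uncertainty
```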


Proceedings Article
16 Jun 2012
TL;DR: This paper explores how a number of factors relate to human perception of importance using what people describe as a proxy for importance, and builds models to predict what will be described about an image given either known image content, or image content estimated automatically by recognition systems.
Abstract: What do people care about in an image? To drive computational visual recognition toward more human-centric outputs, we need a better understanding of how people perceive and judge the importance of content in images. In this paper, we explore how a number of factors relate to human perception of importance. Proposed factors fall into 3 broad types: 1) factors related to composition, e.g. size, location, 2) factors related to semantics, e.g. category of object or scene, and 3) contextual factors related to the likelihood of attribute-object, or object-scene pairs. We explore these factors using what people describe as a proxy for importance. Finally, we build models to predict what will be described about an image given either known image content, or image content estimated automatically by recognition systems.

179 citations
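
A hedged sketch of the final modeling step, using synthetic per-object features (relative size, distance from image center, animacy) and made-up labels; the paper's actual features, data, and models are far richer:

```python
# Toy illustration: predict whether an object will be mentioned in a
# human description from composition and semantic features.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Rows: [relative size, distance from center, is_animate];
# label 1 means a human description mentioned the object.
X = np.array([
    [0.40, 0.10, 1],   # large, central, animate -> described
    [0.35, 0.20, 1],
    [0.05, 0.45, 0],   # small, peripheral -> ignored
    [0.08, 0.40, 0],
    [0.30, 0.15, 0],
    [0.04, 0.50, 1],
])
y = np.array([1, 1, 0, 0, 1, 0])

model = LogisticRegression().fit(X, y)
new_object = np.array([[0.25, 0.12, 1]])  # sizeable, central, animate
print(model.predict_proba(new_object)[0, 1])  # P(object gets described)
```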


Proceedings Article
03 Jun 2012
TL;DR: This work concretely defines what it means to be visual, annotates visual text, and develops algorithms to automatically classify noun phrases as visual or non-visual; it finds that text alone achieves high accuracy at this task, and that incorporating features derived from computer vision algorithms improves performance further.
Abstract: When people describe a scene, they often include information that is not visually apparent; sometimes based on background knowledge, sometimes to tell a story. We aim to separate visual text (descriptions of what is being seen) from non-visual text in natural images and their descriptions. To do so, we first concretely define what it means to be visual, annotate visual text and then develop algorithms to automatically classify noun phrases as visual or non-visual. We find that using text alone, we are able to achieve high accuracies at this task, and that incorporating features derived from computer vision algorithms improves performance. Finally, we show that we can reliably mine visual nouns and adjectives from large corpora and that we can use these effectively in the classification task.

53 citations
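
A minimal text-only sketch of the visual/non-visual classification task, with made-up training phrases; the paper's features (including vision-derived ones), data, and evaluation are substantially richer:

```python
# Toy visualness classifier for noun phrases using text features only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

phrases = ["a red umbrella", "the wooden bench", "her summer vacation",
           "a striped cat", "their long friendship", "the green field",
           "his good mood", "a glass of water"]
labels = [1, 1, 0, 1, 0, 1, 0, 1]  # 1 = visual, 0 = non-visual

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression())
clf.fit(phrases, labels)

# On this toy data the model should lean visual / non-visual respectively.
print(clf.predict(["a red bench", "their good vacation"]))
```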


Journal Article
TL;DR: It is concluded that guidance and recognition in the context of search are not separate processes mediated by different features, and that what the literature knows as guidance is really recognition performed on blurred objects viewed in the visual periphery.
Abstract: Search is commonly described as a repeating cycle of guidance to target-like objects, followed by the recognition of these objects as targets or distractors. Are these indeed separate processes using different visual features? We addressed this question by comparing observer behavior to that of support vector machine (SVM) models trained on guidance and recognition tasks. Observers searched for a categorically defined teddy bear target in four-object arrays. Target-absent trials consisted of random category distractors rated in their visual similarity to teddy bears. Guidance, quantified as first-fixated objects during search, was strongest for targets, followed by target-similar, medium-similarity, and target-dissimilar distractors. False positive errors to first-fixated distractors also decreased with increasing dissimilarity to the target category. To model guidance, nine teddy bear detectors, using features ranging in biological plausibility, were trained on unblurred bears then tested on blurred versions of the same objects appearing in each search display. Guidance estimates were based on target probabilities obtained from these detectors. To model recognition, nine bear/nonbear classifiers, trained and tested on unblurred objects, were used to classify the object that would be fixated first (based on the detector estimates) as a teddy bear or a distractor. Patterns of categorical guidance and recognition accuracy were modeled almost perfectly by an HMAX model in combination with a color histogram feature. We conclude that guidance and recognition in the context of search are not separate processes mediated by different features, and that what the literature knows as guidance is really recognition performed on blurred objects viewed in the visual periphery.

37 citations
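
A sketch of the train-sharp, test-blurred protocol described above, using synthetic images and a simple color-histogram feature; nothing here reproduces the paper's HMAX-plus-color model or its data:

```python
# Toy version of the protocol: train a classifier on unblurred objects,
# then test on blurred versions to mimic peripheral viewing.
import numpy as np
from scipy.ndimage import gaussian_filter
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def color_histogram(img, bins=8):
    """Concatenated per-channel histograms as a simple feature."""
    return np.concatenate(
        [np.histogram(img[..., c], bins=bins, range=(0, 1))[0]
         for c in range(3)]).astype(float)

# Synthetic stand-ins: "targets" skew red, "distractors" skew blue.
targets = rng.random((20, 32, 32, 3)) * [1.0, 0.4, 0.4]
distractors = rng.random((20, 32, 32, 3)) * [0.4, 0.4, 1.0]
images = np.concatenate([targets, distractors])
labels = np.array([1] * 20 + [0] * 20)

X_sharp = np.array([color_histogram(im) for im in images])
clf = SVC().fit(X_sharp, labels)        # "recognition" trained unblurred

blurred = gaussian_filter(images, sigma=(0, 2, 2, 0))  # periphery proxy
X_blur = np.array([color_histogram(im) for im in blurred])
print("accuracy on blurred objects:", clf.score(X_blur, labels))
```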


Proceedings Article
16 Jun 2012
TL;DR: This work builds on a framework borrowed from parallel convex optimization, the alternating direction method of multipliers (ADMM), to develop a new consensus-based algorithm that enables distributed parallel training of single-machine multiclass SVMs with small communication requirements.
Abstract: We present an algorithm and implementation for distributed parallel training of single-machine multiclass SVMs. While there is ongoing and healthy debate about the best strategy for multiclass classification, there are some features of the single-machine approach that are not available when training alternatives such as one-vs-all, and that are quite complex for tree based methods. One obstacle to exploring single-machine approaches on large datasets is that they are usually limited to running on a single machine! We build on a framework borrowed from parallel convex optimization — the alternating direction method of multipliers (ADMM) — to develop a new consensus based algorithm for distributed training of single-machine approaches. This is demonstrated with an implementation of our novel sequential dual algorithm (DCMSVM) which allows distributed parallel training with small communication requirements. Benchmark results show significant reduction in wall clock time compared to current state of the art multiclass SVM implementation (Liblinear) on a single node. Experiments are performed on large scale image classification including results with modern high-dimensional features.

7 citations
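
To illustrate the consensus-ADMM template the paper builds on, here is a hedged sketch for a binary linear SVM on synthetic data; the paper's DCMSVM handles the multiclass case with a sequential dual solver and a communication-efficient implementation that this toy loop does not reproduce:

```python
# Consensus ADMM: each "machine" fits a local model on its data shard,
# and a dual-driven averaging step pulls the local models to agreement.
import numpy as np

rng = np.random.default_rng(1)
d, n_workers, rho, reg = 5, 4, 1.0, 0.1

# Each worker holds its own shard of (X, y), with y in {-1, +1}.
w_true = rng.normal(size=d)
shards = []
for _ in range(n_workers):
    X = rng.normal(size=(50, d))
    y = np.sign(X @ w_true + 0.1 * rng.normal(size=50))
    shards.append((X, y))

w = [np.zeros(d) for _ in range(n_workers)]   # local models
u = [np.zeros(d) for _ in range(n_workers)]   # scaled dual variables
z = np.zeros(d)                               # consensus model

def local_solve(X, y, z, u_i, steps=50, lr=0.01):
    """Approximately minimize hinge loss + (rho/2)||w - z + u_i||^2."""
    w = z.copy()
    for _ in range(steps):
        margins = y * (X @ w)
        grad = -(X[margins < 1] * y[margins < 1, None]).sum(0)
        grad += rho * (w - z + u_i)
        w -= lr * grad
    return w

for _ in range(20):
    w = [local_solve(X, y, z, u_i) for (X, y), u_i in zip(shards, u)]
    z = sum(wi + ui for wi, ui in zip(w, u)) / n_workers
    z *= rho * n_workers / (rho * n_workers + reg)  # L2 shrinkage on z
    u = [ui + wi - z for ui, wi in zip(u, w)]       # dual update

acc = np.mean([np.mean(np.sign(X @ z) == y) for X, y in shards])
print("training accuracy of consensus model:", acc)
```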