Home
/
Authors
/
J.Y.A. Wang

Author

J.Y.A. Wang

Bio: J.Y.A. Wang is an academic researcher from Massachusetts Institute of Technology. The author has contributed to research in topics: Motion estimation & Image segmentation. The author has an hindex of 8, co-authored 9 publications receiving 2125 citations.

Papers

PDF

Open Access

More filters

Journal Article•DOI•

Representing moving images with layers

[...]

J.Y.A. Wang¹, Edward H. Adelson•Institutions (1)

Massachusetts Institute of Technology¹

01 Sep 1994-IEEE Transactions on Image Processing

TL;DR: A system for representing moving images with sets of overlapping layers that is more flexible than standard image transforms and can capture many important properties of natural image sequences.

...read moreread less

Abstract: We describe a system for representing moving images with sets of overlapping layers. Each layer contains an intensity map that defines the additive values of each pixel, along with an alpha map that serves as a mask indicating the transparency. The layers are ordered in depth and they occlude each other in accord with the rules of compositing. Velocity maps define how the layers are to be warped over time. The layered representation is more flexible than standard image transforms and can capture many important properties of natural image sequences. We describe some methods for decomposing image sequences into layers using motion analysis, and we discuss how the representation may be used for image coding and other applications. >

...read moreread less

1,360 citations

Proceedings Article•DOI•

Layered representation for motion analysis

[...]

J.Y.A. Wang¹, Edward H. Adelson¹•Institutions (1)

Massachusetts Institute of Technology¹

15 Jun 1993

TL;DR: A set of techniques is devised for segmenting images into coherently moving regions using affine motion analysis and clustering techniques and it is possible to decompose an image into a set of layers along with information about occlusion and depth ordering.

...read moreread less

Abstract: Standard approaches to motion analysis assume that the optic flow is smooth; such techniques have trouble dealing with occlusion boundaries. The image sequence can be decomposed into a set of overlapping layers, where each layer's motion is described by a smooth flow field. The discontinuities in the description are then attributed to object opacities rather than to the flow itself, mirroring the structure of the scene. A set of techniques is devised for segmenting images into coherently moving regions using affine motion analysis and clustering techniques. It is possible to decompose an image into a set of layers along with information about occlusion and depth ordering. The techniques are applied to a flower garden sequence. The scene can be analyzed into four layers, and, the entire 30-frame sequence can be represented with a single image of each layer, along with associated motion parameters. >

...read moreread less

344 citations

Patent•

System for encoding image data into multiple layers representing regions of coherent motion and associated motion parameters

[...]

J.Y.A. Wang¹, Edward H. Adelson¹•Institutions (1)

Massachusetts Institute of Technology¹

27 Dec 1994

TL;DR: In this paper, a system stores images as a series of layers by determining (i) the boundaries of regions of coherent motion over the entire image, or frame, sequence; and (ii) associated motion parameters, or coefficients of motion equations, that describe the transformations of the regions from frame to frame.

...read moreread less

Abstract: A system stores images as a series of layers by determining (i) the boundaries of regions of coherent motion over the entire image, or frame, sequence; and (ii) associated motion parameters, or coefficients of motion equations, that describe the transformations of the regions from frame to frame. The system first estimates motion locally, by determining the movements within small neighborhoods of pixels from one image frame i to the next image frame i+1, to develop an optical flow, or dense motion, model of the image. Next, the system estimates the motion using affine or other low order, smooth transformations within a set of regions which the system has previously identified as having coherent motion, i.e., identified by analyzing the motions in the frames i-1 and i. It groups, or clusters, similar motion models and iteratively produces an updated set of models for the image. The system then uses the local motion estimates to associate individual pixels in the image with the motion model that most closely resembles the pixel's movement, to update the regions of coherent motion. Using these updated regions, the system iteratively updates its motion models and, as appropriate, further updates the coherent motion regions, and so forth. The system then does the same analysis for the remaining frames. The system next segments the image into regions of coherent motion and defines associated layers in terms of (i) pixel intensity values, (ii) associated motion model parameters, and (iii) order in "depth" within the image.

...read moreread less

210 citations

Proceedings Article•DOI•

Spatio-Temporal Segmentation of Video Data

[...]

J.Y.A. Wang¹, Edward H. Adelson¹•Institutions (1)

Massachusetts Institute of Technology¹

23 Mar 1994

TL;DR: The objective of the spatiotemporal segmentation is to produce a layered image representation of the video for image coding applications whereby video data is simply described as a set of moving layers.

...read moreread less

Abstract: Image segmentation provides a powerful semantic description of video imagery essential in image understanding and efficient manipulation of image data. In particular, segmentation based on image motion defines regions undergoing similar motion allowing image coding system to more efficiently represent video sequences. This paper describes a general iterative framework for segmentation of video data . The objective of our spatiotemporal segmentation is to produce a layered image representation of the video for image coding applications whereby video data is simply described as a set of moving layers.

...read moreread less

121 citations

Proceedings Article•DOI•

Applying Mid-level Vision Techniques for Video Data Compression and Manipulation

[...]

J.Y.A. Wang¹, Edward H. Adelson¹•Institutions (1)

Massachusetts Institute of Technology¹

02 May 1994

TL;DR: In this paper, a coding scheme based on a set of overlapping layers is described, which are ordered in depth and move over one another, in a manner similar to traditional “cel” animation.

...read moreread less

Abstract: Most image coding systems rely on signal processing concepts such as transforms, VQ, and motion compensation. In order to achieve significantly lower bit rates, it will be necessary to devise encoding schemes that involve mid-level and high-level computer vision. Model-based systems have been described, but these are usually restricted to some special class of images such as head-and-shoulders sequences. We propose to use mid-level vision concepts to achieve a decomposition that can be applied to a wider domain of image material. In particular, we describe a coding scheme based on a set of overlapping layers. The layers, which are ordered in depth and move over one another, are composited in a manner similar to traditional “cel” animation. The decompos ition (the vision problem) is challenging, but we have attained promising results on simple sequences. Once the decomposition has been achieved, the synthesis is straightforward.

...read moreread less

42 citations

Cited by

PDF

Open Access

More filters

Proceedings Article•DOI•

Going deeper with convolutions

[...]

Christian Szegedy¹, Wei Liu², Yangqing Jia¹, Pierre Sermanet¹, Scott Reed³, Dragomir Anguelov¹, Dumitru Erhan¹, Vincent Vanhoucke¹, Andrew Rabinovich - Show less +5 more•Institutions (3)

Google¹, University of North Carolina at Chapel Hill², University of Michigan³

07 Jun 2015

TL;DR: Inception as mentioned in this paper is a deep convolutional neural network architecture that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).

...read moreread less

Abstract: We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. By a carefully crafted design, we increased the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our submission for ILSVRC14 is called GoogLeNet, a 22 layers deep network, the quality of which is assessed in the context of classification and detection.

...read moreread less

40,257 citations

Journal Article•DOI•

A taxonomy and evaluation of dense two-frame stereo correspondence algorithms

[...]

Daniel Scharstein¹, Richard Szeliski², Ramin Zabih³•Institutions (3)

Middlebury College¹, Microsoft², Cornell University³

09 Dec 2001-International Journal of Computer Vision

TL;DR: This paper has designed a stand-alone, flexible C++ implementation that enables the evaluation of individual components and that can easily be extended to include new algorithms.

...read moreread less

Abstract: Stereo matching is one of the most active research areas in computer vision. While a large number of algorithms for stereo correspondence have been developed, relatively little work has been done on characterizing their performance. In this paper, we present a taxonomy of dense, two-frame stereo methods designed to assess the different components and design decisions made in individual stereo algorithms. Using this taxonomy, we compare existing stereo methods and present experiments evaluating the performance of many different variants. In order to establish a common software platform and a collection of data sets for easy evaluation, we have designed a stand-alone, flexible C++ implementation that enables the evaluation of individual components and that can be easily extended to include new algorithms. We have also produced several new multiframe stereo data sets with ground truth, and are making both the code and data sets available on the Web.

...read moreread less

7,458 citations

Journal Article•DOI•

Object tracking: A survey

[...]

Alper Yilmaz¹, Omar Javed, Mubarak Shah²•Institutions (2)

Ohio State University¹, University of Central Florida²

25 Dec 2006-ACM Computing Surveys

TL;DR: The goal of this article is to review the state-of-the-art tracking methods, classify them into different categories, and identify new trends to discuss the important issues related to tracking including the use of appropriate image features, selection of motion models, and detection of objects.

...read moreread less

Abstract: The goal of this article is to review the state-of-the-art tracking methods, classify them into different categories, and identify new trends. Object tracking, in general, is a challenging problem. Difficulties in tracking objects can arise due to abrupt object motion, changing appearance patterns of both the object and the scene, nonrigid object structures, object-to-object and object-to-scene occlusions, and camera motion. Tracking is usually performed in the context of higher-level applications that require the location and/or shape of the object in every frame. Typically, assumptions are made to constrain the tracking problem in the context of a particular application. In this survey, we categorize the tracking methods on the basis of the object and motion representations used, provide detailed descriptions of representative methods in each category, and examine their pros and cons. Moreover, we discuss the important issues related to tracking including the use of appropriate image features, selection of motion models, and detection of objects.

...read moreread less

5,318 citations

Book•

Computer Vision: Algorithms and Applications

[...]

Richard Szeliski

30 Sep 2010

TL;DR: Computer Vision: Algorithms and Applications explores the variety of techniques commonly used to analyze and interpret images and takes a scientific approach to basic vision problems, formulating physical models of the imaging process before inverting them to produce descriptions of a scene.

...read moreread less

Abstract: Humans perceive the three-dimensional structure of the world with apparent ease. However, despite all of the recent advances in computer vision research, the dream of having a computer interpret an image at the same level as a two-year old remains elusive. Why is computer vision such a challenging problem and what is the current state of the art? Computer Vision: Algorithms and Applications explores the variety of techniques commonly used to analyze and interpret images. It also describes challenging real-world applications where vision is being successfully used, both for specialized applications such as medical imaging, and for fun, consumer-level tasks such as image editing and stitching, which students can apply to their own personal photos and videos. More than just a source of recipes, this exceptionally authoritative and comprehensive textbook/reference also takes a scientific approach to basic vision problems, formulating physical models of the imaging process before inverting them to produce descriptions of a scene. These problems are also analyzed using statistical models and solved using rigorous engineering techniques Topics and features: structured to support active curricula and project-oriented courses, with tips in the Introduction for using the book in a variety of customized courses; presents exercises at the end of each chapter with a heavy emphasis on testing algorithms and containing numerous suggestions for small mid-term projects; provides additional material and more detailed mathematical topics in the Appendices, which cover linear algebra, numerical techniques, and Bayesian estimation theory; suggests additional reading at the end of each chapter, including the latest research in each sub-field, in addition to a full Bibliography at the end of the book; supplies supplementary course material for students at the associated website, http://szeliski.org/Book/. Suitable for an upper-level undergraduate or graduate-level course in computer science or engineering, this textbook focuses on basic techniques that work under real-world conditions and encourages students to push their creative boundaries. Its design and exposition also make it eminently suitable as a unique reference to the fundamental techniques and current research literature in computer vision.

...read moreread less

4,146 citations

Journal Article•DOI•

Query by image and video content: the QBIC system

[...]

Myron D. Flickner¹, Harpreet Sawhney¹, W. Niblack¹, Jonathan Ashley¹, Qian Huang¹, Byron Dom¹, Monika Gorkani¹, James Lee Hafner¹, D. Lee¹, Dragutin Petkovic¹, David Steele¹, Peter Cornelius Yanker¹ - Show less +8 more•Institutions (1)

IBM¹

01 Sep 1995-IEEE Computer

TL;DR: The Query by Image Content (QBIC) system as discussed by the authors allows queries on large image and video databases based on example images, user-constructed sketches and drawings, selected color and texture patterns, camera and object motion, and other graphical information.

...read moreread less

Abstract: Research on ways to extend and improve query methods for image databases is widespread. We have developed the QBIC (Query by Image Content) system to explore content-based retrieval methods. QBIC allows queries on large image and video databases based on example images, user-constructed sketches and drawings, selected color and texture patterns, camera and object motion, and other graphical information. Two key properties of QBIC are (1) its use of image and video content-computable properties of color, texture, shape and motion of images, videos and their objects-in the queries, and (2) its graphical query language, in which queries are posed by drawing, selecting and other graphical means. This article describes the QBIC system and demonstrates its query capabilities. QBIC technology is part of several IBM products. >

...read moreread less

3,957 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse