Author

Gerard Medioni

Other affiliations: Apple Inc., Philips, AmeriCorps VISTA
Bio: Gerard Medioni is an academic researcher affiliated with Amazon.com. The author has contributed to research on topics including image segmentation and facial recognition systems. The author has an h-index of 72 and has co-authored 443 publications that have received 24,378 citations. Previous affiliations of Gerard Medioni include Apple Inc. and Philips.


Papers
Journal ArticleDOI
TL;DR: A new approach is proposed that works on range data directly and registers successive views with enough overlapping area to get an accurate transformation between views. Registration is performed by minimizing a functional that does not require point-to-point matches.

2,850 citations

Proceedings ArticleDOI
09 Apr 1991
TL;DR: The authors propose an approach that works on range data directly and registers successive views with enough overlapping area to get an accurate transformation between views. Registration is performed by minimizing a functional that does not require point-to-point matches.
Abstract: The problem of creating a complete model of a physical object is studied. Although this may be possible using intensity images, the authors use range images, which directly provide access to three-dimensional information. The first problem that needs to be solved is to find the transformation between the different views. Previous approaches have either assumed this transformation to be known (which is extremely difficult for a complete model) or computed it with feature matching (which is not accurate enough for integration). The authors propose an approach that works on range data directly and registers successive views with enough overlapping area to get an accurate transformation between views. This is performed by minimizing a functional that does not require point-to-point matches. Details are given of the registration method and modeling procedure, and they are illustrated on range images of complex objects.

2,157 citations
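
The abstract above does not spell out the functional, so the following is only a minimal sketch of one functional of this kind: a point-to-plane distance, reduced here to a 2-D point-to-line case with translation only. The function name `point_to_line_translation` and the toy data are illustrative assumptions, not the authors' implementation. Each source point is pulled toward a line (an anchor point plus a normal), so the anchor does not have to be the source point's true match.

```python
def point_to_line_translation(points, anchors, normals):
    """Least-squares 2-D translation t minimizing
    sum_i (n_i . (p_i + t - q_i))^2,
    where q_i is ANY point on the target line with unit normal n_i.
    Hypothetical helper for illustration only.
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (px, py), (qx, qy), (nx, ny) in zip(points, anchors, normals):
        # Signed distance from the source point to the target line.
        r = nx * (px - qx) + ny * (py - qy)
        # Accumulate the 2x2 normal equations (sum n n^T) t = -sum n r.
        a11 += nx * nx
        a12 += nx * ny
        a22 += ny * ny
        b1 -= nx * r
        b2 -= ny * r
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("all normals are parallel; translation is under-constrained")
    # Solve the 2x2 system with Cramer's rule.
    tx = (b1 * a22 - a12 * b2) / det
    ty = (a11 * b2 - a12 * b1) / det
    return tx, ty


# Toy data: the target consists of the lines y = 0 and x = 0.
# The source points lie on those lines after being shifted by (-0.5, 0.25).
source = [(0.5, 0.25), (1.5, 0.25), (-0.5, 1.25), (-0.5, 3.25)]
# Anchors are arbitrary points on the lines, not the true corresponding points.
anchors = [(10.0, 0.0), (-4.0, 0.0), (0.0, 7.0), (0.0, -2.0)]
normals = [(0.0, 1.0), (0.0, 1.0), (1.0, 0.0), (1.0, 0.0)]
tx, ty = point_to_line_translation(source, anchors, normals)
```

In this toy case the recovered translation undoes the shift even though none of the anchors is a true point-to-point match. A full registration would also estimate rotation and iterate, which this sketch omits.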

Journal ArticleDOI
TL;DR: The approach uses two different types of primitives for matching: small surface patches, where differential properties can be reliably computed, and lines corresponding to depth or orientation discontinuities. These are represented by splashes and 3-D curves, respectively.
Abstract: The authors present an approach for the recognition of multiple 3-D object models from 3-D scene data. The approach uses two different types of primitives for matching: small surface patches, where differential properties can be reliably computed, and lines corresponding to depth or orientation discontinuities. These are represented by splashes and 3-D curves, respectively. It is shown how both of these primitives can be encoded by a set of super segments, consisting of connected linear segments. These super segments are entered into a table and provide the essential mechanism for fast retrieval and matching. The issues of robustness and stability of the features are addressed in detail. The acquisition of the 3-D models is performed automatically by computing splashes in highly structured areas of the objects and by using boundary and surface edges for the generation of 3-D curves. The authors present results with the current system (3-D object recognition based on super segments) and discuss further extensions.

577 citations

Proceedings ArticleDOI
20 Jun 2011
TL;DR: A method is presented that addresses visual tracking in unconstrained environments by exploiting context on-the-fly in two forms: Distracters and Supporters. Experiments show that tracking improves when this context information is used.
Abstract: Visual tracking in unconstrained environments is very challenging due to several sources of variation, such as changes in appearance, varying lighting conditions, cluttered background, and frame-cuts. A major factor causing tracking failure is the emergence of regions whose appearance is similar to the target's. It is even more challenging when the target leaves the field of view (FoV), leading the tracker to follow another similar object and fail to reacquire the right target when it reappears. This paper presents a method to address this problem by exploiting the context on-the-fly in two terms: Distracters and Supporters. Both of them are automatically explored using a sequential randomized forest, an online template-based appearance model, and local features. Distracters are regions whose appearance is similar to the target's and which consistently co-occur with a high confidence score. The tracker must keep tracking these distracters to avoid drifting. Supporters, on the other hand, are local key-points around the target with consistent co-occurrence and motion correlation in a short time span. They play an important role in verifying the genuine target. Extensive experiments on challenging real-world video sequences show the tracking improvement when using this context information. Comparisons with several state-of-the-art approaches are also provided.

566 citations

Journal ArticleDOI
TL;DR: A system is presented that takes as input a video stream obtained from an airborne moving platform and produces an analysis of the behavior of the moving objects in the scene. It relies on two modular blocks to achieve this functionality.
Abstract: We present a system which takes as input a video stream obtained from an airborne moving platform and produces an analysis of the behavior of the moving objects in the scene. To achieve this functionality, our system relies on two modular blocks. The first one detects and tracks moving regions in the sequence. It uses a set of features at multiple scales to stabilize the image sequence, that is, to compensate for the motion of the observer, then extracts regions with residual motion and uses an attribute graph representation to infer their trajectories. The second module takes as input these trajectories, together with user-provided information in the form of geospatial context and goal context to instantiate likely scenarios. We present details of the system, together with results on a number of real video sequences and also provide a quantitative analysis of the results.

505 citations


Cited by
Journal ArticleDOI
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. 
Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

13,246 citations

Journal ArticleDOI
TL;DR: The convergence of a recursive mean shift procedure to the nearest stationary point of the underlying density function is proved, establishing its utility in detecting the modes of the density.
Abstract: A general non-parametric technique is proposed for the analysis of a complex multimodal feature space and to delineate arbitrarily shaped clusters in it. The basic computational module of the technique is an old pattern recognition procedure: the mean shift. For discrete data, we prove the convergence of a recursive mean shift procedure to the nearest stationary point of the underlying density function and, thus, its utility in detecting the modes of the density. The relation of the mean shift procedure to the Nadaraya-Watson estimator from kernel regression and the robust M-estimators of location is also established. Algorithms for two low-level vision tasks, discontinuity-preserving smoothing and image segmentation, are described as applications. In these algorithms, the only user-set parameter is the resolution of the analysis, and either gray-level or color images are accepted as input. Extensive experimental results illustrate their excellent performance.

11,727 citations
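
The recursive mean shift procedure described in the abstract above can be illustrated with a minimal 1-D sketch. The Gaussian kernel, the function name `mean_shift_mode`, and the toy samples are assumptions made for illustration. The `bandwidth` argument plays the role of the single user-set resolution parameter.

```python
import math


def mean_shift_mode(samples, start, bandwidth, tol=1e-6, max_iter=500):
    """Move `start` uphill on a Gaussian kernel density estimate of
    `samples` until the shift falls below `tol`.
    Hypothetical helper for illustration only.
    """
    x = float(start)
    for _ in range(max_iter):
        # Kernel weight of every sample relative to the current estimate.
        weights = [math.exp(-0.5 * ((s - x) / bandwidth) ** 2) for s in samples]
        total = sum(weights)
        if total == 0.0:
            # Every weight underflowed: the start is too far from the data.
            return x
        # The weighted mean of the samples is the next estimate.
        new_x = sum(w * s for w, s in zip(weights, samples)) / total
        if abs(new_x - x) < tol:
            return new_x
        x = new_x
    return x


# Two well-separated clusters, around 1.0 and around 5.0.
samples = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
left_mode = mean_shift_mode(samples, start=0.5, bandwidth=0.5)
right_mode = mean_shift_mode(samples, start=5.6, bandwidth=0.5)
```

Each starting point climbs to the mode of the cluster nearest to it, so running the procedure from many starting points and grouping those that reach the same mode gives a clustering without fixing the number of clusters in advance.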

Journal ArticleDOI
TL;DR: A set of automated procedures for obtaining accurate reconstructions of the cortical surface are described, which have been applied to data from more than 100 subjects, requiring little or no manual intervention.

9,599 citations

Journal ArticleDOI
TL;DR: This paper presents a taxonomy of dense, two-frame stereo methods, together with a stand-alone, flexible C++ implementation that enables the evaluation of individual components and can easily be extended to include new algorithms.
Abstract: Stereo matching is one of the most active research areas in computer vision. While a large number of algorithms for stereo correspondence have been developed, relatively little work has been done on characterizing their performance. In this paper, we present a taxonomy of dense, two-frame stereo methods designed to assess the different components and design decisions made in individual stereo algorithms. Using this taxonomy, we compare existing stereo methods and present experiments evaluating the performance of many different variants. In order to establish a common software platform and a collection of data sets for easy evaluation, we have designed a stand-alone, flexible C++ implementation that enables the evaluation of individual components and that can be easily extended to include new algorithms. We have also produced several new multiframe stereo data sets with ground truth, and are making both the code and data sets available on the Web.

7,458 citations

Journal ArticleDOI
TL;DR: A review of recent as well as classic image registration methods to provide a comprehensive reference source for the researchers involved in image registration, regardless of particular application areas.

6,842 citations