Author

Martin Herman

Bio: Martin Herman is an academic researcher from Carnegie Mellon University. The author has contributed to research in topics: Image processing & Scene description language. The author has an h-index of 6, has co-authored 10 publications receiving 339 citations. Previous affiliations of Martin Herman include the National Institute of Standards and Technology.

Papers
Journal ArticleDOI
TL;DR: The 3D Mosaic system is a vision system that incrementally reconstructs complex 3D scenes from a sequence of images obtained from multiple viewpoints, and the various components of the system are described, including stereo analysis, monocular analysis, and constructing and updating the scene model.
Abstract: The 3D Mosaic system is a vision system that incrementally reconstructs complex 3D scenes from a sequence of images obtained from multiple viewpoints. The system encompasses several levels of the vision process, starting with images and ending with symbolic scene descriptions. This paper describes the various components of the system, including stereo analysis, monocular analysis, and constructing and updating the scene model. In addition, the representation of the scene model is described. This model is intended for tasks such as matching, display generation, planning paths through the scene, and making other decisions about the scene environment. Examples showing how the system is used to interpret complex aerial photographs of urban scenes are presented. Each view of the scene, which may be either a single image or a stereo pair, undergoes analysis which results in a 3D wire-frame description that represents portions of edges and vertices of objects. The model is a surface-based description constructed from the wire frames. With each successive view, the model is incrementally updated and gradually becomes more accurate and complete. Task-specific knowledge, involving block-shaped objects in an urban scene, is used to extract the wire frames and construct and update the model. The model is represented as a graph in terms of symbolic primitives such as faces, edges, vertices, and their topology and geometry. This permits the representation of partially complete, planar-faced objects. Because incremental modifications to the model must be easy to perform, the model contains mechanisms to (1) add primitives in a manner such that constraints on geometry imposed by these additions are propagated throughout the model, and (2) modify and delete primitives if discrepancies arise between newly derived and current information. 
The model also contains mechanisms that permit the generation, addition, and deletion of hypotheses for parts of the scene for which there is little data.
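The graph-of-primitives representation described above can be sketched in a few lines of Python. This is a minimal illustrative sketch with hypothetical names; the original system's data structures and constraint-propagation machinery are far richer.

```python
# Sketch (hypothetical names) of a surface-based scene model represented
# as a graph of symbolic primitives, in the spirit of the 3D Mosaic model:
# vertices, edges, and faces with explicit topology, supporting the
# incremental additions and deletions the abstract describes.
from dataclasses import dataclass

@dataclass
class Vertex:
    vid: int
    xyz: tuple          # 3D position; may be only partially constrained

@dataclass
class Edge:
    eid: int
    ends: tuple         # (vid, vid) endpoints

@dataclass
class Face:
    fid: int
    edge_ids: list      # boundary edges; may be incomplete (partial object)

class SceneModel:
    """Graph of primitives supporting incremental update."""
    def __init__(self):
        self.vertices, self.edges, self.faces = {}, {}, {}

    def add_vertex(self, v):
        self.vertices[v.vid] = v

    def add_edge(self, e):
        self.edges[e.eid] = e

    def add_face(self, f):
        self.faces[f.fid] = f

    def delete_edge(self, eid):
        # Remove an edge and drop it from any face boundary, modelling
        # deletion when newly derived information contradicts the model.
        self.edges.pop(eid, None)
        for f in self.faces.values():
            if eid in f.edge_ids:
                f.edge_ids.remove(eid)

m = SceneModel()
m.add_vertex(Vertex(0, (0, 0, 0)))
m.add_vertex(Vertex(1, (1, 0, 0)))
m.add_edge(Edge(0, (0, 1)))
m.add_face(Face(0, [0]))
m.delete_edge(0)        # face 0 becomes a partially complete hypothesis
```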

146 citations

Book ChapterDOI
01 Jan 1987
TL;DR: The various components of the 3D Mosaic system are described, including stereo analysis, monocular analysis, and constructing and modifying the scene model, intended for tasks such as matching, display generation, planning paths through the scene, and making other decisions about the scene environment.
Abstract: The 3D Mosaic system is a vision system that incrementally reconstructs complex 3D scenes from multiple images. The system encompasses several levels of the vision process, starting with images and ending with symbolic scene descriptions. This paper describes the various components of the system, including stereo analysis, monocular analysis, and constructing and modifying the scene model. In addition, the representation of the scene model is described. This model is intended for tasks such as matching, display generation, planning paths through the scene, and making other decisions about the scene environment. Examples showing how the system is used to interpret complex aerial photographs of urban scenes are presented. Each view of the scene, which may be either a single image or a stereo pair, undergoes analysis which results in a 3D wire-frame description that represents portions of edges and vertices of objects. The model is a surface-based description constructed from the wire frames. With each successive view, the model is incrementally updated and gradually becomes more accurate and complete. Task-specific knowledge, involving block-shaped objects in an urban scene, is used to extract the wire frames and construct and update the model.

46 citations

01 Oct 1982
TL;DR: The current state of the 3-D Mosaic project, whose goal is to incrementally acquire a 3-D model of a complex urban scene from images, is described, along with an experiment in combining two views of the scene to obtain a refined model.
Abstract: We describe the current state of the 3-D Mosaic project, whose goal is to incrementally acquire a 3-D model of a complex urban scene from images. The notion of incremental acquisition arises from the observations that 1) single images contain only partial information about a scene, 2) complex images are difficult to fully interpret, and 3) different features of a given scene tend to be easier to extract in different images because of differences in viewpoint and lighting conditions. In our approach, multiple images of the scene are sequentially analyzed so as to incrementally construct the model. Each new image provides information which refines the model. We describe some experiments toward this end. Our method of extracting 3-D shape information from the images is stereo analysis. Because we are dealing with urban scenes, a junction-based matching technique proves very useful. This technique produces rather sparse wire-frame descriptions of the scene. A reasoning system that relies on task-specific knowledge generates an approximate model of the scene from the stereo output. Gray scale information is also acquired for the faces in the model. Finally, we describe an experiment in combining two views of the scene to obtain a refined model.

43 citations

Journal ArticleDOI
TL;DR: The 3D Mosaic project as discussed by the authors uses stereo analysis to extract 3-D shape information from images of complex urban scenes, and then combines two views of the scene to obtain a refined model.
Abstract: We describe the current state of the 3-D Mosaic project, whose goal is to incrementally acquire a 3-D model of a complex urban scene from images. The notion of incremental acquisition arises from the observations that 1) single images contain only partial information about a scene, 2) complex images are difficult to fully interpret, and 3) different features of a given scene tend to be easier to extract in different images because of differences in viewpoint and lighting conditions. In our approach, multiple images of the scene are sequentially analyzed so as to incrementally construct the model. Each new image provides information which refines the model. We describe some experiments toward this end. Our method of extracting 3-D shape information from the images is stereo analysis. Because we are dealing with urban scenes, a junction-based matching technique proves very useful. This technique produces rather sparse wire-frame descriptions of the scene. A reasoning system that relies on task-specific knowledge generates an approximate model of the scene from the stereo output. Gray scale information is also acquired for the faces in the model. Finally, we describe an experiment in combining two views of the scene to obtain a refined model.

41 citations

Proceedings Article
01 Jan 1983

34 citations


Cited by
01 Jan 1979
TL;DR: This special issue aims at gathering the recent advances in learning with shared information methods and their applications in computer vision and multimedia analysis, and at addressing interesting real-world computer vision and multimedia applications.
Abstract: In the real world, a realistic setting for computer vision or multimedia recognition problems is that we have some classes containing lots of training data and many classes containing only a small amount of training data. Therefore, how to use frequent classes to help learn rare classes, for which training data is harder to collect, is an open question. Learning with Shared Information is an emerging topic in machine learning, computer vision and multimedia analysis. There are different levels of components that can be shared during concept modeling and machine learning stages, such as sharing generic object parts, sharing attributes, sharing transformations, sharing regularization parameters and sharing training examples, etc. Regarding the specific methods, multi-task learning, transfer learning and deep learning can be seen as using different strategies to share information. These learning with shared information methods are very effective in solving real-world large-scale problems. This special issue aims at gathering the recent advances in learning with shared information methods and their applications in computer vision and multimedia analysis. Both state-of-the-art works, as well as literature reviews, are welcome for submission. Papers addressing interesting real-world computer vision and multimedia applications are especially encouraged.
Topics of interest include, but are not limited to:
• Multi-task learning or transfer learning for large-scale computer vision and multimedia analysis
• Deep learning for large-scale computer vision and multimedia analysis
• Multi-modal approaches for large-scale computer vision and multimedia analysis
• Different sharing strategies, e.g., sharing generic object parts, sharing attributes, sharing transformations, sharing regularization parameters and sharing training examples
• Real-world computer vision and multimedia applications based on learning with shared information, e.g., event detection, object recognition, object detection, action recognition, human head pose estimation, object tracking, location-based services, semantic indexing
• New datasets and metrics to evaluate the benefit of the proposed sharing ability for the specific computer vision or multimedia problem
• Survey papers regarding the topic of learning with shared information
Authors who are unsure whether their planned submission is in scope may contact the guest editors prior to the submission deadline with an abstract, in order to receive feedback.

1,758 citations

Journal ArticleDOI
TL;DR: In this paper, a precise definition of the 3D object recognition problem is proposed, and basic concepts associated with this problem are discussed, and a review of relevant literature is provided.
Abstract: A general-purpose computer vision system must be capable of recognizing three-dimensional (3-D) objects. This paper proposes a precise definition of the 3-D object recognition problem, discusses basic concepts associated with this problem, and reviews the relevant literature. Because range images (or depth maps) are often used as sensor input instead of intensity images, techniques for obtaining, processing, and characterizing range data are also surveyed.

1,146 citations

Journal ArticleDOI
TL;DR: This paper presents a stereo matching algorithm based on dynamic programming that uses edge-delimited intervals as the elements to be matched, employing two cooperating searches: an inter-scanline search for correspondences of connected edges in the right and left images, and an intra-scanline search for correspondences of edge-delimited intervals on each scanline pair.
Abstract: This paper presents a stereo matching algorithm using the dynamic programming technique. The stereo matching problem, that is, obtaining a correspondence between right and left images, can be cast as a search problem. When a pair of stereo images is rectified, pairs of corresponding points can be searched for within the same scanlines. We call this search intra-scanline search. This intra-scanline search can be treated as the problem of finding a matching path on a two-dimensional (2D) search plane whose axes are the right and left scanlines. Vertically connected edges in the images provide consistency constraints across the 2D search planes. Inter-scanline search in a three-dimensional (3D) search space, which is a stack of the 2D search planes, is needed to utilize this constraint. Our stereo matching algorithm uses edge-delimited intervals as elements to be matched, and employs the above mentioned two searches: one is inter-scanline search for possible correspondences of connected edges in right and left images and the other is intra-scanline search for correspondences of edge-delimited intervals on each scanline pair. Dynamic programming is used for both searches which proceed simultaneously: the former supplies the consistency constraint to the latter while the latter supplies the matching score to the former. An interval-based similarity metric is used to compute the score. The algorithm has been tested with different types of images including urban aerial images, synthesized images, and block scenes, and its computational requirement has been discussed.
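The intra-scanline search described above can be illustrated with a toy dynamic program. This sketch aligns raw pixel intensities on a single scanline pair, paying a dissimilarity cost for a match and a fixed cost for an occlusion (a pixel skipped in either image); the paper's algorithm instead matches edge-delimited intervals and adds inter-scanline consistency, so the names and costs here are illustrative only.

```python
# Toy intra-scanline stereo matching by dynamic programming: find the
# minimum-cost alignment of a left and right scanline, where a match
# costs the absolute intensity difference and an occlusion costs a
# fixed penalty.

def scanline_dp(left, right, occ_cost=2.0):
    n, m = len(left), len(right)
    INF = float("inf")
    # cost[i][j] = best cost of aligning left[:i] with right[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:   # match left[i-1] with right[j-1]
                c = cost[i - 1][j - 1] + abs(left[i - 1] - right[j - 1])
                cost[i][j] = min(cost[i][j], c)
            if i > 0:             # left pixel occluded
                cost[i][j] = min(cost[i][j], cost[i - 1][j] + occ_cost)
            if j > 0:             # right pixel occluded
                cost[i][j] = min(cost[i][j], cost[i][j - 1] + occ_cost)
    return cost[n][m]

# Identical scanlines align at zero cost; a shifted scanline is cheapest
# to explain with one occlusion at each border.
print(scanline_dp([10, 20, 30], [10, 20, 30]))  # 0.0
print(scanline_dp([10, 20, 30], [20, 30, 40]))  # 4.0
```

In the full algorithm, the per-element cost would be the interval-based similarity metric, and the inter-scanline search would constrain which matches are allowed across neighboring scanlines.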

913 citations

Journal ArticleDOI
TL;DR: This paper surveys the state-of-the-art automatic object extraction techniques from aerial imagery and focuses on building extraction approaches, which present the majority of the work in this area.
Abstract: This paper surveys the state-of-the-art automatic object extraction techniques from aerial imagery. It focuses on building extraction approaches, which present the majority of the work in this area. After proposing well-defined criteria for their assessment, characteristic approaches are selected and assessed, based on their models and strategies. The assessment gives rise to a combined model and strategy covering the current knowledge in the field. The model comprises: the derivation of characteristic properties from the function of objects; three-dimensional geometry and material properties; scales and levels of abstraction/aggregation; local and global context. The strategy consists of grouping, focusing on different scales, context-based control and generation of evidence from structures of parts, and fusion of data and algorithms. Many ideas which have not been explored in depth lead to promising directions for further research.

390 citations

Journal ArticleDOI
TL;DR: The authors propose a method for solving the stereo correspondence problem by extracting local image structures and matching similar such structures between two images using a benefit function.
Abstract: The authors propose a method for solving the stereo correspondence problem. The method consists of extracting local image structures and matching similar such structures between two images. Linear edge segments are extracted from both the left and right images. Each segment is characterized by its position and orientation in the image as well as its relationships with the nearby segments. A relational graph is thus built from each image. For each segment in one image, a set of potential assignments is represented as a set of nodes in a correspondence graph. Arcs in the graph represent compatible assignments established on the basis of segment relationships. Stereo matching becomes equivalent to searching for sets of mutually compatible nodes in this graph. Sets are found by looking for maximal cliques. The maximal clique best suited to represent a stereo correspondence is selected using a benefit function. Numerous results obtained with this method are shown.
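The correspondence-graph formulation can be sketched concretely. In this toy Python version, nodes are candidate (left-segment, right-segment) assignments, arcs join assignments that use distinct segments on both sides (a stand-in for the paper's relational compatibility constraints), maximal cliques are enumerated with a simple Bron-Kerbosch recursion, and the "benefit function" is just clique size; all names and costs are illustrative assumptions.

```python
# Toy clique-based stereo correspondence: build a correspondence graph
# over candidate assignments, enumerate its maximal cliques, and pick
# the best clique as the matching.

def bron_kerbosch(R, P, X, adj, cliques):
    """Enumerate maximal cliques of the graph given by adjacency sets."""
    if not P and not X:
        cliques.append(R)
        return
    for v in list(P):
        bron_kerbosch(R | {v}, P & adj[v], X & adj[v], adj, cliques)
        P = P - {v}
        X = X | {v}

# Candidate assignments: (left segment, right segment).
nodes = [("L1", "R1"), ("L1", "R2"), ("L2", "R2"), ("L2", "R3")]

# Two assignments are compatible if they use distinct segments on both
# sides (a uniqueness constraint standing in for the paper's richer
# segment-relationship tests).
adj = {i: set() for i in range(len(nodes))}
for i, (la, ra) in enumerate(nodes):
    for j, (lb, rb) in enumerate(nodes):
        if i != j and la != lb and ra != rb:
            adj[i].add(j)
            adj[j].add(i)

cliques = []
bron_kerbosch(set(), set(adj), set(), adj, cliques)

# Benefit function: here simply the number of matched segments.
best = max(cliques, key=len)
print(sorted(nodes[i] for i in best))
```

Each maximal clique is a mutually consistent set of assignments; the benefit function in the paper would score geometric plausibility rather than raw size.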

370 citations