
Showing papers on "Silhouette published in 2013"


Journal ArticleDOI
TL;DR: A human action recognition method is presented in which pose representation is based on the contour points of the human silhouette and actions are learned by making use of sequences of multi-view key poses, achieving state-of-the-art success rates without compromising the speed of the recognition process.

168 citations


Journal ArticleDOI
TL;DR: This paper proposes to use variations in silhouette area, obtained from only one camera with a simple background-separation method, as a fall-detection feature, and shows that the proposed feature is view invariant.
Abstract: The elderly population is growing in most countries, and many seniors live alone at home. Falling is among the most dangerous events that often happen to them and may require immediate medical care. Automatic fall-detection systems could help older people and patients live independently. Vision-based systems have an advantage over wearable devices. These visual systems extract features from video sequences and classify fall and normal activities. Such features usually depend on the camera's view direction, and using several cameras to solve this problem increases the complexity of the final system. In this paper, we propose to use variations in silhouette area obtained from only one camera. We use a simple background-separation method to find the silhouette, and we show that the proposed feature is view invariant. The extracted feature is fed into a support vector machine for classification. Simulation of the proposed method on a publicly available dataset shows promising results.
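The core feature is easy to sketch (Python; the binary masks would come from the paper's background-separation step, and the window size and toy frames below are illustrative assumptions, not the authors' implementation):

```python
def silhouette_area(mask):
    """Count foreground pixels in a binary mask (list of 0/1 rows)."""
    return sum(sum(row) for row in mask)

def area_variation(masks, window=3):
    """Maximum relative change of silhouette area over a sliding window.
    A fall produces a sudden, large area change regardless of viewpoint."""
    areas = [silhouette_area(m) for m in masks]
    best = 0.0
    for i in range(len(areas) - window):
        a0, a1 = areas[i], areas[i + window]
        if a0 > 0:
            best = max(best, abs(a1 - a0) / a0)
    return best

# Toy example: a tall "standing" silhouette collapsing into a "lying" one.
stand = [[1, 1], [1, 1], [1, 1], [0, 0]]   # area 6
lie   = [[0, 0], [0, 0], [1, 1], [1, 1]]   # area 4
feature = area_variation([stand] * 3 + [lie] * 3)
```

In the paper this scalar feature (over real silhouette sequences) is what gets fed to the SVM classifier.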

157 citations


Journal ArticleDOI
TL;DR: A positive linear association between preferred silhouette and age remained after stratification by BMI, and a significant inverse linear association of silhouette discrepancy score and age was found only prior to stratifying by BMI.
Abstract: To explore age differences in current and preferred silhouette and body dissatisfaction (current − preferred silhouette discrepancy) in women aged 25–89 years using figural stimuli [range: 1 (very small) to 9 (very large)]. Data were abstracted from two online convenience samples (N = 5868). t-tests with permutation-adjusted p-values examined linear associations between mean silhouette scores (current, preferred, discrepancy score) and age with/without stratification by body mass index (BMI). Modal current silhouette was 5; modal preferred silhouette was 4; mean discrepancy score was 1.8. There was no significant association between current silhouette and age, but a positive linear association between preferred silhouette and age remained after stratification by BMI. A significant inverse linear association of silhouette discrepancy score and age was found only prior to stratification by BMI. Body dissatisfaction exists in women across the adult life span and is influenced by BMI. Copyright © 2012 John Wiley & Sons, Ltd and Eating Disorders Association.

140 citations


Journal ArticleDOI
TL;DR: A novel method based on anthropometric measures of the hand is proposed for extracting the regions constituting the hand and the forearm, and it is well suited to real-time implementation of gesture-based applications.

130 citations


Journal ArticleDOI
TL;DR: A novel incremental framework based on optical flow is proposed, which can greatly improve the usability of gait traits in video surveillance applications and makes the training process of the HMM more robust to noise.
Abstract: Gait analysis provides a feasible approach for identification in intelligent video surveillance. However, the effectiveness of the dominant silhouette-based approaches is overly dependent upon background subtraction. In this paper, we propose a novel incremental framework based on optical flow, including dynamics learning, pattern retrieval, and recognition. It can greatly improve the usability of gait traits in video surveillance applications. Local binary pattern (LBP) is employed to describe the texture information of optical flow. This representation, called LBP flow, performs well as a static representation of gait movement. Dynamics within and among gait stances become the key consideration for multiframe detection and tracking, which is quite different from existing approaches. To simulate the natural way of knowledge acquisition, an individual hidden Markov model (HMM) representing the gait dynamics of a single subject incrementally evolves from a population model that reflects the average motion process of human gait. It is beneficial for both tracking and recognition and makes the training process of the HMM more robust to noise. Extensive experiments on widely adopted databases have been carried out to show that our proposed approach achieves excellent performance.
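A plain-Python sketch of LBP coding, which in the paper is applied to optical-flow fields rather than raw intensities (the 8-neighbour pattern and 256-bin histogram shown are the standard LBP formulation, assumed here, not necessarily the authors' exact variant):

```python
def lbp_code(img, y, x):
    """8-neighbour local binary pattern code at (y, x): each neighbour
    contributes one bit, set when it is >= the centre value."""
    center = img[y][x]
    neigh = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
             (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dy, dx) in enumerate(neigh):
        if img[y + dy][x + dx] >= center:
            code |= 1 << bit
    return code

def lbp_histogram(img):
    """256-bin LBP histogram over the interior pixels of a 2D array,
    e.g. an optical-flow magnitude map ("LBP flow")."""
    hist = [0] * 256
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            hist[lbp_code(img, y, x)] += 1
    return hist
```

On a flat region every neighbour ties with the centre, so all mass lands in code 255; texture in the flow field spreads mass across the histogram.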

123 citations


Journal ArticleDOI
TL;DR: Results show that the proposed approach efficiently tackles the main Kinect data problems: distance-dependent depth maps, spatial noise, and temporal random fluctuations are dramatically reduced; object depth boundaries are refined; and nonmeasured depth pixels are interpolated.
Abstract: Low-cost depth cameras, such as Microsoft Kinect, have completely changed the world of human-computer interaction through controller-free gaming applications. Depth data provided by the Kinect sensor presents several noise-related problems that have to be tackled to improve the accuracy of the depth data, thus obtaining more reliable game control platforms and broadening its applicability. In this paper, we present a depth-color fusion strategy for 3-D modeling of indoor scenes with Kinect. Accurate depth and color models of the background elements are iteratively built and used to detect moving objects in the scene. Kinect depth data is processed with an innovative adaptive joint-bilateral filter that efficiently combines depth and color by analyzing an edge-uncertainty map and the detected foreground regions. Results show that the proposed approach efficiently tackles the main Kinect data problems: distance-dependent depth maps, spatial noise, and temporal random fluctuations are dramatically reduced; object depth boundaries are refined; and nonmeasured depth pixels are interpolated. Moreover, a robust depth and color background model and accurate moving-object silhouettes are generated.
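The joint-bilateral idea can be illustrated with a minimal sketch (Python; single-channel intensity stands in for color, and the fixed kernel parameters are illustrative assumptions; the paper's adaptive, edge-uncertainty-driven filter is more involved):

```python
import math

def joint_bilateral(depth, color, radius=1, sigma_s=1.0, sigma_r=25.0):
    """Filter a depth map with spatial Gaussian weights multiplied by
    range weights taken from a registered intensity image, so smoothing
    stops at colour edges and depth boundaries snap to them."""
    h, w = len(depth), len(depth[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            num = den = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        ws = math.exp(-(dy * dy + dx * dx) / (2 * sigma_s ** 2))
                        dc = color[yy][xx] - color[y][x]
                        wr = math.exp(-(dc * dc) / (2 * sigma_r ** 2))
                        num += ws * wr * depth[yy][xx]
                        den += ws * wr
            out[y][x] = num / den
    return out
```

Where the guidance image is uniform this reduces to a Gaussian blur; across a strong colour edge the range weight collapses to zero and the depth step is preserved.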

101 citations


Journal ArticleDOI
TL;DR: The proposed scheme takes advantage of both local and global features and, therefore, provides a discriminative representation for human actions; it outperforms the state-of-the-art methods on the IXMAS action recognition dataset.
Abstract: In this paper, we propose a novel scheme for human action recognition that combines the advantages of both local and global representations. We explore human silhouettes for human action representation by taking into account the correlation between sequential poses in an action. A modified bag-of-words model, named bag of correlated poses, is introduced to encode temporally local features of actions. To utilize the property of visual word ambiguity, we adopt the soft assignment strategy to reduce the dimensionality of our model and circumvent the penalty of computational complexity and quantization error. To compensate for the loss of structural information, we propose an extended motion template, i.e., extensions of the motion history image, to capture the holistic structural features. The proposed scheme takes advantage of local and global features and, therefore, provides a discriminative representation for human actions. Experimental results demonstrate the complementary properties of the two descriptors, and the proposed approach outperforms the state-of-the-art methods on the IXMAS action recognition dataset.
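The soft-assignment step can be sketched as follows (Python; the codebook and kernel width are illustrative assumptions, and the actual model encodes correlated pose pairs rather than single descriptors):

```python
import math

def soft_assign(descriptor, codebook, sigma=1.0):
    """Soft assignment: instead of voting for the single nearest visual
    word (hard quantisation), distribute the descriptor's vote over all
    codewords with a Gaussian kernel on squared distance, reducing
    quantisation error for ambiguous descriptors."""
    weights = []
    for word in codebook:
        d2 = sum((a - b) ** 2 for a, b in zip(descriptor, word))
        weights.append(math.exp(-d2 / (2 * sigma ** 2)))
    total = sum(weights)
    return [w / total for w in weights]
```

A descriptor sitting on top of a codeword gets nearly all the mass; one equidistant between two codewords splits its vote evenly, which a hard assignment cannot express.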

97 citations


Journal ArticleDOI
TL;DR: A novel approach to recovering and grouping the symmetric parts of an object from a cluttered scene by using a multiresolution superpixel segmentation to generate medial point hypotheses, and using a learned affinity function to perceptually group nearby medial points likely to belong to the same medial branch.
Abstract: Skeletonization algorithms typically decompose an object's silhouette into a set of symmetric parts, offering a powerful representation for shape categorization. However, having access to an object's silhouette assumes correct figure-ground segmentation, leading to a disconnect with the mainstream categorization community, which attempts to recognize objects from cluttered images. In this paper, we present a novel approach to recovering and grouping the symmetric parts of an object from a cluttered scene. We begin by using a multiresolution superpixel segmentation to generate medial point hypotheses, and use a learned affinity function to perceptually group nearby medial points likely to belong to the same medial branch. In the next stage, we learn higher granularity affinity functions to group the resulting medial branches likely to belong to the same object. The resulting framework yields a skeletal approximation that is free of many of the instabilities that occur with traditional skeletons. More importantly, it does not require a closed contour, enabling the application of skeleton-based categorization systems to more realistic imagery.

96 citations


Proceedings ArticleDOI
02 Dec 2013
TL;DR: The combination of body pose estimation and 2D shape is considered in order to provide additional discriminative value and improve human action recognition; the fused feature improves recognition rates and robustness, outperforming state-of-the-art results.
Abstract: Since the Microsoft Kinect has been released, the usage of marker-less body pose estimation has been enormously eased. Based on 3D skeletal pose information, complex human gestures and actions can be recognised in real time. However, due to errors in tracking or occlusions, the obtained information can be noisy. Since the RGB-D data is available, the 3D or 2D shape of the person can be used instead. However, depending on the viewpoint and the action to recognise, it might present a low discriminative value. In this paper, we consider combining body pose estimation with 2D shape in order to provide additional discriminative value and improve human action recognition. Using efficient feature extraction techniques, skeletal and silhouette-based features are obtained which are low dimensional and can be computed in real time. These two features are then combined by means of feature fusion. The proposed approach is validated using a state-of-the-art learning method and the MSR Action3D dataset as benchmark. The obtained results show that the fused feature improves recognition rates and robustness, outperforming state-of-the-art results.

94 citations


Journal ArticleDOI
TL;DR: An automatic pipeline identifies and extracts the silhouette of signs in every individual image, and a multi-view constrained 3D reconstruction algorithm provides an optimum 3D silhouette for the detected signs.
Abstract: 3D reconstruction of traffic signs is of great interest in many applications such as image-based localization and navigation. In order to reflect reality, the reconstruction process should meet both accuracy and precision requirements. To reach such a valid reconstruction from calibrated multi-view images, accurate and precise extraction of the signs in every individual view is a must. This paper first presents an automatic pipeline for identifying and extracting the silhouette of signs in every individual image. Then, a multi-view constrained 3D reconstruction algorithm provides an optimum 3D silhouette for the detected signs. The first step, called detection, applies a color-based segmentation to generate regions of interest (ROIs) in the image. The shape of every ROI is estimated by fitting an ellipse, a quadrilateral or a triangle to edge points; a ROI is rejected if none of the three shapes can be fitted sufficiently precisely. Thanks to the estimated shape, the remaining candidate ROIs are rectified to remove the perspective distortion and then matched with a set of reference signs using textural information. Poor matches are rejected and the types of the remaining ones are identified. The output of the detection algorithm is a set of identified road signs whose silhouettes in the image plane are represented by an ellipse, a quadrilateral or a triangle. The 3D reconstruction process is based on hypothesis generation and verification. Hypotheses are generated by a stereo matching approach taking into account epipolar geometry and the similarity of the categories. The hypotheses that plausibly correspond to the same 3D road sign are identified and grouped during this process. Finally, all the hypotheses of the same group are merged to generate a unique 3D road sign by a multi-view algorithm integrating a priori knowledge about the 3D shape of road signs as constraints. The algorithm is assessed on real and synthetic images and reached an average accuracy of 3.5 cm for position and 4.5° for orientation.

66 citations


Journal ArticleDOI
TL;DR: A three-phase gait recognition method is presented that analyses the spatio-temporal shape and dynamic motion characteristics of a human subject's silhouettes to identify the subject in the presence of most of the challenging factors that affect existing gait recognition systems.

Proceedings ArticleDOI
16 Mar 2013
TL;DR: This work developed a real-time finger tracking technique using the Microsoft Kinect as an input device and compared its results with an existing technique that uses the K-curvature algorithm.
Abstract: Hand gestures are intuitive ways to interact with a variety of user interfaces. We developed a real-time finger tracking technique using the Microsoft Kinect as an input device and compared its results with an existing technique that uses the K-curvature algorithm. Our technique calculates feature vectors based on Fourier descriptors of equidistant points chosen on the silhouette of the detected hand and uses template matching to find the best match. Our preliminary results show that our technique performed as well as an existing K-curvature-based finger detection technique.
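A sketch of the descriptor computation (Python; the contour points are assumed already resampled equidistantly, and the normalisation choices shown are one standard way to obtain translation/scale/rotation tolerance, not necessarily the authors'):

```python
import cmath
import math

def fourier_descriptors(points, n_coeffs=8):
    """Shape signature from equidistant contour points, as the DFT of the
    complex signal x + iy. Dropping c0 removes translation, dividing by
    |c1| normalises scale, and taking magnitudes discards rotation and
    start-point phase."""
    z = [complex(x, y) for x, y in points]
    n = len(z)
    coeffs = []
    for k in range(n_coeffs + 1):
        c = sum(z[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) / n
        coeffs.append(c)
    scale = abs(coeffs[1]) or 1.0
    # c0 (translation) is dropped; c1 serves only as the scale reference.
    return [abs(c) / scale for c in coeffs[2:]]
```

Template matching then reduces to comparing these vectors, e.g. by Euclidean distance against stored gesture templates.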

Proceedings ArticleDOI
29 Jun 2013
TL;DR: A new method designed to reconstruct closed surfaces that relies on recent advances in silhouette-based reconstruction methods to obtain the template from a reference image, and combines an inextensibility prior on the deformation with powerful image measurements, in the form of silhouette and area constraints.
Abstract: Reconstructing the shape of a deformable object from a single image is a challenging problem, even when a 3D template shape is available. Many different methods have been proposed for this problem; however, what they have in common is that they are only able to reconstruct the part of the surface which is visible in a reference image. In contrast, we are interested in recovering the full shape of a deformable 3D object. We introduce a new method designed to reconstruct closed surfaces. This type of surface is better suited for representing objects with volume. Our method relies on recent advances in silhouette-based reconstruction methods to obtain the template from a reference image. This template is then deformed in order to fit the measurements of a new input image. We combine an inextensibility prior on the deformation with powerful image measurements, in the form of silhouette and area constraints, to make our method less reliant on point correspondences. We show reconstruction results for different object classes, such as animals or hands, that have not been previously attempted with existing template methods.

Journal ArticleDOI
01 Jan 2013
TL;DR: A method that integrates both geometric and statistical priors to reconstruct the shape of a subject assuming a standardized posture from a frontal and a lateral silhouette, showing a mean absolute 3D error of 8 mm with ideal silhouette extraction.
Abstract: Silhouettes are robust image features that provide considerable evidence about the three-dimensional (3D) shape of a human body. The information they provide is, however, incomplete, and prior knowledge has to be integrated into reconstruction algorithms in order to obtain realistic body models. This paper presents a method that integrates both geometric and statistical priors to reconstruct the shape of a subject assuming a standardized posture from a frontal and a lateral silhouette. The method comprises three successive steps. First, a non-linear function that connects the silhouette appearances and the body shapes is learnt and used to create a first approximation. Then, the body shape is deformed globally along the principal directions of the population (obtained by performing principal component analysis over 359 subjects) to follow the contours of the silhouettes. Finally, the body shape is deformed locally to ensure it fits the input silhouettes as well as possible. Experimental results showed a mean absolute 3D error of 8 mm with ideal silhouette extraction. Furthermore, experiments on body measurements (circumferences or distances between two points on the body) resulted in a mean error of 11 mm.
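The global deformation step along principal directions can be illustrated with a minimal sketch (Python; shapes are flattened vectors, the components are assumed orthonormal, and the silhouette-driven fitting is reduced to projecting a hypothetical target shape):

```python
def deform_along_pcs(mean_shape, pcs, target):
    """Express a target shape as the mean shape plus a linear combination
    of principal components: each coefficient is the projection of the
    residual (target - mean) onto an orthonormal component, so the model
    can only deform along directions observed in the training population."""
    residual = [t - m for t, m in zip(target, mean_shape)]
    coeffs = [sum(r * p for r, p in zip(residual, pc)) for pc in pcs]
    recon = list(mean_shape)
    for c, pc in zip(coeffs, pcs):
        for i, p in enumerate(pc):
            recon[i] += c * p
    return coeffs, recon
```

Components of the target outside the span of the retained components are discarded, which is exactly what constrains the reconstruction to realistic body shapes.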

Journal ArticleDOI
01 Jan 2013
TL;DR: This paper presents a system for hand gesture recognition devoted to control windows applications that performs hand segmentation as well as a low-level extraction of potentially relevant features which are related to the morphological representation of the hand silhouette.
Abstract: The use of hand gestures offers an alternative to the commonly used human computer interfaces, providing a more intuitive way of navigating among menus and multimedia applications. This paper presents a system for hand gesture recognition devoted to control windows applications. Starting from the images captured by a time-of-flight camera (a camera that produces images with an intensity level inversely proportional to the depth of the objects observed) the system performs hand segmentation as well as a low-level extraction of potentially relevant features which are related to the morphological representation of the hand silhouette. Classification based on these features discriminates between a set of possible static hand postures which results, combined with the estimated motion pattern of the hand, in the recognition of dynamic hand gestures. The whole system works in real-time, allowing practical interaction between user and application.

Proceedings ArticleDOI
23 Jun 2013
TL;DR: An algorithm using local minimum search approximately reconstructs the shape of thin, texture-less objects such as leafless trees when there is noise or deterministic error in the silhouette extraction step or there are small errors in camera calibration.
Abstract: This paper considers the problem of reconstructing the shape of thin, texture-less objects such as leafless trees when there is noise or deterministic error in the silhouette extraction step or there are small errors in camera calibration. Traditional intersection-based techniques such as the visual hull are not robust to error because they penalize false negative and false positive error unequally. We provide a voxel-based formalism that penalizes false negative and positive error equally, by casting the reconstruction problem as a pseudo-Boolean minimization problem, where voxels are the variables of a pseudo-Boolean function and are labeled occupied or empty. Since the pseudo-Boolean minimization problem is NP-Hard for nonsubmodular functions, we developed an algorithm for an approximate solution using local minimum search. Our algorithm treats input binary probability maps (in other words, silhouettes) or continuously-valued probability maps identically, and places no constraints on camera placement. The algorithm was tested on three different leafless trees and one metal object where the number of voxels is 54.4 million (voxel sides measure 3.6 mm). Results show that our approach reconstructs the complicated branching structure of thin, texture-less objects in the presence of error where intersection-based approaches currently fail.
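The contrast with intersection-based carving can be shown in toy form (Python; the hypothetical per-voxel vote counts stand in for projected silhouette evidence, and the majority-vote rule is a simplified stand-in for the paper's pseudo-Boolean minimisation, valid only under equal, independent per-view penalties):

```python
def visual_hull(votes, n_views):
    """Classic intersection: a voxel is occupied only if it falls inside
    every silhouette, so a single false-negative silhouette pixel in one
    view carves the voxel away."""
    return [v == n_views for v in votes]

def symmetric_vote(votes, n_views):
    """With equal false-positive/false-negative penalties applied
    independently per view, minimising the per-voxel cost reduces to a
    majority vote, which tolerates silhouette errors in a minority of
    views."""
    return [v * 2 > n_views for v in votes]
```

For a thin branch seen in 5 views but missed by the silhouette in 2 of them, intersection deletes it while the symmetric rule keeps it.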

Book ChapterDOI
09 Jun 2013
TL;DR: Four silhouette features are selected which represent the dynamics of gait motion and more effectively reflect the small variance between different gait patterns; test patterns are recognized according to the smallest-error principle.
Abstract: In this paper, we present a new silhouette-based gait recognition method via deterministic learning theory. We select four silhouette features which represent the dynamics of gait motion and can more effectively reflect the small variance between different gait patterns. The gait recognition approach consists of two phases: a training phase and a test phase. In the training phase, the gait dynamics underlying different individuals' gaits are locally and accurately approximated by radial basis function (RBF) networks. The obtained knowledge of approximated gait dynamics is stored in constant RBF networks. In the test phase, a bank of dynamical estimators is constructed for all the training gait patterns. By comparing the set of estimators with a test gait pattern, a set of recognition errors is generated, and the average L1 norms of the errors are taken as the similarity measure between the dynamics of the training gait patterns and the dynamics of the test gait pattern. The test gait pattern similar to one of the training gait patterns can be recognized according to the smallest-error principle. Finally, the recognition performance of the proposed algorithm is compared with published gait recognition approaches on the CASIA gait database (Dataset B).
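The smallest-error decision rule of the test phase can be sketched directly (Python; the error series are assumed to be produced by the bank of RBF-network estimators, which is not reproduced here):

```python
def recognise(error_series):
    """error_series: {label: [e_1, ..., e_T]} recognition errors produced
    by the estimator trained on each label. Returns the label whose
    estimator yields the smallest average L1 norm of the errors
    (the smallest-error principle)."""
    def avg_l1(errs):
        return sum(abs(e) for e in errs) / len(errs)
    return min(error_series, key=lambda lab: avg_l1(error_series[lab]))
```

The estimator matching the test subject tracks the gait dynamics closely, so its error series stays small while all others diverge.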

Journal ArticleDOI
TL;DR: The silhouette problem that exists in previous interference-based encryption methods with two POMs can be eliminated during the generation procedure of POMs based on the interference principle, and the multiplexing capacity is analyzed through the correlation coefficient.
Abstract: An approach for multiple-image encryption based on interference and position multiplexing is proposed. In the encryption process, multiple images are analytically hidden into three phase-only masks (POMs). The encryption algorithm for this method is quite simple and does not need iterative encoding. For decryption, both the digital method and optical method could be employed. Also, we analyze the multiplexing capacity through the correlation coefficient. In addition, the silhouette problem that exists in previous interference-based encryption methods with two POMs can be eliminated during the generation procedure of POMs based on the interference principle. Simulation results are presented to verify the validity of the proposed approach.

Journal ArticleDOI
TL;DR: A new gait recognition method that combines holistic and model-based features that is able to capture more detailed sub-dynamics by refining upon the preceding general dynamics.
Abstract: We propose a new gait recognition method that combines holistic and model-based features. Both types of features are extracted automatically from gait silhouette sequences and their combination takes place by means of a pair of hidden Markov models. In the proposed system, the holistic features are initially used for capturing general gait dynamics whereas, subsequently, the model-based features are deployed for capturing more detailed sub-dynamics by refining upon the preceding general dynamics. Furthermore, the holistic and model-based features are suitably processed in order to improve the discriminatory capacity of the final system. The experimental results show that the proposed method exhibits performance advantages in comparison with popular existing methods.

Journal ArticleDOI
TL;DR: In this article, a driving action dataset was prepared by a side-mounted camera looking at a driver's left profile and the driving actions, including operating the shift lever, talking on a cell phone, eating, and smoking, were decomposed into a number of predefined action primitives.
Abstract: In the field of intelligent transportation systems (ITS), automatic interpretation of a driver's behavior is an urgent and challenging topic. This paper studies vision-based driving posture recognition in the human action recognition framework. A driving action dataset was prepared by a side-mounted camera looking at a driver's left profile. The driving actions, including operating the shift lever, talking on a cell phone, eating, and smoking, are first decomposed into a number of predefined action primitives, that is, interaction with the shift lever, operating the shift lever, interaction with the head, and interaction with the dashboard. A global grid-based representation for the action primitives was emphasized, which first generates the silhouette shape from the motion history image, followed by application of the pyramid histogram of oriented gradients (PHOG) for more discriminating characterization. The random forest (RF) classifier was then exploited to classify the action primitives, with comparisons to some other commonly applied classifiers such as NN, multilayer perceptron, and support vector machine. Classification accuracy is over 94% for the RF classifier in holdout and cross-validation experiments on the four manually decomposed driving actions.
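The grid-based orientation-histogram representation can be sketched as follows (Python; a simplified single-level HOG-style histogram rather than the full PHOG pyramid, with illustrative bin and grid sizes):

```python
import math

def grid_orientation_histogram(img, n_bins=8, grid=2):
    """HOG-style feature: finite-difference gradients are computed per
    pixel, and gradient magnitude is accumulated into unsigned-orientation
    bins per grid cell; the per-cell histograms are concatenated."""
    h, w = len(img), len(img[0])
    feats = [[0.0] * n_bins for _ in range(grid * grid)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            ang = math.atan2(gy, gx) % math.pi       # unsigned orientation
            b = min(int(ang / math.pi * n_bins), n_bins - 1)
            cell = (y * grid // h) * grid + (x * grid // w)
            feats[cell][b] += mag
    return [v for cell in feats for v in cell]
```

PHOG repeats this at several grid resolutions and concatenates the levels; the resulting vector is what a random forest (or SVM) classifier would consume.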

Proceedings ArticleDOI
13 Oct 2013
TL;DR: Results encourage the use of 3D meshes as opposed to videos or images, given that their direct, real time acquisition is becoming possible due to devices like Leap Motion® or high resolution depth cameras.
Abstract: This paper presents a method for recognizing hand configurations of the Brazilian sign language (LIBRAS) using 3D meshes and 2D projections of the hand. Five actors performing 61 different hand configurations of the LIBRAS language were recorded twice, and the videos were manually segmented to extract one frame with a frontal and one with a lateral view of the hand. For each frame pair, a 3D mesh of the hand was constructed using the Shape from Silhouette method, and the rotation, translation and scale invariant Spherical Harmonics method was used to extract features for classification. A Support Vector Machine (SVM) achieved a correct classification of Rank1 = 86.06% and Rank3 = 96.83% on a database composed of 610 meshes. SVM classification was also performed on a database composed of 610 image pairs using 2D horizontal and vertical projections as features, resulting in Rank1 = 88.69% and Rank3 = 98.36%. Results encourage the use of 3D meshes as opposed to videos or images, given that their direct, real time acquisition is becoming possible due to devices like Leap Motion® or high resolution depth cameras.

Journal ArticleDOI
TL;DR: Unlike recent approaches, this work shows how the combination of the 3D gradient and textural appearance improves recognition accuracy compared to methods based on silhouettes or non-textural feature extractors.

Patent
27 Mar 2013
TL;DR: In this paper, a computer graphic editing or modeling system that automatically alters a computer graphics object based on a user sketch is presented, where the sketch is placed in proximity to some feature of the image space view.
Abstract: A computer graphic editing or modeling system that automatically alters a computer graphic object based on a user sketch. The computer graphic object may be presented as an image space view of the object (proxy). The sketch is placed in proximity to some feature of the image space view. The system matches the sketch with the feature taking into account silhouettes, which may be derived by way of depth continuity and depth gradient similarity, of the object and matching the silhouette with the feature based on proximity and shape. The matched handle silhouette is transformed to associated handle vertices of a mesh of the graphic object. The system may then deform the mesh based on the user sketch by obtaining a dimensional relationship between the user sketch and the associated silhouette and applying the dimensional relationship to a region of interest, which includes the handle vertices.

Proceedings ArticleDOI
01 Sep 2013
TL;DR: It is shown that GEI can even be outperformed by directly applying gradient histogram extraction to the already binarized silhouettes, and with a new part-based extension, recognition performance can be further improved.
Abstract: In this paper, we exploit gradient histograms for person identification based on gait. A traditional and successful method for gait recognition is the Gait Energy Image (GEI). Here, person silhouettes are averaged over full gait cycles, which leads to a robust and efficient representation. However, binarized silhouettes only capture edge information at the boundary of the person. By contrast, the Gradient Histogram Energy Image (GHEI) also captures edges within the silhouette by means of gradient histograms. Combined with precise α-matte preprocessing and with a new part-based extension, recognition performance can be further improved. In addition, we show that GEI can even be outperformed by directly applying gradient histogram extraction to the already binarized silhouettes. We run all experiments on the widely used HumanID gait database and show significant performance improvements over the current state of the art.
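The GEI baseline itself is simple to state in code (Python sketch; silhouettes are assumed already aligned and cropped over one gait cycle):

```python
def gait_energy_image(silhouettes):
    """Gait Energy Image: average a sequence of aligned binary silhouettes
    over a full gait cycle, yielding one grey-level template per sequence.
    Frequently-foreground pixels approach 1; rarely-foreground ones stay
    near 0."""
    n = len(silhouettes)
    h, w = len(silhouettes[0]), len(silhouettes[0][0])
    gei = [[0.0] * w for _ in range(h)]
    for s in silhouettes:
        for y in range(h):
            for x in range(w):
                gei[y][x] += s[y][x] / n
    return gei
```

GHEI replaces the binary silhouettes in this average with per-pixel gradient histograms, which is what lets it capture edges inside the body region as well.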

Journal ArticleDOI
TL;DR: This paper presents a new appearance descriptor that is distinctive and resilient to noise for 3D human pose estimation and combines the proposed appearance descriptor with a shape descriptor computed from the silhouette of the human subject using discriminative learning.

Proceedings ArticleDOI
01 Jan 2013
TL;DR: A region-based approach fuses Graph-cut segmentation with human object detection, showing effectiveness in obtaining a region-based silhouette of a player in sports video and supporting its suitability for segmenting videos with dynamic backgrounds.
Abstract: In this paper we present a novel and effective way of extracting a region-based silhouette of a human against a moving background, thereby facilitating subsequent analysis such as action recognition. The system first detects objects in the video that can be classified as human or non-human. For this, the Histogram of Oriented Gradients (HOG) is used as the descriptor and a Support Vector Machine (SVM) is used as the classifier. The localized human part also contains unnecessary background information. Hence, we propose to use the Graph-Cut method for extracting the foreground (humans) from the video. Since our goal is to extract only human regions, we propose a region-based approach that fuses Graph-cut segmentation with human object detection. Sports videos are used to test the proposed system and algorithms, and the extensive and encouraging experimental results show its effectiveness in obtaining a region-based silhouette of a player in sports video and also support its suitability for segmenting videos with dynamic backgrounds.

Journal ArticleDOI
TL;DR: A region-based method to recognize human actions from video sequences that works with the regions surrounding the human silhouette, termed negative space, which addresses the problem of long shadows, one of the major challenges of human action recognition.

Journal ArticleDOI
TL;DR: Experimental results show that the retrieval results using the salient views are comparable to the existing light field descriptor method, and the method achieves a 15-fold speedup in the feature extraction computation time.
Abstract: This paper presents a method for selecting salient 2D views to describe 3D objects for the purpose of retrieval. The views are obtained by first identifying salient points via a learning approach that uses shape characteristics of the 3D points (Atmosukarto and Shapiro in International workshop on structural, syntactic, and statistical pattern recognition, 2008; Atmosukarto and Shapiro in ACM multimedia information retrieval, 2008). The salient views are selected by choosing views with multiple salient points on the silhouette of the object. Silhouette-based similarity measures from Chen et al. (Comput Graph Forum 22(3):223–232, 2003) are then used to calculate the similarity between two 3D objects. Retrieval experiments were performed on three datasets: the Heads dataset, the SHREC2008 dataset, and the Princeton dataset. Experimental results show that the retrieval results using the salient views are comparable to the existing light field descriptor method (Chen et al. in Comput Graph Forum 22(3):223–232, 2003), and our method achieves a 15-fold speedup in the feature extraction computation time.

Proceedings ArticleDOI
19 Jul 2013
TL;DR: An intuitive interface for easy modeling of terrains, compatible with an example-based synthesis approach, using Digital Elevation Models of real-world terrains as source data and a weighted-sum function to continuously combine the heights.
Abstract: We present an intuitive interface for easy modeling of terrains, compatible with an example-based synthesis approach. The interface consists of a canvas-like screen on which the user sketches silhouettes of mountains, as one would when drawing mountains on a piece of paper. Realistic results are achieved by combining copies of the example terrain so as to match the sketched silhouette. We use Digital Elevation Models (DEMs) of real-world terrains as source data, and a weighted-sum function to continuously combine the heights.
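The weighted-sum height combination can be sketched as follows (Python; the patches are hypothetical aligned DEM height grids, and in the paper the weights would come from how well each copy matches the sketched silhouette):

```python
def blend_heights(patches, weights):
    """Weighted-sum combination of example-terrain height patches, so the
    synthesised terrain transitions continuously between terrain copies
    matched to different parts of the sketched silhouette."""
    total = sum(weights)
    h, w = len(patches[0]), len(patches[0][0])
    return [[sum(wt * p[y][x] for wt, p in zip(weights, patches)) / total
             for x in range(w)] for y in range(h)]
```

Varying the weights smoothly across the canvas avoids seams where different example copies meet.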

Patent
25 Nov 2013
TL;DR: In this article, the authors propose a method for extracting a silhouette of a physical object from the captured image and mapping it over a 3D geometry, incorporating said virtual object as one more element in the virtual scene, and orienting the virtual object with regard to the virtual camera.
Abstract: The method comprises capturing, by at least one camera, an image of a physical object against a background; extracting a silhouette of said physical object from the captured image and mapping it over a three-dimensional geometry; incorporating said virtual object as one more element in the virtual scene; and orienting said virtual object with regard to the virtual camera. Embodiments of the method further comprise obtaining and using intrinsic and/or extrinsic parameters of said physical camera and said captured image to calculate said physical object's position; projecting back said captured image over the three-dimensional geometry using said intrinsic and/or extrinsic parameters; and placing the virtual object in the virtual scene and selecting an axis of rotation to orient the virtual object with regard to the virtual camera based on said calculated position of the physical object.