
Showing papers by "Matthew Turk published in 2016"


Journal ArticleDOI
01 Nov 2016
TL;DR: The proposed 2D K-L divergence improves the accuracy of image segmentation; the MPSO overcomes the drawback of premature convergence of PSO by improving the location update formula and the global best position of particles, and drastically reduces the time complexity of multilevel thresholding segmentation.
Abstract: Highlights: (1) We propose the 2D K-L divergence for multilevel image segmentation and derive its formulation as an objective function for multilevel image segmentation. (2) We propose MPSO, which modifies the location update formula and the global best position of particles to overcome the premature convergence of PSO. (3) We propose a scheme that uses the 2D K-L divergence as the fitness function of MPSO, which improves the effectiveness of the segmentation and reduces the time complexity. Multilevel image segmentation is a technique that divides images into multiple homogeneous regions. In order to improve the effectiveness and efficiency of multilevel image thresholding segmentation, we propose a segmentation algorithm based on two-dimensional (2D) Kullback-Leibler (K-L) divergence and modified Particle Swarm Optimization (MPSO). This approach calculates the 2D K-L divergence between an image and its segmented result by adopting the 2D histogram as the distribution function, then employs the sum of the divergences of the different regions as the fitness function of MPSO to seek the optimal thresholds. The proposed 2D K-L divergence improves the accuracy of image segmentation; the MPSO overcomes the drawback of premature convergence of PSO by improving the location update formula and the global best position of particles, and drastically reduces the time complexity of multilevel thresholding segmentation. Experiments were conducted extensively on the Berkeley Segmentation Dataset and Benchmark (BSDS300), and four image segmentation performance indices - BDE, PRI, GCE and VOI - were evaluated. The results show the robustness and effectiveness of the proposed algorithm.
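The abstract's fitness function, a sum of 2D K-L divergences computed over thresholded regions from a 2D histogram, can be sketched in a few lines. The following is a minimal, hypothetical Python illustration only: the function name kl_fitness, the (gray value, local mean) joint histogram, the 3x3 averaging window, and the whole-image (rather than per-region) divergence are simplifying assumptions, not the paper's exact formulation.

    import numpy as np
    from scipy.ndimage import uniform_filter

    def kl_fitness(image, thresholds, bins=256, eps=1e-12):
        """Toy fitness: 2D K-L divergence between the joint (gray value,
        local mean) histogram of the original image and that of a
        piecewise-constant segmentation induced by `thresholds`."""
        image = image.astype(float)
        local_mean = uniform_filter(image, size=3)
        P, _, _ = np.histogram2d(image.ravel(), local_mean.ravel(),
                                 bins=bins, range=[[0, 256], [0, 256]],
                                 density=True)
        # Replace each thresholded region by its mean gray level.
        seg = np.zeros_like(image)
        edges = [0] + sorted(thresholds) + [256]
        for lo, hi in zip(edges[:-1], edges[1:]):
            mask = (image >= lo) & (image < hi)
            if mask.any():
                seg[mask] = image[mask].mean()
        Q, _, _ = np.histogram2d(seg.ravel(), uniform_filter(seg, size=3).ravel(),
                                 bins=bins, range=[[0, 256], [0, 256]],
                                 density=True)
        P, Q = P + eps, Q + eps
        return float(np.sum(P * np.log(P / Q)))

An MPSO (or any other optimizer) would then search for the threshold vector that optimizes a fitness of this kind.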

63 citations


Proceedings ArticleDOI
01 Oct 2016
TL;DR: The distributed camera model is introduced, a novel model for Structure-from-Motion (SfM) that describes image observations in terms of light rays with ray origins and directions rather than pixels and computes a solution that is up to 8 times more efficient and robust to rotation singularities in comparison with gDLS.
Abstract: We introduce the distributed camera model, a novel model for Structure-from-Motion (SfM). This model describes image observations in terms of light rays with ray origins and directions rather than pixels. As such, the proposed model is capable of describing a single camera or multiple cameras simultaneously as the collection of all light rays observed. We show how the distributed camera model is a generalization of the standard camera model, and we describe a general formulation and solution to the absolute camera pose problem that works for standard or distributed cameras. The proposed method computes a solution that is up to 8 times more efficient than gDLS [21] and robust to rotation singularities. Finally, this method is used in a novel large-scale incremental SfM pipeline where distributed cameras are accurately and robustly merged together. This pipeline is a direct generalization of traditional incremental SfM; however, instead of incrementally adding one camera at a time, the reconstruction is grown by adding a distributed camera. Our pipeline produces highly accurate reconstructions efficiently by avoiding the need for many bundle adjustment iterations, and it is capable of computing a 3D model of Rome from over 15,000 images in just 22 minutes.
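As a hedged illustration of the ray-based observation model (not the paper's actual minimal solver), the sketch below measures how far transformed 3D points fall from their observation rays; an absolute pose solver for a distributed camera, such as gDLS or the proposed generalization, minimizes residuals of this kind. The function name ray_residuals and the plain point-to-ray distance are illustrative assumptions.

    import numpy as np

    def ray_residuals(R, t, points_3d, ray_origins, ray_dirs):
        """Distance from each transformed 3D point to its observation ray.
        In the distributed camera model an observation is a ray (origin,
        direction) rather than a pixel; a correct pose (R, t) maps every
        world point onto its ray, driving these distances to zero."""
        X = points_3d @ R.T + t                            # world points into the rig frame
        V = X - ray_origins                                # ray origin -> point vectors
        d = ray_dirs / np.linalg.norm(ray_dirs, axis=1, keepdims=True)
        proj = np.sum(V * d, axis=1, keepdims=True) * d    # components along the rays
        return np.linalg.norm(V - proj, axis=1)            # point-to-ray distances

A standard single camera is the special case in which all ray origins coincide at the camera center.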

44 citations


Patent
03 Jun 2016
TL;DR: In this paper, the authors present systems, methods, devices, and software for an augmented shared visual space for live mobile remote collaboration on physical tasks, where remote participants in location A can explore a scene in location B independently of the local participants' current camera position, and communicate via spatial annotations that are immediately visible to all other participants in augmented reality.
Abstract: Various embodiments each include at least one of systems, methods, devices, and software for an augmented shared visual space for live mobile remote collaboration on physical tasks. One or more participants in location A can explore a scene in location B independently of one or more local participants' current camera position in location B, and can communicate via spatial annotations that are immediately visible to all other participants in augmented reality.

37 citations


Proceedings ArticleDOI
19 Mar 2016
TL;DR: By first classifying which type of gesture the user drew, it is shown that it is possible to render the 2D annotations in 3D in a way that conforms more to the original intention of the user than with traditional methods.
Abstract: A 2D gesture annotation provides a simple way to annotate the physical world in augmented reality for a range of applications such as remote collaboration. When rendered from novel viewpoints, these annotations have previously only worked with statically positioned cameras or planar scenes. However, if the camera moves and is observing an arbitrary environment, 2D gesture annotations can easily lose their meaning when shown from novel viewpoints due to perspective effects. In this paper, we present a new approach towards solving this problem by using gesture-enhanced annotation interpretation. By first classifying which type of gesture the user drew, we show that it is possible to render the 2D annotations in 3D in a way that conforms more closely to the original intention of the user than with traditional methods. We first determined a generic vocabulary of important 2D gestures for an augmented reality-enhanced remote collaboration scenario by running an Amazon Mechanical Turk study with 88 participants. Next, we designed a novel real-time method to automatically handle the two most common 2D gesture annotations, arrows and circles, and give a detailed analysis of the ambiguities that must be handled in each case. Arrow gestures are interpreted by identifying their anchor points and using scene surface normals for better perspective rendering. For circle gestures, we designed a novel energy function to help infer the object of interest using both 2D image cues and 3D geometric cues. Results indicate that our method outperforms previous approaches by better conveying the meaning of the original drawing from different viewpoints.
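The circle-gesture energy function combines 2D image cues with 3D geometric cues; the paper's exact terms are not given here, so the sketch below is a hypothetical stand-in only. The function circle_energy, its IoU and 3D-spread terms, and the weights w2d and w3d are illustrative assumptions, not the published formulation.

    import numpy as np

    def circle_energy(candidate_mask, circle_mask, candidate_points,
                      w2d=1.0, w3d=1.0):
        """Score a candidate object against a circled 2D gesture.
        2D cue: overlap (IoU) between the candidate's image mask and the
        circled region. 3D cue: spatial spread of the candidate's 3D points.
        Lower energy means a better candidate."""
        inter = np.logical_and(candidate_mask, circle_mask).sum()
        union = np.logical_or(candidate_mask, circle_mask).sum()
        iou = inter / union if union > 0 else 0.0
        spread = candidate_points.std(axis=0).mean()
        return w2d * (1.0 - iou) + w3d * spread

The candidate with the lowest energy would then be taken as the object of interest and used to re-render the circle from new viewpoints.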

32 citations


Proceedings ArticleDOI
02 Nov 2016
TL;DR: A novel 2D gesture annotation method for use in image-based 3D reconstructed scenes with applications in collaborative virtual and augmented reality and its use in a synchronous local-remote user collaboration system is highlighted.
Abstract: We present a novel 2D gesture annotation method for use in image-based 3D reconstructed scenes with applications in collaborative virtual and augmented reality. Image-based reconstructions allow users to virtually explore a remote environment using image-based rendering techniques. To collaborate with other users, either synchronously or asynchronously, simple 2D gesture annotations can be used to convey spatial information to another user. Unfortunately, prior methods are either unable to disambiguate such 2D annotations in 3D from novel viewpoints or require relatively dense reconstructions of the environment. In this paper, we propose a simple multi-view annotation method that is useful in a variety of scenarios and applicable to both very sparse and dense 3D reconstructions. Specifically, we employ interactive disambiguation of the 2D gestures via a second annotation drawn from another viewpoint, triangulating two drawings to achieve a 3D result. Our method automatically chooses an appropriate second viewpoint and uses image-based rendering transitions to keep the user oriented while moving to the second viewpoint. User experiments in an asynchronous collaboration scenario demonstrate the usability of the method and its superiority over a baseline method. In addition, we showcase our method running on a variety of image-based reconstruction datasets and highlight its use in a synchronous local-remote user collaboration system.
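The core triangulation step, lifting the two 2D drawings to a single 3D annotation, can be illustrated with standard two-view linear (DLT) triangulation. The sketch below assumes known 3x4 projection matrices and point-to-point correspondence between the two drawn strokes; those assumptions, and the function name triangulate_annotation, are illustrative and do not reproduce the paper's viewpoint-selection or rendering details.

    import numpy as np

    def triangulate_annotation(P1, P2, pts1, pts2):
        """Triangulate corresponding annotation points drawn from two
        viewpoints. P1, P2: 3x4 projection matrices; pts1, pts2: Nx2
        pixel coordinates. Returns Nx3 points in world coordinates."""
        out = []
        for (x1, y1), (x2, y2) in zip(pts1, pts2):
            A = np.stack([x1 * P1[2] - P1[0],
                          y1 * P1[2] - P1[1],
                          x2 * P2[2] - P2[0],
                          y2 * P2[2] - P2[1]])
            _, _, Vt = np.linalg.svd(A)      # smallest singular vector solves AX = 0
            X = Vt[-1]
            out.append(X[:3] / X[3])         # dehomogenize
        return np.asarray(out)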

20 citations


Proceedings ArticleDOI
02 Nov 2016
TL;DR: A new optical design for head-mounted displays (HMD) which has an exceptionally wide field of view (FOV) based on seamless lenses and screens curved around the eyes is presented, suggesting a feasible way to significantly expand the FOV of HMDs.
Abstract: We present a new optical design for head-mounted displays (HMD) which has an exceptionally wide field of view (FOV); it can even cover the full human FOV. It is based on seamless lenses and screens curved around the eyes. The proof-of-concept prototypes are promising, and one of them far exceeds the human FOV, although the effective FOV is limited by the anatomy of the human head. The presented optical design has advantages such as compactness, light weight, low cost and super-wide FOV with high resolution. Even though this is still work in progress and display functionality is not yet implemented, it suggests a feasible way to significantly expand the FOV of HMDs.

11 citations


Proceedings ArticleDOI
07 Oct 2016
TL;DR: A new optical design for head-mounted displays (HMD) that has an exceptionally wide field of view (FOV) based on seamless lenses and screens curved around the eyes is presented, suggesting a feasible way to significantly expand the FOV of HMDs.
Abstract: We present a new optical design for head-mounted displays (HMD) that has an exceptionally wide field of view (FOV); it can even cover the full human FOV. It is based on seamless lenses and screens curved around the eyes. We constructed several compact and lightweight proof-of-concept prototypes of the optical design. One of them far exceeds the human FOV, although the anatomy of the human head limits the effective FOV. The presented optical design has advantages such as compactness, light weight, low cost and super-wide FOV with high resolution. The prototypes are promising, and though this is still work in progress and display functionality is not yet implemented, it suggests a feasible way to significantly expand the FOV of HMDs.

9 citations


Posted Content
TL;DR: The distributed camera model as mentioned in this paper is a generalization of the standard camera model and describes a general formulation and solution to the absolute camera pose problem that works for standard or distributed cameras.
Abstract: We introduce the distributed camera model, a novel model for Structure-from-Motion (SfM). This model describes image observations in terms of light rays with ray origins and directions rather than pixels. As such, the proposed model is capable of describing a single camera or multiple cameras simultaneously as the collection of all light rays observed. We show how the distributed camera model is a generalization of the standard camera model and describe a general formulation and solution to the absolute camera pose problem that works for standard or distributed cameras. The proposed method computes a solution that is up to 8 times more efficient than gDLS and robust to rotation singularities. Finally, this method is used in a novel large-scale incremental SfM pipeline where distributed cameras are accurately and robustly merged together. This pipeline is a direct generalization of traditional incremental SfM; however, instead of incrementally adding one camera at a time, the reconstruction is grown by adding a distributed camera. Our pipeline produces highly accurate reconstructions efficiently by avoiding the need for many bundle adjustment iterations and is capable of computing a 3D model of Rome from over 15,000 images in just 22 minutes.

7 citations


Proceedings ArticleDOI
01 Sep 2016
TL;DR: A novel segmentation algorithm that utilizes both 2D and 3D scene cues, structured into a three-layer graph of pixels, 3D points, and volumes (supervoxels), solved via standard graph cut algorithms is proposed.
Abstract: We present a method for collaborative augmented reality (AR) that enables users from different viewpoints to interpret object references specified via 2D on-screen circling gestures. Based on a user's 2D drawing annotation, the method segments out the user-selected object using an incomplete or imperfect scene model and the color image from the drawing viewpoint. Specifically, we propose a novel segmentation algorithm that utilizes both 2D and 3D scene cues, structured into a three-layer graph of pixels, 3D points, and volumes (supervoxels), solved via standard graph cut algorithms. This segmentation enables an appropriate rendering of the user's 2D annotation from other viewpoints in 3D augmented reality. Results demonstrate the superiority of the proposed method over existing methods.
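The three-layer graph described above can be emulated with a generic s-t minimum cut. The sketch below is only a schematic stand-in: the node layers, the uniform inter_layer_w coupling, and the per-pixel unary costs are illustrative assumptions, and networkx's general-purpose minimum_cut is used in place of the dedicated graph cut solvers a real implementation would use.

    import networkx as nx

    def three_layer_cut(pixel_unaries, pixel_to_point, point_to_voxel,
                        inter_layer_w=1.0):
        """Toy three-layer graph cut. Nodes: pixels, 3D points, supervoxels.
        Inter-layer edges tie a pixel to its 3D point and a point to its
        supervoxel; terminal capacities come from per-pixel unary costs
        (e.g. derived from the circled gesture). Returns the foreground nodes."""
        G = nx.DiGraph()
        S, T = "source", "sink"
        for pix, (fg_cost, bg_cost) in pixel_unaries.items():
            G.add_edge(S, ("pix", pix), capacity=bg_cost)  # paid if pixel labeled background
            G.add_edge(("pix", pix), T, capacity=fg_cost)  # paid if pixel labeled foreground
        for pix, pt in pixel_to_point.items():
            G.add_edge(("pix", pix), ("pt", pt), capacity=inter_layer_w)
            G.add_edge(("pt", pt), ("pix", pix), capacity=inter_layer_w)
        for pt, vox in point_to_voxel.items():
            G.add_edge(("pt", pt), ("vox", vox), capacity=inter_layer_w)
            G.add_edge(("vox", vox), ("pt", pt), capacity=inter_layer_w)
        _, (fg_side, _) = nx.minimum_cut(G, S, T)
        return fg_side - {S}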

6 citations


Proceedings ArticleDOI
01 Dec 2016
TL;DR: The proposed strategy reduces the false positive rate and increases the accuracy of detecting instances from novel classes and uses two parallel hyperplanes to learn the normal region of the decision scores of the target class.
Abstract: This work introduces the one-class slab SVM (OCSSVM), a one-class classifier that aims at improving the performance of the one-class SVM. The proposed strategy reduces the false positive rate and increases the accuracy of detecting instances from novel classes. To this end, it uses two parallel hyperplanes to learn the normal region of the decision scores of the target class. OCSSVM extends the one-class SVM, since it can scale and learn non-linear decision functions via kernel methods. The experiments on two publicly available datasets show that OCSSVM can consistently outperform the one-class SVM and perform comparably to or better than other state-of-the-art one-class classifiers.
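The slab idea, bounding the target class's decision scores between two parallel hyperplanes, can be crudely approximated on top of a standard one-class SVM. The sketch below illustrates the intuition only and is not the paper's joint optimization: the class name SlabApprox, the quantile-based bounds, and the parameter q are assumptions.

    import numpy as np
    from sklearn.svm import OneClassSVM

    class SlabApprox:
        """Rough approximation of the slab idea: bound the one-class SVM
        decision scores of the target class from both sides. This is NOT
        the OCSSVM optimization from the paper."""
        def __init__(self, q=0.05, **svm_kwargs):
            self.q = q
            self.svm = OneClassSVM(**svm_kwargs)

        def fit(self, X):
            self.svm.fit(X)
            scores = self.svm.decision_function(X)
            self.lo_, self.hi_ = np.quantile(scores, [self.q, 1.0 - self.q])
            return self

        def predict(self, X):
            scores = self.svm.decision_function(X)
            # +1 inside the slab (target class), -1 outside (novel class).
            return np.where((scores >= self.lo_) & (scores <= self.hi_), 1, -1)

Rejecting unusually high scores as well as unusually low ones is what distinguishes the slab from the single decision boundary of the ordinary one-class SVM.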

6 citations


Posted Content
TL;DR: One-class slab SVM (OCSSVM) as mentioned in this paper extends one-class SVM by using two parallel hyperplanes to learn the normal region of the decision scores of the target class.
Abstract: This work introduces the one-class slab SVM (OCSSVM), a one-class classifier that aims at improving the performance of the one-class SVM. The proposed strategy reduces the false positive rate and increases the accuracy of detecting instances from novel classes. To this end, it uses two parallel hyperplanes to learn the normal region of the decision scores of the target class. OCSSVM extends the one-class SVM, since it can scale and learn non-linear decision functions via kernel methods. The experiments on two publicly available datasets show that OCSSVM can consistently outperform the one-class SVM and perform comparably to or better than other state-of-the-art one-class classifiers.

Proceedings ArticleDOI
17 Oct 2016
TL;DR: This paper presents novel designs for casual (short-term) immersive viewing of spatial and 3D content, such as augmented and virtual reality, with smartphones, to create a simple and low-cost casual-viewing design which could be retrofitted and eventually be embedded into smartphones, instead of using larger spatial viewing accessories.
Abstract: In this paper, we explore how to better integrate virtual reality viewing into a smartphone. We present novel designs for casual (short-term) immersive viewing of spatial and 3D content, such as augmented and virtual reality, with smartphones. Our goal is to create a simple and low-cost casual-viewing design which could be retrofitted and eventually be embedded into smartphones, instead of using larger spatial viewing accessories. We explored different designs and implemented several prototypes. One prototype uses thin and light near-to-eye optics with a smartphone display, thus providing the user with the functionality of a large, high-resolution virtual display. Our designs also enable 3D user interfaces. Easy interaction through various gestures and other modalities is possible by using the smartphone's inertial and other sensors as well as its camera. Our preliminary concepts are a starting point for exploring useful constructions and designs for such usage.

Proceedings ArticleDOI
19 Mar 2016
TL;DR: By first classifying which type of gesture the user drew, it is shown that it is possible to render annotations in 3D in a way that conforms more to the original intention of the user than with traditional methods.
Abstract: Augmented reality-enhanced collaboration systems often allow users to draw 2D gesture annotations onto video feeds to help collaborators complete physical tasks. This works well for static cameras, but for movable cameras, perspective effects cause problems when trying to render 2D annotations from a new viewpoint in 3D. In this paper, we present a new approach towards solving this problem by using gesture-enhanced annotations. By first classifying which type of gesture the user drew, we show that it is possible to render annotations in 3D in a way that conforms more closely to the original intention of the user than with traditional methods. We first determined a generic vocabulary of important 2D gestures for remote collaboration by running an Amazon Mechanical Turk study with 88 participants. Next, we designed a novel system to automatically handle the top two 2D gesture annotations, arrows and circles. Arrows are handled by identifying their anchor points and using surface normals for better perspective rendering. For circles, we designed a novel energy function to help infer the object of interest using both 2D image cues and 3D geometric cues. Results indicate that our approach outperforms previous methods in terms of better conveying the original drawing's meaning from different viewpoints.