
Showing papers by "Matthew Turk published in 2015"


Proceedings ArticleDOI
07 Dec 2015
TL;DR: This paper presents a novel optimization that improves the quality of the relative geometries in the viewing graph by enforcing loop consistency constraints with the epipolar point transfer and removes the need for filtering steps or robust algorithms typically used in global SfM methods.
Abstract: The viewing graph represents a set of views that are related by pairwise relative geometries. In the context of Structure-from-Motion (SfM), the viewing graph is the input to the incremental or global estimation pipeline. Much effort has been put towards developing robust algorithms to overcome potentially inaccurate relative geometries in the viewing graph during SfM. In this paper, we take a fundamentally different approach to SfM and instead focus on improving the quality of the viewing graph before applying SfM. Our main contribution is a novel optimization that improves the quality of the relative geometries in the viewing graph by enforcing loop consistency constraints with the epipolar point transfer. We show that this optimization greatly improves the accuracy of relative poses in the viewing graph and removes the need for filtering steps or robust algorithms typically used in global SfM methods. In addition, the optimized viewing graph can be used to efficiently calibrate cameras at scale. We combine our viewing graph optimization and focal length calibration into a global SfM pipeline that is more efficient than existing approaches. To our knowledge, ours is the first global SfM pipeline capable of handling uncalibrated image sets.
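
The epipolar point transfer used in this optimization is a standard construction from two-view geometry (see Hartley and Zisserman); the formulation below is a paraphrase of the loop-consistency idea, not the paper's exact objective. With fundamental matrices F_ji satisfying x_j^T F_ji x_i = 0, a correspondence (x_1, x_2) in views 1 and 2 transfers into a third view as the intersection of two epipolar lines:

```latex
x_3 \sim (F_{31} x_1) \times (F_{32} x_2)
```

Loop consistency over an image triplet then requires the transferred point to agree with the observed x_3, and the optimization adjusts the relative geometries to reduce this disagreement around every loop in the viewing graph.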

124 citations


Proceedings ArticleDOI
13 Oct 2015
TL;DR: A comprehensive multi-view geometry library, Theia, that focuses on large-scale SfM pipelines, provides clean, well-documented code that is easy to extend, and has attracted active contributors from the open-source community.
Abstract: In this paper, we have presented a comprehensive multi-view geometry library, Theia, that focuses on large-scale SfM. In addition to state-of-the-art scalable SfM pipelines, the library provides numerous tools that are useful for students, researchers, and industry experts in the field of multi-view geometry. Theia contains clean code that is well documented (with code comments and the website) and easy to extend. The modular design allows users to easily implement and experiment with new algorithms within our current pipeline without having to implement a full end-to-end SfM pipeline themselves. Theia has already gathered a large number of diverse users from universities, startups, and industry, and we hope to continue to gather users and active contributors from the open-source community.

81 citations


Proceedings ArticleDOI
29 Sep 2015
TL;DR: This work proposes a novel formulation for determining the absolute pose of a single or multi-camera system given a known vertical direction and shows its improved robustness to image and IMU noise compared to the current state of the art.
Abstract: We propose a novel formulation for determining the absolute pose of a single or multi-camera system given a known vertical direction. The vertical direction may be easily obtained by detecting the vertical vanishing points with computer vision techniques, or with the aid of IMU sensor measurements from a smartphone. Our solver is general and able to compute absolute camera pose from two 2D-3D correspondences for single or multi-camera systems. We run several synthetic experiments that demonstrate our algorithm's improved robustness to image and IMU noise compared to the current state of the art. Additionally, we run an image localization experiment that demonstrates the accuracy of our algorithm in real-world scenarios. Finally, we show that our algorithm provides increased performance for real-time model-based tracking compared to solvers that do not utilize the vertical direction, and we demonstrate our algorithm in an augmented reality application running on a Google Tango tablet.
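
To see why two 2D-3D correspondences suffice once the vertical is known, consider the standard parameterization below (an illustration of the constraint count, not necessarily the paper's exact solver). Aligning the measured gravity direction with the world y-axis leaves only a rotation about y unknown:

```latex
\lambda_k \hat{x}_k = R_y(\theta) X_k + t, \qquad
R_y(\theta) =
\begin{pmatrix}
\cos\theta & 0 & \sin\theta \\
0 & 1 & 0 \\
-\sin\theta & 0 & \cos\theta
\end{pmatrix}
```

Each correspondence contributes two equations after its depth \lambda_k is eliminated, so two correspondences give four equations in the four remaining unknowns (\theta and the translation t), with \cos^2\theta + \sin^2\theta = 1 enforced as a quadratic constraint.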

48 citations


Proceedings ArticleDOI
18 Mar 2015
TL;DR: A novel system that can spatio-temporally detect divided attention in users during two different reading applications: typical document reading and speed reading is presented.
Abstract: Reading is central to learning and communicating; however, divided attention in the form of distraction may be present in learning environments, resulting in a limited understanding of the reading material. This paper presents a novel system that can spatio-temporally detect divided attention in users during two different reading applications: typical document reading and speed reading. Eye tracking and electroencephalography (EEG) monitor the user during reading and provide a classifier with data to decide the user's attention state. The multimodal data informs the system where the user was distracted spatially in the user interface and when the user was distracted. Classification was evaluated with two exploratory experiments. The first experiment was designed to divide the user's attention with a multitasking scenario. The second experiment was designed to divide the user's attention by simulating a real-world scenario where the reader is interrupted by unpredictable audio distractions. Results from both experiments show that divided attention may be detected spatio-temporally well above chance on a single-trial basis.
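
The abstract does not specify the classifier, so the following is only a minimal sketch of the kind of feature-level fusion described: EEG features concatenated with gaze features, classified per trial and evaluated against chance. The feature choices, dimensions, and the use of LDA are all assumptions for illustration, with synthetic data standing in for recordings.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Hypothetical per-trial features (the paper's exact features are not given):
n_trials = 200
eeg_power = rng.normal(size=(n_trials, 8))   # e.g., band power per channel
gaze_stats = rng.normal(size=(n_trials, 4))  # e.g., fixation duration, saccade rate
X = np.hstack([eeg_power, gaze_stats])       # early (feature-level) fusion
y = rng.integers(0, 2, size=n_trials)        # 0 = attentive, 1 = distracted

# Single-trial classification, evaluated with cross-validation.
clf = LinearDiscriminantAnalysis()
scores = cross_val_score(clf, X, y, cv=5)
print("accuracy: %.2f +/- %.2f (chance = 0.50)" % (scores.mean(), scores.std()))
```

With the random labels above the score hovers at chance; the paper's claim is that real EEG and gaze features push single-trial accuracy well above that level.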

30 citations


Proceedings ArticleDOI
07 Jun 2015
TL;DR: This work utilizes a known vertical direction of the generalized cameras to solve the generalized relative pose and scale problem as an efficient Quadratic Eigenvalue Problem, yielding the first method for computing similarity transformations that does not require any 3D information.
Abstract: We propose a novel solution for computing the relative pose between two generalized cameras that includes reconciling the internal scale of the generalized cameras. This approach can be used to compute a similarity transformation between two coordinate systems, making it useful for loop closure in visual odometry and registering multiple structure from motion reconstructions together. In contrast to alternative similarity transformation methods, our approach uses 2D-2D image correspondences and thus is not subject to the depth uncertainty that often arises with 3D points. We utilize a known vertical direction (which may be easily obtained from IMU data or vertical vanishing point detection) of the generalized cameras to solve the generalized relative pose and scale problem as an efficient Quadratic Eigenvalue Problem. To our knowledge, this is the first method for computing similarity transformations that does not require any 3D information. Our experiments on synthetic and real data demonstrate that this leads to improved performance compared to methods that use 3D-3D or 2D-3D correspondences, especially as the depth of the scene increases.
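
For readers unfamiliar with the tool named here: a Quadratic Eigenvalue Problem (lam^2 A + lam B + C) v = 0 reduces to an ordinary generalized eigenvalue problem by companion linearization. The sketch below shows that reduction for arbitrary dense matrices; the specific A, B, C arising from the relative-pose-and-scale constraints are derived in the paper.

```python
import numpy as np
from scipy.linalg import eig

def solve_qep(A, B, C):
    """Solve (lam^2 A + lam B + C) v = 0 via companion linearization.

    Stacks z = [lam*v; v] so the QEP becomes M1 z = lam M2 z.
    """
    n = A.shape[0]
    Z, I = np.zeros((n, n)), np.eye(n)
    M1 = np.block([[-B, -C],
                   [I,  Z]])
    M2 = np.block([[A, Z],
                   [Z, I]])
    lam, z = eig(M1, M2)
    return lam, z[n:]  # the bottom half of each z is v

# Sanity check on a random instance (not the paper's matrices):
rng = np.random.default_rng(1)
A, B, C = rng.normal(size=(3, 3, 3))
lam, V = solve_qep(A, B, C)
residual = (lam[0] ** 2 * A + lam[0] * B + C) @ V[:, 0]
print(np.linalg.norm(residual))  # ~0, up to numerical error
```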

26 citations


Book ChapterDOI
01 Jan 2015
TL;DR: This chapter discusses the use of computer vision for mobile augmented reality and presents work on a vision-based AR application (mobile sign detection and translation), a vision-supplied AR resource (indoor localization and pose estimation), and a low-level correspondence tracking and model estimation approach to increase the accuracy and efficiency of computer vision methods in augmented reality.
Abstract: Mobile augmented reality (AR) employs computer vision capabilities in order to properly integrate the real and the virtual, whether that integration involves the user’s location, object-based interaction, 2D or 3D annotations, or precise alignment of image overlays. Real-time vision technologies vital for the AR context include tracking, object and scene recognition, localization, and scene model construction. For mobile AR, which has limited computational resources compared with static computing environments, efficient processing is critical, as are consideration of power consumption (i.e., battery life), processing and memory limitations, lag, and the processing and display requirements of the foreground application. On the other hand, additional sensors (such as gyroscopes, accelerometers, and magnetometers) are typically available in the mobile context, and, unlike many traditional computer vision applications, user interaction is often available for user feedback and disambiguation. In this chapter, we discuss the use of computer vision for mobile augmented reality and present work on a vision-based AR application (mobile sign detection and translation), a vision-supplied AR resource (indoor localization and pose estimation), and a low-level correspondence tracking and model estimation approach to increase accuracy and efficiency of computer vision methods in augmented reality.

15 citations


Proceedings ArticleDOI
19 Oct 2015
TL;DR: This paper proposes an algorithm based on high-order two-view co-labeling to simultaneously retarget a given stereo pair and preserve its 2D as well as 3D quality.
Abstract: The major objective of image retargeting algorithms is to preserve the viewer's perception while adjusting the aspect ratio of an image. This means that an ideal retargeting algorithm has to be able to preserve high-level semantics and avoid generating low-level image distortion. Stereoscopic image retargeting poses an even more challenging problem in that the 3D perception has to be preserved as well. In this paper, we propose an algorithm based on high-order two-view co-labeling to simultaneously retarget a given stereo pair and preserve its 2D as well as 3D quality. Our experimental results qualitatively demonstrate the improved ability of preserving 2D image structures in both views. In addition, we show quantitatively that our algorithm improves upon the state of the art by up to 85% in terms of a measurement based on depth distortion.
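
As a rough illustration of what preserving "2D as well as 3D quality" means in energy terms (a paraphrase; the paper's actual high-order formulation is richer), a two-view co-labeling objective couples per-view 2D distortion costs with a cross-view consistency term:

```latex
E(L_l, L_r) = E_{2D}(L_l) + E_{2D}(L_r) + \lambda \, E_{3D}(L_l, L_r)
```

Here L_l and L_r are the pixel labelings of the left and right views (e.g., which pixels are removed), and E_3D penalizes label choices that alter the disparity between corresponding pixels, which is exactly what would distort depth perception.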

7 citations


Proceedings Article
01 Jan 2015
TL;DR: An algorithm is presented to segment the selected object, including its occluded surfaces, such that the 2D selection can be appropriately interpreted in 3D and rendered as a useful AR annotation even when the local user moves and significantly changes the viewpoint.
Abstract: In Augmented Reality (AR) based remote collaboration, a remote user can draw a 2D annotation that emphasizes an object of interest to guide a local user accomplishing a task. This annotation is typically performed only once and then sticks to the selected object in the local user's view, independent of his or her camera movement. In this paper, we present an algorithm to segment the selected object, including its occluded surfaces, such that the 2D selection can be appropriately interpreted in 3D and rendered as a useful AR annotation even when the local user moves and significantly changes the viewpoint.

6 citations


Proceedings ArticleDOI
28 Feb 2015
TL;DR: This study investigates ways to motivate local crowds to serve as the world's sensors and provide geographical data about their surroundings, using interviews and a pilot study to understand whether people can be motivated to contribute data about their neighborhoods via games or for the greater social good of helping the neighborhood.
Abstract: Organizations invest resources to gather geographical information about cities or neighborhoods. This can help governments or companies identify needed services or city improvements. However, collecting this information can be difficult and expensive. In this study we investigate ways to motivate local crowds to serve as the world's sensors and provide geographical data about their surroundings. We conduct interviews and a pilot study to understand whether we can motivate people to contribute data about their neighborhoods via games or for the greater social good of helping the neighborhood. Our results provide a glimpse of how people feel about donating neighborhood data given different motivators; they also provide insight into the amount of data people are willing to contribute. We conclude by discussing possible design implications of our findings.

6 citations


Book ChapterDOI
14 Dec 2015
TL;DR: This work implemented a gaze correction system that can automatically maintain eye contact by replacing the eyes of the user with direct-looking eyes (looking directly into the camera) captured in the initialization stage.
Abstract: In traditional video conferencing systems, it is impossible for users to have eye contact when looking at the conversation partner’s face displayed on the screen, due to the disparity between the locations of the camera and the screen. In this work, we implemented a gaze correction system that can automatically maintain eye contact by replacing the eyes of the user with direct-looking eyes (looking directly into the camera) captured in the initialization stage. Our real-time system is robust to different lighting conditions and head poses, and it provides visually convincing and natural results while relying only on a single webcam that can be positioned almost anywhere around the screen.
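
The abstract does not detail the pipeline, so the following is a minimal sketch of the eye-replacement idea using OpenCV's stock Haar eye detector and Poisson blending. The patch file name, the crude per-eye resize, and the lack of border handling are all simplifications for illustration:

```python
import cv2
import numpy as np

# Direct-looking eye patch captured once during initialization (assumed file).
eye_patch = cv2.imread("direct_eye.png")

eye_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_eye.xml")

def correct_gaze(frame):
    """Replace each detected eye region with the stored direct-looking patch."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in eye_cascade.detectMultiScale(gray, 1.1, 5):
        src = cv2.resize(eye_patch, (w, h))
        mask = 255 * np.ones(src.shape[:2], dtype=np.uint8)
        center = (x + w // 2, y + h // 2)
        # Poisson blending keeps the inserted patch consistent with
        # the surrounding skin tone and lighting.
        frame = cv2.seamlessClone(src, frame, mask, center, cv2.NORMAL_CLONE)
    return frame
```

In a video-conferencing loop, correct_gaze would be applied to every webcam frame; the paper's system additionally handles the head-pose and lighting variation that this sketch ignores.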

6 citations


Journal ArticleDOI
TL;DR: This editorial introduction complements the shorter introduction to the first part of the two-part special issue on Behavior Understanding for Arts and Entertainment and discusses general questions and challenges that were suggested by the entire set of seven articles of the special issue and by the comments of the reviewers.
Abstract: This editorial introduction complements the shorter introduction to the first part of the two-part special issue on Behavior Understanding for Arts and Entertainment. It offers a more expansive discussion of the use of behavior analysis for interactive systems that involve creativity, either for the producer or the consumer of such a system. We first summarise the two articles that appear in this second part of the special issue. We then discuss general questions and challenges in this domain that were suggested by the entire set of seven articles of the special issue and by the comments of the reviewers of these articles.

Proceedings ArticleDOI
05 Jan 2015
TL;DR: Computer vision and computational photography techniques can be applied to provide image composites, such as panoramas, high dynamic range images, and stroboscopic images, as well as to automatically select individual alternative frames.
Abstract: Cameras are becoming increasingly aware of the picture-taking context, collecting extra information around the act of photographing. This contextual information enables the computational generation of a wide range of enhanced photographic outputs, effectively expanding the imaging experience provided by consumer cameras. Computer vision and computational photography techniques can be applied to provide image composites, such as panoramas, high dynamic range images, and stroboscopic images, as well as to automatically select individual alternative frames. Our technology can be integrated into point-and-shoot cameras, and it effectively expands the photographic possibilities for casual and amateur users, who often rely on automatic camera modes.
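
Two of the composites mentioned (panoramas and HDR merges) have off-the-shelf counterparts in OpenCV, which gives a feel for the building blocks involved. This is generic OpenCV usage with hypothetical file names, not the authors' system:

```python
import cv2
import numpy as np

# Frames captured around the act of photographing (assumed files).
frames = [cv2.imread(f) for f in ("shot0.jpg", "shot1.jpg", "shot2.jpg")]

# Panorama: stitch overlapping frames into one wide composite.
status, pano = cv2.Stitcher.create().stitch(frames)
if status == 0:  # cv2.Stitcher_OK
    cv2.imwrite("panorama.jpg", pano)

# HDR: merge aligned frames taken at different exposure times (in seconds),
# then tone-map the radiance estimate back to a displayable image.
times = np.array([1 / 30, 1 / 125, 1 / 500], dtype=np.float32)
hdr = cv2.createMergeDebevec().process(frames, times)
ldr = cv2.createTonemap(gamma=2.2).process(hdr)
cv2.imwrite("hdr.jpg", np.clip(ldr * 255, 0, 255).astype(np.uint8))
```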

Proceedings ArticleDOI
10 Dec 2015
TL;DR: This paper pioneers a method for local color editing on stereo image pairs by introducing recent advances in the field of image segmentation, thus allowing a user's edits in one view to be simultaneously performed in the other view.
Abstract: This paper pioneers a method for local color editing on stereo image pairs. We generalize the conventional edit propagation framework to stereo views by introducing recent advances in the field of image segmentation, thus allowing a user's edits in one view to be simultaneously performed in the other view. This new formulation maintains consistent editing quality in both views and avoids singularities by solving a well-regularized linear system for edit propagation.
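
The "well-regularized linear system" is in the spirit of classic edit propagation: sparse user strokes g are spread over all pixels e by minimizing a smoothness term plus a data term, sum_ij w_ij (e_i - e_j)^2 + lam * sum_{i in S} (e_i - g_i)^2, which reduces to one sparse solve. Below is a single-view sketch under simple grid-neighbor color affinities (our assumption); the paper extends the propagation across both views of a stereo pair.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def propagate_edits(image, strokes, mask, lam=10.0, sigma=0.1):
    """Spread sparse stroke values over an image with one sparse solve.

    image   : (H, W, 3) floats in [0, 1]
    strokes : (H, W) target edit values, meaningful where mask is True
    mask    : (H, W) bool, True at stroked pixels (must be non-empty)
    """
    H, W = mask.shape
    n = H * W
    idx = np.arange(n).reshape(H, W)

    rows, cols, vals = [], [], []
    for di, dj in ((0, 1), (1, 0)):  # 4-neighbor grid edges
        a = idx[:H - di, :W - dj].ravel()
        b = idx[di:, dj:].ravel()
        diff = image[:H - di, :W - dj] - image[di:, dj:]
        w = np.exp(-(diff ** 2).sum(-1).ravel() / (2 * sigma ** 2))
        rows += [a, b]; cols += [b, a]; vals += [w, w]

    Wm = sp.coo_matrix((np.concatenate(vals),
                        (np.concatenate(rows), np.concatenate(cols))),
                       shape=(n, n))
    L = sp.diags(np.asarray(Wm.sum(axis=1)).ravel()) - Wm  # graph Laplacian
    D = sp.diags(lam * mask.ravel().astype(float))         # data-term weights
    e = spsolve((L + D).tocsr(), lam * (strokes * mask).ravel())
    return e.reshape(H, W)
```

Strong color affinities let an edit flow within an object, while the Laplacian smoothness term keeps it from bleeding across strong edges.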

Journal ArticleDOI
TL;DR: This editorial introduction describes the aims and scope of the special issue of the ACM Transactions on Interactive Intelligent Systems on Behavior Understanding for Arts and Entertainment and describes each of the five articles included.
Abstract: This editorial introduction describes the aims and scope of the special issue of the ACM Transactions on Interactive Intelligent Systems on Behavior Understanding for Arts and Entertainment, which is being published in issues 2 and 3 of volume 5 of the journal. Here we offer a brief introduction to the use of behavior analysis for interactive systems that involve creativity in either the creator or the consumer of a work of art. We then characterize each of the five articles included in this first part of the special issue, which span a wide range of applications.