
Showing papers by "Matthew Turk published in 2006"


Journal ArticleDOI
TL;DR: Preliminary experimental results show that the probabilistic facial expression model on manifold significantly improves facial deformation tracking and expression recognition.

267 citations


Journal ArticleDOI
TL;DR: It is shown how the input channels are integrated to use the modalities beneficially and how this enhances the interface's overall usability.
Abstract: An augmented reality system enhances a mobile user's situational awareness and provides new visualization functionality. The custom-built multimodal interface provides access to information encountered in urban environments. In this article, we detail our experiences with various input devices and modalities and discuss their advantages and drawbacks in the context of interaction tasks in mobile computing. We show how we integrated the input channels to use the modalities beneficially and how this enhances the interface's overall usability.

59 citations


01 Jan 2006
TL;DR: In this paper, an isometric self-organizing map (ISO-SOM) method is proposed for nonlinear dimensionality reduction, which integrates a self-organizing map model and an ISOMAP dimension reduction algorithm, organizing the high dimension data in a low dimension lattice structure.
Abstract: We propose an isometric self-organizing map (ISO-SOM) method for nonlinear dimensionality reduction, which integrates a self-organizing map model and an ISOMAP dimension reduction algorithm, organizing the high dimension data in a low dimension lattice structure. We apply the proposed method to the problem of appearance-based 3D hand posture estimation. As a learning stage, we use a realistic 3D hand model to generate data encoding the mapping between the hand pose space and the image feature space. The intrinsic dimension of such nonlinear mapping is learned by ISOSOM, which clusters the data into a lattice map. We perform 3D hand posture estimation on this map, showing that the ISOSOM algorithm performs better than traditional image retrieval algorithms for pose estimation. We also show that a 2.5D feature representation based on depth edges is clearly superior to intensity edge features commonly used in previous methods.
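The two-stage idea in the abstract (a nonlinear embedding followed by a self-organizing lattice used as a pose lookup table) can be sketched as below. This is a minimal illustration, not the paper's implementation: ISOMAP itself needs geodesic graph distances, so a plain PCA projection stands in for the embedding step, and the function names (`pca_embed`, `train_som`) are hypothetical.

```python
import numpy as np

def pca_embed(X, k=2):
    # Stand-in for ISOMAP (which requires geodesic graph distances):
    # plain PCA via SVD, just to keep the sketch self-contained.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def train_som(emb, grid=(8, 8), epochs=20, seed=0):
    # Fit a lattice of nodes to the embedded points; the nearest node
    # then serves as the lookup cell for exemplar-based pose retrieval.
    rng = np.random.default_rng(seed)
    h, w = grid
    nodes = rng.standard_normal((h * w, emb.shape[1]))
    coords = np.array([(i, j) for i in range(h) for j in range(w)], float)
    for t in range(epochs):
        lr = 0.5 * (1 - t / epochs)                    # decaying learning rate
        sigma = max(1.0, (h / 2) * (1 - t / epochs))   # shrinking neighborhood
        for x in emb[rng.permutation(len(emb))]:
            bmu = np.argmin(((nodes - x) ** 2).sum(axis=1))  # best-matching unit
            d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)   # lattice distance
            nodes += lr * np.exp(-d2 / (2 * sigma**2))[:, None] * (x - nodes)
    return nodes.reshape(h, w, -1)
```

At query time, a probe image's feature vector would be embedded and matched to its best-matching lattice node, whose stored pose label gives the estimate.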

47 citations


Proceedings ArticleDOI
01 Oct 2006
TL;DR: A novel body-section labeling module based on spatial hidden-Markov models (HMM) allows different processing policies to be applied in different body sections and works robustly despite the large variations in clinical PET images.
Abstract: We present a system for automatic hot spots detection and segmentation in whole body FDG-PET images. The main contribution of our system is threefold. First, it has a novel body-section labeling module based on spatial hidden-Markov models (HMM); this allows different processing policies to be applied in different body sections. Second, the competition diffusion (CD) segmentation algorithm, which takes into account body-section information, converts the binary thresholding results to probabilistic interpretation and detects hot-spot region candidates. Third, a recursive intensity mode-seeking algorithm finds hot spot centers efficiently, and given these centers, a clinically meaningful protocol is proposed to accurately quantify hot spot volumes. Experimental results show that our system works robustly despite the large variations in clinical PET images.
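The body-section labeling module can be illustrated with a left-right HMM over axial slices: states are body sections ordered head to feet, and transitions only allow staying in a section or advancing to the next one. This is a hedged sketch of the general idea, not the paper's model; `SECTIONS`, the transition probabilities, and `viterbi_sections` are all hypothetical.

```python
import numpy as np

SECTIONS = ["head", "neck", "chest", "abdomen", "pelvis"]

def viterbi_sections(log_lik, stay=0.9):
    """Label each axial slice with a body section.
    log_lik: (n_slices, n_states) per-slice log-likelihoods."""
    n, k = log_lik.shape
    # left-right transition matrix: stay in a section or move to the next
    log_T = np.full((k, k), -np.inf)
    for s in range(k):
        log_T[s, s] = np.log(stay)
        if s + 1 < k:
            log_T[s, s + 1] = np.log(1 - stay)
        else:
            log_T[s, s] = 0.0   # last section is absorbing
    dp = np.full((n, k), -np.inf)
    back = np.zeros((n, k), int)
    dp[0, 0] = log_lik[0, 0]    # the scan is assumed to start at the head
    for t in range(1, n):
        for j in range(k):
            scores = dp[t - 1] + log_T[:, j]
            back[t, j] = np.argmax(scores)
            dp[t, j] = scores[back[t, j]] + log_lik[t, j]
    path = [int(np.argmax(dp[-1]))]
    for t in range(n - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return [SECTIONS[s] for s in reversed(path)]
```

The monotone transition structure is what makes the labeling robust: even if a few slices are locally ambiguous, the decoded path cannot jump backwards anatomically.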

39 citations


Proceedings ArticleDOI
10 Apr 2006
TL;DR: An isometric self-organizing map (ISO-SOM) method for nonlinear dimensionality reduction, which integrates a self-organizing map model and an ISOMAP dimension reduction algorithm, organizing the high dimension data in a low dimension lattice structure, is proposed.
Abstract: We propose an isometric self-organizing map (ISO-SOM) method for nonlinear dimensionality reduction, which integrates a self-organizing map model and an ISOMAP dimension reduction algorithm, organizing the high dimension data in a low dimension lattice structure. We apply the proposed method to the problem of appearance-based 3D hand posture estimation. As a learning stage, we use a realistic 3D hand model to generate data encoding the mapping between the hand pose space and the image feature space. The intrinsic dimension of such nonlinear mapping is learned by ISOSOM, which clusters the data into a lattice map. We perform 3D hand posture estimation on this map, showing that the ISOSOM algorithm performs better than traditional image retrieval algorithms for pose estimation. We also show that a 2.5D feature representation based on depth edges is clearly superior to intensity edge features commonly used in previous methods.

38 citations


Proceedings ArticleDOI
17 Jun 2006
TL;DR: This work describes a novel approach to appearance-based hand pose estimation which relies on multiple cameras to improve accuracy and resolve ambiguities caused by self-occlusions, and formulates the problem in a MAP (maximum a posteriori) framework, where the information from multiple cameras is fused to provide reliable hand pose estimation.
Abstract: We describe a novel approach to appearance-based hand pose estimation which relies on multiple cameras to improve accuracy and resolve ambiguities caused by self-occlusions. Rather than estimating 3D geometry as most previous multi-view imaging systems do, our approach uses multiple views to extend current exemplar-based methods that estimate hand pose by matching a probe image with a large discrete set of labeled hand pose images. We formulate the problem in a MAP (maximum a posteriori) framework, where the information from multiple cameras is fused to provide reliable hand pose estimation. Our quantitative experimental results show that the correct estimation rate is much higher using our multi-view approach than using a single-view approach.
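Over a discrete exemplar set, the MAP fusion described above reduces to summing per-view log-likelihoods and adding a log-prior. A minimal sketch, assuming conditional independence of the views given the pose; the function name and the treatment of matching scores as likelihoods are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def map_pose_estimate(view_scores, log_prior=None):
    """Fuse per-view matching scores over a discrete set of exemplar poses.

    view_scores: (n_views, n_poses) similarity scores in [0, 1], treated
    here as proportional to per-view likelihoods p(image_v | pose).
    Returns the MAP pose index under a view-independence assumption."""
    log_lik = np.log(np.asarray(view_scores) + 1e-9).sum(axis=0)
    if log_prior is not None:
        log_lik = log_lik + log_prior
    return int(np.argmax(log_lik))
```

The point of the fusion is visible in a small example: a pose that looks mediocre in one view but unambiguous in another can beat a pose that is merely plausible in both.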

35 citations


Proceedings ArticleDOI
23 Oct 2006
TL;DR: The program co-chairs for ACM MM 2006 are Yong Rui, Wolfgang Klas, and Ketan Mayer-Patel, who were responsible for selecting the long paper program in the areas of Content, Applications, and Systems, and this rigorous review process resulted in the acceptance of 48 long papers.
Abstract: Welcome to the Fourteenth ACM International Conference on Multimedia (ACM MM 2006), held October 23-27, 2006 at Fess Parker's Doubletree Resort Hotel in beautiful Santa Barbara, California, USA. ACM Multimedia is the premier annual professional meeting for communicating the state-of-the-art in multimedia research, technology, and art. As in previous years, starting with the first ACM Multimedia conference in 1993, the conference seeks to bring together researchers and practitioners in academia, industry, and government who are interested in exploring and exploiting new and multiple media to create new capabilities for human expression, communication, collaboration, and interaction. ACM Multimedia covers all aspects of multimedia computing: from underlying technologies to applications, theory to practice, and servers to networks to devices. Multimedia is an interdisciplinary endeavor, and the variety of conference events reflects this. The overall conference encompasses three major parts: interesting tutorials on Monday, October 23, an exciting three-day main conference on Tuesday through Thursday, October 24-26, and a set of workshops in hot multimedia areas on Friday, October 27. The three-day main conference comprises several different technical program elements, each with separate submissions and reviewing: full papers, short papers, a panel, the doctoral symposium, Brave New Topics sessions, technical demonstrations, a video program, an open source contest, and the Interactive Arts Program. We are excited to host two keynote presentations by Ken Goldberg of UC Berkeley and Bradley Horowitz of Yahoo!, who will give unique perspectives of their work in academia and industry. The program co-chairs for ACM MM 2006 are Yong Rui, Wolfgang Klas, and Ketan Mayer-Patel, who were responsible, along with a program committee of 92 members, for selecting the long paper program in the areas of Content, Applications, and Systems.
These tracks received 292 long paper submissions (128 in Content, 100 in Applications, and 64 in Systems). Each paper was reviewed by at least three qualified reviewers in a double-blind review process. The program committee met on June 21, 2006 in Redmond, Washington to discuss the papers and make final selections for papers to be included as oral presentations in the conference program. This rigorous review process resulted in the acceptance of 48 long papers: 21 in the Content track, 16 in the Applications track, and 11 in the Systems track. This represents an acceptance rate of 16 percent. We heartily thank the program co-chairs and the program committee members for their outstanding and dedicated work. The short paper program received 178 submissions and, after a thorough review process, accepted 66 papers, for a 37 percent acceptance rate. These will be presented during poster sessions at the conference. Many thanks to the short paper program co-chairs Brian Bailey, Belle Tseng, and Nalini Venkatasubramanian for an excellent job.

26 citations


Journal ArticleDOI
TL;DR: A novel method to reduce the effect of specularities in digital images using a simple modification of the capture setup: a multi-flash camera is used to take multiple pictures of the scene, each one with a differently positioned light source.
Abstract: We present a novel method to reduce the effect of specularities in digital images. Our approach relies on a simple modification of the capture setup: a multi-flash camera is used to take multiple pictures of the scene, each one with a differently positioned light source. We then formulate the problem of specular highlights reduction as solving a Poisson equation on a gradient field obtained from the input images. The obtained specular reduced image is further refined in a matting process with the maximum composite of the input images. Experimental results are demonstrated on real and synthetic images. The entire setup can be conceivably packaged into a self-contained device, no larger than existing digital cameras.
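The gradient-domain pipeline in the abstract (assemble a specular-free gradient field from multiple flash images, then integrate it by solving a Poisson equation) can be sketched as follows. This is a toy version under stated assumptions: grayscale float images, a min-magnitude gradient selection rule standing in for the paper's formulation, periodic boundary handling, and a plain Jacobi solver; none of it is the authors' actual implementation.

```python
import numpy as np

def despecular(images, iters=400):
    """Toy gradient-domain specular reduction.  Specular highlights move
    between flash positions, so at each pixel we keep the gradient from
    the flash image where it is smallest (most likely diffuse), then
    integrate the assembled field back with a Poisson solve."""
    imgs = np.stack(images)                         # (k, H, W)
    gx = np.diff(imgs, axis=2, append=imgs[:, :, -1:])
    gy = np.diff(imgs, axis=1, append=imgs[:, -1:, :])
    pick = np.argmin(gx ** 2 + gy ** 2, axis=0)     # per-pixel flash choice
    Gx = np.take_along_axis(gx, pick[None], 0)[0]
    Gy = np.take_along_axis(gy, pick[None], 0)[0]
    # divergence of the assembled gradient field
    div = (Gx - np.roll(Gx, 1, axis=1)) + (Gy - np.roll(Gy, 1, axis=0))
    I = imgs.mean(0)                                # initial guess
    for _ in range(iters):                          # Jacobi iterations for lap(I) = div
        nbr = (np.roll(I, 1, 0) + np.roll(I, -1, 0) +
               np.roll(I, 1, 1) + np.roll(I, -1, 1))
        I = (nbr - div) / 4.0
    return I
```

Because the selected gradients exclude highlight boundaries, the Poisson reconstruction diffuses the highlights away while preserving the diffuse shading that all flash images agree on.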

13 citations


Proceedings ArticleDOI
11 Dec 2006
TL;DR: By varying illumination parameters, such as the number, spatial position, and wavelength of light sources, this work shows that it is able to handle fundamental problems in depth edge detection, including multi-scale depth changes and motion.
Abstract: Sharp discontinuities in depth, or depth edges, are very important low-level features for scene understanding. Recently, we have proposed a solution to the depth edge detection problem using a simple modification of the capture setup: a multi-flash camera with flashes appropriately positioned to cast shadows along depth discontinuities in the scene. In this paper, we show that by varying illumination parameters, such as the number, spatial position, and wavelength of light sources, we are able to handle fundamental problems in depth edge detection, including multi-scale depth changes and motion. The robustness of our methods is demonstrated through our experimental results in complex scenes.
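The core multi-flash cue can be sketched very simply: each flash casts a thin shadow abutting a depth edge on the side away from that flash, so the ratio of a flash image to the per-pixel maximum image drops sharply in shadow, and the onset of that drop along the flash direction marks the edge. The code below is a hedged simplification (a fixed ratio threshold, four axis-aligned flash positions, no epipolar geometry); the function name and threshold are illustrative.

```python
import numpy as np

def depth_edges(flash_imgs, thresh=0.7):
    """flash_imgs maps a flash position ('left', 'right', 'top',
    'bottom') to a grayscale image lit from that side."""
    max_img = np.maximum.reduce(list(flash_imgs.values()))
    edges = np.zeros(max_img.shape, bool)
    # shift that moves from the lit side into the shadowed side
    step = {'left': (0, 1), 'right': (0, -1), 'top': (1, 0), 'bottom': (-1, 0)}
    for side, img in flash_imgs.items():
        ratio = img / np.maximum(max_img, 1e-6)   # shadows drop well below 1
        shadow = ratio < thresh
        prev = np.roll(shadow, step[side], axis=(0, 1))
        edges |= shadow & ~prev   # first shadow pixel along the flash direction
    return edges
```

Using the maximum image as the denominator is what makes the cue illumination-invariant: any pixel lit by at least one flash normalizes to roughly 1, so only cast shadows survive the threshold.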

10 citations


Journal ArticleDOI
TL;DR: The intermediate-scale local analysis approach used in the proposed face verification system resulted in state-of-the-art face verification performance.

8 citations


01 Jan 2006
TL;DR: By combining active illumination with viewpoint variation, the proposed framework for robust depth-edge preserving stereo is provided and the usefulness of the techniques in non-photorealistic rendering is shown, with applications in comprehensible rendering, medical imaging and human facial illustrations.
Abstract: Discontinuity modeling and detection has a long history in the field of computer vision, but most methods are of limited use because either they deal with intensity edges, which may not be informative regarding intrinsic object properties, or they attempt to detect discontinuities from noisy dense maps such as stereo or motion, which are particularly error-prone near discontinuities in depth (also known as depth edges or occluding contours). We propose to systematically vary imaging parameters (in particular illumination and viewpoint) in order to detect and analyze depth discontinuities in real-world scenes. We build on promising preliminary research on multi-flash imaging [85], which uses small baseline active illumination to label depth edges in images. We show that by varying illumination parameters (such as the spatial position, number, type, and wavelength of light sources), we are able to handle fundamental problems in depth edge detection, including multi-scale depth changes, specularities and motion. By combining active illumination with viewpoint variation, we provide a framework for robust depth-edge preserving stereo. We propose novel feature maps based on qualitative depth and occlusion analysis, which are useful priors for stereo. Based on these feature maps, we demonstrate enhanced local and global stereo algorithms which produce accurate results near depth discontinuities. Finally, we show the usefulness of our techniques in non-photorealistic rendering, with applications in comprehensible rendering, medical imaging and human facial illustrations. We also demonstrate the importance of depth contours in visual recognition, showing improved results on the problem of fingerspelling recognition.
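The "depth-edge preserving stereo" idea above can be illustrated with a toy local aggregation step: per-pixel matching costs are averaged along each scanline, but the support window is not allowed to grow across a detected depth edge, so disparities stay sharp at occluding contours. This is a hypothetical stand-in for the thesis's enhanced local algorithm; the function name, cost layout, and edge-mask convention are all assumptions.

```python
import numpy as np

def edge_aware_disparity(costs, edge_mask, radius=2):
    """costs: (H, W, D) per-pixel matching costs for D disparity candidates.
    edge_mask[y, x] marks the first pixel of a new depth layer on row y;
    aggregation windows stop at these boundaries."""
    H, W, D = costs.shape
    agg = np.empty_like(costs)
    for y in range(H):
        for x in range(W):
            lo = x
            while lo > max(0, x - radius) and not edge_mask[y, lo]:
                lo -= 1                     # extend left, stop at a layer start
            hi = x
            while hi < min(W - 1, x + radius) and not edge_mask[y, hi + 1]:
                hi += 1                     # extend right, stop before next layer
            agg[y, x] = costs[y, lo:hi + 1].mean(axis=0)
    return agg.argmin(axis=2)               # winner-take-all disparity
```

In a standard box aggregation, windows straddling an object boundary mix costs from two depth layers and blur the disparity edge; truncating the window at the edge mask avoids exactly that mixing.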


01 Jan 2006
TL;DR: A probabilistic model based on the manifold of facial expression can represent facial expression analytically and globally, and the Regional FACS system provides a novel FACS recognition solution with objective measurement.
Abstract: Facial expression is one of the most powerful means for people to coordinate conversation and communicate emotions and other mental, social, and physiological cues. We address two problems in facial expression recognition in this thesis: global facial expression space representation and facial expression recognition method with objective measurement. We propose the concept of the manifold of facial expression based on the observation that the images of all possible facial deformations of an individual make a smooth manifold embedded in a high dimensional image space. To combine the manifolds of different subjects that vary significantly and are usually hard to align, we transfer the facial deformations in all training videos to one standard model. Lipschitz embedding embeds the normalized deformation of the standard model in a low dimensional Generalized Manifold. Deformation data from different subjects complement each other for a better description of the true manifold. We learn a probabilistic expression model on the generalized manifold. There are six kinds of universally recognized facial expressions: happiness, sadness, fear, anger, disgust, and surprise, which we explicitly represent as basic expressions. In the embedded space, a complete expression sequence becomes a path on the expression manifold, emanating from a center that corresponds to the neutral expression. The transition between different expressions is represented as the evolution of the posterior probability of the six basic expressions. These six kinds of basic facial expressions comprise only a small subset of all visible facial deformation. To measure the facial expression recognition rate precisely in the manifold model, we developed Regional FACS (Facial Action Coding System). FACS encodes facial deformation in terms of 44 kinds of Action Units (AU). 
By learning the AU combinations in 9 separate facial regions, the number of combinations of regional deformations is dramatically decreased compared to the number of combinations of AUs. The manifold of each facial region can be considered a sub-vector of the whole manifold. The experimental results demonstrate that our system works effectively for automatic recognition of 29 AUs that cover the most frequently appearing facial deformations. The FACS recognition results also lead to high recognition accuracy of six basic expression categories. The main contributions of this thesis are: (1) A probabilistic model based on the manifold of facial expression can represent facial expression analytically and globally; (2) The Regional FACS system provides a novel FACS recognition solution with objective measurement.
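The Lipschitz embedding mentioned in the abstract has a compact definition worth making concrete: each output coordinate of a point is its distance to the nearest member of one reference set, so k reference sets yield a k-dimensional embedding. A minimal sketch with Euclidean distance; the function name and the choice of reference sets are illustrative, not the thesis's configuration.

```python
import numpy as np

def lipschitz_embed(X, ref_sets):
    """X: (n, d) points.  ref_sets: list of (m_i, d) arrays.
    Returns (n, len(ref_sets)): column i is the distance from each
    point to the nearest member of reference set i."""
    X = np.asarray(X, float)
    cols = []
    for R in ref_sets:
        d = np.linalg.norm(X[:, None, :] - np.asarray(R)[None], axis=2)
        cols.append(d.min(axis=1))      # distance to nearest reference point
    return np.stack(cols, axis=1)
```

Because each coordinate is a min of distances, the map is 1-Lipschitz in each output dimension, which is what lets distances in the embedded space lower-bound distances in the original deformation space.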

