
Showing papers in "Journal of Multimedia in 2006"


Journal ArticleDOI
TL;DR: A user-independent, fully automatic system for real-time recognition of facial actions from the Facial Action Coding System (FACS) automatically detects frontal faces in the video stream and codes each frame with respect to 20 Action Units.
Abstract: Spontaneous facial expressions differ from posed expressions both in which muscles are moved and in the dynamics of the movement. Advances in the field of automatic facial expression measurement will require development and assessment on spontaneous behavior. Here we present preliminary results on a task of facial action detection in spontaneous facial expressions. We employ a user-independent, fully automatic system for real-time recognition of facial actions from the Facial Action Coding System (FACS). The system automatically detects frontal faces in the video stream and codes each frame with respect to 20 Action Units. The approach applies machine learning methods, such as support vector machines and AdaBoost, to texture-based image representations. The output margin of the learned classifiers predicts action unit intensity. Frame-by-frame intensity measurements will enable investigations into facial expression dynamics which were previously intractable by human coding.
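As a rough illustration of the classification step, the sketch below (my own simplification, using scikit-learn and synthetic feature vectors rather than the paper's texture-based representations) shows how a per-Action-Unit SVM's output margin can be read as an intensity estimate:

```python
# Hypothetical sketch: per-Action-Unit SVM whose decision margin is read as intensity.
# Feature vectors stand in for the paper's texture-based (e.g., Gabor-like) representations.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 48))           # 200 frames, 48-D texture features (synthetic)
y_train = rng.integers(0, 2, size=200)         # 1 = AU present, 0 = absent (synthetic labels)

au_classifier = SVC(kernel="linear")
au_classifier.fit(X_train, y_train)

X_video = rng.normal(size=(30, 48))            # 30 new frames
margins = au_classifier.decision_function(X_video)  # signed distance to the hyperplane
# Larger positive margins are treated as stronger (more intense) action units.
intensity = np.clip(margins, 0, None)
print(intensity.round(2))
```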

494 citations


Journal ArticleDOI
TL;DR: Methods for detecting emotional facial expressions occurring in a realistic human conversation setting—the Adult Attachment Interview (AAI)—are explored; the results suggest that one-class classification methods can reach a good balance between cost (labeling and computing) and recognition performance by avoiding non-emotional expression labeling and modeling.
Abstract: Change in a speaker's emotion is a fundamental component in human communication. Automatic recognition of spontaneous emotion would significantly impact human-computer interaction and emotion-related studies in education, psychology and psychiatry. In this paper, we explore methods for detecting emotional facial expressions occurring in a realistic human conversation setting—the Adult Attachment Interview (AAI). Because non-emotional facial expressions have no distinct description and are expensive to model, we treat emotional facial expression detection as a one-class classification problem, which is to describe target objects (i.e., emotional facial expressions) and distinguish them from outliers (i.e., non-emotional ones). Our preliminary experiments on AAI data suggest that one-class classification methods can reach a good balance between cost (labeling and computing) and recognition performance by avoiding non-emotional expression labeling and modeling.
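A minimal sketch of the one-class formulation described above, assuming scikit-learn's OneClassSVM and synthetic feature vectors (both my own choices, not the paper's actual classifier or features):

```python
# Illustrative one-class formulation: model only the "target" class (emotional expressions)
# and flag everything else as an outlier. Feature vectors here are synthetic stand-ins.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)
emotional_feats = rng.normal(loc=0.0, scale=1.0, size=(300, 20))   # labeled target samples only

detector = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1)  # nu bounds the outlier fraction
detector.fit(emotional_feats)

new_frames = rng.normal(loc=0.0, scale=2.5, size=(5, 20))    # unseen frames, some non-emotional
print(detector.predict(new_frames))   # +1 = emotional (inlier), -1 = non-emotional (outlier)
```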

105 citations


Journal ArticleDOI
TL;DR: The main system design parameters that influence the performance of H.264-encoded video streaming over EGPRS and UMTS bearers are introduced and analyzed, and an advanced receiver concept, the so-called permeable layer receiver, is investigated.
Abstract: Recently, Multimedia Broadcast Multicast Service (MBMS) has been specified by 3GPP as a Release 6 feature in order to meet the increasing demands of multimedia download and streaming applications in mobile scenarios. H.264, as the unique recommended video codec for MBMS, serves as an essential component because of its high compression efficiency and easy network integration capability. In this study, we introduce and analyze the main system design parameters that influence the performance of H.264-encoded video streaming over EGPRS and UMTS bearers. An effective design methodology, covering robustness against packet losses and efficient use of the scarce radio resources, is presented. Care is taken with respect to the processing power of mobile terminals, service delay constraints, and heterogeneous receiving conditions. Then, we investigate the application of an advanced receiver concept, the so-called permeable layer receiver, in MBMS video broadcasting environments. Selected simulation results show the suitability of certain parameter selections as well as the benefits provided by the advanced receiver concept. Finally, a real-time test bed for MBMS called RealNeS-MBMS is presented. With this tool, a standard-compliant GERAN network can be simulated and the system design procedure, including H.264-based video broadcast streaming, can be evaluated in real time.

90 citations


Journal ArticleDOI
TL;DR: It is shown that when the Bhattacharyya coefficient is applied to gray scale images, it produces biased results.
Abstract: The Bhattacharyya coefficient is a popular method that uses color histograms to correlate images. It is often regarded as an absolute similarity measure for frequency-coded data that needs no bias correction. In this paper, we show that when this method is applied to grayscale images, it produces biased results. Correlation based on this measure is not adequate for common grayscale images, as color alone is not a sufficient feature in grayscale. This bias is explored and demonstrated through numerous experiments with different kinds of non-rigid maneuvering objects in cluttered and less cluttered environments, in the context of object tracking. The spectral performance of the Bhattacharyya curve is compared with a spatial matching criterion, the mean square difference.
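For reference, the coefficient compares two normalized histograms: BC(p, q) = sum_i sqrt(p_i * q_i), equal to 1 for identical distributions. A small NumPy illustration with synthetic grayscale patches of my own choosing (not the paper's test data):

```python
# Bhattacharyya coefficient between two normalized gray-level histograms:
#   BC(p, q) = sum_i sqrt(p_i * q_i);  BC = 1 for identical distributions.
import numpy as np

def bhattacharyya(img_a, img_b, bins=64):
    p, _ = np.histogram(img_a, bins=bins, range=(0, 256))
    q, _ = np.histogram(img_b, bins=bins, range=(0, 256))
    p = p / p.sum()
    q = q / q.sum()
    return np.sum(np.sqrt(p * q))

rng = np.random.default_rng(2)
target  = rng.integers(0, 256, size=(32, 32))        # synthetic grayscale patches
similar = np.clip(target + rng.integers(-5, 6, target.shape), 0, 255)
print(bhattacharyya(target, target), bhattacharyya(target, similar))
```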

51 citations


Journal ArticleDOI
TL;DR: An improved Active Shape Model for facial feature extraction relies on initializing the ASM using the centers of the mouth and eyes located via color information, incorporating RGB color information to represent the local structure of the feature points, and applying a 2D affine transformation to align facial features that are perturbed by head pose variations.
Abstract: In this paper we present an improved Active Shape Model (ASM) for facial feature extraction. The original ASM developed by Cootes et al. (1) suffers from factors such as poor model initialization, modeling the intensity of the local structure of the facial features, and alignment of the shape model to a new instance of the object in a given image using a simple Euclidean transformation. The core of our enhancement relies on three improvements: (a) initializing the ASM using the centers of the mouth and eyes, which are located using color information; (b) incorporating RGB color information to represent the local structure of the feature points; and (c) applying a 2D affine transformation in aligning the facial features that are perturbed by head pose variations, which effectively aligns the matched facial features to the shape model and compensates for the effect of the head pose variations. Experiments on a face database of 70 subjects show that our approach outperforms the standard ASM and is successful in extracting facial features.
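To make the alignment step concrete, here is a small NumPy sketch of fitting a 2D affine transform between corresponding landmark sets by least squares (synthetic points and an assumed pose distortion of my own; the paper's actual alignment procedure may differ in detail):

```python
# Least-squares 2-D affine alignment between corresponding landmark sets, the kind of
# transform used here instead of ASM's similarity (Euclidean) alignment. Points are synthetic.
import numpy as np

rng = np.random.default_rng(10)
model_pts = rng.uniform(0, 100, size=(20, 2))                  # mean-shape landmarks
A_true = np.array([[1.05, 0.20], [-0.15, 0.95]])               # shear/rotation from head pose
image_pts = model_pts @ A_true.T + np.array([4.0, -2.0])       # observed landmarks

# Solve image_pts ~= [model_pts | 1] @ M for the 3x2 matrix M (affine map per output coordinate).
X = np.hstack([model_pts, np.ones((20, 1))])
M, *_ = np.linalg.lstsq(X, image_pts, rcond=None)
print(M.T)    # recovered [A | t]: rows ~ [1.05, 0.20, 4.0] and [-0.15, 0.95, -2.0]
```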

40 citations


Journal ArticleDOI
TL;DR: This paper proposes a detection approach that does not require binarization of the difference image; results are demonstrated for a crowded scene and an evaluation of the proposed tracking framework is presented.
Abstract: Change detection by background subtraction is a common approach to detect moving foreground. The resulting difference image is usually thresholded to obtain objects based on pixel connectedness and resulting blob objects are subsequently tracked. This paper proposes a detection approach not requiring the binarization of the difference image. Local density maxima in the difference image - usually representing moving objects - are outlined by a fast non-parametric mean shift clustering procedure. Object tracking is carried out by updating and propagating cluster parameters over time using the mode seeking property of the mean shift procedure. For occluding targets, a fast procedure determining the object configuration maximizing image likelihood is presented. Detection and tracking results are demonstrated for a crowded scene and evaluation of the proposed tracking framework is presented.
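As an illustration of finding local density maxima in a non-binarized difference image, the sketch below samples pixel coordinates in proportion to the difference magnitude and clusters them with scikit-learn's MeanShift (a stand-in for the paper's fast non-parametric procedure; the data and bandwidth are synthetic assumptions):

```python
# Illustrative substitute for the paper's procedure: treat the (non-binarized) difference
# image as a sample density and let mean shift find local density maxima (moving objects).
import numpy as np
from sklearn.cluster import MeanShift

rng = np.random.default_rng(3)
diff = np.zeros((120, 160))
diff[30:50, 40:60] += rng.random((20, 20))       # synthetic "moving object" blobs
diff[70:95, 100:130] += rng.random((25, 30))

ys, xs = np.nonzero(diff > 0.1)
weights = diff[ys, xs]
# Sample pixel coordinates in proportion to the difference magnitude (density of change).
idx = rng.choice(len(xs), size=500, p=weights / weights.sum())
samples = np.column_stack([xs[idx], ys[idx]]).astype(float)

ms = MeanShift(bandwidth=15)
ms.fit(samples)
print(ms.cluster_centers_)      # modes ~ centers of the moving regions
```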

39 citations


Journal ArticleDOI
TL;DR: Two novel vision-based methods for rough-terrain mobile robots, together with a semi-empirical model of wheel sinkage, are presented; they can be integrated into control and planning algorithms to improve the performance of ground vehicles operating in uncharted environments.
Abstract: External perception based on vision plays a critical role in developing improved and robust localization algorithms, as well as in gaining important information about the vehicle and the terrain it is traversing. This paper presents two novel methods for rough-terrain mobile robots, using visual input. The first method consists of a stereovision algorithm for real-time 6DoF ego-motion estimation. It integrates image intensity information and 3D stereo data in the well-known Iterative Closest Point (ICP) scheme. Neither a priori knowledge of the motion nor inputs from other sensors are required; the only assumption is that the scene always contains visually distinctive features which can be tracked over subsequent stereo pairs. This generates what is usually referred to as visual odometry. The second method aims at estimating the wheel sinkage of a mobile robot on sandy soil, based on an edge detection strategy. A semi-empirical model of wheel sinkage is also presented, referring to classical terramechanics theory. Experimental results obtained with an all-terrain mobile robot and with a wheel sinkage test bed are presented to validate our approach. It is shown that the proposed techniques can be integrated into control and planning algorithms to improve the performance of ground vehicles operating in uncharted environments.

32 citations


Journal ArticleDOI
Ou Yang, Jianhua Lu
TL;DR: By taking advantage of traffic periodicity and regularity, a set of CAC and scheduling schemes for real-time video traffic in IEEE 802.16 networks is proposed; simulations with real-life video traces show that the proposed schemes offer flexibility in balancing throughput, delay and fairness.
Abstract: IEEE 802.16 networks are going to provide broadband wireless access with quality of service (QoS) guarantees. Among these services, real-time video traffic is particularly challenging because of its varying bit rate and stringent delay constraint. To the best of our knowledge, no existing call admission control (CAC) and scheduling schemes cover throughput expectations, delay constraints and fairness requirements simultaneously. In this paper, by taking advantage of traffic periodicity and regularity, a set of CAC and scheduling schemes for real-time video traffic in IEEE 802.16 networks is proposed. Specifically, two key parameters are studied to trade off throughput against delay, as well as delay against fairness. Simulations with real-life video traces show that the proposed schemes offer flexibility in balancing throughput, delay and fairness, or significant throughput improvement with acceptable delay and fairness.

27 citations


Journal ArticleDOI
TL;DR: This paper proposes a method for recovering the damaged blocks using the magnitudes of DFT coefficients, and demonstrates that the efficacy of the proposed algorithm is good in locally uniform regions and on edges and textures.
Abstract: In image authentication research, a common approach is to divide a given image into a number of smaller blocks and embed a fragile watermark into each block. Modifications can therefore be detected in the blocks that have been tampered with. The literature includes many authentication techniques for detecting modifications only. In this paper, we propose a method for recovering the damaged blocks using the magnitudes of DFT coefficients. If a given block is considered to be damaged, we divide it into 2x2 blocks and replace the magnitude of the DFT coefficient F(0,0) with a predicted magnitude close to that of the original image. As the F(0,0) coefficients are always real, we quantize them and round them off. An index map and a small set of DFT coefficients from the original image are employed for image recovery and are sent from the sender to the receiver using a public key scheme. As the image authentication system, we use the scheme proposed by Wong and Memon. In our experiments, the results demonstrate that the efficacy of the proposed algorithm is good in locally uniform regions and on edges and textures.
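The role of F(0,0) is easy to see in a toy example: for an NxN block it equals the sum of the pixel values (the DC term), so restoring its magnitude roughly restores the block's mean brightness. The NumPy sketch below shows only that single step, not the paper's index-map or public-key machinery:

```python
# For an NxN block, F(0,0) of the 2-D DFT is the sum of the pixel values (the DC term),
# so restoring its magnitude roughly restores the block's mean brightness.
import numpy as np

block = np.full((8, 8), 120.0)                      # original 8x8 block
tampered = np.zeros_like(block)                     # block destroyed by tampering

F = np.fft.fft2(tampered)
F[0, 0] = block.sum()                               # predicted DC magnitude (here: the true one)
recovered = np.real(np.fft.ifft2(F))

print(block.mean(), tampered.mean(), recovered.mean())   # 120.0, 0.0, ~120.0
```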

26 citations


Journal ArticleDOI
TL;DR: In this paper, the authors describe a framework for constructing a 3D immersive environment that can be used for training physical activities; it is designed to capture three-dimensional full-body human motion in real time and visualize the data either locally or remotely through a 3D display system.
Abstract: This paper describes a framework for constructing a three-dimensional immersive environment that can be used for training physical activities. The system is designed to capture three-dimensional full-body human motion in real time and visualize the data either locally or remotely through a three-dimensional display system so that the data can be viewed from arbitrary viewpoints. In the proposed system, an immersive environment is constructed through realistic reconstruction of a scene by applying a stereo algorithm on a set of images that is captured from multiple viewpoints. Specifically, twelve camera clusters that consist of four camera quadruples are used to capture the scene, where each camera cluster is processed by a PC independently and synchronously so that partial reconstructions from each viewpoint are merged to form a complete 3D description of the scene. This paper discusses in detail system architectures that enable synchronous operations across multiple computers while achieving parallel computations within each multi-processor system. A set of experiments is performed to learn tai-chi lessons in this environment, where students are instructed to follow a pre-recorded teacher's movements while observing both their own motions and the teacher's in real time. The effect of learning in the virtual environment is discussed by comparing the performance of the trainee groups under different control environments.

26 citations


Journal ArticleDOI
TL;DR: The current work augments EvoFIT by developing a set of psychologically useful scales - such as facial weight, masculinity, and age - that allow EvoFIT faces to be manipulated.
Abstract: Facial composites are pictures of human faces. These are normally constructed by victims and witnesses of crime, who describe a suspect's face and then select individual facial features. Unfortunately, research has shown that composites constructed in this way are not often recognised. In contrast, we are quite good at recognizing complete faces, even if the face is unfamiliar and only seen briefly. This more natural way of processing faces is at the heart of a new composite system called EvoFIT. With this computer program, witnesses are presented with sets of complete faces for selection and a composite is 'evolved' over time. The current work augments EvoFIT by developing a set of psychologically useful scales - such as facial weight, masculinity, and age - that allow EvoFIT faces to be manipulated. These holistic dimensions were implemented by increasing the size and variability of the underlying face model and by obtaining perceptual ratings so that the space could be suitably vectorised. The results of three evaluations suggested that the new dimensions were operating appropriately.
Index Terms: facial composite, holistic, witness, crime, EvoFIT

Journal ArticleDOI
TL;DR: Algorithms and applications for using the hand as an interface device in virtual and physical spaces, including parametric modelling of the central region of the hand, are proposed and experiments are presented to demonstrate the proposed applications.
Abstract: We propose algorithms and applications for using the hand as an interface device in virtual and physical spaces. In virtual drawing, by tracking the hand in 3-D and estimating a virtual plane in space, the user's intended drawing is recognized. In a virtual marble game, the instantaneous orientation of the hand is simulated to render a graphical scene of the game board. Real-time visual feedback allows the user to navigate a virtual ball in a maze. In 3-D model construction, the system tracks the hand motion in space while the user is traversing edges of a physical object. The object is then rendered virtually by the computer. These applications involve estimating the 3-D absolute position and/or orientation of the hand in space. We propose parametric modelling of the central region of the hand to extract this information. A stereo camera is used to first build a preliminary disparity map of the hand. Then, the best-fitting plane to the disparity points is computed using robust estimation. The 3-D hand plane is calculated based on the disparity plane and the position and orientation parameters of the hand. Tracking the hand region over a sequence of frames and coping with noise using robust modelling of the hand motion enables estimating the trajectory of the hand in space. The algorithms run in real time, and experiments are presented to demonstrate the proposed applications of using the hand as an interface device.
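A rough sketch of the robust plane-fitting step, using scikit-learn's RANSAC regressor on synthetic disparity samples (my own stand-in; the paper does not necessarily use RANSAC or this parametrization):

```python
# Illustrative robust plane fit d = a*x + b*y + c to disparity samples of the hand region,
# using RANSAC in place of whatever robust estimator the paper actually employs.
import numpy as np
from sklearn.linear_model import RANSACRegressor

rng = np.random.default_rng(4)
x = rng.uniform(0, 100, 400)
y = rng.uniform(0, 100, 400)
d = 0.05 * x - 0.02 * y + 30 + rng.normal(0, 0.2, 400)    # synthetic hand disparity plane
d[:40] += rng.uniform(5, 15, 40)                           # outliers: background pixels

model = RANSACRegressor(residual_threshold=1.0)            # linear base estimator by default
model.fit(np.column_stack([x, y]), d)
a, b = model.estimator_.coef_
c = model.estimator_.intercept_
print(f"disparity plane: d = {a:.3f}x + {b:.3f}y + {c:.2f}")
```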

Journal ArticleDOI
TL;DR: A clustering-based algorithm for tracking body parts (e.g. hand, head, eyeball, body, and lips) that achieves automatic detection of and recovery from tracking failures is presented.
Abstract: In this paper, we present a clustering-based algorithm for tracking body parts (e.g. the hand, head, eyeball, body, and lips). Tracking people under complex environments is always a challenging task, because such targets often appear as concave objects or have apertures. In this case, many background areas are mixed into the tracking area, and they are difficult to remove by modifying the shape of the search area during tracking. Our method becomes a robust tracking algorithm by applying the following four key ideas simultaneously: 1) Using a 5D feature vector to describe both the geometric feature "(x,y)" and the color feature "(Y,U,V)" of each pixel uniformly. This description enables our method to follow both position and color changes simultaneously during tracking. 2) The algorithm realizes robust tracking of objects with apertures by classifying the pixels within the search area into "target" and "background" with the K-means clustering algorithm, using both "positive" and "negative" samples. 3) Using a variable ellipse model (a) to describe the shape of a non-rigid object (e.g. the hand) approximately, (b) to restrict the search area, and (c) to model the surrounding non-target background. This guarantees stable tracking of objects under various geometric transformations. 4) With both the "positive" and "negative" samples, our algorithm achieves automatic detection of and recovery from tracking failures. This ability makes our method distinctively more robust than conventional tracking algorithms. Through extensive experiments in various environments and conditions, the effectiveness and the efficiency of the proposed algorithm are confirmed.
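To make the 5-D pixel description concrete, here is a small scikit-learn sketch that clusters synthetic (x, y, Y, U, V) pixel vectors into target and background groups (the data, feature scaling and cluster count are illustrative assumptions, not the paper's exact procedure):

```python
# Sketch of the 5-D pixel description: each pixel in the search window becomes
# (x, y, Y, U, V) and K-means separates "target" from "background" pixels.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(5)
# Synthetic search window: a bright skin-like blob (target) on a darker background.
target_px = np.column_stack([rng.normal(40, 5, 300), rng.normal(40, 5, 300),
                             rng.normal(180, 10, 300), rng.normal(120, 5, 300),
                             rng.normal(150, 5, 300)])
backgr_px = np.column_stack([rng.uniform(0, 80, 300), rng.uniform(0, 80, 300),
                             rng.normal(60, 10, 300), rng.normal(128, 5, 300),
                             rng.normal(128, 5, 300)])
pixels = np.vstack([target_px, backgr_px])        # rows of (x, y, Y, U, V)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(pixels)
print(np.bincount(labels[:300]), np.bincount(labels[300:]))   # clusters ~ target vs background
```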

Journal ArticleDOI
TL;DR: The performance of combining Hough transform and Hidden Markov Models in a multifont Arabic OCR system is described and some promising experimental results are reported.
Abstract: Optical Character Recognition (OCR) has been an active subject of research since the early days of computers. Despite the age of the subject, it remains one of the most challenging and exciting areas of research in computer science. In recent years it has grown into a mature discipline, producing a huge body of work. Arabic has been one of the last major languages to receive attention in character recognition. This is due, in part, to the cursive nature of the task, since even printed Arabic characters are in cursive form. This paper describes the performance of combining the Hough transform and Hidden Markov Models in a multifont Arabic OCR system. Experimental tests have been carried out on a set of 85,000 character samples in 5 of the fonts most commonly used in Arabic writing. Some promising experimental results are reported.

Journal ArticleDOI
TL;DR: A technique for gait recognition from motion capture data, based on two successive stages of principal component analysis (PCA) on kinematic data, is proposed; despite the simple eigen-analysis approach, promising recognition performance is obtained.
Abstract: We propose a technique for gait recognition from motion capture data based on two successive stages of principal component analysis (PCA) on kinematic data. The first stage of PCA provides a low dimensional representation of gait. Components of this representation closely correspond to particular spatiotemporal features of gait that we have shown to be important for visual recognition of gait in a separate psychophysical study. A second stage of PCA captures the shape of the trajectory within the low dimensional space during a given gait cycle across different individuals or gaits. The projection space of the second stage of PCA has distinguishable clusters corresponding to the individual identity and type of gait. Despite the simple eigen-analysis based approach, promising recognition performance is obtained.
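A bare-bones sketch of two successive PCA stages on synthetic kinematic data (scikit-learn, with made-up dimensions and component counts; the paper's feature extraction and gait-cycle segmentation are not reproduced here):

```python
# Two successive PCA stages, sketched with synthetic joint-angle data:
# stage 1 compresses each frame's pose, stage 2 summarizes the trajectory of a gait cycle.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(6)
n_subjects, n_frames, n_joints = 10, 60, 24
data = rng.normal(size=(n_subjects, n_frames, n_joints))       # stand-in kinematic data

# Stage 1: low-dimensional pose representation, fit on all frames of all subjects.
pose_pca = PCA(n_components=4).fit(data.reshape(-1, n_joints))
low_dim = pose_pca.transform(data.reshape(-1, n_joints)).reshape(n_subjects, n_frames, 4)

# Stage 2: each subject's gait-cycle trajectory (flattened) is itself reduced by PCA.
trajectories = low_dim.reshape(n_subjects, -1)                 # one trajectory per subject
traj_pca = PCA(n_components=3).fit(trajectories)
print(traj_pca.transform(trajectories).shape)                  # (10, 3): points clustered by identity/gait
```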

Journal ArticleDOI
TL;DR: The proposed target detection method in low contrast forward looking infrared (FLIR) images is superior to the morphological method in terms of receiver operating characteristic (ROC) curve and average computation time.
Abstract: This paper proposes a new target detection method for low-contrast forward looking infrared (FLIR) images. Automatic detection of small targets in remotely sensed images is a difficult and challenging task. The goal is to find target locations with few false alarms in a thermal infrared scene of a battlefield. The targets of interest are military vehicles such as battle tanks and armored personnel carriers in ground-to-ground scenarios. The proposed method consists of the following three stages. First, a center-surround difference is proposed in order to find salient areas in an input image. Second, local thresholding for a region of interest (ROI) is proposed; the ROI is selected on the basis of a salient region found in the first step. Third, the shape of the extracted binary images is compared with binary target templates using size and affinity to remove clutter. In the experiments, the proposed method is compared with a morphology-based method using many natural infrared images with high variability. The results demonstrate that our method is superior to the morphological method in terms of receiver operating characteristic (ROC) curve and average computation time.
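A minimal NumPy/SciPy illustration of the center-surround difference as a saliency cue (the synthetic low-contrast frame and the filter sizes are my own assumptions; the paper's exact operator may differ):

```python
# Center-surround difference as a saliency cue: a small (center) and a large (surround)
# box filter are applied and their difference highlights small bright targets.
import numpy as np
from scipy.ndimage import uniform_filter

rng = np.random.default_rng(7)
frame = rng.normal(100, 2, size=(128, 128))          # synthetic low-contrast FLIR background
frame[60:64, 60:64] += 8                              # small warm target

center = uniform_filter(frame, size=3)
surround = uniform_filter(frame, size=15)
saliency = center - surround                          # high where a small region outshines its neighborhood

y, x = np.unravel_index(np.argmax(saliency), saliency.shape)
print("salient location:", (y, x))                    # should land on or near the 4x4 target
```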

Journal Article
TL;DR: It is argued that physical location data can be regarded as a type of presence information, and an architecture is proposed which reuses a large part of the IMS presence infrastructure by applying presence mechanisms, such as notification handling, access control and privacy management, to location data.
Abstract: The 3GPP IP Multimedia Subsystem (IMS) is currently expected to provide the basic architecture framework for the Next Generation Network, which will bridge the traditional divide between circuit-switched and packet-switched networks and consolidate both sides into one single network for all services. Therefore, the imminent commercial roll-out of IMS will have immense impact both for the migration of the core network and for the integration of future mobile services and applications. This paper presents an OpenSER-based experimental testbed which has been designed as a minimal standard-compliant IMS core network. We discuss major practical requirements and describe our implementation of this "IMS in a bottle" approach. Furthermore, we introduce a terminal-based native IMS location service enabler. We argue that physical location data can be regarded as a type of presence information and propose an architecture which reuses a large part of the IMS presence infrastructure by applying presence mechanisms, like notification handling, access control and privacy management, to location data. We demonstrate that the realization of this service can be integrated efficiently into the IMS core environment, and present initial evaluation results for the joint demonstrator. Finally, important current and future challenges, including migration, interworking, charging, Quality-of-Service, identity management, security, and regulatory aspects, are discussed in detail, ending up with an up-to-date research agenda.

Journal ArticleDOI
TL;DR: This paper presents an efficient, robust and fully automatic real-time system for 3D object pose tracking in image sequences that achieves fully automatic and robust object tracking in real-world situations, requiring very little manual intervention from the user.
Abstract: This paper presents an efficient, robust and fully automatic real-time system for 3D object pose tracking in image sequences. The developed application integrates two main components: an extended and optimized implementation of a state-of-the-art local curve fitting algorithm, and a robust global re-initialization module based on point feature matching. By combining the two main components with trajectory-coherence and image cross-correlation check modules, our system achieves fully automatic and robust object tracking in real-world situations, requiring very little manual intervention from the user. The developed application relies upon a few standard libraries available on most platforms, and runs at video frame rate on a PC with standard hardware equipment.

Journal ArticleDOI
TL;DR: The authors built a simple acquisition system composed of 5 standard cameras which can simultaneously take 5 views of a face at different angles, and chose an easily hardware-achievable algorithm, consisting of successive linear transformations, in order to compose a panoramic face from these 5 views.
Abstract: In this article, we present some development results of a system that performs mosaicing (or mosaicking) of panoramic faces. Our objective is to study the feasibility of panoramic face construction in real time. To do so, we built a simple acquisition system composed of 5 standard cameras which, together, can simultaneously take 5 views of a face at different angles. Then, we chose an easily hardware-achievable algorithm, consisting of successive linear transformations, in order to compose a panoramic face from these 5 views. The method has been tested on a relatively large number of faces. In order to validate our system of panoramic face mosaicing, we also conducted a preliminary study on panoramic face recognition, based on the principal component analysis method. Experimental results show the feasibility and viability of our system.

Journal ArticleDOI
TL;DR: SSIML/AR is introduced, a visual modeling language for the abstract specification of AR applications in general and AR user interfaces in particular that enables the seamless transition from the design level to the implementation level.
Abstract: Augmented Reality (AR) technologies open up new possibilities especially for task-focused domains such as assembly and maintenance. However, it can be noticed that there is still a lack of concepts and tools for a structured AR development process and an application specification above the code level. To address this problem we introduce SSIML/AR, a visual modeling language for the abstract specification of AR applications in general and AR user interfaces in particular. With SSIML/AR, three different aspects of AR user interfaces can be described: The user interface structure, the presentation of relevant information depending on the user’s current task and the integration of the user interface with other system components. Code skeletons can be generated automatically from SSIML/AR models. This enables the seamless transition from the design level to the implementation level. In addition, we sketch how SSIML/AR models can be integrated in an overall AR development process.

Journal ArticleDOI
TL;DR: The proposed hierarchical Kernel Associative Memory (KAM) face recognition scheme with a multiscale Gabor transform demonstrated strong robustness in recognizing faces under different conditions, particularly under occlusions, pose alterations and expression changes.
Abstract: Face recognition can be studied as an associative memory (AM) problem, and kernel-based AM models have been proven efficient. In this paper, a hierarchical Kernel Associative Memory (KAM) face recognition scheme with a multiscale Gabor transform is proposed. The pyramidal multiscale Gabor decomposition proposed by Nestares, Navarro, Portilla and Tabernero not only provides a very efficient implementation of the Gabor transform in the spatial domain, but also permits a fast reconstruction of images. In our method, face images of each person are first decomposed into their multiscale representations by a quasi-complete Gabor transform, which are then modelled by Kernel Associative Memories. In the recognition stage, a query face image is also represented by a Gabor multiresolution pyramid, and the reconstructions from different KAM models corresponding to even Gabor channels are then simply summed to give the recall. The recognition scheme was thoroughly tested using several benchmark face datasets, including the AR, UMIST, JAFFE and Yale A faces, which include different kinds of face variations arising from occlusions, pose, expression and illumination. The experimental results show that the proposed method demonstrates strong robustness in recognizing faces under different conditions, particularly under occlusions, pose alterations and expression changes.

Journal Article
TL;DR: The paper introduces the system DITIS, identifies the needs and challenges of co-ordinated teams of multidisciplinary healthcare professionals and discusses relevant computing models for their implementation.
Abstract: This paper presents an e-health mobile application, called DITIS, which supports networked collaboration for home healthcare. The system was originally developed with a view to address the difficulties of continuity of care and communication between the members of a home health care multidisciplinary team. The paper introduces the system DITIS, identifies the needs and challenges of co-ordinated teams of multidisciplinary healthcare professionals and discusses relevant computing models for their implementation. The adopted technology as well as the security needs and a multilayer security framework are briefly described. An evaluation study of the system is also briefly presented.

Journal ArticleDOI
TL;DR: In this paper, an efficient representation of hand motions by independent component analysis (ICA) is proposed for tracking and recognizing hand-finger gestures in an image sequence, where the ICA basis vectors represent local features, each of which corresponds to the motion of a particular finger.
Abstract: This paper introduces a new representation of hand motions for tracking and recognizing hand-finger gestures in an image sequence. A human hand has many joints (our hand model, for example, has 15), and this high dimensionality makes it difficult to model hand motions. To make things easier, it is important to represent a hand motion in a low-dimensional space. Principal component analysis (PCA) has been proposed to reduce the dimensionality. However, the PCA basis vectors only represent global features, which are not optimal for representing intrinsic features. This paper proposes an efficient representation of hand motions by independent component analysis (ICA). The ICA basis vectors represent local features, each of which corresponds to the motion of a particular finger. This representation is more efficient for modeling hand motions when tracking and recognizing hand-finger gestures in an image sequence. We demonstrate the effectiveness of the method by tracking a hand in real image sequences.
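The contrast between global PCA components and local ICA components can be sketched with scikit-learn's FastICA on synthetic joint-angle trajectories (the sources, mixing matrix and dimensions below are invented for illustration):

```python
# FastICA on synthetic joint-angle trajectories: independent components tend to isolate
# the motion of individual "fingers", unlike PCA's global components.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(8)
t = np.linspace(0, 10, 500)
# Two "finger" sources moving independently (stand-ins for real joint-angle sources).
sources = np.column_stack([np.sin(2 * t), np.sign(np.sin(3 * t))])
mixing = rng.normal(size=(2, 15))                  # 15 joint angles, mixed from the 2 sources
joint_angles = sources @ mixing + 0.05 * rng.normal(size=(500, 15))

ica = FastICA(n_components=2, random_state=0)
recovered = ica.fit_transform(joint_angles)        # each column ~ one finger's motion
print(recovered.shape)                             # (500, 2)
```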

Journal ArticleDOI
TL;DR: In this work, the weights given to landmarks are optimized, thereby improving the recognition rates for the two benchmarks used.
Abstract: A new method named Landmark Model Matching was recently proposed for fully automatic face recognition. It was inspired by Elastic Bunch Graph Matching and Active Shape Model. Landmark Model Matching consists of four phases: creation of the landmark distribution model, face finding, landmark finding, and recognition. A drawback in Landmark Model Matching is that, in the recognition phase, the weights given to different landmarks or facial feature points were determined experimentally. In this work, we optimized the weights given to landmarks, and thereby improved the recognition rates for the two benchmarks used.

Journal ArticleDOI
TL;DR: In BitVampire, participating peers help each other to get the desired media content, thus powerful servers/proxies are not necessary, which makes it a cost-effective approach.
Abstract: This paper presents a cost-effective peer-to-peer (P2P) architecture for large-scale on-demand media streaming, named BitVampire. BitVampire's primary design goal is to aggregate peers' storage and upstream bandwidth to facilitate on-demand media streaming. To achieve this goal, BitVampire splits published videos into segments and distributes them to different peers. When a peer (or a receiver) wants to watch a video, it (i) searches for the corresponding segments, then (ii) selfishly determines the best subset of supplying peers, and (iii) aggregates bandwidth from these peers to stream the media content. In BitVampire, participating peers help each other to get the desired media content, thus powerful servers/proxies are not necessary, which makes it a cost-effective approach. To demonstrate the effectiveness of BitVampire, we conducted extensive simulations on large, hierarchical, Internet-like topologies. We also implemented a functional prototype using Java and the Java Media Framework (JMF) to demonstrate the feasibility of BitVampire.

Journal ArticleDOI
TL;DR: This work presents a novel algorithm which breaks this restriction, allowing the registration of 3D scans of faces with arbitrary identity and expression; it can process incomplete data, yielding results which are both continuous and have low reconstruction error.
Abstract: The registration of 3D scans of faces is a key step for many applications, in particular for building 3D Morphable Models. Although a number of algorithms are already available for registering data with neutral expression, the registration of scans with arbitrary expressions is typically performed under the assumption of a known, fixed identity. We present a novel algorithm which breaks this restriction, allowing us to register 3D scans of faces with arbitrary identity and expression. Furthermore, our algorithm can process incomplete data, yielding results which are both continuous and have low reconstruction error. Even in the case of complete, expression-less data, our method can yield better results than previous algorithms, due to an adaptive smoothing which regularizes the resulting surface only where the estimated correspondence is unreliable.

Journal ArticleDOI
TL;DR: This work proposes a new method for head and hands detection that relies on geometric information from disparity maps, locally refined by color processing, and successfully found more than 97% of target features, with very few false positives.
Abstract: We address the need for robust detection of obstructed human features in complex environments, with a focus on intelligent surgical UIs. In our setup, real-time detection is used to find features without the help of local (spatial or temporal) information. Such a detector is used to validate, correct or reject the output of the visual feature tracking, which is locally more robust, but drifts over time. In Operating Rooms (OR), surgeons' faces and hands are typically obstructed by sterile clothing and tools, making statistical and/or feature-based feature detection approaches ineffective. We propose a new method for head and hands detection that relies on geometric information from disparity maps, locally refined by color processing. We have applied our method to a surgical mock-up scene, as well as to images gathered during real surgery. Running in a real-time, continuous detection loop, our detector successfully found more than 97% of target features, with very few false positives (less than 0.7%).

Journal ArticleDOI
TL;DR: Experiments with the standard bearings-only tracking problem indicate that the proposed new particle filter method is indeed very successful when observations are reliable and the advantage of the new filter grows with the increasing dimensionality of the system.
Abstract: In the low observation noise limit, particle filters become inefficient. In this paper a simple-to-implement particle filter is suggested as a solution to this well-known problem. The proposed Local Importance Sampling based particle filters draw the particles' positions in a two-step process that makes use of both the dynamics of the system and the most recent observation. Experiments with the standard bearings-only tracking problem indicate that the proposed new particle filter method is indeed very successful when observations are reliable. Experiments with a high-dimensional variant of this problem further show that the advantage of the new filter grows with the increasing dimensionality of the system.
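To illustrate the general idea of drawing particles using both the dynamics and the latest observation, here is a toy 1-D linear-Gaussian example using the optimal proposal (my own simplification; it is not the paper's Local Importance Sampling algorithm, nor its bearings-only setup):

```python
# Toy 1-D tracking with an observation-informed proposal (the optimal proposal of a
# linear-Gaussian random-walk model). Particles are drawn using both the dynamics and
# the latest, low-noise observation; this is only an illustration of the general idea.
import numpy as np

rng = np.random.default_rng(9)
q_var, r_var, n_particles, T = 1.0, 0.01, 200, 50   # low observation noise r_var

# Simulate a random-walk trajectory and noisy observations.
x_true = np.cumsum(rng.normal(0, np.sqrt(q_var), T))
y_obs = x_true + rng.normal(0, np.sqrt(r_var), T)

particles = np.zeros(n_particles)
estimates = []
for y in y_obs:
    pred = particles                                     # dynamics: random-walk mean
    # Draw around a compromise of prediction and observation (optimal proposal).
    post_var = 1.0 / (1.0 / q_var + 1.0 / r_var)
    post_mean = post_var * (pred / q_var + y / r_var)
    particles = rng.normal(post_mean, np.sqrt(post_var))
    # Importance weights for this proposal: p(y | parent), process noise marginalized out.
    w = np.exp(-0.5 * (y - pred) ** 2 / (q_var + r_var))
    w /= w.sum()
    estimates.append(np.sum(w * particles))
    particles = rng.choice(particles, size=n_particles, p=w)   # resample

print("RMSE:", np.sqrt(np.mean((np.array(estimates) - x_true) ** 2)))
```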

Journal Article
TL;DR: In this article, the authors discuss how to support peers with multimedia streaming service by using multiple contents peers and discuss a pair of flooding-based protocols, distributed and tree-based coordination protocols DCoP and TCoP, to synchronize multiple peers to reliably and efficiently deliver packets to a requesting peer.
Abstract: Multimedia contents are distributed to peers in various ways in peer-to-peer (P2P) overlay networks. A peer which holds a content, or even a part of it, can provide other peers with that content. Multimedia streaming is more significant for multimedia applications than the download-based delivery common in Internet applications. We discuss how to support peers with a multimedia streaming service by using multiple contents peers. In our distributed multi-source streaming model, a collection of multiple contents peers transmit packets of a multimedia content in parallel to a requesting leaf peer to realize reliability and scalability without any centralized controller. Even if some peer stops due to a fault or is degraded in performance, and packets are lost or delayed in the network, a requesting leaf peer receives all data of a content at the required rate. We discuss a pair of flooding-based protocols, the distributed and tree-based coordination protocols DCoP and TCoP, to synchronize multiple contents peers so that they reliably and efficiently deliver packets to a requesting peer. A peer can be redundantly selected by multiple peers in DCoP, but is taken by at most one peer in TCoP. We evaluate the coordination protocols DCoP and TCoP in terms of how long it takes and how many messages are transmitted to synchronize multiple contents peers.

Journal ArticleDOI
TL;DR: Two methods for automatic facial gesturing of graphically embodied animated agents are presented; one provides a virtual speaker capable of reading plain English text and rendering it as speech accompanied by the appropriate facial gestures.
Abstract: We present two methods for automatic facial gesturing of graphically embodied animated agents. In one case, a conversational agent is driven by speech in an automatic lip-sync process. By analyzing the speech input, lip movements are determined from the speech signal. The other method provides a virtual speaker capable of reading plain English text and rendering it as speech accompanied by the appropriate facial gestures. The proposed statistical model for generating the virtual speaker's facial gestures can also be applied as an addition to the lip synchronization process in order to obtain speech-driven facial gesturing. In this case, the statistical model is triggered by the prosody of the input speech instead of lexical analysis of the input text.