Book Chapter DOI

Perception-Based Design for Tele-Presence

TL;DR: A novel perception-driven approach to low-cost tele-presence systems that supports an immersive sense of continuity between the projected video and the conferencing room, using geometric and spectral correction to impart perceptual continuity to the whole scene.
Abstract: We present a novel perception-driven approach to low-cost tele-presence systems that supports an immersive sense of continuity between the projected video and the conferencing room. We use geometric and spectral correction to impart perceptual continuity to the whole scene. The geometric correction comes from a learning-based approach to identifying horizontal and vertical surfaces. Our method redraws the projected video so that its vanishing point matches that of the conference room in which it is projected. We quantify intuitive concepts such as depth-of-field using a Gabor filter analysis of overall images of the conference room. We equalise spectral features across the projected video and the conference room, to achieve spectral continuity between the two.
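The abstract gives only the high-level idea of the spectral equalisation step. As a rough illustration (not the authors' actual method), per-channel histogram matching between a projected video frame and an image of the conference room could be sketched in Python as follows; `equalise_spectral_features`, `match_channel`, `video_frame` and `room_image` are hypothetical names introduced here:

```python
import numpy as np

def match_channel(source, reference, bins=256):
    """Map one channel's intensity distribution onto another's via
    cumulative-histogram (CDF) matching."""
    src_hist, _ = np.histogram(source, bins=bins, range=(0, 256))
    ref_hist, _ = np.histogram(reference, bins=bins, range=(0, 256))
    src_cdf = np.cumsum(src_hist) / max(source.size, 1)
    ref_cdf = np.cumsum(ref_hist) / max(reference.size, 1)
    # For each source level, pick the reference level whose CDF is closest.
    levels = np.linspace(0, 255, bins)
    mapping = np.interp(src_cdf, ref_cdf, levels)
    idx = (np.clip(source, 0, 255).astype(np.int64) * bins) // 256
    return mapping[idx].astype(source.dtype)

def equalise_spectral_features(video_frame, room_image):
    """Hypothetical sketch: match each colour channel of the projected
    video frame to the corresponding channel of the room image."""
    out = np.empty_like(video_frame)
    for c in range(video_frame.shape[2]):
        out[..., c] = match_channel(video_frame[..., c], room_image[..., c])
    return out
```

This is one simple way to equalise colour statistics between two images; the paper's own spectral features and correction pipeline may differ.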


Citations
Dissertation


01 Jun 2016
TL;DR: This thesis presents a signature-based framework for efficient visual-similarity-based video matching that addresses speed, scalability and genericness by encoding a given video shot into a single compact fixed-length signature.
Abstract: The quantity of digital videos is huge, owing to technological advances in video capture, storage and compression. However, the usefulness of these enormous volumes is limited by the effectiveness of content-based video retrieval (CBVR) systems, which still require time-consuming annotation/tagging to feed text-based search. Visual similarity is the core of these CBVR systems, where videos are matched based on their respective visual features and their evolution across video frames. It also acts as an essential foundational layer for inferring semantic similarity at a later stage, in combination with metadata. Furthermore, handling such amounts of video data, especially in the compressed domain, poses particular challenges for CBVR systems: speed, scalability and genericness. The situation is even more challenging given that compression yields non-pixel features, e.g. DC/AC coefficients and motion vectors, which require sophisticated processing. Careful feature selection is therefore important to realise visual-similarity-based matching within the boundaries of these challenges. Matching speed is crucial, because most current research is biased towards accuracy and leaves speed lagging behind, which in many cases limits practical use. Scalability is the key to benefiting from the enormous volumes of available video. Genericness is essential for developing systems applicable to both compressed and uncompressed videos.

This thesis presents a signature-based framework for efficient visual-similarity-based video matching. The proposed framework represents a vital component for search and retrieval systems, where it could be used in three different ways. (1) Directly, for CBVR systems where a user submits a query video and the system retrieves a ranked list of visually similar ones. (2) For text-based video retrieval systems, e.g. YouTube, where a user submits a textual description and the system retrieves a ranked list of relevant videos; retrieval in this case works by finding videos that were manually assigned similar textual descriptions (annotations). For this scenario, the framework could be used to enhance the annotation process by suggesting an annotation set for newly uploaded videos, derived from other visually similar videos retrieved by the framework. In this way, the framework can make annotations more relevant to video content than purely manual annotation, which improves the performance of the overall CBVR system as well. (3) The top-N matched list obtained by the framework could be used as input to higher layers, e.g. semantic analysis, where it is easier to perform complex processing on a limited set of videos.

The proposed framework addresses the aforementioned problems, i.e. speed, scalability and genericness, by encoding a given video shot into a single compact fixed-length signature. This signature robustly encodes the shot contents for later speedy matching and retrieval, in contrast with the current research trend of using exhaustive, complex features/descriptors, e.g. dense trajectories. Moreover, towards higher matching speed, the framework operates over a sequence of tiny images (DC-images) rather than full-size frames. This limits the need to fully decompress compressed videos, as the DC-images are extracted directly from the compressed stream. The DC-image is highly useful for complex processing, due to its small size compared to the full-size frame. In addition, it can be generated from uncompressed videos as well, with the proposed framework still applicable in the same manner (the genericness aspect). Furthermore, for robust capture of visual similarity, scene and motion information are extracted independently, to better address their different characteristics. Scene information is captured using a statistical representation of the profiles of the scene's key colours, while motion information is captured using a graph-based structure. Both sources of information are then fused to generate an overall video signature. The signature's compact fixed-length form contributes to scalability, because compact fixed-length signatures are highly indexable entities, which facilitates retrieval over large-scale video data. The proposed framework is adaptive and provides two different fixed-length video signatures. Both work in a speedy and accurate manner, but with different trade-offs between matching speed and retrieval accuracy. Such granularity of the signatures is useful for accommodating different applications' trade-offs between speed and accuracy. The proposed framework was extensively evaluated using black-box tests for the overall fused signatures and white-box tests for its individual components. The evaluation was done on multiple challenging large datasets against a diverse set of state-of-the-art baselines. The results, supported by the quantitative evaluation, demonstrate the promise of the proposed framework for supporting real-time applications.
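The abstract describes the scene part of the signature only at a high level. Purely as an illustrative sketch (not the thesis' actual algorithm), a compact fixed-length colour-profile signature over a shot's DC-images might look like the following in Python; `build_scene_signature`, `signature_distance` and `dc_images` are hypothetical names:

```python
import numpy as np

def build_scene_signature(dc_images, bins_per_channel=4):
    """Hypothetical sketch: summarise a shot's DC-images with a single
    fixed-length colour histogram (not the thesis' actual method).

    dc_images: iterable of small HxWx3 uint8 arrays extracted from the
    compressed stream (or downsampled from uncompressed frames).
    """
    signature = np.zeros(bins_per_channel ** 3, dtype=np.float64)
    for img in dc_images:
        # Quantise each colour channel into a few coarse bins.
        q = (img.astype(np.int32) * bins_per_channel) // 256
        idx = (q[..., 0] * bins_per_channel + q[..., 1]) * bins_per_channel + q[..., 2]
        signature += np.bincount(idx.ravel(), minlength=bins_per_channel ** 3)
    total = signature.sum()
    return signature / total if total > 0 else signature

def signature_distance(sig_a, sig_b):
    """Fixed-length signatures can be compared with any vector distance, e.g. L1."""
    return float(np.abs(sig_a - sig_b).sum())
```

The fixed length of the resulting vector is what makes such signatures easy to index at scale, which is the property the thesis emphasises.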

1 citation


Cites background from "Perception-Based Design for Tele-Presence"


References
Journal Article DOI


TL;DR: The performance of the spatial envelope model shows that specific information about object shape or identity is not a requirement for scene categorization and that modeling a holistic representation of the scene informs about its probable semantic category.
Abstract: In this paper, we propose a computational model of the recognition of real-world scenes that bypasses the segmentation and the processing of individual objects or regions. The procedure is based on a very low-dimensional representation of the scene, which we term the Spatial Envelope. We propose a set of perceptual dimensions (naturalness, openness, roughness, expansion, ruggedness) that represent the dominant spatial structure of a scene. We then show that these dimensions may be reliably estimated using spectral and coarsely localized information. The model generates a multidimensional space in which scenes sharing membership in semantic categories (e.g., streets, highways, coasts) are projected close together. The performance of the spatial envelope model shows that specific information about object shape or identity is not a requirement for scene categorization, and that modeling a holistic representation of the scene is informative about its probable semantic category.
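The spatial envelope dimensions are estimated from spectral, coarsely localised information. As a loose GIST-style illustration only (not the descriptor from the cited paper), the sketch below pools the magnitude of Gabor filter responses over a coarse spatial grid; `gist_like_descriptor` and `gabor_kernel` are hypothetical names:

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(size, wavelength, theta, sigma):
    """Real-valued Gabor kernel: cosine carrier under a Gaussian envelope."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_t = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    carrier = np.cos(2.0 * np.pi * x_t / wavelength)
    return envelope * carrier

def gist_like_descriptor(gray, n_orientations=4, wavelengths=(4, 8, 16), grid=4):
    """Hypothetical sketch: average Gabor response magnitude over a coarse
    grid of image blocks and concatenate into one holistic feature vector."""
    h, w = gray.shape
    features = []
    for wl in wavelengths:
        for k in range(n_orientations):
            theta = k * np.pi / n_orientations
            kernel = gabor_kernel(31, wl, theta, sigma=0.5 * wl)
            response = np.abs(fftconvolve(gray, kernel, mode="same"))
            # Coarse spatial pooling: mean response in each grid cell.
            for i in range(grid):
                for j in range(grid):
                    block = response[i * h // grid:(i + 1) * h // grid,
                                     j * w // grid:(j + 1) * w // grid]
                    features.append(block.mean())
    return np.asarray(features)
```

Pooling on a coarse grid is what keeps the representation "coarsely localized": it discards object identity while retaining the dominant spatial structure of the scene.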

6,454 citations


"perception-based design for tele-pr..." refers background in this paper


Proceedings Article


31 Dec 1993
TL;DR: A textbook treatment of three-dimensional computer vision, covering projective geometry, camera calibration, stereo, motion, and object recognition, with appendices on constrained optimization, algebraic geometry, and differential geometry.
Abstract: Projective geometry; modelling and calibrating cameras; edge detection; representing geometric primitives and their uncertainty; stereo vision; determining discrete motion from points and lines; tracking tokens over time; motion fields of curves; interpolating and approximating three-dimensional data; recognizing and locating objects and places; answers to problems. Appendices: constrained optimization; some results from algebraic geometry; differential geometry.

2,742 citations

Book


19 Nov 1993

1,259 citations

Book


02 Jun 1987
TL;DR: A complete production system is proposed that shares unified geometric models across CAD, CAM, and visual processing, addressing the extensive programming effort otherwise required to apply 3-dimensional computer vision in industry.
Abstract: There are some difficulties in applying 3-dimensional computer vision to industry. One of them is that a vast amount of computation is required for low-level processing. Some special hardware systems are described and one device is shown in more detail. Applications of the hardware are discussed in three examples. Two methods for range data acquisition employing special processors are described, and our studies of range data processing are then introduced. Finally, a complete production system is proposed which makes use of unified geometric models shared by CAD, CAM, and visual processing. This approach can solve a second difficulty in applying 3-dimensional computer vision to industry, namely the extensive programming effort that is required.

865 citations


"perception-based design for tele-pr..." refers background in this paper


Journal Article DOI


TL;DR: An algebraic representation is developed which unifies the three types of measurement and permits a first order error propagation analysis to be performed, associating an uncertainty with each measurement.
Abstract: We describe how 3D affine measurements may be computed from a single perspective view of a scene, given only minimal geometric information determined from the image. This minimal information is typically the vanishing line of a reference plane and a vanishing point for a direction not parallel to the plane. It is shown that affine scene structure may then be determined from the image, without knowledge of the camera's internal calibration (e.g. focal length), nor of the explicit relation between camera and world (pose). In particular, we show how to (i) compute the distance between planes parallel to the reference plane (up to a common scale factor); (ii) compute area and length ratios on any plane parallel to the reference plane; (iii) determine the camera's location. Simple geometric derivations are given for these results. We also develop an algebraic representation which unifies the three types of measurement and, amongst other advantages, permits a first-order error propagation analysis to be performed, associating an uncertainty with each measurement. We demonstrate the technique for a variety of applications, including height measurements in forensic images and 3D graphical modelling from single images.
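As a hedged sketch of the kind of measurement this reference unifies, the ratio of two heights measured from the reference plane can be computed from their base and top image points, the reference plane's vanishing line and the vanishing point of the measurement direction. The construction below follows the standard single-view-metrology formulation; the function and variable names are assumptions, not the paper's notation:

```python
import numpy as np

def height_ratio(b, t, br, tr, v, l):
    """Sketch: ratio of two object heights from a single image.

    All points are homogeneous 2D image points (3-vectors); l is the
    vanishing line of the reference plane and v the vanishing point of the
    measurement (e.g. vertical) direction.

    b, t   : base and top image points of the unknown-height object
    br, tr : base and top image points of a reference object of known height
    Returns (unknown height) / (reference height).
    """
    def scaled_height(base, top):
        # Proportional to the distance of `top`'s plane from the reference
        # plane; the unknown common scale cancels in the ratio below.
        return -np.linalg.norm(np.cross(base, top)) / (
            np.dot(l, base) * np.linalg.norm(np.cross(v, top)))
    return scaled_height(b, t) / scaled_height(br, tr)

# Usage idea: estimate a person's height from a doorway of known height,
# e.g. person_height = height_ratio(b_p, t_p, b_d, t_d, v, l) * door_height
```

Taking a ratio against a known reference length is what removes the common scale factor mentioned in the abstract.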

721 citations


"perception-based design for tele-pr..." refers background or methods in this paper
