Perception-based design for tele-presence
27 Jun 2011 · pp. 154-159
TL;DR: A novel perception-driven approach to low-cost tele-presence systems that supports an immersive experience through continuity between the projected video and the conferencing room, using geometric and spectral correction to impart perceptual continuity to the whole scene.
Abstract: We present a novel perception-driven approach to low-cost tele-presence systems that supports an immersive experience through continuity between the projected video and the conferencing room. We use geometric and spectral correction to impart perceptual continuity to the whole scene. The geometric correction comes from a learning-based approach to identifying horizontal and vertical surfaces. Our method redraws the projected video to match its vanishing point with that of the conference room in which it is projected. We quantify intuitive concepts such as depth of field using a Gabor filter analysis of overall images of the conference room. We equalise spectral features across the projected video and the conference room, for spectral continuity between the two.
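The paper's exact pipeline (vanishing-point alignment, Gabor filter analysis, spectral equalisation) is not reproduced on this page; as a rough illustration only, the sketch below shows one plausible way to compute Gabor filter-bank statistics for a room image and to match the colour statistics of a projected frame to the room. It assumes OpenCV and NumPy; the function names, filter parameters and the mean/std matching step are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only, not the paper's method. Assumes OpenCV (cv2) and NumPy.
import cv2
import numpy as np

def gabor_energy(gray, scales=(7, 15, 31), orientations=8):
    """Mean Gabor response energy per (scale, orientation): a crude proxy for
    the kind of filter-bank statistics one could use to quantify
    depth-of-field-like blur/texture in an image of the conference room."""
    feats = []
    for ksize in scales:
        for i in range(orientations):
            theta = np.pi * i / orientations
            # kernel size, sigma, theta, lambda, gamma, psi (all assumed values)
            kern = cv2.getGaborKernel((ksize, ksize), ksize / 6.0, theta,
                                      ksize / 2.0, 0.5, 0).astype(np.float32)
            resp = cv2.filter2D(gray.astype(np.float32), cv2.CV_32F, kern)
            feats.append(np.abs(resp).mean())
    return np.array(feats)

def match_spectral_stats(video_frame, room_image):
    """Match per-channel mean/std of the projected frame to the room image,
    a simple stand-in for spectral equalisation between the two."""
    out = np.empty_like(video_frame, dtype=np.float32)
    for c in range(3):
        v = video_frame[..., c].astype(np.float32)
        r = room_image[..., c].astype(np.float32)
        out[..., c] = (v - v.mean()) / (v.std() + 1e-6) * r.std() + r.mean()
    return np.clip(out, 0, 255).astype(np.uint8)
```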
Citations
Dissertation • 01 Jun 2016
TL;DR: This thesis presents a signature-based framework for efficient visual-similarity-based video matching that addresses the problems of speed, scalability and genericness by encoding a given video shot into a single compact fixed-length signature.
Abstract: The quantity of digital video is huge, due to technological advances in video capture, storage and compression. However, the usefulness of these enormous volumes is limited by the effectiveness of content-based video retrieval (CBVR) systems, which still require time-consuming annotating/tagging to feed text-based search. Visual similarity is the core of these CBVR systems: videos are matched based on their respective visual features and how those features evolve across frames. It also acts as an essential foundation for inferring semantic similarity at a later stage, in combination with metadata. Furthermore, handling such amounts of video data, especially in the compressed domain, poses certain challenges for CBVR systems: speed, scalability and genericness. The situation is even more challenging given the availability of non-pixelated features, due to compression, e.g. DC/AC coefficients and motion vectors, which require sophisticated processing. Thus, careful feature selection is important to realise visual-similarity-based matching within the boundaries of these challenges. Matching speed is crucial, because most current research is biased towards accuracy and leaves speed lagging behind, which in many cases limits practical use. Scalability is the key to benefiting from the enormous volumes of available video. Genericness is essential for developing systems that are applicable to both compressed and uncompressed videos.
This thesis presents a signature-based framework for efficient visual-similarity-based video matching. The proposed framework represents a vital component of search and retrieval systems, where it could be used in three different ways: (1) directly, in CBVR systems, where a user submits a query video and the system retrieves a ranked list of visually similar ones; (2) in text-based video retrieval systems, e.g. YouTube, where a user submits a textual description and the system retrieves a ranked list of relevant videos. Retrieval in this case works by finding videos that were manually assigned similar textual descriptions (annotations). For this scenario, the framework could be used to enhance the annotation process by suggesting an annotation set for newly uploaded videos, derived from other visually similar videos retrieved by the proposed framework. In this way, the framework could make annotations more relevant to video content (compared to the manual way), which improves overall CBVR performance as well; (3) the top-N matched list obtained by the framework could be used as input to higher layers, e.g. semantic analysis, where it is easier to perform complex processing on a limited set of videos.
The proposed framework addresses the aforementioned problems, i.e. speed, scalability and genericness, by encoding a given video shot into a single compact fixed-length signature. This signature robustly encodes the shot contents for later speedy matching and retrieval tasks. This is in contrast with the current research trend of using exhaustive, complex features/descriptors, e.g. dense trajectories. Moreover, towards higher matching speed, the framework operates over a sequence of tiny images (DC-images) rather than full-size frames. This limits the need to fully decompress compressed videos, as the DC-images are extracted directly from the compressed stream. The DC-image is highly useful for complex processing, due to its small size compared to the full-size frame. In addition, it can be generated from uncompressed videos as well, with the proposed framework still applicable in the same manner (the genericness aspect). Furthermore, for robust capture of visual similarity, scene and motion information are extracted independently, to better address their different characteristics. Scene information is captured using a statistical representation of scene key-colour profiles, while motion information is captured using a graph-based structure. Both are then fused to generate an overall video signature. The signature's compact fixed-length form contributes to scalability, because compact fixed-length signatures are highly indexable entities, which facilitates retrieval over large-scale video data.
The proposed framework is adaptive and provides two different fixed-length video signatures. Both work in a speedy and accurate manner, but with different degrees of matching speed and retrieval accuracy. Such granularity of the signatures is useful to accommodate different applications' trade-offs between speed and accuracy. The proposed framework was extensively evaluated using black-box tests for the overall fused signatures and white-box tests for its individual components. The evaluation was done on multiple challenging large datasets against a diverse set of state-of-the-art baselines. The results, supported by the quantitative evaluation, demonstrate the promise of the proposed framework for supporting real-time applications.
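As a rough illustration of the kind of compact fixed-length shot signature the abstract above describes, the sketch below builds a signature from a sequence of tiny DC-images using a quantised colour histogram (a simple stand-in for the thesis's key-colour profiles) and coarse frame-difference statistics (a stand-in for its graph-based motion structure), then ranks a database by distance. It uses NumPy only; the function names, bin counts and distance measure are illustrative assumptions, not the thesis's actual method.

```python
# Minimal illustrative stand-in for a fixed-length shot signature over DC-images.
import numpy as np

def shot_signature(dc_images, colour_bins=8, motion_bins=16):
    """dc_images: array of tiny RGB frames, shape (T, H, W, 3), uint8 values 0-255."""
    frames = np.asarray(dc_images, dtype=np.float32)

    # Scene part: global colour histogram, quantised to colour_bins per channel.
    q = np.floor(frames / (256.0 / colour_bins)).astype(int)           # (T, H, W, 3)
    codes = q[..., 0] * colour_bins**2 + q[..., 1] * colour_bins + q[..., 2]
    scene = np.bincount(codes.ravel(), minlength=colour_bins**3).astype(np.float32)
    scene /= scene.sum() + 1e-9

    # Motion part: histogram of per-pixel temporal differences across the shot.
    diffs = np.abs(np.diff(frames.mean(axis=-1), axis=0))              # (T-1, H, W)
    motion, _ = np.histogram(diffs, bins=motion_bins, range=(0, 255), density=True)

    # Fuse scene and motion into one fixed-length vector.
    return np.concatenate([scene, motion.astype(np.float32)])

def match(query_sig, database_sigs):
    """Rank database shots by L2 distance to the query signature (smaller = more similar)."""
    d = np.linalg.norm(database_sigs - query_sig, axis=1)
    return np.argsort(d)
```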
1 citation
Cites background from "perception-based design for tele-pr..."
References
TL;DR: The performance of the spatial envelope model shows that specific information about object shape or identity is not a requirement for scene categorization and that modeling a holistic representation of the scene informs about its probable semantic category.
Abstract: In this paper, we propose a computational model of the recognition of real-world scenes that bypasses the segmentation and processing of individual objects or regions. The procedure is based on a very low-dimensional representation of the scene, which we term the Spatial Envelope. We propose a set of perceptual dimensions (naturalness, openness, roughness, expansion, ruggedness) that represent the dominant spatial structure of a scene. Then, we show that these dimensions may be reliably estimated using spectral and coarsely localized information. The model generates a multidimensional space in which scenes sharing membership in semantic categories (e.g., streets, highways, coasts) are projected close together. The performance of the spatial envelope model shows that specific information about object shape or identity is not a requirement for scene categorization, and that modeling a holistic representation of the scene informs about its probable semantic category.
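As an illustration of "spectral and coarsely localized information", the sketch below computes a crude, GIST-like holistic descriptor: log-Gabor-style orientation/scale energies in the frequency domain, averaged over a coarse spatial grid, with no segmentation or object detection. It is a toy approximation under assumed filter parameters, not the authors' reference implementation.

```python
# Rough holistic scene descriptor in the spirit of the spatial envelope model.
import numpy as np

def holistic_descriptor(gray, grid=4, scales=(4, 8, 16), orientations=4):
    """gray: 2D grayscale image. Returns a vector of length len(scales)*orientations*grid**2."""
    gray = gray.astype(np.float32)
    H, W = gray.shape
    F = np.fft.fft2(gray)
    fy = np.fft.fftfreq(H)[:, None]               # vertical frequencies (cycles/pixel)
    fx = np.fft.fftfreq(W)[None, :]               # horizontal frequencies
    radius = np.sqrt(fx**2 + fy**2) + 1e-9
    angle = np.arctan2(fy, fx)

    feats = []
    for s in scales:
        f0 = 1.0 / s                              # assumed centre frequency per scale
        band = np.exp(-(np.log(radius / f0) ** 2) / (2 * 0.55 ** 2))  # radial band-pass
        for o in range(orientations):
            theta = np.pi * o / orientations
            orient = np.cos(angle - theta) ** 2   # crude orientation selectivity
            resp = np.abs(np.fft.ifft2(F * band * orient))
            # Coarsely localise: average filter energy over a grid x grid partition.
            for gy in range(grid):
                for gx in range(grid):
                    cell = resp[gy * H // grid:(gy + 1) * H // grid,
                                gx * W // grid:(gx + 1) * W // grid]
                    feats.append(cell.mean())
    return np.array(feats)
```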
6,454 citations
"perception-based design for tele-pr..." refers background in this paper
Proceedings Article
TL;DR: Covers projective geometry, camera modelling and calibration, stereo vision, motion and object recognition, with appendices on constrained optimization and results from algebraic and differential geometry.
Abstract: Contents: projective geometry; modelling and calibrating cameras; edge detection; representing geometric primitives and their uncertainty; stereo vision; determining discrete motion from points and lines; tracking tokens over time; motion fields of curves; interpolating and approximating three-dimensional data; recognizing and locating objects and places; answers to problems. Appendices: constrained optimization; some results from algebraic geometry; differential geometry.
2,742 citations
Book • 02 Jun 1987
TL;DR: A complete production system is proposed which makes use of unified geometric models to be shared by CAD, CAM, and visual processing, and can solve a second difficulty in applying 3-dimensional computer vision to industry, namely the problem of the extensive programming effort.
Abstract: There are some difficulties in applying 3-dimensional computer vision to industry. One of them is that a vast amount of computation is required for low level processing. Some special hardware systems are described and one device is shown in more detail. Applications of the hardware are discussed in three examples. Two methods for range data acquisition employing special processors are described. Then our studies for range data processing are introduced. Finally, a complete production system is proposed which makes use of unified geometric models to be shared by CAD, CAM, and visual processing. This approach can solve a second difficulty in applying 3-dimensional computer vision to industry, namely the problem of the extensive programming effort that is required.
865 citations
"perception-based design for tele-pr..." refers background in this paper
TL;DR: An algebraic representation is developed which unifies the three types of measurement and permits a first order error propagation analysis to be performed, associating an uncertainty with each measurement.
Abstract: We describe how 3D affine measurements may be computed from a single perspective view of a scene, given only minimal geometric information determined from the image. This minimal information is typically the vanishing line of a reference plane and a vanishing point for a direction not parallel to the plane. It is shown that affine scene structure may then be determined from the image, without knowledge of the camera's internal calibration (e.g. focal length), nor of the explicit relation between camera and world (pose).
In particular, we show how to (i) compute the distance between planes parallel to the reference plane (up to a common scale factor); (ii) compute area and length ratios on any plane parallel to the reference plane; (iii) determine the camera's location. Simple geometric derivations are given for these results. We also develop an algebraic representation which unifies the three types of measurement and, amongst other advantages, permits a first-order error propagation analysis to be performed, associating an uncertainty with each measurement.
We demonstrate the technique for a variety of applications, including height measurements in forensic images and 3D graphical modelling from single images.
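As a minimal sketch of the height-measurement application described above, the code below uses the standard single-view metrology relation alpha * Z = -||b x t|| / ((l . b) ||v x t||) for homogeneous base/top image points b and t, vertical vanishing point v and reference-plane vanishing line l; the unknown scale alpha (and any sign convention) cancels when two heights are compared against a reference. NumPy only; the helper names are illustrative.

```python
# Sketch of a single-view height-ratio computation (all inputs homogeneous 3-vectors).
import numpy as np

def scaled_height(b, t, v, l):
    """Return alpha * Z for one object: b = base point on the reference plane,
    t = top point, v = vertical vanishing point, l = reference-plane vanishing line."""
    b, t, v, l = (np.asarray(x, dtype=float) for x in (b, t, v, l))
    return -np.linalg.norm(np.cross(b, t)) / (np.dot(l, b) * np.linalg.norm(np.cross(v, t)))

def height_ratio(b1, t1, b2, t2, v, l):
    """Ratio of object 1's height to object 2's; the scale alpha cancels."""
    return scaled_height(b1, t1, v, l) / scaled_height(b2, t2, v, l)

# Usage: if object 2 has known height h_ref (e.g. a measured reference in the scene),
# object 1's height is  height_ratio(b1, t1, b2, t2, v, l) * h_ref.
```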
721 citations
"perception-based design for tele-pr..." refers background or methods in this paper