
Showing papers on "Orientation (computer vision) published in 2006"


Journal ArticleDOI
TL;DR: In this paper, the main problems and the available solutions for the generation of 3D models from terrestrial images are addressed, and the full pipeline is presented for 3D modelling from terrestrial image data, considering the different approaches and analyzing all the steps involved.
Abstract: In this paper the main problems and the available solutions are addressed for the generation of 3D models from terrestrial images. Close range photogrammetry has dealt for many years with manual or automatic image measurements for precise 3D modelling. Nowadays 3D scanners are also becoming a standard source for input data in many application areas, but image-based modelling still remains the most complete, economical, portable, flexible and widely used approach. In this paper the full pipeline is presented for 3D modelling from terrestrial image data, considering the different approaches and analysing all the steps involved.

848 citations


Journal ArticleDOI
TL;DR: An accurate and fast method for fiber orientation mapping using multidirectional diffusion-weighted magnetic resonance (MR) data using the Fourier transform relationship between the water displacement probabilities and diffusion-attenuated MR signal expressed in spherical coordinates is described.

432 citations


Dissertation
17 Jul 2006
TL;DR: This thesis introduces grids of locally normalised Histograms of Oriented Gradients (HOG) as descriptors for object detection in static images and proposes descriptors based on oriented histograms of differential optical flow to detect moving humans in videos.
Abstract: This thesis targets the detection of humans and other object classes in images and videos. Our focus is on developing robust feature extraction algorithms that encode image regions as high-dimensional feature vectors that support high accuracy object/non-object decisions. To test our feature sets we adopt a relatively simple learning framework that uses linear Support Vector Machines to classify each possible image region as an object or as a non-object. The approach is data-driven and purely bottom-up, using low-level appearance and motion vectors to detect objects. As a test case we focus on person detection, as people are one of the most challenging object classes with many applications, for example in film and video analysis, pedestrian detection for smart cars and video surveillance. Nevertheless we do not make any strong class-specific assumptions and the resulting object detection framework also gives state-of-the-art performance for many other classes including cars, motorbikes, cows and sheep. This thesis makes four main contributions. Firstly, we introduce grids of locally normalised Histograms of Oriented Gradients (HOG) as descriptors for object detection in static images. The HOG descriptors are computed over dense and overlapping grids of spatial blocks, with image gradient orientation features extracted at fixed resolution and gathered into a high-dimensional feature vector. They are designed to be robust to small changes in image contour locations and directions, and significant changes in image illumination and colour, while remaining highly discriminative for overall visual form. We show that unsmoothed gradients, fine orientation voting, moderately coarse spatial binning, strong normalisation and overlapping blocks are all needed for good performance. Secondly, to detect moving humans in videos, we propose descriptors based on oriented histograms of differential optical flow. These are similar to static HOG descriptors, but instead of image gradients, they are based on local differentials of dense optical flow. They encode the noisy optical flow estimates into robust feature vectors in a manner that is robust to the overall camera motion. Several variants are proposed, some capturing motion boundaries while others encode the relative motions of adjacent image regions. Thirdly, we propose a general method based on kernel density estimation for fusing multiple overlapping detections, which takes into account the number of detections, their confidence scores and the scales of the detections. Lastly, we present work in progress on a parts-based approach to person detection that first detects local body parts like heads, torsos, and legs and then fuses them to create a global overall person detector.

340 citations
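For readers who want the gist in code, here is a minimal numpy sketch of the dense-HOG idea from the abstract above: unsmoothed gradients and fine, magnitude-weighted orientation voting into per-cell histograms. The bin count and cell size are illustrative assumptions, and block grouping with contrast normalisation is omitted.

```python
import numpy as np

def hog_cell_histograms(gray, n_bins=9, cell=8):
    """Unsigned gradient-orientation histograms over non-overlapping cells.

    A minimal sketch of the HOG idea: unsmoothed centred gradients and
    fine orientation voting (bilinear into bins, weighted by magnitude).
    Block grouping and contrast normalisation are omitted for brevity.
    """
    gx = np.zeros_like(gray); gy = np.zeros_like(gray)
    gx[:, 1:-1] = gray[:, 2:] - gray[:, :-2]   # centred [-1, 0, 1] derivative
    gy[1:-1, :] = gray[2:, :] - gray[:-2, :]
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0   # unsigned orientation

    h, w = gray.shape
    cy, cx = h // cell, w // cell
    hist = np.zeros((cy, cx, n_bins))
    bin_w = 180.0 / n_bins
    for i in range(cy * cell):
        for j in range(cx * cell):
            b = ang[i, j] / bin_w - 0.5
            b0 = int(np.floor(b)) % n_bins     # circular bilinear voting
            b1 = (b0 + 1) % n_bins
            f = b - np.floor(b)
            hist[i // cell, j // cell, b0] += (1 - f) * mag[i, j]
            hist[i // cell, j // cell, b1] += f * mag[i, j]
    return hist

# Toy usage on a random image patch.
feat = hog_cell_histograms(np.random.rand(64, 64))
print(feat.shape)  # (8, 8, 9)
```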


Book ChapterDOI
13 Jan 2006
TL;DR: In this paper, efficient data structures are proposed for both the detector and the descriptor: interest region detection is accelerated with an integral image for scale space computation, and the orientation-histogram-based descriptor with an integral orientation histogram.
Abstract: We propose a considerably faster approximation of the well known SIFT method. The main idea is to use efficient data structures for both the detector and the descriptor. The detection of interest regions is considerably sped up by using an integral image for scale space computation. The descriptor, which is based on orientation histograms, is accelerated by the use of an integral orientation histogram. We present an analysis of the computational costs comparing both parts of our approach to the conventional method. Extensive experiments show a speed-up by a factor of eight while the matching and repeatability performance is decreased only slightly.

228 citations
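To make the descriptor speed-up concrete, the following sketch builds an integral orientation histogram: one cumulative-sum plane per orientation bin, so the histogram of any axis-aligned window costs four lookups per bin. The bin count and gradient operator are assumptions for illustration, not the paper's exact choices.

```python
import numpy as np

def integral_orientation_histogram(gray, n_bins=8):
    """Build one integral image per orientation bin, so the histogram of
    any axis-aligned rectangle is obtained in O(n_bins) lookups."""
    gx = np.gradient(gray, axis=1)
    gy = np.gradient(gray, axis=0)
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % (2 * np.pi)
    bins = np.minimum((ang / (2 * np.pi) * n_bins).astype(int), n_bins - 1)

    h, w = gray.shape
    ii = np.zeros((h + 1, w + 1, n_bins))
    for b in range(n_bins):
        layer = np.where(bins == b, mag, 0.0)   # magnitude mass in this bin
        ii[1:, 1:, b] = layer.cumsum(0).cumsum(1)
    return ii

def rect_histogram(ii, y0, x0, y1, x1):
    """Orientation histogram of rows y0:y1, cols x0:x1 via 4 corner lookups."""
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]

img = np.random.rand(128, 128)
ii = integral_orientation_histogram(img)
print(rect_histogram(ii, 10, 10, 42, 42))   # 8-bin histogram of a 32x32 window
```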


Proceedings ArticleDOI
14 May 2006
TL;DR: A new correlation based method for matching two images with large camera motion based on the rotation and scale invariant normalized cross-correlation, which is effective for matching image pairs with significant rotation and scale changes.
Abstract: Correlation is widely used as an effective similarity measure in matching tasks. However, traditional correlation based matching methods are limited to the short baseline case. In this paper we propose a new correlation based method for matching two images with large camera motion. Our method is based on the rotation and scale invariant normalized cross-correlation. Both the size and the orientation of the correlation windows are determined according to the characteristic scale and the dominant direction of the interest points. Experimental results on real images demonstrate that the new method is effective for matching image pairs with significant rotation and scale changes as well as other common imaging conditions.

201 citations
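A minimal sketch of the core idea, assuming interest points already carry a characteristic scale and dominant direction (the values below are hypothetical): both windows are resampled into a common canonical frame before standard NCC is computed. Nearest-neighbour sampling is used for brevity where a real implementation would interpolate.

```python
import numpy as np

def canonical_patch(img, y, x, scale, theta, half=8):
    """Resample a window around (y, x), rotated by theta and scaled, so that
    patches from both images live in a common canonical frame."""
    c, s = np.cos(theta), np.sin(theta)
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1].astype(float) * scale
    sy = y + (s * xs + c * ys)
    sx = x + (c * xs - s * ys)
    sy = np.clip(np.round(sy).astype(int), 0, img.shape[0] - 1)  # nearest neighbour
    sx = np.clip(np.round(sx).astype(int), 0, img.shape[1] - 1)
    return img[sy, sx]

def ncc(a, b):
    """Normalized cross-correlation of two equally-sized patches."""
    a = a - a.mean(); b = b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Hypothetical usage: interest points with (scale, dominant orientation).
img1, img2 = np.random.rand(100, 100), np.random.rand(100, 100)
p1 = canonical_patch(img1, 50, 50, scale=1.0, theta=0.3)
p2 = canonical_patch(img2, 48, 52, scale=1.5, theta=1.1)
print(ncc(p1, p2))
```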


Dissertation
01 Jan 2006
TL;DR: The approach couples topic models originally developed for text analysis with spatial transformations, and thus consistently accounts for geometric constraints; by building integrated scene models, it may discover contextual relationships and better exploit partially labeled training images.
Abstract: We develop statistical methods which allow effective visual detection, categorization, and tracking of objects in complex scenes. Such computer vision systems must be robust to wide variations in object appearance, the often small size of training databases, and ambiguities induced by articulated or partially occluded objects. Graphical models provide a powerful framework for encoding the statistical structure of visual scenes, and developing corresponding learning and inference algorithms. In this thesis, we describe several models which integrate graphical representations with nonparametric statistical methods. This approach leads to inference algorithms which tractably recover high-dimensional, continuous object pose variations, and learning procedures which transfer knowledge among related recognition tasks. Motivated by visual tracking problems, we first develop a nonparametric extension of the belief propagation (BP) algorithm. Using Monte Carlo methods, we provide general procedures for recursively updating particle-based approximations of continuous sufficient statistics. Efficient multiscale sampling methods then allow this nonparametric BP algorithm to be flexibly adapted to many different applications. As a particular example, we consider a graphical model describing the hand's three-dimensional (3D) structure, kinematics, and dynamics. This graph encodes global hand pose via the 3D position and orientation of several rigid components, and thus exposes local structure in a high-dimensional articulated model. Applying nonparametric BP, we recover a hand tracking algorithm which is robust to outliers and local visual ambiguities. Via a set of latent occupancy masks, we also extend our approach to consistently infer occlusion events in a distributed fashion. In the second half of this thesis, we develop methods for learning hierarchical models of objects, the parts composing them, and the scenes surrounding them. Our approach couples topic models originally developed for text analysis with spatial transformations, and thus consistently accounts for geometric constraints. By building integrated scene models, we may discover contextual relationships, and better exploit partially labeled training images. We first consider images of isolated objects, and show that sharing parts among object categories improves accuracy when learning from few examples. Turning to multiple object scenes, we propose nonparametric models which use Dirichlet processes to automatically learn the number of parts underlying each object category, and objects composing each scene. Adapting these transformed Dirichlet processes to images taken with a binocular stereo camera, we learn integrated, 3D models of object geometry and appearance. This leads to a Monte Carlo algorithm which automatically infers 3D scene structure from the predictable geometry of known object categories.

200 citations
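The particle-based message update at the heart of nonparametric BP can be caricatured in a few lines. This toy version, with an assumed Gaussian pairwise model, resamples source particles by weight and propagates each through the pairwise potential; the thesis's multiscale samplers and products of messages are far more elaborate.

```python
import numpy as np

rng = np.random.default_rng(0)

def nbp_message(particles, weights, pairwise_sampler, n_out=100):
    """One particle-based BP message update: resample source particles in
    proportion to their weights, then push each through the pairwise model."""
    p = weights / weights.sum()
    idx = rng.choice(len(particles), size=n_out, p=p)
    return np.array([pairwise_sampler(particles[i]) for i in idx])

# Toy chain model: x_t ~ N(x_{t-1}, 0.1^2), uniform incoming weights.
parts = rng.normal(0.0, 1.0, size=200)
msg = nbp_message(parts, np.ones(200), lambda x: rng.normal(x, 0.1))
print(msg.mean(), msg.std())
```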


Proceedings ArticleDOI
26 Jun 2006
TL;DR: This work presents a new volumetric method for reconstructing watertight triangle meshes from arbitrary, unoriented point clouds that efficiently produces solid models of low genus even for noisy and highly irregular data containing large holes, without losing fine details in densely sampled regions.
Abstract: We present a new volumetric method for reconstructing watertight triangle meshes from arbitrary, unoriented point clouds. While previous techniques usually reconstruct surfaces as the zero level-set of a signed distance function, our method uses an unsigned distance function and hence does not require any information about the local surface orientation. Our algorithm estimates local surface confidence values within a dilated crust around the input samples. The surface which maximizes the global confidence is then extracted by computing the minimum cut of a weighted spatial graph structure. We present an algorithm which efficiently converts this cut into a closed, manifold triangle mesh with a minimal number of vertices. The use of an unsigned distance function avoids the topological noise artifacts caused by misalignment of 3D scans, which are common to most volumetric reconstruction techniques. Due to a hierarchical approach our method efficiently produces solid models of low genus even for noisy and highly irregular data containing large holes, without losing fine details in densely sampled regions. We show several examples for different application settings such as model generation from raw laser-scanned data, image-based 3D reconstruction, and mesh repair.

197 citations
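As a hedged illustration of the first stage only, this sketch samples an unsigned distance function on a voxel grid and marks the dilated crust of voxels near the input samples; grid resolution and crust width are arbitrary assumptions, and the confidence weighting and min-cut mesh extraction are not shown.

```python
import numpy as np
from scipy.spatial import cKDTree

def unsigned_distance_grid(points, res=64, crust_width=2.0):
    """Sample the unsigned distance to the nearest input point on a regular
    grid and mark a dilated 'crust' of voxels near the samples."""
    lo, hi = points.min(0), points.max(0)
    axes = [np.linspace(lo[d], hi[d], res) for d in range(3)]
    grid = np.stack(np.meshgrid(*axes, indexing='ij'), axis=-1).reshape(-1, 3)
    dist, _ = cKDTree(points).query(grid)   # no sign: orientation-free
    dist = dist.reshape(res, res, res)
    voxel = (hi - lo).max() / (res - 1)
    crust = dist < crust_width * voxel      # cells whose confidence would be evaluated
    return dist, crust

pts = np.random.rand(5000, 3)              # stand-in for a raw point cloud
dist, crust = unsigned_distance_grid(pts)
print(crust.mean())                        # fraction of voxels in the crust
```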


Journal ArticleDOI
TL;DR: 3D displays can be very effective for approximate navigation and relative positioning when appropriate cues, such as shadows, are present, but are not effective for precise navigation and positioning except possibly in specific circumstances, for instance, when good viewing angles or measurement tools are available.
Abstract: We describe a series of experiments that compare 2D displays, 3D displays, and combined 2D/3D displays (orientation icon, ExoVis, and clip planes) for relative position estimation, orientation, and volume of interest tasks. Our results indicate that 3D displays can be very effective for approximate navigation and relative positioning when appropriate cues, such as shadows, are present. However, 3D displays are not effective for precise navigation and positioning except possibly in specific circumstances, for instance, when good viewing angles or measurement tools are available. For precise tasks in other situations, orientation icon and ExoVis displays were better than strict 2D or 3D displays (displays consisting exclusively of 2D or 3D views). The combined displays had as good or better performance, inspired higher confidence, and allowed natural, integrated navigation. Clip plane displays were not effective for 3D orientation because users could not easily view more than one 2D slice at a time and had to frequently change the visibility of individual slices. Major factors contributing to display preference and usability were task characteristics, orientation cues, occlusion, and spatial proximity of views that were used together.

159 citations


Proceedings ArticleDOI
01 Jan 2006
TL;DR: In this article, a vision-based estimation and control of a quadrotor vehicle using a single camera relative to a novel target that incorporates the use of moire patterns is presented.
Abstract: We present the vision-based estimation and control of a quadrotor vehicle using a single camera relative to a novel target that incorporates the use of moire patterns. The objective is to acquire the six degree of freedom estimation that is essential for the operation of vehicles in close proximity to other craft and landing platforms. A target contains markers to determine its relative orientation and locate two sets of orthogonal moire patterns at two different frequencies. A camera is mounted on the vehicle with the target in the field of view. An algorithm processes the images, extracting the attitude and position information of the camera relative to the target utilizing geometry and four single-point discrete Fourier transforms on the moire patterns. The position and yaw estimations with accompanying control techniques have been implemented on a remote-controlled quadrotor. The flight tests conducted prove the system's feasibility as an option for precise relative navigation for indoor and outdoor operations.

147 citations
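The phase-based measurement can be illustrated with a toy 1-D version of the single-point discrete Fourier transform step: the shift of a moire fringe profile is recovered from the phase difference at its known spatial frequency. The frequency and profiles below are synthetic assumptions.

```python
import numpy as np

def point_dft(signal, freq):
    """Single-point DFT: complex response of a 1-D intensity profile at one
    known moire frequency (in cycles per sample)."""
    n = np.arange(len(signal))
    return np.sum(signal * np.exp(-2j * np.pi * freq * n))

def displacement_from_phase(ref, obs, freq):
    """Shift of 'obs' relative to 'ref', from the phase difference of the
    two single-point DFTs at the moire frequency."""
    dphi = np.angle(point_dft(obs, freq) / point_dft(ref, freq))
    return -dphi / (2 * np.pi * freq)

# Synthetic fringe profile shifted by 3 samples (16 full cycles in 256 samples).
x = np.arange(256)
f = 1.0 / 16.0
ref = 1 + np.cos(2 * np.pi * f * x)
obs = 1 + np.cos(2 * np.pi * f * (x - 3))
print(displacement_from_phase(ref, obs, f))   # ~3.0
```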


Book ChapterDOI
06 Apr 2006-CLEaR
TL;DR: This paper addresses the problem of estimating head pose over a wide range of angles from low-resolution images. Grey-level normalized face imagettes serve as input to linear auto-associative memories, and the method achieves results similar to human subjects when estimating orientation in the tilt (head nodding) angle, with higher precision when estimating orientation in the pan (side-to-side) angle.
Abstract: This paper addresses the problem of estimating head pose over a wide range of angles from low-resolution images. Faces are detected using chrominance-based features. Grey-level normalized face imagettes serve as input to a linear auto-associative memory. One memory is computed for each pose using a Widrow-Hoff learning rule. Head pose is classified with a winner-takes-all process. We compare results from our method with the abilities of human subjects to estimate head pose from the same data set. Our method achieves similar results in estimating orientation in the tilt (head nodding) angle, and higher precision for estimating orientation in the pan (side-to-side) angle.

127 citations
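A toy sketch of this scheme under simplifying assumptions (synthetic "imagettes" drawn around one prototype direction per pose): one linear auto-associative memory is trained per pose with the Widrow-Hoff rule, and a winner-takes-all comparison of reconstruction quality classifies new inputs.

```python
import numpy as np

rng = np.random.default_rng(1)

def train_memory(X, epochs=50, lr=0.01):
    """Learn a linear auto-associative memory W for one pose with the
    Widrow-Hoff rule: W <- W + lr * (x - W x) x^T, for each imagette x."""
    d = X.shape[1]
    W = np.zeros((d, d))
    for _ in range(epochs):
        for x in X:
            W += lr * np.outer(x - W @ x, x)
    return W

def classify(x, memories):
    """Winner-takes-all: the pose whose memory best reconstructs x
    (highest cosine similarity between x and W x)."""
    scores = [np.dot(x, W @ x) / (np.linalg.norm(W @ x) * np.linalg.norm(x) + 1e-12)
              for W in memories]
    return int(np.argmax(scores))

# Toy data: two 'poses' as random prototype directions plus noise.
d, n = 64, 30
protos = [rng.normal(size=d) for _ in range(2)]
data = [np.array([p + 0.1 * rng.normal(size=d) for _ in range(n)]) for p in protos]
data = [X / np.linalg.norm(X, axis=1, keepdims=True) for X in data]
memories = [train_memory(X) for X in data]
test = protos[1] / np.linalg.norm(protos[1])
print(classify(test, memories))   # expected: 1
```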


Journal ArticleDOI
TL;DR: Experimental results demonstrate that MAP filtering can be successfully applied to SAR images represented in the shift-invariant wavelet domain, without resorting to a logarithmic transformation.
Abstract: In this paper, a new despeckling method based on undecimated wavelet decomposition and maximum a posteriori (MAP) estimation is proposed. Such a method relies on the assumption that the probability density function (pdf) of each wavelet coefficient is generalized Gaussian (GG). The major novelty of the proposed approach is that the parameters of the GG pdf are taken to be space-varying within each wavelet frame. Thus, they may be adjusted to spatial image context, not only to scale and orientation. Since the MAP equation to be solved is a function of the parameters of the assumed pdf model, the variance and shape factor of the GG function are derived from the theoretical moments, which depend on the moments and joint moments of the observed noisy signal and on the statistics of speckle. The solution of the MAP equation yields the MAP estimate of the wavelet coefficients of the noise-free image. The restored SAR image is synthesized from such coefficients. Experimental results, carried out on both synthetic speckled images and true SAR images, demonstrate that MAP filtering can be successfully applied to SAR images represented in the shift-invariant wavelet domain, without resorting to a logarithmic transformation.
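The method-of-moments flavour of the estimation can be illustrated for the GG shape factor: the observed ratio of the second moment to the squared first absolute moment is inverted numerically. This generic fit is only an illustration; the paper's space-varying, speckle-aware moment equations are more involved.

```python
import numpy as np
from scipy.special import gamma
from scipy.optimize import brentq

def gg_shape_from_moments(x):
    """Method-of-moments fit of the generalized Gaussian shape factor nu:
    match the observed ratio E[x^2] / E[|x|]^2 to its theoretical value
    Gamma(1/nu) * Gamma(3/nu) / Gamma(2/nu)^2 (monotone in nu)."""
    r_obs = np.mean(x ** 2) / np.mean(np.abs(x)) ** 2
    f = lambda nu: gamma(1 / nu) * gamma(3 / nu) / gamma(2 / nu) ** 2 - r_obs
    return brentq(f, 0.1, 10.0)

# Sanity check on Gaussian data (true shape factor nu = 2).
x = np.random.default_rng(2).normal(size=100_000)
print(gg_shape_from_moments(x))   # approx 2.0
```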

Patent
24 Jan 2006
TL;DR: In this article, a system and method are described for imaging a document and using a reference document to place pieces of the document in their correct relative positions and resize them to generate a single unified image, beginning with the electronic capture of the document as one or multiple images using an imaging device.
Abstract: A system and method for imaging a document, and using a reference document to place pieces of the document in their correct relative position and resize such pieces in order to generate a single unified image, including the electronic capturing of a document with one or multiple images using an imaging device, the performing of pre-processing of said images to optimize the results of subsequent image recognition, enhancement, and decoding, the comparing of said images against a database of reference documents to determine the most closely fitting reference document, and the applying of knowledge from said closely fitting reference document to adjust geometrically the orientation, shape, and size of said electronically captured images so that said images correspond as closely as possible to said reference document.

Proceedings ArticleDOI
14 Jun 2006
TL;DR: 2D-lines are automatically detected in images with the assistance of an EM-based vanishing point estimation method which assumes the existence of edges along mutually orthogonal vanishing directions; the resulting orientation labels are used to reduce the number of degrees of freedom of 3D lines during optimization.
Abstract: We present a novel method for recovering the 3D-line structure of a scene from multiple widely separated views. Traditional optimization-based approaches to line-based structure from motion minimize the error between measured line segments and the projections of corresponding 3D lines. In such a case, 3D lines can be optimized using a minimum of 4 parameters. We show that this number of parameters can be further reduced by introducing additional constraints on the orientations of lines in a 3D scene. In our approach, 2D-lines are automatically detected in images with the assistance of an EM-based vanishing point estimation method which assumes the existence of edges along mutually orthogonal vanishing directions. Each detected line is automatically labeled with the orientation (e.g. vertical, horizontal) of the 3D line which generated the measurement, and it is this additional knowledge that we use to reduce the number of degrees of freedom of 3D lines during optimization. We present 3D reconstruction results for urban scenes based on manually established feature correspondences across images.
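The orientation-labelling step can be sketched simply: once vanishing points are estimated, each detected 2D segment is assigned to the vanishing direction it points toward, within an angular tolerance. The segments and vanishing points below are made-up values.

```python
import numpy as np

def label_lines(segments, vps, thresh_deg=5.0):
    """Assign each 2-D segment to the vanishing point it is most consistent
    with: the segment direction should point at the VP through its midpoint."""
    labels = []
    for (x0, y0, x1, y1) in segments:
        mid = np.array([(x0 + x1) / 2, (y0 + y1) / 2])
        d = np.array([x1 - x0, y1 - y0], dtype=float)
        d /= np.linalg.norm(d)
        best, best_err = -1, thresh_deg
        for k, vp in enumerate(vps):
            v = vp - mid
            v /= np.linalg.norm(v)
            err = np.degrees(np.arccos(min(1.0, abs(float(d @ v)))))
            if err < best_err:
                best, best_err = k, err
        labels.append(best)          # -1 = unlabelled (no consistent VP)
    return labels

vps = [np.array([1e6, 240.0]), np.array([320.0, -1e6])]  # horizontal, vertical
segs = [(10, 100, 200, 102), (50, 10, 52, 200)]
print(label_lines(segs, vps))   # [0, 1]
```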

Journal ArticleDOI
TL;DR: A two-dimensional edge-adaptive lifting structure, which is similar to the Daubechies 5/3 wavelet, is presented; the 2-D prediction filter predicts the value of the next polyphase component according to an edge orientation estimator of the image.
Abstract: Lifting-style implementations of wavelets are widely used in image coders. A two-dimensional (2-D) edge-adaptive lifting structure, which is similar to the Daubechies 5/3 wavelet, is presented. The 2-D prediction filter predicts the value of the next polyphase component according to an edge orientation estimator of the image. Consequently, the prediction domain is allowed to rotate ±45° in regions with diagonal gradient. The gradient estimator is computationally inexpensive with additional costs of only six subtractions per lifting instruction, and no multiplications are required.
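A hedged sketch of the edge-adaptive prediction idea (not the paper's exact filter or cost accounting): for each odd-row sample, the predictor direction switches between vertical and ±45° according to which local difference, serving as a cheap gradient estimate, is smallest.

```python
import numpy as np

def adaptive_predict_rows(img):
    """One lifting 'predict' step on the odd rows, choosing per interior
    pixel between vertical and +/-45 degree predictors according to the
    direction of smallest local difference (a cheap gradient estimator)."""
    even, odd = img[0::2, :], img[1::2, :]
    n = min(even.shape[0] - 1, odd.shape[0])
    detail = odd[:n].copy()
    for i in range(n):
        up, down = even[i], even[i + 1]
        for j in range(1, img.shape[1] - 1):
            # Three candidate differences: vertical, +45 deg, -45 deg.
            cands = [abs(up[j] - down[j]),
                     abs(up[j - 1] - down[j + 1]),
                     abs(up[j + 1] - down[j - 1])]
            k = int(np.argmin(cands))
            if k == 0:
                pred = (up[j] + down[j]) / 2
            elif k == 1:
                pred = (up[j - 1] + down[j + 1]) / 2
            else:
                pred = (up[j + 1] + down[j - 1]) / 2
            detail[i, j] -= pred              # 5/3-style prediction residual
    return detail

img = np.random.rand(16, 16)
print(adaptive_predict_rows(img).shape)
```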

Patent
Yuichi Bannai
31 Mar 2006
TL;DR: In this paper, an information processing method and apparatus enables one or more further users to share a mixed reality space image including a virtual object (43) superimposed in a space where a first user (40) exists.
Abstract: An information processing method and apparatus enables one or more further users (50) to share a mixed reality space image including a virtual object (43) superimposed in a space where a first user (40) exists. A first stereo image is acquired based on a stereo video captured by a first stereo capturing section (22) mounted on the first user and a virtual object image created based on the position and orientation of the first stereo capturing section. A second stereo image is acquired based on a stereo video captured by a second stereo capturing section (70) provided in the space where the first user exists and a virtual object image (43) created based on the position and orientation of the second stereo capturing section. An image is selected from the first stereo image and the second stereo image according to an instruction of the further user. The selected image is presented to the further user.

Journal ArticleDOI
TL;DR: Field experiments demonstrate that ATSIS can robustly measure hundreds of matched image points in seconds, allowing fast extraction of the temporal evolution of a three-dimensional surface wave field.

Journal ArticleDOI
TL;DR: A scheme is proposed for systematically estimating fingerprint ridge orientation and segmenting the fingerprint image by evaluating the correctness of the ridge orientation with a neural network. The scheme is compared with VeriFinger 4.2, published by Neurotechnologija Ltd. in 2004, and the comparison shows that it leads to improved accuracy of minutiae detection.
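For context, the classical gradient-based ridge orientation estimate that such schemes build on can be written compactly; the block size is an arbitrary assumption, and the paper's neural-network correctness evaluation would sit on top of estimates like these.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def ridge_orientation(gray, block=16):
    """Classic least-squares fingerprint ridge orientation: average the
    doubled-angle gradient products over a block, then halve the angle."""
    gy, gx = np.gradient(gray.astype(float))
    gxx = uniform_filter(gx * gx, block)
    gyy = uniform_filter(gy * gy, block)
    gxy = uniform_filter(gx * gy, block)
    # Ridge direction is perpendicular to the mean gradient direction.
    theta = 0.5 * np.arctan2(2 * gxy, gxx - gyy) + np.pi / 2
    coherence = np.hypot(gxx - gyy, 2 * gxy) / (gxx + gyy + 1e-12)
    return theta, coherence   # coherence can drive segmentation decisions

img = np.random.rand(256, 256)
theta, coh = ridge_orientation(img)
print(theta.shape, float(coh.mean()))
```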

Patent
25 Jul 2006
TL;DR: In this paper, a method is proposed for generating an image sequence using an image capture device: image data generated by the device and representative of an existing image of a sequence is used to assist the user in capturing one or more subsequent images for the sequence.
Abstract: A method for generating an image sequence using an image capture device, the method comprising using image data generated using the device and representative of an existing image of a sequence to assist a user of the device to capture one or more subsequent images for the sequence in order that said existing and the or each subsequent image are captured at substantially the same location and device orientation, and an image capture device operable to assist a user in generating an image sequence.

Book ChapterDOI
13 Jan 2006
TL;DR: A bottom-up approach is presented that uses local image features to estimate human upper body pose from single images in cluttered backgrounds; it estimates pose with performance levels similar to current example-based methods, but unlike them it works in the presence of natural backgrounds, without any prior segmentation.
Abstract: Recovering the pose of a person from single images is a challenging problem. This paper discusses a bottom-up approach that uses local image features to estimate human upper body pose from single images in cluttered backgrounds. The method covers the image window with a dense grid of local gradient orientation histograms, followed by non-negative matrix factorization to learn a set of bases that correspond to local features on the human body, enabling selective encoding of human-like features in the presence of background clutter. Pose is then recovered by direct regression. This approach allows us to key on gradient patterns, such as shoulder contours and bent elbows, that are characteristic of humans and carry important pose information, unlike current regressive methods that either use weak limb detectors or require prior segmentation to work. The system is trained on a database of images with labelled poses. We show that it estimates pose with similar performance levels to current example-based methods, but unlike them it works in the presence of natural backgrounds, without any prior segmentation.
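A compact sketch of the descriptor-to-pose pipeline on stand-in data; the array shapes, component count, and the ridge regressor are illustrative assumptions rather than the paper's configuration.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.linear_model import Ridge

# Hypothetical shapes: 500 training windows, 900-dim non-negative
# histogram descriptors, 12-dim pose vectors (e.g. 2-D joint positions).
rng = np.random.default_rng(3)
X = rng.random((500, 900))          # stand-in for orientation-histogram grids
poses = rng.random((500, 12))       # stand-in for labelled upper-body poses

nmf = NMF(n_components=40, max_iter=500)     # learn part-like, non-negative bases
H = nmf.fit_transform(X)                     # encode each window on the bases
reg = Ridge(alpha=1.0).fit(H, poses)         # direct regression to pose

test_codes = nmf.transform(rng.random((1, 900)))
print(reg.predict(test_codes).shape)         # (1, 12)
```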

Book ChapterDOI
01 Jan 2006
TL;DR: This chapter gives an overview of the different approaches to the computation of the structure tensor, with a focus on methods based on robust statistics and nonlinear diffusion.
Abstract: The structure tensor, also known as the second moment matrix or Forstner interest operator, is a very popular tool in image processing. Its purpose is the estimation of orientation and the local analysis of structure in general. It is based on the integration of data from a local neighborhood. Normally, this neighborhood is defined by a Gaussian window function and the structure tensor is computed by the weighted sum within this window. Some recently proposed methods, however, adapt the computation of the structure tensor to the image data. There are several ways to do this. This chapter gives an overview of the different approaches, with the focus on methods based on robust statistics and nonlinear diffusion. Furthermore, the data-adaptive structure tensors are evaluated in some applications. The main focus is on optic flow estimation, but texture analysis and corner detection are also considered.
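For reference, the baseline (non-adaptive) structure tensor that the data-adaptive variants generalise can be computed in a few lines; the Gaussian window width is an arbitrary assumption.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def structure_tensor_orientation(gray, sigma=2.0):
    """Standard (non-adaptive) structure tensor: smooth the outer product of
    the gradient with a Gaussian window, then read off the local orientation
    from the dominant eigenvector (via the doubled-angle formula)."""
    gy, gx = np.gradient(gray.astype(float))
    jxx = gaussian_filter(gx * gx, sigma)
    jyy = gaussian_filter(gy * gy, sigma)
    jxy = gaussian_filter(gx * gy, sigma)
    orientation = 0.5 * np.arctan2(2 * jxy, jxx - jyy)
    # Eigenvalue gap: large where the structure is strongly oriented;
    # both eigenvalues large at corners (Forstner/Harris-like cue).
    gap = np.hypot(jxx - jyy, 2 * jxy)
    return orientation, gap

img = np.random.rand(128, 128)
ori, gap = structure_tensor_orientation(img)
print(ori.shape, float(gap.max()))
```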

Journal ArticleDOI
TL;DR: In this article, the authors provide a comprehensive analysis of exactly what visual information about the world is embedded within a single image of an eye and provide a detailed analysis of the characteristics of the corneal imaging system including field of view, resolution and locus of viewpoints.
Abstract: This paper provides a comprehensive analysis of exactly what visual information about the world is embedded within a single image of an eye. It turns out that the cornea of an eye and a camera viewing the eye form a catadioptric imaging system. We refer to this as a corneal imaging system. Unlike a typical catadioptric system, a corneal one is flexible in that the reflector (cornea) is not rigidly attached to the camera. Using a geometric model of the cornea based on anatomical studies, its 3D location and orientation can be estimated from a single image of the eye. Once this is done, a wide-angle view of the environment of the person can be obtained from the image. In addition, we can compute the projection of the environment onto the retina with its center aligned with the gaze direction. This foveated retinal image reveals what the person is looking at. We present a detailed analysis of the characteristics of the corneal imaging system including field of view, resolution and locus of viewpoints. When both eyes of a person are captured in an image, we have a stereo corneal imaging system. We analyze the epipolar geometry of this stereo system and show how it can be used to compute 3D structure. The framework we present in this paper for interpreting eye images is passive and non-invasive. It has direct implications for several fields including visual recognition, human-machine interfaces, computer graphics and human affect studies.

Journal ArticleDOI
TL;DR: Subjects matched a sequence of two filtered images, each containing every other combination of spatial frequency and orientation, of faces or non-face 3D blobs, judging whether the person or blob was the same or different.

Journal Article
TL;DR: A new method is described for automatically estimating where a person is looking in images where the head is typically in the range 20 to 40 pixels high, using a feature vector based on skin detection to estimate the orientation of the head relative to the camera.
Abstract: In this paper we describe a new method for automatically estimating where a person is looking in images where the head is typically in the range 20 to 40 pixels high. We use a feature vector based on skin detection to estimate the orientation of the head, which is discretised into 8 different orientations, relative to the camera. A fast sampling method returns a distribution over previously-seen head-poses. The overall body pose relative to the camera frame is approximated using the velocity of the body, obtained via automatically-initiated colour-based tracking in the image sequence. We show that, by combining direction and head-pose information, gaze is determined more robustly than by using each feature alone. We demonstrate this technique on surveillance and sports footage.

Journal ArticleDOI
TL;DR: In this article, the Image-Based Registration (IBR) method is proposed for TLS point cloud registration; it offers a one-step registration of the point clouds from each scanner position.
Abstract: Building 3D models using terrestrial laser scanner (TLS) data is currently an active area of research, especially in the fields of heritage recording and site documentation. Multiple TLS scans are often required to generate an occlusion-free 3D model in situations where the object to be recorded has a complex geometry. The first task associated with building 3D models from laser scanner data in such cases is to transform the data from the scanner’s local coordinate system into a uniform Cartesian reference datum, which requires sufficient overlap between the scans. Many TLS systems are now supplied with an SLR-type digital camera, such that the scene to be scanned can also be photographed. The provision of overlapping imagery offers an alternative, photogrammetric means to achieve point cloud registration between adjacent scans. The images from the digital camera mounted on top of the laser scanner are used to first relatively orient the network of images, and then to transfer this orientation to the TLS stations to provide exterior orientation. The proposed approach, called the IBR method for Image-Based Registration, offers a one-step registration of the point clouds from each scanner position. In the case of multiple scans, exterior orientation is simultaneously determined for all TLS stations by bundle adjustment. This paper outlines the IBR method and discusses test results obtained with the approach. It will be shown that the photogrammetric orientation process for TLS point cloud registration is efficient and accurate, and offers a viable alternative to other approaches, such as the well-known iterative closest point algorithm.
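The final transformation step is simple once exterior orientation is known: the sketch below builds a rotation from angles (one common photogrammetric convention, assumed here, not necessarily the IBR implementation's parameterisation) and moves a scan into the reference datum. The image network orientation and bundle adjustment that produce these parameters are the substantive part of the method.

```python
import numpy as np

def rotation_zyx(omega, phi, kappa):
    """Rotation matrix from three angles (one common photogrammetric
    convention, assumed here for illustration)."""
    cw, sw = np.cos(omega), np.sin(omega)
    cp, sp = np.cos(phi), np.sin(phi)
    ck, sk = np.cos(kappa), np.sin(kappa)
    Rx = np.array([[1, 0, 0], [0, cw, -sw], [0, sw, cw]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rz = np.array([[ck, -sk, 0], [sk, ck, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def register_scan(points, angles, t):
    """One-step registration: move a TLS point cloud from the scanner's
    local frame into the common datum using its exterior orientation."""
    return points @ rotation_zyx(*angles).T + np.asarray(t)

cloud = np.random.rand(1000, 3)                     # stand-in scan
print(register_scan(cloud, (0.01, -0.02, 1.5), [10.0, 5.0, 0.0]).shape)
```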

Patent
02 Aug 2006
TL;DR: In this article, a system, method and device for changing a notional viewing location for a moving image on a device, depending on an orientation of the device, is described.
Abstract: The invention relates to a system, method and device for changing a notional viewing location for a moving image on a device, depending on an orientation of the device. For the moving image management system, it comprises: a sensor; a movement detection module connected to the sensor providing movement data registering a notable signal from the sensor; and a moving image adjustment module determining a new viewing location of the moving image utilizing the movement data and generating a replacement moving image for the moving image representing the moving image as viewed from the new viewing location.

Patent
25 Apr 2006
TL;DR: In this article, a 3D model of an anatomical structure is constructed based on the contours-of-interest and on the measured location and orientation coordinates of the ultrasonic sensor at each of the plurality of spatial positions.
Abstract: A method for modeling of an anatomical structure includes acquiring a plurality of ultrasonic images of the anatomical structure using an ultrasonic sensor, at a respective plurality of spatial positions of the ultrasonic sensor. Location and orientation coordinates of the ultrasonic sensor are measured at each of the plurality of spatial positions. Contours-of-interest that refer to features of the anatomical structure are marked in one or more of the ultrasonic images. A three-dimensional (3-D) model of the anatomical structure is constructed, based on the contours-of-interest and on the measured location and orientation coordinates.

Patent
18 Jan 2006
TL;DR: A laser projection system, intelligent data correction system, and method are described which correct for differences between the as-built condition and the as-designed condition of a workpiece.
Abstract: A laser projection system, intelligent data correction system and method which corrects for differences between the as-built condition and the as-designed condition of a workpiece, which includes determining the as-built condition of a workpiece with a digitizer scanner and modifying data of the as-built condition or the data of a laser projection based upon the data received from the digitizer scanner of the as-built condition. A preferred intelligent data correction system includes metrology receivers fixed relative to the digitizer scanner and the workpiece and a metrology transmitter to determine the precise location and orientation of the digitizer scanner relative to the workpiece.

Journal ArticleDOI
Martin Hoheisel
TL;DR: Three-dimensional (3D) reconstruction of image data allows much better orientation in the body, permitting a more accurate diagnosis, precise treatment planning, and image-guided therapy.

Abstract: Medical imaging can be looked at from two different perspectives, the medical and the physical. The medical point of view is application-driven and involves finding the best way of tackling a medical problem through imaging, i.e. either to answer a diagnostic question, or to facilitate a therapy. For this purpose, industry offers a broad spectrum of radiographic, fluoroscopic, and angiographic equipment. The requirements depend on the medical problem: which organs have to be imaged, which details have to be made visible, how to deal with the problem of motion if any, and so forth. In radiography, for instance, large detector sizes of up to 43 cm × 43 cm and relatively high energies are needed to image a whole chest. In mammography, pixel sizes between 25 and 70 µm are favorable for good spatial resolution, which is essential for detecting microcalcifications. In cardiology, 30–60 images per second are required to follow the heart's motion. In computed tomography, marginal contrast differences down to one Hounsfield unit have to be resolved. In all cases, but especially in pediatrics, the required radiation dose must be kept as low as reasonably achievable. Moreover, three-dimensional (3D) reconstruction of image data allows much better orientation in the body, permitting a more accurate diagnosis, precise treatment planning, and image-guided therapy. Additional functional information from different modalities is very helpful, information such as perfusion, flow rate, diffusion, oxygen concentration, metabolism, and receptor affinity for specific molecules. For visualization, functional and anatomical information are fused into one combined image. The physical point of view is technology-driven. A choice of different energies from the electromagnetic spectrum is available for imaging; not only X-rays in the range of 10–150 keV, but also γ rays, which are used in nuclear medicine, X-rays in the MeV range, which are used in portal imaging to monitor radiation therapy, visible and near infrared light (1–3 eV) for retina inspection and mamma transillumination, and even Terahertz waves (0.5–20 meV) are under discussion. Feasibility is determined by the existing radiation sources, the materials available for absorbing and converting the radiation used, the microelectronic circuits for integrating or counting readout, and the computing power required to process and, where applicable, reconstruct data in real-time. Furthermore, other physical effects can be utilized, such as the phase information a wave front receives when passing through an object. Some new developments will be discussed, e.g. energy-resolving methods for distinguishing different tissues in the patient, quanta-counting detection, phase contrast imaging, CCDs for very high spatial resolution, fast volume CT scanners, and organic semiconductors for a new generation of detection devices. Admittedly, apart from imaging performance, economic factors also have to be taken into account.

Patent
05 Apr 2006
TL;DR: In this paper, orientation parameters indicative of an orientation of the mobile device (1) relative to the visual background (2) are determined, and the captured visual background is displayed on the mobile device overlaid with visual objects based on these orientation parameters.
Abstract: For executing an application (13) in a mobile device (1) comprising a camera (14), a visual background (2) is captured through the camera (14). A selected application (13) is associated with the visual background (2). The selected application (13) is executed in the mobile device (1). Determined are orientation parameters indicative of an orientation of the mobile device (1) relative to the visual background (2). Based on the orientation parameters, application-specific output signals are generated in the mobile device (1). Particularly, the captured visual background (2) is displayed on the mobile device (1) overlaid with visual objects based on the orientation parameters. Displaying the captured visual background with overlaid visual objects, selected and/or positioned dependent on the relative orientation of the mobile device (1), makes possible interactive augmented reality applications, e.g. interactive augmented reality games, controlled by the orientation of the mobile device (1) relative to the visual background (2).
