
Showing papers on "View synthesis published in 2014"


Journal ArticleDOI
TL;DR: The problem of view synthesis is formulated as a continuous inverse problem, which allows us to correctly take into account foreshortening effects caused by scene geometry transformations, and all optimization problems are solved with state-of-the-art convex relaxation techniques.
Abstract: We develop a continuous framework for the analysis of 4D light fields, and describe novel variational methods for disparity reconstruction as well as spatial and angular super-resolution. Disparity maps are estimated locally using epipolar plane image analysis without the need for expensive matching cost minimization. The method works fast and with inherent subpixel accuracy since no discretization of the disparity space is necessary. In a variational framework, we employ the disparity maps to generate super-resolved novel views of a scene, which corresponds to increasing the sampling rate of the 4D light field in both the spatial and angular directions. In contrast to previous work, we formulate the problem of view synthesis as a continuous inverse problem, which allows us to correctly take into account foreshortening effects caused by scene geometry transformations. All optimization problems are solved with state-of-the-art convex relaxation techniques. We test our algorithms on a number of real-world examples as well as our new benchmark data set for light fields, and compare results to a multiview stereo method. The proposed method is both faster and more accurate. Data sets and source code are provided online for additional evaluation.
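
A minimal sketch of the epipolar-plane-image (EPI) idea behind the disparity step: iso-intensity lines in an EPI have slope equal to the disparity, so a structure-tensor orientation estimate yields sub-pixel disparity with no disparity-space discretization. This is an illustrative reconstruction, not the authors' code; the function name, parameters, and axis convention (rows = angular coordinate s, columns = spatial coordinate x) are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def epi_disparity(epi, sigma_grad=0.8, sigma_smooth=2.0):
    """Local disparity from one 2-D EPI slice via the structure tensor."""
    # Gaussian-derivative gradients along the spatial (x) and angular (s) axes.
    Ix = gaussian_filter(epi, sigma_grad, order=(0, 1))
    Is = gaussian_filter(epi, sigma_grad, order=(1, 0))
    # Structure tensor entries, averaged over a local neighbourhood.
    Jxx = gaussian_filter(Ix * Ix, sigma_smooth)
    Jss = gaussian_filter(Is * Is, sigma_smooth)
    Jxs = gaussian_filter(Ix * Is, sigma_smooth)
    # Orientation of the dominant local structure; the slope dx/ds of the
    # EPI lines gives the disparity.  Sign and axis conventions depend on
    # the light-field parameterization and may need flipping.
    phi = 0.5 * np.arctan2(2.0 * Jxs, Jxx - Jss)
    return np.tan(phi)
```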

575 citations


Proceedings ArticleDOI
23 Jun 2014
TL;DR: This paper presents a surround view camera solution that consists of three key algorithm components: geometric alignment, photometric alignment, and composite view synthesis, which together produce a seamlessly stitched bird's-eye view of the vehicle from four cameras.
Abstract: The automotive surround view camera system is an emerging ADAS (Advanced Driver Assistance System) technology that assists the driver in parking the vehicle safely by providing a top-down view of the vehicle's 360-degree surroundings. Such a system normally consists of four to six wide-angle (fish-eye lens) cameras mounted around the vehicle, each facing a different direction. From these camera inputs, a composite bird's-eye view of the vehicle is synthesized and shown to the driver in real time during parking. In this paper, we present a surround view camera solution that consists of three key algorithm components: geometric alignment, photometric alignment, and composite view synthesis. Our solution produces a seamlessly stitched bird's-eye view of the vehicle from four cameras. It runs in real time on a DSP C66x, producing 880x1080 output video at 30 fps.
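
A rough sketch of the composite-view step under simplifying assumptions: fisheye distortion is treated as already corrected, the ground-plane homographies as already estimated by the geometric-alignment stage, and overlaps are blended by plain averaging instead of the paper's photometric alignment. All names are illustrative.

```python
import cv2
import numpy as np

def compose_birdseye(frames, homographies, out_size):
    """Warp each (undistorted) camera frame onto the ground plane with its
    calibration homography and average the overlap regions."""
    h, w = out_size
    acc = np.zeros((h, w, 3), np.float32)
    weight = np.zeros((h, w, 1), np.float32)
    for img, H in zip(frames, homographies):
        warped = cv2.warpPerspective(img, H, (w, h))
        mask = (warped.sum(axis=2, keepdims=True) > 0).astype(np.float32)
        acc += warped.astype(np.float32) * mask
        weight += mask
    # Plain averaging in overlaps; the paper's photometric alignment would
    # additionally equalize brightness/color so the seams disappear.
    return (acc / np.maximum(weight, 1.0)).astype(np.uint8)
```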

83 citations


Journal ArticleDOI
TL;DR: The temporal correlation of texture and depth information is exploited to generate a background reference image that is then used to fill the holes associated with the dynamic parts of the scene, whereas for static parts the traditional inpainting method is used.
Abstract: Depth-image-based rendering is a key technique for realizing free viewpoint television. However, one critical problem in these systems is filling the disocclusions caused by the 3-D warping process. This paper exploits the temporal correlation of texture and depth information to generate a background reference image. This image is then used to fill the holes associated with the dynamic parts of the scene, whereas for static parts the traditional inpainting method is used. To generate the background reference image, a Gaussian mixture model is employed on the texture information, whereas depth map information is used to detect moving objects so as to enhance the background reference image. The proposed hole filling approach is particularly useful for the single-view-plus-depth format, where, contrary to the multi-view-plus-depth format, only information from one view can be used for this task. The experimental results show that objective and subjective gains can be achieved, with gains ranging from 1 to 3 dB over the inpainting method.
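
A minimal sketch of the background-modelling step, using OpenCV's Gaussian-mixture background subtractor as a stand-in for the paper's texture-side GMM; the depth-guided enhancement of the background reference is omitted, and names are illustrative.

```python
import cv2

def build_background_reference(video_path, history=200):
    """Accumulate a background reference image over time with a per-pixel
    Gaussian mixture model; the result can fill disocclusion holes behind
    moving objects after 3-D warping."""
    subtractor = cv2.createBackgroundSubtractorMOG2(history=history,
                                                    detectShadows=False)
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        subtractor.apply(frame)  # update the per-pixel Gaussian mixtures
    cap.release()
    # Most probable background colour per pixel.
    return subtractor.getBackgroundImage()
```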

66 citations


Proceedings ArticleDOI
23 Jun 2014
TL;DR: This paper contributes a new physics-based generative model and the corresponding Maximum a Posteriori estimate, providing the desired unification between heuristics-based methods and a Bayesian formulation and shows that the novel Bayesian model significantly improves the quality of novel views, in particular if the scene geometry estimate is inaccurate.
Abstract: In this paper, we address the problem of synthesizing novel views from a set of input images. State-of-the-art methods, such as the Unstructured Lumigraph, have been using heuristics to combine information from the original views, often using an explicit or implicit approximation of the scene geometry. While the proposed heuristics have been extensively explored and proven to work effectively, a Bayesian formulation was recently introduced, formalizing some of the previously proposed heuristics and pointing out which physical phenomena could lie behind each. However, some important heuristics were still not taken into account and lacked proper formalization. We contribute a new physics-based generative model and the corresponding Maximum a Posteriori estimate, providing the desired unification between heuristics-based methods and a Bayesian formulation. The key point is to systematically consider the error induced by the uncertainty in the geometric proxy. We provide an extensive discussion, analyzing how the obtained equations explain the heuristics developed in previous methods. Furthermore, we show that our novel Bayesian model significantly improves the quality of novel views, in particular if the scene geometry estimate is inaccurate.
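
As a rough illustration of the kind of estimator such a formulation yields (not the paper's exact derivation), the MAP novel view minimizes an energy in which each input view is weighted by the per-pixel uncertainty induced by the geometric proxy:

```latex
\hat{u} = \arg\min_{u} \sum_{i} \int_{\Omega_i}
  \frac{1}{\sigma_i^2(x)} \bigl( v_i(x) - u(\tau_i(x)) \bigr)^2 \, dx
  \;+\; \lambda \,\mathrm{TV}(u)
```

Here the v_i are the input views, the tau_i the warps given by the geometric proxy, sigma_i^2(x) a variance that grows with the proxy error at x (the term that formalizes the heuristics), and TV(u) a smoothness prior; all symbols are assumptions for illustration.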

58 citations


Journal ArticleDOI
TL;DR: A novel zero-synthesized view difference (ZSVD) model is devised which jointly accounts for the distortion of the synthesized view induced by the compound impact of depth-disparity mapping, texture adaptation, and occlusion in the view synthesis process and can remarkably reduce the coding computational complexity with negligible performance loss.
Abstract: In this correspondence, we explore a low-complexity adaptive view synthesis optimization (VSO) scheme in the upcoming high-efficiency video coding (HEVC)-based 3-D video coding standard. We first devise a novel zero-synthesized view difference (ZSVD) model which jointly accounts for the distortion of the synthesized view induced by the compound impact of depth-disparity mapping, texture adaptation, and occlusion in the view synthesis process. This model can efficiently estimate the maximum allowable depth distortion in synthesizing a virtual view without introducing any geometry distortion. Then, an adaptive ZSVD-aware VSO scheme is proposed by incorporating the ZSVD model into the rate-distortion optimization process, which is developed by pruning the conventional view synthesis algorithm. Extensive experimental results confirm that the proposed model accurately predicts the zero distortion of the synthesized view, and show that the proposed ZSVD-aware VSO scheme can remarkably reduce the coding computational complexity with negligible performance loss.
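
A toy illustration of the zero-geometric-distortion idea, assuming the standard 8-bit linear-in-1/Z depth-level quantization used in 3-D video and a fixed sub-pixel rendering precision; the function names and parameters are invented for the sketch.

```python
import numpy as np

def disparity_of_level(v, f_times_b, z_near, z_far):
    """Map 8-bit depth level(s) v to disparity via the usual 1/Z relation."""
    inv_z = v / 255.0 * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far
    return f_times_b * inv_z

def allowable_depth_levels(v, f_times_b, z_near, z_far, precision=0.25):
    """Depth levels whose disparity rounds to the same rendering position
    as level v: distortions inside this interval cause zero geometric
    error in the synthesized view, which is the intuition behind ZSVD."""
    quantize = lambda d: np.round(d / precision)
    target = quantize(disparity_of_level(v, f_times_b, z_near, z_far))
    levels = np.arange(256)
    same = levels[quantize(disparity_of_level(levels, f_times_b,
                                              z_near, z_far)) == target]
    return same.min(), same.max()
```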

40 citations


Journal ArticleDOI
TL;DR: The simulation results show that the proposed scheme can achieve approximately 5.4% and 10.2% coding gains for AVC- and HEVC-compatible 3-D coding, respectively, and the results show the remarkable complexity reduction of the scheme compared to the view synthesis optimization method currently used in 3-D-HEVC.
Abstract: This paper presents an efficient view synthesis distortion estimation method for 3-D video. It also introduces the application of this method to Advanced Video Coding (AVC)- and High Efficiency Video Coding (HEVC)-compatible 3-D video coding. Although the proposed view synthesis distortion scheme is generic, its use in actual 3-D video codec systems raises many issues caused by different video-coding formats and restrictions; solutions for these issues are proposed herein. The simulation results show that the proposed scheme can achieve approximately 5.4% and 10.2% coding gains for AVC- and HEVC-compatible 3-D coding, respectively. In addition, the results show the remarkable complexity reduction of the scheme compared to the view synthesis optimization method currently used in 3-D-HEVC. The proposed method has been adopted into the currently developing AVC- and HEVC-compatible test model reference software.

37 citations


Proceedings ArticleDOI
19 Mar 2014
TL;DR: A rate adaptation logic based on sampled rate-distortion (R-D) values, which relate the distortion of synthesized view to the bit rates of the texture and depth components of the reference views, is proposed to maximize the quality of rendered virtual views.
Abstract: We present an interactive free-viewpoint video (FVV) streaming system based on the dynamic adaptive streaming over HTTP (DASH) standard. The system uses standard HTTP Web servers to achieve scalability with a large number of users and performs view synthesis and rate adaptation at the client side to achieve fast response times. We propose a rate adaptation logic based on sampled rate-distortion (R-D) values, which relate the distortion of the synthesized view to the bit rates of the texture and depth components of the reference views, to maximize the quality of rendered virtual views. Initial results indicate that the proposed R-D-based rate adaptation strategy outperforms equal bit rate allocation among the reference stream components.
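
A minimal sketch of the sampled R-D selection logic, assuming the (texture rate, depth rate, distortion) triples have been measured offline as the paper describes; names are illustrative.

```python
def select_rates(rd_samples, budget):
    """Pick the (texture_rate, depth_rate) pair minimizing the sampled
    synthesized-view distortion subject to a total bit-rate budget.

    rd_samples: iterable of (texture_rate, depth_rate, distortion) tuples.
    """
    feasible = [(dist, rt, rd) for rt, rd, dist in rd_samples
                if rt + rd <= budget]
    if not feasible:
        raise ValueError("no representation fits the budget")
    dist, rt, rd = min(feasible)  # lowest distortion among feasible pairs
    return rt, rd
```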

35 citations


Journal ArticleDOI
TL;DR: The PBR is effective in suppressing flicker artifacts of virtual video rendering although no temporal aspect is considered, and it is shown that the depth map itself calculated from the RWR-based method (by simply choosing the most probable matching point) is also comparable with that of the state-of-the-art local stereo matching methods.
Abstract: In this paper, a probability-based rendering (PBR) method is described for reconstructing an intermediate view with a steady-state matching probability (SSMP) density function. Conventionally, given multiple reference images, the intermediate view is synthesized via the depth-image-based rendering technique, in which geometric information (e.g., depth) is explicitly leveraged, leading to serious rendering artifacts on the synthesized view even with small depth errors. We address this problem by formulating the rendering process as an image fusion in which the textures of all probable matching points are adaptively blended with the SSMP representing the likelihood that points among the input reference images are matched. The PBR hence becomes more robust against depth estimation errors than existing view synthesis approaches. The matching probability (MP) in the steady state, the SSMP, is inferred for each pixel via the random walk with restart (RWR). The RWR always guarantees a visually consistent MP, as opposed to conventional optimization schemes (e.g., diffusion or filtering-based approaches), whose accuracy depends heavily on the parameters used. Experimental results demonstrate the superiority of the PBR over existing view synthesis approaches both qualitatively and quantitatively. In particular, the PBR is effective in suppressing flicker artifacts in virtual video rendering even though no temporal aspect is considered. Moreover, the depth map calculated from our RWR-based method (by simply choosing the most probable matching point) is comparable with those of state-of-the-art local stereo matching methods.
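
A minimal sketch of the random-walk-with-restart iteration at the heart of the method, assuming a column-stochastic transition matrix over candidate matching points; the fixed point is the steady-state matching probability used to weight each candidate's texture during blending. Names are illustrative.

```python
import numpy as np

def rwr_steady_state(W, seed, restart=0.15, iters=100):
    """Iterate p <- (1 - c) * W @ p + c * seed to its steady state.

    W:    column-stochastic transition matrix between candidate matches.
    seed: restart distribution, e.g. the initial matching likelihood.
    """
    p = seed.copy()
    for _ in range(iters):
        p = (1.0 - restart) * (W @ p) + restart * seed
    return p
```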

34 citations


Proceedings ArticleDOI
23 Jun 2014
TL;DR: This work generates novel synthetic views of people based on a 3D appearance tensor indexed by images, viewpoints, and image positions and shows that the inferred views are both visually and quantitatively accurate.
Abstract: We pose unseen view synthesis as a probabilistic tensor completion problem. Given images of people organized by their rough viewpoint, we form a 3D appearance tensor indexed by images (pose examples), viewpoints, and image positions. After discovering the low-dimensional latent factors that approximate that tensor, we can impute its missing entries. In this way, we generate novel synthetic views of people -- even when they are observed from just one camera viewpoint. We show that the inferred views are both visually and quantitatively accurate. Furthermore, we demonstrate their value for recognizing actions in unseen views and estimating viewpoint in novel images. While existing methods are often forced to choose between data that is either realistic or multi-view, our virtual views offer both, thereby allowing greater robustness to viewpoint in novel images.
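
A simplified stand-in for the completion step, using iterated truncated SVD on an unfolding of the appearance tensor instead of the paper's probabilistic factorization; all names are assumptions.

```python
import numpy as np

def impute_low_rank(T, observed, rank=10, iters=50):
    """Fill missing entries of an unfolded appearance tensor by alternating
    a rank-r SVD projection with re-imposing the observed entries.

    T:        2-D array, e.g. the tensor unfolded as views x (poses*pixels).
    observed: boolean mask of known entries of T.
    """
    X = np.where(observed, T, T[observed].mean())  # initialize holes
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        X = (U[:, :rank] * s[:rank]) @ Vt[:rank]   # project to rank r
        X[observed] = T[observed]                  # keep known data fixed
    return X
```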

32 citations


Journal ArticleDOI
TL;DR: Objective and subjective results prove the suitability of the proposed time-of-flight super-resolution approach for depth scenery capture, based on the combination of depth and texture sources.
Abstract: Acquiring scenery depth is a fundamental task in computer vision, with many applications in manufacturing, surveillance, or robotics relying on accurate scenery information. Time-of-flight cameras can provide depth information in real time and overcome the shortcomings of traditional stereo analysis. However, they provide limited spatial resolution, so sophisticated upscaling algorithms are sought after. In this paper, we present a sensor fusion approach to time-of-flight super resolution based on the combination of depth and texture sources. Unlike other texture-guided approaches, we interpret the depth upscaling process as a weighted energy optimization problem. Three different weights are introduced, employing different available sensor data. The individual weights address object boundaries in depth, depth sensor noise, and temporal consistency. Applied in consecutive order, they form three weighting strategies for time-of-flight super resolution. Objective evaluations show advantages in depth accuracy and for depth-image-based rendering compared with state-of-the-art depth upscaling. Subjective view synthesis evaluation shows a significant increase in viewer preference, by a factor of four, in stereoscopic viewing conditions. To the best of our knowledge, this is the first extensive subjective test performed on time-of-flight depth upscaling. Objective and subjective results prove the suitability of our time-of-flight super-resolution approach for depth scenery capture.

30 citations


Patent
24 Mar 2014
TL;DR: In this paper, a first depth map is generated from stereo images by stereo-matching and a disparity fallback selects a second depth map from a single view without stereo matching, preventing stereo matching errors from producing visible artifacts or flickering.
Abstract: Multi-view images are generated with reduced flickering. A first depth map is generated from stereo images by stereo-matching. When stereo-matching is poor or varies too much from frame to frame, disparity fallback selects a second depth map that is generated from a single view without stereo-matching, preventing stereo-matching errors from producing visible artifacts or flickering. Flat or textureless regions can use the second depth map, while regions with good stereo-matching use the first depth map. Depth maps are generated with a one-frame delay and buffered. Low-cost temporal coherence reduces costs used for stereo-matching when the pixel location selected as the lowest-cost disparity is within a distance threshold of the same pixel in a last frame. Hybrid view synthesis uses forward mapping for smaller numbers of views, and backward mapping from the forward-mapping results for larger numbers of views. Rotated masks are generated on-the-fly for backward mapping.

Patent
07 Apr 2014
TL;DR: In this article, the type of prediction used for a reference picture index may be signaled in the video bit-stream, and the omission of motion vectors from the video bits-stream for a certain image element may also be signaled; signaling may indicate to the decoder that motion vectors used in prediction are to be construed at decoding.
Abstract: There are disclosed various methods, apparatuses and computer program products for video encoding. The type of prediction used for a reference picture index may be signaled in the video bit-stream. The omission of motion vectors from the video bit-stream for a certain image element may also be signaled; signaling may indicate to the decoder that motion vectors used in prediction are to be constructed at the decoder. The construction of motion vectors may take place by using disparity information that has been obtained from depth information of the picture being used as a reference.

Proceedings ArticleDOI
29 Oct 2014
TL;DR: Experimental results show that the proposed method significantly improves the inter-view consistency for multiview images and depth maps, compared to those of previous methods.
Abstract: This paper proposes a new inter-view consistent hole filling method in view extrapolation for multi-view image generation. In stereopsis, inter-view consistency regarding structure, color, and luminance is one of the crucial factors that affect the overall viewing quality of three-dimensional image contents. In particular, inter-view inconsistency can induce visual stress on the human visual system. To ensure inter-view consistency, the proposed method fills holes in order from the view nearest to the reference view to the farthest, propagating the filled color information from each preceding view. In addition, a novel depth map filling method is incorporated to achieve inter-view consistency. Experimental results show that the proposed method significantly improves the inter-view consistency of multiview images and depth maps, compared to previous methods.

Patent
10 Oct 2014
TL;DR: In this article, a technique for computing an image of a virtual view based on a plurality of camera views is presented, where two or three camera views that at least partially overlap and overlap with the virtual view are selected among the plurality of views.
Abstract: A technique for computing an image of a virtual view based on a plurality of camera views is presented. One or more cameras provide the plurality of camera views. As to a method aspect of the technique, two or three camera views that at least partially overlap and that at least partially overlap with the virtual view are selected among the plurality of camera views. The image of the virtual view is computed based on objects in the selected camera views using a multilinear relation that relates the selected camera views and the virtual view.

Journal ArticleDOI
TL;DR: The experimental results show that the proposed scheme not only achieves high view synthesis performance, but also reduces the computational complexity of encoding.
Abstract: In 3-D video, view synthesis with depth-image-based rendering is employed to generate any virtual view between available camera views. Distortions in the depth map induce geometry changes in the virtual views and thus degrade the performance of view synthesis. This paper proposes a depth map coding method to improve the performance of view synthesis based on distortion analyses. The major technical innovation of this paper is to formulate the maximum tolerable depth distortion (MTDD) and the depth disocclusion mask (DDM), since such depth sensitivity for view synthesis and inter-view redundancy can be well utilized in coding. To be more specific, we define two different encoders (i.e., a base encoder and a side encoder) for the depth maps in the left and right views, respectively. For base encoding, different types of coding units are extracted based on the distribution of MTDD and assigned different quantization parameters for coding. For side encoding, a warped-SKIP mode is designed to remove inter-view redundancy based on the distribution of DDM. The experimental results show that the proposed scheme not only achieves high view synthesis performance, but also reduces the computational complexity of encoding.

Journal ArticleDOI
TL;DR: An allowable depth distortion (ADD) model is presented for 3D depth map coding, and an ADD-based rate-distortion model is proposed for mode decision and motion/disparity estimation modules aiming at minimizing view synthesis distortion at a given bit rate constraint.
Abstract: Depth video is used as the geometrical information of 3D world scenes in 3D view synthesis. Due to the mismatch between the number of depth levels and disparity levels in the view synthesis, the relationship between depth distortion and rendering position error can be modeled as a many-to-one mapping function, in which different depth distortion values might be projected to the same geometrical distortion in the synthesized virtual view image. Based on this property, we present an allowable depth distortion (ADD) model for 3D depth map coding. Then, an ADD-based rate-distortion model is proposed for mode decision and motion/disparity estimation modules aiming at minimizing view synthesis distortion at a given bit rate constraint. In addition, an ADD-based depth bit reduction algorithm is proposed to further reduce the depth bit rate while maintaining the qualities of the synthesized images. Experimental results in intra depth coding show that the proposed overall algorithm achieves Bjontegaard delta peak signal-to-noise ratio gains of 1.58 and 2.68 dB on average for half and integer-pixel rendering precisions, respectively. In addition, the proposed algorithms are also highly efficient for inter depth coding when evaluated with different metrics.

Journal ArticleDOI
TL;DR: The proposed signal representation improves the interactivity of dense point-based methods, making them appropriate for modeling the scene semantics and free-viewpoint 3DTV applications, and a "selective" warping technique is proposed that takes advantage of temporal coherence to reduce the computational overhead.

Journal ArticleDOI
Xinchen Ye, Jingyu Yang, Hao Huang, Chunping Hou, Yao Wang
TL;DR: Experimental results show that the proposed lightweight multiview imaging approach with Kinect, a handheld integrated depth-color camera, under the depth-image-based rendering framework restores high quality depth maps even for large missing areas, and synthesizes natural multiview images from restored depth maps.
Abstract: The lack of 3-D content has become a bottleneck for the advancement of three-dimensional television (3-DTV), but conventional multicamera arrays for multiview imaging are expensive to set up and cumbersome to use. This paper proposes a lightweight multiview imaging approach with Kinect, a handheld integrated depth-color camera, under the depth-image-based rendering framework. The proposed method consists of two components: depth restoration from noisy and incomplete depth measurements and view synthesis from depth-color pairs. In depth restoration, we propose a moving 2-D polynomial approximation via least squares to suppress quantization errors in the acquired depth values, and propose a progressive edge-guided trilateral filter to fill missing areas of the depth map. Edges extracted from the color image are used to predict the locations of depth discontinuities in missing areas and to guide the proposed trilateral filter to avoid filtering across discontinuities. In view synthesis, we propose a low-rank matrix restoration model to inpaint disocclusion regions, fully exploiting the nonlocal correlations in images, and devise an efficient algorithm under the augmented Lagrange multiplier (ALM) framework. Disocclusion areas are inpainted progressively from the boundaries of disocclusion with an estimated priority consisting of four terms: warping term, reliability term, texture term, and depth term. Experimental results show that our method restores high quality depth maps even for large missing areas, and synthesizes natural multiview images from restored depth maps. Strong 3-D visual experiences are observed when the synthesized multiview images are shown in two types of stereoscopic displays.
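
A bare-bones sketch of the depth-image-based forward warping that produces the disocclusions the paper inpaints, assuming a rectified, purely horizontal camera shift; the parameter names are illustrative.

```python
import numpy as np

def forward_warp(color, depth, baseline_times_focal):
    """Shift each pixel horizontally by its disparity (baseline*focal / Z);
    a z-buffer keeps the nearest surface where pixels collide.  Pixels left
    unfilled are the disocclusion holes to be inpainted."""
    h, w, _ = color.shape
    out = np.zeros_like(color)
    zbuf = np.full((h, w), np.inf)
    disp = baseline_times_focal / np.maximum(depth, 1e-6)
    for y in range(h):
        for x in range(w):
            xt = int(round(x - disp[y, x]))
            if 0 <= xt < w and depth[y, x] < zbuf[y, xt]:
                zbuf[y, xt] = depth[y, x]
                out[y, xt] = color[y, x]
    return out
```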

Proceedings ArticleDOI
01 Jan 2014
TL;DR: This work extends novel-view image synthesis from the common diffuse and opaque image formation model to the reflective and refractive case, using a ray tree of RGBZ images, where each node contains one RGB light path which is to be warped differently depending on the depth Z and the type of path.
Abstract: We extend novel-view image synthesis from the common diffuse and opaque image formation model to the reflective and refractive case. Our approach uses a ray tree of RGBZ images, where each node contains one RGB light path that is warped differently depending on the depth Z and the type of path. At the core of our approach are two efficient procedures for reflective and refractive warping. Different from the diffuse and opaque case, no simple direct solution exists for general geometry. Instead, a per-pixel optimization in combination with informed initial guesses warps an HD image with reflections and refractions in 18 ms on a current mobile GPU. The key application is latency avoidance in remote rendering, in particular for head-mounted displays. Other applications are single-pass stereo or multi-view, motion blur, and depth-of-field rendering, as well as their combinations.

Proceedings ArticleDOI
02 Jul 2014
TL;DR: MPEG started the third phase of FTV standardization in August 2013, targeting super multiview and free navigation applications, which need more flexible camera arrangement, more efficient video coding and better view synthesis.
Abstract: FTV (Free-viewpoint TV) enables users to view a 3D world by freely changing the viewpoint. MPEG has been developing FTV standards since 2001. MVC (Multiview Video Coding) was the first phase of FTV, which enabled the efficient coding of multiview video; view synthesis is not considered in MVC. 3DV (3D Video) is the second phase of FTV, which enables the efficient coding of multiview video and its depth maps for multiview displays; view synthesis between linearly arranged cameras is considered in 3DV. Based on recent developments in 3D technology, MPEG started the third phase of FTV in August 2013, targeting super multiview and free navigation applications. These applications need more flexible camera arrangements, more efficient video coding, and better view synthesis. The vision of this FTV standardization is to establish a new FTV framework that revolutionizes the viewing of 3D scenes.

Journal ArticleDOI
TL;DR: An efficient bit allocation algorithm based on a novel view synthesis distortion model is proposed for the rate-distortion optimized coding of multiview video plus depth sequences in this paper, which can optimally divide a limited bit budget between the texture and depth data.
Abstract: An efficient bit allocation algorithm based on a novel view synthesis distortion model is proposed for the rate-distortion optimized coding of multiview video plus depth sequences in this paper. We decompose an input frame into nonedge blocks and edge blocks. For each nonedge block, we linearly approximate its texture and disparity values, and derive a view synthesis distortion model, which quantifies the impacts of the texture and depth distortions on the qualities of synthesized virtual views. On the other hand, for each edge block, we use its texture and disparity gradients for the distortion model. In addition, we formulate a bit-rate allocation problem in terms of the quantization parameters for texture and depth data. By solving the problem, we can optimally divide a limited bit budget between the texture and depth data, in order to maximize the qualities of synthesized virtual views, as well as those of encoded real views. Experimental results demonstrate that the proposed algorithm yields the average PSNR gains of 1.98 and 2.04 dB in two-view and three-view scenarios, respectively, as compared with a benchmark conventional algorithm.
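
A brute-force stand-in for the allocation step, assuming callables that return the encoded rate and the model-estimated synthesis distortion for a texture/depth QP pair; the paper derives the split from its distortion model rather than by exhaustive search, and all names here are illustrative.

```python
from itertools import product

def allocate_qp(rate_of, distortion_of, budget, qp_range=range(20, 45)):
    """Search texture/depth QP pairs and keep the feasible pair with the
    lowest predicted synthesized-view distortion.

    rate_of(qt, qd), distortion_of(qt, qd): assumed user-supplied callables.
    """
    best = None
    for qt, qd in product(qp_range, qp_range):
        if rate_of(qt, qd) <= budget:
            d = distortion_of(qt, qd)
            if best is None or d < best[0]:
                best = (d, qt, qd)
    return None if best is None else best[1:]  # (texture QP, depth QP)
```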

Proceedings ArticleDOI
01 Nov 2014
TL;DR: It is demonstrated that encoding inter-coded depth block residuals with quantization at pixel domain is more efficient than the intra-coding techniques relying on explicit edge preservation.
Abstract: With the growing demand for 3D and multi-view video content, efficient depth data coding has become a vital issue in the image and video coding area. In this paper, we propose a simple depth coding scheme using multiple prediction modes that exploit the temporal correlation of depth maps. Current depth coding techniques mostly depend on intra-coding modes that cannot take advantage of the temporal redundancy in depth maps or of the higher spatial redundancy in inter-predicted depth residuals. Depth maps are characterized by smooth regions with sharp edges that play an important role in the view synthesis process. As depth maps are highly sensitive to coding errors, the use of transforms or the approximation of edges by explicit edge modelling has an impact on view synthesis quality. Moreover, lossy compression of depth maps brings additional geometrical distortion to the synthesized view. We demonstrate that encoding inter-coded depth block residuals with quantization in the pixel domain is more efficient than intra-coding techniques relying on explicit edge preservation. On standard 3D video sequences, the proposed depth coding achieves superior image quality of synthesized views against the new 3D-HEVC standard for depth map bit rates of 0.25 bpp or higher.

Proceedings ArticleDOI
01 Oct 2014
TL;DR: It is shown that the visual quality of the interpolated view can be significantly improved by enforcing prior knowledge on the admissible deformations of edges under projective transformation; results show that the proposed approach is very effective.
Abstract: Depth image based rendering is a well-known technology for generating virtual views in between a limited set of views acquired by a camera array. Intermediate views are rendered by warping image pixels based on their depth. Nonetheless, depth maps are usually imperfect, as they need to be estimated through stereo matching algorithms; moreover, for representation and transmission requirements, depth values are quantized. Such depth representation errors translate into a warping error when generating intermediate views, thus impacting the rendered image quality. We observe that depth errors turn out to be very critical when they affect object contours, since in such cases they cause significant structural distortion in the warped objects. This paper presents an algorithm to improve the visual quality of the synthesized views by enforcing the shape of the edges in the presence of erroneous depth estimates. We show that it is possible to significantly improve the visual quality of the interpolated view by enforcing prior knowledge on the admissible deformations of edges under projective transformation. Both visual and objective results show that the proposed approach is very effective.

Proceedings ArticleDOI
01 Sep 2014
TL;DR: Analyzing the performance of several commonly used objective quality metrics on FVV sequences, which were synthesized from decompressed depth data, using subjective scores as ground truth showed that commonly used metrics were not reliable predictors of perceived image quality when different contents and distortions were considered.
Abstract: Free-viewpoint television is expected to create a more natural and interactive viewing experience by providing the ability to interactively change the viewpoint to enjoy a 3D scene. To render new virtual viewpoints, free-viewpoint systems rely on view synthesis. However, it is known that most objective metrics fail at predicting perceived quality of synthesized views. Therefore, it is legitimate to question the reliability of commonly used objective metrics to assess the quality of free-viewpoint video (FVV) sequences. In this paper, we analyze the performance of several commonly used objective quality metrics on FVV sequences, which were synthesized from decompressed depth data, using subjective scores as ground truth. Statistical analyses showed that commonly used metrics were not reliable predictors of perceived image quality when different contents and distortions were considered. However, the correlation improved when considering individual conditions, which indicates that the artifacts produced by some view synthesis algorithms might not be correctly handled by current metrics.
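
A minimal sketch of the kind of statistical analysis described, correlating an objective metric's outputs with subjective mean opinion scores; names are illustrative.

```python
from scipy.stats import pearsonr, spearmanr

def metric_reliability(objective_scores, mos):
    """Pearson (linear) and Spearman (rank-order) correlation between an
    objective quality metric and subjective scores; low values flag a
    metric that is unreliable on synthesized views."""
    plcc, _ = pearsonr(objective_scores, mos)
    srocc, _ = spearmanr(objective_scores, mos)
    return plcc, srocc
```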

Patent
02 Apr 2014
TL;DR: In this article, a method and apparatus for a three-dimensional encoding or decoding system incorporating view synthesis prediction (VSP) with reduced computational complexity and/or memory access bandwidth are disclosed.
Abstract: A method and apparatus for a three-dimensional encoding or decoding system incorporating view synthesis prediction (VSP) with reduced computational complexity and/or memory access bandwidth are disclosed. The system applies the VSP process to the texture data only and applies a non-VSP process to the depth data. Therefore, when a current texture block in a dependent view is coded according to VSP by backward warping the current texture block to the reference picture using an associated depth block, and the motion parameter inheritance (MPI) mode is selected for the corresponding depth block in the dependent view, the corresponding depth block in the dependent view is encoded or decoded using non-VSP inter-view prediction based on motion information inherited from the current texture block.

Proceedings ArticleDOI
02 Jul 2014
TL;DR: An experimental multiview video production, processing, and delivery chain developed at Poznan University of Technology for research on free-viewpoint television; no cabling is needed in the system, which is important for shooting real-world events.
Abstract: The paper describes an experimental multi-view video production, processing, and delivery chain developed at Poznan University of Technology for research on free-viewpoint television. The multiview-video acquisition system consists of HD camera units with wireless synchronization, wireless control, video storage, and power supply units. Therefore, no cabling is needed in the system, which is very important for shooting real-world events. The system is mostly used for a nearly circular setup of cameras, but the locations of the cameras are arbitrary, and procedures for system calibration and multiview video correction are considered. The paper also deals with adapting the techniques implemented in the Depth Estimation Reference Software and View Synthesis Reference Software to circular camera arrangements.

Patent
Ying Chen, Ye-Kui Wang, Li Zhang
10 Jan 2014
TL;DR: A method of decoding video data includes determining whether a reference index for a current block corresponds to an inter-view reference picture and, when it does, obtaining data indicating a view synthesis prediction (VSP) mode for the current block.
Abstract: In an example, a method of decoding video data includes determining whether a reference index for a current block corresponds to an inter-view reference picture, and when the reference index for the current block corresponds to the inter-view reference picture, obtaining, from an encoded bitstream, data indicating a view synthesis prediction (VSP) mode of the current block, where the VSP mode for the reference index indicates whether the current block is predicted with view synthesis prediction from the inter-view reference picture.

Journal ArticleDOI
TL;DR: Visto, a novel view synthesis method, uses a reference input view to generate synthesized views at nearby viewpoints and tends to implicitly inherit the image characteristics of the reference view without the explicit use of image priors or texture modeling.
Abstract: In this paper, we present a novel view synthesis method named Visto, which uses a reference input view to generate synthesized views in nearby viewpoints. We formulate the problem as a joint optimization of inter-view texture and depth map similarity, a framework that is significantly different from other traditional approaches. As such, Visto tends to implicitly inherit the image characteristics from the reference view without the explicit use of image priors or texture modeling. Visto assumes that each patch is available in both the synthesized and reference views and thus can be applied to the common area between the two views but not the out-of-region area at the border of the synthesized view. Visto uses a Gauss-Seidel-like iterative approach to minimize the energy function. Simulation results suggest that Visto can generate seamless virtual views and outperform other state-of-the-art methods.

Proceedings ArticleDOI
02 Jul 2014
TL;DR: This work proposes to improve the view synthesis step by per-pixel selection of the projection method and can improve depth-based super resolution upsampling by 0.64% of dBR on average for the total coded bitrate and 0.55% of dBR for synthesized views.
Abstract: Advances in 3D video technology expand the availability of hardware for 3D video generation and display. Some 3D capture and coding arrangements take advantage of a mixed resolution setup. In contrast to typical 3D multiview video, the mixed resolution scenario assumes that a subset of views is coded at reduced spatial resolution. After decoding, the low resolution views have to be upsampled in order to preserve a homogeneous resolution of the rendered 3D video. We improve the depth-based super resolution technique that uses a view synthesis process as an essential step. We propose to improve the view synthesis step by per-pixel selection of the projection method. The method is tested on data coded using a mixed resolution 3D video codec implemented on top of the 3DV-ATM reference software and evaluated under the JCT-3V common test conditions. Simulation results show that our method can improve the depth-based super resolution upsampling by 0.64% of dBR on average for the total coded bitrate and 0.55% of dBR on average for synthesized views. The aggregated coding gain with respect to the full resolution scenario is improved by 7.71% of dBR on average for the total coded bitrate and 1.11% of dBR for synthesized views.

Patent
Jin-Young Lee, Byeong-Doo Choi, Min-Woo Park, Ho-Cheon Wey, Jae-Won Yoon, Yong-jin Cho
17 Apr 2014
TL;DR: A multi-view video decoding apparatus and method and a multi-view encoding apparatus and method are presented, in which the inclusion of view synthesis prediction candidates in the merge candidate list is determined based on whether view synthesis prediction is performed on an adjacent block of the current block and on the current block.
Abstract: Provided are a multi-view video decoding apparatus and method and a multi-view encoding apparatus and method. The decoding method includes: determining whether a prediction mode of a current block being decoded is a merge mode; when the prediction mode is determined to be the merge mode, forming a merge candidate list including at least one of an inter-view candidate, a spatial candidate, a disparity candidate, a view synthesis prediction candidate, and a temporal candidate; and predicting the current block by selecting a merge candidate for predicting the current block from the merge candidate list, wherein whether to include, in the merge candidate list, at least one of a view synthesis prediction candidate for an adjacent block of the current block and a view synthesis prediction candidate for the current block is determined based on whether view synthesis prediction is performed on the adjacent block and the current block.