Author

Pengfei Wan

Bio: Pengfei Wan is an academic researcher from the Hong Kong University of Science and Technology. The author has contributed to research in topics: Computer science & Motion estimation. The author has an h-index of 8 and has co-authored 23 publications receiving 212 citations.

Papers
Journal ArticleDOI
TL;DR: This work argues that a graph-signal smoothness prior, one defined on a graph embedding the image structure, is an appropriate prior for the bit-depth enhancement problem, and proposes an efficient approximation strategy that estimates the AC component of the desired signal in a maximum a posteriori formulation, efficiently computed via convex programming.
Abstract: When images at low bit-depth are rendered on high bit-depth displays, the missing least significant bits need to be estimated. We study the image bit-depth enhancement problem: estimating an original image from its quantized version from a minimum mean squared error (MMSE) perspective. We first argue that a graph-signal smoothness prior, one defined on a graph embedding the image structure, is an appropriate prior for the bit-depth enhancement problem. We next show that directly solving for the MMSE solution is, in general, too computationally expensive to be practical. We then propose an efficient approximation strategy. In particular, we first estimate the AC component of the desired signal in a maximum a posteriori (MAP) formulation, efficiently computed via convex programming. We then compute the DC component with an MMSE criterion in closed form given the computed AC component. Experiments show that our proposed two-step approach improves on conventional bit-depth enhancement schemes in both objective and subjective comparisons.
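The two-step AC/DC strategy can be illustrated on a 1-D toy signal. Below is a minimal sketch, assuming a simple path-graph Laplacian as the smoothness prior and a closed-form quadratic solve in place of the paper's convex program; the bin-recentered DC step is only a stand-in for the MMSE criterion.

```python
import numpy as np

def dequantize_two_step(q, step, lam=4.0):
    """Estimate a high bit-depth signal from quantized samples q (bin centers)."""
    n = len(q)
    # Path-graph Laplacian encoding the smoothness prior x^T L x.
    L = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    L[0, 0] = L[-1, -1] = 1
    # Step 1 (MAP-like): minimize ||x - q||^2 + lam * x^T L x, closed form here.
    x = np.linalg.solve(np.eye(n) + lam * L, q)
    ac = x - x.mean()                                  # AC component
    # Step 2: pick the DC component; recentering inside the quantization bins
    # is a crude stand-in for the paper's MMSE criterion.
    dc = np.mean(np.clip(x, q - step / 2, q + step / 2))
    return dc + ac

q = np.round(np.linspace(0.0, 3.0, 16))               # coarsely quantized ramp
print(dequantize_two_step(q, step=1.0))
```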

42 citations

Proceedings ArticleDOI
09 Jul 2012
TL;DR: This paper proposes a novel method to generate high bit-depth (HBD) images from a single low bit-depth (LBD) image by reconstructing the least significant bits (LSBs) of the LBD image after it is rescaled to high bit-depth.
Abstract: In this paper, we address the problem of image bit-depth expansion and present a novel method to generate high bit-depth (HBD) images from a single low bit-depth (LBD) image. We expand image bit-depth by reconstructing the least significant bits (LSBs) of the LBD image after it is rescaled to high bit-depth. For image regions whose intensities are neither locally maximum nor minimum, neighborhood flooding is applied to convert the 2D interpolation problem into 1D interpolation; for local maxima/minima (LMM) regions, where interpolation is not applicable, a virtual skeleton marking algorithm is proposed to convert the problematic 2D extrapolation problem into 1D interpolation. Finally, a content-adaptive reconstruction model is proposed to obtain the output HBD image. Experimental results show that the proposed method significantly outperforms existing methods in PSNR and SSIM without introducing contouring artifacts.
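The 1-D interpolation idea can be sketched on a single scanline: anchor one mid-bin sample at the center of every constant-intensity run and interpolate the missing LSBs between anchors. This is only a toy analogue, assuming a 1-D signal; the paper's neighborhood flooding and virtual skeleton marking for 2-D LMM regions are not reproduced.

```python
import numpy as np

def expand_bit_depth_1d(lbd, low_bits=4, high_bits=8):
    """Expand a 1-D scanline from low_bits to high_bits."""
    gain = 2 ** (high_bits - low_bits)
    x = np.arange(len(lbd))
    # Start index of every constant-intensity run.
    starts = np.concatenate(([0], np.flatnonzero(np.diff(lbd)) + 1))
    ends = np.concatenate((starts[1:], [len(lbd)]))
    centers = (starts + ends - 1) / 2.0                # one anchor per run
    values = lbd[starts] * gain + gain / 2.0           # mid-bin anchor values
    return np.interp(x, centers, values)               # fill in the LSBs

scanline = np.array([3, 3, 3, 4, 4, 5, 5, 5, 5])
print(expand_bit_depth_1d(scanline))
```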

42 citations

Proceedings ArticleDOI
19 May 2013
TL;DR: The approach is to arrange all the similar images into a tree structure and then apply video coding techniques along each branch; to maximize the inter-image correlation between adjacent photos, a minimum spanning tree (MST) subject to a maximum depth limit is used to ensure fast access to all images.
Abstract: Advances in multimedia technologies have resulted in an explosive growth of pictures on personal computers and in the cloud. Typically, many pictures taken on the same occasion are similar, and the cost to store and transmit them can be significant, so it is important to find an efficient method to store these pictures. This paper proposes a compression scheme for similar images. Our approach is to arrange all the similar images into a tree structure and then apply video coding techniques along each branch. To maximize the inter-image correlation between adjacent photos, we consider the minimum spanning tree (MST) subject to a maximum depth limit to ensure fast access to all images. This structure is encoded with the latest video coding standard, High Efficiency Video Coding (HEVC), which is reported to have an advantage in high-definition video/image compression. The scheme also supports deleting, adding, and modifying images. Experiments show that the proposed method saves 75% of storage space compared with the JPEG format.
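The tree-building step can be sketched with off-the-shelf graph routines. The sketch below assumes mean absolute pixel difference as the inter-image cost, and omits both the maximum-depth constraint and the HEVC encoding of each branch.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree, breadth_first_order

def build_coding_tree(images, root=0):
    n = len(images)
    cost = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = np.abs(images[i].astype(float) - images[j].astype(float)).mean()
            cost[i, j] = cost[j, i] = d + 1e-9         # keep costs strictly positive
    mst = minimum_spanning_tree(cost)                  # sparse (n x n) tree
    order, parents = breadth_first_order(mst, root, directed=False)
    # Encode images in BFS order, each predicted from its parent (video-style).
    return order, parents

images = [np.random.randint(0, 256, (8, 8)) for _ in range(4)]
order, parents = build_coding_tree(images)
print(order, parents)
```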

31 citations

Proceedings ArticleDOI
01 Oct 2014
TL;DR: This paper defines smoothness via a signal-dependent graph Laplacian, and argues that MAP can still be used to efficiently estimate the AC component of the desired HBD signal, which along with a distortion-minimizing DC component can result in a good approximate solution that minimizes the expected distortion.
Abstract: While modern displays offer high dynamic range (HDR) with large bit-depth for each rendered pixel, the bulk of legacy image and video content was captured using cameras with shallower bit-depth. In this paper, we study the bit-depth enhancement problem for images, so that a high bit-depth (HBD) image can be reconstructed from an input low bit-depth (LBD) image. The key idea is to apply appropriate smoothing given the constraint that the reconstructed signal must lie within the per-pixel quantization bins. Specifically, we first define smoothness via a signal-dependent graph Laplacian, so that natural image gradients can nonetheless be interpreted as low frequencies. Given the defined smoothness prior and the observed LBD image, we then demonstrate that computing the most probable signal via maximum a posteriori (MAP) estimation can lead to large expected distortion. However, we argue that MAP can still be used to efficiently estimate the AC component of the desired HBD signal, which, along with a distortion-minimizing DC component, results in a good approximate solution that minimizes the expected distortion. Experimental results show that our proposed method outperforms existing bit-depth enhancement methods in terms of reconstruction error.
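A minimal sketch of the bin-constrained smoothing on a 1-D toy, assuming projected gradient descent on a signal-dependent quadratic smoothness term; the paper's actual graph construction, MAP formulation, and MMSE DC step are more involved.

```python
import numpy as np

def map_within_bins(q, step, rate=0.1, iters=500):
    n = len(q)
    lo, hi = q - step / 2, q + step / 2                # per-pixel quantization bins
    # Signal-dependent weights: weaker links across large gradients, so that
    # natural image edges are not over-smoothed.
    w = np.exp(-np.diff(q) ** 2 / (2 * step ** 2))
    x = q.astype(float).copy()
    for _ in range(iters):
        diffs = np.diff(x)
        grad = np.zeros(n)
        grad[:-1] -= w * diffs     # gradient of sum_i w_i (x_{i+1} - x_i)^2 / 2
        grad[1:] += w * diffs
        x = np.clip(x - rate * grad, lo, hi)           # project back into the bins
    return x

q = np.repeat([1.0, 2.0, 3.0], 5)                      # quantized staircase
print(map_within_bins(q, step=1.0))
```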

29 citations

Journal ArticleDOI
TL;DR: A novel neural network, named PMP-Net++, is designed to mimic the behavior of an earth mover, improving the quality of the predicted complete shape in point cloud completion; a transformer-enhanced representation learning network significantly improves its completion performance.
Abstract: Point cloud completion concerns predicting the missing parts of incomplete 3D shapes. A common strategy is to generate a complete shape directly from the incomplete input. However, the unordered nature of point clouds degrades the generation of high-quality 3D shapes, as the detailed topology and structure of unordered points are hard to capture during the generative process using an extracted latent code. We address this problem by formulating completion as a point cloud deformation process. Specifically, we design a novel neural network, named PMP-Net++, to mimic the behavior of an earth mover: it moves each point of the incomplete input to obtain a complete point cloud, where the total distance along the point-moving paths (PMPs) should be shortest. PMP-Net++ therefore predicts a unique PMP for each point according to the constraint on point-moving distance. The network learns a strict and unique point-level correspondence, and thus improves the quality of the predicted complete shape. Moreover, since moving points relies heavily on the per-point features learned by the network, we further introduce a transformer-enhanced representation learning network, which significantly improves the completion performance of PMP-Net++. We conduct comprehensive experiments on shape completion and further explore the application to point cloud up-sampling, demonstrating the non-trivial improvement of PMP-Net++ over state-of-the-art point cloud completion and up-sampling methods.
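The point-moving-path intuition can be illustrated with a toy, assuming the complete target set is known and points simply step toward their nearest targets while the total path length is tracked; PMP-Net++ instead learns the per-point displacements with its transformer-enhanced network.

```python
import numpy as np

def move_points(partial, target, steps=3):
    """Move each point toward the target set in small steps, tracking the
    total path length that PMP-Net++ seeks to keep short."""
    pts = partial.copy()
    total_path = 0.0
    for _ in range(steps):
        # Brute-force nearest-neighbour assignment for the toy.
        d = np.linalg.norm(pts[:, None, :] - target[None, :, :], axis=-1)
        nearest = target[d.argmin(axis=1)]
        delta = (nearest - pts) / steps
        total_path += np.linalg.norm(delta, axis=1).sum()
        pts = pts + delta
    return pts, total_path

partial = np.random.rand(64, 3)                        # incomplete point cloud
complete = np.random.rand(128, 3)                      # stand-in target shape
moved, path_len = move_points(partial, complete)
print(moved.shape, round(path_len, 3))
```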

23 citations


Cited by
Journal ArticleDOI
TL;DR: Experimental results show that the proposed multiresolution-GFT scheme outperforms H.264 intra by 6.8 dB on average in peak signal-to-noise ratio at the same bit rate.
Abstract: Piecewise smooth (PWS) images (e.g., depth maps or animation images) contain unique signal characteristics such as sharp object boundaries and slowly varying interior surfaces. Leveraging recent advances in graph signal processing, in this paper, we propose to compress PWS images using suitable graph Fourier transforms (GFTs) to minimize the total signal representation cost of each pixel block, considering both the sparsity of the signal's transform coefficients and the compactness of the transform description. Unlike fixed transforms, such as the discrete cosine transform, we can adapt the GFT to a particular class of pixel blocks. In particular, we select one among a defined search space of GFTs to minimize the total representation cost via our proposed algorithms, leveraging graph optimization techniques such as spectral clustering and minimum graph cuts. Furthermore, for practical implementation of the GFT, we introduce two techniques to reduce computational complexity. First, at the encoder, we low-pass filter and downsample a high-resolution (HR) pixel block to obtain a low-resolution (LR) one, so that an LR-GFT can be employed. At the decoder, upsampling and interpolation are performed adaptively along HR boundaries coded using arithmetic edge coding, so that sharp object boundaries are well preserved. Second, instead of computing the GFT from a graph in real time via eigen-decomposition, the most popular LR-GFTs are pre-computed and stored in a table for lookup during encoding and decoding. Using depth maps and computer-graphics images as examples of PWS images, experimental results show that our proposed multiresolution-GFT scheme outperforms H.264 intra by 6.8 dB on average in peak signal-to-noise ratio at the same bit rate.
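The core transform can be sketched for a single block: build a 4-connected grid graph, zero out edges that cross a known boundary, and use the Laplacian eigenvectors as the GFT basis. This is a minimal sketch; the codec's search over candidate GFTs, multiresolution coding, and table lookup are omitted.

```python
import numpy as np

def block_gft(block, edge_mask=None):
    h, w = block.shape
    n = h * w
    W = np.zeros((n, n))
    for r in range(h):
        for c in range(w):
            for dr, dc in ((0, 1), (1, 0)):            # right and down neighbours
                rr, cc = r + dr, c + dc
                if rr < h and cc < w:
                    # Cut the edge if it crosses the object boundary.
                    cut = edge_mask is not None and edge_mask[r, c] != edge_mask[rr, cc]
                    W[r * w + c, rr * w + cc] = W[rr * w + cc, r * w + c] = 0.0 if cut else 1.0
    L = np.diag(W.sum(axis=1)) - W                     # combinatorial graph Laplacian
    evals, evecs = np.linalg.eigh(L)                   # GFT basis = eigenvectors
    return evals, evecs.T @ block.ravel()              # graph frequencies, coefficients

block = np.kron(np.array([[10.0, 200.0]]), np.ones((4, 2)))  # block with a sharp boundary
evals, coeffs = block_gft(block, edge_mask=block > 100)
print(np.round(coeffs, 2))                             # energy compacts into few coefficients
```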

225 citations

Journal ArticleDOI
TL;DR: In this article, the graph Laplacian regularizer for image denoising is analyzed in the continuous domain, where its convergence to a continuous-domain functional is established and an optimal graph Laplacian regularizer for denoising is derived.
Abstract: Inverse imaging problems are inherently underdetermined, and hence, it is important to employ appropriate image priors for regularization. One recent popular prior—the graph Laplacian regularizer—assumes that the target pixel patch is smooth with respect to an appropriately chosen graph. However, the mechanisms and implications of imposing the graph Laplacian regularizer on the original inverse problem are not well understood. To address this problem, in this paper, we interpret neighborhood graphs of pixel patches as discrete counterparts of Riemannian manifolds and perform analysis in the continuous domain, providing insights into several fundamental aspects of graph Laplacian regularization for image denoising. Specifically, we first show the convergence of the graph Laplacian regularizer to a continuous-domain functional, integrating a norm measured in a locally adaptive metric space. Focusing on image denoising, we derive an optimal metric space assuming non-local self-similarity of pixel patches, leading to an optimal graph Laplacian regularizer for denoising in the discrete domain. We then interpret graph Laplacian regularization as an anisotropic diffusion scheme to explain its behavior during iterations, e.g., its tendency to promote piecewise smooth signals under certain settings. To verify our analysis, an iterative image denoising algorithm is developed. Experimental results show that our algorithm performs competitively with state-of-the-art denoising methods, such as BM3D for natural images, and outperforms them significantly for piecewise smooth images.
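For a fixed graph, the regularizer reduces to a linear solve: minimizing ||y - x||^2 + lam * x^T L x gives (I + lam * L) x = y. Below is a one-shot sketch on a 1-D patch, assuming Gaussian intensity affinities; the paper's optimal metric derivation and iterative diffusion scheme are omitted.

```python
import numpy as np

def glr_denoise(y, lam=5.0, sigma=0.1):
    """Solve min_x ||y - x||^2 + lam * x^T L x  via  (I + lam * L) x = y."""
    n = len(y)
    # Fully connected patch graph with Gaussian intensity affinities.
    W = np.exp(-(y[:, None] - y[None, :]) ** 2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W
    return np.linalg.solve(np.eye(n) + lam * L, y)

clean = np.repeat([0.2, 0.8], 8)                       # piecewise constant patch
noisy = clean + 0.05 * np.random.randn(16)
print(np.round(glr_denoise(noisy), 2))                 # edge preserved, noise suppressed
```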

180 citations

Journal ArticleDOI
TL;DR: This paper proposes a method for editing images from human instructions: given an input image and a written instruction that tells the model what to do, the model follows the instruction to edit the image.
Abstract: We propose a method for editing images from human instructions: given an input image and a written instruction that tells the model what to do, our model follows these instructions to edit the image. To obtain training data for this problem, we combine the knowledge of two large pretrained models -- a language model (GPT-3) and a text-to-image model (Stable Diffusion) -- to generate a large dataset of image editing examples. Our conditional diffusion model, InstructPix2Pix, is trained on our generated data, and generalizes to real images and user-written instructions at inference time. Since it performs edits in the forward pass and does not require per-example fine-tuning or inversion, our model edits images quickly, in a matter of seconds. We show compelling editing results for a diverse collection of input images and written instructions.
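A minimal usage sketch, assuming the publicly released checkpoint and the Hugging Face diffusers port of the pipeline; the parameter values here are illustrative, not the paper's settings.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInstructPix2PixPipeline

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

image = Image.open("input.jpg").convert("RGB")         # any RGB photo
edited = pipe(
    "make it look like a watercolor painting",         # the written instruction
    image=image,
    num_inference_steps=20,
    image_guidance_scale=1.5,                          # fidelity to the input image
).images[0]
edited.save("edited.jpg")
```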

153 citations

Book
24 Aug 2021
TL;DR: In this article, the authors overview graph spectral techniques in graph signal processing (GSP) specifically for image/video processing, including image compression, image restoration, image filtering, and image segmentation.
Abstract: The recent advent of graph signal processing (GSP) has spurred intensive studies of signals that live naturally on irregular data kernels described by graphs (e.g., social networks, wireless sensor networks). Though a digital image contains pixels that reside on a regularly sampled 2-D grid, if one can design an appropriate underlying graph connecting pixels with weights that reflect the image structure, then one can interpret the image (or image patch) as a signal on a graph and apply GSP tools for processing and analysis of the signal in the graph spectral domain. In this paper, we overview recent graph spectral techniques in GSP specifically for image/video processing. The topics covered include image compression, image restoration, image filtering, and image segmentation.

126 citations

Journal ArticleDOI
TL;DR: A new graph-signal smoothness prior (LERaG), based on the left eigenvectors of the random walk graph Laplacian matrix, is proposed for JPEG soft decoding; it has desirable image filtering properties with low computation overhead.
Abstract: Given the prevalence of joint photographic experts group (JPEG) compressed images, optimizing image reconstruction from the compressed format remains an important problem. Instead of simply reconstructing a pixel block from the centers of indexed discrete cosine transform (DCT) coefficient quantization bins (hard decoding), soft decoding reconstructs a block by selecting appropriate coefficient values within the indexed bins with the help of signal priors. The challenge thus lies in how to define suitable priors and apply them effectively. In this paper, we combine three image priors (a Laplacian prior for DCT coefficients, a sparsity prior, and a graph-signal smoothness prior for image patches) to construct an efficient JPEG soft decoding algorithm. Specifically, we first use the Laplacian prior to compute a minimum mean square error initial solution for each code block. Next, we show that while the sparsity prior can reduce block artifacts, limiting the size of the overcomplete dictionary (to lower computation) would lead to poor recovery of high DCT frequencies. To alleviate this problem, we design a new graph-signal smoothness prior (the desired signal has mainly low graph frequencies) based on the left eigenvectors of the random walk graph Laplacian matrix (LERaG). Compared with previous graph-signal smoothness priors, LERaG has desirable image filtering properties with low computation overhead. We demonstrate how LERaG can facilitate recovery of high DCT frequencies of a piecewise smooth signal via an interpretation of low graph frequency components as relaxed solutions to normalized cut in spectral clustering. Finally, we construct a soft decoding algorithm using the three signal priors with appropriate prior weights. Experimental results show that our proposal noticeably outperforms state-of-the-art soft decoding algorithms in both objective and subjective evaluations.
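The bin-constraint mechanics of soft decoding can be sketched in isolation: alternate between applying some image prior and projecting the DCT coefficients back into their indexed quantization bins. The smooth callback below is a crude stand-in for the paper's three priors, and the quantization step is assumed uniform.

```python
import numpy as np
from scipy.fftpack import dct, idct

def dct2(x):
    return dct(dct(x.T, norm='ortho').T, norm='ortho')

def idct2(c):
    return idct(idct(c.T, norm='ortho').T, norm='ortho')

def soft_decode_block(coeff_idx, qstep, smooth, iters=10):
    """coeff_idx: quantization indices of one 8x8 DCT block."""
    lo, hi = (coeff_idx - 0.5) * qstep, (coeff_idx + 0.5) * qstep
    x = idct2(coeff_idx * qstep)                       # hard-decoded starting point
    for _ in range(iters):
        x = smooth(x)                                  # apply an image prior
        c = np.clip(dct2(x), lo, hi)                   # project into the indexed bins
        x = idct2(c)
    return x

idx = np.zeros((8, 8))
idx[0, 0], idx[0, 1] = 16, 3
out = soft_decode_block(idx, qstep=8.0, smooth=lambda im: 0.5 * im + 0.5 * im.mean())
print(np.round(out[:2, :4], 1))
```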

114 citations