Author
Hongbin Zha
Other affiliations: Shanghai Jiao Tong University, Chinese Academy of Sciences, Fujitsu
Bio: Hongbin Zha is an academic researcher at Peking University. He has contributed to research on image segmentation and feature extraction, has an h-index of 44, and has co-authored 435 publications receiving 7,288 citations. Previous affiliations of Hongbin Zha include Shanghai Jiao Tong University and the Chinese Academy of Sciences.
Papers published on a yearly basis
Papers
08 Sep 2018
TL;DR: A novel deep network architecture for single-image deraining, built from deep convolutional and recurrent neural networks and driven by contextual information, is proposed; it outperforms state-of-the-art approaches under all evaluation metrics.
Abstract: Rain streaks can severely degrade visibility, causing many current computer vision algorithms to fail, so it is necessary to remove rain from images. We propose a novel deep network architecture based on deep convolutional and recurrent neural networks for single-image deraining. As contextual information is very important for rain removal, we first adopt a dilated convolutional neural network to acquire a large receptive field, and we modify the network to better fit the rain removal task. In heavy rain, rain streaks have various directions and shapes and can be regarded as the accumulation of multiple rain streak layers. By incorporating the squeeze-and-excitation block, we assign different alpha values to the rain streak layers according to their intensity and transparency. Since rain streak layers overlap with each other, removing the rain in one stage is difficult, so we further decompose rain removal into multiple stages. A recurrent neural network is incorporated to preserve useful information from previous stages and benefit rain removal in later stages. We conduct extensive experiments on both synthetic and real-world datasets; our proposed method outperforms state-of-the-art approaches under all evaluation metrics. Code and supplementary material are available at our project webpage: https://xialipku.github.io/RESCAN.
539 citations
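The deraining entry above combines three reusable ingredients: dilated convolutions for a large receptive field, a squeeze-and-excitation (SE) block that re-weights rain-streak feature channels, and multi-stage recurrent removal. The PyTorch sketch below is a minimal illustrative reconstruction of those ingredients, not the authors' released code (which is linked from the project webpage); module names and sizes are assumptions, and the recurrence is simplified to re-applying one shared stage.

```python
# Minimal sketch, assuming PyTorch; illustrates dilated convs + SE block
# + multi-stage removal from the paper above. All names/sizes are ours.
import torch
import torch.nn as nn


class SEBlock(nn.Module):
    """Channel re-weighting: learns per-channel 'alpha' values."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        w = x.mean(dim=(2, 3))            # squeeze: global average pool
        w = self.fc(w)[:, :, None, None]  # excitation: per-channel weights
        return x * w


class DerainStage(nn.Module):
    """One stage: dilated convs (large receptive field) + SE re-weighting."""
    def __init__(self, channels=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=2, dilation=2), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=4, dilation=4), nn.ReLU(),
            SEBlock(channels),
            nn.Conv2d(channels, 3, 3, padding=1))

    def forward(self, x):
        return self.body(x)  # predicted rain-streak layer


def derain(image, stage, num_stages=4):
    """Recurrent removal: subtract an estimated rain layer at each stage."""
    x = image
    for _ in range(num_stages):
        x = x - stage(x)  # later stages see the partially cleaned image
    return x
```

The real model also carries hidden state between stages via a recurrent unit; the sketch omits that for brevity.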
TL;DR: A novel framework based on the assumption that the input high-dimensional data lie on an intrinsically low-dimensional Riemannian manifold; it learns the intrinsic geometric structure of the data, preserves radial geodesic distances, and yields regular embeddings.
Abstract: Recently, manifold learning has been widely exploited in pattern recognition, data analysis, and machine learning. This paper presents a novel framework, called Riemannian manifold learning (RML), based on the assumption that the input high-dimensional data lie on an intrinsically low-dimensional Riemannian manifold. The main idea is to formulate the dimensionality reduction problem as a classical problem in Riemannian geometry, that is, how to construct coordinate charts for a given Riemannian manifold? We implement the Riemannian normal coordinate chart, which has been the most widely used in Riemannian geometry, for a set of unorganized data points. First, two input parameters (the neighborhood size k and the intrinsic dimension d) are estimated based on an efficient simplicial reconstruction of the underlying manifold. Then, the normal coordinates are computed to map the input high-dimensional data into a low-dimensional space. Experiments on synthetic data, as well as real-world images, demonstrate that our algorithm can learn intrinsic geometric structures of the data, preserve radial geodesic distances, and yield regular embeddings.
418 citations
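The core of the normal-coordinate chart in the entry above can be approximated compactly: estimate a tangent plane at a base point, then map each sample to tangent coordinates rescaled to its geodesic radius, which preserves radial geodesic distances. The NumPy/SciPy sketch below is a simplification of the paper's chart construction (it skips the simplicial parameter estimation and uses local PCA for the tangent plane); function and parameter names are ours.

```python
# Minimal sketch of a radial-geodesic-preserving embedding, assuming
# NumPy/SciPy and a connected k-NN graph. Not the paper's full algorithm.
import numpy as np
from scipy.spatial import cKDTree
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra


def rml_embed(X, k=8, d=2, base=0):
    n = len(X)
    # k-NN graph with Euclidean edge lengths (approximates the manifold metric)
    tree = cKDTree(X)
    dist, idx = tree.query(X, k=k + 1)
    rows = np.repeat(np.arange(n), k)
    G = csr_matrix((dist[:, 1:].ravel(), (rows, idx[:, 1:].ravel())), (n, n))
    geo = dijkstra(G, directed=False, indices=base)  # geodesic radii

    # Tangent plane at the base point from local PCA of its neighbors
    nbrs = X[idx[base, 1:]] - X[base]
    _, _, Vt = np.linalg.svd(nbrs, full_matrices=False)
    T = Vt[:d].T                                     # D x d tangent basis

    # Normal coordinates: direction in the tangent plane, length = geodesic
    Y = (X - X[base]) @ T
    norms = np.linalg.norm(Y, axis=1)
    norms[norms == 0] = 1.0                          # base point maps to 0
    return Y / norms[:, None] * geo[:, None]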
14 May 2012
TL;DR: An efficient real-time autonomous driving motion planner with trajectory optimization is proposed that can reduce the planning time by 52% and improve the trajectory quality.
Abstract: In this paper, an efficient real-time autonomous driving motion planner with trajectory optimization is proposed. The planner first discretizes the plan space and searches for the best trajectory based on a set of cost functions. Then an iterative optimization is applied to both the path and speed of the resultant trajectory. The post-optimization is of low computational complexity and is able to converge to a higher-quality solution within a few iterations. Compared with the planner without optimization, this framework can reduce the planning time by 52% and improve the trajectory quality. The proposed motion planner is implemented and tested both in simulation and on a real autonomous vehicle in three different scenarios. Experiments show that the planner outputs high-quality trajectories and performs intelligent driving behaviors.
240 citations
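The planner in the entry above follows a two-step pattern: search a discretized plan space under additive cost functions, then cheaply post-optimize the winning trajectory. The NumPy sketch below shows that pattern with toy costs and a toy path parameterization; it is not the paper's cost design or vehicle model.

```python
# Minimal sketch of discretize-search-then-refine planning, assuming NumPy.
# Costs, candidates, and the smoothing step are illustrative placeholders.
import numpy as np


def plan(obstacles, n_candidates=9, length=30, iters=20):
    s = np.linspace(0.0, 1.0, length)
    # Step 1: discretized plan space -- candidate lateral offsets
    offsets = np.linspace(-3.0, 3.0, n_candidates)
    candidates = [np.stack([s * 50.0, e * np.sin(np.pi * s)], 1)
                  for e in offsets]

    def cost(path):
        # Clearance term (nearest obstacle) plus a smoothness term
        d = np.linalg.norm(path[:, None] - obstacles[None], axis=2).min()
        smooth = np.sum(np.diff(path, 2, axis=0) ** 2)
        return 100.0 / (d + 1e-3) + smooth

    best = min(candidates, key=cost)

    # Step 2: low-cost iterative post-optimization (endpoint-preserving
    # smoothing), standing in for the paper's path/speed refinement
    for _ in range(iters):
        best[1:-1] = 0.5 * best[1:-1] + 0.25 * (best[:-2] + best[2:])
    return best


path = plan(obstacles=np.array([[25.0, 1.0]]))
```

The point of the second step is that refinement over a single good candidate converges in a few iterations, which is what makes the two-step split cheaper than optimizing over the whole plan space.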
01 Feb 2013
TL;DR: A fusion formulation which integrates low- and high-dimensional tracking approaches into one framework and ensures that the overall performance of the system is improved by concentrating on the respective advantages of the two approaches and resolving their weak points.
Abstract: Tracking generic human motion is highly challenging due to its high-dimensional state space and the various motion types involved. In order to deal with these challenges, a fusion formulation which integrates low- and high-dimensional tracking approaches into one framework is proposed. The low-dimensional approach successfully overcomes the high-dimensional problem of tracking the motions with available training data by learning motion models, but it only works with specific motion types. On the other hand, although the high-dimensional approach may recover the motions without learned models by sampling directly in the pose space, it lacks robustness and efficiency. Within the framework, the two parallel approaches, low- and high-dimensional, are fused via a probabilistic approach at each time step. This probabilistic fusion approach ensures that the overall performance of the system is improved by concentrating on the respective advantages of the two approaches and resolving their weak points. The experimental results, after qualitative and quantitative comparisons, demonstrate the effectiveness of the proposed approach in tracking generic human motion.
166 citations
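The fusion step in the entry above reduces, at each time step, to weighting two pose estimates by how well each explains the observation. The NumPy sketch below shows that weighting with a stand-in Gaussian likelihood; the paper's probabilistic fusion is more involved, and all names here are ours.

```python
# Toy sketch of likelihood-weighted fusion of two trackers, assuming NumPy.
import numpy as np


def fuse(pose_low, pose_high, likelihood):
    """Combine low-dim (model-based) and high-dim (sampling) estimates."""
    w_low, w_high = likelihood(pose_low), likelihood(pose_high)
    w = w_low / (w_low + w_high)  # posterior weight for the low-dim track
    return w * pose_low + (1.0 - w) * pose_high


# Example with a hypothetical observation and Gaussian likelihood
obs = np.array([0.1, 0.2, 0.0])
like = lambda p: np.exp(-0.5 * np.sum((p - obs) ** 2))
fused = fuse(np.zeros(3), np.array([0.2, 0.3, 0.1]), like)
```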
TL;DR: This paper describes a structure-sensitive superpixel technique that exploits Lloyd's algorithm with the geodesic distance; it generates smaller superpixels to achieve relatively low under-segmentation in structure-dense regions with high intensity or color variation, and produces larger segments to increase computational efficiency in structure-sparse regions with homogeneous appearance.
Abstract: Segmenting images into superpixels as supporting regions for feature vectors and primitives, in order to reduce computational complexity, has been commonly used as a fundamental step in various image analysis and computer vision tasks. In this paper, we describe a structure-sensitive superpixel technique that exploits Lloyd's algorithm with the geodesic distance. Our method generates smaller superpixels to achieve relatively low under-segmentation in structure-dense regions with high intensity or color variation, and produces larger segments to increase computational efficiency in structure-sparse regions with homogeneous appearance. We adopt geometric flows to compute geodesic distances amongst pixels. In the segmentation procedure, the density of over-segments is automatically adjusted through iteratively optimizing an energy functional that embeds color homogeneity and structure density. Comparative experiments on the Berkeley database show that the proposed algorithm outperforms prior art while offering computational efficiency comparable to TurboPixels. Further applications in image compression, object closure extraction, and video segmentation demonstrate effective extensions of our approach.
151 citations
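The Lloyd-style iteration at the heart of the entry above alternates assignment and center updates. The NumPy sketch below shows that skeleton with a crude stand-in for the geodesic distance: Euclidean distance in (x, y, intensity) space, which penalizes crossing strong intensity edges. The paper's geometric-flow geodesics and automatic density adjustment are omitted, and all names are ours.

```python
# Toy-scale sketch of Lloyd-style superpixels, assuming NumPy and a small
# grayscale image (the dense distance matrix is O(pixels x centers)).
import numpy as np


def superpixels(img, k=64, iters=10, m=10.0):
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    feats = np.stack([xs.ravel(), ys.ravel(), m * img.ravel()], 1).astype(float)

    # Seed centers on a regular grid (Lloyd's algorithm initialization)
    step = int(np.sqrt(h * w / k))
    cy, cx = np.mgrid[step // 2:h:step, step // 2:w:step]
    centers = np.stack([cx.ravel(), cy.ravel(),
                        m * img[cy.ravel(), cx.ravel()]], 1).astype(float)

    for _ in range(iters):
        # Assign each pixel to its nearest center, then recompute centers
        d = np.linalg.norm(feats[:, None] - centers[None], axis=2)
        labels = d.argmin(1)
        for c in range(len(centers)):
            if np.any(labels == c):
                centers[c] = feats[labels == c].mean(0)
    return labels.reshape(h, w)
```

The intensity weight m plays the structure-sensitivity role in miniature: larger m makes boundaries cheaper to place along intensity edges, so segments shrink where variation is high.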
Cited by
Journal Article
3,940 citations
01 Jan 2004
TL;DR: Comprehensive and up-to-date, this book includes essential topics that either have practical significance or are of theoretical importance, and describes numerous important application areas such as image-based rendering and digital libraries.
Abstract: From the Publisher:
The accessible presentation of this book gives both a general view of the entire computer vision enterprise and sufficient detail to build useful applications. Users learn techniques that have proven useful through first-hand experience, along with a wide range of mathematical methods. A CD-ROM included with every copy of the text contains source code for programming practice, color images, and illustrative movies. Comprehensive and up-to-date, the book includes essential topics that either have practical significance or are of theoretical importance. Topics are discussed in substantial and increasing depth. Application surveys describe numerous important application areas such as image-based rendering and digital libraries. Many important algorithms are broken down and illustrated in pseudocode. The book is appropriate for use by engineers as a comprehensive reference to the computer vision enterprise.
3,627 citations
01 Aug 1996
TL;DR: This work presents a new approach for modeling and rendering existing architectural scenes from a sparse set of still photographs that combines geometry-based and image-based techniques, and presents view-dependent texture mapping, a method of compositing multiple views of a scene that better simulates geometric detail on basic models.
Abstract: We present a new approach for modeling and rendering existing architectural scenes from a sparse set of still photographs. Our modeling approach, which combines both geometry-based and image-based techniques, has two components. The first component is a photogrammetric modeling method which facilitates the recovery of the basic geometry of the photographed scene. Our photogrammetric modeling approach is effective, convenient, and robust because it exploits the constraints that are characteristic of architectural scenes. The second component is a model-based stereo algorithm, which recovers how the real scene deviates from the basic model. By making use of the model, our stereo technique robustly recovers accurate depth from widely-spaced image pairs. Consequently, our approach can model large architectural environments with far fewer photographs than current image-based modeling approaches. For producing renderings, we present view-dependent texture mapping, a method of compositing multiple views of a scene that better simulates geometric detail on basic models. Our approach can be used to recover models for use in either geometry-based or image-based rendering systems. We present results that demonstrate our approach's ability to create realistic renderings of architectural scenes from viewpoints far from the original photographs. CR Descriptors: I.2.10 [Artificial Intelligence]: Vision and Scene Understanding - Modeling and recovery of physical attributes; I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism - Color, shading, shadowing, and texture; I.4.8 [Image Processing]: Scene Analysis - Stereo; J.6 [Computer-Aided Engineering]: Computer-aided design (CAD).
2,159 citations
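The view-dependent texture mapping described in the entry above amounts to blending the photographs whose viewing directions best match the novel view. The NumPy sketch below shows one plausible per-view weighting by angular alignment; the falloff exponent and normalization are illustrative choices of ours, not the paper's exact scheme.

```python
# Minimal sketch of view-dependent blend weights, assuming NumPy and that
# at least one photograph faces roughly the same way as the novel view.
import numpy as np


def blend_weights(novel_dir, photo_dirs):
    novel = novel_dir / np.linalg.norm(novel_dir)
    dirs = photo_dirs / np.linalg.norm(photo_dirs, axis=1, keepdims=True)
    cos = dirs @ novel                 # alignment with the novel view
    w = np.clip(cos, 0.0, None) ** 4   # sharper falloff favors nearby views
    return w / w.sum()


# Example: three photographs; the novel view sits between photos 0 and 1
dirs = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
w = blend_weights(np.array([0.7, 0.7, 0.1]), dirs)
```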
15 Oct 2004
2,118 citations