Journal ArticleDOI

Data-driven hallucination of different times of day from a single outdoor photo

TL;DR: This paper introduces the first data-driven approach to automatically creating a plausible-looking photo that appears as though it were taken at a different time of day, using a database of time-lapse videos of various scenes.
Abstract: We introduce "time hallucination": synthesizing a plausible image at a different time of day from an input image. This challenging task often requires dramatically altering the color appearance of the picture. In this paper, we introduce the first data-driven approach to automatically creating a plausible-looking photo that appears as though it were taken at a different time of day. The time of day is specified by a semantic time label, such as "night". Our approach relies on a database of time-lapse videos of various scenes. These videos provide rich information about the variations in color appearance of a scene throughout the day. Our method transfers the color appearance from videos with a similar scene as the input photo. We propose a locally affine model learned from the video for the transfer, allowing our model to synthesize new color data while retaining image details. We show that this model can hallucinate a wide range of different times of day. The model generates a large sparse linear system, which can be solved by off-the-shelf solvers. We validate our method by transforming photos of various outdoor scenes to four times of interest: daytime, the golden hour, the blue hour, and nighttime.
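
To make the locally affine transfer concrete, here is a minimal sketch of the core idea (my own simplification with an assumed window size and independent per-pixel fits; the paper instead couples the fits with a regularization term, which is what yields the large sparse linear system):

    import numpy as np

    def local_affine_transfer(input_img, matched, target, win=15):
        """For each pixel, fit a 3x4 affine color transform that maps the
        matched time-lapse frame to the target frame over a local window
        (least squares), then apply it to the input pixel.  Images are
        float arrays of shape (h, w, 3) in [0, 1]."""
        h, w, _ = input_img.shape
        out = np.zeros_like(input_img, dtype=np.float64)
        r = win // 2
        for y in range(h):
            for x in range(w):
                y0, y1 = max(0, y - r), min(h, y + r + 1)
                x0, x1 = max(0, x - r), min(w, x + r + 1)
                m = matched[y0:y1, x0:x1].reshape(-1, 3)
                t = target[y0:y1, x0:x1].reshape(-1, 3)
                M = np.hstack([m, np.ones((len(m), 1))])   # homogeneous colors
                A, *_ = np.linalg.lstsq(M, t, rcond=None)  # 4x3 affine fit
                out[y, x] = np.append(input_img[y, x], 1.0) @ A
        return np.clip(out, 0.0, 1.0)

Because each pixel gets its own affine map estimated from a neighborhood, low-frequency color shifts toward the target time of day while the input's high-frequency details survive, which matches the "synthesize new color data while retaining image details" claim above.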

Citations
Proceedings ArticleDOI
21 Jul 2017
TL;DR: Conditional adversarial networks are investigated as a general-purpose solution to image-to-image translation problems and it is demonstrated that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks.
Abstract: We investigate conditional adversarial networks as a general-purpose solution to image-to-image translation problems. These networks not only learn the mapping from input image to output image, but also learn a loss function to train this mapping. This makes it possible to apply the same generic approach to problems that traditionally would require very different loss formulations. We demonstrate that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks. Moreover, since the release of the pix2pix software associated with this paper, hundreds of Twitter users have posted their own artistic experiments using our system. As a community, we no longer hand-engineer our mapping functions, and this work suggests we can achieve reasonable results without hand-engineering our loss functions either.
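
For reference, the objective stated in the pix2pix paper pairs the conditional GAN loss with an L1 reconstruction term, where $x$ is the input image, $y$ the ground truth, and $z$ the noise:

$$\mathcal{L}_{cGAN}(G,D) = \mathbb{E}_{x,y}[\log D(x,y)] + \mathbb{E}_{x,z}[\log(1 - D(x,G(x,z)))]$$

$$G^* = \arg\min_G \max_D \; \mathcal{L}_{cGAN}(G,D) + \lambda\,\mathbb{E}_{x,y,z}[\lVert y - G(x,z)\rVert_1]$$

The L1 term keeps outputs close to the ground truth while the adversarial term pushes them toward the distribution of real images; this combination is what lets one generic loss replace the hand-designed objectives mentioned above.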

11,958 citations


Cites background from "Data-driven hallucination of different times of day from a single outdoor photo"

  • ..., [14, 23, 18, 8, 10, 50, 30, 36, 16, 55, 58]), despite the fact that the setting is always the same: predict pixels from pixels....

Posted Content
TL;DR: Conditional adversarial networks, as discussed by the authors, are a general-purpose solution to image-to-image translation problems that can synthesize photos from label maps, reconstruct objects from edge maps, and colorize images, among other tasks.
Abstract: We investigate conditional adversarial networks as a general-purpose solution to image-to-image translation problems. These networks not only learn the mapping from input image to output image, but also learn a loss function to train this mapping. This makes it possible to apply the same generic approach to problems that traditionally would require very different loss formulations. We demonstrate that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks. Indeed, since the release of the pix2pix software associated with this paper, a large number of internet users (many of them artists) have posted their own experiments with our system, further demonstrating its wide applicability and ease of adoption without the need for parameter tweaking. As a community, we no longer hand-engineer our mapping functions, and this work suggests we can achieve reasonable results without hand-engineering our loss functions either.

11,127 citations

Posted Content
TL;DR: This work presents an approach for learning to translate an image from a source domain X to a target domain Y in the absence of paired examples, and introduces a cycle consistency loss to push F(G(X)) ≈ X (and vice versa).
Abstract: Image-to-image translation is a class of vision and graphics problems where the goal is to learn the mapping between an input image and an output image using a training set of aligned image pairs. However, for many tasks, paired training data will not be available. We present an approach for learning to translate an image from a source domain $X$ to a target domain $Y$ in the absence of paired examples. Our goal is to learn a mapping $G: X \rightarrow Y$ such that the distribution of images from $G(X)$ is indistinguishable from the distribution $Y$ using an adversarial loss. Because this mapping is highly under-constrained, we couple it with an inverse mapping $F: Y \rightarrow X$ and introduce a cycle consistency loss to push $F(G(X)) \approx X$ (and vice versa). Qualitative results are presented on several tasks where paired training data does not exist, including collection style transfer, object transfiguration, season transfer, photo enhancement, etc. Quantitative comparisons against several prior methods demonstrate the superiority of our approach.
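
In symbols, the cycle consistency loss from the abstract is

$$\mathcal{L}_{cyc}(G,F) = \mathbb{E}_{x \sim p_{data}(x)}[\lVert F(G(x)) - x\rVert_1] + \mathbb{E}_{y \sim p_{data}(y)}[\lVert G(F(y)) - y\rVert_1]$$

and the full objective adds it, weighted by $\lambda$, to the two adversarial losses: $\mathcal{L}(G,F,D_X,D_Y) = \mathcal{L}_{GAN}(G,D_Y,X,Y) + \mathcal{L}_{GAN}(F,D_X,Y,X) + \lambda\,\mathcal{L}_{cyc}(G,F)$. An adversarial loss alone could map every input to any output that fools the discriminator; the cycle term is what ties each output back to its particular input.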

4,465 citations

Book ChapterDOI
08 Oct 2016
TL;DR: This paper proposes to learn the natural image manifold directly from data using a generative adversarial neural network, defines a class of image editing operations, and constrains their output to lie on that learned manifold at all times.
Abstract: Realistic image manipulation is challenging because it requires modifying the image appearance in a user-controlled way, while preserving the realism of the result. Unless the user has considerable artistic skill, it is easy to “fall off” the manifold of natural images while editing. In this paper, we propose to learn the natural image manifold directly from data using a generative adversarial neural network. We then define a class of image editing operations, and constrain their output to lie on that learned manifold at all times. The model automatically adjusts the output, keeping all edits as realistic as possible. All our manipulations are expressed in terms of constrained optimization and are applied in near-real time. We evaluate our algorithm on the task of realistic photo manipulation of shape and color. The presented method can further be used for changing one image to look like another, as well as generating novel imagery from scratch based on a user's scribbles.
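
The constrained optimization can be illustrated by its core step: projecting a user-edited image back onto the learned manifold by optimizing the generator's latent code. The sketch below is a generic version of that idea rather than the paper's exact formulation; the tiny untrained generator, the masked L2 loss, and all hyperparameters are stand-ins:

    import torch

    # Toy stand-in for a trained GAN generator (16-d latent -> 3x8x8 image).
    G = torch.nn.Sequential(torch.nn.Linear(16, 64), torch.nn.Tanh(),
                            torch.nn.Linear(64, 3 * 8 * 8), torch.nn.Tanh())

    def project_edit(edited, mask, steps=200, lr=0.05):
        """Find a latent code whose generated image matches the user's edit
        where mask == 1, i.e. the edit projected onto the manifold."""
        z = torch.zeros(1, 16, requires_grad=True)
        opt = torch.optim.Adam([z], lr=lr)
        for _ in range(steps):
            img = G(z).view(1, 3, 8, 8)
            loss = ((img - edited) ** 2 * mask).mean()  # penalize only edited pixels
            opt.zero_grad()
            loss.backward()
            opt.step()
        return G(z).view(3, 8, 8).detach()

    # Hypothetical usage: an image where the user painted a few pixels.
    edited = torch.rand(1, 3, 8, 8)
    mask = (torch.rand(1, 1, 8, 8) > 0.7).float()
    result = project_edit(edited, mask)

Whatever the optimizer returns is by construction an output of the generator, so the result stays on the learned manifold no matter how unrealistic the raw edit was.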

1,116 citations


Cites methods from "Data-driven hallucination of different times of day from a single outdoor photo"

  • ...The data term relaxes the color constancy assumption by introducing a locally affine color transfer model A [32] while the spatial and color regularization terms encourage smoothness in both the motion and color change....

  • ...We solve the objective by iteratively estimating the flow (u, v) using a traditional optical flow algorithm, and computing the color change A by solving a system of linear equations [32]....

Journal ArticleDOI
TL;DR: In this paper, a convolutional neural network is used to predict the coefficients of a locally affine model in bilateral space, which is then applied to the full-resolution image.
Abstract: Performance is a critical challenge in mobile image processing. Given a reference imaging pipeline, or even human-adjusted pairs of images, we seek to reproduce the enhancements and enable real-time evaluation. For this, we introduce a new neural network architecture inspired by bilateral grid processing and local affine color transforms. Using pairs of input/output images, we train a convolutional neural network to predict the coefficients of a locally-affine model in bilateral space. Our architecture learns to make local, global, and content-dependent decisions to approximate the desired image transformation. At runtime, the neural network consumes a low-resolution version of the input image, produces a set of affine transformations in bilateral space, upsamples those transformations in an edge-preserving fashion using a new slicing node, and then applies those upsampled transformations to the full-resolution image. Our algorithm processes high-resolution images on a smartphone in milliseconds, provides a real-time viewfinder at 1080p resolution, and matches the quality of state-of-the-art approximation techniques on a large class of image operators. Unlike previous work, our model is trained off-line from data and therefore does not require access to the original operator at runtime. This allows our model to learn complex, scene-dependent transformations for which no reference implementation is available, such as the photographic edits of a human retoucher.
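
The slicing step can be sketched independently of the network. Assume the CNN has already produced a coarse grid of 3x4 affine matrices, one per spatial cell and per guide-intensity bin; each full-resolution pixel then fetches its matrix by trilinear interpolation at (row, column, guide value) and applies it. The shapes, the single-channel guide, and the function name below are illustrative assumptions, not the paper's code:

    import numpy as np
    from scipy.ndimage import map_coordinates

    def slice_and_apply(grid, guide, image):
        """grid: (gh, gw, gd, 3, 4) affine matrices; guide: (h, w) in [0, 1];
        image: (h, w, 3).  Trilinearly interpolate a per-pixel affine
        transform from the grid and apply it."""
        gh, gw, gd = grid.shape[:3]
        h, w = guide.shape
        ys, xs = np.mgrid[0:h, 0:w]
        cy = ys * (gh - 1) / (h - 1)        # continuous grid row
        cx = xs * (gw - 1) / (w - 1)        # continuous grid column
        cz = guide * (gd - 1)               # guide picks the intensity bin
        coords = np.stack([cy.ravel(), cx.ravel(), cz.ravel()])
        coeffs = np.stack([map_coordinates(grid[..., i, j], coords, order=1)
                           for i in range(3) for j in range(4)], axis=-1)
        A = coeffs.reshape(h, w, 3, 4)
        x_h = np.concatenate([image, np.ones((h, w, 1))], axis=-1)  # homogeneous
        return np.einsum('hwij,hwj->hwi', A, x_h)

Because the guide supplies the third lookup coordinate, pixels on opposite sides of an intensity edge land in different bins and receive different transforms; that is what makes the upsampling edge-preserving.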

510 citations

References
Proceedings ArticleDOI
20 Jun 2005
TL;DR: It is shown experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection, and the influence of each stage of the computation on performance is studied.
Abstract: We study the question of feature sets for robust visual object recognition; adopting linear SVM based human detection as a test case. After reviewing existing edge and gradient based descriptors, we show experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection. We study the influence of each stage of the computation on performance, concluding that fine-scale gradients, fine orientation binning, relatively coarse spatial binning, and high-quality local contrast normalization in overlapping descriptor blocks are all important for good results. The new approach gives near-perfect separation on the original MIT pedestrian database, so we introduce a more challenging dataset containing over 1800 annotated human images with a large range of pose variations and backgrounds.
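
The time-hallucination paper uses these descriptors for scene matching (see the quoted passage below). A minimal example of computing one with scikit-image, with parameters echoing the abstract's conclusions; the random image is a placeholder:

    import numpy as np
    from skimage.feature import hog

    image = np.random.rand(128, 64)          # placeholder grayscale image
    descriptor = hog(image,
                     orientations=9,          # fine orientation binning
                     pixels_per_cell=(8, 8),  # relatively coarse spatial binning
                     cells_per_block=(2, 2),  # overlapping descriptor blocks
                     block_norm='L2-Hys')     # strong local contrast normalization
    print(descriptor.shape)                   # one flat feature vector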

31,952 citations


"Data-driven hallucination of differ..." refers methods in this paper

  • ...We tried the different descriptors suggested in Xiao’s paper, and found that the Histograms of Oriented Gradients (HOG) [Dalal and Triggs 2005] works well for our data....

Proceedings ArticleDOI
13 Jun 2010
TL;DR: This paper proposes the extensive Scene UNderstanding (SUN) database that contains 899 categories and 130,519 images and uses 397 well-sampled categories to evaluate numerous state-of-the-art algorithms for scene recognition and establish new bounds of performance.
Abstract: Scene categorization is a fundamental problem in computer vision. However, scene understanding research has been constrained by the limited scope of currently-used databases, which do not capture the full variety of scene categories. Whereas standard databases for object categorization contain hundreds of different classes of objects, the largest available dataset of scene categories contains only 15 classes. In this paper we propose the extensive Scene UNderstanding (SUN) database that contains 899 categories and 130,519 images. We use 397 well-sampled categories to evaluate numerous state-of-the-art algorithms for scene recognition and establish new bounds of performance. We measure human scene classification performance on the SUN database and compare this with computational methods. Additionally, we study a finer-grained scene representation to detect scenes embedded inside of larger scenes.
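
Combined with the HOG example above, scene matching of the kind the time-hallucination paper borrows (see the quoted passage below) reduces to nearest-neighbor search over descriptors. A brute-force sketch with hypothetical array sizes:

    import numpy as np

    def nearest_scene(query_desc, database_descs):
        """Index of the database descriptor closest to the query
        (Euclidean distance, brute force)."""
        dists = np.linalg.norm(database_descs - query_desc, axis=1)
        return int(np.argmin(dists))

    # Hypothetical sizes: 1000 database frames, 3780-dim descriptors each.
    database = np.random.rand(1000, 3780)
    query = np.random.rand(3780)
    best = nearest_scene(query, database)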

2,960 citations


"Data-driven hallucination of differ..." refers methods in this paper

  • ...We achieve these two tasks using existing scene and image matching techniques [Xiao et al. 2010]....

Proceedings ArticleDOI
01 Aug 2001
TL;DR: This work uses quilting as a fast and very simple texture synthesis algorithm which produces surprisingly good results for a wide range of textures and extends the algorithm to perform texture transfer — rendering an object with a texture taken from a different object.
Abstract: We present a simple image-based method of generating novel visual appearance in which a new image is synthesized by stitching together small patches of existing images. We call this process image quilting. First, we use quilting as a fast and very simple texture synthesis algorithm which produces surprisingly good results for a wide range of textures. Second, we extend the algorithm to perform texture transfer — rendering an object with a texture taken from a different object. More generally, we demonstrate how an image can be re-rendered in the style of a different image. The method works directly on the images and does not require 3D information.
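
A minimal sketch of the quilting idea for a single row of blocks: candidates are random blocks from the source texture, and the one whose left edge best matches the already-placed overlap region (by sum of squared differences) wins. Block size, overlap, and candidate count are arbitrary choices here, and the minimum-error boundary cut the paper stitches along is omitted:

    import numpy as np

    def quilt_row(texture, block=32, overlap=6, n_blocks=8, rng=None):
        """Quilt one row of texture left to right, keeping at each step the
        random candidate block whose left strip best matches the overlap."""
        rng = np.random.default_rng(0) if rng is None else rng
        H, W = texture.shape[:2]
        step = block - overlap
        out = np.zeros((block, step * n_blocks + overlap) + texture.shape[2:])

        def sample():
            y, x = rng.integers(0, H - block), rng.integers(0, W - block)
            return texture[y:y + block, x:x + block]

        out[:, :block] = sample()
        for i in range(1, n_blocks):
            x0 = i * step
            prev = out[:, x0:x0 + overlap]            # strip already placed
            cands = [sample() for _ in range(200)]
            errs = [np.sum((c[:, :overlap] - prev) ** 2) for c in cands]
            out[:, x0:x0 + block] = cands[int(np.argmin(errs))]
        return out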

2,649 citations


"Data-driven hallucination of differ..." refers background or methods in this paper

  • ...Image Analogies Our work relates to Image Analogies [Hertzmann et al. 2001; Efros and Freeman 2001] in the sense that input : hallucinated image :: matched frame : target frame where the matched and target frames are from the time-lapse video....

Journal ArticleDOI
TL;DR: This work uses a simple statistical analysis to impose one image's color characteristics on another, achieving color correction by choosing an appropriate source image and applying its characteristics to the target image.
Abstract: We use a simple statistical analysis to impose one image's color characteristics on another. We can achieve color correction by choosing an appropriate source image and applying its characteristics to another image.
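
The statistical analysis comes down to matching per-channel means and standard deviations. A minimal sketch done directly in RGB for brevity; the paper performs the matching in the decorrelated lαβ color space, so treat this as an approximation:

    import numpy as np

    def color_transfer(source, target):
        """Shift and scale each channel of `target` so its mean and standard
        deviation match `source`'s, imposing the source's color mood.
        Both are float arrays of shape (h, w, 3) in [0, 1]."""
        s_mu, s_std = source.mean(axis=(0, 1)), source.std(axis=(0, 1))
        t_mu, t_std = target.mean(axis=(0, 1)), target.std(axis=(0, 1))
        result = (target - t_mu) * (s_std / (t_std + 1e-8)) + s_mu
        return np.clip(result, 0.0, 1.0)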

2,615 citations


"Data-driven hallucination of differ..." refers methods in this paper

  • ...Approaches for color transfer such as [Reinhard et al. 2001; Pouli and Reinhard 2011; Pitie et al. 2005] apply a global color mapping to match color statistics between images....

  • ...Figure 12 compares our approach to techniques based on a global color transfer [Reinhard et al. 2001; Pitie et al. 2005]....

Journal ArticleDOI
TL;DR: This work built on another training-based super-resolution algorithm and developed a faster and simpler algorithm for one-pass super-resolution that requires only a nearest-neighbor search in the training set for a vector derived from each patch of local image data.
Abstract: We call methods for achieving high-resolution enlargements of pixel-based images super-resolution algorithms. Many applications in graphics or image processing could benefit from such resolution independence, including image-based rendering (IBR), texture mapping, enlarging consumer photographs, and converting NTSC video content to high-definition television. We built on another training-based super-resolution algorithm and developed a faster and simpler algorithm for one-pass super-resolution. Our algorithm requires only a nearest-neighbor search in the training set for a vector derived from each patch of local image data. This one-pass super-resolution algorithm is a step toward achieving resolution independence in image-based representations. We don't expect perfect resolution independence (even the polygon representation doesn't have that), but increasing the resolution independence of pixel-based representations is an important task for IBR.
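
The one-pass search can be sketched as a patchwise nearest-neighbor lookup. The paired arrays `lowres_patches` (mean-subtracted low-frequency patches) and `highfreq_patches` (their stored high-frequency detail), both flattened row-wise, are assumed precomputed; the paper additionally conditions the search on already-placed neighboring patches, which this simplification drops:

    import numpy as np

    def one_pass_superres(upsampled, lowres_patches, highfreq_patches, patch=5):
        """For each patch of the (e.g. bicubically) upsampled grayscale input,
        find the nearest training patch and paste in its high-frequency
        detail."""
        h, w = upsampled.shape
        detail = np.zeros_like(upsampled)
        for y in range(0, h - patch + 1, patch):
            for x in range(0, w - patch + 1, patch):
                p = upsampled[y:y + patch, x:x + patch].ravel()
                p = p - p.mean()  # normalize as the training patches were
                idx = np.argmin(((lowres_patches - p) ** 2).sum(axis=1))
                detail[y:y + patch, x:x + patch] = \
                    highfreq_patches[idx].reshape(patch, patch)
        return upsampled + detail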

2,576 citations


"Data-driven hallucination of differ..." refers methods in this paper

  • ...Image Collections Recent research demonstrates convincing graphics application with big data, such as scene completion [Hays and Efros 2007], tone adjustment [Bychkovsky et al. 2011], and super-resolution [Freeman et al. 2002]....
