Author

Lucy Chai

Bio: Lucy Chai is an academic researcher from the Massachusetts Institute of Technology. The author has contributed to research in topics including Computer science & Inpainting, has an h-index of 7, and has co-authored 14 publications receiving 463 citations. Previous affiliations of Lucy Chai include the University of Cambridge & Mitre Corporation.

Papers
Proceedings Article
30 Apr 2020
TL;DR: It is shown that although current GANs can fit standard datasets very well, they still fall short of being comprehensive models of the visual manifold, and it is hypothesized that the degree of distributional shift is related to the breadth of the training data distribution.
Abstract: An open secret in contemporary machine learning is that many models work beautifully on standard benchmarks but fail to generalize outside the lab. This has been attributed to biased training data, which provide poor coverage over real world events. Generative models are no exception, but recent advances in generative adversarial networks (GANs) suggest otherwise -- these models can now synthesize strikingly realistic and diverse images. Is generative modeling of photos a solved problem? We show that although current GANs can fit standard datasets very well, they still fall short of being comprehensive models of the visual manifold. In particular, we study their ability to fit simple transformations such as camera movements and color changes. We find that the models reflect the biases of the datasets on which they are trained (e.g., centered objects), but that they also exhibit some capacity for generalization: by "steering" in latent space, we can shift the distribution while still creating realistic images. We hypothesize that the degree of distributional shift is related to the breadth of the training data distribution. Thus, we conduct experiments to quantify the limits of GAN transformations and introduce techniques to mitigate the problem.

165 citations
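As a rough illustration of the latent-space "steering" described in the paper above, the walk direction can be learned by matching a pixel-space edit, assuming a pretrained generator G and a differentiable edit function; all names and hyperparameters here are illustrative, not the authors' exact code.

import torch

def learn_steering_direction(G, edit, latent_dim=512, steps=1000, lr=1e-2):
    # w is a single linear walk direction shared across all latents
    w = torch.zeros(1, latent_dim, requires_grad=True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        z = torch.randn(8, latent_dim)                    # batch of random latents
        alpha = torch.empty(8, 1).uniform_(-1.0, 1.0)     # random step sizes
        target = edit(G(z), alpha)                        # transformation applied in pixel space
        steered = G(z + alpha * w)                        # same transformation attempted in latent space
        loss = torch.nn.functional.mse_loss(steered, target)
        opt.zero_grad(); loss.backward(); opt.step()
    return w.detach()

# At test time, x_edited = G(z + alpha * w): larger |alpha| gives a larger distributional shift,
# with image quality degrading once the walk exceeds what the training data supports.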

Posted Content
TL;DR: In this article, the authors show that although GANs can fit standard datasets very well, they still fall short of being comprehensive models of the visual manifold, and study their ability to fit simple transformations such as camera movements and color changes.
Abstract: An open secret in contemporary machine learning is that many models work beautifully on standard benchmarks but fail to generalize outside the lab. This has been attributed to biased training data, which provide poor coverage over real world events. Generative models are no exception, but recent advances in generative adversarial networks (GANs) suggest otherwise -- these models can now synthesize strikingly realistic and diverse images. Is generative modeling of photos a solved problem? We show that although current GANs can fit standard datasets very well, they still fall short of being comprehensive models of the visual manifold. In particular, we study their ability to fit simple transformations such as camera movements and color changes. We find that the models reflect the biases of the datasets on which they are trained (e.g., centered objects), but that they also exhibit some capacity for generalization: by "steering" in latent space, we can shift the distribution while still creating realistic images. We hypothesize that the degree of distributional shift is related to the breadth of the training data distribution. Thus, we conduct experiments to quantify the limits of GAN transformations and introduce techniques to mitigate the problem. Code is released on our project page: this https URL

153 citations

Journal ArticleDOI
TL;DR: Network methods, applied to fMRI data collected from 22 human subjects performing a language comprehension task, reveal the dynamic nature of the language system and suggest a trade-off between a region's specialization and its capacity for flexible network reconfiguration.
Abstract: During linguistic processing, a set of brain regions on the lateral surfaces of the left frontal, temporal, and parietal cortices exhibit robust responses. These areas display highly correlated activity while a subject rests or performs a naturalistic language comprehension task, suggesting that they form an integrated functional system. Evidence suggests that this system is spatially and functionally distinct from other systems that support high-level cognition in humans. Yet, how different regions within this system might be recruited dynamically during task performance is not well understood. Here we use network methods, applied to fMRI data collected from 22 human subjects performing a language comprehension task, to reveal the dynamic nature of the language system. We observe the presence of a stable core of brain regions, predominantly located in the left hemisphere, that consistently coactivate with one another. We also observe the presence of a more flexible periphery of brain regions, predominantly located in the right hemisphere, that coactivate with different regions at different times. However, the language functional ROIs in the angular gyrus and the anterior temporal lobe were notable exceptions to this trend. By highlighting the temporal dimension of language processing, these results suggest a trade-off between a region's specialization and its capacity for flexible network reconfiguration.

147 citations
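A rough sketch of one dynamic network measure behind results like these (not the authors' exact pipeline): given time-resolved community labels for each brain region, a region's "flexibility" is the fraction of consecutive time windows in which it changes community, so a stable core has low flexibility and a looser periphery has high flexibility. All names and the toy data are illustrative.

import numpy as np

def flexibility(labels):
    # labels: (n_regions, n_windows) array of community ids, one per time window
    changes = labels[:, 1:] != labels[:, :-1]
    return changes.mean(axis=1)                       # per-region value in [0, 1]

labels = np.random.randint(0, 4, size=(50, 30))       # toy labels: 50 regions, 30 windows
flex = flexibility(labels)
core = np.where(flex <= np.median(flex))[0]           # low-flexibility "core" regions
periphery = np.where(flex > np.median(flex))[0]       # high-flexibility "periphery" regions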

Journal ArticleDOI
03 Apr 2017
TL;DR: Network science and machine learning are used to show that growing cognitive abilities are accompanied by greater flexibility of brain regions within distributed networks, including the executive system, which is critical for higher-order cognitive functions and increases in expression and flexibility from childhood to young adulthood.
Abstract: Cognitive function evolves significantly over development, enabling flexible control of human behavior. Yet, how these functions are instantiated in spatially distributed and dynamically interacting ...

91 citations

Posted Content
TL;DR: This work uses a patch-based classifier with limited receptive fields to visualize which regions of fake images are more easily detectable, shows a technique to exaggerate these detectable properties, and demonstrates that, even when the image generator is adversarially finetuned against a fake-image classifier, it remains imperfect and leaves detectable artifacts in certain image patches.
Abstract: The quality of image generation and manipulation is reaching impressive levels, making it increasingly difficult for a human to distinguish between what is real and what is fake. However, deep networks can still pick up on the subtle artifacts in these doctored images. We seek to understand what properties of fake images make them detectable and identify what generalizes across different model architectures, datasets, and variations in training. We use a patch-based classifier with limited receptive fields to visualize which regions of fake images are more easily detectable. We further show a technique to exaggerate these detectable properties and demonstrate that, even when the image generator is adversarially finetuned against a fake image classifier, it is still imperfect and leaves detectable artifacts in certain image patches. Code is available at this https URL.

91 citations
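A minimal sketch of the patch-based classifier idea from the paper above: a shallow fully-convolutional network whose receptive field spans only small patches, so its output is a spatial map of per-patch real/fake scores rather than a single image-level decision. The exact architecture here is illustrative, not the authors' released model.

import torch
import torch.nn as nn

patch_classifier = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, stride=1), nn.LeakyReLU(0.2),
    nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.LeakyReLU(0.2),
    nn.Conv2d(128, 1, kernel_size=1),            # one logit per receptive-field patch
)

x = torch.randn(1, 3, 256, 256)                  # an RGB image
heatmap = patch_classifier(x)                    # spatial map of per-patch fake logits
image_score = heatmap.mean()                     # e.g., average the patches for an image-level score
# Visualizing heatmap highlights which regions of a fake image are most detectable.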


Cited by
Posted Content
TL;DR: It is observed that despite their hierarchical convolutional nature, the synthesis process of typical generative adversarial networks depends on absolute pixel coordinates in an unhealthy manner, and small architectural changes are derived that guarantee that unwanted information cannot leak into the hierarchical synthesis process.
Abstract: We observe that despite their hierarchical convolutional nature, the synthesis process of typical generative adversarial networks depends on absolute pixel coordinates in an unhealthy manner. This manifests itself as, e.g., detail appearing to be glued to image coordinates instead of the surfaces of depicted objects. We trace the root cause to careless signal processing that causes aliasing in the generator network. Interpreting all signals in the network as continuous, we derive generally applicable, small architectural changes that guarantee that unwanted information cannot leak into the hierarchical synthesis process. The resulting networks match the FID of StyleGAN2 but differ dramatically in their internal representations, and they are fully equivariant to translation and rotation even at subpixel scales. Our results pave the way for generative models better suited for video and animation.

621 citations
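A rough sketch of the anti-aliasing idea in the paper above (not the official StyleGAN3 code): pointwise nonlinearities are applied on an upsampled version of each feature map and low-pass filtered before downsampling, so the high frequencies the nonlinearity introduces cannot alias back into the signal. The filtering here is crudely approximated with bilinear interpolation and average pooling.

import torch
import torch.nn.functional as F

def filtered_lrelu(x, up=2):
    x = F.interpolate(x, scale_factor=up, mode="bilinear", align_corners=False)  # approximate upsampling
    x = F.leaky_relu(x, 0.2)                                                     # nonlinearity at higher resolution
    x = F.avg_pool2d(x, kernel_size=up)                                          # crude low-pass filter + downsample
    return x

x = torch.randn(1, 64, 32, 32)
y = filtered_lrelu(x)    # same spatial size as x, with less aliasing than a plain leaky ReLU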

Proceedings ArticleDOI
14 Jun 2020
TL;DR: This work proposes a novel framework, called InterFaceGAN, for semantic face editing by interpreting the latent semantics learned by GANs, and finds that the latent code of well-trained generative models actually learns a disentangled representation after linear transformations.
Abstract: Despite the recent advance of Generative Adversarial Networks (GANs) in high-fidelity image synthesis, there lacks enough understanding of how GANs are able to map a latent code sampled from a random distribution to a photo-realistic image. Previous work assumes the latent space learned by GANs follows a distributed representation but observes the vector arithmetic phenomenon. In this work, we propose a novel framework, called InterFaceGAN, for semantic face editing by interpreting the latent semantics learned by GANs. In this framework, we conduct a detailed study on how different semantics are encoded in the latent space of GANs for face synthesis. We find that the latent code of well-trained generative models actually learns a disentangled representation after linear transformations. We explore the disentanglement between various semantics and manage to decouple some entangled semantics with subspace projection, leading to more precise control of facial attributes. Besides manipulating gender, age, expression, and the presence of eyeglasses, we can even vary the face pose as well as fix the artifacts accidentally generated by GAN models. The proposed method is further applied to achieve real image manipulation when combined with GAN inversion methods or some encoder-involved models. Extensive results suggest that learning to synthesize faces spontaneously brings a disentangled and controllable facial attribute representation.

560 citations
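A minimal sketch of the latent-space editing and subspace projection described above, assuming a pretrained face generator G and unit normal vectors of attribute hyperplanes (e.g., fitted with a linear classifier on labeled latents); function and variable names are illustrative.

import torch

def edit(G, z, n_attr, alpha):
    # Moving the latent code along the hyperplane normal changes the attribute
    return G(z + alpha * n_attr)

def decouple(n_primary, n_nuisance):
    # Subspace projection: remove the component of the primary direction that
    # lies along an entangled attribute, so editing one leaves the other fixed
    n_nuisance = n_nuisance / n_nuisance.norm()
    n = n_primary - (n_primary @ n_nuisance.T) * n_nuisance
    return n / n.norm()

# Example (both directions assumed to be (1, 512) tensors): age editing that
# avoids also changing eyeglasses:
#   n_age_only = decouple(n_age, n_eyeglasses); face = edit(G, z, n_age_only, alpha=2.0)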

Posted Content
TL;DR: This work presents a generic image-to-image translation framework, pixel2style2pixel (pSp), based on a novel encoder network that directly generates a series of style vectors which are fed into a pretrained StyleGAN generator, forming the extended latent space.
Abstract: We present a generic image-to-image translation framework, Pixel2Style2Pixel (pSp). Our pSp framework is based on a novel encoder network that directly generates a series of style vectors which are fed into a pretrained StyleGAN generator, forming the extended W+ latent space. We first show that our encoder can directly embed real images into W+, with no additional optimization. We further introduce a dedicated identity loss which is shown to achieve improved performance in the reconstruction of an input image. We demonstrate pSp to be a simple architecture that, by leveraging a well-trained, fixed generator network, can be easily applied on a wide-range of image-to-image translation tasks. Solving these tasks through the style representation results in a global approach that does not rely on a local pixel-to-pixel correspondence and further supports multi-modal synthesis via the resampling of styles. Notably, we demonstrate that pSp can be trained to align a face image to a frontal pose without any labeled data, generate multi-modal results for ambiguous tasks such as conditional face generation from segmentation maps, and construct high-resolution images from corresponding low-resolution images.

504 citations
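A rough sketch of the pSp idea described above (not the released implementation): an encoder maps an input image to one style vector per generator layer, and the stacked vectors form the extended W+ code that drives a frozen, pretrained StyleGAN generator. The backbone and dimensions below are illustrative stand-ins for the paper's feature-pyramid encoder.

import torch
import torch.nn as nn

class StyleEncoder(nn.Module):
    def __init__(self, n_styles=18, style_dim=512):
        super().__init__()
        self.backbone = nn.Sequential(                     # stand-in for the pyramid backbone
            nn.Conv2d(3, 64, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(64, 256, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.heads = nn.ModuleList(nn.Linear(256, style_dim) for _ in range(n_styles))

    def forward(self, img):
        feat = self.backbone(img)
        return torch.stack([head(feat) for head in self.heads], dim=1)  # (batch, n_styles, style_dim) W+ code

# Usage with an assumed frozen generator G: reconstruction = G(StyleEncoder()(input_image))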

Posted Content
TL;DR: InterFaceGAN, as discussed by the authors, explores the disentanglement between various semantics and manages to decouple some entangled semantics with subspace projection, leading to more precise control of facial attributes, including gender, age, expression, and the presence of eyeglasses.
Abstract: Despite the recent advance of Generative Adversarial Networks (GANs) in high-fidelity image synthesis, there lacks enough understanding of how GANs are able to map a latent code sampled from a random distribution to a photo-realistic image. Previous work assumes the latent space learned by GANs follows a distributed representation but observes the vector arithmetic phenomenon. In this work, we propose a novel framework, called InterFaceGAN, for semantic face editing by interpreting the latent semantics learned by GANs. In this framework, we conduct a detailed study on how different semantics are encoded in the latent space of GANs for face synthesis. We find that the latent code of well-trained generative models actually learns a disentangled representation after linear transformations. We explore the disentanglement between various semantics and manage to decouple some entangled semantics with subspace projection, leading to more precise control of facial attributes. Besides manipulating gender, age, expression, and the presence of eyeglasses, we can even vary the face pose as well as fix the artifacts accidentally generated by GAN models. The proposed method is further applied to achieve real image manipulation when combined with GAN inversion methods or some encoder-involved models. Extensive results suggest that learning to synthesize faces spontaneously brings a disentangled and controllable facial attribute representation.

426 citations

Journal ArticleDOI
TL;DR: Although its examples are drawn predominantly from human neuroimaging, this account aims to offer an accessible guide to any neuroscientist seeking to measure, characterize, and understand the full richness of the brain's multiscale network structure.

422 citations