Showing papers by "Ron Weiss" published in 2021


Proceedings Article
03 May 2021
TL;DR: WaveGrad offers a natural way to trade inference speed for sample quality by adjusting the number of refinement steps, and bridges the gap between non-autoregressive and autoregressive models in terms of audio quality.
Abstract: This paper introduces WaveGrad, a conditional model for waveform generation which estimates gradients of the data density. The model is built on prior work on score matching and diffusion probabilistic models. It starts from a Gaussian white noise signal and iteratively refines the signal via a gradient-based sampler conditioned on the mel-spectrogram. WaveGrad offers a natural way to trade inference speed for sample quality by adjusting the number of refinement steps, and bridges the gap between non-autoregressive and autoregressive models in terms of audio quality. We find that it can generate high fidelity audio samples using as few as six iterations. Experiments reveal WaveGrad to generate high fidelity audio, outperforming adversarial non-autoregressive baselines and matching a strong likelihood-based autoregressive baseline using fewer sequential operations. Audio samples are available at https://wavegrad-iclr2021.github.io/.

351 citations
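
The iterative refinement described in the WaveGrad abstract can be illustrated with a toy sampler. This is a minimal sketch, not the authors' implementation: it assumes a DDPM-style update rule, a hypothetical linear noise schedule (`linear_betas`), and an oracle noise predictor (`oracle_noise`) that stands in for the trained, mel-spectrogram-conditioned network by assuming the data distribution is a single known signal.

```python
import math
import random


def linear_betas(num_steps, beta_start=1e-4, beta_end=0.5):
    """Illustrative linear noise schedule (an assumption, not the paper's schedule)."""
    if num_steps == 1:
        return [beta_end]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def oracle_noise(y, target, alpha_bar):
    """Stand-in for the trained network: the exact noise when the data is one known signal."""
    scale = math.sqrt(1.0 - alpha_bar)
    return [(yi - math.sqrt(alpha_bar) * ti) / scale for yi, ti in zip(y, target)]


def refine(target, num_steps=6, seed=0):
    """Start from Gaussian white noise and apply num_steps DDPM-style refinement steps."""
    rng = random.Random(seed)
    betas = linear_betas(num_steps)
    alphas = [1.0 - b for b in betas]
    alpha_bars = []
    running = 1.0
    for a in alphas:
        running *= a
        alpha_bars.append(running)

    y = [rng.gauss(0.0, 1.0) for _ in target]
    for n in reversed(range(num_steps)):
        eps = oracle_noise(y, target, alpha_bars[n])
        coef = betas[n] / math.sqrt(1.0 - alpha_bars[n])
        # Remove the predicted noise component and rescale.
        y = [(yi - coef * ei) / math.sqrt(alphas[n]) for yi, ei in zip(y, eps)]
        if n > 0:
            # Re-inject a smaller amount of noise on every step except the last.
            sigma = math.sqrt((1.0 - alpha_bars[n - 1]) / (1.0 - alpha_bars[n]) * betas[n])
            y = [yi + sigma * rng.gauss(0.0, 1.0) for yi in y]
    return y
```

Because the oracle predictor is exact, this toy recovers the target for any step count; with a learned network, `num_steps` is the knob the abstract describes for trading inference speed against sample quality.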


Journal ArticleDOI
TL;DR: An overview of bioengineering technologies that can be harnessed to facilitate the culture, self-organization and functionality of human pluripotent stem cell-derived organoids is provided.
Abstract: In recent years considerable progress has been made in the development of faithful procedures for the differentiation of human pluripotent stem cells (hPSCs). An important step in this direction has also been the derivation of organoids. This technology generally relies on traditional three-dimensional culture techniques that exploit cell-autonomous self-organization responses of hPSCs with minimal control over the external inputs supplied to the system. The convergence of stem cell biology and bioengineering offers the possibility to provide these stimuli in a controlled fashion, resulting in the development of naturally inspired approaches to overcome major limitations of this nascent technology. Based on the current developments, we emphasize the achievements and ongoing challenges of bringing together hPSC organoid differentiation, bioengineering and ethics. This Review underlines the need for providing engineering solutions to gain control of self-organization and functionality of hPSC-derived organoids. We expect that this knowledge will guide the community to generate higher-grade hPSC-derived organoids for further applications in developmental biology, drug screening, disease modelling and personalized medicine. This Review provides an overview of bioengineering technologies that can be harnessed to facilitate the culture, self-organization and functionality of human pluripotent stem cell-derived organoids.

130 citations


Proceedings ArticleDOI
Isaac Elias, Heiga Zen, Jonathan Shen, Yu Zhang, Ye Jia, Ron Weiss, Yonghui Wu
06 Jun 2021
TL;DR: Parallel Tacotron is a non-autoregressive neural text-to-speech model augmented with a variational autoencoder-based residual encoder; it is highly parallelizable during both training and inference.
Abstract: Although neural end-to-end text-to-speech models can synthesize highly natural speech, there is still room for improvements to their efficiency and naturalness. This paper proposes a non-autoregressive neural text-to-speech model augmented with a variational autoencoder-based residual encoder. This model, called Parallel Tacotron, is highly parallelizable during both training and inference, allowing efficient synthesis on modern parallel hardware. The use of the variational autoencoder relaxes the one-to-many mapping nature of the text-to-speech problem and improves naturalness. To further improve the naturalness, we use lightweight convolutions, which can efficiently capture local contexts, and introduce an iterative spectrogram loss inspired by iterative refinement. Experimental results show that Parallel Tacotron matches a strong autoregressive baseline in subjective evaluations with significantly decreased inference time.

35 citations


Journal ArticleDOI
TL;DR: In this article, the authors describe control systems approaches for achieving context-aware devices that are robust to context effects, and then consider cell fate programming as a case study to explore the potential impact of context-aware devices for regenerative medicine applications.
Abstract: The rise of systems biology has ushered in a new paradigm: the view of the cell as a system that processes environmental inputs to drive phenotypic outputs. Synthetic biology provides a complementary approach, allowing us to program cell behavior through the addition of synthetic genetic devices into the cellular processor. These devices, and the complex genetic circuits they compose, are engineered using a design-prototype-test cycle, allowing for predictable device performance to be achieved in a context-dependent manner. Within mammalian cells, context effects impact device performance at multiple scales, including the genetic, cellular, and extracellular levels. In order for synthetic genetic devices to achieve predictable behaviors, approaches to overcome context dependence are necessary. Here, we describe control systems approaches for achieving context-aware devices that are robust to context effects. We then consider cell fate programming as a case study to explore the potential impact of context-aware devices for regenerative medicine applications.

26 citations


Proceedings ArticleDOI
Ron Weiss, RJ Skerry-Ryan, Eric Battenberg, Soroosh Mariooryad, Diederik P. Kingma
06 Jun 2021
TL;DR: The authors proposed a sequence-to-sequence neural network that generates speech waveforms directly from text inputs by incorporating a normalizing flow into the autoregressive decoder loop; the model can be optimized directly with maximum likelihood, without using intermediate, hand-designed features.
Abstract: We describe a sequence-to-sequence neural network which directly generates speech waveforms from text inputs. The architecture extends the Tacotron model by incorporating a normalizing flow into the autoregressive decoder loop. Output waveforms are modeled as a sequence of non-overlapping fixed-length blocks, each one containing hundreds of samples. The interdependencies of waveform samples within each block are modeled using the normalizing flow, enabling parallel training and synthesis. Longer-term dependencies are handled autoregressively by conditioning each flow on preceding blocks. This model can be optimized directly with maximum likelihood, without using intermediate, hand-designed features or additional loss terms. Contemporary state-of-the-art text-to-speech (TTS) systems use a cascade of separately learned models: one (such as Tacotron) which generates intermediate features (such as spectrograms) from text, followed by a vocoder (such as WaveRNN) which generates waveform samples from the intermediate features. The proposed system, in contrast, does not use a fixed intermediate representation, and learns all parameters end-to-end. Experiments show that the proposed model generates speech with quality approaching a state-of-the-art neural TTS system, with significantly improved generation speed.

20 citations
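
The block structure described in the abstract above (non-overlapping fixed-length blocks, each conditioned on the blocks before it) can be sketched as follows. This only illustrates the data layout: the function names, the zero-padding of the final block, and the use of raw preceding blocks as context are assumptions, and no normalizing flow or decoder is modeled.

```python
def split_into_blocks(waveform, block_size):
    """Split samples into non-overlapping fixed-length blocks, zero-padding the tail."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    remainder = len(waveform) % block_size
    padding = (block_size - remainder) % block_size
    padded = list(waveform) + [0.0] * padding
    return [padded[i:i + block_size] for i in range(0, len(padded), block_size)]


def autoregressive_contexts(blocks):
    """Pair each block with the list of preceding blocks it would be conditioned on."""
    return [(blocks[:k], blocks[k]) for k in range(len(blocks))]


blocks = split_into_blocks(list(range(10)), block_size=4)
pairs = autoregressive_contexts(blocks)
```

Samples inside one block could be generated in parallel, while the loop over blocks remains sequential; that is the speed argument the abstract makes.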


Journal ArticleDOI
02 Jul 2021-Science
TL;DR: In this article, a bistable toggle switch was created in Saccharomyces cerevisiae using a cross-repression topology comprising 11 protein-protein phosphorylation elements.
Abstract: Synthetic biological networks comprising fast, reversible reactions could enable engineering of new cellular behaviors that are not possible with slower regulation. Here, we created a bistable toggle switch in Saccharomyces cerevisiae using a cross-repression topology comprising 11 protein-protein phosphorylation elements. The toggle is ultrasensitive, can be induced to switch states in seconds, and exhibits long-term bistability. Motivated by our toggle's architecture and size, we developed a computational framework to search endogenous protein pathways for other large and similar bistable networks. Our framework helped us to identify and experimentally verify five formerly unreported endogenous networks that exhibit bistability. Building synthetic protein-protein networks will enable bioengineers to design fast sensing and processing systems, allow sophisticated regulation of cellular processes, and aid discovery of endogenous networks with particular functions.

16 citations


Journal ArticleDOI
TL;DR: The in vivo utility of a synthetic self-amplifying mRNA (RNA replicon) whose expression can be turned off using a genetic switch that responds to oral administration of trimethoprim (TMP), an FDA-approved small-molecule drug is validated.

10 citations


Proceedings ArticleDOI
17 Jun 2021
TL;DR: This article proposed WaveGrad 2, a non-autoregressive generative model for text-to-speech synthesis, which is trained to estimate the gradient of the log conditional density of the waveform given a phoneme sequence.
Abstract: This paper introduces WaveGrad 2, a non-autoregressive generative model for text-to-speech synthesis. WaveGrad 2 is trained to estimate the gradient of the log conditional density of the waveform given a phoneme sequence. The model takes an input phoneme sequence, and through an iterative refinement process, generates an audio waveform. This contrasts with the original WaveGrad vocoder, which conditions on mel-spectrogram features generated by a separate model. The iterative refinement process starts from Gaussian noise, and through a series of refinement steps (e.g., 50 steps), progressively recovers the audio sequence. WaveGrad 2 offers a natural way to trade off between inference speed and sample quality by adjusting the number of refinement steps. Experiments show that the model can generate high fidelity audio, approaching the performance of a state-of-the-art neural TTS system. We also report various ablation studies over different model configurations. Audio samples are available at this https URL.

10 citations


Posted Content
Scott Wisdom, Aren Jansen, Ron Weiss, Hakan Erdogan, John R. Hershey
TL;DR: In this article, the authors introduce sparsity losses that favor fewer output sources and a covariance loss that discourages correlated outputs to combat over-separation in the mixture invariant training (MixIT) method.
Abstract: Supervised neural network training has led to significant progress on single-channel sound separation. This approach relies on ground truth isolated sources, which precludes scaling to widely available mixture data and limits progress on open-domain tasks. The recent mixture invariant training (MixIT) method enables training on in-the-wild data; however, it suffers from two outstanding problems. First, it produces models which tend to over-separate, producing more output sources than are present in the input. Second, the exponential computational complexity of the MixIT loss limits the number of feasible output sources. These problems interact: increasing the number of output sources exacerbates over-separation. In this paper we address both issues. To combat over-separation we introduce new losses: sparsity losses that favor fewer output sources and a covariance loss that discourages correlated outputs. We also experiment with a semantic classification loss by predicting weak class labels for each mixture. To extend MixIT to larger numbers of sources, we introduce an efficient approximation using a fast least-squares solution, projected onto the MixIT constraint set. Our experiments show that the proposed losses curtail over-separation and improve overall performance. The best performance is achieved using larger numbers of output sources, enabled by our efficient MixIT loss, combined with sparsity losses to prevent over-separation. On the FUSS test set, we achieve over 13 dB in multi-source SI-SNR improvement, while boosting single-source reconstruction SI-SNR by over 17 dB.

8 citations
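
The exponential cost of the MixIT loss mentioned in the abstract above comes from searching over every assignment of separated outputs to reference mixtures. A brute-force sketch makes this visible. It is illustrative only: it uses mean squared error as the per-mixture loss (an assumption; the abstract does not state the signal-level loss), and `mixit_loss` is a hypothetical name, not an API from the paper's code.

```python
from itertools import product


def mse(a, b):
    """Mean squared error between two equal-length signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def mixit_loss(mixtures, outputs):
    """Brute-force MixIT: try every assignment of each output to exactly one mixture."""
    num_mix = len(mixtures)
    length = len(mixtures[0])
    best_loss, best_assignment = float("inf"), None
    # num_mix ** len(outputs) candidates: exponential in the number of outputs.
    for assignment in product(range(num_mix), repeat=len(outputs)):
        remixed = [[0.0] * length for _ in range(num_mix)]
        for out, mix_idx in zip(outputs, assignment):
            for t, value in enumerate(out):
                remixed[mix_idx][t] += value
        loss = sum(mse(m, r) for m, r in zip(mixtures, remixed))
        if loss < best_loss:
            best_loss, best_assignment = loss, assignment
    return best_loss, best_assignment
```

With two mixtures and M outputs the loop visits 2^M assignments, which is the bottleneck the abstract says its least-squares approximation is meant to avoid.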


Posted ContentDOI
31 Mar 2021-bioRxiv
TL;DR: In this article, the authors used proteins derived from bacterial two-component signaling pathways to develop synthetic phosphorylation-based and feedback-controlled devices in mammalian cells with such properties.
Abstract: Rewired and synthetic signaling networks can impart cells with new functionalities and enable efforts in engineering cell therapies and directing cell development. However, there is a need for tools to build synthetic signaling networks that are tunable, can precisely regulate target gene expression, and are robust to perturbations within the complex context of mammalian cells. Here, we use proteins derived from bacterial two-component signaling pathways to develop synthetic phosphorylation-based and feedback-controlled devices in mammalian cells with such properties. First, we isolate kinase and phosphatase proteins from the bifunctional histidine kinase EnvZ. We then use these proteins to engineer a synthetic covalent modification cycle, in which the kinase and phosphatase competitively regulate phosphorylation of the cognate response regulator OmpR, enabling analog tuning of OmpR-driven gene expression. Further, we show that the phosphorylation cycle can be extended by connecting phosphatase expression to small molecule and miRNA inputs in the cell, with the latter enabling cell-type specific signaling responses and accurate cell type classification. Finally, we implement a tunable negative feedback controller by co-expressing the kinase-driven output gene with the small molecule-tunable phosphatase. This negative feedback substantially reduces cell-to-cell noise in output expression and mitigates the effects of cell context perturbations due to off-target regulation and resource competition. Our work thus lays the foundation for establishing tunable, precise, and robust control over cell behavior with synthetic signaling networks.

7 citations


Journal ArticleDOI
15 Jul 2021
TL;DR: In this article, a combination of signal-to-noise ratio (SNR), area under a receiver operating characteristic curve (AUC), and fold change (FC) was used to quantitatively define digitizer performance and predict responses to different input signals.
Abstract: Many synthetic gene circuits are restricted to single-use applications or require iterative refinement for incorporation into complex systems. One example is the recombinase-based digitizer circuit, which has been used to improve weak or leaky biological signals. Here we present a workflow to quantitatively define digitizer performance and predict responses to different input signals. Using a combination of signal-to-noise ratio (SNR), area under a receiver operating characteristic curve (AUC), and fold change (FC), we evaluate three small-molecule inducible digitizer designs demonstrating FC up to 508x and SNR up to 3.77 dB. To study their behavior further and improve modularity, we develop a mixed phenotypic/mechanistic model capable of predicting digitizer configurations that amplify a synNotch cell-to-cell communication signal (Δ SNR up to 2.8 dB). We hope the metrics and modeling approaches here will facilitate incorporation of these digitizers into other systems while providing an improved workflow for gene circuit characterization.
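
Two of the metrics named in the abstract above, fold change and ROC AUC, can be computed from per-cell output measurements in the OFF and ON states. This is a minimal sketch under stated assumptions: fold change is taken as a ratio of arithmetic means (the abstract does not say whether arithmetic or geometric means are used), the function names are hypothetical, and SNR is omitted because its exact definition is not given in the abstract.

```python
def fold_change(on_values, off_values):
    """Ratio of mean ON-state output to mean OFF-state output."""
    mean_on = sum(on_values) / len(on_values)
    mean_off = sum(off_values) / len(off_values)
    return mean_on / mean_off


def roc_auc(on_values, off_values):
    """AUC as the probability that a random ON cell outscores a random OFF cell.

    Ties count as half a win (the Mann-Whitney formulation of ROC AUC).
    """
    wins = 0.0
    for on in on_values:
        for off in off_values:
            if on > off:
                wins += 1.0
            elif on == off:
                wins += 0.5
    return wins / (len(on_values) * len(off_values))


off = [1.0, 2.0, 3.0, 4.0]
on = [3.0, 40.0, 50.0, 60.0]
print(fold_change(on, off), roc_auc(on, off))
```

An AUC of 0.5 means the ON and OFF populations are indistinguishable by rank, while 1.0 means every ON cell exceeds every OFF cell; fold change alone does not capture that overlap.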

Journal ArticleDOI
20 Jan 2021
TL;DR: By varying the number of highly adhesive and less adhesive cells in multicellular aggregates, this work finds the cell-type ratio and total cell count control pattern formation, with resulting structures maintained for several days.
Abstract: Adhesion-mediated cell sorting has long been considered an organizing principle in developmental biology. While most computational models have emphasized the dynamics of segregation to fully sorted structures, cell sorting can also generate a plethora of transient, incompletely sorted states. The timescale of such states in experimental systems is unclear: if they are long-lived, they can be harnessed by development or engineered in synthetic tissues. Here, we use experiments and computational modeling to demonstrate how such structures can be systematically designed by quantitative control of cell composition. By varying the number of highly adhesive and less adhesive cells in multicellular aggregates, we find the cell-type ratio and total cell count control pattern formation, with resulting structures maintained for several days. Our work takes a step toward mapping the design space of self-assembling structures in development and provides guidance to the emerging field of shape engineering with synthetic biology.

Proceedings ArticleDOI
30 Aug 2021
TL;DR: The authors proposed a multitask training method for attention-based end-to-end speech recognition models to better incorporate language level information, which leads to an 11% relative performance improvement over the baseline and is comparable to language model shallow fusion.
Abstract: We propose a multitask training method for attention-based end-to-end speech recognition models to better incorporate language level information. We regularize the decoder in a sequence-to-sequence architecture by multitask training it on both the speech recognition task and a next-token prediction language modeling task. Trained on either the 100 hour subset of LibriSpeech or the full 960 hour dataset, the proposed method leads to an 11% relative performance improvement over the baseline and is comparable to language model shallow fusion, without requiring an additional neural network during decoding. Analyses of sample output sentences and the word error rate on rare words demonstrate that the proposed method can incorporate language level information effectively.

Posted ContentDOI
10 Oct 2021-bioRxiv
TL;DR: In this article, the authors derived an Engineering Error Inequality that provides a quantitative mathematical bound on the relationship between predictability of results, model accuracy, measurement precision, and device characteristics, recommending a target standard deviation of 1.5-fold.
Abstract: Reliable, predictable engineering of cellular behavior is one of the key goals of synthetic biology. As the field matures, biological engineers will become increasingly reliant on computer models that allow for the rapid exploration of design space prior to the more costly construction and characterization of candidate designs. The efficacy of such models, however, depends on the accuracy of their predictions, the precision of the measurements used to parameterize the models, and the tolerance of biological devices for imperfections in modeling and measurement. To better understand this relationship, we have derived an Engineering Error Inequality that provides a quantitative mathematical bound on the relationship between predictability of results, model accuracy, measurement precision, and device characteristics. We apply this relation to estimate measurement precision requirements for engineering genetic regulatory networks given current model and device characteristics, recommending a target standard deviation of 1.5-fold. We then compare these requirements with the results of an interlaboratory study to validate that these requirements can be met via flow cytometry with matched instrument channels and an independent calibrant. Based on these results, we recommend a set of best practices for quality control of flow cytometry data and discuss how these might be extended to other measurement modalities and applied to support further development of genetic regulatory network engineering.
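
A fold-scale precision target like the 1.5-fold standard deviation recommended above is naturally expressed as a geometric standard deviation. The sketch below assumes that interpretation (the abstract does not spell out the estimator), and `geometric_std` and `meets_target` are hypothetical helper names.

```python
import math


def geometric_std(values):
    """Geometric standard deviation: exp of the sample standard deviation of log(values)."""
    if len(values) < 2:
        raise ValueError("need at least two measurements")
    logs = [math.log(v) for v in values]
    mean = sum(logs) / len(logs)
    variance = sum((x - mean) ** 2 for x in logs) / (len(logs) - 1)
    return math.exp(math.sqrt(variance))


def meets_target(values, target_fold=1.5):
    """Check replicate measurements against a fold-scale precision target."""
    return geometric_std(values) <= target_fold


replicates = [100.0, 150.0]
print(geometric_std(replicates), meets_target(replicates))
```

A geometric standard deviation of 1.0 means identical replicates; values are positive by construction, which suits fluorescence measurements that vary multiplicatively.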

Book ChapterDOI
TL;DR: TASBE Image Analytics is a software pipeline for automatically segmenting collections of cells using the fluorescence channels of microscopy images; collections of cells can be grouped into spatially disjoint segments, and the movement or development of these segments tracked over time.
Abstract: Laboratory automation now commonly allows high-throughput sample preparation, culturing, and acquisition of microscopy images, but quantitative image analysis is often still a painstaking and subjective process. This is a problem especially significant for work on programmed morphogenesis, where the spatial organization of cells and cell types is of paramount importance. To address the challenges of quantitative analysis for such experiments, we have developed TASBE Image Analytics, a software pipeline for automatically segmenting collections of cells using the fluorescence channels of microscopy images. With TASBE Image Analytics, collections of cells can be grouped into spatially disjoint segments, the movement or development of these segments tracked over time, and rich statistical data output in a standardized format for analysis. Processing is readily configurable, rapid, and produces results that closely match hand annotation by humans for all but the smallest and dimmest segments. TASBE Image Analytics can thus provide the analysis necessary to complete the design-build-test-learn cycle for high-throughput experiments in programmed morphogenesis, as validated by our application of this pipeline to process experiments on shape formation with engineered CHO and HEK293 cells.

Patent
22 Apr 2021
TL;DR: This patent discloses methods of generating proteoglycans with distinct glycan structures in engineered, non-naturally occurring eukaryotic cells, making accessible a dynamic range of protein glycosylation.
Abstract: Disclosed herein are methods of generating proteoglycans with distinct glycan structures in engineered, non-naturally occurring eukaryotic cells. These methods make accessible a dynamic range of protein glycosylation. Compositions of engineered, non-naturally occurring cells capable of generating these proteoglycans are also disclosed herein.

Patent
Ehsan Variani, Kevin W. Wilson, Ron Weiss, Tara N. Sainath, Arun Narayanan
13 Jul 2021

Posted Content
TL;DR: This paper proposed WaveGrad 2, a non-autoregressive generative model for text-to-speech synthesis, which is trained to estimate the gradient of the log conditional density of the waveform given a phoneme sequence.
Abstract: This paper introduces WaveGrad 2, a non-autoregressive generative model for text-to-speech synthesis. WaveGrad 2 is trained to estimate the gradient of the log conditional density of the waveform given a phoneme sequence. The model takes an input phoneme sequence, and through an iterative refinement process, generates an audio waveform. This contrasts with the original WaveGrad vocoder, which conditions on mel-spectrogram features generated by a separate model. The iterative refinement process starts from Gaussian noise, and through a series of refinement steps (e.g., 50 steps), progressively recovers the audio sequence. WaveGrad 2 offers a natural way to trade off between inference speed and sample quality by adjusting the number of refinement steps. Experiments show that the model can generate high fidelity audio, approaching the performance of a state-of-the-art neural TTS system. We also report various ablation studies over different model configurations. Audio samples are available at this https URL.