
Showing papers by "Ron Weiss" published in 2019


Posted Content
Heiga Zen1, Viet Dang, Robert A. J. Clark1, Yu Zhang1, Ron Weiss1, Ye Jia1, Zhifeng Chen1, Yonghui Wu1 
TL;DR: This paper introduces a new speech corpus called "LibriTTS" for text-to-speech use, derived from the original audio and text materials of the LibriSpeech corpus, which was built for training and evaluating automatic speech recognition systems.
Abstract: This paper introduces a new speech corpus called "LibriTTS" designed for text-to-speech use. It is derived from the original audio and text materials of the LibriSpeech corpus, which has been used for training and evaluating automatic speech recognition systems. The new corpus inherits desired properties of the LibriSpeech corpus while addressing a number of issues which make LibriSpeech less than ideal for text-to-speech work. The released corpus consists of 585 hours of speech data at 24kHz sampling rate from 2,456 speakers and the corresponding texts. Experimental results show that neural end-to-end TTS models trained from the LibriTTS corpus achieved above 4.0 in mean opinion scores in naturalness in five out of six evaluation speakers. The corpus is freely available for download from this http URL.

303 citations


Proceedings ArticleDOI
Heiga Zen1, Viet Dang, Robert A. J. Clark1, Yu Zhang1, Ron Weiss1, Ye Jia1, Zhifeng Chen1, Yonghui Wu1 
05 Apr 2019
TL;DR: Experimental results show that neural end-to-end TTS models trained from the LibriTTS corpus achieved above 4.0 in mean opinion scores in naturalness in five out of six evaluation speakers.
Abstract: This paper introduces a new speech corpus called "LibriTTS" designed for text-to-speech use. It is derived from the original audio and text materials of the LibriSpeech corpus, which has been used for training and evaluating automatic speech recognition systems. The new corpus inherits desired properties of the LibriSpeech corpus while addressing a number of issues which make LibriSpeech less than ideal for text-to-speech work. The released corpus consists of 585 hours of speech data at 24kHz sampling rate from 2,456 speakers and the corresponding texts. Experimental results show that neural end-to-end TTS models trained from the LibriTTS corpus achieved above 4.0 in mean opinion scores in naturalness in five out of six evaluation speakers. The corpus is freely available for download from this http URL.

286 citations


Journal ArticleDOI
TL;DR: A regularization scheme is introduced that forces the representations to focus on the phonetic content of the utterance, yielding performance comparable with the top entries in the ZeroSpeech 2017 unsupervised acoustic unit discovery task.
Abstract: We consider the task of unsupervised extraction of meaningful latent representations of speech by applying autoencoding neural networks to speech waveforms. The goal is to learn a representation able to capture high level semantic content from the signal, e.g. phoneme identities, while being invariant to confounding low level details in the signal such as the underlying pitch contour or background noise. Since the learned representation is tuned to contain only phonetic content, we resort to using a high capacity WaveNet decoder to infer information discarded by the encoder from previous samples. Moreover, the behavior of autoencoder models depends on the kind of constraint that is applied to the latent representation. We compare three variants: a simple dimensionality reduction bottleneck, a Gaussian Variational Autoencoder (VAE), and a discrete Vector Quantized VAE (VQ-VAE). We analyze the quality of learned representations in terms of speaker independence, the ability to predict phonetic content, and the ability to accurately reconstruct individual spectrogram frames. Moreover, for discrete encodings extracted using the VQ-VAE, we measure the ease of mapping them to phonemes. We introduce a regularization scheme that forces the representations to focus on the phonetic content of the utterance and report performance comparable with the top entries in the ZeroSpeech 2017 unsupervised acoustic unit discovery task.

252 citations
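To make the comparison of latent constraints concrete, here is a minimal numpy sketch of the discrete VQ-VAE bottleneck the abstract describes: each encoder frame is snapped to its nearest codebook vector. Shapes, sizes, and names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def vq_bottleneck(z_e, codebook):
    """Quantize encoder outputs to their nearest codebook entries.

    z_e:      (T, D) continuous encoder outputs for T frames.
    codebook: (K, D) learned embedding vectors.
    Returns the quantized latents and the chosen code indices.
    """
    # Squared Euclidean distance from every frame to every code.
    dists = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (T, K)
    codes = dists.argmin(axis=1)                                     # (T,)
    z_q = codebook[codes]                                            # (T, D)
    # During training, gradients pass through a straight-through estimator:
    # z_q = z_e + stop_gradient(z_q - z_e).
    return z_q, codes

rng = np.random.default_rng(0)
z_q, codes = vq_bottleneck(rng.normal(size=(100, 64)), rng.normal(size=(512, 64)))
print(codes[:10])  # discrete units that can later be mapped to phonemes
```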


Posted Content
TL;DR: This document outlines the underlying design of Lingvo and serves as an introduction to the various pieces of the framework, while also offering examples of advanced features that showcase its capabilities.
Abstract: Lingvo is a Tensorflow framework offering a complete solution for collaborative deep learning research, with a particular focus towards sequence-to-sequence models. Lingvo models are composed of modular building blocks that are flexible and easily extensible, and experiment configurations are centralized and highly customizable. Distributed training and quantized inference are supported directly within the framework, and it contains existing implementations of a large number of utilities, helper functions, and the newest research ideas. Lingvo has been used in collaboration by dozens of researchers in more than 20 papers over the last two years. This document outlines the underlying design of Lingvo and serves as an introduction to the various pieces of the framework, while also offering examples of advanced features that showcase the capabilities of the framework.

213 citations
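As an illustration of the centralized, highly customizable experiment configuration the abstract highlights, here is a toy sketch of that pattern in Python. The Params class and its methods are hypothetical stand-ins for the design idea, not Lingvo's actual API.

```python
# A toy Params class illustrating centralized, hierarchical experiment
# configuration; names here are illustrative, not Lingvo's actual API.
class Params:
    def __init__(self, **kwargs):
        self._vals = dict(kwargs)

    def copy(self, **overrides):
        # Derive a new config without mutating the base one.
        return Params(**dict(self._vals, **overrides))

    def __getattr__(self, name):
        try:
            return self._vals[name]
        except KeyError:
            raise AttributeError(name)

base = Params(hidden_dim=512, num_layers=4, learning_rate=1e-3)
# Each experiment derives from the base config instead of editing it in place.
big_model = base.copy(hidden_dim=1024, num_layers=8)
print(big_model.hidden_dim, big_model.learning_rate)  # 1024 0.001
```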


Proceedings ArticleDOI
15 Sep 2019
TL;DR: A novel system that separates the voice of a target speaker from multi-speaker signals by making use of a reference signal from the target speaker, realized by training two separate neural networks.

149 citations


Proceedings Article
01 Jan 2019
TL;DR: This article proposes a conditional generative model based on the variational autoencoder (VAE) framework with two levels of hierarchical latent variables: a categorical variable representing attribute groups (e.g. clean/noisy) and a multivariate Gaussian variable that characterizes specific attribute configurations and enables disentangled fine-grained control over these attributes.
Abstract: This paper proposes a neural sequence-to-sequence text-to-speech (TTS) model which can control latent attributes in the generated speech that are rarely annotated in the training data, such as speaking style, accent, background noise, and recording conditions. The model is formulated as a conditional generative model based on the variational autoencoder (VAE) framework, with two levels of hierarchical latent variables. The first level is a categorical variable, which represents attribute groups (e.g. clean/noisy) and provides interpretability. The second level, conditioned on the first, is a multivariate Gaussian variable, which characterizes specific attribute configurations (e.g. noise level, speaking rate) and enables disentangled fine-grained control over these attributes. This amounts to using a Gaussian mixture model (GMM) for the latent distribution. Extensive evaluation demonstrates its ability to control the aforementioned attributes. In particular, we train a high-quality controllable TTS model on real found data, which is capable of inferring speaker and style attributes from a noisy utterance and use it to synthesize clean speech with controllable speaking style.

140 citations
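The two-level latent hierarchy described above amounts to a Gaussian mixture prior, which is easiest to see in a sampling sketch. The numpy snippet below is illustrative only; the dimensions, mixture weights, and group semantics are assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

K, D = 3, 16                       # attribute groups, latent dimensionality
pi = np.array([0.5, 0.3, 0.2])     # prior over groups, e.g. clean/noisy/...
mu = rng.normal(size=(K, D))       # per-group Gaussian means
sigma = np.full((K, D), 0.5)       # per-group diagonal std devs

def sample_latent():
    """Two-level hierarchical sample: categorical group, then Gaussian."""
    y = rng.choice(K, p=pi)            # first level: which attribute group
    z = rng.normal(mu[y], sigma[y])    # second level: fine-grained configuration
    return y, z

y, z = sample_latent()
# Marginally, z follows a Gaussian mixture; conditioning the decoder on z
# (with y fixed to, say, the "clean" group) is what enables controllable
# synthesis in this kind of model.
```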


Journal ArticleDOI
TL;DR: In this article, unsupervised extraction of meaningful latent representations of speech by applying autoencoding neural networks to speech waveforms is considered. Since the learned representation is tuned to contain only phonetic content, a high-capacity WaveNet decoder is used to infer information discarded by the encoder from previous samples.
Abstract: We consider the task of unsupervised extraction of meaningful latent representations of speech by applying autoencoding neural networks to speech waveforms. The goal is to learn a representation able to capture high level semantic content from the signal, e.g. phoneme identities, while being invariant to confounding low level details in the signal such as the underlying pitch contour or background noise. Since the learned representation is tuned to contain only phonetic content, we resort to using a high capacity WaveNet decoder to infer information discarded by the encoder from previous samples. Moreover, the behavior of autoencoder models depends on the kind of constraint that is applied to the latent representation. We compare three variants: a simple dimensionality reduction bottleneck, a Gaussian Variational Autoencoder (VAE), and a discrete Vector Quantized VAE (VQ-VAE). We analyze the quality of learned representations in terms of speaker independence, the ability to predict phonetic content, and the ability to accurately reconstruct individual spectrogram frames. Moreover, for discrete encodings extracted using the VQ-VAE, we measure the ease of mapping them to phonemes. We introduce a regularization scheme that forces the representations to focus on the phonetic content of the utterance and report performance comparable with the top entries in the ZeroSpeech 2017 unsupervised acoustic unit discovery task.

128 citations


Proceedings ArticleDOI
12 May 2019
TL;DR: Experimental results demonstrate that the proposed method can disentangle speaker and noise attributes even if they are correlated in the training data, and can be used to consistently synthesize clean speech for all speakers.
Abstract: To leverage crowd-sourced data to train multi-speaker text-to-speech (TTS) models that can synthesize clean speech for all speakers, it is essential to learn disentangled representations which can independently control the speaker identity and background noise in generated signals. However, learning such representations can be challenging, due to the lack of labels describing the recording conditions of each training example, and the fact that speakers and recording conditions are often correlated, e.g. since users often make many recordings using the same equipment. This paper proposes three components to address this problem by: (1) formulating a conditional generative model with factorized latent variables, (2) using data augmentation to add noise that is not correlated with speaker identity and whose label is known during training, and (3) using adversarial factorization to improve disentanglement. Experimental results demonstrate that the proposed method can disentangle speaker and noise attributes even if they are correlated in the training data, and can be used to consistently synthesize clean speech for all speakers. Ablation studies verify the importance of each proposed component.

113 citations
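A minimal sketch of the data-augmentation component, item (2) above: mixing independently sampled noise into clean speech at a known SNR, so the augmentation label is known at training time and uncorrelated with speaker identity. Signal lengths and SNR values are illustrative assumptions.

```python
import numpy as np

def augment_with_noise(speech, noise, snr_db):
    """Mix noise into a clean utterance at a target SNR (in dB).

    Because the noise is sampled independently of the speaker, the
    augmentation label ("noisy", plus the SNR) is known during training
    and is uncorrelated with speaker identity.
    """
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise

rng = np.random.default_rng(0)
utt = rng.normal(size=16000)                 # stand-in for 1 s of speech
noisy = augment_with_noise(utt, rng.normal(size=16000), snr_db=10.0)
```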


Proceedings ArticleDOI
12 May 2019
TL;DR: This paper showed that using pre-trained MT or text-to-speech (TTS) synthesis models to convert weakly supervised data into speechto-translation pairs for ST training can be more effective than multi-task learning.
Abstract: End-to-end Speech Translation (ST) models have many potential advantages when compared to the cascade of Automatic Speech Recognition (ASR) and text Machine Translation (MT) models, including lowered inference latency and the avoidance of error compounding. However, the quality of end-to-end ST is often limited by a paucity of training data, since it is difficult to collect large parallel corpora of speech and translated transcript pairs. Previous studies have proposed the use of pre-trained components and multi-task learning in order to benefit from weakly supervised training data, such as speech-to-transcript or text-to-foreign-text pairs. In this paper, we demonstrate that using pre-trained MT or text-to-speech (TTS) synthesis models to convert weakly supervised data into speech-to-translation pairs for ST training can be more effective than multi-task learning. Furthermore, we demonstrate that a high quality end-to-end ST model can be trained using only weakly supervised datasets, and that synthetic data sourced from unlabeled monolingual text or speech can be used to improve performance. Finally, we discuss methods for avoiding overfitting to synthetic speech with a quantitative ablation study.

108 citations
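A sketch of the data-conversion idea for the MT direction: weakly supervised (speech, transcript) ASR pairs become synthetic (speech, translation) ST pairs by machine-translating the transcripts. `translate_text` is a hypothetical stand-in for a pre-trained MT model, not an API from the paper.

```python
# Converting weakly supervised ASR pairs into synthetic ST training pairs.
# `translate_text` stands in for a pre-trained MT model (hypothetical).
def make_synthetic_st_pairs(asr_pairs, translate_text):
    st_pairs = []
    for speech, transcript in asr_pairs:
        translation = translate_text(transcript)  # MT produces the target side
        st_pairs.append((speech, translation))
    return st_pairs

# The symmetric trick uses a pre-trained TTS model to synthesize the *source*
# speech side from text-to-foreign-text MT pairs instead.
```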


Posted Content
Ye Jia1, Ron Weiss1, Fadi Biadsy1, Wolfgang Macherey1, Melvin Johnson1, Zhifeng Chen1, Yonghui Wu1 
TL;DR: The authors presented an attention-based sequence-to-sequence neural network which can directly translate speech from one language into speech in another language, without relying on an intermediate text representation, and demonstrated the ability to synthesize translated speech using the voice of the source speaker.
Abstract: We present an attention-based sequence-to-sequence neural network which can directly translate speech from one language into speech in another language, without relying on an intermediate text representation. The network is trained end-to-end, learning to map speech spectrograms into target spectrograms in another language, corresponding to the translated content (in a different canonical voice). We further demonstrate the ability to synthesize translated speech using the voice of the source speaker. We conduct experiments on two Spanish-to-English speech translation datasets, and find that the proposed model slightly underperforms a baseline cascade of a direct speech-to-text translation model and a text-to-speech synthesis model, demonstrating the feasibility of the approach on this very challenging task.

107 citations


Proceedings ArticleDOI
09 Jul 2019
TL;DR: This article presented a multispeaker, multilingual text-to-speech (TTS) synthesis model based on Tacotron that is able to produce high quality speech in multiple languages.
Abstract: We present a multispeaker, multilingual text-to-speech (TTS) synthesis model based on Tacotron that is able to produce high quality speech in multiple languages. Moreover, the model is able to transfer voices across languages, e.g. synthesize fluent Spanish speech using an English speaker's voice, without training on any bilingual or parallel examples. Such transfer works across distantly related languages, e.g. English and Mandarin. Critical to achieving this result are: 1. using a phonemic input representation to encourage sharing of model capacity across languages, and 2. incorporating an adversarial loss term to encourage the model to disentangle its representation of speaker identity (which is perfectly correlated with language in the training data) from the speech content. Further scaling up the model by training on multiple speakers of each language, and incorporating an autoencoding input to help stabilize attention during training, results in a model which can be used to consistently synthesize intelligible speech for training speakers in all languages seen during training, and in native or foreign accents.
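One common way to implement an adversarial disentanglement term like the one described above is a gradient reversal layer; the PyTorch sketch below shows that mechanism under this assumption (the paper's exact formulation may differ).

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips the gradient in the backward pass."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

# A speaker classifier attached through grad_reverse() is trained to predict
# the speaker, while the reversed gradient pushes the shared encoder to
# *remove* speaker information from its representation.
text_encoding = torch.randn(8, 128, requires_grad=True)
speaker_logits = torch.nn.Linear(128, 10)(grad_reverse(text_encoding))
```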


Proceedings ArticleDOI
12 May 2019
TL;DR: The authors propose a spelling correction model that explicitly corrects the characteristic errors made by attention-based sequence-to-sequence models, showing an 18.6% relative improvement in WER over the baseline model when directly correcting the top ASR hypothesis, and a 29.0% relative improvement when further rescoring an expanded n-best list using an external LM.
Abstract: Attention-based sequence-to-sequence models for speech recognition jointly train an acoustic model, language model (LM), and alignment mechanism using a single neural network and require only parallel audio-text pairs. Thus, the language model component of the end-to-end model is only trained on transcribed audio-text pairs, which leads to performance degradation especially on rare words. While there have been a variety of work that look at incorporating an external LM trained on text-only data into the end-to-end framework, none of them have taken into account the characteristic error distribution made by the model. In this paper, we propose a novel approach to utilizing text-only data, by training a spelling correction (SC) model to explicitly correct those errors. On the LibriSpeech dataset, we demonstrate that the proposed model results in an 18.6% relative improvement in WER over the baseline model when directly correcting top ASR hypothesis, and a 29.0% relative improvement when further rescoring an expanded n-best list using an external LM.
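For context, the external-LM rescoring step mentioned at the end is typically log-linear interpolation of ASR and LM scores over the n-best list; in the paper the list is first expanded and corrected by the SC model. Below is a minimal sketch with made-up hypotheses and scores.

```python
import numpy as np

def rescore_nbest(hypotheses, asr_scores, lm_scores, lm_weight=0.3):
    """Log-linear rescoring of an n-best list with an external LM.

    asr_scores and lm_scores are log-probabilities per hypothesis;
    lm_weight is a tunable interpolation hyperparameter.
    """
    total = np.asarray(asr_scores) + lm_weight * np.asarray(lm_scores)
    return hypotheses[int(np.argmax(total))]

best = rescore_nbest(
    ["the cat sat", "the cat sad"],
    asr_scores=[-4.1, -3.9],   # ASR slightly prefers the second hypothesis
    lm_scores=[-9.0, -14.0],   # the LM strongly prefers the first
)
print(best)  # "the cat sat"
```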

Proceedings ArticleDOI
Ye Jia1, Ron Weiss1, Fadi Biadsy1, Wolfgang Macherey1, Melvin Johnson1, Zhifeng Chen1, Yonghui Wu1 
12 Apr 2019
TL;DR: An attention-based sequence-to-sequence neural network which can directly translate speech from one language into speech in another language, without relying on an intermediate text representation is presented.
Abstract: We present an attention-based sequence-to-sequence neural network which can directly translate speech from one language into speech in another language, without relying on an intermediate text representation. The network is trained end-to-end, learning to map speech spectrograms into target spectrograms in another language, corresponding to the translated content (in a different canonical voice). We further demonstrate the ability to synthesize translated speech using the voice of the source speaker. We conduct experiments on two Spanish-to-English speech translation datasets, and find that the proposed model slightly underperforms a baseline cascade of a direct speech-to-text translation model and a text-to-speech synthesis model, demonstrating the feasibility of the approach on this very challenging task.

Posted Content
TL;DR: A multispeaker, multilingual text-to-speech (TTS) synthesis model based on Tacotron that is able to produce high quality speech in multiple languages and to transfer voices across languages, even distantly related ones such as English and Mandarin.
Abstract: We present a multispeaker, multilingual text-to-speech (TTS) synthesis model based on Tacotron that is able to produce high quality speech in multiple languages. Moreover, the model is able to transfer voices across languages, e.g. synthesize fluent Spanish speech using an English speaker's voice, without training on any bilingual or parallel examples. Such transfer works across distantly related languages, e.g. English and Mandarin. Critical to achieving this result are: 1. using a phonemic input representation to encourage sharing of model capacity across languages, and 2. incorporating an adversarial loss term to encourage the model to disentangle its representation of speaker identity (which is perfectly correlated with language in the training data) from the speech content. Further scaling up the model by training on multiple speakers of each language, and incorporating an autoencoding input to help stabilize attention during training, results in a model which can be used to consistently synthesize intelligible speech for training speakers in all languages seen during training, and in native or foreign accents.


Journal ArticleDOI
TL;DR: Synthetic gene circuits regulated by small molecules have been used to fine-tune glycosyltransferase expression in CHO cells, providing a method to produce therapeutic monoclonal antibodies with precise glycosylation states.
Abstract: N-linked glycosylation in monoclonal antibodies (mAbs) is crucial for structural and functional properties of mAb therapeutics, including stability, pharmacokinetics, safety and clinical efficacy. The biopharmaceutical industry currently lacks tools to precisely control N-glycosylation levels during mAb production. In this study, we engineered Chinese hamster ovary cells with synthetic genetic circuits to tune N-glycosylation of a stably expressed IgG. We knocked out two key glycosyltransferase genes, α-1,6-fucosyltransferase (FUT8) and β-1,4-galactosyltransferase (β4GALT1), genomically integrated circuits expressing synthetic glycosyltransferase genes under constitutive or inducible promoters and generated antibodies with concurrently desired fucosylation (0-97%) and galactosylation (0-87%) levels. Simultaneous and independent control of FUT8 and β4GALT1 expression was achieved using orthogonal small molecule inducers. Effector function studies confirmed that glycosylation profile changes affected antibody binding to a cell surface receptor. Precise and rational modification of N-glycosylation will allow new recombinant protein therapeutics with tailored in vitro and in vivo effects for various biotechnological and biomedical applications.

Journal ArticleDOI
22 Feb 2019 - iScience
TL;DR: In this article, an integrative approach involving multi-dimensional omics analyses was employed to dissect the temporal dynamics of glycoforms produced during fed-batch cultures of CHO cells.

Journal ArticleDOI
TL;DR: A next-generation sequencing approach combined with machine learning is used to screen a synthetic promoter library of 6107 designs for high-performance SPECS targeting potentially any cell state.
Abstract: Cell state-specific promoters constitute essential tools for basic research and biotechnology because they activate gene expression only under certain biological conditions. Synthetic Promoters with Enhanced Cell-State Specificity (SPECS) can be superior to native ones, but the design of such promoters is challenging and frequently requires gene regulation or transcriptome knowledge that is not readily available. Here, to overcome this challenge, we use a next-generation sequencing approach combined with machine learning to screen a synthetic promoter library with 6107 designs for high-performance SPECS for potentially any cell state. We demonstrate the identification of multiple SPECS that exhibit distinct spatiotemporal activity during the programmed differentiation of induced pluripotent stem cells (iPSCs), as well as SPECS for breast cancer and glioblastoma stem-like cells. We anticipate that this approach could be used to create SPECS for gene therapies that are activated in specific cell states, as well as to study natural transcriptional regulatory networks. Synthetic promoters can be superior to native ones but the design is challenging without knowledge of gene regulation. Here the authors develop a pipeline that allows for screening a synthetic promoter library to identify high performance promoters in potentially any given cell state of interest.
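As a simplified stand-in for the screening analysis, one can rank library members by differential activity between cell states estimated from normalized NGS read counts. The scoring below is an assumed toy metric for illustration, not the paper's machine-learning pipeline.

```python
import numpy as np

def rank_promoters(target_counts, offtarget_counts, pseudocount=1.0):
    """Rank library members by log fold change between two cell states.

    target_counts / offtarget_counts: NGS read counts per promoter design,
    already normalized for sequencing depth. A simple specificity score is
    the log-ratio of activity in the target vs. off-target state.
    """
    t = np.asarray(target_counts, float) + pseudocount
    o = np.asarray(offtarget_counts, float) + pseudocount
    score = np.log2(t / o)
    return np.argsort(score)[::-1]     # most state-specific designs first

order = rank_promoters([500, 20, 80], [15, 25, 70])
print(order)  # design 0 is the most cell-state-specific here
```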

Journal ArticleDOI
TL;DR: An in vitro evolution strategy identified six mutations in nonstructural proteins of the Venezuelan equine encephalitis replicon that promote subgenome expression in cells and may be useful for improving RNA therapeutics for vaccination, cancer immunotherapy, and gene therapy.
Abstract: Self-replicating (replicon) RNA is a promising new platform for gene therapy, but applications are still limited by short persistence of expression in most cell types and low levels of transgene expression in vivo. To address these shortcomings, we developed an in vitro evolution strategy and identified six mutations in nonstructural proteins (nsPs) of Venezuelan equine encephalitis (VEE) replicon that promoted subgenome expression in cells. Two mutations in nsP2 and nsP3 enhanced transgene expression, while three mutations in nsP3 regulated this expression. Replicons containing the most effective mutation combinations showed enhanced duration and cargo gene expression in vivo. In comparison to wildtype replicon, mutants expressing IL-2 injected into murine B16F10 melanoma showed 5.5-fold increase in intratumoral IL-2 and 2.1-fold increase in infiltrating CD8 T cells, resulting in significantly slowed tumor growth. Thus, these mutant replicons may be useful for improving RNA therapeutics for vaccination, cancer immunotherapy, and gene therapy.

Journal ArticleDOI
TL;DR: It is shown that a point mutation (Bxb1-GA) in Bxb1 target sites significantly increases Bxb1-mediated integration efficiency at the Rosa26 locus in Chinese hamster ovary cells, resulting in the highest integration efficiency reported with a site-specific integrase in mammalian cells.
Abstract: Phage-derived integrases can catalyze irreversible, site-specific integration of transgenic payloads into a chromosomal locus, resulting in mammalian cells that stably express transgenes or circuits of interest. Previous studies have demonstrated high-efficiency integration by the Bxb1 integrase in mammalian cells. Here, we show that a point mutation (Bxb1-GA) in Bxb1 target sites significantly increases Bxb1-mediated integration efficiency at the Rosa26 locus in Chinese hamster ovary cells, resulting in the highest integration efficiency reported with a site-specific integrase in mammalian cells. Bxb1-GA point mutant sites do not cross-react with Bxb1 wild-type sites, enabling their use in applications that require orthogonal pairs of target sites. In comparison, we test the efficiency and orthogonality of ϕC31 and Wβ integrases, and show that Wβ has an integration efficiency between those of Bxb1-GA and wild-type Bxb1. Our data present a toolbox of integrases for inserting payloads such as gene circuits or therapeutic transgenes into mammalian cell lines.

Journal ArticleDOI
TL;DR: One-pot evaluation enabled by poly-transfection accelerates and simplifies the design of genetic systems, providing a new high-information strategy for interrogating biology.
Abstract: Biological research is relying on increasingly complex genetic systems and circuits to perform sophisticated operations in living cells. Performing these operations often requires simultaneous delivery of many genes, and optimizing the stoichiometry of these genes can yield drastic improvements in performance. However, sufficiently sampling the large design space of gene expression stoichiometries in mammalian cells using current methods is cumbersome, complex, or expensive. We present a 'poly-transfection' method as a simple yet high-throughput alternative that enables comprehensive evaluation of genetic systems in a single, readily-prepared transfection sample. Each cell in a poly-transfection represents an independent measurement at a distinct gene expression stoichiometry, fully leveraging the single-cell nature of transfection experiments. We first benchmark poly-transfection against co-transfection, showing that titration curves for commonly-used regulators agree between the two methods. We then use poly-transfections to efficiently generate new insights, for example in CRISPRa and synthetic miRNA systems. Finally, we use poly-transfection to rapidly engineer a difficult-to-optimize miRNA-based cell classifier for discriminating cancerous cells. One-pot evaluation enabled by poly-transfection accelerates and simplifies the design of genetic systems, providing a new high-information strategy for interrogating biology.
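The analysis idea behind poly-transfection can be sketched as binning single cells by the per-cell dose of each construct, read out from co-delivered fluorescent markers. The bin counts and log-binning scheme below are illustrative assumptions.

```python
import numpy as np

def bin_by_stoichiometry(marker_a, marker_b, n_bins=10):
    """Group single cells by the delivered ratio of two plasmids.

    In a poly-transfection, each plasmid carries its own fluorescent marker,
    so per-cell marker levels report the per-cell dose of each construct.
    Binning cells by log marker levels turns one sample into a grid of
    stoichiometry conditions.
    """
    la = np.log10(np.asarray(marker_a) + 1.0)
    lb = np.log10(np.asarray(marker_b) + 1.0)
    edges_a = np.linspace(la.min(), la.max(), n_bins + 1)
    edges_b = np.linspace(lb.min(), lb.max(), n_bins + 1)
    ia = np.clip(np.digitize(la, edges_a) - 1, 0, n_bins - 1)
    ib = np.clip(np.digitize(lb, edges_b) - 1, 0, n_bins - 1)
    return ia, ib      # per-cell (row, column) bin in the dose grid

rng = np.random.default_rng(0)
ia, ib = bin_by_stoichiometry(rng.lognormal(3, 1, 5000), rng.lognormal(3, 1, 5000))
```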

Posted Content
TL;DR: This paper proposes a novel approach to utilizing text-only data, by training a spelling correction (SC) model to explicitly correct errors made by the end-to-end model.
Abstract: Attention-based sequence-to-sequence models for speech recognition jointly train an acoustic model, language model (LM), and alignment mechanism using a single neural network and require only parallel audio-text pairs. Thus, the language model component of the end-to-end model is only trained on transcribed audio-text pairs, which leads to performance degradation especially on rare words. While there have been a variety of work that look at incorporating an external LM trained on text-only data into the end-to-end framework, none of them have taken into account the characteristic error distribution made by the model. In this paper, we propose a novel approach to utilizing text-only data, by training a spelling correction (SC) model to explicitly correct those errors. On the LibriSpeech dataset, we demonstrate that the proposed model results in an 18.6% relative improvement in WER over the baseline model when directly correcting top ASR hypothesis, and a 29.0% relative improvement when further rescoring an expanded n-best list using an external LM.

Proceedings ArticleDOI
01 Dec 2019
TL;DR: A Biomolecular Neural Network (BNN), a dynamical chemical reaction network which faithfully implements ANN computations and which is unconditionally stable with respect to its parameters when composed into deeper networks is proposed.
Abstract: While much of synthetic biology was founded on the creation of reusable, standardized parts, there is now a growing interest in synthetic networks which can compute unique, specially-designed functions in order to recognize patterns or classify cells in-vivo. While artificial neural networks (ANNs) have long provided a mature mathematical framework to address this problem in-silico, their implementation becomes much more challenging in living systems. In this work, we propose a Biomolecular Neural Network (BNN), a dynamical chemical reaction network which faithfully implements ANN computations and which is unconditionally stable with respect to its parameters when composed into deeper networks. Our implementation emphasizes the usefulness of molecular sequestration for achieving negative weight values and a nonlinear "activation function" in its elemental unit, a biomolecular perceptron. We then discuss the application of BNNs to linear and nonlinear classification tasks, and draw analogies to other major concepts in modern machine learning research.
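The elemental unit described here, a biomolecular perceptron, can be sketched as a two-species ODE in which molecular sequestration implements subtraction and a ReLU-like activation. The rate constants below are arbitrary illustrative values, not parameters from the paper.

```python
def biomolecular_perceptron(u_plus, u_minus, gamma=1.0, eta=100.0,
                            dt=1e-3, steps=20000):
    """Euler simulation of a sequestration-based perceptron unit.

    Species x1 and x2 are produced at rates u_plus and u_minus (the positive
    and negative parts of a weighted sum), dilute at rate gamma, and
    annihilate each other at rate eta. At steady state the free x1 level
    approximates max(0, (u_plus - u_minus) / gamma): a ReLU-like activation.
    """
    x1 = x2 = 0.0
    for _ in range(steps):
        dx1 = u_plus - gamma * x1 - eta * x1 * x2
        dx2 = u_minus - gamma * x2 - eta * x1 * x2
        x1 += dt * dx1
        x2 += dt * dx2
    return x1

print(biomolecular_perceptron(2.0, 0.5))  # ~1.5: positive net input passes
print(biomolecular_perceptron(0.5, 2.0))  # ~0:   negative net input is clipped
```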

Posted Content
Fadi Biadsy1, Ron Weiss1, Pedro J. Moreno1, Dimitri Kanevsky1, Ye Jia1 
TL;DR: It is demonstrated that this model can be trained to normalize speech from any speaker regardless of accent, prosody, and background noise, into the voice of a single canonical target speaker with a fixed accent and consistent articulation and prosody.
Abstract: We describe Parrotron, an end-to-end-trained speech-to-speech conversion model that maps an input spectrogram directly to another spectrogram, without utilizing any intermediate discrete representation The network is composed of an encoder, spectrogram and phoneme decoders, followed by a vocoder to synthesize a time-domain waveform We demonstrate that this model can be trained to normalize speech from any speaker regardless of accent, prosody, and background noise, into the voice of a single canonical target speaker with a fixed accent and consistent articulation and prosody We further show that this normalization model can be adapted to normalize highly atypical speech from a deaf speaker, resulting in significant improvements in intelligibility and naturalness, measured via a speech recognizer and listening tests Finally, demonstrating the utility of this model on other speech tasks, we show that the same model architecture can be trained to perform a speech separation task

Posted ContentDOI
16 Dec 2019 - bioRxiv
TL;DR: It is shown that genomically introduced transgenes exhibit resistance to silencing when regulated using this platform compared to those that are transcriptionally regulated, and the orthogonal, modular and composable nature of the platform holds promise for its application in gene and cell therapies.
Abstract: Regulation of transgene expression is becoming an integral component of gene therapies, cell therapies and biomanufacturing. However, transcription factor-based regulation upon which the majority of such applications are based suffers from complications such as epigenetic silencing, which limits the longevity and reliability of these efforts. Genetically engineered mammalian cells used for cell therapies and biomanufacturing as well as newer RNA-based gene therapies would benefit from post-transcriptional methods of gene regulation, but few such platforms exist that enable sophisticated programming of cell behavior. Here we engineer the 5’ and 3’ untranslated regions of transcripts to enable robust and composable RNA-level regulation through transcript cleavage and, in particular, create modular RNA-level OFF- and ON-switch motifs. We show that genomically introduced transgenes exhibit resistance to silencing when regulated using this platform compared to those that are transcriptionally-regulated. We adapt nine CRISPR-specific endoRNases as RNA-level “activators” and “repressors” and show that these can be easily layered and composed to reconstruct genetic programming topologies previously achieved with transcription factor-based regulation including cascades, all 16 two-input Boolean logic functions, positive feedback, a feed-forward loop and a putative bistable toggle switch. The orthogonal, modular and composable nature of this platform as well as the ease with which robust and predictable gene circuits are constructed holds promise for their application in gene and cell therapies.
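A toy model of the OFF-switch motif described above: a transcript is silenced when any endoRNase matching a cleavage site in its UTR is expressed, from which gates such as NOR follow directly. The endoRNase names below (CasE, Csy4) are used purely as labels in this hypothetical sketch, not as the paper's specific circuit.

```python
# Toy evaluation of RNA-level logic built from endoRNase cleavage sites.
def transcript_output(cleavage_sites, active_endornases):
    """A transcript survives only if none of its sites' endoRNases are active."""
    return not any(e in active_endornases for e in cleavage_sites)

def nor_gate(a_active, b_active):
    """NOR: the output transcript carries sites for both input endoRNases."""
    active = {e for e, on in (("CasE", a_active), ("Csy4", b_active)) if on}
    return transcript_output(("CasE", "Csy4"), active)

for a in (False, True):
    for b in (False, True):
        print(a, b, nor_gate(a, b))   # True only when both inputs are off
```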

01 Feb 2019
TL;DR: The results show that galactose, and not manganese, is able to mitigate the temporal bottleneck, despite both being known effectors of galactosylation.
Abstract: N-linked glycosylation affects the potency, safety, immunogenicity, and pharmacokinetic clearance of several therapeutic proteins including monoclonal antibodies. A robust control strategy is needed to dial in appropriate glycosylation profile during the course of cell culture processes accurately. However, N-glycosylation dynamics remains insufficiently understood owing to the lack of integrative analyses of factors that influence the dynamics, including sugar nucleotide donors, glycosyltransferases, and glycosidases. Here, an integrative approach involving multi-dimensional omics analyses was employed to dissect the temporal dynamics of glycoforms produced during fed-batch cultures of CHO cells. Several pathways including glycolysis, tricarboxylic citric acid cycle, and nucleotide biosynthesis exhibited temporal dynamics over the cell culture period. The steps involving galactose and sialic acid addition were determined as temporal bottlenecks. Our results show that galactose, and not manganese, is able to mitigate the temporal bottleneck, despite both being known effectors of galactosylation. Furthermore, sialylation is limited by the galactosylated precursors and autoregulation of cytidine monophosphate-sialic acid biosynthesis.

Proceedings ArticleDOI
12 May 2019
TL;DR: It is demonstrated that synthesizing diverse audio textures is challenging, and argued that this is because audio data is relatively low-dimensional, and two new terms to the original Grammian loss are introduced: an autocorrelation term that preserves rhythm, and a diversity term that encourages the optimization procedure to synthesize unique textures.
Abstract: Texture synthesis techniques based on matching the Gram matrix of feature activations in neural networks have achieved spectacular success in the image domain. In this paper we extend these techniques to the audio domain. We demonstrate that synthesizing diverse audio textures is challenging, and argue that this is because audio data is relatively low-dimensional. We therefore introduce two new terms to the original Grammian loss: an autocorrelation term that preserves rhythm, and a diversity term that encourages the optimization procedure to synthesize unique textures. We quantitatively study the impact of our design choices on the quality of the synthesized audio by introducing an audio analogue to the Inception loss which we term the VGGish loss. We show that there is a trade-off between the diversity and quality of the synthesized audio using this technique. Finally we perform a number of experiments to qualitatively study how these design choices impact the quality of the synthesized audio.
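A minimal numpy sketch of the loss construction: the standard Gram-matrix statistic plus an autocorrelation term that preserves rhythm (the diversity term is omitted here). Feature shapes and the weighting are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def gram_matrix(features):
    """features: (C, T) feature activations; returns the (C, C) Gram matrix."""
    C, T = features.shape
    return features @ features.T / T

def autocorrelation(features, max_lag):
    """Per-channel autocorrelation up to max_lag: the rhythm-preserving
    statistic added on top of the Gram term."""
    f = features - features.mean(axis=1, keepdims=True)
    return np.stack([(f[:, :-lag] * f[:, lag:]).mean(axis=1)
                     for lag in range(1, max_lag + 1)], axis=1)

def texture_loss(feat_synth, feat_target, max_lag=32, alpha=1.0):
    """Gram loss plus the autocorrelation term (diversity term omitted)."""
    l_gram = np.mean((gram_matrix(feat_synth) - gram_matrix(feat_target)) ** 2)
    l_ac = np.mean((autocorrelation(feat_synth, max_lag)
                    - autocorrelation(feat_target, max_lag)) ** 2)
    return l_gram + alpha * l_ac

rng = np.random.default_rng(0)
print(texture_loss(rng.normal(size=(64, 400)), rng.normal(size=(64, 400))))
```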

Journal ArticleDOI
TL;DR: The five fundamentals of Principles of Synthetic Biology, a structured approach to learning the biological principles and theoretical underpinnings of synthetic biology, are described, along with impact and metrics data from two runs of the course on the edX platform.
Abstract: Synthetic biology requires students and scientists to draw upon knowledge and expertise from many disciplines. While this diversity is one of the field's primary strengths, it also makes it challenging for newcomers to acquire the background knowledge necessary to thrive. To address this gap, we developed a course that provides a structured approach to learning the biological principles and theoretical underpinnings of synthetic biology. Our course, Principles of Synthetic Biology (PoSB), was released on the massively open online course platform edX in 2016. PoSB seeks to teach synthetic biology through five key fundamentals: (i) parts and layers of abstraction, (ii) biomolecular modeling, (iii) digital logic abstraction, (iv) circuit design principles and (v) extended circuit modalities. In this article, we describe the five fundamentals, our formulation of the course, and impact and metrics data from two runs of the course through the edX platform.

Journal ArticleDOI
TL;DR: An in silico circuit that performs homeostatic control by utilizing a novel scheme with both symmetric and asymmetric division of stem cells is designed, which could be useful in porting an analog-circuit design framework to synthetic biological applications of the future.
Abstract: Tissue homeostasis (feedback control) is an important mechanism that regulates the population of different cell types within a tissue. In type-1 diabetes, auto-immune attack and consequent death of pancreatic β cells result in the failure of homeostasis and loss of organ function. Synthetically engineered adult stem cells with homeostatic control based on digital logic have been proposed as a solution for regenerating β cells. Such previously proposed homeostatic control circuits have thus far been unable to reliably control both stem-cell proliferation and stem-cell differentiation. Using analog circuits and feedback systems analysis, we have designed an in silico circuit that performs homeostatic control by utilizing a novel scheme with both symmetric and asymmetric division of stem cells. The use of a variety of feedback systems analysis techniques, which is common in analog circuit design, including root-locus techniques, Bode plots of feedback-loop frequency response, compensation techniques for improving stability, and robustness analysis help us choose design parameters to meet desirable specifications. For example, we show that lead compensation in analog circuits instantiated as an incoherent feed-forward loop in the biological circuit improves stability, whereas simultaneously reducing steady-state tracking error. Our symmetric and asymmetric division scheme also improves phase margin in the feedback loop, and thus improves robustness. This paper could be useful in porting an analog-circuit design framework to synthetic biological applications of the future.
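To illustrate the kind of feedback-systems analysis described here, the sketch below computes the phase margin of a toy loop transfer function with and without a lead compensator. The plant and compensator are generic textbook examples, not the paper's biological model.

```python
import numpy as np

def phase_margin(loop_tf, w):
    """Phase margin of a loop transfer function over a frequency grid.

    loop_tf: callable taking s = jw and returning L(s).
    Returns the margin in degrees at the gain-crossover frequency |L| = 1.
    """
    L = loop_tf(1j * w)
    i = np.argmin(np.abs(np.abs(L) - 1.0))        # gain crossover (coarse)
    return 180.0 + np.degrees(np.angle(L[i])), w[i]

# An integrator-plus-lag plant, with and without a lead compensator
# (s + z) / (s + p), z < p, which adds phase near crossover.
plant = lambda s: 10.0 / (s * (s + 1.0))
lead = lambda s: (s + 1.0) / (s + 10.0)

w = np.logspace(-2, 3, 20000)
pm_plain, _ = phase_margin(plant, w)
# Extra gain of 10 keeps the crossover frequency comparable after the lead.
pm_lead, _ = phase_margin(lambda s: 10.0 * plant(s) * lead(s), w)
print(pm_plain, pm_lead)   # ~18 deg vs ~52 deg: the lead term adds margin
```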