
Showing papers by "Ron Weiss" published in 2019


Posted Content
Heiga Zen1, Viet Dang, Robert A. J. Clark1, Yu Zhang1, Ron Weiss1, Ye Jia1, Zhifeng Chen1, Yonghui Wu1 
TL;DR: This paper introduces a new speech corpus called "LibriTTS" for text-to-speech use, derived from the original audio and text materials of the LibriSpeech corpus, which was built for training and evaluating automatic speech recognition systems.
Abstract: This paper introduces a new speech corpus called "LibriTTS" designed for text-to-speech use. It is derived from the original audio and text materials of the LibriSpeech corpus, which has been used for training and evaluating automatic speech recognition systems. The new corpus inherits desired properties of the LibriSpeech corpus while addressing a number of issues which make LibriSpeech less than ideal for text-to-speech work. The released corpus consists of 585 hours of speech data at 24kHz sampling rate from 2,456 speakers and the corresponding texts. Experimental results show that neural end-to-end TTS models trained from the LibriTTS corpus achieved above 4.0 in mean opinion scores in naturalness in five out of six evaluation speakers. The corpus is freely available for download from this http URL.

303 citations


Proceedings ArticleDOI
Heiga Zen1, Viet Dang, Robert A. J. Clark1, Yu Zhang1, Ron Weiss1, Ye Jia1, Zhifeng Chen1, Yonghui Wu1 
05 Apr 2019
TL;DR: Experimental results show that neural end-to-end TTS models trained from the LibriTTS corpus achieved above 4.0 in mean opinion scores in naturalness in five out of six evaluation speakers.
Abstract: This paper introduces a new speech corpus called "LibriTTS" designed for text-to-speech use. It is derived from the original audio and text materials of the LibriSpeech corpus, which has been used for training and evaluating automatic speech recognition systems. The new corpus inherits desired properties of the LibriSpeech corpus while addressing a number of issues which make LibriSpeech less than ideal for text-to-speech work. The released corpus consists of 585 hours of speech data at 24kHz sampling rate from 2,456 speakers and the corresponding texts. Experimental results show that neural end-to-end TTS models trained from the LibriTTS corpus achieved above 4.0 in mean opinion scores in naturalness in five out of six evaluation speakers. The corpus is freely available for download from this http URL.

286 citations


Journal ArticleDOI
TL;DR: A regularization scheme is introduced that forces the representations to focus on the phonetic content of the utterance, yielding performance comparable with the top entries in the ZeroSpeech 2017 unsupervised acoustic unit discovery task.
Abstract: We consider the task of unsupervised extraction of meaningful latent representations of speech by applying autoencoding neural networks to speech waveforms. The goal is to learn a representation able to capture high level semantic content from the signal, e.g. phoneme identities, while being invariant to confounding low level details in the signal such as the underlying pitch contour or background noise. Since the learned representation is tuned to contain only phonetic content, we resort to using a high capacity WaveNet decoder to infer information discarded by the encoder from previous samples. Moreover, the behavior of autoencoder models depends on the kind of constraint that is applied to the latent representation. We compare three variants: a simple dimensionality reduction bottleneck, a Gaussian Variational Autoencoder (VAE), and a discrete Vector Quantized VAE (VQ-VAE). We analyze the quality of learned representations in terms of speaker independence, the ability to predict phonetic content, and the ability to accurately reconstruct individual spectrogram frames. Moreover, for discrete encodings extracted using the VQ-VAE, we measure the ease of mapping them to phonemes. We introduce a regularization scheme that forces the representations to focus on the phonetic content of the utterance and report performance comparable with the top entries in the ZeroSpeech 2017 unsupervised acoustic unit discovery task.

252 citations
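To make the comparison of latent constraints concrete, here is a minimal numpy sketch of the discrete VQ-VAE bottleneck the abstract describes: each encoder frame is snapped to its nearest codebook vector. Shapes, sizes, and names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def vq_bottleneck(z_e, codebook):
    """Quantize encoder outputs to their nearest codebook entries.

    z_e:      (T, D) continuous encoder outputs for T frames.
    codebook: (K, D) learned embedding vectors.
    Returns the quantized latents and the chosen code indices.
    """
    # Squared Euclidean distance from every frame to every code.
    dists = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (T, K)
    codes = dists.argmin(axis=1)                                     # (T,)
    z_q = codebook[codes]                                            # (T, D)
    # During training, gradients pass through a straight-through estimator:
    # z_q = z_e + stop_gradient(z_q - z_e).
    return z_q, codes

rng = np.random.default_rng(0)
z_q, codes = vq_bottleneck(rng.normal(size=(100, 64)), rng.normal(size=(512, 64)))
print(codes[:10])  # discrete units that can later be mapped to phonemes
```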


Posted Content
TL;DR: This document outlines the underlying design of Lingvo and serves as an introduction to the various pieces of the framework, while also offering examples of advanced features that showcase its capabilities.
Abstract: Lingvo is a Tensorflow framework offering a complete solution for collaborative deep learning research, with a particular focus towards sequence-to-sequence models. Lingvo models are composed of modular building blocks that are flexible and easily extensible, and experiment configurations are centralized and highly customizable. Distributed training and quantized inference are supported directly within the framework, and it contains existing implementations of a large number of utilities, helper functions, and the newest research ideas. Lingvo has been used in collaboration by dozens of researchers in more than 20 papers over the last two years. This document outlines the underlying design of Lingvo and serves as an introduction to the various pieces of the framework, while also offering examples of advanced features that showcase the capabilities of the framework.

213 citations
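As an illustration of the centralized, highly customizable experiment configuration the abstract highlights, here is a toy sketch of that pattern in Python. The Params class and its methods are hypothetical stand-ins for the design idea, not Lingvo's actual API.

```python
# A toy Params class illustrating centralized, hierarchical experiment
# configuration; names here are illustrative, not Lingvo's actual API.
class Params:
    def __init__(self, **kwargs):
        self._vals = dict(kwargs)

    def copy(self, **overrides):
        # Derive a new config without mutating the base one.
        return Params(**dict(self._vals, **overrides))

    def __getattr__(self, name):
        try:
            return self._vals[name]
        except KeyError:
            raise AttributeError(name)

base = Params(hidden_dim=512, num_layers=4, learning_rate=1e-3)
# Each experiment derives from the base config instead of editing it in place.
big_model = base.copy(hidden_dim=1024, num_layers=8)
print(big_model.hidden_dim, big_model.learning_rate)  # 1024 0.001
```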


Proceedings ArticleDOI
15 Sep 2019
TL;DR: A novel system that separates the voice of a target speaker from multi-speaker signals by making use of a reference signal from the target speaker, realized by training two separate neural networks.

149 citations


Proceedings Article
01 Jan 2019
TL;DR: This article proposes a conditional generative model based on the variational autoencoder (VAE) framework with two levels of hierarchical latent variables: a categorical variable representing attribute groups (e.g. clean/noisy) and a multivariate Gaussian variable that characterizes specific attribute configurations and enables disentangled fine-grained control over these attributes.
Abstract: This paper proposes a neural sequence-to-sequence text-to-speech (TTS) model which can control latent attributes in the generated speech that are rarely annotated in the training data, such as speaking style, accent, background noise, and recording conditions. The model is formulated as a conditional generative model based on the variational autoencoder (VAE) framework, with two levels of hierarchical latent variables. The first level is a categorical variable, which represents attribute groups (e.g. clean/noisy) and provides interpretability. The second level, conditioned on the first, is a multivariate Gaussian variable, which characterizes specific attribute configurations (e.g. noise level, speaking rate) and enables disentangled fine-grained control over these attributes. This amounts to using a Gaussian mixture model (GMM) for the latent distribution. Extensive evaluation demonstrates its ability to control the aforementioned attributes. In particular, we train a high-quality controllable TTS model on real found data, which is capable of inferring speaker and style attributes from a noisy utterance and use it to synthesize clean speech with controllable speaking style.

140 citations
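The two-level latent hierarchy described above amounts to a Gaussian mixture prior, which is easiest to see in a sampling sketch. The numpy snippet below is illustrative only; the dimensions, mixture weights, and group semantics are assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

K, D = 3, 16                       # attribute groups, latent dimensionality
pi = np.array([0.5, 0.3, 0.2])     # prior over groups, e.g. clean/noisy/...
mu = rng.normal(size=(K, D))       # per-group Gaussian means
sigma = np.full((K, D), 0.5)       # per-group diagonal std devs

def sample_latent():
    """Two-level hierarchical sample: categorical group, then Gaussian."""
    y = rng.choice(K, p=pi)            # first level: which attribute group
    z = rng.normal(mu[y], sigma[y])    # second level: fine-grained configuration
    return y, z

y, z = sample_latent()
# Marginally, z follows a Gaussian mixture; conditioning the decoder on z
# (with y fixed to, say, the "clean" group) is what enables controllable
# synthesis in this kind of model.
```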


Journal ArticleDOI
TL;DR: In this article, unsupervised extraction of meaningful latent representations of speech by applying autoencoding neural networks to speech waveforms is considered. Since the learned representation is tuned to contain only phonetic content, a high-capacity WaveNet decoder is used to infer information discarded by the encoder from previous samples.
Abstract: We consider the task of unsupervised extraction of meaningful latent representations of speech by applying autoencoding neural networks to speech waveforms. The goal is to learn a representation able to capture high level semantic content from the signal, e.g. phoneme identities, while being invariant to confounding low level details in the signal such as the underlying pitch contour or background noise. Since the learned representation is tuned to contain only phonetic content, we resort to using a high capacity WaveNet decoder to infer information discarded by the encoder from previous samples. Moreover, the behavior of autoencoder models depends on the kind of constraint that is applied to the latent representation. We compare three variants: a simple dimensionality reduction bottleneck, a Gaussian Variational Autoencoder (VAE), and a discrete Vector Quantized VAE (VQ-VAE). We analyze the quality of learned representations in terms of speaker independence, the ability to predict phonetic content, and the ability to accurately reconstruct individual spectrogram frames. Moreover, for discrete encodings extracted using the VQ-VAE, we measure the ease of mapping them to phonemes. We introduce a regularization scheme that forces the representations to focus on the phonetic content of the utterance and report performance comparable with the top entries in the ZeroSpeech 2017 unsupervised acoustic unit discovery task.

128 citations


Proceedings ArticleDOI
12 May 2019
TL;DR: Experimental results demonstrate that the proposed method can disentangle speaker and noise attributes even if they are correlated in the training data, and can be used to consistently synthesize clean speech for all speakers.
Abstract: To leverage crowd-sourced data to train multi-speaker text-to-speech (TTS) models that can synthesize clean speech for all speakers, it is essential to learn disentangled representations which can independently control the speaker identity and background noise in generated signals. However, learning such representations can be challenging, due to the lack of labels describing the recording conditions of each training example, and the fact that speakers and recording conditions are often correlated, e.g. since users often make many recordings using the same equipment. This paper proposes three components to address this problem by: (1) formulating a conditional generative model with factorized latent variables, (2) using data augmentation to add noise that is not correlated with speaker identity and whose label is known during training, and (3) using adversarial factorization to improve disentanglement. Experimental results demonstrate that the proposed method can disentangle speaker and noise attributes even if they are correlated in the training data, and can be used to consistently synthesize clean speech for all speakers. Ablation studies verify the importance of each proposed component.

113 citations
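A minimal sketch of the data-augmentation component, item (2) above: mixing independently sampled noise into clean speech at a known SNR, so the augmentation label is known at training time and uncorrelated with speaker identity. Signal lengths and SNR values are illustrative assumptions.

```python
import numpy as np

def augment_with_noise(speech, noise, snr_db):
    """Mix noise into a clean utterance at a target SNR (in dB).

    Because the noise is sampled independently of the speaker, the
    augmentation label ("noisy", plus the SNR) is known during training
    and is uncorrelated with speaker identity.
    """
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise

rng = np.random.default_rng(0)
utt = rng.normal(size=16000)                 # stand-in for 1 s of speech
noisy = augment_with_noise(utt, rng.normal(size=16000), snr_db=10.0)
```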


Proceedings ArticleDOI
12 May 2019
TL;DR: This paper showed that using pre-trained MT or text-to-speech (TTS) synthesis models to convert weakly supervised data into speechto-translation pairs for ST training can be more effective than multi-task learning.
Abstract: End-to-end Speech Translation (ST) models have many potential advantages when compared to the cascade of Automatic Speech Recognition (ASR) and text Machine Translation (MT) models, including lowered inference latency and the avoidance of error compounding. However, the quality of end-to-end ST is often limited by a paucity of training data, since it is difficult to collect large parallel corpora of speech and translated transcript pairs. Previous studies have proposed the use of pre-trained components and multi-task learning in order to benefit from weakly supervised training data, such as speech-to-transcript or text-to-foreign-text pairs. In this paper, we demonstrate that using pre-trained MT or text-to-speech (TTS) synthesis models to convert weakly supervised data into speech-to-translation pairs for ST training can be more effective than multi-task learning. Furthermore, we demonstrate that a high quality end-to-end ST model can be trained using only weakly supervised datasets, and that synthetic data sourced from unlabeled monolingual text or speech can be used to improve performance. Finally, we discuss methods for avoiding overfitting to synthetic speech with a quantitative ablation study.

108 citations
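A sketch of the data-conversion idea for the MT direction: weakly supervised (speech, transcript) ASR pairs become synthetic (speech, translation) ST pairs by machine-translating the transcripts. `translate_text` is a hypothetical stand-in for a pre-trained MT model, not an API from the paper.

```python
# Converting weakly supervised ASR pairs into synthetic ST training pairs.
# `translate_text` stands in for a pre-trained MT model (hypothetical).
def make_synthetic_st_pairs(asr_pairs, translate_text):
    st_pairs = []
    for speech, transcript in asr_pairs:
        translation = translate_text(transcript)  # MT produces the target side
        st_pairs.append((speech, translation))
    return st_pairs

# The symmetric trick uses a pre-trained TTS model to synthesize the *source*
# speech side from text-to-foreign-text MT pairs instead.
```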


Posted Content
Ye Jia1, Ron Weiss1, Fadi Biadsy1, Wolfgang Macherey1, Melvin Johnson1, Zhifeng Chen1, Yonghui Wu1 
TL;DR: The authors presented an attention-based sequence-to-sequence neural network which can directly translate speech from one language into speech in another language, without relying on an intermediate text representation, and demonstrated the ability to synthesize translated speech using the voice of the source speaker.
Abstract: We present an attention-based sequence-to-sequence neural network which can directly translate speech from one language into speech in another language, without relying on an intermediate text representation. The network is trained end-to-end, learning to map speech spectrograms into target spectrograms in another language, corresponding to the translated content (in a different canonical voice). We further demonstrate the ability to synthesize translated speech using the voice of the source speaker. We conduct experiments on two Spanish-to-English speech translation datasets, and find that the proposed model slightly underperforms a baseline cascade of a direct speech-to-text translation model and a text-to-speech synthesis model, demonstrating the feasibility of the approach on this very challenging task.

107 citations


Proceedings ArticleDOI
09 Jul 2019
TL;DR: This article presented a multispeaker, multilingual text-to-speech (TTS) synthesis model based on Tacotron that is able to produce high quality speech in multiple languages.
Abstract: We present a multispeaker, multilingual text-to-speech (TTS) synthesis model based on Tacotron that is able to produce high quality speech in multiple languages. Moreover, the model is able to transfer voices across languages, e.g. synthesize fluent Spanish speech using an English speaker's voice, without training on any bilingual or parallel examples. Such transfer works across distantly related languages, e.g. English and Mandarin. Critical to achieving this result are: 1. using a phonemic input representation to encourage sharing of model capacity across languages, and 2. incorporating an adversarial loss term to encourage the model to disentangle its representation of speaker identity (which is perfectly correlated with language in the training data) from the speech content. Further scaling up the model by training on multiple speakers of each language, and incorporating an autoencoding input to help stabilize attention during training, results in a model which can be used to consistently synthesize intelligible speech for training speakers in all languages seen during training, and in native or foreign accents.
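One common way to implement an adversarial disentanglement term like the one described above is a gradient reversal layer; the PyTorch sketch below shows that mechanism under this assumption (the paper's exact formulation may differ).

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips the gradient in the backward pass."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

# A speaker classifier attached through grad_reverse() is trained to predict
# the speaker, while the reversed gradient pushes the shared encoder to
# *remove* speaker information from its representation.
text_encoding = torch.randn(8, 128, requires_grad=True)
speaker_logits = torch.nn.Linear(128, 10)(grad_reverse(text_encoding))
```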


Proceedings ArticleDOI
12 May 2019
TL;DR: The authors propose a spelling correction model that explicitly corrects the characteristic errors made by attention-based sequence-to-sequence models, showing an 18.6% relative improvement in WER over the baseline model when directly correcting the top ASR hypothesis, and a 29.0% relative improvement when further rescoring an expanded n-best list using an external LM.
Abstract: Attention-based sequence-to-sequence models for speech recognition jointly train an acoustic model, language model (LM), and alignment mechanism using a single neural network and require only parallel audio-text pairs. Thus, the language model component of the end-to-end model is only trained on transcribed audio-text pairs, which leads to performance degradation especially on rare words. While there have been a variety of work that look at incorporating an external LM trained on text-only data into the end-to-end framework, none of them have taken into account the characteristic error distribution made by the model. In this paper, we propose a novel approach to utilizing text-only data, by training a spelling correction (SC) model to explicitly correct those errors. On the LibriSpeech dataset, we demonstrate that the proposed model results in an 18.6% relative improvement in WER over the baseline model when directly correcting top ASR hypothesis, and a 29.0% relative improvement when further rescoring an expanded n-best list using an external LM.
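For context, the external-LM rescoring step mentioned at the end is typically log-linear interpolation of ASR and LM scores over the n-best list; in the paper the list is first expanded and corrected by the SC model. Below is a minimal sketch with made-up hypotheses and scores.

```python
import numpy as np

def rescore_nbest(hypotheses, asr_scores, lm_scores, lm_weight=0.3):
    """Log-linear rescoring of an n-best list with an external LM.

    asr_scores and lm_scores are log-probabilities per hypothesis;
    lm_weight is a tunable interpolation hyperparameter.
    """
    total = np.asarray(asr_scores) + lm_weight * np.asarray(lm_scores)
    return hypotheses[int(np.argmax(total))]

best = rescore_nbest(
    ["the cat sat", "the cat sad"],
    asr_scores=[-4.1, -3.9],   # ASR slightly prefers the second hypothesis
    lm_scores=[-9.0, -14.0],   # the LM strongly prefers the first
)
print(best)  # "the cat sat"
```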

Proceedings ArticleDOI
Ye Jia1, Ron Weiss1, Fadi Biadsy1, Wolfgang Macherey1, Melvin Johnson1, Zhifeng Chen1, Yonghui Wu1 
12 Apr 2019
TL;DR: An attention-based sequence-to-sequence neural network which can directly translate speech from one language into speech in another language, without relying on an intermediate text representation is presented.
Abstract: We present an attention-based sequence-to-sequence neural network which can directly translate speech from one language into speech in another language, without relying on an intermediate text representation. The network is trained end-to-end, learning to map speech spectrograms into target spectrograms in another language, corresponding to the translated content (in a different canonical voice). We further demonstrate the ability to synthesize translated speech using the voice of the source speaker. We conduct experiments on two Spanish-to-English speech translation datasets, and find that the proposed model slightly underperforms a baseline cascade of a direct speech-to-text translation model and a text-to-speech synthesis model, demonstrating the feasibility of the approach on this very challenging task.

Posted Content
TL;DR: A multispeaker, multilingual text-to-speech (TTS) synthesis model based on Tacotron that is able to produce high quality speech in multiple languages and to transfer voices across languages, even distantly related ones such as English and Mandarin.
Abstract: We present a multispeaker, multilingual text-to-speech (TTS) synthesis model based on Tacotron that is able to produce high quality speech in multiple languages. Moreover, the model is able to transfer voices across languages, e.g. synthesize fluent Spanish speech using an English speaker's voice, without training on any bilingual or parallel examples. Such transfer works across distantly related languages, e.g. English and Mandarin. Critical to achieving this result are: 1. using a phonemic input representation to encourage sharing of model capacity across languages, and 2. incorporating an adversarial loss term to encourage the model to disentangle its representation of speaker identity (which is perfectly correlated with language in the training data) from the speech content. Further scaling up the model by training on multiple speakers of each language, and incorporating an autoencoding input to help stabilize attention during training, results in a model which can be used to consistently synthesize intelligible speech for training speakers in all languages seen during training, and in native or foreign accents.


Journal ArticleDOI
TL;DR: Synthetic gene circuits regulated by small molecules have been used to fine-tune glycosyltransferase expression in CHO cells, providing a method to produce therapeutic monoclonal antibodies with precise glycosylation states.
Abstract: N-linked glycosylation in monoclonal antibodies (mAbs) is crucial for structural and functional properties of mAb therapeutics, including stability, pharmacokinetics, safety and clinical efficacy. The biopharmaceutical industry currently lacks tools to precisely control N-glycosylation levels during mAb production. In this study, we engineered Chinese hamster ovary cells with synthetic genetic circuits to tune N-glycosylation of a stably expressed IgG. We knocked out two key glycosyltransferase genes, α-1,6-fucosyltransferase (FUT8) and β-1,4-galactosyltransferase (β4GALT1), genomically integrated circuits expressing synthetic glycosyltransferase genes under constitutive or inducible promoters and generated antibodies with concurrently desired fucosylation (0-97%) and galactosylation (0-87%) levels. Simultaneous and independent control of FUT8 and β4GALT1 expression was achieved using orthogonal small molecule inducers. Effector function studies confirmed that glycosylation profile changes affected antibody binding to a cell surface receptor. Precise and rational modification of N-glycosylation will allow new recombinant protein therapeutics with tailored in vitro and in vivo effects for various biotechnological and biomedical applications.

Journal ArticleDOI
22 Feb 2019 - iScience
TL;DR: In this article, an integrative approach involving multi-dimensional omics analyses was employed to dissect the temporal dynamics of glycoforms produced during fed-batch cultures of CHO cells.

Journal ArticleDOI
TL;DR: A next-generation sequencing approach combined with machine learning is used to screen a synthetic promoter library of 6107 designs for high-performance SPECS targeting potentially any cell state.
Abstract: Cell state-specific promoters constitute essential tools for basic research and biotechnology because they activate gene expression only under certain biological conditions. Synthetic Promoters with Enhanced Cell-State Specificity (SPECS) can be superior to native ones, but the design of such promoters is challenging and frequently requires gene regulation or transcriptome knowledge that is not readily available. Here, to overcome this challenge, we use a next-generation sequencing approach combined with machine learning to screen a synthetic promoter library with 6107 designs for high-performance SPECS for potentially any cell state. We demonstrate the identification of multiple SPECS that exhibit distinct spatiotemporal activity during the programmed differentiation of induced pluripotent stem cells (iPSCs), as well as SPECS for breast cancer and glioblastoma stem-like cells. We anticipate that this approach could be used to create SPECS for gene therapies that are activated in specific cell states, as well as to study natural transcriptional regulatory networks. Synthetic promoters can be superior to native ones but the design is challenging without knowledge of gene regulation. Here the authors develop a pipeline that allows for screening a synthetic promoter library to identify high performance promoters in potentially any given cell state of interest.
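As a simplified stand-in for the screening analysis, one can rank library members by differential activity between cell states estimated from normalized NGS read counts. The scoring below is an assumed toy metric for illustration, not the paper's machine-learning pipeline.

```python
import numpy as np

def rank_promoters(target_counts, offtarget_counts, pseudocount=1.0):
    """Rank library members by log fold change between two cell states.

    target_counts / offtarget_counts: NGS read counts per promoter design,
    already normalized for sequencing depth. A simple specificity score is
    the log-ratio of activity in the target vs. off-target state.
    """
    t = np.asarray(target_counts, float) + pseudocount
    o = np.asarray(offtarget_counts, float) + pseudocount
    score = np.log2(t / o)
    return np.argsort(score)[::-1]     # most state-specific designs first

order = rank_promoters([500, 20, 80], [15, 25, 70])
print(order)  # design 0 is the most cell-state-specific here
```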

Journal ArticleDOI
TL;DR: An in vitro evolution strategy identified six mutations in nonstructural proteins of the Venezuelan equine encephalitis replicon that promote subgenome expression in cells and may be useful for improving RNA therapeutics for vaccination, cancer immunotherapy, and gene therapy.
Abstract: Self-replicating (replicon) RNA is a promising new platform for gene therapy, but applications are still limited by short persistence of expression in most cell types and low levels of transgene expression in vivo. To address these shortcomings, we developed an in vitro evolution strategy and identified six mutations in nonstructural proteins (nsPs) of Venezuelan equine encephalitis (VEE) replicon that promoted subgenome expression in cells. Two mutations in nsP2 and nsP3 enhanced transgene expression, while three mutations in nsP3 regulated this expression. Replicons containing the most effective mutation combinations showed enhanced duration and cargo gene expression in vivo. In comparison to wildtype replicon, mutants expressing IL-2 injected into murine B16F10 melanoma showed 5.5-fold increase in intratumoral IL-2 and 2.1-fold increase in infiltrating CD8 T cells, resulting in significantly slowed tumor growth. Thus, these mutant replicons may be useful for improving RNA therapeutics for vaccination, cancer immunotherapy, and gene therapy.

Journal ArticleDOI
TL;DR: It is shown that a point mutation (Bxb1-GA) in Bxb1 target sites significantly increases Bxb1-mediated integration efficiency at the Rosa26 locus in Chinese hamster ovary cells, resulting in the highest integration efficiency reported with a site-specific integrase in mammalian cells.
Abstract: Phage-derived integrases can catalyze irreversible, site-specific integration of transgenic payloads into a chromosomal locus, resulting in mammalian cells that stably express transgenes or circuits of interest. Previous studies have demonstrated high-efficiency integration by the Bxb1 integrase in mammalian cells. Here, we show that a point mutation (Bxb1-GA) in Bxb1 target sites significantly increases Bxb1-mediated integration efficiency at the Rosa26 locus in Chinese hamster ovary cells, resulting in the highest integration efficiency reported with a site-specific integrase in mammalian cells. Bxb1-GA point mutant sites do not cross-react with Bxb1 wild-type sites, enabling their use in applications that require orthogonal pairs of target sites. In comparison, we test the efficiency and orthogonality of ϕC31 and Wβ integrases, and show that Wβ has an integration efficiency between those of Bxb1-GA and wild-type Bxb1. Our data present a toolbox of integrases for inserting payloads such as gene circuits or therapeutic transgenes into mammalian cell lines.

Journal ArticleDOI
TL;DR: One-pot evaluation enabled by poly-transfection accelerates and simplifies the design of genetic systems, providing a new high-information strategy for interrogating biology.
Abstract: Biological research is relying on increasingly complex genetic systems and circuits to perform sophisticated operations in living cells. Performing these operations often requires simultaneous delivery of many genes, and optimizing the stoichiometry of these genes can yield drastic improvements in performance. However, sufficiently sampling the large design space of gene expression stoichiometries in mammalian cells using current methods is cumbersome, complex, or expensive. We present a 'poly-transfection' method as a simple yet high-throughput alternative that enables comprehensive evaluation of genetic systems in a single, readily-prepared transfection sample. Each cell in a poly-transfection represents an independent measurement at a distinct gene expression stoichiometry, fully leveraging the single-cell nature of transfection experiments. We first benchmark poly-transfection against co-transfection, showing that titration curves for commonly-used regulators agree between the two methods. We then use poly-transfections to efficiently generate new insights, for example in CRISPRa and synthetic miRNA systems. Finally, we use poly-transfection to rapidly engineer a difficult-to-optimize miRNA-based cell classifier for discriminating cancerous cells. One-pot evaluation enabled by poly-transfection accelerates and simplifies the design of genetic systems, providing a new high-information strategy for interrogating biology.
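The analysis idea behind poly-transfection can be sketched as binning single cells by the per-cell dose of each construct, read out from co-delivered fluorescent markers. The bin counts and log-binning scheme below are illustrative assumptions.

```python
import numpy as np

def bin_by_stoichiometry(marker_a, marker_b, n_bins=10):
    """Group single cells by the delivered ratio of two plasmids.

    In a poly-transfection, each plasmid carries its own fluorescent marker,
    so per-cell marker levels report the per-cell dose of each construct.
    Binning cells by log marker levels turns one sample into a grid of
    stoichiometry conditions.
    """
    la = np.log10(np.asarray(marker_a) + 1.0)
    lb = np.log10(np.asarray(marker_b) + 1.0)
    edges_a = np.linspace(la.min(), la.max(), n_bins + 1)
    edges_b = np.linspace(lb.min(), lb.max(), n_bins + 1)
    ia = np.clip(np.digitize(la, edges_a) - 1, 0, n_bins - 1)
    ib = np.clip(np.digitize(lb, edges_b) - 1, 0, n_bins - 1)
    return ia, ib      # per-cell (row, column) bin in the dose grid

rng = np.random.default_rng(0)
ia, ib = bin_by_stoichiometry(rng.lognormal(3, 1, 5000), rng.lognormal(3, 1, 5000))
```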

Posted Content
TL;DR: This paper proposes a novel approach to utilizing text-only data, by training a spelling correction (SC) model to explicitly correct errors made by the end-to-end model.
Abstract: Attention-based sequence-to-sequence models for speech recognition jointly train an acoustic model, language model (LM), and alignment mechanism using a single neural network and require only parallel audio-text pairs. Thus, the language model component of the end-to-end model is only trained on transcribed audio-text pairs, which leads to performance degradation especially on rare words. While there have been a variety of work that look at incorporating an external LM trained on text-only data into the end-to-end framework, none of them have taken into account the characteristic error distribution made by the model. In this paper, we propose a novel approach to utilizing text-only data, by training a spelling correction (SC) model to explicitly correct those errors. On the LibriSpeech dataset, we demonstrate that the proposed model results in an 18.6% relative improvement in WER over the baseline model when directly correcting top ASR hypothesis, and a 29.0% relative improvement when further rescoring an expanded n-best list using an external LM.

Proceedings ArticleDOI
01 Dec 2019
TL;DR: A Biomolecular Neural Network (BNN), a dynamical chemical reaction network which faithfully implements ANN computations and which is unconditionally stable with respect to its parameters when composed into deeper networks is proposed.
Abstract: While much of synthetic biology was founded on the creation of reusable, standardized parts, there is now a growing interest in synthetic networks which can compute unique, specially-designed functions in order to recognize patterns or classify cells in-vivo. While artificial neural networks (ANNs) have long provided a mature mathematical framework to address this problem in-silico, their implementation becomes much more challenging in living systems. In this work, we propose a Biomolecular Neural Network (BNN), a dynamical chemical reaction network which faithfully implements ANN computations and which is unconditionally stable with respect to its parameters when composed into deeper networks. Our implementation emphasizes the usefulness of molecular sequestration for achieving negative weight values and a nonlinear "activation function" in its elemental unit, a biomolecular perceptron. We then discuss the application of BNNs to linear and nonlinear classification tasks, and draw analogies to other major concepts in modern machine learning research.
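The elemental unit described here, a biomolecular perceptron, can be sketched as a two-species ODE in which molecular sequestration implements subtraction and a ReLU-like activation. The rate constants below are arbitrary illustrative values, not parameters from the paper.

```python
def biomolecular_perceptron(u_plus, u_minus, gamma=1.0, eta=100.0,
                            dt=1e-3, steps=20000):
    """Euler simulation of a sequestration-based perceptron unit.

    Species x1 and x2 are produced at rates u_plus and u_minus (the positive
    and negative parts of a weighted sum), dilute at rate gamma, and
    annihilate each other at rate eta. At steady state the free x1 level
    approximates max(0, (u_plus - u_minus) / gamma): a ReLU-like activation.
    """
    x1 = x2 = 0.0
    for _ in range(steps):
        dx1 = u_plus - gamma * x1 - eta * x1 * x2
        dx2 = u_minus - gamma * x2 - eta * x1 * x2
        x1 += dt * dx1
        x2 += dt * dx2
    return x1

print(biomolecular_perceptron(2.0, 0.5))  # ~1.5: positive net input passes
print(biomolecular_perceptron(0.5, 2.0))  # ~0:   negative net input is clipped
```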

Posted Content
Fadi Biadsy1, Ron Weiss1, Pedro J. Moreno1, Dimitri Kanevsky1, Ye Jia1 
TL;DR: It is demonstrated that this model can be trained to normalize speech from any speaker regardless of accent, prosody, and background noise, into the voice of a single canonical target speaker with a fixed accent and consistent articulation and prosody.
Abstract: We describe Parrotron, an end-to-end-trained speech-to-speech conversion model that maps an input spectrogram directly to another spectrogram, without utilizing any intermediate discrete representation The network is composed of an encoder, spectrogram and phoneme decoders, followed by a vocoder to synthesize a time-domain waveform We demonstrate that this model can be trained to normalize speech from any speaker regardless of accent, prosody, and background noise, into the voice of a single canonical target speaker with a fixed accent and consistent articulation and prosody We further show that this normalization model can be adapted to normalize highly atypical speech from a deaf speaker, resulting in significant improvements in intelligibility and naturalness, measured via a speech recognizer and listening tests Finally, demonstrating the utility of this model on other speech tasks, we show that the same model architecture can be trained to perform a speech separation task

Posted ContentDOI
16 Dec 2019 - bioRxiv
TL;DR: It is shown that genomically introduced transgenes exhibit resistance to silencing when regulated using this platform compared to those that are transcriptionally regulated, and the orthogonal, modular and composable nature of the platform holds promise for its application in gene and cell therapies.
Abstract: Regulation of transgene expression is becoming an integral component of gene therapies, cell therapies and biomanufacturing. However, transcription factor-based regulation upon which the majority of such applications are based suffers from complications such as epigenetic silencing, which limits the longevity and reliability of these efforts. Genetically engineered mammalian cells used for cell therapies and biomanufacturing as well as newer RNA-based gene therapies would benefit from post-transcriptional methods of gene regulation, but few such platforms exist that enable sophisticated programming of cell behavior. Here we engineer the 5’ and 3’ untranslated regions of transcripts to enable robust and composable RNA-level regulation through transcript cleavage and, in particular, create modular RNA-level OFF- and ON-switch motifs. We show that genomically introduced transgenes exhibit resistance to silencing when regulated using this platform compared to those that are transcriptionally-regulated. We adapt nine CRISPR-specific endoRNases as RNA-level “activators” and “repressors” and show that these can be easily layered and composed to reconstruct genetic programming topologies previously achieved with transcription factor-based regulation including cascades, all 16 two-input Boolean logic functions, positive feedback, a feed-forward loop and a putative bistable toggle switch. The orthogonal, modular and composable nature of this platform as well as the ease with which robust and predictable gene circuits are constructed holds promise for their application in gene and cell therapies.
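A toy model of the OFF-switch motif described above: a transcript is silenced when any endoRNase matching a cleavage site in its UTR is expressed, from which gates such as NOR follow directly. The endoRNase names below (CasE, Csy4) are used purely as labels in this hypothetical sketch, not as the paper's specific circuit.

```python
# Toy evaluation of RNA-level logic built from endoRNase cleavage sites.
def transcript_output(cleavage_sites, active_endornases):
    """A transcript survives only if none of its sites' endoRNases are active."""
    return not any(e in active_endornases for e in cleavage_sites)

def nor_gate(a_active, b_active):
    """NOR: the output transcript carries sites for both input endoRNases."""
    active = {e for e, on in (("CasE", a_active), ("Csy4", b_active)) if on}
    return transcript_output(("CasE", "Csy4"), active)

for a in (False, True):
    for b in (False, True):
        print(a, b, nor_gate(a, b))   # True only when both inputs are off
```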

01 Feb 2019
TL;DR: The results show that galactose, and not manganese, is able to mitigate the temporal bottleneck, despite both being known effectors of galactosylation.
Abstract: N-linked glycosylation affects the potency, safety, immunogenicity, and pharmacokinetic clearance of several therapeutic proteins including monoclonal antibodies. A robust control strategy is needed to dial in appropriate glycosylation profile during the course of cell culture processes accurately. However, N-glycosylation dynamics remains insufficiently understood owing to the lack of integrative analyses of factors that influence the dynamics, including sugar nucleotide donors, glycosyltransferases, and glycosidases. Here, an integrative approach involving multi-dimensional omics analyses was employed to dissect the temporal dynamics of glycoforms produced during fed-batch cultures of CHO cells. Several pathways including glycolysis, tricarboxylic citric acid cycle, and nucleotide biosynthesis exhibited temporal dynamics over the cell culture period. The steps involving galactose and sialic acid addition were determined as temporal bottlenecks. Our results show that galactose, and not manganese, is able to mitigate the temporal bottleneck, despite both being known effectors of galactosylation. Furthermore, sialylation is limited by the galactosylated precursors and autoregulation of cytidine monophosphate-sialic acid biosynthesis.

Proceedings ArticleDOI
12 May 2019
TL;DR: It is demonstrated that synthesizing diverse audio textures is challenging, and argued that this is because audio data is relatively low-dimensional, and two new terms to the original Grammian loss are introduced: an autocorrelation term that preserves rhythm, and a diversity term that encourages the optimization procedure to synthesize unique textures.
Abstract: Texture synthesis techniques based on matching the Gram matrix of feature activations in neural networks have achieved spectacular success in the image domain. In this paper we extend these techniques to the audio domain. We demonstrate that synthesizing diverse audio textures is challenging, and argue that this is because audio data is relatively low-dimensional. We therefore introduce two new terms to the original Grammian loss: an autocorrelation term that preserves rhythm, and a diversity term that encourages the optimization procedure to synthesize unique textures. We quantitatively study the impact of our design choices on the quality of the synthesized audio by introducing an audio analogue to the Inception loss which we term the VGGish loss. We show that there is a trade-off between the diversity and quality of the synthesized audio using this technique. Finally we perform a number of experiments to qualitatively study how these design choices impact the quality of the synthesized audio.
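A minimal numpy sketch of the loss construction: the standard Gram-matrix statistic plus an autocorrelation term that preserves rhythm (the diversity term is omitted here). Feature shapes and the weighting are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def gram_matrix(features):
    """features: (C, T) feature activations; returns the (C, C) Gram matrix."""
    C, T = features.shape
    return features @ features.T / T

def autocorrelation(features, max_lag):
    """Per-channel autocorrelation up to max_lag: the rhythm-preserving
    statistic added on top of the Gram term."""
    f = features - features.mean(axis=1, keepdims=True)
    return np.stack([(f[:, :-lag] * f[:, lag:]).mean(axis=1)
                     for lag in range(1, max_lag + 1)], axis=1)

def texture_loss(feat_synth, feat_target, max_lag=32, alpha=1.0):
    """Gram loss plus the autocorrelation term (diversity term omitted)."""
    l_gram = np.mean((gram_matrix(feat_synth) - gram_matrix(feat_target)) ** 2)
    l_ac = np.mean((autocorrelation(feat_synth, max_lag)
                    - autocorrelation(feat_target, max_lag)) ** 2)
    return l_gram + alpha * l_ac

rng = np.random.default_rng(0)
print(texture_loss(rng.normal(size=(64, 400)), rng.normal(size=(64, 400))))
```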

Journal ArticleDOI
TL;DR: The five fundamentals of Principles of Synthetic Biology, a structured approach to learning the biological principles and theoretical underpinnings of synthetic biology, are described, along with impact and metrics data from two runs of the course on the edX platform.
Abstract: Synthetic biology requires students and scientists to draw upon knowledge and expertise from many disciplines. While this diversity is one of the field's primary strengths, it also makes it challenging for newcomers to acquire the background knowledge necessary to thrive. To address this gap, we developed a course that provides a structured approach to learning the biological principles and theoretical underpinnings of synthetic biology. Our course, Principles of Synthetic Biology (PoSB), was released on the massively open online course platform edX in 2016. PoSB seeks to teach synthetic biology through five key fundamentals: (i) parts and layers of abstraction, (ii) biomolecular modeling, (iii) digital logic abstraction, (iv) circuit design principles and (v) extended circuit modalities. In this article, we describe the five fundamentals, our formulation of the course, and impact and metrics data from two runs of the course through the edX platform.

Journal ArticleDOI
TL;DR: An in silico circuit that performs homeostatic control by utilizing a novel scheme with both symmetric and asymmetric division of stem cells is designed, which could be useful in porting an analog-circuit design framework to synthetic biological applications of the future.
Abstract: Tissue homeostasis (feedback control) is an important mechanism that regulates the population of different cell types within a tissue. In type-1 diabetes, auto-immune attack and consequent death of pancreatic β cells result in the failure of homeostasis and loss of organ function. Synthetically engineered adult stem cells with homeostatic control based on digital logic have been proposed as a solution for regenerating β cells. Such previously proposed homeostatic control circuits have thus far been unable to reliably control both stem-cell proliferation and stem-cell differentiation. Using analog circuits and feedback systems analysis, we have designed an in silico circuit that performs homeostatic control by utilizing a novel scheme with both symmetric and asymmetric division of stem cells. The use of a variety of feedback systems analysis techniques, which is common in analog circuit design, including root-locus techniques, Bode plots of feedback-loop frequency response, compensation techniques for improving stability, and robustness analysis help us choose design parameters to meet desirable specifications. For example, we show that lead compensation in analog circuits instantiated as an incoherent feed-forward loop in the biological circuit improves stability, whereas simultaneously reducing steady-state tracking error. Our symmetric and asymmetric division scheme also improves phase margin in the feedback loop, and thus improves robustness. This paper could be useful in porting an analog-circuit design framework to synthetic biological applications of the future.
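To illustrate the kind of feedback-systems analysis described here, the sketch below computes the phase margin of a toy loop transfer function with and without a lead compensator. The plant and compensator are generic textbook examples, not the paper's biological model.

```python
import numpy as np

def phase_margin(loop_tf, w):
    """Phase margin of a loop transfer function over a frequency grid.

    loop_tf: callable taking s = jw and returning L(s).
    Returns the margin in degrees at the gain-crossover frequency |L| = 1.
    """
    L = loop_tf(1j * w)
    i = np.argmin(np.abs(np.abs(L) - 1.0))        # gain crossover (coarse)
    return 180.0 + np.degrees(np.angle(L[i])), w[i]

# An integrator-plus-lag plant, with and without a lead compensator
# (s + z) / (s + p), z < p, which adds phase near crossover.
plant = lambda s: 10.0 / (s * (s + 1.0))
lead = lambda s: (s + 1.0) / (s + 10.0)

w = np.logspace(-2, 3, 20000)
pm_plain, _ = phase_margin(plant, w)
# Extra gain of 10 keeps the crossover frequency comparable after the lead.
pm_lead, _ = phase_margin(lambda s: 10.0 * plant(s) * lead(s), w)
print(pm_plain, pm_lead)   # ~18 deg vs ~52 deg: the lead term adds margin
```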