
Ron Weiss

Researcher at Massachusetts Institute of Technology

Publications: 301
Citations: 110,805

Ron Weiss is an academic researcher at the Massachusetts Institute of Technology. The author has contributed to research in the topics of Synthetic biology and Speech synthesis. The author has an h-index of 82 and has co-authored 292 publications receiving 89,189 citations. Previous affiliations of Ron Weiss include the French Institute for Research in Computer Science and Automation and Google.

Papers
Proceedings ArticleDOI

Reducing the Computational Complexity of Multimicrophone Acoustic Models with Integrated Feature Extraction.

TL;DR: Several approaches to reducing the computational complexity of a multichannel neural network acoustic model are presented, including reducing the stride of the convolution operation and implementing the filters in the frequency domain.
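
The saving from a larger convolution stride can be illustrated with a quick back-of-the-envelope count. The sketch below is a generic estimate, not the paper's model, and all sizes (1 s of 16 kHz audio, 2 microphones, 128 filters of 400 taps) are hypothetical.

```python
# Rough multiply-accumulate (MAC) count for a 'valid' strided 1-D convolution
# over a multichannel waveform: a larger stride means fewer output frames.
def conv1d_macs(num_samples, num_channels, num_filters, filter_len, stride):
    num_frames = (num_samples - filter_len) // stride + 1
    return num_frames * num_filters * num_channels * filter_len

# Hypothetical sizes: 1 s of 16 kHz audio, 2 microphones, 128 filters of 400 taps.
for stride in (1, 10, 160):
    print(f"stride={stride:4d}  MACs={conv1d_macs(16000, 2, 128, 400, stride):,}")
```
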
Proceedings ArticleDOI

Wave-Tacotron: Spectrogram-Free End-to-End Text-to-Speech Synthesis

TL;DR: The authors propose a sequence-to-sequence neural network that directly generates speech waveforms from text inputs by incorporating a normalizing flow into the autoregressive decoder loop. The model can be optimized directly with maximum likelihood, without using intermediate, hand-designed features.
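
The maximum-likelihood training that a normalizing flow enables can be sketched with the change-of-variables formula. The toy example below uses a single affine coupling layer on an 8-sample block with made-up linear "coupling networks"; it only illustrates the objective and is not the Wave-Tacotron decoder.

```python
# Toy change-of-variables objective: z = f(x) through one affine coupling layer,
# log p(x) = log N(z; 0, I) + log|det df/dx|, which can be maximized directly.
import numpy as np

rng = np.random.default_rng(0)
Ws, Wt = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))  # hypothetical coupling nets (linear here)

def coupling_forward(x):
    x1, x2 = x[:4], x[4:]
    s, t = np.tanh(Ws @ x1), Wt @ x1               # scale and shift predicted from the first half
    z = np.concatenate([x1, x2 * np.exp(s) + t])
    return z, s.sum()                              # s.sum() is log|det Jacobian| of the coupling

x = rng.normal(size=8)                             # stand-in for an 8-sample waveform block
z, log_det = coupling_forward(x)
log_pz = -0.5 * (z @ z + len(z) * np.log(2 * np.pi))  # standard-normal prior on z
print(f"log p(x) = {log_pz + log_det:.3f}")        # exact likelihood a flow lets you maximize
```
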
Proceedings ArticleDOI

Grid-based temporal logic inference

TL;DR: A new algorithm is introduced to infer temporal logic properties of a system from data consisting of a set of finite-time system traces, by discretizing the entire domain and codomain of the traces.
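
A minimal illustration of the discretization idea (a toy example, not the paper's algorithm): grid both the time axis (domain) and the value range (codomain) of a finite-time trace, then check a candidate property of the "eventually always above a threshold" form on the grid cells. The trace, grid sizes, and threshold below are all hypothetical.

```python
# Grid a finite-time trace over its domain (time) and codomain (value), then
# evaluate a candidate "eventually always above a threshold" property on the grid.
import numpy as np

t = np.linspace(0.0, 10.0, 201)            # hypothetical trace: time axis
x = np.sin(t) + 0.1 * t                    # hypothetical trace: signal values

time_edges = np.linspace(0.0, 10.0, 11)    # 10 grid cells over the domain
value_edges = np.linspace(-1.5, 2.5, 9)    # 8 grid cells over the codomain
time_cell = np.digitize(t, time_edges) - 1
value_cell = np.digitize(x, value_edges) - 1

# Property holds if, from some time column onward, every sample lies in a value
# cell strictly above the cell containing the threshold 0.5.
threshold_cell = np.digitize(0.5, value_edges) - 1
satisfied = any(
    np.all(value_cell[time_cell >= k] > threshold_cell)
    for k in range(len(time_edges) - 1)
)
print("F G (x > 0.5) satisfied on the gridded trace:", satisfied)
```
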
Proceedings ArticleDOI

A Dynamical Biomolecular Neural Network

TL;DR: A Biomolecular Neural Network (BNN), a dynamical chemical reaction network which faithfully implements ANN computations and which is unconditionally stable with respect to its parameters when composed into deeper networks is proposed.
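
To give a flavor of how mass-action chemistry can emulate a neural-network nonlinearity, the sketch below is a generic illustration under assumed rate constants, not the BNN construction from the paper: production, degradation, and a fast annihilation reaction drive one species toward a ReLU of a weighted input sum.

```python
# Simulate mass-action ODEs for a small reaction network whose steady state
# approximates max(w1*u1 + w2*u2 - theta, 0): species y is produced at the
# weighted input rate, species z at rate theta, and y + z annihilate quickly.
w1, w2, theta = 2.0, 1.0, 1.5          # hypothetical weights and bias
u1, u2 = 0.8, 0.4                      # input concentrations, held constant
k_ann = 1000.0                         # fast annihilation rate for y + z -> 0

y, z, dt = 0.0, 0.0, 1e-3
for _ in range(20_000):                # forward-Euler integration to steady state
    dy = w1 * u1 + w2 * u2 - y - k_ann * y * z
    dz = theta - z - k_ann * y * z
    y, z = max(y + dt * dy, 0.0), max(z + dt * dz, 0.0)  # concentrations stay non-negative

print(f"CRN output y ~ {y:.3f}  vs  ReLU target {max(w1*u1 + w2*u2 - theta, 0.0):.3f}")
```
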
Posted Content

Parrotron: An End-to-End Speech-to-Speech Conversion Model and its Applications to Hearing-Impaired Speech and Speech Separation

TL;DR: It is demonstrated that this model can be trained to normalize speech from any speaker, regardless of accent, prosody, and background noise, into the voice of a single canonical target speaker with a fixed accent and consistent articulation and prosody.