
Pedro J. Moreno

Researcher at Google

Publications -  128
Citations -  7944

Pedro J. Moreno is an academic researcher at Google. He has contributed to research in topics such as language modeling and word error rate. He has an h-index of 45 and has co-authored 118 publications receiving 7,206 citations. Previous affiliations of Pedro J. Moreno include Carnegie Mellon University and Hewlett-Packard.

Papers
Patent

Computer method and apparatus for uniform representation of genome sequences

TL;DR: A comparison database stores a predefined number of known biological sequences, and a comparison routine compares and scores a subject sequence against each known sequence in that database, as discussed by the authors.
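
A minimal sketch of the comparison idea summarized above: score a subject sequence against every known sequence in a small in-memory database. The scoring function here (fraction of shared k-mers) and the example sequences are placeholder assumptions for illustration, not the method claimed in the patent.

```python
def kmer_set(seq, k=4):
    """Return the set of k-mers occurring in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def score(subject, known, k=4):
    """Fraction of the subject's k-mers that also appear in the known sequence."""
    subject_kmers = kmer_set(subject, k)
    if not subject_kmers:
        return 0.0
    return len(subject_kmers & kmer_set(known, k)) / len(subject_kmers)

# Hypothetical comparison database of known sequences.
database = {
    "seq_A": "ACGTACGTGGTTACGT",
    "seq_B": "TTGACCATGACGTACG",
}

# Compare and score the subject sequence against each known sequence.
subject_sequence = "ACGTACGTTTGACCAT"
scores = {name: score(subject_sequence, known) for name, known in database.items()}
for name, s in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {s:.2f}")
```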
Posted Content

Parrotron: An End-to-End Speech-to-Speech Conversion Model and its Applications to Hearing-Impaired Speech and Speech Separation

TL;DR: It is demonstrated that this model can be trained to normalize speech from any speaker, regardless of accent, prosody, and background noise, into the voice of a single canonical target speaker with a fixed accent and consistent articulation and prosody.
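
A minimal sketch of an end-to-end spectrogram-to-spectrogram conversion model in the spirit of the summary above, assuming PyTorch. The tiny LSTM encoder/decoder, layer sizes, and L1 training objective against canonical-voice targets are illustrative assumptions, not the actual Parrotron architecture.

```python
import torch
import torch.nn as nn

class SpeechConverter(nn.Module):
    """Toy model mapping input spectrograms to canonical-voice spectrograms."""
    def __init__(self, n_mels=80, hidden=256):
        super().__init__()
        self.encoder = nn.LSTM(n_mels, hidden, batch_first=True, bidirectional=True)
        self.decoder = nn.LSTM(2 * hidden, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, n_mels)  # predict target-voice frames

    def forward(self, src_spec):
        enc, _ = self.encoder(src_spec)   # (batch, frames, 2 * hidden)
        dec, _ = self.decoder(enc)        # (batch, frames, hidden)
        return self.proj(dec)             # (batch, frames, n_mels)

model = SpeechConverter()
src = torch.randn(2, 100, 80)   # arbitrary-speaker input spectrograms
tgt = torch.randn(2, 100, 80)   # same utterances rendered in the canonical voice
loss = nn.functional.l1_loss(model(src), tgt)
loss.backward()
```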
Proceedings Article

Discriminative Topic Segmentation of Text and Speech

TL;DR: Two new discriminative topic segmentation algorithms are given which employ a new measure of text similarity based on word co-occurrence. It is also demonstrated that the algorithm's performance can be improved by using a lattice of competing hypotheses, rather than just the one-best hypothesis, as input to the segmentation algorithm.
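
A minimal sketch of the underlying idea: measure lexical similarity between the text on either side of each candidate boundary and hypothesize a topic change where similarity dips. The cosine-over-word-counts similarity and toy sentences below are simple stand-ins, not the paper's discriminative algorithm or its lattice-based variant.

```python
from collections import Counter
import math

def cosine(a, b):
    """Cosine similarity between two bags of words."""
    num = sum(a[w] * b[w] for w in set(a) & set(b))
    den = (math.sqrt(sum(v * v for v in a.values())) *
           math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def boundary_scores(sentences):
    """Similarity between the text before and after each candidate boundary."""
    bags = [Counter(s.lower().split()) for s in sentences]
    scores = []
    for i in range(1, len(bags)):
        left = sum(bags[:i], Counter())
        right = sum(bags[i:], Counter())
        scores.append(cosine(left, right))
    return scores

sents = ["the game went to overtime", "the striker scored twice",
         "the election results were close", "voters turned out in record numbers"]
print(boundary_scores(sents))  # the lowest score marks the likely topic change
```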
Journal ArticleDOI

Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages

TL;DR: The Universal Speech Model (USM), as presented in this paper, pre-trains the encoder of the model on a large unlabeled multilingual dataset of 12 million hours spanning over 300 languages, and fine-tunes it on a smaller labeled dataset.
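
A minimal sketch of the two-stage recipe summarized above, assuming PyTorch: self-supervised pre-training of an encoder on unlabeled audio features, followed by supervised fine-tuning on a smaller labeled set. The tiny LSTM encoder, the masked-frame reconstruction objective, and the CTC fine-tuning head are illustrative assumptions, not the actual USM architecture or losses.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Toy stand-in for a large multilingual speech encoder."""
    def __init__(self, n_feats=80, hidden=256):
        super().__init__()
        self.rnn = nn.LSTM(n_feats, hidden, batch_first=True)

    def forward(self, x):
        out, _ = self.rnn(x)
        return out

encoder = Encoder()

# Stage 1: pre-train on unlabeled audio with a masked-frame reconstruction loss.
recon_head = nn.Linear(256, 80)
pretrain_opt = torch.optim.Adam(list(encoder.parameters()) + list(recon_head.parameters()))
unlabeled = torch.randn(4, 200, 80)     # stand-in for unlabeled multilingual features
masked = unlabeled.clone()
masked[:, ::4, :] = 0.0                 # mask every 4th frame
recon_loss = nn.functional.mse_loss(recon_head(encoder(masked)), unlabeled)
pretrain_opt.zero_grad()
recon_loss.backward()
pretrain_opt.step()

# Stage 2: fine-tune on a smaller labeled set with a CTC head over output tokens.
ctc_head = nn.Linear(256, 100)          # 100 = toy vocabulary size
finetune_opt = torch.optim.Adam(list(encoder.parameters()) + list(ctc_head.parameters()))
labeled_audio = torch.randn(2, 200, 80)
targets = torch.randint(1, 100, (2, 20))
log_probs = ctc_head(encoder(labeled_audio)).log_softmax(-1).transpose(0, 1)
ctc_loss = nn.functional.ctc_loss(log_probs, targets,
                                  input_lengths=torch.full((2,), 200),
                                  target_lengths=torch.full((2,), 20))
finetune_opt.zero_grad()
ctc_loss.backward()
finetune_opt.step()
```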
Proceedings ArticleDOI

Selection and combination of hypotheses for dialectal speech recognition

TL;DR: This paper presents two methods to select and combine the best decoded hypothesis from a pool of dialectal recognizers, following a machine learning approach: it extracts features from the speech recognition output along with word embeddings and uses shallow neural networks for classification.
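
A minimal sketch of hypothesis selection framed as classification, as outlined above: build a feature vector per decoded hypothesis (here, a mean word embedding plus the recognizer confidence) and score it with a shallow neural network. The feature set, toy embedding table, and two-layer MLP are illustrative assumptions, not the paper's exact features or model; in practice the scorer and embeddings would be trained on labeled pools of hypotheses.

```python
import torch
import torch.nn as nn

EMB_DIM = 16
vocab = {"hello": 0, "world": 1, "word": 2, "hallo": 3}
embeddings = nn.Embedding(len(vocab), EMB_DIM)   # stand-in word embeddings

def features(hypothesis, confidence):
    """Mean word embedding concatenated with the recognizer confidence."""
    ids = torch.tensor([vocab.get(w, 0) for w in hypothesis.split()])
    mean_emb = embeddings(ids).mean(dim=0)
    return torch.cat([mean_emb, torch.tensor([confidence])])

# Shallow network that scores how likely a hypothesis is to be the best one.
scorer = nn.Sequential(nn.Linear(EMB_DIM + 1, 32), nn.ReLU(), nn.Linear(32, 1))

# One decoded hypothesis (text, confidence) from each dialectal recognizer.
pool = [("hello world", 0.81), ("hallo word", 0.64)]
scores = torch.stack([scorer(features(h, c)).squeeze() for h, c in pool])
best_hypothesis = pool[int(scores.argmax())][0]
print("selected hypothesis:", best_hypothesis)
```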