Top 11 papers published by Ran El-Yaniv from Technion – Israel Institute of Technology in 2017

Journal Article

Quantized neural networks: training neural networks with low precision weights and activations

Itay Hubara¹, Matthieu Courbariaux², Daniel Soudry³, Ran El-Yaniv¹, Yoshua Bengio²

Technion – Israel Institute of Technology¹, Université de Montréal², Columbia University³

01 Jan 2017-Journal of Machine Learning Research

TL;DR: This paper introduces a method to train quantized neural networks (QNNs) with extremely low precision (e.g., 1-bit) weights and activations at run-time.

Abstract: We introduce a method to train Quantized Neural Networks (QNNs) -- neural networks with extremely low precision (e.g., 1-bit) weights and activations, at run-time. At train-time the quantized weights and activations are used for computing the parameter gradients. During the forward pass, QNNs drastically reduce memory size and accesses, and replace most arithmetic operations with bit-wise operations. As a result, power consumption is expected to be drastically reduced. We trained QNNs over the MNIST, CIFAR-10, SVHN and ImageNet datasets. The resulting QNNs achieve prediction accuracy comparable to their 32-bit counterparts. For example, our quantized version of AlexNet with 1-bit weights and 2-bit activations achieves 51% top-1 accuracy. Moreover, we quantize the parameter gradients to 6 bits as well, which enables gradient computation using only bit-wise operations. Quantized recurrent neural networks were tested over the Penn Treebank dataset and achieved accuracy comparable to their 32-bit counterparts using only 4 bits. Last but not least, we programmed a binary matrix multiplication GPU kernel with which it is possible to run our MNIST QNN 7 times faster than with an unoptimized GPU kernel, without suffering any loss in classification accuracy. The QNN code is available online.
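
As a rough illustration of how bit-wise operations can stand in for multiply-accumulate in a binarized layer, the sketch below computes the dot product of two +/-1 vectors with XNOR and popcount. It is a plain-Python toy, not the authors' GPU kernel; the function names are hypothetical, and packing one vector into one Python integer is a simplification of word-level packing.

```python
from typing import List, Sequence


def binarize(values: Sequence[float]) -> List[int]:
    """Deterministic sign binarization: +1 for non-negative inputs, -1 otherwise."""
    return [1 if v >= 0 else -1 for v in values]


def pack(signs: Sequence[int]) -> int:
    """Pack a +/-1 vector into an integer word: bit i is set when entry i is +1."""
    word = 0
    for i, s in enumerate(signs):
        if s == 1:
            word |= 1 << i
    return word


def xnor_popcount_dot(a: Sequence[int], b: Sequence[int]) -> int:
    """Dot product of two +/-1 vectors using only bit-wise operations.

    XNOR marks positions where the signs agree. Each agreement contributes +1
    and each disagreement -1, so dot = matches - (n - matches) = 2 * matches - n.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    n = len(a)
    mask = (1 << n) - 1                   # keep only the n meaningful bits
    agree = ~(pack(a) ^ pack(b)) & mask   # XNOR
    matches = bin(agree).count("1")       # popcount
    return 2 * matches - n
```

For example, `binarize([0.3, -1.2, 0.0, 0.7])` gives `[1, -1, 1, 1]`, and its XNOR-popcount product with `[1, 1, -1, 1]` equals the ordinary dot product, 0.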

919 citations

Proceedings Article

Selective Classification for Deep Neural Networks

Yonatan Geifman¹, Ran El-Yaniv¹

Technion – Israel Institute of Technology¹

01 May 2017

TL;DR: A method to construct a selective classifier from a trained neural network: the user sets a desired risk level, and at test time the classifier rejects instances as needed to guarantee that risk (with high probability).

Abstract: Selective classification techniques (also known as reject option) have not yet been considered in the context of deep neural networks (DNNs). These techniques can potentially improve DNNs' prediction performance significantly by trading off coverage. In this paper we propose a method to construct a selective classifier given a trained neural network. Our method allows a user to set a desired risk level. At test time, the classifier rejects instances as needed to guarantee the desired risk (with high probability). Empirical results over CIFAR and ImageNet convincingly demonstrate the viability of our method, which opens up possibilities to operate DNNs in mission-critical applications. For example, using our method an unprecedented 2% error in top-5 ImageNet classification can be guaranteed with probability 99.9%, with almost 60% test coverage.
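
The risk-guarantee idea can be sketched as follows: rank held-out points by a confidence score (for example, the maximum softmax value), binary-search a threshold, and accept a threshold only if a binomial tail bound on the selective risk stays below the target. This is a simplified sketch under those assumptions; the function names, the binomial tail-inversion bound, and splitting delta evenly over the search steps are illustrative choices and may differ in detail from the published algorithm.

```python
import math
from typing import Sequence, Tuple


def binom_cdf(k: int, n: int, p: float) -> float:
    """P[X <= k] for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def risk_bound(errors: int, n: int, delta: float, tol: float = 1e-9) -> float:
    """Largest p with P[Bin(n, p) <= errors] >= delta: an upper confidence
    bound on the true error rate (binomial tail inversion, by bisection).
    Assumes a small delta (well below 0.5)."""
    lo, hi = errors / n, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binom_cdf(errors, n, mid) >= delta:
            lo = mid
        else:
            hi = mid
    return lo


def select_threshold(conf: Sequence[float], correct: Sequence[bool],
                     target_risk: float, delta: float) -> Tuple[float, float]:
    """Binary-search a confidence threshold theta; points with conf >= theta are
    accepted. Returns (theta, risk bound); theta = inf means reject everything."""
    m = len(conf)
    order = sorted(range(m), key=lambda i: conf[i])  # ascending confidence
    k = max(1, math.ceil(math.log2(m)))              # number of search steps
    lo, hi = 0, m - 1
    best = (math.inf, 1.0)
    for _ in range(k):
        z = (lo + hi + 1) // 2
        theta = conf[order[z]]
        accepted = [i for i in range(m) if conf[i] >= theta]
        errors = sum(1 for i in accepted if not correct[i])
        bound = risk_bound(errors, len(accepted), delta / k)  # delta split over steps
        if bound < target_risk:
            best = (theta, bound)
            hi = z  # guarantee holds: try a lower threshold (more coverage)
        else:
            lo = z  # guarantee fails: raise the threshold
    return best
```

With `conf` holding confidence scores on a held-out set and `correct` marking correct predictions, the returned threshold defines a selection function that accepts x when its confidence is at least theta.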

330 citations

Posted Content

Deep Active Learning over the Long Tail.

Yonatan Geifman, Ran El-Yaniv

02 Nov 2017-arXiv: Learning

TL;DR: A novel active learning algorithm that queries consecutive points from the pool using farthest-first traversals in the space of neural activations over a representation layer; it shows consistent and overwhelming improvement in sample complexity over passive learning (random sampling) on three datasets: MNIST, CIFAR-10, and CIFAR-100.

Abstract: This paper is concerned with pool-based active learning for deep neural networks. Motivated by coreset dataset compression ideas, we present a novel active learning algorithm that queries consecutive points from the pool using farthest-first traversals in the space of neural activation over a representation layer. We show consistent and overwhelming improvement in sample complexity over passive learning (random sampling) for three datasets: MNIST, CIFAR-10, and CIFAR-100. In addition, our algorithm outperforms the traditional uncertainty sampling technique (obtained using softmax activations), and we identify cases where uncertainty sampling is only slightly better than random sampling.
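
Farthest-first traversal (greedy k-center selection) can be sketched as below. The sketch assumes the representation-layer activations have already been extracted into plain vectors (the `pool` argument) and uses Euclidean distance; both are assumptions for illustration, and the function name is hypothetical.

```python
import math
from typing import List, Sequence


def farthest_first(pool: Sequence[Sequence[float]], labeled: Sequence[int],
                   budget: int) -> List[int]:
    """Greedy farthest-first traversal: repeatedly query the pool point whose
    distance to its nearest labeled-or-already-chosen point is largest."""
    # Distance from each pool point to its nearest labeled point (inf if none).
    nearest = [min((math.dist(p, pool[j]) for j in labeled), default=math.inf)
               for p in pool]
    chosen: List[int] = []
    for _ in range(min(budget, len(pool))):
        idx = max(range(len(pool)), key=lambda i: nearest[i])
        chosen.append(idx)
        # The newly chosen point may now be the nearest one for its neighbours.
        for i, p in enumerate(pool):
            nearest[i] = min(nearest[i], math.dist(p, pool[idx]))
    return chosen
```

Each query goes to the point that is currently worst covered, so the selected batch spreads out over the activation space instead of clustering around uncertain regions.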

99 citations

Journal Article

Learn on Source, Refine on Target: A Model Transfer Learning Framework with Random Forests

Noam Segev¹, Maayan Harel¹, Shie Mannor¹, Koby Crammer¹, Ran El-Yaniv¹

Technion – Israel Institute of Technology¹

01 Sep 2017-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: Proposes novel model transfer-learning methods that refine a decision forest learned in a source domain using target-domain data: two random forest transfer algorithms and an ensemble containing the union of the two resulting forests, with strong experimental results over a range of problems.

Abstract: We propose novel model transfer-learning methods that refine a decision forest model $M$ learned within a “source” domain using a training set sampled from a “target” domain, assumed to be a variation of the source. We present two random forest transfer algorithms. The first algorithm searches greedily for locally optimal modifications of each tree structure by trying to locally expand or reduce the tree around individual nodes. The second algorithm does not modify structure, but only the parameters (thresholds) associated with decision nodes. We also propose to combine both methods by considering an ensemble that contains the union of the two forests. The proposed methods exhibit impressive experimental results over a range of problems.
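
The second algorithm keeps each tree's structure and changes only node thresholds. A toy sketch of that idea for a single node: keep the node's feature fixed and re-choose its threshold on target-domain samples. The Gini-impurity criterion here is a stand-in chosen for simplicity, not necessarily the criterion the paper uses, and the function names are hypothetical.

```python
from typing import Dict, Sequence


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a list of class labels (0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts: Dict[int, int] = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def refit_threshold(xs: Sequence[float], ys: Sequence[int],
                    old_threshold: float) -> float:
    """Keep the node's feature fixed and pick the threshold that minimizes the
    weighted Gini impurity of the two children on target data. Falls back to
    the source threshold when no split between distinct values exists."""
    pts = sorted(zip(xs, ys))
    best_t, best_score = old_threshold, float("inf")
    for k in range(1, len(pts)):
        if pts[k - 1][0] == pts[k][0]:
            continue  # no threshold separates equal feature values
        t = (pts[k - 1][0] + pts[k][0]) / 2
        left = [y for _, y in pts[:k]]
        right = [y for _, y in pts[k:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pts)
        if score < best_score:
            best_t, best_score = t, score
    return best_t
```

Applying this node by node leaves the source tree's shape and feature choices untouched while letting the split points drift toward the target distribution.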

89 citations

Posted Content

Selective Classification for Deep Neural Networks

Yonatan Geifman¹, Ran El-Yaniv¹

Technion – Israel Institute of Technology¹

23 May 2017-arXiv: Learning

TL;DR: This paper proposes a method to construct a selective classifier given a trained neural network: the user sets a desired risk level, and the classifier rejects instances as needed to guarantee that risk (with high probability).

Abstract: Selective classification techniques (also known as reject option) have not yet been considered in the context of deep neural networks (DNNs). These techniques can potentially improve DNNs' prediction performance significantly by trading off coverage. In this paper we propose a method to construct a selective classifier given a trained neural network. Our method allows a user to set a desired risk level. At test time, the classifier rejects instances as needed to guarantee the desired risk (with high probability). Empirical results over CIFAR and ImageNet convincingly demonstrate the viability of our method, which opens up possibilities to operate DNNs in mission-critical applications. For example, using our method an unprecedented 2% error in top-5 ImageNet classification can be guaranteed with probability 99.9%, with almost 60% test coverage.

33 citations

Patent

Quantized neural network training and inference

Ran El-Yaniv¹, Itay Hubara¹, Daniel Soudry¹

Technion – Israel Institute of Technology¹

04 Apr 2017

TL;DR: A neural network is constructed whose neurons are arranged in layers and connected by connections associated with quantized connection weight functions that output quantized connection weight values. During training, weight gradients are calculated in backpropagation sub-processes by computing neuron gradients, each being the gradient of the output of a quantized activation function in one layer with respect to that function's input.

Abstract: Training neural networks by constructing a neural network model having neurons each associated with a quantized activation function adapted to output a quantized activation value. The neurons are arranged in layers and connected by connections associated with quantized connection weight functions adapted to output quantized connection weight values. During a training process, a plurality of weight gradients are calculated during backpropagation sub-processes by computing neuron gradients, each of an output of a respective quantized activation function in one layer with respect to an input of the respective quantized activation function. Each neuron gradient is calculated such that when an absolute value of the input is smaller than a positive constant threshold value, the respective neuron gradient is set as a positive constant output value, and when the absolute value of the input is larger than the positive constant threshold value, the neuron gradient is set to zero.
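
The gradient rule above is a straight-through estimator: the quantized activation is treated as having a constant derivative inside a threshold band and a zero derivative outside it. A minimal sketch for a 1-bit (sign) activation, assuming a threshold of 1 and a constant of 1 (illustrative values, not taken from the patent); the function names are hypothetical.

```python
from typing import List, Sequence

THRESHOLD = 1.0  # assumed positive constant threshold
CONSTANT = 1.0   # assumed positive constant gradient inside the band


def quantized_sign(xs: Sequence[float]) -> List[float]:
    """Forward pass: 1-bit quantized activation (sign, with sign(0) = +1)."""
    return [1.0 if x >= 0 else -1.0 for x in xs]


def neuron_gradient(x: float) -> float:
    """Straight-through estimate of d(output)/d(input) for the quantized activation."""
    return CONSTANT if abs(x) < THRESHOLD else 0.0


def backward(xs: Sequence[float], upstream: Sequence[float]) -> List[float]:
    """Chain rule: multiply each incoming gradient by the estimated neuron gradient."""
    return [g * neuron_gradient(x) for x, g in zip(xs, upstream)]
```

The true derivative of the sign function is zero almost everywhere, so without this estimate no gradient would reach earlier layers. Inputs with a large magnitude get zero gradient, which stops pushing values that are already far from the decision point. This sketch uses a strict inequality, so an input exactly at the threshold also gets zero.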

30 citations

Posted Content

The Relationship Between Agnostic Selective Classification, Active Learning and the Disagreement Coefficient

Roei Gelbhart, Ran El-Yaniv

19 Mar 2017-arXiv: Learning

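TL;DR: A selective classifier achieves a "fast" rejection rate if its rejection mass is bounded from above by O(1/m), where m is the number of labeled examples used to train it and O hides logarithmic factors. The paper shows that a fast rejection rate for pointwise-competitive selective classification is equivalent to a poly-logarithmic bound on Hanneke's disagreement coefficient and to an exponential speedup for a disagreement-based active learner.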

Abstract: A selective classifier (f,g) comprises a classification function f and a binary selection function g, which determines if the classifier abstains from prediction, or uses f to predict. The classifier is called pointwise-competitive if it classifies each point identically to the best classifier in hindsight (from the same class), whenever it does not abstain. The quality of such a classifier is quantified by its rejection mass, defined to be the probability mass of the points it rejects. A "fast" rejection rate is achieved if the rejection mass is bounded from above by O(1/m) where m is the number of labeled examples used to train the classifier (and O hides logarithmic factors). Pointwise-competitive selective (PCS) classifiers are intimately related to disagreement-based active learning and it is known that in the realizable case, a fast rejection rate of a known PCS algorithm (called Consistent Selective Strategy) is equivalent to an exponential speedup of the well-known CAL active algorithm. We focus on the agnostic setting, for which there is a known algorithm called LESS that learns a PCS classifier and achieves a fast rejection rate (depending on Hanneke's disagreement coefficient) under strong assumptions. We present an improved PCS learning algorithm called ILESS for which we show a fast rate (depending on Hanneke's disagreement coefficient) without any assumptions. Our rejection bound smoothly interpolates the realizable and agnostic settings. The main result of this paper is an equivalence between the following three entities: (i) the existence of a fast rejection rate for any PCS learning algorithm (such as ILESS); (ii) a poly-logarithmic bound for Hanneke's disagreement coefficient; and (iii) an exponential speedup for a new disagreement-based active learner called ActiveiLESS.
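
In the realizable case, a consistent selective strategy predicts only where every hypothesis consistent with the training data agrees, and abstains elsewhere. A minimal sketch for the hypothetical class of one-dimensional threshold classifiers h_t(x) = 1 if x >= t, chosen only because its version space is easy to write down. It assumes noise-free labels and computes the rejection mass under an assumed uniform distribution on [0, 1]. The function names are hypothetical.

```python
import math
from typing import Callable, Optional, Sequence, Tuple


def css_threshold_classifier(
    train: Sequence[Tuple[float, int]],
) -> Tuple[Callable[[float], Optional[int]], Tuple[float, float]]:
    """Return (classify, (a, b)); classify(x) returns None when it abstains.
    Consistent thresholds are exactly those t with a < t <= b."""
    a = max((x for x, y in train if y == 0), default=-math.inf)  # largest negative
    b = min((x for x, y in train if y == 1), default=math.inf)   # smallest positive
    if a >= b:
        raise ValueError("labels are not realizable by a threshold")

    def classify(x: float) -> Optional[int]:
        if x >= b:   # every consistent t satisfies t <= b <= x, so all predict 1
            return 1
        if x <= a:   # every consistent t satisfies t > a >= x, so all predict 0
            return 0
        return None  # consistent thresholds disagree: abstain

    return classify, (a, b)


def rejection_mass_uniform(a: float, b: float) -> float:
    """Probability mass of the abstention region (a, b) under Uniform[0, 1]."""
    lo, hi = max(a, 0.0), min(b, 1.0)
    return max(0.0, hi - lo)
```

The classifier is pointwise-competitive by construction: whenever it predicts, its label matches every consistent hypothesis, including the best one. Its rejection mass shrinks as labeled points fill in the gap between a and b.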

7 citations

Posted Content

The Prediction Advantage: A Universally Meaningful Performance Measure for Classification and Regression.

Ran El-Yaniv, Yonatan Geifman, Yair Wiener

23 May 2017-arXiv: Learning

TL;DR: It is argued that among several known alternative performance measures, PA is the best (and only) quantity ensuring meaningfulness for all noise and imbalance levels.

Abstract: We introduce the Prediction Advantage (PA), a novel performance measure for prediction functions under any loss function (e.g., classification or regression). The PA is defined as the performance advantage relative to the Bayesian risk restricted to knowing only the distribution of the labels. We derive the PA for well-known loss functions, including 0/1 loss, cross-entropy loss, absolute loss, and squared loss. In the latter case, the PA is identical to the well-known R-squared measure, widely used in statistics. The use of the PA ensures meaningful quantification of prediction performance, which is not guaranteed, for example, when dealing with noisy imbalanced classification problems. We argue that among several known alternative performance measures, PA is the best (and only) quantity ensuring meaningfulness for all noise and imbalance levels.
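
One way to read the definition is PA = 1 - R(f)/R(f0), where f0 is the best predictor that knows only the label distribution. This ratio form is an assumption, chosen because it reproduces the stated R-squared identity for squared loss: there f0 predicts the label mean, so R(f0) is the label variance. Under 0/1 loss, f0 predicts the majority class. A minimal sketch with hypothetical function names:

```python
from collections import Counter
from statistics import fmean, pvariance
from typing import Hashable, Sequence


def pa_zero_one(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """PA under 0/1 loss; the label-distribution-only baseline predicts the majority class."""
    n = len(y_true)
    risk = sum(t != p for t, p in zip(y_true, y_pred)) / n
    baseline = 1 - max(Counter(y_true).values()) / n  # error of always predicting the majority
    return 1 - risk / baseline  # raises ZeroDivisionError if only one class occurs


def pa_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """PA under squared loss; the baseline predicts the label mean, so this is R-squared."""
    mse = fmean((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return 1 - mse / pvariance(y_true)
```

For labels `[0, 0, 0, 1]`, a predictor that always outputs 0 has 75% accuracy but a PA of 0 under this reading, because it does no better than knowing the class frequencies alone.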

4 citations

Posted Content

Growth-Optimal Portfolio Selection under CVaR Constraints

Guy Uziel, Ran El-Yaniv

27 May 2017-arXiv: Mathematical Finance

TL;DR: In this paper, the authors consider online learning of portfolios of stocks whose prices are governed by arbitrary (unknown) stationary and ergodic processes, where the goal is to maximize wealth while keeping the conditional value at risk (CVaR) below a desired threshold.

Abstract: Online portfolio selection research has so far focused mainly on minimizing regret defined in terms of wealth growth. Practical financial decision making, however, is deeply concerned with both wealth and risk. We consider online learning of portfolios of stocks whose prices are governed by arbitrary (unknown) stationary and ergodic processes, where the goal is to maximize wealth while keeping the conditional value at risk (CVaR) below a desired threshold. We characterize the asymptotically optimal risk-adjusted performance and present an investment strategy whose portfolios are guaranteed to achieve the asymptotically optimal solution while fulfilling the desired risk constraint. We also numerically demonstrate and validate the viability of our method on standard datasets.
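
To make the constrained objective concrete, the sketch below scores a finite set of hypothetical candidate portfolios on historical price relatives. It maximizes average log-return (the growth rate) subject to an empirical CVaR limit, taking the loss to be the negative daily log-return. This is a brute-force illustration of the objective only, not the paper's investment strategy, which learns online under stationary and ergodic assumptions. The loss definition, the simple CVaR estimator, and the function names are assumptions.

```python
import math
from typing import List, Optional, Sequence


def log_returns(weights: Sequence[float],
                price_relatives: Sequence[Sequence[float]]) -> List[float]:
    """Daily log-return of a constant-rebalanced portfolio (weights sum to 1)."""
    return [math.log(sum(w * x for w, x in zip(weights, day))) for day in price_relatives]


def empirical_cvar(losses: Sequence[float], alpha: float) -> float:
    """Average of the worst ceil((1 - alpha) * n) losses (a simple CVaR estimator)."""
    k = max(1, math.ceil((1 - alpha) * len(losses)))
    worst = sorted(losses, reverse=True)[:k]
    return sum(worst) / k


def best_constrained_portfolio(candidates: Sequence[Sequence[float]],
                               price_relatives: Sequence[Sequence[float]],
                               alpha: float, max_cvar: float) -> Optional[List[float]]:
    """Highest-growth candidate whose CVaR of losses stays within max_cvar (None if none does)."""
    best, best_growth = None, -math.inf
    for w in candidates:
        r = log_returns(w, price_relatives)
        if empirical_cvar([-x for x in r], alpha) > max_cvar:
            continue  # violates the risk constraint
        growth = sum(r) / len(r)
        if growth > best_growth:
            best, best_growth = list(w), growth
    return best
```

With cash and one volatile stock, tightening `max_cvar` moves the choice from a 50/50 mix toward holding only cash, which is the wealth-versus-risk trade-off the abstract describes.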

3 citations

Posted Content

Multi-Objective Non-parametric Sequential Prediction

Guy Uziel¹, Ran El-Yaniv²

IBM¹, Technion – Israel Institute of Technology²

05 Mar 2017-arXiv: Learning

TL;DR: In this article, the authors extend the multi-objective framework to the case of stationary and ergodic processes, thus allowing dependencies among observations, and present an algorithm whose predictions achieve the optimal solution while fulfilling any continuous and convex constraining criterion.

Abstract: Online-learning research has mainly focused on minimizing one objective function. In many real-world applications, however, several objective functions have to be considered simultaneously. Recently, an algorithm for dealing with several objective functions in the i.i.d. case has been presented. In this paper, we extend the multi-objective framework to the case of stationary and ergodic processes, thus allowing dependencies among observations. We first identify an asymptotic lower bound for any prediction strategy and then present an algorithm whose predictions achieve the optimal solution while fulfilling any continuous and convex constraining criterion.

3 citations

Proceedings Article

Multi-Objective Non-parametric Sequential Prediction

Guy Uziel¹, Ran El-Yaniv²

IBM¹, Technion – Israel Institute of Technology²

01 Jan 2017

TL;DR: The multi-objective framework is extended to stationary and ergodic processes, thus allowing dependencies among observations, and an algorithm is presented whose predictions achieve the optimal solution while fulfilling any continuous and convex constraining criterion.

Abstract: Online-learning research has mainly focused on minimizing one objective function. In many real-world applications, however, several objective functions have to be considered simultaneously. Recently, an algorithm for dealing with several objective functions in the i.i.d. case has been presented. In this paper, we extend the multi-objective framework to the case of stationary and ergodic processes, thus allowing dependencies among observations. We first identify an asymptotic lower bound for any prediction strategy and then present an algorithm whose predictions achieve the optimal solution while fulfilling any continuous and convex constraining criterion.
