Author

Christoph Studer

Bio: Christoph Studer is an academic researcher from ETH Zurich. The author has contributed to research in the topics of MIMO and multi-user MIMO. The author has an h-index of 55 and has co-authored 345 publications receiving 11,694 citations. Previous affiliations of Christoph Studer include the Commissariat à l'énergie atomique et aux énergies alternatives and Chalmers University of Technology.


Papers
Proceedings Article
01 Jan 2019
TL;DR: In this paper, the authors eliminate the overhead cost of generating adversarial examples by recycling the gradient information computed when updating model parameters, achieving robustness comparable to PGD adversarial training.
Abstract: Adversarial training, in which a network is trained on adversarial examples, is one of the few defenses against adversarial attacks that withstands strong attacks. Unfortunately, the high cost of generating strong adversarial examples makes standard adversarial training impractical on large-scale problems like ImageNet. We present an algorithm that eliminates the overhead cost of generating adversarial examples by recycling the gradient information computed when updating model parameters. Our "free" adversarial training algorithm achieves comparable robustness to PGD adversarial training on the CIFAR-10 and CIFAR-100 datasets at negligible additional cost compared to natural training, and can be 7 to 30 times faster than other strong adversarial training methods. Using a single workstation with 4 P100 GPUs and 2 days of runtime, we can train a robust model for the large-scale ImageNet classification task that maintains 40% accuracy against PGD attacks.
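To make the gradient-recycling idea concrete, here is a minimal PyTorch-style sketch of such a "free" training loop (illustrative names and hyperparameters throughout; model, loader, and optimizer are assumed to be supplied by the caller). Each minibatch is replayed m times, and the input gradient produced by the same backward pass that updates the weights drives the perturbation update.

import torch

def free_train_epoch(model, loader, optimizer, epsilon=8/255, m=4):
    criterion = torch.nn.CrossEntropyLoss()
    delta = None  # perturbation persists across the m replays of a minibatch
    for x, y in loader:
        if delta is None or delta.shape != x.shape:
            delta = torch.zeros_like(x)
        for _ in range(m):  # minibatch replay
            delta.requires_grad_(True)
            loss = criterion(model(x + delta), y)
            optimizer.zero_grad()
            loss.backward()            # one pass yields weight AND input gradients
            grad_delta = delta.grad.detach()
            optimizer.step()           # weight update comes "for free"
            # ascent step on the perturbation, projected onto the epsilon-ball
            delta = (delta.detach() + epsilon * grad_delta.sign()).clamp(-epsilon, epsilon)

In the paper's setup the epoch count is divided by m so that the total number of gradient computations matches natural training; that bookkeeping (and clamping x + delta to the valid image range) is omitted here for brevity.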

772 citations

Proceedings ArticleDOI
15 Feb 2018
TL;DR: This paper explores the structure of neural loss functions and the effect of loss landscapes on generalization using a range of visualization methods, and examines how network architecture affects the loss landscape and how training parameters affect the shape of minimizers.
Abstract: Neural network training relies on our ability to find "good" minimizers of highly non-convex loss functions. It is well known that certain network architecture designs (e.g., skip connections) produce loss functions that train easier, and well-chosen training parameters (batch size, learning rate, optimizer) produce minimizers that generalize better. However, the reasons for these differences, and their effects on the underlying loss landscape, are not well understood. In this paper, we explore the structure of neural loss functions, and the effect of loss landscapes on generalization, using a range of visualization methods. First, we introduce a simple "filter normalization" method that helps us visualize loss function curvature, and make meaningful side-by-side comparisons between loss functions. Then, using a variety of visualizations, we explore how network architecture affects the loss landscape, and how training parameters affect the shape of minimizers.
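As a rough illustration of the filter normalization idea, the sketch below (an approximation under stated assumptions, not the authors' released code) draws a random Gaussian direction shaped like the model's parameters and rescales each filter of the direction to match the norm of the corresponding filter in the trained weights; 1-D parameters such as biases are left unperturbed, which is one common convention.

import torch

def filter_normalized_direction(model):
    direction = []
    for p in model.parameters():
        d = torch.randn_like(p)
        if p.dim() > 1:
            # rescale each filter (slice along the output dimension) so its
            # norm matches that of the corresponding trained filter
            for d_f, p_f in zip(d, p):
                d_f.mul_(p_f.norm() / (d_f.norm() + 1e-10))
        else:
            d.zero_()  # leave biases and normalization parameters unperturbed
        direction.append(d)
    return direction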

554 citations

Posted Content
TL;DR: This paper introduces a simple "filter normalization" method that helps to visualize loss function curvature and make meaningful side-by-side comparisons between loss functions, and explores how network architecture affects the loss landscape, and how training parameters affect the shape of minimizers.
Abstract: Neural network training relies on our ability to find "good" minimizers of highly non-convex loss functions. It is well-known that certain network architecture designs (e.g., skip connections) produce loss functions that train easier, and well-chosen training parameters (batch size, learning rate, optimizer) produce minimizers that generalize better. However, the reasons for these differences, and their effects on the underlying loss landscape, are not well understood. In this paper, we explore the structure of neural loss functions, and the effect of loss landscapes on generalization, using a range of visualization methods. First, we introduce a simple "filter normalization" method that helps us visualize loss function curvature and make meaningful side-by-side comparisons between loss functions. Then, using a variety of visualizations, we explore how network architecture affects the loss landscape, and how training parameters affect the shape of minimizers.
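Complementing the direction construction sketched under the conference version above, a 1-D slice of the loss surface can then be traced by evaluating the loss at theta + alpha * d over a grid of alpha values; eval_loss below is a hypothetical user-supplied closure returning the scalar loss of the model on a fixed batch or dataset.

import torch

def loss_slice(model, eval_loss, direction, alphas):
    base = [p.detach().clone() for p in model.parameters()]
    losses = []
    with torch.no_grad():
        for a in alphas:
            for p, p0, d in zip(model.parameters(), base, direction):
                p.copy_(p0 + a * d)
            losses.append(float(eval_loss(model)))
        for p, p0 in zip(model.parameters(), base):
            p.copy_(p0)  # restore the trained weights afterwards
    return losses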

530 citations

Journal ArticleDOI
01 Feb 2008
TL;DR: VLSI implementation results are provided that demonstrate that single tree-search, sorted QR-decomposition, channel matrix regularization, log-likelihood ratio clipping, and imposing runtime constraints are the key ingredients for realizing soft-output MIMO detectors with near max-log performance at a chip area that is only 58% higher than that of the best-known hard-output sphere decoder VLSI implementation.
Abstract: Multiple-input multiple-output (MIMO) detection algorithms providing soft information for a subsequent channel decoder pose significant implementation challenges due to their high computational complexity. In this paper, we show how sphere decoding can be used as an efficient tool to implement soft-output MIMO detection with flexible trade-offs between computational complexity and (error rate) performance. In particular, we provide VLSI implementation results which demonstrate that single tree-search, sorted QR-decomposition, channel matrix regularization, log-likelihood ratio clipping, and imposing runtime constraints are the key ingredients for realizing soft-output MIMO detectors with near max-log performance at a chip area that is only 58% higher than that of the best-known hard-output sphere decoder VLSI implementation.
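For intuition about the quantity the detector computes, the numpy reference below enumerates all transmit vectors to obtain max-log LLRs for QPSK, including the LLR clipping the paper identifies as a key ingredient (the constellation, bit mapping, and clipping level are illustrative assumptions; the paper's contribution is computing these values efficiently with a single tree search rather than by enumeration).

import itertools
import numpy as np

def max_log_llrs_qpsk(y, H, sigma2, llr_max=8.0):
    # brute-force max-log soft-output detection: for every bit, track the best
    # metric among candidates with that bit equal to 0 and equal to 1
    nt = H.shape[1]
    syms = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
    bits = lambda s: (int(s.real < 0), int(s.imag < 0))  # per-symbol bit map
    best = {}
    for cand in itertools.product(syms, repeat=nt):
        s = np.array(cand)
        metric = np.linalg.norm(y - H @ s) ** 2 / sigma2
        for k in range(nt):
            for b, bit in enumerate(bits(s[k])):
                best[(k, b, bit)] = min(best.get((k, b, bit), np.inf), metric)
    llr = np.empty((nt, 2))
    for k in range(nt):
        for b in range(2):
            # max-log LLR, clipped to +/- llr_max
            llr[k, b] = np.clip(best[(k, b, 0)] - best[(k, b, 1)], -llr_max, llr_max)
    return llr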

404 citations

Journal ArticleDOI
TL;DR: It is illustrated that, for the 1-bit quantized case, pilot-based channel estimation together with maximal-ratio combining or zero-forcing detection enables reliable multi-user communication with high-order constellations, in spite of the severe nonlinearity introduced by the ADCs.
Abstract: We investigate the uplink throughput achievable by a multiple-user (MU) massive multiple-input multiple-output (MIMO) system in which the base station is equipped with a large number of low-resolution analog-to-digital converters (ADCs). Our focus is on the case where neither the transmitter nor the receiver has any a priori channel state information. This implies that the fading realizations have to be learned through pilot transmission followed by channel estimation at the receiver, based on coarsely quantized observations. We propose a novel channel estimator, based on Bussgang's decomposition, and a novel approximation to the rate achievable with finite-resolution ADCs, both for finite-cardinality constellations and for Gaussian inputs, that is accurate for a broad range of system parameters. Through numerical results, we illustrate that, for the 1-bit quantized case, pilot-based channel estimation together with maximal-ratio combining or zero-forcing detection enables reliable multi-user communication with high-order constellations, in spite of the severe nonlinearity introduced by the ADCs. Furthermore, we show that the rate achievable in the infinite-resolution (no quantization) case can be approached using ADCs with only a few bits of resolution. We finally investigate the robustness of the low-ADC-resolution MU-MIMO uplink against receive power imbalances between the different users, caused for example by imperfect power control.
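A toy numpy sketch of the 1-bit pipeline follows (simplified relative to the paper's estimator; the antenna/user counts, pilot design, and scalar Bussgang-gain approximation are illustrative assumptions): quantize the pilot observations with 1-bit ADCs, linearize the quantizer via Bussgang's decomposition, and form a least-squares channel estimate from the linearized model.

import numpy as np

rng = np.random.default_rng(0)

def adc_1bit(y):
    # separate 1-bit ADCs on the in-phase and quadrature branches
    return np.sign(y.real) + 1j * np.sign(y.imag)

M, K, n0 = 64, 4, 0.1            # BS antennas, users, noise variance
H = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
# orthogonal QPSK-valued pilots: DFT matrix with P @ P.conj().T = K * I
P = np.exp(-2j * np.pi * np.outer(np.arange(K), np.arange(K)) / K)
N = np.sqrt(n0 / 2) * (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K)))
R = adc_1bit(H @ P + N)

sigma2 = K + n0                               # variance of each received sample
A = np.sqrt(2 / np.pi) / np.sqrt(sigma2 / 2)  # Bussgang gain per real dimension
H_hat = (R @ P.conj().T) / (A * K)            # least squares on the linearized model
print("relative error:", np.linalg.norm(H_hat - H) / np.linalg.norm(H))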

372 citations


Cited by
Christopher M. Bishop
01 Jan 2006
TL;DR: This textbook covers probability distributions, linear models for regression and classification, neural networks, kernel methods, graphical models, mixture models and EM, approximate inference, sampling methods, and the combination of models in the context of machine learning.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

10,141 citations

Posted Content
TL;DR: This work proposes a simple modification to recover the original formulation of weight decay regularization by decoupling the weight decay from the optimization steps taken w.r.t. the loss function, and provides empirical evidence that this modification substantially improves Adam's generalization performance.
Abstract: L$_2$ regularization and weight decay regularization are equivalent for standard stochastic gradient descent (when rescaled by the learning rate), but as we demonstrate this is \emph{not} the case for adaptive gradient algorithms, such as Adam. While common implementations of these algorithms employ L$_2$ regularization (often calling it "weight decay" in what may be misleading due to the inequivalence we expose), we propose a simple modification to recover the original formulation of weight decay regularization by \emph{decoupling} the weight decay from the optimization steps taken w.r.t. the loss function. We provide empirical evidence that our proposed modification (i) decouples the optimal choice of weight decay factor from the setting of the learning rate for both standard SGD and Adam and (ii) substantially improves Adam's generalization performance, allowing it to compete with SGD with momentum on image classification datasets (on which it was previously typically outperformed by the latter). Our proposed decoupled weight decay has already been adopted by many researchers, and the community has implemented it in TensorFlow and PyTorch; the complete source code for our experiments is available at this https URL
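A minimal numpy sketch of the decoupled update (an illustration, not the authors' reference implementation): the Adam moments see only the raw loss gradient, and the weight decay is applied directly to the weights instead of being folded into the gradient as grad + weight_decay * w.

import numpy as np

def adamw_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=1e-2):
    # Adam moment estimates built from the loss gradient alone
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)   # bias correction, t counts from 1
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    # decoupled decay: applied to the weights, invisible to the moments
    w = w - lr * weight_decay * w
    return w, m, v

For plain SGD the two formulations coincide up to a rescaling of the decay factor by the learning rate, which is why the inequivalence only surfaces for adaptive methods such as Adam.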

6,909 citations

Journal ArticleDOI
TL;DR: While massive MIMO renders many traditional research problems irrelevant, it uncovers entirely new problems that urgently need attention: the challenge of making many low-cost low-precision components that work effectively together, acquisition and synchronization for newly joined terminals, the exploitation of extra degrees of freedom provided by the excess of service antennas, reducing internal power consumption to achieve total energy efficiency reductions, and finding new deployment scenarios.
Abstract: Multi-user MIMO offers big advantages over conventional point-to-point MIMO: it works with cheap single-antenna terminals, a rich scattering environment is not required, and resource allocation is simplified because every active terminal utilizes all of the time-frequency bins. However, multi-user MIMO, as originally envisioned, with roughly equal numbers of service antennas and terminals and frequency-division duplex operation, is not a scalable technology. Massive MIMO (also known as large-scale antenna systems, very large MIMO, hyper MIMO, full-dimension MIMO, and ARGOS) makes a clean break with current practice through the use of a large excess of service antennas over active terminals and time-division duplex operation. Extra antennas help by focusing energy into ever smaller regions of space to bring huge improvements in throughput and radiated energy efficiency. Other benefits of massive MIMO include extensive use of inexpensive low-power components, reduced latency, simplification of the MAC layer, and robustness against intentional jamming. The anticipated throughput depends on the propagation environment providing asymptotically orthogonal channels to the terminals, but so far experiments have not disclosed any limitations in this regard. While massive MIMO renders many traditional research problems irrelevant, it uncovers entirely new problems that urgently need attention: the challenge of making many low-cost low-precision components that work effectively together, acquisition and synchronization for newly joined terminals, the exploitation of extra degrees of freedom provided by the excess of service antennas, reducing internal power consumption to achieve total energy efficiency reductions, and finding new deployment scenarios. This article presents an overview of the massive MIMO concept and contemporary research on the topic.
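The "asymptotically orthogonal channels" property is easy to probe numerically under i.i.d. Rayleigh fading (an illustrative model; the article stresses that real propagation must be verified by experiment): the normalized inner product between two users' channel vectors shrinks roughly like 1/sqrt(M) as the number of service antennas M grows, while each channel's normalized energy concentrates around 1.

import numpy as np

rng = np.random.default_rng(1)
for M in (16, 128, 1024, 8192):
    h1 = (rng.standard_normal(M) + 1j * rng.standard_normal(M)) / np.sqrt(2)
    h2 = (rng.standard_normal(M) + 1j * rng.standard_normal(M)) / np.sqrt(2)
    # favorable propagation: |h1^H h2| / M -> 0 while ||h1||^2 / M -> 1
    print(M, abs(np.vdot(h1, h2)) / M, np.linalg.norm(h1) ** 2 / M)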

6,184 citations

01 Jan 2016
TL;DR: The cited work is the mathematical reference compendium Table of Integrals, Series, and Products.

4,085 citations