Author

Klaus-Robert Müller

Other affiliations: Korea University, University of Tokyo, Fraunhofer Society
Bio: Klaus-Robert Müller is an academic researcher from the Technical University of Berlin. The author has contributed to research in topics including artificial neural networks and support vector machines, has an h-index of 129, and has co-authored 764 publications receiving 79,391 citations. Previous affiliations of Klaus-Robert Müller include Korea University and the University of Tokyo.


Papers
Posted Content
TL;DR: It is shown that the inherent sampling error of the Wu et al. method can be corrected by using neural-network-based MCMC or importance sampling, which leads to asymptotically unbiased estimators for physical quantities.
Abstract: In this comment on "Solving Statistical Mechanics Using Variational Autoregressive Networks" by Wu et al., we propose a subtle yet powerful modification of their approach. We show that the inherent sampling error of their method can be corrected by using neural-network-based MCMC or importance sampling, which leads to asymptotically unbiased estimators for physical quantities. This modification is possible due to a singular property of VANs, namely that they provide the exact sample probability. With these modifications, we believe that their method could have a substantially greater impact on various important fields of physics, including strongly-interacting field theories and statistical physics.
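The correction hinges on the fact that a VAN returns the exact probability q(x) of every configuration it generates, so Boltzmann expectation values can be recovered by reweighting. Below is a minimal, self-normalized importance-sampling sketch on a toy 1D Ising chain; the trained VAN is replaced by a fixed factorized distribution, and the chain length, inverse temperature, and marginals are invented for illustration rather than taken from the paper.

```python
# Minimal sketch of the importance-sampling correction described above.
# The trained VAN is replaced by a factorized Bernoulli "model" purely for
# illustration; only its two required operations are used: drawing samples
# and returning the exact log-probability of each sample.
import numpy as np

rng = np.random.default_rng(0)
L, beta = 16, 0.4          # toy 1D Ising chain and inverse temperature (assumed values)
p_up = np.full(L, 0.55)    # placeholder "learned" marginals standing in for the VAN

def sample_and_logq(n):
    """Draw spin configurations and their exact log q(x), as a VAN would provide."""
    u = rng.random((n, L))
    spins = np.where(u < p_up, 1, -1)
    logq = np.where(spins == 1, np.log(p_up), np.log(1 - p_up)).sum(axis=1)
    return spins, logq

def energy(spins):
    """1D Ising energy with periodic boundary: E = -sum_i s_i * s_{i+1}."""
    return -(spins * np.roll(spins, -1, axis=1)).sum(axis=1)

spins, logq = sample_and_logq(100_000)
E = energy(spins)
logw = -beta * E - logq                   # log importance weights, up to a constant
w = np.exp(logw - logw.max())             # stabilized; self-normalized below

naive_E = E.mean()                        # biased estimate from raw model samples
reweighted_E = np.sum(w * E) / np.sum(w)  # asymptotically unbiased estimate
print(f"naive <E>: {naive_E:.3f}, importance-sampling <E>: {reweighted_E:.3f}")
```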

6 citations

Book ChapterDOI
10 Sep 2006
TL;DR: This paper proposes an extension of the Singular Information Criterion that makes it applicable to many singular machines, and evaluates its efficiency on Gaussian mixtures, offering an effective strategy for selecting the optimal model size.
Abstract: Deciding the optimal size of learning machines is a central issue in statistical learning theory, which is why theoretical criteria such as the BIC have been developed. However, these cannot be applied to singular machines, and many practical learning machines, e.g., mixture models, hidden Markov models, and Bayesian networks, are singular. Recently, we proposed the Singular Information Criterion (SingIC), which allows us to select the optimal size of singular machines. The SingIC is based on an analysis of the learning coefficient, so the machines to which it can be applied are still limited. In this paper, we propose an extension of this criterion that enables us to apply it to many singular machines, and we evaluate its efficiency on Gaussian mixtures. The results offer an effective strategy for selecting the optimal size.
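The abstract contrasts the SingIC with classical criteria such as the BIC; since the SingIC formula itself is not given here, the sketch below only illustrates the underlying task: selecting the number of Gaussian mixture components with BIC via scikit-learn, the very setting where the BIC penalty is not strictly justified for singular models. The data and candidate sizes are invented for illustration.

```python
# Illustrative sketch of the model-size selection problem the abstract describes:
# choosing the number of components in a Gaussian mixture. The classical BIC is
# used because the SingIC formula is not given in the abstract; for singular
# models such as mixtures the BIC penalty lacks a rigorous justification, which
# is exactly the gap criteria like the SingIC aim to close.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic data from a 3-component mixture (ground truth chosen for illustration).
X = np.concatenate([
    rng.normal(-4.0, 1.0, size=(300, 1)),
    rng.normal(0.0, 0.7, size=(300, 1)),
    rng.normal(3.5, 1.2, size=(300, 1)),
])

scores = {}
for k in range(1, 7):
    gm = GaussianMixture(n_components=k, n_init=5, random_state=0).fit(X)
    scores[k] = gm.bic(X)          # lower is better

best_k = min(scores, key=scores.get)
print("BIC per model size:", {k: round(v, 1) for k, v in scores.items()})
print("selected size:", best_k)
```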

6 citations

Posted Content
TL;DR: This paper presents a novel N-ary coding scheme that decomposes the original multi-class problem into simpler multi-class subproblems, similar to applying a divide-and-conquer method, and shows that the proposed coding scheme achieves superior prediction performance over state-of-the-art coding methods.
Abstract: The coding matrix design plays a fundamental role in the prediction performance of error correcting output codes (ECOC)-based multi-class tasks. In many-class classification problems, e.g., fine-grained categorization, it is difficult to distinguish subtle between-class differences under existing coding schemes due to the limited choice of coding values. In this paper, we investigate whether one can relax existing binary and ternary code design to $N$-ary code design to achieve better classification performance. In particular, we present a novel $N$-ary coding scheme that decomposes the original multi-class problem into simpler multi-class subproblems, which is similar to applying a divide-and-conquer method. The two main advantages of such a coding scheme are as follows: (i) the ability to construct more discriminative codes and (ii) the flexibility for the user to select the best $N$ for ECOC-based classification. We show empirically that the optimal $N$ (based on classification performance) lies in $[3, 10]$ with some trade-off in computational cost. Moreover, we provide theoretical insights on the dependency of the generalization error bound of an $N$-ary ECOC on the average base classifier generalization error and the minimum distance between any two codes constructed. Extensive experimental results on benchmark multi-class datasets show that the proposed coding scheme achieves superior prediction performance over state-of-the-art coding methods.
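To make the decomposition concrete, here is a hedged sketch of the scheme as described: each column of a random N-ary coding matrix relabels the original classes into one of N groups, a base classifier is trained per column, and a test point is decoded by minimum Hamming distance between its predicted column symbols and the class codewords. The dataset, base learner, N, and number of columns are placeholder choices, not the paper's experimental setup.

```python
# Hedged sketch of N-ary ECOC: random N-ary coding matrix, one multi-class base
# classifier per column, minimum-Hamming-distance decoding.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

n_classes, N, n_columns = 10, 4, 30          # N-ary code with 30 columns (assumed)
# Random coding matrix; resample any column that collapses to a single symbol.
M = rng.integers(0, N, size=(n_classes, n_columns))
for j in range(n_columns):
    while len(np.unique(M[:, j])) < 2:
        M[:, j] = rng.integers(0, N, size=n_classes)

# One base classifier per column, trained on that column's relabeled targets.
models = [LogisticRegression(max_iter=2000).fit(X_tr, M[y_tr, j])
          for j in range(n_columns)]

# Decode: predict each column symbol, then pick the class with minimum Hamming distance.
preds = np.stack([m.predict(X_te) for m in models], axis=1)       # (n_test, n_columns)
hamming = (preds[:, None, :] != M[None, :, :]).sum(axis=2)        # (n_test, n_classes)
y_hat = hamming.argmin(axis=1)
print("test accuracy:", (y_hat == y_te).mean())
```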

6 citations

Posted ContentDOI
01 Feb 2014
TL;DR: The first long-term aerosol sampling and chemical characterization results from measurements at the Cape Verde Atmospheric Observatory (CVAO) on the island of São Vicente are presented in this paper.
Abstract: The first long-term aerosol sampling and chemical characterization results from measurements at the Cape Verde Atmospheric Observatory (CVAO) on the island of São Vicente are presented and are discussed with respect to air mass origin and seasonal trends. In total, 671 samples were collected using a high-volume PM10 sampler on quartz fiber filters from January 2007 to December 2011. The samples were analyzed for their aerosol chemical composition, including their ionic and organic constituents. Back trajectory analyses showed that the aerosol at CVAO was strongly influenced by emissions from Europe and Africa, with the latter often responsible for high mineral dust loading. Sea salt and mineral dust dominated the aerosol mass and together made up about 80% of the aerosol mass. The 5-year PM10 mean was 47.1 ± 55.5 μg/m3, while the mineral dust and sea salt means were 27.9 ± 48.7 μg/m3 and 11.1 ± 5.5 μg/m3, respectively. Non-sea-salt (nss) sulfate made up 62% of the total sulfate and originated from both long-range transport from Africa or Europe and marine sources. Strong seasonal variation was observed for the aerosol components. While nitrate showed no clear seasonal variation with an annual mean of 1.1 ± 0.6 μg/m3, the aerosol mass, OC, and EC showed strong winter maxima due to the strong influence of African air mass inflow. Additionally, during summer, elevated concentrations of OM were observed, originating from marine emissions. A summer maximum was observed for non-sea-salt sulfate and was connected to periods when air mass inflow was predominantly of marine origin, indicating that marine biogenic emissions were a significant source. Ammonium showed a distinct maximum in spring that coincided with ocean surface water chlorophyll a concentrations. Good correlations were also observed between nss-sulfate and oxalate during the summer and winter seasons, indicating likely photochemical in-cloud processing of the marine and anthropogenic precursors of these species. High temporal variability was observed in both chloride and bromide depletion, differing significantly with season, air mass history, and Saharan dust concentration. Chloride (bromide) depletion varied from 8.8 ± 8.5% (62 ± 42%) in Saharan dust dominated air masses to 30 ± 12% (87 ± 11%) in polluted European air masses. During summer, bromide depletion often reached 100% in marine as well as in polluted continental samples. In addition to the influence of the aerosol acidic components, photochemistry was one of the main drivers of halogenide depletion during the summer, while during dust events, displacement reaction with nitric acid was found to be the dominant mechanism. PMF analysis identified three major aerosol sources: sea salt, aged sea salt, and long-range transport. The ionic budget was dominated by the first two of these factors, while the long-range transport factor could only account for about 14% of the total observed ionic mass.

6 citations

Proceedings ArticleDOI
01 Jul 2018
TL;DR: Curly performed well both in classical game situations and when interacting with human opponents, namely the top-ranked Korean amateur high school curling team.
Abstract: Most artificial intelligence (AI) based learning systems act in virtual or laboratory environments. Here we demonstrate an AI-based curling robot system named `Curly' that competes on a real-world curling ice sheet. Curly encompasses (1) an AI-based curling strategy and simulation engine that accounts for the high `icy' uncertainty, (2) a thrower robot enabled by autonomous driving with traction control, and (3) a skip robot that recognizes the curling field and stone configuration using vision technology. Curly performed well both in classical game situations and when interacting with human opponents, namely the top-ranked Korean amateur high school curling team.
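The abstract only names the system's components, so the following is not the authors' engine but a generic illustration of the kind of uncertainty-aware shot selection that a strategy and simulation engine under icy uncertainty implies: candidate aim points are scored by Monte-Carlo simulation under Gaussian execution noise. The toy simulator, noise level, and scoring rule are all assumptions made for illustration.

```python
# Generic Monte-Carlo shot selection under execution noise (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
HOUSE = np.array([0.0, 0.0])     # toy 2D target (button) position, illustration only

def simulate_shot(aim, noise_std=0.35, n=2000):
    """Toy stand-in for a curling simulator: a shot aimed at `aim` lands at
    aim + Gaussian execution noise, modeling the 'icy' uncertainty."""
    return aim + rng.normal(0.0, noise_std, size=(n, 2))

def expected_score(aim):
    """Score a candidate shot by the average closeness of simulated outcomes to the button."""
    stones = simulate_shot(aim)
    dists = np.linalg.norm(stones - HOUSE, axis=1)
    return np.mean(np.exp(-dists))           # higher is better; toy scoring rule

candidates = [np.array([x, y]) for x in np.linspace(-1, 1, 9)
                               for y in np.linspace(-1, 1, 9)]
best = max(candidates, key=expected_score)
print("chosen aim point:", best)
```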

6 citations


Cited by
Proceedings ArticleDOI
27 Jun 2016
TL;DR: In this article, the authors propose a residual learning framework to ease the training of networks substantially deeper than those used previously; the resulting ensemble won first place in the ILSVRC 2015 classification task.
Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers—8× deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
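As a minimal illustration of the residual reformulation y = F(x) + x that the abstract describes, the sketch below stacks fully connected residual blocks in NumPy; the layer sizes, initialization, and use of dense rather than convolutional layers are simplifications for illustration, not the published architecture.

```python
# Minimal sketch of a residual block: the block learns a residual function F
# with respect to its input rather than an unreferenced mapping, so the skip
# connection lets it default to (near-)identity when the weights are small.
import numpy as np

rng = np.random.default_rng(0)
d = 64                                   # feature width (illustrative)
W1, b1 = rng.normal(0, 0.05, (d, d)), np.zeros(d)
W2, b2 = rng.normal(0, 0.05, (d, d)), np.zeros(d)

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x):
    """y = relu(x + F(x)): residual branch plus identity shortcut."""
    f = W2 @ relu(W1 @ x + b1) + b2      # residual branch F(x)
    return relu(x + f)                   # add the identity shortcut, then activate

x = rng.normal(size=d)
y = x
for _ in range(20):                      # stacking many blocks stays well-behaved
    y = residual_block(y)
print("input norm:", round(np.linalg.norm(x), 3),
      "output norm:", round(np.linalg.norm(y), 3))
```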

123,388 citations

Posted Content
TL;DR: This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers---8x deeper than VGG nets but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.

44,703 citations

Book
18 Nov 2016
TL;DR: Deep learning, as presented in this book, is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts; it is used in applications such as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and video games.
Abstract: Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts. Because the computer gathers knowledge from experience, there is no need for a human computer operator to formally specify all the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones; a graph of these hierarchies would be many layers deep. This book introduces a broad range of topics in deep learning. The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning. It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology; and it surveys such applications as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames. Finally, the book offers research perspectives, covering such theoretical topics as linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, the partition function, approximate inference, and deep generative models. Deep Learning can be used by undergraduate or graduate students planning careers in either industry or research, and by software engineers who want to begin using deep learning in their products or platforms. A website offers supplementary material for both readers and instructors.

38,208 citations

Journal ArticleDOI


08 Dec 2001-BMJ
TL;DR: There is, I think, something ethereal about i, the square root of minus one, which seemed an odd beast at first: an intruder hovering on the edge of reality.
Abstract: There is, I think, something ethereal about i —the square root of minus one. I remember first hearing about it at school. It seemed an odd beast at that time—an intruder hovering on the edge of reality. Usually familiarity dulls this sense of the bizarre, but in the case of i it was the reverse: over the years the sense of its surreal nature intensified. It seemed that it was impossible to write mathematics that described the real world in …

33,785 citations

Proceedings Article
Sergey Ioffe, Christian Szegedy
06 Jul 2015
TL;DR: Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
Abstract: Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization, and in some cases eliminates the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.82% top-5 test error, exceeding the accuracy of human raters.
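For reference, the normalization transform the abstract describes can be written in a few lines. The sketch below implements only the training-time forward pass (mini-batch mean/variance normalization followed by a learnable scale and shift); the running statistics used at inference and the backward pass are omitted, and the tensor shapes are illustrative.

```python
# Sketch of the batch-normalization transform: normalize each feature with the
# mini-batch statistics, then scale and shift with learnable gamma and beta.
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """x: (batch, features). Returns the normalized, scaled, and shifted activations."""
    mu = x.mean(axis=0)                      # per-feature mini-batch mean
    var = x.var(axis=0)                      # per-feature mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)    # whitened activations
    return gamma * x_hat + beta              # restore representational capacity

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=5.0, size=(128, 16))   # shifted, scaled pre-activations
y = batch_norm_forward(x, gamma=np.ones(16), beta=np.zeros(16))
print("output means (~0):", y.mean(axis=0)[:3])
print("output stds  (~1):", y.std(axis=0)[:3])
```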

30,843 citations