Institution

National Research University – Higher School of Economics

Education · Moscow, Russia
About: National Research University – Higher School of Economics is an education organization based in Moscow, Russia. It is known for its research contributions in the topics of Population and Computer science. The organization has 12873 authors who have published 23376 publications, receiving 256396 citations.


Papers
Journal Article · DOI
TL;DR: In this paper, the authors studied the representation theory of quantum continuous gl∞, which is a deformed version of the enveloping algebra of the Lie algebra of difference operators acting on the space of Laurent polynomials in one variable.
Abstract: We begin a study of the representation theory of quantum continuous gl∞, which we denote by E. This algebra depends on two parameters and is a deformed version of the enveloping algebra of the Lie algebra of difference operators acting on the space of Laurent polynomials in one variable. Fundamental representations of E are labeled by a continuous parameter u ∈ C. The representation theory of E has many properties familiar from the representation theory of gl∞: vector representations, Fock modules, and semi-infinite constructions of modules. Using tensor products of vector representations, we construct surjective homomorphisms from E to spherical double affine Hecke algebras SH_N for all N. A key step in this construction is an identification of a natural basis of the tensor products of vector representations with Macdonald polynomials. We also show that one of the Fock representations is isomorphic to the module constructed earlier by means of the K-theory of Hilbert schemes.
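To unpack the phrase "difference operators acting on the space of Laurent polynomials": a difference operator rescales the argument of a function rather than differentiating it. A one-line illustration, in notation of my own choosing rather than the paper's:

```latex
% Illustrative only (notation not taken from the paper): the multiplicative
% shift D_q acts on a Laurent polynomial f \in \mathbb{C}[z, z^{-1}] by
(D_q f)(z) = f(qz), \qquad D_q \, z^n = q^n z^n ,
% so each monomial z^n is an eigenvector of D_q with eigenvalue q^n.
```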

112 citations

Posted Content
TL;DR: It is shown that simple averaging of multiple points along the trajectory of SGD, with a cyclical or constant learning rate, leads to better generalization than conventional training, and Stochastic Weight Averaging (SWA) is extremely easy to implement, improves generalization, and has almost no computational overhead.
Abstract: Deep neural networks are typically trained by optimizing a loss function with an SGD variant, in conjunction with a decaying learning rate, until convergence. We show that simple averaging of multiple points along the trajectory of SGD, with a cyclical or constant learning rate, leads to better generalization than conventional training. We also show that this Stochastic Weight Averaging (SWA) procedure finds much flatter solutions than SGD, and approximates the recent Fast Geometric Ensembling (FGE) approach with a single model. Using SWA we achieve notable improvement in test accuracy over conventional SGD training on a range of state-of-the-art residual networks, PyramidNets, DenseNets, and Shake-Shake networks on CIFAR-10, CIFAR-100, and ImageNet. In short, SWA is extremely easy to implement, improves generalization, and has almost no computational overhead.
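The procedure is simple enough to sketch in a few lines. Below is a minimal sketch using PyTorch's torch.optim.swa_utils helpers; the toy model, synthetic data, and all hyperparameters (swa_start, learning rates, epoch counts) are placeholders, not the paper's settings.

```python
import torch
from torch import nn
from torch.optim.swa_utils import AveragedModel, SWALR, update_bn
from torch.utils.data import DataLoader, TensorDataset

# Toy model and synthetic data stand in for a real network and dataset.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
data = TensorDataset(torch.randn(256, 10), torch.randint(0, 2, (256,)))
loader = DataLoader(data, batch_size=32, shuffle=True)

optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
swa_model = AveragedModel(model)               # keeps a running average of the weights
swa_scheduler = SWALR(optimizer, swa_lr=0.05)  # anneals to, then holds, a constant SWA learning rate
swa_start = 15                                 # epoch at which averaging begins (placeholder)

for epoch in range(20):
    for x, y in loader:
        optimizer.zero_grad()
        nn.functional.cross_entropy(model(x), y).backward()
        optimizer.step()
    if epoch >= swa_start:
        swa_model.update_parameters(model)     # fold current weights into the average
        swa_scheduler.step()

update_bn(loader, swa_model)  # refresh BatchNorm statistics for the averaged weights
```

The only extra cost over plain SGD is the running weight average and one pass over the data at the end to refresh BatchNorm statistics, which is what the "almost no computational overhead" claim refers to.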

111 citations

Proceedings Article
20 May 2017
TL;DR: In this paper, a new Bayesian model is proposed that takes into account the computational structure of neural networks and provides structured sparsity, e.g., removing neurons and/or convolutional channels in CNNs.
Abstract: Dropout-based regularization methods can be regarded as injecting random noise with pre-defined magnitude into different parts of the neural network during training. It was recently shown that the Bayesian dropout procedure not only improves generalization but also leads to extremely sparse neural architectures by automatically setting the individual noise magnitude per weight. However, this sparsity can hardly be used for acceleration since it is unstructured. In this paper, we propose a new Bayesian model that takes into account the computational structure of neural networks and provides structured sparsity, e.g., removing neurons and/or convolutional channels in CNNs. To do this we inject noise into the neurons' outputs while keeping the weights unregularized. We establish the probabilistic model with a proper truncated log-uniform prior over the noise and a truncated log-normal variational approximation that ensures that the KL term in the evidence lower bound is computed in closed form. The model leads to structured sparsity by removing elements with a low SNR from the computation graph and provides significant acceleration on a number of deep neural architectures. The model is easy to implement as it can be formulated as a separate dropout-like layer.
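The closed-form KL term for the truncated distributions is beyond a short sketch, but the core mechanism, learned multiplicative log-normal noise on each neuron's output with SNR-based pruning, can be illustrated. The class name, initialization, and threshold below are illustrative, and the truncation of the prior and posterior is omitted:

```python
import torch
from torch import nn

class LogNormalNoise(nn.Module):
    """Simplified dropout-like layer in the spirit of the paper: each neuron's
    output is multiplied by log-normal noise with a learned per-neuron scale.
    The paper's truncated log-uniform prior, truncated log-normal posterior,
    and closed-form KL term are omitted here for brevity."""

    def __init__(self, num_features):
        super().__init__()
        self.mu = nn.Parameter(torch.zeros(num_features))
        self.log_sigma = nn.Parameter(torch.full((num_features,), -3.0))

    def forward(self, x):
        if self.training:
            # Reparameterized multiplicative noise: exp(mu + sigma * eps).
            eps = torch.randn_like(x)
            return x * torch.exp(self.mu + self.log_sigma.exp() * eps)
        # At test time, scale by the noise mean and zero out low-SNR neurons.
        mean = torch.exp(self.mu + 0.5 * self.log_sigma.exp() ** 2)
        return x * mean * self.snr_mask()

    def snr_mask(self, threshold=1.0):
        # For log-normal noise, SNR = 1 / sqrt(exp(sigma^2) - 1): it depends
        # only on sigma, so wide posteriors mark neurons for removal.
        snr = 1.0 / torch.sqrt(torch.expm1(self.log_sigma.exp() ** 2))
        return (snr > threshold).float()

h = torch.randn(8, 64)
noisy = LogNormalNoise(64)(h)  # stochastic pass (module defaults to training mode)
```

Because the SNR of log-normal noise depends only on sigma, neurons whose noise scale grows during training are exactly the ones the model has learned to ignore; dropping them removes whole rows or channels rather than scattered weights, which is what makes the sparsity structured.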

111 citations

Journal Article · DOI
TL;DR: Two distinct heartbeat-related influences on conscious perception, differentially related to early vs. late somatosensory processing, are identified; the authors propose that the HEP-related effects might reflect spontaneous shifts between interoception and exteroception or modulations of general attentional resources.
Abstract: Even though humans are mostly not aware of their heartbeats, several heartbeat-related effects have been reported to influence conscious perception. It is not clear whether these effects are distinct or related phenomena, or whether they are early sensory effects or late decisional processes. Combining electroencephalography and electrocardiography, along with signal detection theory analyses, we identify two distinct heartbeat-related influences on conscious perception differentially related to early vs. late somatosensory processing. First, an effect on early sensory processing was found for the heartbeat-evoked potential (HEP), a marker of cardiac interoception. The amplitude of the prestimulus HEP negatively correlated with localization and detection of somatosensory stimuli, reflecting a more conservative detection bias (criterion). Importantly, higher HEP amplitudes were followed by decreases in early (P50) as well as late (N140, P300) somatosensory-evoked potential (SEP) amplitudes. Second, stimulus timing along the cardiac cycle also affected perception. During systole, stimuli were detected and correctly localized less frequently, relating to a shift in perceptual sensitivity. This perceptual attenuation was accompanied by the suppression of only late SEP components (P300) and was stronger for individuals with a more stable heart rate. Both heart-related effects were independent of alpha oscillations' influence on somatosensory processing. We explain cardiac cycle timing effects in a predictive coding account and suggest that HEP-related effects might reflect spontaneous shifts between interoception and exteroception or modulations of general attentional resources. Thus, our results provide a general conceptual framework to explain how internal signals can be integrated into our conscious perception of the world.
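For readers unfamiliar with the signal detection theory quantities used here: sensitivity (d′) and criterion (c) are computed from hit and false-alarm rates. A minimal sketch with made-up counts:

```python
from statistics import NormalDist

def sdt_measures(hits, misses, false_alarms, correct_rejections):
    """Standard signal detection theory measures: sensitivity d' and
    criterion c, computed from hit and false-alarm rates. A log-linear
    correction avoids infinite z-scores at rates of exactly 0 or 1."""
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf
    d_prime = z(hit_rate) - z(fa_rate)             # perceptual sensitivity
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))  # detection bias
    return d_prime, criterion

# Example: a conservative observer misses many near-threshold stimuli.
print(sdt_measures(hits=60, misses=40, false_alarms=5, correct_rejections=95))
```

A higher criterion means the observer needs stronger evidence before reporting a stimulus (the conservative bias linked to HEP amplitude above), whereas a lower d′ means the stimulus is genuinely harder to discriminate (the sensitivity shift during systole).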

111 citations

Posted Content
TL;DR: This paper introduces Neural Oblivious Decision Ensembles (NODE), a new deep learning architecture designed to work with any tabular data; it generalizes ensembles of oblivious decision trees but benefits from both end-to-end gradient-based optimization and the power of multi-layer hierarchical representation learning.
Abstract: Nowadays, deep neural networks (DNNs) have become the main instrument for machine learning tasks within a wide range of domains, including vision, NLP, and speech. Meanwhile, in an important case of heterogeneous tabular data, the advantage of DNNs over shallow counterparts remains questionable. In particular, there is no sufficient evidence that deep learning machinery allows constructing methods that outperform gradient boosting decision trees (GBDT), which are often the top choice for tabular problems. In this paper, we introduce Neural Oblivious Decision Ensembles (NODE), a new deep learning architecture, designed to work with any tabular data. In a nutshell, the proposed NODE architecture generalizes ensembles of oblivious decision trees, but benefits from both end-to-end gradient-based optimization and the power of multi-layer hierarchical representation learning. With an extensive experimental comparison to the leading GBDT packages on a large number of tabular datasets, we demonstrate the advantage of the proposed NODE architecture, which outperforms the competitors on most of the tasks. We open-source the PyTorch implementation of NODE and believe that it will become a universal framework for machine learning on tabular data.
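The building block NODE generalizes is easy to sketch: an oblivious tree applies one (feature, threshold) split per depth level, shared across that level, and the differentiable version replaces hard choices with soft ones. The sketch below uses plain softmax and sigmoid where NODE uses entmax, and the class and all sizes are illustrative, not the paper's implementation:

```python
import torch
from torch import nn

class SoftObliviousTree(nn.Module):
    """Simplified differentiable oblivious tree: every level applies one soft
    feature choice and one threshold, shared across that level, so a tree of
    depth d has 2**d leaves. NODE itself uses entmax for sparse feature/bin
    choices; softmax/sigmoid are used here to avoid the extra dependency."""

    def __init__(self, in_features, depth=3):
        super().__init__()
        self.depth = depth
        self.feature_logits = nn.Parameter(torch.zeros(depth, in_features))
        self.thresholds = nn.Parameter(torch.zeros(depth))
        self.leaf_values = nn.Parameter(torch.zeros(2 ** depth))

    def forward(self, x):
        # Soft feature selection per level: weighted sum over input features.
        weights = torch.softmax(self.feature_logits, dim=-1)   # (depth, in)
        chosen = x @ weights.t()                               # (batch, depth)
        right = torch.sigmoid(chosen - self.thresholds)        # P(go right)
        probs = torch.stack([1 - right, right], dim=-1)        # (batch, depth, 2)
        # Leaf probabilities factor into a product of per-level decisions.
        leaf = probs[:, 0]
        for level in range(1, self.depth):
            leaf = (leaf.unsqueeze(-1) * probs[:, level].unsqueeze(1)).flatten(1)
        return leaf @ self.leaf_values                         # (batch,)

tree = SoftObliviousTree(in_features=8, depth=3)
print(tree(torch.randn(4, 8)).shape)  # torch.Size([4])
```

Because every level's split is shared, the 2^d leaf probabilities factor into a product of d two-way decisions, which keeps the layer cheap and, as in the paper, lets many such trees be stacked and trained jointly by backpropagation.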

110 citations


Authors

Showing all 13307 results

Name                     H-index  Papers  Citations
Rasmus Nielsen               135     556     84,898
Matthew Jones                125   1,161     96,909
Fedor Ratnikov               123   1,104     67,091
Kenneth J. Arrow             113     411    111,221
Wil M. P. van der Aalst      108     725     42,429
Peter Schmidt                105     638     61,822
Roel Aaij                     98   1,071     44,234
John W. Berry                 97     351     52,470
Federico Alessio              96   1,054     42,300
Denis Derkach                 96   1,184     45,772
Marco Adinolfi                95     831     40,777
Michael Alexander             95     881     38,749
Alexey Boldyrev               94     439     32,000
Shalom H. Schwartz            94     220     67,609
Richard Blundell              93     487     61,730
Network Information
Related Institutions (5)
Saint Petersburg State University
53.4K papers, 1.1M citations

88% related

Moscow State University
123.3K papers, 1.7M citations

88% related

Russian Academy of Sciences
417.5K papers, 4.5M citations

84% related

Carnegie Mellon University
104.3K papers, 5.9M citations

83% related

École Polytechnique
39.2K papers, 1.2M citations

82% related

Performance Metrics
No. of papers from the Institution in previous years
Year  Papers
2023     129
2022     586
2021   2,478
2020   3,025
2019   2,590
2018   2,259