Author

Shuxiao Chen

Bio: Shuxiao Chen is an academic researcher from the University of Pennsylvania. The author has contributed to research in topics including empirical risk minimization and minimax, has an h-index of 6, and has co-authored 16 publications receiving 126 citations.

Papers
Posted Content
TL;DR: It is shown that data augmentation is equivalent to an averaging operation over the orbits of a certain group that keeps the data distribution approximately invariant, and that this averaging leads to variance reduction.
Abstract: Data augmentation is a widely used trick when training deep neural networks: in addition to the original data, properly transformed data are also added to the training set. However, to the best of our knowledge, a clear mathematical framework to explain the performance benefits of data augmentation is not available. In this paper, we develop such a theoretical framework. We show data augmentation is equivalent to an averaging operation over the orbits of a certain group that keeps the data distribution approximately invariant. We prove that it leads to variance reduction. We study empirical risk minimization, and the examples of exponential families, linear regression, and certain two-layer neural networks. We also discuss how data augmentation could be used in problems with symmetry where other approaches are prevalent, such as in cryo-electron microscopy (cryo-EM).

111 citations
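
As a rough illustration of the orbit-averaging view described in the abstract above, the following sketch (a minimal example, not the authors' code; the sign-flip group, toy model, and loss are placeholder assumptions) compares ordinary per-sample losses with their averages over transformed copies of each sample, showing the variance-reduction effect.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical invariance group: sign flips of the input, a stand-in for the
# shifts/rotations used with real data. The toy data distribution below is
# invariant under this group.
group = [lambda x: x, lambda x: -x]

def loss(theta, x, y):
    # Squared loss of a toy linear predictor; purely illustrative.
    return (y - theta * x) ** 2

x = rng.normal(size=50_000)
y = np.abs(x) + 0.1 * rng.normal(size=50_000)

theta = 0.9
plain = loss(theta, x, y)
# Orbit averaging: replace each per-sample loss by its average over the
# transformed copies of that sample.
augmented = np.mean([loss(theta, g(x), y) for g in group], axis=0)

# Nearly identical means (no bias is introduced), but the orbit-averaged
# losses have smaller variance -- the variance reduction described above.
print(plain.mean(), augmented.mean())
print(plain.var(), augmented.var())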

Journal ArticleDOI
TL;DR: This article highlights the fact that the standard “detect-and-forget” OLS approach can lead to invalid inference and shows how recently developed tools in selective inference can be used to properly account for outlier detection and removal.
Abstract: Ordinary least square (OLS) estimation of a linear regression model is well-known to be highly sensitive to outliers. It is common practice to (1) identify and remove outliers by looking at the dat...

39 citations
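
The sketch below is a toy illustration (hypothetical variable names and thresholds, not the authors' procedure) of the "detect-and-forget" workflow the article warns about: outliers are flagged from the fitted residuals, dropped, and the model is refit on the surviving points, with the refitted p-values ignoring that the same data were used to choose which points to keep.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)

# Toy data with no true relationship between x and y.
n = 200
x = rng.normal(size=n)
y = rng.normal(size=n)
X = sm.add_constant(x)

# Step 1: fit OLS and flag "outliers" from internally studentized residuals.
fit = sm.OLS(y, X).fit()
influence = fit.get_influence()
keep = np.abs(influence.resid_studentized_internal) < 2.0

# Step 2 ("detect and forget"): refit on the surviving points only.
refit = sm.OLS(y[keep], X[keep]).fit()

# The refitted p-value treats the trimmed sample as if it had been fixed in
# advance, ignoring the data-driven selection -- the invalid inference the
# article highlights; selective inference is what corrects for it.
print(fit.pvalues[1], refit.pvalues[1])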

Posted Content
25 Jul 2019
TL;DR: A theoretical framework that explains data augmentation as averaging over group orbits and sheds light on how it could be used in problems with symmetry where other approaches are prevalent, such as cryo-electron microscopy (cryo-EM).
Abstract: Many complex deep learning models have found success by exploiting symmetries in data. Convolutional neural networks (CNNs), for example, are ubiquitous in image classification due to their use of translation symmetry, as image identity is roughly invariant to translations. In addition, many other forms of symmetry such as rotation, scale, and color shift are commonly used via data augmentation: the transformed images are added to the training set. However, a clear framework for understanding data augmentation is not available. One may even say that it is somewhat mysterious: how can we increase performance by simply adding transforms of our data to the model? Can that be information theoretically possible? In this paper, we develop a theoretical framework to start to shed light on some of these problems. We explain data augmentation as averaging over the orbits of the group that keeps the data distribution invariant, and show that it leads to variance reduction. We study finite-sample and asymptotic empirical risk minimization (using results from stochastic convex optimization, Rademacher complexity, and asymptotic statistical theory). We work out as examples the variance reduction in exponential families, linear regression, and certain two-layer neural networks under shift invariance (using discrete Fourier analysis). We also discuss how data augmentation could be used in problems with symmetry where other approaches are prevalent, such as in cryo-electron microscopy (cryo-EM).

36 citations
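
In symbols (notation chosen here for illustration, not taken from the paper), the augmented empirical risk replaces each per-sample loss by its average over the orbit of the invariance group, and the variance reduction follows from the law of total variance:
\[
\bar{R}_n(\theta) \;=\; \frac{1}{n}\sum_{i=1}^{n} \mathbb{E}_{g \sim \mathbb{Q}}\,\ell\!\left(\theta;\, g \cdot X_i\right),
\qquad
\operatorname{Var}\!\left( \mathbb{E}_{g \sim \mathbb{Q}}\,\ell(\theta;\, g \cdot X) \right)
\;\le\;
\operatorname{Var}\!\left( \ell(\theta;\, X) \right),
\]
where $\mathbb{Q}$ is a probability measure on the group $G$ under which the distribution of $X$ is (approximately) invariant; when the invariance is exact, $g \cdot X$ has the same distribution as $X$, and the inequality is Jensen's inequality applied to the conditional expectation over the orbit.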

Posted Content
TL;DR: The present paper studies community detection in a stylized yet informative inhomogeneous multilayer network model and provides an efficient algorithm that is simultaneously asymptotically minimax optimal for both estimation tasks under mild conditions.
Abstract: In network applications, it has become increasingly common to obtain datasets in the form of multiple networks observed on the same set of subjects, where each network is obtained in a related but different experimental condition or application scenario. Such datasets can be modeled by multilayer networks, where each layer is a separate network itself while different layers are associated and share some common information. The present paper studies community detection in a stylized yet informative inhomogeneous multilayer network model. In our model, layers are generated by different stochastic block models, the community structures of which are (random) perturbations of a common global structure, while the connecting probabilities in different layers are not related. Focusing on the symmetric two-block case, we establish minimax rates for both global estimation of the common structure and individualized estimation of layer-wise community structures. Both minimax rates have sharp exponents. In addition, we provide an efficient algorithm that is simultaneously asymptotically minimax optimal for both estimation tasks under mild conditions. The optimal rates depend on the parity of the number of most informative layers, a phenomenon that is caused by inhomogeneity across layers.

17 citations
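
The following is a minimal generative sketch of the stylized model described above (all parameter values and variable names are illustrative assumptions): each layer draws its two-block community labels as a random perturbation of a shared global labeling and then uses its own, unrelated connection probabilities.

import numpy as np

rng = np.random.default_rng(2)

n, L, flip_prob = 100, 5, 0.05  # nodes, layers, label-perturbation rate (hypothetical)

# Common global structure: symmetric two-block labels in {+1, -1}.
global_labels = rng.choice([-1, 1], size=n)

adjacency, layer_labels = [], []
for layer in range(L):
    # Each layer's communities are a random perturbation of the global ones.
    flips = rng.random(n) < flip_prob
    labels = np.where(flips, -global_labels, global_labels)
    # Connection probabilities are layer-specific and unrelated across layers.
    p, q = sorted(rng.uniform(0.02, 0.2, size=2), reverse=True)  # within, between
    same_block = np.equal.outer(labels, labels)
    probs = np.where(same_block, p, q)
    A = (rng.random((n, n)) < probs).astype(int)
    A = np.triu(A, 1)
    A = A + A.T  # symmetric adjacency, no self-loops
    adjacency.append(A)
    layer_labels.append(labels)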

Proceedings Article
01 Jan 2020
TL;DR: A novel approach from the perspective of label-awareness to reduce the performance gap of the neural tangent kernel, showing that models trained with the proposed kernels better simulate NNs in terms of generalization ability and local elasticity.
Abstract: As a popular approach to modeling the dynamics of training overparametrized neural networks (NNs), the neural tangent kernels (NTK) are known to fall behind real-world NNs in generalization ability. This performance gap is in part due to the label-agnostic nature of the NTK, which renders the resulting kernel not as locally elastic as NNs (He et al., 2019). In this paper, we introduce a novel approach from the perspective of label-awareness to reduce this gap for the NTK. Specifically, we propose two label-aware kernels that are each a superimposition of a label-agnostic part and a hierarchy of label-aware parts with increasing complexity of label dependence, using the Hoeffding decomposition. Through both theoretical and empirical evidence, we show that the models trained with the proposed kernels better simulate NNs in terms of generalization ability and local elasticity.

16 citations
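
To make the "superimposition" idea concrete, here is a schematic stand-in (not the paper's construction, which builds a hierarchy of label-aware terms via the Hoeffding decomposition; the RBF kernel, the rank-one label term, and the mixing weight alpha are assumptions for illustration): a label-agnostic kernel is combined with a simple label-dependent component on the training set.

import numpy as np

rng = np.random.default_rng(3)

def rbf_kernel(X, Z, gamma=1.0):
    # A generic label-agnostic kernel, used here only as a stand-in for the NTK.
    sq = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def label_aware_kernel(X_train, y_train, alpha=0.1, gamma=1.0):
    # Schematic superimposition of a label-agnostic part and a crude
    # first-order label-dependent part; purely illustrative.
    K = rbf_kernel(X_train, X_train, gamma)
    K_label = np.outer(y_train, y_train)
    return K + alpha * K_label

X = rng.normal(size=(20, 5))
y = rng.choice([-1.0, 1.0], size=20)
K = label_aware_kernel(X, y)
print(K.shape)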


Cited by
Journal ArticleDOI
01 May 1981
TL;DR: This work covers detecting influential observations and outliers, detecting and assessing collinearity, and applications and remedies.
Abstract: 1. Introduction and Overview. 2. Detecting Influential Observations and Outliers. 3. Detecting and Assessing Collinearity. 4. Applications and Remedies. 5. Research Issues and Directions for Extensions. Bibliography. Author Index. Subject Index.

4,948 citations

01 Jan 2016
TL;DR: An introduction to graphical methods of diagnostic regression analysis, covering plots, transformations, and regression.

138 citations

Posted Content
TL;DR: The resulting kernel, CNN-GP with LAP and horizontal flip data augmentation, achieves 89% accuracy, matching the performance of AlexNet, which is the best such result the authors know of for a classifier that is not a trained neural network.
Abstract: Recent research shows that for training with $\ell_2$ loss, convolutional neural networks (CNNs) whose width (number of channels in convolutional layers) goes to infinity correspond to regression with respect to the CNN Gaussian Process kernel (CNN-GP) if only the last layer is trained, and correspond to regression with respect to the Convolutional Neural Tangent Kernel (CNTK) if all layers are trained. An exact algorithm to compute the CNTK (Arora et al., 2019) yielded the finding that the classification accuracy of the CNTK on CIFAR-10 is within 6-7% of that of the corresponding CNN architecture (the best figure being around 78%), which is interesting performance for a fixed kernel. Here we show how to significantly enhance the performance of these kernels using two ideas. (1) Modifying the kernel using a new operation called Local Average Pooling (LAP), which preserves efficient computability of the kernel and inherits the spirit of standard data augmentation using pixel shifts. Earlier papers were unable to incorporate naive data augmentation because of the quadratic training cost of kernel regression. This idea is inspired by Global Average Pooling (GAP), which we show for CNN-GP and CNTK is equivalent to full translation data augmentation. (2) Representing the input image using a pre-processing technique proposed by Coates et al. (2011), which uses a single convolutional layer composed of random image patches. On CIFAR-10, the resulting kernel, CNN-GP with LAP and horizontal flip data augmentation, achieves 89% accuracy, matching the performance of AlexNet (Krizhevsky et al., 2012). Note that this is the best such result we know of for a classifier that is not a trained neural network. Similar improvements are obtained for Fashion-MNIST.

105 citations
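
The sketch below is only a schematic picture of the pooling idea, not the kernel recursion of Arora et al. (2019) or the paper's actual LAP operation (the positionwise kernel values, the 1-D layout, and the Gaussian weighting are all placeholder assumptions): GAP-style pooling averages the positionwise kernel uniformly over all pairs of positions, which the abstract relates to full translation augmentation, while an LAP-style variant down-weights distant positions, in the spirit of augmenting with small pixel shifts only.

import numpy as np

rng = np.random.default_rng(4)

P = 8  # number of spatial positions per image (hypothetical 1-D layout)
# K[p, q]: positionwise kernel value between position p of image A and
# position q of image B (placeholder values; in the real construction these
# come from the CNN-GP / CNTK recursion).
K = rng.random((P, P))

# GAP-style pooling: uniform average over all pairs of positions.
k_gap = K.mean()

# LAP-style pooling (schematic): give more weight to nearby position pairs.
width = 2.0
weights = np.exp(-np.subtract.outer(np.arange(P), np.arange(P)) ** 2 / (2 * width**2))
k_lap = (weights * K).sum() / weights.sum()

print(k_gap, k_lap)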

Posted Content
TL;DR: The contribution of weight decay and dropout to generalization is not only superfluous when sufficient implicit regularization is provided, but such techniques can also dramatically deteriorate performance if the hyperparameters are not carefully tuned for the architecture and data set.
Abstract: Contrary to most machine learning models, modern deep artificial neural networks typically include multiple components that contribute to regularization. Despite the fact that some (explicit) regularization techniques, such as weight decay and dropout, require costly fine-tuning of sensitive hyperparameters, the interplay between them and other elements that provide implicit regularization is not yet well understood. Shedding light upon these interactions is key to efficiently using computational resources and may contribute to solving the puzzle of generalization in deep learning. Here, we first provide formal definitions of explicit and implicit regularization that help understand essential differences between techniques. Second, we contrast data augmentation with weight decay and dropout. Our results show that visual object categorization models trained with data augmentation alone achieve the same performance as, or higher than, models also trained with weight decay and dropout, as is common practice. We conclude that the contribution of weight decay and dropout to generalization is not only superfluous when sufficient implicit regularization is provided, but also that such techniques can dramatically deteriorate performance if the hyperparameters are not carefully tuned for the architecture and data set. In contrast, data augmentation systematically provides large generalization gains and does not require hyperparameter re-tuning. In view of our results, we suggest optimizing neural networks without weight decay and dropout to save computational resources, and hence carbon emissions, and focusing more on data augmentation and other inductive biases to improve performance and robustness.

101 citations
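
The following minimal PyTorch-style sketch contrasts the two configurations compared in the study above (the model, transforms, and hyperparameter values are placeholders, not the paper's experimental setup): augmentation only, versus adding weight decay and dropout on top.

import torch
import torch.nn as nn
from torchvision import transforms

# Configuration A: implicit regularization via data augmentation only.
augment = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
model_a = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256), nn.ReLU(),
                        nn.Linear(256, 10))
opt_a = torch.optim.SGD(model_a.parameters(), lr=0.1)  # no weight decay

# Configuration B: the common practice of also adding explicit regularizers.
model_b = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256), nn.ReLU(),
                        nn.Dropout(p=0.5), nn.Linear(256, 10))
opt_b = torch.optim.SGD(model_b.parameters(), lr=0.1, weight_decay=5e-4)

# The abstract's finding: with sufficient augmentation (A), the extra weight
# decay and dropout in (B) add little and can hurt if their hyperparameters
# are not re-tuned for the architecture and data set.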

Journal ArticleDOI
TL;DR: Experimental results show that EDL-COVID offers promising results for COVID-19 case detection, with an accuracy of 95%, better than COVID-Net's 93.3%, using a proposed weighted averaging ensembling method that is aware of the different sensitivities of deep learning models on different class types.
Abstract: Effective screening of COVID-19 cases has become extremely important for mitigating and stopping the quick spread of the disease during the current COVID-19 pandemic worldwide. In this article, we consider radiology examination using chest X-ray images, which is among the effective screening approaches for COVID-19 case detection. Given that deep learning is an effective tool and framework for image analysis, there have been many studies on COVID-19 case detection that train deep learning models with X-ray images. Although some of them report good prediction results, their proposed deep learning models might suffer from overfitting, high variance, and generalization errors caused by noise and a limited number of datasets. Considering that ensemble learning can overcome the shortcomings of deep learning by making predictions with multiple models instead of a single model, we propose EDL-COVID, an ensemble deep learning model employing deep learning and ensemble learning. The EDL-COVID model is generated by combining multiple snapshot models of COVID-Net, which pioneered an open-sourced COVID-19 case detection method that processes chest X-ray images with deep neural networks, by employing a proposed weighted averaging ensembling method that is aware of the different sensitivities of deep learning models on different class types. Experimental results show that EDL-COVID offers promising results for COVID-19 case detection, with an accuracy of 95%, better than COVID-Net's 93.3%.

94 citations
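
As a schematic of the class-aware weighted-averaging idea described above (the probabilities and weights below are made-up placeholders; in EDL-COVID the weights reflect each snapshot model's per-class sensitivity), the ensemble prediction is a per-class weighted combination of the snapshot models' softmax outputs rather than a plain mean.

import numpy as np

# Softmax outputs of three snapshot models for one chest X-ray, over three
# classes (e.g., normal, pneumonia, COVID-19) -- placeholder numbers.
probs = np.array([
    [0.20, 0.30, 0.50],
    [0.10, 0.25, 0.65],
    [0.30, 0.40, 0.30],
])

# Per-model, per-class weights (arbitrary illustrative values), normalized
# over the models for each class.
weights = np.array([
    [1.0, 2.0, 3.0],
    [2.0, 1.0, 2.0],
    [1.0, 1.0, 1.0],
])
weights = weights / weights.sum(axis=0, keepdims=True)

# Class-aware weighted averaging instead of a plain mean over the snapshots.
ensemble = (weights * probs).sum(axis=0)
ensemble = ensemble / ensemble.sum()
print(ensemble, ensemble.argmax())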