Author

Wei Pan

Bio: Wei Pan is an academic researcher from the University of Minnesota. The author has contributed to research on topics including Genome-wide association study and Feature selection. The author has an h-index of 51 and has co-authored 237 publications receiving 11,057 citations.


Papers
Journal Article (DOI)
Wei Pan
TL;DR: This work proposes a modification to AIC, where the likelihood is replaced by the quasi-likelihood and a proper adjustment is made for the penalty term.
Abstract: Correlated response data are common in biomedical studies. Regression analysis based on the generalized estimating equations (GEE) is an increasingly important method for such data. However, few model-selection criteria are available for GEE. The well-known Akaike Information Criterion (AIC) cannot be applied directly, since AIC is based on maximum likelihood estimation whereas GEE is not likelihood-based. We propose a modification to AIC in which the likelihood is replaced by the quasi-likelihood and a proper adjustment is made to the penalty term. Its performance is investigated through simulation studies. For illustration, the method is applied to a real data set.
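To make the shape of such a criterion concrete, here is a brief LaTeX sketch of an AIC-type quantity of the kind described above; the notation is assumed rather than quoted from the paper (Q is the quasi-likelihood evaluated under the independence working correlation I, \hat\Omega_I the model-based covariance estimate under independence, and \hat V_R the robust sandwich covariance estimate under working correlation R).

% Sketch (assumed notation) of an AIC-type criterion for GEE.
\mathrm{QIC}(R) \;=\; -2\,Q\bigl(\hat\beta(R);\, I\bigr)
                \;+\; 2\,\operatorname{trace}\bigl(\hat\Omega_I^{-1}\,\hat V_R\bigr)

When the working correlation structure is adequate, the trace term is close to the number of regression parameters p, so the penalty reduces to the familiar 2p of AIC.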

2,233 citations

Journal Article (DOI)
Wei Pan
TL;DR: All three methods are based on the two-sample t-statistic or a minor variation of it, but they differ in how they attach a statistical significance level to the statistic, which can lead to large differences in the resulting significance levels and in the numbers of genes detected.
Abstract: Motivation: A common task in analyzing microarray data is to determine which genes are differentially expressed across two kinds of tissue samples or samples obtained under two experimental conditions. Recently several statistical methods have been proposed to accomplish this goal when there are replicated samples under each condition. However, it may not be clear how these methods compare with each other. Our main goal here is to compare three methods, the t-test, a regression modeling approach (Thomas et al., Genome Res., 11, 1227-1236, 2001) and a mixture model approach (Pan et al., http://www.biostat.umn.edu/cgi-bin/rrs?print+2001,2001a,b), with particular attention to their different modeling assumptions. Results: All three methods are based on the two-sample t-statistic or a minor variation of it; however, they differ in how they attach a statistical significance level to the statistic, leading to possibly large differences in the resulting significance levels and in the numbers of genes detected. In particular, we give an explicit formula for the test statistic used in the regression approach. Using the leukemia data of Golub et al. (Science, 285, 531-537, 1999), we illustrate these points. We also briefly compare the results with those of several other methods, including the empirical Bayesian method of Efron et al. (J. Am. Stat. Assoc., to appear, 2001) and the Significance Analysis of Microarrays (SAM) method of Tusher et al. (Proc. Natl Acad. Sci. USA, 98, 5116-5121, 2001).
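As a concrete illustration of the point made in the abstract (a shared statistic, but different ways of attaching significance to it), here is a minimal Python sketch; it is an assumed example, not code from any of the compared papers. It computes per-gene two-sample t-statistics and then produces a significance level in two ways, from the parametric t reference and from a permutation null.

# Minimal sketch (assumed example): the same per-gene two-sample t-statistic,
# with two different ways of attaching a significance level to it.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_genes, n1, n2 = 1000, 10, 10
x = rng.normal(size=(n_genes, n1))   # expression under condition 1
y = rng.normal(size=(n_genes, n2))   # expression under condition 2
y[:50] += 1.0                        # 50 truly differential genes

# Per-gene two-sample t-statistics and parametric p-values.
t_obs, p_param = stats.ttest_ind(x, y, axis=1)

# Permutation null: recompute the same statistic after shuffling sample labels.
data = np.hstack([x, y])
n_perm = 200
exceed = np.zeros(n_genes)
for _ in range(n_perm):
    perm = rng.permutation(n1 + n2)
    t_perm, _ = stats.ttest_ind(data[:, perm[:n1]], data[:, perm[n1:]], axis=1)
    exceed += np.abs(t_perm) >= np.abs(t_obs)
p_perm = (exceed + 1) / (n_perm + 1)

print("gene 0: parametric p =", p_param[0], ", permutation p =", p_perm[0])

The statistic is identical in both cases; only the reference distribution used to convert it into a significance level differs, which is exactly where the compared methods diverge.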

555 citations

Journal Article (DOI)
TL;DR: This article presents a powerful association test based on data-adaptive modifications to a so-called Sum test originally proposed for common variants, which aims to strike a balance between utilizing information on multiple markers in linkage disequilibrium and reducing the cost of large degrees of freedom or of multiple testing adjustment.
Abstract: Since associations between complex diseases and common variants are typically weak, and approaches to genotyping rare variants (e.g. by next-generation resequencing) multiply, there is an urgent demand to develop powerful association tests that are able to detect disease associations with both common and rare variants. In this article we present such a test. It is based on data-adaptive modifications to a so-called Sum test originally proposed for common variants, which aims to strike a balance between utilizing information on multiple markers in linkage disequilibrium and reducing the cost of large degrees of freedom or of multiple testing adjustment. When applied to multiple common or rare variants in a candidate region, the proposed test is easy to use with 1 degree of freedom and without the need for multiple testing adjustment. We show that the proposed test has high power across a wide range of scenarios with either common or rare variants, or both. In particular, in some situations the proposed test performs better than several commonly used methods.
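The following Python sketch illustrates a Sum-type 1-degree-of-freedom test with a data-adaptive sign flip, in the spirit of, but not necessarily identical to, the test described above; the function name, the exact adaptive rule, and the use of a simple t-test on the aggregated score are all illustrative assumptions.

# Hedged sketch of an adaptive Sum-type burden test (assumed details).
import numpy as np
from scipy import stats

def adaptive_sum_test(genotypes, phenotype, n_perm=1000, seed=0):
    # genotypes: (n_subjects, n_variants) minor-allele counts (0/1/2)
    # phenotype: (n_subjects,) binary disease indicator (0/1)
    rng = np.random.default_rng(seed)

    def one_df_stat(y):
        # Flip the coding of variants whose marginal association with y is
        # negative, then test the resulting single sum score (1 df).
        y_c = y - y.mean()
        signs = np.where(genotypes.T @ y_c < 0, -1.0, 1.0)
        score = (genotypes * signs).sum(axis=1)
        t, _ = stats.ttest_ind(score[y == 1], score[y == 0])
        return abs(t)

    t_obs = one_df_stat(phenotype)
    # Permute phenotypes so the adaptive sign selection is redone under the
    # null; this keeps the test valid despite the data-driven flipping step.
    t_null = np.array([one_df_stat(rng.permutation(phenotype))
                       for _ in range(n_perm)])
    return (np.sum(t_null >= t_obs) + 1) / (n_perm + 1)

Collapsing the re-coded variants into one score is what keeps the test at a single degree of freedom without a multiple-testing adjustment, while the permutation accounts for the adaptive step.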

308 citations

Journal Article
TL;DR: A penalized likelihood approach with an L1 penalty function is proposed, automatically realizing variable selection via thresholding and delivering a sparse solution in model-based clustering analysis with a common diagonal covariance matrix.
Abstract: Variable selection in clustering analysis is both challenging and important. In the context of model-based clustering analysis with a common diagonal covariance matrix, which is especially suitable for "high dimension, low sample size" settings, we propose a penalized likelihood approach with an L1 penalty function, automatically realizing variable selection via thresholding and delivering a sparse solution. We derive an EM algorithm to fit our proposed model, and propose a modified BIC as a model selection criterion to choose the number of components and the penalization parameter. A simulation study and an application to gene function prediction with gene expression profiles demonstrate the utility of our method.
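To show the thresholding mechanism concretely, here is a short Python sketch of the key M-step idea under the setting described above (standardized data, common diagonal covariance): an L1 penalty on the cluster means turns the mean update into soft thresholding, and a variable whose means are shrunk to zero in every cluster carries no clustering information. Function and variable names are illustrative, and the exact update in the paper may differ in details.

import numpy as np

def soft_threshold(x, lam):
    # Elementwise soft-thresholding operator.
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def m_step_means(data, resp, sigma2, lam):
    # data: (n, p) standardized observations; resp: (n, K) E-step
    # responsibilities; sigma2: (p,) common diagonal variances; lam: penalty.
    nk = resp.sum(axis=0)                    # effective cluster sizes (K,)
    raw = (resp.T @ data) / nk[:, None]      # unpenalized mean updates (K, p)
    # Per-cluster, per-variable threshold scales with the noise level and
    # inversely with the effective cluster size.
    return soft_threshold(raw, lam * sigma2[None, :] / nk[:, None])

# A variable j is retained only if some cluster keeps a nonzero mean for it:
# selected = np.any(m_step_means(X, R, s2, lam) != 0, axis=0)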

307 citations

Journal Article (DOI)
TL;DR: Theoretically, it is shown that the constrained L0 likelihood and its computational surrogate are optimal in that they achieve feature selection consistency and sharp parameter estimation, under one necessary condition required for any method to be selection consistent and to achieve sharp parameter estimation.
Abstract: In high-dimensional data analysis, feature selection becomes one effective means for dimension reduction, which proceeds with parameter estimation. Concerning the accuracy of selection and estimation, we study nonconvex constrained and regularized likelihoods in the presence of nuisance parameters. Theoretically, we show that the constrained L0 likelihood and its computational surrogate are optimal in that they achieve feature selection consistency and sharp parameter estimation, under one necessary condition required for any method to be selection consistent and to achieve sharp parameter estimation. It permits up to exponentially many candidate features. Computationally, we develop difference convex methods to implement the computational surrogate through primal and dual subproblems. These results establish a central role of L0 constrained and regularized likelihoods in feature selection and parameter estimation involving selection. As applications of the general method and theory, we perform feature selection...
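The abstract does not spell out the computational surrogate; one standard choice in this line of work is a truncated L1 penalty, which is attractive precisely because it decomposes into a difference of two convex functions. A hedged LaTeX sketch, with the truncation level τ and the sparsity bound K as assumed symbols:

% Constrained L0 problem and a truncated-L1 surrogate (assumed notation).
\max_{\beta}\ \ell(\beta) \quad \text{subject to} \quad \|\beta\|_0 \le K,
\qquad
J_\tau(|\beta_j|) \;=\; \min\!\left(\frac{|\beta_j|}{\tau},\, 1\right)
  \;=\; \frac{1}{\tau}\Bigl(|\beta_j| - \max\bigl(|\beta_j| - \tau,\, 0\bigr)\Bigr)

Because both |\beta_j| and \max(|\beta_j| - \tau, 0) are convex, the surrogate is a difference of convex functions, which is what allows a difference convex algorithm to proceed through a sequence of convex subproblems.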

282 citations


Cited by
Journal Article (DOI)
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

13,246 citations

Journal Article
Fumio Tajima
30 Oct 1989 - Genomics
TL;DR: It is suggested that the natural selection against large insertion/deletion is so weak that a large amount of variation is maintained in a population.

11,521 citations

Journal Article (DOI)
TL;DR: This paper proposes parametric and non-parametric empirical Bayes frameworks for adjusting data for batch effects that are robust to outliers in small sample sizes and perform comparably to existing methods for large samples.
Abstract: Non-biological experimental variation or “batch effects” are commonly observed across multiple batches of microarray experiments, often rendering the task of combining data from these batches difficult. The ability to combine microarray data sets is advantageous to researchers to increase statistical power to detect biological phenomena from studies where logistical considerations restrict sample size or in studies that require the sequential hybridization of arrays. In general, it is inappropriate to combine data sets without adjusting for batch effects. Methods have been proposed to filter batch effects from data, but these are often complicated and require large batch sizes (>25) to implement. Because the majority of microarray studies are conducted using much smaller sample sizes, existing methods are not sufficient. We propose parametric and non-parametric empirical Bayes frameworks for adjusting data for batch effects that are robust to outliers in small sample sizes and perform comparably to existing methods for large samples. We illustrate our methods using two example data sets and show that our methods are justifiable, easy to apply, and useful in practice. Software for our method is freely available at: http://biosun1.harvard.edu/complab/batch/.
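To make the basic idea concrete, the Python sketch below performs a plain gene-wise location/scale batch adjustment; it is an assumed, simplified example and deliberately omits the empirical Bayes shrinkage across genes that is the paper's actual contribution and what makes the approach usable with small batches.

# Minimal sketch of a naive gene-wise location/scale batch adjustment.
# This is NOT the empirical Bayes method of the paper; that method shrinks
# the per-batch, per-gene estimates across genes before adjusting.
import numpy as np

def naive_batch_adjust(expr, batch):
    # expr: (n_genes, n_samples) expression matrix; batch: (n_samples,) labels.
    batch = np.asarray(batch)
    adjusted = expr.astype(float).copy()
    grand_mean = adjusted.mean(axis=1, keepdims=True)
    grand_std = adjusted.std(axis=1, ddof=1, keepdims=True)
    for b in np.unique(batch):
        cols = batch == b
        mu_b = adjusted[:, cols].mean(axis=1, keepdims=True)
        sd_b = adjusted[:, cols].std(axis=1, ddof=1, keepdims=True)
        # Standardize within the batch, then restore the overall location/scale.
        adjusted[:, cols] = (adjusted[:, cols] - mu_b) / sd_b * grand_std + grand_mean
    return adjusted

Replacing mu_b and sd_b with estimates that borrow strength across genes (the empirical Bayes step) is what stabilizes the adjustment when each batch contains only a handful of samples.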

6,319 citations

Journal Article (DOI)
TL;DR: The statistical update of March 5, 2019, prepared by the writing group chaired by Emelia J. Benjamin, MD, ScM, FAHA, on behalf of the American Heart Association Council on Epidemiology and Prevention Statistics Committee and Stroke Statistics Subcommittee.
Abstract: Writing group roster: Emelia J. Benjamin, MD, ScM, FAHA (Chair); Paul Muntner, PhD, MHS, FAHA (Vice Chair); Connie W. Tsao, MD, MPH (Vice Chair Elect); Salim S. Virani, MD, PhD, FAHA (Chair Elect); and members Alvaro Alonso, Marcio S. Bittencourt, Clifton W. Callaway, April P. Carson, Alanna M. Chamberlain, Alexander R. Chang, Susan Cheng, Sandeep R. Das, Francesca N. Delling, Luc Djousse, Mitchell S.V. Elkind, Jane F. Ferguson, Myriam Fornage, Lori Chaffin Jordan, Sadiya S. Khan, Brett M. Kissela, Kristen L. Knutson, Tak W. Kwan, Daniel T. Lackland, Tené T. Lewis, Judith H. Lichtman, Chris T. Longenecker, Matthew Shane Loop, Pamela L. Lutsey, Seth S. Martin, Kunihiro Matsushita, Andrew E. Moran, Michael E. Mussolino, Martin O’Flaherty, Ambarish Pandey, Amanda M. Perak, Wayne D. Rosamond, Gregory A. Roth, Uchechukwu K.A. Sampson, Gary M. Satou, Emily B. Schroeder, Svati H. Shah, Nicole L. Spartano, Andrew Stokes, David L. Tirschwell, Mintu P. Turakhia, Lisa B. VanWagner, John T. Wilkins, and Sally S. Wong; on behalf of the American Heart Association Council on Epidemiology and Prevention Statistics Committee and Stroke Statistics Subcommittee.

5,739 citations

Journal Article (DOI)
TL;DR: The Statistical Update represents the most up-to-date statistics related to heart disease, stroke, and the cardiovascular risk factors listed in the AHA's My Life Check - Life’s Simple 7, which include core health behaviors and health factors that contribute to cardiovascular health.
Abstract: Each year, the American Heart Association (AHA), in conjunction with the Centers for Disease Control and Prevention, the National Institutes of Health, and other government agencies, brings together in a single document the most up-to-date statistics related to heart disease, stroke, and the cardiovascular risk factors listed in the AHA’s My Life Check - Life’s Simple 7 (Figure 1), which include core health behaviors (smoking, physical activity, diet, and weight) and health factors (cholesterol, blood pressure [BP], and glucose control) that contribute to cardiovascular health. The Statistical Update represents …

5,102 citations