
Simultaneous inference in general parametric models.

TL;DR: This paper describes simultaneous inference procedures in general parametric models, where the experimental questions are specified through a linear combination of elemental model parameters, and extends the canonical theory of multiple comparison procedures in ANOVA models to linear regression problems, generalized linear models, linear mixed effects models, the Cox model, robust linear models, etc.
Abstract: Simultaneous inference is a common problem in many areas of application. If multiple null hypotheses are tested simultaneously, the probability of rejecting erroneously at least one of them increases beyond the pre-specified significance level. Simultaneous inference procedures have to be used which adjust for multiplicity and thus control the overall type I error rate. In this paper we describe simultaneous inference procedures in general parametric models, where the experimental questions are specified through a linear combination of elemental model parameters. The framework described here is quite general and extends the canonical theory of multiple comparison procedures in ANOVA models to linear regression problems, generalized linear models, linear mixed effects models, the Cox model, robust linear models, etc. Several examples using a variety of different statistical models illustrate the breadth

Summary

1 Introduction

  • Common multiple comparison procedures adjust for multiplicity and thus ensure that the overall type I error remains below the pre-specified significance level α.
  • Section 2 defines the general model and obtains the asymptotic or exact distribution of linear functions of elemental model parameters under rather weak conditions.
  • In Section 3 the authors describe the framework for simultaneous inference procedures in general parametric models.
  • Most interesting from a practical point of view is Section 6 where the authors analyze four rather challenging problems with the tools developed in this paper.

2 Model and Parameters

  • In this section the authors introduce the underlying model assumptions and derive some asymptotic results necessary in the subsequent sections.
  • The results from this section form the basis for the simultaneous inference procedures described in Section 3.
  • The model contains fixed but unknown elemental parameters θ ∈ Rp and other (random or nuisance) parameters η.
  • In what follows the authors describe the underlying model assumptions, the limiting distribution of estimates of their parameters of interest ϑ, as well as the corresponding test statistics for hypotheses about ϑ and their limiting joint distribution.

3 Global and Simultaneous Inference

  • Based on the results from Section 2, the authors now focus on the derivation of suitable inference procedures.
  • The authors start by considering the general linear hypothesis (Searle, 1971) formulated in terms of their parameters of interest ϑ.
  • This approximating distribution will now be used as the reference distribution when constructing the inference procedures.
  • Note that a small global p-value (obtained from one of these procedures) leading to a rejection of H0 does not give further indication about the nature of the significant result.
  • A stronger assumption than asymptotic normality of θ̂n in (2) is exact normality, i.e., θ̂n ∼ Np(θ,Σ).

3.1 Global Inference

  • The F- and the χ²-test are classical approaches to assess the global null hypothesis H0.
  • Standard results (such as Theorem 3.5, Serfling, 1980) ensure that X² = Tn⊤ Rn⁺ Tn converges in distribution to a χ² distribution with Rank(R) degrees of freedom, where Rn⁺ denotes the Moore-Penrose inverse of the correlation matrix Rn.

3.2 Simultaneous Inference

  • In what follows the authors use adjusted p-values to describe the decision rules.
  • The adjusted p-values are calculated from expression (5).
  • Similar results also hold for one-sided testing problems.
  • Single-step procedures can always be improved by stepwise extensions based on the closed test procedure.
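As a worked form of the decision rule above, and assuming that expression (5) defines $g_\nu(R, t) = P(\max(|T_n|) \le t)$ as in Section 3.1, the single-step adjusted p-values for two-sided tests can be written as follows. The symbol $t_j$ (the observed value of the j-th test statistic) is notation introduced here for illustration.

```latex
p_j = 1 - g_\nu\bigl(R_n, |t_j|\bigr), \qquad j = 1, \dots, k,
\qquad \text{reject } H_0^j \text{ if } p_j \le \alpha .
```

Because every $p_j$ is computed from the joint distribution of the maximum, comparing each one with α is intended to keep the familywise error rate at level α.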

4 Applications

  • The methodological framework described in Sections 2 and 3 is very general and thus applicable to a wide range of statistical models.
  • Many estimation techniques, such as maximum likelihood and M-estimation, provide at least asymptotically normal estimates of the elemental parameters together with consistent estimates of their covariance matrix.
  • Detailed numerical examples are discussed in Section 6.
  • Many further multiple comparison procedures have been investigated in the past, which all fit into this framework.
  • Thus, related simultaneous tests and confidence intervals do not rely on asymptotics and can be computed analytically instead, as shown in Section 3.

5 Implementation

  • The multcomp package (Hothorn et al., 2008) in R (R Development Core Team, 2008) provides a general implementation of the framework for simultaneous inference in semiparametric models described in Sections 2 and 3.
  • The two remaining arguments of glht, alternative and rhs, define the direction of the alternative (see Section 3) and the right-hand side m, respectively.
  • One should do this whenever there is doubt about what the default contrasts measure, which typically happens in models with higher order interaction terms.
  • The numerical accuracy of adjusted p-values and simultaneous confidence intervals implemented in multcomp is continuously checked against results reported by Westfall et al.
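The arguments mentioned above can be illustrated with a small sketch. It uses the built-in warpbreaks data rather than any data set from the paper, and the matrix K is a hypothetical choice made only for this example. The call to glht runs only if multcomp is installed.

```r
# Illustrative one-way model on a built-in data set (not from the paper)
fit <- lm(breaks ~ tension, data = warpbreaks)

# Hypothetical K: both non-reference tension levels against level "L"
K <- rbind("M - L" = c(0, 1, 0),
           "H - L" = c(0, 0, 1))

if (requireNamespace("multcomp", quietly = TRUE)) {
  fit_glht <- multcomp::glht(fit, linfct = K,
                             alternative = "two.sided",  # direction of the alternative
                             rhs = 0)                    # right-hand side m
  print(summary(fit_glht))   # adjusted p-values
  print(confint(fit_glht))   # simultaneous confidence intervals
}
```

Each row of K defines one linear function of the model coefficients, so K needs as many columns as the model has coefficients.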

6.1 Genetic Components of Alcoholism

  • Various studies have linked alcohol dependence phenotypes to chromosome 4.
  • One candidate gene is NACP (non-amyloid component of plaques), coding for alpha synuclein.
  • Bönsch et al. (2005) found longer alleles of NACP-REP1 in alcohol-dependent patients compared with healthy controls and report that the allele lengths show some association with levels of expressed alpha synuclein mRNA in alcohol-dependent subjects.
  • The data are available from package coin.
  • Thus, the authors fit a simple one-way ANOVA model to the data and define K such that Kθ contains all three group differences (Tukey’s all-pairwise comparisons):

R> summary(amod_glht)

  • Because of the variance heterogeneity that can be observed in Figure 1, one might be concerned with the validity of the above results stating that there is no difference between any combination of the three allele lengths.
  • A sandwich estimator Sn might be more appropriate in this situation, and the vcov argument can be used to specify a function to compute some alternative covariance estimator.

R> summary(amod_glht_sw)

  • The authors used the sandwich function from package sandwich (Zeileis, 2004, 2006), which provides a heteroscedasticity-consistent estimator of the covariance matrix.
  • This result is more in line with previously published findings for this study obtained from nonparametric test procedures such as the Kruskal-Wallis test.
  • A comparison of the simultaneous confidence intervals calculated based on the ordinary and sandwich estimator is given in Figure 2.
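To make the idea of a heteroscedasticity-consistent covariance estimate concrete, here is a minimal base-R sketch of one basic variant (often called HC0). It runs on simulated three-group data with unequal variances. The data, group sizes and object names are assumptions made for this example, and the code does not reproduce the sandwich package or the analysis in the paper.

```r
set.seed(1)
# Simulated three-group data with unequal variances (purely illustrative)
g <- factor(rep(c("short", "intermediate", "long"), times = c(20, 50, 15)))
y <- rnorm(length(g), mean = 2, sd = c(0.5, 1, 2)[as.integer(g)])

fit <- lm(y ~ g)
X <- model.matrix(fit)
e <- residuals(fit)

bread <- solve(crossprod(X))            # (X'X)^{-1}
meat  <- crossprod(X * e)               # X' diag(e^2) X
S_sandwich <- bread %*% meat %*% bread  # HC0-type covariance estimate
S_ordinary <- vcov(fit)                 # assumes a common error variance

# Compare the standard errors under the two estimators
cbind(ordinary = sqrt(diag(S_ordinary)), sandwich = sqrt(diag(S_sandwich)))
```

Either matrix can take the role of Sn in the framework. Only the standardization of the test statistics and their estimated correlation change.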

6.2 Prediction of Total Body Fat

  • Garcia et al. (2005) report on the development of predictive regression equations for body fat content by means of p = 9 common anthropometric measurements which were obtained for n = 71 healthy German women.
  • In addition, the women’s body composition was measured by Dual Energy X-Ray Absorptiometry (DXA).
  • This reference method is very accurate in measuring body fat but finds little applicability in practical environments, mainly because of high costs and the methodological efforts needed.
  • Backward-elimination was applied to select important variables from the available anthropometrical measurements and Garcia et al.
  • Here, the authors fit the saturated model to the data and use the max-t test over all t-statistics to select important variables based on adjusted p-values.

R> summary(lmod <- lm(DEXfat ~ ., data = bodyfat))

  • The matrix of linear functions K is basically the identity matrix, except for the intercept which is omitted.
  • Once the matrix K has been defined, it can be used to set up the general linear hypotheses:
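A matrix K of this shape can be sketched in a few lines of base R. The built-in mtcars data stand in for the body fat data here, so the model and variable names are assumptions for illustration only.

```r
# 'mtcars' is a stand-in; it is not the body fat data used in the paper
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

p <- length(coef(fit)) - 1           # number of covariates without the intercept
K <- cbind(0, diag(p))               # zero column for the intercept, identity otherwise
rownames(K) <- names(coef(fit))[-1]

K %*% coef(fit)                      # picks out every coefficient except the intercept
```

Each row of K then corresponds to one null hypothesis stating that a single regression coefficient equals zero.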

R> summary(lmod_glht)

  • Only two covariates, waist and hip circumference, seem to be important and caused the rejection of H0.
  • Alternatively, an MM-estimator (Yohai, 1987) as implemented by lmrob from package robustbase (Todorov et al., 2007) can be used to fit a robust version of the above linear model; the results coincide rather nicely (note that the control arguments to lmrob were changed in multcomp version 1.2-6 and thus the results have slightly changed):

6.3 Smoking and Alzheimer’s Disease

  • Salib and Hillier (1997) report results of a case-control study on Alzheimer’s disease and smoking behavior of 198 female and male Alzheimer patients and 164 controls.
  • The alzheimer data have been re-constructed from Table 4 in Salib and Hillier (1997).
  • The authors conclude that ‘cigarette smoking is less frequent in men with Alzheimer’s disease.’
  • Here, the authors focus on how a potential association can be described (see Hothorn et al., 2006, for a non-parametric approach).
  • First, the authors fit a logistic regression model including both main effects and an interaction effect of smoking and gender.

R> summary(gmod)

  • The negative regression coefficient for heavy smoking males indicates that Alzheimer’s disease might be less frequent in this group, but the model is still difficult to interpret based on the coefficients and corresponding p-values only.
  • Therefore, confidence intervals on the probability scale for the different ‘risk groups’ are interesting and can be computed as follows.
  • For each combination of gender and smoking behavior, the probability of suffering from Alzheimer’s disease can be estimated by applying the inverse logit (logistic) function to the linear predictor from model gmod.
  • Using the predict method for generalized linear models is a convenient way to compute these probability estimates.
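The step from the linear predictor to the probability scale can be sketched with base R alone. The numbers below are hypothetical values chosen for illustration and are not results from the paper.

```r
# Hypothetical estimate and simultaneous limits on the linear predictor scale
eta <- c(lower = -1.1, estimate = -0.4, upper = 0.3)

# Inverse logit: exp(eta) / (1 + exp(eta)); plogis() is the base-R implementation
prob <- plogis(eta)
prob
```

Because the inverse logit is strictly increasing, the transformed limits still enclose the transformed estimate, so an interval on the linear predictor scale maps directly to an interval on the probability scale.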

R> plot(gmod_ci, xlab = "Probability of Developing Alzheimer",
+       xlim = c(0, 1))

  • The simultaneous confidence intervals are depicted in Figure 3.
  • Using this representation of the results, it is obvious that Alzheimer’s disease is less frequent in heavy smoking men compared to all other configurations of the two covariates.

6.4 Acute Myeloid Leukemia Survival

  • The treatment of patients suffering from acute myeloid leukemia (AML) is determined by a tumor classification scheme taking the status of various cytogenetic aberrations into account.
  • Bullinger et al. (2004) investigate an extended tumor classification scheme incorporating molecular subgroups of the disease obtained by gene expression profiling.
  • The overall survival time and censoring indicator as well as the clinical variables age, sex, lactic dehydrogenase level (LDH), white blood cell count (WBC), and treatment group are taken from Supplementary Table 1 in Bullinger et al.
  • One interesting question might be the usefulness of this risk score.
  • Tukey’s all-pairwise comparisons highlight that there seems to be a difference between ‘high’ scores and both ‘low’ and ‘intermediate’ ones, but the latter two are not distinguishable:

R> summary(glht(smod, linfct = mcp(risk = "Tukey")))

Simultaneous Tests for General Linear Hypotheses
Multiple Comparisons of Means: Tukey Contrasts
Fit: survreg(formula = Surv(time, event) ~ Sex +

  • Again, a sandwich estimator Sn of the covariance matrix can be plugged in, but the results stay very much the same in this case.

6.5 Forest Regeneration

  • Young trees suffer from browsing damage, mostly by roe and red deer.
  • The survey takes place in all 756 game management districts (‘Hegegemeinschaften’) in Bavaria.
  • The data of 2700 trees include the species and a binary variable indicating whether or not the tree suffers from damage caused by deer browsing.
  • The authors fit a mixed logistic regression model (using package lme4, Bates, 2005, 2007) without intercept and with random effects accounting for the spatial variation of the trees.
  • For each plot nested within a set of five plots orientated on a 100m transect (the location of the transect is determined by a pre-defined equally spaced lattice of the area under test), a random intercept is included in the model.

R> ci$confint[,2:3] <- ci$confint[,3:2]

  • Browsing is less frequent in hardwood but especially small oak trees are severely at risk.
  • The local authorities increased the number of roe deer to be harvested in the following years.
  • The large confidence interval for ash, maple, elm and lime trees is caused by the small sample size.

7 Conclusion

  • In essence, all that is required is a parameter estimate θ̂n following an asymptotic multivariate normal distribution, and a consistent estimate of its covariance matrix.
  • Standard software packages can be used to compute these quantities.
  • The examples given in Section 6 illustrate two facts.
  • First, the presented approach helps to formulate simultaneous inference procedures in situations that were previously hard to deal with and, second, a flexible open-source implementation offers tools to actually perform such procedures rather easily.


Simultaneous Inference in General Parametric Models

Torsten Hothorn
Institut für Statistik
Ludwig-Maximilians-Universität München
Ludwigstraße 33, D–80539 München, Germany

Frank Bretz
Statistical Methodology, Clinical Information Sciences
Novartis Pharma AG
CH-4002 Basel, Switzerland

Peter Westfall
Texas Tech University
Lubbock, TX 79409, U.S.A.

March 15, 2013

Abstract

Simultaneous inference is a common problem in many areas of application. If multiple null hypotheses are tested simultaneously, the probability of rejecting erroneously at least one of them increases beyond the pre-specified significance level. Simultaneous inference procedures have to be used which adjust for multiplicity and thus control the overall type I error rate. In this paper we describe simultaneous inference procedures in general parametric models, where the experimental questions are specified through a linear combination of elemental model parameters. The framework described here is quite general and extends the canonical theory of multiple comparison procedures in ANOVA models to linear regression problems, generalized linear models, linear mixed effects models, the Cox model, robust linear models, etc. Several examples using a variety of different statistical models illustrate the breadth of the results. For the analyses we use the R add-on package multcomp, which provides a convenient interface to the general approach adopted here.

Key words: multiple tests, multiple comparisons, simultaneous confidence intervals, adjusted p-values, multivariate normal distribution, robust statistics.

This is a preprint of an article published in Biometrical Journal, Volume 50, Number 3, 346–363. Copyright © 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim; available online http://www.biometrical-journal.com.

1 Introduction
Multiplicity is an intrinsic problem of any simultaneous inference. If each of k, say, null hypotheses is tested at nominal level α, the overall type I error rate can be substantially larger than α. That is, the probability of at least one erroneous rejection is larger than α for k ≥ 2. Common multiple comparison procedures adjust for multiplicity and thus ensure that the overall type I error remains below the pre-specified significance level α. Examples of such multiple comparison procedures include Dunnett’s many-to-one comparisons, Tukey’s all-pairwise comparisons, sequential pairwise contrasts, comparisons with the average, changepoint analyses, dose-response contrasts, etc. These procedures are all well established for classical regression and ANOVA models allowing for covariates and/or factorial treatment structures with i.i.d. normal errors and constant variance, see Bretz et al. (2008) and the references therein. For a general reading on multiple comparison procedures we refer to Hochberg and Tamhane (1987) and Hsu (1996).
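
The inflation of the type I error rate can be made concrete with a short calculation. The sketch below assumes k independent tests, each carried out at level α. Independence is a simplifying assumption made here for illustration, since the paper deals with generally correlated statistics, and the function name is hypothetical.

```r
# Probability of at least one false rejection among k independent tests,
# each carried out at nominal level alpha (independence is an assumption here)
fwer_independent <- function(alpha, k) {
  1 - (1 - alpha)^k
}

alpha <- 0.05
k <- c(1, 2, 5, 10, 20)
data.frame(k = k, fwer = fwer_independent(alpha, k))
```

Under this assumption, ten tests at α = 0.05 already give a probability of about 0.40 for at least one erroneous rejection.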
In this paper we aim at a unified description of simultaneous inference procedures in parametric models with generally correlated parameter estimates. Each individual null hypothesis is specified through a linear combination of elemental model parameters and we allow for k of such null hypotheses to be tested simultaneously, regardless of the number of elemental model parameters p. The general framework described here extends the current canonical theory with respect to the following aspects: (i) model assumptions such as normality and homoscedasticity are relaxed, thus allowing for simultaneous inference in generalized linear models, mixed effects models, survival models, etc.; (ii) arbitrary linear functions of the elemental parameters are allowed, not just contrasts of means in AN(C)OVA models; (iii) computing the reference distribution is feasible for arbitrary designs, especially for unbalanced designs; and (iv) a unified implementation is provided which allows for a fast transition of the theoretical results to the desks of data analysts interested in simultaneous inferences for multiple hypotheses.
Accordingly, the paper is organized as follows. Section 2 defines the general model and obtains the asymptotic or exact distribution of linear functions of elemental model parameters under rather weak conditions. In Section 3 we describe the framework for simultaneous inference procedures in general parametric models. An overview about important applications of the methodology is given in Section 4 followed by a short discussion of the software implementation in Section 5. Most interesting from a practical point of view is Section 6 where we analyze four rather challenging problems with the tools developed in this paper.

2 Model and Parameters
In this section we introduce the underlying model assumptions and derive some asymptotic results necessary in the subsequent sections. The results from this section form the basis for the simultaneous inference procedures described in Section 3.

Let $\mathcal{M}((Z_1, \dots, Z_n), \theta, \eta)$ denote a semi-parametric statistical model. The set of $n$ observations is described by $(Z_1, \dots, Z_n)$. The model contains fixed but unknown elemental parameters $\theta \in \mathbb{R}^p$ and other (random or nuisance) parameters $\eta$. We are primarily interested in the linear functions $\vartheta := K\theta$ of the parameter vector $\theta$ as specified through the constant matrix $K \in \mathbb{R}^{k,p}$. In what follows we describe the underlying model assumptions, the limiting distribution of estimates of our parameters of interest $\vartheta$, as well as the corresponding test statistics for hypotheses about $\vartheta$ and their limiting joint distribution.
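Suppose $\hat\theta_n \in \mathbb{R}^p$ is an estimate of $\theta$ and $S_n \in \mathbb{R}^{p,p}$ is an estimate of $\mathrm{cov}(\hat\theta_n)$ with

$$a_n S_n \xrightarrow{P} \Sigma \in \mathbb{R}^{p,p} \tag{1}$$

for some positive, nondecreasing sequence $a_n$. Furthermore, we assume that a multivariate central limit theorem holds, i.e.,

$$a_n^{1/2} (\hat\theta_n - \theta) \xrightarrow{d} N_p(0, \Sigma). \tag{2}$$
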
If both (1) and (2) are fulfilled we write $\hat\theta_n \overset{a}{\sim} N_p(\theta, S_n)$. Then, by Theorem 3.3.A in Serfling (1980), the linear function $\hat\vartheta_n = K\hat\theta_n$, i.e., an estimate of our parameters of interest, also follows an approximate multivariate normal distribution

$$\hat\vartheta_n = K\hat\theta_n \overset{a}{\sim} N_k(\vartheta, S_n^\star)$$

with covariance matrix $S_n^\star := K S_n K^\top$ for any fixed matrix $K \in \mathbb{R}^{k,p}$. Thus we need not distinguish between elemental parameters $\theta$ and derived parameters $\vartheta = K\theta$ that are of interest to the researcher. Instead we simply assume for the moment that we have (in analogy to (1) and (2))

$$\hat\vartheta_n \overset{a}{\sim} N_k(\vartheta, S_n^\star) \quad \text{with} \quad a_n S_n^\star \xrightarrow{P} \Sigma^\star := K \Sigma K^\top \in \mathbb{R}^{k,k} \tag{3}$$

and that the $k$ parameters in $\vartheta$ are themselves the parameters of interest to the researcher. It is assumed that the diagonal elements of the covariance matrix are positive, i.e., $\Sigma^\star_{jj} > 0$ for $j = 1, \dots, k$.
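
The step from the elemental parameters to the linear functions of interest can be sketched in base R. The example uses the built-in warpbreaks data and a hypothetical K containing all pairwise differences between three factor levels. Neither the data nor the object names come from the paper.

```r
# One-way layout from the built-in 'warpbreaks' data (illustrative only)
fit <- lm(breaks ~ tension, data = warpbreaks)

theta_hat <- coef(fit)   # elemental parameters: (Intercept), tensionM, tensionH
S_n <- vcov(fit)         # estimated covariance matrix of theta_hat

# Rows of K encode the all-pairwise differences M - L, H - L and H - M
K <- rbind("M - L" = c(0,  1, 0),
           "H - L" = c(0,  0, 1),
           "H - M" = c(0, -1, 1))

vartheta_hat <- drop(K %*% theta_hat)   # estimates of the linear functions K theta
S_star <- K %*% S_n %*% t(K)            # their covariance matrix K S_n K'
```

With treatment contrasts, the third entry of vartheta_hat equals the difference between the sample means of levels H and M, which gives a quick plausibility check for the choice of K.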
Then, the standardized estimator $\hat\vartheta_n$ is again asymptotically normally distributed

$$T_n := D_n^{-1/2} (\hat\vartheta_n - \vartheta) \overset{a}{\sim} N_k(0, R_n) \tag{4}$$

where $D_n = \mathrm{diag}(S_n^\star)$ is the diagonal matrix given by the diagonal elements of $S_n^\star$ and

$$R_n = D_n^{-1/2} S_n^\star D_n^{-1/2} \in \mathbb{R}^{k,k}$$

is the correlation matrix of the $k$-dimensional statistic $T_n$. To demonstrate (4), note that with (3) we have $a_n S_n^\star \xrightarrow{P} \Sigma^\star$ and $a_n D_n \xrightarrow{P} \mathrm{diag}(\Sigma^\star)$. Define the sequence $\tilde a_n$ needed to establish $\overset{a}{\sim}$-convergence in (4) by $\tilde a_n \equiv 1$. Then we have

$$\tilde a_n R_n = D_n^{-1/2} S_n^\star D_n^{-1/2} = (a_n D_n)^{-1/2} (a_n S_n^\star) (a_n D_n)^{-1/2} \xrightarrow{P} \mathrm{diag}(\Sigma^\star)^{-1/2} \, \Sigma^\star \, \mathrm{diag}(\Sigma^\star)^{-1/2} =: R \in \mathbb{R}^{k,k}$$

where the convergence in probability to a constant follows from Slutzky’s Theorem (Theorem 1.5.4, Serfling, 1980) and therefore (4) holds. To finish note that

$$T_n = D_n^{-1/2} (\hat\vartheta_n - \vartheta) = (a_n D_n)^{-1/2} a_n^{1/2} (\hat\vartheta_n - \vartheta) \xrightarrow{d} N_k(0, R).$$

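
The standardization in (4) can be checked numerically. The sketch below again uses the built-in warpbreaks data with a hypothetical K, and standardizes at the hypothesized value 0 in place of the unknown true parameter. Object names are chosen for this example only.

```r
fit <- lm(breaks ~ tension, data = warpbreaks)   # illustrative model
K <- rbind(c(0, 1, 0), c(0, 0, 1), c(0, -1, 1))  # hypothetical pairwise differences
vartheta_hat <- drop(K %*% coef(fit))
S_star <- K %*% vcov(fit) %*% t(K)               # covariance of the linear functions

D_inv_sqrt <- diag(1 / sqrt(diag(S_star)))       # D_n^{-1/2}
T_n <- drop(D_inv_sqrt %*% (vartheta_hat - 0))   # standardized at the value 0
R_n <- D_inv_sqrt %*% S_star %*% D_inv_sqrt      # correlation matrix of T_n
```

The result agrees with cov2cor(S_star), and the first element of T_n reproduces the usual t statistic of the first non-intercept coefficient. Because the third row of K is a linear combination of the first two, R_n is singular here.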
For the purposes of multiple comparisons, we need convergence of multivariate probabilities calculated for the vector $T_n$ when $T_n$ is assumed normally distributed with $R_n$ treated as if it were the true correlation matrix. However, such probabilities $P(\max(|T_n|) \le t)$ are continuous functions of $R_n$ (and a critical value $t$) which converge by $R_n \xrightarrow{P} R$ as a consequence of Theorem 1.7 in Serfling (1980). In cases where $T_n$ is assumed multivariate $t$ distributed with $R_n$ treated as the estimated correlation matrix, we have similar convergence as the degrees of freedom approach infinity.

Since we only assume that the parameter estimates are asymptotically normally distributed with a consistent estimate of the associated covariance matrix being available, our framework covers a large class of statistical models, including linear regression and ANOVA models, generalized linear models, linear mixed effects models, the Cox model, robust linear models, etc. Standard software packages can be used to fit such models and obtain the estimates $\hat\theta_n$ and $S_n$ which are essentially the only two quantities that are needed for what follows in Section 3. It should be noted that the elemental parameters $\theta$ are not necessarily means or differences of means in AN(C)OVA models. Also, we do not restrict our attention to contrasts of such means, but allow for any set of constants leading to the linear functions $\vartheta = K\theta$ of interest. Specific examples for $K$ and $\theta$ will be given later in Sections 4 and 6.

3 Global and Simultaneous Inference
Based on the results from Section 2, we now focus on the derivation of suitable inference procedures. We start by considering the general linear hypothesis (Searle, 1971) formulated in terms of our parameters of interest $\vartheta$

$$H_0: \vartheta := K\theta = m.$$

Under the conditions of $H_0$ it follows from Section 2 that

$$T_n = D_n^{-1/2} (\hat\vartheta_n - m) \overset{a}{\sim} N_k(0, R_n).$$

This approximating distribution will now be used as the reference distribution when constructing the inference procedures. The global hypothesis $H_0$ can be tested using standard global tests, such as the $F$- or the $\chi^2$-test. An alternative approach is to use maximum tests, as explained in Subsection 3.1. Note that a small global p-value (obtained from one of these procedures) leading to a rejection of $H_0$ does not give further indication about the nature of the significant result. Therefore, one is often interested in the individual null hypotheses

$$H_0^j: \vartheta_j = m_j.$$

Testing the hypotheses set $\{H_0^1, \dots, H_0^k\}$ simultaneously thus requires the individual assessments while maintaining the familywise error rate, as discussed in Subsection 3.2.

At this point it is worth considering two special cases. A stronger assumption than asymptotic normality of $\hat\theta_n$ in (2) is exact normality, i.e., $\hat\theta_n \sim N_p(\theta, \Sigma)$. If the covariance matrix $\Sigma$ is known, it follows by standard arguments that $T_n \sim N_k(0, R)$, when $T_n$ is normalized using fixed, known variances. Otherwise, in the typical situation of linear models with normal i.i.d. errors, $\Sigma = \sigma^2 A$, where $\sigma^2$ is unknown but $A$ is fixed and known, the exact distribution of $T_n$ is a $k$-dimensional multivariate $t_k(\nu, R)$ distribution with $\nu$ degrees of freedom ($\nu = n - p - 1$ for linear models), see Tong (1990).

3.1 Global Inference
The $F$- and the $\chi^2$-test are classical approaches to assess the global null hypothesis $H_0$. Standard results (such as Theorem 3.5, Serfling, 1980) ensure that

$$X^2 = T_n^\top R_n^+ T_n \xrightarrow{d} \chi^2(\mathrm{Rank}(R)) \quad \text{when } \hat\theta_n \overset{a}{\sim} N_p(\theta, S_n)$$

$$F = \frac{T_n^\top R^+ T_n}{\mathrm{Rank}(R)} \sim F(\mathrm{Rank}(R), \nu) \quad \text{when } \hat\theta_n \sim N_p(\theta, \sigma^2 A),$$

where $\mathrm{Rank}(R)$ and $\nu$ are the corresponding degrees of freedom of the $\chi^2$ and $F$ distribution, respectively. Furthermore, $R_n^+$ denotes the Moore-Penrose inverse of the correlation matrix $R_n$ (and $R^+$ that of $R$).

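
A minimal base-R sketch of the two global statistics follows. It computes the Moore-Penrose inverse from a singular value decomposition with a hand-written helper, pinv, which is a name introduced here. The warpbreaks data and the matrix K are illustrative assumptions, not taken from the paper.

```r
fit <- lm(breaks ~ tension, data = warpbreaks)   # illustrative model
K <- rbind(c(0, 1, 0), c(0, 0, 1), c(0, -1, 1))  # hypothetical pairwise differences
S_star <- K %*% vcov(fit) %*% t(K)
T_n <- drop(K %*% coef(fit)) / sqrt(diag(S_star))  # statistics for H0: K theta = 0
R_n <- cov2cor(S_star)

# Moore-Penrose inverse via the singular value decomposition
pinv <- function(A, tol = sqrt(.Machine$double.eps)) {
  s <- svd(A)
  keep <- s$d > tol * s$d[1]
  s$v[, keep, drop = FALSE] %*% (t(s$u[, keep, drop = FALSE]) / s$d[keep])
}

d <- svd(R_n)$d
rank_R <- sum(d > sqrt(.Machine$double.eps) * d[1])  # numerical rank of R_n

X2 <- drop(t(T_n) %*% pinv(R_n) %*% T_n)
F_stat <- X2 / rank_R
p_chisq <- pchisq(X2, df = rank_R, lower.tail = FALSE)
p_F <- pf(F_stat, rank_R, df.residual(fit), lower.tail = FALSE)
```

In this one-way layout the three pairwise differences span only two dimensions, so the rank is 2 and F_stat coincides with the F statistic reported by anova(fit).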
Another suitable scalar test statistic for testing the global hypothesis $H_0$ is to consider the maximum of the individual test statistics $T_{1,n}, \dots, T_{k,n}$ of the multivariate statistic $T_n = (T_{1,n}, \dots, T_{k,n})$, leading to a max-$t$ type test statistic $\max(|T_n|)$. The distribution of this statistic under the conditions of $H_0$ can be handled through the $k$-dimensional distribution

$$P(\max(|T_n|) \le t) = \int_{-t}^{t} \cdots \int_{-t}^{t} \varphi_k(x_1, \dots, x_k; R, \nu) \, dx_1 \cdots dx_k =: g_\nu(R, t) \tag{5}$$
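
To give a feel for the quantity in (5), the following sketch approximates it by plain Monte Carlo simulation for the multivariate normal case only. This is a simple stand-in chosen for illustration, not the numerical integration used in practice, and the function name g_normal_mc is hypothetical.

```r
# Monte Carlo approximation of P(max |T| <= t) for T ~ N_k(0, R)
g_normal_mc <- function(R, t, n_sim = 1e5) {
  e <- eigen(R, symmetric = TRUE)
  # Square-root factor of R; tiny negative eigenvalues are clipped at zero
  # so that singular correlation matrices are handled as well
  A <- e$vectors %*% diag(sqrt(pmax(e$values, 0)), nrow = nrow(R))
  Z <- matrix(rnorm(n_sim * nrow(R)), nrow = n_sim)
  X <- Z %*% t(A)                     # rows are draws from N_k(0, R)
  mean(apply(abs(X), 1, max) <= t)
}

set.seed(1)
R <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)
g_normal_mc(R, t = 2.2)
```

For uncorrelated statistics the probability factorizes, so with two independent components and t equal to the 97.5% normal quantile the estimate should be close to 0.95² = 0.9025.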

Citations
Book
11 Jul 2019
TL;DR: The aim of the present paper is to provide an overview of state-of-the-art dose-response analysis, both in terms of general concepts that have evolved and matured over the years and by means of concrete examples.
Abstract: Dose-response analysis can be carried out using multi-purpose commercial statistical software, but except for a few special cases the analysis easily becomes cumbersome as relevant, non-standard output requires manual programming. The extension package drc for the statistical environment R provides a flexible and versatile infrastructure for dose-response analyses in general. The present version of the package, reflecting extensions and modifications over the last decade, provides a user-friendly interface to specify the model assumptions about the dose-response relationship and comes with a number of extractors for summarizing fitted models and carrying out inference on derived parameters. The aim of the present paper is to provide an overview of state-of-the-art dose-response analysis, both in terms of general concepts that have evolved and matured over the years and by means of concrete examples.

1,827 citations


Cites methods from "Simultaneous inference in general p..."

  • ...However, adjusted p-values controlling the family-wise error rate may be obtained using the function glht() in the package multcomp, assuming that test statistics jointly follow a multivariate normal or t distribution [43]....


Journal ArticleDOI
Jonas Schulte-Schrepping1, Nico Reusch1, Daniela Paclik2, Kevin Baßler1, Stephan Schlickeiser2, Bowen Zhang3, Benjamin Krämer4, Tobias Krammer, Sophia Brumhard2, Lorenzo Bonaguro1, Elena De Domenico5, Daniel Wendisch2, Martin Grasshoff3, Theodore S. Kapellos1, Michael Beckstette3, Tal Pecht1, Adem Saglam5, Oliver Dietrich, Henrik E. Mei6, Axel Schulz6, Claudia Conrad2, Désirée Kunkel2, Ehsan Vafadarnejad, Cheng-Jian Xu3, Cheng-Jian Xu7, Arik Horne1, Miriam Herbert1, Anna Drews5, Charlotte Thibeault2, Moritz Pfeiffer2, Stefan Hippenstiel2, Andreas C. Hocke2, Holger Müller-Redetzky2, Katrin-Moira Heim2, Felix Machleidt2, Alexander Uhrig2, Laure Bosquillon de Jarcy2, Linda Jürgens2, Miriam Stegemann2, Christoph R. Glösenkamp2, Hans-Dieter Volk2, Christine Goffinet2, Markus Landthaler8, Emanuel Wyler8, Philipp Georg2, Maria Schneider2, Chantip Dang-Heine2, Nick Neuwinger2, Kai Kappert2, Rudolf Tauber2, Victor M. Corman2, Jan Raabe4, Kim Melanie Kaiser4, Michael To Vinh4, Gereon Rieke4, Christian Meisel2, Thomas Ulas5, Matthias Becker5, Robert Geffers, Martin Witzenrath2, Christian Drosten2, Norbert Suttorp2, Christof von Kalle2, Florian Kurth2, Florian Kurth9, Florian Kurth10, Kristian Händler5, Joachim L. Schultze1, Joachim L. Schultze5, Anna C. Aschenbrenner1, Anna C. Aschenbrenner7, Yang Li7, Yang Li3, Jacob Nattermann4, Birgit Sawitzki2, Antoine-Emmanuel Saliba, Leif E. Sander2, Angel Angelov, Robert Bals, Alexander Bartholomäus, Anke Becker, Daniela Bezdan, Ezio Bonifacio, Peer Bork, Thomas Clavel, Maria Colomé-Tatché, Andreas Diefenbach, Alexander T. Dilthey, Nicole Fischer, Konrad U. Förstner, Julia-Stefanie Frick, Julien Gagneur, Alexander Goesmann, Torsten Hain, Michael Hummel, Stefan Janssen, Jörn Kalinowski, René Kallies, Birte Kehr, Andreas Keller, Sarah Kim-Hellmuth, Christoph Klein, Oliver Kohlbacher, Jan O. Korbel, Ingo Kurth, Kerstin U. Ludwig, Oliwia Makarewicz, Manja Marz, Alice C. McHardy, Christian Mertes, Markus M. 
Nöthen, Peter Nürnberg, Uwe Ohler, Stephan Ossowski, Jörg Overmann, Silke Peter, Klaus Pfeffer, Anna R. Poetsch, Alfred Pühler, Nikolaus Rajewsky, Markus Ralser, Olaf Rieß, Stephan Ripke, Ulisses Nunes da Rocha, Philip Rosenstiel, Philipp H. Schiffer, Eva-Christina Schulte, Alexander Sczyrba, Oliver Stegle, Jens Stoye, Fabian J. Theis, Janne Vehreschild, Jörg Vogel, Max von Kleist, Andreas Walker, Jörn Walter, Dagmar Wieczorek, John Ziebuhr 
17 Sep 2020-Cell
TL;DR: This study provides detailed insights into the systemic immune response to SARS-CoV-2 infection and it reveals profound alterations in the myeloid cell compartment associated with severe COVID-19.

1,042 citations


Cites methods from "Simultaneous inference in general p..."

  • ...P values resulting from differential abundance testing (via R multcomp and lsmeans packages) were adjusted using the Benjamini-Hochberg procedure and an FDR-cutoff of 5% across all clusters/ subsets and between-group comparisons (Hothorn et al., 2008; Lenth, 2016)....


Book
27 Jul 2010
TL;DR: In this article, the multcomp package is used for multiple comparisons with a control and all pairwise comparisons with the same pairwise comparison, under the assumption of heteroscedasticity.
Abstract: Introduction General Concepts Error rates and general concepts Construction methods Methods based on Bonferroni's inequality Methods based on Simes' inequality Multiple Comparisons in Parametric Models General linear models Extensions to general parametric models The multcomp package Applications Multiple comparisons with a control All pairwise comparisons Dose response analyses Variable selection in regression models Simultaneous confidence bands Multiple comparisons under heteroscedasticity Multiple comparisons in logistic regression models Multiple comparisons in survival models Multiple comparisons in mixed-effects models Further Topics Resampling-based multiple comparison procedures Group sequential and adaptive designs Combining multiple comparisons with modeling Bibliography Index

923 citations

Book
01 Jan 2003

911 citations

Journal ArticleDOI
TL;DR: DNA methylation profiles segregated patients with CEBPA aberrations from other subtypes of leukemia, defined four epigenetically distinct forms of AML with NPM1 mutations, and showed that established AML1-ETO, CBFb-MYH11, and PML-RARA leukemia entities are associated with specific methylation profile.

767 citations


Cites methods from "Simultaneous inference in general p..."


  • ...Identification of the aberrant DNA methylation signature for each cluster was performed using an ANOVA test, with correction for multiple testing according to the Benjamini-Hochberg method, followed by Dunnett’s post hoc test using the normal CD34+ samples as the reference group (Hothorn et al., 2008)....


References
Book
01 Jan 1987
TL;DR: This paper presents the results of a two-year study of the statistical treatment of outliers in the context of one-Dimensional Location and its applications to discrete-time reinforcement learning.
Abstract: 1. Introduction. 2. Simple Regression. 3. Multiple Regression. 4. The Special Case of One-Dimensional Location. 5. Algorithms. 6. Outlier Diagnostics. 7. Related Statistical Techniques. References. Table of Data Sets. Index.

6,955 citations

Book
08 Dec 1980
TL;DR: This book covers the basic sample statistics, transformations of given statistics, asymptotic theory in parametric inference, U-, M-, L-, and R-estimates, and asymptotic relative efficiency.
Abstract: Preliminary Tools and Foundations. The Basic Sample Statistics. Transformations of Given Statistics. Asymptotic Theory in Parametric Inference. U-Statistics. Von Mises Differentiable Statistical Functions. M-Estimates. L-Estimates. R-Estimates. Asymptotic Relative Efficiency. Appendix. References. Author Index. Subject Index.

4,827 citations

Book
01 Jan 1971

3,429 citations


"Simultaneous inference in general p..." refers methods in this paper

  • ...We refer to Hsu (1996), Chapter 7, and Searle (1971), Chapter 7.3, for further discussions and examples on this issue....

  • ...We start by considering the general linear hypothesis (Searle, 1971) formulated in terms of our parameters of interest $\vartheta$: $H_0: \vartheta := K\theta = m$....

Book
01 Sep 1987
TL;DR: This book presents a theory of multiple comparison procedures, along with single-step and stepwise procedures for pairwise and more general comparisons among all treatments.
Abstract: PROCEDURES BASED ON CLASSICAL APPROACHES FOR FIXED-EFFECTS LINEAR MODELS WITH NORMAL HOMOSCEDASTIC INDEPENDENT ERRORS. Some Theory of Multiple Comparisons Procedure Fixed-effects Linear Models. Single-step Procedures for Pairwise and More General Comparisons Among All Treatments. Stepwise Procedures for Pairwise and More General Comparisons Among All Treatments. Procedures for Some Other Nonhierarchical Finite Families of Comparisons. Designing Experiments for Multiple Comparisons. PROCEDURES FOR OTHER MODELS AND PROBLEMS, AND PROCEDURES BASED ON ALTERNATIVE APPROACHES. Procedures for One-way Layouts with Unequal Variances. Procedures for Some Mixed-effects Models. Distribution-free and Robust Procedures. Some Miscellaneous Multiple Comparison Problems. Optimal Procedures Using Decision-theoretic, Bayesian, and Other Approaches. Appendixes. Tables. References. Index.

2,401 citations

Journal ArticleDOI

1,695 citations


"Simultaneous inference in general p..." refers methods in this paper

  • ...For a general reading on multiple comparison procedures we refer to Hochberg and Tamhane (1987) and Hsu (1996)....

  • ...Multiple comparisons in linear models have been in use for a long time, see Hochberg and Tamhane (1987), Hsu (1996), and Bretz et al....

Frequently Asked Questions (14)
Q1. What are the contributions mentioned in the paper "Simultaneous inference in general parametric models"?

In this paper the authors describe simultaneous inference procedures in general parametric models, where the experimental questions are specified through a linear combination of elemental model parameters. Several examples using a variety of different statistical models illustrate the breadth... (Footnote: This is a preprint of an article published in Biometrical Journal, Volume 50, Number 3, 346–363.)

The so-called "treatment contrast" vector $\theta = (\mu, \gamma_2 - \gamma_1, \gamma_3 - \gamma_1, \dots, \gamma_q - \gamma_1)$ is, for example, the default re-parametrization used as elemental parameters in the R system for statistical computing (R Development Core Team, 2008).
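
As a small illustration of this re-parametrization, the Python sketch below maps group means to a treatment-contrast vector. It assumes the intercept is identified with the mean of the first (baseline) group; the function name `treatment_contrasts` and the numbers are hypothetical.

```python
def treatment_contrasts(group_means):
    """Map group means (m_1, ..., m_q) to (m_1, m_2 - m_1, ..., m_q - m_1).

    Assumption: the first group is the baseline, so the intercept
    equals its mean and the remaining entries are differences to it.
    """
    baseline = group_means[0]
    return [baseline] + [m - baseline for m in group_means[1:]]


# Hypothetical means of three groups
theta = treatment_contrasts([10.0, 12.5, 9.0])
print(theta)  # [10.0, 2.5, -1.0]
```

Each entry after the first is directly a comparison with the baseline group, which is why many-to-one comparisons are simple linear functions of these parameters.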

In this paper the authors aim at a unified description of simultaneous inference procedures in parametric models with generally correlated parameter estimates. 

Single-step procedures have the advantage that corresponding simultaneous confidence intervals are easily available, as previously noted. 

That is, for a given family of null hypotheses $H_0^1, \dots, H_0^k$, an individual hypothesis $H_0^j$ is rejected only if all intersection hypotheses $H_J = \bigcap_{i \in J} H_0^i$ with $j \in J \subseteq \{1, \dots, k\}$ are rejected (Marcus et al., 1976).
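
The closure principle stated above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the paper: it assumes a Bonferroni test as the local test for each intersection hypothesis (where the paper works with max-t tests), and the function name `closed_test` and the p-values are hypothetical.

```python
from itertools import combinations


def closed_test(p_values, alpha=0.05):
    """Closed testing with Bonferroni local tests (an assumed choice).

    Returns a list of booleans, True where the individual
    hypothesis with that index is rejected.
    """
    k = len(p_values)
    indices = range(k)

    def reject_intersection(subset):
        # Bonferroni test of the intersection hypothesis over `subset`
        return min(p_values[i] for i in subset) <= alpha / len(subset)

    decisions = []
    for j in indices:
        # Reject hypothesis j only if every intersection containing j is rejected
        rejected = all(
            reject_intersection(subset)
            for size in range(1, k + 1)
            for subset in combinations(indices, size)
            if j in subset
        )
        decisions.append(rejected)
    return decisions


print(closed_test([0.01, 0.04, 0.03]))  # [True, False, False]
```

In the example, the second and third hypotheses are not rejected because their pairwise intersection fails the local test (0.03 > 0.05 / 2). Enumerating all subsets costs on the order of 2^k tests, so this brute-force form is only practical for small k.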

By construction, the authors can reject an individual null hypothesis $H_0^j$, $j = 1, \dots, k$, whenever the associated adjusted p-value is less than or equal to the pre-specified significance level $\alpha$, i.e., $p_j \le \alpha$.

The response is modelled by a linear combination of the covariates with normal error $\varepsilon_i$ and constant variance $\sigma^2$, $Y_i = \beta_0 + \sum_{j=1}^{q} \beta_j X_{ij} +$ ...

The general framework described here extends the current canonical theory with respect to the following aspects: (i) model assumptions such as normality and homoscedasticity are relaxed, thus allowing for simultaneous inference in generalized linear models, mixed effects models, survival models, etc.; (ii) arbitrary linear functions of the elemental parameters are allowed, not just contrasts of means in AN(C)OVA models; (iii) computing the reference distribution is feasible for arbitrary designs, especially for unbalanced designs; and (iv) a unified implementation is provided which allows for a fast transition of the theoretical results to the desks of data analysts interested in simultaneous inferences for multiple hypotheses. 

In the present context of single-step tests, the (at least asymptotic) adjusted p-value for the jth individual two-sided hypothesis $H_0^j: \vartheta_j = m_j$, $j = 1, \dots, k$, is given by $p_j = 1 - g_\nu(R_n, |t_j|)$, where $t_1, \dots, t_k$ denote the observed test statistics.
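
To make the adjusted p-value formula concrete, the following Python sketch estimates it by simulation. It rests on two assumptions that are not from the source: the reference distribution is taken to be multivariate normal (the asymptotic case), and the probability is approximated by Monte Carlo rather than by numerical integration. The names `cholesky` and `adjusted_p_values` are hypothetical.

```python
import math
import random


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a positive definite matrix."""
    k = len(matrix)
    lower = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(lower[i][m] * lower[j][m] for m in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def adjusted_p_values(t_stats, corr, n_sim=20000, seed=1):
    """Monte Carlo estimate of P(max |Z| > |t_j|) for Z ~ N(0, corr)."""
    rng = random.Random(seed)
    lower = cholesky(corr)
    k = len(corr)
    maxima = []
    for _ in range(n_sim):
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]
        # Correlate the independent draws: x = L z
        x = [sum(lower[i][m] * z[m] for m in range(i + 1)) for i in range(k)]
        maxima.append(max(abs(v) for v in x))
    # Share of simulated maxima exceeding each observed |t_j|
    return [sum(m > abs(t) for m in maxima) / n_sim for t in t_stats]


corr = [[1.0, 0.5], [0.5, 1.0]]  # hypothetical correlation matrix
print(adjusted_p_values([2.1, 1.2], corr))
```

Because every statistic is compared with the distribution of the maximum, a larger observed |t| always yields a smaller adjusted p-value, and the correlation between the statistics is taken into account.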

The resulting global p-value (exact or approximate, depending on context) for $H_0$ is $1 - g_\nu(R_n, \max |t|)$ when $T = t$ has been observed.

Another suitable scalar test statistic for the global hypothesis $H_0$ is the maximum of the individual test statistics $T_{1,n}, \dots, T_{k,n}$ of the multivariate statistic $T_n = (T_{1,n}, \dots, T_{k,n})$, leading to a max-t type test statistic $\max(|T_n|)$.
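
For reference, the function $g_\nu$ that appears in the p-value formulas of this section can be read as the distribution function of this max-t statistic. The following is a sketch under assumed notation: $\varphi_k(\cdot; R, \nu)$ is taken to denote the density of the limiting multivariate normal (or multivariate t with $\nu$ degrees of freedom) distribution with correlation matrix $R$; this symbol is not shown in the excerpts above.

```latex
g_\nu(R, t) = P\bigl(\max(|T|) \le t\bigr)
            = \int_{-t}^{t} \cdots \int_{-t}^{t}
              \varphi_k(x_1, \dots, x_k; R, \nu)\, dx_1 \cdots dx_k
```

Under this reading, $1 - g_\nu(R_n, \max |t|)$ is the probability that the largest absolute statistic exceeds its observed value under the global null hypothesis.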

Examples of such multiple comparison procedures include Dunnett’s many-to-one comparisons, Tukey’s all-pairwise comparisons, sequential pairwise contrasts, comparisons with the average, changepoint analyses, dose-response contrasts, etc. 
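
Two of the procedures named above correspond to simple contrast matrices. The Python sketch below builds them on the scale of q group means, where each row defines one linear function of the parameters; the function names are hypothetical, and group 1 is assumed to be the control.

```python
def dunnett_contrasts(q):
    """Many-to-one: rows compare groups 2..q with control group 1."""
    rows = []
    for j in range(1, q):
        row = [0] * q
        row[0], row[j] = -1, 1
        rows.append(row)
    return rows


def tukey_contrasts(q):
    """All-pairwise: rows compare every pair of groups (j minus i, i < j)."""
    rows = []
    for i in range(q):
        for j in range(i + 1, q):
            row = [0] * q
            row[i], row[j] = -1, 1
            rows.append(row)
    return rows


print(dunnett_contrasts(3))  # [[-1, 1, 0], [-1, 0, 1]]
print(tukey_contrasts(3))    # [[-1, 1, 0], [-1, 0, 1], [0, -1, 1]]
```

Many-to-one comparisons give q − 1 rows and all-pairwise comparisons give q(q − 1)/2 rows, so the number of simultaneous hypotheses grows much faster in the all-pairwise case.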

Because it is impossible to determine the parameters of interest automatically in this case, mcp() in multcomp will by default generate comparisons for the main effects γj only, ignoring covariates and interactions. 

Then the authors have $R_n = D_n^{-1/2} S_n^\star D_n^{-1/2} = (a_n D_n)^{-1/2} (a_n S_n^\star) (a_n D_n)^{-1/2} \xrightarrow{P} \operatorname{diag}(\Sigma^\star)^{-1/2} \, \Sigma^\star \, \operatorname{diag}(\Sigma^\star)^{-1/2} =: R \in \mathbb{R}^{k,k}$, where the convergence in probability to a constant follows from Slutsky's Theorem (Theorem 1.5.4, Serfling, 1980), and therefore (4) holds.
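
The standardization in this argument, pre- and post-multiplying a covariance matrix by the inverse square root of its diagonal, can be checked numerically. The Python sketch below uses a hypothetical 2 × 2 covariance matrix and a hypothetical function name `cov_to_corr`; it also illustrates why the scaling sequence cancels, since multiplying the covariance by a positive constant leaves the correlation matrix unchanged.

```python
import math


def cov_to_corr(cov):
    """Return D^{-1/2} S D^{-1/2}, where D is the diagonal of S."""
    k = len(cov)
    sd = [math.sqrt(cov[i][i]) for i in range(k)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(k)] for i in range(k)]


s = [[4.0, 1.0], [1.0, 1.0]]  # hypothetical covariance matrix
r = cov_to_corr(s)
# Rescaling the covariance by a positive constant (here 10) should not matter
r_scaled = cov_to_corr([[10.0 * v for v in row] for row in s])
print(r)
print(r_scaled)
```

Up to floating-point rounding, both calls return the same matrix with unit diagonal and off-diagonal entry 0.5.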