# A Baseline Category Logit Model for Assessing Competing Strains of Rhizobium Bacteria

29 Mar 2011 - Journal of Agricultural, Biological, and Environmental Statistics (Springer-Verlag) - Vol. 16, Iss. 3, pp. 409-421

TL;DR: In this article, the authors describe a methodology for evaluating competition among strains of rhizobium bacteria which can be found naturally occurring in or can be introduced into soil and propose an extension of multinomial baseline category logit models that includes multiple offsets and random terms to allow for correlation among clustered responses.

Abstract: In this paper we describe novel methodology for evaluating competition among strains of Rhizobium bacteria which can be found naturally occurring in or can be introduced into soil. Rhizobia can occupy nodules on the roots of legume plants allowing the plant to ‘fix’ atmospheric nitrogen. Our model defines competitive outcomes for a community (the multinomial count of nodules occupied by each strain at the end of a time period) relative to the past state of the community (the proportion of each strain present at the beginning of the time period) and incorporates this prior information in the analysis. Our approach for assessing competition provides an analogy to multivariate methods for continuous responses in competition studies and an alternative to univariate methods for discrete responses that respects the multivariate nature of the data. It can also handle zero values in the multinomial response providing an alternative to compositional data analysis methods, which traditionally have not been able to facilitate zero values. The proposed experimental design is based on the simplex design and the model is an extension of multinomial baseline category logit models that includes multiple offsets and random terms to allow for correlation among clustered responses. Supplemental materials for this article are available from the journal website.

## Summary (2 min read)


### 1. INTRODUCTION

- Competition occurs among species when a required resource is limited and the species ‘compete’ to each obtain the resource.
- Offsets have previously been used with models for discrete responses (logistic regression in Agresti 2002) but multiple offsets have not been used with multinomial models or for the purpose of assessing competition among species.
- The experimental response in their motivating example is the number of nodules acquired by each strain of rhizobia in each community and this is a multinomial vector.
- The novel features are the marrying of simplex designs with multinomial responses in a discrete modeling framework that defines competitive outcomes for a community of species relative to a previous state of the community and incorporates this prior information in the analysis.

### 2. METHODS

- The authors propose a multinomial baseline category logit model (Agresti 2002) to measure the competition between J species that will allow the assessment of competitive relationships among species and consequences for community structure.
- This model is analogous to the specification of the RGRD model in Connolly and Wayne (2005, Equation (4)).
- The authors extend this model to include a community specific random effect to allow for variation from community-to-community (Hartzel, Agresti, and Caffo 2001).
- To interpret the model the final proportions of success counts for each species can be predicted for a range of initial communities and these predictions used to determine the outcome of competition.
- Compositional change measure (1), $\hat{\pi}_{ij}/p_{ij}$, compares the predicted proportion of success counts with the initial proportion present for an individual species.

### 3.1. EXPERIMENTAL DESIGN

- When a Rhizobium strain has occupied a nodule on the root of a legume, it normally has the ability to ‘fix’ nitrogen (N) from the atmosphere and supply the host plant with N and provide additional N in the legume environment.
- Competition was investigated among M. loti strains Ml8, Ml19 and Ml16, named A, B and C, respectively, from here on.
- Nodule occupancy of the rhizobial strains was determined by ERIC-PCR fingerprinting (de Bruijn 1992).
- Several of the responses for individual species were zero.

### 3.2. MODEL FITTING

- The authors fitted a series of multinomial baseline category logit random effects models to the multinomial data.
- The authors maximized the log of the likelihood function given in (2.5) using the NLMIXED procedure in SAS software.
- The authors predicted from the fitted model for a range of initial compositions using Equation (2.6).
- Approximate standard errors were generated for these tests using the Delta method (Billingsley 1986).

### 3.3. RESULTS

- The final model, after extensive model selection using AIC, included the initial proportions of each strain and density in the linear predictor.
- Two interaction terms, $p_{iA}p_{iB}$ and $p_{iB}p_{iC}$, were also included; these interactions were found to be of similar strength and so were constrained to be equal.
- While the inclusion of the random effects to account for variation from community to community was not significant, this component was included in the models to respect the structure in the experimental design.
- Based on these two compositional change measures, strain C was the most competitive strain, particularly at high density, while there was no outright winner between strains A and B. Table 2.

### 4. DISCUSSION

- In this paper the authors present an experimental design and modeling framework for assessing multinomial responses from multiple species competition studies.
- It also provides a multivariate alternative to the univariate methods used for discrete responses based on Lotka–Volterra models (Leslie 1958; May 2001) that allows for correlation among responses within a community.
- Strain C occupied a large number of the nodules even when it was least represented in the inoculum particularly at high inoculum density, which in general is an indication of a highly competitive strain (Thies, Benbohlool, and Singleton 1992).
- The authors have shown that using an appropriate simplex design allows the fitting of model (2.7) through which they can assess the relative competitiveness of species, whether species interfere with or interact with each other, and the outcome of these interspecific relationships on community composition.
- The authors' model is also closely related to the Lotka–Volterra differential equations for competing species.


Supplementary materials for this article are available at 10.1007/s13253-011-0058-6.

C. BROPHY, J. CONNOLLY, I. L. FAGERLI, S. DUODU, and M. M. SVENNING


Key Words: Competition with discrete response; Compositional data analysis; Discrete multivariate analysis; Random effects; Simplex design; Zero values.

1. INTRODUCTION

Competition occurs among species when a required resource is limited and the species ‘compete’ to each obtain the resource. Competition has been widely studied experimentally across many organisms (Nicol and Thornton 1941; Connell 1983; Schoener 1983; Firbank and Watkinson 1985; Goldberg and Barton 1992; Iwasa, Nakamaru, and Levin 1998). The analytical approaches for assessing effects range from multivariate models for continuous responses (Connolly and Wayne 2005) to univariate approaches for discrete responses (May 2001) to compositional methods (Aitchison 1986; Aitchison and Ng 2005). Here we develop a modeling approach for discrete multinomial response data that extends the current competition literature in three ways: (1) it is analogous to a competition model derived for continuous responses by Connolly and Wayne (2005) that defines competitive outcomes relative to the past state of the community and incorporates this prior information in the analysis, (2) it allows for the multivariate nature of the response data, (3) it will handle zero response values. Our model is a baseline category logit model extended to include random effects (Hartzel, Agresti, and Caffo 2001) to allow for correlated responses and multiple offset terms to allow for initial starting values of species. Offsets have previously been used with models for discrete responses (logistic regression in Agresti 2002) but multiple offsets have not been used with multinomial models or for the purpose of assessing competition among species.

C. Brophy is a Lecturer in Statistics, Department of Mathematics & Statistics, National University of Ireland Maynooth, Maynooth, Co. Kildare, Ireland (E-mail: caroline.brophy@nuim.ie). J. Connolly is Associate Professor in Statistics, UCD School of Mathematical Sciences, Environmental & Ecological Modelling Group, University College Dublin, Belfield, Dublin 4, Ireland. I. L. Fagerli was a Research Student and M. M. Svenning is Professor and Head, Department of Arctic and Marine Biology, University of Tromsø, 9037 Tromsø, Norway. S. Duodu is a Researcher, National Veterinary Institute, P.O. Box 750, Sentrum, 0106 Oslo, Norway.

© 2011 International Biometric Society

Journal of Agricultural, Biological, and Environmental Statistics, Volume 16, Number 3, Pages 409–421

DOI: 10.1007/s13253-011-0058-6

The models developed in this paper are motivated by a study of competition among strains of rhizobia bacteria, which are found naturally occurring in soil or can be introduced deliberately into soil. Rhizobia can occupy nodules on the root of legume plant species resulting in atmospheric nitrogen fixation and thereby supply the host plant with N and provide additional N in the legume environment. This natural source of N can be beneficial to the productivity of grassland systems and can reduce the cost of running the system. It is possible that some strains of rhizobia are superior at occupying nodules and at fixing N. Does the proportion of strains of rhizobia present in the soil at a given point in time affect the proportion of nodules that the strain will occupy at a later time? To answer this question we applied three strains of rhizobia to the roots of a legume species in a range of initial proportions and after a period of time counted the number of nodules each strain occupied. There were a limited number of available sites for nodulation and the strains competed to occupy them. For each community (root section) we have a vector of initial proportions and a final multinomial response vector. We modeled the change from initial proportion applied to final proportion of nodules occupied for each strain.

In a community, a good competitor is one that gains proportionately more over time than other species (Connolly, Wayne, and Bazzaz 2001). Connolly and Wayne (2005) and Ramseier, Connolly, and Bazzaz (2005) developed a multivariate modeling approach to assessing the effects of the species identity, environment and species initial relative abundance on the outcome of competition. The continuous and multivariate response measured was the relative growth rate of each species in a community over a period of time. The variable(s) modeled were the differences in relative growth rates between pairs of species in a community, giving the name RGRD (relative growth rate difference) to the models. The RGRD model does not currently facilitate discrete responses.

When the response for each species in a community is a discrete whole number, each experimental community provides a multinomial response vector. There is a long history of modeling approaches to community dynamics for such discrete responses (May 2001) and these can be related to a discrete version of the Lotka–Volterra model (Leslie 1958). However, these approaches rarely deal with the multivariate nature of these types of data. Other approaches have been to use compositional data analysis methods for changing compositions (Aitchison 1986; Aitchison and Ng 2005), but these methods break down when species with zero compositions occur in the response. Some approaches to facilitate zero methods have been developed (e.g. Aitchison and Kay 2003; Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn 2003; Butler and Glasbey 2008) but these rely on assumptions about the type of zero or are suited only to analysis for specific hypotheses, e.g. to compare compositions of different groups.

In a simplex design (Scheffe 1963; Cornell 2002), the initial relative abundances of competing species are manipulated so that not all experimental communities have all species equally present to begin with. This design has been used in a range of multi-species competition studies (e.g. Ramseier, Connolly, and Bazzaz 2005; Kirwan et al. 2007; Suter et al. 2007) as it allows a broad coverage of the design space and facilitates the simultaneous assessment of species identity, the effect of species on each other and, if required, environmental effects (Connolly, Wayne, and Bazzaz 2001). Ideally, in competition studies, the simplex design would comprise a wide range of compositions in the simplex space at a number of overall densities (Ramseier, Connolly, and Bazzaz 2005; Kirwan et al. 2007).
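A layout of this kind can be enumerated programmatically. The sketch below is only an illustration and not the authors' design: the function names, the lattice resolution `steps` and the density labels are hypothetical. It lists every composition whose proportions are multiples of `1/steps` (a simplex-lattice arrangement) and crosses the compositions with a set of overall densities.

```python
from fractions import Fraction
from itertools import product

def simplex_lattice(n_species, steps):
    """All compositions whose proportions are multiples of 1/steps and sum to 1."""
    points = []
    for counts in product(range(steps + 1), repeat=n_species):
        if sum(counts) == steps:
            points.append(tuple(Fraction(c, steps) for c in counts))
    return points

def crossed_design(n_species, steps, densities):
    """Every lattice composition at every overall density (labels are hypothetical)."""
    return [(p, d) for p in simplex_lattice(n_species, steps) for d in densities]

# Illustrative values only: 3 species, proportions in quarters, two densities.
design = crossed_design(3, 4, ["low", "high"])
print(len(design))  # 15 compositions x 2 densities = 30
```

The vertices and edges of this lattice contain zero proportions. Any analysis that takes the logarithm of a ratio of initial proportions would need compositions in which every species is present, so in that setting the boundary points would have to be filtered out.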

In this paper we propose an experimental and analytical framework for assessing competition among species where the outcome is discrete. The experimental response in our motivating example is the number of nodules acquired by each strain of rhizobia in each community and this is a multinomial vector. We describe a multinomial modeling framework for discrete responses from this multi-strain competition experiment and the experimental design needed to estimate model parameters, and we detail how to predict and test predictions from the models. The novel features are the marrying of simplex designs with multinomial responses in a discrete modeling framework that defines competitive outcomes for a community of species relative to a previous state of the community and incorporates this prior information in the analysis.

2. METHODS

We propose a multinomial baseline category logit model (Agresti 2002) to measure the competition between $J$ species (categories) that will allow the assessment of competitive relationships among species and consequences for community structure. The categorical response vector is $(y_{i1},\ldots,y_{iJ})$ for $i = 1,\ldots,c$ (the number of communities) and $j = 1,\ldots,J$ (the number of species) and represents the number of ‘success counts’ for each species at time $t$, with $\sum_{j=1}^{J} y_{ij} = n_i$ being the total number of success counts for community $i$. A multinomial baseline category logit model is a series of $J-1$ models relating the $j$th to the $J$th species, where the $J$th species is called the baseline category. The ordering of the $j = 1$ to $J$ species and the use of a particular species as the ‘baseline’ is arbitrary and independent of interpretation. We can model the vector of parameters $(\pi_{i1},\ldots,\pi_{iJ})$, the proportion of success counts for each species in the $i$th community at time $t$, with $\sum_{j=1}^{J} \pi_{ij} = 1$, as

$$\log\left(\frac{\pi_{ij}}{\pi_{iJ}}\right) = x_i\beta_j \quad \text{for } j = 1,\ldots,J-1 \tag{2.1}$$

where $x_i$ denotes the vector of $K$ explanatory variables for the $i$th community, and $\beta_j$ is the parameter vector of coefficients for the $j$th model and could include abiotic effects such as an environmental treatment. If $\beta_j = 0$, then $\pi_{ij}/\pi_{iJ} = 1$ and we conclude that species $j$ and $J$ have the same proportion of success counts at time $t$. While model (2.1) can assess proportion of success counts by species relative to the baseline species at a given point in time ($t$), it cannot address questions of competitive relations or consequences for community dynamics without incorporating information on the proportions of each species in the community at time 0 (or some other reference time) (Connolly, Wayne, and Bazzaz 2001).

If the proportion of each species initially present in the $i$th community at time 0 is given by the vector $(p_{i1},\ldots,p_{iJ})$, then we propose the model:

$$\log\left(\frac{\pi_{ij}/p_{ij}}{\pi_{iJ}/p_{iJ}}\right) = x_i\beta_j \quad \text{for } j = 1,\ldots,J-1 \tag{2.2}$$

which can be rewritten as

$$\log\left(\frac{\pi_{ij}}{\pi_{iJ}}\right) = x_i\beta_j + \log\left(\frac{p_{ij}}{p_{iJ}}\right) \quad \text{for } j = 1,\ldots,J-1 \tag{2.3}$$

where $\log(p_{ij}/p_{iJ})$ is an offset term, i.e. a regression term with known coefficient equal to 1. If $\beta_j = 0$, it indicates no change in relative abundance from time 0 to time $t$ between the two competing species $j$ and $J$ and implies that the two species are equally competitive. This model is analogous to the specification of the RGRD model in Connolly and Wayne (2005, Equation (4)).

We extend this model to include a community-specific random effect to allow for variation from community to community (Hartzel, Agresti, and Caffo 2001). The model comparing the $j$th to the $J$th species is

$$\log\left(\frac{\pi_{ij}}{\pi_{iJ}}\right) = x_i\beta_j + \log\left(\frac{p_{ij}}{p_{iJ}}\right) + z_i u_{ij} \quad \text{for } j = 1,\ldots,J-1 \tag{2.4}$$

where $z_i$ denotes the design vector for the random effect for the $i$th community and $u_{ij}$ is assumed multivariate normal with an unstructured covariance matrix ($\Sigma$) to keep independence of the choice of baseline category (Hartzel, Agresti, and Caffo 2001).

We can fit model (2.4) using maximum likelihood. Denoting the linear predictor, $\mathrm{lp}_{ij} = x_i\beta_j + \log(p_{ij}/p_{iJ}) + z_i u_{ij}$, the likelihood function for the $i$th response vector is, integrating out the random effects and omitting a fixed constant:

$$\prod_{i} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \left[\prod_{j}^{J-1} \left(\frac{\exp(\mathrm{lp}_{ij})}{1 + \sum_{j=1}^{J-1} \exp(\mathrm{lp}_{ij})}\right)^{y_{ij}}\right] \left(\frac{1}{1 + \sum_{j=1}^{J-1} \exp(\mathrm{lp}_{ij})}\right)^{y_{iJ}} f(u_{ij}; \Sigma)\, du_{ij}. \tag{2.5}$$
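To make the structure of this integral concrete, the sketch below approximates one community's contribution by plain Monte Carlo averaging. It is an illustration under stated assumptions and not the fitting procedure used in the paper: the function names are hypothetical, the random effects are drawn independently (a diagonal covariance matrix rather than an unstructured one), and `z_i` is taken to be 1.

```python
import math
import random

def multinomial_kernel(y, lp):
    """Integrand of (2.5) without the density f: category probabilities raised to counts.

    y:  counts for the J species, baseline species last
    lp: J-1 linear predictors (fixed part + offset + random effect)
    """
    denom = 1.0 + sum(math.exp(v) for v in lp)
    probs = [math.exp(v) / denom for v in lp] + [1.0 / denom]
    return math.prod(q ** k for q, k in zip(probs, y))

def marginal_likelihood_mc(y, fixed_lp, sd, draws=20000, seed=1):
    """Average the kernel over independent N(0, sd_j^2) draws of the random effects.

    Assumption for illustration only: independent random effects.
    fixed_lp holds x_i * beta_j + log(p_ij / p_iJ) for j = 1..J-1.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        lp = [f + rng.gauss(0.0, s) for f, s in zip(fixed_lp, sd)]
        total += multinomial_kernel(y, lp)
    return total / draws

# Hypothetical counts and linear predictors, not data from the paper.
approx = marginal_likelihood_mc([3, 1, 2], [0.2, -0.1], [0.5, 0.5])
```

Setting every standard deviation to zero collapses the average to the ordinary multinomial kernel, which is a convenient check. Numerical quadrature would normally be preferred over simple Monte Carlo for a low-dimensional integral like this one.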

We predict (denoted by the $\hat{\ }$ symbol, which is also used to denote the maximum likelihood estimate of model parameters) the proportion of success counts for the $j$th species from the model at the median value of the random effect using the equations:

$$\hat{\pi}_{ij} = \frac{\exp\left(x_i\hat{\beta}_j + \log\left(\frac{p_{ij}}{p_{iJ}}\right)\right)}{1 + \sum_{j=1}^{J-1} \exp\left(x_i\hat{\beta}_j + \log\left(\frac{p_{ij}}{p_{iJ}}\right)\right)} \quad \text{for } j = 1,\ldots,J-1,$$

$$\hat{\pi}_{iJ} = 1 - \sum_{j=1}^{J-1} \hat{\pi}_{ij} \quad \text{for } J. \tag{2.6}$$
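The prediction step in (2.6) can be sketched in a few lines. This is a minimal illustration: `predict_proportions` and its argument names are hypothetical, the fixed-effect values passed in are made up, and the random effect is set to zero (its median), as the text describes.

```python
import math

def predict_proportions(eta, p):
    """Predicted final proportions following equation (2.6).

    eta: J-1 fixed-effect linear predictors x_i * beta_j (baseline omitted)
    p:   J initial proportions, baseline species last; all must be > 0
    """
    n_species = len(p)
    if len(eta) != n_species - 1:
        raise ValueError("need one linear predictor per non-baseline species")
    # Add the offset log(p_ij / p_iJ); the random effect is fixed at 0.
    lp = [eta[j] + math.log(p[j] / p[-1]) for j in range(n_species - 1)]
    denom = 1.0 + sum(math.exp(v) for v in lp)
    pi_hat = [math.exp(v) / denom for v in lp]
    pi_hat.append(1.0 - sum(pi_hat))  # baseline category J
    return pi_hat

# Hypothetical inputs: with all fixed effects zero, the prediction
# reproduces the initial proportions (up to floating-point error).
print(predict_proportions([0.0, 0.0], [0.5, 0.3, 0.2]))
```

The zero-coefficient case is a useful sanity check because it corresponds to the "equally competitive" situation, in which the offsets alone determine the final composition.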

While this model may be applied to a wide range of count data it is particularly relevant to data from experiments based on a simplex design (Scheffe 1963; Cornell 2002) in which the initial $p_{ij}$ values and overall initial density of species are deliberately manipulated. The relative abundances of the species at time 0, $(p_{i1},\ldots,p_{iJ})$, may be important determinants of species relative competitiveness and hence of the final composition $(\pi_{i1},\ldots,\pi_{iJ})$ of the $i$th community. At its simplest, the $x$ matrix in model (2.4) would include the relative abundances $p_{i1},\ldots,p_{iJ}$, giving:

$$\log\left(\frac{\pi_{ij}}{\pi_{iJ}}\right) = \sum_{k=1}^{J} \beta_{jk} p_{ik} + \beta_{jD} D_i + \log\left(\frac{p_{ij}}{p_{iJ}}\right) + u_{ij} \quad \text{for } j = 1,\ldots,J-1 \tag{2.7}$$

where $p_{ik}$ is the initial proportion of the $k$th species for $k = 1,\ldots,J$, $D_i$ is the total density of the $i$th community and $u_{ij}$ is a random effect with variance $\sigma_j^2$ and may be correlated with the other $J-2$ random effects. Interactions among the $p_{ik}$'s and between the $p_{ik}$'s and other independent variables, such as a treatment factor or community density ($D$), may also be included in the model specification.
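As a rough illustration of how the right-hand side of (2.7) is assembled for one community, the sketch below combines the proportion terms, the density term, the offset and an optional random effect. The function name, argument layout and numeric values are hypothetical and are not estimates from the paper; interaction terms are left out.

```python
import math

def linear_predictors(beta_p, beta_d, p, density, u=None):
    """Right-hand side of (2.7) for j = 1..J-1, with the baseline species last.

    beta_p:  (J-1) x J nested list of coefficients beta_jk on initial proportions
    beta_d:  J-1 density coefficients beta_jD
    p:       J initial proportions, all > 0 so that the offset is defined
    density: overall density D_i of the community
    u:       optional J-1 random effects (defaults to zero)
    """
    n_species = len(p)
    if u is None:
        u = [0.0] * (n_species - 1)
    lps = []
    for j in range(n_species - 1):
        fixed = sum(b * q for b, q in zip(beta_p[j], p)) + beta_d[j] * density
        offset = math.log(p[j] / p[-1])
        lps.append(fixed + offset + u[j])
    return lps

# Hypothetical coefficients for a three-species community.
lp = linear_predictors(
    beta_p=[[0.4, -0.2, 0.1], [0.0, 0.3, -0.5]],
    beta_d=[0.05, -0.02],
    p=[0.2, 0.3, 0.5],
    density=2.0,
)
```

Because the initial proportions sum to one, giving every `beta_jk` in a row the same value acts like an intercept for that species, which is one way to see why no separate intercept appears in (2.7).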

For model (2.7), if $\beta_{jk} = 0$ for all $k = 1,\ldots,J$ and $\beta_{jD} = 0$, then the relative proportions of the $j$th and $J$th species are the same at times 0 and $t$, and species $j$ and $J$ are equally competitive, i.e. $\pi_{ij}/\pi_{iJ} = p_{ij}/p_{iJ}$. When these parameters are not zero and interaction effects are present, the number of competition coefficients may mean it is difficult to see their combined impact on community relative composition. To interpret the model the final proportions of success counts for each species can be predicted for a range of initial communities and these predictions used to determine the outcome of competition. Predictions can be displayed graphically using ternary diagrams (where there are three competing species), and we distinguish between two numerical comparisons. Compositional change measure (1): $\hat{\pi}_{ij}/p_{ij}$ compares the predicted proportion of success counts relative to initial proportion present for an individual species. This measure determines how a species performs relative to its own expectation ($p_{ij}$) but even a species that performs better than expected may not be the most competitive species. Compositional change measure (2): $\dfrac{\hat{\pi}_{ij}/p_{ij}}{\hat{\pi}_{ij'}/p_{ij'}}$ for $j \neq j'$, compares two species and determines which is the more competitive of the two.
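Both measures are simple ratios once predictions are available. The sketch below uses hypothetical function names and made-up predicted proportions purely for illustration; it does not reproduce any result from the paper.

```python
def change_measure_1(pi_hat, p):
    """Measure (1): predicted final proportion over initial proportion, per species."""
    return [q / r for q, r in zip(pi_hat, p)]

def change_measure_2(pi_hat, p, j, j_prime):
    """Measure (2): ratio of measure (1) for species j to that for species j_prime."""
    if j == j_prime:
        raise ValueError("measure (2) compares two different species")
    return (pi_hat[j] / p[j]) / (pi_hat[j_prime] / p[j_prime])

# Hypothetical community: three species sown in equal proportions,
# with made-up predicted final proportions.
p = [1 / 3, 1 / 3, 1 / 3]
pi_hat = [0.2, 0.2, 0.6]
m1 = change_measure_1(pi_hat, p)            # about [0.6, 0.6, 1.8]
m2 = change_measure_2(pi_hat, p, 2, 0)      # about 3: third species vs first
```

A value of measure (1) above 1 means a species ended with a larger share than it started with, while a value of measure (2) above 1 means species `j` gained relative to species `j_prime`.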

##### Citations


University College Dublin^{1}, Imperial College London^{2}, Maynooth University^{3}, Teagasc^{4}, University of Minnesota^{5}, ETH Zurich^{6}, University of the Republic^{7}, Helmholtz Centre for Environmental Research - UFZ^{8}, University of Lleida^{9}, Leipzig University^{10}

TL;DR: It is shown that Generalized Diversity-Interactions models quantitatively integrate several methods that separately address effects of species richness, evenness and composition on ecosystem function, and serve to unify the modelling of BEF relationships.

Abstract: Summary 1. The development of models of the relationship between biodiversity and ecosystem function (BEF) has advanced rapidly over the last 20 years, incorporating insights gained through extensive experimental work. We propose Generalised Diversity-Interactions models that include many of the features of existing models and have several novel features. Generalised Diversity-Interactions models characterise the contribution of two species to ecosystem function as being proportional to the product of their relative abundances raised to the power of a coefficient h. 2. A value of h < 1 corresponds to a stronger than expected contribution of species’ pairs to ecosystem functioning, particularly at low relative abundance of species. 3. Varying the value of h has profound consequences for community-level properties of BEF relationships, including: (i) saturation properties of the BEF relationship; (ii) the stability of ecosystem function across communities; (iii) the likelihood of transgressive overyielding. 4. For low values of h, loss of species can have a much greater impact on ecosystem functioning than loss of community evenness. 5. Generalised Diversity-Interactions models serve to unify the modelling of BEF relationships as they include several other current models as special cases. 6. Generalised Diversity-Interactions models were applied to seven data sets and three functions: total biomass (five grassland experiments), community respiration (one bacterial experiment) and nitrate leaching (one earthworm experiment). They described all the nonrandom structure in the data in six experiments, and most of it in the seventh experiment and so fit as well or better than competing BEF models for these data. They were significantly better than Diversity-Interactions models in five experiments. 7. Synthesis. 
We show that Generalized Diversity-Interactions models quantitatively integrate several methods that separately address effects of species richness, evenness and composition on ecosystem function. They describe empirical data at least as well as alternative models and improve the ability to quantitatively test among several theoretical and practical hypotheses about the effects of

64 citations

### Cites background from "A Baseline Category Logit Model for..."

...They have been used in understanding the BEF relationship in a number of plant and invertebrate assemblages (Sheehan et al. 2006; Kirwan et al. 2007; Connolly et al. 2009, 2011; Frankow-Lindberg et al. 2009; Nyfeler et al. 2009; O’Hea, Kirwan & Finn 2010; Brophy et al. 2011)....


TL;DR: In this article, the authors used a univariate multiple logit regression model to determine the synergistic expression of transgenes and endogenous AMGs in the head kidney post-bacterial infection.

1 citations

29 Dec 2013

TL;DR: The inoculation of legumes with effective rhizobia or bradyrhizobia represents an inexpensive alternative to the use of chemical nitrogen fertilizers, whose prices have risen due to the high cost of energy involved in their production.

Abstract: The inoculation of legumes with effective rhizobia or bradyrhizobia represents an inexpensive alternative to the use of chemical nitrogen fertilizers, whose prices have risen due to the high cost of energy involved in their production. These fertilizers are also pollution hazards. The process of symbiotic biological nitrogen fixation requires that the host crop be adequately nodulated by the specific root-nodule bacteria effective in nitrogen fixation. Not all the strains of Rhizobium or Bradyrhizobium that can produce nodules on a given host are able to use N2 rapidly and efficiently. Nonetheless, selection of an effective (i.e. N2-fixing) strain is a prerequisite for any crop to be inoculated. A second important characteristic is the competitiveness of the strain. Unfortunately, effectiveness and competitiveness are generally mutually exclusive and are not dependent upon each other. Little information exists on the effects of systemic fungicides on symbiotic nitrogen fixation or nodulation. It has been reported that the systemic fungicide benomyl increased the relative abundance of nodules formed by the inoculated strain, the number of added rhizobia on the root, the total N content, and the percentage N of soybean plants grown in four soils when the seeds were inoculated with a benomyl-resistant strain of Bradyrhizobium japonicum. It was also found that oxamyl (a basipetally translocated fungicide) applied to the seeds, foliage, or both increased the yield, N content, percentage N, and weight of nodules, pods, and grains along with the number of nodules formed by the inoculated strain when soybean seeds were inoculated with oxamyl-resistant Rhizobium japonicum.

1 citations

### Cites background from "A Baseline Category Logit Model for..."

...Recently Brophy et al. (2011) described a novel methodology for evaluating competition among strains of Rhizobium bacteria which can be found naturally occurring in or can be introduced into soil....


##### References


01 Jan 1979

TL;DR: In this paper, the convergence of distributions is considered in the context of conditional probability, i.e., random variables and expected values, and the probability of a given distribution converging to a certain value.

Abstract: Probability. Measure. Integration. Random Variables and Expected Values. Convergence of Distributions. Derivatives and Conditional Probability. Stochastic Processes. Appendix. Notes on the Problems. Bibliography. List of Symbols. Index.

6,334 citations


TL;DR: In this article, the authors present a generalized linear model for categorical data, which is based on the Logit model, and use it to fit Logistic Regression models.

Abstract: Preface. 1. Introduction: Distributions and Inference for Categorical Data. 1.1 Categorical Response Data. 1.2 Distributions for Categorical Data. 1.3 Statistical Inference for Categorical Data. 1.4 Statistical Inference for Binomial Parameters. 1.5 Statistical Inference for Multinomial Parameters. Notes. Problems. 2. Describing Contingency Tables. 2.1 Probability Structure for Contingency Tables. 2.2 Comparing Two Proportions. 2.3 Partial Association in Stratified 2 x 2 Tables. 2.4 Extensions for I x J Tables. Notes. Problems. 3. Inference for Contingency Tables. 3.1 Confidence Intervals for Association Parameters. 3.2 Testing Independence in Two Way Contingency Tables. 3.3 Following Up Chi Squared Tests. 3.4 Two Way Tables with Ordered Classifications. 3.5 Small Sample Tests of Independence. 3.6 Small Sample Confidence Intervals for 2 x 2 Tables . 3.7 Extensions for Multiway Tables and Nontabulated Responses. Notes. Problems. 4. Introduction to Generalized Linear Models. 4.1 Generalized Linear Model. 4.2 Generalized Linear Models for Binary Data. 4.3 Generalized Linear Models for Counts. 4.4 Moments and Likelihood for Generalized Linear Models . 4.5 Inference for Generalized Linear Models. 4.6 Fitting Generalized Linear Models. 4.7 Quasi likelihood and Generalized Linear Models . 4.8 Generalized Additive Models . Notes. Problems. 5. Logistic Regression. 5.1 Interpreting Parameters in Logistic Regression. 5.2 Inference for Logistic Regression. 5.3 Logit Models with Categorical Predictors. 5.4 Multiple Logistic Regression. 5.5 Fitting Logistic Regression Models. Notes. Problems. 6. Building and Applying Logistic Regression Models. 6.1 Strategies in Model Selection. 6.2 Logistic Regression Diagnostics. 6.3 Inference About Conditional Associations in 2 x 2 x K Tables. 6.4 Using Models to Improve Inferential Power. 6.5 Sample Size and Power Considerations . 6.6 Probit and Complementary Log Log Models . 
6.7 Conditional Logistic Regression and Exact Distributions . Notes. Problems. 7. Logit Models for Multinomial Responses. 7.1 Nominal Responses: Baseline Category Logit Models. 7.2 Ordinal Responses: Cumulative Logit Models. 7.3 Ordinal Responses: Cumulative Link Models. 7.4 Alternative Models for Ordinal Responses . 7.5 Testing Conditional Independence in I x J x K Tables . 7.6 Discrete Choice Multinomial Logit Models . Notes. Problems. 8. Loglinear Models for Contingency Tables. 8.1 Loglinear Models for Two Way Tables. 8.2 Loglinear Models for Independence and Interaction in Three Way Tables. 8.3 Inference for Loglinear Models. 8.4 Loglinear Models for Higher Dimensions. 8.5 The Loglinear Logit Model Connection. 8.6 Loglinear Model Fitting: Likelihood Equations and Asymptotic Distributions . 8.7 Loglinear Model Fitting: Iterative Methods and their Application . Notes. Problems. 9. Building and Extending Loglinear/Logit Models. 9.1 Association Graphs and Collapsibility. 9.2 Model Selection and Comparison. 9.3 Diagnostics for Checking Models. 9.4 Modeling Ordinal Associations. 9.5 Association Models . 9.6 Association Models, Correlation Models, and Correspondence Analysis . 9.7 Poisson Regression for Rates. 9.8 Empty Cells and Sparseness in Modeling Contingency Tables. Notes. Problems. 10. Models for Matched Pairs. 10.1 Comparing Dependent Proportions. 10.2 Conditional Logistic Regression for Binary Matched Pairs. 10.3 Marginal Models for Square Contingency Tables. 10.4 Symmetry, Quasi symmetry, and Quasiindependence. 10.5 Measuring Agreement Between Observers. 10.6 Bradley Terry Model for Paired Preferences. 10.7 Marginal Models and Quasi symmetry Models for Matched Sets . Notes. Problems. 11. Analyzing Repeated Categorical Response Data. 11.1 Comparing Marginal Distributions: Multiple Responses. 11.2 Marginal Modeling: Maximum Likelihood Approach. 11.3 Marginal Modeling: Generalized Estimating Equations Approach. 
11.4 Quasi-likelihood and Its GEE Multivariate Extension: Details*. 11.5 Markov Chains: Transitional Modeling. Notes. Problems. 12. Random Effects: Generalized Linear Mixed Models for Categorical Responses. 12.1 Random Effects Modeling of Clustered Categorical Data. 12.2 Binary Responses: Logistic-Normal Model. 12.3 Examples of Random Effects Models for Binary Data. 12.4 Random Effects Models for Multinomial Data. 12.5 Multivariate Random Effects Models for Binary Data. 12.6 GLMM Fitting, Inference, and Prediction. Notes. Problems. 13. Other Mixture Models for Categorical Data*. 13.1 Latent Class Models. 13.2 Nonparametric Random Effects Models. 13.3 Beta-Binomial Models. 13.4 Negative Binomial Regression. 13.5 Poisson Regression with Random Effects. Notes. Problems. 14. Asymptotic Theory for Parametric Models. 14.1 Delta Method. 14.2 Asymptotic Distributions of Estimators of Model Parameters and Cell Probabilities. 14.3 Asymptotic Distributions of Residuals and Goodness-of-Fit Statistics. 14.4 Asymptotic Distributions for Logit/Loglinear Models. Notes. Problems. 15. Alternative Estimation Theory for Parametric Models. 15.1 Weighted Least Squares for Categorical Data. 15.2 Bayesian Inference for Categorical Data. 15.3 Other Methods of Estimation. Notes. Problems. 16. Historical Tour of Categorical Data Analysis*. 16.1 Pearson-Yule Association Controversy. 16.2 R. A. Fisher's Contributions. 16.3 Logistic Regression. 16.4 Multiway Contingency Tables and Loglinear Models. 16.5 Recent (and Future?) Developments. Appendix A. Using Computer Software to Analyze Categorical Data. A.1 Software for Categorical Data Analysis. A.2 Examples of SAS Code by Chapter. Appendix B. Chi-Squared Distribution Values. References. Examples Index. Author Index. Subject Index. Sections marked with an asterisk are less important for an overview.

4,650 citations

21 Aug 1986

TL;DR: This book presents an approach to the statistical analysis of compositional data, such as the geochemical compositions of rocks, using logratio linear models and matrix specifications of covariance structure.

Abstract: 1 Compositional data: some challenging problems.- 1.1 Introduction.- 1.2 Geochemical compositions of rocks.- 1.3 Sediments at different depths.- 1.4 Ternary diagrams.- 1.5 Partial analyses and subcompositions.- 1.6 Supervisory behaviour.- 1.7 Household budget surveys.- 1.8 Steroid metabolite patterns in adults and children.- 1.9 Activity patterns of a statistician.- 1.10 Calibration of white-cell compositions.- 1.11 Fruit evaluation.- 1.12 Firework mixtures.- 1.13 Clam ecology.- 1.14 Bibliographic notes.- Problems.- 2 The simplex as sample space.- 2.1 Choice of sample space.- 2.2 Compositions and simplexes.- 2.3 Spaces, vectors, matrices.- 2.4 Bases and compositions.- 2.5 Subcompositions.- 2.6 Amalgamations.- 2.7 Partitions.- 2.8 Perturbations.- 2.9 Geometrical representations of compositional data.- 2.10 Bibliographic notes.- Problems.- 3 The special difficulties of compositional data analysis.- 3.1 Introduction.- 3.2 High dimensionality.- 3.3 Absence of an interpretable covariance structure.- 3.4 Difficulty of parametric modelling.- 3.5 The mixture variation difficulty.- 3.6 Bibliographic notes.- Problems.- 4 Covariance structure.- 4.1 Fundamentals.- 4.2 Specification of the covariance structure.- 4.3 The compositional variation array.- 4.4 Recovery of the compositional variation array from the crude mean vector and covariance matrix.- 4.5 Subcompositional analysis.- 4.6 Matrix specifications of covariance structures.- 4.7 Some important elementary matrices.- 4.8 Relationships between the matrix specifications.- 4.9 Estimated matrices for hongite compositions.- 4.10 Logratios and logcontrasts.- 4.11 Covariance structure of a basis.- 4.12 Commentary.- 4.13 Bibliographic notes.- Problems.- 5 Properties of matrix covariance specifications.- 5.1 Logratio notation.- 5.2 Logcontrast variances and covariances.- 5.3 Permutations.- 5.4 Properties of P and QP matrices.- 5.5 Permutation invariants involving ?.- 5.6 Covariance matrix inverses.- 5.7 
Subcompositions.- 5.8 Equivalence of characteristics of ?, ?, ?.- 5.9 Logratio-uncorrelated compositions.- 5.10 Isotropic covariance structures.- 5.11 Bibliographic notes.- Problems.- 6 Logistic normal distributions on the simplex.- 6.1 Introduction.- 6.2 The additive logistic normal class.- 6.3 Density function.- 6.4 Moment properties.- 6.5 Composition of a lognormal basis.- 6.6 Class-preserving properties.- 6.7 Conditional subcompositional properties.- 6.8 Perturbation properties.- 6.9 A central limit theorem.- 6.10 A characterization by logcontrasts.- 6.11 Relationships with the Dirichlet class.- 6.12 Potential for statistical analysis.- 6.13 The multiplicative logistic normal class.- 6.14 Partitioned logistic normal classes.- 6.15 Some notation.- 6.16 Bibliographic notes.- Problems.- 7 Logratio analysis of compositions.- 7.1 Introduction.- 7.2 Estimation of ? and ?.- 7.3 Validation: tests of logistic normality.- 7.4 Hypothesis testing strategy and techniques.- 7.5 Testing hypotheses about ? 
and ?.- 7.6 Logratio linear modelling.- 7.7 Testing logratio linear hypotheses.- 7.8 Further aspects of logratio linear modelling.- 7.9 An application of logratio linear modelling.- 7.10 Predictive distributions, atypicality indices and outliers.- 7.11 Statistical discrimination.- 7.12 Conditional compositional modelling.- 7.13 Bibliographic notes.- Problems.- 8 Dimension-reducing techniques.- 8.1 Introduction.- 8.2 Crude principal component analysis.- 8.3 Logcontrast principal component analysis.- 8.4 Applications of logcontrast principal component analysis.- 8.5 Subcompositional analysis.- 8.6 Applications of subcompositional analysis.- 8.7 Canonical component analysis.- 8.8 Bibliographic notes.- Problems.- 9 Bases and compositions.- 9.1 Fundamentals.- 9.2 Covariance relationships.- 9.3 Principal and canonical component comparisons.- 9.4 Distributional relationships.- 9.5 Compositional invariance.- 9.6 An application to household budget analysis.- 9.7 An application to clinical biochemistry.- 9.8 Reappraisal of an early shape and size analysis.- 9.9 Bibliographic notes.- Problems.- 10 Subcompositions and partitions.- 10.1 Introduction.- 10.2 Complete subcompositional independence.- 10.3 Partitions of order 1.- 10.4 Ordered sequences of partitions.- 10.5 Caveat.- 10.6 Partitions of higher order.- 10.7 Bibliographic notes.- Problems.- 11 Irregular compositional data.- 11.1 Introduction.- 11.2 Modelling imprecision in compositions.- 11.3 Analysis of sources of imprecision.- 11.4 Imprecision and tests of independence.- 11.5 Rounded or trace zeros.- 11.6 Essential zeros.- 11.7 Missing components.- 11.8 Bibliographic notes.- Problems.- 12 Compositions in a covariate role.- 12.1 Introduction.- 12.2 Calibration.- 12.3 A before-and-after treatment problem.- 12.4 Experiments with mixtures.- 12.5 An application to firework mixtures.- 12.6 Classification from compositions.- 12.7 An application to geological classification.- 12.8 Bibliographic notes.- Problems.- 13 Further 
distributions on the simplex.- 13.1 Some generalizations of the Dirichlet class.- 13.2 Some generalizations of the logistic normal classes.- 13.3 Recapitulation.- 13.4 The Ad(?,B) class.- 13.5 Maximum likelihood estimation.- 13.6 Neutrality and partition independence.- 13.7 Subcompositional independence.- 13.8 A generalized lognormal gamma distribution with compositional invariance.- 13.9 Discussion.- 13.10 Bibliographic notes.- Problems.- 14 Miscellaneous problems.- 14.1 Introduction.- 14.2 Multi-way compositions.- 14.3 Multi-stage compositions.- 14.4 Multiple compositions.- 14.5 Kernel density estimation for compositional data.- 14.6 Compositional stochastic processes.- 14.7 Relation to Bayesian statistical analysis.- 14.8 Compositional and directional data.- Problems.- Appendices.- A Algebraic properties of elementary matrices.- B Bibliography.- C Computer software for compositional data analysis.- D Data sets.- Author index.

4,162 citations

01 Jul 1987

4,051 citations