
Showing papers in "Sociological Methodology in 1995"


Journal ArticleDOI
TL;DR: A Bayesian approach to hypothesis testing, model selection, and accounting for model uncertainty is presented; it is straightforward to implement through the simple and accurate BIC approximation and can be carried out using the output from standard software.
Abstract: It is argued that P-values and the tests based upon them give unsatisfactory results, especially in large samples. It is shown that, in regression, when there are many candidate independent variables, standard variable selection procedures can give very misleading results. Also, by selecting a single model, they ignore model uncertainty and so underestimate the uncertainty about quantities of interest. The Bayesian approach to hypothesis testing, model selection, and accounting for model uncertainty is presented. Implementing this is straightforward through the use of the simple and accurate BIC approximation, and it can be done using the output from standard software. Specific results are presented for most of the types of model commonly used in sociology. It is shown that this approach overcomes the difficulties with P-values and standard model selection procedures based on them. It also allows easy comparison of nonnested models, and permits the quantification of the evidence for a null hypothesis of interest, such as a convergence theory or a hypothesis about societal norms.
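
To make the BIC route concrete, here is a minimal sketch (my own illustration, not code from the paper) of Raftery's regression form of the approximation, BIC = n·log(1 − R²) + p·log(n); the sample size and R² values are hypothetical.

```python
import numpy as np

def regression_bic(r_squared: float, n: int, p: int) -> float:
    """Raftery's BIC for a linear regression with p predictors, taken
    relative to the null model: BIC = n*log(1 - R^2) + p*log(n).
    More negative values mean stronger evidence against the null."""
    return n * np.log(1.0 - r_squared) + p * np.log(n)

# Hypothetical output from two candidate models fit to the same 500 cases.
bic_full = regression_bic(r_squared=0.40, n=500, p=6)
bic_small = regression_bic(r_squared=0.38, n=500, p=2)

# Raftery's rough grades for a BIC difference: 0-2 weak, 2-6 positive,
# 6-10 strong, >10 very strong evidence for the lower-BIC model.
print(f"full: {bic_full:.1f}  small: {bic_small:.1f}")
```

Because only R², n, and the number of predictors are needed, the approximation can be computed from the output of any standard regression package, which is the practical point of the paper.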

6,100 citations


Journal ArticleDOI
TL;DR: Statistical methods for structural equation modeling of data from large-scale surveys with complex sample designs are investigated; the authors identify several recent methodological lines of inquiry that, taken together, provide a powerful and general statistical basis for complex sample, structural equation modeling analysis.
Abstract: Large-scale surveys using complex sample designs are frequently carried out by government agencies. The statistical analysis technology available for such data is, however, limited in scope. This study investigates and further develops statistical methods that could be used in software for the analysis of data collected under complex sample designs. First, it identifies several recent methodological lines of inquiry which taken together provide a powerful and general statistical basis for a complex sample, structural equation modeling analysis. Second, it extends some of this research to new situations of interest. A Monte Carlo study that empirically evaluates these techniques on simulated data comparable to those in large-scale complex surveys demonstrates that they work well in practice. Due to the generality of the approaches, the methods cover not only continuous normal variables but also continuous nonnormal variables and dichotomous variables. Two methods designed to take into account the complex sample structure were …
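
The logic of checking estimators against a complex design by simulation can be shown in a few lines; the toy study below (my own sketch, not the authors' Monte Carlo design) draws two-stage cluster samples and shows how a naive i.i.d. standard error understates the sampling variability that a cluster-aware estimate captures.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_clustered_sample(n_clusters=50, cluster_size=20, icc_sd=1.0):
    """Draw a two-stage (clustered) sample: a shared cluster effect
    induces within-cluster correlation, as in a complex survey design."""
    cluster_effects = rng.normal(0.0, icc_sd, n_clusters)
    return cluster_effects[:, None] + rng.normal(0.0, 1.0, (n_clusters, cluster_size))

def naive_and_robust_se(y):
    """Compare the i.i.d. SE of the mean with a cluster-robust SE
    based on between-cluster variation of the cluster means."""
    flat = y.ravel()
    naive = flat.std(ddof=1) / np.sqrt(flat.size)
    cluster_means = y.mean(axis=1)
    robust = cluster_means.std(ddof=1) / np.sqrt(y.shape[0])
    return naive, robust

# Monte Carlo: the naive SE badly understates sampling variability.
reps = [naive_and_robust_se(simulate_clustered_sample()) for _ in range(200)]
print(f"avg naive SE: {np.mean([r[0] for r in reps]):.3f}, "
      f"avg cluster-robust SE: {np.mean([r[1] for r in reps]):.3f}")
```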

1,407 citations


Journal ArticleDOI
TL;DR: The complete set of all units comprising such datasets is called an "apparent population": for the substantive issues being addressed, the data on hand are all the data there are.
Abstract: It is common in sociological publications to find statistical inference applied to datasets that are not samples in the usual sense. For the substantive issues being addressed, the data on hand are all the data there are. No additional data could be collected, even in principle. In this paper, we call the complete set of all units comprising such datasets an "apparent population." Consider the following examples.

147 citations


Journal ArticleDOI
TL;DR: A framework is presented for analyzing structural equation models (SEMs) that include nonlinear functions of latent variables, or of a mix of latent and observed variables, in their equations; unlike earlier methods, which were limited by required distributional assumptions, complexity of implementation, and the unknown distributions of their estimators, it rests on a two-stage least squares estimator that is consistent and asymptotically normal.
Abstract: Busemeyer and Jones (1983) and Kenny and Judd (1984) proposed methods to include interactions of latent variables in structural equation models (SEMs). Despite the value of these works, their methods are limited by the required distributional assumptions, by their complexity in implementation, and by the unknown distributions of the estimators. This paper provides a framework for analyzing SEMs ("LISREL" models) that include nonlinear functions of latent or a mix of latent and observed variables in their equations. It permits such nonlinear functions in equations that are part of latent variable models or measurement models. I estimate the coefficient parameters with a two-stage least squares estimator that is consistent and asymptotically normal with a known asymptotic covariance matrix. The observed random variables can come from nonnormal distributions. Several hypothetical cases and an empirical example illustrate the method.
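
The flavor of the two-stage least squares strategy can be conveyed with a small simulation. This is a simplified sketch of the general idea, not Bollen's full estimator: the single-equation setup, the coefficient values, and the use of second indicators (and their product) as instruments are all illustrative assumptions of mine.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Latent variables and a structural equation with a latent interaction.
xi1 = rng.normal(size=n)
xi2 = rng.normal(size=n)
y = 1.0 + 0.5 * xi1 + 0.5 * xi2 + 0.7 * xi1 * xi2 + rng.normal(scale=0.5, size=n)

# Two indicators per latent variable; all contain measurement error.
x1a, x1b = xi1 + rng.normal(scale=0.6, size=n), xi1 + rng.normal(scale=0.6, size=n)
x2a, x2b = xi2 + rng.normal(scale=0.6, size=n), xi2 + rng.normal(scale=0.6, size=n)

# Error-laden regressors built from the first indicators, instrumented
# by the second indicators and their product.
X = np.column_stack([np.ones(n), x1a, x2a, x1a * x2a])
Z = np.column_stack([np.ones(n), x1b, x2b, x1b * x2b])

# 2SLS in the just-identified case: beta = (Z'X)^{-1} Z'y.
beta = np.linalg.solve(Z.T @ X, Z.T @ y)
print(beta)  # approximately recovers (1.0, 0.5, 0.5, 0.7)
```

Because the second indicators share only the latent variables with the error-laden regressors, they are valid instruments, and no normality of the observed variables is required for consistency.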

118 citations


Journal ArticleDOI
TL;DR: In this comment, the authors agree that Raftery's paper addresses two important problems in the statistical analysis of social science data: (1) choosing an appropriate model when so much data are available that standard P-values reject all parsimonious models; and (2) making estimates and predictions when there are not enough data available to fit the desired model using standard techniques.
Abstract: Raftery's paper addresses two important problems in the statistical analysis of social science data: (1) choosing an appropriate model when so much data are available that standard P-values reject all parsimonious models; and (2) making estimates and predictions when there are not enough data available to fit the desired model using standard techniques. For both problems, we agree with Raftery that classical frequentist methods fail and that Raftery's suggested methods based on BIC can point in better directions. Nevertheless, we disagree with his solutions because, in principle, they are still directed off-target and only by serendipity manage to hit the target in special circumstances. Our primary criticisms of Raftery's proposals are that (1) he promises the impossible: the selection of a model that is adequate for specific purposes without consideration of those purposes; and (2) he uses the same limited tool for model averaging as for model selection, thereby depriving himself of the benefits of the broad range of available Bayesian procedures. Despite our criticisms, we applaud Raftery's desire to improve practice by providing methods and computer programs for all to use and applying these methods to real problems. We believe that his paper makes a positive contribution to social science, by focusing on …

108 citations


Journal ArticleDOI
TL;DR: A reliability coefficient applicable to a largely neglected component of data-generating processes: the identification of units for analysis within essentially continuous phenomena is developed.
Abstract: We develop here a reliability coefficient applicable to a largely neglected component of data-generating processes: the identification of units for analysis within essentially continuous phenomena. Whether we count evidence found in texts, examine relevant sections of videotapes, or punctuate interaction sequences suitable for further analysis, unitizing underlies all quantifications. Measures of the reliability of this process provide researchers with information regarding the degree to which data may be trusted. In compliance with widely used reliability standards, we conceive the new measure as a member of a family of reliability coefficients α, which have so far been applicable only to various coding tasks, and derive it accordingly.
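
For context, the coding-task members of the α family that the authors generalize share the form α = 1 − D_o/D_e, observed over expected disagreement. The sketch below computes the standard nominal-data version (not the paper's new unitizing coefficient) for hypothetical coder data, via the usual coincidence matrix.

```python
import numpy as np

def alpha_nominal(codes):
    """Krippendorff's alpha for nominal data (coders x units, no missing):
    alpha = 1 - D_o / D_e, computed from the coincidence matrix of
    ordered value pairs within units."""
    codes = np.asarray(codes)
    m, n_units = codes.shape
    values, idx = np.unique(codes, return_inverse=True)
    idx = idx.reshape(m, n_units)
    k = len(values)

    # Each ordered pair of judgments within a unit contributes 1/(m-1),
    # so the matrix totals m * n_units pairable values.
    o = np.zeros((k, k))
    for u in range(n_units):
        for i in range(m):
            for j in range(m):
                if i != j:
                    o[idx[i, u], idx[j, u]] += 1.0 / (m - 1)

    n_total = o.sum()
    marginals = o.sum(axis=1)
    d_o = n_total - np.trace(o)                            # observed disagreement
    d_e = (n_total**2 - (marginals**2).sum()) / (n_total - 1)  # expected
    return 1.0 - d_o / d_e

# Two hypothetical coders categorizing ten units.
coder1 = ["a", "a", "b", "b", "c", "a", "b", "c", "c", "a"]
coder2 = ["a", "a", "b", "c", "c", "a", "b", "c", "b", "a"]
print(f"alpha = {alpha_nominal([coder1, coder2]):.3f}")
```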

92 citations


Journal ArticleDOI
TL;DR: In this paper, the authors report the results of a series of Monte Carlo simulation studies that investigate estimation issues for heterogeneous diffusion models, focusing on the properties of maximum likelihood estimators, considering variation across parameter values and different forms of model misspecification.
Abstract: Heterogeneous diffusion models let one combine the analysis of intrinsic propensities with that of intrapopulation contagion, and disaggregate contagion effects into individual susceptibilities, the infectiousness of prior adopters, and the social proximity of prior-potential adopter pairs. This paper reports the results of a series of Monte Carlo simulation studies that investigate estimation issues for this class of models. Graphical analysis of population-level hazard rates is shown to provide little insight into these processes. We focus on the properties of maximum likelihood estimators, considering variation across parameter values and different forms of model misspecification. When models are correctly specified, we find few conditions under which estimation appears problematic. Difficult cases involve binary networks where network linkages have very strong effects or network density is high. Estimation deteriorates in …
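
To make the model class concrete, the toy simulation below (my own sketch, not the authors' study design) generates adoption times from a hazard that combines an intrinsic propensity with contagion transmitted from prior adopters through a binary network; all parameter values are invented.

```python
import numpy as np

rng = np.random.default_rng(2)
n, t_max, dt = 100, 50.0, 0.1

# Hypothetical ingredients: intrinsic propensities and a binary network.
propensity = np.exp(rng.normal(-3.0, 0.5, n))          # baseline hazards
network = (rng.random((n, n)) < 0.05).astype(float)    # who influences whom
np.fill_diagonal(network, 0.0)
infectiousness = 0.02

adopted = np.zeros(n, dtype=bool)
adoption_time = np.full(n, np.inf)

# Discrete-time approximation of the continuous-time hazard process:
# rate_i(t) = propensity_i + infectiousness * (# adopted network contacts).
t = 0.0
while t < t_max and not adopted.all():
    contagion = infectiousness * (network @ adopted.astype(float))
    rate = np.where(adopted, 0.0, propensity + contagion)
    newly = rng.random(n) < 1.0 - np.exp(-rate * dt)
    adoption_time[newly & ~adopted] = t
    adopted |= newly
    t += dt

print(f"{adopted.sum()} of {n} adopted by t = {t_max}")
```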

73 citations


Journal ArticleDOI
TL;DR: To illustrate the nonobvious implications of a verbal theory, simulation experiments are performed on the conflict theory that rulers seek domestic legitimacy through external power-prestige.
Abstract: Verbally formulated theory is usually comparative statics: For example, conflict causes solidarity or a military-industrial complex causes warfare. A difficulty with such theory is that systems of multiple causes with feedbacks among them can lead to unexpected outcomes. The implications of a theory for historical change can be discovered by investigating its properties as a continuous-state continuous-time dynamic system. Theory-discovery by simulation experiments operates on a lower level of generality than the direct mathematical solution of sets of equations, but it is advantageous (1) when systems of equations are difficult or impossible to solve directly; (2) when equilibria, maxima, and minima are not important features of empirical social systems, or when our main concern is the trajectory of the system in disequilibrium, that is, during historical change; (3) as a method accessible to large numbers of sociologists who lack high levels of mathematical virtuosity. To illustrate the nonobvious implications of a verbal theory, simulation experiments are performed on the conflict theory that rulers seek domestic legitimacy through external power-prestige. The his…
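
A continuous-state, continuous-time system of this kind takes only a few lines to integrate numerically. The two-equation feedback below, in which external power-prestige builds domestic legitimacy and legitimacy funds further external assertion, is a hypothetical stand-in for the paper's model, with invented coefficients a, b, c, d.

```python
import numpy as np
from scipy.integrate import solve_ivp

def conflict_theory(t, state, a=0.8, b=0.5, c=0.3, d=0.6):
    """Toy two-variable feedback system: external power-prestige (p)
    generates domestic legitimacy (l), legitimacy funds further external
    assertion, and overextension imposes an increasing cost on p."""
    l, p = state
    dl = a * p - c * l        # legitimacy grows with prestige, decays
    dp = b * l - d * p * p    # assertion funded by legitimacy, overextension cost
    return [dl, dp]

sol = solve_ivp(conflict_theory, t_span=(0.0, 40.0), y0=[0.1, 0.5],
                t_eval=np.linspace(0.0, 40.0, 200))

# Inspect the disequilibrium trajectory, not just the fixed point.
print(sol.y[:, -1])  # state at the end of the simulated "history"
```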

62 citations


Journal ArticleDOI
TL;DR: Significance tests have long been controversial in sociology and the social sciences (e.g., Morrison and Henkel 1970); despite critiques, they remain a near-universal feature of quantitative analyses in sociology.
Abstract: Significance tests have long been controversial in sociology and in the social sciences (e.g., Morrison and Henkel 1970). Despite critiques of them, they still are a near-universal feature of quantitative analyses in sociology. Berk, Western, and Weiss (1995) (hereafter BWW) have reopened this issue in the context of what they call "apparent populations." These are datasets that are "nonreplicable," such as data for all SMSAs, all states in the United States, or all nations in the developing world. Most of us who work with such samples have not adequately addressed the ambiguity in the meaning of significance tests in such contexts. BWW have done us a service by forcing us to rethink this issue. BWW's paper has two major parts. One is a critique of the rationales and current practices of descriptive analysis and significance testing in apparent populations. The second part suggests that Bayesian analysis is a superior alternative. Most of my comments concentrate on the first part, though I do have some observations on their Bayesian proposal. My comments are directed at a number of claims in their paper: (1) there are fatal flaws in the justifications for significance testing in apparent populations, (2) the superpopulation or randomization models are the only rationales for significance tests, (3) treating an apparent population as a true population implies a deterministic world, (4) apparent populations are not replicable, (5) the examples that they give can be characterized as apparent populations, and (6) a Bayesian approach has far fewer problems than the alternatives. The sections that follow address each of these.

42 citations


Journal ArticleDOI
TL;DR: In this paper, the authors develop a theoretical expectation that missing data in organizational surveys will normally be nonrandom relative to important organizational characteristics, offer suggestions for organizational survey design that minimizes missing data problems, and outline approaches for analyzing organizational data in the presence of selection biases associated with unit and item nonresponse.
Abstract: Nonrandom missing data can distort estimates of substantive relationships. In elaborating this principle for organizational research, we first develop a theoretical expectation that missing data in organizational surveys will normally be nonrandom relative to important organizational characteristics. We summarize empirical findings from a previous paper that demonstrate that unit nonresponse is a predictable outcome of organizational processes. Next, we examine expectations about organizational processes and item nonresponse and find that nonresponse is systematically associated with variables that tap organizational authority, capacity, and motive to respond. In light of these findings, we develop suggestions for future organizational survey design to minimize missing data problems. We also outline approaches for analyses of organizational data in the presence of selection biases associated with unit and item nonresponse.

32 citations


Journal ArticleDOI
TL;DR: In this rejoinder, the author responds to Hauser and to Gelman and Rubin, maintaining that model selection is an essential part of building a realistic model in social research, whereas Gelman and Rubin view it as "relatively unimportant."
Abstract: I would like to thank Hauser and Gelman and Rubin for their thoughtful comments. Hauser's discussion is very useful because it identifies new ways in which Bayesian model selection can shed light on scientific debates, and because it points to directions for further research. Gelman and Rubin and I agree that classical methods fail, that Bayesian model selection can point in better directions, and that Bayesian model averaging is better than using a single model. We also have disagreements, however. I have found model selection to be an essential part of the task of building a realistic model in social research, while they view it as "relatively unimportant." We have different views of what it means for a model "not to fit the data." Also, Gelman and Rubin suggest in several places that BIC is based on a uniform, improper prior, but this is not the case. Several other points are discussed below.

Journal ArticleDOI
TL;DR: In this article, a new index of structure for the analysis and summary description of models for mobility tables and other kinds of cross-classifications is proposed, which measures model misfit (or lack of fit) as the minimum possible proportion of the population outside the specified model.
Abstract: A new index of structure for the analysis and summary description of models for mobility tables and other kinds of cross-classifications is proposed. For a given model, this index measures model misfit (or lack of fit) as the minimum possible proportion of the population outside the specified model. It measures structure in relation to a given model or hypothesis. The framework that gives rise to this index consists of embedding a specified model (independence, quasi-independence, …
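
If the index is read as writing the population as a two-point mixture P = (1 − π)Φ + πΨ, with Φ constrained to the model and π* the smallest feasible mixing weight (my reading of the abstract), it can be approximated numerically for the independence model: given Φ, the minimal feasible π is 1 − min over cells of p/φ, so one searches for the independence table maximizing that minimum. The crude multistart optimizer below is my own sketch.

```python
import numpy as np
from scipy.optimize import minimize

def pi_star_independence(p, n_starts=5):
    """Approximate the mixture index of fit pi* for the independence
    model: the smallest pi with P = (1-pi)*Phi + pi*Psi and Phi an
    independence table."""
    r, c = p.shape

    def neg_min_ratio(theta):
        # Softmax parameterization keeps the margins valid probabilities.
        row = np.exp(theta[:r]); row /= row.sum()
        col = np.exp(theta[r:]); col /= col.sum()
        return -np.min(p / np.outer(row, col))

    best = np.inf
    for seed in range(n_starts):  # crude multistart for the nonconvex search
        x0 = np.random.default_rng(seed).normal(size=r + c)
        best = min(best, minimize(neg_min_ratio, x0, method="Nelder-Mead").fun)
    return 1.0 - min(-best, 1.0)

# Hypothetical 3x3 mobility table of observed proportions.
p = np.array([[0.20, 0.05, 0.02],
              [0.06, 0.18, 0.06],
              [0.02, 0.07, 0.34]])
print(f"pi* under independence: {pi_star_independence(p):.3f}")
```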

Journal ArticleDOI
TL;DR: Because frequentist approaches rest on the imagery of many repeated samples from a single population, what Rubin calls "unique datasets" undermine frequentist inference; Bayesian inference, in contrast, remains possible and may produce satisfactory results.
Abstract: For many datasets common in sociology (and elsewhere), the imagery of a large number of random samples from a real population does serious violence to the empirical setting being studied. The data in hand are all the data there are, or that could be. We call such data an apparent population. Since frequentist approaches rest on the imagery of many repeated samples from a single population, what Rubin calls "unique datasets" undermine frequentist inference. In contrast, Bayesian inference is possible and may produce satisfactory inference.

Journal ArticleDOI
TL;DR: In their provocative paper in this volume, Berk, Western, and Weiss (hereafter BWW) see "promise in the Bayesian approach" to statistical inference for apparent populations; this commentator is more skeptical and fears that their Bayesian methods will sometimes make matters worse.
Abstract: In their provocative paper in this volume, Berk, Western, and Weiss (hereafter BWW) see "promise in the Bayesian approach" (p. 453) to statistical inference for apparent populations. As they note, however, the "true test is whether Bayesian inference improves the science" (p. 453), and this we can determine only by applying Bayesian methods in actual research. I am more skeptical than they that the sort of Bayesian inference they suggest will improve the science. Indeed, I fear that their Bayesian methods sometimes will make matters worse.

Journal ArticleDOI
TL;DR: The Bayesian information criterion (BIC) offers a simple and defensible rule of thumb to guide model selection, and it has improved decision-making in discrete multivariate analysis and structural equation models.
Abstract: About a decade ago, after David Grusky and I had suffered endlessly over the choices among alternative models in our comparative analyses of social mobility (Grusky and Hauser 1984), our satisfaction with the product of those analyses was temporarily shattered by the news that Adrian Raftery (1986) would publish a methodological comment on the work. What would he have to say? And could we defend our work? Would we take the standard defensive posture of sociologists whose work was under criticism? In actuality, Raftery's brief and elegant comment turned out not to require a defense at all. Rather, it outlined a superior way to think about the decisions that we had faced, namely, how to choose among alternative models in a sample so large that standard inferential methods would lead us to reject all but a saturated model. Raftery's proposal to use the Bayesian information criterion (BIC) relieved, rather than increased, our discomfort at having ignored standard rules of statistical inference, and it even supported some, though not all, of the decisions that we had made. Pleased as I was that some parts of the Grusky-Hauser analysis survived Raftery's scrutiny, I am happier yet that our efforts prompted the introduction of a simple and defensible rule of thumb that could be used to improve decisions in discrete multivariate analysis and structural equation models (Raftery 1993). For the past several years, I have routinely used BIC as a guide in model selection (Hauser and Wong 1989; Wong and Hauser 1992; Hout and Hauser 1992; Hauser 1993; Hauser and Phang 1993; Kuo and Hauser 1995a, 1995b), sometimes without showing the details of inferential procedures in the text.

Journal ArticleDOI
TL;DR: A framework for analyzing interdependencies between events over an individual's life course is proposed by means of hazard-rate models; for example, the amount of schooling a person obtains may depend on her family behavior, and, conversely, family behavior may depend on whether the person is in school.
Abstract: The idea of interdependence between events over an individual's life course is widespread. For example, the amount of schooling a person obtains may depend on her family behavior, and, vice versa, the family behavior may depend on whether the person is in school or not. This paper proposes a framework for analyzing such interdependencies by means of hazard-rate models. First, I discuss two existing approaches to studying interdependencies, the coupled and the Kalbfleisch and Prentice approaches. Then, I propose a new approach …
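
One generic way to operationalize such interdependence (a standard episode-splitting setup, sketched with invented data, not the paper's new approach) is to let one process enter the other's hazard as a time-varying covariate and estimate the rate ratio by Poisson regression on split episodes, as below using pandas and statsmodels.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 2000

# Hypothetical data: time of leaving school, and a family-event hazard
# that is lower while the person is still in school.
leave_school = rng.uniform(1.0, 10.0, n)
base_rate, in_school_ratio = 0.10, 0.3

# Simulate family-event times under the piecewise-constant hazard.
event_time = np.empty(n)
for i in range(n):
    t, rate = 0.0, base_rate * in_school_ratio
    while True:
        gap = rng.exponential(1.0 / rate)
        if t + gap < leave_school[i] or rate == base_rate:
            event_time[i] = t + gap
            break
        t, rate = leave_school[i], base_rate  # hazard jumps at school exit

# Episode splitting: one record per person-spell with constant covariate.
rows = []
for i in range(n):
    split = min(event_time[i], leave_school[i])
    rows.append((i, split, int(event_time[i] <= leave_school[i]), 1))
    if event_time[i] > leave_school[i]:
        rows.append((i, event_time[i] - leave_school[i], 1, 0))
df = pd.DataFrame(rows, columns=["id", "exposure", "event", "in_school"])

# Poisson regression with a log-exposure offset recovers the rates.
X = sm.add_constant(df["in_school"])
fit = sm.GLM(df["event"], X, family=sm.families.Poisson(),
             offset=np.log(df["exposure"])).fit()
print(np.exp(fit.params))  # approx (base_rate, in_school_ratio)
```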

Journal ArticleDOI
TL;DR: The authors (Berk, Western, and Weiss) are to be congratulated for addressing a challenging problem and for providing a variety of comments relevant to it.
Abstract: The authors (Berk, Western, and Weiss) are to be congratulated for addressing a challenging problem and for providing a variety of comments relevant to it. At the outset, it is important to note that the problem of "apparent populations" is not special to social science. Many fields of science deal with unique datasets, which are in no sense imbedded in randomization-based surveys or experiments, nor can they be viewed as drawn from independent and identically distributed actual replications. Consider cosmology and the origins of the universe, studies of atmospheric pollution, and discussions of the evolution of life on earth (for a fascinating account of William James on precisely this issue of unique datasets, apparent populations, and the evolution of man, see Gould 1988, p. 17). "Could a hypothesized model, with allowance for both its stochastic components and the imperfect estimation of its unknown constants (parameters), have plausibly generated the observed dataset?" This question is a primitive of science, more fundamental than mathematical statisticians' formalizations of it. Such questions can be legitimately asked of any dataset, whether unique or one of many that was chosen by the scientist according to some randomized design. Addressing such questions with mathematical rigor is more …
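
Rubin's question, whether the hypothesized model could plausibly have generated the observed dataset, can be operationalized as a predictive check even for a unique dataset; the sketch below, with invented counts, simulates replicated data from a fitted Poisson model and compares a discrepancy statistic.

```python
import numpy as np

rng = np.random.default_rng(4)

# One unique dataset (hypothetical counts); no replications exist.
observed = np.array([3, 7, 2, 9, 4, 12, 5, 6, 8, 3])

# Fit the simplest model: i.i.d. Poisson with rate = sample mean.
lam = observed.mean()

def discrepancy(x):
    """Variance-to-mean ratio: Poisson data should give values near 1."""
    return x.var() / x.mean()

# Predictive check: could the fitted model plausibly have produced a
# dataset as overdispersed as the one observed?
reps = rng.poisson(lam, size=(10_000, observed.size))
rep_stats = np.array([discrepancy(r) for r in reps])
p_value = np.mean(rep_stats >= discrepancy(observed))
print(f"predictive p-value: {p_value:.3f}")
```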

Journal ArticleDOI
TL;DR: Using a reduced-form parameterization and a conditioning argument, the exact variance of indirect effects is obtained for the special case of recursive linear models with no latent variables, and a consistent estimator of the exact variance is shown to be identical to Sobel's asymptotic delta-method estimator.
Abstract: Using the delta method, Sobel obtained the asymptotic variance of indirect effects in linear structural equation models. Using a reduced-form parameterization and a conditioning argument, I obtain the exact variance of indirect effects in the special case of recursive linear models with no latent variables. I then show that a consistent estimator for the exact variance is identical to Sobel's estimator.
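
For the simplest mediation chain x → m → y with indirect effect ab, Sobel's first-order delta-method variance is a²·Var(b̂) + b²·Var(â); the sketch below, with invented data, computes the point estimate and Sobel standard error from two OLS fits.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 1000

# Hypothetical recursive system: x -> m -> y, indirect effect a*b.
x = rng.normal(size=n)
m = 0.5 * x + rng.normal(size=n)             # a = 0.5
y = 0.4 * m + 0.2 * x + rng.normal(size=n)   # b = 0.4

fit_m = sm.OLS(m, sm.add_constant(x)).fit()
fit_y = sm.OLS(y, sm.add_constant(np.column_stack([m, x]))).fit()

a, var_a = fit_m.params[1], fit_m.cov_params()[1, 1]
b, var_b = fit_y.params[1], fit_y.cov_params()[1, 1]

indirect = a * b
# Sobel's first-order delta-method variance for the product a*b.
se_sobel = np.sqrt(a**2 * var_b + b**2 * var_a)
print(f"indirect effect: {indirect:.3f}, Sobel SE: {se_sobel:.3f}")
```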