
Showing papers on "Reliability (statistics)" published in 2002


Journal ArticleDOI
TL;DR: The authors argue that qualitative researchers should reclaim responsibility for reliability and validity by implementing verification strategies that are integral and self-correcting during the conduct of inquiry itself. This ensures the attainment of rigor using strategies inherent within each qualitative design and moves the responsibility for incorporating and maintaining reliability and validity from external reviewers' judgements to the investigators themselves.
Abstract: The rejection of reliability and validity in qualitative inquiry in the 1980s has resulted in an interesting shift for "ensuring rigor" from the investigator’s actions during the course of the research, to the reader or consumer of qualitative inquiry. The emphasis on strategies that are implemented during the research process has been replaced by strategies for evaluating trustworthiness and utility that are implemented once a study is completed. In this article, we argue that reliability and validity remain appropriate concepts for attaining rigor in qualitative research. We argue that qualitative researchers should reclaim responsibility for reliability and validity by implementing verification strategies integral and self-correcting during the conduct of inquiry itself. This ensures the attainment of rigor using strategies inherent within each qualitative design, and moves the responsibility for incorporating and maintaining reliability and validity from external reviewers’ judgements to the investigators themselves. Finally, we make a plea for a return to terminology for ensuring rigor that is used by mainstream science.

4,980 citations


Journal ArticleDOI
TL;DR: The SAC's current conceptualization of eight key attributes of health status and QoL instruments and the criteria by which instruments would be reviewed on each of those attributes are offered.
Abstract: The field of health status and quality of life (QoL) measurement – as a formal discipline with a cohesive theoretical framework, accepted methods, and diverse applications – has been evolving for the better part of 30 years. To identify health status and QoL instruments and review them against rigorous criteria as a precursor to creating an instrument library for later dissemination, the Medical Outcomes Trust in 1994 created an independently functioning Scientific Advisory Committee (SAC). In the mid-1990s, the SAC defined a set of attributes and criteria to carry out instrument assessments; 5 years later, it updated and revised these materials to take account of the expanding theories and technologies upon which such instruments were being developed. This paper offers the SAC's current conceptualization of eight key attributes of health status and QoL instruments (i.e., conceptual and measurement model; reliability; validity; responsiveness; interpretability; respondent and administrative burden; alternate forms; and cultural and language adaptations) and the criteria by which instruments would be reviewed on each of those attributes. These are suggested guidelines for the field to consider and debate; as measurement techniques become both more familiar and more sophisticated, we expect that experts will wish to update and refine these criteria accordingly.

1,965 citations


Journal ArticleDOI
TL;DR: In this paper, the authors demonstrate how substantive researchers can use a Monte Carlo study to decide on sample size and determine power, using two models, a confirmatory factor analysis (CFA) model and a growth model.
Abstract: A common question asked by researchers is, "What sample size do I need for my study?" Over the years, several rules of thumb have been proposed. In reality there is no rule of thumb that applies to all situations. The sample size needed for a study depends on many factors, including the size of the model, distribution of the variables, amount of missing data, reliability of the variables, and strength of the relations among the variables. The purpose of this article is to demonstrate how substantive researchers can use a Monte Carlo study to decide on sample size and determine power. Two models are used as examples, a confirmatory factor analysis (CFA) model and a growth model. The analyses are carried out using the Mplus program (Muthén & Muthén, 1998).

1,728 citations
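
The Monte Carlo approach to sample size described in the entry above can be illustrated with a much simpler model than the CFA and growth models run in Mplus. The sketch below is an assumption-laden stand-in, not the authors' procedure: it estimates the power to detect a correlation when the predictor is measured with a given reliability, using a Fisher z test at the two-sided 5% level. The function names, the default effect size, and the default reliability are all hypothetical choices made for illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_power(n, true_r=0.3, reliability=0.8, reps=2000, seed=1):
    """Estimate power by repeated simulation (illustrative assumptions only).

    true_r      : correlation between the error-free predictor and the outcome
    reliability : share of observed predictor variance that is true-score variance
    """
    rng = random.Random(seed)
    # True score has unit variance, so reliability = 1 / (1 + error variance).
    err_sd = math.sqrt((1 - reliability) / reliability)
    resid_sd = math.sqrt(1 - true_r ** 2)
    rejections = 0
    for _ in range(reps):
        xs, ys = [], []
        for _ in range(n):
            t = rng.gauss(0, 1)                      # error-free predictor
            xs.append(t + rng.gauss(0, err_sd))      # observed, unreliable predictor
            ys.append(true_r * t + rng.gauss(0, resid_sd))
        # Fisher z test of H0: correlation = 0, two-sided alpha = 0.05.
        z = math.atanh(pearson_r(xs, ys)) * math.sqrt(n - 3)
        if abs(z) > 1.96:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    for n in (50, 100, 200):
        print(n, simulate_power(n))
```

Running the loop over several candidate values of n gives a rough power curve; lowering `reliability` shows how measurement error raises the sample size needed for the same power.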


Journal ArticleDOI
01 Jan 2002
TL;DR: The sequential optimization and reliability assessment (SORA) method employs a single-loop strategy in which a series of cycles of optimization and reliability assessment is performed, with optimization and reliability assessment decoupled from each other in each cycle.
Abstract: Probabilistic optimization design offers tools for making reliable decisions with the consideration of uncertainty associated with design variables/parameters and simulation models. In a probabilistic design, such as reliability-based design and robust design, the design feasibility is formulated probabilistically such that the probability of the constraint satisfaction (reliability) exceeds the desired limit. The reliability assessment for probabilistic constraints often involves an iterative procedure; therefore, two loops are involved in a probabilistic optimization. Due to the double-loop procedure, the computational demand is extremely high. To improve the efficiency of a probabilistic design, a novel method, sequential optimization and reliability assessment (SORA), is developed in this paper. The SORA method employs a single-loop strategy where a series of cycles of optimization and reliability assessment is employed. In each cycle, optimization and reliability assessment are decoupled from each other; no reliability assessment is required within optimization and the reliability assessment is only conducted after the optimization. The key concept of the proposed method is to shift the boundaries of violated deterministic constraints (with low reliability) to the feasible direction based on the reliability information obtained in the previous cycle. Hence the design is quickly improved from cycle to cycle and the computational efficiency is improved significantly. Two engineering applications, the reliability-based design for vehicle crashworthiness of side impact and the integrated reliability and robust design of a speed reducer, are presented to demonstrate the effectiveness of the SORA method. Copyright © 2002 by ASME

934 citations


Journal Article
TL;DR: The application of a new version of WebQual to Internet bookstores: Amazon, BOL, and the Internet Bookshop is reported on.
Abstract: WebQual is a method for assessing the quality of Web sites. The method has been developed iteratively through application in various domains, including Internet bookstores and Internet auction sites. In this paper we report on the application of a new version of WebQual to Internet bookstores: Amazon, BOL, and the Internet Bookshop. WebQual draws on previous work in three areas: Web site usability, information quality, and service interaction quality to provide a rounded framework for assessing e-commerce offerings. Although WebQual is grounded in the subjective impressions of Web site users, the data collected lends itself to quantitative analysis and the production of e-commerce metrics such as the WebQual Index. The reliability of the instrument is examined and core constructs of Web site quality identified using factor analysis. The role of WebQual in assessing an organization’s e-commerce capability is discussed.

900 citations


Journal ArticleDOI
TL;DR: The capacity reliability analysis is extended by providing a comprehensive methodology, which combines reliability and uncertainty analysis, network equilibrium models, sensitivity analysis of equilibrium network flow and expected performance measure, as well as Monte Carlo methods, to assess the performance of a degradable road network.
Abstract: Existing reliability studies of road networks are mainly limited to connectivity and travel time reliability and may not be sufficient for a comprehensive network performance measure. Recently Chen et al. (J. Adv. Transp. 33 (2) (1999) 183–200) introduced capacity reliability as a new network performance index. It is defined as the probability that the network can accommodate a certain traffic demand at a required service level, while accounting for drivers' route choice behavior. The proposed capacity reliability index includes connectivity reliability as a special case and also provides travel time reliability as a side product. This paper extends the capacity reliability analysis by providing a comprehensive methodology, which combines reliability and uncertainty analysis, network equilibrium models, sensitivity analysis of equilibrium network flow and expected performance measure, as well as Monte Carlo methods, to assess the performance of a degradable road network. Numerical results are also provided to demonstrate the feasibility of the proposed framework.

554 citations
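
The definition of capacity reliability in the entry above, the probability that a network can accommodate a given demand, lends itself to a small Monte Carlo illustration. The sketch below is a toy under strong assumptions that are not in the source: a single origin-destination pair served by parallel links, each link independently retaining a uniformly random fraction of its design capacity, and no route choice or equilibrium model at all. The function name and parameters are hypothetical.

```python
import random


def capacity_reliability(demand, link_capacities, degrade_low=0.5,
                         reps=10000, seed=0):
    """Estimate P(total degraded capacity >= demand) by simulation.

    Toy assumption: links are parallel, so network capacity is the sum of
    link capacities; the paper's method also models drivers' route choice,
    which this sketch deliberately leaves out.
    """
    rng = random.Random(seed)
    successes = 0
    for _ in range(reps):
        # Each link keeps a random fraction in [degrade_low, 1] of its capacity.
        total = sum(c * rng.uniform(degrade_low, 1.0) for c in link_capacities)
        if total >= demand:
            successes += 1
    return successes / reps


if __name__ == "__main__":
    links = [100.0, 100.0]
    for demand in (110, 140, 170):
        print(demand, capacity_reliability(demand, links))
```

Sweeping `demand` traces out how the estimated reliability falls as the required traffic volume approaches the undegraded capacity.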


Journal ArticleDOI
TL;DR: Empirical evidence collected from information system professionals demonstrated a structure similar to previously published studies with adequate reliability, convergent validity, and discriminant validity for the SERVQUAL instrument.
Abstract: There has been much debate as of late over the use of the SERVQUAL instrument to measure Information Systems service quality. Detractors argue that the difference score leads to unreliable measures and that the dimensionality and validity are erratic. Proponents argue for the diagnostic power of the gap between expectations and perceived delivery while demonstrating some empirical stability and reliability. To extend the discussion requires the examination of the instrument from the viewpoint of the information system professional. Importantly, a large variety of samples must view the instrument and measures in the same light for the instrument to have applicability. Likewise, analysis of differences between users and providers requires that both populations have similar structural views of the instrument. Empirical evidence collected from information system professionals demonstrated a structure similar to previously published studies with adequate reliability, convergent validity, and discriminant validity. The structure is the same as is found for a gap between users and IS professionals.

531 citations


Journal ArticleDOI
TL;DR: A reliable, practical, and easy-to-use method for collecting detailed "street-level" data on physical environmental factors that are potential influences on walking in local neighborhoods is described.

484 citations


Journal ArticleDOI
TL;DR: In this article, the authors present an empirical comparison of several procedures for eliciting consumers' willingness to pay (WTP) that are applicable directly at the point of purchase.
Abstract: Economists, psychologists, and marketing researchers rely on measures of consumers’ willingness to pay (WTP) in estimating demand for private and public goods and in designing optimal price schedules. Existing market research techniques for measuring WTP differ in whether they provide an incentive to consumers to reveal their true WTP and in whether they simulate actual point-of-purchase contexts. The authors present an empirical comparison of several procedures for eliciting WTP that are applicable directly at the point of purchase. In particular, the authors test the applicability of Becker, DeGroot, and Marschak’s (1964) well-known incentive-compatible procedure for assessing the utility of lotteries to measuring consumers’ WTP. In three studies, the authors explore the reliability, validity, and feasibility of the procedure and show that it yields lower WTP estimates than do non-incentive-compatible methods such as open-ended and double-bounded contingent valuation. They show experimentally t...

476 citations


Journal ArticleDOI
TL;DR: In this article, the authors examined the application of neural networks (NN) to reliability-based structural optimization of large-scale structural systems, where the failure of the structural system is associated with the plastic collapse.

471 citations


Journal Article
TL;DR: This culture-specific study shows that this adaptation of the brief form is a good alternative to the long form of the WHOQOL questionnaire for use in Taiwan and the results of validity and reliability testing are shown.

Journal ArticleDOI
TL;DR: In this article, the authors review comparative evaluations, characterizations, and the performance of the homogeneous assays for direct LDL-C determination, with attention to their reliability and specificity, especially in samples with atypical lipoproteins.
Abstract: Background: Because LDL-cholesterol (LDL-C) is a modifiable risk for coronary heart disease, its routine measurement is recommended in the evaluation and management of hypercholesterolemia. We critically examine here the new homogeneous assays for direct determination of LDL-C. Approach: This review relies on published studies and data of the authors using research and routine methods for LDL-C determination. We review experience with methods from their earlier use in lipid research laboratories through the transition to routine clinical testing and the recent development of homogeneous assays. We focus on comparative evaluations and characterizations and the performance of the assays. Content: Homogeneous assays seem to be able to meet current National Cholesterol Education Program (NCEP) requirements for LDL-C testing for precision (CV <4%) and accuracy (bias <4%), when samples collected from nonfasting individuals are used. In addition, all five currently available assays have been certified by the Cholesterol Reference Methods Laboratory Network. The homogeneous methods also appear to better classify individuals into NCEP cutpoints than the Friedewald calculation. However, the limited evaluations to date raise questions about their reliability and specificity, especially in samples with atypical lipoproteins. Conclusions: Available evidence supports recommending the homogeneous assays for LDL-C to supplement the Friedewald calculation in those cases where the calculation is known to be unreliable, e.g., triglycerides >4000 mg/L. Before the homogeneous assays can be confidently recommended to replace the calculation in routine practice, more evaluation is needed.
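
For readers unfamiliar with the Friedewald calculation mentioned in this abstract, the standard formula estimates LDL-C in mg/dL as total cholesterol minus HDL cholesterol minus triglycerides divided by 5. The sketch below encodes that formula and declines to return an estimate above 400 mg/dL of triglycerides, which corresponds to the abstract's 4000 mg/L threshold. The function name and the choice to return `None` are illustrative assumptions, not anything specified in the review.

```python
def friedewald_ldl_mg_dl(total_chol, hdl_chol, triglycerides):
    """Estimate LDL-C (mg/dL) from a lipid panel given in mg/dL.

    Returns None when triglycerides exceed 400 mg/dL (4000 mg/L), the range
    in which the abstract describes the calculation as unreliable.
    """
    if triglycerides > 400:
        return None
    # Triglycerides / 5 approximates VLDL cholesterol in mg/dL.
    return total_chol - hdl_chol - triglycerides / 5.0


if __name__ == "__main__":
    print(friedewald_ldl_mg_dl(200, 50, 150))
    print(friedewald_ldl_mg_dl(260, 40, 450))
```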

Journal ArticleDOI
TL;DR: This paper reports the results from a co-ordinated study, known as the World Wide Failure Exercise, in which 12 of the leading theories for predicting failure in composite laminates were tested against experimental evidence.

Journal ArticleDOI
TL;DR: A new assessment procedure, the Southampton Hand Assessment Procedure (SHAP), is developed that allows contextual results of hand function to be obtained readily in a clinical environment, and its reliability and validity are assessed.

Journal ArticleDOI
TL;DR: A fuzzy-logic-based method for FMEA is presented and a platform for a fuzzy expert assessment is integrated with the proposed system to overcome the potential difficulty in sharing information among experts from various disciplines.

Journal ArticleDOI
TL;DR: In this paper, the authors considered the development and validation of a measurement instrument of brand equity based on the value ascribed to brands by consumers, and the results indicated the existence of four basic dimensions of brand utilities: product functional utility, product symbolic utility, brand name functional utility and brand name symbolic utility.
Abstract: This work considers the development and validation of a measurement instrument of brand equity based on the value ascribed to brands by consumers. The results obtained indicate the existence of four basic dimensions of brand utilities: product functional utility, product symbolic utility, brand name functional utility, and brand name symbolic utility. The various tests employed show a reasonable degree of reliability and validity of the proposed scale for the sports shoes sector.

Journal ArticleDOI
TL;DR: Some support for a simplified approach to measuring SES was found and reliability was high, but the weakest agreement across measures was found when families had one wage earner who was female.
Abstract: This study investigated issues related to commonly used socioeconomic status (SES) measures in 140 participants from three cities (Atlanta, Boston, and Toronto) in two countries (United States and Canada). Measures of SES were two from the United States (four-factor Hollingshead scale, Nakao and Treas scale) and one from Canada (Blishen, Carroll, and Moore scale). Reliability was examined both within (interrater agreement) and across (intermeasure agreement) measures. Interrater reliability and classification agreement were high for the total sample (range r = .86 to .91), as were intermeasure correlations and classification agreement (range r = .81 to .88). The weakest agreement across measures was found when families had one wage earner who was female. Validity data for these SES measures with academic and intellectual measures also were obtained. Some support for a simplified approach to measuring SES was found. Implications of these findings for the use of SES in social and behavioral science research are discussed.

Journal Article
TL;DR: In this article, the authors compared the reliability, validity, and responsiveness of the motor subscale of the functional independence measure (FIM), the original 10 item Barthel index (BI), and the 5 item short form BI (BI-5) in inpatients with stroke receiving rehabilitation.
Abstract: Objectives: To compare the reliability, validity, and responsiveness of the motor subscale of the functional independence measure (FIM), the original 10 item Barthel index (BI), and the 5 item short form BI (BI-5) in inpatients with stroke receiving rehabilitation. Methods: 118 inpatients with stroke at a rehabilitation unit participated in the study. The patients were tested with the FIM motor subscale and original BI at admission to the rehabilitation ward and before discharge from the hospital. The distribution, internal consistency, concurrent validity, and responsiveness of each measure were examined. Results: The BI and FIM motor subscale showed acceptable distribution, high internal consistency (α coefficient ≥ 0.84), high concurrent validity (Spearman's correlation coefficient, rs ≥ 0.92, intraclass correlation coefficient (ICC) ≥ 0.83), and high responsiveness (standardised response mean ≥ 1.2, p < 0.001). The BI-5 exhibited a notable floor effect at admission but this was not found at discharge. The BI-5 showed acceptable internal consistency at admission and discharge (α coefficient ≥ 0.71). The concurrent validity of the BI-5 was poor to fair at admission (rs = 0.74, ICC ≤ 0.55) but was good at discharge (rs ≥ 0.92, ICC ≥ 0.74). It is noted that the responsiveness of the BI-5 was as high as that of the BI and the FIM motor subscale. Conclusions: The results showed that the BI and FIM motor subscale had very acceptable and similar psychometric characteristics. The BI-5 appeared to have limited discriminative ability at admission, particularly for patients with severe disability; otherwise the BI-5 had very adequate psychometric properties. These results may provide information useful in the selection of activities of daily living measures for both clinicians and researchers.
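
Two of the statistics reported in this abstract, the α coefficient for internal consistency and the standardised response mean for responsiveness, have short textbook definitions that can be computed directly. The sketch below uses the usual formulas (Cronbach's alpha from item and total-score variances; mean change divided by the standard deviation of change). The function names and the tiny data sets are made up for illustration and are unrelated to the study's data.

```python
from statistics import mean, stdev, variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of the respondents' total scores.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def standardized_response_mean(admission, discharge):
    """Mean change score divided by the standard deviation of change scores."""
    changes = [d - a for a, d in zip(admission, discharge)]
    return mean(changes) / stdev(changes)


if __name__ == "__main__":
    # Hypothetical scores: 4 patients x 3 items.
    items = [[2, 3, 3], [4, 4, 5], [1, 2, 1], [5, 5, 4]]
    print(cronbach_alpha(items))
    print(standardized_response_mean([10, 12, 9, 14], [15, 16, 15, 17]))
```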

01 Jan 2002
TL;DR: Results suggest that self-monitoring has relevance for understanding many organizational concerns, including job performance and leadership emergence, and theory building and additional research are needed to better understand the construct-related inferences about self-monitoring personality.
Abstract: The validity of self-monitoring personality in organizational settings was examined. Meta-analyses were conducted (136 samples; total N = 23,191) investigating the relationship between self-monitoring personality and work-related variables, as well as the reliability of various self-monitoring measures. Results suggest that self-monitoring has relevance for understanding many organizational concerns, including job performance and leadership emergence. Sample-weighted mean differences favoring male respondents were also noted, suggesting that the sex-related effects for self-monitoring may partially explain noted disparities between men and women at higher organizational levels (i.e., the glass ceiling). Theory building and additional research are needed to better understand the construct-related inferences about self-monitoring personality, especially in terms of the performance, leadership, and attitudes of those at top organizational levels.

Journal ArticleDOI
TL;DR: Data indicate that Jamar and Rolyan dynamometers measure grip strength equivalently and can be used interchangeably, so therapists using the Rolyan dynamometer are justified in using published normative data, which were collected with the Jamar dynamometer.
Abstract: This study compared the Jamar and Rolyan hydraulic dynamometers to determine their concurrent validity with known weights as well as their inter-instrument reliability and concurrent validity for measuring grip strength in a clinical setting. Thirty females and 30 males were tested on these two grip strength measurement devices using a repeated measure design. Results showed that the Jamar and Rolyan dynamometers have acceptable concurrent validity with known weights (that is, correlation coefficients were r ≥ 0.9994), excellent inter-instrument reliability (that is, intraclass correlation coefficients ranged from 0.90 to 0.97) and strong concurrent validity (that is, no significant differences between dynamometers' scores). Data indicate that Jamar and Rolyan dynamometers measure grip strength equivalently and can be used interchangeably. Thus, therapists using the Rolyan dynamometer are justified in using published normative data, which were collected with the Jamar dynamometer.


Journal ArticleDOI
TL;DR: This article reports on the development of a 12-item Likert-type measure of collective efficacy in schools, designed to assess the extent to which a faculty believes in its conjoint capability.
Abstract: The present study reports on the development of a 12-item Likert-type measure of collective efficacy in schools. Designed to assess the extent to which a faculty believes in its conjoint capability...

Journal ArticleDOI
TL;DR: A recent control scheme based on the cumulative quantity between observations of defects has been proposed which can be easily adopted to monitor the failure process for exponentially distributed inter-failure time and can detect process improvement even in a high-reliability environment.
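
One common way to set limits for a chart of this kind, when the monitored quantity between failures is exponentially distributed, is to use equal-tail probability limits from the exponential distribution. The sketch below follows that convention; it is an assumption about how such a chart might be set up, not a reproduction of the paper's scheme. The function names, the known failure rate, and the default false-alarm probability of 0.0027 are all hypothetical.

```python
import math


def exponential_chart_limits(failure_rate, false_alarm=0.0027):
    """Equal-tail probability limits for an exponential inter-failure quantity.

    With F(q) = 1 - exp(-failure_rate * q), the limits are the quantiles at
    false_alarm/2, 0.5, and 1 - false_alarm/2.
    """
    lcl = -math.log(1 - false_alarm / 2) / failure_rate
    center = math.log(2) / failure_rate
    ucl = -math.log(false_alarm / 2) / failure_rate
    return lcl, center, ucl


def classify(observation, limits):
    """Label one inter-failure observation against the chart limits."""
    lcl, _, ucl = limits
    if observation > ucl:
        return "possible improvement"    # unusually long gap between failures
    if observation < lcl:
        return "possible deterioration"  # unusually short gap between failures
    return "in control"


if __name__ == "__main__":
    limits = exponential_chart_limits(failure_rate=0.001)
    for gap in (0.5, 800.0, 7000.0):
        print(gap, classify(gap, limits))
```

Because a point above the upper limit signals an unusually long run without failure, a chart built this way can flag improvement as well as deterioration, which matches the property highlighted in the summary.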

Book
01 Jan 2002
TL;DR: The "Communication and Symbolic Behavior Scales Developmental Profile (CSBS DP)" is an easy-to-use, norm-referenced screening and evaluation tool that measures the communicative competence of children with a functional communication age of 6 to 24 months and a chronological age of 6 months to 6 years.
Abstract: This helpful manual guides professionals through the process of administering, scoring, and interpreting the "Communication and Symbolic Behavior Scales Developmental Profile (CSBS DP)," an easy-to-use, norm-referenced screening and evaluation tool that measures the communicative competence of children with a functional communication age of 6 to 24 months and a chronological age of 6 months to 6 years. The manual includes: information on how and why "CSBS DP" was developed and refined; detailed, step-by-step instructions on how to administer and score each part of "CSBS DP" (the "Infant-Toddler Checklist," the "Caregiver Questionnaire," and the "Behavior Sample"); a chapter on the technical characteristics of "CSBS DP," including standardization, reliability, and validity; helpful tips on putting caregivers at ease and encouraging the most communication from very young children; an extensive companion to the "CSBS DP" tutorial videotapes, including completed Behavior Samples for the six children shown and comments on the sampling and scoring decisions on the forms; and guidelines, case studies, and sample letters to parents that help professionals interpret and report the results of "CSBS DP." With the clear instructions in this manual, reinforced by practical tips, charts, case studies, and scoring practice, professionals will use "CSBS DP" accurately and confidently with the children and families they serve. Available separately or as part of the "CSBS DP Complete Kit" are the other materials required to conduct a "CSBS DP" assessment. This manual is part of "CSBS DP," an easy-to-use, norm-referenced screening and evaluation tool that helps determine the communicative competence (use of eye gaze, gestures, sounds, words, understanding, and play) of young children. "CSBS DP" is an ideal starting point for IFSP planning and can be used as a guide to indicate areas that need further assessment.

Book
12 Aug 2002
TL;DR: This edited volume covers basic concepts in score reliability, the nature of reliability, and reliability induction and reporting practices, including guidelines for authors reporting score reliability estimates, a brief introduction to generalizability theory, and a note on the frequency of use of various types of reliability methods.
Abstract:
Preface
Part 1 Basic Concepts in Score Reliability
Ch 1 Understanding Reliability and Coefficient Alpha, Really - Bruce Thompson
Exercises - Bruce Thompson
Ch 2 Correcting Effect Sizes for Score Reliability - Frank Baugh
Ch 3 A Brief Introduction to Generalizability Theory - Bruce Thompson
Ch 4 Reliability Methods: A Note on the Frequency of Use of Various Types - Thomas P. Hogan, Amy Benjamin, and Kristen L. Brezinski
Ch 5 Confidence Intervals About Score Reliability Coefficients - Xitao Fan and Bruce Thompson
Exercises - Bruce Thompson
Part 2 The Nature of Reliability
Ch 6 Guidelines for Authors Reporting Score Reliability Estimates - Bruce Thompson
Ch 7 Reliability as Psychometrics versus Datametrics - Shlomo S. Sawilowsky
Ch 8 Psychometrics is Datametrics: The Test is Not Reliable - Bruce Thompson and Tammi Vacha-Haase
Ch 9 Reliability: Rejoinder to Thompson and Vacha-Haase - Shlomo S. Sawilowsky
Part 3 Reliability Induction and Reporting Practices
Ch 10 Sample Compositions and Variabilities in Published Studies versus Those in Test Manuals: Validity of Score Reliability Inductions - Tammi Vacha-Haase, Lori R. Kogan, and Bruce Thompson
Ch 11 How Well Do Researchers Report Their Measures?: An Evaluation of Measurement in Published Educational Research - Dale Whittington
Ch 12 The Degree of Congruence Between Test Standards and Test Documentation Within Journal Publications - Audrey L. Qualls and Angela D. Moss
Exercises - Bruce Thompson
Ch 13 Reliability Generalization: Exploring Variance in Measurement Error Affecting Score Reliability Across Studies - Tammi Vacha-Haase
Ch 14 Assessing the Reliability of Beck Depression Inventory Scores: Reliability Generalization Across Studies - Ping Yin and Xitao Fan
Ch 15 Measurement Error in "Big Five Factors" Personality Assessment: Reliability Generalization Across Studies and Measures - Chockalingam Viswesvaran and Deniz S. Ones
Ch 16 Reliability Generalization of the NEO Personality Scales - John C. Caruso
Exercises - Bruce Thompson
Index
About the Author

Journal ArticleDOI
TL;DR: The APARQ has acceptable to good reliability and acceptable validity, but further validation using other methods and in other population groups is required.
Abstract: BOOTH, M. L., A. D. OKELY, T. CHEY, and A. BAUMAN. The reliability and validity of the Adolescent Physical Activity Recall Questionnaire. Med. Sci. Sports Exerc., Vol. 34, No. 12, pp. 1986–1995, 2002. Purpose: This study assessed the test-retest reliability and validity of the Adolescent Physic...

Journal ArticleDOI
TL;DR: In this article, a study on the probabilistic methodology for the estimation of the remaining life of pressurized pipelines containing active corrosion defects is presented, which is carried out using several already published failure pressure models.

Journal ArticleDOI
TL;DR: In this article, the spectral stochastic finite element method (SSFEM) is considered in conjunction with the first-order reliability method (FORM) and with importance sampling for finite element reliability analysis.

Journal ArticleDOI
TL;DR: In this article, the authors use decision making models to understand the behavior of different agreement metrics, such as observed agreement and specific agreement, in reliability studies that involve categorical data.

Journal ArticleDOI
TL;DR: In this paper, the SERVQUAL instrument is modified for a service setting, empirically tested, and confirmed to be appropriate for measuring internal service quality, with all five dimensions (reliability, assurance, tangibles, empathy, responsiveness) found to be distinct and conceptually clear.
Abstract: Internal marketing is an important approach for fostering a service‐ and customer‐oriented culture in an organization. A critical component of internal marketing is the provision of internal service quality. While researchers have conducted studies of internal service quality, there has been no general agreement on the measurement of the concept. Work to date has attempted to use the SERVQUAL instrument as a tool for measuring internal service quality. Researchers have not, however, demonstrated that the instrument can be reasonably modified to measure internal service quality. The current study modified the SERVQUAL instrument for a service setting and empirically tested and confirmed that it is appropriate for measuring internal service quality. While previous research has not confirmed the validity and reliability of all five SERVQUAL dimensions in a service setting, the results of the current study confirmed that all five dimensions – reliability, assurance, tangibles, empathy, responsiveness – were distinct and conceptually clear.