Showing papers on "Reliability (statistics)" published in 2011


Journal ArticleDOI
TL;DR: The authors developed guidelines for reporting reliability and agreement studies, covering interrater and intrarater reliability and agreement, and proposed 15 issues that should be addressed when reporting such studies.

1,605 citations


Book
01 Aug 2011
TL;DR: This book covers the development of a measurement instrument, field testing (item reduction and data structure), and systematic reviews of measurement properties.
Abstract: 1. Introduction 2. Concepts, theories and models, and types of measurements 3. The development of a measurement instrument 4. Field testing - item reduction and data structure 5. Reliability 6. Validity 7. Responsiveness 8. Interpretation 9. Systematic reviews of measurement properties Index.

1,262 citations


Journal ArticleDOI
TL;DR: An iterative approach based on Monte Carlo simulation and a Kriging metamodel (AK-MCS) is proposed to assess the reliability of structures more efficiently; the probability of failure obtained with AK-MCS is shown to be very accurate with only a small number of calls to the performance function.

1,234 citations
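
Note on the entry above: the "calls to the performance function" that AK-MCS tries to save are easiest to see in plain Monte Carlo simulation, where every sample costs one call. The sketch below is only that crude baseline, not AK-MCS itself (no Kriging metamodel is built). The limit-state function `performance` and both input distributions are hypothetical choices made for illustration.

```python
import random


def performance(resistance, load):
    """Hypothetical limit-state function: negative values mean failure."""
    return resistance - load


def estimate_failure_probability(n_samples, seed=0):
    """Crude Monte Carlo estimate of P(performance < 0).

    Each sample costs one call to the performance function, which is the
    cost that metamodel-based methods aim to reduce.
    """
    rng = random.Random(seed)
    failures = 0
    for _ in range(n_samples):
        resistance = rng.gauss(5.0, 1.0)  # assumed distribution, illustration only
        load = rng.gauss(2.0, 1.0)        # assumed distribution, illustration only
        if performance(resistance, load) < 0:
            failures += 1
    return failures / n_samples


if __name__ == "__main__":
    print(estimate_failure_probability(100_000))
```

With these assumed inputs the exact failure probability is about 0.017, so a stable estimate needs tens of thousands of calls; that is affordable here but not when each call is an expensive structural model.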


Journal ArticleDOI
TL;DR: A measure, the Relationship Structures questionnaire of the Experiences in Close Relationships-Revised (ECR-RS), designed to assess attachment dimensions in multiple contexts is reported, and it is shown that ECR-RS scores are reliable and have a structure similar to those produced by other measures.
Abstract: Most research on adult attachment is based on the assumption that working models are relatively general and trait-like. Recent research, however, suggests that people develop attachment representations that are relationship-specific, leading people to hold distinct working models in different relationships. The authors report a measure, the Relationship Structures questionnaire of the Experiences in Close Relationships-Revised (ECR-RS; R. C. Fraley, N. G. Waller, & K. A. Brennan, 2000), that is designed to assess attachment dimensions in multiple contexts. Based on a sample of over 21,000 individuals studied online, it is shown that ECR-RS scores are reliable and have a structure similar to those produced by other measures. In Study 2 (N = 388), it is shown that relationship-specific measures of attachment generally predict intra- and interpersonal outcomes better than broader attachment measures but that broader measures predict personality traits better than relationship-specific measures. Moreover, it is demonstrated that differentiation in working models is not related to psychological outcomes independently of mean levels of security.

814 citations


Journal ArticleDOI
TL;DR: The iteratively developed magnetic resonance imaging Osteoarthritis Knee Score shows very good to excellent reliability for the large majority of features assessed and further iterative development and research will include assessment of its validation and responsiveness.

677 citations


Book
25 Jan 2011
TL;DR: This book discusses applications of the Delphi technique in nursing and health research, analysing data from a Delphi and reporting results, and reliability and validity.
Abstract: Preface. Acknowledgements. 1 The Delphi Technique. 2 Debates, Criticisms and Limitations of the Delphi. 3 Applications of the Delphi in Nursing and Health Research. 4 How to Get Started with the Delphi Technique. 5 Conducting the Research Using the Delphi Technique. 6 Analysing Data from a Delphi and Reporting Results. 7 Reliability and Validity. 8 Ethical Considerations. 10 A Modified Delphi Case Study. 11 e-Delphi Case Study. Annotated Bibliography. References. Index.

622 citations


Journal ArticleDOI
TL;DR: The reliability of the MinimaxX accelerometer is acceptable both within and between devices under controlled laboratory conditions, and between devices during field testing, suggesting that accelerometers can detect changes or differences in physical activity during Australian football.
Abstract: Purpose: To assess the reliability of triaxial accelerometers as a measure of physical activity in team sports. Methods: Eight accelerometers (MinimaxX 2.0, Catapult, Australia) were attached to a hydraulic universal testing machine (Instron 8501) and oscillated over two protocols (0.5 g and 3.0 g) to assess within- and between-device reliability. A static assessment was also conducted. Secondly, 10 players were instrumented with two accelerometers during Australian football matches. The vector magnitude was calculated, expressed as Player load and assessed for reliability using typical error (TE) ± 90% confidence intervals (CI), and expressed as a coefficient of variation (CV%). The smallest worthwhile difference (SWD) in Player load was calculated to determine if the device was capable of detecting differences in physical activity. Results: Laboratory: Within- (Dynamic: CV 0.91 to 1.05%; Static: CV 1.01%) and between-device (Dynamic: CV 1.02 to 1.04%; Static: CV 1.10%) reliability was acceptable across ...

484 citations
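
Note on the entry above: the abstract reports reliability as typical error (TE) expressed as a coefficient of variation (CV%), but does not give the formulas. A minimal sketch, assuming the common definition of TE as the standard deviation of between-trial differences divided by the square root of 2, and CV% as TE relative to the grand mean; the sample readings in the usage example are hypothetical.

```python
import math
import statistics


def typical_error(trial1, trial2):
    """Typical error: SD of the paired differences divided by sqrt(2)."""
    diffs = [b - a for a, b in zip(trial1, trial2)]
    return statistics.stdev(diffs) / math.sqrt(2)


def cv_percent(trial1, trial2):
    """Typical error expressed as a percentage of the grand mean."""
    grand_mean = statistics.mean(list(trial1) + list(trial2))
    return 100.0 * typical_error(trial1, trial2) / grand_mean


if __name__ == "__main__":
    # Hypothetical paired readings from two devices (or two trials)
    device_a = [100.0, 102.0, 98.0, 101.0]
    device_b = [101.0, 101.0, 99.0, 103.0]
    print(typical_error(device_a, device_b), cv_percent(device_a, device_b))
```

A constant offset between the two series does not change the typical error, because only the spread of the differences enters the calculation.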


Journal ArticleDOI
TL;DR: The objective was to develop guidelines for reporting reliability and agreement studies and the proposed guidelines intend to improve the quality of reporting.

481 citations


Book ChapterDOI
28 Jan 2011

440 citations


Journal ArticleDOI
TL;DR: In this paper, the authors highlight recent accomplishments in the life-cycle performance assessment, maintenance, monitoring, management and optimisation of structural systems under uncertainty, and identify challenges in this area.
Abstract: Our knowledge to model, analyse, design, maintain, monitor, manage, predict and optimise the life-cycle performance of structures and infrastructures under uncertainty is continually growing. However, in many countries, including the United States, the civil infrastructure is no longer within desired levels of performance and safety. Decisions regarding civil infrastructure systems should be supported by an integrated reliability-based life-cycle multi-objective optimisation framework by considering, among other factors, the likelihood of successful performance and the total expected cost accrued over the entire life-cycle. The primary objective of this paper is to highlight recent accomplishments in the life-cycle performance assessment, maintenance, monitoring, management and optimisation of structural systems under uncertainty. Challenges are also identified.

379 citations


Journal ArticleDOI
TL;DR: The aim of the present paper is to develop a strategy for solving reliability-based design optimization (RBDO) problems that remains applicable when the performance models are expensive to evaluate.
Abstract: The aim of the present paper is to develop a strategy for solving reliability-based design optimization (RBDO) problems that remains applicable when the performance models are expensive to evaluate. Starting with the premise that simulation-based approaches are not affordable for such problems, and that the most-probable-failure-point-based approaches do not permit to quantify the error on the estimation of the failure probability, an approach based on both metamodels and advanced simulation techniques is explored. The kriging metamodeling technique is chosen in order to surrogate the performance functions because it allows one to genuinely quantify the surrogate error. The surrogate error onto the limit-state surfaces is propagated to the failure probabilities estimates in order to provide an empirical error measure. This error is then sequentially reduced by means of a population-based adaptive refinement technique until the kriging surrogates are accurate enough for reliability analysis. This original refinement strategy makes it possible to add several observations in the design of experiments at the same time. Reliability and reliability sensitivity analyses are performed by means of the subset simulation technique for the sake of numerical efficiency. The adaptive surrogate-based strategy for reliability estimation is finally involved into a classical gradient-based optimization algorithm in order to solve the RBDO problem. The kriging surrogates are built in a so-called augmented reliability space thus making them reusable from one nested RBDO iteration to the other. The strategy is compared to other approaches available in the literature on three academic examples in the field of structural mechanics.

Journal ArticleDOI
TL;DR: In this article, a multi-objective function is presented to determine the optimal locations for distributed generators (DGs) in a distribution system, so as to minimize power loss and improve reliability and the voltage profile.

Journal ArticleDOI
TL;DR: This survey is designed for scholars and IT professionals approaching this field, reviewing existing tools and providing a view on the past, the present and the future of digital image forensics.
Abstract: Digital visual media represent nowadays one of the principal means for communication. Lately, the reliability of digital visual information has been questioned, due to the ease in counterfeiting both its origin and content. Digital image forensics is a brand new research field which aims at validating the authenticity of images by recovering information about their history. Two main problems are addressed: the identification of the imaging device that captured the image, and the detection of traces of forgeries. Nowadays, thanks to the promising results attained by early studies and to the always growing number of applications, digital image forensics represents an appealing investigation domain for many researchers. This survey is designed for scholars and IT professionals approaching this field, reviewing existing tools and providing a view on the past, the present and the future of digital image forensics.

Journal ArticleDOI
01 Mar 2011-Pain
TL;DR: It is concluded that standardized QST performed by trained examiners is a valuable diagnostic instrument with good test–retest and interobserver reliability within 2 days, suggesting that disease‐related systematic variance enhances reliability of QST.
Abstract: Quantitative sensory testing (QST) is an instrument to assess positive and negative sensory signs, helping to identify mechanisms underlying pathologic pain conditions. In this study, we evaluated the test-retest reliability (TR-R) and the interobserver reliability (IO-R) of QST in patients with sensory disturbances of different etiologies. In 4 centres, 60 patients (37 male and 23 female, 56.4±1.9 years) with lesions or diseases of the somatosensory system were included. QST comprised 13 parameters including detection and pain thresholds for thermal and mechanical stimuli. QST was performed in the clinically most affected test area and a less or unaffected control area in a morning and an afternoon session on 2 consecutive days by examiner pairs (4 QSTs/patient). For both TR-R and IO-R, there were high correlations (r=0.80-0.93) at the affected test area, except for wind-up ratio (TR-R: r=0.67; IO-R: r=0.56) and paradoxical heat sensations (TR-R: r=0.35; IO-R: r=0.44). Mean IO-R (r=0.83, 31% unexplained variance) was slightly lower than TR-R (r=0.86, 26% unexplained variance, P<.05); the difference in variance amounted to 5%. There were no differences between study centres. In a subgroup with an unaffected control area (n=43), reliabilities were significantly better in the test area (TR-R: r=0.86; IO-R: r=0.83) than in the control area (TR-R: r=0.79; IO-R: r=0.71, each P<.01), suggesting that disease-related systematic variance enhances reliability of QST. We conclude that standardized QST performed by trained examiners is a valuable diagnostic instrument with good test-retest and interobserver reliability within 2 days. With standardized training, observer bias is much lower than random variance.
Quantitative sensory testing performed by trained examiners is a valuable diagnostic instrument with good interobserver and test-retest reliability for use in patients with sensory disturbances of different etiologies to help identify mechanisms of neuropathic and non-neuropathic pain.
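
Note on the entry above: the "unexplained variance" percentages quoted in the abstract are consistent with 1 − r² applied to the reported mean correlations. The short check below reproduces the arithmetic; the function name is an illustrative choice, and the abstract itself does not state how the percentages were computed.

```python
def unexplained_variance(r):
    """Share of variance not explained by a correlation r, i.e. 1 - r**2."""
    return 1.0 - r ** 2


if __name__ == "__main__":
    io_r = unexplained_variance(0.83)  # interobserver reliability
    tr_r = unexplained_variance(0.86)  # test-retest reliability
    print(round(100 * io_r), round(100 * tr_r), round(100 * (io_r - tr_r)))
    # prints: 31 26 5
```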

Journal ArticleDOI
TL;DR: In this paper, a stochastic response surface method for reliability analysis involving correlated non-normal random variables, in which the Nataf transformation is adopted to effectively transform the correlated nonnormal variables into independent standard normal variables, is presented.
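
Note on the entry above: the core of a Nataf-type transformation can be sketched for two variables. Each variable is first mapped through its own marginal CDF to a standard normal value, and the resulting pair is then decorrelated. This is an illustrative sketch only, not the paper's method: the function name and arguments are hypothetical, and the correlation `rho_z` in standard normal space is assumed to be known already (in general it differs from the correlation of the original variables and has to be computed separately).

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def to_independent_normals(x1, x2, cdf1, cdf2, rho_z):
    """Map two correlated non-normal values to independent standard normals.

    cdf1 and cdf2 are the marginal CDFs of the two variables; rho_z is the
    (assumed known) correlation of the pair in standard normal space.
    """
    # Step 1: marginal transformation to correlated standard normals.
    # inv_cdf requires probabilities strictly between 0 and 1.
    y1 = _STD_NORMAL.inv_cdf(cdf1(x1))
    y2 = _STD_NORMAL.inv_cdf(cdf2(x2))
    # Step 2: decorrelate (inverse of a 2x2 Cholesky factor).
    u1 = y1
    u2 = (y2 - rho_z * y1) / math.sqrt(1.0 - rho_z ** 2)
    return u1, u2


if __name__ == "__main__":
    def exponential_cdf(x):
        return 1.0 - math.exp(-x)  # unit-rate exponential marginal

    print(to_independent_normals(0.5, 2.0, exponential_cdf, exponential_cdf, 0.3))
```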

Journal ArticleDOI
TL;DR: In this paper, the authors evaluate the reliability and validity of methods used to assess the multiple components of sedentary behaviour (i.e. screen time, sitting, not moving and existing at low energy expenditure) in children and adolescents.
Abstract: The aim of this review was to evaluate the reliability and validity of methods used to assess the multiple components of sedentary behaviour (i.e. screen time, sitting, not moving and existing at low energy expenditure) in children and adolescents. Twenty-six studies met our inclusion criteria and were reviewed. Thirteen studies reported the reliability of self- and proxy-report measures of sedentary behaviour and seven of these were found to have acceptable test-retest reliability. Evidence for the criterion validity of self- and proxy-report measures was examined in three studies with mixed results. Seven studies examined the reliability and/or validity of direct observation and the findings were generally positive. Five studies demonstrated the utility of accelerometers to accurately classify sedentary behaviour. Self-report measures provide reliable estimates of screen time, yet their validity remains largely untested. While accelerometers can accurately classify participants' behaviour as sedentary, they do not provide information about type of sedentary behaviour or context. Studies utilizing measures of sedentary behaviour need to more adequately report on the validity and reliability of the measures used. We recommend the use of objective measures of sedentary behaviour such as accelerometers, in conjunction with subjective measures (e.g. self-report), to assess type and context of behaviour.

Book
14 Sep 2011
TL;DR: Software engineering management encompasses two major functions, planning and control, both of which require the capability to accurately and reliably measure the software being delivered.
Abstract: Software engineering management encompasses two major functions, planning and control, both of which require the capability to accurately and reliably measure the software being delivered. Planning of software development projects emphasizes estimation of appropriate budgets and schedules. Control of software development requires a means to measure progress on the project and to perform after-the-fact evaluations of the project, for example, to evaluate the effectiveness of the tools and techniques employed on the project to improve productivity.

Journal ArticleDOI
TL;DR: A new refined definition of soft biometrics is presented, emphasizing the aspect of human compliance, and candidate traits that fit this novel definition are identified.
Abstract: In this work we seek to provide insight into the general topic of soft biometrics. We first present a new refined definition of soft biometrics, emphasizing the aspect of human compliance, and then proceed to identify candidate traits that fit this novel definition. We then address relations between traits and discuss associated benefits and limitations of these traits. We also consider two novel soft biometric traits, namely weight and color of clothes, and we analyze their reliability. Related promising results on the performance are provided. Finally, we consider a new application, namely human identification carried out solely by a bag of facial, body and accessory soft biometric traits, and as evidence of its practicality, we provide preliminary promising results.

Journal ArticleDOI
Chen Jiang, Xu Han, G. Y. Lu, Jie Liu, Zhang Zhe, Y. C. Bai
TL;DR: In this article, a non-probabilistic convex model is proposed that constructs multidimensional ellipsoids to describe the uncertainty; a covariance matrix and a correlation matrix are built from the marginal convex models and covariances.

Journal ArticleDOI
TL;DR: In this paper, the authors used principal component analysis (PCA), reliability analyses, and linear regression analysis to study the relationship between personal, social and building factors and perceived comfort.

Journal ArticleDOI
TL;DR: This work presents an approach to constructing surrogate models that simultaneously addresses the issues of accuracy, efficiency, and unimportant failure modes, and it is demonstrated to be both an efficient and accurate method for system-level reliability analysis.

Journal ArticleDOI
TL;DR: Figures indicate good overall levels of internal consistency, inter-rater and test-retest reliability, but some HRSD items do not appear to possess a satisfactory reliability.
Abstract: The aim of this study was to provide a comprehensive meta-analytic review of the reliability of the Hamilton Rating Scale for Depression (HRSD) for the period 1960-2008, taking into consideration all three types of reliability: internal consistency, inter-rater, and test-retest reliability. This is the first such meta-analytic study of a clinician-administered psychiatric scale. A thorough literature search was conducted using MEDLINE and PsycINFO. The total number of collected articles was 5548, of which 409 reported one or more reliability coefficients. The effect size was obtained by the z-transformation of reliability coefficients. The meta-analysis was performed separately for internal consistency, inter-rater and test-retest reliability. A pooled mean for alpha coefficient in random effects model was 0.789 (95%CI 0.766-0.810). The meta-regression analysis revealed that higher alpha coefficients were associated with higher variability of the HRSD total scores. With regard to inter-rater reliability, pooled means in random effects model were 0.937 (95%CI 0.914-0.954) for the intraclass correlation coefficient, 0.81 (95%CI 0.72-0.88) for the kappa coefficient, 0.94 (95%CI 0.90-0.97) for the Pearson correlation coefficient, and 0.91 (95%CI 0.78-0.96) for the Spearman rank correlation coefficient. A meta-regression analysis showed positive association between inter-rater reliability and publication year. Test-retest reliability of HRSD ranged between 0.65 and 0.98 and generally decreased with extending the interval between two measurements (Spearman r between the duration of interval and test-retest reliability figures=-0.74). Results suggest that HRSD provides a reliable assessment of depression. Figures indicate good overall levels of internal consistency, inter-rater and test-retest reliability, but some HRSD items (e.g., "loss of insight") do not appear to possess a satisfactory reliability.
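
Note on the entry above: the abstract says the effect size was obtained by the z-transformation of reliability coefficients. A minimal sketch of that idea, assuming Fisher's z-transformation and a simple fixed-effect pooling with weights n − 3; the study itself used a random-effects model, which this sketch does not implement. The (coefficient, sample size) pairs in the usage example are hypothetical.

```python
import math


def pool_correlations(studies):
    """Pool correlation-type coefficients via Fisher's z (fixed-effect sketch).

    studies: iterable of (r, n) pairs with -1 < r < 1 and n > 3.
    Returns (pooled_r, (ci_low, ci_high)) for an approximate 95% interval.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for r, n in studies:
        z = math.atanh(r)   # Fisher z-transformation
        weight = n - 3      # inverse of the sampling variance of z
        weighted_sum += weight * z
        total_weight += weight
    z_bar = weighted_sum / total_weight
    se = 1.0 / math.sqrt(total_weight)
    low, high = z_bar - 1.96 * se, z_bar + 1.96 * se
    # Back-transform from the z scale to the correlation scale
    return math.tanh(z_bar), (math.tanh(low), math.tanh(high))


if __name__ == "__main__":
    # Hypothetical inter-rater coefficients and sample sizes
    print(pool_correlations([(0.92, 40), (0.95, 60), (0.90, 25)]))
```

Pooling on the z scale and back-transforming keeps the interval inside (−1, 1) and makes it asymmetric around the pooled value, which matches the asymmetric intervals reported in the abstract.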

Journal ArticleDOI
TL;DR: This review of Duane Alwin's Margins of Error describes how the book assesses the level of unreliability (random measurement error) in individual survey items found in general-population surveys, on which much scholarship in sociology and kindred fields depends, and notes Alwin's suggestion that respondents may better comprehend short questions and more readily access and retrieve information needed to answer factual ones.
Abstract: This important book assesses the level of unreliability (random measurement error) in individual survey items found in general-population surveys, on which much scholarship in sociology and kindred fields depends. Duane Alwin aspires to reduce measurement error at its source by identifying less error-prone methods of constructing and administering surveys. His study contributes to understanding survey quality by showing how reliability varies with item content and instrument design; many findings provide empirical grounding for well-established survey practices, while others suggest that some common data collection protocols may heighten error. The study rests on an original, unique data base of reliability estimates for nearly 500 individual survey items drawn from longitudinal surveys representing well-defined populations. Questions included measure both basic sociodemographic facts and subjective phenomena (beliefs, attitudes, self-perceptions). Alwin coded item properties (number of response alternatives, length), question content (factual or nonfactual), and survey context (inclusion in a topical series or "battery" of related questions, ordinal position within a questionnaire), and then assessed associations between these design features and reliability. As befits a study of data quality, much of Margins of Error justifies the measurement of its dependent variable, item reliability. Three chapters that outline and critique extant approaches to reliability assessment can be read profitably on their own. But the key here is that Alwin seeks reliability measures for single survey items, not composite scales. He stresses the distinction between multiple measures (verbatim replicated items) and multiple indicators (distinct items related to a common underlying construct).
He finds widely-applied "internal consistency" approaches based on classical test score theory (coefficient α) wanting, because they estimate the reliability of multiple-indicator composites rather than individual items, and because such composites need not be "univocal", that is, they combine indicators that often have imperfectly correlated true scores. A particular difficulty is that those methods understate item reliability by classifying stable, but measure-specific, variance in a survey response as erroneous rather than reliable. Alwin argues that cross-sectional designs cannot adequately estimate the reliability of single items, because respondent memory raises correlations among multiple measures or indicators. He advocates longitudinal designs that administer identically worded questions on at least three occasions, suggesting that those measurements be separated by intervals of up to two years to avoid memory-induced inflation of reliability estimates. When these demanding data requirements are met, suitable analytic methods can distinguish reliability and stability, and incorporate stable item-specific variance within true score variance. Many results substantiate widely-used and -taught guidelines for constructing survey instruments. For example, reliability tends to be higher for factual questions than for items measuring subjective content, for self-reports than for proxy responses about others, and (usually) for shorter questions. In keeping with much recent methodological research on survey data, Alwin invokes cognitive considerations to interpret such associations; he suggests, for instance, that respondents may better comprehend short questions, and more readily access and retrieve information needed to answer factual ones. Of particular note is Alwin’s finding that the widespread survey practice of presenting items in batteries (sets of consecutive questions using the same response format) tends

Journal ArticleDOI
TL;DR: A comparative analysis of SVM effectiveness in forecasting time-to-failure and reliability of engineered components based on time series data shows that in the analyzed cases, SVM outperforms or is comparable to other techniques.

Journal ArticleDOI
TL;DR: This is the first study to present test-retest reliability data on the self-reported OPUS scales, the PSFS in people with lower-limb amputations, and a new, easier-to-use scoring mechanism for the PEQ.
Abstract: Background: Use of outcome measures to examine outcomes of amputation is complicated by a number of factors, including ease of administration and lack of scientific evidence to guide selection and interpretation. Objective: The purposes of this study were: (1) to estimate test-retest reliability of a modified version of the Prosthetic Evaluation Questionnaire (PEQ), scales of a version of the 36-Item Short-Form Health Survey questionnaire adapted for the veteran population (SF-36V), the Orthotics and Prosthetics Users' Survey (OPUS), the Patient-Specific Functional Scale (PSFS), the Two-Minute Walk Test, the Six-Minute Walk Test, the Timed “Up & Go” Test, and the Amputee Mobility Predictor; (2) to calculate minimal detectable change (MDC) of each measure; and (3) to conduct item analysis of the modified PEQ. Design: This was a multi-site study with repeated measurements. Methods: Forty-four patients with unilateral lower-limb amputation participated. Participants were tested twice within 1 week. We calculated test-retest reliability of each measure using intraclass correlation coefficient (ICC [2,1]), estimated standard error of the measurement and MDC, and assessed scale score distribution. Results: The study demonstrated strong test-retest reliability scores of performance measures (ICC=.83–.97), suggesting that these measures are good choices for evaluation of people with lower-limb amputation. Reliability of PEQ subscales (ICC=.41–.93) was comparable to that reported in the literature (ICC=.56–.90). Limitations: This study examined only statistically measurable differences and did not evaluate whether changes in scores were clinically important. Conclusions: Minimal detectable change scores can be used to determine whether change in test scores exceeds measurement error associated with day-to-day variation.
This is the first study to present test-retest reliability data on the self-reported OPUS scales, the PSFS in people with lower-limb amputations, and a new, easier-to-use scoring mechanism for the PEQ.
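
Note on the entry above: the abstract mentions the standard error of the measurement (SEM) and minimal detectable change (MDC) but does not give the formulas or the confidence level used. A minimal sketch, assuming the commonly used definitions SEM = SD × sqrt(1 − ICC) and MDC = z × sqrt(2) × SEM with z = 1.96 for a 95% level; the SD and ICC values in the usage example are hypothetical.

```python
import math


def standard_error_of_measurement(sd, icc):
    """SEM from the between-subject SD and a reliability coefficient (ICC)."""
    return sd * math.sqrt(1.0 - icc)


def minimal_detectable_change(sd, icc, z=1.96):
    """MDC: smallest change exceeding measurement error at the level set by z."""
    return z * math.sqrt(2.0) * standard_error_of_measurement(sd, icc)


if __name__ == "__main__":
    # Hypothetical walk-test scores: SD of 10 units, ICC of 0.91
    print(standard_error_of_measurement(10.0, 0.91))
    print(minimal_detectable_change(10.0, 0.91))
```

A change in an individual's score smaller than the MDC cannot be distinguished from day-to-day measurement error under these assumptions, which is the use described in the abstract's conclusions.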

Journal ArticleDOI
TL;DR: This paper presents a helpful tool for readers who want to evaluate or assess the quality of a measurement instrument on reliability and validity using standardised criteria that were recently published by the COSMIN group.
Abstract: High quality instruments are useful tools for clinical and research purposes. To determine whether an instrument has high quality, measurement properties such as reliability and validity need to be assessed, using standardised criteria. This paper discusses these quality domains and measurement properties using the standardised criteria that were recently published by the COSMIN group. Examples are given of studies evaluating the measurement properties of instruments frequently used in trauma. This paper presents a helpful tool for readers who want to evaluate or assess the quality of a measurement instrument on reliability and validity.

Book
31 Aug 2011
TL;DR: The authors report findings on the reliability of measures used in industrial marketing research surveys to identify the structure of buying groups, including results obtained in a pilot study.
Abstract: The authors report findings bearing on the reliability of measures used in industrial marketing research surveys to identify the structure of buying groups. Results obtained in a pilot study of the...

Journal ArticleDOI
TL;DR: This paper proposes to use a bivariate Birnbaum–Saunders distribution and its marginal distributions to approximate the reliability function of a product that has two dependent performance characteristics whose degradation can be modeled by gamma processes.

Journal ArticleDOI
TL;DR: Results indicate that a trial-based functional analysis may be a viable assessment method when resources needed to conduct a standard functional analysis are unavailable.
Abstract: We evaluated a trial-based approach to conducting functional analyses in classroom settings. Ten students referred for problem behavior were exposed to a series of assessment trials, which were interspersed among classroom activities throughout the day. Results of these trial-based functional analyses were compared to those of more traditional functional analyses. Outcomes of both assessments showed correspondence in 6 of the 10 cases and partial correspondence in a 7th case. Results of the standard functional analysis suggested reasons for obtained differences in 2 cases of noncorrespondence, which were verified when portions of the trial-based functional analyses were modified and repeated. These results indicate that a trial-based functional analysis may be a viable assessment method when resources needed to conduct a standard functional analysis are unavailable. Implications for classroom-based assessment methodologies and future directions for research are discussed.

Journal ArticleDOI
TL;DR: In this article, the authors examined the sustainable tourism attitude scale (SUS-TAS), which measures residents' attitudes toward sustainable tourism and identified a shorter version of the scale that would not compromise its psychometric properties.
Abstract: This project examined the Sustainable Tourism Attitude Scale (SUS-TAS), which measures residents’ attitudes toward sustainable tourism. This study has two major purposes: (1) to reassess reliability and construct validity of the 44-item SUS-TAS using confirmatory factor analysis and (2) to identify a shorter version of the SUS-TAS that would not compromise the scale’s psychometric properties. To accomplish these purposes, an empirical study was conducted in rural Orange County, Indiana. Findings support a seven-dimension SUS-TAS model using 27 items that maintained construct validity and internal consistency.