
Showing papers on "Reliability (statistics)" published in 2017


Proceedings ArticleDOI
21 Jul 2017
TL;DR: The channel and spatial reliability concepts are introduced to DCF tracking, and a novel learning algorithm is provided for their efficient and seamless integration in the filter update and the tracking process.
Abstract: Short-term tracking is an open and challenging problem for which discriminative correlation filters (DCF) have shown excellent performance. We introduce the channel and spatial reliability concepts to DCF tracking and provide a novel learning algorithm for its efficient and seamless integration in the filter update and the tracking process. The spatial reliability map adjusts the filter support to the part of the object suitable for tracking. This allows tracking of non-rectangular objects as well as extending the search region. Channel reliability reflects the quality of the learned filter and it is used as a feature weighting coefficient in localization. Experimentally, with only two simple standard features, HOGs and Colornames, the novel CSR-DCF method – DCF with Channel and Spatial Reliability – achieves state-of-the-art results on VOT 2016, VOT 2015 and OTB. The CSR-DCF runs in real-time on a CPU.
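The channel-weighting step described above can be sketched in a few lines: each feature channel yields a correlation response map, the maps are combined using the channel reliability weights, and the peak of the weighted sum gives the estimated target position. This is a toy illustration with hypothetical response maps, not the authors' implementation.

```python
def weighted_localization(response_maps, channel_weights):
    """Combine per-channel correlation response maps using channel
    reliability weights, then return the peak position (y, x)."""
    h = len(response_maps[0])
    w = len(response_maps[0][0])
    combined = [[0.0] * w for _ in range(h)]
    for resp, wgt in zip(response_maps, channel_weights):
        for y in range(h):
            for x in range(w):
                combined[y][x] += wgt * resp[y][x]
    # Peak of the weighted sum gives the estimated target position.
    best = max(
        ((y, x) for y in range(h) for x in range(w)),
        key=lambda p: combined[p[0]][p[1]],
    )
    return best, combined

# Two toy 3x3 response maps: a reliable channel peaking at (1, 1)
# and a noisy channel peaking elsewhere.
reliable = [[0.1, 0.2, 0.1], [0.2, 0.9, 0.2], [0.1, 0.2, 0.1]]
noisy    = [[0.8, 0.1, 0.1], [0.1, 0.2, 0.1], [0.1, 0.1, 0.1]]
pos, _ = weighted_localization([reliable, noisy], [0.9, 0.1])
```

Down-weighting the noisy channel lets the reliable channel's peak dominate localization.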

941 citations


Book ChapterDOI
02 Nov 2017
TL;DR: This work uses a simple and common pre-processing step (adding a constant shift to the input data) to show that a transformation with no effect on the model's prediction can cause numerous saliency methods to attribute incorrectly.
Abstract: Saliency methods aim to explain the predictions of deep neural networks. These methods lack reliability when the explanation is sensitive to factors that do not contribute to the model prediction. We use a simple and common pre-processing step which can be compensated for easily—adding a constant shift to the input data—to show that a transformation with no effect on how the model makes the decision can cause numerous methods to attribute incorrectly. In order to guarantee reliability, we believe that the explanation should not change when we can guarantee that two networks process the images in identical manners. We show, through several examples, that saliency methods that do not satisfy this requirement result in misleading attribution. The approach can be seen as a type of unit test; we construct a narrow ground truth to measure one stated desirable property. As such, we hope the community will embrace the development of additional tests.
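The failure mode can be illustrated with a minimal, hypothetical example: a 1-D linear "network" and a shifted copy that absorbs the constant input shift into its bias compute identical outputs, yet gradient-times-input attribution differs between them. This is a simplified analogue of the paper's test, not its code.

```python
# Toy demonstration on a 1-D linear "network" f(x) = w*x + b.
# A constant input shift c is compensated in the bias, so both networks
# compute identical outputs for corresponding inputs.
w, b, c = 2.0, 1.0, 10.0   # weight, bias, constant input shift

def f1(x):  return w * x + b            # original network
def f2(x):  return w * x + (b - w * c)  # shifted network, expects x + c

x = 3.0
assert f1(x) == f2(x + c)               # identical predictions

grad1, grad2 = w, w                     # df/dx is w for both networks
gxi1 = grad1 * x                        # gradient-times-input, original
gxi2 = grad2 * (x + c)                  # gradient-times-input, shifted
print(grad1 == grad2)   # True: the plain gradient is unchanged
print(gxi1 == gxi2)     # False: gradient-times-input attributes differently
```

The shift changes nothing about how the model decides, yet the attribution changes, which is exactly the unreliability the paper tests for.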

490 citations


Journal ArticleDOI
TL;DR: This study examined the intercoder reliability and validity of WebPlotDigitizer (Rohatgi, 2015), a web-based plot digitizing tool for extracting data from a variety of plots, including XY coordinates of interrupted time-series data.
Abstract: Quantitative synthesis of data from single-case designs (SCDs) is becoming increasingly common in psychology and education journals. Because researchers do not ordinarily report numerical data in addition to graphical displays, reliance on plot digitizing tools is often a necessary component of this research. Intercoder reliability of data extraction is a commonly overlooked, but potentially important, step of this process. The purpose of this study was to examine the intercoder reliability and validity of WebPlotDigitizer (Rohatgi, 2015), a web-based plot digitizing tool for extracting data from a variety of plots, including XY coordinates of interrupted time-series data. Two coders extracted 3,596 data points from 168 data series in 36 graphs across 18 studies. Results indicated high levels of intercoder reliability and validity. Implications of and recommendations based on these results are discussed in relation to researchers involved in quantitative synthesis of data from SCDs.
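As a sketch of the kind of intercoder agreement statistic used in such studies, the two-way random-effects, absolute-agreement, single-measure ICC(2,1) can be computed from two coders' extracted values. The data below are hypothetical, and the exact ICC variant used in the study may differ.

```python
def icc_2_1(scores):
    """Two-way random, absolute-agreement, single-measure ICC(2,1).
    `scores` is a list of [coder1, coder2, ...] rows, one row per item."""
    n = len(scores)           # items (data points)
    k = len(scores[0])        # coders
    grand = sum(sum(r) for r in scores) / (n * k)
    row_means = [sum(r) / k for r in scores]
    col_means = [sum(r[j] for r in scores) / n for j in range(k)]
    ssr = k * sum((m - grand) ** 2 for m in row_means)   # between items
    ssc = n * sum((m - grand) ** 2 for m in col_means)   # between coders
    sst = sum((x - grand) ** 2 for r in scores for x in r)
    sse = sst - ssr - ssc
    msr = ssr / (n - 1)
    msc = ssc / (k - 1)
    mse = sse / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical values extracted by two coders from the same plot.
coder_rows = [[10.0, 10.1], [12.5, 12.4], [9.8, 9.9], [14.2, 14.1], [8.0, 8.2]]
icc = icc_2_1(coder_rows)
```

Small between-coder discrepancies relative to the spread of the data yield an ICC near 1, the "high intercoder reliability" the study reports.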

361 citations


Journal ArticleDOI
TL;DR: Reliability was lowest for subcortical connections and highest for within‐network cortical connections, and Multivariate reliability was greater than univariate; these findings are among the first to underscore this distinction for functional connectivity.
Abstract: Best practices are currently being developed for the acquisition and processing of resting-state magnetic resonance imaging data used to estimate brain functional organization-or "functional connectivity." Standards have been proposed based on test-retest reliability, but open questions remain. These include how amount of data per subject influences whole-brain reliability, the influence of increasing runs versus sessions, the spatial distribution of reliability, the reliability of multivariate methods, and, crucially, how reliability maps onto prediction of behavior. We collected a dataset of 12 extensively sampled individuals (144 min data each across 2 identically configured scanners) to assess test-retest reliability of whole-brain connectivity within the generalizability theory framework. We used Human Connectome Project data to replicate these analyses and relate reliability to behavioral prediction. Overall, the historical 5-min scan produced poor reliability averaged across connections. Increasing the number of sessions was more beneficial than increasing runs. Reliability was lowest for subcortical connections and highest for within-network cortical connections. Multivariate reliability was greater than univariate. Finally, reliability could not be used to improve prediction; these findings are among the first to underscore this distinction for functional connectivity. A comprehensive understanding of test-retest reliability, including its limitations, supports the development of best practices in the field.

296 citations


Journal ArticleDOI
TL;DR: These studies confirmed the validity, good reliability, and internal consistency of the session-RPE method in several sports and physical activities, with men and women of different age categories (children, adolescents, and adults) and various expertise levels.
Abstract: Purpose: The aim of this review is to (1) retrieve all data validating the session-rating of perceived exertion (RPE) method using various criteria, (2) highlight the rationale of this method and its ecological usefulness, and (3) describe factors that can alter RPE and that users of this method should take into consideration. Method: The SPORTDiscus, PubMed, and Google Scholar databases were searched for English-language studies published between 2001 and 2016 on the validity and usefulness of the session-RPE method. Studies were considered for further analysis when they used the session-RPE method proposed by Foster et al. in 2001. Participants were athletes of any gender, age, or level of competition. Studies in languages other than English were excluded from the analysis of the validity and reliability of the session-RPE method. Other studies were examined to explain the rationale of the session-RPE method and the origin of RPE. Results: A total of 950 studies cited the Foster et al. study that proposed the session-RPE method. Thirty-six studies have examined the validity and reliability of this method using the modified CR-10 scale. Conclusion: These studies confirmed the validity, good reliability, and internal consistency of the session-RPE method in several sports and physical activities with men and women of different age categories (children, adolescents, and adults) across various expertise levels. This method could be used as a "stand-alone" method for training-load (TL) monitoring purposes, though some recommend combining it with other physiological parameters such as heart rate.
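The session-RPE method itself reduces to a simple product: training load = session RPE (CR-10) x duration in minutes. A minimal sketch with a hypothetical training week, including the weekly monotony and strain commonly derived from the daily loads (the population SD is used here; exact conventions vary between authors):

```python
import statistics

def session_load(rpe_cr10, duration_min):
    """Session training load = session RPE (CR-10 scale) x minutes."""
    return rpe_cr10 * duration_min

# One hypothetical training week: (RPE, minutes) per day; (0, 0) = rest day.
week = [(6, 60), (4, 45), (0, 0), (7, 75), (5, 50), (0, 0), (8, 90)]
daily = [session_load(r, d) for r, d in week]

weekly_load = sum(daily)                                   # arbitrary units
monotony = statistics.mean(daily) / statistics.pstdev(daily)
strain = weekly_load * monotony
```

Rest days lower monotony (day-to-day variability rises), which in this framework reduces training strain for the same weekly load.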

282 citations


Journal ArticleDOI
TL;DR: Results show that LIF and the new method proposed in this research are very efficient when dealing with nonlinear performance functions, small failure probabilities, complicated limit states, and high-dimensional engineering problems.

268 citations


Patent
08 Mar 2017
TL;DR: In this article, a method for rejecting an unintentional palm touch is described, in which a software module uses a reliability value and an activity context to determine a confidence level for the touch; if the confidence level is too low, the touch may be rejected.
Abstract: A method for rejecting an unintentional palm touch is disclosed. In at least some embodiments, a touch is detected by a touch-sensitive surface associated with a display. Characteristics of the touch may be used to generate a set of parameters related to the touch. In an embodiment, firmware is used to determine a reliability value for the touch. The reliability value and the location of the touch is provided to a software module. The software module uses the reliability value and an activity context to determine a confidence level of the touch. In an embodiment, the confidence level may include an evaluation of changes in the reliability value over time. If the confidence level for the touch is too low, it may be rejected.
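A minimal sketch of the idea, with made-up thresholds and an assumed averaging scheme (the patent does not disclose the actual firmware logic): combine recent reliability values and an activity-context weight into one confidence level, and reject the touch when it falls below a threshold.

```python
def touch_confidence(reliability_history, activity_weight=1.0):
    """Combine recent reliability values and an activity-context weight
    into a single confidence level in [0, 1]. The 0.5 trend factor and
    the averaging scheme are assumptions for illustration."""
    if not reliability_history:
        return 0.0
    avg = sum(reliability_history) / len(reliability_history)
    # Penalize touches whose reliability is falling over time.
    trend = reliability_history[-1] - reliability_history[0]
    return max(0.0, min(1.0, (avg + 0.5 * trend) * activity_weight))

def accept_touch(reliability_history, threshold=0.5):
    return touch_confidence(reliability_history) >= threshold

print(accept_touch([0.8, 0.85, 0.9]))   # True: steady, reliable touch
print(accept_touch([0.6, 0.3, 0.1]))    # False: degrading reliability
```

This mirrors the claim that the confidence level "may include an evaluation of changes in the reliability value over time": the same average reliability scores lower when the trend is downward.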

260 citations


Journal Article
TL;DR: The three articles in this special issue illustrate current trends in reliability engineering, with a focus on software engineering and the connected world.
Abstract: Reliability engineering dates back to reliability studies in the 20th century; since then, various models have been defined and used. Software engineering plays a key role from several viewpoints, but the main concern is that we're moving toward a more connected world, including enterprises and mobile devices. The three articles in this special issue illustrate current trends in this domain.

250 citations


Book ChapterDOI
26 Jun 2017

248 citations


Journal ArticleDOI
TL;DR: In this article, an automated algorithm for unified rejection and repair of bad trials in magnetoencephalography (MEG) and electroencephalography (EEG) signals is presented.

243 citations


Posted Content
TL;DR: An attempt is made here to review the reliability and validity of measurement instruments used in research, and the threats to them.
Abstract: Reliability and validity are the two most important and fundamental features in the evaluation of any measurement instrument or tool for good research. The purpose of this research is to discuss the validity and reliability of measurement instruments that are used in research. Validity concerns what an instrument measures, and how well it does so. Reliability concerns the faith that one can have in the data obtained from the use of an instrument, that is, the degree to which any measuring tool controls for random error. An attempt is made here to review the reliability and validity, and the threats to them, in some detail.

Journal ArticleDOI
TL;DR: In this article, the authors review the reliability and validity of measurement instruments used in research, and discuss the threats to them in some detail.
Abstract: Reliability and validity are the two most important and fundamental features in the evaluation of any measurement instrument or tool for good research. The purpose of this research is to discuss the validity and reliability of measurement instruments that are used in research. Validity concerns what an instrument measures, and how well it does so. Reliability concerns the faith that one can have in the data obtained from the use of an instrument, that is, the degree to which any measuring tool controls for random error. An attempt is made here to review the reliability and validity, and the threats to them, in some detail.

Journal ArticleDOI
TL;DR: This work develops an efficient reliability method which takes advantage of the Adaptive Support Vector Machine (ASVM) and the Monte Carlo Simulation (MCS), leading to accurate estimation of failure probability with rather low computational cost.
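The MCS half of such a method can be sketched in a few lines: draw random samples, evaluate the limit-state function g, and count failures (g <= 0). The limit state below is made up, and the adaptive SVM surrogate the paper uses to reduce the number of g-evaluations is omitted.

```python
import random

random.seed(0)

def g(x1, x2):
    """Hypothetical limit-state function: failure when g(x) <= 0."""
    return 3.0 - x1 - x2

# Crude Monte Carlo simulation with standard-normal inputs.
n = 100_000
failures = 0
for _ in range(n):
    x1 = random.gauss(0.0, 1.0)
    x2 = random.gauss(0.0, 1.0)
    if g(x1, x2) <= 0.0:
        failures += 1

pf = failures / n   # estimate of the failure probability P[g(X) <= 0]
```

For this toy limit state the true value is about 0.017; the smaller the failure probability, the more samples plain MCS needs, which is precisely why the paper pairs it with an adaptive surrogate.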


Journal ArticleDOI
TL;DR: This review focuses on the unprecedented opportunities that consumer physical activity monitors offer for human physiology and pathophysiology research because of their ability to measure activity continuously under real-life conditions and because they are already widely used by consumers.
Abstract: A sedentary lifestyle and lack of physical activity are well-established risk factors for chronic disease and adverse health outcomes. Thus, there is enormous interest in measuring physical activity in biomedical research. Many consumer physical activity monitors, including Basis Health Tracker, BodyMedia Fit, DirectLife, Fitbit Flex, Fitbit One, Fitbit Zip, Garmin Vivofit, Jawbone UP, MisFit Shine, Nike FuelBand, Polar Loop, Withings Pulse O2, and others have accuracies similar to that of research-grade physical activity monitors for measuring steps. This review focuses on the unprecedented opportunities that consumer physical activity monitors offer for human physiology and pathophysiology research because of their ability to measure activity continuously under real-life conditions and because they are already widely used by consumers. We examine current and potential uses of consumer physical activity monitors as a measuring or monitoring device, or as an intervention in strategies to change behavior and predict health outcomes. The accuracy, reliability, reproducibility, and validity of consumer physical activity monitors are reviewed, as are limitations and challenges associated with using these devices in research. Other topics covered include how smartphone apps and platforms, such as the Apple ResearchKit, can be used in conjunction with consumer physical activity monitors for research. Lastly, the future of consumer physical activity monitors and related technology is considered: pattern recognition, integration of sleep monitors, and other biosensors in combination with new forms of information processing.

Journal ArticleDOI
TL;DR: It is concluded that utilizing the variance of dynamic connectivity is an important component of any dynamic FC-derived summary measure, as the fluctuations of dynamic FC have strong potential to provide summary measures that can be used to find meaningful individual differences in dynamic FC.

Journal ArticleDOI
TL;DR: The use of surrogate outcomes should be limited to situations where a surrogate has demonstrated robust ability to predict meaningful benefits, or where cases are dire, rare or with few treatment options.
Abstract: Surrogate outcomes are not intrinsically beneficial to patients, but are designed to be easier and faster to measure than clinically meaningful outcomes. The use of surrogates as an endpoint in clinical trials and basis for regulatory approval is common, and frequently exceeds the guidance given by regulatory bodies. In this article, we demonstrate that the use of surrogates in oncology is widespread and increasing. At the same time, the strength of association between the surrogates used and clinically meaningful outcomes is often unknown or weak. Attempts to validate surrogates are rarely undertaken. When this is done, validation relies on only a fraction of available data, and often concludes that the surrogate is poor. Post-marketing studies, designed to ensure drugs have meaningful benefits, are often not performed. Alternatively, if a drug fails to improve quality of life or overall survival, market authorization is rarely revoked. We suggest this reliance on surrogates, and the imprecision surrounding their acceptable use, means that numerous drugs are now approved based on small yet statistically significant increases in surrogates of questionable reliability. In turn, this means the benefits of many approved drugs are uncertain. This is an unacceptable situation for patients and professionals, as prior experience has shown that such uncertainty can be associated with significant harm. The use of surrogate outcomes should be limited to situations where a surrogate has demonstrated robust ability to predict meaningful benefits, or where cases are dire, rare or with few treatment options. In both cases, surrogates must be used only when continuing studies examining hard endpoints have been fully recruited.

Posted Content
TL;DR: In this article, a simple and common pre-processing step, adding a constant shift to the input data, is used to show that a transformation with no effect on the model can cause numerous methods to incorrectly attribute.
Abstract: Saliency methods aim to explain the predictions of deep neural networks. These methods lack reliability when the explanation is sensitive to factors that do not contribute to the model prediction. We use a simple and common pre-processing step (adding a constant shift to the input data) to show that a transformation with no effect on the model can cause numerous methods to incorrectly attribute. In order to guarantee reliability, we posit that methods should fulfill input invariance, the requirement that a saliency method mirror the sensitivity of the model with respect to transformations of the input. We show, through several examples, that saliency methods that do not satisfy input invariance result in misleading attribution.

Journal ArticleDOI
TL;DR: The load–velocity relationship was stable between trials but could not accurately predict 1RM because of large V1RM variability, so the 1RM prediction method used in this study cannot be used to accurately modify sessional training loads.
Abstract: Banyard, HG, Nosaka, K, and Haff, GG. Reliability and validity of the load–velocity relationship to predict the 1RM back squat. J Strength Cond Res 31(7): 1897–1904, 2017—This study investigated the reliability and validity of the load–velocity relationship to predict the free-weight back squat 1RM.
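The mechanics of the load-velocity 1RM prediction the study evaluates can be sketched as follows: fit a line to submaximal (load, velocity) points and extrapolate to the velocity assumed at 1RM (V1RM). All numbers are hypothetical; as the TL;DR notes, the study found this extrapolation too sensitive to V1RM variability to be accurate.

```python
# Hypothetical submaximal sets: load (kg) and mean concentric velocity (m/s).
loads =      [60.0, 80.0, 100.0, 120.0]
velocities = [0.90, 0.72, 0.55, 0.38]

# Ordinary least-squares fit of load as a linear function of velocity.
n = len(loads)
mv = sum(velocities) / n
ml = sum(loads) / n
slope = sum((v - mv) * (l - ml) for v, l in zip(velocities, loads)) / \
        sum((v - mv) ** 2 for v in velocities)
intercept = ml - slope * mv

v1rm = 0.24                           # assumed velocity at 1RM (hypothetical)
predicted_1rm = intercept + slope * v1rm   # extrapolated 1RM load (kg)
```

Because the slope is steep (load changes a lot per unit velocity), even small day-to-day variation in V1RM shifts the predicted 1RM substantially, which is the study's central criticism.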

Journal ArticleDOI
TL;DR: An extended TODIM method based on the Choquet integral is developed for multi-criteria decision-making (MCDM) problems with linguistic Z-numbers; it not only reflects the decision-makers' cognition more comprehensively but also is more in line with expression habits.
Abstract: Z-numbers are a new concept considering both the description of cognitive information and the reliability of information. Linguistic terms are useful tools to adequately and effectively model real-life cognitive information, as well as to characterize the randomness of events. However, a form of Z-numbers, in which their two components are in the form of linguistic terms, is rarely studied, although it is common in decision-making problems. In terms of Z-numbers and linguistic term sets, we provided the definition of linguistic Z-numbers as a form of Z-numbers or a subclass of Z-numbers. Then, we defined some operations of linguistic Z-numbers and proposed a comparison method based on the score and accuracy functions of linguistic Z-numbers. We also presented the distance measure of linguistic Z-numbers. Next, we developed an extended TODIM (an acronym in Portuguese of interactive and multi-criteria decision-making) method based on the Choquet integral for multi-criteria decision-making (MCDM) problems with linguistic Z-numbers. Finally, we provided an example concerning the selection of medical inquiry applications to demonstrate the feasibility of our proposed approach. We then verified the applicability and superiority of our approach through comparative analyses with other existing methods. Illustrative and comparative analyses indicated that the proposed approach was valid and feasible for different decision-makers and cognitive environments. Furthermore, the final ranking results of the proposed approach were closer to real decision-making processes. Linguistic Z-numbers can flexibly characterize real cognitive information as well as describe the reliability of information. This method not only is a more comprehensive reflection of the decision-makers’ cognition but also is more in line with expression habits. 
The proposed method inherits the merits of the classical TODIM method and considers the interactivity of criteria; it is therefore effective for dealing with real-life MCDM problems. Accounting for bounded rationality and the interactivity of criteria makes the final outcomes convincing and consistent with real decision-making.

Journal ArticleDOI
01 Jul 2017-RMD Open
TL;DR: The EULAR-OMERACT score demonstrated moderate-good reliability in MCP joints using a standardised scan and is equally applicable in non-MCP joints, which should underpin improved reliability and consequently the responsiveness of US in RA clinical trials.
Abstract: Objectives To test the reliability of new ultrasound (US) definitions and quantification of synovial hypertrophy (SH) and power Doppler (PD) signal, separately and in combination, in a range of joints in patients with rheumatoid arthritis (RA) using the European League Against Rheumatisms–Outcomes Measures in Rheumatology (EULAR-OMERACT) combined score for PD and SH. Methods A stepwise approach was used: (1) scoring static images of metacarpophalangeal (MCP) joints in a web-based exercise and subsequently when scanning patients; (2) scoring static images of wrist, proximal interphalangeal joints, knee and metatarsophalangeal joints in a web-based exercise and subsequently when scanning patients using different acquisitions (standardised vs usual practice). For reliability, kappa coefficients (κ) were used. Results Scoring MCP joints in static images showed substantial intraobserver variability but good to excellent interobserver reliability. In patients, intraobserver reliability was the same for the two acquisition methods. Interobserver reliability for SH (κ=0.87) and PD (κ=0.79) and the EULAR-OMERACT combined score (κ=0.86) were better when using a ‘standardised’ scan. For the other joints, the intraobserver reliability was excellent in static images for all scores (κ=0.8–0.97) and the interobserver reliability marginally lower. When using standardised scanning in patients, the intraobserver was good (κ=0.64 for SH and the EULAR-OMERACT combined score, 0.66 for PD) and the interobserver reliability was also good especially for PD (κ range=0.41–0.92). Conclusion The EULAR-OMERACT score demonstrated moderate-good reliability in MCP joints using a standardised scan and is equally applicable in non-MCP joints. This scoring system should underpin improved reliability and consequently the responsiveness of US in RA clinical trials.
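The kappa coefficients reported above measure chance-corrected agreement. As a sketch, Cohen's kappa for two readers' binary scores (e.g., PD signal present or absent) can be computed directly; the scores below are hypothetical.

```python
def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters over the same items."""
    n = len(rater_a)
    categories = sorted(set(rater_a) | set(rater_b))
    # Observed agreement.
    po = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    pe = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    return (po - pe) / (1 - pe)

# Hypothetical binary scores (1 = PD signal present) from two readers.
a = [1, 1, 0, 1, 0, 1, 1, 0, 0, 1]
b = [1, 1, 0, 1, 0, 1, 0, 0, 0, 1]
kappa = cohen_kappa(a, b)
```

Here the readers agree on 9 of 10 joints but roughly half of that agreement is expected by chance, so kappa lands at 0.8, in the "good" range cited above; note the study's graded 0-3 scores would likely use a weighted kappa instead.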

Journal ArticleDOI
Paul Willner1
TL;DR: It is concluded that CMS is in fact a rather robust model, but the factors that result in a less effective implementation in a minority of laboratories remain to be firmly established.

Journal ArticleDOI
TL;DR: This work finds that cooperative spectrum sensing among cognitive users is beneficial for increasing the reliability of detection in cognitive radio systems.
Abstract: Cognitive radio systems necessitate the incorporation of cooperative spectrum sensing among cognitive users to increase the reliability of detection. We have found that cooperative spectrum sensing...
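A common textbook illustration of why cooperation helps (not necessarily the fusion rule studied in this paper) is OR-rule hard-decision fusion, where the fusion center declares the primary user present if any cognitive user detects it.

```python
def or_fusion(p_individual, n_users):
    """Global probability that at least one of n independent users
    raises a detection, under the OR fusion rule."""
    return 1.0 - (1.0 - p_individual) ** n_users

# Illustrative per-user detection and false-alarm probabilities.
pd_single, pf_single = 0.6, 0.05
for n in (1, 3, 5):
    print(n, or_fusion(pd_single, n), or_fusion(pf_single, n))
```

With three users the global detection probability rises from 0.6 to 0.936, at the cost of a higher global false-alarm rate; real fusion rules (k-out-of-N, soft combining) trade these off more carefully.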

Journal ArticleDOI
TL;DR: A systematic review of clinical assessment methods for classifying Generalized Joint Hypermobility (GJH) evaluated their clinimetric properties and, in the best evidence synthesis, found the Beighton Score (BS) with a cut-point of 5 of 9, including historical information, to be the best method for clinical use in adults.
Abstract: The purpose was to perform a systematic review of clinical assessment methods for classifying Generalized Joint Hypermobility (GJH), evaluate their clinimetric properties, and perform the best evidence synthesis of these methods. Four test assessment methods (Beighton Score [BS], Carter and Wilkinson, Hospital del Mar, Rotes-Querol) and two questionnaire assessment methods (Five-part questionnaire [5PQ], Beighton Score-self reported [BS-self]) were identified on children or adults. Using the Consensus-based Standards for selection of health Measurement Instrument (COSMIN) checklist for evaluating the methodological quality of the identified studies, all included studies were rated "fair" or "poor." Most studies were using BS, and for BS the reliability most of the studies showed limited positive to conflicting evidence, with some shortcomings on studies for the validity. The three other test assessment methods lack satisfactory information on both reliability and validity. For the questionnaire assessment methods, 5PQ was the most frequently used, and reliability showed conflicting evidence, while the validity had limited positive to conflicting evidence compared with test assessment methods. For BS-self, the validity showed unknown evidence compared with test assessment methods. In conclusion, following recommended uniformity of testing procedures, the recommendation for clinical use in adults is BS with cut-point of 5 of 9 including historical information, while in children it is BS with cut-point of at least 6 of 9. However, more studies are needed to conclude on the validity properties of these assessment methods, and before evidence-based recommendations can be made for clinical use on the "best" assessment method for classifying GJH. © 2017 Wiley Periodicals, Inc.

Journal ArticleDOI
TL;DR: Systematic reviewers often encounter incomplete or missing data, and the information desired may be difficult to obtain from a study author, so they have to resort to estimating data from figures with little or no raw data in a study's corresponding text or tables.
Abstract: Background Systematic reviewers often encounter incomplete or missing data, and the information desired may be difficult to obtain from a study author. Thus, systematic reviewers may have to resort to estimating data from figures with little or no raw data in a study's corresponding text or tables. Methods We discuss a case study in which participants used a publicly available Web-based program, called WebPlotDigitizer, to estimate data from 2 figures. We evaluated and used the intraclass correlation coefficient and the accuracy of the estimates relative to the true data to inform considerations when using estimated data from figures in systematic reviews. Results The estimates for both figures were consistent, although the distribution of estimates in the figure of a continuous outcome was slightly higher. For the continuous outcome, the percent difference ranged from 0.23% to 30.35%, while the percent difference for the event rate ranged from 0.22% to 8.92%. For both figures, the intraclass correlation coefficient was excellent (>0.95). Conclusions Systematic reviewers should be transparent when estimating data from figures when the information cannot be obtained from study authors, and should perform sensitivity analyses of pooled results to reduce bias.
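The percent-difference accuracy measure reported above is straightforward to compute; a sketch with hypothetical true and digitized values:

```python
def percent_difference(estimate, truth):
    """Absolute difference between estimate and truth, as a percentage
    of the true value."""
    return abs(estimate - truth) / abs(truth) * 100.0

# Hypothetical true data points and the values digitized from the figure.
true_points      = [10.0, 12.5, 9.8, 14.2]
digitized_points = [10.1, 12.3, 9.9, 14.0]

diffs = [percent_difference(e, t)
         for e, t in zip(digitized_points, true_points)]
```

Each estimate here lands within 2% of the true value; the study's much wider range for the continuous outcome (up to about 30%) is why it recommends sensitivity analyses on pooled results.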

Journal ArticleDOI
TL;DR: It is shown that while QM is highly effective in correcting bias, it cannot ensure reliability in forecast ensemble spread or guarantee coherence, because QM ignores the correlation between raw ensemble forecasts and observations.
Abstract: GCMs are used by many national weather services to produce seasonal outlooks of atmospheric and oceanic conditions and fluxes. Postprocessing is often a necessary step before GCM forecasts can be applied in practice. Quantile mapping (QM) is rapidly becoming the method of choice by operational agencies to postprocess raw GCM outputs. The authors investigate whether QM is appropriate for this task. Ensemble forecast postprocessing methods should aim to 1) correct bias, 2) ensure forecasts are reliable in ensemble spread, and 3) guarantee forecasts are at least as skillful as climatology, a property called “coherence.” This study evaluates the effectiveness of QM in achieving these aims by applying it to precipitation forecasts from the POAMA model. It is shown that while QM is highly effective in correcting bias, it cannot ensure reliability in forecast ensemble spread or guarantee coherence. This is because QM ignores the correlation between raw ensemble forecasts and observations. When raw foreca...
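Empirical quantile mapping itself is simple: a raw forecast value is mapped to the observed value at the same empirical quantile. The nearest-rank sketch below, with made-up climatologies, shows why QM corrects bias value-by-value while ignoring the forecast-observation correlation the authors highlight.

```python
def quantile_map(value, fcst_clim, obs_clim):
    """Map a raw forecast value to the observed climatology value at the
    same empirical quantile (nearest-rank version)."""
    fc, ob = sorted(fcst_clim), sorted(obs_clim)
    # Empirical non-exceedance probability of `value` under the forecast CDF.
    q = sum(f <= value for f in fc) / len(fc)
    # Corresponding observed quantile.
    idx = min(len(ob) - 1, max(0, int(round(q * len(ob))) - 1))
    return ob[idx]

# Hypothetical climatologies: forecasts biased high by a factor of two.
fcst_clim = [2, 4, 6, 8, 10]
obs_clim  = [1, 2, 3, 4, 5]
corrected = quantile_map(6, fcst_clim, obs_clim)   # forecast median -> obs median
```

Note that the mapping uses only the two marginal distributions: each ensemble member is transformed independently of the matching observation, so the correction can fix bias yet cannot, by construction, repair ensemble spread or guarantee coherence.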

Journal ArticleDOI
TL;DR: Psychometric data on the updated Sexual Experiences Survey–Short Form Perpetration (SES-SFP) and Short Form Victimization (SES-SFV), collected from men and women, supported the validity and reliability of both short forms.
Abstract: The Sexual Experiences Survey (SES), the most widely used measure of unwanted sexual experiences, was recently updated (Koss et al., 2007). The purpose of this study was to provide psychometric data on the updated Sexual Experiences Survey-Short Form Perpetration (SES-SFP) and the Sexual Experiences Survey-Short Form Victimization (SES-SFV). Men (n = 136) and women (n = 433) were randomly assigned to in-person or Internet formats of administration for 3 measurement points. Women completed victimization surveys and trauma measures. Men completed perpetration surveys and attitude/ personality measures. Results supported the validity and reliability of both the SES-SFV with women and the SES-SFP with men. Further research is needed regarding the use of the SES-SFV with men and the SES-SFP with women.

Journal ArticleDOI
TL;DR: A modified cognitive reliability and error analysis method (CREAM) is proposed for estimating the human error probability in the maritime accident process on the basis of an evidential reasoning approach, facilitating subjective human reliability analysis in different engineering systems where uncertainty exists in practice.
Abstract: This article proposes a modified cognitive reliability and error analysis method (CREAM) for estimating the human error probability in the maritime accident process on the basis of an evidential reasoning approach. This modified CREAM is developed to precisely quantify the linguistic variables of the common performance conditions and to overcome the problem of ignoring the uncertainty caused by incomplete information in the existing CREAM models. Moreover, this article views maritime accident development from the sequential perspective, where a scenario- and barrier-based framework is proposed to describe the maritime accident process. This evidential reasoning-based CREAM approach together with the proposed accident development framework are applied to human reliability analysis of a ship capsizing accident. It will facilitate subjective human reliability analysis in different engineering systems where uncertainty exists in practice. © 2017 Society for Risk Analysis.

Journal ArticleDOI
TL;DR: The validity of the Japanese version of the GDS-15 (GDS-15-J) for depression, assessed against DSM-IV-TR criteria, was excellent, and some items that might be removed in future studies of an abbreviated scale are suggested.
Abstract: Objective: The 15-item Geriatric Depression Scale (GDS-15) is one of the most widely used screening instruments for depression among the elderly. The aim of this study was to examine the va...

Journal ArticleDOI
19 Nov 2017-Energies
TL;DR: In this paper, the authors collected and harmonized the results of different initiatives, and mapped the existing reliability characteristics to a system structure according to the Reference Designation System for Power Plants (RDS-PP®).
Abstract: Performance (availability and yield) and reliability of wind turbines can make the difference between success and failure of wind farm projects and these factors are vital to decrease the cost of energy. During the last years, several initiatives started to gather data on the performance and reliability of wind turbines on- and offshore and published findings in different journals and conferences. Even though the scopes of the different initiatives are similar, every initiative follows a different approach and results are therefore difficult to compare. The present paper faces this issue, collects results of different initiatives and harmonizes the results. A short description and assessment of every considered data source is provided. To enable this comparison, the existing reliability characteristics are mapped to a system structure according to the Reference Designation System for Power Plants (RDS-PP®). The review shows a wide variation in the performance and reliability metrics of the individual initiatives. Especially the comparison on onshore wind turbines reveals significant differences between the results. Only a few publications are available on offshore wind turbines and the results show an increasing performance and reliability of offshore wind turbines since the first offshore wind farms were erected and monitored.