
Showing papers by "Gordon H. Guyatt" published in 2011


Journal ArticleDOI
TL;DR: The GRADE process begins with asking an explicit question, including specification of all important outcomes, and provides explicit criteria for rating the quality of evidence that include study design, risk of bias, imprecision, inconsistency, indirectness, and magnitude of effect.

6,093 citations


Journal ArticleDOI
TL;DR: The approach of GRADE to rating quality of evidence specifies four categories (high, moderate, low, and very low) that are applied to a body of evidence, not to individual studies.

5,228 citations


Journal ArticleDOI
TL;DR: In the GRADE approach, randomized trials start as high-quality evidence and observational studies as low-quality evidence, but both can be rated down if most of the relevant evidence comes from studies that suffer from a high risk of bias.

2,059 citations
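
The starting levels and rating-down step described in this summary can be sketched as simple arithmetic on the four GRADE categories. This is only an illustration: the function name `grade_quality` and the one-level-per-concern bookkeeping are assumptions made for the example, and actual GRADE ratings rest on judgment rather than a mechanical count.

```python
# Illustrative sketch only: GRADE ratings are judgments, not a formula.
# The function name and the one-level-per-concern arithmetic are assumptions.
LEVELS = ["very low", "low", "moderate", "high"]


def grade_quality(design: str, downgrades: int = 0, upgrades: int = 0) -> str:
    """Return a GRADE-style quality label for a body of evidence."""
    if design == "randomized":
        start = 3  # randomized trials start as high-quality evidence
    elif design == "observational":
        start = 1  # observational studies start as low-quality evidence
    else:
        raise ValueError("design must be 'randomized' or 'observational'")
    # Clamp so the result stays within the four categories.
    index = max(0, min(len(LEVELS) - 1, start - downgrades + upgrades))
    return LEVELS[index]


# Randomized trials rated down one level for risk of bias.
print(grade_quality("randomized", downgrades=1))
```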


Journal ArticleDOI
TL;DR: This article introduces a 20-part series providing guidance for the use of GRADE methodology that will appear in the Journal of Clinical Epidemiology.

1,975 citations


Journal ArticleDOI
TL;DR: It is suggested that examination of 95% confidence intervals (CIs) provides the optimal primary approach to decisions regarding imprecision, and that rating down the quality of evidence is required if clinical action would differ depending on whether the upper or the lower boundary of the CI represented the truth.

1,844 citations
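
The confidence-interval rule in this summary can be written as a one-line check. In the sketch below, `decision_threshold` is a hypothetical parameter standing for the effect size at which clinical action would change; the summary does not specify how that threshold is chosen, and the numbers in the usage lines are invented.

```python
def rate_down_for_imprecision(ci_lower: float, ci_upper: float,
                              decision_threshold: float) -> bool:
    """Return True if the two CI boundaries would lead to different actions.

    decision_threshold is a hypothetical value: the effect size at which
    the clinical decision would change (for example, 1.0 for a ratio
    measure when any benefit at all would justify treatment).
    """
    low, high = sorted((ci_lower, ci_upper))
    # The boundaries disagree when the threshold lies strictly between them.
    return low < decision_threshold < high


# Hypothetical ratio estimates with a threshold of 1.0 (no effect).
print(rate_down_for_imprecision(0.68, 1.23, 1.0))  # CI crosses the threshold
print(rate_down_for_imprecision(0.30, 0.88, 1.0))  # CI lies on one side
```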


Journal ArticleDOI
TL;DR: Credibility is increased if subgroup effects are based on a small number of a priori hypotheses with a specified direction; subgroup comparisons come from within rather than between studies; tests of interaction generate low P-values; and the effects have a biological rationale.

1,535 citations


Journal ArticleDOI
TL;DR: In the GRADE approach, randomized trials start as high-quality evidence and observational studies as low-quality evidence, but both can be rated down if a body of evidence is associated with a high risk of publication bias.

1,295 citations


Journal ArticleDOI
TL;DR: In considering the importance of a surrogate outcome, authors should rate the importance of the patient-important outcome for which the surrogate is a substitute and subsequently rate down the quality of evidence for indirectness of outcome.

1,280 citations


Journal ArticleDOI
TL;DR: Decisions regarding indirectness of patients and interventions depend on an understanding of whether biological or social factors are sufficiently different that one might expect substantial differences in the magnitude of effect.

1,225 citations


Journal ArticleDOI
TL;DR: Strong evidence is recorded that tranexamic acid should be given as early as possible to bleeding trauma patients with significant haemorrhage, and for trauma patients admitted late after injury, tranexamic acid is less effective and could be harmful.

969 citations


Journal ArticleDOI
TL;DR: Systematic review authors and guideline developers may also consider rating up quality of evidence when a dose-response gradient is present, and when all plausible confounders or biases would decrease an apparent treatment effect, or would create a spurious effect when results suggest no effect.

Journal ArticleDOI
TL;DR: Routine monitoring of troponin levels in at-risk patients is needed after surgery to detect most MIs, which have an equally poor prognosis regardless of whether they are symptomatic or asymptomatic.
Abstract: Background: Each year, millions of patients worldwide have a perioperative myocardial infarction (MI) after noncardiac surgery. Objective: To examine the characteristics and short-term outcome of perioperative MI. Design: Cohort study. (ClinicalTrials.gov registration number: NCT00182039) Setting: 190 centers in 23 countries. Patients: 8351 patients included in the POISE (PeriOperative ISchemic Evaluation) trial. Measurements: Four cardiac biomarker or enzyme assays were measured within 3 days of surgery. The definition of perioperative MI included either autopsy findings of acute MI or an elevated level of a cardiac biomarker or enzyme and at least 1 of the following defining features: ischemic symptoms, development of pathologic Q waves, ischemic changes on electrocardiography, coronary artery intervention, or cardiac imaging evidence of MI. Results: Within 30 days of random assignment, 415 patients (5.0%) had a perioperative MI. Most MIs (74.1%) occurred within 48 hours of surgery; 65.3% of patients did not experience ischemic symptoms. The 30-day mortality rate was 11.6% (48 of 415 patients) among patients who had a perioperative MI and 2.2% (178 of 7936 patients) among those who did not (P < 0.001). Among patients with a perioperative MI, mortality rates were elevated and similar between those with (9.7%; adjusted odds ratio, 4.76 [95% CI, 2.68 to 8.43]) and without (12.5%; adjusted odds ratio, 4.00 [CI, 2.65 to 6.06]) ischemic symptoms. Limitation: Cardiac markers were measured only until day 3 after surgery, and additional asymptomatic MIs may have been missed. Conclusion: Most patients with a perioperative MI will not experience ischemic symptoms. Data suggest that routine monitoring of troponin levels in at-risk patients is needed after surgery to detect most MIs, which have an equally poor prognosis regardless of whether they are symptomatic or asymptomatic.
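
The 30-day mortality percentages in this abstract follow directly from the reported counts, as the short check below shows. The helper names are made up for illustration, and the crude (unadjusted) odds ratio it computes is not the same quantity as the adjusted odds ratios reported in the abstract, which come from a multivariable model.

```python
def risk(events: int, total: int) -> float:
    """Proportion of patients with the event."""
    return events / total


def crude_odds_ratio(events_a: int, total_a: int,
                     events_b: int, total_b: int) -> float:
    """Unadjusted odds ratio of group A versus group B."""
    odds_a = events_a / (total_a - events_a)
    odds_b = events_b / (total_b - events_b)
    return odds_a / odds_b


# Counts taken from the abstract: deaths within 30 days.
mi_deaths, mi_total = 48, 415            # patients with a perioperative MI
no_mi_deaths, no_mi_total = 178, 7936    # patients without a perioperative MI

print(f"Mortality with MI:    {risk(mi_deaths, mi_total):.1%}")
print(f"Mortality without MI: {risk(no_mi_deaths, no_mi_total):.1%}")
print(f"Crude odds ratio:     "
      f"{crude_odds_ratio(mi_deaths, mi_total, no_mi_deaths, no_mi_total):.2f}")
```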

Journal Article
TL;DR: In this article, the characteristics and short-term prognosis of perioperative myocardial infarction (MI) in the setting of noncardiac surgery were studied.
Abstract: Little is known about the characteristics and short-term prognosis of perioperative myocardial infarction (MI) in the setting of noncardiac surgery. In this multinational study of 8351 patients und...

Journal ArticleDOI
TL;DR: In this article, the authors evaluated the effect of low-molecular-weight heparin on venous thromboembolism, bleeding, and other outcomes in critically ill patients.
Abstract: Background The effects of thromboprophylaxis with low-molecular-weight heparin, as compared with unfractionated heparin, on venous thromboembolism, bleeding, and other outcomes are uncertain in critically ill patients. Methods In this multicenter trial, we tested the superiority of dalteparin over unfractionated heparin by randomly assigning 3764 patients to receive either subcutaneous dalteparin (at a dose of 5000 IU once daily) plus placebo once daily (for parallel-group twice-daily injections) or unfractionated heparin (at a dose of 5000 IU twice daily) while they were in the intensive care unit. The primary outcome, proximal leg deep-vein thrombosis, was diagnosed on compression ultrasonography performed within 2 days after admission, twice weekly, and as clinically indicated. Additional testing for venous thromboembolism was performed as clinically indicated. Data were analyzed according to the intention-to-treat principle. Results There was no significant between-group difference in the rate of proximal leg deep-vein thrombosis, which occurred in 96 of 1873 patients (5.1%) receiving dalteparin versus 109 of 1873 patients (5.8%) receiving unfractionated heparin (hazard ratio in the dalteparin group, 0.92; 95% confidence interval [CI], 0.68 to 1.23; P=0.57). The proportion of patients with pulmonary emboli was significantly lower with dalteparin (24 patients, 1.3%) than with unfractionated heparin (43 patients, 2.3%) (hazard ratio, 0.51; 95% CI, 0.30 to 0.88; P=0.01). There was no significant between-group difference in the rates of major bleeding (hazard ratio, 1.00; 95% CI, 0.75 to 1.34; P=0.98) or death in the hospital (hazard ratio, 0.92; 95% CI, 0.80 to 1.05; P=0.21). In prespecified per-protocol analyses, the results were similar to those of the main analyses, but fewer patients receiving dalteparin had heparin-induced thrombocytopenia (hazard ratio, 0.27; 95% CI, 0.08 to 0.98; P=0.046). 
Conclusions Among critically ill patients, dalteparin was not superior to unfractionated heparin in decreasing the incidence of proximal deep-vein thrombosis. (Funded by the Canadian Institutes of Health Research and others; PROTECT ClinicalTrials.gov number, NCT00182143.).

Journal ArticleDOI
18 Oct 2011-PLOS ONE
TL;DR: Meta-analyses that surpassed their optimal information size had sufficient protection against overestimation of intervention effects due to random error, but the number of patients and events required to limit the risk of overestimation depended considerably on the underlying simulation settings.
Abstract: Background: Meta-analyses including a limited number of patients and events are prone to yield overestimated intervention effect estimates. While many assume bias is the cause of overestimation, theoretical considerations suggest that random error may be an equal or more frequent cause. The independent impact of random error on meta-analyzed intervention effects has not previously been explored. It has been suggested that surpassing the optimal information size (i.e., the required meta-analysis sample size) provides sufficient protection against overestimation due to random error, but this claim has not yet been validated. Methods: We simulated a comprehensive array of meta-analysis scenarios where no intervention effect existed (i.e., relative risk reduction (RRR)=0%) or where a small but possibly unimportant effect existed (RRR=10%). We constructed different scenarios by varying the control group risk, the degree of heterogeneity, and the distribution of trial sample sizes. For each scenario, we calculated the probability of observing overestimates of RRR>20% and RRR>30% for each cumulative 500 patients and 50 events. We calculated the cumulative number of patients and events required to reduce the probability of overestimation of intervention effect to 10%, 5%, and 1%. We calculated the optimal information size for each of the simulated scenarios and explored whether meta-analyses that surpassed their optimal information size had sufficient protection against overestimation of intervention effects due to random error. Results: The risk of overestimation of intervention effects was usually high when the number of patients and events was small and this risk decreased exponentially over time as the number of patients and events increased. The number of patients and events required to limit the risk of overestimation depended considerably on the underlying simulation settings. 
Surpassing the optimal information size generally provided sufficient protection against overestimation. Conclusions: Random errors are a frequent cause of overestimation of intervention effects in meta-analyses. Surpassing the optimal information size will provide sufficient protection against overestimation.
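
The abstract defines the optimal information size as the required meta-analysis sample size but does not give a formula. One conventional way to compute such a number is the standard two-group sample size calculation for comparing proportions, sketched below. The function name, the default alpha and power, and the absence of any heterogeneity adjustment are all assumptions for illustration, not the authors' method.

```python
from math import ceil
from statistics import NormalDist


def optimal_information_size(control_risk: float, rrr: float,
                             alpha: float = 0.05, power: float = 0.80) -> int:
    """Total patients (both arms) from a conventional two-proportion formula.

    Assumptions: two equal-sized arms, two-sided alpha, and no adjustment
    for between-trial heterogeneity.
    """
    if not 0 < control_risk < 1 or not 0 < rrr < 1:
        raise ValueError("control_risk and rrr must lie strictly between 0 and 1")
    p0 = control_risk
    p1 = control_risk * (1 - rrr)  # risk under the assumed relative risk reduction
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    per_group = ((z_alpha + z_beta) ** 2
                 * (p0 * (1 - p0) + p1 * (1 - p1))
                 / (p0 - p1) ** 2)
    return 2 * ceil(per_group)


# Smaller anticipated effects require more patients.
print(optimal_information_size(control_risk=0.10, rrr=0.20))
print(optimal_information_size(control_risk=0.10, rrr=0.30))
```

Under this formula the required size grows as the anticipated effect or the control group risk shrinks, which is in line with the abstract's observation that the number of patients needed depends heavily on the scenario.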

Journal ArticleDOI
TL;DR: The findings suggest that increasing drug law enforcement is unlikely to reduce drug market violence and that alternative regulatory models will be required if drug supply and drug market violence are to be meaningfully reduced.

Journal ArticleDOI
TL;DR: An increased troponin measurement after surgery is an independent predictor of mortality, particularly within the first year; limited data suggest an increased creatine kinase muscle and brain isoenzyme measurement also predicts subsequent mortality.
Abstract: There is uncertainty regarding the prognostic value of troponin and creatine kinase muscle and brain isoenzyme measurements after noncardiac surgery.

Journal ArticleDOI
TL;DR: 12 approaches in three categories are identified for enhancing interpretability and usefulness of systematic reviews involving HRQL outcomes: summary estimates derived from the pooled standardized mean difference, conversion to units of the most familiar instrument or to risk difference or odds ratio.
Abstract: Summary of findings: Meta-analyses of continuous HRQL data present difficulties in interpretation when studies use different instruments to measure the same or similar construct. Given the interpretational challenges, we have categorized and described methods for enhancing interpretability of summary estimates for continuous HRQL meta-analyses. These methods fall into three categories on the basis of the data and statistics from which they are derived: the pooled SMD (category 1), the individual trial summary statistics (category 2), and the individual trial summary statistics and knowledge of the MID for each instrument (category 3) (Table 1). In our examples, all approaches for obtaining ORs yielded similar results (Table 2). Estimates of differences (i.e., the RD, MD in natural units, and MD in MID units) were relatively similar in one example (example 1) but discrepant in another (example 2) (Table 2). In example 1, the observed magnitude of effect was consistently large across all three categories. The relatively large number of trials and patients and SDs that were reasonably similar across trials (see Table 3) likely contribute to this consistency. Further, the instruments used in these examples (CRQ and SGRQ) also have considerable evidence of validity, are commonly used in their respective fields, and have established MIDs (Jones et al., 1992; Schunemann and Guyatt, 2005; Schunemann et al., 2005). The VAS instruments employed in each of the studies in example 2 also have established measurement properties and are commonly used to measure pain (Carlson, 1983; DeLoach et al., 1998; Price et al., 1983; Wewers and Lowe, 1990). In our dexamethasone for pain example (example 2), all summary estimates based on the pooled SMD (category 1) appeared large, whereas category 2 and 3 approaches yielded summary estimates suggesting small or moderate effects. 
This discrepancy most likely results from the enrolment of homogeneous populations with respect to pain, as the SDs are much smaller in relation to their accompanying MDs than in, for instance, example 1 (Tables 3 and 4). All estimates based on the pooled SMD were accompanied by a substantial degree of uncertainty (i.e., wide CIs). Recommendations: No single approach or category of approaches will be optimal for all continuous HRQL data meta-analysis scenarios (Table 1). A few clinical and statistical considerations can, however, facilitate the preferred approach in a given scenario. Figure 1 provides a 2-step algorithm for choosing an optimal approach to enhance interpretability. We prefer and recommend conversion to probabilities and RD because such measures are useful for trading off desirable and undesirable consequences (Guyatt et al., 2008b). Both RDs and measures of relative effect are very familiar to clinicians and clinical researchers. If an MID has been established for all instruments, we recommend using category 3 approach ii because this conversion is anchored in nonarbitrary thresholds (the respective instrument MIDs) and is not vulnerable to heterogeneity across SDs. We also recommend the use of at least one complementary method of reporting. If some of the trials measure the effect with an instrument that is very familiar to clinicians and from which, as a result, clinicians can infer the importance of the effect, we recommend the reporting of results in natural units using category 2 approach i. In our examples, we have chosen the CRQ and the 10 cm VAS as the familiar...
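
The idea of reporting a mean difference in MID units can be illustrated with a small sketch: each trial's mean difference is divided by the MID of the instrument it used, and the rescaled estimates are then pooled. The fixed-effect inverse-variance pooling, the treatment of each MID as a known constant, and the trial values are assumptions made for the example; the abstract does not spell out the exact procedure behind its category 3 approaches.

```python
from math import sqrt


def pooled_md_in_mid_units(trials):
    """Pool mean differences after rescaling each to its instrument's MID.

    trials: iterable of (mean_difference, standard_error, mid) tuples.
    Assumes fixed-effect inverse-variance weights and treats each MID as a
    known constant, so the standard error is rescaled by the same factor.
    Returns (pooled estimate in MID units, its standard error).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for mean_difference, standard_error, mid in trials:
        scaled_md = mean_difference / mid
        scaled_se = standard_error / mid
        weight = 1.0 / scaled_se ** 2
        weighted_sum += weight * scaled_md
        weight_total += weight
    return weighted_sum / weight_total, 1.0 / sqrt(weight_total)


# Hypothetical trials measured on two instruments with different MIDs.
example_trials = [
    (1.0, 0.25, 0.5),   # instrument A: MD 1.0 points, MID 0.5 points
    (8.0, 2.00, 4.0),   # instrument B: MD 8.0 points, MID 4.0 points
]
estimate, se = pooled_md_in_mid_units(example_trials)
print(f"Pooled effect: {estimate:.2f} MID units (SE {se:.2f})")
```

A pooled value above 1 would indicate an average effect larger than the minimal important difference, which is the kind of anchored interpretation the abstract recommends when MIDs are available for all instruments.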

Journal ArticleDOI
TL;DR: How causation may relate to developing recommendations and how the Bradford Hill criteria are considered in GRADE are described, using examples from the public health literature with a focus on immunisation.
Abstract: This article describes how the Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach to grading the quality of evidence and strength of recommendations considers the Bradford Hill criteria for causation and how GRADE may relate to questions in public health. A primary concern in public health is that evidence from non-randomised studies may provide a more adequate or best available measure of a public health strategy's impact, but that such evidence might be graded as lower quality in the GRADE framework. GRADE, however, presents a framework that describes both criteria for assessing the quality of research evidence and the strength of recommendations that includes considerations arising from the Bradford Hill criteria. GRADE places emphasis on recommendations and in assessing quality of evidence; GRADE notes that randomisation is only one of many relevant factors. This article describes how causation may relate to developing recommendations and how the Bradford Hill criteria are considered in GRADE, using examples from the public health literature with a focus on immunisation.

Journal ArticleDOI
28 Mar 2011-BMJ
TL;DR: Industry funded randomised controlled trials, in the absence of statistically significant primary outcomes, are more likely to report subgroup analyses than non-industry funded trials.
Abstract: Objective To investigate the impact of industry funding on reporting of subgroup analyses in randomised controlled trials. Design Systematic review. Data sources Medline. Study selection Randomised controlled trials published in 118 core clinical journals (defined by the National Library of Medicine) in 2007. 1140 study reports in a 1:1 ratio by high (five general medicine journals with largest number of total citations in 2007) versus lower impact journals, were randomly sampled. Two reviewers, independently and in duplicate, used standardised, piloted forms to screen study reports for eligibility and to extract data. They also used explicit criteria to determine whether a randomised controlled trial reported subgroup analyses. Logistic regression was used to examine the association of prespecified study characteristics with reporting versus not reporting of subgroup analyses. Results 469 randomised controlled trials were included, of which 207 (44%) reported subgroup analyses. High impact journals (adjusted odds ratio 2.64, 95% confidence interval 1.62 to 4.33), non-surgical (versus surgical) trials (2.10, 1.26 to 3.50), and larger sample size (3.38, 1.64 to 6.99) were associated with more frequent reporting of subgroup analyses. The strength of association between trial funding and reporting of subgroups differed in trials with and without statistically significant primary outcomes (interaction P=0.02). In trials without statistically significant results for the primary outcome, industry funded trials were more likely to report subgroup analyses (2.29, 1.30 to 4.72) than non-industry funded trials. This was not true for trials with a statistically significant primary outcome (0.79, 0.46 to 1.36). 
Industry funded trials were associated with less frequent prespecification of subgroup hypotheses (31.3% v 38.0%, adjusted odds ratio 0.49, 0.26 to 0.94), and less use of the interaction test for analyses of subgroup effects (41.4% v 49.1%, 0.52, 0.28 to 0.97) than non-industry funded trials. Conclusion Industry funded randomised controlled trials, in the absence of statistically significant primary outcomes, are more likely to report subgroup analyses than non-industry funded trials. Industry funded trials less frequently prespecify subgroup hypotheses and less frequently test for interaction than non-industry funded trials. Subgroup analyses from industry funded trials with negative results for the primary outcome should be viewed with caution.

Journal ArticleDOI
28 Jan 2011-PLOS ONE
TL;DR: This systematic review and meta-analysis of randomized controlled trials comparing pressure and volume-limited (PVL) ventilation strategies with more traditional mechanical ventilation in adults with ALI and ARDS suggests that, compared with traditional approaches to ventilation, PVL strategies in critically ill adults with ALI and ARDS reduce mortality and are associated with increased use of paralytic agents.
Abstract: Background Acute lung injury (ALI) and acute respiratory distress syndrome (ARDS) are life threatening clinical conditions seen in critically ill patients with diverse underlying illnesses. Lung injury may be perpetuated by ventilation strategies that do not limit lung volumes and airway pressures. We conducted a systematic review and meta-analysis of randomized controlled trials (RCTs) comparing pressure and volume-limited (PVL) ventilation strategies with more traditional mechanical ventilation in adults with ALI and ARDS.

Journal ArticleDOI
TL;DR: In this article, patients with diabetes report a strong preference for practical trials measuring the effect of treatments on patient-important outcomes, and patients with poor glycemic control were more likely to prefer HbA1c as a primary end point (odds ratio: 1.5; 95% confidence interval: 1, 2.1).

Journal ArticleDOI
TL;DR: Appropriate prophylaxis provides better value in terms of costs and health gains than routine screening for DVT, and programs achieving increased adherence to best-practice venous thromboembolism prevention were cost-effective over a wide range of program costs and were robust in probabilistic sensitivity analyses.
Abstract: Rationale: Venous thromboembolism is difficult to diagnose in critically ill patients and may increase morbidity and mortality. Objectives: To evaluate the cost-effectiveness of strategies to reduce morbidity from venous thromboembolism in critically ill patients. Methods: A Markov decision analytic model to compare weekly compression ultrasound screening (screening) plus investigation for clinically suspected deep vein thrombosis (DVT) (case finding) versus case finding alone; and a hypothetical program to increase adherence to DVT prevention. Probabilities were derived from a systematic review of venous thromboembolism in medical–surgical intensive care unit patients. Costs (in 2010 $US) were obtained from hospitals in Canada, Australia, and the United States, and the medical literature. Analyses were conducted from a societal perspective over a lifetime horizon. Outcomes included costs, quality-adjusted life-years (QALY), and incremental cost-effectiveness ratios. Measurements and Main Results: In the bas...
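
The incremental cost-effectiveness ratio named among the outcomes is the difference in cost between two strategies divided by the difference in QALYs. The sketch below shows that arithmetic only; the function name and the dollar and QALY figures are invented for illustration and are not results from this study, whose numbers are cut off in the abstract above.

```python
def incremental_cost_effectiveness_ratio(cost_new: float, qaly_new: float,
                                         cost_old: float, qaly_old: float) -> float:
    """Extra cost per extra QALY of the new strategy versus the old one."""
    delta_cost = cost_new - cost_old
    delta_qaly = qaly_new - qaly_old
    if delta_qaly == 0:
        raise ValueError("strategies yield equal QALYs; the ratio is undefined")
    return delta_cost / delta_qaly


# Hypothetical inputs, not taken from the study.
icer = incremental_cost_effectiveness_ratio(
    cost_new=12_000.0, qaly_new=10.5,
    cost_old=10_000.0, qaly_old=10.0,
)
print(f"ICER: ${icer:,.0f} per QALY gained")
```

A strategy that both costs less and yields more QALYs gives a negative ratio and is usually described as dominant instead of being reported with a ratio.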

Journal Article
TL;DR: This study suggests that major perioperative vascular events are common, that the RCRI underestimates risk, and that monitoring troponins after surgery can assist physicians to avoid missing myocardial infarction.
Abstract: Objectives Among patients undergoing noncardiac surgery, our objectives were to: (1) determine the feasibility of undertaking a large international cohort study; (2) estimate the current incidence of major perioperative vascular events; (3) compare the observed event rates to the expected event rates according to the Revised Cardiac Risk Index (RCRI); and (4) provide an estimate of the proportion of myocardial infarctions without ischemic symptoms that may go undetected without perioperative troponin monitoring. Design An international prospective cohort pilot study. Participants Patients undergoing noncardiac surgery who were >45 years of age, receiving a general or regional anesthetic, and requiring hospital admission. Measurements Patients had a Roche fourth-generation Elecsys troponin T measurement collected 6 to 12 hours postoperatively and on the first, second, and third days after surgery. Our primary outcome was major vascular events (a composite of vascular death [i.e., death from vascular causes], nonfatal myocardial infarction, nonfatal cardiac arrest, and nonfatal stroke) at 30 days after surgery. Our definition for perioperative myocardial infarction included: (1) an elevated troponin T measurement with at least one of the following defining features: ischemic symptoms, development of pathologic Q waves, ischemic electrocardiogram changes, coronary artery intervention, or cardiac imaging evidence of myocardial infarction; or (2) autopsy findings of acute or healing myocardial infarction. Results We recruited 432 patients across 5 hospitals in Canada, China, Italy, Colombia, and Brazil. During the first 30 days after surgery, 6.3% (99% confidence interval 3.9-10.0) of the patients suffered a major vascular event (10 vascular deaths, 16 nonfatal myocardial infarctions, and 1 nonfatal stroke). The observed event rate was increased 6-fold compared with the event rate expected from the RCRI. 
Of the 18 patients who suffered a myocardial infarction, 12 (66.7%) had no ischemic symptoms to suggest myocardial infarction. Conclusions This study suggests that major perioperative vascular events are common, that the RCRI underestimates risk, and that monitoring troponins after surgery can assist physicians to avoid missing myocardial infarction. These results underscore the need for a large international prospective cohort study.

Journal ArticleDOI
TL;DR: In this article, a critique of a set of Australian clinical practice guidelines (CPG) highlighted problematic issues in guideline development concerning conflicts of interest of guideline panellists, validity and strength of recommendations, and involvement of end users and external stakeholders.
Abstract: A recently published critique of a set of Australian clinical practice guidelines (CPG) highlighted problematic issues in guideline development concerning conflicts of interest of guideline panellists, validity and strength of recommendations, and involvement of end users and external stakeholders. Management of financial or intellectual conflicts of interest requires: full disclosure; limitations on industry or agency financial support during guideline development; a representative panel that includes conflict-free members; and only conflict-free panellists to be involved in drafting guideline recommendations. Guideline panels should consider adopting the GRADE (Grading of Recommendations Assessment, Development and Evaluation) system to assist in determining the validity and strength of recommendations. Guideline panels should seek formal feedback from external stakeholders and end users. Enacting such policies aims to lend greater transparency and credibility to CPG, limit protracted and unhelpful interpretive debates, and promote wider use of CPG.

Journal ArticleDOI
TL;DR: If PROTECT shows that LMWH is more effective than UFH, this trial will change practice in that LMWH may be the anticoagulant thromboprophylaxis of choice for this population, and PROTECT will be the largest investigator-initiated peer-review funded thromboembolism trial in critical care in the world.

Journal ArticleDOI
TL;DR: Disease biology rather than intensity of conditioning regimen seems to determine outcomes of alloHCT in patients aged 40–60 years with AML/MDS.
Abstract: The optimum intensity of conditioning therapy in patients aged 40-60 years with AML and myelodysplastic syndrome (MDS) undergoing allogeneic haematopoietic cell transplantation (alloHCT) remains uncertain. We compared outcomes of reduced intensity conditioning (RIC) and conventional intensity conditioning (CIC) in 101 consecutive patients (CIC, 62; RIC, 39) with AML and MDS aged 40-60 years undergoing alloHCT from 2002 to 2008 at our centre. The median age, unrelated transplants and co-morbidity index were higher in the RIC group. Median OS and EFS were 31.0 months (95% confidence interval (CI): 12.8-59.3) and 20.7 months (95% CI: 11.0-30.4), respectively, with no significant difference between the two cohorts. The 3-year treatment-related mortality (TRM) and relapse were 28% (95% CI: 21-39) and 25% (95% CI: 17-36), respectively, with no significant difference between the two cohorts. No difference in OS, EFS, TRM or relapse was observed between the two cohorts in the multivariate model. Only disease risk was significantly associated with OS (Hazard ratio (HR): 1.85, CI: 1.01-3.45), EFS (HR: 1.73, 95% CI: 1.00-3.10) and cumulative relapse (HR: 3.24, 95% CI: 1.08-10.12). Disease biology rather than intensity of conditioning regimen seems to determine outcomes of alloHCT in patients aged 40-60 years with AML/MDS.


Journal ArticleDOI
TL;DR: The objective is to complete the cultural adaptation of Qualiveen‐30 into Italian, a neurological urinary disorder‐specific health‐related quality of life instrument recommended in the European Association of Urology guideline 2008.
Abstract: Purpose Qualiveen-30 is a neurological urinary disorder (UD)-specific health-related quality of life (HRQL) instrument, recommended in the European Association of Urology guideline 2008. The objective is to complete the cultural adaptation of Qualiveen-30 into Italian. Materials and Methods One hundred and twenty eight Italian-speaking spinal cord injury (SCI) patients completed Qualiveen-30 and the SF-12 physical and mental component (PC and MC) at enrolment and 4 weeks later. At follow-up, patients also made global ratings of change (GRC) in urinary HRQL. Results Qualiveen-30 proved reliable (intraclass correlation coefficients of four domains: 0.77–0.90). Correlations with SF-12 and GRC were generally consistent with our a priori predictions. Qualiveen-30 domains showed weak-to-moderate cross-sectional correlations with SF-12 scores (0.31–0.45 PC and 0.28–0.45 MC). Correlations between changes in Qualiveen-30 scores and in SF-12-PC scores were weak or absent. Correlations between changes in Qualiveen-30 scores and in SF-12-MC scores were weak to moderate (0.25–0.38). Relationships between change in Qualiveen-30 and GRC were moderate to strong (0.48–0.56). The responsiveness was excellent, similar to the original form (SMR: 1.76–2.31). Minimally important difference values in the four domains varied from 0.34 to 0.47. Conclusions Italian Qualiveen-30 is a reliable, valid, and responsive measure of UD-related HRQL in SCI patients. Investigators can be confident of the Qualiveen-30 questionnaire's ability in distinguishing between patients in a cross-sectional survey, as well as in measuring within-subject changes over time in clinical trials in French, English, and Italian. Neurourol. Urodynam. 30:354–359, 2011. © 2011 Wiley-Liss, Inc.

Journal ArticleDOI
TL;DR: Seven widely used criteria to assess subgroup analyses in the surgical literature are introduced and two examples of subgroup analyses from a large randomized trial are included to elaborate on the use of these criteria.
Abstract: Subgroup analyses are often reported in randomized controlled trials and meta-analyses. Apparent subgroup effects may, however, be misleading. Surgeons may therefore find it challenging to decide whether to believe a claim of subgroup effect (i.e., an apparent difference in treatment effect between subgroups of the study population). In the present study, we introduce seven widely used criteria to assess subgroup analyses in the surgical literature and include two examples of subgroup analyses from a large randomized trial to elaborate on the use of these criteria. Typically, inferences regarding subgroup effects are stronger if the comparison is made within rather than between studies, if the test for interaction suggests that chance is an unlikely explanation for apparent differences, if the subgroup hypothesis was specified a priori, if it was one of a small number of hypotheses tested, if the difference in effect between subgroup categories is large, if it is consistent across studies, and if there is indirect evidence supporting the difference (a biological rationale). When testing the impact of surgical interventions, investigators may examine whether the effects differ between subgroups of patients or ways of administering an intervention—so-called subgroup analysis. For instance, in a randomized trial of removable splinting compared with casting for wrist buckle fractures in children, children with moderate injury (but not those with mild or severe injury) in the splint group had a larger change in scores on the Activity Scales for Kids than did the casting group1. In another example, a meta-analysis of sutures compared with staples for skin closure in orthopaedic surgery, the risk of a wound infection developing in patients with hip surgery (but not in other groups) was four times greater after staple closure than after suture closure2. Typically, the primary hypothesis of a randomized trial is to investigate …