
Showing papers in "Emerging Themes in Epidemiology in 2017"


Journal ArticleDOI
TL;DR: As multiple imputation becomes further established as a standard approach for handling missing data, it will become increasingly important that researchers employ appropriate model checking approaches to ensure that reliable results are obtained when using this method.
Abstract: Multiple imputation has become very popular as a general-purpose method for handling missing data. The validity of multiple-imputation-based analyses relies on the use of an appropriate model to impute the missing values. Despite the widespread use of multiple imputation, there are few guidelines available for checking imputation models. In this paper, we provide an overview of currently available methods for checking imputation models. These include graphical checks and numerical summaries, as well as simulation-based methods such as posterior predictive checking. These model checking techniques are illustrated using an analysis affected by missing data from the Longitudinal Study of Australian Children. As multiple imputation becomes further established as a standard approach for handling missing data, it will become increasingly important that researchers employ appropriate model checking approaches to ensure that reliable results are obtained when using this method.
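The numerical-summary checks described above can be sketched in a few lines: compare the mean and spread of imputed values against the observed values for each imputed dataset. The following is a minimal stdlib-only illustration with simulated data (not the paper's method or the Longitudinal Study of Australian Children data); the imputation model is deliberately mis-centred so the check flags it. Note that a discrepancy is a prompt for investigation, not proof of misspecification, since data may be missing not at random.

```python
import random
import statistics

def check_imputations(observed, imputed_sets):
    """Numerical imputation check: compare the mean and SD of observed
    values against each set of imputed values. Large discrepancies can
    flag a misspecified imputation model."""
    obs_mean = statistics.mean(observed)
    obs_sd = statistics.stdev(observed)
    report = []
    for m, imp in enumerate(imputed_sets, start=1):
        report.append({
            "imputation": m,
            "mean_diff": statistics.mean(imp) - obs_mean,
            "sd_ratio": statistics.stdev(imp) / obs_sd,
        })
    return report

random.seed(1)
observed = [random.gauss(50, 10) for _ in range(200)]
# A deliberately poor imputation model with the wrong centre:
imputed_sets = [[random.gauss(60, 10) for _ in range(50)] for _ in range(5)]
for row in check_imputations(observed, imputed_sets):
    print(row)  # mean_diff near +10 exposes the mis-centred model
```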

123 citations


Journal ArticleDOI
TL;DR: A novel graphical visualization provides a more scientifically meaningful characterization of the subgroups identified by decision trees, and advocates the use of the newer CTree technique due to its simplicity and ease of interpretation.
Abstract: In many studies, it is of interest to identify population subgroups that are relatively homogeneous with respect to an outcome. The nature of these subgroups can provide insight into effect mechanisms and suggest targets for tailored interventions. However, identifying relevant subgroups can be challenging with standard statistical methods. We review the literature on decision trees, a family of techniques for partitioning the population, on the basis of covariates, into distinct subgroups who share similar values of an outcome variable. We compare two decision tree methods, the popular Classification and Regression tree (CART) technique and the newer Conditional Inference tree (CTree) technique, assessing their performance in a simulation study and using data from the Box Lunch Study, a randomized controlled trial of a portion size intervention. Both CART and CTree identify homogeneous population subgroups and offer improved prediction accuracy relative to regression-based approaches when subgroups are truly present in the data. An important distinction between CART and CTree is that the latter uses a formal statistical hypothesis testing framework in building decision trees, which simplifies the process of identifying and interpreting the final tree model. We also introduce a novel way to visualize the subgroups defined by decision trees. Our novel graphical visualization provides a more scientifically meaningful characterization of the subgroups identified by decision trees. Decision trees are a useful tool for identifying homogeneous subgroups defined by combinations of individual characteristics. While all decision tree techniques generate subgroups, we advocate the use of the newer CTree technique due to its simplicity and ease of interpretation.
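The node-splitting idea behind a CART-style regression tree can be illustrated with a minimal sketch: for a single covariate, choose the cut point that most reduces the within-subgroup sum of squared deviations of the outcome. The data and values below are hypothetical; this is not the CART or CTree implementation used in the paper, and a full tree would apply this search recursively over all covariates.

```python
import statistics

def best_split(x, y):
    """CART-style search for the single split on covariate x that
    minimises the total within-node sum of squared deviations of
    outcome y (a regression tree's node-splitting criterion)."""
    def sse(vals):
        if len(vals) < 2:
            return 0.0
        m = statistics.mean(vals)
        return sum((v - m) ** 2 for v in vals)
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint between neighbours
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (cut, total)
    return best

# Toy data with a true subgroup boundary at x = 5
x = [1, 2, 3, 4, 6, 7, 8, 9]
y = [10, 11, 10, 11, 20, 21, 20, 21]
cut, loss = best_split(x, y)
print(cut, loss)  # -> 5.0 2.0
```

CTree differs in that it chooses splits via permutation-based hypothesis tests with multiplicity adjustment, which also yields a natural stopping rule.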

75 citations


Journal ArticleDOI
TL;DR: The proposed framework provides a basis for systematically developing the emergent science of participatory epidemiology and specifies how participatory approaches not only differ from, but can also enhance, common approaches in epidemiology.
Abstract: Background: Epidemiology has contributed in many ways to identifying various risk factors for disease and to promoting population health. However, there is a continuing debate about the ability of epidemiology not only to describe, but also to provide results which can be better translated into public health practice. It has been proposed that participatory research approaches be applied to epidemiology as a way to bridge this gap between description and action. A systematic account of what constitutes participatory epidemiology practice has, however, been lacking.

60 citations


Journal ArticleDOI
TL;DR: This study supports the equivalence of the compared survey designs and suggests that, in the studied setting, using online-only design does not cause strong distortion of the results.
Abstract: The increasing availability of the Internet allows more epidemiological studies to rely on online-only data collection. We compare response patterns in a population-based health survey using two survey designs: mixed-mode (choice between paper-and-pencil and online questionnaires) and online-only design (without choice). We used data from a longitudinal panel, the Hygiene and Behaviour Infectious Diseases Study (HaBIDS), conducted in 2014/2015 in four regions in Lower Saxony, Germany. Individuals were recruited using address-based probability sampling. In two regions, individuals could choose between paper-and-pencil and online questionnaires. In the other two regions, individuals were offered online-only participation. We compared sociodemographic characteristics of respondents who filled in all panel questionnaires between the mixed-mode group (n = 1110) and the online-only group (n = 482). Using 134 items, we performed multinomial logistic regression to compare responses between survey designs in terms of type (missing, “do not know” or valid response) and ordinal regression to compare responses in terms of content. We applied the false discovery rate (FDR) to control for multiple testing and investigated the effects of adjusting for sociodemographic characteristics. For validation of the differential response patterns between mixed-mode and online-only, we compared the response patterns between paper and online mode among the respondents in the mixed-mode group in one region (n = 786). Respondents in the online-only group were older than those in the mixed-mode group, but the groups did not differ regarding sex or education. Type of response did not differ between the online-only and the mixed-mode group. Survey design was associated with different content of response in 18 of the 134 investigated items, which decreased to 11 after adjusting for sociodemographic variables.
In the validation within the mixed-mode group, only two of those differences were among the 11 significantly different items. The probability of observing by chance the same two or more significant differences in this setting was 22%. We found similar response patterns in both survey designs, with only a few items being answered differently, likely attributable to chance. Our study supports the equivalence of the compared survey designs and suggests that, in the studied setting, using an online-only design does not cause strong distortion of the results.
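The FDR correction applied to the 134 items is typically implemented with the Benjamini-Hochberg step-up procedure; a minimal sketch with hypothetical p-values (not the study's actual items):

```python
def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure: returns a boolean list
    marking which hypotheses are rejected at false discovery rate q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0  # largest rank whose p-value clears its threshold rank*q/m
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank * q / m:
            k = rank
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k:
            reject[idx] = True
    return reject

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(pvals, q=0.05))  # only the first two are rejected
```

Note the step-up property: a p-value may be rejected even if it exceeds its own threshold, as long as some larger-ranked p-value clears its threshold.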

51 citations


Journal ArticleDOI
TL;DR: This study demonstrates that using routinely collected health data with record linkage and capture–recapture can produce plausible estimates for dementia prevalence and incidence at a population level.
Abstract: Obtaining population-level estimates of the incidence and prevalence of dementia is challenging due to under-diagnosis and under-reporting. We investigated the feasibility of using multiple linked datasets and capture–recapture techniques to estimate rates of dementia among women in Australia. This work is based on the Australian Longitudinal Study on Women’s Health. A random sample of 12,432 women born in 1921–1926 was recruited in 1996. Over 16 years of follow-up, records of dementia were obtained from five sources: three-yearly self-reported surveys; clinical assessments for aged care assistance; death certificates; pharmaceutical prescriptions filled; and, in three Australian States only, hospital in-patient records. A total of 2534 women had a record of dementia in at least one of the data sources. The aged care assessments included dementia records for 79.3% of these women, while pharmaceutical data included 34.6%, death certificates 31.0% and survey data 18.5%. In the States where hospital data were available this source included dementia records for 55.8% of the women. Using capture–recapture methods we estimated that an additional 728 women with dementia had not been identified, increasing the 16-year prevalence for the cohort from 20.4 to 26.0% (95% confidence interval [CI] 25.2, 26.8%). This study demonstrates that using routinely collected health data with record linkage and capture–recapture can produce plausible estimates for dementia prevalence and incidence at a population level.
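The simplest capture–recapture estimator is the two-source Chapman estimator; the study itself combined five sources (multi-source settings are typically handled with log-linear models), so this sketch with hypothetical counts only illustrates the principle: the overlap between sources indicates how many cases every source missed.

```python
def chapman_estimate(n1, n2, m):
    """Chapman's (nearly unbiased) two-source capture-recapture
    estimator of total population size: n1 and n2 are the counts
    identified by each source, m the count identified by both."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1

# Hypothetical counts, NOT the study's actual source overlaps:
# source A records 2010 cases, source B 880, with 700 in both.
print(round(chapman_estimate(2010, 880, 700)))  # -> 2526
```

Intuition: if source B recaptures 700 of source A's 2010 cases, each source is assumed to sample the same underlying population, so the total is scaled up accordingly.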

44 citations


Journal ArticleDOI
TL;DR: The way scientists detect and trace signals is conceptualized in terms of information transmission, a generalization of the mark transmission theory developed by philosopher Wesley Salmon; this framework can help conceptualize how heterogeneous factors, from the micro- and macro-biological to the psycho-social, are causally linked.
Abstract: In the last decades, Systems Biology (including cancer research) has been driven by technology, statistical modelling and bioinformatics. In this paper we try to bring biological and philosophical thinking back. We thus aim at making different traditions of thought compatible: (a) causality in epidemiology and in philosophical theorizing—notably, the “sufficient-component-cause framework” and the “mark transmission” approach; (b) new acquisitions about disease pathogenesis, e.g. the “branched model” in cancer, and the role of biomarkers in this process; (c) the burgeoning of omics research, with a large number of “signals” and of associations that need to be interpreted. In the paper we summarize first the current views on carcinogenesis, and then explore the relevance of current philosophical interpretations of “cancer causes”. We try to offer a unifying framework to incorporate biomarkers and omic data into causal models, referring to a position called “evidential pluralism”. According to this view, causal reasoning is based on both “evidence of difference-making” (e.g. associations) and on “evidence of underlying biological mechanisms”. We conceptualize the way scientists detect and trace signals in terms of information transmission, which is a generalization of the mark transmission theory developed by philosopher Wesley Salmon. Our approach can help us conceptualize how heterogeneous factors, from the micro- and macro-biological to the psycho-social, are causally linked. This is important not only to understand cancer etiology, but also to design public health policies that target the right causal factors at the macro-level.

32 citations


Journal ArticleDOI
TL;DR: It is suggested that VL elimination has not been achieved yet because existing transmission dynamics models for VL fail to capture relevant local socio-economic risk factors.
Abstract: Neglected tropical diseases (NTDs) account for a large proportion of the global disease burden, and their control faces several challenges, including diminishing human and financial resources for those affected by such diseases. Visceral leishmaniasis (VL), the second-largest parasitic killer (after malaria) and an NTD, affects poor populations and causes considerable cost to affected individuals. Mathematical models can serve as a critical and cost-effective tool for understanding VL dynamics; however, the complex array of socio-economic factors affecting its dynamics needs to be identified and appropriately incorporated within a dynamical modeling framework. This study reviews literature on vector-borne diseases and collects challenges and successes related to the modeling of transmission dynamics of VL. Possible ways of creating a comprehensive mathematical model are also discussed. Published literature is reviewed in three categories: (i) identification of non-traditional but critical mechanisms for VL transmission in resource-limited regions; (ii) mathematical models used for the dynamics of leishmaniasis and other related vector-borne infectious diseases; and (iii) examples of modeling that have the potential to capture identified mechanisms of VL to study its dynamics. This review suggests that VL elimination has not been achieved yet because existing transmission dynamics models for VL fail to capture relevant local socio-economic risk factors. This study identifies critical risk factors of VL and distributes them into six categories (atmosphere, access, availability, awareness, adherence, and accedence). The study also suggests novel quantitative models, parts of which are borrowed from models of other non-neglected diseases, for incorporating these factors and using them to understand VL dynamics and to evaluate control programs for achieving VL elimination in a resource-limited environment.
Controlling VL is expensive for local communities in endemic countries, where individuals remain in the vicious cycle of disease and poverty. Smarter public investment in control programs would not only decrease the VL disease burden but also help to alleviate poverty. However, dynamical models are necessary to evaluate intervention strategies and formulate a cost-effective optimal policy for eradication of VL.
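A transmission dynamics model of the kind reviewed can be sketched as a minimal host-vector compartmental system (Ross-Macdonald style) integrated with Euler steps. All parameter values below are illustrative placeholders, not fitted VL estimates, and the sketch omits the socio-economic factors the review argues are essential.

```python
def simulate_vl(beta_hv, beta_vh, gamma, mu_v, days, dt=0.1):
    """Minimal host-vector transmission sketch integrated with Euler
    steps. State: susceptible/infected host fractions (Sh, Ih) and
    infected vector fraction Iv. Recovered hosts return to Sh, so
    Sh + Ih is conserved."""
    Sh, Ih, Iv = 0.99, 0.01, 0.0
    for _ in range(int(days / dt)):
        new_h = beta_vh * Iv * Sh          # vector-to-host infections
        new_v = beta_hv * Ih * (1 - Iv)    # host-to-vector infections
        Sh += dt * (-new_h + gamma * Ih)   # recovery returns hosts to S
        Ih += dt * (new_h - gamma * Ih)
        Iv += dt * (new_v - mu_v * Iv)     # vector death removes infection
    return Sh, Ih, Iv

Sh, Ih, Iv = simulate_vl(beta_hv=0.3, beta_vh=0.2, gamma=0.05, mu_v=0.1, days=365)
print(Ih, Iv)  # with these placeholder rates the infection becomes endemic
```

Incorporating the review's six risk-factor categories would mean making rates such as beta_vh functions of access, awareness, and similar covariates rather than constants.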

27 citations


Journal ArticleDOI
TL;DR: A practical method is introduced for the analysis of ABPM where the rate of increase or decrease for different periods of the day can be determined and may be particularly useful in examining chronotherapy effects of antihypertensive medication.
Abstract: There are many examples of physiological processes that follow a circadian cycle, and researchers are interested in alternative methods to illustrate and quantify this diurnal variation. Circadian blood pressure (BP) deserves additional attention given uncertainty relating to the prognostic significance of BP variability in relation to cardiovascular disease. However, the majority of studies exploring variability in ambulatory blood pressure monitoring (ABPM) collapse the data into single readings, ignoring the temporal nature of the data. Advanced statistical techniques are required to explore complete variation over 24 h. We use piecewise linear splines in a mixed-effects model with a constraint to ensure periodicity as a novel application for modelling daily blood pressure. Data from the Mitchelstown Study, a cross-sectional study of Irish adults aged 47–73 years (n = 2047), were utilized. A subsample (n = 1207) underwent 24-h ABPM. We compared patterns between those with and without evidence of subclinical target organ damage (microalbuminuria). We were able to quantify the steepest rise and fall in SBP, which occurred just after waking (2.23 mmHg/30 min) and immediately after falling asleep (−1.93 mmHg/30 min) respectively. The variation about an individual’s trajectory over 24 h was 12.3 mmHg (standard deviation). On average, those with microalbuminuria were found to have significantly higher SBP (7.6 mmHg, 95% CI 5.0–10.1) after adjustment for age, sex and BMI. Including an interaction term between each linear spline and microalbuminuria did not improve model fit. We have introduced a practical method for the analysis of ABPM whereby we can determine the rate of increase or decrease for different periods of the day. This may be particularly useful in examining chronotherapy effects of antihypertensive medication.
It offers new measures of short-term BP variability as we can quantify the variation about an individual’s trajectory but also allows examination of the variation in slopes between individuals (random-effects).
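The periodicity constraint on the piecewise linear splines can be illustrated directly: the slope-weighted interval lengths must sum to zero, so the fitted BP value at 24 h equals the value at 0 h. The knots and slopes below are hypothetical, not the paper's estimates, and the sketch covers only a single fixed trajectory (no random effects).

```python
def periodic_piecewise(knots, slopes, start_value):
    """Builds a piecewise-linear function over 24 h from knot times and
    per-interval slopes, after checking the periodicity constraint that
    slope-weighted interval lengths sum to zero (value at 24 h equals
    value at 0 h)."""
    durations = [knots[i + 1] - knots[i] for i in range(len(knots) - 1)]
    if abs(sum(s * d for s, d in zip(slopes, durations))) > 1e-9:
        raise ValueError("slopes violate the 24-h periodicity constraint")
    def f(t):
        t = t % 24
        value = start_value
        for i, d in enumerate(durations):
            if t <= knots[i + 1]:
                return value + slopes[i] * (t - knots[i])
            value += slopes[i] * d
        return value
    return f

# Illustrative knots (hours) and slopes (mmHg per hour), not the paper's estimates:
bp = periodic_piecewise(knots=[0, 7, 12, 22, 24],
                        slopes=[-1.0, 3.0, -0.25, -2.75],
                        start_value=120.0)
print(bp(0), bp(7), bp(12), bp(24))  # -> 120.0 113.0 128.0 120.0
```

In the paper's mixed-effects setting each individual would additionally get random deviations about this population-level trajectory.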

25 citations


Journal ArticleDOI
TL;DR: The simple expression of the common HR estimator provides a useful summary of exposure effect that is less sensitive to censoring patterns than the marginal HR estimator; the two estimators rely on distinct assumptions and interpretations and are complementary alternatives.
Abstract: In matched-pair cohort studies with censored events, the hazard ratio (HR) may be of main interest. However, it is less well known in the epidemiologic literature that the partial maximum likelihood estimator of a common HR conditional on matched pairs can be written in a simple form, namely, the ratio of the numbers of two pair-types. Moreover, because the HR is a noncollapsible measure and its constancy across matched pairs is a restrictive assumption, the marginal HR as an “average” HR may be targeted in analysis more often than the conditional HR. Based on its simple expression, we provide an alternative interpretation of the common HR estimator as the odds of the matched-pair analog of the C-statistic for censored time-to-event data. Through simulations assuming proportional hazards within matched pairs, the influence of various censoring patterns on the marginal and common HR estimators of unstratified and stratified proportional hazards models, respectively, was evaluated. The methods were applied to a real propensity-score matched dataset from the Rotterdam tumor bank of primary breast cancer. We showed that stratified models unbiasedly estimated a common HR under proportional hazards within matched pairs. However, the marginal HR estimator with a robust variance estimator lacks interpretation as an “average” marginal HR even if censoring is unconditionally independent of the event, unless no censoring occurs or no exposure effect is present. Furthermore, exposure-dependent censoring biased the marginal HR estimator away from both the conditional HR and an “average” marginal HR, irrespective of whether an exposure effect is present. From the matched Rotterdam dataset, we estimated the HR for relapse-free survival of absence versus presence of chemotherapy; estimates (95% confidence interval) were 1.47 (1.18–1.83) for the common HR and 1.33 (1.13–1.57) for the marginal HR.
The simple expression of the common HR estimator would be a useful summary of exposure effect, which is less sensitive to censoring patterns than the marginal HR estimator. The common and the marginal HR estimators, both relying on distinct assumptions and interpretations, are complementary alternatives for each other.
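The "ratio of the numbers of two pair-types" form of the common HR estimator can be sketched directly: count matched pairs in which the exposed member's event is observed first versus pairs in which the unexposed member's event is observed first; pairs whose ordering censoring leaves undetermined are uninformative. The data below are hypothetical.

```python
def common_hr(pairs):
    """Conditional (common) HR estimator for matched pairs, written as
    the ratio of the two informative pair-type counts. Each pair is
    (t_exposed, event_exposed, t_unexposed, event_unexposed), with
    event flags 1 = observed, 0 = censored."""
    exposed_first = unexposed_first = 0
    for t1, d1, t0, d0 in pairs:
        if d1 and t1 < t0:        # exposed event observed while unexposed at risk
            exposed_first += 1
        elif d0 and t0 < t1:      # unexposed event observed while exposed at risk
            unexposed_first += 1
        # otherwise ordering is undetermined by censoring: uninformative
    return exposed_first / unexposed_first

pairs = [
    (2.0, 1, 5.0, 1),   # exposed fails first
    (1.5, 1, 4.0, 0),   # exposed fails before unexposed is censored
    (6.0, 0, 3.0, 1),   # unexposed fails first
    (2.5, 0, 2.0, 0),   # both censored: uninformative
]
print(common_hr(pairs))  # -> 2.0
```

A pair such as (2.0, 1, 1.5, 0), where the unexposed member is censored before the exposed event, correctly contributes to neither count.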

24 citations


Journal ArticleDOI
TL;DR: This article proposes that most study designs for the evaluation of cluster-level interventions fall into four broad categories: the cluster randomised trial (CRT), the non-randomised cluster trial (NCT), the controlled before-and-after study (CBA), and the before-and-after study without control (BA).
Abstract: The preferred method to evaluate public health interventions delivered at the level of whole communities is the cluster randomised trial (CRT). The practical limitations of CRTs and the need for alternative methods continue to be debated. There is no consensus on how to classify study designs to evaluate interventions, and how different design features are related to the strength of evidence. This article proposes that most study designs for the evaluation of cluster-level interventions fall into four broad categories: the CRT, the non-randomised cluster trial (NCT), the controlled before-and-after study (CBA), and the before-and-after study without control (BA). A CRT needs to fulfil two basic criteria: (1) the intervention is allocated at random; (2) there are sufficient clusters to allow a statistical between-arm comparison. In an NCT, statistical comparison is made across trial arms as in a CRT, but treatment allocation is not random. The defining feature of a CBA is that intervention and control arms are not compared directly, usually because there are insufficient clusters in each arm to allow a statistical comparison. Rather, baseline and follow-up measures of the outcome of interest are compared in the intervention arm, and separately in the control arm. A BA is a CBA without a control group. Each design may provide useful or misleading evidence. A precise baseline measurement of the outcome of interest is critical for causal inference in all studies except CRTs. Apart from statistical considerations, the exploration of pre/post trends in the outcome allows a more transparent discussion of study weaknesses than is possible in non-randomised studies without a baseline measure.
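The proposed four-category taxonomy can be written down as a small decision rule over the defining features (randomised allocation, enough clusters for a between-arm comparison, presence of a control group). This is a sketch of the classification logic as described in the abstract, not an official instrument:

```python
def classify_design(randomised, enough_clusters, has_control, between_arm_comparison):
    """Maps the defining criteria onto the four design categories
    (CRT, NCT, CBA, BA) proposed for cluster-level interventions."""
    if not has_control:
        return "BA"    # before-and-after study without control
    if not between_arm_comparison or not enough_clusters:
        return "CBA"   # arms analysed separately via pre/post change
    return "CRT" if randomised else "NCT"

print(classify_design(True, True, True, True))      # -> CRT
print(classify_design(False, True, True, True))     # -> NCT
print(classify_design(False, False, True, False))   # -> CBA
print(classify_design(False, False, False, False))  # -> BA
```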

23 citations


Journal ArticleDOI
TL;DR: This review aims to identify spatial analysis methods used in CRTs and to improve understanding of the impact of spatial effects on trial results; existing approaches fell into two categories, spatial variables and spatial modelling.
Abstract: Cluster randomised trials (CRTs) often use geographical areas as the unit of randomisation; however, explicit consideration of the location and spatial distribution of observations is rare. In many trials the location of participants will have little importance, but in some, especially trials against infectious diseases, spillover effects due to participants being located close together may affect trial results. This review aims to identify spatial analysis methods used in CRTs and to improve understanding of the impact of spatial effects on trial results. We conducted a systematic review of CRTs containing spatial methods, defined as methods that account for the structure, location, or relative distances between observations. We searched three sources: the Ovid/Medline, Pubmed, and Web of Science databases. Spatial methods were categorised and details of the impact of spatial effects on trial results recorded. We identified ten papers which met the inclusion criteria, comprising thirteen trials. We found that existing approaches fell into two categories: spatial variables and spatial modelling. The spatial variable approach was the most common and involved standard statistical analysis of distance measurements. Spatial modelling is a more sophisticated approach which incorporates the spatial structure of the data within a random-effects model. Studies tended to demonstrate the importance of accounting for the location and distribution of observations in estimating unbiased effects. There have been a few attempts to control for and estimate spatial effects within the context of human CRTs, but our overall understanding is limited. Although spatial effects may bias trial results, their consideration was usually a supplementary, rather than primary, analysis. Further work is required to evaluate and develop the spatial methodologies relevant to a range of CRTs.

Journal ArticleDOI
TL;DR: In longitudinal studies with loss to follow-up, incorporating proxies for the study outcome, obtained via linkage to external sources of data, as auxiliary variables in MI models can give practically important bias reduction and efficiency gains when the study outcome is MNAR.
Abstract: When an outcome variable is missing not at random (MNAR: probability of missingness depends on outcome values), estimates of the effect of an exposure on this outcome are often biased. We investigated the extent of this bias and examined whether the bias can be reduced through incorporating proxy outcomes obtained through linkage to administrative data as auxiliary variables in multiple imputation (MI). Using data from the Avon Longitudinal Study of Parents and Children (ALSPAC) we estimated the association between breastfeeding and IQ (continuous outcome), incorporating linked attainment data (proxies for IQ) as auxiliary variables in MI models. Simulation studies explored the impact of varying the proportion of missing data (from 20 to 80%), the correlation between the outcome and its proxy (0.1–0.9), the strength of the missing data mechanism, and having a proxy variable that was incomplete. Incorporating a linked proxy for the missing outcome as an auxiliary variable reduced bias and increased efficiency in all scenarios, even when 80% of the outcome was missing. Using an incomplete proxy was similarly beneficial. High correlations (> 0.5) between the outcome and its proxy substantially reduced the missing information. Consistent with this, the ALSPAC analysis showed inclusion of a proxy reduced bias and improved efficiency. Gains with additional proxies were modest. In longitudinal studies with loss to follow-up, incorporating proxies for the study outcome, obtained via linkage to external sources of data, as auxiliary variables in MI models can give practically important bias reduction and efficiency gains when the study outcome is MNAR.
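The mechanism by which a linked proxy reduces MNAR bias can be sketched with a toy simulation: low outcome values are made more likely to be missing, so a complete-case mean is biased upward, and a regression imputation from a strongly correlated proxy pulls the estimate back toward the truth. This stdlib-only sketch (hypothetical data, single rather than multiple imputation) is a simplification of the MI approach in the paper.

```python
import random
import statistics

def regression_impute(y, proxy):
    """Single regression imputation of a missing outcome (None entries
    in y) from a fully observed proxy variable: a simplified stand-in
    for using the proxy as an auxiliary variable in MI."""
    obs = [(p, v) for p, v in zip(proxy, y) if v is not None]
    px = [p for p, _ in obs]
    mx = statistics.mean(px)
    my = statistics.mean(v for _, v in obs)
    b = (sum((p - mx) * (v - my) for p, v in obs)
         / sum((p - mx) ** 2 for p in px))          # least-squares slope
    a = my - b * mx
    return [v if v is not None else a + b * p for p, v in zip(proxy, y)]

random.seed(7)
true_iq = [random.gauss(100, 15) for _ in range(2000)]
proxy = [v + random.gauss(0, 5) for v in true_iq]   # strongly correlated proxy
# MNAR mechanism: lower outcome values are more likely to be missing
y = [v if random.random() < min(1.0, v / 120) else None for v in true_iq]
cc_mean = statistics.mean(v for v in y if v is not None)   # complete cases: biased up
imp_mean = statistics.mean(regression_impute(y, proxy))    # proxy pulls estimate back
print(cc_mean, imp_mean)
```

A proper MI analysis would add noise to the imputed values, repeat the imputation several times, and pool estimates with Rubin's rules; the bias-reduction mechanism is the same.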

Journal ArticleDOI
TL;DR: A flexible joint-modeling approach provides a means to simultaneously estimate lung function trajectories and the risk of pulmonary exacerbations for individual patients, and offers additional insights into the clinical course of cystic fibrosis that were not possible using conventional approaches.
Abstract: Epidemiologic surveillance of lung function is key to clinical care of individuals with cystic fibrosis, but lung function decline is nonlinear and often impacted by acute respiratory events known as pulmonary exacerbations. Statistical models are needed to simultaneously estimate lung function decline while providing risk estimates for the onset of pulmonary exacerbations, in order to identify relevant predictors of declining lung function and understand how these associations could be used to predict the onset of pulmonary exacerbations. Using longitudinal lung function (FEV1) measurements and time-to-event data on pulmonary exacerbations from individuals in the United States Cystic Fibrosis Registry, we implemented a flexible semiparametric joint model consisting of a mixed-effects submodel with regression splines to fit repeated FEV1 measurements and a time-to-event submodel for possibly censored data on pulmonary exacerbations. We contrasted this approach with methods currently used in epidemiological studies and highlight clinical implications. The semiparametric joint model had the best fit of all models examined based on deviance information criterion. Higher starting FEV1 implied more rapid lung function decline in both separate and joint models; however, individualized risk estimates for pulmonary exacerbation differed depending upon model type. Based on shared parameter estimates from the joint model, which accounts for the nonlinear FEV1 trajectory, patients with more positive rates of change were less likely to experience a pulmonary exacerbation (HR per one standard deviation increase in FEV1 rate of change = 0.566, 95% CI 0.516–0.619), and having higher absolute FEV1 also corresponded to lower risk of having a pulmonary exacerbation (HR per one standard deviation increase in FEV1 = 0.856, 95% CI 0.781–0.937). 
At the population level, both submodels indicated significant effects of birth cohort, socioeconomic status and respiratory infections on FEV1 decline, as well as significant effects of gender, socioeconomic status and birth cohort on pulmonary exacerbation risk. Through a flexible joint-modeling approach, we provide a means to simultaneously estimate lung function trajectories and the risk of pulmonary exacerbations for individual patients; we demonstrate how this approach offers additional insights into the clinical course of cystic fibrosis that were not possible using conventional approaches.

Journal ArticleDOI
TL;DR: Children with more than one ND, a smaller group within this population with arguably more complex neurological disabilities, had the highest number of admissions and longest inpatient stays, and may be at greater risk of the adverse effects of hospitalisations.
Abstract: Advances in healthcare have improved the survival of children with neurological disabilities (ND). Studies in the US have shown that children with ND use a substantial proportion of resources in children’s hospitals; however, little research has been conducted in the UK. We aimed to test the hypothesis that children with neurological disabilities use more inpatient resources than children without neurological disabilities, and to quantify any significant differences in resource use. A retrospective observational study was conducted, looking at the number of hospital admissions, total inpatient days and the reason for admissions for paediatric inpatients from January 1st to March 31st 2015. Inpatients were assigned to one of three groups: children without ND, children with one ND, and children with more than one ND. The sample population included 942 inpatients (mean age 6y 6mo). Children with at least one ND accounted for 15.3% of the inpatients, 17.7% of total hospital inpatient admission episodes, and 27.8% of the total inpatient days. Neurological disability had a statistically significant effect on total hospital admissions (p < 0.001). Neurological disability also had a statistically significant effect on total inpatient days (p < 0.001). Neurological disability increased the length of inpatient stay across medicine, specialties, and surgery. Children with ND had more frequent hospital admission episodes and longer inpatient stays. We identified a smaller group within this population with arguably more complex neurological disabilities: children with more than one ND. This group had the highest number of admissions and longest inpatient stays. More frequent hospital admissions and longer inpatient stays may place children with ND at greater risk of the adverse effects of hospitalisations.
We recommend further investigations examining the effects of the different categories of ND on inpatient resource use, and repeating this study at a national level and over a longer period of time.