
Showing papers by "Ashley Akbari" published in 2018


Journal Article
TL;DR: The analysis has shown that the UKMSR portal population is highly analogous to the entirely clinical (validated) population and therefore that the portal population can be utilised as a viable and valid cohort of people with Multiple Sclerosis for study.
Abstract: The UK Multiple Sclerosis Register (UKMSR) is a large cohort study designed to capture ‘real world’ information about living with multiple sclerosis (MS) in the UK from diverse sources. The primary source of data is directly from people with Multiple Sclerosis (pwMS) captured by longitudinal questionnaires via an internet portal. This population's diagnosis of MS is self-reported and therefore unverified. The second data source is clinical data which is captured from MS Specialist Treatment centres across the UK. This includes a clinically confirmed diagnosis of MS (by McDonald criteria) for consented patients. A proportion of the internet population have also been consented at their hospital making comparisons possible. This dataset is called the ‘linked dataset’. The purpose of this paper is to examine the characteristics of the three datasets: the self-reported portal data, clinical data and linked data, in order to assess the validity of the self-reported portal data. The internet (n = 11,021) and clinical (n = 3,003) populations were studied for key shared characteristics. We found them to be closely matched for mean age at diagnosis (clinical = 37.39, portal = 39.28) and gender ratio (female %, portal = 73.1, clinical = 75.2). The Two Sample Kolmogorov-Smirnov test was used for the continuous variables to examine whether they were drawn from the same distribution. The null hypothesis was rejected only for age at diagnosis (D = 0.078, p […]). Our analysis has shown that the UKMSR portal population is highly analogous to the entirely clinical (validated) population. This supports the validity of the self-reported diagnosis and therefore that the portal population can be utilised as a viable and valid cohort of people with Multiple Sclerosis for study.
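
The D statistic reported for the two-sample Kolmogorov-Smirnov test is the largest vertical gap between the empirical cumulative distribution functions of the two samples. The following is a minimal sketch of that calculation, not the authors' analysis code. The function name and the ages are hypothetical and invented for illustration, and the sketch does not compute a p-value.

```python
from bisect import bisect_right


def ks_two_sample_statistic(sample_a, sample_b):
    """Return D, the largest gap between the two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # The gap can only change at observed values, so checking those suffices.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # share of sample A that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # share of sample B that is <= x
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical ages at diagnosis, invented for illustration only.
portal_ages = [24, 29, 33, 37, 39, 41, 45, 52]
clinical_ages = [22, 27, 31, 35, 37, 40, 44, 50]
print(f"D = {ks_two_sample_statistic(portal_ages, clinical_ages):.3f}")
```

A small D means the two distributions track each other closely. Whether a given D leads to rejecting the null hypothesis also depends on the two sample sizes.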

30 citations


Journal Article
TL;DR: These implications are the beginnings of a research agenda for Population Data Science, which if approached as a collective field can catalyze significant advances in the understanding of trends in society, health, and human behavior.
Abstract: Information is increasingly digital, creating opportunities to respond to pressing issues about human populations in near real time using linked datasets that are large, complex, and diverse. The potential social and individual benefits that can come from data-intensive science are large, but raise challenges of balancing individual privacy and the public good, building appropriate socio-technical systems to support data-intensive science, and determining whether defining a new field of inquiry might help move those collective interests and activities forward. A combination of expert engagement, literature review, and iterative conversations led to our conclusion that defining the field of Population Data Science (challenge 3) will help address the other two challenges as well. We define Population Data Science succinctly as the science of data about people and note that it is related to but distinct from the fields of data science and informatics. A broader definition names four characteristics of: data use for positive impact on citizens and society; bringing together and analyzing data from multiple sources; finding population-level insights; and developing safe, privacy-sensitive and ethical infrastructure to support research. One implication of these characteristics is that few people possess all of the requisite knowledge and skills of Population Data Science, so this is by nature a multi-disciplinary field. Other implications include the need to advance various aspects of science, such as data linkage technology, various forms of analytics, and methods of public engagement. These implications are the beginnings of a research agenda for Population Data Science, which if approached as a collective field, can catalyze significant advances in our understanding of trends in society, health, and human behavior.

27 citations


Journal Article
TL;DR: In this article, the authors found that parent-reported wheezing was more prevalent than GP-recorded asthma diagnoses in preschool-aged children, and this difference diminishes with age.
Abstract: Introduction Electronic health records (EHRs) are increasingly used to estimate the prevalence of childhood asthma. The relation of these estimates to those obtained from parent-reported wheezing suggestive of asthma is unclear. We hypothesised that parent-reported wheezing would be more prevalent than general practitioner (GP)-recorded asthma diagnoses in preschool-aged children. Methods 1529 of 1840 (83%) Millennium Cohort Study children registered with GPs in the Welsh Secure Anonymised Information Linkage databank were linked. Prevalences of parent-reported wheezing and GP-recorded asthma diagnoses in the previous 12 months were estimated, respectively, from parent report at ages 3, 5, 7 and 11 years, and from Read codes for asthma diagnoses and prescriptions based on GP EHRs over the same time period. Prevalences were weighted to account for clustered survey design and non-response. Cohen’s kappa statistics were used to assess agreement. Results Parent-reported wheezing was more prevalent than GP-recorded asthma diagnoses at 3 and 5 years. Both diminished with age: by age 11, prevalences of parent-reported wheezing and GP-recorded asthma diagnosis were 12.9% (95% CI 10.6 to 15.4) and 10.9% (8.8 to 13.3), respectively (difference: 2% (−0.5 to 4.5)). Other GP-recorded respiratory diagnoses accounted for 45.7% (95% CI 37.7 to 53.9) and 44.8% (33.9 to 56.2) of the excess in parent-reported wheezing at ages 3 and 5 years, respectively. Conclusion Parent-reported wheezing is more prevalent than GP-recorded asthma diagnoses in the preschool years, and this difference diminishes in primary school-aged children. Further research is needed to evaluate the implications of these differences for the characterisation of longitudinal childhood asthma phenotypes from EHRs.
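
Cohen's kappa, used above to assess agreement, compares the observed agreement between two sources with the agreement expected by chance. The sketch below is an unweighted illustration under stated assumptions: it ignores the survey weights the study applied, the function name is hypothetical, and the ten paired observations are invented.

```python
def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two equal-length lists of labels."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Proportion of pairs on which the two sources agree.
    observed = sum(x == y for x, y in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from each source's marginal frequencies.
    labels = set(ratings_a) | set(ratings_b)
    expected = sum(
        (ratings_a.count(label) / n) * (ratings_b.count(label) / n)
        for label in labels
    )
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Invented data: 1 = condition reported/recorded, 0 = not.
parent_reported_wheeze = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
gp_recorded_asthma = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
print(round(cohens_kappa(parent_reported_wheeze, gp_recorded_asthma), 3))
```

A kappa of 1 indicates perfect agreement and 0 indicates agreement no better than chance.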

17 citations


Journal Article
01 Oct 2018-BMJ Open
TL;DR: The effectiveness of home adaptations, both in preventing hospital admissions due to falls for older people and in improving timely discharge, is evaluated to provide evidence for services at the interface between health and social care, informing policies seeking to promote healthy ageing through prudent healthcare and prevention.
Abstract: Introduction This study will evaluate the effectiveness of home adaptations, both in preventing hospital admissions due to falls for older people, and improving timely discharge. Results will provide evidence for services at the interface between health and social care, informing policies seeking to promote healthy ageing through prudent healthcare and fall prevention. Methods and analysis All individuals living in Wales, UK, aged 60 years and over, will be included in the study using anonymised linked data from the Secure Anonymised Information Linkage Databank. We will use a national database of home modifications implemented by the charity organisation Care & Repair Cymru (C&R) from 2009 to 2017 to define an intervention cohort. We will use the electronic Frailty Index to assign individual levels of frailty (fit, mild, moderate or severe) and use these to create a comparator group (non-C&R) of people who have not received a C&R intervention. Coprimary outcomes will be quarterly numbers of emergency hospital admissions attributed to falls at home, and the associated length of stay. Secondary outcomes include the time in moving to a care home following a fall, and the indicative financial costs of care for individuals who had a fall. We will use appropriate multilevel generalised linear models to analyse the number of hospital admissions related to falls. We will use Cox proportional hazard models to compare the length of stay for fall-related hospital admissions and the time in moving to a care home between the C&R and non-C&R cohorts. We will assess the impact per frailty group, correct for population migration and adjust for confounding variables. Indicative costs will be calculated using financial codes for individual-level hospital stays. Results will provide evidence for services at the interface between health and social care, informing policies seeking to promote healthy ageing through prudent healthcare and prevention. 
Ethics and dissemination Information governance requirements for the use of record-linked data have been approved and only anonymised data will be used in our analysis. Our results will be submitted for publication in peer-reviewed journals. We will also work with lay members and the knowledge transfer team at Swansea University to create communication and dissemination materials on key findings.

12 citations


Journal Article
TL;DR: This study fills the gap in the evidence base around environmental planning policy to shape living environments to benefit health and contribute to international work on impacts of the built environment on mental health and wellbeing.
Abstract: Introduction Green-blue spaces (GBS), such as parks, woodlands, and beaches, may be beneficial for population mental health and wellbeing. However, there are few longitudinal studies on the association between GBS and mental health and wellbeing, and few that incorporate network analysis as opposed to simple Euclidean proximity. Objectives and Approach We are examining the association between the availability of GBS with wellbeing and common mental health disorders. We will use geographic information systems (GIS) to create quarterly household level GBS availability data using digital map and satellite data (2008-2018) for over 1 million homes in Wales, United Kingdom. We will link GBS availability to individual level mental health (1.7 million people with General Practitioner (GP) data) and data from the National Survey for Wales (n = 24,000) on wellbeing (Warwick Edinburgh Mental Wellbeing Scale (WEMWBS)) using the Secure Anonymised Information Linkage (SAIL) databank. Results We created an historic dataset of GBS availability using road network and path data to create quarterly household level GBS exposures (2008-2018). We tested Residential Anonymised Linking Fields (RALFs) and accurately linked 97% of individuals and their health data to their home and GBS exposure. The 1.65 million exposure-health data pairs, updated quarterly, will enable a longitudinal panel study to be built. Using GP recorded data on treatments, diagnoses, symptoms and prescriptions for mental health problems we identified 35,000 people who had a common mental health disorder in 2016, and 24,000 people answered the National Survey for Wales questions about their wellbeing and use of GBS. We will explore how house moves and visits to GBS change the association between GBS availability and outcomes. Conclusion/Implications This study fills the gap in the evidence base around environmental planning policy to shape living environments to benefit health. It will inform the planning and management of GBS in urban and rural environments and contribute to international work on impacts of the built environment on mental health and wellbeing.

8 citations


Journal Article
TL;DR: This project, a collaboration between the Farr Institute in Wales and Scotland focused on the risk of the new non-vitamin K Target Specific Oral Anticoagulants, used a fusion of two of the five approaches considered.
Abstract: Introduction Due to various regulatory barriers, it is increasingly difficult to move pseudonymised routine health data across platforms and among jurisdictions. To tackle this challenge, we summarized five approaches considered to support a scientific research project focused on the risk of the new non-vitamin K Target Specific Oral Anticoagulants (TSOACs), a collaboration between the Farr Institute in Wales and Scotland. Approach In Wales, routinely collected health records held in the Secure Anonymised Information Linkage (SAIL) Databank were used to identify the study cohort. In Scotland, data was extracted from national dataset resources administered by the eData Research & Innovation Service (eDRIS) and stored in the Scottish National Data Safe Haven. We adopted a federated data and multiple analysts approach, but arranged simultaneous access for Welsh and Scottish analysts to generate study cohorts separately by implementing the same algorithm. Our study cohort across the two countries was boosted to 6,829 patients for risk analysis. Source datasets and data types applied to generate cohorts were reviewed and compared by analysts based at both sites to ensure consistent and harmonised output. Discussion This project used a fusion of two of the five approaches considered. The approach we adopted is a simple yet efficient and cost-effective method to ensure consistency in analysis and coherence with multiple governance systems. It has limitations, as well as potential for extension and scaling. It can also be considered a first step towards an infrastructure to support a distributed team science approach to research using Electronic Health Records (EHRs) across the UK and more widely. Keywords Team science, cross-jurisdictional data linkage, electronic health records

5 citations


Journal Article
TL;DR: This study will explore whether asthma and seasonal allergic rhinitis, when exacerbated by acute exposure to air pollution, is associated with educational attainment, as a proxy for cognition.
Abstract: Introduction There is a lack of evidence on the adverse effects of air pollution on cognition for people with air quality-related health conditions. We propose that educational attainment, as a proxy for cognition, may increase with improved air quality. This study will explore whether asthma and seasonal allergic rhinitis, when exacerbated by acute exposure to air pollution, is associated with educational attainment. Objectives To describe the preparation of individual and household-level linked environmental and health data for analysis within an anonymised safe haven. Also to introduce our statistical analysis plan for our study: COgnition, Respiratory Tract illness and Effects of eXposure (CORTEX). Methods We imported daily air pollution and aeroallergen data, and individual level education data into the SAIL databank, an anonymised safe haven for person-based records. We linked individual-level education, socioeconomic and health data to air quality data for home and school locations, creating tailored exposures for individuals across a city. We developed daily exposure data for all pupils in repeated cross sectional exam cohorts (2009-2015). Conclusion We have used the SAIL databank, an innovative, data safe haven to create individual-level exposures to air pollution and pollen for multiple daily home and school locations. The analysis platform will allow us to evaluate retrospectively the impact of air quality on attainment for multiple cross-sectional cohorts of pupils. Our methods will allow us to distinguish between the pollution impacts on educational attainment for pupils with and without respiratory health conditions. The results from this study will further our understanding of the effects of air quality and respiratory-related health conditions on cognition. 
Highlights This city-wide study includes longitudinal routinely-recorded educational attainment data for all pupils taking exams over seven years; High spatial resolution air pollution data were linked within a privacy protected databank to obtain individual exposure at multiple daily locations; This study will use health data linked at the individual level to explore associations between air pollution, related morbidity, and educational attainment.

4 citations


Journal Article
TL;DR: The results demonstrate that NLP techniques can be used to accurately extract rich phenotypic details from clinic letters that are often missing from routinely-collected data, in addition to potential applicability to other disease areas.
Abstract: Introduction Electronic health records (EHR) are a powerful resource in enabling large-scale healthcare research. EHRs often lack detailed disease-specific information that is collected in free text within clinical settings. This challenge can be addressed by using Natural Language Processing (NLP) to derive and extract detailed clinical information from free text. Objectives and Approach Using a training sample of 40 letters, we used the General Architecture for Text Engineering (GATE) framework to build custom rule sets for nine categories of epilepsy information as well as clinic date and date of birth. We used a validation set of 200 clinic letters to compare the results of our algorithm to a separate manual review by a clinician, where we evaluated a “per item” and a “per letter” approach for each category. Results The “per letter” approach identified 1,939 items of information with overall precision, recall and F1-score of 92.7%, 77.7% and 85.6%. Precision and recall for epilepsy specific categories were: diagnosis (85.3%, 92.4%), type (93.7%, 83.2%), focal seizure (99.0%, 68.3%), generalised seizure (92.5%, 57.0%), seizure frequency (92.0%, 52.3%), medication (96.1%, 94.0%), CT (66.7%, 47.1%), MRI (96.6%, 51.4%) and EEG (95.8%, 40.6%). By combining all items per category, per letter we were able to achieve higher precision, recall and F1-scores of 94.6%, 84.2% and 89.0% across all categories. Conclusion/Implications Our results demonstrate that NLP techniques can be used to accurately extract rich phenotypic details from clinic letters that are often missing from routinely-collected data. Capturing these new data types provides a platform for conducting novel precision neurology research, in addition to potential applicability to other disease areas.
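
Precision, recall and F1-score, as used above to compare the algorithm with the clinician's manual review, can all be derived from counts of true positives, false positives and false negatives. This is a minimal sketch of the standard definitions. The function name is hypothetical and the counts are invented, not taken from the study.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Return (precision, recall, F1) from confusion counts."""
    extracted = true_positives + false_positives  # items the algorithm output
    relevant = true_positives + false_negatives   # items the reviewer found
    precision = true_positives / extracted if extracted else 0.0
    recall = true_positives / relevant if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented counts for one category of extracted information.
p, r, f1 = precision_recall_f1(true_positives=90, false_positives=10, false_negatives=30)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

High precision with lower recall, the pattern reported for several categories above, means the extracted items are mostly correct but some items present in the letters are missed.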

4 citations


Journal Article
TL;DR: Factors associated with changing general practice in early life and continuity of participation in the Secure Anonymised Information Linkage (SAIL) databank, to which approximately 80% of Welsh practices contribute, are investigated.
Abstract: Introduction Primary care electronic health records (pcEHRs) are a valuable resource for life course research, however loss to follow up due to changing practices has received little attention. We investigated factors associated with changes in registration and record continuity in the Secure Anonymised Information Linkage (SAIL) databank, with ~80% practice coverage. Objectives and Approach We analysed linked pcEHRs for 1834 (882 girls) Millennium Cohort Study (MCS) participants, resident in Wales and with parental consent to health record linkage at the age seven MCS interview. We studied time from first to next general practice (GP) registration in Wales by fitting Cox proportional hazards models, and estimated mutually-adjusted hazard ratios (aHRs) for the following factors: child (sex, ethnicity, mode of delivery, gestation, birthweight, neonatal illness, wheeze, longstanding illness); maternal (age, education, lone parent status); household (income, housing tenure, residential mobility, urban/rural residence); GP type (SAIL-contributing/-non-contributing). Analyses were weighted for survey design (Stata: Release 15; StataCorp LP). Results There were 3065 Welsh GP registrations for 1834 children. By age 5 years, 25% of children changed GP at least once, with 1070 (58.3%), 477 (26.0%) and 287 (15.7%) registered with 1, 2, 3+ GPs respectively up to 14 years of age. Children with older mothers (aHRs; 95% CI: 0.96; 0.95, 0.98; per year) or those residing in rural areas (0.75; 0.56, 0.99) were less likely, and those whose first registration was not with a SAIL contributing GP (2.16; 1.60, 2.93), whose mothers had no educational qualifications (1.40; 1.15, 1.71), or had recently changed address (1.62; 1.21, 2.16) more likely, to change GP. 305 (16.6%) children had never registered with a SAIL-contributing GP. Of 403 children initially registered with a SAIL contributing GP who then changed GP, 66.7% re-registered with a SAIL contributing GP. Conclusion/Implications Geographically contiguous primary care databanks, such as the SAIL databank, enable a high proportion of children to be reliably followed over time despite changing GP. Similar analyses of databases based on geographically disparate volunteer GPs are needed to quality assure their suitability for life course epidemiology research.
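
A hazard ratio reported per unit of a continuous covariate, such as the 0.96 per year of maternal age above, compounds multiplicatively across larger differences because a Cox model is linear on the log-hazard scale. The sketch below only illustrates that arithmetic. It assumes the log-linear relationship holds across the range considered, the function name is hypothetical, and the ten-year comparison is an example that is not reported in the abstract.

```python
import math


def hazard_ratio_for_difference(per_unit_hr, units):
    """Scale a per-unit hazard ratio to a difference of `units`."""
    # Effects add on the log-hazard scale, so the ratio is raised to a power.
    return math.exp(math.log(per_unit_hr) * units)


# Example only: compare mothers ten years apart in age, using the reported
# per-year aHR of 0.96 and its confidence limits of 0.95 and 0.98.
for label, per_year in [("estimate", 0.96), ("lower", 0.95), ("upper", 0.98)]:
    print(label, round(hazard_ratio_for_difference(per_year, 10), 3))
```

Read this way, a per-year ratio close to 1 can still correspond to a noticeable difference in the hazard of changing GP between mothers many years apart in age.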

3 citations


Journal Article
TL;DR: The generation of comparable walkability indicators for the built environment has allowed subsequent analysis into hospital admissions for people living with T2D in Canada and Wales, and highlights the challenges in creating internationally comparable environmental exposure metrics.
Abstract: Background The impact of the built environment on health is a widely studied international area of research. One area of research is how urban morphology (e.g. active living environments, also known as neighbourhood walkability) may promote healthy behaviour within a population. However, urban morphology and data relating to the built environment vary across different countries. Objectives One of the challenges in international studies is producing consistent, comparable measures of the built environment, in this case active living environments. As part of a study which compares the impact of neighbourhood environments on health outcomes for patients with type 2 diabetes (T2D), neighbourhood-level measures for walkable environments were derived for Canada and Wales using Geographic Information Systems (GIS). Methods Using a method based upon the Canadian Active Living Environments Database (Can-ALE) we created walkability indicators for Wales, UK. We created GIS models using OpenStreetMap and Office for National Statistics (ONS) Open Data to produce walkability metrics for each Lower Layer Super Output Area (LSOA) in Wales for linkage into the SAIL databank. We compared the GIS generated walkability metrics for Wales with those produced for Canada to evaluate whether the GIS methods are internationally transferable in the context of generating walkability indicators and associations with T2D. Findings This work highlights the challenges in creating internationally comparable environmental exposure metrics. The differences in urban morphology and scale in Canada and Wales are significant; however, this work demonstrates how, with considered methodological choices, these differences can be overcome to generate comparable built environment indicators. Conclusions The generation of comparable walkability indicators for the built environment has allowed subsequent analysis into hospital admissions for people living with T2D in Canada and Wales. This study has wider implications for international research into the impacts of the built environment on population health, and its methods are reproducible in future studies.

3 citations


Journal Article
TL;DR: Combining and harmonising data from multiple sources and linking them to information from a longitudinal cohort create useful resources for population health research and can be utilised by other researchers and projects.
Abstract: Background Harmonisation of different data sources from various electronic health records (EHRs) across systems enhances the potential scope and granularity of data available to health data research. Objective To describe data harmonisation of routine electronic healthcare records in Wales and Scotland linked to a UK longitudinal birth cohort, the Millennium Cohort Study (MCS). Methods Comparable secondary care data was linked, with parental consent, to MCS information for 1838 and 1431 children participating in MCS and residing in Wales and Scotland, by assigning, respectively, unique Anonymised Linkage Fields to person-based records in the privacy protecting Secure Anonymised Information Linkage (SAIL) databank at Swansea University, and by the National Health Service (NHS) Information Standards Division. Survey and non-response weights were created to account for the clustered sample, sample attrition and consent to linkage. Heterogeneous variables from the Patient Episode Dataset for Wales, Emergency Department Data Set for Wales, Scottish Medical Record 01 and Accident and Emergency dataset for Scotland were harmonised enabling data to be pooled and standardised for research. Findings Overall linkage to harmonised health care data was achieved for 98.9% (99.9% for Wales and 97.6% for Scotland) of consented MCS participants. 66% of children experienced at least one hospital admission (total 5747 hospital admissions) up to their 14th birthday, while 60% attended A&E departments at least once (total 5221 attendances) between their 9th and 14th birthday. We managed date granularity by generating random dates of birth, standardising periods of data collection, identifying inconsistencies and then mapping and bridging differences in definitions of periods of care across countries and datasets. Conclusions Combining and harmonising data from multiple sources and linking them to information from a longitudinal cohort create useful resources for population health research. These methods are reproducible and can be utilised by other researchers and projects.

Journal Article
TL;DR: This work validates and refines the eFI, which is a particularly useful resource as it uses existing primary care data to identify frailty, meaning no additional resources are required.
Abstract: Introduction Aging populations with increasing frailty have major implications for health services internationally, and evidence-based treatment becomes increasingly important. The development of an electronic Frailty Index (eFI) using routine primary care data facilitates implementation of evidence-based interventions. However, the eFI does not account for time restrictions regarding when information was recorded. Objectives and Approach Our aim is to implement and further validate the eFI using the Secure Anonymised Information Linkage (SAIL) databank, introducing refinements based on time restrictions. Our implementation of the eFI identifies frailty based on 1574 Read codes, which are mapped amongst 36 categories known as deficits. The eFI is based on the internationally established cumulative deficit model, and each deficit contributes equally to the eFI value. However, although each deficit is equally weighted, only one of them is currently time dependent. We therefore analyse the time at which each deficit is identified, and propose time dependent cut-points based on our findings. Results We were able to successfully implement the eFI using data from over 400,000 individuals from the Welsh population using data held in the SAIL databank. Our results agree with the baseline characteristics and distributions of frailty found in the original development of the eFI. We also found that the percentage of individuals identified as frail increased as the number of years of records included was increased. Furthermore, the increase in percentage year by year was almost linear for a number of the deficits. This led to the identification of time bounds for particular deficits, which could help to refine future implementations of the eFI. Conclusion/Implications Our work validates and refines the eFI, which is a particularly useful resource as it uses existing primary care data to identify frailty, meaning no additional resources are required. Furthermore, our implementation is readily available, meaning that future research related to frailty is easily reproducible and achievable by others.
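
Under the cumulative deficit model described above, an eFI value is the number of deficits present divided by the 36 deficit categories, with each deficit weighted equally. The sketch below illustrates that calculation together with an optional look-back window, as one possible reading of a time restriction. It is not the authors' implementation. The function names, the single `lookback_years` parameter and the example records are hypothetical. The category cut-points (0.12, 0.24, 0.36) are assumed from the original eFI publication and are not stated in this abstract.

```python
from datetime import date

TOTAL_DEFICITS = 36  # number of deficit categories stated in the abstract


def efi_score(deficit_dates, index_date, lookback_years=None):
    """Share of the 36 deficits present on or before index_date.

    deficit_dates maps a deficit name to the date it was first recorded.
    lookback_years is a hypothetical single time bound applied to all deficits.
    """
    present = 0
    for first_recorded in deficit_dates.values():
        if first_recorded > index_date:
            continue  # recorded after the index date, so not yet present
        if lookback_years is not None:
            years_ago = (index_date - first_recorded).days / 365.25
            if years_ago > lookback_years:
                continue  # falls outside the look-back window
        present += 1
    return present / TOTAL_DEFICITS


def frailty_category(score):
    """Map an eFI value to a category (cut-points are an assumption)."""
    if score <= 0.12:
        return "fit"
    if score <= 0.24:
        return "mild"
    if score <= 0.36:
        return "moderate"
    return "severe"


# Invented record for one person.
records = {
    "falls": date(2015, 1, 1),
    "diabetes": date(2005, 6, 1),
    "hypertension": date(2017, 3, 1),
}
score = efi_score(records, index_date=date(2018, 1, 1))
print(round(score, 3), frailty_category(score))
```

Shortening the look-back window can only lower or preserve the score, which matches the finding above that the share of people identified as frail rises as more years of records are included.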

Proceedings Article
01 Apr 2018-BMJ Open
TL;DR: The feasibility of undertaking a multi-centre cluster randomised trial to evaluate clinical and cost effectiveness of referral of patients attended by emergency ambulance paramedic with low-risk TIA directly to specialist TIA clinic for early review is assessed.
Abstract: Aim Early specialist assessment of Transient Ischaemic Attack (TIA) can reduce the risk of stroke and death. We assessed the feasibility of undertaking a multi-centre cluster randomised trial to evaluate clinical and cost effectiveness of referral of patients attended by emergency ambulance paramedic with low-risk TIA directly to specialist TIA clinic for early review. Method We randomly allocated volunteer paramedics to intervention or control group. Intervention paramedics were trained to deliver the intervention during the patient recruitment period. Control paramedics continued to deliver care as usual. Patients with TIA were identified from hospital records. Results Development and recruitment phases are complete, with outcome follow up ongoing. Eighty-nine of 134 (66%) paramedics participated in TIER. Of 1377 patients attended by trial paramedics during the patient recruitment period, 53 (3.8%) were identified as eligible for trial inclusion. Three of 36 (8%) patients attended by intervention paramedics were referred to the TIA clinic. Of the others, only one appeared to be a missed referral; in one case there was no prehospital record of TIA; one was attended by a paramedic who was not TIER trained; one patient record was missing; all others were recorded with contraindications: FAST positive (n=13); ABCD2 score >3 (n=5); already taking warfarin (n=2); crescendo TIA (n=1); other clinical factors (n=8). Conclusion Preliminary results indicate challenges in recruitment and low referral rates. Further analyses will focus on whether progression criteria for a definitive trial were met, and clinical outcomes from this feasibility trial. Conflict of interest None Funding None

Journal Article
Sophie Wood, Sarah Rees, Ting Wang, Amanda Marchant, Ashley Akbari, Ann John
TL;DR: Routine data has the potential to make a difference to care; however, collection and access need to be standardised in order to improve efficiency and effectiveness in improving the care for children and young people with mental health disorders.
Abstract: The diagnosis, management and services available for mental disorders are of growing concern and controversy in the UK. Transitional care between child and adult services and the interface between primary and secondary/specialist services are often disjointed, and thresholds for referral to Child and Adolescent Mental Health Services are high. Objectives and Approach Routinely collected healthcare datasets and data linkage were used to identify patterns of healthcare utilisation by young people and young adults with mental health disorders across the four UK Nations. We explored the extent to which routinely collected datasets can contribute to an assessment of the health needs and the quality of care that children and young people with mental health disorders receive. Data was requested from the national data providers in each country. A series of descriptive analyses were performed and methods were developed for cross-national comparisons to be made (e.g. Four Nation Person Spell). Results It is feasible to explore healthcare utilisation across the four countries of the UK using routine data. However, the recording, availability and access varied considerably between countries, making meaningful comparisons challenging. Descriptive analyses showed strong deprivation gradients in the diagnoses and care provided for young people and young adults with mental health disorders. Depression and anxiety were the most commonly recorded mental health conditions in primary care. In secondary care, drug/alcohol disorders and self-harm were the most commonly recorded. Re-admissions to emergency departments were higher for those admitted for self-harm or psychiatric conditions. Conclusion/Implications Routine data has the potential to make a difference to care. However, collection and access need to be standardised in order to improve efficiency and effectiveness in improving the care for children and young people with mental health disorders. MQ has funded an Adolescent Data Platform to facilitate this.

Journal ArticleDOI
TL;DR: Data harmonisation for a UK longitudinal birth cohort, the Millennium Cohort Study (MCS), which was linked to routine inpatient and emergency department, and, where available, general practice and child health records for 1838 Welsh and 1431 Scottish consenting MCS participants is described.
Abstract: Introduction: Harmonization of different data sources from various electronic health records across systems enhances the potential scope and granularity of data available to health data research, providing more opportunities for research by improving the generalizability and effective sample size of a range of outcome metrics. Objectives and Approach: This study describes data harmonisation for a UK longitudinal birth cohort, the Millennium Cohort Study (MCS), which was linked to routine inpatient and emergency department, and, where available, general practice and child health records for 1838 Welsh and 1431 Scottish consenting MCS participants. Datasets requiring harmonisation were: from Wales, Patient Episode Dataset for Wales (PEDW) and Emergency Department Data Set (EDDS) data; and from Scotland, Scottish Medical Record 01 (SMR01) and Accident and Emergency dataset (A&E2). Heterogeneous variables were created by transforming variable names, concepts and codes to improve scope for analysis. Results: A harmonized dataset of 2166 participants and 5747 hospital admissions was derived of cohort members who had at least 1 hospital inpatient or AE standardising periods of data collection; identifying inconsistencies and then mapping and bridging differences in definitions of periods of care and levels of diagnostic and operational coding across countries and datasets. Conclusion/Implications: Heterogeneous variables from different data sources were pooled and converted into standardised data for research, extending existing harmonisation work, including curation of a population-based anonymously linkable longitudinal cohort. These methods are reproducible and can be utilised by other researchers and projects applying to use these routine data sources.

Journal ArticleDOI
TL;DR: Using individual-level multi-location daily exposure assessment will help to clarify the role of traffic and prevent potential community-level confounding, and treatment seeking behaviour may explain the positive association between SAR and educational attainment.
Abstract: Background: There is a lack of evidence on the adverse effects that air quality has on cognition for people with air quality-related health conditions; these effects are not widely documented in the literature. Educational attainment, as a proxy for cognition, may increase with improved air quality. Objectives: Prepare individual and household level linked environmental and health data for analysis within an anonymised safe haven; analyse the linked dataset for our study investigating: Cognition, Respiratory Tract illness and Effects of eXposure (CORTEX). Methods: Anonymised, routinely collected health and education data were linked with high spatial resolution pollution measurements and daily pollen measurements to provide repeated cross-sectional cohorts (2009-2015) on 18,241 pupils across the city of Cardiff, using the SAIL databank. A fully adjusted multilevel linear regression analysis examined associations between health status and/or air quality. Cohort, school and individual level confounders were controlled for. We hope that using individual-level multi-location daily exposure assessment will help to clarify the role of traffic and prevent potential community-level confounding. Combined effects of air quality on variation in educational attainment between those treated for asthma and/or Severe Allergic Rhinitis (SAR), and those not treated, were also investigated. Findings: Asthma was not associated with exam performance (p=0.7). However, SAR was positively associated with exam performance (p<0.001). Exposure to air pollution was negatively associated with educational attainment regardless of health status. Conclusions: Irrespective of health status, air quality was negatively associated with educational attainment. Treatment seeking behaviour may explain the positive association between SAR and educational attainment. For a more accurate reflection of health status, health outcomes not subject to treatment seeking behaviours, such as emergency hospital admission, should be investigated.