Showing papers by "Xihong Lin" published in 2022


Journal ArticleDOI
TL;DR: The results imply that rare variants, in particular those in regions of low linkage disequilibrium, are a major source of the still missing heritability of complex traits and disease.

72 citations


Journal ArticleDOI
TL;DR: It is suggested that in NSCLC, a high number of nonsynonymous tumor mutations is associated with immune cell infiltration and inflammatory T-cell expression signatures, leading to increased sensitivity to PD-1/PD-L1 inhibition across PD-L1 expression subgroups.
Abstract: Key Points. Question: Is tumor mutation burden (TMB) associated with improved outcomes of programmed cell death–1 (PD-1)/programmed death ligand–1 (PD-L1) inhibition across PD-L1 expression levels in non–small cell lung cancer (NSCLC)? Findings: In this cohort study of 1552 patients with NSCLC, the group with high TMB had improved response rates and survival after receiving PD-1/PD-L1 inhibition therapy across PD-L1 expression subgroups compared with the group with low TMB. High TMB levels were associated with increased CD8-positive T-cell infiltration and distinct immune response gene expression signatures. Meaning: These findings suggest that in NSCLC, a high number of nonsynonymous tumor mutations is associated with immune cell infiltration and inflammatory T-cell expression signatures, leading to increased sensitivity to PD-1/PD-L1 inhibition across PD-L1 expression subgroups.

57 citations


Journal ArticleDOI
TL;DR: The Functional Annotation of Variants Online Resources (FAVOR) is developed, a comprehensive online multi-faceted portal with summarization and visualization of all possible 9 billion single nucleotide variants across the genome, and allows for rapid variant-, gene-, and region-level online queries.
Abstract: Large-scale whole genome sequencing (WGS) studies and biobanks are rapidly generating a multitude of coding and non-coding variants. They provide an unprecedented resource for illuminating the genetic basis of human diseases. Variant functional annotations play a critical role in WGS analysis, result interpretation, and prioritization of disease- or trait-associated causal variants. Existing functional annotation databases have limited scope to perform online queries or are unable to functionally annotate the genotype data of large WGS studies and biobanks for downstream analysis. We develop the Functional Annotation of Variants Online Resources (FAVOR) to meet these pressing needs. FAVOR provides a comprehensive online multi-faceted portal with summarization and visualization of all possible 9 billion single nucleotide variants (SNVs) across the genome, and allows for rapid variant-, gene-, and region-level online queries. It integrates variant functional information from multiple sources to describe the functional characteristics of variants and facilitates prioritizing plausible causal variants influencing human phenotypes. Furthermore, a scalable annotation tool, FAVORannotator, is provided for functionally annotating and efficiently storing the genotype and variant functional annotation data of a large-scale sequencing study in an annotated GDS file format to facilitate downstream analysis. FAVOR and FAVORannotator are available at https://favor.genohub.org.

15 citations


Journal ArticleDOI
TL;DR: This paper performed cross-ancestry genome-wide association studies in European, East Asian and African populations to identify new susceptibility loci to lung cancer among diverse populations, and discovered five loci that have not been previously reported.
Abstract: To identify new susceptibility loci to lung cancer among diverse populations, we performed cross-ancestry genome-wide association studies in European, East Asian and African populations and discovered five loci that have not been previously reported. We replicated 26 signals and identified 10 new lead associations from previously reported loci. Rare-variant associations tended to be specific to populations, but even common-variant associations influencing smoking behavior, such as those with CHRNA5 and CYP2A6, showed population specificity. Fine-mapping and expression quantitative trait locus colocalization nominated several candidate variants and susceptibility genes such as IRF4 and FUBP1. DNA damage assays of prioritized genes in lung fibroblasts indicated that a subset of these genes, including the pleiotropic gene IRF4, potentially exert effects by promoting endogenous DNA damage. A cross-ancestry genome-wide association meta-analysis of lung cancer including 61,047 cases and 947,237 controls identifies five new cross-ancestry susceptibility loci and highlights ancestry-specific effects of common and rare variants on lung cancer risk.

13 citations


Journal ArticleDOI
TL;DR: This paper proposes STAARpipeline, a computationally efficient and robust noncoding rare-variant association detection framework that automatically annotates a whole-genome sequencing study and performs flexible noncoding RV association analysis, including gene-centric analysis and fixed window-based and dynamic window-based non-gene-centric analysis, by incorporating variant functional annotations.
Abstract: Large-scale whole-genome sequencing studies have enabled analysis of noncoding rare-variant (RV) associations with complex human diseases and traits. Variant-set analysis is a powerful approach to study RV association. However, existing methods have limited ability in analyzing the noncoding genome. We propose a computationally efficient and robust noncoding RV association detection framework, STAARpipeline, to automatically annotate a whole-genome sequencing study and perform flexible noncoding RV association analysis, including gene-centric analysis and fixed window-based and dynamic window-based non-gene-centric analysis by incorporating variant functional annotations. In gene-centric analysis, STAARpipeline uses STAAR to group noncoding variants based on functional categories of genes and incorporate multiple functional annotations. In non-gene-centric analysis, STAARpipeline uses SCANG-STAAR to incorporate dynamic window sizes and multiple functional annotations. We apply STAARpipeline to identify noncoding RV sets associated with four lipid traits in 21,015 discovery samples from the Trans-Omics for Precision Medicine (TOPMed) program and replicate several of them in an additional 9,123 TOPMed samples. We also analyze five non-lipid TOPMed traits. STAARpipeline is a comprehensive framework for flexible and scalable rare-variant association analysis using whole-genome sequencing data and annotation information.

13 citations


Journal ArticleDOI
TL;DR: This article proposes MACIE, an unsupervised multivariate mixed-model framework that integrates annotations of diverse origin to assess multi-dimensional functional roles for both coding and non-coding variants.
Abstract: Attempts to identify and prioritize functional DNA elements in coding and non-coding regions, particularly through use of in silico functional annotation data, continue to increase in popularity. However, specific functional roles can vary widely from one variant to another, making it challenging to summarize different aspects of variant function with a one-dimensional rating. Here we propose multi-dimensional annotation-class integrative estimation (MACIE), an unsupervised multivariate mixed-model framework capable of integrating annotations of diverse origin to assess multi-dimensional functional roles for both coding and non-coding variants. Unlike existing one-dimensional scoring methods, MACIE views variant functionality as a composite attribute encompassing multiple characteristics and estimates the joint posterior functional probabilities of each genomic position. This estimate offers more comprehensive and interpretable information in the presence of multiple aspects of functionality. Applied to a variety of independent coding and non-coding datasets, MACIE demonstrates powerful and robust performance in discriminating between functional and non-functional variants. We also show an application of MACIE to fine-mapping and heritability enrichment analysis by using the lipids GWAS summary statistics data from the European Network for Genetic and Genomic Epidemiology Consortium.

11 citations


Journal ArticleDOI
TL;DR: The authors performed a whole genome association study of 2,291 metabolite peaks (known and unknown features) in 2,466 Black individuals from the Jackson Heart Study and identified 519 locus-metabolite associations for 427 metabolites and validated their findings in two multi-ethnic cohorts.
Abstract: Integrating genetic information with metabolomics has provided new insights into genes affecting human metabolism. However, gene-metabolite integration has been primarily studied in individuals of European Ancestry, limiting the opportunity to leverage genomic diversity for discovery. In addition, these analyses have principally involved known metabolites, with the majority of the profiled peaks left unannotated. Here, we perform a whole genome association study of 2,291 metabolite peaks (known and unknown features) in 2,466 Black individuals from the Jackson Heart Study. We identify 519 locus-metabolite associations for 427 metabolite peaks and validate our findings in two multi-ethnic cohorts. A significant proportion of these associations are in ancestry specific alleles including findings in APOE, TTR and CD36. We leverage tandem mass spectrometry to annotate unknown metabolites, providing new insight into hereditary diseases including transthyretin amyloidosis and sickle cell disease. Our integrative omics approach leverages genomic diversity to provide novel insights into diverse cardiometabolic diseases.

9 citations


Journal ArticleDOI
TL;DR: This paper presents MetaSTAAR, a powerful and resource-efficient rare variant meta-analysis framework for large-scale whole genome sequencing/whole exome sequencing (WGS/WES) studies.
Abstract: Meta-analysis of whole genome sequencing/whole exome sequencing (WGS/WES) studies provides an attractive solution to the problem of collecting large sample sizes for discovering rare variants associated with complex phenotypes. Existing rare variant meta-analysis approaches are not scalable to biobank-scale WGS data. Here we present MetaSTAAR, a powerful and resource-efficient rare variant meta-analysis framework for large-scale WGS/WES studies. MetaSTAAR accounts for relatedness and population structure, can analyze both quantitative and dichotomous traits and boosts the power of rare variant tests by incorporating multiple variant functional annotations. Through meta-analysis of four lipid traits in 30,138 ancestrally diverse samples from 14 studies of the Trans Omics for Precision Medicine (TOPMed) Program, we show that MetaSTAAR performs rare variant meta-analysis at scale and produces results comparable to using pooled data. Additionally, we identified several conditionally significant rare variant associations with lipid traits. We further demonstrate that MetaSTAAR is scalable to biobank-scale cohorts through meta-analysis of TOPMed WGS data and UK Biobank WES data of ~200,000 samples.

4 citations


Journal ArticleDOI
01 Feb 2022
TL;DR: The authors developed a method, ancestry-specific allele frequency estimation in admixed populations (AFA), to estimate the frequencies of biallelic variants in admixed populations with an unlimited number of ancestries.
Abstract: Allele frequency estimates in admixed populations, such as Hispanics and Latinos, rely on the sample's specific admixture composition and thus may differ between two seemingly similar populations. However, ancestry-specific allele frequencies, i.e., pertaining to the ancestral populations of an admixed group, may be particularly useful for prioritizing genetic variants for genetic discovery and personalized genomic health. We developed a method, ancestry-specific allele frequency estimation in admixed populations (AFA), to estimate the frequencies of biallelic variants in admixed populations with an unlimited number of ancestries. AFA uses maximum-likelihood estimation by modeling the conditional probability of having an allele given proportions of genetic ancestries. It can be applied using either local ancestry interval proportions encompassing the variant (local-ancestry-specific allele frequency estimations in admixed populations [LAFAs]) or global proportions of genetic ancestries (global-ancestry-specific allele frequency estimations in admixed populations [GAFAs]), which are easier to compute and are more widely available. Simulations and comparisons to existing software demonstrated the high accuracy of the method. We implemented AFA on high-quality imputed data of ∼9,000 Hispanics and Latinos from the Hispanic Community Health Study/Study of Latinos (HCHS/SOL), an understudied, admixed population with three predominant continental ancestries: Amerindian, European, and African. Comparison of the European and African estimated frequencies to the respective gnomAD frequencies demonstrated high correlations (Pearson R² = 0.97-0.99). We provide a genome-wide dataset of the estimated ancestry-specific allele frequencies for available variants with allele frequency between 5% and 95% in at least one of the three ancestral populations. Association analysis of Amerindian-enriched variants with cardiometabolic traits identified five loci associated with lipid traits in Hispanics and Latinos, demonstrating the utility of ancestry-specific allele frequencies in admixed populations.
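The maximum-likelihood idea behind the global-proportion variant (GAFA) can be sketched in a few lines. This is a minimal illustration and not the authors' implementation. It assumes that each allele copy carried by individual i is the alternate allele with probability Σ_k q_ik f_k, where q_ik are the individual's global ancestry proportions and f_k are the ancestry-specific frequencies, and it maximizes the resulting likelihood with a simple EM iteration. The EM scheme, the function name `estimate_ancestry_freqs`, and its parameters are assumptions made for illustration only.

```python
def estimate_ancestry_freqs(genotypes, proportions, n_iter=500, eps=1e-9):
    """EM estimate of ancestry-specific alternate-allele frequencies.

    genotypes:   alternate-allele counts (0, 1 or 2), one per individual
    proportions: per-individual ancestry proportions (each row sums to 1)
    Hypothetical helper for illustration; not the published AFA software.
    """
    n_anc = len(proportions[0])
    freqs = [0.5] * n_anc  # neutral starting point
    for _ in range(n_iter):
        alt = [0.0] * n_anc    # expected alt copies attributed to each ancestry
        total = [0.0] * n_anc  # expected copies (alt + ref) per ancestry
        for g, q in zip(genotypes, proportions):
            # Marginal probability that one allele copy is alt / ref.
            p_alt = sum(qk * fk for qk, fk in zip(q, freqs))
            p_ref = 1.0 - p_alt
            for k in range(n_anc):
                # E-step: posterior share of this individual's alt and ref
                # copies that originate from ancestry k.
                w_alt = g * q[k] * freqs[k] / p_alt if g > 0 else 0.0
                w_ref = (2 - g) * q[k] * (1.0 - freqs[k]) / p_ref if g < 2 else 0.0
                alt[k] += w_alt
                total[k] += w_alt + w_ref
        # M-step: weighted alt fraction, clamped away from 0 and 1.
        freqs = [min(1.0 - eps, max(eps, a / t)) if t > 0 else f
                 for a, t, f in zip(alt, total, freqs)]
    return freqs


# Toy data: four individuals of ancestry 0 and four of ancestry 1.
toy_genotypes = [2, 2, 1, 1, 0, 0, 1, 0]
toy_proportions = [[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 4
toy_freqs = estimate_ancestry_freqs(toy_genotypes, toy_proportions, n_iter=20)
```

With unadmixed individuals, as in the toy data, the estimates reduce to the per-group sample frequencies (6/8 and 1/8 here). With admixed individuals the same iteration apportions each allele copy across ancestries according to the current frequency estimates.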

4 citations


Journal ArticleDOI
TL;DR: Rare variants in Caveolin-1, a membrane scaffolding protein essential in multiple cellular and metabolic functions, are associated with higher CAV1 gene expression and lower OSA severity, suggesting a novel target for modulating OSA severity.
Abstract: INTRODUCTION: Obstructive sleep apnea (OSA) is a common disorder associated with increased risk for cardiovascular disease, diabetes, and premature mortality. There is strong clinical and epidemiologic evidence supporting the importance of genetic factors influencing OSA, but limited data implicating specific genes. METHODS: Leveraging high depth genomic sequencing data from the National Heart, Lung, and Blood Institute (NHLBI) Trans-Omics for Precision Medicine (TOPMed) program and imputed genotype data from multiple population-based studies, we performed linkage analysis in the Cleveland Family Study (CFS) followed by multi-stage gene-based association analyses in independent cohorts to search for rare variants contributing to OSA severity as assessed by the apnea-hypopnea index (AHI) in a total of 7,708 individuals of European ancestry. RESULTS: Linkage analysis in CFS identified a suggestive linkage peak on chromosome 7q31 (LOD = 2.31). Gene-based analysis identified 21 non-coding rare variants in Caveolin-1 (CAV1) associated with lower AHI after accounting for multiple comparisons (p = 7.4×10⁻⁸). These non-coding variants together significantly contributed to the linkage evidence (p < 10⁻³). Follow-up analysis revealed significant associations between these variants and increased CAV1 expression, and increased CAV1 expression in peripheral monocytes was associated with lower AHI (p = 0.024) and higher minimum overnight oxygen saturation (p = 0.007). CONCLUSION: Rare variants in CAV1, a membrane scaffolding protein essential in multiple cellular and metabolic functions, are associated with higher CAV1 gene expression and lower OSA severity, suggesting a novel target for modulating OSA severity.

3 citations


Journal ArticleDOI
Daniel DiCorpo, Sheila M. Gaynor, Emily M. Russell, Kenneth Westerman, Laura M. Raffield, Timothy D. Majarian, Peitao Wu, Chloé Sarnowski, Heather M. Highland, Anne U. Jackson, Natalie R Hasbani, Paul S. de Vries, Jennifer A. Brody, Bertha Hidalgo, Xiuqing Guo, James A. Perry, Jeffrey R. O'Connell, Samantha Lent, May E. Montasser, Brian E. Cade, Deepti Jain, Heming Wang, Ricardo D’Oliveira Albanus, Arushi Varshney, Lisa R. Yanek, Leslie A. Lange, Nicholette D. Palmer, Marcio Almeida, Juan M. Peralta, Stella Aslibekyan, Abigail S. Baldridge, Alain G. Bertoni, Lawrence F. Bielak, Chung-Shiuan Chen, Yii-Der Ida Chen, Won Jung Choi, Mark O. Goodarzi, James S. Floyd, Marguerite R. Irvin, Rita Kalyani, Tanika N. Kelly, Seonwook Lee, Ching-Ti Liu, Douglas Loesch, JoAnn E. Manson, Ryan L. Minster, Take Naseri, James S. Pankow, Laura J. Rasmussen-Torvik, Alexander P. Reiner, Muagututi‘a Sefuiva Reupena, Elizabeth Selvin, Jennifer A. Smith, Daniel E. Weeks, Huichun Xu, Jie Yao, Wei Zhao, Stephen C. J. Parker, Álvaro Alonso, Donna K. Arnett, John Blangero, Eric Boerwinkle, Adolfo Correa, L. Adrienne Cupples, Joanne E. Curran, Ravindranath Duggirala, Jiang He, Susan R. Heckbert, Sharon L.R. Kardia, Ryan W. Kim, Charles Kooperberg, Simin Liu, Rasika A. Mathias, Stephen T. McGarvey, Braxton D. Mitchell, Alanna C. Morrison, Patricia A. Peyser, Bruce M. Psaty, Susan Redline, Alan R. Shuldiner, Kent D. Taylor, Ramachandran S. Vasan, Karine A. Viaud-Martinez, JC Florez, James F. Wilson, Robert Sladek, Stephen S. Rich, Jerome I. Rotter, Xihong Lin, Josée Dupuis, James B. Meigs, Jennifer Wessel, Alisa K. Manning 
28 Jul 2022
TL;DR: The genetic determinants of fasting glucose (FG) and fasting insulin (FI) have been studied mostly through genome arrays, resulting in over 100 associated variants; this study extends that work with high-coverage whole genome sequencing analyses from fifteen cohorts in NHLBI's Trans-Omics for Precision Medicine (TOPMed) program.
Abstract: The genetic determinants of fasting glucose (FG) and fasting insulin (FI) have been studied mostly through genome arrays, resulting in over 100 associated variants. We extended this work with high-coverage whole genome sequencing analyses from fifteen cohorts in NHLBI's Trans-Omics for Precision Medicine (TOPMed) program. Over 23,000 non-diabetic individuals from five race-ethnicities/populations (African, Asian, European, Hispanic and Samoan) were included. Eight variants were significantly associated with FG or FI across previously identified regions MTNR1B, G6PC2, GCK, GCKR and FOXA2. We additionally characterize suggestive associations with FG or FI near previously identified SLC30A8, TCF7L2, and ADCY5 regions as well as APOB, PTPRT, and ROBO1. Functional annotation resources including the Diabetes Epigenome Atlas were compiled for each signal (chromatin states, annotation principal components, and others) to elucidate variant-to-function hypotheses. We provide a catalog of nucleotide-resolution genomic variation spanning intergenic and intronic regions creating a foundation for future sequencing-based investigations of glycemic traits.

Journal ArticleDOI
TL;DR: This work describes COVID-19 Spread Mapper, a unified framework for estimating and quantifying the uncertainty in the smoothed daily effective reproduction number, case rate, and death rate in a region using log-linear models; these metrics are critical for evaluating the impact of COVID-19 and making informed policy decisions.
Abstract: SUMMARY: Amidst the continuing spread of COVID-19, real-time data analysis and visualization remain critical for the general public to track the pandemic's impact and to inform policy making by officials. Multiple metrics permit the evaluation of the spread, infection, and mortality of infectious diseases. For example, numbers of new cases and deaths provide easily interpretable measures of absolute impact within a given population and time frame, while the effective reproduction rate provides an epidemiological measure of the rate of spread. By evaluating multiple metrics concurrently, users can leverage complementary insights into the impact and current state of the pandemic when formulating prevention and safety plans for themselves and others. We describe COVID-19 Spread Mapper, a unified framework for estimating and quantifying the uncertainty in the smoothed daily effective reproduction number, case rate, and death rate in a region using log-linear models. We apply this framework to characterize COVID-19 impact at multiple geographic resolutions, including by US county and state as well as by country, demonstrating the variation across resolutions and the need for harmonized efforts to control the pandemic. We provide an open-source online dashboard for real-time analysis and visualization of multiple key metrics, which are critical to evaluate the impact of COVID-19 and make informed policy decisions. AVAILABILITY AND IMPLEMENTATION: Our model and tool are publicly available as implemented in R and hosted at https://metrics.covid19-analysis.org/. The source code is freely available from https://github.com/lin-lab/COVID19-Rt and https://github.com/lin-lab/COVID19-Viz. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Journal ArticleDOI
TL;DR: This work proposes Surrogate Phenotype Regression Analysis (Spray) for leveraging information from a correlated surrogate outcome to improve inference on a partially missing target outcome and describes and implements an expectation conditional maximization algorithm for performing estimation in the presence of bilateral outcome missingness.
Abstract: Sample sizes vary substantially across tissues in the Genotype-Tissue Expression (GTEx) project, where considerably fewer samples are available from certain inaccessible tissues, such as the substantia nigra (SSN), than from accessible tissues, such as blood. This severely limits power for identifying tissue-specific expression quantitative trait loci (eQTL) in undersampled tissues. Here we propose Surrogate Phenotype Regression Analysis (Spray) for leveraging information from a correlated surrogate outcome (e.g. expression in blood) to improve inference on a partially missing target outcome (e.g. expression in SSN). Rather than regarding the surrogate outcome as a proxy for the target outcome, Spray jointly models the target and surrogate outcomes within a bivariate regression framework. Unobserved values of either outcome are treated as missing data. We describe and implement an expectation conditional maximization algorithm for performing estimation in the presence of bilateral outcome missingness. Spray estimates the same association parameter estimated by standard eQTL mapping and controls the type I error even when the target and surrogate outcomes are truly uncorrelated. We demonstrate analytically and empirically, using simulations and GTEx data, that in comparison with marginally modeling the target outcome, jointly modeling the target and surrogate outcomes increases estimation precision and improves power.

Journal ArticleDOI
TL;DR: The authors constructed twenty-nine tissue-specific eQTL networks using GTEx data and evaluated a comprehensive set of network specifications based on false discovery rates, test statistics, and p values, focusing on the degree centrality.
Abstract: Expression quantitative trait locus (eQTL) analysis associates SNPs with gene expression; these relationships can be represented as a bipartite network with association strength as “edge weights” between SNPs and genes. However, most eQTL networks use binary edge weights based on thresholded FDR estimates: definitions that influence reproducibility and downstream analyses. We constructed twenty-nine tissue-specific eQTL networks using GTEx data and evaluated a comprehensive set of network specifications based on false discovery rates, test statistics, and p values, focusing on the degree centrality—a metric of an SNP or gene node’s potential network influence. We found a thresholded Benjamini-Hochberg q value weighted by the Z-statistic balances metric reproducibility and computational efficiency. Our estimated gene degrees positively correlate with gene degrees in gene regulatory networks, demonstrating that these networks are complementary in understanding regulation. Gene degrees also correlate with genetic diversity, and heritability analyses show that highly connected nodes are enriched for tissue-relevant traits.
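One way to read "a thresholded Benjamini-Hochberg q value weighted by the Z-statistic" is sketched below. This is an interpretation for illustration, not the authors' code. It assumes an edge is kept when its BH q value is at most a threshold, that a kept edge carries weight |Z|, and that a node's degree is the sum of the weights of its kept edges. The function names, the default threshold of 0.05, and the conversion of Z-statistics to two-sided normal p-values are all assumptions.

```python
import math
from collections import defaultdict


def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (q values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def weighted_degrees(edges, q_threshold=0.05):
    """Weighted node degrees in a bipartite SNP-gene network.

    edges: list of (snp, gene, z) tuples, z being the association Z-statistic.
    An edge is kept if its BH q value is at most q_threshold; a kept edge
    adds |z| to the degree of both of its endpoints.
    """
    # Two-sided normal p-value for each Z-statistic.
    pvalues = [math.erfc(abs(z) / math.sqrt(2)) for _, _, z in edges]
    qvalues = bh_qvalues(pvalues)
    degree = defaultdict(float)
    for (snp, gene, z), q in zip(edges, qvalues):
        if q <= q_threshold:
            degree[snp] += abs(z)
            degree[gene] += abs(z)
    return dict(degree)


# Hypothetical SNP and gene identifiers, for illustration only.
example_edges = [("rs1", "G1", 3.0), ("rs2", "G1", -2.0),
                 ("rs1", "G2", 2.2), ("rs3", "G2", 0.7)]
example_degrees = weighted_degrees(example_edges, q_threshold=0.1)
```

Nodes with no surviving edge are absent from the result. Replacing `abs(z)` with `1.0` gives the binary edge weights that the abstract contrasts with the weighted specification.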

Journal ArticleDOI
TL;DR: The authors propose an efficient and accurate frailty model approach for genome-wide survival association analysis of censored time-to-event (TTE) phenotypes that accounts for both population structure and relatedness.
Abstract: With decades of electronic health records linked to genetic data, large biobanks provide unprecedented opportunities for systematically understanding the genetics of the natural history of complex diseases. Genome-wide survival association analysis can identify genetic variants associated with ages of onset, disease progression and lifespan. We propose an efficient and accurate frailty model approach for genome-wide survival association analysis of censored time-to-event (TTE) phenotypes by accounting for both population structure and relatedness. Our method utilizes state-of-the-art optimization strategies to reduce the computational cost. The saddlepoint approximation is used to allow for analysis of heavily censored phenotypes (>90%) and low frequency variants (down to minor allele count 20). We demonstrate the performance of our method through extensive simulation studies and analysis of five TTE phenotypes, including lifespan, with heavy censoring rates (90.9% to 99.8%) on ~400,000 UK Biobank participants with white British ancestry and ~180,000 individuals in FinnGen. We further analyzed 871 TTE phenotypes in the UK Biobank and presented the genome-wide scale phenome-wide association results with the PheWeb browser.

Posted ContentDOI
14 Dec 2022-bioRxiv
TL;DR: This paper introduces SynSurr, an approach that jointly analyzes an incompletely observed target phenotype together with its predicted value from an ML model, which is referred to as a synthetic surrogate for the target phenotype.
Abstract: While population biobanks have dramatically expanded opportunities for genome-wide association studies (GWAS), these large-scale analyses bring new statistical challenges. A key bottleneck is that phenotypes of interest are often partially missing. For example, phenotypes derived from specialized imaging modalities are often only measured for a subset of the cohort. Fortunately, biobanks contain surrogate phenotype information, in the form of routinely collected clinical data, that can often be leveraged to build machine learning (ML) models that accurately predict missing values of the target phenotype. However, simply imputing the missing values of the target phenotype can invalidate subsequent statistical inference. To address this significant barrier, we introduce SynSurr, an approach that jointly analyzes an incompletely observed target phenotype together with its predicted value from an ML model. As the ML model can combine or synthesize multiple sources of evidence to infer the missing phenotypic values, we refer to its prediction as a “synthetic surrogate” for the target phenotype. SynSurr estimates the same effect size as a standard GWAS of the target phenotype, but does so with increased power when the synthetic surrogate is correlated with the target phenotype. Unlike classical imputation, SynSurr does not require that the synthetic surrogate is obtained from a correctly specified generative model, only that it is correlated with the target outcome. SynSurr is also computationally feasible at biobank scale and has been implemented in the open source R package SurrogateRegression. In a genome-wide ablation analysis of 2 well-studied traits from the UK Biobank (UKBB), SynSurr consistently recovered more of the associations present in the full sample than standard GWAS using the observed target phenotypes. When applied to 6 incompletely measured body composition phenotypes from the UKBB, SynSurr identified 15.6 times as many genome-wide significant associations as standard GWAS, on average, and did so at 2.9 times the level of significance. These associations were highly enriched for biologically relevant gene sets and overlapped substantially with known body composition associations from the GWAS catalog.

Journal ArticleDOI
01 Feb 2022-BMJ Open
TL;DR: Simple risk scores using age, sex, a complete blood cell count, CRP and D-dimer were highly predictive of AKI and death and can help simplify and better inform clinical decision making.
Abstract: Objective: To develop simple but clinically informative risk stratification tools using a few top demographic factors and biomarkers at COVID-19 diagnosis to predict acute kidney injury (AKI) and death. Design: Retrospective cohort analysis, follow-up from 1 February through 28 May 2020. Setting: 3 teaching hospitals, 2 urban and 1 community-based in the Boston area. Participants: Eligible patients were at least 18 years old, tested COVID-19 positive from 1 February through 28 May 2020, and had at least two serum creatinine measurements within 30 days of a new COVID-19 diagnosis. Exclusion criteria were having chronic kidney disease or having a previous AKI within 3 months of a new COVID-19 diagnosis. Main outcomes and measures: Time from new COVID-19 diagnosis until AKI event, time until death event. Results: Among 3716 patients, there were 1855 (49.9%) males and the average age was 58.6 years (SD 19.2 years). Age, sex, white blood cell, haemoglobin, platelet, C reactive protein (CRP) and D-dimer levels were most strongly associated with AKI and/or death. We created risk scores using these variables predicting AKI within 3 days and death within 30 days of a new COVID-19 diagnosis. Area under the curve (AUC) for predicting AKI within 3 days was 0.785 (95% CI 0.758 to 0.813) and AUC for death within 30 days was 0.861 (95% CI 0.843 to 0.878). Haemoglobin was the most predictive component for AKI, and age the most predictive for death. Predictive accuracies using all study variables were similar to using the simplified scores. Conclusion: Simple risk scores using age, sex, a complete blood cell count, CRP and D-dimer were highly predictive of AKI and death and can help simplify and better inform clinical decision making.

Journal ArticleDOI
TL;DR: This work describes how correlation structures generate marked differences in relative operating characteristics for settings (a) and (b) and develops novel power bounds that facilitate the aforementioned calculations and allow for analysis of individual testing settings.
Abstract: Set-based association tests are widely popular in genetic association settings for their ability to aggregate weak signals and reduce multiple testing burdens. In particular, a class of set-based tests including the Higher Criticism, Berk-Jones, and other statistics have recently been popularized for reaching a so-called detection boundary when signals are rare and weak. Such tests have been applied in two subtly different settings: (a) associating a genetic variant set with a single phenotype and (b) associating a single genetic variant with a phenotype set. A significant issue in practice is the choice of test, especially when deciding between innovated and generalized type methods for detection boundary tests. Conflicting guidance is present in the literature. This work describes how correlation structures generate marked differences in relative operating characteristics for settings (a) and (b). The implications for study design are significant. We also develop novel power bounds that facilitate the aforementioned calculations and allow for analysis of individual testing settings. In more concrete terms, our investigation is motivated by translational expression quantitative trait loci (eQTL) studies in lung cancer. These studies involve both testing for groups of variants associated with a single gene expression (multiple explanatory factors) and testing whether a single variant is associated with a group of gene expressions (multiple outcomes). Results are supported by a collection of simulation studies and illustrated through lung cancer eQTL examples.
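For readers unfamiliar with the Higher Criticism statistic named in this abstract, the following sketch computes its classical form from a set of p-values. It assumes independent p-values and the usual standardized maximum over the smallest ordered p-values. The innovated and generalized variants compared in the paper, which account for correlation, are not shown. The function name and the `max_fraction` parameter are assumptions chosen for illustration.

```python
import math


def higher_criticism(pvalues, max_fraction=0.5):
    """Classical Higher Criticism statistic for independent p-values.

    For ordered p-values p_(1) <= ... <= p_(n), returns the maximum over
    i <= max_fraction * n of
        sqrt(n) * (i/n - p_(i)) / sqrt(p_(i) * (1 - p_(i))).
    """
    n = len(pvalues)
    ordered = sorted(pvalues)
    last = max(1, int(max_fraction * n))  # always inspect at least p_(1)
    best = float("-inf")
    for i in range(1, last + 1):
        p = ordered[i - 1]
        if p <= 0.0 or p >= 1.0:
            continue  # the standardization is undefined at 0 and 1
        stat = math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p))
        best = max(best, stat)
    return best


# One very small p-value among otherwise unremarkable ones.
example_hc = higher_criticism([0.5, 0.001, 0.9])
```

A large value indicates that the smallest p-values are smaller than expected under the global null, which is why the statistic suits sparse, weak signals. The same function applies whether the p-values come from many variants tested against one phenotype or one variant tested against many phenotypes, although, as the abstract notes, the correlation structure differs between those two settings.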

Journal ArticleDOI
Yu Xie, Xihong Lin, Jun Li, Qian He, Junming Huang 
TL;DR: This study reveals widespread fear among scientists of Chinese descent in the United States arising from conducting routine research and academic activities, and warns that, if this fear is not alleviated, there are significant risks of underutilizing scientific talent and of losing it to China and other countries.
Abstract: Significance: Our study reveals the widespread fear among scientists of Chinese descent in the United States arising from conducting routine research and academic activities. If this fear is not alleviated, there are significant risks of underutilization of scientific talent as well as losing scientific talent to China and other countries. Addressing the fear of US-based scientists of Chinese descent and making the American academic environment welcoming and attractive to all will help retain and attract scientific talent and strengthen the US global leadership in science and technology in the long run.