Showing papers by "James J. Lee" published in 2014


Journal ArticleDOI
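TL;DR: PLINK 1.9, the first major release from a second-generation codebase for the open-source C/C++ toolset for genome-wide association studies (GWAS) and population genetics, uses bit-level parallelism and faster exact tests to accelerate most operations by 1-4 orders of magnitude.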
Abstract: PLINK 1 is a widely used open-source C/C++ toolset for genome-wide association studies (GWAS) and research in population genetics. However, the steady accumulation of data from imputation and whole-genome sequencing studies has exposed a strong need for even faster and more scalable implementations of key functions. In addition, GWAS and population-genetic data now frequently contain probabilistic calls, phase information, and/or multiallelic variants, none of which can be represented by PLINK 1's primary data format. To address these issues, we are developing a second-generation codebase for PLINK. The first major release from this codebase, PLINK 1.9, introduces extensive use of bit-level parallelism, O(sqrt(n))-time/constant-space Hardy-Weinberg equilibrium and Fisher's exact tests, and many other algorithmic improvements. In combination, these changes accelerate most operations by 1-4 orders of magnitude, and allow the program to handle datasets too large to fit in RAM. This will be followed by PLINK 2.0, which will introduce (a) a new data format capable of efficiently representing probabilities, phase, and multiallelic variants, and (b) extensions of many functions to account for the new types of information. The second-generation versions of PLINK will offer dramatic improvements in performance and compatibility. For the first time, users without access to high-end computing resources can perform several essential analyses of the feature-rich and very large genetic datasets coming into use.

3,513 citations
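
The "bit-level parallelism" mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a hypothetical 2-bit genotype code (0, 1, 2 = alternate allele count, 3 = missing) chosen for this example only; it is not PLINK's on-disk encoding or its actual implementation. Many genotypes are packed into one integer, and all of them are then counted with a few masks and population counts instead of a per-sample loop.

```python
def pack(genotypes):
    """Pack genotype codes (0, 1, 2, or 3 = missing) into one integer, 2 bits each.

    The code assignment is an assumption made for this sketch.
    """
    word = 0
    for i, g in enumerate(genotypes):
        word |= (g & 0b11) << (2 * i)
    return word


def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def count_genotypes(word, n):
    """Count (hom_ref, het, hom_alt, missing) among n packed genotypes."""
    low_mask = (4 ** n - 1) // 3        # binary 0101...01, one bit per 2-bit field
    lo = word & low_mask                # low bit of every field
    hi = (word >> 1) & low_mask         # high bit of every field
    n_het = popcount(lo & ~hi)          # fields equal to 01
    n_hom_alt = popcount(hi & ~lo)      # fields equal to 10
    n_missing = popcount(lo & hi)       # fields equal to 11
    n_hom_ref = n - n_het - n_hom_alt - n_missing
    return n_hom_ref, n_het, n_hom_alt, n_missing


genotypes = [0, 1, 2, 3, 1, 1, 2, 0]
counts = count_genotypes(pack(genotypes), len(genotypes))
print(counts)
```

In a compiled language the same idea would operate on fixed-width machine words with a hardware popcount instruction; the arbitrary-precision Python integer here only stands in for that.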


23 Sep 2014
TL;DR: This work identifies several common genetic variants associated with cognitive performance using a two-stage approach: a genome-wide association study of educational attainment generates a set of candidate SNPs, whose association with cognitive performance is then estimated in independent samples.
Abstract: We identify common genetic variants associated with cognitive performance using a two-stage approach, which we call the proxy-phenotype method. First, we conduct a genome-wide association study of educational attainment in a large sample (n = 106,736), which produces a set of 69 education-associated SNPs. Second, using independent samples (n = 24,189), we measure the association of these education-associated SNPs with cognitive performance. Three SNPs (rs1487441, rs7923609, and rs2721173) are significantly associated with cognitive performance after correction for multiple hypothesis testing. In an independent sample of older Americans (n = 8,652), we also show that a polygenic score derived from the education-associated SNPs is associated with memory and absence of dementia. Convergent evidence from a set of bioinformatics analyses implicates four specific genes (KCNMA1, NRXN1, POU2F3, and SCRT). All of these genes are associated with a particular neurotransmitter pathway involved in synaptic plasticity, the main cellular mechanism for learning and memory.

237 citations
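
Two quantities in the abstract above lend themselves to a short sketch: a polygenic score, which is a weighted sum of a person's allele counts, and a multiple-testing threshold for the 69 candidate SNPs. The weights and allele counts below are made-up illustration values, and the Bonferroni rule is an assumption, since the abstract does not say which correction was applied.

```python
def polygenic_score(allele_counts, weights):
    """Weighted sum of allele counts (0, 1, or 2 per SNP) for one individual."""
    if len(allele_counts) != len(weights):
        raise ValueError("need one weight per SNP")
    return sum(w * x for w, x in zip(weights, allele_counts))


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level under a Bonferroni correction (assumed rule)."""
    return alpha / n_tests


# Hypothetical per-SNP weights and one person's allele counts.
weights = [0.5, -0.2, 0.1]
allele_counts = [0, 1, 2]
print(polygenic_score(allele_counts, weights))

# 69 education-associated SNPs tested against cognitive performance.
print(bonferroni_threshold(0.05, 69))
```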


Journal ArticleDOI
TL;DR: This article identifies common genetic variants associated with cognitive performance using a two-stage approach, which the authors call the proxy-phenotype method: SNPs associated with educational attainment are tested for association with cognitive performance in independent samples.
Abstract: We identify common genetic variants associated with cognitive performance using a two-stage approach, which we call the proxy-phenotype method. First, we conduct a genome-wide association study of educational attainment in a large sample (n = 106,736), which produces a set of 69 education-associated SNPs. Second, using independent samples (n = 24,189), we measure the association of these education-associated SNPs with cognitive performance. Three SNPs (rs1487441, rs7923609, and rs2721173) are significantly associated with cognitive performance after correction for multiple hypothesis testing. In an independent sample of older Americans (n = 8,652), we also show that a polygenic score derived from the education-associated SNPs is associated with memory and absence of dementia. Convergent evidence from a set of bioinformatics analyses implicates four specific genes (KCNMA1, NRXN1, POU2F3, and SCRT). All of these genes are associated with a particular neurotransmitter pathway involved in synaptic plasticity, the main cellular mechanism for learning and memory.

229 citations


Journal ArticleDOI
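TL;DR: Using compressed sensing theory, this work shows that all markers with nonzero coefficients can be selected by an efficient algorithm when they are sparse relative to sample size, finds that practical measures of signal recovery are robust to linkage disequilibrium, and applies the approach to the GWAS analysis of height.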
Abstract: The aim of a genome-wide association study (GWAS) is to isolate DNA markers for variants affecting phenotypes of interest. This is constrained by the fact that the number of markers often far exceeds the number of samples. Compressed sensing (CS) is a body of theory regarding signal recovery when the number of predictor variables (i.e., genotyped markers) exceeds the sample size. Its applicability to GWAS has not been investigated. Using CS theory, we show that all markers with nonzero coefficients can be identified (selected) using an efficient algorithm, provided that they are sufficiently few in number (sparse) relative to sample size. For heritability equal to one (\(h^{2} = 1\)), there is a sharp phase transition from poor performance to complete selection as the sample size is increased. For heritability below one, complete selection still occurs, but the transition is smoothed. We find for \(h^{2} \sim 0.5\) that a sample size of approximately thirty times the number of markers with nonzero coefficients is sufficient for full selection. This boundary is only weakly dependent on the number of genotyped markers. Practical measures of signal recovery are robust to linkage disequilibrium between a true causal variant and markers residing in the same genomic region. Given a limited sample size, it is possible to discover a phase transition by increasing the penalization; in this case a subset of the support may be recovered. Applying this approach to the GWAS analysis of height, we show that 70-100% of the selected markers are strongly correlated with height-associated markers identified by the GIANT Consortium.

55 citations
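
The selection problem described in the abstract above, with more markers than samples but only a few nonzero coefficients, can be sketched with L1-penalized regression (the lasso), a standard compressed-sensing estimator. This is an illustrative sketch under stated assumptions, not the authors' code: the abstract does not name the algorithm, the predictors are simulated Gaussian values rather than genotypes, the phenotype is noiseless (the \(h^{2} = 1\) case), and the sizes, penalty `lam`, and 0.5 reporting cutoff are arbitrary choices.

```python
import random


def soft_threshold(z, t):
    """Shrink z toward zero by t (the proximal step for an L1 penalty)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, sweeps=300):
    """Coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)                                  # residual y - X beta
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            # Correlation of column j with the partial residual.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


rng = random.Random(0)
n, p = 40, 60                                        # fewer samples than markers
true_beta = {3: 2.0, 17: -3.0, 42: 1.5}              # sparse support (hypothetical)
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [sum(b * row[j] for j, b in true_beta.items()) for row in X]

beta_hat = lasso_cd(X, y, lam=0.1)
selected = [j for j, b in enumerate(beta_hat) if abs(b) > 0.5]
print(selected)
```

Raising `lam` shrinks more coefficients to zero, which corresponds to the abstract's remark that increasing the penalization can recover a subset of the support when the sample size is limited.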


Journal ArticleDOI
TL;DR: It is shown that the novel method for estimating a lower bound on heritability directly from GWAS data that uses realized genetic similarities between nominally unrelated individuals should remain a robust means of obtaining \(h_{\text{SNP}}^{2}\) under circumstances wider than those under which it has so far been derived.
Abstract: The heritability of a trait (\(h^{2}\)) is the proportion of its population variance caused by genetic differences, and estimates of this parameter are important for interpreting the results of genome-wide association studies (GWAS). In recent years, researchers have adopted a novel method for estimating a lower bound on heritability directly from GWAS data that uses realized genetic similarities between nominally unrelated individuals. The quantity estimated by this method is purported to be the contribution to heritability that could in principle be recovered from association studies employing the given panel of SNPs (\(h_{\text{SNP}}^{2}\)). Thus far, the validity of this approach has mostly been tested empirically. Here, we provide a mathematical explication and show that the method should remain a robust means of obtaining \(h_{\text{SNP}}^{2}\) under circumstances wider than those under which it has so far been derived.

47 citations
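
For orientation, the quantities named in the abstract above are commonly written as a variance-components model. The notation here is an assumption that follows the usual presentation of this class of methods, not the paper's own derivation: \(y\) is the phenotype vector, \(g\) the aggregate effect of the genotyped SNPs, \(e\) the residual, and \(A\) the matrix of realized genetic similarities computed from the SNP panel.

\[
y = g + e, \qquad
\operatorname{Var}(y) = A\,\sigma_{g}^{2} + I\,\sigma_{e}^{2}, \qquad
h_{\text{SNP}}^{2} = \frac{\sigma_{g}^{2}}{\sigma_{g}^{2} + \sigma_{e}^{2}}
\]

Under this model, \(h_{\text{SNP}}^{2}\) is estimated from how strongly phenotypic similarity tracks the off-diagonal entries of \(A\) among nominally unrelated individuals.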


Journal ArticleDOI
TL;DR: There is evidence of anti-tumor activity with V comparable to that of other PARP inhibitors in the BRCA+ population, although there was evidence of benefit in a small number of patients.
Abstract: 135 Background: Veliparib (V) (ABT-888) is an oral, potent inhibitor of PARP 1/2. PARP inhibitors have preclinical and clinical efficacy in BRCA+ malignancies. There are genotypic and phenotypic similarities between BRCA+ cancers, serous ovarian cancer, and basal-like breast cancer, and we postulated that these tumor types may be similarly sensitive to single-agent PARP inhibition. This study sought to establish the maximum tolerated dose (MTD), dose-limiting toxicities (DLT), pharmacokinetic and pharmacodynamic properties, and preliminary efficacy of chronically dosed V in 2 cohorts of patients, BRCA+ and BRCA-wt (consisting of serous ovarian cancer and triple-negative breast cancer (TNBC)). Methods: A 3+3 dose escalation phase I trial was performed. Nine dose levels (DL) were planned, and dose escalation started at 50 mg BID to a maximum of 500 mg BID to determine a maximum tolerated dose (MTD) and recommended phase II dose (RP2D). V was administered orally continuously on a 28 day cycle. BRCA+ and BRCA-...

19 citations
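
The "3+3" design named in the abstract above follows a simple decision rule at each dose level. The sketch below encodes the textbook version of that rule as an assumption; the trial's own protocol may differ in details, and the function name and return labels are hypothetical.

```python
def three_plus_three(n_treated, n_dlt):
    """Textbook 3+3 decision at one dose level.

    n_treated: patients evaluated at this dose level (3 or 6)
    n_dlt:     how many of them had a dose-limiting toxicity
    Returns "escalate", "expand" (treat 3 more at the same dose), or
    "stop" (dose exceeds the MTD; the MTD is taken from a lower level).
    """
    if n_dlt < 0 or n_dlt > n_treated:
        raise ValueError("n_dlt must be between 0 and n_treated")
    if n_treated == 3:
        if n_dlt == 0:
            return "escalate"
        if n_dlt == 1:
            return "expand"
        return "stop"
    if n_treated == 6:
        return "escalate" if n_dlt <= 1 else "stop"
    raise ValueError("a 3+3 cohort has 3 or 6 patients")


print(three_plus_three(3, 0))
print(three_plus_three(3, 1))
print(three_plus_three(6, 2))
```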


Posted ContentDOI
03 Mar 2014-bioRxiv
TL;DR: It is shown that the method for estimating a lower bound on heritability directly from GWAS data that uses realized genetic similarities between nominally unrelated individuals should remain a robust means of obtaining $h^2_\textrm{SNP}$ under circumstances wider than those under which it has so far been derived.
Abstract: The heritability of a trait ($h^2$) is the proportion of its population variance caused by genetic differences, and estimates of this parameter are important for interpreting the results of genome-wide association studies (GWAS). In recent years, researchers have adopted a novel method for estimating a lower bound on heritability directly from GWAS data that uses realized genetic similarities between nominally unrelated individuals. The quantity estimated by this method is purported to be the contribution to heritability that could in principle be recovered from association studies employing the given panel of SNPs ($h^2_\textrm{SNP}$). Thus far the validity of this approach has mostly been tested empirically. Here, we provide a mathematical explication and show that the method should remain a robust means of obtaining $h^2_\textrm{SNP}$ under circumstances wider than those under which it has so far been derived.

11 citations