Application of t-SNE to human genetic data.
Citations
A Survey on Explainable Artificial Intelligence (XAI): Toward Medical XAI
The art of using t-SNE for single-cell transcriptomics.
Revealing the vectors of cellular identity with single-cell genomics
UMAP reveals cryptic population structure and phenotype heterogeneity in large genomic cohorts.
Rise of Deep Learning for Genomic, Proteomic, and Metabolomic Data Integration in Precision Medicine
References
Visualizing Data using t-SNE
Inference of population structure using multilocus genotype data
PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses
A global reference for human genetic variation.
Principal components analysis corrects for stratification in genome-wide association studies
Related Papers (5)
UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction
Nonlinear dimensionality reduction by locally linear embedding.
Scikit-learn: Machine Learning in Python
Frequently Asked Questions (11)
Q2. What is the purpose of imposing extra constraints in NPCA?
Extra constraints are imposed in NPCA because positive and negative terms in classic PCA may cancel each other, leading to a loss of local features.
Q3. Why do the authors want to use t-SNE?
The authors are particularly interested in t-SNE’s claimed ability to “reveal structure at many different scales” [28], as major population stratification co-exists with smaller-scale shared evolutionary history among samples.
Q4. What is the main feature of t-SNE?
The main feature of t-SNE, the ability to exhibit structures at multiple scales, is responsible for two observations in this paper.
Q5. How many individuals were used from the phase 3 1000 Genomes Project?
The phase 3 1000 Genomes Project data for 2504 individuals was downloaded from http://www.internationalgenome.org/data.
Genetic distance between samples: KING (Kinship-based INference for Genome-wide association studies), a computationally efficient program to calculate the person-person distance based on genetic data, was used (http://people.virginia.edu/~wc9c/KING/) [52].
t-SNE: The R (http://www.r-project.org/) implementation of t-SNE, Rtsne version 0.11 (June 30, 2016) (https://cran.r-project.org/web/packages/Rtsne/ or https://github.com/jkrijthe/Rtsne), was used.
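To make the idea of a person-person distance concrete, the following is a minimal sketch of a pairwise distance matrix computed from genotype dosages. It is not KING's kinship-based estimator: the mean absolute dosage difference used here is a simple stand-in chosen for illustration, and the function name `pairwise_genotype_distance` and the toy genotypes are hypothetical.

```python
from itertools import combinations


def pairwise_genotype_distance(genotypes):
    """Return a symmetric matrix of mean absolute dosage differences.

    genotypes: one list per sample; each entry is 0, 1, or 2
    (alternate-allele count) or None for a missing call.
    NOTE: an illustrative stand-in, not the estimator KING uses.
    """
    n = len(genotypes)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        # Compare only SNPs that are called in both samples.
        diffs = [
            abs(a - b)
            for a, b in zip(genotypes[i], genotypes[j])
            if a is not None and b is not None
        ]
        d = sum(diffs) / len(diffs) if diffs else float("nan")
        dist[i][j] = dist[j][i] = d
    return dist


# Toy data (hypothetical): 3 samples, 4 SNPs, one missing call.
toy = [
    [0, 1, 2, 0],
    [0, 1, 2, 1],
    [2, 1, 0, None],
]
matrix = pairwise_genotype_distance(toy)
```

A matrix of this shape is the kind of input that a t-SNE implementation accepting precomputed distances would take in place of raw features.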
Q6. Why is t-SNE used in genetic analysis?
t-SNE has demonstrated its built-in advantages in capturing local data characteristics and revealing subtle data structures in visualization, as shown in the original publication [28]. t-SNE is a popular choice in the analysis of single-cell RNA-seq data (e.g. [29, 30, 31]), but has not been applied extensively to genetic data.
Q7. What is the purpose of the study?
For that purpose, the authors extract 99 Utah residents with North/West European ancestry (CEU), 91 England/Scotland samples (GBR), 5 African-Americans in the Southwest (ASW), 5 Mexican-Americans in Los Angeles (MXL), 2 Chinese in Beijing (CHB), and 2 Chinese in South China (CHS).
Q8. How many SNPs are in the 1000 Genomes Project?
For that, the authors extract a denser set of SNPs from chromosome 1 in the 1000 Genomes Project with the criteria of alternative/minor allele frequency > 0.2, and spacing between neighboring SNPs > 20,000 bases.
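The two selection criteria can be sketched as a greedy left-to-right filter. This is only an illustration under stated assumptions: the source does not say how the authors implemented the filter, the frequency is folded to a minor allele frequency here, and the function name `thin_snps` and the toy positions are hypothetical.

```python
def thin_snps(snps, min_maf=0.2, min_spacing=20_000):
    """Keep SNPs with MAF > min_maf that lie more than min_spacing
    bases after the previously kept SNP.

    snps: iterable of (position_bp, alt_allele_freq) tuples
    from a single chromosome.
    """
    kept = []
    last_pos = None
    for pos, freq in sorted(snps):
        # Assumption: fold the alternate-allele frequency to a minor one.
        maf = min(freq, 1.0 - freq)
        if maf <= min_maf:
            continue
        if last_pos is not None and pos - last_pos <= min_spacing:
            continue
        kept.append((pos, freq))
        last_pos = pos
    return kept


# Toy input (hypothetical positions and frequencies).
toy_snps = [
    (10_000, 0.05),
    (15_000, 0.30),
    (30_000, 0.45),
    (36_000, 0.85),
    (60_000, 0.25),
    (61_000, 0.50),
]
selected = thin_snps(toy_snps)
```

A greedy pass keeps the first qualifying SNP and then skips ahead, so the result depends on where the scan starts; other thinning strategies would satisfy the same two criteria with a different subset.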
Q9. What is the computational complexity of t-SNE?
The computational time complexity of standard PCA/SVD is $O(\min(Np^2 + p^3,\ pN^2 + N^3))$, where N is the number of samples and p the number of factors [48].
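As a rough illustration of how this bound behaves, the sketch below evaluates both terms inside the minimum for a given N and p. It counts abstract operations only, ignores constant factors, and says nothing about actual run time. The function name `pca_svd_cost` and the value p = 100,000 are hypothetical; N = 2504 is the sample count quoted from the 1000 Genomes data.

```python
def pca_svd_cost(n_samples, n_features):
    """Evaluate min(N*p^2 + p^3, p*N^2 + N^3) as an operation count."""
    n, p = n_samples, n_features
    cost_feature_side = n * p**2 + p**3  # N*p^2 + p^3
    cost_sample_side = p * n**2 + n**3   # p*N^2 + N^3
    return min(cost_feature_side, cost_sample_side)


# N from the source; p is a hypothetical number of SNPs.
cost = pca_svd_cost(2504, 100_000)
```

When p is much larger than N, as is typical for genotype data, the sample-side term is the smaller one, so the cost grows linearly in p for a fixed number of samples.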
Q10. Why do the American samples not form their own cluster?
The American samples are the only ones that do not form their own cluster, owing to the well-known admixture between indigenous American populations and European colonists.
Q11. What is the main effect of t-SNE?
Another observation, that both continental and sub-continental population structures can be viewed in a single t-SNE plot, is also a consequence of this property of t-SNE.