Author
Tianwei Yu
Other affiliations: The Chinese University of Hong Kong, University of Illinois at Chicago, University of California, Los Angeles
Bio: Tianwei Yu is an academic researcher from Emory University. The author has contributed to research in topics: Computer science & Engineering. The author has an h-index of 38 and has co-authored 132 publications receiving 5577 citations. Previous affiliations of Tianwei Yu include The Chinese University of Hong Kong & University of Illinois at Chicago.
Papers
••
TL;DR: The nonlinear K-profiles clustering method is designed, which can be seen as the nonlinear counterpart of the K-means clustering algorithm, and has a built-in statistical testing procedure that ensures genes not belonging to any cluster do not impact the estimation of cluster profiles.
Abstract: With modern technologies such as microarray, deep sequencing, and liquid chromatography-mass spectrometry (LC-MS), it is possible to measure the expression levels of thousands of genes/proteins simultaneously to unravel important biological processes. A very first step towards elucidating hidden patterns and understanding the massive data is the application of clustering techniques. Nonlinear relations, which have mostly gone unutilized in contrast to linear correlations, are prevalent in high-throughput data. In many cases, nonlinear relations can model the biological relationship more precisely and reflect critical patterns in the biological systems. Using the general dependency measure, Distance Based on Conditional Ordered List (DCOL), that we introduced before, we designed the nonlinear K-profiles clustering method, which can be seen as the nonlinear counterpart of the K-means clustering algorithm. The method has a built-in statistical testing procedure that ensures genes not belonging to any cluster do not impact the estimation of cluster profiles. Results from extensive simulation studies showed that K-profiles clustering not only outperformed the traditional linear K-means algorithm, but also presented significantly better performance than our previous General Dependency Hierarchical Clustering (GDHC) algorithm. We further analyzed a gene expression dataset, on which K-profiles clustering generated biologically meaningful results.
1,005 citations
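The K-profiles abstract above relies on DCOL as a general dependency measure. The sketch below shows one common reading of that measure, stated here as an assumption and not taken from this page: sort the (x, y) pairs by x and average the absolute differences between consecutive y values. The function name `dcol` and the synthetic data are hypothetical, and the built-in statistical testing procedure mentioned in the abstract is not reproduced.

```python
import random


def dcol(x, y):
    """Mean absolute difference between consecutive y values after sorting pairs by x.

    Assumed form of the measure: a small value suggests y varies smoothly
    with x (a linear or nonlinear relation); a large value suggests none.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    order = sorted(range(len(x)), key=lambda i: x[i])
    y_sorted = [y[i] for i in order]
    gaps = [abs(b - a) for a, b in zip(y_sorted, y_sorted[1:])]
    return sum(gaps) / len(gaps)


# Synthetic illustration: y depends on x through a parabola, a relation
# that a linear correlation coefficient would largely miss.
rng = random.Random(0)
x = [rng.uniform(-3, 3) for _ in range(500)]
y_related = [xi ** 2 + rng.gauss(0, 0.1) for xi in x]

# Shuffling y keeps its marginal distribution but breaks the dependence.
y_shuffled = y_related[:]
rng.shuffle(y_shuffled)

print("related :", dcol(x, y_related))
print("shuffled:", dcol(x, y_shuffled))
```

Because the measure is conditional on the ordering of x, it is not symmetric: `dcol(x, y)` and `dcol(y, x)` can differ, which is worth keeping in mind when using it as a clustering distance.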
••
TL;DR: It is found that WS contains more informative proteins, peptides, and mRNA, as compared with gland-specific saliva, that can be used in generating candidate biomarkers for the detection of primary SS.
Abstract: Sjogren’s syndrome (SS), which was first described in 1933 by the Swedish physician Henrik Sjogren (1), is a chronic autoimmune disorder clinically characterized by a dry mouth (xerostomia) and dry eyes (keratoconjunctivitis sicca). The disease primarily affects women, with a ratio of 9:1 over the occurrence in men. While SS affects up to 4 million Americans, about half of the cases are primary SS. Primary SS occurs alone, whereas secondary SS presents in connection with another autoimmune disease, such as rheumatoid arthritis or systemic lupus erythematosus (SLE). Histologically, SS is characterized by infiltration of exocrine gland tissues by predominantly CD4 T lymphocytes. At the molecular level, glandular epithelial cells express high levels of HLA–DR, which has led to the speculation that these cells are presenting antigen (viral antigen or autoantigen) to the invading T cells. Cytokine production follows, with interferon (IFN) and interleukin-2 (IL-2) being especially important. There is also evidence of B cell activation with autoantibody production and an increase in B cell malignancy. SS patients exhibit a 40-fold increased risk of developing lymphoma.
SS is a complex disease that can go undiagnosed for several months to years. Although the underlying immune-mediated glandular destruction is thought to develop slowly over several years, a long delay from the start of symptoms to the final diagnosis has been frequently reported. SS presumably involves the interplay of genetic and environmental factors. To date, few of these factors are well understood. As a result, there is a lack of early diagnostic markers, and diagnosis usually lags symptom onset by years. A new international consensus for the diagnosis of SS requires objective signs and symptoms of dryness, including a characteristic appearance of a biopsy sample from a minor or major salivary gland and/or the presence of autoantibody such as anti-SSA (2–4). However, establishing the diagnosis of primary SS has been difficult in light of its nonspecific symptoms (dry eyes and mouth) and the lack of both sensitive and specific biomarkers, either body fluid– or tissue-based, for its detection. It is widely believed that developing molecular biomarkers for the early diagnosis of primary SS will improve the application of systematic therapies and the setting of criteria with which to monitor therapies and assess prognosis (e.g., lymphoma development).
Saliva is the product of 3 pairs of major salivary glands (the parotid, submandibular, and sublingual glands) and multiple minor salivary glands that lie beneath the oral mucosa. Human saliva contains many informative proteins that can be used for the detection of diseases. Saliva is an attractive diagnostic fluid because testing of saliva provides several key advantages, including low cost, noninvasiveness, and easy sample collection and processing. This biologic fluid has been used for the survey of general health and for the diagnosis of diseases in humans, such as human immunodeficiency virus, periodontal diseases, and autoimmune diseases (5–8). Our laboratory is active in the comprehensive analysis of the saliva proteome (for more information, see www.hspp.ucla.edu), thus providing the technologies and expertise to contrast proteomic constituents in primary SS with those in control saliva (9–11). Thus far, we have identified over 1,000 proteins in whole saliva (WS). In addition, we have recently identified and cataloged ~3,000 messenger RNAs (mRNA) in human WS (12). These studies have provided a solid foundation for the discovery of biomarkers in the saliva of patients with primary SS. We have previously demonstrated proteome- and genome-wide approaches to harnessing saliva protein and mRNA signatures for the detection of oral cancer in humans (13,14).
There have been continuous efforts in the search for biomarkers in human serum or saliva for the diagnosis of primary SS. Some gene products were found at elevated levels in SS patient sera or saliva, including β2-microglobulin (β2m), soluble IL-2 receptor, IL-6, anti-Ro/SSA, anti-La/SSB, and anti–α-fodrin autoantibodies (15–20). However, none of them individually is sensitive or specific enough to use for the confirmative diagnosis of SS (15). Therefore, it is crucial to use emerging proteome- and genome-wide approaches to discover a wide spectrum of informative and discriminatory biomarkers that can be combined to improve the sensitivity and specificity for the detection of primary SS.
355 citations
••
TL;DR: This study provided a transcriptomic signature for OTSCC that may lead to a diagnosis or screen tool and provide the foundation for further functional validation of these specific candidate genes for OTSCC.
Abstract: The head and neck/oral squamous cell carcinoma (HNOSCC) is a diverse group of cancers, which develop from many different anatomic sites and are associated with different risk factors and genetic characteristics. The oral tongue squamous cell carcinoma (OTSCC) is one of the most common types of HNOSCC. It is significantly more aggressive than other forms of HNOSCC, in terms of local invasion and spread. In this study, we aim to identify specific transcriptomic signatures that are associated with OTSCC. Genome-wide transcriptomic profiles were obtained for 53 primary OTSCCs and 22 matching normal tissues. Genes that exhibit statistically significant differences in expression between OTSCCs and normal tissues were identified. These include up-regulated genes (MMP1, MMP10, MMP3, MMP12, PTHLH, INHBA, LAMC2, IL8, KRT17, COL1A2, IFI6, ISG15, PLAU, GREM1, MMP9, IFI44, CXCL1), and down-regulated genes (KRT4, MAL, CRNN, SCEL, CRISP3, SPINK5, CLCA4, ADH1B, P11, TGM3, RHCG, PPP1R3C, CEACAM7, HPGD, CFD, ABCA8, CLU, CYP3A5). The expression differences of IL8 and MMP9 were further validated by real-time quantitative RT-PCR and immunohistochemistry. The Gene Ontology analysis suggested a number of altered biological processes in OTSCCs, including enhancements in phosphate transport, collagen catabolism, I-kappaB kinase/NF-kappaB signaling cascade, extracellular matrix organization and biogenesis, and chemotaxis, as well as suppressions of superoxide release, hydrogen peroxide metabolism, cellular response to hydrogen peroxide, keratinization, and keratinocyte differentiation. In summary, our study provided a transcriptomic signature for OTSCC that may lead to a diagnosis or screen tool and provide the foundation for further functional validation of these specific candidate genes for OTSCC.
309 citations
••
TL;DR: A set of algorithms for the processing of high-resolution LC/MS data, including the adaptive tolerance level searching rather than hard cutoff or binning, the use of non-parametric methods to fine-tune intensity grouping, and the model-based estimation of peak intensities for absolute quantification are presented.
Abstract: Motivation: Liquid chromatography-mass spectrometry (LC/MS) profiling is a promising approach for the quantification of metabolites from complex biological samples. Significant challenges exist in the analysis of LC/MS data, including noise reduction, feature identification/quantification, feature alignment and computation efficiency.
Result: Here we present a set of algorithms for the processing of high-resolution LC/MS data. The major technical improvements include the adaptive tolerance level searching rather than hard cutoff or binning, the use of non-parametric methods to fine-tune intensity grouping, the use of run filter to better preserve weak signals and the model-based estimation of peak intensities for absolute quantification. The algorithms are implemented in an R package apLCMS, which can efficiently process large LC/MS datasets.
Availability: The R package apLCMS is available at www.sph.emory.edu/apLCMS.
Contact: tyu8@sph.emory.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
291 citations
••
TL;DR: The xMSanalyzer program was designed to integrate with existing packages such as apLCMS and XCMS, but the framework can also be used to enhance data extraction for other LC/MS data software.
Abstract: Detection of low abundance metabolites is important for de novo mapping of metabolic pathways related to diet, microbiome or environmental exposures. Multiple algorithms are available to extract m/z features from liquid chromatography-mass spectral data in a conservative manner, which tends to preclude detection of low abundance chemicals and chemicals found in small subsets of samples. The present study provides software to enhance such algorithms for feature detection, quality assessment, and annotation. xMSanalyzer is a set of utilities for automated processing of metabolomics data. The utilities can be classified into four main modules to: 1) improve feature detection for replicate analyses by systematic re-extraction with multiple parameter settings and data merger to optimize the balance between sensitivity and reliability, 2) evaluate sample quality and feature consistency, 3) detect feature overlap between datasets, and 4) characterize high-resolution m/z matches to small molecule metabolites and biological pathways using multiple chemical databases. The package was tested with plasma samples and shown to more than double the number of features extracted while improving quantitative reliability of detection. MS/MS analysis of a random subset of peaks that were exclusively detected using xMSanalyzer confirmed that the optimization scheme improves detection of real metabolites. xMSanalyzer is a package of utilities for data extraction, quality control assessment, detection of overlapping and unique metabolites in multiple datasets, and batch annotation of metabolites. The program was designed to integrate with existing packages such as apLCMS and XCMS, but the framework can also be used to enhance data extraction for other LC/MS data software.
279 citations
Cited by
01 Jan 2016
5,249 citations
•
TL;DR: Diabetes is classified into four general categories: type 1 diabetes, type 2 diabetes, gestational diabetes mellitus (GDM), and specific types of diabetes due to other causes.
Abstract: 1. Type 1 diabetes (due to β-cell destruction, usually leading to absolute insulin deficiency) 2. Type 2 diabetes (due to a progressive insulin secretory defect on the background of insulin resistance) 3. Gestational diabetes mellitus (GDM) (diabetes diagnosed in the second or third trimester of pregnancy that is not clearly overt diabetes) 4. Specific types of diabetes due to other causes, e.g., monogenic diabetes syndromes (such as neonatal diabetes and maturity-onset diabetes of the young [MODY]), diseases of the exocrine pancreas (such as cystic fibrosis), and drug- or chemical-induced diabetes (such as in the treatment of HIV/AIDS or after organ transplantation)
2,339 citations
••
QIMR Berghofer Medical Research Institute, Garvan Institute of Medical Research, University of Queensland, Royal North Shore Hospital, University of Western Sydney, Fremantle Hospital, Royal Adelaide Hospital, Princess Alexandra Hospital, University of Western Australia, Glasgow Royal Infirmary, Beatson West of Scotland Cancer Centre, University of Bergen, Dresden University of Technology, Johns Hopkins University School of Medicine, University of Texas MD Anderson Cancer Center, Memorial Sloan Kettering Cancer Center, University of Verona, University of California, San Francisco, University of Glasgow
TL;DR: Genomic instability co-segregated with inactivation of DNA maintenance genes (BRCA1, BRCA2 or PALB2) and a mutational signature of DNA damage repair deficiency, and 4 of 5 individuals with these measures of defective DNA maintenance responded to platinum therapy.
Abstract: Pancreatic cancer remains one of the most lethal of malignancies and a major health burden. We performed whole-genome sequencing and copy number variation (CNV) analysis of 100 pancreatic ductal adenocarcinomas (PDACs). Chromosomal rearrangements leading to gene disruption were prevalent, affecting genes known to be important in pancreatic cancer (TP53, SMAD4, CDKN2A, ARID1A and ROBO2) and new candidate drivers of pancreatic carcinogenesis (KDM6A and PREX2). Patterns of structural variation (variation in chromosomal structure) classified PDACs into 4 subtypes with potential clinical utility: the subtypes were termed stable, locally rearranged, scattered and unstable. A significant proportion harboured focal amplifications, many of which contained druggable oncogenes (ERBB2, MET, FGFR1, CDK6, PIK3R3 and PIK3CA), but at low individual patient prevalence. Genomic instability co-segregated with inactivation of DNA maintenance genes (BRCA1, BRCA2 or PALB2) and a mutational signature of DNA damage repair deficiency. Of 8 patients who received platinum therapy, 4 of 5 individuals with these measures of defective DNA maintenance responded.
2,035 citations
••
TL;DR: This article analyzed multiple compartments of circulating immune memory to severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in 254 samples from 188 COVID-19 cases, including 43 samples at ≥ 6 months after infection.
Abstract: Understanding immune memory to severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is critical for improving diagnostics and vaccines and for assessing the likely future course of the COVID-19 pandemic. We analyzed multiple compartments of circulating immune memory to SARS-CoV-2 in 254 samples from 188 COVID-19 cases, including 43 samples at ≥6 months after infection. Immunoglobulin G (IgG) to the spike protein was relatively stable over 6+ months. Spike-specific memory B cells were more abundant at 6 months than at 1 month after symptom onset. SARS-CoV-2-specific CD4+ T cells and CD8+ T cells declined with a half-life of 3 to 5 months. By studying antibody, memory B cell, CD4+ T cell, and CD8+ T cell memory to SARS-CoV-2 in an integrated manner, we observed that each component of SARS-CoV-2 immune memory exhibited distinct kinetics.
1,980 citations
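The immune-memory abstract above reports that SARS-CoV-2-specific T cells declined with a half-life of 3 to 5 months. As a rough worked example, and assuming simple exponential decay (an assumption made here for illustration; the abstract reports only the half-life range), the fraction of cells remaining after a given time can be computed as follows. The function name `fraction_remaining` is hypothetical.

```python
def fraction_remaining(months_elapsed, half_life_months):
    """Fraction left after `months_elapsed`, assuming exponential decay."""
    if half_life_months <= 0:
        raise ValueError("half-life must be positive")
    return 0.5 ** (months_elapsed / half_life_months)


# Bounds of the half-life range quoted in the abstract, evaluated at 6 months.
for half_life in (3, 5):
    share = fraction_remaining(6, half_life)
    print(f"half-life {half_life} months: {share:.1%} remaining at 6 months")
```

Under this assumption, a 3-month half-life leaves one quarter of the initial level at 6 months, while a 5-month half-life leaves a little under half.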
01 Mar 2001
TL;DR: Using singular value decomposition in transforming genome-wide expression data from genes × arrays space to reduced diagonalized "eigengenes" × "eigenarrays" space gives a global picture of the dynamics of gene expression, in which individual genes and arrays appear to be classified into groups of similar regulation and function, or similar cellular state and biological phenotype.
Abstract: We describe the use of singular value decomposition in transforming genome-wide expression data from genes × arrays space to reduced diagonalized "eigengenes" × "eigenarrays" space, where the eigengenes (or eigenarrays) are unique orthonormal superpositions of the genes (or arrays). Normalizing the data by filtering out the eigengenes (and eigenarrays) that are inferred to represent noise or experimental artifacts enables meaningful comparison of the expression of different genes across different arrays in different experiments. Sorting the data according to the eigengenes and eigenarrays gives a global picture of the dynamics of gene expression, in which individual genes and arrays appear to be classified into groups of similar regulation and function, or similar cellular state and biological phenotype, respectively. After normalization and sorting, the significant eigengenes and eigenarrays can be associated with observed genome-wide effects of regulators, or with measured samples, in which these regulators are overactive or underactive, respectively.
1,815 citations
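The singular value decomposition abstract above describes two steps: decomposing a genes × arrays matrix into eigengenes and eigenarrays, and normalizing the data by filtering out components inferred to be noise or artifacts. A minimal NumPy sketch of those two steps is given below. The function names, the synthetic matrix, and the choice to drop the first component are illustrative assumptions, not the authors' code or their criteria for deciding which components are artifacts.

```python
import numpy as np


def decompose(expression):
    """Thin SVD of a genes x arrays matrix.

    Columns of `u` play the role of eigenarrays, rows of `vt` the role of
    eigengenes, and `s` holds the singular values linking them.
    """
    u, s, vt = np.linalg.svd(expression, full_matrices=False)
    return u, s, vt


def filter_components(expression, drop):
    """Rebuild the matrix with the singular values listed in `drop` set to zero."""
    u, s, vt = decompose(expression)
    s_kept = s.copy()
    for k in drop:
        s_kept[k] = 0.0
    return (u * s_kept) @ vt


# Synthetic data: 50 genes x 8 arrays of noise plus a constant offset,
# standing in for a steady-state signal that dominates the first component.
rng = np.random.default_rng(0)
expression = rng.normal(size=(50, 8)) + 5.0

u, s, vt = decompose(expression)
signal_share = s**2 / np.sum(s**2)  # share of total signal per component
print("share per component:", np.round(signal_share, 3))

# Remove the dominant component and inspect what remains.
filtered = filter_components(expression, drop=[0])
print("largest remaining singular value:",
      np.linalg.svd(filtered, compute_uv=False)[0])
```

Zeroing a singular value and multiplying the factors back together removes exactly that component's contribution, so the remaining components are unchanged.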