scispace - formally typeset

Showing papers by "Tianwei Yu" published in 2020


Journal Article · DOI
TL;DR: A new approach is developed that addresses the batch effect issue in the preprocessing stage, resulting in better peak detection, alignment and quantification; it can be combined with downstream batch effect correction methods to further correct for between-batch intensity differences.
Abstract: With the growth of metabolomics research, more and more studies are conducted on large numbers of samples. Due to technical limitations of the Liquid Chromatography-Mass Spectrometry (LC/MS) platform, samples often need to be processed in multiple batches. Across different batches, we often observe differences in data characteristics. In this work, we specifically focus on data generated in multiple batches on the same LC/MS machinery. Traditional preprocessing methods treat all samples as a single group. Such practice can result in errors in the alignment of peaks, which cannot be corrected by post hoc application of batch effect correction methods. In this work, we developed a new approach that addresses the batch effect issue in the preprocessing stage, resulting in better peak detection, alignment and quantification. It can be combined with downstream batch effect correction methods to further correct for between-batch intensity differences. The method is implemented in the existing workflow of the apLCMS platform. When applied to data with multiple batches, generated both from standardized quality control (QC) plasma samples and from real biological studies, the new method produced feature tables with better consistency, as well as better downstream analysis results. The method can be a useful addition to the tools available for large studies involving multiple batches. The method is available as part of the apLCMS package. Download link and instructions are at https://mypage.cuhk.edu.cn/academics/yutianwei/apLCMS/ .

30 citations
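
The abstract contrasts treating all samples as one group with handling batches separately during preprocessing. The Python sketch below illustrates that general idea under simplifying assumptions. Peaks are grouped greedily by m/z (in ppm) and retention-time tolerance. This happens first within each batch, and then, with a looser retention-time tolerance, across the resulting batch-level features. The `Peak` class, the `group_peaks` and `two_stage_align` helpers and all tolerance values are hypothetical. This is not the apLCMS implementation or its API.

```python
from dataclasses import dataclass


@dataclass
class Peak:
    mz: float         # mass-to-charge ratio
    rt: float         # retention time in seconds
    intensity: float  # peak area or height


def centroid(group):
    """Average m/z, retention time and intensity of a group of peaks."""
    n = len(group)
    return Peak(sum(p.mz for p in group) / n,
                sum(p.rt for p in group) / n,
                sum(p.intensity for p in group) / n)


def group_peaks(peaks, ppm_tol, rt_tol):
    """Greedy grouping: each peak (in m/z order) joins the first existing group
    whose centroid lies within ppm_tol in m/z and rt_tol in retention time."""
    groups = []
    for p in sorted(peaks, key=lambda q: q.mz):
        for g in groups:
            c = centroid(g)
            if abs(p.mz - c.mz) / c.mz * 1e6 <= ppm_tol and abs(p.rt - c.rt) <= rt_tol:
                g.append(p)
                break
        else:
            groups.append([p])
    return groups


def two_stage_align(batches, ppm_tol=10.0, rt_within=5.0, rt_between=20.0):
    """Hypothetical batch-aware alignment (not the apLCMS code)."""
    # Stage 1: align peaks pooled from the samples of one batch, using a tight
    # retention-time tolerance, and summarize each group by its centroid.
    batch_features = [[centroid(g) for g in group_peaks(b, ppm_tol, rt_within)]
                      for b in batches]
    # Stage 2: align the batch-level features across batches with a looser
    # tolerance that can absorb systematic between-batch retention-time drift.
    pooled = [f for feats in batch_features for f in feats]
    return group_peaks(pooled, ppm_tol, rt_between)
```

Suppose the same compound elutes about 12 s later in a second batch. A single pass with a 5 s tolerance over all samples can split it into two features. The within-batch pass followed by the looser between-batch pass merges it into one. Real preprocessing also involves peak detection, retention-time correction and quantification, which this sketch omits.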


Journal Article · DOI
Teng Fei, Tianwei Yu
TL;DR: scBatch is a numerical algorithm for batch-effect correction on bulk and single-cell RNA-seq data, with emphasis on improving both clustering and differential gene expression analysis.
Abstract: Motivation: Batch effect is a frequent challenge in deep sequencing data analysis that can lead to misleading conclusions. Existing methods do not correct batch effects satisfactorily, especially with single-cell RNA sequencing (RNA-seq) data. Results: We present scBatch, a numerical algorithm for batch-effect correction on bulk and single-cell RNA-seq data with emphasis on improving both clustering and gene differential expression analysis. scBatch is not restricted by assumptions on the mechanism of batch-effect generation. As shown in simulations and real data analyses, scBatch outperforms benchmark batch-effect correction methods. Availability and implementation: The R package is available at github.com/tengfei-emory/scBatch. The code to generate results and figures in this article is available at github.com/tengfei-emory/scBatch-paper-scripts. Supplementary information: Supplementary data are available at Bioinformatics online.

20 citations


Journal Article · DOI
TL;DR: A forest graph-embedded deep feedforward network (forgeNet) model is proposed to integrate the GEDFN architecture with a forest feature graph extractor, so that the feature graph can be learned in a supervised manner and constructed specifically for a given prediction task.
Abstract: Motivation: A unique challenge in predictive model building for omics data has been the small number of samples (n) relative to the large number of features (p). This 'n≪p' property makes disease outcome classification with deep learning techniques difficult. Sparse learning that incorporates known functional relationships between biological units, such as the graph-embedded deep feedforward network (GEDFN) model, has been one solution to this issue. However, such methods require an existing feature graph, and mis-specification of that graph can harm classification and feature selection. Results: To address this limitation and develop a robust classification model without relying on external knowledge, we propose a forest graph-embedded deep feedforward network (forgeNet) model. It integrates the GEDFN architecture with a forest feature graph extractor, so that the feature graph can be learned in a supervised manner and constructed specifically for a given prediction task. To validate the method, we tested the forgeNet model on both synthetic and real datasets. The resulting high classification accuracy suggests that the method is a valuable addition to sparse deep learning models for omics data. Availability and implementation: The method is available at https://github.com/yunchuankong/forgeNet. Contact: tianwei.yu@emory.edu. Supplementary information: Supplementary data are available at Bioinformatics online.

19 citations
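
To make the forgeNet idea concrete, the sketch below makes two assumptions. First, two features are linked in the graph whenever one is split on directly below the other in some tree of the forest. Second, the GEDFN-style first layer has its weights masked by the adjacency matrix plus self-loops (A + I). Both details are assumptions for illustration. Trees are encoded as hypothetical `("split", feature, left, right)` tuples rather than objects from any particular library, and all function names are hypothetical.

```python
from typing import Optional, Tuple

# Hypothetical tree encoding: ("split", feature_index, left, right); None is a leaf.
Node = Optional[Tuple]


def tree_edges(node: Node, edges: set) -> None:
    """Add an undirected edge between each node's split feature and the split
    feature of each direct child node."""
    if node is None:
        return
    _, feat, left, right = node
    for child in (left, right):
        if child is not None:
            if child[1] != feat:
                edges.add(frozenset((feat, child[1])))
            tree_edges(child, edges)


def forest_graph(trees, p):
    """Union of parent-child feature pairs over all trees, as a p x p 0/1 adjacency matrix."""
    edges = set()
    for t in trees:
        tree_edges(t, edges)
    adj = [[0] * p for _ in range(p)]
    for e in edges:
        i, j = tuple(e)
        adj[i][j] = adj[j][i] = 1
    return adj


def masked_first_layer(x, weights, adj):
    """GEDFN-style layer: hidden unit k only receives input from feature k and
    its graph neighbours (weights masked by A + I), followed by ReLU."""
    p = len(x)
    out = []
    for k in range(p):
        s = sum(x[i] * weights[i][k] for i in range(p) if adj[i][k] or i == k)
        out.append(max(0.0, s))
    return out
```

Because the graph comes from trees fitted to the outcome, it is learned in a supervised way. The mask then limits which features each first-layer unit can see.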


Journal Article · DOI
TL;DR: These data-driven findings show a stronger association of hepatic fat with key CMD risk factors compared with abdominal fats, and provide insight into potential mechanisms underlying the hepatic fat–insulin resistance interface in youth.
Abstract: Introduction: Body fat distribution is strongly associated with cardiometabolic disease (CMD), but the relative importance of hepatic fat as an underlying driver remains unclear. Here, we applied a systems biology approach to compare the clinical and molecular subnetworks that correlate with hepatic fat, visceral fat, and abdominal subcutaneous fat distribution. Research design and methods: This was a cross-sectional sub-study of 283 children/adolescents (7–19 years) from the Yale Pediatric NAFLD Cohort. Untargeted, high-resolution metabolomics (HRM) was performed on plasma and combined with existing clinical variables, including hepatic and abdominal fat measured by MRI. Integrative network analysis was coupled with pathway enrichment analysis and multivariable linear regression (MLR) to examine which metabolites and clinical variables were associated with each fat depot. Results: The data divided into four communities of correlated variables (|r|>0.15, p […] Conclusions: These data-driven findings show a stronger association of hepatic fat with key CMD risk factors compared with abdominal fats. The molecular network identified using HRM that was associated with hepatic fat provides insight into potential mechanisms underlying the hepatic fat–insulin resistance interface in youth.

15 citations
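
The Results sentence describes splitting variables into communities of correlated variables with an |r| > 0.15 cutoff. The accompanying p-value cutoff is truncated in the source. The hypothetical sketch below simplifies this step. It links two variables when |r| exceeds the cutoff and reports connected components. The study's actual community-detection method and significance filter are not reproduced, and the function names are hypothetical.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def correlation_communities(variables, r_min=0.15):
    """Link variables whose |r| exceeds r_min and return the connected components
    (a simplification of community detection), each as a sorted list of names."""
    parent = {name: name for name in variables}

    def find(name):
        # Union-find root lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(variables, 2):
        if abs(pearson(variables[a], variables[b])) > r_min:
            parent[find(a)] = find(b)
    groups = {}
    for name in variables:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())
```

Connected components merge any chain of weakly correlated variables. That is why dedicated community-detection algorithms are usually preferred on dense networks.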


Journal Article · DOI
TL;DR: A single-cell RNA sequencing analysis of the complete and invariant embryonic cell lineage of the tunicate Ciona savignyi from fertilization to the onset of gastrulation reveals insights into asymmetric cell division, FGF signaling, and notochord specification.
Abstract: Progressive unfolding of gene expression cascades underlies diverse embryonic lineage development. Here, we report a single-cell RNA sequencing analysis of the complete and invariant embryonic cell lineage of the tunicate Ciona savignyi from fertilization to the onset of gastrulation. We reconstructed a developmental landscape of 47 cell types over eight cell cycles in the wild-type embryo and identified eight fate transformations upon fibroblast growth factor (FGF) inhibition. For most FGF-dependent asymmetric cell divisions, the bipotent mother cell displays the gene signature of the default daughter fate. In convergent differentiation of the two notochord lineages, we identified additional gene pathways parallel to the master regulator T/Brachyury. Last, we showed that the defined Ciona cell types can be matched to E6.5-E8.5 stage mouse cell types and display conserved expression of a limited number of transcription factors. This study provides a high-resolution single-cell dataset to understand chordate early embryogenesis and cell lineage differentiation.

13 citations
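
One toy way to express "the bipotent mother cell displays the gene signature of the default daughter fate" is a simple score. For each daughter fate, it takes the fraction of that fate's marker genes the mother already expresses. The helper names, the expression threshold and the gene names used with them are hypothetical. This is not the authors' analysis.

```python
def signature_share(mother_expr, daughter_markers, threshold=1.0):
    """Fraction of a daughter fate's marker genes already expressed by the mother
    above an (arbitrary) expression threshold."""
    hits = sum(1 for g in daughter_markers if mother_expr.get(g, 0.0) > threshold)
    return hits / len(daughter_markers)


def default_fate(mother_expr, markers_by_fate, threshold=1.0):
    """Return the daughter fate whose signature the mother most resembles, plus all scores."""
    scores = {fate: signature_share(mother_expr, markers, threshold)
              for fate, markers in markers_by_fate.items()}
    return max(scores, key=scores.get), scores
```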


Journal Article · DOI
TL;DR: In this article, a thresholded graph Laplacian Gaussian (TGLG) prior is proposed for Bayesian network marker selection in the generalized linear model (GLM) framework; it adopts the graph Laplacian matrix to characterize the conditional dependence between neighboring markers while accounting for the global network structure.
Abstract: Selecting informative nodes over large-scale networks is becoming increasingly important in many research areas. Most existing methods focus on the local network structure and incur heavy computational costs for large-scale problems. In this work, we propose a novel prior model for Bayesian network marker selection in the generalized linear model (GLM) framework: the Thresholded Graph Laplacian Gaussian (TGLG) prior. It adopts the graph Laplacian matrix to characterize the conditional dependence between neighboring markers while accounting for the global network structure. Under mild conditions, we show that the proposed model enjoys posterior consistency with a diverging number of edges and nodes in the network. We also develop a Metropolis-adjusted Langevin algorithm (MALA) for efficient posterior computation that is scalable to large-scale networks. We illustrate the advantages of the proposed method over existing alternatives via extensive simulation studies and an analysis of a breast cancer gene expression dataset from The Cancer Genome Atlas (TCGA).

12 citations
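
MALA proposes a move by following the gradient of the log posterior plus Gaussian noise, then applies a Metropolis-Hastings correction. The sketch below is a generic MALA sampler on plain Python lists, demonstrated on a standard bivariate normal target. It is not the authors' sampler for the TGLG posterior, whose log density and gradient (and any handling of the threshold) are not given in the abstract. The function name, step size and iteration count are assumptions.

```python
import math
import random


def mala(log_density, grad_log_density, x0, step, n_iter, rng):
    """Generic Metropolis-adjusted Langevin algorithm on lists of floats.

    Proposal: y = x + (step**2 / 2) * grad log pi(x) + step * z, z ~ N(0, I),
    accepted with the Metropolis-Hastings probability for this asymmetric proposal.
    """
    h = step ** 2

    def drift(x):
        # Mean of the Langevin proposal from x.
        return [xi + 0.5 * h * gi for xi, gi in zip(x, grad_log_density(x))]

    def log_q(x_to, x_from):
        # Log proposal density of x_to given x_from, up to a constant.
        m = drift(x_from)
        return -sum((a - b) ** 2 for a, b in zip(x_to, m)) / (2.0 * h)

    x = list(x0)
    samples, accepted = [], 0
    for _ in range(n_iter):
        prop = [mi + step * rng.gauss(0.0, 1.0) for mi in drift(x)]
        log_alpha = (log_density(prop) + log_q(x, prop)
                     - log_density(x) - log_q(prop, x))
        if rng.random() < math.exp(min(0.0, log_alpha)):
            x, accepted = prop, accepted + 1
        samples.append(list(x))
    return samples, accepted / n_iter


if __name__ == "__main__":
    # Toy target: standard bivariate normal.
    rng = random.Random(0)
    samples, acc = mala(lambda x: -0.5 * sum(v * v for v in x),
                        lambda x: [-v for v in x],
                        [3.0, -3.0], step=0.9, n_iter=5000, rng=rng)
    print("acceptance rate:", round(acc, 3))
```

The gradient drift lets MALA take larger steps than random-walk Metropolis in high dimensions. In practice the step size is tuned toward an acceptance rate near 0.574, the asymptotically optimal value for MALA.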


Journal Article · DOI
TL;DR: Among adults with CMDs, more metabolomic features differed after a meal challenge, which reflected lower metabolic flexibility relative to individuals without CMDs.
Abstract: Context: Metabolic flexibility is the physiologic acclimatization to differing energy availability and requirement states. Effectively maintaining metabolic flexibility remains challenging, particularly since metabolic dysregulations in meal consumption during cardiometabolic disease (CMD) pathophysiology are incompletely understood. Objective: We compared metabolic flexibility following consumption of a standardized meal challenge among adults with or without CMDs. Design, Setting, and Participants: Study participants (n = 349; age 37-54 years, 55% female) received a standardized meal challenge (520 kcal, 67.4 g carbohydrates, 24.3 g fat, 8.0 g protein; 259 mL). Blood samples were collected at baseline and 2 hours postchallenge. Plasma samples were assayed by high-resolution, nontargeted metabolomics with dual-column liquid chromatography and ultrahigh-resolution mass spectrometry. Metabolome-wide associations between features and the meal challenge timepoint were assessed in multivariable linear regression models. Results: Sixty-five percent of participants had ≥1 of 4 CMDs: 33% were obese, 6% had diabetes, 39% had hypertension, and 50% had metabolic syndrome. Log2-normalized ratios of feature peak areas (postprandial:fasting) clustered separately among participants with versus without any CMDs. Among participants with CMDs, the meal challenge altered 1756 feature peak areas (1063 reversed-phase [C18], 693 hydrophilic interaction liquid chromatography [HILIC]; all q < 0.05). In individuals without CMDs, the meal challenge changed 1383 feature peak areas (875 C18; 508 HILIC; all q < 0.05). There were 108 features (60 C18; 48 HILIC) that differed by the meal challenge and CMD status, including dipeptides, carnitines, glycerophospholipids, and a bile acid metabolite (all P < 0.05). Conclusions: Among adults with CMDs, more metabolomic features differed after a meal challenge, which reflected lower metabolic flexibility relative to individuals without CMDs.

5 citations
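
The Results report per-feature log2 ratios of postprandial to fasting peak areas, and q-values below 0.05. A minimal sketch of those two quantities follows. It assumes the q-values are Benjamini-Hochberg adjusted p-values, although the abstract does not name the procedure. The regression models that produce the per-feature p-values are not shown, and the function names are hypothetical.

```python
import math


def log2_ratios(post, fast):
    """Per-feature log2(postprandial / fasting) peak-area ratios (areas must be > 0)."""
    return [math.log2(p / f) for p, f in zip(post, fast)]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values capped at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q
```

A log2 ratio of 1 means the peak area doubled after the meal, and -1 means it halved. On this scale, increases and decreases are symmetric around zero.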


Posted Content · DOI
25 Feb 2020 · bioRxiv
TL;DR: Early memory decline in middle-aged individuals, which may be related to systemic metabolic changes, is linked with altered systemic lipid and fatty acid metabolism.
Abstract: INTRODUCTION: Some aspects of memory start declining in the fifth decade, which may be related to systemic metabolic changes. These changes have not been fully identified. This is the first Metabolome-Wide Association Study of human plasma for longitudinal change in memory in healthy adults. METHODS: Ultra-high-resolution mass spectrometry with liquid chromatography was performed on plasma from 207 university employees. RESULTS: Of 10,201 measured metabolic features, 558 differed between those experiencing change vs. no change in memory (false discovery rate, FDR < 0.2). Differentially abundant metabolites were observed in lipid and fatty acid metabolism pathways: glycerophospholipid (p=0.0003), fatty acid (p=0.0003) and linoleate (p=0.0003) pathways. Within these pathways, higher homocysteine (OR for memory decline=1.09, FDR=0.19) and lower arachidonic acid (OR=0.97, FDR=0.19), sterol (OR=0.92, FDR=0.02), acetylcholine (OR=0.78, FDR=0.19), carnitine (OR=0.75, FDR=0.19) and linoleic acid (OR=0.74, FDR=0.19) were associated with memory decline. DISCUSSION: Altered systemic lipid and fatty acid metabolism is linked with early memory decline in middle-aged individuals. Keywords: memory decline, metabolomics, fatty acid metabolism, mass spectrometry, liquid chromatography

3 citations


Journal ArticleDOI
TL;DR: Energy, macronutrient, and bile acid metabolism pathways were responsive to a standardized meal challenge in adults without cardiometabolic diseases, and these findings reflect metabolic flexibility in disease-free individuals.
Abstract: BACKGROUND: The healthy human metabolome, including its physiological responses after meal consumption, remains incompletely understood. One major research gap is the limited literature assessing how human metabolomic profiles differ between fasting and postprandial states after physiological challenges. OBJECTIVES: Our study objective was to evaluate alterations in high-resolution metabolomic profiles following a standardized meal challenge, relative to fasting, in Guatemalan adults. METHODS: We studied 123 Guatemalan adults without obesity, hypertension, diabetes, metabolic syndrome, or comorbidities. Every participant received a standardized meal challenge (520 kcal, 67.4 g carbohydrates, 24.3 g fat, 8.0 g protein) and provided blood samples while fasting and at 2 h postprandial. Plasma samples were assayed by high-resolution metabolomics with dual-column LC [C18 (negative electrospray ionization), hydrophilic interaction LC (HILIC, positive electrospray ionization)] coupled to ultra-high-resolution MS. Associations between metabolomic features and the meal challenge timepoint were assessed in feature-by-feature multivariable linear mixed regression models. Two algorithms (mummichog, gene set enrichment analysis) were used for pathway analysis, and P values were combined by the Fisher method. RESULTS: Among participants (62.6% male, median age 43.0 y), 1130 features (C18: 777; HILIC: 353) differed between fasting and postprandial states (all false discovery rate-adjusted q < 0.05). Based on differing C18 features, top pathways included the tricarboxylic acid (TCA) cycle, primary bile acid biosynthesis, and linoleic acid metabolism (all Pcombined < 0.05). Mass spectral features included taurine and cholic acid in primary bile acid biosynthesis, and fumaric acid, malic acid, and citric acid in the TCA cycle. HILIC features that differed in the meal challenge reflected linoleic acid metabolism (Pcombined < 0.05). CONCLUSIONS: Energy, macronutrient, and bile acid metabolism pathways were responsive to a standardized meal challenge in adults without cardiometabolic diseases. Our findings reflect metabolic flexibility in disease-free individuals.

3 citations
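
The Methods combine pathway p-values from mummichog and gene set enrichment analysis by the Fisher method. For k independent p-values, X = -2 Σ ln p_i follows a chi-square distribution with 2k degrees of freedom under the null. For even degrees of freedom the tail probability has a closed form, so the sketch below needs only the Python standard library. The function name is hypothetical. The independence assumption is only approximate when both algorithms analyze the same features.

```python
import math


def fisher_combine(pvals):
    """Combine k independent p-values by Fisher's method.

    X = -2 * sum(ln p_i) ~ chi-square with 2k degrees of freedom under the null.
    For even degrees of freedom the upper tail has a closed form:
    P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)**i / i!
    """
    if not pvals or any(not 0.0 < p <= 1.0 for p in pvals):
        raise ValueError("p-values must be in (0, 1]")
    k = len(pvals)
    half = -sum(math.log(p) for p in pvals)  # this is x / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total
```

For example, two pathway p-values of 0.05 each combine to about 0.017.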


Posted Content · DOI
03 Mar 2020 · bioRxiv
TL;DR: This study provides a high-resolution single-cell dataset to understand chordate embryogenesis and the relationship between fate trajectories and the cell lineage. It finds that, for the majority of asymmetric cell divisions, the bipotent mother cell predominantly shows the gene signature of one of the daughter fates, with the other daughter fate being induced by subsequent signaling.
Abstract: In multicellular organisms, a single zygote develops along divergent lineages to produce distinct cell types. What governs these processes is central to understanding cell fate specification and to stem cell engineering. Here we used the protochordate model Ciona savignyi to determine gene expression profiles of every cell of single embryos from fertilization through the onset of gastrulation, and provided a comprehensive map of chordate early embryonic lineage specification. We identified 47 cell types across 8 developmental stages up to the 110-cell stage in wild-type embryos, and 8 fate transformations at the 64-cell stage upon FGF-MAPK inhibition. The identities of all cell types were supported by in situ expression patterns of marker genes and by the expected number of cells based on the invariant lineage. We found that, for the majority of asymmetric cell divisions, the bipotent mother cell shows predominantly the gene signature of one of the daughter fates, with the other daughter being induced by subsequent signaling. Our data further indicated that the asymmetric segregation of mitochondria in some of these divisions does not depend on the concurrent fate-inducing FGF-MAPK signaling. In the notochord, an evolutionary novelty of chordates, the convergence of cell fate from two disparate lineages revealed modular structure in the gene regulatory network beyond the known master regulator T/Brachyury. Comparison to single-cell transcriptomes of the early mouse embryo showed a clear match of cell types at the tissue level and supported the hypothesis of a developmental-genetic toolkit. This study provides a high-resolution single-cell dataset to understand chordate embryogenesis and the relationship between fate trajectories and the cell lineage.

2 citations