Showing papers by "Cheng Li" published in 2011


Journal ArticleDOI
15 Sep 2011-Blood
TL;DR: Using multivariate analyses that incorporate the most important prognostic factors in CLL together with SNP 6.0 array-based genomic lesion loads, elevated CLL genomic complexity is identified as an independent and powerful marker of aggressive CLL and short survival.

107 citations


Journal ArticleDOI
TL;DR: The data suggest that the clinical course of CLL is accelerated in patients with large (type II) 13q14 deletions that span the RB1 gene, therefore justifying routine identification of 13q14 subtypes in CLL management.
Abstract: Purpose: To further our understanding of the biology and prognostic significance of various chromosomal 13q14 deletions in chronic lymphocytic leukemia (CLL). Experimental Design: We analyzed data from SNP 6.0 arrays to define the anatomy of various 13q14 deletions in a cohort of 255 CLL patients and have correlated two subsets of 13q14 deletions (type I exclusive of RB1 and type II inclusive of RB1) with patient survival. Furthermore, we measured the expression of the 13q14-resident microRNAs by quantitative PCR (Q-PCR) in 242 CLL patients and subsequently assessed their prognostic significance. We sequenced all coding exons of RB1 in patients with monoallelic RB1 deletion and have sequenced the 13q14-resident miR locus in all patients. Results: Large 13q14 (type II) deletions were detected in approximately 20% of all CLL patients and were associated with shortened survival. A strong association between 13q14 type II deletions and elevated genomic complexity, as measured through CLL-FISH or SNP 6.0 array profiling, was identified, suggesting that these lesions may contribute to CLL disease evolution through genomic destabilization. Sequence and copy number analysis of the RB1 gene identified a small CLL subset that is RB1 null. Finally, neither the expression levels of the 13q14-resident microRNAs nor the degree of 13q14 deletion, as measured through SNP 6.0 array-based copy number analysis, had significant prognostic importance. Conclusions: Our data suggest that the clinical course of CLL is accelerated in patients with large (type II) 13q14 deletions that span the RB1 gene, therefore justifying routine identification of 13q14 subtypes in CLL management. Clin Cancer Res; 17(21); 6778–90. ©2011 AACR.

96 citations


Journal ArticleDOI
TL;DR: It is suggested that antigenicity is a local property of the protein sequences and that protein sequence properties of composition, secondary structure, solvent accessibility and evolutionary conservation are the determinants of antigenicity and specificity in immune response.
Abstract: Target-specific antibodies are pivotal for the design of vaccines, immunodiagnostic tests, studies on proteomics for cancer biomarker discovery, identification of protein-DNA and other interactions, and small and large biochemical assays. Therefore, it is important to understand the properties of protein sequences that are important for antigenicity and to identify small peptide epitopes and large regions in the linear sequence of the proteins whose utilization results in specific antibodies. Our analysis using protein properties suggested that sequence composition, combined with evolutionary information, predicted secondary structure, and solvent accessibility, is sufficient to predict successful peptide epitopes. The antigenicity and the specificity in immune response were also found to depend on the epitope length. We trained the B-Cell Epitope Oracle (BEOracle), a support vector machine (SVM) classifier, for the identification of continuous B-Cell epitopes with these protein properties as learning features. The BEOracle achieved an F1-measure of 81.37% on a large validation set. The BEOracle classifier outperformed the classical methods based on propensity and sophisticated methods like BCPred and Bepipred for B-Cell epitope prediction. The BEOracle classifier also identified peptides for the ChIP-grade antibodies from the modENCODE/ENCODE projects with 96.88% accuracy. High BEOracle scores for peptides showed some correlation with the antibody intensity in immunofluorescence studies done on fly embryos. Finally, a second SVM classifier, the B-Cell Region Oracle (BROracle), was trained with the BEOracle scores as features to predict, with high accuracy, the performance of antibodies generated with large protein regions. The BROracle classifier achieved accuracies of 75.26-63.88% on a validation set with immunofluorescence, immunohistochemistry, protein arrays and western blot results from the Protein Atlas database.
Together, our results suggest that antigenicity is a local property of the protein sequences and that protein sequence properties of composition, secondary structure, solvent accessibility and evolutionary conservation are the determinants of antigenicity and specificity in immune response. Moreover, specificity in immune response could also be accurately predicted for large protein regions without the knowledge of the protein tertiary structure or the presence of discontinuous epitopes. The dataset prepared in this work and the classifier models are available for download at https://sites.google.com/site/oracleclassifiers/.
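The abstract above reports classifier quality as an F1-measure. As a reminder of what that figure summarizes, here is a minimal sketch of the standard F1 computation (the harmonic mean of precision and recall) for binary labels. The function name and the toy labels are hypothetical and are not taken from the BEOracle work; this is not the authors' evaluation code.

```python
def f1_measure(true_labels, predicted_labels, positive=1):
    """Standard F1: harmonic mean of precision and recall for one positive class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    if tp == 0:
        # No correct positive calls: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels: 1 = epitope, 0 = non-epitope.
toy_true = [1, 1, 1, 0, 0]
toy_pred = [1, 1, 0, 1, 0]
toy_f1 = f1_measure(toy_true, toy_pred)
```

On the toy labels there are two true positives, one false positive and one false negative, so precision and recall are both 2/3 and so is F1.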

61 citations


Journal ArticleDOI
TL;DR: Experimental and clinical evidence shows Sp1 as an important transcription factor in myeloma that can be therapeutically targeted for clinical application by terameprocol, a small molecule that specifically competes with Sp1-DNA binding in vitro and in vivo.
Abstract: Purpose: The transcription factor specificity protein 1 (Sp1) controls a number of cellular processes by regulating the expression of critical cell cycle, differentiation, and apoptosis-related genes containing proximal GC/GT-rich promoter elements. We here provide experimental and clinical evidence that Sp1 plays an important regulatory role in multiple myeloma (MM) cell growth and survival. Experimental Design: We have investigated the functional Sp1 activity in MM cells using a plasmid with a Firefly luciferase reporter gene driven by an Sp1-responsive promoter. We have also used both siRNA- and short hairpin RNA–mediated Sp1 knockdown to investigate the growth and survival effects of Sp1 on MM cells and further investigated the anti-MM activity of terameprocol (TMP), a small molecule that specifically competes with Sp1-DNA binding in vitro and in vivo. Results: We have confirmed high Sp1 activity in MM cells that is further induced by adhesion to bone marrow stromal cells (BMSC). Sp1 knockdown decreases MM cell proliferation and induces apoptosis. Sp1-DNA binding inhibition by TMP inhibits MM cell growth both in vitro and in vivo, inducing caspase-9–dependent apoptosis and overcoming the protective effects of BMSCs. Conclusions: Our results show Sp1 as an important transcription factor in myeloma that can be therapeutically targeted for clinical application by TMP. Clin Cancer Res; 17(20); 6500–9. ©2011 AACR.

48 citations


Journal ArticleDOI
TL;DR: CaSNP was used to study the CNA of protein-coding genes as well as LincRNA genes across all cancer SNP arrays, and found putative regions harboring novel oncogenes and tumor suppressors.
Abstract: Cancer is known to have abundant copy number alterations (CNAs) that greatly contribute to its pathogenesis and progression. Investigation of CNA regions could potentially help identify oncogenes and tumor suppressor genes and infer cancer mechanisms. Although single-nucleotide polymorphism (SNP) arrays have strengthened our ability to identify CNAs with unprecedented resolution, a comprehensive collection of CNA information from SNP array data is still lacking. We developed a web-based CaSNP (http://cistrome.dfci.harvard.edu/CaSNP/) database for storing and interrogating quantitative CNA data, which curated ∼11 500 SNP arrays on 34 different cancer types in 104 studies. With a user input of region or gene of interest, CaSNP will return the CNA information summarizing the frequencies of gain/loss and averaged copy number for each study, and provide links to download the data or visualize it in UCSC Genome Browser. CaSNP also displays the heatmap showing copy numbers estimated at each SNP marker around the query region across all studies for a more comprehensive visualization. Finally, we used CaSNP to study the CNA of protein-coding genes as well as LincRNA genes across all cancer SNP arrays, and found putative regions harboring novel oncogenes and tumor suppressors. In summary, CaSNP is a useful tool for cancer CNA association studies, with the potential to facilitate both basic science and translational research on cancer.
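The abstract above describes a per-study summary of gain/loss frequencies and averaged copy number for a queried region. The following is a minimal sketch of that kind of summary, not CaSNP's actual implementation. The function name, the gain/loss cutoffs of 2.5 and 1.5 copies, and the sample values are all assumptions made for illustration.

```python
def summarize_cna(copy_numbers, gain_threshold=2.5, loss_threshold=1.5):
    """Summarize one study's copy-number estimates for a single region.

    The thresholds are illustrative assumptions, not values from CaSNP.
    """
    n = len(copy_numbers)
    if n == 0:
        raise ValueError("at least one sample is required")
    gains = sum(1 for c in copy_numbers if c > gain_threshold)
    losses = sum(1 for c in copy_numbers if c < loss_threshold)
    return {
        "gain_freq": gains / n,              # fraction of samples with a gain
        "loss_freq": losses / n,             # fraction of samples with a loss
        "mean_copy_number": sum(copy_numbers) / n,
    }


# Hypothetical per-sample copy numbers at one gene in one study.
toy_summary = summarize_cna([2.0, 3.0, 1.0, 2.0])
```

With the toy values, one of four samples is above the gain cutoff and one is below the loss cutoff, so both frequencies are 0.25 and the mean copy number is 2.0.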

27 citations


Journal ArticleDOI
Gao H, Cheng Li, Rong Mu, Yong Guo, Tao Liu, Chen S, Yin Su, Zhanguo Li
06 Jun 2011-Lupus
TL;DR: The results of the present study suggest that SCH is a common complication in SLE patients and is closely related to lupus nephritis (LN).
Abstract: The aim of this study was to evaluate the prevalence of thyroid diseases in Chinese systemic lupus erythematosus (SLE) patients and the relationship between subclinical hypothyroidism (SCH) and lupus nephritis (LN). A large cohort of 1006 SLE patients was retrospectively analyzed. The prevalence of autoimmune thyroid disease was 2.78%, clinical hypothyroidism 1.69%, subclinical hypothyroidism 10.04%, central hypothyroidism 1.29%, hyperthyroidism 1.19%, euthyroid sick syndrome (ESS) 9.54%, and nodules 1.09%. Compared with the prevalence of thyroid abnormalities in the general Chinese population (0.91-6.05%), the prevalence of SCH was much higher (10.04%) in this study. In addition, SCH was more frequent in patients with LN (13.4%) than in those without LN (7.3%, p = 0.001). A case-control study was performed to explore the relative risk factors of SCH. In multiple logistic regression models, 24 h urine protein and estimated glomerular filtration rate (eGFR) were retained as independent correlates of SCH after adjusting for demographic variables, risk factors, and other potential confounders. The results of the present study suggest that SCH is a common complication in SLE patients and is closely related to LN.

20 citations


Book ChapterDOI
01 Jan 2011
TL;DR: This chapter is a survey of common classification techniques and related methods to increase their accuracies for microarray analysis based on data mining methodology.
Abstract: With recent advances in biomedical technology, a lot of ‘OMIC’ data from the genomic, transcriptomic, and proteomic domains can now be collected quickly and cheaply. One such technology is microarray technology, which allows researchers to gather information on the expression of thousands of genes at the same time. With the large amount of data, a new problem surfaces: how to extract useful information from it. Data mining and machine learning techniques have been applied in many computer applications for some time. It would be natural to use some of these techniques to assist in drawing inferences from the volume of information gathered through microarray experiments. This chapter is a survey of common classification techniques and related methods to increase their accuracies for microarray analysis based on data mining methodology. Publicly available datasets are used to evaluate their performance.

19 citations


Journal ArticleDOI
16 Dec 2011-PLOS ONE
TL;DR: Presents a simple and efficient strategy to derive candidate urine markers for prostate tumor by mining cancer genomic profiles from public databases, and suggests a few urine markers as preferred prognostic markers to monitor the invasion and progression of PCa.
Abstract: Urine has emerged as an attractive biofluid for the noninvasive detection of prostate cancer (PCa). There is a strong imperative to discover candidate urinary markers for the clinical diagnosis and prognosis of PCa. The rising flood of various omics profiles presents immense opportunities for the identification of prospective biomarkers. Here we present a simple and efficient strategy to derive candidate urine markers for prostate tumor by mining cancer genomic profiles from public databases. Prostate, bladder and kidney are three major tissues from which cellular matters could be released into urine. To identify urinary markers specific for PCa, upregulated entities that might be shed in exosomes of bladder cancer and kidney cancer are first excluded. Through the ontology-based filtering and further assessment, a reduced list of 19 entities encoding urinary proteins was derived as putative PCa markers. Among them, we have found 10 entities closely associated with the process of tumor cell growth and development by pathway enrichment analysis. Further, using the 10 entities as seeds, we have constructed a protein-protein interaction (PPI) subnetwork and suggested a few urine markers as preferred prognostic markers to monitor the invasion and progression of PCa. Our approach is amenable to discover and prioritize potential markers present in a variety of body fluids for a spectrum of human diseases.

18 citations


Journal ArticleDOI
TL;DR: The dChip survival module provides a user-friendly way to perform survival analysis and visualize the results in the context of genes and cytobands, and requires no coding expertise and only a minimal learning curve for thousands of existing dChip users.
Abstract: Genome-wide expression signatures are emerging as potential markers for overall survival and disease recurrence risk, as evidenced by the recent commercialization of gene expression-based biomarkers in breast cancer. Similar predictions have recently been carried out using genome-wide copy number alterations and microRNAs. Existing software packages for microarray data analysis provide functions to define expression-based survival gene signatures. However, there is no software that can perform survival analysis using SNP array data or draw survival curves interactively for expression-based sample clusters. We have developed the survival analysis module in the dChip software that performs survival analysis across the genome for gene expression and copy number microarray data. Built on the current dChip software's microarray analysis functions such as chromosome display and clustering, the new survival functions include interactive exploring of Kaplan-Meier (K-M) plots using expression or copy number data, computing survival p-values from the log-rank test and Cox models, and using permutation to identify significant chromosome regions associated with survival. The dChip survival module provides a user-friendly way to perform survival analysis and visualize the results in the context of genes and cytobands. It requires no coding expertise and only a minimal learning curve for thousands of existing dChip users. The implementation in Visual C++ also enables fast computation. The software and demonstration data are freely available at http://dchip-surv.chenglilab.org.
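The Kaplan-Meier plots mentioned in the abstract above rest on the product-limit estimator: at each distinct event time, the running survival probability is multiplied by one minus the fraction of at-risk samples that had an event. A minimal sketch of that textbook estimator follows. It is written in Python for illustration only and is not dChip's Visual C++ implementation; the function name and toy data are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up time for each sample
    events: 1 if the event (e.g. death) was observed, 0 if censored
    Returns a list of (time, survival_probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all samples sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both events and censored samples leave the risk set after time t.
        at_risk -= removed
    return curve


# Hypothetical toy cohort: the sample at time 2 is censored.
toy_curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

In the toy cohort, the censored sample lowers the number at risk without producing a step in the curve, which is why the estimate drops from 0.75 directly to 0.375 at time 3.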

9 citations


Journal ArticleDOI
TL;DR: This study uses a Bayesian regression method to model all variants simultaneously to identify rare variants in a data set from Genetic Analysis Workshop 17 and identified several positive single-nucleotide polymorphisms for traits Q1 and Q2.
Abstract: Recent advances in next-generation sequencing technologies have made it possible to generate large amounts of sequence data with rare variants in a cost-effective way. Statistical methods that test variants individually are underpowered to detect rare variants, so it is desirable to perform association analysis of rare variants by combining the information from all variants. In this study, we use a Bayesian regression method to model all variants simultaneously to identify rare variants in a data set from Genetic Analysis Workshop 17. We studied the association between the quantitative risk traits Q1, Q2, and Q4 and the single-nucleotide polymorphisms and identified several positive single-nucleotide polymorphisms for traits Q1 and Q2. However, the model also generated several apparent false positives and missed many true positives, suggesting that there is room for improvement in this model.

5 citations


Journal ArticleDOI
18 Nov 2011-Blood
TL;DR: Evaluated the impact of therapy on GES utilizing two large publicly available gene expression datasets from newly-diagnosed multiple myeloma patients generated using Affymetrix U133+2 microarrays to derive a sparse multivariate survival signature.