scispace - formally typeset
Author

Vinh Q. Dang

Bio: Vinh Q. Dang is an academic researcher. The author has contributed to research in the topics Feature selection and Biological data. The author has an h-index of 1 and has co-authored 1 publication, which has received 4 citations.

Papers
01 Jan 2014
TL;DR: This project investigates and develops feature selection algorithms that incorporate evolutionary strategies, uses them to find the most relevant biomarkers in high-dimensional biological datasets, and evaluates the selected feature subsets against biomedical domain knowledge and classification accuracy.
Abstract: Data mining techniques have been used widely in many areas such as business, science, engineering and medicine. The techniques allow a vast amount of data to be explored in order to extract useful information from the data. One of the foci in the health area is finding interesting biomarkers from biomedical data. High-throughput data generated from microarrays and mass spectrometry of biological samples are high-dimensional and small in sample size. Examples include DNA microarray datasets with up to 500,000 genes and mass spectrometry data with 300,000 m/z values. While the availability of such datasets can aid in the development of techniques/drugs to improve diagnosis and treatment of diseases, a major challenge involves their analysis to extract useful and meaningful information. The aims of this project are: 1) to investigate and develop feature selection algorithms that incorporate various evolutionary strategies, 2) to use the developed algorithms to find the "most relevant" biomarkers contained in biological datasets, and 3) to evaluate the goodness of the extracted feature subsets for relevance (examined in terms of existing biomedical domain knowledge and of the classification accuracy obtained using different classifiers). The project aims to generate good predictive models for distinguishing diseased samples from controls.

4 citations


Cited by
Book Chapter
TL;DR: An unsupervised approach for identifying significant genes in microarray gene expression datasets, which uses quantum clustering to represent gene-expression data as equations and searches for the most probable set of clusters given the available data.
Abstract: In this paper, we implement an unsupervised approach for identifying significant genes in microarray gene expression datasets. The proposed method is based on a quantum clustering approach that represents gene-expression data as equations and searches for the most probable set of clusters given the available data. The main contribution of this approach lies in its ability to take into account the essential features, or genes, through clustering. Here, we present a novel clustering approach that extends ideas from scale-space clustering and support-vector clustering; this clustering method is used as a feature selection method. Our approach is fundamentally based on representing data points, or features, in Hilbert space by a probability function that is taken to be a solution of the Schrödinger equation. This Schrödinger equation contains a potential function that is derived from the initial probability function. The minima of the potential are then treated as cluster centres, and these cluster centres stand out as representative genes. The genes are evaluated using classifiers, and their performance is recorded over various classification indices. The experiments show that the classification performance of the reduced gene set is much better than that of the entire dataset. The only free scale parameter, σ, is then varied to obtain the highest accuracy, and the biological significance of the corresponding genes is noted.

1 citation
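The quantum clustering step in the abstract above can be sketched with NumPy, assuming the standard formulation the abstract alludes to. That formulation uses a Gaussian Parzen wave function ψ(x) = Σ_i exp(−‖x − x_i‖²/2σ²) over the genes x_i. The potential V(x) = E − d/2 + Σ_i ‖x − x_i‖² exp(−‖x − x_i‖²/2σ²) / (2σ²ψ(x)) is then obtained by solving the Schrödinger equation for V. This is not the chapter's code. The function names, the gradient-descent step size, the σ/2 rule for merging minima, and the choice of the gene nearest each minimum as its representative are all illustrative assumptions.

```python
import numpy as np


def qc_gradient(P, X, sigma):
    """Gradient of the quantum-clustering potential V at the query points P.

    Up to an additive constant, V(x) = sum_i r_i^2 w_i / (2 sigma^2 psi(x)),
    with r_i = ||x - x_i||, w_i = exp(-r_i^2 / (2 sigma^2)) and the Parzen
    wave function psi(x) = sum_i w_i.
    """
    diff = P[:, None, :] - X[None, :, :]              # (m, n, d) displacements
    r2 = np.sum(diff ** 2, axis=2)                    # (m, n) squared distances
    w = np.exp(-r2 / (2 * sigma ** 2))                # Gaussian weights
    psi = w.sum(axis=1)                               # (m,) wave function
    s = (r2 * w).sum(axis=1)                          # (m,) numerator of V
    grad_psi = -np.einsum("mn,mnd->md", w, diff) / sigma ** 2
    grad_s = np.einsum("mn,mnd->md", w * (2 - r2 / sigma ** 2), diff)
    grad_ratio = (grad_s * psi[:, None] - s[:, None] * grad_psi) / psi[:, None] ** 2
    return grad_ratio / (2 * sigma ** 2)


def quantum_cluster_select(X, sigma, lr=0.5, n_iter=200):
    """Return (sorted indices of representative genes, cluster label per gene).

    X has shape (n_genes, n_samples), so each gene is a point in sample space.
    """
    X = np.asarray(X, dtype=float)
    P = X.copy()
    for _ in range(n_iter):                           # descend V from every gene
        P -= lr * sigma ** 2 * qc_gradient(P, X, sigma)
    centres, labels = [], np.empty(len(X), dtype=int)
    for k, p in enumerate(P):                         # merge minima closer than sigma/2
        for c, centre in enumerate(centres):
            if np.linalg.norm(p - centre) < sigma / 2:
                labels[k] = c
                break
        else:
            centres.append(p)
            labels[k] = len(centres) - 1
    selected = []
    for c, centre in enumerate(centres):              # keep the gene nearest each minimum
        members = np.flatnonzero(labels == c)
        dists = np.linalg.norm(X[members] - centre, axis=1)
        selected.append(int(members[np.argmin(dists)]))
    return sorted(selected), labels


# Toy example: six hypothetical "genes" measured on two samples, in two tight groups.
genes = np.array([[0.0, 0.0], [0.2, 0.1], [-0.1, 0.2],
                  [10.0, 10.0], [10.2, 9.9], [9.9, 10.1]])
selected, labels = quantum_cluster_select(genes, sigma=1.0)
print(selected)  # [0, 3]
```

In the workflow the abstract describes, σ would then be swept and each selected gene set scored with classifiers. The sketch leaves that outer loop out.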

Proceedings Article
21 Feb 2016
TL;DR: A hybrid approach incorporating the Nearest Shrunken Centroid (NSC) and Memetic Algorithm (MA) is proposed to automatically search for an optimal range of shrinkage threshold values for the NSC to improve feature selection and classification accuracy.
Abstract: High-throughput technologies such as microarrays and mass spectrometry produce high-dimensional biological datasets in abundance and with increasing complexity. Prediction Analysis for Microarrays (PAM) is a well-known implementation of the Nearest Shrunken Centroid (NSC) method, which has been widely used for classification of biological data. In this paper, a hybrid approach incorporating the NSC and a Memetic Algorithm (MA) is proposed to automatically search for an optimal range of shrinkage threshold values for the NSC in order to improve feature selection and classification accuracy. The approach was evaluated on nine biological datasets, and the results showed improved feature selection stability over existing evolutionary approaches as well as improved classification accuracy.

1 citation
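The following is a minimal sketch of the two ingredients named in the abstract above, assuming the usual PAM-style shrinkage. Each class centroid is expressed as a standardized difference d from the overall centroid and soft-thresholded by Δ, giving d′ = sign(d)·max(|d| − Δ, 0). A feature survives only if some class keeps a non-zero d′. The memetic part is a generic genetic algorithm (tournament selection, blend crossover, Gaussian mutation, elitism) whose best child is refined by hill climbing. The paper searches for a range of thresholds, but this sketch searches for a single Δ. Its fitness function (validation accuracy minus a small feature-count penalty), all function names, and all parameter values are hypothetical.

```python
import numpy as np


def nsc_fit(X, y, delta):
    """Nearest shrunken centroids with a single shrinkage threshold delta."""
    classes = np.unique(y)
    n, n_classes = len(y), len(classes)
    overall = X.mean(axis=0)
    centroids = np.array([X[y == k].mean(axis=0) for k in classes])
    resid = X - centroids[np.searchsorted(classes, y)]
    s = np.sqrt((resid ** 2).sum(axis=0) / (n - n_classes))   # pooled within-class SD
    s0 = np.median(s)                                           # guards against tiny s
    m = np.sqrt(1.0 / np.array([(y == k).sum() for k in classes]) - 1.0 / n)
    scale = m[:, None] * (s + s0)
    d = (centroids - overall) / scale                           # standardized differences
    d_shrunk = np.sign(d) * np.maximum(np.abs(d) - delta, 0.0)  # soft thresholding
    return {
        "classes": classes,
        "shrunk": overall + scale * d_shrunk,
        "weights": 1.0 / (s + s0) ** 2,
        "log_priors": np.log([(y == k).mean() for k in classes]),
        "kept": np.flatnonzero(np.any(d_shrunk != 0, axis=0)),  # surviving features
    }


def nsc_predict(model, X):
    """Pick the class whose shrunken centroid gives the smallest discriminant score."""
    diff = X[:, None, :] - model["shrunk"][None, :, :]
    score = (diff ** 2 * model["weights"]).sum(axis=2) - 2 * model["log_priors"]
    return model["classes"][np.argmin(score, axis=1)]


def hill_climb(fitness, x, fx, lo, hi, step, n_steps=10):
    """Local search: try x - step and x + step, halving step when neither helps."""
    for _ in range(n_steps):
        improved = False
        for cand in (x - step, x + step):
            cand = float(np.clip(cand, lo, hi))
            f_cand = fitness(cand)
            if f_cand > fx:
                x, fx, improved = cand, f_cand, True
        if not improved:
            step /= 2
    return x, fx


def memetic_search(fitness, lo, hi, pop_size=20, n_gen=30, seed=0):
    """Maximise fitness over [lo, hi]: a GA whose best child is refined by hill_climb."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(lo, hi, pop_size)
    fit = np.array([fitness(x) for x in pop])

    def tournament():
        a, b = rng.choice(pop_size, 2, replace=False)
        return pop[a] if fit[a] >= fit[b] else pop[b]

    for _ in range(n_gen):
        alphas = rng.uniform(size=pop_size)                     # blend crossover
        kids = np.array([a * tournament() + (1 - a) * tournament() for a in alphas])
        kids = np.clip(kids + rng.normal(0.0, 0.05 * (hi - lo), pop_size), lo, hi)
        kid_fit = np.array([fitness(x) for x in kids])
        b = int(np.argmax(kid_fit))                             # memetic local search
        kids[b], kid_fit[b] = hill_climb(fitness, kids[b], kid_fit[b], lo, hi, 0.05 * (hi - lo))
        all_pop = np.concatenate([pop, kids])
        all_fit = np.concatenate([fit, kid_fit])
        keep = np.argsort(all_fit)[::-1][:pop_size]             # elitist survival
        pop, fit = all_pop[keep], all_fit[keep]
    return float(pop[0]), float(fit[0])


def threshold_fitness(X_tr, y_tr, X_val, y_val, penalty=0.01):
    """Hypothetical fitness: validation accuracy minus a small penalty per kept feature."""
    def fitness(delta):
        model = nsc_fit(X_tr, y_tr, delta)
        accuracy = np.mean(nsc_predict(model, X_val) == y_val)
        return accuracy - penalty * len(model["kept"]) / X_tr.shape[1]
    return fitness
```

For real data, one might call `memetic_search(threshold_fitness(X_tr, y_tr, X_val, y_val), 0.0, upper)`. Here `upper` should be a threshold large enough to shrink every feature away. A larger Δ discards more features, so the penalty term trades a little accuracy for a smaller gene set.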

Book Chapter
01 Jan 2021
TL;DR: A clustering-based feature selection algorithm for selecting the genes responsible for a particular disease is proposed and compared with two other well-established feature selection techniques under three different classification approaches, in terms of accuracy, precision, recall and F-score.
Abstract: Genes are the blueprint for all activities of living systems, helping them sustain a stable life cycle under normal conditions. Any mistake in genetic regulation can disturb this synchronous activity and cause disease. Identifying the particular genes that cause a disease is therefore a very significant research area in bioinformatics. In this paper, we propose a clustering-based feature selection algorithm to select the particular genes responsible for a particular disease. We use a well-established clustering algorithm, mean shift clustering, for this purpose. Each cluster represents genes whose characteristics differ from those of genes in other clusters. From each cluster, we take only the cluster centre and test our model on the dataset with reduced dimension. We opted for a density-based approach because of its ability to determine the number of clusters by itself. Our algorithm is evaluated on publicly available benchmark datasets and compared with two other well-established feature selection techniques under three different classification approaches, in terms of accuracy, precision, recall and F-score. Our proposed algorithm performed well in most cases.
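The mean shift step above can be sketched as follows, assuming a flat kernel with a hypothetical `bandwidth` parameter and treating each gene as a point in sample space. Mean shift modes are generally not genes themselves. This sketch therefore assumes that "taking the cluster centre" means keeping the gene closest to each mode. The bandwidth/2 merging rule and the function names are illustrative choices, not the chapter's implementation.

```python
import numpy as np


def mean_shift_modes(X, bandwidth, max_iter=300, tol=1e-6):
    """Flat-kernel mean shift: repeatedly move each point to the mean of the
    data points within `bandwidth` of it, until it stops moving."""
    X = np.asarray(X, dtype=float)
    modes = X.copy()
    for k in range(len(modes)):
        x = modes[k].copy()
        for _ in range(max_iter):
            near = X[np.linalg.norm(X - x, axis=1) <= bandwidth]
            new_x = near.mean(axis=0)
            if np.linalg.norm(new_x - x) < tol:
                break
            x = new_x
        modes[k] = x
    return modes


def mean_shift_select(X, bandwidth):
    """Cluster genes (rows of X) by mean shift; return the gene nearest each mode."""
    X = np.asarray(X, dtype=float)
    modes = mean_shift_modes(X, bandwidth)
    centres, labels = [], np.empty(len(X), dtype=int)
    for k, mode in enumerate(modes):                  # merge modes closer than bandwidth/2
        for c, centre in enumerate(centres):
            if np.linalg.norm(mode - centre) < bandwidth / 2:
                labels[k] = c
                break
        else:
            centres.append(mode)
            labels[k] = len(centres) - 1
    selected = []
    for c, centre in enumerate(centres):              # representative gene per cluster
        members = np.flatnonzero(labels == c)
        nearest = members[np.argmin(np.linalg.norm(X[members] - centre, axis=1))]
        selected.append(int(nearest))
    return sorted(selected), len(centres)


# Toy example: six hypothetical "genes" measured on two samples, in two tight groups.
genes = np.array([[0.0, 0.0], [0.2, 0.1], [-0.1, 0.2],
                  [10.0, 10.0], [10.2, 9.9], [9.9, 10.1]])
selected, n_clusters = mean_shift_select(genes, bandwidth=2.0)
print(n_clusters, selected)  # 2 [0, 3]
```

The number of clusters, and hence of selected genes, follows from the bandwidth rather than being fixed in advance. The abstract gives this property as its reason for choosing a density-based method.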