Author

Gang Qian

Other affiliations: Michigan State University
Bio: Gang Qian is an academic researcher from University of Central Oklahoma. The author has contributed to research in topics: Search engine indexing & Tree (data structure). The author has an h-index of 9 and has co-authored 28 publications receiving 994 citations. Previous affiliations of Gang Qian include Michigan State University.

Papers
Proceedings ArticleDOI
10 Dec 2002
TL;DR: The feature extraction method has been applied for both image segmentation as well as histogram generation applications - two distinct approaches to content based image retrieval (CBIR), showing better identification of objects in an image.
Abstract: We have analyzed the properties of the HSV (hue, saturation and value) color space with emphasis on the visual perception of the variation in hue, saturation and intensity values of an image pixel. We extract pixel features by either choosing the hue or the intensity as the dominant property based on the saturation value of a pixel. The feature extraction method has been applied for both image segmentation as well as histogram generation applications - two distinct approaches to content based image retrieval (CBIR). Segmentation using this method shows better identification of objects in an image. The histogram retains a uniform color transition that enables us to do a window-based smoothing during retrieval. The results have been compared with those generated using the RGB color space.
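The saturation-based choice between hue and intensity described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the function name `pixel_feature` and the fixed cutoff `sat_threshold=0.2` are hypothetical, and the abstract does not say how the paper actually sets the saturation boundary.

```python
import colorsys

def pixel_feature(r, g, b, sat_threshold=0.2):
    """Pick the dominant property of an 8-bit RGB pixel.

    Returns ("hue", h) when the pixel is saturated enough for its hue
    to be perceptually meaningful, otherwise ("intensity", v).
    sat_threshold is an assumed constant, not a value from the paper.
    """
    # colorsys works on floats in [0, 1] and returns h, s, v in [0, 1].
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    if s >= sat_threshold:
        return ("hue", h)
    return ("intensity", v)

# A pure red pixel is fully saturated, so hue is used;
# a mid-gray pixel has zero saturation, so intensity is used.
red_kind, red_value = pixel_feature(255, 0, 0)
gray_kind, gray_value = pixel_feature(128, 128, 128)
```

With a rule like this, near-gray pixels are grouped by brightness and strongly colored pixels by hue, which is the intuition the abstract gives for both the segmentation and the histogram use.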

555 citations

Proceedings ArticleDOI
14 Mar 2004
TL;DR: This paper compares two commonly used distance measures in vector models, namely, Euclidean distance (EUD) and cosine angle distance (CAD), for nearest neighbor (NN) queries in high dimensional data spaces and shows that CAD works no worse than EUD.
Abstract: Understanding the relationship among different distance measures is helpful in choosing a proper one for a particular application. In this paper, we compare two commonly used distance measures in vector models, namely, Euclidean distance (EUD) and cosine angle distance (CAD), for nearest neighbor (NN) queries in high dimensional data spaces. Using theoretical analysis and experimental results, we show that the retrieval results based on EUD are similar to those based on CAD when dimension is high. We have applied CAD for content based image retrieval (CBIR). Retrieval results show that CAD works no worse than EUD, which is a commonly used distance measure for CBIR, while providing other advantages, such as naturally normalized distance.
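One way to see why the two measures can be closely related is to compare them on length-normalized vectors. The sketch below assumes CAD is taken as one minus the cosine of the angle between the vectors; the abstract does not give the paper's exact definition, and the helper names are hypothetical. For unit-length vectors the squared Euclidean distance equals twice this quantity, so the two measures rank neighbors identically in that special case.

```python
import math

def euclidean(a, b):
    """Euclidean distance (EUD) between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_angle_distance(a, b):
    """Assumed CAD: 1 - cos(angle between a and b)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def normalize(a):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in a))
    return [x / norm for x in a]

a = [3.0, 4.0]
b = [4.0, 3.0]

# On unit vectors: EUD^2 = 2 * (1 - cos) = 2 * CAD.
eud_squared = euclidean(normalize(a), normalize(b)) ** 2
twice_cad = 2.0 * cosine_angle_distance(a, b)
```

This identity covers only normalized data; the abstract's claim about unnormalized high dimensional data rests on the paper's own analysis and experiments.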

281 citations

Book ChapterDOI
09 Sep 2003
TL;DR: In this paper, a dynamic indexing technique called the ND-tree is proposed to support efficient similarity searches in an NDDS, which extends the relevant geometric concepts as well as some indexing strategies used in CDSs to NDDSs.
Abstract: Similarity searches in multidimensional Nonordered Discrete Data Spaces (NDDS) are becoming increasingly important for application areas such as genome sequence databases. Existing indexing methods developed for multidimensional (ordered) Continuous Data Spaces (CDS) such as R-tree cannot be directly applied to an NDDS. This is because some essential geometric concepts/properties such as the minimum bounding region and the area of a region in a CDS are no longer valid in an NDDS. On the other hand, indexing methods based on metric spaces such as M-tree are too general to effectively utilize the data distribution characteristics in an NDDS. Therefore, their retrieval performance is not optimized. To support efficient similarity searches in an NDDS, we propose a new dynamic indexing technique, called the ND-tree. The key idea is to extend the relevant geometric concepts as well as some indexing strategies used in CDSs to NDDSs. Efficient algorithms for ND-tree construction are presented. Our experimental results on synthetic and genomic sequence data demonstrate that the performance of the ND-tree is significantly better than that of the linear scan and M-tree in high dimensional NDDSs.
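As a rough illustration of how a bounding region and its area might carry over to a nonordered discrete space, the sketch below treats a region as one set of letters per dimension and its area as the product of the set sizes. These definitions, the use of Hamming distance, and all function names are assumptions made for illustration; the abstract does not spell out the ND-tree's actual definitions or algorithms.

```python
def discrete_mbr(vectors):
    """Smallest per-dimension letter sets that cover all vectors."""
    dims = len(vectors[0])
    return [{v[i] for v in vectors} for i in range(dims)]

def area(rect):
    """Assumed area: product of the per-dimension set sizes."""
    result = 1
    for letters in rect:
        result *= len(letters)
    return result

def hamming(a, b):
    """Number of positions where two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def min_distance_to_rect(query, rect):
    """Lower bound on the Hamming distance from query to any vector
    inside rect: count dimensions whose set lacks the query letter."""
    return sum(1 for q, letters in zip(query, rect) if q not in letters)

# Three short genome-like strings over the alphabet {A, C, G, T}.
data = ["ACGT", "ACGA", "TCGA"]
rect = discrete_mbr(data)
bound = min_distance_to_rect("GCGT", rect)
```

A lower bound of this kind is what lets a tree index skip a subtree during a range query: if the bound already exceeds the query radius, no vector inside the region can match.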

35 citations

01 Jan 2006
TL;DR: The key idea is to extend the relevant geometric concepts as well as some indexing strategies used in CDSs to NDDSs, and demonstrate that the performance of the ND-tree is significantly better than that of the linear scan and M-tree in high dimensional NDDSs.
Abstract: Similarity searches in multidimensional Non-ordered Discrete Data Spaces (NDDS) are becoming increasingly important for application areas such as bioinformatics, biometrics, data mining and E-commerce. Efficient similarity searches require robust indexing techniques. Unfortunately, existing indexing methods developed for multidimensional (ordered) Continuous Data Spaces (CDS) such as the R-tree cannot be directly applied to an NDDS. This is because some essential geometric concepts/properties such as the minimum bounding region and the area of a region in a CDS are no longer valid in an NDDS. Other indexing methods based on metric spaces such as the M-tree and the Slim-trees are too general to effectively utilize the special characteristics of NDDSs, resulting in non-optimized performance. In this paper, we propose a new dynamic indexing technique, called the ND-tree, to support efficient similarity searches in an NDDS. The key idea is to extend the relevant geometric concepts as well as some indexing strategies used in CDSs to NDDSs. Efficient algorithms for ND-tree construction and techniques to solve relevant issues such as handling dimensions with different alphabets in an NDDS are presented. Our experimental results on synthetic data and real genome sequence data demonstrate that the ND-tree outperforms the linear scan, the M-tree and the Slim-trees for similarity searches in multidimensional NDDSs. A theoretical model is also developed to predict the performance of the ND-tree for random data.

33 citations

Journal ArticleDOI
TL;DR: The experimental results on synthetic data and real genome sequence data demonstrate that the ND-tree outperforms the linear scan, the M-tree and the Slim-trees for similarity searches in multidimensional NDDSs.
Abstract: Similarity searches in multidimensional Non-ordered Discrete Data Spaces (NDDS) are becoming increasingly important for application areas such as bioinformatics, biometrics, data mining and E-commerce. Efficient similarity searches require robust indexing techniques. Unfortunately, existing indexing methods developed for multidimensional (ordered) Continuous Data Spaces (CDS) such as the R-tree cannot be directly applied to an NDDS. This is because some essential geometric concepts/properties such as the minimum bounding region and the area of a region in a CDS are no longer valid in an NDDS. Other indexing methods based on metric spaces such as the M-tree and the Slim-trees are too general to effectively utilize the special characteristics of NDDSs, resulting in nonoptimized performance. In this article, we propose a new dynamic data-partitioning-based indexing technique, called the ND-tree, to support efficient similarity searches in an NDDS. The key idea is to extend the relevant geometric concepts as well as some indexing strategies used in CDSs to NDDSs. Efficient algorithms for ND-tree construction and techniques to solve relevant issues such as handling dimensions with different alphabets in an NDDS are presented. Our experimental results on synthetic data and real genome sequence data demonstrate that the ND-tree outperforms the linear scan, the M-tree and the Slim-trees for similarity searches in multidimensional NDDSs. A theoretical model is also developed to predict the performance of the ND-tree for random data.

31 citations


Cited by
01 Jun 2012
TL;DR: This paper describes SPAdes, a new assembler for both single-cell and standard (multicell) assembly, and demonstrates that it improves on the recently released E+V-SC assembler and on popular assemblers Velvet and SoapDeNovo (for multicell data).
Abstract: The lion's share of bacteria in various environments cannot be cloned in the laboratory and thus cannot be sequenced using existing technologies. A major goal of single-cell genomics is to complement gene-centric metagenomic data with whole-genome assemblies of uncultivated organisms. Assembly of single-cell data is challenging because of highly non-uniform read coverage as well as elevated levels of sequencing errors and chimeric reads. We describe SPAdes, a new assembler for both single-cell and standard (multicell) assembly, and demonstrate that it improves on the recently released E+V-SC assembler (specialized for single-cell data) and on popular assemblers Velvet and SoapDeNovo (for multicell data). SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies. SPAdes is available online ( http://bioinf.spbau.ru/spades ). It is distributed as open source software.

10,124 citations

Proceedings ArticleDOI
24 Oct 2016
TL;DR: A new bug search scheme is proposed which addresses the scalability challenge in existing cross-platform bug search techniques and further improves search accuracy, and implemented a bug search engine, Genius, and compared it with state-of-art bug search approaches.
Abstract: Because of rampant security breaches in IoT devices, searching vulnerabilities in massive IoT ecosystems is more crucial than ever. Recent studies have demonstrated that control-flow graph (CFG) based bug search techniques can be effective and accurate in IoT devices across different architectures. However, these CFG-based bug search approaches are far from being scalable to handle an enormous amount of IoT devices in the wild, due to their expensive graph matching overhead. Inspired by rich experience in image and video search, we propose a new bug search scheme which addresses the scalability challenge in existing cross-platform bug search techniques and further improves search accuracy. Unlike existing techniques that directly conduct searches based upon raw features (CFGs) from the binary code, we convert the CFGs into high-level numeric feature vectors. Compared with the CFG feature, high-level numeric feature vectors are more robust to code variation across different architectures, and can easily achieve realtime search by using state-of-the-art hashing techniques. We have implemented a bug search engine, Genius, and compared it with state-of-art bug search approaches. Experimental results show that Genius outperforms baseline approaches for various query loads in terms of speed and accuracy. We also evaluated Genius on a real-world dataset of 33,045 devices which was collected from public sources and our system. The experiment showed that Genius can finish a search within 1 second on average when performed over 8,126 firmware images of 420,558,702 functions. By only looking at the top 50 candidates in the search result, we found 38 potentially vulnerable firmware images across 5 vendors, and confirmed 23 of them by our manual analysis. We also found that it took only 0.1 seconds on average to finish searching for all 154 vulnerabilities in two latest commercial firmware images from D-LINK. 103 of them are potentially vulnerable in these images, and 16 of them were confirmed.

325 citations