Author

Guang R. Gao

Other affiliations: University of Alberta, University of Delaware, IBM
Bio: Guang R. Gao is an academic researcher from the Chinese Academy of Sciences. The author has contributed to research in topics: Compiler & Software pipelining. The author has an h-index of 50 and has co-authored 416 publications receiving 7775 citations. Previous affiliations of Guang R. Gao include University of Alberta & University of Delaware.


Papers
Journal ArticleDOI
TL;DR: Tandem Repeat Occurrence Locator (TROLL) is a lightweight Simple Sequence Repeat (SSR) finder based on a slight modification of the Aho-Corasick algorithm that is fast and only requires a standard personal computer to operate.
Abstract: Summary: Tandem Repeat Occurrence Locator (TROLL) is a lightweight Simple Sequence Repeat (SSR) finder based on a slight modification of the Aho-Corasick algorithm. It is fast and only requires a standard personal computer (PC) to operate. We report running times of 127 s to find all SSRs of length 20 bp or more on the complete Arabidopsis genome (approx. 130 Mbases divided into five chromosomes) using a PC with a 650 MHz Athlon processor and 256 MB of RAM. Availability: TROLL is an open source project and is available at http://finder.sourceforge.net.

209 citations
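
To make concrete what an SSR finder of the kind described in the entry above reports, here is a minimal sketch in Python. It is not TROLL's method: the abstract describes a modified Aho-Corasick automaton, whereas this sketch uses a naive period-extension scan. The function name `find_ssrs` and the `max_motif` parameter are hypothetical; only the 20 bp minimum length comes from the abstract.

```python
def find_ssrs(seq, min_len=20, max_motif=6):
    """Naive SSR scan (illustrative only, not the Aho-Corasick variant).

    Returns a list of (start, end, motif) tuples, with end exclusive, for
    tandem repeats whose motif length is at most max_motif and whose total
    length is at least min_len.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    i = 0
    while i < n:
        best = None  # (repeat length, motif length) of the best run at i
        for k in range(1, max_motif + 1):
            # Extend the run while each base equals the base k positions back.
            j = i + k
            while j < n and seq[j] == seq[j - k]:
                j += 1
            length = j - i
            # Require at least two motif copies and the minimum total length.
            # Ties keep the smallest k, i.e. the shortest motif.
            if length >= min_len and length >= 2 * k:
                if best is None or length > best[0]:
                    best = (length, k)
        if best is not None:
            length, k = best
            hits.append((i, i + length, seq[i:i + k]))
            i += length  # skip past the reported repeat
        else:
            i += 1
    return hits


if __name__ == "__main__":
    # Twelve copies of "AT" (24 bp) flanked by non-repetitive bases.
    print(find_ssrs("GGC" + "AT" * 12 + "CCG"))
```

The scan costs roughly O(n · max_motif) in the worst case, which is acceptable for a sketch; an automaton-based approach such as the one the abstract names would match all motifs in a single pass.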

Journal ArticleDOI
TL;DR: Application of TMMOD to a collection of complete genomes shows that the number of predicted membrane proteins accounts for approximately 20-30% of all genes in those genomes, and that the topology where both the N- and C-termini are in the cytoplasm is dominant in these organisms except for Caenorhabditis elegans.
Abstract: Motivation: Knowledge of the transmembrane helical topology can help identify binding sites and infer functions for membrane proteins. However, because membrane proteins are hard to solubilize and purify, only a very small number of membrane proteins have structure and topology experimentally determined. This has motivated various computational methods for predicting the topology of membrane proteins. Results: We present an improved hidden Markov model, TMMOD, for the identification and topology prediction of transmembrane proteins. Our model uses TMHMM as a prototype, but differs from TMHMM by the architecture of the submodels for loops on both sides of the membrane and also by the model training procedure. In cross-validation experiments using a set of 83 transmembrane proteins with known topology, TMMOD outperformed TMHMM and other existing methods, with an accuracy of 89% for both topology and locations. In another experiment using a separate set of 160 transmembrane proteins, TMMOD had an accuracy of 84% for topology and 89% for locations. When used to distinguish transmembrane proteins from non-transmembrane proteins, particularly signal peptides, TMMOD has consistently fewer false positives than TMHMM does. Application of TMMOD to a collection of complete genomes shows that the number of predicted membrane proteins accounts for ∼20-30% of all genes in those genomes, and that the topology where both the N- and C-termini are in the cytoplasm is dominant in these organisms except for Caenorhabditis elegans. Availability: http://liao.cis.udel.edu/website/servers/TMMOD/ Contact: lliao@cis.udel.edu

180 citations

Journal ArticleDOI
TL;DR: Although cyanobacterial blooms persisted during the monitoring period, the frequency and intensity of cyanobacterial bloom-induced black water agglomerates decreased, and there have been no further drinking water crises.

162 citations

Proceedings ArticleDOI
19 Apr 2010
TL;DR: Experimental results show that the proposed task-based dynamic load-balancing solution can utilize the hardware more efficiently than the CUDA scheduler for unbalanced workload, and achieves near-linear speedup, load balance, and significant performance improvement over techniques based on standard CUDA APIs.
Abstract: The computational power provided by many-core graphics processing units (GPUs) has been exploited in many applications. The programming techniques currently employed on these GPUs are not sufficient to address problems exhibiting irregular and unbalanced workloads. The problem is exacerbated when trying to effectively exploit multiple GPUs concurrently, which are commonly available in many modern systems. In this paper, we propose a task-based dynamic load-balancing solution for single- and multi-GPU systems. The solution allows load balancing at a finer granularity than what is supported in current GPU programming APIs, such as NVIDIA's CUDA. We evaluate our approach using both micro-benchmarks and a molecular dynamics application that exhibits significant load imbalance. Experimental results with a single-GPU configuration show that our fine-grained task solution can utilize the hardware more efficiently than the CUDA scheduler for unbalanced workloads. On multi-GPU systems, our solution achieves near-linear speedup, load balance, and significant performance improvement over techniques based on standard CUDA APIs.

147 citations

Proceedings ArticleDOI
11 Nov 2007
TL;DR: The implementations of the Smith-Waterman algorithm for both DNA and protein sequences on the XD1000 platform are presented and a multistage PE (processing element) design is brought forward which significantly reduces the FPGA resource usage and hence allows more parallelism to be exploited.
Abstract: An innovative reconfigurable supercomputing platform, the XD1000, was developed by XtremeData Inc. to exploit the rapid progress of FPGA technology and the high performance of the HyperTransport interconnect. In this paper, we present the implementations of the Smith-Waterman algorithm for both DNA and protein sequences on the platform. The main features include: (1) we bring forward a multistage PE (processing element) design which significantly reduces the FPGA resource usage and hence allows more parallelism to be exploited; (2) our design features a pipelined control mechanism with uneven stage latencies, a key to minimizing the overall PE pipeline cycle time; (3) we also put forward a compressed substitution matrix storage structure, resulting in a substantial decrease in on-chip SRAM usage. Finally, we implement a 384-PE systolic array running at 66.7 MHz, which can achieve 25.6 GCUPS peak performance. Compared with the 2.2 GHz AMD Opteron host processor, the FPGA coprocessor achieves speedups of 185X and 250X, respectively.

144 citations
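
The peak figure quoted in the entry above can be checked with a short sketch. It assumes, as is usual for systolic-array designs but not stated in the abstract, that each PE updates one dynamic-programming cell per clock cycle, so peak GCUPS (giga cell updates per second) is the PE count times the clock rate. The scoring function is a plain software Smith-Waterman with a linear gap penalty, included only to show what a "cell update" is; the scoring parameters and both function names are hypothetical and do not reflect the paper's hardware design.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # One "cell update": the max over restart, diagonal, up, and left.
            curr[j] = max(0, prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def peak_gcups(num_pes, clock_hz):
    """Peak giga cell updates per second, assuming one cell per PE per cycle."""
    return num_pes * clock_hz / 1e9


if __name__ == "__main__":
    print(smith_waterman_score("ACGT", "ACGT"))
    # 384 PEs at 66.7 MHz, the configuration quoted in the abstract.
    print(round(peak_gcups(384, 66.7e6), 1))
```

Under that assumption, 384 PEs at 66.7 MHz give 384 × 66.7 × 10^6 ≈ 25.6 × 10^9 cell updates per second, which agrees with the 25.6 GCUPS stated in the abstract.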


Cited by
Journal ArticleDOI

08 Dec 2001-BMJ
TL;DR: There is, I think, something ethereal about i, the square root of minus one, which seemed an odd beast at that time: an intruder hovering on the edge of reality.
Abstract: There is, I think, something ethereal about i —the square root of minus one. I remember first hearing about it at school. It seemed an odd beast at that time—an intruder hovering on the edge of reality. Usually familiarity dulls this sense of the bizarre, but in the case of i it was the reverse: over the years the sense of its surreal nature intensified. It seemed that it was impossible to write mathematics that described the real world in …

33,785 citations

Journal ArticleDOI
TL;DR: This unit describes how to calculate comparative models using the program MODELLER and discusses all four steps of comparative modeling, frequently observed errors, and some applications.
Abstract: Functional characterization of a protein sequence is a common goal in biology, and is usually facilitated by having an accurate three-dimensional (3-D) structure of the studied protein. In the absence of an experimentally determined structure, comparative or homology modeling can sometimes provide a useful 3-D model for a protein that is related to at least one known protein structure. Comparative modeling predicts the 3-D structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. This unit describes how to calculate comparative models using the program MODELLER and discusses all four steps of comparative modeling, frequently observed errors, and some applications. Modeling lactate dehydrogenase from Trichomonas vaginalis (TvLDH) is described as an example. The download and installation of the MODELLER software is also described.

3,495 citations

Journal ArticleDOI
TL;DR: This unit describes how to calculate comparative models using the program MODELLER and discusses all four steps of comparative modeling, frequently observed errors, and some applications.
Abstract: Functional characterization of a protein sequence is one of the most frequent problems in biology. This task is usually facilitated by accurate three-dimensional (3-D) structure of the studied protein. In the absence of an experimentally determined structure, comparative or homology modeling can sometimes provide a useful 3-D model for a protein that is related to at least one known protein structure. Comparative modeling predicts the 3-D structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. This unit describes how to calculate comparative models using the program MODELLER and discusses all four steps of comparative modeling, frequently observed errors, and some applications. Modeling lactate dehydrogenase from Trichomonas vaginalis (TvLDH) is described as an example. The download and installation of the MODELLER software is also described.

3,006 citations

Journal ArticleDOI
TL;DR: Repbase Update is a comprehensive database of repetitive elements from diverse eukaryotic organisms that contains over 3600 annotated sequences representing different families and subfamilies of repeats, many of which are unreported anywhere else.
Abstract: Repbase Update is a comprehensive database of repetitive elements from diverse eukaryotic organisms. Currently, it contains over 3600 annotated sequences representing different families and subfamilies of repeats, many of which are unreported anywhere else. Each sequence is accompanied by a short description and references to the original contributors. Repbase Update includes Repbase Reports, an electronic journal publishing newly discovered transposable elements, and the Transposon Pub, a web-based browser of selected chromosomal maps of transposable elements. Sequences from Repbase Update are used to screen and annotate repetitive elements using programs such as Censor and RepeatMasker. Repbase Update is available on the worldwide web at http://www.girinst.org/Repbase_Update.html.

2,921 citations