There is, I think, something ethereal about i —the square root of minus one. I remember first hearing about it at school. It seemed an odd beast at that time—an intruder hovering on the edge of reality.

Usually familiarity dulls this sense of the bizarre, but in the case of i it was the reverse: over the years the sense of its surreal nature intensified. It seemed that it was impossible to write mathematics that described the real world in …

I and i

Functional characterization of a protein sequence is a common goal in biology, and is usually facilitated by having an accurate three-dimensional (3-D) structure of the studied protein. In the absence of an experimentally determined structure, comparative or homology modeling can sometimes provide a useful 3-D model for a protein that is related to at least one known protein structure. Comparative modeling predicts the 3-D structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. This unit describes how to calculate comparative models using the program MODELLER and discusses all four steps of comparative modeling, frequently observed errors, and some applications. Modeling lactate dehydrogenase from Trichomonas vaginalis (TvLDH) is described as an example. The download and installation of the MODELLER software is also described.

/pdf/comparative-protein-structure-modeling-using-modeller-262v4shee8.pdf

Comparative Protein Structure Modeling Using MODELLER

Functional characterization of a protein sequence is one of the most frequent problems in biology. This task is usually facilitated by accurate three-dimensional (3-D) structure of the studied protein. In the absence of an experimentally determined structure, comparative or homology modeling can sometimes provide a useful 3-D model for a protein that is related to at least one known protein structure. Comparative modeling predicts the 3-D structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. This unit describes how to calculate comparative models using the program MODELLER and discusses all four steps of comparative modeling, frequently observed errors, and some applications. Modeling lactate dehydrogenase from Trichomonas vaginalis (TvLDH) is described as an example. The download and installation of the MODELLER software is also described.

/pdf/comparative-protein-structure-modeling-using-modeller-1ye86wsiv7.pdf

Comparative protein structure modeling using Modeller.

Repbase Update is a comprehensive database of repetitive elements from diverse eukaryotic organisms. Currently, it contains over 3600 annotated sequences representing different families and subfamilies of repeats, many of which are unreported anywhere else. Each sequence is accompanied by a short description and references to the original contributors. Repbase Update includes Repbase Reports, an electronic journal publishing newly discovered transposable elements, and the Transposon Pub, a web-based browser of selected chromosomal maps of transposable elements. Sequences from Repbase Update are used to screen and annotate repetitive elements using programs such as Censor and RepeatMasker. Repbase Update is available on the worldwide web at http://www.girinst.org/Repbase_Update.html.

Repbase Update, a database of eukaryotic repetitive elements

Matrix Factorization Techniques for Recommender Systems

Summary: Tandem Repeat Occurrence Locator (TROLL), is a light-weight Simple Sequence Repeat (SSR) finder based on a slight modification of the Aho‐Corasick algorithm. It is fast and only requires a standard Personal Computer (PC) to operate. We report running times of 127 s to find all SSRs of length 20 bp or more on the complete Arabdopsis genome—approx. 130 Mbases divided in five chromosomes—using a PC Athlon 650 MHz with 256 MB of RAM. Availability: TROLL is an open source project and is available at http://finder.sourceforge.net.

TROLL--tandem repeat occurrence locator.

Motivation: Knowledge of the transmembrane helical topology can help identify binding sites and infer functions for membrane proteins. However, because membrane proteins are hard to solubilize and purify, only a very small amount of membrane proteins have structure and topology experimentally determined. This has motivated various computational methods for predicting the topology of membrane proteins.

Results: We present an improved hidden Markov model, TMMOD, for the identification and topology prediction of transmembrane proteins. Our model uses TMHMM as a prototype, but differs from TMHMM by the architecture of the submodels for loops on both sides of the membrane and also by the model training procedure. In cross-validation experiments using a set of 83 transmembrane proteins with known topology, TMMOD outperformed TMHMM and other existing methods, with an accuracy of 89% for both topology and locations. In another experiment using a separate set of 160 transmembrane proteins, TMMOD had 84% for topology and 89% for locations. When utilized for identifying transmembrane proteins from non-transmembrane proteins, particularly signal peptides, TMMOD has consistently fewer false positives than TMHMM does. Application of TMMOD to a collection of complete genomes shows that the number of predicted membrane proteins accounts for ∼20--30% of all genes in those genomes, and that the topology where both the N- and C-termini are in the cytoplasm is dominant in these organisms except for Caenorhabditis elegans.

Availability: http://liao.cis.udel.edu/website/servers/TMMOD/

Contact: lliao@cis.udel.edu

/pdf/an-improved-hidden-markov-model-for-transmembrane-protein-1h9l5tw3ug.pdf

An improved hidden Markov model for transmembrane protein detection and topology prediction and its applications to complete genomes

Cyanobacterial bloom management through integrated monitoring and forecasting in large shallow eutrophic Lake Taihu (China)

The computational power provided by many-core graphics processing units (GPUs) has been exploited in many applications. The programming techniques currently employed on these GPUs are not sufficient to address problems exhibiting irregular, and unbalanced workload. The problem is exacerbated when trying to effectively exploit multiple GPUs concurrently, which are commonly available in many modern systems. In this paper, we propose a task-based dynamic load-balancing solution for single-and multi-GPU systems. The solution allows load balancing at a finer granularity than what is supported in current GPU programming APIs, such as NVIDIA's CUDA. We evaluate our approach using both micro-benchmarks and a molecular dynamics application that exhibits significant load imbalance. Experimental results with a single-GPU configuration show that our fine-grained task solution can utilize the hardware more efficiently than the CUDA scheduler for unbalanced workload. On multi-GPU systems, our solution achieves near-linear speedup, load balance, and significant performance improvement over techniques based on standard CUDA APIs.

/pdf/dynamic-load-balancing-on-single-and-multi-gpu-systems-3gizpqkhec.pdf

Dynamic load balancing on single- and multi-GPU systems

An innovative reconfigurable supercomputing platform -- XD1000 is developed by XtremeData Inc. to exploit the rapid progress of FPGA technology and the high-performance of Hyper-Transport interconnection. In this paper, we present the implementations of the Smith-Waterman algorithm for both DNA and protein sequences on the platform. The main features include: (1) we bring forward a multistage PE (processing element) design which significantly reduces the FPGA resource usage and hence allows more parallelism to be exploited; (2) our design features a pipelined control mechanism with uneven stage latencies -- a key to minimize the overall PE pipeline cycle time; (3) we also put forward a compressed substitution matrix storage structure, resulting in substantial decrease of the on-chip SRAM usage. Finally, we implement a 384-PE systolic array running at 66.7MHz, which can achieve 25.6GCUPS peak performance. Compared with the 2.2GHz AMD Opteron host processor, the FPGA coprocessor speedups 185X and 250X respectively.

/pdf/implementation-of-the-smith-waterman-algorithm-on-a-1ihysv2imn.pdf

Guang R. Gao

Papers

TROLL--tandem repeat occurrence locator.

An improved hidden Markov model for transmembrane protein detection and topology prediction and its applications to complete genomes

Cyanobacterial bloom management through integrated monitoring and forecasting in large shallow eutrophic Lake Taihu (China)

Dynamic load balancing on single- and multi-GPU systems

Implementation of the Smith-Waterman algorithm on a reconfigurable supercomputing platform