
Showing papers in "Bioinformation in 2007"


Journal ArticleDOI
TL;DR: An R package termed Mfuzz is constructed implementing soft clustering tools for microarray data analysis, which can overcome shortcomings of conventional hard clustering techniques and offer further advantages.
Abstract: For the analysis of microarray data, clustering techniques are frequently used. Most such methods are based on hard clustering of data, wherein one gene (or sample) is assigned to exactly one cluster. Hard clustering, however, suffers from several drawbacks such as sensitivity to noise and information loss. In contrast, soft clustering methods can assign a gene to several clusters. They can overcome shortcomings of conventional hard clustering techniques and offer further advantages. Thus, we constructed an R package termed Mfuzz implementing soft clustering tools for microarray data analysis. The additional package Mfuzzgui provides a convenient TclTk based graphical user interface. Availability: The R packages Mfuzz and Mfuzzgui are available at http://itb1.biologie.hu-berlin.de/~futschik/software/R/Mfuzz/index.html. Their distribution is subject to the GPL version 2 license.

828 citations
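The contrast between hard and soft clustering drawn in the Mfuzz abstract above can be illustrated with a small sketch. It computes fuzzy c-means style memberships of one expression profile for fixed cluster centers. This is not the Mfuzz API: the function names, the toy centers, and the fuzzifier default `m=2.0` are assumptions made for illustration only.

```python
import math


def soft_memberships(point, centers, m=2.0):
    """Fuzzy c-means membership of one profile in each cluster.

    point and centers are equal-length lists of floats; m > 1 is the
    fuzzifier (the default of 2.0 is an illustrative assumption).
    """
    dists = [math.dist(point, c) for c in centers]
    # A profile lying exactly on a center belongs fully to that cluster.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
        for d_i in dists
    ]


def hard_assignment(point, centers):
    """Hard clustering for contrast: index of the single nearest center."""
    dists = [math.dist(point, c) for c in centers]
    return dists.index(min(dists))


# Toy data (hypothetical): two cluster centers and one gene profile.
centers = [[0.0, 0.0], [4.0, 0.0]]
gene = [1.0, 0.0]
memberships = soft_memberships(gene, centers)
nearest = hard_assignment(gene, centers)
```

Hard assignment keeps only the index of the nearest center, while the membership vector records how strongly the profile also relates to the other cluster; the memberships always sum to one.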


Journal ArticleDOI
TL;DR: A simple but effective web application that creates Venn diagrams from two or three gene lists; each gene in the group list has a link to the related information in NCBI's Entrez Nucleotide database.
Abstract: Numerous methods are available to compare results of multiple microarray studies. One of the simplest but most effective of these procedures is to examine the overlap of resulting gene lists in a Venn diagram. Venn diagrams are graphical ways of representing interactions among sets to display information that can be read easily. Here we propose a simple but effective web application creating Venn diagrams from two or three gene lists. Each gene in the group list has a link to the related information in NCBI's Entrez Nucleotide database. Availability: GeneVenn is available for free at http://mcbc.usm.edu/genevenn/

168 citations
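The overlap computation behind a Venn diagram of gene lists, as described in the GeneVenn abstract above, reduces to set operations. The sketch below is not GeneVenn's code; the function name and the example gene lists are hypothetical.

```python
def venn_regions(lists):
    """Group genes by the exact combination of lists they appear in.

    lists maps a list name to an iterable of gene identifiers.
    Returns a dict from a sorted tuple of list names to a set of genes.
    """
    sets = {name: set(genes) for name, genes in lists.items()}
    regions = {}
    for gene in set().union(*sets.values()):
        key = tuple(sorted(name for name, s in sets.items() if gene in s))
        regions.setdefault(key, set()).add(gene)
    return regions


# Hypothetical gene lists from three studies.
regions = venn_regions({
    "A": ["TP53", "BRCA1", "MYC"],
    "B": ["MYC", "EGFR", "TP53"],
    "C": ["MYC", "KRAS"],
})
```

Each key names one region of the diagram, so `regions[("A", "B")]` holds the genes found in lists A and B but not in C.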


Journal ArticleDOI
TL;DR: ACUA (Automated Codon Usage Tool) has been developed to perform high throughput sequence analysis aiding statistical profiling of codon usage, and is capable of on-click sequence retrieval from the results interface, and this feature is unique to ACUA.
Abstract: Currently available codon usage analysis tools lack an intuitive graphical user interface and are limited to inbuilt calculations. ACUA (Automated Codon Usage Tool) has been developed to perform high throughput sequence analysis aiding statistical profiling of codon usage. The results of ACUA are presented in a spreadsheet with all prerequisite codon usage data required for statistical analysis, displayed in a graphical interface. The package is also capable of on-click sequence retrieval from the results interface, and this feature is unique to ACUA. Availability: The package is available for non-commercial purposes and can be downloaded from: http://www.bioinsilico.com/acua.

55 citations
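The basic quantity that codon usage profiling starts from is a table of codon counts per coding sequence. The following sketch shows that counting step only; it is not ACUA's implementation, and the function names and the example sequence are assumptions.

```python
from collections import Counter


def codon_counts(cds):
    """Count in-frame codons of a coding sequence, ignoring a trailing partial codon."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def codon_frequencies(cds):
    """Relative frequency of each codon among all codons in the sequence."""
    counts = codon_counts(cds)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: n / total for codon, n in counts.items()}


# Hypothetical five-codon sequence: ATG GCT GCT GCC TAA.
freqs = codon_frequencies("ATGGCTGCTGCCTAA")
```

Statistics such as relative synonymous codon usage would be derived from these counts by grouping codons by the amino acid they encode.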


Journal ArticleDOI
TL;DR: This paper contains a technical survey of the developments of the FDR-related paradigms, emphasizing precise formulation of the problem, concepts of error measurements, and considerations in applications.
Abstract: The microarray gene expression applications have greatly stimulated the statistical research on the massive multiple hypothesis tests problem. There is now a large body of literature in this area and basically five paradigms of massive multiple tests: control of the false discovery rate (FDR), estimation of FDR, significance threshold criteria, control of family-wise error rate (FWER) or generalized FWER (gFWER), and empirical Bayes approaches. This paper contains a technical survey of the developments of the FDR-related paradigms, emphasizing precise formulation of the problem, concepts of error measurements, and considerations in applications. The goal is not to do an exhaustive literature survey, but rather to review the current state of the field.

42 citations
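As a concrete instance of the "control of the false discovery rate" paradigm listed in the survey abstract above, the sketch below implements the Benjamini-Hochberg step-up procedure. The abstract does not name specific procedures, so the choice of this one is an assumption; the function name and the p-values are hypothetical.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected.

    Step-up rule: find the largest rank k with p_(k) <= (k / m) * alpha
    and reject the k hypotheses with the smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for four genes.
decisions = benjamini_hochberg([0.039, 0.001, 0.205, 0.008])
```

Because the rule looks for the largest qualifying rank, a p-value that misses its own threshold can still be rejected when a larger p-value meets a later threshold.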


Journal ArticleDOI
TL;DR: A web database containing information (name, literature citation, active compounds and a few related full-text articles) on medicinal plants for diabetes that exhibit hypoglycemic, antioxidant and antimicrobial effects is described.
Abstract: Effective treatment of diabetes is increasingly dependent on active constituents of medicinal plants capable of controlling hyperglycemia as well as its secondary complications. Sensing the importance of documenting such medicinal plants, here we describe a web database containing information (name, literature citation, active compounds and a few related full-text articles) on medicinal plants for diabetes that exhibit hypoglycemic, antioxidant and antimicrobial effects. Availability: http://www.autogeneralfilters.com/holycross/Home.html.

38 citations


Journal ArticleDOI
TL;DR: This article focuses on the identification of drug targets in E. histolytica by subjecting the Entamoeba genome to BLAST with the e-value inclusion threshold set to 0.005 and choke point analysis.
Abstract: With the Entamoeba genome essentially complete, the organism can be studied from a whole genome standpoint. The understanding of cellular mechanisms and interactions between cellular components is instrumental to the development of new effective drugs and vaccines. Metabolic pathway analysis is becoming increasingly important for assessing inherent network properties in reconstructed biochemical reaction networks. Metabolic pathways illustrate how proteins work in concert to produce cellular compounds or to transmit information at different levels. Identification of drug targets in E. histolytica through metabolic pathway analysis promises to be a novel approach in this direction. This article focuses on the identification of drug targets by subjecting the Entamoeba genome to BLAST with the e-value inclusion threshold set to 0.005 and choke point analysis. A total of 86.9 percent of proposed drug targets with biological evidence are chokepoint reactions in Entamoeba genome database.

33 citations
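Choke point analysis, mentioned in the Entamoeba abstract above, is commonly defined as finding reactions that are the only consumer of some substrate or the only producer of some product. The sketch below applies that common definition to a toy network; the abstract does not spell out its exact criteria, so the definition, the function name, and the reactions are assumptions.

```python
def chokepoint_reactions(reactions):
    """Return names of reactions that uniquely consume or uniquely produce a metabolite.

    reactions maps a reaction name to a (substrates, products) pair.
    """
    consumers, producers = {}, {}
    for name, (substrates, products) in reactions.items():
        for s in substrates:
            consumers.setdefault(s, set()).add(name)
        for p in products:
            producers.setdefault(p, set()).add(name)
    chokepoints = set()
    for by_metabolite in (consumers, producers):
        for rxns in by_metabolite.values():
            if len(rxns) == 1:
                chokepoints |= rxns
    return chokepoints


# Hypothetical network: R1 and R2 are redundant, R3 is the only route from B to C.
network = {
    "R1": (["A"], ["B"]),
    "R2": (["A"], ["B"]),
    "R3": (["B"], ["C"]),
}
chokepoints = chokepoint_reactions(network)
```

In this toy network only R3 is a choke point: blocking it would leave B with no consumer and C with no producer.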


Journal ArticleDOI
TL;DR: The recently used techniques for comparative genomics and their derived inferences in genome biology are discussed to help assign novel functions for un-annotated genes.
Abstract: The rapidly emerging field of comparative genomics has yielded dramatic results. Comparative genome analysis has become feasible with the availability of a number of completely sequenced genomes. Comparison of complete genomes between organisms allows for global views on genome evolution, and the availability of many completely sequenced genomes increases the predictive power in deciphering the hidden information in genome design, function and evolution. Thus, comparison of human genes with genes from other genomes in a genomic landscape could help assign novel functions for un-annotated genes. Here, we discuss the recently used techniques for comparative genomics and their derived inferences in genome biology.

29 citations


Journal ArticleDOI
TL;DR: The present report deals with the use of the Shannon index for comparing SNP/indel frequencies mined from EST libraries and confirms that SNPs occur in oil palm frequently enough to be used as markers for genetic studies.
Abstract: The oil palm is a tropical oil bearing tree. EST-derived SNPs and SSRs are a free by-product of the currently expanding EST (Expressed Sequence Tag) databases. The development of high-throughput methods for the detection of SNPs (Single Nucleotide Polymorphisms) and small indels (insertions/deletions) has led to a revolution in their use as molecular markers. The 5452 available oil palm EST sequences were mined from dbEST at NCBI. The CAP3 program was used to assemble EST sequences into contigs. Candidate SNPs and indel polymorphisms were detected using the Perl script auto_snip version 1.0, which used 576 ESTs for detecting SNP and indel sites. We found 1180 SNP sites and 137 indel polymorphisms, with a frequency of 1.36 SNPs per 100 bp. Among the six tissues from which the EST libraries had been generated, mesocarp had a high frequency of 2.91 SNPs and indels per 100 bp, whereas the zygotic embryos had the lowest frequency of 0.15 per 100 bp. We also used the Shannon index to analyze the proportion of ten possible types of SNP/indels. ESTs from tissues of normal apex showed the highest value of the Shannon index (0.60), whereas abnormal apex had the lowest value (0.02). The present report deals with the use of the Shannon index for comparing SNP/indel frequencies mined from EST libraries and also confirms that SNPs occur in oil palm frequently enough to be used as markers for genetic studies.

26 citations
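The Shannon index used in the oil palm abstract above summarizes how evenly observations are spread over categories, here the possible SNP/indel types. The sketch below uses the natural logarithm, which is an assumption because the abstract does not state the base; the function name and the counts are hypothetical and are not the study's data.

```python
import math


def shannon_index(counts):
    """Shannon index H = sum of p * ln(1 / p) over categories with nonzero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum((n / total) * math.log(total / n) for n in counts if n > 0)


# Hypothetical counts of polymorphism types in two EST libraries.
even_library = shannon_index([10, 10])      # two equally common types
skewed_library = shannon_index([5, 0, 0])   # a single type only
```

A library dominated by one polymorphism type gives a value near zero, while an even spread over many types gives a larger value.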


Journal ArticleDOI
TL;DR: A simple Fisher-type discriminant method for gene selection in microarray data is proposed; it generates classification results as good as other classification methods and effectively determines relevant genes for classification.
Abstract: One of the applications of discriminant analysis on microarray data is to classify patient and normal samples based on gene expression values. The analysis is especially important in medical trials and diagnosis of cancer subtypes. The main contribution of this paper is to propose a simple Fisher-type discriminant method for gene selection in microarray data. In the new algorithm, we calculate a weight for each gene and use the weight values as an indicator to identify the subsets of relevant genes that categorize patient and normal samples. An ℓ2-ℓ1 norm minimization method is applied to the discriminant process to automatically compute the weights of all genes in the samples. The experiments on two microarray data sets have shown that the new algorithm can generate classification results as good as other classification methods, and effectively determine relevant genes for classification purposes. In this study, we demonstrate the gene selection ability and the computational effectiveness of the proposed algorithm. Experimental results are given to illustrate the usefulness of the proposed model.

22 citations


Journal ArticleDOI
TL;DR: The amino acid residues Arg541 and Trp762 are identified as important for inhibitor recognition via hydrogen bonding interactions, which can be exploited to design Ag85C specific inhibitors.
Abstract: The Ag85 family enzymes are responsible for the synthesis of cell wall components in mycobacterial species. Inhibitors of these enzymes are potential antimycobacterial agents. We have carried out the docking of phosphonate and trehalose analog inhibitors into the three dimensional structure of the mycolyltransferase enzyme Ag85C of M. tuberculosis using the GOLD software. The inhibitor binding positions and affinity were evaluated using both scoring fitness functions, GoldScore and ChemScore. We observed that the inhibitor binding position identified using GoldScore was marginally better than that identified using ChemScore. A qualitative agreement between the reported experimental biological activities (IC50) and the GoldScore was observed. We identified that amino acid residues Arg541 and Trp762 are important for inhibitor recognition via hydrogen bonding interactions. This information can be exploited to design Ag85C specific inhibitors.

21 citations


Journal ArticleDOI
TL;DR: The present paper briefly reviews the databases in preserving the biodiversity data.
Abstract: The massive development of biodiversity related information systems over the WWW (World Wide Web) has created much excitement in recent years. These arrays of new data sources are counterbalanced by the difficulty in knowing their location and nature. However, biologists and computer scientists have started to pull together in a rising tide of coherence and organization to address this issue. The fledgling field of biodiversity informatics is expected to deliver major advances that could turn the WWW into a giant global biodiversity information system. The present paper briefly reviews the databases in preserving the biodiversity data.

Journal ArticleDOI
TL;DR: An example is presented that illustrates the value of the MED-SuMo approach for functional structural annotation in whole-genome sequencing projects.
Abstract: Whole-genome sequencing projects are a major source of proteins of unknown function. However, as predicting protein function from sequence remains a difficult task, research groups recently started to use 3D protein structures and structural models to bypass it. MED-SuMo compares protein surfaces by analyzing the composition and spatial distribution of specific chemical groups (hydrogen bond donor, acceptor, positive, negative, aromatic, hydrophobic, guanidinium, hydroxyl, acyl and glycine). It is able to recognize proteins that have similar binding sites and thus may perform similar functions. We present here a fine example that illustrates the value of the MED-SuMo approach for functional structural annotation.

Journal ArticleDOI
TL;DR: AsthmaPlantBase, a database containing information on medicinal plants for the treatment of asthma, is described; it is of considerable interest to the ethno-botanical community for understanding the plants and the parts used for treatment.
Abstract: Knowledge of most plants used in the treatment of asthma, and of the plant parts that are effective in treatment, is confined to the very few persons who are engaged in folklore medicine. However, this form of medicine is not very popular. Therefore, it is of considerable interest to the ethno-botanical community to understand the plants and the parts used for treatment. Here, we describe AsthmaPlantBase, a database containing information on medicinal plants for the treatment of asthma. Availability: http://www.asthmaplants.com.

Journal ArticleDOI
TL;DR: The results suggest that changes in the response of NK cells to negative or positive modifiers follow progression of AD, and that the response to modulation by cortisol or by IL-2 was significantly greater in patients with AD.
Abstract: Patients with Alzheimer's disease (AD) are characterized by an altered sensitivity to cortisol-mediated modulation of circulating lymphocytes. Longitudinal studies are needed to address the clinical applicability of these abnormalities as prognostic factors. Therefore, we designed a longitudinal study to address the clinical applicability of physiologic modulation of Natural Killer (NK) cell activity as a prognostic factor in AD. NK activity was assessed as baseline measurement and in response to modulation by cortisol at 10⁻⁶ M. To verify the immunophysiological integrity of the NK cell population, we tested augmentation of NK cytotoxicity by human recombinant interleukin (IL)-2 (100 IU/ml) as control. The response to modulation by cortisol or by IL-2 was significantly greater in patients with AD. Based on change in the Mini-Mental State score at entry and at 18 months, patients with AD could be assigned to a “fast progression” (Δ > 2 points) or to a “slow progression” group (Δ ≤ 2 points). The change in the response of NK cytotoxic activity to cortisol, and the strength of the association of this parameter with circulating activated T cells in time was greater in patients with Fast Progression vs. Slow Progression AD. These results suggest that changes in the response of NK cells to negative (e.g., cortisol) or positive modifiers (e.g., IL-2) follow progression of AD.

Journal ArticleDOI
TL;DR: Through microarray and cluster analysis, genome-wide identification of RNA transcripts associated with quinic acid metabolism in N. crassa is described and suggests a connection to other metabolic circuits.
Abstract: The products of five structural genes and two regulatory genes of the qa gene cluster of Neurospora crassa control the metabolism of quinic acid (QA) as a carbon source. A detailed genetic network model of this metabolic process has been reported. This investigation is designed to expand the current model of the QA reaction network. The ensemble method of network identification was used to model RNA profiling data on the qa gene cluster. Through microarray and cluster analysis, genome-wide identification of RNA transcripts associated with quinic acid metabolism in N. crassa is described and suggests a connection to other metabolic circuits. More than 100 genes whose products include carbon metabolism, protein degradation and modification, amino acid metabolism and ribosome synthesis appear to be connected to quinic acid metabolism. The core of the qa gene cluster network is validated with respect to RNA profiling data obtained from microarrays.

Journal ArticleDOI
TL;DR: The Database of Databases (DoD2007), constructed using html and javascript, provides a partial solution to data integration, with a web-based user interface with simple global search, specific database search, keyword help as well as links to abstracts, full-text and database home pages.
Abstract: Molecular biology databases are an integral part of biological research. To date, many databases have been established with varied options to access associated biological data. Depending on the data being annotated, some are architecturally similar while others are specialized. In order to provide a partial solution to data integration, we report Database of Databases (DoD2007), constructed using html and javascript. The database has a web-based user interface with simple global search, specific database search, keyword help as well as links to abstracts, full-text and database home pages. The majority of the data were derived from the Nucleic Acids Research database issue and other published resources. The current release includes 15 categories with updated descriptions and links to 1082 databases, of which 209 are new entries. New databases included in this issue are represented with a ‘+’ sign before the name, and a ‘*’ symbol is provided for those that remained silent. Availability: The database is freely available at http://www.progenebio.in/DoD/index.htm.

Journal ArticleDOI
TL;DR: An iterative local Gaussian clustering (ILGC) method was used to identify clusters of expressed genes in colorectal cancer; it found three clusters (two large and one small), similar to the earlier results of Muro and others, which used Gaussian mixture clustering.
Abstract: Gene expression profiling plays an important role in the identification of biological and clinical properties of human solid tumors such as colorectal carcinoma. Profiling is required to reveal underlying molecular features for diagnostic and therapeutic purposes. A non-parametric density-estimation-based approach called iterative local Gaussian clustering (ILGC), was used to identify clusters of expressed genes. We used experimental data from a previous study by Muro and others consisting of 1,536 genes in 100 colorectal cancer and 11 normal tissues. In this dataset, the ILGC finds three clusters, two large and one small gene clusters, similar to their results which used Gaussian mixture clustering. The correlation of each cluster of genes and clinical properties of malignancy of human colorectal cancer was analysed for the existence of tumor or normal, the existence of distant metastasis and the existence of lymph node metastasis.

Journal ArticleDOI
TL;DR: A novel approach for predicting enzymes and non-enzymes from their amino-acid sequences using an artificial neural network (ANN), which achieves 79 percent correct prediction of enzymes/non-enzymes (in a set of 660 proteins).
Abstract: Predicting enzymes and non-enzymes from protein sequence information is still an open problem in bioinformatics. It is becoming more important as the amount of sequence information grows exponentially over time. We describe a novel approach for predicting enzymes and non-enzymes from their amino-acid sequences using an artificial neural network (ANN). Using 61 sequence derived features alone, we have been able to achieve 79 percent correct prediction of enzymes/non-enzymes (in the set of 660 proteins). For the complete set of 61 parameters using 5-fold cross-validated classification, the ANN model proves to be a superior model (accuracy = 78.79 ± 6.86 percent, Q(pred) = 74.734 ± 17.08 percent, sensitivity = 84.48 ± 6.73 percent, specificity = 77.13 ± 13.39 percent). The second module of the ANN is based on the PSSM matrix. Using the same 5-fold cross-validation set, this ANN model predicts enzymes/non-enzymes with more accuracy (accuracy = 80.37 ± 6.59 percent, Q(pred) = 67.466 ± 12.41 percent, sensitivity = 0.9070 ± 3.37 percent, specificity = 74.66 ± 7.17 percent).
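The accuracy, sensitivity and specificity figures quoted in the enzyme prediction abstract above are standard summaries of a two-class confusion matrix. The sketch below shows how they are computed, treating enzymes as the positive class; the function name and the counts are hypothetical and are not the study's data, and Q(pred) is left out because the abstract does not define it.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity and specificity from confusion-matrix counts.

    tp/fn count true enzymes predicted correctly/incorrectly,
    tn/fp count true non-enzymes predicted correctly/incorrectly.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # share of enzymes recovered
        "specificity": tn / (tn + fp),   # share of non-enzymes recovered
    }


# Hypothetical fold: 50 enzymes and 50 non-enzymes.
metrics = classification_metrics(tp=45, fp=10, tn=40, fn=5)
```

In 5-fold cross-validation these values would be computed once per fold and then reported as a mean with a spread, which is the form the abstract uses.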

Journal ArticleDOI
TL;DR: This intentionally provocative note discusses the issue of sample size in microarray studies from several angles and suggests that the current view of microarrays as no more than a screening tool be changed and small sample studies no longer be considered appropriate.
Abstract: Our answer to the question posed in the title is negative. This intentionally provocative note discusses the issue of sample size in microarray studies from several angles. We suggest that the current view of microarrays as no more than a screening tool be changed and small sample studies no longer be considered appropriate.

Journal ArticleDOI
TL;DR: This work first cleaned and re-structured the PDB data, then analyzed the residue composition of the binding sites in the whole PDB for frequency and for hidden association rules, and found numerous significant relations of the residue-composition of the ligand binding sites on protein surfaces.
Abstract: The Protein Data Bank (PDB) contains the description of more than 45,000 three-dimensional protein and nucleic-acid structures today. Because the PDB began as a computer-readable depository of crystallographic data complementing printed articles, the proper interpretation of the content of the individual files in the PDB still frequently needs the detailed information found in the citing publication. This fact implies that the fully automatic processing of the whole PDB is a very hard task. We first cleaned and re-structured the PDB data, then analyzed the residue composition of the binding sites in the whole PDB for frequency and for hidden association rules. Main results of the paper: (i) the cleaning and repairing algorithm, (ii) redundancy elimination from the data, (iii) application of association rule mining to the cleaned non-redundant data set. We have found numerous significant relations in the residue composition of the ligand binding sites on protein surfaces, summarized in two figures. Association-rule mining, one of the classical data-mining methods for exploring implication rules, is capable of finding previously unknown residue-set preferences of bound ligands on protein surfaces. Since protein-ligand binding is a key step in enzymatic mechanisms and in drug discovery, these preferences, uncovered in a study of more than 19,500 binding sites, may help in identifying new binding protein-ligand pairs.
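Association rule mining, which the PDB abstract above applies to binding-site residue composition, rests on two quantities: the support of a residue set and the confidence of a rule between sets. The sketch below computes both by treating each binding site as a set of residue types. It is not the authors' pipeline; the function names and the four toy binding sites are hypothetical.

```python
def support(itemset, transactions):
    """Fraction of transactions (binding sites) that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated probability that a site containing antecedent also contains consequent."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Hypothetical binding sites, each given as the set of residue types it contains.
sites = [
    {"HIS", "ASP", "SER"},
    {"HIS", "ASP"},
    {"HIS", "GLY"},
    {"ARG", "GLY"},
]
his_asp_support = support({"HIS", "ASP"}, sites)
his_implies_asp = confidence({"HIS"}, {"ASP"}, sites)
```

A rule such as "HIS implies ASP" is typically reported only when both its support and its confidence exceed chosen thresholds.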

Journal ArticleDOI
TL;DR: Simulation studies reveal that the resulting bootstrap-based methodology for gene selection maintains the false positive rate at the nominal level while competing well with ORIOGEN in terms of power.
Abstract: This article extends the order restricted inference approach for time-course or dose-response gene expression microarray data, introduced by Peddada and colleagues (2003), to the case when gene expression is heteroscedastic over time or dose. The new methodology uses an iterative algorithm to estimate mean expression at various times/doses when mean expression is subject to pre-defined patterns or profiles, known as order-restrictions. Simulation studies reveal that the resulting bootstrap-based methodology for gene selection maintains the false positive rate at the nominal level while competing well with ORIOGEN in terms of power. The proposed methodology is illustrated using breast cancer cell-line data analyzed by Peddada and colleagues (2003).
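To make the idea of estimating means under an order restriction concrete, the sketch below fits a non-decreasing profile with the weighted pool-adjacent-violators algorithm, a standard technique for the simple increasing order. It is not the iterative algorithm of the article, which the abstract does not describe; the function name, the data, and the use of weights to reflect unequal variances are assumptions.

```python
def weighted_isotonic_means(means, weights):
    """Non-decreasing fit minimizing the weighted squared error (pool adjacent violators).

    means are per-time-point (or per-dose) sample means; weights could be
    sample size divided by variance when variances differ across points.
    """
    blocks = []  # each block: [pooled mean, total weight, number of points]
    for mean, weight in zip(means, weights):
        blocks.append([mean, weight, 1])
        # Merge backwards while the order restriction is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            w = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / w, w, n1 + n2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


# Hypothetical mean expression at four doses; the second dose is measured more precisely.
equal = weighted_isotonic_means([1.0, 3.0, 2.0, 4.0], [1.0, 1.0, 1.0, 1.0])
unequal = weighted_isotonic_means([1.0, 3.0, 2.0, 4.0], [1.0, 3.0, 1.0, 1.0])
```

With unequal weights the pooled value moves toward the more precisely measured point, which is why heteroscedasticity changes the restricted estimates.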

Journal ArticleDOI
TL;DR: Virtual learning's importance in developing and delivering an educational system in Bioinformatics based on e-learning environment is discussed.
Abstract: In recent years, virtual learning has been growing rapidly. Universities, colleges, and secondary schools are now delivering training and education over the internet. Besides this, the resources available over the WWW are huge, and understanding the various techniques employed in the field of Bioinformatics is increasingly complex for students during implementation. Here, we discuss the importance of virtual learning in developing and delivering an educational system in Bioinformatics based on an e-learning environment.

Journal ArticleDOI
TL;DR: Observations support the hypothesis that Al significantly impairs certain cellular immune responses, and confirm that Al-mediated cell toxicity may play an important role in AD.
Abstract: Aluminium (Al) has been investigated as a neurotoxic substance. Al ranks among the potential environmental risk factors for Alzheimer's disease (AD). Epidemiological studies tested the relationship between Al in drinking water and AD, showing a significant correlation between elevated levels of monomeric Al in water and AD, although data to date remain inconclusive with respect to total Al. The aim of this study was to test whether or not Al exacerbates cellular toxicity mediated by the amyloid β (Aβ) peptide. We evaluated the role of Al in modulating programmed cell death (apoptosis) in human cell cultures. We used the osteosarcoma cell line monolayer (SaOs-2) to demonstrate that treatment of SaOs-2 cultures with the Aβ peptide mid-fragment (25 to 35) at nanomolar concentrations, followed by co-incubation with physiological concentrations of aluminium chloride, which releases monomeric Al in solution, led to marked expression of caspase 3, but not caspase 9, key markers of the apoptotic process. The same experimental conditions were shown to blunt significantly the proliferative response of normal human peripheral blood mononuclear cells (PBMC) to phytohemagglutinin (PHA) stimulation. Our observations support the hypothesis that Al significantly impairs certain cellular immune responses, and confirm that Al-mediated cell toxicity may play an important role in AD.

Journal ArticleDOI
TL;DR: It is concluded that an integrative analysis of global gene-expression of the developing embryo can form the foundation for constructing a reference library of signaling pathways and networks for normal and abnormal regulation of the embryonic transcriptome.
Abstract: Monitoring global gene expression provides insight into how genes and regulatory signals work together to guide embryo development. The fields of developmental biology and teratology are now confronted with the need for automated access to a reference library of gene-expression signatures that benchmark programmed (genetic) and adaptive (environmental) regulation of the embryonic transcriptome. Such a library must be constructed from highly-distributed microarray data. Birth Defects Systems Manager (BDSM), an open access knowledge management system, provides custom software to mine public microarray data focused on developmental health and disease. The present study describes tools for seamless data integration in the BDSM library (MetaSample, MetaChip, CIAeasy) using the QueryBDSM module. A field test of the prototype was run using published microarray data series derived from a variety of laboratories, experiments, microarray platforms, organ systems, and developmental stages. The datasets focused on several developing systems in the mouse embryo, including preimplantation stages, heart and nerve development, testis and ovary development, and craniofacial development. Using BDSM data integration tools, a gene-expression signature for 346 genes was resolved that accurately classified samples by organ system and developmental sequence. The module builds a potential for the BDSM approach to decipher a large number of developmental processes through comparative bioinformatics analysis of embryological systems at risk for specific defects, using multiple scenarios to define the range of probabilities leading from molecular phenotype to clinical phenotype. We conclude that an integrative analysis of global gene-expression of the developing embryo can form the foundation for constructing a reference library of signaling pathways and networks for normal and abnormal regulation of the embryonic transcriptome.
These tools are available free of charge from the web-site http://systemsanalysis.louisville.edu requiring only a short registration process.

Journal ArticleDOI
TL;DR: A novel text mining approach based on keywords identified at the level of gene annotation sentences (in particular sentences characterizing biological function) instead of entire abstracts to improve keyword extraction for annotational clustering and other types of analyses.
Abstract: Gene function annotation remains a key challenge in modern biology. This is especially true for high-throughput techniques such as gene expression experiments. Vital information about genes is available electronically from biomedical literature in the form of full texts and abstracts. In addition, various publicly available databases (such as GenBank, Gene Ontology and Entrez) provide access to gene-related information at different levels of biological organization, granularity and data format. This information is being used to assess and interpret the results from high-throughput experiments. To improve keyword extraction for annotational clustering and other types of analyses, we have developed a novel text mining approach, which is based on keywords identified at the level of gene annotation sentences (in particular sentences characterizing biological function) instead of entire abstracts. Further, to improve the expressiveness and usefulness of gene annotation terms, we investigated the combination of sentence-level keywords with terms from the Medical Subject Headings (MeSH) and Gene Ontology (GO) resources. We find that sentence-level keywords combined with MeSH terms outperform the typical ‘baseline’ set-up (term frequencies at the level of abstracts) by a significant margin, whereas the addition of GO terms improves matters only marginally. We validated our approach on the basis of a manually annotated corpus of 200 abstracts generated on the basis of 2 cancer categories and 10 genes per category. We applied the method in the context of three sets of differentially expressed genes obtained from pediatric brain tumor samples. This analysis suggests novel interpretations of discovered gene expression patterns.

Journal ArticleDOI
TL;DR: The aim of this database is to provide the researcher with a quick overview of potential links between genes and proteins and related neurodegenerative diseases; DND, which provides a user-friendly interface, is designed as a source to enhance research on neurodegenerative disorders.
Abstract: A neurological disorder is a disorder caused by the deterioration of certain nerve cells called neurons. Changes in these cells cause them to function abnormally, eventually bringing about their death. In this paper we present a comprehensive database for neurodegenerative diseases, the first of its kind, covering all known or suspected genes, proteins and pathways related to neurodegenerative diseases. This dynamically compiled database allows researchers to link neurological disorders to the candidate genes and proteins. It serves as a tool to navigate potential gene-protein-pathway relationships in the context of neurodegenerative diseases. The neurodegenerative disorder database covers more than 100 disease concepts including synonyms and research topics. The current version of the database provides links to 728 abstracts and over 203 unique genes/proteins with 137 drugs. It is also well integrated with other related databases. The aim of this database is to provide the researcher with a quick overview of potential links between genes and proteins and related neurodegenerative diseases. Thus DND, which provides a user-friendly interface, is designed as a source to enhance research on neurodegenerative disorders. Availability: http://www.bioinfosastra.com/services/dnd/dnd.html.

Journal ArticleDOI
TL;DR: The availability of an Integrated Web-server as a bioinformatics online package dedicated to in-silico analysis of protein sequence and structure data (IWS) is reported, which provides a web interface to both in-house and widely accepted programs from major bioinformatics groups.
Abstract: The rapid increase in protein sequence information from genome sequencing projects demands the use of bioinformatics tools to recognize interesting gene products and their associated functions. Often, multiple algorithms need to be employed to improve prediction accuracy, and several structure prediction algorithms are in the public domain. Here, we report the availability of an Integrated Web-server as a bioinformatics online package dedicated to in-silico analysis of protein sequence and structure data (IWS). IWS provides a web interface to both in-house and widely accepted programs from major bioinformatics groups, organized as 10 different modules. IWS also provides interactive images of the analysis workflow, which give the user transparency, allowing them to move across modules seamlessly and to perform their predictions rapidly. Availability: IWS is available from the URL: http://caps.ncbs.res.in/iws.

Journal ArticleDOI
TL;DR: The present paper makes a strong case for utilizing existing biological information in the clustering process by marrying existing biological knowledge with experimental data to create an overall dissimilarity that can be used with any clustering algorithm that accepts a general dissimilarity matrix.
Abstract: In this paper we propose a data-based algorithm to marry existing biological knowledge (e.g., functional annotations of genes) with experimental data (gene expression profiles) in creating an overall dissimilarity that can be used with any clustering algorithm that accepts a general dissimilarity matrix. We explore this idea with two publicly available gene expression data sets and functional annotations, and compare the results with clustering results that use only the experimental data. Although more elaborate evaluations might be called for, the present paper makes a strong case for utilizing existing biological information in the clustering process. Availability: Supplement is available at www.somnathdatta.org/Supp/Bioinformation/appendix.pdf.
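
One simple way to picture an overall dissimilarity of this kind is a weighted average of an expression-based distance and an annotation-based distance. The Python sketch below is only an illustration under stated assumptions: the correlation-based expression distance, the Jaccard distance on annotation sets, and the fixed `weight` parameter are choices made here, whereas the abstract describes a data-based algorithm whose details are not given.

```python
from math import sqrt


def correlation_dissimilarity(x, y):
    """Return (1 - Pearson correlation) / 2, so the value lies in [0, 1]."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.5  # constant profile: correlation undefined, treat as uncorrelated
    return (1.0 - cov / (sd_x * sd_y)) / 2.0


def jaccard_dissimilarity(a, b):
    """Return 1 - |a & b| / |a | b| for two sets of annotation terms."""
    union = a | b
    if not union:
        return 1.0  # no annotations at all: assume no shared knowledge
    return 1.0 - len(a & b) / len(union)


def combined_dissimilarity(profiles, annotations, weight=0.5):
    """Build a symmetric matrix mixing expression and annotation dissimilarity.

    `weight` (an assumed, fixed parameter) is the share given to the
    expression-based term; the remainder goes to the annotation-based term.
    """
    genes = sorted(profiles)
    n = len(genes)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            gi, gj = genes[i], genes[j]
            d_expr = correlation_dissimilarity(profiles[gi], profiles[gj])
            d_annot = jaccard_dissimilarity(annotations[gi], annotations[gj])
            matrix[i][j] = matrix[j][i] = weight * d_expr + (1.0 - weight) * d_annot
    return genes, matrix
```

The returned matrix can be passed to any clustering routine that accepts precomputed dissimilarities; setting `weight=1.0` recovers clustering on the experimental data alone, which is the comparison the abstract describes.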

Journal ArticleDOI
TL;DR: Computational prediction methodology was employed to define putative gene-gene and gene-environment interactions in vSMCs subjected to oxidative chemical stress; established biological relationships were derived computationally, confirming the usefulness of the algorithm in uncovering novel biological relationships worthy of future investigation.
Abstract: To understand the complex nature of the atherogenic response initiated by oxidative stress in vascular smooth muscle cells (vSMCs), computational prediction methodology was employed to define putative gene-gene and gene-environment interactions in vSMCs subjected to oxidative chemical stress. Computational relationships were derived from the global gene expression profiles of murine cells challenged with a chemical pro-oxidant to cause oxidative stress or cells treated with anti-oxidant prior to oxidative injury. Target clones were chosen based on their biological relevance within the context of the atherogenic response and included lysyl oxidase, matrix metalloproteinase 2, insulin-like growth factor binding protein 5, and lymphocyte antigen 6c. Established biological relationships were derived computationally, confirming the usefulness of the algorithm in uncovering novel biological relationships worthy of future investigation. Thus, the predictive algorithm can be a useful tool to advance the frontiers of biological discovery.

Journal ArticleDOI
TL;DR: In this article, it was shown that large proteins, which contain more than a single domain, have less variable isoelectric points than small proteins, which contain only one domain.
Abstract: Although the distribution of protein isoelectric points is multi-modal, large proteins show less variable isoelectric points than small proteins, and their isoelectric points tend to converge to a unique value, close to the pH of the milieu in which the proteins are functional, as protein size increases. This study demonstrates that large proteins, which contain more than a single domain, do have less variable isoelectric points than small proteins, which contain a single domain. However, the distribution of the isoelectric points of the single domains contained in large proteins resembles that of small proteins, which contain a single domain. Thus, large proteins can be soluble even if their pI is very close to the pH of the milieu in which they perform their function, since they can contain several domains, the electrostatic properties of each of which mirror those of small proteins.