Showing papers in "Genome Biology in 2003"


Journal ArticleDOI
TL;DR: DAVID is a web-accessible program that integrates functional genomic annotations with intuitive graphical summaries and assists in the interpretation of genome-scale datasets by facilitating the transition from data collection to biological meaning.
Abstract: The distributed nature of biological knowledge poses a major challenge to the interpretation of genome-scale datasets, including those derived from microarray and proteomic studies. This report describes DAVID, a web-accessible program that integrates functional genomic annotations with intuitive graphical summaries. Lists of gene or protein identifiers are rapidly annotated and summarized according to shared categorical data for Gene Ontology, protein domain, and biochemical pathway membership. DAVID assists in the interpretation of genome-scale datasets by facilitating the transition from data collection to biological meaning.

8,849 citations


Journal ArticleDOI
TL;DR: The results reaffirm the thesis that miRNAs have an important role in establishing the complex spatial and temporal patterns of gene activity necessary for the orderly progression of development and suggest additional roles in the function of the mature organism.
Abstract: Background: The recent discoveries of microRNA (miRNA) genes and characterization of the first few target genes regulated by miRNAs in Caenorhabditis elegans and Drosophila melanogaster have set the stage for elucidation of a novel network of regulatory control. We present a computational method for whole-genome prediction of miRNA target genes. The method is validated using known examples. For each miRNA, target genes are selected on the basis of three properties: sequence complementarity using a position-weighted local alignment algorithm, free energies of RNA-RNA duplexes, and conservation of target sites in related genomes. Application to the D. melanogaster, Drosophila pseudoobscura and Anopheles gambiae genomes identifies several hundred target genes potentially regulated by one or more known miRNAs.

2,997 citations


Journal ArticleDOI
TL;DR: EASE is a customizable software application for rapid biological interpretation of gene lists that result from the analysis of microarray, proteomics, SAGE and other high-throughput genomic data and is robust to varying methods of normalization, intensity calculation and statistical selection of genes.
Abstract: EASE is a customizable software application for rapid biological interpretation of gene lists that result from the analysis of microarray, proteomics, SAGE and other high-throughput genomic data. The biological themes returned by EASE recapitulate manually determined themes in previously published gene lists and are robust to varying methods of normalization, intensity calculation and statistical selection of genes. EASE is a powerful tool for rapidly converting the results of functional genomics studies from 'genes' to 'themes'.

1,985 citations
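
The "genes to themes" step described for tools such as EASE is commonly framed as an over-representation test. The following is a minimal sketch of a plain hypergeometric upper-tail probability, assuming a gene list of size `n` drawn from a background of `N` genes, `K` of which carry a given annotation. It is not the EASE score itself, and the function name and the example numbers are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """Probability of seeing at least k annotated genes in a list of n genes
    drawn without replacement from N background genes, K of them annotated.

    Illustrative only: a generic over-representation test, not the scoring
    used by any specific tool named in this listing.
    """
    total = comb(N, n)
    upper = min(n, K)
    # math.comb returns 0 when the second argument exceeds the first,
    # so impossible draws contribute nothing to the sum.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


if __name__ == "__main__":
    # Hypothetical numbers: 40 of 6000 background genes share an annotation,
    # and 5 of them appear in a list of 100 selected genes.
    print(hypergeom_upper_tail(5, 100, 40, 6000))
```

A small probability suggests the annotation appears in the list more often than chance alone would explain; in practice the result would still need correction for the many categories tested.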


Journal ArticleDOI
TL;DR: This work merges many of the available yeast protein-abundance datasets, using the resulting larger 'meta-dataset' to find correlations between protein and mRNA expression, both globally and within smaller categories.
Abstract: Attempts to correlate protein abundance with mRNA expression levels have had variable success. We review the results of these comparisons, focusing on yeast. In the process, we survey experimental techniques for determining protein abundance, principally two-dimensional gel electrophoresis and mass-spectrometry. We also merge many of the available yeast protein-abundance datasets, using the resulting larger 'meta-dataset' to find correlations between protein and mRNA expression, both globally and within smaller categories.

1,812 citations


Journal ArticleDOI
TL;DR: GoMiner, a program package that organizes lists of 'interesting' genes for biological interpretation in the context of the Gene Ontology, provides quantitative and statistical output files and two useful visualizations.
Abstract: We have developed GoMiner, a program package that organizes lists of 'interesting' genes (for example, under- and overexpressed genes from a microarray experiment) for biological interpretation in the context of the Gene Ontology. GoMiner provides quantitative and statistical output files and two useful visualizations. The first is a tree-like structure analogous to that in the AmiGO browser and the second is a compact, dynamically interactive 'directed acyclic graph'. Genes displayed in GoMiner are linked to major public bioinformatics resources.

1,262 citations


Journal ArticleDOI
TL;DR: iJR904 will help to outline the genotype-phenotype relationship for E. coli K-12, as it can account for genomic, transcriptomic, proteomic and fluxomic data simultaneously, and has improved capabilities over iJE660a.
Abstract: Diverse datasets, including genomic, transcriptomic, proteomic and metabolomic data, are becoming readily available for specific organisms. There is currently a need to integrate these datasets within an in silico modeling framework. Constraint-based models of Escherichia coli K-12 MG1655 have been developed and used to study the bacterium's metabolism and phenotypic behavior. The most comprehensive E. coli model to date (E. coli iJE660a GSM) accounts for 660 genes and includes 627 unique biochemical reactions. An expanded genome-scale metabolic model of E. coli (iJR904 GSM/GPR) has been reconstructed which includes 904 genes and 931 unique biochemical reactions. The reactions in the expanded model are both elementally and charge balanced. Network gap analysis led to putative assignments for 55 open reading frames (ORFs). Gene to protein to reaction associations (GPR) are now directly included in the model. Comparisons between predictions made by iJR904 and iJE660a models show that they are generally similar but differ under certain circumstances. Analysis of genome-scale proton balancing shows how the flux of protons into and out of the medium is important for maximizing cellular growth. E. coli iJR904 has improved capabilities over iJE660a. iJR904 is a more complete and chemically accurate description of E. coli metabolism than iJE660a. Perhaps most importantly, iJR904 can be used for analyzing and integrating the diverse datasets. iJR904 will help to outline the genotype-phenotype relationship for E. coli K-12, as it can account for genomic, transcriptomic, proteomic and fluxomic data simultaneously.

1,102 citations
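
The abstract above states that reactions in the expanded model are elementally and charge balanced. A minimal sketch of such a check follows, assuming each metabolite is given as a simple chemical formula with a net charge. The helper names and the hexokinase example (with its charged formulas) are illustrative assumptions, not data taken from iJR904.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Count atoms in a flat formula such as 'C6H12O6' (no parentheses)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def side_totals(side):
    """Sum atoms and charge over (formula, charge, coefficient) entries."""
    atoms, charge = Counter(), 0
    for formula, q, coeff in side:
        for element, n in parse_formula(formula).items():
            atoms[element] += coeff * n
        charge += coeff * q
    return atoms, charge


def is_balanced(substrates, products):
    """True when both sides carry the same atoms and the same net charge."""
    return side_totals(substrates) == side_totals(products)


if __name__ == "__main__":
    # Hypothetical entry: glucose + ATP -> glucose 6-phosphate + ADP + H+
    substrates = [("C6H12O6", 0, 1), ("C10H12N5O13P3", -4, 1)]
    products = [("C6H11O9P", -2, 1), ("C10H12N5O10P2", -3, 1), ("H", 1, 1)]
    print(is_balanced(substrates, products))  # prints True
```

Dropping the proton from the product side makes both the hydrogen count and the charge disagree, which is the kind of inconsistency a genome-scale proton balance depends on catching.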


Journal ArticleDOI
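TL;DR: The t test can detect differential expression between two conditions when samples are replicated; with more than two conditions, analysis of variance (ANOVA) can be used, and the mixed ANOVA model is a general and powerful approach for microarray experiments with multiple factors and/or several sources of variation.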
Abstract: Extracting biological information from microarray data requires appropriate statistical methods. The simplest statistical method for detecting differential expression is the t test, which can be used to compare two conditions when there is replication of samples. With more than two conditions, analysis of variance (ANOVA) can be used, and the mixed ANOVA model is a general and powerful approach for microarray experiments with multiple factors and/or several sources of variation.

957 citations
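
For the two-condition case mentioned above, the t statistic for a single gene can be sketched as follows. This sketch uses Welch's unequal-variance form, which is a choice made here for illustration and is not specified by the abstract; the function name and the example values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two lists of replicate measurements.

    Each list needs at least two values so that the sample variance exists.
    """
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


if __name__ == "__main__":
    # Hypothetical log-ratios for one gene under two conditions.
    control = [0.1, -0.2, 0.05, 0.0]
    treated = [1.1, 0.9, 1.3, 1.0]
    print(welch_t(treated, control))
```

A full analysis would convert the statistic to a p-value and adjust for testing thousands of genes at once.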


Journal ArticleDOI
TL;DR: MAPPFinder allows the user to rapidly identify GO terms with over-represented numbers of gene-expression changes, and generates GenMAPP graphical files in which gene relationships can be explored and annotated and which can be freely exchanged.
Abstract: MAPPFinder is a tool that creates a global gene-expression profile across all areas of biology by integrating the annotations of the Gene Ontology (GO) Project with the free software package GenMAPP (http://www.GenMAPP.org). The results are displayed in a searchable browser, allowing the user to rapidly identify GO terms with over-represented numbers of gene-expression changes. Clicking on GO terms generates GenMAPP graphical files in which gene relationships can be explored and annotated, and these files can be freely exchanged.

924 citations


Journal ArticleDOI
TL;DR: A long-standing research aim has been to define the mechanisms by which Sp1-like factors and KLFs regulate gene expression and cellular function in a cell- and promoter-specific manner.
Abstract: Sp1-like proteins and Krüppel-like factors (KLFs) are highly related zinc-finger proteins that are important components of the eukaryotic cellular transcriptional machinery. By regulating the expression of a large number of genes that have GC-rich promoters, Sp1-like/KLF transcription regulators may take part in virtually all facets of cellular function, including cell proliferation, apoptosis, differentiation, and neoplastic transformation. Individual members of the Sp1-like/KLF family can function as activators or repressors depending on which promoter they bind and the coregulators with which they interact. A long-standing research aim has been to define the mechanisms by which Sp1-like factors and KLFs regulate gene expression and cellular function in a cell- and promoter-specific manner. Most members of this family have been identified in mammals, with at least 21 Sp1-like/KLF proteins encoded in the human genome, and members are also found in frogs, worms and flies. Sp1-like/KLF proteins have highly conserved carboxy-terminal zinc-finger domains that function in DNA binding. The amino terminus, containing the transcription activation domain, can vary significantly between family members.

872 citations


Journal ArticleDOI
TL;DR: A computational strategy succeeded in identifying bona fide miRNA genes and suggests that miRNAs constitute nearly 1% of predicted protein-coding genes in Drosophila, a percentage similar to the percentage of miRNAs recently attributed to other metazoan genomes.
Abstract: Background: MicroRNAs (miRNAs) are a large family of 21-22 nucleotide non-coding RNAs with presumed post-transcriptional regulatory activity. Most miRNAs were identified by direct cloning of small RNAs, an approach that favors detection of abundant miRNAs. Three observations suggested that miRNA genes might be identified using a computational approach. First, miRNAs generally derive from precursor transcripts of 70-100 nucleotides with extended stem-loop structure. Second, miRNAs are usually highly conserved between the genomes of related species. Third, miRNAs display a characteristic pattern of evolutionary divergence. Results: We developed an informatic procedure called 'miRseeker', which analyzed the completed euchromatic sequences of Drosophila melanogaster and D. pseudoobscura for conserved sequences that adopt an extended stem-loop structure and display a pattern of nucleotide divergence characteristic of known miRNAs. The sensitivity of this computational procedure was demonstrated by the presence of 75% (18/24) of previously identified Drosophila miRNAs within the top 124 candidates. In total, we identified 48 novel miRNA candidates that were strongly conserved in more distant insect, nematode, or vertebrate genomes. We verified expression for a total of 24 novel miRNA genes, including 20 of 27 candidates conserved in a third species and 4 of 11 high-scoring, Drosophila-specific candidates. Our analyses lead us to estimate that drosophilid genomes contain around 110 miRNA genes. Conclusions: Our computational strategy succeeded in identifying bona fide miRNA genes and suggests that miRNAs constitute nearly 1% of predicted protein-coding genes in Drosophila, a percentage similar to the percentage of miRNAs recently attributed to other metazoan genomes.

773 citations


Journal ArticleDOI
TL;DR: The different sodium channels have remarkably similar functional properties, but small changes in sodium-channel function are biologically relevant, as underscored by mutations that cause several human diseases of hyperexcitability.
Abstract: Selective permeation of sodium ions through voltage-dependent sodium channels is fundamental to the generation of action potentials in excitable cells such as neurons. These channels are large integral membrane proteins and are encoded by at least ten genes in mammals. The different sodium channels have remarkably similar functional properties, but small changes in sodium-channel function are biologically relevant, as underscored by mutations that cause several human diseases of hyperexcitability.

Journal ArticleDOI
TL;DR: In the cytoplasm, PABPs facilitate the formation of the 'closed loop' structure of the messenger ribonucleoprotein particle that is crucial for additional PABP activities that promote translation initiation and termination, recycling of ribosomes, and stability of the mRNA.
Abstract: Most eukaryotic mRNAs are subject to considerable post-transcriptional modification, including capping, splicing, and polyadenylation. The process of polyadenylation adds a 3' poly(A) tail and provides the mRNA with a binding site for a major class of regulatory factors, the poly(A)-binding proteins (PABPs). These highly conserved polypeptides are found only in eukaryotes; single-celled eukaryotes each have a single PABP, whereas humans have five and Arabidopsis has eight. They typically bind poly(A) using one or more RNA-recognition motifs, globular domains common to numerous other eukaryotic RNA-binding proteins. Although they lack catalytic activity, PABPs have several roles in mediating gene expression. Nuclear PABPs are necessary for the synthesis of the poly(A) tail, regulating its ultimate length and stimulating maturation of the mRNA. Association with PABP is also a requirement for some mRNAs to be exported from the nucleus. In the cytoplasm, PABPs facilitate the formation of the 'closed loop' structure of the messenger ribonucleoprotein particle that is crucial for additional PABP activities that promote translation initiation and termination, recycling of ribosomes, and stability of the mRNA. Collectively, these sequential nuclear and cytoplasmic contributions comprise a cycle in which PABPs and the poly(A) tail first create and then eliminate a network of cis-acting interactions that control mRNA function.

Journal ArticleDOI
TL;DR: The Myc Target Gene database prioritizes candidate target genes according to experimental evidence and clusters responsive genes into functional groups; the prioritization is coupled with phylogenetic sequence comparisons to predict c-Myc target binding sites.
Abstract: We report a database of genes responsive to the Myc oncogenic transcription factor. The database Myc Target Gene prioritizes candidate target genes according to experimental evidence and clusters responsive genes into functional groups. We coupled the prioritization of target genes with phylogenetic sequence comparisons to predict c-Myc target binding sites, which are in turn validated by chromatin immunoprecipitation assays. This database is essential for the understanding of the genetic regulatory networks underlying the genesis of cancers.

Journal ArticleDOI
Raul Urrutia
TL;DR: The largest family of zinc-finger transcription factors comprises those containing the Krüppel-associated box (or KRAB domain), which are present only in tetrapod vertebrates and are involved in maintenance of the nucleolus, cell differentiation, cell proliferation, apoptosis, and neoplastic transformation.
Abstract: The largest family of zinc-finger transcription factors comprises those containing the Krüppel-associated box (or KRAB domain), which are present only in tetrapod vertebrates. Many genes encoding KRAB-containing proteins are arranged in clusters in the human genome, with one cluster close to chromosome 9q13 and others in centromeric and telomeric regions of other chromosomes, but other genes occur individually throughout the genome. The KRAB domain, which is found in the amino-terminal region of the proteins, behaves as a transcriptional repressor domain by binding to corepressor proteins, whereas the C2H2 zinc-finger motifs bind DNA. The functions currently proposed for members of the KRAB-containing protein family include transcriptional repression of RNA polymerase I, II, and III promoters and binding and splicing of RNA. Members of the family are involved in maintenance of the nucleolus, cell differentiation, cell proliferation, apoptosis, and neoplastic transformation.

Journal ArticleDOI
TL;DR: Osprey builds data-rich graphical representations that are color-coded for gene function and experimental interaction data, and its mouse-over functions allow rapid elaboration and organization of network diagrams in a spoke model format.
Abstract: We have developed a software platform called Osprey for visualization and manipulation of complex interaction networks. Osprey builds data-rich graphical representations that are color-coded for gene function and experimental interaction data. Mouse-over functions allow rapid elaboration and organization of network diagrams in a spoke model format. User-defined large-scale data sets can be readily combined with Osprey for comparison of different methods.

Journal ArticleDOI
TL;DR: Structural data are starting to illuminate the mechanistic role of σ factors in transcription initiation, and members of the σ70 family of sigma factors can broadly be divided into four main groups.
Abstract: Members of the sigma70 family of sigma factors are components of the RNA polymerase holoenzyme that direct bacterial or plastid core RNA polymerase to specific promoter elements that are situated 10 and 35 base-pairs upstream of transcription-initiation points. Members of the sigma70 family also function as contact points for some activator proteins, such as PhoB and lambda cI, and play a role in the initiation process itself. The primary sigma factor, which is essential for general transcription in exponentially growing cells, is reversibly associated with RNA polymerase and can be replaced by alternative sigma factors that co-ordinately express genes involved in diverse functions, such as stress responses, morphological development and iron uptake. On the basis of gene structure and function, members of the sigma70 family can broadly be divided into four main groups. Sequence alignments of the sigma70 family members reveal that they have four conserved regions, although the highest conservation is found in regions 2 and 4, which are involved in binding to RNA polymerase, recognizing promoters and separating DNA strands (so-called 'DNA melting'). The division of the linear sequence of sigma70 factors into four regions is largely supported by recent structural data indicating that primary sigma factors have three stable domains that incorporate regions 2, 3 and 4. Furthermore, structures of the RNA polymerase holoenzyme have revealed that these domains of sigma70 are spread out across one face of RNA polymerase. These structural data are starting to illuminate the mechanistic role of sigma factors in transcription initiation.

Journal ArticleDOI
TL;DR: DNA microarrays based on long oligonucleotides are powerful tools for the functional annotation and exploration of the P. falciparum genome and may serve as the basis for future drug targets and vaccine development.
Abstract: Background: The worldwide persistence of drug-resistant Plasmodium falciparum, the most lethal variety of human malaria, is a global health concern. The P. falciparum sequencing project has brought new opportunities for identifying molecular targets for antimalarial drug and vaccine development.

Journal ArticleDOI
TL;DR: The overall SSR density was comparable in all chromosomes, but the density of different repeats showed significant variation.
Abstract: Background: Simple sequence repeats (SSRs) are found in most organisms, and occupy about 3% of the human genome. Although it is becoming clear that such repeats are important in genomic organization and function and may be associated with disease conditions, their systematic analysis has not been reported. This is the first report examining the distribution and density of simple sequence repeats (1-6 base-pairs (bp)) in the entire human genome. Results: The densities of SSRs across the human chromosomes were found to be relatively uniform. However, the overall density of SSRs was found to be high in chromosome 19. Triplets and hexamers were more predominant in exonic regions compared to intronic and intergenic regions, except for chromosome Y. Comparison of densities of various SSRs revealed that whereas trimers and pentamers showed a similar pattern (500-1,000 bp/Mb) across the chromosomes, di-, tetra- and hexa-nucleotide repeats showed patterns of higher (2,000-3,000 bp/Mb) density. Repeats of the same nucleotide were found to be higher than other repeat types. Repeats of A, AT, AC, AAT, AAC, AAG, AGC, AAAC, AAAT, AAAG, AAGG, AGAT predominate, whereas repeats of C, CG, ACT, ACG, AACC, AACG, AACT, AAGC, AAGT, ACCC, ACCG, ACCT, CCCG and CCGG are rare. Conclusions: The overall SSR density was comparable in all chromosomes. The density of different repeats, however, showed significant variation. Tri- and hexa-nucleotide repeats are more abundant in exons, whereas other repeats are more abundant in non-coding regions.
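
The density figures above are expressed in base-pairs of repeat per megabase of sequence. A minimal sketch of how such a figure could be computed for perfect tandem repeats with units of 1-6 bp is shown below. The 12 bp minimum tract length, the function names and the greedy left-to-right scan are assumptions made for illustration and are not the criteria used in the study.

```python
def find_ssrs(seq, max_unit=6, min_length=12):
    """Return (start, unit, tract_length) for perfect tandem repeats.

    At each position the longest tract over unit sizes 1..max_unit is kept,
    preferring the shortest unit on ties; the scan then jumps past the tract.
    """
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        best = None
        for unit_len in range(1, max_unit + 1):
            unit = seq[i:i + unit_len]
            if len(unit) < unit_len or set(unit) - set("ACGT"):
                break  # ran off the end or hit an ambiguous base
            j = i + unit_len
            while seq[j:j + unit_len] == unit:
                j += unit_len
            length = j - i
            repeated = length >= 2 * unit_len
            if repeated and length >= min_length and (best is None or length > best[2]):
                best = (i, unit, length)
        if best:
            hits.append(best)
            i += best[2]
        else:
            i += 1
    return hits


def ssr_density_bp_per_mb(seq, **kwargs):
    """Repeat base-pairs per megabase of the given sequence."""
    covered = sum(length for _, _, length in find_ssrs(seq, **kwargs))
    return covered / len(seq) * 1_000_000


if __name__ == "__main__":
    example = "GGC" + "AT" * 8 + "CCG"   # hypothetical 22 bp fragment
    print(find_ssrs(example))             # prints [(3, 'AT', 16)]
```

Real surveys also have to decide how to treat imperfect repeats and reverse-complement equivalent units (for example AT versus TA), which this sketch leaves out.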

Journal ArticleDOI
TL;DR: Several related, but distinct, catalytic activities, such as murein degradation, acyl transfer and amide hydrolysis, have emerged in the NlpC/P60 superfamily, and the predicted structural features indicate that these enzymes contain a fold similar to the papain-like peptidases, transglutaminases and arylamine acetyltransferases.
Abstract: Peptidoglycan is hydrolyzed by a diverse set of enzymes during bacterial growth, development and cell division. The NlpC/P60 proteins define a family of cell-wall peptidases that are widely represented in various bacterial lineages. Currently characterized members are known to hydrolyze D-γ-glutamyl-meso-diaminopimelate or N-acetylmuramate-L-alanine linkages. Detailed analysis of the NlpC/P60 peptidases showed that these proteins define a large superfamily encompassing several diverse groups of proteins. In addition to the well characterized P60-like proteins, this superfamily includes the AcmB/LytN and YaeF/YiiX families of bacterial proteins, the amidase domain of bacterial and kinetoplastid glutathionylspermidine synthases (GSPSs), and several proteins from eukaryotes, phages, poxviruses, positive-strand RNA viruses, and certain archaea. The eukaryotic members include lecithin retinol acyltransferase (LRAT), nematode developmental regulator Egl-26, and candidate tumor suppressor H-rev107. These eukaryotic proteins, along with the bacterial YaeF/poxviral G6R family, show a circular permutation of the catalytic domain. We identified three conserved residues, namely a cysteine, a histidine and a polar residue, that are involved in the catalytic activities of this superfamily. Evolutionary analysis of this superfamily shows that it comprises four major families, with diverse domain architectures in each of them. Several related, but distinct, catalytic activities, such as murein degradation, acyl transfer and amide hydrolysis, have emerged in the NlpC/P60 superfamily. The three conserved catalytic residues of this superfamily are shown to be equivalent to the catalytic triad of the papain-like thiol peptidases. The predicted structural features indicate that the NlpC/P60 enzymes contain a fold similar to the papain-like peptidases, transglutaminases and arylamine acetyltransferases.

Journal ArticleDOI
TL;DR: The matrix metalloproteinase family in humans comprises 23 enzymes, which are involved in many biological processes and diseases; they were previously thought to act only on components of the extracellular matrix, but this view has changed with the discovery that non-extracellular-matrix molecules are also substrates.
Abstract: The matrix metalloproteinase family in humans comprises 23 enzymes, which are involved in many biological processes and diseases. It was previously thought that these enzymes acted only to degrade components of the extracellular matrix, but this view has changed with the discovery that non-extracellular-matrix molecules are also substrates.

Journal ArticleDOI
TL;DR: The GRID displays data-rich interaction tables for any protein of interest, combines literature-derived and high-throughput interaction datasets, and is readily accessible via the web.
Abstract: We have developed a relational database, called the General Repository for Interaction Datasets (The GRID) to archive and display physical, genetic and functional interactions. The GRID displays data-rich interaction tables for any protein of interest, combines literature-derived and high-throughput interaction datasets, and is readily accessible via the web. Interactions parsed in The GRID can be viewed in graphical form with a versatile visualization tool called Osprey.

Journal ArticleDOI
TL;DR: PRODISTIN, a new computational method allowing the functional clustering of proteins on the basis of protein-protein interaction data, is described, which enabled it to classify 11% of the Saccharomyces cerevisiae proteome into several groups and to predict a cellular function for many otherwise uncharacterized proteins.
Abstract: We here describe PRODISTIN, a new computational method allowing the functional clustering of proteins on the basis of protein-protein interaction data. This method, assessed biologically and statistically, enabled us to classify 11% of the Saccharomyces cerevisiae proteome into several groups, the majority of which contained proteins involved in the same biological process(es), and to predict a cellular function for many otherwise uncharacterized proteins.

Journal ArticleDOI
TL;DR: Protein kinases with a conserved catalytic domain make up one of the largest 'superfamilies' of eukaryotic proteins and play many key roles in biology and disease.
Abstract: Protein kinases with a conserved catalytic domain make up one of the largest 'superfamilies' of eukaryotic proteins and play many key roles in biology and disease. Efforts to identify and classify all the members of the eukaryotic protein kinase superfamily have recently culminated in the mining of essentially complete human genome data.

Journal ArticleDOI
TL;DR: ICA outperforms other leading methods, such as principal component analysis, k-means clustering and the Plaid model, in constructing functionally coherent clusters on microarray datasets from Saccharomyces cerevisiae, Caenorhabditis elegans and human.
Abstract: We apply linear and nonlinear independent component analysis (ICA) to project microarray data into statistically independent components that correspond to putative biological processes, and to cluster genes according to over- or under-expression in each component. We test the statistical significance of enrichment of gene annotations within clusters. ICA outperforms other leading methods, such as principal component analysis, k-means clustering and the Plaid model, in constructing functionally coherent clusters on microarray datasets from Saccharomyces cerevisiae, Caenorhabditis elegans and human.

Journal ArticleDOI
TL;DR: Genomic and proteomic studies have identified many of the genes and gene products differentially expressed during biofilm formation, revealing the complexity of this developmental process.
Abstract: Bacterial communities that are attached to a surface, so-called biofilms, and their inherent resistance to antimicrobial agents are a cause of many persistent and chronic bacterial infections. Recent genomic and proteomic studies have identified many of the genes and gene products differentially expressed during biofilm formation, revealing the complexity of this developmental process.

Journal ArticleDOI
TL;DR: Using two distinct computational approaches, most of the sequences in the human genome that have undergone recent segmental duplications are identified and a significant subset of single-nucleotide polymorphisms in the public databases that are not true SNPs but are potential paralogous sequence variants are identified.
Abstract: Previous studies have suggested that recent segmental duplications, which are often involved in chromosome rearrangements underlying genomic disease, account for some 5% of the human genome. We have developed rapid computational heuristics based on BLAST analysis to detect segmental duplications, as well as regions containing potential sequence misassignments in the human genome assemblies. Our analysis of the June 2002 public human genome assembly revealed that 107.4 of 3,043.1 megabases (Mb) (3.53%) of sequence contained segmental duplications, each at least 5 kb in size and with at least 90% identity. We have also detected that 38.9 Mb (1.28%) of sequence within this assembly is likely to be involved in sequence misassignment errors. Furthermore, we have identified a significant subset (199,965 of 2,327,473 or 8.6%) of single-nucleotide polymorphisms (SNPs) in the public databases that are not true SNPs but are potential paralogous sequence variants. Using two distinct computational approaches, we have identified most of the sequences in the human genome that have undergone recent segmental duplications. Near-identical segmental duplications present a major challenge to the completion of the human genome sequence. Potential sequence misassignments detected in this study would require additional efforts to resolve.
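
The percentages quoted in this abstract follow directly from the stated counts. The short check below reproduces them; the helper name is hypothetical and the inputs are only the figures given above.

```python
def percent(part, whole):
    """Share of `whole` represented by `part`, in percent."""
    return 100.0 * part / whole


if __name__ == "__main__":
    assembly_mb = 3043.1
    print(round(percent(107.4, assembly_mb), 2))    # prints 3.53 (segmental duplications)
    print(round(percent(38.9, assembly_mb), 2))     # prints 1.28 (possible misassignments)
    print(round(percent(199_965, 2_327_473), 1))    # prints 8.6 (potential paralogous variants)
```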

Journal ArticleDOI
TL;DR: It is shown that POCUS can provide high (up to 81-fold) enrichment of real disease genes in the candidate-gene shortlists it produces compared with the original large sets of positional candidates.
Abstract: Here we present POCUS (prioritization of candidate genes using statistics), a novel computational approach to prioritize candidate disease genes that is based on over-representation of functional annotation between loci for the same disease. We show that POCUS can provide high (up to 81-fold) enrichment of real disease genes in the candidate-gene shortlists it produces compared with the original large sets of positional candidates. In contrast to existing methods, POCUS can also suggest counterintuitive candidates.

Journal ArticleDOI
TL;DR: X-ray structures of a truncated, deglycosylated form of germinal ACE and a related enzyme from Drosophila have been reported, and these show that the active site is deep within a central cavity.
Abstract: Angiotensin-I-converting enzyme (ACE) is a monomeric, membrane-bound, zinc- and chloride-dependent peptidyl dipeptidase that catalyzes the conversion of the decapeptide angiotensin I to the octapeptide angiotensin II, by removing a carboxy-terminal dipeptide. ACE has long been known to be a key part of the renin-angiotensin system that regulates blood pressure, and ACE inhibitors are important for the treatment of hypertension. There are two forms of the enzyme in humans, the ubiquitous somatic ACE and the sperm-specific germinal ACE, both encoded by the same gene through transcription from alternative promoters. Somatic ACE has two tandem active sites with distinct catalytic properties, whereas germinal ACE, the function of which is largely unknown, has just a single active site. Recently, an ACE homolog, ACE2, has been identified in humans that differs from ACE in being a carboxypeptidase that preferentially removes carboxy-terminal hydrophobic or basic amino acids; it appears to be important in cardiac function. ACE homologs (also known as members of the M2 gluzincin family) have been found in a wide variety of species, even in those that neither have a cardiovascular system nor synthesize angiotensin. X-ray structures of a truncated, deglycosylated form of germinal ACE and a related enzyme from Drosophila have been reported, and these show that the active site is deep within a central cavity. Structure-based drug design targeting the individual active sites of somatic ACE may lead to a new generation of ACE inhibitors, with fewer side-effects than currently available inhibitors.

Journal ArticleDOI
TL;DR: The tightly maintained gene neighborhoods of post-segregational cell killing-related systems appear to have evolved by in situ displacement of genes for toxins or antitoxins by functionally equivalent but evolutionarily unrelated genes.
Abstract: Background: Several prokaryotic plasmids maintain themselves in their hosts by means of diverse post-segregational cell killing systems. Recent findings suggest that chromosomally encoded copies of toxins and antitoxins of post-segregational cell killing systems - such as the RelE system - might function as regulatory switches under stress conditions. The RelE toxin cleaves ribosome-associated transcripts, whereas another post-segregational cell killing toxin, ParE, functions as a gyrase inhibitor.

Journal ArticleDOI
TL;DR: This work evaluated several clustering algorithms that incorporate repeated measurements, and shows that algorithms that take advantage of repeated measurements yield more accurate and more stable clusters.
Abstract: Clustering is a common methodology for the analysis of array data, and many research laboratories are generating array data with repeated measurements. We evaluated several clustering algorithms that incorporate repeated measurements, and show that algorithms that take advantage of repeated measurements yield more accurate and more stable clusters. In particular, we show that the infinite mixture model-based approach with a built-in error model produces superior results.