Showing papers in "Genome Biology in 2001"


Journal ArticleDOI
TL;DR: A subset of the FGF family, expressed in adult tissue, is important for neuronal signal transduction in the central and peripheral nervous systems.
Abstract: Fibroblast growth factors (FGFs) make up a large family of polypeptide growth factors that are found in organisms ranging from nematodes to humans. In vertebrates, the 22 members of the FGF family range in molecular mass from 17 to 34 kDa and share 13-71% amino acid identity. Between vertebrate species, FGFs are highly conserved in both gene structure and amino-acid sequence. FGFs have a high affinity for heparan sulfate proteoglycans and require heparan sulfate to activate one of four cell-surface FGF receptors. During embryonic development, FGFs have diverse roles in regulating cell proliferation, migration and differentiation. In the adult organism, FGFs are homeostatic factors and function in tissue repair and response to injury. When inappropriately expressed, some FGFs can contribute to the pathogenesis of cancer. A subset of the FGF family, expressed in adult tissue, is important for neuronal signal transduction in the central and peripheral nervous systems.

2,228 citations


Journal ArticleDOI
TL;DR: The model-based approach reduces the variability of low expression estimates, and provides a natural method of calculating expression values for PM-only arrays, and the standard errors attached to expression values can be used to assess the reliability of downstream analysis.
Abstract: A model-based analysis of oligonucleotide expression arrays we developed previously uses a probe-sensitivity index to capture the response characteristic of a specific probe pair and calculates model-based expression indexes (MBEI). MBEI has standard error attached to it as a measure of accuracy. Here we investigate the stability of the probe-sensitivity index across different tissue types, the reproducibility of results in replicate experiments, and the use of MBEI in perfect match (PM)-only arrays. Probe-sensitivity indexes are stable across tissue types. The target gene's presence in many arrays of an array set allows the probe-sensitivity index to be estimated accurately. We extended the model to obtain expression values for PM-only arrays, and found that the 20-probe PM-only model is comparable to the 10-probe PM/MM difference model, in terms of the expression correlations with the original 20-probe PM/MM difference model. MBEI method is able to extend the reliable detection limit of expression to a lower mRNA concentration. The standard errors of MBEI can be used to construct confidence intervals of fold changes, and the lower confidence bound of fold change is a better ranking statistic for filtering genes. We can assign reliability indexes for genes in a specific cluster of interest in hierarchical clustering by resampling clustering trees. A software dChip implementing many of these analysis methods is made available. The model-based approach reduces the variability of low expression estimates, and provides a natural method of calculating expression values for PM-only arrays. The standard errors attached to expression values can be used to assess the reliability of downstream analysis.

1,065 citations
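
The model-based expression index in the entry above can be illustrated with a minimal sketch. It assumes a multiplicative model in which each probe-level value y[i][j] (for example a PM/MM difference on array i, probe j) is approximately theta[i] * phi[j], with theta[i] the expression index and phi[j] the probe-sensitivity index, scaled so that the squared sensitivities sum to the number of probes. The model form, the name `fit_rank_one`, and the simplified standard-error formula are assumptions made for illustration; this is not the dChip implementation.

```python
import math


def fit_rank_one(y, n_iter=50):
    """Fit y[i][j] ~ theta[i] * phi[j] by alternating least squares.

    y is a list of arrays; each array is a list of probe-level values.
    Returns (theta, phi, se), with phi scaled so sum(phi_j ** 2) == n_probes.
    Hypothetical helper: an illustrative sketch only.
    """
    n_arrays, n_probes = len(y), len(y[0])
    phi = [1.0] * n_probes  # start from equal probe sensitivities
    theta = [0.0] * n_arrays
    for _ in range(n_iter):
        # Given phi, the least-squares theta_i projects row i onto phi.
        ss_phi = sum(p * p for p in phi)
        theta = [sum(y[i][j] * phi[j] for j in range(n_probes)) / ss_phi
                 for i in range(n_arrays)]
        # Given theta, update each phi_j from column j in the same way.
        ss_theta = sum(t * t for t in theta)
        phi = [sum(y[i][j] * theta[i] for i in range(n_arrays)) / ss_theta
               for j in range(n_probes)]
        # Rescale phi to satisfy the identifiability constraint.
        scale = math.sqrt(n_probes / sum(p * p for p in phi))
        phi = [p * scale for p in phi]
    # Final theta update so that theta matches the rescaled phi.
    ss_phi = sum(p * p for p in phi)
    theta = [sum(y[i][j] * phi[j] for j in range(n_probes)) / ss_phi
             for i in range(n_arrays)]
    # Approximate standard error of theta_i from the row residuals (assumed form).
    se = []
    for i in range(n_arrays):
        rss = sum((y[i][j] - theta[i] * phi[j]) ** 2 for j in range(n_probes))
        sigma2 = rss / (n_probes - 1)
        se.append(math.sqrt(sigma2 / ss_phi))
    return theta, phi, se


if __name__ == "__main__":
    # Two arrays, three probes; the second array has twice the expression.
    data = [[50.0, 100.0, 150.0], [100.0, 200.0, 300.0]]
    theta, phi, se = fit_rank_one(data)
    print(theta, phi, se)
```

Because the probe sensitivities are shared across arrays, more arrays containing the target gene give more information about phi, which is consistent with the abstract's remark that the index is estimated accurately when the gene is present in many arrays.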


Journal ArticleDOI
TL;DR: Protein microarrays can provide a simple and practical means to characterize patterns of variation in hundreds or thousands of different proteins in clinical or research applications, with sensitivities sufficient for measurement of many clinically important proteins in patient blood samples.
Abstract: We describe a method for printing protein microarrays, and using these microarrays in a comparative fluorescence assay to measure the abundance of many specific proteins in complex solutions. A robotic device was used to print hundreds of specific antibody or antigen solutions in an array on the surface of derivatized microscope slides. Two complex protein samples, one serving as a standard for comparative quantitation, and the other representing an experimental sample in which the concentrations of specific proteins were to be measured, were labeled by covalent attachment of spectrally-resolvable fluorescent dyes. Specific antibody-antigen interactions localized specific components of the complex mixtures to defined cognate spots in the array, where the relative intensity of the fluorescent signals representing the experimental sample and the reference standard provided a measure of each protein's abundance in the experimental sample. To characterize the specificity, sensitivity and accuracy of this assay, we analyzed the performance of 115 antibody/antigen pairs. 50% of the arrayed antigens, and 20% of the arrayed antibodies, provided specific and accurate measurements of their cognate ligands at or below concentrations of 1.6 µg/ml and 0.34 µg/ml, respectively. Some of the antibody/antigen pairs allowed detection of the cognate ligands at absolute concentrations below 1 ng/ml, and partial concentrations of less than 1 part in 10^6, sensitivities sufficient for measurement of many clinically important proteins in patient blood samples. Protein microarrays can provide a simple and practical means to characterize patterns of variation in hundreds or thousands of different proteins, in clinical or research applications.

939 citations


Journal ArticleDOI
TL;DR: The Rab family, part of the Ras superfamily of small GTPases with members conserved from yeast to humans, regulates vesicle formation, actin- and tubulin-dependent vesicle movement, and membrane fusion.
Abstract: The Rab family is part of the Ras superfamily of small GTPases. There are at least 60 Rab genes in the human genome, and a number of Rab GTPases are conserved from yeast to humans. The different Rab GTPases are localized to the cytosolic face of specific intracellular membranes, where they function as regulators of distinct steps in membrane traffic pathways. In the GTP-bound form, the Rab GTPases recruit specific sets of effector proteins onto membranes. Through their effectors, Rab GTPases regulate vesicle formation, actin- and tubulin-dependent vesicle movement, and membrane fusion.

745 citations


Journal ArticleDOI
TL;DR: Sequence profile searches show that several previously undetected protein families contain the 2OG-Fe(II) oxygenase fold, which allows prediction of the catalytic activity of a wide range of biologically important but biochemically uncharacterized proteins from eukaryotes and bacteria.
Abstract: Protein fold recognition using sequence profile searches frequently allows prediction of the structure and biochemical mechanisms of proteins with an important biological function but unknown biochemical activity. Here we describe such predictions resulting from an analysis of the 2-oxoglutarate (2OG) and Fe(II)-dependent oxygenases, a class of enzymes that are widespread in eukaryotes and bacteria and catalyze a variety of reactions typically involving the oxidation of an organic substrate using a dioxygen molecule. We employ sequence profile analysis to show that the DNA repair protein AlkB, the extracellular matrix protein leprecan, the disease-resistance-related protein EGL-9 and several uncharacterized proteins define novel families of enzymes of the 2OG-Fe(II) oxygenase superfamily. The identification of AlkB as a member of the 2OG-Fe(II) oxygenase superfamily suggests that this protein catalyzes oxidative detoxification of alkylated bases. More distant homologs of AlkB were detected in eukaryotes and in plant RNA viruses, leading to the hypothesis that these proteins might be involved in RNA demethylation. The EGL-9 protein from Caenorhabditis elegans is necessary for normal muscle function and its inactivation results in resistance against paralysis induced by the Pseudomonas aeruginosa toxin. EGL-9 and leprecan are predicted to be novel protein hydroxylases that might be involved in the generation of substrates for protein glycosylation. Here, using sequence profile searches, we show that several previously undetected protein families contain 2OG-Fe(II) oxygenase fold. This allows us to predict the catalytic activity for a wide range of biologically important, but biochemically uncharacterized proteins from eukaryotes and bacteria.

520 citations


Journal ArticleDOI
TL;DR: A phylogenetic analysis of the conserved amino acids encoded by over 100 Arabidopsis UGT genes reveals 14 distinct groups of UGTs; the substrate specificities now being established will provide a foundation for further analysis of this large enzyme superfamily as well as a platform for future biotechnological applications.
Abstract: Uridine diphosphate (UDP) glycosyltransferases (UGTs) mediate the transfer of glycosyl residues from activated nucleotide sugars to acceptor molecules (aglycones), thus regulating properties of the acceptors such as their bioactivity, solubility and transport within the cell and throughout the organism. A superfamily of over 100 genes encoding UGTs, each containing a 42 amino acid consensus sequence, has been identified in the model plant Arabidopsis thaliana. A phylogenetic analysis of the conserved amino acids encoded by these Arabidopsis genes reveals the presence of 14 distinct groups of UGTs in this organism. Genes encoding UGTs have also been identified in several other higher plant species. Very little is yet known about the regulation of plant UGT genes or the localization of the enzymes they encode at the cellular and subcellular levels. The substrate specificities of these UGTs are now beginning to be established and will provide a foundation for further analysis of this large enzyme superfamily as well as a platform for future biotechnological applications.

499 citations


Journal ArticleDOI
TL;DR: Complete sequences of numerous mitochondrial, many prokaryotic, and several nuclear genomes are now available and confirm that the mitochondrial genome originated from a eubacterial ancestor but raise questions about the evolutionary antecedents of the mitochondrial proteome.
Abstract: Complete sequences of numerous mitochondrial, many prokaryotic, and several nuclear genomes are now available. These data confirm that the mitochondrial genome originated from a eubacterial (specifically α-proteobacterial) ancestor but raise questions about the evolutionary antecedents of the mitochondrial proteome.

461 citations


Journal ArticleDOI
TL;DR: It is shown here that a simple model of processes acting at the nucleotide level explains codon usage across a large sample of species and quantitatively predicts responses of individual codons and amino acids to genome composition.
Abstract: Correlations between genome composition (in terms of GC content) and usage of particular codons and amino acids have been widely reported, but poorly explained. We show here that a simple model of processes acting at the nucleotide level explains codon usage across a large sample of species (311 bacteria, 28 archaea and 257 eukaryotes). The model quantitatively predicts responses (slope and intercept of the regression line on genome GC content) of individual codons and amino acids to genome composition. Codons respond to genome composition on the basis of their GC content relative to their synonyms (explaining 71-87% of the variance in response among the different codons, depending on measure). Amino-acid responses are determined by the mean GC content of their codons (explaining 71-79% of the variance). Similar trends hold for genes within a genome. Position-dependent selection for error minimization explains why individual bases respond differently to directional mutation pressure. Our model suggests that GC content drives codon usage (rather than the converse). It unifies a large body of empirical evidence concerning relationships between GC content and amino-acid or codon usage in disparate systems. The relationship between GC content and codon and amino-acid usage is ahistorical; it is replicated independently in the three domains of living organisms, reinforcing the idea that genes and genomes at mutation/selection equilibrium reproduce a unique relationship between nucleic acid and protein composition. Thus, the model may be useful in predicting amino-acid or nucleotide sequences in poorly characterized taxa.

399 citations
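
The "response" of a codon to genome composition in the entry above is the slope and intercept of a regression of codon usage on genome GC content. The sketch below shows only that bookkeeping step: it computes GC content, the frequency of one codon in a coding sequence, and an ordinary least-squares line. The function names and the toy numbers are hypothetical, and this is not the nucleotide-level model proposed in the paper.

```python
def gc_content(seq):
    """Fraction of G or C among the unambiguous bases of a sequence."""
    seq = seq.upper()
    gc = sum(1 for base in seq if base in "GC")
    acgt = sum(1 for base in seq if base in "ACGT")
    return gc / acgt if acgt else 0.0


def codon_frequency(cds, codon):
    """Frequency of one codon among the in-frame codons of a coding sequence."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    return codons.count(codon.upper()) / len(codons) if codons else 0.0


def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    # Toy values: genome GC content and usage of one codon in three genomes.
    genome_gc = [0.3, 0.5, 0.7]
    codon_use = [0.1, 0.2, 0.3]
    print(linear_fit(genome_gc, codon_use))
```

In practice one line would be fitted per codon (or per amino acid) across all sampled genomes, and the fitted slopes compared with the GC content of each codon relative to its synonyms.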


Journal ArticleDOI
TL;DR: Non-syntenic associations between different chromosomes introduce predictable distortions in quantitative trait locus (QTL) datasets that can be partly corrected using two-locus correlation matrices.
Abstract: Background: Recombinant inbred (RI) strains of mice are an important resource used to map and analyze complex traits. They have proved particularly effective in multidisciplinary genetic studies. Widespread use of RI strains has been hampered by their modest numbers and by the difficulty of combining results derived from different RI sets. Results: We have increased the density of typed microsatellite markers two- to five-fold in each of several major RI sets that share C57BL/6 as a parental strain (AXB, BXA, BXD, BXH and CXB). A common set of 490 markers was genotyped in just over 100 RI strains. Genotypes of around 1,100 additional microsatellites in one or more RI sets were generated, collected and checked for errors. Consensus RI maps that integrate genotypes of approximately 1,600 microsatellite loci were assembled. The genomes of individual strains typically incorporate 45-55 recombination breakpoints. The collected RI set - termed the BXN set - contains approximately 5,000 breakpoints. The distribution of recombinations approximates a Poisson distribution and distances between breakpoints average about 0.5 centimorgans (cM). Locations of most breakpoints have been defined with a precision of < 2 cM. Genotypes deviate from Hardy-Weinberg equilibrium in only a small number of intervals. Conclusions: Consensus maps derived from RI strains conform almost exactly to theoretical expectation and are close to the length predicted by the Haldane-Waddington equation (x3.6 for a 2-3 cM interval between markers). Non-syntenic associations between different chromosomes introduce predictable distortions in quantitative trait locus (QTL) datasets that can be partly corrected using two-locus correlation matrices.

386 citations
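
The map expansion mentioned in the entry above can be checked with a short calculation. The sketch assumes the usual form of the Haldane-Waddington relation for strains inbred by sib mating, R = 4r / (1 + 6r), where r is the recombination fraction per meiosis and R is the fraction of RI strains that are recombinant between two loci. It also assumes that, for intervals of a few cM, r is close to the map distance in Morgans. The function names are hypothetical.

```python
def ri_recombinant_fraction(r):
    """Expected fraction of recombinant RI strains for per-meiosis fraction r.

    Assumes sib-mated RI strains (Haldane-Waddington): R = 4r / (1 + 6r).
    """
    if not 0.0 <= r <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")
    return 4.0 * r / (1.0 + 6.0 * r)


def map_expansion(r):
    """Ratio R / r: how much an RI map is stretched relative to one meiosis."""
    if not 0.0 < r <= 0.5:
        raise ValueError("recombination fraction must lie in (0, 0.5]")
    return 4.0 / (1.0 + 6.0 * r)


if __name__ == "__main__":
    for cm in (2.0, 3.0):
        r = cm / 100.0  # small-interval approximation: 1 cM ~ 1% recombination
        print(cm, round(map_expansion(r), 2))
```

Under these assumptions the expansion is about 3.6 for a 2 cM interval and about 3.4 for a 3 cM interval, and it approaches 4 as the interval shrinks, in line with the factor quoted in the abstract.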


Journal ArticleDOI
TL;DR: The present genomic-scale measurement of mRNA turnover uncovered a regulatory logic that links gene function with mRNA half-life, and suggests that flavopiridol may be more effective against types of cancer that are highly dependent on genes with unstable mRNAs.
Abstract: Background: Flavopiridol, a flavonoid currently in cancer clinical trials, inhibits cyclin-dependent kinases (CDKs) by competitively blocking their ATP-binding pocket. However, the mechanism of action of flavopiridol as an anti-cancer agent has not been fully elucidated.

378 citations


Journal ArticleDOI
TL;DR: The inhibitor of apoptosis (IAP) family of proteins prevent cell death by binding to and inhibiting active caspases and are negatively regulated by IAP-binding proteins, such as the mammalian protein DIABLO/Smac.
Abstract: Apoptosis is a physiological cell death process important for development, homeostasis and the immune defence of multicellular animals. The key effectors of apoptosis are caspases, cysteine proteases that cleave after aspartate residues. The inhibitor of apoptosis (IAP) family of proteins prevent cell death by binding to and inhibiting active caspases and are negatively regulated by IAP-binding proteins, such as the mammalian protein DIABLO/Smac. IAPs are characterized by the presence of one to three domains known as baculoviral IAP repeat (BIR) domains and many also have a RING-finger domain at their carboxyl terminus. More recently, a second group of BIR-domain-containing proteins (BIRPs) have been identified that includes the mammalian proteins Bruce and Survivin as well as BIR-containing proteins in yeasts and Caenorhabditis elegans. These Survivin-like BIRPs regulate cytokinesis and mitotic spindle formation. In this review, we describe the IAPs and other BIRPs, their evolutionary relationships and their subcellular and tissue localizations.

Journal ArticleDOI
TL;DR: Comparative genomic analysis revealed that the presence of the ESAT-6 gene cluster is a feature of some high-G+C Gram-positive bacteria.
Abstract: Background: The genome of Mycobacterium tuberculosis H37Rv has five copies of a cluster of genes known as the ESAT-6 loci. These clusters contain members of the CFP-10 (lhp) and ESAT-6 (esat-6) gene families (encoding secreted T-cell antigens that lack detectable secretion signals) as well as genes encoding secreted, cell-wall-associated subtilisin-like serine proteases, putative ABC transporters, ATP-binding proteins and other membrane-associated proteins. These membrane-associated and energy-providing proteins may function to secrete members of the ESAT-6 and CFP-10 protein families, and the proteases may be involved in processing the secreted peptide.

Journal ArticleDOI
TL;DR: The nitrilase superfamily consists of thiol enzymes involved in natural product biosynthesis and post-translational modification in plants, animals, fungi and certain prokaryotes and genetic and biochemical analysis of the family members and their associated domains assists in predicting the localization, specificity and cell biology of hundreds of uncharacterized protein sequences.
Abstract: The nitrilase superfamily consists of thiol enzymes involved in natural product biosynthesis and post-translational modification in plants, animals, fungi and certain prokaryotes. On the basis of sequence similarity and the presence of additional domains, the superfamily can be classified into 13 branches, nine of which have known or deduced specificity for specific nitrile- or amide-hydrolysis or amide-condensation reactions. Genetic and biochemical analysis of the family members and their associated domains assists in predicting the localization, specificity and cell biology of hundreds of uncharacterized protein sequences.

Journal ArticleDOI
TL;DR: The identification and cloning of all functional human odorant receptor genes is an important initial step in understanding receptor-ligand specificity and combinatorial encoding of odorant stimuli in human olfaction.
Abstract: The mammalian olfactory apparatus is able to recognize and distinguish thousands of structurally diverse volatile chemicals. This chemosensory function is mediated by a very large family of seven-transmembrane olfactory (odorant) receptors encoded by approximately 1,000 genes, the majority of which are believed to be pseudogenes in humans. The strategy of our sequence database mining for full-length, functional candidate odorant receptor genes was based on the high overall sequence similarity and presence of a number of conserved sequence motifs in all known mammalian odorant receptors as well as the absence of introns in their coding sequences. We report here the identification and physical cloning of 347 putative human full-length odorant receptor genes. Comparative sequence analysis of the predicted gene products allowed us to identify and define a number of consensus sequence motifs and structural features of this vast family of receptors. A new nomenclature for human odorant receptors based on their chromosomal localization and phylogenetic analysis is proposed. We believe that these sequences represent the essentially complete repertoire of functional human odorant receptors. The identification and cloning of all functional human odorant receptor genes is an important initial step in understanding receptor-ligand specificity and combinatorial encoding of odorant stimuli in human olfaction.

Journal ArticleDOI
TL;DR: The structures and functions of family-B GPCRs are described and a simplified nomenclature for these proteins is proposed.
Abstract: All G-protein-coupled receptors (GPCRs) share a common molecular architecture (with seven putative transmembrane segments) and a common signaling mechanism, in that they interact with G proteins (heterotrimeric GTPases) to regulate the synthesis of intracellular second messengers such as cyclic AMP, inositol phosphates, diacylglycerol and calcium ions. Historically, GPCRs have been classified into six families, which were thought to be unrelated; three of these are found in vertebrates. Recent work has identified several new GPCR families and suggested the possibility of a common evolutionary origin for all of them. Family B (the secretin-receptor family or 'family 2') of the GPCRs is a small but structurally and functionally diverse group of proteins that includes receptors for polypeptide hormones, molecules thought to mediate intercellular interactions at the plasma membrane and a group of Drosophila proteins that regulate stress responses and longevity. Family-B GPCRs have been found in all animal species investigated, including mammals, Caenorhabditis elegans and Drosophila melanogaster, but not in plants, fungi or prokaryotes. In this article, I describe the structures and functions of family-B GPCRs and propose a simplified nomenclature for these proteins.

Journal ArticleDOI
TL;DR: Analysis of variance (ANOVA) and feature selection methods designed to select genes showing strain- or region-dependent patterns of expression are concluded to be more powerful than methods based on fold-change thresholds and other ad hoc selection criteria.
Abstract: We performed a statistical analysis of a previously published set of gene expression microarray data from six different brain regions in two mouse strains. In the previous analysis, 24 genes showing expression differences between the strains and about 240 genes with regional differences in expression were identified. Like many gene expression studies, that analysis relied primarily on ad hoc 'fold change' and 'absent/present' criteria to select genes. To determine whether statistically motivated methods would give a more sensitive and selective analysis of gene expression patterns in the brain, we decided to use analysis of variance (ANOVA) and feature selection methods designed to select genes showing strain- or region-dependent patterns of expression. Our analysis revealed many additional genes that might be involved in behavioral differences between the two mouse strains and functional differences between the six brain regions. Using conservative statistical criteria, we identified at least 63 genes showing strain variation and approximately 600 genes showing regional variation. Unlike ad hoc methods, ours have the additional benefit of ranking the genes by statistical score, permitting further analysis to focus on the most significant. Comparison of our results to the previous studies and to published reports on individual genes show that we achieved high sensitivity while preserving selectivity. Our results indicate that molecular differences between the strains and regions studied are larger than indicated previously. We conclude that for large complex datasets, ANOVA and feature selection, alone or in combination, are more powerful than methods based on fold-change thresholds and other ad hoc selection criteria.
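
The gene-ranking idea in the entry above can be sketched with a one-way ANOVA F statistic computed per gene. This is a simplification: the study considers both strain and region effects, whereas the sketch treats a single grouping factor, and the function names and toy data are hypothetical. It shows how a statistical score, unlike a fold-change cutoff, accounts for within-group variability and gives a ranking.

```python
def one_way_f(groups):
    """One-way ANOVA F statistic for a list of groups of measurements."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    if ms_within == 0:
        # No within-group variation: score is unbounded if the means differ.
        return float("inf") if ms_between > 0 else 0.0
    return ms_between / ms_within


def rank_genes(expression):
    """Sort gene names by decreasing F statistic.

    expression maps a gene name to a list of groups (e.g. one per brain region).
    """
    return sorted(expression, key=lambda gene: one_way_f(expression[gene]),
                  reverse=True)


if __name__ == "__main__":
    toy = {
        "geneA": [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]],
        "geneB": [[1.0, 2.0, 3.0], [11.0, 12.0, 13.0]],
    }
    print(rank_genes(toy))
```

A real analysis would convert each F statistic to a p-value and correct for the number of genes tested; that step is omitted here.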

Journal ArticleDOI
TL;DR: The most plausible interpretation of this reconstruction is that Buchnera lost many genes through the fixation of large deletions soon after the acquisition of an obligate endosymbiotic lifestyle, suggesting that final genome composition may be partly the chance outcome of initial deletions and that neighboring genes influence the likelihood of loss of particular genes and pathways.
Abstract: Very small genomes have evolved repeatedly in eubacterial lineages that have adopted obligate associations with eukaryotic hosts. Complete genome sequences have revealed that small genomes retain very different gene sets, raising the question of how final genome content is determined. To examine the process of genome reduction, the tiny genome of the endosymbiont Buchnera aphidicola was compared to the larger ancestral genome, reconstructed on the basis of the phylogenetic distribution of gene orthologs among fully sequenced relatives of Escherichia coli and Buchnera. The reconstructed ancestral genome contained 2,425 open reading frames (ORFs). The Buchnera genome, containing 564 ORFs, consists of 153 fragments of 1-34 genes that are syntenic with reconstructed ancestral regions. On the basis of this reconstruction, 503 genes were eliminated within syntenic fragments, and 1,403 genes were lost from the gaps between syntenic fragments, probably in connection with genome rearrangements. Lost regions are sometimes large, and often span functionally unrelated genes. In addition, individual genes and regulatory regions have been lost or eroded. For the categories of DNA repair genes and rRNA genes, most lost loci fall in regions between syntenic fragments. This history of gene loss is reflected in the sequences of intergenic spacers at positions where genes were once present. The most plausible interpretation of this reconstruction is that Buchnera lost many genes through the fixation of large deletions soon after the acquisition of an obligate endosymbiotic lifestyle. An implication is that final genome composition may be partly the chance outcome of initial deletions and that neighboring genes influence the likelihood of loss of particular genes and pathways.

Journal ArticleDOI
TL;DR: The human genome contains many endogenous retroviral sequences, and these have been suggested to play important roles in a number of physiological and pathological processes.
Abstract: The human genome contains many endogenous retroviral sequences, and these have been suggested to play important roles in a number of physiological and pathological processes. Can the draft human genome sequences help us to define the role of these elements more closely?

Journal ArticleDOI
TL;DR: Preliminary data indicate a strongly transcellular component for the flux of water in roots, and a new NMR approach is introduced for the purpose of analyzing water movement in plant roots in vivo.
Abstract: In the post-genomic era newly sequenced genomes can be used to deduce organismal functions from our knowledge of other systems. Here we apply this approach to analyzing the aquaporin gene family in Arabidopsis thaliana. The aquaporins are intrinsic membrane proteins that have been characterized as facilitators of water flux. Originally termed major intrinsic proteins (MIPs), they are now also known as water channels, glycerol facilitators and aqua-glyceroporins, yet recent data suggest that they facilitate the movement of other low-molecular-weight metabolites as well. The Arabidopsis genome contains 38 sequences with homology to aquaporin in four subfamilies, termed PIP, TIP, NIP and SIP. We have analyzed aquaporin family structure and expression using the A. thaliana genome sequence, and introduce a new NMR approach for the purpose of analyzing water movement in plant roots in vivo. Our preliminary data indicate a strongly transcellular component for the flux of water in roots.

Journal ArticleDOI
TL;DR: This work combined in silico and experimental approaches to define the complete human nuclear receptor (NR) set and identified two novel NR sequences, both of which contain stop codons within their coding regions, indicating that they are pseudogenes.
Abstract: The availability of complete genome sequences enables all the members of a gene family to be identified without limitations imposed by temporal, spatial or quantitative aspects of mRNA expression. Using the nearly completed human genome sequence, we combined in silico and experimental approaches to define the complete human nuclear receptor (NR) set. This information was used to carry out a comparative genomic study of the NR superfamily. Our analysis of the human genome identified two novel NR sequences. Both these contained stop codons within the coding regions, indicating that both are pseudogenes. One (HNF4 γ-related) contained no introns and expressed no detectable mRNA, whereas the other (FXR-related) produced mRNA at relatively high levels in testis. If translated, the latter is predicted to encode a short, non-functional protein. Our analysis indicates that there are fewer than 50 functional human NRs, dramatically fewer than in Caenorhabditis elegans and about twice as many as in Drosophila. Using the complete human NR set we made comparisons with the NR sets of C. elegans and Drosophila. Searches for the >200 NRs unique to C. elegans revealed no human homologs. The comparative analysis also revealed a Drosophila member of NR subfamily NR3, confirming an ancient metazoan origin for this subfamily. This work provides the basis for new insights into the evolution and functional relationships of NR superfamily members.

Journal ArticleDOI
TL;DR: Common features of all importin-β-like transport factors are their ability to shuttle between the nucleus and the cytoplasm, their interaction with RanGTP, and their ability to recognize specific transport substrates.
Abstract: In recent years, our understanding of macromolecular transport processes across the nuclear envelope has grown dramatically, and a large number of soluble transport receptors mediating either nuclear import or nuclear export have been identified. Most of these receptors belong to one large family of proteins, all of which share homology with the protein import receptor importin β (also named karyopherin β). Members of this family have been classified as importins or exportins on the basis of the direction they carry their cargo. To date, the family includes 14 members in the yeast Saccharomyces cerevisiae and at least 22 members in humans. Importins and exportins are regulated by the small GTPase Ran, which is thought to be highly enriched in the nucleus in its GTP-bound form. Importins recognize their substrates in the cytoplasm and transport them through nuclear pores into the nucleus. In the nucleoplasm, RanGTP binds to importins, inducing the release of import cargoes. In contrast, exportins interact with their substrates only in the nucleus in the presence of RanGTP and release them after GTP hydrolysis in the cytoplasm, causing disassembly of the export complex. Thus, common features of all importin-β-like transport factors are their ability to shuttle between the nucleus and the cytoplasm, their interaction with RanGTP as well as their ability to recognize specific transport substrates.

Journal ArticleDOI
TL;DR: Tree harvesting, a supervised learning method for gene expression data, may require a large number of experimental samples to successfully discover interactions, but is a potentially useful tool for exploring gene expression data and identifying interesting clusters of genes worthy of further investigation.
Abstract: We propose a new method for supervised learning from gene expression data. We call it 'tree harvesting'. This technique starts with a hierarchical clustering of genes, then models the outcome variable as a sum of the average expression profiles of chosen clusters and their products. It can be applied to many different kinds of outcome measures such as censored survival times, or a response falling in two or more classes (for example, cancer classes). The method can discover genes that have strong effects on their own, and genes that interact with other genes. We illustrate the method on data from a lymphoma study, and on a dataset containing samples from eight different cancers. It identified some potentially interesting gene clusters. In simulation studies we found that the procedure may require a large number of experimental samples to successfully discover interactions. Tree harvesting is a potentially useful tool for exploration of gene expression data and identification of interesting clusters of genes worthy of further investigation.

Journal ArticleDOI
TL;DR: This review discusses recent studies of covalent histone modifications and the enzymatic machines that generate them.
Abstract: The modification of chromatin structure is important for a number of nuclear functions, exemplified by the regulation of transcription. This review discusses recent studies of covalent histone modifications and the enzymatic machines that generate them.

Journal ArticleDOI
TL;DR: Nuclear pore complexes, the conduits for information exchange between the nucleus and cytoplasm, appear broadly similar in eukaryotes from yeast to human and recent advances in the identification and characterization of components of the complex have provided new insights.
Abstract: Nuclear pore complexes, the conduits for information exchange between the nucleus and cytoplasm, appear broadly similar in eukaryotes from yeast to human. Precisely how nuclear pore complexes regulate macromolecular and ionic traffic remains unknown, but recent advances in the identification and characterization of components of the complex by proteomics and genomics have provided new insights.

Journal ArticleDOI
TL;DR: Comparative genome analyses, including chromosome painting in over 40 diverse mammalian species, ordered gene maps from several representatives of different mammalian and vertebrate orders, and large-scale sequencing of the human and mouse genomes are beginning to provide insight into the rates and patterns of chromosomal evolution on a whole-genome scale.
Abstract: Comparative genome analyses, including chromosome painting in over 40 diverse mammalian species, ordered gene maps from several representatives of different mammalian and vertebrate orders, and large-scale sequencing of the human and mouse genomes are beginning to provide insight into the rates and patterns of chromosomal evolution on a whole-genome scale, as well as into the forces that have sculpted the genomes of extant mammalian species.

Journal ArticleDOI
TL;DR: Gene order conservation is a genomic measure that can be useful for studying relationships between prokaryotes and the evolutionary forces shaping their genomes, and could be used as a valid phylogenetic measure to study relationships between species.
Abstract: As more complete genomes are sequenced, conservation of gene order between different organisms is emerging as an informative property of the genomes. Conservation of gene order has been used for predicting function and functional interactions of proteins, as well as for studying the evolutionary relationships between genomes. The reasons for the maintenance of gene order are still not well understood, as the organization of the prokaryote genome into operons and lateral gene transfer cannot possibly account for all the instances of conservation found. Comprehensive studies of gene order are one way of elucidating the nature of these maintaining forces. Gene order is extensively conserved between closely related species, but rapidly becomes less conserved among more distantly related organisms, probably in a cooperative fashion. This trend could be universal in prokaryotic genomes, as archaeal genomes are likely to behave similarly to bacterial genomes. Gene order conservation could therefore be used as a valid phylogenetic measure to study relationships between species. Even between very distant species, remnants of gene order conservation exist in the form of highly conserved clusters of genes. This suggests the existence of selective processes that maintain the organization of these regions. Because the clusters often span more than one operon, common regulation probably cannot be invoked as the cause of the maintenance of gene order. Gene order conservation is a genomic measure that can be useful for studying relationships between prokaryotes and the evolutionary forces shaping their genomes. Gene organization is extensively conserved in some genomic regions, and further studies are needed to elucidate the reason for this conservation.
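The abstract does not define its conservation measure, so the following is only one plausible way to quantify gene order conservation: the fraction of adjacent gene pairs in one genome whose orthologs are also adjacent in the other. The sketch assumes each genome is given as an ordered list of ortholog labels, that each label appears at most once per genome, and that strand orientation can be ignored. The function names are hypothetical.

```python
def adjacent_pairs(genome, circular=True):
    """Unordered pairs of neighbouring genes along a chromosome."""
    n = len(genome)
    last = n if circular else n - 1
    return {frozenset((genome[i], genome[(i + 1) % n])) for i in range(last)}


def gene_order_conservation(genome_a, genome_b, circular=True):
    """Fraction of neighbouring pairs in genome A that are also neighbours in genome B.

    Assumption: only genes with an ortholog in both genomes are considered,
    and adjacency is evaluated after removing the unshared genes.
    """
    shared = set(genome_a) & set(genome_b)
    a = [g for g in genome_a if g in shared]
    b = [g for g in genome_b if g in shared]
    if len(a) < 2:
        return 0.0
    pairs_a = adjacent_pairs(a, circular)
    pairs_b = adjacent_pairs(b, circular)
    return len(pairs_a & pairs_b) / len(pairs_a)
```

Computing this score for many genome pairs and plotting it against a sequence-based distance would be one way to examine how conservation declines with divergence, which is the relationship the abstract describes.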

Journal ArticleDOI
TL;DR: The number of UBC genes appears to increase with developmental complexity, and the results suggest functional overlap in many of these enzymes.
Abstract: The eukaryotic ubiquitin-conjugation system sets the turnover rate of many proteins and includes activating enzymes (E1s), conjugating enzymes (UBCs/E2s), and ubiquitin-protein ligases (E3s), which are responsible for activation, covalent attachment and substrate recognition, respectively. There are also ubiquitin-like proteins with distinct functions, which require their own E1s and E2s for attachment. We describe the results of RNA interference (RNAi) experiments on the E1s, UBC/E2s and ubiquitin-like proteins in Caenorhabditis elegans. We also present a phylogenetic analysis of UBCs. The C. elegans genome encodes 20 UBCs and three ubiquitin E2 variant proteins. RNAi shows that only four UBCs are essential for embryogenesis: LET-70 (UBC-2), a functional homolog of yeast Ubc4/5p, UBC-9, an ortholog of yeast Ubc9p, which transfers the ubiquitin-like modifier SUMO, UBC-12, an ortholog of yeast Ubc12p, which transfers the ubiquitin-like modifier Rub1/Nedd8, and UBC-14, an ortholog of Drosophila Courtless. RNAi of ubc-20, an ortholog of yeast UBC1, results in a low frequency of arrested larval development. A phylogenetic analysis of C. elegans, Drosophila and human UBCs shows that this protein family can be divided into 18 groups, 13 of which include members from all three species. The activating enzymes and the ubiquitin-like proteins NED-8 and SUMO are required for embryogenesis. The number of UBC genes appears to increase with developmental complexity, and our results suggest functional overlap in many of these enzymes. The ubiquitin-like proteins NED-8 and SUMO and their corresponding activating enzymes are required for embryogenesis.

Journal ArticleDOI
TL;DR: Granzymes, a family of serine proteases, are expressed exclusively by cytotoxic T lymphocytes and natural killer cells, components of the immune system that protect higher organisms against viral infection and cellular transformation.
Abstract: Granzymes, a family of serine proteases, are expressed exclusively by cytotoxic T lymphocytes and natural killer (NK) cells, components of the immune system that protect higher organisms against viral infection and cellular transformation. Following receptor-mediated conjugate formation between a granzyme-containing cell and an infected or transformed target cell, granzymes enter the target cell via endocytosis and induce apoptosis. Granzyme B is the most powerful pro-apoptotic member of the granzyme family. Like caspases, cysteine proteases that play an important role in apoptosis, it can cleave proteins after acidic residues, especially aspartic acid. Other granzymes may serve additional functions, and some may not induce apoptosis. Granzymes have been well characterized only in human and rodents, and can be grouped into three subfamilies according to substrate specificity: members of the granzyme family that have enzymatic activity similar to the serine protease chymotrypsin are encoded by a gene cluster termed the 'chymase locus'; granzymes with trypsin-like specificities are encoded by the 'tryptase locus'; and a third subfamily cleaves after unbranched hydrophobic residues, especially methionine, and is encoded by the 'Met-ase locus'. All granzymes are synthesized as zymogens and, after clipping of the leader peptide, maximal enzymatic activity is achieved by removal of an amino-terminal dipeptide. They can all be blocked by serine protease inhibitors, and a new group of inhibitors has recently been identified - serpins, some of which are specific for granzymes. Future studies of serpins may bring insights into how cells that synthesize granzymes are protected from inadvertent cell suicide.

Journal ArticleDOI
TL;DR: Antisense oligonucleotides provide a promising approach to investigating gene function in vivo, but their ability to offer unambiguous insights into phenotypes has been debated.
Abstract: Antisense oligonucleotides provide a promising approach to investigating gene function in vivo, but their ability to offer unambiguous insights into phenotypes has been debated. The recent use of morpholino antisense oligonucleotides in zebrafish embryos may prove a major advance, but rigorous controls are essential.

Journal ArticleDOI
TL;DR: Taking the amino-acid frequency into account decreases the fraction of random codes that beat the natural code, and leads to an attempt to propose a tentative picture of primitive life.
Abstract: The genetic code is known to be efficient in limiting the effect of mistranslation errors. A misread codon often codes for the same amino acid or one with similar biochemical properties, so the structure and function of the coded protein remain relatively unaltered. Previous studies have attempted to address this question quantitatively, by estimating the fraction of randomly generated codes that do better than the genetic code in respect of overall robustness. We extended these results by investigating the role of amino-acid frequencies in the optimality of the genetic code. We found that taking the amino-acid frequency into account decreases the fraction of random codes that beat the natural code. This effect is particularly pronounced when more refined measures of the amino-acid substitution cost are used than hydrophobicity. To show this, we devised a new cost function by evaluating in silico the change in folding free energy caused by all possible point mutations in a set of protein structures. With this function, which measures protein stability while being unrelated to the code's structure, we estimated that around two random codes in a billion (10^9) are fitter than the natural code. When alternative codes are restricted to those that interchange biosynthetically related amino acids, the genetic code appears even more optimal. These results lead us to discuss the role of amino-acid frequencies and other parameters in the genetic code's evolution, in an attempt to propose a tentative picture of primitive life.
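The comparison of the natural code against random codes can be sketched as a small Monte Carlo estimate. The sketch rests on several assumptions not taken from the abstract: the Kyte-Doolittle hydropathy scale stands in for the substitution cost (the authors' folding-free-energy cost function is not reproduced), random codes are generated by permuting the 20 amino acids among the synonymous codon blocks of the standard code with stop codons fixed, and the optional `aa_freq` weighting spreads each amino acid's frequency evenly over its codons. All function names are hypothetical.

```python
import random
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order (NCBI translation table 1); '*' marks stop codons.
AA_STRING = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]
STANDARD_CODE = dict(zip(CODONS, AA_STRING))

# Kyte-Doolittle hydropathy values, used here only as an illustrative cost scale.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def code_cost(code, aa_freq=None):
    """Weighted mean squared hydropathy change over all single-base misreadings."""
    n_codons = {}
    for aa in code.values():
        n_codons[aa] = n_codons.get(aa, 0) + 1
    total = weight_sum = 0.0
    for codon, aa in code.items():
        if aa == "*":
            continue
        # Without frequencies every codon counts equally.
        w = 1.0 if aa_freq is None else aa_freq[aa] / n_codons[aa]
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                new_aa = code[codon[:pos] + base + codon[pos + 1:]]
                if new_aa == "*":
                    continue  # misreadings to stop codons are ignored in this sketch
                total += w * (HYDROPATHY[aa] - HYDROPATHY[new_aa]) ** 2
                weight_sum += w
    return total / weight_sum


def random_code(rng):
    """Permute amino acids among the synonymous blocks of the standard code."""
    amino_acids = sorted(set(STANDARD_CODE.values()) - {"*"})
    shuffled = amino_acids[:]
    rng.shuffle(shuffled)
    mapping = dict(zip(amino_acids, shuffled))
    mapping["*"] = "*"
    return {codon: mapping[aa] for codon, aa in STANDARD_CODE.items()}


def fraction_better(n_codes, aa_freq=None, seed=0):
    """Fraction of sampled random codes with a lower cost than the standard code."""
    rng = random.Random(seed)
    natural = code_cost(STANDARD_CODE, aa_freq)
    better = sum(code_cost(random_code(rng), aa_freq) < natural for _ in range(n_codes))
    return better / n_codes
```

Calling `fraction_better(10000)` and then repeating the call with a dictionary of observed amino-acid frequencies passed as `aa_freq` would mirror the comparison the abstract reports. Resolving a fraction as small as two in 10^9 would need far more samples than this naive loop can process in reasonable time.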