Showing papers on "Gene" published in 2002


Journal Article
TL;DR: Describes how BLAT was optimized; BLAT is more accurate and 500 times faster than popular existing tools for mRNA/DNA alignments and 50 times faster for protein alignments at sensitivity settings typically used when comparing vertebrate sequences.
Abstract: Analyzing vertebrate genomes requires rapid mRNA/DNA and cross-species protein alignments. A new tool, BLAT, is more accurate and 500 times faster than popular existing tools for mRNA/DNA alignments and 50 times faster for protein alignments at sensitivity settings typically used when comparing vertebrate sequences. BLAT's speed stems from an index of all nonoverlapping K-mers in the genome. This index fits inside the RAM of inexpensive computers, and need only be computed once for each genome assembly. BLAT has several major stages. It uses the index to find regions in the genome likely to be homologous to the query sequence. It performs an alignment between homologous regions. It stitches together these aligned regions (often exons) into larger alignments (typically genes). Finally, BLAT revisits small internal exons possibly missed at the first stage and adjusts large gap boundaries that have canonical splice sites where feasible. This paper describes how BLAT was optimized. Effects on speed and sensitivity are explored for various K-mer sizes, mismatch schemes, and number of required index matches. BLAT is compared with other alignment programs on various test sets and then used in several genome-wide applications. http://genome.ucsc.edu hosts a web-based BLAT server for the human genome.

8,326 citations
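The indexing idea in the BLAT abstract can be illustrated with a small sketch: the genome is indexed only at nonoverlapping K-mer positions (0, K, 2K, ...), while the query is scanned at every offset, so any exact match of at least 2K - 1 bases still contains one indexed K-mer. Grouping hits by diagonal and the `min_hits` threshold are illustrative assumptions standing in for BLAT's "number of required index matches"; the function names are hypothetical, and this is not BLAT's actual code.

```python
from collections import defaultdict


def build_index(genome, k):
    """Index every nonoverlapping k-mer of the genome (positions 0, k, 2k, ...)."""
    index = defaultdict(list)
    for pos in range(0, len(genome) - k + 1, k):
        index[genome[pos:pos + k]].append(pos)
    return index


def find_hits(index, query, k):
    """Look up every overlapping k-mer of the query; return (query_pos, genome_pos) pairs."""
    hits = []
    for qpos in range(len(query) - k + 1):
        # .get() does not insert missing keys into the defaultdict
        for gpos in index.get(query[qpos:qpos + k], ()):
            hits.append((qpos, gpos))
    return hits


def candidate_diagonals(hits, min_hits=2):
    """Group hits by diagonal (genome_pos - query_pos); keep diagonals with enough hits."""
    counts = defaultdict(int)
    for qpos, gpos in hits:
        counts[gpos - qpos] += 1
    return sorted(d for d, c in counts.items() if c >= min_hits)


genome = "AAAACCCCGGGGTTTTACGT"
index = build_index(genome, k=4)
hits = find_hits(index, "CCCCGGGG", k=4)
print(hits, candidate_diagonals(hits))
```

Here the query "CCCCGGGG" hits the indexed K-mers at genome positions 4 and 8, and both hits fall on the same diagonal, which marks genome offset 4 as a candidate homologous region for the later alignment stage.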


Journal Article
TL;DR: In this article, a Support Vector Machine (SVM) method based on recursive feature elimination (RFE) was proposed to select a small subset of genes from broad patterns of gene expression data, recorded on DNA micro-arrays.
Abstract: DNA micro-arrays now permit scientists to screen thousands of genes simultaneously and determine whether those genes are active, hyperactive or silent in normal or cancerous tissue. Because these new micro-array devices generate bewildering amounts of raw data, new analytical methods must be developed to sort out whether cancer tissues have distinctive signatures of gene expression over normal tissues or other types of cancer tissues. In this paper, we address the problem of selection of a small subset of genes from broad patterns of gene expression data, recorded on DNA micro-arrays. Using available training examples from cancer and normal patients, we build a classifier suitable for genetic diagnosis, as well as drug discovery. Previous attempts to address this problem select genes with correlation techniques. We propose a new method of gene selection utilizing Support Vector Machine methods based on Recursive Feature Elimination (RFE). We demonstrate experimentally that the genes selected by our techniques yield better classification performance and are biologically relevant to cancer. In contrast with the baseline method, our method eliminates gene redundancy automatically and yields better and more compact gene subsets. In patients with leukemia our method discovered 2 genes that yield zero leave-one-out error, while 64 genes are necessary for the baseline method to get the best result (one leave-one-out error). In the colon cancer database, using only 4 genes our method is 98% accurate, while the baseline method is only 86% accurate.

7,939 citations
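The recursive feature elimination loop can be sketched in pure Python: train a linear classifier, rank features by their squared weight, drop the weakest, and repeat. The Pegasos-style subgradient trainer below is a stand-in for a proper SVM solver, and the missing bias term, one-at-a-time elimination and toy data are all assumptions; this is a sketch of the technique, not the authors' implementation.

```python
def train_linear_svm(X, y, lam=0.01, epochs=200):
    """Pegasos-style subgradient descent for a linear SVM with no bias term (assumption)."""
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):  # deterministic pass over the samples
            t += 1
            eta = 1.0 / (lam * t)
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            # Gradient step on the L2 regularizer shrinks every weight
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                # Hinge loss is active for this sample: move toward y * x
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def svm_rfe(X, y, lam=0.01, epochs=200):
    """Return feature indices ranked from most to least informative."""
    remaining = list(range(len(X[0])))
    eliminated = []
    while remaining:
        X_sub = [[row[j] for j in remaining] for row in X]
        w = train_linear_svm(X_sub, y, lam, epochs)
        # Ranking criterion: squared weight; drop the least useful feature
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        eliminated.append(remaining.pop(worst))
    # Features eliminated last are the most informative
    return eliminated[::-1]


# Toy data (hypothetical): only feature 0 separates the two classes
X = [[2.0, 0.1, -0.1], [1.5, -0.2, 0.2], [-2.0, 0.1, 0.1], [-1.5, -0.1, -0.2]]
y = [1, 1, -1, -1]
print(svm_rfe(X, y))
```

Because each round retrains on the surviving features, a feature that looks useful only in combination with a redundant one can lose its weight once its partner is gone, which is how the elimination discards redundant genes that a one-at-a-time correlation ranking would keep.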


Journal Article
06 Dec 2002-Science
TL;DR: The protein kinase complement of the human genome is catalogued using public and proprietary genomic, complementary DNA, and expressed sequence tag sequences to provide a starting point for comprehensive analysis of protein phosphorylation in normal and disease states and a detailed view of the current state of human genome analysis through a focus on one large gene family.
Abstract: We have catalogued the protein kinase complement of the human genome (the "kinome") using public and proprietary genomic, complementary DNA, and expressed sequence tag (EST) sequences. This provides a starting point for comprehensive analysis of protein phosphorylation in normal and disease states, as well as a detailed view of the current state of human genome analysis through a focus on one large gene family. We identify 518 putative protein kinase genes, of which 71 have not previously been reported or described as kinases, and we extend or correct the protein sequences of 56 more kinases. New genes include members of well-studied families as well as previously unidentified families, some of which are conserved in model organisms. Classification and comparison with model organism kinomes identified orthologous groups and highlighted expansions specific to human and other lineages. We also identified 106 protein kinase pseudogenes. Chromosomal mapping revealed several small clusters of kinase genes and revealed that 244 kinases map to disease loci or cancer amplicons.

7,486 citations


Journal Article
TL;DR: Detailed deletion and expression analysis shows that miR15 and miR16 are located within a 30-kb region of loss in CLL, and that both genes are deleted or down-regulated in the majority (≈68%) of CLL cases.
Abstract: Micro-RNAs (miR genes) are a large family of highly conserved noncoding genes thought to be involved in temporal and tissue-specific gene regulation. MiRs are transcribed as short hairpin precursors (≈70 nt) and are processed into active 21- to 22-nt RNAs by Dicer, a ribonuclease that recognizes target mRNAs via base-pairing interactions. Here we show that miR15 and miR16 are located at chromosome 13q14, a region deleted in more than half of B cell chronic lymphocytic leukemias (B-CLL). Detailed deletion and expression analysis shows that miR15 and miR16 are located within a 30-kb region of loss in CLL, and that both genes are deleted or down-regulated in the majority (≈68%) of CLL cases.

5,113 citations


Journal Article
19 Apr 2002-Science
TL;DR: It is shown that siRNA expression mediated by this vector causes efficient and specific down-regulation of gene expression, resulting in functional inactivation of the targeted genes.
Abstract: Mammalian genetic approaches to study gene function have been hampered by the lack of tools to generate stable loss-of-function phenotypes efficiently. We report here a new vector system, named pSUPER, which directs the synthesis of small interfering RNAs (siRNAs) in mammalian cells. We show that siRNA expression mediated by this vector causes efficient and specific down-regulation of gene expression, resulting in functional inactivation of the targeted genes. Stable expression of siRNAs using this vector mediates persistent suppression of gene expression, allowing the analysis of loss-of-function phenotypes that develop over longer periods of time. Therefore, the pSUPER vector constitutes a new and powerful system to analyze gene function in a variety of mammalian cell types.

4,937 citations


Journal Article
25 Jul 2002-Nature
TL;DR: Previously known and new genes are shown to be necessary for optimal growth under six well-studied conditions (high salt, sorbitol, galactose, pH 8, minimal medium and nystatin treatment), and fewer than 7% of genes that exhibit a significant increase in messenger RNA expression are also required for optimal growth in four of the tested conditions.
Abstract: Determining the effect of gene deletion is a fundamental approach to understanding gene function. Conventional genetic screens exhibit biases, and genes contributing to a phenotype are often missed. We systematically constructed a nearly complete collection of gene-deletion mutants (96% of annotated open reading frames, or ORFs) of the yeast Saccharomyces cerevisiae. DNA sequences dubbed 'molecular bar codes' uniquely identify each strain, enabling their growth to be analysed in parallel and the fitness contribution of each gene to be quantitatively assessed by hybridization to high-density oligonucleotide arrays. We show that previously known and new genes are necessary for optimal growth under six well-studied conditions: high salt, sorbitol, galactose, pH 8, minimal medium and nystatin treatment. Less than 7% of genes that exhibit a significant increase in messenger RNA expression are also required for optimal growth in four of the tested conditions. Our results validate the yeast gene-deletion collection as a valuable resource for functional genomics.

4,328 citations
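One way to picture the parallel bar-code readout is to compare each strain's bar-code signal in a treated pool against a reference pool. The log2 ratio, the pseudocount, the -1 cutoff and the strain names below are all hypothetical choices for illustration; the abstract does not specify the scoring used in the study.

```python
import math


def fitness_scores(reference, treated, pseudocount=1.0):
    """Hypothetical per-strain score: log2 ratio of bar-code signal, treated vs reference pool."""
    scores = {}
    for strain, ref_signal in reference.items():
        # The pseudocount avoids log(0) for strains that drop out completely
        ratio = (treated[strain] + pseudocount) / (ref_signal + pseudocount)
        scores[strain] = math.log2(ratio)
    return scores


def growth_defects(scores, cutoff=-1.0):
    """Strains whose relative abundance fell at least two-fold (assumed cutoff)."""
    return sorted(strain for strain, value in scores.items() if value <= cutoff)


reference = {"strain_a": 100, "strain_b": 100, "strain_c": 80}
treated = {"strain_a": 100, "strain_b": 25, "strain_c": 80}
scores = fitness_scores(reference, treated, pseudocount=0.0)
print(growth_defects(scores))  # strain_b fell four-fold
```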


Journal Article
03 Oct 2002-Nature
TL;DR: The genome sequence of P. falciparum clone 3D7 is reported, which is the most (A + T)-rich genome sequenced to date and is being exploited in the search for new drugs and vaccines to fight malaria.
Abstract: The parasite Plasmodium falciparum is responsible for hundreds of millions of cases of malaria, and kills more than one million African children annually. Here we report an analysis of the genome sequence of P. falciparum clone 3D7. The 23-megabase nuclear genome consists of 14 chromosomes, encodes about 5,300 genes, and is the most (A + T)-rich genome sequenced to date. Genes involved in antigenic variation are concentrated in the subtelomeric regions of the chromosomes. Compared to the genomes of free-living eukaryotic microbes, the genome of this intracellular parasite encodes fewer enzymes and transporters, but a large proportion of genes are devoted to immune evasion and host-parasite interactions. Many nuclear-encoded proteins are targeted to the apicoplast, an organelle involved in fatty-acid and isoprenoid metabolism. The genome sequence provides the foundation for future studies of this organism, and is being exploited in the search for new drugs and vaccines to fight malaria.

4,312 citations
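To make the "(A + T)-rich" comparison concrete, base composition can be computed as the fraction of A or T among unambiguous bases. Skipping ambiguity codes such as N is an assumption of this sketch; the function name is hypothetical.

```python
def at_content(seq):
    """Fraction of A or T among unambiguous bases (A, C, G, T); other symbols such as N are skipped."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return at / (at + gc)


print(at_content("AATTGC"))  # 4 of 6 bases are A or T
```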


Journal Article
05 Apr 2002-Science
TL;DR: A draft sequence of the rice genome for the most widely cultivated subspecies in China, Oryza sativa L. ssp. indica, is produced by whole-genome shotgun sequencing; the large proportion of rice genes with no recognizable homologs is attributed to a gradient in the GC content of rice coding sequences.
Abstract: We have produced a draft sequence of the rice genome for the most widely cultivated subspecies in China, Oryza sativa L. ssp. indica, by whole-genome shotgun sequencing. The genome was 466 megabases in size, with an estimated 46,022 to 55,615 genes. Functional coverage in the assembled sequences was 92.0%. About 42.2% of the genome was in exact 20-nucleotide oligomer repeats, and most of the transposons were in the intergenic regions between genes. Although 80.6% of predicted Arabidopsis thaliana genes had a homolog in rice, only 49.4% of predicted rice genes had a homolog in A. thaliana. The large proportion of rice genes with no recognizable homologs is due to a gradient in the GC-content of rice coding sequences.

4,064 citations


Book Chapter
TL;DR: The yeast Saccharomyces cerevisiae is now recognized as a model system representing a simple eukaryote whose genome can be easily manipulated; the development of DNA transformation has made it particularly accessible to gene cloning and genetic engineering techniques.
Abstract (Publisher Summary): The yeast Saccharomyces cerevisiae is now recognized as a model system representing a simple eukaryote whose genome can be easily manipulated. Yeast has only a slightly greater genetic complexity than bacteria and shares many of the technical advantages that permitted rapid progress in the molecular genetics of prokaryotes and their viruses. Some of the properties that make yeast particularly suitable for biological studies include rapid growth, dispersed cells, the ease of replica plating and mutant isolation, a well-defined genetic system, and most important, a highly versatile DNA transformation system. Being nonpathogenic, yeast can be handled with few precautions. Large quantities of normal baker's yeast are commercially available and can provide a cheap source for biochemical studies. The development of DNA transformation has made yeast particularly accessible to gene cloning and genetic engineering techniques. Structural genes corresponding to virtually any genetic trait can be identified by complementation from plasmid libraries. Plasmids can be introduced into yeast cells either as replicating molecules or by integration into the genome. In contrast to most other organisms, integrative recombination of transforming DNA in yeast proceeds exclusively via homologous recombination. Cloned yeast sequences, accompanied by foreign sequences on plasmids, can therefore be directed at will to specific locations in the genome.

3,547 citations


Journal Article
25 Oct 2002-Science
TL;DR: This work determines how most of the transcriptional regulators encoded in the eukaryote Saccharomyces cerevisiae associate with genes across the genome in living cells, and identifies network motifs, the simplest units of network architecture, and demonstrates that an automated process can use motifs to assemble a transcriptional regulatory network structure.
Abstract: We have determined how most of the transcriptional regulators encoded in the eukaryote Saccharomyces cerevisiae associate with genes across the genome in living cells. Just as maps of metabolic networks describe the potential pathways that may be used by a cell to accomplish metabolic processes, this network of regulator-gene interactions describes potential pathways yeast cells can use to regulate global gene expression programs. We use this information to identify network motifs, the simplest units of network architecture, and demonstrate that an automated process can use motifs to assemble a transcriptional regulatory network structure. Our results reveal that eukaryotic cellular functions are highly connected through networks of transcriptional regulators that regulate other transcriptional regulators.

3,127 citations
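Network motifs can be found by scanning a directed regulator-to-gene graph for small recurring patterns. The sketch below searches for one commonly studied motif, the feed-forward loop (regulator X binds gene Z and also binds regulator Y, which in turn binds Z); choosing this motif, and the regulator and gene names, are assumptions for illustration only.

```python
from collections import defaultdict


def feed_forward_loops(edges):
    """Find (X, Y, Z) with X->Y, X->Z and Y->Z, where X and Y are regulators."""
    targets = defaultdict(set)
    for regulator, gene in edges:
        targets[regulator].add(gene)
    loops = []
    for x in list(targets):
        for y in targets[x]:
            # Y must itself be a regulator (have outgoing edges) and differ from X
            if y == x or y not in targets:
                continue
            for z in targets[x] & targets[y]:
                if z not in (x, y):
                    loops.append((x, y, z))
    return sorted(loops)


# Hypothetical binding data: (regulator, bound gene)
edges = [("A", "B"), ("A", "G"), ("B", "G"), ("C", "G")]
print(feed_forward_loops(edges))  # A regulates B, and both bind G
```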


Journal Article
09 May 2002-Nature
TL;DR: The 8,667,507 base pair linear chromosome of Streptomyces coelicolor is reported, containing the largest number of genes so far discovered in a bacterium.
Abstract: Streptomyces coelicolor is a representative of the group of soil-dwelling, filamentous bacteria responsible for producing most natural antibiotics used in human and veterinary medicine. Here we report the 8,667,507 base pair linear chromosome of this organism, containing the largest number of genes so far discovered in a bacterium. The 7,825 predicted genes include more than 20 clusters coding for known or predicted secondary metabolites. The genome contains an unprecedented proportion of regulatory genes, predominantly those likely to be involved in responses to external stimuli and stresses, and many duplicated gene sets that may represent 'tissue-specific' isoforms operating in different phases of colonial development, a unique situation for a bacterium. An ancient synteny was revealed between the central 'core' of the chromosome and the whole chromosome of pathogens Mycobacterium tuberculosis and Corynebacterium diphtheriae. The genome sequence will greatly increase our understanding of microbial life in the soil as well as aiding the generation of new drug candidates by genetic engineering.

Journal Article
TL;DR: Compares discrimination methods for classifying tumors from gene expression data, including nearest-neighbor classifiers, linear discriminant analysis and classification trees, applied to datasets from three recently published cancer gene expression studies.
Abstract: A reliable and precise classification of tumors is essential for successful diagnosis and treatment of cancer. cDNA microarrays and high-density oligonucleotide chips are novel biotechnologies increasingly used in cancer research. By allowing the monitoring of expression levels in cells for thousands of genes simultaneously, microarray experiments may lead to a more complete understanding of the molecular variations among tumors and hence to a finer and more informative classification. The ability to successfully distinguish between tumor classes (already known or yet to be discovered) using gene expression data is an important aspect of this novel approach to cancer classification. This article compares the performance of different discrimination methods for the classification of tumors based on gene expression data. The methods include nearest-neighbor classifiers, linear discriminant analysis, and classification trees. Recent machine learning approaches, such as bagging and boosting, are also considered...
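A nearest-neighbor classifier, one of the methods compared above, can be sketched together with leave-one-out error counting. Euclidean distance, k = 3 and the two-dimensional toy "expression profiles" are assumptions; real studies use thousands of genes and may use other distances.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training samples closest to x (Euclidean distance)."""
    neighbours = sorted(zip((math.dist(xi, x) for xi in train_X), train_y))
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


def loo_errors(X, y, k=3):
    """Leave-one-out: classify each sample using all the others as training data."""
    errors = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, X[i], k) != y[i]:
            errors += 1
    return errors


# Hypothetical two-gene profiles for two tumor classes
X = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
y = ["ALL", "ALL", "ALL", "AML", "AML", "AML"]
print(loo_errors(X, y))  # well-separated clusters give no leave-one-out errors
```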

Journal Article
TL;DR: The results support the notion that the clinical behavior of prostate cancer is linked to underlying gene expression differences that are detectable at the time of diagnosis.

Journal Article
TL;DR: Recent advances in understanding the regulation of PC biosynthesis and MT gene expression and the possible roles of PCs and MTs in heavy metal detoxification and homeostasis are reviewed.
Abstract: Among the heavy metal-binding ligands in plant cells the phytochelatins (PCs) and metallothioneins (MTs) are the best characterized. PCs and MTs are different classes of cysteine-rich, heavy metal-binding protein molecules. PCs are enzymatically synthesized peptides, whereas MTs are gene-encoded polypeptides. Recently, genes encoding the enzyme PC synthase have been identified in plants and other species while the completion of the Arabidopsis genome sequence has allowed the identification of the entire suite of MT genes in a higher plant. Recent advances in understanding the regulation of PC biosynthesis and MT gene expression and the possible roles of PCs and MTs in heavy metal detoxification and homeostasis are reviewed.

Journal Article
Robert A. Holt, G. Mani Subramanian, Aaron L. Halpern, Granger G. Sutton, Rosane Charlab, Deborah R. Nusskern, Patrick Wincker, Andrew G. Clark, José M. C. Ribeiro, Ron Wides, Steven L. Salzberg, Brendan J. Loftus, Mark Yandell, William H. Majoros, Douglas B. Rusch, Zhongwu Lai, Cheryl L. Kraft, Josep F. Abril, Véronique Anthouard, Peter Arensburger, Peter W. Atkinson, Holly Baden, Véronique de Berardinis, Danita Baldwin, Vladimir Benes, Jim Biedler, Claudia Blass, Randall Bolanos, Didier Boscus, Mary Barnstead, Shuang Cai, Kabir Chatuverdi, George K. Christophides, Mathew A. Chrystal, Michele Clamp, Anibal Cravchik, Val Curwen, Ali N. Dana, Arthur L. Delcher, Ian M. Dew, Cheryl A. Evans, Michael Flanigan, Anne Grundschober-Freimoser, Lisa Friedli, Zhiping Gu, Ping Guan, Roderic Guigó, Maureen E. Hillenmeyer, Susanne L. Hladun, James R. Hogan, Young S. Hong, Jeffrey Hoover, Olivier Jaillon, Zhaoxi Ke, Chinnappa D. Kodira, E. B. Kokoza, Anastasios C. Koutsos, Ivica Letunic, Alex Levitsky, Yong Liang, Jhy-Jhu Lin, Neil F. Lobo, John Lopez, Joel A. Malek, Tina C. McIntosh, Stephan Meister, Jason R. Miller, Clark M. Mobarry, Emmanuel Mongin, Sean D. Murphy, David A. O'Brochta, Cynthia Pfannkoch, Rong Qi, Megan A. Regier, Karin A. Remington, Hongguang Shao, Maria V. Sharakhova, Cynthia Sitter, Jyoti Shetty, Thomas J. Smith, Renee Strong, Jingtao Sun, Dana Thomasova, Lucas Q. Ton, Pantelis Topalis, Zhijian Tu, Maria F. Unger, Brian P. Walenz, Aihui Wang, Jian Wang, Mei Wang, X. Wang, Kerry J. Woodford, Jennifer R. Wortman, Martin Wu, Alison Yao, Evgeny M. Zdobnov, Hongyu Zhang, Qi Zhao, Shaying Zhao, Shiaoping C. Zhu, Igor F. Zhimulev, Mario Coluzzi, Alessandra della Torre, Charles Roth, Christos Louis, Francis Kalush, Richard J. Mural, Eugene W. Myers, Mark Raymond Adams, Hamilton O. Smith, Samuel Broder, Malcolm J. Gardner, Claire M. Fraser, Ewan Birney, Peer Bork, Paul T. Brey, J. Craig Venter, Jean Weissenbach, Fotis C. Kafatos, Frank H. Collins, Stephen L. Hoffman
04 Oct 2002-Science
TL;DR: Analysis of the PEST strain of A. gambiae revealed strong evidence for about 14,000 protein-encoding transcripts, and prominent expansions in specific families of proteins likely involved in cell adhesion and immunity were noted.
Abstract: Anopheles gambiae is the principal vector of malaria, a disease that afflicts more than 500 million people and causes more than 1 million deaths each year. Tenfold shotgun sequence coverage was obtained from the PEST strain of A. gambiae and assembled into scaffolds that span 278 million base pairs. A total of 91% of the genome was organized in 303 scaffolds; the largest scaffold was 23.1 million base pairs. There was substantial genetic variation within this strain, and the apparent existence of two haplotypes of approximately equal frequency ("dual haplotypes") in a substantial fraction of the genome likely reflects the outbred nature of the PEST strain. The sequence produced a conservative inference of more than 400,000 single-nucleotide polymorphisms that showed a markedly bimodal density distribution. Analysis of the genome sequence revealed strong evidence for about 14,000 protein-encoding transcripts. Prominent expansions in specific families of proteins likely involved in cell adhesion and immunity were noted. An expressed sequence tag analysis of genes regulated by blood feeding provided insights into the physiological adaptations of a hematophagous insect.

Journal Article
TL;DR: A full-length cDNA microarray containing approximately 7000 independent, full-length cDNA groups is prepared to analyse the expression profiles of genes under drought, cold (low temperature) and high-salinity stress conditions over time; the results suggest that various transcriptional regulatory mechanisms function in the drought, cold or high-salinity stress signal transduction pathways.
Abstract: Full-length cDNAs are essential for functional analysis of plant genes in the post-sequencing era of the Arabidopsis genome. Recently, cDNA microarray analysis has been developed for the quantitative, global and simultaneous analysis of expression profiles. We have prepared a full-length cDNA microarray containing approximately 7000 independent, full-length cDNA groups to analyse the expression profiles of genes under drought, cold (low temperature) and high-salinity stress conditions over time. The transcripts of 53, 277 and 194 genes increased after cold, drought and high-salinity treatments, respectively, more than fivefold compared with the control genes. We also identified many highly drought-, cold- or high-salinity-stress-inducible genes. However, we observed strong relationships in the expression of these stress-responsive genes based on Venn diagram analysis, and found 22 stress-inducible genes that responded to all three stresses. Several gene groups showing different expression profiles were identified by analysis of their expression patterns during stress-responsive gene induction. The cold-inducible genes were classified into at least two gene groups from their expression profiles. DREB1A was included in a group whose expression peaked at 2 h after cold treatment. Among the drought, cold or high-salinity stress-inducible genes identified, we found 40 transcription factor genes (corresponding to approximately 11% of all stress-inducible genes identified), suggesting that various transcriptional regulatory mechanisms function in the drought, cold or high-salinity stress signal transduction pathways.
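The fivefold-induction filter and the Venn diagram comparison reduce to set operations. In this sketch each gene's signal after treatment is compared with its own control signal; that reading of "compared with the control genes", the gene names and the values are assumptions for illustration.

```python
def induced_genes(treated, control, fold=5.0):
    """Genes whose signal after treatment exceeds `fold` times their control signal."""
    return {gene for gene, signal in treated.items() if signal > fold * control[gene]}


# Hypothetical signals for four genes
control = {"g1": 1, "g2": 1, "g3": 2, "g4": 1}
cold = induced_genes({"g1": 6, "g2": 2, "g3": 11, "g4": 5}, control)
drought = induced_genes({"g1": 7, "g2": 8, "g3": 12, "g4": 1}, control)
salt = induced_genes({"g1": 10, "g2": 6, "g3": 1, "g4": 1}, control)

# The centre of the three-way Venn diagram: genes induced by every stress
print(sorted(cold & drought & salt))
```

Note that the threshold is strict: g4 reaches exactly five times its control under cold and is not counted.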

Journal Article
TL;DR: The purine-rich strand of the DNA in this region can form two different intramolecular G-quadruplex structures, and c-MYC transcription can be controlled by ligand-mediated G-quadruplex stabilization.
Abstract: The nuclease hypersensitivity element III₁ upstream of the P1 promoter of c-MYC controls 85–90% of the transcriptional activation of this gene. We have demonstrated that the purine-rich strand of the DNA in this region can form two different intramolecular G-quadruplex structures, only one of which seems to be biologically relevant. This biologically relevant structure is the kinetically favored chair-form G-quadruplex, which is destabilized when mutated with a single G → A transition, resulting in a 3-fold increase in basal transcriptional activity of the c-MYC promoter. The cationic porphyrin TMPyP4, which has been shown to stabilize this G-quadruplex structure, is able to suppress further c-MYC transcriptional activation. These results provide compelling evidence that a specific G-quadruplex structure formed in the c-MYC promoter region functions as a transcriptional repressor element. Furthermore, we establish the principle that c-MYC transcription can be controlled by ligand-mediated G-quadruplex stabilization.
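Candidate quadruplex-forming stretches on a purine-rich strand are often located with a sequence heuristic: four runs of at least three guanines separated by short loops. The run length, the 1 to 7 nt loop range and the test sequences below are assumptions taken from that common heuristic, not from this paper, and a match only flags a putative structure.

```python
import re

# Putative G-quadruplex: four G-runs (>= 3 G) separated by loops of 1-7 nt (assumed heuristic)
G4_PATTERN = re.compile(r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}")


def find_g4_motifs(seq):
    """Return (start, matched sequence) for non-overlapping putative quadruplex motifs."""
    return [(m.start(), m.group()) for m in G4_PATTERN.finditer(seq.upper())]


print(find_g4_motifs("TTGGGAGGGTTGGGAAGGGTT"))
print(find_g4_motifs("GGGAGGAGGGAGGG"))  # only three runs of >= 3 G, so no motif
```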

Journal Article
TL;DR: Oligonucleotide microarrays used to analyze the pattern of genes expressed in leukemic blasts from 360 pediatric ALL patients identified each of the prognostically important leukemia subtypes, and within some genetic subgroups, expression profiles identified those patients that would eventually fail therapy.

Journal Article
23 Aug 2002-Cell
TL;DR: Global analysis of cellular transcription indicated that active genes were preferential integration targets, particularly genes that were activated in cells after infection by HIV-1; these data suggest how selective targeting promotes aggressive HIV replication.

Journal Article
TL;DR: Shows how the total variation in the level of expression of a given gene can be decomposed into its intrinsic and extrinsic components, and demonstrates theoretically that simultaneous measurement of two identical genes per cell enables discrimination of these two types of noise.
Abstract: Gene expression is a stochastic, or “noisy,” process. This noise comes about in two ways. The inherent stochasticity of biochemical processes such as transcription and translation generates “intrinsic” noise. In addition, fluctuations in the amounts or states of other cellular components lead indirectly to variation in the expression of a particular gene and thus represent “extrinsic” noise. Here, we show how the total variation in the level of expression of a given gene can be decomposed into its intrinsic and extrinsic components. We demonstrate theoretically that simultaneous measurement of two identical genes per cell enables discrimination of these two types of noise. Analytic expressions for intrinsic noise are given for a model that involves all the major steps in transcription and translation. These expressions give the sensitivity to various parameters, quantify the deviation from Poisson statistics, and provide a way of fitting experiment. Transcription dominates the intrinsic noise when the average number of proteins made per mRNA transcript is greater than ≃2. Below this number, translational effects also become important. Gene replication and cell division, included in the model, cause protein numbers to tend to a limit cycle. We calculate a general form for the extrinsic noise and illustrate it with the particular case of a single fluctuating extrinsic variable—a repressor protein, which acts on the gene of interest. All results are confirmed by stochastic simulation using plausible parameters for Escherichia coli.
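The two-reporter decomposition can be written with simple population averages. Assuming c and y are the expression levels of the two identical reporters measured in the same cells, a widely used set of estimators is η_int² = ⟨(c − y)²⟩ / (2⟨c⟩⟨y⟩), η_ext² = (⟨cy⟩ − ⟨c⟩⟨y⟩) / (⟨c⟩⟨y⟩) and η_tot² = (⟨(c² + y²)/2⟩ − ⟨c⟩⟨y⟩) / (⟨c⟩⟨y⟩), which satisfy η_int² + η_ext² = η_tot² exactly. The sketch below treats these formulas as assumptions and uses hypothetical per-cell data.

```python
from statistics import fmean


def noise_decomposition(c, y):
    """Squared intrinsic, extrinsic and total noise from paired reporter levels per cell."""
    mean_c, mean_y = fmean(c), fmean(y)
    norm = mean_c * mean_y
    # Intrinsic noise: how much the two identical reporters disagree within a cell
    intrinsic = fmean([(ci - yi) ** 2 for ci, yi in zip(c, y)]) / (2 * norm)
    # Extrinsic noise: how much the two reporters co-vary across cells
    extrinsic = (fmean([ci * yi for ci, yi in zip(c, y)]) - norm) / norm
    total = (fmean([(ci ** 2 + yi ** 2) / 2 for ci, yi in zip(c, y)]) - norm) / norm
    return intrinsic, extrinsic, total


# Hypothetical reporter levels in four cells
intrinsic, extrinsic, total = noise_decomposition([1.0, 2.0, 3.0, 4.0], [2.0, 2.0, 3.0, 5.0])
print(intrinsic, extrinsic, total)
```

If both reporters always agree within each cell, the intrinsic term is zero and all of the variation is attributed to extrinsic fluctuations.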

Journal Article
Valerie Wood, R. Gwilliam, Marie-Adèle Rajandream, M. Lyne, Rachel Lyne, A. Stewart, J. Sgouros, N. Peat, Jacqueline Hayles, Stephen Baker, D. Basham, Sharen Bowman, Karen Brooks, D. Brown, Steve D.M. Brown, Tracey Chillingworth, Carol Churcher, Mark O. Collins, R. Connor, Ann Cronin, P. Davis, Theresa Feltwell, Andrew G. Fraser, S. Gentles, Arlette Goble, N. Hamlin, David Harris, J. Hidalgo, Geoffrey M. Hodgson, S. Holroyd, T. Hornsby, S. Howarth, Elizabeth J. Huckle, Sarah E. Hunt, Kay Jagels, Kylie R. James, L. Jones, Matthew Jones, S. Leather, S. McDonald, J. McLean, P. Mooney, Sharon Moule, Karen Mungall, Lee Murphy, D. Niblett, C. Odell, Karen Oliver, Susan O'Neil, D. Pearson, Michael A. Quail, Ester Rabbinowitsch, Kim Rutherford, Simon Rutter, David L. Saunders, Kathy Seeger, Sarah Sharp, Jason Skelton, Mark Simmonds, R. Squares, S. Squares, K. Stevens, K. Taylor, Ruth Taylor, Adrian Tivey, S. Walsh, T. Warren, S. Whitehead, John Woodward, Guido Volckaert, Rita Aert, Johan Robben, B. Grymonprez, I. Weltjens, E. Vanstreels, Michael A. Rieger, M. Schafer, S. Muller-Auer, C. Gabel, M. Fuchs, C. Fritzc, E. Holzer, D. Moestl, H. Hilbert, K. Borzym, I. Langer, Alfred Beck, Hans Lehrach, Richard Reinhardt, Thomas M. Pohl, P. Eger, Wolfgang Zimmermann, H. Wedler, R. Wambutt, Bénédicte Purnelle, André Goffeau, Edouard Cadieu, Stéphane Dréano, Stéphanie Gloux, Valerie Lelaure, Stéphanie Mottier, Francis Galibert, Stephen J. Aves, Z. Xiang, Cherryl Hunt, Karen Moore, S. M. Hurst, M. Lucas, M. Rochet, Claude Gaillardin, Victor A. Tallada, Andrés Garzón, G. Thode, Rafael R. Daga, L. Cruzado, Juan Jimenez, Miguel del Nogal Sánchez, F. del Rey, J. Benito, Angel Domínguez, José L. Revuelta, Sergio Moreno, John Armstrong, Susan L. Forsburg, L. Cerrutti, Todd M. Lowe, W. R. McCombie, Ian T. Paulsen, Judith A. Potashkin, G. V. Shpakovski, David W. Ussery, Bart Barrell, Paul Nurse
21 Feb 2002-Nature
TL;DR: The genome of fission yeast (Schizosaccharomyces pombe), which contains the smallest number of protein-coding genes yet recorded for a eukaryote, is sequenced, and highly conserved genes important for eukaryotic cell organization are identified, including those required for the cytoskeleton, compartmentation, cell-cycle control, proteolysis, protein phosphorylation and RNA splicing.
Abstract: We have sequenced and annotated the genome of fission yeast (Schizosaccharomyces pombe), which contains the smallest number of protein-coding genes yet recorded for a eukaryote: 4,824. The centromeres are between 35 and 110 kilobases (kb) and contain related repeats including a highly conserved 1.8-kb element. Regions upstream of genes are longer than in budding yeast (Saccharomyces cerevisiae), possibly reflecting more-extended control regions. Some 43% of the genes contain introns, of which there are 4,730. Fifty genes have significant similarity with human disease genes; half of these are cancer related. We identify highly conserved genes important for eukaryotic cell organization including those required for the cytoskeleton, compartmentation, cell-cycle control, proteolysis, protein phosphorylation and RNA splicing. These genes may have originated with the appearance of eukaryotic life. Few similarly conserved genes that are important for multicellular organization were identified, suggesting that the transition from prokaryotes to eukaryotes required more new genes than did the transition from unicellular to multicellular organization.

Journal ArticleDOI
TL;DR: The results provide the first direct experimental evidence of the biochemical origin of phenotypic noise, demonstrating that the level of phenotypic variation in an isogenic population can be regulated by genetic parameters.
Abstract: Stochastic mechanisms are ubiquitous in biological systems. Biochemical reactions that involve small numbers of molecules are intrinsically noisy, being dominated by large concentration fluctuations [1-3]. This intrinsic noise has been implicated in the random lysis/lysogeny decision of bacteriophage-λ [4], in the loss of synchrony of circadian clocks [5,6] and in the decrease of precision of cell signals [7]. We sought to quantitatively investigate the extent to which the occurrence of molecular fluctuations within single cells (biochemical noise) could explain the variation of gene expression levels between cells in a genetically identical population (phenotypic noise). We have isolated the biochemical contribution to phenotypic noise from that of other noise sources by carrying out a series of differential measurements. We varied independently the rates of transcription and translation of a single fluorescent reporter gene in the chromosome of Bacillus subtilis, and we quantitatively measured the resulting changes in the phenotypic noise characteristics. We report that of these two parameters, increased translational efficiency is the predominant source of increased phenotypic noise. This effect is consistent with a stochastic model of gene expression in which proteins are produced in random and sharp bursts. Our results thus provide the first direct experimental evidence of the biochemical origin of phenotypic noise, demonstrating that the level of phenotypic variation in an isogenic population can be regulated by genetic parameters. We selected as our reporter system a single-copy chromosomal gene with an inducible promoter. As an estimated 50-80% of bacterial genes are transcriptionally regulated [8], this system typifies the majority of naturally occurring genes, allowing our results to be extended to natural systems. We incorporated a single copy of our reporter, the green fluorescent protein gene (gfp), into the chromosome of B. subtilis.
We chose to integrate gfp into the chromosome itself, rather than in the form of plasmids, as variation in plasmid copy number [9,10] can act as an additional and unwanted source of noise. Transcriptional efficiency was regulated by using an isopropyl-β-D-thiogalactopyranoside (IPTG)-inducible promoter, Pspac, upstream of gfp, and varying the concentration of IPTG in the growth medium. Translational
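The abstract attributes the noise increase to a model in which proteins are produced in random, sharp bursts. The following is a minimal sketch of such a model, not the authors' model or code: it assumes each burst adds a geometrically distributed number of proteins (mean b, standing in for translational efficiency), bursts arrive at a constant rate (standing in for transcription), and proteins are lost by first-order decay. The function names and all rate values are hypothetical and illustrative. Under these assumptions the steady-state Fano factor (variance/mean) is 1 + b, so raising b should raise noise while raising the burst rate mostly raises the mean.

```python
import math
import random


def geometric_burst(mean_burst, rng):
    """Draw a burst size k >= 0 with P(k) = (1 - p)^k * p, so that E[k] = mean_burst."""
    p = 1.0 / (1.0 + mean_burst)
    u = 1.0 - rng.random()  # u lies in (0, 1], so log(u) is defined
    return int(math.floor(math.log(u) / math.log(1.0 - p)))


def simulate_bursty_expression(burst_rate, mean_burst, decay_rate,
                               t_end=2000.0, burn_in=50.0, seed=0):
    """Gillespie simulation of protein made in instantaneous geometric bursts.

    burst_rate -- bursts per unit time (hypothetical proxy for transcription rate)
    mean_burst -- mean proteins per burst (hypothetical proxy for translational efficiency)
    decay_rate -- first-order protein loss rate (degradation plus dilution)
    Returns the time-averaged mean protein count and Fano factor (variance / mean).
    """
    rng = random.Random(seed)
    t, n = 0.0, 0
    w_total = w_n = w_n2 = 0.0
    while t < t_end:
        total_rate = burst_rate + decay_rate * n
        dt = rng.expovariate(total_rate)
        # Weight the current state by how long it persists, ignoring the burn-in period.
        start, stop = max(t, burn_in), min(t + dt, t_end)
        if stop > start:
            w = stop - start
            w_total += w
            w_n += w * n
            w_n2 += w * n * n
        t += dt
        # Choose which reaction fired in proportion to its propensity.
        if rng.random() * total_rate < burst_rate:
            n += geometric_burst(mean_burst, rng)
        else:
            n -= 1
    mean = w_n / w_total
    variance = w_n2 / w_total - mean * mean
    return mean, variance / mean


if __name__ == "__main__":
    for label, a, b in [("baseline", 10.0, 1.0),
                        ("2x burst rate", 20.0, 1.0),
                        ("10x burst size", 10.0, 10.0)]:
        mean, fano = simulate_bursty_expression(a, b, 1.0)
        print(f"{label:>14}: mean={mean:6.1f}  Fano={fano:5.2f}")
```

With these illustrative rates, doubling the burst rate should roughly double the mean while the Fano factor stays near 2, whereas a tenfold larger burst size should push the Fano factor toward 11, qualitatively matching the reported dominance of translational efficiency.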

Journal ArticleDOI
Yasushi Okazaki, Masaaki Furuno, Takeya Kasukawa, Jun Adachi, Hidemasa Bono, S. Kondo, Itoshi Nikaido, Naoki Osato, Rintaro Saito, Harukazu Suzuki, Itaru Yamanaka, H. Kiyosawa, Ken Yagi, Yasuhiro Tomaru, Yuki Hasegawa, A. Nogami, Christian Schönbach, Takashi Gojobori, Richard M. Baldarelli, David P. Hill, Carol J. Bult, David A. Hume, John Quackenbush, Lynn M. Schriml, Alexander Kanapin, Hideo Matsuda, Serge Batalov, Kirk W. Beisel, Judith A. Blake, Dirck W. Bradt, Vladimir Brusic, Cyrus Chothia, Lori E. Corbani, S. Cousins, Emiliano Dalla, Tommaso A. Dragani, Colin F. Fletcher, Alistair R. R. Forrest, K. S. Frazer, Terry Gaasterland, Manuela Gariboldi, Carmela Gissi, Adam Godzik, Julian Gough, Sean M. Grimmond, Stefano Gustincich, Nobutaka Hirokawa, Ian J. Jackson, Erich D. Jarvis, Akio Kanai, Hideya Kawaji, Yuka Imamura Kawasawa, Rafal M. Kedzierski, Benjamin L. King, Akihiko Konagaya, Igor V. Kurochkin, Yong-Hwan Lee, Boris Lenhard, Paul A. Lyons, Donna Maglott, Lois J. Maltais, Luigi Marchionni, Louise M. McKenzie, Harukata Miki, Takeshi Nagashima, Koji Numata, Toshihisa Okido, William J. Pavan, Geo Pertea, Graziano Pesole, Nikolai Petrovsky, Ramesh S. Pillai, Joan Pontius, D. Qi, Sridhar Ramachandran, Timothy Ravasi, Jonathan C. Reed, Deborah J. Reed, Jeffrey G. Reid, Brian Z. Ring, M. Ringwald, Albin Sandelin, Claudio Schneider, Colin A. Semple, Mitsutoshi Setou, K. Shimada, Razvan Sultana, Yoichi Takenaka, Martin S. Taylor, Rohan D. Teasdale, Masaru Tomita, Roberto Verardo, Lukas Wagner, Claes Wahlestedt, Y. Wang, Yoshiki Watanabe, Christine A. Wells, Laurens G. Wilming, Anthony Wynshaw-Boris, Masashi Yanagisawa, Ivana V. Yang, L. Yang, Zheng Yuan, Mihaela Zavolan, Yunhui Zhu, Anne M. Zimmer, Piero Carninci, N. Hayatsu, Tomoko Hirozane-Kishikawa, Hideaki Konno, M. Nakamura, Naoko Sakazume, K. Sato, Toshiyuki Shiraki, Kazunori Waki, Jun Kawai, Katsunori Aizawa, Takahiro Arakawa, S. Fukuda, A. Hara, W. Hashizume, K. Imotani, Y. Ishii, Masayoshi Itoh, Ikuko Kagawa, A. Miyazaki, K. Sakai, D. Sasaki, K. Shibata, Akira Shinagawa, Ayako Yasunishi, Masayasu Yoshino, Robert H. Waterston, Eric S. Lander, Jane Rogers, Ewan Birney, Yoshihide Hayashizaki
05 Dec 2002-Nature
TL;DR: The present work, completely supported by physical clones, provides the most comprehensive survey of a mammalian transcriptome so far, and is a valuable resource for functional genomics.
Abstract: Only a small proportion of the mouse genome is transcribed into mature messenger RNA transcripts. There is an international collaborative effort to identify all full-length mRNA transcripts from the mouse, and to ensure that each is represented in a physical collection of clones. Here we report the manual annotation of 60,770 full-length mouse complementary DNA sequences. These are clustered into 33,409 'transcriptional units', contributing 90.1% of a newly established mouse transcriptome database. Of these transcriptional units, 4,258 are new protein-coding and 11,665 are new non-coding messages, indicating that non-coding RNA is a major component of the transcriptome. 41% of all transcriptional units showed evidence of alternative splicing. In protein-coding transcripts, 79% of splice variations altered the protein product. Whole-transcriptome analyses resulted in the identification of 2,431 sense-antisense pairs. The present work, completely supported by physical clones, provides the most comprehensive survey of a mammalian transcriptome so far, and is a valuable resource for functional genomics.

Journal ArticleDOI
TL;DR: A novel family of repetitive DNA sequences that is present among both domains of the prokaryotes but absent from eukaryotes or viruses is studied, characterized by direct repeats, varying in size from 21 to 37 bp, interspaced by similarly sized non‐repetitive sequences.
Abstract: Using in silico analysis we studied a novel family of repetitive DNA sequences that is present among both domains of the prokaryotes (Archaea and Bacteria), but absent from eukaryotes or viruses. This family is characterized by direct repeats, varying in size from 21 to 37 bp, interspaced by similarly sized non-repetitive sequences. To appreciate their characteristic structure, we will refer to this family as the clustered regularly interspaced short palindromic repeats (CRISPR). In most species with two or more CRISPR loci, these loci were flanked on one side by a common leader sequence of 300-500 b. The direct repeats and the leader sequences were conserved within a species, but dissimilar between species. The presence of multiple chromosomal CRISPR loci suggests that CRISPRs are mobile elements. Four CRISPR-associated (cas) genes were identified in CRISPR-containing prokaryotes that were absent from CRISPR-negative prokaryotes. The cas genes were invariably located adjacent to a CRISPR locus, indicating that the cas genes and CRISPR loci have a functional relationship. The cas3 gene showed motifs characteristic for helicases of the superfamily 2, and the cas4 gene showed motifs of the RecB family of exonucleases, suggesting that these genes are involved in DNA metabolism or gene expression. The spatial coherence of CRISPR and cas genes may stimulate new research on the genesis and biological role of these repeats and genes.
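The abstract defines the family by its layout: direct repeats of 21 to 37 bp separated by similarly sized non-repetitive spacers. The sketch below is a naive exact-match scanner for that layout and is not the in silico method used in the study. The function name, the minimum of three copies and the spacer window of 0.6 to 2.5 times the repeat length are hypothetical choices standing in for "similarly sized"; real arrays often contain imperfect repeats that an exact-match search like this would miss.

```python
from collections import defaultdict


def find_crispr_like_arrays(seq, min_repeat=21, max_repeat=37,
                            min_copies=3, spacer_ratio=(0.6, 2.5)):
    """Return [(repeat, positions)] for exact direct repeats laid out like a CRISPR array.

    Repeat lengths are scanned from longest to shortest, and the hits for the
    longest length that yields any qualifying array are returned.
    """
    seq = seq.upper()
    for k in range(max_repeat, min_repeat - 1, -1):
        # Index every k-mer by its start positions (positions come out sorted).
        positions = defaultdict(list)
        for i in range(len(seq) - k + 1):
            positions[seq[i:i + k]].append(i)
        lo, hi = spacer_ratio[0] * k, spacer_ratio[1] * k
        hits = []
        for kmer, pos in positions.items():
            if len(pos) < min_copies:
                continue
            # Keep the longest run of consecutive copies whose spacers fit the size window.
            run, best = [pos[0]], []
            for prev, cur in zip(pos, pos[1:]):
                spacer = cur - prev - k
                run = run + [cur] if lo <= spacer <= hi else [cur]
                if len(run) > len(best):
                    best = list(run)
            if len(best) >= min_copies:
                hits.append((kmer, best))
        if hits:
            return hits
    return []
```

Scanning from the longest repeat length downward avoids reporting every shorter substring of the same repeat as a separate array.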

Journal ArticleDOI
25 Jan 2002-Cell
TL;DR: Examines what is known about the biological functions of the BRCA proteins and asks how their disruption can induce susceptibility to specific types of cancer.

Journal ArticleDOI
TL;DR: The LRP5V171 mutation causes high bone density, with a thickened mandible and torus palatinus, by impairing the action of a normal antagonist of the Wnt pathway and thus increasing Wnt signaling.
Abstract: Background: Osteoporosis is a major public health problem of largely unknown cause. Loss-of-function mutations in the gene for low-density lipoprotein receptor–related protein 5 (LRP5), which acts in the Wnt signaling pathway, have been shown to cause osteoporosis–pseudoglioma. Methods: We performed genetic and biochemical analyses of a kindred with an autosomal dominant syndrome characterized by high bone density, a wide and deep mandible, and torus palatinus. Results: Genetic analysis revealed linkage of the syndrome to chromosome 11q12–13 (odds of linkage, >1 million to 1), an interval that contains LRP5. Affected members of the kindred had a mutation in this gene, with valine substituted for glycine at codon 171 (LRP5V171). This mutation segregated with the trait in the family and was absent in control subjects. The normal glycine lies in a so-called propeller motif that is highly conserved from fruit flies to humans. Markers of bone resorption were normal in the affected subjects, whereas markers of bo...
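The reported odds of linkage (>1 million to 1) can be restated on the LOD-score scale conventionally used in linkage analysis. Assuming "odds" here means the likelihood ratio of linkage at the estimated recombination fraction versus no linkage (θ = 1/2), the conversion is:

```latex
\mathrm{LOD} = \log_{10}\frac{L(\text{data}\mid\hat{\theta})}{L(\text{data}\mid\theta = 1/2)},
\qquad
\frac{L(\text{data}\mid\hat{\theta})}{L(\text{data}\mid\theta = 1/2)} > 10^{6}
\;\Longrightarrow\;
\mathrm{LOD} > \log_{10} 10^{6} = 6.
```

For comparison, the conventional threshold for declaring linkage is a LOD score of 3, i.e. odds of 1,000 to 1.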

Journal ArticleDOI
TL;DR: Gel mobility shift assay using mutant DREB proteins showed that the two amino acids, valine and glutamic acid conserved in the ERF/AP2 domains, especially valine, have important roles in DNA-binding specificity.

Journal ArticleDOI
TL;DR: The complete genomic sequences of human chromosomes 21 and 22 are used to examine the properties of CpG islands in different sequence classes with a newly developed search algorithm; genome sequences also show strong CpG suppression in humans and slight suppression in Drosophila melanogaster and Saccharomyces cerevisiae, which is compatible with the recent detection of 5-methylcytosine in Drosophila and might suggest that S. cerevisiae has, or once had, CpG methylation.
Abstract: CpG islands are useful markers for genes in organisms containing 5-methylcytosine in their genomes. In addition, CpG islands located in the promoter regions of genes can play important roles in gene silencing during processes such as X-chromosome inactivation, imprinting, and silencing of intragenomic parasites. The generally accepted definition of what constitutes a CpG island was proposed in 1987 by Gardiner-Garden and Frommer [Gardiner-Garden, M. & Frommer, M. (1987) J. Mol. Biol. 196, 261–282] as being a 200-bp stretch of DNA with a C+G content of 50% and an observed CpG/expected CpG in excess of 0.6. Any definition of a CpG island is somewhat arbitrary, and this one, which was derived before the sequencing of mammalian genomes, will include many sequences that are not necessarily associated with controlling regions of genes but rather are associated with intragenomic parasites. We have therefore used the complete genomic sequences of human chromosomes 21 and 22 to examine the properties of CpG islands in different sequence classes by using a search algorithm that we have developed. Regions of DNA of greater than 500 bp with a G+C equal to or greater than 55% and observed CpG/expected CpG of 0.65 were more likely to be associated with the 5′ regions of genes and this definition excluded most Alu-repetitive elements. We also used genome sequences to show strong CpG suppression in the human genome and slight suppression in Drosophila melanogaster and Saccharomyces cerevisiae. This finding is compatible with the recent detection of 5-methylcytosine in Drosophila, and might suggest that S. cerevisiae has, or once had, CpG methylation.
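The two definitions in the abstract reduce to three numbers per region: length, G+C fraction, and the observed/expected CpG ratio, computed as (CpG count × length) / (C count × G count). A minimal sketch, assuming a single candidate region is checked in isolation; it does not reproduce the sliding-window search algorithm the authors developed, and the function and constant names are hypothetical. Treating each boundary as inclusive (>=) is an assumption.

```python
def cpg_stats(seq):
    """Return (G+C fraction, observed/expected CpG ratio) for a DNA region.

    observed/expected CpG = (number of CpG dinucleotides * length) / (count(C) * count(G))
    """
    s = seq.upper()
    n = len(s)
    c, g = s.count("C"), s.count("G")
    cpg = s.count("CG")  # "CG" cannot overlap itself, so str.count finds every CpG
    gc_fraction = (c + g) / n if n else 0.0
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


# Cut-offs quoted in the abstract; inclusive boundaries are an assumption.
GARDINER_GARDEN_1987 = {"min_len": 200, "min_gc": 0.50, "min_obs_exp": 0.60}
REVISED_2002 = {"min_len": 500, "min_gc": 0.55, "min_obs_exp": 0.65}


def is_cpg_island(seq, min_len, min_gc, min_obs_exp):
    """Check one candidate region against length, G+C and CpG o/e cut-offs."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction >= min_gc and obs_exp >= min_obs_exp


if __name__ == "__main__":
    region = "CG" * 100  # 200 bp: passes the 1987 criteria but is too short for the 2002 ones
    print(is_cpg_island(region, **GARDINER_GARDEN_1987), is_cpg_island(region, **REVISED_2002))
```

The stricter length and composition cut-offs are what let the revised definition exclude most Alu elements, which are short and GC-rich.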

Journal ArticleDOI
TL;DR: The genome-wide program of gene expression during the cell division cycle in a human cancer cell line (HeLa) was characterized using cDNA microarrays to provide a comprehensive catalog of cell cycle regulated genes that can serve as a starting point for functional discovery.
Abstract: The genome-wide program of gene expression during the cell division cycle in a human cancer cell line (HeLa) was characterized using cDNA microarrays. Transcripts of >850 genes showed periodic variation during the cell cycle. Hierarchical clustering of the expression patterns revealed coexpressed groups of previously well-characterized genes involved in essential cell cycle processes such as DNA replication, chromosome segregation, and cell adhesion along with genes of uncharacterized function. Most of the genes whose expression had previously been reported to correlate with the proliferative state of tumors were found herein also to be periodically expressed during the HeLa cell cycle. However, some of the genes periodically expressed in the HeLa cell cycle do not have a consistent correlation with tumor proliferation. Cell cycle-regulated transcripts of genes involved in fundamental processes such as DNA replication and chromosome segregation seem to be more highly expressed in proliferative tumors simply because they contain more cycling cells. The data in this report provide a comprehensive catalog of cell cycle regulated genes that can serve as a starting point for functional discovery. The full dataset is available at http://genome-www.stanford.edu/Human-CellCycle/HeLa/.
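The abstract reports that transcripts of >850 genes varied periodically during the cell cycle. One simple way to score periodicity is to fit a sinusoid of a fixed period to each expression profile by least squares and report the fraction of variance it explains. This sketch is not presented as the scoring procedure used in the study; the function name and the single-sinusoid model are illustrative assumptions, and the period must be supplied by the caller.

```python
import math


def periodicity_score(times, values, period):
    """Fraction of variance in `values` explained by a sinusoid of the given period.

    Fits values ~ mean + a*sin(w*t) + b*cos(w*t) by ordinary least squares and
    returns R^2 in [0, 1]; a flat profile scores 0.0.
    """
    n = len(values)
    w = 2.0 * math.pi / period

    def centered(xs):
        m = sum(xs) / n
        return [x - m for x in xs]

    # Centering every column is equivalent to fitting an intercept.
    y = centered(values)
    s = centered([math.sin(w * t) for t in times])
    c = centered([math.cos(w * t) for t in times])
    total = sum(v * v for v in y)
    ss = sum(v * v for v in s)
    cc = sum(v * v for v in c)
    sc = sum(p * q for p, q in zip(s, c))
    sy = sum(p * q for p, q in zip(s, y))
    cy = sum(p * q for p, q in zip(c, y))
    det = ss * cc - sc * sc
    if total == 0.0 or det == 0.0:
        return 0.0
    # Cramer's rule on the 2x2 normal equations [ss sc; sc cc] [a; b] = [sy; cy].
    a = (sy * cc - cy * sc) / det
    b = (cy * ss - sy * sc) / det
    residual = sum((yi - a * si - b * ci) ** 2 for yi, si, ci in zip(y, s, c))
    return 1.0 - residual / total
```

Genes could then be ranked by this score; a cut-off for calling a gene "periodic" would still have to be chosen, for example against scores from shuffled profiles.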