Showing papers in "Genome Research" in 2006

TL;DR: 3C-Carbon Copy (5C) is a high-throughput 3C approach that uses microarrays or quantitative DNA sequencing with 454 technology as detection methods; it should be widely applicable for large-scale mapping of cis- and trans-interaction networks of genomic elements and for the study of higher-order chromosome structure.

Abstract: Physical interactions between genetic elements located throughout the genome play important roles in gene regulation and can be identified with the Chromosome Conformation Capture (3C) methodology. 3C converts physical chromatin interactions into specific ligation products, which are quantified individually by PCR. Here we present a high-throughput 3C approach, 3C-Carbon Copy (5C), that employs microarrays or quantitative DNA sequencing using 454 technology as detection methods. We applied 5C to analyze a 400-kb region containing the human beta-globin locus and a 100-kb conserved gene desert region. We validated 5C by detection of several previously identified looping interactions in the beta-globin locus. We also identified a new looping interaction in K562 cells between the beta-globin Locus Control Region and the gamma-beta-globin intergenic region. Interestingly, this region has been implicated in the control of developmental globin gene switching. 5C should be widely applicable for large-scale mapping of cis- and trans-interaction networks of genomic elements and for the study of higher-order chromosome structure.

1,178 citations

Journal Article

Copy number variation: New insights in genome diversity

Jennifer L. Freeman¹, George H. Perry, Lars Feuk², Richard Redon³, Steven A. McCarroll⁴, David Altshuler⁴, Hiroyuki Aburatani⁵, Keith W. Jones⁶, Chris Tyler-Smith³, Matthew E. Hurles³, Nigel P. Carter³, Stephen W. Scherer², Charles Lee⁴

Brigham and Women's Hospital¹, University of Toronto², Wellcome Trust Sanger Institute³, Harvard University⁴, University of Tokyo⁵, Thermo Fisher Scientific⁶

TL;DR: Current efforts are directed toward a more comprehensive cataloging and characterization of CNVs that will provide the basis for determining how genomic diversity impacts biological function, evolution, and common human diseases.

Abstract: DNA copy number variation has long been associated with specific chromosomal rearrangements and genomic disorders, but its ubiquity in mammalian genomes was not fully realized until recently. Although our understanding of the extent of this variation is still developing, it seems likely that, at least in humans, copy number variants (CNVs) account for a substantial amount of genetic variation. Since many CNVs include genes that result in differential levels of gene expression, CNVs may account for a significant proportion of normal phenotypic variation. Current efforts are directed toward a more comprehensive cataloging and characterization of CNVs that will provide the basis for determining how genomic diversity impacts biological function, evolution, and common human diseases.

855 citations

Journal Article

Tissue-specific expression and regulation of sexually dimorphic genes in mice

Xia Yang¹, Eric E. Schadt, Susanna Wang, Hui Wang, Arthur P. Arnold, Leslie Ingram-Drake, Thomas A. Drake, Aldons J. Lusis

University of California, Los Angeles¹

TL;DR: Genetic analyses provided evidence of the global regulation of subsets of the sexually dimorphic genes, as the transcript levels of a large number of these genes were controlled by several expression quantitative trait loci (eQTL) hotspots that exhibited tissue-specific control.

Abstract: We report a comprehensive analysis of gene expression differences between sexes in multiple somatic tissues of 334 mice derived from an intercross between inbred mouse strains C57BL/6J and C3H/HeJ. The analysis of a large number of individuals provided the power to detect relatively small differences in expression between sexes, and the use of an intercross allowed analysis of the genetic control of sexually dimorphic gene expression. Microarray analysis of 23,574 transcripts revealed that the extent of sexual dimorphism in gene expression was much greater than previously recognized. Thus, thousands of genes showed sexual dimorphism in liver, adipose, and muscle, and hundreds of genes were sexually dimorphic in brain. These genes exhibited highly tissue-specific patterns of expression and were enriched for distinct pathways represented in the Gene Ontology database. They also showed evidence of chromosomal enrichment, not only on the sex chromosomes, but also on several autosomes. Genetic analyses provided evidence of the global regulation of subsets of the sexually dimorphic genes, as the transcript levels of a large number of these genes were controlled by several expression quantitative trait loci (eQTL) hotspots that exhibited tissue-specific control. Moreover, many tissue-specific transcription factor binding sites were found to be enriched in the sexually dimorphic genes.

801 citations

Journal Article

Widespread genome duplications throughout the history of flowering plants

Liying Cui¹, P. Kerr Wall¹, Jim Leebens-Mack¹, Bruce G. Lindsay¹, Douglas E. Soltis², Jeff J. Doyle³, Pamela S. Soltis², John E. Carlson, Kathiravetpilla Arumuganathan⁴, Abdelali Barakat¹, Victor A. Albert⁵, Hong Ma, Claude W. dePamphilis

Pennsylvania State University¹, University of Florida², Cornell University³, Virginia Mason Medical Center⁴, University of Oslo⁵

01 Jun 2006 • Genome Research

TL;DR: Cross-species sequence divergence estimates suggest that synonymous substitution rates in the basal angiosperms are less than half those previously reported for core eudicots and members of Poaceae, and lower substitution rates permit inference of older duplication events.

Abstract: Genomic comparisons provide evidence for ancient genome-wide duplications in a diverse array of animals and plants. We developed a birth–death model to identify evidence for genome duplication in EST data, and applied a mixture model to estimate the age distribution of paralogous pairs identified in EST sets for species representing the basal-most extant flowering plant lineages. We found evidence for episodes of ancient genome-wide duplications in the basal angiosperm lineages including Nuphar advena (yellow water lily: Nymphaeaceae) and the magnoliids Persea americana (avocado: Lauraceae), Liriodendron tulipifera (tulip poplar: Magnoliaceae), and Saruma henryi (Aristolochiaceae). In addition, we detected independent genome duplications in the basal eudicot Eschscholzia californica (California poppy: Papaveraceae) and the basal monocot Acorus americanus (Acoraceae), both of which were distinct from duplications documented for ancestral grass (Poaceae) and core eudicot lineages. Among gymnosperms, we found equivocal evidence for ancient polyploidy in Welwitschia mirabilis (Gnetales) and no evidence for polyploidy in pine, although gymnosperms generally have much larger genomes than the angiosperms investigated. Cross-species sequence divergence estimates suggest that synonymous substitution rates in the basal angiosperms are less than half those previously reported for core eudicots and members of Poaceae. These lower substitution rates permit inference of older duplication events. We hypothesize that evidence of an ancient duplication observed in the Nuphar data may represent a genome duplication in the common ancestor of all or most extant angiosperms, except Amborella.
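
The claim that slower synonymous rates allow older duplications to be inferred follows from the molecular-clock relation between divergence and time. The minimal sketch below assumes a strict clock in which both paralogs accumulate synonymous substitutions independently, so that Ks = 2rT; the rates, the 200-Myr example age, and the saturation cutoff are hypothetical values, and the authors' birth–death and mixture models are not reproduced here.

```python
# Sketch: strict molecular clock for a pair of paralogs (an assumption, not the
# authors' birth-death/mixture model). After duplication both copies diverge
# independently, so synonymous divergence grows as Ks = 2 * r * T.

KS_SATURATION = 2.0  # hypothetical cutoff above which Ks is treated as unreliable


def expected_ks(age_years: float, rate_per_site_per_year: float) -> float:
    """Synonymous divergence expected between paralogs of a given age."""
    return 2.0 * rate_per_site_per_year * age_years


def duplication_age_years(ks: float, rate_per_site_per_year: float) -> float:
    """Invert the clock: time since duplication implied by an observed Ks."""
    if ks < 0 or rate_per_site_per_year <= 0:
        raise ValueError("Ks must be non-negative and the rate must be positive")
    return ks / (2.0 * rate_per_site_per_year)


def is_datable(age_years: float, rate_per_site_per_year: float) -> bool:
    """True if a duplication of this age should still fall below the saturation cutoff."""
    return expected_ks(age_years, rate_per_site_per_year) < KS_SATURATION


if __name__ == "__main__":
    fast_rate = 6e-9           # hypothetical synonymous rate (substitutions/site/year)
    slow_rate = fast_rate / 2  # loosely mirrors the "less than half" contrast
    age = 200e6                # hypothetical duplication 200 million years ago
    for label, rate in (("fast", fast_rate), ("slow", slow_rate)):
        print(label, round(expected_ks(age, rate), 2), is_datable(age, rate))
```

With these placeholder numbers the faster lineage's expected Ks (2.4) exceeds the cutoff while the slower lineage's (1.2) does not, which is the intuition behind the statement that lower substitution rates permit inference of older duplication events.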

677 citations

Journal Article

High-throughput DNA methylation profiling using universal bead arrays

Marina Bibikova¹, Zhenwu Lin², Lixin Zhou¹, Eugene Chudin¹, Eliza Wickham Garcia¹, Bonnie Wu¹, Dennis Doucet¹, Neal J. Thomas², Yunhua Wang², Ekkehard Vollmer, Torsten Goldmann, Carola Seifart³, Wei Jiang⁴, David L. Barker¹, Mark S. Chee¹, Joanna Floros, Jian-Bing Fan¹

Illumina¹, Pennsylvania State University², University of Marburg³, Discovery Institute⁴

01 Mar 2006 • Genome Research

TL;DR: The results demonstrate the effectiveness of the method for reliably profiling many CpG sites in parallel for the discovery of informative methylation markers and should prove useful for DNA methylation analyses in large populations.

Abstract: We have developed a high-throughput method for analyzing the methylation status of hundreds of preselected genes simultaneously and have applied it to the discovery of methylation signatures that distinguish normal from cancer tissue samples. Through an adaptation of the GoldenGate genotyping assay implemented on a BeadArray platform, the methylation state of 1536 specific CpG sites in 371 genes (one to nine CpG sites per gene) was measured in a single reaction by multiplexed genotyping of 200 ng of bisulfite-treated genomic DNA. The assay was used to obtain a quantitative measure of the methylation level at each CpG site. After validating the assay in cell lines and normal tissues, we analyzed a panel of lung cancer biopsy samples (N = 22) and identified a panel of methylation markers that distinguished lung adenocarcinomas from normal lung tissues with high specificity. These markers were validated in a second sample set (N = 24). These results demonstrate the effectiveness of the method for reliably profiling many CpG sites in parallel for the discovery of informative methylation markers. The technology should prove useful for DNA methylation analyses in large populations, with potential application to the classification and diagnosis of a broad range of cancers and other diseases.
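
The abstract mentions "a quantitative measure of the methylation level" without giving its formula. A widely used convention for two-probe methylation assays, assumed here, is a β value: the methylated-probe signal divided by the sum of both probe signals plus a small offset that damps noise at low-intensity sites. The function names and the offset of 100 are illustrative assumptions, not the study's documented pipeline.

```python
# Sketch: beta-value style methylation score per CpG site (assumed convention;
# function names and the offset are hypothetical, not the platform's pipeline).
from typing import Dict, Tuple


def methylation_beta(methylated: float, unmethylated: float, offset: float = 100.0) -> float:
    """Fraction of signal from the methylated-state probe, in [0, 1) for offset > 0.

    Negative background-corrected intensities are clipped to zero, and the
    offset keeps the ratio stable when both signals are small.
    """
    m = max(methylated, 0.0)
    u = max(unmethylated, 0.0)
    return m / (m + u + offset)


def score_sites(intensities: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Map each CpG site ID to its beta value from (methylated, unmethylated) signals."""
    return {site: methylation_beta(m, u) for site, (m, u) in intensities.items()}


if __name__ == "__main__":
    # Hypothetical background-corrected intensities for three CpG sites.
    demo = {"site_a": (4500.0, 400.0), "site_b": (300.0, 5200.0), "site_c": (-20.0, 50.0)}
    for site, beta in score_sites(demo).items():
        print(site, round(beta, 3))
```

A β near 1 indicates a mostly methylated site and a β near 0 a mostly unmethylated one; here site_a scores 0.9, and site_c, whose methylated signal falls below background, scores 0.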

676 citations

Journal Article

An initial map of insertion and deletion (INDEL) variation in the human genome

Ryan E. Mills¹, Christopher T. Luttig¹, Christine E. Larkins¹, Adam D. Beauchamp¹, Circe Tsui¹, W. Stephen Pittard¹, Scott E. Devine¹

Emory University¹

TL;DR: The authors present an initial map of human INDEL variation containing 415,436 unique INDEL polymorphisms, which range from 1 bp to 9989 bp in length and are split almost equally between insertions and deletions relative to the chimpanzee genome sequence.

Abstract: Although many studies have been conducted to identify single nucleotide polymorphisms (SNPs) in humans, few studies have been conducted to identify alternative forms of natural genetic variation, such as insertion and deletion (INDEL) polymorphisms. In this report, we describe an initial map of human INDEL variation that contains 415,436 unique INDEL polymorphisms. These INDELs were identified with a computational approach using DNA re-sequencing traces that originally were generated for SNP discovery projects. They range from 1 bp to 9989 bp in length and are split almost equally between insertions and deletions, relative to the chimpanzee genome sequence. Five major classes of INDELs were identified, including (1) insertions and deletions of single-base pairs, (2) monomeric base pair expansions, (3) multi-base pair expansions of 2–15 bp repeat units, (4) transposon insertions, and (5) INDELs containing random DNA sequences. Our INDELs are distributed throughout the human genome with an average density of one INDEL per 7.2 kb of DNA. Variation hotspots were identified with up to 48-fold regional increases in INDEL and/or SNP variation compared with the chromosomal averages for the same chromosomes. Over 148,000 INDELs (35.7%) were identified within known genes, and 5542 of these INDELs were located in the promoters and exons of genes, where they would be expected to have the greatest influence on gene function. All INDELs in this study have been deposited into dbSNP and have been integrated into maps of human genetic variation that are available to the research community.

644 citations

Journal Article

High-resolution genomic profiling of chromosomal aberrations using Infinium whole-genome genotyping

Daniel A. Peiffer¹, Jennie M. Le, Frank J. Steemers, Weihua Chang, Tony Jenniges, Francisco Garcia, Kirtley B. Haden, Jiangzhen Li, Chad A. Shaw, John W. Belmont, Sau Wai Cheung, Richard Shen, David L. Barker, Kevin L. Gunderson

Illumina¹

TL;DR: The utility of SNP-CGH is demonstrated with two Infinium whole-genome genotyping BeadChips, assaying 109,000 and 317,000 SNP loci; the statistical ability to detect common aberrations was modeled by analysis of an X chromosome titration model system, and sensitivity was modeled by titration of gDNA from a tumor cell with that of its paired normal cell line.

Abstract: Array-CGH is a powerful tool for the detection of chromosomal aberrations. The introduction of high-density SNP genotyping technology to genomic profiling, termed SNP-CGH, represents a further advance, since simultaneous measurement of both signal intensity variations and changes in allelic composition makes it possible to detect both copy number changes and copy-neutral loss-of-heterozygosity (LOH) events. We demonstrate the utility of SNP-CGH with two Infinium whole-genome genotyping BeadChips, assaying 109,000 and 317,000 SNP loci, to detect chromosomal aberrations in samples bearing constitutional aberrations as well as tumor samples at sub-100 kb effective resolution. Detected aberrations include homozygous deletions, hemizygous deletions, copy-neutral LOH, duplications, and amplifications. The statistical ability to detect common aberrations was modeled by analysis of an X chromosome titration model system, and sensitivity was modeled by titration of gDNA from a tumor cell with that of its paired normal cell line. Analysis was facilitated by using a genome browser that plots log ratios of normalized intensities and allelic ratios along the chromosomes. We developed two modes of SNP-CGH analysis, a single sample and a paired sample mode. The single sample mode computes log intensity ratios and allelic ratios by reference to canonical genotype clusters generated from ∼120 reference samples, whereas the paired sample mode uses a paired normal reference sample from the same individual. Finally, the two analysis modes are compared and contrasted for their utility in analyzing different types of input gDNA: low input amounts, fragmented gDNA, and Phi29 whole-genome pre-amplified DNA.

547 citations

Journal Article

The chemoreceptor superfamily in the honey bee Apis mellifera: Expansion of the odorant, but not gustatory, receptor family

Hugh M. Robertson¹, Kevin W. Wanner¹

University of Illinois at Urbana–Champaign¹

01 Nov 2006 • Genome Research

TL;DR: The honey bee genome sequence reveals a remarkable expansion of the insect odorant receptor (Or) family relative to the repertoires of the flies Drosophila melanogaster and Anopheles gambiae, which have 62 and 79 Ors respectively.

Abstract: The honey bee genome sequence reveals a remarkable expansion of the insect odorant receptor (Or) family relative to the repertoires of the flies Drosophila melanogaster and Anopheles gambiae, which have 62 and 79 Ors respectively. A total of 170 Or genes were annotated in the bee, of which seven are pseudogenes. These constitute five bee-specific subfamilies in an insect Or family tree, one of which has expanded to a total of 157 genes encoding proteins with 15%-99% amino acid identity. Most of the Or genes are in tandem arrays, including one with 60 genes. This bee-specific expansion of the Or repertoire presumably underlies their remarkable olfactory abilities, including perception of several pheromone blends, kin recognition signals, and diverse floral odors. The number of Apis mellifera Ors is approximately equal to the number of glomeruli in the bee antennal lobe (160-170), consistent with a general one-receptor/one-neuron/one-glomerulus relationship. The bee genome encodes just 10 gustatory receptors (Grs) compared with the D. melanogaster and A. gambiae repertoires of 68 and 76 Grs, respectively. A lack of Gr gene family expansion primarily accounts for this difference. A nurturing hive environment and a mutualistic relationship with plants may explain the lack of Gr family expansion. The Or family is the most dramatic example of gene family expansion in the bee genome, and characterizing their caste- and sex-specific gene expression may provide clues to their specific roles in detection of pheromone, kin, and floral odors.

542 citations

Journal Article

Doubling genome size without polyploidization: dynamics of retrotransposition-driven genomic expansions in Oryza australiensis, a wild relative of rice

Benoît Piégu, Romain Guyot, Nathalie Picault, Anne C. Roulin, Abhijit Saniyal¹, Hyeran Kim², Kristi Collura², Darshan S. Brar³, Scott A. Jackson¹, Rod A. Wing, Olivier Panaud

Purdue University¹, University of Arizona², International Rice Research Institute³

TL;DR: Oryza australiensis, a wild relative of the Asian cultivated rice O. sativa, has undergone recent bursts of three LTR-retrotransposon families, leading to a rapid twofold increase in its genome size.

Abstract: Retrotransposons are the main components of eukaryotic genomes, representing up to 80% of some large plant genomes. These mobile elements transpose via a “copy and paste” mechanism, thus increasing their copy number while active. Besides polyploidy, their accumulation is now accepted as the main factor in genome size increase in higher eukaryotes. However, the dynamics of this process are poorly understood. In this study, we show that Oryza australiensis, a wild relative of the Asian cultivated rice O. sativa, has undergone recent bursts of three LTR-retrotransposon families. This genome has accumulated more than 90,000 retrotransposon copies during the last three million years, leading to a rapid twofold increase of its size. In addition, phenetic analyses of these retrotransposons clearly confirm that the genomic bursts occurred after the radiation of the species. This provides direct evidence of retrotransposon-mediated variation of genome size within a plant genus.

541 citations

Journal Article

Genome-wide mapping of DNase hypersensitive sites using massively parallel signature sequencing (MPSS)

Gregory E. Crawford¹, Ingeborg Holt¹, James R R Whittle¹, Bryn D. Webb¹, Denise Tai¹, Sean Davis¹, Elliott H. Margulies¹, Yi Chen¹, John A. Bernat², David Ginsburg, Daixing Zhou, Shujun Luo, Thomas J. Vasicek, Mark J. Daly³, Tyra G. Wolfsberg¹, Francis S. Collins¹

National Institutes of Health¹, University of Michigan², Harvard University³

TL;DR: High-throughput analysis, using massively parallel signature sequencing (MPSS), of 230,000 tags from a DNase library generated from quiescent human CD4+ T cells identifies 14,190 clusters of closely grouped sequences, the majority of which represent valid DNase HS sites.

Abstract: A major goal in genomics is to understand how genes are regulated in different tissues, stages of development, diseases, and species. Mapping DNase I hypersensitive (HS) sites within nuclear chromatin is a powerful and well-established method of identifying many different types of regulatory elements, but in the past it has been limited to analysis of single loci. We have recently described a protocol to generate a genome-wide library of DNase HS sites. Here, we report high-throughput analysis, using massively parallel signature sequencing (MPSS), of 230,000 tags from a DNase library generated from quiescent human CD4+ T cells. Of the tags that uniquely map to the genome, we identified 14,190 clusters of sequences that group within close proximity to each other. By using a real-time PCR strategy, we determined that the majority of these clusters represent valid DNase HS sites. Approximately 80% of these DNase HS sites uniquely map within one or more annotated regions of the genome believed to contain regulatory elements, including regions 2 kb upstream of genes, CpG islands, and highly conserved sequences. Most DNase HS sites identified in CD4+ T cells are also HS in CD8+ T cells, B cells, hepatocytes, human umbilical vein endothelial cells (HUVECs), and HeLa cells. However, ∼10% of the DNase HS sites are lymphocyte specific, indicating that this procedure can identify gene regulatory elements that control cell type specificity. This strategy, which can be applied to any cell line or tissue, will enable a better understanding of how chromatin structure dictates cell function and fate.

513 citations

Journal Article

Experimental validation of the regulated expression of large numbers of non-coding RNAs from the mouse genome

Timothy Ravasi¹, Harukazu Suzuki, Ken C Pang², Shintaro Katayama, Masaaki Furuno, Rie Okunishi, Shiro Fukuda, Kelin Ru, Martin C. Frith, Milena Gongora, Sean M. Grimmond, David A. Hume², Yoshihide Hayashizaki, John S. Mattick

University of Queensland¹, University of Edinburgh²

TL;DR: Analysis of the genomic landscape around these sequences indicates that some cDNA clones were produced not from terminal poly(A) tracts but from internal priming sites within longer transcripts, only a minority of which are encompassed by known genes.

Abstract: Recent large-scale analyses of mainly full-length cDNA libraries generated from a variety of mouse tissues indicated that almost half of all representative cloned sequences did not contain an apparent protein-coding sequence, and were putatively derived from non-protein-coding RNA (ncRNA) genes. However, many of these clones were singletons and the majority were unspliced, raising the possibility that they may be derived from genomic DNA or unprocessed pre-mRNA contamination during library construction, or alternatively represent nonspecific "transcriptional noise." Here we show, using reverse transcriptase-dependent PCR, microarray, and Northern blot analyses, that many of these clones were derived from genuine transcripts of unknown function whose expression appears to be regulated. The ncRNA transcripts have larger exons and fewer introns than protein-coding transcripts. Analysis of the genomic landscape around these sequences indicates that some cDNA clones were produced not from terminal poly(A) tracts but from internal priming sites within longer transcripts, only a minority of which are encompassed by known genes. A significant proportion of these transcripts exhibit tissue-specific expression patterns, as well as dynamic changes in their expression in macrophages following lipopolysaccharide stimulation. Taken together, the data provide strong support for the conclusion that ncRNAs are an important, regulated component of the mammalian transcriptome.

Journal Article

Diversification of transcriptional modulation: Large-scale identification and characterization of putative alternative promoters of human genes

Kouichi Kimura¹, Ai Wakamatsu, Yutaka Suzuki², Toshio Ota, Tetsuo Nishikawa¹, Riu Yamashita², Junichi Yamamoto, Mitsuo Sekine, Katsuki Tsuritani², Hiroyuki Wakaguri², Shizuko Ishii, Tomoyasu Sugiyama³, Kaoru Saito, Yuko Isono, Ryotaro Irie, Norihiro Kushida, Takahiro Yoneyama, Rie Otsuka, Katsuhiro Kanda¹, Takahide Yokoi¹, Hiroshi Kondo¹, Masako Wagatsuma¹, Katsuji Murakawa¹, Shinichi Ishida¹, Tadashi Ishibashi¹, Asako Takahashi-Fujii, Tomo-o Tanase, Keiichi Nagai¹, Hisashi Kikuchi, Kenta Nakai², Takao Isogai, Sumio Sugano²

Hitachi¹, University of Tokyo², Tokyo University of Technology³

TL;DR: The findings suggest that the use of alternative promoters, and the consequent alternative use of first exons, plays a pivotal role in generating the complexity required for the highly elaborated molecular systems in humans.

Abstract: By analyzing 1,780,295 5'-end sequences of human full-length cDNAs derived from 164 kinds of oligo-cap cDNA libraries, we identified 269,774 independent positions of transcriptional start sites (TSSs) for 14,628 human RefSeq genes. These TSSs were clustered into 30,964 clusters that were separated from each other by more than 500 bp and thus are very likely to constitute mutually distinct alternative promoters. To our surprise, at least 7674 (52%) human RefSeq genes were subject to regulation by putative alternative promoters (PAPs). On average, there were 3.1 PAPs per gene, with the composition of one CpG-island-containing promoter per 2.6 CpG-less promoters. In 17% of the PAP-containing loci, tissue-specific use of the PAPs was observed. The richest tissue sources of the tissue-specific PAPs were testis and brain. It was also intriguing that the PAP-containing promoters were enriched in the genes encoding signal transduction-related proteins and were rarer in the genes encoding extracellular proteins, possibly reflecting the varied functional requirement for and the restricted expression of those categories of genes, respectively. The patterns of the first exons were highly diverse as well. On average, there were 7.7 different splicing types of first exons per locus partly produced by the PAPs, suggesting that a wide variety of transcripts can be achieved by this mechanism. Our findings suggest that use of alternate promoters and consequent alternative use of first exons should play a pivotal role in generating the complexity required for the highly elaborated molecular systems in humans.

Journal Article

Gene-balanced duplications, like tetraploidy, provide predictable drive to increase morphological complexity

Michael Freeling¹, Brian C. Thomas

University of California, Berkeley¹

01 Jul 2006 • Genome Research

TL;DR: It is argued that "balanced gene drive" is a sufficient explanation for the trend that the maximums of morphological complexity have gone up, and not down, in both plant and animal eukaryotic lineages.

Abstract: Controversy surrounds the apparent rising maximums of morphological complexity during eukaryotic evolution, with organisms increasing the number and nestedness of developmental areas as evidenced by morphological elaborations reflecting area boundaries. No "predictable drive" to increase this sort of complexity has been reported. Recent genetic data and theory in the general area of gene dosage effects has engendered a robust "gene balance hypothesis," with a theoretical base that makes specific predictions as to gene content changes following different types of gene duplication. Genomic data from both chordate and angiosperm genomes fit these predictions: Each type of duplication provides a one-way injection of a biased set of genes into the gene pool. Tetraploidies and balanced segments inject bias for those genes whose products are the subunits of the most complex biological machines or cascades, like transcription factors (TFs) and proteasome core proteins. Most duplicate genes are removed after tetraploidy. Genic balance is maintained by not removing those genes that are dose-sensitive, which tends to leave duplicate "functional modules" as the indirect products (spandrels) of purifying selection. Functional modules are the likely precursors of coadapted gene complexes, a unit of natural selection. The result is a predictable drive mechanism where "drive" is used rigorously, as in "meiotic drive." Rising morphological gain is expected given a supply of duplicate functional modules. All flowering plants have survived at least three large-scale duplications/diploidizations over the last 300 million years (Myr). An equivalent period of tetraploidy and body plan evolution may have ended for animals 500 million years ago (Mya). We argue that "balanced gene drive" is a sufficient explanation for the trend that the maximums of morphological complexity have gone up, and not down, in both plant and animal eukaryotic lineages.

Journal Article

Following tetraploidy in an Arabidopsis ancestor, genes were removed preferentially from one homeolog leaving clusters enriched in dose-sensitive genes

Brian C. Thomas¹, Brent S. Pedersen, Michael Freeling

University of California, Berkeley¹

01 Jul 2006 • Genome Research

TL;DR: Islands of retention contain "connected genes," those genes that the gene balance hypothesis predicts to be resistant to removal because the products they encode interact with other products in a dose-sensitive manner, creating a web of dependency.

Abstract: Approximately 90% of Arabidopsis’ unique gene content is found in syntenic blocks that were formed during the most recent whole-genome duplication. Within these blocks, 28.6% of the genes have a retained pair; the remaining genes have been lost from one of the homeologs. We create a minimized genome by condensing local duplications to one gene, removing transposons, and including only genes within blocks defined by retained pairs. We use a moving average of retained and non-retained genes to find clusters of retention and then identify the types of genes that appear in clusters at frequencies above expectations. Significant clusters of retention exist for almost all chromosomal segments. Detailed alignments show that, for 85% of the genome, one homeolog was preferentially (1.6×) targeted for fractionation. This homeolog fractionation bias suggests an epigenetic mechanism. We find that islands of retention contain “connected genes,” those genes predicted—by the gene balance hypothesis—to be resistant to removal because the products they encode interact with other products in a dose-sensitive manner, creating a web of dependency. Gene families that are overrepresented in clusters include those encoding components of the proteasome/protein modification complexes, signal transduction machinery, ribosomes, and transcription factor complexes. Gene pair fractionation following polyploidy or segmental duplication leaves a genome enriched for “connected” genes. These clusters of duplicate genes may help explain the evolutionary origin of coregulated chromosomal regions and new functional modules.
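
The moving-average cluster search described above can be sketched as a sliding window over genes in chromosomal order, with each gene coded 1 if its homeologous pair was retained and 0 if one copy was lost. The window size and retention threshold are hypothetical parameters; the abstract does not state the authors' settings or how they assessed significance against expectations.

```python
# Sketch: locate clusters of retained duplicate pairs with a moving average.
# flags[i] is 1 if gene i (in chromosomal order) kept its homeolog, 0 otherwise.
from typing import List, Optional, Tuple


def retention_clusters(flags: List[int], window: int, threshold: float) -> List[Tuple[int, int]]:
    """Return half-open [start, end) gene-index ranges covered by any window whose
    fraction of retained genes is at least `threshold` (hypothetical parameters)."""
    if not 1 <= window <= len(flags):
        raise ValueError("window must be between 1 and len(flags)")

    covered = [False] * len(flags)
    retained = sum(flags[:window])  # retained genes in the first window
    for start in range(len(flags) - window + 1):
        if start > 0:
            # Slide one gene to the right: add the new gene, drop the old one.
            retained += flags[start + window - 1] - flags[start - 1]
        if retained / window >= threshold:
            for i in range(start, start + window):
                covered[i] = True

    # Merge runs of covered genes into clusters.
    clusters: List[Tuple[int, int]] = []
    begin: Optional[int] = None
    for i, is_covered in enumerate(covered):
        if is_covered and begin is None:
            begin = i
        elif not is_covered and begin is not None:
            clusters.append((begin, i))
            begin = None
    if begin is not None:
        clusters.append((begin, len(covered)))
    return clusters


if __name__ == "__main__":
    print(retention_clusters([0, 0, 0, 1, 1, 1, 1, 0, 0, 0], window=3, threshold=0.7))
    # [(3, 7)]
```

The running sum keeps each window update constant-time, so a whole chromosome can be scanned in a single pass; a larger window smooths over isolated losses at the cost of blurring cluster boundaries.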

Journal Article

Comparative isoschizomer profiling of cytosine methylation: The HELP assay

Batbayar Khulan¹, Reid F. Thompson², Kenny Ye², Melissa Fazzari², Masako Suzuki², Edyta Stasiek², Maria E. Figueroa², Jacob L. Glass², Quan Chen², Cristina Montagna², Eli Hatchwell³, Rebecca R. Selzer⁴, Todd Richmond⁴, Roland Green⁴, Ari Melnick², John M. Greally²

Albert Einstein College of Medicine¹, Yeshiva University², Cold Spring Harbor Laboratory³, Hoffmann-La Roche⁴

TL;DR: The HpaII tiny fragment Enrichment by Ligation-mediated PCR (HELP) assay is robust, quantitative, and accurate, and is providing new insights into the distribution and dynamic nature of cytosine methylation in the genome.

Abstract: The distribution of cytosine methylation in 6.2 Mb of the mouse genome was tested using cohybridization of genomic representations from a methylation-sensitive restriction enzyme and its methylation-insensitive isoschizomer. This assay, termed HELP (HpaII tiny fragment Enrichment by Ligation-mediated PCR), allows both intragenomic profiling and intergenomic comparisons of cytosine methylation. The intragenomic profile shows most of the genome to be contiguous methylated sequence with occasional clusters of hypomethylated loci, usually but not exclusively at promoters and CpG islands. Intergenomic comparison found marked differences in cytosine methylation between spermatogenic and brain cells, identifying 223 new candidate tissue-specific differentially methylated regions (T-DMRs). Bisulfite pyrosequencing confirmed the four candidates tested to be T-DMRs, while quantitative RT-PCR for two genes with T-DMRs located at their promoters showed the HELP data to be correlated with gene activity at these loci. The HELP assay is robust, quantitative, and accurate and is providing new insights into the distribution and dynamic nature of cytosine methylation in the genome.

Journal Article

Large-scale identification of protein-protein interaction of Escherichia coli K-12

Mohammad Arifuzzaman¹, Maki Maeda¹, Aya Itoh², Kensaku Nishikata¹, Chiharu Takita, Rintaro Saito², Takeshi Ara², Kenji Nakahigashi², Hsuan Cheng Huang³, Aki Hirai, Kohei Tsuzuki², Seira Nakamura², Mohammad Altaf-Ul-Amin¹, Taku Oshima¹, Tomoya Baba¹ ², Natsuko Yamamoto¹, Tomoyo Kawamura, Tomoko Ioka-Nakamichi, Masanari Kitagawa¹, Masaru Tomita², Shigehiko Kanaya¹, Chieko Wada⁴, Hirotada Mori¹ ²•Institutions (4)

Nara Institute of Science and Technology¹, Keio University², National Yang-Ming University³, Kyoto University⁴

01 May 2006-Genome Research

TL;DR: An extended analysis of these interacting networks by bioinformatics and experimentation should provide new insights and novel strategies for E. coli systems biology.

Abstract: Escherichia coli is one of the best characterized organisms and has served as a model system to study many aspects of bacterial physiology and genetics of fundamental and applied interest. Among the 4339 predicted ORFs including previous prediction (Riley et al. 2006) in E. coli, nearly 50% are experimentally uncharacterized. In addition to functional analysis of individual ORFs, systematic analyses of relationships between constituent elements, such as gene regulatory networks, protein–protein interactions (PPIs), and metabolic networks, have only recently become feasible. To date, comprehensive PPI studies have been based on the yeast two-hybrid system that detects binary interactions through activation of reporter gene expression (Fields and Song 1989; Uetz et al. 2000; Ito et al. 2001), and pull-down assays that detect large complexes by copurification of prey proteins through their interactions with bait proteins (Gavin et al. 2002; Ho et al. 2002) or protein chips (Zhu et al. 2001). In E. coli, a large-scale protein interaction network was recently carried out by pull-down assay using TAP-tagged bait proteins (Butland et al. 2005). We have already described a comprehensive E. coli ORF library (the ASKA library) as a new resource for E. coli biology (Kitagawa et al. 2005). Here, we report the use of this resource in a systematic analysis of PPIs using pull-down assays. With the advent of matrix-assisted laser desorption ionization time of flight (MALDI-TOF) mass spectrometry methods, it is feasible to identify PPIs on a proteome-wide scale. We have carried out a large-scale identification of protein–protein interactions to gain further understanding of the E. coli model cell at the system level. Because E. coli is one of the best studied organisms, it should also be an excellent target for systems biology (Kitano 2002) and synthetic biology fields (Silver and Way 2004) approaches.

Journal Article•DOI•

Differential lineage-specific amplification of transposable elements is responsible for genome size variation in Gossypium

[...]

Jennifer S. Hawkins¹, Hyeran Kim², John D. Nason¹, Rod A. Wing, Jonathan F. Wendel¹•Institutions (2)

Iowa State University¹, University of Arizona²

TL;DR: To understand the modes and mechanisms that underlie variation in genome composition, sequence data from whole genome shotgun libraries for three representative diploid members of Gossypium were generated and a pattern of lineage-specific amplification of particular subfamilies of retrotransposons within each species studied was demonstrated.

Abstract: The DNA content of eukaryotic nuclei (C-value) varies ∼200,000-fold, but there is only a ∼20-fold variation in the number of protein-coding genes. Hence, most C-value variation is ascribed to the repetitive fraction, although little is known about the evolutionary dynamics of the specific components that lead to genome size variation. To understand the modes and mechanisms that underlie variation in genome composition, we generated sequence data from whole genome shotgun (WGS) libraries for three representative diploid (n = 13) members of Gossypium that vary in genome size from 880 to 2460 Mb (1C) and from a phylogenetic outgroup, Gossypioides kirkii, with an estimated genome size of 588 Mb. Copy number estimates including all dispersed repetitive sequences indicate that 40%–65% of each genome is composed of transposable elements. Inspection of individual sequence types revealed differential, lineage-specific expansion of various families of transposable elements among the different plant lineages. Copia-like retrotransposable element sequences have differentially accumulated in the Gossypium species with the smallest genome, G. raimondii, while gypsy-like sequences have proliferated in the lineages with larger genomes. Phylogenetic analyses demonstrated a pattern of lineage-specific amplification of particular subfamilies of retrotransposons within each species studied. One particular group of gypsy-like retrotransposon sequences, Gorge3 (Gossypium retrotransposable gypsy-like element), appears to have undergone a massive proliferation in two plant lineages, accounting for a major fraction of genome-size change. Like maize, Gossypium has undergone a threefold increase in genome size due to the accumulation of LTR retrotransposons over the 5–10 Myr since its origin. [The sequence data described in this paper have been submitted to the GSS Division of GenBank under accessions DX390732–DX406528.]

Journal Article•DOI•

How reliable are empirical genomic scans for selective sweeps?

[...]

Kosuke M. Teshima¹, Graham Coop¹, Molly Przeworski¹•Institutions (1)

University of Chicago¹

01 Jun 2006-Genome Research

TL;DR: This work considered a coalescent model of directional selection in a sensible demographic setting, allowing for selection on standing variation as well as on a new mutation, and concluded that, insofar as attributes of the beneficial mutation affect the power to detect targets of selection, genomic scans will yield an unrepresentative subset of loci that contribute to adaptations.

Abstract: The beneficial substitution of an allele shapes patterns of genetic variation at linked sites. Thus, in principle, adaptations can be mapped by looking for the signature of directional selection in polymorphism data. In practice, such efforts are hampered by the need for an accurate characterization of the demographic history of the species and of the effects of positive selection. In an attempt to circumvent these difficulties, researchers are increasingly taking a purely empirical approach, in which a large number of genomic regions are ordered by summaries of the polymorphism data, and loci with extreme values are considered to be likely targets of positive selection. We evaluated the reliability of the "empirical" approach, focusing on applications to human data and to maize. To do so, we considered a coalescent model of directional selection in a sensible demographic setting, allowing for selection on standing variation as well as on a new mutation. Our simulations suggest that while empirical approaches will identify several interesting candidates, they will also miss many--in some cases, most--loci of interest. The extent of the trade-off depends on the mode of positive selection and the demographic history of the population. Specifically, the false-discovery rate is higher when directional selection involves a recessive rather than a co-dominant allele, when it acts on a previously neutral rather than a new allele, and when the population has experienced a population bottleneck rather than maintained a constant size. One implication of these results is that, insofar as attributes of the beneficial mutation (e.g., the dominance coefficient) affect the power to detect targets of selection, genomic scans will yield an unrepresentative subset of loci that contribute to adaptations.
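
The "empirical" approach evaluated above, in which genomic regions are ordered by a polymorphism summary and the extreme tail is taken as candidate targets, can be sketched in a few lines. This is a minimal Python sketch under stated assumptions: one summary statistic per locus (a hypothetical diversity value), low values treated as extreme, and a hypothetical tail cutoff. It illustrates the ranking procedure being critiqued, not the authors' coalescent simulations.

```python
from typing import Dict, List


def empirical_outliers(stats: Dict[str, float], tail_fraction: float = 0.05) -> List[str]:
    """Return the loci in the most extreme tail of a genome-wide ranking.

    Assumption for illustration: each locus has one summary statistic
    (e.g. a hypothetical diversity value) and LOW values count as extreme,
    as expected near a recent sweep. Direction and cutoff are not from the paper.
    """
    if not 0 < tail_fraction <= 1:
        raise ValueError("tail_fraction must be in (0, 1]")
    # Order loci from most to least extreme (lowest statistic first).
    ranked = sorted(stats, key=stats.get)
    # Keep at least one locus so very small scans still return a candidate.
    n_keep = max(1, int(len(ranked) * tail_fraction))
    return ranked[:n_keep]


if __name__ == "__main__":
    # Hypothetical per-locus diversity values.
    scan = {"locusA": 0.20, "locusB": 0.90, "locusC": 0.05, "locusD": 0.50}
    print(empirical_outliers(scan, tail_fraction=0.5))  # ['locusC', 'locusA']
```

The abstract's point is that which loci fall into such a tail depends on the mode of selection and the demographic history, so the tail is not a representative sample of the loci underlying adaptations.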

Journal Article•DOI•

Large-scale structure of genomic methylation patterns

[...]

Robert A. Rollins¹, Fatemeh Haghighi¹, John R. Edwards, Rajdeep Das², Michael Q. Zhang², Jingyue Ju¹, Timothy H. Bestor¹•Institutions (2)

Columbia University¹, Cold Spring Harbor Laboratory²

01 Feb 2006-Genome Research

TL;DR: The enrichment of regulatory sequences in the relatively small unmethylated compartment suggests that cytosine methylation constrains the effective size of the genome through the selective exposure of regulatory sequences.

Abstract: The mammalian genome depends on patterns of methylated cytosines for normal function, but the relationship between genomic methylation patterns and the underlying sequence is unclear. We have characterized the methylation landscape of the human genome by global analysis of patterns of CpG depletion and by direct sequencing of 3073 unmethylated domains and 2565 methylated domains from human brain DNA. The genome was found to consist of short (<4 kb) unmethylated domains embedded in a matrix of long methylated domains. Unmethylated domains were enriched in promoters, CpG islands, and first exons, while methylated domains comprised interspersed and tandem-repeated sequences, exons other than first exons, and non-annotated single-copy sequences that are depleted in the CpG dinucleotide. The enrichment of regulatory sequences in the relatively small unmethylated compartment suggests that cytosine methylation constrains the effective size of the genome through the selective exposure of regulatory sequences. This buffers regulatory networks against changes in total genome size and provides an explanation for the C value paradox, which concerns the wide variations in genome size that scale independently of gene number. This suggestion is compatible with the finding that cytosine methylation is universal among large-genome eukaryotes, while many eukaryotes with genome sizes <5 × 10⁸ bp do not methylate their DNA.
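
CpG depletion of the kind analysed above is conventionally quantified with the observed/expected CpG ratio, (number of CpG dinucleotides × sequence length) / (number of C × number of G). The following is a minimal Python sketch of that standard measure; whether it matches the paper's exact depletion metric is an assumption, and the 500-bp window and 250-bp step are hypothetical defaults.

```python
from typing import List, Tuple


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G)."""
    s = seq.upper()
    c, g = s.count("C"), s.count("G")
    if c == 0 or g == 0:
        return 0.0  # ratio undefined; treated as 0.0 by convention in this sketch
    # "CG" cannot overlap itself, so str.count gives the exact dinucleotide count.
    return s.count("CG") * len(s) / (c * g)


def windowed_obs_exp(seq: str, window: int = 500, step: int = 250) -> List[Tuple[int, float]]:
    """Slide a window along seq and report (start, obs/exp CpG) for each window."""
    last_start = max(len(seq) - window, 0)
    return [(start, cpg_obs_exp(seq[start:start + window]))
            for start in range(0, last_start + 1, step)]


if __name__ == "__main__":
    print(cpg_obs_exp("CGCGCGCG"))  # 2.0 (CpG-rich)
    print(cpg_obs_exp("CCCCGGGG"))  # 0.5 (CpG-depleted)
```

Windows with low ratios correspond to the CpG-depleted sequence that the abstract associates with methylated domains.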

Journal Article•DOI•

Graemlin: general and robust alignment of multiple large interaction networks.

[...]

Jason Flannick¹, Antal F. Novak, Balaji Srinivasan, Harley H. McAdams, Serafim Batzoglou•Institutions (1)

Stanford University¹

TL;DR: Graemlin is developed, the first algorithm capable of scalable multiple network alignment and the first quantitative benchmarks for network alignment, which allow comparisons of algorithms in terms of their ability to recapitulate the KEGG database of conserved functional modules.

Abstract: The recent proliferation of protein interaction networks has motivated research into network alignment: the cross-species comparison of conserved functional modules. Previous studies have laid the foundations for such comparisons and demonstrated their power on a select set of sparse interaction networks. Recently, however, new computational techniques have produced hundreds of predicted interaction networks with interconnection densities that push existing alignment algorithms to their limits. To find conserved functional modules in these new networks, we have developed Graemlin, the first algorithm capable of scalable multiple network alignment. Graemlin's explicit model of functional evolution allows both the generalization of existing alignment scoring schemes and the location of conserved network topologies other than protein complexes and metabolic pathways. To assess Graemlin's performance, we have developed the first quantitative benchmarks for network alignment, which allow comparisons of algorithms in terms of their ability to recapitulate the KEGG database of conserved functional modules. We find that Graemlin achieves substantial scalability gains over previous methods while improving sensitivity.

Journal Article•DOI•

MicroRNAs and other small RNAs enriched in the Arabidopsis RNA-dependent RNA polymerase-2 mutant.

[...]

Cheng Lu¹, Karthik Kulkarni, Frederic F. Souret, Ramesh MuthuValliappan, Shivakundan Singh Tej, R. Scott Poethig, Ian R. Henderson, Steven E. Jacobsen, Wenzhong Wang, Pamela J. Green, Blake C. Meyers•Institutions (1)

Delaware Biotechnology Institute¹

TL;DR: Deep sequencing of mutants provides a genetic approach for the dissection and characterization of diverse small RNA populations and the identification of low abundance miRNAs.

Abstract: The Arabidopsis genome contains a highly complex and abundant population of small RNAs, and many of the endogenous siRNAs are dependent on RNA-Dependent RNA Polymerase 2 (RDR2) for their biogenesis. By analyzing an rdr2 loss-of-function mutant using two different parallel sequencing technologies, MPSS and 454, we characterized the complement of miRNAs expressed in Arabidopsis inflorescence to considerable depth. Nearly all known miRNAs were enriched in this mutant and we identified 13 new miRNAs, all of which were relatively low abundance and constitute new families. Trans-acting siRNAs (ta-siRNAs) were even more highly enriched. Computational and gel blot analyses suggested that the minimal number of miRNAs in Arabidopsis is approximately 155. The size profile of small RNAs in rdr2 reflected enrichment of 21-nt miRNAs and other classes of siRNAs like ta-siRNAs, and a significant reduction in 24-nt heterochromatic siRNAs. Other classes of small RNAs were found to be RDR2-independent, particularly those derived from long inverted repeats and a subset of tandem repeats. The small RNA populations in other Arabidopsis small RNA biogenesis mutants were also examined; a dcl2/3/4 triple mutant showed a similar pattern to rdr2, whereas dcl1-7 and rdr6 showed reductions in miRNAs and ta-siRNAs consistent with their activities in the biogenesis of these types of small RNAs. Deep sequencing of mutants provides a genetic approach for the dissection and characterization of diverse small RNA populations and the identification of low abundance miRNAs.

Journal Article•DOI•

Suz12 binds to silenced regions of the genome in a cell-type-specific manner

[...]

Sharon L. Squazzo¹, Henriette O'Geen¹, Vitalina M. Komashko¹, Sheryl R. Krig¹, Victor X. Jin¹, Sung Wook Jang², Raphaël Margueron³, Danny Reinberg³, Roland Green⁴, Peggy J. Farnham¹•Institutions (4)

University of California, Davis¹, University of Wisconsin-Madison², Howard Hughes Medical Institute³, Hoffmann-La Roche⁴

01 Jul 2006-Genome Research

TL;DR: Surprisingly, the PRC complexes can be localized to discrete binding sites or spread through large regions of the mouse and human genomes, and it is suggested that OCT4 maintains stem cell self-renewal, in part, by recruiting PRC complexes to certain genes that promote differentiation.

Abstract: Suz12 is a component of the Polycomb group complexes 2, 3, and 4 (PRC 2/3/4). These complexes are critical for proper embryonic development, but very few target genes have been identified in either mouse or human cells. Using a variety of ChIP-chip approaches, we have identified a large set of Suz12 target genes in five different human and mouse cell lines. Interestingly, we found that Suz12 target promoters are cell type specific, with transcription factors and homeobox proteins predominating in embryonal cells and glycoproteins and immunoglobulin-related proteins predominating in adult tumors. We have also characterized the localization of other components of the PRC complex with Suz12 and investigated the overall relationship between Suz12 binding and markers of active versus inactive chromatin, using both promoter arrays and custom tiling arrays. Surprisingly, we find that the PRC complexes can be localized to discrete binding sites or spread through large regions of the mouse and human genomes. Finally, we have shown that some Suz12 target genes are bound by OCT4 in embryonal cells and suggest that OCT4 maintains stem cell self-renewal, in part, by recruiting PRC complexes to certain genes that promote differentiation.

Journal Article•DOI•

Novel patterns of genome rearrangement and their association with survival in breast cancer

[...]

James W. Hicks¹, Alexander Krasnitz¹, B. Lakshmi¹, Nicholas Navin, Michael Riggs¹, Evan Leibu¹, Diane Esposito¹, Joan Alexander¹, Jen Troge¹, Vladimir Grubor¹, Seungtai Yoon¹, Michael Wigler¹, Kenny Ye², Anne Lise Børresen-Dale, Bjørn Naume, Ellen Schlicting³, Larry Norton⁴, Torsten Hägerström⁵, Lambert Skoog⁵, Gert Auer⁵, Susanne Månér⁵, Pär Lundin⁵, Anders Zetterberg⁵•Institutions (5)

Cold Spring Harbor Laboratory¹, Yeshiva University², University of Oslo³, Memorial Sloan Kettering Cancer Center⁴, Karolinska Institutet⁵

01 Dec 2006-Genome Research

TL;DR: Analysis of a selected subset of clinical material suggests that a simple genomic calculation, based on the number and proximity of genomic alterations, correlates with life-table estimates of the probability of overall survival in patients with primary breast cancer.

Abstract: Representational Oligonucleotide Microarray Analysis (ROMA) detects genomic amplifications and deletions with boundaries defined at a resolution of approximately 50 kb. We have used this technique to examine 243 breast tumors from two separate studies for which detailed clinical data were available. The very high resolution of this technology has enabled us to identify three characteristic patterns of genomic copy number variation in diploid tumors and to measure correlations with patient survival. One of these patterns is characterized by multiple closely spaced amplicons, or "firestorms," limited to single chromosome arms. These multiple amplifications are highly correlated with aggressive disease and poor survival even when the rest of the genome is relatively quiet. Analysis of a selected subset of clinical material suggests that a simple genomic calculation, based on the number and proximity of genomic alterations, correlates with life-table estimates of the probability of overall survival in patients with primary breast cancer. Based on this sample, we generate the working hypothesis that copy number profiling might provide information useful in making clinical decisions, especially regarding the use or not of systemic therapies (hormonal therapy, chemotherapy), in the management of operable primary breast cancer with ostensibly good prognosis, for example, small, node-negative, hormone-receptor-positive diploid cases.

Journal Article•DOI•

Phylogenetic analyses of cyanobacterial genomes: Quantification of horizontal gene transfer events

[...]

Olga Zhaxybayeva¹, J. Peter Gogarten, Robert L. Charlebois, W. Ford Doolittle, R. Thane Papke•Institutions (1)

Dalhousie University¹

TL;DR: Cyanobacterial genomes reveal a complex evolutionary history, which cannot be represented by a single strictly bifurcating tree for all genes or even most genes, although a single completely resolved phylogeny was recovered from the quartets' plurality signals.

Abstract: Using 1128 protein-coding gene families from 11 completely sequenced cyanobacterial genomes, we attempt to quantify horizontal gene transfer events within cyanobacteria, as well as between cyanobacteria and other phyla. A novel method of detecting and enumerating potential horizontal gene transfer events within a group of organisms based on analyses of “embedded quartets” allows us to identify phylogenetic signal consistent with a plurality of gene families, as well as to delineate cases of conflict to the plurality signal, which include horizontally transferred genes. To infer horizontal gene transfer events between cyanobacteria and other phyla, we added homologs from 168 available genomes. We screened phylogenetic trees reconstructed for each of these extended gene families for highly supported monophyly of cyanobacteria (or lack of it). Cyanobacterial genomes reveal a complex evolutionary history, which cannot be represented by a single strictly bifurcating tree for all genes or even most genes, although a single completely resolved phylogeny was recovered from the quartets’ plurality signals. We find more conflicts within cyanobacteria than between cyanobacteria and other phyla. We also find that genes from all functional categories are subject to transfer. However, in interphylum as compared to intraphylum transfers, the proportion of metabolic (operational) gene transfers increases, while the proportion of informational gene transfers decreases.

Journal Article•DOI•

Molecular evolution and tempo of amplification of human LINE-1 retrotransposons since the origin of primates.

[...]

Hameed Khan¹, Arian F.A. Smit², Stéphane Boissinot•Institutions (2)

City University of New York¹, Institute for Systems Biology²

TL;DR: It is proposed that L1 families with different 5'UTR can coexist because they don't rely on the same host-encoded factors for their transcription and therefore do not compete with each other.

Abstract: We investigated the evolution of the families of LINE-1 (L1) retrotransposons that have amplified in the human lineage since the origin of primates. We identified two phases in the evolution of L1. From approximately 70 million years ago (Mya) until approximately 40 Mya, three distinct L1 lineages were simultaneously active in the genome of ancestral primates. In contrast, during the last 40 million years (Myr), i.e., during the evolution of anthropoid primates, a single lineage of families has evolved and amplified. We found that novel (i.e., unrelated) regulatory regions (5'UTR) have been frequently recruited during the evolution of L1, whereas the two open-reading frames (ORF1 and ORF2) have remained relatively conserved. We found that L1 families coexisted and formed independently evolving L1 lineages only when they had different 5'UTRs. We propose that L1 families with different 5'UTR can coexist because they don't rely on the same host-encoded factors for their transcription and therefore do not compete with each other. The most prolific L1 families (families L1PA8 to L1PA3) amplified between 40 and 12 Mya. This period of high activity corresponds to an episode of adaptive evolution in a segment of ORF1. The correlation between the high activity of L1 families and adaptive evolution could result from the coevolution of L1 and a host-encoded repressor of L1 activity.

Journal Article•DOI•

Unbiased location analysis of E2F1-binding sites suggests a widespread role for E2F1 in the human genome

[...]

Mark Bieda¹, Xiaoqin Xu¹, Michael A. Singer², Roland Green², Peggy J. Farnham¹•Institutions (2)

University of California, Davis¹, Hoffmann-La Roche²

01 May 2006-Genome Research

TL;DR: The results suggest that E2F1 is recruited to promoters via a method distinct from recognition of the known consensus site and point toward a new understanding of E2F1 as a factor that contributes to the regulation of a large fraction of human genes.

Abstract: The E2F family of transcription factors regulates basic cellular processes. Here, we take an unbiased approach towards identifying E2F1 target genes by examining localization of E2F1-binding sites using high-density oligonucleotide tiling arrays. To begin, we developed a statistically-based methodology for analysis of ChIP-chip data obtained from arrays that represent 30 Mb of the human genome. Using this methodology, we identified regions bound by E2F1, MYC, and RNA Polymerase II (POLR2A). We found a large number of binding sites for all three factors; extrapolation suggests there may be approximately 20,000-30,000 E2F1- and MYC-binding sites and approximately 12,000-17,000 active promoters in HeLa cells. In contrast to our results for MYC, we find that the majority of E2F1-binding sites (>80%) are located in core promoters and that 50% of the sites overlap transcription starts. Only a small fraction of E2F1 sites possess the canonical binding motif. Surprisingly, we found that approximately 30% of genes in the 30-Mb region possessed an E2F1 binding site in a core promoter and E2F1 was bound near to 83% of POLR2A-bound sites. To determine if these results were representative of the entire human genome, we performed ChIP-chip analyses of approximately 24,000 promoters and confirmed that greater than 20% of the promoters were bound by E2F1. Our results suggest that E2F1 is recruited to promoters via a method distinct from recognition of the known consensus site and point toward a new understanding of E2F1 as a factor that contributes to the regulation of a large fraction of human genes.

Journal Article•DOI•

Massive genome erosion and functional adaptations provide insights into the symbiotic lifestyle of Sodalis glossinidius in the tsetse host

[...]

Hidehiro Toh¹, Brian L. Weiss², Sarah A.H. Perkin², Atsushi Yamashita¹, Kenshiro Oshima¹, Masahira Hattori¹, Serap Aksoy²•Institutions (2)

Kitasato University¹, Yale University²

01 Feb 2006-Genome Research

TL;DR: Sodalis represents an evolutionary intermediate transitioning from a free-living to a mutualistic lifestyle, and its chromosome encodes a complete flagella structure, key components of which are expressed in immature host developmental stages.

Abstract: Sodalis glossinidius is a maternally transmitted endosymbiont of tsetse flies (Glossina spp.), an insect of medical and veterinary significance. Analysis of the complete sequence of Sodalis' chromosome (4,171,146 bp, encoding 2,432 protein coding sequences) indicates a reduced coding capacity of 51%. Furthermore, the chromosome contains 972 pseudogenes, an inordinately high number compared with that of other bacterial species. A high proportion of these pseudogenes are homologs of known proteins that function either in defense or in the transport and metabolism of carbohydrates and inorganic ions, suggesting Sodalis' degenerative adaptations to the immunity and restricted nutritional status of the host. Sodalis possesses three chromosomal symbiosis regions (SSR): SSR-1, SSR-2, and SSR-3, with gene inventories similar to the Type-III secretion system (TTSS) ysa from Yersinia enterocolitica and SPI-1 and SPI-2 from Salmonella, respectively. While core components of the needle structure have been conserved, some of the effectors and regulators typically associated with these systems in pathogenic microbes are modified or eliminated in Sodalis. Analysis of SSR-specific invA transcript abundance in Sodalis during host development indicates that the individual symbiosis regions may exhibit different temporal expression profiles. In addition, the Sodalis chromosome encodes a complete flagella structure, key components of which are expressed in immature host developmental stages. These features may be important for the transmission and establishment of symbiont infections in the intra-uterine progeny. The data suggest that Sodalis represents an evolutionary intermediate transitioning from a free-living to a mutualistic lifestyle.
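
A coding-capacity figure such as the 51% above is commonly expressed as the fraction of chromosome bases covered by protein-coding sequences. A minimal Python sketch, assuming CDS coordinates are supplied as hypothetical half-open (start, end) intervals on a single chromosome; overlapping genes are merged so shared bases are counted once. It does not claim to reproduce the paper's annotation pipeline.

```python
from typing import List, Tuple


def coding_fraction(cds: List[Tuple[int, int]], chrom_length: int) -> float:
    """Fraction of chromosome bases covered by at least one CDS interval."""
    if chrom_length <= 0:
        raise ValueError("chrom_length must be positive")
    covered = 0
    cur_start = cur_end = None
    # Sweep intervals in coordinate order, merging overlapping or adjacent ones.
    for start, end in sorted(cds):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start  # close the previous merged block
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)  # extend the current merged block
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / chrom_length


if __name__ == "__main__":
    # Hypothetical CDS intervals on a 1,000-bp toy chromosome.
    print(coding_fraction([(0, 100), (50, 150), (300, 400)], 1000))  # 0.25
```

Merging before summing matters for bacterial genomes, where neighbouring genes on opposite strands or in operons can overlap by a few bases.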

Journal Article•DOI•

Skewed genomic variability in strains of the toxigenic bacterial pathogen, Clostridium perfringens

[...]

Garry S. A. Myers¹, David A. Rasko¹ ², Jackie K. Cheung³, Jacques Ravel¹, Rekha Seshadri¹, Robert T. DeBoy¹, Qinghu Ren¹, John J. Varga⁴, Milena M. Awad³, Lauren M. Brinkac¹, Sean C. Daugherty¹, Daniel H. Haft¹, Robert J. Dodson¹, Ramana Madupu¹, William C. Nelson¹, M. J. Rosovitz¹, Steven A. Sullivan¹, Hoda Khouri¹, George Dimitrov¹, Kisha Watkins¹, Stephanie Mulligan¹, Jonathan L. Benton¹, Diana Radune¹, Derek J. Fisher⁵, Helen S. Atkins⁶, Thomas J. Hiscox³, B. Helen Jost⁷, Stephen J. Billington⁷, J. Glenn Songer⁷, Bruce A. McClane⁵, Richard W. Titball⁶, Julian I. Rood³, Stephen B. Melville⁴, Ian T. Paulsen¹•Institutions (7)

J. Craig Venter Institute¹, University of Texas Southwestern Medical Center², Monash University³, Virginia Tech⁴, University of Pittsburgh⁵, Defence Science and Technology Laboratory⁶, University of Arizona⁷