scispace - formally typeset
Search or ask a question

Showing papers by "Michael Snyder published in 2008"


Journal ArticleDOI
06 Jun 2008-Science
TL;DR: A quantitative sequencing-based method is developed for mapping transcribed regions, in which complementary DNA fragments are subjected to high-throughput sequencing and mapped to the genome, and it is demonstrated that most (74.5%) of the nonrepetitive sequence of the yeast genome is transcribed.
Abstract: The identification of untranslated regions, introns, and coding regions within an organism remains challenging. We developed a quantitative sequencing-based method called RNA-Seq for mapping transcribed regions, in which complementary DNA fragments are subjected to high-throughput sequencing and mapped to the genome. We applied RNA-Seq to generate a high-resolution transcriptome map of the yeast genome and demonstrated that most (74.5%) of the nonrepetitive sequence of the yeast genome is transcribed. We confirmed many known and predicted introns and demonstrated that others are not actively used. Alternative initiation codons and upstream open reading frames also were identified for many yeast genes. We also found unexpected 3'-end heterogeneity and the presence of many overlapping genes. These results indicate that the yeast transcriptome is more complex than previously appreciated.

2,506 citations


Journal ArticleDOI
03 Oct 2008-Science
TL;DR: A comparative quality assessment of current yeast interactome data sets is carried out, demonstrating that high-throughput yeast two-hybrid (Y2H) screening provides high-quality binary interaction information.
Abstract: Current yeast interactome network maps contain several hundred molecular complexes with limited and somewhat controversial representation of direct binary interactions. We carried out a comparative quality assessment of current yeast interactome data sets, demonstrating that high-throughput yeast two-hybrid (Y2H) screening provides high-quality binary interaction information. Because a large fraction of the yeast binary interactome remains to be mapped, we developed an empirically controlled mapping framework to produce a "second-generation" high-quality, high-throughput Y2H data set covering approximately 20% of all yeast binary interactions. Both Y2H and affinity purification followed by mass spectrometry (AP/MS) data are of equally high quality but of a fundamentally different and complementary nature, resulting in networks with different topological and biological properties. Compared to co-complex interactome models, this binary map is enriched for transient signaling interactions and intercomplex connections with a highly significant clustering between essential proteins. Rather than correlating with essentiality, protein connectivity correlates with genetic pleiotropy.

1,452 citations


Journal ArticleDOI
TL;DR: This work globally identified the promoters occupied by SREBP1 and its binding partners NFY and SP1 in a human hepatocyte cell line using chromatin immunoprecipitation combined with genome tiling arrays (ChIP-chip).
Abstract: The sterol regulatory element-binding protein (SREBP) family member SREBP1 is a critical transcriptional regulator of cholesterol and fatty acid metabolism and has been implicated in insulin resistance, diabetes, and other diet-related diseases. We globally identified the promoters occupied by SREBP1 and its binding partners NFY and SP1 in a human hepatocyte cell line using chromatin immunoprecipitation combined with genome tiling arrays (ChIP-chip). We find that SREBP1 occupies the promoters of 1,141 target genes involved in diverse biological pathways, including novel targets with roles in lipid metabolism and insulin signaling. We also identify a conserved SREBP1 DNA-binding motif in SREBP1 target promoters, and we demonstrate that many SREBP1 target genes are transcriptionally activated by treatment with insulin and glucose using gene expression microarrays. Finally, we show that SREBP1 cooperates extensively with NFY and SP1 throughout the genome and that unique combinations of these factors target distinct functional pathways. Our results provide insight into the regulatory circuitry in which SREBP1 and its network partners coordinate a complex transcriptional response in the liver with cues from the diet.

201 citations


Journal ArticleDOI
TL;DR: For a large majority of the locations of stimulated STAT1 binding, the dominant H3K4me1/me3 combinations were established before activation, suggesting mechanisms independent of IFNG stimulation and high-affinitySTAT1 binding.
Abstract: We characterized the relationship of H3K4me1 and H3K4me3 at distal and proximal regulatory elements by comparing ChIP-seq profiles for these histone modifications and for two functionally different transcription factors: STAT1 in the immortalized HeLa S3 cell line, with and without interferon-gamma (IFNG) stimulation; and FOXA2 in mouse adult liver tissue. In unstimulated and stimulated HeLa cells, respectively, we determined approximately 270,000 and approximately 301,000 H3K4me1-enriched regions, and approximately 54,500 and approximately 76,100 H3K4me3-enriched regions. In mouse adult liver, we determined approximately 227,000 and approximately 34,800 H3K4me1 and H3K4me3 regions. Seventy-five percent of the approximately 70,300 STAT1 binding sites in stimulated HeLa cells and 87% of the approximately 11,000 FOXA2 sites in mouse liver were distal to known gene TSS; in both cell types, approximately 83% of these distal sites were associated with at least one of the two histone modifications, and H3K4me1 was associated with over 96% of marked distal sites. After filtering against predicted transcription start sites, 50% of approximately 26,800 marked distal IFNG-stimulated STAT1 binding sites, but 95% of approximately 5800 marked distal FOXA2 sites, were associated with H3K4me1 only. Results for HeLa cells generated additional insights into transcriptional regulation involving STAT1. STAT1 binding was associated with 25% of all H3K4me1 regions in stimulated HeLa cells, suggesting that a single transcription factor can interact with an unexpectedly large fraction of regulatory regions. Strikingly, for a large majority of the locations of stimulated STAT1 binding, the dominant H3K4me1/me3 combinations were established before activation, suggesting mechanisms independent of IFNG stimulation and high-affinity STAT1 binding.

198 citations


Journal ArticleDOI
TL;DR: The results imply that a shift in predominant formation mechanism occurred in recent history: approximately 40 million years ago, during the "Alu burst" in retrotransposition activity, non-allelic homologous recombination was the main driver of genome rearrangements.
Abstract: Segmental duplications (SDs) are operationally defined as >1 kb stretches of duplicated DNA with high sequence identity. They arise from copy number variants (CNVs) fixed in the population. To investigate the formation of SDs and CNVs, we examine their large-scale patterns of co-occurrence with different repeats. Alu elements, a major class of genomic repeats, had previously been identified as prime drivers of SD formation. We also observe this association; however, we find that it sharply decreases for younger SDs. Continuing this trend, we find only weak associations of CNVs with Alus. Similarly, we find an association of SDs with processed pseudogenes, which is decreasing for younger SDs and absent entirely for CNVs. Next, we find that SDs are significantly co-localized with each other, resulting in a highly skewed "power-law" distribution and chromosomal hotspots. We also observe a significant association of CNVs with SDs, but find that an SD-mediated mechanism only accounts for some CNVs (<28%). Overall, our results imply that a shift in predominant formation mechanism occurred in recent history: approximately 40 million years ago, during the "Alu burst" in retrotransposition activity, non-allelic homologous recombination, first mediated by Alus and then the by newly formed CNVs themselves, was the main driver of genome rearrangements; however, its relative importance has decreased markedly since then, with proportionally more events now stemming from other repeats and from non-homologous end-joining. In addition to a coarse-grained analysis, we performed targeted sequencing of 67 CNVs and then analyzed a combined set of 270 CNVs (540 breakpoints) to verify our conclusions.

150 citations


Journal ArticleDOI
TL;DR: The first objective analysis of tiling array platforms, amplification procedures, and signal detection algorithms in a simulated ChIP-chip experiment is conducted, finding that microarray platform choice is not the primary determinant of overall performance.
Abstract: The most widely used method for detecting genome-wide protein–DNA interactions is chromatin immunoprecipitation on tiling microarrays, commonly known as ChIP-chip. Here, we conducted the first objective analysis of tiling array platforms, amplification procedures, and signal detection algorithms in a simulated ChIP-chip experiment. Mixtures of human genomic DNA and “spike-ins” comprised of nearly 100 human sequences at various concentrations were hybridized to four tiling array platforms by eight independent groups. Blind to the number of spike-ins, their locations, and the range of concentrations, each group made predictions of the spike-in locations. We found that microarray platform choice is not the primary determinant of overall performance. In fact, variation in performance between labs, protocols, and algorithms within the same array platform was greater than the variation in performance between array platforms. However, each array platform had unique performance characteristics that varied with tiling resolution and the number of replicates, which have implications for cost versus detection power. Long oligonucleotide arrays were slightly more sensitive at detecting very low enrichment. On all platforms, simple sequence repeats and genome redundancy tended to result in false positives. LM-PCR and WGA, the most popular sample amplification techniques, reproduced relative enrichment levels with high fidelity. Performance among signal detection algorithms was heavily dependent on array platform. The spike-in DNA samples and the data presented here provide a stable benchmark against which future ChIP platforms, protocol improvements, and analysis methods can be evaluated.

143 citations


Journal ArticleDOI
TL;DR: CNVs tend to affect specific gene functional categories, such as those associated with environmental response, and are depleted in genes related to basic cellular processes, which means they occur more often at the periphery of the protein interaction network.

114 citations


Journal ArticleDOI
TL;DR: It is observed that CNVs are more frequent among OR pseudogenes than among intact genes, presumably due to both selective constraints and CNV formation biases, and results show an enrichment of CNVs among ORs with a close human paralog or lacking a one-to-one ortholog in chimpanzee, a finding potentially related to the known diminution of the human OR repertoire.
Abstract: Olfactory receptors (ORs), which are involved in odorant recognition, form the largest mammalian protein superfamily. The genomic contentof ORgenes isconsiderably reducedin humans,asreflected bythe relatively small repertoire sizeand the high fraction (,55%) of human pseudogenes. Since several recent low-resolution surveys suggested that OR genomic loci are frequently affected by copy-number variants (CNVs), we hypothesized that CNVs may play an important role in the evolution of the human olfactory repertoire. We used high-resolution oligonucleotide tiling microarrays to detect CNVs across 851 OR gene and pseudogene loci. Examining genomic DNA from 25 individuals with ancestry from three populations, we identified 93 OR gene loci and 151 pseudogene loci affected by CNVs, generating a mosaic of OR dosages across persons. Our data suggest that ,50% of the CNVs involve more than one OR, with the largest CNV spanning 11 loci. In contrast to earlier reports, we observe that CNVs are more frequent among OR pseudogenes than among intact genes, presumably due to both selective constraints and CNV formation biases. Furthermore, our results show an enrichment of CNVs among ORs with a close human paralog or lacking a one-to-one ortholog in chimpanzee. Interestingly, among the latter we observed an enrichment in CNV losses over gains, a finding potentially related to the known diminution of the human OR repertoire. Quantitative PCR experiments performed for 122 sampled ORs agreed well with the microarray results and uncovered 23 additional CNVs. Importantly, these experiments allowed us touncover nine common deletion alleles that affect 15 ORgenes and five pseudogenes. Comparison to the chimpanzee reference genome revealed that all of the deletion alleles are human derived, therefore indicating a profound effect of human-specific deletions on the individual OR gene content. Furthermore, these deletion alleles may be used in future genetic association studies of olfactory inter-individual differences.

107 citations


Journal ArticleDOI
TL;DR: A method using phi29 DNA polymerase and trehalose and optimized control of amplification to create micrograms of specific amplicons without TIPs from down to subfemtograms of DNA, and generates products suitable for analysis by massively parallel sequencing as well as microarray hybridization.
Abstract: Highly specific amplification of complex DNA pools without bias or template-independent products (TIPs) remains a challenge. We have developed a method using phi29 DNA polymerase and trehalose and optimized control of amplification to create micrograms of specific amplicons without TIPs from down to subfemtograms of DNA. With an input of as little as 0.5-2.5 ng of human gDNA or a few cells, the product could be close to native DNA in locus representation. The amplicons from 5 and 0.5 ng of DNA faithfully demonstrated all previously known heterozygous segmental duplications and deletions (3 Mb to 18 kb) located on chromosome 22 and even a homozygous deletion smaller than 1 kb with high-resolution chromosome-wide comparative genomic hybridization. With 550k Infinium BeadChip SNP typing, the >99.7% accuracy was compared favorably with results on unamplified DNA. Importantly, underrepresentation of chromosome termini that occurred with GenomiPhi v2 was greatly rescued with the present procedure, and the call rate and accuracy of SNP typing were also improved for the amplicons with a 0.5-ng, partially degraded DNA input. In addition, the amplification proceeded logarithmically in terms of total yield before saturation; the intact cells was amplified >50 times more efficiently than an equivalent amount of extracted DNA; and the locus imbalance for amplicons with 0.1 ng or lower input of DNA was variable, whereas for higher input it was largely reproducible. This procedure facilitates genomic analysis with single cells or other traces of DNA, and generates products suitable for analysis by massively parallel sequencing as well as microarray hybridization.

106 citations


Journal ArticleDOI
Zhengdong D. Zhang1, Joel Rozowsky1, Michael Snyder1, Jin Chang1, Mark Gerstein1 
TL;DR: The results show that both the background and the binding sites need to have a markedly nonuniform distribution in order to correctly model the observed ChIP-seq data, with, for instance, the background tag counts modeled by a gamma distribution.
Abstract: ChIP sequencing (ChIP-seq) is a new method for genomewide mapping of protein binding sites on DNA. It has generated much excitement in functional genomics. To score data and determine adequate sequencing depth, both the genomic background and the binding sites must be properly modeled. To develop a computational foundation to tackle these issues, we first performed a study to characterize the observed statistical nature of this new type of high-throughput data. By linking sequence tags into clusters, we show that there are two components to the distribution of tag counts observed in a number of recent experiments: an initial power-law distribution and a subsequent long right tail. Then we develop in silico ChIP-seq, a computational method to simulate the experimental outcome by placing tags onto the genome according to particular assumed distributions for the actual binding sites and for the background genomic sequence. In contrast to current assumptions, our results show that both the background and the binding sites need to have a markedly nonuniform distribution in order to correctly model the observed ChIP-seq data, with, for instance, the background tag counts modeled by a gamma distribution. On the basis of these results, we extend an existing scoring approach by using a more realistic genomic-background model. This enables us to identify transcription-factor binding sites in ChIP-seq data in a statistically rigorous fashion.

105 citations


Journal ArticleDOI
TL;DR: It is shown that protein microarrays provide an efficient and high throughput tool for genome-wide analysis of TF-DNA interactions and offer an attractive high-throughput alternative to traditional techniques for TF functional characterization on a global scale.

Journal ArticleDOI
TL;DR: RACE sequencing is an efficient, sensitive, and highly accurate method for characterization of the transcriptome of specific cell/tissue types, and it appears that much of the genome is represented in polyA+ RNA.
Abstract: Background: Recent studies of the mammalian transcriptome have revealed a large number of additional transcribed regions and extraordinary complexity in transcript diversity. However, there is still much uncertainty regarding precisely what portion of the genome is transcribed, the exact structures of these novel transcripts, and the levels of the transcripts produced. Results: We have interrogated the transcribed loci in 420 selected ENCyclopedia Of DNA Elements (ENCODE) regions using rapid amplification of cDNA ends (RACE) sequencing. We analyzed annotated known gene regions, but primarily we focused on novel transcriptionally active regions (TARs), which were previously identified by high-density oligonucleotide tiling arrays and on random regions that were not believed to be transcribed. We found RACE sequencing to be very sensitive and were able to detect low levels of transcripts in specific cell types that were not detectable by microarrays. We also observed many instances of sense-antisense transcripts; further analysis suggests that many of the antisense transcripts (but not all) may be artifacts generated from the reverse transcription reaction. Our results show that the majority of the novel TARs analyzed (60%) are connected to other novel TARs or known exons. Of previously unannotated random regions, 17% were shown to produce overlapping transcripts. Furthermore, it is estimated that 9% of the novel transcripts encode proteins. Conclusion: We conclude that RACE sequencing is an efficient, sensitive, and highly accurate method for characterization of the transcriptome of specific cell/tissue types. Using this method, it appears that much of the genome is represented in polyA+ RNA. Moreover, a fraction of the novel RNAs can encode protein and are likely to be functional.

Journal ArticleDOI
TL;DR: This study reveals extensive sites of poly(A) addition and provides insights into the events that occur during 3' end formation, and suggests transcription termination loss of histone 3 lysine 36 methylation and later release of RNA polymerase.
Abstract: Genomic analyses have been applied extensively to analyze the process of transcription initiation in mammalian cells, but less to transcript 3′ end formation and transcription termination. We used a novel approach to prepare 3′ end fragments from polyadenylated RNA, and mapped the position of the poly(A) addition site using oligonucleotide arrays tiling 1% of the human genome. This approach revealed more 3′ ends than had been annotated. The distribution of these ends relative to RNA polymerase II (PolII) and di- and trimethylated lysine 4 and lysine 36 of histone H3 was compared. A substantial fraction of unannotated 3′ ends of RNA are intronic and antisense to the embedding gene. Poly(A) ends of annotated messages lie on average 2 kb upstream of the end of PolII binding (termination). Near the termination sites, and in some internal sites, unphosphorylated and C-terminal domain (CTD) serine 2 phosphorylated PolII (POLR2A) accumulate, suggesting pausing of the polymerase and perhaps dephosphorylation prior to release. Lysine 36 trimethylation occurs across transcribed genes, sometimes alternating with stretches of DNA in which lysine 36 dimethylation is more prominent. Lysine 36 methylation decreases at or near the site of polyadenylation, sometimes disappearing before disappearance of phosphorylated RNA PolII or release of PolII from DNA. Our results suggest that transcription termination loss of histone 3 lysine 36 methylation and later release of RNA polymerase. The latter is often associated with polymerase pausing. Overall, our study reveals extensive sites of poly(A) addition and provides insights into the events that occur during 3′ end formation.

Journal ArticleDOI
TL;DR: Genome-wide surveys of the phenomenon suggest that it is likely to be a rate-limiting control on gene activation that poises developmental and stimulus-responsive genes for prompt expression when inducing signals are received.
Abstract: Stalling of RNA polymerase II near the promoter has recently been found to be much more common than previously thought. Genome-wide surveys of the phenomenon suggest that it is likely to be a rate-limiting control on gene activation that poises developmental and stimulus-responsive genes for prompt expression when inducing signals are received.

Journal ArticleDOI
01 Mar 2008-RNA
TL;DR: P-body disassembly is delayed in the motor mutant, myo2-66, indicating that Myo2p may facilitate the release of mRNAs from the P-body.
Abstract: Myo2p is an essential class V myosin in budding yeast with several identified functions in organelle trafficking and spindle orientation. The present study demonstrates that Myo2p is a component of a large RNA-containing complex (Myo2p–RNP) that is distinct from polysomes based on sedimentation analysis and lack of ribosomal subunits in the Myo2p–RNP. Microarray analysis of RNAs that coimmunoprecipitate with Myo2p revealed the presence of a large number of mRNAs in this complex. The Myo2p–RNA complex is in part composed of the RNA processing body (P-body) based on coprecipitation with P-body protein subunits and partial colocalization of Myo2p with P-bodies. P-body disassembly is delayed in the motor mutant, myo2-66, indicating that Myo2p may facilitate the release of mRNAs from the P-body.

Journal ArticleDOI
TL;DR: Leptin is added to the repertoire of factors that mediate EMT and a role for the interleukin 6 family is demonstrated for the first time in embryonic EMT, suggesting that the leptin signaling pathway is involved in adhesion and migration processes.
Abstract: Blood circulation is dependent on heart valves to direct blood flow through the heart and great vessels. Valve development relies on epithelial to mesenchymal transition (EMT), a central feature of embryonic development and metastatic cancer. Abnormal EMT and remodeling contribute to the etiology of several congenital heart defects. Leptin and its receptor were detected in the mouse embryonic heart. Using an ex vivo model of cardiac EMT, the inhibition of leptin results in a signal transducer and activator of transcription 3 and Snail/vascular endothelial cadherin–independent decrease in EMT and migration. Our data suggest that an Akt signaling pathway underlies the observed phenotype. Furthermore, loss of leptin phenocopied the functional inhibition of αvβ3 integrin receptor and resulted in decreased αvβ3 integrin and matrix metalloprotease 2, suggesting that the leptin signaling pathway is involved in adhesion and migration processes. This study adds leptin to the repertoire of factors that mediate EMT and, for the first time, demonstrates a role for the interleukin 6 family in embryonic EMT.

Journal ArticleDOI
TL;DR: It is found that nonspecific binding by MM oligos depends upon the individual nucleotide substitutions they incorporate: C→A, C→G and T→A (yielding purine-purine mispairs) are most disruptive, whereas A→X were least disruptive.
Abstract: Background Mismatched oligonucleotides are widely used on microarrays to differentiate specific from nonspecific hybridization. While many experiments rely on such oligos, the hybridization behavior of various degrees of mismatch (MM) structure has not been extensively studied. Here, we present the results of two large-scale microarray experiments on S. cerevisiae and H. sapiens genomic DNA, to explore MM oligonucleotide behavior with real sample mixtures under tiling-array conditions.

Patent
20 Aug 2008
TL;DR: In this article, the authors proposed a method for using proteome chips to systematically assay all protein interactions in a species, in a high-throughput system, in which an array containing a plurality of proteins capable of designating position existing on the different positions on the substrate contains at least one protein coded by at least 50% of known gene in a single species.
Abstract: PROBLEM TO BE SOLVED: To provide a production method relating to a proteome chips comprising arrays, having a large proportion of all proteins expressed in a single species, and to provide a method for using proteome chips to systematically assay all protein interactions in a species, in a high-throughput system. SOLUTION: An array containing a plurality of proteins capable of designating position existing, respectively on the different positions on the substrate contains at least one protein coded by at least 50% of known gene in single species. COPYRIGHT: (C)2009,JPO&INPIT