
Showing papers by "Michael Snyder" published in 2009


Journal Article
TL;DR: The RNA-Seq approach to transcriptome profiling that uses deep-sequencing technologies provides a far more precise measurement of levels of transcripts and their isoforms than other methods.
Abstract: RNA-Seq is a recently developed approach to transcriptome profiling that uses deep-sequencing technologies. Studies using this method have already altered our view of the extent and complexity of eukaryotic transcriptomes. RNA-Seq also provides a far more precise measurement of levels of transcripts and their isoforms than other methods. This article describes the RNA-Seq approach, the challenges associated with its application, and the advances made so far in characterizing several eukaryote transcriptomes.

11,528 citations


Journal Article
18 Jun 2009-Nature
TL;DR: A look at the crucial functional elements of fly and worm genomes could shed light on how genetic information produces complex organisms.
Abstract: Despite the successes of genomics, little is known about how genetic information produces complex organisms. A look at the crucial functional elements of fly and worm genomes could change that.

771 citations


Journal Article
TL;DR: In this issue, modENCODE team members outline their plan of campaign; data from the project are to be made available at http://www.modencode.org and elsewhere as the work progresses.
Abstract: Despite the successes of genomics, little is known about how genetic information produces complex organisms. A look at the crucial functional elements of fly and worm genomes could change that.

767 citations


Journal Article
TL;DR: A general scoring approach to address unique challenges in ChIP-seq data analysis is described, based on the observation that sites of potential binding are strongly correlated with signal peaks in the control, likely revealing features of open chromatin.
Abstract: Chromatin immunoprecipitation (ChIP) followed by tag sequencing (ChIP-seq) using high-throughput next-generation instrumentation is fast replacing chromatin immunoprecipitation followed by genome tiling array analysis (ChIP-chip) as the preferred approach for mapping of sites of transcription-factor binding and chromatin modification. Using two deeply sequenced data sets for human RNA polymerase II and STAT1, each with matching input-DNA controls, we describe a general scoring approach to address unique challenges in ChIP-seq data analysis. Our approach is based on the observation that sites of potential binding are strongly correlated with signal peaks in the control, likely revealing features of open chromatin. We develop a two-pass strategy called PeakSeq to compensate for signal caused by open chromatin, as revealed by inclusion of the controls. The first pass identifies putative binding sites and compensates for genomic variation in the 'mappability' of sequences. The second pass filters out sites not significantly enriched compared to the normalized control, computing precise enrichments and significances. Our scoring procedure enables us to optimize experimental design by estimating the depth of sequencing required for a desired level of coverage and demonstrating that more than two replicates provide only a marginal gain in information.
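The two-pass logic described above can be sketched as follows. This is a toy illustration, not the published PeakSeq code: it assumes reads have already been counted in fixed-size windows, uses a Poisson background scaled by each window's mappable fraction for pass 1, scales the control to the ChIP library by a least-squares slope over windows not called in pass 1, and applies a binomial test against the scaled control in pass 2. The function names and the alpha cutoff are hypothetical.

```python
import math
from typing import List, Tuple


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i  # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def binom_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p); exact sum, fine for small n."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def two_pass_peaks(chip: List[int], control: List[int], mappable: List[float],
                   alpha: float = 1e-3) -> Tuple[List[int], float]:
    """Toy two-pass scoring over fixed genomic windows (hypothetical simplification).

    chip, control: read counts per window; mappable: uniquely mappable
    fraction of each window (0..1). Returns (peak window indices, control scale).
    """
    total_chip = sum(chip)
    total_mappable = sum(mappable)
    # Pass 1: background expectation per window is proportional to its
    # mappable fraction; keep windows with a significant excess of ChIP reads.
    candidates = []
    for i, count in enumerate(chip):
        lam = total_chip * mappable[i] / total_mappable
        if lam > 0 and poisson_sf(count, lam) < alpha:
            candidates.append(i)
    # Scale the control to the ChIP library using windows not called in
    # pass 1 (least-squares slope through the origin).
    candidate_set = set(candidates)
    background = [i for i in range(len(chip)) if i not in candidate_set]
    num = sum(chip[i] * control[i] for i in background)
    den = sum(control[i] ** 2 for i in background)
    scale = num / den if den else 1.0
    # Pass 2: with no enrichment, a read in this window comes from ChIP
    # with probability scale / (1 + scale); keep windows that beat that.
    p_null = scale / (1.0 + scale)
    peaks = [i for i in candidates
             if binom_sf(chip[i], chip[i] + control[i], p_null) < alpha]
    return peaks, scale
```

With ChIP counts [5, 6, 4, 5, 120, 5, 60, 6], control counts [5, 6, 4, 5, 5, 5, 60, 6] and full mappability, pass 1 flags windows 4 and 6, but only window 4 survives pass 2, because window 6 is equally high in the control (as an open-chromatin region might be).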

607 citations


Journal Article
TL;DR: The predicted MKK-MPK phosphorylation network constitutes a valuable resource to understand the function and specificity of MPK signaling systems.
Abstract: Signaling through mitogen-activated protein kinase (MPK) cascades is a complex and fundamental process in eukaryotes, requiring MPK-activating kinases (MKKs) and MKK-activating kinases (MKKKs). However, to date only a limited number of MKK–MPK interactions and MPK phosphorylation substrates have been revealed. We determined which Arabidopsis thaliana MKKs preferentially activate 10 different MPKs in vivo and used the activated MPKs to probe high-density protein microarrays to determine their phosphorylation targets. Our analyses revealed known and novel signaling modules encompassing 570 MPK phosphorylation substrates; these substrates were enriched in transcription factors involved in the regulation of development, defense, and stress responses. Selected MPK substrates were validated by in planta reconstitution experiments. A subset of activated and wild-type MKKs induced cell death, indicating a possible role for these MKKs in the regulation of cell death. Interestingly, MKK7- and MKK9-induced death requires Sgt1, a known regulator of cell death induced during plant innate immunity. Our predicted MKK–MPK phosphorylation network constitutes a valuable resource to understand the function and specificity of MPK signaling systems.

468 citations


Journal Article
TL;DR: The results argue against a genomic code for nucleosome positioning, and they suggest that the nucleosomal pattern in coding regions arises primarily from statistical positioning from a barrier near the promoter that involves some aspect of transcriptional initiation by RNA polymerase II.
Abstract: We assess the role of intrinsic histone-DNA interactions by mapping nucleosomes assembled in vitro on genomic DNA. Nucleosomes strongly prefer yeast DNA over Escherichia coli DNA, indicating that the yeast genome evolved to favor nucleosome formation. Many yeast promoter and terminator regions intrinsically disfavor nucleosome formation, and nucleosomes assembled in vitro show strong rotational positioning. Nucleosome arrays generated by the ACF assembly factor have fewer nucleosome-free regions, reduced rotational positioning and less translational positioning than obtained by intrinsic histone-DNA interactions. Notably, nucleosomes assembled in vitro have only a limited preference for specific translational positions and do not show the pattern observed in vivo. Our results argue against a genomic code for nucleosome positioning, and they suggest that the nucleosomal pattern in coding regions arises primarily from statistical positioning from a barrier near the promoter that involves some aspect of transcriptional initiation by RNA polymerase II.

381 citations


Journal Article
TL;DR: A high-resolution genetic map of DS phenotypes based on an analysis of 30 subjects carrying rare segmental trisomies of various regions of HSA21 is presented, demonstrating the value of combining advanced genomics with cohorts of rare patients for studying DS, a prototype for the role of copy-number variation in complex disease.
Abstract: Down syndrome (DS), or trisomy 21, is a common disorder associated with several complex clinical phenotypes. Although several hypotheses have been put forward, it is unclear whether particular gene loci on chromosome 21 (HSA21) are sufficient to cause DS and its associated features. Here we present a high-resolution genetic map of DS phenotypes based on an analysis of 30 subjects carrying rare segmental trisomies of various regions of HSA21. By using state-of-the-art genomics technologies we mapped segmental trisomies at exon-level resolution and identified discrete regions of 1.8-16.3 Mb likely to be involved in the development of 8 DS phenotypes, 4 of which are congenital malformations. These phenotypes are acute megakaryocytic leukemia, transient myeloproliferative disorder, Hirschsprung disease, duodenal stenosis, imperforate anus, severe mental retardation, DS-Alzheimer Disease, and DS-specific congenital heart disease (DSCHD). Our DS-phenotypic maps located DSCHD to a <2-Mb interval. Furthermore, the map enabled us to present evidence against the necessary involvement of other loci as well as specific hypotheses that have been put forward in relation to the etiology of DS, i.e., the presence of a single DS consensus region and the sufficiency of DSCR1 and DYRK1A, or APP, in causing several severe DS phenotypes. Our study demonstrates the value of combining advanced genomics with cohorts of rare patients for studying DS, a prototype for the role of copy-number variation in complex disease.
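The mapping logic can be illustrated with a deliberately simple sketch: if a phenotype is assumed to require trisomy of one common locus, the candidate region is the intersection of the trisomic segments of all affected subjects. The coordinates below are hypothetical, and the sketch ignores unaffected carriers and incomplete penetrance.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]  # hypothetical (start_mb, end_mb) on HSA21, half-open


def shared_region(segments: List[Interval]) -> Optional[Interval]:
    """Intersect the trisomic segments of all subjects showing a phenotype.

    Returns None when there are no subjects or their segments share no region.
    """
    if not segments:
        return None
    start = max(s for s, _ in segments)  # latest start among subjects
    end = min(e for _, e in segments)    # earliest end among subjects
    return (start, end) if start < end else None
```

For three hypothetical affected subjects with segments (30.0, 40.0), (35.5, 46.0) and (33.0, 37.2) Mb, the shared candidate interval is (35.5, 37.2), about 1.7 Mb.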

356 citations


Journal Article
12 Mar 2009-Blood
TL;DR: Findings suggest that HOTAIRM1 plays a role in myelopoiesis through modulation of gene expression in the HOXA cluster.

323 citations


Journal Article
TL;DR: Simulations with Paired-End Mapper (PEMer) demonstrated high structural variant reconstruction efficiency for its coverage-adjusted multi-cutoff scoring strategy and showed its relative insensitivity to base-calling errors.
Abstract: Personal-genomics endeavors, such as the 1000 Genomes project, are generating maps of genomic structural variants by analyzing ends of massively sequenced genome fragments. To process these we developed Paired-End Mapper (PEMer; http://sv.gersteinlab.org/pemer). This comprises an analysis pipeline, compatible with several next-generation sequencing platforms; simulation-based error models, yielding confidence values for each structural variant; and a back-end database. The simulations demonstrated high structural variant reconstruction efficiency for PEMer's coverage-adjusted multi-cutoff scoring strategy and showed its relative insensitivity to base-calling errors.
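A coverage-adjusted, multi-cutoff scoring scheme of the general kind named above might look like the sketch below. It is not PEMer's actual algorithm: it assumes paired ends are already mapped, flags pairs whose insert size exceeds the library mean by n_sd standard deviations (deletion candidates only), clusters them by position, and requires a number of supporting pairs that scales with the ratio of local to average coverage. All names, cutoffs, and the max_gap parameter are hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple

Pair = Tuple[int, int]  # (left-end position, observed insert size)


def _clusters(hits: List[Pair], max_gap: int) -> Iterator[List[Pair]]:
    """Group position-sorted pairs whose left ends lie within max_gap bp."""
    cluster: List[Pair] = []
    for pair in hits:
        if cluster and pair[0] - cluster[-1][0] > max_gap:
            yield cluster
            cluster = []
        cluster.append(pair)
    if cluster:
        yield cluster


def call_deletions(pairs: Sequence[Pair], mu: float, sd: float,
                   coverage_ratio: float,
                   cutoffs: Sequence[Tuple[float, int]] = ((2.0, 4), (3.0, 2)),
                   max_gap: int = 500) -> List[Tuple[float, int, int]]:
    """Return (n_sd, cluster_start, support) for clusters passing any cutoff.

    Each cutoff is (n_sd, base_support): a looser insert-size threshold is
    paired with a higher support requirement, and vice versa.
    """
    calls = []
    for n_sd, base_support in cutoffs:
        threshold = mu + n_sd * sd  # insert size that counts as "too long"
        hits = sorted(p for p in pairs if p[1] > threshold)
        # Deeper local coverage -> more supporting pairs required.
        min_support = max(1, round(base_support * coverage_ratio))
        for cluster in _clusters(hits, max_gap):
            if len(cluster) >= min_support:
                calls.append((n_sd, cluster[0][0], len(cluster)))
    return calls
```

Pairing a loose insert-size threshold with a high support requirement, and a strict threshold with a lower one, is one way to trade sensitivity against false positives; scaling the support by local coverage keeps deeply covered regions from producing spurious clusters.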

285 citations


Journal Article
TL;DR: The results demonstrate that Sono-Seq can be a useful and simple method by which to map many local alterations in chromatin structure and provide insights into the mapping of binding sites by using ChIP–Seq experiments and the value of reference samples that should be used in such experiments.
Abstract: Disruptions in local chromatin structure often indicate features of biological interest such as regulatory regions. We find that sonication of cross-linked chromatin, when combined with a size-selection step and massively parallel short-read sequencing, can be used as a method (Sono-Seq) to map locations of high chromatin accessibility in promoter regions. Sono-Seq sites frequently correspond to actively transcribed promoter regions, as evidenced by their co-association with RNA Polymerase II ChIP regions, transcription start sites, histone H3 lysine 4 trimethylation (H3K4me3) marks, and CpG islands; signals over other sites, such as those bound by the CTCF insulator, are also observed. The pattern of breakage by Sono-Seq overlaps with, but is distinct from, that observed for FAIRE and DNase I hypersensitive sites. Our results demonstrate that Sono-Seq can be a useful and simple method by which to map many local alterations in chromatin structure. Furthermore, our results provide insights into the mapping of binding sites by using ChIP-Seq experiments and the value of reference samples that should be used in such experiments.
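Statements such as "overlaps with, but is distinct from" rest on comparing sets of genomic intervals. A minimal sketch, assuming each site set is a list of half-open intervals on a single chromosome, computes the fraction of sites in one set that overlap any site in another. The brute-force loop is quadratic and is meant only for illustration; the function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def overlaps(x: Interval, y: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return x[0] < y[1] and y[0] < x[1]


def fraction_overlapping(a: List[Interval], b: List[Interval]) -> float:
    """Fraction of intervals in `a` that overlap at least one interval in `b`."""
    if not a:
        return 0.0
    hits = sum(1 for x in a if any(overlaps(x, y) for y in b))
    return hits / len(a)
```

For genome-scale site lists, a sorted sweep or an interval tree would replace the inner any() loop.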

219 citations


Journal Article
TL;DR: This work introduces an approach that employs correlation and regression to relate multiple, continuously varying factors defining an environment to the extent of particular microbial pathways present in a geographic site, and defines an ensemble of weighted pathways that maximally covaries with a combination of environmental variables (many-to-many).
Abstract: Recently, approaches have been developed to sample the genetic content of heterogeneous environments (metagenomics). However, by what means these sequences link distinct environmental conditions with specific biological processes is not well understood. Thus, a major challenge is how the usage of particular pathways and subnetworks reflects the adaptation of microbial communities across environments and habitats—i.e., how network dynamics relates to environmental features. Previous research has treated environments as discrete, somewhat simplified classes (e.g., terrestrial vs. marine), and searched for obvious metabolic differences among them (i.e., treating the analysis as a typical classification problem). However, environmental differences result from combinations of many factors, which often vary only slightly. Therefore, we introduce an approach that employs correlation and regression to relate multiple, continuously varying factors defining an environment to the extent of particular microbial pathways present in a geographic site. Moreover, rather than looking only at individual correlations (one-to-one), we adapted canonical correlation analysis and related techniques to define an ensemble of weighted pathways that maximally covaries with a combination of environmental variables (many-to-many), which we term a metabolic footprint. Applied to available aquatic datasets, we identified footprints predictive of their environment that can potentially be used as biosensors. For example, we show a strong multivariate correlation between the energy-conversion strategies of a community and multiple environmental gradients (e.g., temperature). Moreover, we identified covariation in amino acid transport and cofactor synthesis, suggesting that limiting amounts of cofactor can (partially) explain increased import of amino acids in nutrient-limited conditions.

Journal Article
TL;DR: A multiplex short-read DNA sequencing method is designed to perform efficient ChIP-Seq in yeast and other small-genome model organisms, producing accurate results with higher throughput and reduced cost.
Abstract: Short-read high-throughput DNA sequencing technologies provide new tools to answer biological questions. However, high cost and low throughput limit their widespread use, particularly in organisms with smaller genomes such as S. cerevisiae. Although ChIP-Seq in mammalian cell lines is replacing array-based ChIP-chip as the standard for transcription factor binding studies, ChIP-Seq in yeast is still underutilized compared to ChIP-chip. We developed a multiplex barcoding system that allows simultaneous sequencing and analysis of multiple samples using Illumina's platform. We applied this method to analyze the chromosomal distributions of three yeast DNA binding proteins (Ste12, Cse4 and RNA PolII) and a reference sample (input DNA) in a single experiment and demonstrate its utility for rapid and accurate results at reduced costs. We developed a barcoding ChIP-Seq method for the concurrent analysis of transcription factor binding sites in yeast. Our multiplex strategy generated high-quality data that were indistinguishable from data obtained with non-barcoded libraries. None of the barcoded adapters induced differences relative to a non-barcoded adapter when applied to the same DNA sample. We used this method to map the binding sites for Cse4, Ste12 and Pol II throughout the yeast genome and we found 148 binding targets for Cse4, 823 targets for Ste12 and 2508 targets for PolII. Cse4 was strongly bound to all yeast centromeres as expected and the remaining non-centromeric targets correspond to highly expressed genes in rich media. The presence of Cse4 non-centromeric binding sites was not reported previously. We designed a multiplex short-read DNA sequencing method to perform efficient ChIP-Seq in yeast and other small-genome model organisms. This method produces accurate results with higher throughput and reduced cost.
Given constant improvements in high-throughput sequencing technologies, increasing multiplexing will be possible to further decrease costs per sample and to accelerate the completion of large consortium projects such as modENCODE.
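Barcoded multiplexing depends on sorting pooled reads back to their samples. The sketch below assumes, hypothetically, that every read starts with a fixed-length barcode, that barcodes differ from each other at several positions, and that one mismatch is tolerated. The barcode sequences and sample labels are made up for illustration and are not the adapters used in the study.

```python
from typing import Dict, Optional, Tuple


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_read(read: str, barcodes: Dict[str, str],
                max_mismatch: int = 1) -> Tuple[Optional[str], str]:
    """Return (sample, insert) for a read whose first bases are its barcode.

    Assumes all barcodes share one length; reads that are too short, match no
    barcode, or match more than one come back as (None, read).
    """
    k = len(next(iter(barcodes)))
    if len(read) < k:
        return None, read
    prefix, insert = read[:k], read[k:]
    matches = [sample for bc, sample in barcodes.items()
               if hamming(prefix, bc) <= max_mismatch]
    if len(matches) == 1:
        return matches[0], insert  # barcode stripped before mapping
    return None, read
```

With the hypothetical barcodes {"ACGT": "Ste12", "TGCA": "Cse4", "GATC": "PolII", "CTAG": "input"}, which differ pairwise at all four positions, a read starting "ACCT" is still assigned unambiguously to Ste12 despite one sequencing error.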

Journal Article
20 Aug 2009-PLOS ONE
TL;DR: Chromatin structure affects laboratory DNA manipulation in ways that distort the results of chromatin immunoprecipitation experiments; this bias is pervasive genome-wide and could be used to detect differences in chromatin structure across the genome.
Abstract: Chromatin has an impact on recombination, repair, replication, and evolution of DNA. Here we report that chromatin structure also affects laboratory DNA manipulation in ways that distort the results of chromatin immunoprecipitation (ChIP) experiments. We initially discovered this effect at the Saccharomyces cerevisiae HMR locus, where we found that silenced chromatin was refractory to shearing, relative to euchromatin. Using input samples from ChIP-Seq studies, we detected a similar bias throughout the heterochromatic portions of the yeast genome. We also observed significant chromatin-related effects at telomeres, protein binding sites, and genes, reflected in the variation of input-Seq coverage. Experimental tests of candidate regions showed that chromatin influenced shearing at some loci, and that chromatin could also lead to enriched or depleted DNA levels in prepared samples, independently of shearing effects. Our results suggested that assays relying on immunoprecipitation of chromatin will be biased by intrinsic differences between regions packaged into different chromatin structures - biases which have been largely ignored to date. These results established the pervasiveness of this bias genome-wide, and suggested that this bias can be used to detect differences in chromatin structures across the genome.

Journal Article
TL;DR: Analyzing the temporal order of binding of several key factors involved in the salt response of yeast to their target genes reveals a complex dynamic and hierarchical circuit in which specific combinations of transcription factors target distinct sets of genes at discrete times to coordinate a rapid and important biological response.
Abstract: Complex biological processes are often regulated, at least in part, by the binding of transcription factors to their targets. Recently, considerable effort has been made to analyze the binding of relevant factors to the suite of targets they regulate, thereby generating a regulatory circuit map. However, for most studies the dynamics of binding have not been analyzed, and thus the temporal order of events and mechanisms by which this occurs are poorly understood. We globally analyzed in detail the temporal order of binding of several key factors involved in the salt response of yeast to their target genes. Analysis of Yap4 and Sko1 binding to their target genes revealed multiple temporal classes of binding patterns: (1) constant binding, (2) rapid induction, (3) slow induction, and (4) transient induction. These results demonstrate that individual transcription factors can have multiple binding patterns and help define the different types of temporal binding patterns used in eukaryotic gene regulation. To investigate these binding patterns further, we also analyzed the binding of seven other key transcription factors implicated in osmotic regulation, including Hot1, Msn1, Msn2, Msn4, Skn7, and Yap6, and found significant coassociation among the different factors at their gene targets. Moreover, the binding of several key factors was correlated with distinct classes of Yap4- and Sko1-binding patterns and with distinct types of genes. Gene expression studies revealed association of Yap4, Sko1, and other transcription factor-binding patterns with different gene expression patterns. The integration and analysis of binding and expression information reveals a complex dynamic and hierarchical circuit in which specific combinations of transcription factors target distinct sets of genes at discrete times to coordinate a rapid and important biological response.
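The four temporal classes can be made concrete with a simple rule-based classifier. The fold-enrichment threshold, the time-point layout (index 0 is before salt treatment), and the extra "unbound" and "other" labels are assumptions for illustration, not the criteria used in the study.

```python
from typing import Sequence


def classify_binding(profile: Sequence[float], threshold: float = 2.0) -> str:
    """Assign a binding time course to a temporal class (toy rules).

    profile[0] is enrichment before salt treatment; later entries follow in time.
    """
    bound = [value >= threshold for value in profile]
    if all(bound):
        return "constant"
    if not any(bound):
        return "unbound"
    if bound[0]:
        return "other"  # bound before treatment but lost later
    first = bound.index(True)  # first time point with binding (>= 1 here)
    if not all(bound[first:]):
        return "transient induction"  # binding appears, then is lost
    return "rapid induction" if first == 1 else "slow induction"
```

Under these rules, a target bound only at the first post-treatment time point and lost afterwards is "transient induction", while one first bound at a later time point and kept is "slow induction".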

Journal Article
TL;DR: This study performs genome‐wide profiling of 49 primary prostate cancers and identifies 20 recurrent chromosomal copy number aberrations, mainly occurring as genomic losses, and demonstrates that high‐resolution tiling arrays can be used to pinpoint breakpoints leading to fusion events.
Abstract: Emerging molecular and clinical data suggest that ETS fusion prostate cancer represents a distinct molecular subclass, driven most commonly by a hormonally regulated promoter and characterized by an aggressive natural history. The study of the genomic landscape of prostate cancer in the light of ETS fusion events is required to understand the foundation of this molecularly and clinically distinct subtype. We performed genome-wide profiling of 49 primary prostate cancers and identified 20 recurrent chromosomal copy number aberrations, mainly occurring as genomic losses. Co-occurring events included losses at 19q13.32 and 1p22.1. We discovered three genomic events associated with ERG rearranged prostate cancer, affecting 6q, 7q, and 16q. 6q loss in nonrearranged prostate cancer is accompanied by gene expression deregulation in an independent dataset and by protein deregulation of MYO6. To analyze copy number alterations within the ETS genes, we performed a comprehensive analysis of all 27 ETS genes and of the 3 Mbp genomic area between ERG and TMPRSS2 (21q) with an unprecedented resolution (30 bp). We demonstrate that high-resolution tiling arrays can be used to pinpoint breakpoints leading to fusion events. This study provides further support to define a distinct molecular subtype of prostate cancer based on the presence of ETS gene rearrangements. © 2009 Wiley-Liss, Inc.

Journal Article
TL;DR: This work has examined the effect of EBNA1 on cellular gene expression by microarray analysis using the B cell BJAB and the epithelial 293 cell lines transfected with EBNA1 and correlated EBNA1-bound promoters with changes in gene expression.
Abstract: Epstein-Barr virus (EBV) is associated with several types of lymphomas and epithelial tumors including Burkitt's lymphoma (BL), HIV-associated lymphoma, posttransplant lymphoproliferative disorder, and nasopharyngeal carcinoma. EBV nuclear antigen 1 (EBNA1) is expressed in all EBV associated tumors and is required for latency and transformation. EBNA1 initiates latent viral replication in B cells, maintains the viral genome copy number, and regulates transcription of other EBV-encoded latent genes. These activities are mediated through the ability of EBNA1 to bind viral DNA. To further elucidate the role of EBNA1 in the host cell, we have examined the effect of EBNA1 on cellular gene expression by microarray analysis using the B cell BJAB and the epithelial 293 cell lines transfected with EBNA1. Analysis of the data revealed distinct profiles of cellular gene changes in BJAB and 293 cell lines. Subsequently, chromatin immunoprecipitation revealed a direct binding of EBNA1 to cellular promoters. We have correlated EBNA1-bound promoters with changes in gene expression. Sequence analysis of the 100 promoters most enriched revealed a DNA motif that differs from the EBNA1 binding site in the EBV genome.

Journal Article
TL;DR: Overall, the studies greatly extend the understanding of protein glycosylation in eukaryotes through the cataloguing of glycoproteins, and describe a novel role for protein glycosylation in mitochondrial protein function and localization.
Abstract: To further understand the roles of protein glycosylation in eukaryotes, we globally identified glycan-containing proteins in yeast. A fluorescent lectin binding assay was developed and used to screen protein microarrays containing over 5000 proteins purified from yeast. A total of 534 yeast proteins were identified that bound either Concanavalin A (ConA) or Wheat-Germ Agglutinin (WGA); 406 of them were novel. Among the novel glycoproteins, 45 were validated by mobility shift upon treatment with EndoH and PNGase F, thereby extending the number of validated yeast glycoproteins to 350. In addition to many components of the secretory pathway, we identified other types of proteins, such as transcription factors and mitochondrial proteins. To further explore the role of glycosylation in mitochondrial function, the localization of four mitochondrial proteins was examined in the presence and absence of tunicamycin, an inhibitor of N-linked protein glycosylation. For two proteins, localization to the mitochondria is diminished upon tunicamycin treatment, indicating that protein glycosylation is important for protein function. Overall, our studies greatly extend our understanding of protein glycosylation in eukaryotes through the cataloguing of glycoproteins, and describe a novel role for protein glycosylation in mitochondrial protein function and localization.

Journal Article
TL;DR: Various approaches are explored to develop a framework of data release and sharing principles that will most effectively fulfill the needs of the funding agencies and the research community.
Abstract: Policies supporting the rapid and open sharing of genomic data have directly fueled the accelerated pace of discovery in large-scale genomics research. The proteomics community is starting to implement analogous policies and infrastructure for making large-scale proteomics data widely available on a precompetitive basis. On August 14, 2008, the National Cancer Institute (NCI) convened the "International Summit on Proteomics Data Release and Sharing Policy" in Amsterdam, The Netherlands, to identify and address potential roadblocks to rapid and open access to data. The six principles agreed upon by key stakeholders at the summit addressed issues surrounding (1) timing, (2) comprehensiveness, (3) format, (4) deposition to repositories, (5) quality metrics, and (6) responsibility for proteomics data release. This summit report explores various approaches to develop a framework of data release and sharing principles that will most effectively fulfill the needs of the funding agencies and the research community.

Journal Article
TL;DR: Results reveal a number of transcription factors potentially important for mammalian neuronal differentiation, and indicate that Oct-2 may serve as a binary switch to repress differentiation in precursor cells and induce neuronal differentiation later during neuronal development.
Abstract: Neuronal differentiation is a complex process that involves a plethora of regulatory steps. To identify transcription factors that influence neuronal differentiation we developed a high throughput screen using embryonic stem (ES) cells. Seven hundred human transcription factor clones were stably introduced into mouse ES (mES) cells and screened for their ability to induce neuronal differentiation of mES cells. Twenty-four factors that are capable of inducing neuronal differentiation were identified, including four known effectors of neuronal differentiation, 11 factors with limited evidence of involvement in regulating neuronal differentiation, and nine novel factors. One transcription factor, Oct-2, was studied in detail and found to be a bifunctional regulator: It can either repress or induce neuronal differentiation, depending on the particular isoform. Ectopic expression experiments demonstrate that isoform Oct-2.4 represses neuronal differentiation, whereas Oct-2.2 activates neuron formation. Consistent with a role in neuronal differentiation, Oct-2.2 expression is induced during differentiation, and cells depleted of Oct-2 and its homolog Oct-1 have a reduced capacity to differentiate into neurons. Our results reveal a number of transcription factors potentially important for mammalian neuronal differentiation, and indicate that Oct-2 may serve as a binary switch to repress differentiation in precursor cells and induce neuronal differentiation later during neuronal development.

Journal Article
TL;DR: Large scale interaction, transcription factor binding and phosphorylation data have enabled the elucidation of global regulatory networks and helped provide an understanding of cellular pathways and processes at a global and systems level.

Journal Article
TL;DR: This approach enables the rapid determination of kinase–substrate relationships on a proteome-wide scale, and although developed using yeast, has since been adapted to higher eukaryotic systems.
Abstract: Herein, we describe a protocol for the global identification of in vitro substrates targeted by protein kinases using protein microarray technology. Large numbers of fusion proteins tagged at their carboxy-termini are purified in 96-well format and spotted in duplicate onto amino-silane-coated slides in a spatially addressable manner. These arrays are incubated in the presence of purified kinase and radiolabeled ATP, and then washed, dried and analyzed by autoradiography. The extent of phosphorylation of each spot is quantified and normalized, and proteins that are reproducibly phosphorylated in the presence of the active kinase relative to control slides are scored as positive substrates. This approach enables the rapid determination of kinase-substrate relationships on a proteome-wide scale, and although developed using yeast, has since been adapted to higher eukaryotic systems. Expression, purification and printing of the yeast proteome require about 3 weeks. Afterwards, each kinase assay takes approximately 3 h to perform.
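The quantify, normalize, and score step described above might be sketched as follows. The details are assumptions, not taken from the protocol: each slide is normalized by its median spot signal (assumed positive), and a protein is scored positive only if both duplicate spots on the kinase slide exceed a hypothetical 3-fold cutoff over the stronger duplicate on the control slide.

```python
from statistics import median
from typing import Dict, List, Tuple

Spots = Dict[str, Tuple[float, float]]  # protein -> (duplicate spot 1, duplicate spot 2)


def normalize(slide: Spots) -> Spots:
    """Divide every spot by the slide's median signal (assumed > 0)."""
    m = median([v for pair in slide.values() for v in pair])
    return {p: (a / m, b / m) for p, (a, b) in slide.items()}


def score_substrates(kinase: Spots, control: Spots, fold: float = 3.0) -> List[str]:
    """Proteins whose duplicate spots both exceed `fold` x the control signal."""
    k, c = normalize(kinase), normalize(control)
    hits = []
    for protein in sorted(k):
        if protein not in c:
            continue
        k_low = min(k[protein])   # weaker duplicate on the kinase slide
        c_high = max(c[protein])  # stronger duplicate on the control slide
        if k_low >= fold * c_high:
            hits.append(protein)
    return hits
```

Requiring the weaker kinase duplicate to clear the stronger control duplicate is a conservative stand-in for "reproducibly phosphorylated": a protein with one bright spot and one background-level spot is not scored.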

Journal Article
TL;DR: 500 million sequences correlated with 500 million phenotypes could both reveal small genetic contributions and help identify potential causative mutations.
Abstract: Mol Syst Biol. 5: 273. With the cost of DNA sequencing decreasing rapidly, it is likely that the genome sequences of many individuals will be determined. In fact, if half of the individuals in industrialized countries choose to have their genomes sequenced, then well over 500 million personal genome sequences will be determined. Currently, such genetic information is likely to be of limited value to the individual, as the number of loci that provide useful predictive information is quite small (probably less than 200). Indeed, recent analyses of common complex traits such as diabetes, body mass and height show that in each case the genetically identifiable contribution from multiple candidate loci (18 in the case of diabetes) is only a small percentage (less than 7%) of the total identifiable genetic load (Gaulton et al, 2008; Willer et al, 2009); thus, the interpretable genetic contributions that can be identified are quite minor. Presumably, either many low‐frequency alleles at different loci contribute to the genetic load, or many phenotypes arise from other phenomena such as synergistic effects between variants at more than one locus or between different loci and factors in the environment, recurrent spontaneous mutations, or epigenetic defects. Regardless of which proves to be correct (likely a differing mixture of effects for different diseases), the ability to accurately correlate all bases with precise phenotypes is likely to be powerful only if a common set of phenotypes is scored. The power of 500 million sequences correlated with 500 million phenotypes can both reveal small contributions and help identify potential causative mutations. Indeed, a data set of this size would greatly exceed that of even the large genome‐wide association studies that typically analyze thousands of individuals to tens of thousands …

Journal Article
Lu Yong Wang, Alexej Abyzov, Jan O. Korbel, Michael Snyder, Mark Gerstein
TL;DR: This paper proposes a nonparametric method that achieves robust discontinuity-preserving smoothing, thus accurately segmenting chromosomes into regions of duplication and deletion, and that can be extended to segmenting the signal resulting from the depth-of-coverage of mapped reads from next-generation sequencing.
Abstract: Genome structural variation includes segmental duplications, deletions, and other rearrangements, and array-based comparative genomic hybridization (array-CGH) is a popular technology for determining this. Drawing relevant conclusions from array-CGH requires computational methods for partitioning the chromosome into segments of elevated, reduced, or unchanged copy number. Several approaches have been described, most of which attempt to explicitly model the underlying distribution of data based on particular assumptions. Often, they optimize likelihood functions for estimating model parameters, by expectation maximization or related approaches; however, this requires good parameter initialization through prespecifying the number of segments. Moreover, convergence is difficult to achieve, since many parameters are required to characterize an experiment. To overcome these limitations, we propose a nonparametric method without a global criterion to be optimized. Our method involves mean-shift-based (MSB) procedures; it considers the observed array-CGH signal as sampling from a probability density function, uses a kernel-based approach to estimate local gradients for this function, and iteratively follows them to determine local modes of the signal. Overall, our method achieves robust discontinuity-preserving smoothing, thus accurately segmenting chromosomes into regions of duplication and deletion. It does not require the number of segments as input, nor does its convergence depend on this. We successfully applied our method to both simulated data and array-CGH experiments on glioblastoma and adenocarcinoma. We show that it performs at least as well as, and often better than, 10 previously published algorithms. Finally, we show that our approach can be extended to segmenting the signal resulting from the depth-of-coverage of mapped reads from next-generation sequencing.
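A generic mean-shift filter in the spirit of the method described above is sketched below. It is a simplification, not the published MSB implementation: each probe's log-ratio is moved uphill on a Gaussian kernel density estimated from probes within hs positions (range bandwidth hr) until it converges to a local mode, and segment boundaries are placed where neighboring converged values jump by more than a hypothetical min_jump. The function names and default bandwidths are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def mean_shift_smooth(signal: Sequence[float], hs: int = 5, hr: float = 0.3,
                      max_iter: int = 50, tol: float = 1e-6) -> List[float]:
    """Move each probe's value to a local mode of a windowed Gaussian KDE."""
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - hs): i + hs + 1]  # spatial neighborhood
        y = signal[i]
        for _ in range(max_iter):
            # Range kernel: probes with very different values get ~0 weight,
            # which keeps sharp copy-number steps from being blurred.
            w = [math.exp(-0.5 * ((y - v) / hr) ** 2) for v in window]
            y_new = sum(wi * v for wi, v in zip(w, window)) / sum(w)
            converged = abs(y_new - y) < tol
            y = y_new
            if converged:
                break
        out.append(y)
    return out


def segments(smoothed: Sequence[float],
             min_jump: float = 0.2) -> List[Tuple[int, int, float]]:
    """Split where neighboring smoothed values jump; return (first, last, mean)."""
    segs, start = [], 0
    for i in range(1, len(smoothed)):
        if abs(smoothed[i] - smoothed[i - 1]) > min_jump:
            segs.append((start, i - 1, sum(smoothed[start:i]) / (i - start)))
            start = i
    tail = smoothed[start:]
    segs.append((start, len(smoothed) - 1, sum(tail) / len(tail)))
    return segs
```

On a step signal of ten probes at 0.0 followed by ten at 1.0, the probes adjacent to the step shift only by a few thousandths, so the boundary between probes 9 and 10 is preserved and two segments are returned. Note that, as in the text, no number of segments is supplied in advance.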

Journal ArticleDOI
19 Jan 2009-PLOS ONE
TL;DR: The novel finding that WNT16, ST14 and Pcsk1 protein levels increase in fetuses with CHDs suggests that these proteins may play a role in the etiology of human CHDs.
Abstract: BACKGROUND: Cardiovascular development is vital for embryonic survival and growth. Early gestation embryo loss or malformation has been linked to yolk sac vasculopathy and congenital heart defects ...

Journal ArticleDOI
TL;DR: A simulation toolbox is built to help optimally combine different technologies for genome re-sequencing, especially for reconstructing large structural variants (SVs); semi-realistic simulations show how these technologies can be combined to solve the assembly optimally at low cost.
Abstract: The goal of human genome re-sequencing is obtaining an accurate assembly of an individual's genome. Recently, there has been great excitement in the development of many technologies for this (e.g. medium and short read sequencing from companies such as 454 and SOLiD, and high-density oligo-arrays from Affymetrix and NimbleGen), with even more expected to appear. The costs and sensitivities of these technologies differ considerably from each other. As an important goal of personal genomics is to reduce the cost of re-sequencing to an affordable point, it is worthwhile to consider optimally integrating technologies. Here, we build a simulation toolbox that will help us optimally combine different technologies for genome re-sequencing, especially in reconstructing large structural variants (SVs). SV reconstruction is considered the most challenging step in human genome re-sequencing. (It is sometimes even harder than de novo assembly of small genomes because of the duplications and repetitive sequences in the human genome.) To this end, we formulate canonical problems that are representative of issues in reconstruction and are of small enough scale to be computationally tractable and simulatable. Using semi-realistic simulations, we show how we can combine different technologies to optimally solve the assembly at low cost. With mappability maps, our simulations efficiently handle the inhomogeneous repeat-containing structure of the human genome and the computational complexity of practical assembly algorithms. They quantitatively show how combining different read lengths is more cost-effective than using one length, how an optimal mixed sequencing strategy for reconstructing large novel SVs usually also gives accurate detection of SNPs/indels, how paired-end reads can improve reconstruction efficiency, and how adding in arrays is more efficient than just sequencing for disentangling some complex SVs.
Our strategy should facilitate the sequencing of human genomes at maximum accuracy and low cost.
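The abstract does not define its mappability maps, so the following is only a sketch of one common definition, assumed here for illustration: the mappability of a position for read length k is 1 divided by the number of times the k-mer starting there occurs in the genome (forward strand only, no mismatches, no reverse complements). The helper names `mappability` and `unique_fraction` are hypothetical. The toy genome shows how a longer read length can turn repeated, ambiguous positions into uniquely mappable ones.

```python
from collections import Counter
from typing import List


def mappability(genome: str, k: int) -> List[float]:
    """Return 1 / (occurrences of the k-mer starting at each position).

    Assumption: exact matches on the forward strand only.
    """
    kmers = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    counts = Counter(kmers)
    return [1.0 / counts[km] for km in kmers]


def unique_fraction(genome: str, k: int) -> float:
    """Fraction of start positions whose k-mer occurs exactly once."""
    track = mappability(genome, k)
    if not track:  # read length longer than the genome
        return 0.0
    return sum(1 for m in track if m == 1.0) / len(track)


if __name__ == "__main__":
    toy = "ACGTACGTTT"  # made-up sequence containing a short repeat
    for k in (4, 5):
        print(k, mappability(toy, k), unique_fraction(toy, k))
```

With k = 4 the repeated `ACGT` makes two positions ambiguous (mappability 0.5), while with k = 5 every position is unique; this is the kind of read-length trade-off such a map lets a simulation account for cheaply.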

Journal ArticleDOI
TL;DR: A MAPK phosphorylation network, analyzed by integrating phosphorylation and gene expression information to identify biologically relevant signaling modules, brings a new perspective on MAPK signaling by revealing new relationships between components of signaling pathways.
Abstract: MAP kinase (MAPK) signal transduction cascades are conserved eukaryotic pathways that modulate stress responses and developmental processes. In a recent report we have identified novel Arabidopsis MAPKK/MAPK/Substrate signaling pathways using microarrays containing 2,158 unique Arabidopsis proteins. Subsequently, several WRKY and TGA targets phosphorylated by MAPKs were verified in planta. We have also reported that specific MAPKK/MAPK modules expressed in Nicotiana benthamiana induced a cell death phenotype related to the immune response. We have generated a MAPK phosphorylation network based on our protein microarray experimental data. Here we further analyze our network by integrating phosphorylation and gene expression information to identify biologically relevant signaling modules. We have identified 108 phosphorylation events that occur among 96 annotated genes with highly similar pairwise expression profiles. Our analysis brings a new perspective on MAPK signaling by revealing new relationships between components of signaling pathways.
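The integration step described here, keeping phosphorylation events between genes with highly similar pairwise expression profiles, can be sketched as a correlation filter over network edges. The sketch assumes Pearson correlation and an arbitrary cutoff of 0.8; the abstract does not state the similarity measure or threshold actually used. The function names and the gene identifiers in the example are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    if sa == 0 or sb == 0:
        return 0.0
    return cov / (sa * sb)


def coexpressed_edges(edges: List[Tuple[str, str]],
                      expression: Dict[str, List[float]],
                      threshold: float = 0.8) -> List[Tuple[str, str, float]]:
    """Keep kinase -> substrate edges whose expression profiles correlate >= threshold.

    Edges involving genes without expression data are dropped.
    """
    kept = []
    for kinase, substrate in edges:
        if kinase in expression and substrate in expression:
            r = pearson(expression[kinase], expression[substrate])
            if r >= threshold:
                kept.append((kinase, substrate, r))
    return kept


if __name__ == "__main__":
    # Made-up expression profiles across four conditions.
    expr = {
        "MAPK_A": [1.0, 2.0, 3.0, 4.0],
        "WRKY_X": [2.0, 4.0, 6.0, 8.0],  # co-expressed with MAPK_A
        "TGA_Y": [4.0, 3.0, 2.0, 1.0],   # anti-correlated with MAPK_A
    }
    phospho = [("MAPK_A", "WRKY_X"), ("MAPK_A", "TGA_Y"), ("MAPK_B", "WRKY_X")]
    print(coexpressed_edges(phospho, expr))
```

Only the `MAPK_A` to `WRKY_X` edge survives here; the anti-correlated pair and the kinase lacking expression data are filtered out.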

01 Jan 2009
TL;DR: This dissertation aims to provide a history of quantitative anaesthetism from 1989 to 2002, a period chosen in order to explore its roots as well as specific cases up to and including the year in which descriptions of “black-box surgery” began to circulate.