Conference

International Conference on Bioinformatics 

About: International Conference on Bioinformatics is an academic conference. The conference publishes mainly in the areas of cloud computing and population. Over its lifetime, the conference has published 5,299 papers, which have received 26,414 citations.
Papers

Journal Article
01 Jul 1999
TL;DR: A greedy algorithm for determining alignments of functionally related sequences is described, the accuracy of the P value calculations is tested, and an example is given of using the algorithm to identify binding sites for the Escherichia coli CRP protein.
Abstract: Motivation: Molecular biologists frequently can obtain interesting insight by aligning a set of related DNA, RNA or protein sequences. Such alignments can be used to determine either evolutionary or functional relationships. Our interest is in identifying functional relationships. Unless the sequences are very similar, it is necessary to have a specific strategy for measuring (or scoring) the relatedness of the aligned sequences. If the alignment is not known, one can be determined by finding an alignment that optimizes the scoring scheme. Results: We describe four components of our approach for determining alignments of multiple sequences. First, we review a log-likelihood scoring scheme we call information content. Second, we describe two methods for estimating the P value of an individual information content score: (i) a method that combines a technique from large-deviation statistics with numerical calculations; (ii) a method that is exclusively numerical. Third, we describe how we count the number of possible alignments given the overall amount of sequence data. This count is multiplied by the P value to determine the expected frequency of an information content score and, thus, the statistical significance of the corresponding alignment. Statistical significance can be used to compare alignments having differing widths and containing differing numbers of sequences. Fourth, we describe a greedy algorithm for determining alignments of functionally related sequences. Finally, we test the accuracy of our P value calculations and give an example of using our algorithm to identify binding sites for the Escherichia coli CRP protein.

1,284 citations
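
The scoring, significance, and search steps in the abstract above can be sketched in a few lines of Python. This is a minimal illustration under assumptions that are ours, not the authors': information content is computed as the sum over columns i and letters b of f_ib ln(f_ib / p_b), with a uniform DNA background by default; the P value itself (the large-deviation and numerical methods) is not reproduced and is simply passed in; the alignment count is simplified to the product of possible start positions, one word per sequence; and `greedy_align`, with its hypothetical `keep` parameter, is a simplified greedy search that retains the best few partial alignments, not the published algorithm.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def information_content(sites, background=None):
    """Log-likelihood score of an ungapped alignment, in nats:
    sum over columns i and letters b of f_ib * ln(f_ib / p_b)."""
    if background is None:
        # Assumption: uniform background letter probabilities p_b.
        background = {b: 1.0 / len(ALPHABET) for b in ALPHABET}
    n = len(sites)
    score = 0.0
    for i in range(len(sites[0])):
        column = Counter(site[i] for site in sites)
        for letter, count in column.items():
            f = count / n
            score += f * math.log(f / background[letter])
    return score


def expected_frequency(p_value, seq_lengths, width):
    """(Number of possible alignments) * P value.
    Simplification: one word of `width` per sequence, so the count is the
    product of the numbers of possible start positions."""
    n_alignments = 1
    for length in seq_lengths:
        n_alignments *= length - width + 1
    return n_alignments * p_value


def greedy_align(sequences, width, keep=5):
    """Seed with every word of the first sequence, then add one sequence at a
    time, keeping only the `keep` highest-scoring partial alignments."""
    first = sequences[0]
    beams = [[first[j:j + width]] for j in range(len(first) - width + 1)]
    for seq in sequences[1:]:
        candidates = []
        for sites in beams:
            for j in range(len(seq) - width + 1):
                extended = sites + [seq[j:j + width]]
                candidates.append((information_content(extended), extended))
        candidates.sort(key=lambda pair: pair[0], reverse=True)
        beams = [sites for _, sites in candidates[:keep]]
    return beams[0]


seqs = ["GGGGTATAATGGGG", "CCTATAATCCCC", "AATATAATAA"]
best = greedy_align(seqs, width=6)
print(best)  # ['TATAAT', 'TATAAT', 'TATAAT']
print(round(information_content(best), 3))  # 8.318 (= 6 ln 4)
print(expected_frequency(1e-6, [len(s) for s in seqs], 6))
```

Keeping several partial alignments instead of only the single best one makes the greedy search less likely to commit early to a poor seed, at the cost of `keep` times more scoring work per added sequence.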


Proceedings Article
22 Sep 2013
TL;DR: An integrative method is developed to identify patterns from multiple experiments simultaneously, taking full advantage of high-resolution data to discover joint patterns across different assay types; it yields a model that elucidates the relationship between assay observations and functional elements in the genome.
Abstract: Sequence census methods like ChIP-seq now produce an unprecedented amount of genome-anchored data. We have developed an integrative method to identify patterns from multiple experiments simultaneously while taking full advantage of high-resolution data, discovering joint patterns across different assay types. We apply this method to ENCODE chromatin data for the human chronic myeloid leukemia cell line K562, including ChIP-seq data on covalent histone modifications and transcription factor binding, and DNase-seq and FAIRE-seq readouts of open chromatin. In an unsupervised fashion, we identify patterns associated with transcription start sites, gene ends, enhancers, CTCF elements, and repressed regions. The method yields a model which elucidates the relationship between assay observations and functional elements in the genome. This model identifies sequences likely to affect transcription, and we verify these predictions in laboratory experiments. We have made software and an integrative genome browser track freely available (noble.gs.washington.edu/proj/segway/).

486 citations
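
To make the idea of unsupervised, multi-track segmentation concrete, here is a deliberately simplified stand-in: Viterbi decoding of a hidden Markov model whose states emit one Gaussian value per assay track, with tracks treated as independent given the state. This is an assumption-laden sketch, not the model used by the software described in the abstract above; the two states, their means and variances, and the `stay` self-transition probability are hypothetical hand-set values, and nothing is learned from data as it would be in a real unsupervised fit.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a univariate Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def viterbi_segment(tracks, means, variances, stay=0.9):
    """Most likely state path for multi-track observations.

    tracks: one observation vector per genomic position (one value per assay).
    means, variances: per-state, per-track Gaussian parameters (hypothetical).
    stay: self-transition probability; the remainder is split evenly among
    the other states (requires at least two states).
    """
    k = len(means)
    log_stay = math.log(stay)
    log_move = math.log((1 - stay) / (k - 1))

    def emit(state, obs):
        # Diagonal covariance: sum independent per-track log densities.
        return sum(log_gauss(x, means[state][t], variances[state][t])
                   for t, x in enumerate(obs))

    def trans(prev, cur):
        return log_stay if prev == cur else log_move

    # Uniform initial state distribution.
    score = [math.log(1.0 / k) + emit(s, tracks[0]) for s in range(k)]
    backpointers = []
    for obs in tracks[1:]:
        new_score, pointers = [], []
        for s in range(k):
            best = max(range(k), key=lambda r: score[r] + trans(r, s))
            new_score.append(score[best] + trans(best, s) + emit(s, obs))
            pointers.append(best)
        score = new_score
        backpointers.append(pointers)

    # Trace the best path back from the highest-scoring final state.
    state = max(range(k), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Two hypothetical assay tracks; state 0 = low signal, state 1 = high signal.
signal = [[0.0, 0.1], [0.2, -0.1], [5.0, 4.8], [5.1, 5.2], [0.0, 0.0]]
labels = viterbi_segment(signal, means=[[0, 0], [5, 5]], variances=[[1, 1], [1, 1]])
print(labels)  # [0, 0, 1, 1, 0]
```

The self-transition probability acts as a smoothness prior: raising `stay` makes the decoder prefer longer segments, so isolated noisy positions are less likely to flip the label.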


Journal Article
01 Jul 1999
TL;DR: A comprehensive yeast-specific promoter database is developed that contains relevant binding affinity and expression data where available and provides some simple but useful tools for promoter sequence analysis.
Abstract: MOTIVATION: To facilitate a systematic, genome-scale study of the promoters and transcriptional regulatory cis-elements of the yeast Saccharomyces cerevisiae, we have developed a comprehensive yeast-specific promoter database, SCPD. RESULTS: Currently SCPD contains 580 experimentally mapped transcription factor (TF) binding sites and 425 transcriptional start sites (TSS) as its primary data entries. It also contains relevant binding affinity and expression data where available. In addition to mechanisms for retrieving promoter information (including sequence) and a data submission form, SCPD also provides some simple but useful tools for promoter sequence analysis. AVAILABILITY: SCPD can be accessed from the URL http://cgsigma.cshl.org/jian. The database is continually updated.

471 citations


Journal Article
01 Nov 1999
TL;DR: A family of novel architectures that can learn to make predictions based on variable ranges of dependencies is introduced; these architectures extend recurrent neural networks with non-causal bidirectional dynamics to capture both upstream and downstream information.
Abstract: Motivation: Predicting the secondary structure of a protein (alpha-helix, beta-sheet, coil) is an important step towards elucidating its three-dimensional structure, as well as its function. Presently, the best predictors are based on machine learning approaches, in particular neural network architectures with a fixed, and relatively short, input window of amino acids, centered at the prediction site. Although a fixed small window avoids overfitting problems, it does not permit capturing variable long-range information. Results: We introduce a family of novel architectures which can learn to make predictions based on variable ranges of dependencies. These architectures extend recurrent neural networks, introducing non-causal bidirectional dynamics to capture both upstream and downstream information. The prediction algorithm is completed by the use of mixtures of estimators that leverage evolutionary information, expressed in terms of multiple alignments, both at the input and output levels. While our system currently achieves an overall performance close to 76% correct prediction (at least comparable to the best existing systems), the main emphasis here is on the development of new algorithmic

463 citations
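
The bidirectional idea in the abstract above can be illustrated with an untrained forward pass in plain Python. This is a minimal sketch under loud assumptions, not the authors' architecture: residues are one-hot encoded, the weights are random (no training, no multiple-alignment profiles, no mixtures of estimators), and the output at each position depends only on that residue, the forward state summarizing upstream residues, and the backward state summarizing downstream residues. The names `one_hot` and `brnn_forward` and the hidden size are hypothetical.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
CLASSES = ("H", "E", "C")  # alpha-helix, beta-sheet (strand), coil


def one_hot(protein):
    """Encode each residue as a 20-dimensional indicator vector."""
    return [[1.0 if aa == r else 0.0 for aa in AMINO_ACIDS] for r in protein]


def brnn_forward(x_seq, hidden=4, seed=0):
    """Untrained bidirectional RNN returning per-position class probabilities."""
    rng = random.Random(seed)
    d = len(x_seq[0])

    def matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    w_f, u_f = matrix(hidden, d), matrix(hidden, hidden)  # forward (upstream) chain
    w_b, u_b = matrix(hidden, d), matrix(hidden, hidden)  # backward (downstream) chain
    v = matrix(len(CLASSES), d + 2 * hidden)              # output layer

    def step(w, u, x, h):
        return [math.tanh(sum(w[i][j] * x[j] for j in range(d)) +
                          sum(u[i][j] * h[j] for j in range(hidden)))
                for i in range(hidden)]

    n = len(x_seq)
    fwd, bwd = [None] * n, [None] * n
    h = [0.0] * hidden
    for t in range(n):               # left to right: summarizes residues 0..t
        h = step(w_f, u_f, x_seq[t], h)
        fwd[t] = h
    b = [0.0] * hidden
    for t in reversed(range(n)):     # right to left: summarizes residues t..n-1
        b = step(w_b, u_b, x_seq[t], b)
        bwd[t] = b

    probs = []
    for t in range(n):
        z = x_seq[t] + fwd[t] + bwd[t]  # concatenate input with both states
        logits = [sum(row[j] * z[j] for j in range(len(z))) for row in v]
        m = max(logits)                 # subtract the max for a stable softmax
        exps = [math.exp(logit - m) for logit in logits]
        total = sum(exps)
        probs.append([e / total for e in exps])
    return probs


out = brnn_forward(one_hot("MKTAYIAKQR"))
print(len(out), [round(p, 3) for p in out[0]])
```

Because the backward chain runs from the C-terminus, changing only the last residue still changes the prediction at the first position; a purely causal (forward-only) network could not do that.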


Proceedings Article
08 Jul 2009
Abstract: LETTERS. Variants in the CDKN2B and RTEL1 regions are associated with high-grade glioma susceptibility. Margaret Wrensch (1,2,12), Robert B. Jenkins (3,12), Jeffrey S. Chang (4,12), Ru-Fang Yeh (4,12), Yuanyuan Xiao (4), Paul A. Decker (5), Karla V. Ballman (5), Mitchel Berger (1), Jan C. Buckner (6), Susan Chang (1), Caterina Giannini (3), Chandralekha Halder (3), Thomas M. Kollmeyer (3), Matthew L. Kosel (5), Daniel H. LaChance (7), Lucie McCoy (1), Brian P. O’Neill (7), Joe Patoka (1), Alexander R. Pico (8), Michael Prados (1), Charles Quesenberry (9), Terri Rice (1), Amanda L. Rynearson (3), Ivan Smirnov (1), Tarik Tihan (10), Joe Wiemels (2,4), Ping Yang (11,13) & John K. Wiencke (1,2,13).

The causes of glioblastoma and other gliomas remain obscure (refs. 1,2). To discover new candidate genes influencing glioma susceptibility, we conducted a principal component–adjusted (ref. 3) genome-wide association study (GWAS) of 275,895 autosomal variants among 692 adult high-grade glioma cases (622 from the San Francisco Adult Glioma Study (AGS) and 70 from the Cancer Genome Atlas (TCGA; ref. 4)) and 3,992 controls (602 from AGS and 3,390 from Illumina iControlDB (iControls)). For replication, we analyzed the 13 SNPs with P < 10⁻⁶ using independent data from 176 high-grade glioma cases and 174 controls from the Mayo Clinic. On 9p21, rs1412829 near CDKN2B had discovery P = 3.4 × 10⁻⁸, replication P = 0.0038 and combined P = 1.85 × 10⁻¹⁰. On 20q13.3, rs6010620 intronic to RTEL1 had discovery P = 1.5 × 10⁻⁷, replication P = 0.00035 and combined P = 3.40 × 10⁻⁹. For both SNPs, the direction of association was the same in the discovery and replication phases.

Subject characteristics, including participation rates for the discovery GWAS and replication phases, are listed in Supplementary Table 1a,b. The distribution of P values from the principal component–adjusted logistic regression additive model across the genome for high-grade glioma cases versus controls (Fig. 1) suggests potentially meaningful associations for several SNPs on chromosomes 1, 5, 9, 11 and 20. Supplementary Table 2 summarizes results for the 13 SNPs with P < 10⁻⁶ for association with high-grade glioma in discovery data, along with results from replication data; SNPs with Hardy-Weinberg P < 10⁻⁵ in controls or with >5% missing data in any case or control group were excluded. Three of these 13 SNPs (rs1412829 on 9p21, and rs6010620 and rs4809324 intronic to RTEL1 on 20q13.3) had significant association with high-grade glioma risk in the discovery phase (principal component analysis P < 1.8 × 10⁻⁷), were independent risk predictors in a multivariable analysis of 13 top hits, and were replicated in the Mayo Clinic dataset (Table 1). As shown in Table 1 and Supplementary Table 2, the minor allele frequencies for the three SNPs consistently differed in the same direction between high-grade glioma cases and controls regardless of data source (AGS, TCGA, iControls or Mayo Clinic). Supplementary Table 3 shows results from the multivariable model of discovery data that included all 13 SNPs (four from the 9p21 region, three in RTEL1, plus six others in other locations). Eight SNPs, including one in the 9p21 region and two intronic to RTEL1, remained independently associated with high-grade glioma risk after adjustment for other SNPs. This was expected given the strong linkage disequilibrium (LD) evident for the four 9p21 SNPs and two of the three RTEL1 SNPs (Supplementary Table 4). In discovery data, only the interaction between chromosome 9p21 SNP rs1412829 and TERT SNP rs2736100 on chromosome 5 was statistically significant, with Wald test P = 0.019 (see Supplementary

[Figure 1: Distribution of P values (by chromosome) from the principal component–adjusted logistic regression additive model across the genome for high-grade glioma cases versus controls. The 13 SNPs with P < 10⁻⁶ are shown in red.]

(1) Department of Neurological Surgery, University of California, San Francisco, San Francisco, California, USA. (2) Institute of Human Genetics, University of California, San Francisco, San Francisco, California, USA. (3) Department of Experimental Pathology, Mayo Clinic, Rochester, Minnesota, USA. (4) Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, California, USA. (5) Division of Biostatistics, (6) Department of Oncology and (7) Department of Neurology, Mayo Clinic, Rochester, Minnesota, USA. (8) Gladstone Institute of Cardiovascular Disease, University of California, San Francisco, San Francisco, California, USA. (9) Division of Research, Kaiser Permanente, Oakland, California, USA. (10) Department of Pathology, University of California, San Francisco, San Francisco, California, USA. (11) Division of Epidemiology, Mayo Clinic, Rochester, Minnesota, USA. (12) These authors contributed equally to this work. (13) These authors jointly directed the work. Correspondence should be addressed to M.W. (margaretwrensch@ucsf.edu).

Received 13 March; accepted 1 June; published online 5 July 2009; doi:10.1038/ng.408. NATURE GENETICS ADVANCE ONLINE PUBLICATION. © 2009 Nature America, Inc. All rights reserved.

427 citations
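
The quality-control rule quoted in the abstract above (exclude SNPs with Hardy-Weinberg P < 10⁻⁵ in controls or with >5% missing data in any case or control group) can be sketched as follows. The text does not say which Hardy-Weinberg test was used, so this sketch assumes the common one-degree-of-freedom chi-square goodness-of-fit test; genotypes are assumed to be coded additively as 0, 1 or 2 copies of one allele, with None for a missing call; and `hwe_p_value`, `missing_rate` and `passes_qc` are hypothetical helper names.

```python
import math


def hwe_p_value(n_aa, n_ab, n_bb):
    """Chi-square (1 df) test of Hardy-Weinberg proportions from genotype counts."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0                       # no called genotypes: nothing to test
    p = (2 * n_aa + n_ab) / (2 * n)      # allele frequency
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0                       # monomorphic SNP: nothing to test
    chi2 = sum((o - e) ** 2 / e for o, e in zip((n_aa, n_ab, n_bb), expected))
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def missing_rate(genotypes):
    """Fraction of samples with a missing (None) genotype call."""
    return sum(g is None for g in genotypes) / len(genotypes)


def passes_qc(control_genotypes, groups, max_missing=0.05, min_hwe_p=1e-5):
    """groups: every case and control genotype list, each checked for missingness.
    The Hardy-Weinberg test is applied to controls only."""
    if any(missing_rate(g) > max_missing for g in groups):
        return False
    called = [g for g in control_genotypes if g is not None]
    counts = [called.count(k) for k in (0, 1, 2)]
    return hwe_p_value(*counts) >= min_hwe_p


controls = [0] * 25 + [1] * 50 + [2] * 25
cases = [0] * 20 + [1] * 50 + [2] * 30
print(passes_qc(controls, [cases, controls]))         # True
print(passes_qc([0] * 50 + [2] * 50, [cases]))        # False: no heterozygotes
print(passes_qc(controls, [[None] * 10 + [1] * 90]))  # False: 10% missing
```

An exact Hardy-Weinberg test is often preferred for rare variants, where the chi-square approximation is poor; the chi-square version is used here only because it needs nothing beyond the standard library.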


Network Information
Related Conferences (5)
Bioinformatics and Biomedicine

4.3K papers, 17.8K citations

80% related
Intelligent Systems in Molecular Biology

757 papers, 60.4K citations

78% related
Research in Computational Molecular Biology

1.2K papers, 33.8K citations

77% related
Pacific Symposium on Biocomputing

1.2K papers, 46K citations

77% related
International Conference on Data Mining

6.4K papers, 166.4K citations

75% related
Performance Metrics

Number of papers from the conference in previous years

Year  Papers
2021  289
2020  482
2019  366
2018  390
2017  339
2016  328