Conference

International Conference on Bioinformatics 

About: International Conference on Bioinformatics is an academic conference. The conference publishes mainly in the areas of Cloud computing and Population. Over its lifetime, the conference has published 5299 publications, which have received 31017 citations.


Papers
Journal ArticleDOI
01 Jul 1999
TL;DR: A greedy algorithm for determining alignments of functionally related sequences is described, the accuracy of the P value calculations is tested, and an example of using the algorithm to identify binding sites for the Escherichia coli CRP protein is given.
Abstract: Motivation: Molecular biologists frequently can obtain interesting insight by aligning a set of related DNA, RNA or protein sequences. Such alignments can be used to determine either evolutionary or functional relationships. Our interest is in identifying functional relationships. Unless the sequences are very similar, it is necessary to have a specific strategy for measuring, or scoring, the relatedness of the aligned sequences. If the alignment is not known, one can be determined by finding an alignment that optimizes the scoring scheme. Results: We describe four components to our approach for determining alignments of multiple sequences. First, we review a log-likelihood scoring scheme we call information content. Second, we describe two methods for estimating the P value of an individual information content score: (i) a method that combines a technique from large-deviation statistics with numerical calculations; (ii) a method that is exclusively numerical. Third, we describe how we count the number of possible alignments given the overall amount of sequence data. This count is multiplied by the P value to determine the expected frequency of an information content score and, thus, the statistical significance of the corresponding alignment. Statistical significance can be used to compare alignments having differing widths and containing differing numbers of sequences. Fourth, we describe a greedy algorithm for determining alignments of functionally related sequences. Finally, we test the accuracy of our P value calculations, and give an example of using our algorithm to identify binding sites for the Escherichia coli CRP protein.

1,315 citations
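
The scoring idea in the abstract above can be illustrated with a minimal sketch. It assumes the common relative-entropy form of information content (per column, the sum of f * ln(f / p) over observed letters, where f is the observed frequency and p the background frequency); the abstract does not give the exact formula, so this form, the function names, and the absence of small-sample corrections are all assumptions. The second function only mirrors the stated step of multiplying the P value by the number of possible alignments.

```python
import math
from collections import Counter


def information_content(alignment, background):
    """Information content (in nats) of an ungapped alignment.

    Assumed form: sum over columns and letters of f * ln(f / p),
    with f the column frequency and p the background frequency.
    """
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    width = len(alignment[0])
    if any(len(seq) != width for seq in alignment):
        raise ValueError("all sequences must have the same width")

    n = len(alignment)
    total = 0.0
    for col in range(width):
        counts = Counter(seq[col] for seq in alignment)
        for letter, count in counts.items():
            f = count / n
            total += f * math.log(f / background[letter])
    return total


def expected_frequency(p_value, num_alignments):
    """Expected number of alignments scoring at least this well:
    the P value multiplied by the count of possible alignments."""
    return p_value * num_alignments


# Hypothetical toy data: uniform background, three aligned 4-mers.
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
score = information_content(["ACGT", "ACGT", "ACGA"], uniform)
```

A perfectly conserved column under a uniform DNA background contributes ln(4) nats, and a column matching the background contributes zero, which is why the score rewards conservation that departs from background composition.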

Proceedings ArticleDOI
22 Sep 2013
TL;DR: An integrative method is developed to identify patterns from multiple experiments simultaneously while taking full advantage of high-resolution data, discovering joint patterns across different assay types, and yields a model which elucidates the relationship between assay observations and functional elements in the genome.
Abstract: Sequence census methods like ChIP-seq now produce an unprecedented amount of genome-anchored data. We have developed an integrative method to identify patterns from multiple experiments simultaneously while taking full advantage of high-resolution data, discovering joint patterns across different assay types. We apply this method to ENCODE chromatin data for the human chronic myeloid leukemia cell line K562, including ChIP-seq data on covalent histone modifications and transcription factor binding, and DNase-seq and FAIRE-seq readouts of open chromatin. In an unsupervised fashion, we identify patterns associated with transcription start sites, gene ends, enhancers, CTCF elements, and repressed regions. The method yields a model which elucidates the relationship between assay observations and functional elements in the genome. This model identifies sequences likely to affect transcription, and we verify these predictions in laboratory experiments. We have made software and an integrative genome browser track freely available (noble.gs.washington.edu/proj/segway/).

528 citations

Journal ArticleDOI
01 Nov 1999
TL;DR: A family of novel architectures which can learn to make predictions based on variable ranges of dependencies are introduced, extending recurrent neural networks and introducing non-causal bidirectional dynamics to capture both upstream and downstream information.
Abstract: Motivation: Predicting the secondary structure of a protein (alpha-helix, beta-sheet, coil) is an important step towards elucidating its three-dimensional structure, as well as its function. Presently, the best predictors are based on machine learning approaches, in particular neural network architectures with a fixed, and relatively short, input window of amino acids, centered at the prediction site. Although a fixed small window avoids overfitting problems, it does not permit capturing variable long-range information. Results: We introduce a family of novel architectures which can learn to make predictions based on variable ranges of dependencies. These architectures extend recurrent neural networks, introducing non-causal bidirectional dynamics to capture both upstream and downstream information. The prediction algorithm is completed by the use of mixtures of estimators that leverage evolutionary information, expressed in terms of multiple alignments, both at the input and output levels. While our system currently achieves an overall performance close to 76% correct prediction, at least comparable to the best existing systems, the main emphasis here is on the development of new algorithmic

509 citations

Journal ArticleDOI
01 Jul 2005
TL;DR: The authors propose L1 penalized estimation for the Cox model, computed with the least-angle regression (LARS) method, to select genes that are relevant to patients' survival and to build a parsimonious model for predicting the survival of future patients.
Abstract: Motivation: An important application of microarray technology is to relate gene expression profiles to various clinical phenotypes of patients. Success has been demonstrated in molecular classification of cancer in which the gene expression data serve as predictors and different types of cancer serve as a categorical outcome variable. However, there has been less research in linking gene expression profiles to the censored survival data such as patients' overall survival time or time to cancer relapse. It would be desirable to have models with good prediction accuracy and parsimony property. Results: We propose to use the L1 penalized estimation for the Cox model to select genes that are relevant to patients' survival and to build a predictive model for future prediction. The computational difficulty associated with the estimation in the high-dimensional and low-sample size settings can be efficiently solved by using the recently developed least-angle regression (LARS) method. Our simulation studies and application to real datasets on predicting survival after chemotherapy for patients with diffuse large B-cell lymphoma demonstrate that the proposed procedure, which we call the LARS-Cox procedure, can be used for identifying important genes that are related to time to death due to cancer and for building a parsimonious model for predicting the survival of future patients. The LARS-Cox regression gives better predictive performance than the L2 penalized regression and a few other dimension-reduction based methods. Conclusions: We conclude that the proposed LARS-Cox procedure can be very useful in identifying genes relevant to survival phenotypes and in building a parsimonious predictive model that can be used for classifying future patients into clinically relevant high- and low-risk groups based on the gene expression profile and survival times of previous patients.
Supplementary information: http://dna.ucdavis.edu/~hli/LARSCox-Appendix.pdf Contact: hli@ucdavis.edu

483 citations
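
The L1 penalized Cox estimation named in the abstract above minimizes a penalized negative log partial likelihood. The sketch below only evaluates that objective for a given coefficient vector; it is not the LARS-based solver the abstract describes. It assumes the standard Cox partial likelihood without tied event times, and the function name, argument layout, and toy data are hypothetical.

```python
import math


def l1_cox_objective(beta, covariates, times, events, lam):
    """Negative log partial likelihood of a Cox model plus an L1 penalty.

    beta:       list of coefficients, one per covariate (e.g. per gene)
    covariates: list of rows, one row of covariate values per patient
    times:      observed survival or censoring time per patient
    events:     1 if the event (death) was observed, 0 if censored
    lam:        L1 penalty weight (assumed non-negative)
    """
    # Linear predictor x_i . beta for every patient.
    eta = [sum(b * x for b, x in zip(beta, row)) for row in covariates]

    log_pl = 0.0
    for i, (t_i, observed) in enumerate(zip(times, events)):
        if not observed:
            continue  # censored patients contribute only through risk sets
        # Risk set: patients still under observation at time t_i.
        risk = [eta[j] for j, t_j in enumerate(times) if t_j >= t_i]
        # Log-sum-exp, shifted by the maximum for numerical stability.
        m = max(risk)
        log_sum = m + math.log(sum(math.exp(r - m) for r in risk))
        log_pl += eta[i] - log_sum

    penalty = lam * sum(abs(b) for b in beta)
    return -log_pl + penalty


# Hypothetical toy data: three patients, two covariates.
X = [[0.5, 1.0], [1.5, -0.2], [0.1, 0.3]]
value = l1_cox_objective([0.2, 0.0], X, times=[5, 8, 3], events=[1, 0, 1], lam=0.1)
```

Because the penalty is the sum of absolute coefficient values, minimizing this objective drives many coefficients to exactly zero, which is what makes the fitted model parsimonious in the sense the abstract describes.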

Journal ArticleDOI
01 Jul 1999
TL;DR: A comprehensive yeast-specific promoter database that contains relevant binding affinity and expression data where available and provides some simple but useful tools for promoter sequence analysis is developed.
Abstract: MOTIVATION: In order to facilitate a systematic study of the promoters and transcriptionally regulatory cis-elements of the yeast Saccharomyces cerevisiae on a genomic scale, we have developed a comprehensive yeast-specific promoter database, SCPD. RESULTS: Currently SCPD contains 580 experimentally mapped transcription factor (TF) binding sites and 425 transcriptional start sites (TSS) as its primary data entries. It also contains relevant binding affinity and expression data where available. In addition to mechanisms for promoter information (including sequence) retrieval and a data submission form, SCPD also provides some simple but useful tools for promoter sequence analysis. AVAILABILITY: SCPD can be accessed from the URL http://cgsigma.cshl.org/jian. The database is continually updated.

475 citations

Performance
Metrics
No. of papers from the Conference in previous years
Year    Papers
2021    289
2020    482
2019    366
2018    390
2017    339
2016    328