Journal · ISSN: 1687-8027

Advances in Bioinformatics 

Hindawi Publishing Corporation
About: Advances in Bioinformatics is an open-access academic journal with the ISSN 1687-8027. It publishes mainly in the areas of genes and genomes. Over its lifetime, it has published 184 papers, which have received 4,931 citations.


Papers
Journal Article
TL;DR: Genevestigator V3 is a novel meta-analysis system resulting from new algorithmic and software development using a client/server architecture, large-scale manual curation and quality control of microarray data for several organisms, and curation of pathway data for mouse and Arabidopsis.
Abstract: The Web-based software tool Genevestigator provides powerful tools for biologists to explore gene expression across a wide variety of biological contexts. Its first releases, however, were limited by the scaling ability of the system architecture, multiorganism data storage and analysis capability, and availability of computationally intensive analysis methods. Genevestigator V3 is a novel meta-analysis system resulting from new algorithmic and software development using a client/server architecture, large-scale manual curation and quality control of microarray data for several organisms, and curation of pathway data for mouse and Arabidopsis. In addition to improved querying features, Genevestigator V3 provides new tools to analyze the expression of genes in many different contexts, to identify biomarker genes, to cluster genes into expression modules, and to model expression responses in the context of metabolic and regulatory networks. Being a reference expression database with user-friendly tools, Genevestigator V3 facilitates discovery research and hypothesis validation.

1,859 citations

Journal Article
TL;DR: Various ways of performing dimensionality reduction on high-dimensional microarray data are summarised to provide a clearer idea of when to use each one of them for saving computational time and resources.
Abstract: We summarise various ways of performing dimensionality reduction on high-dimensional microarray data. Many different feature selection and feature extraction methods exist and are widely used. All these methods aim to remove redundant and irrelevant features so that new instances can be classified more accurately. A popular source of data is microarrays, a biological platform for measuring gene expression. Analysing microarrays can be difficult because of the size of the data they provide. In addition, the complicated relations among genes make analysis more difficult, and removing excess features can improve the quality of the results. We present some of the most popular methods for selecting significant features and compare them. Their advantages and disadvantages are outlined to give a clearer idea of when to use each one to save computational time and resources.

749 citations
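As a concrete example of the filter-style feature selection this survey covers, the minimal sketch below ranks genes by the absolute Welch t-statistic between two sample classes and keeps the top k. The choice of Welch's t-statistic, the two-class setting, the toy data, and the helper names `welch_t` and `select_top_genes` are assumptions for illustration; the survey compares several methods, and this is not presented as the one it recommends.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t-statistic (unequal variances)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def select_top_genes(expr, labels, k):
    """Filter-style feature selection (hypothetical helper).

    expr:   dict mapping gene name -> expression values, one per sample
    labels: 0/1 class label per sample, in the same order as the values
    k:      number of genes to keep
    Returns the k gene names with the largest |t|, best first.
    """
    scores = {}
    for gene, values in expr.items():
        group0 = [v for v, y in zip(values, labels) if y == 0]
        group1 = [v for v, y in zip(values, labels) if y == 1]
        scores[gene] = abs(welch_t(group0, group1))
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data (made up): three genes measured in six samples, three per class.
expr = {
    "geneA": [1.0, 1.2, 0.9, 5.0, 5.3, 4.8],  # strongly up in class 1
    "geneB": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],  # no class difference
    "geneC": [3.0, 3.1, 2.9, 1.0, 1.2, 0.8],  # strongly down in class 1
}
labels = [0, 0, 0, 1, 1, 1]
print(select_top_genes(expr, labels, 2))  # ['geneA', 'geneC']
```

Filter methods like this score each gene independently, which is cheap but ignores the gene-gene relations the abstract mentions; wrapper and embedded methods spend more computation to take such interactions into account.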

Journal Article
TL;DR: The cluster merging algorithm under this framework improves model fit and provides a better estimate of the number of distinct cell subpopulations than either Gaussian mixture models or flowClust, especially for complicated flow cytometry data distributions.
Abstract: We present a framework for the identification of cell subpopulations in flow cytometry data based on merging mixture components using the flowClust methodology. We show that the cluster merging algorithm under our framework improves model fit and provides a better estimate of the number of distinct cell subpopulations than either Gaussian mixture models or flowClust, especially for complicated flow cytometry data distributions. Our framework allows the automated selection of the number of distinct cell subpopulations, and we are able to identify cases where the algorithm fails, making it suitable for use in a high-throughput flow cytometry (FCM) analysis pipeline. Furthermore, we demonstrate a method for summarizing complex merged cell subpopulations in a simple manner that integrates with the existing flowClust framework and enables downstream data analysis. We demonstrate the performance of our framework on simulated and real FCM data. The software is available in the flowMerge package through the Bioconductor project.

123 citations
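To illustrate what merging mixture components means, here is a minimal sketch that starts from a matrix of posterior membership probabilities (one row per cell, one column per fitted component) and greedily merges the pair of components whose union lowers the classification entropy the most. The entropy criterion, stopping at a user-supplied `target_k`, the toy posteriors, and the helper names are assumptions for this sketch; it does not reproduce the flowMerge package's API or its automated choice of the number of subpopulations.

```python
from itertools import combinations
from math import log


def entropy(post):
    """Classification entropy: -sum_i sum_k t_ik * log(t_ik)."""
    return -sum(p * log(p) for row in post for p in row if p > 0)


def merge_columns(post, k, l):
    """Pool components k and l by summing their posterior columns.

    The pooled column is appended last, so column order changes.
    """
    merged = []
    for row in post:
        new_row = [p for j, p in enumerate(row) if j not in (k, l)]
        new_row.append(row[k] + row[l])
        merged.append(new_row)
    return merged


def greedy_merge(post, target_k):
    """Merge components pairwise until only target_k remain (hypothetical helper).

    At each step, the pair whose merge gives the lowest entropy is chosen.
    """
    post = [list(row) for row in post]
    while len(post[0]) > target_k:
        k, l = min(
            combinations(range(len(post[0])), 2),
            key=lambda pair: entropy(merge_columns(post, *pair)),
        )
        post = merge_columns(post, k, l)
    return post


# Made-up posteriors for four cells under three components; components 0
# and 1 overlap heavily, so they are the natural pair to merge.
post = [
    [0.5, 0.5, 0.0],
    [0.45, 0.55, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.1, 0.9],
]
merged = greedy_merge(post, 2)
print(merged[2], merged[3])  # [1.0, 0.0] [0.9, 0.1]
```

Summing posterior columns keeps each row a valid probability distribution, so the merged model can still assign every cell to its most probable (possibly merged) subpopulation.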

Journal Article
TL;DR: This paper reviews state-of-the-art FCM data analysis approaches using a framework introduced to report each of the components in a data analysis pipeline, and current challenges and possible future directions in developing fully automated FCM data analysis tools are outlined.
Abstract: Flow cytometry (FCM) is widely used in health research and treatment for a variety of tasks, such as diagnosing and monitoring leukemia and lymphoma patients, providing the helper-T lymphocyte counts needed to monitor the course and treatment of HIV infection, evaluating peripheral blood hematopoietic stem cell grafts, and managing many other diseases. In practice, FCM data analysis is performed manually, a process that takes an inordinate amount of time and is error-prone, nonreproducible, nonstandardized, and not open for re-evaluation, making it the most limiting aspect of this technology. This paper reviews state-of-the-art FCM data analysis approaches using a framework introduced to report each of the components in a data analysis pipeline. Current challenges and possible future directions in developing fully automated FCM data analysis tools are also outlined.

120 citations

Journal Article
TL;DR: This work constructs the unique N-mer frequency profiles of 635 microbial genomes publicly available as of February 2008 and demonstrates that such a tool can be used on the Sargasso Sea dataset, and the analysis shows that NBC can be further enhanced.
Abstract: A vast amount of microbial sequencing data is being generated through large-scale projects in ecology, agriculture, and human health. Efficient high-throughput methods are needed to analyze the massive amounts of metagenomic data, that is, all DNA present in an environmental sample. A major obstacle in metagenomics is the difficulty of achieving accurate classification with technologies that yield short reads. We construct the unique N-mer frequency profiles of 635 microbial genomes publicly available as of February 2008. These profiles are used to train a naive Bayes classifier (NBC) that can be used to identify the genome of any fragment. We show that our method is comparable to BLAST for small 25 bp fragments but does not have the ambiguity of BLAST's tied top scores. We demonstrate that this approach scales to identifying any fragment from hundreds of genomes. It also performs well at the strain, species, and genus levels and achieves strain resolution despite classifying ubiquitous genomic fragments (gene and nongene regions). Cross-validation analysis shows that species-level accuracy reaches 90% for highly represented species containing an average of 8 strains. We demonstrate that such a tool can be used on the Sargasso Sea dataset, and our analysis shows that NBC can be further enhanced.

114 citations
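A minimal sketch of the classification step described in the abstract above: each reference genome is summarized by its N-mer counts, and a fragment is assigned to the genome under which its N-mers are most probable, treating them as independent (the naive Bayes assumption). The uniform prior over genomes, add-one smoothing, the DNA alphabet size of 4, the toy "genomes", and the names `train_profiles` and `classify` are assumptions for illustration, not details taken from the paper.

```python
from collections import Counter
from math import log


def kmers(seq, n):
    """All overlapping substrings of length n."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


def train_profiles(genomes, n):
    """N-mer count profile for each reference genome (hypothetical helper)."""
    return {name: Counter(kmers(seq, n)) for name, seq in genomes.items()}


def classify(fragment, profiles, n, alphabet_size=4):
    """Assign a fragment to the genome with the highest naive Bayes score.

    Assumes a uniform prior over genomes and add-one (Laplace) smoothing so
    that N-mers unseen in a genome do not zero out its likelihood.
    """
    vocab = alphabet_size ** n  # number of possible N-mers
    best_name, best_score = None, float("-inf")
    for name, counts in profiles.items():
        total = sum(counts.values())
        # Sum of log P(N-mer | genome); Counter returns 0 for unseen N-mers.
        score = sum(log((counts[w] + 1) / (total + vocab))
                    for w in kmers(fragment, n))
        if score > best_score:
            best_name, best_score = name, score
    return best_name


genomes = {"genome_A": "ACGTACGTACGTACGT", "genome_B": "GGGGCCCCGGGGCCCC"}
profiles = train_profiles(genomes, 3)
print(classify("ACGTACG", profiles, 3))  # genome_A
```

Real reference sets would involve full genomes and a larger N; the short toy strings here only keep the example small enough to check by hand.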

Network Information
Related Journals (5)
BMC Bioinformatics
11.9K papers, 642K citations
83% related
Bioinformatics
17.4K papers, 2.1M citations
82% related
Briefings in Bioinformatics
3.5K papers, 156.8K citations
80% related
BMC Systems Biology
2K papers, 69.7K citations
78% related
Journal of Computational Biology
2.7K papers, 110.3K citations
78% related
Performance Metrics
No. of papers from the journal in previous years
Year   Papers
2021   1
2020   1
2019   5
2018   7
2017   5
2016   24