Journal•ISSN: 0919-9454

Genome Informatics

Imperial College Press

About: Genome Informatics is an academic journal that publishes mainly in the areas of genome and gene research. Its ISSN is 0919-9454. Over its lifetime, it has published 1,517 papers, which have received 15,503 citations.

Topics: Genome, Gene, Genome project, Cluster analysis, Gene regulatory network, ...

Papers

Journal Article

Open Source Clustering Software

[...]

Michiel J. L. de Hoon¹, Seiya Imoto¹, Satoru Miyano¹•Institutions (1)

University of Tokyo¹

01 Jan 2002-Genome Informatics

TL;DR: An improved version of Michael Eisen's well-known Cluster program for Windows, Mac OS X and Linux/Unix is created, and Python and Perl interfaces to the C Clustering Library are provided, combining the flexibility of a scripting language with the speed of C.

Abstract: SUMMARY: We have implemented k-means clustering, hierarchical clustering and self-organizing maps in a single multipurpose open-source library of C routines, callable from other C and C++ programs. Using this library, we have created an improved version of Michael Eisen's well-known Cluster program for Windows, Mac OS X and Linux/Unix. In addition, we generated a Python and a Perl interface to the C Clustering Library, thereby combining the flexibility of a scripting language with the speed of C.
AVAILABILITY: The C Clustering Library and the corresponding Python C extension module Pycluster were released under the Python License, while the Perl module Algorithm::Cluster was released under the Artistic License. The GUI code Cluster 3.0 for Windows, Macintosh and Linux/Unix, as well as the corresponding command-line program, were released under the same license as the original Cluster code. The complete source code is available at http://bonsai.ims.u-tokyo.ac.jp/mdehoon/software/cluster. Alternatively, Algorithm::Cluster can be downloaded from CPAN, while Pycluster is also available as part of the Biopython distribution.

1,493 citations

Journal Article

Intrinsic protein disorder in complete genomes.

[...]

A.K. Dunker¹, Zoran Obradovic¹, Pedro Romero¹, Ethan C. Garner¹, Celeste J. Brown¹, et al.•Institutions (1)

Washington State University¹

01 Jan 2000-Genome Informatics

TL;DR: Overall, intrinsic disorder appears to be common, with eucaryotes perhaps having a higher percentage of native disorder than archaea or bacteria; estimates of wholly disordered proteins ranged from 1% to 8% in bacteria and from 2% to 11% in archaea, plus one apparently anomalous archaeal value of 18%.

Abstract: Intrinsic protein disorder refers to segments or to whole proteins that fail to fold completely on their own. Here we predicted disorder on protein sequences from 34 genomes, including 22 bacteria, 7 archaea, and 5 eucaryotes. Predicted disordered segments ≥50, ≥40, and ≥30 residues in length were determined, as well as proteins estimated to be wholly disordered. The five eucaryotes were separated from bacteria and archaea by having the highest percentages of sequences predicted to have disordered segments ≥50 residues in length: from 25% for Plasmodium to 41% for Drosophila. Estimates of wholly disordered proteins in the bacteria ranged from 1% to 8%, averaging 3 ± 2%; estimates in the archaea ranged from 2% to 11%, plus an apparently anomalous 18%, averaging 7 ± 5%, which drops to 5 ± 3% if the high value is discarded. Estimates in the 5 eucaryotes ranged from 3% to 17%. The putative wholly disordered proteins were often ribosomal proteins, but in addition about equal numbers were of known and unknown function. Overall, intrinsic disorder appears to be common, with eucaryotes perhaps having a higher percentage of native disorder than archaea or bacteria.

642 citations

Journal Article

Predicting Protein Disorder for N-, C-, and Internal Regions.

[...]

Li X¹, Pedro Romero¹, M Rani¹, A. K. Dunker¹, Zoran Obradovic¹, et al.•Institutions (1)

Washington State University¹

01 Jan 1999-Genome Informatics

TL;DR: Logistic regression, discriminant analysis, and neural networks were used to predict ordered and disordered regions in proteins, and the results support the hypothesis that disorder is encoded by the amino acid sequence.

Abstract: Logistic regression (LR), discriminant analysis (DA), and neural networks (NN) were used to predict ordered and disordered regions in proteins. Training data were from a set of non-redundant X-ray crystal structures, with the data partitioned into N-terminal (N), C-terminal (C) and internal (I) regions. The DA and LR methods gave almost identical 5-fold cross-validation accuracies that averaged 75.9 ± 3.1% (N-regions), 70.7 ± 1.5% (I-regions), and 74.6 ± 4.4% (C-regions). NN predictions gave slightly higher scores: 78.8 ± 1.2% (N-regions), 72.5 ± 1.2% (I-regions), and 75.3 ± 3.3% (C-regions). Predictions improved with the length of the disordered regions. Averaged over the three methods, accuracies ranged from 52% (length 9-14) to 78% (length ≥21) for I-regions, from 72% (length 5) to 81% (length 12-15) for N-regions, and from 70% (length 5) to 80% (length 12-15) for C-regions. These data support the hypothesis that disorder is encoded by the amino acid sequence.

540 citations

Journal Article

A comparative study on feature selection and classification methods using gene expression profiles and proteomic patterns.

[...]

Huiqing Liu, Jinyan Li, Limsoon Wong

01 Jan 2002-Genome Informatics

TL;DR: This work presents a comparative study on six feature selection heuristics by applying them to two sets of data, which are gene expression profiles from Acute Lymphoblastic Leukemia and proteomic patterns from ovarian cancer patients.

Abstract: Feature selection plays an important role in classification. We present a comparative study on six feature selection heuristics by applying them to two sets of data. The first set of data are gene expression profiles from Acute Lymphoblastic Leukemia (ALL) patients. The second set of data are proteomic patterns from ovarian cancer patients. Based on features chosen by these methods, error rates of several classification algorithms were obtained for analysis. Our results demonstrate the importance of feature selection in accurately classifying new samples.

455 citations

Journal Article

A Phylogenetic Foundation for Comparative Mammalian Genomics

[...]

Peter J. Waddell¹, Hirohisa Kishino², Rissa Ota³•Institutions (3)

University of South Carolina¹, University of Tokyo², Massey University³

01 Jan 2001-Genome Informatics

TL;DR: The largest alignments of amino acid sequence data to date are constructed and a good case is made for the tree shrew as a closer relative of primates than rodents, while also showing a slower rate of evolution in key cell cycle genes.

Abstract: A major effort is being undertaken to sequence an array of mammalian genomes. Coincidentally, the evolutionary relationships of the 18 presently recognized orders of placental mammals are only just being resolved. In this work we construct and analyse the largest alignments of amino acid sequence data to date. Our findings allow us to set up a series of superordinal groups (clades) to act as prior hypotheses for further testing. Important findings include strong evidence for a clade of Euarchonta+Glires (=Supraprimates) comprising primates, flying lemurs, tree shrews, lagomorphs and rodents. In addition, there is good evidence for a clade of all placental mammals except Xenarthra and Afrotheria (=Boreotheria) and for the previously recognised clades Laurasiatheria, Scrotifera, Fereuungulata, Ferae, Afrotheria, Euarchonta, Glires, and Eulipotyphla. Accordingly, a revised classification of the placental mammals is put forward. Using this and molecular divergence-time methods, the ages of the superordinal splits are estimated. While results are strongly consistent with the earliest superordinal divergences all being > 65 mybp (Cretaceous period), they suffer from greater uncertainty than presently appreciated. The early primate split of tarsiers from the anthropoid lineage at ∼55 mybp is seen to be an especially informative fossil calibration point. A statistical framework for testing clades using SINE data is presented and reveals significant support for the tarsier/anthropoid clade, as well as the clades Cetruminantia and Whippomorpha. Results also underline our thesis that while sequence analysis can help set up hypothesised clades, SINEs obtainable from sequencing 1-2 Mb regions of placental genomes are essential to testing them. In contrast, derivations suggest that empirical Bayesian methods for sequence data may not be robust estimators of clades.
Our findings, including the study of genes such as TP53, make a good case for the tree shrew as a closer relative of primates than rodents, while also showing a slower rate of evolution in key cell cycle genes. Tree shrews are consequently high-value experimental animals and a strong candidate for a genome sequencing initiative.

273 citations

Network Information

Related Journals (5)

Bioinformatics: 17.4K papers, 2.1M citations, 86% related
BMC Bioinformatics: 11.9K papers, 642K citations, 86% related
Nucleic Acids Research: 48.8K papers, 4.7M citations, 83% related
Proteins: 8K papers, 447.3K citations, 83% related
Genome Research: 5.5K papers, 931.7K citations, 80% related

Performance Metrics

Papers: 1,517
Citations: 16,010

Number of papers published in the journal in previous years
Year	Papers
2014	1
2011	5
2010	5
2009	2
2008	47
2007	48