Showing papers in "Genome Informatics in 1998"

Journal Article

Identifying the Interaction between Genes and Gene Products Based on Frequently Seen Verbs in Medline Abstracts.

Takeshi Sekimizu¹, Hyun Seok Park¹ ², Jun'ichi Tsujii¹ ³

University of Tokyo¹, Sungshin Women's University², University of Manchester³

01 Jan 1998-Genome Informatics

TL;DR: This work selected the most frequently seen verbs from raw texts comprising 1 million words of Medline abstracts, and was able to identify (or bracket) the noun phrases contained in the corpus with a precision rate of 90%.

Abstract: We have selected the most frequently seen verbs from raw texts comprising 1 million words of Medline abstracts, and we were able to identify (or bracket) noun phrases contained in the corpus with a precision rate of 90%. Then, based on the noun-phrase-bracketed corpus, we tried to find the subject and object terms for some frequently seen verbs in the domain. The precision rate of finding the right subject and object for each verb was about 73%. This task was only made possible because we were able to linguistically analyze (or parse) a large quantity of raw text. Our approach will be useful for classifying genes and gene products and for identifying the interaction between them. It is the first step of our effort in building a genome-related thesaurus and hierarchies in a fully automatic way.
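
The subject/object step can be illustrated with a minimal Python sketch. It assumes the corpus has already been bracketed using a hypothetical `[NP ...]` notation and applies a naive nearest-noun-phrase heuristic; it is not the authors' parser and ignores passives, coordination and relative clauses.

```python
import re

# Hypothetical bracket notation: "[NP ...]" marks one noun phrase.
TOKEN = re.compile(r"\[NP ([^\]]+)\]|(\S+)")

def chunks(sentence):
    """Split a bracketed sentence into ('NP', text) and ('W', word) items."""
    return [("NP", m.group(1)) if m.group(1) else ("W", m.group(2))
            for m in TOKEN.finditer(sentence)]

def subject_object(sentence, verb):
    """Naive heuristic: subject = nearest NP before the verb,
    object = nearest NP after it. Returns None if the verb is absent."""
    items = chunks(sentence)
    for pos, (kind, text) in enumerate(items):
        if kind == "W" and text.lower() == verb:
            before = [t for k, t in items[:pos] if k == "NP"]
            after = [t for k, t in items[pos + 1:] if k == "NP"]
            return (before[-1] if before else None,
                    after[0] if after else None)
    return None

s = "[NP The GAL4 protein] activates [NP transcription] of [NP the GAL1 gene] ."
print(subject_object(s, "activates"))
```

For the example sentence the sketch returns `('The GAL4 protein', 'transcription')`; a real system would need the parse structure to attach "of the GAL1 gene" to the object.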

216 citations

Journal Article

Detecting Gene Symbols and Names in Biological Texts: A First Step toward Pertinent Information Extraction.

Denys Proux¹, François Rechenmann², Laurent Julliard¹, Violaine Pillet³, Bernard Jacq

Xerox¹, French Institute for Research in Computer Science and Automation², Aix-Marseille University³

01 Jan 1998-Genome Informatics

TL;DR: A program for the identification of gene symbols and names inside sentences has been devised; it is made up of a series of sieves of different natures (lexical, morphological and semantic) that distinguish, among the words of a sentence, those which can only be potential gene symbols or names.

Abstract: Gathering data on molecular interactions to be fed into a specialized database has motivated the development of a computer system to help extract pertinent information from texts, relying on advanced linguistic tools complemented by object-oriented knowledge modeling capabilities. As a first step toward this challenging objective, a program for the identification of gene symbols and names inside sentences has been devised. The main difficulty is that these names and symbols do not appear to follow construction rules. The program is thus made up of a series of sieves of different natures (lexical, morphological and semantic) to distinguish, among the words of a sentence, those which can only be potential gene symbols or names. Its performance has been evaluated, in terms of coverage and precision ratios, on a corpus of texts concerning D. melanogaster for which the list of names of known genes is available for checking.
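
The sieve idea can be sketched as a chain of filters. In this minimal Python sketch the word lists, shape rules and the context-word "semantic" test are hypothetical stand-ins, since the abstract does not describe the authors' resources; how the three sieves are combined here is also an assumption.

```python
import re

# Hypothetical stand-ins for the original lexicons, which the abstract does not list.
COMMON_WORDS = {"in", "the", "is", "by", "and", "of", "a", "gene", "protein"}
CONTEXT_CUES = {"gene", "protein", "mutant", "mutants", "alleles"}

def lexical_sieve(token):
    """Lexical sieve: drop ordinary vocabulary words."""
    return token.lower() not in COMMON_WORDS

def morphological_sieve(token):
    """Morphological sieve: keep symbol-like shapes (digits, hyphens,
    internal capitals, or all-capital tokens longer than one letter)."""
    return bool(re.search(r"\d|-|[a-z][A-Z]", token)) or (token.isupper() and len(token) > 1)

def semantic_sieve(next_token):
    """Semantic sieve (crude proxy): the following word is a gene-related cue."""
    return next_token.lower() in CONTEXT_CUES

def gene_candidates(sentence):
    """Return tokens that pass the lexical sieve and either of the other two."""
    tokens = re.findall(r"[A-Za-z0-9-]+", sentence)
    found = []
    for i, tok in enumerate(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        if lexical_sieve(tok) and (morphological_sieve(tok) or semantic_sieve(nxt)):
            found.append(tok)
    return found

print(gene_candidates("In the embryo the wingless gene is repressed by Abd-B and the hh protein."))
```

Here "Abd-B" is caught by its shape, while lower-case names such as "wingless" and "hh" are caught only through their context words, which illustrates why the abstract stresses that gene names follow no construction rules.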

162 citations

Journal Article

Predicting Disordered Regions from Amino Acid Sequence: Common Themes Despite Differing Structural Characterization.

Ethan C. Garner¹, P Cannon¹, Pedro Romero¹, Zoran Obradovic¹, A. K. Dunker¹

Washington State University¹

01 Jan 1998-Genome Informatics

TL;DR: The results from the two predictors suggest that disordered regions comprise a sequence-dependent category distinct from that of ordered protein structure.

Abstract: Using ordered and disordered regions identified either by X-ray crystallography or by NMR spectroscopy, we trained neural networks to predict order and disorder from amino acid sequence. Although the NMR-based predictor initially appeared to be much better than the one based on the X-ray data, both predictors yielded similar overall accuracies when tested on each other's training sets, and indicated similar regions of disorder in each sequence. The predictors trained with X-ray data showed similar results for a 5-fold cross-validation experiment and for the out-of-sample predictions on the NMR-characterized data. In contrast, the predictor trained with NMR data gave substantially worse accuracies on the out-of-sample X-ray data than those displayed by 5-fold cross-validation during network training. Overall, the results from the two predictors suggest that disordered regions comprise a sequence-dependent category distinct from that of ordered protein structure.

147 citations

Journal Article

A System for Identifying Genetic Networks from Gene Expression Patterns Produced by Gene Disruptions and Overexpressions.

Tatsuya Akutsu¹, Satoru Kuhara², Osamu Maruyama¹, Satoru Miyano¹

University of Tokyo¹, Kyushu University²

01 Jan 1998-Genome Informatics

TL;DR: A simulator of boolean networks without time delay is presented, which includes a genetic network identifier with a graphic interface that generates instructions for experiments of gene disruptions and overexpressions.

Abstract: A hot research topic in genomics is to analyze the interactions between genes by systematic gene disruptions and gene overexpressions. Based on a boolean network model without time delay, we have been investigating efficient strategies for identifying a genetic network by multiple gene disruptions and overexpressions. This paper first shows the relationship between our boolean network model without time delay and the standard synchronous boolean network model. Then we present a simulator of boolean networks without time delay for multiple gene disruptions and gene overexpressions, which includes a genetic network identifier with a graphic interface that generates instructions for experiments of gene disruptions and overexpressions.
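
As a hedged reading of "without time delay", the sketch below assumes the network is acyclic, so each gene's value follows immediately from its regulators (a cyclic network raises `graphlib.CycleError`); this is an illustrative assumption, not the paper's formal definition. A disruption forces a gene to 0 and an overexpression forces it to 1. The toy network is hypothetical, and `graphlib` requires Python 3.9+.

```python
from graphlib import TopologicalSorter  # Python 3.9+

def expression_pattern(functions, regulators, disrupted=frozenset(), overexpressed=frozenset()):
    """Evaluate an acyclic Boolean network with no time delay.

    functions:  gene -> function mapping {regulator: 0/1} to a truth value
    regulators: gene -> list of genes that regulate it
    Disrupted genes are forced to 0 and overexpressed genes to 1.
    """
    graph = {g: regulators.get(g, []) for g in functions}
    state = {}
    # Regulators are always evaluated before the genes they control.
    for g in TopologicalSorter(graph).static_order():
        if g in disrupted:
            state[g] = 0
        elif g in overexpressed:
            state[g] = 1
        else:
            state[g] = int(bool(functions[g]({r: state[r] for r in regulators.get(g, [])})))
    return state

# Toy network (hypothetical): a is always on, b copies a, c = a AND NOT b.
functions = {
    "a": lambda v: 1,
    "b": lambda v: v["a"],
    "c": lambda v: v["a"] and not v["b"],
}
regulators = {"b": ["a"], "c": ["a", "b"]}

wild_type = expression_pattern(functions, regulators)
b_disrupted = expression_pattern(functions, regulators, disrupted={"b"})
print(wild_type, b_disrupted)
```

Comparing the two patterns, c switches from 0 to 1 when b is disrupted; this is the kind of evidence a disruption experiment provides that b regulates c.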

139 citations

Journal Article

Construction of the gyrB Database for the Identification and Classification of Bacteria.

Hiroaki Kasai, Kanako Watanabe, Elisabeth Gasteiger¹, Amos Marc Bairoch¹, Katsumi Isono², Satoshi Yamamoto, Shigeaki Harayama

Swiss Institute of Bioinformatics¹, Kobe University²

01 Jan 1998-Genome Informatics

TL;DR: The gyrB gene was chosen because it is rarely transmitted horizontally, its molecular evolution rate is higher than that of 16S rRNA, and the gene is distributed ubiquitously among bacterial species.

Abstract: Nucleotide sequences of small-subunit rRNA (16S rRNA) are most commonly used for the identification and characterization of bacteria and their complex communities. However, 16S rRNA evolves slowly and is often not well suited to resolving bacterial strains at the species level. We have therefore attempted to develop a rapid and more convenient system for bacterial identification using gyrB gene sequences. We chose the gyrB gene because (i) it is rarely transmitted horizontally, (ii) its molecular evolution rate is higher than that of 16S rRNA, and (iii) the gene is distributed ubiquitously among bacterial species. We PCR-amplified 1.2-kb gyrB segments from about 1,000 bacterial species using degenerate primers and determined their nucleotide sequences. The resulting data have been assembled into the gyrB database, accessible via the WWW.

97 citations

Journal Article

The Sequence Attribute Method for Determining Relationships Between Sequence and Protein Disorder

Q Xie¹, GE Arnold, Pedro Romero¹, Zoran Obradovic¹, Ethan C. Garner¹, AK Dunker¹

Washington State University¹

01 Jan 1998-Genome Informatics

TL;DR: Attributes based on cysteine, the aromatics, flexible tendencies, and charge were found to be the best attributes for distinguishing order and disorder among those tested so far.

Abstract: The conditional probability, P(s|x), is a statement of the probability that the event s will occur given prior knowledge of the value of x. If x is given and if s is randomly distributed, then an empirical approximation of the true conditional probability can be computed by applying Bayes' theorem. Here s represents one of two structural classes, either ordered, s(o), or disordered, s(d), and x represents an attribute value calculated over a window of 21 amino acids. Plots of P(s|x) versus x provide information about the correlation between the given sequence attribute and disorder or order. These conditional probability plots allow quantitative comparisons between individual attributes for their ability to discriminate between order and disorder states. Using such quantitative comparisons, 38 different sequence attributes have been rank-ordered. Attributes based on cysteine, the aromatics, flexible tendencies, and charge were found to be the best attributes for distinguishing order and disorder among those tested so far.
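
A minimal Python sketch of the P(s|x) estimate follows. The 21-residue window comes from the abstract; defining x as the count of selected residues in the window (e.g. cysteine, "C"), treating each count as its own bin, and using the class frequencies as priors are our assumptions. Both classes must be non-empty.

```python
from collections import Counter

def window_counts(seq, residues, width=21):
    """Attribute x for each window: number of residues from `residues`."""
    return [sum(c in residues for c in seq[i:i + width])
            for i in range(len(seq) - width + 1)]

def p_disorder_given_x(x_ordered, x_disordered, prior_d=None):
    """Estimate P(d|x) for every observed x via Bayes' theorem:
        P(d|x) = P(x|d)P(d) / (P(x|d)P(d) + P(x|o)P(o)).
    Likelihoods are empirical frequencies of x within each class;
    prior_d defaults to the fraction of disordered windows (an assumption)."""
    n_o, n_d = len(x_ordered), len(x_disordered)
    if prior_d is None:
        prior_d = n_d / (n_o + n_d)
    lik_o, lik_d = Counter(x_ordered), Counter(x_disordered)
    result = {}
    for x in sorted(set(lik_o) | set(lik_d)):
        pd = lik_d[x] / n_d * prior_d
        po = lik_o[x] / n_o * (1 - prior_d)
        result[x] = pd / (pd + po)
    return result

# Toy attribute values from ordered and disordered windows (hypothetical data).
print(p_disorder_given_x([0, 0, 1, 1], [1, 2, 2, 2]))
```

Plotting the returned values against x gives the kind of conditional probability plot the abstract uses to rank attributes: the further the curve departs from the prior, the better the attribute separates the two classes.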

55 citations

Journal Article

Parallel Protein Information Analysis (PAPIA) System Running on a 64-Node PC Cluster.

Yutaka Akiyama, Kentaro Onizuka, Tamotsu Noguchi, Makoto Ando

01 Jan 1998-Genome Informatics

TL;DR: The PAPIA (PArallel Protein Information Analysis) system performs fast parallel processing for typical calculations in protein analysis, such as structure similarity search, sequence homology search and multiple sequence alignment, nearly 60 times faster than a single processor.

Abstract: Protein information analysis is widely regarded as a key technology in drug design, macromolecular engineering, and understanding genome sequences. Because a vast amount of calculation is required, further speed-up of protein information analysis is very much in demand. We have implemented the PAPIA (PArallel Protein Information Analysis) system on the RWC PC cluster IIa (PAPIA cluster), which consists of 64 Pentium Pro 200 MHz microprocessors. The PAPIA system performs typical calculations in protein analysis, such as structure similarity search, sequence homology search and multiple sequence alignment, in parallel nearly 60 times faster than a single processor. We have started a WWW service (http://www.rwcp.or.jp/papia/) that allows any biologist to easily submit jobs to the PAPIA system through a WWW browser. The user can experience the power of current parallel processing technology.
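
To put "nearly 60 times faster" on 64 processors in perspective, here is a back-of-the-envelope calculation that assumes a speedup of exactly S = 60 on p = 64 processors and a simple Amdahl's-law model (our assumption; the abstract does not use one). It gives the parallel efficiency E and the implied serial fraction f:

```latex
E = \frac{S}{p} = \frac{60}{64} \approx 0.94,
\qquad
S = \frac{1}{f + (1 - f)/p}
\;\Longrightarrow\;
f = \frac{p/S - 1}{p - 1} = \frac{64/60 - 1}{63} \approx 0.0011
```

Under these assumptions the cluster runs at roughly 94% efficiency, and only about 0.1% of the work would have to be inherently serial.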

35 citations

Journal Article

Fully-Automated Spot Recognition and Matching Algorithms for 2-D Gel Electrophoretogram of Genomic DNA.

Katsutoshi Takahashi¹, Masayuki Nakazawa, Yasuo Watanabe², Akihiko Konagaya¹

Japan Advanced Institute of Science and Technology¹, Kanazawa Institute of Technology²

01 Jan 1998-Genome Informatics

TL;DR: By comparing large numbers of RLGS electrophoretograms without any visual inspection or human interaction, it is possible to detect DNA molecular changes such as deletions, additions, amplifications or DNA methylations occurring at or near restriction enzyme cleavage sites.

Abstract: We have developed fully automated algorithms for processing 2-D gel electrophoretograms based on the RLGS (restriction landmark genomic scanning) method: one for fully automated spot recognition from an RLGS electrophoretogram and another for fully automated pairwise matching of the spots found on such 2-D electrophoretograms. Without any human interaction, our spot recognition algorithm correctly identifies several thousand spots on a 2-D electrophoretogram, including hidden spots at the shoulders of large spots, except for only a few false-negative and false-positive spots. Once the locations and intensities of the landmark spots have been correctly recognized automatically, our pairwise spot matching algorithm reliably and rapidly identifies equivalent pairs of spots on nonlinearly distorted RLGS electrophoretograms in a fully automatic way, i.e., the tedious manual spot landmarking process is unnecessary. At the beginning of the spot matching process, the most suitable pair of corresponding spots is searched for automatically; then the other equivalent pairs of spots are identified. With our image processing algorithms, it is possible to detect DNA molecular changes such as deletions, additions, amplifications or DNA methylations occurring at or near restriction enzyme cleavage sites by comparing large numbers of RLGS electrophoretograms, without any visual inspection or human interaction.
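
The seed-then-propagate idea in the matching step can be sketched as follows. This is a hypothetical Python illustration, not the authors' algorithm. It assumes spot centres are already recognized, that a correct seed pair is given, and that distortion is locally close to a translation, so each matched pair predicts where its neighbours should appear on the other gel. The `radius` and `tol` values are arbitrary.

```python
import math
from collections import deque

def match_spots(spots_a, spots_b, seed, radius=20.0, tol=3.0):
    """Propagate matches outward from a seed pair (hypothetical sketch).

    spots_a, spots_b: lists of (x, y) spot centres on the two gels.
    seed: (i, j) index pair assumed to be a correct correspondence.
    Returns a dict mapping indices in spots_a to indices in spots_b.
    """
    matches = {seed[0]: seed[1]}
    used_b = {seed[1]}
    queue = deque([seed])
    while queue:
        i, j = queue.popleft()
        ax, ay = spots_a[i]
        bx, by = spots_b[j]
        for k, (kx, ky) in enumerate(spots_a):
            if k in matches or math.hypot(kx - ax, ky - ay) > radius:
                continue
            # Predict k's position on gel B from the local offset of pair (i, j);
            # local offsets can follow distortions that vary smoothly across the gel.
            px, py = bx + (kx - ax), by + (ky - ay)
            best, best_d = None, tol
            for m, (mx, my) in enumerate(spots_b):
                d = math.hypot(mx - px, my - py)
                if m not in used_b and d <= best_d:
                    best, best_d = m, d
            if best is not None:
                matches[k] = best
                used_b.add(best)
                queue.append((k, best))
    return matches

gel_a = [(0, 0), (10, 0), (20, 0), (0, 10), (10, 10)]
# Gel B: the same spots shifted by (5, 5) and stretched 10% along x.
gel_b = [(5 + 1.1 * x, 5 + y) for x, y in gel_a]
print(match_spots(gel_a, gel_b, seed=(0, 0)))
```

Spots with no counterpart within `tol` of their predicted position are simply left unmatched, which is where differences between gels (for example, deletions) would show up.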

28 citations

Journal Article

Improvement of the A* Algorithm for Multiple Sequence Alignment.

Hirotada Kobayashi¹, Hiroshi Imai¹

University of Tokyo¹

01 Jan 1998-Genome Informatics

TL;DR: New powerful estimators utilizing k ≥ 3 dimensional sub-alignments are presented, and a new bounding technique is proposed using V(Δ), the set of vertices on paths whose lengths are at most Δ longer than the shortest path.

Abstract: The alignment problem for DNA or protein sequences is widely applicable and important in various fields of molecular biology. This problem can be reduced to the shortest path problem, and Ikeda and Imai (Genome Informatics 5: 90-99, 1994) showed that the A* algorithm works efficiently with the estimator utilizing all 2-dimensional sub-alignments. In this paper we present new powerful estimators utilizing k ≥ 3 dimensional sub-alignments, and propose a new bounding technique using V(Δ), the set of vertices on paths whose lengths are at most Δ longer than the shortest path. We also extend our algorithm to a recursive-estimate version. These algorithms become more efficient when the number of sequences increases or the similarity among sequences is lower.
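
The following minimal Python sketch shows how A* search and the pairwise estimator fit together, in the baseline setting the abstract attributes to Ikeda and Imai. Each node is a tuple of prefix lengths, and the estimator sums the optimal pairwise suffix-alignment costs. Unit mismatch and gap costs with sum-of-pairs scoring (gap against gap is free) are our assumptions. The paper's k ≥ 3 estimators, V(Δ) bounding and recursive version are not implemented.

```python
import heapq
from itertools import combinations, product

def suffix_edit_table(a, b):
    """D[i][j] = unit-cost edit distance between a[i:] and b[j:]."""
    n, m = len(a), len(b)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i == n:
                D[i][j] = m - j
            elif j == m:
                D[i][j] = n - i
            else:
                D[i][j] = min(D[i + 1][j + 1] + (a[i] != b[j]),
                              D[i + 1][j] + 1,
                              D[i][j + 1] + 1)
    return D

def column_cost(chars):
    """Sum-of-pairs cost of one alignment column; None marks a gap."""
    cost = 0
    for x, y in combinations(chars, 2):
        if x is None and y is None:
            continue  # gap-gap pairs are free
        if x is None or y is None or x != y:
            cost += 1
    return cost

def astar_msa(seqs):
    """Optimal sum-of-pairs alignment cost via A* guided by the pairwise estimator."""
    k = len(seqs)
    tables = {(p, q): suffix_edit_table(seqs[p], seqs[q])
              for p, q in combinations(range(k), 2)}

    def h(node):
        # Each pair's remaining cost is at least its optimal pairwise suffix cost.
        return sum(t[node[p]][node[q]] for (p, q), t in tables.items())

    start, goal = (0,) * k, tuple(len(s) for s in seqs)
    g = {start: 0}
    heap = [(h(start), start)]
    closed = set()
    while heap:
        _, node = heapq.heappop(heap)
        if node == goal:
            return g[node]
        if node in closed:
            continue
        closed.add(node)
        # Each step advances a non-empty subset of the sequences by one residue.
        for step in product((0, 1), repeat=k):
            nxt = tuple(i + d for i, d in zip(node, step))
            if not any(step) or any(i > len(s) for i, s in zip(nxt, seqs)):
                continue
            column = [s[i] if d else None for s, i, d in zip(seqs, node, step)]
            cost = g[node] + column_cost(column)
            if cost < g.get(nxt, float("inf")):
                g[nxt] = cost
                heapq.heappush(heap, (cost + h(nxt), nxt))
    return None

print(astar_msa(["AC", "AC", "A"]))
```

The estimator never overestimates and is consistent, so the first time the goal node is popped its cost is optimal. The number of lattice nodes still grows as the product of the sequence lengths, and sharper estimators are aimed at pruning this growth.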

24 citations

Journal Article

Developing NLP Tools for Genome Informatics: An Information Extraction Perspective.

T Hishiki¹, Nigel Collier¹, Chikashi Nobata¹, T Okazaki-Ohta¹, Norihiro Ogata¹, T Sekimizu¹, R Steiner¹, Hyun Seok Park¹ ², Jun'ichi Tsujii¹ ³

University of Tokyo¹, Sungshin Women's University², University of Manchester³

01 Jan 1998-Genome Informatics

TL;DR: This paper explains some of the current efforts to develop various NLP-based tools for information extraction from genome-related on-line documents.

Abstract: Huge quantities of on-line medical texts such as Medline are available, and we would like to extract as much useful information from these resources as possible, ideally automatically, with the aid of computer technologies. In particular, recent advances in Natural Language Processing (NLP) techniques raise new challenges and opportunities for tackling genome-related on-line text; combining NLP techniques with genome informatics extends beyond the traditional realms of either technology to a variety of emerging applications. In this paper, we explain some of our current efforts to develop various NLP-based tools that tackle genome-related on-line documents for the information extraction task.

22 citations

Journal Article

Finding Genetic Network from Experiments by Weighted Network Model

Kiyoshi Noda¹, Ayumi Shinohara¹, Masayuki Takeda¹, Satoshi Matsumoto², Satoru Miyano³, Satoru Kuhara¹

Kyushu University¹, Tokai University², University of Tokyo³

01 Jan 1998-Genome Informatics

TL;DR: The authors show that if a weighted network consistent with the given data exists, it can be found in polynomial time, and that the corresponding optimization problem is NP-hard.

Abstract: We study the problem of finding a genetic network from data obtained by multiple gene disruptions and overexpressions. We define a genetic network as a weighted graph and analyze the computational complexity of the problem. We show that if there exists a weighted network that is consistent with given data, we can find it in polynomial time. Moreover, we also consider the optimization problem, where we try to find an optimally consistent weighted network for given data. We show that this problem is NP-hard. On the other hand, we give a polynomial-time approximation algorithm that solves it with approximation ratio 2. We report some simulation results on experiments.

Journal Article

Systematic Prediction of Orthologous Units of Genes in the Complete Genomes.

Hidemasa Bono¹, Susumu Goto¹, Wataru Fujibuchi¹, Hiroyuki Ogata¹, Minoru Kanehisa¹

Kyoto University¹

01 Jan 1998-Genome Informatics

TL;DR: The paper describes a genome-scale system for predicting gene and cellular functions and its application to the complete genome of Pyrococcus horikoshii to identify ABC transporters; the ortholog group tables it relies on are constructed for cases where the genes are clustered in physically close positions in the genome of at least one organism.

Abstract: To make full use of the vast amount of information in complete genome sequences, we are developing a genome-scale system for predicting gene functions and cellular functions. The system makes use of information on sequence similarity, information on positional correlations in the genome, and reference knowledge stored as the ortholog group tables in KEGG (Kyoto Encyclopedia of Genes and Genomes). An ortholog group table summarizes orthologous and paralogous relations among different organisms for a set of genes that are considered to form a functional unit, such as a conserved portion of a metabolic pathway or a molecular machinery for membrane transport. At present, the ortholog group tables are constructed for cases where the genes are clustered in physically close positions in the genome of at least one organism. In this paper, we describe the system and the actual analysis of the complete genome of Pyrococcus horikoshii to identify ABC transporters.

Journal Article

Gene Classification by Self-Organization Mapping of Codon Usage in Bacteria with Completely Sequenced Genome

Shigehiko Kanaya¹ ², Yoshihiro Kudo², Takashi Abe², Takanori Okazaki², Carlos A. Del Carpio³, Toshimichi Ikemura⁴

National Institute of Genetics¹, Yamagata University², Toyohashi University of Technology³, Graduate University for Advanced Studies⁴

01 Jan 1998-Genome Informatics

TL;DR: Genes in bacteria with completely sequenced genomes are classified by self-organization mapping of their codon usage.

Abstract: a95550@eie.yz.yamagata-u.ac.jp; carlos@translell.eco.tut.ac.jp; tikemura@ddbj.nig.ac.jp. (1) Department of Electrical Information Engineering, Faculty of Engineering, Yamagata University, Yonezawa, Yamagata 992-8510, Japan; (2) Department of Ecological Engineering, Faculty of Engineering, Toyohashi University of Technology, Toyohashi, Aichi 441-8580, Japan; (3) Department of Population Genetics, National Institute of Genetics, and the Graduate University for Advanced Studies, Mishima, Shizuoka, 441-8540, Japan; (4) Department of Developmental Genetics, National Institute of Genetics; (5) CREST, JST (Japan Science and Technology).
