
Showing papers in "Bioinformatics in 1985"


Journal ArticleDOI
Manolo Gouy, Christian Gautier, Marcella Attimonelli, Cecilia Lanave, G. di Paola
TL;DR: ACNUC is a database structure and retrieval software for use with either the GenBank or EMBL nucleic acid sequence data collections that allows sequence retrieval on a multi-criterion basis.
Abstract: ACNUC is a database structure and retrieval software for use with either the GenBank or EMBL nucleic acid sequence data collections. The nucleotide and textual data furnished by both collections are each restructured into a database that allows sequence retrieval on a multi-criterion basis. The main selection criteria are: species (or higher order taxon), keyword, reference, journal, author, and organelle; all logical combinations of these criteria can be used. Direct access to sequence regions that code for a specific product (protein, tRNA or rRNA) is provided. A versatile extraction procedure copies selected sequences, or fragments of them, from the database to user files suitable to be analysed by user-supplied application programs. A detailed help mechanism is provided to aid the user at any time during the retrieval session. All software has been written in FORTRAN 77 which guarantees a high degree of transportability to minicomputers or mainframes.

178 citations
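
The multi-criterion retrieval described in the ACNUC abstract can be pictured as set algebra over per-criterion indexes. The following is only a minimal sketch of that idea, not ACNUC's actual structure: the `Entry` record, the `build_index` helper and the three sample entries are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    # Hypothetical record; the real ACNUC database layout is not shown in the abstract.
    name: str
    species: str
    keywords: frozenset


def build_index(entries, key):
    """Map each criterion value to the set of entry names that carry it."""
    index = {}
    for entry in entries:
        values = key(entry)
        if isinstance(values, str):
            values = [values]
        for value in values:
            index.setdefault(value, set()).add(entry.name)
    return index


ENTRIES = [
    Entry("SEQ1", "Homo sapiens", frozenset({"globin"})),
    Entry("SEQ2", "Mus musculus", frozenset({"globin", "tRNA"})),
    Entry("SEQ3", "Homo sapiens", frozenset({"tRNA"})),
]

by_species = build_index(ENTRIES, lambda e: e.species)
by_keyword = build_index(ENTRIES, lambda e: e.keywords)

# species = Homo sapiens AND keyword = tRNA
human_trna = by_species["Homo sapiens"] & by_keyword["tRNA"]

# (keyword = globin OR keyword = tRNA) AND NOT species = Homo sapiens
non_human = (by_keyword["globin"] | by_keyword["tRNA"]) - by_species["Homo sapiens"]
```

Because every criterion resolves to a set of entry names, any logical combination reduces to intersection, union and difference.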


Journal ArticleDOI
TL;DR: The structure of the GenBank nucleic acid sequence database, the extent of the data and the implications of the database for research on nucleic acids are discussed.
Abstract: The GenBank nucleic acid sequence database is a computer-based collection of all published DNA and RNA sequences; it contains over five million bases in close to six thousand sequence entries drawn from four thousand five hundred published articles. Each sequence is accompanied by relevant biological annotation. The database is available either on magnetic tape, on floppy diskettes, on-line or in hardcopy form. We discuss the structure of the database, the extent of the data and the implications of the database for research on nucleic acids.

60 citations


Journal ArticleDOI
TL;DR: This column can serve as a vehicle to discuss the ramifications of these on-going changes in the scientific workplace and to elicit a sharing of information and perceptions so that these new workstations will result in the maximum possible benefit to science.
Abstract: As the readers of this new journal must be aware, the computer invasion of the laboratory workplace has finally begun in earnest as a consequence of the availability of small extremely powerful microcomputer workstations. True, over the past decade or two there has been a cadre of investigators who have attempted to blend the disparate disciplines of biological and computer sciences. However, the impact of computers on research in the biosciences has been uneven because of the specialized skills which have been required of investigators to master both disciplines. Now as a direct consequence of the availability of small, 'friendly' workstations, most people believe that the way we do science will be radically altered. We feel that it is important to discuss the impact of computers in the scientific workplace both in the form of regular scientific papers as well as more free-form discussions in this communications section. We hope that this column can serve as a vehicle to discuss the ramifications of these on-going changes and to elicit a sharing of information and perceptions so that these new workstations will result in the maximum possible benefit to science. We would particularly welcome letters and comments from readers regarding their experiences with computers. If the subject is appropriate the correspondent may also use this column as a means of directly communicating with the readership of CABIOS in future issues. For the sake of establishing a point of departure for the first column I have chosen to focus on information management in the academic bioscience laboratory. For the present discussion, I have construed 'information' to include all of the varieties which impact on the scientific workplace, these being data collection and processing, word or document processing, and financial and personnel record keeping. Each of these impinges to varying extents on the day-to-day conduct of our research.
While many industrial laboratories have begun to implement information management schemes with computer networks, it is rare to find academic departments outside of computer science and engineering which have ventured into this new arena. The factors which have worked to inhibit this transformation include cost and the perception that direct access to computers would not materially benefit the average scientist. I believe that the lessons learned during the

53 citations


Journal ArticleDOI
TL;DR: Three programs useful for the investigation of steady-state kinetics have been developed: two provide the solution to the steady-state rate equation, the first being a straightforward implementation of the rules developed by Chou, and the third provides a numeric solution for a specific mechanism.
Abstract: Three programs useful for the investigation of steady-state kinetics have been developed. Two provide the solution to the steady-state rate equation; the first of these is a straightforward implementation of the rules developed by Chou. The second is a very efficient procedure for evaluating King-Altman diagrams and can be used for quite large mechanisms. The third program provides the numeric solution for a specific mechanism and set of initial conditions; it is well suited to extremely large models.

39 citations


Journal ArticleDOI
TL;DR: FORTRAN 77 software is described allowing for convenient searching of any segmented and ambiguous pattern in the currently available protein or nucleotide sequence databanks, and can be instrumental in defining conserved functional domains among non-homologous overall primary structures.
Abstract: We describe FORTRAN 77 software allowing for convenient searching of any segmented and ambiguous pattern in the currently available protein or nucleotide sequence databanks. For proteins, this software can be instrumental in defining conserved functional domains among non-homologous overall primary structures. For nucleic acids, it is used in detecting complex and/or low consensus structural or regulatory patterns. As first applications we have studied the distribution of short consensus sequences believed to characterize heat-shock and glucocorticoid regulated promoters. This analysis allowed an evaluation of the specificity, probable role and thus biological significance of various regions of these consensus sequences. In addition, the expression of several known genes is predicted to be heat-shock or glucocorticoid sensitive.

23 citations
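
A "segmented and ambiguous" nucleotide pattern, as in the abstract above, can be read as several short motifs written with ambiguity codes and separated by gaps of variable length. The sketch below compiles such a pattern into a regular expression. It is an illustration in Python, not the FORTRAN 77 software itself; `compile_pattern`, `find_all` and the example motif are made up for the demonstration, and only the IUPAC ambiguity codes are standard.

```python
import re

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def compile_pattern(segments, gaps):
    """segments: IUPAC strings; gaps: (min, max) spacer length between neighbours."""
    if len(gaps) != len(segments) - 1:
        raise ValueError("need exactly one gap between each pair of segments")
    parts = []
    for i, segment in enumerate(segments):
        parts.append("".join(IUPAC[c] for c in segment.upper()))
        if i < len(gaps):
            low, high = gaps[i]
            parts.append("[ACGT]{%d,%d}" % (low, high))
    # A lookahead wrapper lets overlapping occurrences be reported.
    return re.compile("(?=(" + "".join(parts) + "))")


def find_all(pattern, sequence):
    """Return (start, matched_text) for every occurrence."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


# Hypothetical motif: TATA, then 2 to 4 arbitrary bases, then GGR.
motif = compile_pattern(["TATA", "GGR"], [(2, 4)])
hits = find_all(motif, "CCTATAACGGGATT")
```

The regex engine reports one spacer length per start position (the longest that works, since the quantifier is greedy); enumerating every admissible spacer would need an explicit loop.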


Journal ArticleDOI
TL;DR: Morphological characters together with the results of tests using API 20E and API 50CHB, read after 24 and 48 h incubation, are used to obtain a probabilistic identification of an unknown aerobic endospore forming rod.
Abstract: A microcomputer based system for the identification of unknown isolates of Bacillus species is described. The identification matrix includes 78 test probabilities for 38 recognised species and other groups in the genus Bacillus and it is based on the work of Logan and Berkeley (1984). Morphological characters together with the results of tests using API 20E and API 50CHB, read after 24 and 48 h incubation, are used to obtain a probabilistic identification of an unknown aerobic endospore forming rod. Any differences between the observed and expected results for any identified organism are listed. Identification can be attempted on the basis of a limited set of test results, although this is rarely if ever done with this largely API based system, and if the unknown cannot be successfully identified then a set of additional tests can be selected which should permit identification. The computer system can store and recall test results entered for any isolate. This feature allows the accumulation of data on isolates which could be used to update the identification matrix in future taxonomic studies.

13 citations
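
The abstract above does not spell out how the probabilistic identification is computed. A common approach for matrices of test probabilities is a normalised likelihood, and the sketch below assumes that approach. The two taxa, the test names and the probabilities are invented for illustration, as are the 0.85 and 0.15 cut-offs used to list disagreements.

```python
def identify(matrix, results):
    """matrix: taxon -> {test: P(positive)}; results: test -> observed bool.

    Returns taxon -> normalised likelihood. Tests absent from `results`
    are ignored, so a limited set of results can still be scored.
    """
    likelihoods = {}
    for taxon, probabilities in matrix.items():
        likelihood = 1.0
        for test, positive in results.items():
            p = probabilities[test]
            likelihood *= p if positive else (1.0 - p)
        likelihoods[taxon] = likelihood
    total = sum(likelihoods.values())
    return {taxon: value / total for taxon, value in likelihoods.items()}


def discrepancies(matrix, taxon, results, high=0.85, low=0.15):
    """List tests whose observed result contradicts the expected one."""
    listed = []
    for test, positive in results.items():
        p = matrix[taxon][test]
        if (p >= high and not positive) or (p <= low and positive):
            listed.append(test)
    return listed


# Invented two-taxon, three-test matrix.
MATRIX = {
    "taxon_A": {"test_1": 0.99, "test_2": 0.01, "test_3": 0.90},
    "taxon_B": {"test_1": 0.01, "test_2": 0.99, "test_3": 0.50},
}
observed = {"test_1": True, "test_2": False, "test_3": False}

scores = identify(MATRIX, observed)
best = max(scores, key=scores.get)
conflicts = discrepancies(MATRIX, best, observed)
```

Here the unknown scores highest against `taxon_A`, and `test_3` is listed as the one result that disagrees with what that taxon is expected to give.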


Journal ArticleDOI
TL;DR: The program VTUTIN makes full screen editing along the lines of modern text editors available for the complex data type of sets of sequence gels and their consensus.
Abstract: Large DNA sequences are now routinely sequenced by the cloning of randomly generated fragments into single-stranded DNA phage vectors (the 'shotgun' method). Various programs exist for computerized assembly of such fragments, including the phases of data entry, homology searching and gel-management/editing. Many gel-management editors are rudimentary in nature, using either line-editing techniques or using unnatural displays or command systems. Others are available only on restricted types of computer system. The program VTUTIN makes full screen editing along the lines of modern text editors available for the complex data type of sets of sequence gels and their consensus. Not only are the data displayed on the VDU screen in a natural manner, but VTUTIN has also been written to model the command system of a well-established text editor (PDP-11 KED or VAX/VMS EDT) to simplify editor use and learning. VTUTIN has been written in Pascal in a modular form so that wide-spread portability is facilitated. VTUTIN is currently implemented to work on VT-100 type terminals although the modularity of the code should allow straightforward conversion for other terminal types and should also permit simple alteration to model any other text editor.

13 citations


Journal ArticleDOI
TL;DR: An overview of the results of a survey of some of the group of biologists who are aware of the value of computing, and brief comments on the editorial policy that CABIOS will adopt in its formative stage are presented.
Abstract: For most life scientists, the application of computational techniques to their subject is relatively new and has been precipitated largely by the widespread incursion of microcomputers into their laboratories. About ten years ago a laboratory computer system offering reasonable memory and mass storage and a high level language cost several times more than a preparative ultracentrifuge. Today, a sophisticated microcomputer system, offering concurrent operating systems (allowing more than one task to proceed simultaneously), and sufficient core memory and mass storage to realise the benefits of such facilities can be purchased for a small fraction of the cost of an ultracentrifuge. Powerful computers that sit on the desks of biologists can be justified relatively easily on the basis of cost, but scientific reasons for such purchases are not so easy to discover. On average, life scientists are not yet as familiar with computing as they are, for example, with centrifugation. Although the scope of the former is more widespread it constitutes a small proportion of the training of the researcher or teacher. Clearly, any development that increases awareness in this area, including Computer Applications In The Biosciences (CABIOS), will have an important role to play in the future. CABIOS has been conceived as making an important contribution to the development of computing in the life sciences. As such, this journal must meet the needs of the readership and respond to suggestions to include topics or features that are perceived to be of value. In this context, the publishers initiated a survey of some of the group of biologists who are aware of the value of computing, and have used the information derived from this survey in establishing the form of the journal. Presented here is an overview of the results of the survey together with brief comments on the editorial policy that CABIOS will adopt in its formative stage.

13 citations


Journal ArticleDOI
TL;DR: The large amounts of sequence data now becoming available require that algorithms for database searching be fast and efficient and considerable progress is being made in this area.
Abstract: Nucleic acid and protein sequences contain a wealth of information of interest to molecular biologists. The advent of molecular sequence databases provides a unique opportunity for the computer analysis of all available sequences. Sequence databases serve two main functions: (i) to facilitate comparisons with newly determined sequences, and (ii) to act as a source of data for the generation and testing of hypotheses concerning molecular sequence organisation and evolution. The large amounts of sequence data now becoming available require that algorithms for database searching be fast and efficient and considerable progress is being made in this area.

11 citations


Journal ArticleDOI
TL;DR: A microcomputer program which locates tRNA genes within long DNA sequences is described, useful in finding inverted repeats allowing the formation of stem-loop secondary structures in tRNA.
Abstract: A microcomputer program which locates tRNA genes within long DNA sequences is described. The search is performed either by identifying tRNA-like secondary structures or by locating eukaryotic RNA polymerase III promoter consensus sequences. The program is also useful in finding inverted repeats allowing the formation of stem-loop secondary structures in tRNA. The program has been developed in BASIC and 6502 Assembler and runs on the Apple II plus and IIe microcomputers. The execution is quite fast; all the operations are carried out in 1-90 s, depending on the required task and on the sequence length.

7 citations
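
Finding inverted repeats that can fold into a stem-loop, one of the tasks mentioned in the abstract above, amounts to looking for a short segment whose reverse complement occurs a few bases downstream. The sketch below shows that idea in Python rather than the BASIC and 6502 Assembler of the original program. It considers Watson-Crick pairs only with an exact-match stem, and the function name and default sizes are assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(segment):
    return segment.translate(COMPLEMENT)[::-1]


def stem_loops(sequence, stem=4, min_loop=3, max_loop=8):
    """Return (left_start, right_start, left_arm, loop_length) for each hairpin."""
    sequence = sequence.upper()
    hits = []
    for i in range(len(sequence) - 2 * stem - min_loop + 1):
        left = sequence[i:i + stem]
        target = reverse_complement(left)
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop
            # A slice running past the end is shorter than `stem` and cannot match.
            if sequence[j:j + stem] == target:
                hits.append((i, j, left, loop))
    return hits


hairpins = stem_loops("GGGCAAAAGCCC")
```

A real tRNA search would also need G-U pairs, mismatch tolerance and the cloverleaf geometry, none of which this sketch attempts.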


Journal ArticleDOI
TL;DR: A new algorithm is described that will rapidly produce restriction maps of cloned DNA fragments that is based upon a permutation analysis and primarily designed for linear vectors, but can be used to calculate circular maps.
Abstract: A new algorithm is described that will rapidly produce restriction maps of cloned DNA fragments. Information concerning the vector is stored as a data file and used in constructing probable maps. As the program is based upon a permutation analysis it has two primary uses. First, preliminary restriction maps can be created from fragment length data as a starting point for further analysis. Second, existing maps can be confirmed as being highly probable, and other probable maps examined to ensure certain combinations have not been overlooked. Although primarily designed for linear vectors, the program can be used to calculate circular maps.
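
The abstract gives no detail of the permutation analysis, so the following is only one plausible reading: try every ordering of the single-digest fragments of two enzymes on a linear molecule and keep the orderings whose implied double digest matches the observed one. Fragment lengths are assumed exact (no measurement error), and all function names are hypothetical.

```python
from itertools import permutations


def cut_sites(fragments):
    """Positions of the internal cut sites when fragments are laid end to end."""
    sites, position = [], 0
    for length in fragments[:-1]:
        position += length
        sites.append(position)
    return sites


def double_digest(order_a, order_b):
    """Fragment lengths expected when both enzymes cut the same linear molecule."""
    total = sum(order_a)
    sites = sorted(set(cut_sites(order_a)) | set(cut_sites(order_b)))
    bounds = [0] + sites + [total]
    return sorted(end - start for start, end in zip(bounds, bounds[1:]))


def candidate_maps(fragments_a, fragments_b, fragments_ab):
    """All (order_a, order_b) pairs consistent with the double digest."""
    target = sorted(fragments_ab)
    maps = set()
    for order_a in set(permutations(fragments_a)):
        for order_b in set(permutations(fragments_b)):
            if double_digest(order_a, order_b) == target:
                maps.add((order_a, order_b))
    return sorted(maps)


# Enzyme A gives 3 + 7, enzyme B gives 4 + 6, both together give 3 + 3 + 4.
maps = candidate_maps([3, 7], [4, 6], [3, 3, 4])
```

The two maps returned are mirror images of each other, which is expected for a linear molecule with no fixed orientation. Exhaustive permutation grows factorially with the number of fragments, so this brute-force form only suits small digests.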

Journal ArticleDOI
TL;DR: The program described in this paper, SEQ-ED, has been designed to handle a large number of DNA sequences up to 200 kilobases [kb] long stored in a sequence library using three binary digits per base.
Abstract: The rapidly growing body of sequenced DNA demands efficient computer programs for its analysis and storage. The program described in this paper, SEQ-ED, has been designed to handle a large number of DNA sequences up to 200 kilobases [kb] long stored in a sequence library. In order to minimize the required storage space, the sequences are stored in a compressed format using three binary digits per base. In the development of this program, special care has been given to make it easy to use for molecular biologists without any previous computer experience.
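
Storing each base in three binary digits, as the SEQ-ED abstract describes, leaves room for eight symbols per position. The sketch below packs a sequence this way. The code table (A, C, G, T plus N) is an assumption, since the abstract does not list the actual codes, and using one large integer as the bit buffer is a convenience of the illustration.

```python
# Assumed 3-bit code table; three of the eight possible codes are left unused.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3, "N": 4}
BASES = {code: base for base, code in CODES.items()}


def pack(sequence):
    """Return (length, bytes) with each base stored in 3 bits."""
    bits = 0
    for base in sequence.upper():
        bits = (bits << 3) | CODES[base]
    n_bytes = (3 * len(sequence) + 7) // 8  # round up to whole bytes
    return len(sequence), bits.to_bytes(n_bytes, "big")


def unpack(length, data):
    """Inverse of pack(): rebuild the sequence from its packed bytes."""
    bits = int.from_bytes(data, "big")
    bases = []
    for i in range(length):
        shift = 3 * (length - 1 - i)
        bases.append(BASES[(bits >> shift) & 0b111])
    return "".join(bases)


length, packed = pack("GATTACAN")
restored = unpack(length, packed)
```

Eight bases occupy three bytes here instead of eight, a saving of 62.5% over one character per base.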

Journal ArticleDOI
TL;DR: A fast general purpose DNA handling program has been developed in BASIC and machine language that allows file insertion and editing, translation into protein sequence, reverse translation, search for small strings and restriction enzyme sites.
Abstract: A fast general purpose DNA handling program has been developed in BASIC and machine language. The program runs on the Apple II plus or on the Apple IIe microcomputer, without additional hardware except for disk drives and printer. The program allows file insertion and editing, translation into protein sequence, reverse translation, search for small strings and restriction enzyme sites. The homology may be shown either as a comparison of two sequences or through a matrix on screen. Two additional features are: (i) drawing restriction site maps on the printer; and (ii) simulating a gel electrophoresis of restriction fragments both on screen and on paper. All the operations are very fast. The more common tasks are carried out almost instantly; only more complex routines, like finding homology between large sequences or searching and sorting all the restriction sites in a long sequence require longer, but still quite acceptable, times (generally under 30 s).
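
Two of the operations listed in the abstract above, translation into protein and searching for restriction sites, are short enough to sketch. This is a Python illustration rather than the original BASIC and machine language program. It uses the standard genetic code and the well-known EcoRI recognition site GAATTC; the function names are made up.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product("TCAG", repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate from the first base; a trailing partial codon is ignored."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - len(dna) % 3, 3)
    )


def find_sites(dna, site):
    """Return every 0-based start position of `site`, overlaps included."""
    dna, site = dna.upper(), site.upper()
    positions, start = [], dna.find(site)
    while start != -1:
        positions.append(start)
        start = dna.find(site, start + 1)
    return positions


protein = translate("ATGGAATTCTAA")
ecori_sites = find_sites("ATGGAATTCTAA", "GAATTC")
```

Stop codons are written as `*` in the translated string.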

Journal ArticleDOI
TL;DR: Software for non-linear curve fitting has been written in BASIC to execute on the British Broadcasting Corporation Microcomputer using the direct search algorithm Pattern-search, a robust algorithm that has the additional advantage of needing specification of the function without inclusion of the partial derivatives.
Abstract: Software for non-linear curve fitting has been written in BASIC to execute on the British Broadcasting Corporation Microcomputer. The program uses the direct search algorithm Pattern-search, a robust algorithm that has the additional advantage of needing specification of the function without inclusion of the partial derivatives. Although less efficient than gradient methods, the program can be readily configured to solve low-dimensional optimization problems that are normally encountered in life sciences. In writing the software, emphasis has been placed upon the 'user interface' and making the most efficient use of the facilities provided by the minimal configuration of this system.
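
A direct search of the kind named in the abstract above needs only function values, never partial derivatives. The sketch below is a compact Hooke and Jeeves style pattern search in Python, not the BBC BASIC program itself. The step sizes, the shrink factor and the exponential-decay example are assumptions chosen for illustration.

```python
import math


def pattern_search(f, x0, step=0.5, shrink=0.5, tol=1e-8, max_iter=10000):
    """Minimise f(params) by exploratory and pattern moves."""

    def explore(start, f_start, h):
        # Try +h then -h on each coordinate, keeping any improvement.
        x, fx = list(start), f_start
        for i in range(len(x)):
            for delta in (h, -h):
                trial = list(x)
                trial[i] += delta
                f_trial = f(trial)
                if f_trial < fx:
                    x, fx = trial, f_trial
                    break
        return x, fx

    base, f_base = list(x0), f(list(x0))
    for _ in range(max_iter):
        if step < tol:
            break
        x, fx = explore(base, f_base, step)
        if fx < f_base:
            # Pattern moves: extrapolate along the direction that just improved.
            while fx < f_base:
                pattern = [2 * xi - bi for xi, bi in zip(x, base)]
                base, f_base = x, fx
                x, fx = explore(pattern, f(pattern), step)
        else:
            step *= shrink  # no improvement at this scale, so refine the mesh
    return base, f_base


# Synthetic, noise-free data from y = 2 * exp(-0.5 * t).
TIMES = [0, 1, 2, 3, 4]
OBSERVED = [2.0 * math.exp(-0.5 * t) for t in TIMES]


def sum_of_squares(params):
    a, k = params
    return sum((a * math.exp(-k * t) - y) ** 2 for t, y in zip(TIMES, OBSERVED))


best, residual = pattern_search(sum_of_squares, [1.0, 1.0])
```

Only `sum_of_squares` has to change to fit a different model, which is the practical appeal of a derivative-free method for low-dimensional problems.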

Journal ArticleDOI
TL;DR: A new computer search strategy has been devised for high-resolution nucleotide sequence analysis that is exhaustive and capable of detecting all possible homologies and other types of relationships between or within sequences irrespective of the pattern of matches and mismatches encountered.
Abstract: A new computer search strategy has been devised for high-resolution nucleotide sequence analysis. The strategy differs from those used by earlier sequence analysing programs in that it is exhaustive and capable of detecting all possible homologies and other types of relationships between or within sequences irrespective of the pattern of matches and mismatches encountered. The implementation of this strategy into a working algorithm is described.

Journal ArticleDOI
TL;DR: An algorithm and a program have been developed which enable optimal alignments of biological sequences on an 8-bit microcomputer.
Abstract: An algorithm and a program have been developed which enable optimal alignments of biological sequences on an 8-bit microcomputer. The compiled program can process sequences up to 1000 residues on a Commodore 64. Since this program was written originally in the BASIC language, it may readily be adapted to other microcomputers with small changes.
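
The abstract does not say which alignment algorithm was implemented. As a reference point, the textbook dynamic-programming approach to optimal global alignment (Needleman and Wunsch) is sketched below, with assumed scores of +1 for a match, -1 for a mismatch and -2 per gap position. It keeps the full score matrix in memory, which the original program may well have avoided.

```python
def align(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for an optimal global alignment."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the matrix: each cell is the best of diagonal, up and left moves.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


result = align("ACGT", "AGT")
```

Time and memory both grow with the product of the two sequence lengths in this form.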

Journal ArticleDOI
TL;DR: Electronic spreadsheets computerise the traditional layout of any tabulation or complex calculation done with pencil, paper and calculator and have great potential in aiding routine calculations which might be done by these means or with a small BASIC computer program.
Abstract: Electronic spreadsheets computerise the traditional layout of any tabulation or complex calculation done with pencil, paper and calculator. They therefore have great potential in aiding routine calculations which might be done by these means or with a small BASIC computer program. Their simple structure and strong affinity with traditional methods make them particularly suitable for those who have not yet mastered the art of programming. However, a necessarily brief review of their application to science and technology demonstrates that this potential is not being realised in comparison with their wide-spread usage in the business world. The application of both Multiplan and Visicalc running respectively on the Macintosh and the Apple IIe microcomputers in four types of calculation is demonstrated: tabulation, curve-fitting and statistics, simulation, and numerical approximation. Advantages are found in the concurrent display of data and results, the ease of correction or modification of data and the escape from traditional linear programming methods. The spreadsheet format imposes its own constraints. It is not so flexible as BASIC, it demands more memory and may have a slower execution time than a program written in a high-level language, and it is more difficult to produce graphical output.

Journal ArticleDOI
Kurt Stüber
TL;DR: Layout procedures were developed to display the homology and repeat matrices of a sequence, to predict and display the secondary structure of RNA/DNA molecules free of overlap, and to predict and display internal repeats.
Abstract: Several interactive Pascal programs have been written for the analysis and display of structural information in nucleic acid sequences. Layout procedures were developed to display the homology and repeat matrices of a sequence and to predict and display the secondary structure of RNA/DNA molecules free of overlap and to predict and display internal repeats. No special plotting devices are required because the output is adapted to line printers. Sequences from several DNA database systems can be used as input. These programs are part of a general nucleic acid sequence analysis package.
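
A homology matrix rendered for a character device such as a line printer can be built as rows of text, one character per window comparison. The sketch below shows that idea in Python rather than Pascal. The window size, the match threshold and the `*` and `.` symbols are assumptions, not the package's actual layout.

```python
def dot_matrix(a, b, window=3, min_matches=3):
    """Return text rows; '*' marks windows of a and b that match well enough."""
    rows = []
    for i in range(len(a) - window + 1):
        row = []
        for j in range(len(b) - window + 1):
            matches = sum(a[i + k] == b[j + k] for k in range(window))
            row.append("*" if matches >= min_matches else ".")
        rows.append("".join(row))
    return rows


plot = dot_matrix("ACGT", "ACGT", window=2, min_matches=2)
# print("\n".join(plot)) would send the matrix to any character device.
```

Comparing a sequence with itself in this way puts the identity on the main diagonal, and internal repeats show up as additional diagonal runs.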

Journal ArticleDOI
TL;DR: The core of a 6502 machine language program for DNA sequence analysis on Apple II microcomputer is described, using a binary coding of nucleotides for interactive data manipulation on a low-cost configuration with execution times similar to those of larger computers.
Abstract: The core of a 6502 machine language program for DNA sequence analysis on Apple II microcomputer is described. Use of a binary coding of nucleotides allows interactive data manipulation on a low-cost configuration with execution times similar to those of larger computers. The PEGASE system should prove useful and easy to use in routine sequence handling and experiment design.

Journal ArticleDOI
TL;DR: An algorithm for prediction of RNA secondary structures that computes location and free energy of every possible stem-loop structure and lists the positions and free energies of all the stem-loops in the order of their probability sizes is presented.
Abstract: We present an algorithm for prediction of RNA secondary structures. The program consists of three parts: the first computes location and free energy of every possible stem-loop structure, the second computes probability of its formation, and the third lists the positions and free energies of all the stem-loops in order of their probability. The circular RNA molecule of chrysanthemum stunt viroid was used as input data for demonstrating the operation of the program.

Journal ArticleDOI
TL;DR: An inexpensive yet versatile microcomputer-based system for quantitating light intensity levels in autoradiographs using a standard video camera interfaced to an analog-to-digital convertor, permitting an easy and accurate quantitation of spots or bands of irregular shape.
Abstract: We have developed an inexpensive yet versatile microcomputer-based system for quantitating light intensity levels in autoradiographs. This system employs a standard video camera interfaced to an analog-to-digital convertor. A program has been written for this system which can measure intensities within a defined region of an autoradiograph, permitting an easy and accurate quantitation of spots or bands of irregular shape.
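
Measuring an irregularly shaped spot in a digitised image comes down to choosing the pixels that belong to it and summing their values. The sketch below assumes the image is already a grid of grey levels. Region growing from a seed pixel and the flat background subtraction are assumptions made for illustration, and the abstract does not say how the original program defines its regions.

```python
def grow_region(image, seed, threshold):
    """Collect the 4-connected pixels at or above `threshold`, starting at `seed`."""
    rows, cols = len(image), len(image[0])
    region, stack = set(), [seed]
    while stack:
        r, c = stack.pop()
        if (r, c) in region or not (0 <= r < rows and 0 <= c < cols):
            continue
        if image[r][c] < threshold:
            continue
        region.add((r, c))
        stack.extend([(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)])
    return region


def integrated_intensity(image, region, background=0):
    """Sum of background-corrected grey levels over the region."""
    return sum(image[r][c] - background for r, c in region)


# Invented 4 x 4 digitised image with one L-shaped spot.
IMAGE = [
    [10, 10, 10, 10],
    [10, 80, 90, 10],
    [10, 10, 85, 10],
    [10, 10, 10, 10],
]

spot = grow_region(IMAGE, seed=(1, 1), threshold=50)
signal = integrated_intensity(IMAGE, spot, background=10)
```

Because the region is a set of pixel coordinates rather than a rectangle, bands and spots of any outline are handled the same way.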

Journal ArticleDOI
TL;DR: Eleven published programs for performing nonlinear regression using a microcomputer have been reviewed and guidance is given as to which might be the most suitable for a particular purpose or for a given microcomputer system.
Abstract: Eleven published programs for performing nonlinear regression using a microcomputer have been reviewed. They have been assessed according to many criteria, especially: application, program language, algorithm used, method for calculating partial differentials, facility for weighting, desirable input and output features, robustness during execution, memory requirements, accessibility, ease of implementation and evaluation and testing. No one program contains all the desirable characteristics discussed, but guidance is given as to which might be the most suitable for a particular purpose or for a given microcomputer system.

Journal ArticleDOI
TL;DR: An algorithm is presented that takes advantage of the high degree of homology between such sequences to construct an alignment of the matching regions, and it can overcome large gaps or mismatch zones that correspond for instance to misinterpretation of compressions on sequence gels.
Abstract: During the course of determining the sequence of a large DNA fragment, it is necessary to cross-check numerous gel readings from different DNA fragments, in order to track and eliminate mistakes. An algorithm is presented that takes advantage of the high degree of homology between such sequences to construct an alignment of the matching regions. It requires neither knowledge of a starting homology zone nor large memory areas, even for sequences of several kilobases, and it can overcome large gaps or mismatch zones that correspond, for instance, to misinterpretation of compressions on sequence gels. This algorithm has been implemented in 6502 assembly language on an Apple II computer as an extension to the PEGASE sequence handling system.

Journal ArticleDOI
TL;DR: The schemes and algorithms, developed using BASICA on an IBM-Personal Computer, which are described in this article can serve other investigators as models for the assembly of their own programs for the collection, manipulation and plotting of time-based data.
Abstract: With the advent of increasingly integrated, powerful and inexpensive digital electronics, relatively powerful computers have become available to the general public. Along with this technological boom there has been a concomitant increase in the availability of over-the-counter software packages which can be used by research scientists for program development. In the past, the development of computer programs for the collection of large amounts of time-based data was expensive and time consuming; however, the introduction of the current generation of 16-bit microcomputers and associated hardware and software packages has enabled investigators with only a rudimentary knowledge of computers and interfacing to begin to design programs. The schemes and algorithms, developed using BASICA on an IBM-Personal Computer, which are described in this article can serve other investigators as models for the assembly of their own programs for the collection, manipulation and plotting of time-based data. The incorporation of inexpensive computer graphics hardware and software, which provided a simple solution to the problem of analysis and presentation of large amounts of data, will also be discussed.

Journal ArticleDOI
TL;DR: A new method of access has been devised for biologists requiring the use of computer programs offering high-resolution analysis and comparison of nucleotide sequence data through the development of a pair of programs, called SEQANAL and SEQTALK, designed to operate in tandem.
Abstract: A new method of access has been devised for biologists requiring the use of computer programs offering high-resolution analysis and comparison of nucleotide sequence data. The strategy involves the development of a pair of computer programs, called SEQANAL and SEQTALK, designed to operate in tandem. SEQANAL is a large and complex program intended to be used to discover regions of internal repeats and dyad symmetries within one sequence, or regions of homology, complementarity or optimal alignment between two sequences. Three algorithms are supported: those of Staden (1977, 1978); of Korn et al. (1977) and Queen and Korn (1980); and the newly-described exhaustive tree-searching algorithm of Burnett et al. (1985, 1986). The SEQTALK program is a small, portable, interactive, front-end program with which the user can specify the instructions to control the SEQANAL program. Together, the SEQANAL and SEQTALK programs permit analyses to be performed at a remote facility on a mainframe computer under the complete control of a distant user equipped with minimal computing facilities, and without needing networking facilities.