Showing papers in "Nucleic Acids Research in 1986"


Journal ArticleDOI
TL;DR: A new method for identifying secretory signal sequences and for predicting the site of cleavage between a signal sequence and the mature exported protein is described.
Abstract: A new method for identifying secretory signal sequences and for predicting the site of cleavage between a signal sequence and the mature exported protein is described. The predictive accuracy is estimated to be around 75-80% for both prokaryotic and eukaryotic proteins.

4,517 citations


Journal ArticleDOI
TL;DR: Codon usage in the highly expressed group shows a higher correlation with tRNA abundance, a greater degree of third base pyrimidine bias, and a lesser tendency to the A+T richness which is characteristic of the yeast genome.
Abstract: Codon usage data has been compiled for 110 yeast genes. Cluster analysis on relative synonymous codon usage revealed two distinct groups of genes. One group corresponds to highly expressed genes, and has much more extreme synonymous codon preference. The pattern of codon usage observed is consistent with that expected if a need to match abundant tRNAs, and intermediacy of tRNA-mRNA interaction energies are important selective constraints. Thus codon usage in the highly expressed group shows a higher correlation with tRNA abundance, a greater degree of third base pyrimidine bias, and a lesser tendency to the A+T richness which is characteristic of the yeast genome. The cluster analysis can be used to predict the likely level of gene expression of any gene, and identifies the pattern of codon usage likely to yield optimal gene expression in yeast.

1,051 citations
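
The relative synonymous codon usage (RSCU) measure mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the usual definition (observed codon count divided by the mean count across that amino acid's synonymous codons). The two-family codon table and the toy reading frame are hypothetical and are not data from the paper.

```python
from collections import Counter

# Synonymous codon families for two amino acids only (standard genetic code).
# Assumption: a real analysis would cover every degenerate amino acid.
SYNONYMS = {
    "Lys": ["AAA", "AAG"],
    "Phe": ["TTT", "TTC"],
}


def rscu(coding_sequence):
    """Return RSCU per codon: observed count / mean count within its family."""
    codons = [coding_sequence[i:i + 3]
              for i in range(0, len(coding_sequence) - 2, 3)]
    counts = Counter(codons)
    values = {}
    for family in SYNONYMS.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from the sequence; RSCU undefined
        expected = total / len(family)  # count expected under equal usage
        for codon in family:
            values[codon] = counts[codon] / expected
    return values


# Hypothetical toy reading frame: 3 x AAG, 1 x AAA, 2 x TTC, 2 x TTT.
toy = "AAGAAGAAGAAATTCTTCTTTTTT"
print(rscu(toy))
```

An RSCU above 1 marks a codon used more often than expected under equal usage of synonyms. Per-gene RSCU vectors like these would be the natural input to a cluster analysis of the kind the abstract describes.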


Journal ArticleDOI
TL;DR: The set of 452 different sequences comprises all sequences that to the authors' knowledge had been published or were available from the sequence library file servers as of December 1, 1990, and that are either complete or cover a minimum of about 70% of the complete sequence.
Abstract: Table 1 lists data on 455 small ribosomal subunit RNA (further abbreviated as srRNA) sequences (references 1-452) that have been published or submitted to the EMBL or GenBank nucleotide sequence libraries and that are presently stored in aligned form in our data base. The number identifying each sequence in the first column of Table 1 corresponds with the literature reference. If two or more closely related species share the same sequence, they bear the same number, followed by a different lower case character, and the common sequence is listed only once in our alignment. The set of 452 different sequences consists of 97 eukaryotic cytoplasmic, 19 archaebacterial, 276 eubacterial, 16 plastidial, and 44 mitochondrial srRNAs. It comprises all sequences that to our knowledge had been published or were available from the sequence library file servers as of December 1, 1990, and that are either complete or cover a minimum of about 70% of the complete sequence. Partial sequences are included because some of the methods now frequently used for srRNA sequencing preclude the determination of the structure at one or both of the termini. One such method consists of reverse transcription of the srRNA by means of primers complementary to conserved areas in the primary structure (453). In this case the 3'-terminal sequence cannot be found. Another approach (55, 328) involves amplification of the rDNA by means of the polymerase chain reaction (454), using primers binding to conserved areas near the termini, but within the sequence coding for the mature small subunit RNA. In this case both terminal sequences remain unknown. Both methods allow a continuous sequence covering more than 95% of the structure to be established, provided that a sufficient number of primers complementary to internal conserved areas is used. Some authors (e.g. 455), however, use a more limited set of primers and publish sequences that are not only partial but also discontinuous.

983 citations



Journal ArticleDOI
TL;DR: The general nature of this methodology has been further shown to be applicable to other restriction enzymes such as Hind II, Pst I and Fsp I and the mutational frequency obtained using these enzymes is between 40-80% mainly because of less efficient nicking and gapping.
Abstract: M13 RF IV DNA where phosphorothioate groups are incorporated at restriction endonuclease Nci I recognition sites in the (-)-strand is efficiently nicked by the action of this enzyme. Incubation of such nicked DNA with exonuclease III produces gapped DNA. The gap can be filled by reaction with deoxynucleoside triphosphates and DNA polymerase I. When this sequence of reactions is performed with DNA containing a mismatch oligonucleotide primer in the (-)-strand, mutational frequencies of 70-90% can be obtained upon transformation. The general nature of this methodology has been further shown to be applicable to other restriction enzymes such as Hind II, Pst I and Fsp I. The mutational frequency obtained using these enzymes is between 40-80% mainly because of less efficient nicking and gapping. Studies on inhibition of Nci I cleavage show that in addition to a phosphorothioate group at the position of cleavage an additional group in the 5'-neighbouring position is necessary for complete inhibition.

723 citations


Journal ArticleDOI
TL;DR: The dideoxy chain termination method using deoxy-7-deazaguanosine triphosphate (dc7GTP) in place of dGTP was found to be very useful, and use of dc7GTP is concluded to improve the dideoxy chain termination method for DNA sequencing.
Abstract: The dideoxy chain termination method using deoxy-7-deazaguanosine triphosphate (dc7GTP) in place of dGTP was found to be very useful. Sequencing of a part of the human N-myc gene having 85% GC content is impossible by the original method using dGTP, because of compression of bands. However, the nucleotide sequence of this part was unambiguously determined by analysis of both strands by the modified method. Use of dc7GTP is concluded to improve the dideoxy chain termination method for DNA sequencing.

692 citations


Journal ArticleDOI
TL;DR: It is shown that seven sequenced sigma factors comprise a homologous family of proteins that each have two copies of a sequence similar to the helix-turn-helix DNA binding motif seen in CRP, and lambda repressor and cro proteins.
Abstract: We show, using dot matrix comparisons and statistical analysis of sequence alignments, that seven sequenced sigma factors, E. coli sigma-70 and sigma-32, B. subtilis sigma-43 and sigma-29, phage SP01 gene products 28 and 34, and phage T4 gene product 55, comprise a homologous family of proteins. Sigma-70, sigma-32, and sigma-43 each have two copies of a sequence similar to the helix-turn-helix DNA binding motif seen in CRP, and lambda repressor and cro proteins. B. subtilis sigma-29, SP01 gp28, and SP01 gp34 have at least one copy similar to this sequence. We propose that a second sequence, conserved in all seven proteins, is the core RNA polymerase binding site. A third region, present only in sigma-70 and sigma-43, may also be involved in interaction with core. Available mutational evidence supports our model for sigma factor structure.

555 citations


Journal ArticleDOI
TL;DR: Self-cleavage of both plus and minus RNA transcripts of the 247-residue avocado sunblotch viroid (ASBV), prepared from tandem dimeric cDNA clones, occurs specifically at two sites in each transcript to give monomeric plus and minus species.
Abstract: Self-cleavage of both plus and minus RNA transcripts of the 247-residue avocado sunblotch viroid (ASBV), prepared from tandem dimeric cDNA clones, occurs specifically at two sites in each transcript to give monomeric plus and minus species. The cleavage reaction occurs both during transcription and on incubation of purified transcripts at pH 8 and 37 degrees C in the presence of magnesium ions to give a 3'-terminal 2',3'-cyclic phosphate and a 5'-terminal hydroxyl group. Although the self-cleavage occurs at different sites in the ASBV molecule for the plus and minus species, very similar secondary structures with high sequence homology can be drawn at each site. The results are considered to provide further evidence that ASBV is replicated in vivo by a rolling circle mechanism involving non-enzymic cleavage of high molecular weight RNA precursors of ASBV.

530 citations



Journal ArticleDOI
TL;DR: It is concluded that the pattern of synonymous codon usage in regulatory genes reflects primarily the relaxation of natural selection.
Abstract: It has often been suggested that differential usage of codons recognized by rare tRNA species, i.e. "rare codons", represents an evolutionary strategy to modulate gene expression. In particular, regulatory genes are reported to have an extraordinarily high frequency of rare codons. From E. coli we have compiled codon usage data for highly expressed genes, moderately/lowly expressed genes, and regulatory genes. We have identified a clear and general trend in codon usage bias, from the very high bias seen in very highly expressed genes and attributed to selection, to a rather low bias in other genes which seems to be more influenced by mutation than by selection. There is no clear tendency for an increased frequency of rare codons in the regulatory genes, compared to a large group of other moderately/lowly expressed genes with low codon bias. From this, as well as a consideration of evolutionary rates of regulatory genes, and of experimental data on translation rates, we conclude that the pattern of synonymous codon usage in regulatory genes reflects primarily the relaxation of natural selection.

505 citations


Journal ArticleDOI
TL;DR: An improved synthesis of protected deoxynucleoside H-phosphonates is described, which is used in the chemical synthesis of deoxyoligonucleotides up to 107 bases in length.
Abstract: Deoxynucleoside H-phosphonates are used in the chemical synthesis of deoxyoligonucleotides up to 107 bases in length. The biological activity of the synthetic DNA is assessed by cloning into M13 and sequencing. An improved synthesis of protected deoxynucleoside H-phosphonates is also described.

Journal ArticleDOI
TL;DR: Short synthetic oligonucleotides have been covalently cross-linked to alkaline phosphatase using the homobifunctional reagent disuccinimidyl suberate to produce oligomer-enzyme conjugates that can hybridize to target DNA fixed to nitrocellulose within 15 minutes.
Abstract: Short synthetic oligonucleotides have been covalently cross-linked to alkaline phosphatase using the homobifunctional reagent disuccinimidyl suberate. The oligomers, twenty-one to twenty-six bases in length, are complementary to unique sequences found in herpes simplex virus, hepatitis B virus, Campylobacter jejuni and enterotoxigenic Escherichia coli. Each oligomer contains a single modified base with a 12-atom "linker arm" terminating in a reactive primary amine. Cross-linking through this amine results in oligomer-enzyme conjugates composed of one oligomer per enzyme molecule that have full alkaline phosphatase activity and can hybridize to target DNA fixed to nitrocellulose within 15 minutes. The hybrids are detected directly with a dye precipitation assay at a sensitivity of 10^6 molecules (2 x 10^-18 mol) of target DNA in 4 hours development time. The enzyme has no apparent effect on selectivity or kinetics of oligonucleotide hybridization and the conjugates can be hybridized and melted off in a conventional manner.
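
The detection limit quoted in the abstract above pairs a molecule count with a molar amount. As a quick consistency check (a sketch only, using the standard Avogadro constant, which the abstract does not state), one million molecules can be converted to moles as follows.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (SI defined value)


def molecules_to_moles(n_molecules):
    """Convert a number of molecules to an amount of substance in moles."""
    return n_molecules / AVOGADRO


moles = molecules_to_moles(1e6)
print(f"{moles:.2e} mol")  # roughly 1.7e-18 mol
```

To one significant figure the result agrees with the 2 x 10^-18 mol given in the abstract.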

Journal ArticleDOI
TL;DR: The GenBank Genetic Sequence Data Bank contains over 5700 entries for DNA and RNA sequences that have been reported since 1967; the paper briefly describes its contents, the forms in which it is distributed, and the services offered to its users.
Abstract: The GenBank Genetic Sequence Data Bank contains over 5700 entries for DNA and RNA sequences that have been reported since 1967. This paper briefly describes the contents of the database, the forms in which the database is distributed, and the services we offer to scientists who use the GenBank database.

Journal ArticleDOI
TL;DR: Homology between spacer regions is required for efficient recombination between loxP sites, and there is at least one position within the spacer where a base change drastically reduces recombination even when the two recombining loxP sites are homologous.
Abstract: The lox-Cre site-specific recombination system of bacteriophage P1 is comprised of a site on the DNA where recombination occurs called loxP, and a protein, Cre, which mediates the reaction. The loxP site is 34 base pairs (bp) in length and consists of two 13 bp inverted repeats separated by an 8 bp spacer region. Previously it has been shown that the cleavage and strand exchange of recombining loxP sites occurs within this spacer region. We report here an analysis of various base substitution mutations within the spacer region of loxP, and conclude the following: Homology is a requirement for efficient recombination between recombining loxP sites. There is at least one position within the spacer where a base change drastically reduces recombination even when there is homology between the two recombining loxP sites. When two loxP sites containing symmetric spacer regions undergo Cre-mediated recombination in vitro, the DNA between the sites undergoes both excision and inversion with equal frequency.
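
The 34 bp layout described in the abstract above (two 13 bp inverted repeats around an 8 bp spacer) can be checked mechanically. The sketch below assumes the commonly cited wild-type loxP sequence, which is not given in the abstract and should be treated as an assumption; the symmetric spacer at the end is a hypothetical example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def split_lox_site(site, arm=13, spacer=8):
    """Split a lox site into (left arm, spacer, right arm)."""
    if len(site) != 2 * arm + spacer:
        raise ValueError(f"expected {2 * arm + spacer} bp, got {len(site)}")
    return site[:arm], site[arm:arm + spacer], site[arm + spacer:]


def is_symmetric(spacer):
    """A spacer is symmetric when it equals its own reverse complement."""
    return spacer == reverse_complement(spacer)


# Assumption: commonly cited wild-type loxP sequence, not taken from the abstract.
LOXP = "ATAACTTCGTATA" + "GCATACAT" + "TATACGAAGTTAT"

left, spacer, right = split_lox_site(LOXP)
print(len(LOXP), right == reverse_complement(left), is_symmetric(spacer))
print(is_symmetric("ATGCGCAT"))  # hypothetical symmetric spacer
```

Under this assumed sequence the arms are inverted repeats of each other and the spacer is asymmetric, so the site has an orientation. A symmetric spacer removes that orientation, which fits the abstract's report that such sites undergo excision and inversion with equal frequency.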

Journal ArticleDOI
TL;DR: The splice junction consensus sequences were virtually identical to those of animal introns except that the polypyrimidine stretch at the 3' splice junction was less pronounced in the plant introns.
Abstract: Splice junction and possible branch point sequences have been collected from 177 plant introns. Consensus sequences for the 5' and 3' splice junctions and for possible branch points have been derived. The splice junction consensus sequences were virtually identical to those of animal introns except that the polypyrimidine stretch at the 3' splice junction was less pronounced in the plant introns. A search for possible branch points with sequences related to the yeast, vertebrate and fungal consensus sequences revealed a similar sequence in plant introns.

Journal ArticleDOI
TL;DR: The complete DNA sequence of the short repeat region in the genome of herpes simplex virus type 1 is reported as 6633 base pairs of 79.5% G+C; the IE175 coding region it contains, at 81.5% G+C, has the most extreme base composition yet determined.
Abstract: We report the complete DNA sequence of the short repeat region in the genome of herpes simplex virus type 1, as 6633 base pairs of composition 79.5% G+C. This contains immediate early gene 3, encoding the IE175 protein, an important transcriptional activator of later virus genes. The IE175 coding region was identified as a 3894 base sequence of 81.5% G+C DNA. The base composition of this gene is thus the most extreme yet determined, and the IE175 predicted amino acid composition is correspondingly biased, most notably with an alanine content of 20.9%. Functionally important regions of the IE175 polypeptide were tentatively identified by comparison with the sequence of the homologous protein from varicella-zoster virus and from locations of ts mutations, and were correlated with properties of the amino acid sequence. Aspects of the evolution of such an extreme composition DNA sequence were discussed.

Journal ArticleDOI
TL;DR: The Protein Identification Resource, which provides the scientific community with an efficient on-line computer system designed for the identification and analysis of protein sequences and their corresponding coding sequences, has been established.
Abstract: The Protein Identification Resource, which provides the scientific community with an efficient on-line computer system designed for the identification and analysis of protein sequences and their corresponding coding sequences, has been established. The resource consists of an integrated computer system composed of a number of protein and nucleic acid sequence databases and the software necessary to analyze this information effectively.

Journal ArticleDOI
TL;DR: The whole package of sequence analysis software contains a comprehensive suite of programs for managing large shotgun sequencing projects, a program containing 61 functions for analysing single sequences and a program for comparing pairs of sequences for similarity.
Abstract: I describe the current status of our sequence analysis software. The package contains a comprehensive suite of programs for managing large shotgun sequencing projects, a program containing 61 functions for analysing single sequences and a program for comparing pairs of sequences for similarity. The programs that have been described before have been improved by the addition of new functions and by being made very much easier to use. The major interactive programs have 125 pages of online help available from within them. Several new programs are described, including screen editing of aligned gel readings for shotgun sequencing projects, a method to highlight errors in aligned gel readings, and new methods for searching for putative signals in sequences. We use the programs on a VAX computer but the whole package has been rewritten to make it easy to transport it to other machines. I believe the programs will now run on any machine with a FORTRAN77 compiler and sufficient memory. We are currently putting the programs onto an IBM PC XT/AT and another micro running under UNIX.

Journal ArticleDOI
TL;DR: Complementary DNA clones encoding the human kidney epidermal growth factor (EGF) precursor have been isolated and sequenced and it is shown that it can be synthesized as a membrane protein with its NH2-terminus external to the cell surface.
Abstract: Complementary DNA clones encoding the human kidney epidermal growth factor (EGF) precursor have been isolated and sequenced. They predict the sequence of a 1,207 amino acid protein which contains EGF flanked by polypeptide segments of 970 and 184 residues at its NH2- and COOH-termini, respectively. The structural organization of the human EGF precursor is similar to that previously described for the mouse protein and there is 66% identity between the two sequences. Transfection of COS-7 cells with the human EGF precursor cDNA linked to the SV40 early promoter indicates that it can be synthesized as a membrane protein with its NH2-terminus external to the cell surface. The human EGF precursor gene is approximately 110 kilobase pairs and has 24 exons. Its exon-intron organization revealed that various domains of the EGF precursor are encoded by individual exons. Moreover, 15 of the 24 exons encode protein segments that are homologous to sequences in other proteins. Exon duplication and shuffling appear to have played an important role in determining the present structure of this protein.

Journal ArticleDOI
TL;DR: Application of this technique to E. coli promoters as a control ensemble revealed the well known consensus sequences at -35 and -10 which indicates that the methods are adequate to approach problems of this kind.
Abstract: A representative set of 168 eukaryotic POL II promoters has been compiled from the EMBL library and subjected to computer signal search analysis. Application of this technique to E. coli promoters as a control ensemble revealed the well known consensus sequences at -35 and -10 which indicates that the methods are adequate to approach problems of this kind. The results obtained from the eukaryotic promoter set can be summarized as follows: Common sequence features are confined to a region between -50 and +10 relative to the transcriptional initiation site. The only well conserved consensus sequence is TATAAA, centered at -28. A weak motif, CA followed preferentially by pyrimidines, surrounds the cap-site. Two pentanucleotides which have been shown by experiments to stimulate transcription of certain genes, GGGCG and CCAAT, are moderately over-represented in the upstream region (between -129 and -50). However, they occur at highly variable distances from the initiation site.

Journal ArticleDOI
TL;DR: Two bacterial antibiotic resistance genes, one coding for the neomycin phosphotransferase (NPT I) from Tn903, and the other coding for the chloramphenicol acetyltransferase from Tn9, were used as plant selectable markers.
Abstract: Two bacterial antibiotic resistance genes, one coding for the neomycin phosphotransferase (NPT I) from Tn903, and the other coding for the chloramphenicol acetyltransferase from Tn9, were used as plant selectable markers. Both genes were introduced into the Nicotiana tabacum genome in a new plant expression vector, using the direct gene transfer method. The vector pDH51, used in these experiments, contains a plant expression unit as a movable cassette, consisting of the strong cauliflower mosaic virus (CaMV) 35S RNA promoter and transcription terminator separated by a polylinker containing several unique restriction sites.

Journal ArticleDOI
TL;DR: The amino acid sequence of human hsp27 shows striking homology with mammalian alpha crystallin, and contains a region towards the carboxy terminus which shares homology with the small hsp of Drosophila and other organisms.
Abstract: The 27 kDa human heat shock protein (hsp27) is encoded by a gene family of 4 members. Two genomic fragments hybridizing to cDNA encoding hsp27 have been isolated, characterized, and sequenced. One clone is a member of a cluster of three genes linked within a 14-18 kb region of the genome and encodes a transcript interrupted by two intervening sequences. A single open reading frame encodes a polypeptide of 22,300 deduced molecular weight. The 5' flanking region contains two transcription start sites and sequences homologous to the Drosophila consensus heat inducible control element. Induction of both potential transcripts follows heat shock in vivo. Accurate heat inducible transcription occurs at both start sites after injection into Xenopus oocytes. The second genomic clone is a processed pseudogene lacking promoter elements and is unlinked with the other members of the hsp27 gene family. The amino acid sequence of human hsp27 shows striking homology with mammalian alpha crystallin, and contains a region towards the carboxy terminus which shares homology with the small hsp of Drosophila and other organisms.

Journal ArticleDOI
TL;DR: The contents of the database, how it is available, and possible future enhancements of Data Library services are described.
Abstract: The EMBL Data Library was the first internationally supported central resource for nucleic acid sequence data. Working in close collaboration with its American counterpart, GenBank (1), the library prepares and makes available to the scientific community a comprehensive collection of the published nucleic acid sequences. This paper describes briefly the contents of the database, how it is available, and possible future enhancements of Data Library services.

Journal ArticleDOI
TL;DR: It is proposed that birnaviruses, in particular IBDV, possess monocistronic segments and that the precursor is proteolytically processed in vivo.
Abstract: The larger RNA segment of infectious bursal disease virus (IBDV: Australian strain 002-73) has been characterized by cDNA cloning and nucleotide sequence analysis. We believe IBDV is the first birnavirus to be sequenced and so have confirmed the coding region by N-terminal amino acid sequence analysis of intact viral proteins and several tryptic peptide fragments. The large RNA segment encodes in order the 37-kDa, 28-kDa and 32-kDa proteins within a continuous open reading frame and the primary translation product appears to be subsequently processed into the mature viral proteins. The large protein precursor is still processed into the 32-kDa host protective immunogen when expressed as a fusion protein in E. coli. These results are in marked contrast to the predictions from in vitro translation data that birnavirus genomes are expressed as polycistronic templates. We can now propose that birnaviruses, in particular IBDV, possess monocistronic segments and that the precursor is proteolytically processed in vivo. The sequence data presented for the 32-kDa host protective immunogen may provide the basic information needed for the production of an effective subunit vaccine against this commercially important virus.

Journal ArticleDOI
TL;DR: A search of available nucleic acid and protein sequences has revealed that the motif CysX2CysX4HisX4Cys (NBPcys) is invariant in all replication competent retroviruses, a Syrian hamster intracisternal A-particle gene, the Drosophila retrotransposon copia and in cauliflower mosaic virus (CaMV).
Abstract: A nucleic acid binding protein (NBP) derived from the gag gene of retroviruses that is thought to interact with genomic RNA in virion cores, contains a highly conserved arrangement of cysteine residues. A search of available nucleic acid and protein sequences has revealed that the motif CysX2CysX4HisX4Cys (NBPcys) is invariant in all replication competent retroviruses, a Syrian hamster intracisternal A-particle gene, the Drosophila retrotransposon copia and in cauliflower mosaic virus (CaMV). In each case, NBPcys is located in that part of the 'gag-pol' region just preceding a conserved protease amino acid sequence. This is of special significance for CaMV as NBPcys is in the coat protein gene (ORF IV) upstream of the putative reverse transcriptase gene (ORF V) and demonstrates that the gag-pol arrangement of reverse transcribing elements is preserved in CaMV. Moreover, CaMV differs from all other known NBPcys-containing elements in that it packages a DNA genome in virions.
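
The CysX2CysX4HisX4Cys spacing described in the abstract above maps directly onto a regular expression over one-letter amino acid codes. The following is a minimal sketch, not the search procedure used in the paper; the function name and the toy protein string are illustrative assumptions.

```python
import re

# Cys, any 2 residues, Cys, any 4 residues, His, any 4 residues, Cys.
NBPCYS = re.compile(r"C.{2}C.{4}H.{4}C")


def find_nbpcys(protein):
    """Return (start index, matched segment) for each non-overlapping motif hit."""
    return [(m.start(), m.group()) for m in NBPCYS.finditer(protein)]


# Hypothetical toy sequence, used only to exercise the pattern.
toy_protein = "MKQCFNCGKEGHIAKNCRAPR"
print(find_nbpcys(toy_protein))
```

Because `finditer` reports non-overlapping matches, a scan for overlapping candidates would need a lookahead pattern instead.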

Journal ArticleDOI
TL;DR: The rDNA tandem repeat from the nematode C. elegans is sequenced and the data is used to quantify the evolutionary relationships among several organisms currently studied in developmental biology.
Abstract: We have sequenced one complete rDNA tandem repeat from the nematode C. elegans. By comparative analysis we derive secondary structures for the 18S, 5.8S, and 26S rRNA molecules, and comment on other important features of the sequence. We also present the sequence of a junction between the rDNA and non-ribosomal DNA. Finally, we use our data to quantify the evolutionary relationships among several organisms currently studied in developmental biology.

Journal ArticleDOI
TL;DR: The syntheses are described of two types of linker molecule useful for the specific attachment of non-radioactive labels such as biotin and fluorophores to the 5' terminus of synthetic oligodeoxyribonucleotides.
Abstract: The syntheses are described of two types of linker molecule useful for the specific attachment of non-radioactive labels such as biotin and fluorophores to the 5' terminus of synthetic oligodeoxyribonucleotides. The linkers are designed such that they can be coupled to the oligonucleotide as a final step in solid-phase synthesis using commercial DNA synthesis machines. Increased sensitivity of biotin detection was possible using an anti-biotin hybridoma/peroxidase detection system.

Journal ArticleDOI
TL;DR: The nucleotide sequence of the RNA of tobacco vein mottling virus, a member of the potyvirus group, was determined and a consensus sequence of V-(R or K)-F-Q was found on the N-terminal sides of proposed cleavage sites for proteolytic processing of the polyprotein.
Abstract: The nucleotide sequence of the RNA of tobacco vein mottling virus, a member of the potyvirus group, was determined. The RNA was found to be 9471 residues in length, excluding a 3'-terminal poly(A) tail. The first three AUG codons from the 5'-terminus were followed by in-frame termination codons. The fourth, at position 206, was the beginning of an open reading frame of 9015 residues which could encode a polyprotein of 340 kDa. No other long open reading frames were present in the sequence or its complement. This AUG was present in the sequence AGGCCAUG, which is similar to the consensus initiation sequence shared by most eukaryotic mRNAs. The chemically-determined amino acid compositions of the helper component and coat proteins were similar to those predicted from the nucleotide sequence. Amino acid sequencing of coat protein from which an amino-terminal peptide had been removed allowed exact location of the coat protein cistron. A consensus sequence of V-(R or K)-F-Q was found on the N-terminal sides of proposed cleavage sites for proteolytic processing of the polyprotein.
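
The figures quoted in the abstract above can be cross-checked with simple arithmetic. This sketch uses only numbers stated in the abstract. Whether the 9015-residue open reading frame includes the termination codon is not stated, so the codon count is an approximation.

```python
GENOME_LENGTH_NT = 9471  # RNA length, excluding the poly(A) tail
ORF_START = 206          # position of the initiating AUG
ORF_LENGTH_NT = 9015     # length of the open reading frame
POLYPROTEIN_KDA = 340    # predicted polyprotein mass

codons = ORF_LENGTH_NT // 3                   # triplets in the reading frame
orf_end = ORF_START + ORF_LENGTH_NT - 1       # last nucleotide of the frame
three_prime_nt = GENOME_LENGTH_NT - orf_end   # nucleotides after the frame
mean_residue_da = POLYPROTEIN_KDA * 1000 / codons

print(codons, orf_end, three_prime_nt, round(mean_residue_da, 1))
```

The implied mean residue mass of about 113 Da is close to the rule-of-thumb average of roughly 110 Da per amino acid, so the 340 kDa estimate is consistent with the stated frame length.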

Journal ArticleDOI
TL;DR: Overlapping clones encoding the entire gene for tetanus toxin have now been isolated and the full sequence is reported, which contains one open reading frame encoding a protein of molecular weight 150 491.
Abstract: We have previously isolated and sequenced a portion of the structural gene for tetanus toxin from C. tetani CN3911 (1). Overlapping clones encoding the entire gene have now been isolated and the full sequence is reported below. Our designation of the start of translation is based on amino terminal analysis (2) and partial amino acid sequence of the toxin (3). The methionine residue before the proline is presumably cleaved after synthesis. The DNA sequence contains one open reading frame encoding a protein of molecular weight 150 491, in agreement with other measurements of the size of the toxin (4).

Journal ArticleDOI
TL;DR: A large hypervariable DNA fragment from a human DNA fingerprint was purified by preparative gel electrophoresis and molecular cloning to provide a panel of extremely informative locus-specific probes ideal for linkage analysis in man.
Abstract: A large hypervariable DNA fragment from a human DNA fingerprint was purified by preparative gel electrophoresis and molecular cloning. The cloned fragment contained a 6.3 kb long minisatellite consisting of multiple copies of a 37 bp repeat unit. Each repeat contained an 11 bp copy of the "core" sequences, a putative recombination signal in human DNA. The cloned minisatellite hybridized to a single locus in the human genome. This locus is extremely polymorphic, with at least 77 different alleles containing 14 to 525 repeat units per allele being resolved in a sample of 79 individuals. All alleles except the shortest are rare and the resulting heterozygosity is very high (approximately 97%). Cloned minisatellites should therefore provide a panel of extremely informative locus-specific probes ideal for linkage analysis in man.
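
The heterozygosity figure in the abstract above can be related to allele frequencies through the standard expected-heterozygosity formula, H = 1 - sum(p_i^2). The sketch below assumes that formula, which the abstract does not state; the toy allele counts are hypothetical and are not the paper's data.

```python
def expected_heterozygosity(allele_counts):
    """Return 1 - sum(p_i^2), where p_i are allele frequencies from counts."""
    total = sum(allele_counts)
    if total == 0:
        raise ValueError("no alleles observed")
    return 1.0 - sum((count / total) ** 2 for count in allele_counts)


# Hypothetical toy locus with three alleles at frequencies 0.5, 0.3 and 0.2.
print(expected_heterozygosity([5, 3, 2]))

# Upper bound for 77 alleles (the number resolved in the abstract),
# reached only if all alleles were equally frequent: 1 - 1/77.
print(expected_heterozygosity([1] * 77))
```

With 77 alleles the ceiling is about 98.7%, so the reported value of approximately 97% is plausible for a locus where the shortest allele is the only common one.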