Showing papers by "Lee Rowen" published in 2003


Journal Article
TL;DR: It was recently shown that indels are responsible for more than twice as many unmatched nucleotides as base substitutions between samples of chimpanzee and human DNA; a larger sample has now been examined, and the result is similar.
Abstract: Mutations in DNA are the source of variation in Darwinian evolution. Therefore it is likely that the examination of DNA differences between closely related species, or among polymorphic variations in the DNA of a given species, will give insight into the nature of the mutations and the process of evolution. In the present paper, published and unpublished data are summarized for examples from several distantly related phylogenetic groups, and the data show that indels dominate the process of early divergence. There is a continuing problem in these data of the upper limit in the size of detected gaps and bias against larger ones. The groups sampled are apes (chimp-human DNA comparison), sea urchins (Strongylocentrotus purpuratus polymorphism), bacteria (Escherichia coli substrain comparison), insects (Drosophila polymorphism), nematodes (Caenorhabditis elegans polymorphism), and plants (Arabidopsis polymorphism). It is also noted that human genetic diseases are frequently caused by indels. The first part of the paper summarizes the results for samples of chimp DNA compared with the human genome sequence. Then an example of sea urchin polymorphism is briefly described. Initial comparison of two strains of E. coli O157:H7 is described. Finally, the published polymorphism data are reviewed and brought together with the data reported here to draw the conclusion that indel formation is a major and significant evolutionary process.
to align the complete chimp BAC sequence with the human genome, regardless of the presence of repeated sequences, which typically make up about half of the BAC sequence. The repeated sequences, naturally, sometimes complicate the alignment process. The National Institutes of Health program "BLAST the Human Genome" was used to find the most promising region of the human genome for alignment with each particular chimp BAC sequence.
This program works well because the human repetitive sequences are filtered out during the comparisons and then apparently reinserted for mapping the results. Usually only one region of the human genome shows a full or nearly full alignment with a chimp BAC sequence, whereas other regions show short or fragmentary alignments. Where duplications of long regions have occurred, as on chromosome 22, there is uncertainty, and we have not included these comparisons. For the next stage in the analysis, a program has been written that almost always accurately detects mismatches and gaps in the alignment. It is called GAPD, for gap detection or gap determination, and is described in the next few sentences. Standard sequence comparison programs such as Smith-Waterman are used to find the human sequence that aligns with the start of the chimp BAC. From this aligned start, GAPD goes nucleotide by nucleotide, checking for mismatches. If a mismatch is seen, then a check is made of the succeeding 10 nucleotides, and if at least 6 of these match, the original mismatch is taken to be due to a base substitution. There is a possibility that there is a local region with
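The mismatch test described in this excerpt (look at the succeeding 10 nucleotides and call the mismatch a base substitution if at least 6 of them match) can be sketched in Python. This is a minimal illustration and not the authors' GAPD code: the function name, the return shape, and the handling of positions closer than 10 nucleotides to the end of the sequence are all assumptions. The excerpt breaks off before saying what GAPD does when fewer than 6 of the 10 match, so the sketch simply stops there and reports the position as unresolved.

```python
def classify_mismatches(chimp, human, window=10, min_matches=6):
    """Walk two sequences that are already aligned at their starts.

    Returns (substitutions, unresolved):
      substitutions - list of 0-based positions judged to be base substitutions
      unresolved    - position of the first mismatch that fails the lookahead
                      test (a candidate gap), or None if there is none

    Hypothetical helper: names and return shape are illustrative only.
    """
    substitutions = []
    n = min(len(chimp), len(human))
    for i in range(n):
        if chimp[i] == human[i]:
            continue
        # Compare the nucleotides that follow the mismatch, position by position.
        # Assumption: near the end of the sequence the window is simply shorter.
        ahead_chimp = chimp[i + 1:i + 1 + window]
        ahead_human = human[i + 1:i + 1 + window]
        matches = sum(a == b for a, b in zip(ahead_chimp, ahead_human))
        if matches >= min_matches:
            # Alignment is still in register: treat as a base substitution.
            substitutions.append(i)
        else:
            # Alignment appears to have lost register; the source excerpt does
            # not describe the next step, so stop and report the position.
            return substitutions, i
    return substitutions, None


if __name__ == "__main__":
    # One substituted base at position 4; the following bases still line up.
    print(classify_mismatches("ACGTACGTACGTACGT", "ACGTTCGTACGTACGT"))
    # One base missing from the first sequence at position 4; the following
    # bases are shifted out of register.
    print(classify_mismatches("ACGTCGTACGTACGTAC", "ACGTACGTACGTACGTAC"))
```

The two calls at the bottom contrast a single substituted base, which passes the lookahead test, with a single missing base, which shifts everything after it out of register and so fails the test.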

207 citations


Journal Article
TL;DR: The cross-species comparison has facilitated the identification of 60 genes in human and 61 in mouse, including a potential RNA gene for which the introns are more conserved across species than the exons, and potential cis-regulatory elements.
Abstract: In mammals, the Major Histocompatibility Complex class I and II gene clusters are separated by an ∼700-kb stretch of sequence called the MHC class III region, which has been associated with susceptibility to numerous diseases. To facilitate understanding of this medically important and architecturally interesting portion of the genome, we have sequenced and analyzed both the human and mouse class III regions. The cross-species comparison has facilitated the identification of 60 genes in human and 61 in mouse, including a potential RNA gene for which the introns are more conserved across species than the exons. Delineation of global organization, gene structure, alternative splice forms, protein similarities, and potential cis-regulatory elements leads to several conclusions: (1) The human MHC class III region is the most gene-dense region of the human genome: >14% of the sequence is coding, ∼72% of the region is transcribed, and there is an average of 8.5 genes per 100 kb. (2) Gene sizes, number of exons, and intergenic distances are for the most part similar in both species, implying that interspersed repeats have had little impact in disrupting the tight organization of this densely packed set of genes. (3) The region contains a heterogeneous mixture of genes, only a few of which have a clearly defined and proven function. Although many of the genes are of ancient origin, some appear to exist only in mammals and fish, implying they might be specific to vertebrates. (4) Conserved noncoding sequences are found primarily in or near the 5′-UTR or the first intron of genes, and seldom in the intergenic regions. Many of these conserved blocks are likely to be cis-regulatory elements.

129 citations