Author

Ivan Basar

Bio: Ivan Basar is an academic researcher from the University of Zagreb. The author has contributed to research on the topics Tandem repeat and String (computer science). The author has an h-index of 8 and has co-authored 11 publications receiving 148 citations.

Papers
Journal ArticleDOI
TL;DR: Using the Key String Algorithm (KSA) to analyze Build 35.1 assembly, consensus alpha satellite higher-order repeats (HOR) and consensus distributions of CENP-B box and pJα motif in human chromosomes 1, 4, 5, 7, 8, 10, 11, 17, 19, and X are determined.
Abstract: Using our Key String Algorithm (KSA) to analyze Build 35.1 assembly we determined consensus alpha satellite higher-order repeats (HOR) and consensus distributions of CENP-B box and pJα motif in human chromosomes 1, 4, 5, 7, 8, 10, 11, 17, 19, and X. We determined new suprachromosomal family (SF) assignments: SF5 for 13mer (2211 bp), SF5 for 13mer (2214 bp), SF2 for 11mer (1869 bp), SF1 for 18mer (3058 bp), SF3 for 12mer (2047 bp), SF3 for 14mer (2379 bp), and SF5 for 17mer (2896 bp) in chromosomes 4, 5, 8, 10, 11, 17, and 19, respectively. In chromosome 5 we identified SF5 13mer without any CENP-B box and pJα motif, highly homologous (96%) to 13mer in chromosome 19. Additionally, in chromosome 19 we identified new SF5 17mer with one CENP-B box and pJα motif, aligned to 13mer by deleting four monomers. In chromosome 11 we identified SF3 12mer, homologous to 12mer in chromosome X. In chromosome 10 we identified new SF1 18mer with eight CENP-B boxes in every other monomer (except one). In chromosome 4 we identified new SF5 13mer with CENP-B box in three consecutive monomers. We found four exceptions to the rule that CENP-B box belongs to type B and pJα motif to type A monomers.

37 citations

Journal ArticleDOI
TL;DR: DFT provides a robust method for detecting higher-order periodicity; it tolerates monomer insertions and deletions, random sequence insertions, etc.
Abstract: Background: Identification of approximate tandem repeats is an important task of broad significance and still remains a challenging problem of computational genomics. Often there is no single best approach to periodicity detection and a combination of different methods may improve the prediction accuracy. Discrete Fourier transform (DFT) has been extensively used to study primary periodicities in DNA sequences. Here we investigate the application of the DFT method to identify and study alphoid higher order repeats.

27 citations
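
The DFT-based periodicity detection described in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the common indicator-sequence mapping (one binary sequence per base), and the function names `dft_power` and `dominant_period` as well as the toy 7-bp repeat unit are hypothetical choices made for illustration only.

```python
import cmath

def dft_power(seq, k):
    """Total DFT power at frequency index k, summed over four base indicator sequences."""
    n = len(seq)
    total = 0.0
    for base in "ACGT":
        # Indicator sequence: 1 where this base occurs, 0 elsewhere.
        coeff = sum(cmath.exp(-2j * cmath.pi * k * i / n)
                    for i, b in enumerate(seq) if b == base)
        total += abs(coeff) ** 2
    return total

def dominant_period(seq, min_period=2):
    """Return the period n / k at which the summed DFT power is largest."""
    n = len(seq)
    best_k = max(range(1, n // min_period + 1),
                 key=lambda k: dft_power(seq, k))
    return n / best_k

# Toy data (assumption): a 7-bp unit repeated 12 times, with one substitution
# to mimic an approximate tandem repeat.
bases = list("AAACGGT" * 12)
bases[30] = "T"
seq = "".join(bases)

print(dominant_period(seq))  # 7.0
```

The single substitution spreads only a small amount of power over the other frequencies, so the peak at the 7-bp period survives. Real alphoid data would need a fast FFT routine and a treatment of harmonics, which this sketch omits.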

Journal ArticleDOI
TL;DR: A graphical user interface method, ColorHOR, is developed, based on an extension of the key-string algorithm, for fast computational identification of HORs in a given genomic sequence, without requiring a priori information on the composition of the genomic sequence.
Abstract: Motivation: GenBank data are at present lacking alpha satellite higher-order repeat (HOR) annotation. Furthermore, exact HOR consensus lengths have not been reported so far. Given the fast growth of sequence databases in the centromeric region, it is of increasing interest to have efficient tools for computational identification and analysis of HORs from known sequences. Results: We develop a graphical user interface method, ColorHOR, for fast computational identification of HORs in a given genomic sequence, without requiring a priori information on the composition of the genomic sequence. ColorHOR is based on an extension of the key-string algorithm and provides a color representation of the order and orientation of HORs. For the key string, we use a robust 6 bp string from a consensus alpha satellite, and its representative nature is tested. The ColorHOR algorithm provides a direct visual identification of HORs (direct and/or reverse complement). In more detail, we first illustrate the ColorHOR results for human chromosome 1. Using ColorHOR we determine for the first time the HOR annotation of the GenBank sequence of the whole human genome. In addition to some HORs, corresponding to those determined previously biochemically, we find new HORs in chromosomes 4, 8, 9, 10, 11 and 19. For the first time, we determine exact consensus lengths of HORs in 10 chromosomes. We propose that the HOR assignment obtained by using ColorHOR be included in the GenBank database. Availability: The program with graphical user interface application for ColorHOR is freely available at http://www.hazu.hr/KSA/colorHOR.html. It can be run on any platform on which wxPython is supported. Contact: [email protected] Supplementary information: http://www.hazu.hr/KSA/colorHOR.html.

22 citations
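
The key-string idea behind ColorHOR can be illustrated with a minimal Python sketch: cut a sequence at every occurrence of a short key string and inspect the resulting fragment lengths, where a recurring length hints at a repeat unit. This is a simplified illustration, not the authors' algorithm or the ColorHOR code. The abstract does not name the 6 bp key string, so the key `GAATTC`, the 20-bp toy monomer and the function name `key_string_lengths` are all hypothetical.

```python
from collections import Counter

def key_string_lengths(sequence, key):
    """Cut the sequence at each occurrence of key and return the fragment lengths."""
    positions = []
    start = sequence.find(key)
    while start != -1:
        positions.append(start)
        # Search again one position further on, so overlapping hits are found too.
        start = sequence.find(key, start + 1)
    # The distance between consecutive key occurrences is one fragment length.
    return [b - a for a, b in zip(positions, positions[1:])]

# Toy data (assumption): a 20-bp "monomer" containing the key once, repeated 5 times.
key = "GAATTC"
monomer = "GAATTCAAACCCAAACCCAA"
sequence = "TTTT" + monomer * 5

lengths = key_string_lengths(sequence, key)
print(lengths)                          # [20, 20, 20, 20]
print(Counter(lengths).most_common(1))  # [(20, 4)]
```

On real alpha satellite data the fragment lengths would vary because of mutations in the key string, and detecting higher-order structure would require analysing periodicity in the length array itself, which this sketch does not attempt.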

Journal ArticleDOI
TL;DR: It is found that human-chimpanzee differences are much larger for tandem repeats, particularly for HORs, than for gene sequences, which may be of great significance in light of recent studies that are beginning to reveal the large-scale regulatory architecture of the human genome.
Abstract: Much attention has been devoted to identifying genomic patterns underlying the evolution of the human brain and its emergent advanced cognitive capabilities, which lie at the heart of differences distinguishing humans from chimpanzees, our closest living relatives. Here, we identify two particular intragene repeat structures of noncoding human DNA, spanning as much as a hundred kilobases, that are present in the human genome but are absent from the chimpanzee genome and other nonhuman primates. Using our novel computational method Global Repeat Map, we examine tandem repeat structure in human and chimpanzee chromosome 1. In human chromosome 1, we find three higher order repeats (HORs), two of them novel, not reported previously, whereas in chimpanzee chromosome 1, we find only one HOR, a 2mer alphoid HOR instead of the human alphoid 11mer HOR. In human chromosome 1, we identify an HOR based on a 39-bp primary repeat unit, with secondary, tertiary, and quartic repeat units, fully embedded in the human hornerin gene, related to regenerating and psoriatic skin. Such an HOR is not found in chimpanzee chromosome 1. We find a remarkable human 3mer HOR organization based on the ~1.6-kb primary repeat unit, fully embedded within the neuroblastoma breakpoint family genes, which is related to the function of the human brain. Such HORs are not present in chimpanzees. In general, we find that human-chimpanzee differences are much larger for tandem repeats, particularly for HORs, than for gene sequences. This may be of great significance in light of recent studies that are beginning to reveal the large-scale regulatory architecture of the human genome, in particular the role of noncoding sequences. We hypothesize about the possible importance of human accelerated HOR patterns as components in the gene expression multilayered regulatory network.

19 citations

Journal ArticleDOI
TL;DR: The key-string algorithm was used to scan the recent GenBank data for human alpha satellite DNA sequence AC017075, which was computationally segmented into one HOR domain (super-repeat domain) and two non-HOR domains.

18 citations


Cited by
01 Jan 2005
TL;DR: It is determined that 33% of human duplications are not duplicated in chimpanzee, including some human disease-causing duplications, and that de novo duplication has contributed most significantly to differences between the species, followed by deletion of ancestral duplications.
Abstract: We present a global comparison of differences in content of segmental duplication between human and chimpanzee, and determine that 33% of human duplications (> 94% sequence identity) are not duplicated in chimpanzee, including some human disease-causing duplications. Combining experimental and computational approaches, we estimate a genomic duplication rate of 4–5 megabases per million years since divergence. These changes have resulted in gene expression differences between the species. In terms of numbers of base pairs affected, we determine that de novo duplication has contributed most significantly to differences between the species, followed by deletion of ancestral duplications. Post-speciation gene conversion accounts for less than 10% of recent segmental duplication. Chimpanzee-specific hyperexpansion (> 100 copies) of particular segments of DNA have resulted in marked quantitative differences and alterations in the genome landscape between chimpanzee and human. Almost all of the most extreme differences relate to changes in chromosome structure, including the emergence of African great ape subterminal heterochromatin. Nevertheless, base per base, large segmental duplication events have had a greater impact (2.7%) in altering the genomic landscape of these two species than single-base-pair substitution (1.2%).

353 citations

BookDOI
14 Dec 2009
TL;DR: "Data Mining Techniques for the Life Sciences" seeks to aid students and researchers in the life sciences who wish to get a condensed introduction into the vital world of biological databases and their many applications.
Abstract: Whereas getting exact data about living systems and sophisticated experimental procedures have primarily absorbed the minds of researchers previously, the development of high-throughput technologies has caused the weight to increasingly shift to the problem of interpreting accumulated data in terms of biological function and biomolecular mechanisms. In "Data Mining Techniques for the Life Sciences", experts in the field contribute valuable information about the sources of information and the techniques used for "mining" new insights out of databases. Beginning with a section covering the concepts and structures of important groups of databases for biomolecular mechanism research, the book then continues with sections on formal methods for analyzing biomolecular data and reviews of concepts for analyzing biomolecular sequence data in context with other experimental results that can be mapped onto genomes. As a volume of the highly successful Methods in Molecular Biology series, this work provides the kind of detailed description and implementation advice that is crucial for getting optimal results. Authoritative and easy to reference, "Data Mining Techniques for the Life Sciences" seeks to aid students and researchers in the life sciences who wish to get a condensed introduction into the vital world of biological databases and their many applications.

135 citations

Journal ArticleDOI
TL;DR: A review of the literature on statistical long-range correlation in DNA sequences can be found in this paper, where the authors conclude that a mixture of many length scales (including some relatively long ones) is responsible for the observed 1/f-like spectral component.
Abstract: In this paper, we review the literature on statistical long-range correlation in DNA sequences. We examine the current evidence for these correlations, and conclude that a mixture of many length scales (including some relatively long ones) in DNA sequences is responsible for the observed 1/f-like spectral component. We note the complexity of the correlation structure in DNA sequences. The observed complexity often makes it hard, or impossible, to decompose the sequence into a few statistically stationary regions. We suggest that, based on the complexity of DNA sequences, a fruitful approach to understand long-range correlation is to model duplication, and other rearrangement processes, in DNA sequences. One model, called the "expansion-modification system", contains only point duplication and point mutation. Though simplistic, this model is able to generate sequences with 1/f spectra. We emphasize the importance of DNA duplication in its contribution to the observed long-range correlation in DNA sequences.

130 citations

Journal ArticleDOI
01 Apr 2022-Science
TL;DR: A complete, telomere-to-telomere human genome assembly (T2T-CHM13) has enabled the comprehensive characterization of pericentromeric and centromeric repeats, which constitute 6.2% of the genome.
Abstract: Existing human genome assemblies have almost entirely excluded repetitive sequences within and near centromeres, limiting our understanding of their organization, evolution, and functions, which include facilitating proper chromosome segregation. Now, a complete, telomere-to-telomere human genome assembly (T2T-CHM13) has enabled us to comprehensively characterize pericentromeric and centromeric repeats, which constitute 6.2% of the genome (189.9 megabases). Detailed maps of these regions revealed multimegabase structural rearrangements, including in active centromeric repeat arrays. Analysis of centromere-associated sequences uncovered a strong relationship between the position of the centromere and the evolution of the surrounding DNA through layered repeat expansions. Furthermore, comparisons of chromosome X centromeres across a diverse panel of individuals illuminated high degrees of structural, epigenetic, and sequence variation in these complex and rapidly evolving regions.

116 citations