Author

Stavros Konstantinidis

Bio: Stavros Konstantinidis is an academic researcher from Saint Mary's University. The author has contributed to research in topics: Regular language & Formal language. The author has an h-index of 17 and has co-authored 80 publications receiving 848 citations. Previous affiliations of Stavros Konstantinidis include University of Saint Mary & University of Western Ontario.


Papers
Journal Article
TL;DR: This work defines properties of languages which ensure that the words of such languages will not form undesirable bonds when used in DNA computations and gives several characterizations of the desired properties and provides methods for obtaining languages with such properties.

72 citations

Journal Article
TL;DR: This paper formalizes and investigates properties of DNA languages that guarantee their robustness during computations, and gives algorithms for deciding whether regular DNA languages are invariant under bio-operations.
Abstract: An essential step of any DNA computation is encoding the input data on single or double DNA strands. Due to the biochemical properties of DNA, complementary single strands can bind to one another forming double-stranded DNA. Consequently, data-encoding DNA strands can sometimes interact in undesirable ways when used in computations. It is thus crucial to analyze properties that guard against such phenomena and to study sets of sequences that ensure that no unwanted bindings occur during any computation. This paper formalizes and investigates properties of DNA languages that guarantee their robustness during computations. After defining and investigating several types of DNA languages possessing good encoding properties, such as sticky-free and overhang-free languages, we give algorithms for deciding whether regular DNA languages are invariant under bio-operations. We also give a method for constructing DNA languages that, in addition to being invariant and sticky-free, possess error-detecting properties. Finally, we present the results of running tests that check whether several known gene languages (the set of genes of a given organism), as well as the input DNA languages used in Adleman's DNA computing experiment, have the defined properties.
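
The binding problem described in the abstract above can be illustrated with a small sketch. It only checks the simplest case, whether a finite set of strands contains a strand together with its full Watson-Crick reverse complement. It is not the paper's definition of sticky-free or overhang-free languages, and the function names are hypothetical.

```python
# Illustrative only: a much weaker check than the properties studied in the paper.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand: str) -> str:
    """Return the Watson-Crick reverse complement of a DNA strand."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def fully_complementary_pairs(strands):
    """List pairs (s, t) in a finite set where t is the reverse complement of s.

    Each pair is reported once, with the lexicographically smaller strand first.
    A self-complementary strand is reported as the pair (s, s).
    """
    pool = set(strands)
    pairs = []
    for s in pool:
        t = reverse_complement(s)
        if t in pool and s <= t:
            pairs.append((s, t))
    return sorted(pairs)


if __name__ == "__main__":
    print(fully_complementary_pairs({"AACG", "CGTT", "AAAA"}))
```

A set for which this function returns an empty list can still contain strands that bind partially, which is the kind of interaction the paper's properties are meant to exclude.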

53 citations

Journal Article
TL;DR: The edit distance (or Levenshtein distance) between two words is the smallest number of substitutions, insertions, and deletions of symbols that can be used to transform one of the words into the other.
Abstract: The edit distance (or Levenshtein distance) between two words is the smallest number of substitutions, insertions, and deletions of symbols that can be used to transform one of the words into the other. In this paper, we consider the problem of computing the edit distance of a regular language (the set of words accepted by a given finite automaton). This quantity is the smallest edit distance between any pair of distinct words of the language. We show that the problem is of polynomial time complexity. In particular, for a given finite automaton A with n transitions, over an alphabet of r symbols, our algorithm operates in time $O(n^2 r^2 q^2 (q+r))$, where q is either the diameter of A (if A is deterministic), or the square of the number of states in A (if A is nondeterministic). Incidentally, we also obtain an upper bound on the edit distance of a regular language in terms of the automaton accepting the language.
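
The two definitions in the abstract above can be made concrete with a minimal sketch. It uses the standard dynamic-programming computation of the Levenshtein distance and a brute-force minimum over a finite set of words. This is not the automaton-based polynomial-time algorithm of the paper, which handles regular languages given by a finite automaton. The function names are hypothetical.

```python
from itertools import combinations


def edit_distance(u: str, v: str) -> int:
    """Levenshtein distance between u and v, computed row by row."""
    prev = list(range(len(v) + 1))  # distances from the empty prefix of u
    for i, a in enumerate(u, start=1):
        curr = [i]  # turning u[:i] into the empty word takes i deletions
        for j, b in enumerate(v, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a
                curr[j - 1] + 1,         # insert b
                prev[j - 1] + (a != b),  # substitute a by b, or match
            ))
        prev = curr
    return prev[-1]


def language_edit_distance(words) -> int:
    """Smallest edit distance over all pairs of distinct words.

    Brute force over a FINITE set only; an assumption made for illustration.
    """
    distinct = set(words)
    if len(distinct) < 2:
        raise ValueError("need at least two distinct words")
    return min(edit_distance(u, v) for u, v in combinations(distinct, 2))


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
    print(language_edit_distance({"0000", "0011", "1100"}))
```

The brute-force version compares every pair of words, so it is only practical for small finite sets.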

53 citations

Journal Article
TL;DR: A list of known properties of DNA languages which are free of certain types of undesirable bonds is recalled and a general framework in which each of these properties is characterized by a solution of a uniform formal language inequation is introduced.

44 citations

Book Chapter
10 Jun 2001
TL;DR: This work defines properties of languages which ensure that the words of such languages will not form undesirable bonds when used in DNA computations and gives several characterizations of the desired properties and provides methods for obtaining languages with such properties.
Abstract: The computation language of a DNA-based system consists of all the words (DNA strands) that can appear in any computation step of the system. In this work we define properties of languages which ensure that the words of such languages will not form undesirable bonds when used in DNA computations. We give several characterizations of the desired properties and provide methods for obtaining languages with such properties. The decidability of these properties is addressed as well. As an application we consider splicing systems whose computation language is free of certain undesirable bonds and is generated by nearly optimal comma-free codes.

42 citations


Cited by
Journal Article
24 Apr 2020 - PLOS ONE
TL;DR: The method achieves 100% accurate classification of the COVID-19 virus sequences, and discovers the most relevant relationships among over 5000 viral genomes within a few minutes, ab initio, suggesting that this alignment-free whole-genome machine-learning approach can provide a reliable real-time option for taxonomic classification.
Abstract: The 2019 novel coronavirus (renamed SARS-CoV-2, and generally referred to as the COVID-19 virus) has spread to 184 countries with over 1.5 million confirmed cases. Such major viral outbreaks demand early elucidation of taxonomic classification and origin of the virus genomic sequence, for strategic planning, containment, and treatment. This paper identifies an intrinsic COVID-19 virus genomic signature and uses it together with a machine learning-based alignment-free approach for an ultra-fast, scalable, and highly accurate classification of whole COVID-19 virus genomes. The proposed method combines supervised machine learning with digital signal processing (MLDSP) for genome analyses, augmented by a decision tree approach to the machine learning component, and a Spearman’s rank correlation coefficient analysis for result validation. These tools are used to analyze a large dataset of over 5000 unique viral genomic sequences, totalling 61.8 million bp, including the 29 COVID-19 virus sequences available on January 27, 2020. Our results support a hypothesis of a bat origin and classify the COVID-19 virus as Sarbecovirus, within Betacoronavirus. Our method achieves 100% accurate classification of the COVID-19 virus sequences, and discovers the most relevant relationships among over 5000 viral genomes within a few minutes, ab initio, using raw DNA sequence data alone, and without any specialized biological knowledge, training, gene or genome annotations. This suggests that, for novel viral and pathogen genome sequences, this alignment-free whole-genome machine-learning approach can provide a reliable real-time option for taxonomic classification.

676 citations

Journal Article
TL;DR: A programming language is presented for designing and simulating DNA circuits in which strand displacement is the main computational mechanism; the language includes basic elements of sequence domains, toeholds and branch migration, and assumes that strands do not possess any secondary structure.
Abstract: Recently, a range of information-processing circuits have been implemented in DNA by using strand displacement as their main computational mechanism. Examples include digital logic circuits and catalytic signal amplification circuits that function as efficient molecular detectors. As new paradigms for DNA computation emerge, the development of corresponding languages and tools for these paradigms will help to facilitate the design of DNA circuits and their automatic compilation to nucleotide sequences. We present a programming language for designing and simulating DNA circuits in which strand displacement is the main computational mechanism. The language includes basic elements of sequence domains, toeholds and branch migration, and assumes that strands do not possess any secondary structure. The language is used to model and simulate a variety of circuits, including an entropy-driven catalytic gate, a simple gate motif for synthesizing large-scale circuits and a scheme for implementing an arbitrary system of chemical reactions. The language is a first step towards the design of modelling and simulation tools for DNA strand displacement, which complements the emergence of novel implementation strategies for DNA computing.

204 citations

Journal Article
TL;DR: A comprehensive survey of error-correcting codes for channels corrupted by synchronization errors is presented, together with potential applications and the obstacles that need to be overcome before such codes can be used in practical systems.
Abstract: We present a comprehensive survey of error-correcting codes for channels corrupted by synchronization errors. We discuss potential applications as well as the obstacles that need to be overcome before such codes can be used in practical systems.

145 citations