
Showing papers on "String (computer science)" published in 2007


Journal ArticleDOI
TL;DR: Although primarily developed for protein interaction analysis, the resource has also been successfully applied to comparative genomics, phylogenetics and network studies, which are all facilitated by programmatic access to the database backend and the availability of compact download files.
Abstract: Information on protein–protein interactions is still mostly limited to a small number of model organisms, and originates from a wide variety of experimental and computational techniques. The database and online resource STRING generalizes access to protein interaction data, by integrating known and predicted interactions from a variety of sources. The underlying infrastructure includes a consistent body of completely sequenced genomes and exhaustive orthology classifications, based on which interaction evidence is transferred between organisms. Although primarily developed for protein interaction analysis, the resource has also been successfully applied to comparative genomics, phylogenetics and network studies, which are all facilitated by programmatic access to the database backend and the availability of compact download files. As of release 7, STRING has almost doubled to 373 distinct organisms, and contains more than 1.5 million proteins for which associations have been pre-computed. Novel features include AJAX-based web-navigation, inclusion of additional resources such as BioGRID, and detailed protein domain annotation. STRING is available at http://string.embl.de/

669 citations


Journal ArticleDOI
TL;DR: A simplified and improved version of the string method, originally proposed by E et al. (2002) for identifying minimum energy paths in barrier-crossing events; the new version is more stable and accurate, and can be combined with the climbing image technique for the accurate calculation of saddle points.
Abstract: We present a simplified and improved version of the string method, originally proposed by E et al. [Phys. Rev. B 66, 052301 (2002)] for identifying the minimum energy paths in barrier-crossing events. In this new version, the step of projecting the potential force to the direction normal to the string is eliminated and the full potential force is used in the evolution of the string. This not only simplifies the numerical procedure, but also makes the method more stable and accurate. We discuss the algorithmic details of the improved string method, analyze its stability, accuracy and efficiency, and illustrate it via numerical examples. We also show how the string method can be combined with the climbing image technique for the accurate calculation of saddle points and we present another algorithm for the accurate calculation of the unstable directions at the saddle points.

638 citations


Proceedings ArticleDOI
10 Jun 2007
TL;DR: This paper proposes a precise, sound, and fully automated analysis technique for SQL injection that successfully discovered previously unknown and sometimes subtle vulnerabilities in real-world programs, has a low false positive rate, and scales to large programs.
Abstract: Web applications are popular targets of security attacks. One common type of such attacks is SQL injection, where an attacker exploits faulty application code to execute maliciously crafted database queries. Both static and dynamic approaches have been proposed to detect or prevent SQL injections; while dynamic approaches provide protection for deployed software, static approaches can detect potential vulnerabilities before software deployment. Previous static approaches are mostly based on tainted information flow tracking and have at least some of the following limitations: (1) they do not model the precise semantics of input sanitization routines; (2) they require manually written specifications, either for each query or for bug patterns; or (3) they are not fully automated and may require user intervention at various points in the analysis. In this paper, we address these limitations by proposing a precise, sound, and fully automated analysis technique for SQL injection. Our technique avoids the need for specifications by considering as attacks those queries for which user input changes the intended syntactic structure of the generated query. It checks conformance to this policy by conservatively characterizing the values a string variable may assume with a context free grammar, tracking the nonterminals that represent user-modifiable data, and modeling string operations precisely as language transducers. We have implemented the proposed technique for PHP, the most widely-used web scripting language. Our tool successfully discovered previously unknown and sometimes subtle vulnerabilities in real-world programs, has a low false positive rate, and scales to large programs (with approx. 100K loc).

416 citations


Proceedings ArticleDOI
28 Oct 2007
TL;DR: A new error-resilient privacy-preserving string searching protocol that allows any finite state machine to be executed in an oblivious manner, with a communication complexity that is linear in both the number of states and the length of the input string.
Abstract: Human Desoxyribo-Nucleic Acid (DNA) sequences offer a wealth of information that reveal, among others, predisposition to various diseases and paternity relations. The breadth and personalized nature of this information highlights the need for privacy-preserving protocols. In this paper, we present a new error-resilient privacy-preserving string searching protocol that is suitable for running private DNA queries. This protocol checks if a short template (e.g., a string that describes a mutation leading to a disease), known to one party, is present inside a DNA sequence owned by another party, accounting for possible errors and without disclosing to each party the other party's input. Each query is formulated as a regular expression over a finite alphabet and implemented as an automaton. As the main technical contribution, we provide a protocol that allows to execute any finite state machine in an oblivious manner, requiring a communication complexity which is linear both in the number of states and the length of the input string.

239 citations


Proceedings ArticleDOI
09 Jul 2007
TL;DR: An algorithm that can track symbolic constraints across language boundaries and use those constraints in conjunction with a novel constraint solver to generate both program inputs and database state is developed and a constraints solver is proposed that can solve symbolic constraints consisting of both linear arithmetic constraints over variables as well as string constraints.
Abstract: We describe an algorithm for automatic test input generation for database applications. Given a program in an imperative language that interacts with a database through API calls, our algorithm generates both input data for the program as well as suitable database records to systematically explore all paths of the program, including those paths whose execution depends on data returned by database queries. Our algorithm is based on concolic execution, where the program is run with concrete inputs and simultaneously also with symbolic inputs for both program variables as well as the database state. The symbolic constraints generated along a path enable us to derive new input values and new database records that can cause execution to hit uncovered paths. Simultaneously, the concrete execution helps to retain precision in the symbolic computations by allowing dynamic values to be used in the symbolic executor. This allows our algorithm, for example, to identify concrete SQL queries made by the program, even if these queries are built dynamically. The contributions of this paper are the following. We develop an algorithm that can track symbolic constraints across language boundaries and use those constraints in conjunction with a novel constraint solver to generate both program inputs and database state. We propose a constraint solver that can solve symbolic constraints consisting of both linear arithmetic constraints over variables as well as string constraints (string equality, disequality, as well as membership in regular languages). Finally, we provide an evaluation of the algorithm on a Java implementation of MediaWiki, a popular wiki package that interacts with a database back-end.

232 citations


Proceedings ArticleDOI
03 Dec 2007
TL;DR: This paper introduces a general compression technique that results in at most 2N state traversals when processing a string of length N, and describes a novel alphabet reduction scheme for DFA-based structures that can yield further dramatic reductions in data structure size.
Abstract: Modern network intrusion detection systems need to perform regular expression matching at line rate in order to detect the occurrence of critical patterns in packet payloads. While deterministic finite automata (DFAs) allow this operation to be performed in linear time, they may exhibit prohibitive memory requirements. In [9], Kumar et al. propose Delayed Input DFAs (D2FAs), which provide a trade-off between the memory requirements of the compressed DFA and the number of states visited for each character processed, which corresponds directly to the memory bandwidth required to evaluate regular expressions. In this paper we introduce a general compression technique that results in at most 2N state traversals when processing a string of length N. In comparison to the D2FA approach, our technique achieves comparable levels of compression, with lower provable bounds on memory bandwidth (or greater compression for a given bandwidth bound). Moreover, our proposed algorithm has lower complexity, is suitable for scenarios where a compressed DFA needs to be dynamically built or updated, and fosters locality in the traversal process. Finally, we also describe a novel alphabet reduction scheme for DFA-based structures that can yield further dramatic reductions in data structure size.
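
The trade-off mentioned in this abstract, between the number of stored transitions and the number of states visited per character, can be illustrated with a toy automaton that uses default transitions. The sketch below is a hand-built example for inputs ending in "ab" over the alphabet {a, b, c}; the names `LABELED`, `DEFAULT`, `ACCEPTING` and `run` are hypothetical, and the code does not implement the paper's compression algorithm, only the idea of falling back to a default state.

```python
# Toy automaton recognising inputs that end in "ab" over the alphabet {a, b, c}.
# State 0 keeps all of its transitions; states 1 and 2 keep only the
# transitions that differ from state 0 and otherwise fall back to it.
LABELED = {
    0: {"a": 1, "b": 0, "c": 0},
    1: {"b": 2},
    2: {},
}
DEFAULT = {1: 0, 2: 0}  # default transitions; they do not consume input
ACCEPTING = {2}


def run(text):
    """Return (final_state, traversals) after processing text."""
    state, traversals = 0, 0
    for ch in text:
        # Follow default transitions until a labeled transition on ch exists.
        while ch not in LABELED[state]:
            state = DEFAULT[state]
            traversals += 1
        state = LABELED[state][ch]
        traversals += 1
    return state, traversals


if __name__ == "__main__":
    state, traversals = run("aab")
    print(state in ACCEPTING, traversals)
```

In this toy automaton every default transition leads to state 0, which stores a transition for each symbol, so each character costs at most two traversals and an input of length N costs at most 2N.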

220 citations


Patent
27 Mar 2007
TL;DR: A monolithic, three dimensional NAND string includes a first memory cell located over a second memory cell, such that a defined boundary exists between the semiconductor active regions of the first memory cell and the second memory cell.
Abstract: A monolithic, three dimensional NAND string includes a first memory cell located over a second memory cell. A semiconductor active region of the first memory cell is formed epitaxially on a semiconductor active region of the second memory cell, such that a defined boundary exists between the semiconductor active region of the first memory cell and the semiconductor active region of the second memory cell.

215 citations


Proceedings Article
23 Sep 2007
TL;DR: A novel technique, called VGRAM, to judiciously choose high-quality grams of variable lengths from a collection of strings to support queries on the collection, and shows the significant performance improvements on three existing algorithms.
Abstract: Many applications need to solve the following problem of approximate string matching: from a collection of strings, how to find those similar to a given string, or the strings in another (possibly the same) collection of strings? Many algorithms are developed using fixed-length grams, which are substrings of a string used as signatures to identify similar strings. In this paper we develop a novel technique, called VGRAM, to improve the performance of these algorithms. Its main idea is to judiciously choose high-quality grams of variable lengths from a collection of strings to support queries on the collection. We give a full specification of this technique, including how to select high-quality grams from the collection, how to generate variable-length grams for a string based on the preselected grams, and what is the relationship between the similarity of the gram sets of two strings and their edit distance. A primary advantage of the technique is that it can be adopted by a plethora of approximate string algorithms without the need to modify them substantially. We present our extensive experiments on real data sets to evaluate the technique, and show the significant performance improvements on three existing algorithms.
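
As a point of reference for the fixed-length grams this abstract builds on, here is a minimal sketch of q-gram extraction and the classic count filter (not VGRAM itself). It assumes unpadded grams, for which two strings within edit distance k share at least max(|s|, |t|) - q + 1 - k*q grams; the function names are hypothetical.

```python
from collections import Counter


def grams(s, q):
    """Return the fixed-length q-grams of s, in order (no padding)."""
    return [s[i:i + q] for i in range(len(s) - q + 1)]


def common_grams(s, t, q):
    """Size of the multiset intersection of the two gram collections."""
    return sum((Counter(grams(s, q)) & Counter(grams(t, q))).values())


def may_be_within(s, t, q, k):
    """Count filter: False means the edit distance is certainly above k.

    True only means the pair survives the filter and still has to be
    verified with a real edit distance computation.
    """
    needed = max(len(s), len(t)) - q + 1 - k * q
    return common_grams(s, t, q) >= needed


if __name__ == "__main__":
    print(grams("string", 2))
    print(may_be_within("string", "strong", q=2, k=1))
```

The filter only prunes candidates. Pairs that pass it must still be checked exactly, which is why the choice of grams affects performance but not correctness.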

198 citations


Book ChapterDOI
08 Sep 2007
TL;DR: This work shows sequence assembly to be NP-hard under two different models: string graphs and de Bruijn graphs, and gives the first, to the authors' knowledge, optimal polynomial time algorithm for genome assembly that explicitly models the double-strandedness of DNA.
Abstract: Graph-theoretic models have come to the forefront as some of the most powerful and practical methods for sequence assembly. Simultaneously, the computational hardness of the underlying graph algorithms has remained open. Here we present two theoretical results about the complexity of these models for sequence assembly. In the first part, we show sequence assembly to be NP-hard under two different models: string graphs and de Bruijn graphs. Together with an earlier result on the NP-hardness of overlap graphs, this demonstrates that all of the popular graph-theoretic sequence assembly paradigms are NP-hard. In our second result, we give the first, to our knowledge, optimal polynomial time algorithm for genome assembly that explicitly models the double-strandedness of DNA. We solve the Chinese Postman Problem on bidirected graphs using bidirected flow techniques and show how to use it to find the shortest double-stranded DNA sequence which contains a given set of k-long words. This algorithm has applications to sequencing by hybridization and short read assembly.
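
For readers unfamiliar with the de Bruijn model named in this abstract, the sketch below builds the graph for a set of k-long words: nodes are (k-1)-long strings and each word contributes one edge from its prefix to its suffix. This is a single-stranded simplification under assumed names (`de_bruijn` is hypothetical); it does not model double-strandedness or the bidirected flow algorithm of the paper.

```python
from collections import defaultdict


def de_bruijn(words):
    """Map each (k-1)-long prefix to the (k-1)-long suffixes it points to."""
    graph = defaultdict(list)
    for w in words:
        graph[w[:-1]].append(w[1:])  # one edge per k-long word
    return dict(graph)


if __name__ == "__main__":
    # The 3-long words of "ACGTA" form a simple path AC -> CG -> GT -> TA.
    print(de_bruijn(["ACG", "CGT", "GTA"]))
```

A sequence containing every word corresponds to a walk that uses every edge, which is why the assembly problem is phrased in terms of postman-style tours.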

196 citations


Journal Article
TL;DR: In this paper, the authors consider spiking neural P systems as binary string generators, where the set of spike trains of halting computations of a given system constitutes the language generated by that system.
Abstract: We continue the study of spiking neural P systems by considering these computing devices as binary string generators: the set of spike trains of halting computations of a given system constitutes the language generated by that system. Although the "direct" generative capacity of spiking neural P systems is rather restricted (some very simple languages cannot be generated in this framework), regular languages are inverse-morphic images of languages of finite spiking neural P systems, and recursively enumerable languages are projections of inverse-morphic images of languages generated by spiking neural P systems.

170 citations


Patent
29 Nov 2007
TL;DR: In this paper, a system and method are provided for identifying active content in websites on a network, which includes a method of classifying web addresses and generating a score indicative of the reputation, or likelihood that a web site associated with an uncategorized URL contains active or other targeted content based on an analysis of the URL.
Abstract: A system and method are provided for identifying active content in websites on a network. One embodiment includes a method of classifying web addresses. One embodiment may include a method of generating a score indicative of the reputation, or likelihood that a web site associated with an uncategorized URL contains active or other targeted content based on an analysis of the URL. In certain embodiments, the score is determined solely from the URL string. Other embodiments include systems configured to perform such methods.

Proceedings ArticleDOI
09 Jul 2007
TL;DR: It is shown that string stability can be achieved for heterogeneous vehicle strings of arbitrary length and arbitrary vehicle type ordering, and the necessary and sufficient conditions for heterogeneous string stability are given for the constant spacing leader-predecessor following control strategy.
Abstract: The spacing errors of a string stable, homogeneous vehicle string attenuate uniformly down the vehicle chain. This result is useful for implementing vehicle formation control because it provides a guideline for the proper intervehicle spacing. In the heterogeneous case, the differing dynamics of the vehicles means the spacing errors do not attenuate or amplify uniformly down the vehicle chain, regardless of whether the formation is string stable or not. Questions arise regarding how heterogeneous string stability should be defined, and what should the proper intervehicle spacing be in order to guarantee nominal safety. In this paper, heterogeneous vehicle strings under simple decentralized control laws with the constant spacing control policy are analyzed. A definition for heterogeneous string stability is proposed. The necessary and sufficient conditions for heterogeneous string stability are given for the constant spacing leader-predecessor following control strategy. The scalability of the control scheme is verified by analyzing the worst case disturbance to error gain. It is shown that string stability can be achieved for heterogeneous vehicle strings of arbitrary length and arbitrary vehicle type ordering.

Patent
15 Nov 2007
TL;DR: A speech processing system includes a multiplexer that receives speech data input as part of a conversation turn in a conversation session between two or more users where one user is a speaker and each of the other users is a listener in each conversation turn.
Abstract: A speech processing system includes a multiplexer that receives speech data input as part of a conversation turn in a conversation session between two or more users where one user is a speaker and each of the other users is a listener in each conversation turn. A speech recognizing engine converts the speech data to an input string of acoustic data while a speech modifier forms an output string based on the input string by changing an item of acoustic data according to a rule. The system also includes a phoneme speech engine for converting the first output string of acoustic data including modified and unmodified data to speech data for output via the multiplexer to listeners during the conversation turn.

Journal ArticleDOI
TL;DR: A simple and intuitive approach to determining the kinematic parameters of a serial-link robot in Denavit-Hartenberg (DH) notation; the algebraic procedure is amenable to computer algebra manipulation, and a Java program is available as supplementary downloadable material.
Abstract: This paper presents a simple and intuitive approach to determining the kinematic parameters of a serial-link robot in Denavit-Hartenberg (DH) notation. Once a manipulator's kinematics is parameterized in this form, a large body of standard algorithms and code implementations for kinematics, dynamics, motion planning, and simulation are available. The proposed method has two parts. The first is the "walk through," a simple procedure that creates a string of elementary translations and rotations, from the user-defined base coordinate to the end-effector. The second step is an algebraic procedure to manipulate this string into a form that can be factorized as link transforms, which can be represented in standard or modified DH notation. The method allows for an arbitrary base and end-effector coordinate system as well as an arbitrary zero joint angle pose. The algebraic procedure is amenable to computer algebra manipulation and a Java program is available as supplementary downloadable material.

Book ChapterDOI
09 Jul 2007
TL;DR: A variation of the ring signature scheme is offered, where the signer is guaranteed anonymity even if the common reference string is maliciously generated, and an additional feature of this scheme is that it has perfect anonymity.
Abstract: Ring signatures, introduced by Rivest, Shamir and Tauman, enable a user to sign a message anonymously on behalf of a "ring". A ring is a group of users, which includes the signer. We propose a ring signature scheme that has size O(√N) where N is the number of users in the ring. An additional feature of our scheme is that it has perfect anonymity. Our ring signature like most other schemes uses the common reference string model. We offer a variation of our scheme, where the signer is guaranteed anonymity even if the common reference string is maliciously generated.

Journal ArticleDOI
TL;DR: A very fast new family of string matching algorithms based on hashing q-grams are proposed, which are the fastest on many cases, in particular, on small size alphabets.

Patent
10 Oct 2007
TL;DR: In this article, a technique is provided to facilitate use of a service tool at a downhole location, which has different operational configurations that can be selected and used without moving the service string.
Abstract: A technique is provided to facilitate use of a service tool at a downhole location. The service tool has different operational configurations that can be selected and used without moving the service string.

Proceedings ArticleDOI
07 Jan 2007
TL;DR: In this article, a storage scheme is proposed for a string S[1, n] drawn from an alphabet Σ that requires space close to the k-th order empirical entropy of S and allows any l-long substring of S to be retrieved in optimal O(1 + l / log_{|Σ|} n) time.
Abstract: We propose a storage scheme for a string S[1, n], drawn from an alphabet Σ, that requires space close to the k-th order empirical entropy of S, and allows any l-long substring of S to be retrieved in optimal O(1 + l / log_{|Σ|} n) time. This matches the best known bounds [14, 7], via the use of binary encodings and tables only. We also apply this storage scheme to prove new time vs space trade-offs for compressed self-indexes [5, 12] and the Burrows-Wheeler Transform [2].

Proceedings ArticleDOI
07 Jan 2007
TL;DR: This paper defines and designs succinct indexes for several abstract data types, namely strings, binary relations and multi-labeled trees, and designs a succinct encoding that represents a string of length n over an alphabet of size σ using nH_k + o(n lg σ) bits to support access/rank/select operations.
Abstract: We define and design succinct indexes for several abstract data types (ADTs). The concept is to design auxiliary data structures that occupy asymptotically less space than the information-theoretic lower bound on the space required to encode the given data, and support an extended set of operations using the basic operators defined in the ADT. As opposed to succinct (integrated data/index) encodings, the main advantage of succinct indexes is that we make assumptions only on the ADT through which the main data is accessed, rather than the way in which the data is encoded. This allows more freedom in the encoding of the main data. In this paper, we present succinct indexes for various data types, namely strings, binary relations and multi-labeled trees. Given the support for the interface of the ADTs of these data types, we can support various useful operations efficiently by constructing succinct indexes for them. When the operators in the ADTs are supported in constant time, our results are comparable to previous results, while allowing more flexibility in the encoding of the given data. Using our techniques, we design a succinct encoding that represents a string of length n over an alphabet of size σ using nH_k + o(n lg σ) bits to support access/rank/select operations in o((lg lg σ)^3) time. We also design a succinct text index using nH_k + o(n lg σ) bits that supports pattern matching queries in O(m lg lg σ + occ lg^{1+ε} n lg lg σ) time, for a given pattern of length m. Previous results on these two problems either have a lg σ factor instead of lg lg σ in terms of running time, or are not compressible.

Proceedings ArticleDOI
23 Jun 2007
TL;DR: This work tries to find out if a nearly unmodified state-of-the-art translation system is able to cope with the problem and whether it is capable to further generalize translation rules, for example at the level of word suffixes and translation of unseen words.
Abstract: Current statistical machine translation systems handle the translation process as the transformation of a string of symbols into another string of symbols. Normally the symbols dealt with are the words in different languages, sometimes with some additional information included, like morphological data. In this work we try to push the approach to the limit, working not on the level of words, but treating both the source and target sentences as a string of letters. We try to find out if a nearly unmodified state-of-the-art translation system is able to cope with the problem and whether it is capable to further generalize translation rules, for example at the level of word suffixes and translation of unseen words. Experiments are carried out for the translation of Catalan to Spanish.

Proceedings ArticleDOI
03 Sep 2007
TL;DR: String algorithms are explored to find suitable data structures and algorithms for efficient token based clone detection and implemented them in the tool Repeated Tokens Finder (RTF), which incorporates a suffix array based linear time algorithm to detect string matches.
Abstract: Code clones are similar code fragments that occur at multiple locations in a software system. Detection of code clones provides useful information for maintenance, reengineering, program understanding and reuse. Several techniques have been proposed to detect code clones. These techniques differ in the code representation used for analysis of clones, ranging from plain text to parse trees and program dependence graphs. Clone detection based on lexical tokens involves minimal code transformation and gives good results, but is computationally expensive because of the large number of tokens that need to be compared. We explored string algorithms to find suitable data structures and algorithms for efficient token based clone detection and implemented them in our tool Repeated Tokens Finder (RTF). Instead of using suffix tree for string matching, we use more memory efficient suffix array. RTF incorporates a suffix array based linear time algorithm to detect string matches. It also provides a simple and customizable tokenization mechanism. Initial analysis and experiments show that our clone detection is simple, scalable, and performs better than the previous well-known tools.
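
The suffix array idea in this abstract can be sketched on a token sequence as follows. This is a naive illustration under assumed names (`suffix_array`, `common_prefix_len` and `longest_repeat` are hypothetical, not part of RTF): it sorts suffixes directly rather than using a linear time construction, and it reports only one longest repeated token run.

```python
def suffix_array(tokens):
    """Indices of all suffixes of tokens, in lexicographic order (naive build)."""
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def common_prefix_len(a, b):
    """Number of leading tokens shared by sequences a and b."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def longest_repeat(tokens):
    """Return (length, pos1, pos2) of a longest token run occurring twice."""
    sa = suffix_array(tokens)
    best = (0, -1, -1)
    # A longest repeat always shows up between neighbours in the suffix array.
    for i, j in zip(sa, sa[1:]):
        n = common_prefix_len(tokens[i:], tokens[j:])
        if n > best[0]:
            best = (n, min(i, j), max(i, j))
    return best


if __name__ == "__main__":
    code = ["int", "i", "=", "0", ";", "int", "j", "=", "0", ";"]
    print(longest_repeat(code))
```

The two reported occurrences may overlap in highly repetitive input; a clone detector would typically filter such matches and apply a minimum length threshold.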

Journal ArticleDOI
TL;DR: This note improves the known upper bound on the number of squares in a word of length n from 2n to 2n − Θ(log n); the conjectured bound is n.

Patent
15 Mar 2007
TL;DR: In this paper, the authors described techniques for automatic generation of one or more tags associated with an image file using hand-written annotations for a displayed image and handwriting recognition processing of the ink annotations.
Abstract: Techniques are described for performing automatic generation of one or more tags associated with an image file. One or more ink annotations for a displayed image are received. Handwriting recognition processing of the one or more ink annotations is performed. A string is generated and the string includes one or more recognized words used to form the one or more tags associated with the image file. The handwriting recognition processing and generating the string are performed in response to receiving the ink annotations.

Book ChapterDOI
21 Feb 2007
TL;DR: A new protocol for blind signatures in which security is preserved even under arbitrarily-many concurrent executions is shown, which is the first to be proven secure in a concurrent setting without random oracles or a trusted setup assumption such as a common reference string.
Abstract: We show a new protocol for blind signatures in which security is preserved even under arbitrarily-many concurrent executions. The protocol can be based on standard cryptographic assumptions and is the first to be proven secure in a concurrent setting (under any assumptions) without random oracles or a trusted setup assumption such as a common reference string. Along the way, we also introduce new definitions of security for blind signature schemes.

Patent
11 Oct 2007
TL;DR: In this article, the authors describe a plant for converting solar energy into electrical energy, comprising a photovoltaic generator (2a) including at least one string (2) of photovoltaic modules (M), a pulse generator (31) able to send electrical pulses to the input of the string, a signal detector (OP) arranged at the output of the string and able to detect the presence of a signal which is a function of the electrical pulses at the input, and alarm means connected to the signal detector that can generate an alarm in the event that there is no signal at the output of the string.
Abstract: There is described a plant (1) for converting solar energy into electrical energy, comprising a photovoltaic generator (2a) including at least one string (2) of photovoltaic modules (M), a pulse generator (31) able to send electrical pulses to the input of the string (2), a signal detector (OP) arranged at the output of the string (2) and able to detect, at the output of the string (2), the presence of a signal which is a function of the electrical pulses at the input, and alarm means connected to the signal detector (OP) and able to generate an alarm in the event that there is no signal at the output of the string (2).

Patent
29 Nov 2007
TL;DR: In this article, a local boosted channel inhibit scheme was proposed to reduce program disturb in a NAND Flash memory cell string where no programming from the erased state is desired, where the selected memory cell was decoupled from the other cells in the NAND string.
Abstract: A method for minimizing program disturb in Flash memories. To reduce program disturb in a NAND Flash memory cell string where no programming from the erased state is desired, a local boosted channel inhibit scheme is used. In the local boosted channel inhibit scheme, the selected memory cell in a NAND string where no programming is desired, is decoupled from the other cells in the NAND string. This allows the channel of the decoupled cell to be locally boosted to a voltage level sufficient for inhibiting F-N tunneling when the corresponding wordline is raised to a programming voltage. Due to the high boosting efficiency, the pass voltage applied to the gates of the remaining memory cells in the NAND string can be reduced relative to prior art schemes, thereby minimizing program disturb while allowing for random page programming.

Book ChapterDOI
19 Aug 2007
TL;DR: This paper defines multi-string non-interactive zero-knowledge proofs and proves that they exist under general cryptographic assumptions, and suggests a universally composable commitment scheme in the multistring model.
Abstract: The common random string model introduced by Blum, Feldman and Micali permits the construction of cryptographic protocols that are provably impossible to realize in the standard model. We can think of this model as a trusted party generating a random string and giving it to all parties in the protocol. However, the introduction of such a third party should set alarm bells going off: Who is this trusted party? Why should we trust that the string is random? Even if the string is uniformly random, how do we know it does not leak private information to the trusted party? The very point of doing cryptography in the first place is to prevent us from trusting the wrong people with our secrets. In this paper, we propose the more realistic multi-string model. Instead of having one trusted authority, we have several authorities that generate random strings. We do not trust any single authority; we only assume a majority of them generate the random string honestly. This security model is reasonable, yet at the same time it is very easy to implement. We could for instance imagine random strings being provided on the Internet, and any set of parties that want to execute a protocol just need to agree on which authorities' strings they want to use. We demonstrate the use of the multi-string model in several fundamental cryptographic tasks. We define multi-string non-interactive zero-knowledge proofs and prove that they exist under general cryptographic assumptions. Our multistring NIZK proofs have very strong security properties such as simulation-extractability and extraction zero-knowledge, which makes it possible to compose them with arbitrary other protocols and to reuse the random strings. We also build efficient simulation-sound multi-string NIZK proofs for circuit satisfiability based on groups with a bilinear map. The sizes of these proofs match the best constructions in the single common random string model. 
We suggest a universally composable commitment scheme in the multistring model. It has been proven that UC commitment does not exist in the plain model without setup assumptions. Prior to this work, constructions were only known in the common reference string model and the registered public key model. One of the applications of the UC commitment scheme is a coin-flipping protocol in the multi-string model. Armed with the coin-flipping protocol, we can securely realize any multi-party computation protocol.

Journal ArticleDOI
TL;DR: It is shown that ρ(n)≤n and there are at most 0.67n runs with periods larger than 87, which supports the conjecture that the number of all runs is smaller than n.
Abstract: A run in a string is a nonextendable (with the same minimal period) periodic segment in a string. The set of runs corresponds to the structure of internal periodicities in a string. Periodicities in strings were extensively studied and are important both in theory and practice (combinatorics of words, pattern-matching, computational biology). Let ρ(n) be the maximal number of runs in a string of length n. It has been shown that ρ(n)=O(n), but the proof was very complicated and the constant coefficient in O(n) has not been given explicitly. We demystify the proof of the linear upper bound for ρ(n) and propose a new approach to the analysis of runs based on the properties of subperiods: the periods of periodic parts of the runs. We show that ρ(n)≤n and there are at most 0.67n runs with periods larger than 87. This supports the conjecture that the number of all runs is smaller than n. We also give a completely new proof of the linear bound and discover several new interesting "periodicity lemmas".
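
To make the definition of a run concrete, here is a minimal brute-force sketch in Python. It is not the analysis from the paper. It assumes the usual convention that a segment counts as periodic when its length is at least twice its minimal period. The function names `runs` and `minimal_period` are hypothetical, and the quadratic-or-worse running time is only suitable for short strings.

```python
def minimal_period(s):
    """Smallest q >= 1 such that s[k] == s[k + q] for every valid k."""
    n = len(s)
    for q in range(1, n + 1):
        if all(s[k] == s[k + q] for k in range(n - q)):
            return q
    return n


def runs(w):
    """Return all runs of w as (start, end, period) with inclusive indices.

    Assumption: a run is a maximal segment whose length is at least twice
    its minimal period and that cannot be extended left or right while
    keeping that period.
    """
    n = len(w)
    found = []
    for p in range(1, n // 2 + 1):
        k = 0
        while k + p < n:
            if w[k] != w[k + p]:
                k += 1
                continue
            a = k
            # Extend the stretch of positions k with w[k] == w[k + p].
            while k + p < n and w[k] == w[k + p]:
                k += 1
            start, end = a, k - 1 + p
            length = end - start + 1
            # Keep the segment only if p is its minimal period, so each
            # run is reported exactly once.
            if length >= 2 * p and minimal_period(w[start:end + 1]) == p:
                found.append((start, end, p))
    return sorted(found)


if __name__ == "__main__":
    for start, end, period in runs("mississippi"):
        print(start, end, period, "mississippi"[start:end + 1])
```

For "mississippi" the sketch reports four runs: "ississi" with period 3 and the three squares "ss", "ss" and "pp" with period 1.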

Proceedings ArticleDOI
01 Oct 2007
TL;DR: This work describes a more natural style of programming that yields code that is impervious to injections by construction, and automatically generates code that maps the embedded language to constructs in the host language that reconstruct the embedded sentences, adding escaping functions where appropriate.
Abstract: Software written in one language often needs to construct sentences in another language, such as SQL queries, XML output, or shell command invocations. This is almost always done using unhygienic string manipulation, the concatenation of constants and client-supplied strings. A client can then supply specially crafted input that causes the constructed sentence to be interpreted in an unintended way, leading to an injection attack. We describe a more natural style of programming that yields code that is impervious to injections by construction. Our approach embeds the grammars of the guest languages (e.g., SQL) into that of the host language (e.g., Java) and automatically generates code that maps the embedded language to constructs in the host language that reconstruct the embedded sentences, adding escaping functions where appropriate. This approach is generic, meaning that it can be applied with relative ease to any combination of host and guest languages.
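
The injection problem described above can be illustrated with a short Python sketch. It does not implement the grammar-embedding approach of the paper. Instead it contrasts unhygienic concatenation with parameterized queries, a different and widely used mitigation, using the standard-library `sqlite3` module. The table layout, the helper names and the crafted input are all hypothetical.

```python
import sqlite3


def make_db():
    """Create a throwaway in-memory database with two example rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def lookup_unhygienic(conn, name):
    """Build the query by string concatenation (vulnerable)."""
    query = "SELECT secret FROM users WHERE name = '" + name + "'"
    return [row[0] for row in conn.execute(query)]


def lookup_parameterized(conn, name):
    """Pass the client-supplied value as a bound parameter instead."""
    query = "SELECT secret FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]


if __name__ == "__main__":
    conn = make_db()
    crafted = "nobody' OR '1'='1"
    # The concatenated query is parsed as: name = 'nobody' OR '1'='1'
    print(sorted(lookup_unhygienic(conn, crafted)))
    # The bound parameter is treated as a plain string value.
    print(lookup_parameterized(conn, crafted))
```

With the crafted input, the concatenated version returns every secret in the table, while the parameterized version returns nothing because no user has that literal name.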

Journal ArticleDOI
TL;DR: A rate distortion problem is solved that is motivated by a quantum data compression problem to send information about a source string x so that a receiver can construct a second string y for which the joint empirical probability distribution of x and y is close to some desired distribution.
Abstract: A rate distortion problem is solved that is motivated by a quantum data compression problem. The goal is to send information about a source string x so that a receiver can construct a second string y for which the joint empirical probability distribution of x and y is close to some desired distribution. The problem differs from the usual rate distortion problems in that one must consider both remote sources and distortion functions that are not averages of per-letter distortion functions.
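
As a small illustration of the quantity in this abstract, the Python sketch below computes the joint empirical distribution of two equal-length strings and its distance from a target distribution. The abstract does not say how closeness is measured; total variation distance is used here purely as an assumption. The function names are hypothetical.

```python
from collections import Counter


def joint_empirical(x, y):
    """Joint empirical distribution of the letter pairs (x[i], y[i])."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    counts = Counter(zip(x, y))
    return {pair: c / n for pair, c in counts.items()}


def total_variation(p, q):
    """Total variation distance between two distributions given as dicts.

    Assumption: this is one possible closeness measure, not necessarily
    the one used in the paper.
    """
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


if __name__ == "__main__":
    # Hypothetical target: x and y uniform and independent over {0, 1}.
    target = {(a, b): 0.25 for a in "01" for b in "01"}
    print(total_variation(joint_empirical("0011", "0101"), target))
    print(total_variation(joint_empirical("0011", "0011"), target))
```

Note that the joint empirical distribution depends on the whole pair of strings at once, which fits the abstract's remark that the relevant distortion functions are not averages of per-letter distortions.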