scispace - formally typeset
Topic

String (computer science)

About: String (computer science) is a research topic. Over its lifetime, 19,430 publications have been published within this topic, receiving 333,247 citations. The topic is also known as: str & s.


Papers
Patent
Nicola Cancedda, Sara Stymne
25 Jul 2011
TL;DR: A method and a system are disclosed for making word-merging decisions in translation into a target language with productive compounding; a merging system outputs decisions on whether to merge pairs of words in a translated text string.
Abstract: A method and a system for making merging decisions for a translation are disclosed, suited for use where the target language is a productive compounding one. The method includes outputting decisions on merging of pairs of words in a translated text string with a merging system. The merging system can include a set of stored heuristics and/or a merging model. In the case of heuristics, these can include a heuristic by which two consecutive words in the string are considered for merging if the first word is recognized as a compound modifier and their observed frequency $f_1$ as a closed compound word is larger than their observed frequency $f_2$ as a bigram. In the case of a merging model, it can be one that is trained on features associated with pairs of consecutive tokens of text strings in a training set and on predetermined merging decisions for those pairs. A translation in the target language is output based on the merging decisions for the translated text string.
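
The frequency heuristic described in the abstract can be sketched in Python as follows. This is a minimal illustration, not the patented system: the function name, the dictionary-based frequency tables, plain string concatenation as the merge operation, and greedy left-to-right merging are all assumptions, and the trained merging model is not shown.

```python
from typing import Dict, List, Set, Tuple


def merge_compounds(
    tokens: List[str],
    modifiers: Set[str],
    compound_freq: Dict[str, int],
    bigram_freq: Dict[Tuple[str, str], int],
) -> List[str]:
    """Merge consecutive word pairs using the frequency heuristic.

    A pair (w1, w2) becomes the closed compound w1 + w2 when w1 is a known
    compound modifier and f1 (corpus frequency of w1 + w2 written as one word)
    is larger than f2 (corpus frequency of "w1 w2" as a bigram).
    Assumptions: plain concatenation (no linking morphemes or case changes)
    and greedy left-to-right, non-overlapping merging.
    """
    merged: List[str] = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens):
            w1, w2 = tokens[i], tokens[i + 1]
            f1 = compound_freq.get(w1 + w2, 0)   # seen as one closed word
            f2 = bigram_freq.get((w1, w2), 0)    # seen as two separate words
            if w1 in modifiers and f1 > f2:
                merged.append(w1 + w2)
                i += 2                           # both tokens consumed
                continue
        merged.append(tokens[i])
        i += 1
    return merged


# Hypothetical counts, for illustration only.
print(merge_compounds(["der", "Apfel", "saft", "ist", "kalt"],
                      {"Apfel"}, {"Apfelsaft": 50}, {("Apfel", "saft"): 2}))
# ['der', 'Apfelsaft', 'ist', 'kalt']
```

If the bigram count were higher than the compound count, or if "Apfel" were not listed as a modifier, the two tokens would be left separate.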

76 citations

Journal Article
11 Nov 2002
TL;DR: The authors show experimentally that suffix trees can be used effectively for approximate string matching on biological data, and detail the requirements for further database and algorithmic research to support efficient use of large suffix indexes in biological applications.
Abstract: Our aim is to develop new database technologies for the approximate matching of unstructured string data using indexes. We explore the potential of the suffix tree data structure in this context. We present a new method of building suffix trees that allows us to build trees in excess of RAM size, which has hitherto not been possible. We show that this method performs in practice as well as the O(n) method of Ukkonen [70]. Using this method we build indexes for 200 Mb of protein and 300 Mbp of DNA, whose disk images exceed the available RAM. We show experimentally that suffix trees can be used effectively in approximate string matching with biological data. For a range of query lengths and error bounds, the suffix tree reduces the size of the unoptimised O(mn) dynamic programming calculation required to evaluate string similarity, and the gain from indexing increases with index size. In the indexes we built this reduction is significant: less than 0.3% of the expected matrix is evaluated. We detail the requirements for further database and algorithmic research to support efficient use of large suffix indexes in biological applications.
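
For context, the unoptimised O(mn) dynamic programming calculation that the suffix-tree index prunes is, in its textbook form, Sellers' variant of edit distance for approximate matching. The Python sketch below shows only that baseline, not the authors' suffix-tree method; the function name and the choice to report match end positions are assumptions.

```python
from typing import List


def approx_match_ends(pattern: str, text: str, k: int) -> List[int]:
    """End positions j such that some substring text[s:j] is within
    edit distance k of pattern (insertions, deletions, substitutions).

    Fills the m x n DP table column by column, so time is O(m*n) and
    extra space is O(m). Row 0 stays 0 so a match may start anywhere.
    """
    m = len(pattern)
    prev = list(range(m + 1))  # column for the empty text prefix
    ends: List[int] = [0] if prev[m] <= k else []
    for j in range(1, len(text) + 1):
        cur = [0] * (m + 1)  # cur[0] = 0: free starting point in the text
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == text[j - 1] else 1
            cur[i] = min(
                prev[i - 1] + cost,  # match or substitution
                prev[i] + 1,         # extra character in the text
                cur[i - 1] + 1,      # pattern character missing from the text
            )
        if cur[m] <= k:
            ends.append(j)
        prev = cur
    return ends


print(approx_match_ends("abc", "xxabdxx", 1))  # [4, 5]
```

Every one of the m x n cells is evaluated here; an index-based approach avoids computing most of them, which is the reduction the abstract quantifies.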

76 citations

Book Chapter
27 Jun 2001
TL;DR: This paper completes the picture by showing that MGs in the sense of [11] and LCFRSs in fact give rise to the same class of derivable string languages.
Abstract: The type of minimalist grammar (MG) introduced by Stabler [11,12] is an attempt at a rigorous algebraic formalization of the new perspectives adopted within the linguistic framework of transformational grammar following the change from GB theory to minimalism. Michaelis [6] has shown that MGs constitute a subclass of mildly context-sensitive grammars, in the sense that for each MG there is a weakly equivalent linear context-free rewriting system (LCFRS). However, [6] left open whether the respective classes of string languages derivable by MGs and LCFRSs coincide. This paper completes the picture by showing that MGs in the sense of [11] and LCFRSs in fact give rise to the same class of derivable string languages.
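
To make the LCFRS side of this result concrete, here is a minimal Python sketch of a tiny LCFRS of fan-out 2, a textbook-style example chosen for illustration and not taken from the paper. The nonterminal A derives pairs of strings via the rules A(ε, ε) and A(a x b, c y d) ← A(x, y), and the start rule S(x y) ← A(x, y) concatenates the pair, yielding the non-context-free language {aⁿbⁿcⁿdⁿ}. The function names are hypothetical, and the sketch does not model MGs or the paper's translation between the two formalisms.

```python
from typing import List, Tuple


def derive_A(depth: int) -> List[Tuple[str, str]]:
    """Pairs derivable from A using the recursive rule at most `depth` times."""
    pairs = [("", "")]                                 # A(ε, ε)
    for _ in range(depth):
        x, y = pairs[-1]
        pairs.append(("a" + x + "b", "c" + y + "d"))   # A(axb, cyd) <- A(x, y)
    return pairs


def derive_S(depth: int) -> List[str]:
    """Strings derivable from S via S(xy) <- A(x, y)."""
    return [x + y for x, y in derive_A(depth)]


print(derive_S(2))  # ['', 'abcd', 'aabbccdd']
```

The point of fan-out 2 is that A keeps two separate components growing in lockstep, which a single context-free nonterminal cannot do.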

76 citations

Posted Content
TL;DR: Two representations of a string of length N compressed into a context-free grammar of size n are presented, both achieving O(log N) random access time; several new techniques and data structures of independent interest are introduced, including a predecessor data structure, two "biased" weighted ancestor data structures, and a compact representation of heavy paths in grammars.
Abstract: Grammar based compression, where one replaces a long string by a small context-free grammar that generates the string, is a simple and powerful paradigm that captures many popular compression schemes. In this paper, we present a novel grammar representation that allows efficient random access to any character or substring without decompressing the string. Let $S$ be a string of length $N$ compressed into a context-free grammar $\mathcal{S}$ of size $n$. We present two representations of $\mathcal{S}$ achieving $O(\log N)$ random access time, and either $O(n\cdot \alpha_k(n))$ construction time and space on the pointer machine model, or $O(n)$ construction time and space on the RAM. Here, $\alpha_k(n)$ is the inverse of the $k^{th}$ row of Ackermann's function. Our representations also efficiently support decompression of any substring in $S$: we can decompress any substring of length $m$ in the same complexity as a single random access query and additional $O(m)$ time. Combining these results with fast algorithms for uncompressed approximate string matching leads to several efficient algorithms for approximate string matching on grammar-compressed strings without decompression. For instance, we can find all approximate occurrences of a pattern $P$ with at most $k$ errors in time $O(n(\min\{|P|k, k^4 + |P|\} + \log N) + occ)$, where $occ$ is the number of occurrences of $P$ in $S$. Finally, we generalize our results to navigation and other operations on grammar-compressed ordered trees. All of the above bounds significantly improve the currently best known results. To achieve these bounds, we introduce several new techniques and data structures of independent interest, including a predecessor data structure, two "biased" weighted ancestor data structures, and a compact representation of heavy paths in grammars.
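
As an illustration of random access without decompression, the following Python sketch stores a grammar-compressed string as a straight-line program (every rule is either a single character or a concatenation of two nonterminals), precomputes the expansion length of each nonterminal, and answers S[i] by walking down from the start symbol. This naive walk costs time proportional to the grammar's height, which can be as large as n; it is not the paper's O(log N) structure, which relies on the heavy-path and weighted-ancestor techniques described above. The rule format and function names are assumptions for illustration.

```python
from typing import Dict, Tuple, Union

# A rule is either a terminal character or a pair (left, right) of nonterminals.
Rule = Union[str, Tuple[str, str]]


def expansion_lengths(rules: Dict[str, Rule]) -> Dict[str, int]:
    """Length of each nonterminal's expansion, computed once in O(n) total."""
    lengths: Dict[str, int] = {}

    def visit(name: str) -> int:
        if name not in lengths:
            rule = rules[name]
            if isinstance(rule, str):
                lengths[name] = 1
            else:
                lengths[name] = visit(rule[0]) + visit(rule[1])
        return lengths[name]

    for name in rules:
        visit(name)
    return lengths


def access(rules: Dict[str, Rule], lengths: Dict[str, int], start: str, i: int) -> str:
    """Return S[i] by descending from `start`; never materializes S."""
    if not 0 <= i < lengths[start]:
        raise IndexError(i)
    name = start
    while True:
        rule = rules[name]
        if isinstance(rule, str):
            return rule
        left, right = rule
        if i < lengths[left]:
            name = left                # target lies in the left expansion
        else:
            i -= lengths[left]         # skip over the left expansion
            name = right


# X3 expands to "abababab" (N = 8) using only five rules.
rules: Dict[str, Rule] = {"A": "a", "B": "b", "X1": ("A", "B"),
                          "X2": ("X1", "X1"), "X3": ("X2", "X2")}
lengths = expansion_lengths(rules)
print(access(rules, lengths, "X3", 5))  # b
```

Only the five rules and their lengths are stored; the eight-character string itself is never built.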

76 citations

Patent
19 Mar 2001
TL;DR: In this patent, search criteria are represented as strings of beads in a three-dimensional scene, each bead representing a criterion and each string representing a different category; for example, drama, action, suspense, and horror may be included in a genre category.
Abstract: A user interface for querying and displaying records from a database employs a physical metaphor for the process of constructing queries and viewing results. The criteria are represented in displays as symbols that can be included in a query. The displayed symbols are ranked by their respective utility, which is inferred from the commands received to generate the queries. In one embodiment, the ranking is based on frequency of use. The ranking may be indicated by various display effects. For example, in one embodiment the search criteria are shown as strings of beads in a three-dimensional scene, each bead representing a criterion and each string representing a different category. The criteria drama, action, suspense, and horror might, for instance, be included in a genre category. Criteria are selected to form a query by moving the corresponding beads to a query string, which is then submitted to perform the search. Beads that correspond to highly ranked criteria are shown in the foreground of the scene, and those that correspond to lower-ranked criteria are shown in the background. The beads can be rotated from background to foreground with suitable commands.
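
The frequency-of-use ranking in the abstract can be sketched as follows. This is a hypothetical illustration: the function name, the query-log format, alphabetical tie-breaking, and the fixed foreground_size cutoff are all assumptions, and the three-dimensional bead rendering is left out entirely.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def rank_criteria(
    query_log: Iterable[List[str]],
    categories: Dict[str, List[str]],
    foreground_size: int = 2,
) -> Dict[str, Tuple[List[str], List[str]]]:
    """Split each category's criteria into (foreground, background) by usage.

    Utility is approximated by how often a criterion appeared in past queries
    (the frequency-of-use embodiment); ties are broken alphabetically.
    """
    usage = Counter(c for query in query_log for c in query)
    layout: Dict[str, Tuple[List[str], List[str]]] = {}
    for category, criteria in categories.items():
        ranked = sorted(criteria, key=lambda c: (-usage[c], c))
        layout[category] = (ranked[:foreground_size], ranked[foreground_size:])
    return layout


log = [["drama", "action"], ["drama"], ["horror", "drama"], ["action"]]
print(rank_criteria(log, {"genre": ["drama", "action", "suspense", "horror"]}))
# {'genre': (['drama', 'action'], ['horror', 'suspense'])}
```

Criteria never used in a query count as zero, so they fall to the background until selected.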

76 citations


Network Information
Related Topics (5)
Time complexity: 36K papers, 879.5K citations, 88% related
Tree (data structure): 44.9K papers, 749.6K citations, 86% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 85% related
Computational complexity theory: 30.8K papers, 711.2K citations, 82% related
Supervised learning: 20.8K papers, 710.5K citations, 80% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2022    2
2021    491
2020    704
2019    759
2018    816
2017    806