Topic

String (computer science)

About: String (computer science) is a research topic. Over its lifetime, 19,430 publications have appeared within this topic, receiving 333,247 citations. The topic is also known as: str & s.


Papers
Posted Content
Ely Porat
TL;DR: This work suggests a method for holding a dictionary data structure, which maps keys to values, in the spirit of Bloom Filters, and also suggests a data structure that requires only nk bits of space, has O(n^2) preprocessing time, and has O(log n) query time.
Abstract: We suggest a method for holding a dictionary data structure, which maps keys to values, in the spirit of Bloom Filters. The space requirements of the dictionary we suggest are much smaller than those of a hashtable. We allow storing n keys, each mapped to a value which is a string of k bits. Our suggested method requires nk + o(n) bits of space to store the dictionary and O(n) time to produce the data structure, and allows answering a membership query in O(1) memory probes. The dictionary size does not depend on the size of the keys. However, reducing the space requirements of the data structure comes at a certain cost. Our dictionary has a small probability of a one-sided error. When attempting to obtain the value for a key that is stored in the dictionary, we always get the correct answer. However, when testing for membership of an element that is not stored in the dictionary, we may get an incorrect answer, and when requesting the value of such an element we may get a certain random value. Our method is based on solving equations in GF(2^k) and using several hash functions. Another significant advantage of our suggested method is that we do not require sophisticated hash functions. We only require pairwise independent hash functions. We also suggest a data structure that requires only nk bits of space, has O(n^2) preprocessing time, and has O(log n) query time. However, this data structure requires uniform hash functions. In order to replace a Bloom Filter of n elements with an error probability of 2^{-k}, we require nk + o(n) memory bits, O(1) query time, O(n) preprocessing time, and only pairwise independent hash functions. Even the most advanced previously known Bloom Filter would require nk + O(n) space and uniform hash functions, so our method is significantly less space consuming, especially when k is small.
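The idea of storing a dictionary as the solution of a linear system can be illustrated with a small sketch. This is not the construction from the paper: it is a simplified variant, written under several assumptions, in which each key is hashed to three table slots and the table is chosen so that the XOR of those three slots equals the key's value (with 0/1 coefficients, addition in GF(2^k) is just XOR of k-bit words). The class name `XorDictionary`, the SHA-256-based hashing, the table size of roughly 1.3n slots, and the plain Gaussian elimination are all illustrative choices, so the sketch does not reach the nk + o(n) space or O(n) preprocessing bounds stated in the abstract.

```python
import hashlib


class XorDictionary:
    """Static key -> k-bit value map stored as the solution of XOR equations.

    Illustrative sketch only: table size and hashing are assumptions,
    not the scheme analysed in the paper.
    """

    def __init__(self, items, max_tries=50):
        n = len(items)
        # Three equal segments, about 1.3 * n slots in total (assumed slack).
        self.seg = (13 * n) // 30 + 1
        for seed in range(max_tries):
            self.seed = seed
            self.table = self._solve(items)
            if self.table is not None:
                return
        raise RuntimeError("no solvable system found; enlarge the table")

    def _slots(self, key):
        # One slot per segment, so the three slots are always distinct.
        d = hashlib.sha256(f"{self.seed}:{key}".encode()).digest()
        return [j * self.seg + int.from_bytes(d[4 * j:4 * j + 4], "big") % self.seg
                for j in range(3)]

    def _solve(self, items):
        # Gaussian elimination over GF(2); each row is a bitmask of slots.
        pivots = {}
        for key, value in items.items():
            row = 0
            for s in self._slots(key):
                row |= 1 << s
            rhs = value
            while row:
                p = row.bit_length() - 1  # highest slot still in the row
                if p not in pivots:
                    pivots[p] = (row, rhs)
                    break
                prow, prhs = pivots[p]
                row ^= prow
                rhs ^= prhs
            else:
                if rhs:  # 0 = nonzero: inconsistent, try another seed
                    return None
        # Back-substitution in increasing pivot order; free slots stay 0.
        table = [0] * (3 * self.seg)
        for p in sorted(pivots):
            row, rhs = pivots[p]
            rest = row ^ (1 << p)
            while rest:
                low = rest & -rest
                rhs ^= table[low.bit_length() - 1]
                rest ^= low
            table[p] = rhs
        return table

    def get(self, key):
        # Three probes; a key that was never stored yields an arbitrary value.
        a, b, c = self._slots(key)
        return self.table[a] ^ self.table[b] ^ self.table[c]
```

Note that the table stores only values, never keys, which is why its size does not depend on key length. To use such a structure as a Bloom Filter replacement, one could store a k-bit fingerprint of each key as its value and compare it at query time; a key that was not stored would then match with probability about 2^{-k}.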

65 citations

Proceedings ArticleDOI
23 Aug 1992
TL;DR: This chapter presents evidence for preferring to extract semantic information from a syntactic analysis of a dictionary definition rather than directly from the definition string itself when the information to be extracted is found in the differentiae.
Abstract: This chapter presents evidence for preferring to extract semantic information from a syntactic analysis of a dictionary definition rather than directly from the definition string itself when the information to be extracted is found in the differentiae. We present examples of how very complex information can be extracted from the differentiae of the definition using structural analysis patterns, and why string patterns would fail to do the same.

65 citations

Proceedings ArticleDOI
22 Jun 2013
TL;DR: An expansion-based framework to measure string similarities efficiently while considering synonyms is presented, and an estimator is developed to approximate the size of candidates, enabling an online selection of signature filters to further improve efficiency.
Abstract: A string similarity measure quantifies the similarity between two text strings for approximate string matching or comparison. For example, the strings "Sam" and "Samuel" can be considered similar. Most existing work that computes the similarity of two strings only considers syntactic similarities, e.g., number of common words or q-grams. While these are indeed indicators of similarity, there are many important cases where syntactically different strings can represent the same real-world object. For example, "Bill" is a short form of "William". Given a collection of predefined synonyms, the purpose of the paper is to explore such existing knowledge to evaluate string similarity measures more effectively and efficiently, thereby boosting the quality of string matching. In particular, we first present an expansion-based framework to measure string similarities efficiently while considering synonyms. Because using synonyms in similarity measures is, while expressive, computationally expensive (NP-hard), we propose an efficient algorithm, called selective-expansion, which guarantees optimality in many real scenarios. We then study a novel indexing structure called SI-tree, which combines both signature and length filtering strategies, for efficient string similarity joins with synonyms. We develop an estimator to approximate the size of candidates to enable an online selection of signature filters to further improve the efficiency. This estimator provides strong low-error, high-confidence guarantees while requiring only logarithmic space and time costs, thus making our method attractive both in theory and in practice. Finally, the results from an empirical study of the algorithms verify the effectiveness and efficiency of our approach.
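A rough sketch may help show why synonym expansion changes a syntactic similarity score. It is a naive "expand everything" variant using token-level Jaccard similarity, not the paper's selective-expansion algorithm or SI-tree index; the function names and the synonym list format are assumptions made for illustration.

```python
def tokens(s):
    """Lower-cased whitespace tokens of a string, as a set."""
    return set(s.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two token sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def expand(token_set, synonyms):
    """Add the other side of every synonym pair whose one side is present.

    `synonyms` is a list of (lhs, rhs) string pairs (hypothetical format).
    """
    expanded = set(token_set)
    for lhs, rhs in synonyms:
        lhs_t, rhs_t = tokens(lhs), tokens(rhs)
        if lhs_t <= token_set:
            expanded |= rhs_t
        if rhs_t <= token_set:
            expanded |= lhs_t
    return expanded


def similarity_with_synonyms(s, t, synonyms):
    """Jaccard similarity after expanding both strings with all synonyms."""
    return jaccard(expand(tokens(s), synonyms), expand(tokens(t), synonyms))
```

With the single synonym pair ("Bill", "William"), the strings "Bill Gates" and "William Gates" share one of three tokens before expansion, and all three tokens after it. Applying every applicable rule is not always the best choice, since an unhelpful expansion can add tokens that lower the score, which is the motivation the abstract gives for choosing expansions selectively.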

65 citations

Proceedings Article
01 Jan 2009
TL;DR: This work presents two new algorithms for the case where the text is fixed and many queries arrive over time, and iteratively constructs a linear size data structure which then allows answering queries in constant time, for many queries even during the construction phase.
Abstract: The Parikh vector of a string s over a finite ordered alphabet Σ = {a_1, ..., a_σ} is defined as the vector of multiplicities of the characters, i.e., p(s) = (p_1, ..., p_σ), where p_i = |{j | s_j = a_i}|. A Parikh vector q occurs in s if s has a substring t with p(t) = q. The problem of searching for a query q in a text s of length n can be solved simply and optimally with a sliding window approach in O(n) time. We present two new algorithms for the case where the text is fixed and many queries arrive over time. The first algorithm finds all occurrences of a given Parikh vector in a text (over a fixed alphabet of size σ ≥ 2) and appears to have a sub-linear expected time complexity. The second algorithm only decides whether a given Parikh vector appears in a binary text; it iteratively constructs a linear size data structure which then allows answering queries in constant time, for many queries even during the construction phase.
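The sliding window baseline mentioned in the abstract can be sketched as follows. This illustrates only the simple O(n) approach for a fixed alphabet, not either of the two new algorithms; the function names are chosen here for illustration, and the sketch assumes every character of the text belongs to the given alphabet.

```python
from collections import Counter


def parikh_vector(s, alphabet):
    """Vector of character multiplicities of s, in alphabet order."""
    counts = Counter(s)
    return tuple(counts[a] for a in alphabet)


def find_occurrences(text, query, alphabet):
    """Start indices of all substrings of text whose Parikh vector is query.

    Assumes every character of text is in alphabet.
    """
    m = sum(query)  # every matching substring has exactly this length
    if m == 0 or m > len(text):
        return []
    index = {a: i for i, a in enumerate(alphabet)}
    target = list(query)
    window = [0] * len(alphabet)
    hits = []
    for pos, ch in enumerate(text):
        window[index[ch]] += 1  # character entering the window
        if pos >= m:
            window[index[text[pos - m]]] -= 1  # character leaving the window
        # The comparison costs O(sigma), constant for a fixed alphabet.
        if pos >= m - 1 and window == target:
            hits.append(pos - m + 1)
    return hits
```

For example, `find_occurrences("abcab", (1, 1, 0), "abc")` returns `[0, 3]`, the two positions where the substring "ab" occurs. Because the window length is fixed by the sum of the query's entries, each query costs a full pass over the text, which is the cost the indexed algorithms in the abstract aim to avoid when many queries arrive.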

65 citations

Journal ArticleDOI
TL;DR: This work presents a new technique for reducing one-out-of-two string OT, based on so-called privacy amplification, that is more efficient in terms of the number of required realizations of bit OT, and allows for reducing string OT to (apparently) much weaker primitives.
Abstract: Oblivious transfer (OT) is an important primitive in cryptography. In chosen one-out-of-two string OT, a sender offers two strings, one of which the other party, called the receiver, can choose to read, not learning any information about the other string. The sender, on the other hand, does not obtain any information about the receiver's choice. We consider the problem of reducing this primitive to OT for single bits. Previous attempts at doing this were based on self-intersecting codes. We present a new technique for the same task, based on so-called privacy amplification. It is shown that our method has two important advantages over the previous approaches. First, it is more efficient in terms of the number of required realizations of bit OT, and second, the technique even allows for reducing string OT to (apparently) much weaker primitives. An example of such a primitive is universal OT, where the receiver can adaptively choose what type of information he wants to obtain about the two bits sent by the sender, subject to the only constraint that some, possibly very small, uncertainty must remain about the pair of bits.

65 citations


Network Information
Related Topics (5)
Time complexity
36K papers, 879.5K citations
88% related
Tree (data structure)
44.9K papers, 749.6K citations
86% related
Graph (abstract data type)
69.9K papers, 1.2M citations
85% related
Computational complexity theory
30.8K papers, 711.2K citations
82% related
Supervised learning
20.8K papers, 710.5K citations
80% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2022    2
2021    491
2020    704
2019    759
2018    816
2017    806