scispace - formally typeset
Search or ask a question
Topic

String (computer science)

About: String (computer science) is a research topic. Over the lifetime, 19430 publications have been published within this topic receiving 333247 citations. The topic is also known as: str & s.


Papers
More filters
Journal ArticleDOI
01 Sep 2005
TL;DR: This paper presents a new disk-based suffix tree construction algorithm that is based on a sort-merge paradigm, and shows that for constructing very large suffix trees with very little resources, this algorithm is more efficient than TDD.
Abstract: Sequence datasets are ubiquitous in modern life-science applications, and querying sequences is a common and critical operation in many of these applications. The suffix tree is a versatile data structure that can be used to evaluate a wide variety of queries on sequence datasets, including evaluating exact and approximate string matches, and finding repeat patterns. However, methods for constructing suffix trees are often very time-consuming, especially for suffix trees that are large and do not fit in the available main memory. Even when the suffix tree fits in memory, it turns out that the processor cache behavior of theoretically optimal suffix tree construction methods is poor, resulting in poor performance. Currently, there are a large number of algorithms for constructing suffix trees, but the practical tradeoffs in using these algorithms for different scenarios are not well characterized.In this paper, we explore suffix tree construction algorithms over a wide spectrum of data sources and sizes. First, we show that on modern processors, a cache-efficient algorithm with O(n2) worst-case complexity outperforms popular linear time algorithms like Ukkonen and McCreight, even for in-memory construction. For larger datasets, the disk I/O requirement quickly becomes the bottleneck in each algorithm's performance. To address this problem, we describe two approaches. First, we present a buffer management strategy for the O(n2) algorithm. The resulting new algorithm, which we call “Top Down Disk-based” (TDD), scales to sizes much larger than have been previously described in literature. This approach far outperforms the best known disk-based construction methods. Second, we present a new disk-based suffix tree construction algorithm that is based on a sort-merge paradigm, and show that for constructing very large suffix trees with very little resources, this algorithm is more efficient than TDD.

86 citations

Proceedings ArticleDOI
26 Jun 2006
TL;DR: This paper presents a cache-oblivious string B-tree (COSB-tree) data structure that is efficient in all these ways: searches asymptotically optimally and inserts and deletes nearly optimally, and maintains an index whose size is proportional to the front-compressed size of the dictionary.
Abstract: B-trees are the data structure of choice for maintaining searchable data on disk. However, B-trees perform suboptimally when keys are long or of variable length,when keys are compressed, even when using front compression, the standard B-tree compression scheme,for range queries, andwith respect to memory effects such as disk prefetching.This paper presents a cache-oblivious string B-tree (COSB-tree) data structure that is efficient in all these ways: The COSB-tree searches asymptotically optimally and inserts and deletes nearly optimally.It maintains an index whose size is proportional to the front-compressed size of the dictionary. Furthermore, unlike standard front-compressed strings, keys can be decompressed in a memory-efficient manner.It performs range queries with no extra disk seeks; in contrast, B-trees incur disk seeks when skipping from leaf block to leaf block.It utilizes all levels of a memory hierarchy efficiently and makes good use of disk locality by using cache-oblivious layout strategies.

86 citations

Patent
Te-Pei Tseng1, Kun-Da Wu1
02 May 2012
TL;DR: In this article, a method for calibrating an input of webpage address used in a handheld electronic device is provided, which comprises a touch display unit, a storage unit for storing a plurality website address data and a processing unit being electrically connected to the touch display units and the storage unit.
Abstract: A method for calibrating an input of webpage address used in a handheld electronic device is provided. The handheld electronic device comprises a touch display unit, a storage unit for storing a plurality website address data and a processing unit being electrically connected to the touch display unit and the storage unit. The method comprises the steps outlined in the sentences that follow. At least one character is received from the touch display unit, wherein each of the character has a plurality neighboring characters on a keyboard. A plurality of string combinations are generated by the processing unit according to the neighboring characters. The storage unit is searched by the processing unit according to the string combinations to generate an address suggestion list. A handheld electronic device is disclosed herein as well.

86 citations

Patent
Lauri Karttunen1
16 May 1997
TL;DR: In this paper, a processor implemented method of modifying a string of a regular language, which includes at least two symbols and two predetermined substrings, was described, and the processor then replaced the matching substring with the string of the lower language associated with the selected preselected substrings and outputs the modified string.
Abstract: A processor implemented method of modifying a string of a regular language, which includes at least two symbols and at least two predetermined substrings. Upon receipt of the string, the processor determines an initial position within the string of a substring matching one of the preselected substrings. To make this determination, the processor either matches symbols of the string starting from the left and proceeding to the right or by starting from the right and proceeding to the left. After identifying the initial position, the processor then selects either the longest or the shortest of the preselected substrings. The processor then replaces the matching substring with the string of the lower language associated with the selected preselected substring and outputs the modified string.

86 citations

Book ChapterDOI
18 Jul 2015
TL;DR: This paper presents a constraint solver that, given a string constraint, constructs an automaton that accepts all solutions that satisfy the constraint, and generates a function that gives the total number of solutions within that bound.
Abstract: Most common vulnerabilities in Web applications are due to string manipulation errors in input validation and sanitization code. String constraint solvers are essential components of program analysis techniques for detecting and repairing vulnerabilities that are due to string manipulation errors. For quantitative and probabilistic program analyses, checking the satisfiability of a constraint is not sufficient, and it is necessary to count the number of solutions. In this paper, we present a constraint solver that, given a string constraint, (1) constructs an automaton that accepts all solutions that satisfy the constraint, (2) generates a function that, given a length bound, gives the total number of solutions within that bound. Our approach relies on the observation that, using an automata-based constraint representation, model counting reduces to path counting, which can be solved precisely. We demonstrate the effectiveness of our approach on a large set of string constraints extracted from real-world web applications.

86 citations


Network Information
Related Topics (5)
Time complexity
36K papers, 879.5K citations
88% related
Tree (data structure)
44.9K papers, 749.6K citations
86% related
Graph (abstract data type)
69.9K papers, 1.2M citations
85% related
Computational complexity theory
30.8K papers, 711.2K citations
82% related
Supervised learning
20.8K papers, 710.5K citations
80% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
20222
2021491
2020704
2019759
2018816
2017806