Topic

String (computer science)

About: String (computer science) is a research topic. Over its lifetime, 19,430 publications have been published within this topic, receiving 333,247 citations. The topic is also known as: str & s.

Papers

Journal Article•DOI•

Practical methods for constructing suffix trees

Yuanyuan Tian¹, Sandeep Tata¹, Richard A. Hankins², Jignesh M. Patel¹•Institutions (2)

University of Michigan¹, Intel²

01 Sep 2005

TL;DR: This paper presents a new disk-based suffix tree construction algorithm that is based on a sort-merge paradigm, and shows that for constructing very large suffix trees with very little resources, this algorithm is more efficient than TDD.

Abstract: Sequence datasets are ubiquitous in modern life-science applications, and querying sequences is a common and critical operation in many of these applications. The suffix tree is a versatile data structure that can be used to evaluate a wide variety of queries on sequence datasets, including evaluating exact and approximate string matches, and finding repeat patterns. However, methods for constructing suffix trees are often very time-consuming, especially for suffix trees that are large and do not fit in the available main memory. Even when the suffix tree fits in memory, it turns out that the processor cache behavior of theoretically optimal suffix tree construction methods is poor, resulting in poor performance. Currently, there are a large number of algorithms for constructing suffix trees, but the practical tradeoffs in using these algorithms for different scenarios are not well characterized. In this paper, we explore suffix tree construction algorithms over a wide spectrum of data sources and sizes. First, we show that on modern processors, a cache-efficient algorithm with O(n²) worst-case complexity outperforms popular linear time algorithms like Ukkonen and McCreight, even for in-memory construction. For larger datasets, the disk I/O requirement quickly becomes the bottleneck in each algorithm's performance. To address this problem, we describe two approaches. First, we present a buffer management strategy for the O(n²) algorithm. The resulting new algorithm, which we call “Top Down Disk-based” (TDD), scales to sizes much larger than have been previously described in literature. This approach far outperforms the best known disk-based construction methods. Second, we present a new disk-based suffix tree construction algorithm that is based on a sort-merge paradigm, and show that for constructing very large suffix trees with very little resources, this algorithm is more efficient than TDD.
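
Illustration (not from the paper): a minimal Python sketch of a quadratic-time index over all suffixes of a string, used for exact substring queries. It builds an uncompressed suffix trie rather than a true suffix tree, and it is not the TDD or sort-merge algorithm described above. The function names `build_suffix_trie` and `contains` are hypothetical.

```python
def build_suffix_trie(text):
    """Insert every suffix of `text` into a nested-dict trie.

    This takes O(n^2) time and space in the worst case; it only
    illustrates the idea of indexing all suffixes.
    """
    root = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            # Reuse the child for `ch` if present, otherwise create it.
            node = node.setdefault(ch, {})
    return root


def contains(trie, pattern):
    """Exact substring query: walk `pattern` down from the root."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True


trie = build_suffix_trie("banana")
print(contains(trie, "nan"))  # True
print(contains(trie, "nab"))  # False
```

Every substring of the text is a prefix of some suffix, which is why a single walk from the root answers the query.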

86 citations

Proceedings Article•DOI•

Cache-oblivious string B-trees

Michael A. Bender¹, Martin Farach-Colton², Bradley C. Kuszmaul³•Institutions (3)

Stony Brook University¹, Rutgers University², Massachusetts Institute of Technology³

26 Jun 2006

TL;DR: This paper presents a cache-oblivious string B-tree (COSB-tree) data structure that searches asymptotically optimally, inserts and deletes nearly optimally, and maintains an index whose size is proportional to the front-compressed size of the dictionary.

Abstract: B-trees are the data structure of choice for maintaining searchable data on disk. However, B-trees perform suboptimally when keys are long or of variable length; when keys are compressed, even when using front compression, the standard B-tree compression scheme; for range queries; and with respect to memory effects such as disk prefetching. This paper presents a cache-oblivious string B-tree (COSB-tree) data structure that is efficient in all these ways: The COSB-tree searches asymptotically optimally and inserts and deletes nearly optimally. It maintains an index whose size is proportional to the front-compressed size of the dictionary. Furthermore, unlike standard front-compressed strings, keys can be decompressed in a memory-efficient manner. It performs range queries with no extra disk seeks; in contrast, B-trees incur disk seeks when skipping from leaf block to leaf block. It utilizes all levels of a memory hierarchy efficiently and makes good use of disk locality by using cache-oblivious layout strategies.
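
Illustration (not from the paper): a minimal Python sketch of plain front compression, the baseline scheme the abstract refers to, in which each key of a sorted dictionary is stored as the length of the prefix it shares with the previous key plus the remaining suffix. It does not implement the COSB-tree. The function names are hypothetical.

```python
def front_compress(sorted_keys):
    """Encode each key as (shared_prefix_length, remaining_suffix)."""
    entries = []
    prev = ""
    for key in sorted_keys:
        shared = 0
        limit = min(len(prev), len(key))
        while shared < limit and prev[shared] == key[shared]:
            shared += 1
        entries.append((shared, key[shared:]))
        prev = key
    return entries


def front_decompress(entries):
    """Rebuild the keys; each key depends on the one decoded before it."""
    keys = []
    prev = ""
    for shared, suffix in entries:
        key = prev[:shared] + suffix
        keys.append(key)
        prev = key
    return keys


keys = ["string", "stringent", "strings", "strong"]
packed = front_compress(keys)
print(packed)  # [(0, 'string'), (6, 'ent'), (6, 's'), (3, 'ong')]
print(front_decompress(packed) == keys)  # True
```

In this simple scheme a key can only be decoded after its predecessor, so reaching a late key means replaying the earlier entries.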

86 citations

Patent•

Handheld Electronic Device and Method for Calibrating Input of Webpage Address

Te-Pei Tseng¹, Kun-Da Wu¹•Institutions (1)

HTC¹

02 May 2012

TL;DR: This patent provides a method for calibrating webpage address input on a handheld electronic device that comprises a touch display unit, a storage unit storing a plurality of website address data, and a processing unit electrically connected to the touch display unit and the storage unit.

Abstract: A method for calibrating an input of webpage address used in a handheld electronic device is provided. The handheld electronic device comprises a touch display unit, a storage unit for storing a plurality of website address data and a processing unit being electrically connected to the touch display unit and the storage unit. The method comprises the steps outlined in the sentences that follow. At least one character is received from the touch display unit, wherein each character has a plurality of neighboring characters on a keyboard. A plurality of string combinations are generated by the processing unit according to the neighboring characters. The storage unit is searched by the processing unit according to the string combinations to generate an address suggestion list. A handheld electronic device is disclosed herein as well.
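
Illustration (not from the patent): a minimal Python sketch of the steps described above, namely expanding each typed character to itself plus its keyboard neighbors, forming all string combinations, and searching stored addresses with them. The neighbor map, the prefix-matching rule, and the names `NEIGHBORS`, `string_combinations`, and `suggest` are assumptions made for this example.

```python
from itertools import product

# Hypothetical, partial neighbor map; a real keyboard layout would cover every key.
NEIGHBORS = {
    "h": "gjyn",
    "p": "ol",
}


def string_combinations(typed):
    """All strings formed by keeping each character or swapping in a neighbor."""
    options = [ch + NEIGHBORS.get(ch, "") for ch in typed]
    return ["".join(combo) for combo in product(*options)]


def suggest(typed, stored_addresses):
    """Return stored addresses whose prefix equals some combination.

    Prefix matching is an assumption; the abstract only says the storage
    unit is searched according to the string combinations.
    """
    prefixes = set(string_combinations(typed))
    return [addr for addr in stored_addresses if addr[:len(typed)] in prefixes]


stored = ["github.com", "google.com", "hotmail.com"]
print(suggest("hp", stored))  # ['google.com', 'hotmail.com']
```

The number of combinations grows multiplicatively with the input length, so a practical version would likely prune candidates as characters arrive.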

86 citations

Patent•

Modifying an input string partitioned in accordance with directionality and length constraints

Lauri Karttunen¹•Institutions (1)

Xerox¹

16 May 1997

TL;DR: This patent describes a processor-implemented method of modifying a string of a regular language that includes at least two symbols and at least two predetermined substrings: the processor locates a substring matching one of the preselected substrings, replaces it with the string of the lower language associated with the selected preselected substring, and outputs the modified string.

Abstract: A processor-implemented method of modifying a string of a regular language, which includes at least two symbols and at least two predetermined substrings. Upon receipt of the string, the processor determines an initial position within the string of a substring matching one of the preselected substrings. To make this determination, the processor matches symbols of the string either starting from the left and proceeding to the right or starting from the right and proceeding to the left. After identifying the initial position, the processor then selects either the longest or the shortest of the preselected substrings. The processor then replaces the matching substring with the string of the lower language associated with the selected preselected substring and outputs the modified string.
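
Illustration (not from the patent): a minimal Python sketch of left-to-right replacement with a longest-match or shortest-match choice. It uses literal substrings in a dictionary rather than regular languages, and it omits the right-to-left direction. The name `directed_replace` and the `rules` mapping are hypothetical.

```python
def directed_replace(text, rules, longest=True):
    """Scan `text` left to right and replace matches of the keys in `rules`.

    At each position, every key that matches there is a candidate; the
    longest or shortest candidate is chosen, replaced, and scanning
    resumes after the matched substring.
    """
    out = []
    i = 0
    while i < len(text):
        matches = [src for src in rules if src and text.startswith(src, i)]
        if matches:
            src = max(matches, key=len) if longest else min(matches, key=len)
            out.append(rules[src])
            i += len(src)
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


rules = {"ab": "X", "abc": "Y"}
print(directed_replace("zabcab", rules, longest=True))   # zYX
print(directed_replace("zabcab", rules, longest=False))  # zXcX
```

Fixing both the scan direction and the length preference makes the result unambiguous when candidate substrings overlap.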

86 citations

Book Chapter•DOI•

Automata-Based Model Counting for String Constraints

Abdulbaki Aydin¹, Lucas Bang¹, Tevfik Bultan¹•Institutions (1)

University of California¹

18 Jul 2015

TL;DR: This paper presents a constraint solver that, given a string constraint, constructs an automaton that accepts all solutions that satisfy the constraint, and generates a function that gives the total number of solutions within that bound.

Abstract: Most common vulnerabilities in Web applications are due to string manipulation errors in input validation and sanitization code. String constraint solvers are essential components of program analysis techniques for detecting and repairing vulnerabilities that are due to string manipulation errors. For quantitative and probabilistic program analyses, checking the satisfiability of a constraint is not sufficient, and it is necessary to count the number of solutions. In this paper, we present a constraint solver that, given a string constraint, (1) constructs an automaton that accepts all solutions that satisfy the constraint, (2) generates a function that, given a length bound, gives the total number of solutions within that bound. Our approach relies on the observation that, using an automata-based constraint representation, model counting reduces to path counting, which can be solved precisely. We demonstrate the effectiveness of our approach on a large set of string constraints extracted from real-world web applications.
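
Illustration (not from the paper): a minimal Python sketch of the observation that model counting reduces to path counting when the constraint is represented as an automaton. It counts the strings of length at most a given bound accepted by a small hand-written DFA; it computes a number for one bound rather than generating a counting function as the paper does. The DFA and the name `count_accepted` are assumptions made for this example.

```python
def count_accepted(transitions, start, accepting, bound):
    """Count strings of length <= bound accepted by a DFA.

    `transitions` maps state -> {symbol: next_state}. Because the automaton
    is deterministic, each string follows exactly one path, so counting
    paths that end in an accepting state counts solutions exactly.
    """
    paths = {start: 1}  # number of paths of the current length ending in each state
    total = 1 if start in accepting else 0
    for _ in range(bound):
        nxt = {}
        for state, n in paths.items():
            for target in transitions.get(state, {}).values():
                nxt[target] = nxt.get(target, 0) + n
        paths = nxt
        total += sum(n for state, n in paths.items() if state in accepting)
    return total


# Hypothetical constraint: strings over {a, b} that contain at least one "a".
dfa = {
    "no_a": {"a": "seen_a", "b": "no_a"},
    "seen_a": {"a": "seen_a", "b": "seen_a"},
}
print(count_accepted(dfa, "no_a", {"seen_a"}, 3))  # 11
```

For this constraint there are 2^n - 1 solutions of length n, so lengths 0 through 3 give 0 + 1 + 3 + 7 = 11, matching the path count.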

86 citations

Network Information

Performance

Metrics

Papers: 19,430

Citations: 362,272

No. of papers in the topic in previous years
Year	Papers
2022	2
2021	491
2020	704
2019	759
2018	816
2017	806
