Topic

String (computer science)

About: String (computer science) is a research topic. Over its lifetime, 19,430 publications have been published within this topic, receiving 333,247 citations. The topic is also known as: str & s.

Papers

Proceedings Article

Approximate string search in spatial databases

Bin Yao¹, Feifei Li¹, Marios Hadjieleftheriou², Kun Hou¹

Florida State University¹, AT&T Labs²

01 Mar 2010

TL;DR: This work presents a novel index structure, MHR-tree, for efficiently answering approximate string match queries in large spatial databases based on the R-tree augmented with the min-wise signature and the linear hashing technique.

Abstract: This work presents a novel index structure, MHR-tree, for efficiently answering approximate string match queries in large spatial databases. The MHR-tree is based on the R-tree augmented with the min-wise signature and the linear hashing technique. The min-wise signature for an index node u keeps a concise representation of the union of q-grams from strings under the sub-tree of u. We analyze the pruning functionality of such signatures based on set resemblance between the query string and the q-grams from the sub-trees of index nodes. MHR-tree supports a wide range of query predicates efficiently, including range and nearest neighbor queries. We also discuss how to estimate range query selectivity accurately. We present a novel adaptive algorithm for finding balanced partitions using both the spatial and string information stored in the tree. Extensive experiments on large real data sets demonstrate the efficiency and effectiveness of our approach.
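
The signature idea in the abstract can be illustrated with a minimal sketch. It is not the MHR-tree itself: it only shows how q-grams are extracted, how a min-wise signature summarizes a q-gram set, why the signature of a union (as kept at an index node) is the element-wise minimum of the children's signatures, and how set resemblance is estimated from two signatures. The function names, the padding characters `#` and `$`, the choice of BLAKE2b as the hash family, and the default of 64 hash functions are all assumptions made for illustration.

```python
import hashlib


def qgrams(s, q=2):
    """Return the set of q-grams of s, padding with q-1 sentinels on each side."""
    padded = "#" * (q - 1) + s + "$" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def _hash(gram, seed):
    """Deterministic 64-bit hash of a q-gram under a seeded hash function."""
    data = f"{seed}:{gram}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minwise_signature(grams, num_hashes=64):
    """For each seeded hash function, keep the minimum hash over the set.

    The set must be non-empty (true for any string when q >= 2).
    """
    return [min(_hash(g, seed) for g in grams) for seed in range(num_hashes)]


def merge_signatures(sig_a, sig_b):
    """Signature of the union of two sets: the element-wise minimum."""
    return [min(x, y) for x, y in zip(sig_a, sig_b)]


def estimated_resemblance(sig_a, sig_b):
    """Fraction of agreeing positions, an estimate of the Jaccard resemblance."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)
```

Because a parent's signature can be computed from its children's signatures alone, a tree node can summarize every string in its sub-tree without storing their q-grams. How the paper turns the resemblance estimate into a pruning bound is not reproduced here.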

84 citations

Journal Article

Shifted Hamming distance: a fast and accurate SIMD-friendly filter to accelerate alignment verification in read mapping

Hongyi Xin¹, John Greth¹, John Emmons¹, Gennady Pekhimenko¹, Carl Kingsford¹, Can Alkan¹, Onur Mutlu¹

Carnegie Mellon University¹

15 May 2015-Bioinformatics

TL;DR: This work presents a simple and efficient algorithm, Shifted Hamming Distance (SHD), which accelerates the alignment verification procedure in read mapping by quickly filtering out error-abundant sequence pairs using bit-parallel and SIMD-parallel operations.

Abstract: Motivation: Calculating the edit-distance (i.e. minimum number of insertions, deletions and substitutions) between short DNA sequences is the primary task performed by seed-and-extend based mappers, which compare billions of sequences. In practice, only sequence pairs with a small edit-distance provide useful scientific data. However, the majority of sequence pairs analyzed by seed-and-extend based mappers differ by significantly more errors than what is typically allowed. Such error-abundant sequence pairs needlessly waste resources and severely hinder the performance of read mappers. Therefore, it is crucial to develop a fast and accurate filter that can rapidly and efficiently detect error-abundant string pairs and remove them from consideration before more computationally expensive methods are used. Results: We present a simple and efficient algorithm, Shifted Hamming Distance (SHD), which accelerates the alignment verification procedure in read mapping, by quickly filtering out error-abundant sequence pairs using bit-parallel and SIMD-parallel operations. SHD only filters string pairs that contain more errors than a user-defined threshold, making it fully comprehensive. It also maintains high accuracy with a moderate error threshold (up to 5% of the string length) while achieving a 3-fold speedup over the best previous algorithm (Gene Myers’s bit-vector algorithm). SHD is compatible with all mappers that perform sequence alignment for verification. Availability and implementation: We provide an implementation of SHD in C with Intel SSE instructions at: https://github.com/CMU-SAFARI/SHD.
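
The filtering principle can be sketched in a few lines, under loudly stated assumptions. This is not the authors' C/SSE implementation and it omits SHD's bit-parallel encoding and its mask refinement steps. It only shows the underlying idea: build a mismatch mask for every shift in the range from minus the threshold to plus the threshold, AND the masks together, and reject the pair when more positions than the threshold match at no shift. All function names are hypothetical. A plain dynamic-programming edit distance is included as the expensive verification step the filter is meant to avoid.

```python
def hamming_mask(read, ref, shift):
    """mask[i] is True when read[i] does NOT equal ref[i + shift].

    Positions that fall outside ref after shifting count as mismatches.
    """
    mask = []
    for i, ch in enumerate(read):
        j = i + shift
        mask.append(not (0 <= j < len(ref) and ref[j] == ch))
    return mask


def passes_shifted_filter(read, ref, max_errors):
    """Return False only when the pair must exceed max_errors edits.

    A read position stays True in the combined mask only if it mismatches
    under every shift in [-max_errors, max_errors].
    """
    combined = [True] * len(read)
    for shift in range(-max_errors, max_errors + 1):
        mask = hamming_mask(read, ref, shift)
        combined = [c and m for c, m in zip(combined, mask)]
    return sum(combined) <= max_errors


def edit_distance(a, b):
    """Classic O(len(a) * len(b)) Levenshtein distance, used for verification."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]
```

In this simplified form the filter never rejects a pair whose edit distance is within the threshold, because every character matched by an alignment with at most `max_errors` edits sits at an offset of at most `max_errors`. It can still let error-abundant pairs through, which the verification step then discards.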

84 citations

Patent

Apparatus for generating a statistical sequence model called class bi-multigram model with bigram dependencies assumed between adjacent sequences

Sabine Deligne, Yoshinori Sagisaka, Hideharu Nakajima

13 Apr 1999

TL;DR: An apparatus generates a statistical class sequence model, called a class bi-multigram model, from input training strings of discrete-valued units, where bigram dependencies are assumed between adjacent variable-length sequences of maximum length N units, and where class labels are assigned to the sequences.

Abstract: An apparatus generates a statistical class sequence model called a class bi-multigram model from input training strings of discrete-valued units, where bigram dependencies are assumed between adjacent variable length sequences of maximum length N units, and where class labels are assigned to the sequences. The number of times each sequence of units occurs is counted, as well as the number of times each pair of sequences of units co-occurs in the input training strings. An initial bigram probability distribution of all the pairs of sequences is computed as the number of times the two sequences co-occur, divided by the number of times the first sequence occurs in the input training string. Then, the input sequences are classified into a pre-specified desired number of classes. Further, an estimate of the bigram probability distribution of the sequences is calculated by using an EM algorithm to maximize the likelihood of the input training string computed with the input probability distributions. The above processes are then iteratively performed to generate the statistical class sequence model.
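
The initialization step described above (co-occurrence count divided by the count of the first sequence) can be sketched as follows. This covers only that first step, not the classification or the EM re-estimation. It assumes that "co-occur" means the second sequence immediately follows the first in the training string. The function name and the parameter `max_len` (standing in for the maximum length N) are hypothetical.

```python
from collections import Counter


def initial_bigram_probabilities(units, max_len):
    """Initial estimate p(second | first) = count(first, second) / count(first).

    Every run of 1..max_len units is treated as a candidate sequence, and a
    pair is counted whenever the second sequence starts right after the first.
    """
    seq_counts = Counter()
    pair_counts = Counter()
    n = len(units)
    for start in range(n):
        for len1 in range(1, max_len + 1):
            mid = start + len1
            if mid > n:
                break
            first = tuple(units[start:mid])
            seq_counts[first] += 1
            for len2 in range(1, max_len + 1):
                end = mid + len2
                if end > n:
                    break
                second = tuple(units[mid:end])
                pair_counts[(first, second)] += 1
    return {pair: count / seq_counts[pair[0]]
            for pair, count in pair_counts.items()}
```

For example, on the training string `a b a b` with `max_len=1`, the sequence `a` occurs twice and is followed by `b` both times, so the initial estimate for the pair (`a`, `b`) is 1.0, while `b` is followed by `a` only once out of two occurrences, giving 0.5.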

84 citations

Journal Article

Document Spanners: A Formal Approach to Information Extraction

Ronald Fagin¹, Benny Kimelfeld¹, Frederick Reiss¹, Stijn Vansummeren²

IBM¹, Université libre de Bruxelles²

06 May 2015-Journal of the ACM

TL;DR: This article develops a foundational framework for information extraction whose central construct is the document spanner (or just spanner for short). It proves that one kind of span-assigning automaton has the same expressive power as regular expressions with capture variables, while a second kind expresses precisely the algebra of the regular spanners.

Abstract: An intrinsic part of information extraction is the creation and manipulation of relations extracted from text. In this article, we develop a foundational framework where the central construct is what we call a document spanner (or just spanner for short). A spanner maps an input string into a relation over the spans (intervals specified by bounding indices) of the string. The focus of this article is on the representation of spanners. Conceptually, there are two kinds of such representations. Spanners defined in a primitive representation extract relations directly from the input string; those defined in an algebra apply algebraic operations to the primitively represented spanners. This framework is driven by SystemT, an IBM commercial product for text analysis, where the primitive representation is that of regular expressions with capture variables. We define additional types of primitive spanner representations by means of two kinds of automata that assign spans to variables. We prove that the first kind has the same expressive power as regular expressions with capture variables; the second kind expresses precisely the algebra of the regular spanners—the closure of the first kind under standard relational operators. The core spanners extend the regular ones by string-equality selection (an extension used in SystemT). We give some fundamental results on the expressiveness of regular and core spanners. As an example, we prove that regular spanners are closed under difference (and complement), but core spanners are not. Finally, we establish connections with related notions in the literature.
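
A rough feel for the spanner idea can be had with Python's `re` module, treating named groups as capture variables. This is only an approximation of the formalism: `re.finditer` returns non-overlapping leftmost matches rather than every possible assignment of spans, and spans here are 0-based half-open index pairs as Python reports them. The function names `regex_spanner` and `select_string_equal` are hypothetical and the code has no connection to SystemT.

```python
import re


def regex_spanner(pattern, document):
    """Map a document to a relation: one row per match, one span per variable.

    Named groups in the pattern play the role of capture variables. Each span
    is a (start, end) pair of indices into the document.
    """
    regex = re.compile(pattern)
    variables = sorted(regex.groupindex)
    return [{var: match.span(var) for var in variables}
            for match in regex.finditer(document)]


def select_string_equal(rows, document, x, y):
    """Keep rows whose spans for x and y cover equal substrings.

    This mimics the string-equality selection that the abstract says
    distinguishes core spanners from regular ones.
    """
    def text(span):
        start, end = span
        return document[start:end]

    return [row for row in rows if text(row[x]) == text(row[y])]
```

The output is a relation over spans rather than over strings, so two occurrences of the same substring at different positions remain distinct rows, and relational operators can then be applied to the extracted spans.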

84 citations

Journal Article

Adaptive neural network control of a flexible string system with non-symmetric dead-zone and output constraint

Zhijia Zhao¹,², Jun Shi², Xuejing Lan², Xiaowei Wang², Jingfeng Yang

Advanced Technology Center¹, Guangzhou University²

01 Dec 2017-Neurocomputing

TL;DR: Under the proposed control, the bounded stability of the closed-loop system is proven based on Lyapunov functions without simplifying or discretizing the infinite-dimensional dynamics.

84 citations

Metrics

Papers: 19,430

Citations: 362,272

No. of papers in the topic in previous years
Year	Papers
2022	2
2021	491
2020	704
2019	759
2018	816
2017	806