String (computer science)

About: String (computer science) is a research topic. Over its lifetime, 19,430 publications have been published within this topic, receiving 333,247 citations. The topic is also known as: str & s.

Papers

Book Chapter

Fast kernels for inexact string matching

Christina S. Leslie¹, Rui Kuang¹•Institutions (1)

Columbia University¹

01 Dec 2003

TL;DR: Several new families of string kernels designed in particular for use with support vector machines (SVMs) for classification of protein sequence data are introduced, and it is shown that these new faster kernels achieve SVM classification performance comparable to the mismatch kernel and the Fisher kernel derived from profile hidden Markov models.

Abstract: We introduce several new families of string kernels designed in particular for use with support vector machines (SVMs) for classification of protein sequence data. These kernels – restricted gappy kernels, substitution kernels, and wildcard kernels – are based on feature spaces indexed by k-length subsequences from the string alphabet Σ (or the alphabet augmented by a wildcard character), and hence they are related to the recently presented (k,m)-mismatch kernel and string kernels used in text classification. However, for all kernels we define here, the kernel value K(x,y) can be computed in \(O(c_K (|x| + |y|))\) time, where the constant \(c_K\) depends on the parameters of the kernel but is independent of the size |Σ| of the alphabet. Thus the computation of these kernels is linear in the length of the sequences, like the mismatch kernel, but we improve upon the parameter-dependent constant \(c_K = k^{m+1} |\Sigma|^m\) of the mismatch kernel. We compute the kernels efficiently using a recursive function based on a trie data structure and relate our new kernels to the recently described transducer formalism. Finally, we report protein classification experiments on a benchmark SCOP dataset, where we show that our new faster kernels achieve SVM classification performance comparable to the mismatch kernel and the Fisher kernel derived from profile hidden Markov models.
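
To make the idea of a feature space indexed by k-length subsequences concrete, here is a minimal Python sketch of a gappy-style string kernel. It is a naive reference computation, not the trie-based linear-time method the abstract describes. The window length `g`, the function names, and the exact feature definition (each length-g window contributes 1 to every distinct k-length subsequence it contains) are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def gappy_features(s: str, g: int, k: int) -> Counter:
    """Sum, over all length-g windows of s, an indicator for each
    distinct k-length (possibly non-contiguous) subsequence in the window."""
    feats = Counter()
    for start in range(len(s) - g + 1):
        window = s[start:start + g]
        # combinations() preserves order, so each tuple is a subsequence.
        subseqs = {"".join(chars) for chars in combinations(window, k)}
        feats.update(subseqs)  # each distinct subsequence counts once per window
    return feats


def gappy_kernel(x: str, y: str, g: int, k: int) -> int:
    """Inner product of the two feature vectors (naive, not linear time)."""
    fx, fy = gappy_features(x, g, k), gappy_features(y, g, k)
    return sum(count * fy[key] for key, count in fx.items())
```

For example, `gappy_kernel("ABCD", "ABD", 3, 2)` returns 2, because the only shared features are the subsequences `AB` and `BD`.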

62 citations

Proceedings Article

Dependency-Based Automatic Evaluation for Machine Translation

Karolina Owczarzak¹, Josef van Genabith¹, Andy Way¹•Institutions (1)

Dublin City University¹

26 Apr 2007

TL;DR: A novel method for evaluating the output of Machine Translation, based on comparing the dependency structures of the translation and reference rather than their surface string forms, which reaches high correlation with human scores.

Abstract: We present a novel method for evaluating the output of Machine Translation (MT), based on comparing the dependency structures of the translation and reference rather than their surface string forms. Our method uses a treebank-based, wide-coverage, probabilistic Lexical-Functional Grammar (LFG) parser to produce a set of structural dependencies for each translation-reference sentence pair, and then calculates the precision and recall for these dependencies. Our dependency-based evaluation, in contrast to most popular string-based evaluation metrics, will not unfairly penalize perfectly valid syntactic variations in the translation. In addition to allowing for legitimate syntactic differences, we use paraphrases in the evaluation process to account for lexical variation. In comparison with other metrics on 16,800 sentences of Chinese-English newswire text, our method reaches high correlation with human scores. An experiment with two translations of 4,000 sentences from Spanish-English Europarl shows that, in contrast to most other metrics, our method does not display a high bias towards statistical models of translation.
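
The precision and recall step can be illustrated with a short Python sketch. It assumes each sentence has already been parsed into dependency triples of the form `(relation, head, dependent)`; that triple format, the function name, and the hand-written example triples are hypothetical and do not reproduce the LFG parser output used in the paper.

```python
from collections import Counter


def dependency_precision_recall(translation_deps, reference_deps):
    """Precision and recall of translation dependencies against the reference.

    Both arguments are iterables of hashable dependency triples
    (assumed format: (relation, head, dependent)).
    """
    t, r = Counter(translation_deps), Counter(reference_deps)
    matched = sum((t & r).values())  # multiset intersection
    precision = matched / sum(t.values()) if t else 0.0
    recall = matched / sum(r.values()) if r else 0.0
    return precision, recall


# "John resigned yesterday" vs. "Yesterday John resigned": the word order
# differs, but the (hand-written) dependency triples are identical.
translation = [("subj", "resign", "john"), ("adj", "resign", "yesterday")]
reference = [("adj", "resign", "yesterday"), ("subj", "resign", "john")]
print(dependency_precision_recall(translation, reference))  # (1.0, 1.0)
```

Because the comparison is over sets of dependencies rather than surface strings, the reordered sentence is not penalized, which is the property the abstract highlights.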

61 citations

Proceedings Article

Handwritten Word Spotting with Corrected Attributes

Jon Almazan, Albert Gordo¹, Alicia Fornés, Ernest Valveny•Institutions (1)

French Institute for Research in Computer Science and Automation¹

01 Dec 2013

TL;DR: An attributes-based approach to multi-writer word spotting that leads to a low-dimensional, fixed-length representation of the word images that is fast to compute and, especially, fast to compare is proposed.

Abstract: We propose an approach to multi-writer word spotting, where the goal is to find a query word in a dataset comprised of document images. We propose an attributes-based approach that leads to a low-dimensional, fixed-length representation of the word images that is fast to compute and, especially, fast to compare. This approach naturally leads to a unified representation of word images and strings, which seamlessly allows one to perform either query-by-example, where the query is an image, or query-by-string, where the query is a string. We also propose a calibration scheme to correct the attributes scores based on Canonical Correlation Analysis that greatly improves the results on a challenging dataset. We test our approach on two public datasets showing state-of-the-art results.

61 citations

Patent

Obscuring form data through obfuscation

Roderick C. Henderson¹, John R. Hind¹, Belinda Ying-Chieh Langner¹, Yongcheng Li¹•Institutions (1)

IBM¹

30 Dec 2009

TL;DR: The form data to be obscured is removed from a form and inserted as a portion of a Uniform Resource Locator (URL) string, and an obfuscation is then applied to that portion of the URL string, thereby obscuring the information for sending on an outbound message.

Abstract: Obscuring form data to be passed in forms that are sent in messages over a communications network. The form data to be obscured is removed from a form and inserted as a portion of a Uniform Resource Locator (“URL”) string. The obscured form data may comprise hidden fields and/or links. An obfuscation is then applied to the portion of the URL string, thereby obscuring the information for sending on an outbound message. The original information is recovered from an inbound message which contains the obscured information by reversing the processing used for the obscuring. In one aspect, the obfuscation comprises encryption. In another aspect, the obfuscation comprises creating a tiny URL that replaces the portion of the URL string.
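
The tiny-URL aspect might be sketched roughly as follows in Python. This is an illustration under stated assumptions, not the patented method: the class name `TinyUrlObscurer`, the query parameter `t`, the in-memory lookup table, and the example URL are all hypothetical, and the encryption aspect is not shown.

```python
import secrets
from urllib.parse import parse_qsl, urlencode


class TinyUrlObscurer:
    """Replace the form-data portion of a URL with an opaque short token."""

    def __init__(self):
        self._table = {}  # token -> original URL-encoded form data

    def obscure(self, base_url: str, form_data: dict) -> str:
        # Outbound: serialize the hidden fields, then swap them for a token.
        token = secrets.token_urlsafe(8)
        self._table[token] = urlencode(form_data)
        return f"{base_url}?t={token}"

    def recover(self, url: str) -> dict:
        # Inbound: reverse the processing by looking the token up again.
        token = url.rsplit("?t=", 1)[1]
        return dict(parse_qsl(self._table[token]))


obscurer = TinyUrlObscurer()
outbound = obscurer.obscure("https://example.com/checkout",
                            {"session": "abc123", "step": "2"})
restored = obscurer.recover(outbound)
```

The outbound URL carries only the token, so the hidden field values never appear in the message; the server that holds the table restores them when the URL comes back.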

61 citations

Proceedings Article

Optimally fast parallel algorithms for preprocessing and pattern matching in one and two dimensions

Richard Cole¹, Maxime Crochemore², Z. Galil², Leszek Gasieniec², R. Hariharan², S. Muthukrishnan², Kunsoo Park², Wojciech Rytter²•Institutions (2)

Courant Institute of Mathematical Sciences¹, New York University²

03 Nov 1993

TL;DR: An algorithm that computes a deterministic sample of a sufficiently long substring in constant time for string matching, solving the main open problem remaining in string matching.

Abstract: All algorithms below are optimal alphabet-independent parallel CRCW PRAM algorithms. In one dimension: Given a pattern string of length m for the string-matching problem, we design an algorithm that computes a deterministic sample of a sufficiently long substring in constant time. This problem used to be a bottleneck in the pattern preprocessing for one- and two-dimensional pattern matching. The best previous time bound was \(O(\log^2 m / \log\log m)\). We use this algorithm to obtain the following results. 1. Improving the preprocessing of the constant-time text search algorithm from \(O(\log^2 m / \log\log m)\) to \(\Theta(\log\log m)\), which is now best possible. 2. A constant-time deterministic string-matching algorithm in the case that the text length n satisfies \(n = \Omega(m^{1+\epsilon})\) for a constant \(\epsilon > 0\). 3. A simple probabilistic string-matching algorithm that has constant time with high probability for random input. 4. A constant expected time Las-Vegas algorithm for computing the period of the pattern and all witnesses and thus string matching itself, solving the main open problem remaining in string matching.
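
For readers unfamiliar with the terms "period" and "witnesses", the following Python sketch computes both sequentially. It is not the parallel CRCW PRAM algorithm from the paper; it uses the classical failure-function approach, and it assumes the common definition in which a witness for a shift d smaller than the period is an index i with `pattern[i] != pattern[i + d]`. The function names are illustrative.

```python
def period(pattern: str) -> int:
    """Smallest p > 0 such that pattern[i] == pattern[i + p] for all valid i."""
    m = len(pattern)
    if m == 0:
        return 0
    # fail[i] = length of the longest proper border of pattern[:i + 1]
    fail = [0] * m
    j = 0
    for i in range(1, m):
        while j > 0 and pattern[i] != pattern[j]:
            j = fail[j - 1]
        if pattern[i] == pattern[j]:
            j += 1
        fail[i] = j
    return m - fail[-1]


def witnesses(pattern: str) -> dict:
    """Map each shift d below the period to one mismatching index i."""
    result = {}
    for d in range(1, period(pattern)):
        # d is not a period, so a mismatch is guaranteed to exist.
        result[d] = next(i for i in range(len(pattern) - d)
                         if pattern[i] != pattern[i + d])
    return result
```

For the pattern `"aabaab"` the period is 3, and `witnesses("aabaab")` returns `{1: 1, 2: 0}`: shifting by 1 fails at index 1 and shifting by 2 fails at index 0. A witness table of this kind lets a matcher rule out one of two overlapping candidate occurrences with a single character comparison.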

61 citations

Performance

Metrics

Papers: 19,430
Citations: 362,272

No. of papers in the topic in previous years
Year	Papers
2022	2
2021	491
2020	704
2019	759
2018	816
2017	806
