scispace - formally typeset
Search or ask a question
Topic

String (computer science)

About: String (computer science) is a research topic. Over the lifetime, 19430 publications have been published within this topic receiving 333247 citations. The topic is also known as: str & s.


Papers
More filters
Journal ArticleDOI
TL;DR: Part-of-speech tagging is considered, which was the first syntactic problem to successfully be attacked by statistical techniques and also serves as a good warm-up for the main topic-statistical parsing.
Abstract: I review current statistical work on syntactic parsing and then consider part-of-speech tagging, which was the first syntactic problem to successfully be attacked by statistical techniques and also serves as a good warm-up for the main topic-statistical parsing. Here, I consider both the simplified case in which the input string is viewed as a string of parts of speech and the more interesting case in which the parser is guided by statistical information about the particular words in the sentence. Finally, I anticipate future research directions.

250 citations

Proceedings Article
04 Nov 2016
TL;DR: Neuro-Symbolic Program Synthesis (NSPSS) as discussed by the authors is based on two neural modules: the cross correlation I/O network and the recursive-Reverse-Recursive Neural Network (R3NN).
Abstract: Recent years have seen the proposal of a number of neural architectures for the problem of Program Induction. Given a set of input-output examples, these architectures are able to learn mappings that generalize to new test inputs. While achieving impressive results, these approaches have a number of important limitations: (a) they are computationally expensive and hard to train, (b) a model has to be trained for each task (program) separately, and (c) it is hard to interpret or verify the correctness of the learnt mapping (as it is defined by a neural network). In this paper, we propose a novel technique, Neuro-Symbolic Program Synthesis, to overcome the above-mentioned problems. Once trained, our approach can automatically construct computer programs in a domain-specific language that are consistent with a set of input-output examples provided at test time. Our method is based on two novel neural modules. The first module, called the cross correlation I/O network, given a set of input-output examples, produces a continuous representation of the set of I/O examples. The second module, the Recursive-Reverse-Recursive Neural Network (R3NN), given the continuous representation of the examples, synthesizes a program by incrementally expanding partial programs. We demonstrate the effectiveness of our approach by applying it to the rich and complex domain of regular expression based string transformations. Experiments show that the R3NN model is not only able to construct programs from new input-output examples, but it is also able to construct new programs for tasks that it had never observed before during training.

248 citations

Journal ArticleDOI
TL;DR: This article proposes the use of string kernels, a novel way of computing document similarity based of matching non-consecutive subsequences of characters with sequences of words rather than characters, and presents some extensions to sequence kernels dealing with symbol-dependent and match-dependent decay factors.
Abstract: We address the problem of categorising documents using kernel-based methods such as Support Vector Machines. Since the work of Joachims (1998), there is ample experimental evidence that SVM using the standard word frequencies as features yield state-of-the-art performance on a number of benchmark problems. Recently, Lodhi et al. (2002) proposed the use of string kernels, a novel way of computing document similarity based of matching non-consecutive subsequences of characters. In this article, we propose the use of this technique with sequences of words rather than characters. This approach has several advantages, in particular it is more efficient computationally and it ties in closely with standard linguistic pre-processing techniques. We present some extensions to sequence kernels dealing with symbol-dependent and match-dependent decay factors, and present empirical evaluations of these extensions on the Reuters-21578 datasets.

248 citations

Journal ArticleDOI
TL;DR: Algorithmic Clustering of Music Based on String Compression and its Applications.
Abstract: Cilibrasi, Vitanyi, and de Wolf Computer Music Journal, 28:4, pp 49–67, Winter 2004 2004 Massachusetts Institute of Technology Rudi Cilibrasi,* Paul Vitanyi,*† and Ronald de Wolf* *Centrum voor Wiskunde en Informatica Kruislaan 413 1098 SJ Amsterdam, The Netherlands †Institute for Logic, Language, and Computation University of Amsterdam Plantage Muidergracht 24 1018 TV Amsterdam, The Netherlands {RudiCilibrasi, PaulVitanyi, RonalddeWolf}@cwinl Algorithmic Clustering of Music Based on String Compression

247 citations


Network Information
Related Topics (5)
Time complexity
36K papers, 879.5K citations
88% related
Tree (data structure)
44.9K papers, 749.6K citations
86% related
Graph (abstract data type)
69.9K papers, 1.2M citations
85% related
Computational complexity theory
30.8K papers, 711.2K citations
82% related
Supervised learning
20.8K papers, 710.5K citations
80% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
20222
2021491
2020704
2019759
2018816
2017806