Showing papers on "String (computer science)" published in 2011


Journal Article
TL;DR: This work proposes a new k-mer counting algorithm and associated implementation, called Jellyfish, which is fast and memory efficient, based on a multithreaded, lock-free hash table optimized for counting k-mers up to 31 bases in length.
Abstract: Motivation: Counting the number of occurrences of every k-mer (substring of length k) in a long string is a central subproblem in many applications, including genome assembly, error correction of sequencing reads, fast multiple sequence alignment and repeat detection. Recently, the deep sequence coverage generated by next-generation sequencing technologies has caused the amount of sequence to be processed during a genome project to grow rapidly, and has rendered current k-mer counting tools too slow and memory intensive. At the same time, large multicore computers have become commonplace in research facilities allowing for a new parallel computational paradigm. Results: We propose a new k-mer counting algorithm and associated implementation, called Jellyfish, which is fast and memory efficient. It is based on a multithreaded, lock-free hash table optimized for counting k-mers up to 31 bases in length. Due to their flexibility, suffix arrays have been the data structure of choice for solving many string problems. For the task of k-mer counting, important in many biological applications, Jellyfish offers a much faster and more memory-efficient solution. Availability: The Jellyfish software is written in C++ and is GPL licensed. It is available for download at http://www.cbcb.umd.edu/software/jellyfish. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.

2,779 citations
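
To make the k-mer counting task concrete, here is a minimal single-threaded sketch in Python. It is not Jellyfish's lock-free hash table; the function name `count_kmers` and the in-memory `Counter` are assumptions chosen only to illustrate what is being counted.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every length-k substring (k-mer) of `sequence`.

    Illustrative only: a plain in-memory Counter, with none of the
    multithreading or compact encoding described in the abstract.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # A sequence shorter than k yields an empty range and an empty Counter.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


counts = count_kmers("ACGTACGT", 3)
```

For the input above, the six windows are ACG, CGT, GTA, TAC, ACG, CGT, so `ACG` and `CGT` are each counted twice.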


Journal Article
Sumit Gulwani
26 Jan 2011
TL;DR: The design of a string programming/expression language that supports restricted forms of regular expressions, conditionals and loops is described, together with an algorithm based on several novel concepts for synthesizing a desired program in this language from input-output examples.
Abstract: We describe the design of a string programming/expression language that supports restricted forms of regular expressions, conditionals and loops. The language is expressive enough to represent a wide variety of string manipulation tasks that end-users struggle with. We describe an algorithm based on several novel concepts for synthesizing a desired program in this language from input-output examples. The synthesis algorithm is very efficient, taking a fraction of a second for various benchmark examples. The synthesis algorithm is interactive and has several desirable features: it can rank multiple solutions and has fast convergence, it can detect noise in the user input, and it supports an active interaction model wherein the user is prompted to provide outputs on inputs that may have multiple computational interpretations. The algorithm has been implemented as an interactive add-in for the Microsoft Excel spreadsheet system. The prototype tool has met the golden test - it has synthesized part of itself, and has been used to solve problems beyond the author's imagination.

801 citations


Journal Article
TL;DR: It is proved that every stringlike logical operator of this code can be deformed to a disjoint union of short segments, each of which is in the stabilizer group, and a notion of "logical string segments" is introduced to avoid difficulties in defining one-dimensional objects in discrete lattices.
Abstract: We suggest concrete models for self-correcting quantum memory by reporting examples of local stabilizer codes in 3D that have no string logical operators. Previously known local stabilizer codes in 3D all have stringlike logical operators, which make the codes non-self-correcting. We introduce a notion of “logical string segments” to avoid difficulties in defining one-dimensional objects in discrete lattices. We prove that every stringlike logical operator of our code can be deformed to a disjoint union of short segments, each of which is in the stabilizer group. The code has surfacelike logical operators whose partial implementation has unsatisfied stabilizers along its boundary.

610 citations


Patent
16 Feb 2011
TL;DR: In this article, a method of operating a non-volatile memory device is described in which the memory cells associated with a plurality of string selection lines (SSLs), together constituting a memory block, are erased, and the erase is then verified one SSL at a time.
Abstract: A method of operating a non-volatile memory device includes performing an erasing operation to memory cells associated with a plurality of string selection lines (SSLs), the memory cells associated with the plurality of SSLs constituting a memory block, and verifying the erasing operation to second memory cells associated with a second SSL after verifying the erasing operation to first memory cells associated with a first SSL.

497 citations


Journal Article
TL;DR: This paper provides a practical means to evaluate the ACC systems applying the sliding-mode controller and provides a reasonable proposal to design the ACC controller from the perspective of the practical string stability.
Abstract: In this paper, the practical string stability of both homogeneous and heterogeneous platoons of adaptive cruise control (ACC) vehicles, which apply the constant time headway spacing policy, is investigated by considering the parasitic time delays and lags of the actuators and sensors when building the vehicle longitudinal dynamics model. The proposed control law based on the sliding-mode controller can guarantee both homogeneous and heterogeneous string stability, if the control parameters and system parameters meet certain requirements. The analysis of the negative effect of the parasitic time delays and lags on the string stability indicates that the negative effect of the time delays is larger than that of the time lags. This paper provides a practical means to evaluate the ACC systems applying the sliding-mode controller and provides a reasonable proposal to design the ACC controller from the perspective of the practical string stability.

403 citations


Journal Article
TL;DR: This work introduces a protocol for private randomness expansion with untrusted devices which is designed to take as input an initially private random string and produce as output a longer private random string.
Abstract: Randomness is an important resource for many applications, from gambling to secure communication. However, guaranteeing that the output from a candidate random source could not have been predicted by an outside party is a challenging task, and many supposedly random sources used today provide no such guarantee. Quantum solutions to this problem exist, for example a device which internally sends a photon through a beamsplitter and observes on which side it emerges, but, presently, such solutions require the user to trust the internal workings of the device. Here, we seek to go beyond this limitation by asking whether randomness can be generated using untrusted devices—even ones created by an adversarial agent—while providing a guarantee that no outside party (including the agent) can predict it. Since this is easily seen to be impossible unless the user has an initially private random string, the task we investigate here is private randomness expansion. We introduce a protocol for private randomness expansion with untrusted devices which is designed to take as input an initially private random string and produce as output a longer private random string. We point out that private randomness expansion protocols are generally vulnerable to attacks that can render the initial string partially insecure, even though that string is used only inside a secure laboratory; our protocol is designed to remove this previously unconsidered vulnerability by privacy amplification. We also discuss extensions of our protocol designed to generate an arbitrarily long random string from a finite initially private random string. The security of these protocols against the most general attacks is left as an open question.

348 citations


Proceedings Article
11 Apr 2011
TL;DR: This paper proposes a new similarity metric, called “fuzzy token matching based similarity”, which extends token-based similarity functions by allowing fuzzy match between two tokens, and achieves high efficiency and result quality, and significantly outperforms state-of-the-art methods.
Abstract: String similarity join that finds similar string pairs between two string sets is an essential operation in many applications, and has attracted significant attention recently in the database community. A significant challenge in similarity join is to implement an effective fuzzy match operation to find all similar string pairs which may not match exactly. In this paper, we propose a new similarity metric, called “fuzzy token matching based similarity”, which extends token-based similarity functions (e.g., Jaccard similarity and Cosine similarity) by allowing fuzzy match between two tokens. We study the problem of similarity join using this new similarity metric and present a signature-based method to address this problem. We propose new signature schemes and develop effective pruning techniques to improve the performance. Experimental results show that our approach achieves high efficiency and result quality, and significantly outperforms state-of-the-art methods.

137 citations
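
The idea of letting tokens match approximately inside a token-based measure can be sketched as follows. This is only an illustration, not the paper's method: the token similarity (`difflib.SequenceMatcher.ratio`), the greedy one-to-one pairing, the threshold of 0.8, and the names `token_sim` and `fuzzy_jaccard` are all assumptions, and the signature schemes and pruning from the abstract are not shown.

```python
from difflib import SequenceMatcher


def token_sim(a: str, b: str) -> float:
    """Stand-in token similarity in [0, 1] (an assumption, not the paper's)."""
    return SequenceMatcher(None, a, b).ratio()


def fuzzy_jaccard(s1: str, s2: str, threshold: float = 0.8) -> float:
    """Jaccard-style score in which tokens may match approximately.

    Token pairs whose similarity reaches `threshold` are paired greedily,
    best first, each token used at most once. The summed similarity of the
    chosen pairs plays the role of the intersection size.
    """
    t1, t2 = s1.lower().split(), s2.lower().split()
    pairs = sorted(
        ((token_sim(a, b), i, j) for i, a in enumerate(t1) for j, b in enumerate(t2)),
        reverse=True,
    )
    used1, used2 = set(), set()
    overlap = 0.0
    for sim, i, j in pairs:
        if sim < threshold:
            break  # remaining pairs are even less similar
        if i in used1 or j in used2:
            continue
        used1.add(i)
        used2.add(j)
        overlap += sim
    union = len(t1) + len(t2) - overlap
    return overlap / union if union else 1.0


score = fuzzy_jaccard("nba macgrady", "nba mcgrady")
```

Exact-token Jaccard for the two example strings is 1/3, because `macgrady` and `mcgrady` differ; the fuzzy variant scores the pair much higher because the misspelled tokens are still paired.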


Patent
01 Apr 2011
TL;DR: In this article, a 3D memory device includes a plurality of ridge-shaped stacks, in the form of multiple strips of conductive material separated by insulating material, arranged as bit lines which can be coupled through decoding circuits to sense amplifiers.
Abstract: A 3D memory device includes a plurality of ridge-shaped stacks, in the form of multiple strips of conductive material separated by insulating material, arranged as bit lines which can be coupled through decoding circuits to sense amplifiers. Diodes are connected to the bit lines at either the string select or common source select ends of the strings. The strips of conductive material have side surfaces on the sides of the ridge-shaped stacks. A plurality of word lines, which can be coupled to row decoders, extends orthogonally over the plurality of ridge-shaped stacks. Memory elements lie in a multi-layer array of interface regions at cross-points between side surfaces of the semiconductor strips on the stacks and the word lines.

119 citations


Journal Article
TL;DR: In this article, a memory-efficient parallel string matching scheme is proposed for low-cost hardware-based intrusion detection systems, where long target patterns are divided into sub-patterns with a fixed length.
Abstract: For the low-cost hardware-based intrusion detection systems, this paper proposes a memory-efficient parallel string matching scheme. In order to reduce the number of state transitions, the finite state machine tiles in a string matcher adopt bit-level input symbols. Long target patterns are divided into subpatterns with a fixed length; deterministic finite automata are built with the subpatterns. Using the pattern dividing, the variety of target pattern lengths can be mitigated, so that memory usage in homogeneous string matchers can be efficient. In order to identify each original long pattern being divided, a two-stage sequential matching scheme is proposed for the successive matches with subpatterns. Experimental results show that total memory requirements decrease on average by 47.8 percent and 62.8 percent for Snort and ClamAV rule sets, in comparison with several existing bit-split string matching methods.

114 citations


Proceedings Article
27 Jul 2011
TL;DR: Focusing on listings from eBay's clothing and shoes categories, the bootstrapped NER system is able to identify new brands corresponding to spelling variants and typographical errors of the known brands, as well as identifying novel brands.
Abstract: We present a named entity recognition (NER) system for extracting product attributes and values from listing titles. Information extraction from short listing titles presents a unique challenge, with the lack of informative context and grammatical structure. In this work, we combine supervised NER with bootstrapping to expand the seed list, and output normalized results. Focusing on listings from eBay's clothing and shoes categories, our bootstrapped NER system is able to identify new brands corresponding to spelling variants and typographical errors of the known brands, as well as identifying novel brands. Among the top 300 new brands predicted, our system achieves 90.33% precision. To output normalized attribute values, we explored several string comparison algorithms and found n-gram substring matching to work well in practice.

110 citations


Patent
01 Feb 2011
TL;DR: In this article, a method of determining a free point of a tubular string stuck in a wellbore includes deploying a tool string in the stuck tubular with a non-electric string.
Abstract: Embodiments of the present invention generally relate to a method and/or apparatus for deploying wireline tools with a non-electric string. In one embodiment, a method of determining a free point of a tubular string stuck in a wellbore includes deploying a tool string in the stuck tubular with a non-electric string. The free point assembly includes a battery, a controller, and a free point tool. The method further includes activating the free point tool by the controller. The free point tool contacts an inner surface of the stuck tubular string. The method further includes applying a tensile force and/or torque to the stuck tubular string; and measuring a response of the tubular string with the free point tool.

Proceedings Article
24 Aug 2011
TL;DR: The string stability of CACC is discussed and its performance with various packet loss ratios, beacon sending frequencies and time headway in simulations is evaluated.
Abstract: Recent development in wireless technology enables communication between vehicles. The concept of Co-operative Adaptive Cruise Control (CACC) — which uses wireless communication between vehicles — aims at string stable behaviour in a platoon of vehicles. “String stability” means any non-zero position, speed, and acceleration errors of an individual vehicle in a string do not amplify when they propagate upstream. In this paper, we will discuss the string stability of CACC and evaluate its performance with various packet loss ratios, beacon sending frequencies and time headway in simulations. The simulation framework is built up with a controller prototype, a traffic simulator, and a network simulator.
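
The definition of string stability quoted above can be turned into a simple check on simulated or logged error trajectories. The sketch below is an assumption-laden illustration: it compares peak absolute errors (one common way to make "do not amplify" precise), and the function name `is_string_stable` and the data layout are hypothetical rather than taken from the paper.

```python
def is_string_stable(errors):
    """Return True if peak errors do not grow along the platoon.

    `errors[i]` is a list of sampled errors (e.g. spacing errors) for the
    i-th follower, ordered from the vehicle nearest the leader to the last
    vehicle upstream. Criterion assumed here: the peak absolute error of
    each vehicle is no larger than that of the vehicle ahead of it.
    """
    peaks = [max(abs(e) for e in trajectory) for trajectory in errors]
    return all(behind <= ahead for ahead, behind in zip(peaks, peaks[1:]))


damped = [[0.0, 1.0, -0.5], [0.0, 0.8, -0.4], [0.0, 0.5, 0.1]]
amplified = [[0.0, 0.5], [0.0, 0.9]]
```

Here `damped` passes the check because the peaks shrink from 1.0 to 0.8 to 0.5, while `amplified` fails because the second vehicle's peak exceeds the first's.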

Journal Article
TL;DR: This article introduces the first compressed suffix tree representation that requires only sublinear space on top of the compressed text size, and supports a wide set of navigational operations in almost logarithmic time.
Abstract: Suffix trees are by far the most important data structure in stringology, with a myriad of applications in fields like bioinformatics and information retrieval. Classical representations of suffix trees require Θ(n log n) bits of space, for a string of size n. This is considerably more than the n log_2 σ bits needed for the string itself, where σ is the alphabet size. The size of suffix trees has been a barrier to their wider adoption in practice. Recent compressed suffix tree representations require just the space of the compressed string plus Θ(n) extra bits. This is already spectacular, but the linear extra bits are still unsatisfactory when σ is small as in DNA sequences. In this article, we introduce the first compressed suffix tree representation that breaks this Θ(n)-bit space barrier. The Fully Compressed Suffix Tree (FCST) representation requires only sublinear space on top of the compressed text size, and supports a wide set of navigational operations in almost logarithmic time. This includes extracting arbitrary text substrings, so the FCST replaces the text using almost the same space as the compressed text. An essential ingredient of FCSTs is the lowest common ancestor (LCA) operation. We reveal important connections between LCAs and suffix tree navigation. We also describe how to make FCSTs dynamic, that is, support updates to the text. The dynamic FCST also supports several operations. In particular, it can build the static FCST within optimal space and polylogarithmic time per symbol. Our theoretical results are also validated experimentally, showing that FCSTs are very effective in practice as well.

Proceedings Article
27 Jul 2011
TL;DR: An inference algorithm is presented that organizes observed words (tokens) into structured inflectional paradigms (types) and naturally predicts the spelling of unobserved forms that are missing from these paradigms, and discovers inflectional principles (grammar) that generalize to wholly unobserved words.
Abstract: We present an inference algorithm that organizes observed words (tokens) into structured inflectional paradigms (types). It also naturally predicts the spelling of unobserved forms that are missing from these paradigms, and discovers inflectional principles (grammar) that generalize to wholly unobserved words. Our Bayesian generative model of the data explicitly represents tokens, types, inflections, paradigms, and locally conditioned string edits. It assumes that inflected word tokens are generated from an infinite mixture of inflectional paradigms (string tuples). Each paradigm is sampled all at once from a graphical model, whose potential functions are weighted finite-state transducers with language-specific parameters to be learned. These assumptions naturally lead to an elegant empirical Bayes inference procedure that exploits Monte Carlo EM, belief propagation, and dynamic programming. Given 50-100 seed paradigms, adding a 10-million-word corpus reduces prediction error for morphological inflections by up to 10%.

Journal Article
26 Jan 2011
TL;DR: In this article, the authors introduce streaming data string transducers that map input data strings to output data strings in a single left-to-right pass in linear time, and establish PSPACE bounds for the problems of checking functional equivalence of two streaming transducers, and of checking whether a streaming transducer satisfies pre/post verification conditions specified by streaming acceptors over input/output data-strings.
Abstract: We introduce streaming data string transducers that map input data strings to output data strings in a single left-to-right pass in linear time. Data strings are (unbounded) sequences of data values, tagged with symbols from a finite set, over a potentially infinite data domain that supports only the operations of equality and ordering. The transducer uses a finite set of states, a finite set of variables ranging over the data domain, and a finite set of variables ranging over data strings. At every step, it can make decisions based on the next input symbol, updating its state, remembering the input data value in its data variables, and updating data-string variables by concatenating data-string variables and new symbols formed from data variables, while avoiding duplication. We establish PSPACE bounds for the problems of checking functional equivalence of two streaming transducers, and of checking whether a streaming transducer satisfies pre/post verification conditions specified by streaming acceptors over input/output data-strings. We identify a class of imperative and a class of functional programs, manipulating lists of data items, which can be effectively translated to streaming data-string transducers. The imperative programs dynamically modify a singly-linked heap by changing next-pointers of heap-nodes and by adding new nodes. The main restriction specifies how the next-pointers can be used for traversal. We also identify an expressively equivalent fragment of functional programs that traverse a list using syntactically restricted recursive calls. Our results lead to algorithms for assertion checking and for checking functional equivalence of two programs, written possibly in different programming styles, for commonly used routines such as insert, delete, and reverse.

Proceedings Article
23 Jan 2011
TL;DR: In this paper, the authors present two representations of a string of length N compressed into a context-free grammar of size n, achieving O(log N) random access time with either O(n · α_k(n)) construction time and space on the pointer machine model or O(n) construction time and space on the RAM.
Abstract: Let S be a string of length N compressed into a context-free grammar 𝒮 of size n. We present two representations of 𝒮 achieving O(log N) random access time, and either O(n · α_k(n)) construction time and space on the pointer machine model, or O(n) construction time and space on the RAM. Here, α_k(n) is the inverse of the k-th row of Ackermann's function. Our representations also efficiently support decompression of any substring in S: we can decompress any substring of length m in the same complexity as a single random access query and additional O(m) time. Combining these results with fast algorithms for uncompressed approximate string matching leads to several efficient algorithms for approximate string matching on grammar-compressed strings without decompression. For instance, we can find all approximate occurrences of a pattern P with at most k errors in time O(n(min{|P|k, k^4 + |P|} + log N) + occ), where occ is the number of occurrences of P in S. Finally, we are able to generalize our results to navigation and other operations on grammar-compressed trees. All of the above bounds significantly improve the currently best known results. To achieve these bounds, we introduce several new techniques and data structures of independent interest, including a predecessor data structure, two "biased" weighted ancestor data structures, and a compact representation of heavy-paths in grammars.
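
Random access into a grammar-compressed string can be illustrated with a much simpler baseline than the one in the abstract: store the expansion length of every nonterminal and walk down from the start symbol. This sketch runs in time proportional to the grammar's height, not the O(log N) bound above; the rule encoding (a one-character string for a terminal rule, a pair of symbol names for a binary rule) and the names `expansion_lengths` and `access` are assumptions made for the example.

```python
def expansion_lengths(rules, start):
    """Map each symbol reachable from `start` to the length of its expansion."""
    lengths = {}

    def length(sym):
        if sym not in lengths:
            rhs = rules[sym]
            if isinstance(rhs, str):      # terminal rule: a single character
                lengths[sym] = 1
            else:                         # binary rule: (left, right)
                lengths[sym] = length(rhs[0]) + length(rhs[1])
        return lengths[sym]

    length(start)
    return lengths


def access(rules, lengths, start, i):
    """Return character i of the expanded string without decompressing it."""
    if not 0 <= i < lengths[start]:
        raise IndexError(i)
    sym = start
    while True:
        rhs = rules[sym]
        if isinstance(rhs, str):
            return rhs
        left, right = rhs
        if i < lengths[left]:
            sym = left                    # target lies in the left part
        else:
            i -= lengths[left]            # skip the left part entirely
            sym = right


# Hypothetical grammar whose start symbol "S" expands to "ababa".
rules = {"A": "a", "B": "b", "X": ("A", "B"), "Y": ("X", "X"), "S": ("Y", "A")}
lengths = expansion_lengths(rules, "S")
text = "".join(access(rules, lengths, "S", i) for i in range(lengths["S"]))
```

The stored lengths are what make it possible to decide, at each rule, whether the requested position falls in the left or right child; the paper's contribution is doing this descent in logarithmic time regardless of grammar height.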

Patent
Nicola Cancedda, Sara Stymne
25 Jul 2011
TL;DR: In this article, a method and a system for making merging decisions for a translation are disclosed which are suited to use where the target language is a productive compounding one, including outputting decisions on merging of pairs of words in a translated text string with a merging system.
Abstract: A method and a system for making merging decisions for a translation are disclosed which are suited to use where the target language is a productive compounding one. The method includes outputting decisions on merging of pairs of words in a translated text string with a merging system. The merging system can include a set of stored heuristics and/or a merging model. In the case of heuristics, these can include a heuristic by which two consecutive words in the string are considered for merging if the first word of the two consecutive words is recognized as a compound modifier and their observed frequency f 1 as a closed compound word is larger than an observed frequency f 2 of the two consecutive words as a bigram. In the case of a merging model, it can be one that is trained on features associated with pairs of consecutive tokens of text strings in a training set and predetermined merging decisions for the pairs. A translation in the target language is output, based on the merging decisions for the translated text string.

Patent
07 Jun 2011
TL;DR: In this article, a method for detecting and locating occurrence in a data stream of any complex string belonging to a predefined complex dictionary is disclosed, where a complex string may comprise an arbitrary number of interleaving coherent strings and ambiguous strings.
Abstract: A method for detecting and locating occurrence in a data stream of any complex string belonging to a predefined complex dictionary is disclosed. A complex string may comprise an arbitrary number of interleaving coherent strings and ambiguous strings. The method comprises a first process for transforming the complex dictionary into a simple structure to enable continuously conducting computationally efficient search, and a second process for examining received data in real time using the simple structure. The method may be implemented as an article of manufacture having a processor-readable storage medium having instructions stored thereon for execution by a processor, causing the processor to match examined data to an object complex string belonging to the complex dictionary, where the matching process is based on equality to constituent coherent strings, and congruence to ambiguous strings, of the object complex string.

Journal Article
TL;DR: It is claimed that the use of Bloom filters for calculating string similarities in a privacy-preserving manner can also be used for a novel error-tolerant but still irreversible encrypted key.
Abstract: An anonymous linking code is an encrypted key for linking data from different sources. So far, quite simple algorithms for the generation of such codes based on personal characteristics such as names and date of birth are in common use. These algorithms will yield many non-matching codes when facing errors in the underlying identifier values. We suggested the use of Bloom filters for calculating string similarities in a privacy-preserving manner. Here, we claim that this principle can also be used for a novel error-tolerant but still irreversible encrypted key. We call the proposed code Cryptographic Longterm Key. It consists of one single Bloom filter into which identifiers are subsequently stored. Tests on simulated databases yield linkage results comparable to non-encrypted identifiers and superior to results from hitherto existing methods. Since the Cryptographic Longterm Key can be easily adapted to meet quite different prerequisites, it might be useful for many applications.
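
A rough sketch of how Bloom filters can support error-tolerant comparison of identifiers: each identifier is split into character bigrams, every bigram sets a few keyed-hash positions in one shared bit array, and two encodings are compared with the Dice coefficient. The filter size, number of hash functions, padding character, HMAC-SHA256 construction, and the names `bloom_encode` and `dice` are assumptions for illustration, not the parameters of the Cryptographic Longterm Key.

```python
import hashlib
import hmac


def bigrams(value: str) -> set:
    """Character bigrams of a padded, lower-cased identifier."""
    padded = f"_{value.lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(identifiers, size=1000, num_hashes=10, secret=b"demo-key"):
    """Store the bigrams of all identifiers in one Bloom filter.

    The filter is represented as the set of bit positions that are set.
    """
    bits = set()
    for value in identifiers:
        for gram in bigrams(value):
            for k in range(num_hashes):
                digest = hmac.new(secret, f"{k}:{gram}".encode(), hashlib.sha256).digest()
                bits.add(int.from_bytes(digest[:8], "big") % size)
    return bits


def dice(a: set, b: set) -> float:
    """Dice coefficient of two sets of bit positions."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


same = dice(bloom_encode(["smith"]), bloom_encode(["smith"]))
typo = dice(bloom_encode(["smith"]), bloom_encode(["smyth"]))
other = dice(bloom_encode(["smith"]), bloom_encode(["jones"]))
```

A single-character error changes only the bigrams that touch it, so `typo` stays well above `other`, which is the property that makes the encoding tolerant of spelling errors.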

Proceedings Article
23 Jan 2011
TL;DR: A comprehensive set of algorithms and data structures for performing fast automata operations for string constraint solving is studied to provide an apples-to-apples comparison between techniques that are used in current tools.
Abstract: There has been significant recent interest in automated reasoning techniques, in particular constraint solvers, for string variables. These techniques support a wide variety of clients, ranging from static analysis to automated testing. The majority of string constraint solvers rely on finite automata to support regular expression constraints. For these approaches, performance depends critically on fast automata operations such as intersection, complementation, and determinization. Existing work in this area has not yet provided conclusive results as to which core algorithms and data structures work best in practice. In this paper, we study a comprehensive set of algorithms and data structures for performing fast automata operations. Our goal is to provide an apples-to-apples comparison between techniques that are used in current tools. To achieve this, we re-implemented a number of existing techniques. We use an established set of regular expressions benchmarks as an indicative workload. We also include several techniques that, to the best of our knowledge, have not yet been used for string constraint solving. Our results show that there is a substantial performance difference across techniques, which has implications for future tool design.
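
Of the automata operations named above, intersection is the easiest to show. The sketch below is the textbook product construction over deterministic automata, exploring only reachable state pairs. It is not one of the implementations compared in the paper; the tuple encoding `(start, accepting, transitions)` and the function names are assumptions.

```python
from collections import deque


def intersect(dfa_a, dfa_b):
    """Product construction for two DFAs.

    Each DFA is (start, accepting_states, transitions), where
    transitions[(state, symbol)] is the next state; missing entries reject.
    """
    start = (dfa_a[0], dfa_b[0])
    transitions, accepting, seen = {}, set(), {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        p, q = state
        if p in dfa_a[1] and q in dfa_b[1]:
            accepting.add(state)
        # Linear scan over A's transitions: simple, not optimized.
        for (src, symbol), dst_a in dfa_a[2].items():
            if src != p:
                continue
            dst_b = dfa_b[2].get((q, symbol))
            if dst_b is None:
                continue
            nxt = (dst_a, dst_b)
            transitions[(state, symbol)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return start, accepting, transitions


def accepts(dfa, word):
    """Run a DFA on `word`."""
    state = dfa[0]
    for ch in word:
        if (state, ch) not in dfa[2]:
            return False
        state = dfa[2][(state, ch)]
    return state in dfa[1]


# Strings over {a, b} with an even number of a's.
even_a = (0, {0}, {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1})
# Strings over {a, b} that end in b.
ends_b = ("n", {"y"}, {("n", "a"): "n", ("n", "b"): "y", ("y", "a"): "n", ("y", "b"): "y"})
both = intersect(even_a, ends_b)
```

The product automaton accepts exactly the strings accepted by both inputs, for example `aab` and `b`, but not `ab` (odd number of a's) or `aa` (does not end in b).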

Proceedings Article
27 Jul 2011
TL;DR: A source dependency structure based model that requires no heuristics or separate ordering models of the previous works to control the word order of translations and performs well on long distance reordering.
Abstract: Dependency structure, as a first step towards semantics, is believed to be helpful to improve translation quality. However, previous works on dependency structure based models typically resort to insertion operations to complete translations, which make it difficult to specify ordering information in translation rules. In our model, we handle this problem by directly specifying the ordering information in head-dependents rules which represent the source side as head-dependents relations and the target side as strings. The head-dependents rules require only substitution operation, thus our model requires no heuristics or separate ordering models of the previous works to control the word order of translations. Large-scale experiments show that our model performs well on long distance reordering, and outperforms the state-of-the-art constituency-to-string model (+1.47 BLEU on average) and hierarchical phrase-based model (+0.46 BLEU on average) on two Chinese-English NIST test sets without resorting to phrases or parse forest. For the first time, a source dependency structure based model catches up with and surpasses the state-of-the-art translation models.

Patent
21 Jan 2011
TL;DR: In this article, weak column information is used to facilitate error detection and correction operations on a first plurality of bits of data read from the plurality of strings using an algorithm that modifies a weighting of the reliability of one or more data bits in the first plurality.
Abstract: Methods of operating nonvolatile memory devices include testing a plurality of strings of nonvolatile memory cells in the memory device to identify at least one weak string therein having a higher probability of yielding erroneous read data error relative to other ones of the plurality of strings. An identity of the at least one weak string may be stored as weak column information. This weak column information may be used to facilitate error detection and correction operations. In particular, an error correction operation may be performed on a first plurality of bits of data read from the plurality of strings using an algorithm that modifies a weighting of the reliability of one or more data bits in the first plurality of bits of data based on the weak column information. More specifically, an algorithm may be used that interprets a bit of data read from the at least one weak string as having a relatively reduced reliability relative to other ones of the first plurality of data bits.

Patent
16 Sep 2011
TL;DR: In this article, the authors propose a method for severing a tubular string having a cable in association therewith, which can be performed through a single actuation of a single cutting apparatus, enabling at least a portion of the tube to be subsequently severed and retrieved.
Abstract: Methods for severing a tubular string having a cable in association therewith can include lowering a cutting apparatus into the tubular string and actuating the cutting apparatus to form a cut in the tubular string and sever the cable. Severing the cable in this manner can be performed through a single actuation of a single cutting apparatus, enabling at least a portion of the tubular string to be subsequently severed and retrieved, unimpeded by the cable.

Book Chapter
27 Jun 2011
TL;DR: The algorithms are lightweight in that the first needs O(m log m) bits of memory to process m strings and the memory requirements of the second are constant with respect to m, and apply to any string collection over any alphabet.
Abstract: A modern DNA sequencing machine can generate a billion or more sequence fragments in a matter of days. The many uses of the BWT in compression and indexing are well known, but the computational demands of creating the BWT of datasets this large have prevented its applications from being widely explored in this context. We address this obstacle by presenting two algorithms capable of computing the BWT of very large string collections. The algorithms are lightweight in that the first needs O(m log m) bits of memory to process m strings and the memory requirements of the second are constant with respect to m. We evaluate our algorithms on collections of up to 1 billion strings and compare their performance to other approaches on smaller datasets. Although our tests were on collections of DNA sequences of uniform length, the algorithms themselves apply to any string collection over any alphabet.
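
For readers unfamiliar with the BWT of a string collection, the following naive sketch shows what is being computed. It materializes and sorts every suffix, so it is the opposite of lightweight and is not either algorithm from the chapter. It assumes each string gets its own end marker, that markers sort before all other characters, and that ties between markers are broken by string order; the function name `collection_bwt` and printing every marker as `$` are also assumptions.

```python
def collection_bwt(strings):
    """Naive BWT of a collection of strings.

    Conceptually each string i is followed by its own end marker $_i,
    smaller than every ordinary character, with $_i < $_j when i < j.
    Every marker is written as "$" in the output.
    """
    suffixes = []
    for idx, s in enumerate(strings):
        # pos == len(s) stands for the suffix consisting of the marker alone.
        for pos in range(len(s) + 1):
            suffixes.append((s[pos:], idx, pos))
    # Comparing the text before the marker and then the string index is
    # equivalent to comparing suffixes with the markers as defined above.
    suffixes.sort()
    out = []
    for _text, idx, pos in suffixes:
        # The BWT symbol is the character preceding the suffix; the suffix
        # that starts a string is preceded (cyclically) by its end marker.
        out.append(strings[idx][pos - 1] if pos > 0 else "$")
    return "".join(out)


single = collection_bwt(["banana"])
pair = collection_bwt(["AC", "AG"])
```

For a single string the result coincides with the usual BWT of `banana$`; for the pair, the two marker-only suffixes sort first, followed by `AC`, `AG`, `C`, `G`.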

Proceedings ArticleDOI
16 Jul 2011
TL;DR: This work generates bilingual lexicons in 15 language pairs, focusing on words that have been automatically identified as physical objects, and uses these explicit, monolingual, image-to-word connections to successfully learn implicit, bilingual, word-to-word translations.
Abstract: Speakers of many different languages use the Internet. A common activity among these users is uploading images and associating these images with words (in their own language) as captions, filenames, or surrounding text. We use these explicit, monolingual, image-to-word connections to successfully learn implicit, bilingual, word-to-word translations. Bilingual pairs of words are proposed as translations if their corresponding images have similar visual features. We generate bilingual lexicons in 15 language pairs, focusing on words that have been automatically identified as physical objects. The use of visual similarity substantially improves performance over standard approaches based on string similarity: for generated lexicons with 1000 translations, including visual information leads to an absolute improvement in accuracy of 8-12% over string edit distance alone.
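
The string edit distance used as the baseline above is commonly computed with the standard Levenshtein dynamic program. The sketch below shows that standard algorithm with unit costs; the abstract does not say which variant or cost model the authors used, so treat the details as assumptions.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost insert, delete and substitute."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (0 if char_a == char_b else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    # Cognates tend to score low; unrelated words score high.
    print(edit_distance("night", "nacht"), edit_distance("dog", "perro"))
```

A measure like this only helps when translations are spelled similarly, which is why a visual signal can add information for pairs of languages with unrelated vocabularies or scripts.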

Patent
Sean Stauth, Sewook Wee
04 Feb 2011
TL;DR: In this paper, a mechanism for securely transmitting credentials to instantiated virtual machines is provided, where a central server is used to turn on a virtual machine and send it a secret text string.
Abstract: A mechanism for securely transmitting credentials to instantiated virtual machines is provided. A central server is used to turn on a virtual machine. When the virtual machine is turned on, the central server sends it a secret text string. The virtual machine requests the credentials from the central server by transmitting the secret string and its instance ID. The central server validates the secret string and source IP to determine whether they are authentic. Once verified, the central server transmits the credentials to the virtual machine in a secure channel and invalidates the secret string. The credentials can now be used to authenticate API calls.
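
The exchange described above can be sketched as a one-time token handshake. This is a toy in-memory model under stated assumptions, not the patented implementation: the class `CredentialServer`, its method names, and the sample instance ID and IP address are hypothetical, and the secure channel used to deliver the credentials is out of scope here.

```python
import hmac
import secrets


class CredentialServer:
    """Toy model of the central server; all names are illustrative."""

    def __init__(self, credentials: str):
        self._credentials = credentials
        self._pending = {}  # instance_id -> (secret, expected source IP)

    def launch(self, instance_id: str, ip: str) -> str:
        """Start an instance and hand it a fresh secret string."""
        secret = secrets.token_urlsafe(32)
        self._pending[instance_id] = (secret, ip)
        return secret

    def request_credentials(self, instance_id: str, secret: str, source_ip: str):
        """Return the credentials once if secret and source IP check out."""
        entry = self._pending.get(instance_id)
        if entry is None:
            return None
        expected_secret, expected_ip = entry
        # Constant-time comparison avoids leaking how much of the secret matched.
        secret_ok = hmac.compare_digest(expected_secret.encode(), secret.encode())
        if not secret_ok or source_ip != expected_ip:
            return None
        del self._pending[instance_id]  # invalidate the secret after use
        return self._credentials


if __name__ == "__main__":
    server = CredentialServer("example-api-key")
    token = server.launch("i-001", "10.0.0.5")
    print(server.request_credentials("i-001", token, "10.0.0.5"))
    print(server.request_credentials("i-001", token, "10.0.0.5"))  # already used
```

Deleting the pending entry after a successful request is what makes the secret single-use, so a leaked copy is useless once the legitimate instance has collected its credentials.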

Journal ArticleDOI
TL;DR: The currently fastest algorithm for RNA Single Strand Folding requires O(nZ) time and Θ(n^2) space, where n denotes the length of the input string and Z is a sparsity parameter satisfying n ≤ Z.

Posted Content
TL;DR: Approaches to the task of recognizing textual entailment, including the use of subsequence matching, lexical entailment probability, and latent Dirichlet allocation, can be described within this framework.
Abstract: Techniques in which words are represented as vectors have proved useful in many applications in computational linguistics; however, there is currently no general semantic formalism for representing meaning in terms of vectors. We present a framework for natural language semantics in which words, phrases and sentences are all represented as vectors, based on a theoretical analysis which assumes that meaning is determined by context. In the theoretical analysis, we define a corpus model as a mathematical abstraction of a text corpus. The meaning of a string of words is assumed to be a vector representing the contexts in which it occurs in the corpus model. Based on this assumption, we can show that the vector representations of words can be considered as elements of an algebra over a field. We note that in applications of vector spaces to representing meanings of words there is an underlying lattice structure; we interpret the partial ordering of the lattice as describing entailment between meanings. We also define the context-theoretic probability of a string, and, based on this and the lattice structure, a degree of entailment between strings. We relate the framework to existing methods of composing vector-based representations of meaning, and show that our approach generalises many of these, including vector addition, component-wise multiplication, and the tensor product.
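
The three composition methods named at the end of the abstract can be illustrated on small context vectors. The sketch below shows only those existing operations, not the context-theoretic framework itself; the function names and the toy vectors are assumptions made for illustration.

```python
def _check_same_length(u, v):
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")


def compose_add(u, v):
    """Vector addition: contexts of either word contribute to the phrase."""
    _check_same_length(u, v)
    return [a + b for a, b in zip(u, v)]


def compose_multiply(u, v):
    """Component-wise product: only contexts shared by both words survive."""
    _check_same_length(u, v)
    return [a * b for a, b in zip(u, v)]


def compose_tensor(u, v):
    """Tensor (outer) product, flattened row by row; dimensions may differ."""
    return [a * b for a in u for b in v]


if __name__ == "__main__":
    red = [1, 2]   # toy context counts for "red"
    car = [3, 4]   # toy context counts for "car"
    print(compose_add(red, car))
    print(compose_multiply(red, car))
    print(compose_tensor(red, car))
```

Addition and the component-wise product are commutative, so they cannot distinguish word order, whereas the tensor product keeps the two words in separate positions at the cost of a larger representation.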

Proceedings ArticleDOI
21 Mar 2011
TL;DR: The experimental results show that the proposed string-based approach is the best performing approach for automatic prediction of Valence and Expectation dimensions, and improves prediction performance for the other dimensions when combined with at least acoustic signal-based features.
Abstract: The automatic assessment of affect is mostly based on feature-level approaches, such as distances between facial points or prosodic and spectral information when it comes to audiovisual analysis. However, it is known and intuitive that behavioural events such as smiles, head shakes or laughter and sighs also bear highly relevant information regarding a subject's affective display. Accordingly, we propose a novel string-based prediction approach to fuse such events and to predict human affect in a continuous dimensional space. Extensive analysis and evaluation has been conducted using the newly released SEMAINE database of human-to-agent communication. For a thorough understanding of the obtained results, we provide additional benchmarks by more conventional feature-level modelling, and compare these and the string-based approach to fusion of signal-based features and string-based events. Our experimental results show that the proposed string-based approach is the best performing approach for automatic prediction of Valence and Expectation dimensions, and improves prediction performance for the other dimensions when combined with at least acoustic signal-based features.

Book ChapterDOI
04 Dec 2011
TL;DR: These are the first provably secure constructions of universally composable (UC) commitments (in pairing-friendly groups) that simultaneously combine the key properties of being non-interactive, supporting commitments to strings, and offering re-usability of the common reference string for multiple commitments.
Abstract: We present the first provably secure constructions of universally composable (UC) commitments (in pairing-friendly groups) that simultaneously combine the key properties of being non-interactive, supporting commitments to strings (instead of bits only), and offering re-usability of the common reference string for multiple commitments. Our schemes are also adaptively secure assuming reliable erasures.