Showing papers on "String (computer science)" published in 2006

Open Access

Lightweight Directory Access Protocol (LDAP): Internationalized String Preparation

01 Jun 2006

TL;DR: This document defines string preparation algorithms for character-based matching rules defined for use in LDAP.

Abstract: The previous Lightweight Directory Access Protocol (LDAP) technical specifications did not precisely define how character string matching is to be performed. This led to a number of usability and interoperability problems. This document defines string preparation algorithms for character-based matching rules defined for use in LDAP. [STANDARDS-TRACK]

722 citations

Journal Article•DOI•

String method in collective variables: Minimum free energy paths and isocommittor surfaces

Luca Maragliano¹, Alexander Fischer², Eric Vanden-Eijnden, Giovanni Ciccotti³•Institutions (3)

Courant Institute of Mathematical Sciences¹, New York University², Sapienza University of Rome³

14 Jul 2006-Journal of Chemical Physics

TL;DR: A computational technique is proposed that combines the string method with a sampling technique to determine minimum free energy paths; it captures the mechanism of transition in that it allows the committor function for the reaction and, in particular, the transition state region to be determined.

Abstract: A computational technique is proposed which combines the string method with a sampling technique to determine minimum free energy paths. The technique requires computing only the mean force and another conditional expectation locally along the string, and therefore can be applied even if the number of collective variables kept in the free energy calculation is large. This is in contrast with other free energy sampling techniques which aim at mapping the full free energy landscape and whose cost increases exponentially with the number of collective variables kept in the free energy. Provided that the number of collective variables is large enough, the new technique captures the mechanism of transition in that it allows the committor function for the reaction and, in particular, the transition state region to be determined. The new technique is illustrated on the example of alanine dipeptide, in which we compute the minimum free energy path for the isomerization transition using either two or four dihedral angles as collective variables. It is shown that the mechanism of transition can be captured using the four dihedral angles, but it cannot be captured using only two of them.

662 citations

Proceedings Article•DOI•

Tree-to-String Alignment Template for Statistical Machine Translation

Yang Liu¹, Qun Liu¹, Shouxun Lin¹•Institutions (1)

Chinese Academy of Sciences¹

17 Jul 2006

TL;DR: A novel translation model is presented, based on the tree-to-string alignment template (TAT), which describes the alignment between a source parse tree and a target string; the TAT-based model significantly outperforms Pharaoh, a state-of-the-art decoder for phrase-based models.

Abstract: We present a novel translation model based on tree-to-string alignment template (TAT) which describes the alignment between a source parse tree and a target string. A TAT is capable of generating both terminals and non-terminals and performing reordering at both low and high levels. The model is linguistically syntax-based because TATs are extracted automatically from word-aligned, source side parsed parallel texts. To translate a source sentence, we first employ a parser to produce a source parse tree and then apply TATs to transform the tree into a target string. Our experiments show that the TAT-based model significantly outperforms Pharaoh, a state-of-the-art decoder for phrase-based models.

350 citations

Journal Article•DOI•

Algorithmic construction of sets for k-restrictions

Noga Alon¹, Dana Moshkovitz², Shmuel Safra¹•Institutions (2)

Tel Aviv University¹, Weizmann Institute of Science²

01 Apr 2006-ACM Transactions on Algorithms

TL;DR: This work addresses k-restriction problems, which unify combinatorial problems in which a short list of strings must satisfy a given set of k-wise demands, and offers a generic algorithmic method that yields considerably smaller constructions.

Abstract: This work addresses k-restriction problems, which unify combinatorial problems of the following type: The goal is to construct a short list of strings in Σ^m that satisfies a given set of k-wise demands. For every k positions and every demand, there must be at least one string in the list that satisfies the demand at these positions. Problems of this form frequently arise in different fields in Computer Science. The standard approach for deterministically solving such problems is via almost k-wise independence or k-wise approximations for other distributions. We offer a generic algorithmic method that yields considerably smaller constructions. To this end, we generalize a previous work of Naor et al. [1995]. Among other results, we enhance the combinatorial objects at the heart of their method, called splitters, and construct multi-way splitters, using a new discrete version of the topological Necklace Splitting Theorem [Alon 1987]. We utilize our methods to show improved constructions for group testing [Ngo and Du 2000] and generalized hashing [Alon et al. 2003], and an improved inapproximability result for SET-COVER under the assumption P ≠ NP.

339 citations

Patent•

Method and apparatus to control operation of a playback device

Vadim Brenner¹, Peter C. DiMaria¹, Dale T. Roberts¹, Michael W. Mantle¹, Michael W. Orme¹•Institutions (1)

Gracenote¹

21 Aug 2006

TL;DR: Media metadata for a plurality of media items includes strings that identify information about the items; phonetic metadata is associated with those strings, and each portion of the phonetic metadata is stored in the original language of the string (see FIG. 12).

Abstract: Media metadata is accessible for a plurality of media items (See FIG. 12). The media metadata includes a number of strings to identify information regarding the media items (See FIG. 12). Phonetic metadata is associated with the number of strings of the media metadata (See FIG. 12). Each portion of the phonetic metadata is stored in an original language of the string (See FIG. 12).

262 citations

Patent•

System and method for monitoring photovoltaic power generation systems

Gordon E Presher, Carlton L Warren

18 Jan 2006

TL;DR: A system and method for monitoring photovoltaic power generation systems or arrays (230), both on a local (site) level (100) and from a central location (610).

Abstract: A system and method for monitoring photovoltaic power generation systems or arrays (230), both on a local (site) level (100) and from a central location (610). The system includes panel and string combiner sentries (70) or intelligent devices, in bidirectional communication with a master device on the site to facilitate installation and troubleshooting of faults in the array (e.g., Fig. 9), including performance monitoring and diagnostic data collection (e.g., Figs. 14, 15).

238 citations

Proceedings Article•

Statistical syntax-directed translation with extended domain of locality

Liang Huang¹, Kevin Knight², Aravind K. Joshi¹•Institutions (2)

University of Pennsylvania¹, Information Sciences Institute²

08 Aug 2006

TL;DR: A simple-yet-effective algorithm to generate non-duplicate k-best translations for n-gram rescoring is devised and a direct probability model is defined and a linear-time dynamic programming algorithm is used to search for the best derivation.

Abstract: In syntax-directed translation, the source-language input is first parsed into a parse-tree, which is then recursively converted into a string in the target-language. We model this conversion by an extended tree-to-string transducer that has multi-level trees on the source-side, which gives our system more expressive power and flexibility. We also define a direct probability model and use a linear-time dynamic programming algorithm to search for the best derivation. The model is then extended to the general log-linear framework in order to incorporate other features like n-gram language models. We devise a simple-yet-effective algorithm to generate non-duplicate k-best translations for n-gram rescoring. Preliminary experiments on English-to-Chinese translation show a significant improvement in terms of translation quality compared to a state-of-the-art phrase-based system.

217 citations

Journal Article•DOI•

Fast and Scalable Pattern Matching for Network Intrusion Detection Systems

Sarang Dharmapurikar¹, John W. Lockwood¹•Institutions (1)

Washington University in St. Louis¹

01 Oct 2006-IEEE Journal on Selected Areas in Communications

TL;DR: This work presents a hardware-implementable pattern matching algorithm for content filtering applications, which is scalable in terms of speed, the number of patterns and the pattern length, and is based on a memory-efficient multihashing data structure called a Bloom filter.

Abstract: High-speed packet content inspection and filtering devices rely on a fast multipattern matching algorithm which is used to detect predefined keywords or signatures in the packets. Multipattern matching is known to require intensive memory accesses and is often a performance bottleneck. Hence, specialized hardware-accelerated algorithms are required for line-speed packet processing. We present a hardware-implementable pattern matching algorithm for content filtering applications, which is scalable in terms of speed, the number of patterns and the pattern length. Our algorithm is based on a memory-efficient multihashing data structure called a Bloom filter. We use embedded on-chip memory blocks in field programmable gate array/very large scale integration chips to construct Bloom filters which can suppress a large fraction of memory accesses and speed up string matching. Based on this concept, we first present a simple algorithm which can scan for several thousand short (up to 16 bytes) patterns at multigigabit per second speeds with a moderately small amount of embedded memory and a few megabytes of external memory. Furthermore, we modify this algorithm to be able to handle arbitrarily large strings at the cost of a little more on-chip memory. We demonstrate the merit of our algorithm through theoretical analysis and simulations performed on Snort's string set.
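
The Bloom-filter idea in the abstract above can be sketched in software. This is a minimal illustration, not the paper's hardware design: the class `BloomFilter`, the function `scan`, the use of one filter per pattern length, and the salted `blake2b` hashing are all assumptions made for the example. The exact-match set stands in for the slower memory lookup that the filter is meant to suppress.

```python
import hashlib


class BloomFilter:
    """Bit-array membership sketch: false positives possible, false negatives not."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item: bytes):
        # Derive several bit positions by salting a single hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(item, digest_size=8, salt=bytes([seed])).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def scan(payload: bytes, patterns: list[bytes]) -> list[tuple[int, bytes]]:
    """Return (offset, pattern) for every pattern occurrence in payload."""
    # Assumption for this sketch: one Bloom filter per distinct pattern length.
    filters: dict[int, BloomFilter] = {}
    for pattern in patterns:
        if len(pattern) not in filters:
            filters[len(pattern)] = BloomFilter()
        filters[len(pattern)].add(pattern)
    exact = set(patterns)  # stands in for the slower exact-match lookup

    hits = []
    for start in range(len(payload)):
        for length, bloom in filters.items():
            window = payload[start:start + length]
            # The cheap filter test runs first; the exact check only runs on a filter hit.
            if len(window) == length and window in bloom and window in exact:
                hits.append((start, window))
    return hits
```

Most windows fail the filter test, so the exact lookup is reached only for true matches and the occasional false positive.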

215 citations

Patent•

Associating geographic-related information with objects

Suman Nath¹•Institutions (1)

Microsoft¹

26 Oct 2006

TL;DR: A search is conducted on a keyword string of one or more keywords descriptive or otherwise representative of a geographically-relevant object; if a location is identified, geographic-related semantic information of the location is associated with the object.

Abstract: Techniques for associating geographic-related information with objects are described. In one implementation, a search is conducted on a keyword string of one or more keywords descriptive or otherwise representative of a geographically-relevant object. If a location is identified, geographic-related semantic information of the location is associated with the geographically-relevant object. In some cases, multiple possible locations may be identified as a result of searching the keyword string. If multiple locations are identified, a probable location is determined and then geographic-related semantic information of the probable location is associated with the geographically-relevant object described by the keyword string.

188 citations

Patent•

System and method for finding desired results by incremental search using an ambiguous keypad with the input containing orthographic and typographic errors

Pankaj Garg, Sashikumar Venkataraman, Gopal Mishrimalji Rajpurohit

21 Nov 2006

TL;DR: A system for finding and presenting content items in response to keystrokes entered by a user on an input device having a known layout of overloaded keys selected from a set of key layouts.

Abstract: A system for finding and presenting content items in response to keystrokes entered by a user on an input device having a known layout of overloaded keys selected from a set of key layouts. The system includes (1) a database containing content items and terms characterizing the content items; (2) input logic for receiving keystrokes from the user and building a string corresponding to incremental entries by the user, each item in the string having the set of alphanumeric symbols associated with a corresponding keystroke; (3) mapping logic to map the string to the database to find the most likely content items corresponding to the incremental entries, the mapping logic operating in accordance with a defined error model corresponding to the known layout of overloaded keys; and (4) presentation logic for ordering the most likely content items identified by the mapping logic and for presenting the most likely content items.
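
As a rough illustration of incremental search over overloaded keys, the sketch below maps each term to its key sequence on a conventional 12-key phone layout and filters terms by prefix after every keystroke. The layout, the names `KEYPAD`, `to_keys`, and `incremental_matches`, and the sample titles are assumptions for the example; the error model and result ordering described in the abstract are deliberately left out.

```python
# Assumed layout: the conventional 12-key phone keypad.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_KEY = {ch: key for key, letters in KEYPAD.items() for ch in letters}


def to_keys(term: str) -> str:
    """Translate a term into the key sequence that would type it."""
    return "".join(LETTER_TO_KEY[ch] for ch in term.lower() if ch in LETTER_TO_KEY)


def incremental_matches(keystrokes: str, terms: list[str]) -> list[str]:
    """Return the terms whose key sequence starts with the keystrokes so far."""
    return [term for term in terms if to_keys(term).startswith(keystrokes)]


titles = ["casablanca", "batman", "alien"]
print(incremental_matches("22", titles))   # two candidates remain after two keys
print(incremental_matches("228", titles))  # the third key narrows the list
```

Each additional keystroke can only shrink the candidate list, which is what makes the search incremental.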

181 citations

Proceedings Article•DOI•

High Speed Pattern Matching for Network IDS/IPS

Mansoor Alicherry¹, M. Muthuprasanna², Vijay Pochampalli Kumar¹•Institutions (2)

Bell Labs¹, Iowa State University²

12 Nov 2006

TL;DR: A novel multiple string matching algorithm that can process multiple characters at a time thus achieving multi-gigabit rate search speeds and an architecture for an efficient implementation on TCAM-based hardware are proposed.

Abstract: The phenomenal growth of the Internet in the last decade and society's increasing dependence on it have brought along a flood of security attacks on the networking and computing infrastructure. Intrusion detection/prevention systems provide defenses against these attacks by monitoring headers and payload of packets flowing through the network. Multiple string matching that can compare hundreds of string patterns simultaneously is a critical component of these systems, and is a well-studied problem. Most of the string matching solutions today are based on the classic Aho-Corasick algorithm, which has an inherent limitation: it can process only one input character per cycle. As memory speed is not growing at the same pace as network speed, this limitation has become a bottleneck in current networks, which have speeds of tens of gigabits per second. In this paper, we propose a novel multiple string matching algorithm that can process multiple characters at a time, thus achieving multi-gigabit rate search speeds. We also propose an architecture for an efficient implementation on TCAM-based hardware. We additionally propose novel optimizations by making use of the properties of TCAMs to significantly reduce the memory requirements of the proposed algorithm. We finally present extensive simulation results of network-based virus/worm detection using real signature databases to illustrate the effectiveness of the proposed scheme.

Patent•

Exploitation of language identification of media file data in speech dialog systems

Daniel Willett, Jochen Schwenninger, Hennecke Marcus, Raymond Brueckner

02 Oct 2006

TL;DR: A method for outputting a synthesized speech signal corresponding to an orthographic string stored in a media file comprising audio data: the audio data is analyzed to determine at least one candidate for the language of the orthographic string, a phonetic representation of the orthographic string is estimated based on the determined candidates, and a speech signal is synthesized based on the estimated phonetic representation.

Abstract: The present invention relates to a method for outputting a synthesized speech signal corresponding to an orthographic string stored in a media file comprising audio data, comprising the steps of analyzing the audio data to determine at least one candidate for a language of the orthographic string, estimating a phonetic representation of the orthographic string based on the determined at least one candidate for a language and synthesizing a speech signal based on the estimated phonetic representation of the orthographic string. The invention also relates to a media player incorporating such a method for estimating a phonetic representation for song and album titles as well as artists' names for speech recognition. Furthermore, the invention relates to the choice of an appropriate speech recognizer for automatically transcribing the lyrics of songs by using audio-based language estimates.

Book Chapter•DOI•

Round-optimal composable blind signatures in the common reference string model

Marc Fischlin¹•Institutions (1)

Technische Universität Darmstadt¹

20 Aug 2006

TL;DR: This work builds concurrently executable blind signature schemes in the common reference string model, based on general complexity assumptions and with optimal round complexity, and puts forward the definition of universally composable blind signature schemes.

Abstract: We build concurrently executable blind signature schemes in the common reference string model, based on general complexity assumptions, and with optimal round complexity. Namely, each interactive signature generation requires the requesting user and the issuing bank to transmit only one message each. We also put forward the definition of universally composable blind signature schemes, and show how to extend our concurrently executable blind signature protocol to derive such universally composable schemes in the common reference string model under general assumptions. While this protocol then guarantees very strong security properties when executed within larger protocols, it still supports signature generation in two moves.

Patent•

Multi-unit approach to text-to-speech synthesis

Matthias Neeracher¹, Devang Naik¹, Kevin B. Aitken¹, Jerome R. Bellegarda¹, Kim E. A. Silverman¹•Institutions (1)

Apple Inc.¹

16 Feb 2006

TL;DR: A method for matching a first level of units of a received input string to audio segments from a plurality of audio segments, using properties of or between the first level units to locate matching audio segments.

Abstract: Methods, apparatus, systems, and computer program products are provided for synthesizing speech. One method includes matching a first level of units of a received input string to audio segments from a plurality of audio segments including using properties of or between first level units to locate matching audio segments from a plurality of selections, parsing unmatched first level units into second level units, matching the second level units to audio segments using properties of or between the units to locate matching audio segments from a plurality of selections and synthesizing the input string, including combining the audio segments associated with the first and second units.

Patent•

String matching method and system and computer-readable recording medium storing the string matching method

Kyung-eun Lee¹•Institutions (1)

Samsung¹

16 Jun 2006

TL;DR: A string matching method, system, and a computer-readable medium storing instructions for determining and obtaining a representative string for a plurality of strings that are written in various manners but share the same meaning is described in this article.

Abstract: A string matching method, system, and a computer-readable medium storing instructions for determining and obtaining a representative string for a plurality of strings that are written in various manners but share the same meaning. The string matching method includes: converting the input string into one or more second-language strings with reference to a language mapping table, which stores a plurality of pieces of mapping information for mapping a first-language string to a second-language string, and generating a conversion list; searching a representative list database, which stores a plurality of records each with a representative string and a corresponding second-language representative string, for records including the same second-language representative strings as the respective second-language strings in the conversion list and generating a candidate list; and determining a representative string from the candidate list to be an output representative string. Therefore, the string matching can provide string-based multimedia data classification scenarios.

Journal Article•DOI•

Non-interactive correlation distillation, inhomogeneous Markov chains, and the reverse Bonami-Beckner inequality

Elchanan Mossel¹, Ryan O'Donnell², Oded Regev³, Jeffrey E. Steif⁴, Benny Sudakov⁵•Institutions (5)

University of California, Berkeley¹, Institute for Advanced Study², Tel Aviv University³, Chalmers University of Technology⁴, Princeton University⁵

01 Dec 2006-Israel Journal of Mathematics

TL;DR: NICD, a generalization of noise sensitivity previously considered in [5, 31, 39], is extended to trees, and the reverse Bonami-Beckner inequality is used to prove a new isoperimetric inequality for the discrete cube and a new result on the mixing of short random walks on the cube.

Abstract: In this paper we study non-interactive correlation distillation (NICD), a generalization of noise sensitivity previously considered in [5, 31, 39]. We extend the model to NICD on trees. In this model there is a fixed undirected tree with players at some of the nodes. One node is given a uniformly random string and this string is distributed throughout the network, with the edges of the tree acting as independent binary symmetric channels. The goal of the players is to agree on a shared random bit without communicating.

Journal Article•DOI•

Efficient Realization of Coordinate Structures in Combinatory Categorial Grammar

Michael White¹•Institutions (1)

University of Edinburgh¹

14 Mar 2006-Research on Language and Computation

TL;DR: A chart realization algorithm for Combinatory Categorial Grammar (CCG) is described, and it is shown how it can be used to efficiently realize a wide range of coordination phenomena, including argument cluster coordination and gapping.

Abstract: We describe a chart realization algorithm for Combinatory Categorial Grammar (CCG), and show how it can be used to efficiently realize a wide range of coordination phenomena, including argument cluster coordination and gapping. The algorithm incorporates three novel methods for improving the efficiency of chart realization: (i) using rules to chunk the input logical form into sub-problems to be solved independently prior to further combination; (ii) pruning edges from the chart based on the n-gram score of the edge’s string, in comparison to other edges with equivalent categories; and (iii) formulating the search as a best-first anytime algorithm, using n-gram scores to sort the edges on the agenda. The algorithm has been implemented as an extension to the OpenCCG open source CCG parser, and initial performance tests indicate that the realizer is fast enough for practical use in natural language dialogue systems.

Patent•

Network distributed tracking wire transfer protocol

John K. Overton, Stephen W. Bailey

13 Feb 2006

TL;DR: A network distributed tracking wire transfer protocol for storing and retrieving data across a distributed data collection; it includes a location string specifying the network location of data associated with an entity in the collection and an identification string specifying the identity of an entity.

Abstract: A network distributed tracking wire transfer protocol for storing and retrieving data across a distributed data collection. The protocol includes a location string for specifying the network location of data associated with an entity in the distributed data collection, and an identification string for specifying the identity of an entity in the distributed data collection. According to the protocol, the length of the location string and the length of the identification string are variable, and an association between an identification string and a location string can be spontaneously and dynamically changed. The network distributed tracking wire transfer protocol is application independent, organizationally independent, and geographically independent. A method for using the protocol in a distributed data collection environment and a system for implementing the protocol are also provided.

Journal Article•DOI•

Learning interpretable SVMs for biological sequence classification.

Gunnar Rätsch¹, Sören Sonnenburg², Christin Schäfer²•Institutions (2)

Max Planck Society¹, Fraunhofer Society²

20 Mar 2006-BMC Bioinformatics

TL;DR: Novel and efficient algorithms are proposed for solving the so-called Support Vector Multiple Kernel Learning problem and can be used to understand the obtained support vector decision function in order to extract biologically relevant knowledge about the sequence analysis problem at hand.

Abstract: Support Vector Machines (SVMs) – using a variety of string kernels – have been successfully applied to biological sequence classification problems. While SVMs achieve high classification accuracy they lack interpretability. In many applications, it does not suffice that an algorithm just detects a biological signal in the sequence, but it should also provide means to interpret its solution in order to gain biological insight. We propose novel and efficient algorithms for solving the so-called Support Vector Multiple Kernel Learning problem. The developed techniques can be used to understand the obtained support vector decision function in order to extract biologically relevant knowledge about the sequence analysis problem at hand. We apply the proposed methods to the task of acceptor splice site prediction and to the problem of recognizing alternatively spliced exons. Our algorithms compute sparse weightings of substring locations, highlighting which parts of the sequence are important for discrimination. The proposed method is able to deal with thousands of examples while combining hundreds of kernels within reasonable time, and reliably identifies a few statistically significant positions.

Journal Article•DOI•

Discovering Frequent Closed Partial Orders from Strings

Jian Pei¹, H. Wang², J. Liu, Ke Wang, Jianyong Wang², Philip S. Yu•Institutions (2)

Simon Fraser University¹, IEEE Computer Society²

01 Nov 2006-IEEE Transactions on Knowledge and Data Engineering

TL;DR: To tackle the problem, Frecpo is developed, a practically efficient algorithm for mining the complete set of frequent closed partial orders from large string databases and several interesting pruning techniques are devised to speed up the search.

Abstract: Mining knowledge about ordering from sequence data is an important problem with many applications, such as bioinformatics, Web mining, network management, and intrusion detection. For example, if many customers follow a partial order in their purchases of a series of products, the partial order can be used to predict other related customers' future purchases and develop marketing campaigns. Moreover, some biological sequences (e.g., microarray data) can be clustered based on the partial orders shared by the sequences. Given a set of items, a total order of a subset of items can be represented as a string. A string database is a multiset of strings. In this paper, we identify a novel problem of mining frequent closed partial orders from strings. Frequent closed partial orders capture the nonredundant and interesting ordering information from string databases. Importantly, mining frequent closed partial orders can discover meaningful knowledge that cannot be disclosed by previous data mining techniques. However, the problem of mining frequent closed partial orders is challenging. To tackle the problem, we develop Frecpo (for frequent closed partial order), a practically efficient algorithm for mining the complete set of frequent closed partial orders from large string databases. Several interesting pruning techniques are devised to speed up the search. We report an extensive performance study on both real data sets and synthetic data sets to illustrate the effectiveness and the efficiency of our approach.

Journal Article•DOI•

A simple optimal representation for balanced parentheses

Richard F. Geary¹, Naila Rahman¹, Rajeev Raman¹, Venkatesh Raman²•Institutions (2)

University of Leicester¹, Institute of Mathematical Sciences, Chennai²

10 Dec 2006-Theoretical Computer Science

TL;DR: A new 2n+o(n)-bit representation of balanced parentheses is given that supports all the required operations in O(1) time; it is conceptually simpler, its space bound has a smaller o(n) term, and it has a simple and uniform o(n) time and space construction algorithm.

Patent•

Multi-language document search and retrieval system

Wayne Loofbourrow¹, David Casseres¹•Institutions (1)

Apple Inc.¹

29 Dec 2006

TL;DR: A multi-lingual indexing and search system performs tokenization and stemming in a manner which is independent of whether index entries and search terms appear as words in a dictionary.

Abstract: A multi-lingual indexing and search system performs tokenization and stemming in a manner which is independent of whether index entries and search terms appear as words in a dictionary. During the tokenization phase of the process, a string of text is separated into individual word tokens, and predetermined types of tokens are eliminated from further processing. The stemming phase of the process reduces words to grammatical stems by removing known word-endings associated with the various languages to be supported. Known word endings are removed from the word tokens without any effort to guarantee that the remaining stem is contained in a dictionary. In a preferred implementation, the stemming process is only applied to nouns.
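
The two phases described above can be sketched as follows. The suffix list, the minimum stem length, the rule that drops purely numeric tokens, and the names `tokenize` and `stem` are hypothetical choices for illustration only; the patent's actual token types and word endings are not given in the abstract. The sketch also ignores the preference for stemming only nouns.

```python
import re

# Hypothetical word endings, ordered longest first so the longest match wins.
SUFFIXES = ("ations", "ation", "ings", "ing", "es", "s")


def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens and drop purely numeric tokens."""
    return [tok for tok in re.findall(r"\w+", text.lower()) if not tok.isdigit()]


def stem(token: str, min_stem: int = 3) -> str:
    """Strip a known ending without checking that the stem is a dictionary word."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= min_stem:
            return token[: -len(suffix)]
    return token


print([stem(tok) for tok in tokenize("Indexing 42 translations")])
# The stem "transl" is not a dictionary word, which the approach tolerates.
```

Because index entries and search terms pass through the same two functions, they still meet at the same stem even when that stem is not a real word.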

Posted Content•

Spelling-Error Tolerant, Order-Independent Pass-Phrases via the Damerau-Levenshtein String-Edit Distance Metric.

Gregory V. Bard

01 Jan 2006-IACR Cryptology ePrint Archive

TL;DR: This paper shows how a dictionary can be used with the Damerau-Levenshtein string-edit distance metric to construct a case-insensitive pass-phrase system that can tolerate zero, one, or two spelling errors per word, with no loss in security.

Abstract: It is well understood that passwords must be very long and complex to have sufficient entropy for security purposes. Unfortunately, these passwords tend to be hard to memorize, and so alternatives are sought. Smart Cards, Biometrics, and Reverse Turing Tests (human-only solvable puzzles) are options, but another option is to use pass-phrases. This paper explores methods for making pass-phrases suitable for use with password-based authentication and key-exchange (PAKE) protocols, and in particular, with schemes resilient to server-file compromise. In particular, the Ω-method of Gentry, MacKenzie and Ramzan is combined with the Bellovin-Merritt protocol to provide mutual authentication (in the random oracle model (Canetti, Goldreich & Halevi 2004, Bellare, Boldyreva & Palacio 2004, Maurer, Renner & Holenstein 2004)). Furthermore, since common password-related problems are typographical errors, and the CAPSLOCK key, we show how a dictionary can be used with the Damerau-Levenshtein string-edit distance metric to construct a case-insensitive pass-phrase system that can tolerate zero, one, or two spelling-errors per word, with no loss in security. Furthermore, we show that the system can be made to accept pass-phrases that have been arbitrarily reordered, with a security cost that can be calculated. While a pass-phrase space of 2 is not achieved by this scheme, sizes in the range of 2 to 2 result from various selections of parameter sizes. An attacker who has acquired the server-file must exhaust over this space, while an attacker without the server-file cannot succeed with non-negligible probability.
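
To make the error-tolerance idea concrete, the sketch below computes the optimal string alignment distance, a common restricted variant of Damerau-Levenshtein in which no substring is edited more than once, and uses it to map a mistyped word to a dictionary word. The function names `osa_distance` and `closest_word`, the sample dictionary, and the tie-breaking rule are assumptions for the example; the paper's own construction is not reproduced here.

```python
from typing import Optional


def osa_distance(a: str, b: str) -> int:
    """Edit distance with insertion, deletion, substitution, adjacent transposition."""
    prev2: list[int] = []
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        cur = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution or match
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                cur[j] = min(cur[j], prev2[j - 2] + 1)  # adjacent transposition
        prev2, prev = prev, cur
    return prev[len(b)]


def closest_word(typed: str, dictionary: list[str], max_errors: int = 2) -> Optional[str]:
    """Map a typed word to the nearest dictionary word, ignoring case."""
    typed = typed.lower()
    best = min(dictionary, key=lambda word: osa_distance(typed, word))
    return best if osa_distance(typed, best) <= max_errors else None


words = ["horse", "battery", "staple"]
print(closest_word("HOSRE", words))   # one transposition away from "horse"
print(closest_word("zzzzzz", words))  # too far from every entry
```

Lowercasing before comparison handles the CAPSLOCK case mentioned in the abstract, and counting a swap of adjacent letters as a single error is what separates this metric from plain Levenshtein distance.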

Proceedings Article•DOI•

PastryStrings: A Comprehensive Content-Based Publish/Subscribe DHT Network

Ioannis Aekaterinidis¹, Peter Triantafillou¹•Institutions (1)

University of Patras¹

04 Jul 2006

TL;DR: This work proposes and develops a comprehensive infrastructure for supporting rich queries on both numerical and string attributes (accommodating equality, prefix, suffix, and containment predicates) over DHT networks utilising prefix-based routing.

Abstract: In this work we propose and develop a comprehensive infrastructure, coined PastryStrings, for supporting rich queries on both numerical attributes (with range and comparison predicates) and string attributes (accommodating equality, prefix, suffix, and containment predicates) over DHT networks utilising prefix-based routing. As event-based publish/subscribe information systems are a champion application class, we formulate our solution in terms of this environment.

Journal Article•DOI•

Search‐based software test data generation for string data using program‐specific search operators

Mohammad Alshraideh¹, Leonardo Bottaci¹•Institutions (1)

University of Hull¹

01 Sep 2006-Software Testing, Verification & Reliability

TL;DR: This paper presents a novel approach to automatic software test data generation, where the test data is intended to cover program branches which depend on string predicates such as string equality, string ordering and regular expression matching.

Abstract: This paper presents a novel approach to automatic software test data generation, where the test data is intended to cover program branches which depend on string predicates such as string equality, string ordering and regular expression matching. A search-based approach is assumed and some potential search operators and corresponding evaluation functions are assembled. Their performance is assessed empirically by using them to generate test data for a number of test programs. A novel approach of using search operators based on programming language string operators and parameterized by string literals from the program under test is introduced. These operators are also assessed empirically in generating test data for the test programs and are shown to provide a significant increase in performance. Copyright © 2006 John Wiley & Sons, Ltd.

Journal Article•

The number of runs in a string: Improved analysis of the linear upper bound

Wojciech Rytter¹•Institutions (1)

University of Warsaw¹

01 Jan 2006-Lecture Notes in Computer Science

TL;DR: In this article, the authors proposed a new approach to the analysis of runs based on the properties of subperiods: the periods of periodic parts of the runs of a string.

Abstract: A run (or a maximal repetition) in a string is an inclusion-maximal periodic segment in a string. Let p(n) be the maximal number of runs in a string of length n. It has been shown in [8] that p(n) = O(n), but the proof was very complicated and the constant coefficient in O(n) was not given explicitly. We propose a new approach to the analysis of runs based on the properties of subperiods: the periods of periodic parts of the runs. We show that p(n) < 5n. Our proof is inspired by the results of [4], where the role of new periodicity lemmas has been emphasized.
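
To make the definition of a run concrete, here is a small brute-force sketch that enumerates the runs of a string. It assumes the usual convention that a run with smallest period p has length at least 2p. The function name `find_runs` is illustrative, and the quadratic-time scan is not the paper's analysis technique; it only shows what is being counted.

```python
def find_runs(s: str) -> list:
    """Return sorted (start, end, period) triples, one per run in s.
    Indices are 0-based and inclusive; period is the smallest period."""
    n = len(s)
    seen = {}  # (start, end) -> smallest period found for that segment
    for p in range(1, n // 2 + 1):
        k = 0
        while k + p < n:
            if s[k] != s[k + p]:
                k += 1
                continue
            start = k
            # extend the maximal stretch of positions k with s[k] == s[k + p]
            while k + p < n and s[k] == s[k + p]:
                k += 1
            # s[start .. k - 1 + p] has period p; keep it if its length is >= 2p
            if k - start >= p:
                # periods are tried in increasing order, so the first one
                # recorded for a segment is its smallest period
                seen.setdefault((start, k - 1 + p), p)
    return sorted((i, j, p) for (i, j), p in seen.items())


print(find_runs("mississippi"))
# [(1, 7, 3), (2, 3, 1), (5, 6, 1), (8, 9, 1)]
```

For "mississippi" (n = 11) this finds four runs, well under the 5n bound stated in the abstract.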

Proceedings Article•DOI•

Cache-oblivious string B-trees

Michael A. Bender¹, Martin Farach-Colton², Bradley C. Kuszmaul³•Institutions (3)

Stony Brook University¹, Rutgers University², Massachusetts Institute of Technology³

26 Jun 2006

TL;DR: This paper presents a cache-oblivious string B-tree (COSB-tree) data structure that searches asymptotically optimally, inserts and deletes nearly optimally, and maintains an index whose size is proportional to the front-compressed size of the dictionary.

Abstract: B-trees are the data structure of choice for maintaining searchable data on disk. However, B-trees perform suboptimally when keys are long or of variable length; when keys are compressed, even when using front compression, the standard B-tree compression scheme; for range queries; and with respect to memory effects such as disk prefetching. This paper presents a cache-oblivious string B-tree (COSB-tree) data structure that is efficient in all these ways: The COSB-tree searches asymptotically optimally and inserts and deletes nearly optimally. It maintains an index whose size is proportional to the front-compressed size of the dictionary. Furthermore, unlike standard front-compressed strings, keys can be decompressed in a memory-efficient manner. It performs range queries with no extra disk seeks; in contrast, B-trees incur disk seeks when skipping from leaf block to leaf block. It utilizes all levels of a memory hierarchy efficiently and makes good use of disk locality by using cache-oblivious layout strategies.
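
The front compression mentioned in this abstract can be sketched as follows, assuming the common formulation in which each key in sorted order is stored as the length of the prefix it shares with the previous key plus the remaining suffix. The function names are illustrative, and the sketch shows only plain front compression, not the COSB-tree itself.

```python
def front_compress(sorted_keys):
    """Encode each key as (shared_prefix_length, suffix) relative to the previous key."""
    encoded, prev = [], ""
    for key in sorted_keys:
        lcp = 0
        while lcp < min(len(prev), len(key)) and prev[lcp] == key[lcp]:
            lcp += 1
        encoded.append((lcp, key[lcp:]))
        prev = key
    return encoded


def front_decompress(encoded):
    """Rebuild the keys by replaying the entries in order."""
    keys, prev = [], ""
    for lcp, suffix in encoded:
        prev = prev[:lcp] + suffix
        keys.append(prev)
    return keys


keys = ["car", "card", "care", "cat"]
packed = front_compress(keys)
print(packed)  # [(0, 'car'), (3, 'd'), (3, 'e'), (2, 't')]
```

In this plain scheme, recovering a single key means replaying every entry before it, which is the kind of decompression cost the abstract contrasts with the COSB-tree.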

Book Chapter•DOI•

Global grammar constraints

Claude-Guy Quimper¹, Toby Walsh²•Institutions (2)

University of Waterloo¹, NICTA²

25 Sep 2006

TL;DR: This paper considers how to propagate global constraints specified via grammars or automata, such as the Regular constraint, together with a number of extensions.

Abstract: Global constraints are an important tool in the constraint toolkit. Unfortunately, whilst it is usually easy to specify when a global constraint holds, it is often difficult to build a good propagator. One promising direction is to specify global constraints via grammars or automata. For example, the Regular constraint [1] permits us to specify a wide range of global constraints by means of a DFA accepting a regular language, and to propagate this constraint specification efficiently and effectively. More precisely, the Regular constraint ensures that the values taken by a sequence of variables form a string accepted by the DFA. In this paper, we consider how to propagate such grammar constraints and a number of extensions.
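
As a rough illustration of what propagating the Regular constraint means, the sketch below prunes variable domains so that every remaining value takes part in at least one sequence accepted by the DFA. It uses forward and backward reachability over the unrolled automaton, which is one common way to do this filtering; it is an assumption of this example and not necessarily the propagator studied in the paper. The DFA encoding (a dict from `(state, value)` to the next state) and the name `filter_regular` are hypothetical.

```python
def filter_regular(domains, delta, start, accepting):
    """Return pruned domains: a value survives at position i only if some
    accepted sequence uses it there. States are assumed never to be None."""
    n = len(domains)
    # forward[i]: states reachable after reading the first i values
    forward = [set() for _ in range(n + 1)]
    forward[0] = {start}
    for i in range(n):
        for q in forward[i]:
            for v in domains[i]:
                if (q, v) in delta:
                    forward[i + 1].add(delta[(q, v)])
    # backward[i]: reachable states at position i that can still reach acceptance
    backward = [set() for _ in range(n + 1)]
    backward[n] = forward[n] & set(accepting)
    for i in range(n - 1, -1, -1):
        for q in forward[i]:
            if any(delta.get((q, v)) in backward[i + 1] for v in domains[i]):
                backward[i].add(q)
    # keep a value if some live state moves on it to a live state
    return [
        {v for v in domains[i] for q in backward[i]
         if delta.get((q, v)) in backward[i + 1]}
        for i in range(n)
    ]


# Hypothetical DFA over {0, 1} that rejects two consecutive 1s.
# State "a": last value was not 1; state "b": last value was 1.
delta = {("a", 0): "a", ("a", 1): "b", ("b", 0): "a"}
domains = [{1}, {0, 1}, {1}, {0, 1}]
print(filter_regular(domains, delta, "a", {"a", "b"}))
# [{1}, {0}, {1}, {0}]
```

If no accepted sequence exists, every returned domain is empty, which a solver would treat as a failure of the constraint.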

Patent•

Natural language search system

Kathleen Dahlgren, Edward P. Stabler, Karen Wallace, Paul Deane

08 Aug 2006

TL;DR: A natural language search system develops concept and string indexes of a textual database, such as a group of litigation documents, by breaking the text into sentences, words, dates, names and places, identifying phrases, recovering word stems, and determining the sense of potentially ambiguous words, all in accordance with words and concepts (word senses) stored in a lexicon database.

Abstract: A natural language search system develops concept and string indexes of a textual database, such as a group of litigation documents, by breaking the text to be indexed into sentences, words, dates, names and places in a reader, identifying phrases in a phrase parser, recovering word stems in a morphology module and determining the sense of potentially ambiguous words in a sense selector, all in accordance with words and concepts (word senses) stored in lexicon database 9-32. A query may then be processed by the reader, phrase parser, morphology module, and sense selector to provide a text meaning output which can be compared with the concept and string indexes to identify, retrieve and display documents and/or portions of documents related to the query. A lexicon enhancer adds vocabulary semi-automatically.

Patent•

Operating non-volatile memory with boost structures

Nima Mokhlesi¹•Institutions (1)

SanDisk¹

13 Nov 2006

TL;DR: Boost structures are provided for individual NAND strings in non-volatile memory and can be individually controlled to assist in programming, verifying and reading processes; they can be commonly boosted and individually discharged, in part based on a target programming state or verify level.

Abstract: A method for operating non-volatile memory having boost structures. The boost structures are provided for individual NAND strings and can be individually controlled to assist in programming, verifying and reading processes. The boost structures can be commonly boosted and individually discharged, in part, based on a target programming state or verify level. The boost structures assist in programming so that the programming and pass voltage on a word line can be reduced, thereby reducing side effects such as program disturb. During verifying, all storage elements on a word line can be verified concurrently. The boost structure can also assist during reading. In one approach, the NAND string has dual source-side select gates between which the boost structure contacts the substrate at a source/drain region, and a boost voltage is provided to the boost structure via a source-side of the NAND string.
