
Showing papers on "String (computer science) published in 2016"


Proceedings ArticleDOI
01 Nov 2016
TL;DR: This work investigates whether a neural, encoder-decoder translation system learns syntactic information on the source side as a by-product of training and proposes two methods to detect whether the encoder has learned local and global source syntax.
Abstract: We investigate whether a neural, encoder-decoder translation system learns syntactic information on the source side as a by-product of training. We propose two methods to detect whether the encoder has learned local and global source syntax. A fine-grained analysis of the syntactic structure learned by the encoder reveals which kinds of syntax are learned and which are missing.

352 citations


Proceedings ArticleDOI
01 Jun 2016
TL;DR: The authors built a multi-source machine translation model and trained it to maximize the probability of a target English string given French and German sources using the neural encoder-decoder framework.
Abstract: We build a multi-source machine translation model and train it to maximize the probability of a target English string given French and German sources. Using the neural encoder-decoder framework, we explore several combination methods and report up to +4.8 BLEU increases on top of a very strong attention-based neural translation model.

289 citations


Proceedings Article
04 Nov 2016
TL;DR: Neuro-Symbolic Program Synthesis (NSPS) as discussed by the authors is based on two neural modules: the cross correlation I/O network and the Recursive-Reverse-Recursive Neural Network (R3NN).
Abstract: Recent years have seen the proposal of a number of neural architectures for the problem of Program Induction. Given a set of input-output examples, these architectures are able to learn mappings that generalize to new test inputs. While achieving impressive results, these approaches have a number of important limitations: (a) they are computationally expensive and hard to train, (b) a model has to be trained for each task (program) separately, and (c) it is hard to interpret or verify the correctness of the learnt mapping (as it is defined by a neural network). In this paper, we propose a novel technique, Neuro-Symbolic Program Synthesis, to overcome the above-mentioned problems. Once trained, our approach can automatically construct computer programs in a domain-specific language that are consistent with a set of input-output examples provided at test time. Our method is based on two novel neural modules. The first module, called the cross correlation I/O network, given a set of input-output examples, produces a continuous representation of the set of I/O examples. The second module, the Recursive-Reverse-Recursive Neural Network (R3NN), given the continuous representation of the examples, synthesizes a program by incrementally expanding partial programs. We demonstrate the effectiveness of our approach by applying it to the rich and complex domain of regular expression based string transformations. Experiments show that the R3NN model is not only able to construct programs from new input-output examples, but it is also able to construct new programs for tasks that it had never observed before during training.

248 citations


Journal ArticleDOI
TL;DR: This paper presents a distributed finite-time adaptive integral-sliding-mode (ISM) control approach for a platoon of vehicles consisting of a leader and multiple followers subjected to bounded unknown disturbances to overcome string instability caused by nonzero initial spacing errors.
Abstract: This paper presents a distributed finite-time adaptive integral-sliding-mode (ISM) control approach for a platoon of vehicles consisting of a leader and multiple followers subjected to bounded unknown disturbances. In order to avoid collisions among the vehicles, control protocols have to be designed to ensure string stability of the whole vehicle platoon. First, the constant time headway (CTH) policy known to improve string stability is applied to the case of zero initial spacing errors. Contrary to requiring zero initial spacing and zero initial velocity errors simultaneously in existing methods based on constant spacing (CS) policy, initial velocity errors here are not required to be zero. Then, since string stability condition can fail at the initial conditions, a modified CTH policy is constructed to overcome string instability caused by nonzero initial spacing errors. Moreover, the proposed adaptive ISM control schemes can be implemented without the requirement that the bounds of the disturbances be known in advance. In addition, one effective method is proposed to reduce the chattering phenomenon caused by the indicator function. Finally, simulation results are included to demonstrate its effectiveness and advantages over existing methods.

159 citations


Book ChapterDOI
08 May 2016
TL;DR: A new definition of computationally binding commitment schemes in the quantum setting, which is called "collapse-binding", applies to string commitments, composes in parallel, and works well with rewinding-based proofs.
Abstract: We present a new definition of computationally binding commitment schemes in the quantum setting, which we call "collapse-binding". The definition applies to string commitments, composes in parallel, and works well with rewinding-based proofs. We give simple constructions of collapse-binding commitments in the random oracle model, giving evidence that they can be realized from hash functions like SHA-3. We evidence the usefulness of our definition by constructing three-round statistical zero-knowledge quantum arguments of knowledge for all NP languages.

107 citations


Book ChapterDOI
04 Dec 2016
TL;DR: In this paper, the security of NIZKs in the presence of a maliciously chosen common reference string is studied, with definitions given for subversion soundness, subversion witness indistinguishability, and subversion zero knowledge; both negative and positive results are provided, showing that certain combinations of goals are unachievable while giving protocols that achieve other combinations.
Abstract: Motivated by the subversion of "trusted" public parameters in mass-surveillance activities, this paper studies the security of NIZKs in the presence of a maliciously chosen common reference string. We provide definitions for subversion soundness, subversion witness indistinguishability and subversion zero knowledge. We then provide both negative and positive results, showing that certain combinations of goals are unachievable but giving protocols to achieve other combinations.

96 citations


Journal ArticleDOI
TL;DR: This study provides a comparison of 13 different ligand similarity functions, each of which utilizes the SMILES string representation of a molecule, and proposes cosine similarity based SMILES kernels that make use of the Term Frequency and Term Frequency-Inverse Document Frequency weighting approaches.
Abstract: Molecular structures can be represented as strings of special characters using SMILES. Since each molecule is represented as a string, the similarity between compounds can be computed using SMILES-based string similarity functions. Most previous studies on drug-target interaction prediction use 2D-based compound similarity kernels such as SIMCOMP. To the best of our knowledge, using SMILES-based similarity functions, which are computationally more efficient than the 2D-based kernels, has not been investigated for this task before. In this study, we adapt and evaluate various SMILES-based similarity methods for drug-target interaction prediction. In addition, inspired by the vector space model of Information Retrieval we propose cosine similarity based SMILES kernels that make use of the Term Frequency (TF) and Term Frequency-Inverse Document Frequency (TF-IDF) weighting approaches. We also investigate generating composite kernels by combining our best SMILES-based similarity functions with the SIMCOMP kernel. With this study, we provided a comparison of 13 different ligand similarity functions, each of which utilizes the SMILES string of molecule representation. Additionally, TF and TF-IDF based cosine similarity kernels are proposed. The more efficient SMILES-based similarity functions performed similarly to the more complex 2D-based SIMCOMP kernel in terms of AUC-ROC scores. The TF-IDF based cosine similarity obtained a better AUC-PR score than the SIMCOMP kernel on the GPCR benchmark data set. The composite kernel of TF-IDF based cosine similarity and SIMCOMP achieved the best AUC-PR scores for all data sets.

95 citations
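The TF-IDF cosine similarity between SMILES strings described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: treating character 3-grams as the "terms" and using a plain logarithmic inverse-document-frequency weight are assumptions made for the example.

```python
from collections import Counter
from math import log, sqrt

def char_ngrams(smiles, n=3):
    """Character n-gram counts of a SMILES string (n=3 is an assumption)."""
    return Counter(smiles[i:i + n] for i in range(len(smiles) - n + 1))

def tfidf_cosine_matrix(smiles_list, n=3):
    """Pairwise TF-IDF-weighted cosine similarities between compounds,
    treating each SMILES string as a 'document' of character n-grams."""
    docs = [char_ngrams(s, n) for s in smiles_list]
    N = len(docs)
    df = Counter()                      # document frequency of each n-gram
    for d in docs:
        df.update(d.keys())
    idf = {g: log(N / df[g]) for g in df}
    vecs = [{g: tf * idf[g] for g, tf in d.items()} for d in docs]

    def cos(u, v):
        dot = sum(w * v[g] for g, w in u.items() if g in v)
        nu = sqrt(sum(w * w for w in u.values()))
        nv = sqrt(sum(w * w for w in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0

    return [[cos(u, v) for v in vecs] for u in vecs]

# Toy compound set: ethanol, ethylamine, benzene (SMILES notation).
sims = tfidf_cosine_matrix(["CCO", "CCN", "c1ccccc1"])
```

The same vectors could be fed to any kernel-based predictor; the point of the abstract is that this string-level similarity is far cheaper to compute than 2D graph-matching kernels such as SIMCOMP.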


Journal ArticleDOI
TL;DR: In this article, a physically motivated Lyapunov function is employed to design a boundary control law that ensures vibration suppression and guarantees the stability of the closed-loop system with input backlash.
Abstract: In this study, the authors are concerned with the active vibration control of a flexible string system with input backlash. For vibration suppression, active control is applied at the right boundary of the flexible string. To deal with the input backlash, a novel ‘disturbance-like’ term is proposed in the control design. A physically motivated Lyapunov function is employed to design boundary control law to ensure the vibration suppression and guarantee the stability of the closed-loop system. Numerical simulations illustrate the effectiveness of the proposed control method.

88 citations


Journal ArticleDOI
TL;DR: This paper describes a general hybrid metaheuristic for combinatorial optimization labelled Construct, Merge, Solve & Adapt, a specific instantiation of a framework known from the literature as Generate-And-Solve.

82 citations


Journal ArticleDOI
TL;DR: In this paper, a simple linear-time algorithm is presented for constructing a context-free grammar of size $4g \log_{3/2}(N/g)$ for the input string, where $N$ is the size of the input text and $g$ is the length of the optimal grammar generating this text.

74 citations


Journal ArticleDOI
TL;DR: A novel method for clustering words in micro-blogs, based on the similarity of the related temporal series, using the Symbolic Aggregate ApproXimation algorithm to discretize the temporal series of terms into a small set of levels, leading to a string for each.
Abstract: In this paper we present a novel method for clustering words in micro-blogs, based on the similarity of the related temporal series. Our technique, named SAX*, uses the Symbolic Aggregate ApproXimation algorithm to discretize the temporal series of terms into a small set of levels, leading to a string for each term. We then define a subset of "interesting" strings, i.e. those representing patterns of collective attention. Sliding temporal windows are used to detect co-occurring clusters of tokens with the same or similar string. To assess the performance of the method we first tune the model parameters on a 2-month 1 % Twitter stream, during which a number of world-wide events of differing type and duration (sports, politics, disasters, health, and celebrities) occurred. Then, we evaluate the quality of all discovered events in a 1-year stream, "googling" with the most frequent cluster n-grams and manually assessing how many clusters correspond to published news in the same temporal slot. Finally, we perform a complexity evaluation and we compare SAX* with three alternative methods for event discovery. Our evaluation shows that SAX* is at least one order of magnitude less complex than other temporal and non-temporal approaches to micro-blog clustering.
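The SAX discretization step that turns each term's temporal series into a string can be sketched as follows. This is a simplified illustration of the standard algorithm (z-normalization, piecewise aggregate approximation, then symbol assignment via Gaussian breakpoints); the 4-symbol alphabet and the toy series are illustrative choices, not the paper's settings.

```python
import statistics

# Breakpoints splitting a standard Gaussian into 4 equiprobable regions
# (the usual SAX convention for a 4-symbol alphabet).
BREAKPOINTS = [-0.6745, 0.0, 0.6745]
ALPHABET = "abcd"

def sax_word(series, word_len):
    """Discretize a numeric series into a short SAX string:
    z-normalize, average over word_len segments (PAA), then map each
    segment mean to a symbol via the Gaussian breakpoints."""
    mu = statistics.fmean(series)
    sigma = statistics.pstdev(series) or 1.0   # guard against flat series
    z = [(x - mu) / sigma for x in series]
    seg = len(z) / word_len
    paa = [statistics.fmean(z[int(i * seg):int((i + 1) * seg)])
           for i in range(word_len)]

    def symbol(v):
        for bp, ch in zip(BREAKPOINTS, ALPHABET):
            if v < bp:
                return ch
        return ALPHABET[-1]

    return "".join(symbol(v) for v in paa)

# A term that is quiet, then suddenly bursts: low then high activity.
word = sax_word([0, 0, 0, 0, 10, 10, 10, 10], word_len=2)
```

Terms whose series map to the same (or similar) short strings within a sliding window are then candidates for the same cluster of collective attention.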

Proceedings ArticleDOI
01 Nov 2016
TL;DR: An efficient technique for segmented oracles is presented that computes information leakage for multiple runs using only the path constraints generated from a single-run symbolic execution.
Abstract: We present an automated approach for detecting and quantifying side channels in Java programs, which uses symbolic execution, string analysis and model counting to compute information leakage for a single run of a program. We further extend this approach to compute information leakage for multiple runs for a type of side channels called segmented oracles, where the attacker is able to explore each segment of a secret (for example each character of a password) independently. We present an efficient technique for segmented oracles that computes information leakage for multiple runs using only the path constraints generated from a single run symbolic execution. Our implementation uses the symbolic execution tool Symbolic PathFinder (SPF), SMT solver Z3, and two model counting constraint solvers LattE and ABC. Although LattE has been used before for analyzing numeric constraints, in this paper, we present an approach for using LattE for analyzing string constraints. We also extend the string constraint solver ABC for analysis of both numeric and string constraints, and we integrate ABC in SPF, enabling quantitative symbolic string analysis.

Proceedings ArticleDOI
11 Jan 2016
TL;DR: The main contribution is to show that the "straight-line fragment" of the logic is decidable; this fragment can express the program logics of straight-line string-manipulating programs with concatenations and transductions as atomic operations, which arise when performing bounded model checking or dynamic symbolic executions.
Abstract: We study the fundamental issue of decidability of satisfiability over string logics with concatenations and finite-state transducers as atomic operations. Although restricting to one type of operations yields decidability, little is known about the decidability of their combined theory, which is especially relevant when analysing security vulnerabilities of dynamic web pages in a more realistic browser model. On the one hand, word equations (string logic with concatenations) cannot precisely capture sanitisation functions (e.g. htmlescape) and implicit browser transductions (e.g. innerHTML mutations). On the other hand, transducers suffer from the reverse problem of being able to model sanitisation functions and browser transductions, but not string concatenations. Naively combining word equations and transducers easily leads to an undecidable logic. Our main contribution is to show that the "straight-line fragment" of the logic is decidable (complexity ranges from PSPACE to EXPSPACE). The fragment can express the program logics of straight-line string-manipulating programs with concatenations and transductions as atomic operations, which arise when performing bounded model checking or dynamic symbolic executions. We demonstrate that the logic can naturally express constraints required for analysing mutation XSS in web applications. Finally, the logic remains decidable in the presence of length, letter-counting, regular, indexOf, and disequality constraints.

Book ChapterDOI
17 Jul 2016
TL;DR: A progressive search algorithm to not only mitigate the problem of non-terminating reasoning but also guide the search towards a “minimal solution” when the input formula is in fact satisfiable.
Abstract: We consider the problem of reasoning over an expressive constraint language for unbounded strings. The difficulty comes from “recursively defined” functions such as replace, making state-of-the-art algorithms non-terminating. Our first contribution is a progressive search algorithm to not only mitigate the problem of non-terminating reasoning but also guide the search towards a “minimal solution” when the input formula is in fact satisfiable. We have implemented our method using the state-of-the-art Z3 framework. Importantly, we have enabled conflict clause learning for string theory so that our solver can be used effectively in the setting of program verification. Finally, our experimental evaluation shows leadership in a large benchmark suite, and a first deployment for another benchmark suite which requires reasoning about string formulas of a class that has not been solved before.

Journal ArticleDOI
TL;DR: A new shift rule leads to a de Bruijn sequence construction that can be generated in $O(1)$-amortized time per bit.

Journal ArticleDOI
TL;DR: A generalization of the diminishing-return property is introduced by defining the elemental forward curvature, along with the notion of a string-matroid; two applications of string submodular functions with curvature constraints are investigated: choosing a string of actions to maximize the expected fraction of accomplished tasks, and designing a string of measurement matrices such that the information gain is maximized.
Abstract: Consider the problem of choosing a string of actions to optimize an objective function that is string submodular. It was shown in previous papers that the greedy strategy, consisting of a string of actions that only locally maximizes the step-wise gain in the objective function, achieves at least a $(1-e^{-1})$ -approximation to the optimal strategy. This paper improves this approximation by introducing additional constraints on curvature, namely, total backward curvature, total forward curvature, and elemental forward curvature. We show that if the objective function has total backward curvature $\sigma$ , then the greedy strategy achieves at least a $(1/\sigma)(1-e^{-\sigma})$ -approximation of the optimal strategy. If the objective function has total forward curvature $\epsilon$ , then the greedy strategy achieves at least a $(1-\epsilon)$ -approximation of the optimal strategy. Moreover, we consider a generalization of the diminishing-return property by defining the elemental forward curvature. We also introduce the notion of string-matroid and consider the problem of maximizing the objective function subject to a string-matroid constraint. We investigate two applications of string submodular functions with curvature constraints: 1) choosing a string of actions to maximize the expected fraction of accomplished tasks; and 2) designing a string of measurement matrices such that the information gain is maximized.
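The greedy strategy analyzed above, which repeatedly appends the action with the largest step-wise gain in the objective, can be sketched as follows. The task-coverage objective is a toy stand-in for the string submodular functions studied in the paper, chosen because repeating an action yields zero marginal gain.

```python
def greedy_string(actions, f, k):
    """Greedy strategy: repeatedly append the action with the largest
    marginal gain f(s + [a]) - f(s), up to string length k."""
    s = []
    for _ in range(k):
        s.append(max(actions, key=lambda a: f(s + [a]) - f(s)))
    return s

# Toy objective (illustrative): the value of a string of actions is the
# number of distinct tasks covered by the actions taken so far.
cover = {"a": {1, 2}, "b": {2, 3}, "c": {4}}
f = lambda s: len(set().union(*(cover[x] for x in s))) if s else 0

plan = greedy_string(["a", "b", "c"], f, 2)
```

For such monotone string submodular objectives, the abstract's guarantee says the value of `plan` is within a curvature-dependent factor, at worst $(1-e^{-1})$, of the best length-2 string of actions.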

Patent
Jang Min Sik1
04 Aug 2016
TL;DR: In this paper, a semiconductor device consisting of source select lines, word lines, drain select lines and bit lines, which are stacked on a substrate where a first string area and a second string area are defined, is described.
Abstract: The present invention relates to a semiconductor device and a manufacturing method thereof, wherein the semiconductor device comprises: source select lines, word lines, drain select lines, and bit lines, which are stacked on a substrate where a first string area and a second string area are defined; channel films and memory films which vertically penetrate the source select lines, the word lines, and the drain select lines in the first string area and the second string area; and a common source line which vertically penetrates the source select lines, the word lines, and the drain select lines at the centers of the first string area and the second string area, and which extends to a lower part of the source select lines Thereby, the capacity of a memory device can be enhanced and electric properties can be improved

Journal ArticleDOI
01 Jun 2016
TL;DR: A set of algebraic techniques for solving constraints over a rich theory of unbounded strings natively, without reduction to other problems, is presented and implemented in the SMT solver cvc4, making it the first solver able to accept a rich set of mixed constraints over strings, integers, reals, arrays and algebraic datatypes.
Abstract: An increasing number of applications in verification and security rely on or could benefit from automatic solvers that can check the satisfiability of constraints over a diverse set of data types that includes character strings. Until recently, satisfiability solvers for strings were standalone tools that could reason only about fairly restricted fragments of the theory of strings and regular expressions (e.g., strings of bounded lengths). These solvers were based on reductions to satisfiability problems over other data types such as bit vectors or to automata decision problems. We present a set of algebraic techniques for solving constraints over a rich theory of unbounded strings natively, without reduction to other problems. These techniques can be used to integrate string reasoning into general, multi-theory SMT solvers based on the common DPLL(T) architecture. We have implemented them in our SMT solver cvc4, expanding its already large set of built-in theories to include a theory of strings with concatenation, length, and membership in regular languages. This implementation makes cvc4 the first solver able to accept a rich set of mixed constraints over strings, integers, reals, arrays and algebraic datatypes. Our initial experimental results show that, in addition, on pure string problems cvc4 is highly competitive with specialized string solvers accepting a comparable input language.

Journal ArticleDOI
TL;DR: A full view of the string kernels approach is given and insights into two kinds of language transfer effects, namely, word choice (lexical transfer) and morphological differences are offered.
Abstract: The most common approach in text mining classification tasks is to rely on features like words, part-of-speech tags, stems, or some other high-level linguistic features. Recently, an approach that uses only character p-grams as features has been proposed for the task of native language identification (NLI). The approach obtained state-of-the-art results by combining several string kernels using multiple kernel learning. Despite the fact that the approach based on string kernels performs so well, several questions about this method remain unanswered. First, it is not clear why such a simple approach can compete with far more complex approaches that take words, lemmas, syntactic information, or even semantics into account. Second, although the approach is designed to be language independent, all experiments to date have been on English. This work is an extensive study that aims to systematically present the string kernel approach and to clarify the open questions mentioned above. A broad set of native language identification experiments were conducted to compare the string kernels approach with other state-of-the-art methods. The empirical results obtained in all of the experiments conducted in this work indicate that the proposed approach achieves state-of-the-art performance in NLI, reaching an accuracy that is 1.7% above the top scoring system of the 2013 NLI Shared Task. Furthermore, the results obtained on both the Arabic and the Norwegian corpora demonstrate that the proposed approach is language independent. In the Arabic native language identification task, string kernels show an increase of more than 17% over the best accuracy reported so far. The results of string kernels on Norwegian native language identification are also significantly better than the state-of-the-art approach. In addition, in a cross-corpus experiment, the proposed approach shows that it can also be topic independent, improving the state-of-the-art system by 32.3%.
To gain additional insights about the string kernels approach, the features selected by the classifier as being more discriminating are analyzed in this work. The analysis also offers information about localized language transfer effects, since the features used by the proposed model are p-grams of various lengths. The features captured by the model typically include stems, function words, and word prefixes and suffixes, which have the potential to generalize over purely word-based features. By analyzing the discriminating features, this article offers insights into two kinds of language transfer effects, namely, word choice (lexical transfer) and morphological differences. The goal of the current study is to give a full view of the string kernels approach and shed some light on why this approach works so well.
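The core similarity measure behind such character p-gram string kernels can be sketched as a p-spectrum kernel: the inner product of p-gram count vectors. The specific p values and the raw-count weighting below are illustrative assumptions; the paper combines several such kernels (and normalized variants) via multiple kernel learning.

```python
from collections import Counter

def p_grams(text, p):
    """Counts of the character p-grams occurring in a text."""
    return Counter(text[i:i + p] for i in range(len(text) - p + 1))

def spectrum_kernel(a, b, p_values=(2, 3, 4)):
    """Sum over p of the inner product of character p-gram count
    vectors of the two texts (an unnormalized spectrum kernel)."""
    total = 0
    for p in p_values:
        ca, cb = p_grams(a, p), p_grams(b, p)
        total += sum(ca[g] * cb[g] for g in ca if g in cb)
    return total

k = spectrum_kernel("this is an example", "this was an example")
```

Because the features are raw character sequences, the kernel needs no tokenizer, tagger, or parser, which is what makes the approach language independent.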

Journal ArticleDOI
TL;DR: This paper revisit classical solutions for string dictionaries like hashing, tries, and front-coding, and improve them by using compression techniques, and introduces some novel string dictionary representations built on top of recent advances in succinct data structures and full-text indexes.

Journal ArticleDOI
01 Sep 2016
TL;DR: This work provides a method to decide on the lowest possible order of the Padé approximation that is sufficiently accurate in view of CACC (string) stability analysis, and compares the minimum string-stable time gaps for a CACC system with both exact and approximated delays.
Abstract: Cooperative adaptive cruise control (CACC) improves road throughput by employing intervehicle wireless communications. The inherent communication time delay and vehicle actuator delay significantly limit the minimum intervehicle distance in view of string stability requirements. Hence, controller design needs to consider both delays, which result in a nonrational transfer function representation of the CACC-controlled string. Padé approximations can be applied to arrive at a finite-dimensional model, which allows for many standard control methods. Our objective is to provide a method to decide on the lowest possible order of the Padé approximation that is sufficiently accurate in view of CACC (string) stability analysis. The constant time gap strategy and a one-vehicle look-ahead topology are adopted to develop a CACC stable string. First, based on the stable controller parameter region, a suitable order of the Padé approximation of the vehicle actuator delay can be selected in view of individual vehicle stability. Then, the minimum string-stable time gaps for a CACC system with both exact and approximated delays have been compared. The procedure with a proportional-derivative controller to choose the approximation order of delays has been given, followed by time-domain simulation validation.
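The building block whose order the paper's method selects, the Padé approximation of a pure delay, can be illustrated at first order. The delay and frequency values below are arbitrary choices for the example, not the paper's parameters.

```python
import cmath

def pade1_delay(s, theta):
    """First-order Pade approximation of the pure delay e^{-s*theta}:
    (1 - s*theta/2) / (1 + s*theta/2). Like the exact delay, it is
    all-pass (unit magnitude); its phase matches at low frequencies."""
    return (1 - s * theta / 2) / (1 + s * theta / 2)

# Compare against the exact delay at omega = 0.5 rad/s for a 0.1 s
# delay; the approximation is accurate while omega * theta << 1.
theta = 0.1
s = 1j * 0.5
exact = cmath.exp(-s * theta)
approx = pade1_delay(s, theta)
err = abs(exact - approx)
```

The design question the paper answers is how large the order must be so that stability conclusions drawn from the rational approximation carry over to the true, infinite-dimensional delay system.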

Patent
06 Jul 2016
TL;DR: In this article, a desktop integration framework is proposed to optimize retrieval of custom string resources from resource bundles hosted by server computer systems by using a document as a user interface to a web-server application hosted by a server computer system.
Abstract: In various embodiments, methods, systems, and non-transitory computer-readable media are disclosed that allow a desktop integration framework to optimize retrieval of custom string resources from resource bundles hosted by server computer systems. A client computer that uses a document as a user interface to a web-server application hosted by a server-computer system can determine which custom string resources are to be utilized in the document. The client computer system can request only the custom string resources that are determined to be utilized in the document from the server-computer system in a single request thereby optimizing retrieval without requesting entire resource bundles.

Patent
16 May 2016
TL;DR: In this paper, a method for determining if a user of a computer system is a human was proposed, where a processor receives an indication that a computer security program is needed and acquires at least one image depicting a first string of characters including at least a first and second set of one or more characters.
Abstract: A method for determining if a user of a computer system is a human. A processor receives an indication that a computer security program is needed and acquires at least one image depicting a first string of characters including at least a first and second set of one or more characters. A processor assigns a substitute character to be used as input for each of the second set of one or more characters. A processor presents the at least one image and an indication of the substitute character and when to use the substitute character to the user. A processor receives a second string of characters from the user. A processor determines whether the second string of characters substantially matches the first string of characters based on the substitute character assigned to each of the second set of one or more characters and determines whether the user is a human.

Proceedings ArticleDOI
19 Oct 2016
TL;DR: This work presents a system, FIDEX, that can efficiently learn desired data filtering expressions from a small set of positive and negative string examples, and designs an expressive DSL to represent disjunctive filter expressions needed for several real-world data filtering tasks.
Abstract: Data filtering in spreadsheets is a common problem faced by millions of end-users. The task of data filtering requires a computational model that can separate intended positive and negative string instances. We present a system, FIDEX, that can efficiently learn desired data filtering expressions from a small set of positive and negative string examples. There are two key ideas of our approach. First, we design an expressive DSL to represent disjunctive filter expressions needed for several real-world data filtering tasks. Second, we develop an efficient synthesis algorithm for incrementally learning consistent filter expressions in the DSL from very few positive and negative examples. A DAG-based data structure is used to succinctly represent a large number of filter expressions, and two corresponding operators are defined for algorithmically handling positive and negative examples, namely, the intersection and subtraction operators. FIDEX is able to learn data filters for 452 out of 460 real-world data filtering tasks in real time (0.22s), using only 2.2 positive string instances and 2.7 negative string instances on average.

Posted Content
TL;DR: The compressed suffix array and the compressed suffix tree of a string $T$ can be built in $O(n)$ deterministic time using $O(n\log\sigma)$ bits of space, where $n$ is the string length and $\sigma$ is the alphabet size.
Abstract: We show that the compressed suffix array and the compressed suffix tree of a string $T$ can be built in $O(n)$ deterministic time using $O(n\log\sigma)$ bits of space, where $n$ is the string length and $\sigma$ is the alphabet size. Previously described deterministic algorithms either run in time that depends on the alphabet size or need $\omega(n\log \sigma)$ bits of working space. Our result has immediate applications to other problems, such as yielding the first linear-time LZ77 and LZ78 parsing algorithms that use $O(n \log\sigma)$ bits.
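To make the object being constructed concrete, here is a deliberately naive Python sketch of a plain (uncompressed) suffix array; sorting explicit suffixes costs O(n^2 log n) time in the worst case, so it is unrelated to the paper's linear-time, O(n log sigma)-bit construction beyond producing the same array.

```python
def suffix_array(text):
    """Suffix array: the starting positions of all suffixes of text,
    listed in lexicographic order of the suffixes. Naive construction
    by sorting the suffixes themselves."""
    return sorted(range(len(text)), key=lambda i: text[i:])

sa = suffix_array("banana")
```

Compressed suffix arrays store the same information in space close to the text's entropy, which is why reducing the working space of their construction matters.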

Journal ArticleDOI
TL;DR: An effective variable neighborhood search (VNS) is proposed to solve the type-II two-sided assembly line balancing problem (TALBP-II), which is to minimize cycle time for a given number of stations, and the computational results show the promising advantage of VNS on the considered TALBP-II.

Proceedings Article
01 Dec 2016
TL;DR: This paper presents a method that uses only character p-grams as features for the Arabic Dialect Identification (ADI) Closed Shared Task of the DSL 2016 Challenge and has an important advantage in that it is language independent and linguistic theory neutral, as it does not require any NLP tools.
Abstract: The most common approach in text mining classification tasks is to rely on features like words, part-of-speech tags, stems, or some other high-level linguistic features. Unlike the common approach, we present a method that uses only character p-grams (also known as n-grams) as features for the Arabic Dialect Identification (ADI) Closed Shared Task of the DSL 2016 Challenge. The proposed approach combines several string kernels using multiple kernel learning. In the learning stage, we try both Kernel Discriminant Analysis (KDA) and Kernel Ridge Regression (KRR), and we choose KDA as it gives better results in a 10-fold cross-validation carried out on the training set. Our approach is shallow and simple, but the empirical results obtained in the ADI Shared Task prove that it achieves very good results. Indeed, we ranked in second place with an accuracy of 50.91% and a weighted F1 score of 51.31%. We also present improved results in this paper, which we obtained after the competition ended. Simply by adding more regularization into our model to make it more suitable for test data that comes from a different distribution than training data, we obtain an accuracy of 51.82% and a weighted F1 score of 52.18%. Furthermore, the proposed approach has an important advantage in that it is language independent and linguistic theory neutral, as it does not require any NLP tools.

Book ChapterDOI
17 Jul 2016
TL;DR: This paper proposes a new string analysis method based on a scalable logic circuit representation for (nondeterministic) finite automata that supports various string and automata manipulation operations, enabling both counterexample generation and filter synthesis in string constraint solving.
Abstract: Many severe security vulnerabilities in web applications can be attributed to string manipulation mistakes, which can often be avoided through formal string analysis. String analysis tools are indispensable and under active development. Prior string analysis methods are primarily automata-based or satisfiability-based. The two approaches exhibit distinct strengths and weaknesses. Specifically, existing automata-based methods have difficulty in generating counterexamples at system inputs to witness vulnerability, whereas satisfiability-based methods are inadequate to produce filters amenable for firmware or hardware implementation for real-time screening of malicious inputs to a system under protection. In this paper, we propose a new string analysis method based on a scalable logic circuit representation for (nondeterministic) finite automata to support various string and automata manipulation operations. It enables both counterexample generation and filter synthesis in string constraint solving. By using the new data structure, automata with large state spaces and/or alphabet sizes can be efficiently represented. Empirical studies on a large set of open source web applications and well-known attack patterns demonstrate the unique benefits of our method compared to prior string analysis tools.
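The automata manipulation underlying such analyses can be illustrated with plain subset simulation of an NFA, tracking the set of reachable states per input symbol. This is only the textbook construction; the paper's contribution is a scalable logic-circuit encoding of such automata, which is not reproduced here.

```python
def nfa_accepts(transitions, start, accepting, word):
    """Subset simulation: track the set of NFA states reachable on the input.
    transitions maps (state, symbol) -> iterable of successor states."""
    current = {start}
    for ch in word:
        current = {q2 for q in current for q2 in transitions.get((q, ch), ())}
        if not current:          # dead: no run can be extended
            return False
    return bool(current & accepting)

# Example NFA over {a, b} accepting exactly the strings that end in "ab"
delta = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
}
print(nfa_accepts(delta, 0, {2}, "bbaab"))  # True
print(nfa_accepts(delta, 0, {2}, "aba"))    # False
```

Explicit state sets like this blow up as the automaton and alphabet grow, which is precisely the scalability problem a circuit-level representation is meant to address.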

Posted Content
TL;DR: In this paper, a signature-based sequential kernel framework for learning with sequential data, such as time series, sequences of graphs, or strings, is presented; string kernels arise as a special case, and a modification resolves the open non-definiteness issue of the closely related alignment kernels.
Abstract: We present a novel framework for kernel learning with sequential data of any kind, such as time series, sequences of graphs, or strings. Our approach is based on signature features which can be seen as an ordered variant of sample (cross-)moments; it allows to obtain a "sequentialized" version of any static kernel. The sequential kernels are efficiently computable for discrete sequences and are shown to approximate a continuous moment form in a sampling sense. A number of known kernels for sequences arise as "sequentializations" of suitable static kernels: string kernels may be obtained as a special case, and alignment kernels are closely related up to a modification that resolves their open non-definiteness issue. Our experiments indicate that our signature-based sequential kernel framework may be a promising approach to learning with sequential data, such as time series, that allows to avoid extensive manual pre-processing.
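The "ordered variant of sample (cross-)moments" can be illustrated with a discrete iterated-sums sketch of the signature: for each index word, sum products of path increments over strictly increasing time indices. This is only an order-sensitivity illustration under that simplified discrete definition; the paper's kernels and sampling approximation are not reproduced.

```python
import itertools

def truncated_signature(path, depth=2):
    """Discrete iterated-sums signature of a d-dimensional path, truncated
    at level `depth`: for each index word (a1, ..., ak), sum the products
    of increments over strictly increasing time indices i1 < ... < ik."""
    incs = [[b - a for a, b in zip(p, q)] for p, q in zip(path, path[1:])]
    d = len(path[0])
    sig = {}
    for k in range(1, depth + 1):
        for word in itertools.product(range(d), repeat=k):
            total = 0.0
            for idx in itertools.combinations(range(len(incs)), k):
                prod = 1.0
                for pos, axis in zip(idx, word):
                    prod *= incs[pos][axis]
                total += prod
            sig[word] = total
    return sig

# Right-then-up path: level-1 terms are plain moments of the increments,
# but the level-2 terms (0,1) vs (1,0) distinguish the order of the moves.
sig = truncated_signature([(0, 0), (1, 0), (1, 1)])
print(sig[(0, 1)], sig[(1, 0)])  # 1.0 0.0
```

Reversing the path would swap the two level-2 values, which is exactly the ordering information that unordered moment features discard.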

Posted Content
TL;DR: This paper proposes a novel technique, Neuro-Symbolic Program Synthesis, that can automatically construct computer programs in a domain-specific language that are consistent with a set of input-output examples provided at test time and demonstrates the effectiveness of the approach by applying it to the rich and complex domain of regular expression based string transformations.
Abstract: Recent years have seen the proposal of a number of neural architectures for the problem of Program Induction. Given a set of input-output examples, these architectures are able to learn mappings that generalize to new test inputs. While achieving impressive results, these approaches have a number of important limitations: (a) they are computationally expensive and hard to train, (b) a model has to be trained for each task (program) separately, and (c) it is hard to interpret or verify the correctness of the learnt mapping (as it is defined by a neural network). In this paper, we propose a novel technique, Neuro-Symbolic Program Synthesis, to overcome the above-mentioned problems. Once trained, our approach can automatically construct computer programs in a domain-specific language that are consistent with a set of input-output examples provided at test time. Our method is based on two novel neural modules. The first module, called the cross correlation I/O network, given a set of input-output examples, produces a continuous representation of the set of I/O examples. The second module, the Recursive-Reverse-Recursive Neural Network (R3NN), given the continuous representation of the examples, synthesizes a program by incrementally expanding partial programs. We demonstrate the effectiveness of our approach by applying it to the rich and complex domain of regular expression based string transformations. Experiments show that the R3NN model is not only able to construct programs from new input-output examples, but it is also able to construct new programs for tasks that it had never observed before during training.
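The synthesis-from-examples setting can be illustrated with brute-force enumeration over a toy DSL of unary string transformations: return the first composition consistent with every input-output pair. The DSL, primitives, and search below are illustrative assumptions, not the paper's regular-expression-based DSL or its R3NN model, which guides this expansion with a neural network instead of exhaustive search.

```python
import itertools

# A toy DSL of unary string transformations (illustrative only)
PRIMITIVES = {
    "lower":   str.lower,
    "upper":   str.upper,
    "reverse": lambda s: s[::-1],
    "first3":  lambda s: s[:3],
    "drop1":   lambda s: s[1:],
}

def run_program(names, s):
    """Apply the named primitives left to right."""
    for n in names:
        s = PRIMITIVES[n](s)
    return s

def synthesize(examples, max_depth=3):
    """Return the shortest composition consistent with all I/O examples."""
    for depth in range(1, max_depth + 1):
        for names in itertools.product(PRIMITIVES, repeat=depth):
            if all(run_program(names, x) == y for x, y in examples):
                return names
    return None

examples = [("Hello", "OLL"), ("World", "DLR")]
prog = synthesize(examples)
print(prog)  # ('upper', 'reverse', 'first3')
```

The search space grows exponentially in program depth, which is why guiding the incremental expansion of partial programs with a learned model, as R3NN does, matters in practice.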