Showing papers on "String (computer science)" published in 2012

Open Access

Journal Article•DOI•

STRING v9.1: protein-protein interaction networks, with increased coverage and integration

Andrea Franceschini¹, Damian Szklarczyk¹, Sune Frankild¹, Michael Kuhn¹, Milan Simonovic¹, Alexander Roth¹, Jianyi Lin¹, Pablo Minguez¹, Peer Bork¹, Christian von Mering¹, Lars Juhl Jensen¹•Institutions (1)

Swiss Institute of Bioinformatics¹

29 Nov 2012-Nucleic Acids Research

TL;DR: The update to version 9.1 of STRING is described, introducing several improvements, including extending the automated mining of scientific texts for interaction information, to now also include full-text articles, and providing users with statistical information on any functional enrichment observed in their networks.

Abstract: Complete knowledge of all direct and indirect interactions between proteins in a given cell would represent an important milestone towards a comprehensive description of cellular mechanisms and functions. Although this goal is still elusive, considerable progress has been made, particularly for certain model organisms and functional systems. Currently, protein interactions and associations are annotated at various levels of detail in online resources, ranging from raw data repositories to highly formalized pathway databases. For many applications, a global view of all the available interaction data is desirable, including lower-quality data and/or computational predictions. The STRING database (http://string-db.org/) aims to provide such a global perspective for as many organisms as feasible. Known and predicted associations are scored and integrated, resulting in comprehensive protein networks covering >1100 organisms. Here, we describe the update to version 9.1 of STRING, introducing several improvements: (i) we extend the automated mining of scientific texts for interaction information, to now also include full-text articles; (ii) we entirely re-designed the algorithm for transferring interactions from one model organism to the other; and (iii) we provide users with statistical information on any functional enrichment observed in their networks.
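
The abstract says known and predicted associations are scored and integrated, but it does not give the formula. As a minimal sketch, assuming the evidence channels (text mining, experiments, transferred interactions and so on) are independent, per-channel confidence scores can be combined as a noisy-OR, with an optional shared prior stripped from each channel and added back once. The function name and the `prior` parameter are illustrative assumptions, not STRING's published scoring code.

```python
from math import prod


def combine_channel_scores(scores, prior=0.0):
    """Combine per-channel confidence scores in [0, 1] into one score.

    Assumes channels are independent (noisy-OR). A shared prior, if given,
    is stripped from each channel first so that it is counted only once.
    """
    if not 0.0 <= prior < 1.0:
        raise ValueError("prior must be in [0, 1)")
    # Remove the prior from each channel, clamping at zero.
    adjusted = [max(0.0, (s - prior) / (1.0 - prior)) for s in scores]
    # Probability that at least one channel's evidence holds, under independence.
    combined = 1.0 - prod(1.0 - s for s in adjusted)
    # Add the prior back exactly once.
    return combined + prior * (1.0 - combined)


# Two channels that each give 0.5 combine to 0.75 when no prior is used.
print(combine_channel_scores([0.5, 0.5]))  # 0.75
```

With a single channel the prior handling is neutral (the input score comes back unchanged), which is one reason this prior-correction form is convenient.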

3,900 citations

Journal Article•DOI•

Distributed Receding Horizon Control of Vehicle Platoons: Stability and String Stability

William B. Dunbar¹, Derek S. Caveney²•Institutions (2)

University of California, Santa Cruz¹, Toyota²

01 Mar 2012-IEEE Transactions on Automatic Control

TL;DR: In this article, the authors consider the problem of distributed control of a platoon of vehicles with nonlinear dynamics and derive sufficient conditions that guarantee asymptotic stability and string stability.

Abstract: This paper considers the problem of distributed control of a platoon of vehicles with nonlinear dynamics. We present distributed receding horizon control algorithms and derive sufficient conditions that guarantee asymptotic stability, leader-follower string stability, and predecessor-follower string stability, following a step speed change in the platoon. Vehicles compute their own control in parallel, and receive communicated position and velocity error trajectories from their immediate predecessor. Leader-follower string stability requires additional communication from the lead car at each update, in the form of a position error trajectory. Predecessor-follower string stability, as we define it, implies leader-follower string stability. Predecessor-follower string stability requires stricter constraints in the local optimal control problems than the leader-follower formulation, but communication from the lead car is required only once at initialization. Provided an initially feasible solution can be found, subsequent feasibility of the algorithms is guaranteed at every update. The theory is generalized for nonlinear decoupled dynamics, and is thus applicable to fleets of planes, robots, or boats, in addition to cars. A simple seven-car simulation examines parametric tradeoffs that affect stability and string stability. Analysis of platoon formation, heterogeneity and size (length) is also considered, resulting in intuitive tradeoffs between lead car and following car control flexibility.
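
The abstract distinguishes leader-follower from predecessor-follower string stability and notes that the latter implies the former. A simplified sketch using peak (sup-norm) errors, assuming that predecessor-follower stability means no follower's peak error exceeds its predecessor's and leader-follower stability means no follower's peak error exceeds the first follower's. These are illustrative definitions, not the paper's exact conditions, and the function names are hypothetical.

```python
def peak(trajectory):
    """Largest absolute error along one vehicle's sampled error trajectory."""
    return max(abs(e) for e in trajectory)


def predecessor_follower_stable(errors):
    """errors[i] is the sampled error trajectory of follower i (0 = first follower).

    True if no vehicle's peak error exceeds that of the vehicle directly ahead.
    """
    peaks = [peak(e) for e in errors]
    return all(peaks[i] <= peaks[i - 1] for i in range(1, len(peaks)))


def leader_follower_stable(errors):
    """True if no vehicle's peak error exceeds that of the first follower."""
    peaks = [peak(e) for e in errors]
    return all(p <= peaks[0] for p in peaks[1:])


attenuating = [[0.0, 1.0, -0.5], [0.0, 0.8], [0.3, -0.6]]
bumpy = [[1.0], [0.5], [0.9]]
print(predecessor_follower_stable(attenuating), leader_follower_stable(attenuating))  # True True
print(predecessor_follower_stable(bumpy), leader_follower_stable(bumpy))  # False True
```

Because the predecessor-follower inequalities chain together, any set of trajectories that passes the first check also passes the second; the `bumpy` example shows that the converse does not hold.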

357 citations

Patent•

Three-dimensional semiconductor memory device

Sunil Shim, Jang-Gn Yun, Jeong-Hyuk Choi, Kwang Soo Seol, Jae-Hoon Jang, Jung-Dal Choi

14 Aug 2012

TL;DR: In this article, a semiconductor memory device is provided including first and second cell strings formed on a substrate and jointly connected to a bit line, in which the second string selection unit of the first cell string has a channel dopant region.

Abstract: A semiconductor memory device is provided including first and second cell strings formed on a substrate, the first and second cell strings jointly connected to a bit line, wherein each of the first and second cell strings includes a ground selection unit, a memory cell, and first and second string selection units sequentially formed on the substrate to be connected to each other, wherein the ground selection unit is connected to a ground selection line, the memory cell is connected to a word line, the first string selection unit is connected to a first string selection line, and the second string selection unit is connected to a second string selection line, and wherein the second string selection unit of the first cell string has a channel dopant region.

330 citations

Proceedings Article•

Automatically Constructing a Normalisation Dictionary for Microblogs

Bo Han¹, Paul Cook¹, Timothy Baldwin¹•Institutions (1)

University of Melbourne¹

12 Jul 2012

TL;DR: This paper proposes a method for constructing a dictionary of lexical variants of known words that facilitates lexical normalisation via simple string substitution and shows that a dictionary-based approach achieves state-of-the-art performance for both F-score and word error rate on a standard dataset.

Abstract: Microblog normalisation methods often utilise complex models and struggle to differentiate between correctly-spelled unknown words and lexical variants of known words. In this paper, we propose a method for constructing a dictionary of lexical variants of known words that facilitates lexical normalisation via simple string substitution (e.g., tomorrow for tmrw). We use context information to generate possible variant and normalisation pairs and then rank these by string similarity. Highly-ranked pairs are selected to populate the dictionary. We show that a dictionary-based approach achieves state-of-the-art performance for both F-score and word error rate on a standard dataset. Compared with other methods, this approach offers a fast, lightweight and easy-to-use solution, and is thus suitable for high-volume microblog pre-processing.
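
A minimal sketch of the last two stages (ranking candidate pairs by string similarity, then normalising by plain substitution), assuming the (variant, normalisation) candidate pairs have already been generated; the paper derives them from context, which is not shown. Python's `difflib.SequenceMatcher.ratio()` stands in for the paper's similarity measure, and the `threshold` value is a hypothetical choice.

```python
from difflib import SequenceMatcher


def build_dictionary(candidate_pairs, threshold=0.5):
    """Keep, for each variant, the most string-similar normalisation above a threshold."""
    best = {}
    for variant, norm in candidate_pairs:
        score = SequenceMatcher(None, variant, norm).ratio()
        if score >= threshold and score > best.get(variant, (None, -1.0))[1]:
            best[variant] = (norm, score)
    return {variant: norm for variant, (norm, _) in best.items()}


def normalise(text, dictionary):
    """Normalise by plain token substitution, leaving unknown tokens unchanged."""
    return " ".join(dictionary.get(token, token) for token in text.split())


pairs = [("tmrw", "tomorrow"), ("tmrw", "today"), ("u", "you")]
lexicon = build_dictionary(pairs)
print(normalise("see u tmrw", lexicon))  # see you tomorrow
```

Once the dictionary exists, normalisation is a constant-time lookup per token, which is what makes the dictionary approach attractive for high-volume pre-processing.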

203 citations

Journal Article•DOI•

Evaluation of CACC string stability using SUMO, Simulink and OMNeT++

Chenxi Lei¹, Emiel Martijn van Eenennaam¹, Wouter Klein Wolterink¹, Jeroen Ploeg, Georgios Karagiannis¹, Geert Heijenk¹•Institutions (1)

University of Twente¹

23 Mar 2012-EURASIP Journal on Wireless Communications and Networking

TL;DR: The string stability of CACC is discussed and its performance under varying packet loss ratios, beacon sending frequencies, and time headway settings in simulation experiments is evaluated.

Abstract: Recent developments in wireless technology enable communication between vehicles. The concept of cooperative adaptive cruise control (CACC), which uses wireless communication between vehicles, aims at string-stable behavior in a platoon of vehicles. “String stability” means that any non-zero position, speed, and acceleration errors of an individual vehicle in a string do not amplify when they propagate upstream. In this article, we discuss the string stability of CACC and evaluate its performance under varying packet loss ratios, beacon sending frequencies, and time headway settings in simulation experiments. The simulation framework consists of a controller prototype, a traffic simulator, and a network simulator.

189 citations

Book Chapter•DOI•

Functional Encryption for Regular Languages

Brent Waters¹•Institutions (1)

University of Texas at Austin¹

19 Aug 2012

TL;DR: In this paper, the author proposes a functional encryption system that supports functionality for regular languages, in which a secret key is associated with a deterministic finite automaton (DFA) M and a ciphertext encrypts a message m and is associated with an arbitrary-length string w. A user is able to decrypt the ciphertext if and only if the DFA M associated with his private key accepts the string w.

Abstract: We provide a functional encryption system that supports functionality for regular languages. In our system a secret key is associated with a Deterministic Finite Automaton (DFA) M. A ciphertext \(\text{CT}\) encrypts a message m and is associated with an arbitrary-length string w. A user is able to decrypt the ciphertext \(\text{CT}\) if and only if the DFA M associated with his private key accepts the string w.
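
The cryptography is beyond a short example, but the access condition itself is easy to state in code. This sketch contains no encryption at all: it only models the predicate "the key's DFA M accepts the ciphertext's string w". The `DFA` class, `can_decrypt`, and the example automaton are hypothetical illustrations.

```python
from dataclasses import dataclass


@dataclass
class DFA:
    """A deterministic finite automaton over single-character symbols."""
    transitions: dict  # maps (state, symbol) -> next state
    start: int
    accepting: frozenset

    def accepts(self, w: str) -> bool:
        state = self.start
        for symbol in w:
            if (state, symbol) not in self.transitions:
                return False  # a missing transition acts as a rejecting sink
            state = self.transitions[(state, symbol)]
        return state in self.accepting


def can_decrypt(key_dfa: DFA, ciphertext_string: str) -> bool:
    """The access condition only: a key decrypts iff its DFA accepts the string w."""
    return key_dfa.accepts(ciphertext_string)


# Hypothetical key policy: strings over {a, b} with an even number of 'a's.
even_as = DFA(
    transitions={(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1},
    start=0,
    accepting=frozenset({0}),
)
print(can_decrypt(even_as, "abba"), can_decrypt(even_as, "ab"))  # True False
```

Because w may have arbitrary length, a single key policy covers unboundedly many ciphertext strings, which is what distinguishes this setting from fixed-size attribute predicates.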

166 citations

Journal Article•DOI•

Robust Adaptive Boundary Control of a Vibrating String Under Unknown Time-Varying Disturbance

Wei He¹, Shuzhi Sam Ge¹•Institutions (1)

National University of Singapore¹

01 Jan 2012-IEEE Transactions on Control Systems Technology

TL;DR: In this article, robust adaptive boundary control is developed for a class of flexible string-type systems under unknown time-varying disturbance, where the dynamics of the string system is represented by a nonhomogeneous hyperbolic partial differential equation (PDE) and two ordinary differential equations.

Abstract: In this paper, robust adaptive boundary control is developed for a class of flexible string-type systems under unknown time-varying disturbance. The dynamics of the string system are represented by a nonhomogeneous hyperbolic partial differential equation (PDE) and two ordinary differential equations. Boundary control is proposed at the right boundary of the string based on the original distributed parameter system model (PDE) to suppress the vibration excited by the external unknown disturbance. Adaptive control is designed to compensate for the parametric uncertainty of the system. With the proposed robust adaptive boundary control, all the signals in the closed-loop system are guaranteed to be uniformly ultimately bounded. The state of the string system is proven to converge to a small neighborhood of zero by appropriately choosing design parameters. Simulations are provided to illustrate the effectiveness of the proposed control.

151 citations

Journal Article•DOI•

Localizing Text in Scene Images by Boundary Clustering, Stroke Segmentation, and String Fragment Classification

Chucai Yi¹, Yingli Tian¹•Institutions (1)

City University of New York¹

01 Sep 2012-IEEE Transactions on Image Processing

TL;DR: The proposed framework of text localization is evaluated on scene images, born-digital images, broadcast video images, and images of handheld objects captured by blind persons and demonstrates that the framework outperforms state-of-the-art localization algorithms.

Abstract: In this paper, we propose a novel framework to extract text regions from scene images with complex backgrounds and multiple text appearances. This framework consists of three main steps: boundary clustering (BC), stroke segmentation, and string fragment classification. In BC, we propose a new bigram-color-uniformity-based method to model both text and attachment surface, and cluster edge pixels based on color pairs and spatial positions into boundary layers. Then, stroke segmentation is performed at each boundary layer by color assignment to extract character candidates. We propose two algorithms to combine the structural analysis of text stroke with color assignment and filter out background interferences. Further, we design a robust string fragment classification based on Gabor-based text features. The features are obtained from feature maps of gradient, stroke distribution, and stroke width. The proposed framework of text localization is evaluated on scene images, born-digital images, broadcast video images, and images of handheld objects captured by blind persons. Experimental results on respective datasets demonstrate that the framework outperforms state-of-the-art localization algorithms.

135 citations

Posted Content•

A Conditional Random Field for Discriminatively-trained Finite-state String Edit Distance

Andrew McCallum¹, Kedar Bellare¹, Fernando Pereira²•Institutions (2)

University of Massachusetts Amherst¹, University of Pennsylvania²

04 Jul 2012-arXiv: Learning

TL;DR: This article proposed a discriminative string-edit CRF, a conditional random field model for edit sequences between strings, which is trained on both positive and negative instances of string pairs.

Abstract: The need to measure sequence similarity arises in information extraction, object identity, data mining, biological sequence analysis, and other domains. This paper presents discriminative string-edit CRFs, a finite-state conditional random field model for edit sequences between strings. Conditional random fields have advantages over generative approaches to this problem, such as pair HMMs or the work of Ristad and Yianilos, because as conditionally-trained methods, they enable the use of complex, arbitrary actions and features of the input strings. As in generative models, the training data does not have to specify the edit sequences between the given string pairs. Unlike generative models, however, our model is trained on both positive and negative instances of string pairs. We present positive experimental results on several data sets.
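
For contrast with the learned model, the sketch below shows the classic weighted edit distance dynamic program, where each edit operation has a fixed cost. This is a different and much simpler technique: the string-edit CRF learns scores for edit operations from positive and negative pairs and does not require hand-set costs. The cost functions here (`unit`, `case_aware_sub`) are hypothetical.

```python
def weighted_edit_distance(s, t, sub_cost, ins_cost, del_cost):
    """Minimum total cost of edits turning s into t (dynamic programming)."""
    n, m = len(s), len(t)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + del_cost(s[i - 1])
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + ins_cost(t[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + del_cost(s[i - 1]),                 # delete s[i-1]
                d[i][j - 1] + ins_cost(t[j - 1]),                 # insert t[j-1]
                d[i - 1][j - 1] + sub_cost(s[i - 1], t[j - 1]),  # copy or substitute
            )
    return d[n][m]


def unit(_):
    return 1.0


def case_aware_sub(a, b):
    """Hypothetical hand-set costs: free copy, cheap case change, full substitution."""
    if a == b:
        return 0.0
    return 0.1 if a.lower() == b.lower() else 1.0


print(weighted_edit_distance("kitten", "sitting", case_aware_sub, unit, unit))  # 3.0
print(weighted_edit_distance("Smith", "smith", case_aware_sub, unit, unit))  # 0.1
```

In the discriminative setting, the hand-written cost functions would be replaced by feature-based scores learned from labelled string pairs.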

133 citations

Patent•

Hybrid adaptation of named entity recognition

Vassilina Nikoulina¹, Ágnes Sándor¹•Institutions (1)

Xerox¹

07 Dec 2012

TL;DR: In this article, a machine translation method includes receiving a source text string and identifying any named entities, which may be processed to exclude common nouns and function words; based on features extracted from the source text string, a protocol is selected for translating it.

Abstract: A machine translation method includes receiving a source text string and identifying any named entities. The identified named entities may be processed to exclude common nouns and function words. Features are extracted from the source text string relating to the identified named entities. Based on the extracted features, a protocol is selected for translating the source text string. A first translation protocol includes forming a reduced source string from the source text string in which the named entity is replaced by a placeholder, translating the reduced source string by machine translation to generate a translated reduced target string, while processing the named entity separately to be incorporated into the translated reduced target string. A second translation protocol includes translating the source text string by machine translation, without replacing the named entity with the placeholder. The target text string produced by the selected protocol is output.
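
The first translation protocol (mask the named entity with a placeholder, translate the reduced string, then insert the separately processed entity) can be sketched as follows. The translator and entity handler are hypothetical toy stand-ins for a real MT system and gazetteer, and neither the protocol-selection step nor the filtering of common nouns is shown.

```python
def translate_with_placeholders(source, entities, translate, translate_entity):
    """Placeholder protocol: mask named entities, translate the reduced string,
    then substitute the separately translated entities back in."""
    reduced = source
    slots = {}
    for k, entity in enumerate(entities):
        tag = f"__ENT{k}__"  # delimited so that __ENT1__ cannot match inside __ENT10__
        reduced = reduced.replace(entity, tag)
        slots[tag] = translate_entity(entity)
    target = translate(reduced)
    for tag, rendered in slots.items():
        target = target.replace(tag, rendered)
    return target


# Hypothetical toy components standing in for a real MT system and gazetteer.
LEXICON = {"visited": "a visité"}
GAZETTEER = {"London": "Londres"}


def toy_translate(text):
    return " ".join(LEXICON.get(word, word) for word in text.split())


def toy_translate_entity(entity):
    return GAZETTEER.get(entity, entity)


print(translate_with_placeholders("Alice visited London", ["Alice", "London"],
                                  toy_translate, toy_translate_entity))
# Alice a visité Londres
```

The second protocol in the abstract corresponds to calling the translator on the unmodified source string, with no placeholder substitution.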

121 citations

Book Chapter•DOI•

Synthesizing number transformations from input-output examples

Rishabh Singh¹, Sumit Gulwani²•Institutions (2)

Massachusetts Institute of Technology¹, Microsoft²

07 Jul 2012

TL;DR: A framework that can learn number transformations from very few input-output examples is presented, and an inductive synthesis algorithm for manipulating data types that have numbers as a constituent sub-type such as date, unit, and time is obtained.

Abstract: Numbers are one of the most widely used data types in programming languages. Number transformations like formatting and rounding present a challenge even for experienced programmers as they find it difficult to remember different number format strings supported by different programming languages. These transformations present an even bigger challenge for end-users of spreadsheet systems like Microsoft Excel where providing such custom format strings is beyond their expertise. In our extensive case study of help forums of many programming languages and Excel, we found that both programmers and end-users struggle with these number transformations, but are able to easily express their intent using input-output examples. In this paper, we present a framework that can learn such number transformations from very few input-output examples. We first describe an expressive number transformation language that can model these transformations, and then present an inductive synthesis algorithm that can learn all expressions in this language that are consistent with a given set of examples. We also present a ranking scheme of these expressions that enables efficient learning of the desired transformation from very few examples. By combining our inductive synthesis algorithm for number transformations with an inductive synthesis algorithm for syntactic string transformations, we are able to obtain an inductive synthesis algorithm for manipulating data types that have numbers as a constituent sub-type such as date, unit, and time. We have implemented our algorithms as an Excel add-in and have evaluated it successfully over several benchmarks obtained from the help forums and the Excel product team.
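
The core idea, keeping every program in a transformation language that agrees with all input-output examples, can be illustrated by brute-force enumeration over a deliberately tiny, hypothetical language: "round to d decimals, optionally group thousands". The paper's language and synthesis algorithm are far more expressive; this only shows what "consistent with the examples" means.

```python
from itertools import product


def candidate_programs(max_decimals=4):
    """Hypothetical tiny DSL: round to d decimals, optionally group thousands."""
    return list(product(range(max_decimals + 1), (False, True)))


def run(program, text):
    decimals, grouping = program
    spec = ("," if grouping else "") + f".{decimals}f"
    return format(float(text), spec)


def synthesize(examples):
    """All programs in the DSL that agree with every input-output example."""
    return [p for p in candidate_programs() if all(run(p, i) == o for i, o in examples)]


print(synthesize([("1234.5678", "1,234.57")]))  # [(2, True)]
print(synthesize([("3.14159", "3.14")]))        # [(2, False), (2, True)]
```

The second call returns two programs that agree on the example but differ on larger inputs, which is exactly the ambiguity a ranking scheme over consistent expressions is meant to resolve.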

Proceedings Article•DOI•

Automated repair of HTML generation errors in PHP applications using string constraint solving

Hesam Samimi¹, Max Schäfer², Shay Artzi², Todd Millstein¹, Frank Tip², Laurie Hendren³•Institutions (3)

University of California, Los Angeles¹, IBM², McGill University³

02 Jun 2012

TL;DR: It is observed that malformed HTML is often produced by incorrect constant prints, i.e., statements that print string literals, and two tools for automatically repairing such HTML generation errors are presented.

Abstract: PHP web applications routinely generate invalid HTML. Modern browsers silently correct HTML errors, but sometimes malformed pages render inconsistently, cause browser crashes, or expose security vulnerabilities. Fixing errors in generated pages is usually straightforward, but repairing the generating PHP program can be much harder. We observe that malformed HTML is often produced by incorrect "constant prints", i.e., statements that print string literals, and present two tools for automatically repairing such HTML generation errors. PHPQuickFix repairs simple bugs by statically analyzing individual prints. PHPRepair handles more general repairs using a dynamic approach. Based on a test suite, the property that all tests should produce their expected output is encoded as a string constraint over variables representing constant prints. Solving this constraint describes how constant prints must be modified to make all tests pass. Both tools were implemented as an Eclipse plugin and evaluated on PHP programs containing hundreds of HTML generation errors, most of which our tools were able to repair automatically.

Posted Content•

Learning Semantic String Transformations from Examples

Rishabh Singh¹, Sumit Gulwani²•Institutions (2)

Massachusetts Institute of Technology¹, Microsoft²

26 Apr 2012-arXiv: Databases

TL;DR: An expressive transformation language for semantic manipulation that combines table lookup operations and syntactic manipulations is described and a synthesis algorithm that can learn all transformations in the language that are consistent with the user-provided set of input-output examples is presented.

Abstract: We address the problem of performing semantic transformations on strings, which may represent a variety of data types (or their combination) such as a column in a relational table, time, date, currency, etc. Unlike syntactic transformations, which are based on regular expressions and which interpret a string as a sequence of characters, semantic transformations additionally require exploiting the semantics of the data type represented by the string, which may be encoded as a database of relational tables. Manually performing such transformations on a large collection of strings is error prone and cumbersome, while programmatic solutions are beyond the skill-set of end-users. We present a programming by example technology that allows end-users to automate such repetitive tasks. We describe an expressive transformation language for semantic manipulation that combines table lookup operations and syntactic manipulations. We then present a synthesis algorithm that can learn all transformations in the language that are consistent with the user-provided set of input-output examples. We have implemented this technology as an add-in for the Microsoft Excel Spreadsheet system and have evaluated it successfully over several benchmarks picked from various Excel help-forums.
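
The table-lookup half of such a transformation language can be illustrated with a small, hypothetical learner: it searches for every (key column, value column) pair in a relational table whose lookup maps each example input to its output. The syntactic manipulations the paper combines with lookups are not shown, and the table contents are invented for the example.

```python
from itertools import permutations


def learn_lookups(table, examples):
    """Return (key_column, value_column) pairs whose lookup explains every example.

    table: list of rows (dicts with the same keys); examples: (input, output) pairs.
    """
    columns = list(table[0].keys())
    programs = []
    for key_col, val_col in permutations(columns, 2):
        index = {row[key_col]: row[val_col] for row in table}  # later duplicates win
        if all(index.get(inp) == out for inp, out in examples):
            programs.append((key_col, val_col))
    return programs


def apply_lookup(table, program, value):
    key_col, val_col = program
    return next((row[val_col] for row in table if row[key_col] == value), None)


# Hypothetical product table.
PRODUCTS = [
    {"id": "P1", "name": "Widget", "price": "4.50"},
    {"id": "P2", "name": "Gadget", "price": "12.00"},
]
programs = learn_lookups(PRODUCTS, [("P1", "Widget")])
print(programs)                                   # [('id', 'name')]
print(apply_lookup(PRODUCTS, programs[0], "P2"))  # Gadget
```

A single example suffices here only because the table is tiny; with richer tables several lookups (or chains of lookups) can be consistent, and more examples or a ranking are needed.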

Proceedings Article•DOI•

String stability of interconnected vehicles under communication constraints

Sinan Oncu¹, Nathan van de Wouw¹, W. P. Maurice H. Heemels¹, Henk Nijmeijer¹•Institutions (1)

Eindhoven University of Technology¹

01 Dec 2012

TL;DR: Conditions on the uncertain sampling intervals and delays under which string stability can still be guaranteed are provided to support the design of CACC systems that are robust to uncertainties introduced by wireless communication.

Abstract: In this paper, we present a novel modelling and string stability analysis method for an interconnected vehicle string in which information exchange takes place via wireless communication. The usage of wireless communication introduces time-varying sampling intervals, delays, and communication constraints, whose impact on string stability requires careful analysis. In particular, we study a Cooperative Adaptive Cruise Control (CACC) system which regulates inter-vehicle distances in a vehicle string and utilizes information exchange between vehicles through wireless communication in addition to local sensor measurements. The propagation of disturbances through the interconnected vehicle string is inspected by using the notion of so-called string stability, which is formulated here in terms of an ℒ2-gain requirement from disturbance inputs to controlled outputs. This paper provides conditions on the uncertain sampling intervals and delays under which string stability can still be guaranteed. These results support the design of CACC systems that are robust to uncertainties introduced by wireless communication.
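
For a stable, linear time-invariant model Γ(s) of how errors propagate from one vehicle to the next, the ℒ2 gain equals the peak of |Γ(jω)| over frequency, so an ℒ2-gain string-stability requirement with bound 1 can be checked approximately with a frequency sweep. The paper's setting, with time-varying sampling intervals and delays, is not LTI, so this is only an idealized sketch; both transfer functions below are hypothetical and stable.

```python
def polyval(coeffs, s):
    """Evaluate a polynomial given in descending powers at complex s (Horner's rule)."""
    result = 0j
    for c in coeffs:
        result = result * s + c
    return result


def peak_gain(num, den, w_min=1e-3, w_max=1e3, points=2000):
    """Approximate sup_w |num(jw)/den(jw)| on a log-spaced frequency grid."""
    peak = 0.0
    for k in range(points):
        w = w_min * (w_max / w_min) ** (k / (points - 1))
        s = 1j * w
        peak = max(peak, abs(polyval(num, s) / polyval(den, s)))
    return peak


def l2_string_stable(num, den, bound=1.0):
    """String stable in this idealized LTI sense if the peak gain does not exceed bound."""
    return peak_gain(num, den) <= bound


# Hypothetical first-order propagation 1/(0.5 s + 1): gain never exceeds 1.
print(l2_string_stable([1.0], [0.5, 1.0]))            # True
# Hypothetical resonant propagation (2s + 1)/(s^2 + s + 1): peaks near sqrt(5) around w = 1.
print(l2_string_stable([2.0, 1.0], [1.0, 1.0, 1.0]))  # False
```

A grid sweep can miss a very sharp resonance between grid points, so in practice a dedicated H∞-norm routine would be preferable; the sweep is used here only to keep the example dependency-free.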

Journal Article•DOI•

Learning semantic string transformations from examples

Rishabh Singh¹, Sumit Gulwani²•Institutions (2)

Massachusetts Institute of Technology¹, Microsoft²

01 Apr 2012

TL;DR: In this article, the problem of performing semantic transformations on strings, which may represent a variety of data types (or their combination) such as a column in a relational table, time, date, currency, etc., is addressed.

Abstract: We address the problem of performing semantic transformations on strings, which may represent a variety of data types (or their combination) such as a column in a relational table, time, date, currency, etc. Unlike syntactic transformations, which are based on regular expressions and which interpret a string as a sequence of characters, semantic transformations additionally require exploiting the semantics of the data type represented by the string, which may be encoded as a database of relational tables. Manually performing such transformations on a large collection of strings is error prone and cumbersome, while programmatic solutions are beyond the skill-set of end-users. We present a programming by example technology that allows end-users to automate such repetitive tasks. We describe an expressive transformation language for semantic manipulation that combines table lookup operations and syntactic manipulations. We then present a synthesis algorithm that can learn all transformations in the language that are consistent with the user-provided set of input-output examples. We have implemented this technology as an add-in for the Microsoft Excel Spreadsheet system and have evaluated it successfully over several benchmarks picked from various Excel help-forums.

Journal Article•DOI•

Simple diagnostic approach for determining of faulted PV modules in string based PV arrays

Nuri Gokmen¹, Engin Karatepe¹, Berk Celik¹, Santiago Silvestre²•Institutions (2)

Ege University¹, Polytechnic University of Catalonia²

01 Nov 2012-Solar Energy

TL;DR: A simple diagnostic method is proposed to determine the number of open- and short-circuited PV modules in a string of a PV system, taking into account economic factors such as minimizing the number of sensors.

Journal Article•DOI•

Prioritizing test cases with string distances

Yves Ledru¹, Alexandre Petrenko, Sergiy Boroday, Nadine Mandran¹•Institutions (1)

Grenoble Institute of Technology¹

01 Mar 2012

TL;DR: The obtained results indicate that prioritisation based on string distances is more efficient in finding defects than random ordering of the test suite: the test suites prioritized using string distances are more efficient in detecting the strongest mutants, and, on average, have a better APFD than randomly ordered test suites.

Abstract: Test case prioritisation aims at finding an ordering which enhances a certain property of an ordered test suite. Traditional techniques rely on the availability of code or a specification of the program under test. We propose to use string distances on the text of test cases for their comparison and elaborate a prioritisation algorithm. Such a prioritisation does not require code or a specification and can be useful for initial testing and in cases when code is difficult to instrument. In this paper, we also report on experiments performed on the "Siemens Test Suite", where the proposed prioritisation technique was compared with random permutations and four classical string distance metrics were evaluated. The obtained results, confirmed by a statistical analysis, indicate that prioritisation based on string distances is more efficient in finding defects than random ordering of the test suite: the test suites prioritized using string distances are more efficient in detecting the strongest mutants, and, on average, have a better APFD than randomly ordered test suites. The results suggest that string distances can be used for prioritisation purposes, and Manhattan distance could be the best choice.
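
A minimal sketch of distance-based prioritisation on the text of test cases. Two choices here are assumptions, not details taken from the abstract: Manhattan distance is computed on character-count vectors, and the ordering is a greedy max-min rule that repeatedly picks the test farthest from those already selected, starting from the first test as given.

```python
from collections import Counter


def manhattan(a, b):
    """Manhattan distance between the character-count vectors of two strings."""
    ca, cb = Counter(a), Counter(b)
    return sum(abs(ca[ch] - cb[ch]) for ch in set(ca) | set(cb))


def prioritise(tests, distance=manhattan):
    """Greedy max-min ordering: repeatedly pick the test farthest from those already chosen."""
    remaining = list(tests)
    if not remaining:
        return []
    order = [remaining.pop(0)]  # hypothetical rule: start with the first test as given
    while remaining:
        nxt = max(remaining, key=lambda t: min(distance(t, s) for s in order))
        remaining.remove(nxt)
        order.append(nxt)
    return order


print(prioritise(["aaaa", "aaab", "zzzz"]))  # ['aaaa', 'zzzz', 'aaab']
```

The intuition is that textually dissimilar tests are more likely to exercise different behaviour, so spreading them early tends to reveal faults sooner; any of the other string distances could be passed as `distance`.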

Patent•

In-context word prediction and word correction

Jerome Pasquero¹, Donald Somerset McCulloch Mckenzie¹, Jason Tyler Griffin¹•Institutions (1)

BlackBerry Limited¹

16 Mar 2012

TL;DR: In this paper, the authors present a system for predicting user input on a keyboard using a display with at least three fields: the first field displays an input string based on input selections such as keyboard entries, and the second field displays a candidate prediction that is generated based on other input selections, consists at least in part of a proposed completion to the input selection, and is partially based on the input string in the first field.

Abstract: Methods and systems for predicting user input on a keyboard. Methods include enabling user input on a display comprising at least three fields. The first field displays an input string that is based on input selections such as keyboard entries. The second field displays a candidate prediction generated based on other input selections, consisting at least in part of a proposed completion to the input selection, and partially based on the input string in the first field. The third field displays another candidate prediction generated based on the input string in the first field as well as the candidate prediction in the second field.

Patent•

Lexical and phrasal feature domain adaptation in statistical machine translation

Vassilina Nikoulina¹, Stéphane Clinchant¹•Institutions (1)

Xerox¹

28 Aug 2012

TL;DR: In this paper, a translation method is adapted to a domain of interest by generating a set of candidate translations of the source text string, each candidate translation comprising a sequence of target words in a target language.

Abstract: A translation method is adapted to a domain of interest. The method includes receiving a source text string comprising a sequence of source words in a source language and generating a set of candidate translations of the source text string, each candidate translation comprising a sequence of target words in a target language. An optimal translation is identified from the set of candidate translations as a function of at least one domain-adapted feature computed based on bilingual probabilities and monolingual probabilities. Each bilingual probability is for a source text fragment and a target text fragment of the source text string and candidate translation respectively. The bilingual probabilities are estimated on an out-of-domain parallel corpus that includes source and target strings. The monolingual probabilities for text fragments of one of the source text string and candidate translation are estimated on an in-domain monolingual corpus.
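
The abstract says the optimal translation is chosen as a function of domain-adapted features built from bilingual and monolingual probabilities, but not how the features are combined. A minimal sketch, assuming a log-linear combination (weighted sum of log-probabilities), which is common in statistical machine translation; the feature names, probabilities and weights are all hypothetical.

```python
import math


def loglinear_score(features, weights):
    """Weighted sum of log-probabilities; features maps name -> probability in (0, 1]."""
    return sum(weights[name] * math.log(p) for name, p in features.items())


def best_translation(candidates, weights):
    """Pick the candidate translation with the highest log-linear score."""
    return max(candidates, key=lambda c: loglinear_score(c["features"], weights))["text"]


# Hypothetical candidates: bilingual probability from an out-of-domain parallel corpus,
# monolingual probability from an in-domain corpus.
CANDIDATES = [
    {"text": "candidate A", "features": {"bilingual": 0.4, "in_domain_lm": 0.01}},
    {"text": "candidate B", "features": {"bilingual": 0.3, "in_domain_lm": 0.05}},
]
print(best_translation(CANDIDATES, {"bilingual": 1.0, "in_domain_lm": 1.0}))  # candidate B
print(best_translation(CANDIDATES, {"bilingual": 1.0, "in_domain_lm": 0.0}))  # candidate A
```

Switching the in-domain weight on or off flips the choice, which illustrates how in-domain monolingual evidence can steer a system trained on out-of-domain parallel data.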

Journal Article•DOI•

Algorithms for Jumbled Pattern Matching in Strings

Péter Burcsi¹, Ferdinando Cicalese², Gabriele Fici³, Zsuzsanna Lipták⁴•Institutions (4)

Eötvös Loránd University¹, University of Salerno², University of Nice Sophia Antipolis³, Bielefeld University⁴

06 Apr 2012-International Journal of Foundations of Computer Science

TL;DR: Two novel algorithms are presented for the case where the text is fixed and many queries arrive over time; the second comes in two variants, both using an O(n)-size data structure that can be constructed in O(n) time.

Abstract: The Parikh vector p(s) of a string s over a finite ordered alphabet \(\Sigma = \{a_1, \dots, a_\sigma\}\) is defined as the vector of multiplicities of the characters, \(p(s) = (p_1, \dots, p_\sigma)\), where \(p_i = |\{j \mid s_j = a_i\}|\). Parikh vector q occurs in s if s has a substring t with p(t) = q. The problem of searching for a query q in a text s of length n can be solved simply and worst-case optimally with a sliding window approach in O(n) time. We present two novel algorithms for the case where the text is fixed and many queries arrive over time. The first algorithm only decides whether a given Parikh vector appears in a binary text. It uses a linear size data structure and decides each query in O(1) time. The preprocessing can be done trivially in Θ(n²) time. The second algorithm finds all occurrences of a given Parikh vector in a text over an arbitrary alphabet of size σ ≥ 2 and has sub-linear expected time complexity. More precisely, we present two variants of the algorithm, both using an O(n) size data structure, each of which can be constructed in O(n) time. The first solution is very simple and easy to implement and leads to an expected query time of , where \(m = \sum_i q_i\) is the length of a string with Parikh vector q. The second uses wavelet trees and improves the expected runtime to , i.e., by a factor of log m.
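
The two baselines named in the abstract can be sketched directly: the sliding-window search (here tracking how many symbol counts currently differ from q, so each shift is O(1)) and the binary-text decision with naive Θ(n²) preprocessing. The binary query relies on the fact that sliding a window of length m by one position changes its number of 1s by at most one, so every count between the minimum and the maximum for that length actually occurs. Function names are hypothetical, and the paper's sub-linear algorithms are not shown.

```python
from collections import Counter


def parikh_vector(s, alphabet):
    """Multiplicity of each alphabet symbol in s, in alphabet order."""
    counts = Counter(s)
    return tuple(counts[a] for a in alphabet)


def jumbled_occurrences(text, q, alphabet):
    """Start positions of substrings of text with Parikh vector q (sliding window)."""
    m = sum(q)
    if m == 0 or m > len(text):
        return []  # only non-empty queries that fit in the text are considered
    index = {a: k for k, a in enumerate(alphabet)}
    diff = [-qk for qk in q]  # window count minus target count, per symbol
    mismatches = sum(1 for d in diff if d != 0)

    def shift(symbol, delta):
        nonlocal mismatches
        k = index[symbol]
        before = diff[k] != 0
        diff[k] += delta
        mismatches += (diff[k] != 0) - before

    for ch in text[:m]:
        shift(ch, +1)
    hits = [0] if mismatches == 0 else []
    for i in range(m, len(text)):
        shift(text[i], +1)      # symbol entering the window
        shift(text[i - m], -1)  # symbol leaving the window
        if mismatches == 0:
            hits.append(i - m + 1)
    return hits


def binary_window_table(text):
    """For a text over {'0', '1'}: (min, max) number of 1s among windows of each length."""
    prefix = [0]
    for ch in text:
        prefix.append(prefix[-1] + (ch == "1"))
    table = {}
    for m in range(1, len(text) + 1):  # naive Θ(n²) construction
        ones = [prefix[i + m] - prefix[i] for i in range(len(text) - m + 1)]
        table[m] = (min(ones), max(ones))
    return table


def binary_query(table, ones, zeros):
    """O(1) decision: does some substring contain exactly these numbers of 1s and 0s?"""
    lo_hi = table.get(ones + zeros)
    return lo_hi is not None and lo_hi[0] <= ones <= lo_hi[1]


print(jumbled_occurrences("abcab", (1, 1, 0), "abc"))  # [0, 3]
table = binary_window_table("0110")
print(binary_query(table, 2, 0), binary_query(table, 0, 2))  # True False
```

The table stores only two numbers per window length, so it has linear size, matching the O(1)-query structure described in the abstract.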

Book Chapter•DOI•

A faster grammar-based self-index

Travis Gagie¹, Paweł Gawrychowski², Juha Kärkkäinen³, Yakov Nekrich⁴•Institutions (4)

Aalto University¹, University of Wrocław², University of Helsinki³, University of Bonn⁴

05 Mar 2012

TL;DR: In this paper, it is shown that, given a balanced straight-line program for a string S[1..n] whose LZ77 parse consists of z phrases, one can add $O(z \log\log z)$ words and obtain a compressed self-index for S such that, given a pattern P[1..m], the occ occurrences of P in S can be listed in $O(m^2 + (m + \mathit{occ}) \log\log n)$ time.

Abstract: To store and search genomic databases efficiently, researchers have recently started building compressed self-indexes based on straight-line programs and LZ77. In this paper we show how, given a balanced straight-line program for a string S[1..n] whose LZ77 parse consists of z phrases, we can add $O(z \log\log z)$ words and obtain a compressed self-index for S such that, given a pattern P[1..m], we can list the occ occurrences of P in S in $O(m^2 + (m + \mathit{occ}) \log\log n)$ time. All previous self-indexes are either larger or slower in the worst case.
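The size parameter z above is the number of phrases in the LZ77 parse of S. As a hedged illustration of what z counts (not of the self-index itself), here is a naive greedy LZ77 parser in Python. It assumes one common variant, in which a phrase is either a single character with no earlier occurrence or the longest prefix of the remaining suffix that also starts at an earlier position, overlaps allowed; other papers use slightly different conventions. The cubic-time search is purely for clarity, and the names `Phrase`, `lz77_parse` and `lz77_decode` are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Phrase(NamedTuple):
    source: Optional[int]  # earlier start position to copy from, or None for a literal
    length: int            # number of characters covered by the phrase
    char: str = ""         # the literal character when source is None


def lz77_parse(s: str) -> List[Phrase]:
    phrases: List[Phrase] = []
    i, n = 0, len(s)
    while i < n:
        best_len, best_src = 0, None
        for j in range(i):  # candidate earlier starting positions
            length = 0
            # Overlap is allowed: j + length may run past i.
            while i + length < n and s[j + length] == s[i + length]:
                length += 1
            if length > best_len:
                best_len, best_src = length, j
        if best_src is None:
            phrases.append(Phrase(None, 1, s[i]))  # first occurrence of s[i]
            i += 1
        else:
            phrases.append(Phrase(best_src, best_len))
            i += best_len
    return phrases


def lz77_decode(phrases: List[Phrase]) -> str:
    out: List[str] = []
    for p in phrases:
        if p.source is None:
            out.append(p.char)
        else:
            for k in range(p.length):  # char by char, so overlapping copies work
                out.append(out[p.source + k])
    return "".join(out)
```

Under these assumptions, z is simply `len(lz77_parse(S))`; for example, "abababab" parses into three phrases: two literals and one overlapping copy of length 6.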

Journal Article•DOI•

Nondissipative String Current Diverter for Solving the Cascaded DC–DC Converter Connection Problem in Photovoltaic Power Generation System

Riad Kadri¹, J-P Gaubert¹, Gerard Champenois¹•Institutions (1)

University of Poitiers¹

01 Mar 2012-IEEE Transactions on Power Electronics

TL;DR: In this article, a nondissipative string current diverter is proposed to overcome the problem of inhomogeneous irradiation in photovoltaic (PV) power generation systems.

Abstract: Module-integrated converters, frequently considered one of the promising solutions for grid connection of photovoltaic (PV) power generation systems, have been the focus of numerous papers. Most of the approaches proposed thus far rely on a series string of dc-dc converters to create a high-voltage string connected to the dc-ac inverter, and the boost converter is well suited to this application. However, under inhomogeneous irradiation, the power generated by each PV module and the output dc voltage of each boost converter become unbalanced, while the output currents of all the boost converters, being connected in series, remain equal to the string current. In this case, the boost converters cannot always deliver all the power from a mixture of shaded panels and panels delivering full power. In this paper, a nondissipative string current diverter is proposed to overcome this problem. An important feature of the proposed circuit is its ability to effectively decouple each converter from the rest of the string, making it insensitive to changes in the string current. Hence, it is possible to obtain the maximum power from each PV module, with the maximum power point tracking algorithm implemented on each dc-dc converter, and to do so at optimum efficiency. Simulation and experimental results verify that the proposed topology performs well despite inhomogeneous irradiation. Moreover, the string current diverter circuit is very easy to control and does not operate under homogeneous irradiation, so the efficiency of the topology is improved for any type of irradiation.

Patent•

Handheld Electronic Device and Method for Calibrating Input of Webpage Address

Te-Pei Tseng¹, Kun-Da Wu¹•Institutions (1)

HTC¹

02 May 2012

TL;DR: A method for calibrating the input of a webpage address on a handheld electronic device is provided; the device comprises a touch display unit, a storage unit storing a plurality of website address data, and a processing unit electrically connected to the touch display unit and the storage unit.

Abstract: A method for calibrating the input of a webpage address on a handheld electronic device is provided. The handheld electronic device comprises a touch display unit, a storage unit for storing a plurality of website address data, and a processing unit electrically connected to the touch display unit and the storage unit. The method comprises the following steps. At least one character is received from the touch display unit, where each character has a plurality of neighboring characters on a keyboard. A plurality of string combinations are generated by the processing unit according to the neighboring characters. The storage unit is searched by the processing unit according to the string combinations to generate an address suggestion list. A handheld electronic device is disclosed as well.
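The candidate-generation step can be sketched in Python as follows. The neighbour model (only the keys immediately left and right on the same QWERTY row), the prefix-matching search and every function name are assumptions made for illustration; the patent abstract does not specify them.

```python
from itertools import product
from typing import Dict, Iterable, List, Set

# Hypothetical neighbour model: the keys directly left/right on the same row.
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]


def neighbor_map(rows: List[str] = QWERTY_ROWS) -> Dict[str, Set[str]]:
    nbrs: Dict[str, Set[str]] = {}
    for row in rows:
        for k, ch in enumerate(row):
            nbrs[ch] = {row[j] for j in (k - 1, k + 1) if 0 <= j < len(row)}
    return nbrs


def string_combinations(typed: str, nbrs: Dict[str, Set[str]]) -> List[str]:
    """Every string obtained by keeping each typed character or replacing it with a neighbour."""
    choices = [[c] + sorted(nbrs.get(c, set())) for c in typed]
    return ["".join(p) for p in product(*choices)]


def suggest(typed: str, stored: Iterable[str], nbrs: Dict[str, Set[str]]) -> List[str]:
    """Stored addresses that start with any of the generated combinations."""
    combos = set(string_combinations(typed, nbrs))
    return sorted(a for a in stored if any(a.startswith(c) for c in combos))
```

With stored addresses "google.com", "yahoo.com" and "github.com", a mistyped "hoo" still suggests "google.com", because "goo" is among the 27 generated combinations. The number of combinations grows roughly as 3^k for k typed characters, so a practical implementation would need to bound or prune it.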

Proceedings Article•DOI•

PermA and Balloon: Tools for string alignment and text processing

Uwe D. Reichel¹•Institutions (1)

Ludwig Maximilian University of Munich¹

01 Jan 2012

TL;DR: PermA, a general-purpose string aligner, and Balloon, a text processing toolkit for German and English providing components for part-of-speech tagging, morphological analyses, and grapheme-to-phoneme conversion including syllabification and word-stress assignment, are introduced.

Abstract: Two online research tools are presented in this paper: PermA, a general-purpose string aligner which can, for example, be used for grapheme-to-phoneme and phoneme-to-phoneme alignment, and Balloon, a text processing toolkit for German and English providing components for part-of-speech tagging, morphological analyses, and grapheme-to-phoneme conversion including syllabification and word-stress assignment. The general architectures of these tools are introduced, with a focus on recent improvements concerning the alignment cost function derivation and word-stress assignment.
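A general-purpose string aligner of this kind is commonly built on weighted edit-distance dynamic programming with a pluggable cost function. The sketch below is a minimal Python version of that idea, not PermA's implementation. The function names, the `None` gap marker and the unit-cost default are assumptions, and PermA's own cost-function derivation is not reproduced.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Pair = Tuple[Optional[str], Optional[str]]  # None marks a gap


def unit_cost(x: str, y: str) -> float:
    return 0.0 if x == y else 1.0


def align(a: Sequence[str], b: Sequence[str],
          sub_cost: Callable[[str, str], float] = unit_cost,
          gap_cost: float = 1.0) -> Tuple[float, List[Pair]]:
    n, m = len(a), len(b)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    # Accumulate gap costs so that the traceback comparisons below are exact.
    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + gap_cost
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + gap_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(D[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1]),
                          D[i - 1][j] + gap_cost,   # a[i-1] aligned to a gap
                          D[i][j - 1] + gap_cost)   # b[j-1] aligned to a gap
    # Trace back one optimal path from the bottom-right cell.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1]):
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + gap_cost:
            pairs.append((a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, b[j - 1]))
            j -= 1
    pairs.reverse()
    return D[n][m], pairs
```

Because the inputs are sequences, a word's letters can be aligned against a list of phoneme symbols. For grapheme-to-phoneme alignment, `sub_cost` would return lower costs for plausible letter-phoneme pairs; such a cost table would have to be supplied or estimated separately.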

Journal Article•DOI•

The Synthesis of Time Optimal Supervisors by Using Heaps-of-Pieces

Rong Su¹, J.H. van Schuppen, Jacobus E. Rooda²•Institutions (2)

Nanyang Technological University¹, Eindhoven University of Technology²

01 Jan 2012-IEEE Transactions on Automatic Control

TL;DR: In this paper, the minimum-makespan supervisor synthesis problem is solved by a terminable algorithm, where the execution time of each string is computable by the theory of heaps-of-pieces.

Abstract: In many practical applications, we need to compute a nonblocking supervisor that not only complies with pre-specified safety requirements but also achieves a certain time optimal performance such as maximum throughput. In this paper, we first present a minimum-makespan supervisor synthesis problem. Then we show that the problem can be solved by a terminable algorithm, where the execution time of each string is computable by the theory of heaps-of-pieces. We also provide a timed supervisory control map that can implement the synthesized minimum-makespan sublanguage.
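In the heaps-of-pieces view, the execution time of a string is obtained by dropping one piece per event onto a contour and reading off the height of the resulting heap. The Python sketch below uses the simplest such model, in which each event occupies a set of resources for a fixed duration (a rectangular piece). The event names, resources and durations are hypothetical. The paper's supervisor-synthesis algorithm is not shown; the sketch covers only the per-string makespan it relies on, plus a brute-force minimum over a few candidate strings.

```python
from typing import Dict, FrozenSet, Iterable, Sequence, Tuple

Piece = Tuple[FrozenSet[str], float]  # (resources occupied, duration)


def execution_time(word: Sequence[str], pieces: Dict[str, Piece]) -> float:
    """Height of the heap obtained by dropping the pieces of `word` in order."""
    contour: Dict[str, float] = {}  # current top of the heap per resource
    for event in word:
        resources, duration = pieces[event]
        # The piece falls until it touches the highest of its resource columns.
        start = max((contour.get(r, 0.0) for r in resources), default=0.0)
        for r in resources:
            contour[r] = start + duration
    return max(contour.values(), default=0.0)


def fastest(words: Iterable[Sequence[str]], pieces: Dict[str, Piece]) -> Tuple[float, Sequence[str]]:
    """Brute-force minimum-makespan choice among candidate strings."""
    return min((execution_time(w, pieces), w) for w in words)
```

With pieces a = ({M1}, 2), b = ({M2}, 3) and c = ({M1, M2}, 1), the string "abc" finishes at time 4 while "acb" needs 6, because c must wait for a and then blocks b.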

Journal Article•

Pictures of Processes: Automated Graph Rewriting for Monoidal Categories and Applications to Quantum Computing

Aleks Kissinger

01 Mar 2012-arXiv: Category Theory

TL;DR: A discretised version of a string diagram, called a string graph, is introduced, and it is shown how string graphs modulo a rewrite system can be used to construct free symmetric traced and compact closed categories on a monoidal signature.

Abstract: This work is about diagrammatic languages, how they can be represented, and what they in turn can be used to represent. More specifically, it focuses on representations and applications of string diagrams. String diagrams are used to represent a collection of processes, depicted as "boxes" with multiple (typed) inputs and outputs, depicted as "wires". If we allow plugging input and output wires together, we can intuitively represent complex compositions of processes, formalised as morphisms in a monoidal category. [...] The first major contribution of this dissertation is the introduction of a discretised version of a string diagram called a string graph. String graphs form a partial adhesive category, so they can be manipulated using double-pushout graph rewriting. Furthermore, we show how string graphs modulo a rewrite system can be used to construct free symmetric traced and compact closed categories on a monoidal signature. The second contribution is in the application of graphical languages to quantum information theory. We use a mixture of diagrammatic and algebraic techniques to prove a new classification result for strongly complementary observables. [...] We also introduce a graphical language for multipartite entanglement and illustrate a simple graphical axiom that distinguishes the two maximally-entangled tripartite qubit states: GHZ and W. [...] The third contribution is a description of two software tools developed in part by the author to implement much of the theoretical content described here. The first tool is Quantomatic, a desktop application for building string graphs and graphical theories, as well as performing automated graph rewriting visually. The second is QuantoCoSy, which performs fully automated, model-driven theory creation using a procedure called conjecture synthesis.

Journal Article•DOI•

Trie-join: a trie-based method for efficient string similarity joins

Jianhua Feng¹, Jiannan Wang¹, Guoliang Li¹•Institutions (1)

Tsinghua University¹

01 Aug 2012

TL;DR: This paper designs efficient trie-join algorithms and pruning techniques to achieve high performance and shows that these algorithms outperform state-of-the-art methods by an order of magnitude on the data sets with short strings.

Abstract: A string similarity join finds similar pairs between two collections of strings. Many applications, e.g., data integration and cleaning, can significantly benefit from an efficient string-similarity-join algorithm. In this paper, we study string similarity joins with edit-distance constraints. Existing methods usually employ a filter-and-refine framework and suffer from the following limitations: (1) they are inefficient for data sets with short strings (average string length not larger than 30); (2) they involve large indexes; (3) they make dynamic updates of data sets expensive. To address these problems, we propose a novel method called trie-join, which can generate results efficiently with small indexes. We use a trie structure to index the strings and utilize it to efficiently find similar string pairs based on subtrie pruning. We devise efficient trie-join algorithms and pruning techniques to achieve high performance. Our method can be easily extended to support dynamic updates of data sets efficiently. We conducted extensive experiments on four real data sets. Experimental results show that our algorithms outperform state-of-the-art methods by an order of magnitude on data sets with short strings.
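The following minimal Python sketch makes the idea of indexing strings in a trie and pruning whole subtries concrete for a threshold edit-distance join. It uses the textbook approach: carry one dynamic-programming row per trie node, and abandon a subtrie once every entry of the row exceeds the threshold tau. The row minimum lower-bounds the edit distance between the query and any string below that node, which makes the cut safe. This is a stand-in for, not a reproduction of, the trie-join algorithms and pruning techniques in the paper, and all names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.word: Optional[str] = None  # set if a string ends at this node


def build_trie(strings: Iterable[str]) -> TrieNode:
    root = TrieNode()
    for s in strings:
        node = root
        for ch in s:
            node = node.children.setdefault(ch, TrieNode())
        node.word = s
    return root


def search(root: TrieNode, q: str, tau: int) -> List[str]:
    """All indexed strings within edit distance tau of q."""
    results: List[str] = []

    def dfs(node: TrieNode, row: List[int]) -> None:
        # row[j] = edit distance between q[:j] and the prefix spelled by node.
        if node.word is not None and row[-1] <= tau:
            results.append(node.word)
        if min(row) > tau:
            return  # no string below this node can be within tau: prune subtrie
        for ch, child in node.children.items():
            new_row = [row[0] + 1]
            for j in range(1, len(q) + 1):
                new_row.append(min(new_row[j - 1] + 1,               # skip q[j-1]
                                   row[j] + 1,                       # skip ch
                                   row[j - 1] + (q[j - 1] != ch)))   # match or substitute
            dfs(child, new_row)

    dfs(root, list(range(len(q) + 1)))
    return results


def similarity_join(R: Iterable[str], S: Iterable[str], tau: int) -> Set[Tuple[str, str]]:
    root = build_trie(S)  # index one side, probe it with the other
    return {(r, s) for r in R for s in search(root, r, tau)}
```

Strings that share a prefix share the DP rows computed for that prefix, which is where a trie saves work on collections of short, similar strings.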

Proceedings Article•DOI•

Search-Based Test Input Generation for String Data Types Using the Results of Web Queries

Phil McMinn¹, Muzammil Shahbaz¹, Mark Stevenson¹•Institutions (1)

University of Sheffield¹

17 Apr 2012

TL;DR: This paper presents an approach in which examples of inputs are sought from the Internet by reformulating program identifiers into web queries, and used to augment and seed a search-based test data generation technique.

Abstract: Generating realistic, branch-covering string inputs is a challenging problem, due to the diverse and complex types of real-world data that are naturally encodable as strings, for example resource locators, dates of different localised formats, international banking codes, and national identity numbers. This paper presents an approach in which examples of inputs are sought from the Internet by reformulating program identifiers into web queries. The resultant URLs are downloaded, split into tokens, and used to augment and seed a search-based test data generation technique. The use of the Internet as part of test input generation has two key advantages. Firstly, web pages are a rich source of valid inputs for various types of string data that may be used to improve test coverage. Secondly, the web pages tend to contain realistic, human-readable values, which are invaluable when test cases need manual confirmation due to the lack of an automated oracle. An empirical evaluation of the approach is presented, involving string input validation code from 10 open source projects. Well-formed, valid string inputs were retrieved from the web for 96% of the different string types analysed. Using the approach, coverage was improved for 75% of the Java classes studied by an average increase of 14%.
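Two small steps of this pipeline can be sketched in Python: turning a program identifier into a web query, and harvesting well-formed candidate inputs from downloaded text. The camelCase/snake_case splitting rule, the token-cleaning step and the ISO-date validator used as the example string type are assumptions chosen for illustration. The paper's actual query reformulation, download step and search-based generator are not reproduced.

```python
import re
from datetime import datetime
from typing import Callable, List


def identifier_to_query(identifier: str) -> str:
    """Split a program identifier into lower-case words for use as a web query."""
    words: List[str] = []
    for part in re.split(r"[_\W]+", identifier):
        # Acronym runs (e.g. 'IBAN'), capitalised or lower-case words, digit runs.
        words += re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part)
    return " ".join(w.lower() for w in words)


def candidate_inputs(page_text: str, is_valid: Callable[[str], bool]) -> List[str]:
    """Distinct whitespace-separated tokens that pass the validator, in order of appearance."""
    seen, out = set(), []
    for token in page_text.split():
        token = token.strip(".,;:()[]\"'")
        if token and token not in seen and is_valid(token):
            seen.add(token)
            out.append(token)
    return out


def is_iso_date(s: str) -> bool:
    """Example validator (hypothetical target type): a real calendar date in YYYY-MM-DD form."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", s):
        return False
    try:
        datetime.strptime(s, "%Y-%m-%d")
        return True
    except ValueError:
        return False
```

For instance, `identifier_to_query("IBANValidator")` yields "iban validator". Filtering harvested tokens through the code under test's own validation logic (here stood in for by `is_iso_date`) keeps only inputs that are both well-formed and human-readable.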

Patent•

Network on chip processor with multiple cores and routing method thereof

Liang-Gee Chen¹, Chuan-Yung Tsai¹•Institutions (1)

National Taiwan University¹

30 Aug 2012

TL;DR: A network on chip processor including multiple cores and a Kautz NoC is presented, in which each core is assigned an addressing string of L base-D words that does not contain two neighboring identical words.

Abstract: An exemplary embodiment of the present disclosure illustrates a network on chip processor including multiple cores and a Kautz NoC. Each of the cores is assigned an addressing string of L base-D words, and the addressing string does not have two neighboring identical words, where L, the length of the addressing string, is an integer larger than 1, and D, the number of possible values of each word, is an integer larger than 2. Each of the cores is unidirectionally linked to (D−1) other cores through the Kautz NoC, and for any two connected cores, the last (L−1) words of the addressing string of one core are the same as the first (L−1) words of the addressing string of the other core.
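The addressing rule above is that of a Kautz graph, and it can be sketched in Python. `kautz_addresses` enumerates the valid addresses (D·(D−1)^(L−1) of them), and `successors` lists the D−1 cores each core links to. `route` uses the standard shift routing for Kautz graphs: find the longest suffix of the source address that is also a prefix of the destination, then shift in the remaining destination words one hop at a time. The abstract does not state which routing method the patent claims, so `route` is an assumption, and all names are hypothetical.

```python
from itertools import product
from typing import List, Tuple

Address = Tuple[int, ...]


def kautz_addresses(D: int, L: int) -> List[Address]:
    """All length-L strings over {0..D-1} with no two neighbouring identical words."""
    return [w for w in product(range(D), repeat=L)
            if all(w[k] != w[k + 1] for k in range(L - 1))]


def successors(w: Address, D: int) -> List[Address]:
    """The D-1 cores reachable in one hop: drop the first word, append a new last word."""
    return [w[1:] + (c,) for c in range(D) if c != w[-1]]


def route(src: Address, dst: Address) -> List[Address]:
    """Shift routing: reuse the longest suffix of src that is a prefix of dst."""
    L = len(src)
    k = next(k for k in range(L, -1, -1) if src[L - k:] == dst[:k])
    path, cur = [src], src
    for c in dst[k:]:  # each appended word differs from the current last word
        cur = cur[1:] + (c,)
        path.append(cur)
    return path
```

For D = 3 and L = 2 there are 6 cores, and `route((0, 1), (2, 0))` takes two hops via (1, 2). In general, a route never needs more than L hops.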

Journal Article•DOI•

The set of parameterized k-covers problem

Anna Gorbenko¹, V. Yu. Popov¹•Institutions (1)

Ural Federal University¹

01 Mar 2012-Theoretical Computer Science

TL;DR: It is proved that k-SPC, the problem of the set of parameterized k-covers, is NP-complete; the problem combines the k-cover measure, a distance measure for strings, with parameterized matching.
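The TL;DR does not define the k-cover measure, so the Python sketch below only illustrates the other ingredient, parameterized matching. Two strings p-match when one can be turned into the other by a one-to-one renaming of symbols. The sketch uses Baker's prev encoding, in which each position stores the distance to the previous occurrence of the same symbol, or 0 for a first occurrence. It assumes that every symbol is a parameter symbol. The names are hypothetical, and nothing here addresses the NP-completeness result.

```python
from typing import Dict, List, Tuple


def prev_encoding(s: str) -> Tuple[int, ...]:
    """Distance to the previous occurrence of the same symbol, 0 for a first occurrence."""
    last: Dict[str, int] = {}
    out: List[int] = []
    for i, ch in enumerate(s):
        out.append(i - last[ch] if ch in last else 0)
        last[ch] = i
    return tuple(out)


def p_match(x: str, y: str) -> bool:
    """True iff y is obtained from x by a one-to-one renaming of symbols."""
    return len(x) == len(y) and prev_encoding(x) == prev_encoding(y)


def p_occurrences(text: str, pattern: str) -> List[int]:
    """Naive search: start positions where a text window p-matches the pattern."""
    m = len(pattern)
    target = prev_encoding(pattern)
    return [i for i in range(len(text) - m + 1) if prev_encoding(text[i:i + m]) == target]
```

For example, "abab" p-matches "cdcd" (rename a→c, b→d) but not "ccdd", and the pattern "xx" p-occurs in "aabba" at positions 0 and 2.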
