
Showing papers on "String (computer science)" published in 2018


Posted Content
TL;DR: MolGAN is introduced, an implicit, likelihood-free generative model for small molecular graphs that circumvents the need for expensive graph matching procedures or node ordering heuristics of previous likelihood-based methods.
Abstract: Deep generative models for graph-structured data offer a new angle on the problem of chemical synthesis: by optimizing differentiable models that directly generate molecular graphs, it is possible to side-step expensive search procedures in the discrete and vast space of chemical structures. We introduce MolGAN, an implicit, likelihood-free generative model for small molecular graphs that circumvents the need for expensive graph matching procedures or node ordering heuristics of previous likelihood-based methods. Our method adapts generative adversarial networks (GANs) to operate directly on graph-structured data. We combine our approach with a reinforcement learning objective to encourage the generation of molecules with specific desired chemical properties. In experiments on the QM9 chemical database, we demonstrate that our model is capable of generating close to 100% valid compounds. MolGAN compares favorably both to recent proposals that use string-based (SMILES) representations of molecules and to a likelihood-based method that directly generates graphs, albeit being susceptible to mode collapse.

631 citations


Proceedings Article
Fan Bai, Zhanzhan Cheng, Yi Niu, Shiliang Pu, Shuigeng Zhou
18 Jun 2018
TL;DR: This paper proposes a novel method called edit probability (EP) for scene text recognition, which estimates the probability of generating a string from the output sequence of probability distributions conditioned on the input image, while considering the possible occurrences of missing or superfluous characters.
Abstract: We consider the scene text recognition problem under the attention-based encoder-decoder framework, which is the state of the art. The existing methods usually employ a frame-wise maximal likelihood loss to optimize the models. When we train the model, the misalignment between the ground truth strings and the attention's output sequences of probability distribution, which is caused by missing or superfluous characters, will confuse and mislead the training process, and consequently make the training costly and degrade the recognition accuracy. To handle this problem, we propose a novel method called edit probability (EP) for scene text recognition. EP tries to effectively estimate the probability of generating a string from the output sequence of probability distribution conditioned on the input image, while considering the possible occurrences of missing/superfluous characters. The advantage lies in that the training process can focus on the missing, superfluous and unrecognized characters, and thus the impact of the misalignment problem can be alleviated or even overcome. We conduct extensive experiments on standard benchmarks, including the IIIT-5K, Street View Text and ICDAR datasets. Experimental results show that the EP can substantially boost scene text recognition performance.

151 citations


Book Chapter
29 Apr 2018
TL;DR: New two-round multiparty secure computation (MPC) protocols are provided under the minimal assumption that two-round oblivious transfer (OT) exists; the MPC protocol is secure against semi-honest or malicious adversaries whenever the underlying OT protocol is.
Abstract: We provide new two-round multiparty secure computation (MPC) protocols assuming the minimal assumption that two-round oblivious transfer (OT) exists. If the assumed two-round OT protocol is secure against semi-honest adversaries (in the plain model) then so is our two-round MPC protocol. Similarly, if the assumed two-round OT protocol is secure against malicious adversaries (in the common random/reference string model) then so is our two-round MPC protocol. Previously, two-round MPC protocols were only known under relatively stronger computational assumptions. Finally, we provide several extensions.

135 citations


Proceedings Article
01 Aug 2018
TL;DR: This paper proposes a novel approach that partially solves the incomplete-annotation and noisy-annotation problems of distant supervision for NER: partial annotation learning reduces the effect of unknown character labels, and an instance selector based on reinforcement learning filters the auto-generated annotations.
Abstract: A bottleneck problem with Chinese named entity recognition (NER) in new domains is the lack of annotated data. One solution is to utilize the method of distant supervision, which has been widely used in relation extraction, to automatically populate annotated training data without human cost. The distant supervision assumption here is that if a string in text is included in a predefined dictionary of entities, the string might be an entity. However, this kind of auto-generated data suffers from two main problems: incomplete and noisy annotations, which affect the performance of NER models. In this paper, we propose a novel approach which can partially solve the above problems of distant supervision for NER. In our approach, to handle the incomplete problem, we apply partial annotation learning to reduce the effect of unknown labels of characters. As for noisy annotation, we design an instance selector based on reinforcement learning to distinguish positive sentences from auto-generated annotations. In experiments, we create two datasets for Chinese named entity recognition in two domains with the help of distant supervision. The experimental results show that the proposed approach obtains better performance than the comparison systems on both datasets.

130 citations


Journal Article
TL;DR: This work discovered a flaw in how previous corpora were created that leads to an over-estimation of classification accuracy, and discovered that most of the information contained in n-grams stems from string features that could be obtained in simpler ways.
Abstract: Malware classification using machine learning algorithms is a difficult task, in part due to the absence of strong natural features in raw executable binary files. Byte n-grams previously have been used as features, but little work has been done to explain their performance or to understand what concepts are actually being learned. In contrast to other work using n-gram features, in this work we use orders of magnitude more data, and we perform feature selection during model building using Elastic-Net regularized Logistic Regression. We compute a regularization path and analyze novel multi-byte identifiers. Through this process, we discover significant previously unreported issues with byte n-gram features that cause their benefits and practicality to be overestimated. Three primary issues emerged from our work. First, we discovered a flaw in how previous corpora were created that leads to an over-estimation of classification accuracy. Second, we discovered that most of the information contained in n-grams stems from string features that could be obtained in simpler ways. Finally, we demonstrate that n-gram features promote overfitting, even with linear models and extreme regularization.

128 citations


Book Chapter
26 Feb 2018
TL;DR: Recent efficient constructions of zero-knowledge Succinct Non-interactive Arguments of Knowledge (zk-SNARKs) require a setup phase in which a common-reference string (CRS) with a certain structure is generated.
Abstract: Recent efficient constructions of zero-knowledge Succinct Non-interactive Arguments of Knowledge (zk-SNARKs) require a setup phase in which a common-reference string (CRS) with a certain structure is generated. This CRS is sometimes referred to as the public parameters of the system, and is used for constructing and verifying proofs. A drawback of these constructions is that whoever runs the setup phase subsequently possesses trapdoor information enabling them to produce fraudulent pseudoproofs.

116 citations


Proceedings Article
08 Jul 2018
TL;DR: Training and evaluating on a dataset with 2M domain names shows that there is surprisingly little difference between various convolutional neural network and recurrent neural network based architectures in terms of accuracy, prompting a preference for the simpler architectures, since they are faster to train and to score, and less prone to overfitting.
Abstract: Recently several different deep learning architectures have been proposed that take a string of characters as the raw input signal and automatically derive features for text classification. Few studies are available that compare the effectiveness of these approaches for character based text classification with each other. In this paper we perform such an empirical comparison for the important cybersecurity problem of DGA detection: classifying domain names as either benign or produced by malware (i.e., by a Domain Generation Algorithm). Training and evaluating on a dataset with 2M domain names shows that there is surprisingly little difference between various convolutional neural network (CNN) and recurrent neural network (RNN) based architectures in terms of accuracy, prompting a preference for the simpler architectures, since they are faster to train and to score, and less prone to overfitting.

113 citations


Journal Article
TL;DR: This work develops a novel adaptive driving strategy for CAVs to stabilise heterogeneous vehicle strings by controlling one CAV under vehicle-to-infrastructure (V2I) communications and demonstrates the predictive power of the analytical string stability conditions.
Abstract: Literature has shown potentials of Connected/Cooperative Automated Vehicles (CAVs) in improving highway operations, especially on roadway capacity and flow stability. However, benefits were also shown to be negligible at low market penetration rates. This work develops a novel adaptive driving strategy for CAVs to stabilise heterogeneous vehicle strings by controlling one CAV under vehicle-to-infrastructure (V2I) communications. Assumed is a roadside system with V2I communications, which receives control parameters of the CAV in the string and estimates parameters imperfectly of non-connected automated vehicles. It determines the adaptive control parameters (e.g. desired time gap and feedback gains) of the CAV if a downstream disturbance is identified and sends them to the CAV. The CAV changes its behaviour based on the adaptive parameters commanded by the roadside system to suppress the disturbance. The proposed adaptive driving strategy is based on string stability analysis of heterogeneous vehicle strings. To this end, linearised vehicle dynamics model and control law are used in the controller parametrisation and Laplace transform of the speed and gap error dynamics in time domain to frequency domain enables the determination of sufficient string stability criteria of heterogeneous strings. The analytical string stability conditions give new insights into automated vehicular string stability properties in relation to the system properties of time delays and controller design parameters of feedback gains and desired time gap. It further allows the quantification of a stability margin, which is subsequently used to adapt the feedback control gains and desired time gap of the CAV to suppress the amplification of gap and speed errors through the string. Analytical results are verified via systematic simulation of both homogeneous and heterogeneous strings. Simulation demonstrates the predictive power of the analytical string stability conditions. 
The performance of the adaptive driving strategy under V2I cooperation is tested in simulation. Results show that even when the estimation of control parameters of non-connected automated vehicles is imperfect and there is a mismatch between the model used in the analytical derivation and that used in simulation, the proposed adaptive driving strategy suppresses disturbances in a wide range of situations.

111 citations


Journal Article
TL;DR: In this paper, the authors present a decentralized battery management system with no communication requirement based on a modular multilevel converter topology with a distributed inductor and distributed controller running on a local microprocessor.
Abstract: The performance of a string of series-connected batteries is typically restricted by the worst cell in the string and a single failure point will render the entire string unusable. To address these issues, we present a decentralized battery management system with no communication requirement based on a modular multilevel converter topology with a distributed inductor and distributed controller running on a local microprocessor. This configuration is referred to as a “smart cell.” By sensing the voltage across the local distributed inductor, each smart cell is able to: first, determine its optimal switching pattern in order to minimize the output voltage ripple; and second, adjust its duty cycle to synchronize its state of charge (SOC) with the average SOC of the series string of cells. The decentralized controller is derived using the theory of Kuramoto oscillators, and the stability of a system of smart cells is investigated. We experimentally show that a system of three smart cells with their decentralized controllers can accurately synchronize the SOC while minimizing their output voltage ripple.

100 citations


Posted Content
Fan Bai, Zhanzhan Cheng, Yi Niu, Shiliang Pu, Shuigeng Zhou
TL;DR: This paper proposes a novel method called edit probability (EP) for scene text recognition, which estimates the probability of generating a string from the output sequence of probability distributions conditioned on the input image, while considering the possible occurrences of missing or superfluous characters.
Abstract: We consider the scene text recognition problem under the attention-based encoder-decoder framework, which is the state of the art. The existing methods usually employ a frame-wise maximal likelihood loss to optimize the models. When we train the model, the misalignment between the ground truth strings and the attention's output sequences of probability distribution, which is caused by missing or superfluous characters, will confuse and mislead the training process, and consequently make the training costly and degrade the recognition accuracy. To handle this problem, we propose a novel method called edit probability (EP) for scene text recognition. EP tries to effectively estimate the probability of generating a string from the output sequence of probability distribution conditioned on the input image, while considering the possible occurrences of missing/superfluous characters. The advantage lies in that the training process can focus on the missing, superfluous and unrecognized characters, and thus the impact of the misalignment problem can be alleviated or even overcome. We conduct extensive experiments on standard benchmarks, including the IIIT-5K, Street View Text and ICDAR datasets. Experimental results show that the EP can substantially boost scene text recognition performance.

99 citations


Journal Article
TL;DR: Under the proposed control, the uniformly ultimately bounded stability of the closed loop system is achieved through rigorous Lyapunov analysis without any discretization or simplification of the dynamics in the time and space.

Journal Article
TL;DR: Evidence is provided that the near-quadratic running time bounds known for the problem of computing edit distance might be tight: it is shown that computing the edit distance in time $O(n^{2-\delta})$ for some constant $\delta>0$ would in turn yield a faster algorithm for the satisfiability of conjunctive normal form formulas with $N$ variables and $M$ clauses.
Abstract: The edit distance (a.k.a. the Levenshtein distance) between two strings is defined as the minimum number of insertions, deletions, or substitutions of symbols needed to transform one string into an...
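To make the definition concrete, here is a minimal sketch of the textbook quadratic-time dynamic program for the Levenshtein distance. It is the standard algorithm whose running time the paper argues is hard to improve, not a construction taken from the paper; the function name `edit_distance` is chosen here purely for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic O(len(a) * len(b)) dynamic program."""
    # prev[j] holds the distance between the already processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when symbols match)
            ))
        prev = curr
    return prev[-1]
```

Only two rows of the table are kept at a time, so memory is linear in the length of `b` while the running time stays quadratic.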

Proceedings Article
02 Jun 2018
TL;DR: GenAx is presented, an accelerator for read alignment, a time-consuming step in genome sequencing, which achieves a 31.7× speedup over the standard BWA-MEM sequence aligner running on a 56-thread dual-socket 14-core Xeon E5 server processor, while reducing power consumption and area.
Abstract: Genomics can transform health-care through precision medicine. Plummeting sequencing costs would soon make genome testing affordable to the masses. Compute efficiency, however, has to improve by orders of magnitude to sequence and analyze the raw genome data. Sequencing software used today can take several hundreds to thousands of CPU hours to align reads to a reference sequence. This paper presents GenAx, an accelerator for read alignment, a time-consuming step in genome sequencing. It consists of a seeding and seed-extension accelerator. The latter is based on an innovative automata design that was designed from the ground-up to enable hardware acceleration. Unlike conventional Levenshtein automata, it is string independent and scales quadratically with edit distance, instead of string length. It supports critical features commonly used in sequencing such as affine gap scoring and traceback. GenAx provides a throughput of 4,058K reads/s for Illumina 101 bp reads. GenAx achieves 31.7× speedup over the standard BWA-MEM sequence aligner running on a 56-thread dual-socket 14-core Xeon E5 server processor, while reducing power consumption by 12× and area by 5.6×.

Proceedings Article
01 Apr 2018
TL;DR: This work combines string kernels with bag-of-super-word-embeddings for automatic essay scoring and reports the best performance on the Automated Student Assessment Prize data set, in both in-domain and cross-domain settings, surpassing recent state-of-the-art deep learning approaches.
Abstract: In this work, we present an approach based on combining string kernels and word embeddings for automatic essay scoring. String kernels capture the similarity among strings based on counting common character n-grams, which are a low-level yet powerful type of feature, demonstrating state-of-the-art results in various text classification tasks such as Arabic dialect identification or native language identification. To our best knowledge, we are the first to apply string kernels to automatically score essays. We are also the first to combine them with a high-level semantic feature representation, namely the bag-of-super-word-embeddings. We report the best performance on the Automated Student Assessment Prize data set, in both in-domain and cross-domain settings, surpassing recent state-of-the-art deep learning approaches.
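As an illustration of "counting common character n-grams", the following is a minimal sketch of one simple string kernel: the inner product of character n-gram count vectors, often called a spectrum kernel. The abstract does not state which kernel variant, n-gram range, or normalization the authors use, so the function names, the default `n = 3`, and the cosine-style normalization are all assumptions made for this example.

```python
from collections import Counter
import math

def char_ngrams(text: str, n: int) -> Counter:
    """Count all contiguous character n-grams in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def spectrum_kernel(s: str, t: str, n: int = 3) -> int:
    """Inner product of the n-gram count vectors of s and t."""
    cs, ct = char_ngrams(s, n), char_ngrams(t, n)
    return sum(count * ct[gram] for gram, count in cs.items())

def normalized_kernel(s: str, t: str, n: int = 3) -> float:
    """Cosine-style normalization; returns 0.0 if either string has no n-grams."""
    denom = math.sqrt(spectrum_kernel(s, s, n) * spectrum_kernel(t, t, n))
    return spectrum_kernel(s, t, n) / denom if denom else 0.0
```

A kernel matrix built from such pairwise values could then be passed to any kernel-based learner; that step is not shown here.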

Journal Article
TL;DR: This article presents a novel matching approach, leveraging a deep neural network to classify pairs of toponyms as either matching or nonmatching, and shows that the proposed method can significantly outperform individual similarity metrics from previous studies, as well as previous methods based on supervised machine learning for combining multiple metrics.
Abstract: Toponym matching, i.e. pairing strings that represent the same real-world location, is a fundamental problem for several practical applications. The current state-of-the-art relies on string similar...

Journal Article
TL;DR: This paper focuses on the fulfillment of string stability in the practical case of heterogeneous vehicle strings that comprise vehicles with different dynamic properties, considering acceleration feedforward, predicted acceleration feedforward, and input signal feedforward as possible predecessor-following strategies.
Abstract: String stability is an essential property to ensure that the fluctuations are attenuated along vehicle strings. This paper focuses on the fulfillment of string stability in the practical case of heterogeneous vehicle strings that comprise vehicles with different dynamic properties. Using the idea of predecessor following, acceleration feedforward, predicted acceleration feedforward, and input signal feedforward are considered as different possible feedforward strategies. For all strategies, the parameter ranges of predecessor vehicles that ensure string stability of a given vehicle are characterized, computed, and validated by simulation.

Book Chapter
25 Mar 2018
TL;DR: SNARKs are proof systems with succinct proofs, which are at the core of the cryptocurrency Zcash, whose anonymity relies on ZK-SNARKs; they are also used for ZK contingent payments in Bitcoin.
Abstract: Subversion zero knowledge for non-interactive proof systems demands that zero knowledge (ZK) be maintained even when the common reference string (CRS) is chosen maliciously. SNARKs are proof systems with succinct proofs, which are at the core of the cryptocurrency Zcash, whose anonymity relies on ZK-SNARKs; they are also used for ZK contingent payments in Bitcoin.

Posted Content
TL;DR: For very large collections stored in slow-access memory, this work proposes much more compact data structures that support weak prefix searches: they return the ranks of matching strings provided that some string in $S$ starts with the given prefix.
Abstract: It has been shown in the indexing literature that there is an essential difference between prefix/range searches on the one hand, and predecessor/rank searches on the other hand, in that the former provably allows faster query resolution. Traditionally, prefix search is solved by data structures that are also dictionaries, that is, they actually contain the strings in $S$. For very large collections stored in slow-access memory, we propose much more compact data structures that support weak prefix searches: they return the ranks of matching strings provided that some string in $S$ starts with the given prefix. In fact, we show that our most space-efficient data structure is asymptotically space-optimal. Previously, data structures such as String B-trees (and more complicated cache-oblivious string data structures) have implicitly supported weak prefix queries, but they all have query time that grows logarithmically with the size of the string collection. In contrast, our data structures are simple, naturally cache-efficient, and have query time that depends only on the length of the prefix, all the way down to constant query time for strings that fit in one machine word. We give several applications of weak prefix searches, including exact prefix counting and approximate counting of tuples matching conjunctive prefix conditions.
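For orientation, the sketch below shows the conventional dictionary-based approach that the abstract contrasts against: a sorted list that stores the strings themselves and answers a prefix query with the range of ranks of the matching strings. It is not the paper's compact structure, and the name `prefix_rank_range` is hypothetical. Unlike a weak prefix search, it also answers correctly (with an empty range) when no string matches.

```python
from bisect import bisect_left

def prefix_rank_range(sorted_strings: list[str], prefix: str) -> tuple[int, int]:
    """Return the half-open rank range [lo, hi) of strings starting with prefix."""
    # The first string >= prefix is the first candidate match.
    lo = bisect_left(sorted_strings, prefix)
    hi = lo
    # Strings sharing a prefix are contiguous in sorted order, so scan forward.
    # This scan costs time proportional to the number of matches.
    while hi < len(sorted_strings) and sorted_strings[hi].startswith(prefix):
        hi += 1
    return lo, hi
```

The binary search makes the query time grow logarithmically with the size of the collection, which is the dependence the abstract says its structures avoid.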

Journal Article
TL;DR: A deep learning approach for the core digital libraries task of parsing bibliographic reference strings by deploying the state-of-the-art long short-term memory (LSTM) neural network architecture, a variant of a recurrent neural network to capture long-range dependencies in reference strings.
Abstract: We present a deep learning approach for the core digital libraries task of parsing bibliographic reference strings. We deploy the state-of-the-art long short-term memory (LSTM) neural network architecture, a variant of a recurrent neural network to capture long-range dependencies in reference strings. We explore word embeddings and character-based word embeddings as an alternative to handcrafted features. We incrementally experiment with features, architectural configurations, and the diversity of the dataset. Our final model is an LSTM-based architecture, which layers a linear chain conditional random field (CRF) over the LSTM output. In extensive experiments in both English in-domain (computer science) and out-of-domain (humanities) test cases, as well as multilingual data, our results show a significant gain ($p<0.01$) over the reported state-of-the-art CRF-only-based parser.

Journal Article
TL;DR: By taking a probabilistic perspective on training CNNs, this work derives two different loss functions for binary and real-valued word string embeddings and proposes two different CNN architectures, specifically designed for word spotting.
Abstract: Word spotting has become a field of strong research interest in document image analysis over the last years. Recently, AttributeSVMs were proposed which predict a binary attribute representation (Almazan et al. in IEEE Trans Pattern Anal Mach Intell 36(12):2552–2566, 2014). At their time, this influential method defined the state of the art in segmentation-based word spotting. In this work, we present an approach for learning attribute representations with convolutional neural networks (CNNs). By taking a probabilistic perspective on training CNNs, we derive two different loss functions for binary and real-valued word string embeddings. In addition, we propose two different CNN architectures, specifically designed for word spotting. These architectures are able to be trained in an end-to-end fashion. In a number of experiments, we investigate the influence of different word string embeddings and optimization strategies. We show our attribute CNNs to achieve state-of-the-art results for segmentation-based word spotting on a large variety of data sets.

Book Chapter
19 Aug 2018
TL;DR: Goyal and Kumar (STOC'18) introduced non-malleable secret sharing (NMSS) and proposed constructions of t-out-of-n NMSS schemes, whereas prior works on non-malleable codes in the 2 split-state model imply 2-out-of-2 NMSS schemes.
Abstract: Goyal and Kumar (STOC’18) recently introduced the notion of non-malleable secret sharing. Very roughly, the guarantee they seek is the following: the adversary may potentially tamper with all of the shares, and still, either the reconstruction procedure outputs the original secret, or, the original secret is “destroyed” and the reconstruction outputs a string which is completely “unrelated” to the original secret. Prior works on non-malleable codes in the 2 split-state model imply constructions which can be seen as 2-out-of-2 non-malleable secret sharing (NMSS) schemes. Goyal and Kumar proposed constructions of t-out-of-n NMSS schemes. These constructions have already been shown to have a number of applications in cryptography.

Proceedings Article
07 Jan 2018
TL;DR: This paper presents an efficient data structure for maintaining a dynamic collection of strings under concatenation, splitting, comparison, and longest-common-prefix operations, and proves that even if the only possible query is checking equality of two strings, either updates or queries must take amortized $\Omega(\log n)$ time; hence the implementation is optimal.
Abstract: In this paper, we study the fundamental problem of maintaining a dynamic collection of strings under the following operations: • make_string - add a string of constant length, • concat - concatenate two strings, • split - split a string into two at a given position, • compare - find the lexicographical order (less, equal, greater) between two strings, • LCP - calculate the longest common prefix of two strings. We develop a generic framework for dynamizing the recompression method recently introduced by Jez [J. ACM, 2016]. It allows us to present an efficient data structure for the above problem, where an update requires only O(log n) worst-case time with high probability, with n being the total length of all strings in the collection, and a query takes constant worst-case time. On the lower bound side, we prove that even if the only possible query is checking equality of two strings, either updates or queries must take amortized Ω(log n) time; hence our implementation is optimal.
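The five operations listed in the abstract can be pinned down with a naive baseline. The sketch below is not the paper's recompression-based structure: every operation here costs time linear in the string lengths, and the class name, the integer handles, and the choice to keep the operands after `concat` and `split` are assumptions made for illustration.

```python
class NaiveStringCollection:
    """Naive baseline for the interface; each operation costs O(length)."""

    def __init__(self) -> None:
        self._strings: list[str] = []

    def make_string(self, s: str) -> int:
        """Add a string and return its handle (an index into the collection)."""
        self._strings.append(s)
        return len(self._strings) - 1

    def concat(self, h1: int, h2: int) -> int:
        """Add the concatenation of two stored strings."""
        return self.make_string(self._strings[h1] + self._strings[h2])

    def split(self, h: int, pos: int) -> tuple[int, int]:
        """Add the two halves of a stored string, cut at position pos."""
        s = self._strings[h]
        return self.make_string(s[:pos]), self.make_string(s[pos:])

    def compare(self, h1: int, h2: int) -> int:
        """Return -1, 0, or 1 for less, equal, greater (lexicographic order)."""
        a, b = self._strings[h1], self._strings[h2]
        return (a > b) - (a < b)

    def lcp(self, h1: int, h2: int) -> int:
        """Length of the longest common prefix of two stored strings."""
        a, b = self._strings[h1], self._strings[h2]
        n = 0
        while n < min(len(a), len(b)) and a[n] == b[n]:
            n += 1
        return n
```

A baseline like this is mainly useful as a reference against which a faster implementation can be checked.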

Journal Article
TL;DR: This work examines document spanners, a formal framework for information extraction that was introduced by Fagin, Kimelfeld, Reiss, and Vansummeren, and compares the expressive power of core spanners to three models – namely, patterns, word equations, and a rich and natural subclass of extended regular expressions (regular expressions with a repetition operator).
Abstract: We examine document spanners, a formal framework for information extraction that was introduced by Fagin, Kimelfeld, Reiss, and Vansummeren (PODS 2013, JACM 2015). A document spanner is a function that maps an input string to a relation over spans (intervals of positions of the string). We focus on document spanners that are defined by regex formulas, which are basically regular expressions that map matched subexpressions to corresponding spans, and on core spanners, which extend the former by standard algebraic operators and string equality selection. First, we compare the expressive power of core spanners to three models – namely, patterns, word equations, and a rich and natural subclass of extended regular expressions (regular expressions with a repetition operator). These results are then used to analyze the complexity of query evaluation and various aspects of static analysis of core spanners. Finally, we examine the relative succinctness of different kinds of representations of core spanners and relate this to the simplification of core spanners that are extended with difference operators.
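The idea of mapping matched subexpressions to spans can be approximated with ordinary named capture groups. The sketch below uses Python's `re` module as a stand-in for regex formulas; the function name `regex_spans` and the sample pattern are hypothetical. It is a simplification in at least one respect: `re.finditer` reports only non-overlapping leftmost matches, so it may return fewer tuples than a formally defined spanner would.

```python
import re

def regex_spans(pattern: str, document: str) -> list[dict[str, tuple[int, int]]]:
    """Map each match to one row of spans, keyed by named capture group."""
    relation = []
    for match in re.finditer(pattern, document):
        # match.span(name) is the half-open interval [start, end) of the group.
        relation.append({name: match.span(name) for name in match.re.groupindex})
    return relation

rows = regex_spans(r"(?P<user>\w+)@(?P<host>[\w.]+)",
                   "alice@example.org, bob@test.com")
```

Each row plays the role of one tuple in the output relation, with the group names acting as the variables.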

Journal Article
TL;DR: Fractional-order-based control algorithms to enhance the car-following and string stability performance for both ACC and CACC vehicle strings, including communication temporal delay effects are presented.
Abstract: Traffic flow optimization and driver comfort enhancement are the main contributions of an Adaptive Cruise Control (ACC) system. If communication links are added, more safety and shorter gaps can be reached performing a Cooperative-ACC (CACC). Although shortening the inter-vehicular distances directly improves traffic flow, it can cause string unstable behavior. This paper presents fractional-order-based control algorithms to enhance the car-following and string stability performance for both ACC and CACC vehicle strings, including communication temporal delay effects. The proposed controller is compared with state-of-the-art implementations, exhibiting better performance. Simulation and real experiments have been conducted for validating the approach.

Posted Content
TL;DR: In this article, the authors use persistent homology to characterize distributions of Type IIB flux vacua on moduli space for three examples: the rigid Calabi-Yau, a hypersurface in weighted projective space, and the symmetric six-torus.
Abstract: Persistent homology computes the multiscale topology of a data set by using a sequence of discrete complexes. In this paper, we propose that persistent homology may be a useful tool for studying the structure of the landscape of string vacua. As a scaled-down version of the program, we use persistent homology to characterize distributions of Type IIB flux vacua on moduli space for three examples: the rigid Calabi-Yau, a hypersurface in weighted projective space, and the symmetric six-torus $T^6=(T^2)^3$. These examples suggest that persistence pairing and multiparameter persistence contain useful information for characterization of the landscape in addition to the usual information contained in standard persistent homology. We also study how restricting to special vacua with phenomenologically interesting low-energy properties affects the topology of a distribution.

Journal Article
25 Apr 2018
TL;DR: This paper presents a novel solution of classifying SQL queries purely on the features of the initial query string using a Gap-Weighted String Subsequence Kernel algorithm and a Support Vector Machine trained on the similarity metrics between known query strings.
Abstract: SQL Injection Attacks are one of the most common methods behind data security breaches. Previous research has attempted to produce viable detection solutions in order to filter SQL Injection Attacks from regular queries. Unfortunately it has proven to be a challenging problem with many solutions suffering from disadvantages such as being unable to process in real time as a preventative solution, a lack of adaptability to differing types of attack and the requirement for access to difficult-to-obtain information about the source application. This paper presents a novel solution of classifying SQL queries purely on the features of the initial query string. A Gap-Weighted String Subsequence Kernel algorithm is implemented to identify subsequences of shared characters between query strings for the output of a similarity metric. Finally a Support Vector Machine is trained on the similarity metrics between known query strings which are then used to classify unknown test queries. By gathering all feature data from the query strings, additional information from the source application is not required. The probabilistic nature of the learned models allows the solution to adapt to new threats whilst in operation. The proposed solution is evaluated using a number of test datasets derived from the Amnesia testbed datasets. The demonstration software achieved 97.07% accuracy for Select type queries and 92.48% accuracy for Insert type queries. This limited success rate is due to unsanitised quotation marks within legitimate inputs confusing the feature extraction. Using a test dataset that denies legitimate queries the use of unsanitised quotation marks, the Select and Insert query accuracy rose.
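A gap-weighted subsequence kernel scores two strings by their shared, possibly non-contiguous subsequences, discounting each occurrence by how far it is spread out. The following is a brute-force sketch of the standard definition for short inputs; the abstract does not describe the authors' implementation, so the function name, the subsequence length `p` (assumed to be at least 1), and the decay factor `lam` are assumptions. It enumerates every index combination, so it is only practical for short strings, and the SVM training step is not shown.

```python
from itertools import combinations

def gap_weighted_kernel(s: str, t: str, p: int, lam: float) -> float:
    """Brute-force gap-weighted subsequence kernel for subsequences of length p >= 1."""
    def weighted_subsequences(text: str) -> dict[str, float]:
        # For every index tuple spelling a subsequence, add lam ** span,
        # where span = last index - first index + 1 (so gaps are penalized).
        weights: dict[str, float] = {}
        for idx in combinations(range(len(text)), p):
            sub = "".join(text[i] for i in idx)
            span = idx[-1] - idx[0] + 1
            weights[sub] = weights.get(sub, 0.0) + lam ** span
        return weights

    ws, wt = weighted_subsequences(s), weighted_subsequences(t)
    # Kernel value: inner product over subsequences present in both strings.
    return sum(w * wt[sub] for sub, w in ws.items() if sub in wt)
```

For "cat" and "car" with `p = 2`, the only shared subsequence is "ca", which spans two characters in each string, so the kernel value is `lam ** 4`.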

Posted ContentDOI
20 Aug 2018-bioRxiv
TL;DR: This work introduces Viruses.STRING, a protein–protein interaction database specifically catering to virus-virus and virus-host interactions, which combines evidence from experimental and text-mining channels to provide combined probabilities for interactions between viral and host proteins.
Abstract: As viruses continue to pose risks to global health, having a better understanding of virus–host protein–protein interactions aids in the development of treatments and vaccines. Here, we introduce Viruses.STRING, a protein–protein interaction database specifically catering to virus-virus and virus-host interactions. This database combines evidence from experimental and text-mining channels to provide combined probabilities for interactions between viral and host proteins. The database contains 177,425 interactions between 239 viruses and 319 hosts. The database is publicly available at viruses.string-db.org, and the interaction data can also be accessed through the latest version of the Cytoscape STRING app.

Journal ArticleDOI
TL;DR: A new linear-size data structure is proposed which provides a fast access to all palindromic substrings of a string or a set of strings that inherits some ideas from the construction of both the suffix trie and suffix tree.
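As an illustration of the kind of structure this summary describes, here is a minimal sketch of a palindromic tree: one node per distinct palindromic substring, plus suffix links to each node's longest proper palindromic suffix. The listing gives no construction details, so the class name `PalindromicTree`, its methods, and the online one-character-at-a-time build are assumptions based on the commonly described form of this structure, not the paper's own code.

```python
class PalindromicTree:
    """One node per distinct palindromic substring (illustrative sketch)."""

    def __init__(self):
        # Node 0: imaginary root of length -1; node 1: empty palindrome.
        self.length = [-1, 0]
        self.link = [0, 0]      # suffix links
        self.edges = [{}, {}]   # edges[v][c] is the node for c + v + c
        self.end = [-1, -1]     # index where each palindrome was first seen
        self.text = []
        self.last = 1           # longest palindromic suffix of the text so far

    def _extendable(self, v, i):
        # Follow suffix links until the palindrome at v can be wrapped in text[i].
        while True:
            j = i - self.length[v] - 1
            if j >= 0 and self.text[j] == self.text[i]:
                return v
            v = self.link[v]

    def add(self, c):
        """Append one character; return True if a new palindrome appeared."""
        self.text.append(c)
        i = len(self.text) - 1
        cur = self._extendable(self.last, i)
        if c in self.edges[cur]:
            self.last = self.edges[cur][c]
            return False
        new = len(self.length)
        self.length.append(self.length[cur] + 2)
        self.edges.append({})
        self.end.append(i)
        if self.length[new] == 1:
            self.link.append(1)  # single characters link to the empty palindrome
        else:
            parent = self._extendable(self.link[cur], i)
            self.link.append(self.edges[parent][c])
        self.edges[cur][c] = new
        self.last = new
        return True

    def palindromes(self):
        """Return every distinct non-empty palindromic substring."""
        joined = "".join(self.text)
        return [joined[self.end[v] - self.length[v] + 1 : self.end[v] + 1]
                for v in range(2, len(self.length))]


def distinct_palindromes(text):
    tree = PalindromicTree()
    for ch in text:
        tree.add(ch)
    return tree.palindromes()
```

Each appended character creates at most one new node, so the node count never exceeds the string length plus two. This is consistent with the linear-size claim in the summary.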

Book ChapterDOI
24 Sep 2018
TL;DR: It is shown that when extended with several natural predicates on words, the existential fragment becomes undecidable and deciding whether solutions exist for a restricted class of equations, augmented with many of the predicates leading to undecidability in the general case, is possible in non-deterministic polynomial time.
Abstract: The study of word equations is a central topic in mathematics and theoretical computer science. Recently, the question of whether a given word equation, augmented with various constraints/extensions, has a solution has gained critical importance in the context of string SMT solvers for security analysis. We consider the decidability of this question in several natural variants and thus shed light on the boundary between decidability and undecidability for many fragments of the first order theory of word equations and their extensions. In particular, we show that when extended with several natural predicates on words, the existential fragment becomes undecidable. On the other hand, the positive \(\varSigma _2\) fragment is decidable, and in the case that at most one terminal symbol appears in the equations, remains so even when length constraints are added. Moreover, if negation is allowed, it is possible to model arbitrary equations with length constraints using only equations containing a single terminal symbol and length constraints. Finally, we show that deciding whether solutions exist for a restricted class of equations, augmented with many of the predicates leading to undecidability in the general case, is possible in non-deterministic polynomial time.

Journal ArticleDOI
01 Jun 2018
TL;DR: This letter investigates networks of interconnected systems and introduces the notion of “scalable input-to-state stability” (sISS), which can be interpreted as an extension of the well-known concept of string stability from simple line graphs to general graphs.
Abstract: This letter investigates networks of interconnected systems and introduces the notion of “scalable input-to-state stability” (sISS). This concept is based on input-to-state stability (ISS) and can be interpreted as an extension of the well-known concept of string stability from simple line graphs to general graphs. It guarantees that the trajectories of all states are bounded at all times independently of the network’s size and structure and can hence be regarded as an important performance notion. Further, sufficient conditions are derived to guarantee sISS of homogeneous networks with well-defined interconnection structures. In fact, the conditions depend on local ISS Lyapunov functions but guarantee the global condition of sISS. Hence, a first step is made towards developing suitable extensions of string stability to general networks. Two examples are discussed to illustrate the theoretical result.
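For orientation, the standard ISS estimate and one plausible way to write a size-independent version of it are sketched below. The first inequality is the usual ISS definition. The second is only an assumed form of the sISS bound: the subsystem states \(x_i\), disturbances \(d_i\), the network size \(N\), and the use of maxima over subsystems are illustrative choices, and the exact norms in the letter may differ.

```latex
% Standard ISS: there exist \beta \in \mathcal{KL} and \gamma \in \mathcal{K} such that
\[
  |x(t)| \le \beta\bigl(|x(0)|, t\bigr) + \gamma\bigl(\|u\|_\infty\bigr)
  \qquad \text{for all } t \ge 0 .
\]
% Assumed sISS-type bound for a network of N subsystems:
\[
  \max_{i \in \{1,\dots,N\}} |x_i(t)|
  \le \beta\Bigl(\max_{i} |x_i(0)|, t\Bigr)
  + \sigma\Bigl(\max_{i} \|d_i\|_\infty\Bigr)
  \qquad \text{for all } t \ge 0 ,
\]
% with \beta \in \mathcal{KL} and \sigma \in \mathcal{K} chosen independently of N.
```

The point of the second bound is that the comparison functions do not depend on \(N\), which is what makes the boundedness guarantee hold regardless of the network's size.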