
Showing papers by "Turku Centre for Computer Science" published in 2009


Proceedings ArticleDOI
05 Jun 2009
TL;DR: A system for extracting complex events among genes and proteins from biomedical literature, developed in the context of the BioNLP'09 Shared Task on Event Extraction, which defines a wide array of features and makes extensive use of dependency parse graphs.
Abstract: We describe a system for extracting complex events among genes and proteins from biomedical literature, developed in the context of the BioNLP'09 Shared Task on Event Extraction. For each event, its text trigger, class, and arguments are extracted. In contrast to the prevailing approaches in the domain, events can be arguments of other events, resulting in a nested structure that better captures the underlying biological statements. We divide the task into independent steps which we approach as machine learning problems. We define a wide array of features and in particular make extensive use of dependency parse graphs. A rule-based post-processing step is used to refine the output in accordance with the restrictions of the extraction task. In the shared task evaluation, the system achieved an F-score of 51.95% on the primary task, the best performance among the participants.

231 citations


Journal ArticleDOI
TL;DR: It is shown why the discriminant of a maximal order within a cyclic division algebra must be minimized in order to get the densest possible matrix lattices with a prescribed nonvanishing minimum determinant.
Abstract: It is shown why the discriminant of a maximal order within a cyclic division algebra must be minimized in order to get the densest possible matrix lattices with a prescribed nonvanishing minimum determinant. Using results from class field theory, a lower bound to the minimum discriminant of a maximal order with a given center and index (= the number of Tx/Rx antennas) is derived. Also numerous examples of division algebras achieving the bound are given. For example, a matrix lattice with quadrature amplitude modulation (QAM) coefficients that has 2.5 times as many codewords as the celebrated Golden code of the same minimum determinant is constructed. Also, a general algorithm due to Ivanyos and Ronyai for finding maximal orders within a cyclic division algebra is described and enhancements to this algorithm are discussed. Also some general methods for finding cyclic division algebras of a prescribed index achieving the lower bound are proposed.

92 citations


Journal ArticleDOI
TL;DR: A framework for regularized least-squares (RLS) type of ranking cost functions is introduced and a kernel-based preference learning algorithm, which is called RankRLS, is proposed for minimizing these functions.
Abstract: In this paper, we introduce a framework for regularized least-squares (RLS) type of ranking cost functions and we propose three such cost functions. Further, we propose a kernel-based preference learning algorithm, which we call RankRLS, for minimizing these functions. It is shown that RankRLS has many computational advantages compared to the ranking algorithms that are based on minimizing other types of costs, such as the hinge cost. In particular, we present efficient algorithms for training, parameter selection, multiple output learning, cross-validation, and large-scale learning. Circumstances under which these computational benefits make RankRLS preferable to RankSVM are considered. We evaluate RankRLS on four different types of ranking tasks using RankSVM and the standard RLS regression as the baselines. RankRLS outperforms the standard RLS regression and its performance is very similar to that of RankSVM, while RankRLS has several computational benefits over RankSVM.
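The idea of a least-squares ranking cost can be illustrated with a small sketch. The abstract does not spell out the three cost functions, so the pairwise squared-error cost below, the linear (non-kernel) model, and the names `rank_rls_linear` and `solve_linear_system` are all assumptions made for illustration; this is not the authors' RankRLS implementation. The cost summed over all pairs, ((y_i - y_j) - (f(x_i) - f(x_j)))^2, plus a ridge penalty, has a closed-form minimizer, which is one reason such costs are computationally convenient.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system; try a larger regularization value")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def rank_rls_linear(X, y, lam=1.0):
    """Hypothetical linear pairwise least-squares ranker (illustrative only).

    Minimizes sum_{i<j} ((y_i - y_j) - (w.x_i - w.x_j))^2 + lam * ||w||^2.
    With L = n*I - 1*1^T the cost equals (y - Xw)^T L (y - Xw) + lam * w^T w,
    so w solves (X^T L X + lam * I) w = X^T L y.
    """
    n, d = len(X), len(X[0])
    col_sum = [sum(row[j] for row in X) for j in range(d)]
    y_sum = sum(y)
    # X^T L X = n * X^T X - s s^T, where s holds the column sums of X.
    A = [[n * sum(row[j] * row[k] for row in X) - col_sum[j] * col_sum[k]
          + (lam if j == k else 0.0) for k in range(d)] for j in range(d)]
    # X^T L y = n * X^T y - s * sum(y).
    b = [n * sum(row[j] * yi for row, yi in zip(X, y)) - col_sum[j] * y_sum
         for j in range(d)]
    return solve_linear_system(A, b)


# Toy data in which the relevance score equals the first feature.
X_toy = [[1, 0], [2, 1], [3, 0], [4, 1]]
y_toy = [1, 2, 3, 4]
w_toy = rank_rls_linear(X_toy, y_toy, lam=1e-9)
```

On the toy data the learned weight vector is close to (1, 0), so sorting items by the score w.x reproduces the order given by y.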

63 citations


Proceedings ArticleDOI
20 Jul 2009
TL;DR: It is shown that the interpretation of m-ary adjacency relations is the same as that of binary relations, so they can consistently be employed in social network analysis and some novel results can be derived.
Abstract: Adjacency relations for social network analysis have usually been tackled in their bidimensional form, in the sense that relations are computed over pairs of objects. Nevertheless, this paper considers the bidimensional case as restrictive and it proposes an approach where the dimension of the analysis is not limited to binary relations. With the aid of fuzzy logic and OWA operators, it is shown that the interpretation of m-ary adjacency relations is the same as that of binary relations, so they can consistently be employed in social network analysis and some novel results can be derived. Besides justifying the use of m-ary relations, the paper proposes a way to characterize them and, finally, provides the reader with an example section.

45 citations


Book ChapterDOI
01 Jan 2009
TL;DR: This chapter focuses on the core aspects of algebraic series, pushdown automata, and their relation to formal languages, and a presentation of their theory based on the concept of properness.
Abstract: We concentrate in this chapter on the core aspects of algebraic series, pushdown automata, and their relation to formal languages. We choose to follow here a presentation of their theory based on the concept of properness. We introduce in Sect. 2 some auxiliary notions and results needed throughout the chapter, in particular the notions of discrete convergence in semirings and C-cycle free infinite matrices. In Sect. 3 we introduce the algebraic power series in terms of algebraic systems of equations. We focus on interconnections with context-free grammars and on normal forms. We then conclude the section with a presentation of the theorems of Shamir and Chomsky–Schutzenberger. We discuss in Sect. 4 the algebraic and the regulated rational transductions, as well as some representation results related to them. Section 5 is dedicated to pushdown automata and focuses on the interconnections with classical (non-weighted) pushdown automata and on the interconnections with algebraic systems. We then conclude the chapter with a brief discussion of some of the other topics related to algebraic systems and pushdown automata.

45 citations


Journal ArticleDOI
TL;DR: This paper aims to demonstrate the efforts towards in-situ applicability of EMMARM, which aims to provide real-time information about the response of the immune system to EMTs.
Abstract: Work of the first author supported by a Discovery Grant from NSERC. Work of the second author supported by the Finnish Academy under grant 8206039.

35 citations


Journal ArticleDOI
TL;DR: A fully automatic brain segmentation method for T1-weighted images is proposed that uses watershed segmentation with Gaussian mixture model clustering for segmenting cerebrospinal fluid from brain matter and other head tissues.

30 citations


Journal ArticleDOI
TL;DR: An unsupervised method is introduced, based on a combination of hidden Markov models and latent semantic analysis, which allows the topics of interest to be defined freely, without the need for data annotation, and which can identify short segments.

30 citations


Journal ArticleDOI
TL;DR: It is proved that there exist infinitely many infinite overlap-free binary partial words containing at least one hole and it is shown that these words cannot contain more than one hole.

24 citations


Journal ArticleDOI
TL;DR: It is shown that the widely used normalisation constraint does not apply to the priority vectors associated with reciprocal relations, also called fuzzy preference relations, whenever additive transitivity is involved, and an alternative normalisation procedure is proposed which is compatible with additive transitivity and leads to better results.
Abstract: In this paper, we show that the widely used normalisation constraint does not apply to the priority vectors associated with reciprocal relations, also called fuzzy preference relations, whenever additive transitivity is involved. We show that misleading applications of this type of normalisation may lead to unsatisfactory results and we give some examples from the literature. Then, we propose an alternative normalisation procedure which is compatible with additive transitivity and leads to better results.
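To make the two properties named above concrete, here is a minimal sketch of how they can be checked numerically. The abstract does not state the definitions, so the sketch assumes the usual ones: a relation R on [0, 1] is reciprocal when r_ij + r_ji = 1, and additively transitive when (r_ij - 0.5) + (r_jk - 0.5) = (r_ik - 0.5) for all i, j, k. The function names and the example matrix are hypothetical.

```python
def is_reciprocal(R, tol=1e-9):
    """True if r_ij + r_ji = 1 for every pair (assumed definition)."""
    n = len(R)
    return all(abs(R[i][j] + R[j][i] - 1.0) <= tol
               for i in range(n) for j in range(n))


def is_additively_transitive(R, tol=1e-9):
    """True if (r_ij - 0.5) + (r_jk - 0.5) = (r_ik - 0.5) for all i, j, k."""
    n = len(R)
    return all(abs((R[i][j] - 0.5) + (R[j][k] - 0.5) - (R[i][k] - 0.5)) <= tol
               for i in range(n) for j in range(n) for k in range(n))


# Illustrative relation built as r_ij = 0.5 + 0.5 * (u_i - u_j) from a
# made-up utility vector u; relations of this form satisfy both properties.
u = [0.5, 0.3, 0.2]
R_example = [[0.5 + 0.5 * (ui - uj) for uj in u] for ui in u]

# Perturbing one entry (and its mirror) keeps reciprocity but breaks transitivity.
R_broken = [row[:] for row in R_example]
R_broken[0][2], R_broken[2][0] = 0.9, 0.1
```

Here `R_example` passes both checks, while `R_broken` is still reciprocal but no longer additively transitive.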

24 citations


Journal ArticleDOI
TL;DR: It is shown that the (infinite) tiling problem by Wang tiles is undecidable even if the given tile set is deterministic by all four corners, i.e. a tile is uniquely determined by the colors of any two adjacent edges.
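The determinism condition described above can be tested mechanically for a finite tile set. The sketch below assumes a tile is encoded as a tuple of edge colors `(north, east, south, west)`; that encoding and the function names are illustrative choices, not taken from the paper.

```python
# Index of each edge inside a tile tuple (north, east, south, west).
NORTH, EAST, SOUTH, WEST = 0, 1, 2, 3

# The four corners, each given as the pair of adjacent edges meeting there.
CORNERS = [(NORTH, EAST), (EAST, SOUTH), (SOUTH, WEST), (WEST, NORTH)]


def is_deterministic_by_corner(tiles, edge_a, edge_b):
    """True if no two distinct tiles share the colors of both given edges."""
    seen = set()
    for tile in set(tiles):  # ignore exact duplicate tiles
        key = (tile[edge_a], tile[edge_b])
        if key in seen:
            return False
        seen.add(key)
    return True


def is_four_way_deterministic(tiles):
    """True if the tile set is deterministic by all four corners."""
    return all(is_deterministic_by_corner(tiles, a, b) for a, b in CORNERS)


tiles_ok = [(0, 0, 1, 1), (1, 1, 0, 0)]
tiles_bad = tiles_ok + [(0, 0, 0, 1)]  # shares its north and east colors with the first tile
```

With these toy sets, `tiles_ok` is deterministic by all four corners and `tiles_bad` is not, because two of its tiles agree on the north and east colors.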

Journal ArticleDOI
TL;DR: A formal grammar and a parser are developed for ICU Finnish, thus providing better tools for the development of further applications in the clinical domain and enabling a deeper analysis of the text than was previously possible.

Book ChapterDOI
03 Mar 2009
TL;DR: This paper presents a model-based testing approach based on user-provided testing scenarios that can be used to test different features of a system such as incorporated fault-tolerance mechanisms.
Abstract: In this paper, we present a model-based testing approach based on user-provided testing scenarios. In this approach, when a software model is refined to add or modify features, the corresponding testing scenarios are automatically refined to incorporate these changes. The test cases, to be applied on the system under test, are generated from these scenarios. We use the Event-B formalism for software models, while user scenarios are represented as Communicating Sequential Processes (CSP) expressions. The presented case study demonstrates how our approach can be used to test different features of a system such as incorporated fault-tolerance mechanisms.

Journal ArticleDOI
TL;DR: It is shown that every minimal palindromic word is abelian unbordered, that is, no proper suffix of the word can be obtained by permuting the letters of a proper prefix.
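The definition quoted above translates directly into a brute-force check. This sketch reads "proper" as nonempty and shorter than the whole word, and compares prefixes and suffixes of equal length; the function name is a hypothetical one chosen for illustration.

```python
from collections import Counter


def is_abelian_unbordered(word):
    """True if no proper suffix is a permutation of the proper prefix of the same length."""
    for k in range(1, len(word)):
        # Two words are permutations of each other iff their letter counts agree.
        if Counter(word[:k]) == Counter(word[-k:]):
            return False
    return True
```

For example, `is_abelian_unbordered("abb")` holds, while `"abab"` fails because the prefix `ab` and the suffix `ab` have the same letter counts.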


Journal Article
TL;DR: It is proved that there are recurrent words with ultimately constant complexity c for every c, whereas recurrent words with constant abelian complexity three are known to exist but none with constant complexity four.
Abstract: It is known that there are recurrent words with constant abelian complexity three, but not with constant complexity four. We prove that there are recurrent words with ultimately constant complexity c for every c.
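The abstract does not define abelian complexity; the sketch below assumes the standard notion, in which the abelian complexity of a word at length n is the number of distinct letter-count vectors (Parikh vectors) among its factors of length n. It works on a finite word only, so for an infinite word it gives a lower bound computed from a prefix. The function name is hypothetical.

```python
from collections import Counter


def abelian_complexity(word, n):
    """Number of distinct letter-count vectors among the length-n factors of a finite word."""
    classes = set()
    for start in range(len(word) - n + 1):
        counts = Counter(word[start:start + n])
        # A sorted tuple of (letter, count) pairs is a hashable stand-in for the Parikh vector.
        classes.add(tuple(sorted(counts.items())))
    return len(classes)
```

For instance, the factors of length 2 of `0110` are `01`, `11` and `10`; the first and last are permutations of each other, so `abelian_complexity("0110", 2)` is 2.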

Journal ArticleDOI
TL;DR: A framework based on a word-position matrix representation of text, linear feature transformations of the word-position matrices, and kernel functions constructed from the transformations is introduced, together with a highly efficient method for prediction.
Abstract: In the application of machine learning methods with natural language inputs, the words and their positions in the input text are some of the most important features. In this article, we introduce a framework based on a word-position matrix representation of text, linear feature transformations of the word-position matrices, and kernel functions constructed from the transformations. We consider two categories of transformations, one based on word similarities and the second on their positions, which can be applied simultaneously in the framework in an elegant way. We show how word and positional similarities obtained by applying previously proposed techniques, such as latent semantic analysis, can be incorporated as transformations in the framework. We also introduce novel ways to determine word and positional similarities. We further present efficient algorithms for computing kernel functions incorporating the transformations on the word-position matrices, and, more importantly, introduce a highly efficient method for prediction. The framework is particularly suitable to natural language disambiguation tasks where the aim is to select for a single word a particular property from a set of candidates based on the context of the word. We demonstrate the applicability of the framework to this type of task using context-sensitive spelling error correction on the Reuters News corpus as a model problem.

Journal ArticleDOI
TL;DR: A procedure to search for prime divisors of class numbers of real abelian fields is given, along with a table of odd primes < 10000 not dividing the degree that divide the class numbers of fields of conductor < 2000.
Abstract: In this paper we give a procedure to search for prime divisors of class numbers of real abelian fields and present a table of odd primes < 10000 not dividing the degree that divide the class numbers of fields of conductor < 2000. Cohen-Lenstra heuristics allow us to conjecture that no larger prime divisors should exist. Previous computations have been largely limited to prime power conductors.

Book ChapterDOI
01 Jan 2009
TL;DR: Using the fact that the tiling problem of Wang tiles is undecidable even if the given tile set is deterministic by two opposite corners, it is shown that the question of whether there exists a trajectory which belongs to a given open and closed set is undecidable for one-dimensional reversible cellular automata.
Abstract: Using the fact that the tiling problem of Wang tiles is undecidable even if the given tile set is deterministic by two opposite corners, it is shown that the question whether there exists a trajectory which belongs to the given open and closed set is undecidable for one-dimensional reversible cellular automata. This result holds even if the cellular automaton is mixing. Furthermore, it is shown that left expansivity of a reversible cellular automaton is an undecidable property. Also, the tile set construction gives yet another proof for the universality of one-dimensional reversible cellular automata.

Proceedings ArticleDOI
09 Jul 2009
TL;DR: The analysis of the proposed architectures and algorithms shows that some of them improve the fault tolerance of NoC with a reasonable overhead by decreasing the average hop counts and keeping the cores connectable even in the case of faults.
Abstract: The topology level fault tolerance of Network-on-Chip (NoC) can be improved with multi network interface (multi-NI) architectures. Multi-NI NoC architectures are based on connecting at least two network interfaces on each core. The aim is to improve fault tolerance on the architectural level, which means the delivery of packets even when there are faulty links or routers in the network. This paper presents architectures and algorithms for multi-NI NoCs. The analysis of the proposed architectures and algorithms shows that some of them improve the fault tolerance of NoC with a reasonable overhead by decreasing the average hop counts and keeping the cores connectable even in the case of faults. With a multi-NI architecture the number of successfully delivered packets has even doubled.

Journal ArticleDOI
TL;DR: Questions closely related to the open problem whether M_{t+r}(n+m)==r+3 when r>=1 and t=1 are considered, and constructions for the best known 1-identifying codes of certain lengths are given.

Journal Article
TL;DR: In this paper, the authors consider properties of the solution set of a word equation with one unknown and show that an equation with at most four occurrences of the unknown possesses either infinitely many solutions or at most two solutions.
Abstract: We consider properties of the solution set of a word equation with one unknown. We prove that the solution set of a word equation possessing an infinite number of solutions is of the form (pq)*p where pq is primitive. Next, we prove that a word equation with at most four occurrences of the unknown possesses either infinitely many solutions or at most two solutions. We show that there are equations with at most four occurrences of the unknown possessing exactly two solutions. Finally, we prove that a word equation with at most 2k occurrences of the unknown possesses either infinitely many solutions or at most 8 log k + O(1) solutions. Hence, if we consider a class E_k of equations with at most 2k occurrences of the unknown, then each equation in this class possesses either infinitely many solutions or O(log k) solutions. Our considerations allow us to construct the first alphabet-independent linear-time algorithm for computing the solution set of an equation in a nontrivial class of equations.


Journal ArticleDOI
TL;DR: This paper introduces a new consistency evaluation method and proposes to use it in group decision making problems in order to fairly weigh the decision makers' preferences according to their consistency.
Abstract: In decision-making processes, it often occurs that the decision maker is asked to pairwise compare alternatives. His/her judgements over a set of pairs of alternatives can be collected into a matrix and some relevant properties, for instance, consistency, can be estimated. Consistency is a desirable property which implies that all the pairwise comparisons respect a principle of transitivity. So far, many indices have been proposed to estimate consistency. Nevertheless, in this paper we argue that most of these indices do not fairly evaluate this property. Then, we introduce a new consistency evaluation method and we propose to use it in group decision making problems in order to fairly weigh the decision makers' preferences according to their consistency. In our analysis, we consider two families of pairwise comparison matrices: additively reciprocal pairwise comparison matrices and multiplicatively reciprocal pairwise comparison matrices.
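As an illustration of what "respecting a principle of transitivity" means for the multiplicative family, the sketch below checks the usual conditions, which are assumed here because the abstract does not state them: a_ij * a_ji = 1 (reciprocity) and a_ij * a_jk = a_ik (consistency). It is a plain yes/no check and does not reproduce the consistency evaluation method proposed in the paper; the function names and matrices are hypothetical.

```python
import math


def is_multiplicatively_reciprocal(A, rel_tol=1e-9):
    """True if a_ij * a_ji = 1 for every pair (assumed definition)."""
    n = len(A)
    return all(math.isclose(A[i][j] * A[j][i], 1.0, rel_tol=rel_tol)
               for i in range(n) for j in range(n))


def is_multiplicatively_consistent(A, rel_tol=1e-9):
    """True if a_ij * a_jk = a_ik for all i, j, k (assumed definition)."""
    n = len(A)
    return all(math.isclose(A[i][j] * A[j][k], A[i][k], rel_tol=rel_tol)
               for i in range(n) for j in range(n) for k in range(n))


# A consistent matrix can be built from a made-up weight vector as a_ij = w_i / w_j.
weights = [4.0, 2.0, 1.0]
A_consistent = [[wi / wj for wj in weights] for wi in weights]

# Changing one judgement (and its reciprocal) keeps reciprocity but breaks consistency.
A_inconsistent = [row[:] for row in A_consistent]
A_inconsistent[0][2], A_inconsistent[2][0] = 8.0, 1.0 / 8.0
```

The matrix derived from the weights (4, 2, 1) is consistent, while the perturbed one is reciprocal but inconsistent because a_01 * a_12 = 4 no longer equals a_02 = 8.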

Book ChapterDOI
01 Jul 2009
TL;DR: Hmelevskii's theorem is analyzed, which states that the general solutions of constant-free equations on three unknowns are expressible by a finite collection of formulas of word and numerical parameters, and it is proved that the size of the finite representation is bounded by an exponential function on the size of the equation.
Abstract: We analyze Hmelevskii's theorem, which states that the general solutions of constant-free equations on three unknowns are expressible by a finite collection of formulas of word and numerical parameters. We prove that the size of the finite representation is bounded by an exponential function on the size of the equation. We also prove that the shortest nontrivial solution of the equation, if it exists, is exponential, and that its existence can be solved in nondeterministic polynomial time.

Journal ArticleDOI
TL;DR: The results show that Nodularia strains can be distinguished from the eukaryotes by applying a pattern recognition procedure to the fluorescence induction curves, suggesting that the fluorescence fingerprinting technique might be useful in environmental monitoring of marine algae.

Proceedings ArticleDOI
09 Jul 2009
TL;DR: The developed fully adaptive algorithm is shown to provide superior fault tolerance; the best deadlock-free algorithm is the variation based on the north-last turn model, and among minimal algorithms the dynamic xy algorithm provides the highest fault tolerance.
Abstract: The reliability of networks-on-chip can be increased by developing routing algorithms that provide fault tolerance. Distributed routing algorithms are a class of algorithms where the routing decisions are made without knowledge of the global state of the network. The paper presents a distributed routing algorithm which targets maximal fault tolerance. A total of 11 different algorithms are analyzed for fault tolerance and average hop count using a dedicated in-house C++ simulator. Five of the algorithms are variations of the proposed algorithm and the rest are references or algorithms presented in the literature. The developed fully adaptive algorithm is shown to provide superior fault tolerance. The best deadlock-free algorithm is the variation based on the north-last turn model. In the class of minimal algorithms the dynamic xy algorithm provides the highest fault tolerance.


Journal ArticleDOI
TL;DR: Two new algorithms for determining fetch lengths for study points in the same directions are presented, assuming that the two-dimensional map is stored in vector format, i.e. shorelines of islands and mainland are stored as polygons.
Abstract: Distances from points to closest shorelines in a given direction are used, for example, in some models for estimating wave exposure. Such distances, also called fetch lengths, can be determined using standard geographic information systems. However, performance may be a problem if these distances are required for a great number of study points. Two new algorithms for determining fetch lengths for study points in the same directions are presented in this paper. It is assumed that the two-dimensional map is stored in vector format, i.e. shorelines of islands and mainland are stored as polygons. The first algorithm works on a set of undirected line segments derived from the shoreline polygons. The other works on a raster representation of the map. The algorithm saves memory by postponing the rasterisation until necessary. Both of the new algorithms have superior efficiency to a previously reported algorithm when the number of study points is large.
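For orientation, the quantity being computed can be sketched with a naive baseline: cast a ray from the study point and take the nearest intersection with any shoreline segment. This brute-force loop is not either of the two algorithms presented in the paper; the function name, the segment representation, and the angle convention (degrees counterclockwise from the positive x axis) are assumptions made for illustration.

```python
import math


def fetch_length(point, direction_deg, segments):
    """Distance from `point` to the nearest shoreline segment along a ray.

    `segments` is a list of ((ax, ay), (bx, by)) pairs. Returns math.inf
    when the ray hits no segment (open water in that direction).
    """
    px, py = point
    angle = math.radians(direction_deg)
    dx, dy = math.cos(angle), math.sin(angle)  # unit direction of the ray
    best = math.inf
    for (ax, ay), (bx, by) in segments:
        vx, vy = bx - ax, by - ay              # segment direction
        wx, wy = ax - px, ay - py              # from ray origin to segment start
        denom = dx * vy - dy * vx              # cross(d, v)
        if abs(denom) < 1e-12:
            continue                           # ray parallel to the segment
        t = (wx * vy - wy * vx) / denom        # distance along the ray
        u = (wx * dy - wy * dx) / denom        # position along the segment, 0..1
        if t >= 0.0 and 0.0 <= u <= 1.0:
            best = min(best, t)
    return best


# Toy shoreline: two vertical segments east of the origin, one off to the north-east.
shoreline = [((5, -1), (5, 1)), ((8, -1), (8, 1)), ((3, 2), (3, 4))]
east_fetch = fetch_length((0, 0), 0.0, shoreline)
west_fetch = fetch_length((0, 0), 180.0, shoreline)
```

Looking east from the origin the ray first meets the segment at x = 5, so the fetch is 5; looking west nothing is hit and the result is infinite. The cost of this baseline grows with the number of study points times the number of segments, which is the performance problem the abstract refers to.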

Journal ArticleDOI
TL;DR: This paper analyzes the energy consumption, execution efficiency, and speed issues of Java applications in a typical consumer mobile device environment and introduces a Java accelerator with a companion Java virtual machine.