Author

Esko Ukkonen

Other affiliations: Max Planck Society, Helsinki Institute for Information Technology

Bio: Esko Ukkonen is an academic researcher at the University of Helsinki. The author has contributed to research on the topics String searching algorithm and String (computer science). The author has an h-index of 45 and has co-authored 200 publications that have received 11,846 citations. Previous affiliations of Esko Ukkonen include the Max Planck Society and the Helsinki Institute for Information Technology.

Papers published on a yearly basis: 1976 to 2020

Papers

Journal Article

On-line construction of suffix trees

Esko Ukkonen¹

University of Helsinki¹

01 Sep 1995-Algorithmica

TL;DR: An on-line algorithm is presented for constructing the suffix tree for a given string in time linear in the length of the string, developed as a linear-time version of a very simple algorithm for (quadratic size) suffix tries.

Abstract: An on-line algorithm is presented for constructing the suffix tree for a given string in time linear in the length of the string. The new algorithm has the desirable property of processing the string symbol by symbol from left to right. It always has the suffix tree for the scanned part of the string ready. The method is developed as a linear-time version of a very simple algorithm for (quadratic size) suffix tries. Regardless of its quadratic worst case, this latter algorithm can be a good practical method when the string is not too long. Another variation of this method is shown to give, in a natural way, the well-known algorithms for constructing suffix automata (DAWGs).
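
To make the "symbol by symbol from left to right" idea concrete, here is a minimal Python sketch of an on-line suffix trie, i.e. the simple quadratic-size structure the abstract mentions, not the linear-time suffix tree algorithm itself. The names `Node`, `OnlineSuffixTrie`, `extend` and `contains` are hypothetical and chosen for illustration; the suffix-link bookkeeping is an assumption about how such a trie is typically maintained.

```python
class Node:
    """One trie state: outgoing edges plus a suffix link."""

    def __init__(self):
        self.children = {}       # symbol -> child Node
        self.suffix_link = None  # stays None only for the root


class OnlineSuffixTrie:
    """Suffix trie that is complete for the prefix scanned so far."""

    def __init__(self):
        self.root = Node()
        self.top = self.root  # node spelling the whole scanned prefix
        self.size = 1         # number of nodes, root included

    def extend(self, ch):
        """Append one symbol and update the trie."""
        r = self.top
        prev_new = None
        # Walk the suffix links from the longest suffix downwards,
        # adding a ch-edge wherever one is missing.
        while r is not None and ch not in r.children:
            new = Node()
            self.size += 1
            r.children[ch] = new
            if prev_new is not None:
                prev_new.suffix_link = new
            prev_new = new
            r = r.suffix_link
        # Link the last new node to the node for the next shorter suffix.
        if prev_new is not None:
            prev_new.suffix_link = self.root if r is None else r.children[ch]
        self.top = self.top.children[ch]

    def contains(self, pattern):
        """Return True if pattern is a substring of the scanned prefix."""
        node = self.root
        for ch in pattern:
            node = node.children.get(ch)
            if node is None:
                return False
        return True


trie = OnlineSuffixTrie()
for symbol in "cacao":
    trie.extend(symbol)
print(trie.contains("aca"), trie.contains("oc"))
```

After every call to `extend` the trie answers substring queries for the text read so far. The node count can grow quadratically with the text length, which is the cost the suffix tree construction is designed to avoid.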

1,528 citations

Journal Article

DNA-Binding Specificities of Human Transcription Factors

Arttu Jolma¹, Jian Yan¹, Tom Whitington¹, Jarkko Toivonen², Kazuhiro R. Nitta¹, Pasi Rastas², Ekaterina Morgunova¹, Martin Enge¹, Mikko Taipale², Gong-Hong Wei², Kimmo Palin², Juan M. Vaquerizas³, Renaud Vincentelli⁴, Nicholas M. Luscombe³, Timothy P. Hughes⁵, Patrick Lemaire, Esko Ukkonen², Teemu Kivioja¹, Teemu Kivioja², Jussi Taipale², Jussi Taipale¹

Karolinska Institutet¹, University of Helsinki², European Bioinformatics Institute³, Aix-Marseille University⁴, University of Toronto⁵

17 Jan 2013-Cell

TL;DR: Global analysis of the data revealed that homodimer orientation and spacing preferences, and base-stacking interactions, have a larger role in TF-DNA binding than previously appreciated.

1,140 citations

Journal Article

Algorithms for approximate string matching

Esko Ukkonen¹

University of Helsinki¹

01 Mar 1985-Information & Computation

TL;DR: An improved algorithm is developed that works in time and in space $O(s \cdot \min(m, n))$, and conditions are given under which the algorithms can be used in conjunction with extended edit operation sets, including, for example, transposition of adjacent characters.

Abstract: The edit distance between strings $a_1 \ldots a_m$ and $b_1 \ldots b_n$ is the minimum cost $s$ of a sequence of editing steps (insertions, deletions, changes) that convert one string into the other. A well-known tabulating method computes $s$ as well as the corresponding editing sequence in time and in space $O(mn)$ (in space $O(\min(m, n))$ if the editing sequence is not required). Starting from this method, we develop an improved algorithm that works in time and in space $O(s \cdot \min(m, n))$. Another improvement with time $O(s \cdot \min(m, n))$ and space $O(s \cdot \min(s, m, n))$ is given for the special case where all editing steps have the same cost independently of the characters involved. If the editing sequence that gives cost $s$ is not required, our algorithms can be implemented in space $O(\min(s, m, n))$. Since $s = O(\max(m, n))$, the new methods are always asymptotically as good as the original tabulating method. As a by-product, algorithms are obtained that, given a threshold value $t$, test in time $O(t \cdot \min(m, n))$ and in space $O(\min(t, m, n))$ whether $s \leq t$. Finally, different generalized edit distances are analyzed and conditions are given under which our algorithms can be used in conjunction with extended edit operation sets, including, for example, transposition of adjacent characters.
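
The threshold test described in the abstract can be illustrated with a band-restricted version of the tabulating method. The Python sketch below assumes unit costs for all editing steps and only fills table cells within `t` diagonals of the main diagonal. It is a simplified illustration of the general idea, not a reproduction of the paper's algorithms, and the function name `within_edit_distance` is hypothetical.

```python
def within_edit_distance(a: str, b: str, t: int) -> bool:
    """Return True if the unit-cost edit distance of a and b is at most t."""
    if len(a) > len(b):
        a, b = b, a              # iterate rows over the shorter string
    m, n = len(a), len(b)
    if n - m > t:
        return False             # the length gap alone exceeds the threshold
    inf = t + 1                  # any value above t is treated as "too far"
    width = 2 * t + 1
    # Band index k stands for column j = i + k - t in row i.
    prev = [inf] * width
    for k in range(width):
        j = k - t
        if 0 <= j <= n:
            prev[k] = j          # row 0: j insertions
    for i in range(1, m + 1):
        cur = [inf] * width
        for k in range(width):
            j = i + k - t
            if j < 0 or j > n:
                continue
            if j == 0:
                cur[k] = i       # column 0: i deletions
                continue
            cost = 0 if a[i - 1] == b[j - 1] else 1
            best = prev[k] + cost                  # change or match
            if k + 1 < width:
                best = min(best, prev[k + 1] + 1)  # deletion from a
            if k > 0:
                best = min(best, cur[k - 1] + 1)   # insertion into a
            cur[k] = min(best, inf)
        prev = cur
    return prev[n - m + t] <= t


print(within_edit_distance("kitten", "sitting", 3))
print(within_edit_distance("kitten", "sitting", 2))
```

Each row touches at most $2t + 1$ cells and there are $\min(m, n)$ rows, so the sketch runs in time $O(t \cdot \min(m, n))$ and keeps two band-sized rows in memory.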

672 citations

Journal Article

Approximate string-matching with q-grams and maximal matches

Esko Ukkonen¹

University of Helsinki¹

06 Jan 1992

TL;DR: Two string distance functions that are computable in linear time give a lower bound for the edit distance (in the unit cost model), which leads to fast hybrid algorithms for the edit distance based string matching.

Abstract: We study approximate string matching in connection with two string distance functions that are computable in linear time. The first function is based on the so-called $q$-grams. An algorithm is given for the associated string matching problem that finds the locally best approximate occurrences of pattern $P$, $|P|=m$, in text $T$, $|T|=n$, in time $O(n\log (m-q))$. The occurrences with distance $\leq k$ can be found in time $O(n\log k)$. The other distance function is based on finding maximal common substrings and allows a form of approximate string matching in time $O(n)$. Both distances give a lower bound for the edit distance (in the unit cost model), which leads to fast hybrid algorithms for the edit distance based string matching.
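
As an illustration of a $q$-gram based distance, the Python sketch below assumes the common definition in which each string is summarized by the counts of its length-$q$ substrings and the distance is the sum of absolute count differences. The abstract does not spell out the definition, so this is an assumption, and the names `qgram_profile` and `qgram_distance` are hypothetical.

```python
from collections import Counter


def qgram_profile(s: str, q: int) -> Counter:
    """Count every substring of length q in s (empty if s is shorter than q)."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def qgram_distance(x: str, y: str, q: int) -> int:
    """Sum of absolute differences between the q-gram counts of x and y."""
    px, py = qgram_profile(x, q), qgram_profile(y, q)
    return sum(abs(px[g] - py[g]) for g in px.keys() | py.keys())


print(qgram_distance("abc", "abd", 2))
```

Under this definition a single unit-cost editing step changes at most $q$ of the $q$-grams in a string, so the profile distance moves by at most $2q$ per step. Dividing the $q$-gram distance by $2q$ therefore gives a lower bound on the edit distance, which is what makes it usable as a cheap filter before an exact computation.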

665 citations

Journal Article

Genome-Wide Analysis of ETS-Family DNA-Binding In Vitro and In Vivo

Gong-Hong Wei¹, Gwenael Badis², Michael F. Berger, Teemu Kivioja¹, Teemu Kivioja³, Kimmo Palin³, Martin Enge⁴, Martin Bonke¹, Arttu Jolma¹, Markku Varjosalo¹, Andrew R. Gehrke, Jian Yan¹, Shaheynoor Talukder², Mikko P. Turunen¹, Mikko Taipale¹, Hendrik G. Stunnenberg⁵, Esko Ukkonen³, Timothy P. Hughes², Martha L. Bulyk, Jussi Taipale⁴, Jussi Taipale¹

National Institutes of Health¹, University of Toronto², University of Helsinki³, Karolinska Institutet⁴, Radboud University Nijmegen⁵

07 Jul 2010-The EMBO Journal

TL;DR: The results indicate that even relatively small differences in in vitro binding specificity of a TF contribute to site selectivity in vivo, and this work identifies amino‐acid residues that are critical for the differences in specificity between all the classes.

Abstract: Members of the large ETS family of transcription factors (TFs) have highly similar DNA-binding domains (DBDs)—yet they have diverse functions and activities in physiology and oncogenesis. Some differences in DNA-binding preferences within this family have been described, but they have not been analysed systematically, and their contributions to targeting remain largely uncharacterized. We report here the DNA-binding profiles for all human and mouse ETS factors, which we generated using two different methods: a high-throughput microwell-based TF DNA-binding specificity assay, and protein-binding microarrays (PBMs). Both approaches reveal that the ETS-binding profiles cluster into four distinct classes, and that all ETS factors linked to cancer, ERG, ETV1, ETV4 and FLI1, fall into just one of these classes. We identify amino-acid residues that are critical for the differences in specificity between all the classes, and confirm the specificities in vivo using chromatin immunoprecipitation followed by sequencing (ChIP-seq) for a member of each class. The results indicate that even relatively small differences in in vitro binding specificity of a TF contribute to site selectivity in vivo.

527 citations

Cited by

Journal Article

DnaSP v5

Pablo Librado¹, Julio Rozas¹

University of Barcelona¹

01 Jun 2009-Bioinformatics

TL;DR: Version 5 implements a number of new features and analytical methods allowing extensive DNA polymorphism analyses on large datasets, including visualizing sliding window results integrated with available genome annotations in the UCSC browser.

Abstract: Motivation: DnaSP is a software package for a comprehensive analysis of DNA polymorphism data. Version 5 implements a number of new features and analytical methods allowing extensive DNA polymorphism analyses on large datasets. Among other features, the newly implemented methods allow for: (i) analyses on multiple data files; (ii) haplotype phasing; (iii) analyses on insertion/deletion polymorphism data; (iv) visualizing sliding window results integrated with available genome annotations in the UCSC browser. Availability: Freely available to academic users from: http://www.ub.edu/dnasp Contact: [email protected]

13,511 citations

SPAdes, a new genome assembly algorithm and its applications to single-cell sequencing (7th Annual SFAF Meeting, 2012)

Glenn Tesler

01 Jun 2012

TL;DR: SPAdes is a new assembler for both single-cell and standard (multicell) assembly that is shown to improve on the recently released E+V-SC assembler and on the popular assemblers Velvet and SoapDeNovo (for multicell data).

Abstract: The lion's share of bacteria in various environments cannot be cloned in the laboratory and thus cannot be sequenced using existing technologies. A major goal of single-cell genomics is to complement gene-centric metagenomic data with whole-genome assemblies of uncultivated organisms. Assembly of single-cell data is challenging because of highly non-uniform read coverage as well as elevated levels of sequencing errors and chimeric reads. We describe SPAdes, a new assembler for both single-cell and standard (multicell) assembly, and demonstrate that it improves on the recently released E+V-SC assembler (specialized for single-cell data) and on popular assemblers Velvet and SoapDeNovo (for multicell data). SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies. SPAdes is available online ( http://bioinf.spbau.ru/spades ). It is distributed as open source software.

10,124 citations

Journal Article

Robust enumeration of cell subsets from tissue expression profiles

Aaron M. Newman¹, Chih Long Liu¹, Michael R. Green¹, Andrew J. Gentles¹, Weiguo Feng¹, Yue Xu¹, Chuong D. Hoang¹, Maximilian Diehn¹, Arash Ash Alizadeh¹

Stanford University¹

01 May 2015-Nature Methods

TL;DR: CIBERSORT outperformed other methods with respect to noise, unknown mixture content and closely related cell types when applied to enumeration of hematopoietic subsets in RNA mixtures from fresh, frozen and fixed tissues, including solid tumors.

Abstract: We introduce CIBERSORT, a method for characterizing cell composition of complex tissues from their gene expression profiles. When applied to enumeration of hematopoietic subsets in RNA mixtures from fresh, frozen and fixed tissues, including solid tumors, CIBERSORT outperformed other methods with respect to noise, unknown mixture content and closely related cell types. CIBERSORT should enable large-scale analysis of RNA mixtures for cellular biomarkers and therapeutic targets (http://cibersort.stanford.edu/).

6,967 citations

Journal Article

CAP3: A DNA Sequence Assembly Program

Xiaoqiu Huang¹, Anup Madan

Michigan Technological University¹

01 Sep 1999-Genome Research

TL;DR: The third generation of the CAP sequence assembly program is described, which has a capability to clip 5' and 3' low-quality regions of reads and uses forward-reverse constraints to correct assembly errors and link contigs.

Abstract: The shotgun sequencing strategy has been used widely in genome sequencing projects. A major phase in this strategy is to assemble short reads into long sequences. A number of DNA sequence assembly programs have been developed (Staden 1980; Peltola et al. 1984; Huang 1992; Smith et al. 1993; Gleizes and Henaut 1994; Lawrence et al. 1994; Kececioglu and Myers 1995; Sutton et al. 1995; Green 1996). The FAKII program provides a library of routines for each phase of the assembly process (Larson et al. 1996). The GAP4 program has a number of useful interactive features (Bonfield et al. 1995). The PHRAP program clips 5′ and 3′ low-quality regions of reads and uses base quality values in evaluation of overlaps and generation of contig sequences (Green 1996). TIGR Assembler has been used in a number of megabase microbial genome projects (Sutton et al. 1995). Continued development and improvement of sequence assembly programs are required to meet the challenges of the human, mouse, and maize genome projects. We have developed the third generation of the CAP sequence assembly program (Huang 1992). The CAP3 program includes a number of improvements and new features. A capability to clip 5′ and 3′ low-quality regions of reads is included in the CAP3 program. Base quality values produced by PHRED (Ewing et al. 1998) are used in computation of overlaps between reads, construction of multiple sequence alignments of reads, and generation of consensus sequences. Efficient algorithms are employed to identify and compute overlaps between reads. Forward–reverse constraints are used to correct assembly errors and link contigs. Results of CAP3 on four BAC data sets are presented. The performance of CAP3 was compared with that of PHRAP on a number of BAC data sets. PHRAP often produces longer contigs than CAP3 whereas CAP3 often produces fewer errors in consensus sequences than PHRAP. 
It is easier to construct scaffolds with CAP3 than with PHRAP on low-pass data with forward–reverse constraints. An unusual feature of CAP3 is the use of forward–reverse constraints in the construction of contigs. A forward–reverse constraint is often produced by sequencing of both ends of a subclone. A forward–reverse constraint specifies that the two reads should be on the opposite strands of the DNA molecule within a specified range of distance. By sequencing both ends of each subclone, a large number of forward–reverse constraints are produced for a cosmid or BAC data set. A difficulty with use of forward–reverse constraints in assembly is that some of the forward–reverse constraints are incorrect because of errors in lane tracking and cloning. Our strategy for dealing with this difficulty is based on the observation that a majority of the constraints are correct and wrong constraints usually occur randomly. Thus, a few unsatisfied constraints in a contig may not be sufficient to indicate an assembly error in the contig. However, if a sufficient number of constraints are all inconsistent with a join in a contig and all support an alternative join, it is likely that the current join is an error, and the alternative join should be made.

5,074 citations

Journal Article

Integrative analysis of 111 reference human epigenomes

Anshul Kundaje¹, Wouter Meuleman¹, Wouter Meuleman², Jason Ernst³, Misha Bilenky⁴, Angela Yen², Angela Yen¹, Alireza Heravi-Moussavi⁴, Pouya Kheradpour¹, Pouya Kheradpour², Zhizhuo Zhang², Zhizhuo Zhang¹, Jianrong Wang¹, Jianrong Wang², Michael J. Ziller², Viren Amin⁵, John W. Whitaker, Matthew D. Schultz⁶, Lucas D. Ward², Lucas D. Ward¹, Abhishek Sarkar², Abhishek Sarkar¹, Gerald Quon¹, Gerald Quon², Richard Sandstrom⁷, Matthew L. Eaton¹, Matthew L. Eaton², Yi-Chieh Wu¹, Yi-Chieh Wu², Andreas R. Pfenning¹, Andreas R. Pfenning², Xinchen Wang², Xinchen Wang¹, Melina Claussnitzer², Melina Claussnitzer¹, Yaping Liu², Yaping Liu¹, Cristian Coarfa⁵, R. Alan Harris⁵, Noam Shoresh², Charles B. Epstein², Elizabeta Gjoneska¹, Elizabeta Gjoneska², Danny Leung⁸, Wei Xie⁸, R. David Hawkins⁸, Ryan Lister⁶, Chibo Hong⁹, Philippe Gascard⁹, Andrew J. Mungall⁴, Richard A. Moore⁴, Eric Chuah⁴, Angela Tam⁴, Theresa K. Canfield⁷, R. Scott Hansen⁷, Rajinder Kaul⁷, Peter J. Sabo⁷, Mukul S. Bansal¹, Mukul S. Bansal², Mukul S. Bansal¹⁰, Annaick Carles⁴, Jesse R. Dixon⁸, Kai How Farh², Soheil Feizi², Soheil Feizi¹, Rosa Karlic¹¹, Ah Ram Kim², Ah Ram Kim¹, Ashwinikumar Kulkarni¹², Daofeng Li¹³, Rebecca F. Lowdon¹³, Ginell Elliott¹³, Tim R. Mercer¹⁴, Shane Neph⁷, Vitor Onuchic⁵, Paz Polak¹⁵, Paz Polak², Nisha Rajagopal⁸, Pradipta R. Ray¹², Richard C Sallari², Richard C Sallari¹, Kyle Siebenthall⁷, Nicholas A Sinnott-Armstrong¹, Nicholas A Sinnott-Armstrong², Michael Stevens¹³, Robert E. Thurman⁷, Jie Wu¹⁶, Bo Zhang¹³, Xin Zhou¹³, Arthur E. Beaudet⁵, Laurie A. Boyer¹, Philip L. De Jager², Philip L. De Jager¹⁵, Peggy J. Farnham¹⁷, Susan J. Fisher⁹, David Haussler¹⁸, Steven J.M. Jones⁴, Steven J.M. Jones¹⁹, Wei Li⁵, Marco A. Marra⁴, Michael T. McManus⁹, Shamil R. Sunyaev², Shamil R. Sunyaev¹⁵, James A. Thomson²⁰, Thea D. Tlsty⁹, Li-Huei Tsai², Li-Huei Tsai¹, Wei Wang, Robert A. Waterland⁵, Michael Q. Zhang²¹, Lisa Helbling Chadwick²², Bradley E. Bernstein⁶, Bradley E. Bernstein², Bradley E. Bernstein¹⁵, Joseph F. Costello⁹, Joseph R. Ecker¹¹, Martin Hirst⁴, Alexander Meissner², Aleksandar Milosavljevic⁵, Bing Ren⁸, John A. Stamatoyannopoulos⁷, Ting Wang¹³, Manolis Kellis¹, Manolis Kellis²

Massachusetts Institute of Technology¹, Broad Institute², University of California, Los Angeles³, University of British Columbia⁴, Baylor College of Medicine⁵, Howard Hughes Medical Institute⁶, University of Washington⁷, Ludwig Institute for Cancer Research⁸, University of California, San Francisco⁹, University of Connecticut¹⁰, University of Zagreb¹¹, University of Texas at Austin¹², Washington University in St. Louis¹³, University of Queensland¹⁴, Harvard University¹⁵, Cold Spring Harbor Laboratory¹⁶, University of Southern California¹⁷, University of California, Santa Cruz¹⁸, Simon Fraser University¹⁹, Morgridge Institute for Research²⁰, University of Texas at Dallas²¹, National Institutes of Health²²

19 Feb 2015-Nature

TL;DR: It is shown that disease- and trait-associated genetic variants are enriched in tissue-specific epigenomic marks, revealing biologically relevant cell types for diverse human traits, and providing a resource for interpreting the molecular basis of human disease.

Abstract: The reference human genome sequence set the stage for studies of genetic variation and its association with human disease, but epigenomic studies lack a similar reference. To address this need, the NIH Roadmap Epigenomics Consortium generated the largest collection so far of human epigenomes for primary cells and tissues. Here we describe the integrative analysis of 111 reference human epigenomes generated as part of the programme, profiled for histone modification patterns, DNA accessibility, DNA methylation and RNA expression. We establish global maps of regulatory elements, define regulatory modules of coordinated activity, and their likely activators and repressors. We show that disease- and trait-associated genetic variants are enriched in tissue-specific epigenomic marks, revealing biologically relevant cell types for diverse human traits, and providing a resource for interpreting the molecular basis of human disease. Our results demonstrate the central role of epigenomic information for understanding gene regulation, cellular differentiation and human disease.

5,037 citations
