Author

Eric Malmi

Bio: Eric Malmi is an academic researcher from Google. The author has contributed to research in topics: Sentence & Language model. The author has an h-index of 15 and has co-authored 56 publications receiving 651 citations. Previous affiliations of Eric Malmi include Helsinki Institute for Information Technology and Idiap Research Institute.


Papers
Proceedings ArticleDOI
03 Sep 2019
TL;DR: LaserTagger is proposed - a sequence tagging approach that casts text generation as a text editing task, and it is shown that at inference time tagging can be more than two orders of magnitude faster than comparable seq2seq models, making it more attractive for running in a live environment.
Abstract: We propose LaserTagger - a sequence tagging approach that casts text generation as a text editing task. Target texts are reconstructed from the inputs using three main edit operations: keeping a token, deleting it, and adding a phrase before the token. To predict the edit operations, we propose a novel model, which combines a BERT encoder with an autoregressive Transformer decoder. This approach is evaluated on English text on four tasks: sentence fusion, sentence splitting, abstractive summarization, and grammar correction. LaserTagger achieves new state-of-the-art results on three of these tasks, performs comparably to a set of strong seq2seq baselines with a large number of training examples, and outperforms them when the number of examples is limited. Furthermore, we show that at inference time tagging can be more than two orders of magnitude faster than comparable seq2seq models, making it more attractive for running in a live environment.
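The following sketch illustrates the text-editing view described above: each source token receives an edit tag (keep it, delete it, optionally insert a phrase before it), and the target text is reconstructed by realizing the tags. The tag encoding and the fusion example are illustrative assumptions, not the released LaserTagger implementation.

```python
# Minimal sketch of text generation as text editing: each source token gets a
# tag (KEEP or DELETE, optionally with a phrase to insert before it), and the
# target is reconstructed by realizing the tags over the source tokens.

from typing import List, Tuple

# A tag is (base_op, added_phrase); the phrase is inserted before the token.
Tag = Tuple[str, str]

def realize(tokens: List[str], tags: List[Tag]) -> str:
    """Apply edit tags to source tokens and return the reconstructed text."""
    out: List[str] = []
    for token, (op, phrase) in zip(tokens, tags):
        if phrase:                 # phrase insertion happens before the token
            out.append(phrase)
        if op == "KEEP":
            out.append(token)
        # op == "DELETE": the token is dropped
    return " ".join(out)

# Sentence fusion example:
# "He is tired . He stays home ." -> "He is tired , so he stays home ."
tokens = ["He", "is", "tired", ".", "He", "stays", "home", "."]
tags = [("KEEP", ""), ("KEEP", ""), ("KEEP", ""), ("DELETE", ""),
        ("DELETE", ", so he"), ("KEEP", ""), ("KEEP", ""), ("KEEP", "")]
print(realize(tokens, tags))  # He is tired , so he stays home .
```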

161 citations

Posted Content
TL;DR: The predictability of user demographics from the list of a user's apps is studied, along with the effect of training set size and the number of apps on predictability; both factors are shown to have a large impact on prediction accuracy.
Abstract: Understanding the demographics of app users is crucial, for example, for app developers, who wish to target their advertisements more effectively. Our work addresses this need by studying the predictability of user demographics based on the list of a user's apps which is readily available to many app developers. We extend previous work on the problem on three frontiers: (1) We predict new demographics (age, race, and income) and analyze the most informative apps for four demographic attributes included in our analysis. The most predictable attribute is gender (82.3 % accuracy), whereas the hardest to predict is income (60.3 % accuracy). (2) We compare several dimensionality reduction methods for high-dimensional app data, finding out that an unsupervised method yields superior results compared to aggregating the apps at the app category level, but the best results are obtained simply by the raw list of apps. (3) We look into the effect of the training set size and the number of apps on the predictability and show that both of these factors have a large impact on the prediction accuracy. The predictability increases, or in other words, a user's privacy decreases, the more apps the user has used, but somewhat surprisingly, after 100 apps, the prediction accuracy starts to decrease.
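As a rough illustration of the setup described above, the sketch below represents each user by a binary indicator vector over the raw list of their apps and trains a standard classifier for one demographic attribute. The toy data and the choice of logistic regression are assumptions for illustration; the paper compares several classifiers and dimensionality reduction methods.

```python
# Hedged sketch: predict a demographic attribute from the raw list of a user's
# apps, encoded as a sparse binary indicator vector.

from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.linear_model import LogisticRegression

# Toy data: each row is a user's app list, with a gender label.
app_lists = [
    ["maps", "chess", "news"],
    ["fitness", "recipes", "photo_editor"],
    ["chess", "stocks", "news"],
    ["photo_editor", "recipes", "shopping"],
]
gender = ["m", "f", "m", "f"]

binarizer = MultiLabelBinarizer()          # apps -> binary indicator features
X = binarizer.fit_transform(app_lists)

clf = LogisticRegression().fit(X, gender)

new_user = binarizer.transform([["news", "stocks", "chess"]])
print(clf.predict(new_user))               # e.g. ['m']
```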

75 citations

Posted Content
TL;DR: It is demonstrated that performing a single fine-tuning step on cLang-8 with off-the-shelf language models yields further accuracy improvements over an already top-performing gT5 model for English.
Abstract: This paper presents a simple recipe to train state-of-the-art multilingual Grammatical Error Correction (GEC) models. We achieve this by first proposing a language-agnostic method to generate a large number of synthetic examples. The second ingredient is to use large-scale multilingual language models (up to 11B parameters). Once fine-tuned on language-specific supervised sets we surpass the previous state-of-the-art results on GEC benchmarks in four languages: English, Czech, German and Russian. Having established a new set of baselines for GEC, we make our results easily reproducible and accessible by releasing a cLang-8 dataset. It is produced by using our best model, which we call gT5, to clean the targets of a widely used yet noisy lang-8 dataset. cLang-8 greatly simplifies typical GEC training pipelines composed of multiple fine-tuning stages -- we demonstrate that performing a single fine-tuning step on cLang-8 with the off-the-shelf language models yields further accuracy improvements over an already top-performing gT5 model for English.
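The sketch below illustrates the text-to-text framing behind this recipe: an ungrammatical source sentence is mapped to its corrected target, and a pretrained multilingual seq2seq model is fine-tuned on such pairs. The checkpoint name, toy example, and single-step training loop are assumptions; the paper fine-tunes much larger models (up to 11B parameters) on the released cLang-8 data.

```python
# Illustrative sketch of GEC as text-to-text fine-tuning on (noisy, corrected)
# sentence pairs, using a small multilingual seq2seq checkpoint as a stand-in.

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "google/mt5-small"            # stand-in for the larger gT5 models
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# One (source, target) pair in the cLang-8 style: noisy text and its correction.
source = "She go to school yesterday ."
target = "She went to school yesterday ."

inputs = tokenizer(source, return_tensors="pt")
labels = tokenizer(target, return_tensors="pt").input_ids

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss = model(**inputs, labels=labels).loss   # standard seq2seq cross-entropy
loss.backward()
optimizer.step()
print(float(loss))
```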

55 citations

Proceedings ArticleDOI
13 Aug 2016
TL;DR: A prediction model is developed to identify the next line of existing lyrics from a set of candidate next lines, and this model is then used to combine lines from existing songs, producing lyrics with rhyme and meaning.
Abstract: Writing rap lyrics requires both creativity to construct a meaningful, interesting story and lyrical skills to produce complex rhyme patterns, which form the cornerstone of good flow. We present a rap lyrics generation method that captures both of these aspects. First, we develop a prediction model to identify the next line of existing lyrics from a set of candidate next lines. This model is based on two machine-learning techniques: the RankSVM algorithm and a deep neural network model with a novel structure. Results show that the prediction model can identify the true next line among 299 randomly selected lines with an accuracy of 17%, i.e., over 50 times more likely than by random. Second, we employ the prediction model to combine lines from existing songs, producing lyrics with rhyme and a meaning. An evaluation of the produced lyrics shows that in terms of quantitative rhyme density, the method outperforms the best human rappers by 21%. The rap lyrics generator has been deployed as an online tool called DeepBeat, and the performance of the tool has been assessed by analyzing its usage logs. This analysis shows that machine-learned rankings correlate with user preferences.
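The toy sketch below illustrates the ranking formulation only: given the current line and a set of candidate next lines, score each candidate and pick the best. A crude end-rhyme heuristic stands in for the paper's RankSVM and neural ranking models and their much richer feature set.

```python
# Toy sketch of next-line ranking: score candidate lines against the current
# line and pick the argmax. The score here is a crude end-rhyme heuristic
# (longest shared vowel-sequence suffix), not the paper's learned ranker.

def vowel_suffix(line: str) -> str:
    """Vowels of the line's last word, used as a rough rhyme signature."""
    last_word = line.lower().split()[-1]
    return "".join(ch for ch in last_word if ch in "aeiouy")

def rhyme_score(line: str, candidate: str) -> int:
    """Length of the common suffix of the two vowel signatures."""
    a, b = vowel_suffix(line), vowel_suffix(candidate)
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n

current = "I keep it moving never resting on my laurels"
candidates = [
    "counting all my blessings while they stuck in quarrels",
    "the weather outside is sunny today",
    "I wrote another verse about the morals",
]
best = max(candidates, key=lambda c: rhyme_score(current, c))
print(best)
```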

55 citations

Posted Content
TL;DR: A method for automatically generating fusion examples from raw text is proposed, yielding DiscoFuse, a large-scale dataset for discourse-based sentence fusion; a sequence-to-sequence model trained on DiscoFuse is shown to improve performance on WebSplit when viewed as a sentence fusion task.
Abstract: Sentence fusion is the task of joining several independent sentences into a single coherent text. Current datasets for sentence fusion are small and insufficient for training modern neural models. In this paper, we propose a method for automatically-generating fusion examples from raw text and present DiscoFuse, a large scale dataset for discourse-based sentence fusion. We author a set of rules for identifying a diverse set of discourse phenomena in raw text, and decomposing the text into two independent sentences. We apply our approach on two document collections: Wikipedia and Sports articles, yielding 60 million fusion examples annotated with discourse information required to reconstruct the fused text. We develop a sequence-to-sequence model on DiscoFuse and thoroughly analyze its strengths and weaknesses with respect to the various discourse phenomena, using both automatic as well as human evaluation. Finally, we conduct transfer learning experiments with WebSplit, a recent dataset for text simplification. We show that pretraining on DiscoFuse substantially improves performance on WebSplit when viewed as a sentence fusion task.
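The sketch below illustrates the example-generation idea in miniature: find a sentence containing a discourse connective, split it into two independent sentences, and keep the original sentence as the fusion target. The single connective rule is a simplified assumption; the paper authors a far richer rule set covering many discourse phenomena.

```python
# Rough sketch of fusion-example generation: split a sentence on a discourse
# connective into two independent sentences and keep the original as the
# fusion target. One regex rule only, for illustration.

import re

CONNECTIVES = ["because", "although", "but"]

def make_fusion_example(sentence: str):
    """Return ((sent1, sent2), fused_target) or None if no rule applies."""
    for conn in CONNECTIVES:
        m = re.search(rf"\b{conn}\b", sentence, flags=re.IGNORECASE)
        if m:
            left = sentence[:m.start()].strip().rstrip(",")
            right = sentence[m.end():].strip().rstrip(".")
            if left and right:
                sent1 = left[0].upper() + left[1:] + "."
                sent2 = right[0].upper() + right[1:] + "."
                return (sent1, sent2), sentence
    return None

example = make_fusion_example(
    "The match was postponed because the pitch was flooded."
)
print(example)
# (('The match was postponed.', 'The pitch was flooded.'),
#  'The match was postponed because the pitch was flooded.')
```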

39 citations


Cited by
Christopher M. Bishop
01 Jan 2006
TL;DR: This book covers probability distributions, linear models for regression and classification, neural networks, kernel methods, graphical models, approximate inference, and methods for combining models.
Abstract: Contents: Probability Distributions; Linear Models for Regression; Linear Models for Classification; Neural Networks; Kernel Methods; Sparse Kernel Machines; Graphical Models; Mixture Models and EM; Approximate Inference; Sampling Methods; Continuous Latent Variables; Sequential Data; Combining Models.

10,141 citations

Proceedings ArticleDOI
12 Oct 2013
TL;DR: A novel location recommendation framework based on the temporal properties of user movement observed in a real-world LBSN dataset is introduced; the results show the significance of temporal patterns in explaining user behavior and demonstrate their power to improve location recommendation performance.
Abstract: Location-based social networks (LBSNs) have attracted an inordinate number of users and greatly enriched the urban experience in recent years. The availability of spatial, temporal and social information in online LBSNs offers an unprecedented opportunity to study various aspects of human behavior, and enable a variety of location-based services such as location recommendation. Previous work studied spatial and social influences on location recommendation in LBSNs. Due to the strong correlations between a user's check-in time and the corresponding check-in location, recommender systems designed for location recommendation inevitably need to consider temporal effects. In this paper, we introduce a novel location recommendation framework, based on the temporal properties of user movement observed from a real-world LBSN dataset. The experimental results exhibit the significance of temporal patterns in explaining user behavior, and demonstrate their power to improve location recommendation performance.

496 citations

Journal ArticleDOI
TL;DR: A Transformer-based sequence-to-sequence model that is compatible with publicly available pre-trained BERT, GPT-2, and RoBERTa checkpoints is developed and an extensive empirical study on the utility of initializing the model, both encoder and decoder, with these checkpoints is conducted.
Abstract: Unsupervised pre-training of large neural models has recently revolutionized Natural Language Processing. By warm-starting from the publicly released checkpoints, NLP practitioners have pushed the ...
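A minimal sketch of the warm-starting idea, using the Hugging Face EncoderDecoderModel API for illustration: both the encoder and the decoder are initialized from a publicly released BERT checkpoint, and the resulting model would then be fine-tuned on the downstream generation task. The checkpoint names and configuration values are assumptions, not the paper's exact setup.

```python
# Hedged sketch of warm-starting a seq2seq model from pretrained checkpoints:
# encoder and decoder are both initialized from BERT (cross-attention weights
# in the decoder are added fresh) and the model is then fine-tuned.

from transformers import AutoTokenizer, EncoderDecoderModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "bert-base-uncased"   # BERT2BERT-style initialization
)

# Generation needs a few decoding-related config values set explicitly.
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id

inputs = tokenizer("An example input sentence.", return_tensors="pt")
# Before fine-tuning the output is not meaningful; this only shows the wiring.
output_ids = model.generate(inputs.input_ids, max_length=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```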

350 citations

01 Jan 1999
TL;DR: A volume in Springer's Statistics and Computing series (series editor: W.K. Härdle), which publishes monographs and advanced texts on statistical computing and statistical packages.

228 citations

Proceedings ArticleDOI
20 Dec 2022
TL;DR: The authors propose Self-Instruct, a framework for improving the instruction-following capabilities of pre-trained language models by bootstrapping off their own generations: the pipeline generates instructions, inputs, and output samples from a language model, then filters invalid or similar ones before using them to finetune the original model.
Abstract: Large “instruction-tuned” language models (i.e., finetuned to respond to instructions) have demonstrated a remarkable ability to generalize zero-shot to new tasks. Nevertheless, they depend heavily on human-written instruction data that is often limited in quantity, diversity, and creativity, therefore hindering the generality of the tuned model. We introduce Self-Instruct, a framework for improving the instruction-following capabilities of pretrained language models by bootstrapping off their own generations. Our pipeline generates instructions, input, and output samples from a language model, then filters invalid or similar ones before using them to finetune the original model. Applying our method to the vanilla GPT3, we demonstrate a 33% absolute improvement over the original model on Super-NaturalInstructions, on par with the performance of InstructGPT-001, which was trained with private user data and human annotations. For further evaluation, we curate a set of expert-written instructions for novel tasks, and show through human evaluation that tuning GPT3 with Self-Instruct outperforms using existing public instruction datasets by a large margin, leaving only a 5% absolute gap behind InstructGPT-001. Self-Instruct provides an almost annotation-free method for aligning pre-trained language models with instructions, and we release our large synthetic dataset to facilitate future studies on instruction tuning.
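The sketch below outlines one bootstrapping step of the pipeline described above: sample seed instructions as demonstrations, prompt a language model for new instruction candidates, and filter out candidates that are too similar to the existing pool. The generate_from_lm function is a placeholder for a real LM call, and difflib's similarity ratio stands in for the ROUGE-L filtering used in the paper.

```python
# Schematic sketch of one Self-Instruct bootstrapping step: prompt an LM with
# sampled seed instructions, then keep only sufficiently novel candidates.

import random
from difflib import SequenceMatcher

def generate_from_lm(prompt: str) -> list[str]:
    """Placeholder for sampling new instruction candidates from an LM."""
    raise NotImplementedError("hook up a real language model here")

def too_similar(candidate: str, pool: list[str], threshold: float = 0.7) -> bool:
    """Drop candidates that closely match an existing instruction."""
    return any(
        SequenceMatcher(None, candidate.lower(), existing.lower()).ratio() > threshold
        for existing in pool
    )

def self_instruct_step(pool: list[str], num_demos: int = 6) -> list[str]:
    """One bootstrapping step: sample demos, generate, filter, grow the pool."""
    demos = random.sample(pool, k=min(num_demos, len(pool)))
    prompt = "Come up with new tasks:\n" + "\n".join(f"- {d}" for d in demos)
    for candidate in generate_from_lm(prompt):
        if candidate and not too_similar(candidate, pool):
            pool.append(candidate)
    return pool
```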

201 citations