Milan Straka

Researcher at Charles University in Prague

Publications - 80

Citations - 3412

Milan Straka is an academic researcher at Charles University in Prague. The author has contributed to research on the topics Treebank and Czech, has an h-index of 20, and has co-authored 76 publications receiving 2655 citations.

Papers

Proceedings ArticleDOI

Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe

Milan Straka, +1 more

TL;DR: This paper presents an update to UDPipe 1.0 (Straka et al., 2016), a trainable pipeline that performs sentence segmentation, tokenization, POS tagging, lemmatization and dependency parsing, and provides models for all 50 languages of UD 2.0.

Proceedings Article

CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

Daniel Zeman, +7 more

TL;DR: This overview paper defines the task and the updated evaluation methodology, describes data preparation, reports and analyzes the main results, and provides a brief categorization of the different approaches of the participating systems.

Proceedings Article

UDPipe: Trainable Pipeline for Processing CoNLL-U Files Performing Tokenization, Morphological Analysis, POS Tagging and Parsing

Milan Straka, +2 more

TL;DR: UDPipe, a pipeline processing CoNLL-U-formatted files, performs tokenization, morphological analysis, part-of-speech tagging, lemmatization and dependency parsing for nearly all treebanks of Universal Dependencies 1.2.

Proceedings ArticleDOI

CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

Daniel Zeman, +60 more

TL;DR: This overview paper defines the task and evaluation methodology, describes how the data sets were prepared, reports and analyzes the main results, and provides a brief categorization of the different approaches of the participating systems.

Proceedings ArticleDOI

75 Languages, 1 Model: Parsing Universal Dependencies Universally

Dan Kondratyuk, +1 more

TL;DR: The authors find that fine-tuning a multilingual BERT self-attention model pretrained on 104 languages can meet or exceed state-of-the-art UPOS, UFeats, Lemmas, and especially UAS and LAS scores, without requiring any recurrent or language-specific components.
