Milan Straka

Researcher at Charles University in Prague

Publications -  80
Citations -  3412

Milan Straka is an academic researcher from Charles University in Prague. The author has contributed to research in topics: Treebank & Czech. The author has an h-index of 20, co-authored 76 publications receiving 2655 citations.

Papers
Proceedings ArticleDOI

Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe

TL;DR: An update to UDPipe 1.0 (Straka et al., 2016), a trainable pipeline that performs sentence segmentation, tokenization, POS tagging, lemmatization and dependency parsing. The update provides models for all 50 languages of UD 2.0.
Proceedings Article

CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

TL;DR: This overview paper defines the task and the updated evaluation methodology, describes data preparation, reports and analyzes the main results, and provides a brief categorization of the different approaches of the participating systems.
Proceedings Article

UDPipe: Trainable Pipeline for Processing CoNLL-U Files Performing Tokenization, Morphological Analysis, POS Tagging and Parsing

TL;DR: UDPipe, a pipeline processing CoNLL-U-formatted files, performs tokenization, morphological analysis, part-of-speech tagging, lemmatization and dependency parsing for nearly all treebanks of Universal Dependencies 1.2.
Proceedings ArticleDOI

CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

TL;DR: This overview paper defines the task and evaluation methodology, describes how the data sets were prepared, reports and analyzes the main results, and provides a brief categorization of the different approaches of the participating systems.
Proceedings ArticleDOI

75 Languages, 1 Model: Parsing Universal Dependencies Universally

TL;DR: It is found that fine-tuning a multilingual BERT self-attention model pretrained on 104 languages can meet or exceed state-of-the-art UPOS, UFeats, Lemmas, (and especially) UAS, and LAS scores, without requiring any recurrent or language-specific components.