mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer
References
Excerpts showing where "mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer" refers to the cited works:
...It uses data in 26 languages from Wikipedia and CC-News (Liu et al., 2019)....
...XLM-R (Conneau et al., 2020) is an improved version of XLM based on the RoBERTa model (Liu et al., 2019)....
...Popular models of this type are mBERT (Devlin, 2018), mBART (Liu et al., 2020a), and XLM-R (Conneau et al., 2020), which are multilingual variants of BERT (Devlin et al., 2019), BART (Lewis et al., 2020b), and RoBERTa (Liu et al., 2019), respectively....
...We therefore take the approach used in (Devlin, 2018; Conneau et al., 2020; Arivazhagan et al., 2019) and boost lower-resource languages by sampling examples according to the probability p(L) ∝ |L|^α, where p(L) is the probability of sampling text from a given language during pre-training and |L| is the number of examples in the language....
...Values used by prior work include α = 0.7 for mBERT (Devlin, 2018), α = 0.3 for XLM-R (Conneau et al., 2020), and α = 0.2 for MMNMT (Arivazhagan et al., 2019)....
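The sampling rule quoted above is straightforward to compute. Below is a minimal sketch (not code from the paper) of this temperature-style language sampling with p(L) ∝ |L|^α; the function name language_sampling_probs and the example counts are made up for illustration.

import numpy as np

def language_sampling_probs(example_counts, alpha=0.3):
    """Return p(L) ∝ |L|**alpha for each language.

    example_counts: dict mapping language code -> number of examples |L|.
    alpha < 1 boosts lower-resource languages relative to raw proportions.
    """
    langs = list(example_counts)
    counts = np.array([example_counts[lang] for lang in langs], dtype=np.float64)
    weights = counts ** alpha           # |L|^alpha
    probs = weights / weights.sum()     # normalize to a probability distribution
    return dict(zip(langs, probs))

# Hypothetical counts, not taken from the paper's corpus statistics:
counts = {"en": 1_000_000, "sw": 10_000}
print(language_sampling_probs(counts, alpha=0.3))  # low-resource share is boosted
print(language_sampling_probs(counts, alpha=1.0))  # plain proportional sampling

With α = 1 sampling follows the raw data proportions; smaller values (e.g. the α = 0.3 cited for XLM-R) flatten the distribution so lower-resource languages are seen more often during pre-training.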