What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers



Boseop Kim, HyoungSeok Kim, Sang Woo Lee¹, Gichang Lee, Dong-Hyun Kwak¹, Dong Hyeon Jeon, Sunghyun Park², Sungju Kim, Seonhoon Kim³, Dongpil Seo, Heungsub Lee, Minyoung Jeong, Sungjae Lee⁴, Minsub Kim, Suk Hyun Ko, Seokhun Kim, Taeyong Park³, Jinuk Kim, Soyoung Kang, Na-Hyeon Ryu, Kang Min Yoo¹, Minsuk Chang⁵, Soobin Suh, Sookyo In, Jin-Seong Park⁶, Kyungduk Kim⁷, Hiun Kim, Jisu Jeong¹, Yong Goo Yeo, Donghoon Ham, Dongju Park, Min Young Lee⁸, Jae-Wook Kang⁹, Inho Kang¹, Jung-Woo Ha¹, Woo-Myoung Park⁷, Nako Sung¹

Naver Corporation¹, Amazon.com², Seoul National University³, Dong-eui University⁴, KAIST⁵, Hanyang University⁶, Samsung⁷, Yonsei University⁸, Chonbuk National University⁹

10 Sep 2021, arXiv: Computation and Language

TL;DR: HyperCLOVA is a Korean variant of 82B GPT-3 trained on a Korean-centric corpus of 560B tokens. It shows state-of-the-art zero-shot and few-shot learning performance on various downstream tasks in Korean.


Abstract: GPT-3 shows remarkable in-context learning ability of large-scale language models (LMs) trained on hundreds of billion scale data. Here we address some remaining issues less reported by the GPT-3 paper, such as a non-English LM, the performances of different sized models, and the effect of recently introduced prompt optimization on in-context learning. To achieve this, we introduce HyperCLOVA, a Korean variant of 82B GPT-3 trained on a Korean-centric corpus of 560B tokens. Enhanced by our Korean-specific tokenization, HyperCLOVA with our training configuration shows state-of-the-art in-context zero-shot and few-shot learning performances on various downstream tasks in Korean. Also, we show the performance benefits of prompt-based learning and demonstrate how it can be integrated into the prompt engineering pipeline. Then we discuss the possibility of materializing the No Code AI paradigm by providing AI prototyping capabilities to non-experts of ML by introducing HyperCLOVA studio, an interactive prompt engineering interface. Lastly, we demonstrate the potential of our methods with three successful in-house applications.


