Yuhui Zhang
Researcher at Tsinghua University
Publications - 17
Citations - 1568
Yuhui Zhang is an academic researcher from Tsinghua University. The author has contributed to research in the topics of computer science and tokenization (data security). The author has an h-index of 7 and has co-authored 11 publications receiving 583 citations. Previous affiliations of Yuhui Zhang include Stanford University.
Papers
Proceedings ArticleDOI
Stanza: A Python Natural Language Processing Toolkit for Many Human Languages
TL;DR: This work introduces Stanza, an open-source Python natural language processing toolkit supporting 66 human languages that features a language-agnostic fully neural pipeline for text analysis, including tokenization, multi-word token expansion, lemmatization, part-of-speech and morphological feature tagging, dependency parsing, and named entity recognition.
Journal ArticleDOI
Holistic Evaluation of Language Models
Percy Liang, Rishi Bommasani, Tony Lee, Dimitris Tsipras, Dilara Soylu, Michihiro Yasunaga, Yian Zhang, Deepak Narayanan, Yuhuai Wu, Ananya Kumar, Benjamin Newman, Binhang Yuan, Bobby Yan, Ce Zhang, Christian Cosgrove, Christopher D. Manning, Christopher Ré, Diana Acosta-Navas, Drew A. Hudson, Eric Zelikman, Esin Durmus, Faisal Ladhak, Frieda Rong, Hongyu Ren, Huaxiu Yao, Jue Wang, Keshav Santhanam, Laurel Orr, Lucia Zheng, Byron Rogers, Mirac M. Suzgun, Nathan S. Kim, Neel Guha, Niladri S. Chatterji, Peter Henderson, Qian Huang, Ryan Chi, Michael Xie, Shibani Santurkar, Surya Ganguli, Tatsunori Hashimoto, Thomas Icard, Tianyi Zhang, Vishrav Chaudhary, William Wang, Xuechen Li, Yifan Mai, Yuhui Zhang, Yuta Koreeda +48 more
TL;DR: The Holistic Evaluation of Language Models (HELM) is a popular benchmark for language models, with 30 models evaluated on 16 core scenarios and 7 metrics, exposing important trade-offs.
Posted Content
On the Opportunities and Risks of Foundation Models.
Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ B. Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri S. Chatterji, Annie Chen, Kathleen Creel, Jared Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, Li Fei-Fei, Chelsea Finn, Trevor Gale, Lauren Gillespie, Karan Goel, Noah D. Goodman, Shelby Grossman, Neel Guha, Tatsunori Hashimoto, Peter Henderson, John Hewitt, Daniel E. Ho, Jenny Hong, Kyle Hsu, Jing Huang, Thomas Icard, Saahil Jain, Dan Jurafsky, Pratyusha Kalluri, Siddharth Karamcheti, Geoff Keeling, Fereshte Khani, Omar Khattab, Pang Wei Koh, Mark Krass, Ranjay Krishna, Rohith Kuditipudi, Ananya Kumar, Faisal Ladhak, Mina Lee, Tony Lee, Jure Leskovec, Isabelle Levent, Xiang Lisa Li, Xuechen Li, Tengyu Ma, Ali Ahmad Malik, Christopher D. Manning, Suvir Mirchandani, Eric Mitchell, Zanele Munyikwa, Suraj Nair, Avanika Narayan, Deepak Narayanan, Ben Newman, Allen Nie, Juan Carlos Niebles, Hamed Nilforoshan, Julian Nyarko, Giray Ogut, Laurel Orr, Isabel Papadimitriou, Joon Sung Park, Chris Piech, Eva Portelance, Christopher Potts, Aditi Raghunathan, Rob Reich, Hongyu Ren, Frieda Rong, Yusuf H. Roohani, Camilo Ruiz, Jack Ryan, Christopher Ré, Dorsa Sadigh, Shiori Sagawa, Keshav Santhanam, Andy Shih, Krishnan Srinivasan, Alex Tamkin, Rohan Taori, Armin W. Thomas, Florian Tramèr, Rose E. Wang, William Yang Wang, Bohan Wu, Jiajun Wu, Yuhuai Wu, Sang Michael Xie, Michihiro Yasunaga, Jiaxuan You, Matei Zaharia, Michael Zhang, Tianyi Zhang, Xikun Zhang, Yuhui Zhang, Lucia Zheng, Kaitlyn Zhou, Percy Liang +113 more
TL;DR: The authors provide a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications.
Posted Content
Biomedical and Clinical English Model Packages in the Stanza Python NLP Library
TL;DR: The study introduces biomedical and clinical NLP packages built for the Stanza library, which offer performance that is similar to the state of the art, and is also optimized for ease of use.
Proceedings ArticleDOI
Mind the Gap: Understanding the Modality Gap in Multi-modal Contrastive Representation Learning
TL;DR: The modality gap, an intriguing geometric phenomenon of the representation space of multi-modal models, is presented, and it is demonstrated that varying the modality gap distance has a significant impact on improving the model's downstream zero-shot classification performance and fairness.