Melanie Subbiah

Publications -  8
Citations -  12032

Melanie Subbiah is an academic researcher. The author has contributed to research on the topics of computer science and language models. The author has an h-index of 2 and has co-authored 2 publications receiving 3036 citations.

Papers
Proceedings Article

SafeText: A Benchmark for Exploring Physical Safety in Language Models

TL;DR: The authors argue that state-of-the-art large language models are susceptible to generating unsafe text and have difficulty rejecting unsafe advice, and they call for further study of safety and for assessing commonsense physical safety in models before release.
Proceedings Article

Mitigating Covertly Unsafe Text within Natural Language Systems

TL;DR: This work distinguishes types of text that can lead to physical harm and establishes one particularly underexplored category: covertly unsafe text. It further breaks this category down with respect to the system’s information and discusses solutions to mitigate the generation of text in each subcategory.

Looking Under the Hood of DetectGPT

TL;DR: This paper analyzes DetectGPT and shows that selectively masking a combination of nouns, verbs, and adjectives improves the AUROC metric by up to 9.5%, demonstrating the importance of targeted masking strategies.