Been Kim
Researcher at Google
Publications - 75
Citations - 13180
Been Kim is an academic researcher from Google. The author has contributed to research in topics: Interpretability & Computer science. The author has an h-index of 38, co-authored 70 publications receiving 8631 citations. Previous affiliations of Been Kim include Massachusetts Institute of Technology & Allen Institute for Artificial Intelligence.
Papers
Posted Content
Learning About Meetings
Been Kim,Cynthia Rudin +1 more
TL;DR: Provides tentative evidence that it is possible to automatically detect when a key decision is taking place during a meeting, and that it is often possible to predict whether a proposal made during a meeting will be accepted or rejected based entirely on the language used by the speaker.
Journal ArticleDOI
Inferring team task plans from human meetings: a generative modeling approach with logic-based prior
TL;DR: In this article, a hybrid approach combines probabilistic generative modeling with logical plan validation, which is used to compute a highly structured prior over possible plans, enabling the authors to overcome the challenge of performing inference over a large solution space with only a small amount of noisy data from the team planning session.
Journal ArticleDOI
Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models
TL;DR: This article showed that representation denoising does not provide any insight into which model MLP layer would be best to edit in order to override an existing stored fact with a new one.
Journal ArticleDOI
Impossibility Theorems for Feature Attribution
TL;DR: The authors show that for moderately rich model classes (easily satisfied by neural networks), any feature attribution method that is complete and linear (e.g., Integrated Gradients and SHAP) can provably fail to improve on random guessing for inferring model behavior.
Posted Content
Interpreting Black Box Predictions using Fisher Kernels
TL;DR: The authors use Fisher kernels as the defining feature embedding of each data point, combined with Sequential Bayesian Quadrature (SBQ) for efficient selection of examples for black box interpretation of test predictions in terms of training examples.