Author

Shlomi Hod

Bio: Shlomi Hod is an academic researcher from Boston University. The author has contributed to research in topics: Artificial neural network & Modularity (networks). The author has an h-index of 4 and has co-authored 5 publications receiving 31 citations.

Papers
Posted Content
TL;DR: This work generalizes the results of Perdomo et al. (2020), who investigated "performative prediction" in a stateless setting, to the case where the response of the population to the deployed classifier depends both on the classifier and the previous distribution of the population.
Abstract: Deployed supervised machine learning models make predictions that interact with and influence the world. This phenomenon is called "performative prediction" by Perdomo et al. (2020), who investigated it in a stateless setting. We generalize their results to the case where the response of the population to the deployed classifier depends both on the classifier and the previous distribution of the population. We also demonstrate such a setting empirically, for the scenario of strategic manipulation.

41 citations
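
To make the stateful setting above concrete, here is a minimal, hypothetical sketch of repeated retraining when the population's response depends on both the deployed model and the previous distribution. The response map, its parameters (gamma, eps), and the least-squares learner are illustrative assumptions, not the paper's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

def respond(theta, prev_mean, eps=0.5, gamma=0.7):
    # Stateful response: the new feature distribution depends on the
    # deployed model theta AND on the previous distribution (its mean),
    # unlike the stateless setting of Perdomo et al. (2020). Agents
    # shift features in the direction favored by theta (strategic
    # manipulation), anchored to the previous mean with weight gamma.
    return gamma * prev_mean + (1 - gamma) * (prev_mean + eps * theta)

d, n = 2, 500
true_theta = np.array([1.0, -1.0])
mean, theta = np.zeros(d), rng.normal(size=d)

for t in range(15):
    mean = respond(theta, mean)                       # population reacts
    X = rng.normal(loc=mean, size=(n, d))             # sample new data
    y = X @ true_theta + 0.1 * rng.normal(size=n)
    theta_new = np.linalg.lstsq(X, y, rcond=None)[0]  # retrain
    print(f"round {t:2d}  parameter change = {np.linalg.norm(theta_new - theta):.4f}")
    theta = theta_new
```

The printed parameter change shrinking across rounds indicates convergence to a performatively stable point under this toy dynamic.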

Posted Content
10 Mar 2020
TL;DR: A measurable notion of modularity is introduced for multi-layer perceptrons (MLPs) and it is found that MLPs that undergo training and weight pruning are often significantly more modular than random networks with the same distribution of weights.
Abstract: The learned weights of a neural network are often considered devoid of scrutable internal structure. To discern structure in these weights, we introduce a measurable notion of modularity for multi-layer perceptrons (MLPs), and investigate the modular structure of MLPs trained on datasets of small images. Our notion of modularity comes from the graph clustering literature: a "module" is a set of neurons with strong internal connectivity but weak external connectivity. We find that training and weight pruning produces MLPs that are more modular than randomly initialized ones, and often significantly more modular than random MLPs with the same (sparse) distribution of weights. Interestingly, they are much more modular when trained with dropout. We also present exploratory analyses of the importance of different modules for performance and how modules depend on each other. Understanding the modular structure of neural networks, when such structure exists, will hopefully render their inner workings more interpretable to engineers. Note that this paper has been superseded by "Clusterability in Neural Networks", arXiv:2103.03386!

9 citations
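
To illustrate the graph-clustering notion of modularity, the sketch below builds a weighted graph from an MLP's (pruned) weight matrices and scores a partition with networkx's modularity measure. The greedy community detection and the magnitude-threshold pruning are stand-ins for illustration, not the paper's exact pipeline.

```python
import numpy as np
import networkx as nx
from networkx.algorithms import community

def mlp_weight_graph(weight_mats):
    # One node per neuron; undirected edges between consecutive layers
    # weighted by |w_ij|, so a "module" is a set of neurons with strong
    # internal and weak external connectivity.
    G, offset = nx.Graph(), 0
    for W in weight_mats:
        n_in, n_out = W.shape
        for i in range(n_in):
            for j in range(n_out):
                if W[i, j] != 0:
                    G.add_edge(offset + i, offset + n_in + j, weight=abs(W[i, j]))
        offset += n_in
    return G

rng = np.random.default_rng(0)
# hypothetical trained weights for a small 20-16-16-10 MLP
weights = [rng.normal(size=(20, 16)), rng.normal(size=(16, 16)), rng.normal(size=(16, 10))]
# crude magnitude pruning, standing in for the paper's sparse networks
weights = [np.where(np.abs(W) > 1.0, W, 0.0) for W in weights]

G = mlp_weight_graph(weights)
modules = community.greedy_modularity_communities(G, weight="weight")
Q = community.modularity(G, modules, weight="weight")
print(f"{len(modules)} modules found, modularity Q = {Q:.3f}")
```

Comparing Q for a trained network against Q for random networks with the same sparse weight distribution is the paper's central measurement idea.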

Posted Content
TL;DR: A measurable notion of modularity for multi-layer perceptrons (MLPs) is introduced, and it is found that training and weight pruning produces MLPs that are more modular than randomly initialized ones, and often significantly more modular than random MLPs with the same (sparse) distribution of weights.
Abstract: The learned weights of a neural network are often considered devoid of scrutable internal structure. To discern structure in these weights, we introduce a measurable notion of modularity for multi-layer perceptrons (MLPs), and investigate the modular structure of MLPs trained on datasets of small images. Our notion of modularity comes from the graph clustering literature: a "module" is a set of neurons with strong internal connectivity but weak external connectivity. We find that training and weight pruning produces MLPs that are more modular than randomly initialized ones, and often significantly more modular than random MLPs with the same (sparse) distribution of weights. Interestingly, they are much more modular when trained with dropout. We also present exploratory analyses of the importance of different modules for performance and how modules depend on each other. Understanding the modular structure of neural networks, when such structure exists, will hopefully render their inner workings more interpretable to engineers. Note that this paper has been superseded by "Clusterability in Neural Networks", arXiv:2103.03386!

6 citations

Posted Content
TL;DR: In this article, the authors look for structure in the form of clusterability: how well a network can be divided into groups of neurons with strong internal connectivity but weak external connectivity and find that a trained neural network is typically more clusterable than randomly initialized networks, and often clusterable relative to random networks with the same distribution of weights.
Abstract: The learned weights of a neural network have often been considered devoid of scrutable internal structure. In this paper, however, we look for structure in the form of clusterability: how well a network can be divided into groups of neurons with strong internal connectivity but weak external connectivity. We find that a trained neural network is typically more clusterable than randomly initialized networks, and often clusterable relative to random networks with the same distribution of weights. We also exhibit novel methods to promote clusterability in neural network training, and find that in multi-layer perceptrons they lead to more clusterable networks with little reduction in accuracy. Understanding and controlling the clusterability of neural networks will hopefully render their inner workings more interpretable to engineers by facilitating partitioning into meaningful clusters.

5 citations
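
A crude sketch of the clusterability idea: compare the normalized cut (n-cut) of a random split against a split found by spectral bisection of the same weight-derived adjacency matrix. The adjacency below is random noise standing in for a real network's symmetrized |weights|; in a genuinely clusterable network the spectral split's n-cut would be far lower.

```python
import numpy as np

def normalized_cut(A, labels):
    # n-cut: for each cluster, the weight of edges leaving it divided by
    # its total degree; lower values mean a more clusterable network.
    total = 0.0
    for c in np.unique(labels):
        in_c = labels == c
        total += A[in_c][:, ~in_c].sum() / A[in_c].sum()
    return total

rng = np.random.default_rng(1)
# stand-in adjacency (in practice: symmetrized |weights| of a network)
A = np.abs(rng.normal(size=(30, 30)))
A = (A + A.T) / 2
np.fill_diagonal(A, 0.0)

# spectral bisection via the normalized Laplacian's second eigenvector
d = A.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
L_norm = np.eye(30) - D_inv_sqrt @ A @ D_inv_sqrt
_, vecs = np.linalg.eigh(L_norm)
labels_spec = (vecs[:, 1] > 0).astype(int)
labels_rand = rng.integers(0, 2, size=30)

print("n-cut, random split  :", round(normalized_cut(A, labels_rand), 3))
print("n-cut, spectral split:", round(normalized_cut(A, labels_spec), 3))
```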

Journal ArticleDOI
TL;DR: This article presents an approach to learning Responsible AI together.
Abstract: Learning Responsible AI together.

5 citations


Cited by
Proceedings Article
18 Jul 2021
TL;DR: A functional modular probing method is used to analyze deep model structures in the OOD setting, demonstrating that even in biased models (which focus on spurious correlation) there still exist unbiased functional subnetworks.
Abstract: Can models with particular structure avoid being biased towards spurious correlation in out-of-distribution (OOD) generalization? Peters et al. (2016) provide a positive answer for linear cases. In this paper, we use a functional modular probing method to analyze deep model structures under the OOD setting. We demonstrate that even in biased models (which focus on spurious correlation) there still exist unbiased functional subnetworks. Furthermore, we articulate and demonstrate the functional lottery ticket hypothesis: the full network contains a subnetwork that can achieve better OOD performance. We then propose Modular Risk Minimization to solve the subnetwork selection problem. Our algorithm learns the subnetwork structure from a given dataset, and can be combined with any other OOD regularization methods. Experiments on various OOD generalization tasks corroborate the effectiveness of our method.

45 citations
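
The paper's functional modular probing is not spelled out here, so the sketch below substitutes a deliberately crude stand-in: greedily ablate hidden units of a frozen, hand-built "biased" network and keep a unit only if removing it does not hurt accuracy on decorrelated (OOD-style) data. It illustrates probing for an unbiased subnetwork, not the authors' algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hand-built stand-in for a biased trained net: half the hidden units
# read the core feature x0 (predictive everywhere), half read the
# spurious feature x1 (predictive only in-distribution).
W1 = np.zeros((2, 16)); W1[0, :8] = 1.0; W1[1, 8:] = 1.0
W2 = np.ones((16, 1))

def predict(X, unit_mask):
    h = np.maximum(X @ W1, 0) * unit_mask   # ablate units via the mask
    return (h @ W2).ravel() > 2.0

def ood_accuracy(unit_mask, n=2000):
    # OOD data: the spurious feature is decorrelated from the label.
    y = rng.integers(0, 2, n)
    X = np.stack([y + 0.2 * rng.normal(size=n),   # core feature tracks y
                  rng.normal(size=n)], axis=1)    # spurious feature: noise
    return np.mean(predict(X, unit_mask) == y)

# Greedy probe: drop each unit if ablating it does not hurt OOD accuracy.
mask = np.ones(16)
for u in range(16):
    trial = mask.copy()
    trial[u] = 0.0
    if ood_accuracy(trial) >= ood_accuracy(mask):
        mask = trial
print("units kept:", np.flatnonzero(mask), " OOD acc:", round(ood_accuracy(mask), 3))
```

On this toy network the probe reliably discards the spurious units, leaving an unbiased subnetwork inside the biased model.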

Posted Content
TL;DR: This paper identifies a natural set of properties of the loss function and model-induced distribution shift under which the performative risk is convex, a property which does not follow from convexity of the loss alone.
Abstract: In performative prediction, predictions guide decision-making and hence can influence the distribution of future data. To date, work on performative prediction has focused on finding performatively stable models, which are the fixed points of repeated retraining. However, stable solutions can be far from optimal when evaluated in terms of the performative risk, the loss experienced by the decision maker when deploying a model. In this paper, we shift attention beyond performative stability and focus on optimizing the performative risk directly. We identify a natural set of properties of the loss function and model-induced distribution shift under which the performative risk is convex, a property which does not follow from convexity of the loss alone. Furthermore, we develop algorithms that leverage our structural assumptions to optimize the performative risk with better sample efficiency than generic methods for derivative-free convex optimization.

42 citations
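
Since the paper develops derivative-free methods for the performative risk, a minimal sketch of the generic baseline may help: two-point zeroth-order gradient descent on a toy performative objective. The response map (mean shifted by eps * theta) and all constants are illustrative assumptions, not the paper's setting.

```python
import numpy as np

rng = np.random.default_rng(0)

def performative_risk(theta, eps=0.5, n=2000):
    # Toy performative objective: deploying theta shifts the data mean
    # by eps * theta, then the risk is squared error on the shifted data.
    X = rng.normal(loc=eps * theta, size=(n, theta.size))
    y = X.sum(axis=1)                      # toy target
    return np.mean((X @ theta - y) ** 2)

def two_point_grad(f, theta, delta=0.05):
    # classic two-point zeroth-order gradient estimator
    u = rng.normal(size=theta.size)
    u /= np.linalg.norm(u)
    return theta.size * (f(theta + delta * u) - f(theta - delta * u)) / (2 * delta) * u

theta = np.zeros(3)
for _ in range(300):
    theta -= 0.01 * two_point_grad(performative_risk, theta)
print("theta:", np.round(theta, 2), " risk:", round(performative_risk(theta), 4))
```

The paper's contribution is showing when the performative risk is convex, so that such derivative-free schemes (and more sample-efficient ones exploiting the structure) provably find the performative optimum rather than merely a stable point.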

Posted Content
TL;DR: A novel method based on learning binary weight masks to identify individual weights and subnets responsible for specific functions in NNs is presented, demonstrating how common NNs fail to reuse submodules and offering new insights into the related issue of systematic generalization on language tasks.
Abstract: Neural networks (NNs) whose subnetworks implement reusable functions are expected to offer numerous advantages, including compositionality through efficient recombination of functional building blocks, interpretability, preventing catastrophic interference, etc. Understanding if and how NNs are modular could provide insights into how to improve them. Current inspection methods, however, fail to link modules to their functionality. In this paper, we present a novel method based on learning binary weight masks to identify individual weights and subnets responsible for specific functions. Using this powerful tool, we contribute an extensive study of emerging modularity in NNs that covers several standard architectures and datasets. We demonstrate how common NNs fail to reuse submodules and offer new insights into the related issue of systematic generalization on language tasks.

32 citations
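
A sketch of the core mechanism, learning a binary weight mask with a straight-through estimator over a frozen network, is below. The architecture, the toy "function" (copying one input), and the sparsity penalty are illustrative assumptions; the paper's actual training setup and datasets differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

class MaskedLinear(nn.Module):
    # Frozen random weights; only per-weight mask logits are trainable.
    def __init__(self, in_f, out_f):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_f, in_f) * 0.3, requires_grad=False)
        self.logits = nn.Parameter(torch.full((out_f, in_f), 2.0))  # start unmasked

    def forward(self, x):
        p = torch.sigmoid(self.logits)
        mask = (p > 0.5).float() + p - p.detach()  # straight-through estimator
        return F.linear(x, self.weight * mask)

net = nn.Sequential(MaskedLinear(8, 32), nn.ReLU(), MaskedLinear(32, 1))
masks = [m for m in net if isinstance(m, MaskedLinear)]
opt = torch.optim.Adam([m.logits for m in masks], lr=0.05)

x = torch.randn(512, 8)
y = x[:, :1]                    # the toy "function": copy the first input

for step in range(300):
    sparsity = sum(torch.sigmoid(m.logits).mean() for m in masks)
    loss = F.mse_loss(net(x), y) + 1e-2 * sparsity  # keep the subnet small
    opt.zero_grad()
    loss.backward()
    opt.step()

kept = sum(int((torch.sigmoid(m.logits) > 0.5).sum()) for m in masks)
print("weights kept:", kept, "of", sum(m.weight.numel() for m in masks))
```

The surviving weights form the subnet "responsible" for the probed function, which is the kind of module the paper's masks are designed to isolate.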

Posted Content
TL;DR: This paper tests whether 17 unsupervised, weakly supervised, and fully supervised representation learning approaches correctly infer the generative factors of variation in simple datasets, and observes that all of them struggle to learn the underlying mechanism regardless of supervision signal and architectural bias.
Abstract: An important component for generalization in machine learning is to uncover underlying latent factors of variation as well as the mechanism through which each factor acts in the world. In this paper, we test whether 17 unsupervised, weakly supervised, and fully supervised representation learning approaches correctly infer the generative factors of variation in simple datasets (dSprites, Shapes3D, MPI3D). In contrast to prior robustness work that introduces novel factors of variation during test time, such as blur or other (un)structured noise, we here recompose, interpolate, or extrapolate only existing factors of variation from the training data set (e.g., small and medium-sized objects during training and large objects during testing). Models that learn the correct mechanism should be able to generalize to this benchmark. In total, we train and test 2000+ models and observe that all of them struggle to learn the underlying mechanism regardless of supervision signal and architectural bias. Moreover, the generalization capabilities of all tested models drop significantly as we move from artificial datasets towards more realistic real-world datasets. Despite their inability to identify the correct mechanism, the models are quite modular as their ability to infer other in-distribution factors remains fairly stable, provided only a single factor is out-of-distribution. These results point to an important yet understudied problem of learning mechanistic models of observations that can facilitate generalization.

28 citations
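
The benchmark's extrapolation protocol can be illustrated with a tiny factor table in the style of dSprites; the factor ranges and the 0.8 scale cutoff below are made-up stand-ins for the paper's actual splits.

```python
import numpy as np

# Hypothetical dSprites-style factor table: (shape, scale, x, y)
factors = np.stack(np.meshgrid(
    np.arange(3),              # shape
    np.linspace(0.5, 1.0, 6),  # scale
    np.linspace(0, 1, 8),      # x position
    np.linspace(0, 1, 8),      # y position
    indexing="ij",
), axis=-1).reshape(-1, 4)

# Extrapolation split on a single factor: small/medium scales during
# training, the largest scales held out for test time only.
scale = factors[:, 1]
train = factors[scale <= 0.8]
test = factors[scale > 0.8]
print(len(train), "train combos,", len(test), "held-out extrapolation combos")
```

A model that learned the true generative mechanism should handle the held-out combinations, which is exactly what the paper finds the 2000+ tested models fail to do.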

10 Jan 2022
TL;DR: A new game theoretic framework for this phenomenon, called multi-player performative prediction, is formulated, and it is shown that under mild assumptions the performatively stable equilibria can be found efficiently by a variety of algorithms, including repeated retraining and repeated (stochastic) gradient play.
Abstract: Learning problems commonly exhibit an interesting feedback mechanism wherein the population data reacts to competing decision makers' actions. This paper formulates a new game theoretic framework for this phenomenon, called "multi-player performative prediction". We focus on two distinct solution concepts, namely (i) performatively stable equilibria and (ii) Nash equilibria of the game. The latter equilibria are arguably more informative, but can be found efficiently only when the game is monotone. We show that under mild assumptions, the performatively stable equilibria can be found efficiently by a variety of algorithms, including repeated retraining and the repeated (stochastic) gradient method. We then establish transparent sufficient conditions for strong monotonicity of the game and use them to develop algorithms for finding Nash equilibria. We investigate derivative free methods and adaptive gradient algorithms wherein each player alternates between learning a parametric description of their distribution and gradient steps on the empirical risk. Synthetic and semi-synthetic numerical experiments illustrate the results.

15 citations
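
To ground the idea of repeated (stochastic) gradient play converging to a performatively stable equilibrium, here is a toy two-player simulation; the linear response map and squared losses are illustrative assumptions chosen so the game is a contraction.

```python
import numpy as np

rng = np.random.default_rng(0)

def population_mean(theta):
    # Toy response map: the data distribution reacts to BOTH players'
    # deployed decisions (coefficients chosen so play converges).
    return 1.0 + 0.3 * theta[0] - 0.2 * theta[1]

def stochastic_grad(i, theta, n=500):
    # Player i's loss is E[(theta_i - z)^2] over the reacted population.
    z = rng.normal(loc=population_mean(theta), size=n)
    return 2.0 * np.mean(theta[i] - z)

theta = np.zeros(2)
for _ in range(500):
    g = np.array([stochastic_grad(0, theta), stochastic_grad(1, theta)])
    theta -= 0.05 * g            # simultaneous repeated gradient play
print("approximate performatively stable point:", np.round(theta, 3))
```

At the stable point each player's decision is a fixed point of retraining against the distribution induced by all players, the first of the paper's two solution concepts.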