Author

Anupam Datta

Bio: Anupam Datta is an academic researcher from Carnegie Mellon University. The author has contributed to research in topics: Personally identifiable information & Privacy policy. The author has an h-index of 39 and has co-authored 192 publications receiving 6,607 citations. Previous affiliations of Anupam Datta include Assam Medical College & Stanford University.


Papers
Proceedings ArticleDOI
16 May 2010
TL;DR: TrustVisor is presented, a special-purpose hypervisor that provides code integrity as well as data integrity and secrecy for selected portions of an application that has a very small code base that makes verification feasible.
Abstract: An important security challenge is to protect the execution of security-sensitive code on legacy systems from malware that may infect the OS, applications, or system devices. Prior work experienced a tradeoff between the level of security achieved and efficiency. In this work, we leverage the features of modern processors from AMD and Intel to overcome the tradeoff to simultaneously achieve a high level of security and high performance. We present TrustVisor, a special-purpose hypervisor that provides code integrity as well as data integrity and secrecy for selected portions of an application. TrustVisor achieves a high level of security, first because it can protect sensitive code at a very fine granularity, and second because it has a very small code base (only around 6K lines of code) that makes verification feasible. TrustVisor can also attest the existence of isolated execution to an external entity. We have implemented TrustVisor to protect security-sensitive code blocks while imposing less than 7% overhead on the legacy OS and its applications in the common case.

616 citations
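The attestation step mentioned in the abstract can be pictured with a small, hedged sketch: each protected code region is hashed and folded into a measurement register, and a verifier compares the reported register value against one recomputed from known-good code. This illustrates hash-chained measurement in general; it is not TrustVisor's actual micro-TPM implementation, and the code images used here are placeholders.

```python
"""Toy sketch of hash-chained measurement for attestation (illustrative only)."""
import hashlib


def extend(register: bytes, measurement: bytes) -> bytes:
    """TPM-style extend: new_register = H(old_register || H(measurement))."""
    return hashlib.sha256(register + hashlib.sha256(measurement).digest()).digest()


def measure(code_blocks) -> bytes:
    """Fold each sensitive code block into a register that starts at all zeros."""
    register = bytes(32)
    for block in code_blocks:
        register = extend(register, block)
    return register


# Verifier side: recompute the expected register from known-good code images
# (the byte strings below are placeholders for real code regions).
sensitive_code = [b"pal_entry_point", b"crypto_routine"]
reported = measure(sensitive_code)
expected = measure(sensitive_code)
print(reported.hex() == expected.hex())  # True only if the measured code is unmodified
```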

Proceedings ArticleDOI
22 May 2016
TL;DR: The transparency-privacy tradeoff is explored and it is proved that a number of useful transparency reports can be made differentially private with very little addition of noise.
Abstract: Algorithmic systems that employ machine learning play an increasing role in making substantive decisions in modern society, ranging from online personalization to insurance and credit decisions to predictive policing. But their decision-making processes are often opaque -- it is difficult to explain why a certain decision was made. We develop a formal foundation to improve the transparency of such decision-making systems. Specifically, we introduce a family of Quantitative Input Influence (QII) measures that capture the degree of influence of inputs on outputs of systems. These measures provide a foundation for the design of transparency reports that accompany system decisions (e.g., explaining a specific credit decision) and for testing tools useful for internal and external oversight (e.g., to detect algorithmic discrimination). Distinctively, our causal QII measures carefully account for correlated inputs while measuring influence. They support a general class of transparency queries and can, in particular, explain decisions about individuals (e.g., a loan decision) and groups (e.g., disparate impact based on gender). Finally, since single inputs may not always have high influence, the QII measures also quantify the joint influence of a set of inputs (e.g., age and income) on outcomes (e.g. loan decisions) and the marginal influence of individual inputs within such a set (e.g., income). Since a single input may be part of multiple influential sets, the average marginal influence of the input is computed using principled aggregation measures, such as the Shapley value, previously applied to measure influence in voting. Further, since transparency reports could compromise privacy, we explore the transparency-privacy tradeoff and prove that a number of useful transparency reports can be made differentially private with very little addition of noise. Our empirical validation with standard machine learning algorithms demonstrates that QII measures are a useful transparency mechanism when black box access to the learning system is available. In particular, they provide better explanations than standard associative measures for a host of scenarios that we consider. Further, we show that in the situations we consider, QII is efficiently approximable and can be made differentially private while preserving accuracy.

573 citations
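The simplest (unary) form of such an influence measure can be sketched in a few lines: intervene on one input by resampling it from its marginal over the dataset while holding the individual's other inputs fixed, and record how often the decision changes. This is an illustration of the idea under stated assumptions, not the authors' implementation; `model` stands for any classifier with a scikit-learn-style `predict`, and the sample count is arbitrary.

```python
"""Hedged sketch of a unary, intervention-based input-influence measure."""
import numpy as np


def unary_influence(model, X, x, feature, n_samples=2000, seed=0):
    """Fraction of interventions on `feature` that flip the decision for individual x."""
    rng = np.random.default_rng(seed)
    decision = model.predict(x.reshape(1, -1))[0]
    # Hold all of x's inputs fixed except `feature`, which is resampled from its marginal.
    X_intervened = np.tile(x.astype(float), (n_samples, 1))
    X_intervened[:, feature] = rng.choice(X[:, feature], size=n_samples, replace=True)
    return float(np.mean(model.predict(X_intervened) != decision))
```

Set and marginal influence are obtained by intervening on groups of inputs and aggregating each input's marginal contribution across groups, for example with Shapley-value weights as described in the abstract.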

Journal ArticleDOI
01 Apr 2015
TL;DR: AdFisher, an automated tool that explores how user behaviors, Google's ads, and Ad Settings interact, finds that Ad Settings is opaque about some features of a user's profile, that it does provide some choice over advertisements, and that these choices can lead to seemingly discriminatory ads.
Abstract: To partly address people's concerns over web tracking, Google has created the Ad Settings webpage to provide information about, and some choice over, the profiles Google creates on users. We present AdFisher, an automated tool that explores how user behaviors, Google's ads, and Ad Settings interact. Our tool uses a rigorous experimental design and analysis to ensure the statistical significance of our results. It uses machine learning to automate the selection of a statistical test. We use AdFisher to find that Ad Settings is opaque about some features of a user's profile, that it does provide some choice on ads, and that these choices can lead to seemingly discriminatory ads. In particular, we found that visiting webpages associated with substance abuse will change the ads shown but not the settings page. We also found that setting the gender to female results in getting fewer instances of an ad related to high-paying jobs than setting it to male.

544 citations
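The statistical core of such browser experiments can be illustrated with a classifier-based permutation test: train a classifier to distinguish treatment from control agents by the ads they were served, then ask how often shuffled group labels achieve the same cross-validated accuracy. The sketch below is a generic rendering of that idea, not AdFisher's code; the feature matrix, classifier choice, fold count, and permutation count are all assumptions.

```python
"""Hedged sketch of a classifier-based permutation test for group differences in ads."""
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def permutation_test(ad_features, group_labels, n_permutations=1000, seed=0):
    """Return the observed cross-validated accuracy and a one-sided permutation p-value."""
    rng = np.random.default_rng(seed)
    clf = LogisticRegression(max_iter=1000)

    def accuracy(labels):
        return cross_val_score(clf, ad_features, labels, cv=5).mean()

    observed = accuracy(group_labels)
    null = [accuracy(rng.permutation(group_labels)) for _ in range(n_permutations)]
    p_value = (1 + sum(score >= observed for score in null)) / (1 + n_permutations)
    return observed, p_value
```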

Proceedings ArticleDOI
21 May 2006
TL;DR: This work formalizes some aspects of contextual integrity in a logical framework for expressing and reasoning about norms of transmission of personal information to capture naturally many notions of privacy found in legislation, including those found in HIPAA, COPPA, and GLBA.
Abstract: Contextual integrity is a conceptual framework for understanding privacy expectations and their implications developed in the literature on law, public policy, and political philosophy. We formalize some aspects of contextual integrity in a logical framework for expressing and reasoning about norms of transmission of personal information. In comparison with access control and privacy policy frameworks such as RBAC, EPAL, and P3P, these norms focus on who personal information is about, how it is transmitted, and past and future actions by both the subject and the users of the information. Norms can be positive or negative depending on whether they refer to actions that are allowed or disallowed. Our model is expressive enough to capture naturally many notions of privacy found in legislation, including those found in HIPAA, COPPA, and GLBA. A number of important problems regarding compliance with privacy norms, future requirements associated with specific actions, and relations between policies and legal standards reduce to standard decision procedures for temporal logic.

449 citations
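A toy executable rendering may help make the positive/negative norm distinction concrete. The paper works in a temporal logic rather than in code, so the record type and example rules below are illustrative assumptions only.

```python
"""Toy check of a transmission trace against positive (allow) and negative (deny) norms."""
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Transmission:
    sender: str
    recipient: str
    subject: str      # whom the information is about
    attribute: str    # e.g. "medical-record"


Norm = Callable[[Transmission], bool]


def doctor_to_doctor(t: Transmission) -> bool:
    """Positive norm: a doctor may send a patient's medical record to another doctor."""
    return (t.attribute == "medical-record"
            and t.sender.startswith("dr_") and t.recipient.startswith("dr_"))


def no_marketing(t: Transmission) -> bool:
    """Negative norm: medical records must never flow to marketers."""
    return not (t.attribute == "medical-record" and t.recipient.startswith("marketer_"))


def compliant(trace: List[Transmission], positive: List[Norm], negative: List[Norm]) -> bool:
    """A trace complies if every transmission satisfies all negative norms
    and is permitted by at least one positive norm."""
    return all(all(n(t) for n in negative) and any(p(t) for p in positive) for t in trace)


trace = [Transmission("dr_alice", "dr_bob", "patient_carol", "medical-record")]
print(compliant(trace, positive=[doctor_to_doctor], negative=[no_marketing]))  # True
```

The temporal aspects of the framework (obligations about past and future actions) are not captured by this static check.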

Journal ArticleDOI
TL;DR: Protocol Composition Logic (PCL) supports compositional reasoning about complex security protocols and has been applied to a number of industry standards, including SSL/TLS, IEEE 802.11i, and Kerberos V5.

211 citations


Cited by
Journal ArticleDOI
01 Apr 1988-Nature
TL;DR: In this paper, a sedimentological core and petrographic characterisation of samples from eleven boreholes from the Lower Carboniferous of Bowland Basin (Northwest England) is presented.
Abstract: Deposits of clastic carbonate-dominated (calciclastic) sedimentary slope systems in the rock record have been identified mostly as linearly-consistent carbonate apron deposits, even though most ancient clastic carbonate slope deposits fit submarine fan systems better. Calciclastic submarine fans are consequently rarely described and are poorly understood, and very little is known about mud-dominated calciclastic submarine fan systems in particular. Presented in this study is a sedimentological core and petrographic characterisation of samples from eleven boreholes from the Lower Carboniferous of the Bowland Basin (Northwest England) that reveals a >250 m thick calciturbidite complex deposited in a calciclastic submarine fan setting. Seven facies are recognised from core and thin-section characterisation and are grouped into three carbonate turbidite sequences. They include: 1) calciturbidites, comprising mostly high- to low-density, wavy-laminated, bioclast-rich facies; 2) low-density densite mudstones, characterised by planar-laminated and unlaminated mud-dominated facies; and 3) calcidebrites, which are muddy or hyper-concentrated debris-flow deposits occurring as poorly-sorted, chaotic, mud-supported floatstones. These

9,929 citations

Proceedings Article
04 Dec 2017
TL;DR: In this article, a unified framework for interpreting predictions, SHAP (SHapley Additive exPlanations), is presented, which assigns each feature an importance value for a particular prediction.
Abstract: Understanding why a model makes a certain prediction can be as crucial as the prediction's accuracy in many applications. However, the highest accuracy for large modern datasets is often achieved by complex models that even experts struggle to interpret, such as ensemble or deep learning models, creating a tension between accuracy and interpretability. In response, various methods have recently been proposed to help users interpret the predictions of complex models, but it is often unclear how these methods are related and when one method is preferable over another. To address this problem, we present a unified framework for interpreting predictions, SHAP (SHapley Additive exPlanations). SHAP assigns each feature an importance value for a particular prediction. Its novel components include: (1) the identification of a new class of additive feature importance measures, and (2) theoretical results showing there is a unique solution in this class with a set of desirable properties. The new class unifies six existing methods, notable because several recent methods in the class lack the proposed desirable properties. Based on insights from this unification, we present new methods that show improved computational performance and/or better consistency with human intuition than previous approaches.

7,309 citations
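The additive-attribution idea can be verified directly on a toy model: exact Shapley values of a coalition value function that pins the features in a subset S to the individual's values and averages the rest over a background sample satisfy "local accuracy", i.e. they sum to f(x) minus the base value. The brute-force computation below illustrates that property under these assumptions; it is not the paper's algorithm or the shap package.

```python
"""Brute-force Shapley attributions for a toy model, checked against local accuracy."""
from itertools import combinations
from math import factorial

import numpy as np


def f(X):
    """Toy model: a fixed linear function of three features."""
    return X @ np.array([2.0, -1.0, 0.5])


def value(S, x, background):
    """E[f] with the features in S pinned to x and the rest drawn from the background."""
    Xs = background.copy()
    Xs[:, list(S)] = x[list(S)]
    return f(Xs).mean()


def shapley_attributions(x, background):
    d = len(x)
    phi = np.zeros(d)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for k in range(d):
            for S in combinations(others, k):
                weight = factorial(len(S)) * factorial(d - len(S) - 1) / factorial(d)
                phi[i] += weight * (value(set(S) | {i}, x, background) - value(S, x, background))
    return phi


rng = np.random.default_rng(0)
background = rng.normal(size=(1000, 3))
x = np.array([1.0, 2.0, 3.0])
phi = shapley_attributions(x, background)
# Local accuracy: base value + sum of attributions equals f(x) for this value function.
print(phi, f(background).mean() + phi.sum(), f(x.reshape(1, -1))[0])
```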

Journal ArticleDOI
TL;DR: In this paper, a taxonomy of recent contributions related to the explainability of different machine learning models is presented, including those aimed at explaining Deep Learning methods, for which a second dedicated taxonomy is built and examined in detail.

2,827 citations

Journal ArticleDOI
TL;DR: In this paper, the authors provide a classification of the main problems addressed in the literature with respect to the notion of explanation and the type of black box decision support system; given a problem definition, a black box type, and a desired explanation, this survey should help researchers find the proposals most useful for their own work.
Abstract: In recent years, many accurate decision support systems have been constructed as black boxes, that is, as systems that hide their internal logic from the user. This lack of explanation constitutes both a practical and an ethical issue. The literature reports many approaches aimed at overcoming this crucial weakness, sometimes at the cost of sacrificing accuracy for interpretability. The applications in which black box decision systems can be used are various, and each approach is typically developed to provide a solution for a specific problem; as a consequence, it explicitly or implicitly delineates its own definition of interpretability and explanation. The aim of this article is to provide a classification of the main problems addressed in the literature with respect to the notion of explanation and the type of black box system. Given a problem definition, a black box type, and a desired explanation, this survey should help researchers find the proposals most useful for their own work. The proposed classification of approaches to open black box models should also be useful for putting the many open research questions in perspective.

2,805 citations

Proceedings Article
06 Aug 2017
TL;DR: In this article, the authors identify two fundamental axioms (sensitivity and implementation invariance) that attribution methods ought to satisfy and use them to guide the design of a new attribution method called Integrated Gradients.
Abstract: We study the problem of attributing the prediction of a deep network to its input features, a problem previously studied by several other works. We identify two fundamental axioms, Sensitivity and Implementation Invariance, that attribution methods ought to satisfy. We show that they are not satisfied by most known attribution methods, which we consider to be a fundamental weakness of those methods. We use the axioms to guide the design of a new attribution method called Integrated Gradients. Our method requires no modification to the original network and is extremely simple to implement; it just needs a few calls to the standard gradient operator. We apply this method to a couple of image models, a couple of text models and a chemistry model, demonstrating its ability to debug networks, to extract rules from a network, and to enable users to engage with models better.

2,712 citations
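The method itself is compact enough to sketch: approximate the path integral of gradients from a baseline to the input with a Riemann sum and scale by the input-baseline difference. The toy function, its hand-written gradient, the zero baseline, and the step count below are illustrative assumptions, not the authors' released implementation.

```python
"""Hedged sketch of Integrated Gradients via a Riemann-sum approximation of the path integral."""
import numpy as np


def integrated_gradients(grad_fn, x, baseline, steps=256):
    """IG_i(x) ~= (x_i - baseline_i) * average of dF/dx_i along the straight-line path."""
    alphas = np.linspace(0.0, 1.0, steps)
    path_grads = np.array([grad_fn(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * path_grads.mean(axis=0)


# Toy scalar model F(z) = tanh(z0) * z1 + z2^2 and its analytic gradient.
F = lambda z: np.tanh(z[0]) * z[1] + z[2] ** 2
grad_F = lambda z: np.array([(1 - np.tanh(z[0]) ** 2) * z[1], np.tanh(z[0]), 2 * z[2]])

x = np.array([1.0, 2.0, 3.0])
baseline = np.zeros_like(x)
attributions = integrated_gradients(grad_F, x, baseline)
# Completeness: attributions sum to F(x) - F(baseline), up to discretization error.
print(attributions, attributions.sum(), F(x) - F(baseline))
```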