Posted Content

Increasing Trust in AI Services through Supplier's Declarations of Conformity

TL;DR: In this article, a supplier's declaration of conformity (SDoC) for artificial intelligence (AI) services is proposed to help increase trust in AI services. An SDoC is a transparent, standardized, but often not legally required, document used in many industries and sectors to describe the lineage of a product along with the safety and performance testing it has undergone.
Abstract: The accuracy and reliability of machine learning algorithms are an important concern for suppliers of artificial intelligence (AI) services, but considerations beyond accuracy, such as safety, security, and provenance, are also critical elements to engender consumers' trust in a service. In this paper, we propose a supplier's declaration of conformity (SDoC) for AI services to help increase trust in AI services. An SDoC is a transparent, standardized, but often not legally required, document used in many industries and sectors to describe the lineage of a product along with the safety and performance testing it has undergone. We envision an SDoC for AI services to contain purpose, performance, safety, security, and provenance information to be completed and voluntarily released by AI service providers for examination by consumers. Importantly, it conveys product-level rather than component-level functional testing. We suggest a set of declaration items tailored to AI and provide examples for two fictitious AI services.
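To make the declaration items concrete, the sketch below shows how the purpose, performance, safety, security, and provenance sections of an SDoC might be captured as a machine-readable record. The field names and example values are illustrative assumptions for a fictitious service, not a schema defined in the paper.

    # Hypothetical sketch of an SDoC as a structured record. Section names follow
    # the abstract (purpose, performance, safety, security, provenance); the
    # individual fields and values are assumptions for a fictitious service.
    sdoc = {
        "service": "Example image-tagging API",
        "purpose": {
            "intended_use": "Tagging consumer photos",
            "out_of_scope": ["Medical imaging", "Surveillance"],
        },
        "performance": {
            "test_level": "Product-level functional testing on a held-out set",
            "metrics": {"accuracy": 0.93, "f1": 0.91},  # illustrative numbers
        },
        "safety": {
            "bias_testing": "Disparate impact checked across documented groups",
            "explainability": "Per-prediction explanations available on request",
        },
        "security": {
            "adversarial_robustness": "Evaluated against known attack types",
            "data_protection": "Encrypted in transit and at rest",
        },
        "provenance": {
            "training_data_sources": ["Licensed stock photo corpus"],
            "last_retrained": "2019-01-15",
        },
    }

    def missing_sections(doc, required=("purpose", "performance", "safety", "security", "provenance")):
        """Return declaration sections the supplier has not yet completed."""
        return [s for s in required if not doc.get(s)]

    print(missing_sections(sdoc))  # -> [] once every section is filled in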
Citations
Posted Content
TL;DR: Documentation to facilitate communication between dataset creators and dataset consumers is presented.
Abstract: The machine learning community currently has no standardized process for documenting datasets, which can lead to severe consequences in high-stakes domains. To address this gap, we propose datasheets for datasets. In the electronics industry, every component, no matter how simple or complex, is accompanied with a datasheet that describes its operating characteristics, test results, recommended uses, and other information. By analogy, we propose that every dataset be accompanied with a datasheet that documents its motivation, composition, collection process, recommended uses, and so on. Datasheets for datasets will facilitate better communication between dataset creators and dataset consumers, and encourage the machine learning community to prioritize transparency and accountability.
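As a rough illustration of the analogy to electronics datasheets, the snippet below sketches a dataset accompanied by a datasheet-style record covering motivation, composition, collection process, and recommended uses. The field names and example values are assumptions for illustration, not the question set from the datasheets proposal.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Datasheet:
        """Hypothetical datasheet fields mirroring the categories in the abstract."""
        motivation: str
        composition: str
        collection_process: str
        recommended_uses: List[str]
        known_limitations: List[str] = field(default_factory=list)

    sheet = Datasheet(
        motivation="Benchmark sentiment classification for product reviews",
        composition="50k English reviews, one star rating per review",
        collection_process="Collected from a public review site, then deduplicated",
        recommended_uses=["Sentiment model training", "Error analysis"],
        known_limitations=["English only", "Domain-specific vocabulary"],
    )
    print(sheet.recommended_uses)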

1,080 citations

Posted Content
TL;DR: CTRL, a 1.63 billion-parameter conditional transformer language model trained to condition on control codes that govern style, content, and task-specific behavior, is released, providing more explicit control over text generation.
Abstract: Large-scale language models show promising text generation capabilities, but users cannot easily control particular aspects of the generated text. We release CTRL, a 1.63 billion-parameter conditional transformer language model, trained to condition on control codes that govern style, content, and task-specific behavior. Control codes were derived from structure that naturally co-occurs with raw text, preserving the advantages of unsupervised learning while providing more explicit control over text generation. These codes also allow CTRL to predict which parts of the training data are most likely given a sequence. This provides a potential method for analyzing large amounts of data via model-based source attribution. We have released multiple full-sized, pretrained versions of CTRL at this https URL.
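The sketch below illustrates the control-code idea using the released CTRL checkpoint as distributed through the Hugging Face transformers library; the checkpoint name, control code, and generation settings are assumptions chosen for illustration, not details from the abstract.

    # Illustrative use of a control code with the released CTRL checkpoint
    # (assuming the Hugging Face transformers package and the Salesforce/ctrl
    # model are available). The control code "Reviews" is prepended to the
    # prompt to steer generation toward review-style text.
    from transformers import CTRLTokenizer, CTRLLMHeadModel

    tokenizer = CTRLTokenizer.from_pretrained("Salesforce/ctrl")
    model = CTRLLMHeadModel.from_pretrained("Salesforce/ctrl")

    prompt = "Reviews A knife that"  # control code + prompt text
    input_ids = tokenizer.encode(prompt, return_tensors="pt")

    # repetition_penalty value is illustrative, not prescribed by the paper.
    output_ids = model.generate(input_ids, max_length=50, repetition_penalty=1.2)
    print(tokenizer.decode(output_ids[0]))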

844 citations

Proceedings ArticleDOI
TL;DR: This work proposes model cards, a framework that can be used to document any trained machine learning model, focusing on the application fields of computer vision and natural language processing, and provides cards for two supervised models: one trained to detect smiling faces in images, and one trained to detect toxic comments in text.
Abstract: Trained machine learning models are increasingly used to perform high-impact tasks in areas such as law enforcement, medicine, education, and employment. In order to clarify the intended use cases of machine learning models and minimize their usage in contexts for which they are not well suited, we recommend that released models be accompanied by documentation detailing their performance characteristics. In this paper, we propose a framework that we call model cards, to encourage such transparent model reporting. Model cards are short documents accompanying trained machine learning models that provide benchmarked evaluation in a variety of conditions, such as across different cultural, demographic, or phenotypic groups (e.g., race, geographic location, sex, Fitzpatrick skin type) and intersectional groups (e.g., age and race, or sex and Fitzpatrick skin type) that are relevant to the intended application domains. Model cards also disclose the context in which models are intended to be used, details of the performance evaluation procedures, and other relevant information. While we focus primarily on human-centered machine learning models in the application fields of computer vision and natural language processing, this framework can be used to document any trained machine learning model. To solidify the concept, we provide cards for two supervised models: One trained to detect smiling faces in images, and one trained to detect toxic comments in text. We propose model cards as a step towards the responsible democratization of machine learning and related AI technology, increasing transparency into how well AI technology works. We hope this work encourages those releasing trained machine learning models to accompany model releases with similar detailed evaluation numbers and other relevant documentation.
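A core element of a model card is benchmarked evaluation reported separately for relevant groups. The short sketch below computes per-group accuracy from labeled predictions; the group names and toy data are purely illustrative.

    from collections import defaultdict

    def per_group_accuracy(records):
        """records: iterable of (group, true_label, predicted_label).
        Returns accuracy disaggregated by group, as a model card might report it."""
        correct, total = defaultdict(int), defaultdict(int)
        for group, y_true, y_pred in records:
            total[group] += 1
            correct[group] += int(y_true == y_pred)
        return {g: correct[g] / total[g] for g in total}

    # Illustrative toy data: (group, true label, predicted label)
    records = [
        ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 0),
        ("group_b", 1, 1), ("group_b", 0, 1),
    ]
    print(per_group_accuracy(records))  # -> {'group_a': 0.666..., 'group_b': 0.5}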

744 citations

Journal ArticleDOI
11 Jul 2019
TL;DR: A framework for identifying a broad range of menaces in the research and practices around social data is presented, including biases and inaccuracies introduced at the source of the data as well as during processing.
Abstract: Social data in digital form—including user-generated content, expressed or implicit relations between people, and behavioral traces—are at the core of popular applications and platforms, driving the research agenda of many researchers. The promises of social data are many, including understanding “what the world thinks” about a social issue, brand, celebrity, or other entity, as well as enabling better decision-making in a variety of fields including public policy, healthcare, and economics. Many academics and practitioners have warned against the naive usage of social data. There are biases and inaccuracies occurring at the source of the data, but also introduced during processing. There are methodological limitations and pitfalls, as well as ethical boundaries and unexpected consequences that are often overlooked. The rigor with which these issues are addressed by different researchers varies across a wide range. We identify a variety of menaces in the practices around social data use and organize them into a framework. “For your own sanity, you have to remember that not all problems can be solved. Not all problems can be solved, but all problems can be illuminated.” –Ursula Franklin

379 citations

Posted Content
TL;DR: This report suggests various steps that different stakeholders can take to improve the verifiability of claims made about AI systems and their associated development processes, with a focus on providing evidence about the safety, security, fairness, and privacy protection of AI systems.
Abstract: With the recent wave of progress in artificial intelligence (AI) has come a growing awareness of the large-scale impacts of AI systems, and recognition that existing regulations and norms in industry and academia are insufficient to ensure responsible AI development. In order for AI developers to earn trust from system users, customers, civil society, governments, and other stakeholders that they are building AI responsibly, they will need to make verifiable claims to which they can be held accountable. Those outside of a given organization also need effective means of scrutinizing such claims. This report suggests various steps that different stakeholders can take to improve the verifiability of claims made about AI systems and their associated development processes, with a focus on providing evidence about the safety, security, fairness, and privacy protection of AI systems. We analyze ten mechanisms for this purpose--spanning institutions, software, and hardware--and make recommendations aimed at implementing, exploring, or improving those mechanisms.

191 citations

References
Book
01 Jan 2013
Abstract:
1. The integrated wholeness of the organism must be one of the foundation stones of motivation theory.
2. The hunger drive (or any other physiological drive) was rejected as a centering point or model for a definitive theory of motivation. Any drive that is somatically based and localizable was shown to be atypical rather than typical in human motivation.
3. Such a theory should stress and center itself upon ultimate or basic goals rather than partial or superficial ones, upon ends rather than means to these ends. Such a stress would imply a more central place for unconscious than for conscious motivations.
4. There are usually available various cultural paths to the same goal. Therefore conscious, specific, local-cultural desires are not as fundamental in motivation theory as the more basic, unconscious goals.
5. Any motivated behavior, either preparatory or consummatory, must be understood to be a channel through which many basic needs may be simultaneously expressed or satisfied. Typically an act has more than one motivation.
6. Practically all organismic states are to be understood as motivated and as motivating.
7. Human needs arrange themselves in hierarchies of prepotency. That is to say, the appearance of one need usually rests on the prior satisfaction of another, more pre-potent need. Man is a perpetually wanting animal. Also no need or drive can be treated as if it were isolated or discrete; every drive is related to the state of satisfaction or dissatisfaction of other drives.
8. Lists of drives will get us nowhere for various theoretical and practical reasons. Furthermore any classification of motivations

18,001 citations

Journal ArticleDOI
TL;DR: In this paper, the authors present a struggling attempt to give structure to the statement: "Business in under-developed countries is difficult"; in particular, a structure is given for determining the economic costs of dishonesty.
Abstract: This paper relates quality and uncertainty. The existence of goods of many grades poses interesting and important problems for the theory of markets. On the one hand, the interaction of quality differences and uncertainty may explain important institutions of the labor market. On the other hand, this paper presents a struggling attempt to give structure to the statement: “Business in under-developed countries is difficult”; in particular, a structure is given for determining the economic costs of dishonesty. Additional applications of the theory include comments on the structure of money markets, on the notion of “insurability,” on the liquidity of durables, and on brand-name goods.
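One way to see the quality-uncertainty mechanism is a toy market in which buyers observe only average quality. The numerical sketch below, with made-up quality values, shows high-quality sellers withdrawing until only the lowest-quality goods trade, the adverse-selection spiral the paper formalizes.

    # Toy "market for lemons": sellers know their car's quality, buyers only the
    # average. Sellers keep cars worth more than the offered price, so average
    # quality, and hence the price buyers will pay, ratchets down. Quality
    # values are made up for illustration.
    qualities = [200, 400, 600, 800, 1000]  # seller reservation values
    buyer_premium = 1.1                      # buyers value a car at 1.1x its quality

    remaining = list(qualities)
    while remaining:
        avg_quality = sum(remaining) / len(remaining)
        price = buyer_premium * avg_quality          # offer based on average quality only
        still_selling = [q for q in remaining if q <= price]
        print(f"offer {price:.0f}: {len(still_selling)} of {len(remaining)} sellers stay")
        if still_selling == remaining:               # market has stabilized
            break
        remaining = still_selling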

17,764 citations

Journal ArticleDOI
TL;DR: In this paper, the mean absolute scaled error (MASE) is proposed as the standard measure for comparing forecast accuracy across multiple time series, and is applied to data from the M-competition as well as the M3-competition.
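For reference, the mean absolute scaled error scales forecast errors by the in-sample mean absolute error of the naive one-step-ahead forecast. A minimal sketch follows; the toy series and forecasts are assumed for illustration.

    def mase(y_train, y_true, y_pred):
        """Mean absolute scaled error: out-of-sample forecast errors scaled by the
        in-sample mean absolute error of the naive one-step-ahead forecast."""
        naive_mae = sum(abs(a - b) for a, b in zip(y_train[1:], y_train[:-1])) / (len(y_train) - 1)
        return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / (len(y_true) * naive_mae)

    # Toy example: training series, then actuals and forecasts for the test period.
    y_train = [10, 12, 11, 13, 14]
    y_true, y_pred = [15, 16], [14, 17]
    print(mase(y_train, y_true, y_pred))  # values below 1 beat the in-sample naive forecast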

3,870 citations

Proceedings ArticleDOI
01 Jan 2010
TL;DR: The current relationship between statistics and Python and open source more generally is discussed, outlining how the statsmodels package fills a gap in this relationship.
Abstract: Statsmodels is a library for statistical and econometric analysis in Python. This paper discusses the current relationship between statistics and Python and open source more generally, outlining how the statsmodels package fills a gap in this relationship. An overview of statsmodels is provided, including a discussion of the overarching design and philosophy, what can be found in the package, and some usage examples. The paper concludes with a look at what the future holds.
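As a minimal illustration of the kind of analysis the package supports, the snippet below fits an ordinary least squares regression with statsmodels; the toy data are generated for illustration.

    # Minimal OLS example using statsmodels (numpy used only for the toy data).
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    x = rng.normal(size=100)
    y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=100)

    X = sm.add_constant(x)        # add an intercept column
    results = sm.OLS(y, X).fit()
    print(results.params)         # estimated intercept and slope, near [1.0, 2.0]
    print(results.summary())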

3,116 citations

Posted Content
TL;DR: This position paper defines interpretability and describes when interpretability is needed (and when it is not), and suggests a taxonomy for rigorous evaluation and exposes open questions towards a more rigorous science of interpretable machine learning.
Abstract: As machine learning systems become ubiquitous, there has been a surge of interest in interpretable machine learning: systems that provide explanation for their outputs. These explanations are often used to qualitatively assess other criteria such as safety or non-discrimination. However, despite the interest in interpretability, there is very little consensus on what interpretable machine learning is and how it should be measured. In this position paper, we first define interpretability and describe when interpretability is needed (and when it is not). Next, we suggest a taxonomy for rigorous evaluation and expose open questions towards a more rigorous science of interpretable machine learning.

2,589 citations

Trending Questions (1)
How can AI deployments be used to increase trust?

The paper proposes the use of a supplier's declaration of conformity (SDoC) for AI services to increase trust. The SDoC would contain information about the purpose, performance, safety, security, and provenance of the AI service.