scispace - formally typeset
Author

Daniela Witten

Bio: Daniela Witten is a academic researcher at University of Washington who has co-authored 140 publication(s) receiving 24957 citation(s). The author has an hindex of 47. Previous affiliations of Daniela Witten include Institute for Advanced Study & Stanford University. The author has done significant research in the topic(s): Lasso (statistics) & Graphical model.

...read more

Papers
  More

01 Jan 2013-
Abstract: Statistics An Intduction to Stistical Lerning with Applications in R An Introduction to Statistical Learning provides an accessible overview of the fi eld of statistical learning, an essential toolset for making sense of the vast and complex data sets that have emerged in fi elds ranging from biology to fi nance to marketing to astrophysics in the past twenty years. Th is book presents some of the most important modeling and prediction techniques, along with relevant applications. Topics include linear regression, classifi cation, resampling methods, shrinkage approaches, tree-based methods, support vector machines, clustering, and more. Color graphics and real-world examples are used to illustrate the methods presented. Since the goal of this textbook is to facilitate the use of these statistical learning techniques by practitioners in science, industry, and other fi elds, each chapter contains a tutorial on implementing the analyses and methods presented in R, an extremely popular open source statistical soft ware platform. Two of the authors co-wrote Th e Elements of Statistical Learning (Hastie, Tibshirani and Friedman, 2nd edition 2009), a popular reference book for statistics and machine learning researchers. An Introduction to Statistical Learning covers many of the same topics, but at a level accessible to a much broader audience. Th is book is targeted at statisticians and non-statisticians alike who wish to use cutting-edge statistical learning techniques to analyze their data. Th e text assumes only a previous course in linear regression and no knowledge of matrix algebra.

...read more

6,116 Citations


Open accessJournal ArticleDOI: 10.1038/NG.2892
Martin Kircher1, Daniela Witten1, Preti Jain, Brian J. O'Roak2  +3 moreInstitutions (2)
01 Mar 2014-Nature Genetics
Abstract: Our capacity to sequence human genomes has exceeded our ability to interpret genetic variation. Current genomic annotations tend to exploit a single information type (e.g. conservation) and/or are restricted in scope (e.g. to missense changes). Here, we describe Combined Annotation Dependent Depletion (CADD), a framework that objectively integrates many diverse annotations into a single, quantitative score. We implement CADD as a support vector machine trained to differentiate 14.7 million high-frequency human derived alleles from 14.7 million simulated variants. We pre-compute “C-scores” for all 8.6 billion possible human single nucleotide variants and enable scoring of short insertions/deletions. C-scores correlate with allelic diversity, annotations of functionality, pathogenicity, disease severity, experimentally measured regulatory effects, and complex trait associations, and highly rank known pathogenic variants within individual genomes. The ability of CADD to prioritize functional, deleterious, and pathogenic variants across many functional categories, effect sizes and genetic architectures is unmatched by any current annotation.

...read more

Topics: Genome-wide association study (54%), Genomics (51%)

4,148 Citations


Open accessBook
29 Jul 2021-
Abstract: An Introduction to Statistical Learning provides an accessible overview of the field of statistical learning, an essential toolset for making sense of the vast and complex data sets that have emerged in fields ranging from biology to finance to marketing to astrophysics in the past twenty years. This book presents some of the most important modeling and prediction techniques, along with relevant applications. Topics include linear regression, classification, resampling methods, shrinkage approaches, tree-based methods, support vector machines, clustering, and more. Color graphics and real-world examples are used to illustrate the methods presented. Since the goal of this textbook is to facilitate the use of these statistical learning techniques by practitioners in science, industry, and other fields, each chapter contains a tutorial on implementing the analyses and methods presented in R, an extremely popular open source statistical software platform. Two of the authors co-wrote The Elements of Statistical Learning (Hastie, Tibshirani and Friedman, 2nd edition 2009), a popular reference book for statistics and machine learning researchers. An Introduction to Statistical Learning covers many of the same topics, but at a level accessible to a much broader audience. This book is targeted at statisticians and non-statisticians alike who wish to use cutting-edge statistical learning techniques to analyze their data. The text assumes only a previous course in linear regression and no knowledge of matrix algebra.

...read more

2,857 Citations


Open accessJournal ArticleDOI: 10.1093/BIOSTATISTICS/KXP008
01 Jul 2009-Biostatistics
Abstract: SUMMARY We present a penalized matrix decomposition (PMD), a new framework for computing a rank-K approximation for a matrix. We approximate the matrix X as ˆ X = � K=1 dkukv T , where dk, uk, and

...read more

Topics: Matrix decomposition (66%), Sparse matrix (64%), Symmetric matrix (64%) ...read more

1,363 Citations


Open accessJournal ArticleDOI: 10.1093/NAR/GKY1016
Abstract: Combined Annotation-Dependent Depletion (CADD) is a widely used measure of variant deleteriousness that can effectively prioritize causal variants in genetic analyses, particularly highly penetrant contributors to severe Mendelian disorders. CADD is an integrative annotation built from more than 60 genomic features, and can score human single nucleotide variants and short insertion and deletions anywhere in the reference assembly. CADD uses a machine learning model trained on a binary distinction between simulated de novo variants and variants that have arisen and become fixed in human populations since the split between humans and chimpanzees; the former are free of selective pressure and may thus include both neutral and deleterious alleles, while the latter are overwhelmingly neutral (or, at most, weakly deleterious) by virtue of having survived millions of years of purifying selection. Here we review the latest updates to CADD, including the most recent version, 1.4, which supports the human genome build GRCh38. We also present updates to our website that include simplified variant lookup, extended documentation, an Application Program Interface and improved mechanisms for integrating CADD scores into other tools or applications. CADD scores, software and documentation are available at https://cadd.gs.washington.edu.

...read more

1,204 Citations


Cited by
  More

Open accessJournal ArticleDOI: 10.1186/S13059-014-0550-8
05 Dec 2014-Genome Biology
Abstract: In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-seq, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression. The DESeq2 package is available at http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html .

...read more

Topics: MRNA Sequencing (54%), Integrator complex (51%), Count data (50%) ...read more

29,675 Citations


Open accessJournal Article
Abstract: THE DESIGN AND ANALYSIS OF EXPERIMENTS. By Oscar Kempthorne. New York, John Wiley and Sons, Inc., 1952. 631 pp. $8.50. This book by a teacher of statistics (as well as a consultant for \"experimenters\") is a comprehensive study of the philosophical background for the statistical design of experiment. It is necessary to have some facility with algebraic notation and manipulation to be able to use the volume intelligently. The problems are presented from the theoretical point of view, without such practical examples as would be helpful for those not acquainted with mathematics. The mathematical justification for the techniques is given. As a somewhat advanced treatment of the design and analysis of experiments, this volume will be interesting and helpful for many who approach statistics theoretically as well as practically. With emphasis on the \"why,\" and with description given broadly, the author relates the subject matter to the general theory of statistics and to the general problem of experimental inference. MARGARET J. ROBERTSON

...read more

12,326 Citations


Open accessJournal ArticleDOI: 10.1038/GIM.2015.30
Sue Richards1, Nazneen Aziz2, Nazneen Aziz3, Sherri J. Bale4  +9 moreInstitutions (11)
Abstract: Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology

...read more

11,349 Citations


Open accessJournal Article
Fumio Tajima1Institutions (1)
30 Oct 1989-Genomics
Abstract: The relationship between the two estimates of genetic variation at the DNA level, namely the number of segregating sites and the average number of nucleotide differences estimated from pairwise comparison, is investigated. It is found that the correlation between these two estimates is large when the sample size is small, and decreases slowly as the sample size increases. Using the relationship obtained, a statistical method for testing the neutral mutation hypothesis is developed. This method needs only the data of DNA polymorphism, namely the genetic variation within population at the DNA level. A simple method of computer simulation, that was used in order to obtain the distribution of a new statistic developed, is also presented. Applying this statistical method to the five regions of DNA sequences in Drosophila melanogaster, it is found that large insertion/deletion (greater than 100 bp) is deleterious. It is suggested that the natural selection against large insertion/deletion is so weak that a large amount of variation is maintained in a population.

...read more

Topics: Neutral mutation (54%), Population (54%), Genetic variation (52%) ...read more

10,734 Citations


Open accessBook
01 Jan 1998-

9,675 Citations


Performance
Metrics

Author's H-index: 47

No. of papers from the Author in previous years
YearPapers
20219
202012
201916
20188
20178
201610

Top Attributes

Show by:

Author's top 5 most impactful journals

arXiv: Methodology

23 papers, 750 citations

Biostatistics

6 papers, 1.7K citations

arXiv: Machine Learning

6 papers, 109 citations

bioRxiv

6 papers, 139 citations

The Annals of Applied Statistics

5 papers, 286 citations

Network Information
Related Authors (5)
William J Salerno

40 papers, 1.5K citations

95% related
Weixin Wang

6 papers, 243 citations

95% related
Devanshi Patel

8 papers, 171 citations

94% related
Waleed Nasser

16 papers, 833 citations

94% related
Jan Konrad

1 papers, 126 citations

94% related