Categorical data analysis

12 Nov 2013-Iss: 1
TL;DR: Categorical data analysis (CDA), as discussed by the authors; Electronic and Digital Library - Azarsa
Abstract: Categorical data analysis; Electronic and Digital Library - Azarsa
Citations
Journal Article
TL;DR: This article discusses the applicability of statistical inference to estimating fragility functions from dynamic structural analysis, an important step in a number of seismic assessment procedures.
Abstract: Estimation of fragility functions using dynamic structural analysis is an important step in a number of seismic assessment procedures. This paper discusses the applicability of statistical inferenc...

896 citations

Journal Article
TL;DR: The authors' scores achieved the highest discriminative power compared with all the deleteriousness prediction scores tested and showed low false-positive prediction rate for benign yet rare nonsynonymous variants, which demonstrated the value of combining information from multiple orthologous approaches.
Abstract: Accurate deleteriousness prediction for nonsynonymous variants is crucial for distinguishing pathogenic mutations from background polymorphisms in whole exome sequencing (WES) studies. Although many deleteriousness prediction methods have been developed, their prediction results are sometimes inconsistent with each other and their relative merits are still unclear in practical applications. To address these issues, we comprehensively evaluated the predictive performance of 18 current deleteriousness-scoring methods, including 11 function prediction scores (PolyPhen-2, SIFT, MutationTaster, Mutation Assessor, FATHMM, LRT, PANTHER, PhD-SNP, SNAP, SNPs&GO and MutPred), 3 conservation scores (GERP++, SiPhy and PhyloP) and 4 ensemble scores (CADD, PON-P, KGGSeq and CONDEL). We found that FATHMM and KGGSeq had the highest discriminative power among independent scores and ensemble scores, respectively. Moreover, to ensure unbiased performance evaluation of these prediction scores, we manually collected three distinct testing datasets, on which no current prediction scores were tuned. In addition, we developed two new ensemble scores that integrate nine independent scores and allele frequency. Our scores achieved the highest discriminative power compared with all the deleteriousness prediction scores tested and showed low false-positive prediction rate for benign yet rare nonsynonymous variants, which demonstrated the value of combining information from multiple orthologous approaches. Finally, to facilitate variant prioritization in WES studies, we have pre-computed our ensemble scores for 87 347 044 possible variants in the whole-exome and made them publicly available through the ANNOVAR software and the dbNSFP database.

878 citations

Journal Article
TL;DR: This article examines the common practice, in research using logit and probit models, of comparing the coefficients of a given variable across differently specified models.
Abstract: Logit and probit models are widely used in empirical sociological research. However, the common practice of comparing the coefficients of a given variable across differently specified models fitted...

847 citations

Book
01 May 2014
TL;DR: This textbook for senior undergraduate and graduate data mining courses provides a broad yet in-depth overview of data mining, integrating related concepts from machine learning and statistics.
Abstract: The fundamental algorithms in data mining and analysis form the basis for the emerging field of data science, which includes automated methods to analyze patterns and models for all kinds of data, with applications ranging from scientific discovery to business intelligence and analytics. This textbook for senior undergraduate and graduate data mining courses provides a broad yet in-depth overview of data mining, integrating related concepts from machine learning and statistics. The main parts of the book include exploratory data analysis, pattern mining, clustering, and classification. The book lays the basic foundations of these tasks, and also covers cutting-edge topics such as kernel methods, high-dimensional data analysis, and complex graphs and networks. With its comprehensive coverage, algorithmic perspective, and wealth of examples, this book offers solid guidance in data mining for students, researchers, and practitioners alike. Key features: covers both core methods and cutting-edge research; algorithmic approach with open-source implementations; minimal prerequisites: all key mathematical concepts are presented, as is the intuition behind the formulas; short, self-contained chapters with class-tested examples and exercises allow for flexibility in designing a course and for easy reference; supplementary website with lecture slides, videos, project ideas, and more.

844 citations

Journal Article
TL;DR: This paper covers the essentials of calculating power and sample size for a variety of applied study designs.
Abstract: Determining the optimal sample size for a study assures adequate power to detect statistical significance. Hence, it is a critical step in the design of a planned research protocol. Using too many participants in a study is expensive and exposes more subjects to the procedure. Similarly, if a study is underpowered, it will be statistically inconclusive and may make the whole protocol a failure. This paper covers the essentials in calculating power and sample size for a variety of applied study designs. Sample size computation for a single group mean, survey-type studies, two-group studies based on means and proportions or rates, correlation studies, and case-control studies assessing a categorical outcome is presented in detail.

691 citations