Author

T. Tony Cai

Bio: T. Tony Cai is an academic researcher from the University of Pennsylvania. The author has contributed to research in topics: Estimator & Minimax. The author has an h-index of 80 and has co-authored 550 publications receiving 24,841 citations. Previous affiliations of T. Tony Cai include the University of Chicago and the University of Oslo.


Papers
Journal ArticleDOI
TL;DR: In this paper, the problem of interval estimation of a binomial proportion is revisited and a number of natural alternatives to the Wald interval are presented, each with its motivation and context; each interval is examined for its coverage probability and its length.
Abstract: We revisit the problem of interval estimation of a binomial proportion. The erratic behavior of the coverage probability of the standard Wald confidence interval has previously been remarked on in the literature (Blyth and Still, Agresti and Coull, Santner and others). We begin by showing that the chaotic coverage properties of the Wald interval are far more persistent than is appreciated. Furthermore, common textbook prescriptions regarding its safety are misleading and defective in several respects and cannot be trusted. This leads us to consideration of alternative intervals. A number of natural alternatives are presented, each with its motivation and context. Each interval is examined for its coverage probability and its length. Based on this analysis, we recommend the Wilson interval or the equal-tailed Jeffreys prior interval for small n and the interval suggested in Agresti and Coull for larger n. We also provide an additional frequentist justification for use of the Jeffreys interval.
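To make the comparison concrete, here is a minimal Python sketch (not code from the paper; the example values x = 2 successes in n = 20 trials are an arbitrary illustration) computing the Wald, Wilson, and equal-tailed Jeffreys intervals discussed above:

    # Wald, Wilson, and equal-tailed Jeffreys intervals for a binomial
    # proportion. Illustrative sketch only, not the paper's code.
    import numpy as np
    from scipy import stats

    def wald(x, n, alpha=0.05):
        z = stats.norm.ppf(1 - alpha / 2)
        p = x / n
        half = z * np.sqrt(p * (1 - p) / n)
        return p - half, p + half

    def wilson(x, n, alpha=0.05):
        z = stats.norm.ppf(1 - alpha / 2)
        p = x / n
        center = (p + z**2 / (2 * n)) / (1 + z**2 / n)
        half = (z / (1 + z**2 / n)) * np.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
        return center - half, center + half

    def jeffreys(x, n, alpha=0.05):
        # Equal-tailed interval from the Beta(x + 1/2, n - x + 1/2) posterior.
        return (stats.beta.ppf(alpha / 2, x + 0.5, n - x + 0.5),
                stats.beta.ppf(1 - alpha / 2, x + 0.5, n - x + 0.5))

    for name, ci in [("Wald", wald(2, 20)), ("Wilson", wilson(2, 20)),
                     ("Jeffreys", jeffreys(2, 20))]:
        print(f"{name:9s} ({ci[0]:.3f}, {ci[1]:.3f})")

Note how the Wilson interval recenters at (x + z^2/2)/(n + z^2) rather than at x/n; this shrinkage toward 1/2 is what stabilizes its coverage for small n.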

2,893 citations

Journal ArticleDOI
TL;DR: It is shown that, under conditions on the mutual incoherence and the minimum magnitude of the nonzero components of the signal, the support of the signal can be recovered exactly by the OMP algorithm with high probability.
Abstract: We consider the orthogonal matching pursuit (OMP) algorithm for the recovery of a high-dimensional sparse signal based on a small number of noisy linear measurements. OMP is an iterative greedy algorithm that selects at each step the column that is most correlated with the current residual. In this paper, we present a fully data-driven OMP algorithm with explicit stopping rules. It is shown that under conditions on the mutual incoherence and the minimum magnitude of the nonzero components of the signal, the support of the signal can be recovered exactly by the OMP algorithm with high probability. In addition, we consider the problem of identifying significant components in the case where some of the nonzero components are possibly small. It is shown that in this case the OMP algorithm will still select all the significant components before possibly selecting incorrect ones. Moreover, with modified stopping rules, the OMP algorithm can ensure that no zero components are selected.
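The greedy step and the role of the stopping rule can be seen in a short sketch (a generic OMP implementation in Python under our own naming; the fixed residual-norm threshold tol is a stand-in for the paper's fully data-driven stopping rules):

    import numpy as np

    def omp(X, y, tol, max_iter=None):
        n, p = X.shape
        max_iter = max_iter or min(n, p)
        support, residual = [], y.copy()
        for _ in range(max_iter):
            if np.linalg.norm(residual) <= tol:      # stopping rule
                break
            # Greedy step: pick the column most correlated with the residual.
            j = int(np.argmax(np.abs(X.T @ residual)))
            support.append(j)
            # Refit by least squares on the current support; update residual.
            coef, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
            residual = y - X[:, support] @ coef
        beta = np.zeros(p)
        if support:
            beta[support] = coef
        return beta, support

Because the residual is re-orthogonalized against the selected columns at each refit, no column is selected twice.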

1,093 citations

Journal ArticleDOI
TL;DR: A constrained ℓ1 minimization method is proposed for estimating a sparse inverse covariance matrix based on a sample of n iid p-variate random variables; the procedure is applied to analyze a breast cancer dataset and is found to perform favorably compared with existing methods.
Abstract: This article proposes a constrained ℓ1 minimization method for estimating a sparse inverse covariance matrix based on a sample of n iid p-variate random variables. The resulting estimator is shown to have a number of desirable properties. In particular, the rate of convergence between the estimator and the true s-sparse precision matrix under the spectral norm is $s\sqrt{\log p/n}$ when the population distribution has either exponential-type tails or polynomial-type tails. We present convergence rates under the elementwise ℓ∞ norm and the Frobenius norm. In addition, we consider graphical model selection. The procedure is easily implemented by linear programming. Numerical performance of the estimator is investigated using both simulated and real data. In particular, the procedure is applied to analyze a breast cancer dataset and is found to perform favorably compared with existing methods.
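In symbols, a constrained ℓ1 procedure of this kind can be written as follows (our rendering, consistent with the abstract's description; $\hat{\Sigma}_n$ denotes the sample covariance matrix and $\lambda_n$ a tuning parameter):

    \hat{\Omega} = \arg\min_{\Omega} \|\Omega\|_1
    \quad \text{subject to} \quad
    \|\hat{\Sigma}_n \Omega - I\|_{\infty} \le \lambda_n

Because the constraint decouples across the columns of $\Omega$, the program splits into p small convex problems, each solvable by linear programming, which is the implementation route the abstract mentions.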

947 citations

01 Jan 2009
TL;DR: It is essential to limit the use of antibiotics in general and fluoroquinolones and cephalosporins in particular, especially in uncomplicated infections and asymptomatic bacteriuria.
Abstract: Introduction Infections of the urinary tract (UTIs) pose a serious health problem for patients and come at a high cost to society. UTIs are also the most frequent healthcare-associated infections. E. coli is the predominant pathogen in uncomplicated UTIs, while other Enterobacteriaceae and Enterococcus spp. are isolated at higher frequency in patients with urological diseases. The present state of microbial resistance development is alarming, and rates of resistance are related to the amount of antibiotics used in different countries. Particularly worrisome is the increasing resistance to broad-spectrum antibiotics. It is thus essential to limit the use of antibiotics in general, and fluoroquinolones and cephalosporins in particular, especially in uncomplicated infections and asymptomatic bacteriuria.

827 citations

Posted Content
TL;DR: In this article, a constrained L1 minimization method is proposed for estimating a sparse inverse covariance matrix based on a sample of $n$ iid $p$-variate random variables.
Abstract: A constrained L1 minimization method is proposed for estimating a sparse inverse covariance matrix based on a sample of $n$ iid $p$-variate random variables. The resulting estimator is shown to enjoy a number of desirable properties. In particular, it is shown that the rate of convergence between the estimator and the true $s$-sparse precision matrix under the spectral norm is $s\sqrt{\log p/n}$ when the population distribution has either exponential-type tails or polynomial-type tails. Convergence rates under the elementwise $L_{\infty}$ norm and Frobenius norm are also presented. In addition, graphical model selection is considered. The procedure is easily implementable by linear programming. Numerical performance of the estimator is investigated using both simulated and real data. In particular, the procedure is applied to analyze a breast cancer dataset. The procedure performs favorably in comparison to existing methods.
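As a sketch of that linear-programming implementation (our own variable names and solver choice, not the paper's code), each column of the estimate can be recovered with SciPy's linprog by splitting the coefficient vector into its positive and negative parts:

    import numpy as np
    from scipy.optimize import linprog

    def l1_column(S, j, lam):
        # min ||b||_1  s.t.  ||S b - e_j||_inf <= lam,  via b = u - v, u, v >= 0.
        p = S.shape[0]
        e = np.zeros(p)
        e[j] = 1.0
        c = np.ones(2 * p)                    # objective: sum(u) + sum(v)
        A = np.block([[S, -S], [-S, S]])      # encodes |S(u - v) - e| <= lam
        b = np.concatenate([lam + e, lam - e])
        res = linprog(c, A_ub=A, b_ub=b, bounds=(0, None), method="highs")
        u, v = res.x[:p], res.x[p:]
        return u - v

    def precision_estimate(X, lam):
        S = np.cov(X, rowvar=False)
        p = S.shape[0]
        Omega = np.column_stack([l1_column(S, j, lam) for j in range(p)])
        # Symmetrize by keeping the entry of smaller magnitude.
        return np.where(np.abs(Omega) <= np.abs(Omega.T), Omega, Omega.T)

The final symmetrization step reflects the fact that the column-by-column solutions need not produce a symmetric matrix on their own.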

674 citations


Cited by
Book
24 Aug 2012
TL;DR: This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach, and is suitable for upper-level undergraduates with an introductory-level college math background and beginning graduate students.
Abstract: Today's Web-enabled deluge of electronic data calls for automated methods of data analysis. Machine learning provides these, developing methods that can automatically detect patterns in data and then use the uncovered patterns to predict future data. This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach. The coverage combines breadth and depth, offering necessary background material on such topics as probability, optimization, and linear algebra as well as discussion of recent developments in the field, including conditional random fields, L1 regularization, and deep learning. The book is written in an informal, accessible style, complete with pseudo-code for the most important algorithms. All topics are copiously illustrated with color images and worked examples drawn from such application domains as biology, text processing, computer vision, and robotics. Rather than providing a cookbook of different heuristic methods, the book stresses a principled model-based approach, often using the language of graphical models to specify models in a concise and intuitive way. Almost all the models described have been implemented in a MATLAB software package--PMTK (probabilistic modeling toolkit)--that is freely available online. The book is suitable for upper-level undergraduates with an introductory-level college math background and beginning graduate students.

8,059 citations

01 Feb 2009
TL;DR: This Secret History documentary follows experts as they pick through the evidence and reveal why the plague killed on such a scale; it also asks what might be coming next.
Abstract: Secret History: Return of the Black Death. Channel 4, 7-8pm. In 1348 the Black Death swept through London, killing people within days of the appearance of their first symptoms. Exactly how many died, and why, has long been a mystery. This Secret History documentary follows experts as they pick through the evidence and reveal why the plague killed on such a scale. And they ask: what might be coming next?

5,234 citations

Posted Content
TL;DR: A theme of the text is the use of artificial regressions for estimation, inference, and specification testing of nonlinear models, including diagnostic tests for parameter constancy, serial correlation, heteroscedasticity, and other types of mis-specification.
Abstract: Offering a unifying theoretical perspective not readily available in any other text, this innovative guide to econometrics uses simple geometrical arguments to develop students' intuitive understanding of basic and advanced topics, emphasizing throughout the practical applications of modern theory and nonlinear techniques of estimation. One theme of the text is the use of artificial regressions for estimation, inference, and specification testing of nonlinear models, including diagnostic tests for parameter constancy, serial correlation, heteroscedasticity, and other types of mis-specification. Explaining how estimates can be obtained and tests can be carried out, the authors go beyond a mere algebraic description to one that can be easily translated into the commands of a standard econometric software package. Covering an unprecedented range of problems with a consistent emphasis on those that arise in applied work, this accessible and coherent guide to the most vital topics in econometrics today is indispensable for advanced students of econometrics and students of statistics interested in regression and related topics. It will also suit practising econometricians who want to update their skills. Flexibly designed to accommodate a variety of course levels, it offers both complete coverage of the basic material and separate chapters on areas of specialized interest.
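As one concrete instance of the artificial regressions the text emphasizes, the Breusch-Godfrey LM test for serial correlation regresses the OLS residuals on the original regressors plus lagged residuals; n times the R^2 of this auxiliary regression is asymptotically chi-squared under the null. A minimal Python sketch (our own naming; X is assumed to contain an intercept column):

    import numpy as np

    def breusch_godfrey(X, y, lags=1):
        n = len(y)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        u = y - X @ beta                              # OLS residuals
        # Lagged residuals, with the first `lags` entries padded by zeros.
        U = np.column_stack([np.concatenate([np.zeros(l), u[:-l]])
                             for l in range(1, lags + 1)])
        Z = np.column_stack([X, U])                   # artificial regressors
        g, *_ = np.linalg.lstsq(Z, u, rcond=None)
        e = u - Z @ g
        r2 = 1.0 - (e @ e) / (u @ u)                  # auxiliary R^2
        return n * r2                                 # ~ chi2(lags) under the null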

4,284 citations

Journal ArticleDOI
TL;DR: An adaptation of Egger regression can detect some violations of the standard instrumental variable assumptions and provide an effect estimate that is not subject to these violations; it thereby offers a sensitivity analysis for the robustness of the findings from a Mendelian randomization investigation.
Abstract: Background: The number of Mendelian randomization analyses including large numbers of genetic variants is rapidly increasing. This is due to the proliferation of genome-wide association studies, and the desire to obtain more precise estimates of causal effects. However, some genetic variants may not be valid instrumental variables, in particular because they have more than one proximal phenotypic correlate (pleiotropy). Methods: We view Mendelian randomization with multiple instruments as a meta-analysis, and show that bias caused by pleiotropy can be regarded as analogous to small study bias. Causal estimates using each instrument can be displayed visually by a funnel plot to assess potential asymmetry. Egger regression, a tool to detect small study bias in meta-analysis, can be adapted to test for bias from pleiotropy, and the slope coefficient from Egger regression provides an estimate of the causal effect. Under the assumption that the association of each genetic variant with the exposure is independent of the pleiotropic effect of the variant (not via the exposure), Egger's test gives a valid test of the null causal hypothesis and a consistent causal effect estimate even when all the genetic variants are invalid instrumental variables. Results: We illustrate the use of this approach by re-analysing two published Mendelian randomization studies of the causal effect of height on lung function, and the causal effect of blood pressure on coronary artery disease risk. The conservative nature of this approach is illustrated with these examples. Conclusions: An adaptation of Egger regression (which we call MR-Egger) can detect some violations of the standard instrumental variable assumptions, and provide an effect estimate which is not subject to these violations. The approach provides a sensitivity analysis for the robustness of the findings from a Mendelian randomization investigation.
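The estimator itself is a weighted linear regression of the variant-outcome associations on the variant-exposure associations, with an unconstrained intercept. A minimal sketch (our own naming; per convention, each variant is assumed oriented so that its association with the exposure is positive):

    import numpy as np

    def mr_egger(beta_exp, beta_out, se_out):
        # Inverse-variance weighted regression: beta_out ~ b0 + b1 * beta_exp.
        w = 1.0 / se_out**2
        X = np.column_stack([np.ones_like(beta_exp), beta_exp])
        coef = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * beta_out))
        intercept, slope = coef
        # The slope estimates the causal effect; an intercept away from zero
        # indicates directional pleiotropy.
        return intercept, slope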

3,392 citations