Author

Nitin R. Patel

Bio: Nitin R. Patel is an academic researcher from Massachusetts Institute of Technology. The author has contributed to research on contingency tables and exact tests. The author has an h-index of 31 and has co-authored 55 publications, which have received 4,573 citations. Previous affiliations of Nitin R. Patel include Cytel and the Indian Institute of Management Ahmedabad.


Papers
Journal Article
TL;DR: In this paper, computing Fisher's exact test for an r × c contingency table is transformed into the problem of identifying all paths through a directed acyclic network that equal or exceed a fixed length.
Abstract: An exact test of significance of the hypothesis that the row and column effects are independent in an r × c contingency table can be executed in principle by generalizing Fisher's exact treatment of the 2 × 2 contingency table. Each table in a conditional reference set of r × c tables with fixed marginal sums is assigned a generalized hypergeometric probability. The significance level is then computed by summing the probabilities of all tables that are no larger (on the probability scale) than the observed table. However, the computational effort required to generate all r × c contingency tables with fixed marginal sums severely limits the use of Fisher's exact test. A novel technique that considerably extends the bounds of computational feasibility of the exact test is proposed here. The problem is transformed into one of identifying all paths through a directed acyclic network that equal or exceed a fixed length. Some interesting new optimization theorems are developed in the process. The numer...

960 citations
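
The procedure the abstract describes (assign each table with the observed margins its generalized hypergeometric probability, then sum the probabilities of all tables no more probable than the observed one) can be sketched by brute-force enumeration. This is a minimal illustration only, not the paper's network algorithm: it lists every table, which is exactly the cost the network approach was designed to avoid. The function names are hypothetical, and exact `Fraction` arithmetic is an assumed design choice so that tables with tied probabilities compare correctly.

```python
from fractions import Fraction
from math import factorial, prod


def column_fillings(col_total, row_caps):
    """Yield every non-negative column vector summing to col_total with entry i <= row_caps[i]."""
    if len(row_caps) == 1:
        if col_total <= row_caps[0]:
            yield (col_total,)
        return
    for x in range(min(col_total, row_caps[0]) + 1):
        for rest in column_fillings(col_total - x, row_caps[1:]):
            yield (x,) + rest


def tables_with_margins(row_sums, col_sums):
    """Yield every r x c table (as a tuple of columns) with the given row and column sums."""
    if not col_sums:
        if all(r == 0 for r in row_sums):
            yield ()
        return
    for col in column_fillings(col_sums[0], row_sums):
        remaining = [r - x for r, x in zip(row_sums, col)]
        for rest in tables_with_margins(remaining, col_sums[1:]):
            yield (col,) + rest


def fisher_exact_rxc(table):
    """Exact p-value: total probability of tables no more probable than the observed one."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    # Generalized hypergeometric probability of a table with these margins:
    # prod(row_sums!) * prod(col_sums!) / (n! * prod(cell!))
    const = Fraction(
        prod(factorial(r) for r in row_sums) * prod(factorial(c) for c in col_sums),
        factorial(n),
    )

    def prob(cells):
        return const / prod(factorial(x) for x in cells)

    p_obs = prob(x for row in table for x in row)
    p_value = sum(
        (p for p in (prob(x for col in t for x in col)
                     for t in tables_with_margins(row_sums, col_sums))
         if p <= p_obs),
        Fraction(0),
    )
    return float(p_value)
```

For example, `fisher_exact_rxc([[2, 0], [0, 2]])` finds three tables with these margins, with probabilities 1/6, 4/6 and 1/6; the two extreme tables are no more probable than the observed one, so the p-value is 1/3.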

Journal Article
TL;DR: This work provides an alternative to the maximum likelihood method for making inferences about the parameters of the logistic regression model based on appropriate permutational distributions of sufficient statistics.
Abstract: We provide an alternative to the maximum likelihood method for making inferences about the parameters of the logistic regression model. The method is based on appropriate permutational distributions of sufficient statistics. It is useful for analysing small or unbalanced binary data with covariates. It also applies to small-sample clustered binary data. We illustrate the method by analysing several biomedical data sets.

469 citations

Journal Article
TL;DR: A quadratic-time network algorithm is provided for computing an exact confidence interval for the common odds ratio in several independent 2×2 contingency tables; it is shown to be a considerable improvement on an existing algorithm developed by Thomas (1975), which relies on exhaustive enumeration.
Abstract: A quadratic time network algorithm is provided for computing an exact confidence interval for the common odds ratio in several 2×2 independent contingency tables. The algorithm is shown to be a considerable improvement on an existing algorithm developed by Thomas (1975), which relies on exhaustive enumeration. Problems that would formerly have consumed several CPU hours can now be solved in a few CPU seconds. The algorithm can easily handle sparse data sets where asymptotic results are suspect. The network approach, on which the algorithm is based, is also a powerful tool for exact statistical inference in other settings.

387 citations
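
The conditional distribution behind such an interval can be sketched directly, assuming the standard conditional approach: given all margins, the first-cell count a_k of stratum k has probability proportional to C(n1, a_k) C(N - n1, m1 - a_k) ψ^a_k, and the distribution of T = Σ a_k follows by convolving the strata. The sketch below convolves exact integer coefficient lists and finds the interval endpoints by bisection on log ψ. It is not the paper's network algorithm; the function names, the bisection bounds and the iteration count are hypothetical choices.

```python
from math import comb, exp, inf, log


def stratum_coefficients(a, b, c, d):
    """Support start and coefficients C(n1, x) * C(N - n1, m1 - x) for one table [[a, b], [c, d]]."""
    m1, n1, total = a + b, a + c, a + b + c + d
    lo, hi = max(0, m1 + n1 - total), min(m1, n1)
    return lo, [comb(n1, x) * comb(total - n1, m1 - x) for x in range(lo, hi + 1)]


def convolve(p, q):
    """Coefficients of the product of two polynomials (exact integers)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y
    return out


def conditional_distribution(tables):
    """Return (offset, coeffs) with P(T = offset + i | psi) proportional to coeffs[i] * psi**(offset + i)."""
    offset, coeffs = 0, [1]
    for a, b, c, d in tables:
        lo, cs = stratum_coefficients(a, b, c, d)
        offset += lo
        coeffs = convolve(coeffs, cs)
    return offset, coeffs


def tail_probability(offset, coeffs, t_obs, log_psi, upper):
    """P(T >= t_obs | psi) if upper, else P(T <= t_obs | psi), computed in log space."""
    logs = [log(c) + (offset + i) * log_psi for i, c in enumerate(coeffs)]
    top = max(logs)
    weights = [exp(v - top) for v in logs]
    k = t_obs - offset
    tail = sum(weights[k:]) if upper else sum(weights[: k + 1])
    return tail / sum(weights)


def exact_common_odds_ratio_ci(tables, alpha=0.05, lo=-30.0, hi=30.0, iterations=100):
    """Exact (1 - alpha) confidence interval for a common odds ratio psi."""
    offset, coeffs = conditional_distribution(tables)
    t_obs = sum(a for a, _, _, _ in tables)
    t_min, t_max = offset, offset + len(coeffs) - 1

    def bisect(g):
        # g must be increasing in log(psi); a root outside [lo, hi] is clamped to an endpoint.
        left, right = lo, hi
        for _ in range(iterations):
            mid = (left + right) / 2
            if g(mid) < 0:
                left = mid
            else:
                right = mid
        return exp((left + right) / 2)

    lower = 0.0 if t_obs == t_min else bisect(
        lambda x: tail_probability(offset, coeffs, t_obs, x, True) - alpha / 2)
    upper = inf if t_obs == t_max else bisect(
        lambda x: alpha / 2 - tail_probability(offset, coeffs, t_obs, x, False))
    return lower, upper
```

With two strata `(1, 1, 1, 1)`, each stratum contributes the coefficients [1, 4, 1], and their convolution [1, 8, 18, 8, 1] gives the conditional distribution of T. When T is at the edge of its support (for instance a single table with two zero cells), the corresponding limit is 0 or infinity.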

Journal Article
TL;DR: An efficient numerical algorithm for computing the exact significance level and a simple method for obtaining the asymptotic significance level are provided for establishing the therapeutic equivalence of two treatments that are being compared on the basis of ordered categorical data.
Abstract: This communication concerns the problem of establishing the therapeutic equivalence of two treatments that are being compared on the basis of ordered categorical data. The problem is formulated as a significance test in which the null hypothesis specifies a treatment difference. An efficient numerical algorithm for computing the exact significance level is provided, along with a simple method for obtaining the asymptotic significance level. Both methods are applied to a clinical trial of a new agent versus an active control. Guidelines for when to use the exact procedure and when to rely on asymptotic theory are provided.

335 citations

Journal Article
TL;DR: In this paper, an efficient recursive algorithm is proposed that generates the joint and conditional distributions of the sufficient statistics for logistic regression with binary response variables; previously, exact inference had been computationally feasible only in a few special situations.
Abstract: Logistic regression is a commonly used technique for the analysis of retrospective and prospective epidemiological and clinical studies with binary response variables. Usually this analysis is performed using large sample approximations. When the sample size is small or the data structure sparse, the accuracy of the asymptotic approximations is in question. On other occasions, singularity of the covariance matrix of parameter estimates precludes asymptotic analysis. Under these circumstances, use of exact inferential procedures would seem to be a prudent alternative. Cox (1970) showed that exact inference on the parameters of a logistic model with binary response requires consideration of the distribution of sufficient statistics for these parameters. To date, however, resorting to the exact method has not been computationally feasible except in a few special situations. This article presents an efficient recursive algorithm that generates the joint and conditional distributions of the sufficient...

289 citations
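
For the simplest case, a single covariate in the model logit P(y_i = 1) = β0 + β1 x_i, Cox's conditioning argument reduces exact inference on β1 to the distribution of T = Σ x_i y_i given S = Σ y_i. Under β1 = 0, every binary response vector with the observed S is equally likely, so only the number of vectors sharing each (S, T) pair is needed. The minimal sketch below, assuming integer covariate values, builds those counts with a simple one-observation-at-a-time recursion instead of listing all 2^n response vectors. It illustrates the idea of a recursive generation of the distribution and is not necessarily the article's algorithm; the function names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def count_sufficient_statistics(x, s_max):
    """counts[(s, t)] = number of 0/1 vectors y with sum(y) = s and sum(x_i * y_i) = t.

    Built one observation at a time: y_i = 0 leaves (s, t) unchanged and
    y_i = 1 moves it to (s + 1, t + x_i). Pairs with s > s_max are pruned.
    """
    counts = {(0, 0): 1}
    for xi in x:
        nxt = defaultdict(int)
        for (s, t), c in counts.items():
            nxt[(s, t)] += c
            if s < s_max:
                nxt[(s + 1, t + xi)] += c
        counts = dict(nxt)
    return counts


def conditional_null_distribution(x, s_obs):
    """P(T = t | S = s_obs) under beta_1 = 0, where S = sum(y) and T = sum(x * y)."""
    counts = count_sufficient_statistics(x, s_obs)
    sub = {t: c for (s, t), c in counts.items() if s == s_obs}
    total = sum(sub.values())
    return {t: Fraction(c, total) for t, c in sorted(sub.items())}


def exact_upper_tail_p_value(x, y):
    """One-sided exact conditional p-value for H0: beta_1 = 0 against beta_1 > 0."""
    s_obs = sum(y)
    t_obs = sum(xi * yi for xi, yi in zip(x, y))
    dist = conditional_null_distribution(x, s_obs)
    return float(sum((p for t, p in dist.items() if t >= t_obs), Fraction(0)))
```

For x = [0, 1, 2, 3] and y = [0, 0, 1, 1], the six response vectors with two successes give T values 1, 2, 3, 3, 4 and 5, so the observed T = 5 yields a one-sided p-value of 1/6.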


Cited by
Journal Article
TL;DR: The second-generation versions of PLINK will offer dramatic improvements in performance and compatibility, and for the first time, users without access to high-end computing resources can perform several essential analyses of the feature-rich and very large genetic datasets coming into use.
Abstract: Background: PLINK 1 is a widely used open-source C/C++ toolset for genome-wide association studies (GWAS) and research in population genetics. However, the steady accumulation of data from imputation and whole-genome sequencing studies has exposed a strong need for faster and scalable implementations of key functions, such as logistic regression, linkage disequilibrium estimation, and genomic distance evaluation. In addition, GWAS and population-genetic data now frequently contain genotype likelihoods, phase information, and/or multiallelic variants, none of which can be represented by PLINK 1’s primary data format. Findings: To address these issues, we are developing a second-generation codebase for PLINK. The first major release from this codebase, PLINK 1.9, introduces extensive use of bit-level parallelism, O(√n)-time/constant-space Hardy-Weinberg equilibrium and Fisher’s exact tests, and many other algorithmic improvements. In combination, these changes accelerate most operations by 1-4 orders of magnitude, and allow the program to handle datasets too large to fit in RAM. We have also developed an extension to the data format which adds low-overhead support for genotype likelihoods, phase, multiallelic variants, and reference vs. alternate alleles, which is the basis of our planned second release (PLINK 2.0). Conclusions: The second-generation versions of PLINK will offer dramatic improvements in performance and compatibility. For the first time, users without access to high-end computing resources can perform several essential analyses of the feature-rich and very large genetic datasets coming into use.

7,038 citations
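
The abstract does not describe how PLINK 1.9 reaches O(√n) time. As background only, the classical exact Hardy-Weinberg test for one biallelic variant (Wigginton, Cutler and Abecasis, 2005) can be sketched in linear time: with allele counts fixed, the probabilities of successive heterozygote counts are related by a simple ratio, so no factorials are needed. This is a minimal illustration under that assumption, not PLINK's implementation; the function name and the 1e-7 tie tolerance are hypothetical choices.

```python
def hwe_exact_p_value(obs_hets, obs_hom1, obs_hom2):
    """Exact Hardy-Weinberg p-value for one biallelic variant (classical recurrence).

    Enumerates every heterozygote count compatible with the observed allele counts,
    using ratios between successive probabilities, and returns the total probability
    of counts no more likely than the observed one.
    """
    obs_homr, obs_homc = sorted((obs_hom1, obs_hom2))
    rare = 2 * obs_homr + obs_hets          # copies of the rarer allele
    n = obs_hets + obs_homr + obs_homc      # number of genotyped individuals
    if n == 0:
        return 1.0
    probs = [0.0] * (rare + 1)              # unnormalized; wrong-parity entries stay 0
    # Start near the expected heterozygote count, with the same parity as `rare`.
    mid = rare * (2 * n - rare) // (2 * n)
    if mid % 2 != rare % 2:
        mid += 1
    probs[mid] = 1.0
    # Walk down: two fewer heterozygotes means one more of each homozygote.
    hets, homr = mid, (rare - mid) // 2
    homc = n - hets - homr
    while hets >= 2:
        probs[hets - 2] = probs[hets] * hets * (hets - 1) / (4 * (homr + 1) * (homc + 1))
        hets, homr, homc = hets - 2, homr + 1, homc + 1
    # Walk up: two more heterozygotes means one fewer of each homozygote.
    hets, homr = mid, (rare - mid) // 2
    homc = n - hets - homr
    while hets <= rare - 2:
        probs[hets + 2] = probs[hets] * 4 * homr * homc / ((hets + 2) * (hets + 1))
        hets, homr, homc = hets + 2, homr - 1, homc - 1
    total = sum(probs)
    p_obs = probs[obs_hets]
    # Small relative tolerance so that mathematically tied probabilities are counted.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)) / total)
```

For four individuals with two copies of each homozygote and no heterozygotes, the possible heterozygote counts 0, 2 and 4 have probabilities 6/70, 48/70 and 16/70, so `hwe_exact_p_value(0, 2, 2)` returns 6/70.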

Journal Article
TL;DR: Findings indicate that a low number of events per variable (EPV) can lead to major problems: the regression coefficients were biased in both positive and negative directions, and paradoxical associations (significance in the wrong direction) became more frequent.

6,490 citations

Journal Article
TL;DR: PLINK is a widely used open-source C/C++ toolset for genome-wide association studies (GWAS) and research in population genetics; this paper describes its second-generation codebase, beginning with PLINK 1.9.
Abstract: PLINK 1 is a widely used open-source C/C++ toolset for genome-wide association studies (GWAS) and research in population genetics. However, the steady accumulation of data from imputation and whole-genome sequencing studies has exposed a strong need for even faster and more scalable implementations of key functions. In addition, GWAS and population-genetic data now frequently contain probabilistic calls, phase information, and/or multiallelic variants, none of which can be represented by PLINK 1's primary data format. To address these issues, we are developing a second-generation codebase for PLINK. The first major release from this codebase, PLINK 1.9, introduces extensive use of bit-level parallelism, O(sqrt(n))-time/constant-space Hardy-Weinberg equilibrium and Fisher's exact tests, and many other algorithmic improvements. In combination, these changes accelerate most operations by 1-4 orders of magnitude, and allow the program to handle datasets too large to fit in RAM. This will be followed by PLINK 2.0, which will introduce (a) a new data format capable of efficiently representing probabilities, phase, and multiallelic variants, and (b) extensions of many functions to account for the new types of information. The second-generation versions of PLINK will offer dramatic improvements in performance and compatibility. For the first time, users without access to high-end computing resources can perform several essential analyses of the feature-rich and very large genetic datasets coming into use.

3,513 citations

Journal Article
7 May 1993, Science
TL;DR: Colorectal tumor DNA was examined for somatic instability at (CA)n repeats on human chromosomes 5q, 15q, 17p, and 18q; this instability was significantly correlated with the tumor's location in the proximal colon and with increased patient survival, and inversely correlated with loss of heterozygosity.
Abstract: Colorectal tumor DNA was examined for somatic instability at (CA)n repeats on human chromosomes 5q, 15q, 17p, and 18q. Differences between tumor and normal DNA were detected in 25 of the 90 (28 percent) tumors examined. This instability appeared as either a substantial change in repeat length (often heterogeneous in nature) or a minor change (typically two base pairs). Microsatellite instability was significantly correlated with the tumor's location in the proximal colon (P = 0.003), with increased patient survival (P = 0.02), and, inversely, with loss of heterozygosity for chromosomes 5q, 17p, and 18q. These data suggest that some colorectal cancers may arise through a mechanism that does not necessarily involve loss of heterozygosity.

3,093 citations