# Tathagata Banerjee

Other affiliations: University of Calcutta

Bio: Tathagata Banerjee is an academic researcher at the Indian Institute of Management Ahmedabad. The author has contributed to research on regression analysis and estimators, has an h-index of 10, and has co-authored 34 publications that have received 300 citations. Previous affiliations include the University of Calcutta.

##### Papers

TL;DR: Besides advocating the use of differencing procedures as simple methods for nonparametric and semiparametric regression analysis, the author applies nonparametric least squares methods to take general nonparametric constraints into account.

Abstract: (2006). Exploring Multivariate Data With the Forward Search. Journal of the American Statistical Association: Vol. 101, No. 473, p. 398.

57 citations

TL;DR: A class of nonproportional hazards models, known as the generalized odds-rate class of regression models, is considered; it is general enough to include several commonly used models, such as the proportional hazards model, the proportional odds model, and the accelerated lifetime model.

Abstract: In the analysis of censored survival data, the Cox proportional hazards model (1972) is extremely popular among practitioners. However, in many real-life situations the assumption that the hazard ratios are proportional does not seem to be appropriate. To overcome this problem, we consider a class of nonproportional hazards models known as the generalized odds-rate class of regression models. The class is general enough to include several commonly used models, such as the proportional hazards model, the proportional odds model, and the accelerated lifetime model. The theoretical and computational properties of these models have been re-examined. The propriety of the posterior has been established under some mild conditions. A simulation study is conducted, and a detailed analysis of data from a prostate cancer study is presented to further illustrate the proposed methodology.

34 citations
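
To make the nesting of models concrete, here is a minimal Python sketch of one common parametrization of the generalized odds-rate family; this parametrization is an assumption, and the paper's exact form may differ. With baseline survival S0(t), linear predictor eta = x'beta and shape parameter c >= 0, the model sets (S(t|x)^(-c) - 1)/c = exp(eta) * (S0(t)^(-c) - 1)/c. The limit c -> 0 gives proportional hazards and c = 1 gives proportional odds. The function name `gor_survival` is hypothetical.

```python
import math


def gor_survival(s0: float, eta: float, c: float) -> float:
    """Survival probability S(t | x) under an assumed generalized odds-rate model.

    s0  -- baseline survival S0(t), in (0, 1]
    eta -- linear predictor x'beta
    c   -- shape parameter, c >= 0
           (c = 0: proportional hazards, c = 1: proportional odds)
    """
    if not 0.0 < s0 <= 1.0:
        raise ValueError("s0 must lie in (0, 1]")
    if c < 0.0:
        raise ValueError("this sketch only handles c >= 0")
    if c == 0.0:
        # Limit c -> 0: S = S0 ** exp(eta), i.e. proportional hazards.
        return s0 ** math.exp(eta)
    # Solve (S^-c - 1)/c = exp(eta) * (S0^-c - 1)/c for S.
    # expm1/log1p keep the computation stable when c is close to 0.
    g = math.exp(eta) * math.expm1(-c * math.log(s0))
    return math.exp(-math.log1p(g) / c)


if __name__ == "__main__":
    eta = math.log(2.0)  # covariate effect that doubles the relevant scale
    for c in (0.0, 0.5, 1.0):
        print(c, round(gor_survival(0.8, eta, c), 4))
```

With S0 = 0.8 and eta = log 2, c = 0 gives 0.8^2 = 0.64, while c = 1 gives 1/(1 + 2 * 0.25), about 0.667, so a single shape parameter moves the fit between the two classical models.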

TL;DR: In this paper, the authors develop C(α) tests for interaction and main effects, both assuming the data to be Poisson distributed and allowing the data within cells to have extra (over- or under-) dispersion beyond that explained by a Poisson distribution.

Abstract: Multiple counts may occur in each cell of an a × b two-way layout (balanced or unbalanced) of two fixed factors A and B. Standard log-linear model analysis based on a Poisson assumption for the cell counts is not applicable here, either because the table is unbalanced or because the Poisson assumption is not valid. We develop C(α) tests for interaction and main effects, both assuming the data to be Poisson distributed and assuming that the data within cells have extra (over- or under-) dispersion beyond that explained by a Poisson distribution. For this we consider an extended negative binomial distribution and a semiparametric model using quasi-likelihood. We show that in all situations the C(α) tests for interaction have very simple forms. For C(α) tests for main effects in the absence of interaction, such simplification is possible only under certain conditions. A score test for detecting extra dispersion in the presence of interaction is also obtained and is of sim...

31 citations
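
The paper's own dispersion statistic is cut off above, so the following is only an illustrative sketch of a standard score-type statistic for extra-Poisson variation against an alternative with variance mu + tau * mu^2, not the authors' test. It assumes the full interaction model, so the fitted value for every count is its cell mean, and it applies a small-sample adjustment (an assumption of this sketch) under which each cell contributes its within-cell sum of squares minus (n - 1) times its mean. The function name and the dictionary input format are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Cell = Tuple[int, int]  # (level of factor A, level of factor B)


def extra_dispersion_score(cells: Dict[Cell, List[int]]) -> float:
    """Score-type statistic for extra-Poisson dispersion in a two-way layout.

    Fitted values are the cell means (full interaction model). Each cell
    contributes sum_k (y_k - ybar)^2 - (n - 1) * ybar: its observed
    within-cell variation minus what a Poisson model predicts once the
    cell mean has been estimated. Approximately N(0, 1) under the Poisson
    model; large positive values suggest overdispersion, large negative
    values underdispersion.
    """
    numerator = 0.0
    info = 0.0
    for counts in cells.values():
        n = len(counts)
        if n == 0:
            continue  # empty cells carry no information
        ybar = sum(counts) / n
        ss = sum((y - ybar) ** 2 for y in counts)
        numerator += ss - (n - 1) * ybar
        info += n * ybar ** 2
    if info == 0.0:
        raise ValueError("all cell means are zero; statistic undefined")
    return numerator / math.sqrt(2.0 * info)


if __name__ == "__main__":
    layout = {(0, 0): [1, 3, 2], (0, 1): [0, 9, 4],
              (1, 0): [2, 2, 3], (1, 1): [5, 1, 7]}
    print(extra_dispersion_score(layout))
```

A one-sided test for overdispersion would compare the value with an upper standard normal quantile (1.645 at the 5% level); the normal approximation is only a large-sample guide.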

TL;DR: The article considers regression models for a binary response that is subject to classification error, where some of the covariates are unobservable but measurements on their surrogates are available.

Abstract: The article considers regression models for a binary response in a situation where the response is subject to classification error. It is also assumed that some of the covariates are unobservable, but measurements on their surrogates are available. A likelihood-based analysis is developed to fit the model. A sensitivity analysis is also carried out through simulation to ascertain the effect of ignoring classification error and/or measurement error on the estimation of the regression parameters. Finally, the methodology developed in the paper is illustrated through an example.

25 citations
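
The misclassification part of such a likelihood can be sketched as follows. This assumes a logistic model for the true response and known, covariate-independent false-positive and false-negative rates (`fp`, `fn`); these are illustrative assumptions, and the sketch omits the error-prone covariates and their surrogates that the paper also handles. If pi(x) is the probability that the true response is 1, the observed response equals 1 with probability (1 - fn) * pi(x) + fp * (1 - pi(x)) = fp + (1 - fp - fn) * pi(x). Function names are hypothetical.

```python
import math
from typing import Sequence


def observed_prob(eta: float, fp: float, fn: float) -> float:
    """P(observed response = 1) given the linear predictor eta.

    fp -- P(observed 1 | true 0); fn -- P(observed 0 | true 1).
    """
    pi = 1.0 / (1.0 + math.exp(-eta))  # logistic model for the true response
    return fp + (1.0 - fp - fn) * pi


def loglik(beta: Sequence[float], X: Sequence[Sequence[float]],
           y_obs: Sequence[int], fp: float, fn: float) -> float:
    """Log-likelihood of observed (possibly misclassified) binary responses."""
    total = 0.0
    for row, y in zip(X, y_obs):
        eta = sum(b * x for b, x in zip(beta, row))
        p = observed_prob(eta, fp, fn)
        total += math.log(p) if y == 1 else math.log1p(-p)
    return total


if __name__ == "__main__":
    X = [[1.0, 2.0], [1.0, 0.0], [1.0, -1.0]]  # intercept plus one covariate
    y = [1, 0, 1]
    print(loglik([0.5, -1.0], X, y, fp=0.05, fn=0.10))
```

With fp = fn = 0 this reduces to the ordinary logistic log-likelihood. Estimating fp and fn jointly with beta typically needs extra information such as validation data, and the constraint fp + fn < 1 rules out swapping the labels of the two outcomes.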

TL;DR: In this paper, the authors consider cDNA microarray experiments in which the cell populations have a factorial structure, and investigate their optimal design under a baseline parametrization, where the objects of interest differ from those under the more common orthogonal parametrization.

Abstract: We consider cDNA microarray experiments when the cell populations have a factorial structure, and investigate the problem of designing them optimally under a baseline parametrization, where the objects of interest differ from those under the more common orthogonal parametrization. First, analytical results are given for the $2\times 2$ factorial. Since practical applications often involve a more complex factorial structure, we next explore general factorials and obtain a collection of optimal designs in the saturated, that is, most economical, case. This, in turn, is seen to yield an approach for finding optimal or efficient designs in the practically more important nearly saturated cases. Thereafter, the findings are extended to the more intricate situation where the underlying model incorporates dye-coloring effects, and the role of dye-swapping is critically examined.

24 citations
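
For the $2\times 2$ case the difference between the two parametrizations can be made concrete. Writing t[i, j] for the mean log expression of the cell population at level i of factor A and level j of factor B, a baseline parametrization measures each main effect at the baseline level 0 of the other factor, whereas the orthogonal parametrization averages over both levels. The sketch below uses one common scaling (an assumption; the paper's normalization may differ), and its function names are hypothetical.

```python
from typing import Dict, Tuple

Means = Dict[Tuple[int, int], float]  # (level of A, level of B) -> mean log expression


def baseline_effects(t: Means) -> Tuple[float, float, float]:
    """Main effects measured at the baseline level 0 of the other factor."""
    a = t[1, 0] - t[0, 0]
    b = t[0, 1] - t[0, 0]
    ab = t[1, 1] - t[1, 0] - t[0, 1] + t[0, 0]  # interaction contrast
    return a, b, ab


def orthogonal_effects(t: Means) -> Tuple[float, float, float]:
    """Main effects averaged over both levels of the other factor."""
    a = ((t[1, 0] - t[0, 0]) + (t[1, 1] - t[0, 1])) / 2
    b = ((t[0, 1] - t[0, 0]) + (t[1, 1] - t[1, 0])) / 2
    ab = t[1, 1] - t[1, 0] - t[0, 1] + t[0, 0]  # same interaction contrast
    return a, b, ab


if __name__ == "__main__":
    t = {(0, 0): 1.0, (0, 1): 2.5, (1, 0): 0.5, (1, 1): 4.0}
    print(baseline_effects(t))
    print(orthogonal_effects(t))
```

Under this scaling each orthogonal main effect equals the corresponding baseline main effect plus half the interaction. Because each two-colour array measures only a difference between two cell populations, a design that estimates one set of contrasts precisely need not be efficient for the other, so optimality has to be studied separately under each parametrization.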

##### Cited by

TL;DR: The same paper suggests how susceptible individuals could reduce their total intake of aluminium, and argues that although definite proof is still lacking, there is more than enough evidence to fuel further epidemiological investigation.

Abstract: The same paper suggests how susceptible individuals could reduce their total intake of aluminium. In presenting the epidemiological evidence for a link between aluminium and Alzheimer's, Nart'n suggests that although definite proof is still lacking, there is more than enough positive evidence to fuel further epidemiological investigation. It states that such investigations might specifically address the issue of the confounding effect of silicon and an assessment of exposure to specific

1,311 citations

TL;DR: In this paper, a Bayesian method is proposed to account for measurement errors in linear regression of astronomical data. The method is based on deriving a likelihood function for the measured data and focuses on the case where the intrinsic distribution of the independent variables can be approximated by a mixture of Gaussian functions.

Abstract: I describe a Bayesian method to account for measurement errors in linear regression of astronomical data. The method allows for heteroscedastic and possibly correlated measurement errors and intrinsic scatter in the regression relationship. The method is based on deriving a likelihood function for the measured data, and I focus on the case when the intrinsic distribution of the independent variables can be approximated using a mixture of Gaussian functions. I generalize the method to incorporate multiple independent variables, nondetections, and selection effects (e.g., Malmquist bias). A Gibbs sampler is described for simulating random draws from the probability distribution of the parameters, given the observed data. I use simulation to compare the method with other common estimators. The simulations illustrate that the Gaussian mixture model outperforms other common estimators and can effectively give constraints on the regression parameters, even when the measurement errors dominate the observed scatter, source detection fraction is low, or the intrinsic distribution of the independent variables is not a mixture of Gaussian functions. I conclude by using this method to fit the X-ray spectral slope as a function of Eddington ratio using a sample of 39 z 0.8 radio-quiet quasars. I confirm the correlation seen by other authors between the radio-quiet quasar X-ray spectral slope and the Eddington ratio, where the X-ray spectral slope softens as the Eddington ratio increases. IDL routines are made available for performing the regression.

1,129 citations
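
The core of this likelihood can be illustrated for the simplest case of a single Gaussian component and one independent variable; that restriction is an assumption made here for brevity, since the method described above uses a Gaussian mixture and a Gibbs sampler on top of it. If the true independent variable xi ~ N(mu, tau2), the true response is eta = alpha + beta * xi + eps with intrinsic scatter eps ~ N(0, sigma2), and the measured (x, y) add Gaussian errors with known variances var_x, var_y and covariance cov_xy, then (x, y) is bivariate normal with mean (mu, alpha + beta * mu) and covariance [[tau2 + var_x, beta * tau2 + cov_xy], [beta * tau2 + cov_xy, beta^2 * tau2 + sigma2 + var_y]]. The function name and argument layout are hypothetical.

```python
import math
from typing import Sequence, Tuple

# One measured data point: (x, y, var_x, var_y, cov_xy), errors assumed known.
Point = Tuple[float, float, float, float, float]


def loglik_single_gaussian(points: Sequence[Point], alpha: float, beta: float,
                           sigma2: float, mu: float, tau2: float) -> float:
    """Log-likelihood of measured data after integrating out the true xi."""
    total = 0.0
    for x, y, vx, vy, cxy in points:
        # Marginal covariance of the measured pair (x, y).
        s_xx = tau2 + vx
        s_yy = beta ** 2 * tau2 + sigma2 + vy
        s_xy = beta * tau2 + cxy
        det = s_xx * s_yy - s_xy ** 2
        if det <= 0.0:
            raise ValueError("covariance matrix is not positive definite")
        dx = x - mu
        dy = y - (alpha + beta * mu)
        # Quadratic form d' S^{-1} d via the explicit 2x2 inverse.
        quad = (s_yy * dx * dx - 2.0 * s_xy * dx * dy + s_xx * dy * dy) / det
        # Bivariate normal log-density.
        total += -math.log(2.0 * math.pi) - 0.5 * math.log(det) - 0.5 * quad
    return total


if __name__ == "__main__":
    data = [(0.1, 1.2, 0.04, 0.09, 0.0), (0.9, 2.8, 0.04, 0.09, 0.01)]
    print(loglik_single_gaussian(data, alpha=1.0, beta=2.0,
                                 sigma2=0.05, mu=0.5, tau2=0.2))
```

Maximizing or sampling this function over (alpha, beta, sigma2, mu, tau2) accounts for errors in both coordinates, whereas ordinary least squares on the measured values tends to attenuate beta when the errors in x are large.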

TL;DR: This work discusses the practice of problem solving, testing hypotheses about statistical parameters, calculating and interpreting confidence limits, tolerance limits and prediction limits, and setting up and interpreting control charts.

Abstract: The best adjective to describe this work is "sweeping." The range of subject matter is so broad that it can almost be described as containing everything except fuzzy set theory. Included are explicit discussions of the basics of probability (relegated to an appendix); the practice of problem solving; testing hypotheses about statistical parameters; calculating and interpreting confidence limits; tolerance limits and prediction limits; setting up and interpreting control charts; design of experiments; analysis of variance; line and surface fitting; and maximum likelihood procedures. If you can think of something that is not in this list, then it probably means I have overlooked it.

309 citations

TL;DR: It is concluded that pyrosequencing can be used to investigate genetically diverse samples with high accuracy if technical errors are properly treated and probabilistic haplotype inference outperforms the counting-based calling method in both precision and recall.

Abstract: Next-generation sequencing technologies can be used to analyse genetically heterogeneous samples at unprecedented detail. The high coverage achievable with these methods enables the detection of many low-frequency variants. However, sequencing errors complicate the analysis of mixed populations and result in inflated estimates of genetic diversity. We developed a probabilistic Bayesian approach to minimize the effect of errors on the detection of minority variants. We applied it to pyrosequencing data obtained from a 1.5-kb fragment of the HIV-1 gag/pol gene in two control and two clinical samples. The effect of PCR amplification was analysed. Error correction resulted in a two- and five-fold decrease in the pyrosequencing base substitution rate, from 0.05% to 0.03% and from 0.25% to 0.05% in the non-PCR and PCR-amplified samples, respectively. We were able to detect viral clones as rare as 0.1% with perfect sequence reconstruction. Probabilistic haplotype inference outperforms the counting-based calling method in both precision and recall. Genetic diversity observed within and between two clinical samples resulted in various patterns of phenotypic drug resistance and suggests a close epidemiological link. We conclude that pyrosequencing can be used to investigate genetically diverse samples with high accuracy if technical errors are properly treated.

223 citations