Author

Tiejun Tong

Bio: Tiejun Tong is an academic researcher from Hong Kong Baptist University. The author has contributed to research in topics: Estimator & Sample size determination. The author has an h-index of 20 and has co-authored 105 publications receiving 4345 citations. Previous affiliations of Tiejun Tong include University of California, Santa Barbara & Yale University.


Papers
Journal ArticleDOI
TL;DR: The authors propose a new estimation method that incorporates the sample size, compare the estimators of the sample mean and standard deviation under all three scenarios, and suggest which scenario is preferred in real-world applications.
Abstract: In systematic reviews and meta-analysis, researchers often pool the results of the sample mean and standard deviation from a set of similar clinical trials. A number of the trials, however, reported the study using the median, the minimum and maximum values, and/or the first and third quartiles. Hence, in order to combine results, one may have to estimate the sample mean and standard deviation for such trials. In this paper, we propose to improve the existing literature in several directions. First, we show that the sample standard deviation estimation in Hozo et al.’s method (BMC Med Res Methodol 5:13, 2005) has some serious limitations and is always less satisfactory in practice. Inspired by this, we propose a new estimation method by incorporating the sample size. Second, we systematically study the sample mean and standard deviation estimation problem under several other interesting settings where the interquartile range is also available for the trials. We demonstrate the performance of the proposed methods through simulation studies for the three frequently encountered scenarios, respectively. For the first two scenarios, our method greatly improves existing methods and provides a nearly unbiased estimate of the true sample standard deviation for normal data and a slightly biased estimate for skewed data. For the third scenario, our method still performs very well for both normal data and skewed data. Furthermore, we compare the estimators of the sample mean and standard deviation under all three scenarios and present some suggestions on which scenario is preferred in real-world applications. In this paper, we discuss different approximation methods in the estimation of the sample mean and standard deviation and propose some new estimation methods to improve the existing literature. 
We conclude our work with a summary table (an Excel spread sheet including all formulas) that serves as a comprehensive guidance for performing meta-analysis in different situations.
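
As an illustration of how the sample size can enter the standard deviation estimate, here is a minimal Python sketch for the three scenarios mentioned in the abstract. The abstract itself gives no formulas; the expressions below are the forms commonly cited from this paper and should be checked against the article or its spreadsheet before use. All function names are hypothetical.

```python
from statistics import NormalDist

# Standard normal quantile function (inverse CDF).
_z = NormalDist().inv_cdf


def _check_n(n):
    # The normal quantiles used below are positive only when n > 1.
    if n <= 1:
        raise ValueError("sample size n must be greater than 1")


def sd_from_range(a, b, n):
    """Scenario 1 (assumed form): SD from minimum a, maximum b, sample size n."""
    _check_n(n)
    return (b - a) / (2 * _z((n - 0.375) / (n + 0.25)))


def sd_from_iqr(q1, q3, n):
    """Scenario 3 (assumed form): SD from first and third quartiles and n."""
    _check_n(n)
    return (q3 - q1) / (2 * _z((0.75 * n - 0.125) / (n + 0.25)))


def sd_from_five_numbers(a, q1, q3, b, n):
    """Scenario 2 (assumed form): average of the range-based and IQR-based SDs."""
    return 0.5 * (sd_from_range(a, b, n) + sd_from_iqr(q1, q3, n))


if __name__ == "__main__":
    # Hypothetical trial: min 10, Q1 18, Q3 26, max 40, n = 60.
    print(sd_from_range(10, 40, 60))
    print(sd_from_iqr(18, 26, 60))
    print(sd_from_five_numbers(10, 18, 26, 40, 60))
```

For a fixed observed range, the range-based estimate shrinks as n grows, because a larger sample is expected to span more standard deviations. This is the dependence on sample size that a fixed divisor such as range/4 ignores.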

4,745 citations

Posted Content
TL;DR: This work proposes a new estimation method by incorporating the sample size that greatly improves existing methods and provides a nearly unbiased estimate of the true sample standard deviation for normal data and a slightly biased estimate for skewed data.
Abstract: In systematic reviews and meta-analysis, researchers often pool the results of the sample mean and standard deviation from a set of similar clinical trials. A number of the trials, however, reported the study using the median, the minimum and maximum values, and/or the first and third quartiles. Hence, in order to combine results, one may have to estimate the sample mean and standard deviation for such trials. In this paper, we propose to improve the existing literature in several directions. First, we show that the sample standard deviation estimation in Hozo et al. (2005) has some serious limitations and is always less satisfactory in practice. Inspired by this, we propose a new estimation method by incorporating the sample size. Second, we systematically study the sample mean and standard deviation estimation problem under more general settings where the first and third quartiles are also available for the trials. Through simulation studies, we demonstrate that the proposed methods greatly improve the existing methods and enrich the literature. We conclude our work with a summary table that serves as a comprehensive guidance for performing meta-analysis in different situations.

1,812 citations

Journal ArticleDOI
TL;DR: This article investigates the optimal estimation of the sample mean for meta-analysis from both theoretical and empirical perspectives and proposes estimators intended to serve as “rules of thumb” in evidence-based medicine.
Abstract: The era of big data is coming, and evidence-based medicine is attracting increasing attention to improve decision making in medical practice via integrating evidence from well designed and conducted clinical research. Meta-analysis is a statistical technique widely used in evidence-based medicine for analytically combining the findings from independent clinical trials to provide an overall estimation of a treatment effectiveness. The sample mean and standard deviation are two commonly used statistics in meta-analysis but some trials use the median, the minimum and maximum values, or sometimes the first and third quartiles to report the results. Thus, to pool results in a consistent format, researchers need to transform those information back to the sample mean and standard deviation. In this article, we investigate the optimal estimation of the sample mean for meta-analysis from both theoretical and empirical perspectives. A major drawback in the literature is that the sample size, needless to say its imp...
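
A minimal sketch of a sample-size-dependent estimate of the mean, written in Python. The abstract above is truncated and does not state any formulas, so the weights below are an assumption: they are the forms commonly quoted from this article and should be verified against the paper. The function names are hypothetical.

```python
def mean_from_range(a, m, b, n):
    """Assumed form: weighted average of the mid-range (a + b) / 2 and the median m.

    The weight on the mid-range, 4 / (4 + n**0.75), decreases as n grows.
    """
    w = 4.0 / (4.0 + n ** 0.75)
    return w * (a + b) / 2.0 + (1.0 - w) * m


def mean_from_iqr(q1, m, q3, n):
    """Assumed form: weighted average of the mid-quartile range and the median m."""
    w = 0.7 + 0.39 / n
    return w * (q1 + q3) / 2.0 + (1.0 - w) * m


if __name__ == "__main__":
    # Hypothetical skewed trial: min 0, median 2, max 10.
    for n in (10, 100, 1000):
        print(n, mean_from_range(0, 2, 10, n))
```

With these weights, the extremes count for less in larger samples, so the estimate moves toward the median as n increases.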

1,353 citations

Journal ArticleDOI
TL;DR: This paper proposes to further advance the literature by developing a smoothly weighted estimator for the sample standard deviation that fully utilizes the sample size information and shows that the new estimator provides a more accurate estimate for normal data and also performs favorably for non-normal data.
Abstract: When reporting the results of clinical studies, some researchers may choose the five-number summary (including the sample median, the first and third quartiles, and the minimum and maximum values) rather than the sample mean and standard deviation (SD), particularly for skewed data. For these studies, when included in a meta-analysis, it is often desired to convert the five-number summary back to the sample mean and SD. For this purpose, several methods have been proposed in the recent literature and they are increasingly used nowadays. In this article, we propose to further advance the literature by developing a smoothly weighted estimator for the sample SD that fully utilizes the sample size information. For ease of implementation, we also derive an approximation formula for the optimal weight, as well as a shortcut formula for the sample SD. Numerical results show that our new estimator provides a more accurate estimate for normal data and also performs favorably for non-normal data. Together with the optimal sample mean estimator in Luo et al., our new methods have dramatically improved the existing methods for data transformation, and they are capable to serve as "rules of thumb" in meta-analysis for studies reported with the five-number summary. Finally, for practical use, an Excel spreadsheet and an online calculator are also provided for implementing our optimal estimators.

146 citations

Journal ArticleDOI
TL;DR: In this paper, the error variance is estimated as the intercept in a simple linear regression model with squared differences of paired observations as the dependent variable and squared distances between the paired covariates as the regressor.
Abstract: We propose a new estimator for the error variance in a nonparametric regression model. We estimate the error variance as the intercept in a simple linear regression model with squared differences of paired observations as the dependent variable and squared distances between the paired covariates as the regressor. For the special case of a one-dimensional domain with equally spaced design points, we show that our method reaches an asymptotic optimal rate which is not achieved by some existing methods. We conduct extensive simulations to evaluate finite-sample performance of our method and compare it with existing methods. Our method can be extended to nonparametric regression models with multivariate functions defined on arbitrary subsets of normed spaces, possibly observed on unequally spaced or clustered designed points.
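
The regression idea in this abstract can be sketched in Python for the equally spaced one-dimensional case. This is a simplified illustration under stated assumptions, not the authors' estimator: it averages the squared differences at each lag k, halves them so that the intercept targets the error variance directly, and uses ordinary (unweighted) least squares. The names `error_variance` and `demo_data` and the number of lags `m` are hypothetical.

```python
import math
import random


def error_variance(y, m):
    """Estimate the error variance as the intercept of an OLS line.

    Response:  s_k = sum_i (y[i] - y[i-k])^2 / (2 * (n - k)),  k = 1..m
    Regressor: d_k = (k / n)^2, the squared distance between paired design
               points x_i = i / n (equally spaced design assumed).
    """
    n = len(y)
    if m < 2 or m >= n:
        raise ValueError("need 2 <= m < len(y)")
    d = [(k / n) ** 2 for k in range(1, m + 1)]
    s = [
        sum((y[i] - y[i - k]) ** 2 for i in range(k, n)) / (2 * (n - k))
        for k in range(1, m + 1)
    ]
    d_bar = sum(d) / m
    s_bar = sum(s) / m
    slope = sum((dk - d_bar) * (sk - s_bar) for dk, sk in zip(d, s)) / sum(
        (dk - d_bar) ** 2 for dk in d
    )
    # The intercept is the fitted value at distance zero.
    return s_bar - slope * d_bar


def demo_data(n, sigma, seed):
    """Hypothetical test signal: one sine period plus Gaussian noise."""
    rng = random.Random(seed)
    return [math.sin(2 * math.pi * i / n) + rng.gauss(0.0, sigma) for i in range(n)]


if __name__ == "__main__":
    y = demo_data(n=2000, sigma=0.5, seed=1)
    print(error_variance(y, m=10))  # should be near sigma^2 = 0.25
```

The slope absorbs the part of each squared difference that comes from the smooth mean function, which grows with the squared distance. What remains at distance zero is attributed to noise.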

67 citations


Cited by
Book ChapterDOI
30 Dec 2011

1,662 citations

Journal ArticleDOI

1,484 citations

Journal ArticleDOI
TL;DR: An in-depth review of rare event detection from an imbalanced learning perspective and a comprehensive taxonomy of the existing application domains of imbalanced learning are provided.
Abstract: 527 articles related to imbalanced data and rare events are reviewed. Viewing reviewed papers from both technical and practical perspectives. Summarizing existing methods and corresponding statistics by a new taxonomy idea. Categorizing 162 application papers into 13 domains and giving introduction. Some opening questions are discussed at the end of this manuscript. Rare events, especially those that could potentially negatively impact society, often require human decision-making responses. Detecting rare events can be viewed as a prediction task in data mining and machine learning communities. As these events are rarely observed in daily life, the prediction task suffers from a lack of balanced data. In this paper, we provide an in-depth review of rare event detection from an imbalanced learning perspective. Five hundred and seventeen related papers that have been published in the past decade were collected for the study. The initial statistics suggested that rare events detection and imbalanced learning are concerned across a wide range of research areas from management science to engineering. We reviewed all collected papers from both a technical and a practical point of view. Modeling methods discussed include techniques such as data preprocessing, classification algorithms and model evaluation. For applications, we first provide a comprehensive taxonomy of the existing application domains of imbalanced learning, and then we detail the applications for each category. Finally, some suggestions from the reviewed papers are incorporated with our experiences and judgments to offer further research directions for the imbalanced learning and rare event detection fields.

1,448 citations

Journal ArticleDOI
TL;DR: This article investigates the optimal estimation of the sample mean for meta-analysis from both theoretical and empirical perspectives and proposes estimators intended to serve as “rules of thumb” in evidence-based medicine.
Abstract: The era of big data is coming, and evidence-based medicine is attracting increasing attention to improve decision making in medical practice via integrating evidence from well designed and conducted clinical research. Meta-analysis is a statistical technique widely used in evidence-based medicine for analytically combining the findings from independent clinical trials to provide an overall estimation of a treatment effectiveness. The sample mean and standard deviation are two commonly used statistics in meta-analysis but some trials use the median, the minimum and maximum values, or sometimes the first and third quartiles to report the results. Thus, to pool results in a consistent format, researchers need to transform those information back to the sample mean and standard deviation. In this article, we investigate the optimal estimation of the sample mean for meta-analysis from both theoretical and empirical perspectives. A major drawback in the literature is that the sample size, needless to say its imp...

1,353 citations

Journal ArticleDOI
01 Jan 2021
TL;DR: Although SARS-CoV-2 RNA shedding in respiratory and stool samples can be prolonged, duration of viable virus is relatively short-lived.
Abstract: Background: Viral load kinetics and duration of viral shedding are important determinants for disease transmission. We aimed to characterise viral load dynamics, duration of viral RNA shedding, and viable virus shedding of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in various body fluids, and to compare SARS-CoV-2, SARS-CoV, and Middle East respiratory syndrome coronavirus (MERS-CoV) viral dynamics. Methods: In this systematic review and meta-analysis, we searched databases, including MEDLINE, Embase, Europe PubMed Central, medRxiv, and bioRxiv, and the grey literature, for research articles published between Jan 1, 2003, and June 6, 2020. We included case series (with five or more participants), cohort studies, and randomised controlled trials that reported SARS-CoV-2, SARS-CoV, or MERS-CoV infection, and reported viral load kinetics, duration of viral shedding, or viable virus. Two authors independently extracted data from published studies, or contacted authors to request data, and assessed study quality and risk of bias using the Joanna Briggs Institute Critical Appraisal Checklist tools. We calculated the mean duration of viral shedding and 95% CIs for every study included and applied the random-effects model to estimate a pooled effect size. We used a weighted meta-regression with an unrestricted maximum likelihood model to assess the effect of potential moderators on the pooled effect size. This study is registered with PROSPERO, CRD42020181914. Findings: 79 studies (5340 individuals) on SARS-CoV-2, eight studies (1858 individuals) on SARS-CoV, and 11 studies (799 individuals) on MERS-CoV were included. Mean duration of SARS-CoV-2 RNA shedding was 17·0 days (95% CI 15·5–18·6; 43 studies, 3229 individuals) in upper respiratory tract, 14·6 days (9·3–20·0; seven studies, 260 individuals) in lower respiratory tract, 17·2 days (14·4–20·1; 13 studies, 586 individuals) in stool, and 16·6 days (3·6–29·7; two studies, 108 individuals) in serum samples. Maximum shedding duration was 83 days in the upper respiratory tract, 59 days in the lower respiratory tract, 126 days in stools, and 60 days in serum. Pooled mean SARS-CoV-2 shedding duration was positively associated with age (slope 0·304 [95% CI 0·115–0·493]; p=0·0016). No study detected live virus beyond day 9 of illness, despite persistently high viral loads, which were inferred from cycle threshold values. SARS-CoV-2 viral load in the upper respiratory tract appeared to peak in the first week of illness, whereas that of SARS-CoV peaked at days 10–14 and that of MERS-CoV peaked at days 7–10. Interpretation: Although SARS-CoV-2 RNA shedding in respiratory and stool samples can be prolonged, duration of viable virus is relatively short-lived. SARS-CoV-2 titres in the upper respiratory tract peak in the first week of illness. Early case finding and isolation, and public education on the spectrum of illness and period of infectiousness are key to the effective containment of SARS-CoV-2. Funding: None.

1,061 citations