Author

Margaret S. Pepe

Other affiliations: University of Washington
Bio: Margaret S. Pepe is an academic researcher from Fred Hutchinson Cancer Research Center. The author has contributed to research in topics: Receiver operating characteristic & Population. The author has an h-index of 78 and has co-authored 202 publications receiving 27,114 citations. Previous affiliations of Margaret S. Pepe include University of Washington.


Papers
Journal ArticleDOI
TL;DR: Obese children under three years of age without obese parents are at low risk for obesity in adulthood, but among older children, obesity is an increasingly important predictor of adult obesity, regardless of whether the parents are obese.
Abstract: Background Childhood obesity increases the risk of obesity in adulthood, but how parental obesity affects the chances of a child's becoming an obese adult is unknown. We investigated the risk of obesity in young adulthood associated with both obesity in childhood and obesity in one or both parents. Methods Height and weight measurements were abstracted from the records of 854 subjects born at a health maintenance organization in Washington State between 1965 and 1971. Their parents' medical records were also reviewed. Childhood obesity was defined as a body-mass index at or above the 85th percentile for age and sex, and obesity in adulthood as a mean body-mass index at or above 27.8 for men and 27.3 for women. Results In young adulthood (defined as 21 to 29 years of age), 135 subjects (16 percent) were obese. Among those who were obese during childhood, the chance of obesity in adulthood ranged from 8 percent for 1- or 2-year-olds without obese parents to 79 percent for 10-to-14-year-olds with at least one obese parent.

3,994 citations

Journal ArticleDOI
TL;DR: This work proposes summarizing the discrimination potential of a marker X, measured at baseline (t = 0), by calculating ROC curves for cumulative disease or death incidence by time t, denoted ROC(t), and presents an example in which ROC(t) is used to compare a standard and a modified flow cytometry measurement for predicting survival after detection of breast cancer.
Abstract: ROC curves are a popular method for displaying sensitivity and specificity of a continuous diagnostic marker, X, for a binary disease variable, D. However, many disease outcomes are time dependent, D(t), and ROC curves that vary as a function of time may be more appropriate. A common example of a time-dependent variable is vital status, where D(t) = 1 if a patient has died prior to time t and zero otherwise. We propose summarizing the discrimination potential of a marker X, measured at baseline (t = 0), by calculating ROC curves for cumulative disease or death incidence by time t, which we denote as ROC(t). A typical complexity with survival data is that observations may be censored. Two ROC curve estimators are proposed that can accommodate censored data. A simple estimator is based on using the Kaplan-Meier estimator for each possible subset X > c. However, this estimator does not guarantee the necessary condition that sensitivity and specificity are monotone in X. An alternative estimator that does guarantee monotonicity is based on a nearest neighbor estimator for the bivariate distribution function of (X, T), where T represents survival time (Akritas, M. J., 1994, Annals of Statistics 22, 1299-1327). We present an example where ROC(t) is used to compare a standard and a modified flow cytometry measurement for predicting survival after detection of breast cancer and an example where the ROC(t) curve displays the impact of modifying eligibility criteria for sample size and power in HIV prevention trials.
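
The simple Kaplan-Meier-based estimator described above can be sketched as follows. This is an illustrative reimplementation under assumed inputs (arrays of marker values, follow-up times, event indicators, and a time horizon) with simplified tie handling, not the authors' code.

```python
import numpy as np

def km_survival(time, event, t):
    """Kaplan-Meier estimate of S(t) = P(T > t) from right-censored data
    (simplified tie handling: observations are processed one at a time)."""
    time = np.asarray(time, float)
    event = np.asarray(event, int)
    order = np.argsort(time)
    time, event = time[order], event[order]
    n = len(time)
    surv = 1.0
    for i in range(n):
        if time[i] > t:
            break
        if event[i] == 1:                  # an observed event at time[i]
            surv *= 1.0 - 1.0 / (n - i)
    return surv

def roc_t(marker, time, event, horizon):
    """Sensitivity and specificity at each marker cutoff for cumulative
    incidence by `horizon`, via Bayes' rule and KM within {X > c} / {X <= c}.
    This simple estimator need not be monotone in the cutoff."""
    marker = np.asarray(marker, float)
    time = np.asarray(time, float)
    event = np.asarray(event, int)
    cutoffs = np.unique(marker)
    s_all = km_survival(time, event, horizon)      # S(t) in the whole sample
    sens, spec = [], []
    for c in cutoffs:
        above = marker > c
        p_above = above.mean()
        s_above = km_survival(time[above], event[above], horizon) if above.any() else 1.0
        s_below = km_survival(time[~above], event[~above], horizon) if (~above).any() else 1.0
        sens.append((1.0 - s_above) * p_above / (1.0 - s_all))   # P(X > c | T <= t)
        spec.append(s_below * (1.0 - p_above) / s_all)           # P(X <= c | T > t)
    return cutoffs, np.array(sens), np.array(spec)
```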

2,177 citations

Journal ArticleDOI
TL;DR: The purpose of this commentary is to define a formal structure to guide the process of biomarker development and to provide a checklist of issues that should be addressed at each phase of development before proceeding to the next.
Abstract: Recent developments in such areas of research as gene-expression microarrays, proteomics, and immunology offer new approaches to cancer screening (1). The surge in research to develop cancer-screening biomarkers prompted the establishment of the Early Detection Research Network (EDRN) by the National Cancer Institute (2). The purpose of the EDRN is to coordinate research among biomarker-development laboratories, biomarker-validation laboratories, clinical repositories, and population-screening programs. By coordination of research efforts, the hope is to facilitate collaboration and to promote efficiency and rigor in research. With the goals of the EDRN in mind, the purpose of this commentary is to define a formal structure to guide the process of biomarker development. We categorize the development into five phases that a biomarker needs to pass through to produce a useful population-screening tool. The phases of research are generally ordered according to the strength of evidence that each provides in favor of the biomarker, from weakest to strongest. In addition, the results of earlier phases are generally necessary to design later phases. Therapeutic drug development has had such a structure in place for some time (3). The clinical phases of testing a new cancer drug are as follows: phase 1, determinations of toxicity, pharmacokinetics, and optimal dose levels; phase 2, determinations of biologic efficacy; and phase 3, definitive controlled trials of effects on clinical endpoints. For each phase, guidelines exist for subject selection, outcome measures, relevant comparisons for evaluating study results, and so forth. Although deviations are common, the basic structure facilitates coherent, thorough, and efficient development of new therapies. A phased approach has also been proposed for prevention trials (4,5). In a similar vein, we hope that our proposed guidelines or some related construct will facilitate the development of biomarker-based screening tools for early detection of cancer. Although deviations from these guidelines may be necessary in specific applications, our proposal will, at the minimum, provide a checklist of issues that should be addressed at each phase of development before proceeding to the next.

1,491 citations

Journal ArticleDOI
TL;DR: A new approach to the design and analysis of Phase 1 clinical trials in cancer is proposed that concentrates experimentation at the current best estimate of the dose with a targeted toxicity level and relies on a particularly simple model that need only approximate the true probability of toxic response locally.
Abstract: This paper looks at a new approach to the design and analysis of Phase 1 clinical trials in cancer. The basic idea and motivation behind the approach stem from an attempt to reconcile the needs of dose-finding experimentation with the ethical demands of established medical practice. It is argued that for these trials the particular shape of the dose toxicity curve is of little interest. Attention focuses rather on identifying a dose with a given targeted toxicity level and on concentrating experimentation at that which all current available evidence indicates to be the best estimate of this level. Such an approach not only makes an explicit attempt to meet ethical requirements but also enables the use of models whose only requirements are that locally (i.e., around the dose corresponding to the targeted toxicity level) they reasonably well approximate the true probability of toxic response. Although a large number of models could be contemplated, we look at a particularly simple one. Extensive simulations show the model to have real promise.
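
As an illustration of the kind of update the approach relies on, the sketch below uses a one-parameter "power" working model, a normal prior on its parameter, and picks the next dose whose estimated toxicity probability is closest to the target. The skeleton, target rate, and prior are assumed choices for illustration, not values from the paper.

```python
import numpy as np

def crm_next_dose(skeleton, doses_given, toxicities, target=0.25, prior_sd=1.34):
    """Continual-reassessment-style update: working model P(tox at dose d) =
    skeleton[d] ** exp(a), with a normal prior on a evaluated on a grid."""
    skeleton = np.asarray(skeleton, float)            # prior guesses of toxicity by dose
    a_grid = np.linspace(-4.0, 4.0, 801)
    post = np.exp(-0.5 * (a_grid / prior_sd) ** 2)    # unnormalized N(0, prior_sd^2) prior

    for d, y in zip(doses_given, toxicities):         # multiply in each patient's likelihood
        p = skeleton[d] ** np.exp(a_grid)             # toxicity probability under each a
        post *= p if y else (1.0 - p)
    post /= post.sum()                                # discrete posterior over the grid

    # posterior-mean toxicity probability at every dose level
    p_hat = np.array([(skeleton[d] ** np.exp(a_grid) * post).sum()
                      for d in range(len(skeleton))])
    return int(np.argmin(np.abs(p_hat - target)))     # dose closest to the target toxicity

# toy usage: five dose levels, three patients treated so far
skeleton = [0.05, 0.10, 0.20, 0.35, 0.50]
print(crm_next_dose(skeleton, doses_given=[0, 1, 1], toxicities=[0, 0, 1]))
```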

1,402 citations

Journal ArticleDOI
TL;DR: An illustration of the relation between odds ratios and receiver operating characteristic curves shows, for example, that a marker with an odds ratio of as high as 3 is in fact a very poor classification tool.
Abstract: A marker strongly associated with outcome (or disease) is often assumed to be effective for classifying persons according to their current or future outcome. However, for this assumption to be true, the associated odds ratio must be of a magnitude rarely seen in epidemiologic studies. In this paper, an illustration of the relation between odds ratios and receiver operating characteristic curves shows, for example, that a marker with an odds ratio of as high as 3 is in fact a very poor classification tool. If a marker identifies 10% of controls as positive (false positives) and has an odds ratio of 3, then it will correctly identify only 25% of cases as positive (true positives). The authors illustrate that a single measure of association such as an odds ratio does not meaningfully describe a marker's ability to classify subjects. Appropriate statistical methods for assessing and reporting the classification power of a marker are described. In addition, the serious pitfalls of using more traditional methods based on parameters in logistic regression models are illustrated.
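
The arithmetic behind that example follows from the odds-ratio definition for a binary marker; the small check below (an illustration, not code from the paper) solves OR = [TPF/(1 - TPF)] / [FPF/(1 - FPF)] for the true-positive fraction.

```python
def tpf_from_or(odds_ratio: float, fpf: float) -> float:
    """True-positive fraction implied by an odds ratio and a false-positive fraction."""
    return odds_ratio * fpf / (1.0 - fpf + odds_ratio * fpf)

print(tpf_from_or(3.0, 0.10))    # 0.25  -> only 25% of cases flagged at 10% false positives
print(tpf_from_or(3.0, 0.20))    # ~0.43
print(tpf_from_or(100.0, 0.10))  # ~0.92 -> the magnitude needed for a useful classifier
```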

1,294 citations


Cited by
Journal ArticleDOI
TL;DR: This article proposes a proportional hazards model for the subdistribution of a competing risk, allowing the analyst to directly assess the effect of a covariate on the cumulative incidence function, which earlier methods based on combining cause-specific hazard estimates do not permit.
Abstract: With explanatory covariates, the standard analysis for competing risks data involves modeling the cause-specific hazard functions via a proportional hazards assumption. Unfortunately, the cause-specific hazard function does not have a direct interpretation in terms of survival probabilities for the particular failure type. In recent years many clinicians have begun using the cumulative incidence function, the marginal failure probabilities for a particular cause, which is intuitively appealing and more easily explained to the nonstatistician. The cumulative incidence is especially relevant in cost-effectiveness analyses in which the survival probabilities are needed to determine treatment utility. Previously, authors have considered methods for combining estimates of the cause-specific hazard functions under the proportional hazards formulation. However, these methods do not allow the analyst to directly assess the effect of a covariate on the marginal probability function. In this article we propose a proportional hazards model for the subdistribution of a competing risk, which allows covariate effects on the cumulative incidence function to be modeled directly.
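
For reference, the cumulative incidence function mentioned above can be estimated nonparametrically; the sketch below shows the standard Aalen-Johansen-type estimator (an illustration with an assumed coding of 0 = censored and 1, 2, ... = failure causes), not the regression model the article proposes.

```python
import numpy as np

def cumulative_incidence(time, cause, k):
    """Nonparametric cumulative incidence of cause `k` under competing risks.
    Returns the event times of cause k and the CIF evaluated at those times."""
    time = np.asarray(time, float)
    cause = np.asarray(cause, int)
    order = np.argsort(time)
    time, cause = time[order], cause[order]
    n = len(time)

    surv = 1.0                     # overall (any-cause) survival just before t
    cif = 0.0
    out_t, out_cif = [], []
    i = 0
    while i < n:
        t = time[i]
        j = i
        while j < n and time[j] == t:      # group tied times
            j += 1
        at_risk = n - i
        d_any = np.sum(cause[i:j] > 0)     # failures of any cause at t
        d_k = np.sum(cause[i:j] == k)      # failures of cause k at t
        cif += surv * d_k / at_risk        # increment uses S(t-)
        surv *= 1.0 - d_any / at_risk
        if d_k > 0:
            out_t.append(t)
            out_cif.append(cif)
        i = j
    return np.array(out_t), np.array(out_cif)
```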

11,109 citations

Journal ArticleDOI
TL;DR: pROC is a package for R and S+ that provides a set of tools for displaying, analyzing, smoothing, and comparing ROC curves in a user-friendly, object-oriented, and flexible interface.
Abstract: Receiver operating characteristic (ROC) curves are useful tools to evaluate classifiers in biomedical and bioinformatics applications. However, conclusions are often reached through inconsistent use or insufficient statistical analysis. To support researchers in their ROC curves analysis we developed pROC, a package for R and S+ that contains a set of tools displaying, analyzing, smoothing and comparing ROC curves in a user-friendly, object-oriented and flexible interface. With data previously imported into the R or S+ environment, the pROC package builds ROC curves and includes functions for computing confidence intervals, statistical tests for comparing total or partial area under the curve or the operating points of different classifiers, and methods for smoothing ROC curves. Intermediary and final results are visualised in user-friendly interfaces. A case study based on published clinical and biomarker data shows how to perform a typical ROC analysis with pROC. pROC is a package for R and S+ specifically dedicated to ROC analysis. It proposes multiple statistical tests to compare ROC curves, and in particular partial areas under the curve, allowing proper ROC interpretation. pROC is available in two versions: in the R programming language or with a graphical user interface in the S+ statistical software. It is accessible at http://expasy.org/tools/pROC/ under the GNU General Public License. It is also distributed through the CRAN and CSAN public repositories, facilitating its installation.
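
pROC itself is an R/S+ package; purely as an analogous illustration (not the pROC API), the same kind of analysis — an ROC curve, its AUC, a bootstrap confidence interval, and a bootstrap comparison of two markers — can be sketched in Python with scikit-learn on made-up data.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, n)                       # binary outcome
marker_a = y + rng.normal(0, 1.0, n)            # informative marker
marker_b = y + rng.normal(0, 2.0, n)            # noisier marker

fpr, tpr, thresholds = roc_curve(y, marker_a)
auc_a = roc_auc_score(y, marker_a)

# bootstrap CI for AUC and for the AUC difference between the two markers
boot_a, boot_diff = [], []
for _ in range(2000):
    idx = rng.integers(0, n, n)
    if len(np.unique(y[idx])) < 2:              # need both classes in the resample
        continue
    a = roc_auc_score(y[idx], marker_a[idx])
    b = roc_auc_score(y[idx], marker_b[idx])
    boot_a.append(a)
    boot_diff.append(a - b)

print("AUC(A) =", round(auc_a, 3),
      "95% CI", np.percentile(boot_a, [2.5, 97.5]).round(3))
print("AUC(A) - AUC(B) 95% CI", np.percentile(boot_diff, [2.5, 97.5]).round(3))
```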

8,052 citations

Journal ArticleDOI
TL;DR: A review of P. Billingsley's monograph Convergence of Probability Measures (Wiley, 1968), a standard reference on weak convergence of probability measures.
Abstract: Convergence of Probability Measures. By P. Billingsley. Chichester, Sussex, Wiley, 1968. xii, 253 p. 9 1/4". 117s.

5,689 citations

Journal ArticleDOI
TL;DR: Two new measures, one based on integrated sensitivity and specificity and the other on reclassification tables, are introduced; they offer incremental information over the AUC and are proposed for consideration in addition to the AUC when assessing the performance of newer biomarkers.
Abstract: Identification of key factors associated with the risk of developing cardiovascular disease and quantification of this risk using multivariable prediction algorithms are among the major advances made in preventive cardiology and cardiovascular epidemiology in the 20th century. The ongoing discovery of new risk markers by scientists presents opportunities and challenges for statisticians and clinicians to evaluate these biomarkers and to develop new risk formulations that incorporate them. One of the key questions is how best to assess and quantify the improvement in risk prediction offered by these new models. Demonstration of a statistically significant association of a new biomarker with cardiovascular risk is not enough. Some researchers have advanced that the improvement in the area under the receiver-operating-characteristic curve (AUC) should be the main criterion, whereas others argue that better measures of performance of prediction models are needed. In this paper, we address this question by introducing two new measures, one based on integrated sensitivity and specificity and the other on reclassification tables. These new measures offer incremental information over the AUC. We discuss the properties of these new measures and contrast them with the AUC. We also develop simple asymptotic tests of significance. We illustrate the use of these measures with an example from the Framingham Heart Study. We propose that scientists consider these types of measures in addition to the AUC when assessing the performance of newer biomarkers.
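
A minimal sketch of how these two kinds of measures are typically computed from predicted risks under an old and a new model is shown below; the risk categories and the toy data are assumptions for illustration, not values from the paper.

```python
import numpy as np

def idi(p_old, p_new, y):
    """Integrated discrimination improvement: gain in mean predicted risk among
    events minus the gain among non-events."""
    y = np.asarray(y, bool)
    return ((p_new[y].mean() - p_old[y].mean())
            - (p_new[~y].mean() - p_old[~y].mean()))

def categorical_nri(p_old, p_new, y, cuts=(0.1, 0.2)):
    """Net reclassification improvement from a reclassification table with the
    given (assumed) risk-category cut points."""
    y = np.asarray(y, bool)
    old_cat = np.digitize(p_old, cuts)
    new_cat = np.digitize(p_new, cuts)
    up, down = new_cat > old_cat, new_cat < old_cat
    return (up[y].mean() - down[y].mean()) + (down[~y].mean() - up[~y].mean())

# toy usage with made-up predicted risks
rng = np.random.default_rng(1)
y = rng.integers(0, 2, 500)
p_old = np.clip(0.15 + 0.10 * y + rng.normal(0, 0.08, 500), 0, 1)
p_new = np.clip(0.15 + 0.18 * y + rng.normal(0, 0.08, 500), 0, 1)
print("IDI:", round(idi(p_old, p_new, y), 3))
print("NRI:", round(categorical_nri(p_old, p_new, y), 3))
```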

5,651 citations

Journal ArticleDOI
13 Dec 2016 - JAMA
TL;DR: An algorithm based on deep machine learning had high sensitivity and specificity for detecting referable diabetic retinopathy and diabetic macular edema in retinal fundus photographs from adults with diabetes.
Abstract: Importance Deep learning is a family of computational methods that allow an algorithm to program itself by learning from a large set of examples that demonstrate the desired behavior, removing the need to specify rules explicitly. Application of these methods to medical imaging requires further assessment and validation. Objective To apply deep learning to create an algorithm for automated detection of diabetic retinopathy and diabetic macular edema in retinal fundus photographs. Design and Setting A specific type of neural network optimized for image classification called a deep convolutional neural network was trained using a retrospective development data set of 128 175 retinal images, which were graded 3 to 7 times for diabetic retinopathy, diabetic macular edema, and image gradability by a panel of 54 US licensed ophthalmologists and ophthalmology senior residents between May and December 2015. The resultant algorithm was validated in January and February 2016 using 2 separate data sets, both graded by at least 7 US board-certified ophthalmologists with high intragrader consistency. Exposure Deep learning–trained algorithm. Main Outcomes and Measures The sensitivity and specificity of the algorithm for detecting referable diabetic retinopathy (RDR), defined as moderate and worse diabetic retinopathy, referable diabetic macular edema, or both, were generated based on the reference standard of the majority decision of the ophthalmologist panel. The algorithm was evaluated at 2 operating points selected from the development set, one selected for high specificity and another for high sensitivity. Results The EyePACS-1 data set consisted of 9963 images from 4997 patients (mean age, 54.4 years; 62.2% women; prevalence of RDR, 683/8878 fully gradable images [7.8%]); the Messidor-2 data set had 1748 images from 874 patients (mean age, 57.6 years; 42.6% women; prevalence of RDR, 254/1745 fully gradable images [14.6%]). For detecting RDR, the algorithm had an area under the receiver operating curve of 0.991 (95% CI, 0.988-0.993) for EyePACS-1 and 0.990 (95% CI, 0.986-0.995) for Messidor-2. Using the first operating cut point with high specificity, for EyePACS-1, the sensitivity was 90.3% (95% CI, 87.5%-92.7%) and the specificity was 98.1% (95% CI, 97.8%-98.5%). For Messidor-2, the sensitivity was 87.0% (95% CI, 81.1%-91.0%) and the specificity was 98.5% (95% CI, 97.7%-99.1%). Using a second operating point with high sensitivity in the development set, for EyePACS-1 the sensitivity was 97.5% and specificity was 93.4% and for Messidor-2 the sensitivity was 96.1% and specificity was 93.9%. Conclusions and Relevance In this evaluation of retinal fundus photographs from adults with diabetes, an algorithm based on deep machine learning had high sensitivity and specificity for detecting referable diabetic retinopathy. Further research is necessary to determine the feasibility of applying this algorithm in the clinical setting and to determine whether use of the algorithm could lead to improved care and outcomes compared with current ophthalmologic assessment.
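
As an illustration of the operating-point selection described above (a sketch with assumed target values, not the study's code), thresholds meeting a high-specificity or high-sensitivity target can be read off a development-set ROC curve.

```python
import numpy as np
from sklearn.metrics import roc_curve

def pick_operating_points(y_dev, scores_dev, min_specificity=0.98, min_sensitivity=0.975):
    """Return two score thresholds chosen on the development set: one meeting a
    specificity target with the best sensitivity, and one meeting a sensitivity
    target with the best specificity. Assumes both targets are attainable."""
    fpr, tpr, thr = roc_curve(y_dev, scores_dev)
    spec = 1.0 - fpr
    ok_spec = spec >= min_specificity
    t_high_spec = thr[ok_spec][np.argmax(tpr[ok_spec])]
    ok_sens = tpr >= min_sensitivity
    t_high_sens = thr[ok_sens][np.argmax(spec[ok_sens])]
    return t_high_spec, t_high_sens
```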

4,810 citations