Journal ArticleDOI

A new approach to a legacy concern: Evaluating machine-learned Bayesian networks to predict childhood lead exposure risk from community water systems.

01 Mar 2022-Environmental Research (RTI International. P.O. Box 12194, Research Triangle Park, NC 27709-2194. Tel: 919-541-6000; e-mail: publications@rit.org; Web site: http://www.rti.org)-Vol. 204, pp 112146-112146
TL;DR: This paper uses machine-learned Bayesian networks to assess the relationship between children's blood lead levels and drinking water system characteristics, drawing on blood lead records from 2003 to 2017 for 40,742 children in Wake County, North Carolina.
About: This article was published in Environmental Research on 2022-03-01 and is currently open access. It has received 4 citations to date. The article focuses on the topics: Lead (geology) & Environmental health.
Citations
Journal ArticleDOI
TL;DR: In this paper, machine-learned Bayesian network (BN) models were used to predict building-wide water lead risk in over 4,000 child care facilities in North Carolina according to maximum and 90th percentile lead levels from water lead concentrations at 22,943 taps.
Abstract: Tap water lead testing programs in the U.S. need improved methods for identifying high-risk facilities to optimize limited resources. In this study, machine-learned Bayesian network (BN) models were used to predict building-wide water lead risk in over 4,000 child care facilities in North Carolina according to maximum and 90th percentile lead levels from water lead concentrations at 22,943 taps. The performance of the BN models was compared to common alternative risk factors, or heuristics, used to inform water lead testing programs among child care facilities including building age, water source, and Head Start program status. The BN models identified a range of variables associated with building-wide water lead, with facilities that serve low-income families, rely on groundwater, and have more taps exhibiting greater risk. Models predicting the probability of a single tap exceeding each target concentration performed better than models predicting facilities with clustered high-risk taps. The BN models' Fβ-scores outperformed each of the alternative heuristics by 118-213%. This represents up to a 60% increase in the number of high-risk facilities that could be identified and up to a 49% decrease in the number of samples that would need to be collected by using BN model-informed sampling compared to using simple heuristics. Overall, this study demonstrates the value of machine-learning approaches for identifying high water lead risk that could improve lead testing programs nationwide.
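The Fβ-score comparison in the abstract above can be illustrated with a minimal sketch. It assumes the standard definition of Fβ from confusion counts; the counts and the choice β = 2 below are hypothetical values chosen for illustration, not figures from the study.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """Standard F-beta score from confusion counts.

    beta > 1 weights recall (finding high-risk facilities) more heavily
    than precision (avoiding unnecessary sampling).
    """
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def percent_improvement(model_score: float, baseline_score: float) -> float:
    """Relative gain of the model score over a baseline, in percent."""
    return 100.0 * (model_score - baseline_score) / baseline_score


# Hypothetical counts (NOT from the paper): a model versus a simple heuristic.
model = f_beta(tp=60, fp=40, fn=40, beta=2.0)
heuristic = f_beta(tp=30, fp=70, fn=70, beta=2.0)
print(f"model F2 = {model:.2f}, heuristic F2 = {heuristic:.2f}")
print(f"improvement = {percent_improvement(model, heuristic):.0f}%")
```

A relative improvement above 100%, as reported in the abstract, means the model's score is more than double the heuristic's.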
Journal ArticleDOI
TL;DR: In this paper, structural topic modeling (STM) and geographic mapping are used to identify the main topics and pollutant categories being researched and the areas exposed to drinking water contaminants.
Journal ArticleDOI
TL;DR: In this article, a global dataset (∼40 countries, n = 1951) of community-sourced household dust samples was used to predict whether indoor dust was elevated in Pb, expanding on recent work in the United States.
Journal ArticleDOI
TL;DR: In this article, a detailed posterior probability analysis was conducted to unfold the network associations among gray-level co-occurrence matrix (GLCM) features, with cluster prominence selected as the target node.
Abstract: Lung cancer is the second leading form of cancer and is responsible for millions of deaths worldwide. Developing automated tools to improve prediction is still a challenging task. This study conducts a detailed posterior probability analysis to unfold the network associations among the gray-level co-occurrence matrix (GLCM) features. We then ranked the features based on a t-test. Cluster Prominence was selected as the target node. The association and arc analyses were based on mutual information. The occurrence and reliability of selected cluster states were computed. The Cluster Prominence at state ≤330.85 yielded ROC index of 100%, relative Gini index of 99.98%, and relative Gini index of 100%. The proposed method further unfolds the dynamics and supports detailed analysis of the computed GLCM features, for better understanding of the hidden dynamics for proper diagnosis and prognosis of lung cancer.
References
Journal ArticleDOI
TL;DR: A representation and interpretation of the area under a receiver operating characteristic (ROC) curve obtained by the "rating" method, or by mathematical predictions based on patient characteristics, is presented, and it is shown that in such a setting the area represents the probability that a randomly chosen diseased subject is (correctly) rated or ranked with greater suspicion than a randomly chosen non-diseased subject.
Abstract: A representation and interpretation of the area under a receiver operating characteristic (ROC) curve obtained by the "rating" method, or by mathematical predictions based on patient characteristics, is presented. It is shown that in such a setting the area represents the probability that a randomly chosen diseased subject is (correctly) rated or ranked with greater suspicion than a randomly chosen non-diseased subject. Moreover, this probability of a correct ranking is the same quantity that is estimated by the already well-studied nonparametric Wilcoxon statistic. These two relationships are exploited to (a) provide rapid closed-form expressions for the approximate magnitude of the sampling variability, i.e., standard error that one uses to accompany the area under a smoothed ROC curve, (b) guide in determining the size of the sample required to provide a sufficiently reliable estimate of this area, and (c) determine how large sample sizes should be to ensure that one can statistically detect difference...

19,398 citations
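The ranking interpretation of the ROC area described in this entry can be sketched directly: count, over all diseased/non-diseased pairs, how often the diseased subject receives the higher score. This is a minimal illustration, not code from the cited work; the scores are hypothetical, and counting ties as half a win is the usual convention for the Wilcoxon/Mann-Whitney form, assumed here.

```python
from itertools import product
from typing import Sequence


def auc_pairwise(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Area under the ROC curve as a pairwise ranking probability.

    Returns the fraction of (diseased, non-diseased) pairs in which the
    diseased subject has the higher score; ties count as 0.5 (assumed
    convention, matching the Mann-Whitney U statistic divided by n1 * n2).
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical suspicion scores (illustration only).
diseased = [0.9, 0.8, 0.4]
non_diseased = [0.7, 0.3, 0.2]
print(f"AUC = {auc_pairwise(diseased, non_diseased):.3f}")
```

The pairwise loop is quadratic in sample size; it is written this way to mirror the probabilistic definition rather than for efficiency.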

Journal Article
TL;DR: The goal of supervised learning is to build a concise model of the distribution of class labels in terms of predictor features, and the resulting classifier is then used to assign class labels to the testing instances where the values of the predictor features are known, but the value of the class label is unknown.
Abstract: The goal of supervised learning is to build a concise model of the distribution of class labels in terms of predictor features. The resulting classifier is then used to assign class labels to the testing instances where the values of the predictor features are known, but the value of the class label is unknown. This paper describes various supervised machine learning classification techniques. Of course, a single chapter cannot be a complete review of all supervised machine learning classification algorithms (also known as induction classification algorithms), yet we hope that the references cited will cover the major theoretical issues, guiding the researcher in interesting research directions and suggesting possible bias combinations that have yet to be explored.

2,535 citations

Journal ArticleDOI
TL;DR: Large-sample significance points are tabulated for a distribution-free goodness-of-fit test, introduced earlier by the authors, that is sensitive to discrepancies at the tails of the distribution rather than near the median.
Abstract: Some (large sample) significance points are tabulated for a distribution-free test of goodness of fit which was introduced earlier by the authors. The test, which uses the actual observations without grouping, is sensitive to discrepancies at the tails of the distribution rather than near the median. An illustration is given, using a numerical example used previously by Birnbaum in illustrating the Kolmogorov test.

2,013 citations

Journal ArticleDOI
TL;DR: Environmental lead exposure in children who have maximal blood lead levels < 7.5 μg/dL is associated with intellectual deficits, and an inverse relationship between blood lead concentration and IQ score is found.
Abstract: Lead is a confirmed neurotoxin, but questions remain about lead-associated intellectual deficits at blood lead levels < 10 µg/dL and whether lower exposures are, for a given change in exposure, associated with greater deficits. The objective of this study was to examine the association of intelligence test scores and blood lead concentration, especially for children who had maximal measured blood lead levels < 10 µg/dL. We examined data collected from 1,333 children who participated in seven international population-based longitudinal cohort studies, followed from birth or infancy until 5‐10 years of age. The full-scale IQ score was the primary outcome measure. The geometric mean blood lead concentration of the children peaked at 17.8 µg/dL and declined to 9.4 µg/dL by 5‐7 years of age; 244 (18%) children had a maximal blood lead concentration < 10 µg/dL, and 103 (8%) had a maximal blood lead concentration < 7.5 µg/dL. After adjustment for covariates, we found an inverse relationship between blood lead concentration and IQ score. Using a loglinear model, we found a 6.9 IQ point decrement [95% confidence interval (CI), 4.2‐9.4] associated with an increase in concurrent blood lead levels from 2.4 to 30 µg/dL. The estimated IQ point decrements associated with an increase in blood lead from 2.4 to 10 µg/dL, 10 to 20 µg/dL, and 20 to 30 µg/dL were 3.9 (95% CI, 2.4‐5.3), 1.9 (95% CI, 1.2‐2.6), and 1.1 (95% CI, 0.7‐1.5), respectively. For a given increase in blood lead, the lead-associated intellectual decrement for children with a maximal blood lead level < 7.5 µg/dL was significantly greater than that observed for those with a maximal blood lead level ≥ 7.5 µg/dL (p = 0.015). We conclude that environmental lead exposure in children who have maximal blood lead levels < 7.5 µg/dL is asso

1,945 citations
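The interval-by-interval decrements quoted in this abstract are consistent with a single log-linear slope, which a short check makes visible. The sketch assumes "loglinear" means IQ is linear in the natural log of blood lead, and it back-derives the slope from the reported 6.9-point total; it is not the authors' covariate-adjusted model, and the function names are hypothetical.

```python
import math


def slope_from_total(total_decrement: float, low: float, high: float) -> float:
    """IQ points lost per unit increase in ln(blood lead), back-derived
    from a total decrement reported over the range [low, high] (µg/dL)."""
    return total_decrement / math.log(high / low)


def decrement(slope: float, low: float, high: float) -> float:
    """Predicted IQ decrement when blood lead rises from low to high."""
    return slope * math.log(high / low)


# Reported total: 6.9 IQ points for an increase from 2.4 to 30 µg/dL.
slope = slope_from_total(6.9, 2.4, 30.0)

# Equal-sized steps cost fewer points at higher concentrations because the
# model depends on the ratio high/low, not on the difference.
for low, high in [(2.4, 10.0), (10.0, 20.0), (20.0, 30.0)]:
    print(f"{low:>4} to {high:>4} µg/dL: {decrement(slope, low, high):.1f} IQ points")
```

Rounded to one decimal place, the three intervals reproduce the 3.9, 1.9, and 1.1 point decrements given in the abstract.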

Journal ArticleDOI
TL;DR: Blood lead concentrations, even those below 10 µg per deciliter, are inversely associated with children's IQ scores at three and five years of age, and associated declines in IQ are greater at these concentrations than at higher concentrations.
Abstract: Background: Despite dramatic declines in children’s blood lead concentrations and a lowering of the Centers for Disease Control and Prevention’s level of concern to 10 µg per deciliter (0.483 µmol per liter), little is known about children’s neurobehavioral functioning at lead concentrations below this level. Methods: We measured blood lead concentrations in 172 children at 6, 12, 18, 24, 36, 48, and 60 months of age and administered the Stanford–Binet Intelligence Scale at the ages of 3 and 5 years. The relation between IQ and blood lead concentration was estimated with the use of linear and nonlinear mixed models, with adjustment for maternal IQ, quality of the home environment, and other potential confounders. Results: The blood lead concentration was inversely and significantly associated with IQ. In the linear model, each increase of 10 µg per deciliter in the lifetime average blood lead concentration was associated with a 4.6-point decrease in IQ (P=0.004), whereas for the subsample of 101 children whose maximal lead concentrations remained below 10 µg per deciliter, the change in IQ associated with a given change in lead concentration was greater. When estimated in a nonlinear model with the full sample, IQ declined by 7.4 points as lifetime average blood lead concentrations increased from 1 to 10 µg per deciliter. Conclusions: Blood lead concentrations, even those below 10 µg per deciliter, are inversely associated with children’s IQ scores at three and five years of age, and associated declines in IQ are greater at these concentrations than at higher concentrations. These findings suggest that more U.S. children may be adversely affected by environmental lead than previously estimated.

1,939 citations