scispace - formally typeset
Search or ask a question

Showing papers on "Decision tree model published in 2023"


Proceedings ArticleDOI
29 Apr 2023
TL;DR: Wang et al. as discussed by the authors used decision tree classification algorithm the employment situation of college students based on their past educational background and work experience, and took the enrollment, student status management and employment data of Yunnan Institute of Mechanical and Electrical Technology as samples, analyzed the employment factors of students through learning decision tree classifier, and obtains the scores of student enrollment, comprehensive quality evaluation
Abstract: Decision tree is a data analysis method that can be used to predict the future behavior of individuals based on their past behavior. This technology is very effective in predicting the future employment situation of students, because it takes into account all relevant factors that affect a person's job prospects. In this paper, I will use decision tree classification algorithm the employment situation of college students based on their past educational background and work experience. This paper is based on the quality requirements of higher vocational education, based on the classification analysis algorithm in data mining and machine learning, and taking the enrollment, student status management and employment data of Yunnan Institute of Mechanical and Electrical Technology as samples, analyzes the employment factors of students through learning decision tree classifier, and obtains the scores of student enrollment, comprehensive quality evaluation The prediction model of the region and nature of the employment unit on the smooth employment and the type of the employment unit is designed to provide a scientific basis for the employment guidance and talent training program of higher vocational colleges, and has certain practical value.

Journal ArticleDOI
06 Jul 2023-Water
TL;DR: Choi et al. as mentioned in this paper proposed a new data-driven approach combining various recurrent neural networks and decision tree-based algorithms to predict river runoff in ungauged basins.
Abstract: River runoff predictions in ungauged basins are one of the major challenges in hydrology. In the past, the approach using a physical-based conceptual model was the main approach, but recently, a solution using a data-driven model has been evaluated as more appropriate through several studies. In this study, a new data-driven approach combining various recurrent neural networks and decision tree-based algorithms is proposed. An advantage of recurrent neural networks is that they can learn long-term dependencies between inputs and outputs provided to the network. Decision tree-based algorithms, combined with recurrent neural networks, serve to reflect topographical information treated as constants and can identify the importance of input features. We tested the proposed approach using data from 25 watersheds publicly available on the Korean government’s website. The potential of the proposed approach as a regional hydrologic model is evaluated in the view that one regional model predicts river runoff in various watersheds using the leave-one-out cross-validation regionalization setup.

Journal ArticleDOI
TL;DR: In this paper , a decision tree-based model combining IPI and TME cells in diffuse large B-cell lymphoma (DLBCLBCL) was proposed to refine risk stratification integrating TME and clinical features.
Abstract: In diffuse large B-cell lymphoma (DLBCL), the heterogeneity of response to standard first-line therapy1 largely relies on the tremendous biological complexity of the disease, claiming for an urgent improvement of our capacity to characterize it at diagnosis to guide treatments. Among emerging prognosticators, peculiar cytotypes of tumor microenvironment (TME) and relative gene sets were shown to predict patients’ risk.2–5 We previously recognized prognostic genes reflecting patterns of stromal (myofibroblasts [Myo]) and immune elements (CD4+ T and dendritic cells) intriguingly associated with macrophages (Mo).6 These latter as well as the choice of reproducible biomarkers to capture their functional heterogeneity remain an object of intense debate.7 Starting from the notion that Mo polarization and inflammatory response may be influenced by oxysterol levels via nuclear “Liver X receptors” (LXRs),8,9 we recently recognized the prognostic role of LXRα (NR1H3) in DLBCL, prompting future investigation on new LXR agonists for therapeutic purposes.10 Here, we sought to corroborate our previous observations using a decision-tree approach on large independent DLBCL cohorts to refine risk stratification integrating TME and clinical features. In doing so, we validated NR1H3 as an intriguing M1-Mo-related prognosticator, envisioning its potential as a molecular predictor toward future approaches of Mo-targeting drugs. We applied the deconvolution algorithm CIBERSORT4 in combination with a decision tree–based approach on 2 large independent DLBCL cohorts to recognize the most relevant features among clinical and TME prognosticators. The whole methodological pipeline is detailed in Figure 1A.Figure 1.: Decision tree–based model combining IPI and TME cells in DLBCL. (A) Schematic description of the pipeline used for data processing. (B) Histograms showing the relative percentages of each cell type in the M24 customized signature as detected by CIBERSORT analysis deconvolution of the GSE117556 dataset (n = 928). Decision-tree depicting results of recursive models applied on clinical and biological features built on PFS (C) and OS (D) in the training set. The most relevant groups are shown along with survival plots (log-rank test, P < 10−3). Decision-tree showing results of recursive models applied on clinical features (Rev-IPI variables), ACTA2 and NR1H3 surrogating Myofibroblasts and M1 macrophages, respectively, built on PFS (E) and OS (F). Most relevant groups are shown along with survival plots (log-rank test, P < 10−3). Kaplan-Meier survival plots for PFS (G–H) and OS (I–J) of ACTA2 and NR1H3 dichotomized expression combined with Rev-IPI in the validation cohort (GSE98588, n = 98). Adjusted P values as derived from pairwise comparisons using log-rank test are shown. ***P ≤ 0.001; **P ≤ 0.01; *P ≤ 0.05; ns (not significant) P > 0.05. AAstage = Ann Arbor Stage; ABC = activated B cell; ECOGps = eastern cooperative oncology group performance status; GBC = germinal B center-like; IPI or Rev-IPI = Revised International Prognostic Index; LDH>/≤ULN = lactate dehydrogenase >/≤ upper level of normal; NK = natural killer; OS = overall survival; PFS = progression-free survival; TME = tumor microenvironment.Briefly, GEP data from 928 DLBCL cases (training set; GSE117556, Table 1) were deconvoluted to quantify the relative percentages of 24 TME cytotypes (Suppl. Table S1, Figure 1B) and dichotomized according to maximally selected rank statistics (surv_cutpoint function implemented in surminer R package) based on both progression-free survival (PFS) and overall survival (OS). The resulting variables in keeping with known clinical prognosticators (cell of origin [COO] subtypes and revised International Prognostic Index [Rev-IPI]) and other clinical features (gender, age at diagnosis, lactate dehydrogenase > upper limit of normal, Eastern Cooperative Group performance status, Ann Arbor stage, extranodal involvements, and treatment arm) underwent univariate feature selection by applying log-rank and Cox proportional hazard test on PFS and OS (Suppl. Table S2). Then, the selected significant features were included in a recursive decision-tree model (partykit package implemented in R software).11 To increase the translational power of the model, it was reapplied by substituting prognostic cell types with consistent molecular surrogates (smooth muscle alpha-2 actin encoded by ACTA2 gene for Myo and NR1H3 for M1-Mo). The expression values of these 2 genes were dichotomized according to a cutoff identified by maximally selected rank statistics in 2 groups (high or low) before reapplying the recursive model. Finally, we validated the results on an independent case set of 137 DLBCL (GSE98588)12 (Suppl. material) using Rev-IPI, ACTA2, and NR1H3 expression value dichotomized according to the cutoff previously identified in the training set. To overcome the batch effect on normalization derived from 2 different array platforms, the expression values were properly scaled. To validate our methods, we used the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis Or Diagnosis (TRIPOD) criteria (Suppl. TRIPOD). Table 1 - Patients’ Characteristics Training Set Validation Set No. of patients 928 137 P Value Gender, N of males (%) 517 (55.7) 76 (55.5) 1 Age, y: median (IQR) 44.0 (35.0–50.0) 45.5 (36.0–55.0) 0.018 LDH > ULN, N (%) 388 (41.8) 52 (50.0) 0.134 ECOGps 3/4, N (%) 105 (11.3) 21 (19.4) 0.022 AAStage III/IV, N (%) 638 (68.8) 69 (63.9) 0.35 Extranodal involvement, N (%) 521 (56.1) 28 (25.9) <0.001 Rev IPI (%) 0.029 Poor 446 (48.1) 44 (42.3) Good 428 (46.1) 47 (45.2) Very good 54 (5.8) 13 (12.5) NA 0 33 COO (%) <0.001 ABC 255 (27.5) 63 (46.0) GCB 543 (58.5) 54 (39.4) UNC 130 (14.0) 20 (14.6) Molecular high grade (%) NA Double hit/triple hit 35 (9.7) NA MYC-normal 309 (85.8) NA MYC-rearranged 14 (3.9) NA TME cytotypes Tumor cells ABC low a , N (%) 552 (59.5) 109 (79.6) <0.001 Tumor cells ABC low b , N (%) 552 (59.5) 109 (79.6) <0.001 Tumor cells GCB low a , N (%) 834 (89.9) 29 (21.2) <0.001 Tumor cells GCB low b , N (%) 831 (89.5) 29 (21.2) <0.001 Naive B-cells low a , N (%) 792 (85.3) 87 (63.5) <0.001 Naive B-cells low b , N (%) 777 (83.7) 106 (77.4) 0.085 Memory B-cells low a , N (%) 107 (11.5) 94 (68.6) <0.001 Memory B-cells low b , N (%) 107 (11.5) 19 (13.9) 0.516 Plasma cells low a , N (%) 237 (25.5) 107 (78.1) <0.001 Plasma cells low b , N (%) 699 (75.3) 107 (78.1) 0.548 CD8+ cells low a , N (%) 699 (75.3) 55 (40.1) <0.001 CD8+ cells low b , N (%) 383 (41.3) 74 (54.0) 0.007 CD4+ cells low a , N (%) 417 (44.9) 77 (56.2) 0.017 CD4+ cells low b , N (%) 680 (73.3) 100 (73.0) 1 Gamma delta TC low a , N (%) 765 (82.4) 124 (90.5) 0.024 Gamma delta TC low b , N (%) 763 (82.2) 124 (90.5) 0.021 Follicular helper TC low a , N (%) 413 (44.5) 35 (25.5) <0.001 Follicular helper TC low b , N (%) 504 (54.3) 35 (25.5) <0.001 Regulatory cells low a , N (%) 518 (55.8) 137 (100.0) <0.001 Regulatory cells low b , N (%) 808 (87.1) 137 (100.0) <0.001 NK resting low a , N Low (%) 830 (89.4) 15 (10.9) <0.001 NK resting low b , N Low (%) 223 (24.0) 28 (20.4) 0.414 NK activated low a , N (%) 217 (23.4) 0 (0.0) <0.001 NK activated low b , N (%) 203 (21.9) 40 (29.2) 0.072 Monocytes low a , N (%) 187 (20.2) 11 (8.0) 0.001 Monocytes low b , N (%) 187 (20.2) 28 (20.4) 1 M1-Mo low a , N (%) 198 (21.3) 112 (81.8) <0.001 M1-Mo low b , N (%) 790 (85.1) 113 (82.5) 0.498 M2-Mo low a , N (%) 807 (87.0) 56 (40.9) <0.001 M2-Mo low b , N (%) 0 (0.0) 29 (21.2) <0.001 Dendritic cell low a , N (%) 0 (0.0) 121 (88.3) <0.001 Dendritic cell low b , N (%) 0 (0.0) 119 (86.9) <0.001 Eosinophils low a , N (%) 0 (0) 0 (0) NA Eosinophils low b , N (%) 307 (33.1) 0 (0.0) <0.001 Neutrophils low a , N (%) 371 (40.0) 0 (0.0) <0.001 Neutrophils low b , N (%) 824 (88.8) 0 (0.0) <0.001 Myofibroblasts low a , N (%) 703 (75.8) 113 (82.5) 0.103 Myofibroblasts low b , N (%) 665 (71.7) 0 (0.0) <0.001 Lymphatic endothelial cells low a , N (%) 223 (24.0) 0 (0.0) <0.001 Lymphatic endothelial cells low b , N (%) 0 (0.0) 43 (31.4) <0.001 Artery endothelial cells low a , N (%) 0 (0.0) 120 (87.6) <0.001 Artery endothelial cells low b , N (%) 611 (65.8) 110 (80.3) 0.001 Adipocytes low a , N (%) 611 (65.8) 113 (82.5) <0.001 Adipocytes low b , N (%) 771 (83.1) 113 (82.5) 0.958 Pericytes low a , N (%) 193 (20.8) 118 (86.1) <0.001 Pericytes low b , N (%) 813 (87.6) 113 (82.5) 0.127 R-CHOP like chemotherapy, N (%) 113 (12.2) 24 (17.5) 0.108 aCut-off on PFS.bCut-off on OS.Italic values represent significant P values: P < 0.05.AAStage = Ann Arbor stage; ABC = activated B cell; COO = cell of origin; ECOGps = Eastern Cooperative Group performance status; GCB = germinal center B cell; IPI = international prognostic index; IQR = interquartile range; LDH = lactate dehydrogenase; Mo = macrophages; NA = not available; NK = natural killer; OS = overall survival; PFS = progression-free survival; TME = tumor microenvironment; ULN = upper limit of normal; UNC = unclassified. Among all clinical and TME features undergone univariate analysis, only 12 displayed significant association with both PFS and OS (Suppl. Figures S1–S2). As previously observed,6,7 those cases with higher predominance of M1-Mo (Suppl. Figure S1A–B) and Myo showed significantly longer OS and PFS (Suppl. Figure S1C–D). Our recursive PFS-based model (Figure 1C) recognized the Rev-IPI as the first feature splitting the entire study cohort into good/very good and poor categories (P < 0.001). The more favorable one was further subdivided—according to the level of Myo infiltration—into “high” and “low” subsets differing significantly in terms of PFS (P = 0.04). On the other hand, the proportion of intra-tumor M1-Mo (distinct in “high” and “low”) was able to separate the poor category into 2 smaller subsets, M1-Mo “high” and “low” (P < 0.001), the latter being at worst prognosis, independently of double hit or triple hit (DH/TH) status (Suppl. Figure S3A). Notably, the good/very good and Myo “high” subgroup displayed the best PFS, remarkably longer than cases in the Poor/M1-Mo “low” subgroups. In terms of OS (Figure 1D), our model confirmed the Rev-IPI as main prognosticator (P < 0.001) and recognized the extent of M1-Mo infiltration as the only feature identifying—within the poor category—patients at very high risk, independently of DH/TH status (Suppl. Figures S3B). Interestingly, good/very good patients showed OS similar to those that, even belonging to the poor category, have higher M1-Mo infiltration. Based on our recent finding6,7 and a new correlation analysis (Suppl. Figure S4A–B), we constructed a decision-tree model using the expression of ACTA2 and NR1H3 as molecular surrogates of Myo and M1-Mo, respectively. Again, in the PFS-based model (Figure 1E), the Rev-IPI subdivided patients into good/very good and poor categories (P < 0.001). ACTA2 expression split good/very good patients into 2 additional subgroups with significantly different PFS, while the poor category comprised patients at worse PFS with no further subcategorization. According to OS (Figure 1F), the model confirmed the Rev-IPI as the main risk predictor (P < 0.001), showing that the poor category included patients at lower NR1H3 expression and shortest OS (P < 0.001). Notably, despite belonging to the poor category, those expressing higher levels of NR1H3 displayed similar OS compared with good/very good patients. We used an independent set12 of 137 cases to validate these findings. In the univariate and multivariate analysis, ACTA2 showed no correlation with survival, whereas NR1H3 displayed a significant prognostic value in terms of both PFS and OS, confirmed also by a multivariate analysis including the Rev-IPI and COO (Suppl. Figure S5). Our PFS-based model categorized patients according to the Rev-IPI and ACTA2 expression and split only the good/very good cases into 2 additional subclasses with no PFS difference (P = 0.80, Figure 1G). Conversely, within the poor category, those at higher NR1H3 expression showed longer PFS (almost comparable to the good/very good subgroup) (Figure 1H). Consistently, in terms of OS, while ACTA2 did not show any significant stratification in good/very good cases (Figure 1I), Rev-IPI poor patients with higher expression of NR1H3 largely overlapped with good/very good patients. On the other hand, the impact of NR1H3 expression on OS was even more evident for the patients classified as at high risk based on the sole Rev-IPI (P < 0.001) (Figure 1J). Moreover, the recursive model including NR1H3 evaluated on OS (C-index [C] = 0.7, standard error [SE] = 0.04 and Akaike Information Criterion [AIC] = 259.7) outperformed the one on PFS (C = 0.61, SE = 0.04, and AIC = 310.9) as well as models including ACTA2 (PFS, C = 0.67, SE = 0.04, and AIC = 267.6 and OS, C = 0.61, SE = 0.04, and AIC = 310.9). Here, we corroborate the prognostic value of peculiar TME components in DLBCL,6,7 using clinical and transcriptomic data from large patient cohorts to build practical tree-based models of risk stratification. Such an approach may not only enable early prognostication, but also help in enriching a subset of patients who might benefit from new therapeutic opportunities. Beyond confirming the prognostic value of the Rev-IPI, we pointed out Myo and M1-Mo as key features associated with outcomes, independently of DH/TH status. In particular, we validated (i) a direct correlation between stromal cell infiltration and longer PFS in Rev-IPI favorable patients; and (ii) the capacity of M1-Mo infiltration extent to mitigate (when higher) the outcome of patients at high clinical risk and, conversely, to recognize (when lower) those at very high risk, in terms of both PFS and OS. Recent deconvolution analyses of TME revealed intriguing association of cellular ecosystems with patients’ survival, highlighting on the one hand the prognostic relevance of complex interactions between different immune/stromal elements and, on the other, a recurrent enrichment of fibroblast- and Mo-related signatures in those subsets of DLBCL at relatively favorable prognosis.13,14 Moreover, these results remain of arguable translational value due to the lack of (i) definite mechanistic relationships between TME cells and outcomes, and (ii) robust functional biomarkers to be incorporated in practical models of risk prediction. Our previous7 and present results provide novel insights on the latter point, supporting the potential usefulness of NR1H3 and ACTA2 as functional prognostic biomarkers related to specific TME components, namely M1-Mo and myofibroblasts, respectively. Our unsupervised model supports the capacity of NR1H3 to identify more favorable cases at higher expression, while recognizing patients at very high risk and lower expression. We envision a potential relevance of such results in guiding future strategies of Mo-targeted immunomodulation,10 although they still require a comprehensive mechanistic explanation that takes into account the impact of different immune and/or stromal elements on tumor behavior. Moreover, despite the current lacking of an independent real-world validation, we believe that the broad availability of platforms for the routine measurement of target genes (ie, NanoString Technology) could add translational value to our results in the future, thus prompting the generation of new rationales for TME-oriented therapeutics. On the other hand, our findings confirm the Rev-IPI as the main prognosticator in DLBCL. However, as compared with previous studies based on canonical regression or machine-learning models,15 we highlight here the practical advantage of a tree-based approach in improving the accuracy of risk stratification by easily combining well-known and novel prognostic determinants. In conclusion, we provided evidence that the assessment of stromal and Mo-related features of DLBCL may be of practical help for identifying patients at higher risk, independently of classical prognosticators. Future studies are needed to (i) mechanistically clarify the prognostic significance of TME-related features; (ii) explore their biological meaning, and (iii) integrate them with tumor-related features for a practical improvement of prognostication and, hopefully, therapeutic prediction in DLBCL. ACKNOWLEDGMENTS The authors thank the Scientific Director, Prof. Massimo Tommasino, and the clinicians of the Hematology Unit at the IRCCS Istituto Tumori “Giovanni Paolo II.” GMZ would thank the colleagues Eng. Giuseppe Carella, Eng. Vito Angiulli, and Eng. G. Salomone from the Technology Transfer Office. AUTHOR CONTRIBUTIONS GMZ, MCV, and SC had the idea and designed the manuscript. GMZ, MCV, GM, MS analyzed the data. GMZ, MCV, GV, and SC wrote the manuscript. GV, NA, GG, ASP, AB, FE, GO, FC, AN, PM, MCDC, VAB, and AG reviewed the results, suggested modifications, and approved the final publication. DATA AVAILABILITY Data set information can be found in the supplementary digital content. DISCLOSURES The authors have no conflicts of interest to disclose. SOURCES OF FUNDING Italian Ministry of Health - “Ricerca Corrente 2023.”

Proceedings ArticleDOI
24 Feb 2023
TL;DR: In this article , a financial modeling system based on decision tree analysis method can reduce the difficulty of financial management and improve the actual quality of management work, which can help financial managers through financial data model for financial analysis, make accurate decisions after targeted management.
Abstract: Financial management is very important for enterprise operation, but the work has a certain difficulty, especially in the modern development of the difficulty of the work is still increasing, leading to the enterprise traditional financial management system is no longer applicable, need to seek new methods to support. Therefore, this paper will rely on decision tree analysis to carry out relevant research, introduced the basic concept of decision tree analysis, followed by the financial modeling system design, the design of the system can help financial managers through financial data model for financial analysis, make accurate decisions after targeted management. Through research, the financial modeling system based on decision tree analysis method can reduce the difficulty of financial management and improve the actual quality of management work.

Posted ContentDOI
11 Jun 2023
TL;DR: In this article , the authors train a low-depth tree with the objective of minimising the maximum misclassification error across each leaf node, and then suspend further tree-based models (e.g., trees of unlimited depth) from each leaf of the low depth tree.
Abstract: In classification and forecasting with tabular data, one often utilizes tree-based models. This can be competitive with deep neural networks on tabular data [cf. Grinsztajn et al., NeurIPS 2022, arXiv:2207.08815] and, under some conditions, explainable. The explainability depends on the depth of the tree and the accuracy in each leaf of the tree. Here, we train a low-depth tree with the objective of minimising the maximum misclassification error across each leaf node, and then ``suspend'' further tree-based models (e.g., trees of unlimited depth) from each leaf of the low-depth tree. The low-depth tree is easily explainable, while the overall statistical performance of the combined low-depth and suspended tree-based models improves upon decision trees of unlimited depth trained using classical methods (e.g., CART) and is comparable to state-of-the-art methods (e.g., well-tuned XGBoost).

Journal ArticleDOI
TL;DR: Wang et al. as discussed by the authors explored the important factors of online service satisfaction and the best predicting model and showed that random tree model with one-versus-rest mode has the greatest accuracy among the models which offers telecommunications companies insight.
Abstract: Our lives cannot do without the internet. How to improve the network quality has always been an essential problem. The paper explores the important factors of online service satisfaction and the best predicting model. Based on the data offered by Beijing Mobile Company, we identify main factors affecting online service satisfaction by calculating their mutual information values. The factors include signal problem factors, scene factors and software usage factors. Additionally, based on decision tree model and models with decision tree as base learner, we predict the online service satisfaction. The result shows that random tree model with One Vs Rest mode has the greatest accuracy among the models which offers telecommunications companies insight.

Journal ArticleDOI
TL;DR: In this paper , the authors used Decision-Stump model for liver disease diagnosis in the North East of Andhra Pradesh, India, registered at the University of California in 2012.
Abstract: Background: Liver cancer is the third most common cause of cancer mortality. Artificial intelligence, as a diagnostic tool, can reduce physicians’ working load. However, the main fear is that due to the existence of many causes and factors, liver diseases are not easily diagnosed. This study analyzes liver disease intelligently. Various decision tree models were used in this research. Methods: The records of 583 patients in the North East of Andhra Pradesh, India, registered at the University of California in 2012, were collected. Decision tree models were compared by three measures of sensitivity, accuracy, and area under the ROC curve. Results: In this study, Decision-Stump showed better results than other models. Accuracy, sensitivity, and ROC curve of Decision-Stump were 71.3058, 1, and 0.646, respectively. Conclusion: The superior model with the highest precision is the Decision-Stump model. Therefore, the Decision-Stump model is recommended for liver disease diagnosis. This paper is invaluable for the allocation of health resources for risky people.

Journal ArticleDOI
TL;DR: Wang et al. as discussed by the authors implemented a predictive model of stroke based on decision tree is implemented to predict the stroke probability of ten samples by using Python language and showed that older people with high blood pressure, heart disease, habitual smoking are more possible to have stroke, with a prediction accuracy of 88% for decision tree method and 79% for Naive Bayes model, respectively.
Abstract: In this paper, the predictive model of stroke based on decision tree is implemented to predict the stroke probability of ten samples by using Python language. The dataset of stroke is collected and is preprocessed, then the Gini coefficients of each feature are calculated to select the division, and then the decision tree model is obtained. Finally, the stroke probability is predicted for ten samples. In addition, Naive Bayes model is applied to predict the stroke probability to compare with the decision tree method. The experimental results show that older people with high blood pressure, heart disease, habitual smoking are more possible to have stroke, with a prediction accuracy of 88% for decision tree method and 79% for Naive Bayes model, respectively.

Proceedings ArticleDOI
29 Apr 2023
TL;DR: In this paper , a decision tree classification model was used to identify water samples requiring further analysis in order to streamline the work of laboratory technologists, and water samples were classified into clean and contaminated categories using the decision tree algorithm.
Abstract: Data on water quality in Kenya is analyzed using a decision tree classification model. Using data mining techniques based on parameters related to water quality, the decision tree algorithm helps predict clean water. A predictive model was developed to identify water samples requiring further analysis in order to streamline the work of laboratory technologists. WEKA software was used to implement the model based on secondary data collected from the Kenya Water Institute. Water samples were classified into clean and contaminated categories using the decision tree algorithm. A crucial factor for evaluating water quality is its alkalinity and conductivity. Public health and safety depend on access to clean drinking water. Researchers used five decision tree classifiers to evaluate the model’s accuracy: J48, LMT, Random Forest, Hoeffding Tree, and Decision Stump

Journal ArticleDOI
TL;DR: Zhang et al. as mentioned in this paper proposed a recommendation system that, given a set of tasks and a convolutional neural network-based backbone model, automatically suggests tree-structured multitask architectures that could achieve a high task performance while meeting a user-specified computation budget without performing model training.
Abstract: Neural networks with branched architectures, namely, tree-structured models, have been employed to jointly tackle multiple vision tasks in the context of multitask learning (MTL). Such tree-structured networks typically start with a number of shared layers, after which different tasks branch out into their own sequence of layers. Hence, the major challenge is to determine where to branch out for each task given a backbone model to optimize for both task accuracy and computation efficiency. To address the challenge, this article proposes a recommendation system that, given a set of tasks and a convolutional neural network-based backbone model, automatically suggests tree-structured multitask architectures that could achieve a high task performance while meeting a user-specified computation budget without performing model training. Extensive evaluations on popular MTL benchmarks show that the recommended architectures could achieve competitive task accuracy and computation efficiency compared with state-of-the-art MTL methods. Our tree-structured multitask model recommender is open-sourced and available at https://github.com/zhanglijun95/TreeMTL.