Machine learning-aided engineering services' cost overruns prediction in high-rise residential building projects: Application of random forest regression
01 Jun 2022 - Journal of Building Engineering - Vol. 50, pp 104102
TL;DR: In this article, the authors propose a robust random forest (RF) regression model to predict engineering services' cost overruns (ESCOs) from both project-related and organizational variables. Compared with support vector regression (SVR) and multiple linear regression (MLR) baselines, the RF model performs better, with an R2 value of 0.8680 and a mean absolute error (MAE) of 3.88.
Abstract: Current approaches to automating cost estimation mainly focus on construction costs. Yet the two main services provided by design firms, namely ‘designing the project’ and ‘supervision of construction operations’, together labelled engineering services, can significantly affect the total cost of construction projects despite their comparatively low cost, as they can engender rework, changes and disputes among project participants during the subsequent stages of the project. Continuous evaluation of engineering services' cost overruns (ESCOs) is therefore essential to prevent consequential problems later in the project's development and use. Consequently, this research proposes a robust random forest (RF) regression model to predict ESCOs considering both project-related and organizational variables. A database of 95 high-rise residential building projects designed during the past eight years in Iran, along with 12 related variables, was collected to develop and validate the model. The results were compared with those of support vector regression (SVR) and multiple linear regression (MLR): with an R2 value of 0.8680 and a mean absolute error (MAE) of 3.88, the RF regression model performs better than both baselines. This research makes two main contributions to the existing body of knowledge. From a practical point of view, it provides an efficient tool that enables design firms to screen and prioritize their projects from the cost overrun standpoint and to devise contingency plans for them. From a theoretical point of view, it reveals that, to mitigate ESCOs, three key factors should be given thorough consideration: ‘the level of computer-aided design technologies adoption’, ‘level of communication among the project team’ and ‘scope definition adequacy’. Cumulatively, these three factors contribute 52.35% of the ESCO variation.
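The two scores quoted in this abstract, R2 and MAE, can be computed as in the minimal sketch below. The function names and the sample values are illustrative assumptions, not the study's code or data.

```python
from statistics import mean

def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    avg = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - avg) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def mean_absolute_error(actual, predicted):
    """Average absolute gap between observed and predicted values."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))

# Hypothetical cost overrun percentages, NOT data from the study.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]

print(r2_score(actual, predicted))
print(mean_absolute_error(actual, predicted))
```

A higher R2 and a lower MAE on held-out projects are what the comparison between RF, SVR and MLR rests on.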
TL;DR: In this article, a relationship between mix design parameters, density, and compressive strength was developed for sustainable lightweight foamed concrete (LWFC) using gene expression programming. The results showed that 95% of the predicted results had error values less than 2% for the density model and 91% had error values less than 5 MPa for the strength model.
Abstract: Foamed concrete is a versatile material that can be used in different construction applications and, with proper mix design, can also be used as a structural member. The production of sustainable lightweight foamed concrete (LWFC) requires a proper mix design relation to achieve the desired physical and mechanical properties. A few standards, such as ACI 211.2, give mix design procedures, but the provided methods cannot be applied to all forms of lightweight concrete, particularly LWFC. This study aimed to develop a relationship between the mix design parameters, density, and compressive strength using gene expression programming (GEP). To develop the models, an extensive database of 191 data points was collected from the published literature, comprising cement content, sand content, water to cement ratio, and foam volume as input parameters, and the dry density and 28-day compressive strength as output parameters. The developed models were evaluated using the coefficient of determination (R2) along with the root mean square error, mean absolute error, root square error, objective function and performance index. A strong correlation (R2 = 0.95) was obtained for both the density and strength models, along with the lowest statistical errors. The results show that 95% of the predicted results have error values less than 2% for the density model and 91% of the predicted results have error values less than 5 MPa for the strength model. The validity of the models was further verified by an experimental investigation, in which satisfactory R2 values of 0.79 and 0.94 were observed for the density and compressive strength models, respectively. In addition, sensitivity and parametric analyses were performed to analyze the influence of individual input parameters on the output. The developed GEP models are expressed in the form of empirical relations that can be used for the practical application of foamed concrete in different construction applications.
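The "share of predictions within an error bound" figures quoted above (relative for density, absolute for strength) can be computed along the following lines. This is only a sketch: the helper names and all values are made up for illustration and are not taken from the paper's database.

```python
def share_within_relative(actual, predicted, tol):
    """Fraction of predictions whose relative error is below tol (0.02 = 2%)."""
    hits = sum(abs(p - a) / abs(a) < tol for a, p in zip(actual, predicted))
    return hits / len(actual)

def share_within_absolute(actual, predicted, tol):
    """Fraction of predictions whose absolute error is below tol."""
    hits = sum(abs(p - a) < tol for a, p in zip(actual, predicted))
    return hits / len(actual)

# Hypothetical dry densities (kg/m3) and 28-day strengths (MPa).
density_actual = [1000.0, 1200.0, 1400.0, 1600.0]
density_predicted = [1010.0, 1230.0, 1390.0, 1600.0]
strength_actual = [10.0, 20.0, 30.0, 40.0]
strength_predicted = [12.0, 26.0, 29.0, 44.0]

density_share = share_within_relative(density_actual, density_predicted, 0.02)
strength_share = share_within_absolute(strength_actual, strength_predicted, 5.0)
print(density_share, strength_share)
```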
TL;DR: This study compared the BN classifier model’s performance accuracy to that of the Naive Bayes (NB) and decision tree (DT) models to determine the effect of considering possible correlations between cost overrun risks on prediction accuracy.
Abstract: Cost overrun risks are declared to be dynamic and interdependent. Ignoring the relationship between cost overrun risks during the risk assessment process is one of the primary reasons construction projects go over budget. Nevertheless, recent studies have failed to account for potential interrelationships between risk factors in their machine learning (ML) models. Additionally, the presented ML models are not interpretable. Thus, this study contributes to the entire ML process by using a Bayesian network (BN) classifier model that considers the possible interactions between predictors, which are cost overrun risks, to predict cost overrun and assess cost overrun risks. Furthermore, this study compared the BN classifier model’s performance accuracy to that of the Naive Bayes (NB) and decision tree (DT) models to determine the effect of considering possible correlations between cost overrun risks on prediction accuracy. Moreover, the most critical risks and their relationships are identified by interpreting the learned BN model. The results indicated that the 18 BN models demonstrated an average prediction accuracy of 78.86%, significantly higher than that of the NB and DT models. The present study identified the most significant risks as an increase in the cost of materials, lack of knowledge and experience among human resources, and inflation.
TL;DR: In this article, a deep learning neural network (DLNN) was used to predict preliminary factory construction cost; it outperformed all other machine learning algorithms in the comparison, while the ensemble model of artificial neural networks and generalized linear regression also fared well.
Abstract: Construction of industrial enterprises has become more necessary in recent years. It is critical for project managers to estimate the entire cost of a building project at an early stage. Existing approaches rely on operator experience expressed as a mathematical formula. Initial estimates are inaccurate due to the lack of available data points, which leads to overruns in project costs. This research utilizes different machine learning techniques to predict preliminary factory construction cost. Five popular numeric predictive techniques, namely support vector machine (SVM), artificial neural network (ANN), generalized linear regression (GENLIN), classification and regression-based techniques (CART), and exhaustive chi-squared automatic interaction detection (CHAID), are used for baseline and ensemble models. A deep learning neural network (DLNN) is also utilized in this study. The machine learning models are trained and tested on actual data gathered in the southern part of Vietnam. Deep learning outperforms all other machine learning algorithms in this comparison, while the ensemble model of artificial neural networks and generalized linear regression also fares well. With access to a variety of estimation methodologies, cost estimators can quickly pick the best model for projecting the preliminary cost of constructing a factory.
10 Oct 2022
TL;DR: In this paper, the authors examined the risk factors associated with construction projects in wetland ecosystems, using Ghana as a case study, and found that the key driving forces spurring construction projects were rapid urbanisation, a high rate of migration, and scarcity of land for development.
Abstract: Within some developing countries, wetlands have long been regarded as dumping grounds. However, there has been a gradual paradigm shift in societal attitudes towards recognising their importance as landscape features that provide benefits for humans and wildlife. Unfortunately, construction activities have continually plagued their existence. Therefore, this research examines the risk factors associated with construction projects in wetland ecosystems, using Ghana as a case study. A quantitative research strategy leaning towards the positivist paradigm was adopted. A structured survey was developed to collect data from key stakeholders. Purposive sampling was employed to recruit construction experts in the Kumasi Metropolis, with a total of 78 experts agreeing to participate. Relative Importance Index (RII), Mean Score Ranking and a One-Sample T-Test were used to analyse the primary data collected. The findings revealed that the key driving forces spurring construction projects in wetlands were rapid urbanisation; a high rate of migration; and scarcity of land for development. Critical risk factors associated with construction projects in wetlands were identified as cost overruns; exploitation of biological resources; and water pollution. Finally, the findings also showed that the most critical detrimental effects of construction projects in wetlands were the destruction of aquatic and terrestrial life; loss of flood control capability; and deterioration of wetland water quality. The study recommends the protection and conservation of wetland environments through systematized enactment and enforcement of environmental protection regulations by government and non-governmental institutions. This would ensure the preservation of important biodiversity and aid pollution control and flood protection.
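The abstract does not spell out how the Relative Importance Index is calculated. A commonly used definition is RII = sum of ratings / (highest possible rating x number of respondents), and the sketch below assumes that form, a 5-point scale, and invented survey ratings.

```python
def relative_importance_index(ratings, highest_rating):
    """RII = sum(ratings) / (highest_rating * number_of_respondents)."""
    return sum(ratings) / (highest_rating * len(ratings))

# Hypothetical ratings from five respondents on a 1-5 scale, NOT survey data.
survey = {
    "cost overruns": [5, 4, 4, 3, 5],
    "exploitation of biological resources": [4, 4, 4, 4, 4],
    "water pollution": [4, 3, 4, 3, 4],
}

rii = {factor: relative_importance_index(r, 5) for factor, r in survey.items()}
# Rank factors from most to least important.
ranking = sorted(rii, key=rii.get, reverse=True)
print(ranking)
```

RII values fall between 0 and 1, so factors rated on the same scale can be ranked directly.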
TL;DR: An evaluation of machine learning models for modelling cyanobacteria (blue-green algae) at two rivers in the USA shows that the RFR model achieved good predictive accuracy, and that the ANN and RFR models were more accurate than the ELM and RVFL models, exhibiting high numerical performance.
TL;DR: A new, conditional permutation scheme is developed for the computation of the variable importance measure that reflects the true impact of each predictor variable more reliably than the original marginal approach.
Abstract: Random forests are becoming increasingly popular in many scientific fields because they can cope with "small n large p" problems, complex interactions and even highly correlated predictor variables. Their variable importance measures have recently been suggested as screening tools for, e.g., gene expression studies. However, these variable importance measures show a bias towards correlated predictor variables. We identify two mechanisms responsible for this finding: (i) A preference for the selection of correlated predictors in the tree building process and (ii) an additional advantage for correlated predictor variables induced by the unconditional permutation scheme that is employed in the computation of the variable importance measure. Based on these considerations we develop a new, conditional permutation scheme for the computation of the variable importance measure. The resulting conditional variable importance reflects the true impact of each predictor variable more reliably than the original marginal approach.
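The difference between the unconditional (marginal) and a conditional permutation scheme can be illustrated with a small sketch. It is not the paper's algorithm: the fixed toy model, the `strata` argument and the data are assumptions, and the abstract does not describe how the actual conditioning groups are formed.

```python
import random

def mse(model, rows, targets):
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(model, rows, targets, col, strata=None, seed=0):
    """Increase in MSE after shuffling column `col`.

    strata=None shuffles across all rows (marginal scheme). Passing one
    label per row shuffles only among rows sharing a label, a rough
    stand-in for conditioning on correlated predictors.
    """
    rng = random.Random(seed)
    labels = strata if strata is not None else [0] * len(rows)
    groups = {}
    for i, label in enumerate(labels):
        groups.setdefault(label, []).append(i)
    shuffled = [list(r) for r in rows]
    for idx in groups.values():
        values = [rows[i][col] for i in idx]
        rng.shuffle(values)
        for i, v in zip(idx, values):
            shuffled[i][col] = v
    return mse(model, shuffled, targets) - mse(model, rows, targets)

# Toy data: column 1 is an exact copy of column 0; the model uses column 0 only.
rows = [[float(i), float(i)] for i in range(1, 9)]
targets = [3.0 * r[0] for r in rows]
model = lambda r: 3.0 * r[0]

marginal_x0 = permutation_importance(model, rows, targets, col=0)
marginal_x1 = permutation_importance(model, rows, targets, col=1)
# Conditioning on the copy leaves nothing to shuffle within each stratum.
conditional_x0 = permutation_importance(
    model, rows, targets, col=0, strata=[r[1] for r in rows]
)
print(marginal_x0, marginal_x1, conditional_x0)
```

In this toy case the conditional importance of column 0 is zero because its correlated copy already carries the same information, which is the kind of effect a conditional scheme is meant to expose.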
TL;DR: In this article, the authors examined the limited research on the issue and developed a set of criteria appropriate to all Information Systems/Information Technology (IS/IT) projects, using research as illustrations.
TL;DR: This paper discussed similarities and distinctions among these components and their relation to other sections of a manuscript such as the problem statement, discussion, and implications, and concluded with an overview of the literature review, theoretical framework, and conceptual framework as separate types of manuscripts.
Abstract: This essay starts with a discussion of the literature review, theoretical framework, and conceptual framework as components of a manuscript. This discussion includes similarities and distinctions among these components and their relation to other sections of a manuscript such as the problem statement, discussion, and implications. The essay concludes with an overview of the literature review, theoretical framework, and conceptual framework as separate types of manuscripts. Understanding similarities and differences among the literature review, theoretical framework, and conceptual framework can help novice and experienced researchers in organizing, conceptualizing, and conducting their research, whether qualitative, quantitative, or mixed-methods.
TL;DR: In this paper, the authors developed linear regression models to predict the construction cost of buildings, based on 286 sets of data collected in the United Kingdom. The best regression model was the log of cost backward model, which gave an R2 of 0.661 and a mean absolute percentage error (MAPE) of 19.3%.
Abstract: This paper describes the development of linear regression models to predict the construction cost of buildings, based on 286 sets of data collected in the United Kingdom. Raw cost is rejected as a suitable dependent variable and models are developed for cost/m2, log of cost, and log of cost/m2. Both forward and backward stepwise analyses were performed, giving a total of six models. Forty-one potential independent variables were identified. Five variables appeared in each of the six models: gross internal floor area (GIFA), function, duration, mechanical installations, and piling, suggesting that they are the key linear cost drivers in the data. The best regression model is the log of cost backward model, which gives an R2 of 0.661 and a mean absolute percentage error (MAPE) of 19.3%; these results compare favorably with past research, which has shown that traditional methods of cost estimation have values of MAPE typically in the order of 25%.
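When the dependent variable is the log of cost, predictions have to be transformed back to cost before a MAPE is computed. The sketch below shows that step under stated assumptions: a natural logarithm (the abstract does not give the base) and invented cost figures.

```python
import math

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)

# Hypothetical costs and log-scale model outputs, NOT the UK data set.
actual_cost = [100.0, 200.0, 400.0]
predicted_log_cost = [math.log(110.0), math.log(180.0), math.log(400.0)]

# Back-transform from log of cost to cost before scoring.
predicted_cost = [math.exp(v) for v in predicted_log_cost]
error_pct = mape(actual_cost, predicted_cost)
print(round(error_pct, 2))
```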