Topic

Random forest

About: Random forest is a research topic. Over its lifetime, 13,345 publications have been published within this topic, receiving 345,395 citations. The topic is also known as random forests and randomized trees.

Papers

Journal Article•DOI•

Prediction of the Ibuprofen Loading Capacity of MOFs by Machine Learning

Xujie Liu, Yan Wang, Jiongpeng Yuan, Xiaojing Li, Siwei Wu, Yijun Bao, Zhenzhen Feng, Feilong Ou, Yan He

30 Sep 2022-Bioengineering

TL;DR: Initial results indicate that machine-learning screening of MOFs for high drug loading capacity is a well-generalized, straightforward, and cost-effective method that can be applied not only to the prediction of IBU loading capacity, but also to many other biomaterials projects.

Abstract: Metal-organic frameworks (MOFs) have been widely researched as drug delivery systems due to their intrinsic porous structures. Herein, machine learning (ML) technologies were applied for the screening of MOFs with high drug loading capacity. To achieve this, first, a comprehensive dataset was gathered, including 40 data points from more than 100 different publications. The organic linkers, metal ions, and the functional groups, as well as the surface area and the pore volume of the investigated MOFs, were chosen as the model’s inputs, and the output was the ibuprofen (IBU) loading capacity. Thereafter, various machine learning algorithms, such as support vector regression (SVR), random forest (RF), adaptive boosting (AdaBoost), and categorical boosting (CatBoost), were employed to predict the ibuprofen loading capacity of MOFs. Coefficients of determination (R²) of 0.70, 0.72, 0.66, and 0.76 were obtained for the SVR, RF, AdaBoost, and CatBoost approaches, respectively. Among all the algorithms, CatBoost was the most reliable, exhibiting superior performance regarding the sparse matrices and categorical features. Shapley additive explanations (SHAP) analysis was employed to explore the impact of the eigenvalues of the model’s outputs. Our initial results indicate that this methodology is a well-generalized, straightforward, and cost-effective method that can be applied not only to the prediction of IBU loading capacity, but also to many other biomaterials projects.
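
The model comparison described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the data are synthetic (`make_regression`), the hyperparameters and the helper `compare_r2` are assumptions, and CatBoost is left out because it is a separate library.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the MOF descriptors (NOT the paper's dataset).
X, y = make_regression(n_samples=200, n_features=8, n_informative=5,
                       noise=10.0, random_state=0)

# Hypothetical hyperparameters, chosen only for illustration.
models = {
    "SVR": make_pipeline(StandardScaler(), SVR(kernel="rbf", C=100.0)),
    "RF": RandomForestRegressor(n_estimators=200, random_state=0),
    "AdaBoost": AdaBoostRegressor(n_estimators=200, random_state=0),
}


def compare_r2(models, X, y, cv=5):
    """Return the mean cross-validated R^2 for each named model."""
    return {
        name: float(cross_val_score(model, X, y, cv=cv, scoring="r2").mean())
        for name, model in models.items()
    }


scores = compare_r2(models, X, y)
for name, r2 in scores.items():
    print(f"{name}: mean R^2 = {r2:.2f}")
```

Scoring every model with the same cross-validation folds keeps the R² values comparable, which matters on a small dataset like the one described.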

4 citations

Journal Article•DOI•

What to expect from dynamical modelling of cluster haloes II. Investigating dynamical state indicators with Random Forest

Qingyang Li, Jiaxin Han, Wenting Wang, Wei Cui, F. De Luca, Xiaohu Yang, Ya-Nan Zhou, Rui Shi

29 Mar 2022-Monthly Notices of the Royal Astronomical Society

TL;DR: This paper investigates the importance of various dynamical features in predicting the dynamical state (ds) of galaxy clusters, based on the Random Forest (RF) machine learning approach.

Abstract: We investigate the importances of various dynamical features in predicting the dynamical state (ds) of galaxy clusters, based on the Random Forest (RF) machine learning approach. We use a large sample of galaxy clusters from the Three Hundred Project of hydrodynamical zoomed-in simulations, and construct dynamical features from the raw data as well as from the corresponding mock maps in the optical, X-ray, and Sunyaev-Zel’dovich (SZ) channels. Instead of relying on the impurity-based feature importance of the RF algorithm, we directly use the out-of-bag (oob) scores to evaluate the importances of individual features and different feature combinations. Among all the features studied, we find the virial ratio, η, to be the most important single feature. The features calculated directly from the simulations and in 3 dimensions carry more information on the ds than those constructed from the mock maps. Compared with the features based on X-ray or SZ maps, features related to the centroid positions are more important. Despite the large number of investigated features, a combination of up to three features of different types can already saturate the score of the prediction. Lastly, we show that the most sensitive feature η is strongly correlated with the well-known half-mass bias in dynamical modelling. Without a selection in ds, cluster haloes have an asymmetric distribution in η, corresponding to an overall positive half-mass bias. Our work provides a quantitative reference for selecting the best features to discriminate the ds of galaxy clusters in both simulations and observations.
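
One way to read the out-of-bag approach in this abstract is to refit a forest on each candidate feature subset and compare the resulting oob scores. The sketch below assumes that reading; the synthetic data, the feature names `f0`..`f5`, and the helper `oob_score_for` are hypothetical and do not come from the paper.

```python
from itertools import combinations

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data; shuffle=False keeps the 3 informative features
# in the first 3 columns, the remaining columns are noise.
X, y = make_classification(n_samples=600, n_features=6, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=0)
names = [f"f{i}" for i in range(X.shape[1])]  # hypothetical feature names


def oob_score_for(columns):
    """Fit a forest on the given columns only and return its oob accuracy."""
    rf = RandomForestClassifier(n_estimators=100, oob_score=True,
                                bootstrap=True, random_state=0)
    rf.fit(X[:, list(columns)], y)
    return float(rf.oob_score_)


# Score every single feature and every pair of features.
single = {names[i]: oob_score_for([i]) for i in range(X.shape[1])}
pairs = {
    (names[i], names[j]): oob_score_for([i, j])
    for i, j in combinations(range(X.shape[1]), 2)
}

best_single = max(single, key=single.get)
best_pair = max(pairs, key=pairs.get)
print("best single feature:", best_single, round(single[best_single], 3))
print("best feature pair:", best_pair, round(pairs[best_pair], 3))
```

Because each tree is scored on the samples left out of its bootstrap draw, the oob score acts as a built-in validation estimate, so no separate hold-out set is needed for this comparison.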

4 citations

Journal Article•DOI•

Modeling of Feature Selection Based on Random Forest Algorithm and Pearson Correlation Coefficient

Kai Mei, Mei-Ming Tan, Zhihui Yang, Shaoyue Shi

01 Apr 2022-Journal of Physics: Conference Series

TL;DR: A feature selection model is established to select the 20 molecular descriptors of compounds with the most significant influence on biological activity; parameters such as MlogP, XlogP and TopoPSA were found to have a prominent effect on the biological activity.

Abstract: This paper establishes a feature selection model to select the 20 molecular descriptors of compounds with the most significant influence on biological activity. The random forest algorithm was used to calculate the correlation between molecular descriptors and pIC50 values of biological activity. In this way, the top 26 molecular descriptors with high correlation were screened out. The Pearson correlation coefficient was then used to analyze these 26 molecular descriptors and eliminate variables that were highly correlated with other independent variables. A literature review showed that parameters such as MlogP, XlogP and TopoPSA among the selected molecular descriptors have a prominent effect on the biological activity, indicating that the screening methods and results for the 20 molecular descriptors were reasonable.
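
The two-stage selection described in this abstract (rank with a random forest, then prune with Pearson correlation) might look like the sketch below. It is illustrative only: it assumes the ranking uses scikit-learn's impurity-based `feature_importances_`, the thresholds `top_k` and `max_abs_corr` are hypothetical, and the data are synthetic rather than molecular descriptors.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor


def select_features(X, y, top_k=8, max_abs_corr=0.9, random_state=0):
    """Rank features by RF importance, then drop highly correlated ones."""
    rf = RandomForestRegressor(n_estimators=200, random_state=random_state)
    rf.fit(X, y)
    # Stage 1: keep the top_k features by impurity-based importance.
    ranked = np.argsort(rf.feature_importances_)[::-1][:top_k]
    # Stage 2: walk down the ranking and skip any feature whose Pearson
    # correlation with an already kept feature is too high.
    corr = np.corrcoef(X, rowvar=False)
    kept = []
    for j in ranked:
        if all(abs(corr[j, k]) < max_abs_corr for k in kept):
            kept.append(int(j))
    return kept


# Synthetic stand-in data; shuffle=False puts informative features first.
X, y = make_regression(n_samples=300, n_features=10, n_informative=5,
                       noise=5.0, shuffle=False, random_state=0)
rng = np.random.default_rng(0)
# Column 10 is a near duplicate of column 0, so at most one should survive.
X = np.column_stack([X, X[:, 0] + 0.01 * rng.normal(size=X.shape[0])])

kept = select_features(X, y)
print("selected feature indices:", kept)
```

Pruning in importance order means that when two descriptors are nearly collinear, the one the forest ranked higher is the one retained.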

4 citations

Proceedings Article•DOI•

Classification Application Based on Mutual Information and Random Forest Method for High Dimensional Data

Qingqing Kong, Hui-Li Gong¹, Xiangqian Ding¹, Ruichun Hou¹

Ocean University of China¹

01 Aug 2017

TL;DR: Experimental results demonstrate that the CMI-RF method can select a feature subset with stronger correlation, no redundancy, and high classification accuracy.

Abstract: Random Forest (RF) has been widely used in the classification of high dimensional data. However, using all the features of high dimensional data for classification increases the computation time and reduces the classification accuracy. Therefore, feature selection is critical to high dimensional data classification. To address this problem, this paper presents a method that combines Conditional Mutual Information (CMI) and Random Forest (CMI-RF). CMI is used to remove irrelevant and redundant information. The optimal subset of features with higher classification accuracy is then obtained by RF. In this paper, high dimensional near infrared spectral data is taken as the experimental data. The experimental results demonstrate that the CMI-RF method can select a feature subset with stronger correlation, no redundancy, and high classification accuracy.
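
A simplified version of the filter-then-classify idea in this abstract is sketched below. Note the substitution: it uses plain mutual information (scikit-learn's `mutual_info_classif`) rather than conditional mutual information, so it filters irrelevant features but does not remove redundant ones as CMI-RF is described to do. The data are synthetic and the helper `rf_accuracy` and the choice of 20 features are assumptions.

```python
from functools import partial

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic high dimensional stand-in: 200 features, only 10 informative.
X, y = make_classification(n_samples=300, n_features=200, n_informative=10,
                           n_redundant=0, class_sep=2.0, random_state=0)


def rf_accuracy(k):
    """Cross-validated RF accuracy after keeping the k highest-MI features."""
    selector = SelectKBest(partial(mutual_info_classif, random_state=0), k=k)
    model = make_pipeline(
        selector,
        RandomForestClassifier(n_estimators=200, random_state=0),
    )
    return float(cross_val_score(model, X, y, cv=5).mean())


acc_all = rf_accuracy("all")   # no filtering: all 200 features
acc_top20 = rf_accuracy(20)    # keep the 20 highest-MI features
print(f"all features: {acc_all:.3f}, top 20 by MI: {acc_top20:.3f}")
```

Placing the selector inside the pipeline means the mutual information scores are recomputed on each training fold, which avoids leaking test-fold labels into the feature selection.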

4 citations

Journal Article•DOI•

Mapping 30 m Fractional Forest Cover over China's Three-North Region from Landsat-8 Data Using Ensemble Machine Learning Methods

Xiaobang Liu, Shunlin Liang, Bing Li, Han Ma, Tao He

02 Jul 2021-Remote Sensing

TL;DR: The authors used multiple machine learning algorithms (MLAs) to estimate the fractional forest cover (FFC) in China's Three-North Region (TNR) using 30-m Landsat-8 data and aggregated 1-m GaoFen-2 (GF-2) satellite images.

Abstract: The accurate monitoring of forest cover and its changes is essential for environmental change research, but current satellite products for forest coverage carry many uncertainties. This study used 30-m Landsat-8 data and aggregated 1-m GaoFen-2 (GF-2) satellite images to construct the training samples, and used multiple machine learning algorithms (MLAs) to estimate the fractional forest cover (FFC) in China’s Three-North Region (TNR). In this study, multiple MLAs were merged to construct stacked generalization (SG) models, and the performances of the MLAs in the FFC estimation were evaluated. The results of the 10-fold cross-validation showed that all non-linear algorithms performed well, with an R² value greater than 0.8 and a root-mean-square error (RMSE) of less than 0.05. In the bagging ensemble, the random forest (RF) (R² = 0.993, RMSE = 0.020) model performed the best, and in the boosting ensemble, the light gradient boosted machine (LGBM) (R² = 0.992, RMSE = 0.022) performed the best. Although the evaluation index of the RF is slightly better than that of the LGBM, the independent validation results show that the two models have similar performances. The model evaluation results of the independent datasets showed that, in the SG model, the performance of the SG(LGBM) (R² = 0.991, RMSE = 0.034) was better than that of the single or non-ensemble model. Comparing the FFC estimates of our model with those of existing datasets showed that our model exhibited more forest spatial distribution details and higher accuracy in complex landscapes. Overall, in this study, the method of using high-resolution remote sensing (RS) images to extract samples for FFC estimation is feasible. Our results demonstrate the potential of the ensemble MLAs to map the FFC. The research results also show that among many MLAs, the RF algorithm is the most suitable algorithm for estimating FFC, which provides a reference for future research.
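
The evaluation setup in this abstract (bagging, boosting, and a stacked generalization model scored by 10-fold cross-validation) can be sketched as follows. This is an assumption-laden illustration: scikit-learn's `GradientBoostingRegressor` stands in for LGBM, a `RidgeCV` meta-learner is a hypothetical choice, and the data are synthetic rather than Landsat-8 samples.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (GradientBoostingRegressor,
                              RandomForestRegressor, StackingRegressor)
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_validate

# Synthetic stand-in for the reflectance features and FFC target.
X, y = make_regression(n_samples=300, n_features=10, n_informative=6,
                       noise=10.0, random_state=0)

# One bagging learner and one boosting learner as base models.
base = [
    ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
    ("gbm", GradientBoostingRegressor(random_state=0)),  # stand-in for LGBM
]
# Stacked generalization: a meta-learner fitted on the base models'
# cross-validated predictions (RidgeCV is an assumed choice).
stack = StackingRegressor(estimators=base, final_estimator=RidgeCV(), cv=5)


def evaluate(model, X, y, folds=10):
    """Return (mean R^2, mean RMSE) from k-fold cross-validation."""
    cv = cross_validate(model, X, y, cv=folds,
                        scoring=("r2", "neg_root_mean_squared_error"))
    r2 = float(cv["test_r2"].mean())
    rmse = float(-cv["test_neg_root_mean_squared_error"].mean())
    return r2, rmse


results = {name: evaluate(model, X, y) for name, model in [*base, ("stack", stack)]}
for name, (r2, rmse) in results.items():
    print(f"{name}: R^2 = {r2:.3f}, RMSE = {rmse:.3f}")
```

As the abstract notes, cross-validated scores alone can flatter one model slightly, so an independent validation set would be the natural next check.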

4 citations

Network Information

Performance

Metrics

Papers: 29,141

Citations: 532,363

No. of papers in the topic in previous years
Year	Papers
2024	1
2023	5,459
2022	10,287
2021	2,325
2020	2,251
2019	1,961
