# Data Management and Data Systems | TM351

## Landslide Conditioning Factor Selection

When landslide susceptibility is evaluated with machine learning models, the results depend heavily on the choice and quality of the input features [23]. A selected feature may or may not have a significant impact on landslide occurrence, and noisy features, also called low-quality features, may even reduce the predictive capability of the trained model. To achieve higher accuracy, features with a high impact on landslide occurrence need to be identified, and features with little or no impact need to be removed from the training set. In this research, relative feature importance based on information gain was used to quantify the predictive capability of the conditioning factors. This method identifies the most important factors and can improve the overall classification accuracy of the model.

Feature importance provides insight into the impact of each feature on the trained model (Figs. 6, 7). The random forest and gradient boosting algorithms calculate the information gain of each feature, which serves as the basis for the relative importance scores. A cut-off score was then determined from these scores to filter out the less impactful features.
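
The importance-based filtering described above can be sketched as follows. This is a minimal illustration with scikit-learn, not the authors' pipeline: the synthetic data, the `factor_i` names, and the `CUTOFF` value of 0.05 are all assumptions made for the example, since the source does not state the cut-off that was used.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the conditioning factors (hypothetical data).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=4, n_redundant=0, random_state=0
)
feature_names = [f"factor_{i}" for i in range(X.shape[1])]  # hypothetical names

# criterion="entropy" makes each split's impurity decrease an information gain.
forest = RandomForestClassifier(
    n_estimators=200, criterion="entropy", random_state=0
)
forest.fit(X, y)

# Relative importance scores, normalized by scikit-learn to sum to 1.
importances = forest.feature_importances_

CUTOFF = 0.05  # assumed threshold, chosen only for illustration
keep = importances >= CUTOFF
selected = [name for name, kept in zip(feature_names, keep) if kept]
X_selected = X[:, keep]  # reduced feature matrix used for retraining

for name, score in sorted(zip(feature_names, importances), key=lambda p: -p[1]):
    print(f"{name}: {score:.3f}")
```

In practice the cut-off would be chosen by inspecting the sorted scores, for example where they drop off sharply, rather than fixed in advance.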

## Landslide Susceptibility Modeling

In this research, the following three ensemble models were tested on their ability to predict landslide susceptibility. Ensemble learning improves the result of a machine learning problem by combining the predictions of multiple models, built by either bagging or boosting, and deriving the final result from a voting scheme. This approach generally achieves better prediction accuracy than a single model. To obtain optimal results, a grid search was also used to tune the hyperparameters of each model.
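
As a rough sketch of the grid search step, the example below tunes a random forest with scikit-learn's `GridSearchCV`. The parameter grid, the 3-fold cross-validation, and the synthetic data are assumptions for illustration; the source does not list the hyperparameters or ranges that were searched. The same pattern would apply to the other ensemble models.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the landslide inventory (hypothetical data).
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Assumed grid: every combination is evaluated by cross-validation.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid=param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X_train, y_train)

best_params = search.best_params_
# The best model is refit on the full training split before scoring.
test_accuracy = search.score(X_test, y_test)
print(best_params, round(test_accuracy, 3))
```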

A random forest is an ensemble of multiple decision trees trained with the bootstrap aggregation (bagging) algorithm. The final classification is obtained by aggregating the votes of the individual trees. It therefore offers benefits over a single model, such as a much lower tendency to overfit. The random forest algorithm also introduces extra randomness during the training phase, which increases the diversity among the trees and ultimately leads to better predictions.
The bagging algorithm repeatedly draws a random sample with replacement from the training set and fits a tree to each sample. If there are $n$ trees, the $i$th tree is trained on a random set of training data $\left(X_{i}\right)$ and targets $\left(Y_{i}\right)$ and is denoted as the $i$th classifier $\left(f_{i}\right)$. Once training is complete, predictions on unseen data are made by taking the majority vote of the individual trees $\left(f_{i}\right)$. This can lead to higher accuracy by reducing the variance of the classifier. Compared to a single model, the ensemble is more robust to noise because the individual trees in the forest are not strongly correlated with one another.
The random forest algorithm also introduces feature bagging, so that each tree uses only a subset of the features. Without it, any feature or set of features with a relatively large influence on the target would presumably be selected in most of the trees, which would make the trees correlated. Besides serving as a prediction tool, a random forest model can also calculate the relative importance of the features by monitoring how effectively each feature reduces impurity, which makes it very convenient to extract the most important features from a large feature set.
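
The bootstrap-and-vote procedure described above can be written out by hand to make the roles of $X_i$, $Y_i$ and $f_i$ concrete. The sketch below is a simplified illustration rather than a full random forest implementation: it uses scikit-learn decision trees with `max_features="sqrt"` to restrict the features considered at each split, a hard majority vote over binary labels, and assumed values for the data and the number of trees.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data standing in for landslide / non-landslide samples.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

rng = np.random.default_rng(0)
n_trees = 25  # assumed; odd so a binary majority vote cannot tie
trees = []
for i in range(n_trees):
    # Bootstrap sample (X_i, Y_i): drawn with replacement from the training set.
    idx = rng.integers(0, len(X_train), size=len(X_train))
    # Classifier f_i; max_features="sqrt" limits the features tried per split.
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
    tree.fit(X_train[idx], y_train[idx])
    trees.append(tree)

# Each row holds one tree's 0/1 predictions on the unseen data.
votes = np.stack([tree.predict(X_test) for tree in trees])
# Majority vote: class 1 wins when more than half of the trees predict it.
y_pred = (votes.mean(axis=0) > 0.5).astype(int)
ensemble_accuracy = (y_pred == y_test).mean()
print(f"ensemble accuracy: {ensemble_accuracy:.3f}")
```

Comparing `ensemble_accuracy` with the accuracy of any single tree in `trees` is a quick way to see the variance reduction that bagging is meant to provide.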


