Implementation of free and open-source semi-automatic feature engineering tool in landslide susceptibility mapping using the machine-learning algorithms RF, SVM, and XGBoost

Date

2023

Publisher

SPRINGER

Access Rights

info:eu-repo/semantics/closedAccess

Abstract

Various machine learning (ML) techniques have been recommended and used in the literature to produce landslide susceptibility maps (LSMs). Feature engineering (FE), however, is an important topic in ML studies that most research ignores. In this study, a novel FE framework, including feature selection, feature transformation, feature binning, and feature weighting, is proposed to produce LSMs using eXtreme gradient boosting (XGBoost), random forest (RF), and support vector machine (SVM). First, thirteen landslide conditioning factors from the data preprocessing stage were used to produce LSM models for the study area, the Babadag district of Denizli Province in the Aegean region of Turkey. Second, two irrelevant factors were eliminated from the input feature subset using the feature selection step of the FE framework. Third, features identified as skewed were converted into a symmetric form by applying a log transformation. Then, the remaining continuous-valued factors were converted into categorical values using the quantile classifier technique. During the feature weighting phase, four feature weighting methods, namely XGBoost, RF, non-negative least squares (NNLS), and frequency ratio (FR), were used to calculate the weight of each subclass of each landslide-related factor. In addition, the proposed feature subsets were compared with the raw data. At the end of the process, the XGBoost model constructed with the FR-selected subset (overall accuracy (Acc) = 0.907 and area under the curve (AUC) = 0.9822) outperformed both the raw data (Acc = 0.874; AUC = 0.960) and the other methods (i.e., RF-FR and SVM-NNLS). Consequently, the results indicate that the proposed FE approach could be a useful framework for increasing the performance of ML techniques by identifying and extracting relevant features to develop highly optimized and enriched models.
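The transformation, binning, and weighting steps named in the abstract can be illustrated with a minimal sketch. It is not the author's tool. The function names (`log_transform`, `quantile_bins`, `frequency_ratio`), the nearest-rank quantile rule, and the sample slope values are assumptions made for illustration. The frequency ratio uses its common definition: the share of landslide samples falling in a class divided by the share of all samples falling in that class.

```python
import math
from collections import Counter


def log_transform(values):
    """Apply log(1 + x) to reduce right skew; assumes non-negative inputs."""
    return [math.log1p(v) for v in values]


def quantile_bins(values, n_bins=4):
    """Assign each value a class index 0..n_bins-1 with roughly equal counts."""
    ordered = sorted(values)
    n = len(ordered)
    # Upper cut points taken at the k/n_bins quantiles (nearest-rank rule).
    cuts = [ordered[min(n - 1, math.ceil(n * k / n_bins) - 1)]
            for k in range(1, n_bins)]
    # The class index is the number of cut points the value exceeds.
    return [sum(v > c for c in cuts) for v in values]


def frequency_ratio(bins, labels):
    """FR per class; assumes labels are 0/1 and at least one label is 1."""
    total = len(bins)
    total_slides = sum(labels)
    class_counts = Counter(bins)
    slide_counts = Counter(b for b, y in zip(bins, labels) if y == 1)
    return {
        c: (slide_counts[c] / total_slides) / (class_counts[c] / total)
        for c in class_counts
    }


# Hypothetical slope values (degrees) and landslide labels, for illustration only.
slope = [2, 5, 8, 12, 15, 20, 30, 45]
landslide = [0, 0, 0, 1, 0, 1, 1, 1]

bins = quantile_bins(log_transform(slope), n_bins=4)
fr = frequency_ratio(bins, landslide)
```

An FR above 1 marks a class in which landslides are over-represented relative to its share of the samples, so in this toy data the steepest slope class receives the largest weight.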

Keywords

Landslide Susceptibility, Data Preparation, Feature Engineering, Feature Transformation, Feature Weighting, Machine Learning

Source

Stochastic Environmental Research and Risk Assessment

WoS Q Value

Q1

Scopus Q Value

Q1

Volume

37

Issue

3

Citation

Sahin, E. K. (2023). Implementation of free and open-source semi-automatic feature engineering tool in landslide susceptibility mapping using the machine-learning algorithms RF, SVM, and XGBoost. Stochastic Environmental Research and Risk Assessment, 37(3), 1067-1092.