Implementation of free and open-source semi-automatic feature engineering tool in landslide susceptibility mapping using the machine-learning algorithms RF, SVM, and XGBoost
dc.authorid | 0000-0002-9830-8585 | en_US |
dc.contributor.author | Şahin, Emrehan Kutluğ | |
dc.date.accessioned | 2023-08-29T07:53:05Z | |
dc.date.available | 2023-08-29T07:53:05Z | |
dc.date.issued | 2023 | en_US |
dc.department | BAİBÜ, Faculty of Engineering, Department of Civil Engineering | en_US |
dc.description.abstract | Various machine learning (ML) techniques have been recommended and used in the literature to produce landslide susceptibility maps (LSMs). Feature engineering (FE) is an important topic in ML studies, yet it is overlooked in most research. In this study, a novel FE framework, including feature selection, feature transformation, feature binning, and feature weighting, is proposed to produce LSMs using eXtreme gradient boosting (XGBoost), random forest (RF), and support vector machine (SVM). For this purpose, first, thirteen landslide conditioning factors from the data preprocessing stage were used to produce LSM models for the study area, the Babadag district of Denizli Province in the Aegean region of Turkey. Second, two irrelevant factors were eliminated from the input feature subset using feature selection in the FE framework. Third, features identified as skewed were converted into symmetric form by applying a log transformation. Then, the remaining factors with continuous values were converted into categorical values using the quantile classifier technique. During the feature weighting phase, four feature weighting methods, namely XGBoost, RF, non-negative least squares (NNLS), and frequency ratio (FR), were used to calculate the weight of each subclass of each landslide-related factor. In addition, the proposed feature subsets were compared with the raw data. At the end of the process, the XGBoost model constructed with an FR-selected subset (overall accuracy (Acc) = 0.907 and area under curve (AUC) = 0.9822) outperformed both the raw data (Acc = 0.874; AUC = 0.960) and the other methods (i.e., RF-FR and SVM-NNLS). Consequently, the results indicate that the proposed FE approach could be a useful framework for increasing the performance of ML techniques by identifying and extracting relevant features to develop highly optimized and enriched models. | en_US |
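The transformation, binning, and frequency ratio steps named in the abstract can be illustrated with a minimal Python sketch. This is not the tool described in the paper: the function names, the use of `log1p` as the log transformation, and the number of bins are all assumptions made for illustration, and the input values are made-up toy numbers.

```python
import numpy as np
import pandas as pd


def log_transform(values: pd.Series) -> pd.Series:
    """Reduce right skew with log(1 + x).

    Assumption: inputs are non-negative; the abstract only says
    "log transformation" and does not specify the exact variant.
    """
    return np.log1p(values)


def quantile_bins(values: pd.Series, n_bins: int = 5) -> pd.Series:
    """Assign each value to a quantile class, coded 0 .. n_bins - 1.

    Assumption: n_bins is a hypothetical choice, not taken from the paper.
    """
    return pd.qcut(values, q=n_bins, labels=False, duplicates="drop")


def frequency_ratio(classes: pd.Series, landslide: pd.Series) -> pd.Series:
    """Frequency ratio per class.

    FR = (share of landslide cells falling in the class)
         / (share of all cells falling in the class).
    Classes with no landslide cells get FR = 0.
    """
    landslide_share = classes[landslide == 1].value_counts(normalize=True)
    area_share = classes.value_counts(normalize=True)
    return (landslide_share / area_share).fillna(0.0).sort_index()


# Toy data (hypothetical): one continuous factor and a landslide label per cell.
slope = pd.Series([1, 2, 3, 4, 5, 6, 7, 8], name="slope")
label = pd.Series([0, 0, 0, 1, 0, 1, 1, 1], name="landslide")

slope_classes = quantile_bins(log_transform(slope), n_bins=2)
fr_weights = frequency_ratio(slope_classes, label)

# Replace each class code with its FR weight to form the weighted feature.
slope_weighted = slope_classes.map(fr_weights)
```

An FR above 1 marks a class in which landslides are over-represented relative to its area, which is why the ratio can serve as a per-class weight before the data are passed to a classifier such as RF, SVM, or XGBoost.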
dc.description.sponsorship | The raw data used in this paper was obtained from the project "Development of ArcGIS Interfaces with R programming language for Landslide Susceptibility Mapping" (No. 118Y090) funded by The Scientific and Technological Research Council of Turkey (TUBITAK). | en_US |
dc.identifier.citation | Sahin, E. K. (2023). Implementation of free and open-source semi-automatic feature engineering tool in landslide susceptibility mapping using the machine-learning algorithms RF, SVM, and XGBoost. Stochastic Environmental Research and Risk Assessment, 37(3), 1067-1092. | en_US |
dc.identifier.doi | 10.1007/s00477-022-02330-y | |
dc.identifier.endpage | 1092 | en_US |
dc.identifier.issn | 1436-3240 | |
dc.identifier.issn | 1436-3259 | |
dc.identifier.issue | 3 | en_US |
dc.identifier.scopus | 2-s2.0-85141386495 | en_US |
dc.identifier.scopusquality | Q1 | en_US |
dc.identifier.startpage | 1067 | en_US |
dc.identifier.uri | http://dx.doi.org/10.1007/s00477-022-02330-y | |
dc.identifier.uri | https://hdl.handle.net/20.500.12491/11608 | |
dc.identifier.volume | 37 | en_US |
dc.identifier.wos | WOS:000878953800001 | en_US |
dc.identifier.wosquality | Q1 | en_US |
dc.indekslendigikaynak | Web of Science | en_US |
dc.indekslendigikaynak | Scopus | en_US |
dc.institutionauthor | Şahin, Emrehan Kutluğ | |
dc.language.iso | en | en_US |
dc.publisher | SPRINGER | en_US |
dc.relation.ispartof | Stochastic Environmental Research and Risk Assessment | en_US |
dc.relation.publicationcategory | Article - International Peer-Reviewed Journal - Institutional Faculty Member | en_US |
dc.relation.tubitak | Scientific and Technological Research Council of Turkey (TUBITAK) [118Y090] | |
dc.rights | info:eu-repo/semantics/closedAccess | en_US |
dc.subject | Landslide Susceptibility | en_US |
dc.subject | Data Preparation | en_US |
dc.subject | Feature Engineering | en_US |
dc.subject | Feature Transformation | en_US |
dc.subject | Feature Weighting | en_US |
dc.subject | Machine Learning | en_US |
dc.title | Implementation of free and open-source semi-automatic feature engineering tool in landslide susceptibility mapping using the machine-learning algorithms RF, SVM, and XGBoost | en_US |
dc.type | Article | en_US |