Similarity-based attribute weighting methods via clustering algorithms in the classification of imbalanced medical datasets

Polat, Kemal

dc.authorid	0000-0003-1840-9958	en_US
dc.contributor.author	Polat, Kemal
dc.date.accessioned	2021-06-23T19:49:36Z
dc.date.available	2021-06-23T19:49:36Z
dc.date.issued	2018
dc.department	BAİBÜ, Faculty of Engineering, Department of Electrical and Electronics Engineering	en_US
dc.description.abstract	In the fields of pattern recognition and machine learning, data preprocessing algorithms have been used increasingly in recent years to achieve high classification performance. In particular, applying a data preprocessing method before the classification algorithm has become essential for medical datasets with nonlinear and imbalanced data distributions. In this study, a new data preprocessing method has been proposed for the classification of the Parkinson, hepatitis, Pima Indians, single photon emission computed tomography (SPECT) heart, and thoracic surgery medical datasets, all of which have nonlinear and imbalanced data distributions. These datasets were taken from the UCI machine learning repository. The proposed data preprocessing method consists of three steps. In the first step, the cluster centers of each attribute in these five datasets were calculated using the k-means, fuzzy c-means, and mean shift clustering algorithms. In the second step, the absolute differences between the data in each attribute and the cluster centers were calculated, and the average of these differences was then computed for each attribute. In the final step, the weighting coefficients were calculated by dividing the mean difference by the cluster centers, and weighting was performed by multiplying the obtained weight coefficients by the attribute values in the dataset. Three different attribute weighting methods have been proposed: (1) similarity-based attribute weighting in k-means clustering, (2) similarity-based attribute weighting in fuzzy c-means clustering, and (3) similarity-based attribute weighting in mean shift clustering. With the proposed attribute weighting methods, we aimed to bring the data of each class closer together and to reduce the within-class variance. 
By reducing the variance within each class, we have brought the data of each class closer together while also increasing the discrimination between the classes. For comparison with other methods in the literature, random subsampling was used to handle the imbalanced dataset classification. After the attribute weighting process, four classification algorithms were used to classify the imbalanced medical datasets: linear discriminant analysis, the k-nearest neighbor classifier, the support vector machine, and the random forest classifier. The performance of the proposed models was evaluated using classification accuracy, precision, recall, area under the ROC curve (AUC), kappa value, and F-measure. Three schemes were used to train and test the classifier models: a 50%-50% train-test holdout, a 60%-40% train-test holdout, and tenfold cross-validation. The experimental results showed that the proposed attribute weighting methods achieved higher classification performance than the random subsampling method when classifying the imbalanced medical datasets.	en_US
dc.identifier.doi	10.1007/s00521-018-3471-8
dc.identifier.endpage	1013	en_US
dc.identifier.issn	0941-0643
dc.identifier.issn	1433-3058
dc.identifier.issue	3	en_US
dc.identifier.scopus	2-s2.0-85044757192	en_US
dc.identifier.scopusquality	Q1	en_US
dc.identifier.startpage	987	en_US
dc.identifier.uri	https://doi.org/10.1007/s00521-018-3471-8
dc.identifier.uri	https://hdl.handle.net/20.500.12491/9562
dc.identifier.volume	30	en_US
dc.identifier.wos	WOS:000438595400025	en_US
dc.identifier.wosquality	Q1	en_US
dc.indekslendigikaynak	Web of Science	en_US
dc.indekslendigikaynak	Scopus	en_US
dc.institutionauthor	Polat, Kemal
dc.language.iso	en	en_US
dc.publisher	Springer London Ltd	en_US
dc.relation.ispartof	Neural Computing & Applications	en_US
dc.relation.publicationcategory	Article - International Peer-Reviewed Journal - Institutional Faculty Member	en_US
dc.rights	info:eu-repo/semantics/closedAccess	en_US
dc.subject	Imbalanced Medical Dataset Classification	en_US
dc.subject	Data Preprocessing	en_US
dc.subject	Attribute Weighting	en_US
dc.subject	Clustering Algorithms	en_US
dc.title	Similarity-based attribute weighting methods via clustering algorithms in the classification of imbalanced medical datasets	en_US
dc.type	Article	en_US
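
The three weighting steps described in the abstract can be sketched in Python for the k-means variant. This is a minimal sketch under stated assumptions, not the author's implementation. It assumes that each attribute is clustered on its own with a 1-D k-means, using k = 2 as a hypothetical choice. It also assumes that each value is compared with the center of the cluster it is assigned to. Finally, it assumes that a value's weight is the attribute's mean absolute difference divided by that center, which is one reading of "dividing the mean difference by the cluster centers". The zero-center guard and the function names (`kmeans_1d`, `weight_attribute`, `weight_dataset`) are illustrative.

```python
import random
from statistics import mean


def kmeans_1d(values, k, iters=100, seed=0):
    """Plain 1-D k-means (Lloyd's algorithm); needs at least k distinct values."""
    rng = random.Random(seed)
    centers = rng.sample(sorted(set(values)), k)  # k distinct starting centers

    def assign(cs):
        # Index of the nearest center for every value.
        return [min(range(k), key=lambda c: abs(v - cs[c])) for v in values]

    for _ in range(iters):
        labels = assign(centers)
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            # Keep the previous center if a cluster ends up empty.
            new_centers.append(mean(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(centers)


def weight_attribute(column, k=2):
    """Weight one attribute (an assumed reading of the three steps)."""
    # Step 1: cluster centers of this attribute (hypothetical 1-D k-means).
    centers, labels = kmeans_1d(column, k)
    # Step 2: mean absolute difference between each value and its assigned center.
    mean_diff = mean(abs(v - centers[lab]) for v, lab in zip(column, labels))
    # Step 3 (assumed): weight = mean difference / assigned center, then weight * value.
    weighted = []
    for v, lab in zip(column, labels):
        c = centers[lab]
        w = mean_diff / c if c != 0 else 1.0  # zero-center guard is our own choice
        weighted.append(v * w)
    return weighted


def weight_dataset(rows, k=2):
    """Apply weight_attribute to every column of a row-major dataset."""
    columns = [list(col) for col in zip(*rows)]
    weighted_cols = [weight_attribute(col, k) for col in columns]
    return [list(row) for row in zip(*weighted_cols)]


# Toy data: two attributes, each with two well-separated groups of values.
rows = [[1.0, 10.0], [1.2, 11.0], [5.0, 30.0], [5.2, 31.0]]
print(weight_dataset(rows, k=2))
```

In this toy example the mean difference is small relative to the centers. Each value is therefore scaled by roughly mean_diff / c, and values that share a center end up close to mean_diff. Under the same assumptions, the fuzzy c-means and mean shift variants would change only how the centers and assignments are obtained.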

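Most of the evaluation measures listed in the abstract follow standard confusion-matrix definitions: accuracy, precision, recall, F-measure, and kappa. The sketch below computes them for a binary problem. It assumes that "kappa value" means Cohen's kappa and that the minority class is labelled 1 and treated as the positive class. Both of these, and the function name `binary_metrics`, are illustrative assumptions. AUC is omitted because it needs classifier scores rather than hard labels.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix metrics for a binary problem (positive class assumed = minority)."""
    pairs = list(zip(y_true, y_pred))
    n = len(pairs)
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = n - tp - fp - fn

    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)

    # Cohen's kappa: observed agreement corrected for the agreement expected
    # by chance, which is computed from the predicted and actual class totals.
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else 0.0

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "kappa": kappa,
    }


# Toy example: 3 minority (positive) samples and 7 majority samples.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(binary_metrics(y_true, y_pred))
```

For this toy prediction, accuracy is 0.8 and precision, recall, and F-measure are each 2/3. The chance agreement is 0.58, so kappa is 0.22 / 0.42, or about 0.524. Kappa comes out noticeably lower than accuracy, which is why it is informative on imbalanced data.
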
Files

Original bundle

Name: kemal-polat.pdf
Size: 2.73 MB
Format: Adobe Portable Document Format
Description: Full Text

Collections

WoS Indexed Publications Collection
Electrical and Electronics Engineering Department Collection
Scopus Indexed Publications Collection