Novel dual-channel long short-term memory compressed capsule networks for emotion recognition

dc.authorid0000-0001-7856-9342en_US
dc.authorid0000-0002-7201-6963en_US
dc.authorid0000-0003-1840-9958en_US
dc.contributor.authorShahin, Ismail
dc.contributor.authorHindawi, Noor
dc.contributor.authorNassif, Ali Bou
dc.contributor.authorAlhudhaif, Adi
dc.contributor.authorPolat, Kemal
dc.date.accessioned2024-03-04T12:50:26Z
dc.date.available2024-03-04T12:50:26Z
dc.date.issued2022en_US
dc.departmentBAİBÜ, Faculty of Engineering, Department of Electrical and Electronics Engineeringen_US
dc.descriptionThe authors would like to express their gratitude to the University of Sharjah for its support through the two competitive research projects entitled Emirati-Accented Speaker and Emotion Recognition Based on Deep Neural Network, No. 19020403139, and Investigation and Analysis of Emirati-Accented Corpus in Neutral and Abnormal Talking Environments for Engineering Applications using Shallow and Deep Classifiers, No. 20020403159.en_US
dc.description.abstractRecent research on speech emotion recognition (SER) has made considerable advances with the use of MFCC spectrogram features and neural network approaches such as convolutional neural networks (CNNs). The fundamental issue with CNNs is that the spatial information in spectrograms is not preserved. Capsule networks (CapsNet) have gained recognition as alternatives to CNNs because of their larger capacity for hierarchical representation. However, a hidden issue with CapsNet is that the compression methods employed in CNNs cannot be directly applied to it. To address these issues, this research introduces a novel text-independent and speaker-independent SER architecture, in which a dual-channel long short-term memory compressed-CapsNet (DC-LSTM COMP-CapsNet) algorithm is proposed based on the structural features of CapsNet. The proposed classifier provides energy efficiency and an adequate compression method for speech emotion recognition, neither of which the original CapsNet structure delivers. Moreover, the grid search (GS) approach is used to attain optimal solutions. The results show improved performance and reduced training and testing running time. The speech datasets used to evaluate the algorithm are the Arabic Emirati-accented corpus, the English speech under simulated and actual stress (SUSAS) corpus, the English Ryerson audio-visual database of emotional speech and song (RAVDESS) corpus, and the crowd-sourced emotional multimodal actors dataset (CREMA-D). This work shows that MFCCs delta-delta is the optimum feature extraction method among the known methods compared. Using the four datasets and MFCCs delta-delta, DC-LSTM COMP-CapsNet surpasses all the state-of-the-art systems, classical classifiers, CNN, and the original CapsNet.
Using the Arabic Emirati-accented corpus, the results demonstrate that the proposed work yields an average emotion recognition accuracy of 89.3%, compared to 84.7%, 82.2%, 69.8%, 69.2%, 53.8%, 42.6%, and 31.9% for CapsNet, CNN, support vector machine (SVM), multi-layer perceptron (MLP), k-nearest neighbor (KNN), radial basis function (RBF), and naive Bayes (NB), respectively.en_US
dc.description.sponsorshipUniversity of Sharjah [19020403139, 20020403159]en_US
dc.identifier.citationShahin, I., Hindawi, N., Nassif, A. B., Alhudhaif, A., & Polat, K. (2022). Novel dual-channel long short-term memory compressed capsule networks for emotion recognition. Expert Systems with Applications, 188, 116080.en_US
dc.identifier.doi10.1016/j.eswa.2021.116080
dc.identifier.endpage19en_US
dc.identifier.issn0957-4174
dc.identifier.issn1873-6793
dc.identifier.scopus2-s2.0-85117686381en_US
dc.identifier.scopusqualityQ1en_US
dc.identifier.startpage1en_US
dc.identifier.urihttp://dx.doi.org/10.1016/j.eswa.2021.116080
dc.identifier.urihttps://hdl.handle.net/20.500.12491/12047
dc.identifier.volume188en_US
dc.identifier.wosWOS:000768193500021en_US
dc.identifier.wosqualityQ1en_US
dc.indekslendigikaynakWeb of Scienceen_US
dc.indekslendigikaynakScopusen_US
dc.institutionauthorPolat, Kemal
dc.language.isoenen_US
dc.publisherPergamon-Elsevier Science Ltden_US
dc.relation.ispartofExpert Systems with Applicationsen_US
dc.relation.publicationcategoryArticle - International Peer-Reviewed Journal - Institutional Faculty Memberen_US
dc.rightsinfo:eu-repo/semantics/openAccessen_US
dc.subjectCapsule Networksen_US
dc.subjectConvolutional Neural Networken_US
dc.subjectDeep Neural Networken_US
dc.subjectDual-Channelen_US
dc.subjectEmotion Recognitionen_US
dc.subjectLSTMen_US
dc.titleNovel dual-channel long short-term memory compressed capsule networks for emotion recognitionen_US
dc.typeArticleen_US
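
The abstract names MFCCs delta-delta as the best-performing features but does not say how they are computed. The following is a minimal sketch of the standard regression-based delta and delta-delta (acceleration) computation, not the authors' implementation. The function names `delta` and `stack_delta_delta`, the window `width=2`, the edge padding, and the choice to stack static, delta, and delta-delta coefficients side by side are all assumptions made for illustration. The input is assumed to be an MFCC matrix (frames by coefficients) produced elsewhere.

```python
import numpy as np


def delta(features: np.ndarray, width: int = 2) -> np.ndarray:
    """Regression-based delta along the time axis (rows are frames).

    d_t = sum_{n=1..width} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..width} n^2)
    The window width and edge padding are assumptions, not taken from the paper.
    """
    num_frames = features.shape[0]
    # Repeat the first and last frames so every frame has a full window.
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, width + 1))
    out = np.zeros(features.shape, dtype=float)
    for n in range(1, width + 1):
        ahead = padded[width + n : width + n + num_frames]
        behind = padded[width - n : width - n + num_frames]
        out += n * (ahead - behind)
    return out / denom


def stack_delta_delta(mfcc: np.ndarray, width: int = 2) -> np.ndarray:
    """Return [static, delta, delta-delta] stacked column-wise (assumed layout)."""
    d1 = delta(mfcc, width)
    d2 = delta(d1, width)  # delta of the delta is the acceleration term
    return np.hstack([mfcc, d1, d2])


if __name__ == "__main__":
    # Placeholder input: 20 frames of 13 synthetic coefficients, not real MFCCs.
    rng = np.random.default_rng(0)
    fake_mfcc = rng.normal(size=(20, 13))
    print(stack_delta_delta(fake_mfcc).shape)
```

With 13 static coefficients per frame, the stacked representation has 39 columns per frame. Whether the paper feeds the network the stacked features or only the delta-delta part is not stated in this record.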

Files

Original bundle
Name:
ismail-shahin.pdf
Size:
2.28 MB
Format:
Adobe Portable Document Format
Description:
Full Text