Emotional speaker identification using a novel capsule nets model

dc.authorid: 0000-0003-1570-0897 [en_US]
dc.authorid: 0000-0001-7856-9342 [en_US]
dc.authorid: 0000-0003-2265-7268 [en_US]
dc.authorid: 0000-0002-7201-6963 [en_US]
dc.authorid: 0000-0003-1840-9958 [en_US]
dc.contributor.author: Nassif, Ali Bou
dc.contributor.author: Shahin, Ismail
dc.contributor.author: Elnagar, Ashraf
dc.contributor.author: Velayudhan, Divya
dc.contributor.author: Alhudhaif, Adi
dc.contributor.author: Polat, Kemal
dc.date.accessioned: 2024-02-22T05:55:22Z
dc.date.available: 2024-02-22T05:55:22Z
dc.date.issued: 2022 [en_US]
dc.department: BAİBÜ, Faculty of Engineering, Department of Electrical and Electronics Engineering [en_US]
dc.description: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. The authors would also like to thank the University of Sharjah for funding this project. [en_US]
dc.description.abstract: Speaker recognition systems are widely used in various applications to identify a person by their voice; however, the high degree of variability in speech signals makes this a challenging task. Dealing with emotional variations is very difficult because emotions alter the voice characteristics of a person; thus, the acoustic features differ from those used to train models in a neutral environment. Therefore, speaker recognition models trained on neutral speech fail to correctly identify speakers under emotional stress. Although considerable advancements in speaker identification have been made using convolutional neural networks (CNN), CNNs cannot exploit the spatial association between low-level features. Inspired by the recent introduction of capsule networks (CapsNets), which are based on deep learning to overcome the inadequacy of CNNs in preserving the pose relationship between low-level features with their pooling technique, this study investigates the performance of using CapsNets in identifying speakers from emotional speech recordings. A CapsNet-based speaker identification model is proposed and evaluated using three distinct speech databases, i.e., the Emirati Speech Database, SUSAS Dataset, and RAVDESS (open-access). The proposed model is also compared to baseline systems. Experimental results demonstrate that the novel proposed CapsNet model trains faster and provides better results than current state-of-the-art schemes. The effect of the routing algorithm on speaker identification performance was also studied by varying the number of iterations, both with and without a decoder network. [en_US]
dc.description.sponsorship: University of Sharjah [en_US]
dc.identifier.citation: Nassif, A. B., Shahin, I., Elnagar, A., Velayudhan, D., Alhudhaif, A., & Polat, K. (2022). Emotional speaker identification using a novel capsule nets model. Expert Systems with Applications, 193, 116469. [en_US]
dc.identifier.doi: 10.1016/j.eswa.2021.116469
dc.identifier.endpage: 13 [en_US]
dc.identifier.issn: 0957-4174
dc.identifier.issn: 1873-6793
dc.identifier.scopus: 2-s2.0-85123926490 [en_US]
dc.identifier.scopusquality: Q1 [en_US]
dc.identifier.startpage: 1 [en_US]
dc.identifier.uri: http://dx.doi.org/10.1016/j.eswa.2021.116469
dc.identifier.uri: https://hdl.handle.net/20.500.12491/12036
dc.identifier.volume: 193 [en_US]
dc.identifier.wos: WOS:000748526500009 [en_US]
dc.identifier.wosquality: Q1 [en_US]
dc.indekslendigikaynak: Web of Science [en_US]
dc.indekslendigikaynak: Scopus [en_US]
dc.institutionauthor: Polat, Kemal
dc.language.iso: en [en_US]
dc.publisher: Pergamon-Elsevier Science Ltd [en_US]
dc.relation.ispartof: Expert Systems with Applications [en_US]
dc.relation.publicationcategory: Article - International Refereed Journal - Institutional Faculty Member [en_US]
dc.rights: info:eu-repo/semantics/openAccess [en_US]
dc.subject: Capsule Network [en_US]
dc.subject: Convolutional Neural Network [en_US]
dc.subject: Emotional Speech [en_US]
dc.subject: Recognition [en_US]
dc.subject: Computer [en_US]
dc.subject: System [en_US]
dc.title: Emotional speaker identification using a novel capsule nets model [en_US]
dc.type: Article [en_US]

Files

Original bundle
Showing 1 - 1 of 1
Name: ali-bou-nassif.pdf
Size: 4.39 MB
Format: Adobe Portable Document Format
Description: Full Text
License bundle
Showing 1 - 1 of 1
Name: license.txt
Size: 1.44 KB
Format: Item-specific license agreed upon to submission