Hybrid BBO_PSO and higher order spectral features for emotion and stress recognition from natural speech
Date
2017
Publisher
Elsevier
Access Rights
info:eu-repo/semantics/closedAccess
Abstract
The aim of the present study is to select a set of higher order spectral features for an emotion/stress recognition system. Fifty higher order spectral features, 28 bispectral and 22 bicoherence based, were extracted from the speech signal and its glottal waveform. These features were combined with the Interspeech 2010 features to further improve the recognition rates. Feature subset selection (FSS) was carried out with the objective of maximizing the subject-independent emotion recognition rate with a minimum number of features. The FSS comprises two stages. In Stage 1, multi-cluster feature selection was adopted to reduce the feature space and identify a relevant feature subset from the Interspeech 2010 features. In Stage 2, biogeography-based optimization (BBO), particle swarm optimization (PSO), and the proposed hybrid BBO_PSO optimization were applied to further reduce the dimension of the feature space and identify the most relevant feature subset, that is, the subset with the highest ability to discriminate between emotional states. The proposed method was tested on three databases: the Berlin emotional speech database (BES), the Surrey audio-visual expressed emotion database (SAVEE), and the simulated domain of the Speech under simulated and actual stress (SUSAS) database. The proposed feature set was evaluated in subject independent (SI), subject dependent (SD), gender dependent male (GD-male), gender dependent female (GD-female), text independent pairwise speech (TIDPS), and text independent multi-style speech (TIDMSS) experiments using SVM and ELM classifiers. The proposed method attained accuracies of 93.25% (SI), 100% (SD), 93.75% (GD-male), and 97.58% (GD-female) for BES; 62.38% (SI) and 76.19% (SD) for SAVEE; and 90.09% (TIDMSS), 97.04% (TIDPS - Angry vs. Neutral), 98.89% (TIDPS - Lombard vs. Neutral), and 99.07% (TIDPS - Loud vs. Neutral) for SUSAS. © 2017 Elsevier B.V. All rights reserved.
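The Stage 2 idea of searching for a small feature subset that maximizes recognition rate can be illustrated with a minimal sketch. This is not the authors' BBO_PSO hybrid: it is a plain binary PSO wrapper, and it assumes a toy nearest-centroid classifier scored on synthetic data in place of the SVM and ELM classifiers named in the abstract. All function names, parameters, the size penalty, and the data generator are hypothetical and chosen only for illustration.

```python
import math
import random


def nearest_centroid_accuracy(X, y, mask):
    """Training-set accuracy of a nearest-centroid classifier on the selected columns."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty subset cannot classify anything
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, t in zip(X, y):
        pred = min(
            centroids,
            key=lambda c: sum((x[j] - m) ** 2 for j, m in zip(cols, centroids[c])),
        )
        correct += pred == t
    return correct / len(X)


def fitness(X, y, mask, penalty=0.01):
    """Accuracy minus a small cost per selected feature (assumed trade-off)."""
    return nearest_centroid_accuracy(X, y, mask) - penalty * sum(mask)


def binary_pso(X, y, n_particles=12, n_iter=30, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Binary PSO: each particle is a 0/1 mask over the feature columns."""
    rng = random.Random(seed)
    d = len(X[0])
    pos = [[rng.randint(0, 1) for _ in range(d)] for _ in range(n_particles)]
    vel = [[0.0] * d for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(X, y, p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(d):
                v = (w * vel[i][j]
                     + c1 * rng.random() * (pbest[i][j] - pos[i][j])
                     + c2 * rng.random() * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-6.0, min(6.0, v))  # clamp to keep the sigmoid stable
                prob_one = 1.0 / (1.0 + math.exp(-vel[i][j]))
                pos[i][j] = 1 if rng.random() < prob_one else 0
            f = fitness(X, y, pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


def make_toy_data(n=60, d=8, seed=1):
    """Synthetic two-class data in which only columns 0 and 1 carry class information."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0, 1) for _ in range(d)]
        row[0] += 4.0 * label
        row[1] -= 4.0 * label
        X.append(row)
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    mask, score = binary_pso(X, y)
    print("selected columns:", [j for j, keep in enumerate(mask) if keep])
    print("fitness:", round(score, 4))
```

The per-feature penalty is one simple way to encode the "minimum features" objective. A realistic evaluation would replace the training-set accuracy with cross-validated accuracy, using speaker-disjoint folds for the subject independent case.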
Keywords
Speech Signals, Feature Extraction, Feature Selection, Emotion Recognition
Source
Applied Soft Computing
WoS Quartile
Q1
Scopus Quartile
Q1
Volume
56