IJE TRANSACTIONS A: Basics Vol. 27, No. 1 (January 2014) 79-90   


Z. Esmaileyan and H. Marvi
(Received: March 13, 2013 – Accepted: June 20, 2013)

Abstract    Recent developments in robotics and automation have motivated researchers to improve the efficiency of interactive systems through natural man-machine interaction. Since speech is the most common method of human communication, recognizing human emotions from the speech signal has become a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian emotional speech corpus collected from emotional sentences in drama radio programs. We also propose a new automatic speech emotion recognition system that uses spectral and prosodic features simultaneously, and we compare the proposed database with the public and widely used Berlin database. The proposed SER system is developed separately for females and males. Irrelevant features are removed using the Fisher Discriminant Ratio (FDR) filtering feature selection technique, and the selected features are further reduced in dimension using a Linear Discriminant Analysis (LDA) embedding scheme. Finally, the samples are classified by an LDA classifier. Overall recognition rates of 55.74% and 47.28% are achieved on the proposed database for females and males, respectively. Average recognition rates of 78.64% and 73.40% are obtained on the Berlin database for females and males, respectively.


Keywords    Emotional Speech Database, PDREC, Speech Emotion Recognition.
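The recognition pipeline described in the abstract — per-feature FDR filtering, LDA dimensionality reduction, then an LDA classifier — can be sketched as below. This is a minimal illustration on synthetic data, not the authors' implementation: the toy features, class count, and the number of retained features are assumptions for demonstration only.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def fisher_discriminant_ratio(X, y):
    """Per-feature FDR: sum over class pairs of (mu_a - mu_b)^2 / (var_a + var_b)."""
    classes = np.unique(y)
    scores = np.zeros(X.shape[1])
    for i, a in enumerate(classes):
        for b in classes[i + 1:]:
            Xa, Xb = X[y == a], X[y == b]
            num = (Xa.mean(axis=0) - Xb.mean(axis=0)) ** 2
            den = Xa.var(axis=0) + Xb.var(axis=0) + 1e-12
            scores += num / den
    return scores

# Toy data standing in for spectral + prosodic features (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 40))
y = rng.integers(0, 4, size=200)   # four hypothetical emotion classes
X[:, 0] += y                       # make feature 0 class-informative

scores = fisher_discriminant_ratio(X, y)
top = np.argsort(scores)[::-1][:10]              # FDR filtering: keep the 10 best features
reducer = LinearDiscriminantAnalysis(n_components=3).fit(X[:, top], y)
Z = reducer.transform(X[:, top])                 # LDA embedding (at most n_classes - 1 dims)
clf = LinearDiscriminantAnalysis().fit(Z, y)     # LDA classifier on the reduced features
print(clf.score(Z, y))                           # training accuracy on the toy data
```

The two LDA stages are distinct: the first is used as a supervised projection (its `transform`), the second as a classifier on the projected features, mirroring the filter-then-embed-then-classify structure of the proposed system.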


Abstract (Persian)    Rapid advances in automatic and robotic systems have led researchers to make considerable efforts to improve the quality of this interaction. Since speech is the most common method of communication among humans, recognizing human emotion from speech has become one of the challenging topics in this field. In this research we have compiled a Persian emotional database; its sentences were taken from the radio dramas available on the official website of Radio Namayesh. In addition, we have designed a Persian speech emotion recognition system, using prosodic and spectral features of the speech signal. The experimental results obtained on the proposed database are compared with those on the well-known Berlin database. The system is designed separately for female and male speakers. Irrelevant and noisy features are removed by the Fisher feature selection algorithm, and the selected features are further reduced in a subsequent stage by the linear discriminant algorithm. The data are then classified using a linear discriminant classifier. The average recognition rates obtained for females and males are 55.74% and 47.89% on the proposed database, and 78.64% and 73.40% on the Berlin database.


1.     Nicholson, J., Takahashi, K. and Nakatsu, R., "Emotion recognition in speech using neural networks", Neural Computing & Applications,  Vol. 9, No. 4, (2000), 290-296.

2.     Schuller, B., Rigoll, G. and Lang, M., "Speech emotion recognition combining acoustic features and linguistic information in a hybrid support vector machine-belief network architecture", in Acoustics, Speech, and Signal Processing, Proceedings (ICASSP'04), IEEE International Conference on, Vol. 1, (2004), 577-580.

3.     France, D. J., Shiavi, R. G., Silverman, S., Silverman, M. and Wilkes, M., "Acoustical properties of speech as indicators of depression and suicidal risk", Biomedical Engineering, IEEE Transactions on,  Vol. 47, No. 7, (2000), 829-837.

4.     Abdi, J., Khalili, G. F., Fatourechi, M., Lucas, C. and Sedigh, A. K., "Control of multivariable systems based on emotional temporal difference learning controller", International Journal of Engineering-Transactions A: Basics,  Vol. 17, No. 4, (2004), 363.

5.     Mirmomeni, M. and Yazdanpanah, M., "An unsupervised learning method for an attacker agent in robot soccer competitions based on the kohonen neural network", International Journal of Engineering, IJE Transactions A: Basics,  Vol. 21, No. 3, (2008), 255-268.

6.     Hansen, J. H. and Cairns, D. A., "Icarus: Source generator based real-time recognition of speech in noisy stressful and lombard effect environments", Speech Communication,  Vol. 16, No. 4, (1995), 391-422.

7.     El Ayadi, M., Kamel, M. S. and Karray, F., "Survey on speech emotion recognition: Features, classification schemes, and databases", Pattern Recognition,  Vol. 44, No. 3, (2011), 572-587.

8.     Fernandez, R., A computational model for the automatic recognition of affect in speech, Massachusetts Institute of Technology, (2004).

9.     Russell, J. A. and Barrett, L. F., "Core affect, prototypical emotional episodes, and other things called emotion: Dissecting the elephant", Journal of Personality and Social Psychology,  Vol. 76, No. 5, (1999), 805.

10.   Hozjan, V. and Kacic, Z., "Context-independent multilingual emotion recognition from speech signals", International Journal of Speech Technology,  Vol. 6, No. 3, (2003), 311-320.

11.   Ververidis, D. and Kotropoulos, C., "A review of emotional speech databases", in Proc. Panhellenic Conference on Informatics (PCI). (2003), 560-574.

12.   Babcock, D. E. and Miller, M. A., "Client education: Theory & practice", Gastroenterology Nursing,  Vol. 18, No. 4, (1995), 157.

13.   Buck, R., "Human motivation and emotion", John Wiley & Sons,  (1988).

14.   Ross, N., Medin, D. and Cox, D., "Epistemological models and culture conflict: Menominee and euroamerican hunters in wisconsin", ETHOS,  Vol. 35, No. 4, (2007), 478-515.

15.   Lewis, M. and Michalson, L., "Children's emotions and moods: Developmental theory and measurement", Plenum Press, New York,  (1983).

16.   Malatesta, C. Z. and Kalnok, M., "Emotional experience in younger and older adults", Journal of Gerontology,  Vol. 39, No. 3, (1984), 301-308.

17.   de Albornoz, J. C., Plaza, L., Gervás, P. and Díaz, A., A joint model of feature mining and sentiment analysis for product review rating, in Advances in information retrieval. Springer. (2011) 55-66.

18.   Cosmides, L. and Tooby, J., "Evolutionary psychology, moral heuristics, and the law", Dahlem University Press,  (2006).

19.   Morrison, D., Wang, R. and De Silva, L. C., "Ensemble methods for spoken emotion recognition in call-centres", Speech Communication,  Vol. 49, No. 2, (2007), 98-112.

20.   Slaney, M. and McRoberts, G., "Baby ears: A recognition system for affective vocalizations", in Acoustics, Speech and Signal Processing, 1998. Proceedings of the  IEEE International Conference on, Vol. 2, (1998), 985-988.

21.   "Bavarian Archive for Speech Signals, http://www.bas.uni-muenchen.de/bas/".

22.   Liberman, M., Davis, K., Grossman, M., Martey, N. and Bell, J., "Emotional prosody speech and transcripts", Linguistic Data Consortium, Philadelphia,  (2002).

23.   Burkhardt, F., Paeschke, A., Rolfes, M., Sendlmeier, W. F. and Weiss, B., "A database of german emotional speech", in Interspeech. (2005), 1517-1520.

24.   Engberg, I. S. and Hansen, A. V., "Documentation of the danish emotional speech database des", Internal AAU report, Center for Person Kommunikation, Denmark,  (1996).

25.   Nwe, T. L., Foo, S. W. and De Silva, L. C., "Speech emotion recognition using hidden markov models", Speech Communication,  Vol. 41, No. 4, (2003), 603-623.

26.   Hozjan, V., Kacic, Z., Moreno, A., Bonafonte, A. and Nogueiras, A., "Interface databases: Design and collection of a multilingual emotional speech database", in LREC, (2002).

27.   Breazeal, C. and Aryananda, L., "Recognition of affective communicative intent in robot-directed speech", Autonomous Robots,  Vol. 12, No. 1, (2002), 83-104.

28.   Hansen, J. H., Bou-Ghazale, S. E., Sarikaya, R. and Pellom, B., "Getting started with susas: A speech under simulated and actual stress database", in Eurospeech. Vol. 97, (1997), 1743-46.

29.   Schuller, B., Reiter, S., Muller, R., Al-Hames, M., Lang, M., and Rigoll, G., "Speaker independent speech emotion recognition by ensemble classification", in Multimedia and Expo, ICME International Conference on, IEEE. (2005), 864-867.

30.   Fu, L., Mao, X. and Chen, L., "Speaker independent emotion recognition based on svm/hmms fusion system", in Audio, Language and Image Processing, ICALIP  International Conference on, IEEE. (2008), 61-65.

31.   Schuller, B., "Towards intuitive speech interaction by the integration of emotional aspects", in Systems, Man and Cybernetics, International Conference on, IEEE. Vol. 6, (2002), 6-11.

32.   Petrushin, V., "Emotion in speech: Recognition and application to call centers", in Proceedings of Artificial Neural Networks in Engineering. (1999), 7-10.

33.   Makarova, V., "A database of russian emotional utterances", in ICSLP 2002. (2002).

34.   Kim, E. H., Hyun, K. H., Kim, S. H. and Kwak, Y. K., "Speech emotion recognition using eigen-fft in clean and noisy environments", in Robot and Human interactive Communication, RO-MAN. The 16th International Symposium on, IEEE. (2007), 689-694.

35.   Zhou, J., Wang, G., Yang, Y. and Chen, P., "Speech emotion recognition based on rough set and svm", in Cognitive Informatics, ICCI. 5th International Conference on, IEEE. Vol. 1, (2006), 53-61.

36.   Hu, H., Xu, M.-X. and Wu, W., "Gmm supervector based svm with spectral features for speech emotion recognition", in Acoustics, Speech and Signal Processing, ICASSP. International Conference on, IEEE. Vol. 4, (2007), IV-413-IV-416.

37.   Pereira, C., "Dimensions of emotional meaning in speech", in ISCA Tutorial and Research Workshop (ITRW) on Speech and Emotion, (2000).

38.   Rabiner, L. and Juang, B.-H., "Fundamentals of speech recognition", (1993).

39.   Wu, S., Falk, T. H. and Chan, W.-Y., "Automatic speech emotion recognition using modulation spectral features", Speech Communication,  Vol. 53, No. 5, (2011), 768-785.

40.   Ntalampiras, S. and Fakotakis, N., "Modeling the temporal evolution of acoustic parameters for speech emotion recognition", Affective Computing, IEEE Transactions on,  Vol. 3, No. 1, (2012), 116-125.

41.   Kim, E. H., Hyun, K. H., Kim, S. H. and Kwak, Y. K., "Improved emotion recognition with a novel speaker-independent feature", Mechatronics, IEEE/ASME Transactions on,  Vol. 14, No. 3, (2009), 317-325.

42.   Lee, C.-C., Mower, E., Busso, C., Lee, S. and Narayanan, S., "Emotion recognition using a hierarchical binary decision tree approach", Speech Communication,  Vol. 53, No. 9, (2011), 1162-1171.

43.   Pérez-Espinosa, H., Reyes-García, C. A. and Villaseñor-Pineda, L., "Acoustic feature selection and classification of emotions in speech using a 3d continuous emotion model", Biomedical Signal Processing and Control,  Vol. 7, No. 1, (2012), 79-87.

44.   Bozkurt, E., Erzin, E., Erdem, Ç. E. and Erdem, A. T., "Formant position based weighted spectral features for emotion recognition", Speech Communication,  Vol. 53, No. 9, (2011), 1186-1197.

45.   Harimi, A., Marvi, H. and Esmaileyan, Z., "Estimation of lpc coefficients using evolutionary algorithms", Journal of AI and Data Mining,  Vol. 1, No. 2, (2013), 111-118.

46.   Steidl, S., Batliner, A., Noth, E. and Hornegger, J., Quantization of segmentation and f0 errors and their effect on emotion recognition, in 11th International Conference on Text, Speech and Dialogue, Springer-Verlag, Heidelberg. (2008) 525-534.

47.   Krajewski, J. and Kröger, B. J., "Using prosodic and spectral characteristics for sleepiness detection", in Interspeech., (2007), 1841-1844.

48.   Rong, J., Li, G. and Chen, Y.-P. P., "Acoustic feature selection for automatic emotion recognition from speech", Information Processing & Management,  Vol. 45, No. 3, (2009), 315-328.

49.   Schuller, B., Batliner, A., Steidl, S. and Seppi, D., "Recognising realistic emotions and affect in speech: State of the art and lessons learnt from the first challenge", Speech Communication,  Vol. 53, No. 9, (2011), 1062-1087.

50.   Altun, H. and Polat, G., "Boosting selection of speech related features to improve performance of multi-class svms in emotion detection", Expert Systems with Applications,  Vol. 36, No. 4, (2009), 8197-8203.

51.   Bishop, C. M. and Nasrabadi, N. M., "Pattern recognition and machine learning", springer New York,  Vol. 1,  (2006).

52.   Ye, J., Janardan, R., Li, Q. and Park, H., "Feature extraction via generalized uncorrelated linear discriminant analysis", in Proceedings of the twenty-first international conference on Machine learning, ACM. (2004), 113.

53.   Laukka, P., Neiberg, D., Forsell, M., Karlsson, I. and Elenius, K., "Expression of affect in spontaneous speech: Acoustic correlates and automatic detection of irritation and resignation", Computer Speech & Language,  Vol. 25, No. 1, (2011), 84-104.

54.   Raudys, S. J. and Jain, A. K., "Small sample size effects in statistical pattern recognition: Recommendations for practitioners", IEEE Transactions on Pattern Analysis and Machine Intelligence,  Vol. 13, No. 3, (1991), 252-264.

International Journal of Engineering
E-mail: office@ije.ir
Web Site: http://www.ije.ir