IJE TRANSACTIONS B: Applications Vol. 30, No. 11 (November 2017) 1723-1729   


S. Kumar and G. Sahoo
( Received: April 06, 2017 – Accepted in Revised Form: September 08, 2017 )

Abstract    Machine learning-based classification techniques support decision making in healthcare, especially in disease diagnosis, prognosis and screening. Healthcare datasets are voluminous, and their high dimensionality leads to slower learning and higher computational cost. Feature selection addresses this problem by reducing the feature set, and it can improve classification accuracy while using fewer features in the decision-making process. In this paper, Random Forest (RF) is employed for the diagnosis of cardiovascular disease. The first phase of the proposed system implements several feature selection algorithms, namely Principal Component Analysis (PCA), Relief-F, Sequential Forward Floating Search (SFFS), Sequential Backward Floating Search (SBFS) and Genetic Algorithm (GA), to reduce the dimension of the cardiovascular disease dataset. The second phase constructs an RF model for cardiovascular disease classification. The results show that the combination of GA and RF delivered the highest classification accuracy, 93.2%, using six features.


Keywords    Random Forest, Genetic Algorithm, Feature Selection, Cardiovascular Disease
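
To illustrate the wrapper-style selection described in the abstract, the following is a minimal sketch of a genetic algorithm searching over binary feature masks. It is not the authors' implementation: the names `ga_select` and `toy_fitness`, the GA settings, the feature count of 13 and the set of "informative" indices are all assumptions made for this example. The fitness function is a toy stand-in; in a setup like the one the abstract describes, it would be replaced by the accuracy of a Random Forest trained on the selected columns.

```python
import random

def ga_select(n_features, fitness, pop_size=20, generations=30,
              crossover_rate=0.8, mutation_rate=0.05, seed=0):
    """Search binary feature masks with a simple generational GA.

    `fitness` maps a mask (list of 0/1) to a score to maximise.
    All default settings are illustrative assumptions.
    """
    rng = random.Random(seed)

    def repair(mask):
        # Ensure at least one feature is selected.
        if not any(mask):
            mask[rng.randrange(n_features)] = 1
        return mask

    def tournament(scored):
        # Binary tournament: pick two individuals, keep the fitter one.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    population = [repair([rng.randint(0, 1) for _ in range(n_features)])
                  for _ in range(pop_size)]
    best_score, best_mask = float("-inf"), None

    for _ in range(generations):
        scored = [(fitness(m), m) for m in population]
        for s, m in scored:
            if s > best_score:
                best_score, best_mask = s, m[:]
        children = [best_mask[:]]  # elitism: carry the best mask forward
        while len(children) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            if rng.random() < crossover_rate and n_features > 1:
                cut = rng.randrange(1, n_features)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation, applied independently to each gene.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            children.append(repair(child))
        population = children

    return best_mask, best_score

# Hypothetical stand-in for "Random Forest accuracy on the selected features":
# rewards a made-up set of informative indices and penalises the rest.
INFORMATIVE = {0, 2, 7, 8, 11, 12}  # assumption, not taken from the paper

def toy_fitness(mask):
    hits = sum(1 for i, g in enumerate(mask) if g and i in INFORMATIVE)
    noise = sum(1 for i, g in enumerate(mask) if g and i not in INFORMATIVE)
    return hits - 0.5 * noise

mask, score = ga_select(13, toy_fitness)
print("selected features:", [i for i, g in enumerate(mask) if g])
print("fitness:", score)
```

Keeping the fitness as a plain callable means the search loop does not depend on any particular classifier; swapping in a cross-validated classifier score would leave `ga_select` unchanged.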


Abstract (Persian version, translated)    Machine learning-based classification methods support the decision-making process in healthcare, particularly in disease diagnosis, prognosis and screening. Healthcare datasets are naturally large in scale, and their high dimensionality brings a slower learning rate and higher computational cost. Feature selection is expected to handle the high dimensionality of datasets by producing a reduced feature set. Feature selection improves classification accuracy, particularly when fewer features are used in the decision-making process. In this paper, Random Forest (RF) is used for the diagnosis of cardiovascular disease. The first phase of the proposed system aims to construct various feature selection algorithms such as Principal Component Analysis (PCA), Relief-F, Sequential Forward Floating Search (SFFS), Sequential Backward Floating Search (SBFS) and Genetic Algorithm (GA) to reduce the dimension of the cardiovascular disease dataset. The second phase turns to model construction based on the RF algorithm for cardiovascular disease classification. The result shows that the combination of GA and RF yields the highest classification accuracy of 93.2% with the help of six features.



International Journal of Engineering