An Evaluation of the Wisconsin Breast Cancer Dataset using Ensemble Classifiers and RFE Feature Selection Technique

Sulyman Age Abdulkareem; Zainab Olorunbukademi Abdulkareem

Authors

Sulyman Age Abdulkareem, Institute for Communication Systems, Home of 5G and 6G Innovation Centre, University of Surrey, Guildford, GU2 7XH, UK
Zainab Olorunbukademi Abdulkareem, Computer and Information Sciences Department, University of Strathclyde, Glasgow, G1 1XQ, UK

Keywords:

Breast Cancer, WBCD, XGBoost, RF, RFE, Ensemble Classifiers

Abstract

Breast cancer is one of the deadliest diseases, accounting for a high number of deaths every year. It is the most common type of cancer and the leading cause of cancer death among women worldwide. Machine learning (ML) is an effective approach to data classification, especially in the medical field, where it is widely used for classification and analysis to support decision making. In this paper, the performance of two ensemble ML classifiers, Random Forest (RF) and eXtreme Gradient Boosting (XGBoost), is compared on the Wisconsin Breast Cancer Dataset (WBCD). The main objective of this study is to assess how correctly, efficiently and effectively the classifiers classify the dataset, using both the full feature set and reduced feature subsets generated with the Recursive Feature Elimination (RFE) feature selection technique. Four metrics were used to evaluate the classifiers: Accuracy, Precision, Recall and F1-Score. All experiments were written in the Python programming language and executed in Jupyter Notebook within the Anaconda environment. The experimental results show that XGBoost with five features selected by RFE gives the highest accuracy (99.02%) with the lowest error rate.
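RFE repeatedly fits an estimator, ranks the remaining features by importance, and discards the weakest one until the requested number remains (five in the best-performing configuration reported above). The abstract does not state which estimator drove the elimination, so the minimal sketch below substitutes a hypothetical linear stand-in (absolute coefficients of a ridge-regularised least-squares fit on standardized features) to stay dependency-free. The names `rfe`, `linear_importances`, `standardize` and `solve` are illustrative assumptions, not the authors' code.

```python
import math
from typing import Callable, List, Sequence

Matrix = List[List[float]]


def standardize(X: Matrix) -> Matrix:
    """Scale every column to zero mean and unit variance so coefficient sizes are comparable."""
    n, d = len(X), len(X[0])
    Z = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [row[j] for row in X]
        mean = sum(col) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n) or 1.0  # guard constant columns
        for i in range(n):
            Z[i][j] = (X[i][j] - mean) / std
    return Z


def solve(A: Matrix, b: List[float]) -> List[float]:
    """Solve A w = b with Gaussian elimination and partial pivoting."""
    n = len(A)
    M = [A[i][:] + [b[i]] for i in range(n)]  # augmented matrix [A | b]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        w[r] = (M[r][n] - sum(M[r][k] * w[k] for k in range(r + 1, n))) / M[r][r]
    return w


def linear_importances(X: Matrix, y: Sequence[float], ridge: float = 1e-6) -> List[float]:
    """Hypothetical stand-in estimator: |coefficients| of a ridge least-squares fit."""
    Z = standardize(X)
    n, d = len(Z), len(Z[0])
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]  # centring y absorbs the intercept
    A = [[sum(Z[i][a] * Z[i][c] for i in range(n)) + (ridge if a == c else 0.0)
          for c in range(d)] for a in range(d)]
    rhs = [sum(Z[i][a] * yc[i] for i in range(n)) for a in range(d)]
    return [abs(w) for w in solve(A, rhs)]


def rfe(X: Matrix, y: Sequence[float], n_features_to_select: int,
        importance_fn: Callable[[Matrix, Sequence[float]], List[float]] = linear_importances,
        ) -> List[int]:
    """Recursive Feature Elimination: refit, drop the least important feature, repeat (step = 1)."""
    remaining = list(range(len(X[0])))
    while len(remaining) > n_features_to_select:
        subset = [[row[j] for j in remaining] for row in X]
        scores = importance_fn(subset, y)
        worst = min(range(len(remaining)), key=lambda k: scores[k])
        del remaining[worst]
    return remaining  # original column indices, ascending


if __name__ == "__main__":
    # Toy data (not WBCD): only columns 0 and 2 determine y.
    X = [[1, 2, 5, 1], [2, 7, 3, 1], [3, 1, 6, 2], [4, 8, 2, 3],
         [5, 2, 7, 5], [6, 8, 1, 8], [7, 1, 8, 13], [8, 8, 4, 21]]
    y = [3 * r[0] - 2 * r[2] for r in X]
    print(rfe(X, y, 2))  # → [0, 2]
```

In practice, scikit-learn's `sklearn.feature_selection.RFE(estimator, n_features_to_select=5)` runs the same loop with any estimator exposing `coef_` or `feature_importances_`, such as `RandomForestClassifier` or xgboost's `XGBClassifier`, and exposes the kept columns through its `support_` mask.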

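The four evaluation metrics named in the abstract all derive from the binary confusion matrix (true/false positives and negatives), and the error rate is one minus accuracy. The minimal sketch below computes them in plain Python. `binary_metrics` is a hypothetical helper, not the authors' code, and treating malignant as the positive class is an assumption, since the abstract does not say which class is positive. If the UCI "Original" version of the dataset is used, its class column encodes benign as 2 and malignant as 4, so `positive=4` would apply.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Dict[str, float]:
    """Accuracy, precision, recall, F1-score and error rate for a binary task.

    Accuracy  = (TP + TN) / (TP + TN + FP + FN)
    Precision = TP / (TP + FP)
    Recall    = TP / (TP + FN)
    F1        = 2 * Precision * Recall / (Precision + Recall)
    Error     = 1 - Accuracy
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0  # 0.0 when nothing is predicted positive
    recall = tp / (tp + fn) if tp + fn else 0.0      # 0.0 when no true positives exist
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f1": f1, "error_rate": 1.0 - accuracy}


if __name__ == "__main__":
    # Toy labels: 3 TP, 3 TN, 1 FP, 1 FN.
    print(binary_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 0, 1, 0, 0, 1, 0, 1]))
    # → {'accuracy': 0.75, 'precision': 0.75, 'recall': 0.75, 'f1': 0.75, 'error_rate': 0.25}
```

scikit-learn's `accuracy_score`, `precision_score`, `recall_score` and `f1_score` (with `pos_label` for non-0/1 labels) compute the same quantities.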