Random Forests for Poverty Classification
Keywords:
data mining, classification, poverty, random forests, poverty-gender gap
Abstract
This paper applies a relatively novel data mining method to the problem of poverty classification in Mauritius. The random forests algorithm is applied to census data with a view to improving the accuracy of poverty status classification. The analysis shows that the number of hours worked, age, education and sex are the most important variables in classifying the poverty status of an individual. In addition, a clear poverty-gender gap is identified: women are more likely than men to be classified as poor.