PREDICTION OF SOCIAL LAG IN MEXICO: A MACHINE LEARNING APPROACH FROM ECONOMIC UNIT DATA

Authors

  • Pablo Rodrigo Ávila-Solís
  • Juan Manuel González-Camacho
  • Delfino Vargas-Chanes
  • Paulino Pérez-Rodríguez

DOI:

https://doi.org/10.47163/agrociencia.v56i2.2768

Keywords:

decision trees, poverty, classification models, logistic regression, artificial intelligence.

Abstract

In Mexico, social lag is officially calculated at the municipal level every five years by the National Council for the Evaluation of Social Development Policy (Consejo Nacional para la Evaluación de la Política de Desarrollo Social, CONEVAL), using data from the population and housing census. Annual estimates of social lag would nevertheless be useful for monitoring public policies. This study presents a machine learning approach to predict the class or degree of social lag (high, medium, low) at the municipal level in Mexico from information on economic units in the 2015 National Statistical Directory of Economic Units (Directorio Estadístico Nacional de Unidades Económicas, DENUE). Three supervised machine learning classifiers were implemented: logistic regression, support vector machine, and random forest. They were trained and tested using counts of economic units in their different categories, together with the population and geographic coordinates of the municipalities; the target social lag classes at the municipal level were those calculated from 2015 census information and reported in the literature. Classifier performance was evaluated with the macro-averaged F1 score (F1-macro), overall accuracy, and the area under the ROC curve (AUC). The random forest classifier achieved the best overall performance, with an F1-macro of 0.713 and an overall classification accuracy of 0.716; its F1 scores for the high, medium, and low social lag classes were 0.596, 0.730, and 0.822, respectively. This classifier was then used to predict social lag for 2016 and 2017. The results show a relationship between social lag and economic units aggregated at the subsector level, and indicate that the proposed approach is a viable, low-cost alternative for predicting social lag when census information is lacking.
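
As a minimal illustration of two of the evaluation criteria named in the abstract, the sketch below computes per-class F1, macro-averaged F1, and overall accuracy for a three-class problem in plain Python. The label lists are made-up toy data, and the function names (`per_class_f1`, `macro_f1`, `accuracy`) are hypothetical helpers introduced here; none of this is the authors' code or data.

```python
# Illustrative sketch only: toy labels, not data from the study.
CLASSES = ("high", "medium", "low")


def per_class_f1(y_true, y_pred, classes=CLASSES):
    """Return {class: F1}, treating each class one-vs-rest."""
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)  # true positives
        fp = sum(t != c and p == c for t, p in pairs)  # false positives
        fn = sum(t == c and p != c for t, p in pairs)  # false negatives
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, classes=CLASSES):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred, classes)
    return sum(scores.values()) / len(classes)


def accuracy(y_true, y_pred):
    """Fraction of items whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical true and predicted social lag classes for six municipalities.
y_true = ["high", "high", "medium", "medium", "low", "low"]
y_pred = ["high", "medium", "medium", "medium", "low", "low"]

f1_by_class = per_class_f1(y_true, y_pred)
f1_macro = macro_f1(y_true, y_pred)
overall_accuracy = accuracy(y_true, y_pred)
```

Because the macro average weights every class equally, a weak score on one class (here the toy "high" class) lowers the macro F1 even when overall accuracy stays high, which is why the two figures are usually reported side by side.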

Published

18-04-2022

Issue

Section

Applied Mathematics, Statistics and Computing