25% of the students in my hometown who study at public institutions fail or drop out of school before completing the 9th grade. The Brazilian government needs to know which students have a high chance of failure before they actually fail, in order to allocate resources efficiently (e.g. one-to-one tutoring).
In this work, we analysed a population of students in Recife, PE - Brazil. The dataset consisted of socioeconomic indicators such as:
- Number of automobiles at home
- Father/mother education level
- Math/Language scores in the previous year
- and others…
We identified the most relevant predictor variables in the dataset with distance-based generalized sensitivity analysis and trained a random forest model to predict the condition of each student in the following year.
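As a rough illustration of the modelling step, here is a minimal sketch of a three-class random forest trained with scikit-learn. It is not the original pipeline: the data is synthetic, the four columns and the labelling rule are made up for the example, and the sensitivity-analysis step that selected the predictors is not reproduced.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for the census data (NOT the real dataset).
n_students = 600
cars = rng.integers(0, 3, size=n_students)        # automobiles at home
parent_edu = rng.integers(0, 5, size=n_students)  # parent education level (coded 0-4)
math = rng.uniform(0, 10, size=n_students)        # previous-year math score
lang = rng.uniform(0, 10, size=n_students)        # previous-year language score
X = np.column_stack([cars, parent_edu, math, lang])

# Made-up labelling rule, only so that the toy problem has some signal.
score = 0.5 * math + 0.5 * lang + 0.3 * parent_edu + rng.normal(0, 1, n_students)
y = np.where(score > 5.0, "PASS", np.where(score > 3.5, "FAIL", "DROP"))

# Hold out the last 150 students to mimic predicting unseen cases.
train, test = slice(0, 450), slice(450, None)
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X[train], y[train])

pred = model.predict(X[test])         # one of PASS / FAIL / DROP per student
proba = model.predict_proba(X[test])  # columns follow the order of model.classes_
```

The class probabilities from `predict_proba` are what make a per-student ranking possible, rather than just a hard PASS/FAIL/DROP label.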
In the map below, each circle is a school. The size of the circle indicates the number of students in that school who filled in the census forms correctly at the beginning of the academic year.
The algorithm predicts the condition of each student in the following year as one of PASS, FAIL or DROP. The color of the circle represents the predicted percentage of students in that school who pass (i.e. PASS).
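The two quantities shown per school (circle size and color) can be derived from the per-student predictions with a simple aggregation. The sketch below assumes the predictions are available as `(school, condition)` pairs; the school names, the data, and the function name `summarize_schools` are hypothetical.

```python
from collections import defaultdict

# Hypothetical per-student predictions: (school, predicted condition).
predictions = [
    ("School A", "PASS"), ("School A", "FAIL"),
    ("School A", "PASS"), ("School A", "PASS"),
    ("School B", "DROP"), ("School B", "PASS"),
]

def summarize_schools(predictions):
    """Return {school: (number of students, predicted PASS percentage)}."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for school, condition in predictions:
        totals[school] += 1
        if condition == "PASS":
            passes[school] += 1
    return {
        school: (totals[school], 100.0 * passes[school] / totals[school])
        for school in totals
    }

summary = summarize_schools(predictions)
for school, (n_students, pass_pct) in summary.items():
    print(f"{school}: {n_students} students, {pass_pct:.1f}% predicted PASS")
```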
By clicking on a school (e.g. Padre Jose de Anchieta), we obtain a list of students (with fake names for privacy reasons) ranked by their chance of failure, as illustrated below. This information was used to mitigate inefficiencies in the school network.
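A ranking of this kind can be sketched from the predicted class probabilities. The example below assumes that "chance of failure" means the probability of not passing, i.e. 1 - P(PASS), which covers both FAIL and DROP; the original ranking criterion may differ. The student names and probabilities are invented.

```python
# Hypothetical predicted probabilities for the students of one school.
students = [
    {"name": "Student 1", "proba": {"PASS": 0.80, "FAIL": 0.15, "DROP": 0.05}},
    {"name": "Student 2", "proba": {"PASS": 0.30, "FAIL": 0.50, "DROP": 0.20}},
    {"name": "Student 3", "proba": {"PASS": 0.55, "FAIL": 0.25, "DROP": 0.20}},
]

def risk(student):
    """Assumed risk score: probability of any outcome other than PASS."""
    return 1.0 - student["proba"]["PASS"]

def rank_by_risk(students):
    """Sort students from highest to lowest risk of not passing."""
    return sorted(students, key=risk, reverse=True)

ranking = rank_by_risk(students)
for position, student in enumerate(ranking, start=1):
    print(f"{position}. {student['name']} (risk {risk(student):.2f})")
```

Sorting by a single score keeps the list easy to read; ranking by P(FAIL) and P(DROP) separately would be an equally plausible choice if the two outcomes call for different interventions.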