A05: Predicting student performance

Using this dataset, build a decision tree with scikit-learn that predicts student performance in a math course. Instead of using the dataset’s G1, G2, and G3 columns directly, create a new “Pass” column defined as Pass = 1 whenever (G1 + G2 + G3) >= 35, and 0 otherwise.
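As a minimal sketch of the “Pass” definition above, the column can be derived with pandas. The four rows and the "absences" feature below are made up for illustration and stand in for the real dataset, which you would load yourself; dropping G1, G2, and G3 afterwards is one reading of "instead of using" those columns, not a stated requirement.

```python
import pandas as pd

# Made-up stand-in rows (an assumption); the real dataset has many more columns.
df = pd.DataFrame({
    "absences": [4, 0, 10, 2],   # hypothetical feature column
    "G1": [10, 15, 8, 12],
    "G2": [11, 14, 9, 12],
    "G3": [12, 16, 7, 11],
})

# Pass = 1 when the three grades sum to at least 35, otherwise 0.
df["Pass"] = ((df["G1"] + df["G2"] + df["G3"]) >= 35).astype(int)

# Drop the raw grades so the tree cannot read the label straight off them.
df = df.drop(columns=["G1", "G2", "G3"])

print(df)
```

Note that the threshold is inclusive: a row whose grades sum to exactly 35 counts as a pass.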


Produce a single PDF (most likely exported from a Jupyter notebook) that includes the following:

  • Code for all steps.
  • A tree diagram, produced by graphviz.
  • An explanation/interpretation of the tree diagram.
  • 5-fold cross-validation accuracy scores for a range of maximum tree depths (max_depth).
  • An example of using the best-scoring tree to predict a new row of data not included in the dataset (make up a row of data).
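
The deliverables above might be approached roughly as follows. This is a sketch under stated assumptions, not a reference solution: the data is synthetic (random numbers standing in for the cleaned student features and the Pass label), the feature names f0 to f3 are placeholders, and the depth range 1 to 10 is an arbitrary choice. With the real dataset, any non-numeric columns would need to be encoded as numbers before fitting.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Synthetic stand-in data (an assumption): 200 rows, 4 numeric features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # stand-in for the Pass column
feature_names = ["f0", "f1", "f2", "f3"]        # placeholder names

# Mean 5-fold cross-validation accuracy for each candidate max_depth.
scores = {}
for depth in range(1, 11):
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0)
    scores[depth] = cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean()
    print(f"max_depth={depth}: accuracy={scores[depth]:.3f}")

# Refit the best-scoring depth on all of the data.
best_depth = max(scores, key=scores.get)
best_tree = DecisionTreeClassifier(max_depth=best_depth, random_state=0).fit(X, y)

# Export the tree as DOT text; in a notebook, graphviz.Source(dot_source)
# renders it, provided the graphviz package and binaries are installed.
dot_source = export_graphviz(
    best_tree,
    out_file=None,
    feature_names=feature_names,
    class_names=["fail", "pass"],
    filled=True,
)

# Predict a made-up row that is not part of the data.
new_row = np.array([[0.3, -1.2, 0.0, 2.1]])
prediction = best_tree.predict(new_row)[0]
print("best max_depth:", best_depth, "| predicted Pass:", prediction)
```

Fixing random_state keeps the scores reproducible between runs, which makes the depth comparison easier to discuss in the write-up.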

Grading rubric

  • 5 pts: All deliverables present.
  • 4 pts: All deliverables present except the search for the best tree across several max-tree-depths.
  • 3 pts: Only the code and tree diagram are present.
  • 2 pts: Inappropriate dataset cleaning and processing, but decision tree code is correct.
  • 0-1 pts: Inappropriate dataset cleaning and processing and incorrect decision tree code.

CSCI 431 material by Joshua Eckroth is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.