A08: NFL play-by-play

This assignment helps you practice classification workflows. Your task is to train a classifier that best predicts whether an NFL play will be successful. We will use the NFL play-by-play dataset from NFLsavant.com.


The NFL play-by-play data includes the following details for each play:

There are about 140,000 plays recorded across three seasons (2013-09-05 to 2016-01-03). You are only required to work with the 2013 data.

I also downloaded weather data (from NFLsavant) and passing statistics for each team. In addition, I collected team name abbreviations, which are necessary to connect the play-by-play data with these other files. All files may be found on londo in /home/jeckroth/csci431/nfldata.


Decide whether you will predict successful passes or rushes or both (or something else entirely). Process the data appropriately, possibly merging in data about the weather and passing statistics. Find a good classifier and report its accuracy.
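As a rough illustration of the merging step, here is a minimal Python sketch that attaches a weather value to each play by way of a team abbreviation table. Every field name (game_date, offense, defense, home_team, temperature) and every value below is invented for illustration; the real files may be laid out differently, so treat this as a starting point under those assumptions rather than as the expected solution.

```python
def merge_weather(plays, weather, abbreviations):
    """Attach the game's temperature to each play, or None when no weather row matches.

    All field names here are hypothetical; adapt them to the real files.
    """
    # Index weather reports by (date, home team abbreviation).
    weather_by_game = {}
    for report in weather:
        abbr = abbreviations[report["home_team"]]
        weather_by_game[(report["date"], abbr)] = report

    merged = []
    for play in plays:
        date = play["game_date"]
        # The home team may be on offense or on defense for a given play.
        report = (weather_by_game.get((date, play["offense"]))
                  or weather_by_game.get((date, play["defense"])))
        row = dict(play)  # copy so the original play is left untouched
        row["temperature"] = report["temperature"] if report else None
        merged.append(row)
    return merged


# Toy inputs with made-up values.
abbreviations = {"Denver Broncos": "DEN"}
weather = [{"date": "2013-09-05", "home_team": "Denver Broncos", "temperature": 80}]
plays = [
    {"game_date": "2013-09-05", "offense": "BAL", "defense": "DEN", "play_type": "PASS"},
    {"game_date": "2013-09-05", "offense": "DEN", "defense": "BAL", "play_type": "RUSH"},
    {"game_date": "2013-09-08", "offense": "NE", "defense": "BUF", "play_type": "PASS"},
]
merged = merge_weather(plays, weather, abbreviations)
```

Plays with no matching weather row get None instead of being dropped, so you can decide later whether to discard them or treat the value as missing.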

Be sure to balance the successful/unsuccessful outcomes so that the ZeroR classifier gives 50% accuracy. This can be done in Weka by using the Resample filter in the Preprocess tab (filter menu: supervised > instance > Resample). Use the values biasToUniformClass = 1.0, noReplacement = True, sampleSizePercent = 80.
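If you prepare the data outside Weka, the same idea can be sketched in a few lines of Python. The sketch below undersamples every class to the size of the rarest class, without replacement, and then checks the majority-class (ZeroR-style) baseline. It is a rough analogue, not a reimplementation of Weka's Resample filter, and the field names (success, yards_to_go) and toy data are assumptions made for illustration.

```python
import random
from collections import Counter


def balance_classes(rows, label_key, seed=0):
    """Undersample every class to the size of the rarest class (no replacement)."""
    rng = random.Random(seed)
    by_label = {}
    for row in rows:
        by_label.setdefault(row[label_key], []).append(row)
    n = min(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, n))
    rng.shuffle(balanced)
    return balanced


def zero_r_accuracy(rows, label_key):
    """Accuracy obtained by always predicting the most common class."""
    counts = Counter(row[label_key] for row in rows)
    return counts.most_common(1)[0][1] / len(rows)


# Toy data with hypothetical field names: 70 successful and 30 unsuccessful plays.
plays = [{"yards_to_go": i % 10 + 1, "success": i < 70} for i in range(100)]
balanced = balance_classes(plays, "success")

print(zero_r_accuracy(plays, "success"))
print(zero_r_accuracy(balanced, "success"))
```

On the balanced data, always guessing one class is right only half the time, which is why any reported accuracy should be compared against 50%.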


Submit each of the following:

Grading rubric

Grading of this assignment will be somewhat subjective. Having worked with the NFL dataset for a significant amount of time, I’m aware of some easy mistakes that can be made in data processing and feature engineering. I’ll be looking for evidence of these mistakes and for other signs of poor methodology.

To earn full credit, data must be processed correctly and the classifiers must be evaluated correctly. The best classifier must have an accuracy > 60% on balanced classes (if predicting passes; I have not attempted predicting successful rushes). The methodology must be sophisticated, correct, and well-documented.

CSCI 431 material by Joshua Eckroth is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. Source code for this website available at GitHub.