A07: Detecting Spam Comments Part Deux
Using this dataset, build a neural network classifier that detects spam YouTube comments with at least 95% accuracy. Use all the data files, which total 1956 rows, and use Keras to build and train the network.
You are also required to demonstrate a few extra features:
- Use 5-fold cross-validation (80/20 train/test splits): train and test the model on each split, then print the average accuracy across the folds. Consider using StratifiedKFold from scikit-learn to generate the splits.
- Create two Python files: one that trains a model on all the data and saves it, and a second that loads the saved model, interactively asks the user for a new comment, and tells the user whether that comment is spam.
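The cross-validation requirement above could be structured roughly as follows. This is a minimal sketch, not a reference solution: the names `cross_validate` and `majority_baseline` and the toy arrays are assumptions made for illustration, and the majority-class scorer is only a stand-in for the code that would build, fit, and evaluate a Keras model on each fold.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold


def cross_validate(X, y, fit_and_score, n_splits=5, seed=0):
    """Return (mean accuracy, per-fold accuracies) over stratified folds."""
    X, y = np.asarray(X), np.asarray(y)
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, test_idx in skf.split(X, y):
        # With 5 folds, each iteration trains on ~80% and tests on ~20%.
        score = fit_and_score(X[train_idx], y[train_idx], X[test_idx], y[test_idx])
        scores.append(score)
    return float(np.mean(scores)), scores


def majority_baseline(X_train, y_train, X_test, y_test):
    """Stand-in scorer (assumption): always predicts the most common training label.

    In train.py this would instead build a fresh Keras model, fit it on the
    training fold, and return its accuracy on the test fold.
    """
    majority = np.bincount(y_train).argmax()
    return float(np.mean(y_test == majority))


# Toy data standing in for the vectorized comments: 60 non-spam (0), 40 spam (1).
X_demo = np.arange(100).reshape(-1, 1)
y_demo = np.array([0] * 60 + [1] * 40)

mean_acc, fold_accs = cross_validate(X_demo, y_demo, majority_baseline)
print(f"Average accuracy over {len(fold_accs)} folds: {mean_acc:.3f}")
```

Passing the scorer in as a function keeps the fold loop separate from the model code, and building a new model inside the scorer on every call avoids carrying trained weights from one fold into the next.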
Name the two files train.py (which performs cross-validation, then trains a model on all the data and saves it) and run.py (which loads the saved model and asks for user input).
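The interactive part of run.py might look something like the sketch below. The function names and the `keyword_stub` predictor are hypothetical: the stub only stands in for a function that would apply the same text preprocessing used in train.py and then call the loaded Keras model (for example, one restored with `keras.models.load_model`). The 0.5 threshold is likewise an assumption.

```python
def classify_comment(comment, predict_spam_probability, threshold=0.5):
    """Return 'spam' or 'not spam' for a single comment string."""
    probability = predict_spam_probability(comment)
    return "spam" if probability >= threshold else "not spam"


def run_interactive(predict_spam_probability, read_line=input, write_line=print):
    """Prompt for comments until the user submits a blank line."""
    while True:
        comment = read_line("Enter a comment (blank to quit): ").strip()
        if not comment:
            break
        label = classify_comment(comment, predict_spam_probability)
        write_line(f"This comment looks like {label}.")


def keyword_stub(comment):
    """Hypothetical stand-in for the loaded model: flags comments containing links."""
    return 0.9 if "http" in comment.lower() else 0.1
```

In run.py, the script would end with a call such as `run_interactive(predict_fn)`, where `predict_fn` wraps the loaded model. Accepting `read_line` and `write_line` as parameters makes it possible to exercise the loop without a terminal.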
- 5 pts: All deliverables present.
- 4 pts: All deliverables present except model saving and reloading and interactive user input, or 95% accuracy was not achieved.
- 3 pts: All deliverables present except the model saving and reloading and cross-validation.
- 2 pts: Inappropriate dataset processing, but most code is correct.
- 0-1 pts: Inappropriate dataset processing and most code is incorrect.