RPPP – Reddit Post Popularity Predictor
A project with two goals:
1. Given a Reddit post, predict how popular it's going to be (what its score will be)
2. Showcase a remote working file system with DVC

Dean b54b8e2185 Added calculation of training metrics, modified metric filenames to relate to stage 1 week ago
.dvc c016be9385 init git + dvc, init remote working file system, added data to the project 3 weeks ago
models
.gitignore b2eb747da1 Finished training of text based classifier. 3 weeks ago
CONTRIBUTING.md d9735a5dd8 Add contributing guide 2 weeks ago
README.md 0aa625a470 Update 'README.md' 2 weeks ago
eval.dvc b54b8e2185 Added calculation of training metrics, modified metric filenames to relate to stage 1 week ago
evaluate.py b54b8e2185 Added calculation of training metrics, modified metric filenames to relate to stage 1 week ago
general_params.yml b2eb747da1 Finished training of text based classifier. 3 weeks ago
make_data.dvc 50aefb4765 Added make dataset stage the runs a query on BigQuery and saves the raw data to the remote working file system. 1 week ago
make_dataset.py 50aefb4765 Added make dataset stage the runs a query on BigQuery and saves the raw data to the remote working file system. 1 week ago
model_def.py b54b8e2185 Added calculation of training metrics, modified metric filenames to relate to stage 1 week ago
model_params.yml b6c2981a67 This completes training for the numerical and categorical base model. 2 weeks ago
params.yml b6c2981a67 This completes training for the numerical and categorical base model. 2 weeks ago
preprocess.py 50aefb4765 Added make dataset stage the runs a query on BigQuery and saves the raw data to the remote working file system. 1 week ago
preprocessing.dvc 50aefb4765 Added make dataset stage the runs a query on BigQuery and saves the raw data to the remote working file system. 1 week ago
reddit_utils.py 50aefb4765 Added make dataset stage the runs a query on BigQuery and saves the raw data to the remote working file system. 1 week ago
remote-wfs-setup.md cac18be035 Add 'remote-wfs-setup.md' 2 weeks ago
requirements.txt e9d1904b2e Added black formatting 3 weeks ago
test_metrics.csv b54b8e2185 Added calculation of training metrics, modified metric filenames to relate to stage 1 week ago
training.dvc b54b8e2185 Added calculation of training metrics, modified metric filenames to relate to stage 1 week ago
training.py b54b8e2185 Added calculation of training metrics, modified metric filenames to relate to stage 1 week ago
training_metrics.csv b54b8e2185 Added calculation of training metrics, modified metric filenames to relate to stage 1 week ago

Data Pipeline (diagram not shown). Legend: DVC managed file, Git managed file, metric, stage file, external file.

README.md

This project attempts to predict whether a Reddit submission will be popular, based on its features.

We currently provide models for r/MachineLearning only, based on the submission title and body.
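As a rough illustration of the setup described above, the sketch below turns a submission into a (text, label) pair for a binary "popular or not" classifier. It is not taken from the project's code: the `Submission` class, the `to_training_example` helper, and the `POPULARITY_THRESHOLD` value are all hypothetical, and the README does not state which score cutoff the project actually uses.

```python
from dataclasses import dataclass

# Hypothetical cutoff: the README does not say what score counts as "popular".
POPULARITY_THRESHOLD = 10


@dataclass
class Submission:
    """Minimal stand-in for a Reddit submission (illustrative fields only)."""
    title: str
    body: str
    score: int


def to_training_example(post, threshold=POPULARITY_THRESHOLD):
    """Return (text, label), where label is 1 if the post counts as popular."""
    # Join title and body into one text field for a text-based classifier.
    text = f"{post.title}\n{post.body}".strip()
    label = 1 if post.score >= threshold else 0
    return text, label


posts = [
    Submission("[D] Why do transformers scale so well?", "Curious about this.", 250),
    Submission("Help with my homework", "", 1),
]
examples = [to_training_example(p) for p in posts]
```

Keeping the threshold as a parameter makes it easy to experiment with different definitions of "popular" without touching the rest of the pipeline.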

DVC Remote Working File System

This project is also an exploration of the DVC remote WFS workflow. To set up your remote WFS, read the Remote WFS Setup guide (remote-wfs-setup.md).

Contributing

Contributions are very welcome!

Read the Contribution Guide (CONTRIBUTING.md) for more information.

Ideas to work on:

  • Combine the textual and numerical classifiers into one model!
  • Add UI to test if your post is going to be successful!
  • Add MOAR data! (other subreddits, more from r/ML)
  • Improve model performance (there is a lotttt to improve)!
  • Add memes: Add MOAR MEMES
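
For the first idea in the list, one simple starting point (only a sketch, not the project's method) is late fusion: take the "popular" probability from each existing classifier and average them with a weight. The function names, the `text_weight` parameter, and the `cutoff` value below are hypothetical, and the inputs are assumed to be probabilities in [0, 1] produced by the two models.

```python
def combine_probabilities(p_text, p_numeric, text_weight=0.5):
    """Weighted average of two 'popular' probabilities (late fusion)."""
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must be between 0 and 1")
    return text_weight * p_text + (1.0 - text_weight) * p_numeric


def predict_popular(p_text, p_numeric, text_weight=0.5, cutoff=0.5):
    """Return True if the combined probability reaches the cutoff."""
    return combine_probabilities(p_text, p_numeric, text_weight) >= cutoff
```

A single model trained on both feature types at once would likely do better, but averaging the two outputs gives a quick baseline to compare against.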