RPPP – Reddit Post Popularity Predictor
A project with two goals:
1. Given a Reddit post, predict how popular it will be (what its score will be)
2. Showcase a remote working file system with DVC

Data Pipeline

[Figure: data pipeline graph. Legend: DVC Managed File, Git Managed File, Metric, Stage File, External File]

This project attempts to predict whether a Reddit submission will be popular, based on its features.

We currently provide models for r/MachineLearning only, based on submission title and body.
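
As a rough illustration of the setup described above, the sketch below turns a submission into a single text input (title plus body) and a binary popularity label. The field names (`title`, `selftext`, `score`), the `POPULARITY_THRESHOLD` value, and the helper functions are assumptions made for this example, not the project's actual code.

```python
from dataclasses import dataclass

# Hypothetical cutoff: a submission scoring at least this much counts as "popular".
POPULARITY_THRESHOLD = 50


@dataclass
class Submission:
    title: str
    selftext: str  # post body; may be empty
    score: int


def build_text(post: Submission) -> str:
    """Join title and body into one lowercase string for a text model."""
    return f"{post.title} {post.selftext}".strip().lower()


def is_popular(post: Submission, threshold: int = POPULARITY_THRESHOLD) -> int:
    """Return 1 if the post's score reaches the threshold, else 0."""
    return int(post.score >= threshold)


# Made-up example posts, only to show the shape of the resulting dataset.
posts = [
    Submission("[D] Why does my model overfit?", "Details inside.", 120),
    Submission("[P] Tiny side project", "", 3),
]
dataset = [(build_text(p), is_popular(p)) for p in posts]
```

Each entry of `dataset` is a `(text, label)` pair that a text classifier could be trained on. The threshold is the main design choice here and would need to be tuned to the score distribution of the subreddit.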

DVC Remote Working File System

This project is also an exploration of the DVC remote WFS workflow. To set up your remote WFS, read the Remote WFS Setup guide (remote-wfs-setup.md).

Contributing

Contributions Are Very Welcome!

Read the Contribution Guide for more information.

Ideas to work on:

  • Combine the textual and numerical classifiers into one model!
  • Add UI to test if your post is going to be successful!
  • Add MOAR data! (other subreddits, more from r/ML)
  • Improve model performance (there is a lotttt to improve)!
  • Add memes: Add MOAR MEMES
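
For anyone picking up the first idea, one simple starting point might be late fusion: keep the two classifiers separate and average their predicted probabilities. The sketch below assumes each model outputs a probability that a post is popular. The function names, the `text_weight` parameter, and the 0.5 cutoff are hypothetical and not part of the existing code.

```python
def combine_probabilities(p_text: float, p_numeric: float, text_weight: float = 0.5) -> float:
    """Weighted average of two popularity probabilities (late fusion)."""
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must be between 0 and 1")
    return text_weight * p_text + (1.0 - text_weight) * p_numeric


def predict_popular(p_text: float, p_numeric: float, cutoff: float = 0.5) -> bool:
    """Call a post popular when the combined probability reaches the cutoff."""
    return combine_probabilities(p_text, p_numeric) >= cutoff
```

A single model trained on both feature sets may well do better. Averaging is just the cheapest baseline to compare against.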