RPPP – Reddit Post Popularity Predictor
A project with two goals:
1. Given a Reddit post, predict how popular it will be (what its score will be).
2. Showcase a remote working file system with DVC.

Dean e18a705887 Fixed another issue where because of pandas' infer type some titles and bodies were considered floats instead of strings 3 weeks ago
.dvc c016be9385 init git + dvc, init remote working file system, added data to the project 3 weeks ago
.gitignore e9d1904b2e Added black formatting 3 weeks ago
general_params.yaml 1e908bb64b Updated preprocessing step: 3 weeks ago
preprocess.py e18a705887 Fixed another issue where because of pandas' infer type some titles and bodies were considered floats instead of strings 3 weeks ago
preprocessing.dvc e18a705887 Fixed another issue where because of pandas' infer type some titles and bodies were considered floats instead of strings 3 weeks ago
rML-raw-data.csv.dvc cf7012e4ce Changed data - removed null columns, added top decile and top percent columns 3 weeks ago
requirements.txt e9d1904b2e Added black formatting 3 weeks ago
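
The commit message on preprocess.py mentions that pandas' type inference treated some titles and bodies as floats instead of strings. A likely cause is that empty text fields are read as NaN, which is a float. The sketch below shows one common way to guard against that; it is not necessarily what preprocess.py does. The column names `title` and `selftext`, the sample CSV, and the helper `load_posts` are assumptions made for illustration.

```python
import io

import pandas as pd

# Hypothetical sample data; the real rML-raw-data.csv may use different columns.
RAW_CSV = (
    "title,selftext,score\n"
    "First post,Some body text,10\n"
    "Link-only post,,3\n"
)


def load_posts(source, text_columns=("title", "selftext")):
    """Read posts and make sure every text column holds only strings."""
    df = pd.read_csv(source)
    for col in text_columns:
        # Empty fields are parsed as NaN (a float); replace them with "".
        df[col] = df[col].fillna("").astype(str)
    return df


# Without the guard, the empty body of the second post is a missing value.
naive = pd.read_csv(io.StringIO(RAW_CSV))

# With the guard, every title and body is a plain string.
fixed = load_posts(io.StringIO(RAW_CSV))
```

Filling missing text with an empty string keeps later string operations (lowercasing, tokenizing, length features) from failing on float values.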

Data Pipeline

[Figure: data pipeline graph. Legend: DVC Managed File, Git Managed File, Metric, Stage File, External File]