
lorendery 7deae53b8f new model metrics 2 months ago
.dvc 3a44a79eee Adding files from Google Drive to the project 2 months ago
.idea cf89c2cedf update data 2 months ago
__MACOSX 3a44a79eee Adding files from Google Drive to the project 2 months ago
code 7deae53b8f new model metrics 2 months ago
data
src 3a44a79eee Adding files from Google Drive to the project 2 months ago
.dvcignore 21831337f5 Add data dir to dvc tracking 2 months ago
.gitignore 21831337f5 Add data dir to dvc tracking 2 months ago
README.md 3a44a79eee Adding files from Google Drive to the project 2 months ago
data.dvc cf89c2cedf update data 2 months ago
requirements.txt 3a44a79eee Adding files from Google Drive to the project 2 months ago

README.md

First Repo Project

This project is a simple 'Ham or Spam' classifier for emails, built on the Enron dataset. It contains two Python code files, five data files, and one constants file.

  • code directory - holds the data-preprocessing and modeling files:
    • data-preprocessing.py - processes the raw data (enron.csv), splits it into train and test sets, and saves them to the data directory.
    • modeling.py - a simple Random Forest classifier.
  • data directory - contains the raw and processed data.
  • src directory - contains the constants file.
  • requirements.txt - the Python dependencies required to run the code files.
  • README.md - this file.
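The preprocessing step described above (read enron.csv, split it into train and test sets, save both to the data directory) can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions, not the project's data-preprocessing.py: the constant names, the output files train.csv and test.csv, the 80/20 ratio, the seed, and the helpers `split_rows` and `preprocess` are all hypothetical, and any text cleaning the real script performs is left out because this README does not describe it.

```python
import csv
import random
from pathlib import Path

# Hypothetical constants; the project keeps its real values in the src constants file.
DATA_DIR = Path("data")
RAW_FILE = DATA_DIR / "enron.csv"
TRAIN_FILE = DATA_DIR / "train.csv"  # assumed output name
TEST_FILE = DATA_DIR / "test.csv"    # assumed output name
TEST_SIZE = 0.2                      # assumed 80/20 split
SEED = 42                            # assumed seed


def split_rows(rows, test_size=TEST_SIZE, seed=SEED):
    """Shuffle rows with a fixed seed and return (train, test)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # local RNG, so global random state is untouched
    n_test = int(round(len(rows) * test_size))
    return rows[n_test:], rows[:n_test]


def preprocess(raw_file=RAW_FILE, train_file=TRAIN_FILE, test_file=TEST_FILE):
    """Read the raw CSV, split it, and write both parts with the original header."""
    with open(raw_file, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
        fieldnames = reader.fieldnames

    train, test = split_rows(rows)
    for path, part in ((train_file, train), (test_file, test)):
        Path(path).parent.mkdir(parents=True, exist_ok=True)
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(part)
    return len(train), len(test)
```

Run from the project root, `preprocess()` would read `data/enron.csv` and write the two splits next to it. The fixed seed keeps the split reproducible between runs, which helps when the data directory is tracked with DVC, since an unchanged input then yields unchanged outputs.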