Are you sure you want to delete this access key?
Legend |
---|
DVC Managed File |
Git Managed File |
Metric |
Stage File |
External File |
Legend |
---|
DVC Managed File |
Git Managed File |
Metric |
Stage File |
External File |
Idea is to build multiple model and ensemble them to create a stable strong model.
Your final pipeline structure will look like this in the end:
+-------------------+
| data/iris.csv.dvc |
+-------------------+
*
*
*
+-----------+
| split.dvc |
+-----------+
*
*
*
+---------------+
****| featurize.dvc |****
******** +---------------+ ********
******** ** *** *********
******** *** ** ********
***** ** ** ********
+--------------------+ +---------------+ +-------------------+ *****
| train_logistic.dvc |** | train_svc.dvc | | train_forrest.dvc | ********
+--------------------+ ******** +---------------+ +-------------------+ ********
******** ** *** *********
******** *** ** ********
***** ** ** *****
+--------------------+
| train_ensemble.dvc |
+--------------------+
Order is data -> train_test_split -> feature_extraction -> 3 models -> ensemble_model
and Done!!
With data/iris.csv
versioned in DVC you can start with your first pipeline.
Define Stage1
=> split.dvc
i.e train_test_split.
dvc run -n split\
-d data/iris.csv\
-d src/train_test_split.py\
-o data/split\
python src/train_test_split.py -i "data/iris.csv" -o "data/split/"
Define Stage2
=> featurize.dvc
i.e Feature Engineering.
dvc run -n featurize\
-d data/split\
-d src/feature_engineering.py\
-p pca\
-o data/features\
-o data/models/pca/model.gz\
-M data/models/pca/metrics.csv\
python src/feature_engineering.py -i "data/split/" -o "data/features/" -o "data/models/pca/"
Define Stage3.a
=> train_logistic.dvc
i.e Fit Logistic Regression Model.
dvc run -n train_logistic\
-d src/logistic_regression.py\
-d data/features\
-p logistic\
-o data/models/logistic/model.gz\
-M data/models/logistic/metrics.csv\
python src/logistic_regression.py -i "data/features/" -o "data/models/logistic/"
Define Stage3.b
=> train_svc.dvc
i.e Fit Linear SVC Model.
dvc run -n train_svc\
-d src/linear_svc.py\
-d data/features\
-p svc\
-o data/models/svc/model.gz\
-M data/models/svc/metrics.csv\
python src/linear_svc.py -i "data/features/" -o "data/models/svc/"
Define Stage3.c
=> train_forrest.dvc
i.e Fit Random Forrest Model.
dvc run -n train_forrest\
-d src/random_forrest.py\
-d data/features\
-p forrest\
-o data/models/r_forrest/model.gz\
-M data/models/r_forrest/metrics.csv\
python src/random_forrest.py -i "data/features/" -o "data/models/r_forrest/"
Define Stage4
=> train_ensemble.dvc
i.e Create an Ensemble Model.
dvc run -n train_ensemble\
-d src/ensemble.py\
-d data/features\
-d data/models/logistic/model.gz\
-d data/models/svc/model.gz\
-d data/models/r_forrest/model.gz\
-p ensemble\
-o data/models/ensemble/model.gz\
-M data/models/ensemble/metrics.csv\
python src/ensemble.py -i "data/features/" -m "data/models/" -o "data/models/ensemble/"
Press p or to see the previous file or, n or to see the next file
Are you sure you want to delete this access key?
Are you sure you want to delete this access key?