jnirschl/titanic_dvc

Message	Author	SHA1	Date
Update dvc files	Jeff Nirschl	4c4735258f	3 years ago
Re-do DVC stage predict_output. I had forgotten to add models/estimator.pkl as a dependency for the stage. Running successfully now.	Jeff Nirschl	186090e400	3 years ago
Add DVC stage predict_output. Current best submission with RF model + hyperopt and predicting proba with JS estimator gives a test-set accuracy of 0.77033, which is not too far from the cross-validation accuracy.	Jeff Nirschl	1a0beb71bf	3 years ago
Re-configure stage train_model to put metrics.json under Git version control instead of putting it in the dvc cache	Jeff Nirschl	63bd95a666	3 years ago
Update params with best params from param_tuning.py	Jeff Nirschl	9c452cc081	3 years ago
Update script train_model.py to accept training dataset and pre-defined cross-validation splits, train an SK-Learn estimator (currently RF), and output a pickle file of the CV estimator and performance metrics. Also, built a custom function to compute the geometric mean of precision and recall. Re-configured params.yaml to have a base variable "classifier" that defines the current classifier in use and a separate dict "model_params" that includes sub-dicts for the various model-specific parameters. DVC running successfully.	Jeff Nirschl	9557af78d7	3 years ago
Correct DVC DAG for stage normalize_data. The script normalize.py accepts the featurized data, not the nan-imputed data. Successfully re-ran stages normalize_data and split_train_dev.	Jeff Nirschl	b4a7fefdb1	3 years ago
Add placeholder script build_features.py to allow feature engineering (currently just saves a copy of the input dataframe as "_featurized.csv"). Add DVC stage build_features (feature engineering) prior to feature normalization. Run DVC stages build_features, normalize_data, and split_train_dev with all stages working. Update README.md to include feature engineering stage.	Jeff Nirschl	49cd29a4db	3 years ago
Move script normalize_data.py to src/features and rename to normalize.py. Remove DVC stages normalize_data and split_train_dev in order to add feature engineering stage prior to data normalization.	Jeff Nirschl	cb1e554593	3 years ago
Add script to split training data into train/dev sets using stratified K-fold cross validation. Save indices for train/dev splits as CSV.	Jeff Nirschl	7ca1feda42	3 years ago
Add script normalize_data.py to optionally normalize data. Add stage to DVC and run pipeline.	Jeff Nirschl	fa936bf47b	3 years ago
Refactor out duplicated code to create helper function "load_data" in data.__init__.py. Remove unused variables from replace_nan.py. Update DVC stages.	Jeff Nirschl	74be384aa7	3 years ago
Add script to replace missing age values using mean imputation. Added 3rd stage of DVC pipeline "impute_nan"	Jeff Nirschl	1e1537214a	3 years ago
Refactor encode_labels.py to read dtypes from params.yaml.	Jeff Nirschl	a76b329171	3 years ago
Add stage 1 = make dataset	Jeff Nirschl	eba57940da	3 years ago
Run first stage of updated dvc pipeline: download_data	Jeff Nirschl	d7097646e9	3 years ago
Deleting previous DVC pipeline to create new pipeline	Jeff Nirschl	581186c153	3 years ago
Re-configure Stage train_model to send outputs to results directory	Jeff Nirschl	6cef1f0386	3 years ago
Update train_model pipeline	Jeff Nirschl	ac11fd43da	3 years ago
Add script to train RandomForest model and create DVC stage	Jeff Nirschl	1699b9697a	3 years ago
Update make dataset to encode categorical variables, optionally remove nan from training, and save categorized data as well as yaml encoding categorical classes	Jeff Nirschl	8350d592c2	3 years ago
Adding DVC Stage 1: Prepare dataset	Jeff Nirschl	0026abd3b4	3 years ago
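
Commit 9557af78d7 mentions a custom function that computes the geometric mean of precision and recall. The log does not show the repository's actual implementation, so the following is only a minimal sketch of the metric. It assumes binary labels, and the function name `precision_recall_gmean` and its `positive` parameter are hypothetical.

```python
import math


def precision_recall_gmean(y_true, y_pred, positive=1):
    """Geometric mean of precision and recall for one positive class.

    Hypothetical helper for illustration; not taken from the repository.
    Returns 0.0 when precision or recall is undefined (zero denominator).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return math.sqrt(precision * recall)


if __name__ == "__main__":
    # Toy labels, invented for this example only.
    y_true = [1, 1, 1, 0, 0]
    y_pred = [1, 1, 0, 1, 0]
    print(precision_recall_gmean(y_true, y_pred))
```

Unlike accuracy, this score drops to zero whenever the model never predicts the positive class, which may be why such a metric was tracked alongside cross-validation accuracy.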

jnirschl / titanic_dvc mirror of https://github.com/jnirschl/titanic_dvc.git