jnirschl/titanic_dvc

Message	Author	SHA1	Date
Re-run RF model predictions without the James-Stein estimator shrinkage	Jeff Nirschl	50f3aaf3e6	3 years ago
Update dvc files	Jeff Nirschl	4c4735258f	3 years ago
Add default XGBoostClassifier. Re-run pipeline	Jeff Nirschl	17d9608490	3 years ago
Re-do DVC stage predict_output. I had forgotten to add models/estimator.pkl as a dependency for the stage. Running successfully now.	Jeff Nirschl	186090e400	3 years ago
Add DVC stage predict_output. Current best submission with RF model + hyperopt and predicting proba with JS estimator gives a test-set accuracy of 0.77033, which is not too far from the cross-validation accuracy.	Jeff Nirschl	1a0beb71bf	3 years ago
Create function to compute James-Stein estimator.	Jeff Nirschl	1114efcf4e	3 years ago
Update DVC.lock to include mean imputation method	Jeff Nirschl	43a2938115	3 years ago
Update params.yaml	Jeff Nirschl	546471bd26	3 years ago
Update replace_nan.py to impute missing Fare values. There were no nan in the train dataset but the test dataset was missing some fare values	Jeff Nirschl	75787c26fb	3 years ago
Fix output saving in train_model.py. Previous version only saved the initialized RF model created before training. Now saves a pickle object with a list of the trained estimators	Jeff Nirschl	5d7ee74e95	3 years ago
Update params.yaml to include imputation method	Jeff Nirschl	fb261a5424	3 years ago
Merge branch 'master' of github.com:jnirschl/titanic_dvc	Jeff Nirschl	292c1611b1	3 years ago
Re-configure stage train_model to put metrics.json under Git version control instead of putting it in the dvc cache	Jeff Nirschl	63bd95a666	3 years ago
Merge pull request #4 from jnirschl/sourcery/dev_pipeline	jnirschl	f51e471e1a	3 years ago
'Refactored by Sourcery'	Sourcery AI	ea6ed28717	3 years ago
Merge pull request #2 from jnirschl/dev_pipeline	jnirschl	24e4ad087b	3 years ago
Run stage train_model with RF best params from hyperopt	Jeff Nirschl	86726e14cb	3 years ago
Update params with best params from param_tuning.py	Jeff Nirschl	9c452cc081	3 years ago
Remove param_tuning stage. Run DVC repro	Jeff Nirschl	cbb3da568b	3 years ago
Run DVC repro, same results with base RF model	Jeff Nirschl	d3ca6e8055	3 years ago
Update function param_tuning.py to perform hyperparameter optimization using hyperopt. Minor update to formatting and var split_train_generator in train_model.py. Run DVC repro.	Jeff Nirschl	1a4a171249	3 years ago
Add function to save params.yaml after converting dict values "None" to "null" to ensure correct YAML reading of NoneType objects.	Jeff Nirschl	ff2fbb064f	3 years ago
Add bash example code for stage train_model to README.md. Add indent to other sections of bash example code.	Jeff Nirschl	968ff7e2f0	3 years ago
Run DVC repro successfully	Jeff Nirschl	32e82153d5	3 years ago
Update script train_model.py to accept training dataset and pre-defined cross-validation splits, train an SK-Learn estimator (currently RF), and output a pickle file of the CV estimator and performance metrics. Also, built a custom function to compute the geometric mean of precision and recall. Re-configured params.yaml to have a base variable "classifier" that defines the current classifier in use and a separate dict "model_params" that includes sub-dicts for the various model-specific parameters. DVC running successfully.	Jeff Nirschl	9557af78d7	3 years ago
Correct DVC DAG for stage normalize_data. The function normalize.py accepts the featurized data, not the nan-imputed data. Successfully re-ran stages normalize_data and split_train_dev.	Jeff Nirschl	b4a7fefdb1	3 years ago
Set default remote to origin@dagshub	Jeff Nirschl	f87c0e6d6d	3 years ago
Add placeholder script build_features.py to allow feature engineering (currently just saves a copy of the input dataframe as "_featurized.csv"). Add DVC stage build_features (feature engineering) prior to feature normalization. Run DVC stages build_features, normalize_data, and split_train_dev with all stages working. Update README.md to include feature engineering stage.	Jeff Nirschl	49cd29a4db	3 years ago
Move script normalize_data.py to src/features and rename to normalize.py. Remove DVC stages normalize_data and split_train_dev in order to add feature engineering stage prior to data normalization.	Jeff Nirschl	cb1e554593	3 years ago
Add script to split training data into train/dev sets using stratified K-fold cross validation. Save indices for train/dev splits as CSV.	Jeff Nirschl	7ca1feda42	3 years ago
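
Commit 9557af78d7 mentions a custom function that computes the geometric mean of precision and recall. The repository's actual implementation is not shown in this log, so the following is only a minimal sketch of that metric in plain Python. The function name `precision_recall_gmean`, its `positive` parameter, and the sample labels are hypothetical and chosen for illustration.

```python
import math


def precision_recall_gmean(y_true, y_pred, positive=1):
    """Geometric mean of precision and recall for one positive class.

    Hypothetical helper: not taken from the repository.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return math.sqrt(precision * recall)


# Made-up labels: 2 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
score = precision_recall_gmean(y_true, y_pred)
```

Unlike the arithmetic mean, the geometric mean drops to zero whenever either precision or recall is zero, so a model cannot score well by maximizing only one of the two.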

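Commit ff2fbb064f describes converting dict values "None" to "null" before saving params.yaml. One way this could be done, sketched here under assumptions, is to replace the string "None" with Python's `None` before dumping, since YAML writers such as PyYAML serialize `None` as `null`. The function name `normalize_none` and the nested parameter values are hypothetical; only the keys `classifier` and `model_params` come from the commit messages above.

```python
def normalize_none(value):
    """Recursively replace the string "None" with Python None.

    Hypothetical helper: not taken from the repository.
    """
    if isinstance(value, dict):
        return {key: normalize_none(item) for key, item in value.items()}
    if isinstance(value, list):
        return [normalize_none(item) for item in value]
    if value == "None":
        return None
    return value


# Illustrative parameters only; the real params.yaml contents are not shown in this log.
params = {
    "classifier": "random_forest",
    "model_params": {
        "random_forest": {"max_depth": "None", "n_estimators": 100},
    },
}
cleaned = normalize_none(params)
```

The cleaned dict can then be passed to a YAML dumper, and reading the file back yields a NoneType value rather than the four-character string.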

jnirschl / titanic_dvc mirror of https://github.com/jnirschl/titanic_dvc.git
