Commit History
Message Author Date
Re-run RF model predictions without the James-Stein estimator shrinkage   Jeff Nirschl 3 years ago
Update dvc files   Jeff Nirschl 3 years ago
Add default XGBoostClassifier. Re-run pipeline   Jeff Nirschl 3 years ago
Re-do DVC stage predict_output. I had forgotten to add models/estimator.pkl as a dependency for the stage. Running successfully now.   Jeff Nirschl 3 years ago
Add DVC stage predict_output. The current best submission, an RF model tuned with hyperopt whose predicted probabilities are adjusted with the James-Stein (JS) estimator, gives a test-set accuracy of 0.77033, which is not far from the cross-validation accuracy.   Jeff Nirschl 3 years ago
Create function to compute James-Stein estimator.   Jeff Nirschl 3 years ago
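The James-Stein estimator shrinks a set of noisy estimates toward their common mean, and it shrinks harder the more the spread of the estimates looks like noise. The repository's own function is not shown in this log; the following is a minimal standard-library sketch of the positive-part variant, assuming independent estimates that share a known variance. The name `james_stein` and its arguments are hypothetical, and how this commit applied the estimator to predicted probabilities is not stated.

```python
from statistics import fmean


def james_stein(estimates, variance):
    """Positive-part James-Stein estimator, shrinking toward the grand mean.

    Assumes each value in `estimates` is an independent noisy estimate with
    the same known `variance`. Shrinking toward an estimated mean uses up one
    degree of freedom, so at least 4 estimates are required.
    """
    p = len(estimates)
    if p < 4:
        raise ValueError("need at least 4 estimates to shrink toward the mean")
    grand_mean = fmean(estimates)
    deviations = [x - grand_mean for x in estimates]
    ss = sum(d * d for d in deviations)  # squared distance from the grand mean
    if ss == 0.0:
        return [grand_mean] * p
    # Shrinkage factor, clipped at 0 so values never overshoot past the mean.
    shrink = max(0.0, 1.0 - (p - 3) * variance / ss)
    return [grand_mean + shrink * d for d in deviations]
```

For example, `james_stein([0.2, 0.4, 0.6, 0.8], 0.01)` has a squared deviation of 0.2, so the shrinkage factor is 1 - 0.01/0.2 = 0.95 and every value moves 5% of the way toward 0.5.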
Update DVC.lock to include mean imputation method   Jeff Nirschl 3 years ago
Update params.yaml   Jeff Nirschl 3 years ago
Update replace_nan.py to impute missing Fare values. The train dataset had no NaN Fare values, but the test dataset was missing some.   Jeff Nirschl 3 years ago
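Mean imputation of a column like Fare can be sketched as two steps: compute the mean of the observed training values, then fill the gaps in whichever split has them. This standard-library version is only an illustration; the helper names `fit_mean` and `impute` are hypothetical, and whether the repository's replace_nan.py fits the mean on the training set alone is not stated in the log.

```python
import math
from statistics import fmean


def _is_missing(v):
    """Treat both None and float NaN as missing."""
    return v is None or math.isnan(v)


def fit_mean(values):
    """Return the mean of the non-missing values."""
    return fmean(v for v in values if not _is_missing(v))


def impute(values, fill_value):
    """Replace every missing entry with fill_value, leaving the rest untouched."""
    return [fill_value if _is_missing(v) else v for v in values]
```

Usage with made-up fares: `fill = fit_mean(train_fares)` followed by `test_fares = impute(test_fares, fill)`. Fitting the fill value on the training data keeps test-set statistics out of preprocessing, which matters when, as in this commit, only the test set has gaps.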
Fix output saving in train_model.py. Previous version only saved the initialized RF model created before training. Now saves a pickle object with a list of the trained estimators   Jeff Nirschl 3 years ago
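The bug described in this commit, pickling the estimator template instead of the fitted models, goes away if each fitted estimator is appended to a list and the list is pickled after training finishes. A minimal sketch, assuming a scikit-learn style `fit`/`predict` interface; `MajorityClassifier`, `train_folds`, and `save_estimators` are hypothetical stand-ins, not code from train_model.py.

```python
import pickle
from collections import Counter


class MajorityClassifier:
    """Hypothetical stand-in for a scikit-learn estimator: predicts the most common training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


def train_folds(X, y, splits):
    """Fit one fresh estimator per (train_idx, dev_idx) split and return the fitted list."""
    fitted = []
    for train_idx, _dev_idx in splits:
        est = MajorityClassifier()
        est.fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        fitted.append(est)  # keep the trained object, not the untrained template
    return fitted


def save_estimators(estimators, path):
    """Pickle the whole list of fitted estimators to `path`."""
    with open(path, "wb") as fh:
        pickle.dump(estimators, fh)
```

Loading the file with `pickle.load` then returns one fitted model per cross-validation fold, ready for prediction.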
Update params.yaml to include imputation method   Jeff Nirschl 3 years ago
Merge branch 'master' of github.com:jnirschl/titanic_dvc   Jeff Nirschl 3 years ago
Re-configure stage train_model to put metrics.json under Git version control instead of putting it in the dvc cache   Jeff Nirschl 3 years ago
Merge pull request #4 from jnirschl/sourcery/dev_pipeline   jnirschl 3 years ago
'Refactored by Sourcery'   Sourcery AI 3 years ago
Merge pull request #2 from jnirschl/dev_pipeline   jnirschl 3 years ago
Run stage train_model with RF best params from hyperopt   Jeff Nirschl 3 years ago
Update params with best params from param_tuning.py   Jeff Nirschl 3 years ago
Remove param_tuning stage. Run DVC repro   Jeff Nirschl 3 years ago
Run DVC repro, same results with base RF model   Jeff Nirschl 3 years ago
Update function param_tuning.py to perform hyperparameter optimization using hyperopt. Minor update to formatting and var split_train_generator in train_model.py. Run DVC repro.   Jeff Nirschl 3 years ago
Add function to save params.yaml after converting dict values "None" to "null" so that YAML correctly reads them as NoneType objects.   Jeff Nirschl 3 years ago
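In YAML, the bare scalar `None` loads as the string "None", while `null` (or `~`) loads as Python None. One way to get the effect this commit describes is to replace "None" strings with Python None before dumping, because PyYAML's `yaml.safe_dump` writes None as `null`. A hedged sketch; the function name and the example keys below are hypothetical.

```python
def none_strings_to_null(obj):
    """Recursively replace the string "None" with Python None.

    When the result is written with a YAML dumper such as PyYAML's
    yaml.safe_dump, None values are emitted as `null`, which YAML loaders
    read back as None rather than as the string "None".
    """
    if isinstance(obj, dict):
        return {k: none_strings_to_null(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [none_strings_to_null(v) for v in obj]
    if obj == "None":
        return None
    return obj
```

For instance, a hypothetical `{"rf": {"max_depth": "None"}}` becomes `{"rf": {"max_depth": None}}`, which dumps as `max_depth: null`.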
Add bash example code for stage train_model to README.md. Add indentation to other sections of bash example code.   Jeff Nirschl 3 years ago
Run DVC repro successfully   Jeff Nirschl 3 years ago
Update script train_model.py to accept training dataset and pre-defined cross-validation splits, train an SK-Learn estimator (currently RF), and output a pickle file of the CV estimator and performance metrics. Also, built a custom function to compute the geometric mean of precision and recall. Re-configured params.yaml to have a base variable "classifier" that defines the current classifier in use and a separate dict "model_params" that includes sub-dicts for the various model-specific parameters. DVC running successfully.   Jeff Nirschl 3 years ago
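For a binary problem, the geometric mean of precision and recall, sqrt(P * R), is also known as the Fowlkes-Mallows index. A minimal sketch computing it from true positives, false positives, and false negatives; the function name is hypothetical, and the repository's version may differ, for example in how it handles zero denominators.

```python
import math


def precision_recall_gmean(y_true, y_pred, positive=1):
    """Geometric mean of precision and recall, sqrt(P * R), for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Define precision/recall as 0 when their denominator is 0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return math.sqrt(precision * recall)
```

Because it takes `(y_true, y_pred)`, a function like this could be wrapped with scikit-learn's `sklearn.metrics.make_scorer` to serve as a cross-validation score.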
Correct DVC DAG for stage normalize_data. The script normalize.py takes the featurized data as input, not the NaN-imputed data. Successfully re-ran stages normalize_data and split_train_dev.   Jeff Nirschl 3 years ago
Set default remote to origin@dagshub   Jeff Nirschl 3 years ago
Add placeholder script build_features.py to allow feature engineering (currently just saves a copy of the input dataframe as "_featurized.csv"). Add DVC stage build_features (feature engineering) prior to feature normalization. Run DVC stages build_features, normalize_data, and split_train_dev with all stages working. Update README.md to include feature engineering stage.   Jeff Nirschl 3 years ago
Move script normalize_data.py to src/features and rename it to normalize.py. Remove DVC stages normalize_data and split_train_dev in order to add a feature engineering stage prior to data normalization.   Jeff Nirschl 3 years ago
Add script to split training data into train/dev sets using stratified K-fold cross validation. Save indices for train/dev splits as CSV.   Jeff Nirschl 3 years ago
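Stratified K-fold splitting assigns samples to folds so that each fold keeps roughly the same class balance as the full dataset. The commit most likely used scikit-learn's `StratifiedKFold`, but that is an assumption; the sketch below is a simplified, unshuffled stand-in using only the standard library. The function names and the CSV layout (one row per sample with its fold number) are hypothetical.

```python
import csv
import io
from collections import defaultdict


def stratified_kfold_indices(labels, n_splits=5):
    """Return a fold number for each sample, balancing classes across folds.

    Samples of each class are dealt round-robin into folds (no shuffling);
    the round-robin offset carries over between classes so fold sizes stay even.
    """
    if n_splits < 2:
        raise ValueError("n_splits must be at least 2")
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    fold_of = [0] * len(labels)
    offset = 0
    for label in sorted(by_class):
        for j, idx in enumerate(by_class[label]):
            fold_of[idx] = (offset + j) % n_splits
        offset += len(by_class[label])
    return fold_of


def train_dev_split(fold_of, k):
    """Indices for fold k: everything outside the fold trains, the fold is the dev set."""
    train = [i for i, f in enumerate(fold_of) if f != k]
    dev = [i for i, f in enumerate(fold_of) if f == k]
    return train, dev


def write_folds_csv(fold_of, fh):
    """Write one row per sample: its index and its fold number."""
    writer = csv.writer(fh)
    writer.writerow(["index", "fold"])
    for idx, fold in enumerate(fold_of):
        writer.writerow([idx, fold])


if __name__ == "__main__":
    labels = [0] * 6 + [1] * 4  # made-up binary labels
    fold_of = stratified_kfold_indices(labels, n_splits=2)
    buf = io.StringIO()
    write_folds_csv(fold_of, buf)
    print(buf.getvalue())
```

Storing only the fold number per row is enough to rebuild every train/dev pair later with `train_dev_split`, so downstream stages can reuse identical splits.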