Commit History
Message Author SHA1 Date
Update dvc files   Jeff Nirschl 3 years ago
Re-do DVC stage predict_output. I had forgotten to add models/estimator.pkl as a dependency for the stage. Running successfully now.   Jeff Nirschl 3 years ago
Add DVC stage predict_output. Current best submission with RF model + hyperopt and predicting proba with JS estimator gives a test-set accuracy of 0.77033, which is not too far from the cross-validation accuracy.   Jeff Nirschl 3 years ago
Re-configure stage train_model to put metrics.json under Git version control instead of putting it in the dvc cache   Jeff Nirschl 3 years ago
Update params with best params from param_tuning.py   Jeff Nirschl 3 years ago
Update script train_model.py to accept training dataset and pre-defined cross-validation splits, train an SK-Learn estimator (currently RF), and output a pickle file of the CV estimator and performance metrics. Also, built a custom function to compute the geometric mean of precision and recall. Re-configured params.yaml to have a base variable "classifier" that defines the current classifier in use and a separate dict "model_params" that includes sub-dicts for the various model-specific parameters. DVC running successfully.   Jeff Nirschl 3 years ago
Correct DVC DAG for stage normalize_data. The script normalize.py accepts the featurized data, not the nan-imputed data. Successfully re-ran stages normalize_data and split_train_dev.   Jeff Nirschl 3 years ago
Add placeholder script build_features.py to allow feature engineering (currently just saves a copy of the input dataframe as "_featurized.csv"). Add DVC stage build_features (feature engineering) prior to feature normalization. Run DVC stages build_features, normalize_data, and split_train_dev with all stages working. Update README.md to include feature engineering stage.   Jeff Nirschl 3 years ago
Move script normalize_data.py to src/features and rename it to normalize.py. Remove DVC stages normalize_data and split_train_dev in order to add a feature engineering stage prior to data normalization.   Jeff Nirschl 3 years ago
Add script to split training data into train/dev sets using stratified K-fold cross validation. Save indices for train/dev splits as CSV.   Jeff Nirschl 3 years ago
Add script normalize_data.py to optionally normalize the data. Add stage to DVC and run pipeline.   Jeff Nirschl 3 years ago
Refactor out duplicated code to create helper function "load_data" in data.__init__.py. Remove unused variables from replace_nan.py. Update DVC stages.   Jeff Nirschl 3 years ago
Add script to replace missing age values using mean imputation. Added 3rd stage of DVC pipeline "impute_nan"   Jeff Nirschl 3 years ago
Refactor encode_labels.py to read dtypes from params.yaml.   Jeff Nirschl 3 years ago
Add stage 1 = make dataset   Jeff Nirschl 3 years ago
Run first stage of updated dvc pipeline: download_data   Jeff Nirschl 3 years ago
Deleting previous DVC pipeline to create new pipeline   Jeff Nirschl 3 years ago
Re-configure Stage train_model to send outputs to results directory   Jeff Nirschl 3 years ago
update train_model pipeline   Jeff Nirschl 3 years ago
Add script to train RandomForest model and create DVC stage   Jeff Nirschl 3 years ago
Update make dataset to encode categorical variables, optionally remove nan from training, and save categorized data as well as yaml encoding categorical classes   Jeff Nirschl 3 years ago
Adding DVC Stage 1: Prepare dataset   Jeff Nirschl 3 years ago
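
One of the commits above mentions a custom function that computes the geometric mean of precision and recall. The repository's actual code is not shown on this page, so the following is only a minimal sketch of what such a metric might look like. It assumes binary labels, uses a hypothetical function name (precision_recall_gmean), and returns 0.0 when precision or recall is undefined, which is an assumption rather than the author's documented behavior.

```python
import math


def precision_recall_gmean(y_true, y_pred, positive=1):
    """Geometric mean of precision and recall for one positive class.

    Hypothetical helper for illustration; not taken from train_model.py.
    """
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Assumption: treat an undefined ratio (zero denominator) as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0

    return math.sqrt(precision * recall)


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0]
    y_pred = [1, 1, 0, 1, 0]
    print(precision_recall_gmean(y_true, y_pred))
```

Unlike the arithmetic mean, the geometric mean drops to zero whenever either precision or recall is zero, so a model cannot score well by maximizing only one of the two.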