dvc.yaml 1.3 KB

```yaml
stages:
  make_dataset:
    desc: Download data from Kaggle, create data dictionary and summary table
    cmd: python3 src/data/make_dataset.py -c titanic -tr train.csv -te test.csv -o
      ./data/raw
    deps:
    - src/data/make_dataset.py
    params:
    - dtypes
    outs:
    - data/raw/test.csv
    - data/raw/train.csv
    - reports/figures/data_dictionary.tex
    - reports/figures/table_one.tex
  encode_labels:
    desc: Convert categorical labels to integer values and save mapping
    cmd: python3 src/data/encode_labels.py -tr data/raw/train.csv -te data/raw/test.csv
      -o data/interim
    deps:
    - data/raw/test.csv
    - data/raw/train.csv
    - src/data/encode_labels.py
    params:
    - dtypes
    outs:
    - data/interim/label_encoding.yaml
    - data/interim/test_categorized.csv
    - data/interim/train_categorized.csv
  impute_nan:
    desc: Replace missing age values with the mean age from the training dataset.
    cmd: python3 src/data/replace_nan.py -tr data/interim/train_categorized.csv -te
      data/interim/test_categorized.csv -o data/interim
    deps:
    - data/interim/test_categorized.csv
    - data/interim/train_categorized.csv
    - src/data/replace_nan.py
    params:
    - imputation
    outs:
    - data/interim/test_nan_imputed.csv
    - data/interim/train_nan_imputed.csv
```
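The stages in this pipeline are chained through files. Each stage's outputs appear as dependencies of the next stage, which is how DVC links stages into a graph and decides run order.

The sketch below is a minimal stdlib illustration of that idea. It is not DVC's implementation. The `STAGES` dict is hand-copied from the file above, and the function name `stage_order` is hypothetical. It uses `graphlib`, which requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Hand-copied deps/outs from dvc.yaml. Illustration only: this mimics how
# stages are linked (an out of one stage used as a dep of another), not DVC code.
STAGES = {
    "make_dataset": {
        "deps": ["src/data/make_dataset.py"],
        "outs": ["data/raw/test.csv", "data/raw/train.csv",
                 "reports/figures/data_dictionary.tex",
                 "reports/figures/table_one.tex"],
    },
    "encode_labels": {
        "deps": ["data/raw/test.csv", "data/raw/train.csv",
                 "src/data/encode_labels.py"],
        "outs": ["data/interim/label_encoding.yaml",
                 "data/interim/test_categorized.csv",
                 "data/interim/train_categorized.csv"],
    },
    "impute_nan": {
        "deps": ["data/interim/test_categorized.csv",
                 "data/interim/train_categorized.csv",
                 "src/data/replace_nan.py"],
        "outs": ["data/interim/test_nan_imputed.csv",
                 "data/interim/train_nan_imputed.csv"],
    },
}


def stage_order(stages: dict) -> list[str]:
    """Order stages so each one runs after the stages that produce its deps."""
    # Map every output path to the stage that writes it.
    producer = {out: name for name, s in stages.items() for out in s["outs"]}
    # Predecessors of a stage = producers of its deps. Source scripts have no
    # producer, so they add no edges.
    graph = {
        name: {producer[d] for d in s["deps"] if d in producer}
        for name, s in stages.items()
    }
    return list(TopologicalSorter(graph).static_order())


print(stage_order(STAGES))  # ['make_dataset', 'encode_labels', 'impute_nan']
```

Because the three stages form a simple chain, only one valid order exists. In practice, `dvc repro` performs this resolution itself, and it skips stages whose deps are unchanged.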
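The `encode_labels` stage converts categorical labels to integers and saves the mapping to `data/interim/label_encoding.yaml`.

The following is a minimal stdlib sketch of that kind of encoding, under several assumptions:

- The mapping is built from the categories seen in both the train and test columns, since the stage reads both files.
- Categories are numbered in sorted order.
- The function names `build_encoding` and `encode`, and the sample `Sex` values, are hypothetical.

The real `src/data/encode_labels.py` may differ, for example in how the `dtypes` parameter selects columns.

```python
def build_encoding(*columns):
    """Assign each distinct category (across all given columns) an integer code.

    Hypothetical helper: categories are sorted so the mapping is reproducible.
    """
    categories = sorted({value for col in columns for value in col})
    return {cat: code for code, cat in enumerate(categories)}


def encode(column, mapping):
    """Replace every category in the column with its integer code."""
    return [mapping[value] for value in column]


# Made-up sample values for illustration.
train_sex = ["male", "female", "female", "male"]
test_sex = ["female", "male"]

mapping = build_encoding(train_sex, test_sex)
print(mapping)                    # {'female': 0, 'male': 1}
print(encode(test_sex, mapping))  # [0, 1]
```

Saving `mapping` alongside the encoded files is what makes the integer codes interpretable later. It also lets the same codes be reapplied to new data.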
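The `impute_nan` stage fills missing ages with the mean age from the training dataset. The important detail is that the mean is computed once from the training split and then reused for the test split. The test set never influences the fill value.

The following is a minimal stdlib sketch of that fit-on-train, apply-to-both pattern, under these assumptions:

- Missing entries are represented as `None`.
- The function names `fit_mean` and `impute`, and the sample ages, are hypothetical.

How the `imputation` parameter configures the real `src/data/replace_nan.py` is not shown here.

```python
from statistics import mean


def fit_mean(values):
    """Mean of the non-missing entries (None marks a missing value)."""
    return mean(v for v in values if v is not None)


def impute(values, fill):
    """Replace each missing entry with the given fill value."""
    return [fill if v is None else v for v in values]


# Made-up sample ages for illustration.
train_age = [22.0, None, 38.0, 30.0]
test_age = [None, 35.0]

age_mean = fit_mean(train_age)      # learned from the training split only: 30.0
print(impute(train_age, age_mean))  # [22.0, 30.0, 38.0, 30.0]
print(impute(test_age, age_mean))   # [30.0, 35.0]
```

Fitting the statistic on training data alone avoids leaking information from the test set into preprocessing.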