A tiny repo illustrating that even if your data comes from a dynamic source (for example, it is queried from a service), you can still maintain reproducibility by storing a reference to the resulting data as part of your DVC pipeline.
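
The idea can be sketched in a few lines of Python. This is only an illustration of the principle, not how DVC itself is implemented and not the contents of `get-data-from-service.sh`: the function names, the `.ref.json` pointer file, and the use of SHA-256 are all assumptions made for the example. The query result is written to disk once, and a small pointer file records a hash of that exact snapshot, so the pointer can be committed while the data stays outside Git.

```python
import csv
import hashlib
import json
from pathlib import Path


def snapshot_query_result(rows, data_path, pointer_path):
    """Write queried rows to a CSV file and record its content hash.

    Hypothetical helper: the pointer file format here is made up for
    illustration and is not DVC's .dvc format.
    """
    data_path = Path(data_path)
    pointer_path = Path(pointer_path)

    # Persist the query result exactly as it was received.
    with data_path.open("w", newline="") as f:
        csv.writer(f).writerows(rows)

    # The hash identifies this exact snapshot of the data.
    digest = hashlib.sha256(data_path.read_bytes()).hexdigest()

    # The small pointer file is what would be committed to version control.
    pointer = {"path": data_path.name, "sha256": digest}
    pointer_path.write_text(json.dumps(pointer, indent=2) + "\n")
    return digest


def matches_pointer(data_path, pointer_path):
    """Return True if the data file still matches the recorded hash."""
    pointer = json.loads(Path(pointer_path).read_text())
    actual = hashlib.sha256(Path(data_path).read_bytes()).hexdigest()
    return actual == pointer["sha256"]


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "data.csv"
        ref = Path(tmp) / "data.csv.ref.json"
        # Stand-in for rows returned by a live service query.
        snapshot_query_result([["id", "value"], [1, 10], [2, 20]], data, ref)
        print(matches_pointer(data, ref))
```

Running the query again later may return different rows, but the committed pointer still identifies the snapshot that a given model was trained on, which is what keeps the pipeline reproducible.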

Latest commit: a352e0b596 by Guy, "Added screenshot to readme" (2 years ago)

File                        Commit      Last commit message                                                      Age
.dvc                        041c98e60e  Example of data queried from a service being saved as part of a commit  2 years ago
.gitignore                  041c98e60e  Example of data queried from a service being saved as part of a commit  2 years ago
README.md                   a352e0b596  Added screenshot to readme                                               2 years ago
data.csv
data.csv.dvc                041c98e60e  Example of data queried from a service being saved as part of a commit  2 years ago
get-data-from-service.sh    041c98e60e  Example of data queried from a service being saved as part of a commit  2 years ago
illustration.png            3fb6648015  Added an illustrated screenshot                                          2 years ago
model
model.dvc                   041c98e60e  Example of data queried from a service being saved as part of a commit  2 years ago
train.py                    041c98e60e  Example of data queried from a service being saved as part of a commit  2 years ago

pipeline-intermediate-files-example

Illustrated screenshot