tpu-workflows

DVC

Remote origin setup

dvc remote add origin https://dagshub.com/martin-fabbri/tpu-workflow.dvc
dvc remote modify origin --local auth basic
dvc remote modify origin --local user "$DAGSHUB_USER"
dvc remote modify origin --local password "$SUPER_SECRET_PASSWORD"
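
With the remote configured, the DVC cache can be synced with DagsHub; a minimal usage sketch, assuming DAGSHUB_USER and SUPER_SECRET_PASSWORD are exported in the environment:

dvc push -r origin   # upload cached outputs to the DagsHub remote
dvc pull -r origin   # fetch them on another machine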

Define pipeline stages

dvc run -n split \
    -d src/split.py \
    -o data/interim/train_split.json \
    -o data/interim/val_split.json \
    python3 src/split.py --gcs-path gs://kds-357fde648f21ba86b09520d51e296ad06846fd421d364336db3d426d --batch-size 16
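
Once defined, the stage can be re-run and the pipeline graph inspected with standard DVC commands:

dvc repro split   # re-run the stage when its dependencies change
dvc dag           # print the stage graph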

GCS external dependencies

dvc run -n download_file \
-d gs://kaggle-data-tpu/test/test.txt \
-o test.txt \
gsutil cp gs://kaggle-data-tpu/test/test.txt test.txt
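
Both DVC's GCS support and the google-cloud-storage client resolve credentials through Application Default Credentials; one common setup points at a service-account key file (the path below is a hypothetical placeholder):

export GOOGLE_APPLICATION_CREDENTIALS="$HOME/keys/gcs-sa.json"  # hypothetical key path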

Remote aliases

['remote "gcs_test"']
    url = gs://kaggle-data-tpu/test
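
The same alias can be created from the CLI instead of editing .dvc/config by hand:

dvc remote add gcs_test gs://kaggle-data-tpu/test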

With the alias in place, gs://kaggle-data-tpu/test/test_2.txt can be referenced as remote://gcs_test/test_2.txt:

dvc run -n download_file_2 \
          -d remote://gcs_test/test_2.txt \
          -o test_2.txt \
          gsutil cp gs://kaggle-data-tpu/test/test_2.txt test_2.txt

DVC records the stage under stages: in dvc.yaml:

download_file_2:
    cmd: gsutil cp gs://kaggle-data-tpu/test/test_2.txt test_2.txt
    deps:
    - remote://gcs_test/test_2.txt
    outs:
    - test_2.txt
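
Because the dependency lives in GCS, DVC checksums the remote object rather than a local file; whether the stage is up to date can be checked as usual:

dvc status download_file_2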

Test Python script

from google.cloud import storage

# Read the blob the stage depends on and print its contents.
client = storage.Client()
bucket = client.get_bucket("kaggle-data-tpu")
blob = bucket.get_blob("test/test_2.txt")
print(blob.download_as_bytes().decode("utf-8"))

Test task stage

dvc run -n test_read_blob_gcs \
          -d remote://gcs_test/test_2.txt \
          python src/test.py

List objects in a bucket

from google.cloud import storage

# list_blobs() takes the bucket name, not a gs:// URL; the path
# inside the bucket is passed via the prefix argument.
storage_client = storage.Client()
blobs = storage_client.list_blobs("kaggle-data-tpu", prefix="test/")
for blob in blobs:
    print(blob.name)

dvc run -n test_list_objects_gcs \
          -d remote://gcs_test/ \
          python src/test_list_blobs.py

Train stage

dvc run -n train \
    -d src/pipeline/train.py \
    python src/pipeline/train.py --lr 1
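
Assuming src/pipeline/train.py writes metrics and plots files that are tracked in dvc.yaml, they can be inspected with the standard DVC commands:

dvc metrics show
dvc plots show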