---
title: DagsHub integration with Google Colab - Your Cloud Notebook
description: DagsHub's integration with Google Colab allows you to open, version, and commit cloud-based Jupyter notebooks with access to free GPUs, with no MLOps friction
---
Google Colaboratory, or "Colab" for short, is a free Jupyter notebook environment that runs entirely in the cloud. It requires no setup, can be shared easily with team members, and provides free access to GPUs. DagsHub provides its users with Colab notebook templates for various tasks, such as fully configuring DagsHub within the Colab runtime, transferring data from Google Drive to DagsHub Storage, tutorials, and more.
DagsHub is officially integrated with Google Colab, enabling users to seamlessly build, train, and collaborate on ML models with ZERO MLOps friction.
You can easily access your DagsHub project's code, data, and experiments from any Colab environment. By setting your DagsHub credentials, you'll be able to clone your code, pull your data hosted on DagsHub Storage, and log experiments to the project's experiment tracking server.
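As a minimal sketch of what that setup cell can look like, assuming the `dagshub` client is installed (`pip install dagshub`) and using its `dagshub.auth.get_token()` and `dagshub.init()` functions (the repo owner and name below are placeholders):

```python
def setup_dagshub(repo_owner: str, repo_name: str) -> None:
    """Authenticate against DagsHub and wire this runtime to the project."""
    import dagshub  # imported here so the sketch parses without the package

    # Opens an OAuth flow in Colab (or reuses a cached token):
    dagshub.auth.get_token()
    # Connects the runtime to the repo, including its experiment tracking server:
    dagshub.init(repo_name=repo_name, repo_owner=repo_owner)

# In a Colab cell you would then run:
# setup_dagshub("<user_name>", "<repo_name>")
```

From there, cloning your code and pulling your data work against the authenticated session.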
To open a notebook from DagsHub in Colab, just navigate to your notebook file preview on DagsHub – there you will see the "Open in Colab" button which will open the notebook in Colab.
!!! info "Open in Colab"
    The "Open in Colab" button currently only works in public projects. To open a notebook from a private repository in Colab, simply download the file and upload it to Colab.
DagsHub Storage Buckets offer an S3-compatible, ML-focused storage solution, now seamlessly integrated with Google Colab. This integration enables easy and scalable data access for projects working with large-scale datasets, overcoming the limitations of Google Drive and traditional cloud storage solutions for ML workflows.
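Because the buckets are S3-compatible, you can also reach them with a standard boto3-style client. A sketch, assuming the `dagshub` client is installed and using its `get_repo_bucket_client` helper (the repo name is a placeholder):

```python
def list_bucket_objects(repo: str, max_keys: int = 5):
    """List the first few objects in `repo`'s DagsHub Storage Bucket."""
    from dagshub import get_repo_bucket_client  # deferred so the sketch parses without the package

    s3 = get_repo_bucket_client(repo)  # a boto3 S3 client scoped to this repo
    bucket_name = repo.split("/")[1]   # the bucket is named after the repo
    return s3.list_objects_v2(Bucket=bucket_name, MaxKeys=max_keys)

# In Colab: list_bucket_objects("<user_name>/<repo_name>")
```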
Use the DagsHub client for an easy data upload process. The client supports both command line and Python API methods for uploading your datasets directly into the DagsHub Storage Bucket.
=== "Python"

    ```python
    dagshub.upload_files("<user_name>/<repo_name>", "<local_path>", remote_path="<remote_path>", bucket=True)
    ```

=== "Command Line"

    ```bash
    dagshub upload <user_name>/<repo_name> <local_path> <remote_path> --bucket
    ```
Unlike Google Drive, DagsHub Storage Buckets are designed with ML use cases in mind, offering a more scalable and robust backend. You can easily mount your DagsHub Storage Bucket to your Colab instance, facilitating direct access to your data for model training and inference.
To sync a local folder to DagsHub Storage, simply run:

```python
dagshub.storage.sync("<user_name>/<repo_name>", "<local_path>", "<remote_path>")
```

This command ensures your local dataset is mirrored in the DagsHub Storage Bucket.
Mount your DagsHub Storage Bucket to a Colab notebook for direct file access. If needed, you can specify a custom mount path by providing a `path=` argument. You can also provide `cache=True` to enable smart caching that will accelerate training (at the cost of taking up more disk space).

```python
mount_path = dagshub.storage.mount("<user_name>/<repo_name>")
```
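The optional arguments described above can be combined; a sketch (the mount point below is a hypothetical example):

```python
def mount_with_options(repo: str) -> str:
    """Mount `repo`'s storage bucket at a custom path with caching enabled."""
    import dagshub  # deferred so the sketch parses without the package

    return dagshub.storage.mount(
        repo,
        path="/content/my_data",  # hypothetical custom mount point
        cache=True,               # smart caching: faster training, more disk
    )

# In Colab: mount_path = mount_with_options("<user_name>/<repo_name>")
```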
To unmount the bucket, simply run:

```python
dagshub.storage.unmount("<user_name>/<repo_name>", mount_path)
```
!!! warning "Remounting Buckets"
    In case of errors, or if the Colab cell execution breaks, you can easily remount your DagsHub Storage Bucket using the `dagshub.storage.mount()` function. This ensures continuous access to your data without disruption.
Integrating DagsHub and Colab introduces a significant improvement in notebook version control, as users can use DVC to version large notebooks that Git has trouble handling. DagsHub lets you diff notebooks and comment on notebook cells, which unlocks collaboration for ML teams without needing third-party platforms or sharing screenshots over Slack.
To version your notebook with the DagsHub Client (run `pip install dagshub` to install), use the `save_notebook` function as follows:

```python
from dagshub.notebook import save_notebook

save_notebook(repo="<repo_owner>/<repo_name>")
```
With the following argument:

- `repo` (str): your DagsHub repository, in the format `<repo_owner>/<repo_name>`

You can also use the following optional arguments:

- `path` (str): where to save the notebook within the repository (including the filename). If the filename is not specified, the notebook is saved as `"notebook-{datetime.now}.ipynb"` under the specified folder
- `branch` (str): the branch under which the notebook should be saved. Commits to the default repo branch if not specified
- `commit_message` (str): the commit message for the update
- `versioning` (str): `['git'|'dvc']`, the VCS used to version the notebook

!!! info "Alternative way to version your Colab notebook"
    In some cases, the above function might fail due to Colab-related issues. An alternative way to version your notebook is to download it locally, then, in an environment with the DagsHub client installed (`pip install dagshub`), run the following snippet:

    ```python
    import dagshub

    dagshub.upload_files(repo="<repo_owner>/<repo_name>", local_path="<path/to/notebook.ipynb>", remote_path="<path/in/remote/notebook.ipynb>", commit_message="<commit_message>")
    ```
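Putting the optional arguments together, a fuller `save_notebook` call might look like the following sketch (all values are hypothetical placeholders):

```python
# Hypothetical example values for save_notebook's optional arguments:
SAVE_NOTEBOOK_KWARGS = dict(
    repo="<repo_owner>/<repo_name>",
    path="notebooks/train.ipynb",               # hypothetical target path
    branch="main",                              # commit to a specific branch
    commit_message="Version training notebook", # hypothetical message
    versioning="dvc",                           # version with DVC instead of Git
)

def version_notebook() -> None:
    """Save the current Colab notebook to the repo with the options above."""
    from dagshub.notebook import save_notebook  # deferred so the sketch parses without the package
    save_notebook(**SAVE_NOTEBOOK_KWARGS)

# In Colab: version_notebook()
```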
Follow this notebook to move your data from Google Drive to DagsHub Storage Buckets easily: