
Google Colab

Google Colaboratory, or "Colab" for short, is a free Jupyter notebook environment that runs entirely in the cloud. It requires no setup, notebooks can be shared easily with team members, and it provides free access to GPUs. DagsHub provides Colab notebook templates for various tasks, such as fully configuring DagsHub within a Colab runtime, transferring data from Google Drive to DagsHub Storage, and tutorials.

How does DagsHub work with Google Colab?

DagsHub is officially integrated with Google Colab, enabling users to build, train, and collaborate on ML models without additional MLOps setup.

To see an example notebook:

Open in Colab

You can access your DagsHub project's code, data, and experiments from any Colab environment. Once you set your DagsHub credentials, you can clone your code, pull data hosted on DagsHub Storage, and log experiments to the project's experiment tracking server.
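Below is a minimal sketch of the credential setup needed for cloning and experiment logging. It assumes the common DagsHub URL conventions: `https://dagshub.com/<owner>/<repo>.git` for cloning and `https://dagshub.com/<owner>/<repo>.mlflow` for the MLflow tracking server. `dagshub_urls` and `configure_mlflow_env` are hypothetical helper names. The environment variables are MLflow's standard `MLFLOW_TRACKING_*` variables, and the token is assumed to be a DagsHub access token.

```python
import os


def dagshub_urls(repo: str) -> dict:
    """Build the Git clone URL and MLflow tracking URI for '<owner>/<name>'.

    Assumption: DagsHub serves Git at <base>.git and MLflow at <base>.mlflow.
    """
    owner, sep, name = repo.partition("/")
    if not sep or not owner or not name or "/" in name:
        raise ValueError(f"expected '<owner>/<name>', got {repo!r}")
    base = f"https://dagshub.com/{owner}/{name}"
    return {"clone": f"{base}.git", "mlflow": f"{base}.mlflow"}


def configure_mlflow_env(repo: str, username: str, token: str) -> str:
    """Point MLflow at the repo's tracking server using MLflow's env vars."""
    uri = dagshub_urls(repo)["mlflow"]
    os.environ["MLFLOW_TRACKING_URI"] = uri
    os.environ["MLFLOW_TRACKING_USERNAME"] = username
    # Assumption: a DagsHub access token is accepted as the MLflow password
    os.environ["MLFLOW_TRACKING_PASSWORD"] = token
    return uri
```

In a Colab cell you could then clone with `!git clone` and the `clone` URL, and any later `mlflow.log_*` calls would go to the DagsHub tracking server. The DagsHub client's own `dagshub.init(repo_owner=..., repo_name=..., mlflow=True)` is an alternative that handles this configuration for you.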

How to use DagsHub with Google Colab?

Open a notebook from your DagsHub repo in Colab

To open a notebook from DagsHub in Colab, navigate to the notebook's file preview on DagsHub and click the "Open in Colab" button.


Info

The "Open in Colab" button currently works only for public repositories. To open a notebook from a private repository, download the file and upload it to Colab.

DagsHub Storage Buckets Integration with Google Colab

DagsHub Storage Buckets provide S3-compatible storage designed for ML workloads and integrate with Google Colab. This gives Colab notebooks scalable access to large datasets and avoids the limitations that Google Drive and general-purpose cloud storage impose on ML workflows.

Uploading Data to Your DagsHub Storage Bucket

The DagsHub client can upload datasets directly to your DagsHub Storage Bucket, using either its Python API or its command line interface:

import dagshub
dagshub.upload_files("<user_name>/<repo_name>", "<local_path>", remote_path="<remote_path>", bucket=True)

From the command line:

dagshub upload <user_name>/<repo_name> <local_path> <remote_path> --bucket
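Below is a minimal sketch that checks the local path exists before passing it to `dagshub.upload_files` with `bucket=True`. A typo in the path then fails immediately instead of partway through a Colab session. `upload_to_bucket` and its `uploader` parameter are hypothetical. The parameter exists only so the wrapper can be exercised without network access. By default it falls back to `dagshub.upload_files`, called with the same arguments as the Python example above.

```python
from pathlib import Path
from typing import Callable, Optional


def upload_to_bucket(repo: str, local_path: str, remote_path: str,
                     uploader: Optional[Callable] = None) -> None:
    """Upload a file or folder to the repo's DagsHub Storage Bucket."""
    # Fail fast on a bad local path before any network call is made
    if not Path(local_path).exists():
        raise FileNotFoundError(local_path)
    if uploader is None:
        import dagshub  # pip install dagshub
        uploader = dagshub.upload_files
    # bucket=True targets the Storage Bucket, as in the example above
    uploader(repo, local_path, remote_path=remote_path, bucket=True)
```

Usage in Colab: `upload_to_bucket("<user_name>/<repo_name>", "<local_path>", "<remote_path>")`.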

Mounting & Syncing with DagsHub Storage Buckets

Unlike Google Drive, DagsHub Storage Buckets are designed for ML use cases and offer a more scalable and robust backend. You can mount your DagsHub Storage Bucket on your Colab instance so that training and inference code can read the data directly.

Sync a Local Folder to DagsHub Storage

To sync a local folder to DagsHub Storage, run:

import dagshub
dagshub.storage.sync("<user_name>/<repo_name>", "<local_path>", "<remote_path>")

Here <local_path> is the folder on the Colab machine and <remote_path> is the destination in the bucket. The call mirrors your local dataset into the DagsHub Storage Bucket.
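Before syncing a large dataset from Colab, it can help to see how much data you are about to push. The sketch below counts files and bytes under the local folder, then calls `dagshub.storage.sync` with the same three positional arguments shown above. `folder_stats`, `sync_folder`, and the injectable `syncer` parameter are hypothetical names chosen for illustration.

```python
from pathlib import Path
from typing import Callable, Optional, Tuple


def folder_stats(local_path: str) -> Tuple[int, int]:
    """Return (number of regular files, total bytes) under local_path, recursively."""
    files = [p for p in Path(local_path).rglob("*") if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)


def sync_folder(repo: str, local_path: str, remote_path: str,
                syncer: Optional[Callable] = None) -> Tuple[int, int]:
    """Report the folder size, then mirror it to the DagsHub Storage Bucket."""
    if not Path(local_path).is_dir():
        raise NotADirectoryError(local_path)
    n_files, n_bytes = folder_stats(local_path)
    print(f"Syncing {n_files} files ({n_bytes / 1e6:.1f} MB) to {repo}:{remote_path}")
    if syncer is None:
        import dagshub.storage  # pip install dagshub
        syncer = dagshub.storage.sync
    syncer(repo, local_path, remote_path)
    return n_files, n_bytes
```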

Mounting DagsHub Storage to Colab

Mount your DagsHub Storage Bucket in a Colab notebook for direct file access. To choose a custom mount location, pass a path= argument. You can also pass cache=True to cache files locally, which speeds up training at the cost of extra disk space.

mount_path = dagshub.storage.mount("<user_name>/<repo_name>")

To unmount the bucket, simply run:

dagshub.storage.unmount("<user_name>/<repo_name>", mount_path)
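If a training cell raises midway, an explicit unmount call at the end of the notebook never runs. The sketch below wraps `dagshub.storage.mount` and `dagshub.storage.unmount` in a context manager, so the bucket is unmounted even when the body fails. `mounted_bucket` is a hypothetical helper. Its `mount`/`unmount` parameters are injectable only so the sketch can run without a real bucket. The `cache=` keyword is the one described above.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, Optional


@contextmanager
def mounted_bucket(repo: str, cache: bool = False,
                   mount: Optional[Callable] = None,
                   unmount: Optional[Callable] = None) -> Iterator[str]:
    """Mount a DagsHub Storage Bucket and always unmount it afterwards."""
    if mount is None or unmount is None:
        import dagshub.storage  # pip install dagshub
        mount = mount or dagshub.storage.mount
        unmount = unmount or dagshub.storage.unmount
    mount_path = mount(repo, cache=cache)
    try:
        yield mount_path
    finally:
        # Runs even if the with-body raises, so no mount is left behind
        unmount(repo, mount_path)
```

Usage: `with mounted_bucket("<user_name>/<repo_name>", cache=True) as data_dir:` followed by the training code that reads from `data_dir`.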

Remounting Buckets

If you run into errors, or a Colab cell's execution is interrupted, call dagshub.storage.mount() again to remount your DagsHub Storage Bucket and restore access to your data.
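Remounting can be automated with a small retry loop around `dagshub.storage.mount()`. This is a minimal sketch, not part of the DagsHub client. `mount_with_retry`, its retry count, and its delay are hypothetical choices. Because the client's specific exception types are not documented here, the sketch catches any `Exception`.

```python
import time
from typing import Callable, Optional


def mount_with_retry(repo: str, attempts: int = 3, delay_s: float = 2.0,
                     mount: Optional[Callable] = None) -> str:
    """Call the mount function, retrying a few times if it raises."""
    if mount is None:
        import dagshub.storage  # pip install dagshub
        mount = dagshub.storage.mount
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return mount(repo)
        except Exception as err:  # assumption: no narrower exception type is documented
            last_error = err
            if attempt < attempts:
                time.sleep(delay_s)  # brief pause before remounting
    raise RuntimeError(f"could not mount {repo} after {attempts} attempts") from last_error
```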

Versioning your Colab notebook using Git or DVC on DagsHub

Using DagsHub with Colab improves notebook version control: you can use DVC to version large notebooks that Git handles poorly. DagsHub also lets you diff notebooks and comment on individual cells, so ML teams can review notebooks together without third-party platforms or screenshots shared over Slack.

To version your notebook with the DagsHub Client (run pip install dagshub to install), use the save_notebook function as follows:

from dagshub.notebook import save_notebook

save_notebook(repo="<repo_owner>/<repo_name>")

It takes the following required argument:

  • repo (str): your DagsHub repository, in the format <repo_owner>/<repo_name>

You can also use the following optional arguments:

  • path (str): where to save the notebook within the repository, including the filename. If the filename is not specified, the notebook is saved as "notebook-{datetime.now}.ipynb" under the specified folder.
  • branch (str): the branch to commit the notebook to. If not specified, the repository's default branch is used.
  • commit_message (str): the commit message for the update.
  • versioning (str): the version control system used to version the notebook, either 'git' or 'dvc'.
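The sketch below calls `save_notebook` with the optional arguments listed above. It validates `versioning` up front and picks a timestamped filename when none is given. `version_notebook`, the `colab-` filename prefix, the commit-message text, and the injectable `saver` parameter are hypothetical. Only the keyword names `repo`, `path`, `branch`, `commit_message`, and `versioning` come from the argument list above.

```python
from datetime import datetime
from typing import Callable, Optional

VCS_CHOICES = {"git", "dvc"}  # values accepted by the versioning argument


def version_notebook(repo: str, folder: str = "notebooks", name: Optional[str] = None,
                     branch: Optional[str] = None, versioning: str = "git",
                     saver: Optional[Callable] = None) -> str:
    """Save the running notebook to <folder>/<name> in the DagsHub repo."""
    if versioning not in VCS_CHOICES:
        raise ValueError(f"versioning must be one of {sorted(VCS_CHOICES)}, got {versioning!r}")
    if name is None:
        # Hypothetical naming scheme; save_notebook has its own default
        name = f"colab-{datetime.now():%Y%m%d-%H%M%S}.ipynb"
    path = f"{folder.rstrip('/')}/{name}"
    if saver is None:
        from dagshub.notebook import save_notebook  # pip install dagshub
        saver = save_notebook
    kwargs = {"repo": repo, "path": path, "versioning": versioning,
              "commit_message": f"Save {name} from Colab"}
    if branch is not None:  # omit to commit to the default branch
        kwargs["branch"] = branch
    saver(**kwargs)
    return path
```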

Alternative way to version your Colab notebook

In some cases save_notebook may fail because of Colab-related issues. As an alternative, download the notebook locally, then run the following snippet in an environment where the DagsHub client is installed (pip install dagshub):

import dagshub
dagshub.upload_files(repo="<repo_owner>/<repo_name>", local_path="<path/to/notebook.ipynb>", remote_path="<path/in/remote/notebook.ipynb>", commit_message="<commit_message>")
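A notebook downloaded from the browser can occasionally be truncated or saved in the wrong format. The sketch below checks that the file parses as JSON and has the `cells` and `nbformat` top-level keys of a Jupyter notebook before calling `dagshub.upload_files` with the keyword arguments shown above. `upload_notebook` and its injectable `uploader` parameter are hypothetical. Invalid JSON raises `json.JSONDecodeError`, which is a subclass of `ValueError`.

```python
import json
from pathlib import Path
from typing import Callable, Optional


def upload_notebook(repo: str, local_path: str, remote_path: str,
                    commit_message: str, uploader: Optional[Callable] = None) -> None:
    """Sanity-check a downloaded .ipynb, then upload it to the DagsHub repo."""
    nb = json.loads(Path(local_path).read_text(encoding="utf-8"))
    # Jupyter notebooks are JSON documents with these top-level keys
    if not isinstance(nb, dict) or "cells" not in nb or "nbformat" not in nb:
        raise ValueError(f"{local_path} does not look like a Jupyter notebook")
    if uploader is None:
        import dagshub  # pip install dagshub
        uploader = dagshub.upload_files
    uploader(repo=repo, local_path=local_path, remote_path=remote_path,
             commit_message=commit_message)
```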

Copy your data from GDrive to DagsHub

Follow this notebook to move your data from Google Drive to DagsHub Storage Buckets:

Open in Colab

Other Resources

  • Hello World - Try DagsHub without installing anything locally. This notebook walks you through DagsHub's basic features in a relatively clean environment, and by the end you will have created your first hello-world project on DagsHub.
  • TensorFlow, fast.ai - Learn how to log MLflow experiments to your DagsHub repository's MLflow tracking server in a few steps.
  • DagsHub x GitHub - Learn how to use DagsHub's features in your GitHub project by following this notebook.