GitHub¶
GitHub is the most popular platform for software development projects. It provides out-of-the-box solutions for code collaboration and CI/CD. We improved the integration between DagsHub and GitHub to provide a smoother workflow across both platforms, so the DagsHub community can use the best GitHub has to offer in their machine learning projects.
How does the integration work?¶
GitHub-connected repositories on DagsHub stay in sync with their GitHub source. In addition to syncing the git-tracked files, DagsHub automatically syncs the repository on every push and lets you manage pull requests and issues from both platforms.
- Instant Repository Sync - GitHub-connected repositories subscribe to GitHub webhooks and are automatically synced when code is pushed.
- Sync Pull Requests - Pull requests created on GitHub are shown in DagsHub and vice versa. Use DagsHub to review code, data & models, and when you're done, click the merge button to merge on both platforms.
- Sync Issues - Issues opened from DagsHub are also opened on GitHub, with a comment linking them to the DagsHub discussion.
- View GitHub stars - The repository's star count moves with you to DagsHub. People can now star your project on both DagsHub and GitHub, and it will be shown on your DagsHub repository.
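To illustrate the webhook-based sync described above, here is a minimal sketch of how a receiver might validate a GitHub push delivery and work out which branch to sync. It is not DagsHub's implementation. It relies only on GitHub's documented `X-Hub-Signature-256` header and the `ref`, `after`, and `repository.full_name` fields of the push payload; the function names and the shared secret are hypothetical.

```python
import hashlib
import hmac
import json


def verify_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Check GitHub's X-Hub-Signature-256 header against the raw request body."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing differences
    return hmac.compare_digest(expected, signature_header)


def handle_push(secret: bytes, body: bytes, signature_header: str):
    """Return (repository, branch, commit) to sync, or None if the delivery is rejected."""
    if not verify_signature(secret, body, signature_header):
        return None
    payload = json.loads(body)
    ref = payload.get("ref", "")
    if not ref.startswith("refs/heads/"):
        return None  # tag pushes and other refs are ignored in this sketch
    branch = ref[len("refs/heads/"):]
    return payload["repository"]["full_name"], branch, payload["after"]
```

A receiver built this way would fetch the returned commit for the returned branch, which is why a connected repository can update without any manual pull.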
How to connect a GitHub project to DagsHub?¶
If you have a GitHub project ready to connect, you can get started in less than a minute!
- Press the blue Create + button on the top right and click + New Repository
- Select the Import Repository card
- Click on the GitHub Connect button and authorize in GitHub
- Click the Add/Revoke Access button and choose to give access to all your repositories or specific ones.
- Click the repository you want to connect on DagsHub and click Connect Repository.
What is the added value of connecting a GitHub project to DagsHub?¶
Connecting an ML project from GitHub to DagsHub brings several benefits. Here are some concrete examples:
Remote object storage and experiment tracking server¶
While your git-tracked files stay on GitHub, your project now has a free and fully configured remote object storage and experiment tracking server. To use them, all you need to do is copy a few lines of commands to your terminal (generated by DagsHub), and you're set to go.
You can manage the storage and version your data using DVC and log your experiments to the remote tracking server using MLflow Tracking.
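As a rough sketch of what the experiment tracking setup amounts to, the helper below builds the environment variables that MLflow reads to locate a remote tracking server. The `MLFLOW_TRACKING_*` variable names are MLflow's own; the URL pattern, the helper name, and the placeholder values are assumptions, so prefer the exact values DagsHub generates for your repository.

```python
def dagshub_mlflow_env(owner: str, repo: str, token: str) -> dict:
    """Build the environment variables MLflow reads to find a remote tracking server."""
    return {
        # Assumed URL pattern; copy the exact URI DagsHub generates for your repository.
        "MLFLOW_TRACKING_URI": f"https://dagshub.com/{owner}/{repo}.mlflow",
        # Basic-auth credentials that MLflow sends with each tracking request
        "MLFLOW_TRACKING_USERNAME": owner,
        "MLFLOW_TRACKING_PASSWORD": token,
    }
```

With MLflow installed, applying these values with `os.environ.update(...)` before calling `mlflow.start_run()` should route logged parameters and metrics to the remote server rather than a local `mlruns` directory.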
Diff Notebooks and comment on cells¶
You can now view your existing notebook pull requests from GitHub on DagsHub, which means you can see rich notebook diffs and comment on individual cells.
View and diff data¶
DagsHub Catalog lets you view a variety of data types hosted both on the Git server and on DagsHub Storage. You can easily view images, videos, audio, CSV files, and more. You can also diff all of those formats as part of a pull request and merge the changes once you're done.
Use GitHub Actions with your ML projects¶
When connecting a GitHub repository to DagsHub, you can use GitHub Actions for CI/CD as an integral part of the ML project. You can integrate DagsHub storage with your actions, trigger actions when you open a pull request from DagsHub, and more.
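One way to integrate DagsHub storage with an action is to have a workflow step run a small script that configures a DVC remote and pulls the data. The sketch below only builds the commands. `GITHUB_REPOSITORY` is set by GitHub Actions, while `DAGSHUB_USER` and `DAGSHUB_TOKEN` are hypothetical secret names. The remote name `origin`, the URL pattern, and the idea that the DagsHub repository shares the GitHub owner and name are all assumptions.

```python
def dvc_remote_commands(owner: str, repo: str, user: str, token: str) -> list:
    """Return the DVC commands that point a checkout at DagsHub storage."""
    url = f"https://dagshub.com/{owner}/{repo}.dvc"  # assumed URL pattern
    return [
        # --force overwrites a remote of the same name if one is already configured
        ["dvc", "remote", "add", "--force", "origin", url],
        # --local keeps credentials out of the committed .dvc/config
        ["dvc", "remote", "modify", "origin", "--local", "auth", "basic"],
        ["dvc", "remote", "modify", "origin", "--local", "user", user],
        ["dvc", "remote", "modify", "origin", "--local", "password", token],
        ["dvc", "pull", "-r", "origin"],
    ]


def commands_from_env(env) -> list:
    """Derive the commands from the variables available inside a workflow run."""
    # GITHUB_REPOSITORY has the form "owner/repo"
    owner, repo = env["GITHUB_REPOSITORY"].split("/", 1)
    return dvc_remote_commands(owner, repo, env["DAGSHUB_USER"], env["DAGSHUB_TOKEN"])
```

In a workflow step, each command could be executed with `subprocess.run(cmd, check=True)` after calling `commands_from_env(os.environ)`. Avoid printing the commands, since one of them contains the token.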