Introducing: DagsHub Connect - The Complete GitHub Integration is Here

GitHub Apr 25, 2022

Why did we build the integration?

"Building ML projects with the tools you love" describes a major part of what we’re doing at DagsHub. ML projects have many components, and no one magical tool provides everything, so DagsHub takes the best tools out there and puts them under the same roof.

After numerous user requests for using GitHub Actions with DagsHub, connecting pull requests & issues so that we can review data and notebook changes, and more, we decided it’s time to take the GitHub integration to the next level! Today, we’re happy to announce DagsHub Connect to do just that!

The new and improved integration makes the relationship between DagsHub and GitHub more than JUST a mirror. It provides a full workflow across both platforms: GitHub for code reviews & CI/CD tools, and DagsHub for data science review, data versioning, and experiment tracking.

What's New?

Up until now, the GitHub connection only allowed mirroring of Git files from GitHub to DagsHub, syncing them periodically. This let you use DagsHub powers like DVC & MLflow remotes, but as we said, it’s not enough.

From now on, GitHub-connected repositories on DagsHub will have a complete workflow. In addition to the above, the new update adds:

  • Instant Repository Sync - GitHub-connected repositories subscribe to GitHub webhooks and are automatically kept in sync.
  • Sync Pull Requests & Issues - Pull requests & issues created on GitHub are shown in DagsHub and vice versa. Use GitHub Actions for testing, training, and deployment, and DagsHub to review code, data, models & annotations. When you're done, click the merge button to merge on both platforms.
  • View GitHub stars - The repository's star count moves with you to DagsHub! People can now star your project on both DagsHub and GitHub, and it will be shown on your DagsHub repository.
  • Specific Repository Access - You are now in control of which repos you give us access to. No more need to give full account-wide access, just choose the repositories you want to connect and grant DagsHub access only to them.

How to connect a GitHub repository to DagsHub?

If you have a GitHub project ready to connect, you can get started in less than a minute!

  • Step 1: Press the green ‘+ Create’ button on the top right and click ‘Connect A Repo’
  • Step 2: Click on the GitHub connect button and authorize in GitHub
  • Step 3: Choose to either give access to all your repositories or specific ones you want to connect.
  • Step 4: Click the repository you want to connect on DagsHub and click Connect Repository.

Boom! Now you have a GitHub-connected repo - as easy as that.

From now on, new issues and PRs will get a nifty comment to help you move easily to DagsHub and enjoy all of its features.

Note: You can change the comment settings in your DagsHub repository settings, under ‘GitHub Connection Settings’.

Superpowers gained by connecting a GitHub repository to DagsHub

Ok, I connected my repository. Now what?
I’m glad you asked, my italic text friend.

By connecting your repository to DagsHub, you gain DagsHub Storage, Experiment Tracking, DagsHub Annotation, Data Pipelines, data visualizations, and data science review features.

What are those, you may ask? Well, let’s explore some of the powers you gain.

Fully configured DVC remote and MLflow server

If you’re already familiar with DagsHub, skip to the following section to read about the new benefits.

While your Git-tracked code files stay on GitHub, your data, models & experiments live on DagsHub, with a free, fully configured DVC remote and MLflow server. To use each one, all you need to do is copy a few commands (generated by DagsHub) into your terminal, and you're set to go.

Press the ‘?’ icon to reveal the commands you need to type in order to use the remote.

You can manage the storage and version your data using DVC and log your experiments to the remote tracking server using MLflow Tracking.
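
To make the MLflow half of that concrete, here is a minimal Python sketch. It assumes the standard environment variables MLflow reads for a remote server (MLFLOW_TRACKING_URI, MLFLOW_TRACKING_USERNAME, MLFLOW_TRACKING_PASSWORD). The URL pattern, the helper names tracking_env and log_example_run, and the sample values are illustrative assumptions, so copy the real tracking URI from the commands DagsHub generates for your repository.

```python
def tracking_env(owner: str, repo: str, token: str) -> dict[str, str]:
    """Build the environment variables MLflow reads for a remote tracking server.

    The URL pattern is an assumption for illustration only; use the tracking
    URI shown in your own repository's generated commands.
    """
    return {
        "MLFLOW_TRACKING_URI": f"https://dagshub.com/{owner}/{repo}.mlflow",
        "MLFLOW_TRACKING_USERNAME": owner,
        "MLFLOW_TRACKING_PASSWORD": token,
    }


def log_example_run(params: dict[str, float], metrics: dict[str, float]) -> None:
    """Log one run to whichever server MLFLOW_TRACKING_URI points at."""
    # Imported lazily so tracking_env() works even without MLflow installed.
    import mlflow

    with mlflow.start_run():
        mlflow.log_params(params)
        mlflow.log_metrics(metrics)
```

To try it, update os.environ with the result of tracking_env(...) using your own user name, repository name, and access token, then call log_example_run with whatever parameters and metrics your training script produces.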

But why do I need all of those? I'm doing just fine with my Git server.

Are you really? Because it kinda looks like your data is a big mess of zip files on Google Drive 🙄. If you’re an individual or a team looking for an organized, professional workflow, let's explore some practical use cases where DagsHub comes in handy.


1. Diff Notebooks and comment on cells from DagsHub

Reviewing a notebook as part of a PR in GitHub is not the nicest experience. GitHub treats the notebooks just like any other code file, and tries to show a diff of the underlying JSON of the notebook. This is not useful to actually see what changed, especially if you want to compare rich outputs like graphs, images, etc.
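
To see why a raw JSON diff is noisy, here is a small Python sketch using only the standard library. It is not how DagsHub or GitHub render diffs. The toy notebook, the helper names, and the sample values are made up for illustration. It changes a single line of code in a notebook and compares a line diff of the raw JSON with a diff of the cell sources alone.

```python
import difflib
import json


def make_notebook(code: str, count: int, output: str) -> str:
    """Serialize a tiny two-cell notebook (hypothetical content) as JSON."""
    notebook = {
        "cells": [
            {"cell_type": "markdown", "metadata": {}, "source": ["# Training"]},
            {
                "cell_type": "code",
                "execution_count": count,
                "metadata": {},
                "outputs": [{"name": "stdout", "output_type": "stream", "text": [output]}],
                "source": [code],
            },
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 5,
    }
    return json.dumps(notebook, indent=1)


def cell_sources(notebook_json: str) -> list[str]:
    """Return each cell's source text, ignoring outputs and metadata."""
    return ["".join(cell["source"]) for cell in json.loads(notebook_json)["cells"]]


def changed_lines(old: list[str], new: list[str]) -> list[str]:
    """Return only the added/removed lines of a unified diff (no headers)."""
    diff = difflib.unified_diff(old, new, lineterm="", n=0)
    return [
        line for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++ ", "--- "))
    ]


old = make_notebook("lr = 0.1", 1, "loss 0.52\n")
new = make_notebook("lr = 0.01", 2, "loss 0.31\n")

raw_changes = changed_lines(old.splitlines(), new.splitlines())
cell_changes = changed_lines(cell_sources(old), cell_sources(new))

print(f"raw JSON diff: {len(raw_changes)} changed lines")
print(f"cell-level diff: {len(cell_changes)} changed lines")
```

The raw diff also picks up the execution count and the stored output, so one edited line of code shows up as several changed lines of JSON. Rich outputs like images, which are stored as encoded text, make this far worse.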

With the new integration, you can use DagsHub to view your existing notebook pull requests from GitHub, which means you can see rich diffs:

This is what happens when you try to view a notebook diff on GitHub 😔
This is how they appear on DagsHub 😇

And also comment on everything, including specific notebook cells!

This notebook is from the repo deadtrees by Christian Werner

2. Use GitHub Actions with your ML projects

Probably our most requested feature for a very long time has been using DagsHub with GitHub Actions. We listened to your requests, and you can now use GitHub Actions with PRs opened via DagsHub: data testing, automated training, and model deployment are all possible. You can integrate DagsHub Storage with your actions and trigger them when you open or merge pull requests, or push code & data. Stay tuned for additional examples on how to do this.

The sky is the limit! We can’t wait to see what you come up with 🥹

3. View and diff various data types

While GitHub hosts your code and lets you see & review it, your data often lives someplace else. Many times that storage will not have a good way of visualizing all your different data types, let alone tools that let you review them.

DagsHub lets you view a variety of data types hosted on the Git server, DagsHub Storage, or your own object storage. You can easily view photos, videos, audio, CSV files, and more!

Not only that, the data is versioned so you can show a diff of all of those formats, open a data PR, review the changes, and merge them once you're done.

DagsHub lets you see & review annotation files made using ‘DagsHub Annotation’

4. Stars ✨

To keep your long-earned reputation on GitHub, the repository stars move to DagsHub with you! Not only that, you can now get stars from both GitHub and DagsHub!

What’s Next?

What do you think of the improved GitHub integration? What would you like to see next? We'd love to hear from you.

Tell us via our Discord channel!

See you next time!
