Integration Guide

Data scientists want to use the best tools to tackle their challenges, and make their work as streamlined and enjoyable as possible.

At DAGsHub, we believe the best way to do that is not to reinvent the wheel, but to integrate the best tools for your needs. We want to support the entire data science project lifecycle by integrating with a wide range of open-source tools. Each tool provides part of the required capabilities, and hosting them under one roof creates a holistic solution for teamwork on data science projects without losing flexibility or modularity.

Here you will find all the tools that DAGsHub integrates with, along with links to their usage. We are always happy to hear our users' feedback on the existing integrations and on additional capabilities they would like us to add. We invite you to our Discord channel, where you can contact us directly.

GitHub, GitLab, Bitbucket, and other Git Servers

A question that comes up a lot is how DAGsHub differs from GitHub (or other Git servers). In short, DAGsHub adds many features and integrations dedicated to the machine learning and data science workflow. However, you don't need to choose between DAGsHub and GitHub.

DAGsHub is integrated with GitHub, enabling you to connect any GitHub project and enjoy the best of both worlds. If you prefer to host your project directly on DAGsHub, you can do that as well.

What is the integration scope?

  • You can sign up with your GitHub account.
  • You can connect an existing repository so that you continue pushing your code to GitHub while viewing your data, models, pipelines, and experiments on DAGsHub.
  • If you signed up with GitHub, you can also use a smart connection, which scans your existing GitHub repositories, shows their info, and lets you choose which one to connect.

DVC

What is DVC?

DVC is an open-source version control tool for machine learning projects designed to handle large files, data sets, machine learning models, and metrics. It works on top of Git, so that it can easily integrate with your existing Git code repositories.

What is the integration scope?

  • DAGsHub Storage – a DVC storage remote that is automatically configured when you create a DAGsHub repository, with up to 10 GB of free space per user.
  • View files tracked by DVC and hosted on DAGsHub Storage, AWS S3, or Google Cloud Storage in the repository UI.
  • Visualize DVC pipelines interactively with valuable information about the different files, metrics, and data processing steps.
  • Generate experiments using the output metrics files of DVC pipelines.
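
As a rough illustration of pointing a local DVC project at DAGsHub Storage, the sketch below only assembles the DVC commands rather than running them. It assumes the remote URL follows the pattern https://dagshub.com/<username>/<reponame>.dvc and that the remote uses basic authentication with an access token; neither detail is stated on this page, and the helper name `dagshub_dvc_remote_commands` is hypothetical.

```python
def dagshub_dvc_remote_commands(username, reponame, remote_name="origin"):
    """Build the DVC CLI calls that register DAGsHub Storage as a remote.

    Assumption: the remote URL is https://dagshub.com/<username>/<reponame>.dvc
    and credentials are stored with --local so they stay out of Git.
    """
    url = f"https://dagshub.com/{username}/{reponame}.dvc"
    return [
        ["dvc", "remote", "add", remote_name, url],
        ["dvc", "remote", "modify", remote_name, "--local", "auth", "basic"],
        ["dvc", "remote", "modify", remote_name, "--local", "user", username],
        # "<token>" is a placeholder; substitute your own access token.
        ["dvc", "remote", "modify", remote_name, "--local", "password", "<token>"],
    ]


# Print the commands so they can be reviewed before running them in a shell.
for command in dagshub_dvc_remote_commands("<username>", "<reponame>"):
    print(" ".join(command))
```

Each inner list can also be passed to `subprocess.run` from inside a Git repository where DVC is initialized.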

MLflow

What is MLflow?

MLflow is an open-source platform to manage the machine learning lifecycle, including experimentation, reproducibility, deployment, and a central model registry.

What is the integration scope?

  • An MLflow tracking server that is automatically configured with every DAGsHub repository you create.
  • Experiments logged to the repository's MLflow tracking server are displayed in the experiment tab.
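
A minimal sketch of how a client might be pointed at the repository's tracking server is shown below. It assumes the tracking URI has the form https://dagshub.com/<username>/<reponame>.mlflow and that the server accepts basic authentication with an access token, which this page does not confirm. The helper `dagshub_mlflow_env` is a hypothetical name; the three environment variables are the standard ones MLflow reads.

```python
import os


def dagshub_mlflow_env(username, reponame, token):
    """Return the environment variables MLflow reads for a remote tracking server.

    Assumption: the tracking URI is https://dagshub.com/<username>/<reponame>.mlflow
    """
    return {
        "MLFLOW_TRACKING_URI": f"https://dagshub.com/{username}/{reponame}.mlflow",
        "MLFLOW_TRACKING_USERNAME": username,
        "MLFLOW_TRACKING_PASSWORD": token,
    }


# Placeholders only; replace them with real values before logging anything.
os.environ.update(dagshub_mlflow_env("<username>", "<reponame>", "<token>"))
```

With these variables set, ordinary MLflow calls such as `mlflow.log_param` and `mlflow.log_metric` inside `mlflow.start_run()` should send their data to the remote server instead of a local `mlruns` directory.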

Google Colab

What is Google Colab?

Google Colaboratory, or "Colab" for short, is a free Jupyter notebook environment that runs entirely in the cloud. It does not require any setup, can be shared easily with team members, and provides free access to GPUs.

What is the integration scope?

  • Clone, configure, and fully use a DAGsHub repository in the Colab runtime.
  • Log experiments live to a DAGsHub repository using MLflow.

Google Colab Examples

Jenkins

What is Jenkins?

Jenkins is the most popular and mature open-source tool for CI/CD and automation, and it can also be used to automate Data Science and Machine Learning workflows.

What is the integration scope?

  • DAGsHub has an official Jenkins plugin that automatically scans your DAGsHub repositories and executes custom pipelines on events such as Git branch pushes, Git tag creation, and pull requests.
  • The full capabilities are described in our Jenkins documentation.

Jenkins Examples

  • If you're looking for examples of how to create ML automations with Jenkins, check out this article series written by one of our community members:
    • Part 1 is a high-level overview of how Jenkins can be used in an ML project setting.
    • Part 2 is an in-depth guide to creating an ML project with Jenkins, where models are automatically trained when new code versions are pushed to DAGsHub.

Webhook

DAGsHub supports webhooks for repository events. You can find them on the settings page of your project:

https://dagshub.com/<username>/<reponame>/settings/hooks

What is the integration scope?

  • We currently support three payload formats: DAGsHub, Slack, and Discord.
  • All events are delivered as POST requests.
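
Since events arrive as POST requests, a receiving endpoint can be sketched with Python's standard library alone. The example below assumes the request body is JSON, which this page does not specify, and it does not assume any particular field names in the payload. `parse_event`, `WebhookHandler`, and `make_server` are hypothetical names chosen for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

received_events = []  # in-memory record of parsed payloads, for illustration only


def parse_event(body):
    """Decode a webhook body, assuming it is JSON; raises ValueError otherwise."""
    return json.loads(body or b"{}")


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly as many bytes as the sender declared.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            received_events.append(parse_event(body))
        except ValueError:
            self.send_response(400)  # body was not valid JSON
        else:
            self.send_response(204)  # accepted, nothing to return
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def make_server(host="127.0.0.1", port=8000):
    """Create the server without starting it; call serve_forever() to run."""
    return HTTPServer((host, port), WebhookHandler)
```

Calling `make_server().serve_forever()` starts the listener. The URL entered on the hooks settings page would need to point at a publicly reachable address for this process.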