FAQ
Q: I'm experiencing a problem or something isn't working. What should I do?
A: First, check whether it's a common problem with a solution on our Troubleshooting page. If not, head over to the support channel on our Discord server, where our team is waiting to help you out.
Q: So what IS DagsHub exactly?
A: DagsHub is a web platform for data version control and collaboration for data scientists and machine learning engineers.
Q: Seriously, I don’t get it... what is it?
A: It’s like GitHub for data science and machine learning.
Q: Why can’t I just use Git?
A: Basically, regular Git is not so good at versioning large files, which is important for many data science and machine learning projects.
git-lfs is an extension to Git that can be used to version large files, but that's only half of the problem.
Git and git-lfs don't version the data pipeline. So when one of the pipeline's components is modified, nothing tells you that its downstream outputs (e.g., the trained model) are stale and should be reproduced. You would have to manually make sure the downstream stages are rerun with the updated data/code. DagsHub's integrated tools can also skip cached stages and rerun only the stages affected by a change in your data pipeline.
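To make the "skip cached stages" idea concrete, here is a minimal Python sketch of the general technique: hash a stage's dependencies and rerun it only when a hash has changed. It is a conceptual illustration, not DVC's or DagsHub's actual implementation. The names `snapshot`, `run_stage`, and the JSON lock file are hypothetical, and SHA-256 is used purely as a convenient hash.

```python
import hashlib
import json
from pathlib import Path


def snapshot(deps):
    """Map each dependency path to a hash of its current contents."""
    return {str(p): hashlib.sha256(Path(p).read_bytes()).hexdigest() for p in deps}


def run_stage(deps, lock_file, action):
    """Run `action` only if a dependency changed since the last recorded run.

    Returns True if the stage ran, False if it was skipped as cached.
    """
    lock_file = Path(lock_file)
    current = snapshot(deps)
    if lock_file.exists() and json.loads(lock_file.read_text()) == current:
        return False  # hashes match the last run: nothing to reproduce
    action()
    # Record the hashes this run was based on (hypothetical lock format).
    lock_file.write_text(json.dumps(current))
    return True
```

Calling `run_stage` twice in a row on unchanged inputs runs the action once. Editing any dependency makes the next call run it again.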
Using DagsHub's suite of tools, you can push and effectively version your large data files. The files themselves are stored outside Git, and small pointer files committed to the Git repository are used to retrieve the right version.
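If pointer files sound abstract, the Python sketch below shows the idea in miniature: the large file's content goes into a content-addressed store, and a tiny text file holding its hash is what you would commit to Git. It is an illustration under assumptions, not the real format. The function names, the `.pointer.json` suffix, and the use of SHA-256 are all hypothetical; DVC's real pointer files are YAML `.dvc` files.

```python
import hashlib
import json
import shutil
from pathlib import Path


def track_large_file(data_file, cache_dir):
    """Copy the file into a content-addressed cache and write a small pointer."""
    data_file, cache_dir = Path(data_file), Path(cache_dir)
    digest = hashlib.sha256(data_file.read_bytes()).hexdigest()
    cache_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(data_file, cache_dir / digest)  # cache entry is named by its hash
    pointer = data_file.with_name(data_file.name + ".pointer.json")
    pointer.write_text(json.dumps({"path": data_file.name, "sha256": digest}))
    return pointer  # this small file is what would be committed to Git


def restore_from_pointer(pointer, cache_dir):
    """Recreate the large file next to its pointer from the cached content."""
    pointer = Path(pointer)
    meta = json.loads(pointer.read_text())
    target = pointer.with_name(meta["path"])
    shutil.copy2(Path(cache_dir) / meta["sha256"], target)
    return target
```

Because the pointer changes whenever the data changes, checking out an old Git commit gives you the old pointer, which in turn identifies the matching version of the data.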
Q: So, then, does DagsHub do all of that stuff?
A: The short answer is YES.
The longer answer is that DagsHub is built on Git and DVC, an open-source command-line tool built for data and pipeline versioning. You use Git for the exact same things you would in a regular code project, and you use DVC on top for the DS/ML versioning stuff. DagsHub adds visualizations and automation features on top of that. We have connectors to both GitHub Actions and Jenkins, which let you automate things like training and deployment on top of Git and DVC.
Q: Does that mean I need to learn a whole new framework again?
A: The great thing about DVC is that it doesn’t affect code versioning. You still use plain old Git for that.
DVC adds commands for DS and ML on top of that, but the syntax is similar to Git, so it’s not entirely unfamiliar. Many everyday Git commands have a direct DVC counterpart, such as dvc add, dvc push, dvc pull, and dvc checkout.
Q: So why not just use Git and DVC through the command line?
A: In a nutshell: DagsHub is for DVC what GitHub is for Git.
DVC is great, and so is Git. But they are both command-line tools, and as such have some inconveniences which DagsHub works to resolve.
First of all, there is no convenient interface for visualizing your pipeline or getting an overview of your project metrics. DagsHub shows your pipeline as a... wait for it... DAG (!!!), where every node is a file, with important details and a direct link to the file itself. This is especially important for team projects, where you want everyone on the same page and seeing the same high-level picture.
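For anyone who has not met a DAG before, here is a small Python sketch of a pipeline where every node is a file, using the standard library's `graphlib`. The file names and the helpers `build_order` and `downstream_of` are made up for illustration, and this is not how DagsHub builds its graph. It only shows what treating a pipeline as a DAG buys you.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each file maps to the set of files it depends on.
pipeline = {
    "data/clean.csv": {"data/raw.csv", "src/clean.py"},
    "models/model.pkl": {"data/clean.csv", "src/train.py"},
    "metrics.json": {"models/model.pkl", "src/evaluate.py"},
}


def build_order(graph):
    """Return the files in an order where dependencies come before dependents."""
    return list(TopologicalSorter(graph).static_order())


def downstream_of(graph, changed):
    """Return every file that depends, directly or transitively, on `changed`."""
    affected = set()
    frontier = [changed]
    while frontier:
        node = frontier.pop()
        for target, deps in graph.items():
            if node in deps and target not in affected:
                affected.add(target)
                frontier.append(target)
    return affected
```

With this graph, `downstream_of(pipeline, "src/train.py")` returns the model and the metrics file but not the cleaned data, which is exactly the "what is affected by this change" question a pipeline view answers at a glance.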
You can send someone a link to your DagsHub repo, and give them a way to explore your project, including downloading your data and models from any past version, experiment, or branch, without forcing them to clone or run any code.
Building on the powerful foundations of Git and DVC, we have many more features in the works, which should make life easier for everyone.
Q: Most tools that offer data pipeline versioning require adding lines of code to my project and/or importing libraries. What do DagsHub and DVC require me to do?
A: NOTHING! This is why we love DVC so much. Just like Git, it is non-intrusive and not bloated. You just install the program and it works.
Q: Then surely, it works only for certain languages and with certain ML libraries?
A: Nope. Completely, 100% language and library agnostic. DVC and DagsHub don’t care if you’re using Python or R, Keras or PyTorch.
Q: Is DagsHub secure enough for my company/organization to use?
A: Yes, we have many companies and teams using DagsHub. There are varying levels of security depending on what you need. You can use our data storage, connect external storage with your own access management, or in more extreme cases install a private instance of DagsHub on your own cloud or physical servers.
Q: OK, but I like GitHub, and that’s what I’m using for my project. So you can’t help me, right?
A: Actually, we can. You can connect a project from GitHub to DagsHub and enjoy the best of both worlds! The repository on DagsHub will be subscribed to GitHub webhooks and automatically synced on push. On top of that, Pull Requests & Issues created on GitHub are shown in DagsHub and vice versa. You can use DagsHub to review code, data & models, and when done, simply click the merge button to merge on both platforms.
Q: Sounds good... How much will it cost me?
A: Starting at a whopping $0, DagsHub is completely free for open source projects. Private repos are currently free with up to 2 additional collaborators. If you need more collaborators, early access to new features or other special requests, you can contact us through our plans page for more details.
Q: So how do I use DagsHub?
A: You can start with the tutorial.
Q: Does DagsHub support DVC 3.0?
A: Yes, DagsHub supports DVC 3.x!