
Collaborate on Machine Learning Projects with DagsHub and Hugging Face's Transformers

Integrations Mar 27, 2023

We heard you! After the incredible reception of PyCaret’s integration with DagsHub, we are thrilled to share our latest-and-greatest integration with Hugging Face’s Transformers library!

With the latest integration between Hugging Face's Transformers and DagsHub, you can log your experiments and artifacts to DagsHub's remote servers with only minor changes to your code. This includes versioning raw and processed data with DVC and DagsHub's Direct Data Access (DDA), as well as logging experiment metrics, parameters, and trained models with MLflow. The integration lets you keep using the familiar MLflow interface while also enabling you to collaborate with others, compare the results of different runs, and make data-driven decisions with ease.

💡  To use the integration, upgrade the library: pip install transformers --upgrade

What is Transformers?

Transformers is a brilliant open-source library that provides straightforward APIs for fine-tuning pre-trained, state-of-the-art transformer models on a specific task. It's very contributor-driven, adding new architectures every day and spanning several modalities across a large set of domains.

The APIs it exposes are high-level, allowing us to set up complicated optimization routines with minimal effort. It also includes a lot of hooks for external goodies, enabling logging and deployment integrations with virtually no legwork.
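To make that concrete, here is a minimal fine-tuning sketch using the Trainer API. The checkpoint, dataset, and hyperparameters below are illustrative choices, not part of the integration itself:

from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "distilbert-base-uncased"  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Tokenize a small subset of IMDB to keep the example quick
dataset = load_dataset("imdb")
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128),
    batched=True,
)
train_ds = dataset["train"].shuffle(seed=42).select(range(1000))
eval_ds = dataset["test"].select(range(500))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="outputs", num_train_epochs=1, per_device_train_batch_size=16),
    train_dataset=train_ds,
    eval_dataset=eval_ds,
)
trainer.train()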

What does the integration between DagsHub and Transformers include?

Transformers provides an out-of-the-box integration with MLflow, enabling users to log important metrics, data, and plots on their local machines or to a custom endpoint. This helps organize the research phase and manage projects as they move to production. However, to collaborate with teammates and share results, users have to either move to a third-party platform (e.g., sending screenshots on Slack) or set up and host their own servers manually. Additionally, when logging data there is no easy way to see how different processing methods affected the data in order to make qualitative decisions.
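For reference, the stock setup looks roughly like this: you point the built-in MLflow callback at a tracking server yourself. The URI below is a placeholder for a local or self-hosted server, and the environment variable names follow the MLflow callback in recent versions of the library:

import os

# Point the built-in MLflow callback at a self-hosted tracking server (placeholder URI)
os.environ["MLFLOW_TRACKING_URI"] = "http://localhost:5000"
os.environ["HF_MLFLOW_LOG_ARTIFACTS"] = "True"  # also log checkpoints as artifacts

from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="outputs",
    report_to=["mlflow"],  # enable only the MLflow integration
    logging_steps=50,
)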

This is where DagsHub comes into play.

DagsHub provides a remote MLflow server for each repository, enabling users to log experiments with MLflow and to view and manage the results and trained models from the built-in UI. Each DagsHub repository also includes fully configured object storage for data, models, and any other large files. These files are diffable, so users can see the changes between different versions of their data and models and understand the impact of those changes on their results.
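Concretely, every DagsHub repository exposes an MLflow tracking endpoint at the repository URL with an .mlflow suffix, so any MLflow client can log to it directly. A minimal sketch, with placeholder username, repository, and values (authentication is typically supplied via the MLFLOW_TRACKING_USERNAME and MLFLOW_TRACKING_PASSWORD environment variables):

import mlflow

# Placeholder repo; each DagsHub repo exposes <repo URL>.mlflow as a tracking server
mlflow.set_tracking_uri("https://dagshub.com/your-username/your-repo.mlflow")

with mlflow.start_run():
    mlflow.log_param("learning_rate", 5e-5)   # hyperparameters
    mlflow.log_metric("eval_accuracy", 0.91)  # results, visible in the DagsHub UI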


With the new integration between Transformers and DagsHub, you can now log experiments to your remote MLflow server hosted on DagsHub, diff experiments, and share them with your friends and colleagues. On top of that, you can version your raw and processed data using DVC and push it to DagsHub to view, diff, and share it with others. All of this is encapsulated in the new DagsHub Logger that is integrated into Transformers.


How to use the DagsHub Logger with Transformers?

You can integrate the DagsHub Logger into your training pipeline with four lines of code! Before you build your Trainer, add the following:

import os
import dagshub

# Connect to the DagsHub repository that hosts your remote MLflow server
dagshub.init('a-really-cool-repository-name', 'your-username')

# Optional: also log trained models and data as artifacts; if disabled, only metrics are logged
os.environ["HF_DAGSHUB_LOG_ARTIFACTS"] = "True"

Congratulations: the integration is complete! Transformers will automatically detect that the DagsHub integration is available and add its hook to your training pipeline.
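From here, nothing else changes. As a hedged sketch, reusing the model and datasets from the earlier fine-tuning example, a run might look like this (with report_to left at its default, the DagsHub callback is picked up automatically when the dagshub package is installed):

from transformers import Trainer, TrainingArguments

# Build the Trainer exactly as before; no logging-specific code is needed
trainer = Trainer(
    model=model,             # model from the earlier fine-tuning sketch
    args=TrainingArguments(output_dir="outputs", num_train_epochs=1, logging_steps=50),
    train_dataset=train_ds,  # tokenized datasets from the earlier sketch
    eval_dataset=eval_ds,
)
trainer.train()  # metrics (and artifacts, if enabled) are logged to your DagsHub MLflow server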

Conclusion

The new integration between Hugging Face and DagsHub makes it easy for you to log experiments, version data, and collaborate with others on machine learning projects. Give the DagsHub Logger a try and see how it can enhance your machine learning workflow with Transformers. Let us know how it goes on our community Discord, and if you have any enhancement requests, we’d love to enrich this integration and add more capabilities!


Jinen Setpal

Machine Learning Engineer @ DAGsHub. Research in Interpretable Model Optimization within CV/NLP.
