| title | description |
|---|---|
| Log and track Hugging Face Transformer experiments with DagsHub | Log and track Hugging Face Transformers experiments with DagsHub with minimal code changes for collaboration, reproducibility, and data-driven decisions. |
The Hugging Face Transformers library is an open-source machine-learning library built on top of PyTorch and TensorFlow that provides a set of pre-trained models for natural language processing tasks. With Hugging Face Transformers, developers and researchers can easily fine-tune the pre-trained models on their own datasets, or train their own models from scratch.
With DagsHub, you can easily log the experiments you run with Hugging Face Transformers to a remote server with minimal changes to your code.
This includes versioning raw and processed data with DVC, as well as logging experiment metrics, parameters, and trained models with MLflow. This integration enables you to continue using the familiar MLflow interface, while also facilitating collaboration with others, comparing results from different runs, and making data-driven decisions with ease.
DagsHub leverages the hooks provided by Hugging Face's Transformers library to inject code at specific points during the training run. These code snippets log information about the training run, such as metrics and artifacts, to the DagsHub remote, using information provided through environment variables set before the trainer runs.
Log your transformer experiments in 3 simple steps:
=== "MacOS, Linux, Windows"

    ```bash
    pip install dagshub
    ```
```python
import dagshub
import os

dagshub.init(repo_name='Repository-Name', repo_owner='Username')
os.environ["HF_DAGSHUB_LOG_ARTIFACTS"] = "True"  # optional; if disabled, only metrics are logged!
```
`dagshub.init` configures your DagsHub account and repository, including the remote MLflow tracking server and DagsHub Storage, on your local machine. If the repository you provide as input doesn't exist, it will automatically be created for you.

Running this command requires authenticating your DagsHub user. If you want to automate this process, you need to set your DagsHub token under the `DAGSHUB_USER_TOKEN` environment variable.
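For non-interactive environments (e.g. CI jobs or scheduled training runs), a minimal sketch of that token setup, where the token value below is a placeholder you replace with your own:

```python
import os

# Export the token before any dagshub call so that authentication
# does not fall back to an interactive (browser-based) login.
os.environ["DAGSHUB_USER_TOKEN"] = "your-dagshub-token"  # placeholder value

# With the token in place, dagshub.init() runs without prompting:
# import dagshub
# dagshub.init(repo_name='Repository-Name', repo_owner='Username')
```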
!!! Important "You need to set the environment variable before you initialize the Trainer"
??? Note "Optional Environment Variables"
    The following optional environment variables can be configured:

    ```python
    os.environ["HF_DAGSHUB_MODEL_NAME"] = "model name"  # name under which the trained model is logged
    os.environ["BRANCH"] = "branch"  # defaults to 'main'
    ```
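How the hook consumes these switches can be pictured with a simplified, illustrative sketch. This is not DagsHub's actual implementation, and treating an unset `HF_DAGSHUB_MODEL_NAME` as `None` is an assumption for illustration only:

```python
import os

def read_dagshub_config():
    """Illustrative only: collect the integration's environment
    switches before the Trainer starts."""
    return {
        # Artifact logging is opt-in via HF_DAGSHUB_LOG_ARTIFACTS.
        "log_artifacts": os.environ.get("HF_DAGSHUB_LOG_ARTIFACTS", "False").lower() == "true",
        # No default is assumed here for the model name; None means "unset".
        "model_name": os.environ.get("HF_DAGSHUB_MODEL_NAME"),
        # The branch defaults to 'main', as noted above.
        "branch": os.environ.get("BRANCH", "main"),
    }

os.environ["HF_DAGSHUB_LOG_ARTIFACTS"] = "True"
config = read_dagshub_config()
```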
=== "MacOS, Linux, Windows"

    ```python
    from transformers import TrainingArguments, Trainer

    training_args = TrainingArguments(output_dir="experiment-name")
    trainer = Trainer(..., args=training_args)
    ```
Great job! The integration is now complete. Transformers will automatically detect that the DagsHub integration is active and include our hook in your pipeline. Consequently, every run will be logged to your DagsHub repository.
The artifacts created during training are not overwritten if the same experiment is run multiple times. However, each experiment is still logged and can be tracked.
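Putting the steps together, a minimal end-to-end sketch. The model checkpoint and repository names below are placeholders, and the fine-tuning specifics follow the standard Transformers `Trainer` API rather than anything DagsHub-specific:

```python
import os
import dagshub
from transformers import (AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

# Steps 1-2: point the run at your DagsHub repository and enable artifact logging
# (environment variables must be set before the Trainer is initialized).
dagshub.init(repo_name='Repository-Name', repo_owner='Username')
os.environ["HF_DAGSHUB_LOG_ARTIFACTS"] = "True"

# Step 3: configure the Trainer as usual; the integration's hook picks up
# metrics (and artifacts, if enabled) during training.
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
training_args = TrainingArguments(output_dir="experiment-name")
trainer = Trainer(model=model, args=training_args)  # pass train_dataset/eval_dataset as in any Transformers run
# trainer.train()  # each call is logged as a run in your DagsHub repository
```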