Solving DVC's 'failed to push data' Errors by Adding an S3 Proxy to DagsHub Storage

DVC Mar 06, 2023

Over the past few months, we’ve been drastically improving DVC’s reliability when used with DagsHub. Before these improvements, if you ever had to push a lot of data to a DVC remote, you more than likely came across the following error:

ERROR: failed to push data to the cloud - 896 files failed to upload

For 99% of use cases, these errors are a thing of the past!

There are, however, still situations where you might run into these errors. Specifically, they can occur if you’re pushing over a slow internet connection, pushing large files, or both.

When this happens, the usual fix is to try again (and again). Luckily in these cases, DVC knows what was successfully pushed and only retries pushing the files that failed. While it’s great that it doesn’t have to start from scratch, it’s still an annoying, manual process.
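The "try again (and again)" routine can at least be scripted. Here is a minimal sketch, assuming bash; the retry function and the RETRY_DELAY variable are hypothetical names we made up for illustration, not part of DVC:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DVC): rerun a command until it
# succeeds or the maximum number of attempts is reached.
retry() {
  local max_attempts="$1"; shift
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    # Pause between attempts; RETRY_DELAY is an assumed knob, default 5s.
    sleep "${RETRY_DELAY:-5}"
  done
}
```

For example, retry 5 dvc push would attempt the push up to five times. Since DVC only re-sends the files that failed, each pass should have less to upload than the one before.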

But what are you going to do? ¯\(ツ)/¯

How to solve DVC ‘failed to push data’ errors

Here at DagsHub, we weren’t content to live with this bit of DVC friction. We demand improved consistency and reliability from our tools!

After researching the issue further, it dawned on us: there is an S3 plugin for DVC, aptly named dvc-s3, which supports pushing and pulling files to and from an Amazon S3-compatible store. The best part is that this plugin supports automatic retries for failed pushes!

So, we added an S3 proxy to DagsHub Storage to make it compatible with the S3 plugin.

To use it, you first need to install the dvc-s3 plugin:

pip install dvc-s3

After that, you just need to set up the S3 remote using the following commands:

dvc remote add origin-s3 s3://dvc
dvc remote modify origin-s3 endpointurl https://dagshub.com/<user-name>/<repo-name>.s3
dvc remote modify origin-s3 --local access_key_id <Token>
dvc remote modify origin-s3 --local secret_access_key <Token>
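If you set this up for more than one repository, the four commands can be wrapped in a small script. This is only a sketch, assuming bash; the function names and the DAGSHUB_TOKEN variable are hypothetical, chosen for illustration:

```shell
#!/usr/bin/env bash
# Sketch only: the function and variable names below are hypothetical,
# not part of DVC or DagsHub.

# Build the DagsHub S3 endpoint URL for a given user and repository.
dagshub_s3_endpoint() {
  local user="$1" repo="$2"
  printf 'https://dagshub.com/%s/%s.s3\n' "$user" "$repo"
}

# Run the four setup commands for one repository.
# Expects DAGSHUB_TOKEN to be set in the environment.
configure_origin_s3() {
  local user="$1" repo="$2"
  : "${DAGSHUB_TOKEN:?set DAGSHUB_TOKEN first}"
  dvc remote add origin-s3 s3://dvc
  dvc remote modify origin-s3 endpointurl "$(dagshub_s3_endpoint "$user" "$repo")"
  dvc remote modify origin-s3 --local access_key_id "$DAGSHUB_TOKEN"
  dvc remote modify origin-s3 --local secret_access_key "$DAGSHUB_TOKEN"
}
```

Calling configure_origin_s3 <user-name> <repo-name> from inside the repository would then do the same thing as the manual steps. The --local flag keeps the token in DVC's local config file, which is not tracked by Git.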

Once this is set up, you can use DVC as usual, specifying the origin-s3 remote like so:

dvc push -r origin-s3

And those ‘failed to push data to the cloud’ errors truly are a thing of the past!
