DagsHub automatically configures remote object storage for every repository, with 10 GB of free space. Anyone can use it, with no DevOps expertise or cloud provider billing account required. The storage can be used as a general-purpose storage bucket, or you can use DVC to get more advanced versioning capabilities.
!!! info "Connect External Storage Buckets"
    You can also get all the benefits of DagsHub Storage with your own storage bucket. To learn how to do that, go to the guide for connecting external storage.
Every repository is provided with two places where you can store the data your project needs:

- An S3-compatible storage bucket
- A DVC remote

You can use your access token to interact with either of them. Access control follows that of your repository: only repository writers can change the data, and if your repository is private, you control who can view and read the files.
Both of the storages are explorable through the web interface of the repository and through the Content API.
DVC data will be shown along with the Git repository files whenever we find any DVC pointer files (`.dvc`) that were pushed to Git, and the bucket is explorable through the "DagsHub Storage" entry in the "Storage Buckets" section on the homepage of your repository.
??? info "The DVC remote and the bucket are separate from each other"
    That means that files you push to the DVC remote won't show up in the storage bucket, and vice versa.
The DagsHub Python client has a function that can generate clients for the following S3 libraries:

- boto3
- s3fs

To get a client, add the following code, then use the client according to the library's documentation:
!!! note
    Most functions in the libraries ask for the name of the bucket as an argument. In those cases, use the name of the repository as the bucket name.
=== "boto3"
    ```python
    from dagshub import get_repo_bucket_client

    boto_client = get_repo_bucket_client("<user>/<repo>", flavor="boto")

    # Upload file
    boto_client.upload_file(
        Bucket="<repo>",  # name of the repo
        Filename="local.csv",  # local path of file to upload
        Key="remote.csv",  # remote path where to upload the file
    )

    # Download file
    boto_client.download_file(
        Bucket="<repo>",  # name of the repo
        Key="remote.csv",  # remote path from where to download the file
        Filename="local.csv",  # local path where to download the file
    )
    ```
=== "s3fs"
    ```python
    from dagshub import get_repo_bucket_client

    s3fs_client = get_repo_bucket_client("<user>/<repo>", flavor="s3fs")

    # Read from file
    with s3fs_client.open("<repo>/remote.csv", "rb") as f:
        print(f.read())

    # Write to file
    with s3fs_client.open("<repo>/remote.csv", "wb") as f:
        f.write(b"Content")

    # Upload file (can also upload directories)
    s3fs_client.put(
        "local.csv",  # local path of file/dir to upload
        "<repo>/remote.csv",  # remote path where to upload the file
    )
    ```
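Beyond uploads and downloads, you will often want to see what is already in the bucket. The following is a minimal sketch rather than an official recipe: `iter_keys` is a hypothetical helper name, and it assumes the DagsHub bucket answers boto3's standard `list_objects_v2` call the way other S3-compatible stores do.

```python
from typing import Any, Iterator


def iter_keys(client: Any, repo_name: str, prefix: str = "") -> Iterator[str]:
    """Yield object keys from the repository bucket, one page at a time.

    `client` is expected to be a boto3-style S3 client, for example the one
    returned by get_repo_bucket_client("<user>/<repo>", flavor="boto").
    Assumption: the bucket supports the ListObjectsV2 operation.
    """
    paginator = client.get_paginator("list_objects_v2")
    # The bucket name is the repository name, as noted above.
    for page in paginator.paginate(Bucket=repo_name, Prefix=prefix):
        # A page has no "Contents" entry when nothing matches the prefix.
        for obj in page.get("Contents", []):
            yield obj["Key"]
```

With a real client, usage would look like `for key in iter_keys(boto_client, "<repo>"): print(key)`.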
RClone is a convenient CLI tool that lets you synchronize data between different storages, whether local, FTP, or object storage.

To add the DagsHub storage bucket to RClone as a remote:

1. Run `rclone config` in your terminal.
2. Enter `n` to add a new remote.
3. Select `s3` as the storage type.
4. Select `Other` (the last option) as the provider.
5. Select `Enter AWS credentials in the next step.` to enter the credentials manually, using your DagsHub token as both the access key ID and the secret access key.
6. Set `endpoint` to `https://dagshub.com/api/v1/repo-buckets/s3/<user>` (where `<user>` is the owner of the repository).

The resulting config from RClone should look like this:
Configuration complete.
Options:
- type: s3
- provider: Other
- access_key_id: <token>
- secret_access_key: <token>
- endpoint: https://dagshub.com/api/v1/repo-buckets/s3/<user>
After the setup is done you can start using RClone with the bucket!
Here's an example of how you can copy a local folder to the bucket (assuming the name of the remote in RClone is `dagshub`):
rclone sync <local_path_to_folder> dagshub:<repo_name>/<remote_path_to_folder>
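If you would rather script the remote setup than step through the interactive prompts, the same settings can be rendered as a section of RClone's INI-style config file. This is a sketch under stated assumptions: `rclone_remote_section` is a hypothetical helper, the remote name `dagshub` is just the example name used above, and you would still need to place the output in your own RClone config file yourself.

```python
import configparser
import io


def rclone_remote_section(user: str, token: str, remote_name: str = "dagshub") -> str:
    """Render an RClone config section for the DagsHub bucket owned by `user`."""
    # interpolation=None keeps any '%' character in the token literal.
    config = configparser.ConfigParser(interpolation=None)
    config[remote_name] = {
        "type": "s3",
        "provider": "Other",
        # The DagsHub token is used for both credential fields.
        "access_key_id": token,
        "secret_access_key": token,
        "endpoint": f"https://dagshub.com/api/v1/repo-buckets/s3/{user}",
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder values; substitute your own user name and token.
    print(rclone_remote_section("<user>", "<token>"))
```

Because the section contains your token, treat the resulting file like any other credentials file.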
If your operating system supports FUSE, you can use s3fs-fuse to mount any S3-compatible bucket as a local filesystem. This includes DagsHub storage! After you mount a bucket, you can interact with it in your file explorer.
Here's how you can mount the DagsHub bucket at `/media/dagshub` (make sure that the path exists and you have access to it):
AWS_ACCESS_KEY_ID=<token> AWS_SECRET_ACCESS_KEY=<token> s3fs <repo_name> /media/dagshub -o url=https://dagshub.com/api/v1/repo-buckets/s3/<user> -o use_path_request_style
The `use_path_request_style` option is required for s3fs to work with our bucket. If you need to run it in the foreground rather than in the background, add the `-f` flag at the end.
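When the mount has to happen from a script, it can help to assemble the command and its environment explicitly. The sketch below only builds the argument list and environment shown above; `s3fs_mount_command` and `s3fs_environment` are hypothetical helper names, and actually running the result requires s3fs-fuse to be installed.

```python
import os
import shlex
from typing import Dict, List


def s3fs_mount_command(user: str, repo_name: str, mount_point: str,
                       foreground: bool = False) -> List[str]:
    """Build the s3fs argument list for mounting a DagsHub bucket."""
    argv = [
        "s3fs", repo_name, mount_point,
        "-o", f"url=https://dagshub.com/api/v1/repo-buckets/s3/{user}",
        "-o", "use_path_request_style",  # required for the DagsHub bucket
    ]
    if foreground:
        argv.append("-f")  # keep s3fs in the foreground
    return argv


def s3fs_environment(token: str) -> Dict[str, str]:
    """Copy the current environment and add the token as both AWS credentials."""
    env = dict(os.environ)
    env["AWS_ACCESS_KEY_ID"] = token
    env["AWS_SECRET_ACCESS_KEY"] = token
    return env


if __name__ == "__main__":
    # Placeholder values; substitute your own user and repository names.
    print(shlex.join(s3fs_mount_command("<user>", "<repo_name>", "/media/dagshub")))
```

To perform the mount, pass both pieces to `subprocess.run(argv, env=env, check=True)`.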
Open a terminal in your project directory, then paste and run the commands. The following commands are an example; copy the actual commands from the Remote dropdown:
dvc remote add origin s3://dvc
dvc remote modify origin endpointurl https://dagshub.com/<DagsHub-user-name>/hello-world.s3
dvc remote modify origin --local access_key_id <Token>
dvc remote modify origin --local secret_access_key <Token>
??? info "Why `--local`?"
    Everything you configure without `--local` will end up in the `.dvc/config` file, which is tracked by Git and will appear in your repository. Personal info like authentication details should always be kept local.
Note: You need to be inside a Git and DVC directory for this process to succeed. To learn how to do that, please follow the first part of the Get Started section.
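For automation such as CI jobs, the remote setup commands above can be generated rather than pasted. This is a rough sketch: `dvc_remote_commands` is a hypothetical helper, and the endpoint pattern `https://dagshub.com/<user>/<repo>.s3` is inferred from the example above, so prefer the exact values from the Remote dropdown if they differ.

```python
import shlex
from typing import List


def dvc_remote_commands(user: str, repo_name: str, token: str,
                        remote: str = "origin") -> List[List[str]]:
    """Return the DVC commands that configure a DagsHub remote."""
    # Assumed endpoint pattern, based on the example commands above.
    endpoint = f"https://dagshub.com/{user}/{repo_name}.s3"
    return [
        ["dvc", "remote", "add", remote, "s3://dvc"],
        ["dvc", "remote", "modify", remote, "endpointurl", endpoint],
        # Credentials use --local so they stay out of the Git-tracked .dvc/config.
        ["dvc", "remote", "modify", remote, "--local", "access_key_id", token],
        ["dvc", "remote", "modify", remote, "--local", "secret_access_key", token],
    ]


if __name__ == "__main__":
    # Placeholder values; substitute your own user name, repository, and token.
    for argv in dvc_remote_commands("<DagsHub-user-name>", "hello-world", "<Token>"):
        print(shlex.join(argv))
```

Each entry can be executed with `subprocess.run(argv, check=True)` from inside the Git and DVC directory.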
dvc pull -r origin
dvc push -r origin
If you have a use case not covered in the "Working with the S3 compatible storage bucket" section, here are the credentials you need to connect to the bucket:

- Bucket name: `<name of the repo>`
- Endpoint URL: `https://dagshub.com/api/v1/repo-buckets/s3/<username>`
- Access Key ID: `<Token>`
- Secret Access Key: `<Token>`
- Region: irrelevant. If your library or program requires a region, use the AWS default, `us-east-1`.
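As an illustration of using these credentials directly, here is a minimal sketch that wires them into a plain boto3 client without the DagsHub Python client. The helper names `bucket_connection_settings` and `make_boto3_client` are hypothetical; the keyword arguments are the standard ones accepted by `boto3.client`.

```python
from typing import Any, Dict


def bucket_connection_settings(username: str, repo_name: str, token: str,
                               region: str = "us-east-1") -> Dict[str, Any]:
    """Collect the connection details for a repository bucket."""
    return {
        "bucket": repo_name,  # the bucket name is the repository name
        "client_kwargs": {
            "endpoint_url": f"https://dagshub.com/api/v1/repo-buckets/s3/{username}",
            # The DagsHub token is used for both credential fields.
            "aws_access_key_id": token,
            "aws_secret_access_key": token,
            # The region is irrelevant; the AWS default satisfies libraries that need one.
            "region_name": region,
        },
    }


def make_boto3_client(settings: Dict[str, Any]) -> Any:
    """Create an S3 client from the settings (requires boto3 to be installed)."""
    import boto3  # imported lazily so the settings helper works without boto3

    return boto3.client("s3", **settings["client_kwargs"])
```

Other S3 libraries take the same four values under different parameter names, so the settings dictionary can serve as a single source for them.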
Objects:
Multipart uploads: