Repository Bucket

dagshub.storage.mount(repo: str, cache: bool = False, path: Path = None) → PathLike

Mounts a DAGsHub repository bucket to a local directory.

Warning

This function is only supported on Linux machines and on macOS via FUSE for macOS (FUSE-T or macFUSE). It may not work as expected on other operating systems due to differences in the handling of filesystem mounts.

Parameters:
  • repo – The repository in the format <repo_owner>/<repo_name>. This is used to determine the remote name and mount point.

  • cache – Optional. A boolean flag that enables or disables caching. If True, caching is enabled with the Rclone setting --vfs-cache-max-age 24h. Defaults to False.

  • path – Optional. A Path object specifying the custom mount path. If not provided, the mount directory is determined based on the current working directory and the repository name.

Note

This function, as well as sync(), uses Rclone. If Rclone is not installed, you will get instructions on how to install it.
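The following is a minimal sketch of how the mount() parameters could map onto an Rclone call. It is an assumption, not the library's implementation: it presumes that mount() runs rclone mount against the remote created by rclone_init(), that the bucket is named after the repository, and that the default mount directory is a folder named after the repository under the current working directory. The helper build_mount_command and the remote name dagshub-user are hypothetical.

```python
from __future__ import annotations

from pathlib import Path
from typing import List, Optional


def build_mount_command(
    repo: str,
    remote_name: str,
    conf_path: Path,
    cache: bool = False,
    path: Optional[Path] = None,
) -> List[str]:
    """Hypothetical sketch of an `rclone mount` call; not the dagshub implementation."""
    # repo must look like "<repo_owner>/<repo_name>"
    repo_owner, sep, repo_name = repo.partition("/")
    if not sep or not repo_owner or not repo_name or "/" in repo_name:
        raise ValueError(f"expected <repo_owner>/<repo_name>, got {repo!r}")

    # Assumption: the default mount point is ./<repo_name> under the working directory
    mount_dir = path if path is not None else Path.cwd() / repo_name

    cmd = [
        "rclone", "mount",
        "--config", str(conf_path),
        f"{remote_name}:{repo_name}",  # assumed "remote:bucket", bucket == repo name
        str(mount_dir),
    ]
    if cache:
        # The cache setting the documentation names for cache=True
        cmd += ["--vfs-cache-max-age", "24h"]
    return cmd
```

With cache=True the command gains the two cache arguments; with cache=False it is a plain mount of the bucket.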

dagshub.storage.unmount(repo, path=None)

Unmounts a previously mounted DAGsHub repository bucket from the local file system.

Parameters:
  • repo – The name of the repository. Used to determine the default mount point if a custom path is not provided.

  • path – Optional. A custom path to the mount point. If not provided, the default logic is used to determine the mount point based on the repository name.
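Because unmount() has to be called separately, it can help to pair it with mount() so the bucket is unmounted even if an error occurs while you work with the files. The context manager below is a hypothetical helper, not part of dagshub. It relies only on the signatures documented above and takes the two functions as arguments, so it can be exercised without Rclone installed.

```python
from __future__ import annotations

from contextlib import contextmanager
from os import PathLike
from typing import Callable, Iterator, Optional


@contextmanager
def mounted_bucket(
    repo: str,
    mount_fn: Callable[..., PathLike],
    unmount_fn: Callable[..., None],
    cache: bool = False,
    path: Optional[PathLike] = None,
) -> Iterator[PathLike]:
    """Hypothetical helper: mount a repo bucket and always unmount it afterwards."""
    mount_point = mount_fn(repo, cache=cache, path=path)
    try:
        yield mount_point
    finally:
        # Pass the actual mount point so unmount() does not have to re-derive it
        unmount_fn(repo, path=mount_point)


# Usage with the real functions (requires dagshub and Rclone):
#
#     import os
#     from dagshub.storage import mount, unmount
#
#     with mounted_bucket("user/my-repo", mount, unmount, cache=True) as root:
#         print(sorted(os.listdir(root)))
```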

dagshub.storage.sync(repo: str, local_path: str | PathLike, remote_path: str | PathLike)

Synchronizes the contents of a local directory with a specified remote directory in a DAGsHub repository using Rclone.

Parameters:
  • repo – A string in the <repo_owner>/<repo_name> format representing the target DAGsHub repository.

  • local_path – A Path object or string pointing to the local directory to be synchronized.

  • remote_path – A Path object or string representing the remote directory path relative to the DagsHub Storage bucket root.
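As a rough illustration of how the three parameters might combine into an Rclone call, here is a sketch that assumes sync() runs rclone sync from the local directory to remote:bucket/remote_path, with the bucket named after the repository. The helper build_sync_command, the remote name and the config path are hypothetical. If the underlying call is indeed rclone sync, keep in mind that Rclone's sync makes the destination match the source, which deletes remote files that do not exist locally.

```python
from __future__ import annotations

from os import PathLike, fspath
from pathlib import Path, PurePosixPath
from typing import List, Union


def build_sync_command(
    repo: str,
    local_path: Union[str, PathLike],
    remote_path: Union[str, PathLike],
    remote_name: str,
    conf_path: Path,
) -> List[str]:
    """Hypothetical sketch of an `rclone sync` call for sync(); not the dagshub code."""
    repo_owner, sep, repo_name = repo.partition("/")
    if not sep or not repo_owner or not repo_name or "/" in repo_name:
        raise ValueError(f"expected <repo_owner>/<repo_name>, got {repo!r}")

    # remote_path is relative to the bucket root: normalise it to POSIX form and
    # drop any leading "/" so it always lands under the bucket.
    remote = PurePosixPath(fspath(remote_path)).as_posix().lstrip("/")
    bucket = f"{remote_name}:{repo_name}"
    destination = bucket if remote in ("", ".") else f"{bucket}/{remote}"

    # Source first, destination second: rclone makes the destination match the source
    return ["rclone", "sync", "--config", str(conf_path), fspath(local_path), destination]
```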

dagshub.storage.rclone_init(repo_owner: str, conf_path: Path | None = None, update=False, quiet=False) → Tuple[str, Path]

Initializes or updates the Rclone configuration for a DAGsHub repository.

Parameters:
  • repo_owner – The owner of the repository. This is used to create a unique section in the Rclone configuration.

  • conf_path – Optional. The path to the Rclone configuration file. If not provided, the default path is used.

  • update – Optional. A boolean flag indicating whether to update the configuration if it already exists. Defaults to False.

  • quiet – Optional. A boolean flag that controls the output of the function. If False (the default), the function prints messages about its operation.

Returns:

A tuple of the Rclone remote name and the absolute path to the Rclone configuration file.
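To make the update and quiet flags concrete, here is a minimal sketch of the kind of per-owner section an Rclone S3 remote needs, written with Python's configparser. Everything specific here is an assumption rather than the library's behavior: the helper write_rclone_remote, the remote name scheme dagshub-<repo_owner>, the token argument, using the token as both access key and secret key, and the endpoint https://dagshub.com/api/v1/repo-buckets/s3/<repo_owner>. The default path ~/.config/rclone/rclone.conf is Rclone's usual location on Linux and macOS.

```python
from __future__ import annotations

import configparser
from pathlib import Path
from typing import Optional, Tuple


def write_rclone_remote(
    repo_owner: str,
    token: str,
    conf_path: Optional[Path] = None,
    update: bool = False,
    quiet: bool = False,
) -> Tuple[str, Path]:
    """Hypothetical sketch of what rclone_init() might write; not the dagshub code."""
    remote_name = f"dagshub-{repo_owner}"  # hypothetical naming scheme
    conf_path = (conf_path or Path.home() / ".config" / "rclone" / "rclone.conf").resolve()

    # interpolation=None so "%" characters in values are stored literally
    config = configparser.ConfigParser(interpolation=None)
    config.read(conf_path)  # a missing file is silently skipped

    # Keep an existing section unless update=True
    if config.has_section(remote_name) and not update:
        if not quiet:
            print(f"Remote {remote_name!r} already configured in {conf_path}")
        return remote_name, conf_path

    # Standard Rclone S3 backend keys; endpoint and credentials are assumptions
    config[remote_name] = {
        "type": "s3",
        "provider": "Other",
        "access_key_id": token,
        "secret_access_key": token,
        "endpoint": f"https://dagshub.com/api/v1/repo-buckets/s3/{repo_owner}",
    }
    conf_path.parent.mkdir(parents=True, exist_ok=True)
    with conf_path.open("w") as f:
        config.write(f)
    if not quiet:
        print(f"Wrote remote {remote_name!r} to {conf_path}")
    return remote_name, conf_path
```

One section per owner is enough because, as described for get_repo_bucket_client(), each repository is a separate bucket on the same endpoint.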

dagshub.get_repo_bucket_client(repo: str, flavor: Literal['boto', 's3fs'] = 'boto')

Creates an S3 client for the specified repository’s DagsHub storage bucket.

Available flavors are:
"boto" (Default): Returns a boto3.client with a predefined endpoint URL and credentials.

The name of the bucket is the name of the repository, and you will need to specify it for any request you make.

Example usage:

from dagshub import get_repo_bucket_client

boto_client = get_repo_bucket_client("user/my-repo")

# Upload file
boto_client.upload_file(
    Bucket="my-repo",      # name of the repo
    Filename="local.csv",  # local path of file to upload
    Key="remote.csv",      # remote path where to upload the file
)
# Download file
boto_client.download_file(
    Bucket="my-repo",      # name of the repo
    Key="remote.csv",      # remote path from where to download the file
    Filename="local.csv",  # local path where to download the file
)
"s3fs": Returns an s3fs.S3FileSystem with a predefined endpoint URL and credentials.

The name of the bucket is the name of the repository, and you will need to specify it for any request you make.

Example usage:

from dagshub import get_repo_bucket_client

s3fs_client = get_repo_bucket_client("user/my-repo", flavor="s3fs")

# Read from file
with s3fs_client.open("my-repo/remote.csv", "rb") as f:
    print(f.read())

# Write to file
with s3fs_client.open("my-repo/remote.csv", "wb") as f:
    f.write(b"Content")

# Upload a file (pass recursive=True to upload a directory)
s3fs_client.put(
    "local.csv",           # local path of the file/dir to upload
    "my-repo/remote.csv",  # remote path where to upload the file
)
Parameters:
  • repo – Name of the repository in the format <repo_owner>/<repo_name>.

  • flavor – One of the supported S3 client flavors: "boto" (default) or "s3fs".
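Since the bucket is always named after the repository, both flavors need the repository name split out of the <repo_owner>/<repo_name> string. The two helpers below, bucket_and_key and s3fs_path, are hypothetical conveniences (not part of dagshub) that derive the Bucket and Key arguments for the boto flavor and the "<bucket>/<key>" path for the s3fs flavor.

```python
from typing import Tuple


def bucket_and_key(repo: str, key: str) -> Tuple[str, str]:
    """Hypothetical helper: map "<repo_owner>/<repo_name>" plus a key to (Bucket, Key)."""
    repo_owner, sep, repo_name = repo.partition("/")
    if not sep or not repo_owner or not repo_name or "/" in repo_name:
        raise ValueError(f"expected <repo_owner>/<repo_name>, got {repo!r}")
    # The bucket is the repository name; leading slashes are dropped so the key
    # is relative to the bucket root (a choice made by this sketch).
    return repo_name, key.lstrip("/")


def s3fs_path(repo: str, key: str) -> str:
    """Hypothetical helper: build the "<bucket>/<key>" path used by the s3fs flavor."""
    bucket, clean_key = bucket_and_key(repo, key)
    return f"{bucket}/{clean_key}"


# Usage with the clients returned by get_repo_bucket_client():
#
#     bucket, key = bucket_and_key("user/my-repo", "data/remote.csv")
#     boto_client.upload_file(Bucket=bucket, Filename="local.csv", Key=key)
#     s3fs_client.put("local.csv", s3fs_path("user/my-repo", "data/remote.csv"))
```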