File Uploading

dagshub.upload_files(repo: str, local_path: str | IOBase, commit_message='Upload files using DagsHub client', remote_path: str = None, bucket: bool = False, **kwargs)

Upload file(s) into a repository.

Parameters:
  • repo – Repo name in the form of <username>/<reponame>.

  • local_path – File or directory to be uploaded.

  • commit_message (optional) – Specify a commit message.

  • remote_path – Path in the repository to upload the file to. Defaults to local_path expressed relative to the current working directory (CWD).

  • bucket – Upload the file(s) to the DagsHub Storage bucket.

For the remaining keyword arguments, see Repo.upload().
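As an illustration of the call above, here is a minimal sketch, assuming the dagshub package is installed and authentication is already set up. The wrapper upload_folder, its upload_fn parameter, and the commit message are hypothetical and not part of the client; upload_fn exists only so the call can be exercised without network access.

```python
import re

# Hypothetical helper, not part of the DagsHub client.
# Checks the documented "<username>/<reponame>" shape before uploading.
_REPO_PATTERN = re.compile(r"^[^/\s]+/[^/\s]+$")


def upload_folder(repo, local_path, remote_path=None, upload_fn=None):
    """Validate the repo name, then delegate to dagshub.upload_files."""
    if not _REPO_PATTERN.match(repo):
        raise ValueError(
            f"repo must look like '<username>/<reponame>', got {repo!r}"
        )
    if upload_fn is None:
        # Imported lazily so the helper can be tried without the client installed.
        import dagshub

        upload_fn = dagshub.upload_files
    return upload_fn(
        repo,
        local_path,
        commit_message="Add raw data",
        remote_path=remote_path,
    )
```

A call such as upload_folder("<username>/<reponame>", "data/raw", remote_path="data/raw") would then forward to dagshub.upload_files with the same arguments.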

dagshub.upload.create_repo(repo_name: str, org_name: str = '', description: str = '', private: bool = False, auto_init: bool = False, gitignores: str = 'Python', license: str = '', readme: str = '', template: str = 'custom', host: str = '')

Creates a repository on DagsHub for the current user or for the organization given in org_name.

Parameters:
  • repo_name – Name of the repository to be created.

  • org_name (optional) – Organization that will own the repo. Alternative to creating a repository owned by you.

  • description – Repository description.

  • private – Set to True to make the repository private.

  • auto_init – Set to True to create an initial commit with README, .gitignore and LICENSE.

  • gitignores – Which gitignore template(s) to use, as a comma-separated string.

  • license – Which license file to use.

  • readme – Readme template to initialize with.

  • template

    Which project template to use, options are:

    • "none" - creates an empty repo

    • "custom" - creates a repo with your specified gitignores, license and readme

    • "notebook-template"

    • "cookiecutter-mlops"

    • "cookiecutter-dagshub-dvc"

    By default, creates an empty repo if none of gitignores, license or readme were provided. Otherwise, the template is "custom".

  • host – URL of the DagsHub instance to host the repo on.

Note

To learn more about the templates, visit https://dagshub.com/docs/feature_guide/project_templates/

Returns:

Repo object of the repository created.

Return type:

Repo
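The sketch below shows one way to call create_repo while guarding against a mistyped template name. It assumes the dagshub package is installed and authenticated. The helper create_private_repo, the DOCUMENTED_TEMPLATES tuple, and the create_fn parameter are hypothetical; the template names are copied from the parameter list above.

```python
# Template names as listed in the parameter description above.
DOCUMENTED_TEMPLATES = (
    "none",
    "custom",
    "notebook-template",
    "cookiecutter-mlops",
    "cookiecutter-dagshub-dvc",
)


def create_private_repo(repo_name, template="custom", org_name="", create_fn=None):
    """Hypothetical wrapper: reject unknown templates, then call create_repo."""
    if template not in DOCUMENTED_TEMPLATES:
        raise ValueError(
            f"unknown template {template!r}; expected one of {DOCUMENTED_TEMPLATES}"
        )
    if create_fn is None:
        # Imported lazily so the helper can be tried without the client installed.
        from dagshub.upload import create_repo

        create_fn = create_repo
    return create_fn(
        repo_name,
        org_name=org_name,
        private=True,
        template=template,
    )
```

The client may accept template names that are not listed here, so treat the check as a convenience for catching typos rather than as the library's own validation.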

dagshub.upload.create_dataset(repo_name: str, local_path: str, glob_exclude: str = '', org_name: str = '', private=False)

Create a new repository on DagsHub and upload an entire dataset folder to it.

Parameters:
  • repo_name – Name of the repository to be created.

  • local_path – Local path of the dataset to upload.

  • glob_exclude – Glob pattern to exclude certain files from being uploaded.

  • org_name (optional) – Organization that will own the repo. Alternative to creating a repository owned by you.

  • private – Set to True to make the repository private.

Returns:

Repo object of the repository created.

Return type:

Repo

class dagshub.upload.Repo(owner: str, name: str, username: str | None = None, password: str | None = None, token: str | None = None, branch: str | None = None)
__init__(owner: str, name: str, username: str | None = None, password: str | None = None, token: str | None = None, branch: str | None = None)

Class that can be used to upload files into a repository on DagsHub.

Warning

This class is not thread-safe. Uploading files in parallel can lead to unexpected outcomes.

Parameters:
  • owner – User or org that owns the repository.

  • name – Name of the repository.

  • token (optional) – Token to use for authentication. If unset, uses the cached token or goes through OAuth.

  • username – Username to log in with (alternative to token).

  • password – Password to log in with (alternative to token).

  • branch – Branch to upload files to.

upload(local_path: str | IOBase, commit_message='Upload files using DagsHub client', remote_path: str = None, bucket: bool = False, **kwargs)

Upload a file or a directory to the repo.

Parameters:
  • local_path – Path to the file or directory to be uploaded.

  • commit_message – Commit message for the upload.

  • remote_path – Path in the repository to upload the file or directory to. If unspecified, it is set to local_path expressed relative to the current working directory (CWD). If local_path is not inside the CWD, remote_path is the last component of local_path.

  • bucket – Upload to the DagsHub Storage bucket (S3-compatible) without versioning. If this is set to True, commit_message is ignored.

The keyword arguments are forwarded to upload_files().
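The default rule for remote_path can be pictured with the following sketch. It is an illustration of the documented behavior only, not the client's implementation; the function default_remote_path and the example paths are hypothetical, and POSIX-style paths are assumed.

```python
import posixpath
from pathlib import PurePosixPath


def default_remote_path(local_path, cwd):
    """Illustrative only: mirrors the documented rule, not the client's code."""
    base = PurePosixPath(cwd)
    local = PurePosixPath(local_path)
    # Interpret a relative local_path against the working directory.
    absolute = local if local.is_absolute() else base / local
    # Collapse "." and ".." segments without touching the filesystem.
    absolute = PurePosixPath(posixpath.normpath(str(absolute)))
    try:
        # Inside CWD: keep the path relative to CWD.
        return absolute.relative_to(base).as_posix()
    except ValueError:
        # Outside CWD: fall back to the last path component.
        return absolute.name


print(default_remote_path("data/train.csv", "/home/u/proj"))               # data/train.csv
print(default_remote_path("/home/u/proj/data/train.csv", "/home/u/proj"))  # data/train.csv
print(default_remote_path("/tmp/other/file.bin", "/home/u/proj"))          # file.bin
```

Passing remote_path explicitly avoids depending on the directory the script happens to run from.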

upload_files(files: List[Tuple[PathLike, BinaryIO]], directory_path: str = '', commit_message: str | None = 'Upload files using DagsHub client', versioning: str = 'auto', new_branch: str = None, last_commit: str = None, force: bool = False, quiet: bool = False)

Upload a list of binary files to the specified directory. This function is lower level than upload(), but useful when you don’t have the files stored on the filesystem.

Parameters:
  • files – List of (path in repo, binary IO) tuples for the files to upload.

  • directory_path – Directory in the repo relative to which the files are uploaded.

  • commit_message – Commit message.

  • versioning – Which versioning system to use to upload a file. Possible options: "git", "dvc", "auto" (default, best effort guess).

  • new_branch – Create a new branch with this name.

  • last_commit – Consistency argument: the last revision of the files you want to commit on top of. Exists to prevent accidental overwrites of data.

  • force (bool) – Force the upload of a file even if it is already present on the server. Sets last_commit to be the tip of the branch.

  • quiet (bool) – Don’t show messages about starting/successfully completing an upload. Set to True when uploading a directory.
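Because upload_files() takes open binary streams, content generated in memory can be uploaded without first writing it to disk. The sketch below builds the files list from Python objects serialized as JSON. The helpers build_in_memory_files and upload_records, the "metrics" directory, and the commit message are hypothetical; repo is assumed to be a dagshub.upload.Repo instance.

```python
import io
import json


def build_in_memory_files(records):
    """Return [(path in repo, binary stream)] for a dict of name -> JSON-able object."""
    files = []
    for name, payload in records.items():
        data = json.dumps(payload).encode("utf-8")
        # Each entry pairs the target path with a readable binary stream.
        files.append((f"{name}.json", io.BytesIO(data)))
    return files


def upload_records(repo, records, directory_path="metrics"):
    """Upload the serialized records in a single call to Repo.upload_files()."""
    # `repo` is expected to be a dagshub.upload.Repo instance.
    repo.upload_files(
        build_in_memory_files(records),
        directory_path=directory_path,
        commit_message="Add metrics",
        versioning="git",
    )
```

Here versioning="git" is chosen on the assumption that small text files are better tracked by Git; "dvc" or the default "auto" may suit larger artifacts.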

directory(path: str) → DataSet

Create a DataSet object that allows you to stage multiple files before pushing them all to DagsHub in a single commit with commit().

Parameters:

path – The path of the directory in the repository relative to which the files will be uploaded.

upload_files_to_bucket(local_path, remote_path, max_workers=8, **kwargs)

Upload a file or directory to an S3 bucket, preserving the directory structure.

Parameters:
  • local_path – Path to the local file or directory to upload.

  • remote_path – The directory path within the S3 bucket.

  • max_workers – The maximum number of threads to use.

class dagshub.upload.wrapper.DataSet(repo: Repo, directory: str)

Not to be confused with DataEngine’s datasets. This class represents a folder with files that are going to be uploaded to a repo.

add(file: str | IOBase, path=None)

Add a file to upload. The file will not be uploaded unless you call commit().

Parameters:
  • file – Path to the file on the filesystem OR the contents of the file.

  • path – Where to store the file in the repo.

add_dir(local_path, glob_exclude='', commit_message=None, **upload_kwargs)

Add and upload an entire directory to the DagsHub repository.

By default, the directory is uploaded as a DVC folder.

Parameters:
  • local_path – Local path of the directory to upload.

  • glob_exclude – Glob pattern to exclude some files from being uploaded.

  • commit_message – Message of the commit with the upload.

The keyword arguments are passed to Repo.upload_files().

commit(commit_message='Upload files using DagsHub client', *args, **kwargs)

Commit files added with add() to the repo.

Parameters:

commit_message – Message of the commit with the upload.

Other positional and keyword arguments are passed to Repo.upload_files().
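The staging workflow of directory(), add() and commit() can be sketched as follows. The helper stage_and_commit, the "data" directory, and the commit message are hypothetical; repo is assumed to be a dagshub.upload.Repo instance, and versioning="dvc" is just one of the documented options for Repo.upload_files().

```python
def stage_and_commit(repo, files, directory="data", message="Add files"):
    """Stage several files, then push them to DagsHub in a single commit.

    `files` is an iterable of (local file path, path in repo) pairs.
    """
    # `repo` is expected to be a dagshub.upload.Repo instance.
    dataset = repo.directory(directory)
    for local_file, repo_path in files:
        # Nothing is uploaded yet; add() only stages the file.
        dataset.add(file=local_file, path=repo_path)
    # Extra keyword arguments are passed on to Repo.upload_files().
    dataset.commit(message, versioning="dvc")
    return dataset
```

Since the Repo class is documented as not thread-safe, this sketch stages and commits sequentially from a single thread.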