File Uploading

dagshub.upload_files(repo, local_path, commit_message='Upload files using DagsHub client', remote_path=None, bucket=False, **kwargs)

Upload file(s) into a repository.

Parameters:
  • repo (str) – Repo name in the form of <username>/<reponame>.

  • local_path (Union[str, PathLike]) – File or directory to be uploaded.

  • commit_message (optional) – Specify a commit message.

  • remote_path (Optional[str]) – Path in the repository to upload the file(s) to. Defaults to local_path relative to the current working directory (CWD).

  • bucket (bool) – Upload the file(s) to the DagsHub Storage bucket.

For kwarg docs look at Repo.upload().
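
A minimal usage sketch, assuming the dagshub client is installed and you are already authenticated. The wrapper name upload_to_dagshub and its uploader parameter are hypothetical; they exist only so the call can be exercised without network access. The commit message is a placeholder.

```python
def upload_to_dagshub(repo, local_path, remote_path=None, *, uploader=None):
    """Upload local_path to repo via dagshub.upload_files (or a stand-in)."""
    # The docs above require the "<username>/<reponame>" form.
    if repo.count("/") != 1:
        raise ValueError("repo must look like '<username>/<reponame>'")
    if uploader is None:
        # Imported lazily so the sketch can be read without the client installed.
        from dagshub import upload_files as uploader
    # remote_path=None lets the client derive the destination from local_path.
    uploader(repo, local_path, commit_message="Add data", remote_path=remote_path)
```

Called with only repo and local_path, the sketch forwards to dagshub.upload_files and leaves remote_path at its default.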

dagshub.upload.create_repo(repo_name, org_name='', description='', private=False, auto_init=False, gitignores='Python', license='', readme='', template='custom', host='')

Creates a repository on DagsHub, owned either by the current user or by the organization passed in org_name.

Parameters:
  • repo_name (str) – Name of the repository to be created.

  • org_name (optional) – Organization that will own the repo. Alternative to creating a repository owned by you.

  • description (str) – Repository description.

  • private (bool) – Set to True to make repository private.

  • auto_init (bool) – Set to True to create an initial commit with README, .gitignore and LICENSE.

  • gitignores (str) – Which gitignore template(s) to use in a comma separated string.

  • license (str) – Which license file to use.

  • readme (str) – Readme template to initialize with.

  • template (str) –

    Which project template to use, options are:

    • "none" - creates an empty repo

    • "custom" - creates a repo with your specified gitignores, license and readme

    • "notebook-template"

    • "cookiecutter-mlops"

    • "cookiecutter-dagshub-dvc"

    By default, creates an empty repo if none of gitignores, license or readme were provided. Otherwise, the template is "custom".

  • host (str) – URL of the DagsHub instance to host the repo on.

Note

To learn more about the templates, visit https://dagshub.com/docs/feature_guide/project_templates/

Returns:

Repo object of the repository created.

Return type:

Repo
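
A hedged sketch of assembling the arguments for the "custom" template. The helper names custom_repo_kwargs and create_custom_repo are hypothetical, and any gitignore template names you pass are assumptions to check against your DagsHub instance. Only parameters listed in the signature above are used.

```python
def custom_repo_kwargs(gitignores=("Python",), license="", readme="", private=False):
    """Build keyword arguments for create_repo() using the "custom" template."""
    return {
        "private": private,
        "auto_init": True,  # initial commit with README, .gitignore and LICENSE
        "gitignores": ",".join(gitignores),  # documented as a comma separated string
        "license": license,
        "readme": readme,
        "template": "custom",
    }


def create_custom_repo(repo_name, org_name="", **options):
    """Create the repository; needs the dagshub client and valid credentials."""
    from dagshub.upload import create_repo  # lazy import, only needed for the real call

    return create_repo(repo_name, org_name=org_name, **custom_repo_kwargs(**options))
```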

dagshub.upload.create_dataset(repo_name, local_path, glob_exclude='', org_name='', private=False)

Create a new repository on DagsHub and upload an entire dataset folder to it.

Parameters:
  • repo_name (str) – Name of the repository to be created.

  • local_path (str) – Local path of the dataset to upload.

  • glob_exclude (str) – Glob pattern to exclude certain files from being uploaded.

  • org_name (optional) – Organization that will own the repo. Alternative to creating a repository owned by you.

  • private (bool) – Set to True to make the repository private.

Returns:

Repo object of the repository created.

Return type:

Repo
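
The sketch below previews which files a glob_exclude pattern might skip before calling create_dataset(). The client's exact matching rules are not stated here, so Python's fnmatch is used as a stand-in and the result should be read as an approximation. The names preview_excluded and publish_dataset are hypothetical.

```python
from fnmatch import fnmatch
from pathlib import Path


def preview_excluded(local_path, glob_exclude):
    """List files under local_path whose relative path matches glob_exclude."""
    if not glob_exclude:
        return []  # an empty pattern excludes nothing
    root = Path(local_path)
    relative = (p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())
    # Assumption: fnmatch semantics approximate the client's glob matching.
    return sorted(name for name in relative if fnmatch(name, glob_exclude))


def publish_dataset(repo_name, local_path, glob_exclude=""):
    """Create the repo and upload the folder; needs the dagshub client and credentials."""
    from dagshub.upload import create_dataset  # lazy import for the real call

    return create_dataset(repo_name, local_path, glob_exclude=glob_exclude)
```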

class dagshub.upload.Repo(owner, name, username=None, password=None, token=None, branch=None)
__init__(owner, name, username=None, password=None, token=None, branch=None)

Class that can be used to upload files into a repository on DagsHub.

Warning

This class is not thread safe. Uploading files in parallel can lead to unexpected outcomes.

Parameters:
  • owner (str) – User or organization that owns the repository.

  • name (str) – Name of the repository.

  • token (optional) – Token to use for authentication. If unset, uses the cached token or goes through OAuth.

  • username (Optional[str]) – Username to log in with (alternative to token).

  • password (Optional[str]) – Password to log in with (alternative to token).

  • branch (Optional[str]) – Branch to upload files to.
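
Because the class is documented as not thread safe, one possible mitigation is to funnel all uploads through a lock. This is a minimal sketch, not part of the client: SerializedRepo is a hypothetical wrapper, and it only serializes calls made through the same wrapper instance. Whether separate Repo instances pointing at the same repository are safe to use in parallel is not stated here.

```python
import threading


class SerializedRepo:
    """Wrap a Repo-like object so only one upload() runs at a time."""

    def __init__(self, repo):
        self._repo = repo
        self._lock = threading.Lock()

    def upload(self, *args, **kwargs):
        # Threads queue up here; each upload finishes before the next starts.
        with self._lock:
            return self._repo.upload(*args, **kwargs)
```

Worker threads can then share one SerializedRepo instance instead of calling Repo.upload() directly.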

upload(local_path, commit_message='Upload files using DagsHub client', remote_path=None, bucket=False, **kwargs)

Upload a file or a directory to the repo.

Parameters:
  • local_path (Union[str, PathLike]) – Path to the file or directory to be uploaded.

  • commit_message – Specify a commit message.

  • remote_path (Optional[str]) – Path to upload the file or directory to. If unspecified, defaults to local_path relative to the current working directory (CWD). If local_path is not inside the CWD, remote_path is the last component of local_path.

  • bucket (bool) – Upload to the DagsHub Storage bucket (S3-compatible) without versioning. If this is set to True, commit_message will be ignored.

The kwargs are the parameters of upload_files()
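
The default for remote_path described above can be illustrated with pathlib. This is a sketch of the documented rule only, not the client's implementation. The function name default_remote_path and its cwd parameter are hypothetical.

```python
from pathlib import Path


def default_remote_path(local_path, cwd=None):
    """Return the remote path the documented default rule would pick."""
    cwd = Path(cwd) if cwd is not None else Path.cwd()
    path = Path(local_path)
    if not path.is_absolute():
        path = cwd / path  # interpret relative inputs against the CWD
    try:
        # Inside the CWD: keep the part of the path below it.
        return path.relative_to(cwd).as_posix()
    except ValueError:
        # Outside the CWD: fall back to the last path component.
        return path.name
```

Under this sketch, a file at data/a.csv below the working directory keeps the remote path data/a.csv, while a file outside the working directory is reduced to its file name.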

upload_files(files, directory_path='', commit_message='Upload files using DagsHub client', versioning='auto', new_branch=None, last_commit=None, force=False, quiet=False)

Upload a list of binary files to the specified directory. This function is lower level than upload(), but useful when you don’t have the files stored on the filesystem.

Parameters:
  • files (List[Tuple[str, BinaryIO]]) – List of Tuples of (path in repo, binaryIO) of files to upload

  • directory_path (str) – Directory in repo relative to which to upload files

  • commit_message (Optional[str]) – Commit message

  • versioning (str) – Which versioning system to use to upload a file. Possible options: "git", "dvc", "auto" (default, best effort guess)

  • new_branch (Optional[str]) – Create a new branch with this name

  • last_commit (Optional[str]) – Consistency argument - last revision of the files you want to commit on top of. Exists to prevent accidental overwrites of data.

  • force (bool) – Force the upload of a file even if it is already present on the server. Sets last_commit to be the tip of the branch

  • quiet (bool) – Don’t show messages about starting/successfully completing an upload. Set to True when uploading a directory
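
Since upload_files() takes (path in repo, BinaryIO) tuples, content that exists only in memory can be wrapped in io.BytesIO. A minimal sketch follows; the helper names in_memory_files and upload_generated are hypothetical, and repo is assumed to be a Repo instance (or any object with the same upload_files() method).

```python
import io


def in_memory_files(contents):
    """Turn {path_in_repo: bytes} into the (path, BinaryIO) tuples upload_files() takes."""
    return [(path, io.BytesIO(data)) for path, data in contents.items()]


def upload_generated(repo, contents, directory_path="", versioning="auto"):
    """Upload in-memory content in one call; versioning is "git", "dvc" or "auto"."""
    repo.upload_files(
        in_memory_files(contents),
        directory_path=directory_path,
        commit_message="Add generated files",  # placeholder message
        versioning=versioning,
    )
```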

directory(path) → DataSet

Create a DataSet object that allows you to stage multiple files before pushing them all to DagsHub in a single commit with commit().

Parameters:

path (str) – The path of the directory in the repository relative to which the files will be uploaded.

Return type:

DataSet

upload_files_to_bucket(local_path, remote_path, max_workers=8, **kwargs)

Upload a file or directory to an S3 bucket, preserving the directory structure.

Parameters:
  • local_path (Path) – Path to the local directory or file to upload

  • remote_path (str) – The directory path within the S3 bucket

  • max_workers (int) – The maximum number of threads to use

class dagshub.upload.wrapper.DataSet(repo, directory)

Not to be confused with DataEngine’s datasets. This class represents a folder with files that are going to be uploaded to a repo.

add(file, path=None)

Add a file to upload. The file will not be uploaded unless you call commit()

Parameters:
  • file (Union[str, BinaryIO]) – Path to the file on the filesystem OR the contents of the file.

  • path (Union[str, Path, None]) – Where to store the file in the repo.

add_dir(local_path, glob_exclude='', commit_message=None, **upload_kwargs)

Add and upload an entire directory to the DagsHub repository.

By default, the directory is uploaded as a DVC-tracked folder.

Parameters:
  • local_path (str) – Local path of the directory to upload.

  • glob_exclude – Glob pattern to exclude some files from being uploaded.

  • commit_message – Message of the commit with the upload.

The keyword arguments are passed to Repo.upload_files().

commit(commit_message='Upload files using DagsHub client', *args, **kwargs)

Commit the files added with add() to the repo.

Parameters:

commit_message – Message of the commit with the upload.

Other positional and keyword arguments are passed to Repo.upload_files()
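
The staging workflow (directory(), then add(), then commit()) can be sketched as follows. The function name stage_and_commit is hypothetical, and repo is assumed to be a Repo instance or a compatible stand-in. File contents are passed as BinaryIO objects, one of the two input forms add() documents.

```python
import io


def stage_and_commit(repo, directory, files, commit_message="Add staged files"):
    """Stage in-memory files under `directory`, then push them in a single commit."""
    dataset = repo.directory(directory)  # returns a DataSet bound to that folder
    for path, data in files.items():
        # Nothing is uploaded yet; add() only stages the file.
        dataset.add(io.BytesIO(data), path=path)
    dataset.commit(commit_message)  # one commit for everything staged above
    return dataset
```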