File Uploading¶
- dagshub.upload_files(repo: str, local_path: str | IOBase, commit_message='Upload files using DagsHub client', remote_path: str = None, bucket: bool = False, **kwargs)¶
Upload file(s) into a repository.
- Parameters:
repo – Repo name in the form of <username>/<reponame>.
local_path – File or directory to be uploaded.
commit_message (optional) – Specify a commit message.
remote_path – Specify the path to upload the file to. Defaults to the relative component of local_path to CWD.
bucket – Upload the file(s) to the DagsHub Storage bucket.
For kwarg docs look at Repo.upload().
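A minimal usage sketch of the call described above. The repository name alice/demo and the path data/train.csv are placeholders, and the helper build_upload_kwargs is hypothetical (it is not part of the library); it only assembles arguments named in the signature. The upload itself needs the dagshub package, network access and valid credentials, so it is kept in a separate function that is not called here.

```python
def build_upload_kwargs(owner, repo_name, local_path, remote_path=None, bucket=False):
    """Hypothetical helper: assemble keyword arguments for dagshub.upload_files."""
    if "/" in owner or "/" in repo_name:
        raise ValueError("owner and repo_name must not contain '/'")
    kwargs = {
        "repo": f"{owner}/{repo_name}",  # documented form: <username>/<reponame>
        "local_path": local_path,
        "bucket": bucket,
    }
    # Only pass remote_path when set, so the documented default applies otherwise.
    if remote_path is not None:
        kwargs["remote_path"] = remote_path
    return kwargs


def run_upload(kwargs):
    """Perform the upload; requires the dagshub package and valid credentials."""
    import dagshub  # imported lazily so the helper above works without it

    dagshub.upload_files(**kwargs)


# Placeholder values, for illustration only.
kwargs = build_upload_kwargs("alice", "demo", "data/train.csv")
# run_upload(kwargs)  # uncomment to upload for real
```

Keeping argument assembly separate from the network call makes the arguments easy to inspect before anything is committed to the repository.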
- dagshub.upload.create_repo(repo_name: str, org_name: str = '', description: str = '', private: bool = False, auto_init: bool = False, gitignores: str = 'Python', license: str = '', readme: str = '', template: str = 'custom', host: str = '')¶
Creates a repository on DagsHub for the current user or an organization passed as an argument
- Parameters:
repo_name – Name of the repository to be created.
org_name (optional) – Organization that will own the repo. Alternative to creating a repository owned by you.
description – Repository description.
private – Set to True to make the repository private.
auto_init – Set to True to create an initial commit with README, .gitignore and LICENSE.
gitignores – Which gitignore template(s) to use in a comma separated string.
license – Which license file to use.
readme – Readme template to initialize with.
template – Which project template to use. Options are:
"none" - creates an empty repo
"custom" - creates a repo with your specified gitignores, license and readme
"notebook-template"
"cookiecutter-mlops"
"cookiecutter-dagshub-dvc"
By default, creates an empty repo if none of gitignores, license or readme were provided. Otherwise, the template is "custom".
host – URL of the DagsHub instance to host the repo on.
Note
To learn more about the templates, visit https://dagshub.com/docs/feature_guide/project_templates/
- Returns:
Repo object of the repository created.
- Return type:
Repo
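The parameters above can be assembled as in the following sketch. The helper build_create_repo_kwargs is hypothetical and only checks the template name against the options listed above; the repository name demo and the second gitignore template name are placeholders that may not exist on DagsHub. The real call needs the dagshub package and credentials, so it is isolated in run_create_repo and not invoked here.

```python
# Template names copied from the option list in the documentation above.
DOCUMENTED_TEMPLATES = {
    "none",
    "custom",
    "notebook-template",
    "cookiecutter-mlops",
    "cookiecutter-dagshub-dvc",
}


def build_create_repo_kwargs(repo_name, gitignores=("Python",), template="custom", private=False):
    """Hypothetical helper: assemble keyword arguments for create_repo."""
    if template not in DOCUMENTED_TEMPLATES:
        raise ValueError(f"unknown template: {template!r}")
    return {
        "repo_name": repo_name,
        # gitignores is documented as a comma separated string.
        "gitignores": ",".join(gitignores),
        "template": template,
        "private": private,
    }


def run_create_repo(kwargs):
    """Create the repository; requires the dagshub package and valid credentials."""
    from dagshub.upload import create_repo  # lazy import

    return create_repo(**kwargs)


# "JupyterNotebooks" is a placeholder gitignore template name.
repo_kwargs = build_create_repo_kwargs("demo", gitignores=["Python", "JupyterNotebooks"], private=True)
# repo = run_create_repo(repo_kwargs)  # uncomment to create the repository
```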
- dagshub.upload.create_dataset(repo_name: str, local_path: str, glob_exclude: str = '', org_name: str = '', private=False)¶
Create a new repository on DagsHub and upload an entire folder dataset to it
- Parameters:
repo_name – Name of the repository to be created.
local_path – local path where the dataset to upload is located.
glob_exclude – glob pattern to exclude certain files from being uploaded.
org_name (optional) – Organization that will own the repo. Alternative to creating a repository owned by you.
private – Set to True to make the repository private.
- Returns:
Repo object of the repository created.
- Return type:
Repo
- class dagshub.upload.Repo(owner: str, name: str, username: str | None = None, password: str | None = None, token: str | None = None, branch: str | None = None)¶
- __init__(owner: str, name: str, username: str | None = None, password: str | None = None, token: str | None = None, branch: str | None = None)¶
Class that can be used to upload files into a repository on DagsHub
Warning
This class is not thread safe. Uploading files in parallel can lead to unexpected outcomes
- Parameters:
owner – user or org that owns the repository.
name – name of the repository.
token (optional) – Token to use for authentication. If unset, uses the cached token or goes through OAuth.
username – Username to log in with (alternative to token).
password – Password to log in with (alternative to token).
branch – Branch to upload files to.
- upload(local_path: str | IOBase, commit_message='Upload files using DagsHub client', remote_path: str = None, bucket: bool = False, **kwargs)¶
Upload a file or a directory to the repo.
- Parameters:
local_path – Path to file or directory to be uploaded
commit_message – Specify a commit message
remote_path – Specify the path to upload the file/dir to. If unspecified, it is set to local_path relative to the CWD. If local_path is not relative to the CWD, remote_path is the last component of local_path.
bucket – Upload to the DagsHub Storage bucket (S3-compatible) without versioning. If this is set to True, commit_message will be ignored.
The kwargs are the parameters of upload_files().
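The default for remote_path described above can be illustrated with a small pure-Python sketch. This is not the library's implementation; default_remote_path is a hypothetical function that mirrors the documented rule using POSIX-style paths, and it does not resolve ".." segments or symlinks. The directory and file names are placeholders.

```python
from pathlib import PurePosixPath


def default_remote_path(local_path, cwd):
    """Sketch of the documented rule: path relative to CWD, else last component."""
    base = PurePosixPath(cwd)
    local = PurePosixPath(local_path)
    if not local.is_absolute():
        # A relative local_path is interpreted against the working directory.
        local = base / local
    try:
        return str(local.relative_to(base))
    except ValueError:
        # local_path lies outside CWD: fall back to its last component.
        return local.name


inside = default_remote_path("/home/alice/project/data/train.csv", "/home/alice/project")
outside = default_remote_path("/tmp/export/model.pkl", "/home/alice/project")
```

Under these assumptions, a file below the working directory keeps its relative layout in the repo, while a file from elsewhere is placed at the top level under its own name.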
- upload_files(files: List[Tuple[PathLike, BinaryIO]], directory_path: str = '', commit_message: str | None = 'Upload files using DagsHub client', versioning: str = 'auto', new_branch: str = None, last_commit: str = None, force: bool = False, quiet: bool = False)¶
Upload a list of binary files to the specified directory. This function is lower level than upload(), but useful when you don’t have the files stored on the filesystem.
- Parameters:
files – List of Tuples of (path in repo, binaryIO) of files to upload
directory_path – Directory in repo relative to which to upload files
commit_message – Commit message
versioning – Which versioning system to use to upload a file. Possible options: "git", "dvc", "auto" (default, best effort guess).
new_branch – Create a new branch with this name.
last_commit – Consistency argument - last revision of the files you want to commit on top of. Exists to prevent accidental overwrites of data.
force (bool) – Force the upload of a file even if it is already present on the server. Sets last_commit to be the tip of the branch
quiet (bool) – Don’t show messages about starting/successfully completing an upload. Set to True when uploading a directory
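Because upload_files() takes (path in repo, binary stream) tuples, content that exists only in memory can be uploaded without writing it to disk first. The sketch below shows one way to build that list with io.BytesIO. The helper names, file names and commit message are hypothetical, and repo is assumed to be a dagshub.upload.Repo instance created elsewhere.

```python
import io


def build_in_memory_files(contents):
    """Turn {path_in_repo: bytes} into a list of (path, binary stream) tuples."""
    return [(path, io.BytesIO(data)) for path, data in contents.items()]


def upload_in_memory(repo, files, directory_path="", commit_message="Add generated files",
                     versioning="auto"):
    """Pass the prepared tuples to Repo.upload_files on the given repo object."""
    repo.upload_files(
        files,
        directory_path=directory_path,
        commit_message=commit_message,
        versioning=versioning,
    )


# Placeholder paths and contents, for illustration only.
files = build_in_memory_files({"notes/readme.txt": b"hello", "notes/config.json": b"{}"})

# With real credentials (assumed owner/name values):
# from dagshub.upload import Repo
# upload_in_memory(Repo("alice", "demo"), files, directory_path="generated")
```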
- directory(path: str) DataSet ¶
Create a DataSet object that allows you to stage multiple files before pushing them all to DagsHub in a single commit with commit().
- Parameters:
path – The path of the directory in the repository relative to which the files will be uploaded.
- upload_files_to_bucket(local_path, remote_path, max_workers=8, **kwargs)¶
Upload a file or directory to an S3 bucket, preserving the directory structure.
- Parameters:
local_path – Path to the local file or directory to upload
remote_path – The directory path within the S3 bucket
max_workers – The maximum number of threads to use
- class dagshub.upload.wrapper.DataSet(repo: Repo, directory: str)¶
Not to be confused with DataEngine’s datasets. This class represents a folder with files that are going to be uploaded to a repo.
- add(file: str | IOBase, path=None)¶
Add a file to upload. The file will not be uploaded unless you call commit().
- Parameters:
file – Path to the file on the filesystem OR the contents of the file.
path – Where to store the file in the repo.
- add_dir(local_path, glob_exclude='', commit_message=None, **upload_kwargs)¶
Add and upload an entire directory to the DagsHub repository.
By default, the directory is uploaded as a DVC-versioned folder.
- Parameters:
local_path – Local path of the directory to upload.
glob_exclude – Glob pattern to exclude some files from being uploaded.
commit_message – Message of the commit with the upload.
The keyword arguments are passed to Repo.upload_files().
- commit(commit_message='Upload files using DagsHub client', *args, **kwargs)¶
Commit files added with add() to the repo.
- Parameters:
commit_message – Message of the commit with the upload.
Other positional and keyword arguments are passed to Repo.upload_files().
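The stage-then-commit flow of DataSet (directory(), then add(), then commit()) can be sketched as follows. The function stage_and_commit is a hypothetical wrapper, not part of the library, and repo is assumed to be a dagshub.upload.Repo instance. How path is interpreted relative to the dataset directory is an assumption here; the file names are placeholders.

```python
def stage_and_commit(repo, directory, files, commit_message):
    """Stage several files in one DataSet and push them in a single commit.

    files is an iterable of (local file or file object, path in repo) pairs.
    """
    dataset = repo.directory(directory)
    for local_file, path_in_repo in files:
        # Nothing is uploaded at this point; add() only stages the file.
        dataset.add(file=local_file, path=path_in_repo)
    # A single commit() uploads everything staged above.
    dataset.commit(commit_message)
    return dataset


# With real credentials (assumed owner/name values and local files):
# from dagshub.upload import Repo
# stage_and_commit(
#     Repo("alice", "demo"),
#     "data",
#     [("train.csv", "train.csv"), ("test.csv", "test.csv")],
#     "Add train and test splits",
# )
```

Since the Repo class is documented as not thread safe, staging files sequentially and committing once is a simple way to group related files into one commit.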