Datapoint

class dagshub.data_engine.model.datapoint.Datapoint(datapoint_id: int, path: str, metadata: Dict[str, Any], datasource: 'Datasource')
datapoint_id: int

ID of the datapoint in the database

path: str

Path of the datapoint, relative to the root of the datasource

metadata: Dict[str, Any]

Dictionary with the metadata

datasource: Datasource

Datasource this datapoint is from

delete_metadata(*fields: str)

Delete metadata from this datapoint.

The deleted values can still be accessed with a versioned query whose time is set before the deletion.

Parameters:

fields – fields to delete
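A minimal sketch of calling delete_metadata only for fields that are currently set. The helper name drop_existing_fields is hypothetical and not part of the library; it assumes `datapoint` is a Datapoint obtained elsewhere and relies only on the documented `metadata` dictionary and `delete_metadata(*fields)`.

```python
from typing import Any, List


def drop_existing_fields(datapoint: Any, *fields: str) -> List[str]:
    """Delete only the fields that are present in datapoint.metadata.

    Hypothetical helper for illustration; returns the names it asked to delete.
    """
    # datapoint.metadata is the documented Dict[str, Any] of current values.
    present = [name for name in fields if name in datapoint.metadata]
    if present:
        # delete_metadata accepts the field names as positional arguments.
        datapoint.delete_metadata(*present)
    return present
```

Filtering first avoids a call when none of the requested fields exist on the datapoint.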

delete(force: bool = False)

Delete this datapoint.

  • This datapoint will no longer show up in queries.

  • Does not delete the datapoint’s file; only the datapoint’s data is removed from the datasource.

  • You can still query this datapoint and its associated metadata with versioned queries whose time is before the deletion time.

  • You can re-add this datapoint to the datasource by uploading new metadata to it with, for example, Datasource.metadata_context. This will create a new datapoint with a new id and new metadata records.

  • Datasource scanning will not add this datapoint back.

Parameters:

force – Skip the confirmation prompt
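A hedged sketch of deleting several datapoints without a confirmation prompt for each one. The helper delete_matching and its `should_delete` predicate are hypothetical; the only library calls assumed are the documented `delete(force=True)` method and the `path` attribute.

```python
from typing import Any, Callable, Iterable, List


def delete_matching(
    datapoints: Iterable[Any],
    should_delete: Callable[[Any], bool],
) -> List[str]:
    """Delete every datapoint accepted by should_delete; return their paths.

    Hypothetical helper for illustration, not part of the library.
    """
    deleted_paths: List[str] = []
    # Copy to a list so deleting does not interfere with the iteration.
    for dp in list(datapoints):
        if should_delete(dp):
            # force=True skips the confirmation prompt.
            dp.delete(force=True)
            deleted_paths.append(dp.path)
    return deleted_paths
```

Because force=True skips the prompt, it may be worth printing the returned paths from a dry run of the predicate before calling this.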

save()

Commit metadata changes made through one or more dictionary-style assignments.

Example:

specific_data_point['metadata_field_name'] = 42
specific_data_point.save()
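Several assignments can be committed with a single save() call. The sketch below assumes `datapoint` is a Datapoint obtained elsewhere; update_and_save is a hypothetical helper name, and the field names in `values` are whatever the caller passes.

```python
from typing import Any, Mapping


def update_and_save(datapoint: Any, values: Mapping[str, Any]) -> None:
    """Assign several metadata fields, then commit them with one save().

    Hypothetical helper for illustration, not part of the library.
    """
    for field, value in values.items():
        # Dictionary-style assignment stages the change on the datapoint.
        datapoint[field] = value
    # One save() commits all the assignments made above.
    datapoint.save()
```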
property download_url

URL that can be used to download the datapoint’s file from DagsHub

Type:

str

property path_in_repo

Path of the datapoint in the repo

Return type:

PurePosixPath
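Since path_in_repo is a PurePosixPath, the usual pathlib attributes such as `suffix` are available on it. A small sketch, assuming a collection of Datapoint objects obtained elsewhere; group_by_suffix is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def group_by_suffix(datapoints: Iterable[Any]) -> Dict[str, List[str]]:
    """Group repo paths by lower-cased file extension.

    Hypothetical helper for illustration, not part of the library.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for dp in datapoints:
        repo_path = dp.path_in_repo  # documented as a PurePosixPath
        groups[repo_path.suffix.lower()].append(str(repo_path))
    return dict(groups)
```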

get_blob(column: str, cache_on_disk=True, store_value=False) → bytes

Returns the blob stored in a binary column

Parameters:
  • column – where to get the blob from

  • cache_on_disk – whether to store the downloaded blob on disk. If you store the blob on disk, then it won’t need to be re-downloaded in the future. The contents of datapoint[column] will change to be the path of the blob on the disk.

  • store_value – whether to store the blob in memory on the field attached to this datapoint, which will make its value accessible later using datapoint[column]
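A minimal sketch of reading a blob without touching the disk or the datapoint's stored field. It assumes the blob in the given column holds UTF-8 encoded JSON, which is an assumption made for this example only; load_json_blob is a hypothetical helper name.

```python
import json
from typing import Any


def load_json_blob(datapoint: Any, column: str) -> Any:
    """Fetch a binary column and parse it as JSON.

    Hypothetical helper; assumes the blob is UTF-8 encoded JSON.
    """
    # cache_on_disk=False: do not write the downloaded blob to disk.
    # store_value=False: do not keep the bytes on datapoint[column].
    raw = datapoint.get_blob(column, cache_on_disk=False, store_value=False)
    return json.loads(raw.decode("utf-8"))
```

With both flags off, each call fetches the blob again; leaving cache_on_disk at its default of True avoids repeated downloads.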

download_file(target: PathLike | str | None = None, keep_source_prefix=True, redownload=False) → PathLike

Downloads the datapoint’s file to the location given by target

Parameters:
  • target – Where to download the file (either a directory, or the full path). If not specified, then downloads to the datasource's default location.

  • keep_source_prefix – If True, includes the prefix of the datasource in the download path.

  • redownload – Whether to redownload a file if it exists on the filesystem already.

Note

No hashsum checks are performed, so if the file may have been updated, set redownload to True.

Returns:

Path to the downloaded file
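A hedged sketch of downloading several datapoints into one directory. download_all and its `refresh` flag are hypothetical names; the directory is created up front on the assumption that an existing directory passed as target is treated as a directory rather than as a full file path.

```python
from pathlib import Path
from typing import Any, Iterable, List, Union


def download_all(
    datapoints: Iterable[Any],
    target_dir: Union[str, Path],
    refresh: bool = False,
) -> List[Path]:
    """Download each datapoint's file into target_dir; return local paths.

    Hypothetical helper for illustration, not part of the library.
    """
    target = Path(target_dir)
    # Assumption: creating the directory first makes target unambiguous.
    target.mkdir(parents=True, exist_ok=True)
    local_paths: List[Path] = []
    for dp in datapoints:
        # redownload=True forces a fresh copy, since no hashsum check is done.
        local = dp.download_file(
            target=target, keep_source_prefix=True, redownload=refresh
        )
        local_paths.append(Path(local))
    return local_paths
```

Passing refresh=True is the safer choice when the remote files may have changed since the last download.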