Datapoint

class dagshub.data_engine.model.datapoint.Datapoint(datapoint_id, path, metadata, datasource) → None
datapoint_id: int

ID of the datapoint in the database

path: str

Path of the datapoint, relative to the root of the datasource

metadata: Dict[str, Any]

Dictionary with the metadata

datasource: Datasource

Datasource this datapoint is from

delete_metadata(*fields)

Delete metadata from this datapoint.

The deleted values can still be accessed using a versioned query whose time is set to before the deletion.

Parameters:

fields (str) – fields to delete
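
Example (a minimal sketch; dp is assumed to be a Datapoint obtained from a query result, and the field names are illustrative):

dp.delete_metadata('label', 'confidence')  # 'label' and 'confidence' are hypothetical fields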

delete(force=False)

Delete this datapoint.

  • This datapoint will no longer show up in queries.

  • Does not delete the datapoint’s file; it only removes the datapoint from the datasource.

  • You can still query this datapoint and its associated metadata with versioned queries whose time is set to before the deletion time.

  • You can re-add this datapoint to the datasource by uploading new metadata to it with, for example, Datasource.metadata_context. This will create a new datapoint with a new ID and new metadata records.

  • Datasource scanning will not add this datapoint back.

Parameters:

force (bool) – Skip the confirmation prompt
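
Example (a minimal sketch; force=True is shown only to make the call non-interactive):

dp.delete(force=True)  # removes the datapoint from the datasource, skipping the prompt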

save()

Commit changes to metadata made with one or more dictionary-style assignments.

Example:

specific_data_point['metadata_field_name'] = 42  # stage a change via dictionary assignment
specific_data_point.save()  # commit the staged change to the datasource

property download_url

URL that can be used to download the datapoint’s file from DagsHub

Type:

str
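
For public repositories the URL can be fetched directly; below is a minimal sketch using the requests library. For private repositories the request must be authenticated with a DagsHub token, which is an assumption about your setup:

import requests

resp = requests.get(dp.download_url)  # add an auth header for private repos
resp.raise_for_status()
file_bytes = resp.content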

property path_in_repo

Path of the datapoint in the repository

Return type:

PurePosixPath
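
Example (a sketch; the printed path is illustrative):

print(dp.path_in_repo)  # e.g. PurePosixPath('data/images/img_001.jpg')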

get_blob(column, cache_on_disk=True, store_value=False) → bytes

Returns the blob stored in a binary column

Parameters:
  • column (str) – name of the binary column to get the blob from

  • cache_on_disk – whether to store the downloaded blob on disk. If you store the blob on disk, then it won’t need to be re-downloaded in the future. The contents of datapoint[column] will change to be the path of the blob on the disk.

  • store_value – whether to store the blob in memory on the field attached to this datapoint, which will make its value accessible later using datapoint[column]

Return type:

bytes
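
Example (a minimal sketch; 'annotation' is a hypothetical binary column name):

raw = dp.get_blob('annotation', cache_on_disk=False, store_value=True)
print(len(raw))  # raw bytes of the blob
raw_again = dp['annotation']  # available in memory because store_value=True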

download_file(target=None, keep_source_prefix=True, redownload=False) → PathLike

Downloads the datapoint’s file to the target directory or path

Parameters:
  • target (Union[PathLike, str, None]) – Where to download the file (either a directory, or the full path). If not specified, then downloads to datasource's default location.

  • keep_source_prefix – If True, includes the prefix of the datasource in the download path.

  • redownload – Whether to redownload a file if it exists on the filesystem already.

Note

We don’t do any checksum verification, so if the file might have been updated, set redownload to True

Return type:

PathLike

Returns:

Path to the downloaded file
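
Example (a minimal sketch; the target directory is illustrative):

local_path = dp.download_file(target='data/raw', redownload=True)
print(local_path)  # path to the downloaded file, including the datasource prefix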

get_version_timestamps(fields=None, from_time=None, to_time=None) → List[DatapointHistoryResult]

Get the timestamps of all versions of this datapoint in which the specified fields changed.

Parameters:
  • fields (Optional[List[str]]) – List of fields to check for changes. If None, all fields are checked.

  • from_time (Optional[datetime]) – Only search versions since this time. If None, the start time is unbounded

  • to_time (Optional[datetime]) – Only search versions until this time. If None, the end time is unbounded

Return type:

List[DatapointHistoryResult]

Returns:

List of objects with information about the versions.
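
Example (a minimal sketch; the 'label' field and the time window are illustrative):

from datetime import datetime, timezone

history = dp.get_version_timestamps(
    fields=['label'],
    from_time=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
for version in history:
    print(version.timestamp)  # datetime of the change, in UTC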

class dagshub.data_engine.client.models.DatapointHistoryResult(timestamp) → None
timestamp: datetime

Time of the version change for this datapoint. The timezone is always UTC.