Install DagsHub:
pip install dagshub
To stream this data directly from DagsHub:
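# Open the DagsHub repository as a streamed filesystem, using the current directory as the project root
from dagshub.streaming import DagsHubFilesystem
fs = DagsHubFilesystem(".", repo_url="https://dagshub.com/DagsHub-Datasets/klarna_productpage_dataset-dataset")
# List the contents of the public klarna-research-public-datasets S3 bucket
fs.listdir("s3://klarna-research-public-datasets/")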
Description
A collection of 51,701 product pages from 8,175 e-commerce websites across 8 markets (US, GB, SE, NL, FI, NO, DE, AT), with 5 manually labelled elements: the product price, name, and image, and the add-to-cart and go-to-cart buttons. The dataset was collected between 2018 and 2019 and is available both as MHTML files and as WebTraversalLibrary-format snapshots.
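MHTML is a MIME multipart/related container (RFC 2557), so Python's standard email package can unpack a snapshot without extra dependencies. The following is a minimal sketch, assuming each snapshot stores the page markup in a single text/html part with a Content-Location header, which is typical of browser-saved MHTML but not confirmed here for this dataset. The function name extract_html_from_mhtml is hypothetical.

```python
import email
from email import policy
from typing import Optional, Tuple


def extract_html_from_mhtml(raw: bytes) -> Tuple[Optional[str], str]:
    """Return (original URL, decoded HTML) from the first text/html part.

    Hypothetical helper: assumes the page markup lives in one text/html part.
    """
    # policy.default yields EmailMessage objects, whose get_content() decodes
    # the transfer encoding (e.g. quoted-printable) and the declared charset.
    msg = email.message_from_bytes(raw, policy=policy.default)
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            location = part["Content-Location"]  # None if the header is absent
            url = str(location) if location is not None else None
            return url, part.get_content()
    raise ValueError("no text/html part found in MHTML document")
```

Images, stylesheets, and other resources sit in sibling parts of the same multipart/related message, so the same msg.walk() loop can be extended to collect them by content type.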
Additional information
Documentation
Update frequency
The dataset is not expected to update frequently.
Managed by
Web Automation Research, Klarna
License
CC BY-NC-SA