Aesthetics Predictor

This repo shows examples of how to stream a subset of a large dataset using Direct Data Access (DDA).

Each training script (one for PyTorch and one for TensorFlow) expects a DagsHub Access Token to be in an environment variable named DAGSHUB_TOKEN. This should be set prior to training:

export DAGSHUB_TOKEN="..."

The scripts then read in this token and use it to authenticate with DagsHub:

import os
DAGSHUB_TOKEN = os.environ.get('DAGSHUB_TOKEN')
if DAGSHUB_TOKEN is None:
    raise RuntimeError('The DAGSHUB_TOKEN environment variable is not set')

import dagshub
dagshub.auth.add_app_token(DAGSHUB_TOKEN)

Once that's done, DDA can be set up with two lines of code:

from dagshub.streaming import install_hooks
install_hooks(project_root='.', repo_url='https://dagshub.com/DagsHub-Datasets/LAION-Aesthetics-V2-6.5plus', branch='main')

PyTorch Dataset and DataLoader

The PyTorch version of the code uses a custom Dataset to stream the images and aesthetics scores from the LAION Aesthetics dataset.

The LAIONAestheticsDataset streams the annotations file, which includes the image names, captions, and aesthetics scores. It also relies on the EfficientNetFeatureExtractor to stream the images and extract the features.

Streaming happens automatically as the code uses the standard open() function to read the annotations file and the PIL.Image.open() method to read the images. The installed hooks take care of all this transparently.

This code can be found in pytorch/data.py.
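As a rough illustration, a streaming map-style dataset along these lines might look as follows. This is a hedged sketch, not the repo's actual code: the class name, annotations format (tab-separated name, caption, score), and the omission of feature extraction are all assumptions. PyTorch's DataLoader accepts any object defining `__len__` and `__getitem__`, so the sketch works without subclassing `torch.utils.data.Dataset`:

```python
import csv
from PIL import Image  # Pillow; PIL.Image.open() is hooked by DDA


class AestheticsDatasetSketch:
    """Hypothetical map-style dataset; usable with torch.utils.data.DataLoader."""

    def __init__(self, annotations_path):
        # The plain open() call is intercepted by the DDA hooks, so the
        # annotations file is streamed from the remote repo on first access.
        self.samples = []
        with open(annotations_path) as f:
            for name, caption, score in csv.reader(f, delimiter="\t"):
                self.samples.append((name, float(score)))

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        name, score = self.samples[idx]
        # PIL.Image.open() is likewise hooked, streaming the image bytes.
        image = Image.open(name).convert("RGB")
        return image, score
```

Because both file reads go through the standard `open()` / `PIL.Image.open()` calls, no streaming-specific code appears in the dataset itself.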

TensorFlow data generator

The TensorFlow data generator, LAIONAestheticsDataGenerator, also relies on an EfficientNetFeatureExtractor to stream the images via PIL.Image.open() calls.

The annotations file is, however, streamed in the train_valid_split function, which determines ahead of time which samples belong to the training and validation sets.

This code can be found in tensorflow/data.py.
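As a sketch of the idea, an up-front split could stream the annotations file once and partition the sample names before training starts. The function name matches the README, but its signature, the tab-separated format, and the shuffle-then-slice strategy are assumptions for illustration:

```python
import random


def train_valid_split(annotations_path, valid_fraction=0.2, seed=42):
    """Decide ahead of time which samples belong to training vs. validation."""
    # open() is intercepted by the DDA hooks, so the annotations file
    # is streamed from the remote repository on demand.
    with open(annotations_path) as f:
        names = [line.split("\t")[0] for line in f if line.strip()]
    rng = random.Random(seed)  # a fixed seed keeps the split reproducible
    rng.shuffle(names)
    n_valid = int(len(names) * valid_fraction)
    return names[n_valid:], names[:n_valid]
```

Splitting by name up front, rather than per batch, guarantees the two sets stay disjoint across epochs.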
