Aesthetics Predictor

This repo shows examples of how to stream a subset of a large dataset using Direct Data Access (DDA).

Each training script (one for PyTorch, one for TensorFlow) expects a DagsHub access token in an environment variable named DAGSHUB_TOKEN. Set it before training:

export DAGSHUB_TOKEN="..."

The scripts then read the token and use it to authenticate:

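import os

import dagshub

# Read the token from the environment (None if the variable is unset)
DAGSHUB_TOKEN = os.environ.get('DAGSHUB_TOKEN')

# Register the token so that DagsHub requests are authenticated
dagshub.auth.add_app_token(DAGSHUB_TOKEN)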
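Because `os.environ.get` returns `None` when the variable is unset, it can help to fail early with a clear message instead of passing an empty token along. The following is a minimal sketch of such a check; `read_dagshub_token` is a hypothetical helper and is not part of the repository:

```python
import os


def read_dagshub_token(env_var: str = "DAGSHUB_TOKEN") -> str:
    """Return the token stored in env_var, or raise if it is missing or blank."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"{env_var} is not set; run `export {env_var}=...` before training"
        )
    return token
```

The returned string could then be passed to `dagshub.auth.add_app_token` in place of the raw environment lookup.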

Once that's done, DDA can be set up with two lines of code:

from dagshub.streaming import install_hooks
install_hooks(project_root='.', repo_url='https://dagshub.com/DagsHub-Datasets/LAION-Aesthetics-V2-6.5plus', branch='main')
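To give a feel for what installing hooks means, here is a toy illustration of the general idea: wrap `open()` so that a file missing on disk is fetched the first time it is read. This is not how `dagshub.streaming` is implemented; `install_toy_hook` and the `fetch` callback are hypothetical names made up for this sketch, and only `open()` is covered.

```python
import builtins
import os
from typing import Callable


def install_toy_hook(fetch: Callable[[str], bytes]) -> Callable[[], None]:
    """Wrap builtins.open so missing files are fetched before being read.

    fetch is a hypothetical callback that returns the bytes for a path.
    Returns a function that restores the original open().
    """
    original_open = builtins.open

    def hooked_open(file, mode="r", *args, **kwargs):
        is_read = "r" in mode and "+" not in mode
        if is_read and isinstance(file, str) and not os.path.exists(file):
            # Fetch on first access and cache the file locally.
            data = fetch(file)
            os.makedirs(os.path.dirname(file) or ".", exist_ok=True)
            with original_open(file, "wb") as fh:
                fh.write(data)
        return original_open(file, mode, *args, **kwargs)

    builtins.open = hooked_open

    def uninstall() -> None:
        builtins.open = original_open

    return uninstall
```

After the hook is installed, ordinary code that calls `open(path)` needs no changes: the first read triggers `fetch`, and later reads hit the local copy.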

PyTorch Dataset and DataLoader

The PyTorch version of the code uses a custom Dataset to stream the images and aesthetics scores from the LAION Aesthetics dataset.

The LAIONAestheticsDataset streams the annotations file, which includes the image names, captions, and aesthetics scores. It also relies on the EfficientNetFeatureExtractor to stream the images and extract the features.

Streaming happens automatically because the code uses the standard open() function to read the annotations file and the PIL.Image.open() method to read the images. The installed hooks handle the fetching transparently.

This code can be found in pytorch/data.py.
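As a rough illustration of the pattern described in this section, and not the code from the repository, the sketch below shows a map-style dataset that reads an annotations file with a plain `open()` call and delegates image handling to a feature extractor. The CSV layout (columns `image`, `caption`, `score`), the class name `AnnotationsDataset`, and the `extract` callable are all assumptions made for the example.

```python
import csv
from typing import Any, Callable, List, Tuple


class AnnotationsDataset:
    """Map-style dataset sketch: one (features, score) pair per annotation row.

    Assumed CSV columns (hypothetical): image, caption, score.
    """

    def __init__(self, annotations_path: str, extract: Callable[[str], Any]):
        # A plain open() call; with the streaming hooks installed, this is
        # the point where the annotations file would be fetched.
        with open(annotations_path, newline="", encoding="utf-8") as fh:
            self.rows: List[dict] = list(csv.DictReader(fh))
        self.extract = extract

    def __len__(self) -> int:
        return len(self.rows)

    def __getitem__(self, index: int) -> Tuple[Any, float]:
        row = self.rows[index]
        # extract stands in for the feature extractor, which opens the image.
        return self.extract(row["image"]), float(row["score"])
```

In a PyTorch script the class would typically subclass `torch.utils.data.Dataset` and return tensors, so that it can be wrapped in a `DataLoader`.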

TensorFlow data generator

The TensorFlow data generator, LAIONAestheticsDataGenerator, also relies on an EfficientNetFeatureExtractor to stream the images via PIL.Image.open() calls.

The annotations file is, however, streamed in the train_valid_split function, which determines ahead of time which samples belong to the training and validation sets.

This code can be found in tensorflow/data.py.
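Deciding the split ahead of time can be sketched as a seeded shuffle followed by a slice. The function below is an illustration only: `split_samples`, its parameters, and the default 20% validation share are assumptions, and it does not reproduce the repository's `train_valid_split`.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_samples(
    samples: Sequence[T], valid_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle samples with a fixed seed and split them into (train, valid)."""
    if not 0.0 <= valid_fraction <= 1.0:
        raise ValueError("valid_fraction must be between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_valid = int(len(shuffled) * valid_fraction)
    return shuffled[n_valid:], shuffled[:n_valid]
```

Using a fixed seed keeps the two sets stable across runs, so a sample never moves between training and validation when the script is restarted.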
