
Public Utility Data Liberation Project Dataset for Machine Learning

Install DagsHub:

pip install dagshub

To stream this data directly from DagsHub:

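from dagshub.streaming import DagsHubFilesystem

# Attach the DagsHub dataset repository to the current directory for streaming access
fs = DagsHubFilesystem(".", repo_url="https://dagshub.com/DagsHub-Datasets/catalyst-cooperative-pudl-dataset")

# List the top-level contents of the PUDL bucket
fs.listdir("s3://intake.catalyst.coop")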

Description

The Public Utility Data Liberation Project (PUDL) provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.

PUDL is an open-source data processing pipeline that makes US energy data easier to access and use programmatically. Hundreds of gigabytes of valuable data are published by US government agencies, but they are often difficult to work with. PUDL takes the original spreadsheets, CSV files, and databases and turns them into a unified resource. This allows users to spend more time on novel analysis and less time on data preparation.

This information allows users to explore the operating costs of individual power plants, and see how fuel costs impact the viability of different types of generation. It can highlight the competitiveness of renewable electricity in the market today. It can show how the generation mix of different utilities has evolved over time, and how the usage of individual power plants has changed as fuel prices have changed and more renewable generation has been brought online.
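As a rough illustration of the kind of operating-cost comparison described above, the sketch below computes fuel cost per megawatt-hour as heat rate (MMBtu/MWh) multiplied by fuel price (USD/MMBtu). The `PlantYear` record, its field names, the plant labels, and every number are made up for this example and are not taken from PUDL tables.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PlantYear:
    """One plant-year record. All values below are invented for illustration."""
    plant_name: str               # hypothetical plant label
    net_generation_mwh: float     # electricity generated over the year
    fuel_consumed_mmbtu: float    # heat content of the fuel burned
    fuel_price_per_mmbtu: float   # delivered fuel price in USD


def fuel_cost_per_mwh(record: PlantYear) -> float:
    """Fuel cost per MWh = heat rate (MMBtu/MWh) * fuel price (USD/MMBtu)."""
    heat_rate = record.fuel_consumed_mmbtu / record.net_generation_mwh
    return heat_rate * record.fuel_price_per_mmbtu


gas = PlantYear("example_gas_plant", 1_000_000.0, 7_000_000.0, 3.0)
coal = PlantYear("example_coal_plant", 1_000_000.0, 10_000_000.0, 2.0)

# Baseline comparison with the invented prices above.
costs = {p.plant_name: fuel_cost_per_mwh(p) for p in (gas, coal)}

# Sensitivity check: the same gas plant with a lower assumed fuel price.
cheaper_gas = replace(gas, fuel_price_per_mmbtu=2.5)
cheaper_gas_cost = fuel_cost_per_mwh(cheaper_gas)

for name, cost in sorted(costs.items(), key=lambda item: item[1]):
    print(f"{name}: {cost:.2f} USD/MWh")
print(f"{cheaper_gas.plant_name} at lower fuel price: {cheaper_gas_cost:.2f} USD/MWh")
```

With these invented inputs, the ranking of the two plants flips once the assumed gas price falls, which is the sort of fuel-price effect the data is meant to help explore. Fuel is only one part of operating cost, so a real analysis would need more than this calculation.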

The data hosted on Amazon Web Services is intended to be accessed through the PUDL Intake Catalog. The catalog lets users access the data through a uniform API for each data type (Parquet, SQL), handles local caching, and provides rich metadata about the data.
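To make the idea of local caching concrete, here is a minimal sketch using only the Python standard library. It is not how the PUDL Intake Catalog implements caching; `cached_fetch`, `fake_remote_fetch`, and the example key are hypothetical names chosen for illustration. The point is only that a remote read happens once and later reads are served from disk.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable, List


def cached_fetch(key: str, fetch: Callable[[str], bytes], cache_dir: Path) -> bytes:
    """Return the bytes for `key`, calling `fetch` only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the key so that any URL or path maps to a safe file name.
    cache_path = cache_dir / hashlib.sha256(key.encode("utf-8")).hexdigest()
    if cache_path.exists():
        return cache_path.read_bytes()
    data = fetch(key)
    cache_path.write_bytes(data)
    return data


calls: List[str] = []


def fake_remote_fetch(key: str) -> bytes:
    """Stand-in for a slow remote read; records each call it receives."""
    calls.append(key)
    return f"contents of {key}".encode("utf-8")


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "cache"
    first = cached_fetch("example/table.parquet", fake_remote_fetch, cache)
    second = cached_fetch("example/table.parquet", fake_remote_fetch, cache)

print(f"remote reads: {len(calls)}, identical results: {first == second}")
```

A real cache would also need an invalidation rule, for example keying on a data release version, since new versions of the data are published regularly.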

Additional information

Documentation

To access the data via the PUDL Intake Catalog, follow the setup instructions in the documentation.
You can learn more about the data in the PUDL data dictionary documentation.

Update frequency

The federal agencies that publish the raw data PUDL processes release new data monthly, quarterly, and yearly.
PUDL continuously improves the data and aims to release new versions monthly.

License

The PUDL data and documentation are published under the Creative Commons Attribution License v4.0 (CC-BY-4.0).

Related datasets

Atmospheric Models from Météo-France

CAFE60 reanalysis

Coupled Model Intercomparison Project Phase 5 (CMIP5) University of Wisconsin-Madison Probabilistic Downscaling Dataset

Earth Radio Occultation
