Install DagsHub:
pip install dagshub
To stream this data directly from DagsHub:
from dagshub.streaming import DagsHubFilesystem
fs = DagsHubFilesystem(".", repo_url="https://dagshub.com/DagsHub-Datasets/catalyst-cooperative-pudl-dataset")
fs.listdir("s3://intake.catalyst.coop")
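The listing above can be narrowed to particular file types. What follows is a minimal sketch, not part of the DagsHub API: the helper names `filter_by_suffix` and `list_data_files`, the `.parquet` and `.sqlite` suffixes, and the example file names are all hypothetical, and it assumes `fs.listdir` returns plain file-name strings the way `os.listdir` does. Only the `DagsHubFilesystem` constructor and the `listdir` call shown above are used.

```python
REPO_URL = "https://dagshub.com/DagsHub-Datasets/catalyst-cooperative-pudl-dataset"
BUCKET = "s3://intake.catalyst.coop"


def filter_by_suffix(names, suffixes=(".parquet", ".sqlite")):
    """Keep only the names that end with one of the given suffixes.

    The default suffixes are an assumption about how the files are named.
    """
    return [name for name in names if name.lower().endswith(tuple(suffixes))]


def list_data_files(path=BUCKET, suffixes=(".parquet", ".sqlite")):
    """List `path` through DagsHub streaming and keep the matching entries.

    Assumption: listdir returns an iterable of str names.
    """
    # Imported lazily so the pure helper above works without dagshub installed.
    from dagshub.streaming import DagsHubFilesystem

    fs = DagsHubFilesystem(".", repo_url=REPO_URL)
    return filter_by_suffix(fs.listdir(path), suffixes)


if __name__ == "__main__":
    # Hypothetical names, used only to show the filtering step offline.
    print(filter_by_suffix(["a.parquet", "notes.txt", "b.sqlite"]))
```

Calling `list_data_files()` needs network access and the `dagshub` package; `filter_by_suffix` can be tried on any list of names without either.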
Description
The Public Utility Data Liberation Project (PUDL) provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
PUDL is an open source data processing pipeline that makes US energy data easier to access and use programmatically. Hundreds of gigabytes of valuable data are published by US government agencies, but they are often difficult to work with. PUDL takes the original spreadsheets, CSV files, and databases and turns them into a unified resource. This allows users to spend more time on novel analysis and less time on data preparation.
This information allows users to explore the operating costs of individual power plants, and see how fuel costs impact the viability of different types of generation. It can highlight the competitiveness of renewable electricity in the market today. It can show how the generation mix of different utilities has evolved over time, and how the usage of individual power plants has changed as fuel prices have changed and more renewable generation has been brought online.
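As a rough illustration of how fuel costs feed into that kind of comparison, the fuel cost of generation can be approximated as heat rate (MMBtu per MWh) times fuel price (USD per MMBtu). The sketch below uses made-up plant names and numbers, not PUDL data, and the function name `fuel_cost_per_mwh` is hypothetical.

```python
def fuel_cost_per_mwh(heat_rate_mmbtu_per_mwh, fuel_price_usd_per_mmbtu):
    """Fuel cost of one MWh: heat rate (MMBtu/MWh) times fuel price (USD/MMBtu)."""
    return heat_rate_mmbtu_per_mwh * fuel_price_usd_per_mmbtu


# Illustrative values only: (heat rate, fuel price). These are not PUDL figures.
plants = {
    "gas_plant": (7.0, 4.0),
    "coal_plant": (10.0, 2.5),
}

costs = {name: fuel_cost_per_mwh(hr, price) for name, (hr, price) in plants.items()}
cheapest = min(costs, key=costs.get)

# Same comparison after a hypothetical drop in the gas price to 3.0 USD/MMBtu.
costs_after = dict(costs, gas_plant=fuel_cost_per_mwh(7.0, 3.0))
cheapest_after = min(costs_after, key=costs_after.get)

print(costs, cheapest)
print(costs_after, cheapest_after)
```

With these invented inputs the coal unit has the lower fuel cost at first, and the gas unit becomes cheaper once the gas price falls, which is the kind of shift the plant-level data lets users trace over time.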
The data hosted on Amazon Web Services is intended to be accessed through the PUDL Intake Catalog. The catalog allows users to access the data via a uniform API for each data type (Parquet, SQL), handles local caching, and provides rich metadata about the data.
Additional information
Documentation
To access the data via the PUDL Intake Catalog, follow the setup instructions in the documentation.
You can learn more about the data in the PUDL data dictionary documentation.
Update frequency
The federal agencies that publish the raw data PUDL processes release new data monthly, quarterly, and yearly.
The PUDL project continuously improves the data and aims to release new versions monthly.
Managed by
License
The PUDL data and documentation are published under the Creative Commons Attribution License v4.0 (CC-BY-4.0).