Cornell EAS Data Lake

Stream data with DagsHub Direct Data Access (DDA):

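import git  # GitPython
from dagshub.streaming import DagsHubFilesystem
# Replace <username> and <token> with your DagsHub username and access token
url = 'https://<username>:<token>@dagshub.com/DagsHub-Datasets/cornell-eas-data-lake-dataset.git'
# Clone the repository into ./cornell-eas-data-lake-dataset
git.Git("./").clone(url)
# Access files on demand, using the cloned directory as the project root
fs = DagsHubFilesystem("cornell-eas-data-lake-dataset")
# List the repository contents and the connected S3 bucket
print(list(fs.scandir("cornell-eas-data-lake-dataset")))
print(list(fs.listdir("cornell-eas-data-lake-dataset/s3://cornell-eas-data-lake")))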

Description:

Earth & Atmospheric Sciences at Cornell University has created a public data lake of climate data. The data is stored in a columnar storage format (ORC) so that it is straightforward to query with standard tools such as Amazon Athena or Apache Spark. The data was originally intended for building decision support tools for farmers and digital agriculture. The first dataset is the historical NDFD/NDGD data distributed by NCEP/NOAA/NWS. The NDFD (National Digital Forecast Database) and NDGD (National Digital Guidance Database) contain gridded forecasts and observations at 2.5 km resolution for the Contiguous United States (CONUS). There are also 5 km grids for several smaller US regions and non-contiguous territories, such as Hawaii, Guam, Puerto Rico and Alaska. NOAA distributes archives of the NDFD/NDGD through its NOAA Operational Model Archive and Distribution System (NOMADS) in GRIB2 format. The data has been converted to ORC to reduce storage space and, more importantly, to simplify access through standard data analytics tools.
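
Why a columnar format suits these queries can be illustrated with a small, stdlib-only sketch: row-oriented records are transposed into one list per field, so a query that needs a single variable reads only that list. The field names and values below are hypothetical and are not the data lake's actual schema. Real ORC files also compress each column and store per-stripe statistics, which this sketch omits.

```python
from typing import Any

# Hypothetical forecast records; field names are illustrative only and are
# NOT the actual schema of the Cornell EAS Data Lake tables.
rows: list[dict[str, Any]] = [
    {"valid_time": "2020-01-01T00:00Z", "lat": 42.44, "lon": -76.50, "temp_k": 270.1},
    {"valid_time": "2020-01-01T00:00Z", "lat": 42.46, "lon": -76.50, "temp_k": 270.4},
    {"valid_time": "2020-01-01T01:00Z", "lat": 42.44, "lon": -76.50, "temp_k": 269.8},
]


def to_columns(records: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Transpose row-oriented records into one list per column.

    Assumes every record has the same keys as the first one.
    """
    if not records:
        return {}
    columns: dict[str, list[Any]] = {name: [] for name in records[0]}
    for record in records:
        for name, value in record.items():
            columns[name].append(value)
    return columns


columns = to_columns(rows)

# A query that only needs temperatures touches a single column and skips the
# others, which is the access pattern columnar formats such as ORC optimize for.
mean_temp = sum(columns["temp_k"]) / len(columns["temp_k"])
print(f"{mean_temp:.2f}")
```

Engines such as Athena or Spark apply the same idea at scale, reading only the columns a query references.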

Update Frequency:

Hourly

Managed By:

Not currently managed

Resources:

  1. resource:

    • Description: Cornell EAS Data Lake
    • ARN: arn:aws:s3:::cornell-eas-data-lake
    • Region: us-east-2
    • Type: S3 Bucket
  2. resource:

    • Description: Cornell EAS Data Lake Notifications. Used to send human readable information about updates to the EAS Data Lake.
    • ARN: arn:aws:sns:us-east-2:003709786761:cornell-eas-data-lake-human
    • Region: us-east-2
    • Type: SNS Topic
  3. resource:

    • Description: Cornell EAS Data Lake Automation Notifications. Used to send JSON notifications to automated build pipelines and ETL jobs when the EAS Data Lake is updated.
    • ARN: arn:aws:sns:us-east-2:003709786761:cornell-eas-data-lake
    • Region: us-east-2
    • Type: SNS Topic
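
The resources above are identified by ARNs of the form arn:partition:service:region:account-id:resource. The minimal sketch below parses those ARNs, derives the bucket's s3:// URI and virtual-hosted HTTPS URL, and unwraps a notification from the automation topic. It assumes the notification arrives as a standard SNS JSON envelope (for example, the body of an SQS message without raw message delivery) and that the inner Message field is itself JSON; the payload in the example is hypothetical, since the topic's message schema is not documented here.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Arn:
    partition: str
    service: str
    region: str
    account: str
    resource: str


def parse_arn(arn: str) -> Arn:
    """Split an ARN of the form arn:partition:service:region:account-id:resource."""
    # maxsplit=5 keeps any colons inside the resource part intact.
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    _, partition, service, region, account, resource = parts
    return Arn(partition, service, region, account, resource)


BUCKET_ARN = "arn:aws:s3:::cornell-eas-data-lake"
AUTOMATION_TOPIC_ARN = "arn:aws:sns:us-east-2:003709786761:cornell-eas-data-lake"
BUCKET_REGION = "us-east-2"  # S3 bucket ARNs leave region and account empty

bucket = parse_arn(BUCKET_ARN)
s3_uri = f"s3://{bucket.resource}"
https_url = f"https://{bucket.resource}.s3.{BUCKET_REGION}.amazonaws.com/"


def handle_notification(raw_body: str) -> Any:
    """Unwrap an SNS envelope and return its decoded inner message.

    Assumption: the automation topic publishes a JSON document in the
    envelope's "Message" field; its exact keys are not documented here.
    """
    envelope = json.loads(raw_body)
    if envelope.get("TopicArn") != AUTOMATION_TOPIC_ARN:
        raise ValueError("notification is not from the automation topic")
    return json.loads(envelope["Message"])


# Hypothetical envelope; the inner payload is made up for illustration.
sample_body = json.dumps({
    "Type": "Notification",
    "TopicArn": AUTOMATION_TOPIC_ARN,
    "Message": json.dumps({"example": "hypothetical payload"}),
})

print(s3_uri)
print(https_url)
print(handle_notification(sample_body))
```

Checking the TopicArn keeps messages from the human-readable topic (or any other topic) out of an automated pipeline. Notifications received over HTTP(S) should also have their SNS signature verified before being trusted; this sketch skips that step.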

Tags:

agriculture, aws-pds, climate, earth observation, elevation, environmental, geospatial, mapping, meteorological, sustainability, weather

About

cornell-eas-data-lake-dataset originates from the Registry of Open Data on AWS.
