Max Jones 005dcada2e
Update README.md
3 weeks ago

Virtual TIFF

A parser for creating virtual Zarr stores from TIFF files using VirtualiZarr 2.0 and async-tiff.

Background

First, some thoughts on why we should virtualize GeoTIFFs and/or COGs:

  1. Provide faster access to non-cloud-optimized GeoTIFFs that contain some form of internal tiling, without any data duplication (see notebook #1).
  2. Provide fully async I/O for both GeoTIFFs and COGs using Zarr-Python.
  3. Allow loading a stack of GeoTIFFs/COGs into a data cube while minimizing the number of GET requests relative to using stackstac/xstac, thereby decreasing cost and increasing performance.
  4. Give users access to a lazily loaded DataTree that holds both the data and the overviews, so scientists can use the overviews not only for tile-based visualization but also for quickly iterating on analytics.
  5. Include ETags in the virtualized datasets to support reproducibility.
  6. A motivation that's less clear to me, but maybe possible, is using the virtualization layer to access COGs with disparate CRSs as a single dataset (https://github.com/zarr-developers/geozarr-spec/issues/53).
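To make the first motivation concrete, here is a minimal sketch of the idea behind virtualizing an internally tiled TIFF: the file's `TileOffsets` and `TileByteCounts` tags already say where each tile lives, so a manifest of byte ranges can stand in for Zarr chunks without copying any data. This is not virtual-tiff's implementation or API. The function name `build_tile_manifest`, the dot-separated chunk keys, the file path, and all offsets and byte counts are hypothetical, and the sketch assumes a single image plane with tiles stored in row-major order.

```python
import math


def build_tile_manifest(path, image_height, image_width,
                        tile_height, tile_width,
                        tile_offsets, tile_byte_counts):
    """Map chunk keys ("row.col") to the byte range of each tile in one TIFF.

    Hypothetical helper for illustration only; assumes a single image plane
    whose tiles are listed left to right, then top to bottom.
    """
    tiles_down = math.ceil(image_height / tile_height)
    tiles_across = math.ceil(image_width / tile_width)
    expected = tiles_down * tiles_across
    if len(tile_offsets) != expected or len(tile_byte_counts) != expected:
        raise ValueError(f"expected {expected} tiles, got "
                         f"{len(tile_offsets)} offsets and "
                         f"{len(tile_byte_counts)} byte counts")

    manifest = {}
    for index, (offset, length) in enumerate(zip(tile_offsets, tile_byte_counts)):
        # Convert the flat tile index into a (row, col) position in the tile grid.
        row, col = divmod(index, tiles_across)
        manifest[f"{row}.{col}"] = {"path": path, "offset": offset, "length": length}
    return manifest


# Made-up values: a 512 x 768 image split into 256 x 256 tiles (2 rows, 3 columns).
manifest = build_tile_manifest(
    path="s3://example-bucket/scene.tif",  # hypothetical location
    image_height=512, image_width=768,
    tile_height=256, tile_width=256,
    tile_offsets=[1024, 5120, 9216, 13312, 17408, 21504],
    tile_byte_counts=[4096, 4096, 4096, 4096, 4096, 4096],
)
print(manifest["1.2"])
```

Each manifest entry is enough for a reader to issue one ranged GET for exactly the tile it needs, which is why the original file can stay untouched.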

Getting started

  1. Clone the repository: `git clone https://github.com/virtual-zarr/virtual-tiff.git`
  2. Pull the baseline image data from the DVC remote: `pixi run -e test download-test-images`. WARNING: This will download ~1.4 GB of TIFFs for testing to your machine.
  3. Run the test suite: `pixi run -e test run-tests`. WARNING: Some tests will fail because the implementation is incomplete.
  4. If needed, start a shell in the development environment: `pixi run -e test zsh`

License

virtual-tiff is distributed under the terms of the MIT license.


About

Exploring motivations and mechanisms for virtualizing GeoTIFFs
