title: DagsHub Data Engine - Visualizing Data
description: Documentation on using Data Engine to visualize data

Visualizing your data

Data Engine provides an easy way to visualize data points and their enrichments. By displaying data together with enrichments such as annotations, predictions, and metadata, you can quickly:

  • See your data points and better understand your datasets
  • Discover cases in which your model is underperforming
  • Create a new dataset from your data using visual filters
  • Send your data for annotation or re-annotation

This way you can make sense of the datasets you’re using to train and improve your model.

Visualize data locally

Local visualization is currently available in Data Engine and is implemented as an integration with the open-source Voxel51 tool. To visualize your datasets, follow these steps:

  1. First, install the Voxel51 package:

    pip install fiftyone
    

    !!! warning "Visualizing using Google Colab"
        If you are working with Google Colab, make sure to run the following command at the beginning of your notebook:

            pip install fiftyone-db-ubuntu2204

        After installation, you'll need to restart your kernel.

  2. To visualize a data source, a dataset, or query results, use the .visualize() function:

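    # `ds` is a Data Engine datasource object that you loaded earlier
    # Query the datasource for data points that have no annotation
    query = ds["annotation"].is_null()

    # Visualize the queried datasource
    query.visualize()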
    

    This function will open a visualization instance on your local machine.

    Visualization Local Instance

    Behind the scenes, Data Engine checks which files need to be available for visualization, automatically creates a Voxel51-compatible dataset, and launches a new local Voxel51 instance with the built-in DagsHub integration.
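The two steps above can be combined into one short script. The following is a minimal sketch rather than the definitive workflow: it assumes the `dagshub` client is installed and that `datasources.get_datasource` is how you load your datasource, and the repository and datasource names are placeholders. The helper names `load_datasource` and `visualize_unannotated` are hypothetical and exist only for this example.

```python
def load_datasource(repo, name):
    """Load a Data Engine datasource (assumes the dagshub client is installed)."""
    # Imported lazily so the sketch can be imported without dagshub present.
    from dagshub.data_engine import datasources

    return datasources.get_datasource(repo, name)


def visualize_unannotated(ds, field="annotation"):
    """Open a local visualization of the data points whose `field` is empty."""
    query = ds[field].is_null()  # the same query used in step 2
    return query.visualize()     # opens the local visualization instance


# Example usage (placeholders, replace with your own names):
# ds = load_datasource("<user>/<repo>", "<datasource_name>")
# visualize_unannotated(ds)
```

Passing the datasource in as an argument keeps the query logic separate from loading, so the same helper works for a datasource or a saved dataset.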

All Voxel51 capabilities, such as filtering, sorting, and tagging, are available out of the box with the Data Engine visualize command. In addition, Data Engine visualizations come with a few capabilities that don't exist in a regular Voxel51 instance.

Save query results as a dataset

After filtering your dataset through the Voxel51 UI, you can save your filtered results as a new Data Engine dataset. To do that, simply navigate to the DagsHub tab by clicking on the DagsHub icon:

DagsHub Tab Navigation

Click on the ‘Save dataset’ button:

Save query as dataset

Provide a name for your dataset and select the checkbox to keep the filters applied in the visualization instance (the ones in the left panel):

Name dataset

To see your new dataset, use the get_datasets() command, or navigate to your repository and open the datasets tab.
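As a rough illustration of the programmatic route, the sketch below lists the datasets in a repository and loads one by name. It assumes the `dagshub` client exposes `get_datasets` and `get_dataset` in `dagshub.data_engine.datasets`; check the client reference for the exact signatures. The wrapper names and the repository and dataset names are placeholders.

```python
def list_saved_datasets(repo):
    """Return the datasets saved in `repo` (assumes the dagshub client is installed)."""
    # Imported lazily so the sketch can be imported without dagshub present.
    from dagshub.data_engine import datasets

    return datasets.get_datasets(repo)


def load_saved_dataset(repo, name):
    """Load a single saved dataset by the name given in the 'Save dataset' dialog."""
    from dagshub.data_engine import datasets

    return datasets.get_dataset(repo, name)


# Example usage (placeholders, replace with your own names):
# for saved in list_saved_datasets("<user>/<repo>"):
#     print(saved)
# my_dataset = load_saved_dataset("<user>/<repo>", "<dataset_name>")
```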

Edit metadata

You can update the metadata of data points directly from the local visualization instance. To do this, navigate to the DagsHub tab and click on the ‘Update metadata for selected’ button.

Update Metadata Button

Choose the field you would like to change, enter the new value, and click ‘save’. The metadata is updated immediately, and the change applies to all selected data points.

Update Metadata Button

Metadata editing is currently supported for primitive-type columns only. Adding new enrichment fields from the visualization instance is not yet supported.
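The rules above (one value applied to every selected data point, primitive values only, no new fields) can be pictured with a small stand-alone model. This is an illustrative sketch in plain Python, not Data Engine's implementation: the function name, the dictionary layout, and the set of types treated as primitive are all assumptions made for the example.

```python
# Assumption: "primitive" means these Python types; the source does not list them.
PRIMITIVE_TYPES = (bool, int, float, str)


def update_metadata_for_selected(points, selected_ids, existing_fields, field, value):
    """Set `field` to `value` on every selected point; return how many were updated."""
    if field not in existing_fields:
        # Mirrors the limitation that new fields cannot be added from the UI.
        raise KeyError(f"adding new field {field!r} is not supported")
    if not isinstance(value, PRIMITIVE_TYPES):
        raise TypeError("only primitive values are supported")
    updated = 0
    for point in points:
        if point["id"] in selected_ids:
            point["metadata"][field] = value
            updated += 1
    return updated


# Hypothetical data points; only those with ids 1 and 2 are "selected".
points = [
    {"id": 1, "metadata": {"split": "train"}},
    {"id": 2, "metadata": {}},
    {"id": 3, "metadata": {"split": "train"}},
]
count = update_metadata_for_selected(points, {1, 2}, {"split"}, "split", "val")
```

After this call, the two selected points carry `split = "val"`, and the unselected point keeps its original value.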

??? note "Voxel51 capabilities"
    Go to the Voxel51 documentation for other capabilities.

Annotate your data

You can also use the Data Engine visualization instance to send your data for annotation and re-annotation. To learn more, see the Data Engine annotation documentation.
