The Data Engine will allow you to share your queries and results with your teammates so they can continue where you left off.
Create production-grade, training-ready datasets for machine learning
We provide an out-of-the-box solution with a clear display of your datasets, querying abilities, annotations, and lineage, and ultimately a faster way to experiment and improve models.
We cover every step of creating training-ready datasets.
Features
Seamless connection to your existing storage
Simple interface to connect your external storage, no DevOps needed.
We currently support S3, Google Cloud, and S3-compatible storage, with more to be added in the near future.
Dataset versioning and lineage
Clear and organized display of your datasets, including visual lineage that connects datasets, models, experiments, labels, and predictions.
Data querying
Select the data points most relevant to improving a model where its performance is low. You can filter, sort, and search for similar examples, then save the result as a new training-ready version of the dataset.
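To make the idea of filtering, sorting, and similarity search concrete, here is a minimal sketch in plain Python. It does not use the Data Engine API; the records, the `confidence` and `embedding` fields, and the helper names are all hypothetical, chosen only for illustration.

```python
import math

# Hypothetical data points: a model confidence score and a toy 2-D embedding each.
datapoints = [
    {"path": "img_001.jpg", "confidence": 0.95, "embedding": [1.0, 0.0]},
    {"path": "img_002.jpg", "confidence": 0.40, "embedding": [0.0, 1.0]},
    {"path": "img_003.jpg", "confidence": 0.55, "embedding": [0.1, 0.9]},
    {"path": "img_004.jpg", "confidence": 0.30, "embedding": [0.9, 0.2]},
]

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def low_confidence(points, threshold):
    """Filter to points below the threshold, sorted with the weakest first."""
    selected = [p for p in points if p["confidence"] < threshold]
    return sorted(selected, key=lambda p: p["confidence"])

def most_similar(points, query, k):
    """Return the k points whose embeddings are closest to the query's."""
    others = [p for p in points if p is not query]
    ranked = sorted(
        others,
        key=lambda p: cosine_similarity(p["embedding"], query["embedding"]),
        reverse=True,
    )
    return ranked[:k]

# Filter and sort: where is the model struggling?
hard = low_confidence(datapoints, threshold=0.6)

# Search: find the example most similar to the weakest one.
neighbours = most_similar(datapoints, hard[0], k=1)

# "Save" a new dataset version as the union of both selections.
new_version = hard + [p for p in neighbours if p not in hard]
```

In a real workflow the embeddings would come from a model and the saved selection would be stored as a dataset version rather than held in a list.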
Annotations
Annotate relevant data points in one click with zero setup. Use existing models to label your data automatically, then fine-tune the labels manually.
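One common way to combine automatic labeling with manual fine-tuning is to accept confident predictions and route the rest to a reviewer. The sketch below illustrates that pattern in plain Python; it is an assumption about how such a workflow could look, not the product's implementation, and `auto_label`, `toy_model`, and the 0.8 threshold are hypothetical.

```python
def auto_label(items, predict, min_confidence=0.8):
    """Split items into auto-accepted labels and ones needing manual review.

    `predict` is any callable returning a (label, confidence) pair.
    """
    accepted, needs_review = [], []
    for item in items:
        label, confidence = predict(item)
        record = {"item": item, "label": label, "confidence": confidence}
        if confidence >= min_confidence:
            accepted.append(record)
        else:
            needs_review.append(record)
    return accepted, needs_review

# Hypothetical stand-in for an existing model: a fixed lookup table.
TOY_PREDICTIONS = {
    "a.jpg": ("cat", 0.95),
    "b.jpg": ("dog", 0.55),
    "c.jpg": ("cat", 0.80),
}

def toy_model(item):
    return TOY_PREDICTIONS[item]

accepted, needs_review = auto_label(["a.jpg", "b.jpg", "c.jpg"], toy_model)
```

Here `a.jpg` and `c.jpg` are labeled automatically, while `b.jpg` is left for a human annotator to confirm or correct.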
Experiments and retraining
Use subsets of your data to experiment and retrain your model by streaming them directly to your pipeline, and track your experiments within DagsHub.
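Streaming a subset means the pipeline consumes records in batches as they arrive instead of loading the whole dataset first. A minimal sketch of that idea with a plain Python generator follows; the record layout and the `stream_batches` helper are hypothetical and stand in for whatever loader your pipeline uses.

```python
from itertools import islice

def stream_batches(records, batch_size):
    """Yield lists of up to batch_size records without materializing the input."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

# Hypothetical dataset: every third record belongs to the validation split.
records = [{"id": i, "split": "train" if i % 3 else "val"} for i in range(7)]

# Lazily select the training subset, then stream it in batches of three.
train_subset = (r for r in records if r["split"] == "train")
batches = [[r["id"] for r in batch] for batch in stream_batches(train_subset, 3)]
```

Because both the subset and the batches are produced lazily, the same pattern works when the records come from remote storage rather than an in-memory list.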