Experiment discovery, sometimes called experiment management, is crucial for data science collaboration. When a project evolves over time or grows in complexity, we need a way to compare results and see which approaches are more promising than others.
Our belief is that experiment discovery should be based on simple, open formats for parameters and metrics, and integrated with the ability to reproduce results.
The easiest way to reproduce experiments is to make each experiment a git commit, which means it's reproducible with a
If you define your parameters and metrics in one of the supported formats, DAGsHub enables you to discover, compare and reproduce experiments, so that you can make the most of your data science projects.
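To make "simple, open formats" a little more concrete, here is a minimal sketch of what logging one experiment as plain text could look like. The file layout, the field names (`name`, `value`, `step`) and all values are assumptions made up for illustration; they are not DAGsHub's format specification, which is described on the formats page.

```python
import csv
import io
import json

# Hypothetical hyperparameters for one experiment (illustrative values only).
params = {"model_type": "random_forest", "learning_rate": 0.01, "epochs": 5}

# Hypothetical metric log: one row per logged value.
metric_rows = [
    {"name": "loss", "value": 0.92, "step": 1},
    {"name": "loss", "value": 0.61, "step": 2},
    {"name": "auc", "value": 0.74, "step": 2},
]


def dump_params(params: dict) -> str:
    """Serialize parameters as human-readable JSON text."""
    return json.dumps(params, indent=2, sort_keys=True)


def dump_metrics(rows: list) -> str:
    """Serialize metric rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["name", "value", "step"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


params_text = dump_params(params)
metrics_text = dump_metrics(metric_rows)
print(params_text)
print(metrics_text)
```

Because both outputs are plain text, they can be committed to git next to the code that produced them, which is what ties an experiment to a commit.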
To see the experiment view of a project, just click the experiments tab on the project's homepage.
If you're not sure where to start, you can read the rest of this page to see how experiment discovery works on DAGsHub, or go to the formats page to see how simple formats give you all the benefits of an enterprise-grade experiment management tool. If you want to build an example project, you can go to our tutorial and try it out.
The experiment table is the first thing you see when you go to the experiments tab.
Each row represents one experiment, and also one git commit.
A row is added to the table every time you add a relevant commit to your project.
Each column is a parameter/metric of that experiment or some commit metadata.
A Glorified Experiment Table
You can filter and sort the table to find the most relevant experiments for your needs.
Sort, Filter and Column Selection buttons
- Sorting is done by clicking the arrows on the relevant column header. You can sort in ascending or descending order.
- Filtering is done by clicking on the Filter button, which will cause each of the visible (and filterable) columns to show a filter input. You can filter by multiple columns.
- You can also choose which columns to show or hide from the table. This can be done by clicking the Columns button, which will open the column selection menu. There you can click a column name to select or de-select it, or drag and drop to change the order of columns.
Column Selection Menu
Using the features above, you can find that experiment you worked on a few months ago that used a specific model type and had an AUC of more than 0.756.
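The same search can be expressed in a few lines of code, which may help clarify what filtering on several columns and then sorting actually does. This is only a sketch of the idea, not how DAGsHub implements the table; the commit IDs, column names and values are hypothetical.

```python
from operator import itemgetter

# Hypothetical experiment rows: one dict per experiment (and per git commit).
experiments = [
    {"commit": "a1b2c3d", "model_type": "random_forest", "auc": 0.781},
    {"commit": "e4f5a6b", "model_type": "logistic_regression", "auc": 0.802},
    {"commit": "c7d8e9f", "model_type": "random_forest", "auc": 0.749},
    {"commit": "0a1b2c3", "model_type": "random_forest", "auc": 0.763},
]


def find_experiments(rows, model_type, min_auc):
    """Filter by two columns, then sort by AUC in descending order."""
    matches = [
        row
        for row in rows
        if row["model_type"] == model_type and row["auc"] > min_auc
    ]
    return sorted(matches, key=itemgetter("auc"), reverse=True)


best = find_experiments(experiments, "random_forest", 0.756)
for row in best:
    print(row["commit"], row["auc"])
```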
You can click the green source code icon to go to the file view of that experiment and download the data or model from the DAGsHub data pipeline view.
Now, let's dive in a bit deeper. The experiment discovery tab has additional superpowers in store.
After finding a specific experiment we might want to get some extra info.
We can do exactly that by clicking the experiment info button.
Single experiment view button
Single Experiment View
The single experiment view allows you to understand an experiment at a deeper level.
It includes a more detailed view of the experiment parameters and metrics, as well as interactive graphs of metrics over time.
Sometimes, comparing experiments leads to even more insights. That is what the experiment comparison view is for.
To reach the experiment comparison view, you need to check the checkbox of 2 or more experiments in the experiment table.
Clicking the blue box in the table header will automatically check or uncheck all experiments.
After checking the experiments you'd like to compare, click the comparison button to go to the view.
Choosing experiments to compare
The experiment comparison view looks similar to the single experiment view, but is geared towards showing differences in performance between the compared runs.
Experiment comparison view
Now, there's a lot going on there, so let's look at each of the sections.
The comparison table shows the different metadata, parameters and latest metric values in a table view.
This can help show how different parameter choices affected the end metrics, and is probably more suitable when you compare a relatively small number of experiments.
The color next to each commit ID is the color used for that experiment's lines in the metric chart view.
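What the comparison table helps you spot is, in essence, a diff of parameters between experiments. A minimal sketch of that idea, with hypothetical commit IDs and parameter names, could look like this:

```python
def differing_params(experiments: dict) -> dict:
    """Return {param: {commit: value}} for parameters whose values differ.

    A parameter missing from an experiment is treated as None.
    Values are assumed to be hashable scalars (numbers, strings).
    """
    all_names = set()
    for params in experiments.values():
        all_names.update(params)

    diff = {}
    for name in sorted(all_names):
        values = {
            commit: params.get(name) for commit, params in experiments.items()
        }
        if len(set(values.values())) > 1:
            diff[name] = values
    return diff


# Hypothetical parameters of two compared experiments.
compared = {
    "a1b2c3d": {"learning_rate": 0.01, "epochs": 5, "optimizer": "adam"},
    "e4f5a6b": {"learning_rate": 0.1, "epochs": 5, "optimizer": "adam"},
}

changed = differing_params(compared)
print(changed)
```

Here only `learning_rate` is reported, since `epochs` and `optimizer` are identical in both experiments.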
Sometimes, you want to dive deeper into the relationships between parameter choices, and look at a very broad set of experiments. That is what the parallel coordinates plot is for.
Parallel Coordinates Plot
The parallel coordinates plot is an interactive plot that lets you visualize the relationship between sets of experiment parameters and metrics.
Here, you can choose which parameters and metrics to show. You can drag along one of the parameter axes to gray out any lines outside of that parameter range.
Only parameters and metrics that appear in all compared experiments will be available in the parallel coordinate plot view.
In the example seen in the image above, we can see that the parameter determining the avg_val_loss metric is the learning rate parameter. Perhaps unintuitively, the lowest and highest learning rates produce higher losses than the middle learning rate. This is a simple example of the insights that might be gained from this plot.
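Two behaviors of this plot are easy to express in code: only axes shared by all compared experiments are available, and dragging along an axis keeps the lines inside the selected range while graying out the rest. The sketch below illustrates both under assumed data; the commit IDs and values are made up, and the function names are hypothetical rather than part of any DAGsHub API.

```python
def shared_axes(experiments: dict) -> list:
    """Names of parameters/metrics present in every compared experiment."""
    key_sets = [set(values) for values in experiments.values()]
    if not key_sets:
        return []
    return sorted(set.intersection(*key_sets))


def brush(experiments: dict, axis: str, low: float, high: float) -> dict:
    """Map each commit to True if its value on `axis` lies in [low, high].

    Experiments mapped to False are the ones that would be grayed out.
    """
    return {
        commit: low <= values[axis] <= high
        for commit, values in experiments.items()
    }


# Hypothetical runs; "dropout" exists in only one of them.
runs = {
    "a1b2c3d": {"learning_rate": 0.001, "avg_val_loss": 0.9, "dropout": 0.2},
    "e4f5a6b": {"learning_rate": 0.01, "avg_val_loss": 0.4},
    "c7d8e9f": {"learning_rate": 0.1, "avg_val_loss": 0.8},
}

axes = shared_axes(runs)
highlighted = brush(runs, "learning_rate", 0.005, 0.05)
print(axes)
print(highlighted)
```

In this made-up data, `dropout` is dropped from the available axes because one experiment alone defines it, and brushing the middle of the learning rate axis leaves only the middle run highlighted.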
Metric Charts View
The last thing in the comparison view is a list of charts that includes all metrics that exist for the compared experiments. Each chart includes lines for the relevant compared experiments.
Each chart is interactive: you can download the graphs as PNG files, zoom in on relevant parts by dragging a rectangle on them (double click to zoom back out), or remove experiments from a plot by clicking the experiment name in the legend.
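To illustrate how such a list of charts can be derived from logged metrics, here is a small sketch that groups metric rows into one chart per metric and one line per experiment. Unlike the parallel coordinates plot, a metric only needs to exist in at least one experiment to get a chart. The row layout `(name, step, value)` and all data are assumptions for illustration.

```python
from collections import defaultdict


def build_charts(logs: dict) -> dict:
    """Group metric rows into {metric: {commit: [(step, value), ...]}}.

    Points of each line are sorted by step so they can be drawn in order.
    """
    charts = defaultdict(dict)
    for commit, rows in logs.items():
        for name, step, value in rows:
            charts[name].setdefault(commit, []).append((step, value))
    for lines in charts.values():
        for points in lines.values():
            points.sort()
    return dict(charts)


# Hypothetical metric logs: (metric name, step, value) per experiment.
logs = {
    "a1b2c3d": [("loss", 2, 0.6), ("loss", 1, 0.9), ("auc", 2, 0.74)],
    "e4f5a6b": [("loss", 1, 0.8), ("loss", 2, 0.5)],
}

charts = build_charts(logs)
for metric, lines in sorted(charts.items()):
    print(metric, lines)
```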
After finding the experiment we want to work on, we probably want to get it to our working system as soon as possible. To understand how to do that, go to the next page about reproducing experiment results.