
Active Learning with Domain Experts - Working with Dentists on Machine Learning

Active Learning Oct 30, 2023

Historically, implementing an active learning pipeline has been within reach only for corporations with large budgets for ML and MLOps teams. Even then, it is no trivial task, as it requires one of the following:

  1. Developing custom in-house dev tools,
  2. Patching together currently available tools, or
  3. A mixture of both

The release of Data Engine, however, enables even single developers to implement an active learning pipeline in short order.

This post covers the main considerations when setting up an active learning pipeline. For reference, you can see a complete active learning pipeline in a Jupyter Notebook, created using Data Engine in the Tooth Fairy project. This project is a case study in which we worked with a domain expert to create a model that could segment teeth in dental x-rays.

What is Active Learning?

Active learning is semi-supervised learning with a human in the loop. The main difference from standard supervised learning is that, rather than having humans label all of the training data, an algorithm or model chooses which data should be labeled.
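As a rough illustration, the whole loop can be sketched in a few lines of Python. Every function name here is a hypothetical placeholder for your own pipeline, not a real API:

```python
# A minimal sketch of an active learning loop. All functions are
# hypothetical placeholders standing in for your own pipeline.
model = train_initial_model(labeled_pool)            # bootstrap model

while not stopping_criterion():
    scores = score_samples(model, unlabeled_pool)    # e.g., prediction confidence
    batch = select_most_informative(scores, k=50)    # pick the "hardest" samples
    new_labels = request_labels_from_expert(batch)   # human-in-the-loop step
    labeled_pool += new_labels                       # grow the labeled set
    unlabeled_pool = remove(unlabeled_pool, batch)   # shrink the unlabeled set
    model = train_model(labeled_pool)                # retrain and repeat
```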

Why Active Learning?

The big appeal of active learning is training more accurate models with less labeled data.

Active learning is a data-centric approach to machine learning, in which we focus on labeling the samples from which our models can learn the most. This implies, correctly, that not all data is created equal. Some samples are information dense; perhaps they contain an edge case the current model has never encountered. Other samples offer only shallow insights: the model has already learned the patterns they contain and doesn’t truly need them.

By focusing on these information-dense, “hard” samples, our model can learn more efficiently, reducing the amount of data we need to label.

The benefits of reducing the needed data go beyond the training time of a single cycle. Labeling data is an expensive, time-consuming process, and reducing the amount of data to label can save companies quite a bit of money. This is especially true when the labeling needs to be done by domain experts, as in our Tooth Fairy project: as data scientists and machine learning engineers, we are not qualified to label dental x-rays, at least not as accurately as a dentist can.

Initial Dataset and Model

The first difficult decision to make with an active learning pipeline is how to get your first model.

Before you can start selecting data to label, you need a model to drive the selection process.

Sometimes you need to get creative with this process, but here are a few options we’ve implemented in the past:

  • Use a model pretrained on similar data - During our Squirrel Detector project, we used a YOLOv5 model pretrained on the COCO dataset. Although squirrel is not one of the 80 COCO categories, we noticed that the model did a decent job of detecting squirrels as bears, birds, dogs, and sheep.
  • Train a smaller model on less data - For the Tooth Fairy project, our domain expert pre-labeled some data. We were able to train a smaller model on it and use that as our initial model (see the sketch after this list). This can be seen in steps 2-4 of the Jupyter Notebook.
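As an illustration of the second option, here is a minimal sketch of bootstrapping an initial segmentation model with the ultralytics package. The dataset config path is a placeholder, and the exact arguments will differ per project:

```python
# A rough bootstrapping sketch using the ultralytics package.
from ultralytics import YOLO

# Start from a small pretrained segmentation checkpoint.
model = YOLO("yolov8n-seg.pt")

# Fine-tune on the expert's pre-labeled subset
# ("teeth-prelabeled.yaml" is a hypothetical dataset config).
model.train(data="teeth-prelabeled.yaml", epochs=50, imgsz=640)
```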

Selecting New Data to Label

Once we have an initial model, we can use it to select new data to be labeled. The ideal data to label is the data from which the model can learn the most.

To do this, we need a scoring function that uses the model’s output to compute an overall score for each data sample.

In the simplest form, we can use the model’s detection confidence to determine a score. But even here, there are quite a few options to choose from (each is sketched in code after this list):

  • Lowest confidence - the score is the lowest confidence of all detected objects
  • Average confidence - average of all confidences of detected objects
  • Minimizing confidence delta - difference between confidences for the top two labels of an object
  • Maximizing entropy - entropy measures the uncertainty of the model’s predicted distribution across all categories; maximizing it selects the data the model is most confused about
  • and many, many others…
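
To make these concrete, here are minimal NumPy sketches of each score. These are our own illustrative implementations, not code from the notebook:

```python
import numpy as np

# Illustrative acquisition scores. Each takes the per-object confidences
# (or per-class probabilities) from a single sample's predictions.

def lowest_confidence(confidences: np.ndarray) -> float:
    # Score the sample by its least certain detection.
    return float(confidences.min())

def average_confidence(confidences: np.ndarray) -> float:
    # Mean detection confidence across all detected objects.
    return float(confidences.mean())

def confidence_margin(class_probs: np.ndarray) -> float:
    # Difference between the top-two class probabilities for one object;
    # a small margin means the model is torn between two labels.
    top2 = np.sort(class_probs)[-2:]
    return float(top2[1] - top2[0])

def entropy(class_probs: np.ndarray) -> float:
    # Uncertainty across all categories; higher means more model confusion.
    p = class_probs / class_probs.sum()
    return float(-np.sum(p * np.log(p + 1e-12)))
```

Note that direction matters when ranking samples: for the first three scores, lower values flag more informative samples, while for entropy, higher values do.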

The trade-offs will vary per project but, in general, come down to a balance between how simple a score is to implement and calculate and how accurately it assesses informativeness.

For the Tooth Fairy project, we went with a simple average confidence score, which can be seen in steps 5-8 of the Jupyter Notebook.

Once the data has been selected for labeling, it’s time to work with your labelers. In our example project, we were working with a domain expert.

Working with Domain Experts

Domain experts are highly skilled people who possess specific knowledge that most data scientists do not. For instance, a radiologist can label x-ray images significantly better than a typical data scientist or crowd-sourced labeler. A car mechanic can more easily spot defects in parts. A language expert can more accurately label parts of speech in text.

In order to get the best quality labels for some applications, we need to work with domain experts.

There are several things to keep in mind when working with domain experts in your active learning setup.

Domain Experts May Be Time-Constrained

To be respectful of a domain expert’s time, we need to optimize the workflow to minimize the time they spend labeling samples. This might mean ensuring you don’t give them too many samples to label at any given time.

For the Tooth Fairy project, we worked with a dentist to train a tooth segmentation model. At the beginning of the project, we met to answer the following questions:

  1. How long, on average, does it take to label a single image?
  2. How many hours per week, on average, could they spend labeling?
  3. How many images do we need per active learning cycle?
  4. How long does an active learning cycle take to run?

Based on our educated guesses to these questions, we could work out how to most efficiently utilize the domain expert’s knowledge without overburdening them — all while moving the project along.
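To give a made-up illustration of the arithmetic: if labeling one x-ray takes around ten minutes and the expert can spare two hours per week, that caps throughput at roughly twelve images per week, so a cycle needing 50 newly labeled images will span about a month of labeling. Numbers like these tell you how large each selection batch should be and how often you can realistically retrain.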

Sensitive Data

Many machine learning projects contain sensitive data. This is often the case when a domain expert is involved in the project. The sensitivity may stem from privacy concerns, as with medical data, or from the data being corporate IP.

In either case, how you set up your active learning pipeline needs to be mindful of such data sensitivities.

DagsHub offers various options to help you maintain data privacy.

  • Private repos - Make a repository private. This can be done at the time of creation or changed later through the repo settings.
  • Contributor access - Choose specific users to have read, write, or admin access to a repository. Setting access permissions for annotators is also available for enterprise tiers.
  • S3-compatible buckets - Connect S3-compatible buckets to your repository. This keeps your sensitive data where you currently store it.
  • On-premises installation - For enterprise projects that absolutely need to control the machines their data is stored on, DagsHub offers on-premises installation options.

Calibrating Expectations

It’s also important to calibrate your domain expert’s expectations. This is especially true if they’re invested in the quality of the trained model you produce. While they are experts in their own domains, such as medicine, agriculture, or manufacturing, they may not have an accurate picture of how the machine learning process works.

For instance, in the Tooth Fairy project, we, the machine learning experts, wanted to ensure that tooth segmentation was something a model could even learn in the first place. We trained a nano-sized YOLOv8 model on a small amount of data. This was our proof-of-concept and not intended to be indicative of the accuracy of the final model we would train.

However, when we showed the dentist the results, they were disappointed in the quality of the segmentation masks produced. Where we saw a positive indication that a larger model trained on more data would work well, they saw poor results.

We did not appropriately calibrate their expectations prior to running the experiment.

Training a New Model and Starting a New Cycle

Once the domain expert has labeled the chosen samples, the next step is to train a new model. As you’ve already trained an initial model, this begins a new active learning cycle.

However, one important thing to consider when training your models at this step in the cycle is logging. Ideally, we would log everything about the training process and the model:

  • Hyperparameters used to train
  • Loss and/or accuracy metrics during training
  • Loss and/or accuracy metrics on the test set after training
  • Weights of the best performing model during training

Each of these can be logged using MLflow to the MLflow server provided with each DagsHub repo. In addition to logging, we also recommend registering the final model with the MLflow Model Registry. This allows you to easily load registered models later for inference.
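As a rough sketch of what that logging can look like with MLflow (the tracking URI, parameter values, paths, and model name below are all placeholders for your own project):

```python
import mlflow

# Point MLflow at the tracking server (placeholder URI).
mlflow.set_tracking_uri("https://dagshub.com/<user>/<repo>.mlflow")

with mlflow.start_run() as run:
    # Hyperparameters used to train
    mlflow.log_params({"epochs": 50, "lr": 0.01, "imgsz": 640})
    # Loss/accuracy metrics during and after training
    mlflow.log_metric("val_loss", 0.42)
    # Weights of the best performing model (path is hypothetical)
    mlflow.log_artifact("runs/segment/train/weights/best.pt")

# Register the final model so it can be loaded by name later
# (assumes a model was logged under the "model" artifact path).
mlflow.register_model(f"runs:/{run.info.run_id}/model", "tooth-segmenter")
```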

The “scientist” part of “data scientist” presumes that we follow the scientific method. By logging all of these parameters and metrics, we keep a record of our experiments, which is critical for reproducing our results and forming new hypotheses about how to improve our models.

Catastrophic Forgetting

One thing to be cognizant of when fine-tuning pretrained models is catastrophic forgetting. This occurs when fine-tuning updates the weights in such a way that the model loses the ability to label data it could handle before fine-tuning.

This does not necessarily need to be a problem for your project, though. For instance, in the Tooth Fairy project, we don’t need our model to be able to classify the original COCO categories it was pre-trained on. As long as it can generalize to teeth, catastrophic forgetting is not a concern.

However, if this is a problem for your project, several techniques can help. For instance, you could mix some data from the pre-training dataset into your fine-tuning dataset, or you could fine-tune using Low-Rank Adaptation (LoRA). LoRA keeps the pretrained model weights frozen and adds trainable rank-decomposition matrices to each layer. These LoRA matrices can then be enabled or disabled at inference time, keeping the original network intact.
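To make the LoRA idea concrete, here is a toy PyTorch sketch of a LoRA-wrapped linear layer. It’s illustrative only; in practice you would reach for a library such as Hugging Face’s PEFT where your architecture is supported:

```python
import torch
import torch.nn as nn

# A toy LoRA wrapper around a linear layer, to show the idea.
class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 1.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)   # freeze pretrained weights
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        # Trainable low-rank update: W + (alpha / rank) * B @ A
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank
        self.enabled = True                      # toggle for inference

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.base(x)
        if self.enabled:                         # disable to recover the
            out = out + self.scale * (x @ self.A.T @ self.B.T)  # original net
        return out
```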


Conclusion

While active learning may seem complicated on the surface, breaking it down into its atomic pieces makes it more digestible and easier to understand. Additionally, by using Data Engine, a lot of the heavy lifting is done for us.

A lot of projects can benefit from using active learning — especially those that require domain expertise to label the data.

If you have any questions on setting up your own active learning pipeline using Data Engine, feel free to reach out. We’re happy to help. You can join our Discord, where we’ve built a vibrant, helpful and friendly community.
