Mana.bio: Reproducing results and accelerating experimentation

Case Studies Nov 01, 2023

Learn how Mana.bio, an AI-based drug delivery startup creating a platform for mRNA-based therapeutics, vaccines, and gene therapy, used DagsHub to enhance their operational efficiency, shorten proof-of-concept (POC) timelines, and increase their confidence in experiment reproducibility.

In this case study, we explore Mana.bio’s machine learning use cases, their requirements from an MLOps platform, and how implementing DagsHub helped streamline their machine learning workflows.

Company Overview

Mana.bio is an emerging startup at the forefront of the rapidly evolving field of gene therapy and drug delivery. They specialize in developing cutting-edge, AI-based drug delivery solutions, which integrate machine learning (henceforth: ML) and nanotechnology.

With the approval of the COVID-19 vaccine, the entire field of gene therapy has accelerated and Mana.bio is playing a crucial role in this exciting new era.

Their main focus is developing lipid nanoparticle technology for oligonucleotide therapies, including mRNA-based therapeutics & vaccines. In other words, they are working on medicine that is tiny and needs help to get to the right places in our bodies. Their platform utilizes a vast database of experimental data from scientific literature and data generated in their lab to predict which molecules will work best, enabling them to stay ahead of the curve in this rapidly evolving field.

How Mana.bio applies ML in their work

Mana.bio’s main work in ML involves cheminformatics, which is the field of applying ML techniques to molecular data to assist in drug discovery and delivery processes. They use ML, particularly graph neural networks, to analyze the interactions within molecules and develop models that predict chemical and biological properties. Their data is a combination of graphs and tabular information. The ML team utilizes this data to create models that enable them to plan and design better lipid nanoparticles.

The Requirements

As a data-driven company, Mana.bio’s data science (henceforth: DS) team encountered several requirements related to their data management and ML processes. Three key requirements were standardization across the ML lifecycle, data versioning, and a managed solution.

1. Standardization across the ML lifecycle

One primary requirement identified by the team was the necessity for standardization and the establishment of a dedicated pipeline for their ML workflows. Mana.bio recognized the importance of standardization in their data management and ML processes to enhance collaboration and workflow efficiency. They sought a solution that would streamline their processes, eliminate confusion, and provide clarity in documenting progress. With the implementation of DagsHub, Mana.bio was able to establish standardized workflows, resulting in improved collaboration, efficient communication of findings, and increased confidence in their data and results.

It was obvious that we needed some kind of infrastructure where we can coordinate what we are working on.

Mana.bio aimed to optimize their time to market, supporting their business growth in the highly competitive drug discovery and delivery industry, where speed and accuracy play vital roles. These standardized processes are instrumental in fostering efficient workflows and delivering high-quality results driving overall success.

2. Data versioning

Managing versions of the constantly evolving data the team used was an important requirement. They recognized the importance of keeping track of the data versions used by each team member and for specific models. The team frequently needed to replicate their experiments, both for iterative optimization and to communicate progress to clients and investors. Without a data versioning solution tied to code and experiment versions, they had to manually retrace their steps and document their findings. This led them to recognize that implementing an integrated data versioning solution would reduce the risk of human error and lead to much faster proof-of-concept (POC) timelines.

3. A managed solution

Lastly, the team sought a managed solution that would alleviate the need for hosting and maintenance on their end. Investing in building more and better models, rather than managing infrastructure, was their preferred approach.

Recognizing these requirements, the team at Mana.bio realized they needed a best-of-breed solution to effectively manage their pipeline processes. They set out to find a tool that would address as many of these requirements as possible.

The Solution

Mana.bio’s tool review led them to DagsHub as a way to track experiments, improve collaboration within the team and with other stakeholders, and manage data throughout their development cycles.

DagsHub helped in 5 ways:

1. Streamlined Data Versioning

Mana.bio’s initial tool review led them to DVC, a popular open-source version control tool for ML projects, which is integrated with DagsHub and comes fully configured with remote storage.

Because our data is live and changing, we constantly have to stay in sync with each other, which versions we are working on. It's in our essence, we couldn't work without versioning and that's something DagsHub gave us out of the box.
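
To make the idea of "which version we are working on" concrete, here is a minimal sketch of content-based data versioning, the general principle that tools such as DVC build on: a dataset snapshot is identified by a hash of its contents, so any change to the data yields a new identifier. This is not DVC's or DagsHub's API. The function name `dataset_fingerprint`, the file names, and the sample rows are all hypothetical, and SHA-256 is chosen here only for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def dataset_fingerprint(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large datasets do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Hypothetical data: two snapshots of the same table, the second with one extra row.
with tempfile.TemporaryDirectory() as tmp:
    snapshot_a = Path(tmp) / "lipids_v1.csv"
    snapshot_b = Path(tmp) / "lipids_v2.csv"
    snapshot_a.write_text("lipid_id,score\nL1,0.42\n")
    snapshot_b.write_text("lipid_id,score\nL1,0.42\nL2,0.57\n")

    version_a = dataset_fingerprint(snapshot_a)
    version_b = dataset_fingerprint(snapshot_b)

# The changed file gets a different identifier.
print(version_a != version_b)
```

Two team members who compare fingerprints can tell immediately whether they are training on the same snapshot, which is the kind of synchronization the quote above describes.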

2. Establishing Consistent Workflows for ML

To standardize their ML development flow, the team tracked their experiments using MLflow, a popular open-source tool that comes with every DagsHub project. The Mana.bio team became power users of MLflow on DagsHub, which provides extensive features such as artifact storage, a model registry, model deployment, and experiment tracking.

3. Easy Experiment Tracking

DagsHub's MLflow server hosting provided a solution for experiment tracking and made it easy to use and integrate with the rest of their work. This reduced the cognitive overhead and cost of finding additional solutions, as well as integrating them back into the existing platform.

The link between data versions, the experiments, and the models being trained is the most important thing for us. In this context, MLflow is very easy to work with and very well documented.
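
The link described in the quote above can be sketched as a single record that ties a run's metrics to the exact data and code versions it used. This is an illustrative, standard-library-only sketch, not Mana.bio's setup and not the MLflow API. The `RunRecord` class, its field names, and all values are hypothetical. With MLflow, comparable fields would typically be recorded through calls such as `mlflow.log_param` and `mlflow.log_metric`.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class RunRecord:
    """One experiment run, tied to the exact data and code it used."""
    run_name: str
    data_version: str  # e.g. a content hash of the dataset (hypothetical value below)
    code_version: str  # e.g. a Git commit hash (hypothetical value below)
    params: dict = field(default_factory=dict)
    metrics: dict = field(default_factory=dict)


def to_json_line(record: RunRecord) -> str:
    """Serialize a run record as one JSON line, suitable for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)


record = RunRecord(
    run_name="gnn-baseline",
    data_version="a1b2c3d4",
    code_version="9f8e7d6c",
    params={"hidden_dim": 128, "learning_rate": 0.001},
    metrics={"val_rmse": 0.31},
)

# Round-trip through JSON to show that the link survives storage.
line = to_json_line(record)
restored = json.loads(line)
print(restored["data_version"], restored["metrics"]["val_rmse"])
```

Because the data version travels with the metrics, reproducing a past result starts from a lookup rather than from manually retracing steps.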

4. Rapid Model Selection

By using DagsHub and MLflow, the team can quickly iterate through various models and deliver the best possible results to other teams and clients in significantly less time, with the press of a button.

When starting a new project, product managers often have specific properties in mind for the particle they are planning. With the help of DagsHub and MLflow, our team can quickly test and select the best production model that fulfills the desired properties.
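
As a rough illustration of selecting a model that "fulfills the desired properties", the sketch below filters tracked runs by minimum thresholds and then picks the best remaining run on one objective metric. It is an assumption about how such a selection could work, not a description of Mana.bio's process. The function `select_best_run`, the run names, and the metric names and values are hypothetical. With MLflow, the run list would typically come from `mlflow.search_runs` rather than a hand-written list.

```python
def select_best_run(runs, constraints, objective):
    """Return the run with the highest `objective` metric among runs that
    meet every minimum in `constraints`, or None if no run qualifies."""
    eligible = [
        run for run in runs
        if objective in run["metrics"]
        and all(
            run["metrics"].get(name, float("-inf")) >= minimum
            for name, minimum in constraints.items()
        )
    ]
    if not eligible:
        return None
    return max(eligible, key=lambda run: run["metrics"][objective])


# Hypothetical tracked runs with made-up metrics.
runs = [
    {"name": "gnn-small", "metrics": {"val_r2": 0.71, "stability": 0.90}},
    {"name": "gnn-large", "metrics": {"val_r2": 0.78, "stability": 0.60}},
    {"name": "gnn-medium", "metrics": {"val_r2": 0.75, "stability": 0.85}},
]

# Desired property: stability of at least 0.8; among those, maximize val_r2.
best = select_best_run(runs, constraints={"stability": 0.8}, objective="val_r2")
print(best["name"])  # gnn-medium
```

Note that the run with the highest raw score is excluded because it misses the stability threshold, which is why constraints are applied before ranking.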

5. All-in-One MLOps Solution

The team recognized they needed a comprehensive solution that would cover more of their requirements. After a thorough analysis, the team chose to implement DagsHub as their MLOps platform, as it offers managed server hosting out of the box for data versioning, labeling, and experiment tracking.

The Results / Impact

Simplifying the workflow meant that the data science team could focus on improving their models, instead of spending long hours configuring environments, coordinating work, and reproducing results from previous runs.

However, the benefits extended beyond operational efficiency. The time needed to run a client’s POC dropped significantly, freeing up time for the team and satisfying both management and the team itself.

Running a POC is now a much more efficient process, saving us a few days in each POC. This has allowed us to focus on other important tasks.

Not only was the time spent on experiments reduced, but reproducing results also became simple with artifact storage. The team now has more confidence in their data and gets more reliable reproducibility, without needing to download data and train models again.

Dean Pleban

Co-Founder & CEO of DAGsHub. Building the home for data science collaboration. Interested in machine learning, physics and philosophy. Join https://DAGsHub.com
