This session is about taking a more production-oriented approach to collaborating on a data science project. It covers two main topics:
- Using a monorepo strategy.
- Moving from research to production-ready code using Git and DVC.
In the first part of the session, we will learn about the monorepo strategy, which comes from software development. We will explore how to implement it in a data science project so that we can manage work across multiple collaborators more efficiently and scale as the project and team grow.
The second part will explore how to move from the research phase, carried out in a notebook interface, to a production-ready project that can be trained and deployed on a remote machine. We will use Git and DVC to version all project components, DagsHub storage to host our DVC-tracked files, and Google Colab as both the notebook interface and the remote machine.