Getting Data Scientists to Write Better Code with Laszlo Sragner
In this episode, we dive into the challenging but very important topic of getting data scientists to write better code: how to approach complex machine learning projects and break them down, and why growing unicorns 🦄 is better than hunting them. Check out this awesome conversation with Laszlo Sragner, Founder at Hypergolic.
Listen to the Audio
Q&A
- Q: Why is it important that data scientists write better code?
A: It is the product of your labor, and you should care about its quality: it represents your professionalism and attitude. The practical reasons are the same as for SWE (I asked a couple of the original code-quality people why they started it, and the answer was "to reduce SWE stress and frustration"). It's just a better way of doing things. Beyond a certain scope, you can't comprehend the entire project because of "bounded rationality"; you need to get organized.
- Q: What does it mean to write better code?
A: Being more coherent and thoughtful about what you want to do, not just stumbling from one step to another, and understanding and using the techniques at both the strategic and tactical levels.
- Q: What experience led you to the conclusion that better code is important?
A: SWEs didn't seem to have this problem (or at least they had it to a lesser extent), so I went out to see whether the same applies to ML.
- Q: What's the most interesting part of taking machine learning models into production?
A: The best part is being able to think about modeling new features (not input features) in a completely abstract way, knowing that you have a straight path to production because you did your homework.
- Q: What role do you think Software Engineering production practices play in ML production? Are they the same? Where do they differ?
A: I think unit testing is much less important, partly because of the statistical nature of the work, but otherwise the role is the same: organize your work so you can tackle larger problems than you are capable of comprehending at once.
- Q: How do you see building ML tools and processes for small startups compared to giant corporations? Where do they differ?
A: Startups need to be craftier: they have fewer resources but more flexibility.
- Q: Walk me through your process for thinking about a problem that might be solved with ML. When do you start building, and how do you approach the first iteration?
A: Depending on the situation, we probably do some DDD and implement a business layer, and figure out what we'll need later in the infra layer. In parallel, we figure out how to solve the problem: what would make it work? Often we think in terms of embedding spaces: where would you put instances in the space, and how would training get them there? Is this a representation in which learning a function is easy? How would data quality affect it?
- Q: What's your recipe for breaking big problems into manageable ones?
A: Taking a cue from SWE, we should try to boil it down to a smaller vertical slice of the business problem; knowing how to do this is the data scientist's job. Spend time thinking about the different layers of abstraction, bounded contexts, and decoupling, as well as how to build a clean architecture and clean interfaces.
- Q: End-to-end platforms vs. best-of-breed tools: which is best?
A: I think it is too early to tell. I still don't see a standard (abstract) way of running ML projects, and that needs to emerge before we can talk about the "best" way to implement it.
- Q: Build-vs-buy?
A: Build AND buy! You shouldn't outsource your core, or couple it to a framework (see also: Clean Architecture).
- Q: What are the strongest/most exciting trends you see in ML and MLOps?
A: I'd hope it's CodeQuality/LeanML, but one of my personal favorites is Geometric/Graph ML (Bronstein/Velickovic).
- Q: How do you keep up to date with everything that's going on in this field?
A: It's very hard. I follow LinkedIn, the MLOps.Community, and Twitter, and I hope anything important comes across these three. I take any piece of information I come across and validate it from first principles: does it make sense within my mental framework? You do need to avoid reading a lot of hype content, and meta-analysis of hype content, because it is usually not very actionable.
- Q: Recommendations for the audience?
A: Try looking for ideas outside of DS and see if you can apply them to your work; those are usually strong signals that you are onto something. DS/ML is a very rapidly changing environment, so be prepared to learn and change your mind because, more often than not, you are probably wrong.
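One point above, that unit testing matters less for ML because of its statistical nature, still leaves room for useful tests: deterministic preprocessing gets exact assertions, while model outputs get property checks (valid ranges, monotonicity) rather than exact expected values. A minimal sketch of that split, with all function names invented for illustration:

```python
import math

def normalize(xs: list[float]) -> list[float]:
    """Deterministic preprocessing: shift values to zero mean."""
    mean = sum(xs) / len(xs)
    return [x - mean for x in xs]

def score(features: list[float]) -> float:
    """Stand-in for a model: squashes features to a probability."""
    return 1.0 / (1.0 + math.exp(-sum(features)))

# Deterministic code still gets exact unit tests.
assert normalize([1.0, 2.0, 3.0]) == [-1.0, 0.0, 1.0]

# Model outputs get property checks instead of exact values:
# the score must be a valid probability and monotone in its input.
s_low, s_high = score([-1.0]), score([1.0])
assert 0.0 <= s_low <= 1.0 and 0.0 <= s_high <= 1.0
assert s_low < s_high
```

The design choice is the point: keep as much logic as possible in the deterministic, exactly-testable layer, and shrink the surface that only admits statistical checks.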
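The "build AND buy, but don't couple to a framework" advice, together with the earlier point about clean interfaces between layers, can be sketched as a small adapter layer in the Clean Architecture style. Everything here (the ticket-triage example, `SentimentModel`, `KeywordModel`) is a hypothetical illustration, not something discussed in the episode:

```python
from typing import Protocol

# Business layer: depends only on this abstract interface,
# never on a specific ML framework or vendor SDK.
class SentimentModel(Protocol):
    def predict(self, text: str) -> float:
        """Return a sentiment score in [0, 1]."""
        ...

def triage_ticket(ticket: str, model: SentimentModel) -> str:
    """Business rule: route angry tickets to a human."""
    return "human" if model.predict(ticket) < 0.3 else "auto"

# Infrastructure layer: one adapter per built or bought component.
class KeywordModel:
    """A trivial in-house baseline implementing the interface."""
    def predict(self, text: str) -> float:
        return 0.0 if "refund" in text.lower() else 0.8

print(triage_ticket("I want a refund now!", KeywordModel()))  # human
print(triage_ticket("Thanks, all good.", KeywordModel()))     # auto
```

Swapping in a bought vendor model later means writing one more adapter class; `triage_ticket` and the rest of the business layer stay untouched, which is the decoupling the answer argues for.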