Getting Data Scientists to Write Better Code with Laszlo Sragner
In this episode, we dive into the challenging but very important topic of getting data scientists to write better code: how to approach complex machine learning projects and break them down, and why growing unicorns 🦄 is better than hunting them. Check out this awesome conversation with Laszlo Sragner, Founder at Hypergolic.
Listen to the Audio
Q&A
- Q: Why is it important that data scientists write better code?
A: It is the product of your labor, and you should care about its quality: it represents your professionalism and attitude. The practical reasons are the same as for SWE (I asked a couple of the original code-quality people why they started it, and the answer was "to reduce SWE stress and frustration"). It's just a better way of doing things. Beyond a certain scope, you can't comprehend the entire project because of "bounded rationality"; you need to get organized.
- Q: What does it mean to write better code?
A: Being more coherent and thoughtful about what you want to do, not just stumbling from one step to another, and understanding and using the techniques at both the strategic and tactical levels.
- Q: What experience led you to the conclusion that better code is important?
A: SWEs didn't seem to have this problem (or at least they had it to a lesser extent), so I went out to see whether the same applies to ML.
- Q: What's the most interesting part of taking machine learning models into production?
A: The best part is being able to think about modeling new features (not input features) in a completely abstract way, knowing that you have a straight path to production because you did your homework.
- Q: What role do you think Software Engineering production practices play in ML production? Are they the same? Where do they differ?
A: I think unit testing is much less important, partly because of the statistical nature of the work, but otherwise the role is the same: organize your work so you can tackle larger problems than you are capable of comprehending at once.
- Q: How do you see building ML tools and processes for small startups compared to giant corporations? Where do they differ?
A: Startups need to be craftier: they have fewer resources but more flexibility.
- Q: Walk me through your process for thinking about a problem that might be solved with ML. When do you start building, and how do you approach the first iteration?
A: Depending on the situation, we probably do some DDD and implement a business layer, and figure out what we'll need later in the infra layer. In parallel, we figure out how to solve the problem: what would make it work? Often we think in terms of embedding spaces: where would you put instances in the space, and how would training get them there? Is this a representation in which learning a function is easy? How would data quality affect it?
- Q: What's your recipe for breaking big problems into manageable ones?
A: Taking a cue from SWE, we should try to boil it down to a smaller vertical slice of the business problem; knowing how to do this is the data scientist's job. Spend time thinking about the different layers of abstraction, bounded contexts, and decoupling, as well as how to build a clean architecture and clean interfaces.
- Q: End-to-end platforms vs. best-of-breed tools: which is best?
A: I think it is too early to tell. I still don't see a standard (abstract) way of running ML projects, and that needs to emerge before we can talk about the "best" way to implement it.
- Q: Build-vs-buy?
A: Build AND buy! You shouldn't outsource your core, or couple it to a framework (see also: Clean Architecture).
- Q: What are the strongest/most exciting trends you see in ML and MLOps?
A: I'd hope it's CodeQuality/LeanML, but one of my personal favorites is Geometric/Graph ML (Bronstein/Velickovic).
- Q: How do you keep up to date with everything that's going on in this field?
A: It's very hard. I follow LinkedIn, the MLOps.Community, and Twitter, and I hope anything important comes across these three. I take any piece of information I come across and validate it from first principles: does it make sense within my mental framework? You do need to avoid reading a lot of hype content, and meta-analysis of hype content, because it is usually not very actionable.
- Q: Recommendations for the audience?
A: Try looking for ideas outside of DS and see if you can apply them to your work; those are usually strong signals that you are onto something. DS/ML is a very rapidly changing environment, so be prepared to learn and change your mind because, more often than not, you are probably wrong.
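One point above, that unit testing matters less for ML because of its statistical nature, still leaves room for useful tests: deterministic preprocessing gets exact assertions, while model outputs get property checks (valid ranges, monotonicity) rather than exact expected values. A minimal sketch of that split, with all function names invented for illustration:

```python
import math

def normalize(xs: list[float]) -> list[float]:
    """Deterministic preprocessing: shift values to zero mean."""
    mean = sum(xs) / len(xs)
    return [x - mean for x in xs]

def score(features: list[float]) -> float:
    """Stand-in for a model: squashes features to a probability."""
    return 1.0 / (1.0 + math.exp(-sum(features)))

# Deterministic code still gets exact unit tests.
assert normalize([1.0, 2.0, 3.0]) == [-1.0, 0.0, 1.0]

# Model outputs get property checks instead of exact values:
# the score must be a valid probability and monotone in its input.
s_low, s_high = score([-1.0]), score([1.0])
assert 0.0 <= s_low <= 1.0 and 0.0 <= s_high <= 1.0
assert s_low < s_high
```

The design choice is the point: keep as much logic as possible in the deterministic, exactly-testable layer, and shrink the surface that only admits statistical checks.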
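The "build AND buy, but don't couple to a framework" advice, together with the earlier point about clean interfaces between layers, can be sketched as a small adapter layer in the Clean Architecture style. Everything here (the ticket-triage example, `SentimentModel`, `KeywordModel`) is a hypothetical illustration, not something discussed in the episode:

```python
from typing import Protocol

# Business layer: depends only on this abstract interface,
# never on a specific ML framework or vendor SDK.
class SentimentModel(Protocol):
    def predict(self, text: str) -> float:
        """Return a sentiment score in [0, 1]."""
        ...

def triage_ticket(ticket: str, model: SentimentModel) -> str:
    """Business rule: route angry tickets to a human."""
    return "human" if model.predict(ticket) < 0.3 else "auto"

# Infrastructure layer: one adapter per built or bought component.
class KeywordModel:
    """A trivial in-house baseline implementing the interface."""
    def predict(self, text: str) -> float:
        return 0.0 if "refund" in text.lower() else 0.8

print(triage_ticket("I want a refund now!", KeywordModel()))  # human
print(triage_ticket("Thanks, all good.", KeywordModel()))     # auto
```

Swapping in a bought vendor model later means writing one more adapter class; `triage_ticket` and the rest of the business layer stay untouched, which is the decoupling the answer argues for.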