
Getting Data Scientists to Write Better Code šŸ”„ with Laszlo Sragner

MLOps Podcast Feb 14, 2022

In this episode, we dive into the challenging but very important topic of getting data scientists to write better code. We discuss how to approach complex machine learning projects and break them down, and why growing unicorns šŸ¦„ is better than hunting them. Check out this awesome conversation with Laszlo Sragner, Founder at šŸ”„ Hypergolic.



Q&A

  1. Q: Why is it important that data scientists write better code?
    A: It is the product of your labor, and you should care about its quality because it represents your professionalism and attitude. The practical reasons are the same as for SWE (I asked a couple of the original code-quality people why they started it, and the answer was "to reduce SWE stress and frustration"). It's just a better way of doing things. Beyond a certain scope, you can't comprehend the entire project because of "bounded rationality", so you need to get organized.
  2. Q: What does it mean to write better code?
    A: Being more coherent and thoughtful about what you want to do, not just stumbling from one step to the next. Understanding and using the techniques at both the strategic and tactical levels.
  3. Q: What experience led you to the conclusion that better code is important?
    A: SWEs didn't seem to have this problem (or at least had it to a lesser extent), so I went out to see whether the same approach applies to ML.
  4. Q: Whatā€™s the most interesting part of taking machine learning models into production?
    A: The best part is being able to think about modeling new features (not input features) in a completely abstract way, knowing that you have a straight path to production because you did your homework.
  5. Q: What role do you think Software Engineering production practices play in ML production? Are they the same? Where do they differ?
    A: I think unit testing is much less important, partly because of the statistical nature of ML, but otherwise the role is the same: organize your work so you can tackle larger problems than you are capable of comprehending at once (see the testing sketch after this Q&A).
  6. Q: How do you see building ML tools and processes for small startups compared to giant corporations? Where do they differ?
    A: Startups need to be craftier, with fewer resources but more flexibility.
  7. Q: Walk me through your process for thinking about a problem that might be solved with ML. When do you start building, and how do you approach the first iteration?
    A: Depending on the situation, we probably do some domain-driven design (DDD), implement a business layer, and figure out what we'll need later in the infrastructure layer (see the layering sketch after this Q&A). In parallel, we figure out how to solve the problem: what would make it work? Often we think in terms of embedding spaces: where would you put instances in the space, and how would training get them there? Is this a representation in which learning a function is easy? How would data quality affect this?
  8. Q: Whatā€™s your recipe for breaking big problems into manageable ones?
    A: Taking a cue from SWE, we should try to boil it down to a smaller vertical slice of the business problem; knowing how to do this is the data scientist's job. Spend time thinking about the different layers of abstraction, bounded contexts, and decoupling, as well as how to build a clean architecture and interfaces.
  9. Q: End-to-end platforms vs. best-of-breed tools, which is best?
    A: I think it is too early to tell. I still don't see a standard (abstract) way of running ML projects, and that needs to emerge before we can talk about the "best" way to implement it.
  10. Q: Build-vs-buy?
    A: Build AND buy! You shouldn't outsource your core logic or couple it to a framework (see also: Clean Architecture).
  11. Q: What are the strongest/most exciting trends you see in ML and MLOps?
    A: I'd hope it's CodeQuality/LeanML, but one of my personal favorites is Geometric/Graph ML (Bronstein, Velickovic).
  12. Q: How do you keep up to date with everything thatā€™s going on in this field?
    A: It's very hard. I follow LinkedIn, MLOps.Community, and Twitter, and I hope anything important comes across these three. I take any piece of information I come across and validate it from first principles: does it make sense in my mental framework or not? You do need to avoid reading a lot of hype content and meta-analysis of hype content, because they are usually not very actionable.
  13. Q: Recommendations for the audience?
    A: Try looking for ideas outside of DS and see if you can apply them to your work. Those are usually strong signals that you are onto something. DS/ML is a very rapidly changing environment; be prepared to learn and change your mind because, more often than not, you are probably wrong.
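
Following up on Q5, here is a minimal sketch of where unit tests still pay off in an ML codebase: the deterministic parts of the pipeline (feature transforms, label mapping, I/O glue) rather than the statistical quality of the model itself. The `clip_outliers` helper and its bounds are hypothetical, purely for illustration.

```python
import unittest

import numpy as np


def clip_outliers(values: np.ndarray, lower: float, upper: float) -> np.ndarray:
    """Deterministic preprocessing step: clamp values into [lower, upper]."""
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    return np.clip(values, lower, upper)


class TestClipOutliers(unittest.TestCase):
    def test_values_are_clamped_to_bounds(self):
        result = clip_outliers(np.array([-10.0, 0.5, 10.0]), lower=0.0, upper=1.0)
        np.testing.assert_allclose(result, [0.0, 0.5, 1.0])

    def test_invalid_bounds_raise(self):
        with self.assertRaises(ValueError):
            clip_outliers(np.array([1.0]), lower=2.0, upper=1.0)


if __name__ == "__main__":
    unittest.main()
```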
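
Following up on Q7 and Q10, the sketch below shows one way the business-layer/infrastructure-layer split and the "don't couple with a framework" advice can look in code: the domain logic depends only on an abstract port, and the concrete model sits behind an adapter. All names (`SentimentModel`, `PriorityRouter`, `KeywordSentimentModel`) are illustrative assumptions, not the actual design discussed in the episode.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


# --- Business layer: pure domain logic, no framework imports ---

@dataclass
class Ticket:
    text: str


class SentimentModel(ABC):
    """Port: the only contract the business layer knows about."""

    @abstractmethod
    def score(self, text: str) -> float:
        """Return a sentiment score in [-1, 1]."""


class PriorityRouter:
    """Domain rule: route clearly negative tickets to a human."""

    def __init__(self, model: SentimentModel, threshold: float = -0.5):
        self.model = model
        self.threshold = threshold

    def route(self, ticket: Ticket) -> str:
        return "escalate" if self.model.score(ticket.text) < self.threshold else "auto"


# --- Infrastructure layer: adapter wrapping whatever you built or bought ---

class KeywordSentimentModel(SentimentModel):
    """Stand-in adapter; a framework-backed model would implement the same port."""

    def score(self, text: str) -> float:
        return -1.0 if "angry" in text.lower() else 0.2


if __name__ == "__main__":
    router = PriorityRouter(KeywordSentimentModel())
    print(router.route(Ticket("I am angry about the invoice")))  # -> escalate
```

Swapping the keyword stand-in for an sklearn- or PyTorch-backed adapter would not require touching `PriorityRouter`, which is the decoupling point being made.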


Dean Pleban

Co-Founder & CEO of DAGsHub. Building the home for data science collaboration. Interested in machine learning, physics and philosophy. Join https://DAGsHub.com
