
Building models that actually perform with Kyle Gallatin

MLOps Jun 20, 2022

In this episode, I had the pleasure of speaking with Kyle Gallatin, a Machine Learning Software Engineer at Etsy. We talk about how he built the machine learning platform at Etsy, experimentation in production (yes, you heard right), and how to optimize model performance at very large scales.


Read the transcription

Dean: Hi everyone. My name is Dean and you're listening to the MLOps Podcast. As you probably know, machine learning in general and data science are two fields that are evolving all the time and it's really hard to keep up to date. Specifically, the area of bringing machine learning into production or into the real world seems like it's very confusing. There's a lot going on and it's hard to make sense of all that's happening around you. But on the other hand, there are a lot of smart people that are doing great work in bringing their own projects into production, and we've had a chance to speak with a lot of these people, but it definitely seems like the information is not widespread enough and a lot of people don't know of best practices and how other teams work. So that's why we decided to start this podcast, where we'll be speaking with people who are working in various types of machine learning teams and hearing about how they are bringing their projects into production. I hope you find this interesting and let's get started. Today I have Kyle Gallatin with me. Kyle holds a master's degree in molecular and cellular biology. He trained as a data scientist at the NYC Data Science Academy, where he now serves as a mentor to students. Since graduating, he's been a data scientist at Fleetwick, followed by Pfizer, and then he transitioned into machine learning engineering. He's currently a software engineer at Etsy, where he is building their machine learning platform. Hi Kyle, thanks for joining. You have a very unique path into the world of data science, and I'd love for you to share a bit about how you got into this.

Kyle: I always like to use this phrase that I kind of tripped and fell into it. While I was getting my master's in biology and planning on going deeper into that route, I ended up getting an internship at a small biotech company, where my job was to teach myself R and Python to analyze our internal data. Through that, I learned about the whole world of data science and all that was possible there. I'd taken some coding classes in college, but they hadn't really resonated with me. I really got interested in R and Python through that internship, though, which led me to pursue that full-time boot camp and then go into the data science space from there. From biology all the way into data science.

Dean: One of the points that you mentioned is, you did have experience in coding before and you didn't really like it. Was having a real world problem to work on the difference, or was it something else?

Kyle: I think it was definitely the difference. It took me a while to get over that first learning curve hump of coding. I took a MATLAB class my freshman year, and then I took a Scala class my sophomore year. So a little bit of statistical programming, a little bit of more functional, laboratory-type stuff, but I always found it just frustratingly difficult to try and solve the sorts of problems that were assigned in class. But then when I was working on something real for a company and saw it driving value, it definitely changed my perspective of what programming was and the amount of satisfaction and enjoyment I got out of it, too.

Dean: You did the entire gamut of data related jobs and you are now in the more engineering oriented side of the spectrum. Was there a change or a point where you said, okay, maybe this coding thing is more interesting than I thought originally, or how did you end up where you are right now?

Kyle: I've always been more interested in the coding part than, for instance, the math part of data science and machine learning. I think I knew pretty early on I wasn't going to be making leaps and bounds, writing papers in deep learning or anything like that, and that my specialty was always in the practical side of things like implementing things, coding them out, and actually having them run. Through my work at Pfizer, I started to see a lot of machine learning projects fail to have impact because of the lack of software engineering expertise in a team or organization and the lack of engineering practices that were going into deploying these models to production and managing them and all those sorts of things. That's where I fostered an interest and saw a big opportunity to almost be a specialist in this new area of MLOps and ML infrastructure and platforms.

Dean: My main takeaway is, if you're now in a course or making your way into the world of data science and you feel like you're not as interested in coding as you thought you would be, your story is kind of inspirational, right? Because you're basically saying that being excited about the technical part is not a prerequisite to entering this field. You will need to work with code, and there's mostly no way around that today, at least for the more interesting use cases. But you can adopt that, or learn to love it, after you learn to love or be interested in the actual problems that you're solving, which I think should be inspiring and reassuring to a lot of people who are now getting into it.

Kyle: The one thing that I hear a lot from people who haven't coded, or have taken one coding course and looked at the data science or software space, is, I could never do that. And I'm like, listen, I know it definitely feels that way, but trust me, just like anyone, all it takes is time and some interest, and you would definitely get to the point where you're able to do it well.

Dean: This is already a good start to the episode. You're now sort of taking the lead on building the machine learning platform at Etsy. Why? From our experience, a lot of companies today want to do machine learning, and then the next logical step is "we need a platform to support machine learning." In many cases, that's actually not a good first step. You want to start by actually proving that this provides value before you systematize things and all of that. So, yeah. Why did you build the platform and how is it structured?

Kyle: I am definitely not the architect of this platform or the sole engineer by a long shot. The platform actually started back in 2017, long before I joined Etsy, as a means to meet the needs of data scientists within the company. They wanted a more standardized way, with proper software engineering and MLOps practices, to deploy models and maintain them in production. So I think all the way since back then, and probably even long before then, there's been value seen in machine learning at Etsy. It powers search, ads, and recommendations, and they do have frameworks set up where we can measure the real monetary impact of those systems, which is great. Essentially, the value of machine learning for Etsy justifies the existence of a very comprehensive platform to suit all of the needs of data scientists, machine learning practitioners, and applied scientists within the organization.

Dean: Can you share about what are the components of the platform?

Kyle: There are many, many different components. The machine learning platform itself is split into two sides, very simply: the training side and the serving side. There's also an ML systems side which deals with a lot of other stuff, and I've heard those two terms, systems versus platform, used both ways at a lot of different companies, but the platform is split into training and serving. The serving side is where I work, and that has to do with real-time model inference. Essentially, we deal with everything that happens once a data scientist has trained a model: serving it in production, monitoring it, having observability over it, and all of the scale and performance considerations that come with that. The training side deals more with day-to-day experimentation and setting up pipelines for new models, existing models, et cetera, and maintaining those over time.

Dean: What's the definition of the ML system side? Just so people can structure this in their head?

Kyle: There are a lot of different squads over on that side now. I personally think of it like everything upstream of training and serving. There's a lot of feature systems up there. There's some other things to do with experimentation. There's some computer vision specialty systems and stuff that are being built out.

Dean: It's less on the, let's say, getting something ready for production and more on the getting something ready for training. It's an earlier step.

Kyle: That'd be a good way to put it. It's by no means a hard line between the two. There's overlap in a lot of places that we collaborate, depending on what part of the system is being worked on.

Dean: All of this obviously is supposed to make the lives of data scientists within the organization easier, simpler, more effective at getting their work into production. And you did mention that there's a split. Even on the platform side, you have the training side, you have the deployment side. Obviously, those are connected in the sense that they happen one after the other when things are successful. I'm curious, how do the data scientists at Etsy interface with each part of this platform?

Kyle: To get a little more explicit and technical, we mainly use Vertex AI now, which is Google Cloud's managed AI platform. We use Vertex Pipelines, a managed service that provides you access to Kubeflow, for essentially writing model DAGs out, and then we use the Vertex AI platform to actually train those models. Data scientists can go on Vertex and train and experiment with whatever they want. They can open up notebooks with whatever environment they want, experiment with code, try out models, et cetera. And then when they're ready to productize that more, they can work it into a Kubeflow DAG that will be on a schedule and run daily or something like that, which is great and super helpful. Once it's handed off to the serving side, we have a model management service that we call Barista internally, which the serving squad manages, and that is essentially both a UI and an API that acts as a control plane over Kubernetes. So through the different model serving images or platforms that we support (TensorFlow Serving, Seldon, and an in-house one written in Python that was built quite a while ago), data scientists can create deployments on Kubernetes and then essentially take their model artifacts and configure their deployment with that artifact to run for inference.
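
For readers who want a rough sense of what a scheduled training DAG looks like in practice, here is a minimal sketch of a Kubeflow Pipelines (KFP v2) pipeline that could be compiled and submitted to Vertex AI Pipelines. The component, names, and arguments are hypothetical illustrations, not Etsy's actual code.

```python
from kfp import compiler, dsl


@dsl.component(base_image="python:3.10")
def train_model(training_data_uri: str, model_dir: str):
    """Placeholder training step: real code would fit a model and write a
    TensorFlow SavedModel to model_dir."""
    print(f"Training on {training_data_uri}, writing the model to {model_dir}")


@dsl.pipeline(name="daily-model-training")
def training_pipeline(training_data_uri: str, model_dir: str):
    # A single-step DAG; a real pipeline would chain data prep, training,
    # evaluation, and a push-to-serving step.
    train_model(training_data_uri=training_data_uri, model_dir=model_dir)


if __name__ == "__main__":
    # Compile to a pipeline spec that Vertex AI Pipelines (or any KFP
    # backend) can run on a schedule.
    compiler.Compiler().compile(training_pipeline, "training_pipeline.json")
```

The compiled spec is the artifact a scheduler would submit, which is the "on a schedule and run daily" workflow Kyle mentions.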

Dean: It makes sense. As a data scientist, what does that require of me to use TF Serving or Seldon or your internal system? Do I need to support a specific format, or does everything that works for those systems work for you as well?

Kyle: It definitely depends on which one you're trying to go with. When we built out our new platform, it was built as a TensorFlow-first platform, so there was a lot of push internally to adopt TensorFlow as the primary serving framework. We built a lot of our assumptions on the fact that most folks would be using TensorFlow, and all that requires from the TensorFlow Serving perspective. We do have internal helper libraries and stuff, but all we require at this time really is a TensorFlow SavedModel artifact. You can essentially pass that in, or pass in its location from a blob store or something similar, and it will be loaded up in a TensorFlow Serving container and you get a deployment on Kubernetes. The reason we adopted Seldon as well was because we also wanted to maintain some flexibility in the platform. We didn't want TensorFlow to be the only modeling framework. Many of our older models served on our legacy in-house framework are things like GBMs. There are some TensorFlow models on there as well, and I think probably a dozen other frameworks, like VW for some language stuff. But we wanted people not to be limited to just TensorFlow, to be able to write whatever. So Seldon is a little bit of a different workflow, where data scientists write their own Python serving code and manage their own images in their own repo. So that's the flexible side of the trade-off: you can go with TensorFlow and just provide an artifact, but if you need or want more flexibility, you can write whatever you want, although then obviously you're responsible for more aspects of the code base and the serving code.
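
To make the two routes concrete, here is a hedged sketch of what each might look like on the data scientist's side: exporting a TensorFlow SavedModel artifact for the TensorFlow Serving route, and writing a small model class following Seldon Core's Python wrapper convention for the flexible route. The toy model, paths, and class are placeholders, not Etsy's internal code.

```python
import tensorflow as tf


# Route 1: hand the platform a TensorFlow SavedModel artifact.
class ToyModel(tf.Module):
    """Toy stand-in for a trained model."""

    @tf.function(input_signature=[tf.TensorSpec([None, 10], tf.float32)])
    def __call__(self, x):
        return tf.reduce_sum(x, axis=1, keepdims=True)


tf.saved_model.save(ToyModel(), "/tmp/models/my_model/1/")  # or a blob-store path


# Route 2: own the serving code with a Seldon Core Python wrapper class.
class MyModel:
    def __init__(self):
        # Load whatever framework or artifact you like here (TF, GBMs, VW, ...).
        self.model = tf.saved_model.load("/tmp/models/my_model/1/")

    def predict(self, X, features_names=None):
        # Seldon Core calls predict() with the request payload.
        return self.model(tf.constant(X, dtype=tf.float32)).numpy()
```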

Dean: That's a really great mental model. Many times it's either-or, and then you lose on one side of the trade-off: either you're very inflexible, or you require a lot of upfront work in order to deploy a model. This gives you a "choose your own adventure" sort of paradigm where you can't complain. If you want simple, just use TensorFlow; if you want custom, then you need to do a bit more work to support that. That makes a lot of sense to me, and I expect the market to go in that direction, with more platforms doing this and enabling that selection rather than either-or. You have a notion of experiments both on the training side and the deployment side. On the data science side, you have the experimentation of trying different model types and seeing what performs well, and then on the production side, you want to actually test the effect that has on the downstream or business metrics, which may be harder to measure at training time. How does that look within your system? And is that something that you support natively, or not?

Kyle: The experimentation side of ML training is all about those offline metrics. Area of the curve or some data science metric. I'm not a data scientist anymore, I won't even talk about it. But we actually have an entire experimentation platform that's built up around the concept of those AB test experimentations for online metrics, essentially confirming that a model not only performs well offline with those metrics, but actually generates more revenue or leads to more clicks or more interactions with products or something like that. So we have an entire team actually, that works on a platform to manage that and measure those online metrics once it's being served in our machine learning platform. I would say it is fairly mature in that regard, that we definitely do have comprehensive systems in place for evaluating those kinds of things. But I don't spend that much time in the experimentation side of things. I mostly focus on making sure a model is performing enough to get to the phase of experimentation.
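
As a loose illustration of the kind of online comparison an experimentation platform automates, here is a minimal sketch that checks whether a treatment variant's conversion rate beats control using a two-proportion z-test. The numbers and the choice of test are assumptions for illustration, not Etsy's methodology.

```python
from statsmodels.stats.proportion import proportions_ztest

# Made-up online metrics for a control model and a new (treatment) model.
control_conversions, control_visitors = 1_150, 50_000
treatment_conversions, treatment_visitors = 1_260, 50_000

stat, p_value = proportions_ztest(
    count=[treatment_conversions, control_conversions],
    nobs=[treatment_visitors, control_visitors],
    alternative="larger",  # does the treatment convert more than control?
)
print(f"z = {stat:.2f}, p = {p_value:.4f}")
```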

Dean: I'm going to dig deeper on one point here, and then we'll move on to performance, which I think is also super interesting. But that would put Etsy in a very advanced stage compared to most companies that I get a chance to speak with, at least. We're all data people, so we know the value of an A/B test and of being data-driven in deciding what to bring into production and things like that. But setting up that framework is something that I don't see often. Companies talk about wanting to do that in the future. It's usually not yet implemented, and that's fair. We're still growing as a market, and that's okay. But yeah, what tips could you give? If I'm a CTO of a company that wants to do this, how can I get there faster or better?

Kyle: So that's a million dollar question. I'd probably be in a different role if I could answer that effectively. The first thing is just not forgetting that these things exist when you're thinking about the layout of your company. At least this was true a couple of years ago for companies that were only hiring data scientists and not hiring software engineers: modeling and machine learning shouldn't exist in a vacuum, and you should have a proper number of teams to support every single phase that an actual machine learning model goes through. So it could be the simple advice of: have a team dedicated to building out an experimentation platform. Have a team dedicated to building out a machine learning platform. You will need different teams to serve your data scientists and modelers internally so that you can deliver value with machine learning. You want modeling folks to focus on modeling, and then once they're ready, you want them to hand that off through some clean interface to a team who specializes in infrastructure and scale. And you want a team that specializes in experimentation or data to write software that compares the performance to things in production and generates good metrics that you can trust, so you can escalate from there. I think it comes down to the structure of your organization and the good folks that you hire to build that out and define that workflow, because machine learning by itself is not going to deliver value without all of those other considerations.

Dean: One thing I can share from our experience is that the first step seems to be having the people building the models understand the language of the business side. If you ask someone what their metric for success is and their answer is accuracy or area under the curve, that's an indication they're thinking about this as a technical problem to solve and not a business problem to solve. Whereas, even if you don't have a good way to A/B test, but you realize that what you're trying to optimize for is reducing churn, increasing conversion, whatever it is, then you're already one step ahead. And then the next part, and again, I feel like I'm speaking from 30,000 feet above the terrain and it's very hard to implement this, but another thing you can do is, before building a model, start by implementing some naive solution, deploy that, and see what happens. That usually sets the stage for not being in the mentality of throwing machine learning at it. Which is dangerous, because machine learning is awesome. I definitely understand more than most why people would want to work on it and solve problems with it. But having that mentality and thinking about this as a product and not just as a technology might be a good first step, which doesn't require special technical abilities.

Kyle: To take a ton of steps back, evaluate whether or not you need machine learning in the first place. My philosophy is definitely don't do it unless you need it. There are so many use cases I've seen where folks either want machine learning because it's machine learning, or want to do machine learning for the problem because it's what they know. But the problem could be solved with just data structures and algorithms or a well designed general hierarchical framework.

Dean: Let's dive into performance. You decided you want to deploy a model, and let's say you know the business needs, which, by the way, also relates to performance. Because if you don't know the business needs and you don't care whether the model is performing or not, you just want as many neurons, layers, whatever it is, and the coolest model. But obviously, if you make the best prediction but it takes 3 hours and the user expected it to happen after 3 seconds, then no user will wait to see your cool model spit out whatever it is. One thing that always comes up is: who is responsible for this? Is it the platform team? Is it the data scientist? Or is it some other engineering team which does optimization?

Kyle: It's a joint effort, because we have to look at every layer of things going on. As the platform team, we're responsible for the cluster, the deployments themselves, and some of the other infrastructure, like our ingress controller, load balancers, those sorts of things. But then when it comes to the actual model latency itself, since data scientists can write whatever code they want with TensorFlow Transform, or add whatever layers they want to a network, obviously there's a lot that could happen within just the model itself to introduce latency. So if we get to the phase of performance testing, which happens before experimentation, and we see that latency is relatively high, we try and work with them to either optimize infrastructure settings or identify specific places in the model that could be improved. We're actually working on an in-house framework right now to identify that as early in the pipeline as possible, almost alongside the training and experimentation side, to say this will be a high-latency model, or will essentially cost a lot to serve at low latency. For instance, with TensorFlow Transform, we've had instances where replacing a simple nested for loop with more vectorized code took P99 latency from 200 milliseconds down to something like 15 milliseconds at the same RPS. And it's those kinds of things where you're just like, wow, there are little things you can do in your code that have a huge impact on how a model runs in production.
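
To illustrate the kind of change Kyle is describing, here is a toy example (not Etsy's code) of the same preprocessing written with per-row looping via tf.map_fn versus whole-tensor vectorized ops. The vectorized version keeps the work in a handful of graph operations, which is where latency wins like the one above tend to come from.

```python
import tensorflow as tf


def normalize_slow(x):
    # Row-by-row work via tf.map_fn: many small ops per request.
    return tf.map_fn(
        lambda row: (row - tf.reduce_mean(row)) / tf.math.reduce_std(row), x
    )


def normalize_fast(x):
    # The same math expressed as whole-tensor ops: one pass, far fewer ops.
    mean = tf.reduce_mean(x, axis=1, keepdims=True)
    std = tf.math.reduce_std(x, axis=1, keepdims=True)
    return (x - mean) / std


x = tf.random.uniform((512, 128))
# Both versions compute the same result; only the op structure differs.
tf.debugging.assert_near(normalize_slow(x), normalize_fast(x), atol=1e-4)
```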

Dean: How do you do knowledge preservation on that? Because it seems like a lot of these things are cases where someone tried something out, or looked at the bytecode behind TensorFlow and realized that avoiding a for-loop would be better. So do you have a wiki of hacks for improving speed, or how does that look?

Kyle: We put together some code labs, which is kind of fun. Since we're TensorFlow-first, we use TensorBoard primarily to look at specific operations in the TensorFlow graph and stuff. Unfortunately, support for TensorBoard is a little bit more limited on the inference side than it is on the training side, and there are fewer features you get. But you can still look at a lot of different parts of the TensorFlow graph and say, oh wow, this operation is super expensive or taking up this percentage of the time, and it has helped us in many instances identify bottlenecks where data scientists have gone back, rewritten some code, and we've ended up with a much more performant model.
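
For readers who want to try this themselves, here is a small sketch of profiling inference with TensorFlow's profiler so the trace shows up in TensorBoard's Profile tab. The model and log directory are placeholders.

```python
import tensorflow as tf

# Stand-in model; in practice you would load your SavedModel.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),
])
batch = tf.random.uniform((256, 128))

# Capture a profile of repeated inference calls.
with tf.profiler.experimental.Profile("logs/inference_profile"):
    for _ in range(100):
        _ = model(batch, training=False)

# Then run: tensorboard --logdir logs/inference_profile
# and open the Profile tab to inspect per-op cost.
```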

Dean: Would you say that that's usually the bottleneck? What percentage of bottlenecks are a result of the actual model versus data transformation? Things like that?

Kyle: It's more of an accumulation of bottlenecks rather than a single one. We've had one or two times where it's one bottleneck and it's like, bam, solved, done. But more often than not it's multiple improvements, making small changes to the code that have improved it over time. There are definitely always things we can do on the infrastructure side as well. We can always increase the amount of resources it gets, which increases costs. We can play with things like the client-side timeout, to essentially say, all right, users will get results a little bit slower as a general rule here, but we're okay with that for this model because it'll still give us better results. But there are more things, or a greater variety of things, that we've done on the modeling side to decrease serving latency.

Dean: There's a difference between the user expecting something to happen instantly and you taking 5 seconds, versus saying, listen, we're doing magic for you, and showing a loader for 5 seconds. Sometimes the product advice is actually to load for longer than the action takes, because it looks like you're doing meaningful work for the user. Jokes aside, there are case studies that prove this improves conversion and things like that. So there are many ways to think outside the box about improving performance. Some of them might not be to improve performance at all, but to reflect to the user some of the magic that's happening behind the scenes. There is a trade-off here, right? Every model requires a different decision making process, and it depends on the task, it depends on the maturity of the organization. How do you balance performance versus results? And do you have any tips on how to think about this, like a framework I could adapt to my company or to any other company?

Kyle: Even though we're the platform team, it's not like we own these models. When it comes down to it, that is a product decision, whether or not you're okay with having something come back in X time at X cost. But we have thought about it when we've had a new iteration of a model that's very costly to serve, and it comes down to simple costs. Like, all right, it's going to cost us this much annually to serve, and it's going to net us this much profit compared to the old model. Are we okay with that from a monetary perspective overall? Are we okay with that from an environmental perspective, if it's that ridiculous of a model? And how should we move forward from here in a way that makes the most sense for the product? Thought about in the simplest terms, it's the cost of implementation and maintenance versus the profit. It's often somewhat clear-cut whether or not it's worth it.

Dean: Are you looking for a certain threshold of upside and that's just it?

Kyle: That would probably depend on the product itself. I don't think we have a universal framework of "no-go if it's only this much of a margin, and yes above a certain threshold." So no, I wouldn't say it's that specific. I can only imagine the number of variables that would factor into whether or not that's the right decision for a specific product or a specific business. But at the end of the day, I'm sure product always makes the right decision.

Dean: You mentioned this in passing, but you're basically saying that you're trying to build in the mechanisms to identify these bottlenecks as soon as possible. So to me, what comes to mind is a profiler, right? I recently looked at this for a website, where you want to see what the bottlenecks are in loading the page, because if it loads horribly, then people will just drop off. They don't want to wait. If you have a machine learning profiler, that would be very cool. I don't know if that's something that exists right now, but how do you build this into the model building process? How does that actually work?

Kyle: We actually are working on a profiler, which is pretty cool. I don't want to give too much away, but the fundamental idea is that we run profiling and small, contained load tests as early as possible in a data scientist's model experimentation and training process. So if you're testing a model for the first time, you can very easily export a TensorFlow artifact, deploy that to our dev environment with TensorBoard attached, and automate a profiling run using some data that you just trained with, to see whether or not it's expensive to serve, and also export those TensorBoard results to get more insight into specific operations. This approach has already identified some bottlenecks early in the process in a couple of models. And it's been great, because as opposed to data scientists investing tons of hours, getting to the end stage, and then us spending a month or two trying to cut back on features or transformations, we get to do that from the moment that they're trying out new things early on in the process. So that's been huge for us in terms of identifying problems early.
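
In the same spirit, here is a minimal, hypothetical load-test sketch of the kind of early check Kyle describes: hit a dev model endpoint with representative data and look at tail latency before investing further. The endpoint, payload shape, and request count are assumptions, not Etsy's tooling.

```python
import time

import numpy as np
import requests

# TensorFlow Serving's REST predict endpoint, assumed running locally in dev.
ENDPOINT = "http://localhost:8501/v1/models/my_model:predict"
payload = {"instances": [[0.1] * 128]}  # placeholder feature vector

latencies = []
for _ in range(500):
    start = time.perf_counter()
    requests.post(ENDPOINT, json=payload, timeout=5)
    latencies.append((time.perf_counter() - start) * 1000)  # milliseconds

print(f"P50: {np.percentile(latencies, 50):.1f} ms, "
      f"P99: {np.percentile(latencies, 99):.1f} ms")
```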

Dean: Does that happen as part of the CI for some stage? When do I do this as a data scientist?

Kyle: Yes, you can do it whenever. We have it as a CLI, but we're also trying to write an automation where you can basically run this as part of your training pipeline. If you have a model and data, you can write this into your pipeline, or you can run it ad hoc, however you want, whatever makes sense for the phase of experimentation that you're currently at. The important thing is just that it's done, and done early.

Dean: What you're saying is, these are a set of tools that are used everywhere in software development, but then you go into machine learning and they don't exist anywhere. Having a profiler is such a basic thing that is so helpful. And then having unit tests, defining the criteria that you care about, and testing early, it's taken for granted in software development, and in machine learning it's just like science fiction. So it's awesome to hear that you're adopting those paradigms as well.

Kyle: One of my personal philosophies is that machine learning is software development, or some specialized discipline of software development. It took us forever to get to MLOps, but it was a huge, mind-blowing realization for machine learning that it needs to be more like software development. And then we do the same thing again with profiling or governance or whatever it's going to be, and it just keeps happening, one thing after the other. So I think the faster we take as many learnings from software development as we can to the machine learning side, the faster we'll get to whatever the end state of machine learning is. And maybe it's infinite, and software development will just keep improving and getting better, and all of that.

Dean: Let's say we're now at the state where you've provided tools for data scientists to identify bottlenecks and maybe places where they can improve. That usually leads to a point we've seen in a bunch of organizations we've spoken with, where you have the trade-off between data scientists being familiar with a framework, the typical example being Pandas, right, with the different transformations you can do, versus the fact that some of these functions are not performant. And then you have to choose: do I enable the data scientist to use the entire framework as they know it, or do I tell them that these are the functions that you're allowed to use in order to get good performance results? I'm curious how that works at Etsy, and if you've figured something out that others can learn from.

Kyle: We're not super prescriptive, because we offer so much support around TensorFlow. For instance, folks tend to use TensorFlow Transform and all the TensorFlow functions, but when it comes to what happens within there, we're definitely not prescriptive at all. And I think Etsy does have a high caliber of ML practitioners; we've had folks make contributions to open source TensorFlow because they've noticed bottlenecks in specific functions, and they're by and large the experts there, where we're not. Besides being prescriptive with the general framework, we're usually not like, oh no, you can't do that or this. Functionally speaking, they're allowed to do whatever, even in the case of Seldon, where, at the end of the day, if they get to a point where it's infeasible from a product perspective, folks are aware of that and would avoid getting to that state to begin with.

Dean: I think that that relates to the point of, if you think about this as part of a product, then you realize the limitations yourself. And it's not like, why are they forcing me not to be able to use this function, which is awesome. Generally, what tips can you give for people that are trying to improve their models and build out the systems for model performance evaluation?

Kyle: It's software development. Test early, test often. Write those unit tests early, understand how your model is performing early, because the later in the process that you do it, the more difficult it's going to be to cut your model back down to a state where it's reasonable to serve in production. The earlier that you identify it, the better that you can think about it as a consideration while building the model, as opposed to a blocker once the model is built.
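
One lightweight way to act on that advice is to encode a latency budget as an ordinary unit test that runs from the very first experiments. The sketch below is illustrative; the toy model and the 50 ms budget are made-up stand-ins, not an Etsy threshold.

```python
import time

import numpy as np
import tensorflow as tf

LATENCY_BUDGET_MS = 50  # illustrative budget, tune for your own product needs


def test_single_inference_latency():
    # Stand-in model; in practice you would load the model under development.
    model = tf.keras.Sequential([tf.keras.layers.Dense(8)])
    example = tf.constant(np.random.rand(1, 128), dtype=tf.float32)
    model(example)  # warm-up call, excluded from the timing

    start = time.perf_counter()
    model(example)
    elapsed_ms = (time.perf_counter() - start) * 1000
    assert elapsed_ms < LATENCY_BUDGET_MS, f"inference took {elapsed_ms:.1f} ms"
```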

Dean: That makes a lot of sense as well. What are your thoughts about machine learning infrastructure? How has it changed since you've been working in this field, and what are the trends that you're seeing?

Kyle: It's become a field, which is one thing. When I first got into data science, MLOps wasn't really a term, and then MLOps became a term, but I still didn't really hear the words machine learning infrastructure or machine learning platform together. And then out of that, I slowly started to see more roles that weren't machine learning engineer, but software engineer/machine learning, or machine learning software engineer, or machine learning platform engineer. And I think that, like all other things in machine learning, it's rising to the level of software engineering, where, as a platform team working on machine learning, we get a lot of tips from, for instance, our web infrastructure team or our search infrastructure team, who are absolute experts in managing complex infrastructure for pieces of our product that have been running forever, and have tons of tips and good things to learn from. And then they also help us identify the machine learning specific troubles that we have; serving a bunch of models in production has different considerations from hosting web services or whatever. And so we can work with them to make decisions and improve the platform from there. As a whole, I just see it coming to par with all the other platform and infrastructure tooling that's been in place. I see a ton of companies trying to build out platform teams. That's not a statistical observation, just from my LinkedIn DMs and folks I've spoken with, but I see a lot of people suddenly having this realization that they need to build out a machine learning platform for their internal data scientists to drive effective value from machine learning.

Dean: Adopting the mindset that you can just go over to an engineering team that's working on something completely different, because a lot of learnings have happened there that can support what you're doing, is a great tip. Because most companies that have a machine learning team had an engineering team before. Not all, but most of them. So if you have someone senior there, then you definitely have the experience within the organization to learn from. As you said, there are differences. I'm curious if you've found things that are helpful to mention upfront so that those engineers have the right mindset when they advise about scale.

Kyle: One thing is definitely how you set up your models in production, autoscaling in Kubernetes, to be specific. For many folks, if you have an image already on a node, it's really easy to spin up a container and react very quickly to load for whatever application you're running a set of pods or replicas for. If we're serving a really large model, it needs to be loaded into memory and sometimes warmed up before it actually starts to make inferences. So we have some deployments that take 20 minutes, because we're loading a huge, huge model into memory. And it's not the kind of thing where you can just say, oh, I'm going to react to load like that. You need to preempt it more, or be a little bit more conservative in your estimates of when you should or shouldn't scale. So there are things like that that just don't really apply in many other spaces, where you're loading this massive object into memory the moment before you start an application. It's a little bit less like, oh, you can do this, and more like, oh, well, you can't exactly do that. We do actually have a few more limitations, because it's the machine learning space, on how and when we can scale or what we can do to improve an application.
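
A common way to handle the slow-start problem Kyle describes is to gate Kubernetes readiness on an explicit warm-up step, so no traffic is routed to a replica until the model is loaded and has served a few dummy requests. The Flask app below is a hedged sketch of that pattern, not Etsy's serving stack.

```python
import tensorflow as tf
from flask import Flask, jsonify

app = Flask(__name__)
model = None
warmed_up = False


def load_and_warm():
    """Load the model and run dummy inferences before reporting ready."""
    global model, warmed_up
    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])  # stand-in for a huge model
    for _ in range(10):  # a few dummy requests to trigger lazy initialization
        model(tf.zeros((1, 128)))
    warmed_up = True


@app.route("/healthz/ready")
def ready():
    # Kubernetes readiness probe: only route traffic once warm-up is done.
    return ("ready", 200) if warmed_up else ("warming up", 503)


@app.route("/predict", methods=["POST"])
def predict():
    # Placeholder: a real handler would parse features from the request body.
    return jsonify(model(tf.zeros((1, 128))).numpy().tolist())


if __name__ == "__main__":
    load_and_warm()
    app.run(host="0.0.0.0", port=8080)
```

The deployment's readinessProbe would then point at /healthz/ready, so the pod only receives traffic after warm-up completes.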

Dean: That's a really useful tip. Thanks for sharing that. For people who are trying to get a mental image of the scales that we're talking about, can you talk a bit about some of the scale of data, the scales of models that you're working with, just so people understand where you're at, and then if there are any other counterintuitive adaptations like the one you just mentioned, that would be awesome.

Kyle: The scale is well into the terabytes of data generated per day. I don't even work with the data platform, but just the small percentage of feature requests that I know we log comes to something like 30-plus terabytes a day over our big period this year, and I'm sure that's only a fraction of the actual data logged per day over our largest period around Black Friday. This year, we got 280,000 requests per second on the machine learning platform, and most of those are batched. So that's millions and millions and millions of predictions per second that we're making on this platform. Hundreds of nodes in the cluster at any given time, with many of them being 32-CPU nodes at a minimum.

Dean: You also have this where you deploy to a smaller subset of users, and then deploy it to the broader system. I'm curious, when you transition between those two modes, are there recurring themes of things you need to change in order to support that larger scale, or is it relatively straightforward?

Kyle: Usually we can figure out good settings for a model even with that smaller fraction. Speaking from the machine learning platform side, I'm sure there are broader considerations as we go further upstream, between stuff that goes on in web and upstream clients and search and all of that, and machine learning systems with features and everything. But for all the complexities that come with machine learning deployments, we're running stateless applications on Kubernetes. It's pretty simple in that regard. Once you have it figured out there, there's not a ton more you can do to improve it. So even at a small percentage of traffic, it's relatively easy for us to translate those settings to a model serving larger traffic. It's usually making sure that we're not setting a very low cap on how high it can scale, having a good sense of whether, and to what degree, it's going to scale linearly when it reaches the full amount of traffic, estimating that beforehand, and making sure our cluster can handle it.

Dean: It's very obvious that you're working intensively with Kubernetes. I think I mentioned this in one other episode that we recorded: the first time you work with Docker, it seems like magic. The amount of things you can do with so few lines of code is incredible, and the fact that it's so flexible and generic is crazy. And then you get to Kubernetes and it's the next layer of magic. It's really insane, the fact that you can do so much with so little. But understandably, people are sometimes intimidated by these two systems. What are your thoughts about how familiar data scientists should be with these tools? And if one of our listeners is a data scientist who wants to learn these tools because they understand that they're important, how would you have them approach it in a way that wouldn't scare them away after a day?

Kyle: Docker is a ubiquitous tool that should be in every developer's toolkit at this point, including ML practitioners and data scientists. It's just too useful in the day-to-day for anything you might be doing to be something that you should ignore. It's super important, and because it's useful to the everyday developer, I think that's something that everyone should know and will find value in learning. Kubernetes, on the other hand, is something that's particular to scale. Kubernetes is like machine learning: if you don't need it, then don't do it. So it's the kind of thing where, even if you're not working in it every day, there's probably value in picking up those skills and understanding what's going on, but no, you don't necessarily need to go and learn it. I wanted to learn it as a data scientist and machine learning engineer because I thought it was really cool, and I thought that was where I was interested in going: the scale of machine learning applications. But if you want to learn it, I think there are a lot of great resources now. If you have Docker Desktop, you can go into that drop-down UI, click Enable Kubernetes, and you have Kubernetes running locally. You can get the same kind of practice with kubectl commands and everyday Kubernetes stuff locally that you can in a real cluster, to a degree, and you can get the same low-level, everyday debugging knowledge of Kubernetes locally, which is great. So if someone wanted to learn it, I would say just install it locally with whatever framework you want and play around, take some courses, find some blogs. I wrote an article about Kubernetes for data scientists, which has a super high-level example of serving a machine learning model locally with Kubernetes. There's a ton of different stuff you can do to get familiar with it without breaking your personal bank on cloud costs or anything like that.
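
Once a local cluster is running (for example via Docker Desktop's Enable Kubernetes option), poking at it programmatically is another low-stakes way to build familiarity alongside kubectl. This small sketch uses the official Kubernetes Python client and assumes a working local kubeconfig.

```python
from kubernetes import client, config

config.load_kube_config()  # picks up the same config kubectl uses
v1 = client.CoreV1Api()

# List everything running in the local cluster, like `kubectl get pods -A`.
for pod in v1.list_pod_for_all_namespaces().items:
    print(f"{pod.metadata.namespace}/{pod.metadata.name}: {pod.status.phase}")
```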

Dean: I have a blog about Docker for data science. So together we cover everything.

Kyle: That's all you need to know, those two things.

Dean: There's this term for onboarding users to complex systems: gradual exposure. If you have a list of ten features and the user starts using your tool or whatever product, and you show them a list of all the features, they'll be scared away because it's too overwhelming. What you actually want to do is, here's one feature, and now that you know that, here's another feature that improves your life here. Docker and Kubernetes share that relationship. You can think of Docker as the single instance, and then Kubernetes is what happens if I have a bunch of these running around, doing multiple things and needing to communicate. So definitely start with Docker and see that you have a decent handle on that. It's not hard to get to the basic level of using it, understanding what's going on, and then diving into specific containers if and when that's necessary. You don't have to be able to rebuild it yourself. For machine learning, there's this thing where a lot of courses tell you to reimplement famous models, write your own SVM, write your own neural network and things like that. You don't have to do that with Docker. It's okay if you just know how to use it without knowing how to build it yourself. And then Kubernetes would be the next step. I think there really is a lot of content; just Google Docker or Kubernetes for data science, you'll find great stuff and you can start there. What are your recommendations for the audience? It might be related to any of the topics that we discussed, but also books you like, Netflix shows, whatever.

Kyle: I always see a lot of people wanting to get into data science or get into machine learning platform engineering. I always recommend people do what you're interested in, because there's an infinite amount of learning to be done in any of those spaces.

Dean: You don't have to love it. But if you love it, it will be a good North Star for you because you could follow your curiosity, just like pulling strings. Loving something and being curious about something is not exactly the same. Being curious is more like having questions, and if you can get to the point where you have questions about what you're doing and then follow those answers to the next set of questions that you ask, then you will be able to develop in whatever field that you want. There are still limitations, and of course, everyone has different circumstances and time allocations and things like that. But I do think that that plays a role. That being said, there are people that just work for a living, and that's also okay.

Kyle: I don't want to perpetuate any kind of stereotype that that's how people should be living and breathing software engineering, because it's not necessarily a mentally healthy way to live for a lot of people at all. But being curious is a great way to put it. It's a good way to foster interest and develop in the field.

Dean: Fair enough. Trying to think if I have any new recommendations. The problem is we recorded the last few episodes in the span of a few days, so I feel like I don't have any new recommendations to give. I started watching a Netflix show, and this is related to current events: Netflix went ahead and translated Servant of the People, the show where the current President of Ukraine plays a teacher who becomes the President. It's satire. It's pretty funny. I'm enjoying it. And now they have English subtitles, so it's accessible to non-Ukrainian speakers. It's also crazy that an actor who played the President actually became the President. Even the US doesn't provide Hollywood stories like that. Anything else you want to add before we wrap up?

Kyle: I watch a fair bit of Netflix myself, so I'll give a Netflix anime recommendation: Kotaro Lives Alone is actually pretty good. I watch a lot of anime and I'm real skeptical of Netflix original anime, but Kotaro Lives Alone is recommended.

Dean: I got that recommendation from Netflix and didn't watch it because I feel like I'm overloaded with anime, but I'll check it out. I just finished the most recent season of Demon Slayer, which is not a Netflix original, but I enjoyed that. Kyle, I really appreciate you coming on. This was super interesting, and I'm sure a lot of the people listening will agree with me. So thank you again, and I hope to have you on again at some point.


Dean Pleban

With Gal Kabiri

Co-Founder & CEO of DAGsHub. Building the home for data science collaboration. Interested in machine learning, physics and philosophy. Join https://DAGsHub.com
