Large models in production with 🤗 HuggingFace CTO Julien Chaumond

Dean Pleban
33 min read
5 years ago

Co-Founder & CEO of DAGsHub. Building the home for data science collaboration. Interested in machine learning, physics and philosophy. Join https://DAGsHub.com

In this episode, I'm speaking with Julien Chaumond from 🤗 HuggingFace, about how they got started, getting large language models to production in millisecond inference times, and the CERN for machine learning.

Read the transcription

00:01:00

Dean: Today, I have Julien Chaumond from HuggingFace. Most of you have probably heard of HuggingFace. They've created some of the most widely used and loved libraries for machine learning. First and foremost is probably their Transformers library, which started out in NLP but now supports other types of models, lets you use them very easily, and makes them much more accessible to a lot of people. Julien is a co-founder and CTO at HuggingFace, and before he worked at this company, he graduated from Stanford with a master's in Electrical Engineering and Computer Science. HuggingFace is his second start-up as an entrepreneur, and he has a ton of experience building machine learning systems and is, overall, a really awesome guy.

00:02:19

Dean: Can you share the HuggingFace origin story? How did you decide to start the company and where does the name come from?

00:02:30

Julien: So the name is quite random. We were looking at emojis, which are kind of the defining element of modern communication. We really wanted to pick an emoji as a company name, and we thought to ourselves, Clément, my co-founder, and I, that the Hugging Face emoji is pretty unique in the fact that it's an emoji, so it's not human, and at the same time it has human features, like the hands, and it's doing a really human gesture, which is giving a hug. So it's bridging between machines and humans in a way. We liked that. We didn't necessarily expect to stick with that name. But, in effect, it grew on us, and I feel like it's a nice emoji to pitch. We started in 2016, really passionate about machine learning and especially about Natural Language Processing. We were feeling that a lot of things were starting to happen in NLP, back in 2016, and we basically wanted an excuse to leave our previous companies and work full-time on NLP. So that was the start of HuggingFace. It was like Clément and Thomas Wolf, and we grew the team pretty slowly at first. The first couple of years were pretty interesting, and we've been super lucky to be part of this machine learning, NLP community at a time where a ton of things have happened and are going to continue to happen.

00:04:52

Dean: I heard this advice that you should always start with a horrible name on purpose, like choose a really bad name that you would never feel comfortable putting outside, because if you choose something which is in the middle, then it grows on you and you stick with it even if it's not perfect, so you decide the name just before the launch. Admittedly, we did the same thing at DAGsHub, so I don't know if I could say that it's easy to apply this advice... You mentioned that you really wanted to work on NLP, and I agree with you that NLP is super interesting and there's something fundamentally different about human communication, but you recently launched support on HuggingFace for audio and computer vision models. So I'm interested in your perspective: why start with NLP models? Why were you so excited about it?

00:05:56

Julien: Language is the API to humans, right? Between humans. It's what you use to communicate. You do 90% of your activities using language, be it sending emails, calling someone on the phone, talking to your family, your friends, anyone. We don't necessarily have a strong reason why, but it's always been our passion. We've always been super excited about language, and our initial reason to get into machine learning was to be able to spend some time really trying to catch up with what was happening in machine learning applied to text, with a strong interest in natural language.

00:07:16

Dean: Today, HuggingFace is a huge success and there's a huge community. Out of all the open-source communities I've seen in the machine learning world, I don't think that there is any community that loves the tool that they're using more than the HuggingFace community. I would even use terms like "religiously love", but maybe that's too strong, but people really love this library. I'm wondering if you could share, what was the moment that you understood that this is going to be super successful? Where did you say, holy cow, this is going to the moon?

00:08:57

Julien: First, it's super humbling to have this community of super skilled, super talented people all over the globe, basically enjoying what we do, and also it's not something that we do for the community. Basically, we try to help the community do stuff themselves. We are part of the community more than a company that tries to do stuff with the community. And to be honest, it's something that's pretty unique. A lot of people on the team tell me, and I feel the same thing, every morning it's super exciting to get up because you have this incredible energy from the community and we have this chance of being able to serve the community. Those are super, super talented, skilled people everywhere. So it's awesome. To answer your question, right from the start of the company, 2016, we started doing some open-source. One of our first open-source libraries was a neural co-reference resolution library. Co-reference is basically linking pronouns to subjects in a sentence. For example, take the sentences "My wife got up late this morning. She slept really late because of the heatwave in Paris right now". That's a true story. So "she" refers to "my wife", and that's co-reference resolution. We had this first library that was built on top of spaCy, which is a great NLP library. It had some success, but my point is that pretty much from the start, we were interested in publishing stuff as open-source because we felt that it was a good way of making sure that what we were building was useful to others as well. In 2018, when the BERT paper came out, Thomas Wolf took a week where he and a few other people from the team didn't sleep much and ported the original TensorFlow implementation to PyTorch, and we found out that a lot of people were looking for a nice, clean implementation of BERT in PyTorch. Pretty much over the course of a few weeks, we felt like, oh wow, this is really useful for a lot of users.
That was the starting point of our focusing more and more on doing everything in the open by default, trying to make it as easy as possible for anyone to access the state-of-the-art models published by the big tech companies and the best universities in the world, which release code or models that are not necessarily optimized for actual usage by anyone. So, we really felt like it was super useful for us to start doing that. The adoption has been great.

00:12:28

Dean: That's really impressive. First, it's crazy how both computer vision and then NLP had this one transformative model. You always say science moves forward in small steps, but then you have this game-changing BERT, which is really crazy, and I did not know that you implemented it so quickly after it came out. That's also very impressive.

00:12:58

Dean: This is an interesting point because you're sort of combining two things. On one hand, there's the community, which a lot of times is people that don't have access to the resources that the big players like Google, who created BERT, have. And then you need this balance of working with the code and the things that are created in the larger organizations, and building them out for humans and not just for cutting-edge research groups. Today, if there's one thing that seems a constant trend, it's that the best models seem to be bigger and bigger, which also means that they require more resources. So, how do you balance this? On one hand, trying to reproduce and host the best models, and on the other hand, still wanting to be relevant to the community. And also, of course, now that you have this huge community, I'm guessing that the ideal situation is that more and more would come from the community. So how do you make that accessible to them conveniently?

00:14:20

Julien: On the research side, we do have a really good research team inside HuggingFace as well, and they collaborate a lot with the best research labs in private companies and academia as well. Models are getting bigger and bigger on the science side. On the production side, what's really cool is this: basically, in 2018-2019, those big models came out, but they were still not widely used in production workloads in companies. What's been happening since 2020 is that a lot of use cases are now starting to be addressed by deep learning models, something that we didn't really see before. A lot of companies had machine learning or data science teams, but in production they had less modern machine learning models, and it was a struggle for their machine learning models to move from experimentation to production. I feel like this changed a lot over the past year, basically, and now a ton of companies are actually using those kinds of large models, maybe not the largest ones, right? But BERT-sized models in production, and so it's been amazing because a lot of those companies are really seeing benefits of using those kinds of models in production versus what they were doing before. So I feel like the model sizes are increasing on the production side and also obviously increasing on the research side. Obviously, almost no company is using ten-billion-parameter models in production, except for the biggest tech companies, maybe, but there's kind of a trend to move to bigger and bigger models that we are happy to support. So we started monetization: we have some customers who are those types of companies looking to deploy deep learning models to production, and it's been super interesting working with them on helping them unlock the potential of those modern models in their use cases.

00:17:40

Dean: I think, for me, the moment where I said, something here has changed with respect to bringing these huge models to production, was at the beginning of 2020, sometime around then, when Google announced that they were going to apply BERT to their search. That is a product that probably every human that has access to the internet is using, and it's going to have this very big model behind it. So to me, that was a watershed moment. You mentioned that you're seeing this across the industry, but I'm wondering: aside from HuggingFace, which has obviously had its share of contribution to this, why do you think it's happening now? What's changed so that more companies can afford to put, or are actually putting, these large models into production?

00:18:31

Julien: Yeah. I think it's not just us, but a ton of companies are providing services and tooling around machine learning, MLOps; it's what you guys are doing as well at DAGsHub, and other companies as well. I feel like it's easier and easier, and we are striving to democratize access to those kinds of models as well. It's becoming easier and easier to experiment and then it's becoming easier and easier to deploy to production as well. There's a lot of value in the intersection of software engineering and machine learning, something that we've believed for a really long time. Take BERT: if you run BERT out of the box, you're going to already have decent performance. If you're really looking for scalability, low latency, and low cost per inference, there's an array of hardware and software optimizations that you can use to make it super efficient at scale. We actually help a lot of companies on exactly this kind of workload, like scaling BERT-type inference. We managed to perform inference on BERT on the scale of a few milliseconds. So super-low latency. If you manage to achieve a cost-effective way of reaching inference times of a few milliseconds at scale on BERT-sized models, there's pretty much no reason not to put them into production, because it's going to be super cheap. Basically, we have some customers who are performing tens of millions of requests per hour, like in the messaging space or stuff like that, if you want to classify every message: spam detection, bullying detection, stuff like that. You want to feed every message that you get on your platform into a BERT model. If it takes just a few milliseconds and you're able to do it on a few CPU machines, especially, it's going to be a no-brainer to deploy those kinds of models because they are going to perform so much better than custom models that you train from scratch, which was the paradigm that you had before.
The types of model that I was mentioning before, models that people had in production two years ago, were pretty much trained from scratch on small annotated datasets that were custom-built for those companies, whereas in the current transfer learning paradigm, you take a large model that's been pre-trained on a non-annotated dataset and then you fine-tune it, and you gain sample efficiency: you need way fewer annotated samples to reach the same or better performance on your end use case, which is awesome. It's a game-changer in machine learning.
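
To see why inference times of a few milliseconds change the economics, here is a rough back-of-the-envelope sketch. The workload (30 million requests per hour), the latencies, and the 70% target utilization are illustrative assumptions, not figures from the interview, and the model is deliberately simple: each request is assumed to occupy one CPU core for the full latency.

```python
import math


def cores_needed(requests_per_hour: float, latency_ms: float,
                 utilization: float = 0.7) -> int:
    """Estimate how many CPU cores a workload keeps busy.

    Assumes each request occupies one core for `latency_ms`, and that
    cores should only be loaded up to `utilization` of their capacity.
    """
    requests_per_second = requests_per_hour / 3600
    # Core-seconds of work arriving every second.
    busy_cores = requests_per_second * latency_ms / 1000
    return math.ceil(busy_cores / utilization)


# Hypothetical messaging workload: 30 million messages per hour.
for latency_ms in (5, 50):
    print(latency_ms, "ms ->", cores_needed(30_000_000, latency_ms), "cores")
```

Under these assumptions, 5 ms per request works out to 60 cores and 50 ms to 596, which is the difference between a few CPU machines and a sizeable fleet.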

00:22:39

Dean: I was trying to explain to a friend what fine-tuning a model means at a high level. It's like saying that with the old models, you had to bring all the knowledge to the model yourself, and that's a lot of hard work, and today you can bring sort of this common knowledge, this trivia knowledge, from outside and then just teach the finer details of what you want the model to learn... So I agree, this is probably one of the largest game-changers that we had, and this is across the board; it's now being used also in other domains, even though it started mostly in NLP, or at least we saw the returns more from NLP. You have a very unique situation. Maybe a lot of people are envious of you, not only because you have such an amazing and exciting community, but also because you get the chance to work on so many models in production. A lot of times we speak to companies that have, as you say, a large data science team but one model in production. And then there's your model hub, where users can experiment with models easily online, which is really awesome. If you know someone non-technical and you want them to be excited about machine learning, probably the easiest way I know right now is to go on your model hub, choose some sample, and just let them type in something. I really think it's a magical experience. You can use state-of-the-art models to predict on your own inputs, which is really cool. But it also means that you have a ton of models in production, in real time, with a ton of requests and everything, which I think few companies successfully manage. So I'm interested to hear if you can share how you do it and what challenges you've overcome along the way in doing this.

00:24:38

Julien: The first point is that those models are pretty similar; most of them are transformer models. Obviously, they share a lot of architectural stuff. So if we manage to optimize some of those, we can pretty much transfer that knowledge to other models as well. A lot of the optimization that we do to run models in production for inference is horizontal and can be applied to all models at once, which is awesome. So we have this deep knowledge of the models themselves and we can apply this knowledge. We have a super-strong open-source team, but we also have a stronger and stronger machine learning optimization team, people who are good at diving into the internals of the machine learning frameworks and even deeper into the internals of computing hardware devices like CPUs or GPUs. We try to optimize those models as much as we can across the board, and then it's standard cloud computing architecture. So stuff like Kubernetes, where we launch models on the fly; we don't keep all of those models live. We have more than 10,000 models publicly available on our model hub right now. Not all of them are live in the inference API all the time. Some of them are, because users are using them in production use cases, for instance, so we pin them to be always on. But some of them, the ones that, for instance, you try every once in a while when you play with the inference widgets on the model page, have to be loaded on the fly. That way, we keep the cost of running this infrastructure reasonable. There are a lot of things that we still want to improve on that subject. But the gist of it is using modern cloud computing infrastructure to deploy models on the fly, depending on different sets of constraints.
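
The "pinned versus loaded on the fly" idea can be sketched in a single process as a small cache. This is only an illustration of the policy, not HuggingFace's actual infrastructure (which, per the interview, runs on Kubernetes); the class name `ModelCache`, the `loader` callback, and the model ids are all hypothetical.

```python
from collections import OrderedDict
from typing import Callable, Iterable


class ModelCache:
    """Keep pinned models always loaded; load the rest on demand.

    At most `capacity` unpinned models stay in memory, and the least
    recently used one is evicted when that limit is exceeded.
    """

    def __init__(self, loader: Callable[[str], object], capacity: int,
                 pinned: Iterable[str] = ()):
        self._loader = loader
        self._capacity = capacity
        self._pinned_ids = set(pinned)
        self._pinned = {}           # always-on models, never evicted
        self._lru = OrderedDict()   # on-demand models, oldest first

    def get(self, model_id: str) -> object:
        if model_id in self._pinned_ids:
            if model_id not in self._pinned:
                self._pinned[model_id] = self._loader(model_id)
            return self._pinned[model_id]
        if model_id in self._lru:
            self._lru.move_to_end(model_id)  # mark as recently used
            return self._lru[model_id]
        model = self._loader(model_id)       # cold start: load on the fly
        self._lru[model_id] = model
        if len(self._lru) > self._capacity:
            self._lru.popitem(last=False)    # evict least recently used
        return model

    def loaded(self) -> list:
        return list(self._pinned) + list(self._lru)


# Usage with a fake loader that just records what was loaded.
loads = []


def fake_loader(model_id: str) -> str:
    loads.append(model_id)
    return f"<model {model_id}>"


cache = ModelCache(fake_loader, capacity=2, pinned={"prod-spam-filter"})
for model_id in ["prod-spam-filter", "demo-a", "demo-b",
                 "demo-a", "demo-c", "demo-b"]:
    cache.get(model_id)
print(loads)
print(cache.loaded())
```

In this run the pinned model is loaded once and never evicted, while `demo-b` is loaded twice because it was evicted to make room for `demo-c`. That second load is the cold-start cost being traded for lower memory use.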

00:27:40

Dean: It sounds very impressive. Maybe I'll ask a more specific question. If you had to choose one, what would be the biggest challenge that you've conquered with respect to machine learning in production at HuggingFace?

00:28:14

Julien: If anyone is watching and has a lot of production constraints related to low latency or the volume of requests, feel free to get in touch. We have this product that's in beta, that companies will be able to deploy on-prem if they need to. So, basically on their own hardware, and it's super optimized. That's the system that we are using to get to sub-millisecond inference times at scale on BERT-sized models. Without diving too much into the details of it, it's building on top of a lot of great work from cutting-edge hardware manufacturers, like Nvidia, Intel, and others. There is this kind of standard serialization format for machine learning models, which is called ONNX, and then there is a lot of really great work happening right now on efficient ways of running ONNX models at scale. For anyone who wouldn't be too familiar with ONNX and inference optimizations, I can maybe give one example, which is pretty cool. It's called operator fusion. Basically, you have this graph of a very large deep learning model with a sequence of operations. For instance, matrix multiplication followed by softmax, or an activation function. If you know that you're going to optimize your model for production on specific hardware, there are some tools right now that are going to take this static graph and compile it, in a way, fusing some of the operations to run them as one single hardware operation. For instance, matrix multiplication and then an activation function: if you do it sequentially, it's going to be slow. If you have access to hardware that can blend those operations into one single instruction, you should definitely do it, and those tools are capable of doing that. It means that each time you deploy your model to specific hardware, the model is going to be optimized to run super efficiently for the instruction set that you have on the hardware.
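
As a purely conceptual illustration of operator fusion, the sketch below computes a matrix multiplication followed by a ReLU activation in two ways. Real toolchains fuse operators at the graph and kernel level for specific hardware; this pure-Python version only shows the idea of skipping the intermediate result and the second pass, and the function names are made up for the example.

```python
def relu(x: float) -> float:
    return x if x > 0.0 else 0.0


def matmul(a, b):
    """Plain matrix multiplication on lists of lists."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner))
             for j in range(cols)] for row in a]


def matmul_then_relu(a, b):
    """Unfused: materialize the full product, then make a second pass."""
    product = matmul(a, b)                               # intermediate matrix
    return [[relu(v) for v in row] for row in product]   # second pass


def fused_matmul_relu(a, b):
    """Fused: apply the activation as each output element is produced."""
    inner, cols = len(b), len(b[0])
    return [[relu(sum(row[k] * b[k][j] for k in range(inner)))
             for j in range(cols)] for row in a]


a = [[1.0, -2.0], [3.0, 4.0]]
b = [[1.0, 0.0], [2.0, -1.0]]
print(matmul_then_relu(a, b))
print(fused_matmul_relu(a, b))
```

Both functions return the same values. The fused version never stores the intermediate product, which is the saving that hardware-level fusion exploits on a much larger scale.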

00:31:34

Dean: That sounds crazy awesome and I'm intrigued to learn more. The personal touch here is, I'm from Israel, and we have Habana, which was an Israeli company that was acquired by Intel not long ago, for a lot of money. They are doing very interesting things in that area, though it's very hush-hush. I'm sure some of the people working there will be listening to this. So, shout out. One question which is specifically interesting for me is, there is this hardware challenge, or not hardware, but let's say the ops challenge, of transitioning a model from what the data scientist was working on into a production environment, and there are tools like DAGsHub or others that you mentioned that help in doing that process, but I think that there's a huge part that workflows play in this, and I'm interested to hear what the transition from research to production looks like at HuggingFace. So if I'm a data scientist working in the research team, and I have this new, cool thing that I've created: what do I now do until the point that it's in production and the end-users can use it?

00:32:01

Julien: I think it depends a lot on the companies or the organizations. What we've seen more and more is that the machine learning team or data science team inside the company is pretty sophisticated. They know a lot about machine learning, and then they are in touch with the production team or the DevOps/infra team at their companies, which is usually at least an order of magnitude larger than the machine learning team. But what we're seeing is that they manage to talk more and more. It's easier and easier for infrastructure people, software developers, software engineers, to grasp the fundamentals of ML. It's also easier and easier for data scientists, machine learning engineers, to grasp a little bit about production. So I feel like, in the companies that we see, it's starting to be easier and easier for bridges to be built inside the organizations between software engineering, infra, and machine learning.

00:34:43

Dean: I tend to agree. I think we're probably going to meet somewhere in the middle. Now, another specific question about this: do you guys use notebooks in research, and if you do, when do you transition from notebooks into Python modules? I know that's a recurring question.

00:35:06

Julien: I think it depends really on the individual. I think a lot of tooling is being built that makes it easier to switch from notebook to code, so that's great. Personally, I don't use a lot of notebooks. I basically pop open a Python debugger, VSCode, or PyCharm, and pretty much dive into the code right away, but I realize that not everyone has the same workflow. I feel like it's not too hard to move some code out of a notebook and into modules. There are also some tools that you can use to build even prototypes and web demos, like Streamlit and Gradio. Gradio is super awesome, in my opinion. It's super simple, but it does the job perfectly. You basically specify the input types of a model and the output types of a model, and it builds this super simple interface for it. It's pretty easy as well to spawn a Starlette server or Flask server and expose a super simple HTTP endpoint on top of your model. So I see more and more data scientists and machine learning engineers doing those types of quick prototypes, which is great, because that way, way more people, way more stakeholders inside the organizations can give feedback super early on. It's not like, okay, we're going to spend six months building a model, and in six months we're just going to hand off this model to the production team and they will have no idea what it does, no idea what its limitations are. They will be completely clueless, and then if you're lucky, like six months later, maybe it will be deployed. More often than not, it won't. So I feel like there are so many super cool machine learning tools right now, and every one of these tools is useful because it makes it easier to communicate about what we are doing as machine learning engineers, communicate with other stakeholders, and build machine learning as part of an organization rather than in our own silo, in a way.
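
A quick prototype endpoint of the kind described here can be very small. The sketch below uses Python's standard-library `http.server` instead of Starlette or Flask so that it runs without extra dependencies; the `predict` function is a placeholder standing in for a real model, and the JSON request shape (`{"text": ...}`) is an assumption made for the example.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(text: str) -> dict:
    """Placeholder model: flags messages containing a trigger phrase."""
    is_spam = "free money" in text.lower()
    return {"label": "spam" if is_spam else "ok"}


def handle_body(raw_body: bytes) -> bytes:
    """Decode a JSON request, run the model, encode the JSON response."""
    payload = json.loads(raw_body)
    return json.dumps(predict(payload["text"])).encode("utf-8")


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        response = handle_body(self.rfile.read(length))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(response)))
        self.end_headers()
        self.wfile.write(response)


def make_server(port: int = 8000) -> HTTPServer:
    """Build the server; call .serve_forever() on the result to run it."""
    return HTTPServer(("127.0.0.1", port), PredictHandler)
```

Running `make_server().serve_forever()` and posting `{"text": "Free money inside"}` to `http://127.0.0.1:8000/` should return a small JSON label. Keeping `handle_body` separate from the handler class means the model logic can be checked without starting a server.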

00:38:14

Dean: Connecting it to understanding that machine learning is a part of the product and not living in a separate world is, I think, very important. So I'll ask a few, maybe higher-level questions, and then we'll get to some questions from the community. The first one is: what do you think is the most interesting topic in machine learning right now? When you guys started, maybe, NLP and Transformers were the hottest things. But what do you think is the hottest topic at the moment, and maybe, also, specifically, what do you think is the most interesting topic in machine learning in production, and why?

00:38:55

Julien: I'm super passionate about the intersection of software engineering and machine learning. So I'm super excited about all the different things that are happening around MLOps and ML tooling. I feel like we're only at the start of what we can achieve. The goal, in my mind, is really to unleash the potential of the state-of-the-art research that's happening and make research 10x faster and make deployment to production 100x faster. If machine learning tooling manages to achieve those kinds of orders of magnitude, I feel like machine learning is going to have a huge impact and is going to be deployed pretty much everywhere.

00:40:17

Dean: And for machine learning, as in, more the research side, is there any topic that you're specifically excited about? Reinforcement learning or huge models like GPT-3?

00:40:38

Julien: I'm not that excited about super large models because, I mean, they're interesting in the way that they stress test the infrastructure, but nobody is really going to use them out of the box in production, so I'm not super excited about this research avenue of going bigger and bigger for the sake of going bigger and bigger. Obviously, we've seen super large models in the past year. What I am super excited about is this initiative that we are part of, which is codenamed Big Science, where we have this plan of building a super large language model and training it ourselves, but in the open, as part of an open, wide-reaching collaboration with the largest possible number of institutions like companies and universities. So, I'm really excited because so far, the largest models have each been built by one specific company. Researchers working on subjects such as biases, AI ethics, and efficiency, energy efficiency as well, haven't been able to really dive into those models because they are pretty much the models of specific companies like OpenAI or Google. So this Big Science initiative is super exciting because it's super open, and so, yes, one of the goals is, in the end, to have as an artifact a super large model, but it's also the process of getting there that's interesting, and that's open. So, the data collection: how can we build everything from scratch to make it more ethical, so not just Reddit, for instance, because of course you're going to have a data quality issue if you just take Reddit as an input. How can we balance the datasets? How can we make sure that every language is going to be represented? Those are super interesting questions, and it's going to be a long-term project; it's going to be a year or maybe a year-and-a-half-long project.
I'm really looking forward to the initial results from this project – Thomas Wolf, who has a personal background in physics, has been describing this collaboration as the CERN of machine learning. Super open collaboration. We have high ambitions of having a super large impact at the end.

00:44:34

Dean: That is something I can get behind. On a personal level, when we started DAGsHub, part of the origin story for us was that we wanted to make it easy to do open-source data science, which is basically what you described, where the whole process of training the model is a community effort and people can come in from the outside and take part, and it's not constrained to the labs of top organizations. So, I empathize with everything you said and I need to read more about this and see how we can help, but this sounds, I agree, very exciting. So learning is something important to probably everyone in the world, but also, specifically, when you're working in a cutting-edge field like machine learning. On a broader level, and it doesn't have to be related specifically to machine learning or to what you're doing at HuggingFace: what things have you learned recently, or are you currently learning?

00:45:56

Julien: HuggingFace is a company, obviously. Of course, it's a community-driven project as well, but we still raise money and have this goal of building products that are going to be useful enough for people so that we can monetize them and sustain the whole open-source, open-science approach to machine learning. So as part of that, we are in the process of growing the team, and it's super interesting. Honestly, it's been one of the most interesting things I've ever been learning: how do you manage to grow a team in a balanced way, in a way that works well for personal productivity, keeping everyone as productive as possible, but also keeps everyone aligned on this goal of basically building the place on the web where the whole ML community gets together to build the future of AI. Trying to keep this balance of personal productivity for everyone on the team, as well as being more and more aligned on the long-term vision of making something great and super impactful in the open, is super interesting to learn about. None of us is super experienced at building large teams, so it's a learning experience and it's super interesting. What about you? What have you learned recently? That's a kind of open-ended question, which is a great question.

00:48:03

Dean: I relate to what you're saying. We're also growing the team and that's always... there's this saying that hiring is the hardest part of building a company, and it's hard to say if that's true because there are a lot of hard parts of building a company, but it's definitely one of the harder parts, and we spend a lot of time trying to think about exactly what you said, which is how do we grow the team while maintaining the culture and the vision and having everyone very focused on what we want to achieve. One thing I'm reading is the history of Netflix, which is very interesting from an entrepreneurial perspective. The book is called That Will Never Work and it's written by the first CEO of Netflix. It's super interesting because he's giving you the firsthand view of a very early stage of a company that now everyone knows, but at the time it was struggling like any other startup. Hindsight is 20/20, but I think it gives a lot of good points of view on the process. My favorite quote from the book so far is actually a quote from another book, which I don't remember who wrote, which is: "nobody knows anything". The point in the context of entrepreneurship (but this also applies to machine learning) is that a lot of times, the development that will change the world or make your company successful is exactly the development that you thought doesn't make sense and would never work. So, his claim, which I tend to agree with, is that the key capability for success is not having good ideas, but just experimenting with as many ideas as possible, because you won't know which ideas are good and which are bad, and if you experiment with more ideas you grow your chances of landing a good one. So that's one thing that I've been thinking about a lot recently. We actually want to print this on our office wall because I think it's a good reminder to be humble and I really emphasize it.
The other thing is, both of our companies care about communities of technical people, data scientists, machine learning engineers, developers, and we're learning a lot all the time both from the community, like interacting, but also, in a more academic way, about what's the best way to do that. Like, how do you provide value to these communities and everything? So that's also something that is on my mind and I'm actively learning about.

00:51:16

Dean: So now, I'll ask you a controversial question. Tell me something true but that few people agree with, in either the fields of machine learning, MLOps, or other fields you're interested in.

00:51:45

Julien: Maybe it's more on the business-building side than on the machine learning and technical side. But there is this view that it's hard to build a large company on open-source. The examples of successful companies in open-source tend to be pretty much all in the infrastructure space, right? Like databases, search engines, or even lower-level stuff like deployment platforms, deployment systems, stuff like that. There is no super-successful open-source machine learning company yet because this field is still recent. What we strongly believe, but it's not super obvious yet, is that the intersection of open source, open science, and machine learning is going to be incredibly successful and it's going to bring to life a new generation of open-source-type companies that are going to be nothing like the previous generation of open source companies. It's a totally different beast, but I think that those companies are going to be super successful. It's maybe not controversial, but it's a strong conviction that probably not everyone agrees with.

00:53:51

Dean: Fair enough. I remember the stories of the early open-source companies when they would go to raise a round or talk to customers and the response would be like, "What, are you crazy? You're putting the core IP of your company out in the open, someone will steal it from you and then you won't have a company anymore." I think we're beyond that point right now, but I agree with you that machine learning open source is very different from software open source, and definitely infrastructure open source. So there are still things that remain to be proven, but I agree with your controversial, or maybe semi-controversial, statement.

00:54:37

Julien: Yeah, so maybe just adding to this point. Machine learning is probably unique in the sense that, when you deploy models in production, for training or inference, it's super expensive, right? The compute is expensive, training models costs a lot of money. So I feel like if we improve the tooling it's going to be more accessible, it's going to be less and less expensive over time, especially compared to the impact that it's going to have. But at the same time, more and more companies are going to deploy machine learning, and the aggregated sum of computing costs involved in machine learning is probably going to go up. It's definitely going to go up and it's massive, right? So those kinds of machine learning tooling companies have this advantage compared to old open-source companies in that there is an inherent stream of infrastructure that's pretty seamlessly integrated with machine learning. You can't do machine learning without significant infrastructure. This is going to be my point, that this is going to be massive.

00:56:16

Dean: I agree with that as well. I'm one of the few people. I'll open it up to community questions. We'll try to get through all of them, though I'm not sure how much time you have, so we'll get through what we can, and I apologize in advance if we don't get to your question. I'll start with a question from Matthew. He's the CEO of a company called Yokai, which is also a French company, and he asked me to say that they're hiring, so if you're looking for a job working in a really interesting machine learning company, you should reach out to him. His question, which I'm really interested in hearing your answer to, is: you've had huge success at leveraging research teams. I think that puts you in a pretty unique place. There are other very successful companies with research teams, but most of them are significantly bigger, like Google, and companies like that aren't a good example for anything. So, what are your tips for organizing research teams in an industry setting?

00:57:43

Julien: I would probably give two answers. The first one is the obvious one, just human interest. By default, people are really interested in what other people are doing. So if you manage to build an organization or to foster an organization where people have a way to reach out to other people in other teams, you'll be able to bridge that gap between research and production in the end. Because software engineers are more and more interested in machine learning, even including the research side of it, and research scientists are more and more interested in actually deploying those models in the right way. So just organization stuff. And the second part is, a lot of what we are doing is actually API design: spending a lot of time thinking, as part of open groups, about how to best represent, for instance, the inputs and the outputs of a model, not for one specific model, but in a way that's applicable to whole ranges of models, problems, or use cases at the same time. So if you can emphasize that API design, and design in general, is an important part of user experience and developer experience, it's going to make your life way easier in the long term. It's easier for people to spend some time acknowledging that they need to think about that kind of subject before jumping in and coding.

01:00:06

Dean: Thanks for the tips, this is useful. The second question is from Yosef: regarding accelerated inference, and specifically for GPT Neo, the 2.7 billion parameter model. There's currently no way to use loss functions. Is this something that you plan to release in the future?

01:00:30

Julien: Yes, we actually talked about this right before the call. The inference API that we provide is basically static versions of models. So you don't train those models, it's a different product, so you don't have a way of specifying a loss function. We do have this product called Auto NLP, where you can fine-tune the model. For instance, if you want to start from one specific checkpoint of GPT Neo and fine-tune it, you can do it, and there are a lot of hyperparameters that you can specify. By default, Auto NLP is going to do a hyperparameter search and pick the best-performing ones, but there's a lot of flexibility so you can pretty much tweak what you need to tweak.
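To make the idea of "do a hyperparameter search and pick the best-performing ones" concrete, here is a minimal grid-search sketch. It is not how Auto NLP works internally (the conversation doesn't say); the `validation_score` function, its formula, and the search space are all invented stand-ins for "fine-tune a checkpoint, then evaluate it on held-out data."

```python
from itertools import product


def validation_score(learning_rate, epochs):
    """Hypothetical stand-in for 'fine-tune, then evaluate on held-out data'."""
    # Invented formula that peaks at learning_rate=3e-5, epochs=3.
    return 1.0 - abs(learning_rate - 3e-5) * 1e4 - abs(epochs - 3) * 0.05


# Hypothetical search space; a real one would be chosen per task.
search_space = {
    "learning_rate": [1e-5, 3e-5, 5e-5],
    "epochs": [2, 3, 4],
}


def grid_search(space, score_fn):
    """Try every combination of values and keep the best-scoring one."""
    names = list(space)
    best_params, best_score = None, float("-inf")
    for values in product(*(space[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


best_params, best_score = grid_search(search_space, validation_score)
print(best_params, best_score)
```

In practice each call to the scoring function is a full training run, which is why a managed service that runs the search for you is attractive.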

01:01:34

Dean: So Yosef is covered: you need to use Auto NLP. We have the first question from Uri Goren, who is, among other things, an NLP guy. What is the most common use of HuggingFace? Is it information retrieval like SentenceBERT, classification, generation?

01:02:05

Julien: On the research side, I would say all of them. On the production side, I would say definitely classification. A lot of actual use cases in organizations are around document classification, or token classification to do information extraction. OpenAI was kind of saying that even a classification problem can be solved by using a generation model, which is technically true, but we found that in most organizations right now it's not an efficient way of solving a classification problem. The efficient way of solving a classification problem is to just fine-tune a classification model, and the majority of use cases in the industry are classification problems. I don't know if you guys saw the same thing on the subset of DAGsHub users who are training NLP models. But for me, it's clear on the usage side.

01:03:31

Dean: I feel like text generation is an interesting problem. There are a lot of aspects to it that are compelling, especially after GPT-3 came out and a lot of people showed demos of stuff that it can do. It looks very compelling, but then the devil is in the details. The question now is how efficient is it in production, and can you actually use it for real-world stuff? I would definitely say that we probably have much less experience than you guys have in this, but classification also seems to us more common, and then information retrieval, and then I would say generation is probably still last, but that might change. It would be interesting to see trends. Maybe it's the least common, but it's trending upwards very fast, so I don't know. The other question from Uri was: text generation requires a search algorithm. An example he gave was beam search, and many use cases require a constrained search, like Text2SQL or something like that. Is the search algorithm something that you're planning on integrating into the Transformers library?

01:04:49

Julien: In general, the code base is super open to any contribution, and especially on the generation side, I feel like we have a lot of super-talented people who have improved the codebase over time. So definitely.
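For readers unfamiliar with the terms in Uri's question, here is a small sketch of beam search with an optional constraint, in plain Python. It is not the Transformers library's implementation; the toy next-token table, its probabilities, and the `read_only` constraint are all made up for illustration.

```python
import math

# Toy next-token model: maps the last token to {next_token: probability}.
# These tokens and probabilities are invented for illustration only.
NEXT_TOKEN_PROBS = {
    "<s>": {"SELECT": 0.6, "DROP": 0.4},
    "SELECT": {"name": 0.5, "*": 0.5},
    "DROP": {"TABLE": 1.0},
    "name": {"FROM": 1.0},
    "*": {"FROM": 1.0},
    "TABLE": {"users": 1.0},
    "FROM": {"users": 0.7, "orders": 0.3},
    "users": {"</s>": 1.0},
    "orders": {"</s>": 1.0},
}


def beam_search(beam_width, max_len, allowed=lambda seq, token: True):
    """Return finished sequences sorted by log-probability, best first."""
    beams = [(0.0, ["<s>"])]  # (log-probability, tokens so far)
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            for token, prob in NEXT_TOKEN_PROBS.get(seq[-1], {}).items():
                if not allowed(seq, token):
                    continue  # constraint: prune this continuation
                candidates.append((score + math.log(prob), seq + [token]))
        # Keep only the `beam_width` best candidates at each step.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, seq in candidates[:beam_width]:
            (finished if seq[-1] == "</s>" else beams).append((score, seq))
        if not beams:
            break
    finished.sort(key=lambda c: c[0], reverse=True)
    return finished


def read_only(seq, token):
    """Hypothetical constraint: never emit a DROP statement."""
    return token != "DROP"


unconstrained = beam_search(beam_width=2, max_len=6)
constrained = beam_search(beam_width=2, max_len=6, allowed=read_only)
best_unconstrained = " ".join(unconstrained[0][1][1:-1])
best_constrained = " ".join(constrained[0][1][1:-1])
print(best_unconstrained)
print(best_constrained)
```

The constraint is just a predicate applied while expanding each beam, which is the basic shape of constrained decoding for something like Text2SQL: continuations that would break the rules are pruned before they are ever scored against the others.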

01:05:14

Dean: This next question is from Yonatan: are you planning on adding integrated vision and language models into the Transformers library? I guess the answer you just gave a moment ago also applies here, but I think he's talking about things like DALL-E where you have multiple inputs and outputs and stuff like that.

01:05:38

Julien: I think multimodal models are super cool. I think they are also going to have some really interesting use cases outside of research super quickly.

01:05:55

Dean: Personally, I was more impressed by DALL-E than I was by GPT-3. It looks more practical, I would say. But again, this is me personally. The next question is from Imry, who is asking about the business side. What is your long-term business model? Is it solving business problems for large clients or becoming a platform for machine learning?

01:06:30

Julien: It's definitely being a platform for machine learning. Our goal is to build tools and we're still super early. We realize that, for instance, our hub is still super early. We have a ton of things that we want to do, that we want to build, but we are maybe two percent of the way there. But the goal is to provide products that are not only for a few companies but are pretty much for everyone who is doing machine learning. If we can get there by doing some partnerships with larger companies first, giving us the space, exploration time, or credits to build something great on the platform side, it's awesome. The ultimate goal is to build something that's going to be useful for everyone doing machine learning. We expect that machine learning is going to be as big as or bigger than software engineering in 5 years. You won't do any software engineering without using some machine learning at least some of the time, right? It's already the case, but it's going to be clearer and clearer, I feel like.

01:08:20

Dean: I think I agree, and this is also maybe a good answer for the "something true that few people agree with" question.

01:08:27

Julien: If we all manage to make machine learning more accessible than software engineering, it's going to be even bigger, right? For some of those models, you can pretty much develop an intuition of how they work just by giving them some inputs. If you make it super easy to use, it's going to actually lower the barrier to entry to software engineering, which is still something that you need to devote a lot of time to. To be a good software engineer today, you need to spend a few years and also have the support system to enable you to grow as a software engineer. I feel like maybe machine learning is going to be one way to make software engineering more accessible. So anyone can start building stuff no matter where they are or what academic background they have, and that would be super useful. I'm going on a tangent here, but it's kind of contrary to this opinion that you see in a lot of Silicon Valley-style blog posts that 1% of the population is going to be machine learning builders, AI builders, and 99% of the population of the world is going to be consumers of technology produced by AI. I feel like it's absolutely not the future that we want. It should be a hundred percent of the population that knows how to build stuff with AI. That would be a much better future to look forward to.

01:10:30

Dean: I think the optimistic version of that is: you have computer lessons in grade school today, you would have ML or AI lessons in the future, and you would still have research and cutting-edge models that are maybe less accessible, but over time, things will percolate down and be more accessible to the wider community. This is actually a good segue to the next question, which is from another Uri, Uri Eliabayev, who's one of the founders of a really big machine learning community in Israel. He's asking: what do you think about automation and simplification of the model creation process? Do you think we'll see a Wix for machine learning models that will have wide adoption and recognition? Maybe Auto NLP is a step towards that.

01:11:27

Julien: Hopefully. In general, I think we are going to see more and more of those tools. I think they are going to be useful in making it more accessible. Not just fine-tuning a model, by the way, but also developing this intuition of how it works, what its limitations are... every machine learning model has a set of limitations. That's why explainability visualizations are so important. If you make it as accessible as possible for anyone to develop the intuition that a machine learning model is a function approximator trained on a specific space of inputs, you pretty much make it intuitively known to anyone what machine learning actually is. And it's never going to be a silver bullet. It's always going to be an approximation trained on a super-specific set of inputs.
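The "function approximator trained on a specific space of inputs" intuition can be seen in a few lines of Python. This is a toy sketch, not anything discussed in the episode: a straight line is fitted by least squares to y = x² on inputs between 0 and 1, then asked about an input far outside that range. The function names and the choice of target function are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b


def true_function(x):
    return x * x  # the "real world" the model tries to approximate


# Training data only covers inputs in [0, 1].
train_xs = [i / 10 for i in range(11)]
train_ys = [true_function(x) for x in train_xs]
a, b = fit_line(train_xs, train_ys)


def model(x):
    return a * x + b


def abs_error(x):
    return abs(model(x) - true_function(x))


in_range_error = max(abs_error(x) for x in train_xs)
out_of_range_error = abs_error(5.0)
print(in_range_error, out_of_range_error)
```

The worst error on the training range stays small, while the error at x = 5 is larger by roughly two orders of magnitude: the model is a reasonable approximation only on the kind of inputs it was trained on.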

01:13:03

Dean: The last question we have from the community is from Shachar. You might have somewhat answered this, but if you have anything to add: how do you plan to support real-time inference? Do you have an inference package to support the distilled or smaller models? You touched upon this with the acceleration and customization, but...

01:13:31

Julien: Definitely. We have this product in private beta. Anyone should feel free to get in touch if they want to beta test it. We are looking for a lot of beta testers. Without diving too much into the details, it's built on top of software such as Triton, which is a pretty cool ONNX-optimized runtime. It's awesome. I feel like there are going to be so many breakthroughs in constrained production, super-low latency, or super-high-volume inference. So those are super exciting subjects.

01:14:37

Dean: Thank you for all of those questions. I'm sure everyone will be glad that they got a chance, and let's end with recommendations. It doesn't have to be data science or machine learning-related, but anything you would recommend to the audience.

01:15:00

Julien: Pointing to something that we released recently, the HuggingFace course. We have two tremendous team members on the open-source team, who have been doing awesome work. They're doing a course and stuff like that. You probably know that it's way more work than you would expect from the outside. So they've been tirelessly working on this course for the past few months. I think it's really good. I would love some feedback from anyone interested in giving it. It's also going to be community-driven, community-centric, so we're super interested in getting feedback from people on the course, and trying to help in any way we can.

01:16:13

Dean: I got the launch email for the course, and I was excited. It looks really cool, and I'll definitely be recommending it to friends.

01:16:22

Julien: I think they've done a great job at explaining some of this stuff in a more accessible way than the documentation or diving into the code. I think it's going to be awesome.

01:16:43

Dean: I'm looking forward to looking at it myself. I'll make two recommendations. One is non-technical. I'm not sure whether it's good or not, but I just finished watching a series on Netflix called Sweet Tooth, which is based on a DC comic. I enjoyed it. I'm not sure if it's high quality or not but I liked it, so maybe someone from the audience who's also a nerd might like it as well. The other thing I would like to share, which is on a technical level, is not yet released, but by the time you're watching this, it will be. DAGsHub is launching an open-source tool called Fast Data Science. The idea is that a lot of the people that we've spoken to are working, obviously, with versioning tools: Git, etc. And if they want to incorporate data and model versioning, they're working with DVC, but a lot of them are having a hard time, either convincing colleagues to use it, because they have to learn a new tool, or because there's a lot of command duplication and things like that. So Fast Data Science, as its name suggests, is supposed to make those things work faster and smoother. The idea is to unify Git and DVC while leaving them as a lower-level solution. So, if you need a complex command, you can always dive one layer deeper and use the regular commands. But it will make your life easier with the most common commands. And so, I'm excited to see what people think about it and how they use it. Those are my recommendations this time. So with that, Julien, thank you for giving us some of your time. I really appreciate it. I really enjoyed the conversation. I learned a ton of things, and even though I thought I knew what to expect, I was surprised and that was fun. So thanks again for your time and hopefully, we can have another one of these conversations in the future.
