
Simplifying Complex Ideas with Yannic Kilcher

MLOps Podcast Apr 18, 2022

In this episode, I'm speaking with the one and only Yannic Kilcher! We talk about sunglasses šŸ˜Ž, the value of taking complex machine learning research and making it accessible and digestible, and the methodologies for doing so. We also discuss reproducibility in machine learning and moving between research and entrepreneurship.

Listen to the Audio

Read the transcription

Dean: Welcome to the MLOps podcast. My name is Dean, your host, and today I have none other than Yannic Kilcher. He has a YouTube channel of the same name that has helped hundreds of thousands of people make sense of cutting-edge machine learning research. He also holds a PhD in artificial intelligence from ETH Zurich, and he is the CTO of DeepJudge, a startup doing NLP for legal documents. As everyone already knows, he is bringing back aviators. How did the sunglasses come to be?

Yannic: It's special. I actually have some early videos where I appeared by myself, but for the paper explanations I would always just do a voiceover, so there was never any face in them, and there still isn't. When I started giving more consideration to appearing in videos myself, and we also started Machine Learning Street Talk around that time, I was a bit worried, because deep fakes were coming up and they were becoming pretty good for personalities with a lot of recordings online. People like Joe Rogan, where you have thousands of hours of just him talking, and people were pretty competent at making very good deep fakes of him. So I thought, to prevent that just a tiny bit, to avoid having thousands of hours of me just talking being online, I'd wear these for privacy reasons. But technology has progressed so much that now it's possible to make the same deep fake from 30 seconds, so it's kind of futile. It has become a branding thing, though; people recognize the glasses. I don't know if it was a smart decision, because when I do live coding now, I have to code with sunglasses on and turn up the brightness.

Dean: I didn't know that it started from an adversarial machine learning idea, that's really interesting. I guess that with the progress of technology, you can't count on anything. Soon you won't even need one photo and they'll reconstruct your face from your voice or something. You could keep swapping out the sunglasses and then they would become your means of authentication: whether a video shows the up-to-date sunglasses tells you if it's a deep fake or not.

Yannic: I'll print a public SSH key across the sunglasses for every new video.

Dean: That's one way to do the verification. Thank you very much for taking the time to speak with me today. You're far from being the most famous Kilcher in the world. A lot of the audience probably doesn't know this, so can you share about it?

Yannic: There's a man called Yule Kilcher, my grandfather's cousin, who emigrated to Alaska in the 30s and 40s. He was a hippie before there were hippies; at that time, these people were generally called "homesteaders". He took his family with him to Alaska and lived off the land there. There is an area called Kachemak Bay, with a town called Homer, which is a couple of hours from Anchorage. What's really cool there is that the bay gives you a bit of protection from the big storms. And they had an earthquake or something like that which laid bare an entire section of coal in the mountainside. So essentially, apart from it being Alaska and stupidly cold, you've got lots of fish, lots of wood, and lots of coal lying around to make fire. For the people who were into this, I believe it was quite an attractive place, and there was lots of land. So they just lived off the land. He became a state senator, and some of his many kids became musicians and toured the US. They're quite famous; Jewel is his granddaughter. Now they have a reality TV show about living in Alaska. It's pretty funny. Kilcher is not a common name, but when you search it, you find lots of the Kilchers in Alaska, and then I'm somewhere at the bottom.

Dean: I actually googled it and realized that there's also someone with your last name who played Pocahontas in a Hollywood movie or something like that.

Yannic: Q'orianka is kind of from the same lineage. They're entertainers and survivors. They're very tough people who are very skilled at many things. Big respects.

Dean: So it's a good family to be a part of. How did you decide to start the YouTube channel?

Yannic: I don't know exactly what led me to have the idea, but I had to read some obscure reinforcement learning papers about how to bring planning into reinforcement learning, in a bit of a different way than AlphaGo does it with Monte Carlo Tree Search, more like how to make the agent learn to plan. So I had to read a bunch of these papers, and for some reason I thought: if someone else has to read them at some point, maybe it would help them if I made a video about it. I saw that there was a lot of machine learning content for beginners on YouTube, like how to get into it, the basics of deep learning, and also classic machine learning, but not a lot of advanced content: not a lot of content that would get you from a bachelor's or master's level at university to the current frontier of research. So I thought I could just start providing that content. Maybe it would help someone. Additionally, it forces me to pay attention to what I read and really understand it, and that helps tremendously, I believe.

Dean: I think that these things always start from a personal need, and then, as you say, a lot of people have this issue of how to understand these complex topics. Was there a moment of change, a single paper whose video you saw take off?

Yannic: I believe I uploaded the first ones in 2017. It didn't really take off until 2019-2020. I made these videos not regularly, but on and off for a long time, and I got maybe 100 views. I wasn't looking for views in any way, and then I believe the video on Attention Is All You Need became quite popular. It still is, it's an evergreen. People keep going back to that paper to understand how attention mechanisms and transformers work and so on. Video-wise, it was a crappy video: I used Adobe Acrobat Reader and its pencil functionality, so it's really crude. I used the default thumbnail that YouTube gave me, which you should never do. I just put Attention Is All You Need in the title, and still, to this day, people find it really helpful. That gave me a bit of views, which I didn't realize at first. Then, during COVID, all of a sudden I found myself with a lot of time because life slowed down around me. At the same time, I was quite deeply impressed by other YouTubers like PewDiePie or Casey Neistat who were doing the daily upload thing. And I just asked myself, can I do daily uploads? What would it take to do that next to doing the PhD? It turned out that if you make that your mission, it somehow works. It's crazy that once you say, I'm going to do daily uploads, it works somehow. The volume of the daily uploads, combined with everyone sitting at home spending more time online, gave me a bit of a boost.

Dean: People sometimes don't understand that content has a compounding effect. You don't see the dividends early on, but later on, the fact that you have that older content is going to be really helpful, because people come for one content piece and find that you have another ten that are interesting, so you have to push through it. For podcasts, we haven't crossed that threshold yet. People usually say it takes ten episodes, and this is going to be episode seven, but we're getting to the point where it becomes a habit, and we're already seeing the listeners and viewers coming in. The other interesting point you mentioned is that Attention Is All You Need was the moment of truth, the moment of change for the channel. I've had Julien from Hugging Face on this podcast and he spoke about the meaning of BERT for their open source project. It seems like BERT changed a lot of lives except for the people who created it, so that's also interesting. That's how I discovered your channel as well. I was working on the Reproducibility Challenge with a bunch of great people from the fast.ai community, and the paper that was chosen was the Reformer paper. Reformer is an evolution of, or something built on top of, the Transformer. So I wanted to take a step back and dive deeper into the theory behind first the Transformer, then the Reformer, and you have two great videos about both of those. One of the challenges is that you get to these papers and try to read them yourself, which is something I always do and I feel like I have the background for, and still, a lot of the time there's a lot of information around the core idea. I want to believe the authors do that to make it more accessible, but many times it actually obscures the core idea. Watching your video, where you only talk about the core idea and explain how it works, and then going back and reading the paper again, everything falls into place much better. The attention mechanism is not as straightforward as it seems, and getting a deeper understanding of how it works is very useful if that's something you're interested in or working on. I have a question from a friend who's now working on his master's studies. How long do you spend on an average paper that you review? How long does it take to create a video?

Yannic: It depends fully on the paper itself. There are some papers in a domain I'm familiar with: a modification to some architecture that is a straightforward idea, an implementation of the idea, experimental evaluations, a bit of ablation or introspection. I read a paper like that twice and I'm good to go, feeling confident that I can bring across the core idea and what the paper wants to communicate. Other papers take way longer: I reread them, I go look for additional information. I usually try to read the paper until I understand it, whatever that means. That might take me a few hours or so. I sometimes don't read the appendix, which I shouldn't skip if I review it fully, but sometimes the appendix is super duper long. I usually read and then sleep. Not deliberately sleep, but I try to do the reading and the recording on separate days, because I feel, for whatever reason, that when I've slept between reading and recording the review, it just flows better; I have more clarity and an idea of how I can tell it, or maybe I understand it better. Maybe it's just a myth and it has become a placebo effect for me, but I do notice a difference in quality between doing a review on the same day I read the paper and doing it the next day. The recording itself is mostly a single take. Sometimes I screw up the names of the authors, then I restart, but then I just do a single take again. There's a bit of post-processing and uploading, making chapter annotations, making the thumbnail, and so on, but there's not too much overhead.

Dean: I feel like that is unique, and you're very humble, but at least from my perspective, and from speaking to my friend, you're probably world class at taking notes on machine learning research. Simplifying complex ideas is something that a lot of people have to do, whether in industry or academic research, right? It's super important, otherwise you're going to get stuck. For listeners who are now getting into research, or who need to do this on a regular basis in industry jobs: aside from sleeping on it, which I think is a good idea for anything complex you need to do, how do you do this? What's the process?

Yannic: It's a matter of practice. I was in this field, I did a master's where I mostly took machine learning classes, then did the PhD, where I read quite a bit. So I was always broadly interested. I used to read all of arXiv. Actually, I had a bot that downloaded new arXiv papers every day, back when this was still possible. I didn't read all the papers, but I read through all of the titles and abstracts, and then some papers I would actually read in full. It is mostly a matter of practice, experience talking to people, and knowing what's going on in all the different subfields. With time, if you're "in it", it gets easier, as with everything complex. You'll see the same patterns over and over again. Someone develops some new architecture: sure, it's a new architecture, but the reasoning, the things they're going to do in terms of experiments, and so on are going to be largely the same, so you know what to expect, and you see when something's unusual because you've seen ten other things before that are like it. I don't believe it's necessarily a matter of skill or an inherent talent or a secret methodology or anything like that. If you do it for a while, you get better at it. I totally see people in industry who every now and then have to read a paper to get up to speed on some new technology, and somehow that can take days or weeks, because they're coming at that one paper cold. They're not super aware of what's going on in the field around that paper. The paper doesn't explain everything in full detail, but when you're inside the field, you're quickly able to get up to speed on anything.
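
For anyone who wants to try a version of the daily arXiv skim Yannic describes, here is a minimal sketch, not his actual bot, that pulls the newest cs.LG listings from the public arXiv API and prints titles and abstracts. It assumes the third-party feedparser package is installed.

```python
import feedparser  # third-party: pip install feedparser

# Query the public arXiv API for the latest machine learning submissions.
URL = (
    "http://export.arxiv.org/api/query"
    "?search_query=cat:cs.LG"
    "&sortBy=submittedDate&sortOrder=descending"
    "&start=0&max_results=25"
)

feed = feedparser.parse(URL)

# Skim titles and abstracts; keep anything promising for a full read later.
for entry in feed.entries:
    title = entry.title.replace("\n", " ")
    abstract = entry.summary.replace("\n", " ")
    print(f"{title}\n  {abstract[:200]}...\n")
```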

Dean: Do you think that implementation is important for understanding? Do you find yourself implementing papers or looking through the code, or do you think that the paper is or should be self-contained?

Yannic: I personally don't implement the papers that I review, because it just takes too much time to implement something like that and really get it to work. Having the idea and actually making it work are two entirely different things. Ideas are plentiful; the idea is usually not the difficult part, actually making it work is. So to some extent I have to take the papers' results as given, though generally I don't trust a paper's results until they have been verified by multiple other sources. By looking at the paper, you can get a feeling for how hyper-optimized it is to make it appealing to publish, but you only truly figure it out once you implement it and try to get it to work. However, if you are relying on a technology, if you do have to work with a technology that's described in a paper, I'd recommend you definitely go ahead and implement it. You'll learn the tricky bits; you'll learn where it works and where the authors had to cheat a bit, and so on. Even better, if there's reference code, look at it and see what's going on. For a regular machine learning paper, you'll often find that you won't be able to get it to work as well as the paper describes.

Dean: A lot has been said about reproducibility in machine learning and it's a subject that's very close to me and to us working on DagsHub. Do you think that that's unique to the way machine learning is done or to the incentives that we have within the field? Is this a real problem and do you have any thoughts on how we can solve it?

Yannic: I don't believe it's necessarily unique. There is some uniqueness to machine learning in that we've had this explosion in the last few years, the increase in the size of the field, which leads to a situation where most reviewers of papers are quite new to the field and don't have much experience. Even if they do have experience, there's such breadth to the field that your reviewers most likely aren't super duper experts in the paper they're assigned. So they guess, and they don't reproduce either. I don't believe anyone does. I have actually re-implemented something only very few times, just because I knew I could do it in under 30 minutes and I absolutely knew that what it would show is that the paper was wrong. In a case like that, I will do it. But other than that, I didn't, and I don't think anyone else does. It's almost a crisis, but it exists in many fields. If you look at psychology or something like that, they're p-hacking their way to publishing papers that show some weird effect, like if you only read with your left eye while jumping around, your empathy goes up or something. It exists in many fields because the stakes of mistakes are relatively low. If your method doesn't work as advertised, you don't get punished for it. No one is getting hurt, because anyone who's implementing it is testing it first anyway. It's not like medicine, where you propose some new drug and say, wow, this really works, and then millions of research dollars go into it and people get hurt. There aren't many stakes. We're moving away from that, but it's still the case that if you have good numbers, your paper is much more likely to get accepted than if you have a new method and you say, my method is interesting, it almost reaches the methods that already exist, it just does it in a different way. Good luck getting that paper published. It happens, but it doesn't happen often. However, if you say, here's the thing, whatever I do, I beat everything else on the planet, your paper has a very high likelihood of getting published, unless the reviewers really don't believe your experimental results. So far, the attempts of the community to make things more reproducible have been quite fruitless, because what you're required to do is submit the code of your experiments. That doesn't help, because most people don't outright fake their numbers. They don't sit there and go, my method has 95.7%. Most people will cheat and hack and try random seeds and optimize for more and longer and so on until their method reaches that higher number. But that means that if I give you my code and you run that code again, you'll get the same number. It doesn't mean that the method, in a fair comparison across the spectrum of data sets that exist, or even in the wild, would hold up to the challenge. That is the real challenge. I don't exactly know how to solve it. Maybe we should step back and question whether this conference review system that we have is the problem, because I've seen a lot of success in people putting things on arXiv, advertising it a bit and then other people trying it. Through the social network, and I mean that in the classic sense, the peer bonds between people, it propagates: I tried this and it didn't work, and all my friends tried it and they said it didn't work either, and so on. I have much more trust in that than in the review system.
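
As a toy illustration of that last point, re-running released code with its released seed reproduces the headline number, while sweeping seeds reveals how much of it is luck. This is a made-up sketch: noisy_eval is a stand-in for training and evaluating a model, not anything from the episode.

```python
import numpy as np


def noisy_eval(seed: int) -> float:
    # Toy stand-in for "train a model and report accuracy": the underlying
    # score is fixed, but run-to-run noise means a lucky seed looks impressive.
    rng = np.random.default_rng(seed)
    return 0.80 + rng.normal(loc=0.0, scale=0.02)


# Re-running the released code with the released seed reproduces the same number...
print(f"released seed: {noisy_eval(seed=1337):.3f}")
print(f"released seed again: {noisy_eval(seed=1337):.3f}")

# ...but sweeping seeds shows the spread behind that single headline number.
scores = [noisy_eval(seed=s) for s in range(20)]
print(f"mean over 20 seeds: {np.mean(scores):.3f} +/- {np.std(scores):.3f}")
```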

Dean: Cultural norms are an important part of finding a solution that actually works. One of the challenges we've experienced with this is that a lot of that knowledge stays with the people who already have it. Sometimes someone on Reddit is so angry that they couldn't reproduce a paper that they call out the people who wrote it on how hard it was to reproduce. Most times, that doesn't happen, and so you go to meetups and speak to people and they're like, yeah, everyone who's tried to run this paper fails, it doesn't work, it only works in theory. There's no structured way to capture that, except for the Reproducibility Challenge. I understand what you're saying: requiring code submissions basically creates more content to be reviewed, under the same standards of review. We've now supported the Reproducibility Challenge as well, and we've gotten some submissions, which are awesome. People invest a lot of time in taking these papers, which in many cases, as you say, are state of the art, showing world-class performance on some benchmark, and in most cases there is a kernel of truth, it's not an outright lie. We have seen a few people who were especially diligent and tried different configurations and basically found out that the paper is reproducible, but only within a very narrow set of parameters, data sets, whatever it is. That's disheartening as well. If you're an industry professional, you know a lot of things that wouldn't be considered research-worthy knowledge but are super important for industry, because you need to know that this paper, even though it sounds amazing, will not work in production. But there's no organized way to collect that information, and that is a problem. Do you feel like there are any tools that could help, aside from the culture, the norms and the structure of conferences and review?

Yannic: It is legitimately quite hard to do. I don't think there's going to be one single way to do it, because as a paper author, there's always someone who reproduces your paper and doesn't get your performance, and so on. You can always say they did it wrong, and you might actually be convinced that they did it wrong. The tricky bit is that sometimes they actually did do it wrong. All these ideas that don't work sometimes only need a tiny modification, a tiny extra idea, to actually get them to work; that's the tricky bit. The last years have been full of methods that were invented in the 90s, tried again in the 2000s, tried again at the beginning of the deep learning revolution, and never really worked. And then someone comes along and says, if I make this modification, it works just fine. Or, if I throw more computers at it, it works fine. We'll just have to continue to work in this essentially decentralized way, message-passing from person to person to build up latent knowledge. I'm not sure we could build good tools to support that. The open source and open research attitude that the community has, publishing a lot on arXiv, advertising it around, talking in the open, is quite helpful.

Dean: You're working on a startup as well. Do you have different standards for what papers you consider in the context of the startup, compared to when you were a PhD student?

Yannic: Definitely. There's also the issue that you're not looking for that 2% gain in a startup, or at least we are not. If something is not twice as good or ten times as fast or something like that, it's usually not really worth it. It might be worth it once we're at that edge and really want to push the performance to its limit. We're in legal tech, and the law is quite a conservative industry. They just got done with digitalization, so you can imagine there is a lot of low-hanging fruit to be picked with good models, modern models, modern techniques and so on. But it doesn't have to be the super duper complex, mega state-of-the-art method in order to bring that value. So we're thinking much more in terms of bang for the buck. When you're a PhD student, you're essentially thinking, okay, I need to somehow contribute to the knowledge of the world, right? I need to figure out something that no one has figured out before, and that can be a small thing as long as it's new and important.

Dean: Most people know about the YouTube channel, they know about the work that you do with making sense of complicated ideas and papers. Can you share what you're working on in your startup?

Yannic: We essentially build NLP tech for anyone who works with legal documents. So, lawyers first and foremost, but also legal departments, government agencies, etc. That goes in various directions since, as you might know, the models all tend to converge, especially in NLP, so we're able to tackle the different challenges there. We're quite young, but it's been fun so far.

Dean: Do you feel like the background that you have in research led you to start the company? How did it happen?

Yannic: During the PhD, you need to collect some formal credits, so people take some random lectures. We took this one called Building a Robot Judge, which was broadly about legal tech, and that seemed like an interesting area and there seemed to be a lot of unsolved challenges that we found to be quite tackleable with the modern NLP tools that exist. So, we decided we'd do that. It was more or less random.

Dean: I guess it couldn't be completely random if you decided to start a company off of it, but it seems like it was a gradual process getting into entrepreneurship. Do you already have models running in production?

Yannic: We do have models running, but as I said, they're modern, not ultra complicated. I find that the data and the preprocessing, post-processing, making sure the quality is good, etc., will 10X your performance any day, and I believe most practitioners would agree. Improvements in the model might bring you that 5% improvement, which is good, but if you have to decide where to focus your attention, oftentimes you'd rather invest it in getting the data and the pipeline right than in improving the models themselves.

Dean: What are the main challenges around either getting the models into production or as you say, collecting the data to improve the models?

Yannic: It depends on the task. A lot of data is public, but the most interesting data is not. We tend to work with larger customers, which fortunately will always have a repository of data we can work with. I believe the bigger challenges are not necessarily machine learning. The sad truth is that even if you run a machine learning or AI startup, you won't do AI all day long. There's a lot of integration, organization, some bureaucracy. Most often you have to work with someone else's system, because they just don't want to send you their data. They're like, yeah, here are all our documents, our clients, our super confidential stuff, have it... that doesn't happen. You usually need to go to them, and so there are a lot of challenges around deployment: how to keep things secure, how to keep things running when it's not on your particular system, etc.

Dean: Not everything is glamorous. There is a recurring theme where even the technical challenges are usually around connecting the work you would have done in a lab to where the customers or users really are. In your case, does that mean that your deployments are on premises? We've heard two ways to tackle this: one is creating some secure gateway for the data to get to the models, the other is actually deploying the model on the customers' devices. How does it work for you, and what are some of the specific challenges you've faced?

Yannic: We do both. With the rise of private clouds, I believe that has become a lot easier. I haven't been around for what came before, but now most larger companies will have something like a cluster; maybe they'll already have a private cloud, or they can give you a bunch of machines where you can spin one up. We pay special attention to not being too dependent on the offerings of individual cloud vendors. We do use their offerings, but we always make sure we have an alternative that works on premise. So if we use specific databases, we can swap them out, etc. That makes life a whole lot easier, in that we can tell the company, please give us a namespace or something in your private cloud that we can deploy to, and then we'll take care of the rest. We're mainly cloud based like this, and then it doesn't make much of a difference where that cloud is. It depends mostly on what the people want; if they say no way our data leaves our premises, then we'll do it the private cloud way.
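
A minimal sketch of the kind of vendor independence Yannic is describing: hide the storage dependency behind a small interface so that a managed cloud service and an on-premise alternative are interchangeable. The DocumentStore interface and InMemoryStore class below are hypothetical illustrations, not DeepJudge's actual code.

```python
from typing import Protocol


class DocumentStore(Protocol):
    """Minimal interface the application codes against."""

    def put(self, doc_id: str, text: str) -> None: ...
    def get(self, doc_id: str) -> str: ...


class InMemoryStore:
    """On-premise-friendly fallback: no external service required."""

    def __init__(self) -> None:
        self._docs: dict[str, str] = {}

    def put(self, doc_id: str, text: str) -> None:
        self._docs[doc_id] = text

    def get(self, doc_id: str) -> str:
        return self._docs[doc_id]


def build_store(deployment: str) -> DocumentStore:
    # In a real system, a "managed" deployment would return a cloud-vendor-backed
    # store here; this sketch always falls back to the local implementation.
    return InMemoryStore()


store = build_store(deployment="on_prem")
store.put("contract-1", "This agreement is made between ...")
print(store.get("contract-1"))
```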

Dean: What do you think are the most interesting, exciting trends in machine learning operations right now?

Yannic: There are probably ten answers, one in every subfield. We've already seen the combination of modalities: deep learning models used to be largely single modality, maybe sometimes translating between two, like speech to text or text to speech. But now we really see the co-training and combining of different modalities. Images and text together is probably the biggest thing, with all the AI art that exists right now; it's really crazy. It's a really cool field. I believe it's going to enable a lot of people to do artistic things, and it's a really new way of interacting. If you were a graphic artist, like a painter or something, there were two aspects: you needed the creativity to come up with a motif or something like that, and then you also used to need the skill of actually putting it down. There's a deep connection between the two, and that will never be taken away from painters. But with these AI art models, if you are creative with your words and you understand the system a little bit, you no longer necessarily need that raw mechanical skill. For example, I don't have the mechanical skill of drawing; my drawings look like a five-year-old's. It gives people like me an opportunity to also dive into this art world, and I believe we're going to see these kinds of things with many more modalities. The other thing is that we've seen a bunch of models now that are able to go look for information at inference time. We've mostly seen conversational models, like chat bots, that not only learn to give you an answer but at inference time will use a search engine to look stuff up. That's very exciting because it breaks the trend of more and more compute in these language models: we can focus on the language itself and the mechanics of acquiring knowledge, and all the knowledge itself can reside somewhere else and be provided at inference time. Those are two of the things; possibly a third would be graph neural networks rising in popularity, with AlphaFold and other developments. I would say they are on the upswing, and we see a lot of hardware startups focusing on graph neural network acceleration. If these things come together, that could bring quite a boost to a bunch of new problems that we really haven't been able to solve so far.
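
To make the retrieval-at-inference-time idea concrete, here is a minimal sketch. The search and generate functions are hypothetical stand-ins for a real search engine call and a real language model, not any specific system mentioned in the episode.

```python
def search(query: str, top_k: int = 3) -> list[str]:
    # Hypothetical stand-in for a search engine: a tiny hard-coded corpus.
    corpus = {
        "transformers": "Transformers rely on self-attention over token sequences.",
        "alphafold": "AlphaFold predicts protein structures from amino-acid sequences.",
        "mlops": "MLOps covers deploying and monitoring machine learning models.",
    }
    hits = [text for key, text in corpus.items() if key in query.lower()]
    return hits[:top_k]


def generate(prompt: str) -> str:
    # Hypothetical stand-in for a language model; here it just echoes the prompt.
    return f"[model answer conditioned on]\n{prompt}"


def answer(question: str) -> str:
    # Fetch knowledge at inference time instead of storing it all in the weights.
    context = "\n".join(search(question))
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return generate(prompt)


print(answer("How do transformers work?"))
```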

Dean: Especially with the AI art models, I'm always curious whether we've already seen the killer app for that. I'm not a big crypto person, for good or for bad, but this feels amazing, especially now with NFTs: people are using these AI models to generate art, which they then mint as NFTs and things like that. I'm a graphic designer by hobby, and I feel like AI-assisted art is going to be amazing once we've really nailed it. I feel like that's still in its early days; the things that Nvidia has already achieved are amazing, but the moment it gets to the point where you can use it in Figma or something like that as part of your day-to-day work, that would be incredible. I also think that GauGAN, specifically, is compute-heavy, so I'm not sure how reasonable it is to use it day to day for a reasonable price. I think all of those topics are really exciting and there's still work to be done. So, if you are looking for a research field or trying to build a startup on them, there's still work to be done; we haven't figured out how to apply them to the real world well enough. As you said, in earlier days you could actually download all of the new arXiv papers, quickly skim over them and decide what's interesting. Now that is probably no longer the case, so how do you keep up to date with everything that's going on? How do you decide what to focus on?

Yannic: It's become a challenge. Everyone has their own method, which in part I like. I like that there isn't a single source of truth that says, here are the good papers. I believe I cast my net broadly, in that I do sometimes even look at arXiv itself, but there are various ways in which I get informed about new papers. From various Twitter feeds: accounts that post interesting new papers, but also from following researchers on Twitter, because they advertise their own papers. Those are already two disconnected things, right? The people who run the feeds focus on what they think is interesting, and the authors obviously focus on their own papers. Then some papers surface on Reddit, which is very prone to hypey and flashy-sounding things, because on Reddit there can only be one thing per day that is really high up. That is yet another way of consuming these things. I also have a great Discord community around my channel, and people post new papers there and discuss them a little bit. I think what you want is a stream of different opinions and different objectives for collecting your information, so that you probably won't miss the big things. If DeepMind puts out a new AlphaFold paper that breaks every metric, you won't miss it, right? But you want to keep your feelers out for unexpected discoveries. I cast my net as wide as I can while keeping it manageable. I also use various recommendation-engine type things. There's no one thing I can recommend to people; it also depends a little bit on how much stamina and will you have to skim through new things and decide whether they're good or not.

Dean: You're obviously a busy person: how many of those papers do you do every week or month? Do you have some order of magnitude?

Yannic: Skimming through the feeds happens continuously; if it's social media, that's obviously interleaved with other stuff, so I don't really have a good metric for it. But I save maybe a few dozen papers a week to inspect more closely, and then I decide pretty quickly which of those I like.

Dean: I don't have a reference for what an average PhD student does, but a few dozen to look a bit deeper into in a week is a lot.

Yannic: "Look a bit deeper" just means reading the title and abstract, looking at figure one, reading a bit around it and then deciding whether it looks good or not; only then do I go in super deep. Maybe a first-year PhD student would read pretty broadly, but most PhD students, once they're settled into their problem and their field, will look at the new papers in their field, but they'd be so specialized that most often there will be one new paper a week, at most. It's kind of sad, because the more advanced PhD students get, the more they lose touch with current research happenings.

Dean: I wonder if it's a matter of personality and then some people go the extra mile and read in other areas. I feel like there's a huge upside to that, which is cross pollination of ideas. The community is so broad, obviously in some cases you can apply something from NLP to computer vision. But even within NLP, if you are super specialized in a sub-subfield and you don't even read from other sub-subfields, then that is sad.

Yannic: I guess it's personality, as well as a tradeoff between focus and diversity. It's probably also good that we have different people: some who hyper-focus and some who diversify but then never get super deep into any one thing.

Dean: I haven't had the chance to speak with the authors of the Transformer paper. It would be interesting to take these world-changing papers and see if the people behind them are diverse in their approaches and backgrounds. You have more experience now that you're also bringing authors onto your channel to review papers with them. Do you feel like it's a diverse group?

Yannic: I find that there is a diversity in how diverse people are. As I said, there are some people who you realize that they are quite focused on their particular niche and what's going on there, and then there are some people who cross pollinate, as you say.

Dean: You have a Discord channel. Is that openly available or not? Okay, so we'll add the link.

Yannic: Sure, I should probably get like a real link to that but it's somewhere in the description of my videos.

Dean: Do you have any recommendations for the audience?

Yannic: Sleep well. I keep rediscovering how much of a superpower it is to be well rested, and I am the worst person at sleeping well. Suggesting content is tough. I tend to be interested in various things in stages, so I'm not sure I have a great answer. I'm currently quite interested in how YouTube channels that used to be popular are no longer popular, because I'm quite paranoid that that's going to happen to me. So I'm watching a lot of documentaries and analyses on how YouTube channels, or popular things in general, fail or fall out of fashion, just to learn from that and try to innovate and stay helpful to people, so that people continue to be interested. I don't think that's going to be super helpful to a lot of the audience here, except if you want to get into content creation, but it's definitely interesting.

Dean: Building a personal brand is important no matter what you do, and in the ML world, for good or for bad, it is also very important. You can see it in the fact that a lot of the discourse is happening on Twitter, and so people who have a personal brand have more clout in those discussions. There's a YouTube channel I'm also subscribed to called Veritasium. He's a physicist by training and talks about a bunch of different things; he explains complex topics in math and physics and sometimes computer science in simplified ways, which makes them more accessible. He had an entire video about the YouTube algorithm and how he thinks about it as a content creator, YouTuber, influencer, whatever you want to call him. I don't know if you're familiar with that video.

Yannic: Is he actually a trained physicist or is he just like, super interested in science? I'm not sure.

Dean: I think he did a master's, but I'm not 100% sure. I vaguely remember that he is trained and that he said he was considering pursuing more research and decided that sharing knowledge is more interesting. But I might be completely wrong.

Yannic: I don't want to optimize for the algorithm per se, because the algorithm also changes. I don't want to become an algorithm chaser, but I want to stay relevant to the ML community and serve them as well as I can, because it's fun, and if it's useful, then all the better.

Dean: I think most of those videos that give advice on what to do end up with the timeless advice of create good content, so you should always adhere to that. I'm now finishing a book called The Fifth Season, which my brother got me. He has good recommendations in general. It's mostly fantasy, a bit sci-fi. I'm not at the end yet, but it's very good. Things I've been thinking a lot about recently are complex machine learning use cases, and how they're solved in the real world as opposed to in theory. A lot of the discussions I've had recently are about active learning and how you make sure that your model continues to learn once you've deployed it, aside from, as you said, improving the architecture, which is nice but usually doesn't lead to the gains you expected. There are a lot of interesting challenges there. I don't have a canonical book to recommend about it, I should probably come up with one, but that's something I'm very interested in. So maybe someone in the audience has recommendations; I'd love to hear them, either on YouTube or on our Discord channel.


Dean Pleban

Co-Founder & CEO of DAGsHub. Building the home for data science collaboration. Interested in machine learning, physics and philosophy. Join https://DAGsHub.com
