Recommended Data Science Content Sources

Data Science Oct 19, 2020

You are what you eat, and it's your job as a knowledge worker to be on the lookout for a good information diet. In this post, I want to share the sources of information regarding data science, AI, and the tech surrounding it, which I found most useful or appealing. I hope it helps you as well!

Of course, this is all Just My Opinion™. If you think I should change something, feel free to yell at me @Guy_T_Sky :)

In no particular order:

Two Minute Papers  

Good for staying up to date, updated frequently.

The host, Karoly, has an infectious enthusiasm and positivity for all the topics he covers.

Expect coverage of interesting papers not just about AI, but also about computer graphics and other visually stunning topics.

Yannic Kilcher  

Yannic explains prominent deep learning papers in a thorough, technical way. Rather than reading the paper yourself, it's often faster and easier to watch one of his videos in order to understand important papers in-depth. The explanations capture the punchline of the papers in a deep way, without handwaving away the math or getting lost in the weeds.

Yannic also shares his more subtle perspectives - how papers relate to each other, interpretations of the wider meaning and how seriously to take the results, etc. These insights are harder for newcomers (or non-academic practitioners) to arrive at by themselves.

Distill.pub  

In their own words:

Machine Learning Research Should Be Clear, Dynamic and Vivid.
Distill Is Here to Help.

Distill is a unique publication for machine learning research. It promotes articles which use stunning visualizations to give the reader a more intuitive understanding of the topics. Spatial reasoning and imagination tend to work very well for understanding topics in machine learning & data science. This is not surprising, considering so much of the fundamental math behind the fields is linear algebra and calculus. In contrast, traditional publication formats tend to be rigid in their structure, static, dry, and sometimes unnecessarily "mathy".

Chris Olah, one of the creators of Distill, also has an amazing personal blog: https://colah.github.io/.
It hasn't been updated in a while, but it's still a collection of some of the best-written explanations of deep learning around. The explanation of LSTMs in particular was a great help to me!

Visual explanation of gradient descent optimization dynamics, from distill.pub
Source: https://distill.pub/2017/momentum/

Sebastian Ruder  

Sebastian Ruder writes a super high-quality blog and newsletter, primarily about the intersection of neural networks and NLP. He also has lots of advice for researchers, and reports on academic conferences, which could be very useful if you're in academia.

His articles tend to take the form of surveys - summing up and explaining the state-of-the-art research and techniques in an area, which makes them extremely useful for practitioners who want to orient themselves fast.

Andrej Karpathy  

Andrej Karpathy needs no introduction! Besides being one of the best-known deep learning researchers on earth, he is a font of creativity, creating widely used tools like arxiv sanity preserver as side projects.

Countless people have entered the field via his Stanford cs231n course, and you would benefit from taking his neural network training recipe to heart.

I also recommend watching his talk about the real-life problems Tesla needs to overcome when trying to apply machine learning at massive scale in the real world. It's impressive, informative and sobering.

Besides writing about ML directly, he also writes some good life advice for aspiring researchers.

Uber Engineering

Uber's engineering blog is truly impressive in scope and breadth, covering a ton of topics, AI in particular.

What I particularly like about Uber's engineering culture is their tendency to spin off super interesting and valuable open source projects at a head-spinning pace. Some examples are:

OpenAI Blog

Putting aside any controversy, the OpenAI blog is undeniably beautiful. From time to time, it posts content about deep learning insights which only OpenAI's massive scale can reach, such as the hypothesized Deep Double Descent phenomenon. They tend to post infrequent, high-impact pieces, so it has a high signal-to-noise ratio.

Deep Double Descent, from OpenAI
Source: https://openai.com/blog/deep-double-descent/

Taboola Blog

The Taboola blog is not as well known as some of the other suggestions in this post, but I find it unique. It deals with very down-to-earth, real-life problems that come up when trying to use ML in production for "normal" businesses - less self-driving cars and RL agents beating world champions, more "how do I know if my model is now predicting things with fake confidence?"

These problems are relevant for almost everyone working in the field, and they get less press coverage than the more sexy AI topics, but solving them correctly still requires world class talent. Thankfully, Taboola has both that talent AND the willingness and ability to write about it, so that others can learn as well.

Reddit

Alongside Twitter, there's nothing quite like Reddit to get caught up on papers, tools, and the wisdom of the crowds.

State of AI

Published only once a year, but very dense with information.

Relative to the other sources in this list, it's more accessible to (non-technical) business people.

What I like about the report is that it tries to give a more holistic view, from 10,000 feet, of where the industry and research are heading - tying together advances in hardware, research, business, and even geopolitics.

Be sure to start by skipping to the end to read about the conflicts of interest :)

Podcasts

To be frank, in my opinion podcasts are problematic for learning about technical topics. Most of them have a hard time explaining the things that need explaining using audio only, as data science is a very visual field. Podcasts tend to succeed only in giving you leads for deeper investigation later, or in fun philosophical discussions.

Nevertheless, here are some recommendations:

  • Lex Fridman's podcast, when he has on prominent researchers from the field of AI. The Francois Chollet episodes are particularly good!
  • Data Engineering podcast. Good for hearing about new data infrastructure tools in your audio-only time (though COVID has cut down on that time...).

Awesome Lists

These are less content sources to follow and more useful resources for when you know what you're looking for:

Twitter

Machine & Deep Learning Israel

I may be a bit biased, but I feel that Israel (my home country) has an amazingly vibrant and professional data science community and ecosystem, with top-tier talent and a very practical, no-nonsense approach.

This Facebook group is always buzzing about the latest trends, events, and open source projects, as well as deep discussions about more timeless subjects like career planning. Top-tier professionals in the field weigh in frequently, and the result is an amazing resource for anyone who wants to learn about the field.

Unfortunately, this resource is only relevant if you happen to be a Hebrew speaker. Sorry, 99.9% of the population of Earth ¯\_(ツ)_/¯

Conclusion

This blog post may be updated as I find more wonderful sources of content, which would be a shame not to include in the list. Feel free to contact me @Guy_T_Sky if you want to recommend some new source!

DAGsHub is hiring a Data Science Advocate, so if you are creating your own data science content, we think we have a very fun offer for you!

Guy Smoilovsky

Co-Founder & CTO @ DAGsHub
