This is a list of resources for DataHack, including links to repositories and guides for machine learning and data science. Want to join DataHack 2019? Go to https://www.datahack.org.il

DataHack Resources 2019

This is a list of resources for DataHack, including links to repositories and guides for machine learning and data science. Want to join DataHack 2019? Go to https://registration.datahack.org.il

Have any suggestions? Pull requests are very welcome.

Table of Contents

  1. List of Challenges
  2. The Basics
  3. IBM Cloud Resources
  4. General Resources
  5. Vision & Image
  6. NLP
  7. Voice & Audio
  8. Tabular Data

List of Challenges 🏆

If you're working on one of these challenges, be sure to check out the leaderboard instructions.

The Basics 🛠️

  • Pandas - pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
  • NumPy - NumPy is the fundamental package for scientific computing with Python. It contains, among other things:
    • a powerful N-dimensional array object
    • sophisticated (broadcasting) functions
    • tools for integrating C/C++ and Fortran code
    • useful linear algebra, Fourier transform, and random number capabilities
  • Scikit-Learn - Machine Learning in Python
    • Simple and efficient tools for data mining and data analysis
    • Accessible to everybody, and reusable in various contexts
    • Built on NumPy, SciPy, and matplotlib
  • TensorFlow - Google's machine learning (and deep learning) framework
  • PyTorch - Facebook's machine learning framework - Tensors and Dynamic neural networks in Python with strong GPU acceleration.
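
As a quick orientation, the NumPy capabilities listed above (N-dimensional arrays, broadcasting, linear algebra, random numbers) can be sketched in a few lines. This is only a minimal illustration; the toy data and all variable names are made up and are not taken from any DataHack challenge.

```python
import numpy as np

# Hypothetical toy data: 100 samples, 3 features.
rng = np.random.default_rng(seed=0)
X = rng.normal(size=(100, 3))

# Broadcasting: the (3,) mean and std vectors are applied to all 100 rows.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Linear algebra: recover known weights with ordinary least squares.
true_w = np.array([2.0, -1.0, 0.5])
y = X_std @ true_w  # noise-free targets, so the fit should be exact
w_hat, *_ = np.linalg.lstsq(X_std, y, rcond=None)

print(X_std.shape)
print(np.round(w_hat, 3))
```

The same standardized array could then be wrapped in a pandas DataFrame or passed to a Scikit-Learn estimator, since both build on NumPy arrays.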

IBM Cloud Resources

Especially for DataHackers, IBM offers a 6-month promo code worth $1,200 USD in IBM Cloud services, including:

  • Data analysis with Jupyter notebooks or RStudio
  • Data visualization dashboards without coding
  • Object storage to upload your data
  • Streaming data analysis
  • Data Refinery to cleanse and shape data
  • Free data sets from IBM Watson Community

How to start:

  1. Register for IBM Cloud and redeem your $1,200 promo code by following this link: https://drive.google.com/file/d/1aZ1jYKX-O8-LmBqaUoN6Xu2g8DE5kYkn/view?usp=sharing
  2. Instructions on how to get started, including how to upload data to the notebook: https://drive.google.com/open?id=1bGQDijjJX2tiUBQ410e4AAMupfISTkjq
  3. Short video tutorials to learn how to use the tools

Promo codes are limited, so take advantage of this free offer before it's gone.

General Resources 🤖

Blog Posts

Awesome Lists

Repositories

  • Tensor2Tensor GitHub repository - A library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research
  • HungaBunga - By the one and only Yam Peleg: a brute-force AutoML package that throws everything in Scikit-Learn at your data to see what sticks. May or may not blow up your machine.
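
The "throw everything at your data" idea can be sketched with plain Scikit-Learn. To be clear, this is not HungaBunga's API or implementation: the shortlist of models, the Iris dataset, and the `rank_models` helper are all hypothetical choices made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def rank_models(X, y, candidates, cv=5):
    """Cross-validate every candidate; return (name, mean accuracy), best first."""
    scores = []
    for name, model in candidates.items():
        mean_acc = cross_val_score(model, X, y, cv=cv).mean()
        scores.append((name, mean_acc))
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical shortlist; a real brute-force search would cover far more
# models and hyperparameters (and take far longer).
candidates = {
    "logreg": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(),
    "tree": DecisionTreeClassifier(random_state=0),
    "forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

X, y = load_iris(return_X_y=True)
ranking = rank_models(X, y, candidates)
for name, acc in ranking:
    print(f"{name}: {acc:.3f}")
```

Ranking by cross-validated accuracy rather than training accuracy keeps the comparison honest when many models are tried on the same data.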

Datasets

Experiment Tracking

Vision & Image 👁️ + 🖼️

Awesome Lists

Datasets

NLP 💬

Awesome Lists

  • Awesome NLP - 📖 A curated list of resources dedicated to Natural Language Processing (NLP)

Repositories

  • Fairseq - A sequence modeling toolkit for training custom models for translation, summarization, language modeling, and other text generation tasks. It provides reference implementations of various sequence-to-sequence models.
  • OpenAI GPT-2 - A really good pretrained language model (Code for the paper "Language Models are Unsupervised Multitask Learners")

Datasets

Voice & Audio 📣

Awesome Lists

Tabular Data 📊

Blog Posts

Repositories

  • XGBoost - Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on a single machine, Hadoop, Spark, Flink and DataFlow.
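
For readers new to gradient boosting, the core idea behind libraries like XGBoost can be sketched in NumPy: start from a constant prediction, then repeatedly fit a small tree to the current residuals and add a shrunken copy of it. This is a toy illustration for squared loss on a single feature, not XGBoost's algorithm or API (XGBoost adds regularization, second-order gradient information, and much more). All function names and data below are hypothetical.

```python
import numpy as np


def fit_stump(x, residual):
    """Find the single threshold split on 1-D x that minimizes squared error."""
    best = None
    # Skip the largest value so the right-hand side is never empty.
    for threshold in np.unique(x)[:-1]:
        left = residual[x <= threshold]
        right = residual[x > threshold]
        error = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or error < best[0]:
            best = (error, threshold, left.mean(), right.mean())
    _, threshold, left_value, right_value = best
    return threshold, left_value, right_value


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)


def fit_gbm(x, y, n_rounds=100, learning_rate=0.1):
    """Boosting for squared loss: each stump is fit to the current residuals."""
    base = y.mean()
    prediction = np.full(y.shape, base, dtype=float)
    stumps = []
    for _ in range(n_rounds):
        residual = y - prediction  # negative gradient of 0.5 * (y - f)^2
        stump = fit_stump(x, residual)
        prediction += learning_rate * predict_stump(stump, x)
        stumps.append(stump)
    return base, learning_rate, stumps


def predict_gbm(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


# Hypothetical toy regression problem: a noisy sine wave.
rng = np.random.default_rng(seed=0)
x = rng.uniform(0.0, 6.0, size=200)
y = np.sin(x) + rng.normal(scale=0.1, size=200)

model = fit_gbm(x, y)
mse_baseline = np.mean((y - y.mean()) ** 2)
mse_boosted = np.mean((y - predict_gbm(model, x)) ** 2)
print(f"baseline MSE: {mse_baseline:.4f}, boosted MSE: {mse_boosted:.4f}")
```

The errors printed here are training errors only; in a challenge setting you would evaluate on held-out data and most likely reach for a real library rather than this sketch.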