DataHack Resources 2019

This is a list of resources for DataHack, including links to repositories and guides for machine learning and data science. Want to join DataHack 2019? Go to https://registration.datahack.org.il

Have any suggestions? Pull requests are very welcome.

Table of Contents

  1. List of Challenges
  2. The Basics
  3. IBM Cloud Resources
  4. General Resources
  5. Vision & Image
  6. NLP
  7. Voice & Audio
  8. Tabular Data

List of Challenges 🏆

If you're working on one of these challenges, be sure to check out the leaderboard instructions.

The Basics 🛠️

  • Pandas - pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
  • Numpy - NumPy is the fundamental package for scientific computing with Python. It contains among other things:
    • a powerful N-dimensional array object
    • sophisticated (broadcasting) functions
    • tools for integrating C/C++ and Fortran code
    • useful linear algebra, Fourier transform, and random number capabilities
  • Scikit-Learn - Machine Learning in Python
    • Simple and efficient tools for data mining and data analysis
    • Accessible to everybody, and reusable in various contexts
    • Built on NumPy, SciPy, and matplotlib
  • TensorFlow - Google's machine learning (and deep learning) framework
  • PyTorch - Facebook's machine learning framework - Tensors and Dynamic neural networks in Python with strong GPU acceleration.
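As a quick illustration of the NumPy capabilities listed above (N-dimensional arrays, broadcasting, linear algebra, Fourier transforms, and random numbers), here is a minimal sketch. The arrays and variable names are made up for the example and are not part of any DataHack challenge.

```python
import numpy as np

# N-dimensional array: a 3x4 matrix holding the values 0..11.
a = np.arange(12, dtype=float).reshape(3, 4)

# Broadcasting: the per-column mean (shape (4,)) is stretched across all 3 rows.
centered = a - a.mean(axis=0)

# Linear algebra: solve M @ x = b without forming an explicit inverse.
M = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(M, b)

# Fourier transform: a sine with 5 cycles over 64 samples peaks at frequency bin 5.
t = np.arange(64)
signal = np.sin(2 * np.pi * 5 * t / 64)
peak_bin = int(np.argmax(np.abs(np.fft.rfft(signal))))

# Random numbers: a seeded generator gives reproducible draws.
rng = np.random.default_rng(seed=0)
samples = rng.normal(size=1000)

print(centered.mean(axis=0))  # every column now averages to zero
print(x)                      # solution of the 2x2 system
print(peak_bin)               # dominant frequency bin of the sine
print(samples.shape)
```

Pandas and Scikit-Learn both build on these arrays, so getting comfortable with shapes and broadcasting tends to pay off when debugging a model pipeline.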

IBM Cloud Resources

Especially for DataHackers, IBM offers a 6-month promo code worth $1,200 USD for IBM Cloud services, including:

  • Data analysis with Jupyter notebooks or RStudio
  • Data visualization dashboards, built without coding
  • Object storage to upload your data
  • Streaming data analysis
  • Data Refinery to cleanse and shape data
  • Free data sets from IBM Watson Community

How to start:

  1. Register for IBM Cloud and redeem your $1,200 promo code by following this link: https://drive.google.com/file/d/1aZ1jYKX-O8-LmBqaUoN6Xu2g8DE5kYkn/view?usp=sharing
  2. Instructions on how to get started, including how to upload data to the notebook: https://drive.google.com/open?id=1bGQDijjJX2tiUBQ410e4AAMupfISTkjq
  3. Short video tutorials to learn how to use the tools

Promo codes are limited, so take advantage of this free offer before it's gone.

General Resources 🤖

Blog Posts

Awesome Lists

Repositories

  • Tensor2Tensor GitHub repository - A library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research
  • HungaBunga - By the one and only Yam Peleg: a brute-force AutoML package that throws everything in Scikit-Learn at your data to see what sticks. May or may not blow up your machine.

Datasets

Experiment Tracking

Vision & Image 👁️ + 🖼️

Awesome Lists

Datasets

NLP 💬

Awesome Lists

  • Awesome NLP - 📖 A curated list of resources dedicated to Natural Language Processing (NLP)

Repositories

  • Fairseq - A sequence modeling toolkit enabling training custom models for translation, summarization, language modeling and other text generation tasks. It provides reference implementations of various sequence-to-sequence models
  • OpenAI GPT-2 - A really good pretrained language model (Code for the paper "Language Models are Unsupervised Multitask Learners")

Datasets

Voice & Audio 📣

Awesome Lists

Tabular Data 📊

Blog Posts

Repositories

  • XGBoost - Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on a single machine, Hadoop, Spark, Flink and DataFlow
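
If gradient boosting is new to you, the core idea can be sketched in a few lines of NumPy. This is not XGBoost's implementation or API: it is a toy version assuming squared loss, a single feature, and depth-1 trees (stumps), and every function name below is hypothetical. Each round fits a stump to the current residuals and adds a damped copy of it to the running prediction.

```python
import numpy as np

def fit_stump(x, residual):
    """Find the threshold split on x that minimizes squared error.

    Assumes x contains at least two distinct values.
    """
    best = None
    for threshold in np.unique(x)[:-1]:  # drop the max so the right side is never empty
        left = x <= threshold
        left_value = residual[left].mean()
        right_value = residual[~left].mean()
        error = (((residual[left] - left_value) ** 2).sum()
                 + ((residual[~left] - right_value) ** 2).sum())
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    return best[1:]

def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)

def fit_boosting(x, y, n_rounds=50, learning_rate=0.1):
    """Toy gradient boosting for squared loss (hypothetical helper)."""
    base = y.mean()
    prediction = np.full_like(y, base, dtype=float)
    stumps = []
    for _ in range(n_rounds):
        residual = y - prediction  # negative gradient of 0.5 * (y - f)^2
        stump = fit_stump(x, residual)
        prediction = prediction + learning_rate * predict_stump(stump, x)
        stumps.append(stump)
    return base, learning_rate, stumps

def predict_boosting(model, x):
    base, learning_rate, stumps = model
    prediction = np.full(x.shape, base, dtype=float)
    for stump in stumps:
        prediction = prediction + learning_rate * predict_stump(stump, x)
    return prediction

# Synthetic data, made up for the example: a noisy sine wave.
rng = np.random.default_rng(0)
x = rng.uniform(0, 6, size=200)
y = np.sin(x) + rng.normal(scale=0.1, size=200)

model = fit_boosting(x, y)
mse_before = float(np.mean((y - y.mean()) ** 2))
mse_after = float(np.mean((y - predict_boosting(model, x)) ** 2))
print(mse_before, mse_after)  # training error should drop after boosting
```

Production libraries such as XGBoost add regularization, deeper trees, multi-feature split search, and distributed training on top of this basic loop, so use them rather than this sketch for real challenge submissions.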