DataHack Resources 2019

This is a list of resources for DataHack, with links to repositories and guides for machine learning and data science. Want to join DataHack 2019? Go to https://registration.datahack.org.il

Have any suggestions? Pull requests are very welcome.

Table of Contents

  1. The Basics
  2. General Resources
  3. Vision & Image
  4. NLP
  5. Voice & Audio
  6. Tabular Data

The Basics 🛠️

  • Pandas - pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
  • Numpy - NumPy is the fundamental package for scientific computing with Python. It contains among other things:
    • a powerful N-dimensional array object
    • sophisticated (broadcasting) functions
    • tools for integrating C/C++ and Fortran code
    • useful linear algebra, Fourier transform, and random number capabilities
  • Scikit-Learn - Machine Learning in Python
    • Simple and efficient tools for data mining and data analysis
    • Accessible to everybody, and reusable in various contexts
    • Built on NumPy, SciPy, and matplotlib
  • TensorFlow - Google's machine learning (and deep learning) framework
  • PyTorch - Facebook's machine learning framework: tensors and dynamic neural networks in Python with strong GPU acceleration.
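As a quick orientation to the first two libraries above, here is a minimal sketch that uses pandas and NumPy together. The table, its column names ("city", "temp_c", "humidity") and its values are made up purely for illustration. It groups rows with pandas, then uses NumPy broadcasting to center each column by subtracting the column means without an explicit loop.

```python
import numpy as np
import pandas as pd

# Hypothetical toy table: column names and values are invented for illustration.
df = pd.DataFrame({
    "city": ["Haifa", "Tel Aviv", "Haifa", "Jerusalem"],
    "temp_c": [24.0, 28.0, 26.0, 22.0],
    "humidity": [60.0, 70.0, 64.0, 40.0],
})

# pandas: group rows by a key column and aggregate another column per group.
mean_temp = df.groupby("city")["temp_c"].mean()
print(mean_temp)

# NumPy broadcasting: X has shape (4, 2) and X.mean(axis=0) has shape (2,),
# so the vector of column means is stretched across every row and each
# column is centered in a single vectorized subtraction.
X = df[["temp_c", "humidity"]].to_numpy()
X_centered = X - X.mean(axis=0)
print(X_centered)
```

Subtracting per-column means this way is the first half of feature standardization, a common preprocessing step before fitting Scikit-Learn models.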

General Resources 🤖

Blog Posts

Awesome Lists

Repositories

  • Tensor2Tensor GitHub repository - A library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research

Datasets

Vision & Image 👁️ + 🖼️

Awesome Lists

Datasets

NLP 💬

Awesome Lists

  • Awesome NLP - 📖 A curated list of resources dedicated to Natural Language Processing (NLP)

Repositories

  • Fairseq - A sequence modeling toolkit for training custom models for translation, summarization, language modeling and other text generation tasks. It provides reference implementations of various sequence-to-sequence models.
  • OpenAI GPT-2 - A large pretrained language model (code for the paper "Language Models are Unsupervised Multitask Learners")

Datasets

Voice & Audio 📣

Awesome Lists

Tabular Data 📊

Blog Posts

Repositories

  • XGBoost - Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) library for Python, R, Java, Scala, C++ and more. Runs on a single machine, Hadoop, Spark, Flink and DataFlow.
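XGBoost is a heavily optimized implementation of gradient-boosted decision trees. To illustrate the core idea behind "gradient boosting", here is a minimal NumPy sketch of plain gradient boosting with squared loss and one-split regression trees (stumps) on a single feature. This is not XGBoost's algorithm or API: XGBoost adds regularization, second-order gradient information, deeper trees and many systems-level optimizations. The functions fit_stump, gradient_boost and predict are hypothetical names chosen for this example.

```python
import numpy as np


def fit_stump(x, r):
    """Least-squares regression stump on a 1-D feature.

    Tries a split at every distinct value of x (except the largest) and returns
    (threshold, left_value, right_value) minimizing the squared error of
    predicting the residuals r with the mean of each side.
    """
    best = None
    for t in np.unique(x)[:-1]:  # the largest value would leave the right side empty
        left, right = r[x <= t], r[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    if best is None:  # constant feature: no split possible, predict the mean everywhere
        return x[0], r.mean(), r.mean()
    _, t, left_value, right_value = best
    return t, left_value, right_value


def gradient_boost(x, y, n_rounds=50, learning_rate=0.1):
    """Gradient boosting for squared loss L = 1/2 * (y - F)^2.

    For this loss the negative gradient with respect to F is the residual
    y - F, so each round fits a stump to the current residuals and adds a
    shrunken copy of it to the ensemble.
    """
    base = y.mean()  # F_0: the constant that minimizes squared loss
    pred = np.full(y.shape, base, dtype=float)
    stumps = []
    for _ in range(n_rounds):
        residuals = y - pred  # negative gradient of the squared loss
        t, left_value, right_value = fit_stump(x, residuals)
        pred += learning_rate * np.where(x <= t, left_value, right_value)
        stumps.append((t, left_value, right_value))
    return base, learning_rate, stumps


def predict(model, x):
    """Sum the base value and every shrunken stump output."""
    base, learning_rate, stumps = model
    pred = np.full(x.shape, base, dtype=float)
    for t, left_value, right_value in stumps:
        pred += learning_rate * np.where(x <= t, left_value, right_value)
    return pred


# Toy data: a step function that a single split can represent.
x = np.arange(10, dtype=float)
y = np.where(x < 5, 1.0, 3.0)
model = gradient_boost(x, y)
print(np.round(predict(model, x), 2))
```

The learning rate shrinks each stump's contribution, so the ensemble approaches the target gradually; in this toy example every round removes 10% of the remaining residual.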