DataHack Resources 2019
This is a list of resources for DataHack, including links to repositories and guides for machine learning and data science. Want to join DataHack 2019? Go to https://registration.datahack.org.il
Have any suggestions? Pull requests are very welcome.
Table of Contents
- The Basics
- General Resources
- Vision & Image
- NLP
- Voice & Audio
- Tabular Data
The Basics 🛠️
- Pandas - pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
- NumPy - NumPy is the fundamental package for scientific computing with Python. It contains, among other things:
  - a powerful N-dimensional array object
  - sophisticated (broadcasting) functions
  - tools for integrating C/C++ and Fortran code
  - useful linear algebra, Fourier transform, and random number capabilities
- Scikit-Learn - Machine Learning in Python
  - Simple and efficient tools for data mining and data analysis
  - Accessible to everybody, and reusable in various contexts
  - Built on NumPy, SciPy, and matplotlib
- TensorFlow - Google's machine learning (and deep learning) framework
- PyTorch - Facebook's machine learning framework: tensors and dynamic neural networks in Python with strong GPU acceleration.
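To make the NumPy bullet above concrete, here is a minimal sketch touching each capability it lists (N-dimensional arrays, broadcasting, linear algebra, Fourier transforms, random numbers). The arrays and values are made up for illustration, and it assumes a NumPy version recent enough to provide `np.random.default_rng`.

```python
import numpy as np

# N-dimensional array: a 3x2 matrix of floats (illustrative data)
a = np.arange(6, dtype=float).reshape(3, 2)

# Broadcasting: the length-2 vector of column means is applied to all 3 rows
centered = a - a.mean(axis=0)

# Linear algebra: solve the 2x2 system M @ x = b
M = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(M, b)

# Fourier transform: spectrum of a sine with 4 cycles over 64 samples
t = np.arange(64)
signal = np.sin(2 * np.pi * 4 * t / 64)
peak_bin = int(np.argmax(np.abs(np.fft.rfft(signal))))

# Random numbers: a seeded generator gives reproducible draws
rng = np.random.default_rng(seed=0)
samples = rng.normal(size=5)

print(centered)
print(x, peak_bin, samples.shape)
```

Pandas and Scikit-Learn both build on these arrays, so being comfortable with shapes and broadcasting tends to pay off across the rest of the stack.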
General Resources 🤖
Blog Posts
Awesome Lists
Repositories
- Tensor2Tensor GitHub repository - A library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research
Datasets
Vision & Image 👁️ + 🖼️
Awesome Lists
Datasets
NLP 💬
Awesome Lists
- Awesome NLP - 📖 A curated list of resources dedicated to Natural Language Processing (NLP)
Repositories
- Fairseq - A sequence modeling toolkit for training custom models for translation, summarization, language modeling, and other text generation tasks. It provides reference implementations of various sequence-to-sequence models.
- OpenAI GPT-2 - A large pretrained language model (code for the paper "Language Models are Unsupervised Multitask Learners")
Datasets
Voice & Audio 📣
Awesome Lists
Tabular Data 📊
Blog Posts
Repositories
- XGBoost - Scalable, portable, and distributed gradient boosting (GBDT, GBRT, or GBM) library for Python, R, Java, Scala, C++, and more. Runs on a single machine, Hadoop, Spark, Flink, and DataFlow.