DataHack Resources 2019
This is a list of resources for DataHack, including links to repositories and guides for machine learning and data science. Want to join DataHack 2019? Go to https://registration.datahack.org.il
Have any suggestions? Pull requests are very welcome.
Table of Contents
- List of Challenges
- The Basics
- IBM Cloud Resources
- General Resources
- Vision & Image
- NLP
- Voice & Audio
- Tabular Data
List of Challenges
If you're working on one of these challenges, be sure to check out the leaderboard instructions.
The Basics
- Pandas - pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
- Numpy - NumPy is the fundamental package for scientific computing with Python. It contains among other things:
  - a powerful N-dimensional array object
  - sophisticated (broadcasting) functions
  - tools for integrating C/C++ and Fortran code
  - useful linear algebra, Fourier transform, and random number capabilities
- Scikit-Learn - Machine Learning in Python
  - Simple and efficient tools for data mining and data analysis
  - Accessible to everybody, and reusable in various contexts
  - Built on NumPy, SciPy, and matplotlib
- TensorFlow - Google's machine learning (and deep learning) framework
- PyTorch - Facebook's machine learning framework - Tensors and Dynamic neural networks in Python with strong GPU acceleration.
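To make the NumPy bullet points above concrete, here is a minimal sketch that touches each listed capability once: an N-dimensional array, broadcasting, linear algebra, a Fourier transform, and random numbers. The variable names and toy values are made up for illustration; only NumPy itself is assumed to be installed.

```python
import numpy as np

# An N-dimensional array object: here, a 2-D array with 3 rows and 4 columns.
data = np.arange(12, dtype=float).reshape(3, 4)

# Broadcasting: the (4,) vector of column means is stretched across all 3 rows.
col_means = data.mean(axis=0)
centered = data - col_means

# Linear algebra: solve the 2x2 system A @ x = b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)

# Fourier transform: a sine with 5 full cycles over 64 samples
# has its largest spectral magnitude in frequency bin 5.
t = np.arange(64)
signal = np.sin(2 * np.pi * 5 * t / 64)
peak_bin = int(np.argmax(np.abs(np.fft.rfft(signal))))

# Random numbers: a seeded generator gives reproducible draws.
rng = np.random.default_rng(seed=0)
samples = rng.normal(size=1000)

print(centered)
print(x, peak_bin, samples.shape)
```

Pandas and Scikit-Learn both build on these arrays, so getting comfortable with shapes and broadcasting tends to pay off quickly during a hackathon.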
IBM Cloud Resources
Especially for DataHackers, IBM offers a 6-month promo code worth $1,200 USD in IBM Cloud services, including:
- Data analysis with Jupyter notebooks or RStudio
- Data visualization dashboards without coding
- Object storage to upload your data
- Streaming data analysis
- Data Refinery to cleanse and shape data
- Free data sets from IBM Watson Community
How to start:
- Register for IBM Cloud and redeem your $1,200 promo code by following this link:
https://drive.google.com/file/d/1aZ1jYKX-O8-LmBqaUoN6Xu2g8DE5kYkn/view?usp=sharing
- Instructions on how to get started, including how to upload data to the notebook: https://drive.google.com/open?id=1bGQDijjJX2tiUBQ410e4AAMupfISTkjq
- Short video tutorials to learn how to use the tools
There are limited promo codes, so take advantage of this free offer before it's gone.
General Resources
Blog Posts
Awesome Lists
Repositories
- Tensor2Tensor GitHub repository - A library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research
- HungaBunga - By the one and only Yam Peleg, a brute-force AutoML package that throws everything in Scikit-Learn at your data to see what sticks. May or may not blow up your machine.
Datasets
Experiment Tracking
Vision & Image
Awesome Lists
Datasets
NLP
Awesome Lists
- Awesome NLP - A curated list of resources dedicated to Natural Language Processing (NLP)
Repositories
- Fairseq - A sequence modeling toolkit for training custom models for translation, summarization, language modeling, and other text generation tasks. It provides reference implementations of various sequence-to-sequence models.
- OpenAI GPT-2 - A really good pretrained language model (Code for the paper "Language Models are Unsupervised Multitask Learners")
Datasets
Voice & Audio
Awesome Lists
Tabular Data
Blog Posts
Repositories
- XGBoost - Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on a single machine, Hadoop, Spark, Flink and DataFlow
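For readers new to gradient boosting, the idea behind libraries like XGBoost can be sketched in a few lines of NumPy. This is a toy illustration under simplifying assumptions (squared loss, a single feature, one-split "stumps" as weak learners); it is not XGBoost's API or its actual algorithm, which is considerably more elaborate. All function names here (`fit_stump`, `fit_boosting`, and so on) are hypothetical.

```python
import numpy as np

def fit_stump(x, residuals):
    """Find the single threshold split on x that minimizes squared error."""
    best = None
    # Skip the largest value so the right-hand side is never empty.
    for threshold in np.unique(x)[:-1]:
        left = residuals[x <= threshold]
        right = residuals[x > threshold]
        error = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or error < best[0]:
            best = (error, threshold, left.mean(), right.mean())
    return best[1:]  # (threshold, left_value, right_value)

def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)

def fit_boosting(x, y, n_rounds=50, learning_rate=0.1):
    """Repeatedly fit a stump to the current residuals and add a shrunken copy."""
    base = float(y.mean())
    prediction = np.full(y.shape, base)
    stumps = []
    for _ in range(n_rounds):
        residuals = y - prediction  # negative gradient of the squared loss
        stump = fit_stump(x, residuals)
        prediction = prediction + learning_rate * predict_stump(stump, x)
        stumps.append(stump)
    return {"base": base, "stumps": stumps, "learning_rate": learning_rate}

def predict_boosting(model, x):
    prediction = np.full(x.shape, model["base"])
    for stump in model["stumps"]:
        prediction = prediction + model["learning_rate"] * predict_stump(stump, x)
    return prediction

# Toy data (made up for illustration): a noiseless sine curve.
x = np.linspace(0.0, 6.0, 200)
y = np.sin(x)

model = fit_boosting(x, y)
baseline_mse = float(((y - y.mean()) ** 2).mean())
boosted_mse = float(((y - predict_boosting(model, x)) ** 2).mean())
print(baseline_mse, boosted_mse)
```

On real tabular data you would reach for XGBoost itself rather than a sketch like this; the point is only that each round fits a small model to what the ensemble still gets wrong.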