https://madewithml.com/



 Made With ML

Applied ML · MLOps · Production
Join 20K+ developers in learning how to responsibly deliver value with ML.


     


If you need to refresh yourself on ML algorithms, check out our ML Foundations repository (🔥  Among the top ML repositories on GitHub).


📦  Product | 🔢  Data                  | 📈  Modeling
Objective   | Annotation                | Baselines
Solution    | Exploratory data analysis | Experiment tracking
Evaluation  | Splitting                 | Optimization
Iteration   | Preprocessing             |

📝  Scripting | (cont.)  | 📦  Application | ✅  Testing
Organization  | Styling  | CLI             | Code
Packaging     | Makefile | API             | Data
Documentation | Logging  |                 | Models

♻️  Reproducibility | 🚀  Production | (cont.)
Git                 | Dashboard      | Feature stores
Pre-commit          | CI/CD          | Workflows
Versioning          | Monitoring     |
Docker              |                |

📆  New lessons every month!
Subscribe for our monthly updates on new content.


Directory structure

app/
├── api.py        - FastAPI app
├── cli.py        - CLI app
└── schemas.py    - API model schemas
tagifai/
├── config.py     - configuration setup
├── data.py       - data processing components
├── eval.py       - evaluation components
├── main.py       - training/optimization pipelines
├── models.py     - model architectures
├── predict.py    - inference components
├── train.py      - training components
└── utils.py      - supplementary utilities

Documentation for this application can be found here.

Workflows

Use existing model

  1. Set up environment.

    export venv_name="venv"
    make venv name=${venv_name} env="dev"
    source ${venv_name}/bin/activate
    
  2. Pull latest model.

    dvc pull
    
  3. Run the application.

    make app env="dev"
    

    You can interact with the API directly or explore via the generated documentation at http://0.0.0.0:5000/docs.
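
As a minimal sketch of interacting with the API directly from Python, the snippet below builds the URL for the `GET /runs/{run_id}` route mentioned in the manual workflow and fetches it with the standard library. The helper names `run_url` and `get_run` are hypothetical, and the assumption that the route returns a JSON body is ours; check the generated documentation at `/docs` for the actual response schema.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Address used by `make app`; swap in 127.0.0.1 if your OS rejects 0.0.0.0 as a client target.
BASE_URL = "http://0.0.0.0:5000"


def run_url(run_id: str, base_url: str = BASE_URL) -> str:
    """Build the URL for GET /runs/{run_id} (hypothetical helper)."""
    return f"{base_url.rstrip('/')}/runs/{quote(run_id, safe='')}"


def get_run(run_id: str, base_url: str = BASE_URL, timeout: float = 10.0) -> dict:
    """Fetch a run's details; assumes the endpoint responds with JSON."""
    with urlopen(run_url(run_id, base_url), timeout=timeout) as response:
        return json.load(response)


# Example (requires the app to be running; replace the placeholder with a real run ID):
# print(json.dumps(get_run("<RUN_ID>"), indent=2))
```

With the application running, calling `get_run` with one of your own run IDs should return that run's details as a dictionary.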

Update model (CI/CD)

Coming soon, after the CI/CD lesson: the entire application will be retrained and deployed whenever we push new data (or manually trigger reoptimization/training). The deployed model, along with performance comparisons against previously deployed versions, will be ready on a PR to push to the main branch.

Update model (manual)

  1. Set up the development environment.

    export venv_name="venv"
    make venv name=${venv_name} env="dev"
    source ${venv_name}/bin/activate
    
  2. Pull versioned data and model artifacts.

    dvc pull
    
  3. Optimize using distributions specified in tagifai.main.objective. This also writes the best model's params to config/params.json.

    tagifai optimize \
    --params-fp config/params.json \
    --study-name optimization \
    --num-trials 100
    

    We'll cover how to train using compute instances on the cloud from Amazon Web Services (AWS) or Google Cloud Platform (GCP) in later lessons. In the meantime, if you don't have access to GPUs, check out the optimize.ipynb notebook for how to train on Colab and transfer to local. We essentially run optimization, then train the best model so we can download and transfer its artifacts.

  4. Train a model (and save all its artifacts) using params from config/params.json and publish metrics to model/performance.json. You can view the entire run's details inside experiments/{experiment_id}/{run_id} or via the API (GET /runs/{run_id}).

    tagifai train-model \
    --params-fp config/params.json \
    --model-dir model \
    --experiment-name best \
    --run-name model
    
  5. Predict tags for an input sentence. It'll use the best model saved from train-model but you can also specify a run-id to choose a specific model.

    tagifai predict-tags --text "Transfer learning with BERT"  # test with CLI app
    make app env="dev"  # run API and test as well
    
  6. View improvements. Once you're done training the best model using the current data version, best hyperparameters, etc., we can view the performance difference.

    tagifai diff
    
  7. Commit to git. This will clean and update versioned assets (data, experiments), run tests, styling, etc.

    git add .
    git commit -m "<COMMIT_MESSAGE>"
    git tag -a <TAG_NAME> -m "<TAG_MESSAGE>"
    git push origin <BRANCH_NAME>
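
To make the optimization step (step 3) more concrete, here is a rough sketch of the general idea: sample hyperparameters from distributions, score each trial with an objective, and write the best trial's params to a JSON file. This is not the code in tagifai.main; it uses plain random search from the standard library as a stand-in, and the parameter names (`lr`, `dropout`, `embedding_dim`), their ranges, and the scoring function are all hypothetical.

```python
import json
import random
from pathlib import Path


def sample_params(rng: random.Random) -> dict:
    """Draw one candidate from hypothetical search distributions."""
    return {
        "lr": 10 ** rng.uniform(-4, -2),              # log-uniform learning rate
        "dropout": rng.uniform(0.1, 0.5),             # uniform dropout probability
        "embedding_dim": rng.choice([64, 128, 256]),  # categorical choice
    }


def objective(params: dict) -> float:
    """Stand-in score (higher is better); a real objective would train and evaluate a model."""
    return -abs(params["dropout"] - 0.3) - abs(params["lr"] - 1e-3)


def optimize(num_trials: int, params_fp: Path, seed: int = 0) -> dict:
    """Run random search and write the best trial's params to params_fp."""
    rng = random.Random(seed)
    trials = [sample_params(rng) for _ in range(num_trials)]
    best = max(trials, key=objective)
    params_fp.parent.mkdir(parents=True, exist_ok=True)
    params_fp.write_text(json.dumps(best, indent=2))
    return best
```

Calling `optimize(100, Path("scratch/params.json"))` mirrors the shape of the CLI call with `--num-trials 100`; the scratch path is deliberately different from config/params.json so that the sketch does not overwrite the project's real params.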
    

Commands

Docker

make docker  # docker build -t tagifai:latest -f Dockerfile .
             # docker run -p 5000:5000 --name tagifai tagifai:latest

Application

make app  # uvicorn app.api:app --host 0.0.0.0 --port 5000 --reload --reload-dir tagifai --reload-dir app
make app-prod  # gunicorn -c config/gunicorn.py -k uvicorn.workers.UvicornWorker app.api:app
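
The `make app-prod` command loads its settings from config/gunicorn.py, which is a plain Python file. The repository's actual file is not shown here, so the following is only a sketch of what such a config could contain, using standard Gunicorn setting names (`bind`, `workers`, `timeout`, `loglevel`, `accesslog`); every value is an assumption.

```python
import multiprocessing

# Address to listen on; port 5000 is assumed to match the port used by `make app`.
bind = "0.0.0.0:5000"

# Common starting heuristic from the Gunicorn docs: (2 x cores) + 1 workers.
workers = multiprocessing.cpu_count() * 2 + 1

# Seconds a worker may stay silent before it is killed and restarted (assumed value).
timeout = 120

# Log verbosity and destination ("-" sends access logs to stdout).
loglevel = "info"
accesslog = "-"
```

The worker class is not set in this sketch because the command above already passes it with `-k uvicorn.workers.UvicornWorker`.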

Streamlit dashboard

make streamlit  # streamlit run streamlit/app.py

MLflow

make mlflow  # mlflow server -h 0.0.0.0 -p 5000 --backend-store-uri stores/model/

Mkdocs

make docs  # python -m mkdocs serve

Testing

make great-expectations  # great_expectations checkpoint run [projects, tags]
make test  # pytest --cov tagifai --cov app --cov-report html
make test-non-training  # pytest -m "not training"

Start JupyterLab

python -m ipykernel install --user --name=tagifai
jupyter labextension install @jupyter-widgets/jupyterlab-manager
jupyter labextension install @jupyterlab/toc
jupyter lab

You can also run all notebooks on Google Colab.

FAQ

Why is this free?

While this content is for everyone, it's especially targeted towards people who don't have as much opportunity to learn. I firmly believe that creativity and intelligence are randomly distributed but opportunity is siloed. I want to enable more people to create and contribute to innovation.

Who is the author?

  • I've deployed large scale ML systems at Apple as well as smaller systems with constraints at startups and want to share the common principles I've learned along the way.
  • I created Made With ML so that the community can explore, learn, and build ML, and I learned how to build it into an end-to-end product that's currently used by over 20K monthly active users.
  • Connect with me on Twitter and LinkedIn.

To cite this course, please use:

@article{madewithml,
    title  = "Applied ML - Made With ML",
    author = "Goku Mohandas",
    url    = "https://madewithml.com/courses/mlops/",
    year   = "2021",
}