Power-of-Choice

Reproducibility study of the paper "Towards Understanding Biased Client Selection in Federated Learning" by Yae Jee Cho, Jianyu Wang and Gauri Joshi.

Directory Structure

.
├── quadratic_optimization            # Experiment 1: Quadratic simulations
├── logistic_regression               # Experiment 2: Logistic regression using synthetic data
├── image_classification              # Experiment 3: Image classification using FMNIST, CIFAR10 data
├── sentiment_analysis                # Experiment 4: Sentiment analysis using Twitter data
├── data                              # synthetic data
├── MLflow_guide.ipynb                # MLflow guide
├── jee-cho22a.pdf                    # Original paper for reproducibility
├── Report-ReScience.pdf              # Our reproducibility report
├── Presentation-Hackathon.pdf        # Presentation talk for hackathon
├── requirements.txt                  # PIP requirements file
├── environment.yml                   # CONDA environment file
├── .gitignore
├── LICENSE
└── README.md

Dataset

All dataset files (except the synthetic data) are downloaded automatically from their respective repositories; the synthetic data is included in this repository. No manual action is required to download or preprocess any data.
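
For example, the image classification datasets are fetched through torchvision on first use; the snippet below is an illustrative sketch (the root path and transform are assumptions, not the repository's exact code):

from torchvision import datasets, transforms

# Downloaded into ./data on the first run; later runs reuse the cached files.
transform = transforms.ToTensor()
fmnist = datasets.FashionMNIST(root="./data", train=True, download=True, transform=transform)
cifar10 = datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)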

Getting Started

We highly recommend setting up the environment with the following commands, rather than using environment.yml or requirements.txt, to avoid issues arising from differences across architectures/machines.

To get started, install a conda distribution for managing Python packages, then create an environment:

Step 1: Create and activate conda environment

  • $ conda create -n myenv python=3.10 ipython ipykernel -y
  • $ conda activate myenv
  • $ python -m ipykernel install --user --name myenv --display-name "Python (myenv)"

Step 2.1: Install PyTorch 2.0.1 (see the official PyTorch installation reference)

  • For Mac: $ conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 -c pytorch -y
  • For Windows/Ubuntu (CPU): $ conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 cpuonly -c pytorch -y
  • For Windows/Ubuntu (GPU with Cuda 11.8): $ conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.8 -c pytorch -c nvidia
  • Verify install:
$ python -c "import torch; print(f'Torch: {torch.__version__}\nCUDA: {torch.version.cuda}\nIs CUDA available: {torch.cuda.is_available()}\nCUDA devices: {torch.cuda.device_count()}')"

Step 2.2: Install common packages

  • ML Toolkits: $ conda install -c anaconda pandas numpy tqdm -y
  • Misc: $ conda install -c conda-forge matplotlib jupyterlab -y
  • Verify install:
$ python -c "import pandas, numpy, tqdm, matplotlib; print ('Done')"

Note: you will need to activate the environment every time you run our scripts. For Jupyter Notebook/Lab, select the custom kernel "Python (myenv)" created in Step 1.

Instructions

To reproduce the experimental results of the main paper, follow these steps:

Custom Experiment

To write your own custom federated learning experiment, you can reuse the codebase/pipeline under image_classification. In that directory, add your custom dataset to data_utils.py, write your custom client selection algorithm in FedAvg.py, and specify custom hyperparameters in a new config file, custom.json. You can then run your experiment with $ python main.py -c configs/custom.json
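
As a starting point for FedAvg.py, the snippet below sketches a power-of-choice style selection rule in the spirit of the paper (sample a candidate set biased by data size, then keep the clients with the largest local loss); the function and variable names are our own and do not mirror the repository's implementation:

import numpy as np

def select_clients(client_losses, data_fractions, d, k, rng=None):
    # Sample a candidate set of d clients, weighted by their data fractions,
    # then keep the k candidates with the largest current local loss.
    rng = np.random.default_rng() if rng is None else rng
    candidates = rng.choice(len(client_losses), size=d, replace=False, p=data_fractions)
    ranked = sorted(candidates, key=lambda i: client_losses[i], reverse=True)
    return ranked[:k]

# Example: 100 clients with equal data shares, candidate set of 10, select 3 per round.
losses = np.random.rand(100)
fractions = np.full(100, 1 / 100)
print(select_clients(losses, fractions, d=10, k=3))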

Set DagsHub as MLflow server

To log runs to the DagsHub MLflow tracking server, export your DagsHub credentials as environment variables:

export MLFLOW_TRACKING_USERNAME=<token>

or

export MLFLOW_TRACKING_USERNAME=<username>
export MLFLOW_TRACKING_PASSWORD=<password/token>
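
With the credentials in place, MLflow still needs to be pointed at your DagsHub repository. A minimal sketch, assuming the usual DagsHub MLflow endpoint pattern (the URL placeholders are yours to fill in):

import mlflow

# Point MLflow at the repository's tracking server (placeholder URL).
mlflow.set_tracking_uri("https://dagshub.com/<user>/<repo>.mlflow")

with mlflow.start_run(run_name="example"):
    mlflow.log_param("lr", 0.01)           # hyperparameters
    mlflow.log_metric("train_loss", 0.42)  # metrics per round/epoch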

Save and load model using MLflow

Please refer to MLflow_guide.ipynb for detailed information.
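
As a quick orientation before opening the notebook, here is a minimal sketch of saving and loading a PyTorch model with MLflow (the toy model and artifact path are illustrative, not the notebook's exact code):

import mlflow
import mlflow.pytorch
import torch.nn as nn

model = nn.Linear(10, 2)  # stand-in model

# Log the model as an artifact of the active run.
with mlflow.start_run() as run:
    mlflow.pytorch.log_model(model, artifact_path="model")

# Later: reload the model from the run's artifact store.
loaded = mlflow.pytorch.load_model(f"runs:/{run.info.run_id}/model")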

TODO

  • investigate the difference in loss values for the image classification experiments (accuracy values match approximately)
  • remove default values in argparse, to be doubly sure that only the provided values are used
  • confirm correctness of the pipeline against another federated learning code/paper
  • distributed training setup using PyTorch

Contact

Peng Ju, Gautam Choudhary
