The reference project has been developed in Python, but the same concepts should be applicable to ML projects in other languages.
It is good practice to define jobs so that they run inside a Docker container.
This gives us an easy, maintainable, reproducible and standard setup for jobs. Debugging environment-specific issues also becomes easier, as we can reproduce the job's execution environment locally.
Jenkins enables us to define agents as Docker containers, which can be brought up from a Docker image or from a customised image defined in a Dockerfile. More on this can be found in the Using Docker with Pipeline section of their Pipeline documentation.
We define agent to be a container brought up from the Dockerfile shown below. The job workspace is mounted at the /project path inside the container, and /extras is mounted as a volume to cache any files between multiple job runs.
Agent Definition:
agent {
    dockerfile {
        args "-v ${env.WORKSPACE}:/project -w /project -v /extras:/extras -e PYTHONPATH=/project"
    }
}
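To debug a job locally, the agent container can be approximated with plain docker commands. The following is a minimal sketch, not what Jenkins runs internally: the image tag ml-job-agent and the helper names build_agent and run_agent are hypothetical, and the mounts simply mirror the args of the agent definition, with the current directory standing in for the Jenkins workspace.

```shell
#!/bin/sh
# Hypothetical image tag, chosen for this example only.
IMAGE="${IMAGE:-ml-job-agent}"
# Set DOCKER=echo to print the docker commands instead of running them.
DOCKER="${DOCKER:-docker}"

# Build the agent image from the Dockerfile in the current directory.
build_agent() {
    "$DOCKER" build -t "$IMAGE" .
}

# Run a command inside the agent image with the same mounts, working
# directory and PYTHONPATH that the agent definition passes via args.
run_agent() {
    "$DOCKER" run --rm \
        -v "$PWD":/project -w /project \
        -v /extras:/extras \
        -e PYTHONPATH=/project \
        "$IMAGE" "$@"
}
```

From the project root, something like build_agent && run_agent pytest -vvrxXs would then run the unit tests under roughly the same conditions as the Jenkins job.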
Agent Dockerfile:
Here we define the base image and install the required software and library dependencies.
# Base image for our job
FROM python:3.8
RUN pip install --upgrade pip && \
    pip install -U setuptools==49.6.0
RUN apt-get update && \
    apt-get install unzip groff -y
# Install aws-cli to use S3 as remote storage
RUN curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip" && \
    unzip awscliv2.zip && \
    ./aws/install
# Install project dependencies
COPY requirements.txt ./
RUN pip install -r requirements.txt
Agent .dockerignore:
# Ignore everything
*
# except the requirements.txt file
!requirements.txt
Now that we have defined our agent, we can define the stages in our pipeline.
Here are a few stages that we define in our Jenkins pipeline:
We have defined our test cases in the test folder and use pytest to run them.
stage('Run Unit Test') {
    steps {
        sh 'pytest -vvrxXs'
    }
}
For linting checks we use flake8 and black, as is standard practice.
stage('Run Linting') {
    steps {
        sh '''
            flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
            flake8 . --count --max-complexity=10 --max-line-length=127 --statistics
            black . --check --diff
        '''
    }
}
Once you have set up credentials in Jenkins, we can use them in a stage as follows. With dvc status -r origin we test our connection to the remote. DVC remote information is defined in the .dvc/config file.
stage('Setup DVC Creds') {
    steps {
        withCredentials(
            [
                usernamePassword(
                    credentialsId: 'PASSWORD',
                    passwordVariable: 'PASSWORD',
                    usernameVariable: 'USER_NAME'),
            ]
        ) {
            // Variables are quoted so that special characters in the credentials survive the shell
            sh '''
                dvc remote modify origin --local auth basic
                dvc remote modify origin --local user "$USER_NAME"
                dvc remote modify origin --local password "$PASSWORD"
                dvc status -r origin
            '''
        }
    }
}
Before running any further DVC stages, we need to fetch the data and models versioned by DVC. This can be done with the dvc pull command. But fetching files from S3 or similar remote storage increases our network load, build latency and service usage cost.
To optimise this, we can cache already fetched files, say from previous builds, and fetch only the diff required for the current build.
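We will use the mounted volume /extras for this and refer to it by the DVC remote jenkins_local. The stage below syncs the remotes in three steps, matching the numbered comments in the script:
1. Pull the already cached files from jenkins_local.
2. Pull only the remaining diff from origin.
3. Push the diff to jenkins_local, so that both remotes are in sync.
stage('Sync DVC Remotes') {
    steps {
        sh '''
            dvc status
            dvc status -r jenkins_local
            dvc status -r origin
            dvc pull -r jenkins_local || echo 'Some files are missing in local cache!' # 1
            dvc pull -r origin # 2
            dvc push -r jenkins_local # 3
        '''
    }
}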
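The ordering of the pull and push commands in this stage matters, so it can help to see it isolated as a small shell function. This is only a sketch under stated assumptions: the function name sync_dvc_remotes and the DVC override variable are hypothetical, and both remotes (origin and jenkins_local) are assumed to be already configured for the project.

```shell
#!/bin/sh
# Set DVC to a stub to dry-run the sequence without touching any remote.
DVC="${DVC:-dvc}"

sync_dvc_remotes() {
    # 1. Best effort: the local cache may be empty or stale on a fresh
    #    agent, so a failed pull here must not fail the build.
    "$DVC" pull -r jenkins_local || echo 'Some files are missing in local cache!'

    # 2. Required: fetch whatever is still missing from the real remote.
    #    A failure here aborts the sync.
    "$DVC" pull -r origin || return 1

    # 3. Push the newly fetched files into the local cache, so that the
    #    next build can get them from jenkins_local.
    "$DVC" push -r jenkins_local
}
```

Only step 1 is allowed to fail. If step 2 fails, the workspace is incomplete and later DVC stages would run on missing data, so the function returns a non-zero status instead of continuing.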