Setting up auto-labeling is a long and tedious process. Large parts of it can be automated if your model is connected via MLflow, and this project helps you set up such a system in a few minutes.
Here's what it does:
Users have two points of injection: a `pre_hook` and a `post_hook`. The `pre_hook` takes as input a local filepath to the downloaded datapoint for annotation; its output is forwarded to the MLflow model for prediction, whose output is in turn forwarded to the `post_hook` for conversion to the LS format. The `pre_hook` is optional and defaults to the identity function `lambda x: x`.
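To make the flow concrete, here is a minimal, self-contained sketch. Everything in it is illustrative rather than part of this project's API: the real orchestration (downloading the datapoint, calling the MLflow model) is handled by the backend, and your model's output shape will differ.

```python
def pre_hook(filepath: str) -> str:
    # Optional: transform the local filepath into whatever input your
    # MLflow model expects. Defaults to the identity function, lambda x: x.
    return filepath

def fake_mlflow_predict(model_input: str) -> dict:
    # Stand-in for the MLflow model's predict(); the output shape here
    # is made up for illustration.
    return {"label": "giraffe", "score": 0.84}

def post_hook(prediction: dict) -> dict:
    # Convert the raw model output into the LS predictions format
    # (see the structure described below).
    return {"model_version": "0.0.1", "score": prediction["score"], "result": []}

# The flow for a single annotation task is then equivalent to:
ls_prediction = post_hook(fake_mlflow_predict(pre_hook("/tmp/datapoint.jpg")))
```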
1. `git submodule update --init`
2. `docker build . -t configurable-ls-backend`
3. `docker run -p <port-of-choice>:9090 configurable-ls-backend`

    a. Orchestrator: `flask --app orchestrator run`
The backend is now ready. Now we move to the client.
Once this is working, you're ready to use any MLflow model as an LS backend. The last things left to supply are the hooks: one that processes filepaths into the desired model input, and one that takes the predictions from an MLflow model and converts them into the LabelStudio format. Refer to the following section for details on building a post hook.
```python
In [1]: from dagshub.data_engine import datasources

In [2]: from hooks.polygon_segmentation import post_hook

In [3]: ds = datasources.get_datasource('username/repo', 'datasource_name')

In [4]: ds.add_annotation_model('username/repo', 'model_name', post_hook)
```
For more information about additional options you can supply, refer to `help(ds.add_annotation_model)`.
The key task that remains is setting up a `post_hook`. This can be tricky, because failure is not always explicit. Refer to the following sections for tips on debugging to ease that process.
The key idea is that the model expects a list of predictions, one per annotation task (different image, different prediction). A prediction is a dictionary containing `result`, `score`, and `model_version` keys. The `result` key contains a list of results (e.g. multiple instances of an object in a single image), each of which contains an `id` that must be generated randomly, information about the target, the type of the prediction, and the value of the prediction itself. While the values passed vary between tasks, the overall key structure is retained, and following it is crucial to having everything render correctly.
An example of a predictions JSON is as follows (points trimmed for convenience):
"predictions": [
{
"id": 30,
"model_version": "0.0.1",
"created_ago": "23 hours, 41 minutes",
"result": [
{
"id": "f346",
"type": "polygonlabels",
"value": {
"score": 0.8430982828140259,
"closed": true,
"points": [ ... ],
"polygonlabels": [
"giraffe"
]
},
"to_name": "image",
"readonly": false,
"from_name": "label",
"image_rotation": 0,
"original_width": 426,
"original_height": 640
}
],
"score": 0.8430982828140259,
"cluster": null,
"neighbors": null,
"mislabeling": 0,
"created_at": "2024-07-16T12:56:49.517014Z",
"updated_at": "2024-07-16T12:56:49.517042Z",
"task": 7,
"project": 3
}
]
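To show how such a structure might be produced, here is a hedged sketch of a `post_hook` for a polygon-segmentation model. The assumed raw prediction format (an `instances` list with `points`, `label`, and `score` keys) is hypothetical and not this repo's actual hook; only the LS key structure shown above is load-bearing.

```python
import uuid

def post_hook(prediction: dict) -> dict:
    # Assumes (hypothetically) the model returns
    # {"instances": [{"points": [...], "label": str, "score": float}]}.
    results = []
    for instance in prediction["instances"]:
        results.append({
            "id": uuid.uuid4().hex[:4],   # randomly generated id
            "type": "polygonlabels",      # type of the prediction
            "value": {
                "score": instance["score"],
                "closed": True,
                "points": instance["points"],
                "polygonlabels": [instance["label"]],
            },
            "to_name": "image",           # must match your labeling config
            "from_name": "label",
            "image_rotation": 0,
            "original_width": 426,        # real pixel size of the source image
            "original_height": 640,
        })
    return {
        "model_version": "0.0.1",
        "score": max(r["value"]["score"] for r in results) if results else 0.0,
        "result": results,
    }
```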
Some tips for debugging:

1. Run `label-studio-ml start .` locally to avoid having to rebuild your docker container after every change.

    a. If you opt for this, use a separate virtual environment for label-studio.
2. Use `IPython.embed()` strategically within `predict_tasks` from `label_studio.ml.models` to identify if there's a discrepancy between what you expect and what you see. For this to work within `model.py`, change `tasks` from L30 to a list containing a path that you know contains valid targets.
3. Run `print(inspect.getsource(self.post_hook))` in `model.py` to verify the hook source the backend actually received.
4. Use the `</>` button in LabelStudio's task view to reveal the source JSON, which you can use as a reference to build a functional prediction.
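Combining tips 2 and 3, an illustrative (not verbatim) view of what those temporary edits to `model.py` might look like; the method body shown here is a placeholder, not the file's actual contents:

```python
import inspect
import IPython

def predict_tasks(self, tasks, **kwargs):
    # Tip 2: pin the input to a datapoint you know is valid while debugging.
    tasks = ["/path/to/known-good/datapoint.jpg"]  # hypothetical path

    # Tip 3: confirm the backend received the post_hook you expected.
    print(inspect.getsource(self.post_hook))

    # Tip 2: pause here to poke at intermediate values interactively.
    IPython.embed()

    ...  # the rest of the original predict_tasks body
```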