Supervisely Tutorial #11

Use case description:

We have to label cats and dogs on the same images, which arrive every day. This tutorial illustrates SDK/API methods useful for organizing a custom data pipeline:

  1. Upload an incoming batch of images to a new dataset “batch X: {date-time}” in the “INBOX” project

  2. Copy the created dataset to two labeling projects: “CATS” and “DOGS”. Project “CATS” will be used to create labeling jobs to annotate cats on the images. Project “DOGS” will be used to create labeling jobs to annotate dogs on the images.

  3. Once the corresponding labeling jobs are finished, the cat and dog annotations for the same images will be merged, and the merged data will be copied to dataset “batch X: {date-time}” in project “FINAL”.

This example shows how to organize and automate a data/labeling pipeline. The chosen naming convention for projects and datasets helps to keep the process simple and straightforward.
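The dataset naming convention used throughout this pipeline can be sketched with a small helper (a hypothetical function, mirroring the `"batch_{:03d}: {date}"` format used later in this tutorial):

```python
from datetime import datetime

def make_batch_name(existing_count, now=None):
    # Build a dataset name like 'batch_001: 13-02-2020': a zero-padded
    # sequential index followed by the current date.
    now = now or datetime.now()
    return "batch_{:03d}: {}".format(existing_count + 1, now.strftime("%d-%m-%Y"))

print(make_batch_name(0, datetime(2020, 2, 13)))  # batch_001: 13-02-2020
```

Because the same name is reused when the dataset is copied to “CATS”, “DOGS”, and “FINAL”, every batch can be traced across all four projects.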

Imports

[1]:
import os
import supervisely_lib as sly
from datetime import datetime
import pprint
[2]:
import numpy as np
import requests

Initialize API access with your credentials

[3]:
address = os.environ['SERVER_ADDRESS']
token = os.environ['API_TOKEN']

print("Server address: ", address)
#print("Your API token: ", token)
Server address:  http://192.168.1.69:5777/

Initialize the API access object

[4]:
api = sly.Api(address, token)

Script parameters

[5]:
team_name = "max"
workspace_name = "Pipeline: Cats + Dogs"
project_name_inbox = "inbox"
project_name_cats = "cats"
project_name_dogs = "dogs"
project_name_final = "final"

# new batch images
images_urls = [
    "https://m.media-amazon.com/images/M/MV5BMTkxNTIyNzMxMV5BMl5BanBnXkFtZTcwMTQ3OTczNQ@@._V1_SY1000_CR0,0,1487,1000_AL_.jpg",
    "https://m.media-amazon.com/images/M/MV5BMTkyMTQ1Njk2Nl5BMl5BanBnXkFtZTcwOTQ3OTczNQ@@._V1_SY1000_CR0,0,672,1000_AL_.jpg",
    "https://m.media-amazon.com/images/M/MV5BNDc4MjUzNTYwOF5BMl5BanBnXkFtZTcwNjM2NDkxNA@@._V1_.jpg",
    "https://m.media-amazon.com/images/M/MV5BMTAyMDM5OTIzNDdeQTJeQWpwZ15BbWU3MDY0Nzk3MzU@._V1_SY1000_CR0,0,1487,1000_AL_.jpg",
    "https://m.media-amazon.com/images/M/MV5BMTYxNzUxODEyMV5BMl5BanBnXkFtZTcwMzQ3OTczNQ@@._V1_SY1000_CR0,0,1487,1000_AL_.jpg",

]

Verify or initialize parameters

[6]:
team = api.team.get_info_by_name(team_name)
if team is None:
    raise RuntimeError("Team {!r} not found".format(team_name))

workspace = api.workspace.get_info_by_name(team.id, workspace_name)
if workspace is None:
    workspace = api.workspace.create(team.id, workspace_name)

print("Team: id={}, name={!r}".format(team.id, team.name))
print("Workspace: id={}, name={!r}".format(workspace.id, workspace.name))
Team: id=6, name='max'
Workspace: id=13, name='Pipeline: Cats + Dogs'
[7]:
def get_or_create_project(api, workspace_id, project_name):
    project = api.project.get_info_by_name(workspace_id, project_name)
    if project is None:
        project = api.project.create(workspace_id, project_name, change_name_if_conflict=True)
    return project
[8]:
project_inbox = get_or_create_project(api, workspace.id, project_name_inbox)
project_cats = get_or_create_project(api, workspace.id, project_name_cats)
project_dogs = get_or_create_project(api, workspace.id, project_name_dogs)
project_final = get_or_create_project(api, workspace.id, project_name_final)

print("id={}, name={!r}".format(project_inbox.id, project_inbox.name))
print("id={}, name={!r}".format(project_cats.id, project_cats.name))
print("id={}, name={!r}".format(project_dogs.id, project_dogs.name))
print("id={}, name={!r}".format(project_final.id, project_final.name))
id=30, name='inbox'
id=31, name='cats'
id=32, name='dogs'
id=33, name='final'

Populate projects “cats” and “dogs” with classes to label

[9]:
project_meta_cats = sly.ProjectMeta.from_json(api.project.get_meta(project_cats.id))
project_meta_cats = project_meta_cats.add_obj_class(sly.ObjClass("cat", sly.Rectangle))
print(project_meta_cats)
api.project.update_meta(project_cats.id, project_meta_cats.to_json())
ProjectMeta:
Object Classes
+------+-----------+---------------+
| Name |   Shape   |     Color     |
+------+-----------+---------------+
| cat  | Rectangle | [15, 138, 60] |
+------+-----------+---------------+
Tags
+------+------------+-----------------+
| Name | Value type | Possible values |
+------+------------+-----------------+
+------+------------+-----------------+

[10]:
project_meta_dogs = sly.ProjectMeta.from_json(api.project.get_meta(project_dogs.id))
project_meta_dogs = project_meta_dogs.add_obj_class(sly.ObjClass("dog", sly.Polygon))
print(project_meta_dogs)
api.project.update_meta(project_dogs.id, project_meta_dogs.to_json())
ProjectMeta:
Object Classes
+------+---------+---------------+
| Name |  Shape  |     Color     |
+------+---------+---------------+
| dog  | Polygon | [138, 91, 15] |
+------+---------+---------------+
Tags
+------+------------+-----------------+
| Name | Value type | Possible values |
+------+------------+-----------------+
+------+------------+-----------------+

[11]:
project_meta_final = project_meta_cats.merge(project_meta_dogs)
api.project.update_meta(project_final.id, project_meta_final.to_json())
print(project_meta_final)
ProjectMeta:
Object Classes
+------+-----------+---------------+
| Name |   Shape   |     Color     |
+------+-----------+---------------+
| dog  |  Polygon  | [138, 91, 15] |
| cat  | Rectangle | [15, 138, 60] |
+------+-----------+---------------+
Tags
+------+------------+-----------------+
| Name | Value type | Possible values |
+------+------------+-----------------+
+------+------------+-----------------+

Create new dataset for new batch of data

[12]:
datasets = api.dataset.get_list(project_inbox.id)
new_dataset_name = "batch_{:03d}: {}".format(len(datasets) + 1, datetime.now().strftime("%d-%m-%Y"))

new_dataset = api.dataset.create(project_inbox.id, new_dataset_name)
print("New dataset is created: id={}, name={!r}".format(new_dataset.id, new_dataset.name))
New dataset is created: id=33, name='batch_001: 13-02-2020'

Upload new incoming data

[13]:
def download_image_by_url(image_url):
    response = requests.get(image_url)
    # Wrap the raw encoded image bytes.
    # Decode the JPEG data. Make sure to use our decoding wrapper to
    # guarantee the right number and order of color channels.
    img = sly.image.read_bytes(response.content)
    return img

In this example we decided to demonstrate how to upload images in numpy format. If you are interested in other upload methods, please check the following guides in the Explore section: “Upload local images to Supervisely Instance”, “Upload project”, “Create project using images urls” and others.

Please notice that all upload methods in the SDK check whether an image already exists in Supervisely storage. Before uploading, the SDK methods check for existence and transfer data only for genuinely new images; already-known images are referenced by their unique hashes. This allows the same image to appear in multiple datasets and projects while being physically stored only once, which optimizes both storage and network usage.
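The deduplication mechanism described above can be sketched in plain Python (a conceptual model only, not the actual SDK or storage implementation):

```python
import hashlib

def content_hash(data: bytes) -> str:
    # Hash the raw image bytes; identical content yields an identical key.
    return hashlib.sha256(data).hexdigest()

storage = {}     # hash -> bytes: stands in for physical storage
references = []  # (dataset, name, hash): many references, one physical copy

def upload(dataset, name, data):
    h = content_hash(data)
    if h not in storage:   # transfer bytes only for genuinely new content
        storage[h] = data
    references.append((dataset, name, h))

img = b"\xff\xd8...fake jpeg bytes"
upload("cats/batch_001", "image_00000.jpg", img)
upload("dogs/batch_001", "image_00000.jpg", img)  # same image, another dataset

print(len(storage), len(references))  # 1 physical copy, 2 references
```

The second `upload` call only records a reference, so the same image can live in many datasets without duplicating the stored bytes.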

[14]:
for idx, image_url in enumerate(images_urls):
    name = "image_{:05d}.jpg".format(idx)
    img = download_image_by_url(image_url)
    img_info = api.image.upload_np(new_dataset.id, name, img)
    print("Image has been successfully added: id = {}, name = {!r}".format(img_info.id, img_info.name))
Image has been successfully added: id = 1568, name = 'image_00000.jpg'
Image has been successfully added: id = 1569, name = 'image_00001.jpg'
Image has been successfully added: id = 1570, name = 'image_00002.jpg'
Image has been successfully added: id = 1571, name = 'image_00003.jpg'

Copy uploaded data to labeling projects

[15]:
def copy_dataset(dst_project, src_dataset):
    copied_dataset = api.dataset.copy(dst_project.id,
                                      src_dataset.id,
                                      src_dataset.name,
                                      with_annotations=False,
                                      change_name_if_conflict=False)

    print("Dataset has been successfully copied to project {!r}: id = {}, name = {!r}".format(dst_project.name,
                                                                                              copied_dataset.id,
                                                                                              copied_dataset.name))
    return copied_dataset
[16]:
new_dataset_cats = copy_dataset(project_cats, new_dataset)
Dataset has been successfully copied to project 'cats': id = 34, name = 'batch_001: 13-02-2020'
[17]:
new_dataset_dogs = copy_dataset(project_dogs, new_dataset)
Dataset has been successfully copied to project 'dogs': id = 35, name = 'batch_001: 13-02-2020'

Create labeling jobs

[18]:
# Change this for your case. Take into account that users have to be invited to the team beforehand.
members = api.user.get_team_members(team.id)
#print(members)
labeler01 = members[0]
labeler02 = members[1]
#########################

print("Labeler #1: id={} login={!r}".format(labeler01.id, labeler01.login))
print("Labeler #2: id={} login={!r}".format(labeler02.id, labeler02.login))
Labeler #1: id=6 login='max'
Labeler #2: id=7 login='john'
[28]:
def create_labeling_job_for_new_dataset(api, team, project, dataset_to_label, labeler_id):
    project_related_jobs = api.labeling_job.get_list(team.id, project_id=project.id)
    cnt_jobs = len(project_related_jobs)

    # api.labeling_job.create returns a list, with a single item in most cases.
    # The number of created labeling jobs depends on the settings: for example,
    # if we pass several users, the dataset images will be split evenly between
    # them and several labeling jobs will be created.
    # WARNING: please pass readme and description even if you don't want to
    # define them explicitly. The public API has a bug that will be fixed in the
    # next release.
    lj_01 = api.labeling_job.create(name="{}_{:03d} [{}]".format(project.name, cnt_jobs + 1, dataset_to_label.name),
                                    dataset_id=dataset_to_label.id,
                                    user_ids=[labeler_id],
                                    readme="",
                                    description="")

    lj_01 = lj_01[0]
    print("Labeling job has been created: id={} name={!r}".format(lj_01.id, lj_01.name))
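The splitting behavior mentioned in the comments above, where several assigned users each receive an even share of the dataset, can be illustrated with a simple round-robin sketch (a hypothetical helper, not an SDK function):

```python
def split_among_labelers(image_ids, labeler_logins):
    # Distribute images round-robin so each labeler gets an (almost) equal
    # share -- roughly how one labeling job per user would be populated.
    jobs = {login: [] for login in labeler_logins}
    for i, image_id in enumerate(image_ids):
        jobs[labeler_logins[i % len(labeler_logins)]].append(image_id)
    return jobs

jobs = split_among_labelers([1568, 1569, 1570, 1571], ["max", "john"])
print(jobs)  # {'max': [1568, 1570], 'john': [1569, 1571]}
```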
[29]:
create_labeling_job_for_new_dataset(api, team, project_cats, new_dataset_cats, labeler01.id)
Labeling job has been created: id=9 name='cats_005 [batch_001: 13-02-2020]'
[30]:
create_labeling_job_for_new_dataset(api, team, project_dogs, new_dataset_dogs, labeler02.id)
Labeling job has been created: id=10 name='dogs_002 [batch_001: 13-02-2020]'

Combine labeled cats and dogs and copy the dataset to the final project

[34]:
# new_dataset_cats and new_dataset_dogs are already labeled
[36]:
project_meta_final = sly.ProjectMeta.from_json(api.project.get_meta(project_final.id))
print(project_meta_final)
ProjectMeta:
Object Classes
+------+-----------+---------------+
| Name |   Shape   |     Color     |
+------+-----------+---------------+
| dog  |  Polygon  | [138, 91, 15] |
| cat  | Rectangle | [15, 138, 60] |
+------+-----------+---------------+
Tags
+------+------------+-----------------+
| Name | Value type | Possible values |
+------+------------+-----------------+
+------+------------+-----------------+

[40]:
new_dataset_final = copy_dataset(project_final, new_dataset)
Dataset has been successfully copied to project 'final': id = 36, name = 'batch_001: 13-02-2020'
[37]:
# image names in both datasets (new_dataset_cats and new_dataset_dogs) are the same because both are copies of a single dataset in the "inbox" project
[44]:
imgs_cats = api.image.get_list(new_dataset_cats.id)
imgs_dogs = api.image.get_list(new_dataset_dogs.id)

name_to_id_cats = {info.name : info.id for info in imgs_cats}
name_to_id_dogs = {info.name : info.id for info in imgs_dogs}

names = list(name_to_id_cats.keys())
for name in names:
    id_cat = name_to_id_cats[name]
    id_dog = name_to_id_dogs[name]

    ann_cat = sly.Annotation.from_json(api.annotation.download(image_id=id_cat).annotation, project_meta_final)
    ann_dog = sly.Annotation.from_json(api.annotation.download(image_id=id_dog).annotation, project_meta_final)

    ann_merged = ann_cat.add_labels(ann_dog.labels)

    # if you need to merge image tags as well, uncomment this line
    #ann_merged = ann_merged.add_tags(ann_dog.img_tags)

    img_final = api.image.get_info_by_name(new_dataset_final.id, name)
    api.annotation.upload_ann(img_final.id, ann_merged)
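Conceptually, the merge performed in the loop above boils down to concatenating the label lists of two annotations of the same image. A simplified sketch using plain dicts that mimic the layout of Supervisely annotation JSON (not the real SDK objects):

```python
# Simplified annotation dicts: labels live in the "objects" list.
ann_cat = {"size": {"height": 600, "width": 800},
           "objects": [{"classTitle": "cat", "geometryType": "rectangle"}]}
ann_dog = {"size": {"height": 600, "width": 800},
           "objects": [{"classTitle": "dog", "geometryType": "polygon"}]}

def merge_annotations(a, b):
    # Both annotations must describe the same image, so sizes must match;
    # merging then amounts to concatenating the label lists.
    assert a["size"] == b["size"], "annotations must describe the same image"
    return {"size": a["size"], "objects": a["objects"] + b["objects"]}

ann_merged = merge_annotations(ann_cat, ann_dog)
print([obj["classTitle"] for obj in ann_merged["objects"]])  # ['cat', 'dog']
```

The SDK's `ann_cat.add_labels(ann_dog.labels)` used above does the equivalent on real `Annotation` objects, returning a new annotation rather than mutating in place.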

Done!