Inference modes

The most common approach to using neural networks for computer vision is to capture an image and feed the whole image to the network for inference. Sometimes, however, more complex logic is necessary, depending on the application. Some examples from our experience:

* Crop off the boundaries of the image and run inference only on the inner part.
* Go over the input image with a sliding window, run inference on each fragment, and then aggregate the per-fragment results into an overall result.
* In a pipeline setting, run inference only on specially selected regions. A typical example is first running a detection model on the whole image to get rough locations of the objects, and then running a segmentation model to find fine-grained contours only within the locations found by the detector.

The neural networks available out of the box in Supervisely all support the above inference modes with no code modifications; only the inference config for a given inference task needs to be changed. Moreover, since this advanced logic is independent of the specific neural network being used, we have factored it out as a part of the supervisely_lib SDK. It is also trivial to integrate the available inference modes with your custom models, as long as you rely on our SingleImageInferenceBase base class for inference. See our guide on how to integrate your custom neural network with the Supervisely platform.

In this tutorial we will walk through examples of the available inference modes for neural networks in Supervisely.

One important thing to keep in mind is that all the inference modes produce labels in the context of the full original image. So, for example, if the original image is cropped before being passed to the neural network, the inference results from the cropped fragment are translated back into the coordinate frame of the original image.
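For illustration, here is a minimal sketch of that coordinate translation for a rectangle. The helper name and the (top, left, bottom, right) box format are our own, not part of supervisely_lib:

def to_original_frame(rect, crop_top, crop_left):
    # rect is (top, left, bottom, right) in crop coordinates.
    top, left, bottom, right = rect
    return (top + crop_top, left + crop_left,
            bottom + crop_top, right + crop_left)

# A box found at (10, 20, 50, 80) inside a crop whose top-left corner
# is at row 100, column 200 of the original image:
print(to_original_frame((10, 20, 50, 80), 100, 200))
# -> (110, 220, 150, 280)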

Inference config structure

With all the Supervisely built-in neural networks, the settings of the inference mode and the settings of the specific network itself are kept in separate sections of the overall inference task config, as follows:

{
  "model": {
    # Config of the specific neural network.
    # Detection thresholds, other runtime options.
  },
  "mode": {
    # Inference mode config. Options used here are general
    # and do not depend on the specific neural network being used.
  }
}

Full image inference

The most basic inference mode is to pass the whole incoming image to the neural network for inference. Example config:

{
  "model": {
    "gpu_device": 0  # Use GPU #0 for this model.
  },
  "mode": {

    # Inference mode name, mandatory field.
    # "full_image" is the most basic mode, where all the image is
    # passed on to the neural net.
    "name": "full_image",

    # Mode-specific settings. For example, here we describe how to
    # post-process the labels returned by the model depending on
    # the object class.

    "model_classes": {
      # This section is actually common to all of the inference modes.

      # Add a suffix to all of the class names returned by the model.
      # This helps distinguish model labels from original labels when
      # using a labeled dataset as input.
      "add_suffix": "_unet",

      # Save all the labels produced by the models. Alternatively,
      # here one can specify a whitelist of class names to be saved,
      # e.g. ["car", "person"].
      "save_classes": "__all__"
    }
  }
}
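To make the behavior of the model_classes section concrete, here is a small sketch of the renaming and filtering it describes. The function and the label representation are illustrative only, and we assume the whitelist is matched against the original class names, before the suffix is added:

def postprocess_classes(labels, add_suffix='_unet', save_classes='__all__'):
    # labels: list of (class_name, geometry) pairs -- an illustrative
    # format, not the actual SDK label structure.
    result = []
    for class_name, geometry in labels:
        # Keep only whitelisted classes (assumed to be matched against
        # the original model class names, before the suffix is added).
        if save_classes != '__all__' and class_name not in save_classes:
            continue
        result.append((class_name + add_suffix, geometry))
    return result

labels = [('car', 'poly_1'), ('person', 'poly_2'), ('tree', 'poly_3')]
print(postprocess_classes(labels, save_classes=['car', 'person']))
# -> [('car_unet', 'poly_1'), ('person_unet', 'poly_2')]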

ROI (fixed crop) inference

Sometimes we know that only a certain part of the input images is relevant for inference. For example, if a camera is fixed on the windshield of a car, we know that the top part of the image is the sky, where we are not going to have pedestrians, so it makes sense to only run the pedestrian detector on the bottom part of the image. In such cases we want to crop off a fixed boundary of the input image before passing it on to the neural network. This logic is handled by the roi inference mode.

{
  "model": {},
  "mode": {

    # Crop the image boundaries before inference.
    "name": "roi",

    # Class renaming and filtering settings.
    # See "Full image inference" example for details.
    "model_classes": {
    },

    # Cropping settings.
    # How much to crop from every side of the image.
    "bounds": {
      # The amount can be specified in pixels or
      # as a percentage of the corresponding dimension.
      "left": "100px",
      "right": "50px",
      "top": "20%",
      "bottom": "20%"
    },

    # Whether to add a bounding box of the cropped region
    # of interest as a separate label in the results.
    "save": false,

    # If saving the cropped region bounding box, which
    # class name to use.
    "class_name": 'inference_roi'
  }
}
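The bounds values can mix pixel and percentage amounts. Below is a sketch of how they might be resolved into a crop rectangle; the helper names are hypothetical, not the SDK implementation:

def to_pixels(value, dimension):
    # Convert a bound like '100px' or '20%' to a pixel amount.
    if value.endswith('px'):
        return int(value[:-2])
    if value.endswith('%'):
        return int(dimension * float(value[:-1]) / 100)
    raise ValueError('Expected a value like "100px" or "20%".')

def crop_bounds(image_height, image_width, bounds):
    # Return the (top, left, bottom, right) region that remains
    # after cropping the configured bounds off every side.
    top = to_pixels(bounds['top'], image_height)
    bottom = image_height - to_pixels(bounds['bottom'], image_height)
    left = to_pixels(bounds['left'], image_width)
    right = image_width - to_pixels(bounds['right'], image_width)
    return top, left, bottom, right

bounds = {'left': '100px', 'right': '50px', 'top': '20%', 'bottom': '20%'}
print(crop_bounds(1000, 1600, bounds))  # -> (200, 100, 800, 1550)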

Sliding window inference

Segmentation

In the segmentation sliding window mode, the per-pixel class scores are summed over all the sliding windows that cover a given pixel, and then the class with the maximum total score is taken as that pixel's label.
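Schematically, the accumulation looks like the NumPy sketch below. This is illustrative only, not SDK code; we assume the model returns a (height, width, num_classes) score map for every window:

import numpy as np

def aggregate_scores(image_shape, num_classes, windows, run_model):
    # image_shape: (height, width) of the original image.
    # windows: iterable of (top, left, bottom, right) rectangles.
    # run_model: callable returning a per-class score map for a window.
    total = np.zeros(image_shape + (num_classes,), dtype=np.float64)
    for top, left, bottom, right in windows:
        total[top:bottom, left:right, :] += run_model(
            (top, left, bottom, right))
    # Per-pixel label: the class with the highest summed score.
    return np.argmax(total, axis=2)

Example config: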

{
  "model": {},
  "mode": {

    # Go over the original image with a sliding window,
    # sum up per class segmentation scores and find the
    # class with the highest score for every pixel.
    "name": "sliding_window",

    # Class renaming and filtering settings.
    # See "Full image inference" example for details.
    "model_classes": {
    },

    # Sliding window parameters.

    # Width and height in pixels.
    # Cannot be larger than the original image.
    "window": {
      "width": 128,
      "height": 128,
    },

    # Minimum overlap for each dimension. The last
    # window in every dimension may have higher overlap
    # with the previous one if necessary to fit the whole
    # window within the original image.
    "min_overlap": {
      "x": 0,
      "y": 0,
    },

    # Whether to save each sliding window instance as a
    # bounding box rectangle.
    "save": false,

    # If saving the sliding window bounding boxes, which
    # class name to use.
    "class_name": 'sliding_window_bbox',
  }
}
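The min_overlap rule above can be illustrated with a short sketch of window placement along one dimension. The helper is hypothetical; the SDK implementation may differ in details:

def window_starts(total, window, min_overlap):
    # Start offsets along one dimension such that consecutive windows
    # overlap by at least min_overlap and the last window still fits
    # entirely within the image (total >= window is assumed).
    stride = window - min_overlap
    starts = list(range(0, total - window + 1, stride))
    if starts[-1] + window < total:
        # Add a final window flush with the border; it overlaps its
        # predecessor by more than min_overlap.
        starts.append(total - window)
    return starts

print(window_starts(300, 128, 0))  # -> [0, 128, 172]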

Detection

With sliding windows for detection, instead of summing up per-pixel class scores, we accumulate the detection results (as rectangular bounding boxes). Optionally, non-maximum suppression is done for the final results. All of the options of the segmentation sliding window config apply, and an extra section with non-maximum suppression config is added:

{
  "model": {},
  "mode": {

    "name": "sliding_window_det",

    # All the sliding window options from the segmentation
    # sliding window config also apply here.

    "nms_after": {

      # Whether to run non-maximum suppression after accumulating
      # all the detection results from the sliding windows.
      "enable": true,

      # Intersection over union threshold above which same-class
      # detection labels are considered to significantly intersect
      # for the purposes of non-maximum suppression.
      "iou_threshold": 0.2,

      # Tag name from which to read detection confidence by which we
      # rank the detections. This tag must be added by the model to
      # every detection label.
      "confidence_tag_name": "confidence"
    }
  }
}
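For reference, here is a simplified sketch of the greedy same-class non-maximum suppression described above. This is not the SDK code; the (top, left, bottom, right) box format is assumed, and confidence is the value read from the tag named in confidence_tag_name:

def iou(a, b):
    # Intersection over union of two (top, left, bottom, right) boxes.
    top, left = max(a[0], b[0]), max(a[1], b[1])
    bottom, right = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, bottom - top) * max(0, right - left)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / float(area(a) + area(b) - inter)

def nms(detections, iou_threshold=0.2):
    # detections: list of (box, confidence) pairs for a single class.
    kept = []
    for box, confidence in sorted(detections, key=lambda d: -d[1]):
        # Keep a detection only if it does not significantly intersect
        # any higher-confidence detection we have already kept.
        if all(iou(box, k) <= iou_threshold for k, _ in kept):
            kept.append((box, confidence))
    return kept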

Pipelining - bounding boxes ROI inference

In this mode, neural network inference is invoked on sub-images defined by the bounding boxes of existing labels. Those labels may come either from a previous run of another neural network (which allows us to pipeline models) or from manual annotations. Example config:

{
  "model": {},
  "mode": {

    # Run inference within bounding boxes of existing labels.
    "name": "bboxes",

    # Class renaming and filtering settings.
    # See "Full image inference" example for details.
    "model_classes": {
    },

    # Filter the source bounding boxes by their object class name.
    # Use "__all__" to pass through all the classes or a whitelist
    # of classes like ["car", "person"].
    "from_classes": "__all__",

    # Padding settings for the source bounding boxes.
    "padding": {
      # Padding can be set in pixels or as a percentage of the respective side.
      "left": "20px",
      # Negative numbers mean crop instead of pad.
      "right": "-30px",
      "top": "20%",
      "bottom": "10%"
    },

    # Whether to save the bounding boxes that were used for selecting subimages
    # for inference.
    "save": true,

    # Because the input bounding boxes may have originated from multiple
    # object classes, we do not want to assign the same class name to them all.
    # Instead, append a given suffix to the original class name to get the
    # saved bounding box class name.
    "add_suffix": "_input_bbox"
  }
}
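A sketch of how the padding might be applied to one source bounding box. The helper is hypothetical, and we assume percentages are taken relative to the box's own width and height:

def pad_box(box, padding):
    # box: (top, left, bottom, right); padding values look like '20px',
    # '20%' or a negative amount such as '-30px' (crop inward).
    top, left, bottom, right = box
    height, width = bottom - top, right - left
    def amount(value, dim):
        if value.endswith('px'):
            return int(value[:-2])
        return int(dim * float(value[:-1]) / 100)
    return (top - amount(padding['top'], height),
            left - amount(padding['left'], width),
            bottom + amount(padding['bottom'], height),
            right + amount(padding['right'], width))

padding = {'left': '20px', 'right': '-30px', 'top': '20%', 'bottom': '10%'}
print(pad_box((100, 100, 300, 500), padding))
# -> (60, 80, 320, 470)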