Building a Custom GStreamer Plugin for NVIDIA DeepStream in Python

A production-ready DeepStream pipeline handles multi-stream video analytics — hardware-accelerated decoding, tracking, on-screen display, and message brokering — all wired through GStreamer. For standard detection models exported to TensorRT, nvinfer handles everything.

However, the common case has limits. Vision-language models, custom post-processing, rotated bounding boxes, or the need to hot-swap models at runtime are places where nvinfer’s assumptions break down. Sometimes you have a mature PyTorch inference stack your team has carefully tuned, and you want DeepStream to call that rather than reimplementing it in a config file.

For YOLO-family models specifically, DeepStream-Yolo by Marcos Luciano already implements custom post-processing in C++, and does it well. If C++ is on the table, start there. This article takes a different angle: achieving the same result entirely in Python, using a custom GStreamer plugin with pyservicemaker, without sacrificing throughput.

The key insight that makes this possible: downstream elements like nvtracker, nvdsosd, and nvmsgconv don’t care which element produced detection metadata. Write to DeepStream’s metadata structure correctly and the rest of the ecosystem works as if nvinfer was never in the picture.

DeepStream Metadata

Every buffer flowing through a DeepStream pipeline carries more than pixel data. From the moment frames pass through nvstreammux, each GstBuffer has an NvDsBatchMeta structure attached to it. The hierarchy is straightforward and can be found in the official documentation.

NvDsBatchMeta
├── NvDsUserMeta                        (batch-level custom metadata)
└── NvDsFrameMeta                       (one per frame in the batch)
    ├── NvDsUserMeta                    (frame-level custom metadata)
    └── NvDsObjectMeta                  (one per detected object)
        ├── NvDsClassifierMeta
        └── NvDsUserMeta                (object-level custom metadata)

NvDsBatchMeta describes the whole batch. Each NvDsFrameMeta describes one frame in that batch (typically one frame per source stream) and carries frame-level information like the source ID and frame number. Each NvDsObjectMeta represents a single detection, so when our plugin writes detections, it writes one NvDsObjectMeta per detection and attaches it to the frame it belongs to.

The critical thing to understand is that none of this is owned by nvinfer. It’s a shared data contract. Any GStreamer element in the pipeline can read from it, write to it, or both:

nvtracker reads object bounding boxes and writes tracking IDs.
nvdsosd reads boxes and labels to draw overlays.
nvmsgconv reads the whole structure to produce message payloads.

Our custom plugin will simply write detections into this structure the same way nvinfer would, and everything downstream picks them up without modification.
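To make the shared-contract idea concrete, here is a minimal sketch in plain Python. The dataclasses and the three stage functions are hypothetical stand-ins, not DeepStream types or elements; they only mimic the shape of the hierarchy above to show that the consuming stages never ask who produced the objects.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ObjectMeta:
    """Stand-in for one detection (hypothetical, not NvDsObjectMeta)."""
    class_id: int
    confidence: float
    left: float
    top: float
    width: float
    height: float
    object_id: Optional[int] = None  # filled in later by the tracker stage


@dataclass
class FrameMeta:
    """Stand-in for one frame in the batch."""
    source_id: int
    frame_num: int
    objects: List[ObjectMeta] = field(default_factory=list)


@dataclass
class BatchMeta:
    """Stand-in for the batch-level container."""
    frames: List[FrameMeta] = field(default_factory=list)


def custom_detector(batch: BatchMeta) -> None:
    """Producer: plays the role of our plugin and writes one detection per frame."""
    for frame in batch.frames:
        frame.objects.append(ObjectMeta(0, 0.9, 10.0, 20.0, 50.0, 80.0))


def toy_tracker(batch: BatchMeta) -> None:
    """Reader and writer: assigns an ID to every object it finds."""
    next_id = 1
    for frame in batch.frames:
        for obj in frame.objects:
            obj.object_id = next_id
            next_id += 1


def toy_overlay(batch: BatchMeta) -> List[str]:
    """Reader only: turns whatever objects exist into label strings."""
    return [
        f"src={frame.source_id} id={obj.object_id} "
        f"class={obj.class_id} conf={obj.confidence:.2f}"
        for frame in batch.frames
        for obj in frame.objects
    ]


batch = BatchMeta(frames=[FrameMeta(source_id=0, frame_num=7),
                          FrameMeta(source_id=1, frame_num=7)])
for stage in (custom_detector, toy_tracker):
    stage(batch)
overlay_lines = toy_overlay(batch)
```

Swapping custom_detector for any other function that appends objects of the same shape leaves the tracker and overlay stages untouched, which is the property the real pipeline relies on.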

One important constraint worth understanding before writing any code: NvDsObjectMeta instances cannot be constructed directly from Python. Attempting to instantiate the class raises a No constructor defined! error at runtime.

The reason is architectural. DeepStream manages its metadata objects through memory pools — pre-allocated blocks that get recycled across frames to avoid the overhead of repeated heap allocation and deallocation in a high-throughput pipeline. These pools are owned by NvDsBatchMeta and live on the C side of the boundary. The Python bindings expose access to those pools, but deliberately don’t expose a Python-side constructor, because creating an NvDsObjectMeta outside the pool would bypass the lifecycle management that keeps DeepStream’s memory usage predictable. The correct way to get one is to ask the batch for it: batch_meta.acquire_object_meta(), which hands you a pre-allocated instance from the pool. When the frame is done, DeepStream returns it to the pool automatically.
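The pooling pattern described above can be sketched in a few lines of plain Python. This is an illustration of the general technique only: MetaPool and PooledObject are hypothetical names, and DeepStream's actual pool implementation lives in C and may differ in its details.

```python
class PooledObject:
    """Hypothetical stand-in for a pool-managed metadata record."""

    __slots__ = ("class_id", "confidence")

    def __init__(self):
        self.reset()

    def reset(self):
        # Return the record to a neutral state before it is reused.
        self.class_id = -1
        self.confidence = 0.0


class MetaPool:
    """Pre-allocates records once and recycles them instead of reallocating."""

    def __init__(self, size):
        self._free = [PooledObject() for _ in range(size)]
        self._in_use = []

    def acquire(self):
        # Callers never construct records themselves; they ask the pool.
        if not self._free:
            raise RuntimeError("pool exhausted")
        obj = self._free.pop()
        self._in_use.append(obj)
        return obj

    def release_all(self):
        # Called when the batch is finished: every record goes back to the pool.
        for obj in self._in_use:
            obj.reset()
        self._free.extend(self._in_use)
        self._in_use.clear()

    @property
    def available(self):
        return len(self._free)


pool = MetaPool(size=4)
first = pool.acquire()
first.class_id = 2
pool.release_all()        # end of batch: records are recycled
second = pool.acquire()   # same underlying record, reset to defaults
```

Because every record passes through acquire and release_all, the pool always knows which ones are live. A record constructed outside the pool would be invisible to that bookkeeping, which is the situation the missing Python constructor prevents.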

The Python Bridge: pyservicemaker

To interact with DeepStream’s metadata from Python, we’ll use pyservicemaker, NVIDIA’s current, supported Python SDK for DeepStream. The official documentation covers the basics of pipelines and flows, but stops short of showing how to write and attach metadata from a custom inference element. That’s the gap this article fills.

The key abstraction is BatchMetadataOperator. Subclassing it and implementing handle_metadata(batch_meta) gives you access to the full NvDsBatchMeta for every buffer flowing through the pipeline. From there, you iterate over batch_meta.frame_items and read the objects already attached to each frame, or attach new ones.
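As a minimal sketch of that pattern, the operator below counts objects per class. ClassCounter is a hypothetical name, and the code assumes frames expose object_items and objects expose class_id, as in NVIDIA's sample applications. The import fallback and the fake metadata at the bottom exist only so the sketch can be exercised on a machine without DeepStream.

```python
from collections import Counter
from types import SimpleNamespace

try:
    from pyservicemaker import BatchMetadataOperator
except ImportError:
    # Fallback stub so the sketch runs without DeepStream installed.
    class BatchMetadataOperator:
        pass


class ClassCounter(BatchMetadataOperator):
    """Counts how many objects of each class pass through the pipeline."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_metadata(self, batch_meta):
        # One frame_meta per frame in the batch, one object_meta per detection.
        for frame_meta in batch_meta.frame_items:
            for object_meta in frame_meta.object_items:
                self.counts[object_meta.class_id] += 1


# Fake metadata with the same attribute names, used only to exercise the operator.
fake_batch = SimpleNamespace(frame_items=[
    SimpleNamespace(object_items=[SimpleNamespace(class_id=0),
                                  SimpleNamespace(class_id=2)]),
    SimpleNamespace(object_items=[SimpleNamespace(class_id=0)]),
])
counter = ClassCounter()
counter.handle_metadata(fake_batch)
```

In a real pipeline the operator is not called by hand; pyservicemaker invokes handle_metadata for each buffer once the operator is attached as a probe.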

pyservicemaker also provides a Buffer wrapper around Gst.Buffer that exposes batch_meta directly and, importantly, an extract(batch_id) method that returns a DLPack handle to each frame’s GPU memory. That’s what makes zero-copy inference possible — the frame can be handed straight to TensorRT without ever leaving the GPU.

Rather than using BatchMetadataOperator standalone via a probe, we’ll fold the same pattern directly into our custom plugin’s do_transform_ip method, which gives us control over the element’s lifecycle, properties, and caps negotiation alongside the metadata access. But first, we need to build that plugin.

A Discoverable Python GStreamer Plugin

GStreamer discovers plugins at runtime by scanning directories listed in GST_PLUGIN_PATH. For Python plugins specifically, it looks inside a python/ subdirectory within each of those paths. That means your plugin is just a .py file dropped in the right place — no compilation, no CMake, no shared library. The tradeoff is that the registration pattern is strict and getting it wrong produces silent failures that are genuinely hard to debug.

$GST_PLUGIN_PATH/
└── python/
    └── gstexampleplugin.py   # your plugin

Set GST_PLUGIN_PATH to the directory that contains python/ (not to python/ itself) and GStreamer will find python/gstexampleplugin.py automatically on the next pipeline run.
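Since a misplaced file fails silently, a quick layout check can save time. The helper below is a hypothetical sketch that mirrors the rule described above; it is not how GStreamer itself scans for plugins, and it only confirms that files sit where the Python loader would look.

```python
import os
import tempfile
from pathlib import Path


def find_python_plugins(gst_plugin_path):
    """Return the .py files located in a python/ subdirectory of each path entry."""
    found = []
    for entry in gst_plugin_path.split(os.pathsep):
        if not entry:
            continue
        python_dir = Path(entry) / "python"
        if python_dir.is_dir():
            found.extend(sorted(python_dir.glob("*.py")))
    return found


# Demo on a throwaway directory: one plugin in the right place, one misplaced.
with tempfile.TemporaryDirectory() as root:
    plugin_dir = Path(root) / "python"
    plugin_dir.mkdir()
    (plugin_dir / "gstexampleplugin.py").write_text("# plugin body goes here\n")
    (Path(root) / "gstmisplaced.py").write_text("# wrong level, will be ignored\n")
    discovered = [p.name for p in find_python_plugins(root)]
```

In practice you would call find_python_plugins(os.environ.get("GST_PLUGIN_PATH", "")). Once the file is in place, running gst-inspect-1.0 gstexampleplugin is the definitive check that the element actually registered.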

The Plugin Skeleton

Here’s the minimal skeleton for a passthrough inference element: it receives batched video buffers, runs inference, attaches metadata, and passes the buffer downstream unmodified.

import gi
gi.require_version('Gst', '1.0')
gi.require_version('GstBase', '1.0')
from gi.repository import Gst, GstBase, GObject

import torch
from pyservicemaker import Buffer

GST_PLUGIN_NAME = "gstexampleplugin"

Gst.init(None)

class GstExamplePlugin(GstBase.BaseTransform):

    __gstmetadata__ = (
        'GstExamplePlugin',                     # name
        'Filter/Effect/Video',                  # classification
        'Custom inference element',             # description
        'Your Name'                             # author
    )

    src_format = Gst.Caps.from_string(
        "video/x-raw(memory:NVMM), format=RGB, "
        "width=(int)[ 1, 2147483647 ], height=(int)[ 1, 2147483647 ], "
        "framerate=(fraction)[ 0/1, 2147483647/1 ]"
    )
    sink_format = Gst.Caps.from_string(
        "video/x-raw(memory:NVMM), format=RGB, "
        "width=(int)[ 1, 2147483647 ], height=(int)[ 1, 2147483647 ], "
        "framerate=(fraction)[ 0/1, 2147483647/1 ]"
    )

    src_pad_template = Gst.PadTemplate.new(
        "src", Gst.PadDirection.SRC, Gst.PadPresence.ALWAYS, src_format
    )
    sink_pad_template = Gst.PadTemplate.new(
        "sink", Gst.PadDirection.SINK, Gst.PadPresence.ALWAYS, sink_format
    )
    __gsttemplates__ = (src_pad_template, sink_pad_template)

    __gproperties__ = {
        'model-engine': (
            str,
            'TensorRT engine path',
            'Path to the .engine file',
            '',
            GObject.ParamFlags.READWRITE
        ),
        'confidence-threshold': (
            float,
            'Confidence threshold',
            'Minimum confidence to attach a detection',
            0.0, 1.0, 0.5,
            GObject.ParamFlags.READWRITE
        ),
    }

    def __init__(self):
        super().__init__()
        self.model_engine = ''
        self.confidence_threshold = 0.5
        self.engine = None

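    def do_get_property(self, prop):
        if prop.name == 'model-engine':
            return self.model_engine
        elif prop.name == 'confidence-threshold':
            return self.confidence_threshold
        # Fail loudly instead of silently returning None for an unknown name
        raise AttributeError(f'unknown property {prop.name}')

    def do_set_property(self, prop, value):
        if prop.name == 'model-engine':
            self.model_engine = value
        elif prop.name == 'confidence-threshold':
            self.confidence_threshold = value
        else:
            raise AttributeError(f'unknown property {prop.name}')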

    def do_start(self):
        # Load your TensorRT engine here
        self.engine = load_engine(self.model_engine)  # implement load_engine separately
        return True

    def do_transform_ip(self, gst_buffer: Gst.Buffer) -> Gst.FlowReturn:
        """In-place transform: attach metadata, pass buffer unchanged."""
        buffer = Buffer(gst_buffer)
        batch_meta = buffer.batch_meta

        # Wrap each frame's GPU memory as a torch tensor via DLPack
        frames = []
        for frame_meta in batch_meta.frame_items:
            t = torch.utils.dlpack.from_dlpack(buffer.extract(frame_meta.batch_id))
            frames.append(t)
        if not frames:
            # Nothing to infer on; pass the buffer through untouched
            return Gst.FlowReturn.OK
        batch = torch.stack(frames, dim=0)

        # Run your model inference
        results = self.engine(batch)

        # Attach each frame's detections as object_meta.
        # For non-detection outputs, user_meta can be used instead.
        # Assumes results holds one iterable of detections per frame, in the same
        # order as frame_items; adapt the pairing to your inference output format.
        for frame_meta, frame_dets in zip(batch_meta.frame_items, results):
            for det in frame_dets:
                obj = batch_meta.acquire_object_meta()
                # Fill obj fields with detection data (bbox, class, confidence, etc.)
                ...
                frame_meta.append(obj)

        return Gst.FlowReturn.OK


# --- Registration ---
GObject.type_register(GstExamplePlugin)
__gstelementfactory__ = (GST_PLUGIN_NAME, Gst.Rank.NONE, GstExamplePlugin)

A few things worth noting about this skeleton:

GstBase.BaseTransform is the right base class for an in-place filter — one that receives a buffer, modifies it by attaching metadata, and passes it downstream. We override do_transform_ip rather than do_transform because we’re not allocating a new output buffer.

__gstmetadata__ and __gsttemplates__ are not optional. GStreamer won’t register the element without them. The caps string video/x-raw(memory:NVMM) tells GStreamer this element works with NVIDIA memory, which is essential for staying on-GPU in a DeepStream pipeline.

__gproperties__ exposes model-engine and confidence-threshold as first-class GStreamer properties, which means you can set them from a gst-launch-1.0 command line or from Python pipeline code without touching the plugin source.
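As a small illustration, the helper below builds the textual form of the element with its properties, in the key=value syntax that gst-launch-1.0 and Gst.parse_launch accept. The helper name and the engine path are hypothetical, and values containing spaces would need quoting that this sketch does not handle.

```python
def launch_fragment(element, **properties):
    """Render an element and its properties in gst-launch key=value syntax."""
    parts = [element]
    for key, value in properties.items():
        # GStreamer property names use hyphens; Python keyword arguments cannot.
        parts.append(f"{key.replace('_', '-')}={value}")
    return " ".join(parts)


fragment = launch_fragment(
    "gstexampleplugin",
    model_engine="/models/detector.engine",  # hypothetical path
    confidence_threshold=0.4,
)
```

The fragment slots into a larger pipeline description between the surrounding elements. When the element is created programmatically instead, the equivalent is a call such as element.set_property("confidence-threshold", 0.4), which ends up in do_set_property.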

The final two lines are required for registration: GObject.type_register registers the class with the GObject type system, and __gstelementfactory__ tells GStreamer the element name, rank, and class to instantiate. Without both, the plugin file will be found but the element will not be available for use in a pipeline.
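The detection loop in do_transform_ip leaves the fill-in step open because it depends on the model's output format. As one possible sketch, assume each frame's raw output is a list of (x1, y1, x2, y2, score, class_id) tuples in pixel coordinates. The helper below applies the confidence threshold, clamps boxes to the frame, and converts corners to the left/top/width/height form that DeepStream uses for rectangles. Box and to_boxes are hypothetical names, and copying the values onto the acquired object metadata is left to the bindings.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Assumed raw layout: (x1, y1, x2, y2, score, class_id) in pixels.
RawDetection = Tuple[float, float, float, float, float, int]


@dataclass
class Box:
    """Hypothetical intermediate record, ready to copy into object metadata."""
    left: float
    top: float
    width: float
    height: float
    confidence: float
    class_id: int


def clamp(value: float, low: float, high: float) -> float:
    return max(low, min(value, high))


def to_boxes(raw: Sequence[RawDetection], threshold: float,
             frame_w: float, frame_h: float) -> List[Box]:
    """Filter by confidence, clamp to the frame, convert corners to left/top/size."""
    boxes = []
    for x1, y1, x2, y2, score, class_id in raw:
        if score < threshold:
            continue
        x1, x2 = clamp(x1, 0.0, frame_w), clamp(x2, 0.0, frame_w)
        y1, y2 = clamp(y1, 0.0, frame_h), clamp(y2, 0.0, frame_h)
        if x2 <= x1 or y2 <= y1:
            continue  # degenerate after clamping, nothing to draw
        boxes.append(Box(x1, y1, x2 - x1, y2 - y1, score, class_id))
    return boxes


# One list of raw detections per frame, in batch order (made-up numbers).
per_frame_results = [
    [(10.0, 20.0, 110.0, 220.0, 0.91, 0), (5.0, 5.0, 50.0, 50.0, 0.30, 2)],
    [(-8.0, 100.0, 40.0, 160.0, 0.75, 1)],
]
boxes_per_frame = [to_boxes(raw, 0.5, 1920.0, 1080.0) for raw in per_frame_results]
```

Inside the plugin, threshold would come from self.confidence_threshold, and each Box would be copied onto an object obtained from batch_meta.acquire_object_meta() before frame_meta.append(obj).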