roboflow/supervision 0.28.0 on GitHub

🔦 Spotlight

Memory-efficient masks with `sv.CompactMask`

Segmentation models produce one full-resolution bitmap per instance. On a 1920×1080 image with 28 detections that is ~55 MB of mask data. Most pixels are background. sv.CompactMask stores only the tight bounding-box crop, RLE-encoded — the same 28 masks drop to ~237 KB of crops, a 240× reduction before RLE kicks in.

It's a drop-in replacement: annotators, filters, and area all work unchanged.

import supervision as sv

# any segmentation model — RF-DETR Seg, YOLO-Seg, SAM3
detections = model.predict(image)  # sv.Detections with dense masks

dense_mb = detections.mask.nbytes / 1024 / 1024
compact = sv.CompactMask.from_dense(
    masks=detections.mask,
    xyxy=detections.xyxy,
    image_shape=image.shape[:2],
)
detections.mask = compact  # swap in — API unchanged

# filter by pixel area without materialising dense masks
large = detections[compact.area > 1000]

# annotators call .to_dense() internally
annotated = sv.MaskAnnotator().annotate(image.copy(), detections)
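
The memory figure above can be checked with plain NumPy. The sketch below is a toy illustration of the crop-plus-run-length idea, not the library's implementation: `dense_mask_bytes`, `encode_crop`, and `decode_crop` are hypothetical helpers, the row-major run layout is an assumption, and boxes are taken as integer pixel coordinates with an exclusive lower-right corner.

```python
import numpy as np


def dense_mask_bytes(n_masks: int, height: int, width: int) -> int:
    """Bytes used by n_masks dense boolean masks (NumPy stores one byte per bool)."""
    return n_masks * height * width


def encode_crop(mask: np.ndarray, xyxy: tuple) -> tuple:
    """Crop a dense mask to its box, then run-length encode the crop (row-major)."""
    x1, y1, x2, y2 = xyxy
    crop = mask[y1:y2, x1:x2]
    flat = crop.ravel().astype(np.int8)
    # Indices where the value flips mark the start of a new run.
    starts = np.flatnonzero(np.diff(flat)) + 1
    bounds = np.concatenate(([0], starts, [flat.size]))
    runs = np.diff(bounds)
    # Runs alternate background/foreground; prepend an empty background run
    # when the crop starts on a foreground pixel.
    if flat.size and flat[0]:
        runs = np.concatenate(([0], runs))
    return runs, crop.shape


def decode_crop(runs: np.ndarray, shape: tuple) -> np.ndarray:
    """Rebuild the boolean crop from alternating run lengths."""
    values = np.arange(len(runs)) % 2 == 1  # odd-indexed runs are foreground
    return np.repeat(values, runs).reshape(shape)


# 28 dense masks on a 1920x1080 image, in MiB.
dense_mb = dense_mask_bytes(28, 1080, 1920) / 1024**2

mask = np.zeros((1080, 1920), dtype=bool)
mask[110:190, 310:390] = True   # an 80x80 object
box = (300, 100, 400, 200)      # its box, with a 10 px margin
runs, shape = encode_crop(mask, box)
restored = decode_crop(runs, shape)
```

Here a 2,073,600-pixel mask reduces to a 100×100 crop described by 161 run lengths, and `restored` matches the cropped region exactly.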

SAM3 text-prompted segmentation

SAM3 segments objects by free-text prompt — no class list, no bounding boxes. sv.Detections.from_sam3() parses both PCS (multi-prompt) and PVS (video) response formats into a standard sv.Detections, with class_id set to the prompt index.

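import base64

import cv2  # used below to read the image size
import requests
import supervision as sv

API_KEY = "..."  # placeholder: your Roboflow API key
PROMPTS = ["person", "bag"]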

with open("image.jpg", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode()

response = requests.post(
    f"https://api.roboflow.com/inferenceproxy/seg-preview?api_key={API_KEY}",
    json={
        "image": {"type": "base64", "value": img_b64},
        "prompts": [{"type": "text", "text": p} for p in PROMPTS],
    },
    headers={"Content-Type": "application/json"},
)
sam3_result = response.json()

h, w = cv2.imread("image.jpg").shape[:2]
detections = sv.Detections.from_sam3(sam3_result=sam3_result, resolution_wh=(w, h))
# class_id == 0 → "person", class_id == 1 → "bag"
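
Because class_id is the prompt index, readable labels can be recovered by indexing back into the prompt list. A minimal sketch, using stand-in arrays in place of a real `detections.class_id` and `detections.confidence`; the values are made up for illustration.

```python
import numpy as np

PROMPTS = ["person", "bag"]

# Stand-ins for detections.class_id and detections.confidence.
class_id = np.array([0, 1, 0])
confidence = np.array([0.91, 0.78, 0.66])

# One human-readable label per detection: prompt text plus confidence.
labels = [f"{PROMPTS[i]} {c:.2f}" for i, c in zip(class_id, confidence)]

# Boolean mask selecting only the detections produced by the "bag" prompt.
is_bag = class_id == PROMPTS.index("bag")
```

The boolean mask can index an sv.Detections object the same way `compact.area > 1000` does in the CompactMask example.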

🔄 Migration

`VideoInfo.fps` is now `float`

NTSC frame rates (23.976, 29.97, 59.94) were silently truncated. fps is now the true float — cast at call sites that need an integer.

Before

info = sv.VideoInfo.from_video_path("clip.mp4")
buf = collections.deque(maxlen=info.fps)
trace = sv.TraceAnnotator(trace_length=info.fps)

After

info = sv.VideoInfo.from_video_path("clip.mp4")
buf = collections.deque(maxlen=int(info.fps))
trace = sv.TraceAnnotator(trace_length=int(info.fps))
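
Note that int(...) truncates toward zero, so a 29.97 fps clip yields a one-second buffer of 29 frames, not 30. Whether to truncate or round depends on the call site. The sketch below contrasts the two; `frames_for` is a hypothetical helper, not part of supervision.

```python
def frames_for(seconds: float, fps: float) -> int:
    """Number of frames covering `seconds` of video, rounded to the nearest frame."""
    return round(seconds * fps)


ntsc_rates = [23.976, 29.97, 59.94]

truncated = [int(fps) for fps in ntsc_rates]             # drops the fractional part
rounded = [frames_for(1.0, fps) for fps in ntsc_rates]   # nearest whole frame
two_seconds = frames_for(2.0, 29.97)                     # scale first, round once
```

Scaling by the duration before rounding keeps the error below one frame even for longer windows.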

`sv.ByteTrack` deprecated — use `ByteTrackTracker`

Tracker implementations now live in the dedicated trackers package. sv.ByteTrack remains available in 0.28–0.29 with a DeprecationWarning and is scheduled for removal in 0.30.0.

Before

tracker = sv.ByteTrack()
detections = tracker.update_with_detections(detections)

After

# pip install trackers
from trackers import ByteTrackTracker

tracker = ByteTrackTracker()
detections = tracker.update(detections)
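
To find remaining sv.ByteTrack call sites before the 0.30.0 removal, one option is to escalate DeprecationWarning to an error in tests. The sketch below uses only the standard warnings module; `legacy_tracker` is a hypothetical stand-in for a deprecated API, so it runs without supervision installed.

```python
import warnings


def legacy_tracker() -> str:
    """Hypothetical stand-in for a deprecated constructor."""
    warnings.warn("legacy_tracker is deprecated", DeprecationWarning, stacklevel=2)
    return "tracker"


def uses_deprecated_api(func) -> bool:
    """Return True if calling func triggers a DeprecationWarning."""
    with warnings.catch_warnings():
        # Inside this block, DeprecationWarning is raised as an exception.
        warnings.simplefilter("error", DeprecationWarning)
        try:
            func()
        except DeprecationWarning:
            return True
    return False


found = uses_deprecated_api(legacy_tracker)      # deprecated call is detected
clean = uses_deprecated_api(lambda: "tracker")   # non-deprecated call passes
```

With pytest, passing `-W error::DeprecationWarning` applies the same filter to the whole test run.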

🚀 Added

Memory-efficient masks with sv.CompactMask. Sparse segmentation masks are now stored as a crop region plus RLE-encoded data instead of full-resolution bitmaps, cutting memory use by 10–100× for typical instance-segmentation outputs. It's a drop-in change — sv.Detections.mask, filtering, merging, and area all keep working without materialising the full array. (#2159)
SAM3 detection and PVS support in from_inference. sv.Detections.from_inference now parses SAM3 detection and point-video-segmentation outputs, both from the local inference package and from Roboflow-hosted server responses. (#2103, #2152)
Compressed COCO RLE masks in from_inference. Inference responses with rle or rle_mask fields containing a compressed counts string (as produced by pycocotools) are decoded directly into binary masks, skipping the lossy polygon round-trip. (#2178)
Standard logging module instead of print. Diagnostic output is now emitted under the supervision logger, so applications can capture, filter, or silence it through standard logging configuration. (#2154)
RGBA hex codes in sv.Color. sv.Color.from_hex accepts 8-digit hex (#ff00ff80), and Color.as_hex() round-trips alpha when not fully opaque. New top-level helpers: sv.hex_to_rgba, sv.rgba_to_hex, and sv.is_valid_hex. (#2004)
Dynamic kernel sizing in blur and pixelate annotators. BlurAnnotator(kernel_size=None) and PixelateAnnotator(pixel_size=None) (the new default) compute the kernel per detection as a fraction of the shorter bounding-box side, giving visually consistent results across object scales. (#709)
sv.ImageAssets for sample images. A counterpart to the existing video assets — downloads sample images for examples and tutorials. (#932)
Boundary warnings in InferenceSlicer. Emits a warning when callback detections fall outside tile boundaries, helping you spot coordinate-system bugs in custom callbacks early. (#2186)
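
Since diagnostics now go through the standard logging module, they can be controlled like any other library logger. A minimal sketch, assuming the logger is named "supervision" (the notes say output is emitted under the supervision logger; the exact name is an assumption here). The two messages logged at the end are stand-ins for library output.

```python
import io
import logging

# Capture output in memory so the example is self-contained.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(name)s %(levelname)s %(message)s"))

logger = logging.getLogger("supervision")  # assumed logger name
logger.addHandler(handler)
logger.setLevel(logging.WARNING)           # hide INFO/DEBUG, keep warnings
logger.propagate = False                   # avoid duplicate output via the root logger

# Stand-ins for messages the library might emit.
logger.info("loading dataset")             # filtered out by the level
logger.warning("detection outside tile")   # captured by the handler

captured = stream.getvalue()
```

Setting the level to `logging.CRITICAL` on the same logger would silence the output entirely.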

⚠️ Breaking Changes

sv.VideoInfo.fps is now float, not int. Frame rates like 23.976, 29.97, and 59.94 are no longer truncated. If you pass fps to APIs that require an integer (deque(maxlen=...), TraceAnnotator(trace_length=...)), wrap with int(...). (#2210)
sv.rle_to_mask returns bool, not uint8. This matches the long-declared signature. Arithmetic such as mask * 255 still works because NumPy promotes bool to an integer type, but code that assumes a uint8 array, such as mask.view(np.uint8) or explicit dtype checks, may break. Add .astype(np.uint8) if you relied on the undocumented integer output. (#2178)

See the Migration section above for before/after snippets.
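
The dtype change is easy to reproduce with plain NumPy. The sketch below uses a hand-built boolean array in place of an actual sv.rle_to_mask result and shows the explicit cast recommended above.

```python
import numpy as np

# Stand-in for the boolean array that rle_to_mask now returns.
mask = np.array([[True, False], [False, True]])

scaled = mask * 255                # arithmetic still works: values 0 and 255
as_uint8 = mask.astype(np.uint8)   # explicit cast restores the old 0/1 integers
image_mask = as_uint8 * 255        # stays uint8, as image-style code expects
```

Casting once, right after decoding, keeps the rest of a uint8-based pipeline unchanged.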

🌱 Changed

Metric arrays use float32 instead of float64. sv.MeanAveragePrecisionResult and related arrays (mAP_scores, ap_per_class, iou_thresholds, precision/recall) drop to float32, reducing memory and speeding up computation. Numerical results may differ in the last few digits. (#2169)
rle_to_mask and mask_to_rle moved. New canonical path: supervision.detection.utils.converters. The old supervision.dataset.utils import still works but is deprecated. (#2178)
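
Because float32 carries roughly seven significant decimal digits, exact equality checks against previously stored float64 metrics can start failing. A sketch of the effect with plain NumPy, using COCO-style IoU thresholds as illustrative data and comparing with a tolerance instead:

```python
import numpy as np

iou64 = np.linspace(0.5, 0.95, 10)   # float64 by default
iou32 = iou64.astype(np.float32)     # what the metric arrays now use

half_memory = iou32.nbytes * 2 == iou64.nbytes             # 40 bytes vs 80 bytes
exact_match = bool(np.array_equal(iou64, iou32))           # bitwise-style comparison
close_match = bool(np.allclose(iou64, iou32, atol=1e-6))   # tolerance-based comparison
```

Tests that pin metric values are more robust when they use `np.allclose` or a similar tolerance than when they use `==`.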

🗑️ Deprecated

normalized_xyxy argument renamed to xyxy in denormalize_boxes. sv.denormalize_boxes(normalized_xyxy=...) still works but emits a FutureWarning; switch to xyxy=. Scheduled for removal in 0.30.0.
sv.ByteTrack → ByteTrackTracker (external trackers package). Install with pip install trackers; the method is renamed from update_with_detections() to update(). Scheduled for removal in 0.30.0. (#2215)
supervision.keypoint → supervision.key_points. Also deprecated: the LMM enum (use VLM), from_lmm (use from_vlm), create_tiles in supervision.utils.image, ensure_cv2_image_for_processing in supervision.utils.conversion, and the keypoint validators in supervision.validators. (#2214)

🔧 Fixed

PolygonZone no longer double-counts overlapping zones. When two polygons contain the same anchor, each zone now reflects its own containment instead of every zone claiming the detection. (#1991)
LineZone respects class identity across reused tracker IDs. Trackers that recycle tracker_id across classes no longer leak crossing state from one object to another. (#1868)
process_video raises immediately on callback errors. Previously the exception was swallowed and the process hung until the writer was flushed. (#2022)
DetectionDataset populates class_name. Loaded annotations now carry data["class_name"], matching what model connectors produce. (#2156)
ByteTrack preserves externally assigned tracker_id. No longer overwrites caller-assigned IDs on the first update. (#1364)
Confusion matrix double-counting fixed. evaluate_detection_batch now correctly matches multiple predictions to the same target, so FP/FN counts match expectations. (#1853)
MeanAverageRecall mAR@K is now COCO-compliant. Computed using top-K detections per image; previous values were inflated relative to pycocotools. (#2136)
Detections.is_empty() handles empty tracker_id. Returns True for zero-row detections regardless of whether tracker_id is None or an empty array. (#2209)
CSVSink and JSONSink slice custom_data per row. NumPy arrays, lists, and tuples whose length matches the detection count are now indexed per row, instead of being written whole for every detection. (#2199, #2216)
TraceAnnotator smooth mode handles stationary tracks. Deduplicates anchor points and falls back to a raw polyline when fewer than 4 unique points remain, which is too few for splprep to fit a spline. (#2217)
load_coco_annotations rejects path-traversal annotations. Refuses file_name entries that escape the images directory via ../ or absolute paths. (#2218)
OBB datasets no longer blow up memory. Loading oriented-bounding-box datasets stopped allocating full-image masks per box. (#2187)
KeyPoints boolean mask indexing fixed. Uniform-count selection now works correctly when all instances share the same keypoint count. (#2188)
DetectionDataset.as_coco() preserves area and iscrowd. No longer dropped silently in the round-trip. (#2185)
force_masks=True precision and COCO empty-polygon export. Annotation conversion no longer loses precision, and COCO export tolerates empty polygons across formats. (#1746, #1086, #265)
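
The path-traversal check in the load_coco_annotations item can be illustrated with pathlib. This is a minimal sketch of the general technique, not the code used in supervision; `resolve_inside` and `is_safe` are hypothetical helpers, and Python 3.9+ is assumed for `Path.is_relative_to`.

```python
from pathlib import Path


def resolve_inside(images_dir: str, file_name: str) -> Path:
    """Resolve file_name under images_dir, refusing paths that escape it."""
    root = Path(images_dir).resolve()
    # Joining with an absolute file_name discards root; the check below catches that.
    candidate = (root / file_name).resolve()
    if not candidate.is_relative_to(root):
        raise ValueError(f"file_name escapes images directory: {file_name!r}")
    return candidate


def is_safe(images_dir: str, file_name: str) -> bool:
    """Boolean wrapper around resolve_inside."""
    try:
        resolve_inside(images_dir, file_name)
    except ValueError:
        return False
    return True


names = ("a.jpg", "sub/b.jpg", "../secret.txt", "/etc/passwd")
results = {name: is_safe("images", name) for name in names}
```

Resolving both paths before comparing them also handles entries such as `sub/../../secret.txt`, which look nested but still leave the directory.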

🏆 Contributors

A huge thank you to everyone who shipped this release:

@Erol444 — SAM3 detection and PVS parsing
@leeclemnet — compressed COCO RLE masks and rle_to_mask correctness
@abritton2002 — VideoInfo.fps as float and Detections.is_empty() fix
@shaun0927 — sink slicing, trace annotator, COCO path-traversal hardening
@happyhj — class_name in DetectionDataset
@farukalamai — CSVSink NumPy slicing
@stop1one — COCO-compliant MeanAverageRecall
@Adithi-Sreenath — PolygonZone overlap fix
@JESUSROYETH — LineZone class-aware tracker IDs
@realh4m — process_video error propagation
@rolson24 — ByteTrack preserves external tracker IDs
@panagiotamoraiti — confusion matrix correctness
@Youho99, @kirilllzaitsev — COCO empty polygons and force_masks consistency
@aza-ali — RGBA hex support in sv.Color
@Clemens-E — dynamic kernel sizing for blur and pixelate annotators
@NickHerrig — sv.ImageAssets
@0xD4rky — force_masks=True precision fix
@Borda — CompactMask, metrics float32, deprecations

Full changelog: 0.27.0...0.28.0
