fix(stream): don't cache transient libplacebo probe timeouts

Second critico pass on the functional probe.

- The probe does real Vulkan device init, which can transiently fail when the
  box is busy (notably the startup warm racing the encode benchmark). Caching
  that timeout as a permanent 'no' would pin HDR to the zscale CPU chain until
  daemon restart. Now a deadline is NOT cached — only a clean non-zero exit
  (filter absent / no ICD), which is a stable result. zscale stays cached as
  before (cheap deterministic grep, can't flake).
- Surface the exec error when ffmpeg never produced stderr (timeout / ENOENT):
  the fallback log now shows err.Error() instead of a blank tail, so 'no
  Vulkan' is distinguishable from 'ffmpeg never ran'.
- Dockerfile comment: clarify the Vulkan ICD (not GLX) is the load-bearing
  mount that 'graphics' adds; 'compute' alone doesn't mount it.

Probe still returns true on a Vulkan host (verified); engine tests green.
This commit is contained in:
Deivid Soto 2026-06-03 10:48:30 +02:00
parent 5e5a719f27
commit e298ff6c05
2 changed files with 28 additions and 12 deletions

View file

@ -95,9 +95,10 @@ ENV XDG_DATA_HOME=/data
# NVIDIA passthrough defaults. `--gpus all` alone only grants the "utility" +
# "compute" capabilities; nvenc needs "video", and "graphics" makes the runtime
# mount the NVIDIA Vulkan ICD (nvidia_icd.json + GLX libs) so ffmpeg's libplacebo
# filter (GPU HDR tonemap, paired with libvulkan1 above) can create a Vulkan
# device. Baking these here means a plain `docker run --gpus all` (or the compose
# mount the NVIDIA Vulkan ICD (nvidia_icd.json — the load-bearing piece — plus
# GLX/EGL libs) so ffmpeg's libplacebo filter (GPU HDR tonemap, paired with
# libvulkan1 above) can create a Vulkan device. "compute" alone does NOT mount
# the ICD. Baking these here means a plain `docker run --gpus all` (or the compose
# device reservation) lights up HW transcode + GPU tonemap with zero extra flags.
# Harmless when no GPU is attached.
ENV NVIDIA_VISIBLE_DEVICES=all