feat(hls): full-GPU scale_cuda for NVENC SDR downscales

Keep an NVENC downscale of an SDR source entirely on the GPU
(decode -> scale_cuda -> h264_nvenc) instead of copying every frame to the
CPU for `scale=` and back. That GPU->CPU->GPU round-trip is the wall on
modest GPUs; even a strong box gains ~37% (scale_cuda 14.9x vs CPU 10.9x
on a 4K SDR HEVC -> 1080p encode).

Strictly gated so every case that needs CPU frames is unchanged:
- HDR (libplacebo Vulkan / zscale CPU tonemap can't consume a CUDA surface),
- burn-in (the scale2ref+overlay composite runs on CPU frames),
- non-NVENC encoders, and no-op when not actually downscaling.

- hwscale.go: FFmpegSupportsScaleCuda — a functional 1-frame probe mirroring
  the libplacebo probe (presence in -filters lies; needs a real CUDA device).
  Probes the worst-case real input (10-bit p010 -> 8-bit yuv420p) so a host
  whose scale_cuda can't do the 10->8-bit conversion fails closed to CPU.
- hls.go: useCudaScale gate + `-hwaccel_output_format cuda` + a
  `scale_cuda=-2:H:format=yuv420p` filter branch. Output is 8-bit
  (format=yuv420p + `-profile:v main`), browser-safe.
- transcode_quality.go / player_session_registry.go / daemon.go: HasScaleCuda
  flag, populated + warmed at startup like the other ffmpeg capability probes.

Fail-closed: probe absent/fails -> keep the CPU scale path, no regression.
Verified live (real 4K SDR HEVC Main10 session emitted scale_cuda, 5.54x
realtime, nvenc at 100%) + 8 arg-builder unit tests for the gate.
This commit is contained in:
Deivid Soto 2026-06-10 21:44:58 +02:00
parent 671bee8317
commit cda2e1322c
6 changed files with 251 additions and 7 deletions

View file

@ -27,6 +27,11 @@ type TranscodeRuntime struct {
// Preferred over the zscale chain for HDR sources — one GPU pass, higher
// quality, and present where zscale is missing.
HasLibplacebo bool
// HasScaleCuda: this host can run scale_cuda (CUDA device + filter). Lets an
// NVENC downscale of an SDR source stay fully on the GPU (decode → scale_cuda
// → h264_nvenc) instead of round-tripping each frame to the CPU for `scale=`.
// Probed functionally (FFmpegSupportsScaleCuda); false ⇒ keep the CPU scale.
HasScaleCuda bool
}
// qualityCap maps a session's Quality label to a (MaxHeight, VideoBitrate)