Keep an NVENC downscale of an SDR source entirely on the GPU
(decode -> scale_cuda -> h264_nvenc) instead of copying every frame to the
CPU for `scale=` and back. That GPU->CPU->GPU round-trip is the wall on
modest GPUs; even a strong box gains ~37% (scale_cuda 14.9x vs CPU 10.9x
on a 4K SDR HEVC -> 1080p encode).
Strictly gated so every case that needs CPU frames is unchanged:
- HDR (libplacebo Vulkan / zscale CPU tonemap can't consume a CUDA surface),
- burn-in (the scale2ref+overlay composite runs on CPU frames),
- non-NVENC encoders, and no-op when not actually downscaling.
- hwscale.go: FFmpegSupportsScaleCuda — a functional 1-frame probe mirroring
the libplacebo probe (presence in -filters lies; needs a real CUDA device).
Probes the worst-case real input (10-bit p010 -> 8-bit yuv420p) so a host
whose scale_cuda can't do the 10->8-bit conversion fails closed to CPU.
- hls.go: useCudaScale gate + `-hwaccel_output_format cuda` + a
`scale_cuda=-2:H:format=yuv420p` filter branch. Output is 8-bit
(format=yuv420p + `-profile:v main`), browser-safe.
- transcode_quality.go / player_session_registry.go / daemon.go: HasScaleCuda
flag, populated + warmed at startup like the other ffmpeg capability probes.
Fail-closed: probe absent/fails -> keep the CPU scale path, no regression.
Verified live (real 4K SDR HEVC Main10 session emitted scale_cuda, 5.54x
realtime, nvenc at 100%) + 8 arg-builder unit tests for the gate.
Prefer the single-pass Vulkan libplacebo filter over the CPU zscale chain
for HDR->SDR tonemapping when the agent ffmpeg has it. One GPU pass does
tonemap + BT.709 primaries/transfer/matrix + 8-bit yuv420p, replacing the
four-stage zscale chain and its trailing format=/setparams. Higher quality,
far cheaper than the CPU path, and present on builds that lack zscale.
- FFmpegSupportsLibplacebo probe (cached, mirrors FFmpegSupportsZscale)
- HasLibplacebo on TranscodeRuntime, wired from buildTranscodeRuntime
- hls.go: videoTail picks libplacebo when present (not h264_vaapi), else
keeps the zscale tonemap + format chain
- test: libplacebo replaces the zscale chain, never runs alongside it
HDR (HDR10/HLG/Dolby Vision) transcoded to SDR came out washed-out and
desaturated because the filter chain never tonemapped. buildHLSFFmpegArgsAt now
inserts a zscale linearise -> hable tonemap -> BT.709 chain after the scale and
before format=, but only when the source is HDR and the ffmpeg build has zscale
(FFmpegSupportsZscale, cached). Builds without zimg keep the old behaviour
(plays, just desaturated) instead of erroring.
It's a CPU filter, valid for every encoder here: the decode hwaccel deliberately
leaves frames in system memory (no -hwaccel_output_format), so zscale runs ahead
of format=/hwupload exactly like the existing scale filter. Verified on a real
4K HDR10 file — vivid colour and deep blacks vs the washed-out baseline.
Drops the custom WebRTC DataChannel pipeline + pion deps + WSS signaling
client + wire framing. Every in-browser playback now uses HLS over HTTP
from the daemon (Tailscale/LAN/UPnP). Browser P2P never re-enabled.
Wire renames (incompatible with web < 2026-05-26): agent.WebRTCSession
=> agent.StreamSession, SyncResponse.WebRTCSessions (JSON: webrtcSessions)
=> StreamSessions (JSON: streamSessions). MIN_AGENT_VERSION is bumped
to 0.9.4 on the web side so older agents see an upgrade card.
Also fixes the libx264 'VBV bitrate > level limit' abort by clamping
the encoder bitrate to the effective output height instead of the
requested label (carried over from the prior 0.9.3 unreleased work).
The seed_file vertical (mode=seed_file handler + engine.SeedFile) was
retired with the in-browser P2P player. [downloads.webrtc] config block
deleted; existing TOML files with the section still parse fine.
2026-05-26 18:04:35 +02:00
Renamed from internal/cmd/webrtc_session_registry.go (Browse further)