fix(trickplay): stop scan-time sprite generation from saturating the host
Some checks failed
CI / Test (push) Failing after 6m21s
CI / Build (push) Successful in 1m34s
CI / Build-1 (push) Successful in 2m0s
CI / Build-2 (push) Successful in 1m33s
CI / Build-3 (push) Successful in 1m38s
CI / Build-4 (push) Successful in 1m35s
CI / Build-5 (push) Successful in 1m38s
CI / Lint (push) Failing after 2m34s
CI / Coverage (push) Failing after 2m44s
CI / Vet (push) Successful in 2m3s

Trickplay sprite generation (one full-decode ffmpeg pass per file) could pin a
machine: multiple agents on the same library decoded the same 4K file at once, no
CPU throttling, and crashed/restarted agents orphaned ffmpeg to init (it ran the
full 45-min decode to completion). Stacked orphans spiked a box to load ~140.

- Single-flight lock: O_CREATE|O_EXCL .lock in the shared sidecar dir so two
  agents watching the same library never decode the same file twice (stale locks
  reclaimed after a TTL). Returns ErrTrickplayInProgress → prewarm skips, not fail.
- Load gate: defer the heavy decode until 1-min load ≤ max(ratio×NumCPU, 1.5),
  capped at 15 min so it throttles without ever becoming a permanent off-switch on
  busy / small hosts. New knob library.prewarm_max_load_ratio (default 0.7).
- Concurrency: trickSem caps trickplay to ONE decode at a time per agent.
- CPU priority: setLowCPUPriority (nice 19) alongside the existing idle ionice.
- No orphans: hardenCmd sets Setpgid + Pdeathsig=SIGKILL, with runtime.LockOSThread
  around the child so the kernel kills ffmpeg exactly when the agent dies (and not
  spuriously — golang/go#27505).

Tests: single-flight/stale-reclaim, load-gate immediate/cancel, and an e2e
Pdeathsig orphan-kill check.
This commit is contained in:
Deivid Soto 2026-06-04 08:25:00 +02:00
parent aba20e2078
commit c82826bf68
10 changed files with 399 additions and 8 deletions

View file

@ -205,6 +205,12 @@ type LibraryConfig struct {
// no contention with the active stream (the cause of broken seekbar previews)
// — and the file panel picks a few positions from the same grid.
Trickplay TrickplayConfig `toml:"trickplay"`
// PrewarmMaxLoadRatio gates the heavy trickplay decode on system load: a sprite
// job only starts while the 1-min load average is ≤ this × NumCPU, so scan-time
// generation never saturates the machine or the NAS. Default 0.7; 0 falls back
// to the default. Linux-only (no load reading elsewhere → unthrottled).
PrewarmMaxLoadRatio float64 `toml:"prewarm_max_load_ratio"`
}
// TrickplayConfig controls scan-time trickplay sprite generation.
@ -297,6 +303,7 @@ func Default() Config {
Interval: "10s",
Width: 240,
},
PrewarmMaxLoadRatio: 0.7,
},
}
}
@ -381,6 +388,11 @@ func applyDefaults(cfg *Config, meta toml.MetaData) {
if !meta.IsDefined("library", "trickplay", "width") {
cfg.Library.Trickplay.Width = 240
}
// Load-gate defaults ON for configs predating the key, so an old install can't
// saturate the box with scan-time sprite generation.
if !meta.IsDefined("library", "prewarm_max_load_ratio") {
cfg.Library.PrewarmMaxLoadRatio = 0.7
}
if !meta.IsDefined("downloads", "transcode", "enabled") {
cfg.Download.Transcode.Enabled = true