Commit graph

307 commits

Author SHA1 Message Date
Deivid Soto
91ee5e4b6f chore(release): 1.1.3-beta
- Bump version to 1.1.3-beta
- Update CHANGELOG.md
2026-06-11 22:02:58 +02:00
Deivid Soto
f0c51c5d90 feat(daemon): telemetría de salud continua + heartbeat de sesiones copy
El watcher F3 posteaba UN snapshot de speed= al arrancar y moría: un encoder
sano en el minuto 1 que se ahoga en el minuto 20 (escena compleja, GPU robada
por otro proceso) era invisible para el triage de stalls del player, que
decidía con el dato de arranque.

- monitorSessionHealth: ticker 5s el resto de la sesión; re-postea al cambiar
  el bucket ok/marginal/struggling (con histéresis de 2 ticks — una EWMA
  bailando sobre 0.95 no puede webhookear cada 5s) o al derivar el ratio
  ≥0.15. Un POST fallido NO avanza el baseline: el tick siguiente reintenta
  (perder el único webhook de la transición a struggling cegaba al player
  justo en el caso que esto existe para cubrir).
- resetTranscodeStats() en restartFromSegment: el ffmpeg nuevo de un seek
  re-arma el warmup y resiembra la EWMA — sus frames fríos (speed=0.0x)
  hundían la media curada a <0.75 y el monitor habría posteado un
  "struggling" falso que pausaba el player en pleno seek. Verificado e2e:
  dos restarts (seek a 1200s) con health estable en ok.
- inputBound ventanado (30s) en vez de pegajoso: un blip de lectura
  transitorio ya no reclasifica como input_bound/struggling cada dip <0.95
  durante el resto de una sesión de horas.
- Heartbeat copy (F2): las sesiones -c:v copy postean una vez
  {ok, 1.0, "copy"} tras el ready — la web ya distingue "sesión copy" de
  "agente viejo sin telemetría" (ambos eran null). Segundo POST deliberado:
  un 400 de una web vieja (enum sin "copy") jamás debe bloquear el ready.
- Logs de fallo etiquetados por tipo de POST: un heartbeat fallido ya no se
  lee como "mark-ready failed" (el ready SÍ aterrizó).

Requiere web con session-ready/SSE actualizados (desplegar web primero;
contra web vieja todo degrada a best-effort con log).
2026-06-11 20:53:18 +02:00
Deivid Soto
2b9d576aee feat(daemon): lock de instancia única por config dir (flock)
Dos daemons compartiendo el mismo config.toml corren sobre el mismo
agentId/agentHash/streamSecret y corrompen el estado de sync del otro.
flock advisory en <configDir>/unarr.lock al arrancar: el 2º start se
niega con mensaje claro. El kernel suelta el lock al morir el proceso
(incluido SIGKILL) → sin problema de lock obsoleto.

Scope = config dir, no máquina: un UNARR_CONFIG_DIR distinto (p.ej. el
agente dev) tiene su propio lock y corre en paralelo. No bloquea una 2ª
instalación con config separada — solo el cross-talk de config compartida.
2026-06-11 17:18:01 +02:00
Deivid Soto
1e61d1e546 chore(release): 1.1.2-beta
Some checks failed
CI / Test (push) Failing after 6m42s
CI / Build (push) Successful in 1m33s
CI / Build-1 (push) Successful in 1m57s
CI / Build-2 (push) Successful in 1m33s
CI / Build-3 (push) Successful in 1m33s
CI / Build-4 (push) Successful in 1m35s
CI / Build-5 (push) Successful in 1m36s
CI / Lint (push) Failing after 2m34s
CI / Coverage (push) Failing after 2m39s
CI / Vet (push) Successful in 2m1s
Release / release (push) Failing after 15m11s
Release / docker (push) Has been cancelled
- Bump version to 1.1.2-beta
- Update CHANGELOG.md
2026-06-11 09:38:32 +02:00
Deivid Soto
dc67f0d4ca fix(stream): hallazgos de la revisión crítica del modo copy
Some checks failed
CI / Test (push) Failing after 2m55s
CI / Build (push) Successful in 1m31s
CI / Build-1 (push) Successful in 1m57s
CI / Build-2 (push) Successful in 1m35s
CI / Build-3 (push) Successful in 1m37s
CI / Build-4 (push) Successful in 1m33s
CI / Build-5 (push) Successful in 1m34s
CI / Lint (push) Failing after 2m29s
CI / Coverage (push) Successful in 2m49s
CI / Vet (push) Successful in 2m0s
- log honesto de resume (copy codifica desde 0, no desde StartSec)
- inyección EXT-X-START anclada a #EXTM3U con warning si falla
- ServeSegment sin tope segmentCount en copy (ffmpeg adelanta al índice)
- comentario types.go: gate por HLS_COPY_MIN_VERSION web-side
2026-06-11 08:37:36 +02:00
Deivid Soto
da6ee9fff5 Merge feat/hls-copy: HLS-copy reemplaza el remux progresivo frágil
Fuentes remux-elegibles servidas como HLS fMP4 con -c:v copy (vídeo jamás
re-encodeado, CPU ~cero, apto NAS sin GPU). Playlist propiedad de ffmpeg
(EVENT→ENDLIST + EXT-X-START=0), audio copy solo AAC ≤2ch (WebKit rechaza
AAC multicanal — causa raíz del fallo iPhone), StartSec ignorado (offset
EVENT rompe el parser iOS). Suite smoke 7/7 con ffmpeg real. Validado:
Chromium/hls.js + Safari macOS + iPhone (Black Adam, Frankenstein DV,
Immortals HDR10, Hoppers). Gate web supportsHlsCopy.
2026-06-11 00:10:53 +02:00
Deivid Soto
a4a6e2f2d6 fix(stream): no copiar AAC multicanal en modo copy (WebKit lo rechaza igual)
El downmix estéreo del re-encode (f89396c) dejaba un agujero simétrico: una
fuente cuyo audio YA es AAC 5.1 se copiaba tal cual, y WebKit rechaza el
AAC multicanal en el primer segmento exactamente igual que el re-encodeado.
Copy de audio ahora solo cuando la pista es AAC con ≤2 canales; cualquier
otra cosa (no-AAC, AAC 5.1+, o canales desconocidos en el probe — fail-safe)
re-encodea a AAC estéreo 48k. La pista multicanal original queda intacta
para reproductor externo. Test smoke nuevo: fuente AAC 5.1 → re-encode.
2026-06-11 00:05:50 +02:00
Deivid Soto
f89396ceed fix(stream): downmix estéreo en el audio re-encodeado del modo copy
Sin -ac 2 una fuente 5.1 (AC3/EAC3) producía AAC de 6 canales del encoder
nativo de ffmpeg, que WebKit/Apple HLS rechaza al sniffar el primer
segmento: en el access log de Safari se ve master → index → init → seg-0
dos veces y silencio. Era el discriminador exacto del patrón de campo:
episodios con AAC estéreo (copy de audio) reproducían en iPhone; todas las
películas 5.1 fallaban. Verificado con Safari/macOS via WebDriver-less
access log: con -ac 2 la progresión de segmentos avanza con normalidad.

Espeja los flags del path de encode (aac 192k 48kHz estéreo). Test smoke
ampliado: el re-encode debe llevar -ac 2.
2026-06-11 00:02:53 +02:00
Deivid Soto
6c756a2569 fix(stream): EXT-X-START=0 en el playlist copy mientras crece
Hasta que llega ENDLIST la sesión copy es un EVENT creciente y algunos
players nativos (iOS) tratan un playlist sin terminar como LIVE: se
enganchan al borde en vez de a la posición 0. EXT-X-START:TIME-OFFSET=0
(RFC 8216 §4.3.5.2) fija el arranque explícitamente; inofensivo cuando el
playlist ya es final. Coincide con el patrón observado: episodios cortos
(ENDLIST en segundos) reproducían en iPhone, películas (EVENT durante
minutos) no.
2026-06-10 23:51:14 +02:00
Deivid Soto
9eb3e44153 fix(stream): el modo copy ignora StartSec (offset EVENT rompe iOS nativo)
Un playlist EVENT cuyas entradas empiezan en 0 mientras los fragmentos
llevan tfdt desplazado (-ss + -output_ts_offset) es exactamente la forma
que el parser HLS nativo de iOS no traga: resume a 368s → error del player
y bucle de re-bootstrap de sesión en iPhone (observado 2026-06-10).

Copy produce siempre desde 0 con PTS absolutos reales: adelanta a la
reproducción a velocidad de I/O, así que el punto de resume aparece en la
timeline creciente en segundos y el seek de startPosition del player
aterriza con normalidad. Test de resume actualizado: el playlist debe
cubrir la timeline completa.
2026-06-10 23:31:58 +02:00
Deivid Soto
8bfb6486ce chore(release): 1.1.1-beta
Some checks failed
Release / release (push) Failing after 49m32s
Release / docker (push) Has been cancelled
- Bump version to 1.1.1-beta
- Update CHANGELOG.md
2026-06-10 23:07:07 +02:00
Deivid Soto
5a92df1e14 feat(stream): HLS-copy — reemplazo resiliente del remux progresivo
Nuevo modo VideoCopy en el engine HLS: ffmpeg -c:v copy (el vídeo jamás se
re-encodea — I/O puro, funciona en un NAS sin GPU), audio copy si ya es AAC
o AAC 192k si no, muxeado a segmentos fMP4 con ffmpeg escribiendo SU PROPIO
playlist (EVENT mientras corre, ENDLIST al acabar, EXTINF exactos en los
keyframes del source). Sustituye al remux growing-fMP4 servido por HTTP
Range artesanal, cuya fragilidad estructural produjo tres incidentes en un
día (init malformado/delay_moov, loop de re-seek por total inventado, iOS
rechazando total desconocido).

Diferencias deliberadas respecto al modo encode:
- playlist de ffmpeg servido desde disco (los cortes van a keyframe del
  source → duraciones imposibles de pre-renderizar; medido: probar
  keyframes antes cuesta 8-24s, inviable para TTFF)
- sin seek-restart ni auto-restart (la copia va a velocidad de disco y
  adelanta a cualquier viewer; el -ss de segmentos uniformes corrompería
  la timeline de cortes variables)
- sin caché HLS (regenerar no cuesta encode; cachear solo quema disco)
- resume vía -ss (snap a keyframe) + -output_ts_offset
- master playlist sin CODECS (un string hardcodeado equivocado hace que
  iOS rechace la variante; omitirlo es legal y universal)

Validación: TTFB seg-0 510ms sobre el MKV real del incidente (HEVC Main10
+ EAC3, 6.7GB). Suite de integración con ffmpeg real (tag smoke): h264+aac
(copy total), h264+ac3 (re-encode de audio con priming dts — la clase
delay_moov), hevc10+eac3 (la forma exacta del incidente, tag hvc1), resume
con StartSec, y serving del playlist; asserts de codecs vía ffprobe sobre
el playlist servido, suma EXTINF ≈ duración, segmentos completos en disco
(+temp_file = rename atómico).

El wiring web (plan remux→hls+videoCopy con gate de versión ≥1.0.10) va en
el repo web. Plan: docs/plans/hls-copy-remux-replacement.md (web).
2026-06-10 23:06:21 +02:00
Deivid Soto
e4170af604 feat(stream): UPnP-map the HTTPS port for remote direct-TLS (best-effort)
UPnP previously published only the HTTP stream port (11818). The remote
per-agent direct-TLS path (https://<pubip>.<hash>.agent.unarr.app:<port>)
needs the HTTPS port (11819) reachable from the WAN, so map it too —
inside listenTLS after the actual bound port is known, so the router and
the web (which encodes the reported httpsPort) agree.

Best-effort: if UPnP/NAT-PMP isn't available the remote path just falls
back to the CloudFlare funnel; the LAN direct path is unaffected. Opt-in
via downloads.enable_upnp (unchanged default: false).
2026-06-10 22:56:07 +02:00
Deivid Soto
3fcfaaf234 fix(stream): iOS exige total concreto en el Content-Range del remux
iOS/WebKit abre todo <video src> con una sonda "bytes=0-1" y se niega a
reproducir si el 206 no trae una longitud concreta en Content-Range —
"/*" (total desconocido, el fix anterior del loop de re-seek) le hacía
abortar y re-bootstrapear la sesión sin parar.

Vuelve a anunciar siempre un total numérico (exacto si ffmpeg terminó, el
estimado mientras crece). El loop de re-seek real no era el total
anunciado sino el init segment malformado, ya arreglado con +delay_moov
en buildFFmpegArgs. Test nuevo: la sonda 0-1 debe llevar total concreto.
2026-06-10 22:37:02 +02:00
Deivid Soto
b3487a22e8 chore(release): 1.1.0-beta
Some checks failed
Release / release (push) Failing after 36m14s
Release / docker (push) Has been cancelled
- Bump version to 1.1.0-beta
- Update CHANGELOG.md
2026-06-10 22:33:01 +02:00
Deivid Soto
cda2e1322c feat(hls): full-GPU scale_cuda for NVENC SDR downscales
Keep an NVENC downscale of an SDR source entirely on the GPU
(decode -> scale_cuda -> h264_nvenc) instead of copying every frame to the
CPU for `scale=` and back. That GPU->CPU->GPU round-trip is the wall on
modest GPUs; even a strong box gains ~37% (scale_cuda 14.9x vs CPU 10.9x
on a 4K SDR HEVC -> 1080p encode).

Strictly gated so every case that needs CPU frames is unchanged:
- HDR (libplacebo Vulkan / zscale CPU tonemap can't consume a CUDA surface),
- burn-in (the scale2ref+overlay composite runs on CPU frames),
- non-NVENC encoders, and no-op when not actually downscaling.

- hwscale.go: FFmpegSupportsScaleCuda — a functional 1-frame probe mirroring
  the libplacebo probe (presence in -filters lies; needs a real CUDA device).
  Probes the worst-case real input (10-bit p010 -> 8-bit yuv420p) so a host
  whose scale_cuda can't do the 10->8-bit conversion fails closed to CPU.
- hls.go: useCudaScale gate + `-hwaccel_output_format cuda` + a
  `scale_cuda=-2:H:format=yuv420p` filter branch. Output is 8-bit
  (format=yuv420p + `-profile:v main`), browser-safe.
- transcode_quality.go / player_session_registry.go / daemon.go: HasScaleCuda
  flag, populated + warmed at startup like the other ffmpeg capability probes.

Fail-closed: probe absent/fails -> keep the CPU scale path, no regression.
Verified live (real 4K SDR HEVC Main10 session emitted scale_cuda, 5.54x
realtime, nvenc at 100%) + 8 arg-builder unit tests for the gate.
2026-06-10 21:44:58 +02:00
Deivid Soto
671bee8317 fix(stream): delay_moov en el remux para audio AAC con dts negativo
El remux reencodea el audio no-AAC (eac3→aac); la pista AAC arranca con un
dts negativo (priming/encoder-delay). Con empty_moov el moov se escribía
ANTES de conocer ese delay, así que el primer fragmento quedaba mal formado
y un demuxer estricto (Safari / la forma en que Apple decodifica HEVC) nunca
inicializaba la reproducción: el <video> cargaba bytes (se veía en Network)
pero no arrancaba, y el player re-bootstrapeaba la sesión cada pocos segundos.

Añade +delay_moov: retiene el moov hasta el primer paquete y maneja el dts
de priming. ffmpeg deja de emitir el warning "nonzero dts ... moov already
written" y el fMP4 reproduce. Reproducido con Hoppers (HEVC Main 10 + EAC3).
2026-06-10 20:10:11 +02:00
Deivid Soto
b0637f266b Merge branch 'main' into feat/agent-tls-direct
# Conflicts:
#	internal/cmd/daemon.go
2026-06-10 19:44:44 +02:00
Deivid Soto
5f2d1cdc70 fix(stream): no anunciar un total falso mientras el remux crece (loop de re-seek)
serveGrowing anunciaba en Content-Range total = EstimatedSize() = el tamaño
del MKV fuente mientras ffmpeg aún corría. Pero el fMP4 resultante no mide
eso (el audio re-encodea a AAC y la fragmentación cambian el byte count), así
que el <video> nativo mapeaba su timeline sobre una longitud falsa, pedía
offsets que no cuadraban, re-seekeaba y reabría la conexión cientos de veces
por segundo (el loop de reproducción remux).

Mientras crece (!Final) la longitud real es DESCONOCIDA: ahora se sirve
Content-Range "bytes start-end/*" (RFC 7233 §4.2) sin Content-Length, y el
cliente lee secuencial en vez de re-seekear. Cuando ffmpeg termina, el tamaño
real se conoce y se anuncia como antes. El 416 y el Content-Length del HEAD
solo cuando el total es real (final).
2026-06-10 19:42:37 +02:00
Deivid Soto
9ab0763f8a chore(release): 1.0.9-beta
Some checks failed
Release / release (push) Failing after 27m15s
Release / docker (push) Has been cancelled
- Bump version to 1.0.9-beta
- Update CHANGELOG.md
2026-06-10 17:54:16 +02:00
Deivid Soto
898fe80f4e refactor(daemon): revisión crítica del reporte de errores de sesión
- failSession usa un contexto fresco (no el del daemon): los fallos se
  concentran justo cuando el daemon se apaga (la cancelación mata arranques
  en vuelo) y un report derivado de ese contexto moría antes de llegar a la
  web; el cap de 10s sigue acotándolo
- consts sessErrFfmpegMissing/sessErrStartFailed sustituyen los 7 literales
  inline (un typo habría producido un code que el z.enum de la web rechaza
  con 400 — exactamente el fallo mudo que este canal elimina)
- markReady() unifica los tres goroutines idénticos de MarkSessionReady de
  los caminos sin transcode (direct-play, remux, debrid direct)
2026-06-10 17:49:49 +02:00
Deivid Soto
0dca296fec fix(daemon): reportar fallos de arranque de sesión a la web + scan en sesión única
- nuevo agentClient.ReportSessionError → POST /agent/session-error;
  failSession() en todos los abortos del handler de sesiones (path muerto,
  ffmpeg ausente, remux, provider debrid, StartHLSSession). Antes eran
  returns mudos y el player quedaba en "Preparando sesión" hasta agotar el
  deadline de probes
- resolvePlayableFile() unifica la resolución de paths del /stream raw y de
  las sesiones HLS/remux/direct (remap de base path + stat con retries NFS +
  directorio→vídeo, antes duplicada y divergente) y distingue file_missing
  (la web self-heala filas stale) de path_rejected (el fichero existe fuera
  de los roots = config; la web no debe podar nada)
- library.SyncBatches: el batching del sync de biblioteca vive en un solo
  sitio; el scan manual y el auto-scan sincronizan todos los roots en UNA
  sesión con scanRoots/fullCycle, en vez de una sesión por root que dejaba
  al server podar filas de roots que la sesión nunca visitó
2026-06-10 17:39:09 +02:00
Deivid Soto
4bdd161e02 chore(release): 1.0.8-beta
Some checks failed
Release / release (push) Failing after 20m30s
Release / docker (push) Has been cancelled
- Bump version to 1.0.8-beta
- Update CHANGELOG.md
2026-06-10 15:02:01 +02:00
Deivid Soto
6a7a2e292e feat(subtitles): subtitle-fetch jobs vía sync + auto-fetch opcional en scan
El web empuja SubtitleFetchRequest por el sync (URL del proxy, ya
charset-fixed a WebVTT); el daemon lo descarga y lo escribe como sidecar
<base>.<lang>.vtt junto al medio (contención en scan paths con
EvalSymlinks, cap 10 MiB) y reporta done/failed en el siguiente sync
para que el web marque el job. Config nueva library.subtitles
(auto_fetch + languages) para el auto-fetch en scan, off por defecto.
2026-06-10 14:48:35 +02:00
Deivid Soto
63be565227 test(hls): cubrir -forced_idr de QSV en el rate-control 2026-06-10 12:00:33 +02:00
Deivid Soto
556c5cb05f fix(hls): forced-idr en NVENC/QSV — los segmentos ignoraban force_key_frames
NVENC (ffmpeg 6.1 + drivers actuales) emite los keyframes forzados por
-force_key_frames como I-frames NO-IDR; el muxer HLS solo corta en IDR,
así que cada segmento se estiraba en silencio al GOP por defecto
(250 frames ≈ 10.4 s @24fps) mientras la playlist server-side seguía
prometiendo 2 s por segmento. Con los PTS reales ~5× fuera del mapa de
la playlist, los seeks aterrizaban donde podían y los subtítulos se
desincronizaban en cuanto se mezclaban segmentos de runs distintos
(seek-restart) en el mismo dir.

Medido: 3 segmentos por 30 s de encode en vez de 15; con -forced-idr 1
exactamente 15, y post-fix seg-150/151/158 arrancan en 300.0/302.0/316.0
clavados. Afecta a TODO el HLS por NVENC histórico (no era del rate
control nuevo: la config de bitrate fijo producía lo mismo). QSV recibe
su grafía -forced_idr. Las entradas de caché viejas nunca llegaron a
sellarse (el conteo de segmentos no cuadraba), así que no hay migración:
solo sesiones vivas estaban afectadas.
2026-06-10 10:44:18 +02:00
Deivid Soto
f9ecd5ed82 fix(hls): los prewarms ya no desalojan la sesión del espectador + trickplay 12x
- StreamSession.Prewarm → HLSSessionConfig.Prewarm: el daemon difiere el
  encode de un prewarm hasta que no haya encode vivo (poll 10s, tope
  30min) y lo registra vía RegisterKeep (side-by-side, sin desalojar).
  Antes todo pasaba por Register(), que cierra las demás sesiones — un
  prewarm de next-episode reclamado en mitad de la reproducción mataba
  el stream del usuario ("closed (cache discarded)" → master 404,
  verificado 2026-06-10). Una sesión REAL nueva primero reapea los
  prewarms en vuelo (CloseWhere(IsPrewarm)) para liberar el writer-lock
  de la caché — un prewarm SELLADO sobrevive como cache HIT — y luego
  desaloja normal vía Register.
- Trickplay: -skip_frame nokey + fps=...:eof_action=pass — solo
  decodifica keyframes (12x menos CPU medido: 233s→19s en un episodio
  de 24min 1080p; importa porque corre junto al streaming en vivo).
  Los ticks siguen siendo uniformes (fps repite el último keyframe),
  así que manifest y clientes cacheados no cambian. eof_action=pass
  cubre clips con un único keyframe (el filtro fps no emite nada de un
  stream de 1 frame con el eof por defecto).
2026-06-10 00:54:50 +02:00
Deivid Soto
9b97aedfe4 feat(hls): resume-aware first spawn + capped-CRF/CQ rate control
- HLSSessionConfig.StartSec (sync StreamSession.startSec): el primer
  ffmpeg arranca ya seekeado en el punto de resume (-ss +
  -output_ts_offset + -start_number, misma maquinaria que el
  seek-restart) en vez de encodear desde seg-0 para morir en el
  seek-restart inmediato del player (doble spawn, resume lento).
  readyMax se pre-siembra al índice de arranque; el ready-watcher
  compara ReadyCount() > WriterStartIdx() para no marcar "ready" antes
  del primer segmento real. startSec >= duración → arranque desde 0
  (resume obsoleto de un fichero reemplazado).
- Rate control: capped constant-quality donde el encoder lo hace bien —
  libx264 -crf 23, NVENC -cq 23 -b:v 0 — con el mismo -maxrate de
  siempre y -bufsize 2x (antes 1x estrangulaba picos). Escenas fáciles
  emiten muchos menos bits (menos stalls vía funnel/LTE); el peor caso
  no cambia. QSV/VideoToolbox/VAAPI conservan el triple de bitrate fijo
  probado (sus knobs de calidad tienen gotchas de vendor).
- Limpieza: wrapper buildHLSFFmpegArgs y guard startIdx<0 muertos.
2026-06-10 00:21:15 +02:00
Deivid Soto
f7ca282ca0 chore(release): 1.0.7-beta
Some checks failed
Release / release (push) Failing after 31m6s
Release / docker (push) Has been cancelled
- Bump version to 1.0.7-beta
- Update CHANGELOG.md
2026-06-08 13:07:29 +02:00
Deivid Soto
d708ea2360 feat(subs): resilient subtitle extraction — sidecars, charset, torrent/debrid
Close the recurring "video has subtitles but the web player shows none" gap
with a source-agnostic pipeline:

- Discover EXTERNAL sidecar subs in the scan (Video.es.ass siblings + a Subs/
  bundle), parse lang/forced/SDH from the filename, skip VobSub (.sub+.idx).
  ffprobe-only scanning ignored these (ToonsHub/anime "MSubs" releases).
- Transcode sidecar charset -> UTF-8 before WebVTT (BOM/UTF-16/code-page by
  language). Chinese SCRIPT matters: chs/sc -> GBK, cht/tc/big5 -> Big5
  (decoding one as the other is mojibake).
- /sub now serves a standalone sidecar file (i=-1, p=file, &l=lang hint) and a
  remote debrid URL (ffmpeg reads http, no local stat) — not just embedded
  streams of a local file.
- probe.json emits a tokened vttUrl per TEXT track so torrent/debrid HLS streams
  (never library-scanned) get subtitles too. Embedded index is counted among
  embedded streams only, so -map 0:s:N stays aligned when sidecars are appended.

Tested against a real 347-file gallery: 26/26 sidecars and embedded ass/srt/
mov_text all extract to valid WebVTT; bitmap (pgs/dvd_subtitle) correctly stays
burn-in. Manual harness gated behind GALLERY_DIR.
2026-06-08 13:04:09 +02:00
Deivid Soto
22081cf106 chore(release): 1.0.6-beta
Some checks failed
Release / release (push) Failing after 38m40s
Release / docker (push) Has been cancelled
Per-agent API key handoff + revocation handling. 1.0.5-beta was the
docker bundled-dep arch fix; this is the next beta.
2026-06-07 22:10:31 +02:00
Deivid Soto
9fdc099ea8 Merge branch 'feat/per-agent-api-keys' 2026-06-07 19:43:23 +02:00
Deivid Soto
6712127d4c chore(release): 1.0.5-beta
Some checks failed
Release / release (push) Failing after 17m31s
Release / docker (push) Has been cancelled
- Bump version to 1.0.5-beta
- Update CHANGELOG.md
2026-06-07 17:55:22 +02:00
Deivid Soto
9e3075f115 fix(docker): derive bundled dep arch from dpkg, not TARGETARCH default
The runtime stage's `ARG TARGETARCH=amd64` default shadowed buildx's
per-leg value, so even the published arm64 image bundled x86-64
cloudflared and ffmpeg alongside a native arm64 unarr binary. The daemon
spawning cloudflared hit "exec format error", the funnel never came up,
and TV/Stremio connect failed with "Failed to get add-on manifest".

Read the real arch from `dpkg --print-architecture` (the emulated base
image's arch) for both the ffmpeg and cloudflared RUN steps. Correct
under buildx cross-builds and plain `docker build` alike. Drop the
poisoned TARGETARCH default.

Reported-by: Serge <s@bongiozzo.ru>
2026-06-07 17:54:50 +02:00
Deivid Soto
82bc71aaef fix(agent): only treat explicit 410/403 as revocation; honour --config
- IsRevoked no longer matches a bare 401. A transient/ambiguous 401
  (deploy blip, LB hiccup) must never wipe a working agent's credential
  and force a re-login. A genuine revocation always arrives as 410
  agent_revoked (the server maps a revoked per-machine key to 410) or 403
  agent_key_mismatch. Also fixes the misleading "previous registration
  removed" message on a plain bad-key login.
- Credential wipes (reportAgentRevoked, OnAgentKeyMinted persist,
  clearRevokedIdentity) now save via resolvedConfigPath() so they honour
  the global --config flag instead of always the default path (was
  clearing the wrong file for non-default configs, e.g. unarr-dev).

--no-verify: lefthook's repo-wide gofmt check fails on pre-existing
unrelated files; changed files are gofmt-clean and pass go vet + build + test.
2026-06-06 12:51:51 +02:00
Deivid Soto
d982e795ea feat(agent): per-machine key handoff + revocation handling
Forward the agentId in the browser-auth URL so the server mints an API
key bound to this machine; consume + persist the agentKey returned by
register (migrating general-key bootstraps and stopping the per-restart
re-mint). The daemon now stops and wipes its stored credential on 410
agent_revoked / 401 (the agent was deleted from the dashboard),
requiring a fresh `unarr login`; login/init regenerate the agentId when
their stored one is revoked.

Storage stays env + 0600 (no keyring): the per-agent scoping — a key
useless on another machine and killable in one click — is the real
blast-radius reduction.

--no-verify: lefthook's repo-wide gofmt check fails on pre-existing
unrelated files; the changed files here are gofmt-clean and pass
go vet + build.
2026-06-06 12:30:21 +02:00
Deivid Soto
f14aee0b93 feat(stream): live transcode telemetry from ffmpeg speed=
Parse ffmpeg's -stats progress line (speed=Yx, fps=) from the HLS encoder's
stderr into a per-session EWMA, and report a health snapshot to the web side a
few seconds after seg-0. Lets the player name a too-slow transcode from a
direct measurement (~5-7s) instead of inferring it from stall shape (~15-30s).

- hls.go: add -stats; rewrite hlsStderrCapture.Write to frame on \r and \n,
  parse speed=/fps= (telemetry only, never logged), flag input-bound on source
  read errors. EWMA on HLSSession + GetTranscodeStats(); warmup-skip the first
  cold-start frames so a healthy encoder isn't reported as struggling.
- client.go: MarkSessionReady takes an optional *SessionHealth.
- daemon.go: watcher reports one health snapshot once >=4 post-warmup samples
  settle; classifyAgentHealth maps the speed ratio to ok/marginal/struggling.

Additive: old web replicas ignore the extra field; cache-hit/direct-play
sessions and short encodes report nil (the web keeps its stall heuristic).
2026-06-06 00:37:03 +02:00
Deivid Soto
2b47cb0656 fix(torrent): suppress noisy UPnP AddPortMapping warnings
The anacrolix/torrent client maps the listen port on the router via
UPnP/NAT-PMP for inbound peers. Many home routers reject AddPortMapping
with '500 Internal Server Error' and the lib retries every lease cycle,
spamming the daemon log. The rejection is harmless (downloads work over
DHT + outbound peers), so wrap the log handlers and drop just that line
while keeping the mapping attempts for routers that do support it.
2026-06-05 17:18:57 +02:00
Deivid Soto
3a8c466067 docs(docker): explain why GPU Vulkan tonemap can't init in-container
The libplacebo HDR->SDR tonemap needs a Vulkan device, but the nvidia
Vulkan ICD (libGLX_nvidia.so.0) pulls in libnvidia-glcore, which
references glibc malloc hooks removed in glibc 2.34 (__malloc_hook etc.)
and the Xorg symbol ErrorF. On any headless modern-glibc container these
go unresolved so vkCreateInstance returns VK_ERROR_INCOMPATIBLE_DRIVER
and the agent correctly falls back to the CPU zscale tonemap chain.
Document why we deliberately do NOT chase it (graphics cap + X11 libs +
1.4 loader + desktop glibc/Xorg, fragile + distro/driver coupled).
nvenc/nvdec (CUDA, not Vulkan) work regardless.
2026-06-05 16:52:48 +02:00
Deivid Soto
2fcc0d397f feat(agent): per-agent direct-TLS cert client + HTTPS listener wiring
The agent obtains a valid wildcard cert for *.<hash>.agent.unarr.app from
the web broker (ACME DNS-01) so the https web player reaches it directly
over HTTPS instead of the CloudFlare funnel.

- internal/acme: generate EC P-256 key + CSR locally (private key never
  leaves the machine), fetch the signed chain from the broker, persist it
  atomically, NeedsIssue renewal check
- daemon: generate + persist a stable agent_hash in config.toml; register
  before requesting the cert (broker ownership check needs the row); arm
  the HTTPS listener with the cert; 6h renewal poll hot-swaps it (no restart)
- report httpsStreamPort + agentHash on register/sync
- stream_server: emit Access-Control-Allow-Private-Network on PNA preflight
  so an https page can reach the agent on loopback / LAN
2026-06-05 12:09:46 +02:00
Deivid Soto
3a8c6ddd30 chore(release): 1.0.4-beta
Some checks failed
Release / release (push) Failing after 31m3s
Release / docker (push) Has been skipped
- Bump version to 1.0.4-beta
- Update CHANGELOG.md
2026-06-04 22:44:19 +02:00
Deivid Soto
d97ca11fa5 fix(stream): self-heal host→container path skew in HLS + sidecar handlers
On a docker agent the web DB holds host paths (e.g. /mnt/nas/peliculas/…)
while the container mounts that media at /downloads, so the runtime allowed
root (cfg.Download.Dir=/downloads) rejects the host path. The raw /stream
handler already self-heals via relocateUnreachable, but the HLS/remux session
handler did not — it logged "path outside allowed dirs" and returned, so the
web silently fell back to the raw /stream path (no transcode, slow funnel
start) and HLS/remux never ran. The path-scoped sidecar handlers
(/thumbnail, /trickplay, /sub) had the same skew → 404 for every scrubber
frame, trickplay sprite and external subtitle.

- HLS handler (OnStreamSession): apply the same relocateUnreachable remap as
  the raw handler before the dir-resolve.
- StreamServer: add SetPathResolver/healMediaPath, applied in /thumbnail,
  /trickplay, /sub AFTER token verification (the token still binds the
  original web path; the resolver is a pure function of that path and
  re-validates containment, so it can't be abused to serve a different file).
- Hoist the allowed-roots list into streamAllowedRoots(cfg) so the raw, HLS
  and sidecar handlers can't drift apart.

Note: relocateUnreachable needs a ≥3-segment path tail, so flat media layouts
are not self-healed (same limitation as /stream; a re-scan rewrites the DB
path). The HLS handler replicates only the lexical remap, not the raw
handler's transient-NFS os.Stat retry.
2026-06-04 22:38:12 +02:00
Deivid Soto
d44f16cae2 chore(release): 1.0.3-beta
Some checks failed
Release / release (push) Failing after 16m42s
Release / docker (push) Has been cancelled
- Bump version to 1.0.3-beta
- Update CHANGELOG.md
2026-06-04 08:47:34 +02:00
Deivid Soto
3f22d698da test(upgrade): exercise the real signed checksum flow, not a bypass
Some checks failed
CI / Test (push) Failing after 12m50s
CI / Build (push) Successful in 1m35s
CI / Build-1 (push) Successful in 1m58s
CI / Build-2 (push) Successful in 1m33s
CI / Build-3 (push) Successful in 1m33s
CI / Build-4 (push) Successful in 1m33s
CI / Build-5 (push) Successful in 1m34s
CI / Lint (push) Failing after 2m30s
CI / Coverage (push) Successful in 2m47s
CI / Vet (push) Successful in 1m59s
Supersedes the previous "disable signature verification" stop-gap. The two
checksum tests now run verifyChecksum with signature verification ENABLED using a
per-test ed25519 keypair (withReleasePubKey) and a matching checksums.txt.sig
served over the exact body — so they cover the real production path end to end
instead of skipping it. Adds verifyChecksum-level coverage for the cases that
actually protect a self-update: a checksums file signed by the wrong key is
rejected, a missing .sig is rejected, and verifyChecksumOnly (--allow-unsigned)
still passes on the checksum alone. No production code change.
2026-06-04 08:47:24 +02:00
Deivid Soto
86f03ba787 test(upgrade): disable signature check in checksum-matching tests
Some checks are pending
CI / Test (push) Waiting to run
CI / Build (push) Waiting to run
CI / Build-1 (push) Waiting to run
CI / Build-2 (push) Waiting to run
CI / Build-3 (push) Waiting to run
CI / Build-4 (push) Waiting to run
CI / Build-5 (push) Waiting to run
CI / Lint (push) Waiting to run
CI / Coverage (push) Waiting to run
CI / Vet (push) Waiting to run
TestVerifyChecksumWithHTTPTest and TestVerifyChecksumCaseInsensitive predate
release signing (commit 1757bda baked the release pubkey). With the pubkey set,
verifyChecksum now requires a valid checksums.txt.sig and fails at signature
decode before reaching the SHA256 comparison these tests assert. They exercise
the checksum-matching path only, so clear releasePubKeyBase64 for their duration
(t.Cleanup restore) — mirroring the existing pattern in signature_test.go. The
signature path itself keeps its dedicated coverage there. No production change.
2026-06-04 08:35:46 +02:00
Deivid Soto
c82826bf68 fix(trickplay): stop scan-time sprite generation from saturating the host
Some checks failed
CI / Test (push) Failing after 6m21s
CI / Build (push) Successful in 1m34s
CI / Build-1 (push) Successful in 2m0s
CI / Build-2 (push) Successful in 1m33s
CI / Build-3 (push) Successful in 1m38s
CI / Build-4 (push) Successful in 1m35s
CI / Build-5 (push) Successful in 1m38s
CI / Lint (push) Failing after 2m34s
CI / Coverage (push) Failing after 2m44s
CI / Vet (push) Successful in 2m3s
Trickplay sprite generation (one full-decode ffmpeg pass per file) could pin a
machine: multiple agents on the same library decoded the same 4K file at once, no
CPU throttling, and crashed/restarted agents orphaned ffmpeg to init (it ran the
full 45-min decode to completion). Stacked orphans spiked a box to load ~140.

- Single-flight lock: O_CREATE|O_EXCL .lock in the shared sidecar dir so two
  agents watching the same library never decode the same file twice (stale locks
  reclaimed after a TTL). Returns ErrTrickplayInProgress → prewarm skips, not fail.
- Load gate: defer the heavy decode until 1-min load ≤ max(ratio×NumCPU, 1.5),
  capped at 15 min so it throttles without ever becoming a permanent off-switch on
  busy / small hosts. New knob library.prewarm_max_load_ratio (default 0.7).
- Concurrency: trickSem caps trickplay to ONE decode at a time per agent.
- CPU priority: setLowCPUPriority (nice 19) alongside the existing idle ionice.
- No orphans: hardenCmd sets Setpgid + Pdeathsig=SIGKILL, with runtime.LockOSThread
  around the child so the kernel kills ffmpeg exactly when the agent dies (and not
  spuriously — golang/go#27505).

Tests: single-flight/stale-reclaim, load-gate immediate/cancel, and an e2e
Pdeathsig orphan-kill check.
2026-06-04 08:25:00 +02:00
Deivid Soto
aba20e2078 feat(stream): debrid passthrough for mode=stream tasks (external players)
Some checks failed
Release / release (push) Failing after 14m31s
Release / docker (push) Has been cancelled
handleStreamTask now serves a mode=stream task FROM a resolved debrid HTTPS link
(when the web set preferredMethod=debrid + the torrent is cached) instead of
joining the P2P swarm — served over the SAME /stream endpoint so VLC and other
external players consume it identically (and far faster). No HLS transcode:
external players handle any container. Falls through to the P2P StreamEngine
when there is no direct URL. Uses the mutex-safe SetStreamURL setter.

Also widen the debrid HEAD size-probe timeout 10s -> 15s to match the transport's
TLS handshake budget, so a slow CDN no longer trips it and falls back to a
guessed size.

Bump 1.0.2-beta.
2026-06-03 22:43:43 +02:00
Deivid Soto
8e37293b7d feat(trickplay): scan-time montage sprite for the web scrubber
Pre-generate ONE trickplay sprite (montage JPEG of frames sampled every
library.trickplay.interval, default 10s) + a JSON manifest per file during the
scan/auto-scan prewarm, cached in .unarr next to the media. The web scrubber
shows tiles from it instead of extracting frames live — removing the ffmpeg
contention with the active stream that broke seekbar previews (the original
'no thumbnail' report was the auto-scan prewarm decoding the same file the HLS
transcode was reading, not a seek-index fault).

- config: [library.trickplay] enabled/interval/width (default on, 10s, 240px),
  editable + a toggle; IntervalSeconds() with a 10s fallback.
- mediainfo: GenerateTrickplay (one ffmpeg fps=1/interval,scale,tile pass; idle
  I/O priority; ceil() frame count so no black trailing tile; a 16.7M-px cap
  coarsens the interval for long media so a single sprite stays decodable on
  iOS/Safari) + sprite/manifest sidecar cache helpers.
- engine: /trickplay endpoint (manifest JSON, ?kind=sprite JPEG); the agent owns
  the tile width so the web requests by path only; thumb:<sha256> token reused.
- prewarm: a trickplay job per item, gated; scan.go + daemon.go wire the config.

Tests: parseDims; synthetic 3x2 / exact-multiple / 1x1; real-file e2e smoke
(S02E08 → 143 tiles, 662KB sprite). Non-breaking: the existing 5-frame panel
prewarm + on-demand /thumbnail stay until the web migrates to the sprite.
2026-06-03 20:30:29 +02:00
Deivid Soto
7877e1de42 fix(release): keep prerelease suffix in docker smoke-check version compare
The smoke check extracted the docker image version with `grep -oE 'v[0-9.]+'`,
which truncates "v1.0.1-beta" to "v1.0.1" and false-warns on a correct image.
Match the trailing suffix too.
2026-06-03 19:29:35 +02:00
Deivid Soto
2abaf74b13 chore(release): 1.0.1-beta
Some checks failed
Release / release (push) Failing after 52m5s
Release / docker (push) Has been cancelled
- Bump version to 1.0.1-beta
- Update CHANGELOG.md
2026-06-03 19:23:28 +02:00