fix(transcoder): force aac stereo 48khz + frag_duration for mse compat

Transcoder changes for browser MediaSource Extensions (MSE) parsing:

1. -ar 48000 -ac 2 on the audio output. Source 5.1 / 7.1 streams produced
   a moov atom that Chrome's CHUNK_DEMUXER refuses to parse, even when the
   video metadata is fine and a non-MSE <video> element accepts the same
   file. Forcing AAC-LC stereo at 48 kHz makes the resulting moov
   MSE-compatible.

2. -frag_duration 1000000 (microseconds, i.e. 1 second) so each moof+mdat
   fragment holds at most ~1 s of media. Without it, ffmpeg only splits at
   keyframes, and high-bitrate 1080p produces mdat boxes of 8 MiB or more;
   MSE waits for the whole mdat before parsing the first fragment, so
   playback never starts.

3. -movflags +negative_cts_offsets so B-frames carry correct PTS/DTS
   offsets and the playhead doesn't reset at every fragment.

4. New range_req debug log to make sizing bugs greppable.
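Item 4's log is not part of the hunk shown here, so its real format is unknown. As a hedged illustration of a greppable range log, this sketch (Go 1.21+ for log/slog and the min/max builtins) parses a single-range Range header against a known size and emits one structured debug record per request; `parseRange`, `logRangeReq` and every attribute name are hypothetical.

```go
package main

import (
	"fmt"
	"log/slog"
	"os"
	"strconv"
	"strings"
)

// parseRange parses a single "bytes=..." Range header against a resource of
// the given size and returns the inclusive byte span. It accepts "a-b", "a-"
// and suffix "-n" forms and rejects multi-range requests.
func parseRange(h string, size int64) (start, end int64, ok bool) {
	spec, found := strings.CutPrefix(h, "bytes=")
	if !found || size <= 0 || strings.Contains(spec, ",") {
		return 0, 0, false
	}
	a, b, found := strings.Cut(spec, "-")
	if !found {
		return 0, 0, false
	}
	if a == "" { // suffix range: the last n bytes
		n, err := strconv.ParseInt(b, 10, 64)
		if err != nil || n <= 0 {
			return 0, 0, false
		}
		return max(size-n, 0), size - 1, true
	}
	s, err := strconv.ParseInt(a, 10, 64)
	if err != nil || s < 0 || s >= size {
		return 0, 0, false // first byte past the end: unsatisfiable
	}
	e := size - 1
	if b != "" {
		if e, err = strconv.ParseInt(b, 10, 64); err != nil || e < s {
			return 0, 0, false
		}
		e = min(e, size-1) // clamp an over-long last byte to the end
	}
	return s, e, true
}

// logRangeReq writes one "range_req" debug record per request.
func logRangeReq(l *slog.Logger, header string, size int64) {
	start, end, ok := parseRange(header, size)
	l.Debug("range_req", "header", header, "size", size, "start", start, "end", end, "ok", ok)
}

func main() {
	l := slog.New(slog.NewTextHandler(os.Stdout, &slog.HandlerOptions{Level: slog.LevelDebug}))
	logRangeReq(l, "bytes=0-1023", 4096)
	logRangeReq(l, "bytes=5000-", 4096) // past the end: logged with ok=false
	fmt.Println(parseRange("bytes=-100", 4096))
}
```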
Deivid Soto 2026-05-07 14:59:43 +02:00
parent 27fe84f2a0
commit 81abc4acca


@@ -259,18 +259,34 @@ func buildFFmpegArgs(filePath string, opts TranscodeOpts) []string {
filterChain = "format=yuv420p,setparams=colorspace=bt709:color_trc=bt709:color_primaries=bt709:range=tv"
}
args = append(args, "-vf", filterChain)
-args = append(args, "-c:a", "aac", "-b:a", coalesce(opts.AudioBitrate, "192k"))
// Force AAC-LC stereo 48 kHz so MSE's CHUNK_DEMUXER accepts the moov.
// 5.1 / 7.1 source streams produce a moov shape that MSE refuses to
// parse (the <video src=blob:> demuxer is more forgiving), so we
// always downmix to stereo and resample to 48 kHz here. Sources that
// are already stereo 48 kHz are unchanged apart from the (lossy) AAC
// re-encode.
args = append(args,
"-c:a", "aac",
"-b:a", coalesce(opts.AudioBitrate, "192k"),
"-ar", "48000",
"-ac", "2",
)
}
// Common output flags: fragmented MP4 to a single pipe.
// NO faststart: that flag moves the moov atom to the front of the file
// in a SECOND pass after encoding finishes, so the browser would never
// see a moov until ffmpeg exits. faststart is only safe for
// already-finished files.
//
// * frag_keyframe: start a new fragment at each video keyframe.
// * empty_moov + default_base_moof: write a header-only init segment
//   (a placeholder moov) up front so MSE can start decoding the very
//   first fragment before the file is finished.
// * frag_duration=1s: also cap each moof+mdat at ~1 second of media.
//   Without this, ffmpeg only splits at keyframes, which on a
//   high-bitrate 1080p stream produces mdat boxes of 8 MiB or more;
//   MSE can't parse the first fragment until the whole mdat lands,
//   so playback never starts.
// * negative_cts_offsets: lets B-frames carry correct PTS/DTS
//   (negative composition offsets are allowed) so decoders don't
//   reset the playhead to 0 at every fragment.
args = append(args,
-"-movflags", "frag_keyframe+empty_moov+default_base_moof",
"-movflags", "+frag_keyframe+empty_moov+default_base_moof+negative_cts_offsets",
"-frag_duration", "1000000",
"-f", "mp4",
"pipe:1",
)
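The argument tail this hunk builds can be pinned down with a small regression check, so a later edit cannot silently reintroduce faststart or drop the stereo 48 kHz audio flags. A minimal sketch: `audioArgs`, `outputArgs`, `flagValues` and `checkMSEArgs` are hypothetical helpers, and `coalesce` is an assumed stand-in for the helper the diff calls.

```go
package main

import (
	"fmt"
	"strings"
)

// coalesce returns the first non-empty argument (assumed stand-in).
func coalesce(vals ...string) string {
	for _, v := range vals {
		if v != "" {
			return v
		}
	}
	return ""
}

// audioArgs mirrors the audio flags in the diff: AAC at the requested
// bitrate (192k default), resampled to 48 kHz and downmixed to stereo.
func audioArgs(bitrate string) []string {
	return []string{"-c:a", "aac", "-b:a", coalesce(bitrate, "192k"), "-ar", "48000", "-ac", "2"}
}

// outputArgs mirrors the fragmented-MP4 output flags in the diff.
func outputArgs() []string {
	return []string{
		"-movflags", "+frag_keyframe+empty_moov+default_base_moof+negative_cts_offsets",
		"-frag_duration", "1000000", // microseconds: 1 s per fragment
		"-f", "mp4",
		"pipe:1",
	}
}

// flagValues returns every value that follows the given flag in args.
func flagValues(args []string, name string) []string {
	var vals []string
	for i := 0; i+1 < len(args); i++ {
		if args[i] == name {
			vals = append(vals, args[i+1])
		}
	}
	return vals
}

// checkMSEArgs requires every -movflags value to include empty_moov and
// exclude faststart, and exactly one "-ar 48000" and one "-ac 2".
func checkMSEArgs(args []string) error {
	mf := flagValues(args, "-movflags")
	if len(mf) == 0 {
		return fmt.Errorf("no -movflags")
	}
	for _, v := range mf {
		if strings.Contains(v, "faststart") {
			return fmt.Errorf("faststart defers the moov until ffmpeg exits: %q", v)
		}
		if !strings.Contains(v, "empty_moov") {
			return fmt.Errorf("-movflags %q lacks empty_moov", v)
		}
	}
	for _, c := range []struct{ flag, want string }{{"-ar", "48000"}, {"-ac", "2"}} {
		if vals := flagValues(args, c.flag); len(vals) != 1 || vals[0] != c.want {
			return fmt.Errorf("%s should be %s, got %v", c.flag, c.want, vals)
		}
	}
	return nil
}

func main() {
	args := append(audioArgs(""), outputArgs()...)
	fmt.Println(strings.Join(args, " "))
	fmt.Println(checkMSEArgs(args)) // <nil>
}
```

Checking the argument slice rather than running ffmpeg keeps the check fast and deterministic; it cannot tell whether a given browser accepts the output, only that the flags described in this commit are still present.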