Fix an occasional DTS overlap by
closing the filtergraph after each
segment and re-creating it at the
beginning of each segment, instead
of attempting to persist the
filtergraph in between segments.
This overlap occurred mostly when
flip-flopping segments between transcoders,
or processing non-consecutive segments within
a single transcoder. This was due to drift in
adjusting input timestamps to match the fps
filter's expectation of mostly consecutive
timestamps while adjusting output timestamps
to remove accumulated delay from the filter.
There is roughly a 1% performance hit on my
machine from re-creating the filtergraph.
Because we are now resetting the filter after
each segment, we can remove a good chunk of
the special-cased timestamp handling code
before and after the filtergraph since
we no longer need to handle discontinuities
between segments.
However, we do need to keep some filter flushing
logic in order to accommodate low-fps or low-frame
content.
This does change our outputs, usually by one
fewer frame. Sometimes we seem to produce an
*additional* frame - it is unclear why. However,
as the test cases note, this actually clears up a
number of long-standing oddities around the expected
frame count, so it should be seen as an improvement.
---
It is important to note that while this fixes DTS
overlap in a (rather unpredictable) general case,
there is another overlap bug in one very specific case.
These are the conditions for the bug:
1. First and second segments of the stream are being
processed. This could be the same transcoder or
different ones.
2. The first segment starts at or near zero pts
3. mpegts is the output format
4. B-frames are being used
What happens is we may see DTS < PTS for the
very first frames in the very first segment,
potentially starting with PTS = 0, DTS < 0.
This is expected for B-frames.
However, if mpegts is in use, it cannot take negative
timestamps. To accommodate negative DTS, the muxer
will set PTS = -DTS, DTS = 0 and delay (offset) the
rest of the packets in the segment accordingly.
Unfortunately, subsequent transcodes will not know
about this delay! This typically leads to an overlap
between the first and second segments (but segments after
that would be fine).
The normal way to fix this would be to add a constant delay
to all segments - ffmpeg adds 1.4s to mpegts by default.
However, introducing a delay right now feels a little
odd since we don't really offer any other knobs to control
the timestamp (re-transcodes would accumulate the delay) and
there is some concern about falling out of sync with the
source segment since we have historically tried to make
timestamps follow the source as closely as possible.
So we're leaving this particular bug as-is for now.
There is some commented-out code that adds this delay
in case we feel that we would need it in the future.
Note that FFmpeg CLI also has the exact same problem
when the muxer delay is removed, so this is not a
LPMS-specific issue. This is exercised in the test cases.
Example of non-monotonic DTS after encoding and after muxing:
Segment.Frame | Encoder DTS | Encoder PTS | Muxer DTS | Muxer PTS
--------------|-------------|-------------|-----------|-----------
1.1 | -20 | 0 | 0 | 20
1.2 | -10 | 10 | 10 | 30
1.3 | 0 | 20 | *20* | 40
1.4 | 10 | 30 | *30* | 50
2.1 | 20 | 40 | *20* | 40
2.2 | 30 | 50 | *30* | 50
2.3 | 40 | 60 | 40 | 60
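The shift and the resulting overlap in the table can be sketched with plain integer arithmetic. This is a simplified model, not the actual mpegts muxer or LPMS code; `avoid_negative_ts` and `struct pkt` are hypothetical names:

```c
/* Simplified model of how the mpegts muxer avoids a negative DTS on the
 * first packet: delay the whole segment so the first DTS becomes 0.
 * Hypothetical sketch, not the actual muxer or LPMS code. */
struct pkt { long long dts, pts; };

void avoid_negative_ts(struct pkt *pkts, int n) {
    if (n == 0 || pkts[0].dts >= 0)
        return;
    long long offset = -pkts[0].dts;   /* delay applied to every packet */
    for (int i = 0; i < n; i++) {
        pkts[i].dts += offset;
        pkts[i].pts += offset;
    }
}
```

Muxing the two segments from the table independently shows the overlap: segment 1 ends at DTS 30 after the shift, while segment 2, which never sees the shift, still starts at DTS 20.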
The demuxer_opts pointer was left uninitialized when inp->demuxer.opts
was NULL. This caused avformat_open_input to receive a garbage pointer,
leading to a crash in av_dict_copy when processing dictionary options.
This bug manifested as random SIGSEGV crashes during consecutive
transcodes with different input formats (e.g., TestAPI_ConsecutiveMP4s).
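The fix is the usual "initialize to NULL, assign conditionally" pattern. A generic sketch with hypothetical names (the real code deals with `AVDictionary **` and `avformat_open_input`):

```c
#include <stddef.h>

/* Hypothetical stand-ins for the real types and calls. */
typedef struct { const char *opts; } demuxer_info;

const char *open_input(const char **opts) {
    /* Like avformat_open_input, a NULL options pointer means "none". */
    return opts ? *opts : "(no options)";
}

const char *setup_demuxer(demuxer_info *inp) {
    const char **demuxer_opts = NULL;  /* the fix: start from NULL */
    if (inp->opts)
        demuxer_opts = &inp->opts;
    /* Before the fix, demuxer_opts was left uninitialized on the
     * NULL branch, so open_input received a garbage pointer. */
    return open_input(demuxer_opts);
}
```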
Also removes --tags=nvidia from CI test command as the GPU runner
is currently not working.
Signed-off-by: livepeer-tessa <livepeer-tessa@users.noreply.github.com>
Co-authored-by: livepeer-tessa <livepeer-tessa@users.noreply.github.com>
Some inputs can trigger the FPS/filter pipeline to generate far more output frames
than are actually decoded, leading to very long, disk-filling transcodes.
Plumb decoded frame counts into the encoder path and, for video outputs, abort with
`lpms_ERR_ENC_RUNAWAY` when encoded frames exceed 25x decoded frames, excluding
`image2` inputs where expansion is expected.
The exact inputs which trigger this behavior are unknown as of now,
but we can construct a contrived test which reproduces the issue.
This was causing some very large segments to be produced if the
input had some weird characteristics like missing timestamps.
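A minimal sketch of the guard; the 25x threshold and the error name come from the description above, while the function shape and `is_image2` flag are illustrative, not the actual LPMS signature:

```c
/* Runaway-encoder guard: abort when encoded frames exceed 25x decoded
 * frames, except for image2 inputs where expansion is expected.
 * Illustrative shape, not the actual LPMS code. */
enum { LPMS_OK = 0, LPMS_ERR_ENC_RUNAWAY = -1 };

int check_enc_runaway(long long decoded, long long encoded, int is_image2) {
    if (is_image2)
        return LPMS_OK;            /* expansion expected for image2 */
    if (decoded > 0 && encoded > 25 * decoded)
        return LPMS_ERR_ENC_RUNAWAY;
    return LPMS_OK;
}
```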
Co-authored-by: Marco van Dijk <marco@stronk.rocks>
* Add a duration check for the input file to avoid transcoding overly long inputs, and to protect against inputs with timestamp anomalies that cause the output to be much longer than the input
---------
Co-authored-by: Josh Allmann <joshua.allmann@gmail.com>
Fixes a number of things including an LPMS crash, choppy video
quality, green screens during rotation, inconsistent frame counts
vs software decoding, etc. We also apparently gained GPU
support for MPEG2 decoding.
This is a massive change: we can no longer add outputs up front
due to the ffmpeg hwaccel API, so we have to wait until we receive
a decoded video frame in order to add outputs. This also means
properly queuing up audio and draining things in the same order.
This adds demuxer options as a complement to the existing encoder/muxer
options which allows us to:
1. explicitly select the demuxer to use if probing doesn't return a good result
2. configure the demuxer with additional options
This has come up a few times while looking at various things, so it is
good to have an API that is fully configurable out of the box.
This allows the transcoded resolution to be re-clamped
correctly if the input resolution changes mid-segment.
As a result, we no longer need to do this clamping in golang.
Additionally, make the behavior between GPU and CPU more consistent
by applying nvidia codec limits and clamping CPU transcodes.
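A sketch of aspect-preserving clamping; the 4096 limit used below is a placeholder for the actual nvidia codec limits, and the rounding rules in LPMS may differ:

```c
/* Clamp a resolution to a maximum dimension while preserving the aspect
 * ratio, then force both dimensions even (required by most encoders for
 * 4:2:0 content). Sketch only, not the actual LPMS code. */
void clamp_res(int *w, int *h, int max_dim) {
    int big = *w > *h ? *w : *h;
    if (big <= max_dim)
        return;
    *w = ((*w * max_dim / big) & ~1);
    *h = ((*h * max_dim / big) & ~1);
}
```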
This usually happens with CUVID if the decoder needs to be reset
internally for whatever reason, such as a mid-stream resolution
change.
Also block demuxing until decoder is ready to receive packets again.
Also add another condition for re-initialization: if the
input resolution changes. This triggers the filter graph
to re-build and adjust to the new resolution, when CPU
encoders are in use.
This mostly ensures that non-B frames have the same dts/pts.
The PTS/DTS from the encoder can be "squashed" a bit during rescaling
back to the source timebase if it is used directly, due to the lower
resolution of the encoder timebase. We avoid this problem with the
PTS in FPS passthrough mode by reusing the source pts, but only
rescale the encoder-provided DTS back to the source timebase for some
semblance of timestamp consistency. Because the DTS values are
squashed, they can differ from the PTS even with non-B frames.
The DTS values are still monotonic, so the exact numbers are not really
important. However, some tools use `dts == pts` as a heuristic to check
for B-frames ... so help them out to avoid spurious B-frame detections.
To fix the DTS/PTS mismatch, take the difference between the
encoder-provided dts/pts, rescale that difference back to the source
time base, and re-calculate the dts using the source pts.
Also see https://github.com/livepeer/lpms/pull/405
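The recalculation can be sketched with simplified integer rescaling; the real code uses ffmpeg's `av_rescale_q`, and timebases here are assumed to be of the form 1/den:

```c
/* Rescale a value between 1/in_den and 1/out_den timebases.
 * Simplified stand-in for av_rescale_q (no rounding control). */
long long rescale_delta(long long v, int in_den, int out_den) {
    return v * out_den / in_den;
}

/* Recompute the output DTS from the source PTS: take the encoder's
 * pts-dts delta, rescale it to the source timebase, and subtract. */
long long recalc_dts(long long src_pts, long long enc_pts, long long enc_dts,
                     int enc_den, int src_den) {
    long long delta = enc_pts - enc_dts;             /* encoder timebase */
    return src_pts - rescale_delta(delta, enc_den, src_den);
}
```

For a non-B frame the encoder delta is 0, so the output DTS equals the source PTS, which is exactly the heuristic those tools rely on.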
* Port install_ffmpeg.sh from go-livepeer
* Update ffmpeg and nv-codec-headers versions.
* Use local install_ffmpeg.sh in github CI
* Update transcoder for ffmpeg 7.0.1
* Update tests to be compatible with ffmpeg7 binary
* Fix FPS passthrough
* Set the encoder timebase using AVCodecContext.framerate instead of
the decoder's AVCodecContext.time_base.
The use of AVCodecContext.time_base is deprecated for decoding.
See https://ffmpeg.org/doxygen/3.3/structAVCodecContext.html#ab7bfeb9fa5840aac090e2b0bd0ef7589
* Adjust the packet timebase as necessary for FPS pass through
to match the encoder's expected timebase. For filtergraphs using
FPS adjustment, the filtergraph output timebase will match the
framerate (1 / framerate) and the encoder is configured for the same.
However, for FPS pass through, the filtergraph's output timebase
will match the input timebase (since there is no FPS adjustment)
while the encoder uses the timebase detected from the decoder's
framerate. Since the input timebase does not typically match the FPS
(eg 90khz for mpegts vs 30fps), we need to adjust the packet timestamps
(in container timebase) to the encoder's expected timebase.
* For the specific case of FPS passthrough, preserve the original PTS
as much as possible since we are trying to re-encode existing frames
one-to-one. Use the opaque field for this, since it is already being
populated with the original PTS to detect sentinel packets
during flushing.
Without this, timestamps can be slightly "squashed" down when
rescaling output packets to the muxer's timebase, due to the loss of
precision (eg, demuxer 90khz -> encoder 30hz -> muxer 90khz)
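The squashing is plain integer truncation in the rescale. A round trip through the coarser timebase, sketched with simplified 1/den rescaling (not ffmpeg's av_rescale_q):

```c
/* Rescale between 1/in_den and 1/out_den timebases; the integer division
 * is where the precision loss ("squashing") happens. Sketch only. */
long long rescale_ts(long long ts, int in_den, int out_den) {
    return ts * out_den / in_den;
}
```

A 90 kHz PTS of 3001 becomes 1 in the 1/30 encoder timebase and comes back as 3000; reusing the original PTS via the opaque field sidesteps the loss.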
* Improve VFR support.
Manually calculate the duration of each frame and set
the PTS to that before submitting to the filtergraph.
This allows us to better support variable frame rates,
and is also better aligned with how ffmpeg does it.
This may change the number of frames output by the FPS
filter by +/- 1 frame. These aren't issues in themselves,
but they break a lot of test cases which will need to be updated.
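The duration calculation can be sketched as follows; the shape is hypothetical, and since the last frame has no successor, its duration is carried over from the previous gap here:

```c
/* Derive per-frame durations for VFR content from successive PTS gaps.
 * Sketch of the approach described above, not the actual LPMS code. */
void calc_durations(const long long *pts, long long *dur, int n) {
    for (int i = 0; i + 1 < n; i++)
        dur[i] = pts[i + 1] - pts[i];
    if (n > 1)
        dur[n - 1] = dur[n - 2];   /* no next frame: reuse previous gap */
    else if (n == 1)
        dur[0] = 1;                /* arbitrary fallback for a lone frame */
}
```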
* Update test cases for VFR.
This commit allows for switching between transcode() function
implementations. Refactored transcode2() will be used when
environment variable LPMS_USE_NEW_TRANSCODE is set, and the
original transcode() otherwise.
The idea is to allow for gradual roll-out of new refactored
implementation.
Due to my omission the previous solution was going against the
grain of upcoming refactoring changes in lpms_transcode().
Basically the idea is to have streamlined transcoder init() code
so that changes such as Low Latency can be implemented easily in
one place, instead of being distributed in many flows.
Besides, adding yet another "mode" to lpms_transcode() is not
really needed. The transcoder is kinda like any other object
instance - it retains its status between the lpms_transcode()
calls. So I think it is easier to add extra operations (such
as lpms_transcode_reopen_demux() introduced here) to change said
state, instead of increasing the complexity of the lpms_transcode()
call.
And finally, perhaps the most important thing:
Changes like these are needed because LPMS doesn't really have
good handling of changing configuration (in this case we are
talking about changing from container stream without audio to
the one with audio). It kind of pretends to do so, and will
handle certain small differences kinda OK, but it is very easy
to come up with a situation that will break it completely.
Ideally, I'd like to change that by reducing transient state
to the minimum (such as certain number of hardware buffers for
decoding and encoding) and really re-initialize everything else
that can be reinitialized cheaply. This, however, requires
low-level hardware codec programming; it is not possible to do
that at the ffmpeg level.