Packaging using an MPD Source Description
Feature Description
The primary use-case is to provide (detailed) information about media segment boundaries to the offline packager.
Using the same segment boundary points when packaging video, audio, and
subtitle tracks guarantees that all the streams are Chunk Synced. That is,
chunks across all the tracks are synced when the start time of each chunk is
the same. Since the start times of audio and video rarely align exactly, the
start time of the audio may be slightly later.
The properties of Chunk Sync make things like replacing content by
manipulating a manifest easier, since the media share the same timeline
and contain the same number of chunks.
When packaging audio and video in separate workflows, or when adding audio
tracks at a later time, you can use the same Source Description used by the
video to bind the audio tracks and keep Chunk Sync.
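To make the idea concrete, the following is a minimal sketch of how audio chunk start times could be derived from video chunk start times. It assumes AAC audio with 1024 samples per frame and assumes that each audio chunk starts at the first audio frame at or after the corresponding video chunk start. The function name `audio_chunk_starts` and the 2-second video chunks are hypothetical; the exact alignment rule applied by Unified Packager is not described here.

```python
def audio_chunk_starts(video_starts, video_timescale, sample_rate, frame_samples=1024):
    """Return audio chunk start times (in samples) for the given video chunk starts.

    Assumption: each audio chunk begins at the first audio frame whose start
    is at or after the video chunk start, so audio may start slightly later.
    """
    starts = []
    for t in video_starts:
        # Exact integer ceiling of (t / video_timescale) * sample_rate / frame_samples.
        numerator = t * sample_rate
        denominator = video_timescale * frame_samples
        frame_index = -(-numerator // denominator)
        starts.append(frame_index * frame_samples)
    return starts


# Hypothetical video chunks of exactly 2 seconds in a 90 kHz timescale.
video_starts = [0, 180000, 360000, 540000, 720000]
audio_starts = audio_chunk_starts(video_starts, 90000, 48000)

# Durations of the audio chunks, in samples (whole multiples of 1024).
audio_durations = [b - a for a, b in zip(audio_starts, audio_starts[1:])]
print(audio_starts)
print(audio_durations)
```

Both tracks end up with the same number of chunks, and every audio chunk contains a whole number of frames, which is why the audio durations vary slightly from chunk to chunk.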
An MPD Source Description describes these media segment boundaries. It follows the MPD schema of MPEG-DASH.
User Perspective
Unified Packager fragments the media by default on GOP boundaries of the media (and/or by a given --fragment_duration).
Specifying exact media segment boundaries is possible by passing an MPD
Source Description as input to Unified Packager
(--source_description=source_description.mpd). The SegmentTimeline specifies
a timeline of arbitrary segment durations.
<?xml version="1.0" encoding="utf-8"?>
<!-- Created with Unified Streaming Platform (version=1.10.22-devel) -->
<MPD
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns="urn:mpeg:dash:schema:mpd:2011"
  xsi:schemaLocation="urn:mpeg:dash:schema:mpd:2011 http://standards.iso.org/ittf/PubliclyAvailableStandards/MPEG-DASH_schema_files/DASH-MPD.xsd"
  type="static"
  profiles="urn:mpeg:dash:profile:full:2011">
  <Period>
    <AdaptationSet>
      <Representation>
        <SegmentTemplate
          timescale="90000">
          <SegmentTimeline>
            <S t="0" d="360000" r="1" />
            <S d="180000" />
            <S d="900000" r="190" />
            <S d="187200" />
          </SegmentTimeline>
        </SegmentTemplate>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>
The SegmentTemplate@timescale attribute specifies the timescale (in units
per second) used for the time and duration attributes in the SegmentTimeline
element.
Note that Unified Packager by default uses the timescale of the source media when packaging. Typically this is the media clock frequency, but Smooth Streaming, for example, requires a fixed 10 MHz timescale.
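As an illustration of how the SegmentTimeline in the example above can be read, the following minimal sketch expands the S elements into individual segment boundaries, using the standard MPEG-DASH meaning of the t (start time), d (duration), and r (repeat count) attributes. The helper name `expand_timeline` is hypothetical, the embedded MPD is a trimmed copy of the example, and negative repeat counts are not handled.

```python
import xml.etree.ElementTree as ET

NS = {"dash": "urn:mpeg:dash:schema:mpd:2011"}

# Trimmed copy of the example Source Description (XML declaration omitted).
EXAMPLE_MPD = """
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static">
  <Period>
    <AdaptationSet>
      <Representation>
        <SegmentTemplate timescale="90000">
          <SegmentTimeline>
            <S t="0" d="360000" r="1" />
            <S d="180000" />
            <S d="900000" r="190" />
            <S d="187200" />
          </SegmentTimeline>
        </SegmentTemplate>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>
"""


def expand_timeline(mpd_xml):
    """Return (timescale, [(start, duration), ...]) for the first SegmentTemplate."""
    root = ET.fromstring(mpd_xml)
    template = root.find(".//dash:SegmentTemplate", NS)
    timescale = int(template.get("timescale", "1"))
    segments = []
    start = 0
    for s in template.findall("dash:SegmentTimeline/dash:S", NS):
        if s.get("t") is not None:
            start = int(s.get("t"))  # an explicit t resets the running start time
        duration = int(s.get("d"))
        repeat = int(s.get("r", "0"))  # r counts additional repeats of this entry
        for _ in range(repeat + 1):
            segments.append((start, duration))
            start += duration
    return timescale, segments


timescale, segments = expand_timeline(EXAMPLE_MPD)
total_ticks = sum(duration for _, duration in segments)
print(len(segments), "segments,", total_ticks / timescale, "seconds")
```

Dividing a start time or duration by the timescale gives the value in seconds, so the first entry describes two segments of 360000 / 90000 = 4 seconds each.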
Packaging audio for DASH and HLS (CMAF)
The Source Description Manifest looks like:
<?xml version="1.0" encoding="utf-8"?>
<!-- Created with Unified Streaming Platform (version=1.10.22-devel) -->
<MPD
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns="urn:mpeg:dash:schema:mpd:2011"
  xsi:schemaLocation="urn:mpeg:dash:schema:mpd:2011 http://standards.iso.org/ittf/PubliclyAvailableStandards/MPEG-DASH_schema_files/DASH-MPD.xsd"
  type="static"
  profiles="urn:mpeg:dash:profile:full:2011">
  <Period>
    <AdaptationSet>
      <Representation>
        <SegmentTemplate
          timescale="48000">
          <SegmentTimeline>
            <S n="1" d="96256" />
            <S n="2" d="95232" />
            <S n="3" d="96256" />
            <S n="4" d="96256" />
            <S n="5" d="96256" />
            <S n="6" d="96256" />
            <S n="7" d="95232" />
            <S n="8" d="96256" />
            <S n="9" d="96256" />
            <S n="10" d="95232" />
            <S n="11" d="96256" />
            <S n="12" d="96256" />
          </SegmentTimeline>
        </SegmentTemplate>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>
Example of packaging an audio CMAF track:
mp4split -o audio.cmfa --source_description=audio_description.mpd audio-128k.mp4
The SegmentTimeline specifies the exact durations of the media segments.
Since the timescale is set to the audio sampling rate, the durations in the
SegmentTimeline are sample accurate.
Packaging audio for Smooth Streaming (ISMA)
Smooth Streaming uses a fixed timescale of 10 MHz. Converting the audio timescale to 10 MHz is not always exact, so a rounding error may be introduced. For example, a single audio segment containing 94 AAC-LC frames sampled at 48 kHz has a duration of 94 * 1024 / 48000 = 2.0053333... seconds. To compensate for the (albeit small) rounding error, one in every 3 such media segments is rounded up (2.0053334 seconds).
The manifest looks like:
<?xml version="1.0" encoding="utf-8"?>
<!-- Created with Unified Streaming Platform (version=1.10.22-devel) -->
<MPD
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns="urn:mpeg:dash:schema:mpd:2011"
  xsi:schemaLocation="urn:mpeg:dash:schema:mpd:2011 http://standards.iso.org/ittf/PubliclyAvailableStandards/MPEG-DASH_schema_files/DASH-MPD.xsd"
  type="static"
  profiles="urn:mpeg:dash:profile:full:2011">
  <Period>
    <AdaptationSet>
      <Representation>
        <SegmentTemplate
          timescale="10000000">
          <SegmentTimeline>
            <S n="1" d="20053333" />
            <S n="2" d="19840000" />
            <S n="3" d="20053334" />
            <S n="4" d="20053333" />
            <S n="5" d="20053333" />
            <S n="6" d="20053334" />
            <S n="7" d="19840000" />
            <S n="8" d="20053333" />
            <S n="9" d="20053333" />
            <S n="10" d="19840000" />
            <S n="11" d="20053334" />
            <S n="12" d="20053333" />
          </SegmentTimeline>
        </SegmentTemplate>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>
Example of packaging a Smooth Streaming audio track:
mp4split -o audio.isma --source_description=audio_description.mpd --timescale=10000000 audio-128k.mp4
The <SegmentTimeline> specifies the exact durations of the media segments.
Only when packaging for Smooth Streaming with a @timescale of 10 MHz do we
allow the duration to be off by one tick. When the sum of all sample durations
in a media segment is off by one, we update the duration of the last sample to
exactly match the duration given in the SegmentTimeline.
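The 10 MHz durations in the manifest above are consistent with rounding each cumulative segment boundary to the nearest tick and taking the differences. The sketch below works under that assumption; the source does not state how the values were actually derived, and the function name `rescale_durations` is hypothetical. Rounding the boundaries rather than the individual durations keeps the rounding error from accumulating over the timeline.

```python
# Segment durations from the 48 kHz Source Description, in samples.
AUDIO_DURATIONS = [96256, 95232, 96256, 96256, 96256, 96256,
                   95232, 96256, 96256, 95232, 96256, 96256]


def rescale_durations(durations, src_timescale, dst_timescale):
    """Rescale durations by rounding each cumulative boundary to the nearest tick.

    Assumption: boundaries are rounded half up; this is one rule that
    reproduces the example values, not a documented packager algorithm.
    """
    rescaled = []
    src_end = 0
    prev_dst_end = 0
    for d in durations:
        src_end += d
        # Integer round-half-up of src_end * dst_timescale / src_timescale.
        dst_end = (2 * src_end * dst_timescale + src_timescale) // (2 * src_timescale)
        rescaled.append(dst_end - prev_dst_end)
        prev_dst_end = dst_end
    return rescaled


smooth_durations = rescale_durations(AUDIO_DURATIONS, 48000, 10_000_000)
print(smooth_durations)
print(sum(smooth_durations))  # total duration in 10 MHz ticks
```

With the twelve example segments the total is 1152000 samples, which is exactly 24 seconds, so the rescaled durations sum to a whole number of 10 MHz ticks even though most individual segments do not.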
Packaging audio for HLS (MPEG-TS)
HTTP Live Streaming (MPEG-TS) uses a fixed timescale of 90 kHz. Converting the audio timescale to 90 kHz is exact for audio sampled at 48 kHz.
The manifest looks like:
<?xml version="1.0" encoding="utf-8"?>
<!-- Created with Unified Streaming Platform (version=1.10.22-devel) -->
<MPD
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns="urn:mpeg:dash:schema:mpd:2011"
  xsi:schemaLocation="urn:mpeg:dash:schema:mpd:2011 http://standards.iso.org/ittf/PubliclyAvailableStandards/MPEG-DASH_schema_files/DASH-MPD.xsd"
  type="static"
  profiles="urn:mpeg:dash:profile:full:2011">
  <Period>
    <AdaptationSet>
      <Representation>
        <SegmentTemplate
          timescale="90000">
          <SegmentTimeline>
            <S n="1" d="180480" />
            <S n="2" d="178560" />
            <S n="3" d="180480" />
            <S n="4" d="180480" />
            <S n="5" d="180480" />
            <S n="6" d="180480" />
            <S n="7" d="178560" />
            <S n="8" d="180480" />
            <S n="9" d="180480" />
            <S n="10" d="178560" />
            <S n="11" d="180480" />
            <S n="12" d="180480" />
          </SegmentTimeline>
        </SegmentTemplate>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>
mp4split -o audio-aac.m3u8 --package-hls \
--output-single-file --base-media-file=aac \
--source_description=audio_description.mpd audio-128k.mp4
The <SegmentTimeline> is used to specify the exact durations of the media
segments.
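Whether a conversion to a fixed timescale is exact can be checked by expressing the duration of one audio frame as an exact fraction of ticks. The sketch below assumes 1024 samples per frame (as for the AAC-LC audio discussed earlier); the helper name `frame_ticks` and the 44.1 kHz comparison are added purely for illustration.

```python
from fractions import Fraction


def frame_ticks(sample_rate, timescale, frame_samples=1024):
    """Duration of one audio frame in `timescale` ticks, as an exact fraction."""
    return Fraction(frame_samples * timescale, sample_rate)


for rate, timescale in ((48000, 90000), (48000, 10_000_000), (44100, 90000)):
    ticks = frame_ticks(rate, timescale)
    # A denominator of 1 means every frame maps to a whole number of ticks.
    verdict = "exact" if ticks.denominator == 1 else "needs rounding"
    print(rate, timescale, ticks, verdict)
```

At 48 kHz one frame is a whole number of 90 kHz ticks, so no rounding is needed for MPEG-TS. In the 10 MHz timescale the same frame has a denominator of 3, which matches the one-in-three rounding pattern described for Smooth Streaming.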
Warnings and notices
When a source description is used in the media packaging step, an informational message is logged. It also reports how many segment boundaries are listed and their total duration.
I0.000 Loading source description from file:///../hls_ac3_fragmentation.mpd
I0.000 Added 1280 media segment boundaries with a total duration of 00:42:42.560000
When the total duration of the source description is longer than the duration of the packaged media, a warning is logged. It reports the number of the last media chunk and how much media is missing.
W0.000 Not enough samples for media segment: n="1253" t=120408576 (missing 0.768 seconds)
If the duration of the media is a lot shorter than the source description, a warning is printed that tells you how many media segments were ignored.
W0.000 Ignored 28 media segment boundaries with a total duration of 00:00:54.048000