Packaging using an MPD Source Description

Feature Description 

The primary use case is to give the offline packager detailed information about where the media segment boundaries should be.

Using the same segment boundaries when packaging video, audio, and subtitle tracks guarantees that all streams are Chunk Synced. That is, chunks across all tracks are in sync when corresponding chunks start at the same time. Because audio and video frame boundaries rarely align exactly, the start time of an audio chunk may differ slightly from that of the corresponding video chunk.

Chunk Sync makes tasks such as replacing content by manipulating a manifest easier, because all tracks share the same timeline and contain the same number of chunks.
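To make the Chunk Sync property concrete, the following sketch checks two tracks for it: same number of chunks, and corresponding chunk start times within a tolerance. It is an illustration, not part of Unified Packager. The video timeline (twelve 2-second segments at 90 kHz) is hypothetical, the audio durations are copied from the 48 kHz CMAF example later in this section, and the tolerance of half an AAC frame (512 samples) is an assumption.

```python
from fractions import Fraction
from itertools import accumulate

# HYPOTHETICAL video timeline: 12 segments of exactly 2 seconds at 90 kHz.
VIDEO_TIMESCALE = 90000
video_durations = [180000] * 12

# Audio durations (48 kHz) taken from the CMAF source description example.
AUDIO_TIMESCALE = 48000
audio_durations = [96256, 95232, 96256, 96256, 96256, 96256,
                   95232, 96256, 96256, 95232, 96256, 96256]


def start_times(durations, timescale, t0=0):
    """Chunk start times in seconds (exact fractions) for a track starting at t0 ticks."""
    return [Fraction(t, timescale)
            for t in accumulate(durations[:-1], initial=t0)]


def is_chunk_synced(a, b, tolerance):
    """True if both tracks have the same number of chunks and every pair of
    corresponding chunk start times differs by at most `tolerance` seconds."""
    if len(a) != len(b):
        return False
    return all(abs(x - y) <= tolerance for x, y in zip(a, b))


video = start_times(video_durations, VIDEO_TIMESCALE)
audio = start_times(audio_durations, AUDIO_TIMESCALE)
# ASSUMPTION: tolerate at most half an AAC frame (512 samples) of offset.
half_aac_frame = Fraction(512, AUDIO_TIMESCALE)

print(is_chunk_synced(video, audio, half_aac_frame))  # True
```

Exact fractions are used so that comparing start times expressed in different timescales introduces no floating-point error.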

When packaging audio and video in separate workflows, or when adding audio tracks later, you can reuse the Source Description of the video to align the audio tracks with it and keep Chunk Sync.

These media segment boundaries are described in an MPD Source Description, which follows the MPEG-DASH MPD schema.

User Perspective 

By default, Unified Packager fragments the media on its GOP boundaries and/or according to a given --fragment_duration.

To specify exact media segment boundaries, pass an MPD Source Description to Unified Packager as input (--source_description=source_description.mpd). Its SegmentTimeline specifies a timeline of arbitrary segment durations, for example:

<?xml version="1.0" encoding="utf-8"?>
<!-- Created with Unified Streaming Platform (version=1.10.22-devel) -->
<MPD
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns="urn:mpeg:dash:schema:mpd:2011"
  xsi:schemaLocation="urn:mpeg:dash:schema:mpd:2011 http://standards.iso.org/ittf/PubliclyAvailableStandards/MPEG-DASH_schema_files/DASH-MPD.xsd"
  type="static"
  profiles="urn:mpeg:dash:profile:full:2011">
  <Period>
    <AdaptationSet>
      <Representation>
        <SegmentTemplate
          timescale="90000">
          <SegmentTimeline>
            <S t="0" d="360000" r="1" />
            <S d="180000" />
            <S d="900000" r="190" />
            <S d="187200" />
          </SegmentTimeline>
        </SegmentTemplate>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>

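The SegmentTemplate@timescale attribute specifies the timescale (in units per second) used for the time (@t) and duration (@d) attributes in the SegmentTimeline element. The @r attribute repeats an S element: r="1" yields two segments of 360000 ticks (4 seconds each) and r="190" yields 191 segments of 900000 ticks (10 seconds each).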
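The following sketch expands a SegmentTimeline into explicit (start, duration) pairs, using only Python's standard xml.etree.ElementTree module. It is an illustration of the DASH @t/@d/@r semantics, not how Unified Packager reads the file. It assumes a single SegmentTemplate, ignores @n, and does not handle a negative (open-ended) @r.

```python
import xml.etree.ElementTree as ET

NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}


def expand_segment_timeline(mpd_xml):
    """Expand the first SegmentTimeline in an MPD into (start, duration) pairs,
    both in ticks of SegmentTemplate@timescale."""
    root = ET.fromstring(mpd_xml)
    template = root.find(".//mpd:SegmentTemplate", NS)
    timescale = int(template.get("timescale", "1"))
    segments = []
    t = 0
    for s in template.findall("mpd:SegmentTimeline/mpd:S", NS):
        if s.get("t") is not None:
            t = int(s.get("t"))          # @t sets the start time explicitly
        d = int(s.get("d"))
        repeat = int(s.get("r", "0"))    # @r counts *additional* repeats
        if repeat < 0:
            raise NotImplementedError("open-ended @r is not handled in this sketch")
        for _ in range(repeat + 1):
            segments.append((t, d))
            t += d                       # next segment starts where this one ends
    return timescale, segments


MPD = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static">
  <Period><AdaptationSet><Representation>
    <SegmentTemplate timescale="90000">
      <SegmentTimeline>
        <S t="0" d="360000" r="1" />
        <S d="180000" />
        <S d="900000" r="190" />
        <S d="187200" />
      </SegmentTimeline>
    </SegmentTemplate>
  </Representation></AdaptationSet></Period>
</MPD>"""

timescale, segments = expand_segment_timeline(MPD)
print(len(segments))                            # 195
print(sum(d for _, d in segments) / timescale)  # 1922.08
```

For the example above this gives 2 + 1 + 191 + 1 = 195 segments with a total duration of 172987200 ticks, or 1922.08 seconds at 90 kHz.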

Note that by default Unified Packager uses the timescale of the source media when packaging. Typically this is the media clock frequency, but some output formats require a fixed timescale; Smooth Streaming, for example, uses 10 MHz.

Packaging audio for DASH and HLS (CMAF)

The Source Description Manifest looks like:

<?xml version="1.0" encoding="utf-8"?>
<!-- Created with Unified Streaming Platform (version=1.10.22-devel) -->
<MPD
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns="urn:mpeg:dash:schema:mpd:2011"
  xsi:schemaLocation="urn:mpeg:dash:schema:mpd:2011 http://standards.iso.org/ittf/PubliclyAvailableStandards/MPEG-DASH_schema_files/DASH-MPD.xsd"
  type="static"
  profiles="urn:mpeg:dash:profile:full:2011">
  <Period>
    <AdaptationSet>
      <Representation>
        <SegmentTemplate
          timescale="48000">
          <SegmentTimeline>
            <S n="1" d="96256" />
            <S n="2" d="95232" />
            <S n="3" d="96256" />
            <S n="4" d="96256" />
            <S n="5" d="96256" />
            <S n="6" d="96256" />
            <S n="7" d="95232" />
            <S n="8" d="96256" />
            <S n="9" d="96256" />
            <S n="10" d="95232" />
            <S n="11" d="96256" />
            <S n="12" d="96256" />
          </SegmentTimeline>
        </SegmentTemplate>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>

Example of packaging an audio CMAF track:

mp4split -o audio.cmfa --source_description=audio_description.mpd audio-128k.mp4

The SegmentTimeline specifies the exact durations of the media segments. Since the timescale is set to the audio sampling rate (48 kHz), the durations in the SegmentTimeline are sample accurate.
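Because the timescale equals the sampling rate, each duration can be read directly as a sample count. The following sketch assumes AAC-LC audio with 1024 samples per frame. It decodes the durations above into frames per segment and rejects any duration that is not a whole number of frames.

```python
AAC_FRAME = 1024   # ASSUMPTION: AAC-LC, 1024 samples per frame
TIMESCALE = 48000  # equals the sampling rate, so 1 tick = 1 sample

durations = [96256, 95232, 96256, 96256, 96256, 96256,
             95232, 96256, 96256, 95232, 96256, 96256]


def frames_per_segment(durations, frame_size=AAC_FRAME):
    """Return the number of whole audio frames in each segment."""
    frames = []
    for d in durations:
        count, rest = divmod(d, frame_size)
        if rest:
            raise ValueError(f"duration {d} is not a whole number of frames")
        frames.append(count)
    return frames


print(frames_per_segment(durations))
# [94, 93, 94, 94, 94, 94, 93, 94, 94, 93, 94, 94]
print(sum(durations) / TIMESCALE)  # 24.0
```

Segments alternate between 94 and 93 frames, so every boundary falls on an AAC frame boundary while the timeline as a whole stays on a 2-second grid.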

Packaging audio for Smooth Streaming (ISMA)

Smooth Streaming uses a fixed timescale of 10 MHz. Converting the audio timescale to 10 MHz is not always exact, so a rounding error may be introduced. For example, a single audio segment containing 94 AAC-LC frames sampled at 48 kHz has a duration of 94 * 1024 / 48000 = 2.0053333... seconds, or 20053333.33... ticks at 10 MHz. To compensate for this (albeit small) rounding error, one in every three such segments is rounded up to 20053334 ticks (2.0053334 seconds), so that the rounding errors do not accumulate along the timeline.
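One way to obtain such durations is to round the cumulative end time of each segment to the nearest 10 MHz tick and take the differences. The sketch below does this. It reproduces the values in the manifest that follows (20053333, 19840000, 20053334, ...), but it is an illustration and not necessarily the packager's internal algorithm. The frame counts per segment are those of the 48 kHz CMAF example.

```python
from fractions import Fraction

SAMPLE_RATE = 48000
AAC_FRAME = 1024            # samples per AAC-LC frame
TARGET_TIMESCALE = 10_000_000

# AAC frames per segment, as in the 48 kHz CMAF source description.
frames = [94, 93, 94, 94, 94, 94, 93, 94, 94, 93, 94, 94]


def rescale_durations(frames, frame_size, src_rate, dst_timescale):
    """Convert per-segment frame counts to integer durations in dst_timescale
    by rounding the *cumulative* end time, so rounding errors never accumulate."""
    durations = []
    prev_end = 0
    total_samples = 0
    for n in frames:
        total_samples += n * frame_size
        exact_end = Fraction(total_samples * dst_timescale, src_rate)
        end = round(exact_end)          # nearest tick
        durations.append(end - prev_end)
        prev_end = end
    return durations


durations = rescale_durations(frames, AAC_FRAME, SAMPLE_RATE, TARGET_TIMESCALE)
print(durations[:3])   # [20053333, 19840000, 20053334]
print(sum(durations))  # 240000000
```

Rounding each segment on its own would instead give 20053333 for every 94-frame segment and let the timeline drift by a fraction of a tick per segment.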

The manifest looks like:

<?xml version="1.0" encoding="utf-8"?>
<!-- Created with Unified Streaming Platform (version=1.10.22-devel) -->
<MPD
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns="urn:mpeg:dash:schema:mpd:2011"
  xsi:schemaLocation="urn:mpeg:dash:schema:mpd:2011 http://standards.iso.org/ittf/PubliclyAvailableStandards/MPEG-DASH_schema_files/DASH-MPD.xsd"
  type="static"
  profiles="urn:mpeg:dash:profile:full:2011">
  <Period>
    <AdaptationSet>
      <Representation>
        <SegmentTemplate
          timescale="10000000">
          <SegmentTimeline>
            <S n="1" d="20053333" />
            <S n="2" d="19840000" />
            <S n="3" d="20053334" />
            <S n="4" d="20053333" />
            <S n="5" d="20053333" />
            <S n="6" d="20053334" />
            <S n="7" d="19840000" />
            <S n="8" d="20053333" />
            <S n="9" d="20053333" />
            <S n="10" d="19840000" />
            <S n="11" d="20053334" />
            <S n="12" d="20053333" />
          </SegmentTimeline>
        </SegmentTemplate>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>

Example of packaging a Smooth Streaming audio track:

mp4split -o audio.isma --source_description=audio_description.mpd --timescale=10000000 audio-128k.mp4

The <SegmentTimeline> specifies the exact durations of the media segments. Only when packaging for Smooth Streaming with a @timescale of 10 MHz do we allow a duration to be off by 1 tick. When the sum of all sample durations in a media segment is off by 1 tick, we update the duration of the last sample so that it exactly matches the duration given in the SegmentTimeline.
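The rule described above can be sketched as follows. The helper fit_last_sample is hypothetical and only illustrates the rule; it does not check the output format or timescale. In the example, three AAC frames at 48 kHz last 3072 / 48000 = 0.064 s, or 640000 ticks at 10 MHz. Truncating each frame to 213333 ticks leaves the segment one tick short.

```python
def fit_last_sample(sample_durations, segment_duration, max_error=1):
    """Hypothetical helper: if the sample durations of a segment sum to within
    `max_error` ticks of the SegmentTimeline duration, absorb the difference in
    the last sample; otherwise refuse."""
    error = segment_duration - sum(sample_durations)
    if error == 0:
        return list(sample_durations)
    if abs(error) > max_error:
        raise ValueError(f"segment is off by {error} ticks")
    adjusted = list(sample_durations)
    adjusted[-1] += error  # only the last sample changes
    return adjusted


print(fit_last_sample([213333, 213333, 213333], 640000))
# [213333, 213333, 213334]
```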

Packaging audio for HLS (MPEG-TS)

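HTTP Live Streaming (MPEG-TS) uses a fixed timescale of 90 kHz. Converting the audio timescale to 90 kHz is exact for audio sampled at 48 kHz: the ratio 90000 / 48000 reduces to 15 / 8, and every segment made of whole 1024-sample AAC frames spans a multiple of 8 samples.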
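The following sketch checks that conversion with exact fractions. The helper rescale_exact is hypothetical. It returns the rescaled duration only when the result is a whole number of ticks. The 48 kHz durations are those of the CMAF example, and the 44.1 kHz call shows a case where the conversion is not exact.

```python
from fractions import Fraction


def rescale_exact(duration, src_timescale, dst_timescale):
    """Rescale a duration between timescales; raise if the result is not an
    integer number of ticks in dst_timescale."""
    value = Fraction(duration * dst_timescale, src_timescale)
    if value.denominator != 1:
        raise ValueError(f"{duration} ticks at {src_timescale} is not exact at {dst_timescale}")
    return value.numerator


# 94 and 93 AAC frames at 48 kHz, as in the CMAF source description.
print(rescale_exact(96256, 48000, 90000))  # 180480
print(rescale_exact(95232, 48000, 90000))  # 178560

try:
    rescale_exact(96256, 44100, 90000)     # same sample count at 44.1 kHz
except ValueError as err:
    print(err)
```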

The manifest looks like:

<?xml version="1.0" encoding="utf-8"?>
<!-- Created with Unified Streaming Platform (version=1.10.22-devel) -->
<MPD
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns="urn:mpeg:dash:schema:mpd:2011"
  xsi:schemaLocation="urn:mpeg:dash:schema:mpd:2011 http://standards.iso.org/ittf/PubliclyAvailableStandards/MPEG-DASH_schema_files/DASH-MPD.xsd"
  type="static"
  profiles="urn:mpeg:dash:profile:full:2011">
  <Period>
    <AdaptationSet>
      <Representation>
        <SegmentTemplate
          timescale="90000">
          <SegmentTimeline>
            <S n="1" d="180480" />
            <S n="2" d="178560" />
            <S n="3" d="180480" />
            <S n="4" d="180480" />
            <S n="5" d="180480" />
            <S n="6" d="180480" />
            <S n="7" d="178560" />
            <S n="8" d="180480" />
            <S n="9" d="180480" />
            <S n="10" d="178560" />
            <S n="11" d="180480" />
            <S n="12" d="180480" />
          </SegmentTimeline>
        </SegmentTemplate>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>

Example of packaging an HLS (MPEG-TS) audio track:

mp4split -o audio-aac.m3u8 --package-hls \
         --output-single-file --base-media-file=aac \
         --source_description=audio_description.mpd audio-128k.mp4

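The <SegmentTimeline> specifies the exact durations of the media segments. Because every duration converts exactly from 48 kHz to 90 kHz, no rounding compensation is needed, unlike for Smooth Streaming.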

Warnings and notices 

When a source description is used in the media packaging step, an informational message is logged. It reports how many media segment boundaries are listed and their total duration.

I0.000 Loading source description from file:///../hls_ac3_fragmentation.mpd
I0.000 Added 1280 media segment boundaries with a total duration of 00:42:42.560000
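To estimate these two numbers before packaging, the following sketch counts the segments of an already expanded timeline and formats their total duration as HH:MM:SS.ffffff, like the message above. It only mimics the shape of the log line. Counting each segment as one boundary is an assumption. The durations are those of the first 90 kHz example, with @r expanded.

```python
from fractions import Fraction


def format_duration(ticks, timescale):
    """Format a duration given in ticks as HH:MM:SS.ffffff."""
    micros = round(Fraction(ticks * 1_000_000, timescale))  # nearest microsecond
    seconds, micros = divmod(micros, 1_000_000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{micros:06d}"


def summarize(durations, timescale):
    """Mimic the shape of the informational message for a list of segment durations."""
    return (f"{len(durations)} media segment boundaries with a total duration of "
            f"{format_duration(sum(durations), timescale)}")


# Segment durations of the first example (90 kHz), with @r already expanded.
durations = [360000] * 2 + [180000] + [900000] * 191 + [187200]
print(summarize(durations, 90000))
# 195 media segment boundaries with a total duration of 00:32:02.080000
```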

When the total duration of the source description is longer than the duration of the packaged media, a warning is logged. It reports the number (n) and start time (t) of the last media segment and how much media is missing from it.

W0.000 Not enough samples for media segment: n="1253" t=120408576 (missing 0.768 seconds)

If the media is much shorter than the source description, a warning reports how many media segment boundaries were ignored and their total duration.

W0.000 Ignored 28 media segment boundaries with a total duration of 00:00:54.048000
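To catch such a mismatch before packaging, a pre-flight check can compare the source description with the media duration. The sketch below is hypothetical and does not reproduce mp4split's logic. It reports the segment that starts before the end of the media but ends after it, and counts the segments that start at or after the end of the media. Whether the packager also counts the partially covered segment as ignored is not assumed here. The durations are those of the 48 kHz CMAF example, and the 20-second media duration is made up.

```python
from itertools import accumulate


def check_coverage(durations, timescale, media_ticks):
    """Hypothetical pre-flight check (all values in `timescale` ticks).

    Returns (partial, ignored): `partial` is None or (n, t, missing_seconds) for
    the segment that starts before the end of the media but ends after it;
    `ignored` counts the segments that start at or after the end of the media.
    """
    starts = list(accumulate(durations[:-1], initial=0))
    partial = None
    ignored = 0
    for n, (t, d) in enumerate(zip(starts, durations), start=1):
        if t >= media_ticks:
            ignored += 1
        elif t + d > media_ticks:
            partial = (n, t, (t + d - media_ticks) / timescale)
    return partial, ignored


audio_durations = [96256, 95232, 96256, 96256, 96256, 96256,
                   95232, 96256, 96256, 95232, 96256, 96256]
partial, ignored = check_coverage(audio_durations, 48000, 20 * 48000)
print(partial[:2], ignored)  # (11, 959488) 1
```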