Packaging for Unified Origin

Package MP4 to fragmented-MP4 and back

Fragmented-MP4 files such as PIFF, ISMV or CMAF can be packaged using Unified Packager for use with Unified Origin.

There is a specific section on How to package CMAF.

Converting an MP4 file to a fragmented MP4 file (.ismv) can be done as follows:


mp4split -o tears-of-steel-avc1-750k.ismv \
  tears-of-steel-avc1-750k.mp4

The other way around, from fragmented MP4 / PIFF / ISMV / CMAF to MP4:


mp4split -o tears-of-steel-avc1-750k.mp4 \
  tears-of-steel-avc1-750k.ismv

Or from Adobe's F4F to MP4:


mp4split -o tears-of-steel-avc1-750k.mp4 \
  tears-of-steel-avc1-750k.f4f

You can also use the server manifest file as input. All the audio and video streams referenced in the manifest file are combined into one MP4 file.


mp4split -o tears-of-steel-avc1-750k.mp4 \
  tears-of-steel.ism

It is also possible to package multiple tracks into a single MP4. When doing this, the order in which the tracks are specified on the command-line defines the order of the tracks within the MP4 (which can be useful for the default track selection described in Define default track when preparing content (track order)):


mp4split -o tos_avc1-sorted.ismv \
  tears-of-steel-avc1-1000k.mp4 \
  tears-of-steel-avc1-1500k.mp4 \
  tears-of-steel-avc1-750k.mp4

How to package CMAF

New in version 1.8.3.

The Common Media Application Format (CMAF) is defined in ISO/IEC 23000-19, and is a preferred interoperable profile for using fragmented MP4.

The process of packaging CMAF is very similar to using mp4split to package ISMV. Depending on the extension of the output, mp4split will decide how the content is packaged. In the case of CMAF, you can choose between .cmfv for video, .cmfa for audio and .cmft for text streams.

The command-line that you use is very straightforward, as shown in the examples for packaging CMAF audio, video and text streams below:


mp4split -o tears-of-steel-aac-128k.cmfa tears-of-steel-aac-128k.mp4
mp4split -o tears-of-steel-avc1-2200k.cmfv tears-of-steel-avc1-2200k.mp4
mp4split -o tears-of-steel-en.cmft tears-of-steel-en.vtt

These commands package the Tears of Steel demo content to CMAF.

If you are packaging CMAF to support fMP4 HLS and want to follow Apple's recommendation of delivering 6 second segments, make sure that the CMAF files that contain your media are fragmented with this length in mind. Each fragment in an MP4 equals a 'play-out' segment when content is statically packaged (when using Origin, the segment length can be changed on-the-fly for each play-out format).

You can make sure your content is fragmented to your liking by using the --fragment_duration option while packaging CMAF (by default, Unified Packager fragments the MP4 according to the GOP structure of the media). See below for an example.

However, do note that the specified length should be a multiple of the media's GOP length, as segments can only contain full GOPs. So, if you are ingesting media with 1.92 second GOPs and want to aim for a segment duration of 6 seconds, you should specify a fragment duration of 576/100 (i.e., 5.76 seconds, or 3 GOPs).
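The rounding described above can be sketched with plain integer arithmetic (the numbers below are the 1.92 second GOP example; no mp4split involved):

```shell
# Pick the fragment duration closest to a 6-second target that is still
# a whole multiple of the GOP duration (1.92 s = 192/100, as above).
gop_num=192; gop_den=100   # GOP duration as a fraction of seconds
target=6                   # desired segment length in seconds
# whole number of GOPs whose total duration is closest to the target:
gops=$(( (target * gop_den + gop_num / 2) / gop_num ))
echo "--fragment_duration=$(( gops * gop_num ))/${gop_den}"
```

This prints --fragment_duration=576/100, matching the value above.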

In the example below, the GOP length is 48 frames at 24 Hz, or 2 seconds. We use 4 GOPs per fragment to align segment boundaries with audio (because 375 blocks of 1024 samples at 48 kHz is precisely 8 seconds):


mp4split -o tos-8s-aac-128k.cmfa \
  --fragment_duration=384000/48000 \
  tears-of-steel-aac-128k.mp4
mp4split -o tos-8s-avc1-2200k.cmfv \
  --fragment_duration=192/24 \
  tears-of-steel-avc1-2200k.mp4
mp4split -o tos-8s-en.cmft \
  --fragment_duration=8/1 \
  tears-of-steel-en.vtt
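The fractions in the example above all evaluate to the same 8 second fragment duration, which can be checked with shell arithmetic:

```shell
# All three --fragment_duration fractions from the example equal 8 seconds.
audio=$(( 384000 / 48000 ))  # 384000 audio samples at 48 kHz
video=$(( 192 / 24 ))        # 192 video frames at 24 fps (4 GOPs of 48)
text=$(( 8 / 1 ))            # text fragments of 8 seconds
samples=$(( 375 * 1024 ))    # 375 AAC frames of 1024 samples = 384000
echo "audio=${audio} video=${video} text=${text} samples=${samples}"
```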


If you want to override or add any properties to a track when packaging CMAF, you can do so using the familiar options: Overriding and adding track properties.


For fragmented MP4 based on CMAF, the specification supports only a single track per file. Thus, if your input contains multiple tracks of the same content type, it is necessary to select the desired track using the --track_id option.

Options for fragmented-MP4 packaging

The packager supports the following options.


Create a (progressive) MP4 that references a fragmented MP4 file, for 'progressive download' to older players. Unified Origin will resolve media data references on playout.

The path(s) to the input you provide on the command-line can be relative or absolute. If a relative path is used, it must be a downward path and the input must be local. To create a dref that references remote content using relative paths, mount the remote content so that you process it as if it's local. You can do this using a tool like s3fs or similar.


New in version 1.7.27.

Like --use_dref, this creates an MP4 that references an MP4 file, but without explicitly referencing sub-samples, resulting in a (considerably) smaller video MP4.

The path(s) to the input you provide on the command-line can be relative or absolute. If a relative path is used, it must be a downward path and the input must be local. To create a dref that references remote content using relative paths, mount the remote content so that you process it as if it's local. You can do this using a tool like s3fs or similar.


If you want to offer a download-to-own option, use --use_dref, because encryption requires sub-sample data. However, for progressive download without encryption, --use_dref_no_subs suffices.


Do not write the output.


The output timescale used for media. Defaults to the original media or 10MHz when the "piff" brand is used.


The target duration of each fragment, expressed as a fraction "X/Y" of seconds (or in milliseconds as "X"); the default is 2 seconds. Behaviour is similar to #EXT-X-TARGETDURATION in HLS or maxSegmentDuration in MPEG-DASH. When sync samples are present, each fragment starts with a sync sample and has 0 or more additional sync samples - as many as fit into the fragment duration.

This parameter can be useful for aligning fragment boundaries across different codecs, or for tuning the fragment duration for a specific playout format. For example, Apple recommends 6 second segments in HLS, while 2 seconds is common in Smooth Streaming.


Sets the 'brand'. Common options: "piff", "iso6", "ccff", "dash" and "cmfc". Default is "iso6", but with timescale=10000000 (10 MHz) the default is "piff". When packaging CMAF (i.e., the extension of the output is .cmfv, .cmfa or .cmft) the default is "cmfc".

When creating (progressive) MP4 files with negative composition times, "iso4" is used as the brand. When using "iso2", negative composition time offsets are disabled and an edit list is used to compensate for the ct_offset.

By using the --brand option, you overrule the default major brand for the given output. This can be helpful if you want to make sure that your output uses negative composition time offsets instead of an edit list ("iso4") or, the other way around, uses an edit list instead of negative composition time offsets ("iso2"). When using the option more than once on the same command-line, any brands specified after the first will be added as compatibility brands.

For instance, the first example below will result in a CMAF-file with "cmfc" as its major brand and "iso9" as a compatibility brand, whereas the compatibility brand will be "dash" in the second example:

mp4split -o example.cmfv --brand=cmfc --brand=iso9 example.mp4
mp4split -o example.cmfv --brand=cmfc --brand=dash example.mp4


If your workflow for fMP4 HLS involves one or more WebVTT files that are or were part of an HLS Transport Stream playout scenario, chances are that a 10 second time offset is signaled in the WebVTT (using the EXT-X-TIMESTAMP-MAP tag).

If this is the case, the subtitles will be out of sync, as the default when packaging CMAF is to not use a time offset. To synchronize the WebVTT timeline with the other media, you can simply remove the EXT-X-TIMESTAMP-MAP tag from the WebVTT file.

Synchronizing the other media to the WebVTT timeline is also possible, but not recommended. To do so, offset all other media when packaging by adding the option --timestamp_offset=10. The following can also be used when re-packaging HLS-TS, but this is not recommended.
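Removing the tag can be done with standard text tools. A minimal sketch (the sample file below is made up; inside the WebVTT file itself the header line starts with X-TIMESTAMP-MAP):

```shell
# Create a made-up WebVTT file carrying a timestamp map, then strip the
# X-TIMESTAMP-MAP header so the subtitle timeline starts at zero.
printf 'WEBVTT\nX-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000\n\n00:00.000 --> 00:02.000\nHello\n' > sample.vtt
grep -v '^X-TIMESTAMP-MAP' sample.vtt > sample-zeroed.vtt
```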


Whenever video media samples are reordered a composition delay is introduced. To compensate for this delay we use negative composition offsets (version 1 "ctts" and "trun" boxes) where necessary.

You can also use positive composition offsets (version 0 "ctts" and "trun" boxes). An edit list is then added to remove the composition delay. Note that the use of this option is not recommended.

Overriding and adding track properties

When generating a fragmented or progressive MP4 file (.mp4, .isma, .ismv or .ismt) from an input track, its track properties are based on the properties of the input track. It is possible to add or override some of these properties, but in most cases, this is not necessary.


You can also set a name and description for each track, but that is only possible when generating a server manifest. See --track_name and --track_description.

When generating a fragmented or progressive MP4, the track properties that can be added or overridden are the following:


By default, track_language is taken from the input track's media info. If you do need or want to set the language for a track, make sure to use the correct RFC 5646 language tags. These tags consist of a two-letter or three-letter language code, optionally extended with subtags for extended languages, scripts and regions.


Only DASH and HLS offer support for RFC 5646. For output formats that do not support it, a tag's first two or three characters are parsed according to ISO 639-1 or ISO 639-2/T. If this does not result in a valid language tag, und is used. To make sure that a valid fallback option is available, it is good practice to specify a macro language when possible. For example: signal Cantonese Chinese using zh-yue rather than yue so that 'Chinese' is used as the fallback option.
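The fallback behavior can be pictured as keeping only the primary subtag; a toy sketch (not Packager code):

```shell
# For output formats without RFC 5646 support, only the primary subtag
# (the part before the first hyphen) is matched against ISO 639-1/639-2/T.
primary() { printf '%s\n' "$1" | cut -d- -f1; }
primary zh-yue   # "zh" is a valid ISO 639-1 code, so 'Chinese' is the fallback
primary sr-Cyrl  # "sr" is a valid ISO 639-1 code, so 'Serbian' is the fallback
```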

For example, specifying languages with two letter ISO 639-1 language codes:

  • en for English

  • nl for Dutch

  • es for Spanish

For languages that do not have two letter language codes but do have ISO 639-2/T or ISO 639-3 codes:

  • haw for Hawaiian

  • yue for Cantonese

For languages as used in different regions:

  • en-GB for English as used in the UK

  • nl-BE for Dutch; Flemish as used in Belgium

  • pt-BR for Portuguese as used in Brazil

For additional scripting tags for languages:

  • sr-Cyrl for Serbian using the Cyrillic script

  • zh-Hans for Chinese using the simplified script


In addition to the language tag, HLS requires the presence of a language name. For tags that are part of ISO 639-1 or ISO 639-2/T mapping of the tag to a name is automatic. For all other language tags, --track_description should be used to signal the name.


mp4split -o audio-en.mp4 --track_language=en
mp4split -o audio-nl-be.mp4 --track_language=nl-be \
            --track_description="Vlaams Nederlands"
mp4split -o audio-zh-yue-hant.mp4 --track_language=zh-yue-hant \
            --track_description="Cantonese Chinese using Traditional script"

Language tag formatting

New in version 1.10.2.


As defined in RFC 5646, the capitalization of language tags is now enforced. Thus nl-be will be formatted as nl-BE, zh-hans as zh-Hans, and EN-US as en-US.
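A toy sketch of the capitalization rule (not the Packager's implementation): lowercase the primary language subtag, Title-case four-letter script subtags, and upper-case two-letter region subtags.

```shell
# Normalize RFC 5646 subtag capitalization: language lowercase,
# 4-letter script subtags Title-case, 2-letter region subtags UPPER-case.
normalize() {
  printf '%s\n' "$1" | awk -F- '{
    out = tolower($1)
    for (i = 2; i <= NF; i++) {
      s = $i
      if (length(s) == 4)      s = toupper(substr(s,1,1)) tolower(substr(s,2))
      else if (length(s) == 2) s = toupper(s)
      else                     s = tolower(s)
      out = out "-" s
    }
    print out
  }'
}
normalize nl-be      # nl-BE
normalize zh-hans    # zh-Hans
normalize EN-US      # en-US
```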


Overrides the average bitrate of a track.

By default track_bitrate is the average bitrate (either from the metadata info of the input track, or calculated from the source samples). But it can be overridden explicitly like the following:


mp4split -o tos-override-bitrates.ismv \
 tears-of-steel-aac-64k.mp4 --track_bitrate=64000 \
 tears-of-steel-aac-128k.mp4 --track_bitrate=128000 \
 tears-of-steel-avc1-400k.mp4 --track_bitrate=400000 \
 tears-of-steel-avc1-750k.mp4 --track_bitrate=750000 \
 tears-of-steel-avc1-1000k.mp4 --track_bitrate=1000000

You can also set this to max, so that the maximum/peak bitrate is used. If the source file contains audio as well, then audio and video should be processed in separate steps to get the desired values:


mp4split -o video.ism Origin.mp4 --track_type=video --track_bitrate=max
mp4split -o audio.ism Origin.mp4 --track_type=audio
mp4split -o main.ism audio.ism video.ism

The resulting manifest will contain the following, with the max video bitrate taken from the source content:

<audio src="Origin.mp4" systemBitrate="96000" systemLanguage="por">
<video src="Origin.mp4" systemBitrate="3155800">


To determine the max bitrate the complete track is parsed and bitrate is calculated for each second of media data in the track. The highest value of all of the calculated values is considered the max bitrate of the track.
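The rule can be illustrated with a toy calculation over made-up (timestamp, size) sample pairs:

```shell
# Bucket sample sizes per second of media time, then take the largest
# bucket in bits as the max bitrate (made-up "<seconds> <bytes>" data).
printf '0.0 50000\n0.5 60000\n1.2 20000\n1.8 10000\n2.1 40000\n' | awk '
  { bits[int($1)] += $2 * 8 }
  END {
    max = 0
    for (s in bits) if (bits[s] > max) max = bits[s]
    print max
  }'
```

Here second 0 carries 110000 bytes, so the max bitrate is 880000 bits per second.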


Sets the role of a track, which can be used to further distinguish it next to bitrate and language. The exact meaning of a role can depend on the kind of track it is added to (video, audio or text). All of the roles specified in urn:mpeg:dash:role:2011 can be used. Most of them are listed below:




main: media intended for presentation if no other information is provided.

alternate: media that is an alternative to the main media of the same type.

supplementary: media that is supplementary to media content of a different media component type.

commentary: media content component with commentary.

caption: media content component with captions (typically containing a description of music and other sounds, in addition to a transcript of the dialog). Note that this role triggers specific accessibility signaling for captions in both DASH and HLS.

subtitle: media content component with subtitles.

description: track containing a textual description (intended for audio synthesis) or an audio description, describing the visual component. Note that this role does not trigger specific accessibility signaling; it only changes the role to 'description' for DASH.

metadata: media component containing information intended to be processed by application-specific elements.

forced-subtitle: textual information meant for display when no other text representation is selected. Note that this role triggers specific accessibility signaling for forced subtitles in HLS (as well as adding the 'forced-subtitle' role for the track in DASH, which is the default behavior for DASH when a role is specified for a subtitles track).


mp4split -o example-audio.isma \
  example-audio.mp4 --track_role=main --track_language=eng
mp4split -o example-commentary.isma \
  example-commentary.mp4 --track_role=alternate --track_language=eng


Never mix the use of --track_role and --track_kind when you want to enable the signaling of accessibility features for a track.

If you want to signal the accessibility features for an audio description track in DASH and HLS, using --track_role to define the track's role as 'description' will not get you the results that you expect (it will only affect the DASH output, and only change the track's role, not add an <Accessibility> element). Please read the section on the --track_kind option below, including the 'Use case walkthrough' that explains how to add this signaling step-by-step.

Adding accessibility signaling for a captions track is more straightforward, as it only requires you to specify the track's role as caption when packaging it, or when creating the server manifest (using --track_role=caption).

The signaling for both captions and audio description tracks is based on the DVB-DASH specification for DASH and the HLS Authoring Specification for HLS.

In addition to the signaling described above, it's also possible to trigger logic that will add the 'FORCED=YES' attribute and value to a subtitles track in HLS by specifying the track's role as forced-subtitle (this will also add the 'forced-subtitle' role for the track in DASH, which is the default behavior for DASH when a role is specified for a subtitles track).


New in version 1.7.31.

Adds a SchemeIdUri/Value pair to the 'kind' box when packaging a (f)MP4. This box should describe the intended purpose of the track. Similar to the --track_role option described above, the --track_kind option can be used to further distinguish a track, besides its bitrate and language.

Specifying the parameters of this option is done like so:


--track_kind=<SchemeIdUri>@<Value>

Where <SchemeIdUri> and <Value> should be replaced with parameters of choice, preferably from the about:html-kind scheme defined by W3C HTML5 or the urn:mpeg:dash:role:2011 scheme defined by MPEG-DASH (although the latter can be signaled more easily using the --track_role option).

When packaging for Unified Origin, the main use case of the --track_kind option is adding and properly signaling tracks that provide accessibility features, such as captions for the hard of hearing or an audio description of the video track for the visually impaired. The first can be signaled using urn:mpeg:dash:role:2011@caption (or --track_role=caption), whereas the about:html-kind can be used to signal the latter with about:html-kind@main-desc.

Take for example the situation in which the about:html-kind@main-desc 'kind' is present in a track that has been added to a server manifest. Unified Origin will then add the following parameters for this track when generating a DASH client manifest (.mpd) and HLS main playlist (.m3u8) for playout, based on the DVB-DASH specification and the HLS Authoring Specification respectively:

MPEG-DASH (.mpd)
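The exact Origin output is not reproduced here; as a hedged illustration, DVB-DASH-style audio description signaling in an AdaptationSet typically includes an Accessibility descriptor along these lines (exact attributes may differ per Origin version):

```xml
<Accessibility schemeIdUri="urn:tva:metadata:cs:AudioPurposeCS:2007" value="1"/>
```

In the TV-Anytime AudioPurposeCS scheme, value 1 denotes audio description for the visually impaired.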


HLS (.m3u8)
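Correspondingly for HLS, the HLS Authoring Specification associates audio description renditions with the public.accessibility.describes-video characteristic. An illustrative (not literal) EXT-X-MEDIA line, with made-up NAME and URI values:

```
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="audio",NAME="English (AD)",LANGUAGE="en",AUTOSELECT=YES,CHARACTERISTICS="public.accessibility.describes-video",URI="audio-ad.m3u8"
```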


How to configure an audio description track

As an example, consider as a starting point:

  • ABR video in english-video.ismv

  • Main audio in english-audio.isma

  • Alternate audio in welsh-audio.isma

  • Audio description in english-ad-without-kind-box.isma

Assuming the other tracks are packaged correctly, only the audio description track needs to be repackaged (to include the 'kind' box with the accessibility info). Because its language, bitrate and codec are identical to the main audio track's, the 'kind' box is what distinguishes it (apart from its actual content, of course).

As indicated above, about:html-kind@main-desc should be used as the value for the kind box for audio description tracks, so:


mp4split -o english-ad-with-kind-box.isma \
  english-ad-without-kind-box.isma \
  --track_kind=about:html-kind@main-desc

After repackaging the audio description track, generate a server manifest to stream with Unified Origin:


mp4split -o presentation.ism \
  --hls.client_manifest_version=4 \
  english-video.ismv \
  english-audio.isma \
  welsh-audio.isma \
  english-ad-with-kind-box.isma

Or use the tracks for a static packaging workflow using Unified Packager to create DASH or HLS streams (accessibility signaling is not supported for Smooth).

Packaging content for delivery by Unified Origin

The first step is to package all the source content into the format that is used by Unified Origin. This is the fragmented-MP4 format.

The example uses this Source Content.


mp4split -o video_400k.ismv \
  tears-of-steel-aac-64k.mp4 \
  tears-of-steel-avc1-400k.mp4

mp4split -o video_800k.ismv \
  tears-of-steel-aac-128k.mp4 \
  tears-of-steel-avc1-750k.mp4

mp4split -o video.ismv \
  tears-of-steel-avc1-400k.mp4 \
  tears-of-steel-avc1-750k.mp4 \
  tears-of-steel-dts-384k.mp4 \
  tears-of-steel-ac3-448k.mp4

Now that we have packaged all the audio and video, the following step is to create the two progressive download files. Instead of creating a completely new MP4 video file we will create an MP4 video that only contains the necessary index and references the actual movie data that is stored in the fragmented-MP4 format.


mp4split -o video_400k.mp4 --use_dref \
  video_400k.ismv

mp4split -o video_800k.mp4 --use_dref \
  video_800k.ismv

As a last step we create the server manifest file. This is an XML file that contains the media information about all the tracks and is used by the USP webserver module.


mp4split -o video.ism \
  video.ismv \
  video_400k.ismv \
  video_800k.ismv

At this point we have six files stored for our presentation:




  • video_400k.ismv: AAC-LC, 400 kbps video

  • video_800k.ismv: HE-AAC, 800 kbps video

  • video.ismv: 200/600 kbps video, DTS, AC3

  • video_400k.mp4: AAC-LC, 400 kbps video (progressive download)

  • video_800k.mp4: HE-AAC, 800 kbps video (progressive download)

  • video.ism: USP server manifest file

The USP webserver module makes the following URLs available. Note that except for the progressive download URLs, they are all virtual and do not exist on disk:

Playout formats served from virtual URLs:

  • Smooth Streaming

  • HTTP Live Streaming

  • HTTP Dynamic Streaming

Progressive download files served directly:

  • video_400k.mp4

  • video_800k.mp4

Please download the sample script, which creates the various server manifests as discussed above. The sample content is Tears of Steel.