Packaging Subtitles
Unified Packager allows you to package and prepare your subtitles for streaming delivery, using either statically packaged files or dynamic packaging with Origin.
General workflow for adding subtitles to a stream
Whether you are preparing your subtitles for streaming delivery using static
packaging with Packager or dynamic packaging with Origin, the general rule is
that subtitles have to be packaged in an fMP4 container (.ismt or .cmft)
before they can be added to a stream. All styling information and editorial
changes should be made before packaging, using the relevant encoder or subtitle
tooling. Once packaged in an fMP4 container, adding a subtitle track to a stream
works the same as adding audio or video tracks:
For streaming delivery using statically packaged files, add the .ismt or .cmft with subtitles to your mp4split input when generating the client manifest (.mpd or .m3u8)
For streaming delivery using dynamic packaging with Origin for VOD, add the .ismt or .cmft with subtitles to your mp4split input when generating the server manifest (.ism)
For streaming delivery using dynamic packaging with Origin for Live, the encoder should POST the subtitles, one track per language, to the publishing point
This page explains how to package your subtitles in an fMP4 container. Example command lines for adding fMP4-packaged subtitles to different kinds of streams can be found in the relevant parts of the documentation.
Note
The three exceptions to the general rule that you need to package your subtitles in an fMP4 container before you can add them to a stream are:
Supported formats for subtitles
You can use TTML (Timed Text Markup Language), WebVTT (Web Video Text Tracks) or SRT (SubRip Text) as your source. Packager can convert between these formats as listed in the table below, and can package TTML and WebVTT in an fMP4 container.
For more information on these different formats, please read our blog about subtitles: Welcome to the jungle: caption and subtitle formats in video streaming. In short, WebVTT and SRT are nearly identical plain-text formats, whereas TTML is XML-based.
New in version 1.10.16.
In addition to the above it is possible to extract subtitles from a CEA-608 embedded captions track, and store them as TTML or WebVTT.
Source | Possible outputs
---|---
TTML | WebVTT, TTML in fMP4
WebVTT (or SRT) | TTML, WebVTT in fMP4
CEA-608 | TTML, WebVTT
Supported TTML profiles
The TTML specification defines the use of profiles. Each profile specifies a certain feature set. You can learn more about these profiles and their features in our blog about subtitles: Welcome to the jungle: caption and subtitle formats in video streaming. Packager can package TTML subtitles that follow any of the following profiles: DFXP, SMPTE-TT, EBU-TT-D, SDP-US, CFF-TT and the IMSC1 Text Profile. Unified Origin supports all of those profiles as well.
Difference between WebVTT and SRT
WebVTT is based on SRT and the two are very similar, with only small differences in formatting. The most important difference is that WebVTT has an official specification published by the W3C, which allows for more advanced formatting features (such as positioning).
When using WebVTT or SRT as input for mp4split, consider that:
For SRT, mp4split assumes the input file is encoded in ASCII unless it starts with a Byte Order Mark (BOM) that describes how the input should be transformed to Unicode
For WebVTT, mp4split always interprets the input file as Unicode (regardless of any BOM), because WebVTT is UTF-8 by definition
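Because an SRT file without a BOM is treated as ASCII, it can help to check what a file starts with before packaging it. The following is a minimal sketch, not part of mp4split: detect_bom is a hypothetical helper that only inspects the first three bytes of a file and does not validate the rest of its contents (a UTF-32 BOM is not distinguished from UTF-16).

```shell
#!/bin/bash
# Print which Byte Order Mark (if any) a subtitle file starts with.
# Hypothetical helper for illustration; mp4split does not provide it.
detect_bom() {
  local first
  # Read the first three bytes as lowercase hex, e.g. "efbbbf".
  first=$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')
  case "$first" in
    efbbbf) echo "utf-8" ;;
    fffe*)  echo "utf-16le" ;;
    feff*)  echo "utf-16be" ;;
    *)      echo "none" ;;
  esac
}
```

Calling `detect_bom tears-of-steel-de.srt` prints the detected mark, or `none` when the file has no BOM and would therefore be read as ASCII.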
Note
Neither WebVTT nor SRT contains signaling for the language of the subtitles in
the file. Therefore, always specify the language when using WebVTT or SRT as
input for mp4split (using the --track_language command-line option). Otherwise,
the signaled language defaults to English.
Packaging TTML, WebVTT or SRT in fMP4
When you use Packager to package your subtitles in an fMP4 container, it follows ISO/IEC 14496-30. This results in the following:
When using WebVTT (or SRT) as input, the resulting fMP4 will use the wvtt codec
When using TTML as input, the resulting fMP4 will use the stpp codec
There are only two exceptions to this rule, which are related to packaging TTML and explained in the relevant section below.
When packaging subtitles in an fMP4 container, the following options may be relevant:
--track_language: When you need to add (for WebVTT or SRT) or override language signaling. (If the source does not contain language signaling and you do not add any, English is the default.)
--track_role and --track_kind: When you need to define a 'role' for the subtitles track, or want to add signaling for an accessibility feature.
--fragment_duration: When you want to specify the duration of the fMP4 media fragments into which the subtitles are chunked, for example to align it with the fragments (GOPs) of the other media in your presentation. (The default for subtitles is to create a sample for each separate subtitle cue (progressive) or 2s fragments (cmft/ismt).)
WebVTT (or SRT) in fMP4
New in version 1.7.31.
To create an fMP4 with subtitles that are formatted according to the wvtt
codec, use WebVTT (or SRT) subtitles as input. To avoid confusion about
character encoding, we recommend WebVTT, which uses UTF-8 by definition. You
should always specify the language of the track that you are packaging (using
--track_language), because WebVTT and SRT files do not contain language
signaling. We recommend using a fragment duration that aligns with the other
tracks in the presentation (using --fragment_duration):
#!/bin/bash
mp4split -o tears-of-steel-wvtt-nl.ismt \
--fragment_duration=60/1 \
tears-of-steel-nl.webvtt --track_language=nl
mp4split -o tears-of-steel-wvtt-de.ismt \
--fragment_duration=60/1 \
tears-of-steel-de.srt --track_language=de
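The two commands above follow the same pattern, so for a larger set of files they can be generated rather than typed. The sketch below is a dry run that only prints an mp4split command and does not execute it. It assumes a naming convention that is not required by Packager: the language code is the last dash-separated part of the file name, as in tears-of-steel-nl.webvtt. The function name subtitle_command is hypothetical.

```shell
#!/bin/bash
# Print the mp4split command for one WebVTT or SRT file.
# Assumption: the file is named <title>-<language>.<extension>.
subtitle_command() {
  local input=$1 base lang
  base=${input##*/}   # strip any directory part
  base=${base%.*}     # strip the extension: tears-of-steel-nl
  lang=${base##*-}    # keep the part after the last dash: nl
  printf 'mp4split -o %s-wvtt-%s.ismt --fragment_duration=60/1 %s --track_language=%s\n' \
    "${base%-*}" "$lang" "$input" "$lang"
}
```

For example, `subtitle_command tears-of-steel-nl.webvtt` prints the first command shown above on a single line. Review the printed commands before running them, since a file name that does not follow the assumed convention yields a wrong language code.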
Note
Packaging as wvtt allows WebVTT-specific cue settings to define individual
subtitle positioning, region and styling information. This provides more
detailed control over WebVTT than relying on Unified Origin to transcode WebVTT
fragments from TTML-formatted subtitles.
TTML in fMP4
To create an fMP4 with subtitles that are formatted according to the stpp
codec, use TTML subtitles as input:
#!/bin/bash
mp4split -o tears-of-steel-ttml-nl.ismt \
tears-of-steel-nl.ttml
This command creates a file with a single track, which is why the TTML input file should contain only one language. If you have a single TTML file that contains multiple languages then you will have to extract separate TTML files for each language first.
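Before packaging, a quick check can show whether a TTML file declares more than one language. The following is a rough sketch rather than an XML-aware tool: ttml_languages is a hypothetical helper that lists the distinct values of double-quoted xml:lang attributes and ignores how they are nested in the document.

```shell
#!/bin/bash
# List the distinct xml:lang values found in a TTML file, one per line.
# Text-based check only; single-quoted attributes are not detected.
ttml_languages() {
  grep -o 'xml:lang="[^"]*"' "$1" | cut -d'"' -f2 | sort -u
}
```

If `ttml_languages tears-of-steel.ttml` prints more than one line, the file probably needs to be split into one TTML file per language first.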
As already noted above, there are two exceptions to take into account when packaging TTML in fMP4:
When you use SMPTE-TT formatted TTML with bitmaps as your input, the samples in the fMP4 are automatically formatted according to the SMPTE-TT specification
When you are statically packaging HTTP Smooth Streaming (see Packaging for HTTP Smooth Streaming (HSS)), you should use the command-line option --brand=piff to ensure that the older dfxp codec is used, so that the timing of the @begin and @end attributes in the resulting fMP4 is relative to the start of each sample, instead of relative to the start of the track
Note
The distinction between the stpp and dfxp codecs is only relevant for
statically packaged content. When you are working with Unified Origin, timing
will be adjusted automatically if necessary.
Converting WebVTT (or SRT) to TTML
When you convert WebVTT or SRT to TTML, the TTML will have a default styling
and layout that in general should work well (see the overview of supported cue
components below). To convert WebVTT or SRT to TTML, use a WebVTT or SRT file
as input and specify an output with .ttml or .dfxp as the extension:
#!/bin/bash
mp4split -o tears-of-steel-nl.ttml \
tears-of-steel-nl.webvtt --track_language="nl"
mp4split -o tears-of-steel-fr.ttml \
tears-of-steel-fr.srt --track_language="fr"
Supported cue components
When converting WebVTT or SRT to TTML, only a limited set of markup features is converted to their TTML equivalents. Others are either ignored or escaped (see the example below). The markup features that will be converted are the following:
Name | Description
---|---
<b> | Bolds the textual content
<i> | Italicises the text
<u> | Underlines the textual content
<s> | Specifies a line strike through on the text
Here is an example of a regular WebVTT file with some cue point component elements:
WebVTT cue point example:
WEBVTT

1
00:00:15.000 --> 00:00:18.000
At the <u>left</u> we can see...

2
00:00:18.167 --> 00:00:20.083 position:35% line:20 align:left
At the <u>right</u> we can see the...

3
00:00:20.083 --> 00:00:22.000
...the <c.highlight>head-snarlers</c>

4
00:00:22.000 --> 00:00:24.417
Everything is safe.
<i>Perfectly</i> safe.
Result after converting to TTML:
<?xml version="1.0" encoding="utf-8"?>
<tt xmlns="..." xml:lang="en">
  <head>...</head>
  <body>
    <div xml:lang="en">
      <p begin="00:00:15.000" end="00:00:18.000" region="speaker">
        At the <span tts:textDecoration="underline">left</span> we can see...
      </p>
      <p begin="00:00:18.167" end="00:00:20.083" region="speaker">
        At the <span tts:textDecoration="underline">right</span> we can see the...
      </p>
      <p begin="00:00:20.083" end="00:00:22.000" region="speaker">
        ...the &lt;c.highlight&gt;head-snarlers&lt;/c&gt;
      </p>
      <p begin="00:00:22.000" end="00:00:24.417" region="speaker">
        Everything is safe.<br />
        <span tts:fontStyle="italic">Perfectly</span> safe.
      </p>
    </div>
  </body>
</tt>
Note
The cue settings (cue 2) are ignored when converting to TTML, and unrecognized styling in the payload is escaped (cue 3).
Converting TTML to WebVTT
In general, TTML offers a lot more flexibility regarding document structure and styling of cues. When converting TTML to WebVTT, only a subset of this extra information will be maintained:
Bold text
Italicized text
Underlined text
Strike through text
Also, only explicit line breaks (<br />) will be respected, meaning cues
spread out over more than one paragraph (<p>) will end up on one line in
WebVTT.
Note
Converting image-based TTML to WebVTT is not supported. When using image-based TTML as an input for Origin, use dynamic track selection (see Using dynamic track selection) to filter out the image-based TTML input when requesting HLS.
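To find out whether a TTML file is image-based before attempting a conversion, a simple text search is often enough. The sketch below assumes that the file references its bitmaps through SMPTE-TT image markup (the smpte:backgroundImage attribute or smpte:image element) and that it uses the conventional smpte: namespace prefix. A file that binds the namespace to a different prefix would not be detected. The function name ttml_has_images is hypothetical.

```shell
#!/bin/bash
# Succeed (exit status 0) if a TTML file appears to reference bitmap subtitles.
# Assumption: SMPTE-TT image markup with the conventional "smpte:" prefix.
ttml_has_images() {
  grep -q -E 'smpte:(backgroundImage|image)' "$1"
}
```

A call such as `ttml_has_images subtitles.ttml && echo "image-based"` can then be used to flag image-based files.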