Must Fix: a suitable bitrate ladder (content dependent)

One should prepare video content as a set of tracks at different bitrates, with each track representing a different quality level. This set of bitrates is called a bitrate ladder.

A potential ladder could look like this (regard it only as an example of what a ladder might look like, not as a recommendation to use this particular one):

Resolution (16:9)	Bitrate (H264)
416x234	145 kb/s
640x360	365 kb/s
768x432	730 kb/s
768x432	1100 kb/s
960x540	2000 kb/s
1280x720	4500 kb/s
1920x1080	6000 kb/s
1920x1080	7800 kb/s

Choosing a good ladder is important for quality and efficient delivery. What is 'good' depends on the capabilities of the end users' devices, network capacity, the codec that is used, and the content itself. Ideally, you adjust the bitrate ladder per asset in your library (as some content requires higher bitrates to achieve the same quality, and other content can be encoded more efficiently).

Choosing a bitrate ladder is subject to different opinions and research. For example, the above ladder is taken from the Apple HLS Authoring Specification. A more in-depth look at bitrate ladders can be found in Optimal Design of Encoding Profiles for ABR Streaming, a paper that Yuri Reznik presented at the Packet Video Workshop during ACM MMSys 2018.

In short, Reznik's paper suggests that the first thing to explore is the quality-versus-bitrate curve of your content, which, as noted earlier, differs per asset. For some assets, a higher bitrate will not offer a significant gain in quality (as measured by PSNR and SSIM).

The second thing to determine is the steps between the bitrates that you want to offer. These steps will often be bigger for Live content than for VOD.
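One way to get a feel for the step sizes in a ladder is to compute the ratio between adjacent bitrates. The minimal Python sketch below does this for the example ladder above; the name `step_ratios` is purely illustrative, and a ratio on its own says nothing about perceived quality.

```python
# Bitrates (kb/s) of the example ladder above, lowest to highest.
LADDER_KBPS = [145, 365, 730, 1100, 2000, 4500, 6000, 7800]


def step_ratios(bitrates_kbps):
    """Return the ratio between each pair of adjacent rungs (higher / lower)."""
    ordered = sorted(bitrates_kbps)
    return [high / low for low, high in zip(ordered, ordered[1:])]


if __name__ == "__main__":
    ordered = sorted(LADDER_KBPS)
    for low, high, ratio in zip(ordered, ordered[1:], step_ratios(LADDER_KBPS)):
        print(f"{low:>5} -> {high:>5} kb/s: x{ratio:.2f}")
```

For the example ladder, the steps range from 1.3x between the two top rungs to roughly 2.5x between the two bottom ones.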

Finally, you can optimize a bitrate ladder for the characteristics of your network. This step is a bit more challenging, as you'll need a model for your network bandwidth and client behavior. In Reznik's paper, a simple approach based on an LTE model is used, with a client that will always choose the highest possible bitrate within the constraints of estimated available bandwidth.
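The client behavior described above (always pick the highest bitrate that fits the estimated available bandwidth) can be sketched in a few lines of Python. This is a simplified illustration, not code from the paper: it ignores buffer state and the LTE bandwidth model, and falling back to the lowest rung when nothing fits is an assumption made for this sketch. The name `select_bitrate` is hypothetical.

```python
# Bitrates (kb/s) of the example ladder above.
LADDER_KBPS = [145, 365, 730, 1100, 2000, 4500, 6000, 7800]


def select_bitrate(ladder_kbps, estimated_bandwidth_kbps):
    """Return the highest rung that fits the estimated bandwidth.

    If even the lowest rung does not fit, fall back to the lowest rung
    (an assumption made for this sketch) so playback can continue.
    """
    ordered = sorted(ladder_kbps)
    fitting = [b for b in ordered if b <= estimated_bandwidth_kbps]
    return fitting[-1] if fitting else ordered[0]


if __name__ == "__main__":
    # Hypothetical bandwidth estimates in kb/s, for illustration only.
    for estimate in (100, 800, 2500, 10_000):
        print(estimate, "->", select_bitrate(LADDER_KBPS, estimate))
```

In an actual evaluation you would feed such a client with bandwidth samples from a network model and, for example, look at how often each rung gets selected; that modelling is outside the scope of this sketch.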

Note

The aspect ratio must remain the same across your entire bitrate ladder.
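A quick way to verify this for a ladder is to reduce each resolution to an exact ratio. The Python sketch below uses `fractions.Fraction` instead of floating-point division so that, for example, 416x234 and 1920x1080 compare as exactly equal (both reduce to 16/9). It assumes square pixels (sample aspect ratio 1:1); the function names are illustrative.

```python
from fractions import Fraction

# (width, height) of each rung in the example ladder above.
LADDER_RESOLUTIONS = [
    (416, 234), (640, 360), (768, 432), (768, 432),
    (960, 540), (1280, 720), (1920, 1080), (1920, 1080),
]


def distinct_aspect_ratios(resolutions):
    """Return the set of exact (reduced) aspect ratios.

    Assumes square pixels, so width/height is the display aspect ratio.
    """
    return {Fraction(width, height) for width, height in resolutions}


def has_single_aspect_ratio(resolutions):
    """True if every rung shares the same aspect ratio."""
    return len(distinct_aspect_ratios(resolutions)) == 1


if __name__ == "__main__":
    print(distinct_aspect_ratios(LADDER_RESOLUTIONS))  # a single 16/9 entry
```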

Audio

The need for an audio-specific bitrate ladder is less obvious, since audio can be encoded at high quality using relatively low bitrates (compared to video). For example, when streaming video, the difference between a 128 kb/s AAC stereo track and a 64 kb/s version of the same track might not be worth the added complexity in your setup.

This is different from a setup where you expand the audio that you offer beyond stereo, to include surround sound. You may even use a variety of codecs for your surround sound offerings (e.g., Dolby EC-3 and DTS:X). However, it is recommended to follow the Apple HLS Authoring Specification and make all of your audio offerings (different language and audio description tracks) available in the same combinations of codec and bitrate, e.g. (where 'AD' stands for an accessibility track that offers audio description):

Codec	Language	Bitrate
AAC-LC	English	128 kb/s
AAC-LC	English (AD)	128 kb/s
AAC-LC	Spanish (dubbed)	128 kb/s
Dolby EC-3	English	384 kb/s
Dolby EC-3	English (AD)	384 kb/s
Dolby EC-3	Spanish (dubbed)	384 kb/s

Note

When multiple languages are made available, all the audio profiles (codec and bitrate combinations) must be present for each language for Origin's HLS output to be compliant with the Apple HLS Authoring Specification.
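Whether a set of audio tracks meets this requirement can be checked mechanically: collect every (codec, bitrate) combination that is offered anywhere, then see which languages lack one. The Python sketch below does this for the example table above. It treats each entry in the Language column, including the AD variants, as its own group, which is an assumption of this sketch; the function name is hypothetical.

```python
from collections import defaultdict

# (codec, language, bitrate in kb/s), taken from the example table above.
AUDIO_TRACKS = [
    ("AAC-LC", "English", 128),
    ("AAC-LC", "English (AD)", 128),
    ("AAC-LC", "Spanish (dubbed)", 128),
    ("Dolby EC-3", "English", 384),
    ("Dolby EC-3", "English (AD)", 384),
    ("Dolby EC-3", "Spanish (dubbed)", 384),
]


def missing_profiles(tracks):
    """Map each language to the (codec, bitrate) profiles it lacks.

    A profile is any (codec, bitrate) combination offered for at least one
    language; languages that offer every profile are left out of the result.
    """
    all_profiles = {(codec, kbps) for codec, _, kbps in tracks}
    offered = defaultdict(set)
    for codec, language, kbps in tracks:
        offered[language].add((codec, kbps))
    return {
        language: all_profiles - profiles
        for language, profiles in offered.items()
        if all_profiles - profiles
    }


if __name__ == "__main__":
    print(missing_profiles(AUDIO_TRACKS))       # empty: every language is complete
    print(missing_profiles(AUDIO_TRACKS[:-1]))  # Spanish lacks Dolby EC-3 at 384 kb/s
```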

Exception: radio (audio-only streams)

When audio-only streams are offered, it may become more worthwhile to offer stereo tracks at several different bitrates, so that end users can enjoy these streams even on very limited connections. In such cases, HE-AAC might be used for the lower bitrates and AAC-LC for the higher ones, possibly with different sample rates as well:

Bitrate (kb/s)	Sample rate (kHz)	Audio codec
24	24	HE-AAC
64	32	HE-AAC
96	48	HE-AAC
128	48	AAC-LC
320	48	AAC-LC
384	48	Dolby AC-3

Should Fix: all tracks are compliant with a CMAF media profile

CMAF media profiles are an important feature of MPEG's CMAF specification. They define specific configurations of media and content. These profiles are mainly defined by MPEG and the CTA-WAVE Content Specification Task Force.

Media profiles of a visual track may for example define the following:

Maximum frame width and height
Visual Usability Information (VUI) usage
Codec usage and profile of the codec (e.g., AVC high profile)
Other codec specific settings
A brand (a four-character code) that identifies the profile used

Some example media profiles are 'cfhd' for High Definition AVC video and 'caac' for AAC audio.

Note that full CMAF compliance involves further constraints that tracks must also adhere to (typically, encoders make sure this is the case):

Same aspect ratio for all renditions within a switching set (a "group" of video tracks that share a number of characteristics so that a player can switch between them without issue)
Same color space and color transfer characteristics for all renditions within a switching set
Visual tracks only contain samples for display, without padding of the image using black pixels to fit the aspect ratio
Bit depth and chroma format do not change within a track
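The constraints in the list above that can be read from track metadata lend themselves to a simple automated check. The Python sketch below assumes you have already extracted the relevant values per rendition into a hypothetical `VideoTrack` record (how you obtain them, e.g. with a media analysis tool, is out of scope). It does not cover the rule against black-pixel padding, which requires inspecting the picture itself, and it assumes square pixels when comparing aspect ratios.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class VideoTrack:
    """Hypothetical summary of one rendition in a switching set."""
    name: str
    width: int
    height: int
    colour_primaries: int          # VUI code point, e.g. 1 = BT.709
    transfer_characteristics: int  # VUI code point, e.g. 1 = BT.709
    # (bit_depth, chroma_format) of every coded video sequence in the track
    formats: tuple


def switching_set_issues(tracks):
    """Return human-readable descriptions of violated constraints."""
    issues = []
    # Assumes square pixels, so width/height is the display aspect ratio.
    if len({Fraction(t.width, t.height) for t in tracks}) > 1:
        issues.append("aspect ratio differs between renditions")
    if len({(t.colour_primaries, t.transfer_characteristics) for t in tracks}) > 1:
        issues.append("colour space or transfer characteristics differ")
    for t in tracks:
        if len(set(t.formats)) > 1:
            issues.append(f"{t.name}: bit depth or chroma format changes within the track")
    return issues


if __name__ == "__main__":
    hd = VideoTrack("720p", 1280, 720, 1, 1, ((8, "4:2:0"),))
    sd = VideoTrack("360p", 640, 360, 1, 1, ((8, "4:2:0"),))
    print(switching_set_issues([hd, sd]))  # []
```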

Should Fix: timed metadata is carried in a separate sparse track

If all Timed Metadata for a stream is contained in a separate sparse track, Origin can rely on a single source of information for such metadata. Otherwise, Origin needs to scan all media tracks for potential Timed Metadata, which is less efficient and may also result in conflicting information (when certain Timed Metadata is present in one track, but not in another and vice versa).

A sparse track does not represent a continuous stream of data, but an intermittent one. This is an ideal fit for Timed Metadata, which occurs intermittently, as opposed to the tracks that store audio and video, which should be continuous.
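To illustrate the conflict problem described above: when Timed Metadata is carried inside each media track, something has to reconcile the tracks. The Python sketch below finds events that occur in some tracks but not in others. The function name and the (time, event_id) representation are hypothetical and do not reflect how Origin works internally; the point is that with a single sparse metadata track there is only one source, so this check becomes unnecessary.

```python
def inconsistent_events(events_per_track):
    """Return timed-metadata events found in some tracks but not in all.

    events_per_track maps a track name to an iterable of (time, event_id)
    pairs. With only one track (e.g. a dedicated sparse metadata track),
    the result is always empty.
    """
    event_sets = [set(events) for events in events_per_track.values()]
    if not event_sets:
        return set()
    return set.union(*event_sets) - set.intersection(*event_sets)


if __name__ == "__main__":
    # Hypothetical events: ad-2 is only signalled in the video track.
    per_track = {
        "video": [(10.0, "ad-1"), (20.0, "ad-2")],
        "audio": [(10.0, "ad-1")],
    }
    print(inconsistent_events(per_track))  # {(20.0, 'ad-2')}
```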

Should Fix: add captions or subtitles

To increase accessibility of your content you should add captions or subtitles where possible.

Embedded in video (CEA-608 / CEA-708)

CEA-608 or CEA-708 closed captions may be embedded in the AVC video stream. When delivering captions or subtitles like this, make sure that the presence of CEA-608 or CEA-708 is signaled throughout the entire stream (i.e., even when captions or subtitles are not present).

Since CEA-608 closed captions are embedded in the video track, their language is set to the language of the video track.

This technique has some limitations: the number of tracks is limited (to two language tracks), and because the tracks are embedded, they are not available for styling or manipulation by the player.

As a separate track (TTML or WebVTT)

The recommended method for providing live subtitles is through an ISMT stream, which Origin converts to WebVTT for HLS clients. This gives you the benefits of multiple language tracks and the ability to name and describe tracks.

The ISMT fragments should contain either WVTT (ISO/IEC 14496-30:2014 - Web Video Text Tracks) or TTML (Timed Text Markup Language) formatted text, similar to the output of mp4split. When TTML is used, there is basic support for WebVTT styling as supported by most HLS players. For more sophisticated cue styling and positioning, we recommend using WVTT.
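For reference, the plain-text WebVTT that HLS clients ultimately receive consists of a `WEBVTT` header followed by cues, each with a `start --> end` timing line and its text. The Python sketch below builds such a minimal document from (start, end, text) tuples. It only illustrates the output format; it does not model ISMT/WVTT sample packaging, Origin's conversion, or any styling, and the function names are hypothetical.

```python
def format_timestamp(seconds):
    """Format seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    millis = round(seconds * 1000)
    hours, rest = divmod(millis, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def to_webvtt(cues):
    """Build a minimal WebVTT document from (start, end, text) cues in seconds."""
    lines = ["WEBVTT", ""]
    for start, end, text in cues:
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(text)
        lines.append("")  # a blank line terminates each cue
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_webvtt([(1.0, 3.5, "Hello"), (3661.5, 3663.0, "One hour later")]))
```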

For information about preparing VOD subtitles, please read the section on preparing subtitles.

Should Fix: add an audio description track (for the visually impaired)

To increase accessibility of your content you should add audio description tracks where possible. An audio description track is a track that contains not only the regular audio, but also a spoken description of what is happening in the video.

For VOD, refer to the Configuring Audio description track how-to to learn how to add an audio description track (note that it is important to follow the steps exactly).

Should Fix: avoid transcoding of subtitles when using advanced styling

If your subtitles contain cues with advanced styling, do take into consideration that this styling is stripped when Unified Packager or Unified Origin is used to transcode these subtitles from one format into another, i.e., TTML into WebVTT or vice versa.

That is, if you need advanced styling to be present in your output, make sure that your source subtitles are already encoded in the format that is required (i.e., TTML, WebVTT, or both).
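When planning a workflow, it can help to list per required output format whether it can be passed through or would have to be transcoded (and thus lose advanced styling). The Python sketch below is a hypothetical planning helper that treats formats as plain strings such as "TTML" and "WebVTT"; it is not part of Unified Packager or Unified Origin.

```python
def plan_subtitle_outputs(source_formats, required_formats):
    """Label each required output format as passthrough or transcode.

    Transcoded outputs are flagged because advanced styling is stripped
    when subtitles are converted from one format into another.
    """
    available = set(source_formats)
    return {
        fmt: "passthrough" if fmt in available else "transcode (advanced styling lost)"
        for fmt in sorted(set(required_formats))
    }


if __name__ == "__main__":
    # Source only in TTML, but both TTML and WebVTT are needed in the output.
    print(plan_subtitle_outputs({"TTML"}, {"TTML", "WebVTT"}))
```

If any entry reads "transcode", providing that format as an additional source avoids losing the styling.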

Note

Captions embedded in the video track (i.e., CEA-608/708) should only be used if there is a clear business case or need for them. Otherwise, using them is not recommended.