What is the preferred webserver for media delivery?
- TL;DR
We recommend running Unified Origin as an Apache module.
Webservers expose different APIs and take different internal approaches to processing and resource management. The following overview describes those differences and the consequences of the choices made by each webserver.
Resource Management
Apache
Apache structures resource use using so-called "buckets" and "brigades". From Apache's documentation:
A bucket is a container for data. Buckets can contain any type of data. Although the most common case is a block of memory, a bucket may instead contain a file on disc, or even be fed a data stream from a dynamic source such as a separate program. Different bucket types exist to hold different kinds of data and the methods for handling it. In OOP terms, the apr_bucket is an abstract base class from which actual bucket types are derived. [1]
The bucket is contained in a so-called "brigade":
A brigade is a container that may hold any number of buckets in a ring structure. [1]
Buckets allow for efficient handling of data, because they can be copied, split and deleted simply by copying pointers. In addition, the bucket API can be extended.
For instance, libfmp4 (the Unified Streaming core library) extends Apache's buckets with the following two types:
Bucket type | Description |
---|---|
bucket_type_http | Dynamic type, used to GET data over HTTP |
bucket_type_xfrm | Dynamic type, used to encrypt outgoing data |
Access to the underlying data is handled differently by each bucket type, so an HTTP bucket implements a different read than a FILE or an XFRM bucket.
This means that, for instance, when encrypting output the webserver does not have to load the whole file into memory but can delay the encryption until it is actually required: just before sending the data out.
In short, when producing output libfmp4 creates the buckets and sets their type. Apache then traverses the bucket list and either sends the data straight out (as libfmp4 implements the bucket API, Apache can simply walk over it) or processes it automatically when calling the "read" function (for instance, dynamically encrypting when the type is an XFRM bucket).
The bucket list is created completely when processing output for a client request, but actual IO is deferred to when it is actually needed: when Apache walks over the bucket list just before data should be offloaded to the network. In effect, Apache takes control of the libfmp4 generated bucket list and uses what can be understood as lazy-evaluation on the bucket list.
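The deferred evaluation described above can be illustrated with a small model. This is a minimal sketch in Python, not Apache's or libfmp4's actual C API: the class names `Bucket`, `HeapBucket`, `XfrmBucket` and `Brigade` are hypothetical stand-ins, a plain list replaces Apache's ring structure, and `toy_cipher` is a placeholder XOR, not real encryption.

```python
from abc import ABC, abstractmethod
from typing import Callable, Iterator, List


class Bucket(ABC):
    """Abstract container for data; each subclass decides how read() obtains it."""

    @abstractmethod
    def read(self) -> bytes:
        ...


class HeapBucket(Bucket):
    """Data that already sits in memory."""

    def __init__(self, data: bytes) -> None:
        self._data = data

    def read(self) -> bytes:
        return self._data


class XfrmBucket(Bucket):
    """Wraps another bucket and transforms its data only when it is read."""

    def __init__(self, source: Bucket, transform: Callable[[bytes], bytes]) -> None:
        self._source = source
        self._transform = transform

    def read(self) -> bytes:
        return self._transform(self._source.read())


class Brigade:
    """Ordered collection of buckets (a list here, a ring in Apache)."""

    def __init__(self) -> None:
        self._buckets: List[Bucket] = []

    def append(self, bucket: Bucket) -> None:
        self._buckets.append(bucket)

    def walk(self) -> Iterator[bytes]:
        # Reading and transforming happens here, one bucket at a time.
        for bucket in self._buckets:
            yield bucket.read()


def toy_cipher(data: bytes) -> bytes:
    """Placeholder for encryption: XOR each byte with 0x2A (NOT real crypto)."""
    return bytes(b ^ 0x2A for b in data)


calls: List[str] = []


def logging_cipher(data: bytes) -> bytes:
    """Same as toy_cipher, but records when the transform actually runs."""
    calls.append("transform")
    return toy_cipher(data)


# Building the brigade does no work yet.
brigade = Brigade()
brigade.append(HeapBucket(b"header"))
brigade.append(XfrmBucket(HeapBucket(b"payload"), logging_cipher))
work_before_walk = len(calls)

# The "server" walks the brigade just before sending; only now is data transformed.
output = list(brigade.walk())
```

Because the transform runs inside `read()`, only one bucket's worth of data needs to be processed at a time while the brigade is walked.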
By implementing specialized buckets through the bucket API, use cases such as download to own for HTTP Live Streaming (HLS) and progressive download become possible in Apache without having to load everything into memory, keeping resource use low and flexible.
Nginx
In many ways, Nginx can be seen as 'Apache Light': much of Apache's structure and ideas have been copied and then changed towards Nginx's own purpose.
Nginx unfortunately lacks key Apache features:
there is no (extensible) bucket API
only FILE and HEAP buckets are implemented in Nginx
The lack of an extensible bucket API in Nginx limits certain use cases, notably progressive download and download to own: data must be copied into memory using the HEAP bucket, as there is no XFRM (transform/encrypt) bucket.
Serving many such requests would bring down the webserver, as it would need to load each requested file into memory completely. This functionality is therefore not available with Nginx.
Process Management
Regarding request handling, webservers have basically two choices:
thread/process based request handling
event based request handling
Both have a number of pros and cons; a good overview can be found in 'Concurrent Programming for Scalable Web Architectures'. [2]
An important difference is how state is handled:
A thread can store state as part of its thread stack, independently of any scheduling. In an event-driven system, it is the developer's responsibility to handle and recover state between event handlers. [3]
This basically implies that an event-based architecture requires a reversal of the control flow, in which the callback chain in effect forms a state machine.
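The difference in state handling can be made concrete with a toy request handler written in both styles. This is an illustrative sketch only: the function and class names are hypothetical, the "events" are a plain list rather than real socket notifications, and upper-casing the body stands in for actual request processing.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


def handle_request_threaded(chunks: List[bytes]) -> bytes:
    """Thread/process style: progress lives in local variables on the stack."""
    received = b""
    for chunk in chunks:  # a blocking read would sit here in real code
        received += chunk
    return received.upper()


@dataclass
class RequestState:
    """Event style: state must be stored explicitly between callbacks."""
    phase: str = "reading"
    received: bytes = b""
    response: Optional[bytes] = None


def on_data(state: RequestState, chunk: bytes) -> None:
    """Callback for a 'data' event: accumulate input in the saved state."""
    state.received += chunk


def on_end(state: RequestState) -> None:
    """Callback for an 'end' event: produce the response and change phase."""
    state.response = state.received.upper()
    state.phase = "done"


def run_event_loop(events: List[Tuple[str, bytes]]) -> RequestState:
    """Dispatch events to callbacks; the phase field acts as the state machine."""
    state = RequestState()
    for name, payload in events:
        if state.phase == "reading" and name == "data":
            on_data(state, payload)
        elif state.phase == "reading" and name == "end":
            on_end(state)
    return state


chunks = [b"ab", b"cd"]
threaded_result = handle_request_threaded(chunks)

events = [("data", c) for c in chunks] + [("end", b"")]
final_state = run_event_loop(events)
```

Both variants produce the same response, but in the event-driven one the loop owns the control flow and the handler logic is split across callbacks that share an explicit state object.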
Apache allows for both models as it supports both a thread/process back end as well as an event based back end.
Nginx, however, supports only an event-based model. Unfortunately, this has a huge impact on the synchronous I/O used in libfmp4, for instance when fetching samples over HTTP from object storage (e.g. Amazon S3) or making upstream calls in the Remix workflow.
In fact, the upstream call will block the Nginx event handler until it returns, rendering the Nginx process unresponsive to anything else. Interestingly, Nginx itself uses synchronous I/O internally in certain cases ("stat" for instance), which can become a bottleneck.
Nginx has recognized this and, in version 1.7.11 (2015), introduced the concept of a thread pool, as 'it's not possible to avoid blocking operations in every case'. [4] The thread pool API is used to boost the performance of three of Nginx's internal I/O-related system calls. [4]
However, Nginx also supports (Fast)CGI, which might prove a better, more modular approach to (blocking) I/O than a thread pool with all its inherent complexity.
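The effect of a blocking call inside an event loop, and the thread pool remedy, can be reproduced with Python's `asyncio` as an analogy. This is a sketch under stated assumptions: it does not use Nginx's thread pool API, `blocking_upstream_call` is a hypothetical stand-in for synchronous I/O, and the 0.2 second delay and 10 ms tick interval are arbitrary.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_upstream_call(seconds: float) -> str:
    """Stand-in for synchronous I/O, e.g. a blocking HTTP fetch."""
    time.sleep(seconds)
    return "sample data"


async def measure(offload: bool) -> int:
    """Return how many ticks other work completed while the call was running."""
    ticks = 0

    async def ticker() -> None:
        # Represents every other request the event loop should be serving.
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    task = asyncio.create_task(ticker())
    await asyncio.sleep(0)  # give the ticker a chance to start
    before = ticks

    if offload:
        # Hand the blocking call to a worker thread; the loop stays responsive.
        loop = asyncio.get_running_loop()
        with ThreadPoolExecutor(max_workers=1) as pool:
            await loop.run_in_executor(pool, blocking_upstream_call, 0.2)
    else:
        # Call it directly: the event loop cannot run anything else meanwhile.
        blocking_upstream_call(0.2)

    during = ticks - before
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return during


ticks_when_blocked = asyncio.run(measure(offload=False))
ticks_when_offloaded = asyncio.run(measure(offload=True))
```

When the call runs directly on the loop, no other work makes progress until it returns; when it is offloaded to the pool, the ticker keeps running alongside it.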
http://berb.github.io/diploma-thesis/original/043_threadsevents.html
https://www.nginx.com/blog/thread-pools-boost-performance-9x/
http://www.unified-streaming.com/cases/late-transmuxing-improving-caching-video-streaming-thesis
Media Origin
Looking at benchmarks between Apache and Nginx it becomes clear quite quickly that Nginx performs better when it comes to static file serving. [7]
However, a Media Origin serving video content in many different formats, adding subtitles and applying DRM dynamically is not a static file serving setup.
In fact, recent tests show that Nginx and Apache perform the same when it comes to dynamic content. [7] [8] [9]
When performance is not the differentiator, other factors become decisive: Apache has richer APIs, better internal documentation and a choice of processing models, which matters for an essentially I/O-bound process like a (dynamic) Media Origin.
In terms of deployment there is no difference either: Apache can easily be deployed as a virtual machine, on hardware, or as a containerized microservice from Docker Hub serving particular use cases, which allows horizontal scaling.
However, Nginx shines as a reverse proxy and cache, so using both is highly recommended. Whenever possible, media fragments should be served from cache, either by the CDN or by an Nginx cache in front of the origin; only on a full cache MISS should the origin be contacted. See Origin shield cache for further details.
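The lookup order described above (CDN first, then the shield cache, then the origin) can be modelled in a few lines. This is a conceptual sketch, not an Nginx or CDN configuration: the caches are plain dictionaries, and `fetch_fragment`, `toy_origin` and the fragment URL are hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, List, Tuple

Cache = Dict[str, bytes]


def fetch_fragment(
    url: str,
    caches: List[Tuple[str, Cache]],
    origin: Callable[[str], bytes],
) -> Tuple[bytes, str]:
    """Return (body, served_by); contact the origin only on a full cache MISS."""
    for index, (name, store) in enumerate(caches):
        if url in store:
            body = store[url]
            # Fill the caches closer to the client that missed.
            for _, nearer in caches[:index]:
                nearer[url] = body
            return body, name
    # Every cache layer missed: this is the only path that reaches the origin.
    body = origin(url)
    for _, store in caches:
        store[url] = body
    return body, "origin"


origin_requests: List[str] = []


def toy_origin(url: str) -> bytes:
    """Stand-in for the media origin; records each request it receives."""
    origin_requests.append(url)
    return b"fragment:" + url.encode()


cdn: Cache = {}
shield: Cache = {}
layers = [("cdn", cdn), ("shield", shield)]
url = "/video/segment-1.m4s"  # hypothetical fragment URL

first = fetch_fragment(url, layers, toy_origin)   # full MISS
cdn.clear()                                       # simulate CDN eviction
second = fetch_fragment(url, layers, toy_origin)  # served by the shield cache
third = fetch_fragment(url, layers, toy_origin)   # served by the CDN again
```

In this model the origin is contacted once for three client requests, which is the load reduction a shield cache in front of the origin aims for.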
https://www.conetix.com.au/blog/apache-vs-nginx-vs-openlitespeed-part-1
Media Edge
It is possible to move just-in-time content generation to the edge of the network using a combination of Nginx and Apache.
Details of this setup can be found in the late transmuxing whitepaper.