Enhancing Video Streaming Quality for ExoPlayer—Part 1: Quality of User Experience Metrics
Written by: Mark Greve and Domițian Tămaș-Selicean
The online video player landscape is fragmented with a wide variety of players across a mix of popular platforms. In the world of HTML5-video players in browsers, there are a number of open-source solutions (e.g., hls.js, dash.js, Shaka Player), as well as commercial offerings which include Akamai's Adaptive Media Player (AMP).
On Android, one of the most popular choices is Google's ExoPlayer, which will be the focus of this blog post. ExoPlayer is an open-source player developed by Google for the Android platform and distributed under the Apache License 2.0. Although each Android version contains a native MediaPlayer out of the box, ExoPlayer has several advantages:
It is open-source, modular, customizable, and extensible
It supports multiple streaming formats (e.g., HLS, MPEG-DASH, SmoothStreaming) and features (e.g., Widevine common encryption)
It allows an app to use the same player across different Android versions
Specifically, in this blog series we will look at what options you have to improve the quality of the user experience (QoE) by tweaking configuration options in ExoPlayer. In this first post, we will define the QoE metrics, review common video player features, discuss how these features impact QoE, and describe the trade-offs between the different metrics.
In the subsequent post, we will focus on specific ExoPlayer configuration options, and how to tweak them to improve certain QoE metrics.
QoE is the overall experience of a user watching a video stream. Unlike quality of service (QoS), QoE is subjective, which makes it difficult to measure and difficult to guarantee at a given level. Let’s examine the metrics that influence QoE.
Startup time is the period from the moment the playback session is initiated until playback has started. In other words, it's the time between the user pressing Play and video appearing on the screen.
The importance of a short startup time is highlighted in a report we published in 2016: "viewers will start abandoning a video if startup takes longer than two seconds to begin playing and for every additional second of delay, roughly an additional 6% of the audience leaves. [...] with a 10 second delay, nearly half the audience has left." See the original report for more details.
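The figures in that quote can be turned into a rough back-of-the-envelope estimate. The sketch below assumes a simple linear model (no loss during the first two seconds, then about 6% of the audience per additional second); the linear shape, the class name, and the method name are illustrative assumptions, not something the report prescribes.

```java
// Illustrative only: a linear reading of the abandonment figures quoted above.
// StartupAbandonment and abandonedFraction are hypothetical names.
final class StartupAbandonment {
    // Viewers start leaving once startup takes longer than about two seconds.
    static final double GRACE_SECONDS = 2.0;
    // Roughly 6% of the audience leaves per additional second of delay.
    static final double LOSS_PER_SECOND = 0.06;

    /** Estimated fraction of the audience lost, clamped to the range [0, 1]. */
    static double abandonedFraction(double startupSeconds) {
        double extraSeconds = Math.max(0.0, startupSeconds - GRACE_SECONDS);
        return Math.min(1.0, extraSeconds * LOSS_PER_SECOND);
    }
}
```

Under this model a 10 second startup gives 8 extra seconds at 6% each, or about 48%, which is consistent with the "nearly half the audience" figure in the quote.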
In practice, the startup time is influenced by two factors:
The startup algorithm of the player that decides where in the stream to start, and how much to buffer before starting playback. For video-on-demand (VOD) content, the start point is evident -- at the beginning of the stream. For live content, this gets more complicated, as it depends on the type of stream, standards and player implementations.
Network conditions that dictate the delivery time, i.e., how much time it takes to download the actual content.
For this QoE metric, we have chosen to focus only on the first factor, the startup algorithm of the player, and to treat the delivery time as part of the next QoE metric.
Measure: the shorter, the better.
End-to-end hand-waving latency
End-to-end hand-waving latency (also called hand-waving latency, end-to-end delay, glass-to-glass delay, or capture-to-display delay) refers to the time it takes for a frame of a live stream to travel from ingestion to the viewer's screen. Literally, it is the time from the moment a person waves a hand in front of the camera until the viewer sees it on screen.
The hand-waving latency is an important metric, but relevant only for live events. The end-to-end hand-waving latency can be broken down into three big components:
Ingest Time (also referred to as First Mile) -- the time it takes for the video stream to get from the camera, via the encoder, to the entry point in the Akamai Intelligent Platform (or another cloud).
Cloud Time -- the time it takes for the video stream to make its way through the cloud (including any replication, backup, live transcoding that might take place in the cloud).
Delivery Time (also referred to as the Last Mile) -- the time it takes for the stream to travel from the cloud exit point to the user's end device.
Please check out this Akamai guide to encoding and transcoding to see how Akamai can help you lower your end-to-end latency and the Akamai blog post on the options for ultra low end-to-end latency with chunked-encoded and chunk-transferred Common Media Application Format (CMAF).
Measure: the shorter, the better.
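Because the three components described above simply add up, a per-component breakdown makes it easy to see where a latency budget is spent. The following is a minimal sketch; the class name, the method names, and the millisecond values in the usage note are hypothetical and are not measurements from any real deployment.

```java
// Illustrative only: sums the three latency components described above.
// HandWavingLatency and its methods are hypothetical names.
final class HandWavingLatency {
    /** End-to-end latency as the sum of ingest, cloud, and delivery time. */
    static long totalMs(long ingestMs, long cloudMs, long deliveryMs) {
        return ingestMs + cloudMs + deliveryMs;
    }

    /** Name of the component that contributes the most to the total. */
    static String largestComponent(long ingestMs, long cloudMs, long deliveryMs) {
        if (ingestMs >= cloudMs && ingestMs >= deliveryMs) {
            return "ingest";
        }
        return cloudMs >= deliveryMs ? "cloud" : "delivery";
    }
}
```

For example, with made-up values of 1,500 ms ingest, 3,000 ms cloud, and 6,000 ms delivery, the total is 10,500 ms and delivery is the dominant component.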
Video quality is a function of the video bitrate: usually, a higher bitrate means better video quality, with a clearer, crisper picture and richer colors.
Measure: higher quality is always better.
In the case of streams with multiple renditions of different qualities, we refer to a bitrate switch as the change from a rendition of a certain quality to another rendition of different quality. In case the new rendition is of higher quality (e.g., switch from 720p to 1080p), we call this an upswitch. In case the rendition is of lower quality (e.g., switch from 2160p to 1080p), we call this a downswitch.
Recent research has shown that viewers respond negatively to bitrate switches (both down- and upswitches), preferring a constant bitrate even to an upswitch.
Measure: the fewer, the better.
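To make this metric concrete, bitrate switches can be counted from the sequence of renditions the player actually played. The sketch below assumes that one bitrate value (in bits per second) is recorded per downloaded segment; the class and method names are hypothetical.

```java
// Illustrative only: counts upswitches and downswitches in a playback session.
// BitrateSwitchCounter and countSwitches are hypothetical names.
final class BitrateSwitchCounter {
    /**
     * Takes the bitrate of each played segment, in playback order, and
     * returns a two-element array: {upswitches, downswitches}.
     */
    static int[] countSwitches(int[] segmentBitrates) {
        int upswitches = 0;
        int downswitches = 0;
        for (int i = 1; i < segmentBitrates.length; i++) {
            if (segmentBitrates[i] > segmentBitrates[i - 1]) {
                upswitches++;      // moved to a higher-quality rendition
            } else if (segmentBitrates[i] < segmentBitrates[i - 1]) {
                downswitches++;    // moved to a lower-quality rendition
            }
        }
        return new int[] {upswitches, downswitches};
    }
}
```

Since viewers respond negatively to switches in both directions, the QoE figure of interest is the sum of the two counters.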
Rebuffering (also referred to as buffering, stalling) is possibly the most noticeable undesired playback event, during which the player runs out of media data, resulting in a pause of the video.
Research from 2016 has shown that "a viewer experiencing a rebuffer delay that equals or exceeds 1% of the video duration played 5.02% less of the video in comparison with a similar viewer who experienced no rebuffering." In 2018, Limelight noticed that 28% of viewers experiencing a rebuffering event abandon the playback session. One of the key findings of a 2019 report from Akamai and MTM was that a single rebuffering event could lead to a loss of over 85,000 USD in revenue.
Measure: the fewer, the better.
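Besides the number of rebuffering events, the research quoted above also relates the total rebuffering delay to the duration of video played. A minimal sketch of that ratio follows; the class name, the method name, and the choice to express the result as a percentage are assumptions made for illustration.

```java
// Illustrative only: rebuffering delay as a percentage of the video duration played.
// RebufferStats and rebufferRatioPercent are hypothetical names.
final class RebufferStats {
    /**
     * @param stallDurationsMs duration of each rebuffering event, in milliseconds
     * @param playedMs         duration of video actually played, in milliseconds
     */
    static double rebufferRatioPercent(long[] stallDurationsMs, long playedMs) {
        if (playedMs <= 0) {
            throw new IllegalArgumentException("playedMs must be positive");
        }
        long totalStallMs = 0;
        for (long stallMs : stallDurationsMs) {
            totalStallMs += stallMs;
        }
        return 100.0 * totalStallMs / playedMs;
    }
}
```

For instance, two stalls of 2 and 4 seconds during 10 minutes of played video give a ratio of 1%, the threshold mentioned in the 2016 research.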
Representing the QoE metrics
In this blog post series, we represent our QoE metrics using a radar chart:
Center level represents the "good" state (e.g., high video quality)
Middle level is "unclear"
Top level represents the "bad" state (e.g., high hand-waving latency)
The chart above shows the state of the ideal user experience: no bitrate switches or rebuffering events, minimal startup time and hand-waving latency, and the highest video quality available.
QoE references and further reading
In a 2016 survey of 351 company managers, the participants identified the following culprits that negatively affect the user experience:
Audio out of sync
Slow start, stops mid-play
Lagging behind the source.
A 2016 research paper that analyzed over 400,000 YouTube views by more than 900 viewers from over 100 countries found that rebuffering and bitrate switch events (even upswitches) negatively affect QoE.
According to a 2017 study, a single rebuffering event causes a decrease in positive emotions (happiness down 14%) and a 16% increase in negative emotions. The study also confirms that video quality matters: In non-buffering video sequences, higher resolutions produce 10.4% higher emotional engagement than lower resolutions.
A 2019 white paper cites a senior manager at a major broadcaster: "When rebuffering is less than 0.5, 90% of the sessions are completed. As soon as you get 0.5-1%, then the number starts to drop -- 80%. As soon as you hit 1% you see the rate drop down to 50%."
Player features that impact QoE
All video players for modern streaming formats (e.g., HLS and MPEG-DASH) have a common feature set. Many of the features are subject to various trade-offs between QoE and other parameters, which means it's often possible to improve QoE by coming up with better heuristics. For some players (including ExoPlayer), there's an easier way to improve some QoE metrics at the expense of others, since the heuristics can be tweaked using configuration options in the player. The two most important features in modern video players that have an impact on QoE are:
Bitrate selection: to pick a suitable bitrate when there are multiple renditions in different qualities for a video stream. This feature is known by many names, e.g. adaptive bit rate (ABR) strategy, multi bit rate (MBR) strategy, automatic bitrate selection, etc.
Buffering strategy: for deciding the amount of media data to keep in the player's internal buffer, when to fetch media data, and how much media data is needed at startup before playback is initiated.
Next, we will present in more detail some of the questions and trade-offs that shape the bitrate selection strategy and the buffering strategy. In the next blog posts of this series, we will look at how it's possible to tweak the behavior in ExoPlayer for some of these key questions.
Bitrate selection strategy
If you were to develop a new video player, then you would face a number of questions to answer when building the bitrate selection strategy. Some of the key questions are listed below:
Which bitrate should the strategy pick at startup?
Picking a bitrate that is too high (i.e., one that cannot be sustained) may lead to a long startup time and to many rebuffering events. Picking a bitrate that is too low (i.e., well below what the connection can sustain) means that the video quality will be low and the viewer may experience several bitrate switches before reaching the highest available bitrate that it can sustain.
Which criteria are used for switching up in bitrate?
Switching up too fast will increase the chance of a rebuffering event, if the player cannot sustain the high bitrate. Switching up too slowly will keep the viewer at a low quality bitrate, and potentially increase the number of bitrate switch events.
Which criteria are used for switching down in bitrate?
If the strategy switches to a lower quality bitrate while the current one could have been sustained, the viewer will unnecessarily experience a low quality picture. If the switch happens too late, the viewer may experience rebuffering events.
How can you avoid rapid oscillations in bitrate switching?
As mentioned previously, research demonstrates that bitrate switches, regardless of upswitches or downswitches, impact the user experience negatively. Thus, it is important that the strategy avoids unnecessary bitrate switches, and implements a heuristic that avoids sudden changes.
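The questions above can be tied together in a small heuristic. The sketch below is one possible approach, not ExoPlayer's implementation: it picks the highest rendition that fits within a fraction of the estimated bandwidth, allows an upswitch only when enough media is buffered, and skips a downswitch while the buffer is still comfortably full. All names and the threshold values in the usage note are hypothetical.

```java
// Illustrative only: a bandwidth-based selector with buffer-based hysteresis.
// SimpleBitrateSelector and its parameters are hypothetical names.
final class SimpleBitrateSelector {
    private final int[] ladderBps;               // rendition bitrates, ascending
    private final double bandwidthFraction;      // safety margin on the estimate
    private final long minBufferForUpswitchMs;   // buffer required before switching up
    private final long maxBufferForDownswitchMs; // above this, ignore a downswitch

    SimpleBitrateSelector(int[] ladderBps, double bandwidthFraction,
                          long minBufferForUpswitchMs, long maxBufferForDownswitchMs) {
        this.ladderBps = ladderBps.clone();
        this.bandwidthFraction = bandwidthFraction;
        this.minBufferForUpswitchMs = minBufferForUpswitchMs;
        this.maxBufferForDownswitchMs = maxBufferForDownswitchMs;
    }

    /** Returns the ladder index to use for the next segment. */
    int select(int currentIndex, long estimatedBandwidthBps, long bufferedMs) {
        long budgetBps = (long) (estimatedBandwidthBps * bandwidthFraction);
        int ideal = 0; // fall back to the lowest rendition if nothing fits
        for (int i = 0; i < ladderBps.length; i++) {
            if (ladderBps[i] <= budgetBps) {
                ideal = i;
            }
        }
        if (ideal > currentIndex && bufferedMs < minBufferForUpswitchMs) {
            return currentIndex; // too little buffered: switching up risks a rebuffer
        }
        if (ideal < currentIndex && bufferedMs >= maxBufferForDownswitchMs) {
            return currentIndex; // plenty buffered: ride out the bandwidth dip
        }
        return ideal;
    }
}
```

With a ladder of 1, 2.5, and 5 Mbit/s, a bandwidth fraction of 0.75, and thresholds of 10 and 25 seconds, an estimate of 4 Mbit/s leads to an upswitch from the lowest rendition only once at least 10 seconds are buffered. The two buffer thresholds are what dampen rapid oscillations: a short-lived change in the bandwidth estimate does not immediately translate into a switch.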
The buffering strategy raises similar key questions. Each is listed below, together with a short discussion of the trade-offs involved:
How much data should be buffered before playback can be initiated?
Buffering too little may lead to a rebuffering event. On the other hand, the more the strategy buffers, the more it increases the startup time and hand-waving latency.
When and by how much should the player's internal buffer be filled with media data?
Keeping the internal buffer always filled to a certain level (drip style) versus filling it in bursts on an interval basis (or based on other metrics) affects the network usage and battery usage of the device. For example, this thread on the ExoPlayer bug tracker reveals that the buffering strategy in ExoPlayer versions 2.9.6 and below is based on the assumption that network operators prefer burst transfers to drip-style transfers (the ExoPlayer developers plan to change this strategy in a subsequent release).
On the topic of how much to fill up the internal buffer, a large buffer decreases the chances for rebuffering events considerably. However, the capabilities of the device can limit the buffer size. Furthermore, in the case of live streams, there's a direct correlation between the minimum required amount of media data in the buffer and the hand-waving latency -- the larger the required amount of media data, the further the player plays from the live edge of the stream.
How much data should be retained from the previous bitrate when switching to a new one?
Retaining too much media data from the previous bitrate limits the amount of buffer available for media data from the new bitrate. However, retaining too little (or none) runs the risk of rebuffering events in case the player reverts to the previous bitrate (i.e., it cannot sustain the new bitrate in case of an upswitch, or it decides that the previous bitrate was sustainable in case of a downswitch).
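The first two buffering questions can be illustrated with a small burst-style policy: playback starts once a startup threshold is reached, and loading resumes when the buffer drops below a minimum and pauses when it reaches a maximum. This is a simplified sketch under those assumptions, not ExoPlayer's LoadControl; the class, the method names, and the thresholds in the usage note are hypothetical.

```java
// Illustrative only: a burst-style buffering policy with three thresholds.
// SimpleBufferPolicy and its methods are hypothetical names.
final class SimpleBufferPolicy {
    private final long startupBufferMs; // media required before playback starts
    private final long minBufferMs;     // below this, start a new loading burst
    private final long maxBufferMs;     // at or above this, stop loading
    private boolean loading = true;     // begin by filling the buffer

    SimpleBufferPolicy(long startupBufferMs, long minBufferMs, long maxBufferMs) {
        this.startupBufferMs = startupBufferMs;
        this.minBufferMs = minBufferMs;
        this.maxBufferMs = maxBufferMs;
    }

    /** A lower startup threshold shortens startup time but raises rebuffering risk. */
    boolean canStartPlayback(long bufferedMs) {
        return bufferedMs >= startupBufferMs;
    }

    /** Keeps the previous decision while the buffer is between min and max. */
    boolean shouldContinueLoading(long bufferedMs) {
        if (bufferedMs < minBufferMs) {
            loading = true;
        } else if (bufferedMs >= maxBufferMs) {
            loading = false;
        }
        return loading;
    }
}
```

With thresholds of 2.5, 15, and 50 seconds, the policy loads continuously until 50 seconds are buffered, stays idle while the buffer drains, and starts the next burst once it falls below 15 seconds. A drip-style policy would correspond to setting the minimum and maximum to the same value.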
ExoPlayer version investigated: 2.9.6
ExoPlayer is a modular open-source player, with the following four components common to all ExoPlayer implementations:
MediaSource defines and provides the media to be played. ExoPlayer has default implementations for HTTP Live Streaming (HLS), MPEG-DASH, and SmoothStreaming.
Renderer consumes the media provided by the MediaSource and renders it.
TrackSelector implements the bitrate selection strategy. ExoPlayer provides a default implementation (DefaultTrackSelector), which works together with several TrackSelection implementations (FixedTrackSelection, RandomTrackSelection, and AdaptiveTrackSelection).
LoadControl implements the buffering strategy. ExoPlayer provides a default configurable implementation (DefaultLoadControl).
In the next post of this series, we will show how to configure the TrackSelector and the LoadControl to improve one or several QoE metrics. Stay tuned!