21 May 2018
End-to-End Assessment of Mobile Video Services
Dr. Jens Berger and Dr. Silvio Borer of Rohde & Schwarz look at the needs and methods for assessing end-to-end mobile video services.
Videos are the most popular web content and have long formed the bulk of the data volume in mobile networks, which is why not only video consumers but also video service providers and network operators have a vested interest in a high quality of experience. Mobile measuring systems assess this quality automatically and just as reliably as human viewers.
Today, mobile video is a much-discussed topic, albeit not a new one. Even shortly after the turn of the millennium, i.e. long before the first smartphone, it was possible to access and stream videos from media servers using QuickTime® or RealPlayer®. For these UDP/RTP-based transmissions, the term streaming in the sense of realtime transmission was indeed appropriate because the limited capacity in mobile networks meant that video could only be transmitted highly compressed and successively in realtime.
Neither buffering ahead at higher transmission speeds nor enhanced error correction was possible. These limitations existed despite modest image sizes, typically QCIF (144 × 176) or QVGA (240 × 320). Other mobile video technologies such as DVB-H were just as unsuccessful as these early streaming services.
Mobile video did not achieve a reasonable degree of acceptance until the advent of HSPA transmission technology and VGA display sizes (480 × 640) and higher. This was less than ten years ago. But since then, use has grown exponentially and now dominates the amount of data in networks (Fig. 1). The reason is naturally the increased transport capacity of mobile networks and the availability of less expensive data plans, but mainly it is due to the widespread use of high-resolution, large-format smartphones as central access devices to practically all media. As a result, video is increasingly used as a primary source of information. It is not without reason that today YouTube is the second most frequently used search engine, right behind Google.
Figure 1: Growth of data traffic in mobile networks. The data volume of video services is increasing sharply, both in relative and absolute terms (source: Ericsson Mobility Report 2017).
Generally, mobile video services are not the primary services offered by network operators. They largely function independently of telecommunications norms and standards. Contents, servers and applications are made available by independent service providers who simply use the mobile network to transport data (known as OTT services).
The data is normally encrypted and transported using proprietary protocols on the application layer. The video compression technique is also service-specific. All information exchange between app and service takes place under the direction of the video service and is subject to continuous optimization and adaptation. It is therefore almost impossible to provide accurate and detailed information on the many video services available on the market. Instead, principal techniques will be briefly presented and the need for a perceptive assessment of the service quality explained.
The perceived quality of a video service can be roughly determined based on the following criteria:
Is the service available?
How long is the delay between the request and when the video starts (time to first picture)?
To what extent do unwanted interruptions occur (stalling)?
How high are the image resolution and quality? To what extent is the image quality affected by:
Compression losses (blurring due to compression and/or reduced resolution, reduced frame rate), blocking artifacts
Transmission errors (artifacts, corrupt images, brief stalling)
Desynchronization effects between audio and video may also occur.
Technical background for video transmission on mobile devices
The majority of requested videos are codec-compressed video files that are stored on a server waiting to be called up (video on demand, as opposed to live video). Streaming is the term widely used to describe transmission to the consumer device, but strictly speaking it only means continuous transmission and realtime processing on the consumer device. Unlike in the early days of mobile video mentioned above, however, today data is transmitted in larger sections and buffered.
The entire video can in principle be downloaded as a file and then viewed after it has been fully received. However, users are unwilling to accept the annoying waiting times. The progressive download technique provides a solution to the problem. The video starts as soon as the first section is available on the smartphone, while the rest of the video continues to be downloaded from the server in the background. Using this strategy (if there is sufficient channel capacity), the complete video will be available on the device after just a short time. The advantage is clear: once loaded, connection quality is irrelevant and the video can play without interruption.
However, users often do not watch videos to the end so a complete download would be a waste of transmission capacity. The solution is a compromise between the need to buffer sections of the video to ensure interruption-free playback and the desire to be economical with the available transmission capacity.
First of all, a large initial section of the video file is saved. If it is apparent that the viewer wants to continue to watch the video, the next section is downloaded when a certain playback point is reached. The length of each loaded section ranges from a few seconds to minutes, depending on the philosophy of the video service. The trend is moving toward shorter sections and is therefore again approaching the streaming ideal. However, unlike realtime streaming, a large section of the video remains in the buffer so that long gaps in the connection can be bridged (Fig. 2).
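The interplay of startup buffering, reloading and stalling described above can be sketched in a toy simulation. All numbers (segment length, fetch times, startup threshold) are illustrative assumptions, not values from any real service:

```python
def simulate(n_segments, seg_len_s, fetch_time_s, startup_segments=2):
    """Return (time_to_first_picture_s, total_stall_s).

    n_segments video segments of seg_len_s seconds each are downloaded
    sequentially, each taking fetch_time_s of wall-clock time. Playback
    starts once startup_segments are buffered and stalls whenever the
    buffer runs empty.
    """
    t = 0.0            # wall-clock time
    play_start = None  # when the first picture appeared
    stall = 0.0        # accumulated freezing time
    for i in range(n_segments):
        t += fetch_time_s                  # segment i arrives at time t
        if play_start is None:
            if i + 1 == startup_segments:
                play_start = t             # initial buffer filled: playback begins
            continue
        playhead = t - play_start - stall  # video seconds played so far
        available = i * seg_len_s          # video playable before segment i arrived
        if playhead > available:           # buffer ran empty: the image froze
            stall += playhead - available
    return play_start, stall
```

With fast downloads the model reports only the startup delay; when each segment takes longer to fetch than to play back, stall time accumulates.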
Figure 2: Three examples of data transmission measurements for video services.
From a technical viewpoint, video on demand is still a file download that does not require realtime transmission. Data transfer is mostly based on the reliable TCP/HTTP protocol, which prevents the loss of data and is supported by all operating systems. One exception is YouTube, which is moving toward Google’s QUIC protocol. For performance reasons, QUIC uses UDP instead of TCP on the transport layer, which can potentially lead to data losses; on the application layer, however, QUIC implements mechanisms that prevent them.
Compared to video on demand, live video still plays a minor role in the network as far as volume is concerned, but places greater realtime demands on the transmission path. Typical applications include video telephony, images from surveillance cameras and video-assisted remote control systems. How narrowly the term realtime is to be interpreted depends on the application. In the private domain, live TV and live video in social media are of primary importance. In both cases, the realtime requirements are less strict and a delay of a few seconds is accepted. Transmission can therefore take place on the same technical basis as for video on demand; the only difference is that the storage and reload intervals are reduced to just a few seconds.
The technique of using staggered, section-by-section transmission also makes it easy to adapt the bit rate to the transmission channel. Each section of video can be delivered with the appropriate compression (e.g. in line with the DASH method) based on the current channel capacity. If things get tight on the transmission path, the video section is delivered with lower resolution or higher compression, thereby reducing the data rate. This does affect the image quality, but pauses caused by emptying the buffer memory would be even more annoying.
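The rate-selection logic can be illustrated with a minimal sketch. The bit rate ladder and the 0.8 safety margin are invented for illustration; real services use far more elaborate, service-specific rules:

```python
LADDER_KBPS = [350, 750, 1500, 3000, 6000]  # hypothetical representations

def select_representation(measured_throughput_kbps, margin=0.8):
    """Pick the highest bit rate that fits the measured throughput
    with a safety margin, in the spirit of DASH-style adaptation."""
    budget = measured_throughput_kbps * margin
    chosen = LADDER_KBPS[0]          # always fall back to the lowest quality
    for rate in LADDER_KBPS:
        if rate <= budget:
            chosen = rate
    return chosen
```

If the channel tightens, the next section is fetched at a lower rate; image quality drops, but the buffer is less likely to run empty.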
It is up to the video provider to define whether the client on the smartphone or the server selects the appropriate compression level, which channel information this decision is based on, the time constants that regulate this behavior, and all other details. The mobile network only provides the means of transmission; the video service reacts to the given situation, with the primary objective of avoiding image freezing while maintaining the highest image quality the channel capacity allows. The compression methods used are not lossless. A varying amount of detail will be lost depending on the coding scheme and compression level. In the best case, the effects remain below the perception threshold. If greater compression is necessary, blurring occurs, which becomes more evident in moving scenes. Even greater compression causes annoying artifacts such as pixelation blocks and missing colour gradations.
The data stream is only of limited help when assessing quality
Essentially, the technical makeup of a video service is irrelevant – ultimately what counts is what the viewer sees, i.e. the much quoted quality of experience. The question is how to assess this using technical methods.
The size of a video file in relation to the playback time and the associated bit rate provide only limited information because the individual codecs function at different levels of efficiency, i.e. transmit different image qualities at the same bit rate. Most codecs have multiple quality levels known as profiles. Profiles define the calculation effort that goes into the compression. More complex compression can result in a greater level of detail for the same data volume. Finally, the image content also affects data volume. Large-format images in a stationary scene can be encoded more effectively than small-format images with high motion, brightness and colour dynamics.
The server and app react to any change in the network and image material by adjusting their settings in a feedback loop. An assessment tool based only on data flow analysis, with no knowledge of image and application metadata, would not be able to provide reliable information on quality from the end customer’s perspective. And even if service-specific meta information were accessible, the change dynamics in this industry are so great that the analysis tool manufacturer would hardly be able to update the software fast enough to keep up.
Moreover, the majority of today’s video services already use encryption on the transport layer, so mere analysis of the received bitstream can only deliver a small amount of the information necessary for a quality assessment. The alternative is, quite literally, there for all to see: the displayed image itself serves as the source for analysis. Everything that happened prior to display, such as compressing, transmitting and decoding the video and preparing it for display, is reflected in the image and can be taken into account in the analysis. The only thing that matters is what the viewer ultimately sees. To analyze the screen content, however, it is necessary to access the mobile device’s image memory – a difficult but manageable challenge on smartphones.
Ultimately, what counts is what the viewer sees
As already mentioned, the time from when a video is requested to when playback begins (time to first picture) is an essential parameter when assessing service quality. Due to data buffering, the display does not start when the first data package is received on the IP layer – it starts much later. This time delay can only be measured by looking at the screen or in the image memory.
It is also not possible to accurately diagnose emptying of the buffer memory from the received data stream, since neither the buffer fill level nor any time-warping measures in the player are known. Whether the video freezes or stalls can likewise only be seen on the display; the measured display time of each image is used as the basis.
Assessment of the actual image remains a challenge in itself. It requires technical methods known as perceptual objective video quality models that take the peculiarities of human perception into consideration.
Perceptual objective video quality models
Perceptual objective video quality models evaluate not only the frames according to various criteria, but also motion patterns over long image sequences, in the same way as a person reacts to static and dynamic aspects. The analysis may be complex, but the result is simple. In the end, it is summarized as an overall value on a quality scale. For example, the internationally recognized absolute quality scale describes the quality as a value between 1 (bad) and 5 (excellent) (Fig. 3). The average of many individual assessments is the mean opinion score (MOS).
Figure 3: Commonly used international ratings for MOS.
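The aggregation into a MOS itself is straightforward; a minimal sketch, with the rating labels of the absolute category rating scale shown in Fig. 3:

```python
ACR_LABELS = {5: "excellent", 4: "good", 3: "fair", 2: "poor", 1: "bad"}

def mean_opinion_score(ratings):
    """Average individual 1..5 viewer ratings into a MOS."""
    if not all(1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must lie on the 1..5 ACR scale")
    return sum(ratings) / len(ratings)
```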
A simple example of perceptual objective analysis is the assessment of stalling. The more dynamic a scene is, the more annoying image freezing will be. In a scene with very little movement, stalling will result in the loss of just a small amount of information, and may not be perceived at all in the case of a static subject such as a landscape. With a sports broadcast, on the other hand, even brief interruptions will be perceived as extremely annoying. The perceptual objective measure for the motion aspect is referred to as jerkiness; it weights the display duration of an image with the movement in the video and returns a single value that represents the loss of information and the annoyance of waiting.
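A heavily simplified sketch of this weighting idea, assuming a per-frame motion measure in the range 0 to 1 is already available (the actual jerkiness computation in J.343.1 is considerably more elaborate):

```python
def jerkiness(display_times_ms, motion, nominal_ms=40.0):
    """Weight each frame's excess display time by the motion in the scene.

    display_times_ms: measured display duration of each frame;
    motion: per-frame motion intensity in [0, 1].
    Returns 0 for perfectly fluid playback or for frozen static scenes.
    """
    penalty = sum(max(0.0, t - nominal_ms) * m
                  for t, m in zip(display_times_ms, motion))
    return penalty / (len(display_times_ms) * nominal_ms)
```

A freeze in a static landscape (motion near 0) contributes nothing, while the same freeze in a sports scene (motion near 1) is penalized in full.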
The environment in which a disruption occurs is also decisive in assessing the disruption. Artifacts in the image foreground or in a moving object (attraction areas) result in a much more negative assessment than a block formation in an extremely bright or extremely dark image area where such artifacts are less noticeable.
Video codecs also employ perceptual objective strategies in order to use the characteristics of video content to optimize compression, e.g. to encode certain attractive areas in the image with a greater level of detail while permitting a greater loss of detail in unattractive areas.
Fields of application for standardized video quality models
The widespread use of IPTV has created a greater need to measure video quality at various distribution points in the network. Many video quality estimators have recently been developed for this purpose. Although these estimators only analyze the video bitstream, they provide sufficiently precise results for these applications (Fig. 4). If the bitstream is not encrypted, content information (display duration of a frame, compression structures) as well as metadata (codec type and profile, packet size) can be used, and it may even be possible to decode the image. In the case of an encrypted bitstream, the amount of information that can be evaluated is severely restricted. How severely depends on whether the encryption affects only the actual video data and on which protocol hierarchy level it is applied.
Figure 4: Bitstream-based quality estimators use a small amount of metadata and heuristic methods to derive an MOS.
Bitstream-based methods are intended for monitoring applications. Here, the video does not have to be known nor available in decoded form. Current methods are described in ITU P.1201, P.1202.1/.2 and in P.1203.1-4.
In the case of end-to-end tests (particularly in mobile communications), image-based models have the advantage, as explained above. They are the most accurate in representing the user experience because they are based on human perception and are able to analyze the image. Current HD-compatible measurement methods can be found in ITU J.341 and ITU J.343.1-6. A basic distinction can be made between reference-based and reference-free methods. Reference-based methods (picture-based, full reference methods, Fig. 5) have access to the source video and can calculate perception-relevant differences to the received video image by image and even pixel by pixel, and combine them to obtain a quality value. Such methods are described in ITU J.341 and ITU J.343.5/.6.
Figure 5: In reference-based methods, a streamed video is compared with the original stored on the measuring instrument.
To use these methods, however, reference videos must be previously uploaded to the server of the video service to be tested. During the quality measurement, these videos are compared with the same videos stored on the measuring instrument. This method is supported by services that permit private videos to be loaded and streamed (e.g. YouTube), but not usually by professional providers (e.g. Netflix). Reference-based methods are also unsuitable for assessing live video because there is no previous playback source available.
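The pixel-by-pixel comparison at the core of reference-based methods can be illustrated with plain PSNR between two grayscale frames. Standardized models such as J.341 apply perception-relevant weighting on top of such raw differences; this sketch shows only the simplest form of the idea:

```python
import math

def psnr_db(reference, received, peak=255):
    """Peak signal-to-noise ratio between two equally sized pixel sequences."""
    assert len(reference) == len(received)
    mse = sum((a - b) ** 2 for a, b in zip(reference, received)) / len(reference)
    return float("inf") if mse == 0 else 10 * math.log10(peak ** 2 / mse)
```

An undistorted frame yields an infinite PSNR; growing pixel differences push the value down.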
In contrast, reference-free methods (picture-based, no reference methods) do not need any a-priori knowledge of the source video. The received and decoded video is analyzed for typical disturbances (jerkiness, loss of detail, compression distortion, etc.) and this information is used to calculate the quality value. Standardized methods are described in ITU J.343.1/.2.
The advantage of these methods is their broad range of applications, since they function irrespective of the transmission path. This is why ETSI TS 102 250-2 recommends the use of J.343.1 for all types of mobile video streaming services. The reliable transmission methods used almost exclusively for mobile video streaming today prevent the bit errors that in the past resulted in severe artifacts and image errors, reducing the remaining quality problems to compression artifacts (loss of image detail, blurred movements) and stalling, i.e. the freezing of moving images. But with the growing popularity of video telephony and its strict realtime requirements, non-secured (i.e. lossy) transmission is again becoming more prevalent on mobile devices. Many of the current measurement methods are prepared for this.
Structure and application of ITU J.343.1 in Rohde & Schwarz products
The quality measurement method in line with ITU J.343.1 is a SwissQual/Rohde & Schwarz in-house development. It was successfully tested and standardized in 2014 by the ITU and has since been implemented in the Rohde & Schwarz mobile network testing Android-based test applications. Some meta information from the video stream is also available to the method. A jerkiness value is calculated from the movement and display duration of the individual frames, and a loss of detail is calculated from information indicating the complexity of the images. At the end, the video quality is assessed on an MOS scale from 1 (bad) to 5 (excellent).
During development, a priority was ensuring that the measurement method could be used in realtime applications. The implementation analyzes only the current video frame in relation to a history comprising just a few images. Despite this constraint, the image assessment must be extremely quick so that it is completed before the next frame: with 25 frames per second, only 40 ms are available to analyze a 3 Mbyte image.
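The realtime constraint is easy to quantify from the figures given above (25 frames per second, 3 Mbyte per frame):

```python
def analysis_budget(fps, frame_bytes):
    """Return (time budget per frame in ms, required throughput in Mbyte/s)
    for a realtime video quality analyzer."""
    budget_ms = 1000.0 / fps
    throughput_mbyte_s = frame_bytes * fps / 1e6
    return budget_ms, throughput_mbyte_s
```

At 25 fps and 3 Mbyte per frame, the analyzer has a 40 ms budget per image and must sustain 75 Mbyte/s.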
The method also obtains other information from the video signal. Stalling is detected, and the image size and video codec used are also recognized (Fig. 6). The data of the deeper protocol layers is also recorded. This results not only in the cumulative quality value, but also in a lot of information that can be used to optimize the transmission path and efficiently troubleshoot in the event of problems.
Figure 6: Video analysis in line with ITU J.343.1 is based on the images themselves as well as a small amount of metadata.
Video quality assessment is the central task when evaluating a video service. The measurement applications support fully automatic testing of YouTube (the most commonly used video service), including YouTube live video, as well as AT&T’s own DirecTV service. Almost any other video service can be tested with a semiautomatic measurement application, making it possible to quickly respond to new offerings and to assess and optimize regionally important video services.
The video test applications are supported by the QualiPoc product family. This family includes R&S®ROMES and QualiPoc Handheld for network optimization tasks, QualiPoc Remote Control for autonomous network monitoring, and especially the FreeRider walk test solutions and SwissQual Benchmarker as benchmarking systems.
Quality of experience is more than just image quality
Image quality may be the most important criterion when assessing a video service, but it is not the only one. Whether a service can be accessed at all, how long access takes and how loading progresses are also assessed. To gain an overall picture, Rohde & Schwarz mobile network testing products include a test sequence that measures the video quality by simulating actual usage behavior – from starting the video application on the smartphone to requesting a certain video to analyzing the displayed images (Fig. 7). Wherever waiting time plays a role in the real world, the maximum waiting times a hypothetical average user would accept are applied. If these times are exceeded, the test is regarded as “failed” if the video never became visible, or as “dropped” if the video froze for a long period of time. Such abort criteria are indispensable for an automated test sequence.
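The abort logic described above can be sketched as follows; the threshold values are illustrative assumptions, not the ones used in the actual products:

```python
MAX_WAIT_FIRST_PICTURE_S = 15.0  # illustrative maximum time to first picture
MAX_FREEZE_S = 10.0              # illustrative maximum tolerated freeze

def classify_test(time_to_first_picture_s, longest_freeze_s):
    """Map measured timings to a test verdict."""
    if (time_to_first_picture_s is None
            or time_to_first_picture_s > MAX_WAIT_FIRST_PICTURE_S):
        return "failed"   # the video never became visible in time
    if longest_freeze_s > MAX_FREEZE_S:
        return "dropped"  # the video froze for too long
    return "completed"
```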
Figure 7: Measurement of a video service from start of the application to establishment of the connection.
The test sequence can be followed precisely in the test log on the smartphone (Fig. 8, left side). A successful test returns the overall quality (MOS) and other aspects such as jerkiness and freezing (stalling) (Fig. 8, right side). Many other measured values are collected in the background, including image rate, image resolution, the protocols used and the IP and trace log files. As a result, the user not only has access to the measured values for video quality, but also to all the information required to optimize the transmission path.
Figure 8: Realtime analysis of a YouTube video with QualiPoc as per the flowchart in Fig. 7 (left: test log; right: quality results).
Summary and future developments
Videos have long accounted for the bulk of data transported in mobile networks, and forecasts predict continued dramatic growth. Network operators and video service providers therefore have a vested interest in keeping video consumers happy by ensuring that their services are of high technical quality. Automatic test systems quickly and reliably determine the level of quality. In the mobile sector, reference-free perceptual objective analysis methods have proven an effective way of measuring video quality.
These methods deliver meaningful results with a computational effort that can be achieved even by smartphones and are therefore inexpensive and uncomplicated to use.
Although realtime applications such as video telephony do not play a major role at present, this will change in the foreseeable future. The upcoming, virtually latency-free 5G mobile standard will enable and facilitate realtime applications in high quality, e.g. video transmissions for telemedicine. Reliable, high path quality is essential. The current Rohde & Schwarz mobile network testing monitoring products are ready for these applications.
References
Quality evaluation of streaming video on mobile networks; Rohde & Schwarz white paper; search for 5215.4369.92 at www.rohde-schwarz.com.
ETSI Technical Report TR 102 493 (v.1.3.1): Guidelines for the use of Video Quality Algorithms for Mobile Applications. ETSI, Sophia Antipolis, 2017.
ETSI Technical Specification TS 102 250-2 (v.2.5.1): QoS aspects for popular services in mobile networks; Part 2: Definition of Quality of Service parameters and their computation. ETSI, Sophia Antipolis, 2016.
Recommendation ITU-T H.264: Advanced video coding for generic audiovisual services. ITU-T, Geneva, 2013.
Recommendation ITU-T H.265: High efficiency video coding. ITU-T, Geneva, 2016.
Recommendation ITU-T J.341: Objective perceptual multimedia video quality measurement of HDTV for digital cable television in the presence of a full reference. ITU-T, Geneva, 2016.
Recommendation ITU-T J.343.1: Hybrid perceptual bitstream models for objective video quality measurements. ITU-T, Geneva, 2014.
Recommendation ITU-T P.1201: Parametric non-intrusive assessment of audiovisual media streaming quality. ITU-T, Geneva, 2014.
Recommendation ITU-T P.1202.1: Parametric non-intrusive bitstream assessment of video media streaming quality. ITU-T, Geneva, 2014.
Recommendation ITU-T P.1203: Parametric bitstream-based quality assessment of progressive download and adaptive audiovisual streaming services over reliable transport. ITU-T, Geneva, 2016.
In with the new – the latest evolution of video standards
MPEG-2, MPEG-4 (part 2), H.264 and H.265 are familiar standardized video codecs. For a long time, MPEG-2 was the standard of choice for IPTV and DVD-Video, while MPEG-4 (part 2) dominated early mobile video. The next step, H.264 (AVC), made HDTV practicable and is also used for the Blu-ray Disc format. The most recent standard codec is H.265 (HEVC), which is used by standards such as DVB-T2 and is expected to establish itself as the codec for UHD1 (4K) transmissions because it is able to deliver acceptable image quality even with an extremely high degree of compression.
There are also proprietary, mostly open (but not standardized) codecs such as Google’s VP9 which, from a quality viewpoint, is somewhere between H.264 and H.265 and is at present the only codec used by YouTube. The transition to AV1 (a VP9-based, open source video codec from the Alliance for Open Media) is expected in the near future. When development is complete, AV1 will without doubt be adopted by YouTube and most likely also by other video service providers such as Netflix. There is currently a general trend among the major Internet players to move away from the classic standardization work in ITU and MPEG. Instead, they are discussing and adopting coding and transmission standards within the framework of mergers and consortia. Since every service maintains its own technical ecosystem and does not need to ensure compatibility with others, codecs (just like the communications between server and app) are usually changed without notice or disclosure.
YouTube is a perfect example. Less than two years ago, YouTube transmitted MPEG-4 coded videos in 3GP format via unencrypted TCP connections. Since then, videos have been encrypted, initially using TLS and later using Google’s own SPDY protocol. Videos were also recoded with H.264. Some time later, there was the transition to MPEG DASH in order to allow adaptive bit rates. Another step was to again recode the videos, this time with VP9, Google’s own video codec. Then, at the beginning of 2017 – for Android smartphones – YouTube abandoned TCP in favor of UDP and the QUIC application protocol. This list of changes only relates to the measures concerning the transmission of videos. With practically every new app version, YouTube also changes the way in which the buffer memory in the smartphone is managed, i.e. the rules that define how much and when data is buffered as well as the criteria according to which the bit rate is changed.
Other video services continually make similar adjustments. To compare the quality of different services without being influenced by the evolution of any one service, all measurement methods, assessment criteria and metrics used have to remain applicable to all services over a long period of time and cover all components that play a role along the transmission path.
About the author
Dr. Jens Berger is Senior Director, Applied Research. He completed his master’s degree in communication engineering in Dresden, Germany, in 1989, and started his career at the research institute of Deutsche Telekom. He received a Ph.D. in electrical engineering in the area of network and system theory from the Technical University of Kiel, Germany, in 1998. His doctoral thesis focused on objective measurement of speech quality, discussing the modelling of complex transmission systems and the human auditory system by means of digital signal processing methods. Since 2003, he has been with SwissQual AG, Switzerland, now an integral part of Rohde & Schwarz, where he heads the Applied Research department and is a member of the senior management team. He is responsible for defining cross-platform test cases, audio/video signal analysis, objective quality prediction methods in telecommunication networks and QoE metrics. His work has contributed to several ITU-T standards, such as P.563 and P.863, both for voice quality prediction. In 2011, ITU-T standardized J.341 for objective HDTV quality prediction, and in 2014 J.343.1 for no-reference video quality prediction, both submitted by SwissQual/Rohde & Schwarz. For the past 16 years, Jens has led the working group “Perceptual-based objective methods for voice, audio and visual quality measurements in telecommunication services” as Rapporteur in ITU-T SG12. He is also active in ETSI and other international standardization bodies.
Silvio Borer is Team Leader Video Analysis at Rohde & Schwarz. He studied mathematics at the University of Zurich, Switzerland, and received a doctorate in science from the Swiss Federal Institute of Technology Lausanne (EPFL) in 2004. He joined SwissQual/Rohde & Schwarz in 2006 and is an active member of ITU-T and the Video Quality Experts Group (VQEG).