
First Installment

Measuring Video Quality

I hope that you enjoyed the first entry in Perceptual Matters. I encourage you to post questions related to any perceptual area of research, since my intention is to cover everything from video to tactile thermals at some point.

For this edition of When Humans Collide with Technology, I would like to focus on Intel® DHCAT.

One of the most challenging problem spaces we face today is conveying what a good media experience is. In my last post I mentioned a few basic questions that consumers are faced with. To better understand this, it’s useful to break the media experience into a few manageable chunks.

The first and most basic chunk is video encoding technologies. The broadcast industry uses a standard format of video encoding that was defined for viewing content on your television at home. It was named after the group that developed it, the Moving Picture Experts Group, or MPEG. With the introduction of digital technologies came a flood of new codecs, which made video more pervasive since it was easier to distribute. In order to understand the origins of video quality measurement, we’ll initially ignore the introduction of computers and digital signal processing and all of the advances made in the digital revolution. First, let’s consider the analogue television environment where processing was limited to editing machines with no digital components.

The Mechanics

Ideally, to complete an accurate measurement of video quality, you need access to both the processed and source frame information.

Figure 1 – Example of Source and Processed Video

These measurements are called full reference metrics. In the past, measurements of television quality were done using a calculation called Peak Signal to Noise Ratio (PSNR). This measurement was only for images where the signal got processed entirely in the analogue domain. The calculation uses information in the luminance data of the processed video frame, and then compares that to the original. It’s essentially a ratio between the maximum power of a signal and the power of the degrading noise that will affect the quality of the image representation. For a full definition of PSNR you can go to Wikipedia http://en.wikipedia.org/wiki/Peak_signal-to-noise_ratio and look at the calculation formula. The higher the PSNR value the better the image quality, or so the theory goes.
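To make the PSNR idea concrete, here is a minimal sketch in Python. It assumes 8-bit luminance samples (peak value 255) held in flat lists of equal length. The function name `psnr` is purely illustrative and is not taken from any particular measurement tool.

```python
import math

def psnr(source, processed, peak=255):
    """Return PSNR in dB between two equal-sized luminance frames.

    Both frames are flat sequences of sample values; `peak` is the
    maximum possible sample value (255 is an assumption for 8-bit video).
    """
    if len(source) != len(processed) or len(source) == 0:
        raise ValueError("frames must be non-empty and the same size")
    # Mean squared error: the average power of the degrading noise.
    mse = sum((s - p) ** 2 for s, p in zip(source, processed)) / len(source)
    if mse == 0:
        return math.inf  # identical frames, no noise at all
    # Ratio of peak signal power to noise power, on a decibel scale.
    return 10 * math.log10(peak ** 2 / mse)

# Two tiny "frames" of two samples each, off by 10 levels per sample.
print(round(psnr([100, 100], [110, 90]), 2))
```

A full reference tool would run a calculation like this on every frame pair, which is why it needs access to both the source and the processed video.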

Once the industry migrated to mixed environments using digital and analogue processing, the measurement tools had to evolve as well. New tools were developed by several companies and institutions, and you can find a lot of the research and technologies among the links provided by the Video Quality Experts Group website http://www.its.bldrdoc.gov/vqeg/links/links.php. Once standards bodies like the International Telecommunication Union (ITU) tested these new tools to determine which would be standardized, full reference methods fell from favor.

The DHCAT was designed to translate performance of a media PC into simple, experiential terms that anyone can understand without extensive interpretation. When we started out, there were already tools in the industry for codec quality, where the primary degradation in the video is image compression artifacts. But there weren’t any that translated this into an expected end-user experience that would include things like the effect of dropped or repeated frames. So, our design team approached this from two perspectives: compression and experience.

Compression Perspective: Since technologies existed in the industry that were standardized, Intel worked with Psytechnics* and implemented their technology to measure recorded video quality. The technology that was implemented for this is described at http://www.psytechnics.com/site/sections/products/pva.php. To test a platform’s capability to record video, a Live TV program is recorded over a virtual tuner card created on the system under test and is written to disk using MPEG-2, WMV, and DivX formats. More information on video formats can be found here: http://www.xilisoft.com/multimedia-glossary/3GP-AVI-MPEG-MOV-RM-WMV-DivX-XviD.html. The recorded video is compared to the source using the Psytechnics software and the results are published from the integrated module.

Here is the Psytechnics Video Agent tool in action, comparing the reference video on the left with the one recorded on the platform being tested on the right.

Imperfections or flaws in video are called artifacts, and these artifacts can range from subtle to very noticeable. Here are some examples.

A few classes of artifacts that impact visual quality can be measured with objective tools.

Figure 2 – Susie, compressed to 2 Mbps, showing fine-grain blocking artifacts.

Figure 3 – Visible artifacts measurable with tools like Psytechnics*

Several Perspectives

Experience Perspective: Since there are repeatable ways to measure the amount of information or the number of frames being written out from a PC video frame buffer, measuring video playback was an interesting yet challenging design. Challenging because previous methods rely upon expensive measurement devices whose price tags routinely top $20,000. We wanted DHCAT to be self-contained, so we had to find another way to take this measurement from within the system.

Experience can take many different hits. For the case shown in Figure 4, a stream was being broadcast from a host system over a local wireless network to a remote corner of a house, then decoded and displayed on a client system. From the time-code stamp you can see that the video was stalled for over seven seconds. This means that the viewer saw a frozen frame during that time period, which the viewer would rate poorly. To measure this objectively, we collect program statistics (like the test video’s actual run-time versus its expected run-time), and correlate that to the end user’s experience.
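As a rough sketch of how a stall like this could be spotted from collected statistics, assume we have the presentation time of each displayed frame, in seconds. The function name, the timestamp list, and the one-second threshold are all hypothetical choices for illustration. They are not DHCAT’s actual internals, which this post does not describe.

```python
def find_stalls(timestamps, threshold_s=1.0):
    """Return (start_time, duration) for each gap longer than threshold_s.

    `timestamps` is an ordered list of frame presentation times in seconds.
    A long gap between consecutive frames means the viewer was looking at
    a frozen picture for that whole interval.
    """
    stalls = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        gap = cur - prev
        if gap > threshold_s:
            stalls.append((prev, gap))
    return stalls

# A frame at 0.5 s stays on screen until the next one arrives at 8.0 s.
print(find_stalls([0.0, 0.5, 8.0, 8.5]))
```

With these made-up timestamps the function reports one stall starting at 0.5 seconds and lasting 7.5 seconds, similar in size to the freeze described above.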

Figure 4 – Video Stall for WMV playback

Using the instantaneous frame rates, we calculate the aggregate deviation (Root Mean Square Error, or RMSE) from the expected frame rate of the video. In plain English, if we were expecting a 30 fps stream but only measure 26 fps at times and then an overflow of 31 fps at other times, these are the values that are captured and summed. The Media and Acoustics Perception Lab conducted a suite of end-user research and experimentation with fixed FPS RMSE values, and mapped these to users’ mean opinion score, the average of all participants’ ratings of each condition. From these data points, we developed a perceptual model, and integrated it into DHCAT so that mapping of the measured data would automatically report the subjective experience. This was the first research in the field of video quality measurement to map the measured RMSE directly to real user experiences. What’s powerful about this technique is that we found that RMSE had high predictive power about how people rate video quality. In other words, an objective measurement, RMSE, can predict real people’s subjective opinions about video quality.
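For readers who like to see the arithmetic, the two quantities named above can be sketched in a few lines of Python. This is a minimal illustration of the standard RMSE and mean opinion score formulas, with made-up sample values and illustrative function names. The perceptual model that maps RMSE to an opinion score is not specified in this post, so it is deliberately left out.

```python
import math

def fps_rmse(measured_fps, expected_fps):
    """Root mean square error of instantaneous frame rates vs. the expected rate."""
    if not measured_fps:
        raise ValueError("need at least one frame-rate sample")
    squared_errors = [(f - expected_fps) ** 2 for f in measured_fps]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def mean_opinion_score(ratings):
    """Average of all participants' ratings for one test condition."""
    if not ratings:
        raise ValueError("need at least one rating")
    return sum(ratings) / len(ratings)

# Expecting 30 fps, but the stream dips to 26 and overshoots to 31.
print(round(fps_rmse([30, 26, 31, 30], expected_fps=30), 2))  # 2.06
# Four hypothetical participants rating the same condition.
print(mean_opinion_score([4, 5, 3, 4]))  # 4.0
```

Squaring the deviations means that a dip to 26 fps counts far more heavily than a one-frame overshoot to 31 fps, and that dips and overshoots cannot cancel each other out.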

We used a similar approach for measuring streaming video quality in DHCAT. First, DHCAT installs a virtual digital media adapter (DMA) on the system under test. Think of a DMA as a media “thin client,” that lives in your entertainment center, and can play back video, audio and display photos from PCs on your home network. As with the playback measurement methodology, we wanted DHCAT to be self-contained, so we created a virtual DMA that installs in the system being evaluated.

DHCAT streams video from the system being tested to the virtual DMA on the same system. The statistics of the video frames received into the DMA with the time stamps are collected and DHCAT calculates what’s called FPS Error. A 30-second video clip should take 30 seconds to play. Not 25. Not 35. 30 seconds. If a video takes longer to play than the actual run-length of the clip, that means you would likely have been seeing a frozen video frame on the DMA while the server and DMA tried to synch back up and restore playback. Our end-user testing showed that, not surprisingly, viewers don’t like to see a lot of freeze-frames in their video. Similar to the Playback usage, our team did end-user research and experimentation with fixed FPS Error values, and created a mapping to the mean opinion score (MOS), an average over all participants for each condition. From these data points, we developed a perceptual model, which we integrated into DHCAT such that we can automatically map the measured data to end-user subjective experience, and report the system capabilities in terms of the quality of experience.
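The run-time comparison behind this idea can be sketched as follows. This post does not give DHCAT’s exact formula for FPS Error, so the code below only illustrates the general principle under stated assumptions: we know the clip’s nominal length and frame count, and we have recorded when playback started and ended. All names and values are hypothetical.

```python
def playback_summary(clip_seconds, clip_frames, start_s, end_s):
    """Compare how long a clip actually took to play with how long it should take."""
    actual = end_s - start_s
    if clip_seconds <= 0 or actual <= 0:
        raise ValueError("durations must be positive")
    return {
        "expected_runtime_s": clip_seconds,
        "actual_runtime_s": actual,
        # Positive means playback ran long, e.g. because of frozen frames.
        "runtime_error_s": actual - clip_seconds,
        "nominal_fps": clip_frames / clip_seconds,
        # Average rate at which frames really reached the viewer.
        "effective_fps": clip_frames / actual,
    }

# A 30-second, 900-frame clip that took 36 seconds to play.
summary = playback_summary(30.0, 900, start_s=0.0, end_s=36.0)
print(summary["runtime_error_s"], summary["effective_fps"])  # 6.0 25.0
```

Here the six extra seconds show up as an effective rate of 25 fps against a nominal 30 fps, which is the kind of gap that the end-user testing described above ties to a poorer viewing experience.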

Taken together, these three modules are a very nice way to package up and report the end-user experience.

In my next blog entry I will dive into how we actually determined and implemented the perceptual models discussed here.

When Humans and Technology Collide



The field of Human-Computer Interaction (HCI) has been a critical catalyst in driving computer technology forward. The coolest, most breakthrough technology isn’t worth a thing if people can’t figure out how to use it or can’t see the value of it.

Today, we launch another blog, this one about the field of perceptual modeling, which is at the heart of DHCAT 2.0. It’s authored by Philip Corriveau, an experimental psychologist and technologist working in Intel’s User-Centered Design Group in Oregon.

Phil’s work and expertise span video quality and acoustics studies, as well as constructing perceptual models based on detailed user studies.


First, some introductions. I work at Intel in two capacities: I am Intel’s “Golden Eye” for Media Quality, and I manage a team of a dozen researchers who study a wide range of areas. These include Intel’s Media and Acoustics Perception Lab (MAPL), located in Hillsboro, Oregon. There’s also a team focused on User Experience Assessment and another team that focuses on the Usage-to-Requirements translation. This blog will focus on the work the MAPL group has done, and continues to do, which is an interesting blend of technology-based research and basic human perception studies.

Cognitive science and experimental psychology have wide-ranging applications, many of which extend beyond hard core science realms. I am here to introduce you to an entirely new application for this area, which most term “the behavioral sciences.”

Before we dive into the depths of what MAPL does, let me tell you a bit about me. I grew up in Ontario, Canada, and attended Carleton University in Ottawa, where I got my Bachelor of Science (Honors) degree in Psychology in 1990. I started my career with the Communications Research Centre, where I began my quest to improve the end user’s experience, focused first on video quality. The work involved conducting thousands of tests with real end-users, everyday people, to determine what technology would be adopted for the next North American High Definition (HDTV) Standard. A lot of people are totally unaware that years of work and testing took place to ensure that the standard now called ATSC was “good enough” for the North American viewing public. The type of testing used in this process is called subjective assessment.

Subjective assessment describes the use of a human end-user to provide feedback in a controlled way on their perception of the quality. There are two classes of subjective assessment, expert and non-expert. With expert assessments, people concerned with the technology are used in the evaluations; for example I am considered to be an expert viewer for video quality. Non-expert assessments are where we use actual intended end-users of the technology who don’t have technical knowledge about video quality or display technology. Researchers use a screening process to be sure that the users evaluated are not disguised experts.

Along with the standardization of HDTV in the early 1990s, two related video activities were spurred. One was the development of new codec technology that could provide the same level of “Quality” using fewer bits than the current MPEG-2 brought to the table. The second, a much more controversial topic, was the development and standardization of objective tools that estimate a person’s expected visual experience. I was heavily involved in both activities from the end-user side, assessing the elusive value around what we call User Experience (UX).

In 1997, I was a founding member of a consortium called the Video Quality Experts Group (VQEG, www.vqeg.org), which brought together core video experts from various companies and countries. New software full reference evaluation tools were proposed under the category of Video Quality Metrics tools, and these are now standardized in the ITU-T J.144 standard. These tools were meant to displace analog test tools like Peak Signal to Noise Ratio (PSNR) calculations, where testing was simple. Since PSNR was no longer valid in the digital domain, some group had to verify the usefulness of the new tools. The group’s mission is to assess the usefulness and accuracy of these proposed tools and how they should be tested. As an expert in subjective assessment, I co-chaired this group for its first nine years. In 2006, I stepped down to return to my work studying human perception here at Intel.

Intel pays a lot of attention to the perceptual experience that customers get with our products. So much so, in fact, that the company built a state-of-the-art facility in Hillsboro, Oregon in 2001, specifically designed to provide a controlled environment for testing of Video, Audio, A/V Sync, Acoustics, and Voice, and to break new ground in the areas of tactile thermal and pure basic perceptual research. I have been managing and working at this facility for the last six years, driving home the importance of UX for aspects of the Intel platforms and products that we make.

When a product reaches end-users, the worlds of technology, software, hardware and the end-user collide headlong into one another. That is not the moment for the product maker to realize that their video quality looks like a snowy day with the antenna turned the wrong way, now is it? And assuming you do achieve “good enough” video quality, how can you demonstrate or articulate that to people in ways that are easy to understand?

For the first few installments, we are going to really zoom in on video-related activities, research and requirements. Since my group’s research in the past few years has centered on understanding and quantifying the video experience, this makes for a logical starting point. So let’s dive in…

First, it is never too early to start considering, understanding and measuring the “good enough” experience. There are many technical terms that are used in the industry like “Just Noticeable Difference”, “Good Enough” and others that can be extremely misleading if taken out of context. Here are some simple questions to consider:

Have you truly looked at the quality of your TV, cable, satellite or DVD player?

Do you know what the perceptual difference should be between your current TV and a new HDTV?

Did you know that there are tools available that are designed to help translate performance to user experience? A great example of one that I worked on is DHCAT 2.0, which you can order from ICF for free. Two Video Quality Metric (VQM) tools are available from the Institute for Telecommunication Sciences, and they can be found here.

Did you know that there are limits to the human perceptual system dealing with resolution?

In the next blog we will dive into the playback portion of DHCAT 2.0, where my team conducted the first-ever experiments at Intel to truly map a measurable quantity to user experience.

Got questions? I encourage you to head over to the ICF forums and post them there.

