
Are You Capable?

For as fast as things move in the PC industry, some things have remained essentially the same.

For over 20 years, the PC industry has evaluated its platforms using one principal measurement: speed. Faster is better, end of story. Or is it? Today’s PC can deliver capabilities that PCs of ten years ago could only have dreamed of. Today, many of the PC’s most exciting new applications deliver video and audio content all over the home. In this media-centric environment, a platform evaluation approach that focuses only on speed simply cannot tell consumers what they really want to know: what can this platform do well for me?

The Intel® Digital Home Capabilities Assessment Tool (Intel® DHCAT) is an innovative platform evaluation tool for home media PCs. It breaks new ground in the well-worked field of PC platform evaluation, providing a complete and intuitively useful view of the systems under inspection. Unlike existing tools, DHCAT measures system capabilities and reports its findings in terms that really matter to users: the quality of their own experience. While the tool will not be open-source, it has been and will continue to be developed transparently, with its scoring system and design guidelines exposed to the industry for public discussion and debate. This article is one step in that disclosure process.

Beyond the Drag Race Approach

The process that led to the creation of DHCAT began when engineers in Intel’s Performance Benchmarking and Competitive Analysis group made two key observations. The first was self-evident: usage models for home media PCs are evolving rapidly as new types of media, content sources and playback devices proliferate. The second was that the conventional drag-race (faster-is-better) approach to digital home platform evaluation has run out of gas.

The problem was that the speed-focused approach to platform evaluation did not, on its own, fully or accurately predict people’s experiences using those platforms. Furthermore, tests that focus on a small number of media functions did not capture the diverse capabilities of devices that often handle television, radio, video, photo and music content from many different sources and in many different media formats. Quantifying platform performance was challenging enough with conventional computing applications, but for emerging usage areas such as the digital home, there were simply no tools available that could provide meaningful measurements.

But the need for such tools has been urgent and growing more so. Hardware OEMs, software developers, analysts and consumers all needed a more sophisticated toolset and methodology for evaluating platform goodness.

This tool would need to adhere to these four tenets:

Rule One: Create methodologies to measure and report user experience quality that are rigorous, repeatable, and meaningful to users. The digital home is about seamless delivery of entertainment content; the success of that delivery should be the primary measure of platform goodness.

Rule Two: Measure speed only where it determines or strongly influences user experience quality.

Rule Three: Report results in terms of capabilities: what the system can and can’t do well.

Rule Four: The tool and the tests it runs should be based upon recognized open industry standards.

Early last year, a development team at Intel® began meeting to tackle the problem of designing an assessment tool for digital home PC platforms that would follow this more experiential approach. While this approach may seem simple enough, actually implementing it was anything but simple.

Let's Get Constitutional

The DHCAT team quickly realized that creating an experience-based assessment model required initial consensus on basic principles. So the first order of business became the development of a “constitution” that would articulate all the foundational assumptions, concepts and principles that would guide the development effort. The full list of all 15 Guiding Principles appears at the end of this article.

While the “framers” of this document are all long-time benchmarking professionals, there is nothing conventional in the framework they created for media PC evaluation. The core tenet of the DHCAT constitution is that the tool faithfully express a platform’s goodness in terms of user experience quality, not as a speed-centric, bigger-is-better index that is ultimately meaningless to most consumers. The tool’s outputs will focus on actual capabilities, identifying things a system does well and those it does poorly, with particular attention to real-life tasks that are important to actual users.

As work progressed and the development team inevitably confronted difficult decisions, the constitution provided consistent direction for the form, content and functionality of DHCAT 1.5, becoming, in effect, the heart and soul of the new tool.

A Modular and Extensible Architecture

Because the digital home PC is a relatively new platform category, it will inevitably continue to evolve rapidly for some time. Thus, a useful evaluation tool must be designed to evolve as well. To provide that flexibility, the team adopted an architectural framework that is both modular and extensible.

The team also decided that the new tool would evaluate subject platforms by executing a complete range of typical user tasks. An initial group of use cases was selected, drawn from user applications recognized by the Digital Living Network Alliance* (DLNA). Each selected task was then broken down into its basic functional units and organized into a hierarchy of increasing difficulty. The team devised a structural model of user tasks composed of three elements: primitives, scenarios and usage groups.

  • Primitives are the basic units of media-handling functionality: single actions performed on a Digital Home platform. Examples include playing back a video, reformatting a video file, recording a TV show, and streaming video. Primitives are sorted into three groups according to the characteristics used to measure and score their performance:
      • Video quality-based primitives are functions whose successful execution is largely determined by the user’s perception of video image quality. The framework for quantitatively evaluating perceived quality is really the core DHCAT innovation, and is described in detail later.
      • Response time-based primitives are evaluated primarily on execution speed, because in some instances speed does matter.
      • Capability check primitives are units of functionality that are scored on a pass/fail basis, depending on whether or not they are supported on the test system. These capability checks sometimes determine which scenarios a platform can and cannot run.
  • A scenario is a group of up to four typical user primitives that DHCAT runs on each test platform to assess media-handling performance. Scenarios fall into three categories:
      • Single-primitive scenarios are simple single-function tasks.
      • Multi-primitive scenarios are more complex tasks that combine two or more primitives for concurrent execution. An example would be playing a video while simultaneously recording a TV show.
      • Multi-instantiation scenarios are derived from single- and multi-primitive scenarios. Their video-based primitives can be executed with multiple codecs and file formats. “Play video,” for instance, might be executed with MPEG-2, Windows* Media Video*, QuickTime*, or DivX*, depending on which are present on the machine.
  • Usage groups are three groups that subdivide the current universe of 25 Digital Home scenarios:
      • Usage Group 1 – Basic: Most scenarios in the Basic level specify execution with standard-definition video content.
      • Usage Group 2 – HD: Most scenarios in the HD level specify execution with high-definition video content.
      • Usage Group 3 – Connected: All scenarios in the Connected level include a video streaming primitive to evaluate the test platform’s ability to deliver content locally, across the home network.
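To make the hierarchy concrete, here is a minimal Python sketch of the primitive / scenario / usage-group model. The class names (`ScoringType`, `Primitive`, `Scenario`, `UsageGroup`) are hypothetical illustrations, not DHCAT internals. Only the three scoring types, the four-primitive limit and the three usage groups come from the article.

```python
from dataclasses import dataclass, field
from enum import Enum


class ScoringType(Enum):
    """The three ways a primitive can be measured and scored."""
    VIDEO_QUALITY = "video quality"        # perceptual model -> MOS grade
    RESPONSE_TIME = "response time"        # execution speed
    CAPABILITY_CHECK = "capability check"  # pass/fail


@dataclass(frozen=True)
class Primitive:
    """A single media-handling action, e.g. "Play video" (hypothetical model)."""
    name: str
    scoring: ScoringType


@dataclass
class Scenario:
    """A group of one to four primitives executed concurrently."""
    name: str
    primitives: list[Primitive]

    def __post_init__(self) -> None:
        # DHCAT 1.5 allows at most four primitives per scenario.
        if not 1 <= len(self.primitives) <= 4:
            raise ValueError(f"{self.name}: a scenario needs 1 to 4 primitives")

    @property
    def is_multi_primitive(self) -> bool:
        return len(self.primitives) > 1


@dataclass
class UsageGroup:
    """One of the three usage groups: Basic, HD or Connected."""
    name: str
    scenarios: list[Scenario] = field(default_factory=list)


# Usage: a tiny, hypothetical slice of the Basic usage group.
play = Primitive("Play video", ScoringType.VIDEO_QUALITY)
record = Primitive("Record TV", ScoringType.VIDEO_QUALITY)
basic = UsageGroup("Basic", [
    Scenario("Play video", [play]),
    Scenario("Play video while recording TV", [play, record]),
])
```

A multi-instantiation scenario would simply be the same `Scenario` run once per available codec, so it needs no separate type in this sketch.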

Designed for Simplicity and Consistency

To ensure simplicity and ease of use, DHCAT is completely self-contained. All that’s needed to perform an assessment is a test platform and the tool’s installation DVD—nothing else. Because many test scenarios would require a TV signal input or a network target for outbound video streams, the development team created software implementations of a virtual TV tuner and digital media adapter (DMA).

The virtual TV tuner simulates how a TV tuner card would hand off an MPEG-2 encoded bit-stream to the system. However, DHCAT still checks for the presence of a TV tuner card with hardware-based MPEG-2 encoding; if one is not found, the system will score very poorly on DHCAT.

The virtual DMA provides a target device for an installed media server, and allows DHCAT to conduct streaming tests without being connected to a network. Unlike the approach with the virtual TV tuner, DHCAT does not check for the presence of a physical DMA on a network. These virtual peripherals allow the completion of a full test sequence without an external video signal generator, or physical DMA.

To simplify and standardize the tool’s interaction with the test system, the Intel® team designed DHCAT to abstract both the platform’s hardware and software elements. The tool will automatically detect and use whatever resources are present and running on the system, making it ideal for testing the “out of the box” experience.

DHCAT Scoring Methodology

The constitution also guided the development team as it created the DHCAT scoring system, one of the new tool’s most critical and sophisticated elements. Three very different assessment and scoring processes are applied to the three types of Digital Home primitives.

Scoring Video Quality-based Primitives

The most difficult challenge facing the development team was creating an assessment process that would replicate, capture and accurately quantify a user’s subjective judgment of video image quality. In essence, the problem was how to embed a virtual human jury in the tool itself.

As it turns out, the DHCAT team’s problem was far from unique. Research in the broadcast and telecommunications industries has produced a significant body of knowledge about the interaction between delivered video/audio content and human physiology, and what influences perceived quality of a media experience. Those insights underlie the technical discipline of perceptual modeling, the use of mathematical analysis and modeling techniques to accurately predict user judgments. The practice was used extensively in the development of new industry standards for high definition television. For more about how DHCAT uses perceptual modeling, please see this article on the Intel® Capabilities Forum.

Working with specialists at Intel’s User Centered Design Group and Psytechnics*, a leader in the development of perceptual modeling techniques, the development team created perception models specifically for video quality in three separate Digital Home applications: video playback, recording and streaming. In each application, the model maps a performance attribute that DHCAT measures objectively to a Mean Opinion Score (MOS) reflecting the quality score human evaluators would have given. MOS values are broadly rated as either acceptable or unacceptable, with finer-grained quality grades ranging from poor to excellent.
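The perceptual models that turn measurements into MOS values were built with Psytechnics* and are not reproduced here. The objective inputs for the playback and streaming measurements described in this section can still be sketched under two assumptions. First, playback “frame rate variation” is taken to be the RMSE of measured frame-to-frame intervals against the nominal interval, since the article does not specify exactly what the RMSE is computed over. Second, streaming is summarized as the fractional overrun of actual versus expected playback time. The function names are hypothetical.

```python
import math


def frame_interval_rmse(timestamps_s: list[float], nominal_fps: float) -> float:
    """RMSE, in seconds, of frame-to-frame intervals versus 1 / nominal_fps.

    Assumption: this stands in for DHCAT's playback "frame rate variation" input.
    """
    if len(timestamps_s) < 2:
        raise ValueError("need at least two frame timestamps")
    nominal = 1.0 / nominal_fps
    intervals = [b - a for a, b in zip(timestamps_s, timestamps_s[1:])]
    return math.sqrt(sum((iv - nominal) ** 2 for iv in intervals) / len(intervals))


def playback_overrun(expected_s: float, actual_s: float) -> float:
    """Fraction by which streamed playback exceeded the clip's expected run length."""
    return (actual_s - expected_s) / expected_s


# From the cited user study: about 3 s of freeze-frames on a 24 s clip (27 s total)
# is roughly where MOS drops off considerably, i.e. a 12.5% overrun.
TOLERATED_OVERRUN = playback_overrun(24.0, 27.0)

smooth = [i / 30 for i in range(31)]  # 30 fps, no dropped or late frames
print(frame_interval_rmse(smooth, 30))  # effectively zero
print(TOLERATED_OVERRUN)                # 0.125
```

In both cases a larger number means a worse experience. The perceptual model is what turns that number into a MOS.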

  • Video Playback Quality is measured by playing a video file on the test system while holding frame quality constant. DHCAT measures frame-rate variation by calculating the root-mean-square error (RMSE); the lower the RMSE, the higher the MOS awarded by the perceptual model.
  • Video Recording Quality is measured by recording a reference video file on the test system, then using Psytechnics*’ video analysis tool to compare the recorded and reference files and quantify the degradation. The amount of degradation measured in the recorded file is then mapped to a MOS using the perceptual model.
  • Video Streaming Quality is measured by streaming a video file from the test system to DHCAT’s virtual digital media adapter (DMA). Because DHCAT was designed to be self-contained, the team created this virtual DMA to receive a video stream from the tested platform’s video server. Every video file has an expected run length, but if a system is overworked, its video server cannot deliver the stream to the DMA smoothly. The likely result is freeze-frames while the DMA waits for new video frames from the server. From a user study, Intel’s User Centered Design Group determined that on a 24-second video clip, people will tolerate about three seconds of freeze-frames (increasing actual playback time to 27 seconds) before MOS drops off considerably. DHCAT’s streaming video methodology therefore compares the actual playback time to the expected playback time: the bigger the difference between the two, the lower the MOS.

Scoring Response Time-based Primitives

Only one of the primitives included in DHCAT v1.5 is scored on the basis of response time: Prepare video for a Portable Media Player (PMP). It is tested by encoding a 57-second video clip for transfer to a PMP. The encoding time is recorded and compared to the clip’s runtime, yielding a ratio referred to as the speedup factor.

Scoring Capability Check Primitives

Six of the primitives included in DHCAT v1.5 are scored on a pass/fail basis, depending on whether or not that functionality is supported in the test system. The six current DHCAT capability checks are:

  • Listen to stereo audio
  • Watch your favorite TV show
  • Watch your favorite HDTV show
  • Record two of your favorite Standard Definition TV (SDTV) shows
  • Listen to High Definition Audio
  • Presence of a DLNA* v1.0-compliant media server
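Capability checks act as gates as well as scores. Per guiding principle 12, a scenario is tested only if every component it needs is present. The minimal sketch below uses hypothetical capability identifiers and a hypothetical scenario requirement table, because DHCAT’s actual detection logic and requirements table are not reproduced in this article.

```python
# Hypothetical identifiers for the six DHCAT 1.5 capability checks.
CAPABILITY_CHECKS = (
    "stereo_audio",
    "sdtv_tuner",
    "hdtv_tuner",
    "dual_sdtv_recording",
    "hd_audio",
    "dlna_media_server",
)


def capability_results(detected: set[str]) -> dict[str, bool]:
    """Pass/fail result for each capability check, given what was detected."""
    return {check: check in detected for check in CAPABILITY_CHECKS}


def runnable_scenarios(requirements: dict[str, set[str]], detected: set[str]) -> list[str]:
    """Names of scenarios whose required capabilities are all present."""
    return [name for name, needs in requirements.items() if needs <= detected]


# Usage with a made-up requirements table and a system lacking HDTV and DLNA support.
requirements = {
    "Watch TV": {"sdtv_tuner"},
    "Watch HDTV": {"hdtv_tuner"},
    "Stream video to a DMA": {"dlna_media_server"},
}
detected = {"stereo_audio", "sdtv_tuner"}
print(runnable_scenarios(requirements, detected))  # ['Watch TV']
```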

Scoring Multiple Instantiations

DHCAT will run video-based primitives up to four times to accommodate the four supported video formats (WMV*, DivX*, MPEG-2 and QuickTime*) if they are present on the test platform. Each instance produces a separate run score, and all of these scores are recorded and included in aggregate scores. Test systems that support more video formats therefore earn higher aggregate scores, reflecting their ability to play a wider array of content.
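The instantiation rule can be sketched as one run per supported format that is present, with every run’s score feeding the aggregate. The format list is the one given above. The function names and the integer run scores in the usage lines are hypothetical.

```python
SUPPORTED_FORMATS = ("WMV", "DivX", "MPEG-2", "QuickTime")


def instances_to_run(present_formats: set[str]) -> list[str]:
    """One instance per supported format that the test platform can play."""
    return [fmt for fmt in SUPPORTED_FORMATS if fmt in present_formats]


def aggregate(run_scores: dict[str, int]) -> int:
    """Every instance's run score counts toward the aggregate."""
    return sum(run_scores.values())


# A platform with only WMV and MPEG-2 codecs gets two runs of each video primitive.
runs = instances_to_run({"MPEG-2", "WMV"})
print(runs)  # ['WMV', 'MPEG-2']
```

A platform with all four codecs would get four runs, and therefore up to twice as many scored instances as this one.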

The Platform Capabilities Score (PCS)

The Platform Capabilities Indicator (PCI): This is the first results screen you’ll encounter after a DHCAT run. The PCI provides a visual indication of how well the platform did in the three Usage Groups: Basic, HD and Connected. The more a given area is filled in green, the more of that Usage Group the platform was able to run well. The yellow bar below the “wavelet” shows the Platform Capabilities Score (PCS).

How DHCAT Derives the Platform Capabilities Score (PCS)

The PCS combines the scores from two types of primitives: video quality tests and response-time tests.

Points are awarded in video quality tests as follows:

MOS Grade Achieved    Points Awarded
Excellent             4
Good                  2
Acceptable            1
Poor/Fail             0
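The points table translates directly into a lookup, and summing it over every video-quality run gives the PCS-grade used in STEP 1 of the worked example below. This is a minimal sketch. The names are hypothetical, and “Poor” and “Fail” are treated as separate labels that both earn zero points.

```python
# Points per MOS grade, from the table above.
GRADE_POINTS = {"Excellent": 4, "Good": 2, "Acceptable": 1, "Poor": 0, "Fail": 0}


def pcs_grade(grades: list[str]) -> int:
    """Sum of points over all video-quality primitive runs."""
    return sum(GRADE_POINTS[grade] for grade in grades)


# System A from the worked example: 16 Excellent, 27 Good, 7 Acceptable runs.
system_a = ["Excellent"] * 16 + ["Good"] * 27 + ["Acceptable"] * 7
print(pcs_grade(system_a))  # 125
```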

For each response-time-based primitive, DHCAT records the ratio of the video clip’s runtime (57 seconds) to the time actually taken to encode it. This ratio is termed the speedup factor.

For example, say an encode of the 57-second clip took 24 seconds. The speedup factor would be:

57 / 24 = 2.375

DHCAT then sums all of the speedup factors to arrive at a total speedup-factor raw score.
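The speedup arithmetic is simple enough to express directly. This is a minimal sketch that assumes every response-time run encodes the same 57-second clip. The function names are hypothetical.

```python
def speedup_factor(clip_runtime_s: float, encode_time_s: float) -> float:
    """Clip runtime divided by encode time; above 1.0 means faster than real time."""
    return clip_runtime_s / encode_time_s


def total_speedup(encode_times_s: list[float], clip_runtime_s: float = 57.0) -> float:
    """Raw score: the sum of the speedup factors of all response-time runs."""
    return sum(speedup_factor(clip_runtime_s, t) for t in encode_times_s)


print(speedup_factor(57.0, 24.0))   # 2.375, matching the example above
print(total_speedup([24.0, 57.0]))  # 3.375 (2.375 + 1.0)
```

Summing rather than averaging means the raw score grows with the number of response-time runs. The normalization against the calibration system in the steps that follow puts it on a comparable scale.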

Let’s assume that System A has achieved the following scores on DHCAT:

Video Quality Rating          Number    Total Points
Excellent (4 points each)     16        64
Good (2 points each)          27        54
Acceptable (1 point each)     7         7
TOTAL                                   125

Let’s also assume that the total speedup-factor raw score for DHCAT’s 22 response-time tests is 36.25.

Here is how DHCAT arrives at its final score.

STEP 1: DHCAT adds the point-scores for the video primitives to derive a PCS-grade score.

PCS-grade of System A = (Total Excellent * 4) + (Total Good * 2) + (Total Acceptable * 1)

= (16 * 4) + (27 * 2) + (7 * 1) = 125

STEP 2: DHCAT adds the speedup factors for the time-based primitives to produce a PCS-speedup-factor score.

PCS-speedup-factor of System A = Sum of all 22 speedup factors = 36.25

STEP 3: DHCAT normalizes the PCS-grade of System A to that of the DHCAT Calibration System, which has the following configuration:

Hardware:
  • Intel® Pentium® D Processor 940, 3.2GHz
  • Intel® D955XBK motherboard
  • 1GB of DDR2-667 RAM (2x512MB, dual-channel)
  • ATI* Radeon* X850 graphics card

Software installed:
  • DivX* 6.1.1
  • QuickTime*
  • Cyberlink* PowerDVD* 6
  • Windows* Media Player 10

This calibration system is a “stake in the ground” representing a high-end mainstream system in the market as of the beginning of 2006. Over time, the specs of this calibration system will evolve as new platforms come to market. Note that the calibration system is meant to model a typical software load-out from a PC system maker, which is why it has DivX*, QuickTime*, Windows* Media Player and the Cyberlink* PowerDVD* player installed. It does not, however, have a streaming media server installed.

Normalized PCS-grade of System A = (PCS-grade of System A / PCS-grade of calibration system) * 100

= (125 / 66) * 100 = 189.39, where 66 is the calibration system’s PCS-grade

STEP 4: DHCAT then normalizes the total PCS-speedup-factor of System A in the same way.

Normalized PCS-speedup-factor of System A = (PCS-speedup-factor of System A / PCS-speedup-factor of calibration system) * 100

= (36.25 / 22.36) * 100 = 162.11, where 22.36 is the calibration system’s total speedup factor

STEP 5: In order to arrive at the final PCS, DHCAT takes the geometric mean of the two normalized PCS scores.

PCS = sqrt(Normalized PCS-grade) * sqrt(Normalized PCS-speedup-factor)

= sqrt(189.39) * sqrt(162.11) = 175.22

DHCAT uses a geometric mean rather than an arithmetic mean (average) so that neither normalized component dominates the result. A given percentage improvement in either the PCS-grade or the PCS-speedup-factor raises the final score by the same proportion, regardless of which component is numerically larger. We believe this scoring system brings a balanced approach to expressing overall digital home platform goodness. That said, we’re constantly working to improve this scoring methodology, as well as all aspects of the tool and how it runs its tests. Please post feedback in the Digital Home Section of our forums.
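STEPS 3 through 5 can be sketched end to end. The calibration-system reference values (a PCS-grade of 66 and a total speedup factor of 22.36) are the ones used in the worked example, and the function names are hypothetical. Carrying full precision gives about 175.23 for System A instead of 175.22, because the example rounds its intermediate normalized scores. The last lines illustrate the geometric-mean property described above: doubling either component raises the PCS by the same factor.

```python
import math

CAL_PCS_GRADE = 66   # calibration system PCS-grade (from the worked example)
CAL_SPEEDUP = 22.36  # calibration system total speedup factor (same source)


def normalize(value: float, reference: float) -> float:
    """Express a raw score as a percentage of the calibration system's score."""
    return value / reference * 100


def platform_capabilities_score(pcs_grade: float, total_speedup: float) -> float:
    """Geometric mean of the two normalized component scores."""
    grade_n = normalize(pcs_grade, CAL_PCS_GRADE)
    speed_n = normalize(total_speedup, CAL_SPEEDUP)
    return math.sqrt(grade_n * speed_n)


print(round(platform_capabilities_score(125, 36.25), 2))        # 175.23
print(platform_capabilities_score(CAL_PCS_GRADE, CAL_SPEEDUP))  # 100.0

# Doubling either component multiplies the PCS by the same factor, sqrt(2).
base = platform_capabilities_score(125, 36.25)
print(platform_capabilities_score(250, 36.25) / base)  # approximately 1.414
print(platform_capabilities_score(125, 72.5) / base)   # approximately 1.414
```

By construction the calibration system itself scores exactly 100, so a PCS above 100 reads as “better than the reference platform.”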

Secret Sauce: the Platform Capabilities Matrix

For a closer look at the performance details that underlie the PCS display, DHCAT users can click the detail button in the lower left corner to bring up the Platform Capabilities Matrix (PCM) screen. This is where the rubber meets the road, as the PCM clearly shows which scenarios a platform ran very well, passably well, and not well at all. The PCM provides a graphic overview of the test platform’s performance at each capability level, and in scenario-by-scenario detail. Each scenario tested is shown with a color-coded display that indicates the performance achieved.

  • Green = Excellent
  • Yellow = Good - Acceptable
  • Red = Poor - Failure

Multi-primitive scenarios receive an overall grade equivalent to that of the worst performing primitive. Similarly, scenarios that are tested in multiple instantiations (to accommodate multiple video codecs) receive an overall grade equal to the worst performing primitive in the best performing instantiation.
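The two aggregation rules can be sketched as a min followed by a max over an ordered grade scale, with the PCM colors mapped on top. The first rule takes the worst primitive within an instance, and the second takes the best instance across formats. Placing “Fail” below “Poor” is an assumption, since the article groups the two together, and the function names are hypothetical.

```python
# Grades from worst to best (Fail below Poor is an assumption).
GRADE_ORDER = ("Fail", "Poor", "Acceptable", "Good", "Excellent")
RANK = {grade: i for i, grade in enumerate(GRADE_ORDER)}

# PCM color coding from the list above.
PCM_COLOR = {
    "Excellent": "Green",
    "Good": "Yellow",
    "Acceptable": "Yellow",
    "Poor": "Red",
    "Fail": "Red",
}


def instance_grade(primitive_grades: list[str]) -> str:
    """An instance is only as good as its worst-performing primitive."""
    return min(primitive_grades, key=RANK.__getitem__)


def scenario_grade(instances: dict[str, list[str]]) -> str:
    """Across codec instances, the scenario takes the best instance's grade."""
    return max((instance_grade(g) for g in instances.values()), key=RANK.__getitem__)


# Play Video + Record TV, tested in two formats.
result = scenario_grade({
    "Windows Media": ["Excellent", "Acceptable"],  # instance grade: Acceptable
    "DivX": ["Good", "Good"],                      # instance grade: Good
})
print(result, PCM_COLOR[result])  # Good Yellow
```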

Drill Down Deeper

Beneath the PCM display, an additional level of detail is available in an HTML file that can be viewed with any standard Web browser. Complete details are included here, down to the individual scores for each codec, for each task in each scenario.

Dare to Compare

DHCAT also makes comparing results from multiple systems a breeze with its Results Comparison Tool. With it, you can view results from multiple systems, and clearly identify which platform performs best across the board, and in specific areas.

A More Complete Picture

The Intel® Digital Home Capabilities Assessment Tool fills a crucial gap in the PC industry’s platform evaluation arsenal. DHCAT gives platform evaluators a powerful instrument to gauge digital home platform goodness, and express to consumers what capabilities a digital home platform can deliver. DHCAT provides a comprehensive testing and reporting solution that addresses the full range of today’s home media applications, on an extensible, standards-based architecture that will adapt and expand apace with platform innovation. By integrating perceptual modeling to capture and quantify experiential quality, DHCAT offers a far richer and more nuanced view of system capabilities, and makes user experience the central metric.

Version 1.5 of the tool is available now, and the DHCAT team is already working on version 2.0, due out in the second half of 2006. For more information about DHCAT, and to order your free copy, please visit www.intelcapabilitiesforum.net. In an ongoing effort to make the best possible testing tool, the DHCAT team welcomes feedback and participation in defining future versions on the site’s forums.

User experience is at the heart of all digital media devices. DHCAT is the industry’s first testing tool that goes straight to the heart of the matter.

Don't forget you can get your very own free copy of the Intel® DHCAT right here at ICF. Test your own system. Post your findings here at ICF. Amaze your friends and neighbors. Just go to the order page, punch in your info, and a DVD will be on its way to you.

* Other names and brands may be claimed as the property of others.

The 15 Guiding Principles of Intel® DHCAT

  1. In Intel® DHCAT 1.5, a scenario can comprise a maximum of four primitives. From this universe of scenarios of up to four primitives, DHCAT applies the following guidelines to select the final set of scenarios:
  2. Remove all scenarios that are practically or ergonomically impossible to achieve on a digital home platform; for example, watching two video streams at the same time.
  3. All scenarios will have a primitive that displays video (e.g. Playback or Live TV) while the scenario is tested.
  4. Remove duplicate scenarios that perform the same work from a benchmark’s perspective. For example, Watch TV on Channel 1 and Record on Channel 2 vs. Watch TV on Channel 2 and Record on Channel 1.
  5. In DHCAT 1.5, the maximum number of Record TV primitives in a scenario is set to two.
  6. In DHCAT 1.5, only one Record HDTV primitive can exist in a scenario.
  7. In a scenario with multiple instances of a primitive, all instances have the same characteristics. For example, in a scenario that involves two Record TV primitives, the target recording format will be the same (MPEG-2, Windows* Media, or DivX*).

    The scenarios are then classified into three Usage Groups: Basic, HD and Connected. All scenarios in the Basic group have at least one DH primitive that uses standard definition video or audio. In the HD group, all scenarios have at least one primitive using high definition video or high definition audio. The Connected group’s focus is to measure the user’s experience in streaming videos from the test PC to a DLNA-compliant networked media player device.

    After applying the above guidelines, Intel® DHCAT 1.5 comprises the following 25 scenarios.

  8. A scenario will be run in all formats for which codecs are present on the test system. (DHCAT currently supports the MPEG-2, Windows* Media Video, QuickTime*, H.264 and DivX* codec formats.) For example, the Play Video scenario will have five instances, one per supported format, such as:

    - Play Video in MPEG-2
    - Play Video in Windows* Media
    - Play Video in DivX*
  9. In scenarios with multiple DH primitives, all primitives will utilize the same codec stack.

    For example, the Play Video and Record to High Compression scenario will be tested in two instances:

    - Play Video in Windows* Media and Record TV in Windows* Media Video format.

    - Play Video in DivX* and Record TV in DivX* format.

  10. All scenarios defined in the Connected Usage Group will be measured based on the test system’s capabilities to extend media to a DLNA compliant networked media player device.
  11. The networked media player device emulated in Intel® DHCAT to test the Connected scenarios is based on a device that conforms to the mandatory requirements in the DLNA Home Networked Device Interoperability Guidelines 1.0.
  12. Intel® DHCAT will test a scenario only if all the hardware and software platform components required for the user to actually perform the scenario are present on the test system. The following table lists the platform requirements verified by Intel® DHCAT for each scenario.
  13. In calculating the Platform Capabilities Score, a video-based primitive with an Acceptable user experience level is given a score of 1. Primitives which are performed at Good and Excellent experience levels are given a score of 2 and 4 respectively.
  14. In a scenario with multiple primitives, the overall experience grade of the scenario is equal to the lowest of the experience grades achieved by its primitives. For example, the overall user experience grade of the Play Video and Record TV scenario will be Acceptable if Play Video achieved an Excellent grade and Record TV achieved an Acceptable grade.
  15. In cases where a scenario is tested in multiple instances (e.g. based on content formats), the overall experience grade of the scenario is equal to the highest experience grade achieved by any of its instances. For example, the overall user experience grade of the Play Video scenario will be Excellent if the three instances (Play Video in MPEG-2, Play Video in Windows* Media, and Play Video in DivX*) achieve Excellent, Good, and Acceptable grades respectively.
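Guiding principles 1 through 6 amount to a filter over candidate scenarios. The sketch below enumerates every multiset of up to four primitives and applies simplified versions of those filters. The primitive list, the choice of which primitives count as displaying video, and the reading of principles 2 and 3 together as “exactly one video-display primitive” are hypothetical simplifications. The output is therefore not DHCAT’s actual set of 25 scenarios, and principle 7 and the Usage Group classification are not modeled.

```python
from collections import Counter
from itertools import combinations_with_replacement

# Hypothetical primitive names; DHCAT's real catalog is larger.
PRIMITIVES = ("Play Video", "Watch TV", "Record TV", "Record HDTV", "Stream Video")
DISPLAY_PRIMITIVES = {"Play Video", "Watch TV"}  # primitives that put video on screen
MAX_PRIMITIVES = 4                                # principle 1


def keep(scenario: tuple[str, ...]) -> bool:
    """Apply simplified versions of principles 2, 3, 5 and 6."""
    counts = Counter(scenario)
    displays = sum(counts[p] for p in DISPLAY_PRIMITIVES)
    if displays != 1:
        # Principle 3 needs a display primitive; principle 2 (as simplified here)
        # rules out watching two video streams at once.
        return False
    if counts["Record TV"] > 2:    # principle 5
        return False
    if counts["Record HDTV"] > 1:  # principle 6
        return False
    return True


def candidate_scenarios() -> list[tuple[str, ...]]:
    # Enumerating multisets rather than ordered sequences generates scenarios that
    # differ only in the order of identical work once (a stand-in for principle 4).
    return [
        combo
        for n in range(1, MAX_PRIMITIVES + 1)
        for combo in combinations_with_replacement(PRIMITIVES, n)
        if keep(combo)
    ]


scenarios = candidate_scenarios()
print(len(scenarios), scenarios[:3])
```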
