Work Unit Validation and Credits at WCG

This document explains how WCG work units are validated and awarded credits. It is written from a BOINC user's perspective because BOINC users see more detail in the reports and statistics made available to them on the WCG website. UD users will nevertheless find the information useful because the underlying concepts of the workgroup and the quorum apply to UD as well as BOINC.

This document also describes how BOINC users can view detailed information on their results at the WCG website and track the progress of the results they return. UD does not provide that capability and probably never will. If you use UD and want that capability then you should consider migrating to BOINC.

The Ideal Progression

In this section we will examine how the process is supposed to progress. Of course nothing works the way it's supposed to all the time. Occasionally there are errors, but we'll ignore those for a moment to keep the discussion simple. In the next section we will discuss what happens when there are errors.

First some terms you might not be familiar with. A WU is a work unit. Crunching is processing the WU. Crunching is basically just running a bunch of numbers through a series of complex calculations. A host is a computer that crunches.

When WCG issues a WU for crunching, 3 copies of the WU are dispatched, 1 copy to each of 3 different hosts. The copies are often called iterations. The 3 copies are usually issued within a few minutes of each other and each copy starts with In Progress status. For example, a WU might be issued to 3 hosts named A, B and C. Think of A, B and C as belonging to a temporary workgroup. Each host in the workgroup crunches its copy of the WU and returns a crunch report to the server at WCG when it finishes. The report contains the CPU time required to crunch the WU, a claim for credit, whether the crunch succeeded or aborted with an error, plus other details we won't get into here. If the crunch aborted with an error then the host sends only the crunch report because the result is of no use to anybody. If the crunch was successful then the result is returned too.

If the 3 reports indicate all 3 crunches were successful then the 3 results are compared. If they match (and they usually do) then WCG is assured it has a valid result. Three matching results form "the quorum". When quorum is achieved the 3 results are promoted to Valid status and the result is sent to the research team for further analysis.

Obviously hosts A, B and C in our imaginary workgroup will probably not return their results at exactly the same time. The difference in return times ranges from a few hours to several days. Results awaiting the return of results from other hosts in the workgroup have Pending Validation status.

To summarise, the ideal progression is:

  1. 3 hosts receive a copy of the same WU
  2. all 3 hosts crunch the WU successfully and return their report and result
  3. the 3 results match, quorum is achieved
  4. all 3 results receive Valid status
  5. the result is submitted to the project scientists for further processing
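
The quorum test in step 3 can be sketched in a few lines of Python. This is only an illustration of the idea, not WCG's actual server code: the function name has_quorum is made up, and each result is assumed to be represented by a value that can be compared for equality.

```python
# Hypothetical sketch of the ideal-case quorum test (not WCG's real code).
QUORUM_SIZE = 3  # from the text: three matching results form the quorum


def has_quorum(results):
    """Return True when there are exactly 3 results and all of them match."""
    return len(results) == QUORUM_SIZE and len(set(results)) == 1


# Hosts A, B and C all returned the same result, so quorum is achieved.
print(has_quorum(["result-x", "result-x", "result-x"]))
```

With only 2 results returned, or with 3 results that do not all match, the function returns False and the results stay short of Valid status.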

Work unit status changes as the unit progresses through various stages. Work unit status will always be one of the following:

  1. In Progress
  2. Pending Validation
  3. Valid
  4. Invalid
  5. Error
  6. Inconclusive
  7. No Report
  8. Other
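
If you keep your own records of your results, the list above maps naturally onto an enumeration. The sketch below is an assumption about how one might model it in Python; the class name ResultStatus is made up and the labels are copied from the list above.

```python
from enum import Enum


class ResultStatus(Enum):
    """The 8 statuses listed above; the class name is hypothetical."""
    IN_PROGRESS = "In Progress"
    PENDING_VALIDATION = "Pending Validation"
    VALID = "Valid"
    INVALID = "Invalid"
    ERROR = "Error"
    INCONCLUSIVE = "Inconclusive"
    NO_REPORT = "No Report"
    OTHER = "Other"


# Look a status up by the label shown on the website.
status = ResultStatus("Pending Validation")
print(status.name)
```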

We have discussed the first 3 statuses in the above list and touched briefly on Error status. Next we look at Error in more detail and introduce Invalid, Inconclusive, No Report and Other.

When Things Go Wrong

In the previous section we discussed how WUs normally progress from In Progress status to Valid status. To keep the discussion simple we looked at the ideal scenario where nothing goes wrong. In reality one or more hosts might abort the WU due to a computing error while crunching. The building a host is housed in might catch fire and be destroyed. Also, a host's owner has the option of manually aborting any or all WUs on the host at any time.

Error Status

Let's say host A in our imaginary workgroup encounters a computation error while crunching a WU. Host A then aborts the WU and reports the error to the server. Host A's result receives Error status and the server issues another copy of the WU to host D which expands the workgroup to 4 members. Usually the 3 results from hosts B, C and D will match. If they do then B, C and D receive credits and their results are given Valid status. Host A will not receive credit because results with Error status do not receive credits. Aborting a WU manually also causes the host to report an error and that error is treated the same as a computation error.

Sometimes computation errors are due to a fault in the WU rather than a fault in the host computer. Faulty WUs usually cause an error on more than 1 host in a workgroup. Sometimes all hosts to which they are issued return an error. If a WU produces more than 12 errors it is pulled out of circulation and discarded or else repaired and reissued to a new workgroup.

At this point it should be clear that Error status is determined by the hosts that crunch the WUs. Error status is not synonymous with Invalid status. Invalid status is determined on the server as we will see shortly.

Late Status

Hosts are allowed 7 days to return a result. If a host passes the 7 day deadline then the server gives that result Late status and issues another copy of the WU to another host. Late results do not receive credits. There are several reasons why results are returned late and we will discuss those reasons later to help you avoid going past the deadline.
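
As a rough illustration, the deadline rule amounts to a simple date comparison. The function name is_late is made up, and treating a result returned at exactly 7 days as still on time is an assumption; the text does not say how WCG handles that boundary.

```python
from datetime import datetime, timedelta

RETURN_DEADLINE = timedelta(days=7)  # from the text: hosts are allowed 7 days


def is_late(issued_at, returned_at):
    """Return True if the result came back after the 7 day deadline.

    Assumption: a result returned at exactly 7 days is still on time.
    """
    return returned_at - issued_at > RETURN_DEADLINE


issued = datetime(2006, 7, 1, 12, 0)
print(is_late(issued, datetime(2006, 7, 5, 9, 30)))   # returned within 7 days
print(is_late(issued, datetime(2006, 7, 9, 12, 0)))   # returned after 8 days
```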

Inconclusive Status

Occasionally a workgroup returns 3 error free results before the deadline but the results do not match. In that case the 3 results are given Inconclusive status. Another copy of the WU is issued to another host in the hope that its result will match 2 of the others and break the tie. Usually the 4th result does break the tie, though on occasion a 5th or 6th copy must be issued to achieve a quorum. The 3 results that match are then given Valid status. Those that do not match receive Invalid status.
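
The tie-breaking logic can be sketched as a vote count over the error free results returned so far. This is a simplified model, not WCG's implementation: the function name classify is made up, and each result is assumed to be reduced to a value (here a short string) that matches exactly when two results agree.

```python
from collections import Counter

QUORUM_SIZE = 3  # three matching results form the quorum


def classify(results):
    """Map each host to a status, given a dict of host -> returned result.

    Only error free results that arrived before the deadline belong in
    `results`; Error and Late results are handled elsewhere.
    """
    if len(results) < QUORUM_SIZE:
        # Too few results to compare yet.
        return {host: "Pending Validation" for host in results}
    winner, votes = Counter(results.values()).most_common(1)[0]
    if votes < QUORUM_SIZE:
        # No 3 results agree, so another copy must be issued.
        return {host: "Inconclusive" for host in results}
    return {host: "Valid" if value == winner else "Invalid"
            for host, value in results.items()}


first_three = {"A": "x", "B": "y", "C": "x"}
print(classify(first_three))               # no 3 results match yet
print(classify(dict(first_three, D="x")))  # host D's result breaks the tie
```

In the second call hosts A, C and D form the quorum and receive Valid status, while host B's result is marked Invalid.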

Invalid Status

Invalid status implies the host crunched the WU successfully (error free) but the result failed to match the results returned by the other hosts in the workgroup. At this time (July 2006) WCG is awarding credits for Invalid results for HPF2 WUs because a bug in its recently launched HPF2 project is causing an abnormally high number of Invalid results. When that bug is fixed, WCG may return to its normal policy of not awarding credits for Invalid results. I am not certain whether credits are awarded for Invalid results for WUs from WCG's other projects.

No Report Status

No Report applies when a result was returned but the crunch report was not received before the 7 day deadline. The result cannot be validated without the crunch report therefore credits cannot be awarded.

Other Status

A very small percentage of results don't fall into any of the above categories so they end up here in this miscellaneous category. Results with Other status do not receive credit.

Determining Credits

BOINC and UD each calculate the number of credits they think the host should receive for processing a WU. That number is referred to as "claimed credits". The amount claimed depends on the CPU time required to crunch the WU, CPU speed and other factors. The claim is part of the crunch report the host submits for every WU it crunches.

BOINC and UD award credits differently. UD awards hosts whatever they claim when they return a successful result and validates the results later. BOINC waits until after quorum is achieved and then awards the median of the 3 credit claims to all 3 hosts. For example, if host A claims 63, B claims 72 and C claims 50, the median (middle) value is 63 so all 3 hosts receive 63 credits.
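
The BOINC rule is easy to check with Python's standard statistics module. The function name awarded_credit is made up for this sketch, and it covers only the 3 claim case described above.

```python
from statistics import median


def awarded_credit(claims):
    """Return the credit every host in the quorum receives: the median claim.

    Sketch only: assumes exactly 3 claims, as in the example in the text.
    """
    if len(claims) != 3:
        raise ValueError("expected exactly 3 credit claims")
    return median(claims)


# Host A claims 63, B claims 72 and C claims 50.
print(awarded_credit([63, 72, 50]))  # prints 63
```

Using the median rather than the average means a single host with an unusually high or low claim cannot pull the award up or down.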

Your total credits are reported on your My Grid page on the WCG website. Statistics on My Grid are updated every 6 hours. Credits earned since the previous update will not be included in the stats on My Grid until the next update. In other words, statistics on My Grid can be up to 6 hours behind so they may not agree with your actual totals.

Tracking BOINC Results at WCG

If you log in at the WCG website and click My Grid->Device Manager->Results Status you will see a list of WUs that have been issued to you. The list may span more than 1 page. Initially the list is sorted by the date the WUs were issued to you, with the most recent WU at the top. The list shows the status of your recent WUs. WUs drop off the list 4 days after they receive Valid status. Information on the list is current; in other words, it is the most recent data WCG has in its records. If you send in a crunch report for a WU, your Results Status page will show the information from that report a minute or 2 later.

Click on the WU at the top of the left hand column. You'll get a pop-up window something like the image below. There you see the progress of the workgroup crunching that WU. Your computer is the one that is highlighted.

image 1

In the above workgroup, two hosts have finished crunching the WU and have returned a crunch report. Since both have Pending Validation status rather than Error status we know both crunches were successful. One host has not returned a report therefore its status is In Progress. As only 2 hosts have reported it's too early to have a quorum therefore the rightmost column shows 2 hosts have claimed credits but none have been awarded.

image 2

In this example, we know the 3 original hosts in the workgroup crunched the WU successfully because none have Error status. Unfortunately the first 3 results did not achieve quorum so the server would have given all 3 Inconclusive status. Then the server dispatched a 4th copy of the WU. The 4th result matched 2 of the original 3 results so quorum was achieved at that point. Now we see the result from my computer was the one that did not match because it has Invalid status while the other 3 are Valid. Note that all 4 hosts received 47 credits which is the median of 152, 47 and 34.

image 3

In this example, the workgroup has expanded from the original 3 hosts to 6 hosts. So far 5 hosts have aborted the WU, probably due to a computation error. They will not receive credits. The server will issue a maximum of 12 copies of the WU. If quorum is not achieved in 12 attempts the WU will be pulled out of circulation until the problem is fixed.