Why Run a Benchmark with Each Work Unit?
When a work unit (WU) is sent, very little is actually known about the computer receiving it. Since the last work unit was sent to that computer, the memory, the CPU clock speed, or even the CPU itself could have changed. People could even be using what the WCG forums call "SneakerNet" to run that WU on some other computer that is not even connected to the net. In some cases, people have replaced the whole computer and transferred the agent to a new machine where everything is different. Additionally, the last work unit may have run while the owner was sleeping, and hence at full speed, while the current WU might be running while the owner is playing a game that takes up a large percentage of the CPU and memory. Or surfing...
In short, each WU on a computer can be running in the same dynamic environment as the last... or a completely different one. Since points (we must keep score, right?) are calculated from runtime and resources consumed, the rating (and consequently the points) for a particular computer on this WU can be very different from what it was for that same computer on the last. So a test, or benchmark, is run with each WU to estimate the true total consumption of that WU in its then-current dynamic environment.
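To make the reasoning concrete, here is a minimal sketch in Python. It is not the actual WCG scoring formula, which is not given here. It simply assumes that points are proportional to runtime multiplied by a benchmark rating; the function name estimate_points, the points_per_unit scale factor, and all of the numbers are made up for illustration.

```python
def estimate_points(runtime_hours: float, benchmark_rating: float,
                    points_per_unit: float = 1.0) -> float:
    """Hypothetical credit model: runtime x benchmark rating x scale.

    This is an assumed formula for illustration only, not the real
    WCG calculation.
    """
    if runtime_hours < 0 or benchmark_rating < 0:
        raise ValueError("runtime and rating must be non-negative")
    return runtime_hours * benchmark_rating * points_per_unit


# Same WU, same computer, two different environments (made-up numbers).
# Idle machine: the WU finishes in 4 hours with a high rating.
idle_points = estimate_points(4.0, benchmark_rating=100.0)

# Owner is gaming: the WU takes 10 hours, but the benchmark run with
# this WU reports a lower rating, so the credited work stays the same.
busy_points = estimate_points(10.0, benchmark_rating=40.0)

# Reusing the old idle-time rating for the slow run would overstate
# the work actually done.
stale_points = estimate_points(10.0, benchmark_rating=100.0)

print(idle_points, busy_points, stale_points)
```

Under these assumptions, the idle and busy runs earn the same points because the fresh benchmark reflects the environment the WU actually ran in, while the stale rating inflates the result. That is the reason for paying the cost of a benchmark on every WU.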